I am currently analyzing a 16S dataset, and the alignment looks beautiful. But screen.seqs somehow deletes the bases of several sequences, which makes filter.seqs impossible to run. I have no idea what is going on. I first thought it was due to an incompatibility between the count and fasta files, but that doesn't seem to be the case. Running with minlength=300 also gives the same output. Any suggestions as to what is wrong here?
What caught my eye was that the number of unique sequences before and after alignment is not the same (429099 before, 414252 after; see the "# of unique seqs" lines in the output). What can I do?
mothur > summary.seqs(count=seqs.trim.contigs.good.count_table)
Using seqs.trim.contigs.good.unique.fasta as input file for the fasta parameter.
Using 8 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 300 300 0 3 1
2.5%-tile: 1 438 438 0 4 15251
25%-tile: 1 440 440 0 4 152503
Median: 1 455 455 0 5 305006
75%-tile: 1 462 462 0 6 457509
97.5%-tile: 1 465 465 0 7 594761
Maximum: 1 550 550 0 8 610011
Mean: 1 450.236 450.236 0 5.1741
# of unique seqs: 429099
total # of seqs: 610011
Output File Names:
seqs.trim.contigs.good.unique.summary
It took 7 secs to summarize 610011 sequences.
mothur > align.seqs(fasta=current,reference=silva.bacteria.fasta)
Using seqs.trim.contigs.good.unique.fasta as input file for the fasta parameter.
Using 8 processors.
Reading in the silva.bacteria.fasta template sequences… DONE.
It took 17 to read 14956 sequences.
Aligning sequences from seqs.trim.contigs.good.unique.fasta …
Reading in the silva.bacteria.fasta template sequences…
Reading in the silva.bacteria.fasta template sequences...
Reading in the silva.bacteria.fasta template sequences...
Reading in the silva.bacteria.fasta template sequences...
Reading in the silva.bacteria.fasta template sequences...
Reading in the silva.bacteria.fasta template sequences...
Reading in the silva.bacteria.fasta template sequences...
DONE. It took 131 to read 14956 sequences.
DONE. It took 132 to read 14956 sequences.
DONE. It took 133 to read 14956 sequences.
DONE. It took 133 to read 14956 sequences.
DONE. It took 137 to read 14956 sequences.
DONE. It took 141 to read 14956 sequences.
DONE. It took 142 to read 14956 sequences.
Some of you sequences generated alignments that eliminated too many bases, a list is provided in seqs.trim.contigs.good.unique.flip.accnos. If you set the flip parameter to true mothur will try aligning the reverse compliment as well.
It took 1986 secs to align 429099 sequences.
Output File Names:
seqs.trim.contigs.good.unique.align
seqs.trim.contigs.good.unique.align.report
seqs.trim.contigs.good.unique.flip.accnos
mothur > summary.seqs(fasta=seqs.trim.contigs.good.unique.align, count=seqs.trim.contigs.good.count_table)
Using 8 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: -1 -1 0 0 1 1
2.5%-tile: 6388 25316 438 0 4 15251
25%-tile: 6388 25316 440 0 4 152503
Median: 6388 25316 455 0 5 305006
75%-tile: 6388 25316 463 0 6 457509
97.5%-tile: 43116 43116 550 0 8 594761
Maximum: 43116 43116 550 0 8 610011
Mean: 6286.76 24556.5 436.136 0 5.01254
# of unique seqs: 414252
total # of seqs: 610011
Output File Names:
seqs.trim.contigs.good.unique.summary
It took 404 secs to summarize 610011 sequences.
mothur > screen.seqs(fasta=seqs.trim.contigs.good.unique.align, count=seqs.trim.contigs.good.count_table, summary=seqs.trim.contigs.good.unique.summary, start=6388, end=25316)
Using 8 processors.
Output File Names:
seqs.trim.contigs.good.unique.good.summary
seqs.trim.contigs.good.unique.good.align
seqs.trim.contigs.good.unique.bad.accnos
seqs.trim.contigs.good.good.count_table
It took 411 secs to screen 414252 sequences.
mothur > summary.seqs(fasta=current, count=current)
Using seqs.trim.contigs.good.good.count_table as input file for the count parameter.
Using seqs.trim.contigs.good.unique.good.align as input file for the fasta parameter.
Using 8 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: -1 -1 0 0 1 1
2.5%-tile: 6388 25316 440 0 4 14544
25%-tile: 6388 25318 511 0 8 145435
Median: 0 0 0 0 0 290869
75%-tile: 0 0 0 0 0 436303
97.5%-tile: 0 0 0 0 0 567193
Maximum: 6388 25318 511 0 8 581736
Mean: 1330.15 5271.83 94.0469 0 1.07592
# of unique seqs: 48030
total # of seqs: 581736
Output File Names:
seqs.trim.contigs.good.unique.good.summary
It took 45 secs to summarize 581736 sequences.
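To narrow this down, I could count the non-gap characters per record in the screened alignment and list the records that have none. Below is a minimal sketch in Python. It assumes the usual aligned-fasta layout (`>name` header lines, with `.` and `-` as gap characters), and the function names are my own, not part of mothur.

```python
# Gap characters assumed for a mothur-style aligned fasta file.
GAP_CHARS = set(".-")


def base_counts(lines):
    """Yield (name, number of non-gap characters) for each fasta record."""
    name, bases = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, bases
            parts = line[1:].split()
            name = parts[0] if parts else ""
            bases = 0
        elif name is not None:
            # Sequence data may span several lines; accumulate across them.
            bases += sum(1 for ch in line if ch not in GAP_CHARS)
    if name is not None:
        yield name, bases


def empty_records(lines):
    """Return the names of records that contain no bases at all."""
    return [name for name, count in base_counts(lines) if count == 0]
```

Opening seqs.trim.contigs.good.unique.good.align and passing the file handle to `empty_records` would give the names of the sequences that ended up with zero bases, which could then be compared against the names in the count table.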