screen.seqs removes all basepairs from certain sequences

Angrist · November 13, 2015, 10:37am

I am currently analyzing a 16S dataset and the alignment looks beautiful… But screen.seqs somehow deletes the basepairs of several sequences, making filter.seqs impossible to run. I have no idea what is going on there… I first thought it was due to an incompatibility between the count and fasta files, but that doesn't seem to be the case… Also, running with minlength=300 gives the same output. Any suggestions as to what's wrong here?

What caught my eye was that the number of unique sequences before and after alignment is not the same… See the "of unique seqs" lines in the output below (429099 before alignment, 414252 after)… What can I do?

mothur > summary.seqs(count=seqs.trim.contigs.good.count_table)
Using seqs.trim.contigs.good.unique.fasta as input file for the fasta parameter.

Using 8 processors.

Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 300 300 0 3 1
2.5%-tile: 1 438 438 0 4 15251
25%-tile: 1 440 440 0 4 152503
Median: 1 455 455 0 5 305006
75%-tile: 1 462 462 0 6 457509
97.5%-tile: 1 465 465 0 7 594761
Maximum: 1 550 550 0 8 610011
Mean: 1 450.236 450.236 0 5.1741

# of unique seqs: 429099

total # of seqs: 610011

Output File Names:
seqs.trim.contigs.good.unique.summary

It took 7 secs to summarize 610011 sequences.

mothur > align.seqs(fasta=current,reference=silva.bacteria.fasta)
Using seqs.trim.contigs.good.unique.fasta as input file for the fasta parameter.

Using 8 processors.

Reading in the silva.bacteria.fasta template sequences… DONE.
It took 17 to read 14956 sequences.
Aligning sequences from seqs.trim.contigs.good.unique.fasta …

Reading in the silva.bacteria.fasta template sequences…
Reading in the silva.bacteria.fasta template sequences...
Reading in the silva.bacteria.fasta template sequences...
Reading in the silva.bacteria.fasta template sequences...
Reading in the silva.bacteria.fasta template sequences...
Reading in the silva.bacteria.fasta template sequences...
Reading in the silva.bacteria.fasta template sequences...
DONE. It took 131 to read 14956 sequences.
DONE. It took 132 to read 14956 sequences.
DONE. It took 133 to read 14956 sequences.
DONE. It took 133 to read 14956 sequences.
DONE. It took 137 to read 14956 sequences.
DONE. It took 141 to read 14956 sequences.
DONE. It took 142 to read 14956 sequences.

Some of you sequences generated alignments that eliminated too many bases, a list is provided in seqs.trim.contigs.good.unique.flip.accnos. If you set the flip parameter to true mothur will try aligning the reverse compliment as well.

It took 1986 secs to align 429099 sequences.

Output File Names:
seqs.trim.contigs.good.unique.align
seqs.trim.contigs.good.unique.align.report
seqs.trim.contigs.good.unique.flip.accnos

mothur > summary.seqs(fasta=seqs.trim.contigs.good.unique.align, count=seqs.trim.contigs.good.count_table)

Using 8 processors.

Start End NBases Ambigs Polymer NumSeqs
Minimum: -1 -1 0 0 1 1
2.5%-tile: 6388 25316 438 0 4 15251
25%-tile: 6388 25316 440 0 4 152503
Median: 6388 25316 455 0 5 305006
75%-tile: 6388 25316 463 0 6 457509
97.5%-tile: 43116 43116 550 0 8 594761
Maximum: 43116 43116 550 0 8 610011
Mean: 6286.76 24556.5 436.136 0 5.01254

# of unique seqs: 414252

total # of seqs: 610011

Output File Names:
seqs.trim.contigs.good.unique.summary

It took 404 secs to summarize 610011 sequences.

mothur > screen.seqs(fasta=seqs.trim.contigs.good.unique.align, count=seqs.trim.contigs.good.count_table, summary=seqs.trim.contigs.good.unique.summary, start=6388, end=25316)

Using 8 processors.

Output File Names:
seqs.trim.contigs.good.unique.good.summary
seqs.trim.contigs.good.unique.good.align
seqs.trim.contigs.good.unique.bad.accnos
seqs.trim.contigs.good.good.count_table

It took 411 secs to screen 414252 sequences.

mothur > summary.seqs(fasta=current, count=current)
Using seqs.trim.contigs.good.good.count_table as input file for the count parameter.
Using seqs.trim.contigs.good.unique.good.align as input file for the fasta parameter.

Using 8 processors.

Start End NBases Ambigs Polymer NumSeqs
Minimum: -1 -1 0 0 1 1
2.5%-tile: 6388 25316 440 0 4 14544
25%-tile: 6388 25318 511 0 8 145435
Median: 0 0 0 0 0 290869
75%-tile: 0 0 0 0 0 436303
97.5%-tile: 0 0 0 0 0 567193
Maximum: 6388 25318 511 0 8 581736
Mean: 1330.15 5271.83 94.0469 0 1.07592

# of unique seqs: 48030

total # of seqs: 581736

Output File Names:
seqs.trim.contigs.good.unique.good.summary

It took 45 secs to summarize 581736 sequences.

westcott · November 16, 2015, 6:43pm

Have you tried the align.seqs command with flip=t? Could you try screen.seqs without the summary file option? Do you have enough disk space to write out the full aligned file and run the screen.seqs command? Insufficient hard drive space caused this issue for another mothur user.
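
One way to check whether the .align file on disk is already missing bases (for example because the write was cut short) is to count the records that contain nothing but gap characters, and to look at the free space on the drive. The snippet below is only a rough sketch and not part of mothur: the function names `count_empty_alignments` and `free_gigabytes` are made up for illustration, and it assumes the aligned file is plain FASTA with `.` and `-` as the only gap symbols.

```python
import shutil


def count_empty_alignments(fasta_path):
    """Return (total_records, names_of_empty_records) for an aligned FASTA file.

    A record is treated as empty when its sequence contains no characters
    other than the gap symbols '.' and '-' (assumption: no other gap symbols).
    """
    total = 0
    empty_names = []
    name = None   # name of the record currently being read
    bases = 0     # non-gap characters seen so far for that record

    with open(fasta_path) as handle:
        for raw in handle:
            line = raw.strip()
            if line.startswith(">"):
                # Close out the previous record before starting a new one.
                if name is not None:
                    total += 1
                    if bases == 0:
                        empty_names.append(name)
                fields = line[1:].split()
                name = fields[0] if fields else ""
                bases = 0
            elif name is not None:
                bases += sum(1 for ch in line if ch not in ".-")

    # Close out the last record in the file.
    if name is not None:
        total += 1
        if bases == 0:
            empty_names.append(name)

    return total, empty_names


def free_gigabytes(directory="."):
    """Free space on the drive holding `directory`, in gigabytes (10^9 bytes)."""
    return shutil.disk_usage(directory).free / 1e9
```

Calling `count_empty_alignments("seqs.trim.contigs.good.unique.align")` before and after screen.seqs (on the .good.align file) would show at which step the empty records first appear, and `free_gigabytes(".")` gives a quick look at the remaining space in the working directory.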
