mothur

Running into trouble when filtering sequences

Hi again!! I’m working on another dataset that combines samples from a couple of different projects, and once I get to the filter.seqs stage my sequences end up only 12 bp long. I think the trouble starts back when I align my sequences against SILVA. My workflow and outputs are below, starting with the summary of my count table after I’ve filtered my sequences for length:

mothur > summary.seqs(count=stability.trim.contigs.good.count_table)

                Start   End     NBases  Ambigs  Polymer NumSeqs
Minimum:        1       150     150     0       3       1
2.5%-tile:      1       159     159     0       3       76119
25%-tile:       1       251     251     0       4       761183
Median:         1       292     292     0       5       1522366
75%-tile:       1       292     292     0       6       2283548
97.5%-tile:     1       292     292     0       6       2968612
Maximum:        1       295     295     0       89      3044730
Mean:           1       273     273     0       4
# of unique seqs: 1097728
total # of seqs: 3044730
It took 20 secs to summarize 3044730 sequences.
Output File Names:
stability.trim.contigs.good.unique.summary

mothur > align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=silva.v4.fasta)

It took 214 secs to align 1097728 sequences.
[WARNING]: 46418 of your sequences generated alignments that eliminated too many bases, a list is provided in stability.trim.contigs.good.unique.flip.accnos.
[NOTE]: 27247 of your sequences were reversed to produce a better alignment.
It took 214 seconds to align 1097728 sequences.
Output File Names:
stability.trim.contigs.good.unique.align
stability.trim.contigs.good.unique.align.report
stability.trim.contigs.good.unique.flip.accnos

mothur > summary.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table)

                Start   End     NBases  Ambigs  Polymer NumSeqs
Minimum:        0       0       0       0       1       1
2.5%-tile:      1       13400   12      0       3       76119
25%-tile:       1       13424   244     0       4       761183
Median:         1       13424   292     0       5       1522366
75%-tile:       1       13424   292     0       6       2283548
97.5%-tile:     13389   13425   292     0       6       2968612
Maximum:        13425   13425   295     0       17      3044730
Mean:           1288    13231   262     0       4
# of unique seqs: 1097728
total # of seqs: 3044730
It took 88 secs to summarize 3044730 sequences.
Output File Names:
stability.trim.contigs.good.unique.summary

mothur > screen.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table, summary=stability.trim.contigs.good.unique.summary, start=13389, end=13400, maxhomop=8)

It took 63 secs to screen 1097728 sequences, removed 27399.

Running command: remove.seqs(accnos=/Users/dgomezni/Desktop/reflux/stability.trim.contigs.good.unique.bad.accnos.temp, count=/Users/dgomezni/Desktop/reflux/stability.trim.contigs.good.count_table)
Removed 110417 sequences from your count file.
Output File Names:
stability.trim.contigs.good.pick.count_table

Output File Names:
stability.trim.contigs.good.unique.good.summary
stability.trim.contigs.good.unique.good.align
stability.trim.contigs.good.unique.bad.accnos
stability.trim.contigs.good.good.count_table
It took 80 secs to screen 1097728 sequences.

mothur > summary.seqs(fasta=stability.trim.contigs.good.unique.good.align, count=stability.trim.contigs.good.good.count_table, processors=16)

                Start   End     NBases  Ambigs  Polymer NumSeqs
Minimum:        1       13400   11      0       1       1
2.5%-tile:      1       13400   15      0       3       73358
25%-tile:       1       13424   291     0       4       733579
Median:         1       13424   292     0       5       1467157
75%-tile:       1       13424   292     0       6       2200735
97.5%-tile:     12070   13425   292     0       6       2860956
Maximum:        13389   13425   295     0       8       2934313
Mean:           1061    13419   271     0       4
# of unique seqs: 1070329
total # of seqs: 2934313
It took 71 secs to summarize 2934313 sequences.
Output File Names:
stability.trim.contigs.good.unique.good.summary

mothur > filter.seqs(fasta=stability.trim.contigs.good.unique.good.align, vertical=T, trump=.)

It took 53 secs to filter 1070329 sequences.
Length of filtered alignment: 12
Number of columns removed: 13413
Length of the original alignment: 13425
Number of sequences used to construct filter: 1070329
Output File Names:
stability.filter
stability.trim.contigs.good.unique.good.filter.fasta

mothur > summary.seqs(fasta=stability.trim.contigs.good.unique.good.filter.fasta, vertical=T, trump=.)

                Start   End     NBases  Ambigs  Polymer NumSeqs
Minimum:        1       11      5       0       1       1
2.5%-tile:      1       12      7       0       2       26759
25%-tile:       1       12      7       0       2       267583
Median:         1       12      7       0       2       535165
75%-tile:       1       12      7       0       3       802747
97.5%-tile:     1       12      7       0       4       1043571
Maximum:        3       12      9       0       7       1070329
Mean:           1       11      6       0       2
# of Seqs: 1070329
It took 2 secs to summarize 1070329 sequences.
Output File Names:
stability.trim.contigs.good.unique.good.filter.summary

Any and all suggestions are much, much appreciated!! Many thanks!!
JM

In the screen.seqs step, I think you want to use start=1 rather than start=13389.

Pat
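
To see why the start value matters so much, here is a rough toy model in Python (not mothur itself). It assumes the documented behaviour that screen.seqs keeps sequences starting at or before `start` and ending at or after `end`, that `trump=.` in filter.seqs drops every column where any sequence has a `.`, and that `vertical=T` drops columns that are gaps in every sequence. The helper names (`make_row`, `span`, `screen`, `filter_columns`) and the 30-column alignment are made up for illustration. In the real data the same arithmetic gives 13400 - 13389 + 1 = 12 columns, which matches the filtered alignment length reported above.

```python
def make_row(start, end, width=30):
    """Toy aligned row: bases from start..end (1-based), '.' padding elsewhere."""
    return "." * (start - 1) + "A" * (end - start + 1) + "." * (width - end)


def span(row):
    """Return the 1-based (start, end) positions of the first and last base."""
    bases = [i for i, ch in enumerate(row, start=1) if ch not in ".-"]
    return (bases[0], bases[-1]) if bases else (0, 0)


def screen(seqs, start, end):
    """Keep rows that begin at or before `start` and finish at or after `end`."""
    kept = {}
    for name, row in seqs.items():
        s, e = span(row)
        if s <= start and e >= end:
            kept[name] = row
    return kept


def filter_columns(seqs):
    """Mimic vertical=T plus trump=. on a dict of equal-length aligned rows."""
    rows = list(seqs.values())
    keep = [
        c for c in range(len(rows[0]))
        if all(r[c] != "." for r in rows)   # trump=.: one '.' kills the column
        and any(r[c] != "-" for r in rows)  # vertical=T: drop all-gap columns
    ]
    return {name: "".join(row[c] for c in keep) for name, row in seqs.items()}


# Hypothetical toy alignment: two well-aligned reads and one stray read
# that only covers columns 20..25.
seqs = {
    "full": make_row(1, 28),
    "short": make_row(1, 25),
    "stray": make_row(20, 25),
}

# A permissive start (like start=13389) lets the stray read through,
# so trump=. trims everything down to the columns all three reads share.
loose = filter_columns(screen(seqs, start=20, end=25))
loose_width = len(next(iter(loose.values())))

# start=1 removes the stray read, and the full shared region survives.
strict = filter_columns(screen(seqs, start=1, end=25))
strict_width = len(next(iter(strict.values())))

print(loose_width, strict_width)
```

In this sketch the permissive screen leaves only the six columns covered by the stray read, while the stricter one keeps the 25 columns shared by the well-aligned reads. Whether end=13400 is still the right cutoff for the real data is a separate judgment call to make from the summary.seqs output.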