Hi there,
I’m running a V3V4 dataset with my mothur-based pipeline and during screen.seqs
with optimize=start-end
. Interestingly, mothur (v1.39.5) makes an odd choice for it, which results in losing all but 1 nt in almost all seqs. See below the details.
I know that it’s an old version but I haven’t seen any mention about this other than force specific start and end positions. The forum examples I saw weren’t as extreme as this one though
mothur > align.seqs(fasta=current, reference=/share/amplicon/otureporter/db/silva.nr_v132.v3v4.align, processors=12, flip=T)
Using /results/amplicon/NGU6358/mothur_run/NGU6358.trim.contigs.good.unique.fasta as input file for the fasta parameter.
Using 12 processors.
Reading in the /share/amplicon/otureporter/db/silva.nr_v132.v3v4.align template sequences... DONE.
It took 77 to read 213119 sequences.
Aligning sequences from /results/amplicon/NGU6358/mothur_run/NGU6358.trim.contigs.good.unique.fasta ...
[WARNING]: Some of your sequences generated alignments that eliminated too many bases, a list is provided in /results/amplicon/NGU6358/mot
hur_run/NGU6358.trim.contigs.good.unique.flip.accnos. If the reverse compliment proved to be better it was reported.
It took 3470 secs to align 1748916 sequences.
Output File Names:
/results/amplicon/NGU6358/mothur_run/NGU6358.trim.contigs.good.unique.align
/results/amplicon/NGU6358/mothur_run/NGU6358.trim.contigs.good.unique.align.report
/results/amplicon/NGU6358/mothur_run/NGU6358.trim.contigs.good.unique.flip.accnos
mothur > summary.seqs(fasta=current, count=current)
Using /results/amplicon/NGU6358/mothur_run/NGU6358.trim.contigs.good.count_table as input file for the count parameter.
Using /results/amplicon/NGU6358/mothur_run/NGU6358.trim.contigs.good.unique.align as input file for the fasta parameter.
Using 12 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 0 0 0 0 1 1
2.5%-tile: 10 270 7 0 2 150067
25%-tile: 1423 17011 348 0 4 1500662
Median: 1423 17011 371 0 6 3001324
75%-tile: 1423 17011 371 0 6 4501986
97.5%-tile: 16964 17011 372 0 7 5852581
Maximum: 17011 17012 472 0 8 6002647
Mean: 2080.85 15904.5 324.537 0 5.03819
# of unique seqs: 1748916
total # of seqs: 6002647
Output File Names:
/results/amplicon/NGU6358/mothur_run/NGU6358.trim.contigs.good.unique.summary
It took 85 secs to summarize 6002647 sequences.
mothur > screen.seqs(fasta=current, count=current, summary=current, optimize=start-end, criteria=95, maxhomop=8)
Using /results/amplicon/NGU6358/mothur_run/NGU6358.trim.contigs.good.count_table as input file for the count parameter.
Using /results/amplicon/NGU6358/mothur_run/NGU6358.trim.contigs.good.unique.align as input file for the fasta parameter.
Using /results/amplicon/NGU6358/mothur_run/NGU6358.trim.contigs.good.unique.summary as input file for the summary parameter.
Using 12 processors.
Optimizing start to 1424.
Optimizing end to 270.
Output File Names:
/results/amplicon/NGU6358/mothur_run/NGU6358.trim.contigs.good.unique.good.summary
/results/amplicon/NGU6358/mothur_run/NGU6358.trim.contigs.good.unique.good.align
/results/amplicon/NGU6358/mothur_run/NGU6358.trim.contigs.good.unique.bad.accnos
/results/amplicon/NGU6358/mothur_run/NGU6358.trim.contigs.good.good.count_table
It took 381 secs to screen 1748916 sequences.
mothur > filter.seqs(fasta=current, vertical=T, trump=.)
Using /results/amplicon/NGU6358/mothur_run/NGU6358.trim.contigs.good.unique.good.align as input file for the fasta parameter.
Using 12 processors.
Creating Filter...
Running Filter...
Length of filtered alignment: 0
Number of columns removed: 17012
Length of the original alignment: 17012
Number of sequences used to construct filter: 1628144
Output File Names:
/results/amplicon/NGU6358/mothur_run/NGU6358.filter
/results/amplicon/NGU6358/mothur_run/NGU6358.trim.contigs.good.unique.good.filter.fasta
mothur > unique.seqs(fasta=current, count=current)
Using /results/amplicon/NGU6358/mothur_run/NGU6358.trim.contigs.good.good.count_table as input file for the count parameter.
Using /results/amplicon/NGU6358/mothur_run/NGU6358.trim.contigs.good.unique.good.filter.fasta as input file for the fasta parameter.
1628144 1
Output File Names:
/results/amplicon/NGU6358/mothur_run/NGU6358.trim.contigs.good.unique.good.filter.count_table
/results/amplicon/NGU6358/mothur_run/NGU6358.trim.contigs.good.unique.good.filter.unique.fasta
mothur > summary.seqs(fasta=current, count=current)
Using /results/amplicon/NGU6358/mothur_run/NGU6358.trim.contigs.good.unique.good.filter.count_table as input file for the count parameter.
Using /results/amplicon/NGU6358/mothur_run/NGU6358.trim.contigs.good.unique.good.filter.unique.fasta as input file for the fasta parameter.
Using 12 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 0 0 0 0 1 1
2.5%-tile: 0 0 0 0 1 134233
25%-tile: 0 0 0 0 1 1342330
Median: 0 0 0 0 1 2684660
75%-tile: 0 0 0 0 1 4026989
97.5%-tile: 0 0 0 0 1 5235086
Maximum: -1 -1 0 0 1 5369318
Mean: 0 0 0 0 1
# of unique seqs: 1
total # of seqs: 5369318