Bug in align.seqs?

Hi,
I’m analyzing a full MiSeq run (204 samples, ~50k reads per sample) on a Linux server, following the MiSeq SOP. Everything goes fine until align.seqs: the subsequent summary.seqs output looks very odd, and the number of unique seqs is wrong (it drops from 4549033 before alignment to about 630000 after). Please help!
Logfile is attached:

Linux version

Using ReadLine

Running 64Bit Version

mothur v.1.34.4
Last updated: 12/22/2014

by
Patrick D. Schloss

Department of Microbiology & Immunology
University of Michigan
pschloss@umich.edu
http://www.mothur.org

When using, please cite:
Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.

Distributed under the GNU General Public License

Type ‘help()’ for information on the commands that are available

Type ‘quit()’ to exit program
Interactive Mode


mothur > summary.seqs(fasta=stability.trim.contigs.good.fasta, processors=8)

Using 8 processors.

Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 250 250 0 3 1
2.5%-tile: 1 291 291 0 4 272820
25%-tile: 1 292 292 0 4 2728198
Median: 1 292 292 0 4 5456395
75%-tile: 1 292 292 0 5 8184592
97.5%-tile: 1 294 294 0 6 10639970
Maximum: 1 320 320 0 58 10912789
Mean: 1 292.112 292.112 0 4.71934

# of Seqs: 10912789

Output File Names:
stability.trim.contigs.good.summary

It took 70 secs to summarize 10912789 sequences.

mothur > screen.seqs(fasta=stability.trim.contigs.fasta, group=stability.contigs.groups, maxambig=0, maxlength=300)

Using 8 processors.

Output File Names:
stability.trim.contigs.good.fasta
stability.trim.contigs.bad.accnos
stability.contigs.good.groups


It took 282 secs to screen 16640727 sequences.

mothur > unique.seqs(fasta=stability.trim.contigs.good.fasta)
10911586 4549033

Output File Names:
stability.trim.contigs.good.names
stability.trim.contigs.good.unique.fasta


mothur > count.seqs(name=stability.trim.contigs.good.names, group=stability.contigs.good.groups)

Using 8 processors.
It took 284 secs to create a table for 10911586 sequences.


Total number of sequences: 10911586

Output File Names:
stability.trim.contigs.good.count_table


mothur > summary.seqs(count=stability.trim.contigs.good.count_table)
Using stability.trim.contigs.good.unique.fasta as input file for the fasta parameter.

Using 8 processors.

Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 250 250 0 3 1
2.5%-tile: 1 291 291 0 4 272790
25%-tile: 1 292 292 0 4 2727897
Median: 1 292 292 0 4 5455794
75%-tile: 1 292 292 0 5 8183690
97.5%-tile: 1 294 294 0 6 10638797
Maximum: 1 300 300 0 58 10911586
Mean: 1 292.11 292.11 0 4.71927

# of unique seqs: 4549033

total # of seqs: 10911586

Output File Names:
stability.trim.contigs.good.unique.summary

It took 98 secs to summarize 10911586 sequences.

mothur > align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=silva.bacteria.fasta)

Using 8 processors.

Reading in the silva.bacteria.fasta template sequences… DONE.
It took 19 to read 14956 sequences.
Aligning sequences from stability.trim.contigs.good.unique.fasta …
Some of you sequences generated alignments that eliminated too many bases, a list is provided in stability.trim.contigs.good.unique.flip.accnos. If you set the flip parameter to true mothur will try aligning the reverse compliment as well.
It took 15622 secs to align 4549033 sequences.


Output File Names:
stability.trim.contigs.good.unique.align
stability.trim.contigs.good.unique.align.report
stability.trim.contigs.good.unique.flip.accnos

mothur > summary.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table)

Using 8 processors.

Start End NBases Ambigs Polymer NumSeqs
Minimum: -1 -1 0 0 1 1
2.5%-tile: 11895 25318 292 0 4 272790
25%-tile: 43116 43116 300 0 46 2727897
Median: 0 0 0 0 0 5455794
75%-tile: 0 0 0 0 0 8183690
97.5%-tile: 0 0 0 0 0 10638797
Maximum: 43116 43116 300 0 46 10911586
Mean: 2907.94 6188.81 71.399 0 1.20989

# of unique seqs: 630790

total # of seqs: 10911586

Output File Names:
stability.trim.contigs.good.unique.summary

It took 566 secs to summarize 10911586 sequences.



mothur > summary.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table)

Using 8 processors.

Start End NBases Ambigs Polymer NumSeqs
Minimum: -1 -1 0 0 1 1
2.5%-tile: 11895 25318 292 0 4 272790
25%-tile: 43116 43116 300 0 46 2727897
Median: 0 0 0 0 0 5455794
75%-tile: 0 0 0 0 0 8183690
97.5%-tile: 0 0 0 0 0 10638797
Maximum: 43116 43116 300 0 46 10911586
Mean: 2920.48 6208.38 71.5522 0 1.2124

# of unique seqs: 634493

total # of seqs: 10911586

Output File Names:
stability.trim.contigs.good.unique.summary

It took 570 secs to summarize 10911586 sequences.

mothur > summary.seqs(count=stability.trim.contigs.good.count_table)
Using stability.trim.contigs.good.unique.align as input file for the fasta parameter.

Using 8 processors.

Start End NBases Ambigs Polymer NumSeqs
Minimum: -1 -1 0 0 1 1
2.5%-tile: 11895 25318 292 0 4 272790
25%-tile: 43116 43116 300 0 46 2727897
Median: 0 0 0 0 0 5455794
75%-tile: 0 0 0 0 0 8183690
97.5%-tile: 0 0 0 0 0 10638797
Maximum: 43116 43116 300 0 46 10911586
Mean: 2920.48 6208.38 71.5522 0 1.2124

# of unique seqs: 634493

total # of seqs: 10911586

Output File Names:
stability.trim.contigs.good.unique.summary

It took 570 secs to summarize 10911586 sequences.


mothur > summary.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table, processors=1)

Using 1 processors.

Start End NBases Ambigs Polymer NumSeqs
Minimum: -1 -1 0 0 1 1
2.5%-tile: 11895 25318 292 0 4 272790
25%-tile: 43116 43116 300 0 46 2727897
Median: 0 0 0 0 0 5455794
75%-tile: 0 0 0 0 0 8183690
97.5%-tile: 0 0 0 0 0 10638797
Maximum: 43116 43116 300 0 46 10911586
Mean: 2920.48 6208.38 71.5522 0 1.2124

# of unique seqs: 634493

total # of seqs: 10911586

Output File Names:
stability.trim.contigs.good.unique.summary

It took 881 secs to summarize 10911586 sequences.
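
A quick sanity check that does not depend on mothur is to count the header lines in the input FASTA and in the .align file. align.seqs reported aligning 4549033 sequences, so both files would be expected to hold that many records. The following is only a minimal Python sketch, assuming plain uncompressed FASTA files; the function names are illustrative and not part of mothur.

```python
def count_fasta_records(lines):
    """Count FASTA records by counting header lines (those starting with '>')."""
    return sum(1 for line in lines if line.startswith(">"))


def compare_record_counts(input_lines, aligned_lines):
    """Return (input_count, aligned_count, counts_match) for two FASTA inputs.

    Both arguments can be any iterable of text lines, e.g. open file handles.
    """
    n_in = count_fasta_records(input_lines)
    n_out = count_fasta_records(aligned_lines)
    return n_in, n_out, n_in == n_out


# Example use with the file names from the log above:
# with open("stability.trim.contigs.good.unique.fasta") as fasta, \
#         open("stability.trim.contigs.good.unique.align") as align:
#     print(compare_record_counts(fasta, align))
```

If the two counts differ, that would point to an incomplete or damaged .align file rather than to a problem in summary.seqs itself.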

Could you delete silva.bacteria.fasta and the other silva.bacteria-like files, re-download them, and try align.seqs again? Also, are you sure that you put the R1 and R2 file names in the correct columns of your files file?

pat
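
The R1/R2 question can be checked mechanically. Here is a rough Python sketch, assuming the three-column files file layout from the MiSeq SOP (sample name, forward file, reverse file) and assuming the fastq names carry Illumina-style `_R1` / `_R2` tags. Both are assumptions about this particular data set, and the function name is made up for illustration.

```python
def check_files_file(lines):
    """Return a list of (line_number, reason) for rows that look wrong.

    `lines` is any iterable of text lines, e.g. an open file handle.
    Assumes three whitespace-separated columns: sample, forward, reverse.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            problems.append((number, "expected 3 columns, found %d" % len(fields)))
            continue
        _sample, forward, reverse = fields
        # Assumption: forward reads are tagged _R1 and reverse reads _R2.
        if "_R1" not in forward:
            problems.append((number, "column 2 does not look like an R1 file: " + forward))
        if "_R2" not in reverse:
            problems.append((number, "column 3 does not look like an R2 file: " + reverse))
    return problems


# Example use (the files file name here is a guess based on the SOP):
# with open("stability.files") as handle:
#     for line_number, reason in check_files_file(handle):
#         print(line_number, reason)
```

An empty result only means the names follow the assumed pattern; it says nothing about the contents of the fastq files.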

Pat,
Thanks for the reply.
I re-downloaded silva.bacteria.fasta, and now mothur can’t even read it: I get an error message saying that the template (i.e. the new reference file) is not aligned. I get the same error message when I try align.seqs with silva.seed_v119.align.
P.
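
One way to look into the "not aligned" message outside mothur is to check whether every sequence in the reference has the same length, gaps included. The sketch below assumes that this is what "aligned" means here, which is my reading of the error and is not confirmed in this thread; the function names are illustrative only.

```python
from collections import Counter


def sequence_lengths(lines):
    """Return a Counter mapping sequence length -> number of FASTA records.

    `lines` is any iterable of text lines. Sequences may span several lines;
    gap characters ('-' and '.') count towards the length.
    """
    lengths = Counter()
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths[current] += 1
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths[current] += 1
    return lengths


def looks_aligned(lines):
    """True if the input holds at least one record and all share one length."""
    return len(sequence_lengths(lines)) == 1


# Example use with the reference from the log:
# with open("silva.bacteria.fasta") as handle:
#     print(sequence_lengths(handle).most_common(5))
```

More than one length in the output, or a file with no `>` headers at all, would suggest the download is truncated or is not the expected FASTA file.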

I am not able to reproduce the problems you are having on my Linux test machine with our executable. Are you using the executable version of mothur or did you build from source?

Executable version, 64-bit Linux, on Amazon EC2.

Could you send your fasta file to mothur.bugs@gmail.com?

Sent it last week :)

Thanks for the help!