pcr.seqs bug? in v.1.33.3

manureva · November 5, 2014, 12:17am

Hi, Iâ€™m new to Mothur and Iâ€™m therefore using the MiSeqSOP with the tutorialâ€™s dataset to get familiar with the pipeline before playing with my own data. I think I might have spotted a bug on Mothur v.1.33.3 (Windows 32-bit version). Well, at least thereâ€™s something that bugs meâ€¦In order to follow the SOP I downloaded the silva.seed_v119.align database on the 3rd of November 2014 (2 days ago). The problem appears when I run the following command:

pcr.seqs(fasta=silva.seed_v119.align, start=11894, end=25319, keepdots=F, processors=1)
system(rename=silva.seed_v119.pcr.align silva.v4.fasta)
summary.seqs(fasta=silva.v4.fasta)

Instead of getting this, as the SOP states:

Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 13424 270 0 3 1
2.5%-tile: 1 13425 292 0 4 374
25%-tile: 1 13425 293 0 4 3740
Median: 1 13425 293 0 4 7479
75%-tile: 1 13425 293 0 5 11218
97.5%-tile: 1 13425 294 1 6 14583
Maximum: 3 13425 351 5 9 14956
Mean: 1.00074 13425 292.977 0.0573014 4.57014

of Seqs: 14956

I get this:
Start End Nbases Ambigs Polymer NumSeqs
Minimum: 1 1150 260 0 3 1
2.5%-tile: 1 13425 292 0 4 376
25%-tile: 1 13425 293 0 4 3753
Median: 1 13425 293 0 5 7505
75%-tile: 1 13425 293 0 5 11257
97.5%-tile: 1 13425 460 1 6 14634
Maximum: 3 13425 1122 5 9 15009
Mean: 1.0006 13424.9 321.419 0.064428 1.78433

of Seqs: 15009

I understand that perhaps the database got 53 sequences bigger between the time you wrote the SOP and the day I downloaded it. So I got rid of the sequences (in theory the 53 additional sequences because 15009-14956=43) that were smaller than 270 and longer than 351 in order to match the SOP output. To do so I ran:

trim.seqs(fasta=silva.v4.fasta, minlength=270, maxlength=351, processors=1)
summary.seqs(fasta=silva.v4.trim.fasta)

which gives me this:
Start End Nbases Ambigs Polymer NumSeqs
Minimum: 1 1150 270 0 3 1
2.5%-tile: 1 13425 292 0 4 312
25%-tile: 1 13425 293 0 4 3114
Median: 1 13425 293 0 4 6227
75%-tile: 1 13425 293 0 5 9340
97.5%-tile: 1 13425 294 1 6 12141
Maximum: 1 13425 351 5 9 12452
Mean: 1 293.056 293.056 0.061115 4.5983

of Seqs: 12452

So it turns out if I get rid of the sequences that in theory make up for the difference between my results and the ones from the SOP I end up getting rid of 2557 sequences (15009 originally, but only 12452 left after trim.seqs), instead of the theoretical 53 (15009 sequence in my silva.seed_v119.align but only 14956 in yours).

Questions are:

How can that be?
Should I be worried about using this DB, is this a real bug or something I can ignore?

Many thanks in advance for your help!

pschloss · November 5, 2014, 10:12pm

Hi the seed in the SOP is the older version of the seed that only had 14956 sequences. The newer one has more sequences and was actually created in a slightly different manner. You should be ok.

Pat

manureva · November 5, 2014, 10:40pm

OK, thanks!

Topic		Replies	Views
Silva.seed_v119.align mothur bugs	1	2863	November 19, 2014
Bug in align.seqs? Commands in mothur	6	3772	February 1, 2015
Help for Processing improved sequences in align.seqs Commands in mothur	4	2654	March 27, 2015
18S reference alignment (and other bugs) mothur bugs	3	1074	June 3, 2020
Mothur GUI pcr.seq error mothur bugs	3	3395	March 4, 2014

pcr.seqs bug? in v.1.33.3

of Seqs: 14956

of Seqs: 15009

of Seqs: 12452

Related topics