Hi,
I’ve noticed a problem that occurs with the silva.bacteria.fasta file used for doing alignments. For my analysis I used the pcr.seqs command to specifically select for the V6 region of the gene using my sequencing primers on the silva.bacteria.fasta file then used that file for the alignment of my sequences. After finishing my analysis I found that there were far more OTUs in some of my samples than was possible. I made my quality checks and sequencing as stringent as possible but still observed this phenomenon. Finally I looked the actual alignment of the silva.bacteria.fasta file and noticed that the alignment had some very obvious errors.
Using the RDP Infernal alignment completely eliminated the problem.
Here’s the readout from mothur on these two files.
mothur > summary.seqs(fasta=silva.bacteria.fasta, processors=2)
Using 2 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1044 43061 1252 0 4 1
2.5%-tile: 1044 43116 1399 0 5 374
25%-tile: 1044 43116 1438 0 5 3740
Median: 1044 43116 1454 0 5 7479
75%-tile: 1044 43116 1465 0 6 11218
97.5%-tile: 1044 43116 1490 3 7 14583
Maximum: 1044 43116 1712 5 9 14956
Mean: 1044 43116 1449.59 0.362931 5.49104
of Seqs: 14956
Output File Names:
silva.bacteria.summary
mothur > pcr.seqs(fasta=silva.bacteria.fasta, oligos=primersr.txt, keepprimer=true, keepdots=false)
Using 2 processors.
Processing sequence: 100
Processing sequence: 100
Processing sequence: 200
Processing sequence: 200
Processing sequence: 300
Processing sequence: 300
Processing sequence: 400
Processing sequence: 400
Processing sequence: 500
Processing sequence: 500
Processing sequence: 600
Processing sequence: 600
Processing sequence: 700
Processing sequence: 700
Processing sequence: 800
Processing sequence: 800
Processing sequence: 900
Processing sequence: 900
Processing sequence: 1000Processing sequence: 1000
Processing sequence: 1100
Processing sequence: 1100
Processing sequence: 1200
Processing sequence: 1200
Processing sequence: 1300
Processing sequence: 1300
Processing sequence: 1400
Processing sequence: 1400
Processing sequence: 1500
Processing sequence: 1500
Processing sequence: 1600
Processing sequence: 1600
Processing sequence: 1700
Processing sequence: 1700
Processing sequence: 1800
Processing sequence: 1800
Processing sequence: 1900
Processing sequence: 1900
Processing sequence: 2000
Processing sequence: 2000
Processing sequence: 2100
Processing sequence: 2100
Processing sequence: 2200
Processing sequence: 2200
Processing sequence: 2300
Processing sequence: 2300
Processing sequence: 2400
Processing sequence: 2400
Processing sequence: 2500
Processing sequence: 2500
Processing sequence: 2600
Processing sequence: 2600
Processing sequence: 2700
Processing sequence: 2700
Processing sequence: 2800
Processing sequence: 2800
Processing sequence: 2900
Processing sequence: 2900
Processing sequence: 3000
Processing sequence: 3000
Processing sequence: 3100
Processing sequence: 3100
Processing sequence: 3200
Processing sequence: 3200
Processing sequence: 3300
Processing sequence: 3300
Processing sequence: 3400
Processing sequence: 3400
Processing sequence: 3500
Processing sequence: 3500
Processing sequence: 3600
Processing sequence: 3600
Processing sequence: 3700
Processing sequence: 3700
Processing sequence: 3800
Processing sequence: 3800
Processing sequence: 3900
Processing sequence: 3900
Processing sequence: 4000
Processing sequence: 4000
Processing sequence: 4100
Processing sequence: 4100
Processing sequence: 4200
Processing sequence: 4200
Processing sequence: 4300
Processing sequence: 4300
Processing sequence: 4400
Processing sequence: 4400
Processing sequence: 4500
Processing sequence: 4500
Processing sequence: 4600
Processing sequence: 4600
Processing sequence: 4700
Processing sequence: 4700
Processing sequence: 4800
Processing sequence: 4800
Processing sequence: 4900
Processing sequence: 4900
Processing sequence: 5000
Processing sequence: 5000
Processing sequence: 5100
Processing sequence: 5100
Processing sequence: 5200
Processing sequence: 5200
Processing sequence: 5300
Processing sequence: 5300
Processing sequence: 5400
Processing sequence: 5400
Processing sequence: 5500
Processing sequence: 5500
Processing sequence: 5600
Processing sequence: 5600
Processing sequence: 5700
Processing sequence: 5700
Processing sequence: 5800
Processing sequence: 5800
Processing sequence: 5900
Processing sequence: 5900
Processing sequence: 6000
Processing sequence: 6000
Processing sequence: 6100
Processing sequence: 6100
Processing sequence: 6200
Processing sequence: 6200
Processing sequence: 6300
Processing sequence: 6300
Processing sequence: 6400
Processing sequence: 6400
Processing sequence: 6500
Processing sequence: 6500
Processing sequence: 6600
Processing sequence: 6600
Processing sequence: 6700
Processing sequence: 6700
Processing sequence: 6800
Processing sequence: 6800
Processing sequence: 6900
Processing sequence: 6900
Processing sequence: 7000
Processing sequence: 7000
Processing sequence: 7100
Processing sequence: 7100
Processing sequence: 7200
Processing sequence: 7200
Processing sequence: 7300
Processing sequence: 7300
Processing sequence: 7400
Processing sequence: 7400
Thread Processing sequence: 7478
Processing sequence: 7478
Output File Names:
silva.bacteria.pcr.fasta
silva.bacteria.bad.accnos
silva.bacteria.scrap.pcr.fasta
It took 70 secs to screen 14956 sequences.
mothur > summary.seqs()
Using silva.bacteria.pcr.fasta as input file for the fasta parameter.
Using 2 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 2979 88 0 2 1
2.5%-tile: 1 2980 93 0 3 238
25%-tile: 1 2980 95 0 3 2379
Median: 1 2980 96 0 3 4757
75%-tile: 1 2980 96 0 4 7135
97.5%-tile: 1 2980 101 0 6 9275
Maximum: 1 2981 157 4 9 9512
Mean: 1 2980 95.8464 0.0279647 3.71131
of Seqs: 9512
Output File Names:
silva.bacteria.pcr.summary
mothur > screen.seqs(maxambig=0, maxhomop=8) Using silva.bacteria.pcr.fasta as input file for the fasta parameter.
Using 2 processors.
Processing sequence: 100
Processing sequence: 100
Processing sequence: 200
Processing sequence: 200
Processing sequence: 300
Processing sequence: 300
Processing sequence: 400
Processing sequence: 400
Processing sequence: 500
Processing sequence: 500
Processing sequence: 600
Processing sequence: 600
Processing sequence: 700
Processing sequence: 700
Processing sequence: 800
Processing sequence: 800
Processing sequence: 900
Processing sequence: 900
Processing sequence: 1000
Processing sequence: 1000
Processing sequence: 1100
Processing sequence: 1100
Processing sequence: 1200
Processing sequence: 1200
Processing sequence: 1300
Processing sequence: 1300
Processing sequence: 1400
Processing sequence: 1400
Processing sequence: 1500
Processing sequence: 1500
Processing sequence: 1600
Processing sequence: 1600
Processing sequence: 1700
Processing sequence: 1700
Processing sequence: 1800
Processing sequence: 1800
Processing sequence: 1900
Processing sequence: 1900
Processing sequence: 2000
Processing sequence: 2000
Processing sequence: 2100
Processing sequence: 2100
Processing sequence: 2200
Processing sequence: 2200
Processing sequence: 2300
Processing sequence: 2300
Processing sequence: 2400
Processing sequence: 2400
Processing sequence: 2500
Processing sequence: 2500
Processing sequence: 2600
Processing sequence: 2600
Processing sequence: 2700
Processing sequence: 2700
Processing sequence: 2800
Processing sequence: 2800
Processing sequence: 2900
Processing sequence: 2900
Processing sequence: 3000
Processing sequence: 3000
Processing sequence: 3100
Processing sequence: 3100
Processing sequence: 3200
Processing sequence: 3200
Processing sequence: 3300
Processing sequence: 3300
Processing sequence: 3400
Processing sequence: 3400
Processing sequence: 3500
Processing sequence: 3500
Processing sequence: 3600
Processing sequence: 3600
Processing sequence: 3700
Processing sequence: 3800
Processing sequence: 3700
Processing sequence: 3900
Processing sequence: 3800
Processing sequence: 4000
Processing sequence: 3900
Processing sequence: 4100
Processing sequence: 4000
Processing sequence: 4200
Processing sequence: 4100
Processing sequence: 4300
Processing sequence: 4200
Processing sequence: 4400
Processing sequence: 4300
Processing sequence: 4500
Processing sequence: 4400
Processing sequence: 4600
Processing sequence: 4500
Processing sequence: 4600
Processing sequence: 4700
Processing sequence: 4756
Processing sequence: 4700
Processing sequence: 4756
Output File Names:
silva.bacteria.pcr.good.fasta
silva.bacteria.pcr.bad.accnos
It took 3 secs to screen 9512 sequences.
mothur > summary.seqs()
Using silva.bacteria.pcr.good.fasta as input file for the fasta parameter.
Using 2 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 2979 88 0 2 1
2.5%-tile: 1 2980 93 0 3 234
25%-tile: 1 2980 95 0 3 2332
Median: 1 2980 96 0 3 4663
75%-tile: 1 2980 96 0 4 6994
97.5%-tile: 1 2980 101 0 6 9091
Maximum: 1 2981 157 0 8 9324
Mean: 1 2980 95.8333 0 3.71332
of Seqs: 9324
Output File Names:
silva.bacteria.pcr.good.summary
mothur > summary.seqs(fasta=silva.bacteria.rdp.fasta)
Using 2 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 903 1103 82 0 2 1
2.5%-tile: 903 1103 87 0 3 286
25%-tile: 903 1103 89 0 3 2853
Median: 903 1103 90 0 3 5705
75%-tile: 903 1103 92 0 4 8557
97.5%-tile: 903 1103 96 0 6 11123
Maximum: 903 1103 151 0 8 11408
Mean: 903 1103 90.2805 0 3.68058
of Seqs: 11408
Output File Names:
silva.bacteria.rdp.summary