Mock community analysis problem

Hi,

I sequenced ~300-bp 16S rRNA gene amplicons (V1-V2) from an HMP mock community (20 species at even concentration) on the MiSeq using 2x300 bp chemistry. After following the MiSeq SOP through to the classify.seqs stage, I had almost no sequences belonging to Neisseria meningitidis left. As there were no primer mismatches to the N. meningitidis sequence, I suspected its sequences had been filtered out somewhere along the way. I therefore ran classify.seqs after each stage of the SOP that removes sequences. N. meningitidis was present (in approximately the expected proportion) after align.seqs but was removed during the subsequent screen.seqs step. Below are the summaries after align.seqs and screen.seqs:

After align.seqs:

             Start    End      NBases   Ambigs  Polymer  NumSeqs
Minimum:     1044     6332     286      0       3        1
2.5%-tile:   1044     6333     301      0       4        1637
25%-tile:    1044     6333     304      0       4        16365
Median:      1044     6333     310      0       5        32729
75%-tile:    1044     6333     317      0       5        49093
97.5%-tile:  1045     6434     339      0       6        63820
Maximum:     1113     6443     340      0       8        65456
Mean:        1044.1   6350.05  311.817  0       4.68004

# of unique seqs: 30196
total # of seqs:  65456

After screen.seqs:

screen.seqs(fasta=miseqv1v2.trim.contigs.good.unique.pick.align, count=miseqv1v2.trim.contigs.good.pick.count_table, start=1044, end=6333, maxhomop=8, processors=4)

             Start    End      NBases   Ambigs  Polymer  NumSeqs
Minimum:     1044     6333     286      0       3        1
2.5%-tile:   1044     6333     301      0       4        1487
25%-tile:    1044     6333     302      0       4        14861
Median:      1044     6333     310      0       5        29721
75%-tile:    1044     6389     317      0       5        44581
97.5%-tile:  1044     6434     339      0       6        57955
Maximum:     1044     6443     340      0       8        59441
Mean:        1044     6351.78  311.993  0       4.64775

# of unique seqs: 28892
total # of seqs:  59441
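
For a quick sense of scale, the two summaries can be compared directly. The short Python sketch below only does arithmetic on the totals reported above. The dictionary and function names are made up for illustration, and the result says nothing about which organisms the removed reads belong to.

```python
# Totals copied from the two summaries above (after align.seqs and after screen.seqs).
before = {"total": 65456, "unique": 30196}
after = {"total": 59441, "unique": 28892}


def removed(before_n, after_n):
    """Return the number of sequences lost and that loss as a percentage of the input."""
    lost = before_n - after_n
    return lost, 100.0 * lost / before_n


for key in ("total", "unique"):
    lost, pct = removed(before[key], after[key])
    print(f"{key}: {lost} removed ({pct:.2f}%)")
```

So the screen dropped 6015 reads in total, spread over 1304 unique sequences.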

Any ideas why the 16S sequence of this particular species is getting screened out?

Thanks in advance!

What happens when you run screen.seqs with start=1113?
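
Some context for that suggestion: in screen.seqs, start removes sequences whose alignment begins after the given column, end removes sequences that finish before the given column, and maxhomop removes sequences with longer homopolymers. The Python sketch below is a simplified model of those three checks, not mothur's actual code. The class, the function, and the two example reads are hypothetical, and only the coordinates 1044, 1113 and 6333 and the maxhomop value come from the post above.

```python
from dataclasses import dataclass


@dataclass
class AlignedRead:
    name: str     # hypothetical label, not a real sequence name
    start: int    # first alignment column the read covers
    end: int      # last alignment column the read covers
    polymer: int  # longest homopolymer run in the read


def passes_screen(read, start, end, maxhomop):
    """Simplified model of the start/end/maxhomop criteria in screen.seqs."""
    return read.start <= start and read.end >= end and read.polymer <= maxhomop


# Illustrative reads: one at the common start column, one at the Maximum start of 1113.
# The end and polymer values for the late-starting read are assumptions.
typical = AlignedRead("typical_read", start=1044, end=6333, polymer=5)
late = AlignedRead("late_start_read", start=1113, end=6333, polymer=5)

for cutoff in (1044, 1113):
    kept = [r.name for r in (typical, late)
            if passes_screen(r, start=cutoff, end=6333, maxhomop=8)]
    print(f"start={cutoff}: kept {kept}")
```

Under this model a read that begins at column 1113 fails start=1044 but passes start=1113, while reads beginning at 1044 pass both.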


Also, FWIW, if you haven't already seen it, you might want to read this post about the V3 chemistry and fully overlapping reads: http://blog.mothur.org/2014/09/11/Why-such-a-large-distance-matrix%3F/

Thanks Pat.

Changing the start position to 1113 retained those sequences. Do you have any idea why the sequences for that organism ended up starting at a different position in the alignment from the others?