All groups removed after screen.seqs

I’m analyzing samples sequenced using MiSeq 2x300 targeting 16S V1V3. We swapped the sequencing reads so that the reverse read from V3 is R1 and the forward read from V1 is R2. Our genome center demultiplexes reads and removes barcodes before they send them, but primer sequences are not removed.

I started by doing make.contigs with each of my samples. Here’s the command for one of them:
make.contigs(ffastq=SNF2W3L-H12_S384_L001_R1_001.fastq, rfastq=SNF2W3L-H12_S384_L001_R2_001.fastq, oligos=SNF2W3L-H12_oligos.txt, pdiffs=2, processors=24)
–> my oligos file reads: primer ATTACCGCGGCTGCTGG AGAGTTTGATCMTGGCTCAG SNF2W3LH12

I then merged the fastas using merge.files and made a groups file using make.group (The commands were really long because I had to do this with 346 files so I’m not pasting those commands).

Then I had a nice big dataset:
summary.seqs(fasta=MOSN.fasta, processors=24)
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 279 279 0 3 1
2.5%-tile: 1 441 441 0 4 332018
25%-tile: 1 484 484 2 4 3320172
Median: 1 508 508 4 5 6640344
75%-tile: 1 511 511 8 6 9960515
97.5%-tile: 1 546 546 17 8 12948669
Maximum: 1 566 566 66 283 13280686
Mean: 1 497.35 497.35 5.37081 5.02602

of Seqs: 13280686

Following the MiSeq SOP, I then did screen.seqs and unique.seqs
screen.seqs(fasta=MOSN.fasta, group=MOSN.groups, maxambig=0, processors=24)
unique.seqs(fasta=MOSN.good.fasta)
summary.seqs(fasta=MOSN.good.unique.fasta, name=MOSN.good.names)
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 283 283 0 3 1
2.5%-tile: 1 437 437 0 4 33496
25%-tile: 1 488 488 0 4 334960
Median: 1 511 511 0 4 669920
75%-tile: 1 511 511 0 5 1004880
97.5%-tile: 1 560 560 0 8 1306344
Maximum: 1 566 566 0 283 1339839
Mean: 1 498.97 498.97 0 4.85721

of unique seqs: 861343

total # of seqs: 1339839

I keep following the SOP…
count.seqs(name=MOSN.good.names, group=MOSN.good.groups)
pcr.seqs(fasta=silva.bacteria.fasta, oligos=V1V3_primers_ref_align.txt)
–>my oligos file reads: reverse ATTACCGCGGCTGCTGG
summary.seqs(fasta=silva.bacteria.pcr.fasta)
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1044 11897 275 0 3 1
2.5%-tile: 1044 13125 431 0 4 338
25%-tile: 1044 13125 463 0 5 3377
Median: 1044 13125 484 0 5 6754
75%-tile: 1044 13125 490 0 5 10131
97.5%-tile: 1044 13125 510 2 7 13170
Maximum: 1044 13127 629 5 9 13507
Mean: 1044 13124.8 476.246 0.150885 4.93663

of Seqs: 13507

Then I align my reads to my V1V3 silva file and use flip=T since I sequenced from the reverse primer:
align.seqs(fasta=MOSN.good.unique.fasta, reference=silva.V1V3.fasta, flip=T, processors=24)
summary.seqs(fasta=MOSN.good.unique.align, count=MOSN.good.count_table)
Start End NBases Ambigs Polymer NumSeqs
Minimum: -1 -1 0 0 1 1
2.5%-tile: 1044 13125 21 0 3 33496
25%-tile: 1044 13125 482 0 4 334960
Median: 1044 13125 510 0 4 669920
75%-tile: 1044 13125 511 0 5 1004880
97.5%-tile: 5256 13125 523 0 7 1306344
Maximum: 13125 13127 564 0 239 1339839
Mean: 1273 13007.6 480.995 0 4.62672

of unique seqs: 861343

total # of seqs: 1339839

I notice I now have sequences that are very short (21 bp) but I’m not sure why. I continue and screen for the second time:
screen.seqs(fasta=MOSN.good.unique.align, count=MOSN.good.count_table, summary=MOSN.good.unique.summary, start=1044, end=13127, maxhomop=8)
–> I get the error: Removing group: MOF2W10B1C because all sequences have been removed.
–> I get this error for all of my groups.

Any suggestions on where I went wrong?

Sometimes things get amplified that really shouldn’t be getting amplified and these either don’t align (see the length=0) or are very short like you are seeing. I think the bigger problem is that you are losing 90% of your data in the screen.seqs step because your sequences have so many Ns. This is probably because you are sequencing a long region and using the V3 chemistry and only have a very small amount of overlap. We and others have had horrible experiences with the V3 chemistry and have found that the reads must fully overlap. I kind of rant about it here…

http://blog.mothur.org/2014/09/11/Why-such-a-large-distance-matrix%3F/

Pat

Awesome!! I was actually hoping you would say that. I tried explaining this to people (and even sent them that blog post!) and they seem to think it should still work. Now I can actually show them why we need to switch to V4.

Thanks Pat!!

Bonnie

… and your old boss is a co-author on that paper, if i recall correctly :slight_smile: