All groups removed after screen.seqs

bpyoumans · June 25, 2015, 10:29pm

I’m analyzing samples sequenced using MiSeq 2x300 targeting 16S V1V3. We swapped the sequencing reads so that the reverse read from V3 is R1 and the forward read from V1 is R2. Our genome center demultiplexes reads and removes barcodes before they send them, but primer sequences are not removed.

I started by doing make.contigs with each of my samples. Here’s the command for one of them:
make.contigs(ffastq=SNF2W3L-H12_S384_L001_R1_001.fastq, rfastq=SNF2W3L-H12_S384_L001_R2_001.fastq, oligos=SNF2W3L-H12_oligos.txt, pdiffs=2, processors=24)
–> my oligos file reads: primer ATTACCGCGGCTGCTGG AGAGTTTGATCMTGGCTCAG SNF2W3LH12

I then merged the fastas using merge.files and made a groups file using make.group (The commands were really long because I had to do this with 346 files so I’m not pasting those commands).

Then I had a nice big dataset:
summary.seqs(fasta=MOSN.fasta, processors=24)
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 279 279 0 3 1
2.5%-tile: 1 441 441 0 4 332018
25%-tile: 1 484 484 2 4 3320172
Median: 1 508 508 4 5 6640344
75%-tile: 1 511 511 8 6 9960515
97.5%-tile: 1 546 546 17 8 12948669
Maximum: 1 566 566 66 283 13280686
Mean: 1 497.35 497.35 5.37081 5.02602

of Seqs: 13280686

Following the MiSeq SOP, I then did screen.seqs and unique.seqs
screen.seqs(fasta=MOSN.fasta, group=MOSN.groups, maxambig=0, processors=24)
unique.seqs(fasta=MOSN.good.fasta)
summary.seqs(fasta=MOSN.good.unique.fasta, name=MOSN.good.names)
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 283 283 0 3 1
2.5%-tile: 1 437 437 0 4 33496
25%-tile: 1 488 488 0 4 334960
Median: 1 511 511 0 4 669920
75%-tile: 1 511 511 0 5 1004880
97.5%-tile: 1 560 560 0 8 1306344
Maximum: 1 566 566 0 283 1339839
Mean: 1 498.97 498.97 0 4.85721

of unique seqs: 861343

total # of seqs: 1339839

I keep following the SOP…
count.seqs(name=MOSN.good.names, group=MOSN.good.groups)
pcr.seqs(fasta=silva.bacteria.fasta, oligos=V1V3_primers_ref_align.txt)
–>my oligos file reads: reverse ATTACCGCGGCTGCTGG
summary.seqs(fasta=silva.bacteria.pcr.fasta)
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1044 11897 275 0 3 1
2.5%-tile: 1044 13125 431 0 4 338
25%-tile: 1044 13125 463 0 5 3377
Median: 1044 13125 484 0 5 6754
75%-tile: 1044 13125 490 0 5 10131
97.5%-tile: 1044 13125 510 2 7 13170
Maximum: 1044 13127 629 5 9 13507
Mean: 1044 13124.8 476.246 0.150885 4.93663

of Seqs: 13507

Then I align my reads to my V1V3 silva file and use flip=T since I sequenced from the reverse primer:
align.seqs(fasta=MOSN.good.unique.fasta, reference=silva.V1V3.fasta, flip=T, processors=24)
summary.seqs(fasta=MOSN.good.unique.align, count=MOSN.good.count_table)
Start End NBases Ambigs Polymer NumSeqs
Minimum: -1 -1 0 0 1 1
2.5%-tile: 1044 13125 21 0 3 33496
25%-tile: 1044 13125 482 0 4 334960
Median: 1044 13125 510 0 4 669920
75%-tile: 1044 13125 511 0 5 1004880
97.5%-tile: 5256 13125 523 0 7 1306344
Maximum: 13125 13127 564 0 239 1339839
Mean: 1273 13007.6 480.995 0 4.62672

of unique seqs: 861343

total # of seqs: 1339839

I notice I now have sequences that are very short (21 bp) but I’m not sure why. I continue and screen for the second time:
screen.seqs(fasta=MOSN.good.unique.align, count=MOSN.good.count_table, summary=MOSN.good.unique.summary, start=1044, end=13127, maxhomop=8)
–> I get the error: Removing group: MOF2W10B1C because all sequences have been removed.
–> I get this error for all of my groups.

Any suggestions on where I went wrong?

pschloss · June 29, 2015, 8:33pm

Sometimes things get amplified that really shouldn’t be getting amplified and these either don’t align (see the length=0) or are very short like you are seeing. I think the bigger problem is that you are losing 90% of your data in the screen.seqs step because your sequences have so many Ns. This is probably because you are sequencing a long region and using the V3 chemistry and only have a very small amount of overlap. We and others have had horrible experiences with the V3 chemistry and have found that the reads must fully overlap. I kind of rant about it here…

http://blog.mothur.org/2014/09/11/Why-such-a-large-distance-matrix%3F/

Pat

bpyoumans · June 29, 2015, 9:38pm

Awesome!! I was actually hoping you would say that. I tried explaining this to people (and even sent them that blog post!) and they seem to think it should still work. Now I can actually show them why we need to switch to V4.

Thanks Pat!!

Bonnie

pschloss · July 9, 2015, 12:45pm

… and your old boss is a co-author on that paper, if i recall correctly

Topic		Replies	Views
Removing Sequence Primers Commands in mothur	13	6975	January 14, 2021
Removing primers from 2x250 MiSeq V4 reads Commands in mothur	4	1507	June 17, 2019
screen.seqs removes half of my sequences Commands in mothur	19	12626	January 9, 2015
remove primers and update groups file mothur bugs	1	1147	May 11, 2017
Removing Primers with make.contigs Commands in mothur	1	2590	April 28, 2014

All groups removed after screen.seqs

of Seqs: 13280686

of unique seqs: 861343

of Seqs: 13507

of unique seqs: 861343

Related topics