I’m processing only MiSeq 2x300 R1 reads from samples sequenced targeting the 16S V1V3 region. (My attempt at using both R1 and R2: http://mothur.ltcmp.net/t/all-groups-removed-after-screen-seqs/2425/1) We swapped the sequencing reads so that the read from V3 is R1 and the read from V1 is R2. Our genome center demultiplexes reads and removes barcodes before they send them, but primer sequences are not removed.
I’m using this as a guide for processing only the R1 reads:
http://mothur.ltcmp.net/t/analyzing-miseq-data-with-only-r1-and-index-reads/2228/1
I started with fastq.info for each sample:
mothur > fastq.info(fastq=MOF2W10B1C_S59_L001_R1_001.fastq)
I then did trim.seqs for each sample:
mothur > trim.seqs(fasta=MOF2W10B1C_S59_L001_R1_001.fasta, oligos=MOF2W10B1C_oligos.txt, qfile=MOF2W10B1C_S59_L001_R1_001.qual, pdiffs=2, maxambig=0, maxhomop=8, flip=T, qwindowaverage=25, qwindowsize=50, processors=24)
→ My oligos files reads: forward ATTACCGCGGCTGCTGG MOF2W10B1C
I then merged the many fastas using merge.files and made a groups file using make.group and looked at my dataset:
mothur > summary.seqs(fasta=MOSNR1.trim.fasta, processors=24)
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 50 50 0 2 1
2.5%-tile: 1 81 81 0 3 333942
25%-tile: 1 204 204 0 4 3339414
Median: 1 243 243 0 4 6678827
75%-tile: 1 277 277 0 5 10018240
97.5%-tile: 1 284 284 0 6 13023711
Maximum: 1 286 286 0 8 13357652
Mean: 1 229.835 229.835 0 4.30143
of Seqs: 13357652
Output File Names:
/panfs/roc/scratch/youmansb_test/mothur351/MOSN_R1_B/MOSNR1.trim.summary
I then did screen.seqs:
mothur > screen.seqs(fasta=MOSNR1.trim.fasta, group=MOSNR1.groups, minlength=200, processors=24)
mothur > summary.seqs(fasta=MOSNR1.trim.good.fasta)
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 200 200 0 2 1
2.5%-tile: 1 205 205 0 3 255578
25%-tile: 1 232 232 0 4 2555771
Median: 1 258 258 0 4 5111542
75%-tile: 1 284 284 0 5 7667313
97.5%-tile: 1 284 284 0 6 9967506
Maximum: 1 286 286 0 8 10223083
Mean: 1 254.783 254.783 0 4.33019
of Seqs: 10223083
Output File Names:
/panfs/roc/scratch/youmansb_test/mothur351/MOSN_R1_B/MOSNR1.trim.good.summary
Unique.seqs then count.seqs:
mothur > unique.seqs(fasta=MOSNR1.trim.good.fasta)
→ Last line from unique.seqs: 10223083 7120685
I get that using make.contigs on V1V3 data results in many spurious OTUs because of the minimal overlap and the un-quality-checked regions outside of the overlap region. I thought that using only the R1 reads and the quality parameters of trim.seqs would result in good quality reads and a “normal” ~10%ish unique sequences. Or least not 70% uniques…
What am I missing? Or is phylotype analysis the only thing I can do at this point?
Bonnie