Illumina single-read with index in 2nd sequencing run

Hey guys,

I just got an illumina run back. I only have single reads (xxx.R1.fastq) and the indexes are supposedly in the second sequencing run (xxx.R2.fastq). I’m wondering if I could get help with getting these together so that I can trim.seqs and demultiplex. make.contings wont allow me to use only 1 fastq (f without r)??

Conceptually it should just be a matter of concatenating the two, right?

Thank you

shaun

Hi Shaun,

Mothur is expecting a forward and reverse fastq, but perhaps we can fake her out, :). Let’s try this:

fastq.info(fastq=xxx.R1.fastq) - creates fasta and qual file
reverse.seqs(fasta=xxx.R1.fasta) - reverse the fasta file
reverse.seqs(qfile=xxx.R1.qual) - reverse the quality file
make.fastq(fasta=xxx.R1.rc.fasta, qfile=xxx.R1.rc.qual) - reverse of xxx.R1.fastq so when mothur flips and aligns the fragments there will be a perfect overlap and your sequences should come out whole.
make.contigs(ffastq=xxx.R1.fastq, rfastq=xxx.R1.rc.fastq, index=xxx.R2.fastq, bdiffs=1, rest of your parameters…)

Kindly,
Sarah

Hi Sarah,

I, too, have Illumina Miseq single-end reads. I followed the steps you prescribed to fake out mothur with a duplicate, reversed version of the SE sequence.

In addition to feeling conflicted about the morality of playing tricks on mothur, upon running make.contigs(file=stability.file, oligos=oligos.file, bdiffs=1, pdiffs=2), it appears that many many many of my sequences are lacking paired reads. In fact, several minutes into the process, I’m still being warned that mothur is ignoring my reads because they lack pairs.

Is this to be expected based on the fasta, rev.seqs, and fastq steps you suggested below?

Thanks, Sarah. And perhaps I should admit (sheepishly) that I’m a first-timer.


Joe

An update regarding the outcome of my make.contigs(stability.file, oligos) command with the original and reversed seqs:

All of my reads ended up in the scrap.contigs because of “bf.” I’ve pasted my stability file and first few lines of my oligos file below. I’ve tried a few different make.contigs options, including ffastq, rfastq, and different formats for my oligos file, e.g. without the NONE. Nothing has worked.

Perhaps it’s worth noting, too, that I used the Caporaso et al 2012 MiSeq sequencing method, minus the paired-end part.

Thanks for any ideas!


Joe
Stability Undetermined_S0_L001_R1_001.fastq undetermined_s0_l001_r1_001.rc.fastq Undetermined_S0_L001_I1_001.fastq none

Oligos
primer GTGCCAGCMGCCGCGGTAA
#primer GGACTACHVGGGTWTCTAAT
barcode CTATCTCCTGTC NONE t0M4
barcode ACTCACAGGAAT NONE t0M9
barcode ATGATGAGCCTC NONE t0M10
barcode GTCGACAGAGGA NONE t0M11
barcode TGTCGCAAATAG NONE t0M13
barcode CATCCCTCTACT NONE t1M4
barcode TATACCGCTGCG NONE t1M9

Could you post the first ~10 lines or so of your R1 and R2 file?

Is the R1 file just a forward read, presumably 250bp, since that’s about the length of the caporaso amplicon?

Does the R2 read just have the barcode? You later called it I1, so probably.

There’s probably a way to trick mothur into working with single reads, but I expect you will have a lot of poor quality resultant sequences, since the overlapping paired read serves as quality control.

Hey Adam,

You’re exactly right. The first read is the 250bp forward read, and the second read is the index.

In my first post (above), which you probably already saw, I discuss my attempts to follow Westcott’s advice in reversing the forward read fastq file in order to create the equivalent of a paired-end run so that I could run analyses in Mothur. I’ve been able to analyze the dataset in QIIME – and QIIME confirmed that, post-filtering, the reads were 251bp – but I’d like to get back over to Mothur, so help would be appreciated!

First 10 lines of each read are below. Thanks, Adam.


Joe
Forward read @M00232:2:000000000-A8VKU:1:1101:15827:1331 1:N:0:0 AACCGGAGTAGTTGAAATGGTAATAAGACGACCAATCTGACCAGCAAGGAAGCCAAGATGGGAAAGGTCATGCGGCATACGCTCGGCGCCAGTTTGAATATTAGACATAATTTATCCTCAAGTAAGGGGCCGAAGCCCCTGCAATTAAAATTGTTGACCACCTACATACCAAAGACGAGCGCCTTTACGCNTGCCTTTAGTACCTCGCAACGGCTGCGGACGACCAGGGCGAGCGCCAGAACGTTTTTTAC + AABBABDBBFFFGFGGGGFGGFHHHHFHHGGGGGHHHHGHHHHHHHHHGHHHHHHHEHHHHHHHHHHBFGHFFEGGGGHHFGFFGGG>EG?EFGHHHFHFHHHGHHHHHGHHHHHHHHHGHHHHHHHHGGGGGC@FFHGHGHHHHGHHHHHHHFHHHHFHHGHHHHGHHHHHHFHHGGGGGGGGGGGGGG#9;CEGFFFGFFFGFFFFFFFBFFFAEDFFFFADFFFFFFFFFFFFFFAFFFD.BFFFFFF @M00232:2:000000000-A8VKU:1:1101:15865:1332 1:N:0:0 TACGTAGGGGGCAAGCGTTATCCGGATTCACTGGGTGTAAAGGGAGCGTAGACGGCCATGCAAGCCTGGAGTGAAAGCCCGGGGCCCAACCCCGGGACTGCTCTGGGAACTGTGCGGCTGGAGTGCGGGAGGGGCAGGCGGAATTCCTGGTGTAGCGGTGAAATGCGTAGATATCAGGAGGAACACCGGCNGCGAAGGCGGCCTGCTGGACCGCGACTGACGTTGAGGCTCGAAAGCGTGGGGAGCAAACA + BCCCAFFCCCCCGGGGGGGGGGHGGGGGHHHHHHHHGGHHHHHGHGGGGGGGHGGGFGHHHHHHEGHHHHHHHHHHFHHH@EEFGEGGHGGGGGGCGGHHHHHHHGFHHGHHGHHGGGGGGHHHHHGGGGGGGGGGDGGGGGGGFFFEFFBFFFFEFFDFFFBFFFFFFFFFFFBFFFFFFFFFFFF;D-#;:9EECDFFFF;D.FFFFFFFFDAFFFFBFFFEFFFFEFFFFFFDF?ADFFFFFAEFFFF @M00232:2:000000000-A8VKU:1:1101:15574:1332 1:N:0:0 CGGATTGTTCAGTAACTTTACTCATGATTTCTTACCTATTAGTGGTTGAACAGCATCGGACTCAGATAGTAATCCACGCTCTTTTAAAATGTCAACAAGAGAATCTCTACCATGAACAAAATGTGACTCATATCTAAACCAGTCCTTGACGAACGTGCCAAGCATATTAAGCCACTTCTCCTCATCCAACNCGTCAGTTTTTGACAGAATCGTTAGTTGATGGCGAAAGGTCGCAAAGTAAGAGCTTCTCG

Index read
@M00232:2:000000000-A8VKU:1:1101:15827:1331 1:N:0:0
GAACTACTCCAC
+

111>1111111
@M00232:2:000000000-A8VKU:1:1101:15865:1332 1:N:0:0
AGCTGTTGTTTG

AA1>11B1B1B1
@M00232:2:000000000-A8VKU:1:1101:15574:1332 1:N:0:0
TTTAAATGACTA

Hmm, none of those index reads match what you have in your oligos file… Of course, the first reads from a MiSeq tend be be lousy in my experience anyways.

Have you tried putting the reverse complement of your index sequences in the oligos file instead? You might want to look somewhere into the middle of the index-read file and see if the reads directly match what you have or if they are reverse complements.

If that doesn’t work, the only thing I can think would be to write a script that did something like this:

Demultiplex as follows:

  1. Read a table of index-sample mappings
  2. Read through your I1 file (barcodes), sequence at a time
  3. If index sequence matches one in mapping file, write out the R1 sequence in a sample specific fastq file

Then,
4) Quality filter/trim, using something like Trimmomatic

[you might need to convert fastq to fasta here, depending on how you filter]

In mothur,
5) Run make.group on your newly demultiplexed and cleaned fastq files
6) Run merge.files to merge all of the individual fasta files into one

This would effectively mimic what make.contigs does, generating a .fasta and .groups file that mothur could use.

You might be able to find a script that does steps 1-3, otherwise I have something written somewhere to demultiplex paired ends that could be easily enough adapted for single ends.