Analyzing MiSeq data with only R1 and Index reads

Hi Pat,


I am facing an awkward issue: we did a MiSeq run using Caporaso's primer set and got three files: R1.fastq, R2.fastq and Index.fastq. For unknown reasons (the sequencing core facility has been trying to figure it out for two months), the R2 quality is extremely bad, but R1 and Index are perfect. While the sequencing staff are still trying to fix this, I want to know if there is a way to use only the R1 and Index data to go through the MiSeq SOP. I know that combining "fastq.info" and "trim.seqs" would be one way: converting the fastq files to fasta and then trimming the barcodes to do the demultiplexing. But that way the Index file would not be used.

Could you please give me some suggestions? Thank you very much!

We have added some features to the fastq.info command that might help you. They will be released in version 1.35.0, but you can download the source from GitHub, https://github.com/mothur/mothur, if you want to use them now. The fastq.info command can take a file option. With this option, mothur can parse the fastq files by sample using an index file. This can be helpful in preparing files for an NCBI submission, but I think it would help you do what you want as well. Your file would look like:

forward.fastq reverse.fastq index.fastq NONE
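
If you would rather generate that file from a script than type it by hand, here is a minimal Python sketch. The helper name write_file_option and the output name yourFile are only placeholders for illustration, and the single-space separator simply copies the line shown above.

```python
from pathlib import Path


def write_file_option(path, forward, reverse, forward_index, reverse_index="NONE"):
    """Write the one-line file passed to fastq.info via file=...

    Column order follows the example line above:
    forward reads, reverse reads, index reads, NONE (no second index file).
    """
    line = " ".join([forward, reverse, forward_index, reverse_index])
    Path(path).write_text(line + "\n")
    return line


# Hypothetical file names, matching the example in this post.
line = write_file_option("yourFile", "forward.fastq", "reverse.fastq", "index.fastq")
print(line)
```

The resulting file is what you pass as file=yourFile.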

Then run:

mothur > fastq.info(file=yourFile, oligos=yourOligos, bdiffs=1, fasta=t, qfile=t, other parameters…)
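
To give a feel for what parsing by index with bdiffs=1 means, here is a rough Python sketch. It is not mothur's implementation: it assumes a plain mismatch count (Hamming distance) between each index read and each barcode, and the barcodes and group names are made up for illustration.

```python
def hamming(a, b):
    """Count mismatched positions; treat unequal lengths as fully different."""
    if len(a) != len(b):
        return max(len(a), len(b))
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read, barcodes, bdiffs=1):
    """Return the sample whose barcode is closest to the index read.

    Returns None when the best match has more than `bdiffs` mismatches
    or when two samples tie for the best match.
    """
    scored = [(hamming(index_read, bc), name) for bc, name in barcodes.items()]
    best = min(d for d, _ in scored)
    hits = [name for d, name in scored if d == best]
    if best > bdiffs or len(hits) > 1:
        return None
    return hits[0]


# Hypothetical barcodes; the real ones come from your oligos file.
barcodes = {"ACGTACGT": "group1", "TTGGCCAA": "group2"}

print(assign_sample("ACGTACGT", barcodes))  # exact match
print(assign_sample("ACGTACGA", barcodes))  # one mismatch, still assigned
print(assign_sample("GGGGGGGG", barcodes))  # too many mismatches, unassigned
```

Each forward read gets the sample assigned to its matching index read, which is why the index file matters even though you are discarding R2.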

mothur will create fastq, fasta and qual files for each sample and each direction. Since you only want the forward reads, you can disregard the reverse files. To create the combined fasta, qual and group files, run the following:

mothur > make.group(fasta=forward.group1.fasta-forward.group2.fasta-forward.group3.fasta, groups=group1-group2-group3)

mothur > merge.files(input=forward.group1.fasta-forward.group2.fasta-forward.group3.fasta, output=combined.fasta)

mothur > merge.files(input=forward.group1.qual-forward.group2.qual-forward.group3.qual, output=combined.qual)

You can then proceed with the trim.seqs command if you want to trim by quality scores.

mothur > trim.seqs(fasta=combined.fasta, qfile=combined.qual, other parameters…)
mothur > list.seqs(fasta=current) - lists the names of the sequences in the trimmed fasta file
mothur > get.seqs(group=yourGroupfile, accnos=current) - selects the sequences that passed trim.seqs from the complete group file

This should leave you with a fasta file and group file that have passed the quality screening and contain the same sequences.
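
If you want to double-check that outside mothur, a small Python sketch like the one below compares the sequence names in the two files. It assumes the usual layouts (fasta headers starting with ">" and a group file with the sequence name in the first column); the example data are made up.

```python
def fasta_names(text):
    """Collect sequence names: the first word of each '>' header line."""
    names = set()
    for line in text.splitlines():
        if line.startswith(">") and line[1:].strip():
            names.add(line[1:].split()[0])
    return names


def group_names(text):
    """Collect sequence names from the first column of a group file."""
    return {line.split()[0] for line in text.splitlines() if line.strip()}


def same_sequences(fasta_text, group_text):
    """True when both files list exactly the same sequence names."""
    return fasta_names(fasta_text) == group_names(group_text)


# Made-up example contents standing in for the trimmed fasta and group files.
fasta_text = ">seq1\nACGT\n>seq2\nTTGA\n"
group_text = "seq1\tgroup1\nseq2\tgroup2\n"
print(same_sequences(fasta_text, group_text))
```

For real files, read each one with open(...).read() and pass the contents to same_sequences.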