demultiplex dual index fastq files

Nick · September 16, 2015, 10:36pm

Hello,

In Pat’s MiSeq SOP, he mentions that “Sequences come off the MiSeq as pairs of fastq files with each pair representing the two sets of reads per sample.” This is the form of sequence data I have received from our sequencing center in the past (an F and R fastq file for each sample, with the paired reads in the same order in each file, similar to the fastq files provided with the MiSeq SOP download). However, I recently received data from a new sequencing center, and they gave me only two files: the pooled forward reads and the pooled reverse reads for all samples (multiplexed; I hope I am using this terminology correctly). They also used a dual-indexing approach.

My question is whether there is a good way to demultiplex these two files into individual F and R fastq files using the dual indices. Or does this have to be done when processing the raw data (i.e. when running MiSeq Reporter or CASAVA), so that I need to ask my sequencing center to change their output parameters to generate the individual F and R files? I can do a little Python programming if there is no existing program for this, but I can’t get my head around how I would even go about it, given that it would need both indices.

I have the sample mapping file, 5’ and 3’ barcode/index files, index length files, etc.

Thanks in advance for any thoughts on this! I found a way to demultiplex files of this type using PANDAseq and fastx (this requires assembling the reads first and outputting a fasta file, so the quality scores cannot be assessed for each sample), but I would prefer to get the F and R fastq files for each sample so I can check the qual scores over the read length for each, etc., and get them into the mothur SOP workflow. I would also like to be able to deposit the reads without having to assemble them.

Nick

Kendra · September 17, 2015, 2:48pm

make.contigs will demultiplex those. You should have 4 fastq files: R1, R2, I1, and I2.

Nick · September 17, 2015, 3:34pm

kmitchell,

Thank you for replying to my post and for the suggestion to use make.contigs() to demultiplex the seq data.

This looks like it will work perfectly to demultiplex the data. However, will this command only return fasta and report files in which the forward and reverse reads have been merged into contigs?

Looking at the wiki page, it is not clear whether I could use this command to return separate forward and reverse fastq files for each sample in the multiplexed files. Am I overlooking something?

Thanks in advance!

Kendra · September 21, 2015, 5:28pm

You don’t want to merge the sequences, just demultiplex? What are you trying to do with the separate fasta files?

westcott · September 21, 2015, 5:58pm

Hi Nick,
The make.contigs command will create a group file that assigns each read its sample. You want to run a command like this:

mothur > make.contigs(ffastq=yourForwardFastqFile, rfastq=yourReverseFastqFile, findex=yourForwardIndexFile, rindex=yourReverseIndexFile, oligos=yourOligosFile, pdiffs=2, bdiffs=1)

Here’s a link to the oligos file wiki page, http://www.mothur.org/wiki/Oligos_File.
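In case it helps with building the oligos file from a sample mapping file, here is a minimal Python sketch. It assumes the paired-barcode line layout as I understand it from the wiki page above (the keyword barcode, the forward barcode, the reverse barcode, and the sample name, separated by tabs), so please check it against that page before relying on it. The column names sample, index1, and index2 and the sample names are hypothetical; substitute whatever your mapping file uses.

```python
import csv
import io


def oligos_lines(mapping_rows):
    """Build one tab-separated 'barcode' line per sample.

    Assumed layout (verify against the oligos file wiki page):
    barcode <forward barcode> <reverse barcode> <sample name>
    """
    lines = []
    for row in mapping_rows:
        # 'index1', 'index2' and 'sample' are hypothetical column names.
        lines.append("\t".join(["barcode", row["index1"], row["index2"], row["sample"]]))
    return lines


# Tiny in-memory stand-in for a tab-delimited sample mapping file.
mapping = io.StringIO(
    "sample\tindex1\tindex2\n"
    "soilA\tACGTAC\tGGCCTT\n"
    "soilB\tTTAACC\tCCGGAA\n"
)
lines = oligos_lines(csv.DictReader(mapping, delimiter="\t"))
print("\n".join(lines))
```

Writing the resulting lines to a text file, along with any primer lines your design needs, would give a candidate file to pass as oligos=.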

Kindly,
Sarah

Nick · September 23, 2015, 1:22pm

kmitchell - Thanks for following up on this. Yes, that is correct. I would like to have separate F and R fastq files for each sample so I can deposit/archive the sequences. I was under the impression that it was best to deposit the unmerged reads with barcodes and primers removed (to give others the option of using alternative merging algorithms etc. in their preferred workflows). I have been able to demultiplex the file and could deposit the merged/contigs fasta file for each sample, but I would rather deposit the F and R fastq for each if possible. I would think the quality information might be useful to anyone wanting to use these data.

I also like to have separate F and R fastq files so that I can check the qual scores across the read length for the F and R reads separately for each sample (or really a subset). This might help me decide if/where to trim the read length to see if I can improve the quality and number of merged reads (i.e. I might want to trim where the quality plummets, especially if I have limited overlap in the reads, which I do).
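To make the kind of check I mean concrete, here is a rough Python sketch that computes the mean quality score at each read position. It assumes plain four-line fastq records (no wrapped sequence lines) and Phred+33 encoding; the function name and the tiny in-memory example are made up for illustration.

```python
import io
from collections import defaultdict


def mean_quality_by_position(handle, phred_offset=33):
    """Return a list of mean Phred scores, one entry per read position.

    Assumes four lines per record and Phred+33 quality encoding.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for line_number, line in enumerate(handle):
        if line_number % 4 != 3:  # the quality string is the 4th line of each record
            continue
        for pos, char in enumerate(line.rstrip("\n")):
            totals[pos] += ord(char) - phred_offset
            counts[pos] += 1
    return [totals[p] / counts[p] for p in sorted(counts)]


# Two made-up reads of different lengths ('I' is Q40, '5' is Q20).
example = io.StringIO("@r1\nACG\n+\nII5\n@r2\nAC\n+\nI5\n")
profile = mean_quality_by_position(example)
print(profile)
```

Running this on the F and R files of one sample separately would show roughly where the quality drops off in each direction.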

Sarah - Thanks so much for the syntax for the make.contigs() command. I will give it a go and see if it gives me what I need.

westcott · September 25, 2015, 5:04pm

You can use the fastq.info command to parse the forward and reverse reads using an oligos file, http://www.mothur.org/wiki/Fastq.info.

mothur > fastq.info(fastq=yourForwardFastq, oligos=yourModifiedOligosIncludingForwardPrimersBarcodesOnly, pdiffs=2, bdiffs=1, fasta=t, qual=t)

mothur > fastq.info(fastq=yourReverseFastq, oligos=yourModifiedOligosIncludingReversePrimersBarcodesOnly, pdiffs=2, bdiffs=1, fasta=t, qual=t)
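For anyone who would rather script this, the original question (how to use both indices at once) can be sketched in Python as below. This is not how mothur does it; it is a minimal illustration that assumes four fastq files (R1, R2, I1, I2) with records in the same order, four lines per record, and exact index matches only (nothing like bdiffs). All function names, sample names, and sequences are hypothetical. Depending on how the sequencing center exported the second index, it may need to be reverse-complemented before matching.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record fastq handle."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def demultiplex(r1, r2, i1, i2, barcode_pairs):
    """Group read pairs by their (I1, I2) index sequences.

    barcode_pairs maps (index1, index2) -> sample name; exact matches only.
    Pairs with an unknown index combination go under 'unassigned'.
    """
    by_sample = {}
    records = zip(read_fastq(r1), read_fastq(r2), read_fastq(i1), read_fastq(i2))
    for rec1, rec2, idx1, idx2 in records:
        sample = barcode_pairs.get((idx1[1], idx2[1]), "unassigned")
        by_sample.setdefault(sample, []).append((rec1, rec2))
    return by_sample


def to_fastq(records):
    """Format (header, sequence, quality) tuples back into fastq text."""
    return "".join(f"{h}\n{s}\n+\n{q}\n" for h, s, q in records)


# Made-up data: three read pairs, two known index combinations.
r1 = io.StringIO("@a\nAAAA\n+\nIIII\n@b\nCCCC\n+\nIIII\n@c\nGGGG\n+\nIIII\n")
r2 = io.StringIO("@a\nTTTT\n+\nIIII\n@b\nGGGG\n+\nIIII\n@c\nCCCC\n+\nIIII\n")
i1 = io.StringIO("@a\nACGT\n+\nIIII\n@b\nTTAA\n+\nIIII\n@c\nACGT\n+\nIIII\n")
i2 = io.StringIO("@a\nGGCC\n+\nIIII\n@b\nCCGG\n+\nIIII\n@c\nAAAA\n+\nIIII\n")
pairs = {("ACGT", "GGCC"): "sampleA", ("TTAA", "CCGG"): "sampleB"}

result = demultiplex(r1, r2, i1, i2, pairs)
forward_a = to_fastq(fwd for fwd, _ in result["sampleA"])
print(sorted(result), forward_a, sep="\n")
```

Writing to_fastq output for the forward and reverse members of each sample to separate files would give one F and one R fastq per sample, with the quality strings intact.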
