Hi all - we’re migrating from QIIME to Mothur for our analysis of the 16S V4 region, and have run into a problem using the make.contigs function together with an .oligos file so that a .group file is produced. I’ve read several of the posts on this topic, such as here, here, and here, but have not found a fix.
We’re running MiSeq V2 PE250, and are using the Schloss dual indexing primers. We always receive our data from the local sequencing core in the form of two fastq files, one for read1 and the second for read2. Looking at the head of one of these files shows that the dual index is actually listed in the header for each sequence (see bold text below). The reverse complement of the i7 is listed first (underlined), followed by the i5. Is there a way to have make.contigs look in the header for these indices? How would I format the .oligos file given that the dual index is listed for both read1 and read2?
Every version of an oligos file I try results in everything going to the scrap.contigs file. If I don’t specify any oligos file, I get good contigs and can successfully run align.seqs. Of course, then I have no .group file to use downstream.
If your samples are already demultiplexed (R1 and R2 fastq for each sample), you don’t need an oligos file, just a stability file.
That’s just it, they’re not demultiplexed. There is only a single R1 and a single R2 fastq, but they contain close to 300 dual-indexed samples, with the dual index embedded as part of the header. So it’s not clear to me how one would go about accessing this barcode to demultiplex in Mothur.
In QIIME, this is accomplished using the extract_barcodes.py function with the option “barcode_in_label” specified. Not sure if there is a similar function in Mothur. Perhaps this is an uncommon way for barcodes to be stored in the fastq…anyone else have this issue?
Have you tried reaching out to the sequencing provider and asking them to split the fastq files for you? This really shouldn’t be hard for them to do. Alternatively, I’m afraid you’re going to have to write a script to split them by the paired indices. If they have problems, feel free to email us and we can show them what we do.
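For anyone who does end up writing such a script, here is a minimal Python sketch of splitting reads by the index pair in the header. It assumes the header ends in a colon-separated field of the form INDEX1+INDEX2 (as in the common Illumina layout); the actual header from this run is not shown in the thread, so the parsing, the function names, and the sample map are all hypothetical and would need adjusting.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return
        header, seq, _plus, qual = (line.rstrip("\n") for line in record)
        yield header, seq, qual


def index_pair(header):
    """Return the (first, second) index from the last ':' field, or None.

    Assumption: the header ends in ':INDEX1+INDEX2'. Adjust for other layouts.
    """
    tag = header.rsplit(":", 1)[-1]
    if "+" not in tag:
        return None
    first, second = tag.split("+", 1)
    return first.upper(), second.upper()


def demultiplex(handle, sample_for_pair):
    """Yield (sample_name, record) for each read, streaming through the file.

    Reads whose index pair is not in sample_for_pair go to 'undetermined'.
    """
    for record in read_fastq(handle):
        sample = sample_for_pair.get(index_pair(record[0]), "undetermined")
        yield sample, record


# Tiny made-up example: two reads, one known index pair.
example = io.StringIO(
    "@read1 1:N:0:ACGTACGT+TGCATGCA\nTTTT\n+\nIIII\n"
    "@read2 1:N:0:GGGGGGGG+CCCCCCCC\nAAAA\n+\nIIII\n"
)
samples = {("ACGTACGT", "TGCATGCA"): "sampleA"}  # hypothetical sample map
counts = {}
for sample, _record in demultiplex(example, samples):
    counts[sample] = counts.get(sample, 0) + 1
print(counts)
```

Since the same index pair is listed in the headers of both read1 and read2, the same function could be run over each file, writing each record to a per-sample output file instead of counting. This does exact matching only.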
Pat - Thanks for the warm welcome. I’m talking with our seq core now to have them do the demultiplexing. They’re happy to do this for us, so I think I’ll be over this small hurdle soon.
Hi all - just wanted to post an update on where I’m at with this issue, and also seek some more feedback.
Based on Pat’s suggestion, I went back to our sequencing core and asked them to demultiplex the files. Every time they tried this, a significant amount of data ended up in ‘undetermined.fastq’. Looking at the reads in this file, it was clear there were certain barcodes that always ended up here. After a bit of back and forth, they realized there was a low-level error in their demultiplexing script that caused a barcode conflict when allowing for up to one mismatch. So, they have gone back to giving me one file for all forward reads, and one for reverse. However, they have removed the barcode from the header (the initial issue that prompted me to open this thread), and instead have left them in the read itself. Now the files look like this:
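As a general illustration of how a mismatch allowance can create barcode conflicts (not a reconstruction of the core's actual bug, which isn't described here), the Python sketch below flags barcode pairs that are too similar. If each read may differ from its barcode by up to k positions, two barcodes at Hamming distance 2k or less can both match the same read. The barcode list and function names are made up for the example.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def conflicting_pairs(barcodes, max_mismatches=1):
    """Return (a, b, distance) for pairs that one read could match ambiguously."""
    limit = 2 * max_mismatches
    conflicts = []
    for a, b in combinations(barcodes, 2):
        distance = hamming(a, b)
        if distance <= limit:
            conflicts.append((a, b, distance))
    return conflicts


# Hypothetical barcodes: the first two differ at a single position.
barcodes = ["ACGTACGT", "ACGTACGA", "TTTTCCCC"]
print(conflicting_pairs(barcodes, max_mismatches=1))
```

With dual indexing, the same check would presumably be applied to the i7 and i5 sets separately, since many samples legitimately share one of the two indices.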
Despite these changes to my input .fastq files, I am still unable to successfully demux (everything goes to scrap) when I use the following .oligos file (only head of file shown), where the second column is the reverse-complement of i7 and the third column is i5.
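To make that column description concrete, here is a small Python sketch that builds one such row from an i7/i5 pair. The row layout (the keyword barcode, then the two barcodes, then the sample name, tab-separated) is my understanding of mothur's paired-barcode oligos format and should be checked against the mothur documentation; the sequences and sample name are invented.

```python
# Translation table for complementing DNA bases (N is left as N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def oligos_row(i7, i5, sample):
    """Build one tab-separated row: barcode, revcomp(i7), i5, sample name.

    Assumption: this mirrors the column order described in the post above.
    """
    return "\t".join(["barcode", reverse_complement(i7), i5, sample])


# Invented indices and sample name, for illustration only.
print(oligos_row("AACG", "TTGA", "sample1"))
```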
If you’re using the method outlined in the Kozich paper, you won’t have the index sequences or the primers on the sequences. The method actually generates four files - two for the index reads and two for the sequence reads.
If these reads are examples of how you get your data back from the sequencing provider, they’re doing something screwy. They seem to have pasted your index to the end and beginning of your reads. Even if they were using our primers and then sequencing off of the adapters (not recommended), it should go barcode, pad, link, primer. Also, the reads should only be 251 nt, not 259. I suspect they may have created a hack by concatenating the index sequence to the end and beginning of the reads. This probably would have worked if they had concatenated the index sequences to the beginning of both reads.
Regardless, I would be a bit worried about their inability to demux the samples on the machine. As far as I can tell, this is done on the machine and not with some custom scripts. If they need help setting this up, have them feel free to send me an email at my umich.edu account at pschloss.
Hi Pat - thanks again for helping with this. I got this issue resolved with our sequencing core. They now provide me with the barcodes in two separate .fastq files, and this works with make.contigs using the findex/rindex arguments.