Demultiplex dual indexed Illumina files

Hi, I know this topic has come up several times, but I have a problem demultiplexing my data, and I really do not know how to proceed.
I ran a paired-end, dual-indexed procedure on Illumina. For each plate, I have 1 forward primer, and each well in the plate has a separate reverse primer. In the lane I have 5 forward primers and 96 reverse primers.
From the sequencer, I have received 4 files (2 index files and 2 sequencing): S0_L001_R1_001.fastq.gz, S0_L001_R2_001.fastq.gz, S0_L001_I1_001.fastq.gz and S0_L001_I2_001.fastq.gz

To demultiplex, I tried the make.contigs command in mothur:
make.contigs(ffastq=/input_files/S0_L001_R1_001.fastq, rfastq=/input_files/S0_L001_R2_001.fastq, findex=/input_files/S0_L001_I1_001.fastq, rindex=/input_files/S0_L001_I2_001.fastq, oligos=barcode_metadata_mothur.tsv, bdiffs=1, pdiffs=2, checkorient=t)

The command does not generate any errors, but all the output ends up in the S0_L001_R1_001.scrap.contigs.fasta file; the S0_L001_R1_001.trim.contigs.fasta file is completely empty, and the S0_L001_R1_001.contigs.report file only has the headers. Obviously I am missing something, but I really do not know what, and I am quite lost.

Can anybody suggest something? I am quite new to this, and I do not know what I am doing wrong…

Just in case it could be of any use, I also have the oligo table in a tsv file, coded like this:

barcode forwardBarcodeString reverseBarcodeString sampleName
BARCODE GATTTAGAGGCT TCCCTTGTCTCC S_1_1_1
BARCODE GATTTAGAGGCT ATCCTTTGGTTC S_1_1_2
BARCODE GATTTAGAGGCT TACGAGCCCTAA S_1_1_3
BARCODE GATTTAGAGGCT TGTGTTACTCCT S_1_1_4
etc etc etc etc

Hey there,

I’m happy to help - can you email me the first 100 lines of each of the fastq files?

Thanks,
Pat

Sure!
Please find attached the links to the first 100 lines of each fastq file (I cannot put so many lines in a comment, sorry):
S0_L001_R1_001.fastq

S0_L001_R2_001.fastq

S0_L001_I1_001.fastq

S0_L001_I2_001.fastq

Thanks for your help!

Thanks, do you have the rest of the oligos file that I can see?

Pat

Also - what region did you sequence and what were the primer sequences?

Hi Pat, sorry for the delay, and thanks again for your help:
I noticed that I was missing a # on the first line of the oligos file, so I re-ran the command. However, most of my reads still end up in the .scrap.contigs.fasta file (about 95% of them…).

I have attached the oligos file.

The region sequenced was the V5-V6 region of the 16S gene, with the primers:
799F AACMGGATTAGATACCCKG
1192R ACGTCATCCCCACCTTCC

Thanks for the additional information.

Unfortunately, the sequences in the first 100 lines of the files you sent me appear to have very poor quality and don’t seem to match the barcodes. When I attempt to assemble the reads without screening for the barcodes, they don’t assemble properly: they form 600 nt contigs, although the amplicon should be about 375 nt.
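To make the contig-length reasoning concrete, here is a small arithmetic sketch. It assumes 300 nt reads (the 2x300 run mentioned in this thread) and the roughly 375 nt amplicon; the function names are only illustrative and are not part of mothur.

```python
def expected_overlap(read_length: int, amplicon_length: int) -> int:
    """Bases that the forward and reverse reads of one amplicon should share.

    Returns 0 when the two reads together are shorter than the amplicon.
    """
    return max(0, 2 * read_length - amplicon_length)


def contig_length(read_length: int, overlap: int) -> int:
    """Length of a contig built from two reads that share `overlap` bases."""
    return 2 * read_length - overlap


# Assumed values from this thread: 2x300 reads, ~375 nt amplicon.
overlap = expected_overlap(300, 375)
print(overlap)                      # expected overlap in bases
print(contig_length(300, overlap))  # contig length when the reads overlap fully
print(contig_length(300, 0))        # contig length when no overlap is found
```

With these numbers the reads should overlap by 225 nt and give a contig of about 375 nt. A 600 nt contig corresponds to zero overlap, that is, the two reads were simply placed end to end.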

I wonder if you have any other details on the sequencing run. Did the sequencing provider give you an indication of the percentage of reads that passed the Q30 filter? I worry that the sequencing run was overloaded with DNA or had some other problem. We generally have poor success with 2x300 sequencing.
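If the provider cannot supply that number, a rough estimate can be computed from the fastq files themselves. The sketch below is not the provider's metric; it assumes standard 4-line fastq records with Phred+33 quality encoding and reports the fraction of bases with a quality score of 30 or higher. The helper names are hypothetical.

```python
import gzip
from typing import Iterable, Iterator


def open_fastq(path: str):
    """Open a plain or gzip-compressed fastq file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def quality_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield the quality string (4th line) of each 4-line fastq record."""
    for i, line in enumerate(lines):
        if i % 4 == 3:
            yield line.rstrip("\n")


def fraction_q30(quals: Iterable[str], offset: int = 33) -> float:
    """Fraction of bases with Phred score >= 30 (Phred+33 encoding assumed)."""
    total = passed = 0
    for qual in quals:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= 30:
                passed += 1
    return passed / total if total else 0.0


# Tiny in-memory example: 'I' encodes Q40 and '#' encodes Q2.
record = ["@read1", "ACGT", "+", "II##"]
print(fraction_q30(quality_lines(record)))
```

For a real file, something like fraction_q30(quality_lines(open_fastq("S0_L001_R1_001.fastq.gz"))) would scan every read. A low value for the second read in particular would be consistent with the assembly problems described above.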

Can you ask the sequencing facility to demultiplex the reads based on the index sequences? The software on the sequencer should be able to generate individual forward and reverse fastq files for each combination of barcodes.
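In the meantime, one way to check whether the index reads match the oligos file at all is to tally the index pairs directly. This is only a sketch and not what the sequencer software or mothur does. It assumes that I1 carries the forward barcode and I2 the reverse barcode (as in the findex/rindex arguments above), that both index files list reads in the same order, and it counts exact matches only, trying each barcode in both orientations. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence (2nd line) of each 4-line fastq record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip().upper()


def tally_index_pairs(
    i1_lines: Iterable[str],
    i2_lines: Iterable[str],
    barcode_pairs: Dict[Tuple[str, str], str],
) -> Counter:
    """Count reads per sample using exact matches of the (I1, I2) index pair.

    Each barcode is accepted as written or reverse-complemented, because the
    orientation of the index reads relative to the oligos file is an assumption.
    """
    lookup = {}
    for (fwd, rev), sample in barcode_pairs.items():
        for f in (fwd, reverse_complement(fwd)):
            for r in (rev, reverse_complement(rev)):
                lookup.setdefault((f, r), sample)

    counts = Counter()
    for i1, i2 in zip(read_sequences(i1_lines), read_sequences(i2_lines)):
        counts[lookup.get((i1, i2), "unmatched")] += 1
    return counts


# Example with one barcode pair taken from the oligos table in this thread.
pairs = {("GATTTAGAGGCT", "TCCCTTGTCTCC"): "S_1_1_1"}
i1 = ["@r1", "GATTTAGAGGCT", "+", "I" * 12, "@r2", "AAAAAAAAAAAA", "+", "I" * 12]
i2 = ["@r1", reverse_complement("TCCCTTGTCTCC"), "+", "I" * 12,
      "@r2", "CCCCCCCCCCCC", "+", "I" * 12]
print(tally_index_pairs(i1, i2, pairs))
```

If nearly every read lands in "unmatched" even with both orientations allowed, the problem is more likely in the run or in the barcode list than in the mothur command.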

Pat
