make.contigs problem

Hi all,

I am very new to bioinformatics, and I’m having issues similar to those already outlined with sequences from Argonne (e.g. DEMULTIPLEXING MISEQ PAIRED READS). I joined a MiSeq 2x250 bp PE sequencing run with another lab. They have the demultiplexed files but are unable to share them due to an IT issue on the sequencing facility’s end, which leaves me with non-demultiplexed files. I’ve tried make.contigs many times, based on the instructions in the thread I linked above and others. There seems to be a problem with barcode mismatches. When my bdiffs parameter is set to 0, most of my reads end up in scrap because the barcodes differ from the expected ones by 3-5 bp. Either way, when I proceed to summary.seqs, the majority of my contigs are far shorter than the expected length of 251 bp.

Files from Sequencing facility:
R1: Undetermined_S0_L001_R1_001.fastq
R2: Undetermined_S0_L001_R2_001.fastq
Forward Index: Undetermined_S0_L001_I1_001.fastq
Mapping: #SampleID BarcodeSequence LinkerPrimerSequence Description

LinkerPrimerSequence: GTGTGYCAGCMGCCGCGGTAA

Work:
Oligos File

barcode CAATTCTGCTTC NONE AF1
barcode GTTATACATTCA NONE AF2

Note: past forum threads have suggested including the LinkerPrimerSequence in the first line as “LPS NONE”, but that results in no .groups file being generated.
As suggested in past threads, I have also tried the reverse-complement (rc) version of the oligos file, and moving the barcode to column 3. Regardless, I continued:

make.contigs(ffastq=Undetermined_S0_L001_R1_001.fastq, rfastq=Undetermined_S0_L001_R2_001.fastq, findex=Undetermined_S0_L001_I1_001.fastq, oligos=oligos, bdiffs=3)
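For the reverse-complement variant of the oligos file mentioned above, a minimal Python sketch follows. The helper names (`reverse_complement`, `rc_oligos_line`) are hypothetical and not part of mothur, and the tab-separated output is an assumption about how the oligos file is laid out. It only reverse-complements barcode columns and leaves `NONE` placeholders untouched.

```python
# Hypothetical helpers (not part of mothur): build the reverse-complement
# version of the barcode lines in an oligos file.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def rc_oligos_line(line):
    """Reverse-complement the barcode columns of one oligos line.

    Columns 2 and 3 are treated as barcodes; a NONE placeholder is kept.
    Lines that do not start with 'barcode' are returned with fields unchanged.
    """
    fields = line.split()
    if len(fields) >= 3 and fields[0].lower() == "barcode":
        for i in (1, 2):
            if fields[i].upper() != "NONE":
                fields[i] = reverse_complement(fields[i])
    # Assumption: the oligos file is tab-separated.
    return "\t".join(fields)


# The two barcode lines shown in the oligos file above.
lines = [
    "barcode CAATTCTGCTTC NONE AF1",
    "barcode GTTATACATTCA NONE AF2",
]
rc_lines = [rc_oligos_line(line) for line in lines]
for line in rc_lines:
    print(line)
```

Writing `rc_lines` to a second oligos file makes it easy to run make.contigs once per orientation and compare how many reads land in scrap.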

summary.seqs:

In sum, there seem to be two issues:

  1. Barcode mismatches: without setting bdiffs to at least 3, most of my contigs end up in scrap.
  2. Contig length: the resulting sequences are far shorter than anticipated.
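One way to look at the barcode issue outside mothur is to tally, for a sample of index reads, how many positions differ from the closest expected barcode, in both the forward and the reverse-complement orientation. The sketch below is only a diagnostic under stated assumptions: it uses plain Hamming distance as a stand-in (mothur's own barcode matching may count differences differently), the function names are hypothetical, and the demo reads are made up for illustration.

```python
from collections import Counter

# Barcodes copied from the oligos file in this post.
BARCODES = {"AF1": "CAATTCTGCTTC", "AF2": "GTTATACATTCA"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a, b):
    """Count mismatched positions; any length difference adds to the count."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def fastq_sequences(lines, limit=None):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if limit is not None and i // 4 >= limit:
            break
        if i % 4 == 1:
            yield line.strip()


def mismatch_histogram(index_reads, barcodes):
    """Tally each read's distance to its closest barcode."""
    barcodes = list(barcodes)
    return Counter(min(hamming(r, bc) for bc in barcodes) for r in index_reads)


# Made-up index reads, for illustration only. For the real file, something like:
#   with open("Undetermined_S0_L001_I1_001.fastq") as fh:
#       reads = list(fastq_sequences(fh, limit=10000))
demo_reads = ["CAATTCTGCTTC", "GTTATACATTCT", "GAAGCAGAATTG"]

forward = mismatch_histogram(demo_reads, BARCODES.values())
reverse = mismatch_histogram(
    demo_reads, [reverse_complement(bc) for bc in BARCODES.values()]
)
print("forward:", sorted(forward.items()))
print("reverse complement:", sorted(reverse.items()))
```

If most real index reads sit at 0 mismatches in one orientation, the oligos file probably just needs that orientation. If they are far from every barcode in both orientations, the index file may not correspond to these barcodes at all.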

As I mentioned, I am very new to this, so user error is definitely a possibility. I tried other demultiplexing packages (e.g. idemp, which was suggested in the thread I linked above), but the results were almost identical. It may be that the sequencing center needs to fix something, but I thought I would check here.

Any help would be much appreciated! Thank you!!

Are you trying to sequence the 16S rRNA gene? Which region? It sounds like you intended to sequence the 250 nt V4 region, but have something much smaller. Given the file name, I wonder if these are the PhiX control sequences rather than the amplicons. You might circle back to your sequence provider to see if they can confirm what you have. Something you might try is looking at the fastq files and using one of the sequences as input to a blastn search to see what it maps to.
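A minimal Python sketch of that check: pull the first few reads out of a fastq file and print them as FASTA, with the read length in the header, so one can be pasted into a blastn search. The function names are hypothetical, the demo record is made up, and the code assumes plain four-line FASTQ records.

```python
def fastq_records(lines):
    """Yield (read_id, sequence) pairs from 4-line FASTQ records."""
    read_id = None
    for i, line in enumerate(lines):
        position = i % 4
        if position == 0:
            text = line.strip()
            # Drop the leading '@' of the FASTQ header line.
            read_id = text[1:] if text.startswith("@") else text
        elif position == 1:
            yield read_id, line.strip()


def to_fasta(records, n=5):
    """Format the first n records as FASTA text with the length in the header."""
    entries = []
    for count, (read_id, seq) in enumerate(records):
        if count >= n:
            break
        entries.append(f">{read_id} length={len(seq)}\n{seq}")
    return "\n".join(entries)


# Made-up record, for illustration only. For the real file, something like:
#   with open("Undetermined_S0_L001_R1_001.fastq") as fh:
#       print(to_fasta(fastq_records(fh)))
demo_lines = ["@read1", "ACGTACGTAC", "+", "IIIIIIIIII"]
fasta_text = to_fasta(fastq_records(demo_lines))
print(fasta_text)
```

The length in each header also gives a quick look at whether the raw reads themselves are shorter than expected before any contigs are built.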

pat

Thanks for the quick response, Pat! I definitely think that the problem was with the files I was provided. Smooth sailing now!

Thanks again!
