Importing iTag sequences into mothur from JGI

I have Illumina sequences from JGI that have been run through the iTagger pipeline. I would like to run them through mothur and am looking for the most logical entry point.

For raw data, I have one interleaved fastq file per sample. I also have one fastq file per sample containing sequences that have had the primers removed and undergone merging and quality trimming.
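
For anyone starting from the same kind of data, here is a minimal sketch of splitting an interleaved FASTQ file into separate R1 and R2 files. It assumes plain 4-line records that strictly alternate R1, R2, R1, R2; the function names are made up for illustration and are not part of mothur or iTagger.

```python
from itertools import islice


def read_records(lines):
    """Yield FASTQ records as lists of 4 lines (assumes unwrapped 4-line records)."""
    it = (ln for ln in lines if ln.strip())  # ignore blank lines
    while True:
        rec = list(islice(it, 4))
        if not rec:
            return
        if len(rec) < 4 or not rec[0].startswith("@") or not rec[2].startswith("+"):
            raise ValueError("malformed FASTQ record")
        yield rec


def split_interleaved(lines):
    """Return (r1_lines, r2_lines), assuming records alternate R1, R2, R1, R2, ..."""
    r1, r2 = [], []
    for i, rec in enumerate(read_records(lines)):
        (r1 if i % 2 == 0 else r2).extend(rec)
    if len(r1) != len(r2):
        raise ValueError("odd number of records; file is not cleanly interleaved")
    return r1, r2


def split_file(in_path, r1_path, r2_path):
    """Hypothetical convenience wrapper: read one interleaved file, write two files."""
    with open(in_path) as fh:
        r1, r2 = split_interleaved(fh)
    with open(r1_path, "w") as out1, open(r2_path, "w") as out2:
        out1.writelines(r1)
        out2.writelines(r2)
```

This holds the whole file in memory, which is fine for a quick check on one sample but would need to be made streaming for very large files.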

To input raw data, I created separate R1 and R2 files and attempted to run make.contigs. This command scraps almost everything: the scrap files are measured in GB and the trim files in KB. I tried following the advice in the “Problem with Make.contigs” thread (with the caveat that I do not have a findex file), but it did not fix the problem:

makeContigs.oligo file:

ER15M_B10_April2013_R1.fq ER15M_B10_April2013_R2.fq
ER15M_Bminus_Aug2012_R1.fq ER15M_Bminus_Aug2012_R2.fq

make.contigs(file=file, oligos=makeContigs.oligos, checkorient=t, processors=8)

I think the primary issue is that the insert is ~250bp when the primers are removed and the reads average 294bp. Of the few sequence pairs I have looked at manually, F and R primers appear in each read and reads overlap completely.
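
To make the arithmetic explicit, here is a rough sketch of how far each read runs past the end of the fragment and how much of the fragment both reads cover. It takes the ~250 bp figure at face value as the fragment length; that is a simplification, since the reads still contain the primers, so the true amplicon is somewhat longer and the overrun correspondingly smaller. The function names are illustrative only.

```python
def overrun(read_len, fragment_len):
    """Bases a single read extends past the far end of the fragment."""
    return max(0, read_len - fragment_len)


def pair_overlap(read_len, fragment_len):
    """Bases of the fragment covered by both the forward and reverse read."""
    covered = min(read_len, fragment_len)  # part of each read that lies on the fragment
    return max(0, 2 * covered - fragment_len)


# Numbers from the post above: ~294 bp reads, ~250 bp insert.
print(overrun(294, 250))       # 44
print(pair_overlap(294, 250))  # 250, i.e. the reads overlap completely
```

Under these assumptions every read continues roughly 44 bases beyond the fragment, which is consistent with seeing both primers in each read.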

I looked into removing the primers first, but I could not figure out how to do this in mothur, since it expects to remove primers from 454 or merged Illumina data.

Is there a good way to input a series of fastq files (one file per sample) containing merged reads that have had primers removed?

Any insight on how to get JGI itag raw data or pre-merged and trimmed data into mothur would be very much appreciated.


Are they doing V3 chemistry to sequence the V4 region? This is possibly a big big problem. If you sequence off the end of the fragment the error rate skyrockets (this has been confirmed by Illumina). Furthermore, when you make the contigs, your contigs will start and end with adapter sequence, not the primer/barcode sequences. This would cause the problems you are seeing.

Can you try running make.contigs without an oligos file and just use a files file and see what the contigs look like?
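
With one R1/R2 pair per sample, the files file can be generated rather than typed by hand. The sketch below assumes the three-column layout (group name, forward fastq, reverse fastq), which is my understanding of what make.contigs accepts for its file option, and assumes file names ending in _R1.fq / _R2.fq as in the listing above. The function name is hypothetical.

```python
import re


def files_file_rows(filenames):
    """Build tab-separated 'group<TAB>forward<TAB>reverse' rows from paired file names.

    Assumes names look like <sample>_R1.fq / <sample>_R2.fq (or .fastq).
    Samples missing either read file are skipped.
    """
    pairs = {}
    for name in filenames:
        m = re.match(r"(.+)_R([12])\.(?:fq|fastq)$", name)
        if m:
            pairs.setdefault(m.group(1), {})[m.group(2)] = name
    rows = []
    for sample in sorted(pairs):
        reads = pairs[sample]
        if "1" in reads and "2" in reads:
            rows.append(f"{sample}\t{reads['1']}\t{reads['2']}")
    return rows


names = [
    "ER15M_B10_April2013_R1.fq", "ER15M_B10_April2013_R2.fq",
    "ER15M_Bminus_Aug2012_R1.fq", "ER15M_Bminus_Aug2012_R2.fq",
]
print("\n".join(files_file_rows(names)))
```

Saving that output to a file and passing it as file=... to make.contigs, with no oligos option, gives the plain contig-formation run suggested here.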


Hi Pat,

They are trying to sequence the V4 region. It is not clear to me what chemistry was used, but I am thinking that they used V3 since the individual read lengths are nearly 300 bp (this is based on my naive understanding - I will contact the program officer to confirm).

make.contigs without an oligos file works. I am not sure whether I should be suspicious that no sequences were sent to the scrap file, but the resulting contigs without long strings of N’s look good. The primers map to the sequences and there is a range of ~4-8 bases between where the sequences start and where the primer sequence starts. When I try running trim.seqs to remove the primers all of the contigs end up in the scrap file, so I am not exactly sure how to proceed.
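
One way to put numbers on that 4-8 base gap is to tally where the forward primer starts in each contig. This is only a sketch: the primer shown is the commonly used 515F V4 primer, which is an assumption on my part since the thread does not say which primers JGI used, and the function names are illustrative.

```python
import re
from collections import Counter

# IUPAC degenerate base codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a regex that matches a (possibly degenerate) primer exactly."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def primer_offset(seq, primer):
    """0-based position where the primer first matches, or None if absent."""
    m = primer_regex(primer).search(seq.upper())
    return m.start() if m else None


def offset_counts(seqs, primer):
    """Tally primer start positions across a collection of contigs."""
    return Counter(primer_offset(s, primer) for s in seqs)


FORWARD_PRIMER = "GTGCCAGCMGCCGCGGTAA"  # assumed 515F; replace with the real primer
contigs = [
    "ACGTAC" + "GTGCCAGCAGCCGCGGTAA" + "TACG",
    "TTGA" + "GTGCCAGCCGCCGCGGTAA" + "TACG",
]
print(offset_counts(contigs, FORWARD_PRIMER))
```

A tally like this should show whether the leading bases have the same length within a sample or vary from read to read, which may help when deciding how to describe them in an oligos file. It only finds exact matches, so primers containing sequencing errors are counted under None.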

Thank you for your help!


Hi again,

Nothing will go to scrap because you haven’t applied any filtering criteria if you’re just doing a straight-up contig formation step. I suspect those 4-8 bases between where the sequences start and the primers are your barcodes. You’ll have to get the oligos file set up correctly to remove the barcodes. Alternatively, they may be base calls from where the sequencer overran the start of the other read and possibly the end of the fragment. You can deal with this using the trimoverlap=T parameter.

FWIW, using the V3 chemistry and sequencing over the end of the amplicon (i.e. 300 nt on a 290 nt fragment) are both going to cause you problems. The V3 chemistry quality craps out after 500 total nt (i.e. 200 nt into the second read). Also, the quality for the entire run drops when you sequence beyond the fragment. This will cause many problems and you should check out this blog post:

Feel free to share that and our Kozich paper with JGI if you need help convincing them that they should be using the V2 chemistry still…