help with make.contigs using an index file

I am new to mothur, and am having trouble with one of the earliest stages (make.contigs).

I have 3 files:

  1. sequences_R1.fastq
  2. sequences_R2.fastq
  3. index_I1.fastq

The sequences still have their primers, but the barcodes have been completely removed and are instead found in the index file. I created a stability file in the 4 column format as follows:


sequences_R1.fastq sequences_R2.fastq index_I1.fastq none
I also created an oligos file, indicating the forward/reverse primers and the sequences of the barcodes.

I attempted make.contigs by entering the following:


mothur> make.contigs(file=sequences.stability, oligos=sequences.oligos)
However, all of my sequences were placed into the scraps.contigs file, and my trim.contigs file was empty. Does anyone know why this is happening, and if there is something I am not doing correctly?

It looks like you are setting up the file correctly. Have you tried using bdiffs=1, pdiffs=2? Could you post the oligos file? What are the scrap codes mothur is giving you for failing the sequences?

I set up a tab-delimited oligos file as follows:

primer forward_primer_sequence reverse_primer_sequence V4
barcode oligo_sequence none sample_ID


What do you mean by "scrap codes?" Is this found in the contig.report?

What do bdiffs=1 and pdiffs=2 do? I don’t recall seeing this in the mothur SOP for MiSeq, so I have not tried this.

In the *.scrap.fasta file, you will see things like this:

M00178_4_000000000-A1AE6_1_1101_13868_1395|bf
TGTCACCTACGGGAGGCAGCAGTGGGTANTATTTGACNATGNGGAAAGCNTGA…

The bf is the scrap code mothur generated to indicate why this sequence failed. In this sequence’s case, it failed due to the barcode (b) and the primer (f). The pdiffs and bdiffs parameters let you tell mothur to allow for differences between the barcode or primer and your sequence. First mothur looks for an exact match. If one is not found, mothur aligns the barcode or primer to the sequence fragment, allowing at most bdiffs or pdiffs differences. If more than one barcode matches equally well, the read is treated as having no match. Pat recommends bdiffs=1 and pdiffs=2.
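The matching rule described above can be sketched in Python. This is only an illustration, not mothur's code: it counts mismatches position by position (Hamming distance) instead of performing an alignment, and the function name `match_barcode` and the example barcodes are made up.

```python
def mismatches(a, b):
    """Count positions where two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def match_barcode(read, barcodes, bdiffs=0):
    """Return the sample whose barcode matches the start of `read`, or None.

    `barcodes` maps barcode sequence -> sample name (hypothetical data).
    Simplification: mismatches are counted without alignment.
    """
    # Step 1: look for an exact match.
    for barcode, sample in barcodes.items():
        if read.startswith(barcode):
            return sample
    # Step 2: allow up to `bdiffs` mismatches.
    hits = []
    for barcode, sample in barcodes.items():
        fragment = read[:len(barcode)]
        if len(fragment) == len(barcode) and mismatches(fragment, barcode) <= bdiffs:
            hits.append(sample)
    # More than one candidate barcode is ambiguous, so report no match.
    return hits[0] if len(hits) == 1 else None


barcodes = {"ACGTACGT": "sampleA", "TTGGCCAA": "sampleB"}
print(match_barcode("ACGTACGT", barcodes))            # exact match
print(match_barcode("ACGTACGA", barcodes))            # one mismatch, bdiffs=0
print(match_barcode("ACGTACGA", barcodes, bdiffs=1))  # one mismatch, bdiffs=1
```

With bdiffs=0 a single sequencing error in the barcode is enough to scrap the read, which is why allowing one difference can recover many reads.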

Some reads in my file say “b” and some say “bf”. These seem to be the only codes. I will try bdiffs and see if that helps.

Do the barcodes still need to be in my R1 and R2 reads in order for the index file to find them? In my files, the barcodes are completely removed from the sequences in my R1 and R2 reads. However, I noticed that there were 136 sequences in my trim.contigs file (out of 9 million total reads), and these sequences still had the barcodes attached (not sure why). So, were the other contigs scrapped because their barcodes were removed? I am still not exactly sure why the barcodes need to be completely removed and placed into a separate index file, or what the purpose of the index file even is, but this is the raw output I was provided.

When using an index file, mothur expects the R1 and R2 files to have the barcodes removed. The barcodes should be in the index file. As mothur reads through the R1, R2 and index files, it looks for the barcodes in the index file and the primers in the R1 and R2 files. If the barcodes and primers are found, the reads are assembled.
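As a rough picture of that flow, here is a small Python sketch. It is not how mothur is implemented: it uses exact matching only, the sequences and the name `assign_read` are invented, and the placement of the reverse primer at the start of R2 and the "r" code for a reverse primer failure are assumptions that do not come from this thread.

```python
def assign_read(r1, r2, index, barcodes, fwd_primer, rev_primer):
    """Return (sample, scrap_code); scrap_code is "" when the read passes."""
    code = ""
    # The barcode is looked up in the index read, not in R1 or R2.
    sample = barcodes.get(index)
    if sample is None:
        code += "b"
    # The forward primer is expected at the start of R1.
    if not r1.startswith(fwd_primer):
        code += "f"
    # Assumption: the reverse primer is expected at the start of R2,
    # and "r" is used here as the code for that failure.
    if not r2.startswith(rev_primer):
        code += "r"
    return (sample if code == "" else None), code


barcodes = {"ACGTACGT": "sampleA"}  # hypothetical barcode -> sample map
fwd, rev = "GTGCC", "GGACT"         # made-up primer sequences

# Passes: barcode found in the index read, primers found in R1 and R2.
print(assign_read("GTGCCAAAT", "GGACTTTTA", "ACGTACGT", barcodes, fwd, rev))
# Fails on barcode and forward primer, like the "bf" reads mentioned earlier.
print(assign_read("TTTTTAAAT", "GGACTTTTA", "ACGTACGA", barcodes, fwd, rev))
```

Only reads that come back with an empty code would go on to be assembled and land in the trim file; everything else ends up in scrap with its code attached.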

Thank you! This seems to have worked. I now have close to 10,000 sequences in my trim.contigs file (there were 9 million total reads in the raw files). That seems like a very low recovery number, though: I have a total of 96 libraries, and when looking at my groups file, most libraries only have about 100-200 sequences. Several have just 1 read.

I will need to take a closer look to see if good reads are getting thrown out, or if the run itself produced mostly bad reads…

Thank you for all of your help!