Shared MiSeq files (via BaseSpace) already demultiplexed

I am very new to mothur. Do I need to create an oligos file if the samples are already demultiplexed? I am following the MiSeq SOP. Is there anything I need to do before I run the make.contigs command? Thank you in advance.

Assuming the index and primer sequences have been removed, you would need to create something like our stability.files file.


Hello Pat,
It appears that the barcodes were removed, but not the primers. I tried to remove the primers using make.contigs. However, the issue is that a group file is not being generated. I received the following error message:
[WARNING]: your oligos file does not contain any group names. mothur will not create a groupfile.

My oligos.file is just the primers:

Since there are no barcodes in the fastq sequences, I am not able to include barcodes in the oligos file. When I did include the barcodes, I got a trim.contigs.fasta file that was empty.

So my question is, how do I generate a groupfile without barcodes?

I am using the default settings with the make.file command.

UPDATE: I decided to remove the primers using cutadapt. It would still be great to know whether it is possible to remove primers from within mothur and still generate a group file.
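For anyone who wants to see what primer removal amounts to, here is a minimal Python sketch of the idea. It is not how cutadapt or mothur implement it: it allows no mismatches and only looks at the very start of the read. The function names are made up for this example, and the primer shown is the commonly published 515F sequence, used only as an illustration because the actual oligos file is not shown in this thread.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a regex that matches a (possibly degenerate) primer."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def strip_primer(read, quality, primer):
    """Return (read, quality) without the leading primer, or None if it is absent."""
    match = primer_regex(primer).match(read.upper())
    if match is None:
        return None
    # Trim the quality string by the same amount so the two stay aligned.
    return read[match.end():], quality[match.end():]


# Assumed example primer (515F); replace with the primers actually used.
FORWARD_PRIMER = "GTGCCAGCMGCCGCGGTAA"

read = "GTGCCAGCAGCCGCGGTAATACGGAGG"
qual = "I" * len(read)
print(strip_primer(read, qual, FORWARD_PRIMER))
```

The reverse read would be handled the same way with the reverse primer. Reads where the primer is not found return None, so they can be counted or discarded separately.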

I have also obtained a dataset in the form of demultiplexed files: for 48 samples, I got 48 pairs of fastq files (one sample_XX_R1.fastq for the forward reads and one sample_XX_R2.fastq for the reverse reads). In these fastq files, the barcodes have been removed but the PCR primers are still there. So I directly used make.contigs(file=file_listing_the_samples), the contents of that file being something like:

S1 sample_01_R1.fastq sample_01_R2.fastq
S2 sample_02_R1.fastq sample_02_R2.fastq

In the wiki, this is called the “three-column format”. By doing so I got one large fasta file containing the contigs from all paired-end reads, together with the group file.
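With 48 samples, typing that file by hand gets tedious, so here is a small Python sketch that writes the three columns from a directory of fastq files. It assumes the sample_XX_R1.fastq / sample_XX_R2.fastq naming described above and derives the group names S1, S2, ... from the sample number. The function name and the tab separator are my own choices for the example.

```python
import re
from pathlib import Path


def build_file_listing(fastq_dir):
    """Pair each *_R1.fastq with its *_R2.fastq and return 'group forward reverse' lines."""
    lines = []
    for forward in sorted(Path(fastq_dir).glob("*_R1.fastq")):
        reverse = forward.with_name(forward.name.replace("_R1.fastq", "_R2.fastq"))
        if not reverse.exists():
            raise FileNotFoundError(f"no reverse read file for {forward.name}")
        # Group name: "S" plus the sample number, e.g. sample_01 -> S1.
        number = re.search(r"(\d+)_R1\.fastq$", forward.name)
        if number:
            group = f"S{int(number.group(1))}"
        else:
            # Fall back to the file name without the _R1.fastq suffix.
            group = forward.name[: -len("_R1.fastq")]
        lines.append(f"{group}\t{forward.name}\t{reverse.name}")
    return lines


if __name__ == "__main__":
    # Write the listing for the fastq files in the current directory.
    listing = build_file_listing(".")
    Path("file_listing_the_samples").write_text("\n".join(listing) + "\n")
    print(f"wrote {len(listing)} sample lines")
```

The resulting file can then be passed to make.contigs(file=file_listing_the_samples) as above. The script raises an error rather than skipping a sample when a reverse file is missing, so an incomplete download is noticed early.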

Hello Maxime, I was able to make contigs by using the command above, but is it not problematic that the sequence length averages close to 290 bp? If you trim the primers using cutadapt, or any other way possible, you end up with a length of around 252 or 253 bp. The MiSeq SOP has sequences of this length, so I felt that I had to remove the primers.

I think sequence length is not a problem in itself. However, Pat recommends full overlap of the forward and reverse reads (thus limiting sequence length to ~250 bp or ~300 bp with standard Illumina kits). We have to choose primers accordingly. The ones that you indicate (referred to as 515F-806R) should fulfill that criterion, so in my view you should not worry too much.

I’m also a bit new to using mothur. Do the reads need to fully overlap? Or do we just need to make sure they significantly overlap? For example, can I use the 515/926 primers and 2x300 bp reads? The sequence length will be around 420 bp, so the reads will have ~180 bp of overlap. Is that enough for make.contigs?
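For reference, the overlap figure above comes from simple arithmetic, sketched here in Python. It assumes both reads are used at their full nominal length (no quality trimming), and the function name is only illustrative.

```python
def overlap_bp(read_length, amplicon_length):
    """Number of amplicon bases covered by both the forward and the reverse read."""
    overlap = 2 * read_length - amplicon_length
    # The overlap cannot be negative or longer than the amplicon itself.
    return max(0, min(overlap, amplicon_length))


# 515/926 amplicon (~420 bp) with 2x300 bp reads.
amplicon = 420
shared = overlap_bp(300, amplicon)
print(f"{shared} bp overlap, {shared / amplicon:.0%} of the amplicon")
```

So with these numbers roughly 240 bp of the contig (about 120 bp at each end) would be covered by a single read only, which is the part that the full-overlap recommendation is concerned with.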