I have a problem making contigs from my MiSeq 2x250 Illumina data. I used the original “make.contigs” command with the stability file, but I ended up with a “stability.trim.contigs.good.unique.align” file of 3 GB, and after the subsequent analyses I was left with only 10% of my original sequences…
I decided to take the following approach to select good-quality sequences before running make.contigs:
Using 1 processors.
Making contigs…
[WARNING]: name mismatch in forward and reverse fasta file. Ignoring, HWI-M02808_228_AWGN8_1_1101_13861_2168.
[WARNING]: name mismatch in forward and reverse fasta file. Ignoring, HWI-M02808_228_AWGN8_1_1101_11143_2183.
[WARNING]: name mismatch in forward and reverse fasta file. Ignoring, HWI-M02808_228_AWGN8_1_1101_10817_2281.
[WARNING]: name mismatch in forward and reverse fasta file. Ignoring, HWI-M02808_228_AWGN8_1_1101_12388_2296.
[WARNING]: name mismatch in forward and reverse fasta file. Ignoring, HWI-M02808_228_AWGN8_1_1101_19708_2308.
and many more…
I imagine what’s happening is that some of your forward or reverse sequences are getting discarded during trimming, which then leaves the forward and reverse files out of step with each other during make.contigs.
To clarify what I think dwaite is saying, we suspect your sequences were trimmed or processed before coming into mothur. make.contigs needs the raw fastq/fastq.gz files as they come off the sequencer. There should be the same number of reads in the forward and reverse files.
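If it helps, here is a rough way to check the read counts outside of mothur. This is only a sketch in plain Python, not anything mothur provides: the function names are made up for illustration, and it assumes standard FASTQ files with exactly four lines per read.

```python
import gzip


def open_maybe_gzip(path):
    """Open a plain or gzip-compressed file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_fastq_reads(path):
    """Count reads, assuming the usual 4-lines-per-record FASTQ layout."""
    with open_maybe_gzip(path) as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def same_read_count(forward_path, reverse_path):
    """Return (forward_count, reverse_count, counts_match)."""
    fwd = count_fastq_reads(forward_path)
    rev = count_fastq_reads(reverse_path)
    return fwd, rev, fwd == rev
```

For example, calling `same_read_count("sample_R1.fastq.gz", "sample_R2.fastq.gz")` (hypothetical file names) gives the two counts and whether they agree. Equal counts do not prove the reads are in the same order, but unequal counts would explain the name mismatch warnings.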
My purpose was to run make.contigs with fasta files that did not contain sequences with average Q values below 30. I suspect that the number of sequences in the forward and reverse fasta files is different, and that this is the reason for the warning message.
Do you have any idea how I could make contigs from my data? Initially I used the stability.file option, but as I said in my message above, it didn’t work.
I don’t know if this can be done in mothur, but tools like trimmomatic can do quality filtering in a way that only retains pairs when both the forward and reverse sequence pass the filtering criteria. If this is the way you want to analyse the data, it might be worth screening the sequences outside of mothur, then bringing them in for pair joining and the rest of the workflow.
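To illustrate what pair-aware filtering means, here is a minimal sketch in plain Python. It is not Trimmomatic's algorithm and not a mothur feature. It assumes 4-line FASTQ records, Phred+33 quality encoding, and that the forward and reverse files list the reads in the same order. All function names are hypothetical, and the threshold of 30 is taken from the average Q value mentioned above.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed FASTQ file for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def mean_quality(qual, offset=33):
    """Arithmetic mean of the Phred scores (assumes Phred+33 encoding)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_pairs(fwd_in, rev_in, fwd_out, rev_out, min_mean_q=30):
    """Keep a pair only if BOTH mates reach min_mean_q; return (kept, total)."""
    kept = total = 0
    with open_text(fwd_in) as f_in, open_text(rev_in) as r_in, \
            open(fwd_out, "w") as f_out, open(rev_out, "w") as r_out:
        # zip() walks both files in step, so the inputs must already be in sync
        for fwd, rev in zip(read_fastq(f_in), read_fastq(r_in)):
            total += 1
            if (mean_quality(fwd[2]) >= min_mean_q
                    and mean_quality(rev[2]) >= min_mean_q):
                f_out.write("{}\n{}\n+\n{}\n".format(*fwd))
                r_out.write("{}\n{}\n+\n{}\n".format(*rev))
                kept += 1
    return kept, total
```

Because a pair is written to both output files or to neither, the forward and reverse outputs always contain the same reads in the same order, which is what make.contigs expects.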