This is my first time with mothur, and I’m using someone else’s data, so it’s not in an ideal format.
Is there a way to make contigs from files that have both forward and reverse reads in them?
I downloaded my data as one big fastq file, then split it by sample into individual fastq files. However, the original file didn’t distinguish between forward and reverse reads, so I made stability.files list each file twice, like so:
sample1 sample1.fastq, sample1.fastq
sample2 sample2.fastq, sample2.fastq
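One way to check whether a per-sample file really contains both read directions is to look at the read headers. The following is a minimal Python sketch, not part of mothur. It assumes 4-line fastq records and assumes the headers follow one of two common Illumina conventions (a `/1` or `/2` suffix on the read name, or a second field starting with `1:` or `2:`). Files downloaded from an archive may have had this information stripped, in which case every read is counted as unknown. The function names are hypothetical.

```python
import re
from collections import Counter

def read_direction(header):
    """Return 1, 2, or None for a fastq header line (assumed conventions only)."""
    fields = header.strip().split()
    if not fields:
        return None
    # Assumed older Illumina style: read name ends in /1 or /2
    m = re.search(r"/([12])$", fields[0])
    if m:
        return int(m.group(1))
    # Assumed Casava 1.8+ style: second field looks like "1:N:0:..." or "2:N:0:..."
    if len(fields) > 1:
        m = re.match(r"([12]):[YN]:", fields[1])
        if m:
            return int(m.group(1))
    return None

def count_directions(fastq_lines):
    """Count forward (1), reverse (2), and unknown (None) reads.

    Assumes strict 4-line records, so every 4th line starting at 0 is a header.
    """
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:
            counts[read_direction(line)] += 1
    return counts

# Small made-up example covering both header styles and an unlabeled read
example = [
    "@M01:1:000-A:1:1101:15000:1000 1:N:0:1", "ACGT", "+", "IIII",
    "@M01:1:000-A:1:1101:15000:1000 2:N:0:1", "TTGA", "+", "IIII",
    "@read3/1", "GGCC", "+", "IIII",
    "@read4", "GGCC", "+", "IIII",
]
counts = count_directions(example)
print(counts)
```

To run it on a real file, pass an open file handle instead of the example list, e.g. `count_directions(open("sample1.fastq"))`. If the counts for 1 and 2 are roughly equal, the reads could in principle be separated into forward and reverse files before make.contigs.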
The contigs are really bad, though: the mean number of ambigs is about 5. The region is V4V5 and each read is 150 bp, so the contigs should come to about 250 bp in total, with a 50 bp overlap. The 75th percentile of contig length is 295, though, which means many of my contigs are way too long (would increasing the gap penalty help with this?)
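The length arithmetic here can be written out explicitly. This is a small Python sketch that assumes both reads are the full 150 bp and untrimmed; the function names are hypothetical.

```python
def contig_length(read_len, overlap):
    """Length of a contig built from two reads of equal length that share `overlap` bases."""
    return 2 * read_len - overlap

def implied_overlap(read_len, contig_len):
    """Overlap implied by an observed contig length, for two reads of equal length."""
    return 2 * read_len - contig_len

print(contig_length(150, 50))     # 250, the expected contig length
print(implied_overlap(150, 295))  # 5, the overlap implied by a 295 bp contig
```

Under that assumption, a 295 bp contig means the two reads were joined with only about 5 bp of overlap instead of the expected 50.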
I couldn’t use maxambig=0 when I ran screen.seqs, since this eliminated almost all my data.
After the align step I’m left with about 250,000 sequences, down from 4,000,000 initially. It’s about 1,500 sequences/sample, which is really poor coverage, isn’t it?
All of this makes me think that something is wrong with how I ran make.contigs. Does anyone know whether my method of using the same file for forward and reverse works when both forward and reverse reads are mixed into one file?
I also know that covering both V4 and V5 leads to worse alignment of contigs, since there’s less overlap. Are data like these normal? I had done the example MiSeq SOP and the quality of the contigs was much, much better, which is what makes me believe I’m doing something wrong.