I have been running the same dataset through two parallel analyses to get a sense of how using oligos in the make.contigs step affects the downstream OTU table (the two runs are sketched below). At the seq.error step, the dataset that was run through the MiSeq SOP without oligos in make.contigs had an error rate of 5.195e-08. When I ran seq.error on the dataset where oligos were used in make.contigs, the error rate was 0.
Is that possible/suspect?
Does this indicate that using oligos in the make.contigs step yields a more accurate OTU table?
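In case it's useful, the two runs were set up roughly like this (the file names are placeholders for my actual files, and the pdiffs value is just what I'd typically allow for primer mismatches). Without oligos:

make.contigs(file=stability.files, processors=8)

And with the primers supplied in an oligos file so they are trimmed during assembly:

make.contigs(file=stability.files, oligos=primers.oligos, pdiffs=2, processors=8)

Everything downstream followed the MiSeq SOP in both runs.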
It’s hard to say, but these error rates seem really low, even without the oligos. How many sequences do you have in your test sets? What mock community are you using? How did you get your reference sequences? How many sequences are being kicked out by seq.error as chimeras? Sorry for all the questions!
FWIW, removing the oligos is best practice when analyzing any amplicon data. The oligo sequence is really the primer sequence. Because primers can anneal non-specifically to the genomic template and to the fragments generated in later PCR cycles, the sequence read over that region is not “true”.
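If you want make.contigs to trim them for you, a minimal oligos file is just a single line naming the forward and reverse primer, e.g. (shown here with the standard V4 515F/806R primers purely as an example; substitute whatever you actually amplified with):

primer GTGCCAGCMGCCGCGGTAA GGACTACHVGGGTWTCTAAT

and then pass it to make.contigs via the oligos parameter, e.g. make.contigs(file=stability.files, oligos=primers.oligos).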
How many sequences are being kicked out in seq.error as being chimeras?
When I ran:
wc -l *error.chimera
The count was 2363, and the flagged sequences look mostly like Saccharomyces cerevisiae. I guess when I removed the mitochondrial sequences (in my answer to question 3) I didn't remove the 18S sequences as well. Maybe a better way to concatenate the reference seqs would be:
cat *_16S_* > zymo_temp.fasta
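and then rerun seq.error against that 16S-only reference, something like this (the fasta/count file names are just guesses at what mine are called):

seq.error(fasta=zymo.mock.fasta, count=zymo.mock.count_table, reference=zymo_temp.fasta, aligned=F)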
Let me know if you need more info and thanks for the help!
Hi there, I just ran seq.error on the entire dataset and the error rate was 0.000444531, which I think is more in the typical range. I'm assuming the very low rates I was getting earlier were due to using only a subset of the data from the full sequencing run.