Is an overall error rate of 0 suspect?

I have been running the same dataset through two parallel analyses to understand how using oligos in the make.contigs step affects the downstream OTU table. At the seq.error step, the dataset that was run through the MiSeq SOP without oligos in the make.contigs step had an error rate of 5.195e-08. When I ran seq.error on the dataset where oligos were used in the make.contigs step, the error rate was 0.

Is that possible/suspect?

Does this indicate that using oligos in the make.contigs step yields a more accurate OTU table?

It’s hard to say, but these error rates seem really low, even without the oligos. How many sequences do you have in your test sets? What mock community are you using? How did you get your reference sequences? How many sequences are being kicked out in seq.error as being chimeras? Sorry for all the questions :wink:

FWIW, removing the oligos is the best practice when analyzing any amplicon data. The oligo sequence is really the primer sequence. Because there can be non-specific annealing of primers to the genome and subsequent fragments in PCR, the sequence over that region is not “true”.

Answers to questions :slightly_smiling_face: :

  1. How many sequences do you have in your test sets?

956261 sequences; 5 environmental samples and 2 mock (1 for each plate)

  2. What mock community are you using?

The ZymoBIOMICS community DNA standard

The exact DNA standard can be found here

  3. How did you get your reference sequences?

     # Download and unpack the ZymoBIOMICS reference sequences
     wget -N https://s3.amazonaws.com/zymo-files/BioPool/ZymoBIOMICS.STD.refseq.v2.zip
     unzip ZymoBIOMICS.STD.refseq.v2.zip
     cd ZymoBIOMICS.STD.refseq.v2/ssrRNAs
     # Drop the mitochondrial sequences, then concatenate the rest
     rm *_Mitochondria_ssrRNA.fasta
     cat *.fasta > zymo_temp.fasta
     # Rename the first Salmonella_enterica_16S_5 to Salmonella_enterica_16S_7 (0,/re/ is GNU sed syntax)
     sed '0,/Salmonella_enterica_16S_5/{s/Salmonella_enterica_16S_5/Salmonella_enterica_16S_7/}' zymo_temp.fasta > zymo.fasta
    
     # Then, in mothur:
     seq.error(fasta=testy.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta, count=testy.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.pick.count_table, reference=zymo.fasta, aligned=F)
    
  4. How many sequences are being kicked out in seq.error as being chimeras?

When I used:

wc -l *error.chimera

The number was 2363. They look to be mostly Saccharomyces cerevisiae. I guess that when I removed the mitochondrial sequences in the answer to question 3, I didn’t remove the 18S seqs as well. Maybe a better way to concatenate the ref seqs would be:

cat *_16S_* > zymo_temp.fasta

Let me know if you need more info and thanks for the help!
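As a quick sanity check on a concatenated reference like the one built above, something along these lines could confirm that every header is unique and that no non-16S records slipped in. This is only a sketch: the helper names `duplicate_ids` and `non_16s_count` are made up for illustration, and it assumes that each record's header line starts with `>` and that 16S records carry `_16S_` in their names, as the names in the commands above suggest.

```shell
# Hypothetical helpers; not part of mothur or the Zymo download.

# Print each header line that appears more than once in a FASTA file.
duplicate_ids() {
    awk '/^>/ { if (seen[$0]++ == 1) print }' "$1"
}

# Count header lines that do not contain "_16S_" (assumed naming scheme).
non_16s_count() {
    awk '/^>/ && !/_16S_/ { n++ } END { print n + 0 }' "$1"
}
```

Running `duplicate_ids zymo.fasta` and `non_16s_count zymo.fasta` before seq.error should print nothing and `0`, respectively, if the reference contains only uniquely named 16S records.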


Sorry - how many sequences are in your mock samples? Also, the chimeras are those in *.error.chimera that have a 2 in the last column.

grep -c "2$" *error.chimera
mock_comm2 184948
mock_comm1 181333

And when I ran your grep command I got 0.
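One caveat with `grep -c "2$"` is that it counts any line whose last character is 2, so a final column holding a value such as 12 would also match. A stricter check could compare the last field exactly. The following is a minimal sketch, assuming the `*.error.chimera` files are whitespace-delimited text; `count_flagged` is a made-up helper name, not a mothur command.

```shell
# Hypothetical helper: count lines whose last whitespace-separated
# field is exactly "2".
count_flagged() {
    awk '$NF == "2" { n++ } END { print n + 0 }' "$1"
}
```

It could be applied to each file in turn, for example `for f in *error.chimera; do echo "$f: $(count_flagged "$f")"; done`.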

Hi there, I just ran seq.error on the entire dataset and the error rate was 0.000444531, which I think is more in the typical range. I am assuming that the very low rates I was getting were due to using just a subset of the data from the entire sequencing run.
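To put the rates in this thread in perspective, it can help to convert an error rate into the average number of bases per error, which is simply its reciprocal. A small sketch of that arithmetic follows; `bases_per_error` is a hypothetical helper name, and it prints `inf` for a rate of 0.

```shell
# Hypothetical helper: convert a per-base error rate into the
# average number of bases per error (1 / rate).
bases_per_error() {
    awk -v r="$1" 'BEGIN { if (r > 0) printf "%.0f\n", 1 / r; else print "inf" }'
}
```

For example, `bases_per_error 5.195e-08` gives roughly 19 million bases per error, while `bases_per_error 0.000444531` gives roughly 2,250.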
