I have been running the same dataset through two parallel analyses to get a sense of how using oligos in the make.contigs step affects the downstream OTU table (the two runs are sketched below). At the seq.error step, the dataset that was run through the MiSeq SOP without oligos in make.contigs had an error rate of 5.195e-08. When I ran seq.error on the dataset where oligos were used in make.contigs, the error rate was 0.
Is that possible/suspect?
Does this indicate that using oligos in the make.contigs step yields a more accurate OTU table?
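In case it's useful, the two runs were set up roughly like this (the file names are placeholders for my actual files, and the pdiffs value is just what I'd typically allow for primer mismatches). Without oligos:

make.contigs(file=stability.files, processors=8)

And with the primers supplied in an oligos file so they are trimmed during assembly:

make.contigs(file=stability.files, oligos=primers.oligos, pdiffs=2, processors=8)

Everything downstream followed the MiSeq SOP in both runs.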
It’s hard to say, but these error rates seem really low, even without the oligos. How many sequences do you have in your test sets? What mock community are you using? How did you get your reference sequences? How many sequences are being kicked out by seq.error as chimeras? Sorry for all the questions!
FWIW, removing the oligos is best practice when analyzing any amplicon data. The oligo sequence is really the primer sequence. Because primers can anneal non-specifically to the genomic template and to the fragments generated in later PCR cycles, the sequence read over that region is not “true”.
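If you want make.contigs to trim them for you, a minimal oligos file is just a single line naming the forward and reverse primer, e.g. (shown here with the standard V4 515F/806R primers purely as an example; substitute whatever you actually amplified with):

primer GTGCCAGCMGCCGCGGTAA GGACTACHVGGGTWTCTAAT

and then pass it to make.contigs via the oligos parameter, e.g. make.contigs(file=stability.files, oligos=primers.oligos).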
How many sequences are being kicked out in seq.error as being chimeras?
When I ran:
wc -l *error.chimera
The count was 2363, and the flagged sequences look mostly like Saccharomyces cerevisiae. I guess when I removed the mitochondrial sequences (in my answer to question 3) I didn't remove the 18S sequences as well. Maybe a better way to concatenate the reference seqs would be:
cat *_16S_* > zymo_temp.fasta
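and then rerun seq.error against that 16S-only reference, something like this (the fasta/count file names are just guesses at what mine are called):

seq.error(fasta=zymo.mock.fasta, count=zymo.mock.count_table, reference=zymo_temp.fasta, aligned=F)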
Let me know if you need more info and thanks for the help!
Hi there, I just ran seq.error on the entire dataset and the error rate was 0.000444531, which I think is more in the typical range. I'm assuming the very low rates I was getting earlier were due to using only a subset of the data from the full sequencing run.