PCR or Pyrosequencing error calculation

Hi all,
I might have missed it, but is there a way in mothur to calculate the PCR/pyrosequencing error rate per nucleotide position by comparing each read to the appropriate template sequence? It’d be awesome if we could estimate the error rate at each position of a sequence.

Thank you
Migee

It’s in there if you look hard enough, but it’s not quite ready for prime time…
Pat

Hi Pat,
I am thinking it could be done with chimera.check, using masked and aligned sequences, but I couldn’t find out how the IS value is calculated. Am I totally off target?

Thanks for the help.
Migee

Can the errors be calculated with chimera.pintail, using a template sequence from a known organism and the sequences assigned to that organism (based on the RDP classifier) as the query sequences?

Migee

The command to use is seq.error. It isn’t public yet, but it’s in mothur. You give it your sequenced mock community sequences and the reference mock community sequences, both aligned the same way.

For example…

seq.error(query=myseqs.align, reference=reference.align)
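
To make the idea of a per-position comparison concrete, here is a minimal Python sketch. It is not how seq.error is implemented; it only assumes that every read has already been paired with its template and that both are aligned to the same length, with "-" for gaps and "." for columns a read does not cover. The function name and the toy sequences are made up for illustration.

```python
def per_position_error_rate(reads, reference):
    """Fraction of reads that disagree with the reference at each alignment column.

    Assumptions (illustrative, not mothur's behaviour): all reads are aligned
    against one reference, '-' is a gap, '.' in a read means "not covered".
    """
    ref = reference.upper().replace(".", "-")
    errors = [0] * len(ref)
    covered = [0] * len(ref)
    for read in reads:
        if len(read) != len(ref):
            raise ValueError("read and reference must have the same aligned length")
        for i, (q, r) in enumerate(zip(read.upper(), ref)):
            if q == ".":
                continue  # read does not span this column
            if q == "-" and r == "-":
                continue  # column is empty in both sequences
            covered[i] += 1
            if q != r:
                errors[i] += 1  # substitution, insertion, or deletion
    # Columns that no read covers get a rate of 0.0
    return [e / c if c else 0.0 for e, c in zip(errors, covered)]


if __name__ == "__main__":
    reference = "ACGT-A"                      # toy template
    reads = ["ACGT-A", "ACCT-A", ".CGTTA"]    # toy reads
    for pos, rate in enumerate(per_position_error_rate(reads, reference), start=1):
        print(pos, round(rate, 3))
```

The resulting list can be plotted directly against alignment position to see where errors concentrate.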

Thank you very much. This made things much easier. I used the *.error.seq file and used it to plot the graph for errors per nucleotide.

Migee

Hi,

The documentation for this command (seq.error) is really insufficient compared to some of the others. As far as I understand, it cannot be used to distinguish between errors caused by PCR and those caused by pyrosequencing. Is this true?

Thanks in advance for your time.
Regards

H.

It’s pretty much impossible to distinguish the two. In general, PCR error rates are about 10^4-fold lower than sequencing error rates.

Thanks for the prompt answer, but I’m not sure I understand why it should be impossible, for instance with a community that is simulated in vitro. Considering the largely different characteristics of PCR errors (substitutions) and 454 errors (homopolymer indels), I’d argue that it should be possible to make that distinction to some (fine) degree, and I was wondering if mothur provides such a feature.

Well, when you figure the Taq error rate is 1 in 10^5 to 10^6 and the 454 error rate is about 1 in 100, it’s hard to say what’s what. In our PLoS ONE paper, we found that substitutions account for about 1/3 of the error rate. You can see these different types of errors in the *.error.matrix file that is output by the seq.error command.
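
For anyone who wants to see what a breakdown by error type looks like, here is a rough Python sketch that tallies substitutions, insertions, and deletions for one aligned read/template pair. It does not read the *.error.matrix file and makes no claim about its format; the function names and the toy sequences are hypothetical, and the gap conventions ("-" for a gap, "." for an uncovered column) are assumptions.

```python
from collections import Counter


def tally_error_types(read, reference):
    """Count matches, substitutions, insertions and deletions in one aligned pair."""
    if len(read) != len(reference):
        raise ValueError("read and reference must have the same aligned length")
    counts = Counter()
    for q, r in zip(read.upper(), reference.upper()):
        if q == ".":
            continue  # read does not cover this column
        q_gap = q == "-"
        r_gap = r in ("-", ".")
        if q_gap and r_gap:
            continue  # nothing to compare
        if r_gap:
            counts["insertion"] += 1     # extra base in the read
        elif q_gap:
            counts["deletion"] += 1      # base missing from the read
        elif q != r:
            counts["substitution"] += 1
        else:
            counts["match"] += 1
    return counts


def substitution_share(counts):
    """Fraction of all errors that are substitutions (0.0 if there are no errors)."""
    errors = counts["substitution"] + counts["insertion"] + counts["deletion"]
    return counts["substitution"] / errors if errors else 0.0


if __name__ == "__main__":
    counts = tally_error_types("ACCT-AT", "ACGTTA-")  # toy pair, not real data
    print(dict(counts), substitution_share(counts))
```

Note that a tally like this only separates errors by type. It cannot say whether a given substitution came from the polymerase or the sequencer, which is the point made above.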