how does seq.error work?

Hi Mothur community,

I would like to know how the seq.error command works. I didn’t find much information about it on the wiki page for this command.

My question arises from this story:
The quality of my sequences is rather poor, especially towards the end; however, I have many more sequences than I need. Thus, before starting the mothur SOP, I applied a quality filter that removes sequences whose expected error (based on the quality scores) exceeds a certain threshold. To test whether this approach is effective, I ran the mothur SOP on the mock community sample with and without this filtering, up to the “assessing error rates” step. I calculated the rarefaction curve and the error rate for both. The slopes of the two rarefaction curves are remarkably different, which, as far as I know, means that the filtering works. However, the error rate is the same (and high!).

I feel that a better understanding of how the error rate is calculated could help me understand this discrepancy.

Thanks a lot!

Essentially, it aligns each of your sequences to those in the reference sequence set and finds the best match. It then aligns each sequence with its best match and calculates the error rate and the types of mismatches. If it sees that a sequence is chimeric (based on the possible combinations of reference sequences), that sequence is tossed and does not contribute to the error rates.
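
To make the idea concrete, here is a rough Python sketch of that logic. It is not mothur's actual code, only an illustration under some simplifying assumptions: the sequences are already aligned to the same length, "best match" means the reference with the fewest mismatches, and chimeras are identified elsewhere and passed in by name. The function names (count_errors, best_match, error_rate) and the toy sequences are made up for this example.

```python
GAP_CHARS = "-."  # characters treated as alignment gaps (an assumption)


def count_errors(query, reference):
    """Return (mismatches, compared columns) for two pre-aligned sequences."""
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    errors = 0
    compared = 0
    for q, r in zip(query, reference):
        if q in GAP_CHARS and r in GAP_CHARS:
            continue  # column is a gap in both sequences, nothing to compare
        compared += 1
        if q != r:
            errors += 1  # substitution, insertion, or deletion
    return errors, compared


def best_match(query, references):
    """Pick the reference with the fewest mismatches to the query."""
    best = None
    for name, ref in references.items():
        errors, compared = count_errors(query, ref)
        if best is None or errors < best[1]:
            best = (name, errors, compared)
    return best


def error_rate(queries, references, chimeras=()):
    """Overall error rate: total mismatches / total compared bases.

    Sequences whose names are in `chimeras` are skipped entirely,
    so they do not contribute to the rate.
    """
    total_errors = 0
    total_bases = 0
    for name, seq in queries.items():
        if name in chimeras:
            continue
        _, errors, compared = best_match(seq, references)
        total_errors += errors
        total_bases += compared
    return total_errors / total_bases if total_bases else 0.0


# Toy data: s3 is the first half of refA joined to the second half of refB.
references = {"refA": "ACGTACGT", "refB": "TTGTACAA"}
queries = {"s1": "ACGTACGT", "s2": "ACGAACGT", "s3": "ACGTACAA"}

rate_without_chimera = error_rate(queries, references, chimeras={"s3"})
rate_with_chimera = error_rate(queries, references)
```

In this toy case, leaving the chimeric read in doubles the computed rate, which is why it is excluded before the errors are tallied.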


Thank you!