mothur

Seq.error output strange

Hi Pat

mothur_v1.44.3
seq.error(fasta=final_otu_seqs.fasta, reference=HMP_MOCK.v35.fasta, aligned=F)

HMP-Mock-Even is sequenced with 16S V6V9 primers; just wanna check error rate in OTUs compared to the reference.

Observed some OTUs have weird output:
|query|reference|weight|insertions|deletions|substitutions|ambig|matches|mismatches|total|error|numparents|
|---|---|---|---|---|---|---|---|---|---|---|---|
|OTU_1008|M.smithii1|1|0|0|0|0|0|0|0|0|1|
|OTU_2330|S.pneumoniae1|1|0|2|15|0|281|17|298|0.057047|1|
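
For reference, the columns in the OTU_2330 row appear to relate as mismatches = insertions + deletions + substitutions + ambig, total = matches + mismatches, and error = mismatches / total. This is inferred from the numbers in the row above, not taken from the mothur documentation, and the function name `error_summary` is hypothetical. A minimal Python sketch of that arithmetic:

```python
def error_summary(insertions, deletions, substitutions, ambig, matches):
    """Recompute mismatches, total, and error rate from the per-type counts.

    Assumption: the column relationships are inferred from the OTU_2330 row,
    not from mothur's source code or documentation.
    """
    mismatches = insertions + deletions + substitutions + ambig
    total = matches + mismatches
    # Guard against an empty alignment, where every count is zero.
    error = mismatches / total if total else 0.0
    return mismatches, total, error


# OTU_2330 row: 0 insertions, 2 deletions, 15 substitutions, 0 ambig, 281 matches
mismatches, total, error = error_summary(0, 2, 15, 0, 281)
print(mismatches, total, round(error, 6))
```

By the same arithmetic, the OTU_1008 row has a total of 0, so no aligned positions were compared at all; that is what makes the row suspicious rather than a genuine zero-error hit.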

OTU_1008
TAGTCGTCGGTTAAGTCCGGCAACGAGCGCAACCCACGTCCTTAGTTGCCAGCATTCAGT
TGGGCACTCTAGGGAAACTGCCGGTGATAAGCCGGAGGAAGGTGTGGATGACGTCAAGTC
CTCATGGCCCTTACGGGTTGGGCTACACACGTGCTACAATGGCAGTGACAATGGGTTAAT
CCCAAAAAGCTGTCTCAGTTCGGATTGGGGTCTGCAACTCGACCCCATGAAGTCGGAATC
GCTAGTAATCGCGTAACAGCATGACGCGGTGAATACGTTCCCGGGCCTTGTACACACCGC
CCGTCACACCATGGGAATTGGTTCTACCCGAAGGCGGTGCGCCAACCTCGCAAGAGGAGG
CAGCCGACCACGGTAGGATCAGTGACTGGGGTGAAGT

OTU_2330
CGTTACCCTTAGGTACCTACGGCGGTACTACACACGTGCTACAATGGCTGGTACAACGAG
TCGCAAGCCGGTGACGGCAAGCTAATCTCTTAAAGCCAGTCTCAGTTCGGATTGTAGGCT
GCAACTCGCCTACATGAAGTCGGAATCGCTAGTAATCGCGGATCAGCACGCCGCGGTGAA
TACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCACGAGAGTTTGTAACACCCGAAG
TCGGTGAGGTAACCGTAAGGAGCCAGCCGCCTAAGGTGGGATAGATGATTGGGGTGAAGT

Actually, OTU_1008 can be aligned to R. sphaeroides, so I'm not sure what causes this behavior. Most OTUs have fine output, like OTU_2330.

Thanks.

Without looking at the fasta file and your HMP_MOCK file it would be hard to say. FWIW, I’d discourage running seq.error on OTU data. It is generally only run on the fasta data prior to clustering, as shown in the MiSeq SOP. If you could forward final_otu_seqs.fasta and HMP_MOCK.v35.fasta to mothur.bugs@gmail.com with a link to this post, we can take a look.

Pat

Thanks Pat!

Just sent the two files for you to have a look.

Thanks for sending your files. The seq.error code had a small bug. When OTU_1008 was aligned to the references, some references resulted in an alignment with no overlap. Mothur mistakenly reported 0 diffs between the reference and the query read in these cases, because the length of the aligned read was 0. The number of diffs should be MAX_INT when there is no overlap. I have fixed this bug and the change will be part of our next release.
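
To illustrate the kind of bug described above, here is a minimal Python sketch. It is not mothur's actual C++ code: the function names, the `fixed` flag, the use of `.` for positions outside a sequence, and the toy alignments are all assumptions made for the example. The idea is that the best reference is the one with the fewest differences, so an alignment with no overlap must score as the worst possible match rather than as 0.

```python
import sys

# Stand-in for the MAX_INT sentinel mentioned in the post (assumption).
NO_OVERLAP = sys.maxsize


def count_diffs(aligned_query, aligned_ref, fixed=True):
    """Count differing columns where both aligned sequences are present.

    Assumption: '.' marks columns outside a sequence, '-' marks an internal gap.
    """
    overlap = 0
    diffs = 0
    for q, r in zip(aligned_query, aligned_ref):
        if q == "." or r == ".":
            continue  # one sequence does not cover this column
        if q == "-" and r == "-":
            continue  # gap in both, nothing to compare
        overlap += 1
        if q != r:
            diffs += 1
    if overlap == 0 and fixed:
        # No shared columns: report the worst score, not a perfect one.
        return NO_OVERLAP
    return diffs


def best_reference(alignments, fixed=True):
    """Return the name of the reference with the fewest differences."""
    return min(alignments, key=lambda name: count_diffs(*alignments[name], fixed=fixed))


# Hypothetical toy alignments: refA never overlaps the query, refB differs at one base.
alignments = {
    "refA": ("ACGT....", "....ACGT"),
    "refB": ("ACGTACGT", "ACGAACGT"),
}
print(best_reference(alignments, fixed=False))  # buggy behavior picks the non-overlapping reference
print(best_reference(alignments, fixed=True))   # with the sentinel, the overlapping reference wins
```

With `fixed=False`, the non-overlapping reference wins with 0 differences, which mirrors the all-zero OTU_1008 row; with the sentinel in place it can no longer be chosen over a reference that actually overlaps the query.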

Thanks Sara!

Makes sense now.