I have an OTU that was classified only to the family level against the Greengenes database.
When I took the consensus sequence and BLAST-searched it against the NCBI RefSeq 16S database, it matched at 100% identity (253/253) to two " strain".
Could someone explain why that might be?
Thank you so much!
There are two possible reasons; I don’t know which is correct, or whether it’s a combination of both.
First, there are a lot of entries in Greengenes that don’t have genus-level taxonomy assigned. If you BLAST against Greengenes and have a 100% hit to a sequence with only Family or Order level taxonomic assignment, that’s your result. That same sequence might also be in RefSeq with a full taxonomic assignment, or maybe the sequence you’re finding in NCBI isn’t present in Greengenes (at this stage, Greengenes hasn’t been updated in 2.5 years).
Second, the default classifier in mothur uses the RDP naive Bayesian method, which requires enough bootstrap support at a taxonomic rank before it will assign the sequence at that rank. So you could have a number of strong hits to sequences assigned to the same Family but to different Genera; in that case mothur reports the shared Family and leaves out the inconsistent Genus.
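To make the second point concrete, here is a rough Python sketch of how a bootstrap-style consensus can stop at Family. This is not mothur's actual code: the function name `consensus_lineage`, the 80% cutoff, and the placeholder taxon names are all assumptions made up for illustration.

```python
from collections import Counter

RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus"]

def consensus_lineage(replicates, cutoff=80):
    """Return (taxon, support %) for each rank whose support meets the cutoff.

    `replicates` is a list of lineages, one per bootstrap replicate.
    The cutoff of 80 is an assumed value, not necessarily your setting.
    """
    result = []
    total = len(replicates)
    for depth in range(1, len(RANKS) + 1):
        # Count whole lineage prefixes, so a genus is only counted
        # together with its parent taxa.
        counts = Counter(tuple(rep[:depth]) for rep in replicates)
        prefix, hits = counts.most_common(1)[0]
        support = 100 * hits / total
        if support < cutoff:
            break  # stop at the first rank that lacks support
        result.append((prefix[-1], support))
    return result

# Made-up data: every replicate agrees down to Family, but Genus is split 55/45.
family = ["KingdomX", "PhylumX", "ClassX", "OrderX", "FamilyX"]
replicates = [family + ["GenusA"]] * 55 + [family + ["GenusB"]] * 45

for taxon, support in consensus_lineage(replicates):
    print(f"{taxon}({support:.0f})")
```

With this made-up input the loop stops after FamilyX, because the best genus only reaches 55% support. Lowering the cutoff to 50 would let GenusA through, which is why the reported depth depends on the threshold as much as on the reference database.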