Unclassified sequences

What does an unclassified sequence mean? Is it that the classifier can’t find a reference (the sequence is novel), or that there are so many similar references that there is no consensus match? Is there a difference between these two results?


Technically, anything that has a confidence score below your threshold (80%) is classified as “unclassified”. This can happen for the two reasons you mention…

  1. There aren’t good references “close” to your sequence
  2. There are competing taxa that are just as probable because they are just as close to your sequence
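
To make that concrete, here is a toy sketch in Python. It assumes the confidence score is simply the percentage of bootstrap replicates that agree on the most common taxon, which is a simplification and not the classifier's actual code. The `classify` function and the vote lists are made up for illustration.

```python
from collections import Counter

def classify(votes, cutoff=80.0):
    """Return (label, confidence) from hypothetical per-replicate taxon votes.

    Assumption: confidence = percent of replicates agreeing on the top taxon.
    """
    taxon, count = Counter(votes).most_common(1)[0]
    confidence = 100.0 * count / len(votes)
    label = taxon if confidence >= cutoff else "unclassified"
    return label, confidence

# Reason 2: two taxa are equally close, so the votes split between them.
competing = ["Staphylococcus aureus"] * 50 + ["Staphylococcus epidermidis"] * 50

# Reason 1: no close reference, so the votes scatter over several taxa.
scattered = ["taxon_A"] * 30 + ["taxon_B"] * 25 + ["taxon_C"] * 25 + ["taxon_D"] * 20

# For contrast: one taxon clearly wins.
confident = ["Staphylococcus aureus"] * 95 + ["Staphylococcus epidermidis"] * 5

for name, votes in [("competing", competing), ("scattered", scattered), ("confident", confident)]:
    print(name, classify(votes))
```

Both the split case and the scattered case fall under the 80% cutoff, so both come out with the same "unclassified" label.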

These can happen because the database is inadequate in your area of interest or because your fragment is inadequate. This is a cartoon example, but if you sequence the V35 region, you can’t differentiate between Staph aureus and Staph epidermidis because the sequences are identical. But if you sequence the V13 region you can. This is where the choice of region and fragment length comes into play. So it can be both the database and the DNA fragment you’re trying to sequence.

But would there be a way to score the result as either 1) or 2) above?
e.g. 1) “Cannot classify”
2) “Too many matches”

Hmmm… I can’t think of a way. You could get too many matches precisely because there aren’t any good matches.