classify.seqs .tax.summary output and unclassified sequences

Hi all,

I am trying to understand the .tax.summary output file I get from the classify.seqs command, especially what happens to the sequences marked as “unclassified”.
In the corresponding .taxonomy file, unclassified sequences come in two kinds: some have a bootstrap value all the way down to the genus level, and some do not.
As I understand it, “unclassified” sequences that show a bootstrap value at the genus level matched a reference sequence in the database that was itself “unclassified”, whereas “unclassified” sequences without a bootstrap value at the genus level did not pass the cutoff down to the genus level and could only be “safely” classified to, let's say, the family level.

But where do these two types of sequences show up in the .tax.summary file? Do both types occur in there?
I noticed that there always seems to be one “unclassified” genus for each family. Does this one contain all the “unclassified” sequences, whether or not they made it through the cutoff (a kind of “trash can” taxon that holds “trash” sequences below the cutoff as well as sequences from an unclassified taxon)? Or does it only contain those sequences that could be safely classified as “unclassified”, based on a reference sequence in the database? (I know, this is extremely confusing… I’m trying to be as precise as I can.)

Thanks a lot!
Lena


All unclassified sequences within a named taxon are grouped together - so yeah, it’s a “trash can” designation. There shouldn’t be anything in the database with an “unclassified” label. The cons.taxonomy file may show a bootstrap value for “unclassified” when that percentage of the sequences in the OTU are also unclassified.
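
To see the grouping for yourself, something like the following rough Python sketch could tally the unclassified sequences under their last named taxon. It is not mothur's own code. It assumes each .taxonomy line is a sequence name, a tab, and a semicolon-separated lineage in which every rank is either `Name(bootstrap)` or a bare `unclassified`, as described in the question. The function names and the example lines are made up for illustration.

```python
import re
from collections import Counter

# One rank such as "Bacillaceae(95)" or a bare "unclassified" (assumed layout).
RANK = re.compile(r"^(?P<name>.*?)(?:\((?P<boot>\d+)\))?$")


def parse_taxonomy(tax_string):
    """Split 'A(100);B(95);unclassified;' into [(name, bootstrap or None), ...]."""
    ranks = []
    for field in tax_string.strip().strip(";").split(";"):
        match = RANK.match(field.strip())
        boot = match.group("boot")
        ranks.append((match.group("name"), int(boot) if boot is not None else None))
    return ranks


def summary_bin(ranks):
    """Return (last named taxon, kind) for one sequence's lineage."""
    named = [name for name, _ in ranks if name != "unclassified"]
    parent = named[-1] if named else "unknown"
    last_name, last_boot = ranks[-1]
    if last_name != "unclassified":
        return (parent, "classified")
    kind = "with_bootstrap" if last_boot is not None else "without_bootstrap"
    return (parent, kind)


def tally_unclassified(lines):
    """Count unclassified sequences per parent taxon, ignoring the bootstrap."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue
        _seq_name, tax = line.rstrip("\n").split("\t", 1)
        parent, kind = summary_bin(parse_taxonomy(tax))
        if kind != "classified":
            counts[parent] += 1
    return counts


# Hypothetical example lines, not taken from a real run.
example = [
    "seq1\tBacteria(100);Firmicutes(100);Bacilli(99);Bacillales(98);Bacillaceae(95);Bacillus(90);",
    "seq2\tBacteria(100);Firmicutes(100);Bacilli(99);Bacillales(98);Bacillaceae(95);unclassified(95);",
    "seq3\tBacteria(100);Firmicutes(100);Bacilli(99);Bacillales(98);Bacillaceae(85);unclassified;",
]

print(tally_unclassified(example))
```

Both seq2 and seq3 end up in the same Bacillaceae bin here, with or without a bootstrap value, which is the “trash can” behaviour described above.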

Pat
