Hi Pat,
Thanks for putting together the classify function. I’ve been using it to compare classifications across many pyrotag-sequenced samples, but because of the way the tax.summary output is structured, it is difficult to match classifications across samples and compare their abundances.
I noticed that each sample fasta file (say, parsed from the trim function using the groups file) will generate its own tax.summary file after classification. Each line of a tax.summary file has roughly this format:
0 0 Root 1 39725
1 0.1 Bacteria 26 39725
2 0.1.1 Acidobacteria 3 44
A major issue that makes the classification identifier (here 0.1.1 for Acidobacteria) difficult to use for matching samples is that the numbering differs in each sample’s tax.summary file: it is based on the classifications present in that sample rather than on a master sort order. From what I can tell, though, the classifications that are present are always listed in the same order. Is that right? If the classification identifiers in the output were numbered from a master sorted list rather than from only what the sample contains, matching classifications across samples would be much easier.
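As a stopgap, the samples can be matched on taxon names instead of the per-sample identifier. Below is a rough Python sketch of that idea, not anything mothur provides: the function names `parse_tax_summary` and `merge_samples` are made up for illustration, and it assumes each line has the five whitespace-separated fields shown above, with the identifier in the second field, the taxon name (containing no spaces) in the third, and the sequence count in the last.

```python
def parse_tax_summary(lines):
    """Map each lineage (tuple of taxon names from Root down) to its count.

    Assumption: fields are whitespace-separated and taxon names contain
    no spaces, as in the example lines of the post.
    """
    path_by_id = {}  # sample-local identifier (e.g. "0.1.1") -> lineage tuple
    counts = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # skip blank lines and a header row, if there is one
        rank_id, taxon, total = fields[1], fields[2], int(fields[-1])
        # The parent identifier is everything before the last dot ("" for Root).
        parent_id = rank_id.rpartition(".")[0]
        lineage = path_by_id.get(parent_id, ()) + (taxon,)
        path_by_id[rank_id] = lineage
        counts[lineage] = total
    return counts


def merge_samples(samples):
    """Combine parsed samples into rows ordered by a master sorted lineage list.

    samples: dict mapping sample name -> result of parse_tax_summary.
    Returns a list of (lineage string, [count per sample, in sorted name order]).
    """
    master = sorted({lineage for counts in samples.values() for lineage in counts})
    names = sorted(samples)
    return [
        (";".join(lineage), [samples[name].get(lineage, 0) for name in names])
        for lineage in master
    ]
```

Each file would be passed in as an open file handle (or any iterable of lines). Because the key is the full name path rather than the sample-specific number, a taxon missing from one sample simply gets a count of 0 in that column.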
How hard would this be to implement?
Jackson