classify.seqs taxonomy summary file formatting

jzlee2002 · February 9, 2010, 7:34pm

Hi Pat,
Thanks for putting together the classify function. I’ve been using it to compare classifications across many pyrotag-sequenced samples, but the way the output file is structured makes it difficult to match classifications between samples and compare their abundances.

I noticed that each sample fasta file (say, parsed from the trim function using the groups file) generates its own tax.summary file after classification. Each line of a tax.summary file has roughly this format:

0 0 Root 1 39725
1 0.1 Bacteria 26 39725
2 0.1.1 Acidobacteria 3 44

The main obstacle to using the classification identifier (here 0.1.1 for Acidobacteria) to match samples is that the numbering differs between tax.summary files: it is based on the classifications present in that sample rather than on a master sort order. From what I can tell, though, the classifications are listed in the same order whenever they are present. If the identifiers in every output file were assigned from a master sorted list rather than from only what the sample contains, matching classifications across samples would be much easier to parse.
How hard would this be to implement?
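
Until something like that exists, one workaround is to key each row on its chain of labels rather than on its rank ID. The following is a minimal Python sketch, not part of mothur. The function name `lineage_totals` is hypothetical, and the code assumes whitespace-separated columns in the order shown above, labels without spaces, and parent rows listed before their children.

```python
def lineage_totals(lines):
    """Map each lineage (tuple of labels from the root down) to its total.

    Assumed row layout: taxlevel, rank ID, label, daughter levels, total.
    """
    label_by_rank = {}  # rank ID within this file -> label
    totals = {}         # lineage tuple -> total sequence count
    for line in lines:
        fields = line.split()
        # Skip blank lines and any header row (first field not numeric).
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        _level, rank_id, label, _daughters, total = fields[:5]
        label_by_rank[rank_id] = label
        # "0.1.1" has ancestors "0" and "0.1"; look up each prefix's label.
        parts = rank_id.split(".")
        lineage = tuple(
            label_by_rank[".".join(parts[:i])]
            for i in range(1, len(parts) + 1)
        )
        totals[lineage] = int(total)
    return totals
```

Because the key is the label chain, Acidobacteria gets the same key in every sample regardless of whether a given file numbers it 0.1.1 or 0.1.2.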

Jackson

pschloss · February 9, 2010, 11:26pm

Your idea is a good one and I appreciate your feedback. It shouldn’t be too hard to implement - it’s on the list.

jzlee2002 · April 9, 2010, 12:50am

If I may add to this request, what I’m really trying to do would be covered by support for a “group” file option in classify.seqs. After classification, the tax.summary file could report, in one column per environment/sample, how many sequences at each taxonomic level came from that sample, instead of the current single entry giving the total number of sequences classified at that level. Right now I have a Python script that parses several tax.summary files in a crude way, but as soon as the file format changes I’ll have to rewrite it.

e.g.

taxlevel, rank ID, label, daughterlevels, total, sample1, sample2, sample3, ... etc.
0 0 Root 1 55 10 15 30
1 0.1 Bacteria 5 25 5 10 10
2 0.1.1 Actinobacteria 1 6 0 2 4
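
A table like this can be approximated outside mothur by merging per-sample counts and renumbering the rank IDs from one master sorted list. The sketch below is only an illustration under stated assumptions: `merge_samples` is a hypothetical name, the input is a dict mapping each sample name to a dict of lineage tuple (labels from the root down) to count, every lineage's parent is present in at least one sample, and the daughterlevels column is taken to mean the number of direct children seen across all samples, which may not match mothur's definition.

```python
def merge_samples(per_sample):
    """Merge per-sample lineage counts into rows with master rank IDs.

    per_sample: {sample_name: {lineage_tuple: count}}
    Returns (sample_names, rows); each row is
    (taxlevel, rank_id, label, daughters, total, count_per_sample...).
    """
    samples = sorted(per_sample)
    # Sorting tuples puts every parent lineage directly before its children.
    lineages = sorted({lin for totals in per_sample.values() for lin in totals})

    rank_ids = {}      # lineage -> master rank ID
    child_counts = {}  # lineage -> number of direct children seen
    for lin in lineages:
        parent = lin[:-1]
        if not parent:
            rank_ids[lin] = "0"  # root row
        else:
            child_counts[parent] = child_counts.get(parent, 0) + 1
            rank_ids[lin] = f"{rank_ids[parent]}.{child_counts[parent]}"

    rows = []
    for lin in lineages:
        counts = [per_sample[s].get(lin, 0) for s in samples]
        rows.append((len(lin) - 1, rank_ids[lin], lin[-1],
                     child_counts.get(lin, 0), sum(counts), *counts))
    return samples, rows
```

Since the rank IDs come from the union of all samples, a taxon absent from one sample simply shows a count of 0 there instead of shifting the numbering.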

I am attempting to create a figure like Fig. 4 in Brazelton et al. 2010: http://www.pnas.org/content/107/4/1612.abstract.

Jackson

pschloss · April 9, 2010, 11:19am

Ahhh… That’s a very cool idea. It won’t be in the next release, but it will definitely get in soon.

Thanks for the idea,
Pat
