mothur

Can you collapse a detail tax.summary file to simple format

Hi there,

Just wondering if there’s a command that will let me change my tax.summary file from detail to simple format without re-classifying the sequences. What I’m actually trying to do is get a simple format tax.summary file after using merge.taxsummary. I could run summary.tax on the individual taxonomy files again, but merge.taxsummary only works on detail formatted tax.summary files. So is there a way I can merge all my tax.summary files first and then collapse the merged file to simple format?

Thanks.

Hi - do you mean something like summary.tax?

Pat

Hi,

Ideally yes, but summary.tax doesn’t seem to work on a tax.summary file, only on a taxonomy file. merge.taxsummary can only write its output in detail format, and I’m trying to get that merged tax.summary into simple format.

Thanks.

Can you give an example of what you want? Simple format can mean pretty much anything.

Hi Kendra,

Yep sure. Essentially I want to change a tax.summary file that looks like this:

taxlevel rankID taxon daughterlevels total
0 0 Root 1 1882862
1 0.1 Bacteria 54 1882862
2 0.1.1 10bav-F6 1 35
3 0.1.1.1 10bav-F6_cl 1 35
4 0.1.1.1.1 10bav-F6_or 1 35
5 0.1.1.1.1.1 10bav-F6_fa 1 35
6 0.1.1.1.1.1.1 10bav-F6_ge 0 35

to look like this:

taxonomy total L32B7
Root 110 110
Bacteria;Actinobacteriota;Acidimicrobiia;Acidimicrobiia_unclassified;Acidimicrobiia_unclassified;Acidimicrobiia_unclassified; 1 1
Bacteria;Actinobacteriota;Actinobacteria;Actinobacteria_unclassified;Actinobacteria_unclassified;Actinobacteria_unclassified; 35 35
Bacteria;Actinobacteriota;Actinobacteria;Micrococcales;Micrococcaceae;Micrococcaceae_unclassified; 2 2
Bacteria;Actinobacteriota;Actinobacteria;Propionibacteriales;Propionibacteriaceae;Cutibacterium; 9 9

I know the taxon names don’t match in the two examples; they’re just copied and pasted from two separate files. The first is a tax.summary file output by merge.taxsummary and the second is a tax.summary file output by classify.seqs. I know I could run classify.seqs on all my samples together, and then the tax.summary would come out in my preferred format. But for samples/fasta files that were processed separately, if you want to compare the relative abundance of each taxon between those samples (using merge.taxsummary), the output comes out like the first example.
The reason I ask is that it’s so much easier to get the second format into an Excel sheet for phyloseq/downstream R analysis (pretty much copy/paste)! It seems like a lot of work to re-classify all the seqs, which can take a while.

Thanks for the replies, hope this makes sense.
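
For readers with the same question, the collapse described above can be sketched outside mothur in a few lines of Python. This is not a mothur command, only a rough illustration under several assumptions: the columns are whitespace-separated, each rankID is its parent's rankID plus one more dot-separated number (as in the first example), and the simple format lists Root plus every taxon whose daughterlevels is 0. The function name detail_to_simple is made up for this sketch.

```python
def detail_to_simple(lines):
    """Collapse detail-format tax.summary lines into lineage rows (a sketch)."""
    rows = [ln.split() for ln in lines if ln.strip()]
    header, records = rows[0], rows[1:]
    count_names = header[4:]  # "total" plus any per-sample columns

    # First pass: remember the taxon name behind every rankID.
    names = {rec[1]: rec[2] for rec in records}

    out = ["\t".join(["taxonomy"] + count_names)]
    for _taxlevel, rank_id, taxon, daughters, *counts in records:
        if rank_id == "0":
            # Root row is copied through as-is.
            out.append("\t".join([taxon] + counts))
        elif daughters == "0":
            # Assumption: only taxa with no daughters appear in simple format.
            parts = rank_id.split(".")
            # Walk the rankID prefixes 0.1, 0.1.1, ... to rebuild the lineage.
            lineage = [names[".".join(parts[:i])] for i in range(2, len(parts) + 1)]
            out.append("\t".join([";".join(lineage) + ";"] + counts))
    return out


if __name__ == "__main__":
    example = [
        "taxlevel rankID taxon daughterlevels total",
        "0 0 Root 1 1882862",
        "1 0.1 Bacteria 54 1882862",
        "2 0.1.1 10bav-F6 1 35",
        "3 0.1.1.1 10bav-F6_cl 1 35",
        "4 0.1.1.1.1 10bav-F6_or 1 35",
        "5 0.1.1.1.1.1 10bav-F6_fa 1 35",
        "6 0.1.1.1.1.1.1 10bav-F6_ge 0 35",
    ]
    print("\n".join(detail_to_simple(example)))
```

Any per-sample count columns after total are carried through unchanged, so a merged file with several samples should produce one column per sample.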

I’m confused. Why are you going through the summary file rather than through the *.taxonomy or *cons.taxonomy files? I’m not sure what’s in your second table that isn’t in those files.

Pat

Agree with Pat (though I don’t use phyloseq, so I’m not sure what format it requires), but it seems like you just want to use the .taxonomy file. The point of the tax.summary (in my opinion) is to show the nestedness.
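
As a rough illustration of going through the taxonomy file instead, here is a small Python sketch that totals sequence counts per lineage. It assumes a tab-separated *cons.taxonomy layout with a header row and three columns (OTU, Size, Taxonomy), with confidence scores in parentheses after each taxon name; check your own file before relying on it. The function name lineage_totals is made up for this example.

```python
import re
from collections import Counter


def lineage_totals(lines):
    """Sum the Size column per lineage string (a sketch, not a mothur command)."""
    totals = Counter()
    for line in lines[1:]:  # skip the assumed header row
        if not line.strip():
            continue
        _otu, size, taxonomy = line.rstrip("\n").split("\t")
        # Drop confidence scores such as "(100)" so equal lineages compare equal.
        lineage = re.sub(r"\(\d+\)", "", taxonomy)
        totals[lineage] += int(size)
    return totals


if __name__ == "__main__":
    # Invented rows for illustration only.
    sample = [
        "OTU\tSize\tTaxonomy",
        "Otu001\t30\tBacteria(100);Actinobacteriota(100);",
        "Otu002\t5\tBacteria(100);Actinobacteriota(98);",
        "Otu003\t9\tBacteria(100);Firmicutes(100);",
    ]
    for lineage, total in lineage_totals(sample).items():
        print(lineage, total, sep="\t")
```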
