Difference between .taxonomy and tax.summary

Hello there,

I am a bit confused by the .taxonomy file and tax.summary file and can’t seem to find the answer on the wiki and/or forum.

If I take a look at my .taxonomy file I get this output (first 2 lines):
OTU Size Taxonomy
Otu00001 943402 Bacteria(100);Firmicutes(100);Bacilli(100);Lactobacillales(100);Leuconostocaceae(100);Leuconostoc(100);

I think this gives the number of sequences that correspond to Otu00001 (943402). If I then search for Leuconostoc in my tax.summary file, I expect to find the same number under the ‘total’ column, but instead I get 429 as the total. Am I interpreting the ‘size’ column wrong in tax.summary? If so, what does this column represent?

Thanks in advance

The basis parameter allows you to indicate what you want the summary file to represent. The options are otu and sequence; the default is otu.

For example, basis=sequence could give Clostridiales 3 105 16 43 46, where 105 is the total number of sequences whose OTU classified to Clostridiales, 16 is the number of sequences in those OTUs from groupA, 43 is the number from groupB, and 46 is the number from groupC.

With basis=otu, it could instead give Clostridiales 3 7 6 1 2, where 7 is the number of OTUs that classified to Clostridiales, 6 is the number of OTUs containing sequences from groupA, 1 is the number containing sequences from groupB, and 2 is the number containing sequences from groupC.

You are correct, the size in the .taxonomy file is the number of sequences in the OTU. With the default basis=otu, the *.tax.summary file lists the number of OTUs that classified to the given taxonomy, not the number of sequences.
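
In case it helps to see the two counting modes side by side, here is a small Python sketch. It is not mothur's code, just an illustration of the arithmetic. The OTU names and per-group counts are made up so that they reproduce the Clostridiales numbers from the example, and the summarize function is a hypothetical helper.

```python
# Hypothetical per-OTU sequence counts by group. All seven OTUs are assumed
# to classify to Clostridiales. The numbers are invented for illustration.
otus = {
    "Otu1": {"groupA": 5, "groupC": 26},
    "Otu2": {"groupA": 3},
    "Otu3": {"groupA": 2},
    "Otu4": {"groupA": 2},
    "Otu5": {"groupA": 2},
    "Otu6": {"groupA": 2},
    "Otu7": {"groupB": 43, "groupC": 20},
}
groups = ["groupA", "groupB", "groupC"]


def summarize(otu_counts, groups, basis="otu"):
    """Return [total, count_for_group_1, count_for_group_2, ...] for one taxon."""
    if basis == "sequence":
        # Count sequences: sum the abundances over all OTUs.
        total = sum(sum(counts.values()) for counts in otu_counts.values())
        per_group = [
            sum(counts.get(g, 0) for counts in otu_counts.values())
            for g in groups
        ]
    elif basis == "otu":
        # Count OTUs: each OTU counts once, however many sequences it holds.
        total = len(otu_counts)
        per_group = [
            sum(1 for counts in otu_counts.values() if counts.get(g, 0) > 0)
            for g in groups
        ]
    else:
        raise ValueError("basis must be 'otu' or 'sequence'")
    return [total] + per_group


print("basis=sequence:", summarize(otus, groups, basis="sequence"))
print("basis=otu:     ", summarize(otus, groups, basis="otu"))
```

Note that with basis=otu the per-group numbers (6, 1, 2) can add up to more than the total (7), because one OTU can contain sequences from several groups. That is also why a single very large OTU contributes only 1 to the total in an OTU-based summary.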

Thank you! I did not know that option existed.

Many thanks!