Searching for command...

Hello,

I have been looking around in the manual but cannot seem to find a command to complete a task I’d like to do.

When we had our data sequenced, we got our raw data back, as well as some processing of it. Some of the analyzed files gave us OTU counts per sample at different taxonomic levels, within specific lineages. For example, at the phylum level:
                  sample 1   sample 2   sample 3 …
bacteroidetes        X          X          X
spirochaetes         X          X          X
proteobacteria       X          X          X

or genus:
                  sample 1   sample 2   sample 3 …
clostridium          X          X          X
roseburia            X          X          X
pseudomonas          X          X          X

Is there a command I have overlooked that produces this kind of output at different taxonomic levels? I would really like to compare the analysis that they did to the results from my own run-through of the SOP.

Thanks

Do you want OTUs or number of sequences from each sample?

I usually run either:
classify.seqs
OR
classify.otu
but with the group file included (the group parameter), so the counts are broken down by sample.

If you open the tax.summary file you should hopefully get what you want…

Thank you for your help!

It appears that the classify.seqs tax.summary file is pretty close to what I need. I would still prefer to have each taxonomic level in its own file/sheet. I don’t see any option in the command that would let me pick out the level of my choice. I could always cut and paste each level into its own sheet in Excel, but if there is an easier way I would be very happy to be enlightened as to what it is.

Also, it seems like there is a lot of “unclassified.” I used the trainset template as specified in the 454 SOP. Is there another template that would provide better results?

Thanks

The output should be easily parsable in R.
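
For anyone who would rather not cut and paste in Excel, here is a rough sketch of the kind of parsing meant here. It is written in Python rather than R, purely as an illustration. It assumes the tax.summary file is tab-delimited with the columns taxlevel, rankID, taxon, daughterlevels and total, followed by one count column per sample; check the header of your own file, since that layout is an assumption. The function names and the sample data are made up for the example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt in the layout ASSUMED for a tax.summary file:
# tab-delimited; taxlevel, rankID, taxon, daughterlevels, total,
# then one count column per sample. Not real data.
SAMPLE = (
    "taxlevel\trankID\ttaxon\tdaughterlevels\ttotal\tsample1\tsample2\n"
    "0\t0\tRoot\t1\t30\t10\t20\n"
    "1\t0.1\tBacteria\t2\t30\t10\t20\n"
    "2\t0.1.1\tBacteroidetes\t1\t18\t8\t10\n"
    "2\t0.1.2\tProteobacteria\t1\t12\t2\t10\n"
)


def split_by_level(text, level_col="taxlevel", taxon_col="taxon",
                   skip_cols=("rankID", "daughterlevels", "total")):
    """Group rows by taxonomic level, keeping the taxon name and per-sample counts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    # Every column that is not bookkeeping is treated as a sample column.
    sample_cols = [c for c in reader.fieldnames
                   if c not in (level_col, taxon_col, *skip_cols)]
    tables = defaultdict(list)
    for row in reader:
        counts = [int(row[c]) for c in sample_cols]
        tables[row[level_col]].append([row[taxon_col]] + counts)
    return sample_cols, dict(tables)


def write_tables(sample_cols, tables, prefix):
    """Write one tab-delimited file per taxonomic level, e.g. <prefix>.level2.tsv."""
    for level, rows in tables.items():
        with open(f"{prefix}.level{level}.tsv", "w", newline="") as handle:
            writer = csv.writer(handle, delimiter="\t")
            writer.writerow(["taxon"] + sample_cols)
            writer.writerows(rows)


samples, tables = split_by_level(SAMPLE)
```

To use it on a real file, read the file contents into a string, pass that to split_by_level, and then call write_tables with whatever output prefix you like. Each resulting file opens directly as its own sheet in Excel.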

As for the unclassifieds, that’s all part of the problem with database-dependent approaches like classification. Any of several things can be behind it: your sequences are too short, the 16S gene doesn’t have enough resolution, the database is incomplete, the overall taxonomic hierarchy is limited, etc.

Pat