Apologies if this subject has already been discussed. I am attempting to look at the taxonomic diversity of 15 samples at various OTU definitions. I processed all samples together and ran classify.seqs with the group option. This gives me a *.summary file for all samples that includes total clones for each taxonomic level.
What I am interested in is a summary file with total OTUs at each level instead of total clones. As an example, Sample A and Sample B may both have 100 clones of Actinobacteria, but the actual taxa evenness may be considerably different between the two samples. Sample A may have 2 Actino OTUs at 97% while Sample B may have 10.
I imagine a combination of classify.seqs and classify.otu where I can assess the taxonomic diversity between samples at different OTU definitions. Can anyone think of an efficient work-around for this analysis?
Hi again -
For those interested (otherwise I am conversing with myself on a message board), I may have found a solution to this query. Funny that I have been looking for a solution to this problem for a while now and it came to me only after posting my earlier message, so sorry about that.
First I ran:
get.oturep(column=all.filter.unique.dist, fasta=all.fasta, name=all.filter.names, list=all.filter.unique.fn.list, sorted=group, group=all.groups, label=0.00-0.01-0.02-0.03-0.05-0.10)
this produced two types of files for each OTU definition:
*.rep.names and *.rep.fasta
Next I ran the following command at each OTU definition, and the result is a .summary file for each OTU definition with OTU totals (rather than clone totals). If you run this command with the *.rep.names file from the get.oturep step, the summary file will contain total clones:
classify.seqs(fasta=all.filter.unique.fn.0.XX.rep.fasta, template=nogap.full.fasta, taxonomy=silva.full.silva.tax, cutoff=80, name=all.filter.names, group=all.groups)
That approach makes sense, but be careful because there may be situations where, depending on the definition, sequences in an OTU have different taxonomies. What we're encouraging people to do as an alternative is to classify all of your sequences and then use the classify.otu command. I've just added this method to the Costello example analysis page. This doesn't exactly solve your problem, however: we don't have a command to count the number of OTUs in each taxonomic bin. Perhaps the easiest way of doing this would be to use a text editor, Excel, or R to count the number of times you see each taxonomy string.
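If you would rather script the counting than do it by hand, here is a minimal Python sketch of the "count each string" idea. It is not a mothur command. It assumes the classify.otu output is tab-delimited with the consensus taxonomy in the last column, written like Bacteria(100);Actinobacteria(100); and that the first line is a header. Please check your own file, since the layout may differ between mothur versions. The function name and the example rows are made up for illustration.

```python
import re
from collections import Counter


def count_otus_per_taxon(lines, has_header=True):
    """Count OTUs per taxonomic bin at every depth of the lineage.

    Assumes one OTU per line, tab-delimited, taxonomy in the last column.
    """
    counts = Counter()
    rows = iter(lines)
    if has_header:
        next(rows, None)  # skip the assumed header line
    for line in rows:
        line = line.strip()
        if not line:
            continue
        taxonomy = line.split("\t")[-1]
        # Drop confidence scores such as "(100)" and the trailing semicolon.
        taxa = [re.sub(r"\(\d+\)$", "", t) for t in taxonomy.strip(";").split(";")]
        # Each OTU adds one to every bin along its lineage.
        for depth in range(1, len(taxa) + 1):
            counts[(depth, ";".join(taxa[:depth]))] += 1
    return counts


# Made-up rows in the assumed format (not real data).
example = [
    "OTU\tSize\tTaxonomy",
    "Otu1\t60\tBacteria(100);Actinobacteria(100);",
    "Otu2\t40\tBacteria(100);Actinobacteria(98);",
    "Otu3\t25\tBacteria(100);Firmicutes(100);",
]

for (depth, lineage), n in sorted(count_otus_per_taxon(example).items()):
    print(depth, lineage, n)
```

For a real file you could pass open(path) in place of the example list. This counts OTUs across everything in the file, so for per-sample numbers you would need to run it on a taxonomy file produced for each group separately.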
Indeed - I had not thought of this but you are correct. I think your point is especially important at finer taxonomic scales.
Great, though! Thanks for the advice.