sub.sample and OTUs order of operations

Hi Pat,

I think I must be misinterpreting something here. In the 454 SOP you mention that ‘the final step in getting good OTU data is to normalize the number of sequences in each sample’, but sub.sample is run on the shared file. When I pull the subsampled shared file and the majority consensus taxonomy file into Excel, the shared file is subsampled but the taxonomy file is not (i.e. the name.final.an.0.03.subsample.shared file does not match the name.final.an.0.03.cons.taxonomy file, but the name.final.an.shared file does). Should I sub.sample the final.fasta, final.groups, and final.names files using persample prior to generating the distance matrix? Or the list, names and taxonomy files used in classify.otu? Or is the general idea to use the subsampled shared file and only the corresponding OTUs from the consensus taxonomy file?

Thanks

Or is the general idea to use the subsampled shared file and only the corresponding OTUs from the consensus taxonomy file?

Correct - you get the shared file and cons.taxonomy files and then subsample/rarefy the data.

Pat

Thanks Pat,

I’ve been doing this manually in Excel by bringing in the name.final.an.0.03.subsample.shared file and the name.final.an.0.03.cons.taxonomy file and deleting the OTUs from the taxonomy file that are not in the shared file. Is there a way to automate this, or am I missing an output step?

thanks
haley
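
The manual Excel step described above (dropping taxonomy rows whose OTUs no longer appear in the subsampled shared file) could be scripted. The following is only a minimal sketch, not a mothur feature. It assumes the shared file is tab-separated with the columns label, Group, numOtus followed by one column per OTU label, and that the cons.taxonomy file has a header row with the OTU label in its first column. The function names and the example data are hypothetical.

```python
import csv
import io


def read_shared_otus(shared_text):
    """Return the OTU labels found in the header row of a shared file.

    Assumed layout: label, Group, numOtus, then one column per OTU.
    """
    reader = csv.reader(io.StringIO(shared_text), delimiter="\t")
    header = next(reader)
    # Skip the three fixed columns; drop empty cells from a trailing tab.
    return [name for name in header[3:] if name]


def filter_taxonomy(taxonomy_text, keep_otus):
    """Keep the header plus the taxonomy rows whose first column is in keep_otus."""
    keep = set(keep_otus)
    lines = taxonomy_text.splitlines()
    header, rows = lines[0], lines[1:]
    kept = [row for row in rows if row.split("\t", 1)[0] in keep]
    return "\n".join([header] + kept) + "\n"


# Hypothetical example data: Otu002 was lost during subsampling.
shared = (
    "label\tGroup\tnumOtus\tOtu001\tOtu003\n"
    "0.03\tA\t2\t10\t5\n"
    "0.03\tB\t2\t7\t8\n"
)
taxonomy = (
    "OTU\tSize\tTaxonomy\n"
    "Otu001\t17\tBacteria(100);Firmicutes(100);\n"
    "Otu002\t1\tBacteria(100);Proteobacteria(100);\n"
    "Otu003\t13\tBacteria(100);Bacteroidetes(100);\n"
)
filtered = filter_taxonomy(taxonomy, read_shared_otus(shared))
```

With real files, the two text arguments would come from reading name.final.an.0.03.subsample.shared and name.final.an.0.03.cons.taxonomy. The filter leaves every other column untouched, so any per-OTU counts in the taxonomy file would still be the values from before subsampling.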

Hi Pat, I think I need to rephrase my question.

I was interpreting ‘getting good OTU data’ as the construction and classification of the OTUs themselves, not downstream analyses using the OTUs. With respect to classifying sequences, I’ve tried the following approaches (out of curiosity). The values change only slightly between them, and I’m wondering whether that is just stochastic variation or whether one method is more robust than the others. Essentially, I’m just trying to create a stacked bar chart of phylum-level diversity for each of my samples, showing the phyla above 1% abundance.

-Using the summary file output of classify.seqs
-Using the shared file and taxonomy file from classify.otu
-Using the subsampled shared file and selected OTUs from the taxonomy file
-Subsampling final.fasta, final.groups, and final.names files using persample and then using the output shared file and taxonomy file from classify.otu

Since I’ve already classified the sequences in classify.seqs, and can get a representative sequence for each OTU using get.oturep, is there any need to run classify.otu to work with the taxonomy file?

Thanks
Haley

For your application, I think the easiest would be to use the output from classify.seqs.

Pat
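
For the stacked bar chart, the per-sample phylum fractions could be computed from the classify.seqs summary output along these lines. This is a rough sketch under stated assumptions, not a definitive recipe. It assumes the summary file is tab-separated with the columns taxlevel, rankID, taxon, daughterlevels, total, then one column per sample, and that phylum rows carry taxlevel 2. Check both against your own file. The function name, the "Other" bin for phyla at or below the 1% cutoff, and the example data are hypothetical.

```python
import csv
import io

# Columns assumed to precede the per-sample count columns.
FIXED_COLUMNS = {"taxlevel", "rankID", "taxon", "daughterlevels", "total"}


def phylum_relative_abundance(summary_text, phylum_level="2", threshold=0.01):
    """Return {sample: {phylum: fraction}}, pooling rare phyla into 'Other'."""
    reader = csv.DictReader(io.StringIO(summary_text), delimiter="\t")
    samples = [c for c in reader.fieldnames if c and c not in FIXED_COLUMNS]
    counts = {s: {} for s in samples}
    for row in reader:
        if row["taxlevel"] != phylum_level:
            continue  # keep only the rows at the assumed phylum level
        for s in samples:
            counts[s][row["taxon"]] = counts[s].get(row["taxon"], 0) + int(row[s])

    result = {}
    for s in samples:
        total = sum(counts[s].values())
        fractions, other = {}, 0.0
        for taxon, n in counts[s].items():
            frac = n / total if total else 0.0
            if frac > threshold:
                fractions[taxon] = frac
            else:
                other += frac  # phyla at or below the cutoff are pooled
        if other > 0:
            fractions["Other"] = other
        result[s] = fractions
    return result


# Hypothetical example data with two samples, A and B.
summary = (
    "taxlevel\trankID\ttaxon\tdaughterlevels\ttotal\tA\tB\n"
    "0\t0\tRoot\t1\t400\t200\t200\n"
    "1\t0.1\tBacteria\t3\t400\t200\t200\n"
    "2\t0.1.1\tFirmicutes\t0\t250\t150\t100\n"
    "2\t0.1.2\tProteobacteria\t0\t148\t49\t99\n"
    "2\t0.1.3\tTM7\t0\t2\t1\t1\n"
)
abundance = phylum_relative_abundance(summary)
```

Each sample's fractions are computed relative to that sample's own phylum-level total, so samples with different sequencing depths can be drawn side by side. The resulting dictionary can be handed to whatever plotting tool you prefer.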