classify.otu with normalised data

andreas · March 8, 2013, 11:43am

Hi,
after processing my 16S rRNA pyrosequencing data (more or less following your SOP), I want to normalize the number of sequences in each sample. Both sub.sample and normalize.shared worked well with a shared file, which is very useful for multivariate analysis of the bacterial communities. However, I also want to compare community structure across the samples at different taxonomic levels. The classify.otu command (with the group option) provides this data, but I cannot use it with the normalised shared file. Is it possible to normalise the list file in the same way as the shared file, or is there another way to run classify.otu on a normalised data set?

As a workaround, I ran phylotype + make.shared + sub.sample separately for each taxonomic level. However, when I compared the summary file from classify.otu with the shared file from make.shared, the number of OTUs and the number of sequences per OTU differed at the two lowest taxonomic levels (label = 1 or 2).

What do you think is the best way?

Thank you
Andreas

pschloss · March 8, 2013, 6:04pm

I’m not sure why the classify.otu output would be affected by subsampling. The number in parentheses in the output is the fraction of sequences in the OTU that have that classification. So it’s a relative abundance, and we only accept classifications above 50% as valid.
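As a rough illustration of the percentages Pat describes, here is a simplified consensus for a single OTU: at each rank it finds the most common lineage among the OTU's sequences and stops once that lineage no longer covers more than 50% of them. This is a sketch, not mothur's code; the function name is hypothetical and the taxonomy strings are assumed to have mothur's bootstrap values already stripped.

```python
from collections import Counter


def consensus_taxonomy(taxonomies, cutoff=50.0):
    """Simplified OTU consensus (hypothetical helper, not mothur's implementation).

    Walks down the ranks and keeps a rank only while its most common lineage
    covers more than `cutoff` percent of the OTU's sequences.
    """
    n = len(taxonomies)
    # "Bacteria;Firmicutes;Bacilli;" -> ["Bacteria", "Firmicutes", "Bacilli"]
    lineages = [[t for t in tax.split(";") if t] for tax in taxonomies]
    consensus = []
    for level in range(max(len(lin) for lin in lineages)):
        # Count full lineages up to this rank so parent and child always agree
        counts = Counter(tuple(lin[: level + 1]) for lin in lineages if len(lin) > level)
        lineage, count = counts.most_common(1)[0]
        percent = 100.0 * count / n
        if percent <= cutoff:
            break
        consensus.append((lineage[-1], round(percent)))
    return consensus


# Hypothetical OTU with four member sequences
otu_members = [
    "Bacteria;Firmicutes;Bacilli;",
    "Bacteria;Firmicutes;Clostridia;",
    "Bacteria;Firmicutes;Bacilli;",
    "Bacteria;Bacteroidetes;Bacteroidia;",
]
print(consensus_taxonomy(otu_members))  # [('Bacteria', 100), ('Firmicutes', 75)]
```

The class rank is dropped here because Bacilli reaches only 50%, which is not above the cutoff.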

andreas · March 11, 2013, 8:05am

Sorry, my question is not about how to get the consensus taxonomy for each OTU.
I think the summary file from classify.otu (using the group file and basis=sequence or otu) provides very helpful information about community composition at different taxonomic levels. You can compare the number and relative abundance of taxa between treatments (using replicates, of course).
If I subsample my data, the total number of OTUs goes down, and this should also reduce the number of taxa present in individual samples.
Therefore it would be great to be able to run it with the same number of sequences per sample.
Many thanks for your help
Andreas

pschloss · March 11, 2013, 12:02pm

Sorry for the confusion. Here’s a workaround that you can try…

1. Run sub.sample with the list file
2. Run list.seqs on the resulting list file
3. Use get.seqs with the output and the original taxonomy file with dups=T
4. Use summary.tax with the newly subsampled taxonomy file and names file
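The core of this workaround is that the taxonomy is summarized only for the sequences that survived subsampling. A minimal Python sketch of that filter-and-count step, assuming plain dictionaries in place of mothur's files and ignoring the duplicate counts a names file would add (the function and variable names are hypothetical):

```python
from collections import Counter


def summarize_taxonomy(taxonomy, keep_names):
    """Count sequences per lineage at every rank, using only keep_names.

    taxonomy: sequence name -> "Kingdom;Phylum;...;" (confidence values removed).
    keep_names: the names that survived subsampling (e.g. the list.seqs output).
    Hypothetical helper; each name is assumed to stand for exactly one sequence.
    """
    counts = Counter()
    for name in keep_names:
        lineage = [taxon for taxon in taxonomy[name].split(";") if taxon]
        for rank in range(1, len(lineage) + 1):
            # Key on the full lineage so each taxon is counted under its parent
            counts[tuple(lineage[:rank])] += 1
    return counts


# Hypothetical taxonomy file contents and subsampled names
taxonomy = {
    "seq1": "Bacteria;Firmicutes;Bacilli;",
    "seq2": "Bacteria;Proteobacteria;Gammaproteobacteria;",
    "seq3": "Bacteria;Firmicutes;Clostridia;",
}
summary = summarize_taxonomy(taxonomy, ["seq1", "seq3"])
for lineage, n in sorted(summary.items()):
    print(";".join(lineage), n)
```

Because seq2 was not drawn, Proteobacteria does not appear in the summary at all, which is the effect the subsampled taxonomy file is meant to produce.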

We’ll work on allowing a taxonomy file to go into the sub.sample command for a future release.

Hope this helps,
Pat

andreas · March 11, 2013, 12:35pm

Many thanks for your quick response.
I tried the sub.sample command with the list file, but it only produced a subset of all sequences, regardless of sample. Do I need to include a specific “group option” or something similar to normalize the number of sequences in each sample?
Thanks
Andreas

westcott · March 11, 2013, 4:39pm

Sorry for this tedious workaround; the taxonomy file will be added as an option to sub.sample in our next release.

1. sub.sample(list=final.an.list, group=final.groups, persample=t) - selects the same number of seqs from each group
2. list.seqs(list=current) - lists the seqs in the subsample

Workaround to avoid issues with sequence names (this will be resolved in the next release):
3. remove.seqs(list=final.an.list, accnos=current) - creates a list file with the seqs not in the subsample
4. list.seqs(list=current) - lists the seqs not in the subsample
5. remove.seqs(taxonomy=yourTaxonomyFile, name=yourNameFile, dups=f, accnos=current) - removes all seqs that are not in the subsampled list file

6. summary.tax(taxonomy=current, name=current)
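The persample=t option is what makes every group end up with the same number of sequences; without it, sub.sample draws from the pooled data and the groups keep uneven counts, which is the behaviour described in the question above. A small Python sketch contrasting the two strategies, using a hypothetical name-to-group dictionary rather than mothur's files (this illustrates the idea, not mothur's sampling code):

```python
import random
from collections import Counter, defaultdict


def subsample_per_group(groups, size, seed=None):
    """Draw `size` sequence names from every group (the persample=t idea)."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for name, group in groups.items():
        by_group[group].append(name)
    kept = []
    for group, names in sorted(by_group.items()):
        if len(names) < size:
            raise ValueError(f"group {group!r} has only {len(names)} sequences")
        # Sort first so a fixed seed gives reproducible draws
        kept.extend(rng.sample(sorted(names), size))
    return kept


def subsample_pooled(groups, size, seed=None):
    """Draw `size` names from all sequences together, ignoring groups."""
    rng = random.Random(seed)
    return rng.sample(sorted(groups), size)


# Hypothetical data: sample A has 10 sequences, sample B has 30
groups = {f"{g}{i}": g for g, n in (("A", 10), ("B", 30)) for i in range(n)}

per_group = subsample_per_group(groups, 10, seed=1)
pooled = subsample_pooled(groups, 20, seed=1)
print(Counter(groups[name] for name in per_group))  # Counter({'A': 10, 'B': 10})
print(Counter(groups[name] for name in pooled))     # total is 20; per-group split left to chance
```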

Kindly,
Sarah

andreas · March 12, 2013, 7:24am

Thank you so much, it works very well.
Best wishes,
Andreas

svazquez · May 14, 2014, 12:10pm

I’m sorry, I’m new to Mothur and could really use some help.

I am analyzing 8 samples (16S clone libraries) and have the same question about running the classify.otu command on the subsampled data set. I tried what is explained here, but I ended up with a classification of sequences instead of OTUs.
Following the SOP tutorial more or less, I subsampled all 8 samples to 96 sequences each (the smallest sample size). Running “sub.sample(shared=final.an.shared, size=96)” gave me the file “final.an.0.03.subsample.shared”, which reduced the total number of OTUs from 525 to 512.
Then I ran the classify.otu command: classify.otu(list=final.an.list, name=final.names, taxonomy=final.taxonomy, label=0.03)
I got the classification of the original 525 OTUs, not the 512 subsampled OTUs.
How can I run the classify.otu command on the subsampled shared file?

One more thing: the .rabund files for each library are based on the total number of sequences in that library, not on the final normalized number of sequences per library (sample). Is it possible to also get .rabund files for the subsampled set of sequences?

Thank you!

adamc83 · May 14, 2014, 6:51pm

You can subsample list, name, and taxonomy files with sub.sample. You are running classify.otu on non-subsampled inputs, so you wouldn’t get subsampled outputs.

svazquez · May 20, 2014, 3:36pm

Thank you for your answer! So should I subsample the final list file, build the shared file from it, and use the subsampled list file in all the remaining commands? In other words, how do I get a subsampled list file that corresponds to the subsampled shared file I need for the OTU analyses? I need the classification of the subsampled OTUs, plus a shared file for the alpha and beta diversity analyses that corresponds to the subsampled OTUs identified in each sample.
Sorry, and thanks

adamc83 · May 21, 2014, 8:04pm

I don't have your data, but it looks like you want to sub.sample at least your list, name, taxonomy, and group files.

You can then run make.shared on the newly subsampled list and group files to get a shared file containing only subsampled sequences for use in commands that require a shared file.
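Conceptually, make.shared counts, for each group, how many of its sequences fall into each OTU of the list file, which is why a shared file built from a subsampled list and group file contains only the subsampled sequences. A Python sketch under the assumption of unique sequence names (no names or count file weighting); the dictionaries are hypothetical stand-ins for the files:

```python
from collections import Counter


def make_shared(otus, groups):
    """Count sequences per OTU for each group (hypothetical helper).

    otus: OTU label -> sequence names in that OTU (one list-file line).
    groups: sequence name -> group name (the group file).
    """
    shared = {}
    for label, names in otus.items():
        for name in names:
            shared.setdefault(groups[name], Counter())[label] += 1
    return shared


# Hypothetical subsampled list and group data
otus = {"Otu001": ["s1", "s2", "s3"], "Otu002": ["s4"]}
groups = {"s1": "A", "s2": "B", "s3": "A", "s4": "B"}
for group, row in sorted(make_shared(otus, groups).items()):
    print(group, [row[label] for label in otus])
# A [2, 0]
# B [1, 1]
```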

svazquez · May 22, 2014, 12:01pm

Thank you! I was afraid that if I subsampled all the files separately, the sequences chosen to be discarded would not be the same, and that I would therefore need to follow a specific sequence of steps to make sure all the subsampled files had the same set of sequences with their corresponding names and classifications. I’ll try what you suggest with my final files.

svazquez · August 28, 2014, 12:16pm

After my last question in this thread I gave up on subsampling the list, name and group files. Now I am trying again with a new data set. If I subsample the list, names, taxonomy and group files, then instead of every group containing the same number of seqs (the number in the smallest sample/group), the subsample contains that number of seqs in total. Running count.groups shows a different number of seqs in each sample, and together they add up to the number I specified with the size parameter in sub.sample.

I tried the workaround explained above but for some reason I could not get what I want.

Is there now a way to run classify.otu with a subsampled shared file, so that the output contains the classification of only the subsampled OTUs?

Thank you!

pschloss · August 28, 2014, 5:57pm

Please just subsample the shared file as we do in the SOP.

Pat

svazquez · August 29, 2014, 9:45am

Pat, I did. But then, when I want to know the affiliation of the OTUs I am working with, I have to go through them one by one to find which ones are still in my taxonomy file but no longer in my OTU table, because they were thrown away during subsampling.
Is there a way to run classify.otu using a shared file? Or to get list and name files that match the subsampled shared file? That way, when I show the affiliation of the OTUs, I would show only those used in the alpha and beta diversity analyses, not all the OTUs that were built at first, before subsampling.
Maybe I am not understanding this properly, but when I plot the classification of the OTUs to show the taxonomic composition of my samples, I get all of them and not only those used in the analysis.

pschloss · August 30, 2014, 12:48pm

If you run classify.otu on your list file, then make.shared on your list file, and finally subsample the shared file, the OTU numbers will be consistent between the output from classify.otu and the subsampled shared file. This is what we do in the SOPs.

svazquez · September 1, 2014, 9:57am

When I run classify.otu on my list file, I get 2152 classified OTUs. When I run make.shared on my list file, I get 2152 OTUs. But when I then subsample the shared file, I end up with 2147 OTUs. Even though 5 OTUs are not many, I cannot find a way to identify them so that I can remove them from the taxonomy file and show, as a stacked barplot, the classification of only the OTUs in the subsampled shared file that I used as input for the heatmap and ordination (I have to show this for all samples, at genus and phylum levels). Anyway, in this particular case I don't think 5 OTUs will make a difference, as the ones that disappeared with subsampling were likely “rare” OTUs with low abundance in the samples. Am I right?
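Whether the lost OTUs are mostly rare can be reasoned about with the hypergeometric distribution: if a sample with N reads is subsampled without replacement to k reads, an OTU with n reads in that sample is missed with probability C(N-n, k) / C(N, k), and it disappears from the subsampled table only if it is missed in every sample. A Python sketch, assuming each sample is subsampled independently (the function name is hypothetical):

```python
from math import comb


def prob_otu_lost(otu_counts, sample_sizes, size):
    """Probability that an OTU has no reads left in any sample after each sample
    is independently subsampled, without replacement, to `size` reads.

    otu_counts[i]: reads of the OTU in sample i; sample_sizes[i]: reads in sample i.
    """
    p = 1.0
    for n, total in zip(otu_counts, sample_sizes):
        # Hypergeometric P(X = 0): all `size` draws come from the other total - n reads
        p *= comb(total - n, size) / comb(total, size)
    return p


# Singleton in one 200-read sample, absent from a 150-read sample, subsampled to 100
print(prob_otu_lost([1, 0], [200, 150], 100))  # 0.5
# An OTU with 50 reads in the same 200-read sample is essentially never lost
print(prob_otu_lost([50, 0], [200, 150], 100))
```

Under these assumptions a singleton has even odds of vanishing, while an OTU with 50 reads has a loss probability below 0.5^50, so dropped OTUs are overwhelmingly the low-abundance ones.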

pschloss · September 10, 2014, 5:45pm

The OTU names in the shared file and the cons.taxonomy file are the same. It’s not an issue.
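Because the labels match, the OTUs that dropped out can be found by comparing labels instead of checking them one by one. A Python sketch, assuming a tab-separated shared file whose columns are label, Group, numOtus followed by one column per OTU label (a single distance label), and a cons.taxonomy file whose first column is the OTU label; summing each OTU column works whether or not empty columns were already removed. The function names and sample contents are hypothetical, and in practice the text would be read from the files with open(...).read():

```python
def otus_with_reads(shared_text):
    """OTU labels with at least one read in a (subsampled) shared file."""
    rows = [line.rstrip().split("\t") for line in shared_text.strip().splitlines()]
    header, data = rows[0], rows[1:]
    labels = header[3:]  # columns after label, Group, numOtus
    return {
        label
        for i, label in enumerate(labels)
        if sum(int(row[3 + i]) for row in data) > 0
    }


def filter_cons_taxonomy(cons_text, keep):
    """Header plus the cons.taxonomy lines whose OTU label is in `keep`."""
    lines = cons_text.strip().splitlines()
    return [lines[0]] + [line for line in lines[1:] if line.split("\t")[0] in keep]


# Hypothetical subsampled shared file and cons.taxonomy contents
shared = "label\tGroup\tnumOtus\tOtu001\tOtu002\tOtu003\n0.03\tA\t3\t5\t0\t1\n0.03\tB\t3\t4\t0\t0\n"
cons = "OTU\tSize\tTaxonomy\nOtu001\t12\tBacteria(100);\nOtu002\t1\tBacteria(100);\nOtu003\t3\tBacteria(100);\n"

keep = otus_with_reads(shared)
print(sorted(keep))  # ['Otu001', 'Otu003']
print([line.split("\t")[0] for line in filter_cons_taxonomy(cons, keep)])
# ['OTU', 'Otu001', 'Otu003']
```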
