When to subsample? - correspondence of shared and taxonomy file

cottologist · July 24, 2015, 4:34pm

Hi,

I recently had a discussion about when to subsample to the same number of reads per sample.

I normally run the commands in the following order:

dist.seqs(fasta=cz.good.unique.good.filter.unique.precluster.pick.pick.fasta, cutoff=0.20)

cluster(column=cz.good.unique.good.filter.unique.precluster.pick.pick.dist, count=cz.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table)

make.shared(list=cz.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list,count=cz.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table,label=0.03)

sub.sample(shared=cz.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.shared,list=cz.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list,taxonomy=cz.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.taxonomy,count=cz.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table,size=2997,persample=T)

classify.otu(list=cz.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.0.03.subsample.list, count=cz.good.unique.good.filter.unique.precluster.uchime.pick.pick.subsample.count_table, taxonomy=cz.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.subsample.taxonomy)

Is that wrong? Should I rather subsample before making the shared file?
Thanks, C.

Kendra · July 25, 2015, 12:37am

I think you should sub.sample after making the shared file, and only for certain things (like indicator species). Alpha and beta diversity in mothur should be run on the whole shared matrix because mothur will subsample repeatedly to calculate those. Repeated subsampling gives you a better idea of your data than a single subsampling.
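
To illustrate why repeated subsampling is more informative than a single draw, here is a minimal Python sketch, not mothur's actual implementation. It rarefies one sample's OTU counts to a fixed size many times and averages the observed richness. The function names and the `iters` and `seed` parameters are made up for this example.

```python
import random
from statistics import mean

def rarefy_once(counts, size, rng):
    """Draw `size` reads without replacement and return per-OTU counts."""
    # Expand the counts into one entry per read, labelled by OTU index.
    reads = [otu for otu, n in enumerate(counts) for _ in range(n)]
    if size > len(reads):
        raise ValueError("sample has fewer reads than the subsample size")
    sub = [0] * len(counts)
    for otu in rng.sample(reads, size):
        sub[otu] += 1
    return sub

def observed_otus(counts):
    """Number of OTUs with at least one read."""
    return sum(1 for n in counts if n > 0)

def mean_rarefied_richness(counts, size, iters=1000, seed=1):
    """Average observed richness over `iters` independent subsamples."""
    rng = random.Random(seed)
    values = [observed_otus(rarefy_once(counts, size, rng)) for _ in range(iters)]
    return mean(values)
```

A single call to `rarefy_once` can gain or lose rare OTUs by chance, while the average over many draws is much more stable, which is the point of letting the alpha and beta diversity commands do the subsampling themselves.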

cottologist · July 26, 2015, 10:13am

Thanks, kmitchell, that’s what I did so far.

However, the question I had was the following:
I would like to plot the bacterial taxonomy, and for that I planned to merge the shared file with the cons.taxonomy file and then plot the relative abundance of each taxon. If there’s a much easier way to do it, forget my question and please tell me how.
The problem I was facing with my strategy was that the OTUs in my subsampled shared file don’t correspond with the OTUs in the cons.taxonomy file, so I was trying to find a way either to subsample much earlier or to subsample the cons.taxonomy file as well.

Thanks, C.

Edit:

I just found a user having the same question: http://www.mothur.org/forum/viewtopic.php?f=3&t=2006&start=10

The problem he was describing is the same one I am facing: after the command procedure suggested by P. Schloss, I have different OTUs in my subsample.cons.taxonomy file compared to my subsample.shared file. The number of OTUs differs slightly and, more importantly, the OTUs themselves are different. In my shared file I have, for instance, OTU 00030, but it doesn’t appear in the taxonomy file, and vice versa.
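
For reference, the merge described in this post could be sketched in Python roughly as follows. This is only an illustration: it assumes a tab-delimited shared file with the columns label, Group, numOtus followed by one column per OTU, a cons.taxonomy file with the columns OTU, Size, Taxonomy, and a single label (e.g. 0.03) in the shared file. The function names, the `level` parameter, and the tiny data set are invented for the example.

```python
import csv
import io
import re
from collections import defaultdict

def read_shared(handle):
    """Return {sample: {otu_label: count}} from a shared file (one label assumed)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    otus = [h for h in header[3:] if h]  # skip label, Group, numOtus
    table = {}
    for row in reader:
        if len(row) < 4:
            continue
        counts = [int(x) for x in row[3:] if x]
        table[row[1]] = dict(zip(otus, counts))
    return table

def read_cons_taxonomy(handle, level=2):
    """Return {otu_label: taxon name at rank `level`} (1 = highest rank)."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header line: OTU, Size, Taxonomy
    taxa = {}
    for row in reader:
        if len(row) < 3:
            continue
        # Drop confidence values such as "(100)" from each rank name.
        ranks = [re.sub(r"\(\d+\)$", "", r) for r in row[2].split(";") if r]
        taxa[row[0]] = ranks[level - 1] if len(ranks) >= level else "unclassified"
    return taxa

def relative_abundance(shared, taxa):
    """Sum OTU counts per taxon and divide by each sample's total reads."""
    result = {}
    for sample, counts in shared.items():
        total = sum(counts.values())
        per_taxon = defaultdict(int)
        for otu, n in counts.items():
            # OTUs absent from the taxonomy file are collected in one bucket.
            per_taxon[taxa.get(otu, "no_taxonomy")] += n
        result[sample] = {t: n / total for t, n in per_taxon.items()} if total else {}
    return result

# Made-up example data standing in for real files.
SHARED = (
    "label\tGroup\tnumOtus\tOtu0001\tOtu0002\tOtu0003\n"
    "0.03\tA\t3\t6\t3\t1\n"
    "0.03\tB\t3\t0\t5\t5\n"
)
TAXONOMY = (
    "OTU\tSize\tTaxonomy\n"
    "Otu0001\t6\tBacteria(100);Firmicutes(100);\n"
    "Otu0002\t8\tBacteria(100);Proteobacteria(98);\n"
    "Otu0003\t6\tBacteria(100);Firmicutes(90);\n"
)
demo = relative_abundance(
    read_shared(io.StringIO(SHARED)),
    read_cons_taxonomy(io.StringIO(TAXONOMY)),
)
```

With real data the two `io.StringIO` objects would be replaced by open file handles. A nonzero "no_taxonomy" fraction would point to exactly the mismatch described above, where OTU labels in the shared file have no counterpart in the taxonomy file.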

pschloss · August 3, 2015, 2:39pm

Hi,

kmitchell is correct: for things like metastats, lefse, and classify.rf you want to run sub.sample. If you want to use alpha- and beta-diversity metrics, you set the subsample value within the command and it will rarefy the data for you.

As for the cons.taxonomy file, the consensus taxonomy should not depend on the subsampling, so I would use the original list, names/count, and taxonomy files.
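
One way to line the two files up afterwards is to match on OTU labels. The Python sketch below is a hedged illustration, not a mothur feature. It assumes the subsampled shared file keeps the OTU labels of the original shared file and only drops OTUs that ended up empty, which should be checked on your own files. The function names are hypothetical.

```python
def shared_otu_labels(shared_text):
    """OTU labels from the header line of a tab-delimited shared file."""
    header = shared_text.splitlines()[0].split("\t")
    return [h for h in header[3:] if h]  # skip label, Group, numOtus

def match_taxonomy(shared_text, cons_taxonomy_text):
    """Keep only the cons.taxonomy rows whose OTU is in the shared file.

    Returns (filtered_text, missing), where `missing` lists OTU labels in
    the shared file that have no row in the taxonomy file.
    """
    wanted = shared_otu_labels(shared_text)
    lines = cons_taxonomy_text.splitlines()
    header, rows = lines[0], lines[1:]
    by_otu = {row.split("\t")[0]: row for row in rows if row.strip()}
    kept = [by_otu[otu] for otu in wanted if otu in by_otu]
    missing = [otu for otu in wanted if otu not in by_otu]
    # Note: the Size column still reflects the full, un-subsampled data set.
    return "\n".join([header] + kept) + "\n", missing
```

If `missing` comes back non-empty, the labels in the two files were not generated from the same list file, and matching by label is not safe.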

Pat

svazquez · January 23, 2017, 12:15pm

Hi
So, to go further with the alpha and beta diversity analysis, is it not necessary to subsample the shared file to normalize the number of sequences in each sample? Will the diversity estimators then be based on a “different sampling effort” for each sample? And likewise for beta diversity, will the analysis be based on a different number of sequences per sample?
Sorry, but I guess I was wrong in thinking that all the analyses should be based on the same “sampling effort” for all samples… I'd very much appreciate some explanation of this, please!
Thanks a lot,

Kendra · January 24, 2017, 6:07pm

mothur will subsample repeatedly when calculating alpha and beta indices! It’s one of the things that I love about mothur. So, don’t subsample the shared file you feed into those commands.
