sub.sample command

Benny · July 13, 2012, 10:54am

Hi,

I would like to normalize the number of sequences for each of my samples. The number of reads varies (4000-20000) across 11 samples, and I would like to normalize to the sample with the lowest number of reads (4000). The problem is that unless you have a .shared file, the default for the sub.sample command only selects 10% of the sequences in the original file, and I am currently working with a fasta and group file. Is there any way of selecting a specified number of sequences across all my samples?

Any help is appreciated. Thanks.

westcott · July 13, 2012, 1:17pm

By default the size of the sample is set to 10% of the sequences. If you provide a group file and set persample=t, then the default is the size of the smallest group.

You may be interested in the size parameter, which allows you to indicate the size of your subsample:

sub.sample(fasta=yourFastaFile, group=yourGroupFile, size=sampleSize)

Or you could set persample=t:

sub.sample(fasta=yourFastaFile, group=yourGroupFile, persample=t)
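To make the persample idea concrete, here is a rough Python sketch of what drawing the same number of reads from every group looks like. It is only an illustration of the concept, not mothur's own code: the function name subsample_per_group, its arguments, and the toy sequence names are all made up for this example, and it assumes you have already read your group file into a dictionary of sequence name to group name.

```python
import random
from collections import defaultdict


def subsample_per_group(seq_to_group, size=None, seed=None):
    """Randomly keep the same number of sequence names from every group.

    seq_to_group: dict of sequence name -> group (sample) name (hypothetical input)
    size: reads to keep per group; if None, use the size of the smallest group
    seed: optional seed so the random draw can be repeated
    """
    if not seq_to_group:
        raise ValueError("no sequences given")

    # Collect the sequence names that belong to each group.
    by_group = defaultdict(list)
    for name, group in seq_to_group.items():
        by_group[group].append(name)

    smallest = min(len(names) for names in by_group.values())
    if size is None:
        size = smallest
    if size > smallest:
        raise ValueError("size is larger than the smallest group")

    # Draw without replacement, independently within each group.
    rng = random.Random(seed)
    return {group: sorted(rng.sample(names, size))
            for group, names in by_group.items()}


# Toy data: three groups with 5, 3 and 8 reads.
toy = {}
for group, count in (("A", 5), ("B", 3), ("C", 8)):
    for i in range(count):
        toy[f"{group}_read{i}"] = group

picked = subsample_per_group(toy, seed=1)
print({group: len(names) for group, names in picked.items()})
```

With no size given, every group is cut down to the size of the smallest group, which is the situation described in the question (normalizing all samples to the one with the fewest reads).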

I hope this helps,
Sarah

Benny · July 13, 2012, 5:46pm

Perfect! Thanks Sarah!
