sub.sample command


I would like to normalize the number of sequences for each of my samples. The number of reads varies (4000-20000) across 11 samples, and I would like to normalize it to the sample with the lowest number of reads (4000). The problem is that unless you have a .shared file, the default for the sub.sample command only selects 10% of the sequences in the original file, and I am currently working with a fasta and group file. Is there any way of selecting a specified number of sequences across all my samples?

Any help is appreciated. Thanks.

By default, the size of the subsample is set to 10% of the sequences. If you provide a group file and set persample=t, the default becomes the size of the smallest group.

You may be interested in the size parameter, which allows you to indicate the size of your subsample:

sub.sample(fasta=yourFastaFile, group=yourGroupFile, size=sampleSize)

Or you could set persample=t:

sub.sample(fasta=yourFastaFile, group=yourGroupFile, persample=t)
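
To make the persample idea more concrete, here is a rough Python sketch of what per-sample subsampling does conceptually. It is not mothur's code. The function names (read_groups, subsample_per_group), the seed, and the toy data are made up for illustration, and it assumes the group file is tab-separated with the sequence name first and the group name second. Raising an error when the requested size is larger than the smallest group is a choice made for this sketch only.

```python
import random
from collections import defaultdict


def read_groups(lines):
    """Parse group-file lines (sequence name, tab, group name) into {group: [names]}."""
    by_group = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, group = line.split("\t")
        by_group[group].append(name)
    return dict(by_group)


def subsample_per_group(by_group, size=None, seed=1):
    """Randomly pick the same number of sequence names from every group.

    If size is None, use the size of the smallest group, which mirrors the
    persample=t default described above.
    """
    smallest = min(len(names) for names in by_group.values())
    if size is None:
        size = smallest
    if size > smallest:
        # Sketch-only behaviour: refuse rather than guess what to do.
        raise ValueError("size exceeds the smallest group")
    rng = random.Random(seed)  # fixed seed so the toy run is repeatable
    return {group: rng.sample(names, size) for group, names in by_group.items()}


# Toy group file: 5 reads in sampleA, 3 reads in sampleB (hypothetical names).
lines = [f"seqA{i}\tsampleA" for i in range(5)]
lines += [f"seqB{i}\tsampleB" for i in range(3)]

groups = read_groups(lines)
picked = subsample_per_group(groups)
print({group: len(names) for group, names in picked.items()})
```

Every group ends up with the same number of reads, which is the normalization being asked about. Passing size=2 instead would take 2 reads from each group.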

I hope this helps,

Perfect! Thanks Sarah!