sub.sample feature

jangidk · January 5, 2011, 6:44pm

I wonder whether there is an error in the sub.sample command, or whether I am asking it to do something it was not meant for. I am trying to sample an equal number of sequences from each of my groups in a fasta file, using the fasta, name, group and size options. However, it samples an unequal number of sequences from each group, and the total number of sequences in the subsample.groups file equals the value of the size option. What I need is a subsample.groups file whose total equals the size value times the number of groups in my input file. Am I doing something wrong?

westcott · January 7, 2011, 1:10pm

The current version of the sub.sample command cannot do what you are looking for, but we can certainly add a feature request.

If you run a command like: sub.sample(fasta=abrecovery.fasta, group=abrecovery.groups, name=abrecovery.names, size=100)

mothur will randomly select 100 sequences from the name file and output them to a new fasta file, and create a new group file with those sequences in it.
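To illustrate why the per-group counts come out unequal, here is a small Python sketch of pooled random sampling. It is not mothur's implementation; the function name `pooled_subsample` and the toy data (1000 sequences split unevenly across groups A, B and C) are assumptions made up for the example.

```python
import random
from collections import Counter


def pooled_subsample(seq_to_group, size, seed=0):
    """Draw `size` sequence names from the whole pool, ignoring groups,
    and return how many of the chosen names fall in each group."""
    rng = random.Random(seed)
    # sorted() gives a stable list of names so the seed is reproducible
    chosen = rng.sample(sorted(seq_to_group), size)
    return Counter(seq_to_group[name] for name in chosen)


# Hypothetical toy data: 600 sequences in A, 300 in B, 100 in C
seq_to_group = {
    f"seq{i}": ("A" if i < 600 else "B" if i < 900 else "C")
    for i in range(1000)
}

counts = pooled_subsample(seq_to_group, 100)
print(counts)                # per-group counts, generally unequal
print(sum(counts.values()))  # the total always equals the size value
```

Because the draw is made from the pooled names, the total matches size while each group's share roughly follows its share of the input.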

Here’s a workaround to get what you are looking for, although it could get tedious with many groups:

sub.sample(fasta=abrecovery.fasta, group=abrecovery.groups, name=abrecovery.names, size=100, groups=A)
sub.sample(fasta=abrecovery.fasta, group=abrecovery.groups, name=abrecovery.names, size=100, groups=B)
sub.sample(fasta=abrecovery.fasta, group=abrecovery.groups, name=abrecovery.names, size=100, groups=C)

For each of these commands, mothur will select 100 names from the name file that belong to the specified group, and create a new fasta file and group file. Every run writes to the same output file names (for example, abrecovery.subsample.fasta), so you will need to rename the fasta and group files between commands. Then you can merge the renamed fasta files and group files to create one large fasta file and one large group file containing 100 sequences from each of your groups.
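With many groups, the repeated commands above could be generated rather than typed. The following is a minimal Python sketch that only builds the command text; the helper name `per_group_commands` is hypothetical, and it does not run mothur or rename any output files, so the renaming between runs still has to be done separately.

```python
def per_group_commands(fasta, group, name, size, groups):
    """Build one sub.sample command string per group label."""
    template = (
        "sub.sample(fasta={fasta}, group={group}, name={name}, "
        "size={size}, groups={g})"
    )
    return [
        template.format(fasta=fasta, group=group, name=name, size=size, g=g)
        for g in groups
    ]


# Same files and group labels as in the example commands above
commands = per_group_commands(
    "abrecovery.fasta",
    "abrecovery.groups",
    "abrecovery.names",
    100,
    ["A", "B", "C"],
)

for command in commands:
    print(command)
```

Each printed line can be pasted into mothur one at a time, renaming the subsample output files after each run.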

I hope this helps,
Sarah

jangidk · January 10, 2011, 5:58pm

Thanks, Sarah! I had already figured out the workaround. It's just tedious and adds to a huge list of commands to run, especially when you have many files. I hope this feature will be added very soon.
