Random iterative subsampling

Hello dear Mothur community

I am wondering if there is a way to generate a random subsample of my data set, ideally through a process with multiple iterations.

I have been following the MiSeq SOP guidelines on 18S/V4 iTags with the SILVA 123 database.
I would like to subsample before using cluster.split() (with optiClust).

The subsample command lets me set the desired ‘size’ of the new randomly generated group, but I would also like to repeat the process over several iterations.

Anybody with experience, suggestions or comments on how to get this done?

Thanks in advance,


If you have to subsample before clustering, you’re going to have to script it yourself. This is something that we don’t recommend.
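
For anyone who does go the scripting route, here is a minimal sketch in Python of what repeated random subsampling could look like. It is only an illustration, not a mothur feature: it assumes a two-column group file (read name, then sample name), and the function names `read_groups` and `subsample_iterations` are made up for this example.

```python
import random
from collections import defaultdict


def read_groups(lines):
    """Parse group-file lines of the form 'read_name<whitespace>sample_name'."""
    by_sample = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        read, sample = line.split()[:2]
        by_sample[sample].append(read)
    return by_sample


def subsample_iterations(by_sample, size, iterations, seed=0):
    """Draw `size` reads per sample, without replacement, `iterations` times.

    Samples with fewer than `size` reads are skipped (an assumption of this
    sketch). Returns one list of read names per iteration.
    """
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    results = []
    for _ in range(iterations):
        picked = []
        for sample in sorted(by_sample):
            reads = by_sample[sample]
            if len(reads) < size:
                continue
            picked.extend(rng.sample(reads, size))
        results.append(picked)
    return results


if __name__ == "__main__":
    # Tiny in-memory example; a real run would read the group file instead.
    demo = ["r1\tA", "r2\tA", "r3\tA", "r4\tB", "r5\tB", "r6\tC"]
    groups = read_groups(demo)
    for i, names in enumerate(subsample_iterations(groups, size=2, iterations=3)):
        print(f"iteration {i}: {len(names)} reads selected")
```

Each list of names could then be written one per line to a file and used to pull the matching sequences for that iteration, for example as an accnos file for get.seqs.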


Hi again, thanks for the answer Pat.
So, basically, the way to get around a computationally demanding clustering step (~12 million unique reads from 318 samples) would be to use pre.cluster with a larger difference threshold?

Thanks for your comments,


Yes, or to use phylotype. I think the problem is that you have used V3 chemistry and have a pretty high error rate.


Hi Pat, and community
I am running my data again and just noticed that after make.contigs() my scrap files (scrap.contigs.fasta and scrap.contigs.qual) are empty. Can anybody explain to me what the implications of this are?



That sounds right. Are you running screen.seqs after running make.contigs as described in the MiSeq SOP?


Oh! So it is OK if they are empty … good =)

My workflow is based on the MiSeq SOP, with a few commands added to fit my data. Anyway, the start looks like this:

make.contigs(file=16S.files, processors=24)
pcr.seqs(fasta=16S.trim.contigs.fasta, oligos=bactV4.oligos, pdiffs=2, rdiffs=2, group=16S.contigs.groups)
screen.seqs(fasta=16S.trim.contigs.pcr.fasta, group=16S.contigs.pcr.groups, minlength=252, maxlength=254, maxambig=0, maxhomop=6)