Rationale behind sub.sample with persample=f

Hi Pat and Sarah,

I was wondering if you could explain what happens during the sub.sample command when you don’t specify persample=t. I supplied a unique fasta file, a names file, a groups file, and a size, and when I don’t set persample to true I get back a different number of sequences in each group. I’m not sure why the number of sequences per group varies: they are all within roughly 10-15% of the specified size, but none is exactly on the mark. The analysis was done in version 1.24.1. I don’t remember the exact commands and don’t have the logfile, since this was done a while back and I had simply assumed each group would end up with the same number of sequences.


With persample=f, mothur randomly draws the requested number of sequences from the pooled set of all groups, without regard to group membership. As a result, each group’s share of the subsampled files stays (give or take) the same as its share of the original data. It really doesn’t make sense to use persample=f if you have multiple groups.
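To make the difference concrete, here is a small simulation. It is a sketch under stated assumptions, not mothur’s implementation: the functions `pooled_subsample` and `per_group_subsample` are hypothetical, the data are made up, each read is listed individually (as if the names file were already expanded), and `size` is treated as the total draw for persample=f and as the per-group draw for persample=t.

```python
import random
from collections import Counter


def pooled_subsample(reads, size, rng):
    """Hypothetical sketch of persample=f: draw `size` reads from the whole
    pool, ignoring which group each read belongs to."""
    chosen = rng.sample(reads, size)
    return Counter(group for _, group in chosen)


def per_group_subsample(reads, size, rng):
    """Hypothetical sketch of persample=t: draw `size` reads from each group
    separately, so every group ends up with exactly `size` reads."""
    by_group = {}
    for read in reads:
        by_group.setdefault(read[1], []).append(read)
    counts = Counter()
    for group, members in by_group.items():
        # rng.sample raises ValueError if a group has fewer than `size` reads
        counts[group] = len(rng.sample(members, size))
    return counts


# Made-up data: (read_name, group) pairs, one entry per read, with
# unequal group sizes (50% / 30% / 20% of the pool).
group_sizes = {"A": 5000, "B": 3000, "C": 2000}
reads = [(f"{g}_{i}", g) for g, n in group_sizes.items() for i in range(n)]

rng = random.Random(1)
size = 1000
pooled = pooled_subsample(reads, size, rng)
per_group = per_group_subsample(reads, size, rng)

total = sum(group_sizes.values())
for g, n in group_sizes.items():
    print(f"{g}: original share {n / total:.2f}, "
          f"pooled draw {pooled[g]} ({pooled[g] / size:.2f}), "
          f"per-group draw {per_group[g]}")
```

The pooled counts should land near 500/300/200, mirroring the input proportions rather than giving each group the same number, while the per-group draw returns exactly 1000 reads from every group.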