When running the MiSeq SOP with a count file, I am getting 261,137 uniques and 4,059,903 sequences. After cluster.split at taxlevel=5, I end up with 33,036 OTUs. After sub.sample, this is reduced to 26,255 OTUs, 185,224 uniques and 2,371,440 sequences.
When running the MiSeq SOP with a name and group file, I am getting 257,435 uniques and 3,917,871 sequences. After cluster.split at taxlevel=5, I end up with 33,272 OTUs. But after sub.sample, this drops to only 7,732 OTUs, 38,802 uniques and 2,286,384 sequences.
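To put the two runs side by side, the short calculation below takes only the before/after totals quoted above and computes the fraction of sequences, uniques and OTUs that survive sub.sample in each run. It does not model anything mothur does internally; it is plain arithmetic on the reported numbers.

```python
# Totals reported above: (before sub.sample, after sub.sample) for each run.
runs = {
    "count file": {
        "sequences": (4_059_903, 2_371_440),
        "uniques": (261_137, 185_224),
        "OTUs": (33_036, 26_255),
    },
    "name+group": {
        "sequences": (3_917_871, 2_286_384),
        "uniques": (257_435, 38_802),
        "OTUs": (33_272, 7_732),
    },
}

# Fraction of each quantity that remains after subsampling.
retained = {
    run: {key: after / before for key, (before, after) in stats.items()}
    for run, stats in runs.items()
}

for run, fracs in retained.items():
    print(run, ", ".join(f"{key}: {value:.1%}" for key, value in fracs.items()))
```

Both runs keep about 58.4% of their sequences, yet the count-file run keeps about 79.5% of its OTUs (70.9% of uniques) while the name/group run keeps only about 23.2% of its OTUs (15.1% of uniques). So the discrepancy is not explained by how many sequences were drawn.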
I am not too worried about the slightly different numbers before sub.sample, but I am completely clueless as to why sub.sample produces such a massive difference in the number of OTUs and unique sequences.
Reducing my data set by subsampling is what I want, and I know that it reduces the number of OTUs.
But I am confused by the very different outcomes of sub.sample when using exactly the same data set with the MiSeq SOP, once with a count file and once with a name and group file: after sub.sample I get more than 26,000 OTUs (count file) or fewer than 8,000 OTUs (name and group file)! Before sub.sample, both data sets have about the same number of sequences (4,000,000), uniques (260,000) and OTUs (33,000).
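For reference, the sketch below illustrates the general idea of subsampling to an even depth: every group is reduced to the same number of reads by drawing reads at random without replacement, and any OTU that receives no reads disappears. This is a simplified stand-in written for illustration, not mothur's sub.sample code; the helper names (`subsample_group`, `subsample_table`, `otus_observed`) and the toy OTU table are hypothetical.

```python
import random
from collections import Counter


def subsample_group(otu_counts, depth, rng):
    """Draw `depth` reads without replacement from one group's OTU counts.

    otu_counts maps OTU label -> number of reads in this group (hypothetical format).
    Returns a Counter of OTU label -> reads kept.
    """
    # Expand counts to one entry per read so every read is equally likely to be drawn.
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    if depth > len(reads):
        raise ValueError("depth exceeds the number of reads in this group")
    return Counter(rng.sample(reads, depth))


def subsample_table(table, depth, seed=0):
    """Subsample every group to the same depth using a seeded RNG."""
    rng = random.Random(seed)
    return {group: subsample_group(counts, depth, rng) for group, counts in table.items()}


def otus_observed(table):
    """Count distinct OTUs that still have at least one read in any group."""
    return len({otu for counts in table.values() for otu, n in counts.items() if n > 0})


# Made-up toy table: two groups of 100 reads, a few abundant OTUs plus many singletons.
toy = {
    "A": {"Otu1": 50, "Otu2": 30, **{f"Otu{i}": 1 for i in range(3, 23)}},
    "B": {"Otu1": 40, "Otu2": 45, **{f"Otu{i}": 1 for i in range(23, 38)}},
}
sub = subsample_table(toy, depth=40)
print(otus_observed(toy), otus_observed(sub))
```

Under this model, the OTUs lost are mostly rare ones (here, singletons), and for a fixed depth the loss depends only on the per-group read counts. Two inputs describing the same reads should therefore lose a similar share of OTUs, which is why a 79% versus 23% OTU retention is surprising.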
Can you post the fasta, group, count, and names files somewhere for me to download and look at? It’d also be good to have the exact commands you are running.