I just want to know how the sub.sample() function works. How does the function determine how many sequences to remove from each OTU/ASV? Does it remove more sequences from larger ASVs? Is a ratio kept between ASVs, or is the relative abundance of each ASV affected? Are low-abundance ASVs eliminated (I see that some singletons and doubletons are eliminated)? Thanks for your help!
Hi - it randomly grabs the specified number of sequences from each sample. It does this empirically, so more abundant OTUs will be sampled more often than rarer ones. If a sample has a lot more sequences than the desired threshold, it will lose more of its rare OTUs. If a sample has fewer sequences than the desired threshold, that sample will be removed. If you don’t give it a threshold, it will use the size of the smallest sample. This function only does one sampling of each sample. Using dist.shared and summary.shared gives you the option of doing many subsamplings and then reporting the average of the alpha or beta diversity metric over those subsamplings.
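To make the behaviour described above concrete, here is a minimal Python sketch of that kind of subsampling. It is not mothur's actual code. It assumes sequences are drawn at random without replacement, and the names `subsample_sample`, `subsample_table`, and the toy table are made up for illustration.

```python
import random
from collections import Counter

def subsample_sample(otu_counts, size, rng):
    """Draw `size` sequences at random (without replacement, by assumption)
    from one sample. Returns None if the sample is too small."""
    if sum(otu_counts.values()) < size:
        return None
    # One pool entry per sequence, so abundant OTUs are picked more often.
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    return dict(Counter(rng.sample(pool, size)))

def subsample_table(table, size=None, seed=None):
    """Subsample every sample once; drop samples below the threshold."""
    rng = random.Random(seed)
    if size is None:
        # No threshold given: fall back to the size of the smallest sample.
        size = min(sum(counts.values()) for counts in table.values())
    result = {}
    for name, counts in table.items():
        picked = subsample_sample(counts, size, rng)
        if picked is not None:
            result[name] = picked
    return result

# Hypothetical toy data: sample name -> {ASV name: sequence count}
table = {
    "soil": {"ASV1": 500, "ASV2": 300, "ASV3": 2, "ASV4": 1},
    "water": {"ASV1": 60, "ASV2": 35, "ASV3": 5},
    "blank": {"ASV1": 8},
}
rarefied = subsample_table(table, size=100, seed=1)
print(rarefied)
```

In this toy run, "blank" is dropped because it has fewer than 100 sequences, and "water" comes back unchanged because it has exactly 100. "soil" keeps roughly its original proportions of ASV1 and ASV2, while its singleton and doubleton may or may not survive the draw. No ratio is enforced; relative abundances are only preserved on average.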
sub.sample outputs a shared file, whereas the rarefaction commands output other file formats. Subsampling is effectively rarefaction with a single randomization.
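The difference between one randomization and the average over many can be sketched in Python as follows. This is only an illustration of the idea, not what summary.shared does internally. It assumes sampling without replacement and uses the observed number of OTUs as the metric; the function names and counts are hypothetical.

```python
import random

def richness_after_subsample(otu_counts, size, rng):
    """Number of distinct OTUs left after one random subsample."""
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    return len(set(rng.sample(pool, size)))

def mean_subsampled_richness(otu_counts, size, iters=1000, seed=0):
    """Average the observed richness over `iters` independent subsamplings."""
    rng = random.Random(seed)
    draws = [richness_after_subsample(otu_counts, size, rng)
             for _ in range(iters)]
    return sum(draws) / iters

# Hypothetical sample with two abundant OTUs and two rare ones.
sample = {"ASV1": 500, "ASV2": 300, "ASV3": 2, "ASV4": 1}

single = richness_after_subsample(sample, 100, random.Random(0))
average = mean_subsampled_richness(sample, 100, iters=1000, seed=0)
print(single, average)
```

A single draw gives a whole number that depends on whether the rare OTUs happened to be picked, whereas the average over many draws is a stable estimate that does not hinge on one random outcome.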