How does the sub.sample function work?

Hello,

I just want to know how the sub.sample() function works. How does the function determine how many sequences to remove from each OTU/ASV? Does it remove more sequences from larger ASVs? Is a ratio kept between ASVs, or is the relative abundance of each ASV affected? Are low-abundance ASVs eliminated (I see that some singletons and doubletons are eliminated)? Thanks for your help!

Regards,
Elliston

Hi - it randomly grabs the specified number of sequences from each sample. It does this empirically, so more abundant OTUs will be sampled more often than rarer ones. If a sample has a lot more sequences than the desired threshold, it will remove more rare OTUs from the sample. If a sample has fewer sequences than the desired threshold, that sample will be removed. If you don’t give it a threshold, it will use the size of the smallest sample. This function will only do one sampling of each sample. Using dist.shared and summary.shared gives you the option of doing many subsamplings and then reporting the average of the alpha or beta diversity metric over those subsamplings.
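
To make the behaviour above concrete, here is a minimal Python sketch of the idea. It is not mothur's actual code: the function names `subsample_counts` and `subsample_table` are hypothetical, the OTU counts are made-up illustration data, and it assumes sequences are drawn without replacement.

```python
import random
from collections import Counter


def subsample_counts(otu_counts, size, rng):
    """Draw `size` sequences at random from one sample (assumed: without replacement).

    Returns None when the sample has fewer than `size` sequences,
    mirroring the "sample will be removed" behaviour described above.
    """
    total = sum(otu_counts.values())
    if total < size:
        return None
    # One entry per sequence, labelled by its OTU, so abundant OTUs
    # are naturally picked more often than rare ones.
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    return dict(Counter(rng.sample(pool, size)))


def subsample_table(samples, size=None, seed=None):
    """Subsample every sample once; default size is the smallest sample."""
    rng = random.Random(seed)
    if size is None:
        size = min(sum(counts.values()) for counts in samples.values())
    result = {}
    for name, counts in samples.items():
        picked = subsample_counts(counts, size, rng)
        if picked is not None:  # samples below the threshold are dropped
            result[name] = picked
    return result


# Hypothetical data: sample A is much deeper than sample B.
samples = {
    "A": {"Otu1": 900, "Otu2": 90, "Otu3": 9, "Otu4": 1},
    "B": {"Otu1": 50, "Otu2": 10},
}
print(subsample_table(samples, seed=1))
```

No fixed number or ratio is removed per OTU in this sketch. Each sequence has the same chance of being kept, so relative abundances are preserved only approximately, and singletons or doubletons in a deep sample will often not be drawn at all.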

Pat

Oh, okay! This makes a lot of sense. Thank you very much!

Also, the sub.sample function would not be the same as rarefying the data, correct? Or is it the same?

sub.sample outputs a shared file, whereas the rarefaction commands output other file formats. Subsampling is effectively rarefaction with a single randomization.
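
As a rough illustration of that difference (a sketch only, not how mothur implements it), the Python below repeats the random draw many times and averages the number of observed OTUs. The function name `rarefied_richness`, the `iters` parameter, and the counts are hypothetical, and sampling without replacement is assumed. With `iters=1` it reduces to a single subsampling.

```python
import random
from statistics import mean


def rarefied_richness(otu_counts, size, iters=1000, seed=None):
    """Average observed OTU richness over `iters` random subsamples of `size`."""
    rng = random.Random(seed)
    # One entry per sequence, labelled by its OTU.
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    # Each iteration is one subsampling; richness is the number of distinct OTUs drawn.
    return mean(len(set(rng.sample(pool, size))) for _ in range(iters))


# Hypothetical sample with a few rare OTUs.
sample = {"Otu1": 900, "Otu2": 90, "Otu3": 9, "Otu4": 1}

single = rarefied_richness(sample, 100, iters=1, seed=1)      # one randomization
averaged = rarefied_richness(sample, 100, iters=1000, seed=1)  # many randomizations
print(single, averaged)
```

A single draw gives a whole-number richness that can change from run to run, while the average over many draws is more stable.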

Pat

Alright, thank you very much!
