subsample? normalize? relabund?

Hi
Sorry to ask what is probably a silly question, but I’d like to understand…
What is the difference between using sub.sample and using normalize.shared with the default totalgroup option? And what does makerelabund actually do? I ran normalize.shared with the default options and then again with the makerelabund option, and the output was the same.

Using the same shared file as input:
get.relabund
Group numOtus total seqs
10A 5457 5592
11A 5457 5592
11B 5457 5592
11C 5457 5592
12A 5457 5592
12B 5457 5592
12C 5457 5592
S1B1S 5457 5592
S21S 5457 5592
S31S 5457 5592

normalize.shared
Group numOtus total seqs
10A 5735 5415
11A 5735 5592
11B 5735 5604
11C 5735 5434
12A 5735 5666
12B 5735 5737
12C 5735 5735
S1B1S 5735 5712
S21S 5735 5609
S31S 5735 5783

normalize.shared (with makerelabund option)
Group numOtus total seqs
10A 5735 5415
11A 5735 5592
11B 5735 5604
11C 5735 5434
12A 5735 5666
12B 5735 5737
12C 5735 5735
S1B1S 5735 5712
S21S 5735 5609
S31S 5735 5783

Thank you!

Hi,

As you can hopefully see, sub.sample takes the same number of sequences from each sample. normalize.shared converts all of the values in the shared file to relative abundances, multiplies them by either the value you give for the norm parameter or the size of the smallest sample, and then rounds each value to an integer. I’m not a fan of normalize.shared because, with each OTU rounded separately, it does not result in the same number of sequences for each sample. I’d stay away from it. If you want relative abundance data, those values should only be calculated from a shared file that was output by sub.sample.
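To make the difference concrete, here is a minimal Python sketch of the two ideas described above. It is not mothur's code: the function names, the toy OTU counts, and the round-half-up rule are assumptions made for illustration only. It shows that random subsampling always returns exactly the requested number of sequences, while scaling and rounding each OTU separately can leave the samples with different totals.

```python
import random

def subsample(counts, size, seed=0):
    """Draw `size` sequences at random without replacement (the sub.sample idea)."""
    rng = random.Random(seed)
    # Expand the OTU counts into one entry per sequence, then sample from that pool.
    pool = [otu for otu, n in enumerate(counts) for _ in range(n)]
    picked = rng.sample(pool, size)
    result = [0] * len(counts)
    for otu in picked:
        result[otu] += 1
    return result

def normalize_by_total(counts, norm):
    """Scale counts to relative abundance, multiply by `norm`, round each OTU.

    Rounding half up is an assumption; mothur's exact rounding rule may differ.
    """
    total = sum(counts)
    return [int(c / total * norm + 0.5) for c in counts]

# Two hypothetical samples, each with 36 sequences spread over 6 OTUs.
sample_a = [6, 6, 6, 6, 6, 6]
sample_b = [20, 4, 4, 4, 2, 2]
target = 10

# Subsampling: both totals are exactly the target.
print(sum(subsample(sample_a, target)), sum(subsample(sample_b, target)))

# Normalizing: per-OTU rounding makes the totals drift apart (12 and 11 here).
print(sum(normalize_by_total(sample_a, target)), sum(normalize_by_total(sample_b, target)))
```

In this toy case both subsampled totals are 10, while the normalized totals come out as 12 and 11, which is the same kind of spread visible in the normalize.shared output in the question.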

Pat