summary.single sobs with decimal values


I just ran summary.single with calc=sobs and the subsample option, and the *groups.ave-std.summary output shows values with decimal places: not just trailing zeroes as in nseqs, but actual fractional numbers.
Based on what I read on the mothur wiki, shouldn’t they be integer values?

label   group   method  sobs
0.03    F10_S16 ave     549.641000
0.03    F11_S17 ave     641.336000
0.03    F12_S18 ave     689.543000
0.03    F13_S19 ave     572.216000
0.03    F14_S20 ave     478.700000
0.03    F15_S21 ave     453.000000
0.03    F16_S22 ave     536.189000

On the other hand, the values in the *groups.summary file are whole numbers, but they are not the same as the ones above, not even close: most are an order of magnitude higher. I guess those are the non-subsampled values…

label   group   sobs
0.03    F10_S16 5656.000000
0.03    F11_S17 4664.000000
0.03    F12_S18 4027.000000
0.03    F13_S19 6378.000000
0.03    F14_S20 6598.000000
0.03    F15_S21 453.000000
0.03    F16_S22 4376.000000

Any idea why this is happening?

Not sure if there is something off in the way they are calculated or maybe I’m interpreting it in the wrong way… so any comment would be appreciated.

Thank you in advance,

PS: F15_S21 is the sample I used for subsampling, that’s why the values are the same

The numbers in the ave-std file are the average number of OTUs observed across repeated random subsamples of nseqs sequences (i.e. the rarefied values). Because they are averages, I’d expect most of the sobs values to have decimal places. The groups.summary output file has the number of OTUs for the full datasets without rarefaction. You want to use the ave-std values, since those correct for uneven sampling effort.
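To illustrate why an averaged sobs is rarely a whole number, here is a minimal Python sketch of rarefaction. It is not mothur's implementation: the function name `rarefied_sobs`, the toy OTU counts, and the default of 1000 iterations are all assumptions made for the example. It also shows why a sample subsampled to its own full size (like F15_S21 above) gives the same value in both files.

```python
import random

def rarefied_sobs(otu_counts, depth, iters=1000, seed=0):
    """Average number of distinct OTUs seen across repeated random subsamples.

    Hypothetical helper for illustration only; not mothur's code.
    """
    rng = random.Random(seed)
    # Expand the count vector into one OTU label per sequence.
    pool = [otu for otu, n in enumerate(otu_counts) for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of sequences in the sample")
    # Each draw is without replacement; sobs is the number of distinct OTUs drawn.
    draws = [len(set(rng.sample(pool, depth))) for _ in range(iters)]
    return sum(draws) / iters

# Made-up community: 3 abundant OTUs plus 20 singletons (50 sequences in total).
counts = [15, 10, 5] + [1] * 20

full_sobs = sum(1 for n in counts if n > 0)  # whole number, like *groups.summary
ave_sobs = rarefied_sobs(counts, depth=25)   # mean over draws, like the "ave" rows

print(full_sobs, ave_sobs)
# Subsampling to the sample's own size always recovers every OTU.
print(rarefied_sobs(counts, depth=50, iters=10))
```

Each individual draw yields an integer, but the mean over many draws generally does not, and it is lower than the full-dataset count because rare OTUs are often missed at the smaller depth.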


Oh, I hadn’t noticed the iters parameter on the summary.single page. Now it makes sense. Sorry about that.

By the way, how relevant is running summary.single with the subsample option if you have subsampled your data before?

Wouldn’t it make more sense to run it on the subsampled data and get the values for the actual subset created?

I wouldn’t run summary.single on subsampled data.

Pat, would you mind elaborating on that?

Don’t subsample your seqs before calculating alpha and beta diversity. summary.single and summary.shared will resample repeatedly, which gives you a better estimate of the overall diversity than the diversity of a single subsampling.
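The difference between one subsampling and repeated resampling can be sketched in Python. This is a toy simulation, not mothur code: the helper `subsample_sobs`, the made-up OTU counts, the depth of 25, and the 1000 repeats are all assumptions chosen for illustration.

```python
import random
import statistics

def subsample_sobs(pool, depth, rng):
    """Number of distinct OTUs in one random subsample (hypothetical helper)."""
    return len(set(rng.sample(pool, depth)))

# Made-up community: 3 abundant OTUs plus 20 singletons (50 sequences in total).
counts = [15, 10, 5] + [1] * 20
pool = [otu for otu, n in enumerate(counts) for _ in range(n)]
rng = random.Random(1)

# One subsampling: what you would get from a dataset subsampled once beforehand.
single = subsample_sobs(pool, 25, rng)

# Repeated resampling: the mean and spread over many independent subsamples.
repeats = [subsample_sobs(pool, 25, rng) for _ in range(1000)]
mean = statistics.mean(repeats)
std = statistics.stdev(repeats)

print(single, mean, std)
```

The single value is just one draw from the distribution that the repeats describe, so it can land anywhere within that spread, whereas the mean over many draws is a steadier estimate and comes with a standard deviation.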