alpha, beta diversity on subgroups

Another foolish question:

In my lung database I have 138 specimens: 58 subjects, each sampled at 2 locations, and the rest are negative controls. The human subjects fall into two groups: disease and normal (no disease).

I’ve done an analysis with all files together and am at the point of knowing nseqs, sobs, taxonomy, alpha- and beta-diversity, etc. for the entire dataset. So far so good. Now I want to look at:

Disease subjects only, location 1 and location 2
Control subjects only, location 1 and location 2
Disease and control subjects, location 1 only
Disease and control subjects, location 2 only

I have demographic information on each subject (race, gender, etc) and of course eventually want to look at these separately in the same way.

For my final dataset prior to doing the fun analysis, I have (of course) files that end in shared, taxonomy, tax.summary, list, dist, count.table, etc. To remove the groups I don't want (e.g., all the location 2 sequences), it seems to me I could start at the point in the MiSeq SOP where Pat removes the ‘mock’ community. I would do the same thing with the appropriate count.table, fasta and taxonomy files, but instead of “groups=Mock” I would specify a new list of groups, and carry on from there.

But before I invest a weekend in that, does that sound right? This seems important: various measurements of beta-diversity etc would change (right?) based on who’s in and who’s out in the group analysis.

Apologies for yet another long-winded question and thanks in advance,

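The alpha and beta diversity metrics shouldn’t change based on what other samples are in the dataset, since alpha diversity is calculated per sample and beta diversity per pair of samples. We commonly calculate these parameters for the full dataset and then parse out the values we want using R.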


Very helpful to hear. If you have an ‘example’ R script that would be great; I’m an R rookie and can always learn. Otherwise (as a Mac guy I’m doing some of this in Excel and Deltagraph) I can parse them that way. Thanks.
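
For anyone finding this thread later, here is a minimal sketch of the "calculate once, then parse out the subset" idea from the reply above. It is written in Python rather than R and is not the responder's actual script. Everything specific in it is an assumption for illustration: the toy numbers are made up, the sample names and the "status" and "location" columns are hypothetical, and it assumes both tables are tab-delimited with a "group" column holding the sample name. Check the column names against your own summary and metadata files before using anything like it.

```python
import csv
import io


def read_tsv(text):
    """Parse tab-delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def select_samples(summary_rows, metadata_rows, **criteria):
    """Keep the summary rows whose sample matches every metadata criterion."""
    wanted = {
        m["group"]
        for m in metadata_rows
        if all(m[key] == value for key, value in criteria.items())
    }
    return [row for row in summary_rows if row["group"] in wanted]


# Made-up toy values standing in for a per-sample alpha-diversity table
# (assumed layout: one row per sample, sample name in the "group" column).
SUMMARY = (
    "group\tsobs\tinvsimpson\n"
    "S1_loc1\t120\t8.5\n"
    "S1_loc2\t95\t6.1\n"
    "S2_loc1\t140\t10.2\n"
    "S2_loc2\t110\t7.7\n"
)

# Hypothetical metadata table; "status" and "location" are invented names.
METADATA = (
    "group\tstatus\tlocation\n"
    "S1_loc1\tdisease\t1\n"
    "S1_loc2\tdisease\t2\n"
    "S2_loc1\tnormal\t1\n"
    "S2_loc2\tnormal\t2\n"
)

summary = read_tsv(SUMMARY)
metadata = read_tsv(METADATA)

# Disease subjects only, both locations.
disease_only = select_samples(summary, metadata, status="disease")

# Disease and normal subjects, location 1 only.
location1_only = select_samples(summary, metadata, location="1")

print([row["group"] for row in disease_only])
print([row["group"] for row in location1_only])
```

Because the filter is driven by the metadata table, the same function should work for the demographic comparisons (race, gender, etc.) by adding those columns and passing them as criteria. For real files, replace the inline strings with the contents of the files read from disk.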