I have a dataset where I only want to keep the OTUs that are shared within pairs of samples (I thought this was what get.coremicrobiome did because I didn’t read the wiki carefully). Is there a way in mothur to select only the OTUs that are shared between groups?
I decided this was more of an R task. Here is what I ended up with, using dplyr and tidyr:
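library(dplyr)
library(tidyr)
# mothur .shared file: one row per sample, one column per OTU
otu <- read.table(file="PROJECT.trim.contigs.good.unique.good.filter.precluster.pick.pick.an.unique_list.0.03.subsample.shared", header=TRUE, stringsAsFactors = FALSE)
# sample metadata, including which milk/stool pair each sample belongs to
expdata <- read.table(file="crossoverSDIplusaddldata.csv", fill=TRUE, header=TRUE, stringsAsFactors = FALSE, sep=",")
group.pair <- select(expdata, group, Milk.stool.pair)
# attach the pair ID to every sample in the OTU table
otu <- right_join(group.pair, otu, by=c("group"="Group"))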
df.otu <- otu %>%
  gather(variable, value, -group, -Milk.stool.pair, -label, -numOtus) %>% # long format, OTU columns only (label and numOtus are the metadata columns of a .shared file)
  group_by(variable, Milk.stool.pair) %>% # group by OTU and pair
  summarize(keep = all(value != 0)) %>% # TRUE where the OTU is non-zero in every sample of the pair
  ungroup() %>% group_by(variable) %>% # reset grouping
  summarize(keep = any(keep)) %>% # OTUs where at least one pair met that criterion
  dplyr::filter(keep) # final list
otu.common <- otu[c("group", "Milk.stool.pair", df.otu$variable)] # sample IDs plus the shared OTUs
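In case it helps anyone check the logic, here is a small base R sketch of the same rule (keep an OTU if at least one pair has it in every sample of that pair). The toy table and all of its values are made up for illustration; only the column names group and Milk.stool.pair come from my real data.

```r
# Toy stand-in for the joined table: two milk/stool pairs, three OTUs (invented values).
toy <- data.frame(
  group           = c("milk1", "stool1", "milk2", "stool2"),
  Milk.stool.pair = c("p1", "p1", "p2", "p2"),
  Otu0001         = c(5, 3, 0, 2),  # non-zero in both samples of p1 only
  Otu0002         = c(0, 4, 7, 0),  # never non-zero in both samples of a pair
  Otu0003         = c(1, 1, 2, 9),  # non-zero in both samples of both pairs
  stringsAsFactors = FALSE
)

otu.cols <- grep("^Otu", names(toy), value = TRUE)

# For each OTU: is there at least one pair in which every sample has a non-zero count?
is.shared <- vapply(otu.cols, function(col) {
  any(tapply(toy[[col]] != 0, toy$Milk.stool.pair, all))
}, logical(1))

keep <- otu.cols[is.shared]
toy.common <- toy[c("group", "Milk.stool.pair", keep)]
print(keep)
```

Here Otu0002 is dropped because no pair has it in both samples, while Otu0001 and Otu0003 are kept.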