R workshop taxonomy function

I’m working through Pat’s R workshop file for making a matrix of median relative abundances at the phylum level. I would like to do this at the family level (or another level), and I’m having trouble adapting the R function below. Does the forum have any thoughts on how to change the function to get family-level data aggregated?

tax_no_confidence <- gsub(pattern="\\(\\d*\\)", replacement="", x=taxonomy$Taxonomy)
phylum <- gsub("Bacteria;([^;]*);.*", "\\1", tax_no_confidence)
otu_phylum <- data.frame(otu = taxonomy$OTU, phylum = phylum, stringsAsFactors=FALSE)

For phylum you do this…

phylum <- gsub("Bacteria;([^;]*);.*", "\\1", tax_no_confidence)

Something like this should also work…

family <- gsub("Bacteria;[^;]*;[^;]*;[^;]*;([^;]*);.*", "\\1", tax_no_confidence)

Basically, each of the [^;]*; chunks represents one taxonomic level. If you wrap the chunk you want in parentheses, gsub will output it with the "\\1" in the replacement.
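
As a rough illustration of the chunk-per-level idea, here is a minimal sketch that builds the pattern for any level. The taxonomy strings are made up for the example, and get_level is a hypothetical helper, not something from the workshop file. It assumes every lineage starts with "Bacteria;" and ends with a trailing semicolon.

```r
# Made-up taxonomy strings for illustration only (not real data)
tax <- c(
  "Bacteria(100);Firmicutes(100);Clostridia(100);Clostridiales(100);Lachnospiraceae(98);Blautia(90);",
  "Bacteria(100);Bacteroidetes(100);Bacteroidia(100);Bacteroidales(100);Bacteroidaceae(100);Bacteroides(100);"
)

# Strip the confidence scores such as "(100)"
tax_no_confidence <- gsub(pattern = "\\(\\d*\\)", replacement = "", x = tax)

# Hypothetical helper: level 1 = phylum, 2 = class, 3 = order, 4 = family, 5 = genus
get_level <- function(x, level) {
  # One "[^;]*;" chunk for each level to skip before the one we capture
  skipped <- strrep("[^;]*;", level - 1)
  pattern <- paste0("Bacteria;", skipped, "([^;]*);.*")
  gsub(pattern, "\\1", x)
}

phylum <- get_level(tax_no_confidence, 1)
family <- get_level(tax_no_confidence, 4)
```

With these example strings, get_level(tax_no_confidence, 4) builds the same pattern as the family line above. A lineage that does not match the pattern (for example, one that starts with "Archaea;") is returned unchanged by gsub, so it is worth checking for those before aggregating.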

Pat

Thanks! Really appreciate the help.

Hi Pat, your suggestion works until I get to this step:

family_shared_sapply <- sapply(X=unique_family, FUN=count_family, otu_counts=shared, map=otu_family_overlap)

Once I look at the output, all of my unique sample names (10Ferm, 11Ferm, etc.) become numbers (1, 2, etc.). I went back to my shared file and renamed the samples with just letters to see if those would remain, but it appears sapply is the step where things are going wrong. When I run this step for phyla, I don’t have the same problem. Do you have any suggestions?

Thanks,
Ryan