Issue with subsample and taxonomy file

Good day

I am trying to make a bargraph in R with my relative abundances of fungal phyla by treatments. So far I have been using the Vegan package in to run all of my ecological statistics, but for this graph, I think ggplot would be appropriate. I do however have one issue. In my dataset, there was a really large variation in sequencing length between samples and that was significantly confounding the results of my NMDS and permanova’s. I then rarefied my samples using the sub.sample command command on my .shared file. Here is the code that I used:
mothur > sub.sample(shared=final.nn.0.03.pick.shared, size=3000)
This removed 27 of my samples (out of a total of 297) that had a total read length less than my cutoff of 3000. This solved the issue of variable sequencing length confounding my results.

However, now that I want to make the bar graph of relative abundances of fungal phyla between treatments, I am running into issues, because the number of OTU’s in the .shared file that is subsampled does not match that of my taxonomy file. I tried to subsample my taxonomy file at 3000, but it’s a bit confusing, because the taxonomy file is not organized by samples, but rather by taxa and relative abundance. I am honestly not sure how to go about this problem.

After doing some reading on the forum, it seems like the create.database can create something that will allow me to make these bar graphs. This command is however not working for me. Each time I tried it with the get.oturep step, it gives me an error about a sequence that is not in my name or count file.

I would dearly appreciate any suggestions here.



You might want to check out my R materials at minimalR. The material in session 9 covers generating box plots and stripcharts from relative abundance data.


Hi Pat,
These tutorials are fantastic, thanks for sharing. I don’t know if I’m allowed to query about that on this forum, but there’s a minor step that is tripping me up. I’m busy with session 9 and managed to merge my taxonomy and .shared files. However, the step where you pull out your metadata does not work for me:
agg_phylum_data ← inner_join(otu_data, taxonomy) %>%
group_by(sample, phylum) %>%
summarize(agg_rel_abund=sum(rel_abund)) %>%
inner_join(., get_metadata()) %>%

I get the error saying “cannot find function for get_metadata”. I’m assuming it’s got something to do with the previous line in which you download some source code file. I tried replacing the "get_metadata()) with just my metadata file name, like this:

agg_phylum_data ← inner_join(otu_data, taxonomy) %>%

  • group_by(sample, phylum) %>%
  • summarize(agg_rel_abund=sum(rel_abund)) %>%
  • inner_join(., metadata) %>%
  • ungroup()

But I’m still getting errors. I think this is a minor fix, but I can’t seem to find the solution. I think it might have something to do with my metadata and agg_phylum_data files not being the same length, because the agg_phylum_data file has replicates for samples, due to repetition of OTU’s, whereas the metadata does not.


A tibble: 6 x 3

Groups: sample [1]

sample phylum agg_rel_abund

1 J001 k__Fungi_unclassified 0.0173
2 J001 p__Ascomycota 0.923
3 J001 p__Basidiobolomycota 0
4 J001 p__Basidiomycota 0.0443
5 J001 p__Calcarisporiellomycota 0
6 J001 p__Chytridiomycota 0.01


Sample Year substrate Precipitation
1 J001 2013 bare -/+ 80%
2 J002 2013 bare -/+ 50%
3 J003 2013 bare +/- 50%
4 J004 2013 bare -/+ 50%
5 J005 2013 bare -/+ 80%
6 J006 2013 bare -/+ 80%

If you have an idea on how I could fix this little hurdle, I would dearly appreciate that.


Hi Pat,
Don’t worry about my last message, I figured it out. Again, thanks for sharing these great resources.

Kind regards