I used the Silva database as in the SOP and followed it all the way through the phylotype clustering.
Then I wanted to convert the shared file obtained after make.shared into a BIOM file compatible with PICRUSt (Greengenes).
I tried to follow the steps described at http://www.mothur.org/wiki/Make.biom
But I get an error:
make.biom(shared=finalreads.tx.shared, label=1, reftaxonomy=gg_13_5_99.gg.tax, constaxonomy=finalreads.tx.1.cons.taxonomy,picrust=99_otu_map.txt)
1
[ERROR]: Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Bacteroidaceae;Bacteroides; is not in taxonomy tree, please correct.
I must be doing something wrong but I’m not sure what.
Any help is greatly appreciated (even if just a link to any documentation I may be missing).
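That’s right. PICRUSt only knows the taxonomic names used by the Greengenes taxonomy, and your consensus taxonomy came from Silva, so its lineage names are not found in the Greengenes reference tree. If you want to use PICRUSt, you will need to run classify.seqs with the Greengenes taxonomy we have provided.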
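If it helps to see which lineages would trip that error before running make.biom, here is a rough Python sketch, not anything mothur itself provides. It assumes the reference taxonomy is tab-delimited as sequence ID then lineage, and that the cons.taxonomy file has OTU, Size and Taxonomy columns with confidence scores in parentheses. The function names, OTU labels and sample lines are made up for illustration, and the Greengenes-style `k__` prefixes in the sample reference are an assumption about how the names differ.

```python
import re


def normalize(lineage):
    """Strip confidence scores such as (100) and split a lineage into rank names."""
    cleaned = re.sub(r"\(\d+\)", "", lineage.strip())
    return tuple(name for name in cleaned.split(";") if name)


def reference_prefixes(ref_lines):
    """Collect every lineage prefix present in a reference taxonomy.

    Assumes each line looks like: seq_id<TAB>rank1;rank2;...;
    """
    known = set()
    for line in ref_lines:
        line = line.strip()
        if not line:
            continue
        _, lineage = line.split("\t", 1)
        ranks = normalize(lineage)
        for depth in range(1, len(ranks) + 1):
            known.add(ranks[:depth])
    return known


def missing_lineages(cons_lines, known):
    """Return (otu, lineage) pairs whose lineage is absent from the reference.

    Assumes a header row starting with 'OTU' and three tab-separated columns.
    """
    missing = []
    for line in cons_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] == "OTU":
            continue  # skip the header and malformed lines
        ranks = normalize(fields[2])
        # Ignore trailing "unclassified" ranks; they carry no name to look up.
        while ranks and ranks[-1].endswith("unclassified"):
            ranks = ranks[:-1]
        if ranks and ranks not in known:
            missing.append((fields[0], ";".join(ranks) + ";"))
    return missing


# Illustrative data only: the sequence ID and OTU labels are hypothetical.
ref_lines = [
    "ref1\tk__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;"
    "f__Bacteroidaceae;g__Bacteroides;s__;",
]
cons_lines = [
    "OTU\tSize\tTaxonomy",
    "Otu001\t120\tBacteria(100);Bacteroidetes(100);Bacteroidia(100);"
    "Bacteroidales(100);Bacteroidaceae(100);Bacteroides(100);",
    "Otu002\t30\tk__Bacteria(100);p__Bacteroidetes(100);c__Bacteroidia(98);",
]

known = reference_prefixes(ref_lines)
missing = missing_lineages(cons_lines, known)
for otu, lineage in missing:
    print(otu, lineage, "is not in the reference taxonomy")
```

With real files you would pass the opened reftaxonomy and constaxonomy files in place of the two sample lists. Any OTU it reports was classified against a different taxonomy than the one given as reftaxonomy, which is the situation in the original post.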
I have been playing with 16S data for years now and still cannot get my head around the Silva/RDP/Greengenes conundrum. People want to use PICRUSt on their data, and yet Google keeps telling me that Greengenes is neither the most up-to-date nor the best database to use.
Seems like PICRUSt should make a version that works with other databases…
Agreed, although I find PICRUSt to be pretty sketchy anyway: imputing function from 16S is problematic for most environments, IMHO. You should talk with the PICRUSt developers about adding other databases.
I just wasted 15 minutes on the Google Groups site and have come to the conclusion that PICRUSt is abandonware… (sort of like Greengenes :o )
Converting the source to R and generating a clone that works with Silva looks like a good student project anyway. And agreed on the sketchy part, but sometimes you get paid to do what they tell you, not what makes sense. :roll:
Thanks for this discussion. I had wanted to put some data into PICRUSt to see if I could get back predicted metabolic data, and from this exchange it appears that this isn’t the best use of my time.
Are there other ways to generate predicted metabolic pathways from 16S data, or is that whole idea sufficiently sketchy that I should walk away from it?