PICRUSt analysis using mothur output files

Hello,
I was wondering whether there is any way to run a PICRUSt analysis using mothur output files.
PICRUSt uses only the Greengenes database, but I used the SILVA database to assign taxonomy. Can PICRUSt be run with SILVA classifications?

Thank you so much in advance.

Hi Pratima,

You’ll have to generate a new set of classifications using the Greengenes taxonomy. Once you have that, you’ll want to follow the process I’ve outlined here. Alternatively, PICRUSt2 doesn’t require a taxonomy file and can be run using the files generated by running pre.cluster.

Pat

Hi Pat,

I had been looking for a way to pipe mothur output into PICRUSt. Personally, I am not comfortable using the SILVA reference for OTU clustering and classification and then Greengenes for PICRUSt. I have been doing this functional prediction with Tax4Fun (R-based software), but I am really curious how the results differ between the two tools. Good to know that PICRUSt2 takes the pre.cluster output files as input.

cheers,
Fang

Thank you so much Pat for your response.
I am planning to rerun the analysis using the Greengenes database. I have a question about the step in the MiSeq SOP where pcr.seqs is run on silva.bacteria.fasta:
mothur > pcr.seqs(fasta=silva.bacteria.fasta, start=11894, end=25319, keepdots=F, processors=8)
What should we do if we are using the Greengenes database? Are we supposed to run the same pcr.seqs command on silva.bacteria.fasta?

Thank you.
Pratima

You would need to find your own coordinates. We provide a tutorial on how to do this at http://blog.mothur.org/2016/07/07/Customization-for-your-region/. However, that is really meant for customizing the alignment reference and I can’t think of a reason anyone should use the greengenes reference alignment. For 16S rRNA genes, you should always use the SILVA reference; however, for classification you can certainly use greengenes, but I’m not sure how important it is to customize the database to a specific region.
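To make "find your own coordinates" more concrete: one way is to align your amplicon sequences to the SILVA reference (for example with align.seqs) and read off the start and end alignment columns that summary.seqs reports for most of them. Those are the kind of values that go into pcr.seqs as start= and end=. The Python sketch below imitates that calculation on an aligned FASTA string. It is not mothur's code, and it assumes that '.' and '-' are the only gap characters and that the median start and end are a reasonable choice.

```python
from statistics import median

# Gap characters used in mothur/SILVA alignments (assumption: no others occur).
GAP_CHARS = frozenset(".-")


def parse_fasta(text):
    """Map each sequence name to its (aligned) sequence string."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {seq_name: "".join(parts) for seq_name, parts in records.items()}


def start_end(aligned_seq):
    """1-based first and last alignment columns holding a base, or None if all gaps."""
    columns = [i + 1 for i, char in enumerate(aligned_seq) if char not in GAP_CHARS]
    return (columns[0], columns[-1]) if columns else None


def suggest_coordinates(aligned_fasta):
    """Median start and end over all sequences (truncated to integers)."""
    spans = [span for span in map(start_end, parse_fasta(aligned_fasta).values()) if span]
    if not spans:
        raise ValueError("no aligned bases found")
    return int(median(s for s, _ in spans)), int(median(e for _, e in spans))


demo = ">r1\n....ACGT--A...\n>r2\n...GACGT--AT..\n>r3\n....ACGT--A...\n"
print(suggest_coordinates(demo))  # (5, 11)
```

With real data, `demo` would be replaced by the contents of your .align file. Taking the median instead of the most extreme positions keeps a few badly aligned reads from stretching the region.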

I found my own coordinates by aligning my sequences to silva.bacteria.fasta; I needed custom coordinates because we amplified the V4-V5 region. By "classification using Greengenes", do you mean using the Greengenes database as the reference when running the command below?
mothur > classify.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table, reference=trainset9_032012.pds.fasta, taxonomy=trainset9_032012.pds.tax, cutoff=80)

Thank you so much for your cooperation.

You don’t need the coordinates to run classify.seqs. Also, if you use the latest version of picrust, you don’t need classified data at all.

Hi
I was trying to run Tax4Fun on my mothur output. From your post, I learned that you used Tax4Fun for functional prediction. I tried it but ran into errors. I used mothur with SILVA release 132 for OTU clustering and classification, and I couldn’t get any further with Tax4Fun. Is there any way I can email you so that I can share the errors and issues?

I converted my shared file to a BIOM file in mothur, then converted the BIOM file to a text file using QIIME 2, and used that text file as input for Tax4Fun:

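library(Tax4Fun)
# Import the tab-delimited OTU table converted from the BIOM file
file <- importQIIMEData("growth.txt")
# Use forward slashes: "\U" in "C:\Users..." is an invalid escape in an R string
folderReferenceData <- "C:/Users/pratima/Desktop/SILVA132"
file_collapsed <- Tax4Fun(file, folderReferenceData, fctProfiling = FALSE, refProfile = "UProC", shortReadMode = TRUE, normCopyNo = TRUE)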

Thank you so much. Your help will be highly appreciated.
Pratima

Hi Pratima,

Yes, I can help look into your errors. My guess is that the problem is due either to an incorrectly formatted QIIME input file or to the reference folder. If you check the Tax4Fun documentation, you will see that the most recent SILVA reference available for Tax4Fun is SILVA123, not SILVA132.
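A quick way to test the "incorrect format" guess is to inspect the text file before it reaches importQIIMEData. The Python sketch below assumes that Tax4Fun wants the classic QIIME 1 tab-delimited layout that `biom convert --to-tsv --header-key taxonomy` writes. That layout has an optional `# Constructed from biom file` line, a `#OTU ID` header row, one numeric column per sample, and a final `taxonomy` column. This requirement is an assumption here and is not confirmed in this thread. If the BIOM file is converted without `--header-key taxonomy`, the taxonomy column is left out, and this check would flag that.

```python
def check_tax4fun_table(text):
    """Return a list of layout problems; an empty list means none were detected."""
    lines = [line for line in text.splitlines() if line.strip()]
    # biom convert writes this comment line first; skip it if present.
    if lines and lines[0].startswith("# Constructed"):
        lines = lines[1:]
    if not lines or not lines[0].startswith("#OTU ID"):
        return ["header row does not start with '#OTU ID'"]

    problems = []
    header = lines[0].split("\t")
    has_taxonomy = header[-1].strip().lower() == "taxonomy"
    if not has_taxonomy:
        problems.append("last column is not 'taxonomy' (was --header-key taxonomy used?)")
    # Sample count columns sit between the OTU ID and the optional taxonomy column.
    last_count_col = len(header) - 1 if has_taxonomy else len(header)

    for row_number, line in enumerate(lines[1:], start=1):
        fields = line.split("\t")
        if len(fields) != len(header):
            problems.append(f"data row {row_number}: {len(fields)} fields, header has {len(header)}")
            continue
        for value in fields[1:last_count_col]:
            try:
                float(value)
            except ValueError:
                problems.append(f"data row {row_number}: non-numeric count {value!r}")
                break
    return problems


example = "# Constructed from biom file\n#OTU ID\tA\tB\ttaxonomy\nOtu001\t5.0\t1.0\tBacteria;Firmicutes;\n"
print(check_tax4fun_table(example))  # []
```

Passing your own text file (read with `open(...).read()`) in place of `example` lists any rows or headers that break the assumed layout.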

I have just documented my code and notes in my GitHub repo. You can find the details there, along with my email address.

I also tried Tax4Fun2, the latest beta version. It gave me different results from Tax4Fun. This is not entirely unexpected, since Tax4Fun2 works with the most recent database instead of a precalculated KEGG profile. The problem is that it is very slow.

By the way, I tried PICRUSt2 too. It worked very efficiently and output a lot of informative tables. If you are interested, I have all the custom workaround code (which I haven’t had time to put on my GitHub repo yet).

Best,
Fang

Thank you so much Fang for your prompt response.
So, to use Tax4Fun, do I need to reclassify the OTUs with the SILVA123 database, since I used SILVA132 for the classification?

I was also thinking of using PICRUSt, but I haven’t tried it yet because I wasn’t comfortable reassigning the taxonomy with the Greengenes database. For this, did you reclassify your OTUs using the Greengenes database?

If so, would you mind posting that code on GitHub?

Thank you again. Appreciate your cooperation.

Pratima

Hi Pratima,

For using Tax4Fun, you do not need to redo classification using Silva 123.

As for PICRUSt2, you do not need to reclassify your contigs against the Greengenes database. You just need to generate a FASTA file containing all unique sequences and the associated count table. I will try to document this later today or tomorrow.
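Here is a minimal Python sketch of that step. It rests on several assumptions:

* The count_table uses mothur's plain (uncompressed) layout: `Representative_Sequence`, `total`, then one column per sample.
* The FASTA is the matching pre.cluster-stage file. It may still carry alignment gaps ('.' and '-'), which are stripped here on the assumption that PICRUSt2 expects unaligned study sequences. mothur's degap.seqs does the same job.
* A plain tab-separated sequence-by-sample table is an accepted abundance-table input. The `sequence` header cell is an arbitrary choice.

Only sequences present in both files are kept.

```python
# Translation table that deletes mothur alignment gap characters.
DEGAP = str.maketrans("", "", ".-")


def parse_fasta(text):
    """Map each sequence name to its sequence string."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {seq_name: "".join(parts) for seq_name, parts in records.items()}


def parse_count_table(text):
    """Return (sample names, {sequence name: per-sample counts}) from a plain count_table."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if lines[0].startswith("#"):
        # Assumption: compressed count_tables begin with '#' comment lines.
        raise ValueError("compressed count_table detected; this sketch only reads the plain layout")
    header = lines[0].split("\t")
    samples = header[2:]  # skip Representative_Sequence and total
    counts = {}
    for line in lines[1:]:
        fields = line.split("\t")
        counts[fields[0]] = [int(value) for value in fields[2:]]
    return samples, counts


def picrust2_inputs(fasta_text, count_text):
    """Build (unaligned FASTA text, abundance TSV text) for sequences present in both inputs."""
    samples, counts = parse_count_table(count_text)
    sequences = parse_fasta(fasta_text)
    fasta_lines = []
    table_lines = ["\t".join(["sequence", *samples])]
    for name, row in counts.items():
        if name not in sequences:
            continue  # skip count_table rows with no matching sequence
        fasta_lines += [f">{name}", sequences[name].translate(DEGAP)]
        table_lines.append("\t".join([name, *map(str, row)]))
    return "\n".join(fasta_lines) + "\n", "\n".join(table_lines) + "\n"


fasta = ">s1\n..AC-GT..\n>s2\nAAAA\n"
count_table = "Representative_Sequence\ttotal\tA\tB\ns1\t3\t1\t2\ns2\t4\t4\t0\n"
study_fasta, abundance_tsv = picrust2_inputs(fasta, count_table)
print(study_fasta)
print(abundance_tsv)
```

With real files, the two strings would come from your final .fasta and .count_table. The two outputs would then be written to disk and passed to PICRUSt2 as the study sequences and the abundance table.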

Cheers,
Fang

That sounds awesome. I will try it and let you know. I really appreciate your help; it means a lot.
I have been trying to sort out this problem for a long time but got stuck.

One more question: we have to convert the shared output file to BIOM (which I did using make.biom in mothur), then to text (using QIIME), and then use the resulting text file for Tax4Fun. Am I doing it right?
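If the BIOM and QIIME steps keep causing trouble, a possible shortcut is to write the tab-delimited table directly from mothur's .shared and .cons.taxonomy files. The Python sketch below makes these assumptions:

* The shared file holds a single label (one distance cutoff) with integer counts.
* Both files come from the same OTU list.
* Tax4Fun matches plain SILVA lineages once mothur's bootstrap values such as `(100)` are removed.

The output imitates the layout of `biom convert --to-tsv --header-key taxonomy`. None of this is confirmed by the Tax4Fun documentation discussed in this thread.

```python
import re

# mothur appends bootstrap confidences such as "(100)" to each rank.
CONFIDENCE = re.compile(r"\(\d+(?:\.\d+)?\)")


def read_shared(text):
    """Return (sample names, {OTU: per-sample counts}) from a single-label .shared file."""
    rows = [line.strip().split("\t") for line in text.splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    otus = header[3:]  # skip label, Group, numOtus
    samples = [row[1] for row in body]
    table = {otu: [int(row[3 + i]) for row in body] for i, otu in enumerate(otus)}
    return samples, table


def read_cons_taxonomy(text):
    """Return {OTU: lineage} from a .cons.taxonomy file, without confidence values."""
    rows = [line.strip().split("\t") for line in text.splitlines() if line.strip()]
    return {otu: CONFIDENCE.sub("", lineage) for otu, _size, lineage in rows[1:]}


def shared_to_qiime_table(shared_text, taxonomy_text):
    """Build a QIIME-1-style OTU table (OTUs as rows, samples as columns)."""
    samples, table = read_shared(shared_text)
    taxonomy = read_cons_taxonomy(taxonomy_text)
    lines = ["# Constructed from biom file", "\t".join(["#OTU ID", *samples, "taxonomy"])]
    for otu, counts in table.items():
        # A KeyError here means the two files were not built from the same OTU list.
        lines.append("\t".join([otu, *map(str, counts), taxonomy[otu]]))
    return "\n".join(lines) + "\n"


shared = "label\tGroup\tnumOtus\tOtu001\tOtu002\n0.03\tA\t2\t5\t0\n0.03\tB\t2\t1\t3\n"
cons_taxonomy = (
    "OTU\tSize\tTaxonomy\n"
    "Otu001\t6\tBacteria(100);Firmicutes(98);\n"
    "Otu002\t3\tBacteria(100);Proteobacteria(90);\n"
)
print(shared_to_qiime_table(shared, cons_taxonomy), end="")
```

With real files, you would read your .shared and .cons.taxonomy files into the two strings and save the returned text as the file passed to importQIIMEData.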

Thanks again,