Hi, I have downloaded Silva 132 from here (https://www.mothur.org/wiki/Silva_reference_files). I also downloaded the RDP database (https://www.mothur.org/wiki/RDP_reference_files).
The RDP database has two files: one is the fasta file, the other is the id-to-tax file, so I can use it directly. However, the Silva database is so weird. The “Full length sequences and taxonomy references” package doesn’t seem to have a fasta file or an id-to-tax file like the RDP database does.
Do you know where I can download a ready-to-use Silva database (like RDP)?
The SILVA reference file archive gives you silva.nr_v132.align. The align file is the fasta file you want to use.
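If you want to convince yourself that the .align file really is laid out like a fasta file, a quick check along these lines might help. This is only a minimal sketch: `looks_like_fasta` is a hypothetical helper written for this thread, not part of mothur, and it only looks at the first few records.

```python
def looks_like_fasta(path, max_records=3):
    """Return True if the first few records follow the FASTA layout:
    a header line starting with '>' followed by at least one sequence line.
    """
    records = 0
    expecting_sequence = False
    with open(path, "r") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                if expecting_sequence:
                    return False  # previous header had no sequence
                records += 1
                if records > max_records:
                    break  # enough records checked
                expecting_sequence = True
            else:
                if records == 0:
                    return False  # sequence data before any header
                expecting_sequence = False
    return records > 0 and not expecting_sequence

# Example call (file name taken from the post above):
# looks_like_fasta("silva.nr_v132.align")
```

Aligned sequences will contain gap characters such as `-` and `.` in the sequence lines, which this check deliberately does not reject.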
The SILVA file is compressed twice. When you decompress the *.gz file that you download, you get another file, which you have to decompress a second time. That gives you the *.tax and *.align (fasta-format) files that you need. I got confused by this as well.
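If you would rather not click through two rounds of decompression, both layers can be unpacked in one go with a short script. This is a rough sketch that assumes the download is a gzip-compressed tar archive (a .tgz or .tar.gz file); `extract_archive` and the file and folder names in the example call are made up for illustration.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir="."):
    """Extract a gzip-compressed tar archive (.tgz / .tar.gz) in one step.

    Returns the sorted names of the members found in the archive.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Mode "r:gz" handles both layers: gzip decompression and tar unpacking.
    with tarfile.open(archive_path, "r:gz") as archive:
        names = sorted(archive.getnames())
        # Newer Python versions offer a "data" filter that refuses unsafe
        # members (e.g. absolute paths); use it only when it is available.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        archive.extractall(dest, **kwargs)
    return names

# Example call (hypothetical file and folder names):
# print(extract_archive("silva.nr_v132.tgz", "silva_132"))
```

The returned list should show the files that ended up in the destination folder, which is a quick way to confirm you got the *.align and *.tax files.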
In case you didn’t know (it’s not fully clear from your post), there is a detailed description of how to prepare the SILVA files for use with mothur:
And an overview of previous versions with download links:
Also, just to be clear - you don’t have to run the code in the blog post - that is for transparency and for those that might want to tweak what we did. The actual files are provided at the wiki link from above.
Hello Sir, can we use the .align file directly in our command, or do we need to change its extension to .fasta? Thank you.
How did you unzip the file a second time? When I decompress the *.gz file it no longer has an extension.
Open your software first (WinZip, WinRAR, or whatever you use), then open and unzip the file from there; it doesn’t need an extension for that. At least 7-Zip, which I use, has no problems with it.
I ended up using 7-Zip like you recommended, and it worked perfectly. Thank you so much!