PR2 database annotated eukaryote 18S

Hi Pat

PR2 is a database dedicated to eukaryotic 18S sequences with good taxonomic annotation (better than Silva, because it is curated by specialists).

A paper describing it has been published in NAR:

I have created a fasta file and a taxonomy file in mothur format for the latest release of PR2 (based on gb200). Is there any way this could be posted on the mothur wiki for other people to use? How should I proceed?

Daniel Vaulot, Station Biologique de Roscoff

You can post it to the wiki and add a link on the reference taxonomy databases page. That would be a great contribution!

Hi Pat

This is done:


I tried to use the PR2 reference database and downloaded the files for mothur from the website. However, the taxonomy files seem to have a problem: when I try to open them, they contain undecipherable characters. Could you please help me?

A. Volant
postgraduate at Hydroscience Montpellier, France

It looks like there’s an encoding error in the mothur/QIIME taxonomy files. The gb203_pr2.tlf file looks fine, though, so in the short term your best bet is to convert it into a mothur-compatible version.
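To confirm that the problem really is the file encoding rather than the taxonomy content, one option is to try decoding each line and list the ones that fail. This is a minimal sketch that assumes the taxonomy files are meant to be UTF-8 (an assumption; the thread does not say which encoding PR2 uses), and `find_bad_lines` is a hypothetical helper name.

```python
import sys


def find_bad_lines(path, encoding="utf-8"):
    """Return (line number, error message) for every line that fails to decode."""
    bad = []
    # Read raw bytes so that decoding errors can be caught line by line
    with open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError as err:
                bad.append((number, str(err)))
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    problems = find_bad_lines(sys.argv[1])
    for number, message in problems[:20]:  # show at most the first 20
        print("line %d: %s" % (number, message))
    print("%d undecodable line(s)" % len(problems))
```

Run it on the suspicious taxonomy file and on gb203_pr2.tlf: if only the former reports undecodable lines, converting from the .tlf file avoids the problem.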

Here’s a quick python script that will do the trick:

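import sys

# Convert a PR2 .tlf file into mothur's two-column taxonomy format:
# sequence ID, a tab, then the ranks separated by ';' with a trailing ';'
with open(sys.argv[1], 'r') as tlf:
    for line in tlf:
        fields = line.strip().split('\t')
        if fields == ['']:
            continue  # skip blank lines
        seq = fields.pop(0)  # first column is the sequence ID
        # The remaining fields hold taxon names wrapped in {...};
        # the field right after the sequence ID is skipped (fields[1:])
        taxList = [x.split('{')[1].replace('}', '') for x in fields[1:]]
        tax = ';'.join(taxList)
        print('%s\t%s;' % (seq, tax))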

Just save this into a file and run it as follows (file names in angle brackets are placeholders):

python <script>.py gb203_pr2.tlf > <output>.tax

And you should be good to go.
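Before handing the converted file to mothur, it can help to sanity-check it. The sketch below assumes mothur's usual two-column taxonomy layout (sequence ID, a tab, then semicolon-separated ranks ending in `;`), which is what the script above prints. The checks and the helper name `check_mothur_taxonomy` are illustrative, not mothur's own validation.

```python
import sys
from collections import Counter


def check_mothur_taxonomy(lines):
    """Return (list of problems, Counter of taxonomy depths) for mothur-style lines."""
    problems = []
    depths = Counter()
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        columns = line.split("\t")
        if len(columns) != 2:
            problems.append((number, "expected 2 tab-separated columns, found %d" % len(columns)))
            continue
        seq_id, tax = columns
        if not seq_id:
            problems.append((number, "empty sequence ID"))
        if not tax.endswith(";"):
            problems.append((number, "taxonomy does not end with ';'"))
        # Drop only the single trailing ';' so that empty trailing ranks are still caught
        body = tax[:-1] if tax.endswith(";") else tax
        ranks = body.split(";")
        if any(rank == "" for rank in ranks):
            problems.append((number, "empty rank"))
        depths[len(ranks)] += 1
    return problems, depths


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        problems, depths = check_mothur_taxonomy(handle)
    for number, message in problems[:20]:  # show at most the first 20
        print("line %d: %s" % (number, message))
    print("%d problem(s); taxonomy depths: %s" % (len(problems), dict(depths)))
```

The depth counts are only reported, not flagged, since whether mixed depths matter depends on how the file is used.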

Thanks a lot :wink:

The PR2 page seems to be missing.

Did it disappear during the wiki update?

Yes, I would also be interested in being able to use the PR2 database in mothur.

I downloaded the PR2 files from their website.
How do you use the PR2 fasta file and PR2 taxonomy file as the reference for alignment?

When I try to use them I get an error that the PR2 fasta file is not aligned.

I think it will only work for classification, not alignment, because the PR2 fasta file is not aligned. You will only want to use it with classify.seqs.
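The "not aligned" error is what you would expect if the PR2 fasta file contains unaligned sequences, since align.seqs needs an aligned reference. As a rough heuristic (an assumption, not mothur's own check), an aligned fasta has sequences of identical length that contain gap characters such as `-` or `.`. The hypothetical helpers below test for that.

```python
import sys


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def looks_aligned(records):
    """Heuristic: aligned FASTA has equal-length sequences containing gap characters."""
    lengths = set()
    has_gaps = False
    for _, seq in records:
        lengths.add(len(seq))
        if "-" in seq or "." in seq:
            has_gaps = True
    return len(lengths) == 1 and has_gaps


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        aligned = looks_aligned(read_fasta(handle))
    print("looks aligned" if aligned else "looks unaligned: use it with classify.seqs, not align.seqs")
```

If the file looks unaligned, pass it to classify.seqs instead, along the lines of `classify.seqs(fasta=<your_seqs>.fasta, reference=<pr2>.fasta, taxonomy=<pr2>.tax)`, where the file names are placeholders.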


An updated version of the PR2 database is now available on Figshare:

The PR2 database has been updated.