Silva database update

Hello,

I would like to use mothur with the new SILVA database, which can be downloaded from this link

Is there a mothur-compatible version of this database?

Another question: for the bacterial sequences here, where can I find the unaligned sequences?

Best Regards,
Hadidi

It’s fairly trivial to create the mothur taxonomy/template files using the ARB export function. If you mark your sequences in ARB you can export them as fasta directly, and then create an export template like:

SUFFIX tax
BEGIN
*(name) *(tax_slv);

to make the taxonomy file.

There are a few little things you need to clean up afterwards: removing spaces from the taxonomies, removing any double ;; in the taxonomies, and removing spaces/gaps from the fasta file. That said, it’s worthwhile cleaning the database up first, since there are often suspected chimeras, incomplete taxonomies, and many redundant sequences in the ARB databases.
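The taxonomy part of that cleanup can be sketched in a few lines of Python. This is only an illustration, not a mothur or ARB tool: the function names are made up here, and it assumes each exported line holds the sequence name, then whitespace, then the taxonomy string. Spaces are deleted outright (you might prefer replacing them with underscores), and a trailing ; is added if missing, on the assumption that your taxonomy file should end each line that way.

```python
import re


def clean_taxonomy_line(line):
    """Return 'name<TAB>taxonomy' with spaces and repeated ';' removed.

    Assumes the first whitespace-delimited token is the sequence name
    and everything after it is the taxonomy string.
    """
    parts = line.strip().split(None, 1)
    if len(parts) != 2:
        raise ValueError(f"expected 'name taxonomy', got: {line!r}")
    name, taxonomy = parts
    taxonomy = re.sub(r"\s+", "", taxonomy)     # drop spaces inside taxon names
    taxonomy = re.sub(r";{2,}", ";", taxonomy)  # collapse ';;' into ';'
    if not taxonomy.endswith(";"):
        taxonomy += ";"                         # assumed: lines end with ';'
    return f"{name}\t{taxonomy}"


def clean_taxonomy_file(src_path, dst_path):
    """Clean every non-empty line of src_path and write the result to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(clean_taxonomy_line(line) + "\n")
```

Something like clean_taxonomy_file("export.tax", "silva.clean.tax") would then process a whole export; both file names are placeholders.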

I can’t remember exactly where to download the unaligned sequences, but you could just run degap.seqs on silva.bacteria.fasta and you’d have the same result.
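In mothur itself that is just degap.seqs(fasta=silva.bacteria.fasta). If it helps to see what degapping amounts to, here is a rough Python equivalent. It is a sketch and not mothur's implementation: the function name is invented, and it assumes gaps are written as - or . and that any stray spaces in sequence lines should go as well.

```python
# Characters treated as gaps/padding in an aligned fasta (assumed: '-', '.', ' ').
_GAP_TABLE = str.maketrans("", "", "-. ")


def degap_fasta(lines):
    """Yield fasta lines with gap characters stripped from sequence lines.

    Header lines (starting with '>') are passed through unchanged;
    sequence lines that are empty after degapping are skipped.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield line
        else:
            seq = line.translate(_GAP_TABLE)
            if seq:
                yield seq
```

Feeding it an open file handle and writing each yielded line back out gives an unaligned copy of the reference.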

Thanks Dwaite for your help.
Could you please tell me a reference for “That said, it’s worthwhile cleaning the database up first, since there are often suspected chimeras, incomplete taxonomies, and many redundant sequences in the ARB databases.” My supervisor asked me about it.

Thanks,

I don’t know of any reference; these are just things we’ve observed while working with ARB. I realise that you’re working with the NR database, so this may not hold true for you.

>Suspected chimeras
From the SILVA 108 SSU_Ref database (this is the last one I worked with, so had it on my computer) I marked and exported sequences with a pintail score of less than 75. There were 73,211 of these from about 600,000 sequences total. I aligned them in mothur, where 15,604 couldn’t be aligned at all, and of those that aligned 8,848 were flagged as chimeras.
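To put those counts in proportion, here is the arithmetic as a quick Python check. The only inputs are the numbers quoted above; the 600,000 total is the rough figure given, so the first percentage is approximate.

```python
total_approx = 600_000  # "about 600,000" sequences in SSU_Ref 108
low_pintail = 73_211    # pintail score below 75
unaligned = 15_604      # could not be aligned in mothur
chimeric = 8_848        # flagged as chimeras among those that aligned

aligned = low_pintail - unaligned
pct_low_pintail = round(100 * low_pintail / total_approx, 1)
pct_unaligned = round(100 * unaligned / low_pintail, 1)
pct_chimeric = round(100 * chimeric / aligned, 1)

print(f"aligned: {aligned}")                                  # 57607
print(f"low pintail, % of database: {pct_low_pintail}")       # 12.2
print(f"unaligned, % of low-pintail set: {pct_unaligned}")    # 21.3
print(f"chimeric, % of aligned sequences: {pct_chimeric}")    # 15.4
```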

This is always a bit tricky, because results can change depending on your chimera detection algorithm and reference database, but there are definitely sequences in the database that I would be suspicious of.

>Incomplete taxonomies
ARB databases have 3 fields for taxonomy - SILVA, RDP and Greengenes (tax_slv, tax_rdp, and tax_gg, if I remember correctly). Because the databases use SILVA sequences, they don’t always have an equivalent taxonomy in RDP/Greengenes. Basically, the tax_slv field is always full, but sometimes the other two will be cut short at some level.

>Redundant sequences
You’re using the NR version, so don’t worry about this.

It’s in our PLoS ONE pipeline paper, the Haas ChimeraSlayer paper, and the current version is available through BEI.

How do you create an export template in ARB? I only know how to export the NDS info and a fasta file.

Thanks!

If you’re using the Linux version, the templates are stored in /usr/lib/arb/export. They’re just simple text files with the extension ‘eft’. The syntax is pretty simple, so you can generally work out how to make new templates from reading the existing ones.
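As a small sketch, the taxonomy template from earlier in the thread could be written out as an eft file like this. The file name mothur_tax.eft and the function name are made up for illustration, the template text is copied from the earlier post as-is, and writing into /usr/lib/arb/export will usually need root permissions.

```python
from pathlib import Path

# Template text as given earlier in the thread.
TEMPLATE = "SUFFIX tax\nBEGIN\n*(name) *(tax_slv);\n"


def write_export_template(directory, filename="mothur_tax.eft"):
    """Write the taxonomy export template into `directory` and return its path.

    `filename` is a hypothetical name; only the '.eft' extension matters here.
    """
    path = Path(directory) / filename
    path.write_text(TEMPLATE)
    return path
```

For example, write_export_template("/usr/lib/arb/export") would drop the file alongside the existing templates, assuming that is where your install keeps them.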


[quote="mhadidi"] Thanks Dwaite for your help. Could you please tell me a reference for "That said, it's worthwhile cleaning the database up first, since there are often suspected chimeras, incomplete taxonomies, and many redundant sequences in the ARB databases." My supervisor asked me about it.

Thanks,
[/quote]
Also, I know this is old now, but I was reading the SILVA documentation recently and realised that they deliberately leave chimeras in the database, and leave it to the researcher to decide whether to use them or not.