Formatting Silva v128 reference taxonomy

pdcountway · August 20, 2017, 4:47pm

Hi,

I’ve exported a Silva v128 database from ARB that includes the taxonomy field ‘tax_slv’ as well as the ‘full_name’ field for the 190,661 sequences. I did this so that I can (in most cases) get the species or strain name for the sequence, instead of just a taxonomy truncated to genus. I recognize that in some cases the ‘full_name’ will be misleading, e.g., ‘full_name’ will refer to a eukaryote that was targeted for genome sequencing when the reported sequence was bacterial, but I think I can deal with this.

In the README here: http://blog.mothur.org/2017/03/22/SILVA-v128-reference-files/ there is a step in which you run R code (provided by Eric Collins) to map the taxa to 6 Linnaean levels, a process that starts with reading in the Silva mapping file ‘tax_slv_ssu_128.txt’. This may seem obvious, but when I run the R code on the modified taxonomy file (including ‘full_name’) the process fails and no taxonomy string is returned. The code looks for an exact match between the text in the tax_slv_ssu_128.txt file and the taxonomy string of each of the 190,661 sequences, and that match fails because I appended the ‘full_name’ to the taxonomy. I’ve checked, and the folks at ARB/Silva do not have a version of tax_slv_ssu_128.txt that includes species or strain.
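
To illustrate the mismatch, here is a minimal Python sketch (not the R code from the README) of one possible way around it: split the appended ‘full_name’ off the end of the taxonomy string, match only the remaining path against the mapping file, and keep the name to re-attach afterwards. The function names and the toy mapping rows are hypothetical, and I am assuming that the first tab-separated column of tax_slv_ssu_128.txt holds the taxonomy path ending in ‘;’.

```python
def split_full_name(tax_string):
    """Split 'rank1;rank2;...;full_name' into (path, full_name).

    Assumes full_name is the last ';'-separated field and does not
    itself contain a ';'.
    """
    path, _, full_name = tax_string.rstrip().rpartition(";")
    # Normalise so the path ends with exactly one ';'
    # (assumed format of the mapping file's first column).
    path = path.rstrip(";") + ";"
    return path, full_name.strip()


def load_known_paths(lines):
    """Collect the first tab-separated column of each mapping line."""
    return {line.split("\t")[0] for line in lines if line.strip()}


# Toy rows for illustration only, not copied from the real mapping file.
mapping_lines = [
    "Bacteria;\t1\tdomain\t\t",
    "Bacteria;Cyanobacteria;\t2\tphylum\t\t",
]
known_paths = load_known_paths(mapping_lines)

# Hypothetical taxonomy string with a full_name appended.
path, full_name = split_full_name("Bacteria;Cyanobacteria;Microcystis aeruginosa")
matched = path in known_paths
```

With the name removed, the lookup key has the same form as the unmodified taxonomy, so the exact-match step could in principle run as before and the ‘full_name’ could be appended as an extra level once the mapping is done. A ‘full_name’ that contains a semicolon would break this split and would need separate handling.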

The reason I’m doing all of this is to try to get better identification for OTUs from harmful algal species and cyanobacteria. The cyanos are problematic as some of the OTUs are simply identified as ‘Family I’, but when you BLAST the sequences that comprise them, they’re identified to at least Genus if not species.

Can anyone suggest a work-around for including species in the v128 reference files and then getting this taxonomy to work in mothur?

Thanks,
Pete

FM_Kerckhof · August 25, 2017, 10:35am

Hi Pete,

My colleague Tim & I are facing a similar issue: we have PacBio SMRTbell data from both full-length bacterial and archaeal primer sets and want to add at least a hint of species-level classification. We believe this should be possible with an average read length of 1500 nucleotides of the 16S that has been covered at least 4 (but on average a lot more) times.

I will have a look at the readme and try to adapt it myself. I’ve been through the process in the past and I know it is not always foolproof but I am quite competent in using Arb & R so I think it should be feasible.

Furthermore, I was wondering: would you be willing to share the Arb export filter you used (*.eft) with me?

Kind regards,

FM Kerckhof

pdcountway · August 29, 2017, 10:56pm

Hi FM,

Here’s the *.eft file I’ve been using to extract the full name with the taxonomy…

SUFFIX fasta
BEGIN

>*(acc).*(name)\t*(align_ident_slv)\t*(tax_slv);*(full_name)
*(|export_sequence)
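
In case it helps, here is a rough Python sketch of how the headers this filter writes could be turned into taxonomy lines. It assumes each header has three tab-separated fields (accession.name, alignment identity, and taxonomy ending in the full name). The function names and the example header are made up, and replacing spaces with underscores is only a guess at what downstream tools might need.

```python
def parse_header(header):
    """Parse one exported FASTA header into (seq_id, identity, taxonomy)."""
    fields = header.rstrip("\n").lstrip(">").split("\t")
    if len(fields) != 3:
        raise ValueError(f"expected 3 tab-separated fields, got {len(fields)}")
    seq_id, identity, taxonomy = fields
    return seq_id, identity, taxonomy


def to_taxonomy_line(seq_id, taxonomy):
    """Build 'seq_id<TAB>level1;level2;...;' from a parsed taxonomy."""
    # Drop empty levels (e.g. from a doubled ';') and replace spaces
    # inside names with underscores (an assumption, see lead-in).
    levels = [p.strip().replace(" ", "_") for p in taxonomy.split(";") if p.strip()]
    return f"{seq_id}\t{';'.join(levels)};"


# Made-up header for illustration; not a real SILVA record.
example = ">XX000000.1.1500\t98.5\tBacteria;Cyanobacteria;Example name sp. X1"
seq_id, identity, taxonomy = parse_header(example)
line = to_taxonomy_line(seq_id, taxonomy)
```

The alignment identity is kept as a string here so that unexpected values do not stop the parse.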

Cheers,

Pete
