18S rDNA classification issues

Hello everyone,

I’m trying to adapt the Miseq SOP to 18S rDNA, I’m making strides but they are slow and steady. Hopefully you all can help me speed that up!

I’ve tried using all of the different permutations of SILVA alignment/classification for alignments to minimal success. As a few of the OTUs are classifying to genus levels but many remain unclassified even at the phylum level. I don’t think my problem is the alignment of my sequences and instead have ventured into looking at a way to use the database. I’ve tried going directly to the source, curating 18S SILVA reference sequences from release_119 into a fasta file using some perl scripts.

I’ve generated the taxonomy file and have completed the SOP. The problem that I’m really having (which has only been unique to running 18S sequences) is that I’m finding lots of subfamilies and other things like superorders, I’d love to be able to drop those unnecessary groups and just be left with groups that classify from Phylum, Class, Order, Family, Genus and Species (if possible).I could certainly live with that if everything classified correctly, I’m thinking a related problem is that after I finish classify.seqs I see that not all columns of the output data of OTUs are unique, that is to say that in the same column in the output file you may find OTUs classified to genus but another OTU that is classified family or superfamily for example.

As you can see below not only are there duplicate columns like for super family or genus with different classifications but more concerning is the fact that Craterostigmomorpha is supposed to be an Order and Acariformes is a superorder not an Infraclasses. Even worse is that Craterostigmus is supposed to be a genus not an order. Due to fomratting I can’t seem to work out I’ve left a “;” to show where the line should break and the next classification should start.

Kingdom Phylum Subphylum Class Subclass Infraclass Superorder Order Suborder SuperFamily Superfamily Family Genus Genus
Eukaryota(100) Metazoa(100) Arthropoda(100) Myriapoda(100) Chilopoda(100) Pleurostigmophora(100) Craterostigmomorpha(100) Craterostigmidae(100) Craterostigmus(100);
Eukaryota(100) Metazoa(100) Arthropoda(100) Hexapoda(100) Insecta(100) Pterygota(100) Neoptera(100) Endopterygota(100) Coleoptera(100) Adephaga(100) Caraboidea(100) Carabidae(100) Paussinae(100) Ozaenini(100) Pachyteles(100);
Eukaryota(100) Metazoa(100) Arthropoda(100) Myriapoda(100) Chilopoda(100) Pleurostigmophora(100) Craterostigmomorpha(100) Craterostigmidae(100) Craterostigmus(100);
Eukaryota(100) Metazoa(100) Arthropoda(100) Chelicerata(100) Arachnida(100) Acari(100) Acariformes(100) Sarcoptiformes(100) Oribatida(100) Brachypylina(100) Gustavioidea(100) Xenillidae(100) Xenillus(100);

Any advise?

Thanks for your help,

  • Jake

I’ve also tried to force a classification at 6 levels using the full SILVA file as a template to generate my own taxonomy. The issue there being is tat it gave me 6 levels but only from kingdom to arthropoda for example and I’m looking to squeeze out as much information as I can.

Hi Jake,

The problem is with the eukaryotic taxonomy that is in the reference files. I just learned that SILVA now provides a map of the names to the specific taxonomic levels and so I’ll try to fix this. Because some of the taxonomy descriptions have a gazillion levels and eukaryotic taxonomy is not my strong suit, I just trimmed the strings to 6 levels. Like I said, I’ll try to update this in the next few days.

pat

Hi Pat,

Will you be posting another READ ME file on the mothur blog to show the changes you made? That really helped me out when I was making my own database.

Thanks for your help!

  • Jake

I’ll probably just edit the one that is up there and can post a link to the github history so you can see the changes. Does that work?

Sounds great! I’ll keep my eyes peeled for the changes.

Hi Pat,

Can you post the links to the files/webpages that you are going to update so I can compulsively check them instead of constantly bugging you? :smiley:

Thanks a lot!

  • Jake

I’ll bug you when I get it - I’m just a bit behind at the moment…

Hi Pat,

I just wanted to check in and see how the the updates to the 18s files were going? Or if they were updated and I missed the edits.

Sorry, I’ve gotten bogged down with other things and saw they just put out a new release yesterday. When I looked at their taxonomy mapping file, it seemed like they mix domains and kingdoms and it didn’t immediately seem like an easy fix. Sorry, I’ll get to it - alternatively, if you have R skills and would like to take a swing at it that would be most appreciated!

Pat

I’ve been trying to wrap my head around some of the more complex portions of coding in R so I can give it a shot. How would you go about trying to cull the taxonomies?

I have the same problem to classify 18S taxonomy. I’m not sure that do this issue are fixed? It is difficult to arrange taxonomy in 6 levels.
Thank you very much.

The short answer is no. As far I can tell this problem still persists within the V123 taxonomy as well. I’ve come back to this problem as of late and have gone straight to the source and have been working with the SILVA files but it is slow going. I’ll post an update if I have a breakthrough.

As others have noted, using classify.seqs with the SILVA database provided you get the top 6 levels:

M03580_15_000000000-AHFYP_1_2115_20833_19910 Eukaryota(100);Opisthokonta(100);Holozoa(100);Metazoa_(Animalia)(100);Eumetazoa(100);Bilateria(100);

But if you make your own taxonomy file you can get more:

 grep '>' silva.nr_v123.align | cut -f1,3 -d$'\t' | cut -f2 -d'>' > silva.nr_v123.long.tax
M03580_15_000000000-AHFYP_1_2115_20833_19910 Eukaryota(100);Opisthokonta(100);Holozoa(100);Metazoa_(Animalia)(100);Eumetazoa(100);Bilateria(100);Arthropoda(100);Crustacea(100);Maxillopoda(100);Copepoda(100);Calanoida(100);unclassified;unclassified;unclassified;

What are the things that will break downstream by having more than 6 levels in these files?

Ok, I wrote a script in R that pulls in the SILVA taxonomy map, then re-maps the headers from silva.nr_v123.align to conform to a set of specified taxonomic levels. If there is no level given it invents one based on the closest higher taxonomic level. For my purposes this is better than a million “Incertae Sedis” but YMMV.

you can find the script on github: convert_silva_taxonomy.r

@cryomics I tried to run your r-script and it went well until…

tax.in <- read.table("silva.nr_v123.full",header=F,stringsAsFactors=F,sep="\t")

I get an error because I don’t have this file. Where can I obtain it?

The silva reference files are available here, https://mothur.org/wiki/Silva_reference_files.

In @cryomics R code there is a line you need to run in bash:

grep '>' silva.nr_v123.align | cut -f1,3 | cut -f2 -d'>' > silva.nr_v123.full

This will generate silva.nr_v123.full