I’ve been using mothur to analyze my pyrosequencing dataset and I’ve been having problems with the classification process. I used the Silva database from the taxonomy page along with both the silva taxonomy and the rdp taxonomy. The command runs fine, but the result doesn’t make sense biologically. The dataset I’m analyzing contains bovine rumen fluid samples. We would expect to find Fibrobacteres in those samples, but our output shows none. I’ve explored the silva database and silva taxonomy file on mothur and found only one sequence entry for Fibrobacteres. When I look at the corresponding RDP taxonomy file on mothur, the sequence ID associated with Fibrobacteres in the silva taxonomy file is listed as Bacteria; unclassified. I looked on the Silva website and found 47 Fibrobacter organisms listed in silva and 222 on the RDP website. Following this, I decided to classify all of my sequences on the RDP website, and it reports a total of 413 hits for Fibrobacter (although the RDP classifier does not parse my sequences into groups/samples, so I would rather use the classifier in mothur). Is the silva database on mothur the full (bacterial) database, or have you only chosen to represent major organisms?
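For anyone who wants to repeat the check described above (how many reference sequences a taxonomy file holds for a given phylum), here is a minimal Python sketch. It assumes each taxonomy line looks like `seqID<TAB>Bacteria;Fibrobacteres;...;`, and the function name `count_by_rank` is made up for illustration; it is not part of mothur.

```python
from collections import Counter


def count_by_rank(tax_lines, rank_index=1):
    """Count reference sequences per taxon at one rank of a taxonomy file.

    Assumes tab-separated lines: 'seqID<TAB>Bacteria;Fibrobacteres;...;'.
    rank_index=1 is the level directly below the domain.
    """
    counts = Counter()
    for line in tax_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _seq_id, _, taxonomy = line.partition("\t")
        ranks = [r for r in taxonomy.split(";") if r]
        # Lines that stop above the requested rank are tallied as unclassified.
        name = ranks[rank_index] if len(ranks) > rank_index else "unclassified"
        counts[name] += 1
    return counts
```

Passing an open file handle for the taxonomy file as `tax_lines` and printing `counts.most_common()` should make an under-represented phylum easy to spot.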
Well I’m glad someone is making sure things make sense biologically - good work!
The silva db I provide is a trimmed-down version of the overall database. I obtained it by only using sequences that had a perfect alignment score, were not chimeras, only had a few N’s, and were full-length sequences (28 to 1491). Having fewer than 47 sequences for a phylum is pretty low compared to the others, so it is likely that they were inadvertently missed. For some reason the silva site is down right now, but I suspect that the Fibrobacter sequences probably do not span the full length or have some other problem. There are several things you could do… First, you are totally free to add Fibrobacter sequences to your silva.bacteria.fasta file and to add the accompanying taxonomy lines to the *rdp.tax or *silva.tax files. Second, you can get the actual RDP training set and use that to train the mothur classifier. Feel free to check back in with us if you have any questions on how to do this.
Part of our goal in this project is so that people, such as yourself, can modify these files to suit your own biological goals and questions.
Hope this helps,
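If you go the first route (adding sequences to the fasta file and matching lines to the taxonomy file), it is worth confirming that the two files still agree afterwards. The sketch below is one possible check, not anything built into mothur. The function names are hypothetical, the taxonomy file is assumed to be tab-separated, and the trailing-semicolon check reflects my understanding of the taxonomy format, so treat it as an assumption to verify.

```python
from collections import Counter


def fasta_ids(fasta_lines):
    """Sequence IDs: the first whitespace-delimited token after each '>'."""
    ids = []
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.append(tokens[0])
    return ids


def taxonomy_entries(tax_lines):
    """(seqID, taxonomy string) pairs from tab-separated taxonomy lines."""
    entries = []
    for line in tax_lines:
        line = line.strip()
        if line:
            seq_id, _, taxonomy = line.partition("\t")
            entries.append((seq_id, taxonomy.strip()))
    return entries


def check_reference(fasta_lines, tax_lines):
    """Report mismatches between a reference fasta and its taxonomy file."""
    seq_ids = fasta_ids(fasta_lines)
    entries = taxonomy_entries(tax_lines)
    tax_ids = {seq_id for seq_id, _ in entries}
    return {
        # sequences that have no taxonomy line
        "missing_taxonomy": sorted(set(seq_ids) - tax_ids),
        # taxonomy lines that have no sequence
        "missing_sequence": sorted(tax_ids - set(seq_ids)),
        # IDs that appear more than once in the fasta
        "duplicate_fasta_ids": sorted(i for i, n in Counter(seq_ids).items() if n > 1),
        # taxonomy strings not ending in ';' (assumed format requirement)
        "no_trailing_semicolon": sorted(i for i, t in entries if not t.endswith(";")),
    }
```

Calling `check_reference` with open handles for the edited fasta and taxonomy files returns four lists; all of them should be empty before you rerun classify.seqs.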
I’ve tried adding the Fibrobacter (unaligned) sequences to the silva database as well as adding the taxonomy strings to the taxonomy file. However, when I run classify.seqs, I still do not get any positive hits for Fibrobacter. Do you have any other ideas as to why this is happening?
You might want to make sure that you erase all of the training data files that are created from your database files and then start again. Also, you might run some of your sequences through the RDP just to see what they come up with - as weird as it sounds, we need to double check that you actually have Fibrobacters in your data.
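To make the "erase the training data files" step less error-prone, here is a small Python sketch that lists the cached files next to the reference and deletes them only when asked. The suffix list is an assumption based on file names commonly seen beside a reference after classify.seqs has run; compare it against your own directory listing before deleting anything. The function names are made up for illustration.

```python
from pathlib import Path

# Assumed suffixes of cached classifier files; verify against your directory.
ASSUMED_SUFFIXES = (".8mer", ".8mer.prob", ".8mer.numNonZero", ".tree.train", ".tree.sum")


def find_training_files(directory, suffixes=ASSUMED_SUFFIXES):
    """Return files in `directory` whose names end with one of the suffixes."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.name.endswith(tuple(suffixes))
    )


def remove_training_files(directory, dry_run=True, suffixes=ASSUMED_SUFFIXES):
    """List matching files; delete them only when dry_run is False."""
    found = find_training_files(directory, suffixes)
    for path in found:
        if not dry_run:
            path.unlink()
    return found
```

Run it once with the default `dry_run=True` to see what would be removed, then again with `dry_run=False`. The fasta and taxonomy files themselves do not match any of the suffixes, so they are left alone.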
Yes, I have run my dataset through RDP and it returns 413 hits for Fibrobacter. This is why I’m really concerned that I can’t get any hits when I use mothur (silva database with Fibrobacter sequences added). When I use classify.seqs, I’ve been classifying all of my sequences using the Bayesian method. Since then, I’ve tried running get.oturep first and then classify.seqs on the representatives, and this does return Fibrobacter hits (113 of them). Do you know why I’m getting the discrepancy between the two methods?
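When comparing two runs like this, it can help to tally the hits per sample from the output files rather than eyeballing them. A rough Python sketch follows. It assumes the classification output has lines like `seqID<TAB>Bacteria(100);Fibrobacteres(97);...;` and that the group file has `seqID<TAB>sample` lines; the function name `hits_per_group` is hypothetical.

```python
from collections import Counter


def hits_per_group(tax_lines, group_lines, taxon):
    """Count sequences whose taxonomy string mentions `taxon`, per sample."""
    # Map each sequence ID to its sample from the group file.
    group_of = {}
    for line in group_lines:
        parts = line.split()
        if len(parts) >= 2:
            group_of[parts[0]] = parts[1]

    counts = Counter()
    for line in tax_lines:
        seq_id, _, taxonomy = line.strip().partition("\t")
        # Case-insensitive substring match, so 'fibrobacter' also matches
        # 'Fibrobacteres' and 'Fibrobacteria'.
        if taxon.lower() in taxonomy.lower():
            counts[group_of.get(seq_id, "no_group")] += 1
    return counts
```

Running it on the taxonomy output of each run with the same group file gives per-sample counts that can be compared directly. Keep in mind that a representative sequence stands in for a whole OTU, so counts of representatives and counts of individual reads are not expected to match.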
Yeah, this is disconcerting. Could you email me those 413 sequences? - firstname.lastname@example.org
Did you get your problem solved by adding Fibrobacter sequences to the silva.bacteria.fasta file and the matching taxonomy lines to the *rdp.tax or *silva.tax files?
I am trying the same thing, but did it work? How did you get your Fibrobacter sequences classified correctly? Or did you stick with the RDP database? Thanks!