I’ve classified my sequences using get.oturep (0.05 cutoff) and classify.seqs (Bayesian method) with a bootstrap cutoff of 80% or 60%. When I look at genera within phyla, I can get 40-85% unclassified genera within a phylum (e.g., Firmicutes: 86% of my genera are unclassified). I am currently using the non-redundant silva database (v.102). Is there a way to decrease the number of unclassified genera within a phylum?
I suspect your problem is your read length. If you have short sequences, you will be less likely to classify them as deeply as you would with longer reads. Another potential issue, though less significant, is sequence quality. Of course, another issue is what type of environment you are sampling - if you’d expect it to have a lot of novel taxa, then the classifier is likely to fail. Some suggestions…
Try some of the different reference taxonomies we provide (or that you can get from an ARB database). These often vary in their ability to classify various parts of the tree.
Pursue an OTU-based approach and then classify your OTUs. This is the downfall of phylotyping - you can only classify what’s been seen before.
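To illustrate the first suggestion, you could run classify.seqs against two of the *.tax files distributed with the silva reference and compare the results. This is only a sketch - the filenames here (final.fasta, silva.bacteria.fasta, and the two .tax files) are placeholders for whichever reference alignment and taxonomy files you have downloaded:

```
mothur > classify.seqs(fasta=final.fasta, template=silva.bacteria.fasta, taxonomy=silva.bacteria.silva.tax, cutoff=80)
mothur > classify.seqs(fasta=final.fasta, template=silva.bacteria.fasta, taxonomy=silva.bacteria.rdp.tax, cutoff=80)
```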
The following is a summary of my precluster.fasta file. I’m not sure what read length would be considered short. These samples were taken from cattle rumen and feces, so it is very likely that there are many unclassified sequences, but 86% of Firmicutes being unclassified seemed very high.
I did follow an OTU-based approach (get.oturep) and classified the OTU reps using the non-redundant silva database (with RDP taxonomy). Are you suggesting I try a different database or a different reference taxonomy file?
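For reference, percentages like the 86% above can be tallied directly from the classify.seqs output. The following is a rough sketch, not a mothur feature - it assumes the usual .taxonomy file format (one sequence per line, tab-separated name and a semicolon-delimited lineage with bootstrap values in parentheses, with unresolved levels reported as "unclassified", and genus as the sixth level):

```python
import re
from collections import Counter

def tally_unclassified_genera(lines):
    """Count (unclassified-genus, total) sequences per phylum from
    mothur-style taxonomy lines: 'name<TAB>Kingdom(100);Phylum(98);...;'"""
    totals = Counter()        # sequences per phylum
    unclassified = Counter()  # sequences with an unresolved genus, per phylum
    for line in lines:
        name, tax = line.rstrip().split("\t")
        # strip bootstrap values like '(98)' and the trailing semicolon
        levels = [re.sub(r"\(\d+\)", "", t) for t in tax.strip(";").split(";")]
        phylum = levels[1] if len(levels) > 1 else "unclassified"
        genus = levels[5] if len(levels) > 5 else "unclassified"
        totals[phylum] += 1
        if genus == "unclassified":
            unclassified[phylum] += 1
    return {p: (unclassified[p], n) for p, n in totals.items()}
```

Dividing the two counts for a phylum then gives its fraction of unclassified genera.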
mothur > summary.seqs(fasta=dennisLabeledFinal.pick.trim.unique.good.filter.unique.precluster.fasta)
            Start  End   NBases  Ambigs  Polymer
Minimum:    1      919   252     0       3
2.5%-tile:  1      919   271     0       4
25%-tile:   1      919   285     0       5
Median:     1      919   291     0       5
75%-tile:   1      919   298     0       5
97.5%-tile: 1      919   313     0       6
Maximum:    3      919   359     0       8
# of Seqs:  34496
Thanks for the help!
Well, maybe - 250 to 350 bases may be too short to get good classification within the Firmicutes. Our analysis shows that with shorter sequences, Firmicutes do not classify as deeply as other groups. You might try some of the other *.tax files that are available with the silva reference files. Also, keep in mind that a 250 bp read will not classify the same as a 350 bp read, so it is more appropriate to get all your sequences to be about the same length with the filter.seqs command so you’re comparing like to like.
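A minimal sketch of that step, using the filename and coordinates from the summary above (the start=3, end=919 values are illustrative - screen.seqs drops reads that don’t span that region of the alignment, and filter.seqs with vertical=T then removes the gap-only columns):

```
mothur > screen.seqs(fasta=dennisLabeledFinal.pick.trim.unique.good.filter.unique.precluster.fasta, start=3, end=919)
mothur > filter.seqs(fasta=dennisLabeledFinal.pick.trim.unique.good.filter.unique.precluster.good.fasta, vertical=T)
```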
So if I understand correctly, I need to remove all the gaps in my fasta file and make the NBases length equal for all sequences? If so, will the sequences have different start and end positions? Which options would I need to include in filter.seqs? Would the following be appropriate?
Also, at which point in my analysis should I run this command? That is, which fasta file should I use - the fasta file created after pre.cluster?
This is the order of commands I have been using:
Sorry - they should overlap over the same alignment coordinates and won’t necessarily be the same length, but they should be close. No need to worry about the gaps when running classify.seqs.
Here’s the order I suggest…
I’ve updated the Costello Analysis to reflect how we’re doing this in our lab.