Classifying Sequences

Tris · March 9, 2012, 8:08am

Hi Pat and Sarah,

I am having a little difficulty (I think) with classifying sequences. I have followed the SOP to the letter, and whenever I classify (whether I use the old or new RDP, the old or new GreenGenes, etc.; I’ve tried them all, even the RDP online classifier), about 30% of my sequences come back as unclassified bacteria. It is an Antarctic soil 16S 454 run. I’m used to a few unclassifieds with 454, simply because the repositories are not all-encompassing, but 30%? I feel like that’s a bit much.

Any ideas/suggestions?

Thanks
Tris

Kirk · March 9, 2012, 12:12pm

Hi Tris,

Take it from me, you don’t have to worry about 30% unclassified from Antarctic samples (at least, I hope so).

pschloss · March 9, 2012, 3:59pm

Unclassified at what level? The root? Genus? Kirk is right that unclassifieds are common with short reads, but if they’re unclassified at the root or domain (kingdom) level, then there might be a problem with the data.
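
A minimal sketch for answering the "at what level?" question, not part of mothur itself. It assumes a tab-separated taxonomy file with one `name<TAB>taxon(score);taxon(score);...` line per sequence, and that unresolved ranks are written as `unclassified`, `<Parent>_unclassified`, or `unknown`; the exact spelling depends on the mothur version, so treat those strings and the rank list as assumptions to adjust.

```python
import re
from collections import Counter

# Assumed rank order for a six-level reference taxonomy.
RANKS = ["domain", "phylum", "class", "order", "family", "genus"]


def deepest_rank(taxonomy):
    """Return the deepest rank that carries a real label, or 'root' if none."""
    depth = 0
    for taxon in taxonomy.strip().strip(";").split(";"):
        name = re.sub(r"\(\d+\)$", "", taxon)  # drop a trailing "(100)"-style score
        if not name or name == "unknown" or name.endswith("unclassified"):
            break
        depth += 1
    return "root" if depth == 0 else RANKS[min(depth, len(RANKS)) - 1]


def tally(lines):
    """Count sequences by the deepest rank at which they are classified."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue
        _name, taxonomy = line.rstrip("\n").split("\t")[:2]
        counts[deepest_rank(taxonomy)] += 1
    return counts
```

Passing an open taxonomy file to `tally` gives a count per rank; a large `root` count would point to a data problem, while a large `domain` count means the reads are recognised as Bacteria but not placed in a phylum.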

Tris · March 14, 2012, 11:09am

They’re unclassified at the phylum level (so they get identified as Bacteria). They’re probably about 200 bp after following the SOP; is that normal? It’s Titanium 454. I reduced the cutoff to 60 and now get some reasonable (>80%) matches at lower taxonomic ranks that I didn’t get at the 80% cutoff, which identifies a few more of the unclassifieds.
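
To make the effect of the cutoff concrete, here is a small sketch of how a confidence cutoff trims a scored taxonomy string. It assumes the usual reading that a taxon is kept when its score is at or above the cutoff and that everything below the first failing rank is dropped; the function name and the example string are illustrative, not taken from mothur.

```python
import re

# One taxon with its confidence score, e.g. "Bacteria(100)".
TAXON = re.compile(r"^(?P<name>.+)\((?P<score>\d+)\)$")


def apply_cutoff(taxonomy, cutoff):
    """Keep taxa from the top rank down until a score falls below the cutoff."""
    kept = []
    for taxon in taxonomy.strip().strip(";").split(";"):
        match = TAXON.match(taxon)
        if match is None or int(match.group("score")) < cutoff:
            break  # this rank and everything below it count as unclassified
        kept.append(match.group("name"))
    return kept


# Illustrative string only; the scores are made up.
example = "Bacteria(100);PhylumA(72);ClassB(65);OrderC(61);"
print(apply_cutoff(example, 80))
print(apply_cutoff(example, 60))
```

With a string like this, the 80 cutoff keeps only the domain, while the 60 cutoff keeps the lower ranks too, which is the pattern described above.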

pschloss · March 14, 2012, 5:26pm

Tris, yeah, the length is normal (454 makes up numbers…). Can you try taking some of those sequences and BLASTing them against the nt database at NCBI? We actually came across this yesterday, and they turned out to be mouse 18S. While yours are unlikely to be mouse, they could be some other artifact. Regardless, if you can align them, they’re probably “real”, and it’s just a matter of figuring out what they are: something weird or something novel.

Tris · March 21, 2012, 11:15am

Hi Pat,

I’ve BLASTed a handful of the abundant ones and some of the rare ones, and they pretty much all match ‘uncultured bacterium’ clones, with some phylum representation wayyyyy down the list. So they seem like legitimate seqs. I tried to classify them with no cutoff and got some really good matches (bootstrap scores of 90-100%) on previously unclassified ones. Does that seem right? I can’t rationalise that outcome; surely if the scores were that high they would have passed the 80% cutoff?

Cheers,
Tris
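
One way to pin down the puzzle above is to compare the two runs directly. The sketch below lists sequences that are unclassified at a given rank in the cutoff run even though every score down to that rank in the no-cutoff run meets the cutoff. It assumes both files use the `name<TAB>taxon(score);...` layout and that unresolved ranks end in `unclassified`; the function names are hypothetical helpers, not mothur commands.

```python
import re

SCORE = re.compile(r"\((\d+)\)$")  # trailing "(95)"-style confidence score


def parse(lines):
    """Map each sequence name to its list of taxa, top rank first."""
    table = {}
    for line in lines:
        if line.strip():
            name, taxonomy = line.rstrip("\n").split("\t")[:2]
            table[name] = taxonomy.strip().strip(";").split(";")
    return table


def puzzling(no_cutoff_lines, cutoff_lines, cutoff=80, rank_index=1):
    """Names unclassified at rank_index (0 = domain, 1 = phylum) in the cutoff
    run whose no-cutoff scores down to that rank are all >= cutoff."""
    scored = parse(no_cutoff_lines)
    hits = []
    for name, taxa in parse(cutoff_lines).items():
        if len(taxa) > rank_index and not taxa[rank_index].endswith("unclassified"):
            continue  # already classified at this rank in the cutoff run
        scores = []
        for taxon in scored.get(name, [])[: rank_index + 1]:
            match = SCORE.search(taxon)
            if match:
                scores.append(int(match.group(1)))
        if len(scores) == rank_index + 1 and min(scores) >= cutoff:
            hits.append(name)
    return hits
```

If this returns nothing, the high scores are probably sitting at ranks or on sequences other than the ones that were unclassified at 80; if it returns names, those are the concrete cases worth posting.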
