Too many unclassified

Hi

I’m trying to analyze sequences from cecum, ileum content, ileum tissue etc., but everything I’ve tried has left me with a large percentage of my sequences unclassified at the phylum (and genus) level. I’ve tried using silva.bacteria.rdp6.tax, silva.bacteria.rdp.tax and silva.bacteria.silva.tax as my templates in classify.seqs, with cutoffs of 80, 60 and 40. The template silva.bacteria.rdp6.tax has given me the best results so far at a cutoff of 40, but a cutoff of 40 is really low, so I’d like to use a higher one. I’m out of ideas about what to try next.

Could something in my classify.seqs call cause this, or something in an earlier command? Are there any other templates you’d suggest using?

I’m aware we expect a large number of unclassified sequences in these environmental samples, but the percentage unclassified I’m getting is much higher than in other datasets previously analyzed in my lab from the same sample types.

Thanks!

Hmmm… You might try the RDP training set:

http://www.mothur.org/w/images/4/49/RDPTrainingSet.zip

OK… and if this doesn’t work, what should I try next?

Are you sure the sequences are in the “right” direction? We don’t automatically flip the sequences at this point…
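If it helps, here is a minimal Python sketch of what "flipping" a read means, in case you want to test a handful of sequences by hand before re-running the classification. The function name and the example read are made up for illustration; mothur's own reverse.seqs command is probably the more convenient route for a whole fasta file.

```python
# Translation table mapping each base to its complement.
# N (unknown base) maps to itself; other IUPAC ambiguity codes are
# not handled in this sketch and would pass through unchanged.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    read = "AACGTTGC"  # hypothetical read, not from any real dataset
    print(reverse_complement(read))
```

Classifying a few flipped reads alongside the originals should show fairly quickly whether orientation is the problem.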

What is the average length of your sequences? If the length is less than 100 bp, I think it is reasonable to get a very high proportion of unclassified sequences. Another option you could try is GAST for classification.
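To check the length quickly, something like the rough Python sketch below would do. It assumes a plain fasta file and treats "-" and "." as alignment gaps that should not count toward length; the function names and the 100 bp default are only illustrative. mothur's summary.seqs command reports similar length statistics.

```python
from io import StringIO


def sequence_lengths(handle):
    """Yield the ungapped length of each record in a FASTA stream."""
    length = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            # Count bases only; skip gap characters from alignments.
            length += sum(1 for ch in line if ch not in "-.")
    if length is not None:
        yield length


def summarize(handle, min_length=100):
    """Return (mean length, fraction of reads shorter than min_length)."""
    lengths = list(sequence_lengths(handle))
    if not lengths:
        return None
    mean = sum(lengths) / len(lengths)
    short = sum(1 for n in lengths if n < min_length)
    return mean, short / len(lengths)


if __name__ == "__main__":
    # Tiny made-up example; replace with open("your_file.fasta").
    demo = StringIO(">a\nACGT\nAC\n>b\nAC--GT\n")
    print(summarize(demo, min_length=5))
```

If a large fraction of reads falls under the threshold, short reads are a likely explanation for the high unclassified rate.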