classify.seqs using UNITE fungal db

Kendra · December 18, 2012, 1:34am

I have a bunch of fungal ITS2 sequences and have been processing them following the SOP. I’m at the classification step and am using the UNITE ITS2 database. With k=8, ~25% are unclassified (I BLASTed a few of the unclassified sequences and they came back as perfect matches to sequences in UNITE). I tried k=7: fewer unclassified, but still more than 20%, and still including sequences that BLAST says should have been classified. I tried k=6 and suddenly all are classified at least to phylum. Naturally, when something suddenly works, I worry about what I might be doing wrong. Any insight into why I’m getting such a difference between k=7 and k=6?

pschloss · December 18, 2012, 3:28pm

I suspect your phyla all have at least 6 sequences in them. Remember that this method works by taking the k closest matches and reporting their complete consensus taxonomy. So if k=7 and a phylum in your reference only has 6 sequences in it, that phylum will never get detected.
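
For the k-nearest-neighbour case specifically, the consensus effect can be sketched in a few lines of Python. This is only an illustration, not mothur's code: the function name `consensus_taxonomy` and the two toy lineages are made up for the example, and the neighbours are assumed to have been found already.

```python
def consensus_taxonomy(taxonomies):
    """Return the ranks shared by every taxonomy, from the root downward."""
    consensus = []
    # zip(*...) walks the lineages rank by rank (kingdom, phylum, ...).
    for ranks in zip(*taxonomies):
        if len(set(ranks)) != 1:
            break  # the neighbours disagree at this rank, so stop here
        consensus.append(ranks[0])
    return consensus


# Hypothetical lineages, for illustration only.
asco = ["Fungi", "Ascomycota"]
basidio = ["Fungi", "Basidiomycota"]

# k=6: all six neighbours can come from the same six-member phylum.
print(consensus_taxonomy([asco] * 6))

# k=7: the seventh neighbour must come from elsewhere and breaks the consensus.
print(consensus_taxonomy([asco] * 6 + [basidio]))
```

With six matching neighbours the phylum survives; once a seventh neighbour from another phylum is forced in, the consensus stops at the kingdom.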

Kendra · December 18, 2012, 7:30pm

I was using the default method, which I thought was Wang, where k = kmer size, not knn, where k = number of nearest neighbours. Is knn the default?

pschloss · December 19, 2012, 12:57pm

Argh, sorry. Not enough sleep… You’re correct.

The k-size will depend on the length of the sequences and the number of sequences in each taxon. Unfortunately, it can only really be determined empirically by doing a leave-one-out test to see how classification accuracy depends on kmer size. The other consideration is that larger k-sizes will take more time to do the classifications. In general a kmer size of 7 or 8 seems to work well in the stuff we’re testing.

Pat
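
A leave-one-out test over kmer sizes could look roughly like the Python sketch below. It is a toy under stated assumptions, not mothur's classifier: the scoring is a simplified kmer naive Bayes with no bootstrapping or confidence cutoff, the names `kmers`, `classify` and `leave_one_out_accuracy` are invented for the example, and the six reference sequences are made up.

```python
from collections import defaultdict
from math import log


def kmers(seq, k):
    """Return the set of all overlapping kmers of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify(query, refs, k):
    """Pick the taxon whose reference sequences best explain the query's kmers.

    refs is a list of (sequence, taxon) pairs. Simplified scoring, assumed
    for illustration: smoothed per-taxon kmer frequencies, summed as logs.
    """
    by_taxon = defaultdict(list)
    for seq, taxon in refs:
        by_taxon[taxon].append(kmers(seq, k))

    best_taxon, best_score = None, float("-inf")
    for taxon, kmer_sets in by_taxon.items():
        n_seqs = len(kmer_sets)
        score = 0.0
        for word in kmers(query, k):
            hits = sum(1 for s in kmer_sets if word in s)
            score += log((hits + 0.5) / (n_seqs + 1))
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon


def leave_one_out_accuracy(refs, k):
    """Hold out each reference in turn and classify it against the rest."""
    correct = 0
    for i, (seq, taxon) in enumerate(refs):
        others = refs[:i] + refs[i + 1:]
        if classify(seq, others, k) == taxon:
            correct += 1
    return correct / len(refs)


# Made-up reference set: two taxa, three sequences each.
toy_refs = [
    ("AAAAAAAAAACC", "TaxonA"),
    ("AAAAAAAAAAGG", "TaxonA"),
    ("AAAAAAAAAATT", "TaxonA"),
    ("CCCCCCCCCCAA", "TaxonB"),
    ("CCCCCCCCCCGG", "TaxonB"),
    ("CCCCCCCCCCTT", "TaxonB"),
]

for k in (2, 4, 6, 8):
    print(k, leave_one_out_accuracy(toy_refs, k))
```

On a real reference such as UNITE, the same loop would be run over the full set of sequences and taxonomies, and the accuracy at each kmer size compared against the extra run time of the larger sizes.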

Kendra · December 19, 2012, 6:39pm

OK, thanks. I’ll add a leave-one-out (LOO) test on the UNITE db to my to-do list.
