How does Mothur generate a consensus taxonomy when I perform classification via KNN?
On my species-level taxonomy I get the following results:
With k=1, 70% correct (TP), 20% wrong species (FP), and 10% no species identification (FN)
With k=3, 50% correct (TP), 15% wrong species (FP), and 35% no species identification (FN)
With k=5, 45% correct (TP), 5% wrong species (FP), and 50% no species identification (FN)
Question 1:
It appears that classify.seqs takes the lowest common denominator among the k nearest sequences, i.e. the deepest taxonomic level they all share, instead of the most specific information available. For example, given 3 genus-only entries (no species) and 2 species-level IDs, it reports only the genus. Is that correct?
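To make the hypothesis in Question 1 concrete, here is a minimal Python sketch of the behavior I suspect. This is not mothur's actual code: the function name consensus_taxonomy and the example lineages are made up for illustration, and it assumes "consensus" means the longest rank prefix shared by all k neighbors.

```python
def consensus_taxonomy(lineages):
    """Return the longest rank prefix shared by every lineage.

    Hypothetical model of the suspected behavior, not mothur's implementation.
    Each lineage is a list of ranks, from domain down to the deepest rank known.
    """
    if not lineages:
        return []
    shared = []
    # zip stops at the shortest lineage, so a genus-only neighbor
    # caps the consensus at genus even if other neighbors have a species.
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return shared


# Made-up example: 3 genus-only neighbors and 2 species-level neighbors (k=5).
genus_only = ["Bacteria", "Firmicutes", "Bacillus"]
with_species = ["Bacteria", "Firmicutes", "Bacillus", "Bacillus subtilis"]
neighbors = [genus_only] * 3 + [with_species] * 2

print(consensus_taxonomy(neighbors))       # consensus over all 5 neighbors
print(consensus_taxonomy([with_species]))  # k=1: the single neighbor's lineage
```

If this model is right, it would explain why a larger k trades wrong species calls for missing species calls in my results: every additional neighbor is another chance to truncate the shared prefix.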
Question 2:
Is there any way to see which reference sequences are chosen as the nearest neighbors when classifying each sequence? I can make an educated guess based on BLAST scores or similar metrics, but it's hard to optimize my approach with only such indirect information.
-Brett