Database generation speed and ksize


I was trying out some different kmer sizes for classification, and I’ve noticed a big increase in the RAM required and the time taken when the ksize is increased. Is this a bug on my end, or a mistake I’m making? I thought that a larger kmer size would be faster.

Yep, we’ve seen that too. Thankfully, when we’ve done leave-one-out testing, we find that the smaller ksizes (6-8) also provide the best classification.
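A rough sketch of why memory can climb with ksize. This assumes the reference database is indexed by a table with one slot for every possible k-mer, which is an assumption about the implementation and is not stated in this thread. Under that assumption the table size follows 4^k, because DNA has four bases. Each increase of 1 in ksize therefore multiplies the table by four. The per-taxon count table (`table_cells`, `n_taxa`) is a hypothetical illustration, not the classifier's actual data structure.

```python
def possible_kmers(k: int) -> int:
    """Number of distinct DNA k-mers over the alphabet {A, C, G, T}."""
    return 4 ** k


def kmer_index(kmer: str) -> int:
    """Encode a k-mer as an integer in [0, 4**k) using 2 bits per base."""
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    index = 0
    for base in kmer:
        index = index * 4 + code[base]
    return index


def table_cells(k: int, n_taxa: int) -> int:
    """Hypothetical table with one count per (possible k-mer, taxon) pair."""
    return possible_kmers(k) * n_taxa


if __name__ == "__main__":
    # Possible k-mers for the ksizes discussed in this thread (6 to 11).
    for k in range(6, 12):
        print(k, possible_kmers(k))
```

Running it lists 4,096 possible k-mers at ksize=6 and 4,194,304 at ksize=11, a 1,024-fold difference. This sketch only speaks to memory. It does not predict run time, which also depends on how many k-mers each query sequence contributes and how the lookups are done.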


I have also tried the kmer test with sizes 6 to 11, and the fastest is ksize=10. Based on Pat’s comment, I’m a little confused: which should be better for my case, ksize=10 or ksize=8? Or are both good? I will not consider ksize=6, since it took almost double the time of ksize=8 and ksize=10.

Any suggestions?


Hi Junnie,

Speed has no bearing on classification accuracy. In this case (and most others), speed is inversely correlated with quality. We find that smaller kmer sizes tend to give better classification. Working out which exact kmer size is best for your region and database would take a lot of testing. We find that the defaults do best across the widest range of datasets.