chimera.uchime chunks


I was wondering what happens when using uchime on sequences shorter than 256 bp. I am asking because the standard settings for uchime are minchunk=64 (minimum chunk length) and chunks=4 (number of chunks the sequence is shredded into). If there is no fatal flaw in my math, a sequence shorter than 256 bp cannot have 4 chunks of 64 bp or longer, since 4 x 64 = 256. What does uchime do in that case? I asked Robert Edgar already and he said he does not remember. Do you guys have any idea?
I was wondering if I need to adjust the settings for dealing with shorter sequences. Playing with the settings seems to have an effect on my data.


LOL - we literally just re-used all of his public domain code so if he doesn’t know…

All of our testing has been on sequences right around 250 bp, so I don’t think there’s that big of a problem. We’ve definitely seen a lack of chimeras with shorter sequences. Have you tried changing the minchunk size? I wonder if it needs to be a power of 2 (i.e. 2^6=64) or 4 (i.e. 4^3 = 64).


Yes, I played with it a bit. My sequences are trimmed to a minimum of 200 bp. I do not necessarily detect more chimeras when reducing the minimum chunk size or reducing the number of chunks to 3, but the settings do influence which sequences are flagged as chimeras. For now I am using a minimum chunk size of 50 and keeping the number of chunks at 4, so that each sequence can have 4 chunks of 50 bp. That just makes the most sense to me. It is not completely satisfying, but I am not a good enough programmer to follow Robert’s recommendation to dig into his code and understand what he did.
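
For anyone who wants to check the same arithmetic for their own read lengths, here is a minimal Python sketch. It assumes the chunks are equal-sized and non-overlapping, which is my assumption and not something confirmed from the uchime code, and the function names are made up for illustration. It says nothing about what uchime actually does when a sequence is too short.

```python
def shortest_length_for(chunks: int, minchunk: int) -> int:
    """Shortest sequence that can hold `chunks` non-overlapping pieces
    of at least `minchunk` bases each (assumed chunking model)."""
    return chunks * minchunk


def fits(seq_len: int, chunks: int, minchunk: int) -> bool:
    """True if a sequence of seq_len bases can be split into `chunks`
    non-overlapping pieces that are each at least `minchunk` long."""
    return seq_len >= shortest_length_for(chunks, minchunk)


def largest_minchunk(seq_len: int, chunks: int) -> int:
    """Largest minchunk that still allows `chunks` pieces in seq_len bases."""
    return seq_len // chunks


if __name__ == "__main__":
    # Settings discussed in this thread, applied to a 200 bp sequence.
    for chunks, minchunk in [(4, 64), (4, 50), (3, 64)]:
        print(chunks, minchunk, fits(200, chunks, minchunk))
```

Under that assumption, a 200 bp sequence fits 4 chunks only if minchunk is 50 or less, and 3 chunks of 64 bp (192 bp) would also fit.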

So I would encourage you to be a bit cautious about fiddling too much: if you tweak one parameter to increase the number of chimeras detected, you could also increase the number of false positives. Looking at the paper, it seems like they did optimize for 200-300 bp fragments.