How does mothur decide what is a ‘representative’ sequence from each OTU? This doesn’t sound like the same thing as a consensus sequence… but then what are the criteria for defining a representative sequence?
Quoting the comments in the mothur source file getoturepcommand.cpp:
// if only 1 sequence in bin or processing the "unique" label, then
// the first sequence of the OTU is the representative one
...
// sequence with the smallest maximum distance is the representative
// if tie occurs pick sequence with smallest average distance
Robin
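To make the rule in those comments concrete, here is a minimal Python sketch. It is not mothur's code: the function name pick_representative, the dist dictionary keyed by pairs of names, and the example distances are all made up for illustration. It only follows the two rules quoted above (smallest maximum distance, then smallest average distance on a tie).

```python
def pick_representative(names, dist):
    """Pick a representative for one OTU (illustrative sketch, not mothur's code).

    names: list of sequence names in the OTU.
    dist:  dict mapping frozenset({name1, name2}) to a pairwise distance.
    """
    # A bin with a single sequence: that sequence is the representative.
    if len(names) == 1:
        return names[0]

    def key(name):
        others = [dist[frozenset((name, other))] for other in names if other != name]
        # Primary criterion: smallest maximum distance to the other members.
        # Tie-breaker: smallest average distance to the other members.
        return (max(others), sum(others) / len(others))

    return min(names, key=key)


# Hypothetical OTU with four sequences; every sequence has the same maximum
# distance (0.02), so the average distance decides.
example_names = ["a", "b", "c", "d"]
example_dist = {
    frozenset(("a", "b")): 0.02,
    frozenset(("a", "c")): 0.02,
    frozenset(("a", "d")): 0.02,
    frozenset(("b", "c")): 0.01,
    frozenset(("b", "d")): 0.01,
    frozenset(("c", "d")): 0.02,
}
representative = pick_representative(example_names, example_dist)
print(representative)
```

Note that this makes the representative an actual sequence from the OTU (the one closest to everything else), which is why it is not the same thing as a consensus sequence.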
I am curious as to what happens if sequences are of different length (i.e. one has ten extra bases but other than that they are identical), are they considered unique OTUs? Any input would be greatly appreciated.
mtea - This is why we strongly advocate trimming sequences so that they all overlap the same region. By default, distances are calculated so that this type of difference is treated as a mismatch. I suspect the difference would still be below your threshold, but it would still count toward the distance. The problems with treating short sequences as replicates of longer sequences include i) the fact that regions of the 16S rRNA gene evolve at different rates and ii) what to do with a case where you’ve got ATGC and ATGCAT and ATGCTA - where does ATGC go?
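The ATGC example can be worked through with a toy distance function. This is only a sketch under simplifying assumptions: the sequences are assumed to start at the same position with no internal gaps, and mismatch_distance and its count_overhang flag are hypothetical names, not mothur options or mothur's actual distance calculation.

```python
def mismatch_distance(a, b, count_overhang=True):
    """Toy distance between two left-aligned sequences (not mothur's calculator).

    count_overhang=True:  bases that extend past the shorter sequence count as
                          mismatches, and the longer length is the denominator.
    count_overhang=False: only the overlapping region is compared.
    """
    if not a or not b:
        raise ValueError("sequences must be non-empty")
    overlap = min(len(a), len(b))
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    if count_overhang:
        overhang = abs(len(a) - len(b))
        return (mismatches + overhang) / max(len(a), len(b))
    return mismatches / overlap


short, long1, long2 = "ATGC", "ATGCAT", "ATGCTA"

# Ignoring the overhang, the short read looks identical to both long reads...
d_short_long1 = mismatch_distance(short, long1, count_overhang=False)
d_short_long2 = mismatch_distance(short, long2, count_overhang=False)
# ...even though the two long reads differ from each other.
d_long1_long2 = mismatch_distance(long1, long2)
# Counting the overhang, the extra bases show up in the distance.
d_short_long1_penalized = mismatch_distance(short, long1)

print(d_short_long1, d_short_long2, d_long1_long2, d_short_long1_penalized)
```

If the overhang is ignored, ATGC is at distance 0 from two sequences that are not identical to each other, so there is no principled way to say which one it replicates. Trimming everything to the same region avoids the question.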