When I use the get.oturep command to obtain a representative sequence for an OTU that has 27,000 sequences, I am surprised to get a representative that carries a very rare variant within that OTU. For example, only 2% of the reads in the OTU contain the variant that appears in the representative sequence. This happened with two different samples and different rare variants, which makes me wonder how the get.oturep command works. Has anyone else looked at this, or can anyone give me a possible explanation?


We define the representative sequence for an OTU as the sequence that is the shortest distance to all of the other sequences in the OTU. So even if it is rare, if it is in the “middle” it is the representative. A better option than getting a representative sequence is to use commands like classify.otu and consensus.seqs, which both use a consensus-based approach.
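
To make the “middle” idea concrete, here is a minimal Python sketch. It is not mothur's code: the toy sequences, the counts, and the helper names `hamming`, `medoid`, and `consensus` are all made up for illustration. It also assumes distances are taken between the unique sequences in the OTU and simply summed, which may not match what get.oturep does internally.

```python
from collections import Counter

def hamming(a, b):
    """Count mismatched positions between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def medoid(seqs):
    """Return the sequence with the smallest summed distance to all the others.
    Assumption: every unique sequence counts once, regardless of abundance."""
    return min(seqs, key=lambda s: sum(hamming(s, t) for t in seqs))

def consensus(counts):
    """Return the abundance-weighted majority base at each aligned column."""
    length = len(next(iter(counts)))
    bases = []
    for i in range(length):
        column = Counter()
        for seq, n in counts.items():
            column[seq[i]] += n
        bases.append(column.most_common(1)[0][0])
    return "".join(bases)

# Hypothetical OTU: unique aligned sequences mapped to their read counts.
otu = {"AAAA": 980, "AAAT": 20, "AATT": 1, "ATAT": 1}

representative = medoid(list(otu))
majority = consensus(otu)

print(representative, otu[representative])  # AAAT 20
print(majority)                             # AAAA
```

In this toy OTU, AAAT accounts for only about 2% of the reads, but it is one mismatch away from every other unique sequence, so it has the smallest total distance and comes out as the representative. The column-wise consensus follows the abundant reads instead and returns AAAA.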

