Understanding the rep.fasta file generated through the get.oturep command


I am trying to get a representative sequence for each OTU in my dataset by running the get.oturep command. The distance file I am using is the one generated by the dist.shared command (thetayc method).
The rep.fasta output that I get is confusing me. The first representative sequence is from the most abundant OTU1:

MISEQ_60_000000000-CVFVM_1_1101_10014_17238 Otu001|24534
Am I right in understanding that 24534 is the number of sequences belonging to that OTU? This is where I have a problem: the total number of sequences assigned to that OTU in the cons.taxonomy file is 162625 (much larger). Is there something I am doing wrong? Or does the number 162625 include non-unique sequences and the 24534 number corresponds to unique sequences?

Thanks very much,

Hi Olga,

Yes, your interpretation may be right. It depends on whether all sequences or only the unique sequences were used to pick the representative for each OTU. See:

With the name file included, the representative should be picked from all sequences (as far as I remember). Run summary.seqs to count the sequences. This is just a quick reply; feel free to ask again if it does not help.
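
If you want to compare the two numbers for every OTU rather than by eye, here is a minimal Python sketch. It assumes the rep.fasta header looks like the line quoted in the question (sequence name, whitespace, then `OtuLabel|count`) and that cons.taxonomy is tab-delimited with `OTU`, `Size` and `Taxonomy` columns. The function names `parse_rep_header` and `parse_cons_taxonomy` are made up for illustration and are not part of mothur, and the taxonomy value in the example is only a placeholder.

```python
def parse_rep_header(header):
    """Split a rep.fasta header into (sequence name, OTU label, count).

    Assumes the layout 'name<whitespace>OtuLabel|count', as in the question.
    """
    name, otu_field = header.lstrip(">").split()[:2]
    parts = otu_field.split("|")
    return name, parts[0], int(parts[1])


def parse_cons_taxonomy(lines):
    """Map OTU label -> Size from cons.taxonomy lines.

    Assumes tab-delimited columns OTU, Size, Taxonomy with one header row.
    """
    sizes = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or fields[0] == "OTU":
            continue  # skip the header row and blank lines
        sizes[fields[0]] = int(fields[1])
    return sizes


# Values taken from the question; the taxonomy column is a placeholder.
header = ">MISEQ_60_000000000-CVFVM_1_1101_10014_17238 Otu001|24534"
cons_lines = ["OTU\tSize\tTaxonomy\n", "Otu001\t162625\tplaceholder\n"]

name, otu, rep_count = parse_rep_header(header)
cons_sizes = parse_cons_taxonomy(cons_lines)
difference = cons_sizes[otu] - rep_count
print(otu, rep_count, cons_sizes[otu], difference)
```

A large gap like this one would be consistent with one count covering only unique sequences and the other covering all reads, but the script only reports the difference; it does not tell you which explanation is correct.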

