mothur

Understanding the rep.fasta file generated through the get.oturep command

Hello,

I am trying to get representative sequences for each OTU in my dataset by running the get.oturep command. The distance file that I am using is the one generated through the dist.shared command (thetayc method).
The rep.fasta output that I get is confusing me. The first representative sequence is from the most abundant OTU1:

MISEQ_60_000000000-CVFVM_1_1101_10014_17238 Otu001|24534
Am I right in understanding that 24534 is the number of sequences belonging to that OTU? This is where I have a problem: the total number of sequences assigned to that OTU in the cons.taxonomy file is 162625 (much larger). Is there something I am doing wrong? Or does the number 162625 include non-unique sequences, while the number 24534 corresponds to unique sequences only?

Thanks very much,
-Olga-

Hi Olga,

Yes, your suggestion may be right. It depends on whether all sequences or only the unique sequences were used to pick the representative for each OTU. See:

With the name file included, the representative for each OTU should be picked from all sequences (as I remember). Run summary.seqs to count the sequences. This is just a quick reply. You can ask again if this does not help.
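To see how far apart the two numbers are for each OTU, a small Python sketch like the one below could help. It is not part of mothur. It assumes the rep.fasta header has the form "name Otu|count" as in the line quoted above, and that cons.taxonomy is tab-separated with the columns OTU, Size and Taxonomy. The function names and the taxonomy string are only placeholders for illustration.

```python
def parse_rep_header(header):
    """Split a rep.fasta header into (sequence name, OTU label, count).

    Assumes the form 'name Otu001|24534'; any extra '|' fields are ignored.
    """
    fields = header.lstrip(">").split()
    seq_name = fields[0]
    parts = fields[1].split("|")
    return seq_name, parts[0], int(parts[1])


def read_otu_sizes(lines):
    """Map OTU label -> Size from cons.taxonomy-style lines.

    Assumes tab-separated columns OTU, Size, Taxonomy with one header row.
    """
    sizes = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 2 or cols[0] == "OTU":
            continue  # skip the header row and malformed lines
        sizes[cols[0]] = int(cols[1])
    return sizes


# Header copied from the question; the taxonomy string is a placeholder.
header = "MISEQ_60_000000000-CVFVM_1_1101_10014_17238 Otu001|24534"
taxonomy_lines = [
    "OTU\tSize\tTaxonomy\n",
    "Otu001\t162625\tBacteria(100);\n",
]

name, otu, rep_count = parse_rep_header(header)
sizes = read_otu_sizes(taxonomy_lines)
difference = sizes[otu] - rep_count
print(otu, rep_count, sizes[otu], difference)
```

If the difference is large for every OTU, that would fit the idea that one file counts all sequences and the other counts something narrower, but the script only reports the gap and does not explain it.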

Sigmund
