Get.oturep with user-defined clusters

I am trying to generate a representative sequence for each of my pre-defined clusters. My clusters aren’t defined by sequence identity; they are defined by the species from which the sequences came. For each species, I want one representative sequence.

Using a fasta-formatted input file I followed this procedure:

Generate distance matrix using dist.seqs(fasta=dsrB.fas, output=lt)
Make my own list file named dsrBlist.list which looks like:
unique 8 species1a,species1b,species1c species2a,species2b,species2c species3a,species3b
Get representative sequences using get.oturep(phylip=dsrB.phylip.dist, fasta=dsrB.fas, list=dsrBlist.list)

However, the output file dsrBedit.unique.rep.fasta does not contain the correct representative sequences. Each representative should be the sequence with the minimum distance to all other sequences in its pre-defined cluster. Instead, the representatives are simply the first sequence of each cluster in the .list file. In this example, the representatives returned are species1a, species2a and species3a. If I change the order in the list file, the representatives change accordingly. It’s as if the get.oturep command is ignoring the distance matrix entirely. What’s going on?
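To make the expected behaviour concrete, here is a minimal Python sketch of what I mean by "minimum distance from all other sequences". It is not mothur's implementation. It assumes a lower-triangle phylip matrix like the one written by dist.seqs with output=lt, takes "minimum distance" to mean the smallest summed distance to the other members of the cluster, and breaks ties by list order. The function names and the tiny matrix are made up for illustration.

```python
def read_lt_phylip(text):
    """Parse a lower-triangle phylip matrix into a symmetric lookup dict."""
    rows = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    n = int(rows[0][0])          # first line holds the number of sequences
    names, dist = [], {}
    for row in rows[1:n + 1]:
        name, values = row[0], row[1:]
        # Each row lists distances to the sequences that came before it.
        for other, value in zip(names, values):
            d = float(value)
            dist[(name, other)] = d
            dist[(other, name)] = d
        names.append(name)
    return dist


def representative(cluster, dist):
    """Return the member with the smallest summed distance to the others."""
    def total(seq):
        return sum(dist[(seq, other)] for other in cluster if other != seq)
    return min(cluster, key=total)   # ties go to the earlier name


def representatives_from_list_line(line, dist):
    """Assumes columns: label, a count, then one comma-separated bin each."""
    bins = line.split()[2:]
    return [representative(b.split(","), dist) for b in bins]


# Hypothetical data: s1b sits between s1a and s1c.
matrix = """5
s1a
s1b 0.1
s1c 0.3 0.1
s2a 0.5 0.5 0.5
s2b 0.5 0.5 0.5 0.2
"""
dist = read_lt_phylip(matrix)
print(representatives_from_list_line("unique 2 s1a,s1b,s1c s2a,s2b", dist))
```

With this toy matrix the first cluster's representative should be s1b rather than s1a, which is the kind of result I expected from get.oturep.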


The first name is always returned because mothur assumes that sequences grouped in an OTU at the unique level are identical to one another, so it returns the first sequence instead of processing the bin. We will change this for the next version.

Thanks for the insight. Is there anything I can do in the meantime while waiting for the new version?

If you change the label in the list file (the first column, "unique" in your example) to something else, mothur will process the bin.
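If editing the file by hand is inconvenient, a short Python sketch along these lines could rewrite the label. The replacement label 0.03 is only a placeholder chosen for illustration; the assumption is that any label other than "unique" avoids the shortcut. The function name and the output file name in the comment are hypothetical.

```python
def relabel_list(text, old_label="unique", new_label="0.03"):
    """Replace the label (first column) on every line that starts with old_label."""
    out = []
    for line in text.splitlines():
        rest = line[len(old_label):]
        # Only touch lines whose first field is exactly old_label.
        if line.startswith(old_label) and (rest == "" or rest[0].isspace()):
            line = new_label + rest   # keep the original separators and bins
        out.append(line)
    return "\n".join(out) + "\n"


example = "unique 8 species1a,species1b,species1c species2a,species2b,species2c species3a,species3b\n"
print(relabel_list(example), end="")

# To apply it to a real file (file names are assumptions):
# with open("dsrBlist.list") as src:
#     fixed = relabel_list(src.read())
# with open("dsrBlist.relabeled.list", "w") as dst:
#     dst.write(fixed)
```

The rewritten file can then be passed to get.oturep through the list option in place of the original one.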