I am trying to generate a representative sequence for each of my pre-defined clusters. My clusters aren’t defined by sequence identity; instead, each cluster contains the sequences from a single species, and I want one representative sequence per species.
Starting from a FASTA-formatted input file, I followed this procedure:
1. Generate a lower-triangle distance matrix (dsrB.phylip.dist) using dist.seqs(fasta=dsrB.fas, output=lt)
2. Make my own list file, dsrBlist.list, which looks like:
   unique 8 species1a,species1b,species1c species2a,species2b,species2c species3a,species3b
3. Get the representative sequences using get.oturep(phylip=dsrB.phylip.dist, fasta=dsrB.fas, list=dsrBlist.list)
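For reference when building the list file by hand: as I understand it, a headerless mothur list line is tab-delimited, with the label, then the number of OTUs, then one comma-separated field per OTU (newer mothur releases may also expect a header row, so treat the format here as an assumption). The minimal sketch below writes that line from a hypothetical species-to-sequence mapping, so the OTU count in column 2 is derived from the clusters actually listed rather than typed in, and it rejects names that appear in more than one cluster.

```python
def build_list_line(label, clusters):
    """Return one headerless, tab-delimited mothur-style list line.

    clusters: list of lists of sequence names, one inner list per OTU
    (here, one per species). Format details are assumptions, not mothur's spec.
    """
    seen = set()
    for members in clusters:
        for name in members:
            # Commas and whitespace act as field separators in a list file.
            if "," in name or any(ch.isspace() for ch in name):
                raise ValueError(f"invalid sequence name: {name!r}")
            # Each sequence should belong to exactly one OTU.
            if name in seen:
                raise ValueError(f"duplicate sequence name: {name!r}")
            seen.add(name)
    otus = [",".join(members) for members in clusters]
    # Column 2 is the OTU count, computed from the data rather than by hand.
    return "\t".join([label, str(len(otus))] + otus)


if __name__ == "__main__":
    # Hypothetical mapping that mirrors the three-species example above.
    species = {
        "species1": ["species1a", "species1b", "species1c"],
        "species2": ["species2a", "species2b", "species2c"],
        "species3": ["species3a", "species3b"],
    }
    print(build_list_line("unique", list(species.values())))
```

The printed line can then be saved as dsrBlist.list and passed to get.oturep unchanged.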
However, the output file dsrBedit.unique.rep.fasta does not contain the correct representative sequences. Each representative should be the sequence with the minimum distance to all other sequences in its pre-defined cluster. Instead, each representative is simply the first sequence listed for that cluster in the .list file: in this example, the representatives are species1a, species2a and species3a. If you change the order of the sequences in the list file, the representatives change accordingly. It’s as if get.oturep is ignoring the distance matrix entirely. What’s going on?
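To make the expected behavior concrete, here is a minimal sketch (not mothur's implementation) that parses a lower-triangle PHYLIP matrix of the kind dist.seqs(output=lt) writes and, for each pre-defined cluster, picks the member with the smallest summed distance to the other members (the cluster medoid). The six-sequence matrix is made-up toy data, and "smallest summed distance" is one reading of "minimum distance from all other sequences"; mothur's own criterion and tie-breaking may differ. Ties are broken by name, so reordering a cluster's members cannot change the result.

```python
# Toy lower-triangle PHYLIP matrix: first line is the sequence count, then each
# row holds a name followed by its distances to all earlier rows.
EXAMPLE = """
6
species1a
species1b 0.05
species1c 0.20 0.06
species2a 0.50 0.50 0.50
species2b 0.50 0.50 0.50 0.30
species2c 0.50 0.50 0.50 0.10 0.08
"""


def parse_lower_triangle(text):
    """Return (names, dist) where dist[(a, b)] is the distance between a and b."""
    rows = [line.split() for line in text.strip().splitlines()]
    count = int(rows[0][0])
    names, dist = [], {}
    for row in rows[1:count + 1]:
        name, values = row[0], [float(v) for v in row[1:]]
        for other, d in zip(names, values):
            dist[(name, other)] = d  # store both orders so lookups are symmetric
            dist[(other, name)] = d
        names.append(name)
    return names, dist


def medoid(members, dist):
    """Member with the smallest summed distance to the rest of its cluster.

    A name missing from the matrix raises KeyError instead of being skipped.
    """
    def total(seq):
        return sum(dist[(seq, other)] for other in members if other != seq)
    # Sorting first makes ties resolve by name, independent of list order.
    return min(sorted(members), key=total)


if __name__ == "__main__":
    names, dist = parse_lower_triangle(EXAMPLE)
    for cluster in (["species1a", "species1b", "species1c"],
                    ["species2a", "species2b", "species2c"]):
        print(medoid(cluster, dist))
```

With this toy matrix the representatives are species1b and species2c, not the first member of each cluster, and the answer is the same whichever order the members are listed in.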
Thanks,
Kris