clearcut: any way to reassign names of tree tips?

Hi,
After I ran the clearcut command, I noticed that the tree it generated labels the tips with the fasta sequence names (e.g. M01098_115_000000000-B5M22_1_1101_14755_2524… I blame the sequencing facility for these names) instead of OTU names (e.g. Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Moraxellaceae;Moraxellaceae_unclassified;). Is there a way to reassign the tip names to OTUs in mothur, or would I have to find some other software to do this?
Thanks!

Those names are generated by the sequencer. make.contigs has a rename option that you can use, but it uses sequential numbers. For what you’re hoping to accomplish, I think you’d need to write your own script or find a different program.

Pat
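If you'd rather not rebuild the tree, one alternative (not a mothur feature, and not the approach used in the script later in this thread) is to rewrite the tip labels directly in the existing Newick tree file. Below is a minimal sketch, assuming the tips are unquoted names made only of letters, digits, `_`, `-` and `.` (the sequencer names above are), and that you have already built a dictionary from read name to the label you want. The function name `rename_tips` is hypothetical.

```python
import re

# A tip label is an unquoted name that directly follows '(' or ',' (optionally
# after whitespace). Branch lengths follow ':' and internal-node labels follow ')',
# so neither is matched.
TIP = re.compile(r"([(,]\s*)([A-Za-z0-9_.\-]+)")


def rename_tips(newick, mapping):
    """Return a copy of a Newick string with tip labels replaced via mapping.

    Labels that are not keys in mapping are left unchanged.
    """
    def swap(match):
        old = match.group(2)
        return match.group(1) + mapping.get(old, old)

    return TIP.sub(swap, newick)


# Hypothetical example tree and mapping
tree = "((M01098_a:0.1,M01098_b:0.2):0.05,M01098_c:0.3);"
print(rename_tips(tree, {"M01098_a": "Otu001", "M01098_b": "Otu002"}))
# ((Otu001:0.1,Otu002:0.2):0.05,M01098_c:0.3);
```

The new labels should avoid `;`, `:`, `,`, parentheses and spaces, since those characters carry meaning in Newick.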

Ok. For anyone interested, I found help creating a Python script to reassign the tree tip names.

I would be interested in that script! I would like to create a tree with OTU names that match the output of the shared and taxonomy files.
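Since the goal is tip names that line up with the shared and taxonomy files, one option is to build the labels from mothur's consensus taxonomy output. The sketch below assumes a `cons.taxonomy`-style file (as written by classify.otu) with one header row, then tab-separated `OTU`, `Size` and `Taxonomy` columns with confidence scores in parentheses; check your own file before relying on that. Because `;` ends a Newick tree, it keeps the OTU id plus the deepest rank rather than the full semicolon-separated path. The function name and example rows are made up for illustration.

```python
import re


def taxonomy_labels(lines):
    """Map each OTU to a Newick-safe label such as 'Otu001_Moraxellaceae_unclassified'.

    Assumes one header row, then tab-separated 'OTU<TAB>Size<TAB>Taxonomy' rows
    with confidence scores in parentheses, e.g. 'Bacteria(100);Firmicutes(98);'.
    """
    labels = {}
    rows = iter(lines)
    next(rows, None)  # skip the header row
    for row in rows:
        if not row.strip():
            continue  # ignore blank lines
        otu, _size, taxonomy = row.rstrip("\r\n").split("\t")[:3]
        # drop '(100)'-style confidence scores and the empty piece after the last ';'
        ranks = [re.sub(r"\([0-9.]+\)", "", rank) for rank in taxonomy.split(";") if rank]
        deepest = ranks[-1] if ranks else "unclassified"
        # replace anything that is not safe in an unquoted Newick label
        labels[otu] = re.sub(r"[^A-Za-z0-9_.\-]", "_", otu + "_" + deepest)
    return labels


# Hypothetical example rows (not real data)
example = [
    "OTU\tSize\tTaxonomy\n",
    "Otu001\t1520\tBacteria(100);Proteobacteria(100);Gammaproteobacteria(100);"
    "Pseudomonadales(100);Moraxellaceae(100);Moraxellaceae_unclassified(100);\n",
]
print(taxonomy_labels(example))
# {'Otu001': 'Otu001_Moraxellaceae_unclassified'}
```

With a real file you would pass an open file handle instead of the list. Keeping the OTU id in each label keeps the tips unique when several OTUs share the same classification, and it still matches the OTU names used in the shared file.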

Hi,
Here’s the script (written by my colleague, with notes, since I’m not familiar with Python). Sorry that it didn’t paste very well, but hopefully it’ll work for you!

What I did:

1. Run get.oturep in mothur to output a fasta of representative sequences (one per OTU).

2. Rename the sequences in rep.fasta so their names exactly match the desired final OTU names (Python script below):

```python
from Bio import SeqIO   # imports the Biopython module "SeqIO"
import re               # imports Python regular expressions

fasta = SeqIO.parse(handle='File.fasta', format='fasta')   # reads in your fasta

new_names = []   # create an empty container which we will put the renamed sequences in

# loops through the fasta, prints the sequence id and the sequence description
# then it looks in the sequence description for something that looks like 'Otu#####|###' and keeps only the Otu#### part
# creates a temporary variable 'otu', then assigns this variable to the sequence id
# then it adds the renamed sequences to the 'new_names' container
for seq in fasta:
    print(seq.id)            # I use print statements like this when I'm building my for loops
    print(seq.description)   # they aren't necessary for the actual loop but they really help
    otu = re.search(r'(Otu[0-9]+)\|[0-9]+', seq.description)   # raw string so '\|' means a literal '|'
    if otu is None:          # stop with a clear message if a header has no Otu label
        raise ValueError('No Otu label found in header: ' + seq.description)
    print(otu.group(1))      # this one lets me know my regular expression works
    seq.id = otu.group(1)
    seq.description = ''     # clear the old description so the new header is only the OTU name
    new_names.append(seq)

SeqIO.write(sequences=new_names, format='fasta', handle='File_renamed.fasta')   # writes the sequences to a new fasta

# this little block verifies that the renamed fasta does indeed have the correct ID
test = SeqIO.parse(handle='File_renamed.fasta', format='fasta')
for seq in test:
    print(seq.id)

# I highly recommend Biopython.  I use the PyCharm Community Edition IDE.  It's free.
# I ran this script by putting the fasta in the same directory as the script, then pressing Ctrl+Shift+F10 in PyCharm
```

3. Generate a new phylogenetic tree from this fasta containing the renamed sequences.
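Once the new tree is built, a quick sanity check is to compare its tip labels with the IDs in the renamed fasta. Here is a stdlib-only sketch with hypothetical function names, assuming unquoted tip labels made of letters, digits, `_`, `-` and `.`; the fasta and tree text would come from your own files.

```python
import re


def fasta_ids(fasta_text):
    """IDs from a fasta: the first word of each '>' header line."""
    return [line[1:].split()[0] for line in fasta_text.splitlines()
            if line.startswith(">") and line[1:].strip()]


def newick_tips(newick):
    """Tip labels: unquoted names that directly follow '(' or ','."""
    return re.findall(r"[(,]\s*([A-Za-z0-9_.\-]+)", newick)


def check_tips(fasta_text, newick):
    """Return (fasta IDs missing from the tree, tree tips not in the fasta)."""
    ids, tips = set(fasta_ids(fasta_text)), set(newick_tips(newick))
    return sorted(ids - tips), sorted(tips - ids)


# Hypothetical example: one OTU missing from the tree, one tip never renamed
fasta_text = ">Otu001\nACGT\n>Otu002\nACGA\n>Otu003\nACGG\n"
tree_text = "((Otu001:0.1,Otu002:0.2):0.05,M01098_c:0.3);"
print(check_tips(fasta_text, tree_text))
# (['Otu003'], ['M01098_c'])
```

Two empty lists mean every renamed sequence made it into the tree under its OTU name.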

Good luck!