clearcut: any way to reassign names of tree tips?

Haemophiluser · November 2, 2017, 9:18pm

Hi,
After I ran the clearcut command, I noticed that the tree it generated labels the tree tips with the fasta sequence names (e.g. M01098_115_000000000-B5M22_1_1101_14755_2524… I blame the sequencing facility for generating these names) instead of OTU names (e.g. Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Moraxellaceae;Moraxellaceae_unclassified;). Is there a way to reassign the tree tip names to OTU names in mothur, or would I have to find some other software to help me do this?
Thanks!

pschloss · November 6, 2017, 1:14pm

Those names are generated by the sequencer. make.contigs has a rename option that you can use, but it uses sequential numbers. For what you’re hoping to accomplish, I think you’d need to write your own script or find a different program.

Pat
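A script of that kind could also relabel the existing tree directly instead of rebuilding it. Below is a minimal sketch using only the Python standard library. It assumes (1) the tree is plain Newick with unquoted tip labels, (2) it was built from the get.oturep representative sequences, and (3) their FASTA headers look like `>seqname<TAB>Otu0001|123` (sequence name, then OTU label and size). The helper names `otu_names_from_fasta` and `relabel_tips` are hypothetical, not part of mothur.

```python
import re

# Assumed get.oturep header layout: ">seqname<TAB>Otu0001|123"
OTU_TAG = re.compile(r"(Otu[0-9]+)\|[0-9]+")
# A tip label is the text right after "(" or "," up to the next Newick
# delimiter; internal-node labels follow ")" and are therefore left alone.
# Assumption: labels are unquoted and contain no Newick punctuation.
TIP_LABEL = re.compile(r"(?<=[(,])([^(),:;]+)")


def otu_names_from_fasta(lines):
    """Map sequence name -> OTU label from get.oturep-style FASTA headers."""
    mapping = {}
    for line in lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        fields = line[1:].split()
        if not fields:
            continue
        hit = OTU_TAG.search(" ".join(fields[1:]))
        if hit:
            mapping[fields[0]] = hit.group(1)
    return mapping


def relabel_tips(newick, mapping):
    """Return the Newick string with every tip found in `mapping` renamed."""
    def swap(match):
        raw = match.group(1)
        name = raw.strip()
        if name not in mapping:
            return raw  # unknown tips keep their original label
        # Keep any whitespace or newlines the tree writer put around the name.
        return raw.replace(name, mapping[name], 1)

    return TIP_LABEL.sub(swap, newick)


if __name__ == "__main__":
    # Made-up example input, for illustration only.
    headers = [">seqA\tOtu0001|57", "ACGT", ">seqB\tOtu0002|12", "ACGA"]
    tree = "((seqA:0.1,seqB:0.2):0.05,seqC:0.3);"
    print(relabel_tips(tree, otu_names_from_fasta(headers)))
    # ((Otu0001:0.1,Otu0002:0.2):0.05,seqC:0.3);
```

In practice you would read the rep fasta and the tree file from disk, pass their contents through these two functions, and write the result to a new tree file. Tips whose names are not in the mapping are left unchanged; quoted labels and labels containing Newick punctuation are not handled.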

Haemophiluser · December 12, 2017, 8:37pm

OK. I found help creating a Python script to reassign tree tip names, if anyone is interested.

ADL · April 18, 2018, 6:41pm

I would be interested in that script! I would like to create a tree with OTU names, to match the output of the shared and taxonomy files.
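For tip labels that also carry taxonomy, like the ones asked about in the original question, the OTU names could be joined to mothur's consensus taxonomy. The sketch below assumes a cons.taxonomy-style file: tab-separated columns OTU, Size and Taxonomy, one header row, and ranks separated by `;` with optional confidence values in parentheses. The function name `taxonomy_labels` and the `Otu0001_<taxon>` label style are illustrative choices, not a mothur convention. Keeping the OTU name as a prefix means OTUs that share a taxon still get unique tip labels, and characters that Newick reserves are replaced with `_`.

```python
import re

# Newick reserves these characters, so they are swapped for "_" in labels.
UNSAFE = re.compile(r"[\s(),:;\[\]']")
CONFIDENCE = re.compile(r"\(\d+\)")  # e.g. the "(100)" in "Bacteria(100)"


def taxonomy_labels(lines):
    """Map OTU name -> "<OTU>_<deepest taxon>" from cons.taxonomy-style rows.

    Assumes tab-separated columns OTU, Size, Taxonomy plus one header row.
    """
    labels = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3 or fields[0] == "OTU":
            continue  # skip the header row and anything malformed
        otu, taxonomy = fields[0], fields[2]
        ranks = [CONFIDENCE.sub("", rank) for rank in taxonomy.split(";")]
        ranks = [rank for rank in ranks if rank]  # drop the empty trailing rank
        deepest = ranks[-1] if ranks else "unclassified"
        labels[otu] = otu + "_" + UNSAFE.sub("_", deepest)
    return labels


if __name__ == "__main__":
    # Made-up example rows, for illustration only.
    rows = [
        "OTU\tSize\tTaxonomy",
        "Otu0001\t1523\tBacteria(100);Proteobacteria(100);Gammaproteobacteria(100);"
        "Pseudomonadales(100);Moraxellaceae(100);Moraxellaceae_unclassified(100);",
    ]
    print(taxonomy_labels(rows))
    # {'Otu0001': 'Otu0001_Moraxellaceae_unclassified'}
```

The resulting dictionary could be used in place of the bare OTU name when renaming sequences, e.g. `seq.id = labels[otu.group(1)]`.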

Haemophiluser · April 24, 2018, 8:21pm

Hi,
Here’s the script (written by my colleague; it includes notes since I’m not familiar with Python). Sorry that it didn’t paste very well, but hopefully it’ll work for you!

What I did:

1. Run get.oturep in mothur to get a fasta of representative sequences.
2. Rename the sequences in rep.fasta to exactly match the desired final OTU names (Python script below):

```python
from Bio import SeqIO   # imports the Biopython module "SeqIO"
import re               # imports Python regular expressions

fasta = SeqIO.parse(handle='File.fasta', format='fasta')   # reads in your fasta

new_names = []   # create an empty container which we will put the renamed sequences in

# loops through the fasta, prints the sequence id and the sequence description
# then it looks in the sequence description for something that looks like 'Otu#####|###' and keeps only the Otu#### part
# creates a temporary variable 'otu', then assigns this variable to the sequence id
# then it adds the renamed sequences to the 'new_names' container
for seq in fasta:
    print(seq.id)            # I use print statements like this when I'm building my for loops
    print(seq.description)   # they aren't necessary for the actual loop but they really help
    otu = re.search(r'(Otu[0-9]+)\|[0-9]+', seq.description)
    if otu is None:          # stop with a clear message instead of an AttributeError
        raise ValueError('no Otu label found in header: ' + seq.description)
    print(otu.group(1))      # this one lets me know my regular expression works
    seq.id = otu.group(1)
    seq.description = ''     # otherwise Biopython writes the old header text after the new id
    new_names.append(seq)

SeqIO.write(sequences=new_names, format='fasta', handle='File_renamed.fasta')   # writes the sequences to a new fasta

# this little block verifies that the renamed fasta does indeed have the correct ID
test = SeqIO.parse(handle='File_renamed.fasta', format='fasta')
for seq in test:
    print(seq.id)

# I highly recommend Biopython.  I use the PyCharm Community Edition IDE.  It's free.
# I ran this script by putting your fasta in the same directory as the script, then pressing Ctrl+Shift+F10 in PyCharm
```

3. Generate a new phylogenetic tree from this fasta containing the renamed sequences.

Good luck!
