Reinserting redundant sequence identifiers into a tree

Hey, thanks for all of your work on Mothur! I was wondering if there is a way to take a tree file and a names file and create a tree that has the identical sequence identifiers mapped right onto it. Since read.tree can take in both of those files and run analyses as if the sequences were there in the tree, it seems logical that there would be a way to print the tree with the full sequence set to a file. FYI, here’s the current protocol that I use:

A shell script can be written to reinsert into a tree file the identical sequences that were removed before analysis. The script has to be built from the ground up for your particular sequence set, which can be done easily in Microsoft Excel or one of its clones.
Column A: ‘sed -i s/’ all the way down the column
Column B: sequence IDs for representative sequences (Column 1 of the ‘.names’ file)
Column C: forward slashes (‘/’) all the way down the column
Column D: the comma-separated list of sequences represented by each representative sequence, including the representative itself (Column 2 of the Mothur ‘.names’ file); each row must line up with its Column B identifier
Column E: ‘/g file_name.tre’ all the way down the column
After this is put together, save it as tab-delimited text, open it in a text editor that can search and replace tabs (e.g., TextWrangler or TextPad), remove all tabs, and add the first few lines (the header) manually to make it a working script. [Note: If one sequence name anywhere in the tree file or ‘.names’ file is nested within another (e.g., ‘bacterium’ within ‘bacterium2’), add a colon immediately after the shorter representative name in the search pattern, and a matching colon after the list of sequences it represents in the replacement, so that only the complete name is matched.] Running the script on the original tree file edits it in place (sed -i), transforming it into a tree file that contains all of the sequences in the original sequence set (before identical sequences were removed).
#$ -S /bin/bash
#$ -cwd
#$ -o search_replace.log -j y

sed -i s/5005c2/5005c2,HL06C03c12/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
sed -i s/5005c4/5005c4,CL08C02c09,uncultured_bacterium_FD01A08,uncultured_bacterium_FD04E06/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
sed -i s/uncultured_bacterium:/uncultured_bacterium,HL08B03c26:/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
sed -i s/uncultured_bacterium_5C231311/uncultured_bacterium_5C231311,GQ109020/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
sed -i s/uncultured_bacterium_nbw397h09c1/uncultured_bacterium_nbw397h09c1,HL05A03c20/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
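
The spreadsheet steps above could also be scripted. What follows is only a minimal sketch, not part of the original protocol. It assumes a two-column ‘.names’ file, a tree in which every leaf name is preceded by ‘(’ or ‘,’ and followed by a colon and branch length, and sequence names free of characters that are special to sed (such as ‘/’, ‘&’, or ‘.’). The function name reinsert_names and the example file names are hypothetical.

```shell
#!/bin/bash
# Sketch: expand representative names in a tree using a Mothur .names file.
# Hypothetical usage: reinsert_names example.names example.tre > example.full.tre
reinsert_names() {
    local names_file=$1 tree_file=$2 script
    script=$(mktemp) || return 1
    # Column 1 = representative, column 2 = comma-separated list it stands for.
    # Only representatives with at least one redundant sequence need a command.
    # Anchoring on the preceding "(" or "," and the trailing ":" matches whole
    # leaf names only, so "bacterium" cannot match inside "uncultured_bacterium".
    awk '$2 ~ /,/ {
        printf "s/(%s:/(%s:/g\n", $1, $2
        printf "s/,%s:/,%s:/g\n", $1, $2
    }' "$names_file" > "$script"
    # Print the expanded tree to stdout; the original tree file is not modified.
    sed -f "$script" "$tree_file"
    local status=$?
    rm -f "$script"
    return "$status"
}
```

Because the result goes to standard output rather than being written with sed -i, the original tree stays available for comparison.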

Citation: Hodkinson, B.P. 2011. A Phylogenetic, Ecological, and Functional Characterization of Non-Photoautotrophic Bacteria in the Lichen Microbiome. Doctoral Dissertation, Duke University, Durham, North Carolina.

Thanks for the suggestion! The deunique.tree command will be in version 1.20.0.