I have a dataset of 12,000 sequences of masD genes (454 tag sequences) from 13 samples in 13 separate multi-FASTA files. My aim is to translate them into amino acid sequences, align them, and finally cluster them and build trees. I already have an ARB reference database of aligned amino acid sequences. What would you suggest as the best strategy for working with this dataset?
Looking forward to your answers and cheers, Marion.
Unfortunately, mothur is not set up to handle amino acid sequences at this point. The method we suggest is to align your references as amino acids, back-translate them to DNA sequences, and use that as your reference alignment. You could then use the resulting DNA alignment in mothur.
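As a rough illustration of the back-translation step, here is a minimal Python sketch. It assumes you still have the original, in-frame nucleotide sequence for each reference, so that each aligned amino acid can be replaced by its own codon and each gap by three gap characters. The function name `backtranslate_alignment` is made up for this example; it is not part of mothur or ARB.

```python
def backtranslate_alignment(aligned_protein, dna, gap_chars="-."):
    """Thread the codons of `dna` onto one gapped protein alignment row.

    Assumption: `dna` is the in-frame coding sequence that was translated
    to produce the ungapped version of `aligned_protein`.
    """
    # Split the nucleotide sequence into complete codons (drop a partial tail).
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]

    # Every non-gap residue needs a codon of its own.
    residues = sum(1 for aa in aligned_protein if aa not in gap_chars)
    if residues > len(codons):
        raise ValueError("DNA sequence is too short for the aligned protein")

    out = []
    pos = 0  # index of the next unused codon
    for aa in aligned_protein:
        if aa in gap_chars:
            out.append(aa * 3)        # one protein gap becomes three DNA gaps
        else:
            out.append(codons[pos])   # keep the residue's original codon
            pos += 1
    return "".join(out)


if __name__ == "__main__":
    print(backtranslate_alignment("M-K.L", "ATGAAACTG"))
```

Any codons left over after the last residue (for example a trailing stop codon) are simply ignored here. Applying this to every row of the amino acid alignment should give a DNA alignment with three columns per protein column, which is the form the reference would need to take.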