Hello, I am trying to pull one representative sequence for each sample, since these are pure isolates run on a MiSeq. I have a relabund file that details the abundance of every ASV in each sample, but I'm having trouble getting the actual sequences attached to the sample names. For example, some isolates have a single ASV that makes up over 99% of the reads, but there are also extremely low-abundance contaminants that make things confusing when I try to build trees from all of this. Running rename.seqs is problematic because some of the ASVs come up as ASV#_multi, so it isn't separating the sequences the way I need. What I'd like is a threshold rule: if an ASV makes up >90% of one sample, pull that ASV's aligned sequence from the fasta and append the sample name to it.
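A minimal sketch of that threshold rule in plain Python, outside mothur. It assumes (1) the relabund file is tab-delimited with mothur-style `label`, `Group` and `numOtus` columns followed by one column per ASV, with values as fractions between 0 and 1, and (2) the first whitespace-delimited token of each fasta header is the same ASV name used as the relabund column header. Both are assumptions to check against your own files; the function names are hypothetical.

```python
import io


def read_relabund(handle):
    """Parse a tab-delimited relabund table into {sample: {asv: fraction}}.

    Assumes mothur-style columns: label, Group, numOtus, then one column per ASV.
    """
    lines = [line.rstrip("\n").split("\t") for line in handle if line.strip()]
    header, rows = lines[0], lines[1:]
    table = {}
    for row in rows:
        record = dict(zip(header, row))
        table[record["Group"]] = {
            asv: float(value)
            for asv, value in record.items()
            # Skip the metadata columns and any empty trailing-tab fields.
            if asv and value and asv not in ("label", "Group", "numOtus")
        }
    return table


def read_fasta(handle):
    """Return {name: sequence}; the name is the first token of each '>' header."""
    seqs, name = {}, None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {n: "".join(parts) for n, parts in seqs.items()}


def dominant_asv_records(table, seqs, threshold=0.90):
    """Return (sample, asv, sequence) for samples whose top ASV exceeds threshold."""
    records = []
    for sample, abundances in table.items():
        if not abundances:
            continue
        asv, frac = max(abundances.items(), key=lambda kv: kv[1])
        if frac > threshold and asv in seqs:
            records.append((sample, asv, seqs[asv]))
    return records


def write_fasta(records, handle):
    """Write one record per sample, labelled '<sample>_<asv>'."""
    for sample, asv, seq in records:
        handle.write(f">{sample}_{asv}\n{seq}\n")


# Small inline example data (made up for illustration).
relabund = "label\tGroup\tnumOtus\tASV1\tASV2\nASV\tiso1\t2\t0.995\t0.005\nASV\tiso2\t2\t0.60\t0.40\n"
fasta = ">ASV1\nAC-GT\n>ASV2\nAC-GA\n"
records = dominant_asv_records(read_relabund(io.StringIO(relabund)), read_fasta(io.StringIO(fasta)))
out = io.StringIO()
write_fasta(records, out)
print(out.getvalue(), end="")
```

Here `iso2` is dropped because its top ASV is only 60% of the sample. Samples that fail the threshold are worth inspecting by hand rather than silently discarding, since they may be mixed cultures.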
My ultimate goal is to have all samples whose sequences are identical to each other (but that have different metadata) sit on the same node in the tree, so I need the fasta file to contain duplicate sequences, each with a different sample name attached.
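Before building the tree, it may help to check which samples actually share an identical representative sequence, since those are the ones expected to land together. A minimal sketch, assuming you already have a {sample: aligned_sequence} mapping and that `-` and `.` are the only gap characters in the alignment; `identical_groups` is a hypothetical helper, not a mothur command.

```python
from collections import defaultdict


def identical_groups(sample_seqs, gap_chars="-."):
    """Group samples whose sequences match once alignment gaps are removed."""
    strip_gaps = str.maketrans("", "", gap_chars)
    groups = defaultdict(list)
    for sample, seq in sample_seqs.items():
        groups[seq.translate(strip_gaps).upper()].append(sample)
    # Keep only sequences shared by more than one sample.
    return sorted(sorted(members) for members in groups.values() if len(members) > 1)


print(identical_groups({"iso1": "AC-GT", "iso2": "ACGT", "iso3": "ACGA"}))
```

The fasta itself still keeps one record per sample; this only reports the expected groupings so you can compare them with the tree.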
Do you have any tips for helping me do this?