Pull representative ASV from each sample based on relabund

Hello, I am trying to pull one representative sequence for each sample, since these are pure isolates that were run on a MiSeq. I have a relabund file with the abundances of all ASVs in each sample, but I am having trouble getting the actual sequences attached to the sample names. Some isolates have an ASV that makes up over 99% of the reads, but there are extremely low-abundance contaminants that make things confusing when I try to build trees from all of this. Running rename.seqs is problematic because some of the ASVs come up as ASV#_multi, so it is not separating the sequences the way I need. What I would like is a threshold: for example, if an ASV is >90% abundant in a sample, pull its aligned sequence from the fasta file and append the sample name to it.

My ultimate goal is to have all samples that are identical to each other (but have different metadata) sit on the same node in the tree. For that, the fasta file essentially needs to contain duplicate sequences, each with a different sample name attached.

Do you have any tips for helping me do this?

Could you perhaps use make.shared to generate a shared file from the count table, so you’d have the presence/absence of each ASV in each sample? Then you could run get.oturep to get the actual sequence for each OTU.
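If it is easier to do the thresholding step outside of mothur, here is a rough Python sketch of the idea from the question: for each sample, take the most abundant ASV, and if it is above the cutoff, write its aligned sequence under a name that includes the sample. Everything about the inputs is an assumption: the table is taken to be plain tab-delimited text with a header row (`sample`, then one column per ASV) and abundances as fractions between 0 and 1, which is not necessarily the exact layout of your relabund file, and the ASV names are assumed to match the first word of each fasta header. The function names are made up for illustration.

```python
from io import StringIO


def read_fasta(handle):
    """Parse FASTA text into {first word of header: sequence}."""
    parts = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def read_relabund(handle):
    """Parse an ASSUMED layout: header 'sample<TAB>ASV1<TAB>ASV2...',
    then one row per sample with fractional abundances (0 to 1)."""
    asvs = handle.readline().rstrip("\n").split("\t")[1:]
    table = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        table[fields[0]] = dict(zip(asvs, map(float, fields[1:])))
    return table


def representative_records(relabund, seqs, threshold=0.90):
    """Return (sample, asv, sequence) for samples whose top ASV exceeds threshold."""
    records = []
    for sample, abunds in relabund.items():
        asv, frac = max(abunds.items(), key=lambda kv: kv[1])
        if frac > threshold and asv in seqs:
            records.append((sample, asv, seqs[asv]))
    return records


def to_fasta(records):
    """Write one record per sample, named '<sample>_<asv>'."""
    return "".join(f">{s}_{a}\n{seq}\n" for s, a, seq in records)


# Toy data only, not real results.
table_text = "sample\tASV1\tASV2\nisoA\t0.995\t0.005\nisoB\t0.992\t0.008\nisoC\t0.60\t0.40\n"
fasta_text = ">ASV1\nACGT--ACGT\n>ASV2\nACGA--ACGT\n"

relabund = read_relabund(StringIO(table_text))
seqs = read_fasta(StringIO(fasta_text))
print(to_fasta(representative_records(relabund, seqs)), end="")
```

In the toy data, isoA and isoB both get the ASV1 sequence under their own names, so the output fasta has the duplicate sequences you describe. isoC has no ASV above 90% and is left out, which might be a useful flag for a mixed or contaminated isolate.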

Hope this helps,
Pat
