Pull representative ASV from each sample based on relabund

bakerdyl · November 29, 2022, 9:40pm

Hello, I am trying to pull one representative sequence for each sample, because the samples are pure isolates run through MiSeq. I have a relabund file that gives the abundance of every ASV in each sample, but I am having trouble getting the actual sequences attached to the sample names. Some isolates have one ASV that makes up over 99% of the reads, but there are extremely low-abundance contaminants that make things confusing when I try to build trees from all of this. Running rename.seqs is problematic because some of the ASVs come up as ASV#_multi, so it isn’t separating the sequences the way I need. What I would like is to set a threshold: for example, if an ASV is >90% abundant in a sample, pull its aligned sequence from the fasta file and append the sample name to it.

My ultimate goal is for all samples with identical sequences (but different metadata) to sit on the same node in the tree. For that, the fasta file needs to contain duplicate sequences, each with a different sample name attached.

Do you have any tips for helping me do this?

pschloss · December 1, 2022, 10:13pm

Could you perhaps do something like make.shared to generate a shared file from the count table so you’d have the presence/absence of each ASV in a sample? Then you could do get.oturep for each OTU to get the actual sequence.

Hope this helps,
Pat
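
For anyone who wants to do the thresholding step outside mothur, here is a minimal Python sketch of the idea from the question: for each sample, take the most abundant ASV, keep it only if it is above the threshold, and write its aligned sequence under a header that carries the sample name. This is not a mothur command and not necessarily what Pat had in mind. It assumes the relabund file is a tab-delimited table with a `Group` column holding the sample name and one column per ASV, that abundances are fractions between 0 and 1, and that the fasta headers match the ASV column names. The function names, the `sample_ASV` header format, and the toy data are all made up for illustration.

```python
import csv
import io

# Columns assumed to be bookkeeping rather than ASV abundances (an assumption
# about the table layout; adjust to match your own file).
NON_ASV_COLUMNS = {"label", "Group", "numOtus", "numASVs"}


def read_fasta(handle):
    """Return {sequence_name: sequence} from an open FASTA file."""
    seqs, name = {}, None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first token of the header as the name.
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line
    return seqs


def dominant_asvs(handle, threshold=0.90):
    """Return {sample: asv} for samples whose top ASV exceeds the threshold."""
    picks = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        abund = {
            col: float(val)
            for col, val in row.items()
            if col and col not in NON_ASV_COLUMNS and val
        }
        if not abund:
            continue
        top = max(abund, key=abund.get)
        if abund[top] > threshold:
            picks[row["Group"]] = top
    return picks


def per_sample_fasta(picks, seqs):
    """Build FASTA text with one record per sample, named sample_ASV."""
    lines = []
    for sample, asv in sorted(picks.items()):
        lines.append(f">{sample}_{asv}")
        lines.append(seqs[asv])
    return "\n".join(lines) + "\n"


# Toy data standing in for the real relabund and aligned fasta files.
RELABUND_TEXT = (
    "label\tGroup\tnumASVs\tASV1\tASV2\n"
    "asv\tisolateA\t2\t0.995\t0.005\n"
    "asv\tisolateB\t2\t0.992\t0.008\n"
    "asv\tisolateC\t2\t0.600\t0.400\n"
)
FASTA_TEXT = ">ASV1\nACGT-ACGT\n>ASV2\nACGTTACGT\n"

picks = dominant_asvs(io.StringIO(RELABUND_TEXT), threshold=0.90)
seqs = read_fasta(io.StringIO(FASTA_TEXT))
print(per_sample_fasta(picks, seqs), end="")
```

In the toy data, isolateA and isolateB both end up with a copy of the ASV1 sequence under their own names, which is the duplicate-sequence fasta described in the question. isolateC is left out because no ASV passes the 90% threshold. With real files you would pass `open(...)` handles in place of the `io.StringIO` objects.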

