filtered otu's now get rep fasta?

Kendra · November 15, 2013, 1:01am

I’m trying to work with just the abundant OTUs for the moment. I filtered my shared by #of seqs which has reduced it down to <6000 OTUs which I’d now like to analyze with unifraq. I need to create a fasta with just the representative seqs for just those OTUs which I can run dist.seqs on, then build the tree. What would be the most straight forward way to do that? I’ve been trying to write a gawk oneliner that uses an accnos file as keys but don’t know enough gawk to get the line with the sequence, just the one with the OTU number.

gawk ‘FNR==NR {keys[$2]; next} $1 in keys’ b03f100_zone_bor.accnos b03f100_1_nofire_otu2 > b03100_1_nofire_otu_zone_bor

I can’t see how to use get.seqs command but maybe I’m just missing something. Help?

westcott · November 15, 2013, 9:00pm

We are adding a modification to mothur in the next release to help with situations like this. In the next release the list file will include OTU labels. That will allow you to run commands like:

list.otulabels(shared=yourFilteredSharedFile) - list the otu labels in your filtered shared file
get.otuLabels(list=yourListFile) - select otus from list file. Assumes you are using files that relate to eachother.
list.seqs(list=yourListFile) - list sequences from list file
get.dists(phylip=yourDistanceMatrix) - get distances related to abundant otus you selected
get.seqs(fasta=yourFastaFile, name=yourNamesFile) - get abundant sequences
get.oturep(phylip=yourDistanceMatrix, list=yourListFile, fasta=yourFastaFile, name=yourNameFile) - find reprensentative sequences

Without this modification, you will have to select the OTUs from the list file manually. Mothur does sort the list file for you by abundance, so if you selected the top 6000 most abundant OTUs, then that would be the first 6000 OTUs listed in the list file.

Kendra · November 15, 2013, 9:15pm

Thanks. I’m trying to make a database then gawking that. My otus aren’t sorted by abundance but maybe that’s because I used the cluster.split option-they’re ordered by phyla

westcott · November 18, 2013, 11:26am

Could you use the split.abund command with the list file to get the otus you want and then build the shared file? Then you could use the list file with get.seqs?

Kendra · November 20, 2013, 7:12pm

Hey Sarah, I’m not sure I’m following this. I have the filtered shared file (w/ 5578 otus), representative otu fasta for all ~300k otus, list file for all ~300k otus. so split.abund I’d use the representative fasta, full list, and cutoff=100 (which is what I used to filter the shared file)?

Ok ran

split.abund(list=bac_final.an.list, fasta=bac_final.an.0.03.rep.fasta, cutoff=100, label=0.03)

which seems like it worked except I get 5642 sequences instead of 5578 which is how many otus I get when I filter shared with mintotal=100. so close but not exactly the same, any idea why?

westcott · November 22, 2013, 1:29pm

I took a closer look for you. In the split.abund command, we use <= to send an OTU to *.rare. In the filter.shared command, if an OTU is < the cutoff it is removed.

Kendra · November 26, 2013, 10:18pm

oh super subtle. thanks I’ll adjust my command

gaspesie · December 8, 2013, 6:39am

the way i did this was by making a fasta and names file of the just the representative sequences : get.oturep(phylip=, list=,fasta=)
then I opened up the fasta file- voila, ordered by abundance of OTU

Topic		Replies	Views
make.shared, fasta file and the most abudant OTUs Commands in mothur	2	1504	June 17, 2015
select all sequences from one OTU Commands in mothur	6	6452	March 29, 2016
Representative OTU Seqs in Multisample Analyses Commands in mothur	1	3145	June 24, 2010
abundant + shared otus Commands in mothur	0	3949	April 5, 2012
How to get.oturep from a filtered shared file Commands in mothur	4	3220	November 17, 2016

filtered otu's now get rep fasta?

Related topics