how to get groups, sequences and numbers in the same file

pedro.augusto · August 16, 2013, 7:19pm

Hi!
Background: We had 30 samples, and combined the sff files using the sff.multiple command. We then cleaned the data using the 454 SOP. We are confused about how to get the type of output we need.
Specifically, for every OTU, we need to know:

(1) what is the representative sequence (so we can blast it)
(2) how many copies of each OTU occur in each of the 30 samples

We have been able to obtain one or the other of the above (using uclust), but not both together. Our main problem has been retaining the information on what samples things belong to once OTUs are assigned. We need advice on how to correctly obtain information on both (1) and (2) after the data is cleaned.

Thanks!

pschloss · August 16, 2013, 7:37pm

what is the representative sequence (so we can blast it)

Use get.oturep. But blast is really a bad way to do taxonomy. It is better to run classify.seqs on all of the sequences and then get a consensus taxonomy for each OTU with classify.otu. This is done in the SOPs.

how many copies of each OTU occur in each of the 30 samples

make.shared will generate the table of counts for each otu per sample.
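
To make concrete what a table of counts per OTU per sample contains, here is a minimal Python sketch. It is not mothur code and does not read or write mothur's file formats; the function name, the OTU labels, and the toy data are all hypothetical, and it assumes every sequence is listed individually (no dereplication).

```python
from collections import Counter, defaultdict


def count_otus_per_sample(otu_members, seq_to_sample):
    """Return {otu: Counter({sample: number_of_sequences})}."""
    table = defaultdict(Counter)
    for otu, seqs in otu_members.items():
        for seq in seqs:
            # Each sequence adds one count to its OTU in the sample it came from.
            table[otu][seq_to_sample[seq]] += 1
    return table


# Hypothetical toy data: which sequences are in each OTU,
# and which sample each sequence came from.
otu_members = {
    "Otu001": ["seqA", "seqB", "seqC"],
    "Otu002": ["seqD", "seqE"],
}
seq_to_sample = {
    "seqA": "sample1", "seqB": "sample1", "seqC": "sample2",
    "seqD": "sample2", "seqE": "sample3",
}

table = count_otus_per_sample(otu_members, seq_to_sample)
samples = sorted(set(seq_to_sample.values()))

# Print one row per OTU and one column per sample.
print("OTU\t" + "\t".join(samples))
for otu in sorted(table):
    print(otu + "\t" + "\t".join(str(table[otu][s]) for s in samples))
```

The sample information survives OTU assignment here only because each sequence name can still be looked up in `seq_to_sample`, which is the role the group or count file plays in mothur.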

pedro.augusto · August 16, 2013, 7:48pm

Thank you for the quick reply!

We will try these commands and come back if we have further questions!

gaspesie · August 22, 2013, 6:30pm

Related question:

Is there a way to get representative sequences without a .dist file?

I clustered eukaryotic ITS regions in cd-hit, then used a Perl script to convert the .clstr file to a list file for mothur. Then I used make.shared to look at which OTUs were shared among my samples.
I classified using classify.otu as well.

Now I would like to blast some of the more interesting sequences, so I would like to pull representative sequences from each of the clusters. Any ideas or hints?

Thanks

westcott · August 26, 2013, 1:11pm

In version 1.31 we added method=abundance, which chooses the most abundant sequence in the OTU as the representative. If there is a tie for maximum abundance, we randomly choose one of the tied sequences. A distance matrix is not needed for this method, but a name file or count file is.

get.oturep(list=yourListFile, count=yourCountfile)

or

get.oturep(list=yourListFile, name=yourNamefile)

You may also want to provide your fasta file, so mothur will return a fasta file with the representative sequences in it.

get.oturep(list=yourListFile, name=yourNamefile, fasta=yourFastaFile)
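
The abundance rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not mothur's implementation: the function name, the in-memory dictionaries, and the toy abundances are hypothetical stand-ins for what the list file and the name or count file provide.

```python
import random


def pick_representatives(otu_members, abundance, rng=None):
    """Pick the most abundant sequence in each OTU; break ties at random."""
    rng = rng or random.Random()
    reps = {}
    for otu, seqs in otu_members.items():
        top = max(abundance[s] for s in seqs)
        # Every sequence that reaches the maximum abundance is a candidate.
        tied = [s for s in seqs if abundance[s] == top]
        reps[otu] = rng.choice(tied)
    return reps


# Hypothetical toy data: OTU membership and the number of reads
# that each unique sequence stands for.
otu_members = {
    "Otu001": ["seqA", "seqB", "seqC"],
    "Otu002": ["seqD", "seqE"],
}
abundance = {"seqA": 10, "seqB": 3, "seqC": 7, "seqD": 4, "seqE": 4}

reps = pick_representatives(otu_members, abundance, rng=random.Random(0))
for otu in sorted(reps):
    print(otu, reps[otu])
```

In this toy data Otu001 always gets seqA, while Otu002 is a tie between seqD and seqE, so either one may be returned. No pairwise distances are consulted anywhere, which is why the method works without a distance matrix.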
