how to get groups, sequences and numbers in the same file

Hi!
Background: We had 30 samples, and combined the sff files using the sff.multiple command. We then cleaned the data using the 454 SOP. We are confused about how to get the type of output we need.
Specifically, for every OTU, we need to know:

  1. what is the representative sequence (so we can blast it)
  2. how many copies of each OTU occur in each of the 30 samples

We have been able to obtain one or the other of the above (using uclust), but not both together. Our main problem has been retaining the information on what samples things belong to once OTUs are assigned. We need advice on how to correctly obtain information on both (1) and (2) after the data is cleaned.

Thanks!

  1. what is the representative sequence (so we can blast it)

Use get.oturep. That said, BLAST is a poor way to assign taxonomy. A better approach is to run classify.seqs on all of the sequences and then get a consensus taxonomy for each OTU with classify.otu. This is done in the SOPs.

  2. how many copies of each OTU occur in each of the 30 samples

make.shared will generate the table of counts for each OTU in each sample.
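
To make the idea of a per-sample count table concrete, here is a minimal Python sketch of the counting that such a table represents. It is only an illustration, not mothur's implementation: the function name `count_otus_per_sample` and the toy sequence and sample names are made up, and it assumes you already have OTU membership (as in a list file) and a sequence-to-sample mapping (as in a group file) loaded into dictionaries.

```python
from collections import Counter, defaultdict

def count_otus_per_sample(otus, seq_to_sample):
    """Return {otu_label: Counter({sample: number_of_sequences})}."""
    table = defaultdict(Counter)
    for otu_label, seq_names in otus.items():
        for name in seq_names:
            # Each sequence adds one count to its sample's cell for this OTU.
            table[otu_label][seq_to_sample[name]] += 1
    return dict(table)

# Hypothetical toy data: two OTUs, three samples.
otus = {"Otu1": ["seqA", "seqB", "seqC"], "Otu2": ["seqD"]}
seq_to_sample = {"seqA": "S1", "seqB": "S1", "seqC": "S2", "seqD": "S3"}

table = count_otus_per_sample(otus, seq_to_sample)
for otu_label, counts in table.items():
    print(otu_label, dict(counts))
```

The key point is that sample membership is carried by the sequence names, so it is not lost once OTUs are assigned.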

Thank you for the quick reply!

We will try these commands and come back if we have further questions! :)

Related question:

Is there a way to get representative sequences without a .dist file?

I clustered eukaryotic ITS regions in cd-hit, then used a Perl script to convert the .clstr file to a list file for mothur. Then I used make.shared to look at which OTUs were shared among my samples.
I also classified the OTUs using classify.otu.
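
For anyone wanting to reproduce the conversion step, a rough Python sketch of a .clstr-to-list conversion is shown here. It is not the Perl script mentioned above. It assumes the usual cd-hit layout (a `>Cluster N` line followed by member lines containing `>name...`) and writes one list-file data line: a label, the number of OTUs, then one comma-separated column per OTU. The function name `clstr_to_list_line` and the label `0.03` are placeholders, and sequence names in the .clstr file may be truncated depending on the cd-hit settings used.

```python
import re

# Captures the sequence name between '>' and '...' on a member line,
# e.g. "0<TAB>250nt, >seq1... *"
MEMBER = re.compile(r">(.+?)\.\.\.")

def clstr_to_list_line(clstr_text, label="0.03"):
    """Convert cd-hit .clstr text into a single list-file data line."""
    clusters = []
    for line in clstr_text.splitlines():
        if line.startswith(">Cluster"):
            clusters.append([])  # start a new OTU
        elif line.strip():
            match = MEMBER.search(line)
            if match and clusters:
                clusters[-1].append(match.group(1))
    fields = [label, str(len(clusters))] + [",".join(c) for c in clusters]
    return "\t".join(fields)

# Hypothetical toy input with two clusters.
example = """>Cluster 0
0\t250nt, >seq1... *
1\t248nt, >seq2... at +/98.40%
>Cluster 1
0\t251nt, >seq3... *
"""

list_line = clstr_to_list_line(example)
print(list_line)
```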

Now I would like to blast some of the more interesting sequences, so I would like to pull representative sequences from each of the clusters. Any ideas or hints?

Thanks

In version 1.31 we added method=abundance, which chooses the most abundant sequence in the OTU as the representative. If there is a tie for the maximum abundance, we randomly choose one of the tied sequences. A distance matrix is not needed for this method, but a names or count file is.

get.oturep(list=yourListFile, count=yourCountfile, method=abundance)

or

get.oturep(list=yourListFile, name=yourNamefile, method=abundance)

You may also want to provide your fasta file, so mothur will return a fasta file with the representative sequences in it.

get.oturep(list=yourListFile, name=yourNamefile, fasta=yourFastaFile, method=abundance)
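
To illustrate the abundance rule described above (most abundant sequence wins, ties broken at random), here is a small Python sketch. It is not mothur's code: the function name `pick_representative` and the toy data are invented for the example, and it assumes the per-sequence abundances (as a names or count file would provide) are already in a dictionary.

```python
import random

def pick_representative(members, abundance, rng=random):
    """Pick the most abundant member of an OTU; break ties at random."""
    top = max(abundance[name] for name in members)
    tied = [name for name in members if abundance[name] == top]
    return rng.choice(tied)

# Hypothetical toy data: abundance per unique sequence, and OTU membership.
abundance = {"seqA": 10, "seqB": 3, "seqC": 7, "seqD": 7}
otus = {"Otu1": ["seqA", "seqB"], "Otu2": ["seqC", "seqD"]}

reps = {label: pick_representative(members, abundance)
        for label, members in otus.items()}
print(reps)
```

Because no pairwise distances are involved, this rule only needs OTU membership and abundances, which is why a distance matrix is not required.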