Cluster command output

miguelangel · November 6, 2012, 4:38pm

Hello there.

I am working with the Schloss SOP with my own sequences and I have a problem:

After running my ‘dist.seqs’ and ‘cluster’ commands to find the OTUs present in my fasta, I get 3 files: .sabund, .rabund and .list.

How could I get a .fasta file with my clusters at 0.03, for instance? I want not only the sequence names, but also the nucleotide sequences themselves for the representatives of each OTU.

Is it possible?

Thanks a lot

pschloss · November 6, 2012, 7:13pm

try get.oturep
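
A minimal sketch of what that call might look like, assuming the distance matrix came from dist.seqs in column format. The file names (my.dist, my.names, my.fasta, my.list) are placeholders, not names from this thread; substitute the files from your own run, and drop name= if you never dereplicated with unique.seqs.

```
# Placeholder file names; label=0.03 selects the 0.03 clustering level from the .list file
get.oturep(column=my.dist, name=my.names, fasta=my.fasta, list=my.list, label=0.03)
```

Among the outputs there should be a fasta file holding one representative sequence per OTU at the 0.03 label, which is the sequence-level output asked for above.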

dwaite · November 6, 2012, 7:37pm

Something I’ve been wondering for a while: with the OTU-based approach in the SOP you create OTUs at various scales (say 97% similarity) and perform the downstream analysis based on that, but the phylogenetic approach seems to work only at the unique sequence level. If you wanted to go down the phylogenetic route, you could do the following:

dist.seqs(cutoff=0.15)
cluster()
make.shared(label=0.03)
get.oturep()

dist.seqs() <- using the fasta output of get.oturep
clearcut()
etc…

And just replace your original names file with the output files from get.oturep().
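
Spelled out with parameters, the second pass might look like the sketch below. The file names are placeholders rather than real output names from this thread. output=lt is used here on the assumption that the tree is built from a phylip-formatted matrix via clearcut's phylip= option, and no cutoff is passed, which is the point being asked about.

```
# my.rep.fasta stands in for the representative fasta written by get.oturep
dist.seqs(fasta=my.rep.fasta, output=lt)
# my.rep.phylip.dist stands in for the lower-triangle matrix written by the line above
clearcut(phylip=my.rep.phylip.dist)
```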

The thing I want to check is whether there’s a problem using dist.seqs a second time around. I would assume that no cutoff needs to be specified, because the sequences have effectively been filtered to that level of dissimilarity already. Is this correct?

pschloss · November 6, 2012, 8:14pm

Ugh. Yeah you could do this, but why? I think part of the intended beauty of the phylogenetic approach is that you don’t have to apply a cutoff.

dwaite · November 7, 2012, 1:58am

I don’t know how valid these reasons are (hence the asking), but here is why I’ve been thinking about it:

Firstly, just as a data reduction technique. The SOP is to dereplicate, removing identical sequences to reduce the processing requirements. This is really just reducing your data into unique OTUs, so using a lower similarity threshold (say, 97% similarity instead of 100%) is just an extension of this. That said, I realise that the output of get.oturep is not a consensus sequence.

Also, regardless of whether you want to do an OTU- or phylogeny-based analysis, the traditional ecological diversity measures are those used in the OTU approach and I think it’s important to keep these in the workflow. That said, if you end up calculating Chao1/Simpson/Shannon/whatever estimators at a level other than unique, you’re going to dissociate the results of these calculators from the data you’re running through a unifrac. Case by case, it might be interesting to see if the most abundant OTUs are all phylogenetically similar (e.g. different strains of E. coli) or quite different, and the only ways I can see to do this are to look at their taxonomies (if they’re very different, and only for broad comparison), or to use the method I described above.

miguelangel · November 7, 2012, 10:36am

Thank you very much. I’ve run get.oturep and it worked perfectly.
