problem with cluster command

Hi,

When I run the cluster command with the specified cutoff (0.03), the program changes the cutoff. This is how it reads:

mothur > cluster(method=average,cutoff=0.03,showabund=f)
changed cutoff to 0.00981666

Output File Names:
summer.final.an.sabund
summer.final.an.rabund
summer.final.an.list

If I go ahead and run the read.otu command, it says “Your group file contains 159916 sequences and list file contains 161307 sequences. Please correct.
For a list of names that are in your list file and not in your group file, please refer to summer.final.an.missing.group.” But my summer.final.an.missing.group file is empty, so I can’t use it with the remove.seqs command. I didn’t have these issues in mothur v.1.10.0, and now I am using mothur v.1.11.0. Can somebody help?

Thanks

You’d get the same result with this dataset and command if you used 1.10. Because of how the average neighbor algorithm works with the sparse matrix structure that is used when a cutoff is set, the algorithm sometimes has to change the cutoff. I’d suggest starting with a larger initial cutoff when you calculate the distances and cluster. As for the read.otu command, I suspect you’ve gotten your files crossed when running screen.seqs or remove.seqs. Did you perhaps remove chimeric sequences from the fasta file but not from your group file?
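For example, if your chimera check produced an accnos file of flagged sequences, passing the fasta, name, and group files to remove.seqs in one call keeps all of them in sync (the file names below are just placeholders for whatever yours are actually called):

mothur > remove.seqs(accnos=summer.final.slayer.accnos, fasta=summer.final.fasta, name=summer.final.names, group=summer.groups)

If sequences get removed from one file but not the others, read.otu will complain about the count mismatch, which looks like what happened here.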

So what is a good guideline for “starting with a larger initial cutoff”? I’m finding that I have to set an initial cutoff of 0.20 in order for Mothur to say “changed cutoff to 0.106447”. Is this ~2x overshoot comparable to other people’s experience?

Perhaps the cluster() command should print a more descriptive warning message when adjusting the cutoff, and the wiki for dist.seqs() should be updated to warn people of the downstream consequences in cluster(method=average) of starting off with too low a cutoff.

Note that a 2-fold increase in the distance cutoff translates to roughly a 10-fold increase in the size of the resulting distance file. This may have consequences for the feasibility of using average neighbor with very large data sets.

Robin

Robin - yeah, I think 2x is a good place to start. Unfortunately, it may take a bit of trial and error. Admittedly, average neighbor is new for us and we’re still putting it through its paces. We would love to hear other people’s experiences.
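As a rough sketch of that 2x rule (again, the file names here are placeholders): if you ultimately want OTUs at 0.03, calculate the distances and cluster starting at 0.06 or higher, e.g.

mothur > dist.seqs(fasta=summer.final.fasta, cutoff=0.06)
mothur > read.dist(column=summer.final.dist, name=summer.final.names)
mothur > cluster(method=average, cutoff=0.06)

If cluster still reports a changed cutoff below 0.03, go back and rerun dist.seqs with a larger cutoff - keeping in mind Robin’s point about the distance file growing quickly.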

Hi everyone

I’m having a similar problem, and no matter what I change, the result is always the same.

These are the commands I have tried so far:
mothur > cluster(method=nearest)
mothur > cluster(method=nearest,cutoff=0.10)
mothur > cluster(method=nearest,cutoff=0.20)
mothur > cluster()

and when I try to use the read.otu command I get the following message:

mothur > read.otu(list=all_metagenomes_V3.unique.filter.fn.list,group=v3.groups,label=unique-0.03-0.05-0.10)

Your group file contains 326 sequences and list file contains 327 sequences. Please correct.
For a list of names that are in your list file and not in your group file, please refer to all_metagenomes_V3.unique.filter.fn.missing.group.
deepsea_AUUB56197y1 is in your list file more than once. Sequence names must be unique. please correct.

It’s always the same message. I’m pretty new to command-line programs and bioinformatics in general.

Any help will be really useful, especially now that I’m totally stuck.
Thanks
Isa