problem with cluster command

Hi,

When I run the cluster command with the specified cutoff (0.03), the program changes the cutoff. This is how it reads:

mothur > cluster(method=average,cutoff=0.03,showabund=f)
changed cutoff to 0.00981666

Output File Names:
summer.final.an.sabund
summer.final.an.rabund
summer.final.an.list

If I go ahead and run the read.otu command, it says “Your group file contains 159916 sequences and list file contains 161307 sequences. Please correct.
For a list of names that are in your list file and not in your group file, please refer to summer.final.an.missing.group.” But my summer.final.an.missing.group file is empty, so I can’t use it with the remove.seqs command. I didn’t have these issues in mothur v.1.10.0, and now I am using mothur v.1.11.0. Can somebody help?

Thanks

You’d get the same result with this dataset and command if you used 1.10. Because of how the average neighbor algorithm works with the sparse matrix structure that is used when a cutoff is set, the algorithm sometimes has to change the cutoff. I’d suggest starting with a larger initial cutoff when you calculate the distances and cluster. As for the read.otu command, I suspect you’ve gotten your files crossed when running screen.seqs or remove.seqs. Did you perhaps remove chimeric sequences from the fasta file but not from your group file?
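For example, if your chimera check produced an accnos file of flagged sequences, passing the fasta, name, and group files to remove.seqs in one call keeps all of them in sync (the file names below are just placeholders for whatever yours are actually called):

mothur > remove.seqs(accnos=summer.final.slayer.accnos, fasta=summer.final.fasta, name=summer.final.names, group=summer.groups)

If sequences get removed from one file but not the others, read.otu will complain about the count mismatch, which looks like what happened here.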

So what is a good guideline for “starting with a larger initial cutoff”? I’m finding that I have to set an initial cutoff of 0.20 in order for Mothur to say “changed cutoff to 0.106447”. Is this ~2x overshoot comparable to other people’s experience?

Perhaps the cluster() command should print a more descriptive warning message when adjusting the cutoff, and the wiki for dist.seqs() should be updated to warn people of the downstream consequences in cluster(method=average) of starting off with too low a cutoff.

Note that a 2-fold increase in the distance cutoff translates to roughly a 10-fold increase in the size of the resulting distance file. This may have consequences for the feasibility of using average neighbor with very large data sets.

Robin

Robin - yeah, I think 2x is a good place to start. Unfortunately, it may take a bit of trial and error. Admittedly, average neighbor is new for us and we’re still putting it through its paces. We would love to hear other people’s experiences.
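As a rough sketch of that 2x rule (again, the file names here are placeholders): if you ultimately want OTUs at 0.03, calculate the distances and cluster starting at 0.06 or higher, e.g.

mothur > dist.seqs(fasta=summer.final.fasta, cutoff=0.06)
mothur > read.dist(column=summer.final.dist, name=summer.final.names)
mothur > cluster(method=average, cutoff=0.06)

If cluster still reports a changed cutoff below 0.03, go back and rerun dist.seqs with a larger cutoff - keeping in mind Robin’s point about the distance file growing quickly.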

Hi everyone

I’m having a similar problem, and no matter what I change, the result is always the same.

These are the commands I have tried so far:
mothur > cluster(method=nearest)
mothur > cluster(method=nearest,cutoff=0.10)
mothur > cluster(method=nearest,cutoff=0.20)
mothur > cluster()

and when I try to use the read.otu command I get the following message:

mothur > read.otu(list=all_metagenomes_V3.unique.filter.fn.list,group=v3.groups,label=unique-0.03-0.05-0.10)

Your group file contains 326 sequences and list file contains 327 sequences. Please correct.
For a list of names that are in your list file and not in your group file, please refer to all_metagenomes_V3.unique.filter.fn.missing.group.
deepsea_AUUB56197y1 is in your list file more than once. Sequence names must be unique. please correct.

It’s always the same message. I’m pretty new to command-line programs and bioinformatics in general.

Any help will be really useful, especially now that I’m totally stuck.
Thanks
Isa