pre.cluster command not working


I am using qual and fasta files as raw input to analyze data from an Ion Torrent PGM. I am having trouble running the pre.cluster command. The log file says:

mothur > pre.cluster(fasta=nihar.trim.unique.good.filter.unique.fasta, name=nihar.trim.unique.good.filter.names, group=nihar.good.groups, diffs=2)

Using 2 processors.

[ERROR]: Your name file contains 1435574 valid sequences, and your groupfile contains 3317132, please correct.
[ERROR]: process 0 only processed 1 of 59 groups assigned to it, quitting.

Running command: unique.seqs(fasta=nihar.trim.unique.good.filter.unique.precluster.fasta, name=nihar.trim.unique.good.filter.unique.precluster.names)
[ERROR]: nihar.trim.unique.good.filter.unique.precluster.fasta is blank, aborting.
Using nihar.trim.unique.good.filter.unique.fasta as input file for the fasta parameter.
[ERROR]: nihar.trim.unique.good.filter.unique.precluster.names is blank, aborting.

If I run the same step without a group file, it works fine:

mothur > pre.cluster(fasta=nihar.trim.unique.good.filter.unique.fasta, name=nihar.trim.unique.good.filter.names, diffs=2)

Using 1 processors.
1055535 534969 520566
Total number of sequences before precluster was 1055535.
pre.cluster removed 520566 sequences.

It took 95640 secs to cluster 1055535 sequences.

Output File Names:

But now I have a problem, because the make.shared command needs a group file:

mothur > make.shared(, group=final.groups, label=0.03)
Unable to open final.groups
You need to provide a groupfile or countfile if you are going to use the list format.
[ERROR]: did not complete make.shared.

mothur > make.shared(, label=0.03)
Unable to open
Using as input file for the list parameter.
You need to provide a groupfile or countfile if you are going to use the list format.
[ERROR]: did not complete make.shared.

I do not know what to do in this situation. It would be very helpful if anyone has an idea how to solve this problem.


It looks like you may have forgotten to include the group file in one of the commands before pre.cluster, so the group file now contains extra sequences. No worries, mothur can help :). You can use the list.seqs and get.seqs commands to select the sequences you want.

mothur > list.seqs(name=nihar.trim.unique.good.filter.names) - lists all the sequences in your names file
mothur > get.seqs(accnos=current, group=nihar.good.groups) - selects those sequences from the group file
mothur > pre.cluster(fasta=nihar.trim.unique.good.filter.unique.fasta, name=current, group=current, diffs=2)
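
If you want to see which sequences differ between the two files before running get.seqs, a small Python script along these lines can list them. This is only a sketch and not part of mothur. It assumes the usual layouts: a names file with two whitespace-separated columns (representative name, then a comma-separated list of all names it represents) and a group file with two columns (sequence name, then group). The function names are made up for illustration.

```python
def names_in_names_file(lines):
    """Collect every sequence name listed in a mothur-style names file."""
    names = set()
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        names.add(parts[0])  # representative sequence
        names.update(n for n in parts[1].split(",") if n)  # all members
    return names


def names_in_group_file(lines):
    """Collect every sequence name listed in a mothur-style group file."""
    names = set()
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            names.add(parts[0])  # column 1 is the sequence name
    return names


def compare(names_lines, group_lines):
    """Return (names only in the group file, names only in the names file)."""
    in_names = names_in_names_file(names_lines)
    in_groups = names_in_group_file(group_lines)
    return sorted(in_groups - in_names), sorted(in_names - in_groups)
```

To try it on the files from this thread, open both and pass the file handles, for example `compare(open("nihar.trim.unique.good.filter.names"), open("nihar.good.groups"))`. If the first list is long and the second is empty, the group file simply has extra sequences, which is the case the list.seqs and get.seqs commands above are meant to fix.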


Thanks, it worked fine. I followed the 454 SOP. Since I did not have a mock community sample, I skipped the “Error Analysis” part. I am now at the dist.seqs step, and it has been running for more than 3 days. It is currently writing two output files, final.dist and final.dist0.temp, and each is over 70 GB. The SOP says that if this file is larger than 100 GB, something is wrong. Last time I completed this step with a 50 GB final.dist output file. Should I stop now? I can’t see where it went wrong.

You may be interested in this,