I am using qual and fasta files as raw data to analyze a run from an Ion Torrent PGM. I am having trouble running the pre.cluster command. The log file says:

mothur > pre.cluster(fasta=nihar.trim.unique.good.filter.unique.fasta, name=nihar.trim.unique.good.filter.names, group=nihar.good.groups, diffs=2)

Using 2 processors.

[ERROR]: Your name file contains 1435574 valid sequences, and your groupfile contains 3317132, please correct.
[ERROR]: process 0 only processed 1 of 59 groups assigned to it, quitting.

Running command: unique.seqs(fasta=nihar.trim.unique.good.filter.unique.precluster.fasta, name=nihar.trim.unique.good.filter.unique.precluster.names)
[ERROR]: nihar.trim.unique.good.filter.unique.precluster.fasta is blank, aborting.
Using nihar.trim.unique.good.filter.unique.fasta as input file for the fasta parameter.
[ERROR]: nihar.trim.unique.good.filter.unique.precluster.names is blank, aborting.

If I run the same step without a group file, it works fine:

mothur > pre.cluster(fasta=nihar.trim.unique.good.filter.unique.fasta, name=nihar.trim.unique.good.filter.names, diffs=2)

Using 1 processors.
1055535 534969 520566
Total number of sequences before precluster was 1055535.
pre.cluster removed 520566 sequences.

It took 95640 secs to cluster 1055535 sequences.

Output File Names:

But now I have a problem, because the make.shared command needs a group file:

mothur > make.shared(, group=final.groups, label=0.03)
Unable to open final.groups
You need to provide a groupfile or countfile if you are going to use the list format.
[ERROR]: did not complete make.shared.

mothur > make.shared(, label=0.03)
Unable to open
Using as input file for the list parameter.
You need to provide a groupfile or countfile if you are going to use the list format.
[ERROR]: did not complete make.shared.

I do not know what to do in this situation. It would be very helpful if anyone has an idea how to solve this problem.

It looks like you may have forgotten to include the group file on one of the commands before pre.cluster, so the group file still contains sequences that are no longer in your names file (3317132 in the group file versus 1435574 in the names file). No worries, mothur can help :). You can use the list.seqs and get.seqs commands to select the sequences you want.

mothur > list.seqs(name=nihar.trim.unique.good.filter.names) - lists all the sequences in your names file
mothur > get.seqs(accnos=current, group=nihar.good.groups) - selects those sequences from the group file
mothur > pre.cluster(fasta=nihar.trim.unique.good.filter.unique.fasta, name=current, group=current, diffs=2)
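If you want to see the mismatch for yourself before or after running those commands, here is a minimal Python sketch of such a check. It is not part of mothur; the function names are hypothetical, and it assumes the usual tab-delimited layouts: a names file with the representative name in column 1 and a comma-separated list of all names it represents (itself included) in column 2, and a group file with the sequence name in column 1 and its group in column 2.

```python
import os
import tempfile


def read_names(path):
    """Collect every sequence name listed in a mothur-style names file."""
    seqs = set()
    with open(path) as handle:
        for line in handle:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            # Assumption: column 2 lists all names, including the representative.
            seqs.update(parts[1].split(","))
    return seqs


def read_groups(path):
    """Map each sequence name in a mothur-style group file to its group."""
    groups = {}
    with open(path) as handle:
        for line in handle:
            parts = line.split()
            if len(parts) >= 2:
                groups[parts[0]] = parts[1]
    return groups


def compare_names_to_groups(names_path, groups_path):
    """Return (names only in the group file, names only in the names file)."""
    named = read_names(names_path)
    grouped = set(read_groups(groups_path))
    return grouped - named, named - grouped


if __name__ == "__main__":
    # Tiny made-up files, only to show the call; real files go in their place.
    with tempfile.TemporaryDirectory() as tmp:
        names_path = os.path.join(tmp, "demo.names")
        groups_path = os.path.join(tmp, "demo.groups")
        with open(names_path, "w") as out:
            out.write("seqA\tseqA,seqB\nseqC\tseqC\n")
        with open(groups_path, "w") as out:
            out.write("seqA\tg1\nseqB\tg1\nseqC\tg2\nseqD\tg2\n")
        extra, missing = compare_names_to_groups(names_path, groups_path)
        print("in group file only:", sorted(extra))
        print("in names file only:", sorted(missing))
```

If the first set is large and the second is empty, the group file simply carries leftover sequences, which is the case the list.seqs and get.seqs commands above take care of. With millions of sequences the sets will use a fair amount of memory, so treat this as a diagnostic only.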

Thanks, it worked fine. I followed the 454 SOP. Since I did not have a mock community, I skipped the “Error Analysis” part. I am now at the dist.seqs step, and it has been running for more than 3 days. It is writing two output files, final.dist and final.dist0.temp, and both are over 70 GB each. The SOP says that if this file is larger than 100 GB, something is wrong. Last time I completed this step with a 50 GB final.dist output file. Should I stop now? I can’t see where it went wrong.

