pre.cluster problem

I’m getting the following error when I run the pre.cluster command:

Your name file contains 40766 valid sequences, and your groupfile contains 81532, please correct.

The command was executed as follows:

pre.cluster(fasta=MDVExhaustive_Mothur_Mod.unique.good.filter.unique.fasta, name=MDVExhaustive_Mothur_Mod.unique.good.filter.unique.accnos, group=GroupFile_Unique.group, diffs=2, processors=8)

I created my name file by using the list.seqs command with my fasta:

list.seqs(fasta=MDVExhaustive_Mothur_Mod.unique.good.filter.unique.fasta)

Output File Names:
MDVExhaustive_Mothur_Mod.unique.good.filter.unique.accnos

I then used R to create my group file from my name file.

My fasta, name, and group files all have the same number of sequences as verified using grep:

grep -o '_' MDVExhaustive_Mothur_Mod.unique.good.filter.unique.accnos | wc -l
(81,532)

grep -o '_' GroupFile_Unique.group | wc -l
(81,532)

grep -o '>' MDVExhaustive_Mothur_Mod.unique.good.filter.unique.fasta | wc -l
(81,532)

The output gives me a list of all of the missing names. However, when I check my names file, those names are actually present. The names it reports as missing are distributed throughout my names file; they are not grouped together in one place. Any help would be greatly appreciated!

Dave

Dave-

You appear to be giving pre.cluster your accnos file, not a names file. Can you try again?

pat
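
For readers who hit the same message: an accnos file has one sequence name per line, while a names file has two tab-separated columns (a representative name, then a comma-separated list of the names it represents). Note that 40766 is exactly half of 81532, which would be consistent with the single-column file being read as name/list pairs, although that reading behaviour is an assumption and is not confirmed in this thread. Below is a minimal Python sketch for checking which kind of file you have. The function name `classify_name_file` and the sequence names are made up for illustration and are not part of mothur.

```python
from collections import Counter


def classify_name_file(lines):
    """Guess whether lines look like an accnos file or a names file.

    Assumption: columns are whitespace-separated. One column on every
    line suggests an accnos file; two columns suggests a names file.
    """
    # Count how many columns each non-blank line has
    widths = Counter(len(line.split()) for line in lines if line.strip())
    if not widths:
        return "empty"
    if set(widths) == {1}:
        return "accnos"
    if set(widths) == {2}:
        return "names"
    return "mixed"


# Illustrative, made-up sequence names
accnos_like = ["seq_1", "seq_2", "seq_3"]
names_like = ["seq_1\tseq_1,seq_4", "seq_2\tseq_2", "seq_3\tseq_3,seq_5"]

print(classify_name_file(accnos_like))
print(classify_name_file(names_like))
```

To check a real file, pass an open file handle, for example `classify_name_file(open(path))`. A names file is normally written by unique.seqs alongside the unique fasta file, whereas list.seqs only writes an accnos file.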

I ran into the same problem with the pre.cluster command. The output is shown below:
"Processing group MID1.archaea:
Error: diffs is greater than your sequence length.

[ERROR]: Your name file contains 0 valid sequences, and your groupfile contains 423, please correct.

[ERROR]: Your name file contains 0 valid sequences, and your groupfile contains 564, please correct.

[ERROR]: Your name file contains 0 valid sequences, and your groupfile contains 564, please correct.
[ERROR]: process 0 only processed 1 of 3 groups assigned to it, quitting.
[ERROR]: process 1 only processed 1 of 3 groups assigned to it, quitting.
[ERROR]: process 2 only processed 1 of 3 groups assigned to it, quitting. "
The command I used was “pre.cluster(fasta=njsys.shhh.trim.unique.good.filter.unique.fasta, name=njsys.shhh.trim.unique.good.filter.names, group=njsys.shhh.good.groups, diffs=2)”, which is the same as in the SOP, but my samples contain both bacterial and archaeal sequences.
We used a mixed reference file that includes silva.bacteria.fasta and silva.archaea.fasta.

"Processing group MID1.archaea:
Error: diffs is greater than your sequence length."

That’s your problem. I suspect this comes from your screen.seqs command. If you could start a new thread and post summary.seqs output before and after screen.seqs as well as what you are doing for screen.seqs that would be great.

Pat
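
While gathering the summary.seqs output, a quick length check on the fasta file passed to pre.cluster may show whether some sequences are too short for the chosen diffs value. This is a rough Python sketch, not mothur's own logic. It assumes a plain FASTA file, and it assumes that counting non-gap characters (everything except `-` and `.`) is a reasonable stand-in for the length mothur compares against diffs. The function names `read_fasta` and `too_short` and the example sequences are hypothetical.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first token of the header as the name
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def too_short(lines, diffs):
    """Return names whose ungapped length does not exceed diffs."""
    short = []
    for name, seq in read_fasta(lines):
        # Assumption: '-' and '.' are alignment gaps, everything else is a base
        ungapped = sum(1 for base in seq if base not in "-.")
        if ungapped <= diffs:
            short.append(name)
    return short


# Illustrative, made-up aligned sequences
example = [">seqA", "AC-GT..ACGT", ">seqB", "..-A-..", ">seqC", "-------"]
print(too_short(example, diffs=2))
```

If many names from one group (here, the archaeal group) come back, that would fit the suspicion that screen.seqs or the filtering step left those sequences with little or no usable sequence, but the summary.seqs output is still the authoritative check.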