I’m getting the following error when I run the pre.cluster command:
Your name file contains 40766 valid sequences, and your groupfile contains 81532, please correct.
The command was executed as follows:
pre.cluster(fasta=MDVExhaustive_Mothur_Mod.unique.good.filter.unique.fasta, name=MDVExhaustive_Mothur_Mod.unique.good.filter.unique.accnos, group=GroupFile_Unique.group, diffs=2, processors=8)
I created my name file by using the list.seqs command with my fasta:
list.seqs(fasta=MDVExhaustive_Mothur_Mod.unique.good.filter.unique.fasta)
Output File Names:
MDVExhaustive_Mothur_Mod.unique.good.filter.unique.accnos
I then used R to create my group file from my name file.
My fasta, name, and group files all have the same number of sequences as verified using grep:
grep -o '_' MDVExhaustive_Mothur_Mod.unique.good.filter.unique.accnos | wc -l
(81,532)
grep -o '_' GroupFile_Unique.group | wc -l
(81,532)
grep -o '>' MDVExhaustive_Mothur_Mod.unique.good.filter.unique.fasta | wc -l
(81,532)
The pre.cluster output lists all of the names it considers missing; however, when I check my name file, those names are actually present. The names it reports as missing are distributed throughout my name file - they are not in a single cluster. Any help would be greatly appreciated!
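Matching counts do not by themselves show that the names agree between files, so a name-by-name comparison may be a more direct check. Here is a minimal sketch, assuming the .accnos file holds one sequence name per line and the group file is tab-separated with the sequence name in its first column. The function name `compare_names` is hypothetical, not a mothur command:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of mothur): print names found in only one file.
# Assumption: $1 has one sequence name per line (e.g. the .accnos file).
# Assumption: $2 is tab-separated with the sequence name in column 1
#             (e.g. the group file).
compare_names() {
  local names_sorted groups_sorted
  names_sorted=$(mktemp)
  groups_sorted=$(mktemp)

  # comm needs both inputs sorted the same way; -u also drops duplicate names.
  sort -u "$1" > "$names_sorted"
  cut -f1 "$2" | sort -u > "$groups_sorted"

  # -3 hides names common to both files. Names only in the first file print
  # flush left; names only in the second file print with a leading tab.
  comm -3 "$names_sorted" "$groups_sorted"

  rm -f "$names_sorted" "$groups_sorted"
}
```

Running `compare_names MDVExhaustive_Mothur_Mod.unique.good.filter.unique.accnos GroupFile_Unique.group` should print nothing if the two sets of names are identical.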
Dave