This occurs at every step of the analysis that asks for a names file.
The duplicate values in the groups file occur each time the groups file is created.
Here is an instance where I get the message that the names file is a problem, but the analysis “seems” to work:
mothur > unique.seqs(fasta=allsffRPtrmd.trim.fasta, name=allsffRPtrmd.trim.names)
Unable to open C:\mothur\in\allsffRPtrmd.trim.fasta. Trying output directory C:\mothur\out\allsffRPtrmd.trim.fasta
Unable to open C:\mothur\in\allsffRPtrmd.trim.names. Trying output directory C:\mothur\out\allsffRPtrmd.trim.names
[ERROR]: 07HYPI40V02DIQLX is in your fasta file, and not in your namefile, please correct.
[ERROR]: 07HYPI40V02EM9O0 is in your fasta file, and not in your namefile, please correct.
. . . .
[ERROR]: 39ICYDE7004IHIUI is in your fasta file, and not in your namefile, please correct.
[ERROR]: 39ICYDE7004IR2IM is in your fasta file, and not in your namefile, please correct.
33139 24151
Output File Names:
C:\mothur\out\allsffRPtrmd.trim.unique.names
C:\mothur\out\allsffRPtrmd.trim.unique.fasta
But the resulting file is problematic.
mothur > summary.seqs(name=current)
Using C:\mothur\out\allsffRPtrmd.trim.unique.names as input file for the name parameter.
Using C:\mothur\out\allsffRPtrmd.trim.unique.fasta as input file for the fasta parameter.
Using 1 processors.
[ERROR]: ‘07HYPI40V02DIQLX’ is not in your name or count file, please correct.
Note: The sequence name is in the names file, yet this error causes summary.seqs to stop processing the file.
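For anyone who wants to check this outside of mothur, here is a minimal Python sketch that lists the fasta sequence names missing from the first column of a names file. It assumes the names file has two whitespace-separated columns (a representative name, then a comma-separated list of names) and that the fasta name is the text after ">" up to the first whitespace. The function names are my own and are not part of mothur.

```python
import sys


def fasta_ids(lines):
    """Return sequence names from FASTA header lines (text after '>' up to whitespace)."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.append(parts[0])
    return ids


def names_first_column(lines):
    """Return the set of names in column 1 of a names file (assumed two-column layout)."""
    reps = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            reps.add(fields[0])
    return reps


def missing_from_names(fasta_lines, names_lines):
    """Return fasta names that do not appear in column 1 of the names file."""
    reps = names_first_column(names_lines)
    return [name for name in fasta_ids(fasta_lines) if name not in reps]


# Hypothetical usage: python check_names.py some.fasta some.names
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fasta, open(sys.argv[2]) as names:
        for name in missing_from_names(fasta, names):
            print(name)
```

If this prints nothing for a pair of files that mothur rejects, the names probably differ in a way that is invisible in a text editor, such as a trailing carriage return.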
[b]Another attempt[/b]:
Output File Names:
C:\mothur\out\allsffRPtrmd.trim.fasta
C:\mothur\out\allsffRPtrmd.scrap.fasta
C:\mothur\out\allsffRPtrmd.trim.qual
C:\mothur\out\allsffRPtrmd.scrap.qual
C:\mothur\out\allsffRPtrmd.trim.names
C:\mothur\out\allsffRPtrmd.scrap.names
C:\mothur\out\allsffRPtrmd.groups
mothur > summary.seqs(fasta=allsffRPtrmd.trim.fasta, name=allsffRPtrmd.trim.names)
Unable to open C:\mothur\in\allsffRPtrmd.trim.fasta. Trying output directory C:\mothur\out\allsffRPtrmd.trim.fasta
Unable to open C:\mothur\in\allsffRptrmd.trim.names. Trying output directory C:\mothur\out\allsffRptrmd.trim.names
Using 1 processors.
[ERROR]: '07HYPI40V02DIQLX' is not in your name or count file, please correct.
But again, here is an instance where it seems to work:
mothur > summary.seqs(name=current)
Using C:\mothur\out\allsffRPtrmd.trim.unique.pick.good.filter.names as input file for the name parameter.
Using C:\mothur\out\allsffRPtrmd.trim.unique.pick.good.filter.unique.fasta as input file for the fasta parameter.
Using 2 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 337 96 0 2 1
2.5%-tile: 1 340 116 0 3 819
25%-tile: 1 340 123 0 4 8190
Median: 1 340 123 0 4 16379
75%-tile: 1 340 133 0 4 24568
97.5%-tile: 1 340 143 0 5 31939
Maximum: 1 340 171 4 7 32757
Mean: 1 339.999 126.713 0.0141954 3.95842
# of unique seqs: 7210
total # of seqs: 32757
Output File Names:
C:\mothur\out\allsffRPtrmd.trim.unique.pick.good.filter.unique.summary
But immediately after doing that, there are issues with the file, along with the error regarding the duplicate values in the groups file, which I had not corrected:
mothur > pre.cluster(fasta=current, name=current, group=allsffRPtrmd.pick.good.groups, diffs=2)
Using C:\mothur\out\allsffRPtrmd.trim.unique.pick.good.filter.unique.fasta as input file for the fasta parameter.
Using C:\mothur\out\allsffRPtrmd.trim.unique.pick.good.filter.names as input file for the name parameter.
Unable to open C:\mothur\in\allsffRPtrmd.pick.good.groups. Trying output directory C:\mothur\out\allsffRPtrmd.pick.good.groups
Using 2 processors.
Your groupfile contains more than 1 sequence named 07HYPI40V02DIQLX, sequence names must be unique. Please correct.
Your groupfile contains more than 1 sequence named 07HYPI40V02EM9O0, sequence names must be unique. Please correct.
Your groupfile contains more than 1 sequence named 07HYPI40V02DH8EP, sequence names must be unique. Please correct.
. . .
. . .
Your groupfile contains more than 1 sequence named 39ICYDE7004IHIUI, sequence names must be unique. Please correct.
Your groupfile contains more than 1 sequence named 39ICYDE7004IR2IM, sequence names must be unique. Please correct.
[ERROR]: Your name file contains 0 valid sequences, and your groupfile contains 32801, please correct.
[ERROR]: process 0 only processed 1 of 6 groups assigned to it, quitting.
/******************************************/
Running command: unique.seqs(fasta=C:\mothur\out\allsffRPtrmd.trim.unique.pick.good.filter.unique.precluster.fasta, name=C:\mothur\out\allsffRPtrmd.trim.unique.pick.good.filter.unique.precluster.names)
[ERROR]: C:\mothur\out\allsffRPtrmd.trim.unique.pick.good.filter.unique.precluster.fasta is blank, aborting.
Using C:\mothur\out\allsffRPtrmd.trim.unique.pick.good.filter.unique.fasta as input file for the fasta parameter.
[ERROR]: C:\mothur\out\allsffRPtrmd.trim.unique.pick.good.filter.unique.precluster.names is blank, aborting.
/******************************************/
At this point, the program crashes.
I have noticed that there was an issue with names files once before, and it seemed to be associated with an external program that was modifying the files in some way. It would be helpful if the name of that program, and the modification it made to the names files, were posted to the forum. That way, I could check whether the same thing is happening here. What I have tried is converting the line endings between the various formats (Unix, Mac OS X, Windows), as this is a common issue that can affect a program's ability to read a file. This did not remedy the problem.
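To see which line endings a file actually contains after a conversion, a rough Python sketch like the following counts each style from the raw bytes. It only inspects line endings and makes no claim about what mothur itself expects. The function name is hypothetical.

```python
import sys


def count_line_endings(data):
    """Count Windows (CRLF), Unix (LF) and old Mac (CR) line endings in raw bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # bare LF, not part of a CRLF pair
    cr = data.count(b"\r") - crlf   # bare CR, not part of a CRLF pair
    return {"CRLF": crlf, "LF": lf, "CR": cr}


# Hypothetical usage: python line_endings.py some.names
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "rb") as handle:
        print(count_line_endings(handle.read()))
```

A file with more than one non-zero count has mixed line endings, which would be worth ruling out before looking elsewhere.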
Why the groups files generated contain duplicate values, I cannot say.
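The duplicates themselves are easy to list outside of mothur. This is a minimal Python sketch, assuming the groups file has one sequence per line with the sequence name in the first whitespace-separated column and the group in the second. The function name is my own.

```python
import sys
from collections import Counter


def duplicate_names(group_lines):
    """Return {sequence name: count} for names that occur more than once in column 1."""
    counts = Counter()
    for line in group_lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return {name: n for name, n in counts.items() if n > 1}


# Hypothetical usage: python group_dups.py some.groups
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        for name, n in sorted(duplicate_names(handle).items()):
            print(name, n)
```

Comparing the group assigned to each copy of a duplicated name may show whether the same read is being written under two groups or repeated within one.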
Hope this helps.
James