Unequal numbers of sequences between name and group files

Hi!
I am using the latest version of mothur. When I try to run pre.cluster, I get this error:
pre.cluster(fasta=03092012Lanoil.txt.shhh.trim.unique.good.filter.unique.fasta, name=03092012Lanoil.txt.shhh.trim.unique.good.filter.names, group=03092012Lanoil.txt.shhh.good.groups, diffs=2)

[ERROR]: Your name file contains 43374 valid sequences, and your groupfile contains 43432, please correct.

Since the unique.seqs command has no group option, I am wondering whether the group file contains only the non-redundant sequences, and what the source of the error might be. Is this a bug? I would appreciate any help.
Elham

Hi Elham,

Did you find out what the problem was? I encountered the same problem.

Anybody in the community?

Thanks,

Matthias

This type of error often occurs when you have forgotten to include the name file on a previous command, or if you mistakenly used the wrong file. If you post the previous commands I may be able to spot the error for you.
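
In the meantime, one way to see which sequences account for the difference is to compare the two files directly. The following is a minimal Python sketch, not part of mothur. It assumes the usual layouts (name file: representative name, then a comma-separated list of every name it represents; group file: sequence name, then group). The function names are made up for illustration.

```python
def read_name_file(path):
    """Collect every sequence name listed in column 2 of a name file."""
    names = set()
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            # Column 2 is assumed to hold all names, including the representative.
            names.update(n for n in fields[1].split(",") if n)
    return names


def read_group_file(path):
    """Collect every sequence name listed in column 1 of a group file."""
    names = set()
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if fields:
                names.add(fields[0])
    return names


def compare_name_and_group(name_path, group_path):
    """Return (names only in the group file, names only in the name file)."""
    in_names = read_name_file(name_path)
    in_groups = read_group_file(group_path)
    return sorted(in_groups - in_names), sorted(in_names - in_groups)
```

Calling `compare_name_and_group` with the name and group files from the failing command returns two lists. A non-empty first list would suggest that the group file was not filtered along with the name file at some earlier step.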

Thanks for your prompt response. Here are a few more details on my analysis which might help to identify the mistake.

The SOP worked fine with my own data (I quality-filtered my reads using LUCY and entered the SOP after generating the group file and name file) until I wanted to assign phylogenetic information to the OTUs and bin the sequences into phylotypes. After entering the make.shared command for the standardization, I received an error message saying that one of my sequences (1B_FTWZP8R02ILDH4) is in the list file but not in the group file, and that it appears more than once in the list file. Interestingly, the file named "final.tx.missing.group" is empty, and when I counted (using grep) how often sequence 1B_FTWZP8R02ILDH4 is present in the final.tx.list and final.groups files, it was found only once in each. (Calculating the tree with the OTUs, on the other hand, seemed to work fine, just in case this helps.)

Below are the detailed commands and the output I received from mothur.

Thanks,

Matthias


********************
28) Assign phylogenetic information to each of the OTUs

mothur > classify.otu(list=final.an.list, name=final.names, taxonomy=final.taxonomy, label=0.03)

Output File Names:
final.an.0.03.cons.taxonomy
final.an.0.03.cons.tax.summary

29) Phylotypes will be assigned to sequences and binned together

mothur > phylotype(taxonomy=final.taxonomy, name=final.names, label=1)

Output files:
final.tx.list
final.tx.sabund
final.tx.rabund

30) Standardization of output file (based on minimal sequences generated)

mothur > make.shared(list=final.tx.list, group=final.groups, label=1)

[ERROR]: 1B_FTWZP8R02ILDH4 is in your listfile and not in your groupfile. Please correct.
Your group file contains 47344 sequences and list file contains 47345 sequences. Please correct.
For a list of names that are in your list file and not in your group file, please refer to final.tx.missing.group.
1B_FTWZP8R02ILDH4 is in your list file more than once. Sequence names must be unique. please correct.

When I take a look at the files - this is what I find:

$ grep -c "1B_FTWZP8R02ILDH4" final.tx.list
1
$ grep -c "1B_FTWZP8R02ILDH4" final.groups
1

NOTE: File final.tx.missing.group is empty
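
A caveat on the counts above: `grep -c` reports the number of matching lines, not the number of matches. A mothur list file normally puts all OTUs for a label on a single line, so the count would be 1 even if the name appeared twice on that line. The Python sketch below counts exact occurrences per label instead. It assumes the usual list-file layout (label, number of OTUs, then one comma-separated field per OTU), and the function name is hypothetical.

```python
def count_in_list_file(path, seq_name):
    """Count exact occurrences of seq_name on each label line of a list file."""
    counts = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            # Assumed layout: label, number of OTUs, then the OTU fields.
            label, otus = fields[0], fields[2:]
            # Compare whole names so that a longer name containing
            # seq_name as a substring is not counted.
            counts[label] = sum(otu.split(",").count(seq_name) for otu in otus)
    return counts
```

A call such as `count_in_list_file("final.tx.list", "1B_FTWZP8R02ILDH4")` returns a dictionary keyed by label. A value above 1 for any label would mean the name really is duplicated in the file.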

Are you using version 1.25? We had a small bug in version 1.24 that was causing a similar issue.

Hi Sara,

I was using mothur v.1.24.1 (last updated 3/16/2012), not 1.25.

As this is my first time using and updating mothur, I was hoping you could give me some advice on how to perform the update and continue the analysis.

Can I simply download the new version, replace the old one, and continue my analysis from here? Or have there been changes in the upstream steps that could cause problems downstream, so that I should restart the analysis from the beginning?

Thanks,

Matthias

EXCELLENT. Using the latest version took care of the problem.

Thank you!!!

Matthias