no equal numbers of sequences between name and group file

elhammehr · April 6, 2012, 2:44pm

Hi!
I am using the latest version of mothur. When I try to run pre.cluster, the following error comes up:
pre.cluster(fasta=03092012Lanoil.txt.shhh.trim.unique.good.filter.unique.fasta, name=03092012Lanoil.txt.shhh.trim.unique.good.filter.names, group=03092012Lanoil.txt.shhh.good.groups, diffs=2)

[ERROR]: Your name file contains 43374 valid sequences, and your groupfile contains 43432, please correct.

Since there is no group option for the unique.seqs command, I was wondering whether the group file contains only the non-redundant sequences, and where the source of the error might be. Is this a bug? I appreciate any help.
Elham

HessMatthias · May 2, 2012, 11:06pm

Hi Elham,

Did you find out what the problem was? I encountered the same problem.

Anybody in the community?

Thanks,

Matthias

westcott · May 3, 2012, 10:32pm

This type of error often occurs when you have forgotten to include the name file on a previous command, or if you mistakenly used the wrong file. If you post the previous commands I may be able to spot the error for you.
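
One way to see which sequence names are behind a count mismatch like this is to compare the two files directly. Below is a minimal Python sketch, not a mothur feature. It assumes the usual layouts: a name file with two tab-separated columns (a representative name, then a comma-separated list of all names it stands for) and a group file with one name and one group per line. The function names and the command-line usage are hypothetical.

```python
import sys


def read_name_file(path):
    """Return every sequence name listed in column 2 of a name file."""
    names = set()
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            names.update(n for n in fields[1].split(",") if n)
    return names


def read_group_file(path):
    """Return every sequence name listed in column 1 of a group file."""
    names = set()
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if fields:
                names.add(fields[0])
    return names


def compare(name_path, group_path):
    """Return (names only in the group file, names only in the name file)."""
    in_names = read_name_file(name_path)
    in_groups = read_group_file(group_path)
    return sorted(in_groups - in_names), sorted(in_names - in_groups)


if __name__ == "__main__" and len(sys.argv) == 3:
    only_group, only_name = compare(sys.argv[1], sys.argv[2])
    print(f"{len(only_group)} names only in group file")
    print(f"{len(only_name)} names only in name file")
    for name in only_group + only_name:
        print(name)
```

Run with the name file first and the group file second. Names that show up only in the group file usually point to a step where sequences were removed from the fasta and name files but the group file was not updated to match.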

HessMatthias · May 3, 2012, 11:01pm

Thanks for your prompt response. Here are a few more details on my analysis which might help to identify the mistake.

The SOP worked fine with my own data (I quality filtered my reads using LUCY and entered the SOP after generating groupfile and namefile) - until I wanted to assign phylogenetic information to the OTUs and the sequences that were binned together. After entering the make.shared command for the standardization, I received an error message saying that one of my sequences (sequence 1B_FTWZP8R02ILDH4) was missing from the groupfile and appeared more than once in the list file. Interestingly, the file named "final.tx.missing.group" is empty, and when I counted (using the grep command) how often sequence 1B_FTWZP8R02ILDH4 is present in the final.tx.list and final.groups files, it was found only once in each. (Calculating the tree with the OTUs on the other hand seemed to work fine - just in case this helps.)

Below are the detailed commands and the output I received from mothur.

Thanks,

Matthias

********************
28) Assign phylogenetic information to each of the OTUs

mothur > classify.otu(list=final.an.list, name=final.names, taxonomy=final.taxonomy, label=0.03)

Output File Names:
final.an.0.03.cons.taxonomy
final.an.0.03.cons.tax.summary

Phylotypes will be assigned to sequences and binned together

mothur > phylotype(taxonomy=final.taxonomy, name=final.names, label=1)

Output files:
final.tx.list
final.tx.sabund
final.tx.rabund

Standardization of output file (based on minimal sequences generated)

mothur > make.shared(list=final.tx.list, group=final.groups, label=1)

[ERROR]: 1B_FTWZP8R02ILDH4 is in your listfile and not in your groupfile. Please correct.
Your group file contains 47344 sequences and list file contains 47345 sequences. Please correct.
For a list of names that are in your list file and not in your group file, please refer to final.tx.missing.group.
1B_FTWZP8R02ILDH4 is in your list file more than once. Sequence names must be unique. please correct.

When I take a look at the files - this is what I find:

$ grep -c "1B_FTWZP8R02ILDH4" final.tx.list
1
$ grep -c "1B_FTWZP8R02ILDH4" final.groups
1

NOTE: File final.tx.missing.group is empty
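
A note on the check above: `grep -c` counts matching lines, not individual matches, and a list file keeps every OTU for a given label on a single line. A count of 1 therefore cannot show whether a name occurs more than once within that line. Below is a minimal Python sketch for counting exact occurrences. It assumes the list file layout of label, number of OTUs, then tab-separated OTUs whose member names are comma-separated. `count_in_list_file` is a hypothetical helper, not a mothur command.

```python
from collections import Counter


def count_in_list_file(path, seq_name):
    """Count exact occurrences of seq_name under each label of a list file."""
    counts = Counter()
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) < 3 or fields[0] == "label":
                continue  # skip blank lines and an optional header row
            label = fields[0]
            # fields[1] is the number of OTUs; the OTUs start at fields[2]
            for otu in fields[2:]:
                counts[label] += otu.split(",").count(seq_name)
    return dict(counts)
```

For example, `count_in_list_file("final.tx.list", "1B_FTWZP8R02ILDH4")` returns a dictionary mapping each label to the number of times the name appears under it. Because names are compared as whole tokens, a name that is merely a prefix of a longer name is not counted.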

westcott · May 4, 2012, 1:26pm

Are you using version 1.25? We had a small bug in version 1.24 that was causing a similar issue.

HessMatthias · May 4, 2012, 6:33pm

Hi Sara,

Yes - I was using mothur v.1.24.1 (last updated: 3/16/2012).

As this is my first time using and updating mothur, I was hoping you could give me some advice on how to perform the update and the subsequent analysis.

Can I simply download the new version, replace the old version, and continue my analysis from here? Or have there been changes in the upstream procedure that might cause problems downstream, so that I should restart the analysis from the beginning?

Thanks,

Matthias

HessMatthias · May 5, 2012, 12:56am

EXCELLENT. Using the latest version took care of the problem.

Thank you!!!

Matthias
