groupfile has more valid sequences in it than my namefile

enigmaticgnome · September 22, 2012, 2:07am

Hi,
I’ve recently got some pyrosequencing data which I want to analyse, and my supervisor has pointed me towards using Mothur. I have been through the Costello stool analysis tutorial and am now trying to do the same thing with my own data. When I get to the preclustering step I get this error:
“Your name file contains 1585 valid sequences, and your groupfile contains 5268, please correct”

The command I just ran was pre.cluster(fasta=lscma.trim.unique.good.filter.unique.fasta, name=lscma.trim.unique.good.filter.unique.names, group=lscma.good.groups, diffs=1)
If I run it without the group parameter, it runs fine, but my fasta has several samples in it and I figure I should be preclustering sample by sample?

Am I doing something wrong or is there a command to remove the excess sequences from my group file?

Cheers

westcott · September 24, 2012, 11:12am

I suspect you either have a file mismatch or mothur is not reading the names file correctly. If you send your files to mothur.bugs@gmail.com I can take a look.
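
One way to check for a file mismatch before sending anything is to compare the two files directly. The following is a minimal Python sketch, assuming the usual layouts: a names file with the representative name in the first column and a comma-separated list of all names it stands for in the second, and a group file with the sequence name in the first column. The function names and the toy data are made up for illustration.

```python
def names_in_name_file(lines):
    """Return every sequence name listed in the second column of a names file."""
    names = set()
    for line in lines:
        columns = line.split()
        if len(columns) >= 2:
            names.update(columns[1].split(","))
    return names


def names_in_group_file(lines):
    """Return the sequence names from the first column of a group file."""
    return {line.split()[0] for line in lines if line.strip()}


def compare(name_lines, group_lines):
    """Report the counts and which names appear in only one of the two files."""
    in_names = names_in_name_file(name_lines)
    in_groups = names_in_group_file(group_lines)
    return {
        "name_count": len(in_names),
        "group_count": len(in_groups),
        "only_in_groups": sorted(in_groups - in_names),
        "only_in_names": sorted(in_names - in_groups),
    }


# Made-up toy data, not taken from the thread.
toy_names = ["seqA\tseqA,seqB\n", "seqC\tseqC\n"]
toy_groups = [
    "seqA\tsample1\n",
    "seqB\tsample1\n",
    "seqC\tsample2\n",
    "seqD\tsample2\n",
]
report = compare(toy_names, toy_groups)
print(report["name_count"], report["group_count"], report["only_in_groups"])
```

With real data, the two lists of lines would come from opening the names and group files, since an open file can be iterated line by line. A long "only_in_groups" list would suggest the group file was never filtered alongside the fasta and names files.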

roro002 · October 9, 2012, 9:12pm

Hello,

I have been having a similar problem to this, as has the person posting on http://w.mothur.org/forum/viewtopic.php?f=4&t=1677&p=4462.

I similarly hit a snag at pre.cluster(fasta=current,name=current,group=current) where there are fewer names in my .align file than in my .groups file.

The issue for me was that early on in the workflow I performed a trim.seqs(fasta=current,qfile=current,minlength=100,maxlength=250,maxambig=0,maxhomop=8,qaverage=30) command on my data.

This didn’t generate an .accnos file that could then be used to remove the trimmed sequences from the .groups file. I created an accnos file outside of MOTHUR and used it with remove.seqs, and then pre.cluster worked fine.

It would be good if the “.groups” file could be put under the scope of the trim.seqs command, e.g.
trim.seqs(fasta=current,qfile=current,group=current,minlength=100,maxlength=250,maxambig=0,maxhomop=8,qaverage=30), so that the trimmed sequences are also removed from the .groups file.

Thank you.

Angrist · October 11, 2012, 9:53am

How did you build your own accnos file outside of mothur for that purpose?

I am having the exact same problem with my sequencing data and was trying to build one, but can’t reach the right number in the groups file…

roro002 · October 11, 2012, 10:50am

I did it in a very simple way.

On the command line, MOTHUR lists all the sequence labels that are in the groups file but aren’t in the FASTA or names files. I just copied and pasted this list into a text editor.

To make it look like an accnos file I got rid of the text that wasn’t a label by using a “find all” and “replace all” function (replacing the text with nothing).

Lastly I saved it as groups.accnos and it seemed to do the trick.
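
The same list can also be produced without copying from the terminal. This is a minimal Python sketch, not the method described above: it assumes a names file whose second column is a comma-separated list of all sequence names, a group file with the sequence name in the first column, and an accnos file that is simply one name per line. The function names and the toy data are hypothetical.

```python
def read_name_file_ids(lines):
    """Collect every sequence name from the second column of a names file."""
    ids = set()
    for line in lines:
        columns = line.split()
        if len(columns) >= 2:
            ids.update(columns[1].split(","))
    return ids


def extra_group_ids(name_lines, group_lines):
    """Names present in the group file but absent from the names file, in group-file order."""
    kept = read_name_file_ids(name_lines)
    extra = []
    for line in group_lines:
        columns = line.split()
        if columns and columns[0] not in kept:
            extra.append(columns[0])
    return extra


def to_accnos(ids):
    """Format names as accnos text: one name per line."""
    return "".join(name + "\n" for name in ids)


# Made-up toy data, not taken from the thread.
toy_names = ["seqA\tseqA,seqB\n"]
toy_groups = [
    "seqA\tsample1\n",
    "seqD\tsample1\n",
    "seqB\tsample2\n",
    "seqE\tsample2\n",
]
accnos_text = to_accnos(extra_group_ids(toy_names, toy_groups))
print(accnos_text, end="")
```

Writing accnos_text to a file such as groups.accnos gives the list of labels to hand to remove.seqs together with the group file.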

westcott · October 15, 2012, 3:08pm

You could also run:

list.seqs(name=yourTrimmedNameFile)
get.seqs(group=yourGroupFile, accnos=current)
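
For anyone who wants to see what that pair of commands amounts to, here is a minimal Python sketch of the same filtering idea. It is not mothur's implementation. It assumes list.seqs collects every name in the second column of the names file and that get.seqs keeps only the group-file lines whose first column is in that list. The function name and the toy data are made up.

```python
def filter_group_lines(name_lines, group_lines):
    """Keep only the group-file lines whose sequence name appears in the names file."""
    keep = set()
    for line in name_lines:
        columns = line.split()
        if len(columns) >= 2:
            # The second column lists every sequence the representative stands for.
            keep.update(columns[1].split(","))

    kept_lines = []
    for line in group_lines:
        columns = line.split()
        if columns and columns[0] in keep:
            kept_lines.append(line)
    return kept_lines


# Made-up toy data, not taken from the thread.
toy_names = ["seqA\tseqA,seqB\n"]
toy_groups = ["seqA\tsample1\n", "seqB\tsample1\n", "seqD\tsample2\n"]
filtered = filter_group_lines(toy_names, toy_groups)
print("".join(filtered), end="")
```

The difference from building an accnos file of unwanted labels is that this keeps what is in the names file rather than removing what is not, so no list of trimmed sequences is needed.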

roro002 · October 24, 2012, 7:31pm

Hi broken850. After running and cleaning up an alignment I’ve found it useful to run the “system(cp )” command to put a fasta suffix on a copy of my alignment file. It is then very quick and easy to open the file in graphical software to visualise the quality of the alignment before continuing with the MOTHUR workflow.

roro002 · October 24, 2012, 7:33pm

Thank you Westcott for your suggestion of using

list.seqs(name=yourTrimmedNameFile)
get.seqs(group=yourGroupFile, accnos=current)

It works very well.

Cheers.
