groupfile has more valid sequences in it than my namefile

Hi,
I’ve recently received some pyrosequencing data that I want to analyse, and my supervisor has pointed me towards Mothur. I have worked through the Costello stool analysis tutorial and am now trying to do the same with my own data. When I get to the pre-clustering step, I get this error:
“Your name file contains 1585 valid sequences, and your groupfile contains 5268, please correct”

The command I just ran was pre.cluster(fasta=lscma.trim.unique.good.filter.unique.fasta, name=lscma.trim.unique.good.filter.unique.names, group=lscma.good.groups, diffs=1)
If I run it without the group parameter, it runs fine, but my FASTA file contains several samples, and I figure I should be pre-clustering sample by sample?

Am I doing something wrong, or is there a command to remove the excess sequences from my group file?

Cheers

I suspect you either have a file mismatch or mothur is not reading the names file correctly. If you send your files to mothur.bugs@gmail.com, I can take a look.
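One way to check for a file mismatch yourself is to compare the two files directly. The Python sketch below is not part of mothur; it assumes the names file has two whitespace-separated columns (a representative name, then a comma-separated list of every name it represents, including itself) and the group file has one `name group` pair per line. The function names are illustrative only.

```python
def names_in_names_file(lines):
    """Every sequence name listed in a names file.

    Assumes each line is: representative<TAB>name1,name2,...
    where the second column already includes the representative.
    """
    names = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            # Skip empty entries caused by stray trailing commas.
            names.update(n for n in fields[1].split(",") if n)
    return names


def names_in_group_file(lines):
    """Sequence names from the first column of a group file (name<TAB>group)."""
    return {line.split()[0] for line in lines if line.split()}


def report_mismatch(names_path, group_path):
    """Print both counts and return the names found only in the group file."""
    with open(names_path) as fh:
        in_names = names_in_names_file(fh)
    with open(group_path) as fh:
        in_groups = names_in_group_file(fh)
    only_in_groups = in_groups - in_names
    print(f"names file: {len(in_names)} names, group file: {len(in_groups)} names")
    print(f"{len(only_in_groups)} names are in the group file but not the names file")
    return only_in_groups


# Example, using the file names from the original post:
# extra = report_mismatch("lscma.trim.unique.good.filter.unique.names", "lscma.good.groups")
```

If the returned set is non-empty, the group file still lists sequences that were dropped in an earlier step.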

Hello,

I have been having a similar problem (as has the person who posted at http://w.mothur.org/forum/viewtopic.php?f=4&t=1677&p=4462).

I hit a similar snag at pre.cluster(fasta=current,name=current,group=current), where there were fewer names in my .align file than in my .groups file.

The issue for me was that early on in the workflow I performed a trim.seqs(fasta=current,qfile=current,minlength=100,maxlength=250,maxambig=0,maxhomop=8,qaverage=30) command on my data.

This didn’t generate an .accnos file that could then be used to remove the trimmed sequences from the .groups file, so I created an accnos file outside of MOTHUR, passed it to remove.seqs, and after that pre.cluster worked fine.
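An accnos file is just a list of sequence names, one per line, so it can also be built with a short script. The Python sketch below is one possible way to do it, not necessarily what was done here: it treats the FASTA file written by trim.seqs as the set of sequences to keep and lists every group-file name that has no matching FASTA header. Compare against the trim.seqs output before unique.seqs; after unique.seqs the FASTA holds only representative sequences, so the names file becomes the right reference. All function and file names are hypothetical.

```python
def fasta_names(lines):
    """Sequence names from FASTA headers: text after '>' up to the first whitespace."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def names_missing_from_fasta(group_lines, fasta_lines):
    """Group-file names (first column) with no matching FASTA header, in file order."""
    kept = fasta_names(fasta_lines)
    missing = []
    for line in group_lines:
        fields = line.split()
        if fields and fields[0] not in kept:
            missing.append(fields[0])
    return missing


def write_accnos(names, path):
    """Write one sequence name per line, the plain layout of an accnos file."""
    with open(path, "w") as fh:
        for name in names:
            fh.write(f"{name}\n")


def build_accnos(group_path, fasta_path, accnos_path):
    """Write an accnos file of group-only names and return how many were written."""
    with open(group_path) as g, open(fasta_path) as f:
        missing = names_missing_from_fasta(g, f)
    write_accnos(missing, accnos_path)
    return len(missing)
```

The resulting file can then be passed to mothur, e.g. remove.seqs(accnos=groups.accnos, group=yourGroupFile), where both file names are placeholders.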

It would be good if the .groups file could be brought within the scope of the trim.seqs command, e.g.
trim.seqs(fasta=current,qfile=current,group=current,minlength=100,maxlength=250,maxambig=0,maxhomop=8,qaverage=30), so that trimmed sequences are also removed from the .groups file.

Thank you.

How did you build your own accnos file outside of mothur for that purpose?

I am having exactly the same problem with my sequencing data and was trying to build one, but I can’t get it to match the number of sequences in the groups file.

I did it in a very simple way.

On the command line, MOTHUR lists all the sequence labels that are in the groups file but not in the FASTA or names files. I just copied and pasted this list into a text editor.

To make it look like an accnos file (one sequence name per line), I removed all the text that wasn’t a label using the editor’s “find all” and “replace all” functions (replacing that text with nothing).

Lastly, I saved it as groups.accnos, and it seemed to do the trick.

You could also run:

list.seqs(name=yourTrimmedNameFile)
get.seqs(group=yourGroupFile, accnos=current)
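As I understand it, these two commands work together like this: list.seqs collects every sequence name in the trimmed names file into an accnos file, and get.seqs then keeps only the group-file lines whose names appear in that list. The Python sketch below imitates that logic outside mothur purely to illustrate the idea; it is not mothur's implementation, it assumes the same two-column names and group file layouts as the files in this thread, and the function names are hypothetical.

```python
def list_names(names_lines):
    """Roughly what list.seqs does with a names file (assumed): every name in column two."""
    listed = []
    for line in names_lines:
        fields = line.split()
        if len(fields) >= 2:
            listed.extend(n for n in fields[1].split(",") if n)
    return listed


def get_group_lines(group_lines, accnos):
    """Roughly what get.seqs does with a group file (assumed): keep listed names only."""
    wanted = set(accnos)
    kept = []
    for line in group_lines:
        fields = line.split()
        if fields and fields[0] in wanted:
            kept.append(line)
    return kept


def filter_group_file(names_path, group_path, out_path):
    """Write the filtered group file and return the number of lines kept."""
    with open(names_path) as fh:
        accnos = list_names(fh)
    with open(group_path) as fh:
        kept = get_group_lines(fh, accnos)
    with open(out_path, "w") as fh:
        fh.writelines(kept)
    return len(kept)
```

Because the filter is driven by the names file, the group file ends up listing exactly the sequences that survived trimming, which is what pre.cluster checks when given both files.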

Hi broken850. After running and cleaning up an alignment, I’ve found it useful to run the “system(cp )” command to make a copy of my alignment file with a .fasta suffix. It is then quick and easy to open the file in graphical software and check the quality of the alignment before continuing with the MOTHUR workflow.

Thank you, Westcott, for your suggestion of using

list.seqs(name=yourTrimmedNameFile)
get.seqs(group=yourGroupFile, accnos=current)

It works very well.

Cheers.