problem with "screen.seqs" tool

Greetings,
Using the latest MOTHUR (v.1.9.0) I keep running into a problem with the screen.seqs command.
Whenever I try to cull sequences that do not meet certain criteria, I get the message “Your groupfile does not include the sequence … please correct”.
Indeed, when I check the resulting “good.group” file, those sequences have not been culled although many other sequences that did not meet those criteria were culled and ended up in the “bad.group” as they should.
Similar things happen when I use the screen.seqs tool on aligned fastas… I get the message “Your namefile does not include the sequence…”

Could this be a bug or is it my input files?

I was wondering if anybody has suggestions.

Best,
Guus Roeselers

Example:
mothur > screen.seqs(fasta=BATCH1.fasta, group=BATCH1.group, minlength=200, maxlength=280, maxhomop=10)
Your groupfile does not include the sequence GCVP7QP02B34BT please correct.
Your groupfile does not include the sequence GCVP7QP02B9IAU please correct.
Your groupfile does not include the sequence GCVP7QP02BQ1MQ please correct.
Your groupfile does not include the sequence GCVP7QP02BZIZ3 please correct.
Your groupfile does not include the sequence GCVP7QP06HE78K please correct.

If you could email your fasta and group files to mothur.bugs@gmail.com we can take a look. We haven’t seen any problems in the past.

Pat

In lines 14646 through 178567 of your groups file, all of the sequence names begin with a “>” character. If you remove those characters, screen.seqs runs without a problem. It looks like these data were generated using 454. Although it isn’t a problem to make your own groups file for 454 data, sometimes it might just be easier to use the trim.seqs command and have us do it for you :).
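With that many affected lines, the cleanup is easier to script than to do by hand. Below is a minimal Python sketch, not part of mothur, that assumes the usual two-column group file (sequence name, then group) and only removes a “>” at the very start of a line. The function name strip_fasta_markers and the group label in the demo are made up for illustration.

```python
def strip_fasta_markers(lines):
    """Drop a leading '>' from each group-file line; return (cleaned_lines, number_fixed)."""
    cleaned, fixed = [], 0
    for line in lines:
        if line.startswith(">"):
            line = line[1:]  # remove only the first character, keep the rest of the line
            fixed += 1
        cleaned.append(line)
    return cleaned, fixed


if __name__ == "__main__":
    # In-memory demo; "groupA" is a hypothetical group label.
    sample = [">GCVP7QP02B34BT\tgroupA\n", "GCVP7QP02B9IAU\tgroupA\n"]
    cleaned, fixed = strip_fasta_markers(sample)
    print(f"removed '>' from {fixed} of {len(cleaned)} lines")
```

To apply it to a real file, read the group file with readlines(), pass the list to the function, and write the cleaned lines to a new file, so that the original stays untouched.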

Hope this helps,
Pat

I am having the same problem with the screen.seqs command. It is not recognizing sequences in my groupfile and it says “please correct”:

Your groupfile does not include the sequence IH9LKJM12HLUP6 please correct.

I couldn’t find any incorrect characters in my group file, so I’m not sure what is wrong. Do you have any recommendations? I am using the newest version of mothur.

Thanks!

If you have opened your group file and the sequence does appear to be in the file, it could be an extra-spacing issue or perhaps a duplicate-line issue. To troubleshoot it, you could try the following:

mothur > set.dir(debug=t)
mothur > get.groups(group=yourGroupFile, groups=OneOfYourGroups) - this will force a read of your group file with extra output from mothur
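
Extra spacing and duplicate lines are hard to spot by eye, so it can help to check for them outside mothur as well. The following is a rough Python sketch under stated assumptions: the group file is expected to hold one "name<TAB>group" pair per line, and any line that deviates from exactly that layout (stray spaces, a carriage return, extra columns) is reported as suspicious rather than as definitely wrong. All function names here are hypothetical helpers, not mothur commands.

```python
from collections import Counter


def read_fasta_names(lines):
    """Return the sequence names from FASTA header lines (text after '>' up to the first whitespace)."""
    names = []
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.append(parts[0])
    return names


def audit_group_lines(lines):
    """Return (names, duplicated_names, suspicious_lines) for the lines of a group file."""
    names, suspicious = [], []
    for number, raw in enumerate(lines, start=1):
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        # Anything other than exactly "name<TAB>group" is flagged for a closer look.
        if len(fields) != 2 or raw.rstrip("\n") != "\t".join(fields):
            suspicious.append((number, repr(raw)))
        names.append(fields[0])
    duplicates = sorted(name for name, count in Counter(names).items() if count > 1)
    return names, duplicates, suspicious


def missing_from_group(fasta_names, group_names):
    """Return the FASTA names that never appear in the group file, in FASTA order."""
    known = set(group_names)
    return [name for name in fasta_names if name not in known]
```

Feeding it the readlines() output of the fasta and group files gives three things to look at: names that occur more than once, lines whose raw form (shown with repr, so hidden characters become visible) is not a clean two-column line, and fasta names with no group entry at all.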

Thanks for your reply. It seems to give me a huge list of these sequences that are not in my groups file. I tried to start from the very beginning again and now I’m getting stuck at the summary.seqs command after trim.seqs. It tells me:

mothur > summary.seqs(fasta=1480_AllReads_LA3.trim.fasta, name=1480_AllReads_LA3.trim.names)

Using 2 processors.
[ERROR]: ‘IH9LKJM14IROHJ’ is not in your name or count file, please correct.

Using 2 processors.
[ERROR]: ‘IH9LKJM12HHOM9’ is not in your name or count file, please correct.

When I tried to debug it still told me:

[DEBUG]: count = 0
[DEBUG]: IH9LKJM14II8T9 114
[DEBUG]: IH9LKJM14II8T9 114
[DEBUG]: count = 1
[DEBUG]: IH9LKJM14IPEB6 65
[DEBUG]: IH9LKJM14IPEB6 65
[DEBUG]: count = 2
[DEBUG]: IH9LKJM14IRRKF 199
[DEBUG]: IH9LKJM14IRRKF 199
[DEBUG]: count = 3
[DEBUG]: IH9LKJM14IROHJ 123
[ERROR]: ‘IH9LKJM14IROHJ’ is not in your name or count file, please correct.
[DEBUG]: IH9LKJM14IROHJ 123

I looked for IH9LKJM14IROHJ in my name file, and it’s there. No abnormal spacing or characters that I can see.
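
One thing a plain text search cannot show is which column the name sits in. A name file has two columns: a representative sequence name, then a comma-separated list of every name it stands for. A fasta entry is normally expected to match a name in the first column, so a name that occurs only in the second column would still turn up in a search. Whether that is what is happening here is only a guess. This small Python sketch (locate_name is a hypothetical helper, not a mothur function) reports where a given name occurs:

```python
def locate_name(name_file_lines, query):
    """Report whether `query` is a first-column name, only a second-column name, or absent."""
    found_in_second = False
    for raw in name_file_lines:
        fields = raw.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if fields[0] == query:
            return "first column"
        if query in fields[1].split(","):
            found_in_second = True
    return "second column only" if found_in_second else "absent"
```

Calling it with the readlines() output of the name file and the name from the error message separates the three cases. A result of "absent" for a name that is visible in a text editor would point back to hidden characters.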

Any ideas from here?

Could you send your fasta, name and logfile to mothur.bugs@gmail.com?