Using groups files in read.otu

I have one very big groups file with all the groups for all my datasets (in one experiment). I could create a separate groups file for each dataset, but since the big file is the union of all the small files, I thought it wouldn’t make a difference.

In my case, “milk.groups” has group definitions for all the files from my 454 run, and “12C.groups” has just the groups that pertain to the “12C” samples. Here is my run:

mothur > read.otu(list=1.12C.unique.filter.fn.list, group=12C.groups, label=unique-0.03)

unique
0.03

mothur > read.otu(list=1.12C.unique.filter.fn.list, group=milk.groups, label=unique-0.03)

Your group file contains 379933 sequences and list file contains 4235 sequences. Please correct.
For a list of names that are in your group file and not in your list file, please refer to 1.12C.unique.filter.fn.missing.name.

Notice that there are no errors with the smaller file (12C.groups), but there is an error message with the larger one (milk.groups).

Am I using group files correctly? It seems I should just ignore this message, or treat it as a warning rather than an error. Right?

This still needs to be treated as an error, because some of the calculators require an accurate count of the groups and of the number of sequences in each group. But, thanks to your suggestion, we will be adding the groups parameter to the read.otu command so that you can give mothur the large group file, and it will create a .rabund and .groups file for the groups you selected and run without error. Look for this change in 1.7!
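To see which names cause the mismatch without relying on the .missing.name output, the two files can be compared outside mothur. The following is a minimal Python sketch, not mothur's own code. It assumes the group file holds one whitespace-separated name and group per line, and that each list-file line is a label, an OTU count, and then one tab-separated field per OTU containing comma-separated sequence names. The function names are hypothetical.

```python
def names_in_list_line(line):
    """Return the set of sequence names found on one list-file line.

    Assumed layout: label, number of OTUs, then one tab-separated
    field per OTU holding comma-separated sequence names.
    """
    fields = line.rstrip("\n").split("\t")
    names = set()
    for otu in fields[2:]:  # skip the label and the OTU count
        names.update(name for name in otu.split(",") if name)
    return names


def missing_from_list(group_lines, list_names):
    """Return names that are in the group file but not in the list file."""
    missing = []
    for line in group_lines:
        parts = line.split()
        # Skip blank or malformed lines; column 0 is the sequence name.
        if len(parts) >= 2 and parts[0] not in list_names:
            missing.append(parts[0])
    return missing
```

With the full group file and a list file built from one dataset, every name belonging to the other datasets would be reported as missing, which matches the size of the discrepancy in the error message.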

Very nice! Thanks.

OK, so the obvious next question is: how do I get the .groups file, especially AFTER I’ve done filtering and screening?

I can generate a groups file from my original set of sequences, since they are in filenames associated with groups (these are barcoded data, so each file is a separate barcode which corresponds to a different treatment, and therefore belongs to a different “mothur group”). But once I’ve run unique.seqs and screened out low quality sequences, the remaining sequences are a strict subset of what I used to build my original (big) .groups file.

I could do this with a perl script, choosing the subset of entries from the original .groups file whose sequence IDs match sequence IDs in the sequence sets that have passed quality control. But it would make more sense to have some mechanism for generating the refined groups file within mothur, or a parameter to force the calculators to ignore sequences in a group which do not appear in the .list file.
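For illustration, the script approach described above could look roughly like this in Python rather than perl. This is only a sketch under assumptions: the group file is taken to have one whitespace-separated name and group per line, and the names file is taken to have a representative name, a tab, and a comma-separated list of names per line. Both function names are hypothetical and nothing here is part of mothur.

```python
def ids_from_names_lines(names_lines):
    """Collect every sequence name mentioned in names-file lines.

    Assumed layout: representative name, tab, comma-separated names.
    Both columns are collected so no surviving name is dropped.
    """
    kept = set()
    for line in names_lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        kept.add(parts[0])
        if len(parts) > 1:
            kept.update(name for name in parts[1].split(",") if name)
    return kept


def filter_groups(group_lines, kept_ids):
    """Yield group-file lines whose sequence name is in kept_ids."""
    for line in group_lines:
        parts = line.split()
        if len(parts) >= 2 and parts[0] in kept_ids:
            yield f"{parts[0]}\t{parts[1]}\n"
```

In use, the lines would come from open file handles for the names file that survived quality control and for the original big groups file, and the yielded lines would be written to a new, smaller groups file.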

(I’m running 1.7.2)

In the screen.seqs command there is a group option where you give mothur your original groups file. In the output you will get a good.groups file.
Hope that helps.

The same is true for the names file - see my post to your other question…

I think the .seqs command works well here. Filtering and screening with it can remove this error message.