using groups files in read.otu

jamesafoster · October 26, 2009, 6:55pm

I have one very big groups file with all the groups for all my datasets (in one experiment). I could create a separate groups file for each dataset, but since the big file is the union of all the small files, I thought it wouldn’t make a difference.

In my case, “milk.groups” has group definitions for all the files from my 454 run, and “12C.groups” contains just the groups that pertain to the “12C” samples. Here is my run:

mothur > read.otu(list=1.12C.unique.filter.fn.list, group=12C.groups, label=unique-0.03)

unique
0.03

mothur > read.otu(list=1.12C.unique.filter.fn.list, group=milk.groups, label=unique-0.03)

Your group file contains 379933 sequences and list file contains 4235 sequences. Please correct.
For a list of names that are in your group file and not in your list file, please refer to 1.12C.unique.filter.fn.missing.name.

Notice: no error with the smaller file (12C.groups), but an error message with the larger one.

Am I using group files correctly? It seems I should just ignore this message, or treat it as a warning rather than an error. Right?
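For anyone hitting the same message, the mismatch can be inspected outside mothur by comparing the sequence names in the two files. The Python sketch below is not mothur's code; it assumes a groups file with one name<TAB>group pair per line and a list file whose lines read label<TAB>numOTUs<TAB>otu1<TAB>otu2..., with each OTU a comma-separated list of sequence names. The function names read_group_names, read_list_names and compare are hypothetical.

```python
# Hypothetical helper, not part of mothur: compare the names in a groups file
# with the names in one label of a list file.
# Assumed formats:
#   groups file: "sequence_name<TAB>group_name" per line
#   list file:   "label<TAB>numOTUs<TAB>otu1<TAB>otu2..." per line,
#                where each OTU is a comma-separated list of sequence names


def read_group_names(lines):
    """Return {sequence_name: group_name} from groups-file lines."""
    groups = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            groups[fields[0]] = fields[1]
    return groups


def read_list_names(lines, label):
    """Return the set of sequence names in all OTUs for one distance label."""
    for line in lines:
        fields = line.split()
        if fields and fields[0] == label:
            names = set()
            for otu in fields[2:]:  # skip the label and the OTU count
                names.update(otu.split(","))
            return names
    raise KeyError(f"label {label!r} not found in list file")


def compare(group_lines, list_lines, label):
    """Return (#names in groups, #names in list, groups-only names, list-only names)."""
    group_names = set(read_group_names(group_lines))
    list_names = read_list_names(list_lines, label)
    return (
        len(group_names),
        len(list_names),
        sorted(group_names - list_names),
        sorted(list_names - group_names),
    )


if __name__ == "__main__":
    groups_file = ["s1\t12C", "s2\t12C", "s3\t13C"]
    list_file = ["unique\t2\ts1\ts2", "0.03\t1\ts1,s2"]
    print(compare(groups_file, list_file, "0.03"))  # (3, 2, ['s3'], [])
```

The third element plays the role of the names the error message points to: sequences present in the groups file but absent from the chosen label of the list file.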

westcott · October 29, 2009, 1:07pm

This still needs to be treated as an error, because some of the calculators require an accurate count of groups and of the number of sequences in each group. But, thanks to your suggestion, we will be adding a groups parameter to read.otu so that you can give mothur the large group file, and it will create a .rabund and a .groups file for the groups you selected and run without error. Look for this change in 1.7!
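As a rough illustration of the idea only (this is not how mothur implements the groups parameter), the Python sketch below keeps the selected groups from a large groups file and counts the sequences in each group, which is the per-group information the calculators depend on. It assumes name<TAB>group lines; subset_groups and sequences_per_group are hypothetical names.

```python
from collections import Counter

# Hypothetical sketch: keep only the selected groups from a large groups
# file, then count sequences per group.
# Assumes groups-file lines of the form "sequence_name<TAB>group_name".


def subset_groups(group_lines, selected):
    """Return groups-file lines whose group name is in `selected`."""
    kept = []
    for line in group_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[1] in selected:
            kept.append(f"{fields[0]}\t{fields[1]}")
    return kept


def sequences_per_group(group_lines):
    """Count how many sequences each group contains."""
    counts = Counter()
    for line in group_lines:
        fields = line.split()
        if len(fields) >= 2:
            counts[fields[1]] += 1
    return counts


if __name__ == "__main__":
    milk = ["a\t12C", "b\t12C", "c\t13C", "d\t14C"]
    small = subset_groups(milk, {"12C"})
    print(small)                       # ['a\t12C', 'b\t12C']
    print(sequences_per_group(small))  # Counter({'12C': 2})
```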

jamesafoster · November 3, 2009, 10:55pm

Very nice! Thanks.

jamesafoster · December 31, 2009, 6:26pm

OK, so the obvious next question is: how do I get the .groups file, especially AFTER I’ve done filtering and screening?

I can generate a groups file from my original set of sequences, since they are in files whose names are associated with groups (these are barcoded data, so each file is a separate barcode which corresponds to a different treatment, and therefore belongs to a different “mothur group”). But once I’ve run unique.seqs and screened out low-quality sequences, the remaining sequences are a strict subset of what I used to build my original (big) .groups file.

I could do this with a perl script, choosing the subset of lines from the original .groups file whose sequence IDs match sequence IDs in the sequence sets that have passed quality control. But it would make more sense to have some mechanism for generating the refined groups file within mothur, or a parameter to force the calculators to ignore sequences in a group which do not appear in the .list file.

(I’m running 1.7.2)
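The script approach described above can be sketched in Python instead of perl. This assumes FASTA headers of the form >name optional-description, a groups file of name<TAB>group lines and, if unique.seqs was run, a names file of representative<TAB>rep,dup1,dup2 lines, so that duplicates collapsed into a surviving representative are kept too. All function names are hypothetical, and the formats are assumptions rather than a specification.

```python
# Hypothetical sketch: keep only the groups-file lines for sequences that
# survived quality control.
# Assumed formats:
#   FASTA header:  ">sequence_name optional description"
#   names file:    "representative<TAB>rep,dup1,dup2" (from unique.seqs)
#   groups file:   "sequence_name<TAB>group_name"


def fasta_names(fasta_lines):
    """Return the set of sequence names from FASTA header lines."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].split()
            if header:
                names.add(header[0])  # the name is the first token
    return names


def expand_with_names_file(reps, names_lines):
    """Add every sequence collapsed into a surviving representative."""
    expanded = set(reps)
    for line in names_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] in reps:
            expanded.update(fields[1].split(","))
    return expanded


def filter_groups(group_lines, keep):
    """Return the groups-file lines whose sequence name is in `keep`."""
    kept = []
    for line in group_lines:
        fields = line.split()
        if fields and fields[0] in keep:
            kept.append(line)
    return kept


if __name__ == "__main__":
    fasta = [">s1 barcode=AAA", "ACGT", ">s3", "ACGA"]
    names = ["s1\ts1,s2", "s3\ts3"]
    groups = ["s1\t12C", "s2\t12C", "s3\t13C", "s4\t13C"]
    keep = expand_with_names_file(fasta_names(fasta), names)
    print(filter_groups(groups, keep))  # ['s1\t12C', 's2\t12C', 's3\t13C']
```

With real files, each function can be given an open file object instead of a list, since iterating over a file yields its lines; the kept groups lines then still end in newlines and can be written back out with writelines.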

hdrilling1 · January 1, 2010, 9:52pm

In the screen.seqs command there is a group option where you give mothur your original groups file. In the output you will get a good.groups file.
Hope that helps.

pschloss · January 2, 2010, 5:26pm

The same is true for the names file - see my post to your other question…

briellejast · January 13, 2010, 10:26am

I think the .seqs command works well here, but filtering and screening can also get rid of this error message.

Related topics
- read.otu error message (Commands in mothur): 2 replies, 36135 views, last activity December 11, 2009
- where is the missing group(s)? (Commands in mothur): 2 replies, 50406 views, last activity January 15, 2010
- groups file out of sync with Costello pipeline (Commands in mothur): 11 replies, 10827 views, last activity August 30, 2012
- read.otu() (Commands in mothur): 2 replies, 3667 views, last activity June 25, 2010
- read.otu (mothur bugs): 1 reply, 2982 views, last activity April 22, 2010