Pain in my Groups file

wtaheri · January 25, 2013, 6:10pm

Hi Pat and gang,
I have a few questions.

I am running multiple treatments for diversity analysis of the 18S rSSU DNA with a custom taxonomy database.
I was following the esophageal community example and…

dist.seqs() doesn’t like my data because all the sequences are not the same length. They cover the same region but vary in length by around 10 base pairs. Not sure if this is a bug or a requirement.

So I skipped that, ran unique.seqs() to generate the names file, then ran pairwise.seqs() to generate the distance matrix using only the unique sequences. I then classified my sequences and went on to cluster.split with the fasta method. This crashed mothur and generated the same error about my sequences not being the same length. I went around the problem by using the column-formatted unique distance files and setting splitmethod to classify instead of fasta.

Now I am trying to make my shared files and am kind of stuck. I tried:
make.shared(list=LP_NoChim.unique.an.list, group=merge.groups, label=unique-0.03-0.05-0.10)

The error says it wants to find everything in the group file to also be in the list file. But my group file contains everything while my treatments are all separate files, each representing a subset of what is in the groups file.
Not sure how I’m supposed to handle that.

Help appreciated,
Wendy

pschloss · January 26, 2013, 6:31pm

First, I’d stay far away from the Esophagus example; it’s very old at this point. Stick to the SOP…

dist.seqs() doesn’t like my data because all the sequences are not the same length. They cover the same region but vary in length by around 10 base pairs. Not sure if this is a bug or a requirement.

Correct: they must be aligned unless you use pairwise.seqs. But there are all sorts of reasons to align the sequences first (see my recent ISME J commentary on the issue).
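For anyone who wants to check this before running dist.seqs, here is a minimal Python sketch (not part of mothur) that tests whether every sequence in a FASTA file has the same length, gap characters included. The function names and the toy sequences are made up for illustration; equal length is only a necessary sign of an alignment, not proof that the alignment is good.

```python
from collections import Counter
from io import StringIO


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def length_counts(handle):
    """Count how many sequences there are of each length."""
    return Counter(len(seq) for _, seq in read_fasta(handle))


def is_aligned(handle):
    """True when all sequences share one length (gaps count as positions)."""
    return len(length_counts(handle)) <= 1


# Toy data, invented for this example.
unaligned = ">seqA\nACGTACGT\n>seqB\nACGTAC\n"
aligned = ">seqA\nACGTACGT\n>seqB\nACGTAC--\n"

print(is_aligned(StringIO(unaligned)))
print(is_aligned(StringIO(aligned)))
```

On a real file you would pass `open("your.fasta")` instead of the `StringIO` objects; `length_counts` shows how widely the lengths vary.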

So I skipped that, ran unique.seqs() to generate the names file, then ran pairwise.seqs() to generate the distance matrix using only the unique sequences. I then classified my sequences and went on to cluster.split with the fasta method. This crashed mothur and generated the same error about my sequences not being the same length. I went around the problem by using the column-formatted unique distance files and setting splitmethod to classify instead of fasta.

Right. When you use the fasta method you are telling mothur to split your sequences by taxonomy and then calculate distances, etc. If you have to use pairwise.seqs, then you should use the distance method for splitting with the taxonomy.
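To make "split your sequences by taxonomy" concrete, here is a rough Python sketch of the idea only. It is not mothur's implementation. It assumes a two-column, tab-separated taxonomy file (sequence name, then a semicolon-separated lineage with optional confidence scores in parentheses), and the taxon names and function names below are invented for the example.

```python
import re
from collections import defaultdict


def taxon_at_level(taxonomy, level):
    """Return the taxon at a 1-based level, dropping scores such as '(98)'."""
    ranks = [
        re.sub(r"\(\d+\)$", "", rank)
        for rank in taxonomy.strip().strip(";").split(";")
    ]
    # Lineages shorter than the requested level go into one catch-all bin.
    return ranks[level - 1] if level <= len(ranks) else "unclassified"


def split_by_taxonomy(lines, level):
    """Group sequence names by their taxon at the given level."""
    bins = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        name, taxonomy = line.rstrip("\n").split("\t")
        bins[taxon_at_level(taxonomy, level)].append(name)
    return dict(bins)


# Toy taxonomy lines, invented for this example.
toy = [
    "seqA\tEukaryota(100);Fungi(98);",
    "seqB\tEukaryota(100);Fungi(91);",
    "seqC\tEukaryota(100);Metazoa(90);",
]

for taxon, names in split_by_taxonomy(toy, 2).items():
    print(taxon, names)
```

Each bin can then be handled on its own, which is why the distances within a bin have to be computable, either from aligned sequences or from a distance file you already made.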

Now I am trying to make my shared files and am kind of stuck. I tried:
make.shared(list=LP_NoChim.unique.an.list, group=merge.groups, label=unique-0.03-0.05-0.10)

The error says it wants to find everything in the group file to also be in the list file. But my group file contains everything while my treatments are all separate files, each representing a subset of what is in the groups file.
Not sure how I’m supposed to handle that.

If it’s in the group file, it has to be in the list file. Everything should be clustered together.
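If you want to see exactly which names are causing the complaint, a small Python sketch like the one below (not part of mothur) lists the sequences that are in the group file but absent from one label of the list file. It assumes the usual layouts: a group file with a sequence name and a group name per line, and a list file with tab-separated columns (label, number of OTUs, then one comma-separated OTU per column). The helper names and toy data are made up, and the optional header line starting with "label" is skipped if present.

```python
def read_groups(lines):
    """Map sequence name -> group from 'name<whitespace>group' lines."""
    mapping = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            mapping[parts[0]] = parts[1]
    return mapping


def names_for_label(list_lines, label):
    """Return the set of sequence names clustered at the given label."""
    for line in list_lines:
        if not line.strip() or line.startswith("label"):
            continue  # skip blank lines and an optional header row
        fields = line.rstrip("\n").split("\t")
        if fields[0] == label:
            # fields[1] is the OTU count; every later column is one OTU.
            return {n for otu in fields[2:] for n in otu.split(",") if n}
    raise ValueError(f"label {label!r} not found in list file")


def missing_from_list(group_lines, list_lines, label):
    """Names present in the group file but absent from the list file."""
    groups = read_groups(group_lines)
    return sorted(set(groups) - names_for_label(list_lines, label))


# Toy data, invented for this example.
toy_groups = ["seqA\ttreat1", "seqB\ttreat1", "seqC\ttreat2", "seqD\ttreat2"]
toy_list = ["unique\t3\tseqA\tseqB\tseqC", "0.03\t2\tseqA,seqB\tseqC"]

print(missing_from_list(toy_groups, toy_list, "0.03"))
```

In the toy data seqD is in the group file but was never clustered, which is the situation you get when the list file comes from one treatment and the group file covers all of them.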

Hope this is helpful,
Pat
