Pain in my Groups file

Hi Pat and gang,
I have a few questions.

I am running multiple treatments for diversity analysis of the 18S rSSU DNA with a custom taxonomy database.
I was following the esophageal community example and…

dist.seqs() doesn’t like my data because the sequences are not all the same length. They cover the same region but vary in length by around 10 base pairs. Not sure if this is a bug or a requirement.

So I skipped that, ran unique.seqs() to generate the names file, then ran pairwise.seqs() to generate the distance matrix using only the unique sequences. I then classified my sequences and went on to cluster.split with the fasta method. This crashed mothur and generated the same error about my sequences not being the same length. I got around the problem by using the column-formatted unique distance files and setting splitmethod to classify instead of fasta.

Now I am trying to make my shared files and am kind of stuck. I tried:
make.shared(list=LP_NoChim.unique.an.list, group=merge.groups, label=unique-0.03-0.05-0.10)

The error says that everything in the group file must also be in the list file. But my group file contains everything, while my treatments are all separate files, each representing a subset of what is in the group file.
Not sure how I’m supposed to handle that.

Help appreciated,
Wendy

First, I’d stay far away from the Esophagus example; it’s very old at this point. Stick to the SOP…

dist.seqs() doesn’t like my data because the sequences are not all the same length. They cover the same region but vary in length by around 10 base pairs. Not sure if this is a bug or a requirement.

Correct: they must be aligned unless you use pairwise.seqs. But there are all sorts of reasons to align the sequences first (see my recent ISME J commentary on the issue).
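If it helps to check whether a fasta file is aligned before handing it to dist.seqs, a rough Python sketch along these lines would do it. This is not part of mothur; the function name and the sample records are made up for illustration, and it simply assumes that "aligned" means every record has the same length once gap characters are counted.

```python
from collections import Counter
from io import StringIO


def sequence_lengths(handle):
    """Return a Counter mapping sequence length -> number of sequences.

    Gap characters ('-' and '.') count toward the length, because an
    aligned file pads every sequence to the same number of columns.
    """
    lengths = Counter()
    current = None  # length of the record being read; None before the first header
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths[current] += 1
            current = 0
        elif current is not None:
            current += len(line)
    # Close the final record.
    if current is not None:
        lengths[current] += 1
    return lengths


# Hypothetical records: raw reads of different lengths vs. gap-padded reads.
unaligned = StringIO(">seq1\nACGTACGT\n>seq2\nACGTAC\n")
aligned = StringIO(">seq1\nACGTACGT\n>seq2\nACGT--AC\n")

print(len(sequence_lengths(unaligned)) == 1)  # False: two different lengths
print(len(sequence_lengths(aligned)) == 1)    # True: every record is 8 columns
```

More than one distinct length in the result means the file is not aligned.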

So I skipped that, ran unique.seqs() to generate the names file, then ran pairwise.seqs() to generate the distance matrix using only the unique sequences. I then classified my sequences and went on to cluster.split with the fasta method. This crashed mothur and generated the same error about my sequences not being the same length. I got around the problem by using the column-formatted unique distance files and setting splitmethod to classify instead of fasta.

Right. When you use the fasta method, you are telling mothur to split your sequences by taxonomy and then calculate distances, etc. If you have to use pairwise.seqs, then you should use the distance method for splitting with the taxonomy.

Now I am trying to make my shared files and am kind of stuck. I tried:
make.shared(list=LP_NoChim.unique.an.list, group=merge.groups, label=unique-0.03-0.05-0.10)

The error says that everything in the group file must also be in the list file. But my group file contains everything, while my treatments are all separate files, each representing a subset of what is in the group file.
Not sure how I’m supposed to handle that.

If it’s in the group file, it has to be in the list file. Everything should be clustered together.
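To see which names are causing the mismatch, a rough Python sketch like the one below can list them. It is not a mothur command. It assumes the group file has one "name, tab, group" pair per line and that each row of the list file is a label, an OTU count, and then one comma-separated OTU per column. The function names and sample rows are made up for illustration.

```python
def read_group_names(lines):
    """Collect the sequence names from group-file lines (name<TAB>group)."""
    names = set()
    for line in lines:
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def read_list_names(lines, label):
    """Collect the sequence names from the list-file row with the given label.

    Assumed row layout: label, number of OTUs, then one column per OTU,
    each holding comma-separated sequence names.
    """
    for line in lines:
        fields = line.split()
        if fields and fields[0] == label:
            names = set()
            for otu in fields[2:]:
                names.update(otu.split(","))
            return names
    raise ValueError(f"label {label!r} not found in list file")


# Hypothetical file contents, already split into lines.
group_lines = ["seqA\ttreatment1", "seqB\ttreatment1", "seqC\ttreatment2"]
list_lines = ["0.03\t2\tseqA,seqB\tseqD"]

in_group = read_group_names(group_lines)
in_list = read_list_names(list_lines, "0.03")

print(sorted(in_group - in_list))  # ['seqC']  in the group file, missing from the list
print(sorted(in_list - in_group))  # ['seqD']  in the list, missing from the group file
```

With real files, pass the open file objects in place of the sample lists. Any name in the first printed list is one that make.shared would complain about.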

Hope this is helpful,
Pat