New groupfile needed after unique.seqs?

Hi all,

I’m pretty far along in my analysis pipeline, and all I need now is to make a shared file showing the sequences shared between samples. I started with a FASTA file of clean sequences, since quality control was done elsewhere, and I had a group file for it. Now I’m running:

make.shared(list=SEQS_aligned.unique.filter.unique.phylip.fn.list,group=SEQSaligned.group,label=0.030)

and I get a zillion lines with this error:

“[ERROR]: TULA_9402 is in your groupfile and not your listfile. Please correct.”

Does this mean I need to make a new groupfile for the representative set of sequences at label 0.030? If so, I wonder how the “unique” abundance information is reincorporated…

Did you forget to include the names file when you ran the cluster command?

I didn’t.

I used the unique.filter.names file produced by the second run of unique.seqs, after filtering. I assumed it contained the abundance information from both unique.seqs runs, so I didn’t use the first .names file produced by the first run of unique.seqs.

Usually the error you are getting is caused by forgetting to include the name file on a command. If you post the commands you ran, I may be able to spot it.

I keep getting the same error. This is the command line I’m running (with newlines after each “;” for easier reading):

mothur "#unique.seqs(fasta=S3_aligned.fasta);
filter.seqs(fasta=S3_aligned.unique.fasta,vertical=T,processors=6);
unique.seqs(fasta=S3_aligned.unique.filter.fasta,name=S3_aligned.names);
dist.seqs(fasta=S3_aligned.unique.filter.unique.fasta,processors=6,output=lt,cutoff=0.030);
hcluster(phylip=S3_aligned.unique.filter.unique.phylip.dist,method=furthest,cutoff=0.030,hard=t,name=S3_aligned.unique.filter.names);
make.shared(list=S3_aligned.unique.filter.unique.phylip.fn.list,group=S3.group,label=0.030)"

How did you create the group file? Could you run the following commands so we can see where the discrepancy starts?

split.groups(fasta=S3_aligned.fasta, group=S3.group)
unique.seqs(fasta=current)
split.groups(fasta=current, name=current, group=current)
filter.seqs(fasta=current,vertical=T,processors=6)
split.groups(fasta=current, name=current, group=current)
unique.seqs(fasta=current, name=current)
split.groups(fasta=current, name=current, group=current)
dist.seqs(fasta=current,processors=6,output=lt,cutoff=0.030)
split.groups(fasta=current, name=current, group=current)
hcluster(phylip=current,method=furthest,cutoff=0.030,hard=t,name=current)
list.seqs(list=current)
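
If it helps, the same comparison can be done outside mothur. Below is a rough Python sketch, not a mothur tool, that reports which sequence names are in the group file but not in the list file at a given label. It assumes the usual layouts: a group file with one "name, group" pair per line, and a list file with one tab-separated line per label (label, number of OTUs, then one comma-separated column of names per OTU). The function names and the toy sequence names (seqA, seqB, seqC) are made up for illustration, and the label is matched as an exact string, so use it as it is written in your list file.

```python
def read_group_names(group_text):
    """Sequence names from a group file (assumed: name, whitespace, group per line)."""
    names = set()
    for line in group_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            names.add(fields[0])
    return names


def read_list_names(list_text, label):
    """Sequence names on the list-file line whose first column equals `label`.

    Assumed layout: label <TAB> numOTUs <TAB> otu1 <TAB> otu2 ...,
    where each OTU column is a comma-separated set of sequence names.
    """
    for line in list_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3 or fields[0] != label:
            continue
        names = set()
        for otu in fields[2:]:
            names.update(n for n in otu.split(",") if n)
        return names
    raise ValueError("label %r not found in list file" % label)


def compare(group_text, list_text, label):
    """Return (names only in the group file, names only in the list file), sorted."""
    in_group = read_group_names(group_text)
    in_list = read_list_names(list_text, label)
    return sorted(in_group - in_list), sorted(in_list - in_group)


# Toy data (hypothetical): seqB is a duplicate of seqA.
GROUP = "seqA\tsoil\nseqB\tsoil\nseqC\twater\n"
LIST_WITH_NAMES = "0.03\t2\tseqA,seqB\tseqC\n"   # duplicates were carried through
LIST_UNIQUES_ONLY = "0.03\t2\tseqA\tseqC\n"      # duplicates were lost somewhere

if __name__ == "__main__":
    print(compare(GROUP, LIST_WITH_NAMES, "0.03"))
    print(compare(GROUP, LIST_UNIQUES_ONLY, "0.03"))
```

For real files, read their contents with open(...).read() and pass them in. If the first list comes back full of names, the list file only contains the unique representatives, which is the pattern you would expect when the duplicate names were dropped at some step.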

Also, you might be interested in Pat’s example analyses: http://www.mothur.org/wiki/Schloss_SOP or http://www.mothur.org/wiki/MiSeq_SOP.