Groups file for split.groups - Schloss SOP inspired


I have put together a pipeline for preprocessing my sequences. I have several sff files that I would like to process in the same manner; each file includes several sequence tags by which I would like to split the results at the end of the pipeline. So, I go through the denoising and chimera checking more or less as described in the Schloss SOP pipeline. Then I would like to use the split.groups command at the end to split the final result, but I only get an error message.

Here is my pipeline:

sffinfo(sff=GQOGG8V06.sff, flow=T)


trim.flows(flow=GQOGG8V06.flow, oligos=…/infodata/, pdiffs=2, bdiffs=1, minflows=360, maxflows=720, processors=16)
shhh.flows(file=GQOGG8V06.flow.files, processors=16)

Trimming seqs that don’t match the primer, which are too short, or which have too low quality

trim.seqs(fasta=GQOGG8V06.shhh.fasta, name=GQOGG8V06.shhh.names, oligos=…/infodata/, pdiffs=0, bdiffs=0, maxhomop=8, minlength=200, processors=16)
chimera.uchime(fasta=GQOGG8V06.shhh.trim.unique.fasta, name=GQOGG8V06.shhh.trim.names,reference=self)
remove.seqs(fasta=GQOGG8V06.shhh.trim.unique.fasta, accnos=GQOGG8V06.shhh.trim.unique.uchime.accnos, name=GQOGG8V06.shhh.trim.names, group=GQOGG8V06.shhh.groups)
chop.seqs(fasta=GQOGG8V06.shhh.trim.unique.pick.fasta, numbases=400, keep=front, short=T)
split.groups(fasta=GQOGG8V06.shhh.trim.unique.pick.chop.fasta, group=GQOGG8V06.shhh.pick.groups)

Everything runs fine until the last line, where I get:
mothur > split.groups(fasta=GQOGG8V06.shhh.trim.unique.pick.chop.fasta, group=GQOGG8V06.shhh.pick.groups)

[ERROR]: Your fasta file contains 1139 valid sequences, and your groupfile contains 17094, please correct. Did you forget to include the name file?

mothur >

Now, I have been trying to figure out how to get the right groups file for this one, but I believe that I am missing something somewhere.

Thanks for your help!


split.groups(fasta=GQOGG8V06.shhh.trim.unique.pick.chop.fasta, group=GQOGG8V06.shhh.pick.groups)

should probably be…

split.groups(fasta=GQOGG8V06.shhh.trim.unique.pick.chop.fasta, name=GQOGG8V06.shhh.trim.pick.names, group=GQOGG8V06.shhh.pick.groups)
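The reason for adding the name file: the fasta file only holds unique sequences, while the groups file lists every read, and the names file is what links the two. If it helps to check the files by hand, here is a rough Python sketch of that bookkeeping. It assumes the usual tab-separated layouts (names file: representative, then a comma-separated list of the reads it stands for; groups file: read, then group). The helper names and the toy data are made up for illustration and are not part of mothur.

```python
def reads_in_names(names_lines):
    """Return the set of all read names listed in a mothur-style names file."""
    reads = set()
    for line in names_lines:
        line = line.strip()
        if not line:
            continue
        _representative, members = line.split("\t")
        reads.update(members.split(","))
    return reads


def reads_in_groups(groups_lines):
    """Return the set of read names listed in a mothur-style groups file."""
    reads = set()
    for line in groups_lines:
        line = line.strip()
        if not line:
            continue
        read, _group = line.split("\t")
        reads.add(read)
    return reads


def compare(names_lines, groups_lines):
    """Report reads that appear in one file but not in the other."""
    in_names = reads_in_names(names_lines)
    in_groups = reads_in_groups(groups_lines)
    return {
        "only_in_names": sorted(in_names - in_groups),
        "only_in_groups": sorted(in_groups - in_names),
    }


# Hypothetical toy data: two unique sequences standing for three reads,
# plus one read (seqD) that only the groups file knows about.
example_names = ["seqA\tseqA,seqB", "seqC\tseqC"]
example_groups = ["seqA\tsample1", "seqB\tsample2", "seqC\tsample1", "seqD\tsample2"]
report = compare(example_names, example_groups)
print(report)
```

To run it on real files, pass open file handles instead of the lists, e.g. compare(open(names_path), open(groups_path)). Any read listed under only_in_groups is one that the names file does not account for.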

Hm. When I try this I get:

mothur > split.groups(fasta=GQOGG8V06.shhh.trim.unique.pick.chop.fasta, name=GQOGG8V06.shhh.trim.pick.names, group=GQOGG8V06.shhh.pick.groups)
missing name GQOGG8V06DUZ1K
missing name GQOGG8V06DUZ9F
missing name GQOGG8V06DUZV4
missing name GQOGG8V06DUZVO
missing name GQOGG8V06DUZXQ

[ERROR]: Your name file contains 1207 valid sequences, and your groupfile contains 17094, please correct.

mothur >


Can you email GQOGG8V06.shhh.trim.unique.fasta, GQOGG8V06.shhh.trim.names, and GQOGG8V06.shhh.groups to

Sorry I missed this earlier, you have…


this should be…

unique.seqs(fasta=GQOGG8V06.shhh.trim.fasta, name=GQOGG8V06.shhh.trim.names)
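For context, a simplified picture of what this step does: when unique.seqs is given an existing names file, identical sequences are collapsed and their read lists are merged, so the fasta and names files stay in step with each other. The Python below is only a sketch of that idea under those assumptions, not mothur's actual implementation; the function name and the toy data are hypothetical.

```python
def collapse_identical(fasta, names):
    """Collapse identical sequences, merging the read lists from `names`.

    fasta: dict of representative name -> sequence
    names: dict of representative name -> list of read names it stands for
    Returns (unique_fasta, unique_names) in the same shapes.
    """
    rep_for_seq = {}   # sequence -> representative already chosen for it
    unique_fasta = {}
    unique_names = {}
    for rep, seq in fasta.items():
        # A sequence missing from `names` is treated as standing for itself.
        members = names.get(rep, [rep])
        if seq in rep_for_seq:
            # Seen this sequence before: fold the reads into the first representative.
            unique_names[rep_for_seq[seq]].extend(members)
        else:
            rep_for_seq[seq] = rep
            unique_fasta[rep] = seq
            unique_names[rep] = list(members)
    return unique_fasta, unique_names


# Hypothetical toy data: r1 and r3 carry the same sequence.
fasta = {"r1": "ACGT", "r3": "ACGT", "r4": "TTGA"}
names = {"r1": ["r1", "r2"], "r3": ["r3"], "r4": ["r4"]}
unique_fasta, unique_names = collapse_identical(fasta, names)
print(unique_fasta)
print(unique_names)
```

In the toy data, r3 is folded into r1, so the names entry for r1 ends up listing r1, r2 and r3. Every read is still accounted for, which is what the later commands rely on.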

Let us know how it goes…

This works now! Thanks a lot for your help.

But, just one question at the end:

I have sequences from several specimens. In this process I run unique.seqs on my sequences, which collapses identical sequences into one, with a names file to keep track of them. What I am wondering is what happens when I then run split.groups at the end of my process. If two specimens share the same sequence, each specimen's sequence set gets its own copy, right?

Thanks again!


It should split everything up.
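As a rough illustration of the behaviour you would expect (a Python sketch of the idea, not mothur's code): each unique sequence is looked up in the names file, its reads are sorted by group, and every group that has at least one read gets its own entry. The function name, the toy data, and the choice of the first read in a group as that group's representative are all assumptions made for this example.

```python
from collections import defaultdict


def split_by_group(names, groups):
    """Split a names mapping into one mapping per group.

    names: dict of representative -> list of reads it stands for
    groups: dict of read -> group label
    Returns dict of group -> {representative: [reads from that group]}.
    """
    per_group = defaultdict(dict)
    for rep, members in names.items():
        by_group = defaultdict(list)
        for read in members:
            by_group[groups[read]].append(read)
        for group, reads in by_group.items():
            # Assumption for this sketch: the first read from the group
            # stands in as the representative within that group.
            per_group[group][reads[0]] = reads
    return dict(per_group)


# Hypothetical toy data: the unique sequence r1 is shared by both specimens.
names = {"r1": ["r1", "r2", "r3"], "r4": ["r4"]}
groups = {"r1": "specimenA", "r2": "specimenB", "r3": "specimenA", "r4": "specimenB"}
result = split_by_group(names, groups)
print(result)
```

In the toy data the shared sequence shows up under both specimenA (reads r1 and r3) and specimenB (read r2), so neither specimen loses it.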