Groups file for split.groups - Schloss SOP inspired

karinlag · December 13, 2011, 10:25am

Hi!

I have put together a pipeline for preprocessing my sequences. I have several sff files that I would like to process in the same manner. Each file includes several sequence tags, and I would like to split the results by tag at the end of the pipeline. So, I go through the denoising and chimera checking more or less as described in the Schloss SOP pipeline. Then I would like to use the split.groups command at the end to split the final result, but I only get an error message.

Here is my pipeline:

set.dir(output=.)
sffinfo(sff=GQOGG8V06.sff, flow=T)
summary.seqs(fasta=GQOGG8V06.fasta)

# Denoising

trim.flows(flow=GQOGG8V06.flow, oligos=…/infodata/noisetrimfile.tab, pdiffs=2, bdiffs=1, minflows=360, maxflows=720, processors=16)
shhh.flows(file=GQOGG8V06.flow.files, processors=16)

# Trimming seqs that don’t match the primer, and
# which are too short, and which have too low quality

trim.seqs(fasta=GQOGG8V06.shhh.fasta, name=GQOGG8V06.shhh.names, oligos=…/infodata/noisetrimfile.tab, pdiffs=0, bdiffs=0, maxhomop=8, minlength=200, processors=16)
summary.seqs(fasta=GQOGG8V06.shhh.trim.fasta)
unique.seqs(fasta=GQOGG8V06.shhh.trim.fasta)
summary.seqs(fasta=GQOGG8V06.shhh.trim.unique.fasta)
chimera.uchime(fasta=GQOGG8V06.shhh.trim.unique.fasta, name=GQOGG8V06.shhh.trim.names, reference=self)
remove.seqs(fasta=GQOGG8V06.shhh.trim.unique.fasta, accnos=GQOGG8V06.shhh.trim.unique.uchime.accnos, name=GQOGG8V06.shhh.trim.names, group=GQOGG8V06.shhh.groups)
summary.seqs(fasta=GQOGG8V06.shhh.trim.unique.pick.fasta)
chop.seqs(fasta=GQOGG8V06.shhh.trim.unique.pick.fasta, numbases=400, keep=front, short=T)
summary.seqs(fasta=GQOGG8V06.shhh.trim.unique.pick.chop.fasta)
split.groups(fasta=GQOGG8V06.shhh.trim.unique.pick.chop.fasta, group=GQOGG8V06.shhh.pick.groups)

Everything goes very nicely until the last line, where I get:
mothur > split.groups(fasta=GQOGG8V06.shhh.trim.unique.pick.chop.fasta, group=GQOGG8V06.shhh.pick.groups)

[ERROR]: Your fasta file contains 1139 valid sequences, and your groupfile contains 17094, please correct. Did you forget to include the name file?

mothur >

Now, I have been trying to figure out how to get the right groups file for this one, but I believe that I am missing something somewhere.

Thanks for your help!

Karin

pschloss · December 14, 2011, 9:20pm

split.groups(fasta=GQOGG8V06.shhh.trim.unique.pick.chop.fasta, group=GQOGG8V06.shhh.pick.groups)

should probably be…

split.groups(fasta=GQOGG8V06.shhh.trim.unique.pick.chop.fasta, name=GQOGG8V06.shhh.trim.pick.names, group=GQOGG8V06.shhh.pick.groups)
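
The error above compares counts across files, so it can help to count them independently. Below is a minimal Python sketch, not part of mothur, assuming the usual tab-separated layouts: a names file row is `representative<TAB>read1,read2,...` and a group file row is `read<TAB>group`. The read names and toy data are hypothetical. The total number of reads listed in the names file should equal the number of rows in the group file, while the fasta file only holds the representatives.

```python
def count_fasta(text):
    """Number of sequences in FASTA text (one '>' header per sequence)."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


def count_names(text):
    """Total reads listed in a names file (second column, comma-separated)."""
    total = 0
    for line in text.splitlines():
        if line.strip():
            _rep, members = line.split("\t")
            total += len(members.split(","))
    return total


def count_groups(text):
    """Number of reads in a group file (one 'read<TAB>group' row per read)."""
    return sum(1 for line in text.splitlines() if line.strip())


# Hypothetical toy data: 2 unique sequences standing for 5 reads.
fasta = ">r1\nACGT\n>r4\nTTGA\n"
names = "r1\tr1,r2,r3\nr4\tr4,r5\n"
groups = "r1\tA\nr2\tA\nr3\tB\nr4\tB\nr5\tB\n"

# The fasta count is smaller; the names total and the group count must agree.
print(count_fasta(fasta), count_names(names), count_groups(groups))
```

If the names total and the group count disagree, as in the messages in this thread, the names file no longer accounts for every read in the group file.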

karinlag · December 15, 2011, 10:50am

Hm. When I try this I get:

mothur > split.groups(fasta=GQOGG8V06.shhh.trim.unique.pick.chop.fasta, name=GQOGG8V06.shhh.trim.pick.names, group=GQOGG8V06.shhh.pick.groups)
[cut]
missing name GQOGG8V06DUZ1K
missing name GQOGG8V06DUZ9F
missing name GQOGG8V06DUZV4
missing name GQOGG8V06DUZVO
missing name GQOGG8V06DUZXQ

[ERROR]: Your name file contains 1207 valid sequences, and your groupfile contains 17094, please correct.

mothur >

Karin

pschloss · December 15, 2011, 4:48pm

Can you email GQOGG8V06.shhh.trim.unique.fasta, GQOGG8V06.shhh.trim.names, and GQOGG8V06.shhh.groups to mothur.bugs@gmail.com?

pschloss · December 15, 2011, 7:21pm

Sorry I missed this earlier, you have…

unique.seqs(fasta=GQOGG8V06.shhh.trim.fasta)

this should be…

unique.seqs(fasta=GQOGG8V06.shhh.trim.fasta, name=GQOGG8V06.shhh.trim.names)

Let us know how it goes…
Pat
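
Why the name file matters at this step can be illustrated with a small Python sketch. This is not mothur's implementation; the function `unique_seqs` and the read names are hypothetical, and the sketch assumes that a dereplication step given no names file treats every fasta entry as a single read. Under that assumption, the reads already collapsed by the denoising step are forgotten, which would be consistent with the names file holding only 1207 reads against 17094 in the group file.

```python
def unique_seqs(fasta, names=None):
    """Collapse identical sequences; return (unique_fasta, merged_names).

    fasta: dict of read name -> sequence
    names: optional dict of representative -> reads it already stands for
    """
    if names is None:
        # Assumption: without a names file, each entry stands only for itself.
        names = {name: [name] for name in fasta}
    rep_for_seq = {}  # sequence -> representative chosen for it
    merged = {}       # representative -> every read it now stands for
    for name, seq in fasta.items():
        rep = rep_for_seq.setdefault(seq, name)
        merged.setdefault(rep, []).extend(names[name])
    unique_fasta = {rep: fasta[rep] for rep in merged}
    return unique_fasta, merged


# Hypothetical input: three fasta entries that already represent six reads.
fasta = {"r1": "ACGT", "r4": "ACGT", "r6": "TTGA"}
names = {"r1": ["r1", "r2", "r3"], "r4": ["r4", "r5"], "r6": ["r6"]}

_, without_names = unique_seqs(fasta)
_, with_names = unique_seqs(fasta, names)
print(sum(len(v) for v in without_names.values()))  # 3 reads remembered
print(sum(len(v) for v in with_names.values()))     # 6 reads remembered
```

Passing the existing names map keeps the read total intact, so it can still be matched against the group file later in the pipeline.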

karinlag · December 16, 2011, 5:21pm

This works now! Thanks a lot for your help.

But, just one question at the end:

I have sequences from several specimens. In this process I run unique.seqs on my sequences, which collapses identical sequences into one, albeit with a names file to keep track of them. What I am wondering is what happens when I then run split.groups at the end of my process. If two specimens share the same sequence, each specimen's sequence set does get its own copy, right?

Thanks again!

Karin

pschloss · December 16, 2011, 6:20pm

Yes, it should split everything up.
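
The behaviour asked about can be sketched in Python as follows. This is only an illustration of the idea, not mothur's code: the function `split_by_group`, the read names, and the rule that the first read of a group becomes that group's representative are all assumptions. The point is that a unique sequence shared by two groups ends up once in each group's output.

```python
from collections import defaultdict


def split_by_group(fasta, names, groups):
    """Return {group: (fasta, names)} with one copy of each unique sequence per group."""
    per_group = {}
    for rep, members in names.items():
        # Bucket the reads behind this unique sequence by their group.
        by_group = defaultdict(list)
        for read in members:
            by_group[groups[read]].append(read)
        for group, reads in by_group.items():
            g_fasta, g_names = per_group.setdefault(group, ({}, {}))
            new_rep = reads[0]  # assumption: first read in the group represents it
            g_fasta[new_rep] = fasta[rep]
            g_names[new_rep] = reads
    return per_group


# Hypothetical data: the sequence behind r1 occurs in both group A and group B.
fasta = {"r1": "ACGT", "r4": "TTGA"}
names = {"r1": ["r1", "r2", "r3"], "r4": ["r4", "r5"]}
groups = {"r1": "A", "r2": "A", "r3": "B", "r4": "B", "r5": "B"}

result = split_by_group(fasta, names, groups)
for group in sorted(result):
    print(group, result[group])
```

Here group A receives ACGT under r1 and group B receives the same sequence under r3, so neither specimen loses the shared sequence.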
