I have sequences from 4 full plates. Is it possible to input all 4 of everything into each command in the workflow? I know that for some commands you can pass several files separated by “-”. I just finished running the plates separately and then concatenating the final output files, but I didn’t consider that e.g. the seqs in the fasta files will not have the same length… For the analysis I need the processed seqs to be in one file.
I guess I could (locally) align the concatenated sequences using e.g. MAFFT?
You’ll basically get to the end of trim.seqs with one fasta, name, and group file after running each of your files through sffinfo, trim.flows, shhh.flows, and trim.seqs. Then you can pick up the SOP at unique.seqs, align.seqs, etc. If you’re doing 16S, I wouldn’t suggest touching MAFFT.
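For each sff file that’s just the regular 454 SOP commands, e.g. something along these lines (the file names are placeholders for your own, and the parameter values are the SOP suggestions):

mothur > sffinfo(sff=plate1.sff, flow=T)
mothur > trim.flows(flow=plate1.flow, oligos=plate1.oligos, pdiffs=2, bdiffs=1)
mothur > shhh.flows(file=plate1.flow.files)
mothur > trim.seqs(fasta=plate1.shhh.fasta, name=plate1.shhh.names, oligos=plate1.oligos, pdiffs=2, bdiffs=1, maxhomop=8, minlength=200, flip=T)

Do the same for each of the other plates, then combine the trim.seqs outputs before unique.seqs.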
What I ended up doing was concatenating the fasta, name & group files from the 4 plates right where shhh.flows left off… However, I now ended up with fewer sequences (the starting number matched the sum of the 4 separate fasta files):
EDIT: added the #seqs for concatenating after trim.seqs, as suggested by Pat below. Also added the #seqs for running chimera.uchime with dereplicate=t and remove.seqs with dups=f.
         4 X (separately)   concat post shhh.flows   concat post trim.seqs   concat post trim.seqs + new flags
Total:   702979             676445                   699322                  712524
Unique:  65199              60366                    61744                   61864
…maybe this isn’t so strange? I could imagine the # of seqs being reduced even more in the concatenated dataset at the pre.cluster and chimera.uchime steps, compared to running them separately on the 4 subsets.
So if your 4 sff files use the same barcodes for different samples, you may want to concatenate after trim.seqs. One reason you might be getting different total numbers is the chimera checking. By default, if a sequence is flagged as a chimera in one sample it will get yanked from all of the samples, regardless of whether it was flagged in those samples. This could be happening here. The next release will allow you to turn off this feature; we’ve seen it cause problems in cases where a sequence that is abundant in some samples gets flagged in another sample where it is rare (e.g. pre, during, and post antibiotics).
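To combine the trim.seqs output you can either cat the three file types together at the command line or use merge.files inside mothur; roughly (the plate/output names here are placeholders, and the exact output names will vary):

mothur > merge.files(input=plate1.shhh.trim.fasta-plate2.shhh.trim.fasta-plate3.shhh.trim.fasta-plate4.shhh.trim.fasta, output=combined.fasta)
mothur > merge.files(input=plate1.shhh.trim.names-plate2.shhh.trim.names-plate3.shhh.trim.names-plate4.shhh.trim.names, output=combined.names)
mothur > merge.files(input=plate1.shhh.groups-plate2.shhh.groups-plate3.shhh.groups-plate4.shhh.groups, output=combined.groups)

Once the option to turn off the cross-sample removal is out, the chimera steps would look something like this (using “current” to pick up the accnos file that chimera.uchime just wrote):

mothur > chimera.uchime(fasta=combined.fasta, name=combined.names, group=combined.groups, dereplicate=t)
mothur > remove.seqs(accnos=current, fasta=combined.fasta, name=combined.names, group=combined.groups, dups=f)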
I have been trying to combine a number of .sff files (~100) from our sequence provider.
I had to divide them up into 7 groups so that the barcodes didn’t overlap within each group for the sff.multiple command.
At the end I use the merge.files command to make one .fasta, one .names, one .groups and one .summary file. All seems good when I do summary.seqs on this, but then when I do unique.seqs it tells me:
“…
22000 16073
23000 16792
[ERROR]: You already have a sequence named HWWRNVT02EYIIU in your fasta file, sequence names must be unique, please correct.
…”
If I keep going it seems OK (with somewhat fewer sequences), but then when I try to pre.cluster it causes problems again with
“Your groupfile contains more than 1 sequence named HWWRNVT02EYIIU, sequence names must be unique. Please correct.”…
Is the problem that running sff.multiple multiple times ends up giving some sequences the same name? If so, is there any way around this, given that the same barcodes have been used multiple times?
Thanks for your help, and I really enjoyed the course a few weeks back.
Sequence names are unique within and between runs. I suspect that either your splitting wasn’t as perfect as you had hoped or your merging included too many files. Are you using sff.multiple? http://www.mothur.org/wiki/Sff.multiple
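For reference, sff.multiple takes a plain-text file with one sff file and its oligos file per line, and you point the command at that file; something like this (names made up):

plate1.sff plate1.oligos
plate2.sff plate2.oligos

mothur > sff.multiple(file=batch1.txt)

If the same sff file is listed on two lines, or shows up in two of your batch files, you’ll end up with the same read names twice downstream.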
Hi Pat,
thanks for your reply.
Yep, I used sff.multiple.
I had to do it as 5 separate lots so that there weren’t any barcodes used more than once in a single sff.multiple run.
After I got the outputs from each of these, I combined the files (5 of each).
Could you post the sff.multiple and merge.files commands you ran, as well as the input file to sff.multiple? Have you tried to see which sff file HWWRNVT02EYIIU came from? Perhaps you inadvertently added this file twice somewhere?
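A quick way to check from the command line (substitute whatever your per-batch fasta files are called):

grep -l "HWWRNVT02EYIIU" *.fasta
grep -c "HWWRNVT02EYIIU" merged.fasta

The first lists every fasta file the read shows up in; the second counts how many times it appears in your merged file.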
Thanks for your help, I figured it out.
When putting the files together for sff.multiple I had accidentally put a couple of the entries in twice, hence the duplicate names.
All good now.
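For anyone else who hits this: a quick sanity check before running sff.multiple is to look for duplicated lines in the batch file, e.g.

sort batch1.txt | uniq -d

which prints any entry that is listed more than once.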