Run the workflow with several plates?

Hi,

I have sequences from 4 full plates. Is it possible to pass all 4 sets of input files to every command in the workflow? I know that some commands accept several files separated by “-”. I just finished running the plates separately and then concatenating the final output files, but I didn’t consider that, e.g., the seqs in the fasta files will not have the same length… For the analysis I need the processed seqs to be in one file.

I guess I could (locally) align the concatenated sequences using e.g. MAFFT?

Thanks,

You could use the sff.multiple command, http://www.mothur.org/wiki/Sff.multiple, and then proceed with the SOP from there.

What’s the difference between running that and entering several sff files when running sffinfo?

You’ll basically get to the end of trim.seqs with one fasta, name, and group file after running each of your files through sffinfo, trim.flows, shhh.flows, and trim.seqs. Then you can pick up the SOP at unique.seqs, align.seqs, etc. If you’re doing 16S, I wouldn’t suggest touching MAFFT.

Thanks for your reply!

What I ended up doing was concatenating the fasta, name & groups files from the 4 plates right where shhh.flows left off… Although I started out with the same sequences as in the 4 separate runs, I ended up with fewer sequences:

EDIT: added the # of seqs when concatenating after trim.seqs, as suggested by Pat below. Also added the # of seqs when running chimera.uchime with dereplicate=t and remove.seqs with dups=f.

4 X (run separately): Total 702979, Unique 65199
concat post shhh.flows: Total 676445, Unique 60366
concat post trim.seqs: Total 699322, Unique 61744
concat post trim.seqs + new flags: Total 712524, Unique 61864

…maybe this isn’t so strange? I could imagine that the # of seqs would be reduced even more in the concatenated dataset in the pre.cluster and chimera.uchime steps than when running them separately on 4 subsets.
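For anyone wanting to double-check Total/Unique counts like these outside mothur, here is a minimal Python sketch. It assumes the name file has two whitespace-separated columns: a representative sequence name, then a comma-separated list of all names it represents. The function name `count_names` and the example names are made up for illustration.

```python
def count_names(lines):
    """Return (total, unique) for an iterable of name-file lines.

    Assumed layout per line: representative name, whitespace,
    comma-separated list of every sequence name it stands for.
    """
    total = 0
    unique = 0
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        unique += 1
        total += len(parts[1].split(","))
    return total, unique


# Hypothetical two-line name file: seqA stands for three reads, seqD for one.
example = ["seqA\tseqA,seqB,seqC", "seqD\tseqD"]
print(count_names(example))  # prints (4, 2)
```

To run it on a real file, pass an open file handle instead of the list, e.g. `count_names(open("my.names"))` with your own file name.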

So if your 4 sff files have the same barcodes for different samples, you may want to concatenate after trim.seqs. One reason why you might be getting different total numbers is the chimera checking. By default, if a sequence is flagged as a chimera in one sample, it will get yanked from all of the samples regardless of whether it was flagged in those samples. This could be happening here. The next release will allow you to turn off this feature. We’ve seen it cause problems in some cases where sequences that are abundant in some samples get flagged in another sample where they are rare (e.g. pre, during, and post antibiotics).

Pat
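To make the difference between the two removal behaviours concrete, here is a toy Python sketch. It is not mothur's implementation, only an illustration of the logic described above; the function `remove_chimeras`, the sample labels, and the sequence names are all hypothetical.

```python
def remove_chimeras(samples, flagged, per_sample):
    """Toy model of chimera removal across samples.

    samples:    dict of sample label -> set of sequence names
    flagged:    dict of sample label -> set of names flagged in that sample
    per_sample: True  -> drop a name only from samples where it was flagged
                False -> drop a name from every sample if flagged anywhere
    """
    if per_sample:
        return {s: names - flagged.get(s, set()) for s, names in samples.items()}
    flagged_anywhere = set().union(*flagged.values())
    return {s: names - flagged_anywhere for s, names in samples.items()}


# s1 is present in both samples but flagged only in "post".
samples = {"pre": {"s1", "s2"}, "post": {"s1", "s3"}}
flagged = {"post": {"s1"}}

yank_everywhere = remove_chimeras(samples, flagged, per_sample=False)
yank_per_sample = remove_chimeras(samples, flagged, per_sample=True)

print(sum(len(v) for v in yank_everywhere.values()))  # prints 2
print(sum(len(v) for v in yank_per_sample.values()))  # prints 3
```

In this toy case the per-sample behaviour keeps s1 in "pre", so more sequences survive overall.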

Thanks Pat! Btw, when is the new version being released (if soon that is)…?

soon!

Hi,

In the new mothur release, just to be sure, is the “turning off” feature that you (Pat) mentioned above the

dereplicate=t

flag in chimera.X together with the

dups=f

flag in remove.seqs? Btw, which of the chimera algorithms (chimeraslayer, uchime, perseus) do you prefer?

Thanks,

Correct - chimera.uchime.

Thanks Pat!

I ended up with some more sequences this way…

Hi Pat,

I have been trying to combine a number (~100) of .sff files from our sequence provider.
I have had to divide them up into 7 groups so that the barcodes weren’t overlapping in each group for the sff.multiple command.
At the end I use the merge.files command to make one .fasta, one .names, one .groups, and one .summary file. All seems good when I run summary.seqs on this, but when I run unique.seqs it tells me:

“…
22000 16073
23000 16792
[ERROR]: You already have a sequence named HWWRNVT02EYIIU in your fasta file, sequence names must be unique, please correct.
…”

If I keep going it seems OK (with fewer sequences), but then when I try to run pre.cluster it again causes problems with

“Your groupfile contains more than 1 sequence named HWWRNVT02EYIIU, sequence names must be unique. Please correct.”…

Is the problem that running sff.multiple several times ends up giving some sequences the same name? If so, is there any way around this, given that the same barcodes have been used multiple times?

Thanks for your help, and I really enjoyed the course a few weeks back.

regards

Guy

Sequence names are unique within and between runs. I suspect that either your splitting wasn’t as perfect as you had hoped or your merging included too many files. Are you using sff.multiple? http://www.mothur.org/wiki/Sff.multiple

Hi Pat,
Thanks for your reply.
Yep, I used sff.multiple.
I had to do it as 5 separate lots so that no barcode was used more than once in a single sff.multiple run.
After I got the outputs from each of these, I combined the files (5 of each).

Cheers

Guy

Could you post the sff.multiple and merge.files commands you ran, as well as the input file to sff.multiple? Have you tried to see which sff file HWWRNVT02EYIIU came from? Perhaps you inadvertently added this file twice somewhere?
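One way to track down where a repeated name comes from is to scan the per-lot fasta files before merging them. The Python sketch below is only a suggestion: `find_duplicate_names` and the file labels are hypothetical, and it assumes standard fasta headers where the sequence name is the first token after ">".

```python
from collections import defaultdict


def fasta_names(lines):
    """Yield the sequence name from each fasta header line."""
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                yield parts[0]


def find_duplicate_names(sources):
    """sources: dict of label -> iterable of fasta lines.

    Returns a dict of name -> list of labels for every name seen
    more than once (a label repeats if the name repeats within it).
    """
    seen = defaultdict(list)
    for label, lines in sources.items():
        for name in fasta_names(lines):
            seen[name].append(label)
    return {name: labels for name, labels in seen.items() if len(labels) > 1}


# Hypothetical per-lot outputs; with real data, pass open file handles instead.
sources = {
    "lot1.fasta": [">HWWRNVT02EYIIU", "ACGT"],
    "lot2.fasta": [">HWWRNVT02EYIIU", "ACGT", ">another_read", "GGCC"],
}
print(find_duplicate_names(sources))
```

Any name reported here points to the lots that both contain it, which narrows down which input was included twice.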

Thanks for your help, I figured it out.
When putting the files together for sff.multiple, I had accidentally entered a couple of the entries twice, hence the duplicate names.
All good now.

G
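For anyone hitting the same error, a quick check of the sff.multiple input file for repeated entries could look like the Python sketch below. It assumes each non-blank line starts with the sff file name, followed by whitespace and the rest of the entry; `repeated_entries` and the file names are hypothetical.

```python
from collections import Counter


def repeated_entries(lines):
    """Return {sff_name: count} for sff files listed more than once.

    Assumes the sff file name is the first whitespace-separated
    token on each non-blank line of the input file.
    """
    counts = Counter(line.split()[0] for line in lines if line.strip())
    return {name: n for name, n in counts.items() if n > 1}


# Hypothetical input file contents with one entry listed twice.
entries = [
    "plate1.sff plate1.oligos",
    "plate2.sff plate2.oligos",
    "plate1.sff plate1.oligos",
]
print(repeated_entries(entries))  # prints {'plate1.sff': 2}
```

An empty result means no sff file is listed more than once.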