Merging data sets

I have two 16S data sets from 454 that I would like to analyze together; in total they contain 59 barcoded samples. The samples are of the same type and most of the bacteria are the same; they just come from two different sequencing runs. So I figured it would make sense to analyze them together to get the same OTUs etc., unless there’s a good reason not to?

If not, when should I ideally merge them? Before the ‘unique.seqs’ step, by merging the fasta and names files?

Any input is much appreciated!

I’m now running shhh.flows and I think there is a small error on the manual page: it says ‘shhh.flows(files=…)’, but it should probably be ‘shhh.flows(file=…)’. Nothing major though :slight_smile:

Thank you, Sandra

You might give them different group names when you first merge them, so that you can look at the technical variation between runs. We’ve done this in the past and then averaged the relative abundances (not the counts).
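A minimal Python sketch of that averaging, assuming each sample’s groups carry a run suffix (the `S1_run1`/`S1_run2` group names and the `Otu001` labels are hypothetical, chosen only for illustration). Each run’s counts are converted to relative abundances first, and only then averaged per sample:

```python
from collections import defaultdict


def relative_abundances(counts):
    """Turn one group's OTU counts into fractions that sum to 1."""
    total = sum(counts.values())
    return {otu: n / total for otu, n in counts.items()}


def average_across_runs(run_counts):
    """Average relative abundances of the same sample across sequencing runs.

    run_counts maps a run-specific group name to {OTU: read count}.
    Assumption for this sketch: group names look like '<sample>_<run>',
    so the sample is everything before the last underscore.
    """
    per_sample = defaultdict(list)
    for group, counts in run_counts.items():
        sample = group.rsplit("_", 1)[0]
        per_sample[sample].append(relative_abundances(counts))

    averaged = {}
    for sample, profiles in per_sample.items():
        otus = set().union(*profiles)
        # An OTU absent from one run contributes 0 for that run.
        averaged[sample] = {
            otu: sum(p.get(otu, 0.0) for p in profiles) / len(profiles)
            for otu in sorted(otus)
        }
    return averaged


runs = {
    "S1_run1": {"Otu001": 80, "Otu002": 20},
    "S1_run2": {"Otu001": 300, "Otu002": 100},
}
print(average_across_runs(runs))
```

With these made-up numbers Otu001 ends up at (0.80 + 0.75) / 2 = 0.775. Averaging the raw counts instead would give 190 / 250 = 0.76, because the deeper second run dominates; normalizing first gives each run equal weight.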

Thanks for the input, and sorry for not replying sooner. I wanted to give it a try first, but I’m still stuck at denoising my data, hence the new post…

I am having a similar problem with merging data sets. I am trying to compare four different samples to end up with some OTU-based beta diversity statistics. I have four sff files, one for each of the four samples sequenced by 454. The sequencing company provided the sff files with the sample barcodes and primers removed.

For each sample, I run sffinfo, trim.flows, and shhh.flows. After shhh.flows, I end up with the output files SAMPLE.trim…shhh.fasta, SAMPLE.trim…shhh.names, and SAMPLE.trim…shhh.groups (in addition to SAMPLE.shhh.fasta and SAMPLE.shhh.names, but NO SAMPLE.shhh.groups). I then use the merge.files command to obtain merged.trim…shhh.fasta, merged.trim…shhh.names, and merged.trim…shhh.groups, and follow the Schloss SOP with the merged files. When I get to the pre.cluster step:

pre.cluster(fasta=merged.trim…shhh.trim.unique.good.filter.unique.fasta, name=merged.trim…shhh.trim.unique.good.filter.names, group=merged.trim…shhh.good.groups, diffs=2)

I receive an error message: “[ERROR]: Your name file contains 301 valid sequences, and your groupfile contains 2259, please correct.”
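One way to see where the two files disagree is to compare the sequence names they contain. This is a rough Python sketch, not a mothur command; it assumes the usual tab-separated layouts (name file: a representative name, then a comma-separated list of all the names it stands for; group file: a sequence name, then its group), and the function names are made up for illustration:

```python
def read_names(lines):
    """Collect every sequence name listed in a name file.

    Each line: representative<TAB>comma-separated names it stands for.
    """
    names = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _rep, members = line.split("\t", 1)
        names.update(members.split(","))
    return names


def read_groups(lines):
    """Map sequence name -> group from a group file (name<TAB>group)."""
    groups = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, group = line.split("\t", 1)
        groups[name] = group
    return groups


def compare_names_and_groups(names_lines, groups_lines):
    """Report sequence names that appear in only one of the two files."""
    in_names = read_names(names_lines)
    in_groups = set(read_groups(groups_lines))
    return {
        "in_names_only": sorted(in_names - in_groups),
        "in_groups_only": sorted(in_groups - in_names),
    }


names_file = ["seqA\tseqA,seqB", "seqC\tseqC"]
groups_file = ["seqA\tF1", "seqB\tF1", "seqC\tF2", "seqD\tF2"]
print(compare_names_and_groups(names_file, groups_file))
# {'in_names_only': [], 'in_groups_only': ['seqD']}
```

To run it on real files, pass `open(path).read().splitlines()` for each one. A long `in_groups_only` list would mean the group file still lists sequences that are no longer in the name file.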

Any advice on how to overcome this issue? I don’t see a step in the SOP where another group file would be created. I have also tried merging the files after the trim.seqs step of the SOP (instead of after shhh.flows), but I ended up with the same error. I also tried using ‘merged.shhh.trim.unique.good.filter.fasta’ (instead of the merged.trim…shhh.trim… file), but that didn’t help either.


After running shhh.flows, you need to run trim.seqs again. Then I’d merge all of your different files.
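For illustration, merge.files takes a list of files and merges them into one by appending them. The Python sketch below mimics that for the per-sample fasta, names, and groups files once each sample has been through shhh.flows and trim.seqs. The `A`/`B` stems and the `merged` prefix are hypothetical; in practice you would just call merge.files in mothur.

```python
from pathlib import Path


def concatenate(inputs, output):
    """Write the contents of each input file, in order, into one output file."""
    with open(output, "w") as out:
        for path in inputs:
            text = Path(path).read_text()
            out.write(text)
            # Avoid gluing the last line of one file onto the first line of the next.
            if text and not text.endswith("\n"):
                out.write("\n")


def merge_per_type(stems, suffixes, out_stem="merged"):
    """Merge per-sample files of each type into out_stem + suffix.

    stems: per-sample path stems, e.g. ["A", "B"] (hypothetical names)
    suffixes: file types to merge, e.g. [".fasta", ".names", ".groups"]
    Returns {suffix: Path of the merged file}.
    """
    outputs = {}
    for suffix in suffixes:
        out = Path(str(out_stem) + suffix)
        concatenate([Path(str(stem) + suffix) for stem in stems], out)
        outputs[suffix] = out
    return outputs
```

For example, `merge_per_type(["A", "B"], [".fasta", ".names", ".groups"])` would write merged.fasta, merged.names, and merged.groups. Using the same sample list for every file type keeps the three merged files describing the same set of sequences.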