Merging data sets

SandraBA · January 25, 2012, 1:05pm

Hi,
I have two 454 16S data sets that I would like to analyze together; in total they contain 59 barcoded samples. The samples are of the same type and share most of their bacteria, but they come from two different sequencing runs. I figured it would make sense to analyze them together to get the same OTUs etc., unless there’s a good reason not to do that?

If there isn’t, when should I ideally merge them: before the ‘unique.seqs’ step, by merging the fasta and names files?

Any input is much appreciated!

I’m now running shhh.flows and think there is a small error in the manual page: it says ‘shhh.flows(files=…)’ but it should probably be ‘shhh.flows(file=…)’. Nothing major, though.

Thank you, Sandra

pschloss · January 25, 2012, 4:50pm

You might give them different group names when you merge them at first so you can look at the technical variation between runs. We’ve done this in the past and then averaged the relative abundances (not the counts).
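To illustrate why averaging relative abundances differs from averaging (or pooling) counts, here is a minimal Python sketch. It is not part of mothur; the function names and the OTU counts are made up for illustration, and it assumes each run is represented as a simple mapping from OTU label to read count.

```python
def relative_abundances(counts):
    """Convert raw OTU counts for one run into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("run has no sequences")
    return {otu: n / total for otu, n in counts.items()}


def average_relative_abundances(runs):
    """Average per-run relative abundances, giving each run equal weight."""
    rel = [relative_abundances(r) for r in runs]
    otus = sorted({otu for r in rel for otu in r})
    return {otu: sum(r.get(otu, 0.0) for r in rel) / len(rel) for otu in otus}


# Hypothetical counts for the same sample sequenced in two runs of different depth.
run_a = {"Otu1": 80, "Otu2": 20}      # 100 reads in total
run_b = {"Otu1": 500, "Otu2": 500}    # 1000 reads in total

# Each run contributes equally, regardless of how many reads it produced.
averaged = average_relative_abundances([run_a, run_b])

# Pooling the raw counts instead lets the deeper run dominate the result.
pooled = relative_abundances({"Otu1": 580, "Otu2": 520})
```

With these made-up numbers the averaged relative abundance of Otu1 is 0.65, while pooling the counts gives roughly 0.53, because the deeper run carries ten times the weight.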

SandraBA · February 1, 2012, 8:37am

Thanks for the input, and sorry for not replying sooner. I wanted to give it a try first, but I’m still stuck at denoising my data, hence the new post…

alice508 · February 2, 2012, 12:01pm

I am having a similar problem relating to merging datasets. I am trying to compare four different samples together to end up with some OTU beta diversity statistics. I have four different sff files for four samples analyzed by 454. The sequencing company provided the sff files with the sample barcodes and primers removed.

For each sample, I run sffinfo, trim.flows, and shhh.flows. After shhh.flows, I end up with the output files SAMPLE.trim…shhh.fasta, SAMPLE.trim…shhh.names, and SAMPLE.trim…shhh.groups (in addition to SAMPLE.shhh.fasta and SAMPLE.shhh.names, but NO SAMPLE.shhh.groups). I then use the merge.files command to obtain merged.trim…shhh.fasta, merged.trim…shhh.names, and merged.trim…shhh.groups, and proceed to follow the Schloss SOP with the merged files. When I get to the pre.cluster step:

pre.cluster(fasta=merged.trim…shhh.trim.unique.good.filter.unique.fasta, name=merged.trim…shhh.trim.unique.good.filter.names, group=merged.trim…shhh.good.groups, diffs=2)

I receive an error message: “[ERROR]: Your name file contains 301 valid sequences, and your groupfile contains 2259, please correct.”

Any advice on how to overcome this issue? I do not see a step where another group file would be created in the SOP. I have also tried merging files after the trim.seqs step in the SOP (instead of after the shhh.flows step), but I ended up with the same error. And I tried using ‘merged.shhh.trim.unique.good.filter.fasta’ (instead of the merged.trim…shhh.trim… file) but that didn’t help either.

Thanks

pschloss · February 2, 2012, 6:12pm

After running shhh.flows you need to run trim.seqs again. Then I’d merge all of your different files.
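When a name file and a group file disagree on sequence counts, as in the pre.cluster error quoted above, it can help to see which sequence names appear in only one of the two files. The following is a rough Python sketch, not a mothur command. It assumes the usual layouts (name file: representative name, then a comma-separated list of all sequences it represents; group file: sequence name, then group) and that the counts in the error message refer to the sequences listed in each file. The function names and sample data are hypothetical.

```python
def names_file_sequences(lines):
    """Collect every sequence listed in the second column of a name file."""
    seqs = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        seqs.update(fields[1].split(","))
    return seqs


def group_file_sequences(lines):
    """Collect every sequence name listed in the first column of a group file."""
    seqs = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        seqs.add(fields[0])
    return seqs


def compare(names_lines, group_lines):
    """Return (only_in_names, only_in_groups) as sorted lists of sequence names."""
    in_names = names_file_sequences(names_lines)
    in_groups = group_file_sequences(group_lines)
    return sorted(in_names - in_groups), sorted(in_groups - in_names)


# Tiny made-up inputs; with real files, pass open(path) instead of these lists.
names_lines = ["seqA\tseqA,seqB\n", "seqC\tseqC\n"]
group_lines = [
    "seqA\tsample1\n",
    "seqB\tsample1\n",
    "seqC\tsample2\n",
    "seqD\tsample2\n",
]

only_in_names, only_in_groups = compare(names_lines, group_lines)
```

In this toy example seqD appears in the group file but not in the name file. A check like this only shows where the two files diverge; it does not fix them, and the advice above about rerunning trim.seqs before merging still applies.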
