combining sff files from different runs

Hi there

So I am a super newbie, not only to mothur but to next-gen sequencing in general.
Please bear with me.

I have single sff files for all my samples from 2 sequencing runs.
They are water and soil samples from two time points, and I would like to compare their populations.

Can someone please advise me on the best way to combine the data?

I have been using the SOP 454 and have found it extremely helpful especially for beginners.
Big ups to the authors for doing such a great job with transfer of knowledge!!

At what step and how (the specific command) should I combine the data?

Please remember I'm very new to this and need a good breakdown if possible.

The sff.multiple command is probably going to be your best bet here. It basically runs through the unpacking/denoising/trimming process for a list of sff files, then merges the outputs into a single file.

You’ll need to start off by creating an oligos file. There’s an entry on the wiki for what these should look like, but for 454 data you’ll generally have a text file listing the forward primer sequence, followed by one line per barcode with the sample name it maps to. It should look something like this:

forward CCGTCAATTCMTTTRAGT
barcode AATGGTAC Sample1
barcode AACCTGGC Sample2
barcode TTCGTGGC Sample3
barcode TTCTTGAC Sample4

Obviously you’ll need to fill in your own primer/barcode sequences as needed. I’d recommend you do a quick test on one of your sff files to make sure you have the format correct. Just run the sffinfo and trim.flows commands (covered here) on one of your sff files and check the trim and scrap files that you get back. It’s also worth noting that for the denoising process mothur uses a min/max flow length of 450. If you’ve sequenced a very short fragment (<200 bp or so) this might be too long, so running this test is a good way to check whether that length is appropriate for your data.
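
As a rough sketch of that test, the two commands might look like this for a single file. The names sample1.sff and samples.oligos are placeholders for your own files, and pdiffs=2 / bdiffs=1 are example values in the style of the 454 SOP, not a recommendation for your data:

```
# unpack the sff file and write out the flowgrams (creates sample1.flow)
sffinfo(sff=sample1.sff, flow=T)
# split by barcode and trim the flowgrams using the oligos file
trim.flows(flow=sample1.flow, oligos=samples.oligos, pdiffs=2, bdiffs=1, processors=2)
```

If most of your reads end up in the scrap file, the oligos file and the flow length are the first things to check. trim.flows has minflows and maxflows options if 450 turns out to be too long for your fragment.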

A nice trick with the sff.multiple command is that you can use a single oligos file for all your samples. Once you’ve got the oligos file sorted, you need to make the input file for the sff.multiple command. This is a single tab-delimited text file with an sff file name and its corresponding oligos file on each line. You can use the same oligos file for each entry, as shown on the sff.multiple wiki page.
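
For illustration, an input file for four sff files sharing one oligos file could look like the following. All of the file names here are made up, and the two columns are separated by a tab:

```
run1_water.sff	samples.oligos
run1_soil.sff	samples.oligos
run2_water.sff	samples.oligos
run2_soil.sff	samples.oligos
```

Listing the files from both runs in the same input file is what gets them combined, since sff.multiple merges the outputs at the end.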

That’s pretty much all it takes. Once you’ve got this file together it’s just a matter of starting the sff.multiple command and letting it run. You can set it to use multiple processors, which will drastically reduce the time needed to complete the denoising, but to what degree you can do this depends on how much RAM your computer has. If you’re working on any kind of cluster or heavy-duty bioinformatics computer then you probably don’t need to worry about this, but if you’re using your desktop PC you probably want to keep the number on the low side so you don’t overload the machine.
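
A minimal sketch of the command, assuming the input file was saved as sff_files.txt (a hypothetical name) and that two processors is a safe number for your machine:

```
# process every sff/oligos pair listed in the input file and merge the results
sff.multiple(file=sff_files.txt, processors=2)
```

Start with a low processors value and raise it only if memory use stays comfortable.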

Once this has finished you’ll have the standard *.fasta, *.names and *.groups files, which you can use for the rest of the 454 SOP.

Thanx so much for the help!

Will give it a try.