Split one sff file for sequence submission to NCBI

Hi there:

I have two sff files from two separate 454 runs. To submit the raw data to the NCBI SRA database, I need to do the following two things:

  1. Combine the two sff files together to make one big sff file;
  2. Each 454 run contains 20 bacterial samples distinguished by 20 different barcodes. How can I split one big sff file into 20 sff files (or any other file format that NCBI accepts), one per sample?

This is my first time dealing with 454 runs and data submission, so it would be great if someone could give me a detailed procedure for the questions above. Thanks a TON!

Hi,

I recently struggled with the same thing, so here’s one solution. It doesn’t use mothur, though, but sfffile, which I think is Roche/454’s own program. If you don’t have it, I’m sure there are other freely available tools that can do the same thing.

  1. Mothur has the merge.sfffiles command (http://www.mothur.org/wiki/Merge.sfffiles), though I don’t know why you’d need one huge sff file in the first place?

  2. With 454’s sfffile command, you can make a list (called a MIDConfig.parse file) that includes your sample barcodes. Give that list and all your sff files (no need to merge them) as input to sfffile, and it will make a separate sff file for each sample/barcode. Here’s a more detailed description of how it’s done: https://mulcyber.toulouse.inra.fr/plugins/mediawiki/wiki/ng6/index.php/454_demultiplex
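To make the idea of splitting by barcode concrete, here is a minimal Python sketch of what demultiplexing does in principle. It is not how sfffile works internally and it ignores everything sff-specific (flowgrams, key sequences, quality scores). The sample names, barcodes, reads, and the `max_mismatches` parameter are all made up for illustration.

```python
from collections import defaultdict

def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def assign_sample(seq, barcodes, max_mismatches=0):
    """Return the sample whose barcode best matches the start of seq, or None."""
    best_sample, best_dist, tie = None, None, False
    for sample, barcode in barcodes.items():
        prefix = seq[:len(barcode)]
        if len(prefix) < len(barcode):
            continue  # read is shorter than the barcode
        dist = hamming(prefix, barcode)
        if best_dist is None or dist < best_dist:
            best_sample, best_dist, tie = sample, dist, False
        elif dist == best_dist:
            tie = True  # two barcodes match equally well
    if best_dist is None or best_dist > max_mismatches or tie:
        return None
    return best_sample

def demultiplex(reads, barcodes, max_mismatches=0):
    """Group read names by sample; unmatched reads go to 'unassigned'."""
    bins = defaultdict(list)
    for name, seq in reads:
        sample = assign_sample(seq, barcodes, max_mismatches)
        bins[sample if sample is not None else "unassigned"].append(name)
    return dict(bins)

# Hypothetical barcodes and reads, for illustration only.
barcodes = {"sampleA": "ACGAGTGC", "sampleB": "TCTCTATG"}
reads = [
    ("read1", "ACGAGTGCGGATTAGATACC"),
    ("read2", "TCTCTATGGGATTAGATACC"),
    ("read3", "ACGAGTGAGGATTAGATACC"),  # one mismatch against sampleA
    ("read4", "GGGGGGGGGGATTAGATACC"),  # matches no barcode
]
bins = demultiplex(reads, barcodes, max_mismatches=1)
for sample, names in sorted(bins.items()):
    print(sample, names)
```

Reads that match no barcode, or that match two barcodes equally well, are kept in a separate bin rather than guessed at, which is usually the safer choice before a submission.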

Apparently the mothur developers are working on a make.sra command, so hopefully this will be easier in the future :)

You can use mothur to do the whole thing :). As mentioned above, the merge.sfffiles command (http://www.mothur.org/wiki/Merge.sfffiles) can be used to combine the files. The sffinfo command (http://www.mothur.org/wiki/Sffinfo) can then be used to parse the sff file by sample.

mothur > merge.sfffiles(sff=yourFirstFile-yourSecondFile, output=merged.sff) - merge files
mothur > sffinfo(sff=merged.sff, oligos=yourOligosFile, pdiffs=2, bdiffs=1) - parse by sample
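If you need to build the oligos file for 20 samples, a small script can save some typing. The Python sketch below assumes the usual mothur oligos layout as I understand it from the wiki: one `forward` line with the primer, then one `barcode` line per sample with the barcode followed by the sample name. The primer, barcodes, and sample names are placeholders, and `format_oligos` is a made-up helper, not part of mothur. Please check the result against the oligos file documentation for your mothur version.

```python
def format_oligos(forward_primer, sample_barcodes):
    """Build oligos-file text from a primer and a {sample: barcode} mapping."""
    seen = {}
    lines = [f"forward\t{forward_primer}"]
    for sample, barcode in sample_barcodes.items():
        barcode = barcode.upper()
        if set(barcode) - set("ACGT"):
            raise ValueError(f"{sample}: barcode has non-ACGT characters")
        if barcode in seen:
            raise ValueError(f"{sample} and {seen[barcode]} share a barcode")
        seen[barcode] = sample
        lines.append(f"barcode\t{barcode}\t{sample}")
    return "\n".join(lines) + "\n"

# Placeholder primer and barcodes, for illustration only.
sample_barcodes = {
    "sample01": "ACGAGTGCGT",
    "sample02": "ACGCTCGACA",
}
oligos_text = format_oligos("CCGTCAATTCMTTTRAGT", sample_barcodes)
print(oligos_text, end="")
# To save it: open("yourOligosFile", "w").write(oligos_text)
```

The duplicate-barcode check is there because two samples sharing a barcode cannot be separated afterwards, so it is better to fail early.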

Note: the fastq.info command, http://www.mothur.org/wiki/Fastq.info, has similar functions for fastq data generated by MiSeq.

Thanks a TON for your detailed info :) It is so helpful! I will try both methods!

I am happy to have found this post. However, I wish this info (the ability to split sff files) could be added to the “General Commands” page, right below the “merge.sfffiles” command.

Thanks again,

Ousama

Hello everyone,

I’m trying to split one sff file into multiple sff / fasta+qual+flow files.
I thought it was possible to do this with the sffinfo command, using an oligos file to separate the samples:

mothur > sffinfo(sff=allData.sff,oligos=primerB.oligos,flow=T)

oligos is not a valid parameter.
The valid parameters are: sff, accnos, sfftxt, flow, trim, fasta, name, inputdir, and outputdir.
[ERROR]: did not complete sffinfo.

Am I doing something wrong?

Thank you,
JMarcelino

What version of mothur are you using?