Using Sffinfo to parse by MID

I’m prepping some SFF files for upload to the SRA. Or at least I’m trying to. Files that I’ve manipulated dozens of times in mothur are suddenly giving me problems. I’ve tried this in both versions 1.31.2 and 1.33.3 on my Ubuntu 12.04 64-bit system.

sffinfo(sff=myfile.sff, fasta=F, qfile=F, trim=F, flow=F, oligos=122010.oligos)

On both versions mothur generates the individual temp sff files for each MID, but at the end of the command everything gets dumped in the scrap file. I’ve tried multiple pairs of sff and oligos files, all of which have been previously analyzed, and nothing works. I’ve grabbed copies from other drives in case the oligos files somehow got corrupted. Same problem. I also tried running everything in Windows. Same problem.

Also, oddly, I’ve always named my oligos files something like 12-2010.oligos, and suddenly versions of mothur that I’ve used these files with before are refusing to accept the hyphen in the name. It’s an easy enough change, but I can’t figure out why it’s a problem now.

Please help. This is already a horrendously tedious process, and I haven’t had this much trouble operating mothur in years.

Have you tried with pdiffs and bdiffs?

Yes, tried the standard settings from the SOP for trim.seqs. No luck.

It’s looking like we can chalk this one up to “unexpected behaviors” (it was unexpected to me anyway).

  1. This produces the parsed sff files plus pooled fasta, qual, flow, and scrap files:
    sffinfo(sff=myfile.sff, oligos=122010A.oligos, bdiffs=1, pdiffs=2)

  2. This puts everything in the scrap file:
    sffinfo(sff=myfile.sff, fasta=F, qfile=F, trim=F, flow=F, oligos=122010A.oligos, bdiffs=1, pdiffs=2)

  3. This produces unparsed raw.fasta, raw.qual, flow, and scrap files:
    sffinfo(sff=myfile.sff, trim=F, oligos=122010A.oligos, bdiffs=1, pdiffs=2)

I am now attempting to determine whether or not the sequences are being trimmed in #1 (I want raw sequences minus the MID). For the record, I’m working with v1.33.3 on Win8 today.

[Edit] Confirmed, #1 is trimming the sequences. What is being trimmed off here? The documentation says it’s being trimmed to the “clipQualLeft and clipQualRight values”, but I cannot find where those values are described.

If you run sffinfo(sfftxt=T) on the sff files you generated and open them, you’ll see that for each sequence the clip values are set there.
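
If it helps to see what those clip values do, here is a rough Python sketch that pulls them out of the sff.txt text and applies them to a read. It assumes the header fields appear as lines like "Clip Qual Left: 5" under a ">readname" line, which is worth checking against your own output. The function names (parse_clips, clip_bounds, trim) and the example read are made up for illustration. The clip positions are treated as 1-based and inclusive, with 0 meaning "no clip", as in the SFF format; whether mothur also applies the adapter clip values when trimming is an assumption here, not something confirmed in this thread.

```python
import re

# Assumed layout of the header fields in sff.txt, e.g. "  Clip Qual Left: 5".
CLIP_FIELD = re.compile(r"^\s*(Clip (?:Qual|Adap) (?:Left|Right)):\s*(\d+)\s*$")


def parse_clips(sfftxt):
    """Return {read_name: {field_name: value}} from the text of an sff.txt file."""
    reads = {}
    current = None
    for line in sfftxt.splitlines():
        if line.startswith(">"):
            # A new read record starts here.
            current = line[1:].strip()
            reads[current] = {}
            continue
        match = CLIP_FIELD.match(line)
        if match and current is not None:
            reads[current][match.group(1)] = int(match.group(2))
    return reads


def clip_bounds(clips, n_bases):
    """Combine quality and adapter clips into a 1-based inclusive (left, right)."""
    left = max(1, clips.get("Clip Qual Left", 0), clips.get("Clip Adap Left", 0))
    rights = [clips.get("Clip Qual Right", 0), clips.get("Clip Adap Right", 0)]
    # A right clip of 0 means "not set", so fall back to the read length.
    right = min(r if r > 0 else n_bases for r in rights)
    return left, right


def trim(bases, clips):
    """Keep only the bases between the left and right clip points."""
    left, right = clip_bounds(clips, len(bases))
    return bases[left - 1:right]


# Hypothetical record, not taken from a real file.
example = """>READ001
  Clip Qual Left: 5
  Clip Qual Right: 12
  Clip Adap Left: 0
  Clip Adap Right: 0
"""
clips = parse_clips(example)["READ001"]
print(clip_bounds(clips, 16))           # (5, 12)
print(trim("tcagACGTACGTACgt", clips))  # ACGTACGT
```

In this made-up read, the left clip of 5 drops the first four bases and the right clip of 12 drops everything after base 12, which is the kind of trimming you would see in a trimmed fasta compared with the raw one.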