Hi,
I have a problem when running sffinfo: I always get some extra, unwanted sequence at the start of each read, which makes it difficult to sort the reads by barcode using trim.flows. Simply adding this extra sequence to my barcodes does not work, because it varies a bit between reads and runs, so too many good sequences end up as scrap.
I've compared FASTA extraction using sffinfo and Biopython's SeqIO.convert, and here is an example of what I get:
Mothur sffinfo (trim=F):
HCZNHVY02GP28Z xy=2640_1633
gactACTCTCGTGTTACGGCCAGAGTCGGAGACTGGGGACTTCCTGGTAAAGAACGTTGCTCCGGGCCGGCCTAGTCGACTGCCAAGGCACACaggggataggn
Mothur sffinfo (trim=T):
HCZNHVY02GP28Z xy=2640_1633
ACTCTCGTGTTACGGCCAGAGTCGGAGACTGGGGACTTCCTGGTAAAGAACGTTGCTCCGGGCCGGCCTAGTCGACTGCCAAGGCACAC
Biopython raw:
HCZNHVY02GP28Z
gactactctcgtgttacggccAGAGTCGGAGACTGGGGACTTCCTGGTAAAGAACGTTGC
TCCGGGCCGGCCTAGTCGACTGCCAAGGCACACAggggataggn
Biopython trimmed:
HCZNHVY02GP28Z
AGAGTCGGAGACTGGGGACTTCCTGGTAAAGAACGTTGCTCCGGGCCGGCCTAGTCGACTGCCAAGGCACACA
The Biopython trimming is always perfect (in the example above, my barcode is AGAGTC). The FASTA files provided by my sequencing facility are also fine (they told me they use a Roche tool for the conversion).
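To make the difference concrete, here is a minimal Python sketch that compares the four sequences from the example read above. The helper names (leading_extra, lowercase_prefix_len) are made up for illustration and are not part of mothur or Biopython; the sequences are copied verbatim from this post. It only measures what is visible in the output and does not explain why the two tools disagree.

```python
# Sequences copied verbatim from the example read HCZNHVY02GP28Z in this post.
MOTHUR_RAW = (
    "gactACTCTCGTGTTACGGCCAGAGTCGGAGACTGGGGACTTCCTGGTAAAGAACGTTGC"
    "TCCGGGCCGGCCTAGTCGACTGCCAAGGCACACaggggataggn"
)
MOTHUR_TRIMMED = (
    "ACTCTCGTGTTACGGCCAGAGTCGGAGACTGGGGACTTCCTGGTAAAGAACGTTGC"
    "TCCGGGCCGGCCTAGTCGACTGCCAAGGCACAC"
)
BIOPYTHON_RAW = (
    "gactactctcgtgttacggccAGAGTCGGAGACTGGGGACTTCCTGGTAAAGAACGTTGC"
    "TCCGGGCCGGCCTAGTCGACTGCCAAGGCACACAggggataggn"
)
BIOPYTHON_TRIMMED = (
    "AGAGTCGGAGACTGGGGACTTCCTGGTAAAGAACGTTGC"
    "TCCGGGCCGGCCTAGTCGACTGCCAAGGCACACA"
)
BARCODE = "AGAGTC"


def leading_extra(read: str, barcode: str) -> str:
    """Return the bases before the first occurrence of barcode.

    Returns an empty string if the read starts with the barcode
    or does not contain it at all (hypothetical helper).
    """
    pos = read.find(barcode)
    return read[:pos] if pos > 0 else ""


def lowercase_prefix_len(read: str) -> int:
    """Count the leading lowercase bases, i.e. the part marked as clipped."""
    count = 0
    for base in read:
        if not base.islower():
            break
        count += 1
    return count


# Bases that sffinfo (trim=T) leaves in front of the barcode.
mothur_extra = leading_extra(MOTHUR_TRIMMED, BARCODE)
# Same check on the Biopython output, which starts directly at the barcode.
biopython_extra = leading_extra(BIOPYTHON_TRIMMED, BARCODE)

print("extra bases before barcode (mothur):", mothur_extra, len(mothur_extra))
print("extra bases before barcode (Biopython):", repr(biopython_extra))
print("lowercase bases at start, mothur raw:", lowercase_prefix_len(MOTHUR_RAW))
print("lowercase bases at start, Biopython raw:", lowercase_prefix_len(BIOPYTHON_RAW))
```

For this read, the untrimmed outputs already differ in how many leading bases are shown in lowercase, and the extra bases left by sffinfo are exactly the ones that Biopython shows in lowercase but mothur shows in uppercase.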
I get the same result using v1.23 on my Mac and v1.22 on Windows.
Any ideas?
Thanks!
Marc