Help with sff file processing

Hi,

I am using my 454 data for OTU analysis in mothur, and I got confused after converting my sff file to a fasta file. Sequencing information: platform 454 FLX, flow pattern TACG, barcode (AAAAAAAC) removed by the sequencing center, primer GAGTTTGATCNTGGCTCAG.

However, I have trouble understanding the part of the sequence from the 5th base to the 12th base. The primer starts at the 13th base. Below is the fasta output from different toolkits.

sff_extract (from seq_crumbs toolkit) with clipping:

GAGTTTGATCCTGGCTCAGATTGAACGCTGG…

sff_extract (from seq_crumbs toolkit) without clipping:

tcagagagcgaaGAGTTTGATCCTGGCTCAGATTGAACGCTGG…
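For context, a 454 read normally begins with the 4-base key TCAG. In the unclipped output above, the bases outside the sff clip points are shown in lowercase. The minimal sketch below splits that unclipped read into key, an 8-base tag, primer, and insert. It puts AGAGCGAA at positions 5 to 12 and the primer at position 13, which matches the description above. It assumes the key is TCAG and that the tag is 8 bases long, which is inferred from this one read. `split_read` and `matches_primer` are hypothetical helpers, not part of mothur or seq_crumbs.

```python
# Hypothetical sketch: split an unclipped 454 read into its leading segments.
# Assumptions (not taken from mothur or seq_crumbs): the read starts with the
# standard 4-base 454 key "TCAG", followed by an 8-base tag, then the primer.

# IUPAC nucleotide codes, so a primer base such as "N" matches any base.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_primer(seq: str, primer: str) -> bool:
    """True if seq starts with primer, treating IUPAC codes in primer as wildcards."""
    if len(seq) < len(primer):
        return False
    return all(base in IUPAC[code] for base, code in zip(seq, primer))


def split_read(read: str, primer: str, key: str = "TCAG", tag_len: int = 8):
    """Return (key, tag, primer, insert); raise ValueError if the layout doesn't fit."""
    read = read.upper()  # the clipped bases are lowercase in the unclipped output
    if not read.startswith(key):
        raise ValueError("read does not start with the 454 key")
    tag_start = len(key)
    primer_start = tag_start + tag_len
    if not matches_primer(read[primer_start:], primer):
        raise ValueError("primer not found right after the tag")
    insert_start = primer_start + len(primer)
    return (read[:tag_start], read[tag_start:primer_start],
            read[primer_start:insert_start], read[insert_start:])


if __name__ == "__main__":
    parts = split_read("tcagagagcgaaGAGTTTGATCCTGGCTCAGATTGAACGCTGG",
                       "GAGTTTGATCNTGGCTCAG")
    print(parts)
```

Run on the read above, this yields the key TCAG, the tag AGAGCGAA, the primer GAGTTTGATCCTGGCTCAG, and the insert ATTGAACGCTGG. The mothur output below starts at the tag, so there the key has been removed but the 8 tag bases have not.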

mothur output after denoising:

AGAGCGAAGAGTTTGATCCTGGCTCAGATTGAACGCTGG…

Can anyone help me understand the agagcgaa part of the sequence? According to the sequencing center, it is not part of the barcode. How should I deal with it? For example, is there a way to remove this region in mothur? Thank you!

I’m betting you got your data from MrDNA. For some reason they insist on giving people the wrong barcodes. Your barcode isn’t really AAAAAAAC. That would be a horrible barcode. You need to find out from them what the true barcodes were and then use that in trim.flows/trim.seqs.
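In mothur, trim.seqs and trim.flows take the barcodes and primer from an oligos file. They then strip those bases from each read and assign the read to a sample. The sketch below only illustrates that idea in plain Python. It is not mothur's implementation. It assumes the reads already have the TCAG key removed, as in the mothur output above. It also treats AGAGCGAA as a barcode, which is what the reply suggests but has not been confirmed. The sample name `sampleA`, the helper names, and the `max_primer_mismatches` parameter are hypothetical. The parameter is only loosely analogous to mothur's `pdiffs`.

```python
# Hypothetical sketch of barcode + primer trimming (not mothur's code).
# Assumes reads start directly with the barcode (454 key already removed).
from typing import Dict, Optional, Tuple

# IUPAC codes used in the primer; "N" matches any base.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_matches(seq: str, primer: str, max_mismatches: int = 0) -> bool:
    """True if the start of seq matches primer with at most max_mismatches mismatches."""
    if len(seq) < len(primer):
        return False
    mismatches = sum(1 for base, code in zip(seq, primer) if base not in IUPAC[code])
    return mismatches <= max_mismatches


def trim_read(read: str, barcodes: Dict[str, str], primer: str,
              max_primer_mismatches: int = 0) -> Optional[Tuple[str, str]]:
    """Return (sample, trimmed sequence), or None if no barcode + primer matched."""
    read = read.upper()
    for barcode, sample in barcodes.items():
        if read.startswith(barcode):
            rest = read[len(barcode):]
            if primer_matches(rest, primer, max_primer_mismatches):
                return sample, rest[len(primer):]
    return None


if __name__ == "__main__":
    barcodes = {"AGAGCGAA": "sampleA"}  # hypothetical barcode-to-sample map
    print(trim_read("AGAGCGAAGAGTTTGATCCTGGCTCAGATTGAACGCTGG",
                    barcodes, "GAGTTTGATCNTGGCTCAG"))
```

With AGAGCGAA listed as a barcode, the mothur read above is assigned to `sampleA` and trimmed to ATTGAACGCTGG. A read that starts with the reported AAAAAAAC would not match and would be discarded. This is why using the true barcodes matters.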

Pat

Thank you so much for your reply. What do you think about the sequence AGAGCGAA? Should I treat it as a barcode, or is it adapter?

Hi Pat,

I double-checked with the sequencing center. They added an artificial barcode to each sequence in a separate fasta file, and AGAGCGAA belongs to the adapter region. When I process the sff in QIIME with process_sff.py, AGAGCGAA is trimmed automatically. I suspect I got a setting wrong in mothur, or that there is a bug in how mothur processes the sff format. Thank you!

I’m pretty sure that AGAGCGAA is the barcode unless you had the only sample on the sequencing run. It may include an adapter, but there’s probably a barcode in there somewhere. I’d insist that they tell you what the actual barcodes were.