Denoising 454 sequences before splitting samples by barcodes due to bidirectional 454 data

Hello All,

I am currently processing some bidirectional 454 data from a published manuscript. I have received the raw .sff files from the authors. The sequencing was done bidirectionally, and the only real outcome of this is that ~1/2 of the sequences are forward and the other half are reverse (still the same fungal ITS2 region, just half of them are reverse complemented). This means that to get through trim.seqs I need to run it twice: once on the forward reads, and once on the reverse-complemented reads. That is no big deal. It is a big deal when I go to denoise, though, as the output of trim.flows can’t easily be reverse complemented. Even if it could, what would be ideal is to denoise before I start demultiplexing the sequences by barcode, rather than demultiplexing the forward set and then doing the same on the reverse-complemented set. Below I propose a solution, but I’d like feedback on a few things.

  1. I use sffinfo in mothur to generate a .flow file from my .sff file.
  2. I use trim.flows but do not pass a barcode mapping file. This way all barcodes are retained, but 454-adapter sequences and other things represented by lower case bases in the .sff file are trimmed.
  3. I pipe the output of trim.flows to shhh.flows to do denoising. This will generate fasta and qual files, and I can move on to demultiplex with my barcode file from there.

The only potential hang-up I can foresee is that the 454 adapter sequence and other things represented by lower case bases in the raw .sff file will be retained when I choose not to specify a mapping file in trim.flows. If so, I will have a bunch of trouble assigning OTUs after denoising.

Is there any other problem I haven’t thought of with denoising before demultiplexing? Or anything else I’ve proposed?

However, it is a big deal when I go to denoise, as the output of trim.flows can’t easily be reverse complemented.

It can, actually. If you run trim.seqs you can say flip=T to get the reverse complement. Regardless, you’ll want to analyze the forward reads separately from the reverse reads through the entire pipeline.
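
For anyone unsure what reverse complementing does to a read, here is a minimal Python sketch of the operation. It is an illustration only, not mothur's code, and the example sequence is arbitrary rather than taken from this dataset.

```python
# Illustration only: not mothur's implementation, just the reverse-complement
# operation that flip=T is described as applying to each read.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    # Complement every base, then reverse the whole string.
    return seq.translate(_COMPLEMENT)[::-1]


# Arbitrary example sequence standing in for a read in the reverse orientation.
reverse_read = "GCATATCAATAAGCGGAGGA"
print(reverse_complement(reverse_read))
```

Applying the function twice returns the original sequence, which is a quick sanity check that reads have been flipped exactly once.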

Pat

Hi Pat. Thanks for your response. To clarify: do you mean trim.flows can generate the reverse complement of the .flow file?

No, trim.flows doesn’t want the RC’d sequences. All of the sequences are in the direction they were sequenced, so they should all start with the appropriate barcode and primer sequence, even if it was the reverse barcode/primer. You’ll include all of the primers and barcodes in your oligos file and run trim.flows. You should give a name to each primer and barcode in the oligos file. After trim.flows runs, you can edit the files file that it outputs to create two files: one listing the forward reads and one listing the reverse reads. Run those through shhh.flows separately. Then run trim.seqs on the shhh.flows output, using flip=T for the reverse reads and flip=F for the forward reads.
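
The step of editing the files file could be scripted along these lines. This is only a sketch under assumptions: it assumes the files file lists one flow-file path per line and that each path contains the primer name you gave in the oligos file. The primer names (ITS2fwd, ITS2rev) and file names below are hypothetical, so check them against what trim.flows actually wrote before relying on it.

```python
def split_by_primer(paths, forward_name, reverse_name):
    """Partition flow-file paths by which primer name appears in each path.

    Assumption: the primer names from the oligos file show up in the
    flow-file names. Paths matching neither or both names are returned
    separately so they can be checked by hand.
    """
    forward, reverse, unmatched = [], [], []
    for path in paths:
        path = path.strip()
        if not path:
            continue  # skip blank lines
        has_fwd = forward_name in path
        has_rev = reverse_name in path
        if has_fwd and not has_rev:
            forward.append(path)
        elif has_rev and not has_fwd:
            reverse.append(path)
        else:
            unmatched.append(path)
    return forward, reverse, unmatched


# Hypothetical contents of the files file written by trim.flows.
example_paths = [
    "run1.ITS2fwd.sampleA.flow",
    "run1.ITS2rev.sampleA.flow",
    "run1.ITS2fwd.sampleB.flow",
    "run1.scrap.flow",
]
fwd, rev, other = split_by_primer(example_paths, "ITS2fwd", "ITS2rev")
print(len(fwd), len(rev), len(other))
```

Each list can then be written to its own text file, one path per line, and passed to shhh.flows through its file parameter in two separate runs. Anything that lands in the unmatched list is worth inspecting rather than silently dropping.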

Make sense?
Pat