Processing paired end reads separately

Hi all,

I am trying to re-analyze some previously published data and am wondering if anyone has a suggested way of dealing with paired-end reads separately. In brief, the original authors amplified with modified 799F and modified 1115R primers and sequenced on an Illumina HiSeq with 150-bp paired-end reads. By my calculations that is a 300 bp region covered by 150 bp paired-end reads, so unless I am missing something there is no way these reads can be assembled, and it is unclear to me how the authors overcame this obstacle. When I try to align the assembled paired-end reads I get nonsense. However, if I align the forward and reverse reads separately and then import them into ARB, the reads align perfectly to the SILVA reference with a ~10 bp gap between them. Please tell me if I am missing something :)
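To make the arithmetic explicit, here is a minimal Python sketch of the overlap calculation. It assumes both reads are full length and start at opposite ends of the amplicon; the function name and the 310 bp value are only illustrative, not taken from the paper.

```python
def expected_overlap(amplicon_len: int, read_len: int) -> int:
    """Bases shared by R1 and R2 when each read is read_len long and
    the two reads start at opposite ends of the amplicon.

    Zero or negative means the reads cannot be merged; a negative
    value is the size of the unsequenced gap between them.
    """
    return 2 * read_len - amplicon_len


# 300 bp region, 150 bp reads: the reads only just meet, no overlap.
print(expected_overlap(300, 150))
# A slightly longer (hypothetical) 310 bp amplicon leaves a 10 bp gap.
print(expected_overlap(310, 150))
```

Any barcode or primer bases at the start of each read count towards the 150 bp but not towards usable overlap, so the real situation can only be worse than this estimate.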

Anyway, it's ~160 samples. I have two fastq files (R1 and R2), and the authors used a combinatorial barcoding approach (see below). I could process R1 and R2 separately (using trim.seqs); however, the individual forward and reverse barcodes are reused across samples, and the uniqueness comes only from the barcode pairs. So I would basically need a separate oligos file for each sample.

Any advice would be most welcome.


barcode AAGCTA AATATC Sample1
barcode AATATC AATATC Sample2
barcode ACCCCC AATATC Sample3
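
One possible workaround, sketched in Python rather than mothur: read R1 and R2 in lockstep, look up the (forward, reverse) barcode pair, and keep only the R1 read for each matched sample. This is a rough sketch under several assumptions: the barcodes sit at the very start of each read, they are 6 bases long as in the listing above, matches are exact, and both fastq files list the reads in the same order. The function names are made up for illustration.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def read_oligos(lines: Iterable[str]) -> Dict[Tuple[str, str], str]:
    """Map (forward barcode, reverse barcode) -> sample name."""
    pairs = {}
    for line in lines:
        fields = line.split()
        if len(fields) == 4 and fields[0].lower() == "barcode":
            pairs[(fields[1].upper(), fields[2].upper())] = fields[3]
    return pairs


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from 4-line fastq text (no wrapped sequences)."""
    lines = iter(lines)
    while True:
        chunk = list(islice(lines, 4))
        if len(chunk) < 4:
            return
        header, seq, _, qual = (s.rstrip("\n") for s in chunk)
        yield header, seq, qual


def demultiplex_r1(r1: Iterable[str], r2: Iterable[str],
                   pairs: Dict[Tuple[str, str], str],
                   barcode_len: int = 6) -> Dict[str, List[Record]]:
    """Assign each read pair to a sample; return barcode-trimmed R1 reads."""
    by_sample: Dict[str, List[Record]] = {}
    for (h1, s1, q1), (_, s2, _) in zip(read_fastq(r1), read_fastq(r2)):
        key = (s1[:barcode_len].upper(), s2[:barcode_len].upper())
        sample = pairs.get(key)
        if sample is None:
            continue  # unknown barcode combination: discard the pair
        trimmed = (h1, s1[barcode_len:], q1[barcode_len:])
        by_sample.setdefault(sample, []).append(trimmed)
    return by_sample
```

Open file handles can be passed directly as `r1` and `r2`. The per-sample R1 reads could then be written back out and processed without any barcode information.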

I’d probably just analyze the R1 read since the R2 read will have a huge error rate. Then I’d use trim.seqs with one of the quality score settings (not sure what would work best).


Analysing only the R1 read is something I am considering.
However, how can you use trim.seqs to quality-filter Illumina data? The trim.seqs wiki page focuses on 454 data with a separate quality file. And how can you combine multiple R1 fastq files into one fasta file?

Cheers, Leo

We have a command, fastq.info, that will generate the fasta and qual files from a fastq file :)
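
In case it helps to see what such a conversion involves, here is a minimal Python sketch (not mothur's own code) that turns one or more fastq inputs into a single fasta text and a matching qual text. It assumes 4-line fastq records and Phred+33 quality encoding; the function names and the file names in the usage comment are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality string) from 4-line fastq text."""
    lines = iter(lines)
    while True:
        chunk = list(islice(lines, 4))
        if len(chunk) < 4:
            return
        name = chunk[0].rstrip("\n")[1:]  # drop the leading '@'
        yield name, chunk[1].rstrip("\n"), chunk[3].rstrip("\n")


def fastq_to_fasta_qual(sources: Iterable[Iterable[str]],
                        offset: int = 33) -> Tuple[str, str]:
    """Concatenate several fastq inputs into one fasta and one qual text."""
    fasta, qual = [], []
    for lines in sources:
        for name, seq, q in read_fastq(lines):
            # Decode each quality character to its integer Phred score.
            scores = " ".join(str(ord(c) - offset) for c in q)
            fasta.append(f">{name}\n{seq}\n")
            qual.append(f">{name}\n{scores}\n")
    return "".join(fasta), "".join(qual)


# Usage with hypothetical file names:
# with open("run1_R1.fastq") as a, open("run2_R1.fastq") as b:
#     fasta_text, qual_text = fastq_to_fasta_qual([a, b])
```

The resulting fasta and qual pair has the same shape as the 454-style input that the trim.seqs documentation describes, so the quality-trimming options there should apply to it.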