Processing paired-end reads separately

Hi all,

I am trying to re-analyze some previously published data and am wondering whether anyone has a suggested way of dealing with the two reads of a paired-end run separately. In brief, the original authors amplified with modified 799F and modified 1115R primers and sequenced on an Illumina HiSeq with 150-bp paired-end reads. By my calculations that is a 300-bp region covered by 150-bp paired-end reads, so unless I am missing something the reads cannot be assembled, and it is unclear to me how the authors overcame this obstacle. When I try to align the paired-end reads I get nonsense. However, if I align the forward and reverse reads separately and then import them into ARB, the reads align perfectly to the SILVA reference with a ~10-bp gap. Please tell me if I am missing something :)
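The overlap arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name is made up for illustration, and the second call assumes the primer names can be read as coordinates (799 and 1115) and ignores barcode and primer lengths.

```python
def expected_overlap(amplicon_len: int, read_len: int) -> int:
    """Bases shared by the forward and reverse read; a negative value is a gap."""
    return 2 * read_len - amplicon_len


# Figures from the post: a 300-bp region and 150-bp reads.
print(expected_overlap(300, 150))         # 0, the reads only meet end to end

# Assumption: treat the primer names 799F/1115R as positions.
print(expected_overlap(1115 - 799, 150))  # -16, i.e. a 16-bp gap
```

Either way there is no shared sequence for an assembler to work with, which would be consistent with the gap seen in ARB.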

Anyway, it's ~160 samples. I have two fastq files (R1 and R2), and the authors used a combinatorial barcoding approach (see below). I could process R1 and R2 separately with trim.seqs, but the individual forward or reverse barcodes are not unique on their own; the uniqueness comes from the barcode pairs. So I would basically need a separate oligos file for each sample.

Any advice would be most welcome.

-Jarrod

primer AGGGTTGCGCTCGTTG AACMGGATTAGATACCCKG
barcode AAGCTA AATATC Sample1
barcode AATATC AATATC Sample2
barcode ACCCCC AATATC Sample3
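
One way to get around the non-unique single barcodes, outside of trim.seqs, is to read R1 and R2 in lockstep and look the barcode pair up in a table built from the oligos lines above. The Python below is a rough sketch, not anything mothur does internally. It assumes that the first barcode column sits at the start of R1 and the second at the start of R2, that barcodes are 6 bp with no mismatches, and that both fastq files list the pairs in the same order. The function names and the two toy read pairs are invented for the example.

```python
import io
from itertools import islice

OLIGOS = """barcode AAGCTA AATATC Sample1
barcode AATATC AATATC Sample2
barcode ACCCCC AATATC Sample3"""


def parse_oligos(text):
    """Map (first barcode, second barcode) -> sample name from 'barcode' lines."""
    pairs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0].lower() == "barcode":
            pairs[(fields[1].upper(), fields[2].upper())] = fields[3]
    return pairs


def fastq_records(handle):
    """Yield (header, sequence, quality) from a 4-lines-per-record FASTQ stream."""
    while True:
        block = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(block) < 4:
            return
        yield block[0], block[1], block[3]


def assign_samples(r1_handle, r2_handle, pairs, barcode_len=6):
    """Yield (sample, R1 record) for read pairs whose two barcodes match a sample."""
    for r1, r2 in zip(fastq_records(r1_handle), fastq_records(r2_handle)):
        key = (r1[1][:barcode_len].upper(), r2[1][:barcode_len].upper())
        sample = pairs.get(key)
        if sample is not None:
            yield sample, r1


# Toy data, made up for illustration only.
r1 = io.StringIO("@pairA\nAAGCTAGGGG\n+\nIIIIIIIIII\n@pairB\nAATATCGGGG\n+\nIIIIIIIIII\n")
r2 = io.StringIO("@pairA\nAATATCCCCC\n+\nIIIIIIIIII\n@pairB\nAATATCCCCC\n+\nIIIIIIIIII\n")
for sample, (header, _seq, _qual) in assign_samples(r1, r2, parse_oligos(OLIGOS)):
    print(sample, header)
```

With the sample known for each pair, the R1 reads could then be written out per sample (or with a read-to-sample table) and analyzed on their own, so a single barcode table covers all ~160 samples.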

I’d probably just analyze the R1 reads, since the R2 reads will have a huge error rate. Then I’d use trim.seqs with one of the quality score settings (I'm not sure which would work best).
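
To give a feel for what a window-based quality setting does, here is a small Python sketch. It is not mothur's implementation and its exact trimming rule may differ from trim.seqs; the function name, the window size, and the threshold are placeholders chosen for illustration. It cuts the read at the start of the first window whose mean quality drops below the threshold.

```python
def truncate_by_window(seq, quals, window=50, min_avg=35):
    """Cut a read at the start of the first window whose mean quality is too low.

    seq is the base string and quals the matching list of integer Phred scores.
    Reads shorter than the window are judged on their full length.
    """
    if not quals:
        return seq, quals
    last_start = max(len(quals) - window, 0)
    for start in range(last_start + 1):
        chunk = quals[start:start + window]
        if sum(chunk) / len(chunk) < min_avg:
            return seq[:start], quals[:start]
    return seq, quals


# Toy read: four good bases followed by four poor ones.
trimmed_seq, trimmed_quals = truncate_by_window(
    "ACGTACGT", [40, 40, 40, 40, 10, 10, 10, 10], window=4, min_avg=30
)
print(trimmed_seq, trimmed_quals)
```

Trying a couple of thresholds on a subset of R1 and checking how many reads and bases survive is probably the quickest way to pick a setting.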

Pat

Analysing only the R1 reads is something I am considering.
However, how can you use trim.seqs to quality-filter Illumina data? The “trim.seqs” wiki page focusses on 454 data with a separate quality file. Also, how can you combine multiple R1 fastq files into one fasta file?

Cheers, Leo

We have a fastq.info command that will generate the fasta and qual files :)
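
For anyone curious what that conversion amounts to, here is a rough Python sketch of splitting fastq records into a fasta file and a 454-style qual file, and of pooling several R1 files into one pair of outputs. It is not mothur's code. The function name is hypothetical, and the Phred offset of 33 is an assumption (older Illumina pipelines wrote offset-64 scores, so check the data first).

```python
import io


def fastq_to_fasta_qual(fastq_handles, fasta_out, qual_out, phred_offset=33):
    """Append every record of each FASTQ stream to one FASTA and one QUAL stream.

    Returns the number of records written. Assumes 4 lines per record.
    """
    count = 0
    for handle in fastq_handles:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                break  # end of this input file
            seq = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line, ignored
            qual = handle.readline().rstrip("\n")
            name = header[1:]  # drop the leading '@'
            scores = " ".join(str(ord(c) - phred_offset) for c in qual)
            fasta_out.write(f">{name}\n{seq}\n")
            qual_out.write(f">{name}\n{scores}\n")
            count += 1
    return count


# Toy input standing in for two R1 fastq files.
file_a = io.StringIO("@read1\nACGT\n+\nIIII\n")
file_b = io.StringIO("@read2\nGGCC\n+\n!!!!\n")
fasta, qual = io.StringIO(), io.StringIO()
print(fastq_to_fasta_qual([file_a, file_b], fasta, qual))
print(fasta.getvalue(), end="")
print(qual.getvalue(), end="")
```

With real data the StringIO objects would be replaced by files opened with `open(...)`. Read names need to stay unique across the pooled files, otherwise the fasta and qual entries cannot be matched up later.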