unique sequences and titanium chemistry

I just got my sequences yesterday. They span the V4-V6 region and are about 550 bp long on average. When I run the unique.seqs command, almost all of the reads in a run (>50,000 reads per sample) are unique. This is in contrast to the normal FLX chemistry, where the average read length is between 250 and 300 bp and running unique.seqs roughly halves the number of reads left for further processing. If I truncate the reads to 400 bp, unique.seqs again cuts the number of reads roughly in half, but if I truncate to 500 bp, most of the reads are still unique. This means that the bases after position 400 cannot be used even though they pass the quality filter, at least for my data. Has anyone experienced the same problem when using Titanium chemistry? I found this thread where the problem is more or less the same. I would be very happy if you (pschloss) could explain in more detail the solution you noted in the other thread and how to overcome the problem.

thx in advance,
Anders Jensen
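
The truncation experiment described in the post above can be illustrated with a small Python sketch. It is not mothur's unique.seqs implementation, only a toy model that assumes reads are collapsed by exact string identity after being cut to a fixed length. The function name `count_unique` and the example reads are hypothetical.

```python
from collections import Counter

def count_unique(reads, max_len=None):
    """Return (total reads, distinct sequences) after optional truncation.

    Assumption: two reads count as duplicates only if they are identical
    strings once both have been cut to max_len bases.
    """
    truncated = [r if max_len is None else r[:max_len] for r in reads]
    return len(truncated), len(Counter(truncated))

# Hypothetical toy reads: identical over the first 8 bases,
# with differences only toward the 3' end.
reads = [
    "ACGTACGTAAGG",
    "ACGTACGTAAGC",
    "ACGTACGTATGG",
    "ACGTACGTAAGG",
]

for cutoff in (None, 10, 8):
    total, unique = count_unique(reads, cutoff)
    print(f"cutoff={cutoff}: {unique} unique of {total} reads")
```

In this toy data the number of distinct sequences falls from 3 to 2 to 1 as the cutoff shortens, which mirrors the pattern reported above: if differences between reads concentrate near the 3' end, truncation removes them and more reads collapse together.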

I found out that the problem is the quality scores. When I run the trim.seqs command following the recommendations in the Costello example (trim.seqs(fasta=nose.fasta, maxambig=0, maxhomop=8, oligos=nose.oligos, qfile=nose.qual, pdiffs=2, minlength=200, qwindowaverage=35, qwindowsize=50, processors=2)), I go from:
             Start  End   NBases  Ambigs  Polymer
Minimum:     1      31    31      0       1
2.5%-tile:   1      384   384     0       4
25%-tile:    1      519   519     0       4
Median:      1      528   528     0       5
75%-tile:    1      536   536     1       5
97.5%-tile:  1      560   560     2       6
Maximum:     1      1183  1183    14      31

# of Seqs: 217978


to:

             Start  End   NBases  Ambigs  Polymer
Minimum:     1      202   202     0       3
2.5%-tile:   1      204   204     0       3
25%-tile:    1      244   244     0       4
Median:      1      274   274     0       4
75%-tile:    1      304   304     0       4
97.5%-tile:  1      374   374     0       6
Maximum:     1      434   434     0       8

# of Seqs: 77621

I thought that my data were rubbish, but when I did the same trimming on data from several Titanium runs published in the SRA database, I saw the same trend: the median read length dropped from around 500 bp to around 300 bp. This means that the advantage of Titanium chemistry over standard FLX chemistry is more or less gone in diversity studies based on 16S rRNA sequences, where strict trimming is necessary. I hope someone with more experience in this kind of analysis than me will comment on my observations.
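
To make the effect of the window settings in the post above (qwindowsize=50, qwindowaverage=35) more concrete, here is a rough Python sketch of sliding-window quality trimming. It is not mothur's trim.seqs code. It assumes the window moves one base at a time and that the read is cut at the start of the first window whose mean quality falls below the threshold; both the function name `window_trim_length` and that cut position are assumptions made for illustration.

```python
def window_trim_length(quals, window_size=50, min_average=35):
    """Return the number of leading bases to keep.

    Assumption: the read is cut at the start of the first window whose
    mean quality is below min_average. Reads shorter than one window are
    scored as a single window and kept or dropped whole.
    """
    if len(quals) < window_size:
        if quals and sum(quals) / len(quals) >= min_average:
            return len(quals)
        return 0

    window_sum = sum(quals[:window_size])
    for start in range(len(quals) - window_size + 1):
        if start > 0:
            # Slide the window one base: add the new base, drop the old one.
            window_sum += quals[start + window_size - 1] - quals[start - 1]
        if window_sum / window_size < min_average:
            return start
    return len(quals)

# Hypothetical read: 300 high-quality bases followed by 250 low-quality ones.
example_quals = [40] * 300 + [20] * 250
kept = window_trim_length(example_quals)
print(f"kept {kept} of {len(example_quals)} bases")
```

With a made-up quality profile like this one, a 550 bp read is cut back to well under 300 bp even though its first 300 bases are all high quality, which is the same kind of shortening seen in the summaries above.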

Exactly. If you want 700 bp reads, I could generate the extra 200 bp for a small fee :). Perhaps the most significant benefit of Titanium is that you should be getting more reads. Also, if you look at the FLX data that are out there, you will see a similar phenomenon.