Quality scores in make.contigs V123 vs V4

Hey guys,

We’ve done a couple of runs on the MiSeq (2x300 bp) with the 27F and 519R primer set. Our first couple of runs seemed OK, but our latest one is dodgy as. I believe getting this region to work well really depends on whether you get a good sequencing run. I would highly recommend NOT sequencing this region and using the V4 region instead (and generating consensus reads as an additional quality control, as in the paper - listen to Pat! :lol: ).

Given our dodgy run, I was trying to delve deeper into the make.contigs command. I used FastQC to look at the quality of the calls in the contigs (make.fastq, then FastQC), although I just came across a post here saying the quality scores are calculated using the PandaSeq method, so I’m not sure what the quality values mean now (I attached the pictures). It is also a little unclear to me whether make.contigs does some trimming before aligning the reads into contigs, given that “trim” is in the resulting file name. So:

How does the trimming work in make.contigs?
How should we deal with the quality values here? I compared them to a V4 run we did (note: the contigs have not been screened, so they include the dodgy ones).

V123 - R1 (300 bases)

V123 - R2 (300 bases)

V123 - contig (600 bases)

V4 - R1 (250 bases)

V4 - R2 (250 bases)

V4 - contig (500 bases)

V4 - contig (cut to 275 bases)

Nice to have independent validation that I’m not nuts (on this at least) :slight_smile:

make.contigs aligns the forward and reverse reads to each other. Then it finds all of the positions where the two base calls disagree in the alignment. At each disagreement, it looks at the quality scores of the two base calls and keeps the base call whose quality score is at least 6 points higher than the other. If the difference is less than 6, we call that position an N, and the sequence is later removed by screen.seqs. Make sense?
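To make the mismatch rule above concrete, here is a minimal Python sketch of it. It is not mothur's code: it assumes the reverse read has already been reverse-complemented and aligned to the forward read with no gaps, so it only covers the "keep the clearly better call, otherwise N" step. `DELTA_Q`, `resolve_position`, `build_overlap` and `passes_screen` are hypothetical names chosen for illustration, and `passes_screen` is only a stand-in for the screen.seqs step that drops ambiguous contigs.

```python
from typing import List

# Assumed threshold: keep a base only if its quality is at least 6 points
# higher than the conflicting call. DELTA_Q is an illustrative name,
# not a mothur identifier.
DELTA_Q = 6


def resolve_position(base_f: str, q_f: int, base_r: str, q_r: int,
                     delta_q: int = DELTA_Q) -> str:
    """Return the consensus base for one aligned overlap position."""
    if base_f == base_r:
        return base_f          # the reads agree, nothing to resolve
    if q_f - q_r >= delta_q:
        return base_f          # forward call is clearly better
    if q_r - q_f >= delta_q:
        return base_r          # reverse call is clearly better
    return "N"                 # qualities too close to call: ambiguous


def build_overlap(fwd: str, fwd_q: List[int], rev: str, rev_q: List[int],
                  delta_q: int = DELTA_Q) -> str:
    """Resolve an overlap that is already aligned position by position.

    Assumes the reverse read has been reverse-complemented and that the
    alignment contains no gaps, so all four inputs have the same length.
    """
    if not (len(fwd) == len(fwd_q) == len(rev) == len(rev_q)):
        raise ValueError("aligned reads and quality lists must have equal length")
    return "".join(
        resolve_position(bf, qf, br, qr, delta_q)
        for bf, qf, br, qr in zip(fwd, fwd_q, rev, rev_q)
    )


def passes_screen(contig: str, max_ambig: int = 0) -> bool:
    """Hypothetical stand-in for screen.seqs: reject contigs with too many Ns."""
    return contig.count("N") <= max_ambig


if __name__ == "__main__":
    contig = build_overlap("ACGTA", [30, 30, 30, 30, 30],
                           "ACCTT", [30, 30, 20, 35, 32])
    print(contig, passes_screen(contig))  # ACGTN False
```

In the usage example, the G/C conflict at the third position is resolved in favour of G (30 vs 20), while the A/T conflict at the last position becomes an N (30 vs 32), so the contig would be screened out. If the input scores are Phred-scaled, a 6-point gap means the kept call has roughly a 4-fold lower estimated error probability (10^(6/10) ≈ 3.98).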

Hey Pat,

Thanks for the reply. So what do the quality scores from make.contigs represent? If the bases agree in the alignment, what score does that base get? One concern is that when I look at the quality scores of the trim.contigs output, there are scores well below Q20. Are these bases that agree but have poor quality?

Thanks

Shaun

Hi Shaun,

Sorry, the make.contigs wiki page is not up to date. We’ll get on that. By default, the quality scores in the files generated by make.contigs are calculated using the method described in PandaSeq, so they aren’t Phred quality scores. Here’s a description of why the scores appear so small…

FWIW, people basically begged us to output quality scores, and I’m still unclear what one would do with these scores :wink:

Pat