Can anyone comment on the “quality score” in trim.seqs? How exactly is this calculated? And why the magic number 35?
The qwindowaverage is calculated by sliding a window of a specified size (default = 50 bases) along the sequence and averaging the quality scores within that window. When the average drops below the threshold, the sequence is trimmed to the previous good window. The magic number of 35 was selected by looking at the quality scores for the different types of errors (mismatches and insertions) and for good base calls, and at where the errors occur (toward the distal end of the read). 35 was the value that did the best job of separating the good base calls from the bad ones. This is in the process of being written up. I do feel weird about suggesting something that isn't published yet, but I thought it was pretty important to get this information out.
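To make the sliding-window idea concrete, here is a rough Python sketch. It is not mothur's actual code. It assumes the window advances one base at a time, that "trimmed to the previous good window" means keeping everything up to the end of the last window whose average passed, and that a read shorter than the window is judged on its overall mean. The function name `trim_by_window_average` is hypothetical.

```python
from typing import List


def trim_by_window_average(quals: List[int], window: int = 50,
                           threshold: float = 35.0) -> int:
    """Return how many leading bases to keep (hypothetical sketch, not mothur's code)."""
    if window <= 0:
        raise ValueError("window must be positive")
    n = len(quals)
    if n == 0:
        return 0
    if n <= window:
        # Assumption: a read no longer than the window is judged on its overall mean.
        return n if sum(quals) / n >= threshold else 0

    keep = 0   # end position of the last window whose average passed
    start = 0  # assumption: the window advances one base per step
    while start + window <= n:
        avg = sum(quals[start:start + window]) / window
        if avg < threshold:
            break  # average fell below the threshold: stop at the previous good window
        keep = start + window
        start += 1
    return keep
```

For example, a read with 60 bases at Q40 followed by 40 bases at Q20 passes every window starting at positions 0 through 22 (the window starting at 22 averages 35.2) and fails at position 23 (34.8), so `trim_by_window_average([40] * 60 + [20] * 40)` keeps the first 72 bases. Because the average is taken over 50 bases, a few low-quality bases at the end of a good window can survive the trim.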
Are we talking about the ubiquitous Phred quality scores? Here's what I found on Wikipedia:
“Phred’s approach to base calling and calculating quality scores was outlined by Ewing et al… To determine quality scores, Phred first calculates several parameters related to peak shape and peak resolution at each base. Phred then uses these parameters to look up a corresponding quality score in huge lookup tables. These lookup tables were generated from sequence traces where the correct sequence was known, and are hard coded in Phred; different lookup tables are used for different sequencing chemistries and machines.”
(Ref.: Ewing B, Hillier L, Wendl MC, Green P. (1998): Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8(3):175-185)
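For a sense of what a threshold of 35 means, recall the standard Phred definition Q = -10·log10(P), where P is the estimated probability that the base call is wrong. The short Python sketch below converts between the two. The function names are hypothetical, chosen only for illustration.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability implied by a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred score implied by an error probability: Q = -10 * log10(P)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)


for q in (20, 30, 35, 40):
    print(q, phred_to_error_prob(q))
```

So Q20 corresponds to a 1-in-100 chance of an error, Q30 to 1 in 1,000, and Q35 to roughly 3.2 × 10^-4, or about 1 error in 3,160 bases. Note that averaging Phred values over a window, as the window test does, is equivalent to taking the geometric mean of the per-base error probabilities rather than their arithmetic mean.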