September 7, 2015 at 12:55 pm #3188
This paper, published some time ago, found that while KB gave longer read lengths as measured by predicted quality scores, it did not provide any improvement when the reads were used for alignment.
A direct comparison of the KB™ Basecaller and phred for identifying the bases from DNA sequencing using chain termination chemistry
Relatively recently, the software KB™ Basecaller has replaced phred for identifying the bases from raw sequence data in DNA sequencing employing dideoxy chemistry. We have measured quantitatively the consequences of that change.
The high quality sequence segment of reads derived from the KB™ Basecaller were, on average, 30-to-50 bases longer than reads derived from phred. However, microbe identification appeared to have been unaffected by the change in software.
We have demonstrated a modest, but statistically significant, superiority in high quality read length of the KB™ Basecaller compared to phred. We found no statistically significant difference between the numbers of microbial species identified from the sequence data.
We looked into why this occurs around 10 years ago, when we first introduced our LongTrace basecaller. The cause is surprising. KB is a better basecaller than phred, but a lot of the additional Q20+ bases come from pockets of high quality bases within low quality regions. These pockets prevent programs like phrap from aligning the reads, because they create high quality mismatches in the consensus sequence. The end result is that you get more Q20+ bases, but no improvement in the alignments.
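To make the "pockets" effect concrete, here is a minimal Python sketch that compares the total number of Q20+ bases in a read with the longest unbroken run of Q20+ bases. The function name `q20_summary` and the quality values are made up for illustration; this is not how KB, phred, or phrap compute read length.

```python
def q20_summary(quals, threshold=20):
    """Return (total bases at or above threshold, longest contiguous run of them).

    quals is a list of per-base quality scores (hypothetical example data).
    """
    total = sum(1 for q in quals if q >= threshold)
    longest = current = 0
    for q in quals:
        if q >= threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # a low quality base breaks the run
    return total, longest


# A good 10-base stretch followed by small high quality pockets
# scattered through an otherwise low quality tail (invented values).
quals = [30] * 10 + [10] * 5 + [25] * 3 + [8] * 4 + [22] * 2 + [5] * 3
total, longest = q20_summary(quals)
print(total, longest)
```

In this toy read the Q20+ count is 15, but only 10 of those bases sit in one contiguous stretch. The other 5 are pockets inside the low quality tail, which is the kind of base that inflates the Q20+ count without helping the alignment.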
PeakTrace does not suffer from this effect and actually aligns better than would be expected from the read length alone. The reason is that, compared with KB and phred, PeakTrace makes fewer indel errors (i.e. missing bases or inserted extra bases). Indel errors make aligning sequences difficult since they are heavily penalised by the alignment algorithms.
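As a rough illustration of why indels hurt more than substitutions, here is a small Needleman-Wunsch style global alignment scorer in Python. The scoring values (match +1, mismatch -2, gap -4) and the sequences are assumptions chosen for the example; they are not phrap's actual parameters, and real aligners use more elaborate schemes.

```python
def global_alignment_score(a, b, match=1, mismatch=-2, gap=-4):
    """Best global alignment score of a and b with a linear gap penalty.

    The scoring values are illustrative assumptions, not phrap's defaults.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


reference = "ACGTACGTAC"
substitution_read = "ACGTTCGTAC"  # one wrong base call
deletion_read = "ACGTCGTAC"       # one missing base

print(global_alignment_score(reference, substitution_read))
print(global_alignment_score(reference, deletion_read))
```

With these assumed penalties a single miscalled base costs less than a single missing base, so a basecaller that makes fewer indel errors ends up with higher alignment scores even at the same overall error count.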