CADD score normalization #566
Comments
The CADD normalisation is here: Lines 30 to 43 in 7a4dc1e
Thanks for posting this @julesjacobsen. This makes some sense, but I'm not sure that it results in the desired behaviour. The CADD scaling needs to put the CADD scores, as far as is reasonable, on the same scale as the other pathogenicity scores. For example, a 0.9 for CADD should be largely equivalent to a 0.9 for REVEL. If this is not the case, one pathogenicity source will outweigh all the others, which is what we see: when CADD is included, it generally scores very high and is almost always selected over the other sources.

I had a quick look at a number of variants with a REVEL score of ~0.9, and they all had corresponding CADD scores of ~30. By the scoring method above, a CADD score of 10 is already scaled to 0.9, so it is not surprising that CADD heavily dominates. I think this scaling would benefit from a rethink. Is this something that we should discuss?
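To make the arithmetic concrete, here is a minimal Python sketch of the scaling as it can be inferred from the figures in this thread (a CADD Phred score of 10 mapping to 0.9). The formula `1 - 10^(-phred/10)` and the function name `scale_cadd_phred` are assumptions for illustration, not a copy of the Exomiser source.

```python
def scale_cadd_phred(phred: float) -> float:
    """Assumed scaling, inferred from 'CADD 10 -> 0.9': 1 - 10^(-phred/10)."""
    return 1.0 - 10.0 ** (-phred / 10.0)


# Print the scaled value for a few representative Phred scores.
for phred in (10, 15, 20, 30):
    print(f"Phred {phred:>2} -> scaled {scale_cadd_phred(phred):.4f}")
```

Under this assumed formula, a Phred score of 10 already reaches 0.9 and a Phred score of 30 reaches 0.999, so almost the entire range of interest is squeezed into the top tenth of the 0-1 scale.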
I am wondering if you can explain/look into how CADD scores are normalized to have a scale from 0-1 when used as a pathogenicity predictor. I have noticed that when I combine CADD with other tools, CADD is the max path score for almost every variant, regardless of variant type. I've also noticed when I inspect the output files, these CADD scores are often >0.97.
One example: a variant in my results had the following scores:
CADD=0.97435516,REVEL=0.031,MVP=0.19391714,ALPHA_MISSENSE=0.0944
I looked up this variant in CADD, and its CADD Phred score is 15.
However, another variant had an Exomiser CADD=0.99748814 in the output, but a CADD Phred score of 26. Since Phred scores are logarithmic, the difference between 15 and 26 is much more drastic than what I am seeing between these scaled scores in the output.
We would really like to use CADD scores, especially since many other predictors are for missense variants only.
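As a rough check on the two variants above, the following Python sketch inverts the scaling to recover a Phred score from the value in the Exomiser output. It assumes the scaled score is `1 - 10^(-phred/10)`, which is inferred from the numbers in this thread rather than taken from the Exomiser code; `scaled_to_phred` is a hypothetical helper name.

```python
import math


def scaled_to_phred(scaled: float) -> float:
    """Invert the assumed scaling 1 - 10^(-phred/10) to recover a Phred score."""
    return -10.0 * math.log10(1.0 - scaled)


# The two scaled CADD values reported in the output files.
for scaled in (0.97435516, 0.99748814):
    print(f"scaled {scaled} -> Phred {scaled_to_phred(scaled):.2f}")
```

Under that assumption the two output values correspond to Phred scores of about 15.9 and 26.0, in line with the values looked up in CADD. A gap of roughly ten Phred units therefore shrinks to a difference of about 0.023 on the 0-1 scale.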