CADD score normalization #566

icooperstein · 2024-08-27T19:11:32Z

I am wondering if you can explain/look into how CADD scores are normalized to have a scale from 0-1 when used as a pathogenicity predictor. I have noticed that when I combine CADD with other tools, CADD is the max path score for almost every variant, regardless of variant type. I've also noticed when I inspect the output files, these CADD scores are often >0.97.

One example:
One variant in my result had the following scores:
CADD=0.97435516,REVEL=0.031,MVP=0.19391714,ALPHA_MISSENSE=0.0944
I looked up this variant in CADD, and its CADD Phred score is 15.

However, another variant had an Exomiser CADD=0.99748814 in the output, but a CADD Phred score of 26. Since Phred scores are logarithmic, the difference between 15 and 26 is much more drastic than what I am seeing in these scaled output scores.

We would really like to use CADD scores, especially since many other predictors are for missense variants only.

julesjacobsen · 2024-09-09T16:18:03Z

The CADD normalisation is here:

Exomiser/exomiser-core/src/main/java/org/monarchinitiative/exomiser/core/model/pathogenicity/CaddScore.java

Lines 30 to 43 in 7a4dc1e

    
    /**
     * Creates a {@link CaddScore} from the input PHRED scaled score. *IMPORTANT* this method will rescale the input
     * PHRED score to a score in the 0-1 range, therefore ensure the correct CADD score is used here.
     *
     * According to https://cadd.gs.washington.edu/info a good cutoff to use is the PHRED scaled scores of
     * 10-20 which equates to 90-99% most deleterious or 13-20 (95-99%). For reference, these are scaled to 0.90 - 0.99.
     *
     * The M-CAP authors (http://bejerano.stanford.edu/mcap/) suggest these cutoffs are too permissive, although their
     * recommended thresholds don't appear to match what was actually suggested by the CADD authors.
     */
    public static CaddScore of(float phredScaledScore) {
        float score = 1 - (float) Math.pow(10, -(phredScaledScore / 10));
        return new CaddScore(phredScaledScore, score);
    }
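
To make the effect of this formula concrete, here is a minimal standalone sketch that applies the same arithmetic to a few PHRED values. The class name `CaddScalingDemo` and the method `scale` are hypothetical and are not part of Exomiser; only the expression inside `scale` is taken from the snippet above.

```java
import java.util.Locale;

class CaddScalingDemo {

    // Same arithmetic as CaddScore.of in the quoted snippet: 1 - 10^(-phred / 10).
    // Hypothetical helper for illustration; it does not construct a CaddScore.
    static float scale(float phredScaledScore) {
        return 1 - (float) Math.pow(10, -(phredScaledScore / 10));
    }

    public static void main(String[] args) {
        float[] phredScores = {10f, 15f, 20f, 26f, 30f};
        for (float phred : phredScores) {
            // Print each PHRED score next to its 0-1 scaled value.
            System.out.printf(Locale.ROOT, "PHRED %.0f -> %.5f%n", phred, scale(phred));
        }
    }
}
```

With this scaling, PHRED 10, 20 and 30 map to 0.9, 0.99 and 0.999, and PHRED 26 maps to roughly 0.9975, in line with the 0.99748814 reported above. Everything from PHRED 10 upwards is therefore compressed into the 0.9-1.0 range.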

AlistairNWard · 2024-09-23T19:23:53Z

Thanks for posting this @julesjacobsen. This makes some sense, but I'm not sure that it results in the desired behaviour. The CADD scaling will need to, as far as is reasonable, put the CADD scores on the same scale as the other pathogenicity scores. For example, a 0.9 for CADD should be largely equivalent to a 0.9 for REVEL. If this is not the case, then one pathogenicity source will outweigh all the others - which is what we see. If we include CADD, then it generally scores very high and will almost always be selected over other sources. I had a quick look at a number of variants with a REVEL score of ~0.9 and they all had corresponding CADD scores of ~30. By the scoring method above, a CADD score of 10 would be scaled to 0.9 and so it is not surprising that CADD heavily dominates. I think this scaling would benefit from a rethink. Is this something that we should discuss?
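
As a rough illustration of the mismatch described above, the sketch below inverts the current scaling to show which PHRED score corresponds to a given 0-1 value, and then picks the highest-scoring source for the example variant from the first comment. It assumes that the reported pathogenicity score is simply the maximum across sources, as the thread describes; `phredFor` and `topSource` are hypothetical helpers, not Exomiser API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CaddDominanceDemo {

    // Inverse of the current scaling: score = 1 - 10^(-phred / 10)  =>  phred = -10 * log10(1 - score).
    static double phredFor(double scaledScore) {
        return -10 * Math.log10(1 - scaledScore);
    }

    // Assumption: the most pathogenic source is chosen by taking the maximum scaled score.
    static String topSource(Map<String, Float> scaledScores) {
        String best = null;
        float bestScore = Float.NEGATIVE_INFINITY;
        for (Map.Entry<String, Float> entry : scaledScores.entrySet()) {
            if (entry.getValue() > bestScore) {
                best = entry.getKey();
                bestScore = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // A scaled CADD score of 0.9 corresponds to a PHRED score of about 10.
        System.out.println("PHRED for scaled 0.9: " + phredFor(0.9));

        // Scores of the example variant quoted in the first comment.
        Map<String, Float> scores = new LinkedHashMap<>();
        scores.put("CADD", 0.97435516f);
        scores.put("REVEL", 0.031f);
        scores.put("MVP", 0.19391714f);
        scores.put("ALPHA_MISSENSE", 0.0944f);
        System.out.println("Top source: " + topSource(scores));
    }
}
```

Under these assumptions, a scaled CADD score of 0.9 is reached at PHRED 10, whereas the variants checked above with a REVEL score of ~0.9 had CADD scores of ~30, so CADD wins the maximum for the example variant by a wide margin.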
