Run multicalibration on pre-computed scores w/o access to initial predictor #42
Hi @flinder, sorry for the long silence, I've been travelling.
Can you maybe provide a little more detail w.r.t. your problem? As a test, instead of the second feature I added the true labels as a feature, to see whether there is a general problem:

```r
library(mcboost)
library(data.table)

# simulate some random data
n = 100
scores = runif(n)
labels = rbinom(n, 1, scores)
is_test = as.logical(rbinom(n, 1, 0.2))
segmentation_features = data.table(
  cbind(
    rbinom(n, 1, scores),
    labels
  )
)
```

The default hyperparameters are not always helpful; I used the following:

```r
mc = MCBoost$new(
  auditor_fitter = "TreeAuditorFitter",
  init_predictor = init_predictor,
  eta = 0.5,
  alpha = 1e-7,
  max_iter = 20,
  multiplicative = TRUE
)
```

Computing the Brier score (MSE) now yields a strong improvement:

```r
mse = function(x, y) { mean((x - y)^2) }
mse(scores[is_test], labels[is_test])
mse(prs, labels[is_test])
```
NOTE: If you use a
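For context: the snippet above uses `init_predictor` and `prs` without showing how they are created. Below is a hedged sketch of those intermediate steps for the simulated data, assuming the pre-computed `scores` are exposed as a column so the initial predictor can simply return them; the column name `score` and the `$multicalibrate()` / `$predict_probs()` calls follow the mcboost documentation but are not part of the original comment.

```r
# Expose the pre-computed scores as a feature column (illustrative):
dat = data.table(segmentation_features, score = scores)

# Defined before MCBoost$new() above, so it can be passed as init_predictor;
# it just reads back the pre-computed scores, no model access needed:
init_predictor = function(data) data$score

# After constructing `mc` as shown above:
mc$multicalibrate(dat[!is_test], labels[!is_test])
prs = mc$predict_probs(dat[is_test])
```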
@pfistfl thanks for looking into it so quickly. Verifying that the function should work this way is already very helpful! I'll look a bit more into your suggestions re: hyperparameters and double-check my code.
Here's a plot of the calibrated vs. original scores that I get for my problem:

*[figure: calibrated vs. original scores]*
The discontinuity stems from the fact that probabilities are bucketed into [0, 0.5] and (0.5, 1] and predictions are then adapted within each bucket. From the figure I would assume that your problem is slightly imbalanced, with more labels at 0, which leads to the overall predictions being pushed down?
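A simplified numeric illustration of that effect (plain data.table code, not the package's actual update rule): with two score buckets at [0, 0.5] and (0.5, 1] and labels skewed toward 0, the observed label rate in each bucket sits well below the mean score, so matching bucket-level rates pushes predictions down.

```r
library(data.table)

set.seed(1)
n = 1000
scores = runif(n)
labels = rbinom(n, 1, 0.3 * scores)   # imbalanced: most labels are 0

dt = data.table(
  score  = scores,
  label  = labels,
  bucket = ifelse(scores <= 0.5, "[0, 0.5]", "(0.5, 1]")
)

# Mean score vs. observed label rate per bucket: the label rate is much lower,
# so a calibration step that matches bucket-level rates shifts predictions down.
dt[, .(mean_score = mean(score), label_rate = mean(label)), by = bucket]
```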
I'm trying to multi-calibrate scores precomputed from a black-box model (assume we don't have access to the model itself), but I'm getting nonsensical results.
I'm wondering whether this should work in theory (and there's simply some other bug in my code) or whether there's a more fundamental reason it doesn't work.
Here's an example to illustrate what I'm trying to do:
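A minimal sketch of such a setup, assuming the `MCBoost$new()` / `$multicalibrate()` / `$predict_probs()` interface from the mcboost documentation; the simulated data, the `score` column name, and the idea of passing the pre-computed scores through as a feature are illustrative choices, not the original example, and argument types (e.g. labels as a vector vs. a one-column `data.table`) may need adjusting to the installed version.

```r
library(mcboost)
library(data.table)

set.seed(42)
n = 500

# Pre-computed scores exported from a black-box model we cannot call again,
# plus observed labels and a few features to audit/segment over.
dat = data.table(
  score = runif(n),
  x1    = rbinom(n, 1, 0.5),
  x2    = rbinom(n, 1, 0.3)
)
dat[, label := rbinom(.N, 1, score)]

is_test = as.logical(rbinom(n, 1, 0.2))
train = dat[!is_test]
test  = dat[is_test]

# The "initial predictor" simply returns the pre-computed score column,
# so no access to the original model is required.
init_predictor = function(data) {
  data$score
}

mc = MCBoost$new(
  auditor_fitter = "TreeAuditorFitter",
  init_predictor = init_predictor
)
mc$multicalibrate(train[, .(score, x1, x2)], train$label)

# Multicalibrated probabilities for the held-out rows
prs = mc$predict_probs(test[, .(score, x1, x2)])
```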