Run multicalibration on pre-computed scores w/o access to initial predictor #42
Hi @flinder, sorry for the long silence, I've been travelling.
Can you maybe provide a little more detail w.r.t. your problem? As a test, instead of the second feature I added the true labels as a feature, to see whether there is a general problem:

```r
library(mcboost)
library(data.table)

# simulate some random data
n = 100
scores = runif(n)
labels = rbinom(n, 1, scores)
is_test = as.logical(rbinom(n, 1, 0.2))
segmentation_features = data.table(
  cbind(
    rbinom(n, 1, scores),
    labels
  )
)
```

The default hyperparameters are not always helpful; I used the following:

```r
mc = MCBoost$new(
  auditor_fitter = "TreeAuditorFitter",
  init_predictor = init_predictor,
  eta = 0.5,
  alpha = 1e-7,
  max_iter = 20,
  multiplicative = TRUE
)
```

Computing the Brier score (MSE) now yields a strong improvement:

```r
mse = function(x, y) { mean((x - y)^2) }
mse(scores[is_test], labels[is_test])
mse(prs, labels[is_test])
```
NOTE: If you use a
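For context: the snippet above uses `init_predictor` and `prs` without showing how they are created. Below is a hedged sketch of those intermediate steps for the simulated data, assuming the pre-computed `scores` are exposed as a column so the initial predictor can simply return them; the column name `score` and the `$multicalibrate()` / `$predict_probs()` calls follow the mcboost documentation but are not part of the original comment.

```r
# Expose the pre-computed scores as a feature column (illustrative):
dat = data.table(segmentation_features, score = scores)

# Defined before MCBoost$new() above, so it can be passed as init_predictor;
# it just reads back the pre-computed scores, no model access needed:
init_predictor = function(data) data$score

# After constructing `mc` as shown above:
mc$multicalibrate(dat[!is_test], labels[!is_test])
prs = mc$predict_probs(dat[is_test])
```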
@pfistfl thanks for looking into it so quickly. Verifying that the function should work this way is already very helpful! I'll look a bit more into your suggestions re: hyperparameters and double-check my code.
Here's a plot of the calibrated vs. original scores that I get for my problem:

*[figure: calibrated vs. original scores]*
The discontinuity stems from the fact that probabilities are bucketed into [0, 0.5] and (0.5, 1] and predictions are then adapted within each bucket. From the figure I would assume that your problem is slightly imbalanced, with more labels at 0, which leads to the overall predictions being pushed down?
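A simplified numeric illustration of that effect (plain data.table code, not the package's actual update rule): with two score buckets at [0, 0.5] and (0.5, 1] and labels skewed toward 0, the observed label rate in each bucket sits well below the mean score, so matching bucket-level rates pushes predictions down.

```r
library(data.table)

set.seed(1)
n = 1000
scores = runif(n)
labels = rbinom(n, 1, 0.3 * scores)   # imbalanced: most labels are 0

dt = data.table(
  score  = scores,
  label  = labels,
  bucket = ifelse(scores <= 0.5, "[0, 0.5]", "(0.5, 1]")
)

# Mean score vs. observed label rate per bucket: the label rate is much lower,
# so a calibration step that matches bucket-level rates shifts predictions down.
dt[, .(mean_score = mean(score), label_rate = mean(label)), by = bucket]
```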
I'm trying to multi-calibrate scores precomputed from a black-box model (assume we don't have access to the model itself), but I'm getting nonsensical results.
I'm wondering whether this should work in theory (and there's simply some other bug in my code) or whether there's a more fundamental reason it doesn't work.
Here's an example to illustrate what I'm trying to do:
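A minimal sketch of such a setup, assuming the `MCBoost$new()` / `$multicalibrate()` / `$predict_probs()` interface from the mcboost documentation; the simulated data, the `score` column name, and the idea of passing the pre-computed scores through as a feature are illustrative choices, not the original example, and argument types (e.g. labels as a vector vs. a one-column `data.table`) may need adjusting to the installed version.

```r
library(mcboost)
library(data.table)

set.seed(42)
n = 500

# Pre-computed scores exported from a black-box model we cannot call again,
# plus observed labels and a few features to audit/segment over.
dat = data.table(
  score = runif(n),
  x1    = rbinom(n, 1, 0.5),
  x2    = rbinom(n, 1, 0.3)
)
dat[, label := rbinom(.N, 1, score)]

is_test = as.logical(rbinom(n, 1, 0.2))
train = dat[!is_test]
test  = dat[is_test]

# The "initial predictor" simply returns the pre-computed score column,
# so no access to the original model is required.
init_predictor = function(data) {
  data$score
}

mc = MCBoost$new(
  auditor_fitter = "TreeAuditorFitter",
  init_predictor = init_predictor
)
mc$multicalibrate(train[, .(score, x1, x2)], train$label)

# Multicalibrated probabilities for the held-out rows
prs = mc$predict_probs(test[, .(score, x1, x2)])
```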