
Node sample weights are not being updated in xgboost tree_path_dependent #48

Closed
thatlittleboy opened this issue May 18, 2023 · 0 comments · Fixed by #52
Labels
bug Something isn't working

thatlittleboy (Collaborator) commented May 18, 2023

Here is a minimal example:

```python
import shap
import xgboost

X, y = shap.datasets.adult()  # shape: (32561, 12)
dtrain = xgboost.DMatrix(X, label=y, feature_names=X.columns)
params = {
    "booster": "gbtree",
    "objective": "binary:logistic",
    "max_depth": 2,
    "eta": 0.05,
    "nthread": -1,
    "random_state": 42,
}
bst = xgboost.train(params=params, dtrain=dtrain, num_boost_round=10)

explainer = shap.TreeExplainer(bst, data=X, feature_perturbation="tree_path_dependent")
print(explainer.model.fully_defined_weighting)
print(explainer.model.node_sample_weight)
```

The result:

```
False
[[0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]]
```

The first line (`fully_defined_weighting`) needs to be `True` when `feature_perturbation="tree_path_dependent"` and `data` is provided.

We boosted for 10 rounds (10 trees, each with 1 + 2 + 4 = 7 nodes at `max_depth=2`), and every node in every tree has weight 0.
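For reference, with a background dataset the node sample weights can be thought of as the number of background rows that pass through each node. The sketch below illustrates that idea on a single tree. It is not shap's implementation: the flat-array layout (`children_left`, `children_right`, `features`, `thresholds`, with `-1` marking leaves) and the function name are hypothetical, missing-value routing is ignored, and rows go left when the feature value is strictly less than the threshold, as in xgboost's "yes" branch.

```python
import numpy as np


def node_sample_weights(children_left, children_right, features, thresholds, X):
    """Count how many background rows reach each node of a single tree.

    Hypothetical layout: node i splits on column features[i] at thresholds[i];
    a row goes to children_left[i] if X[row, features[i]] < thresholds[i],
    else to children_right[i]. Leaves have children_left[i] == -1.
    Missing values are not handled in this sketch.
    """
    weights = np.zeros(len(children_left), dtype=float)
    for row in np.asarray(X, dtype=float):
        node = 0  # start at the root
        while True:
            weights[node] += 1  # this row passes through `node`
            if children_left[node] == -1:  # reached a leaf
                break
            if row[features[node]] < thresholds[node]:
                node = children_left[node]
            else:
                node = children_right[node]
    return weights


# A hypothetical depth-2 tree: 7 nodes, like each row of the array above.
children_left = [1, 3, 5, -1, -1, -1, -1]
children_right = [2, 4, 6, -1, -1, -1, -1]
features = [0, 1, 1, -1, -1, -1, -1]
thresholds = [0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]])

weights = node_sample_weights(children_left, children_right, features, thresholds, X)
print(weights)  # [5. 2. 3. 1. 1. 1. 2.]
```

Each internal node's weight equals the sum of its children's weights, and the root's weight equals the number of background rows, so an all-zero row in `node_sample_weight` means the background data never reached that tree.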

This causes the following failure in the test `test_provided_background_tree_path_dependent`:

```
E           AssertionError: The background dataset you provided does not cover all the leaves in the model, so TreeExplainer cannot run with the feature_perturbation="tree_path_dependent" option! Try providing a larger background dataset, no background dataset, or using feature_perturbation="interventional".
```

Expected behavior: the `node_sample_weight` array should not contain any 0s, or at the very least should not be all 0s.
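The assertion above fires when a leaf has zero weight, i.e. no background row reaches it. With all-zero weights every leaf meets that condition, so the check fails even though the full dataset `X` was passed as background. A minimal sketch of such a coverage check follows; `uncovered_leaves` and the `children_left` layout (with `-1` marking leaves) are hypothetical and are not shap's actual code.

```python
import numpy as np


def uncovered_leaves(children_left, weights):
    """Return indices of leaves that no background row reached (weight <= 0)."""
    is_leaf = np.asarray(children_left) == -1
    return np.flatnonzero(is_leaf & (np.asarray(weights, dtype=float) <= 0))


children_left = [1, 3, 5, -1, -1, -1, -1]  # hypothetical depth-2 tree layout

# All-zero weights, as reported above: every leaf looks uncovered.
print(uncovered_leaves(children_left, np.zeros(7)))  # [3 4 5 6]

# Weights computed from background data: no leaf is uncovered.
print(uncovered_leaves(children_left, [5, 2, 3, 1, 1, 1, 2]))  # []
```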
