Fix edge case in AnchorTabular where no samples satisfying the anchor… #742

Contributor

@jklaise jklaise commented Sep 5, 2022

… exist in the train data

Fixes #317.

This is an edge case when no samples satisfying the anchor exist in the training data. For example, this can happen when explaining an instance with a categorical variable whose value is not observed in the training data.

The fix is minimal with no major logic changes:

  • check for the edge case and adjust the variable partial_anchor_rows accordingly
  • wrap the selection of the starting index for sampling in a try/except; if an IndexError is raised, break out of the sampling and go straight to handling unknown features
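
The control flow described in the two bullets above can be sketched roughly as follows. This is a simplified illustration, not the code in anchor_tabular.py: the function `sample_around_anchor` and its arguments are hypothetical, and it handles a single anchored feature only.

```python
import numpy as np


def sample_around_anchor(train, instance, feature, n_samples, rng):
    """Hypothetical sketch: draw samples that satisfy `feature == instance[feature]`.

    Returns the samples and the anchor's coverage on `train`.
    """
    # Rows of the training data that already satisfy the anchor.
    rows = np.flatnonzero(train[:, feature] == instance[feature])
    try:
        # Indexing an empty array raises
        # "IndexError: index -1 is out of bounds for axis 0 with size 0".
        rows[-1]
        picked = rng.choice(rows, size=n_samples, replace=True)
        samples = train[picked].copy()
    except IndexError:
        # Unknown feature value: sample any rows and overwrite the feature
        # with the value from the instance being explained.
        picked = rng.integers(len(train), size=n_samples)
        samples = train[picked].copy()
        samples[:, feature] = instance[feature]
    coverage = len(rows) / len(train)
    return samples, coverage


rng = np.random.default_rng(0)
# Toy training data (assumed): categories 0 and 1 only, plus one numerical column.
train = np.column_stack((rng.integers(2, size=200), rng.random(200)))
samples, coverage = sample_around_anchor(train, np.array([2, 0.0]), 0, 10, rng)
print(samples[:, 0], coverage)
```

With an unseen category, every sample ends up with the instance's value for that feature and the coverage is 0, which is the behaviour this PR aims for.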

I have checked this on a toy example that explains an instance with a categorical variable whose value is not present in the training data. With the dummy model used here (the prediction is the value of the categorical variable), the anchor should be the value of the categorical variable, with precision 1 and coverage 0. The following snippet reproduces the result and compares it to the original anchor implementation (pip install anchor-exp):

import numpy as np
from anchor import anchor_tabular
from alibi.explainers import AnchorTabular

SEED = 0

# DATA
N = 1000
N_CAT = 3

# 1 categorical and 1 numerical feature
np.random.seed(SEED)
cat = np.random.randint(N_CAT, size=N)  # categories 0 .. N_CAT - 1
num = np.random.rand(N)

data = np.column_stack((cat, num))
# filter out any rows where cat == 2
train = data[data[:, 0] != 2]

# add one row with cat == 2 (increases coverage to >0 so code doesn't break)
# train = np.vstack((train, np.array([2, 0.])))

# metadata
feature_names = ['categorical', 'numerical']
category_map = {0: ['category_0', 'category_1', 'category_2']}

# MODEL
predictor = lambda x: x[:, 0].astype(int)  # dummy model - categorical feature determines class

# ALIBI EXPLAINER
explainer = AnchorTabular(predictor=predictor,
                          feature_names=feature_names,
                          categorical_names=category_map,
                          seed=SEED)
explainer.fit(train)

# instance to be explained
bad_instance = np.array([2, 0.0])

# ALIBI EXPLANATION
explanation = explainer.explain(bad_instance)
print(f'Alibi anchor: {explanation.anchor}')
print(f'Alibi precision: {explanation.precision}')
print(f'Alibi coverage: {explanation.coverage}\n')

# ORIGINAL EXPLAINER
np.random.seed(SEED)  # attempt to recreate exactly alibi sampling...
explainer = anchor_tabular.AnchorTabularExplainer(
    class_names=['0', '1', '2'],
    feature_names=feature_names,
    train_data=train,
    categorical_names=category_map
)

# ORIGINAL EXPLANATION
explanation = explainer.explain_instance(bad_instance, predictor, threshold=0.95)
print(f'Original anchor: {explanation.names()}')
print(f'Original precision: {explanation.precision()}')
print(f'Original coverage: {explanation.coverage()}')
WARNING: No data records have 0 feature with value 2.0. Setting all samples' values to 2.0!
WARNING: No data records have 0 feature with value 2.0. Setting all samples' values to 2.0!
Alibi anchor: ['categorical = category_2']
Alibi precision: 1.0
Alibi coverage: 0.0

Original anchor: ['categorical = category_2']
Original precision: 1.0
Original coverage: 0.0

Process finished with exit code 0

Note that warning messages are raised when there are no entries in the training data satisfying the feature value.
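
The expected precision of 1 and coverage of 0 can also be checked by hand, independently of either explainer. The sketch below is only a sanity check under stated assumptions: it rebuilds similar toy data with a NumPy `Generator` (so the random draws differ from the snippet above), and the perturbation step is a simplification of how anchors are sampled.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same shape of data as in the snippet: one categorical and one numerical column.
data = np.column_stack((rng.integers(3, size=1000), rng.random(1000)))
train = data[data[:, 0] != 2]  # category 2 never appears in the training data
instance = np.array([2, 0.0])

predictor = lambda x: x[:, 0].astype(int)  # dummy model from the snippet

# Coverage: fraction of training rows satisfying `categorical = category_2`.
matches = train[:, 0] == instance[0]
coverage = matches.mean()

# Precision: perturb rows while holding the anchored feature at the instance's
# value, then compare predictions with the prediction for the instance.
perturbed = train[rng.integers(len(train), size=100)].copy()
perturbed[:, 0] = instance[0]
precision = (predictor(perturbed) == predictor(instance[None, :])[0]).mean()

print(f"coverage={coverage}, precision={precision}")
```

No training row matches the anchor, so the coverage is 0, and the dummy model depends only on the anchored feature, so the precision is 1.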


codecov bot commented Sep 5, 2022

Codecov Report

Merging #742 (4ef8348) into master (4c28d19) will decrease coverage by 0.17%.
The diff coverage is 50.00%.

@@            Coverage Diff             @@
##           master     #742      +/-   ##
==========================================
- Coverage   81.15%   80.97%   -0.18%     
==========================================
  Files         105      105              
  Lines       11847    11874      +27     
==========================================
+ Hits         9614     9615       +1     
- Misses       2233     2259      +26     

Impacted Files                                Coverage Δ
alibi/explainers/anchors/anchor_tabular.py    89.57% <50.00%> (-0.71%) ⬇️
alibi/datasets/default.py                     70.58% <0.00%> (-14.50%) ⬇️
alibi/explainers/anchors/anchor_base.py       92.70% <0.00%> (ø)

@RobertSamoilescu
Collaborator

Nice! I've also tested your script with numerical feature values outside the training range and it works.

@jklaise jklaise merged commit f12842c into SeldonIO:master Sep 8, 2022
Successfully merging this pull request may close these issues.

IndexError: index -1 is out of bounds for axis 0 with size 0