Fix edge case in AnchorTabular where no samples satisfying the anchor… #742

jklaise · 2022-09-05T10:24:24Z

… exist in the train data

Fixes #317.

This edge case arises when no samples satisfying the anchor exist in the training data. For example, it can happen when explaining an instance with a categorical variable whose value is not observed in the training data.
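As a rough illustration of the edge case (a standalone sketch, not Alibi's code; `coverage` is a hypothetical helper and the toy data is made up), coverage can be thought of as the fraction of training rows that satisfy the anchor. With a categorical value that never appears in the training data, that fraction is zero:

```python
import random


def coverage(train_rows, anchor):
    """Fraction of training rows satisfying every {feature index: value} condition.

    Hypothetical helper for illustration only; not part of Alibi.
    """
    if not train_rows:
        return 0.0
    matches = [row for row in train_rows
               if all(row[i] == v for i, v in anchor.items())]
    return len(matches) / len(train_rows)


random.seed(0)
# Toy training data: the categorical feature (index 0) only takes values 0 and 1.
train_rows = [(random.randrange(2), random.random()) for _ in range(1000)]

seen = coverage(train_rows, {0: 1})    # value present in the training data
unseen = coverage(train_rows, {0: 2})  # value never observed, so no row matches
print(seen, unseen)
```

Any sampling step that draws from the matching rows has nothing to draw from in the second case, which is the situation this PR handles.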

The fix is minimal with no major logic changes:

Check for the edge case and adjust the variable partial_anchor_rows accordingly.
Wrap the selection of the starting index for sampling in try/except; if an IndexError is raised, break out of the sampling and go straight to handling unknown features.
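The try/except idea can be sketched in isolation as follows. This is a minimal sketch under stated assumptions, not the code in this PR: `sample_with_fallback` and its arguments are hypothetical names, and "handling unknown features" is assumed to mean overwriting the anchored features with the instance's values, which is what the warning message in the output further down suggests.

```python
import random


def sample_with_fallback(train_rows, instance, anchor_features, num_samples, rng):
    """Draw samples that agree with `instance` on `anchor_features`.

    Hypothetical illustration only; not Alibi's implementation.
    """
    # Training rows that agree with the instance on every anchored feature.
    matching = [row for row in train_rows
                if all(row[f] == instance[f] for f in anchor_features)]
    try:
        # random.choice raises IndexError on an empty sequence,
        # i.e. when no training row satisfies the anchor.
        start = rng.choice(range(len(matching)))
        samples = [list(matching[(start + k) % len(matching)])
                   for k in range(num_samples)]
    except IndexError:
        # Assumed fallback: draw arbitrary training rows and set the
        # anchored features to the instance's (unseen) values.
        samples = [list(rng.choice(train_rows)) for _ in range(num_samples)]
        for sample in samples:
            for f in anchor_features:
                sample[f] = instance[f]
    return samples


rng = random.Random(0)
train = [(i % 2, i / 10) for i in range(10)]  # categorical values 0 and 1 only

unseen_samples = sample_with_fallback(train, (2, 0.0), [0], 5, rng)
seen_samples = sample_with_fallback(train, (1, 0.0), [0], 3, rng)
print(unseen_samples)
print(seen_samples)
```

Catching the exception at the point where the starting index is chosen keeps the normal sampling path unchanged and only diverts the empty case.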

I have checked this on a toy example explaining an instance with a categorical variable whose value is not present in the training data. In this case, with the dummy model used (the prediction is the value of the categorical variable), the anchor should be the value of the categorical variable with precision 1 and coverage 0. The following snippet reproduces the result and compares it to the original anchor implementation (pip install anchor-exp):

import numpy as np
from anchor import anchor_tabular
from alibi.explainers import AnchorTabular

SEED = 0

# DATA
N = 1000
N_CAT = 3

# 1 categorical and 1 numerical feature
np.random.seed(SEED)
cat = np.random.randint(N_CAT, size=N)
num = np.random.rand(N)

data = np.column_stack((cat, num))
# filter out any rows where cat == 2
train = data[data[:, 0] != 2]

# add one row with cat == 2 (increases coverage to >0 so code doesn't break)
# train = np.vstack((train, np.array([2, 0.])))

# metadata
feature_names = ['categorical', 'numerical']
category_map = {0: ['category_0', 'category_1', 'category_2']}

# MODEL
predictor = lambda x: x[:, 0].astype(int)  # dummy model - categorical feature determines class

# ALIBI EXPLAINER
explainer = AnchorTabular(predictor=predictor,
                          feature_names=feature_names,
                          categorical_names=category_map,
                          seed=SEED)
explainer.fit(train)

# instance to be explained
bad_instance = np.array([2, 0.0])

# ALIBI EXPLANATION
explanation = explainer.explain(bad_instance)
print(f'Alibi anchor: {explanation.anchor}')
print(f'Alibi precision: {explanation.precision}')
print(f'Alibi coverage: {explanation.coverage}\n')

# ORIGINAL EXPLAINER
np.random.seed(SEED)  # attempt to recreate exactly alibi sampling...
explainer = anchor_tabular.AnchorTabularExplainer(
    class_names=['0', '1', '2'],
    feature_names=feature_names,
    train_data=train,
    categorical_names=category_map
)

# ORIGINAL EXPLANATION
explanation = explainer.explain_instance(bad_instance, predictor, threshold=0.95)
print(f'Original anchor: {explanation.names()}')
print(f'Original precision: {explanation.precision()}')
print(f'Original coverage: {explanation.coverage()}')

WARNING: No data records have 0 feature with value 2.0. Setting all samples' values to 2.0!
WARNING: No data records have 0 feature with value 2.0. Setting all samples' values to 2.0!
Alibi anchor: ['categorical = category_2']
Alibi precision: 1.0
Alibi coverage: 0.0

Original anchor: ['categorical = category_2']
Original precision: 1.0
Original coverage: 0.0

Process finished with exit code 0

Note that warning messages are raised when there are no entries in the training data satisfying the feature value.

codecov · 2022-09-05T11:00:22Z

Codecov Report

Merging #742 (4ef8348) into master (4c28d19) will decrease coverage by 0.17%.
The diff coverage is 50.00%.

@@            Coverage Diff             @@
##           master     #742      +/-   ##
==========================================
- Coverage   81.15%   80.97%   -0.18%     
==========================================
  Files         105      105              
  Lines       11847    11874      +27     
==========================================
+ Hits         9614     9615       +1     
- Misses       2233     2259      +26

Impacted Files	Coverage Δ
alibi/explainers/anchors/anchor_tabular.py	`89.57% <50.00%> (-0.71%)`	⬇️
alibi/datasets/default.py	`70.58% <0.00%> (-14.50%)`	⬇️
alibi/explainers/anchors/anchor_base.py	`92.70% <0.00%> (ø)`

RobertSamoilescu · 2022-09-08T09:05:57Z

Nice! I've also tested your script with numerical feature values outside the training range and it works.

Fix edge case in AnchorTabular where no samples satisfying the anchor…

4ef8348

… exist in the train data

jklaise requested a review from RobertSamoilescu September 5, 2022 10:24

jklaise merged commit f12842c into SeldonIO:master Sep 8, 2022
