GAMChanger fails to load data in some cases #4
I have produced a more compact MWE:
import numpy as np
import pandas as pd
import gamchanger as gc
from interpret.glassbox import ExplainableBoostingRegressor
size = 5
x1 = np.linspace(0, 10, size)
y = -1.0 * x1.copy() + 3.0
# Introduce missing data
x1[1] = np.nan
x1[2] = np.nan
# With only two missing datapoints, the GAMChanger interface loads fine
# If we introduce a third missing feature value by un-commenting the below
# line, the validation data fails to load
#x1[3] = np.nan
df = pd.DataFrame(
data={
'x1': x1,
'y' : y
}
)
X = df[['x1']]
y = df['y']
print(df)
# Train model
ebm = ExplainableBoostingRegressor(interactions=False)
ebm.fit(X, y)
gc.visualize(ebm, X, y)
...update... Based on the above MWE, I have been able to narrow down the error to this uncaught JavaScript error in the Firefox JS console; ...which I believe is coming from the variable
For the failing case, this JavaScript (before base64 encoding) looks like this:
(function() {
let data = {
"model": {
"intercept": -0.849715269828704,
"isClassifier": false,
"features": [
{
"name": "x1",
"type": "continuous",
"importance": 0.14849663043758726,
"additive": [-0.1856, -0.1856],
"error": [0.7972, 0.7972],
"id": [0],
"count": [1, 1],
"binEdge": [0.0, 5.0, 10.0],
"histEdge": [0.0, 10.0],
"histCount": [2]
}
],
"labelEncoder": {},
"scoreRange": [-0.9828, 0.6905]
},
"sample": {
"featureNames": ["x1"],
"featureTypes": ["continuous"],
"samples": [[0.0], [NaN], [NaN], [NaN], [10.0]],
"labels": [3.0, 0.5, -2.0, -4.5, -7.0]
}
};
let event = new Event('gamchangerData');
event.data = data;
console.log('before');
console.log(data);
document.dispatchEvent(event);
}())
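The `NaN` entries in the payload above are valid JavaScript literals but not valid strict JSON. As a minimal sketch (assuming the payload is produced on the Python side with something like the standard `json` module, which this thread does not confirm), the following shows how `NaN` ends up in serialized output and one way to emit `null` instead. The `samples` list is a hypothetical stand-in for the MWE's validation data, and this is not a confirmed root cause of the bug.

```python
import json
import math

# Hypothetical stand-in for the validation samples in the failing MWE.
samples = [[0.0], [float("nan")], [float("nan")], [float("nan")], [10.0]]

# By default, json.dumps writes NaN as the bare token `NaN`.
# That token is a JavaScript literal, but strict JSON parsers reject it.
payload = json.dumps({"samples": samples})
print(payload)

# With allow_nan=False, serialization raises ValueError instead of emitting NaN.
try:
    json.dumps({"samples": samples}, allow_nan=False)
    strict_ok = True
except ValueError:
    strict_ok = False

# One option: map NaN to None so it is serialized as `null`.
cleaned = [[None if math.isnan(v) else v for v in row] for row in samples]
clean_payload = json.dumps({"samples": cleaned})
print(clean_payload)
```

Whether the front end should receive `NaN`, `null`, or no missing rows at all is a design decision for the maintainers; the sketch only illustrates what each serialization looks like.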
Following the rabbit trail from the Of these two functions, Line 447 in ec85c7a
At this point, my knowledge of TypeScript and WASM is stopping me from investigating this bug further. I suspect the issue is coming from I would very much appreciate help from the devs to track down this bug! Presently, this is preventing me from using GAMChanger with my application (predicting court case outcomes).
Wow @aaronsnoswell, thank you so much for your detailed report and effort in debugging this issue! I tried to reproduce this error using your example, but I got a ValueError when fitting an EBM model with missing values. I believe EBM does not support missing values yet? My
x1 y
0 0.0 3.0
1 NaN 0.5
2 NaN -2.0
3 7.5 -4.5
4 10.0 -7.0
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-13a640a7d74a> in <module>
32 # Train model
33 ebm = ExplainableBoostingRegressor(interactions=False)
---> 34 ebm.fit(X, y)
~/miniconda3/envs/gam/lib/python3.7/site-packages/interpret/glassbox/ebm/ebm.py in fit(self, X, y, sample_weight)
822 # AND add some tests for the X.dim == 1 scenario
823
--> 824 # TODO PK write an efficient striping converter for X that replaces unify_data for EBMs
825 # algorithm: grap N columns and convert them to rows then process those by sending them to C
826
~/miniconda3/envs/gam/lib/python3.7/site-packages/interpret/utils/all.py in unify_data(data, labels, feature_names, feature_types, missing_data_allowed)
340 msg = "Missing values are currently not supported."
341 log.error(msg)
--> 342 raise ValueError(msg)
343
344 return new_data, new_labels, new_feature_names, new_feature_types
ValueError: Missing values are currently not supported.
Hi @xiaohk, thanks for getting back to me! The latest versions of Interpret have experimental support for missing values. I forgot to mention that I am using this experimental code. To enable it, you need to change a few places in the interpret source code. See this comment on interpretml/interpret#18 for the details. So for instance, I checked After doing that, the example should work. Thanks again!
Signed-off-by: Jay Wang <jay@zijie.wang>
I see, thanks! EBM's experimental support for missing values introduces a separate fb7ba18#diff-de8698f459a11697fd2d6614444871f69e802ae1af354cd35aba32e62e6698bbR267-R277 I will close this issue for now. Let me know if it doesn't work for you @aaronsnoswell. Thanks for reaching out to me!
Thanks for looking into this, @xiaohk! fb7ba18 seems like a good patch for now. In the longer run, though, dropping all rows with missing values is pretty rough for users with real-world data :D I'd be happy to take a stab at adding proper support for missing values if you can provide a little guidance. E.g. could you draw a sketch / doodle of what the GAMChanger interface should look like to show the missing value bin (where would this go in the interface?). Also, is there any documentation about setting up a development environment for GAMChanger?
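For readers who want the same effect on the Python side before calling `gc.visualize`, here is a rough sketch of dropping rows with missing values while keeping `X` and `y` aligned. The helper name `drop_missing_rows` is hypothetical and this is not the code in fb7ba18; it only illustrates the "drop all rows with missing values" behaviour described above, using the data from the MWE.

```python
import numpy as np
import pandas as pd


def drop_missing_rows(X, y):
    """Return copies of X and y restricted to rows where X has no missing values.

    Hypothetical helper for illustration; not part of GAMChanger or interpret.
    """
    mask = X.notna().all(axis=1)  # True for rows with no NaN in any feature
    return X.loc[mask], y.loc[mask]


# Same data as the failing MWE: three of five feature values are missing.
x1 = np.linspace(0, 10, 5)
y = pd.Series(-1.0 * x1 + 3.0, name="y")
x1[1:4] = np.nan
X = pd.DataFrame({"x1": x1})

X_clean, y_clean = drop_missing_rows(X, y)
print(X_clean)
print(y_clean)
```

The cleaned frames could then be passed on as in the MWE, e.g. `gc.visualize(ebm, X_clean, y_clean)`. Only two of the five rows survive here, which is the cost the comment above points out for real-world data.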
Thank you so much for your interest! I believe supporting missing values will be super helpful. Adding this feature might sound straightforward, but I am sure it would require A LOT of work. 😅 Some high-level steps:
To set up a development environment for GAM Changer:
git clone git@github.com:interpretml/gam-changer.git
# Install the dependencies:
npm install
# Start a development server
npm run dev
You might have noticed that the EBM inference and isotonic regression WebAssembly code are shipped as binaries in this repo. Their source code is at xiaohk/ebm.js and xiaohk/isotonic.js, respectively. If you are interested, I am happy to provide any sketches, feedback, and guides that can help you! It would be a hard and rewarding contribution to GAM Changer!
Wow :) That does sound like a lot of work. A first point - Perhaps a good starting point is to figure out how stable the P.S. Thanks for the dev environment instructions.
I've found a bug where GAMChanger sometimes doesn't populate the 'metrics' / 'feature' / 'history' panel. It seems that when this happens, the GAMChanger interface has failed to load the validation samples, because the status bar says "0/0 validation samples selected".
This seems to occur sometimes based on the data that is provided, and might have something to do with missing data points, but I'm struggling to figure out exactly what the cause is.
Below is the smallest reproducing example I can come up with. See following comment for a better MWE.
I've attached the CSV files, which differ in that the 'succeed' files have a single extra data point. That is, when loading 'demo-[X|y]-fail.csv' the GAMChanger interface loads, but the side panel doesn't populate (unexpected behaviour). When loading 'demo-[X|y]-succeed.csv', the GAMChanger interface loads and the side panel populates the metrics as expected.
demo-X-fail.csv
demo-X-succeed.csv
demo-y-fail.csv
demo-y-succeed.csv
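When comparing a failing and a succeeding dataset like these, a per-column count of missing values can help narrow down whether missing data is involved. The sketch below is only an illustration: `summarize_missing` is a hypothetical helper, and the two small frames are stand-ins patterned on the compact MWE rather than the contents of the attached CSV files.

```python
import numpy as np
import pandas as pd


def summarize_missing(X):
    """Return the count and fraction of missing values for each column of X."""
    return pd.DataFrame(
        {
            "n_missing": X.isna().sum(),
            "frac_missing": X.isna().mean(),
        }
    )


# Hypothetical stand-ins for a failing and a succeeding feature table.
X_fail = pd.DataFrame({"x1": [0.0, np.nan, np.nan, np.nan, 10.0]})
X_succeed = pd.DataFrame({"x1": [0.0, np.nan, np.nan, 7.5, 10.0]})

fail_summary = summarize_missing(X_fail)
succeed_summary = summarize_missing(X_succeed)
print(fail_summary)
print(succeed_summary)
```

With the real attachments, the frames would instead come from something like `pd.read_csv("demo-X-fail.csv")`.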