Explore the data with continuous output and category input #540

Open
Vu1992 opened this issue May 9, 2024 · 4 comments

Comments

@Vu1992

Vu1992 commented May 9, 2024

Hi,

Thanks for your great work. I have one question regarding data exploration. Is it possible to use the following code to explore data with a continuous output and categorical input?

marginal = Marginal(names).explain_data(X_train, y_train, name='Train Data')
show(marginal)

When I run the above code, it fails with TypeError: Unable to do the formular for 'str'

@paulbkoch
Collaborator

Hi @Vu1992 -- It should handle continuous output and category input. I don't see that error message in our repo or on the internet. Can you include a stack trace? Also, is the data public?

@Vu1992
Author

Vu1992 commented May 13, 2024

Hi @paulbkoch ,

Thanks for your reply. Unfortunately the data is private, but I can show you what I'm trying to do. I have a DataFrame df holding my data, and I run the following:
A = df[['BRANCH']]; B = df[['Gross_Incurred']]; names = ['BRANCH']
A and B contain the values shown in the images below:
image
image
Then I use your code for data exploration:
marginal = Marginal(names).explain_data(A, B, name='Train Data'); show(marginal)
Python then raises TypeError: unsupported operand type(s) for -: 'str' and 'str'

@paulbkoch
Collaborator

I tried to replicate this with the following code:

import numpy as np
import pandas as pd
from interpret.data import Marginal
from interpret import show
names=['BRANCH']
A = pd.DataFrame()
A["BRANCH"] = pd.Series(np.array(['VC', 'VC', 'MS', 'VH'], dtype=np.str_))
B = pd.DataFrame()
B["Gross_Incurred"] = pd.Series(np.array([18000000.0, 36200000000.0, 0.0, -50000000.0], dtype=float))
marginal = Marginal(names).explain_data(A, B, name='Train Data'); show(marginal)

My example works though. Any idea what could be different?
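
One difference worth checking is the dtype of the target column. This is an assumption, not something confirmed in this thread: the reported message is exactly what Python produces when one string is subtracted from another, so the error would be consistent with Gross_Incurred having been loaded as text. The sketch below uses made-up values with thousands separators to show how to spot and convert such a column with pandas before passing it to Marginal.

```python
import pandas as pd

# Subtracting two strings yields the same message as in the report.
try:
    "18000000" - "0"
except TypeError as exc:
    message = str(exc)
print(message)

# Hypothetical data: the target was read as text (e.g. from a CSV with
# thousands separators), so pandas does not treat it as numeric.
df = pd.DataFrame({
    "BRANCH": ["VC", "VC", "MS", "VH"],
    "Gross_Incurred": ["18,000,000", "36,200,000,000", "0", "-50,000,000"],
})
print(df.dtypes)
target_is_numeric_before = pd.api.types.is_numeric_dtype(df["Gross_Incurred"])

# Strip the separators and convert; values that cannot be parsed become NaN.
cleaned = pd.to_numeric(
    df["Gross_Incurred"].str.replace(",", "", regex=False),
    errors="coerce",
).astype(float)

A = df[["BRANCH"]]
B = cleaned.to_frame()  # single-column DataFrame named "Gross_Incurred"
target_is_numeric_after = pd.api.types.is_numeric_dtype(B["Gross_Incurred"])
```

If the check shows the target is already numeric, this is not the cause and the stack trace would be the next thing to look at.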

@Vu1992
Author

Vu1992 commented May 15, 2024

Thanks for your help.
I don't know what went wrong last time, but when I tried again it worked. However, the graph does not change when I switch the type to Categorical, even with your replication code.
When I add a continuous variable, it shows this:
image
But when I want to see the categorical variable, nothing changes:
image
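
The categorical view is left unresolved in this thread. As a stopgap, a per-category summary can be computed directly with pandas. The sketch below reuses the values from the replication code above and is independent of interpret; it is not a description of what the Marginal explanation is meant to display.

```python
import pandas as pd

# Same values as in the replication code earlier in the thread.
A = pd.DataFrame({"BRANCH": ["VC", "VC", "MS", "VH"]})
B = pd.DataFrame({"Gross_Incurred": [18000000.0, 36200000000.0, 0.0, -50000000.0]})

# Row count and mean response for each category of the input feature.
summary = (
    pd.concat([A, B], axis=1)
    .groupby("BRANCH")["Gross_Incurred"]
    .agg(["count", "mean"])
)
print(summary)
```

Calling summary["mean"].plot.bar() would give a simple bar chart of the same numbers, assuming matplotlib is installed.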
