
EBM Loss functions #196

Closed

JoshuaC3 opened this issue Jan 20, 2021 · 10 comments

Comments

@JoshuaC3

What loss functions are being used for the boosting of the EBMs, for both regression and classification? I searched the repo and could only find this, but wasn't sure how _merged_pair_score_fn relates to boosting.

Additionally, how can one use non-standard metrics to train the model? E.g., assuming RMSE is used, how could we use MAE, Huber, or custom loss functions? This is a major feature in other GBM/boosting packages, and having it would make EBM even more competitive.

Many thanks!! :D
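
For reference, here is the kind of interface those other packages expose: LightGBM, for example, accepts a custom objective as a Python callable that returns each sample's gradient and hessian. A minimal sketch, using an illustrative pseudo-Huber loss (the function name and delta value are my own examples, nothing from interpret):

```python
import numpy as np
import lightgbm as lgb

def pseudo_huber_objective(y_true, y_pred):
    # Pseudo-Huber loss with delta = 1.0 (illustrative choice).
    # LightGBM expects the objective to return (grad, hess) per sample.
    delta = 1.0
    r = y_pred - y_true
    scale = 1.0 + (r / delta) ** 2
    grad = r / np.sqrt(scale)                # d(loss)/d(pred)
    hess = 1.0 / (scale * np.sqrt(scale))    # d2(loss)/d(pred)^2
    return grad, hess

# sklearn-style API: pass the callable directly as the objective
model = lgb.LGBMRegressor(objective=pseudo_huber_objective)
```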

@JoshuaC3
Author

To start with, any info on the types of loss functions currently used would be really useful. Thanks.

@davidlkl

Hi, I am also looking to implement a custom loss function. However, after reading the source, I think the loss function is implemented in the C library:

def _get_ebm_lib_path(debug=False):

@JoshuaC3
Author

Yes. Unfortunately, I cannot read C++, so I cannot even figure out which loss functions are currently being used. It also doesn't seem to be documented anywhere. Any info or insight on this would be great.

@davidlkl

Maybe off topic, but I think you may find this useful:
https://github.com/ZebinYang/gaminet

They claim it beats EBM in most of their cases, with a similar setting (GA2M).

@interpret-ml
Collaborator

interpret-ml commented Feb 8, 2021

Hi @JoshuaC3, @davidlkl --

Yes, as you've pointed out, the loss function code is currently all implemented in C++. We would like to expose this in Python someday, but we're not yet clear on how that will look or what the performance costs might be. For classification we currently use log loss, and for regression MSE. The EbmStats.h file contains most of these functions, but if you're not familiar with C++ it might be difficult to change them currently.

-InterpretML team
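
For readers who don't want to dig into EbmStats.h: in gradient-boosting terms, those two losses come down to simple per-sample gradient/hessian formulas. A sketch of the standard math in Python (this is not interpret's actual code, just the textbook derivatives):

```python
import numpy as np

def mse_grad_hess(y_true, pred):
    # MSE for regression: loss = 0.5 * (pred - y)^2
    grad = pred - y_true           # first derivative w.r.t. the prediction
    hess = np.ones_like(pred)      # second derivative is constant
    return grad, hess

def logloss_grad_hess(y_true, score):
    # Log loss for binary classification, on raw additive scores (logits)
    p = 1.0 / (1.0 + np.exp(-score))   # sigmoid to get probabilities
    grad = p - y_true
    hess = p * (1.0 - p)
    return grad, hess
```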

@JoshuaC3
Author

Thank you! My assumption was MSE/RMSE, but it is nice to have that confirmed. Unfortunately, I cannot write C++, otherwise I would be submitting PRs for this.

In my opinion, being able to write custom objective functions and custom eval/scoring functions in Python is really important. The other boosting libraries should have some good examples of how they do this, and I don't notice any major speed penalty when using custom functions in them (these days).

Thanks again for the amazing work and also the really helpful replies!!

@interpret-ml
Collaborator

interpret-ml commented Feb 10, 2021

Hi @JoshuaC3 --

We've been looking at this question in more depth over the last few days, including how XGBoost and LightGBM handle it internally. Exposing it at the Python level probably won't happen for a while, because their method relies on an interface to their internal dataset (DMatrix for XGBoost), and we don't yet have a clean separation of that concept in InterpretML. In the shorter term, though, we think some changes in the C++ could reduce the work needed for a new loss function to replacing just a few lines of code. We'll continue to investigate and will update this thread if/when that change happens; if it does, we'll also write up a description here of how to change it.

-InterpretML team
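
To make the DMatrix dependency mentioned above concrete: XGBoost's custom-objective hook hands the callback its internal DMatrix, and the callback fetches the labels back out of it. A minimal sketch using XGBoost's documented obj parameter (the function name and the synthetic X/y data are illustrative only):

```python
import numpy as np
import xgboost as xgb

def custom_squared_error(preds, dtrain):
    # xgb.train passes raw predictions plus the internal DMatrix;
    # the labels have to be pulled back from the DMatrix itself.
    labels = dtrain.get_label()
    grad = preds - labels
    hess = np.ones_like(preds)
    return grad, hess

# illustrative stand-in data
X = np.random.rand(100, 4)
y = np.random.rand(100)
dtrain = xgb.DMatrix(X, label=y)

booster = xgb.train({"max_depth": 2}, dtrain, num_boost_round=10,
                    obj=custom_squared_error)
```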

@JoshuaC3
Author

That's really useful to know. I always wondered why XGBoost, LightGBM, and (I think) CatBoost use their own dataset objects. I guess this is one motivation. Thank you interpret-ml!!! :D

@JDE65

JDE65 commented Dec 8, 2022

Eager to see progress on this front.
The loss function is the engine of the algorithm, and customizing it is sometimes so effective ;-)

@paulbkoch paulbkoch mentioned this issue Jan 26, 2023
@paulbkoch paulbkoch added the enhancement New feature or request label Jan 26, 2023
@paulbkoch
Collaborator

Closing this to consolidate issues. We'll track updates regarding custom losses in the duplicate issue #281.

@paulbkoch paulbkoch removed the enhancement New feature or request label Feb 10, 2023