Verbose option during training #252

Open
onacrame opened this issue Jun 11, 2021 · 5 comments
Labels
enhancement New feature or request

Comments

@onacrame

Training time can be quite long, particularly when inner bagging is used. It would be helpful to have a verbose option to see how far along the EBM model is in training and how it is performing on the validation set.

@interpret-ml
Collaborator

Hi @onacrame,

This is a reasonable suggestion, and one we've been thinking about developing on our end too. The main reason we haven't implemented it yet is the EBM's parallelization model -- unfortunately, it's a bit trickier for us to provide reasonable verbose output than in the "standard" boosting algorithm setting.

The primary complexities are due to 1) bagging (as you've pointed out), and 2) early stopping. The default EBM builds 8 mini-EBM models in parallel, and assuming your machine has enough cores, they tend to all finish at approximately the same time. This has a few implications:

  • A progress bar/verbose output just reporting which bag has finished (like sklearn's Random Forest) may not be that helpful by default, as all 8 bags tend to finish at the same time but might individually take a very long time to run. Of course, this would be helpful in the setting where # bags >> # cores.
  • It's difficult to provide true realtime validation metrics, because the final model is an average of all the mini-models once they finish training.
  • It's also hard to show a progress bar "per bag" due to early stopping -- it's very difficult to know a priori how early the model will exit. So we may show a progress bar from 1 -> 5000, but have the model early stop at iteration 800, leading to misleading estimates of true runtime.

We have a couple of options we've been thinking about, and would be curious to hear your (and anyone else reading this) thoughts:

  1. Just report the number of outer_bags completed, without realtime validation metrics.
  2. Report outer_bags completed, with real time validation from one randomly selected bag.
  3. Show a progress bar per outer_bag from 1 to max_rounds, which may exit early. We can also show validation metrics per bag in this setting, but it may become overwhelming if outer_bags is increased significantly.
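Purely as an illustration of option 2 -- this is not an existing interpret API, just a hypothetical sketch of the kind of callbacks such output could be driven by -- one randomly selected bag could stream its validation metric while the other bags only report completion:

from random import randrange

# Hypothetical sketch only -- interpret does not currently expose these hooks.
n_outer_bags = 8
reporting_bag = randrange(n_outer_bags)  # bag whose validation metric is streamed
completed = 0

def on_round(bag_id, round_num, val_log_loss):
    # would be called by the training loop after each boosting round
    if bag_id == reporting_bag and round_num % 100 == 0:
        print(f"[bag {bag_id}] round {round_num}: validation log-loss = {val_log_loss:.4f}")

def on_bag_done(bag_id):
    # would be called once a bag finishes (or early stops)
    global completed
    completed += 1
    print(f"outer bag {bag_id} finished ({completed}/{n_outer_bags})")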

Thanks again for the great question!
-InterpretML Team

@onacrame
Author

Thanks for the detailed response. Completely get the limitations/practicalities now.

@JoshuaC3

JoshuaC3 commented Jun 16, 2021

I think a verbose setting where verbose=0/None gives no output and verbose=100 prints every 100th round would be sufficient. This is the general behaviour of the "Big Three" GBM libraries.

Could this not be done by storing the scores for each mini-model in memory at every 100th round (in the case of verbose=100) and then, once the last of the mini-models reaches that round, computing the average and printing it?
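As a toy sketch of that bookkeeping (nothing to do with interpret's internals): each mini-model reports its validation score at every verbose-th round, and once all bags have reached a checkpoint the average is printed:

import numpy as np

# Toy sketch of the checkpoint-averaging idea above, not interpret's internals.
verbose = 100
n_bags = 8
checkpoint_scores = {}  # round number -> list of per-bag validation scores

def report(bag_id, round_num, val_score):
    if round_num % verbose != 0:
        return
    scores = checkpoint_scores.setdefault(round_num, [])
    scores.append(val_score)
    if len(scores) == n_bags:  # the last bag has reached this checkpoint
        print(f"round {round_num}: mean validation score = {np.mean(scores):.4f}")

# simulated calls, standing in for the real training loop:
for bag in range(n_bags):
    report(bag, 100, 0.80 + 0.01 * bag)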

I understand this could very slightly increase training time, but I think it is really important for user experience, for debugging/finding frozen models and, probably most importantly, for understanding how the model is training -- and therefore giving the user a better understanding of how the model works and how it can be interpreted. For me it strikes at the core of this package!

Finally, the output for training interpretability in CatBoost is as follows, and can be extremely insightful:

[screenshot: CatBoost's per-iteration verbose training output]

from catboost import CatBoostClassifier, Pool

# tiny toy dataset
train_data = [[1, 3], [0, 4], [1, 7], [0, 3]]
train_labels = [1, 0, 1, 1]

eval_data = [[1, 4], [0, 4.2], [1, 7], [0, 3]]
eval_labels = [1, 0, 1, 1]

model = CatBoostClassifier(learning_rate=0.03)

# verbose=100 prints training/eval metrics every 100th iteration
model.fit(train_data,
          train_labels,
          verbose=100,
          eval_set=(eval_data, eval_labels),
          plot=True)

Just some food for thought :D

@bverhoeff

bverhoeff commented Sep 19, 2022

[quotes @interpret-ml's earlier response in full, including the three proposed verbose options]

Option 2) would be very helpful. If I could even estimate the remaining time based on completed rounds and the tendency to converge/early stop, it would help tremendously.

Note that if the randomly chosen bag finishes before the rest, automatically switching to reporting another bag would be a nice addition.

@paulbkoch
Collaborator

paulbkoch commented Sep 20, 2022

EBMs generally take a fixed and consistent amount of time per round of boosting. You should be able to get a pretty good EBM after somewhere in the range of 1000-2000 rounds of boosting. Our default max_rounds is set to 5000, but the model usually early stops before reaching that.

I think if you do an initial run with 20 rounds, most of that time will be spent boosting rather than on startup. So, to get an overall estimate, multiply that time by 100 and it should be in the ballpark for 2000 rounds.

I agree some feedback on time remaining would be better, but for now this is the best approach I can offer.
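A rough sketch of that estimate (using synthetic stand-in data; substitute your own training set) might look like:

import time
from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.datasets import make_classification

# synthetic stand-in data -- swap in your real training set
X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)

# time a short run of 20 boosting rounds, then extrapolate to ~2000 rounds
start = time.time()
ExplainableBoostingClassifier(max_rounds=20).fit(X, y)
elapsed = time.time() - start

print(f"20 rounds took {elapsed:.1f}s; "
      f"roughly {elapsed * 100:.0f}s expected for 2000 rounds")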

@paulbkoch paulbkoch added the enhancement New feature or request label Feb 10, 2023
@paulbkoch paulbkoch mentioned this issue Feb 10, 2023