hpbandster-sklearn is a Python library providing a scikit-learn wrapper, HpBandSterSearchCV, for HpBandSter, a hyperparameter tuning library.
HpBandSter implements several cutting-edge hyperparameter optimization algorithms, including HyperBand and BOHB. They often outperform standard Random Search, finding good parameter combinations in less time.
HpBandSter is powerful and configurable, but its usage is often unintuitive for beginners and requires a large amount of boilerplate code. To solve that issue, HpBandSterSearchCV was created as a drop-in replacement for scikit-learn hyperparameter searchers. It follows scikit-learn's well-known API, making it possible to tune scikit-learn API estimators with minimal setup.
The HpBandSterSearchCV API is based on scikit-learn's HalvingRandomSearchCV and implements nearly all of its parameters.
pip install hpbandster-sklearn
Use it like any other scikit-learn hyperparameter searcher:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from hpbandster_sklearn import HpBandSterSearchCV

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(random_state=0)
np.random.seed(0)

# Search space: each hyperparameter maps to a list of candidate values
param_distributions = {"max_depth": [2, 3, 4], "min_samples_split": list(range(2, 12))}

search = HpBandSterSearchCV(
    clf, param_distributions, random_state=0, n_jobs=1, n_iter=10, verbose=1
).fit(X, y)
print(search.best_params_)
You can also use ConfigSpace.ConfigurationSpace objects instead of dicts (in fact, it is recommended)!
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from hpbandster_sklearn import HpBandSterSearchCV
import ConfigSpace as CS
import ConfigSpace.hyperparameters as CSH

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(random_state=0)
np.random.seed(0)

# Search space: integer ranges with inclusive lower and upper bounds
param_distributions = CS.ConfigurationSpace(seed=42)
param_distributions.add_hyperparameter(CSH.UniformIntegerHyperparameter("min_samples_split", 2, 11))
param_distributions.add_hyperparameter(CSH.UniformIntegerHyperparameter("max_depth", 2, 4))

search = HpBandSterSearchCV(
    clf, param_distributions, random_state=0, n_jobs=1, n_iter=10, verbose=1
).fit(X, y)
print(search.best_params_)
Please refer to the documentation of this library, as well as to the documentation of HpBandSter and ConfigSpace, for more information.
Pipelines and TransformedTargetRegressor are also supported. If you use either (or both), make sure to prefix the hyperparameter and resource names accordingly, for example final_estimator__n_estimators. The n_samples resource must not be prefixed.
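The prefixing follows the usual scikit-learn convention of joining the component name and the parameter name with a double underscore. As a minimal sketch (the helper nested_name and the component names below are hypothetical and not part of hpbandster-sklearn), such names can be built like this:

```python
def nested_name(name, *components):
    """Build a scikit-learn style nested name such as 'step__param'.

    'n_samples' refers to the dataset rather than to an estimator
    attribute, so it is returned unchanged.
    """
    if name == "n_samples":
        return name
    return "__".join([*components, name])


# A parameter of a pipeline step named 'final_estimator' (hypothetical name)
print(nested_name("n_estimators", "final_estimator"))

# The same pipeline wrapped in a TransformedTargetRegressor, which exposes
# its inner estimator under the 'regressor' parameter
print(nested_name("n_estimators", "regressor", "final_estimator"))

# The sample-based resource keeps its plain name
print(nested_name("n_samples", "final_estimator"))
```

The resulting strings are what you would use as keys in param_distributions or as the value of resource_name.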
As almost every search algorithm in HpBandSter leverages early stopping (mostly through Successive Halving), the user can configure the resource and budget to be used through the arguments of the HpBandSterSearchCV object.
# Use a fraction of the training samples as the resource
search = HpBandSterSearchCV(
    clf,
    param_distributions,
    resource_name='n_samples',  # can be either 'n_samples' or a string corresponding to an estimator attribute, e.g. 'n_estimators' for an ensemble
    resource_type=float,  # if specified, the resource value will be cast to that type before being passed to the estimator, otherwise it will be derived automatically
    min_budget=0.2,
    max_budget=1,
)

# Use the number of estimators in the ensemble as the resource
search = HpBandSterSearchCV(
    clf,
    param_distributions,
    resource_name='n_estimators',  # can be either 'n_samples' or a string corresponding to an estimator attribute, e.g. 'n_estimators' for an ensemble
    resource_type=int,  # if specified, the resource value will be cast to that type before being passed to the estimator, otherwise it will be derived automatically
    min_budget=20,
    max_budget=200,
)
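To give a feel for how min_budget and max_budget interact, here is a minimal sketch of Successive Halving in plain Python. It is an illustration under assumptions, not the code HpBandSter runs: the reduction factor eta=3, the helper names budget_ladder and successive_halving, and the toy scoring function are all made up for this example.

```python
import math


def budget_ladder(min_budget, max_budget, eta=3):
    """Return geometrically spaced budgets that end exactly at max_budget.

    Each rung is eta times larger than the previous one. The smallest rung
    is the smallest such value that is not below min_budget, so it may be
    somewhat larger than min_budget itself.
    """
    if not 0 < min_budget <= max_budget:
        raise ValueError("expected 0 < min_budget <= max_budget")
    n_rungs = int(math.log(max_budget / min_budget) / math.log(eta)) + 1
    return [max_budget / eta ** i for i in range(n_rungs - 1, -1, -1)]


def successive_halving(candidates, evaluate, budgets, eta=3):
    """Score candidates on growing budgets, keeping the top 1/eta each time."""
    survivors = list(candidates)
    for budget in budgets:
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        survivors = ranked[:max(1, len(ranked) // eta)]
    return survivors[0]


# Budgets for resource_name='n_estimators', cast with resource_type=int
budgets = [int(b) for b in budget_ladder(20, 200)]
print(budgets)  # -> [22, 66, 200]

# Toy objective: the score depends only on how close max_depth is to 7
best = successive_halving(range(9), lambda depth, budget: -abs(depth - 7), budgets)
print(best)  # -> 7
```

Most candidates are only ever evaluated on the cheap first rung, which is where the time savings over a search that always uses the full budget come from.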
By default, the object will try to automatically determine the best resource by checking the following, in order:
1. 'n_estimators', if the model has that attribute and the warm_start attribute
2. 'max_iter', if the model has that attribute and the warm_start attribute
3. 'n_samples', if the model doesn't support warm_start. The dataset samples will be used as the resource instead, meaning the model will be iteratively fitted on a bigger and bigger portion of the dataset.
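The order of checks described above can be sketched as follows. This is only an illustration of the stated rules, not the library's actual implementation; the function guess_resource_name and the three stand-in classes are hypothetical.

```python
def guess_resource_name(estimator):
    """Pick a resource name following the documented order of checks."""
    supports_warm_start = hasattr(estimator, "warm_start")
    if supports_warm_start and hasattr(estimator, "n_estimators"):
        return "n_estimators"
    if supports_warm_start and hasattr(estimator, "max_iter"):
        return "max_iter"
    # No warm start available: fall back to growing the training subset
    return "n_samples"


# Stand-ins that only carry the attributes relevant to the check
class ToyEnsemble:
    n_estimators = 100
    warm_start = False


class ToyIterativeModel:
    max_iter = 1000
    warm_start = False


class ToyNeighbors:
    n_neighbors = 5


print(guess_resource_name(ToyEnsemble()))
print(guess_resource_name(ToyIterativeModel()))
print(guess_resource_name(ToyNeighbors()))
```

Passing resource_name explicitly, as in the examples earlier, bypasses this kind of automatic choice.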
Furthermore, special support has been added for LightGBM, XGBoost and CatBoost scikit-learn estimators.
https://hpbandster-sklearn.readthedocs.io/en/latest/
HpBandSter - https://github.com/automl/HpBandSter
ConfigSpace - https://github.com/automl/ConfigSpace
scikit-learn - http://scikit-learn.org/
Antoni Baum (Yard1)