This is a package for performing Bayesian Optimization over String Spaces (BOSS). It accompanies https://arxiv.org/pdf/2010.00979.pdf and provides notebooks to recreate all the experiments.
The code is built on the emukit Bayesian optimization library. We recommend following their tutorials to get started (https://github.com/emukit/emukit/tree/master/notebooks).
We currently support the following spaces:
- unconstrained strings of fixed length
- locally-constrained strings of fixed length
- strings of varied length following constraints given by a context-free grammar
- a candidate set of strings of varied length
and provide implementations for the following surrogate models:
- Gaussian process with a linear kernel applied to a one-hot-encoding of strings
- Gaussian process with an RBF kernel applied to a bag-of-ngrams representation of strings
- Gaussian process with a string subsequence kernel (SSK)
- Gaussian process with a split SSK kernel (for scaling SSK to long strings)
- Random search
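
To make the first two feature-based surrogates concrete, here is a minimal NumPy-only sketch of the string representations they rely on: a one-hot encoding fed to a linear kernel, and a bag-of-ngrams count vector fed to an RBF kernel. The helper names (`one_hot`, `bag_of_ngrams`, `linear_kernel`, `rbf_kernel`), the toy alphabet, and the lengthscale are assumptions made for illustration; they are not this package's API, which wraps emukit models instead.

```python
from collections import Counter
from itertools import product

import numpy as np


def one_hot(s, alphabet):
    """Flatten a fixed-length string into a vector of size len(s) * len(alphabet)."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    x = np.zeros((len(s), len(alphabet)))
    for pos, ch in enumerate(s):
        x[pos, index[ch]] = 1.0
    return x.ravel()


def bag_of_ngrams(s, alphabet, n=2):
    """Count occurrences of every possible contiguous n-gram over the alphabet."""
    vocab = ["".join(p) for p in product(alphabet, repeat=n)]
    counts = Counter(s[i:i + n] for i in range(len(s) - n + 1))
    return np.array([counts[g] for g in vocab], dtype=float)


def linear_kernel(X):
    """Gram matrix of inner products between feature rows."""
    return X @ X.T


def rbf_kernel(X, lengthscale=1.0):
    """Gram matrix exp(-||x - y||^2 / (2 * lengthscale^2)) between feature rows."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / lengthscale**2)


# Hypothetical toy data: three length-4 strings over a two-letter alphabet.
alphabet = "ab"
strings = ["abab", "abba", "bbbb"]

X_onehot = np.stack([one_hot(s, alphabet) for s in strings])
X_ngram = np.stack([bag_of_ngrams(s, alphabet) for s in strings])

K_lin = linear_kernel(X_onehot)  # entry (i, j) counts positions where strings i and j agree
K_rbf = rbf_kernel(X_ngram)      # similarity based on shared bigram counts
```

With one-hot features the linear kernel only compares characters position by position, so it requires strings of equal length, whereas the n-gram representation ignores absolute position and also applies to strings of varied length.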
We also provide GPU support for string kernel GPs through GPflow. For an example see