# pyBPL: Python-based Bayesian Program Learning

pyBPL is a package of tools to implement Bayesian Program Learning (BPL) in Python 3 using a PyTorch backend. The [original BPL implementation](https://github.com/brendenlake/BPL) was written in MATLAB (see Lake et al. (2015): "Human-level concept learning through probabilistic program induction").

I'm a Ph.D. student with Brenden Lake and I've developed this library for our [ongoing modeling work](https://lake-lab.github.io/projects/#concept-learning-in-minds-and-machines). At the moment, only the forward generative model is complete; inference algorithms are still in the works (contributions welcome!). The library is still experimental and under heavy development.

The key components of this repository are:

1. A fully-differentiable implementation of BPL character learning tools including symbolic rendering, spline fitting/evaluation, and model scoring (log-likelihoods).
2. A generalized framework for representing concepts and conceptual background knowledge as probabilistic programs. Character concepts are one manifestation of the framework, included here as the preliminary use case.

I am thankful to Maxwell Nye, Mark Goldstein and Tuan-Anh Le for their help developing this library.

## Setup

This code repository requires Python 3 and PyTorch >= 1.0.0. A full list of requirements can be found in `requirements.txt`. To install, first run the following command to clone the repository into a folder of your choice:
```
git clone https://github.com/rfeinman/pyBPL.git
```
Then, run the following command from the root folder of the cloned repository to install the package:
```
python setup.py install
```

## Documentation

In order to generate the documentation site for the pyBPL library, execute the following commands from the root folder:
```
cd docs/
make html
```
HELP WANTED: the documentation build is currently broken and needs to be fixed.

## Usage Example

The following code loads the BPL model with pre-defined hyperparameters, then samples a character type, a token of that type, and an image of that token, scoring the log-probability of each:
```python
from pybpl.library import Library
from pybpl.model import CharacterModel

# load the hyperparameters of the BPL graphical model (i.e. the "library")
lib = Library(use_hist=True)

# create the BPL graphical model
model = CharacterModel(lib)

# sample a character type from the prior P(Type) and score its log-probability
char_type = model.sample_type()
ll_type = model.score_type(char_type)

# sample a character token from P(Token | Type=type) and score its log-probability
char_token = model.sample_token(char_type)
ll_token_given_type = model.score_token(char_type, char_token)

# sample an image from P(Image | Token=token) and score its log-probability
image = model.sample_image(char_token)
ll_image_given_token = model.score_image(char_token, image)
```

## Status Notes

#### General

All functions required to sample character types, tokens and images are now complete.

Currently, independent relations sample their position from a uniform distribution over the entire image window by default. To use the original spatial histogram from BPL, make sure to load the Library object with `use_hist=True`. Note, however, that log-likelihoods for spatial histograms are not differentiable.

My Python implementations of the bottom-up image parsing algorithms are not yet complete (HELP WANTED! see `pybpl/bottomup` for current status). However, I have provided some wrapper functions that call the original MATLAB code using the [MATLAB Engine API for Python](https://www.mathworks.com/help/matlab/matlab-engine-for-python.html). These functions are located in `pybpl/matlab/bottomup`. You must have the MATLAB bindings installed to use this code.

#### Library

The library contains all of the parameters of the character learning BPL model. These parameters have been learned from the Omniglot dataset. The library data is stored as a series of `.mat` files in the subfolder `lib_data/`.
I've included a MATLAB script, `process_library.m`, which can be run inside the original BPL repository to obtain this folder of files. For an example of how to load the library, see `examples/generate_character.py`.

## Demos

Currently there are 3 working demos, all found in the `examples` subfolder.

#### 1. generate character

You can generate a character type and sample a few tokens of the type by running the following command from the root folder:
```
python examples/generate_character.py
```
The script will sample a character type from the prior and then sample 4 tokens of the type, displaying the images.

#### 2. optimize character type

You can generate a character type and then optimize its parameters to maximize the likelihood of the type under the prior by running the following command from the root folder:
```
python examples/optimize_type.py
```
Optionally, you may add the integer parameter `--ns=<int>` to specify how many strokes you would like the generated character type to have.

#### 3. bottom-up parsing

To use the bottom-up parsing code, you must meet the following prerequisites:

- You must have an active MATLAB installation and must have installed the [MATLAB Engine API for Python](https://www.mathworks.com/help/matlab/matlab-engine-for-python.html).
- You must download the [BPL MATLAB repository](https://github.com/brendenlake/BPL) and all of its prerequisites, including the Lightspeed toolbox. The BPL repo must be added to your MATLAB path (alternatively, you may set a `BPL_PATH` environment variable as `export BPL_PATH="/path/to/BPL"`).

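
The prerequisites above can be sanity-checked from Python before running the demo. The following is only a rough sketch and is not part of pyBPL: the helper name `check_bottomup_prereqs` is hypothetical, and it checks only that the `matlab` package installed by the MATLAB Engine API is importable and that `BPL_PATH`, if set, points to an existing directory. It does not verify a working MATLAB license or the Lightspeed toolbox.

```python
import importlib.util
import os


def check_bottomup_prereqs(env=None):
    """Return a list of messages describing potential setup problems.

    Hypothetical helper (not part of pyBPL). An empty list means no
    problems were detected by these basic checks.
    """
    env = os.environ if env is None else env
    messages = []

    # The MATLAB Engine API for Python installs a top-level `matlab` package.
    if importlib.util.find_spec("matlab") is None:
        messages.append("MATLAB Engine API for Python not found (no `matlab` package)")

    # BPL_PATH is optional: the BPL repo may instead already be on the MATLAB path.
    bpl_path = env.get("BPL_PATH")
    if bpl_path is None:
        messages.append("BPL_PATH is not set; the BPL repo must be on your MATLAB path")
    elif not os.path.isdir(bpl_path):
        messages.append(f"BPL_PATH is not a directory: {bpl_path}")

    return messages


if __name__ == "__main__":
    for message in check_bottomup_prereqs():
        print("warning:", message)
```
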

With these prerequisites met, you can produce bottom-up parses using the skeleton extraction + random walks algorithm with the following example script:
```
python examples/parse_image.py
```

## Citing

If you use pyBPL for your research, you are encouraged (though not required) to cite this repository with the following BibTeX reference:
```
@misc{feinman2020pybpl,
    title={{pyBPL}},
    author={Feinman, Reuben},
    year={2020},
    version={0.1},
    url={https://github.com/rfeinman/pyBPL}
}
```