This is a PyTorch-Lightning-based framework, built on our End-to-End Weak Supervision paper (NeurIPS 2021), that lets you train your favorite neural network for weakly-supervised classification [1]
- using only multiple labeling functions (LFs) [2], i.e. without any labeled training data!
- in an end-to-end manner, i.e. you directly train and evaluate your neural net (the end-model from here on); there is no need to train a separate label model as in Snorkel & co.,
- with better test set performance and greater robustness against correlated or inaccurate LFs than prior methods like Snorkel.
[1] This includes learning from crowdsourced labels or annotations!
[2] LFs are labeling heuristics that output noisy labels for (subsets of) the training data (e.g. crowdworkers or keyword detectors).
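To make the notion of a labeling function concrete, here is a minimal, library-independent sketch of keyword-style LFs applied to a handful of texts. It does not use the Weasel API; the function names, the toy spam/ham task, and the use of -1 for "abstain" (a convention borrowed from Snorkel) are assumptions made purely for illustration.

```python
# Illustrative only: plain-Python labeling functions, not part of Weasel.
ABSTAIN = -1      # assumed convention: -1 means "this LF does not vote"
HAM, SPAM = 0, 1  # hypothetical binary task

GREETINGS = {"hi", "hello", "thanks"}


def lf_contains_link(text: str) -> int:
    """Vote SPAM if the text contains a link, otherwise abstain."""
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_mentions_prize(text: str) -> int:
    """Vote SPAM if the text mentions a prize, otherwise abstain."""
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_greeting(text: str) -> int:
    """Vote HAM for short messages that open with a greeting."""
    words = text.lower().split()
    if words and len(words) <= 4 and words[0] in GREETINGS:
        return HAM
    return ABSTAIN


LFS = [lf_contains_link, lf_mentions_prize, lf_short_greeting]


def apply_lfs(texts, lfs=LFS):
    """Return the label matrix: one row per example, one column per LF."""
    return [[lf(text) for lf in lfs] for text in texts]


def coverage(label_matrix):
    """Fraction of examples on which each LF casts a (non-abstain) vote."""
    n_examples = len(label_matrix)
    n_lfs = len(label_matrix[0])
    return [
        sum(row[j] != ABSTAIN for row in label_matrix) / n_examples
        for j in range(n_lfs)
    ]


texts = [
    "Claim your prize at http://example.com",
    "Hello there",
    "Meeting moved to 3pm",
    "You won a prize!",
]
label_matrix = apply_lfs(texts)
print(label_matrix)            # [[1, 1, -1], [-1, -1, 0], [-1, -1, -1], [-1, 1, -1]]
print(coverage(label_matrix))  # [0.25, 0.5, 0.25]
```

Each LF labels only a subset of the examples and may be wrong. A noisy, partially filled label matrix of this kind, together with the unlabeled features, is the sort of input that weak supervision methods learn from in place of ground-truth labels.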
If you use this code, please consider citing our work:
End-to-End Weak Supervision
Salva Rühling Cachay, Benedikt Boecking, and Artur Dubrawski
Advances in Neural Information Processing Systems (NeurIPS), 2021
arXiv:2107.02233v3
Credits
- The following template was extremely useful as a source of inspiration and for getting started with the PL+Hydra implementation: ashleve/lightning-hydra-template
- Weasel image credits go to Rohan Chang for this Unsplash-licensed image
This library assumes familiarity with (multi-source) weak supervision. If that's not the case, you may want to first learn the basics from, e.g., these overview slides from Stanford or this Snorkel tutorial.
That being said, have a look at our examples and the notebooks therein, which show you how to use Weasel for your own dataset, LF set, or end-model. E.g.:
- A high-level starter tutorial with little code and many explanations, including Snorkel as a baseline (so that if you are familiar with Snorkel you can see the similarities to and differences from Weasel).
- See how the whole WeaSEL pipeline works in full detail, with all necessary steps and definitions for a new dataset & custom end-model. This notebook will probably teach you the most about WeaSEL and how to apply it to your own problem.
- A realistic ML experiment script with everything that's part of an ML pipeline, including logging to Weights & Biases, arbitrary callbacks, and eventually retrieving your fully trained end-model.
Please have a look at the research code branch, which operates on pure PyTorch.
1. New environment (recommended, but optional)
conda create --name weasel python=3.9
conda activate weasel
2a: From source
python -m pip install "git+https://github.com/autonlab/weasel#egg=weasel[all]"
2b: From source, editable install
git clone https://github.com/autonlab/weasel.git
cd weasel
pip install -e ".[all]"
Minimal dependencies
Minimal dependencies, in particular not using Hydra, can be installed with
python -m pip install git+https://github.com/autonlab/weasel
The required environment corresponds to conda env create -f env_gpu_minimal.yml.
If you choose this variant, you won't be able to run some of the examples. You may want to have a look at this notebook, which walks you through how to use Weasel without Hydra as the config manager.
Note: Weasel is under active development, so some uncovered edge cases might exist. Any feedback is very welcome!
Optional: This template config will help you get started with your own application. An analogous config is used in this tutorial script, which you may want to check out.
Please have a look at the detailed instructions in this Readme.
@article{cachay2021endtoend,
author={R{\"u}hling Cachay, Salva and Boecking, Benedikt and Dubrawski, Artur},
journal={Advances in Neural Information Processing Systems},
title={End-to-End Weak Supervision},
year={2021}
}