This tutorial is about sktime - a unified framework for machine learning with time series. sktime contains algorithms and tools for building, applying, and evaluating modular pipelines and composites for a variety of time series learning tasks, including forecasting, classification, and regression. sktime is easily extensible by anyone, and interoperable with the python data science stack.
This tutorial gives a walkthrough of new sktime features in 2022-2023, together with an updated general introduction.
In the tutorial, we will move through notebooks section by section.
There are several ways to run the tutorial notebooks:
- Run the notebooks in the cloud on Binder - for this you don't have to install anything!
- Run the notebooks on your machine. Clone this repository, get conda, install the required packages (sktime, seaborn, jupyter) in an environment, and open the notebooks with that environment. For detailed instructions, see below. For troubleshooting, see sktime's more detailed installation instructions.
- Or, use python venv, and/or an editable install of this repo as a package. Instructions below.
Please let us know on the sktime discord if you have any issues during the conference, or join to ask for help anytime.
The tutorial gives an updated 30-minute introduction to sktime base features with a focus on forecasting, and then proceeds with a vignette slideshow introducing the most important features added since PyData Global 2022:
- Upgraded base interface using scikit-base
- Rework of the forecasting pipeline interface
- fully distributional probabilistic forecasts and metrics
- extended parallelism, including parallel broadcasting to hierarchical data
- composable time series classifiers, regressors, distances, time series aligners
- reproducibility features such as blueprint and fitted estimator serialization
- benchmarking frameworks for replicating studies such as M4/M5
Each feature vignette will come with links to further, extended tutorials where applicable.
sktime is not just a package, but also an active community that aims to be welcoming to new joiners.
sktime is developed by an open community, with aims of ecosystem integration in a neutral, charitable space. We welcome contributions and seek to provide opportunities for anyone worldwide.
We invite anyone to get involved as a developer, user, supporter (or any combination of these).
- Europython 2023 - General sktime introduction, half-day workshop
- PyCon Prague 2023 - Forecasting, Advanced Pipelines, Benchmarking
- Pydata Amsterdam 2023 - Probabilistic prediction, forecasting, evaluation
- ODSC Europe 2023 - Forecasting, Pipelines, and ML Engineering
- Pydata London 2023 - Time Series Classification, Regression, Distances & Kernels
- Pydata London 2022 - How to implement your own estimator in sktime
If you're interested in contributing to sktime, you can find out more about how to get involved here.
Any contributions are welcome, not just code!
To run the notebooks locally, you will need:
- a local repository clone
- a python environment with required packages installed
To clone the repository locally:
git clone https://github.com/sktime/sktime-tutorial-pydata-global-2023
- Create a python virtual environment with conda:
conda create -y -n sktime_pydata python=3.11
- Install required packages:
conda install -y -n sktime_pydata pip sktime seaborn jupyter pmdarima statsmodels dtw-python
- Activate your environment:
conda activate sktime_pydata
- If using jupyter: make the environment available in jupyter:
python -m ipykernel install --user --name=sktime_pydata
- Alternatively, create a python virtual environment with venv:
python -m venv sktime_pydata
- Activate your environment:
  - source sktime_pydata/bin/activate for Linux
  - sktime_pydata/Scripts/activate for Windows
- Install the requirements:
pip install -r requirements
- If using jupyter: make the environment available in jupyter:
python -m ipykernel install --user --name=sktime_pydata
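After following either set of steps, it can help to confirm that the environment actually contains the tutorial dependencies before opening the notebooks. The following is a minimal sketch, not part of the tutorial repository: the helper name `missing_modules` and the list `REQUIRED_MODULES` are hypothetical, and the module names are assumed from the conda install command and the ipykernel step above (the dtw-python distribution is imported as `dtw`). The contents of the requirements file used in the venv route may differ.

```python
from importlib.util import find_spec

# Assumed top-level import names for the packages named in the install steps.
# Note: the "dtw-python" distribution is imported as "dtw".
REQUIRED_MODULES = ["sktime", "seaborn", "ipykernel", "pmdarima", "statsmodels", "dtw"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All tutorial dependencies were found.")
```

Run the script with the python interpreter of the activated sktime_pydata environment. It only checks that the modules can be located, not that their versions are compatible with the notebooks.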