theme: default
layout: cover
highlighter: shiki
colorSchema: light
favicon: favicon/url
title: How we used Polars to build functime, a next gen ML forecasting library

🔮 functime

How we used Polars and global forecasting to build
a next-generation ML forecasting library

👤 Luca Baggi
💼 ML Engineer @xtream
🛠️ Maintainer @functime

📍 Talk outline

🔮 The problem with forecasting

📈 functime's answer

🐻‍❄️ What is Polars?

🌏 What is global forecasting?

💻 A forecasting workflow with functime

🔎 Diagnostic tools

🔌 References


🔮 The problem with forecasting

A new paradigm to evaluate the forecasting process

"We spend far too many resources generating, reviewing, adjusting, and approving our forecasts, while almost invariably failing to achieve the level of accuracy desired." (source)

Mike Gilliland
Board of Directors of the International Institute of Forecasters


🔮 The problem with forecasting

A new paradigm to evaluate the forecasting process

"The focus needs to change. We need to shift our attention from esoteric model building to the forecasting process itself – its efficiency and its effectiveness." (source)

Mike Gilliland
Board of Directors of the International Institute of Forecasters


📈 functime's answer

Reframe the problem

Make forecasting just work at a reasonable scale (~90% of use cases).

  1. Forecast thousands of time series without distributed systems such as PySpark.
  2. Feature-engineering and diagnostics APIs that work natively on panel datasets.
  3. Move smoothly from experimentation to production.

This can be achieved with two ingredients: Polars and global forecasting.


🐻‍❄️ What is Polars?

A brief description

Dataframes powered by a multithreaded, vectorized query engine, written in Rust

  1. A dataframe frontend: work with a Python object and not a SQL table.
  2. Utilises all the cores on your machine efficiently (more on this later).
  3. Uses 50+ years of relational database research to optimise queries.
  4. Runs in-process, like SQLite (OLTP), DuckDB (OLAP) and LanceDB (vector).

🐻‍❄️ What is Polars?

What makes it so fast

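  1. Efficient data representation and I/O with Apache Arrow.
  2. Work stealing: idle threads pick up queued tasks from busy ones, so all cores stay busy.
  3. Query optimisations through lazy evaluation (e.g. df.sort_values("col1").head(5) in pandas sorts the whole frame eagerly, while in Polars df.lazy().sort("col1").head(5).collect() lets the optimiser see the full query before running it).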
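To make the lazy-evaluation point concrete without relying on Polars internals, here is a toy sketch in plain Python. Operations are only recorded into a plan and executed on collect(), which gives an "optimiser" the chance to rewrite a sort followed by a head into a cheaper top-k selection. LazyQuery and its single rewrite rule are hypothetical illustrations, not Polars' actual optimiser, which applies its own and much richer set of rules.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class LazyQuery:
    """Toy lazy query (hypothetical): records operations, runs them only on collect()."""

    rows: list
    plan: list = field(default_factory=list)

    def sort(self, col: str) -> "LazyQuery":
        # Nothing is computed here: the step is only appended to the plan.
        return LazyQuery(self.rows, self.plan + [("sort", col)])

    def head(self, n: int) -> "LazyQuery":
        return LazyQuery(self.rows, self.plan + [("head", n)])

    def optimize(self) -> list:
        """Rewrite sort(col) immediately followed by head(n) into one top_k(col, n) step."""
        optimized, i = [], 0
        while i < len(self.plan):
            op, arg = self.plan[i]
            nxt = self.plan[i + 1] if i + 1 < len(self.plan) else None
            if op == "sort" and nxt is not None and nxt[0] == "head":
                optimized.append(("top_k", (arg, nxt[1])))
                i += 2
            else:
                optimized.append((op, arg))
                i += 1
        return optimized

    def collect(self) -> list:
        """Execute the optimized plan."""
        rows = self.rows
        for op, arg in self.optimize():
            if op == "sort":
                rows = sorted(rows, key=lambda r: r[arg])  # full sort of every row
            elif op == "head":
                rows = rows[:arg]
            elif op == "top_k":
                col, n = arg
                # Keeps only n candidates in a heap instead of sorting all rows.
                rows = heapq.nsmallest(n, rows, key=lambda r: r[col])
        return rows


rows = [{"col1": v} for v in [9, 3, 7, 1, 8, 2, 6]]
query = LazyQuery(rows).sort("col1").head(3)
print(query.optimize())  # [('top_k', ('col1', 3))]
print([r["col1"] for r in query.collect()])  # [1, 2, 3]
```

An eager API cannot do this rewrite, because by the time head() is called the full sort has already happened.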

🌏 What is global forecasting?

A lesson from forecasting competitions

Global forecasting simply means fitting a single model on all the time series in your panel dataset, instead of one local model per series.

This approach has proved successful in several forecasting competitions, most notably M4 (1, 2) and M5 (1).
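A minimal sketch of the idea in plain Python, assuming a long-format panel of hypothetical (series_id, time, value) rows and a single lag-1 feature. The lags are computed within each series, but the regression is fit once on the pooled rows, so every series shares the same coefficients. In functime the feature engineering runs on Polars; here everything is plain Python, and the helper names lag_features and fit_ols are illustrative only.

```python
from collections import defaultdict

# Hypothetical panel in long format: (series_id, time, value) rows.
panel = [
    ("store_a", 1, 10.0), ("store_a", 2, 12.0), ("store_a", 3, 14.0),
    ("store_b", 1, 20.0), ("store_b", 2, 22.0), ("store_b", 3, 24.0),
    ("store_c", 1, 5.0), ("store_c", 2, 7.0), ("store_c", 3, 9.0),
]


def lag_features(rows):
    """Build lag-1 features within each series, then pool the rows of all series."""
    by_series = defaultdict(list)
    for series_id, _, value in sorted(rows, key=lambda r: (r[0], r[1])):
        by_series[series_id].append(value)
    x, y, last_value = [], [], {}
    for series_id, values in by_series.items():
        for prev, curr in zip(values, values[1:]):
            x.append(prev)  # feature: previous value of the same series
            y.append(curr)  # target: current value
        last_value[series_id] = values[-1]
    return x, y, last_value


def fit_ols(x, y):
    """Closed-form simple linear regression: y ~ intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


x, y, last_value = lag_features(panel)
intercept, slope = fit_ols(x, y)  # ONE model for the whole panel
# One-step-ahead forecast per series, all using the shared coefficients.
forecasts = {sid: intercept + slope * v for sid, v in last_value.items()}
# Each series continues its +2 trend: about 16, 26 and 11.
print(forecasts)
```

A local approach would fit three separate regressions on two points each; the global model learns from all six pairs at once.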


🌏 What is global forecasting?

A lesson from forecasting competitions

Gradient boosted regression trees secured the top spots, but linear models work well too, provided the feature engineering is thoughtful and deliberate.

Here's the recipe behind functime: a powerful query engine for blazingly fast feature engineering, followed by a single model.fit().

It doesn't have to be the best model, but it is fast to iterate on and scales to thousands of time series on your laptop.


layout: intro

💻 Forecasting with functime

Time for some dangerous live coding 🥶


💻 Forecasting with functime

What I could not show

  • Prediction intervals with conformal prediction.
  • Hyperparameter tuning with flaml.
  • Advanced feature extraction.
  • Censored forecasts.
  • LLM data analysis.
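For the first item, here is a minimal sketch of split conformal prediction intervals in plain Python: absolute residuals on a held-out calibration window are sorted, and the finite-sample corrected quantile becomes a symmetric margin around a new point forecast. conformal_interval and the calibration numbers are hypothetical; functime's own implementation and API may differ. The coverage guarantee assumes exchangeable residuals, which time series only approximate.

```python
import math


def conformal_interval(cal_actuals, cal_preds, new_pred, alpha=0.1):
    """Hypothetical split conformal interval targeting 1 - alpha coverage.

    cal_actuals / cal_preds come from a calibration window the model was
    NOT trained on; new_pred is the point forecast to wrap.
    """
    # Nonconformity scores: absolute residuals on the calibration set.
    scores = sorted(abs(a - p) for a, p in zip(cal_actuals, cal_preds))
    n = len(scores)
    # Finite-sample corrected rank: the ceil((n + 1) * (1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return -math.inf, math.inf
    margin = scores[k - 1]
    return new_pred - margin, new_pred + margin


cal_actuals = [10, 12, 11, 13, 9, 10, 12, 14, 11, 10]
cal_preds = [10.5, 11.0, 11.0, 12.0, 10.0, 10.0, 11.5, 13.0, 11.5, 10.5]
print(conformal_interval(cal_actuals, cal_preds, new_pred=12.0, alpha=0.2))  # (11.0, 13.0)
```

The approach wraps any point forecaster, which is why it pairs naturally with a single global model.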

🔌 References

A deep dive into the Arrow ecosystem and Polars internals


🔌 References

More PyData Global 2023 talks


🔌 References

Documentation and communities


layout: intro

🙏 Thank you!

Please share your feedback! My address is lucabaggi [at] duck.com