Accurate forecasts are crucial because they enable organizations to make informed decisions about their supply chains. This research benchmarks and evaluates the performance of foundation models in time series forecasting, with a particular focus on demand forecasting. It employs traditional statistical, machine learning, and deep learning algorithms and compares their forecasting performance with the popular foundation models TimeGPT and TimesFM. Both accuracy and uncertainty metrics are considered to establish a credible benchmarking framework.
This study shows that TimesFM emerged as the best-performing model on MASE and SMAPE across the different time granularities. Foundation models were found to be on par with traditional models, presenting a strong case for wider research and adoption in industrial demand forecasting.
📄 Read the Full Publication here.
The data used for the study is sourced from two datasets:
- Daily Time Granularity: The dataset is from the Rohlik Orders Forecasting Challenge. Data from four warehouses were utilized.
- Weekly and Monthly Granularity: Data covering 5,800 unique combinations from the VN1 Forecasting Accuracy Challenge dataset was used at the weekly level and aggregated to the monthly level (a small aggregation sketch follows below).
These datasets, being recent, ensure that no pretrained models were exposed to them, enabling an unbiased evaluation.
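As an illustration of the weekly-to-monthly roll-up, here is a minimal pandas sketch. It assumes long-format data with `unique_id`, `ds`, and `y` columns; the column names and toy values are illustrative, not the competition schema.

```python
import pandas as pd

# Toy weekly demand series in long format (unique_id, ds, y).
weekly = pd.DataFrame({
    "unique_id": ["sku_1"] * 8,
    "ds": pd.date_range("2023-01-02", periods=8, freq="W-MON"),
    "y": [120, 95, 110, 130, 90, 105, 115, 125],
})

# Sum weekly demand into calendar-month buckets for each series.
monthly = (
    weekly
    .set_index("ds")
    .groupby("unique_id")["y"]
    .resample("MS")   # month-start frequency
    .sum()
    .reset_index()
)
print(monthly)
```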
- AutoARIMA, AutoETS, and AutoTBATS (StatsForecast library).
- Bagging methods: Random Forest (RF).
- Boosting algorithms: XGBoost and LightGBM (LGBM) (via MLForecast library).
- Temporal Fusion Transformer (TFT).
- NHITS (NeuralForecast library).
- TimeGPT and TimesFM were compared against the above models; a minimal setup sketch follows.
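For reference, the baselines above can be instantiated roughly as follows. This is a minimal sketch assuming long-format data with `unique_id`/`ds`/`y` columns at daily frequency; the toy series, season lengths, lags, horizons, and training steps are illustrative choices, not the settings used in the study.

```python
import numpy as np
import pandas as pd
import lightgbm as lgb
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA, AutoETS
from mlforecast import MLForecast
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS

# Toy long-format demand series standing in for the competition data.
train_df = pd.DataFrame({
    "unique_id": ["warehouse_1"] * 120,
    "ds": pd.date_range("2024-01-01", periods=120, freq="D"),
    "y": 100 + 10 * np.sin(np.arange(120) * 2 * np.pi / 7) + np.random.rand(120),
})

# Statistical baselines (StatsForecast).
sf = StatsForecast(models=[AutoARIMA(season_length=7), AutoETS(season_length=7)], freq="D")
stat_fcst = sf.forecast(df=train_df, h=28)

# Gradient-boosted baseline on lag features (MLForecast).
mlf = MLForecast(models=[lgb.LGBMRegressor()], freq="D", lags=[1, 7, 14, 28])
mlf.fit(train_df)
ml_fcst = mlf.predict(h=28)

# Deep learning baseline (NeuralForecast); TFT is configured analogously.
nf = NeuralForecast(models=[NHITS(h=28, input_size=56, max_steps=100)], freq="D")
nf.fit(train_df)
dl_fcst = nf.predict()
```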
- TimesFM outperformed traditional algorithms.
- LGBM performed well, reinforcing the strength of machine learning models.
- TimeGPT, in both zero-shot and fine-tuned forms, lagged behind.
- TimesFM demonstrated the best performance across metrics (SMAPE, MASE).
- Deep learning models like TFT and NHITS were competitive.
- TimeGPT showed strong performance compared to statistical methods.
- TimesFM excelled in accuracy metrics (MASE, SMAPE).
- Fine-tuning TimeGPT was feasible only to a limited extent because of its data requirements.
- Foundation models provide competitive forecasts and simplify workflows, especially on new data distributions.
- Machine learning models remain strong contenders across granularities.
Accuracy Metrics:
- SMAPE (Symmetric Mean Absolute Percentage Error).
- MASE (Mean Absolute Scaled Error); minimal implementations of both are sketched below.
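A minimal NumPy sketch of the two accuracy metrics using their standard definitions; the function names, the percentage scaling of SMAPE, and the `season_length` default are choices made here for illustration rather than taken from the study.

```python
import numpy as np

def smape(y_true, y_pred):
    """Symmetric MAPE (in %): mean |error| over the average of |actual| and |forecast|."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 200.0 * np.mean(np.abs(y_pred - y_true) / (np.abs(y_true) + np.abs(y_pred)))

def mase(y_true, y_pred, y_train, season_length=1):
    """MAE of the forecast scaled by the in-sample MAE of the seasonal naive forecast."""
    y_true, y_pred, y_train = (np.asarray(a, float) for a in (y_true, y_pred, y_train))
    scale = np.mean(np.abs(y_train[season_length:] - y_train[:-season_length]))
    return np.mean(np.abs(y_pred - y_true)) / scale
```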
Uncertainty Metrics:
- CRPS (Continuous Ranked Probability Score); a sample-based estimator is sketched below.
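CRPS can be estimated from forecast samples with the standard energy-form estimator shown below. This is one common way to compute it, not necessarily the exact procedure used in the study, which may instead score predicted quantiles.

```python
import numpy as np

def crps_from_samples(y_true, samples):
    """Sample-based CRPS: E|X - y| - 0.5 * E|X - X'|, averaged over the horizon.
    samples has shape (n_samples, horizon); y_true has shape (horizon,)."""
    samples = np.asarray(samples, float)
    y_true = np.asarray(y_true, float)
    term1 = np.mean(np.abs(samples - y_true), axis=0)
    term2 = 0.5 * np.mean(np.abs(samples[:, None, :] - samples[None, :, :]), axis=(0, 1))
    return float(np.mean(term1 - term2))
```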
- More FMs: Evaluate additional foundational models.
- Cross-Domain Testing: Apply models to datasets from diverse industries.
- Ensembles: Explore hybrid approaches combining FMs and traditional models.
- Uncertainty Quantification: Improve prediction interval calibration for FMs.
This research used demand forecasting datasets from forecasting competitions to compare the performance of statistical, ML, DL, and foundation models across daily, weekly, and monthly time granularities. MASE and SMAPE were used for evaluation because they are independent of the scale of the data. TimesFM emerged as the best-performing algorithm across all time granularities, closely followed by the DL and vanilla ML models. TimeGPT also outperformed the statistical and ML models across some time horizons. Overall, it can be concluded that foundation models, although very new members of a forecaster's toolkit, have shown impressive performance and can be used to establish a strong baseline for further research. Prediction intervals, however, still need to be calibrated for better performance of TimesFM. FMs can adapt to new data distributions with minimal tuning and, unlike ML regressors, do not require manual feature engineering or careful selection of lagged variables, allowing users to build and deploy forecasting solutions quickly and easily.
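To illustrate that last point, a zero-shot call to TimeGPT through Nixtla's client needs only the raw series, a horizon, and a frequency, with no lag features or model selection. The snippet below is a sketch: the API key placeholder and the toy data are assumptions, and the `level` argument simply requests the prediction intervals whose calibration is discussed above.

```python
import pandas as pd
from nixtla import NixtlaClient

client = NixtlaClient(api_key="YOUR_API_KEY")  # placeholder, not a real key

# Toy daily demand series in the long format the client expects.
df = pd.DataFrame({
    "unique_id": ["store_1"] * 90,
    "ds": pd.date_range("2024-01-01", periods=90, freq="D"),
    "y": [100 + (i % 7) * 5 for i in range(90)],
})

# Zero-shot forecast with 80%/95% prediction intervals.
fcst = client.forecast(df=df, h=14, freq="D", time_col="ds",
                       target_col="y", level=[80, 95])
print(fcst.head())
```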