
Commit dcd2bc5

docs for forecasting task (#443)
* docs for forecasting task
* avoid directly import extra dependencies
* Update docs/dev.rst

  Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>
* make ForecastingDependenciesNotInstalledError a str message
* make ForecastingDependenciesNotInstalledError a str message
* update readme and examples
* add explanation for univariant models in example

Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>
1 parent 95c5fe4 commit dcd2bc5

16 files changed

+434
-198
lines changed

README.md

+98-2
@@ -4,8 +4,9 @@ Copyright (C) 2021 [AutoML Groups Freiburg and Hannover](http://www.automl.org/
44

55
While early AutoML frameworks focused on optimizing traditional ML pipelines and their hyperparameters, another trend in AutoML is to focus on neural architecture search. To bring the best of these two worlds together, we developed **Auto-PyTorch**, which jointly and robustly optimizes the network architecture and the training hyperparameters to enable fully automated deep learning (AutoDL).
66

7-
Auto-PyTorch is mainly developed to support tabular data (classification, regression).
7+
Auto-PyTorch is mainly developed to support tabular data (classification, regression) and time series data (forecasting).
88
The newest features in Auto-PyTorch for tabular data are described in the paper ["Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL"](https://arxiv.org/abs/2006.13799) (see below for bibtex ref).
9+
Details about Auto-PyTorch for multi-horizon time series forecasting tasks can be found in the paper ["Efficient Automated Deep Learning for Time Series Forecasting"](https://arxiv.org/abs/2205.05511) (also see below for bibtex ref).
910

1011
Also, find the documentation [here](https://automl.github.io/Auto-PyTorch/master).
1112

@@ -27,7 +28,9 @@ In other words, we evaluate the portfolio on a provided data as initial configur
2728
Then API starts the following procedures:
2829
1. **Validate input data**: Process each data type, e.g. encoding categorical data, so that Auto-PyTorch can handle it.
2930
2. **Create dataset**: Create a dataset that can be handled in this API with a choice of cross validation or holdout splits.
30-
3. **Evaluate baselines** *1: Train each algorithm in the predefined pool with a fixed hyperparameter configuration and dummy model from `sklearn.dummy` that represents the worst possible performance.
31+
3. **Evaluate baselines**
32+
* ***Tabular dataset*** *1: Train each algorithm in the predefined pool with a fixed hyperparameter configuration and dummy model from `sklearn.dummy` that represents the worst possible performance.
33+
* ***Time Series Forecasting dataset***: Train a dummy predictor that repeats the last observed value in each series.
3134
4. **Search by [SMAC](https://github.com/automl/SMAC3)**:\
3235
a. Determine budget and cut-off rules by [Hyperband](https://jmlr.org/papers/volume18/16-558/16-558.pdf)\
3336
b. Sample a pipeline hyperparameter configuration *2 by SMAC\
@@ -50,6 +53,14 @@ pip install autoPyTorch
5053

5154
```
5255

56+
Auto-PyTorch for Time Series Forecasting requires additional dependencies:
57+
58+
```sh
59+
60+
pip install autoPyTorch[forecasting]
61+
62+
```
63+
5364
### Manual Installation
5465

5566
We recommend using Anaconda for developing as follows:
@@ -70,6 +81,20 @@ python setup.py install
7081

7182
```
7283

84+
Similarly, to install all the dependencies for Auto-PyTorch-TimeSeriesForecasting:
85+
86+
87+
```sh
88+
89+
git submodule update --init --recursive
90+
91+
conda create -n auto-pytorch python=3.8
92+
conda activate auto-pytorch
93+
conda install swig
94+
pip install -e ".[forecasting]"
95+
96+
```
97+
7398
## Examples
7499

75100
In a nutshell:
@@ -105,6 +130,66 @@ score = api.score(y_pred, y_test)
105130
print("Accuracy score", score)
106131
```
107132

133+
For Time Series Forecasting Tasks
134+
```py
135+
136+
from autoPyTorch.api.time_series_forecasting import TimeSeriesForecastingTask
137+
138+
# data and metric imports
139+
from sktime.datasets import load_longley
140+
targets, features = load_longley()
141+
142+
# define the forecasting horizon
143+
forecasting_horizon = 3
144+
145+
# The dataset passed to APT-TS can be a list of np.ndarray / pd.DataFrame, where each element of the list is one
146+
# series, or a single pd.DataFrame that records all series together with
147+
# index information: to which series does each time step belong? This id can be stored as the DataFrame's index or as a
148+
# separate column.
149+
# Within each series, we take the last forecasting_horizon items as test targets and the items before them as training targets.
150+
# Normally, the values to be forecast directly follow the training set.
151+
y_train = [targets[: -forecasting_horizon]]
152+
y_test = [targets[-forecasting_horizon:]]
153+
154+
# The same holds for features. For univariate models, X_train and X_test can be omitted (left as None)
155+
X_train = [features[: -forecasting_horizon]]
156+
# Here X_test holds the 'known future features', i.e. features whose future values are known in advance. Unknown features
157+
# can be replaced with NaN or zeros (they will not be used by our networks). If no feature is known beforehand,
158+
# X_test can also be omitted
159+
known_future_features = list(features.columns)
160+
X_test = [features[-forecasting_horizon:]]
161+
162+
start_times = [targets.index.to_timestamp()[0]]
163+
freq = '1Y'
164+
165+
# initialise Auto-PyTorch api
166+
api = TimeSeriesForecastingTask()
167+
168+
# Search for an ensemble of machine learning algorithms
169+
api.search(
170+
X_train=X_train,
171+
y_train=y_train,
172+
X_test=X_test,
173+
optimize_metric='mean_MAPE_forecasting',
174+
n_prediction_steps=forecasting_horizon,
175+
memory_limit=16 * 1024,  # forecasting models currently use considerably more memory
176+
freq=freq,
177+
start_times=start_times,
178+
func_eval_time_limit_secs=50,
179+
total_walltime_limit=60,
180+
min_num_test_instances=1000, # proxy validation sets. This only works for the tasks with more than 1000 series
181+
known_future_features=known_future_features,
182+
)
183+
184+
# our dataset could directly generate sequences for new datasets
185+
test_sets = api.dataset.generate_test_seqs()
186+
187+
# Calculate the test score
188+
y_pred = api.predict(test_sets)
189+
score = api.score(y_pred, y_test)
190+
print("Forecasting score", score)
191+
```
192+
108193
For more examples, including customising the search space, parallelising the code, etc., check out the `examples` folder
109194

110195
```sh
@@ -163,6 +248,17 @@ Please refer to the branch `TPAMI.2021.3067763` to reproduce the paper *Auto-PyT
163248
}
164249
```
165250

251+
```bibtex
252+
@article{deng-ecml22,
253+
author = {Difan Deng and Florian Karl and Frank Hutter and Bernd Bischl and Marius Lindauer},
254+
title = {Efficient Automated Deep Learning for Time Series Forecasting},
255+
year = {2022},
256+
booktitle = {Machine Learning and Knowledge Discovery in Databases. Research Track
257+
- European Conference, {ECML} {PKDD} 2022},
258+
url = {https://doi.org/10.48550/arXiv.2205.05511},
259+
}
260+
```
261+
166262
## Contact
167263

168264
Auto-PyTorch is developed by the [AutoML Groups of the University of Freiburg and Hannover](http://www.automl.org/).

autoPyTorch/api/time_series_forecasting.py

+1-2
@@ -7,8 +7,7 @@
77
from autoPyTorch.api.base_task import BaseTask
88
from autoPyTorch.automl_common.common.utils.backend import Backend
99
from autoPyTorch.constants import MAX_WINDOW_SIZE_BASE, TASK_TYPES_TO_STRING, TIMESERIES_FORECASTING
10-
from autoPyTorch.data.time_series_forecasting_validator import \
11-
TimeSeriesForecastingInputValidator
10+
from autoPyTorch.data.time_series_forecasting_validator import TimeSeriesForecastingInputValidator
1211
from autoPyTorch.data.utils import (
1312
DatasetCompressionSpec,
1413
get_dataset_compression_mapping

autoPyTorch/constants.py

+4-1
@@ -54,7 +54,10 @@
5454
CLASSIFICATION_OUTPUTS = [BINARY, MULTICLASS, MULTICLASSMULTIOUTPUT]
5555
REGRESSION_OUTPUTS = [CONTINUOUS, CONTINUOUSMULTIOUTPUT]
5656

57-
# Constants for Forecasting Tasks
57+
ForecastingDependenciesNotInstalledMSG = "Additional dependencies must be installed to work with time series " \
58+
"forecasting tasks! Please run \n pip install autoPyTorch[forecasting] \n to "\
59+
"install the corresponding dependencies!"
60+
5861

5962
# The constant values for time series forecasting come from
6063
# https://github.com/rakshitha123/TSForecasting/blob/master/experiments/deep_learning_experiments.py

autoPyTorch/evaluation/abstract_evaluator.py

+13-57
@@ -19,14 +19,19 @@
1919
import autoPyTorch.pipeline.image_classification
2020
import autoPyTorch.pipeline.tabular_classification
2121
import autoPyTorch.pipeline.tabular_regression
22-
import autoPyTorch.pipeline.time_series_forecasting
22+
try:
23+
import autoPyTorch.pipeline.time_series_forecasting
24+
forecasting_dependencies_installed = True
25+
except ModuleNotFoundError:
26+
forecasting_dependencies_installed = False
2327
import autoPyTorch.pipeline.traditional_tabular_classification
2428
import autoPyTorch.pipeline.traditional_tabular_regression
2529
from autoPyTorch.automl_common.common.utils.backend import Backend
2630
from autoPyTorch.constants import (
2731
CLASSIFICATION_TASKS,
2832
FORECASTING_BUDGET_TYPE,
2933
FORECASTING_TASKS,
34+
ForecastingDependenciesNotInstalledMSG,
3035
IMAGE_TASKS,
3136
MULTICLASS,
3237
REGRESSION_TASKS,
@@ -38,12 +43,16 @@
3843
BaseDataset,
3944
BaseDatasetPropertiesType
4045
)
41-
from autoPyTorch.datasets.time_series_dataset import TimeSeriesSequence
4246
from autoPyTorch.evaluation.utils import (
4347
DisableFileOutputParameters,
4448
VotingRegressorWrapper,
4549
convert_multioutput_multiclass_to_multilabel
4650
)
51+
try:
52+
from autoPyTorch.evaluation.utils_extra import DummyTimeSeriesForecastingPipeline
53+
forecasting_dependencies_installed = True
54+
except ModuleNotFoundError:
55+
forecasting_dependencies_installed = False
4756
from autoPyTorch.pipeline.base_pipeline import BasePipeline
4857
from autoPyTorch.pipeline.components.training.metrics.base import autoPyTorchMetric
4958
from autoPyTorch.pipeline.components.training.metrics.utils import (
@@ -314,61 +323,6 @@ def get_default_pipeline_options() -> Dict[str, Any]:
314323
'runtime': 1}
315324

316325

317-
class DummyTimeSeriesForecastingPipeline(DummyClassificationPipeline):
318-
"""
319-
A wrapper class that holds a pipeline for dummy forecasting. For each series, it simply repeats the last element
320-
in the training series
321-
322-
323-
Attributes:
324-
random_state (Optional[Union[int, np.random.RandomState]]):
325-
Object that contains a seed and allows for reproducible results
326-
init_params (Optional[Dict]):
327-
An optional dictionary that is passed to the pipeline's steps. It complies
328-
a similar function as the kwargs
329-
n_prediction_steps (int):
330-
forecasting horizon
331-
"""
332-
def __init__(self, config: Configuration,
333-
random_state: Optional[Union[int, np.random.RandomState]] = None,
334-
init_params: Optional[Dict] = None,
335-
n_prediction_steps: int = 1,
336-
) -> None:
337-
super(DummyTimeSeriesForecastingPipeline, self).__init__(config, random_state, init_params)
338-
self.n_prediction_steps = n_prediction_steps
339-
340-
def fit(self, X: Dict[str, Any], y: Any,
341-
sample_weight: Optional[np.ndarray] = None) -> object:
342-
self.n_prediction_steps = X['dataset_properties']['n_prediction_steps']
343-
y_train = subsampler(X['y_train'], X['train_indices'])
344-
return DummyClassifier.fit(self, np.ones((y_train.shape[0], 1)), y_train, sample_weight)
345-
346-
def _generate_dummy_forecasting(self, X: List[Union[TimeSeriesSequence, np.ndarray]]) -> List:
347-
if isinstance(X[0], TimeSeriesSequence):
348-
X_tail = [x.get_target_values(-1) for x in X]
349-
else:
350-
X_tail = [x[-1] for x in X]
351-
return X_tail
352-
353-
def predict_proba(self, X: Union[np.ndarray, pd.DataFrame],
354-
batch_size: int = 1000) -> np.ndarray:
355-
X_tail = self._generate_dummy_forecasting(X)
356-
return np.tile(X_tail, (1, self.n_prediction_steps)).astype(np.float32).flatten()
357-
358-
def predict(self, X: Union[np.ndarray, pd.DataFrame],
359-
batch_size: int = 1000) -> np.ndarray:
360-
X_tail = np.asarray(self._generate_dummy_forecasting(X))
361-
if X_tail.ndim == 1:
362-
X_tail = np.expand_dims(X_tail, -1)
363-
return np.tile(X_tail, (1, self.n_prediction_steps)).astype(np.float32).flatten()
364-
365-
@staticmethod
366-
def get_default_pipeline_options() -> Dict[str, Any]:
367-
return {'budget_type': 'epochs',
368-
'epochs': 1,
369-
'runtime': 1}
370-
371-
372326
def fit_and_suppress_warnings(logger: PicklableClientLogger, pipeline: BaseEstimator,
373327
X: Dict[str, Any], y: Any
374328
) -> BaseEstimator:
@@ -543,6 +497,8 @@ def __init__(self, backend: Backend,
543497
self.predict_function = self._predict_proba
544498
elif self.task_type in FORECASTING_TASKS:
545499
if isinstance(self.configuration, int):
500+
if not forecasting_dependencies_installed:
501+
raise ModuleNotFoundError(ForecastingDependenciesNotInstalledMSG)
546502
self.pipeline_class = DummyTimeSeriesForecastingPipeline
547503
elif isinstance(self.configuration, str):
548504
raise ValueError("Only tabular classifications tasks "

autoPyTorch/evaluation/tae.py

+8-1
@@ -25,6 +25,7 @@
2525
from autoPyTorch.automl_common.common.utils.backend import Backend
2626
from autoPyTorch.constants import (
2727
FORECASTING_BUDGET_TYPE,
28+
ForecastingDependenciesNotInstalledMSG,
2829
STRING_TO_TASK_TYPES,
2930
TIMESERIES_FORECASTING,
3031
)
@@ -34,7 +35,11 @@
3435
NoResamplingStrategyTypes
3536
)
3637
from autoPyTorch.evaluation.test_evaluator import eval_test_function
37-
from autoPyTorch.evaluation.time_series_forecasting_train_evaluator import forecasting_eval_train_function
38+
try:
39+
from autoPyTorch.evaluation.time_series_forecasting_train_evaluator import forecasting_eval_train_function
40+
forecasting_dependencies_installed = True
41+
except ModuleNotFoundError:
42+
forecasting_dependencies_installed = False
3843
from autoPyTorch.evaluation.train_evaluator import eval_train_function
3944
from autoPyTorch.evaluation.utils import (
4045
DisableFileOutputParameters,
@@ -152,6 +157,8 @@ def __init__(
152157
self.resampling_strategy_args = dm.resampling_strategy_args
153158

154159
if STRING_TO_TASK_TYPES.get(dm.task_type, -1) == TIMESERIES_FORECASTING:
160+
if not forecasting_dependencies_installed:
161+
raise ModuleNotFoundError(ForecastingDependenciesNotInstalledMSG)
155162
eval_function: Callable = forecasting_eval_train_function
156163
if isinstance(self.resampling_strategy, (HoldoutValTypes, CrossValTypes)):
157164
self.output_y_hat_optimization = output_y_hat_optimization
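
The guarded imports in `abstract_evaluator.py` and `tae.py` follow a common optional-dependency pattern: try the import once at module load, record the outcome in a flag, and raise a descriptive `ModuleNotFoundError` only when the forecasting code path is actually requested. The following is a minimal standalone sketch of that pattern, not the project's code. The module name `hypothetical_forecasting_extra`, the constant `MISSING_MSG`, and the helper `get_forecasting_backend` are assumptions made for illustration.

```python
import importlib

# Hypothetical module name, chosen so that the import fails on any machine.
OPTIONAL_MODULE = "hypothetical_forecasting_extra"

# Illustrative message, modelled on ForecastingDependenciesNotInstalledMSG.
MISSING_MSG = (
    "Additional dependencies must be installed to work with time series "
    "forecasting tasks! Please run\n    pip install autoPyTorch[forecasting]"
)

# Attempt the import once, at module load time, and remember the result.
try:
    extra = importlib.import_module(OPTIONAL_MODULE)
    dependencies_installed = True
except ModuleNotFoundError:
    extra = None
    dependencies_installed = False


def get_forecasting_backend():
    """Return the optional module, or fail with an actionable message."""
    if not dependencies_installed:
        raise ModuleNotFoundError(MISSING_MSG)
    return extra
```

Deferring the error to the point of use keeps tabular-only installations importable, while users who request a forecasting task still see the exact `pip` command they need.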

autoPyTorch/evaluation/time_series_forecasting_train_evaluator.py

+1-1
@@ -13,9 +13,9 @@
1313

1414
from autoPyTorch.automl_common.common.utils.backend import Backend
1515
from autoPyTorch.constants import SEASONALITY_MAP
16-
from autoPyTorch.evaluation.abstract_evaluator import DummyTimeSeriesForecastingPipeline
1716
from autoPyTorch.evaluation.train_evaluator import TrainEvaluator
1817
from autoPyTorch.evaluation.utils import DisableFileOutputParameters
18+
from autoPyTorch.evaluation.utils_extra import DummyTimeSeriesForecastingPipeline
1919
from autoPyTorch.pipeline.components.training.metrics.base import autoPyTorchMetric
2020
from autoPyTorch.pipeline.components.training.metrics.metrics import MASE_LOSSES
2121
from autoPyTorch.utils.hyperparameter_search_space_update import HyperparameterSearchSpaceUpdates

autoPyTorch/evaluation/utils_extra.py

+72
@@ -0,0 +1,72 @@
1+
# All functions and classes implemented in this module depend on the extra (optional) requirements.
2+
# They are collected here so that importing them can be wrapped in a single try-except block
3+
from typing import Any, Dict, List, Optional, Union
4+
5+
from ConfigSpace import Configuration
6+
7+
import numpy as np
8+
9+
import pandas as pd
10+
11+
from sklearn.dummy import DummyClassifier
12+
13+
from autoPyTorch.datasets.time_series_dataset import TimeSeriesSequence
14+
from autoPyTorch.utils.common import subsampler
15+
16+
17+
class DummyTimeSeriesForecastingPipeline(DummyClassifier):
18+
"""
19+
A wrapper class that holds a pipeline for dummy forecasting. For each series, it simply repeats the last element
20+
in the training series
21+
22+
23+
Attributes:
24+
random_state (Optional[Union[int, np.random.RandomState]]):
25+
Object that contains a seed and allows for reproducible results
26+
init_params (Optional[Dict]):
27+
An optional dictionary that is passed to the pipeline's steps. It serves
28+
a similar purpose to kwargs
29+
n_prediction_steps (int):
30+
forecasting horizon
31+
"""
32+
def __init__(self, config: Configuration,
33+
random_state: Optional[Union[int, np.random.RandomState]] = None,
34+
init_params: Optional[Dict] = None,
35+
n_prediction_steps: int = 1,
36+
) -> None:
37+
self.config = config
38+
self.init_params = init_params
39+
self.random_state = random_state
40+
super(DummyTimeSeriesForecastingPipeline, self).__init__(strategy="uniform")
41+
self.n_prediction_steps = n_prediction_steps
42+
43+
def fit(self, X: Dict[str, Any], y: Any,
44+
sample_weight: Optional[np.ndarray] = None) -> object:
45+
self.n_prediction_steps = X['dataset_properties']['n_prediction_steps']
46+
y_train = subsampler(X['y_train'], X['train_indices'])
47+
return DummyClassifier.fit(self, np.ones((y_train.shape[0], 1)), y_train, sample_weight)
48+
49+
def _generate_dummy_forecasting(self, X: List[Union[TimeSeriesSequence, np.ndarray]]) -> List:
50+
if isinstance(X[0], TimeSeriesSequence):
51+
X_tail = [x.get_target_values(-1) for x in X]
52+
else:
53+
X_tail = [x[-1] for x in X]
54+
return X_tail
55+
56+
def predict_proba(self, X: Union[np.ndarray, pd.DataFrame],
57+
batch_size: int = 1000) -> np.ndarray:
58+
X_tail = self._generate_dummy_forecasting(X)
59+
return np.tile(X_tail, (1, self.n_prediction_steps)).astype(np.float32).flatten()
60+
61+
def predict(self, X: Union[np.ndarray, pd.DataFrame],
62+
batch_size: int = 1000) -> np.ndarray:
63+
X_tail = np.asarray(self._generate_dummy_forecasting(X))
64+
if X_tail.ndim == 1:
65+
X_tail = np.expand_dims(X_tail, -1)
66+
return np.tile(X_tail, (1, self.n_prediction_steps)).astype(np.float32).flatten()
67+
68+
@staticmethod
69+
def get_default_pipeline_options() -> Dict[str, Any]:
70+
return {'budget_type': 'epochs',
71+
'epochs': 1,
72+
'runtime': 1}
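
The array handling in `predict` can be hard to follow from the diff alone, so here is a minimal standalone sketch of the same "repeat the last value" logic for univariate series. It assumes plain NumPy arrays as input rather than `TimeSeriesSequence` objects, and the function name `repeat_last_value` is hypothetical.

```python
import numpy as np


def repeat_last_value(series_list, n_prediction_steps):
    """Naive forecast: repeat each series' last observation over the horizon."""
    # Last observed value of every series, shape (n_series,).
    tails = np.asarray([np.asarray(series)[-1] for series in series_list])
    if tails.ndim == 1:
        # Make it a column, shape (n_series, 1), so it can be tiled along the horizon.
        tails = np.expand_dims(tails, -1)
    # Shape (n_series, n_prediction_steps), then flattened series by series.
    return np.tile(tails, (1, n_prediction_steps)).astype(np.float32).flatten()


forecast = repeat_last_value([np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0])], 3)
print(forecast)  # [3. 3. 3. 5. 5. 5.]
```

The series may have different lengths, because only the final element of each one is used.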
