https://www.kaggle.com/c/m5-forecasting-accuracy/
This document shows how to reproduce 4th place solution for the M5 Forecasting - Accuracy competition. If you run into any trouble with the setup/code or have any questions please contact me at monsaraida@gmail.com
- AWS EC2
- Ubuntu Server 18.04 LTS (HVM), SSD Volume Type - ami-0ac80df6eff0e70b5 (64 bit x86)
- m5.2xlarge (8 vCPUs, 16 GB memory, 100GB storage)
cd {path-to-dir}
unzip kaggle-m5a-4th.zip
cd {path-to-dir}/kaggle-m5a-4th
git clone https://github.com/pyenv/pyenv.git -b v1.2.16 ~/.pyenv
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(pyenv init -)"' >> ~/.bashrc
source ~/.bashrc
sudo apt-get update; sudo apt-get install --no-install-recommends make build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev -y
pyenv install 3.7.6
pyenv local 3.7.6
pip install pipenv
pipenv install -r ./requirements.txt
pipenv shell
sudo apt install tree unzip
mkdir -p {path-to-dir}/kaggle-m5a-4th/raw
cd {path-to-dir}/kaggle-m5a-4th/raw
unzip m5-forecasting-accuracy.zip
cd {path-to-dir}/kaggle-m5a-4th
python ./m5a_main.py --help
usage: m5a_main.py [-h] [-ddp DATA_DIR_PATH] [-opn OUTPUT_NAME]
[-spr SAMPLING_RATE] [-flc FOLD_ID_LIST_CSV]
[-plc PREDICTION_HORIZON_LIST_CSV]
- data_dir_path (default: '.')
- root data directory path
- output_name (default: 'default')
- output directory name
- sampling_rate (default: 1.0)
- if sampling_rate is less than 1.0, items in dataset are randomly sampled
- fold_id_list_csv (default: '1941')
- fold_id is the last day of training data
- fold_id = 1941 : train d1 - d1941, predict d1942 - d1969 : for submission
- fold_id = 1913 : train d1 - d1913, predict d1914 - d1941 : for validation 1
- fold_id = 1885 : train d1 - d1885, predict d1886 - d1913 : for validation 2
- fold_id = 1857 : train d1 - d1857, predict d1858 - d1885 : for validation 3
- fold_id = 1829 : train d1 - d1829, predict d1830 - d1857 : for validation 4
- fold_id = 1577 : train d1 - d1577, predict d1578 - d1605 : for validation 5
- Multiple fold_id can be specified
- e.g. : '1941,1913,1885,1857,1829,1577'
- fold_id is the last day of training data
- prediction_horizon_list_csv (default: '7,14,21,28')
- weekly model : '7,14,21,28'
- monthly model : '28'
- day-by-day model : '1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28'
cd {path-to-dir}/kaggle-m5a-4th
python ./m5a_main.py
cd {path-to-dir}/kaggle-m5a-4th/output/default
tree .
.
├── log
│ └── m5a_main_yyyy_mmdd.log
├── model
│ └── 1941
│ ├── lgb_model_CA_1_14.bin
│ ├── lgb_model_CA_1_21.bin
│ ├── lgb_model_CA_1_28.bin
│ ├── lgb_model_CA_1_7.bin
│ ├── lgb_model_CA_2_14.bin
│ ├── lgb_model_CA_2_21.bin
│ ├── lgb_model_CA_2_28.bin
│ ├── lgb_model_CA_2_7.bin
│ ├── lgb_model_CA_3_14.bin
│ ├── lgb_model_CA_3_21.bin
│ ├── lgb_model_CA_3_28.bin
│ ├── lgb_model_CA_3_7.bin
│ ├── lgb_model_CA_4_14.bin
│ ├── lgb_model_CA_4_21.bin
│ ├── lgb_model_CA_4_28.bin
│ ├── lgb_model_CA_4_7.bin
│ ├── lgb_model_TX_1_14.bin
│ ├── lgb_model_TX_1_21.bin
│ ├── lgb_model_TX_1_28.bin
│ ├── lgb_model_TX_1_7.bin
│ ├── lgb_model_TX_2_14.bin
│ ├── lgb_model_TX_2_21.bin
│ ├── lgb_model_TX_2_28.bin
│ ├── lgb_model_TX_2_7.bin
│ ├── lgb_model_TX_3_14.bin
│ ├── lgb_model_TX_3_21.bin
│ ├── lgb_model_TX_3_28.bin
│ ├── lgb_model_TX_3_7.bin
│ ├── lgb_model_WI_1_14.bin
│ ├── lgb_model_WI_1_21.bin
│ ├── lgb_model_WI_1_28.bin
│ ├── lgb_model_WI_1_7.bin
│ ├── lgb_model_WI_2_14.bin
│ ├── lgb_model_WI_2_21.bin
│ ├── lgb_model_WI_2_28.bin
│ ├── lgb_model_WI_2_7.bin
│ ├── lgb_model_WI_3_14.bin
│ ├── lgb_model_WI_3_21.bin
│ ├── lgb_model_WI_3_28.bin
│ └── lgb_model_WI_3_7.bin
└── result
└── 1941
├── feature_importance_CA_1_14.csv
├── feature_importance_CA_1_21.csv
├── feature_importance_CA_1_28.csv
├── feature_importance_CA_1_7.csv
├── feature_importance_CA_2_14.csv
├── feature_importance_CA_2_21.csv
├── feature_importance_CA_2_28.csv
├── feature_importance_CA_2_7.csv
├── feature_importance_CA_3_14.csv
├── feature_importance_CA_3_21.csv
├── feature_importance_CA_3_28.csv
├── feature_importance_CA_3_7.csv
├── feature_importance_CA_4_14.csv
├── feature_importance_CA_4_21.csv
├── feature_importance_CA_4_28.csv
├── feature_importance_CA_4_7.csv
├── feature_importance_TX_1_14.csv
├── feature_importance_TX_1_21.csv
├── feature_importance_TX_1_28.csv
├── feature_importance_TX_1_7.csv
├── feature_importance_TX_2_14.csv
├── feature_importance_TX_2_21.csv
├── feature_importance_TX_2_28.csv
├── feature_importance_TX_2_7.csv
├── feature_importance_TX_3_14.csv
├── feature_importance_TX_3_21.csv
├── feature_importance_TX_3_28.csv
├── feature_importance_TX_3_7.csv
├── feature_importance_WI_1_14.csv
├── feature_importance_WI_1_21.csv
├── feature_importance_WI_1_28.csv
├── feature_importance_WI_1_7.csv
├── feature_importance_WI_2_14.csv
├── feature_importance_WI_2_21.csv
├── feature_importance_WI_2_28.csv
├── feature_importance_WI_2_7.csv
├── feature_importance_WI_3_14.csv
├── feature_importance_WI_3_21.csv
├── feature_importance_WI_3_28.csv
├── feature_importance_WI_3_7.csv
├── feature_importance_agg_14.csv
├── feature_importance_agg_21.csv
├── feature_importance_agg_28.csv
├── feature_importance_agg_7.csv
├── feature_importance_all_14.csv
├── feature_importance_all_21.csv
├── feature_importance_all_28.csv
├── feature_importance_all_7.csv
├── holdout.csv
├── pred_h_14.csv
├── pred_h_21.csv
├── pred_h_28.csv
├── pred_h_7.csv
├── pred_h_all.csv
├── pred_v_14.csv
├── pred_v_21.csv
├── pred_v_28.csv
├── pred_v_7.csv
├── pred_v_all.csv
└── submission.csv
I would like to appreciate the participants and organizers of the competition very much. I have learned a lot from this competition and community. Thank you to all the kaggle community members.
-
Notebooks
-
Discussions
- Few thoughts about M5 competition
- Evaluation metric
- Three shades of Dark
- Moon Phase. Odd, yet helpful feature.