My attempt to solve Kaggle's Predict Future Sales competition.
Contains:
- RNN approach (pytorch)
- Data processing script in Spark (scala)
- Creation of embeddings for item/item category/shop descriptions
- Feature importance check using eli5
- And more!
For the solution report, check here.
For RNN-based model, automatic hyperparemeter tuning is available with Guild AI AutoML tool.
Check this document for details.
- Datasets from https://www.kaggle.com/c/competitive-data-science-predict-future-sales/data
- If you would like to use russian embeddings pretrained from wikipedia, download them from here
The default location for these is the data folder.
All you need is included in environment.yml. Conda is the package manager for this project.