Docker Containerized Jupyter Notebook

Hyperparameter Optimization using parallel model training in Spark


This Docker stack lets you train forecasting models with any machine learning (sklearn-based) or statistical model with the help of pyspark. In addition, hyperparameter optimization via the hyperopt package is available.
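A minimal sketch of how such a parallel search can look, assuming a running SparkSession and a small synthetic dataset; the search space, model, and parallelism below are illustrative, not the exact configuration used in this repository:

# Sketch: parallel hyperparameter search with hyperopt's SparkTrials.
# Dataset, search space, and parallelism are illustrative assumptions.
from hyperopt import fmin, tpe, hp, SparkTrials, STATUS_OK
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=500, n_features=10, random_state=0)

def objective(params):
    model = RandomForestRegressor(
        n_estimators=int(params["n_estimators"]),
        max_depth=int(params["max_depth"]),
        random_state=42,
    )
    # cross_val_score returns negative MSE; hyperopt minimizes the loss.
    score = cross_val_score(model, X, y, cv=3,
                            scoring="neg_mean_squared_error").mean()
    return {"loss": -score, "status": STATUS_OK}

search_space = {
    "n_estimators": hp.quniform("n_estimators", 50, 500, 50),
    "max_depth": hp.quniform("max_depth", 3, 15, 1),
}

# SparkTrials distributes the individual trials over the Spark workers.
spark_trials = SparkTrials(parallelism=4)
best = fmin(fn=objective, space=search_space, algo=tpe.suggest,
            max_evals=50, trials=spark_trials)
print("Best parameters:", best)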

Currently, RandomForestRegressor and XGBRegressor can be used to train models with temporal cross-validation and then run inference with the best model.
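For illustration, a temporal (expanding-window) cross-validation loop of this kind can be sketched with sklearn's TimeSeriesSplit and XGBRegressor; the synthetic data and hyperparameters below are placeholder assumptions:

# Sketch: temporal cross-validation with XGBRegressor on synthetic data.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))            # e.g. lag features of a time series
y = 2 * X[:, 0] + rng.normal(size=300)   # synthetic target

tscv = TimeSeriesSplit(n_splits=4)       # each fold trains on the past only
fold_rmse = []
for train_idx, test_idx in tscv.split(X):
    model = XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.05)
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    fold_rmse.append(np.sqrt(mean_squared_error(y[test_idx], pred)))

print("Mean RMSE over folds:", np.mean(fold_rmse))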

Model training and management are handled via mlflow.
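A minimal sketch of the mlflow tracking pattern; the experiment name, parameters, and metric are placeholder assumptions, not the repository's exact setup:

# Sketch: logging a trained model, its parameters, and a metric with mlflow.
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=8, random_state=0)

mlflow.set_experiment("pyspark_forecasting")  # hypothetical experiment name
with mlflow.start_run():
    params = {"n_estimators": 300, "max_depth": 8}
    model = RandomForestRegressor(**params, random_state=42).fit(X, y)

    mlflow.log_params(params)                       # track hyperparameters
    mlflow.log_metric("train_r2", model.score(X, y))  # track a metric
    mlflow.sklearn.log_model(model, artifact_path="model")  # store the model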

To reproduce the results, start Docker. If using WSL2:

sudo /etc/init.d/docker start

If using a Linux distribution with systemd:

sudo systemctl start docker

Then, build the image from the Dockerfile:

sudo docker build -t pysparkforecast:latest .

Run an interactive JupyterLab session from the Docker image:

sudo docker run -it --rm -p 8888:8888 -v "${PWD}":/home/jovyan/work pysparkforecast:latest

