Docker Containerized Jupyter Notebook

Hyperparameter Optimization using parallel model training in Spark

This Docker stack lets you train forecasting models with any scikit-learn-based machine learning model or statistical model, using PySpark to parallelize training. Hyperparameter optimization with the hyperopt package is also available.

Currently, RandomForestRegressor and XGBRegressor can be trained with temporal cross-validation, and the best model is then used for inference.
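Temporal cross-validation differs from ordinary k-fold in that every test fold lies strictly after the data used for training, so the model is never evaluated on observations older than what it has seen. The following is a minimal sketch of one common variant (an expanding training window), not necessarily the scheme used in the notebooks; the function name `expanding_window_splits` and its parameters are hypothetical and chosen only for illustration.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices) pairs in chronological order.

    Hypothetical helper: each fold trains on all observations before the
    test window, and the test window moves forward by `test_size` per fold.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for fold in range(n_splits):
        test_start = first_test_start + fold * test_size
        train_idx = list(range(0, test_start))
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx


# Example: 10 time-ordered observations, 3 folds, 2 test points per fold.
splits = list(expanding_window_splits(n_samples=10, n_splits=3, test_size=2))
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}  test {test_idx}")
```

scikit-learn's `TimeSeriesSplit` provides comparable behavior and is likely the more practical choice inside a scikit-learn pipeline; the pure-Python version above is only meant to make the index logic visible.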

Model training and management are handled via MLflow.

To reproduce the results, first start the Docker daemon. If using WSL2:

sudo /etc/init.d/docker start

If using a systemd-based Linux distribution:

sudo systemctl start docker

Then build the image from the Dockerfile:

sudo docker build -t pysparkforecast:latest .

Run an interactive Jupyter Lab session from the Docker image:

sudo docker run -it --rm -p 8888:8888 -v "${PWD}":/home/jovyan/work pysparkforecast:latest

john2408/Pyspark_Hyperopt_Forecasting
