dask-kubernetes
Creates a Dask cluster on Google Container Engine (GKE).
It uses a Google Cloud Storage (GCS) bucket to store your notebooks, so they persist without the need for a persistent volume.
- Create a GCS bucket for storing your notebooks
- Change `c.GoogleStorageContentManager.default_path` in `jupyter-config.py` to your GCS path
- Create a GKE cluster of your choice (2 CPUs and 7.5 GB of memory or larger per node is recommended), and make sure legacy authorisation mode is turned on
- Apply the Kubernetes configuration with `kubectl apply -f ./kube/`
- Connect to the service using port forwarding with `kubectl port-forward svc/svc-notebooks 8888:8888`, or use the public IP from `kubectl get svc`
- Start using the cluster!

      from dask.distributed import Client
      from dask_kubernetes import KubeCluster

      # See a sample worker spec in `config/worker-spec-sample.yaml`
      cluster = KubeCluster.from_yaml('...your yaml path')
      cluster.scale(3)  # the desired number of workers
      client = Client(cluster)
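
The worker spec passed to `KubeCluster.from_yaml` is an ordinary Kubernetes pod definition. The contents of `config/worker-spec-sample.yaml` are not reproduced here, so the following is only a rough sketch of the general shape, built as a Python dictionary. The `make_worker_spec` helper, the default image name, the label, and the resource figures are all assumptions chosen for illustration, not values taken from this repository.

```python
import json


def make_worker_spec(image="daskdev/dask:latest", cpu="1", memory="3G"):
    """Return a pod definition for a single Dask worker.

    Hypothetical helper: the image, label, and resource values are
    illustrative defaults, not the ones used by this repository.
    """
    resources = {"cpu": cpu, "memory": memory}
    return {
        "kind": "Pod",
        "metadata": {"labels": {"app": "dask-worker"}},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "dask-worker",
                    "image": image,
                    # Keep the worker's own limits in line with the pod's.
                    "args": [
                        "dask-worker",
                        "--nthreads", cpu,
                        "--memory-limit", memory + "B",
                        "--death-timeout", "60",
                    ],
                    "resources": {
                        "limits": dict(resources),
                        "requests": dict(resources),
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # Print the spec as JSON so it can be inspected or saved to a file.
    print(json.dumps(make_worker_spec(), indent=2))
```

The default of 1 CPU and 3G per worker is picked only so that two workers fit on a 2 CPU / 7.5 GB node. Adjust it to your node size. Depending on the dask-kubernetes version installed, a dictionary like this may be usable directly through `KubeCluster.from_dict` instead of a YAML file.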
- Change the `Dockerfile`, build your image, and push it to a container image registry of your choice
- Change the image name in the `30-deployment.yaml` file
- Apply your Kubernetes configuration
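
The build, push, and apply steps above can be strung together in a small script. The following is a minimal sketch, assuming `docker` and `kubectl` are installed and already authenticated, that the `Dockerfile` is in the current directory, and that the manifests live in `./kube/`. The `deploy_commands` and `deploy` helpers and the example image name are hypothetical. The script does not edit `30-deployment.yaml`, so the image name there still has to be changed by hand.

```python
import shlex
import subprocess


def deploy_commands(image, dockerfile_dir=".", manifest_dir="./kube/"):
    """Return the commands for building, pushing, and applying, in order."""
    return [
        ["docker", "build", "-t", image, dockerfile_dir],
        ["docker", "push", image],
        ["kubectl", "apply", "-f", manifest_dir],
    ]


def deploy(image, dry_run=True):
    """Print each command, and run it unless dry_run is set."""
    for cmd in deploy_commands(image):
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical image name; replace it with your own registry path.
    deploy("gcr.io/my-project/dask-notebook:latest", dry_run=True)
```

With `dry_run=True` the script only prints the commands, which makes it easy to check them before anything is built or applied.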