Author: Michelle Casbon
An example of LHC data on Kubeflow based on the Kaggle TrackML Particle Tracking Challenge.
- Create a cluster and install Kubeflow
- Run a notebook
- Run a pipeline
- Run hyperparameter tuning
Create a cluster with Click-to-deploy using default settings. Follow the provided instructions to setup OAuth credentials.
After the cluster is available and you are able to access the Kubeflow central dashboard, enable auto-provisioning with the following command:
gcloud beta container clusters update kubeflow \
--zone europe-west1-b \
--enable-autoprovisioning \
--max-cpu 128 \
--max-memory 1120 \
--max-accelerator type=nvidia-tesla-k80,count=4 \
--verbosity error
From the Kubeflow central dashboard, click on Notebooks and spawn a new instance. Use all defaults except for the following parameters:
CPU: 2
Memory: 12.0Gi
When the notebook instance is ready, click Connect and open a new Terminal. Run this command to import necessary libraries:
git clone https://github.com/LAL/trackml-library.git src/trackml-library
pip install src/trackml-library
pip install pandas
pip install matplotlib
pip install seaborn
Download sample data with this command:
mkdir input
gsutil cp gs://chasm-data/kaggle/trackml-particle-identification/train_sample.zip input
cd input
unzip train_sample.zip
Upload the file
notebooks/trackml-problem-explanation-and-data-exploration.ipynb
, which was
adapted from
Wesam Elshamy's
Kaggle Kernel
for use on Kubeflow v0.5.0, and open the notebook.
Each step in a pipeline references a container image. Build the necessary docker images with these commands:
docker/build.sh kfp_kubectl
docker/build.sh trackml
In a local Terminal or Cloud Shell, install the Kubeflow pipelines python SDK by running this command:
pip install -U kfp
Compile a pipeline by running it directly:
curl -O https://raw.githubusercontent.com/texasmichelle/kubeflow-cern/master/pipelines/trackml.py
./trackml.py
From the Kubeflow central dashboard, click on Pipeline Dashboard, then Upload
pipeline. Select the file you just created (trackml.py.tar.gz
) and then Upload.
Run the pipeline by first creating an experiment, then a run.
From the Kubeflow central dashboard, click on Notebooks, then Upload the file
notebooks/trackml-pipeline.ipynb
.
Run the notebook and click on the resulting links to view the pipeline executing.
Run once more, requesting GPU resources and watch auto-provisioning add a GPU node to the cluster before executing training.
Run the gpu-example on the cluster with this command:
kubectl apply -f https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1alpha1/gpu-example.yaml
Observe auto-provisioning spin up 2 extra GPU nodes (5 total: 2 CPU, 3 GPU).