
Iter8-kfserving

Iter8-kfserving enables metrics-driven experiments, progressive delivery, and automated rollouts for ML models served over Kubernetes and OpenShift clusters.

The picture below illustrates metrics-driven progressive canary release of a KFServing model using iter8-kfserving.

[Figure: Progressive canary rollout orchestrated by iter8-kfserving]

Quick start on Minikube

Steps 1 to 7 demonstrate metrics-driven progressive canary release of a KFServing model using iter8-kfserving. This demo uses KFServing v0.5.0-rc2.

Before you begin, you will need Minikube, Kustomize v3, and Go 1.13+.
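
Optionally, verify the required tools are installed and on your PATH before starting (a quick sanity check; exact version output varies):

minikube version
kustomize version
go version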

Step 1: Start Minikube with sufficient resources.

minikube start --cpus 6 --memory 12288 --kubernetes-version=v1.17.11 --driver=docker
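
Once Minikube is up, you can optionally confirm the cluster is reachable:

minikube status
kubectl get nodes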

Step 2: Install KFServing, kfserving-monitoring, and iter8-kfserving.

curl -L https://raw.githubusercontent.com/iter8-tools/iter8-kfserving/main/samples/quickstart/platformsetup.sh | /bin/bash
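
Before moving on, you may want to wait for the installed components to be ready. The sketch below assumes the script installs into the kfserving-system and iter8-system namespaces; adjust the namespace names if your install differs.

kubectl wait --for=condition=Ready pods --all -n kfserving-system --timeout=300s
kubectl wait --for=condition=Ready pods --all -n iter8-system --timeout=300s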

Step 3: In a separate terminal, set up the Minikube tunnel. If prompted, enter your password.

minikube tunnel --cleanup
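
While the tunnel is running, the Istio ingress gateway service should be assigned an external IP (this assumes the standard KFServing install, with Istio in the istio-system namespace):

kubectl get svc istio-ingressgateway -n istio-system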

Step 4: Create a KFServing v1beta1 InferenceService with a default model and update it with a canary model. This step may take a couple of minutes.

curl -L https://raw.githubusercontent.com/iter8-tools/iter8-kfserving/main/samples/quickstart/inferenceservicesetup.sh | /bin/bash
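
You can wait for the InferenceService to become ready before proceeding; this assumes the script creates it as my-model in the default namespace, matching the experiment target used below.

kubectl wait --for=condition=Ready isvc/my-model --timeout=300s
kubectl get isvc/my-model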

Step 5: In a separate terminal, generate prediction requests for the InferenceService.

curl -L https://raw.githubusercontent.com/iter8-tools/iter8-kfserving/main/samples/quickstart/predictionrequests.sh | /bin/bash
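
To send a single request by hand instead, the general pattern is a curl against the ingress with the InferenceService host header, assuming the model is served over KFServing's v1 prediction protocol. The payload file below is a placeholder; the actual input format depends on the model deployed by the script.

INGRESS_HOST=$(kubectl get svc istio-ingressgateway -n istio-system -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
# input.json is a placeholder for a model-specific request payload
curl -H "Host: my-model.default.example.com" http://${INGRESS_HOST}/v1/models/my-model:predict -d @input.json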

Step 6: Create the iter8-kfserving canary experiment.

kubectl apply -f https://raw.githubusercontent.com/iter8-tools/iter8-kfserving/main/samples/quickstart/experiment.yaml

In this step, you create an iter8 Experiment resource in the Kubernetes cluster, which looks as follows:

apiVersion: iter8.tools/v2alpha1
kind: Experiment
metadata:
  name: experiment-1
spec:
  target: default/my-model
  strategy:
    type: Canary
  criteria:
    indicators:
    - 95th-percentile-tail-latency
    objectives:
    - metric: mean-latency
      upperLimit: 1000
    - metric: error-rate
      upperLimit: "0.01"
  duration:
    intervalSeconds: 15
    maxIterations: 12
The spec above asks iter8 to perform a canary release experiment for the InferenceService named my-model in the default namespace. During the experiment, the default and canary model versions are assessed every 15 seconds, over 12 iterations. When the experiment completes, the canary version is declared successful (the winner) if its mean latency is within 1000 msec and its error rate is within 1%. If the canary is successful, it is rolled out: 100% of the traffic is shifted to it.
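
You can also track the experiment's progress directly from kubectl while it runs; Step 7 shows a richer view using iter8ctl.

kubectl get experiment experiment-1 --watch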

Step 7: In a separate terminal, periodically describe the experiment.

Install iter8ctl. You can change the directory where the iter8ctl binary is installed by changing GOBIN below.

GO111MODULE=on GOBIN=/usr/local/bin go get github.com/iter8-tools/iter8ctl@v0.1.0-alpha
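
Confirm the binary is on your PATH:

command -v iter8ctl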

Periodically describe the experiment.

while clear; do
  kubectl get experiment experiment-1 -o yaml | iter8ctl describe -f -
  sleep 15
done

You should see output similar to the following.

******
Experiment name: experiment-1
Experiment namespace: default
Experiment target: default/my-model

******
Number of completed iterations: 10

******
Winning version: canary

******
Objectives
+--------------------------+---------+--------+
|        OBJECTIVE         | DEFAULT | CANARY |
+--------------------------+---------+--------+
| mean-latency <= 1000.000 | true    | true   |
+--------------------------+---------+--------+
| error-rate <= 0.010      | true    | true   |
+--------------------------+---------+--------+

******
Metrics
+--------------------------------+---------+---------+
|             METRIC             | DEFAULT | CANARY  |
+--------------------------------+---------+---------+
| request-count                  | 132.294 |  73.254 |
+--------------------------------+---------+---------+
| 95th-percentile-tail-latency   | 298.582 | 294.597 |
| (milliseconds)                 |         |         |
+--------------------------------+---------+---------+
| mean-latency (milliseconds)    | 229.529 | 230.090 |
+--------------------------------+---------+---------+
| error-rate                     |   0.000 |   0.000 |
+--------------------------------+---------+---------+

The experiment should complete after 12 iterations (~3 mins). Once the experiment completes, inspect the InferenceService object.

kubectl get isvc/my-model

You should see 100% of the traffic shifted to the canary model, similar to the below output.

NAME       URL                                   READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                AGE
my-model   http://my-model.default.example.com   True           100                              my-model-predictor-default-zwjbq   5m
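
When you are done, clean up the demo resources. This is a sketch assuming the quickstart defaults above; also stop the prediction-request script and the Minikube tunnel running in the other terminals.

kubectl delete experiment experiment-1
kubectl delete isvc my-model
minikube delete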
