fix: updates for RedisAI v1.2 - switch backend to PyTorch, use Hummingbird.ml for scikit-learn -> TorchScript

bsbodden · bsbodden · commit c1c1d6a027ae · 2021-07-20T14:33:38.000-07:00
diff --git a/.gitignore b/.gitignore
@@ -1,3 +1,5 @@
 .DS_Store
 .vscode
-venv
+.venv
+target
+iris.pt
diff --git a/README.md b/README.md
@@ -2,33 +2,29 @@
 
 ## Step 0: Setup RedisAI
 
-To use RedisAI, well, you need RedisAI. I've found the easiest way to do this is with Docker. First, pull the redismod image—it contians Redis with several popular modules ready to go:
+To use RedisAI, well, you need RedisAI. I've found the easiest way to do this is with Docker. First, pull the redismod image—it contains Redis with several popular modules ready to go:
 
     $ docker image pull redislabs/redismod
 
 Then run the image:
 
-    $ docker run \
-        -p 6379:6379 \
-        redislabs/redismod \
-        --loadmodule /usr/lib/redis/modules/redisai.so \
-          ONNX redisai_onnxruntime/redisai_onnxruntime.so
+    $ docker run -p 6379:6379 --name redismod redislabs/redismod
 
 And, you've got RedisAI up and running!
 
 ## Step 1: Setup Python Environment
 
-You need a Python environment to make this all work. I used Python 3.8—the latest, greatest, and most updatest at the time of this writing. I also used `venv` to manage my environment.
+You need a Python environment to make this all work. I used Python 3.9—the latest, greatest, and most updatest at the time of this writing. I also used `venv` to manage my environment.
 
-I'll assume you can download and install Python 3.8 on your own. So lets go ahead and setup the environment:
+I'll assume you can download and install Python 3.9 on your own. So let's go ahead and set up the environment:
 
-    $ python3.8 -m venv venv
+    $ python3.9 -m venv .venv
 
 Once `venv` is installed, you need to activate it:
 
-    $ . venv/bin/activate
+    $ . ./.venv/bin/activate
 
-Now when you run `python` from the command line, it will always point to Python3.8 and any libraries you install will only be for this specific environment. Usually, this includes a dated version of pip so go ahead an update that as well:
+Now when you run `python` from the command line, it will always point to Python 3.9 and any libraries you install will only be for this specific environment. Usually, this includes a dated version of pip, so go ahead and update that as well:
 
     $ pip install --upgrade pip
 
@@ -45,17 +41,18 @@ Next, let's install all the dependencies. These are all listed in `requirements.
 
 Run that command, and you'll have all the dependencies installed and will be ready to run the code.
 
-## Step 3: Build the ONNX Model
+## Step 3: Build the TorchScript Model
 
-This is as easy as running the following:
+Train a scikit-learn `LogisticRegression` model on the Iris data set, then use Microsoft's Hummingbird.ml to convert it into a TorchScript model for loading into RedisAI. Run the `build.py` Python script to generate the `iris.pt` model file:
 
     $ python build.py
 
 ## Step 4: Deploy the Model into RedisAI
 
 NOTE: This requires redis-cli. If you don't have redis-cli, I've found the easiest way to get it is to download, build, and install Redis itself. Details can be found at the [Redis quickstart](https://redis.io/topics/quickstart) page:
 
-    $ redis-cli -x AI.MODELSET iris ONNX CPU BLOB < iris.onnx
+    $ redis-cli -x AI.MODELSTORE iris TORCH CPU BLOB < iris.pt
+    OK
 
 ## Step 5: Make Some Predictions
 
@@ -67,21 +64,30 @@ Set the input tensor with 2 sets of inputs of 4 values each:
 
     > AI.TENSORSET iris:in FLOAT 2 4 VALUES 5.0 3.4 1.6 0.4 6.0 2.2 5.0 1.5
 
-Make the predictions:
+Make the predictions (inferences) by executing the model:
 
-    > AI.MODELRUN iris INPUTS iris:in OUTPUTS iris:inferences iris:scores
+    > AI.MODELEXECUTE iris INPUTS 1 iris:in OUTPUTS 2 iris:inferences iris:scores
 
 Check the predictions:
 
-    > AI.TENSORGET iris_out:predictions VALUES
-
+    > AI.TENSORGET iris:inferences VALUES
     1) (integer) 0
     2) (integer) 2
 
 Check the scores:
 
-    > AI.TENSORGET iris_out:scores VALUES
-
-    (error) ERR tensor key is empty
-
-What? The output tensor for the scores is required to run the model, but nothing is written to it. I'm still trying to track down this bug. `¯\_(ツ)_/¯`
+    > AI.TENSORGET iris:scores VALUES
+    1) "0.96567678451538086"
+    2) "0.034322910010814667"
+    3) "3.4662525649764575e-07"
+    4) "0.00066925224382430315"
+    5) "0.45369619131088257"
+    6) "0.54563456773757935"
+
+### References
+
+* https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html
+* https://pytorch.org
+* https://pytorch.org/docs/stable/jit.html
+* https://microsoft.github.io/hummingbird/
+* https://github.com/microsoft/hummingbird
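
Reviewer note on the Step 5 output in the README: the six values returned for `iris:scores` look like a flattened 2x3 tensor (two input rows, three Iris classes), which the README does not spell out. The sketch below assumes that row-major layout, regroups the values, and picks the highest-scoring class per row. The helper name `rows_of` is hypothetical and not part of this commit.

```python
# Scores copied from the AI.TENSORGET iris:scores output in the README diff.
flat_scores = [
    0.96567678451538086, 0.034322910010814667, 3.4662525649764575e-07,
    0.00066925224382430315, 0.45369619131088257, 0.54563456773757935,
]


def rows_of(values, width):
    """Split a flat list into consecutive rows of `width` items (assumes row-major order)."""
    return [values[i:i + width] for i in range(0, len(values), width)]


# Assumption: 3 scores per input row, one for each Iris class.
score_rows = rows_of(flat_scores, 3)

# The index of the largest score in each row is the predicted class.
predicted = [row.index(max(row)) for row in score_rows]
print(predicted)  # [0, 2]
```

Under that assumption the result agrees with the `iris:inferences` values shown in the README (0 and 2), and each row sums to roughly 1, as class probabilities should.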
diff --git a/build.py b/build.py
@@ -1,26 +1,36 @@
 from sklearn.datasets import load_iris
 from sklearn.model_selection import train_test_split
 from sklearn.linear_model import LogisticRegression
+from hummingbird.ml import convert
+from zipfile import ZipFile
+import shutil
+import os
 
-from skl2onnx import convert_sklearn
-from skl2onnx.common.data_types import FloatTensorType
+TORCH_FILE = 'iris.torch'
+TORCH_ARCHIVE = f'{TORCH_FILE}.zip' # the output of torch model save()
+TORCHSCRIPT_BLOB_SRC = 'deploy_model.zip' # internal (in zip) torchscript blob
+TORCHSCRIPT_BLOB_DEST = 'iris.pt' # output name for extracted torchscript blob
 
 # prepare the train and test data
 iris = load_iris()
 X, y = iris.data, iris.target
 X_train, X_test, y_train, y_test = train_test_split(X, y)
 
-# train a model
+# train the model - using logistic regression classifier
 model = LogisticRegression(max_iter=5000)
 model.fit(X_train, y_train)
 
-# convert the model to ONNX
-initial_types = [
-  ('input', FloatTensorType([None, 4]))
-]
-
-onnx_model = convert_sklearn(model, initial_types=initial_types)
+# use hummingbird.ml to convert sklearn model to torchscript model (torch.jit backend)
+torch_model = convert(model, 'torch.jit', test_input=X_train, extra_config={})
 
 # save the model
-with open("iris.onnx", "wb") as f:
-  f.write(onnx_model.SerializeToString())
+torch_model.save(TORCH_FILE)
+
+# extract the TorchScript binary payload
+with ZipFile(TORCH_ARCHIVE) as z:
+    with z.open(TORCHSCRIPT_BLOB_SRC) as zf, open(TORCHSCRIPT_BLOB_DEST, 'wb') as f:
+        shutil.copyfileobj(zf, f)
+
+# clean up - remove the zip file
+if os.path.exists(TORCH_ARCHIVE):
+    os.remove(TORCH_ARCHIVE)
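
Reviewer note on the extraction step in `build.py`: the script copies an inner TorchScript blob out of the archive that `save()` produces. The sketch below exercises only that nested-archive pattern with the standard library, using a stand-in archive and dummy bytes instead of a real model. The function name `extract_member` and the temporary paths are hypothetical, and nothing here calls Hummingbird or PyTorch.

```python
import os
import shutil
import tempfile
from zipfile import ZipFile


def extract_member(archive_path, member, dest_path):
    """Copy one member of a zip archive to dest_path without unpacking the rest."""
    with ZipFile(archive_path) as z:
        with z.open(member) as src, open(dest_path, "wb") as dst:
            shutil.copyfileobj(src, dst)


# Build a stand-in archive; in build.py the real one comes from the model's save() call.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "iris.torch.zip")
payload = b"not a real TorchScript blob"
with ZipFile(archive, "w") as z:
    z.writestr("deploy_model.zip", payload)

# Pull the inner blob out under the name RedisAI will be fed.
dest = os.path.join(workdir, "iris.pt")
extract_member(archive, "deploy_model.zip", dest)

with open(dest, "rb") as f:
    extracted = f.read()

# Remove the archive once the blob is extracted, as build.py does.
os.remove(archive)
```

Wrapping the copy in a function would make the step easy to test in isolation, which may be worth considering if `build.py` grows.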
diff --git a/iris.onnx b/iris.onnx
diff --git a/requirements.txt b/requirements.txt
@@ -1,12 +1,14 @@
-joblib==0.17.0
-numpy==1.19.2
-onnx==1.7.0
-onnxconverter-common==1.7.0
-protobuf==3.13.0
-scikit-learn==0.23.2
-scipy==1.5.2
-six==1.15.0
-skl2onnx==1.7.0
-sklearn==0.0
-threadpoolctl==2.1.0
-typing-extensions==3.7.4.3
+dill==0.3.4
+hummingbird-ml==0.4.0
+joblib==1.0.1
+numpy==1.21.1
+onnx==1.9.0
+onnxconverter-common==1.8.1
+protobuf==3.17.3
+psutil==5.8.0
+scikit-learn==0.24.2
+scipy==1.7.0
+six==1.16.0
+threadpoolctl==2.2.0
+torch==1.9.0
+typing-extensions==3.10.0.0
