From 159a10fd64ea52335f0ad777fa8e1b53c8318f5d Mon Sep 17 00:00:00 2001 From: hhsecond Date: Tue, 11 Jun 2019 11:45:29 +0530 Subject: [PATCH] redisai example with onnx --- examples/onnx/ScikitLearn2Production.ipynb | 476 +++++++++++++++++++++ 1 file changed, 476 insertions(+) create mode 100644 examples/onnx/ScikitLearn2Production.ipynb diff --git a/examples/onnx/ScikitLearn2Production.ipynb b/examples/onnx/ScikitLearn2Production.ipynb new file mode 100644 index 000000000..58ccea766 --- /dev/null +++ b/examples/onnx/ScikitLearn2Production.ipynb @@ -0,0 +1,476 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "-g3UdPRxNO4h" + }, + "source": [ + "# Taking ML (Scikit Learn) to highly scalable production using RedisAI\n", + "Scikit learn is probably the most used machine learning package in the industry. Even though, there are few options readily available for taking deep learning to production (with tfserving etc), there were no widely accepted attempts to build a framework that could help us to take ML to production. Microsoft had build [ONNXRuntime](https://github.com/microsoft/onnxruntime) and the scikit learn exporter for this very purpose. \n", + "Very recently RedisAI had announced the support for ONNXRuntime as the third backend (Tensorflow and PyTorch was already supported). This makes us capable of pushing a scikit-learn model through ONNX to a super scalable production. This demo is focusing on showing how this can be accomplished. We'll train a linear regression model for predicting boston house price first. The trained model is then converted to ONNX IR using [sk2onnx](https://github.com/onnx/sklearn-onnx). Third part of the demo shows how to load the onnx binary into RedisAI runtime and how to communicate. " + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "qsftvU2WMNMd" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: skl2onnx in /home/sherin/miniconda3/lib/python3.7/site-packages (1.4.9)\n", + "Requirement already satisfied: onnxconverter-common>=1.4.2 in /home/sherin/miniconda3/lib/python3.7/site-packages (from skl2onnx) (1.5.0)\n", + "Requirement already satisfied: protobuf in /home/sherin/miniconda3/lib/python3.7/site-packages (from skl2onnx) (3.7.0)\n", + "Requirement already satisfied: scikit-learn>=0.19 in /home/sherin/miniconda3/lib/python3.7/site-packages (from skl2onnx) (0.21.2)\n", + "Requirement already satisfied: numpy>=1.15 in /home/sherin/miniconda3/lib/python3.7/site-packages/numpy-1.16.4-py3.7-linux-x86_64.egg (from skl2onnx) (1.16.4)\n", + "Requirement already satisfied: six in /home/sherin/miniconda3/lib/python3.7/site-packages (from skl2onnx) (1.12.0)\n", + "Requirement already satisfied: onnx>=1.2.1 in /home/sherin/miniconda3/lib/python3.7/site-packages (from skl2onnx) (1.5.0)\n", + "Requirement already satisfied: setuptools in /home/sherin/miniconda3/lib/python3.7/site-packages (from protobuf->skl2onnx) (40.2.0)\n", + "Requirement already satisfied: joblib>=0.11 in /home/sherin/miniconda3/lib/python3.7/site-packages (from scikit-learn>=0.19->skl2onnx) (0.13.2)\n", + "Requirement already satisfied: scipy>=0.17.0 in /home/sherin/miniconda3/lib/python3.7/site-packages (from scikit-learn>=0.19->skl2onnx) (1.2.1)\n", + "Requirement already satisfied: typing-extensions>=3.6.2.1 in /home/sherin/miniconda3/lib/python3.7/site-packages (from onnx>=1.2.1->skl2onnx) (3.7.2)\n", + "Requirement already satisfied: typing>=3.6.4 in /home/sherin/miniconda3/lib/python3.7/site-packages (from onnx>=1.2.1->skl2onnx) (3.6.6)\n", + "Requirement already satisfied: skl2onnx in /home/sherin/miniconda3/lib/python3.7/site-packages (1.4.9)\n", + "Requirement already satisfied: scikit-learn>=0.19 in /home/sherin/miniconda3/lib/python3.7/site-packages (from skl2onnx) (0.21.2)\n", + "Requirement already satisfied: protobuf in /home/sherin/miniconda3/lib/python3.7/site-packages (from skl2onnx) (3.7.0)\n", + "Requirement already satisfied: six in /home/sherin/miniconda3/lib/python3.7/site-packages (from skl2onnx) (1.12.0)\n", + "Requirement already satisfied: onnxconverter-common>=1.4.2 in /home/sherin/miniconda3/lib/python3.7/site-packages (from skl2onnx) (1.5.0)\n", + "Requirement already satisfied: numpy>=1.15 in /home/sherin/miniconda3/lib/python3.7/site-packages/numpy-1.16.4-py3.7-linux-x86_64.egg (from skl2onnx) (1.16.4)\n", + "Requirement already satisfied: onnx>=1.2.1 in /home/sherin/miniconda3/lib/python3.7/site-packages (from skl2onnx) (1.5.0)\n", + "Requirement already satisfied: scipy>=0.17.0 in /home/sherin/miniconda3/lib/python3.7/site-packages (from scikit-learn>=0.19->skl2onnx) (1.2.1)\n", + "Requirement already satisfied: joblib>=0.11 in /home/sherin/miniconda3/lib/python3.7/site-packages (from scikit-learn>=0.19->skl2onnx) (0.13.2)\n", + "Requirement already satisfied: setuptools in /home/sherin/miniconda3/lib/python3.7/site-packages (from protobuf->skl2onnx) (40.2.0)\n", + "Requirement already satisfied: typing-extensions>=3.6.2.1 in /home/sherin/miniconda3/lib/python3.7/site-packages (from onnx>=1.2.1->skl2onnx) (3.7.2)\n", + "Requirement already satisfied: typing>=3.6.4 in /home/sherin/miniconda3/lib/python3.7/site-packages (from onnx>=1.2.1->skl2onnx) (3.6.6)\n", + "Collecting git+https://github.com/RedisAI/redisai-py/@onnxruntime\n", + " Cloning https://github.com/RedisAI/redisai-py/ (to revision onnxruntime) to /tmp/pip-req-build-pu_kkk06\n", + " Running command git clone -q https://github.com/RedisAI/redisai-py/ /tmp/pip-req-build-pu_kkk06\n", + " Running command git checkout -b onnxruntime --track origin/onnxruntime\n", + " Switched to a new branch 'onnxruntime'\n", + " Branch 'onnxruntime' set up to track remote branch 'onnxruntime' from 'origin'.\n", + "Requirement already satisfied: redis in /home/sherin/miniconda3/lib/python3.7/site-packages (from redisai==0.3.0) (3.2.1)\n", + "Requirement already satisfied: hiredis in /home/sherin/miniconda3/lib/python3.7/site-packages (from redisai==0.3.0) (1.0.0)\n", + "Requirement already satisfied: rmtest in /home/sherin/miniconda3/lib/python3.7/site-packages (from redisai==0.3.0) (0.7.0)\n", + "Building wheels for collected packages: redisai\n", + " Building wheel for redisai (setup.py) ... \u001b[?25ldone\n", + "\u001b[?25h Stored in directory: /tmp/pip-ephem-wheel-cache-g5np7tfg/wheels/bc/41/6c/294c468fc56049440cf0957709cbc453e271fed1c009123730\n", + "Successfully built redisai\n", + "Installing collected packages: redisai\n", + " Found existing installation: redisai 0.2.0\n", + " Uninstalling redisai-0.2.0:\n", + " Successfully uninstalled redisai-0.2.0\n", + "Successfully installed redisai-0.3.0\n" + ] + } + ], + "source": [ + "# Installing dependencies\n", + "!pip install skl2onnx\n", + "!pip install skl2onnx\n", + "# !pip install redisai\n", + "# hack since the redisai version is not updated in pypi yet\n", + "!pip install git+https://github.com/RedisAI/redisai-py/@onnxruntime" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.datasets import load_boston\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.linear_model import LinearRegression\n", + "import sklearn" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "JWZWq00DMa-6" + }, + "outputs": [], + "source": [ + "from skl2onnx import convert_sklearn\n", + "from skl2onnx.common.data_types import FloatTensorType" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "zIt6HmiXRFQ4" + }, + "source": [ + "### RedisAI Python client\n", + "RedisAI has client utilites available in [different langauges](https://github.com/RedisAI/redisai-examples). We will be using the python client of RedisAI." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "EdZReRdfMnkM" + }, + "outputs": [], + "source": [ + "import redisai as rai\n", + "from redisai.model import Model as raimodel\n", + "try:\n", + " if rai.__version__ < '0.3.0':\n", + " raise\n", + "except:\n", + " raise RuntimeError('ONNX is introduced in redisai-py version 0.3.0. Upgrade!!')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "g9Zn1kyjQovG" + }, + "source": [ + "### Loading training and testing data" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "CdBte4O_Mtgu" + }, + "outputs": [], + "source": [ + "boston = load_boston()\n", + "X, y = boston.data, boston.target\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "rGYB8SSnMuKC" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(379, 13)" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "X_train.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "0oLUozCwQwu_" + }, + "source": [ + "### Building & Training the model" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "aoZJU9WDMw4k" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "model = LinearRegression()\n", + "model.fit(X_train, y_train)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "IiwgcKbTM0qU" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Mean Squared Error: 22.90649510340278\n" + ] + } + ], + "source": [ + "pred = model.predict(X_test)\n", + "\n", + "mse = sklearn.metrics.mean_squared_error(y_test, pred)\n", + "print(\"Mean Squared Error: \", mse)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "6KBv1zO6Q4MH" + }, + "source": [ + "### Converting scikit learn model to ONNX" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "IBEX0pSWM4r6" + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "The maximum opset needed by this model is only 1.\n" + ] + } + ], + "source": [ + "# 1 is batch size and 13 is num features\n", + "# reference: https://github.com/onnx/sklearn-onnx/blob/master/skl2onnx/convert.py\n", + "initial_type = [('float_input', FloatTensorType([1, 13]))]\n", + "\n", + "onnx_model = convert_sklearn(model, initial_types=initial_type)\n", + "raimodel.save(onnx_model, 'boston.onnx')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "f0IRri6xRupt" + }, + "source": [ + "### Loading the ONNX model to RedisAI\n", + "We'll be using the same python client for rest of the example as well. Before we start the next you need to setup the RedisAI server (TODO: link to setting up tutorial). Once the server is up and running on an IP address (and a port), we have the required setup to complete this example. Let's jump right into it.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "JSDjdG14M869" + }, + "outputs": [], + "source": [ + "con = rai.Client(host='localhost', port=6379, db=0)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "WJg_HsCMXCsv" + }, + "source": [ + "#### Loading the model" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "7xhRjCTXW6gM" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "b'OK'" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "model = raimodel.load(\"boston.onnx\")\n", + "con.modelset(\"onnx_model\", rai.Backend.onnx, rai.Device.cpu, model)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "_G0MSYviXR74" + }, + "source": [ + "#### Loading the input tensor" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "sovqr3LmXUVp" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "b'OK'" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "\n", + "# dummydata taken from sklearn.datasets.load_boston().data[0]\n", + "dummydata = [\n", + " 0.00632, 18.0, 2.31, 0.0, 0.538, 6.575, 65.2, 4.09, 1.0, 296.0, 15.3, 396.9, 4.98]\n", + "tensor = rai.Tensor.scalar(rai.DType.float, *dummydata)\n", + "# If the tensor is too complex to pass it as python list, you can use BlobTensor that takes numpy array\n", + "# tensor = rai.BlobTensor.from_numpy(np.array(dummydata, dtype='float32'))\n", + "con.tensorset(\"input\", tensor)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "2a_1BIJwYHfC" + }, + "source": [ + "#### Running the model\n", + "As you know already, Redis is a key value store. You just saved the model to a key **\"onnx_model\"** and the tensor to another key **\"input\"**. Now we can invoke ONNX backend from RedisAI and ask it to take the model saved on the **\"onnx_model\"** key and tensor saved on the **\"input\"** key and run it against the model (first run will take the model from the given key and load it into the provided backend and keep it hot since then). While running the model we should let RedisAI know what should be the key to which we want to save the output (If all of these process seems efficientless to you because we need make multiple calls to run the model and network call is expensive, you should wait for the DAGRUN feature which will be coming out soon). In our example, we save the model output to the key **\"output\"** as given below." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "FSM8FsRvZYYK" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "b'OK'" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "con.modelrun(\"onnx_model\", [\"input\"], [\"output\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "colab_type": "text", + "id": "ZGpswT-yZwK-" + }, + "source": [ + "We can fetch the output by calling **tensorget**" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "colab": {}, + "colab_type": "code", + "id": "LjQ1UasVZ4Tz" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "House cost predicted by model is $29969.89631652832\n" + ] + } + ], + "source": [ + "outtensor = con.tensorget(\"output\", as_type=rai.BlobTensor)\n", + "print(f\"House cost predicted by model is ${outtensor.to_numpy().item() * 1000}\")" + ] + } + ], + "metadata": { + "colab": { + "name": "ScikitLearn2Production.ipynb", + "provenance": [], + "version": "0.3.2" + }, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.2" + } + }, + "nbformat": 4, + "nbformat_minor": 1 +}