OpenVINO Model Server 2021.1

Released by @dtrawins on 06 Oct 12:42

This is a major release of OpenVINO Model Server: a completely rewritten implementation of the serving component. The upgrade from the Python-based version (2020.4) to the C++ implementation (2021.1) should be mostly transparent. No changes are required on the client side, and the exposed API is unchanged, but some configuration settings and deployment methods might need slight adjustment.

Key New Features and Enhancements

  • Much higher scalability in a single service instance. You can now utilize the full capacity of the available hardware, with near-linear throughput scaling as resources are added and no bottleneck on the frontend.
  • Lower latency between the client and the server. This is especially noticeable with high-performance accelerators or CPUs.
  • Reduced footprint. By switching to C++ and trimming dependencies, the Docker image shrinks to ~400MB (CPU, NCS and HDDL support) and ~800MB (the image that also includes iGPU support).
  • Reduced RAM usage. Thanks to the smaller number of external software dependencies, OpenVINO Model Server allocates less memory on startup.
  • Easier deployment on bare-metal or inside a Docker container.
  • Support for online model updates. The server monitors configuration file changes and reloads models as needed without restarting the service (see the example configuration after this list).
  • Model ensemble (preview). Connect multiple models to deploy complex processing solutions and reduce the overhead of sending data back and forth between the client and the server (a pipeline sketch follows this list).
  • Azure Blob Storage support. You can now host your models in Azure Blob Storage containers (illustrated in the example configuration after this list).
  • Updated Helm chart for easy deployment in Kubernetes (see the sketch after this list).
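
As an illustration of the online model updates and the Azure Blob Storage support above, a minimal multi-model configuration file could look like the sketch below; the model names, local path and az:// container path are assumptions. The server reloads the affected models whenever this file changes:

{
    "model_config_list": [
        {"config": {"name": "face-detection", "base_path": "/models/face-detection"}},
        {"config": {"name": "resnet", "base_path": "az://models-container/resnet"}}
    ]
}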
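
For the model ensemble preview, pipelines are declared in the same configuration file. Below is a minimal sketch, assuming two hypothetical models where a detection model's output feeds a classification model; all node, tensor and model names are illustrative, so check the preview documentation for the exact schema:

"pipeline_config_list": [
    {
        "name": "detect_then_classify",
        "inputs": ["image"],
        "nodes": [
            {
                "name": "detect_node",
                "model_name": "face-detection",
                "type": "DL model",
                "inputs": [{"data": {"node_name": "request", "data_item": "image"}}],
                "outputs": [{"data_item": "detection_out", "alias": "boxes"}]
            },
            {
                "name": "classify_node",
                "model_name": "resnet",
                "type": "DL model",
                "inputs": [{"data": {"node_name": "detect_node", "data_item": "boxes"}}],
                "outputs": [{"data_item": "prob", "alias": "label_prob"}]
            }
        ],
        "outputs": [{"label_prob": {"node_name": "classify_node", "data_item": "label_prob"}}]
    }
]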
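
The updated Helm chart is installed in the usual way; the release name, chart location and parameter names below are assumptions for illustration, so check the chart's values file for the exact keys:

helm install ovms-app ./ovms --set model_name=resnet,model_path=gs://models-bucket/resnet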

Changes in version 2021.1

Moving from 2020.4 to 2021.1 introduces a few changes and optimizations that primarily impact the server deployment and configuration process. These changes are documented below.

  • Docker Container Entrypoint
    To simplify deployment with containers, a Docker image entrypoint was added. Container startup now requires only the parameters specific to the Model Server executable:
    Old command:
    docker run -d -v $(pwd)/model:/models/my_model/ -e LOG_LEVEL=DEBUG -p 9000:9000 openvino/model_server /ie-serving-py/start_server.sh ie_serving model --model_path /models/face-detection --model_name my_model --port 9000 --shape auto
    New command:
    docker run -d -v $(pwd)/model:/models/my_model/ -p 9000:9000 openvino/model_server --model_path /models/my_model --model_name my_model --port 9000 --shape auto --log_level DEBUG
  • Simplified Command Line Parameters
    The model and config subcommands are no longer used. Whether the server runs in single-model or multi-model mode is determined by whether --model_name or --config_path is defined; the two parameters are mutually exclusive (see the examples after this list).
  • Changed default THROUGHPUT_STREAMS settings for the CPU and GPU device plugin
    In the Python implementation, the default configuration was optimized for minimal latency with a single stream of inference requests. In version 2021.1, the default values of the concurrency settings CPU_THROUGHPUT_STREAMS and GPU_THROUGHPUT_STREAMS are calculated automatically based on the available resources, which ensures both low latency and efficient parallel processing. If you need to serve models to only a single client on a high-performance system, set the parameter as below:
    --plugin_config '{"CPU_THROUGHPUT_STREAMS":"1"}'
  • Log Level and Log File Path
    Instead of the LOG_LEVEL and LOG_PATH environment variables, the log level and log file path are now defined with command-line parameters to simplify configuration:
    --log_level DEBUG/INFO(default)/ERROR
    --log_path /path/to/log_file
  • grpc_workers Parameter Meaning
    In the Python implementation (2020.4 and below) this parameter defined the number of frontend threads. In the C++ implementation (2021.1 and above) it defines the number of internal gRPC server objects, increasing the maximum bandwidth capacity. The default value of 1 should be sufficient for most scenarios; consider tuning it if you expect very high load from many parallel clients (see the example after this list).
  • Model Data Type Conversion
    In the Python implementation (2020.4 and below), input tensors with a data type different from the one the model expects were automatically converted to the required type. In some cases, that conversion impacted the overall performance of the inference request. In version 2021.1, the input data type must match the model's input data type; otherwise the client receives an error indicating incorrect input precision, which gives immediate feedback to correct the format.
  • Proxy Settings
    The no_proxy environment variable is not used when fetching models from cloud storage. The http_proxy and https_proxy settings are common to all remote models deployed in OpenVINO Model Server. If you need to deploy both models stored behind a proxy and models accessed directly, run two instances of the model server.
    Refer to the troubleshooting guide to learn about known issues and workarounds.
  • Default Docker security context
    By default, the OpenVINO Model Server process starts inside the Docker container in the context of the ovms account with uid 5000; in previous versions it ran in the root context. This change enforces the best practice of minimal required permissions. If you need to change the security context, use the --user flag in the docker run command (see the example after this list).
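
To illustrate the two serving modes described in the list above, the same image is started either with --model_name/--model_path or with --config_path (paths are illustrative):

Single-model mode:
docker run -d -v $(pwd)/model:/models/my_model/ -p 9000:9000 openvino/model_server --model_path /models/my_model --model_name my_model --port 9000
Multi-model mode:
docker run -d -v $(pwd)/config.json:/opt/config.json -v $(pwd)/models:/models -p 9000:9000 openvino/model_server --config_path /opt/config.json --port 9000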
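As an example of tuning grpc_workers for many parallel clients, the parameter is simply appended at startup (the value below is illustrative):

docker run -d -v $(pwd)/model:/models/my_model/ -p 9000:9000 openvino/model_server --model_path /models/my_model --model_name my_model --port 9000 --grpc_workers 8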
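And if the default non-root security context does not fit your deployment, the standard Docker --user flag overrides it, for example to run as root:

docker run -d --user root -v $(pwd)/model:/models/my_model/ -p 9000:9000 openvino/model_server --model_path /models/my_model --model_name my_model --port 9000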

Note: The Git history of the C++ development is stored on the main branch (the new default). The Python implementation history is preserved on the master branch.

You can pull a public OpenVINO Model Server Docker image (based on CentOS*) with one of the following commands:
docker pull openvino/model_server:2021.1
docker pull openvino/model_server:2021.1-gpu