By @b-g Benedikt GroΓ
We find ourselves in an exciting technological moment. In the last few years, it seems magic started to happen in Artificial Intelligence βAIβ. After a long AI winter, machine learning βMLβ methods and techniques started to work.
If you follow ML news outlets like Two Minute Papers (ML papers nicely exlpained in 2 min videos) or the recent development of Runway ML App (think of "ready made" ML effect app) there seems to be popping up one interesting ML model after another. And often these ML models come with example code on Github! Yay!
But the excitment often fades away quickly, as even though the example code or the demo of the ML model doesn't look crazy complicated ... it can become quickly pure hell to actually get it running on your computer. Most often simply because of ultra specfic software and GPU driver dependencies, which propably don't go well with what you already have installed on your machine. It is often a total mess :(
This tutorial tries to jump in by showing you an approach to handle the ML software dependency complexity better! We are going to use a Docker Container (think of an entire operating system nicely bundled in a container) which has access to the GPU and the files of a host machine running it. So instead of installing everything direclty in the operating system of your local machine, we create a layer of abstraction for every model. You could also say every ML model gets its own (virtual) operation system, as if you had for every ML model a dedicated computer.
Let's assume you stumbled upon the "DeepFill" paper Generative Image Inpainting with Contextual Attention and the corresponding Github repository JiahuiYu/generative_inpainting. Here is a quick illustration on what DeepFill does:
Fancy! You provide a mask and Deepfill is hallucinating the content which matches the context.
The following sections are step by step instructions how to get DeepFill running in a Docker container. The process should be fairly similar for other models and hence can be seen as a general approach to encapsulate the setup complexity which comes with state of the art ML models.
My hope is furthermore that dedicated Docker containers will make ML models a lot more shareable and accessible for a wider audience in Art & Design, to facilitate the very needed debate of wider implications of AI/ML.
You will need the following hard- and software setup to be able to run Docker with GPU support:
- An Ubuntu computer/server with a Nvidia CUDA GPU
- Docker Engine with version >= 19.03.0
- Docker Compose with version >= 1.28.0
- Nvidia drivers with version >= 361
Follow the official documentation: https://docs.docker.com/install/linux/docker-ce/ubuntu/
Verify Docker version:
docker version
The output should be a long list with infos like "API version: 1.4" etc.
Follow the official documentation: https://docs.docker.com/compose/install/linux/
Verify Docker Compose version:
docker compose version
The output should be similar to "Docker Compose version v2.9.0".
Install CUDA along with latest nVidia driver for your graphics card.
- Go to: https://developer.nvidia.com/cuda-downloads
- Select Linux > x86_64 > Ubuntu
- Select your ubuntu version
- Select Installer type (we tested with deb local or deb network)
- Follow instructions
- After install, reboot your machine
- Test if nvidia driver are installed with:
nvidia-smi
Verify Nvidia drivers version:
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1050 Off | 00000000:02:00.0 Off | N/A |
| N/A 40C P0 8W / N/A | 0MiB / 2002MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
- Follow the official quickstart documentation e.g. for Ubuntu 16.04/18.04:
# Add the package repositories
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
# Install and reload Docker
$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
$ sudo systemctl restart docker
- Verify installation by running a dummy docker image
$ sudo docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
# Should output something like
+----------------------------------------------------------+
| NVIDIA-SMI 418.87 Driver Version: 418.87 CUDA Version: 10.1
|-------------------------------+----------------------
Yay πππ ! You can go on to finally run ML models in Docker on your machine! Good news is that you just have to do the horrible installation part once.
_
So let's have a look in DeepFill Github repository JiahuiYu/generative_inpainting. The README.md is nice in the sense that it explains with a few images what DeepFill does, has references and even sections on requirements and how to run the demo. But you still won't be able to run it out of the box. Like other ML models DeepFill relies on very specific software dependencies. And as ML researchers are busy with their research, documenting software setups for a wider audience seem currently not to be a priority in those circles. The dream situation would be that there is already a Dockerfile
(e.g. Detectron2 is a notable exception), or at least arequirements.txt
(used in Python to define dependencies).
We have to create the Docker container on our own. This is where we have to start to play detective! :)
Stroll around in the repository and try to find clues of what we are going to need. I found the following:
- there is a badge saying Tensorflow v1.7.0
- under requirements the author states: python3, tensorflow, neuralgym
- OpenCV (cv2) is mentioned
- and to download the pretrained models and copy them to the folder
model_logs/
to your own Github account. Press the "Fork" button at JiahuiYu/generative_inpainting. Result should be github.com/{your-username}/generative_inpainting
Checkout your fork of generative_inpainting to your own computer.
Now we will create a Docker container which reflects all the requirements we have spotted. Add an empty file named Dockerfile
to your DeepFill repository.
We will base everything on an official Docker container from Nvidia. That way we get a clean Ubuntu with correctly installed CUDA / CUDNN / GPU / drivers. The syntax to write a docker container is quite understandable:
- FROM: new container should be based on "cuda" with tag "9.0-cudnn7-runtime-ubuntu16.04" published by "nvidia"
- RUN: run a single or many terminal commands to install something.
- apt-get: is the command line package manager of Ubuntu. Think of an app store for command line apps.
- pip3: is the command line package manager of Python 3.
Here is how the Dockerfile
looks:
FROM nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04
# Install a few basic command line tools e.g. zip, git, cmake
RUN apt-get update && \
apt-get install -y \
build-essential \
cmake \
git \
wget \
unzip \
yasm \
pkg-config \
curl
# Install Python 3, numpy and pip3
RUN apt-get install -y \
python3-dev \
python3-numpy \
python3-pip
# Install OpenCV
RUN apt-get install -y \
libopencv-dev \
python-opencv
# Cleanup apt-get installs
RUN rm -rf /var/lib/apt/lists/*
# Install Python dependencies
RUN pip3 --no-cache-dir install \
opencv-python \
Pillow \
pyyaml \
tqdm
# Install project specific dependencies
RUN pip3 --no-cache-dir install \
tensorflow-gpu==1.8.0 \
git+git://github.com/JiahuiYu/neuralgym.git@88292adb524186693a32404c0cfdc790426ea441
You can also think of the Dockerfile
as a very long list of installation instructions. There are additional docker keywords for config setting e.g. which network ports should be available etc.
We will later also talk about strategies for finding which versions / packages / cuda ... match your ML models and go well together.
docker build -t deepfill:v0 .
deepfill is the name of our container and v0 is our version tag
During the installation process Docker prints out what is going on. Keep your eyes open for red lines (errors). If everything goes well you can run the container now.
docker run -it --runtime=nvidia --volume $(pwd)/:/shared --workdir /shared deepfill:v0 bash
-
-it and bash run the container interactive, container should provide a bash terminal prompt
-
--runtime=nvidia container can access to GPU
-
--volume mount the current folder (the DeepFill repo) to folder /shared in the docker container filesystem. The folder is shared between the host and the docker container
-
deepfill:v0 run a docker container named deepfill with version v0
In a more complex scenario where you want to leverage GPUs to run your services, it can be useful to set up a Docker Compose configuration file. For example, you can create a docker-compose.yml
file with the following content:
version: "3"
services:
deepfill:
image: deepfill:v0
runtime: nvidia
volumes:
- $(pwd)/:/shared
working_dir: /shared
tty: true
stdin_open: true
command: bash
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
This configuration file defines a service named deepfill
that uses the deepfill:v0
Docker image and leverages the nvidia runtime to utilize GPUs. The devices section maps 1 of the host's NVIDIA GPUs by using nvidia driver. This configuration file can also be useful to specify GPUs to specific services in the same compose.
After running docker compose up -d
, you can enter the running container and start a bash session using the docker compose exec
command. The following command can be used to enter the deepfill
service container and start a bash session:
docker compose exec deepfill bash
The container will continue to run in the background, and you can start another bash session using the same command if needed.
Download the pretrained models e.g. Places2 (places background) or CelebA-HQ (faces) and copy it to folder model_logs
. The demo relies on it.
Copy these two images the DeepFill repo folder:
input.png | mask.png |
---|---|
Paste the command below in the terminal of our running DeepFill container:
python3 test.py --image input.png --mask mask.png --output output.png --checkpoint_dir model_logs/release_places2_256_deepfill_v2/
The terminal will return a lot of debugging infos ... don't bother. After a few seconds you should get this result:
output.png |
---|
Yay πππ !
To be honest it can take quite a while to figure out the requirements. Yes it is a mess. These strategies helped me figuring it out a bit less painful:
- Try to define the dependecy as specific as possible, as there is a steady release of updates which not always go well together. You want to be able to run your container a few weeks later:
- Bad: pip3 install tensorflow-gpu
- Good: pip3 install tensorflow-gpu==1.8.0
- Bad: FROM nvidia/cuda
- Good: FROM nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04
- Guess a cuda version which matches the age of the core ML libaray used e.g:
- Tensorflow 2.0 β cuda:10.2-cudnn7
- Tensorflow 1.8.0 β cuda:9.0-cudnn7
- If it is unclear, start with the new ones and gradually move back in time
- All available docker containers published by Nvidia can be found here: https://hub.docker.com/r/nvidia/cuda/
- Google error message in combination with specfic tensorflow / cuda versions e.g.
FROM nvidia/cuda:10.0-cudnn7-runtime-ubuntu18.04
and then running the demo gave meImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
. After reading a few posts it turned out that this is a typical error which can be avoided using cuda 9.0. - If there is a requirements state e.g. tensorflow-gpu==1.7.0 but you still have problems e.g.
E tensorflow/stream_executor/cuda/cuda_dnn.cc:396] Loaded runtime CuDNN library: 7603 (compatibility version 7600) but source was compiled with 7005 (compatibility version 7000)
. Try to gently bump up or down the version. In the case of DeppFill using tensorflow-gpu==1.8.0 solved the issue.
- Article section around how to install Docker is based on the Install nvidia-docker guide from the OpenDataCam project
- Cover image based on OpenMojis β the open-source emoji and icon project. License: CC BY-SA 4.0
- DeepFill demo images are from the movie "Fast and Furious" (2019) ... and my silly experiments Fast and the Furious without guys and cars