This repository was archived by the owner on Mar 21, 2024. It is now read-only.

Fixing bugs when running container models on multiple GPUs #445

Merged

merged 12 commits into main from antonsc/multi-gpu-container on Apr 23, 2021

Conversation


ant0nsc (Contributor) commented on Apr 22, 2021

Fixes two bugs:

  • The use_gpu flag for container models was not picked up correctly, so container models always ran without a GPU (see the illustrative sketch after this list).
  • When running inference for container models with the test_step method, PyTorch Lightning would fail when running on more than one GPU.

Also adds an extra test that runs the HelloContainer model in AzureML.
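For context on the first bug, a use_gpu flag only has an effect if it is translated into the trainer's device settings. The helper below is a hypothetical sketch, not the InnerEye create_lightning_trainer code, using PyTorch Lightning 1.x-style arguments; it only illustrates the kind of wiring the first bullet refers to.

import torch
import pytorch_lightning as pl

def create_trainer(use_gpu: bool, max_num_gpus: int = -1) -> pl.Trainer:
    # Hypothetical helper: translate the use_gpu flag into the trainer's
    # device settings. If the flag is dropped at this point, training
    # silently falls back to CPU, which is the first bug described above.
    num_gpus = torch.cuda.device_count() if use_gpu else 0
    # Optionally cap the number of GPUs (max_num_gpus < 0 means "no cap").
    if 0 <= max_num_gpus < num_gpus:
        num_gpus = max_num_gpus
    return pl.Trainer(
        gpus=num_gpus,
        accelerator="ddp" if num_gpus > 1 else None,
    )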

ant0nsc changed the title from "Adding tests for running inference on a container model when >1 GPU is present" to "Fixing bugs when running container models on multiple GPUs" on Apr 23, 2021
ant0nsc requested review from Shruthi42 and melanibe on April 23, 2021 at 11:54
ant0nsc enabled auto-merge (squash) on April 23, 2021 at 15:34
# Lightning does not cope with having two calls to .fit or .test in the same script. As a workaround for
# now, restrict number of GPUs to 1, meaning that it will not start DDP.
self.container.max_num_gpus = 1
trainer = create_lightning_trainer(self.container, num_nodes=1)[0]

trainer, _ = create... looks more readable than [0] but minor stuff
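For readers outside the repository, the workaround in the snippet above can be shown in isolation. The following is a minimal sketch assuming PyTorch Lightning 1.x-style arguments (as used at the time of this PR) and a machine with multiple GPUs; fit_then_test is a hypothetical function, not the InnerEye code path.

import pytorch_lightning as pl

def fit_then_test(model: pl.LightningModule, data: pl.LightningDataModule) -> None:
    # Training may run DDP across all available GPUs.
    fit_trainer = pl.Trainer(gpus=-1, accelerator="ddp", max_epochs=1)
    fit_trainer.fit(model, datamodule=data)

    # Workaround from the snippet above: Lightning cannot cleanly start DDP a
    # second time in the same script, so inference is restricted to a single
    # GPU, which keeps the test run in a single process.
    test_trainer = pl.Trainer(gpus=1, max_epochs=1)
    test_trainer.test(model, datamodule=data)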

ant0nsc merged commit c298155 into main on Apr 23, 2021
ant0nsc deleted the antonsc/multi-gpu-container branch on April 23, 2021 at 16:15