This repository was archived by the owner on Mar 21, 2024. It is now read-only.

Fixing bugs when running container models on multiple GPUs #445

Merged

merged 12 commits into main from antonsc/multi-gpu-container on Apr 23, 2021

Conversation


ant0nsc (Contributor) commented on Apr 22, 2021

Fixes two bugs:

  • The use_gpu flag for container models was not picked up correctly, so container models always ran without a GPU (see the illustrative sketch after this list).
  • When running inference for container models with the test_step method, PyTorch Lightning would fail when running on more than one GPU.

Also adds an extra test that runs the HelloContainer model in AzureML.
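For context on the first bug, a use_gpu flag only has an effect if it is translated into the trainer's device settings. The helper below is a hypothetical sketch, not the InnerEye create_lightning_trainer code, using PyTorch Lightning 1.x-style arguments; it only illustrates the kind of wiring the first bullet refers to.

import torch
import pytorch_lightning as pl

def create_trainer(use_gpu: bool, max_num_gpus: int = -1) -> pl.Trainer:
    # Hypothetical helper: translate the use_gpu flag into the trainer's
    # device settings. If the flag is dropped at this point, training
    # silently falls back to CPU, which is the first bug described above.
    num_gpus = torch.cuda.device_count() if use_gpu else 0
    # Optionally cap the number of GPUs (max_num_gpus < 0 means "no cap").
    if 0 <= max_num_gpus < num_gpus:
        num_gpus = max_num_gpus
    return pl.Trainer(
        gpus=num_gpus,
        accelerator="ddp" if num_gpus > 1 else None,
    )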

ant0nsc changed the title from "Adding tests for running inference on a container model when >1 GPU is present" to "Fixing bugs when running container models on multiple GPUs" on Apr 23, 2021
ant0nsc requested review from Shruthi42 and melanibe on April 23, 2021 at 11:54
ant0nsc enabled auto-merge (squash) on April 23, 2021 at 15:34
# Lightning does not cope with having two calls to .fit or .test in the same script. As a workaround for
# now, restrict number of GPUs to 1, meaning that it will not start DDP.
self.container.max_num_gpus = 1
trainer = create_lightning_trainer(self.container, num_nodes=1)[0]

trainer, _ = create... looks more readable than [0] but minor stuff
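For readers outside the repository, the workaround in the snippet above can be shown in isolation. The following is a minimal sketch assuming PyTorch Lightning 1.x-style arguments (as used at the time of this PR) and a machine with multiple GPUs; fit_then_test is a hypothetical function, not the InnerEye code path.

import pytorch_lightning as pl

def fit_then_test(model: pl.LightningModule, data: pl.LightningDataModule) -> None:
    # Training may run DDP across all available GPUs.
    fit_trainer = pl.Trainer(gpus=-1, accelerator="ddp", max_epochs=1)
    fit_trainer.fit(model, datamodule=data)

    # Workaround from the snippet above: Lightning cannot cleanly start DDP a
    # second time in the same script, so inference is restricted to a single
    # GPU, which keeps the test run in a single process.
    test_trainer = pl.Trainer(gpus=1, max_epochs=1)
    test_trainer.test(model, datamodule=data)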

ant0nsc merged commit c298155 into main on Apr 23, 2021
ant0nsc deleted the antonsc/multi-gpu-container branch on April 23, 2021 at 16:15