Skip to content

fix: Address multi-GPU issue in engine deserialize #2325

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 1 commit into from
Sep 27, 2023

Conversation

gs-olive
Copy link
Collaborator

Description

  • Fix issue where GPU ID of device compiled on was taking precedence over GPU ID of device in context
  • Clean up logic in device detection

Fixes #2319
Fixes #2269

Type of change

Please delete options that are not relevant and/or add your own.

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • [ x ] My code follows the style guidelines of this project (You can use the linters)
  • [ x ] I have performed a self-review of my own code
  • [ x ] I have commented my code, particularly in hard-to-understand areas and hacks
  • [ x ] I have made corresponding changes to the documentation
  • [ - ] I have added tests to verify my fix or my feature
  • [ x ] New and existing unit tests pass locally with my changes
  • [ x ] I have added the relevant labels to my PR in so that relevant reviewers are notified

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to Python style guidelines

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to C++ style guidelines

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to C++ style guidelines

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to C++ style guidelines

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to Python style guidelines

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to Python style guidelines

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to Python style guidelines

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to C++ style guidelines

@gs-olive gs-olive added the WIP Work is in progress, pull request should not be merged yet label Sep 18, 2023
@gs-olive gs-olive force-pushed the runtime_multi_gpu_fix branch from 54af72d to 754fd5a Compare September 19, 2023 00:07
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to C++ style guidelines

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to Python style guidelines

@gs-olive gs-olive added WIP Work is in progress, pull request should not be merged yet and removed WIP Work is in progress, pull request should not be merged yet labels Sep 19, 2023
@gs-olive gs-olive force-pushed the runtime_multi_gpu_fix branch 2 times, most recently from 42c6650 to 5a6d1e6 Compare September 19, 2023 00:46
@gs-olive gs-olive removed the WIP Work is in progress, pull request should not be merged yet label Sep 19, 2023
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to C++ style guidelines

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to Python style guidelines

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to C++ style guidelines

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to Python style guidelines

@@ -10,6 +10,8 @@ namespace runtime {
c10::optional<RTDevice> get_most_compatible_device(const RTDevice& target_device) {
LOG_DEBUG("Target Device: " << target_device);
auto device_options = find_compatible_devices(target_device);
auto current_device = get_current_device();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably has performance implications since the current_device_call has significant perf overhead

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this line would only run when the engine is being instantiated on a new device and not on every inference call, unless is_switch_required is entered, but this PR ensures that would not be the case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default constructor of RTDevice invalid device. Add Optional argument to get_most_compatible_device if RTDevice is missing then add get_current_device call.

- Fix issue where GPU ID of device compiled on was taking precedence
over GPU ID of device in context
- Add performance-considerate implementation of current device-getting
@gs-olive gs-olive force-pushed the runtime_multi_gpu_fix branch from 5a6d1e6 to 3df2b65 Compare September 26, 2023 23:49
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to C++ style guidelines

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code conforms to Python style guidelines

Copy link
Collaborator

@narendasan narendasan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
Projects
None yet
3 participants