Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Quality of life smart validate Improvements #458

Merged
merged 3 commits into from
Jan 20, 2024

Conversation

MattToast
Copy link
Member

Quality of life smart validate improvements:

  • Set CUDA_VISIBLE_DEVICES environment variable within smart validate prior to importing any ML deps to prevent false negatives on multi-GPU systems
  • Move SmartRedis logs from standard out to dedicated log file in the validation temporary directory
  • Suppress sklearn deprecation warning by pinning KMeans constructor argument

- Set `CUDA_VISIBLE_DEVICES` environment variable within `smart
  validate` prioro to importing any ML deps to prevent false negatives
  on multi-GPU systems
- Move SmartRedis logs from standard out to dedicated log file in the
  validation temporary directory
- Suppress `sklearn` deprecation warning by pinning `KMeans` constructor
  argument
@MattToast MattToast added the type: usability Issues related to ease of use label Jan 19, 2024
@MattToast MattToast requested review from ashao and al-rigazzi January 19, 2024 18:04
@MattToast MattToast self-assigned this Jan 19, 2024
Copy link
Collaborator

@al-rigazzi al-rigazzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I love quality of life improvements!

Copy link

codecov bot commented Jan 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (c38f73f) 90.42% compared to head (53fa04f) 90.80%.
Report is 3 commits behind head on develop.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #458      +/-   ##
===========================================
+ Coverage    90.42%   90.80%   +0.38%     
===========================================
  Files           60       60              
  Lines         3738     3830      +92     
===========================================
+ Hits          3380     3478      +98     
+ Misses         358      352       -6     

see 13 files with indirect coverage changes

Copy link
Collaborator

@ashao ashao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thanks for finding this

if with_pt:
logger.info("Verifying Torch Backend")
_test_torch_install(client, device)
if with_tf:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this fixes the problem we found, could you move the Tensorflow test to the very end?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

absolutely!!

@MattToast MattToast merged commit 35973b5 into CrayLabs:develop Jan 20, 2024
26 checks passed
@MattToast MattToast deleted the val-w-many-gpu branch January 22, 2024 23:10
ashao pushed a commit to ashao/SmartSim that referenced this pull request Jan 27, 2024
Quality of life `smart validate` improvements:
- Set `CUDA_VISIBLE_DEVICES` environment variable within `smart
  validate` prior to importing any ML deps to prevent false negatives on
  multi-GPU systems
- Move SmartRedis logs from standard out to dedicated log file in the
  validation temporary directory
- Suppress `sklearn` deprecation warning by pinning `KMeans` constructor
  argument
- Move TF test to last as TF may reserve the GPUs it uses

[ committed by @MattToast ]
[ reviewed by @al-rigazzi @ashao ]
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
type: usability Issues related to ease of use
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants