Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

PyTorch profiler integration #1297

Merged
merged 22 commits into from
Dec 15, 2021

Conversation

shrinath-suresh
Copy link
Contributor

Signed-off-by: Shrinath Suresh shrinath@ideas2it.com

Description

This pull request contains changes in base handler to integrate PyTorch profiler - https://pytorch.org/blog/introducing-pytorch-profiler-the-new-and-improved-performance-tool/ with torchserve

When user opts for the profiler, the preprocess/inference/postprocess calls are profiled and the traces are saved as a json file . User can load the trace json into pytorch profiler and view the detailed output.

This feature will help users to debug the issues in preprocess or inference.

the basis structure of the implementation is in place. More detailed requirements and unit tests to be captured. once it is done, WIP tag will be removed.

Initial Requirements:

  1. The profiler integration is done only for the custom handler. User can opt for the profiler flow by setting the class variable.
  2. User can override torch.profiler.profiler arguments - This helps user to add conditions based on the environment. For example: by default, the profiler is set to trace only CPU. In case, user wants to profiler CUDA activity as well, profiler arguments can be overridden using class variable.
  3. Saving the profiler output and ability to load the profiler output using pytorch profiler plugin - https://github.com/pytorch/kineto

Fixes #(issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Feature/Issue validation/testing

Please describe the tests [UT/IT] that you ran to verify your changes and relevent result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

  • Test A

  • Test B

  • UT/IT execution results

Yet to be added

  • Logs

Checklist:

  • Have you added tests that prove your fix is effective or that this feature works?
  • New and existing unit tests pass locally with these changes?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: 1da9607
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-cpu
  • Commit ID: 1da9607
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-win
  • Commit ID: 1da9607
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@HamidShojanazeri HamidShojanazeri self-requested a review October 27, 2021 19:02
@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-win
  • Commit ID: c8676c6
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-cpu
  • Commit ID: c8676c6
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: c8676c6
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@msaroufim msaroufim requested a review from yqhu November 2, 2021 21:43
Copy link

@yqhu yqhu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-win
  • Commit ID: 259899a
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-cpu
  • Commit ID: 259899a
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: 259899a
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@@ -19,37 +19,8 @@ def teardown_module(module):
test_utils.torchserve_cleanup()


def model_archiver_command_builder(model_name=None, version=None, model_file=None, serialized_file=None, handler=None, extra_files=None, force=False):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved this method to test_utils.py to reuse

@msaroufim msaroufim self-requested a review November 16, 2021 06:39
Copy link
Collaborator

@HamidShojanazeri HamidShojanazeri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @shrinath-suresh, it looks great. I think adding some notes to docs probably in configuration would be helpful.

self.profiler_args["record_shapes"] = True

if "on_trace_ready" not in self.profiler_args:
result_path = "/tmp/pytorch_profiler"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shrinath-suresh I wonder if we give this option to user to choose the result_path and if not provided we fall back to default path.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also adding/extending one of the examples, like Resnet or BERT on using this profilers would be very helpful to show the usage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HamidShojanazeri Sure Hamid.. I will pick any one example (mnist or resnet) to showcase profiler integration and update the readme accordingly. Thanks for the suggestion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HamidShojanazeri Regarding the profiler path.. By default the outputs get stored into "/tmp/pytorch_profiler". User can override this behaviour like this.

@HamidShojanazeri Do you have any suggestions for default path ? At base_handler.py i couldn't find any default path (model store?) to store profiler trace files. /tmp/pytorch_profiler looks hardcoded.. Any suggestion ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the profiler path I'd suggest using the same path the pytorch profiler uses by default. If 2 succesive runs of a profile just create new directories in the /tmp/pytorch_profiler then there is no need to overwrite but if they do then it's best to allow users to configure what the output path is

Copy link
Collaborator

@HamidShojanazeri HamidShojanazeri Dec 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will pick any one example (mnist or resnet) to showcase profiler integration and update the readme accordingly. Thanks for the suggestion

let's have a separate PR on the example.

if not self._is_explain():
output = self.inference(data_preprocess)
output = self.postprocess(output)
is_profiler_enabled = os.environ.get("ENABLE_TORCH_PROFILER", None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shrinath-suresh I wonder if this could be added to the config.properties, this would help to make sure all the configs are gathered in one place. An example of adding to the config can be found here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chauhang your thoughts on this ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Hamid, it's a simple change that'll make sure people don't need to think about both OS environment variables and config.properties. It's how we manage all configs. Let us know if you have any questions about how this would work

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @shrinath-suresh any update on this? it's much cleaner for us to use config.properties so users can discover all the available configuration options. That way it'll be much easier for people to discover this feature too

@shrinath-suresh
Copy link
Contributor Author

Thanks @shrinath-suresh, it looks great. I think adding some notes to docs probably in configuration would be helpful.

Thanks for your review comments. I will work on addressing them

Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-win
  • Commit ID: dd0f807
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-cpu
  • Commit ID: dd0f807
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: dd0f807
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-win
  • Commit ID: c916d3c
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-cpu
  • Commit ID: c916d3c
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-win
  • Commit ID: a0e1539
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-cpu
  • Commit ID: a0e1539
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: a0e1539
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-win
  • Commit ID: 19a5211
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-cpu
  • Commit ID: 19a5211
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: 19a5211
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@chauhang chauhang changed the title [WIP] PyTorch profiler integration PyTorch profiler integration Dec 13, 2021
export ENABLE_TORCH_PROFILER=true
```

Note: Ensure`enable_envvars_config` is set to `true` in torchserve config
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shrinath-suresh is this gonna be added to the config.properties?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a note here, since the env based setup doesnt work unless user sets enable_envvars_config to true in config.properties.

I will add config.properties file in this example and update the instructions accordingly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shrinath-suresh as mentioned in previous comment, you can add a setting to the config following this example. This basically would not work for now.

Install torch-tb-profiler and run the following command to view the results in UI

```
tensorboard --logdir /tmp/pytorch_profiler/mnist/
Copy link
Collaborator

@HamidShojanazeri HamidShojanazeri Dec 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shrinath-suresh in the handler below seems, "on_trace_ready" to override the path is missing.

Copy link
Contributor Author

@shrinath-suresh shrinath-suresh Dec 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on_trace_ready is not a mandatory argument. If user provides the argument, the trace files will get generated into the user specified location. if the user doesn't provide the argument, the trace files get generated inside "/tmp/pytorch_profiler/<model_name>" folder. Its handled here

For demonstration, we can add on_trace_ready in handler file. Again, we cannot set the on_trace_ready to os.getcwd() as it will be a temp directory in run time. May be we can try use home directory or something .. Your thoughts ?

Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-win
  • Commit ID: d9cbaff
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: a3abd42
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-win
  • Commit ID: a3abd42
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-cpu
  • Commit ID: d9cbaff
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: d9cbaff
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-cpu
  • Commit ID: a3abd42
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-win
  • Commit ID: 66ae377
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-cpu
  • Commit ID: 66ae377
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: 66ae377
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-win
  • Commit ID: 0807e3c
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-cpu
  • Commit ID: 0807e3c
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: 0807e3c
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@chauhang chauhang merged commit 04c873f into pytorch:master Dec 15, 2021
@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-win
  • Commit ID: 0807e3c
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-neo-ci-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: torch-serve-build-gpu
  • Commit ID: 0807e3c
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

Copy link
Collaborator

@HamidShojanazeri HamidShojanazeri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shrinath-suresh please trying adding the config using the example shared in the comments.

export ENABLE_TORCH_PROFILER=true
```

Note: Ensure`enable_envvars_config` is set to `true` in torchserve config
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shrinath-suresh as mentioned in previous comment, you can add a setting to the config following this example. This basically would not work for now.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants