facebookresearch/HolisticTraceAnalysis

Holistic Trace Analysis

Holistic Trace Analysis (HTA) is a performance analysis tool that identifies bottlenecks in distributed training workloads. It does so by analyzing traces collected through the PyTorch Profiler, a.k.a. Kineto.

Features

HTA provides the following features:

  1. Temporal Breakdown - Breakdown of time taken by the GPUs in terms of time spent in computation, communication, memory events, and idle time across all ranks.
  2. Kernel Breakdown - Finds kernels with the longest duration on each rank.
  3. Kernel Duration Distribution - Distribution of average time taken by longest kernels across different ranks.
  4. Idle Time Breakdown - Breakdown of GPU idle time into time spent waiting for the host, time spent waiting for another kernel, and time attributed to an unknown cause.
  5. Communication Computation Overlap - Calculate the percentage of time when communication overlaps computation.
  6. Frequent CUDA Kernel Patterns - Find the CUDA kernels most frequently launched by any given PyTorch or user-defined operator.
  7. CUDA Kernel Launch Statistics - Distributions of GPU kernels with very small duration, large duration, and excessive launch time.
  8. Augmented Counters (Queue length, Memory bandwidth) - Augmented trace files which provide insights into memory bandwidth utilized and number of outstanding operations on each CUDA stream.
  9. Trace Comparison - A trace comparison tool to identify and visualize the differences between traces.
  10. CUPTI Counter Analysis - An experimental API to get GPU performance counters. Attributing performance measurements from kernels to PyTorch operators makes it possible to perform roofline analysis and optimize kernels.
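To make the communication computation overlap metric concrete, here is a minimal sketch of one way such a percentage could be computed from kernel time intervals. It is not HTA's implementation. The function names, the interval representation, and the choice of total communication time as the denominator are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumption: each kernel is a (start, end) pair in microseconds.
Interval = Tuple[float, float]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_percentage(comm: List[Interval], comp: List[Interval]) -> float:
    """Percentage of communication time during which computation also runs."""
    comm_m, comp_m = merge_intervals(comm), merge_intervals(comp)
    comm_total = sum(end - start for start, end in comm_m)
    if comm_total == 0:
        return 0.0
    overlapped = 0.0
    i = j = 0
    # Two-pointer sweep over the two sorted, disjoint interval lists.
    while i < len(comm_m) and j < len(comp_m):
        lo = max(comm_m[i][0], comp_m[j][0])
        hi = min(comm_m[i][1], comp_m[j][1])
        if lo < hi:
            overlapped += hi - lo
        # Advance whichever interval finishes first.
        if comm_m[i][1] < comp_m[j][1]:
            i += 1
        else:
            j += 1
    return 100.0 * overlapped / comm_total


# Hypothetical kernel intervals for a single rank.
comm_kernels = [(0, 40), (100, 140)]
comp_kernels = [(20, 60), (90, 110)]
print(overlap_percentage(comm_kernels, comp_kernels))  # prints 37.5
```

In this toy example, 30 of the 80 microseconds of communication coincide with computation. A higher percentage means more of the communication cost is hidden behind useful work.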

Installation

HTA runs on Linux and Mac with Python >= 3.8.

Set up a Conda environment (optional)

See the Miniconda installation instructions to install Miniconda.

Create the environment env_name

conda create -n env_name

Activate the environment

conda activate env_name

Deactivate the environment

conda deactivate

Install using PyPI (stable)

pip install HolisticTraceAnalysis

Install from source

git clone https://github.com/facebookresearch/HolisticTraceAnalysis.git
cd HolisticTraceAnalysis
git submodule update --init
pip install -r requirements.txt
pip install -e .

Documentation

Learn more about the features and the API from our documentation.

Usage

Data Preparation

All traces collected from a job must reside in a unique folder.
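Before creating the analyzer, it can help to confirm that the folder really contains the traces you expect. The sketch below uses a hypothetical helper, `list_trace_files`, that is not part of HTA. It assumes that traces are stored as `.json` or `.json.gz` files directly inside the folder, and the file names in the demo are made up.

```python
import tempfile
from pathlib import Path
from typing import List


def list_trace_files(trace_dir: str) -> List[Path]:
    """Return the trace files found directly inside trace_dir, sorted by name."""
    folder = Path(trace_dir)
    if not folder.is_dir():
        raise NotADirectoryError(f"{trace_dir} is not a folder")
    # Assumption: traces use a .json or .json.gz extension.
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.name.endswith((".json", ".json.gz"))
    )


# Demo against a temporary folder with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "rank0.json").write_text("{}")
    (Path(tmp) / "rank1.json.gz").write_bytes(b"")
    (Path(tmp) / "notes.txt").write_text("not a trace")
    found_names = [p.name for p in list_trace_files(tmp)]

print(found_names)
```

Keeping each job in its own folder means a check like this only ever sees traces from that one job.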

Analysis in a Jupyter notebook

Activate the Conda environment and launch a Jupyter notebook.

conda activate env_name
jupyter notebook

Import HTA, and create a TraceAnalysis object

from hta.trace_analysis import TraceAnalysis
analyzer = TraceAnalysis(trace_dir="/path/to/folder/containing/the/traces")

Basic Usage

# Temporal breakdown
temporal_breakdown_df = analyzer.get_temporal_breakdown()

# Kernel breakdown
kernel_breakdown_df = analyzer.get_gpu_kernel_breakdown()

# Idle time breakdown
idle_time_df = analyzer.get_idle_time_breakdown()

# Communication computation overlap
comm_comp_overlap_df = analyzer.get_comm_comp_overlap()

# Frequent CUDA kernel patterns
frequent_patterns_df = analyzer.get_frequent_cuda_kernel_patterns(operator_name="aten::linear", output_dir="/new/trace/path")

# CUDA kernel launch statistics
cuda_launch_kernel_stats = analyzer.get_cuda_kernel_launch_stats()

# Memory bandwidth time series
memory_bw_series = analyzer.get_memory_bw_time_series()

# Memory bandwidth summary
memory_bw_summary = analyzer.get_memory_bw_summary()

# Queue length time series
ql_series = analyzer.get_queue_length_time_series()

# Queue length summary
ql_summary = analyzer.get_queue_length_summary()

For a detailed demo, run the trace_analysis_demo and trace_diff_demo notebooks in the examples folder.

Advanced Usage

Logging Level

HTA sets the logging level through a configuration file. The default level is defined in hta/configs/logging.config and can be changed in the [logger_hta] section of that file. To use a different logging configuration file, modify hta/configs/trace_analyzer.json.
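As an illustration of what changing the level involves, the sketch below edits a `[logger_hta]` section with Python's `configparser`. It assumes the file follows the INI layout used by `logging.config.fileConfig`, which the section name suggests but this README does not confirm. The `SAMPLE_CONFIG` contents are hypothetical and may differ from the real hta/configs/logging.config.

```python
import configparser

# Hypothetical excerpt; the real hta/configs/logging.config may differ.
SAMPLE_CONFIG = """
[loggers]
keys=root,hta

[logger_hta]
level=INFO
qualname=hta
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CONFIG)

# Read the current level, then raise the verbosity to DEBUG.
print("before:", parser["logger_hta"]["level"])
parser["logger_hta"]["level"] = "DEBUG"
print("after:", parser["logger_hta"]["level"])
```

In practice, editing the `level` value directly in the file has the same effect as this in-memory change.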

Repo Map

├── examples                       # folder containing demo notebooks
│         ├── ...
├── hta
│         ├── analyzers            # core logic for each analysis
│         │       ├── ...
│         ├── common               # code common to multiple analyses
│         │       ├── ...
│         ├── configs              # config files
│         │       ├── ...
│         ├── trace_analysis.py    # entrypoint for TraceAnalysis API
│         ├── trace_diff.py        # entrypoint for TraceDiff API
│         └── utils                # utility files
│                 └── ...
├── scripts                        # generic tools for traces
│         └── ...
└── tests                          # unit tests
          └── ...

Contributing

We welcome new contributions. If you plan to contribute new features or extensions, please first open an issue and discuss the feature with us. To learn more about how to contribute, see our contributing guidelines.

Please let us know if you encounter a bug by filing an issue.

The Team

HTA is currently maintained by: Anupam Bhatnagar, Brian Coutinho, Xizhou Feng, Yifan Liu, Sung-Han Lin and Louis Feng. Past contributors include Michael Acar and Yuzhen Huang.

License

Holistic Trace Analysis is licensed under the MIT License.