Testing Log Probability Methods with GPU

The script test_log_prob_fn benchmarks three methods for calculating log probabilities (naive_method, efficient_method, less_efficient_method) on a GPU, in both float32 and bfloat16 precision.

Overview

The script performs benchmarks on three different methods for calculating log probabilities and reports the time and peak GPU memory usage for each method:

Naive Method
Efficient Method
Less Efficient Method

It runs the tests for two types of precision:

float32
bfloat16

Test Functions

There are two main test functions in the script:

test_log_prob_methods_float32: This function benchmarks the three methods using float32 precision.
test_log_prob_methods_bfloat16: This function benchmarks the three methods using bfloat16 precision.

Workflow

Parameters Setup: The tests use a batch size of 16, a sequence length of 1024, and a dictionary (vocabulary) size of 32768. The input data is randomly generated.
GPU Memory Tracking: The GPU memory is tracked using torch.cuda.max_memory_allocated() to measure the peak memory usage during the benchmark.
Method Execution: Each method is run 10 times so that the execution time can be reported together with its run-to-run variation.
Results Validation: The results from each method are compared with the Naive method to check for correctness, with a tolerance value applied for bfloat16 precision.
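
The timing step can be sketched as follows. This is a minimal illustration, not the script's actual loop: the helper name `benchmark` and the `warmup` parameter are hypothetical, and the source does not say whether warm-up runs are used or which standard-deviation estimator produces the "±" figure. It uses only the standard library so that it runs without a GPU. When timing CUDA work, kernels are launched asynchronously, so a call to `torch.cuda.synchronize()` is needed before each clock reading.

```python
import statistics
import time


def benchmark(fn, iterations=10, warmup=0):
    """Time fn() over several iterations and return (mean_ms, std_ms).

    `warmup` is an assumption for illustration; the source does not say
    whether the script performs untimed warm-up runs.
    """
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not timed
    samples_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    # Population standard deviation of the timed samples.
    return statistics.mean(samples_ms), statistics.pstdev(samples_ms)


mean_ms, std_ms = benchmark(lambda: sum(range(100_000)))
print(f"Time: {mean_ms:.2f} ± {std_ms:.2f} ms")
```

A standard deviation larger than the mean usually points to one or two outlier iterations, which is one reason benchmarks often discard the first runs.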

Benchmarked Methods:

Naive Method: The basic, unoptimized method for calculating log probabilities.
Efficient Method: An optimized version of the naive method that reduces memory usage.
Less Efficient Method: A method with higher memory consumption than the efficient method; in the sample output below it also uses more memory than the naive method.
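
The implementations themselves are not shown in this README, so the following is only a sketch of one common way a memory saving of this kind can arise; it is not necessarily what efficient_method does. It relies on the identity log p(target) = logit[target] - logsumexp(logits). The function names are hypothetical, and the code works on a single plain-Python logit vector so that it runs without a GPU, whereas the real methods operate on batched tensors.

```python
import math


def logsumexp(logits):
    """Numerically stable log(sum(exp(x))) over a list of floats."""
    peak = max(logits)
    return peak + math.log(sum(math.exp(x - peak) for x in logits))


def naive_log_prob(logits, target):
    # Build the full log-softmax vector, then pick one entry.
    lse = logsumexp(logits)
    log_probs = [x - lse for x in logits]  # vocabulary-sized intermediate
    return log_probs[target]


def gathered_log_prob(logits, target):
    # Pick the target logit first; only one scalar is kept per position.
    return logits[target] - logsumexp(logits)


logits = [2.0, -1.0, 0.5, 3.0]
a = naive_log_prob(logits, 3)
b = gathered_log_prob(logits, 3)
# Both routes should agree up to floating-point rounding.
assert math.isclose(a, b, rel_tol=1e-9)
```

The final comparison mirrors the validation step in the workflow: an alternative method is checked against the naive result, with a looser tolerance being appropriate for lower-precision formats such as bfloat16.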

GPU Memory Usage:

The function get_gpu_memory() returns the peak GPU memory usage recorded during the execution of each method.
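
A helper of this kind might look like the sketch below. The body of get_gpu_memory() is not shown in this README, so the function names here are hypothetical, and the conversion factor (1 MB = 1024**2 bytes) is an assumption. `torch.cuda.max_memory_allocated()` returns bytes, and `torch.cuda.reset_peak_memory_stats()` clears the peak counter between measurements. The import guard is only there so the snippet also runs on machines without PyTorch or CUDA.

```python
def peak_gpu_memory_mb():
    """Peak allocated GPU memory in MB, or None when CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # max_memory_allocated() reports bytes; 1024**2 bytes per MB is assumed.
    return torch.cuda.max_memory_allocated() / (1024 ** 2)


def reset_peak_gpu_memory():
    """Clear the peak counter so the next measurement starts fresh."""
    try:
        import torch
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        torch.cuda.reset_peak_memory_stats()


reset_peak_gpu_memory()
# ... run the method under test here ...
print("Peak GPU Memory:", peak_gpu_memory_mb())
```

Resetting before each method matters because the peak counter is cumulative: without a reset, a later method would inherit the peak of an earlier, more memory-hungry one.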

Output Example

Testing with `float32` Precision

==================================================
Testing with float32 precision
==================================================

Naive:
Time: 5.07 ± 0.83 ms
Peak GPU Memory: 4096.31 MB

Efficient:
Time: 15.76 ± 21.19 ms
Peak GPU Memory: 2176.44 MB

Less_Efficient:
Time: 14.63 ± 5.06 ms
Peak GPU Memory: 4608.39 MB
PASSED [100%]

Testing with `bfloat16` Precision

==================================================
Testing with bfloat16 precision
==================================================

Naive:
Time: 1.42 ± 0.00 ms
Peak GPU Memory: 2048.22 MB

Efficient:
Time: 1.83 ± 0.01 ms
Peak GPU Memory: 1152.25 MB

Less_Efficient:
Time: 8.67 ± 0.07 ms
Peak GPU Memory: 2560.27 MB

Results Analysis

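Execution Time
- The Naive method is the fastest in both precisions (5.07 ms in float32, 1.42 ms in bfloat16) but uses roughly twice the memory of the Efficient method.
- The Efficient method trades some speed for memory. In bfloat16 it is only slightly slower than the Naive method (1.83 ms vs. 1.42 ms). Its float32 timing (15.76 ± 21.19 ms) has a spread larger than the mean, so that figure should be treated with caution.
- The Less Efficient method is clearly the slowest in bfloat16 (8.67 ms). In float32 its mean time (14.63 ms) is close to that of the Efficient method, and the spread is too large to rank the two. It also consumes the most memory, making it the least desirable overall.

GPU Memory
- The Efficient method uses the least memory in both precisions: 2176.44 MB in float32 and 1152.25 MB in bfloat16, a little over half of what the Naive method needs.
- The Naive method uses more memory than the Efficient method (4096.31 MB in float32, 2048.22 MB in bfloat16) but has lower execution times.
- The Less Efficient method consumes the most memory in both precision formats (4608.39 MB in float32, 2560.27 MB in bfloat16).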
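
As a rough sanity check on these figures, the size of a single tensor with the benchmark's shape (batch 16, sequence length 1024, dictionary size 32768) can be computed directly. The sketch below assumes 1 MB = 1024**2 bytes; the constant and function names are illustrative only.

```python
BATCH, SEQ_LEN, VOCAB = 16, 1024, 32768
BYTES_PER_ELEMENT = {"float32": 4, "bfloat16": 2}


def logits_size_mb(dtype):
    """Size of one (BATCH, SEQ_LEN, VOCAB) tensor in MB (1024**2 bytes)."""
    return BATCH * SEQ_LEN * VOCAB * BYTES_PER_ELEMENT[dtype] / 1024 ** 2


for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: {logits_size_mb(dtype):.0f} MB per logits-sized tensor")
# float32: 2048 MB per logits-sized tensor
# bfloat16: 1024 MB per logits-sized tensor
```

The Naive peaks (4096.31 MB and 2048.22 MB) are close to two such tensors, which would be consistent with the input plus one full-size intermediate being alive at the same time. That reading is an inference from the numbers, not something the test output states.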

How to Run the Tests

To run the tests:

pytest -v -s tests/test_log_prob_utils.py
