Add Benchmark Framework for ducktape #2030


Open: wants to merge 2 commits into master (base).

Conversation

@k-raina (Member) commented Aug 21, 2025

What

Key Features:

  • MetricsCollector: Real-time performance metrics collection with latency tracking, memory monitoring, and throughput analysis
  • MetricsBounds: Configurable performance thresholds with automatic validation
  • Enhanced Tests: All existing ducktape tests now include integrated benchmark metrics
  • Rich Reporting: Detailed performance reports with P50/P95/P99 latencies, memory usage, and batch efficiency

Metrics Collected:

  • Throughput: Send/delivery rates (msg/s, MB/s) with realistic bounds (1k+ msg/s)
  • Latency: P50/P95/P99 percentiles using Python's statistics.quantiles()
  • Memory: Peak usage and growth tracking via psutil
  • Efficiency: Messages per poll, buffer utilization, per-topic/partition breakdowns
  • Reliability: Success/error rates with comprehensive validation
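As a rough sketch of how latency percentiles like these can be computed with Python's `statistics.quantiles()` (hypothetical class and method names, not the PR's actual implementation):

```python
import statistics


class LatencyTracker:
    """Minimal sketch: record per-delivery latencies, summarize percentiles."""

    def __init__(self):
        self.delivery_latencies = []  # milliseconds

    def record(self, latency_ms):
        self.delivery_latencies.append(latency_ms)

    def summary(self):
        # quantiles(n=100) returns 99 cut points; indices 49, 94, and 98
        # correspond to P50, P95, and P99 respectively.
        q = statistics.quantiles(self.delivery_latencies, n=100)
        return {"p50_ms": q[49], "p95_ms": q[94], "p99_ms": q[98]}


tracker = LatencyTracker()
for ms in range(1, 101):  # 1..100 ms samples
    tracker.record(float(ms))
s = tracker.summary()
```

With the uniform 1..100 ms samples above, the default "exclusive" interpolation yields P50 = 50.5, P95 = 95.95, and P99 = 99.99.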

Files Added:

  • tests/ducktape/benchmark_metrics.py - Complete benchmark framework

Files Modified:

  • tests/ducktape/test_producer.py - Enhanced all tests with integrated metrics
  • tests/ducktape/README.md - Updated documentation

Checklist

  • Contains customer facing changes? Including API/behavior changes
    • No breaking changes - all existing tests enhanced with metrics, not replaced
  • Did you add sufficient unit test and/or integration test coverage for this PR?
    • Yes - all existing ducktape tests now include comprehensive metrics validation
    • Validated with 348k+ msg/s throughput and sub-100ms P95 latency

References

Test & Review

# Run enhanced ducktape tests with integrated benchmarks
./tests/ducktape/run_ducktape_test.py

Copilot AI review requested due to automatic review settings (August 21, 2025 12:44)
@k-raina requested review from @MSeal and a team as code owners (August 21, 2025 12:44)
@confluent-cla-assistant

🎉 All Contributor License Agreements have been signed. Ready to merge.
Please push an empty commit if you would like to re-run the checks to verify CLA status for all contributors.

Copilot AI left a comment

Pull Request Overview

This PR adds a comprehensive benchmark framework for Kafka producer testing in the ducktape test suite. The framework provides real-time performance metrics collection, validation against configurable bounds, and detailed reporting capabilities.

  • Implements a complete MetricsCollector system with latency tracking, memory monitoring, and throughput analysis
  • Enhances all existing ducktape tests with integrated benchmark metrics without breaking changes
  • Adds configurable performance bounds validation with realistic thresholds (1k+ msg/s throughput)

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

  • tests/ducktape/benchmark_metrics.py: New comprehensive benchmark framework with MetricsCollector, MetricsBounds, and reporting utilities
  • tests/ducktape/test_producer.py: Enhanced all producer tests with integrated metrics collection and validation
  • tests/ducktape/README.md: Updated documentation to reflect new metrics capabilities and the additional psutil dependency



# Use quantiles for P95, P99 (more accurate than custom implementation)
try:
    quantiles = statistics.quantiles(self.delivery_latencies, n=100)

Copilot AI (Aug 21, 2025):

Computing quantiles with n=100 for every summary is computationally expensive. Consider using a more efficient approach like numpy.percentile or caching the sorted data.
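One possible fix along the reviewer's lines (a sketch, not the PR's code): sort the latencies once per summary and pick the few needed percentiles directly by nearest rank, instead of computing all 99 cut points.

```python
import math


def percentile(sorted_vals, pct):
    """Nearest-rank percentile on an already-sorted list."""
    if not sorted_vals:
        return None
    # Multiply before dividing to keep the intermediate value exact for
    # integer percentiles, avoiding float round-up surprises in ceil().
    rank = max(1, math.ceil(pct * len(sorted_vals) / 100))  # 1-based rank
    return sorted_vals[rank - 1]


latencies = sorted(float(x) for x in range(1, 1001))  # 1..1000 ms
p50, p95, p99 = (percentile(latencies, p) for p in (50, 95, 99))
```

Sorting is O(n log n) once; each additional percentile is then O(1), versus recomputing 99 interpolated cut points on every summary.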


@sonarqube-confluent

Passed

Analysis Details

5 Issues

  • Bug 0 Bugs
  • Vulnerability 0 Vulnerabilities
  • Code Smell 5 Code Smells

Coverage and Duplications

  • Coverage No coverage information (66.10% Estimated after merge)
  • Duplications No duplication information (5.60% Estimated after merge)

Project ID: confluent-kafka-python

View in SonarQube

@MSeal (Contributor) left a comment

Minor comments. I was debating whether we should use something like locust for this. It might be worth switching to down the road, but you kind of have to hack it to do any non-RESTful patterns for testing, e.g. https://github.com/SvenskaSpel/locust-plugins/blob/master/examples/kafka_ex.py

except (psutil.NoSuchProcess, psutil.AccessDenied, psutil.ZombieProcess):
    # Handle edge cases where process might not exist or be accessible
    return None
except Exception:

@MSeal (Contributor):

I would not catch a generic Exception here; just let it bubble up to be remediated.
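The suggestion could look like the following sketch. `psutil` is not imported here; its lookup failures are simulated with a stand-in exception so the narrow-catch pattern itself is runnable:

```python
class ProcessLookupFailed(Exception):
    """Stand-in for psutil.NoSuchProcess / AccessDenied / ZombieProcess."""


def sample_memory_mb(read_rss_bytes):
    """Return RSS in MB, or None only for known process-lookup failures.

    Anything else (bugs, unexpected states) propagates to the caller,
    per the review suggestion, instead of being silently swallowed.
    """
    try:
        return read_rss_bytes() / (1024 * 1024)
    except ProcessLookupFailed:
        return None


ok = sample_memory_mb(lambda: 64 * 1024 * 1024)  # 64.0 MB
```

An unexpected error (say, a `ZeroDivisionError` in the sampler) would now surface immediately rather than being masked as a missing sample.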

    return None


class MetricsBounds:

@MSeal (Contributor):

Maybe add a TODO: load from config file?
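A possible direction for that TODO (hypothetical field names; the PR's actual bounds may differ): make `MetricsBounds` a dataclass loadable from a JSON config file, falling back to the hard-coded defaults.

```python
import json
import os
import tempfile
from dataclasses import dataclass


@dataclass
class MetricsBounds:
    """Sketch of configurable performance thresholds."""
    min_throughput_msg_per_s: float = 1000.0  # realistic lower bound per the PR
    max_p95_latency_ms: float = 100.0

    @classmethod
    def from_config(cls, path):
        # Unspecified keys fall back to the dataclass defaults only if the
        # config omits them entirely; unknown keys raise a TypeError.
        with open(path) as f:
            return cls(**json.load(f))


# Usage: load stricter bounds from a throwaway config file.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"min_throughput_msg_per_s": 5000.0, "max_p95_latency_ms": 50.0}, f)
    cfg_path = f.name
bounds = MetricsBounds.from_config(cfg_path)
os.unlink(cfg_path)
```

Keeping the defaults in the dataclass means tests without a config file keep their current behavior unchanged.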

    latency_ms = (time.time() - send_times[msg_key]) * 1000
    del send_times[msg_key]  # Clean up
else:
    latency_ms = 5.0  # Default latency if timing info not available

@MSeal (Contributor):

Maybe better to just set it to 0 or None.
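Following the reviewer's suggestion, one way (a sketch) is to record `None` when timing info is missing and exclude those entries from the statistics, rather than injecting an arbitrary 5.0 ms that would skew the percentiles:

```python
def summarize_latencies(latencies):
    """Average over known latencies; count missing entries separately."""
    known = [lat for lat in latencies if lat is not None]
    missing = len(latencies) - len(known)
    if not known:
        return {"count": 0, "avg_ms": None, "missing": missing}
    return {
        "count": len(known),
        "avg_ms": sum(known) / len(known),
        "missing": missing,
    }


stats = summarize_latencies([2.0, None, 4.0])
```

Reporting the `missing` count keeps the gap visible instead of hiding it inside a fabricated sample.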

@MSeal (Contributor) commented Aug 24, 2025

Let's touch up the small things and get a merge, then iterate and change things later if we want. I want to get this into the history so we can build abstractions on top for simpler test definitions, swap the implementation details as needed, and avoid conflicts on future PRs.
