Add Benchmark Framework for ducktape #2030
base: master
Conversation
🎉 All Contributor License Agreements have been signed. Ready to merge.
Pull Request Overview
This PR adds a comprehensive benchmark framework for Kafka producer testing in the ducktape test suite. The framework provides real-time performance metrics collection, validation against configurable bounds, and detailed reporting capabilities.
- Implements a complete MetricsCollector system with latency tracking, memory monitoring, and throughput analysis (see the usage sketch after this list)
- Enhances all existing ducktape tests with integrated benchmark metrics without breaking changes
- Adds configurable performance bounds validation with realistic thresholds (1k+ msg/s throughput)
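For orientation, here is a minimal usage sketch of how such a collector might be driven from a test. The class names MetricsCollector and MetricsBounds come from the PR's file list, but every method and field below is an illustrative assumption, not the PR's actual API:

```python
# Hypothetical sketch only: MetricsCollector/MetricsBounds are real names from
# this PR, but these methods and fields are assumptions about their shape.
from benchmark_metrics import MetricsCollector, MetricsBounds  # assumed import

bounds = MetricsBounds()           # defaults, e.g. the 1k+ msg/s throughput floor
collector = MetricsCollector()

collector.start()                  # assumed: begins timing and memory sampling
for i in range(100_000):
    key = f"key-{i}"
    collector.record_send(key)     # producer.produce(...) would happen here
    collector.record_delivery(key) # normally called from the delivery callback
collector.stop()

summary = collector.get_summary()          # throughput, latencies, memory
ok, violations = bounds.validate(summary)  # assumed validation API
assert ok, f"Performance bounds violated: {violations}"
```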
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.
File | Description
---|---
tests/ducktape/benchmark_metrics.py | New comprehensive benchmark framework with MetricsCollector, MetricsBounds, and reporting utilities
tests/ducktape/test_producer.py | Enhanced all producer tests with integrated metrics collection and validation
tests/ducktape/README.md | Updated documentation to reflect new metrics capabilities and the additional psutil dependency
```python
# Use quantiles for P95, P99 (more accurate than custom implementation)
try:
    quantiles = statistics.quantiles(self.delivery_latencies, n=100)
```
Computing quantiles with n=100 for every summary is computationally expensive. Consider using a more efficient approach like numpy.percentile or caching the sorted data.
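To make the suggestion concrete, here is a sketch comparing the two approaches; `latencies` is a stand-in for `self.delivery_latencies`:

```python
import random
import statistics
import numpy as np

latencies = [random.uniform(0.5, 50.0) for _ in range(10_000)]  # stand-in data

# Current approach: statistics.quantiles with n=100 returns 99 cut points and
# re-sorts the data internally on every call.
cuts = statistics.quantiles(latencies, n=100)
p95, p99 = cuts[94], cuts[98]

# Suggested alternative: numpy.percentile computes just the needed percentiles
# in one vectorized pass.
p95_np, p99_np = np.percentile(latencies, [95, 99])
```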
Minor comments. I was debating whether we should use something like locust for this.. might be worth switching to it down the road, but you kind of have to hack it to do any non-RESTful patterns for testing, e.g. https://github.com/SvenskaSpel/locust-plugins/blob/master/examples/kafka_ex.py
```python
except (psutil.NoSuchProcess, psutil.AccessDenied, psutil.ZombieProcess):
    # Handle edge cases where process might not exist or be accessible
    return None
except Exception:
    return None
```
I would not catch a generic Exception here; just let it bubble up to be remediated
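A sketch of that suggestion in context (the function name and surrounding code are assumed, not taken from the PR): catch only the psutil-specific errors and let everything else propagate.

```python
import psutil

def sample_rss_mb(pid: int):
    """Return the process's resident memory in MB, or None if it is gone."""
    try:
        return psutil.Process(pid).memory_info().rss / (1024 * 1024)
    except (psutil.NoSuchProcess, psutil.AccessDenied, psutil.ZombieProcess):
        # Expected edge cases: process exited or is not accessible.
        return None
    # No bare `except Exception`: anything unexpected should bubble up so it
    # can be remediated instead of being silently swallowed.
```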
```python
class MetricsBounds:
```
Maybe add a TODO: load from config file?
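A possible shape for that TODO, sketched with assumed field names (the PR's actual bounds fields may differ):

```python
import json
from dataclasses import dataclass, fields

@dataclass
class MetricsBounds:
    # Field names are illustrative assumptions, not the PR's actual bounds.
    min_throughput_msgs_per_sec: float = 1000.0
    max_p99_latency_ms: float = 500.0
    max_memory_growth_mb: float = 100.0

    @classmethod
    def from_config(cls, path: str) -> "MetricsBounds":
        """Override the defaults with any keys present in a JSON config file."""
        with open(path) as f:
            data = json.load(f)
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})
```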
```python
    latency_ms = (time.time() - send_times[msg_key]) * 1000
    del send_times[msg_key]  # Clean up
else:
    latency_ms = 5.0  # Default latency if timing info not available
```
maybe better to just set to 0 or None
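A sketch of the suggested change, under the assumption that the summary statistics simply skip missing samples (the helper name is hypothetical):

```python
import time

def record_delivery_latency(msg_key, send_times, delivery_latencies):
    """Record one latency sample; skip messages with no recorded send time."""
    send_ts = send_times.pop(msg_key, None)  # pop doubles as the cleanup
    if send_ts is None:
        # No timing info: record nothing rather than a made-up 5 ms value,
        # so the percentiles reflect only real measurements.
        return
    delivery_latencies.append((time.time() - send_ts) * 1000)
```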
Let's touch up the small things and get this merged, then iterate / change things if we want later. I want to get this into the history so we can build abstractions on top for simpler test definitions and swap the implementation details as needed / remove conflicts on future PRs
What

Key Features:

Metrics Collected:
- Latency percentiles (P95, P99) via statistics.quantiles()
- Memory usage via psutil

Files Added:
- tests/ducktape/benchmark_metrics.py - Complete benchmark framework

Files Modified:
- tests/ducktape/test_producer.py - Enhanced all tests with integrated metrics
- tests/ducktape/README.md - Updated documentation

Checklist

References

Test & Review

```bash
# Run enhanced ducktape tests with integrated benchmarks
./tests/ducktape/run_ducktape_test.py
```