-
Notifications
You must be signed in to change notification settings - Fork 430
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
chore(sampling): change trace sampling formula #12950
base: main
Are you sure you want to change the base?
Conversation
|
Bootstrap import analysisComparison of import times between this PR and base. SummaryThe average import time from this PR is: 229 ± 2 ms. The average import time from base is: 231 ± 2 ms. The import time difference between this PR and base is: -2.39 ± 0.09 ms. Import time breakdownThe following import paths have shrunk:
|
9b1b71b
to
4425185
Compare
BenchmarksBenchmark execution time: 2025-04-11 14:38:34 Comparing candidate commit 9ef48b7 in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 496 metrics, 2 unstable metrics. |
4425185
to
7d29178
Compare
5e4379f
to
6a0d361
Compare
ab957c4
to
70c37c8
Compare
f55a39e
to
1479752
Compare
remove default USER_KEEP from http extractor
6ded685
to
3d56e51
Compare
8f146c7
to
9ef48b7
Compare
@@ -281,6 +283,8 @@ def _set_sampling_tags(span, sampled, sample_rate, mechanism): | |||
# Set the sampling priority | |||
priorities = SAMPLING_MECHANISM_TO_PRIORITIES[mechanism] | |||
priority_index = _KEEP_PRIORITY_INDEX if sampled else _REJECT_PRIORITY_INDEX | |||
|
|||
span.set_metric(_SAMPLING_PRIORITY_KEY, priorities[priority_index]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to set the sampling priority on the span directly when we do it on the context below, which eventually ends up being applied to the span? https://github.com/DataDog/dd-trace-py/blob/main/ddtrace/_trace/processor/__init__.py#L217
@@ -310,7 +310,10 @@ def _extract(headers): | |||
headers, | |||
default="0", | |||
) | |||
sampling_priority = _extract_header_value(POSSIBLE_HTTP_HEADER_SAMPLING_PRIORITIES, headers, default=USER_KEEP) # type: ignore[arg-type] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this seems like it probably shouldn't be with the behavior. Good catch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job! Just one question!
This PR changes the formulas used to sample traces & spans in order to have a consistent one across languages.
Two formulas were used:
((trace_id * KNUTH_FACTOR) % 2^64 -1) <= sampling_rate * (2^64 -1)
inddtrace/_trace/sampler.py
&ddtrace/_trace/sampling_rule.py
((trace_id * KNUTH_FACTOR) % 2^64) <= sampling_rate * (2^64)
inddtrace/internal/sampling.py
Both have been changed to
((trace_id * KNUTH_FACTOR) % 2^64) <= sampling_rate * (2^64 -1)
There was an hardcoded sampling decision in the http header extractor whenever we were receiving a request with a non-empty trace-id preventing us from applying our own sampling logic.
This PR will allow us to enable the sampling rates system tests.
Checklist
Reviewer Checklist