-
Notifications
You must be signed in to change notification settings - Fork 130
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Support traceId-based r-values #417
Conversation
...ent-sampling/src/main/java/io/opentelemetry/contrib/samplers/ConsistentAlwaysOffSampler.java
Outdated
Show resolved
Hide resolved
consistent-sampling/src/main/java/io/opentelemetry/contrib/samplers/ConsistentSampler.java
Outdated
Show resolved
Hide resolved
* Returns a {@link ConsistentSampler} that samples each span with a fixed probability. | ||
* | ||
* @param samplingProbability the sampling probability | ||
* @param rValueGenerator the function to use for generating the r-value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably needs some more documentation. The rValueGenerator
must map a trace ID to a value in range [0,62] with probabilities as specified in https://opentelemetry.io/docs/reference/specification/trace/tracestate-probability-sampling/#appendix. I would prefer a new functional interface instead of ToIntFunction<String>
where all this can be documented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. A new interface could be also used to define the default behavior "s -> RandomGenerator.getDefault().numberOfLeadingZerosOfRandomLong()" and avoid repeating this code phrase in multiple places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great ideas, implemented. I put the default behavior in a separate class RValueGenerators
to keep it package-private.
@@ -212,8 +240,7 @@ public final SamplingResult shouldSample( | |||
|
|||
// generate new r-value if not available | |||
if (!otelTraceState.hasValidR()) { | |||
otelTraceState.setR( | |||
Math.min(randomGenerator.numberOfLeadingZerosOfRandomLong(), OtelTraceState.getMaxR())); | |||
otelTraceState.setR(Math.min(rValueGenerator.applyAsInt(traceId), OtelTraceState.getMaxR())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd love to see this approach made more general with rValueGenerator taking not only traceId, but also the Attributes as arguments.
The use case for this is to assign the same rValue to a group of traces, so they are likely to survive sampling as an intact group - while still ensuring the correct statistical distribution of the rValues over the whole trace population.
* <tr><td>59</td><td>2**-60</td></tr> | ||
* <tr><td>60</td><td>2**-61</td></tr> | ||
* <tr><td>61</td><td>2**-62</td></tr> | ||
* <tr><td>62</td><td>2**-62</td></tr> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the last case should include all values >= 62
, because the ConsistentSampler
treats larger values than 62 like 62 anyway. Otherwise, randomGenerator.numberOfLeadingZerosOfRandomLong()
would not conform to this definition and would have to be replaced everywhere by Math.min(62, randomGenerator.numberOfLeadingZerosOfRandomLong())
.
hey @oertl! I'm interested in using ConsistentProbabilityBasedSampler and ConsistentRateLimitingSampler, but I'm currently limited by backwards compatibility with an existing ecosystem which requires sampling to be based on a specific hash of the traceId (where lower hash values are more likely to be sampled-in).
I realize that I can't get true parity with any traceId-based ratio sampler, since the consistent samplers have some randomization in the decision making when the probability is not a power of 1/2, but I think I can live with that variance.
This PR explores an option to allow this kind of traceId-based decision when constructing the r-value.
I realize it's not an ideal end-state, but I think it could be a good bridge for us (and maybe others?) to the new consistent samplers.