Support traceId-based r-values #417

trask · 2022-08-12T01:28:24Z

hey @oertl! I'm interested in using ConsistentProbabilityBasedSampler and ConsistentRateLimitingSampler, but I'm currently limited by backwards compatibility with an existing ecosystem which requires sampling to be based on a specific hash of the traceId (where lower hash values are more likely to be sampled-in).

I realize that I can't get true parity with any traceId-based ratio sampler, since the consistent samplers have some randomization in the decision making when the probability is not a power of 1/2, but I think I can live with that variance.

This PR explores an option to allow this kind of traceId-based decision when constructing the r-value.

I realize it's not an ideal end-state, but I think it could be a good bridge for us (and maybe others?) to the new consistent samplers.

...ent-sampling/src/main/java/io/opentelemetry/contrib/samplers/ConsistentAlwaysOffSampler.java

consistent-sampling/src/main/java/io/opentelemetry/contrib/samplers/ConsistentSampler.java

oertl · 2022-08-12T05:10:39Z

consistent-sampling/src/main/java/io/opentelemetry/contrib/samplers/ConsistentSampler.java

+   * Returns a {@link ConsistentSampler} that samples each span with a fixed probability.
+   *
+   * @param samplingProbability the sampling probability
+   * @param rValueGenerator the function to use for generating the r-value


Probably needs some more documentation. The rValueGenerator must map a trace ID to a value in the range [0, 62] with the probabilities specified in https://opentelemetry.io/docs/reference/specification/trace/tracestate-probability-sampling/#appendix. I would prefer a new functional interface instead of ToIntFunction<String>, where all of this can be documented.

I agree. A new interface could also be used to define the default behavior "s -> RandomGenerator.getDefault().numberOfLeadingZerosOfRandomLong()" and avoid repeating this expression in multiple places.

Great ideas, implemented. I put the default behavior in a separate class RValueGenerators to keep it package-private.
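
To illustrate the contract discussed above, here is a minimal sketch of a traceId-based generator. The names `TraceIdRValueFunction` and `TraceIdRValueSketch` are hypothetical and are not the interface or class added by this PR. The SplitMix64 finalizer is only a stand-in for whatever ecosystem-specific hash is actually required, and the sketch assumes a 32-character hex trace ID whose lower 64 bits are uniformly distributed.

```java
/** Hypothetical stand-in for the functional interface discussed in this thread. */
@FunctionalInterface
interface TraceIdRValueFunction {
  /** Maps a trace ID to an r-value in the range [0, 62]. */
  int generate(String traceId);
}

final class TraceIdRValueSketch {
  // Largest r-value; larger values are treated like 62.
  static final int MAX_R = 62;

  private TraceIdRValueSketch() {}

  // SplitMix64 finalizer (a bijection on 64-bit values), used here only as a
  // stand-in hash to spread the bits of the trace ID.
  static long mix(long z) {
    z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
    z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
    return z ^ (z >>> 31);
  }

  // Assumption: traceId is 32 lowercase hex characters; only the lower 64 bits are hashed.
  static int fromTraceId(String traceId) {
    long low = Long.parseUnsignedLong(traceId.substring(16), 16);
    // Lower hash values have more leading zeros, hence a higher r-value,
    // which makes the trace more likely to be sampled.
    return Math.min(Long.numberOfLeadingZeros(mix(low)), MAX_R);
  }

  static final TraceIdRValueFunction TRACE_ID_BASED = TraceIdRValueSketch::fromTraceId;
}
```

Because the r-value depends only on the trace ID, every participant that applies the same function to the same trace derives the same r-value without coordination.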

PeterF778 · 2022-08-12T17:11:27Z

consistent-sampling/src/main/java/io/opentelemetry/contrib/samplers/ConsistentSampler.java

@@ -212,8 +240,7 @@ public final SamplingResult shouldSample(

    // generate new r-value if not available
    if (!otelTraceState.hasValidR()) {
-      otelTraceState.setR(
-          Math.min(randomGenerator.numberOfLeadingZerosOfRandomLong(), OtelTraceState.getMaxR()));
+      otelTraceState.setR(Math.min(rValueGenerator.applyAsInt(traceId), OtelTraceState.getMaxR()));


I'd love to see this approach made more general with rValueGenerator taking not only traceId, but also the Attributes as arguments.
The use case for this is to assign the same rValue to a group of traces, so they are likely to survive sampling as an intact group - while still ensuring the correct statistical distribution of the rValues over the whole trace population.
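
A rough sketch of what such a group-aware generator might look like follows. Everything here is hypothetical: `GroupRValueSketch`, the `groupKey` parameter, and the use of a plain `Map<String, String>` in place of the OpenTelemetry `Attributes` type are assumptions made to keep the example self-contained. The r-values follow the required distribution only to the extent that the hash is uniform across many distinct groups.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

final class GroupRValueSketch {
  static final int MAX_R = 62;

  private GroupRValueSketch() {}

  // FNV-1a (64-bit) followed by the SplitMix64 finalizer; a stand-in hash, not a recommendation.
  static long hash64(String s) {
    long h = 0xcbf29ce484222325L;
    for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
      h ^= (b & 0xff);
      h *= 0x100000001b3L;
    }
    h = (h ^ (h >>> 30)) * 0xbf58476d1ce4e5b9L;
    h = (h ^ (h >>> 27)) * 0x94d049bb133111ebL;
    return h ^ (h >>> 31);
  }

  // Uses the value of the (hypothetical) group attribute as the hash input when present,
  // so all traces in one group share an r-value; falls back to the trace ID otherwise.
  static int generate(String traceId, Map<String, String> attributes, String groupKey) {
    String group = attributes.get(groupKey);
    String seed = (group != null) ? group : traceId;
    return Math.min(Long.numberOfLeadingZeros(hash64(seed)), MAX_R);
  }
}
```

With this shape, two traces carrying the same group attribute value get the same r-value, so any consistent sampler either keeps both or drops both at a given probability.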

oertl · 2022-08-13T04:05:10Z

consistent-sampling/src/main/java/io/opentelemetry/contrib/samplers/RValueGenerator.java

+ *   <tr><td>59</td><td>2**-60</td></tr>
+ *   <tr><td>60</td><td>2**-61</td></tr>
+ *   <tr><td>61</td><td>2**-62</td></tr>
+ *   <tr><td>62</td><td>2**-62</td></tr>


I think the last case should include all values >= 62, because the ConsistentSampler treats values larger than 62 as 62 anyway. Otherwise, randomGenerator.numberOfLeadingZerosOfRandomLong() would not conform to this definition and would have to be replaced everywhere by Math.min(62, randomGenerator.numberOfLeadingZerosOfRandomLong()).
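
For reference, the distribution implied by the table rows above, with the last case covering all values >= 62, can be written out as follows. This assumes the r-value is the number of leading zeros of a uniformly random 64-bit value:

```latex
P(r = k) = 2^{-(k+1)} \quad \text{for } 0 \le k \le 61,
\qquad
P(r \ge 62) = 2^{-63} + 2^{-64} + 2^{-64} = 2^{-62},
\qquad
\sum_{k=0}^{61} 2^{-(k+1)} + 2^{-62} = \left(1 - 2^{-62}\right) + 2^{-62} = 1 .
```

The three terms in the middle expression correspond to 62, 63, and 64 leading zeros; capping with Math.min folds them into the single table row for 62.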

Support traceId-based r-values

a0a57e4

github-actions bot assigned oertl and PeterF778 Aug 12, 2022

github-actions bot requested review from oertl and PeterF778 August 12, 2022 01:28

trask commented Aug 12, 2022

...ent-sampling/src/main/java/io/opentelemetry/contrib/samplers/ConsistentAlwaysOffSampler.java (outdated, resolved)

oertl reviewed Aug 12, 2022

PeterF778 reviewed Aug 12, 2022

trask added 5 commits August 12, 2022 13:58

Interface

d0d7dbd

Fix javadoc

49d8714

Spotless

5259ee6

More

3302ee2

Clean

71c5a32

trask marked this pull request as ready for review August 12, 2022 22:22

trask requested a review from a team August 12, 2022 22:22

oertl reviewed Aug 16, 2022

Review

a5b569c

oertl approved these changes Aug 18, 2022

PeterF778 approved these changes Aug 18, 2022

mateuszrzeszutek approved these changes Aug 19, 2022

trask merged commit 71cac47 into open-telemetry:main Aug 19, 2022

trask deleted the trace-id-based-r-values branch August 19, 2022 17:42
