Brave 5.13
Brave 5.13 makes it safer to try emerging trace header formats.
(Http|Messaging|Rpc)Tracing.propagation()
This is an advanced topic about how propagation (e.g. which headers are sent or received) works. In summary, it is now easier to have one library, e.g. gRPC, accept a different format than another.
Brave 4 was released almost 4 years ago. Not only did Brave 4 support multiple instances of differently configured tracers, it allowed each to configure Propagation differently. For example, we built in support for different B3 formats, including the more efficient single-header variant. For years, sites could use alternate formats such as AWS and Stackdriver, and emerging formats like W3C. Typically the bespoke formats are attempted first, and if there is any problem we fall back to B3.
Recently, we've learned some sites are being pushed into a less efficient and more complex W3C trace context format. Reasons include affinity for something called a standard, and defaults in some libraries. For example, OpenTelemetry includes B3 propagation, but they chose to disable it by default. This choice isn't uniform in OpenTelemetry: other distributions, such as Amazon's, default to interop with B3.
Before, most would respond to this by changing the tracer-scoped propagation format, or by mechanically converting "traceparent" to "b3". However, there's a problem with assigning Tracing.propagation(). It carries any penalty of performance and instability to all communication. This is too broad, as B3 is a de facto standard. Only certain upstream and downstream services would disable it entirely. Before, we didn't have a way for users to choose what to do except on a per-tracer basis.
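For readers unfamiliar with the mechanical conversion of "traceparent" to "b3" mentioned above, here is a minimal sketch. It assumes the version "00" traceparent layout (version-traceId-parentId-flags) and the b3 single layout (traceId-spanId-sampled). The class and method names are hypothetical and not part of Brave. It also drops "tracestate" and maps an unset sampled flag to "0", which is a simplification.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; this is not a Brave API. */
final class TraceparentToB3 {
  // Assumed layout for version "00": version-traceId-parentId-flags, lowercase hex.
  private static final Pattern TRACEPARENT =
      Pattern.compile("^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$");

  private TraceparentToB3() {}

  /** Returns a b3 single-header value, or null when the input is malformed. */
  static String toB3Single(String traceparent) {
    if (traceparent == null) return null;
    Matcher m = TRACEPARENT.matcher(traceparent);
    if (!m.matches()) return null;

    String traceId = m.group(1);
    String spanId = m.group(2); // traceparent's "parent-id" is the caller's span ID

    // All-zero IDs are invalid, so treat them as malformed.
    if (traceId.matches("0+") || spanId.matches("0+")) return null;

    // Bit 0 of the flags byte is the sampled flag.
    int flags = Integer.parseInt(m.group(3), 16);
    char sampled = (flags & 1) == 1 ? '1' : '0';

    return traceId + "-" + spanId + "-" + sampled;
  }
}
```

Even this small conversion has to decide what to do with malformed input and with the state it throws away, which hints at why applying such a change to all communication is too broad.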
Now, you can isolate unstable or inefficient formats to only libraries that need them.
For example, you can choose to use W3C trace-context, but only for a specific gRPC client:
grpcTracing = GrpcTracing.create(rpcTracing.toBuilder().propagation(traceContextPropagation).build());
channel = ManagedChannelBuilder.forAddress("something_that_only_talks_w3c", serverPort)
.intercept(grpcTracing.newClientInterceptor())
...
While this example is about gRPC, it hints that you can change any library or the entire RPC subsystem while leaving everything else alone. You can also use this approach to disable baggage. As these concerns are uniform, we added them to all our major abstractions: HttpTracing, MessagingTracing and RpcTracing.
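To make the scoping idea concrete, here is a small model of it in plain Java. Every type below is hypothetical and deliberately simplified; none of them are Brave classes. The sketch only shows the shape of the behaviour described above: a subsystem uses its own propagation format when one is set, and otherwise inherits the tracer-wide one.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of propagation scoping; these are not Brave's types. */
final class ScopedPropagationSketch {
  private ScopedPropagationSketch() {}

  /** A propagation format, reduced here to the header names it writes. */
  static final class Format {
    final List<String> headerNames;

    Format(String... headerNames) {
      this.headerNames = Arrays.asList(headerNames);
    }
  }

  /** Tracer-wide default, similar in spirit to a tracer-scoped propagation. */
  static final class TracerConfig {
    final Format propagation;

    TracerConfig(Format propagation) {
      this.propagation = propagation;
    }
  }

  /** Per-subsystem config (think HTTP, messaging or RPC). */
  static final class SubsystemConfig {
    private final TracerConfig tracer;
    private final Format override; // null means "inherit from the tracer"

    SubsystemConfig(TracerConfig tracer, Format override) {
      this.tracer = tracer;
      this.override = override;
    }

    /** Returns the override when present, else the tracer-wide format. */
    Format propagation() {
      return override != null ? override : tracer.propagation;
    }
  }
}
```

In this model, a tracer configured with a "b3" format and an HTTP subsystem with no override keeps sending "b3", while one RPC client configured with a "traceparent" and "tracestate" format uses those headers alone. Nothing else in the process is affected.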
Thanks very much to @dimi-nk, who helped us identify problems they face with header diversity. While imperfect, we hope this helps, and we will continue working to reduce pain in tracing.
Brave no longer imports Maven Bill of Materials (BOM)
End users can opt in to io.zipkin.brave:brave-bom to pin our versions, but we will no longer use tools like BOMs for internal convenience.
Our parent project formerly imported netty-bom for our convenience. This allowed our tests to not download several similar versions of netty. However, this leaked a detail to those using Brave's core library. Simply depending on io.zipkin.brave:brave would download that BOM. Even if it didn't impact anything, it caused confusion as to why an unrelated library's file was being downloaded. In summary, our internal convenience should not cause confusion for others. Hence, all core libraries no longer import BOMs, and neither do their parents, transitively.
The build is more resilient and faster
We had numerous problems due to rate limiting, and in some cases the CI service shut off completely. A few top-level changes led by @adriancole allowed the project to resume functioning from a test and deployment perspective.
- The build now uses GitHub Actions workflows
- The build now publishes to Sonatype directly instead of intermediating through a service
- The build no longer depends on Docker Hub (docker.io) images, as that can trigger rate limits for us or forks
Smaller updates
- @adriancole fixed "grpc-trace-bin" aka census propagation
- @anuraaga added "leaked all the way until GC" to StrictScopeDecorator
- @rgamez fixed a problem in p6spy that constrained zipkinServiceName to a smaller character set than it should allow
- @m50d fixed a problem where setting Kafka headers marked read-only could crash a request (raise an exception)
Sidebar on propagation isolation
Despite all our work, life in tracing is becoming more difficult now. For example, the main distribution of OpenTelemetry chose to only propagate their W3C trace-context format. In other words, they disable B3 by default. Not only is this format less efficient than b3 single, it is more complicated. Most tools don't implement the tracestate part. The most common practice is to blindly copy an unvalidated string into it. Lack of validation in a primary trace context field means easy bugs that can propagate across the network. Receivers have to expect and handle more malformed use cases due to the flexibility allowed in the spec and practices such as these.
In practice, this means relying on a fast-moving library, almost always <1.0. Use of an unstable format and use of an unstable library are two problems, not one, and they have different implications. For example, if anything <1.0 is used in tracing, it should be re-packaged with tools like "maven-shade-plugin" in order to eliminate compatibility and upgrade problems.
Isolating entry points into these unstable areas of code and communication is the safest way out. This is why we broke our already flexible propagation system into parts, so that users can isolate unstable headers to only where they are used.