Releases: openzipkin/brave
Brave 5.14.1
Brave 5.13.11
Brave 5.13.10
What's Changed
Full Changelog: 5.13.9...5.13.10
Brave 5.13.9
Brave 5.13.8
Brave 5.13.7
Brave 5.13.6
Brave 5.13.5
What's Changed
- Update log4j to 2.17 to address CVE-2021-45105. by @loganaden in #1319
New Contributors
- @loganaden made their first contribution in #1319
Full Changelog: 5.13.4...5.13.5
Brave 5.13
Brave 5.13 makes it safer to try emerging trace header formats.
(Http|Messaging|Rpc)Tracing.propagation()
This is an advanced topic about how propagation (e.g. which headers are sent or received) works. In summary, it is now easier to have one library, e.g. gRPC, accept a different format than another.
Brave 4 was released almost 4 years ago. Not only did Brave 4 support multiple instances of differently configured tracers, it allowed each to configure Propagation differently. For example, we built in support for different B3 formats, including the more efficient single header variant. For years, sites could use alternate formats such as AWS and Stackdriver, and emerging formats like W3C. Typically the bespoke formats are attempted first, and if there is any problem we fall back to B3.
Recently, we've learned some sites are being pushed into a less efficient and more complex W3C trace context format. This is caused by reasons including affinity for something called a standard, and defaults in some libraries. For example, OpenTelemetry includes B3 propagation, but they chose to disable it by default. This choice isn't uniform in OpenTelemetry: other distributions such as Amazon's default to interop with B3.
Before, most would try to change the tracer-scoped propagation format in response to this, or mechanically convert "traceparent" to "b3". However, there's a problem with assigning Tracing.propagation(). It carries any penalty of performance and instability to all communication. This is too broad, as B3 is a de-facto standard. Only certain upstream and downstream services would disable it entirely. Before, we didn't have a way for users to choose what to do except on a per-tracer basis.
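To make the "mechanical conversion" mentioned above concrete, here is a minimal sketch of mapping a W3C "traceparent" value onto the B3 single-header form. The class and method names are hypothetical and not part of Brave. It assumes only version "00" of traceparent is accepted, and it drops any "tracestate" entirely.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative helper only; this class is hypothetical and not part of Brave. */
final class TraceparentToB3 {
  // Assumes version "00": 32 hex trace ID, 16 hex parent (span) ID, 2 hex flags.
  private static final Pattern TRACEPARENT =
      Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");

  /** Returns the "b3" single-header value, or empty if the input is not usable. */
  static Optional<String> toB3Single(String traceparent) {
    if (traceparent == null) return Optional.empty();
    Matcher m = TRACEPARENT.matcher(traceparent.trim());
    if (!m.matches()) return Optional.empty();
    String traceId = m.group(1);
    String spanId = m.group(2);
    // W3C trace-context treats all-zero IDs as invalid.
    if (traceId.matches("0+") || spanId.matches("0+")) return Optional.empty();
    // Bit 0 of the flags byte is the sampled flag.
    boolean sampled = (Integer.parseInt(m.group(3), 16) & 1) == 1;
    return Optional.of(traceId + "-" + spanId + "-" + (sampled ? "1" : "0"));
  }

  public static void main(String[] args) {
    System.out.println(
        toB3Single("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01").orElse("invalid"));
  }
}
```

A conversion like this applies to every request the tracer handles, which is the breadth problem described above.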
Now, you can isolate unstable or inefficient formats to only libraries that need them.
For example, you can choose to use W3C trace-context, but only for a specific gRPC client:
grpcTracing = GrpcTracing.create(rpcTracing.toBuilder().propagation(traceContextPropagation).build());
channel = ManagedChannelBuilder.forAddress("something_that_only_talks_w3c", serverPort)
    .intercept(grpcTracing.newClientInterceptor())
    ...
While this example is about gRPC, it hints that you can change any library or the entire RPC subsystem while leaving everything else alone. You can also use this approach to disable baggage. As these concerns are uniform, we added them to all our major abstractions: HttpTracing, MessagingTracing and RpcTracing.
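The isolation idea can be modeled without any tracing library at all. The sketch below is a simplified stand-in, not Brave's API: HeaderFormat and LibraryTracing are hypothetical types, and the header values are abbreviated. It shows one library overriding the header format while another keeps the shared default.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified model of per-library propagation; all types here are hypothetical, not Brave's. */
final class PropagationIsolationSketch {
  /** Writes trace identifiers into outgoing headers using one header format. */
  interface HeaderFormat {
    void inject(String traceId, String spanId, Map<String, String> headers);
  }

  static final HeaderFormat B3_SINGLE =
      (traceId, spanId, headers) -> headers.put("b3", traceId + "-" + spanId + "-1");
  static final HeaderFormat W3C =
      (traceId, spanId, headers) ->
          headers.put("traceparent", "00-" + traceId + "-" + spanId + "-01");

  /** One instrumented library: uses its own format when set, else the shared default. */
  static final class LibraryTracing {
    private final HeaderFormat format;

    LibraryTracing(HeaderFormat sharedDefault, HeaderFormat override) {
      this.format = override != null ? override : sharedDefault;
    }

    Map<String, String> outgoingHeaders(String traceId, String spanId) {
      Map<String, String> headers = new LinkedHashMap<>();
      format.inject(traceId, spanId, headers);
      return headers;
    }
  }

  public static void main(String[] args) {
    LibraryTracing http = new LibraryTracing(B3_SINGLE, null); // keeps the shared default
    LibraryTracing grpc = new LibraryTracing(B3_SINGLE, W3C); // isolated override
    String traceId = "4bf92f3577b34da6a3ce929d0e0e4736";
    String spanId = "00f067aa0ba902b7";
    System.out.println(http.outgoingHeaders(traceId, spanId));
    System.out.println(grpc.outgoingHeaders(traceId, spanId));
  }
}
```

Only the library given an override emits "traceparent"; everything else continues to send "b3".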
Thanks very much to @dimi-nk, who helped us identify problems they face with header diversity. While imperfect, we hope this helps, and we will continue to work on reducing pain in tracing.
Brave no longer imports Maven Bill of Materials (BOM)
End users can opt in to io.zipkin.brave:brave-bom to pin our versions, but we will no longer use tools like BOMs for internal convenience.
Our parent project formerly imported netty-bom for our convenience. This allowed our tests to not download several similar versions of netty. However, this leaked a detail to those using Brave's core library. Simply depending on io.zipkin.brave:brave would download that BOM. Even if it didn't impact anything, it caused confusion as to why an unrelated library's file was being downloaded. In summary, our internal convenience should not cause confusion for others. Hence, core libraries no longer import BOMs, and neither do their parents transitively.
The build is more resilient and faster
We had numerous problems due to rate limiting, and in some cases a CI service shut off completely. A few top-level changes led by @adriancole allowed the project to resume functioning from a test and deployment perspective.
- The build now uses GitHub Action workflows
- The build now publishes to Sonatype directly instead of intermediating through a service.
- The build no longer depends on Docker Hub (docker.io) images as that can trigger rate limits for us or forks
Smaller updates
- @adriancole fixed "grpc-trace-bin" aka census propagation
- @anuraaga added "leaked all the way until GC" to StrictScopeDecorator
- @rgamez fixed a problem in p6spy that constrained zipkinServiceName to a smaller character set than it should allow
- @m50d fixed a problem where setting Kafka headers marked read-only could crash a request (raise an exception)
Sidebar on propagation isolation
Despite all our work, life in tracing is becoming more difficult. For example, the main distribution of OpenTelemetry chose to only propagate their W3C trace-context format. In other words, they disable B3 by default. Not only is this format less efficient than b3 single, it is more complicated. Most tools don't implement the tracestate part. The most common practice is to blindly copy an unvalidated string into it. Lack of validation in a primary trace context field means easy bugs that can propagate across the network. Receivers have to expect and handle more malformed use cases due to the flexibility allowed in the spec and practices such as these.
This means a fast-moving library, almost always <1.0. Use of an unstable format and an unstable library are two problems, not one, and with different implications. For example, if anything <1.0 is used in tracing, it should be re-packaged with tools like "maven-shade-plugin" in order to eliminate compatibility and upgrade problems.
Isolating entry-points into these unstable areas of code and communication is the safest way out. This is why we broke our already flexible propagation system into parts, so that users can isolate unstable headers to only where they are used.
Brave 5.12
Brave 5.12 introduces a powerful new way to handle data, completes our RPC abstraction, drops our Zipkin dependency and pours our thinking into RATIONALE docs.
There's a lot in this release for those doing advanced things like managing configuration tools or implementing custom tracing backends. Most users will do nothing except upgrade.
If you are using Brave directly, you should take note of the deprecations mentioned. We do a major release every couple of years to remove deprecated code, and Brave 6 will do the same. By paying attention, not only will your code work faster, but you'll have fewer surprises later.
Like all releases, volunteers bore a huge responsibility on this release. As so much happened here, it was quite a load. Please reach out and thank those who contributed, star our repo or say hi on gitter. If you have ideas, we'd love to hear about them, too.
On to the main show!
Introducing SpanHandler
Brave 5.12 has a cleaner integration for data than ever before. SpanHandler replaces FinishedSpanHandler. SpanHandler can do everything FinishedSpanHandler did: redacting, adding tags based on baggage, remapping trace IDs, sending to multiple systems etc.
The more advanced begin hook adds much more power. You can set up default baggage only on local roots, add correlated mapped data extensions, or perform aggregations such as child counts.
This is our most powerful API, co-designed by @anuraaga and with lots of good feedback from our usual suspects @jeqo and @jorgheymans. For now, you can just replace FinishedSpanHandler with SpanHandler, but if you are curious, here are a few links of interest:
See https://github.com/openzipkin/brave/blob/master/brave/src/main/java/brave/handler/SpanHandler.java
See https://github.com/openzipkin/brave/tree/master/brave/src/test/java/brave/features/handler
See https://github.com/openzipkin/zipkin-reporter-java/tree/master/brave
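To give a feel for what a begin hook makes possible, here is a library-free sketch of the child-count aggregation mentioned above. It is a simplified model rather than Brave's SpanHandler: SpanData, Handler and the "childCount" tag name are all hypothetical, and it assumes children begin before their parent ends.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified model of begin/end hooks; these types are hypothetical, not Brave's. */
final class ChildCountSketch {
  /** Stand-in for the mutable data recorded for one span. */
  static final class SpanData {
    final String id;
    final String parentId; // null for a local root
    final Map<String, String> tags = new LinkedHashMap<>();

    SpanData(String id, String parentId) {
      this.id = id;
      this.parentId = parentId;
    }
  }

  /** Two hooks: one when a span is created, one when it is finished. */
  interface Handler {
    void begin(SpanData span);

    void end(SpanData span);
  }

  /** Counts direct children at begin and tags the total on the parent at end. */
  static final class ChildCounter implements Handler {
    private final Map<String, Integer> childCounts = new HashMap<>(); // not thread-safe

    @Override public void begin(SpanData span) {
      if (span.parentId != null) childCounts.merge(span.parentId, 1, Integer::sum);
    }

    @Override public void end(SpanData span) {
      Integer count = childCounts.remove(span.id);
      span.tags.put("childCount", String.valueOf(count == null ? 0 : count));
    }
  }

  public static void main(String[] args) {
    ChildCounter handler = new ChildCounter();
    SpanData root = new SpanData("a", null);
    SpanData child1 = new SpanData("b", "a");
    SpanData child2 = new SpanData("c", "a");
    handler.begin(root);
    handler.begin(child1);
    handler.begin(child2);
    handler.end(child1);
    handler.end(child2);
    handler.end(root);
    System.out.println(root.tags);
  }
}
```

A handler that only sees finished spans cannot do this as directly, because the parent has no record of its children when it ends.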
MutableSpan can do everything now
MutableSpan was initially a response to complaints that immutable conversions added GC pressure and generally weren't a good choice for telemetry.
Before, we paired TraceContext with MutableSpan, splitting responsibilities. However, this would make things like natively writing JSON from Zipkin types difficult. Hence, we fully fleshed out MutableSpan so that it accompanies, but is decoupled from, TraceContext.
Here are some features newly available with much thanks to @anuraaga for a month of help on them!
- MutableSpanBytesEncoder - allows you to write MutableSpan directly to JSON without any dependencies or intermediating through another type such as zipkin2.Span.
- MutableSpan.xxxId() - allows you to read or remap all IDs including trace IDs, depending on your output
- MutableSpan.annotations(), tags() - read-only immutable collection views for convenience of those not concerned with performance (internally implemented as arrays)
- MutableSpan.annotationCount(), tagCount() xxxValueAt(index) - allocation free tools to write data conversions as for loops.
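The count-plus-index style in the last item can be sketched as follows. This is a stand-in model, not MutableSpan itself: the Tags class is hypothetical, tagKeyAt is an assumed counterpart to the xxxValueAt(index) accessors named above, and JSON escaping is left out for brevity.

```java
/** Simplified model of indexed tag access; Tags is a hypothetical stand-in, not MutableSpan. */
final class TagLoopSketch {
  /** Tags stored as alternating key/value entries in a single array. */
  static final class Tags {
    private final String[] keyValues;

    Tags(String... keyValues) {
      if (keyValues.length % 2 != 0) throw new IllegalArgumentException("odd key/value count");
      this.keyValues = keyValues.clone();
    }

    int tagCount() {
      return keyValues.length / 2;
    }

    String tagKeyAt(int i) {
      return keyValues[i * 2];
    }

    String tagValueAt(int i) {
      return keyValues[i * 2 + 1];
    }
  }

  /** Appends tags as a JSON object using an indexed loop: no iterator or map view is created. */
  static void writeTagsJson(Tags tags, StringBuilder out) {
    out.append('{');
    for (int i = 0, count = tags.tagCount(); i < count; i++) {
      if (i > 0) out.append(',');
      // Assumption: keys and values need no escaping; a real encoder must escape them.
      out.append('"').append(tags.tagKeyAt(i)).append("\":\"")
          .append(tags.tagValueAt(i)).append('"');
    }
    out.append('}');
  }

  public static void main(String[] args) {
    StringBuilder out = new StringBuilder();
    writeTagsJson(new Tags("http.method", "GET", "http.path", "/api"), out);
    System.out.println(out);
  }
}
```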
RPC abstraction is now complete!
We started an RPC abstraction about 9 months ago. Last October, we added RPC sampling support in Brave 5.8.
With a lot of thanks to our contributors @devinsba @jeqo @jcchavezs and especially weeks of effort by volunteer @anuraaga, we have a complete product. Those using gRPC or Dubbo can now uniformly sample and parse based on RPC metadata.
By default, the following are added to both RPC client and server spans:
- Span.name is the RPC service/method. Ex. "zipkin.proto3.SpanService/Report"
  - If the service is absent, the method is the name and vice versa.
- Tags:
  - "rpc.method", eg "Report"
  - "rpc.service", eg "zipkin.proto3.SpanService"
  - "rpc.error_code", eg "CANCELLED"
  - "error" the RPC error code if there is no exception
- Remote IP and port information
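The span naming rule in the list above can be written out as a small function. This is a sketch of the rule as described, not Brave's parser code: the class and method names are hypothetical, and returning null when neither value is known is an assumption.

```java
/** Illustrative only; this helper is hypothetical and not Brave's RPC parser. */
final class RpcSpanNameSketch {
  /** Builds "service/method", falling back to whichever part is present. */
  static String spanName(String service, String method) {
    boolean hasService = service != null && !service.isEmpty();
    boolean hasMethod = method != null && !method.isEmpty();
    if (hasService && hasMethod) return service + "/" + method;
    if (hasService) return service;
    if (hasMethod) return method;
    return null; // assumption: leave the name unset when neither is known
  }

  public static void main(String[] args) {
    System.out.println(spanName("zipkin.proto3.SpanService", "Report"));
    System.out.println(spanName(null, "Report"));
    System.out.println(spanName("zipkin.proto3.SpanService", null));
  }
}
```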
Users familiar with how HTTP works will love the familiarity. The APIs are similar and exactly the same features are supported, whether that's sampling, baggage, you name it. Those curious about our decision making process can have a look at the RATIONALE, as we tried our best to make sound decisions and be transparent about them. Enjoy!
Zipkin dependency is dropped!
With the SpanHandler type finalized, we have deprecated support for zipkin2.Reporter<zipkin2.Span> in Brave and removed dependencies on Zipkin libraries.
This isn't to deprecate Zipkin support, of course, just move the responsibility to the zipkin-reporter-brave project (even [XML beans](https://github.com/openzipkin/zipkin-reporter-java/tree/master/spring-beans) for those who need it!)
The end result is cleaner integrations for the various SaaS offerings who use Brave, but don't use Zipkin. Such use cases should be directly implemented as SpanHandler now, with no need to route through Zipkin format.
Zipkin users should simply replace AsyncReporter with AsyncZipkinSpanHandler to adjust, similar to what's in our README:
// Configure a reporter, which controls how often spans are sent
// (this dependency is io.zipkin.reporter2:zipkin-sender-okhttp3)
sender = OkHttpSender.create("http://127.0.0.1:9411/api/v2/spans");
// (this dependency is io.zipkin.reporter2:zipkin-reporter-brave)
zipkinSpanHandler = AsyncZipkinSpanHandler.create(sender);
tracing = Tracing.newBuilder()
    .addSpanHandler(zipkinSpanHandler)
    ...
Test infrastructure overhaul
As we no longer have a Zipkin dependency, we decided to make tools to help common unit and integration tests. For example, vendors integrating with Brave should be able to assert on the data produced. Third party libraries should be able to avoid common bugs. Beyond our normal ITHttpServer and similar tests, we've extracted the following in the brave-tests package:
- ITRemote - configures the most common test fixtures for multi-threaded integration tests
- TestSpanHandler - allows simple assertions for unit tests
- IntegrationTestSpanHandler - blocking span reporter for remote multi-threaded unit tests.
Rationale
We have updated and added many RATIONALE files, including the below, to better help people understand our thinking, and to help us remember it!
Thanks to @jorgheymans @jeqo @jcchavezs @anuraaga and @NersesAM for the help adding content and reviewing.
brave
brave-instrumentation
brave-instrumentation-dubbo
brave-instrumentation-http
brave-instrumentation-grpc
brave-instrumentation-kafka-streams
brave-instrumentation-rpc
Other Notable Changes
Updates
- Kafka 2.5 is now supported, thanks to @jeqo
Behavior
- One-way RPC span modeling should no longer use span.start().flush() on one host and span.finish() (without start) on the other. This was implemented inconsistently and was not very compatible with most clones.
Additions
- Tracing.Builder.clearSpanHandlers(), spanHandlers() - allows TracingCustomizer instances to re-order or prune span handlers. For example, to ensure Zipkin is last, or theirs is first.
- Tracing.Builder.alwaysSampleLocal() - special hook for metrics aggregation and secondary-sampling that says the backend should always see recorded spans even if they weren't sampled in headers
Deprecations
- Tracer.propagationFactory() is deprecated for the existing Tracer.propagation() as we no longer rely on non-string keys (these were only used by gRPC and we changed to hide this conversion).
- brave.ErrorParser is deprecated as it was only used for Zipkin conversion. You can optionally specify Tag<Throwable> to affect the default "error" tag in zipkin-reporter-brave