Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Selective Http Trace and Baggage Propagation #1853

Open
martinjt opened this issue May 31, 2024 · 8 comments
Open

Selective Http Trace and Baggage Propagation #1853

martinjt opened this issue May 31, 2024 · 8 comments

Comments

@martinjt
Copy link
Member

Looking for some feedback on a change to the HttpClient instrumentation package I've been playing with.

Ultimately, I'm trying to get to a point where you can choose which HttpClient's will have trace/baggage propagation and which won't. With the ultimate goal of stopping trace/baggage leakage to domains you don't control.

I originally thought about domain allow/deny lists, but I'm not sure that's the right approach. So then I pivoted to adding a handler instead.

Right now, TracePropagation happens in 2 places.
The DiagnosticHandler added by the runtime
The HttpClient instrumentation package listening to a Diagnostic source that is written to by 1

You can disable 1 by using the following 2 switches:

AppContext.SetSwitch("DOTNET_SYSTEM_NET_HTTP_ENABLEACTIVITYPROPAGATION", false);
AppContext.SetSwitch("System.Net.Http.EnableActivityPropagation", false);

This has the effect of disabling propagation for the entire application. Further, it also stops the generation of the outbound HTTP span.

In an ideal world, you'd be able to do this at a HttpClient level, and selective choose:

  • No span and no propagation
  • No span but propagate
  • Span, but no propagation
  • Span and Propagate

The scenario where this is useful is where you're using a third party API which you want to track the performance of, but you don't want them to receive baggage and trace details.

What I've been toying with is adding something to the Http Instrumentation package that allows you to "turn off" instrumentation, then add it back it in incrementally.

builder.Services.AddOpenTelemetry()
    .WithTracing(tracingOptions => tracingOptions
        .AddHttpClientInstrumentation(httpOptions => {
            httpOptions.DisableAutoPropagation = true;
        })

Then have 2 handlers such that you can do..

builder.Services.AddHttpClient<WeatherApiClient>(client => client.BaseAddress = new("http://backend"))
    .AddHttpMessageHandler<TracingHandlerWithPropagation>();
builder.Services.AddHttpClient<WeatherApiClient>(client => client.BaseAddress = new("http://backend"))
    .AddHttpMessageHandler<TracingHandlerWithoutPropagation>();

Then add that an extension like

builder.Services.AddHttpClient<WeatherApiClient>(client => client.BaseAddress = new("http://backend"))
    .WithOpenTelemetry(options => {
        options.PropagationEnabled = true;
        options.Tracked = true;
    });

(Names TBC)

The issue I see right now is that the DiagnosticListener is the way we listen to the http calls, and that also does propagation.

So I could make it so that Handler generates the Activity directly, and ignore the diagnostic listener. In effect the diagnostic listener would only be relevant for "Auto instrumented" stuff coming out of the runtime. The alternative is to have the handler only generate diagnostics when it should be traced, and not in other scenarios. Or I use a different diagnostic name.

So is this something worthwhile for people? is the approach viable?

@CodeBlanch
Copy link
Member

Couldn't you do this?

    public async Task<string> CallThirdPartyServiceAndReturnResponse(Uri requestUri)
    {
        // Turn off flow of Activity & Baggage for this ExecutionContext
        using var flow = ExecutionContext.SuppressFlow();

        // Make the call with no trace or baggage to inject
        return await this.httpClient.GetStringAsync(requestUri);
    }

The nice thing about that is it works (in theory) for all things not just HttpClient. Same thing but calling some message queue:

    public async Task PostMessageToThirdPartyService(Uri requestUri)
    {
        // Turn off flow of Activity & Baggage for this ExecutionContext
        using var flow = ExecutionContext.SuppressFlow();

        // Post message with no trace or baggage to inject
        await this.myAwesomeMessageService.PostMessageAsync(requestUri);
    }

SuppressFlow can be kind of wonky and doesn't work if everything ends up executing synchronously. That might need like Task.Yield before the await to be 100% solid. But conceptually, this is how you turn off flow of AsyncLocal which I think might be helpful for this case.

@martinjt
Copy link
Member Author

From what I can tell from testing this, it does stop activity from being created, however, the runtime code does not honor that and still propagates the trace data.

@martinjt
Copy link
Member Author

So ultimately, the issue is that I need to disable the runtime propagation, but if I do that (with the switches), then the Http instrumentation doesn't kick in, because it's based on the diagnostic listener that's generated by the DiagnosticHandler which the switches disable.

@CodeBlanch
Copy link
Member

Ya it will still create an Activity \ span but it shouldn't be parented to the current trace it will be a new root. So you won't leak the internal trace/baggage to the third party. Wasn't that the goal?

If you wanted to stop the Activity \ span from being created completely, really the OTel way to do that is inside the sampler. The problem is though HttpClient isn't passing any tags into the sampler so there isn't really anyway to drop just some specific destination 😢 IIRC the spec says there are some standard tags which should be supplied at sample time it just isn't part of runtime logic yet.

You could probably do something like this:

// At call site
using var scope = MySampler.SuppressActivityCreation();

await httpClient.DoSomethingAsync();

// Set as OTel sampler
internal sealed class MySampler : Sampler
{
    private static readonly AsyncLocal<bool> SuppressActivity = new AsyncLocal<bool>();
    private readonly Sampler innerSampler;

    public MySampler(Sampler innerSampler)
    {
        this.innerSampler = innerSampler ?? throw new ArgumentNullException(nameof(innerSampler));
    }

    public static IDisposable SuppressActivityCreation()
    {
        var previousValue = SuppressActivity.Value;

        SuppressActivity.Value = true;

        return new MySamplerScope(previousValue);
    }

    public override SamplingResult ShouldSample(in SamplingParameters samplingParameters)
    {
        if (SuppressActivity.Value)
        {
            return new SamplingResult(SamplingDecision.Drop);
        }

        return this.innerSampler.ShouldSample(in samplingParameters);
    }

    private sealed class MySamplerScope : IDisposable
    {
        private readonly bool previousValue;

        public MySamplerScope(bool previousValue)
        {
            this.previousValue = previousValue;
        }

        public void Dispose()
        {
            SuppressActivity.Value = this.previousValue;
        }
    }
}

I didn't test any of that just an idea!

@lmolkova
Copy link
Contributor

lmolkova commented May 31, 2024

I think there are two fundamental problems:

  • baggage flows everywhere and you can't control it unless it's enableable (per endpoint/request/etc).
  • different endpoints have different context propagation formats

Having a propagator configurable per endpoint might solve both.
Also, it's not a unique .NET problem - all languages have to some extent.

We can try to extend Propagator spec so that propagator knows the endpoint/some other context it's propagating to
It probably could be done today with some duct tape: custom baggage propagator and AsyncLocal

class ConfigurablePropagator: Propagator {
   public static readonly AsyncLocal<string> Endpoint = new AsyncLocal<string>(); 
   
   public override void Inject(...) {
       propagators[Endpoint.Value].Inject(...)
   }
}

DistributedContextPropagator Current = DistributedContextPropagator.CreateNoOutputPropagator(); // I hope it works
Sdk.SetDefaultTextMapPropagator(new ConfigurablePropagator(....));

ConfigurablePropagator.Endpoint = "https://external-no-baggage-prop.com";
await httpClient.SendAsync()...
ConditionalPropagator.Endpoint = null; 

UPDATE: I assume it should be possible to implement a customizeable DistributedContextPropagator and give .NET a custom propagator.

@martinjt
Copy link
Member Author

Ya it will still create an Activity \ span but it shouldn't be parented to the current trace it will be a new root. 

@CodeBlanch I'm saying that isn't happening. The propagation happens at the DiagnosticHandler level in the runtime. It's not even touching otel. So it's not creating the Activity, but is still propagating the context.

it's not a unique .NET problem - all languages have to some extent.

JS and go, you have to add the propagation handler from otel to the specific HTTP client. Since dotnet does this in the runtime, there's no way to turn it off. It happens even if you don't have otel installed at all.

@martinjt
Copy link
Member Author

martinjt commented Jun 1, 2024

This is what we get with the execution flow applied. You'll notice that the second row shows that the backend has used the propagation context from the first.

image

What we need is the ability to Disable the propagation part, based on a handler of some sort.

What the execution flow IS doing is making the span not inherit the parent, and making propagation be based on the root. Which doesn't give the desired outcome

@johncrim
Copy link

johncrim commented Jul 22, 2024

This sounds super useful to me. Just yesterday I wrote a PrivateBaggageTextMapPropagator for this exact reason, and I would love to have better support for the scenario. My use case is that traceparent can be sent anywhere (it would be better if it wasn't, but I'm not worried about security consequences from an API vendor receiving our traceparent, and in some cases it can be used), but baggage shouldn't be sent to external APIs.

I think the main use case should be opt-in instead of opt-out. I don't think it's reasonable to map all of the possible (transitive) external domains.

I would also be happy if this was something I could explicitly attach to an HttpClient/Handler, or configure with a named client.

My PrivateBaggageTextMapPropagator does a DNS lookup when needed to determine if the service IP is on the private network or on the public internet, and uses that to decide whether to propagate baggage. So my approach is higher overhead than I'd like; the approach you're proposing sounds much better.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants