proof of concept of tenancy #3605
Conversation
Codecov Report

```diff
@@             Coverage Diff             @@
##             main    #3605       +/-   ##
==========================================
- Coverage   96.40%   96.25%      -0.15%
==========================================
  Files         264      264
  Lines       15509    15606         +97
==========================================
+ Hits        14951    15022         +71
- Misses        468      484         +16
- Partials       90      100         +10
```

Continue to review the full report at Codecov.
The mechanism of passing the tenant via Context makes me rather uncomfortable. Since there are strong privacy implications associated with tenancy, passing it via Context in a "best effort" manner makes it pretty easy to violate those implications. On the other hand, there are steps we could take to protect against that, such as storage integration tests that write & query to ensure different tenants do not see each other's data, and also by making the tenant key in the Context required, i.e. by failing fast if it's undefined. The latter can be implemented as tenant-enforcing decorators to the SpanProcessor and Storage interfaces. I also think the entry points to the backend should always invoke some "tenant strategy" that should be configured centrally (only once), such that it could enforce that a tenant header is always populated or always ignored (for single-tenant sites), and in the future could be used to validate the tenant IDs.
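For illustration, a minimal sketch of what a required, fail-fast tenant key could look like; the package, type, and helper names (`tenancy`, `WithTenant`, `TenantFromContext`) are hypothetical and not part of this PR:

```go
package tenancy

import (
	"context"
	"errors"
)

// tenantKey is an unexported context key type, so no other package can
// collide with (or spoof) the tenant value.
type tenantKey struct{}

var ErrMissingTenant = errors.New("no tenant present in context")

// WithTenant returns a child context carrying the tenant ID.
func WithTenant(ctx context.Context, tenant string) context.Context {
	return context.WithValue(ctx, tenantKey{}, tenant)
}

// TenantFromContext fails fast when the tenant was never set, instead of
// silently returning an empty string.
func TenantFromContext(ctx context.Context) (string, error) {
	tenant, ok := ctx.Value(tenantKey{}).(string)
	if !ok || tenant == "" {
		return "", ErrMissingTenant
	}
	return tenant, nil
}
```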
I agree with @yurishkuro's comment above. If we want to add tenancy support in the storage layer, I would prefer to pass the tenant explicitly; however, it will be a breaking change for the gRPC storage plugin. In fact, there will always be a breaking change, because the dependency store write method does not pass context. That said, if we can make sure the tenant is always properly propagated, I am fine with the context as well.
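To make that breaking change concrete: the existing dependencystore.Writer interface has no Context parameter, so a tenancy-aware variant has to break the signature either way. A sketch, where `TenantedWriter` is a hypothetical name used only for contrast:

```go
package dependencystore

import (
	"context"
	"time"

	"github.com/jaegertracing/jaeger/model"
)

// Writer is the existing interface: no Context parameter, so there is
// nowhere to carry the tenant without breaking the signature.
type Writer interface {
	WriteDependencies(ts time.Time, dependencies []model.DependencyLink) error
}

// TenantedWriter is one hypothetical breaking replacement that threads a
// Context through, matching the rest of the storage API.
type TenantedWriter interface {
	WriteDependencies(ctx context.Context, ts time.Time, dependencies []model.DependencyLink) error
}
```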
@yurishkuro @pavolloffay I wrote this PR as an exercise to make sure I understood the Jaeger internals well enough to write a proposal, and to size the effort. Is the next step to write a proposal? I had hoped to get started on this before the April community call. Here are my design thoughts:
Flow of tenant information on post (new parameters in bold):

Currently only the V2 HTTP getTrace was implemented to pass tenant info from the header.

Flow of tenant information on query (no new params, although context is used to pass information from the headers): network -> getTrace() etc. -> queryService.GetTrace(context, ...) -> Reader.GetTrace(context, ...)

We probably can't implement tenancy in all back-ends. The first release should target

This PR does not yet have a way to tell Jaeger what the allowed tenants are. Flag with comma-separated list? File? Each implementation of

```go
type Tenanted interface {
	IsValid(tenant string) error
}
```

If the user specifies something like
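As one possible shape for the comma-separated-list option, a sketch of a validator satisfying the Tenanted interface above; the flag name mentioned in the comment and the constructor are hypothetical:

```go
package tenancy

import (
	"fmt"
	"strings"
)

// allowList validates tenants against a fixed set, e.g. parsed from a
// --multi-tenancy.tenants=acme,globex style flag (flag name hypothetical).
type allowList struct {
	tenants map[string]struct{}
}

// NewAllowList parses a comma-separated list of permitted tenant IDs.
func NewAllowList(csv string) *allowList {
	m := make(map[string]struct{})
	for _, t := range strings.Split(csv, ",") {
		if t = strings.TrimSpace(t); t != "" {
			m[t] = struct{}{}
		}
	}
	return &allowList{tenants: m}
}

// IsValid implements the Tenanted interface sketched above.
func (a *allowList) IsValid(tenant string) error {
	if _, ok := a.tenants[tenant]; !ok {
		return fmt.Errorf("unknown tenant %q", tenant)
	}
	return nil
}
```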
I think we can iterate on this later, as I see benefits to having per-tenant queues for better isolation. Why no per-tenant metrics? If the tenant data is available, the number of queues doesn't matter; certain metrics can still be tagged with the tenant.
As I mentioned in earlier comments, I would like to have a decorator-type Reader/Writer that verifies that the tenant info is present in the Context for all methods. That doesn't guarantee that the storage implementation would do the right thing, but it enforces that the upstream pipeline always passes the tenant (since we're not adding an explicit parameter to all of the pipeline APIs).
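A sketch of that decorator idea for spanstore.Reader, reusing the hypothetical TenantFromContext helper from the earlier sketch; only GetTrace is shown, and the remaining Reader methods would repeat the same check:

```go
package tenancy

import (
	"context"

	"github.com/jaegertracing/jaeger/model"
	"github.com/jaegertracing/jaeger/storage/spanstore"
)

// GuardingReader decorates a spanstore.Reader and rejects any call whose
// Context does not carry a tenant, so missing propagation fails fast
// instead of silently querying storage without tenant scoping.
type GuardingReader struct {
	wrapped spanstore.Reader
}

func NewGuardingReader(r spanstore.Reader) *GuardingReader {
	return &GuardingReader{wrapped: r}
}

func (g *GuardingReader) GetTrace(ctx context.Context, traceID model.TraceID) (*model.Trace, error) {
	if _, err := TenantFromContext(ctx); err != nil {
		return nil, err // fail fast: the tenant was never propagated
	}
	return g.wrapped.GetTrace(ctx, traceID)
}

// GetServices, GetOperations, FindTraces, and FindTraceIDs would repeat
// the same check before delegating to the wrapped Reader.
```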
Why is that? I think every integration test instantiation needs that, but I don't see who would use this tenant-aware information from the storage implementations.
We can add that incrementally, but we should have some central choke point (perhaps that decorator I mentioned above) that could validate tenants if necessary.
I feel that this PR can work as a proposal, but at the same time I've always wanted to start including Architecture Decision Records (ADRs) in the repo, so it's not a bad idea to give it a spin.
@yurishkuro I am restarting work on this PR today, in response to your comments. This PR used a single

Would you suggest all tenants share the same SpanProcessors?
Why one writer per tenant? Are you trying to allow different storage configs per tenant? My suggestion would be to start with the simplest form: everything identical and shared, but tenancy exists at the data level. Then we can start adding complications for better isolation, if there's interest.
The branch was force-pushed from b5bb192 to e695f99.
@esnible this PR is quickly getting out of hand in size & complexity; I strongly advise incremental changes. For example, you're doing a large-scale refactoring to pipe the Context through to more places. This should be done separately, and we can merge such a change to unblock the actual tenant work.
@yurishkuro Thanks for your eyes on this. I am happy to close this draft and create small PRs to do this work gradually. The breaking change I am introducing is the addition of a
No, I don't think we should do V2 interfaces; I would rather change/upgrade things in place, to reduce the maintenance burden in the future (we're changing the internals, so there's no need to keep V1s).
It's a hard question to answer. My preference is to pass tenancy explicitly, if possible, but we have to weigh it against the amount of API churn it will introduce. From your PR it seems that a lot of churn is already needed anyway just to pass the context. As I mentioned, I don't mind the Context approach too much, but in this case we should be consistent end-to-end, which was not the case in your PR since some signatures were passing the tenant explicitly. So I think you, having looked at all the pipeline code, might be in the best position to argue which way is better.
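To contrast the two shapes under discussion, a sketch of a span writer with an implicit (Context-carried) tenant versus an explicit parameter; both interface names are hypothetical, shown only for comparison:

```go
package tenancy

import (
	"context"

	"github.com/jaegertracing/jaeger/model"
)

// Option 1: the tenant travels implicitly in the Context. Signatures stay
// unchanged, but nothing in the type system forces callers to set it.
type ContextTenantWriter interface {
	WriteSpan(ctx context.Context, span *model.Span) error
}

// Option 2: the tenant is an explicit parameter. Impossible to forget, but
// every caller in the pipeline must change (the API churn discussed above).
type ExplicitTenantWriter interface {
	WriteSpan(ctx context.Context, tenant string, span *model.Span) error
}
```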
Updates: #3427
Demo of tenancy implementation
I have Jaeger running behind a proxy that adds an `x-tenant` header depending on an Authorization header token or on a cookie. Each tenant sees only the traces posted with the same header value.

Only HTTP `GET /api/traces/*` and gRPC `jaeger.api_v2.CollectorService/PostSpans` are implemented. Only the memory storage is implemented. There are no limits on the number of tenants allowed.

Is this the right approach? This PR is offered as a proof of concept.
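For reference, a sketch of how such a proxy-supplied header could be lifted into the request Context on the HTTP side, reusing the hypothetical WithTenant helper from the earlier sketch:

```go
package tenancy

import "net/http"

// ExtractTenantHTTP is middleware that copies the proxy-supplied x-tenant
// header (header name from the demo above) into the request Context,
// rejecting requests that arrive without one.
func ExtractTenantHTTP(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		tenant := r.Header.Get("x-tenant")
		if tenant == "" {
			http.Error(w, "missing tenant header", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r.WithContext(WithTenant(r.Context(), tenant)))
	})
}
```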
cc @pavolloffay