Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Unexpected local context data behavior change for duplicated context #5505

Open
ben1222 opened this issue Mar 5, 2025 · 6 comments
Open
Labels

Comments

@ben1222
Copy link
Contributor

ben1222 commented Mar 5, 2025

Version

4.5.12

Context

Before 4.5.11, when duplicating a duplicated context, it always have empty local context data at beginning, just like what the Javadoc says.
However, in 4.5.12, due to the changes in #5441, the local context data is copied.
There are 2 concerns here:

  1. It may break existing code that rely on empty local context on duplicating DuplicatedContext
    In our use case, we put a map in the local context data (using getLocal()/putLocal(), we used it before ContextLocal was introduced) to track some data.
    We were expecting duplicating DuplicatedContext also starts with empty local context data, just like duplicating ContextImpl, so that our map can track only what’s run on the current duplicated context.
    However, after the behavior change in 4.5.12, since the local context data is copied to new DuplicatedContext when duplicating from DuplicatedContext, we got many duplicated contexts referencing same map instance and messed things up.

    One option for us could be clearing the local context data on any places that may produce duplicated context (e.g., every message consumer), but it requires lots of changes, and having unnecessary overhead of copying and then clearing local context data.
    Another option for us is to move our map to a separate ContextLocal, and customize a duplicator to always return null. (a minor problem is that if such local data is only used in a small part of contexts, the overhead of reserving one more slot for every context/duplicated context may not worth it)

    Why changing the behavior of existing getLocal()/putLocal() use cases?
    Can the duplicator of ContextInternal.LOCAL_MAP be changed to always return null?
    Or at least support configuring the default behavior of duplicating context to not copying?

  2. Inconsistent behavior between duplicating ContextImpl and DuplicatedContext
    Before 4.5.12, the behavior was consistent – it always create a DuplicatedContext with no local context data.
    In 4.5.12, duplicating ContextImpl will not copy the local context data in ContextImpl into DuplicatedContext, while duplicating DuplicatedContext will copy the local context data to the new DuplicatedContext.

    This is really confusing me... is it by design to not allow the customized duplicator of ContextLocal to not work when duplicating from ContextImpl?

Do you have a reproducer?

Tried to create a unit test to illustrate one kind of use cases:

  @Test
  public void testLocalsForDuplicatedContext() {
    Logger LOG = LogManager.getLogger(ContextTest.class);
    final String RESOURCE_TRACKER_KEY = "resource-tracker-key";
    final String COMMAND_ADDRESS = "command-address";
    final String EVENT_ADDRESS_PREFIX = "event-address-";
    final String RESOURCE_1 = "resource-1";
    final String RESOURCE_2 = "resource-2";
    final String COMMAND_ID = "command-id-1";
    final Supplier<ConcurrentHashMap<String, Integer>> getResourceTracker = () -> {
      Context ctx = vertx.getOrCreateContext();
      ConcurrentHashMap<String, Integer> ret = ctx.getLocal(RESOURCE_TRACKER_KEY);
      if (ret == null) {
        LOG.info("Creating resource tracker, context: {}", ContextInternal.current());
        ret = new ConcurrentHashMap<>();
        ctx.putLocal(RESOURCE_TRACKER_KEY, ret);
      }
      return ret;
    };
    final AtomicReference<String> errorMessage = new AtomicReference<>();
    final Consumer<String> useResource = resource -> {
      ConcurrentHashMap<String, Integer> resourceTracker = getResourceTracker.get();
      LOG.info("Using {}, tracker: {}, context: {}", resource, resourceTracker, ContextInternal.current());
      if (resourceTracker.containsKey(resource)) {
        errorMessage.set("Resource " + resource + " is already in use");
      }
      resourceTracker.merge(resource, 1, Integer::sum);
    };
    final Consumer<String> releaseResource = resource -> {
      ConcurrentHashMap<String, Integer> resourceTracker = getResourceTracker.get();
      LOG.info("Releasing {}, tracker: {}, context: {}", resource, resourceTracker, ContextInternal.current());
      resourceTracker.computeIfPresent(resource, (k, v) -> v == 1 ? null : v - 1);
    };
    vertx.eventBus().<String>localConsumer(COMMAND_ADDRESS, msg -> {
      String id = msg.body();
      LOG.info("Received command {}, context: {}", id, ContextInternal.current());
      AtomicBoolean isCancelled = new AtomicBoolean();
      useResource.accept(RESOURCE_1);
      releaseResource.accept(RESOURCE_1);

      MessageConsumer<String> eventConsumer = vertx.eventBus().consumer(EVENT_ADDRESS_PREFIX + id, event -> {
        LOG.info("Received event {}, context: {}", event.body(), ContextInternal.current());
        isCancelled.set(true);
        useResource.accept(RESOURCE_2);
        releaseResource.accept(RESOURCE_2);
      });
      eventConsumer.completionHandler(ar -> {
        assertTrue(ar.succeeded());
        vertx.executeBlocking(() -> {
          useResource.accept(RESOURCE_2);
          try {
            // simulate some works
            for (int i = 0; i < 200; i++) {
              if (isCancelled.get()) {
                return "cancelled";
              }
              Thread.sleep(10);
            }
            return "done";
          } finally {
            releaseResource.accept(RESOURCE_2);
          }
        }).onComplete(ar2 -> {
          eventConsumer.unregister();
          if (ar2.succeeded()) {
            msg.reply(ar2.result());
          } else {
            msg.reply("failed: " + ar2.cause().getMessage());
          }
        });
      });
    }).completionHandler(ar -> {
      assertTrue(ar.succeeded());
      vertx.eventBus().request(COMMAND_ADDRESS, COMMAND_ID, ar1 -> {
        assertTrue(ar1.succeeded());
        assertEquals("cancelled", ar1.result().body());
        assertNull(errorMessage.get());
        testComplete();
      });
      vertx.setTimer(500, i -> vertx.eventBus().publish(EVENT_ADDRESS_PREFIX + COMMAND_ID, "cancel"));
    });
    await();
  }

In 4.5.11, it passes, and in 4.5.12, it fails with:

java.lang.AssertionError: expected null, but was:<Resource resource-2 is already in use>
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.failNotNull(Assert.java:756)
	at org.junit.Assert.assertNull(Assert.java:738)
	at org.junit.Assert.assertNull(Assert.java:748)
	at io.vertx.test.core.AsyncTestBase.assertNull(AsyncTestBase.java:250)
	at io.vertx.core.ContextTest.lambda$testLocalsForDuplicatedContext$141(ContextTest.java:1290)
	at io.vertx.core.impl.future.FutureImpl$4.onSuccess(FutureImpl.java:176)
	at io.vertx.core.impl.future.FutureBase.lambda$emitSuccess$0(FutureBase.java:60)
	at io.vertx.core.impl.ContextImpl.execute(ContextImpl.java:312)
	at io.vertx.core.impl.ContextImpl.execute(ContextImpl.java:302)
	at io.vertx.core.impl.future.FutureBase.emitSuccess(FutureBase.java:57)
	at io.vertx.core.impl.future.FutureImpl.tryComplete(FutureImpl.java:259)
	at io.vertx.core.Promise.complete(Promise.java:66)
	at io.vertx.core.eventbus.impl.ReplyHandler.dispatch(ReplyHandler.java:97)
	at io.vertx.core.eventbus.impl.HandlerRegistration$InboundDeliveryContext.execute(HandlerRegistration.java:137)
	at io.vertx.core.eventbus.impl.DeliveryContextBase.next(DeliveryContextBase.java:80)
	at io.vertx.core.eventbus.impl.DeliveryContextBase.dispatch(DeliveryContextBase.java:43)
	at io.vertx.core.eventbus.impl.HandlerRegistration.dispatch(HandlerRegistration.java:98)
	at io.vertx.core.eventbus.impl.ReplyHandler.doReceive(ReplyHandler.java:81)
	at io.vertx.core.eventbus.impl.HandlerRegistration.lambda$receive$0(HandlerRegistration.java:49)
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:829)

@ben1222 ben1222 added the bug label Mar 5, 2025
@vietj
Copy link
Member

vietj commented Mar 5, 2025

@ben1222 can you instead use your own io.vertx.core.spi.context.storage.ContextLocal instead ? that would give you more control over the values

As said in the issue that changes this behaviour, semantic was not clearly defined and we try to make it well defined instead with context local SPI.

perhaps also we could improve the duplication and handle map/concurrent map with a copy and that would help your case to be solved ?

@ben1222
Copy link
Contributor Author

ben1222 commented Mar 6, 2025

@vietj

can you instead use your own io.vertx.core.spi.context.storage.ContextLocal instead ? that would give you more control over the values

Using ContextLocal is an option for us and we are doing POC to see if it works.
However, this behavior change is backward incompatible and could impact others and is not well documented (e.g., not included in the breaking changes)

A few options to keep backward compatibility may be:

  1. Keep using old behavior for the old APIs (getLocal()/setLocal()), and for developers that want a better control over the duplicate behavior, they can use the new API (ContextLocal)
    This can be easily done by defining the duplicator of ContextInternal.LOCAL_MAP to not do duplicate. (i.e., always return null)
  2. Change the default behavior of old APIs but still allows developers to switch back to old behavior or define the duplicator of ContextInternal.LOCAL_MAP
    This sounds more complicated and not sure if it is worth to do for the old APIs that are currently marked as deprecated.
    (By the way, the deprecated annotation is only added to the ContextInternal, but not Context, is it expected? What's the implication of it?)

As said in the issue that changes this behaviour, semantic was not clearly defined and we try to make it well defined instead with context local SPI.

See this section: Advanced Vert.x Guide - Contexts and tracing
Reading this gives me a strong impression of that the context local storage is very similar to "thread local storage", the difference is that it works with context instead of thread.
The thread local storage will not be copied over by default when starting new thread, so it is nature to think that the context local storage will not be copied over as well, and this is how it used to do.

Although this semantic is not explicitly defined, I'm not sure how many people have similar impression as me...

I agree that the semantic can be better defined with context local SPI, but it looks like still need some polish (or are you already planning to do something?):
The behavior of duplicating from ContextImpl is inconsistent with duplicating from DuplicatedContext and developer cannot well define the semantic of duplicating from ContextImpl even with ContextLocal.
I'm still not sure if it is intended or missed.
If it is intended to be inconsistent and do not allow developer to define that part of semantic, what's the reason? Is it documented somewhere?

perhaps also we could improve the duplication and handle map/concurrent map with a copy and that would help your case to be solved ?

I'm not sure if I understand this question correctly... if you are saying to do a deep copy, then no, it will not help in our case.
The option 1 or 2 above could help in our case.

@vietj
Copy link
Member

vietj commented Mar 6, 2025

@ben1222 one reason is that the putLocal/getLocal should never have been made public API as they actually target internal vertx usage, hence there is lot of confusion around it.

if we revert the usage, we need to break it still in 5.0 :-(

this behavior is actually needed by gRPC context storage to works correctly (and it makes sense)

let me know how 2. works for you, as this is the option that gives control to the owner of the local.

Yes local are like thread locals but they are attached to a flow of execution supported by multiple threads

@ben1222
Copy link
Contributor Author

ben1222 commented Mar 7, 2025

@vietj

one reason is that the putLocal/getLocal should never have been made public API as they actually target internal vertx usage, hence there is lot of confusion around it.

It is a great feature and is useful not only for internal vertx, at least it helped us a lot.
It may be better to pay more attention on the backward compatibility/deprecation/documentation.

if we revert the usage, we need to break it still in 5.0 :-(

May I ask why it will still break in 5.0? Is it because of the gRPC context storage?

this behavior is actually needed by gRPC context storage to works correctly (and it makes sense)

I'm not familiar with gRPC so please correct me if I'm wrong:
I briefly read some related code (vertx-grpc-context-storage 4.5.13, vertx-grpcio-context-storage 5.0.0.CR4, io.grpc.Context imlementation) and vertx-grpc document and have a few thoughts/questions:

  1. The ContextStorageOverride implementation seems need to put gRPC context in vertx local context, and need to propagate it on vertx context duplication
    It looks to me that using ContextLocal for it is enough and no need to rely on the ContextInternal.LOCAL_MAP to be copied over on vertx context duplication?
    Reading the implementation in 5.0 I'm seeing it is already using ContextLocal for the GrpcStorage which contains the gRPC context.
    Then it looks to me all set, what do you mean by "we need to break it still in 5.0"?

  2. I see ContextStorageOverride.duplicate and wondering why it want to copy over the data referenced by ContextInternal.LOCAL_MAP?
    If developer want some local data be propagated on duplicate, shouldn't they put it in the gRPC context and/or in a separate ContextLocal?
    If it wants to allow a gRPC context attach does not affect other vertx local context data, then handling it here specifically sounds better than doing it for every vertx context duplication... what do you think?

  3. In this line, shouldn't it use prev.getLocal instead of next.getLocal?
    It looks like the doAttach needs to return the previously attached gRPC context but this code seems always return the newly attached gRPC context.

  4. Not sure if the following is a valid usage, but if it is written this way, it may produce a chain of duplicated vertx context in GrpcStorage.prevVertxContext that grows indefinitely and has OOM risk:

    void scheduleTimer(Context grpcContext) {
      vertx.setTimer(DELAY, i -> grpcContext.run(() -> {
        doSomething();
        scheduleTimer(grpcContext);
      }));
    }

    The gRPC context run looks for sync code only and mixing it in async environment sounds a bit confusing... is there some guideline/best practice for it?

let me know how 2. works for you, as this is the option that gives control to the owner of the local.

With this, we may just switch it back to old behavior (no copy on context duplication).
We are not using grpc for now.
Are you aware of other places that rely on the duplicator of ContextInternal.LOCAL_MAP to do a copy?

@vietj
Copy link
Member

vietj commented Mar 7, 2025

  1. Vert.x 4 gRPC storage is broken and corrected in Vert.x 5 . The gRPC storage should propagate context local set via putLocal, hence it needs to duplicate the context map. By break I mean the behaviour of copying on duplicate is assumed as a breaking change. I will add a note about this in https://vertx.io/docs/guides/vertx-5-migration-guide/ where it is not yet documented.
  2. it does a best effort to duplicate everything tied to the context, e.g. there might be some data set by Vert.x or some user code before the gRPC context storage operates, e.g. the new Deadline feature relies on context storage with GrpcLocal
  3. Perhaps, need to investigate this more carefully, thanks
  4. thank you I will investigate

As a side note : in Vert.x 5 getLocal/putLocal have been moved to ContextInternal and we consider instead people should use context local, yet they are still available with a cast. Since you are a user of this API, what is your opinion on this? do you find it acceptable ? See https://vertx.io/docs/guides/vertx-5-migration-guide/#_context_local_storage_removal, we should document I think the new alternative of ContextLocal

@vietj
Copy link
Member

vietj commented Mar 7, 2025

FYI : #5506

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants