Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Fix thread-safety issue in quickjs-libc #578

Merged
merged 1 commit into from
Oct 7, 2024

Conversation

bnoordhuis
Copy link
Contributor

JS_NewClassID(rt, &class_id) where class_id is a global variable is unsafe when called from multiple threads but that is exactly what quickjs-libc.c did.

Change the class ids in quickjs-libc.c to thread-locals, and change the function prototype of JS_NewClassID to hopefully make it harder for downstream users to introduce such bugs.

The gcc48 and tcc buildbots don't support _Thread_local and that is why this commit removes them.

I toyed with the idea of porting the uv_key_create/uv_key_get/etc. functions from libuv but because quickjs-libc doesn't know when it's safe to free the thread-local keys, that leads to memory leaks, and support for ancient or niche compilers isn't worth that.

Fixes: #577

@bnoordhuis
Copy link
Contributor Author

On second thought: using _Thread_local won't work right when a downstream user moves a JSRuntime between threads.

I think the options are:

  1. Move the class ids to JSThreadState and leak memory in js_std_free_handlers so JSThreadState is available in finalizers

  2. Move js_std_free_handlers to after JS_FreeRuntime and accept that it breaks existing downstream code that calls js_std_free_handlers first

  3. Concoct a JS_SetRuntimeOpaque2 that adds a finalizer that runs last in JS_FreeRuntime (to clean up JSThreadState) and turn js_std_free_handlers into a no-op

@saghul
Copy link
Contributor

saghul commented Oct 7, 2024

Ah good one!

Some thoughts (after reading the other issue too).

JS_NewClassID(rt, &class_id) where class_id is a global variable is unsafe when called from multiple threads but that is exactly what quickjs-libc.c did.

JSRuntime is not thread-safe so is this a case of "you are using it wrong" ? Or rather the API is wrong!

JS_NewClassID should always return a new class ID, attached to the runtime, I guess that slipped when I did the refactor.

However even when doing that we'd not be alright because the worker case would create different class IDs for each runtime so the single static variable wouldn't do....

So how about option 4!

  1. Switch back to the way it was in the OG: a static global (now) protected with a mutex. SInce we have cross-platform mutexes here it should be simple to support. We'd likely need to use a once thing because there is no static initializer on Windows, but nothing insurmountable.

The main reason this change (the change to JS_NewClassID) makes me a bit uneasy is that we are breaking a widely used (amongst QuickJS users, that is) pattern, so almost all porgrams would need an update if / when migrating. I know we have done this with the memory allocation for example, but I'd argue the people using those APIs is a smaller subset.

Then there is the gcc48 situation. IMHO if we drop the CI we should also drop the change that came with it, aka the atomics header file, since we can really claim to suopport if we don't test on it...

WDYT?

@bnoordhuis
Copy link
Contributor Author

Switch back to the way it was in the OG: a static global (now) protected with a mutex.

That doesn't work because the class ids can be different between different JSRuntime instances. Using a single JSClassID is always wrong.

(That's what #577 really is, using an old - and wrong - class id in a new JSRuntime.)

I've opted for (3) and:

  • left JS_NewClassID the way it is
  • brought back the tcc and gcc48 buildbots
  • kept js_std_free_handlers even though it's a no-op now

quickjs-libc.c Outdated

list_for_each_safe(el, el1, &ts->os_rw_handlers) {
JSOSRWHandler *rh = list_entry(el, JSOSRWHandler, link);
free_rw_handler(rt, rh);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On third thought... this isn't quite right either because we still need to be able to call JS_FreeValueRT during cleanup.

I'm going to keep js_std_free_handlers and only move freeing JSThreadState here.

`JS_NewClassID(rt, &class_id)` where `class_id` is a global variable
is unsafe when called from multiple threads but that is exactly what
quickjs-libc.c did.

Add a new JS_AddRuntimeFinalizer function that lets quickjs-libc
store the class ids in JSRuntimeState and defer freeing the memory
until the runtime is destroyed. Necessary because object finalizers
such as js_std_file_finalizer need to know the class id and run after
js_std_free_handlers runs.

Fixes: quickjs-ng#577
@saghul
Copy link
Contributor

saghul commented Oct 7, 2024

That doesn't work because the class ids can be different between different JSRuntime instances. Using a single JSClassID is always wrong.

I might have not gotten enough ☕ in the system yet 😅 How can that be if the class ID allocation is protected with a lock? In its (weird, I'll admit!) implementastion if you store a static variable you'd get it initialized just once, so the same class ID for all runtimes, no?

I'll look through the changes now!

@@ -3976,6 +3990,7 @@ void js_std_init_handlers(JSRuntime *rt)
ts->exc = JS_UNDEFINED;

JS_SetRuntimeOpaque(rt, ts);
JS_AddRuntimeFinalizer(rt, js_free_rt, ts);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you avoid this by using the system allocator like before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could but this seems neater. Is there a reason to prefer plain malloc?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong opinion, but we are somewhat contradicting the documentation you just added: the runtime itself is no longer usable. I would've understood calling any RT function (including the memory ones) is a no-no.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do as I say, not as I do, is what I always say :-)

quickjs and quickjs-libc are strongly integrated so I don't think there's a dichotomy here. The doc comment is for downstream users and it's best to keep it simple ("do X" instead of "do X unless Y")

@bnoordhuis
Copy link
Contributor Author

bnoordhuis commented Oct 7, 2024

How can that be if the class ID allocation is protected with a lock?

Consider:

  1. I create runtime R1 and add classes A and B, in that order. Then:
  2. I create runtime R2 and add classes B and A, in that order

With global class ids, mutex or not, JS_NewClassID reuses the class ids from (1) but now they're in the wrong order in (2)

edit: rt->js_class_id_alloc isn't updated in R2 either.

@saghul
Copy link
Contributor

saghul commented Oct 7, 2024

With global class ids, mutex or not, JS_NewClassID reuses the class ids from (1) but now they're in the wrong order in (2)

Right they will be reused. But if static variables are used they will match. Is the order relevant for user code? I know internally it is...

@bnoordhuis
Copy link
Contributor Author

Is the order relevant for user code? I know internally it is...

The fact that the reproducer in #577 fails the way it does is proof that, yes, order is relevant :-)

@bnoordhuis
Copy link
Contributor Author

Maybe a better example is this:

  1. Register A in R1
  2. Register A and B in R2

Because rt->js_class_id_alloc isn't updated in (2) for A, that means B gets the same class id as A. That's the bug in #577.

@saghul
Copy link
Contributor

saghul commented Oct 7, 2024

I'll add some text to the docs once I get around adding C API docs!

@bnoordhuis bnoordhuis merged commit 9a37c57 into quickjs-ng:master Oct 7, 2024
52 checks passed
@bnoordhuis bnoordhuis deleted the fix577 branch October 7, 2024 19:27
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Weird memory corruption (?) bug
2 participants