Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Add ability to (de)serialize symbols #539

Merged
merged 3 commits into from
Sep 24, 2024
Merged

Add ability to (de)serialize symbols #539

merged 3 commits into from
Sep 24, 2024

Conversation

saghul
Copy link
Contributor

@saghul saghul commented Sep 21, 2024

Fixes: #481

@saghul saghul requested a review from bnoordhuis September 21, 2024 21:31
@saghul
Copy link
Contributor Author

saghul commented Sep 21, 2024

@bnoordhuis Is this what you had in mind? If so, I'll add the missing part to serialize symbols directly, as in bjson.write(Symbol('foo')) and push it over the finish line.

@bnoordhuis
Copy link
Contributor

Symbol.toStringTag is a well-known symbol but would it also work for e.g. Symbol('#frob')?

@saghul
Copy link
Contributor Author

saghul commented Sep 21, 2024

Symbol.toStringTag is a well-known symbol but would it also work for e.g. Symbol('#frob')?

I believe so, I did consider it:

qjs > new Uint8Array(bjson.write({[Symbol('#frob')]: "42"}))
Uint8Array(16) [ 15, 1, 3, 10, 35, 102, 114, 111, 98, 8, 1, 2, 7, 4, 52, 50 ]

The "problem" is that normal symbols won't match after being deserialized, but I'd say that is expected, right?

@bnoordhuis
Copy link
Contributor

Yes, I'd say so; Symbol("#frob) != Symbol("#frob)

Symbol.for("#frob) == Symbol.for("#frob) on the other hand so maybe it should work for those. You probably have to work in a bit in the wire format to record whether it's JS_ATOM_TYPE_GLOBAL_SYMBOL vs. JS_ATOM_TYPE_SYMBOL.

You're more or less handcrafting the symbol from the wire data now, right? Maybe it's easier to use JS_NewSymbolInternal()?

@saghul
Copy link
Contributor Author

saghul commented Sep 22, 2024

I'm passing in the atom and the type so the distinction is already there. I'm only special casing the internal atoms.

I'll continue tonight!

@saghul saghul marked this pull request as ready for review September 22, 2024 19:59
@saghul
Copy link
Contributor Author

saghul commented Sep 22, 2024

@bnoordhuis Updated, PTAL!

#else
return (int32_t)v < JS_ATOM_END;
#endif
return (int32_t)v < JS_ATOM_END;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure this is ok, but I don't see what the previous check tries to accomplish. AFAICT, the only case in which (int32_t)v <= 0; will be true is for tagged integer atoms. But then, in JS_FreeRuntime we only check for atoms >= JS_ATOM_END or if the refcount is not 1, so I think we're good?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect it's intended as an extra reference counting stress test.

An atom that's a keyword, like Symbol("return"), is constant and exempt from reference counting - but being exempt means refcount bugs won't surface.

(Aside: I don't like the int32_t cast, too clever and relies on implementation-defined behavior.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspected something similar. I don't have a strong opinion, do you have a preference here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it doesn't complicate this PR, I'd keep it; otherwise, chuck it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't complicate things much, but I see little value in it plus there is the all-too-clever cast necessary. Look me a while to figure out what was going on because the code behaved differently in debug mode. All that to say I'm chucking it. We can always reconsider down the road if we get bitten by it :-P

@saghul saghul changed the title Add ability to serialize symbol keys in JS_WriteObject Add ability to (de)serialize symbols Sep 23, 2024
Copy link
Contributor

@bnoordhuis bnoordhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some suggestions/comments/observations/etc.

#else
return (int32_t)v < JS_ATOM_END;
#endif
return (int32_t)v < JS_ATOM_END;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect it's intended as an extra reference counting stress test.

An atom that's a keyword, like Symbol("return"), is constant and exempt from reference counting - but being exempt means refcount bugs won't surface.

(Aside: I don't like the int32_t cast, too clever and relies on implementation-defined behavior.)

Comment on lines +33378 to +33379
"Map",
"Set",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I forgot those.

JSAtom atom = s->idx_to_atom[i];
if (__JS_AtomIsConst(atom)) {
bc_put_u8(s, 0 /* the type */);
bc_put_u32(s, atom);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The encoding for tagged integers could be more efficient, and for keyword-ish atoms (0 < atom < JS_ATOM_END) a single byte is enough, for now, because we only have 219.

Not so important however and fine to leave it for another time (but please add a TODO comment in that case.)

if (bc_get_u32(s, &atom))
return -1;
} else {
assert(type >= JS_ATOM_TYPE_STRING && type < JS_ATOM_TYPE_PRIVATE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this reads untrusted input, it should return an error, not assert.

@saghul saghul merged commit c25aad7 into master Sep 24, 2024
50 checks passed
@saghul saghul deleted the write-symbols branch September 24, 2024 08:01
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JS_WriteObject does not serialize symbol keys
2 participants