Assertion 'lit_is_valid_cesu8_string (string_p, string_size)' failed at jerryscript/jerry-core/ecma/base/ecma-helpers-string.c(ecma_new_ecma_string_from_utf8):371. #4935

Open
SwtWld opened this issue Jan 4, 2022 · 3 comments
Labels
bug Undesired behaviour

Comments


SwtWld commented Jan 4, 2022

JerryScript revision

Commit: a6ab5e9

Version: v3.0.0

Build platform

Ubuntu 18.04.5 LTS (Linux 4.19.128-microsoft-standard x86_64)

Ubuntu 18.04.5 LTS (Linux 5.4.0-44-generic x86_64)

Build steps
python ./tools/build.py --clean --debug --compile-flag=-fsanitize=address --compile-flag=-m32 --compile-flag=-g --strip=off --lto=off --logging=on --line-info=on --error-message=on --system-allocator=on --stack-limit=20
Test case

poc-as.txt

Execution steps & Output
$ ./jerryscript/build/bin/jerry poc.js

ICE: Assertion 'lit_is_valid_cesu8_string (string_p, string_size)' failed at jerryscript/jerry-core/ecma/base/ecma-helpers-string.c(ecma_new_ecma_string_from_utf8):371.
Error: ERR_FAILED_INTERNAL_ASSERTION
[1]    abort      jerry poc.js

Credits: Found by OWL337 team.

@rerobika added the bug (Undesired behaviour) label Jan 4, 2022
Contributor

ossy-szeged commented Jan 10, 2022

@rerobika I think it is not a bug, but a feature. "𞸋" is encoded in UTF-8 as 0xF09EB88B, which is invalid in CESU-8. But of course we could raise a user-friendly error message instead of an assertion.
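
To make the byte values concrete, here is a minimal Python sketch of the two encodings for "𞸋" (U+1EE0B). The helper `utf8_to_cesu8` is a hypothetical name used only for illustration; it is not JerryScript code and does not claim to mirror the engine's conversion routine. It assumes well-formed input and encodes each surrogate half of a non-BMP character as its own 3-byte sequence, which is how CESU-8 is defined.

```python
def utf8_to_cesu8(text: str) -> bytes:
    """Encode text as CESU-8 (hypothetical helper, for illustration only)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            # BMP code points use the same 1 to 3 byte form as UTF-8.
            out += ch.encode("utf-8", "surrogatepass")
        else:
            # Non-BMP code points are split into a UTF-16 surrogate pair,
            # and each surrogate is written as a 3-byte sequence.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            for surrogate in (high, low):
                out += chr(surrogate).encode("utf-8", "surrogatepass")
    return bytes(out)


ch = "\U0001EE0B"                   # the "𞸋" character from the comment above
print(ch.encode("utf-8").hex())     # f09eb88b      (4 bytes, UTF-8)
print(utf8_to_cesu8(ch).hex())      # eda0bbedb88b  (6 bytes, CESU-8)
```

The 4-byte form never appears in valid CESU-8, which is why a string that still contains it fails a CESU-8 validity check.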

Member

dbatyai commented Jan 10, 2022

The issue is not with the "𞸋" character; all non-BMP characters are converted to CESU-8 encoding during parsing.
The problem is that the first character is in the Basic Multilingual Plane and should be encoded using 3 bytes, but it is encoded using 4 bytes in the input. This breaks the conversion logic, which always expects the CESU-8 equivalent of a 4-byte sequence to be 6 bytes long.
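
The overlong case described above can be sketched as follows. This is a Python illustration under stated assumptions, not the JerryScript parser: `decode_four_byte` and `is_overlong_four_byte` are hypothetical names, and the overlong sample bytes (a 4-byte encoding of U+20AC) are made up for the example, since the actual contents of the PoC file are not shown in this thread.

```python
def decode_four_byte(seq: bytes) -> int:
    """Decode one 4-byte UTF-8 sequence without rejecting overlong forms."""
    if len(seq) != 4 or (seq[0] & 0xF8) != 0xF0:
        raise ValueError("not a 4-byte UTF-8 sequence")
    if any((b & 0xC0) != 0x80 for b in seq[1:]):
        raise ValueError("bad continuation byte")
    return (
        (seq[0] & 0x07) << 18
        | (seq[1] & 0x3F) << 12
        | (seq[2] & 0x3F) << 6
        | (seq[3] & 0x3F)
    )


def is_overlong_four_byte(seq: bytes) -> bool:
    """True if the 4 bytes encode a BMP code point (below U+10000)."""
    return decode_four_byte(seq) < 0x10000


well_formed = bytes.fromhex("f09eb88b")  # U+1EE0B, genuinely needs 4 bytes
overlong = bytes.fromhex("f08282ac")     # U+20AC padded out to 4 bytes

print(hex(decode_four_byte(well_formed)), is_overlong_four_byte(well_formed))
# 0x1ee0b False
print(hex(decode_four_byte(overlong)), is_overlong_four_byte(overlong))
# 0x20ac True
```

A converter that sizes its output purely from the lead byte would reserve 6 bytes for the second sequence, while the code point itself only maps to 3 bytes in CESU-8. Checking the decoded value before converting is one way to turn that mismatch into a regular syntax error rather than an assertion failure.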

Contributor

ossy-szeged commented Jan 11, 2022

Additional info: a simple /*𝔽*/ string fails with the same error if we build with tools/build.py --debug --function-to-string=on
