Feature: handle tracef's %c as unicode code point #411

Open
wants to merge 1 commit into
base: main
from

Conversation

zetanumbers
Contributor

@zetanumbers zetanumbers commented Feb 27, 2022

Previously, %c converted its argument to a UTF-16 char code, so Unicode characters above U+FFFF were truncated. Now the argument is treated as a Unicode code point, so it's possible to pass any Unicode character.
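A minimal Rust sketch of the difference, assuming the old path behaved like JavaScript's String.fromCharCode (which keeps only the low 16 bits) and the new path like String.fromCodePoint. The function names are hypothetical and are not taken from the runtime; the fallback to U+FFFD for invalid values is also an assumption made for this sketch.

```rust
/// Old behavior (assumed): keep only the low 16 bits of the argument,
/// i.e. interpret it as a single UTF-16 code unit.
fn format_c_utf16_unit(arg: u32) -> String {
    let unit = (arg & 0xFFFF) as u16;
    // A lone surrogate is not a valid character; it becomes U+FFFD here.
    String::from_utf16_lossy(&[unit])
}

/// New behavior (assumed): interpret the argument as a Unicode code point.
/// Invalid code points fall back to U+FFFD in this sketch.
fn format_c_code_point(arg: u32) -> String {
    char::from_u32(arg)
        .unwrap_or(char::REPLACEMENT_CHARACTER)
        .to_string()
}
```

With U+1F600 as the argument, the first function yields the unrelated private-use character U+F600, while the second yields the intended emoji. For arguments up to U+FFFF outside the surrogate range, both agree.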

@aduros
Owner

aduros commented Mar 1, 2022

We should probably match the same behavior as C's printf, which I think truncates to 8 bits for %c.

For me this program:

printf("Hello %c\n", 12345678);

Prints Hello N.
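The arithmetic behind that output can be checked with a small Rust sketch. It assumes an 8-bit unsigned char, as on common platforms; the helper name is hypothetical.

```rust
/// Mimics the conversion C's printf applies to a %c argument:
/// the int is converted to unsigned char, keeping only the low 8 bits.
fn printf_c_byte(arg: i32) -> u8 {
    // `as u8` truncates to the low byte.
    arg as u8
}
```

12345678 is 0xBC614E, so the low byte is 0x4E, which is ASCII 'N'.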

@zetanumbers
Contributor Author

> We should probably match the same behavior as C's printf, which I think truncates to 8 bits for %c.

But why? It's not like we are trying to implement libc. With this PR we would be able to pass Rust's char, for example.

@zetanumbers
Contributor Author

Btw if we truncate, should we truncate to 7 bits for ASCII, or truncate to 8 bits and allow some UTF-16 char codes? Aren't non-ASCII characters for printf OS dependent?
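To make the two options concrete, here is a rough sketch comparing them. It assumes the 8-bit variant maps values 0x80 to 0xFF to the characters U+0080 to U+00FF, which is what a UTF-16 char code in that range denotes. The function names are hypothetical.

```rust
/// Option 1: truncate to 7 bits, so the result is always ASCII.
fn truncate_7bit(arg: u32) -> char {
    ((arg & 0x7F) as u8) as char
}

/// Option 2: truncate to 8 bits; values 0x80..=0xFF map to U+0080..=U+00FF.
fn truncate_8bit(arg: u32) -> char {
    ((arg & 0xFF) as u8) as char
}
```

For ASCII input the two agree. For 0xE9, the 7-bit variant gives 'i' (0x69), while the 8-bit variant gives 'é' (U+00E9).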

@aduros
Owner

aduros commented Mar 2, 2022

Could we truncate to 8 bits? libc printf semantics aren't perfect, but at least they're well-defined and we don't need to document our own special handling of certain features.

For printing unicode characters, isn't it possible to use %s instead of %c? Or just format the string directly in Rust.

@zetanumbers
Contributor Author

zetanumbers commented Mar 4, 2022

> Could we truncate to 8 bits? libc printf semantics aren't perfect, but at least they're well-defined and we don't need to document our own special handling of certain features.

For now, and even if we do truncate to 8 bits, couldn't we handle non-ASCII chars as Unicode code points instead of UTF-16 char codes?

@zetanumbers
Contributor Author

> For printing unicode characters, isn't it possible to use %s instead of %c? Or just format the string directly in Rust.

The current %s implementation only works on ASCII null-terminated strings.

https://github.com/aduros/wasm4/blob/main/runtimes/web/src/runtime.ts#L272

To do the equivalent of tracef manually in Rust, you would:

  1. Create an empty string;
  2. Gradually write substrings, numbers, etc. to this string. Meanwhile the String grows (reallocates), gradually increasing its capacity;
  3. Flush the whole string onto a single line via traceUtf8;
  4. Deallocate the string.
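The four steps above can be sketched roughly as follows. This is only an illustration: trace_utf8 here is a hypothetical stand-in that records lines in a Vec so the example runs on its own, whereas a real cart would call the runtime's traceUtf8 import with a pointer and length. The trace_score function and its message are likewise invented, and the sketch uses std rather than a no_std setup.

```rust
use std::fmt::Write;

/// Hypothetical stand-in for the runtime's traceUtf8 import:
/// records each traced line instead of printing it.
fn trace_utf8(sink: &mut Vec<String>, bytes: &[u8]) {
    sink.push(String::from_utf8_lossy(bytes).into_owned());
}

/// Hypothetical example of formatting a trace line by hand.
fn trace_score(sink: &mut Vec<String>, name: &str, score: u32, rank: char) {
    // 1. Create an empty string.
    let mut line = String::new();
    // 2. Append pieces; the String may reallocate as it grows.
    line.push_str("player ");
    line.push_str(name);
    write!(line, " scored {} rank {}", score, rank).unwrap();
    // 3. Flush the whole string as a single line.
    trace_utf8(sink, line.as_bytes());
    // 4. `line` is dropped here, which deallocates its buffer.
}
```

Because the String holds UTF-8, a Rust char such as '😀' passes through unchanged, at the cost of pulling the allocator and formatting machinery into the binary.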

This brings some runtime code (~7 KiB with all code optimizations) into the binary. It could have been better (now only ~2 KiB) if there were a way to flush the line in parts, requiring no allocations.
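As a sketch of what flushing in parts could look like on the Rust side: a writer that forwards each formatted fragment to a callback instead of buffering it. The emit callback stands for a hypothetical part-wise trace call; nothing here implies that such a call exists in the runtime, and PartWriter and trace_parts are invented names.

```rust
use std::fmt::{self, Write};

/// Forwards every formatted fragment to `emit` without buffering,
/// so the writer itself needs no heap allocation.
struct PartWriter<F: FnMut(&str)> {
    emit: F,
}

impl<F: FnMut(&str)> Write for PartWriter<F> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        (self.emit)(s);
        Ok(())
    }
}

/// Hypothetical example: format a line directly into the part-wise sink.
fn trace_parts<F: FnMut(&str)>(emit: F, name: &str, score: u32) -> fmt::Result {
    let mut writer = PartWriter { emit };
    write!(writer, "player {} scored {}", name, score)
}
```

The formatting machinery calls write_str once per fragment, so a runtime-side function that appends to the current line, followed by one that ends it, would be enough to avoid building the String.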
