Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Add warnings about UTF-16 vs UTF-8 strings #1416

Merged
merged 1 commit into from
Apr 5, 2019

Conversation

alexcrichton
Copy link
Contributor

This commit aims to address #1348 via a number of strategies:

  • Documentation is updated to warn about UTF-16 vs UTF-8 problems
    between JS and Rust. Notably documenting that as_string and handling
    of arguments is lossy when there are lone surrogates.

  • A JsString::is_valid_utf16 method was added to test whether
    as_string is lossless or not.

The intention is that most default behavior of wasm-bindgen will
remain, but where necessary bindings will use JsString instead of
str/String and will manually check for is_valid_utf16 as
necessary. It's also hypothesized that this is relatively rare and not
too performance critical, so an optimized intrinsic for is_valid_utf16
is not yet provided.

Closes #1348

@alexcrichton alexcrichton requested a review from Pauan April 1, 2019 18:19
@alexcrichton alexcrichton force-pushed the js-string-valid-utf16 branch from 588a54c to c8fdfac Compare April 2, 2019 20:54
@alexcrichton alexcrichton requested a review from Pauan April 2, 2019 20:54
Copy link
Contributor

@Pauan Pauan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I also still think as_string should be changed to into)

@alexcrichton alexcrichton force-pushed the js-string-valid-utf16 branch from c8fdfac to d212fac Compare April 3, 2019 15:23
@alexcrichton alexcrichton requested a review from Pauan April 3, 2019 15:23
Copy link
Contributor

@Pauan Pauan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than a few nits, this is looking really good! I like how this turned out.

@alexcrichton alexcrichton force-pushed the js-string-valid-utf16 branch from d212fac to 2452d25 Compare April 4, 2019 20:38
@alexcrichton
Copy link
Contributor Author

👍 Thanks for the thorough review!

Copy link
Contributor

@Pauan Pauan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple more nits, and then this is ready to merge!

This commit aims to address rustwasm#1348 via a number of strategies:

* Documentation is updated to warn about UTF-16 vs UTF-8 problems
  between JS and Rust. Notably documenting that `as_string` and handling
  of arguments is lossy when there are lone surrogates.

* A `JsString::is_valid_utf16` method was added to test whether
  `as_string` is lossless or not.

The intention is that most default behavior of `wasm-bindgen` will
remain, but where necessary bindings will use `JsString` instead of
`str`/`String` and will manually check for `is_valid_utf16` as
necessary. It's also hypothesized that this is relatively rare and not
too performance critical, so an optimized intrinsic for `is_valid_utf16`
is not yet provided.

Closes rustwasm#1348
@alexcrichton alexcrichton force-pushed the js-string-valid-utf16 branch from 2452d25 to 44738e0 Compare April 5, 2019 15:11
@alexcrichton alexcrichton merged commit 16745ed into rustwasm:master Apr 5, 2019
@alexcrichton alexcrichton deleted the js-string-valid-utf16 branch April 5, 2019 15:12
@alexcrichton
Copy link
Contributor Author

Thanks @Pauan!

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants