Skip to content

TextDecoder should not output BOM #25315

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
joyeecheung opened this issue Jan 2, 2019 · 4 comments
Closed

TextDecoder should not output BOM #25315

joyeecheung opened this issue Jan 2, 2019 · 4 comments
Labels
confirmed-bug Issues with confirmed bugs. encoding Issues and PRs related to the TextEncoder and TextDecoder APIs.

Comments

@joyeecheung
Copy link
Member

Found when porting encoding WPT into core.

Spec: https://encoding.spec.whatwg.org/#concept-td-serialize

If encoding is UTF-8, UTF-16BE, or UTF-16LE, and ignore BOM flag and BOM seen flag are unset, then:

  • If token is U+FEFF, then set BOM seen flag.
  • Otherwise, if token is not end-of-stream, then set BOM seen flag and append token to output.
  • Otherwise, return output.

Failing tests:

@joyeecheung joyeecheung added the encoding Issues and PRs related to the TextEncoder and TextDecoder APIs. label Jan 2, 2019
@joyeecheung
Copy link
Member Author

cc @nodejs/i18n

@joyeecheung
Copy link
Member Author

Sorry, pinged the wrong team @nodejs/intl

@joyeecheung joyeecheung added the confirmed-bug Issues with confirmed bugs. label Jan 2, 2019
@Xstoudi
Copy link
Contributor

Xstoudi commented Oct 24, 2019

PR merged, should be close.

@targos
Copy link
Member

targos commented Oct 24, 2019

PR was merged, but the bug still exists. Both failing tests are skipped:

"textdecoder-byte-order-marks.any.js": {
"fail": "Mismatching BOM should not be ignored"
},
"textdecoder-copy.any.js": {
"fail": "Should not have output BOM: https://encoding.spec.whatwg.org/#concept-td-serialize"
},

addaleax added a commit to addaleax/node that referenced this issue Nov 1, 2019
Do not accept the BOM if it comes from a different encoding, and
only discard the BOM after it has actually been read (including
when it is spread over multiple chunks in streaming mode).

Fixes: nodejs#25315
MylesBorins pushed a commit that referenced this issue Nov 17, 2019
Do not accept the BOM if it comes from a different encoding, and
only discard the BOM after it has actually been read (including
when it is spread over multiple chunks in streaming mode).

Fixes: #25315

PR-URL: #30132
Reviewed-By: Gus Caplan <me@gus.host>
targos pushed a commit that referenced this issue Dec 1, 2019
Do not accept the BOM if it comes from a different encoding, and
only discard the BOM after it has actually been read (including
when it is spread over multiple chunks in streaming mode).

Fixes: #25315

PR-URL: #30132
Reviewed-By: Gus Caplan <me@gus.host>
MylesBorins pushed a commit that referenced this issue Dec 17, 2019
Do not accept the BOM if it comes from a different encoding, and
only discard the BOM after it has actually been read (including
when it is spread over multiple chunks in streaming mode).

Fixes: #25315

PR-URL: #30132
Reviewed-By: Gus Caplan <me@gus.host>
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
confirmed-bug Issues with confirmed bugs. encoding Issues and PRs related to the TextEncoder and TextDecoder APIs.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants