[bug] Nokogiri::XML::Reader.from_io.each misidentifies character encoding? #2882
Comments
@koshigoe Thank you for reporting this! This error message is being generated by libxml2. I have reproduced the issue and will investigate.
Git bisect shows that this is the commit that introduced the new behavior: https://gitlab.gnome.org/GNOME/libxml2/-/commit/3582b07bd24d438be7dd08ab57e3f9e635373e32
I've narrowed this down to specific changes in libxml2 chunk parsing that may be a bug. I'll open an issue upstream and link to it here.
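As a general illustration of why chunked parsing is sensitive to character encoding (this is not taken from the libxml2 change itself, and the thread does not confirm that this is the exact mechanism of the bug), here is a minimal Ruby sketch. It shows that a valid UTF-8 byte stream can look invalid when it is cut in the middle of a multi-byte character, so a parser that validates each chunk in isolation could report an encoding error on valid input. The sample text and split offsets are arbitrary assumptions.

```ruby
# Four copies of "é" (U+00E9): each is 2 bytes in UTF-8, so 8 bytes total.
text = "\u00E9" * 4
bytes = text.b # binary (ASCII-8BIT) copy of the same bytes

# Split after byte 3, which falls in the middle of the second character.
chunks = [bytes.byteslice(0, 3), bytes.byteslice(3, 5)]

# Validate each chunk on its own, then the rejoined stream.
chunk_validity = chunks.map do |chunk|
  chunk.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end
joined_valid = chunks.join.force_encoding(Encoding::UTF_8).valid_encoding?

puts "chunks valid on their own: #{chunk_validity.inspect}"
puts "rejoined stream valid:     #{joined_valid}"
```

A chunk-aware parser has to carry an incomplete trailing sequence over to the next chunk before judging validity.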
Neat! This was already reported upstream at https://gitlab.gnome.org/GNOME/libxml2/-/issues/542 and was fixed about an hour ago in https://gitlab.gnome.org/GNOME/libxml2/-/commit/e0f3016f71297314502a3620a301d7e064cbb612. I expect the fix to ship shortly in a libxml2 release. I'll leave this open until that happens and I can ship a new nokogiri release.
libxml2 v2.11.4 is out with the fix: https://gitlab.gnome.org/GNOME/libxml2/-/releases/v2.11.4. I'll try to get a release out in the next day.
Nokogiri v1.15.1 is out with this upstream fix: https://github.com/sparklemotion/nokogiri/releases/tag/v1.15.1
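For readers who want to check whether their installation is at or past the releases named above, a minimal sketch follows. The helper name `at_least?` is hypothetical. The Nokogiri part is skipped when the gem is not installed, and the libxml2 check only runs where `Nokogiri::LIBXML_LOADED_VERSION` is defined (it is not available on JRuby).

```ruby
require "rubygems"

# Hypothetical helper: true when `installed` is at least `required`.
def at_least?(installed, required)
  Gem::Version.new(installed) >= Gem::Version.new(required)
end

begin
  require "nokogiri"
rescue LoadError
  warn "nokogiri is not installed; nothing to check"
else
  puts "nokogiri #{Nokogiri::VERSION} >= 1.15.1? " \
       "#{at_least?(Nokogiri::VERSION, '1.15.1')}"

  if defined?(Nokogiri::LIBXML_LOADED_VERSION)
    loaded = Nokogiri::LIBXML_LOADED_VERSION
    puts "libxml2 #{loaded} >= 2.11.4? #{at_least?(loaded, '2.11.4')}"
  end
end
```

Note that this only tells you whether the installation is at or past the fixed releases; the thread does not state which earlier versions were affected.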
Please describe the bug
Nokogiri::XML::Reader.from_io.each raises Nokogiri::XML::SyntaxError when an XML node contains a long run of non-ASCII characters. The node contains only valid UTF-8 characters, but parsing fails with the error:
FATAL: Input is not proper UTF-8, indicate encoding !
Help us reproduce what you're seeing
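The original reproduction script is not preserved here. The following is a minimal sketch of what one might look like, based only on the description above: a single text node holding a long run of valid multi-byte UTF-8 characters, read through Nokogiri::XML::Reader.from_io. The helper `build_xml`, the character used, and the repeat count are all assumptions and may need adjusting to trigger the error on an affected version. The reader step is skipped if nokogiri is not installed.

```ruby
require "stringio"

# Hypothetical helper: build a small XML document whose single text node
# holds `count` copies of a multi-byte UTF-8 character.
def build_xml(count, char = "\u3042")
  %(<?xml version="1.0" encoding="UTF-8"?>\n) +
    "<root><item>#{char * count}</item></root>\n"
end

# The repeat count is an arbitrary guess at "long".
xml = build_xml(20_000)

begin
  require "nokogiri"
rescue LoadError
  warn "nokogiri is not installed; skipping the reader step"
else
  reader = Nokogiri::XML::Reader.from_io(StringIO.new(xml))
  begin
    # Walk every node; an affected version is expected to raise here.
    reader.each { |node| node.name }
    puts "parsed without error"
  rescue Nokogiri::XML::SyntaxError => e
    puts "SyntaxError: #{e.message}"
  end
end
```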
Expected behavior
No error should be raised.
Environment
Additional context