ParseException message cannot be retrieved when XML contains invalid characters #29
Could you show a Ruby script and XML that reproduce this problem?
My XML file contains invalidly encoded characters. Part of the XML file is:
My Ruby script is simple:
The XML file is UTF-8 encoded, and I know it contains invalid characters. After I load the XML file, Ruby raises a `<REXML::ParseException: #<ArgumentError: invalid byte sequence in UTF-8>` exception, and I can't get the exact error information from the exception message. If I temporarily change line 32 of the ParseException `to_s` method to use UTF-8 like this:
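Here's a very simple reproduction of this bug (the Base64 encoding is just there to make sure the special characters in the string come through):

```ruby
require 'rexml/document'
require 'base64'
include REXML

begin
  REXML::Document.new(Base64.decode64("YT08YSDigIs+4oCL\n"))
  # Equivalent to (U+200B is ZERO WIDTH SPACE):
  # REXML::Document.new "a=<a \u200B>\u200B"
rescue => e
  # On affected REXML versions, building the message itself raises
  # Encoding::CompatibilityError instead of returning the parse error text.
  e.to_s
end
```

The input is invalid XML and rightly triggers a `REXML::ParseException`. It looks like this is a bug in the following line of `ParseException#to_s`:

`err << @source.buffer[0..80].force_encoding("ASCII-8BIT").gsub(/\n/, ' ')`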
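The encoding clash can be reproduced without REXML. The sketch below mimics the line above with a hypothetical helper, `append_binary_context`, and placeholder message strings; it is an illustration, not REXML's code. Ruby lets an ASCII-only UTF-8 string absorb ASCII-8BIT bytes (the result simply becomes ASCII-8BIT), but raises `Encoding::CompatibilityError` once both sides contain non-ASCII bytes, which is consistent with the error appearing only when the tag itself contains Unicode characters.

```ruby
# Hypothetical helper that mimics the buggy pattern: append a copy of the
# input buffer after forcing it to ASCII-8BIT (binary).
def append_binary_context(message, buffer)
  message.dup << buffer.dup.force_encoding(Encoding::ASCII_8BIT)
end

# The decoded input from the reproduction, with the U+200B characters spelled out.
buffer = "a=<a \u200B>\u200B"

# ASCII-only UTF-8 message: Ruby accepts the binary bytes and the
# result's encoding becomes ASCII-8BIT.
ok = append_binary_context("parse error near 'a'", buffer)
p ok.encoding == Encoding::BINARY # true

# Message containing a non-ASCII character: both sides now hold
# non-ASCII bytes in different encodings, so concatenation raises.
begin
  append_binary_context("parse error near 'a \u200B'", buffer)
rescue Encoding::CompatibilityError => e
  puts "#{e.class}: #{e.message}"
end
```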
…etrieved if the error content contained Unicode characters.

## Why?

If the XML tag contains Unicode characters when the error occurs, an `Encoding::CompatibilityError: incompatible character encodings: UTF-8 and ASCII-8BIT` exception is raised, so the ParseException error message cannot be retrieved.

See: ruby#29
…alid encoding XML (#123)

## Why?

If the XML tag contains Unicode characters and an error occurs for that tag, an incompatible encoding error is raised, because our parse exception message has a UTF-8 part (that includes the target tag information) and an ASCII-8BIT part (that includes the error context input).

Fix GH-29

Reported by DuKewu. Thanks!!!
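One way to keep the two parts compatible is to bring the context into the message's encoding before concatenating. The sketch below is an assumption, not the change merged in #123: the hypothetical `message_with_context` helper reinterprets the context bytes as UTF-8 and replaces invalid sequences with `?` via `String#scrub`, so the combined message stays valid UTF-8 as long as the message part itself is valid UTF-8.

```ruby
# Hypothetical helper (not REXML's code). Assumes `message` is valid UTF-8.
def message_with_context(message, buffer)
  # Take the same [0..80] slice as the original line, copy it as raw bytes,
  # and reinterpret those bytes as UTF-8.
  context = buffer[0..80].b.force_encoding(Encoding::UTF_8)
  # Replace invalid byte sequences with "?" and flatten newlines.
  context = context.scrub("?").gsub(/\n/, " ")
  "#{message}\nContext: #{context}"
end

msg = message_with_context("parse error near 'a \u200B'", "a=<a \u200B>\u200B")
p msg.encoding == Encoding::UTF_8 # true
p msg.valid_encoding?             # true

# Invalid bytes in the input no longer break message construction.
puts message_with_context("parse error", "a\xFF\nb") # context shows "a? b"
```

Forcing both parts to ASCII-8BIT would also avoid the exception; scrubbing to UTF-8 is shown here because the result remains ordinary UTF-8 text.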
I get the following backtrace when I load the XML:
The XML encoding is UTF-8 and the file contains invalid characters, but `ParseException#to_s` uses ASCII-8BIT encoding, so `to_s` itself raises an encoding exception and the user never sees the actual error information about the XML.
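As a workaround on affected versions, the offending bytes can be located without relying on the exception message at all. The following sketch is not from the thread; the helper `first_invalid_utf8_offset` and the file name `data.xml` are hypothetical. It reinterprets the raw bytes as UTF-8 and walks the characters until it reaches one that is not valid.

```ruby
# Hypothetical helper, not part of REXML: returns the byte offset of the first
# invalid UTF-8 sequence in `raw`, or nil if the data is valid UTF-8.
def first_invalid_utf8_offset(raw)
  str = raw.dup.force_encoding(Encoding::UTF_8)
  return nil if str.valid_encoding?

  offset = 0
  str.each_char do |ch|
    # In a broken string, each_char yields an invalid byte as its own
    # one-byte "character", for which valid_encoding? is false.
    return offset unless ch.valid_encoding?
    offset += ch.bytesize
  end
  nil
end

# Usage (the file name is an assumption):
#   raw = File.binread("data.xml")
#   if (offset = first_invalid_utf8_offset(raw))
#     puts "invalid UTF-8 at byte #{offset}: #{raw.byteslice(offset, 16).inspect}"
#   end
```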