fix: Extra content at the end of the document #161
## Why?

XML with multiple root elements is invalid.

See: ruby#160 (comment)
641d9d1 to 4e9de51
## Why?

XML declaration must be the first item.

https://www.w3.org/TR/2006/REC-xml11-20060816/#document

```
[1] document ::= ( prolog element Misc* ) - ( Char* RestrictedChar Char* )
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-prolog

```
[22] prolog ::= XMLDecl Misc* (doctypedecl Misc*)?
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-XMLDecl

```
[23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
```

See: ruby#161 (comment)
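To illustrate the grammar quoted above: because `prolog` starts with `XMLDecl`, an XML declaration can only appear at offset 0 of the document. The following is a rough sketch of that rule, not REXML's implementation. The helper names `xml_declaration_offsets` and `misplaced_xml_declaration?` are hypothetical, and the check is deliberately naive: it does not skip comments or CDATA sections, so it is only meant to show the idea.

```ruby
# Hypothetical helpers for illustration only; REXML does not define them.
# Collect the character offsets of every "<?xml" followed by whitespace.
def xml_declaration_offsets(document)
  offsets = []
  document.scan(/<\?xml(?=[ \t\r\n])/) do
    offsets << Regexp.last_match.begin(0)
  end
  offsets
end

# An XML declaration is misplaced when it starts anywhere but offset 0.
# Assumption: comments and CDATA sections are not handled by this sketch.
def misplaced_xml_declaration?(document)
  xml_declaration_offsets(document).any? { |offset| offset > 0 }
end

puts misplaced_xml_declaration?(%(<?xml version="1.0"?><a/>))
puts misplaced_xml_declaration?(%(<a/><?xml version="1.0"?>))
```

Note that even leading whitespace before the declaration counts as misplaced under this reading, since nothing may precede `XMLDecl` in the `prolog` production.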
lib/rexml/parsers/baseparser.rb
Outdated
            return [ :start_element, tag, attributes ]
          end
        else
          text = @source.read_until("<")
          if text.chomp!("<")
            @source.position -= "<".bytesize
          end
          if @tags.empty? and @have_root
            if text.strip != ""
`strip` allocates a new string. Can we avoid it?
For example: `/\A\s*\z/.match?(text)`
OK.
I see.
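The suggestion in this review thread can be sketched as follows. The two helper names are hypothetical and exist only for comparison; the point is that `String#strip` builds a new string just to test it for emptiness, while `Regexp#match?` answers the same question without allocating a stripped copy or a `MatchData` object.

```ruby
# Hypothetical helper names; neither is defined by REXML.
BLANK_PATTERN = /\A\s*\z/

# Allocates a stripped copy of text only to compare it with "".
def blank_with_strip?(text)
  text.strip == ""
end

# Tests the same condition without creating a new string.
def blank_with_match?(text)
  BLANK_PATTERN.match?(text)
end

["", "  \n\t", " a ", "\r\n"].each do |sample|
  puts "#{sample.inspect}: strip=#{blank_with_strip?(sample)} match?=#{blank_with_match?(sample)}"
end
```

Since the condition under review is `text.strip != ""`, the allocation-free form would be the negation, `!/\A\s*\z/.match?(text)`.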
## Why?

XML with additional content at the end of the document is invalid.

https://www.w3.org/TR/2006/REC-xml11-20060816/#document

```
[1] document ::= ( prolog element Misc* ) - ( Char* RestrictedChar Char* )
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-Misc

```
[27] Misc ::= Comment | PI | S
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-PI

```
[16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-PITarget

```
[17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
```
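As a rough illustration of the productions quoted above, the content after the root element may consist only of `Misc` items: comments, processing instructions whose target is not `xml`, and whitespace. The sketch below is not the code in this PR. `only_misc?` is a hypothetical helper, and the patterns are ASCII-only approximations of `Comment`, `PI`, and `S` rather than the full `Name` and `Char` definitions from the specification.

```ruby
require "strscan"

# Approximations of the spec productions (assumption: ASCII names only).
SPACE_PATTERN   = /[ \t\r\n]+/
COMMENT_PATTERN = /<!--(?:[^-]|-[^-])*-->/
# The negative lookahead rejects a target that is exactly "xml" in any case.
PI_PATTERN = /<\?(?![xX][mM][lL](?:\s|\?>))[A-Za-z_:][\w:.\-]*(?:\s+(?:(?!\?>).)*)?\?>/m

# Hypothetical helper: true when the text after the root element is Misc* only.
def only_misc?(trailing)
  scanner = StringScanner.new(trailing)
  until scanner.eos?
    matched = scanner.skip(SPACE_PATTERN) ||
              scanner.skip(COMMENT_PATTERN) ||
              scanner.skip(PI_PATTERN)
    return false unless matched
  end
  true
end

puts only_misc?("\n<!-- done -->\n<?app flush?>")
puts only_misc?("<b/>")
```

A scanner is used instead of one large anchored regular expression so that each item is consumed exactly once, which keeps a later `?>` from being absorbed into an earlier processing instruction through backtracking.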
4e9de51 to c094825
Thanks.
@naitoh @kou After this change, parsing of
The main cause is this if statement at lib/rexml/parsers/baseparser.rb:498
@kou Well, I use a socket to get XML messages from the server and parse them using PullParser. Each message is complete and valid. Before this change it worked like a charm. Now it doesn't work anymore.
OK. Could you open a new issue for it?