-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Stop adding extra new line after XML declaration with pretty format #164
Conversation
lib/rexml/formatters/pretty.rb
Outdated
@@ -111,7 +111,7 @@ def write_document( node, output ) | |||
# itself, then we don't need a carriage return... which makes this | |||
# logic more complex. | |||
node.children.each { |child| | |||
next if child == node.children[-1] and child.instance_of?(Text) | |||
next if child == "\n" and child.instance_of?(Text) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Does this work with the following XML?
<?xml version="1.0" encoding="UTF-8"?>
<message>Hello world!</message>
It seems that we can ignore all top-level text nodes:
diff --git a/lib/rexml/formatters/pretty.rb b/lib/rexml/formatters/pretty.rb
index a1198b7..a838d83 100644
--- a/lib/rexml/formatters/pretty.rb
+++ b/lib/rexml/formatters/pretty.rb
@@ -111,7 +111,7 @@ module REXML
# itself, then we don't need a carriage return... which makes this
# logic more complex.
node.children.each { |child|
- next if child == node.children[-1] and child.instance_of?(Text)
+ next if child.instance_of?(Text)
unless child == node.children[0] or child.instance_of?(Text) or
(child == node.children[1] and !node.children[0].writethis)
output << "\n"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the right correction, too.
If next if child.instance_of?(Text)
, then the following XML works.
<?xml version="1.0" encoding="UTF-8"?>
<message>Hello world!</message>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@naitoh Interestingly if you have top-level text nodes that are not an ?xml
node, then you end up with the strange behavior below where it outputs a leading blank line.
XXX<message>Hello world!</message>
d = REXML::Document.new("XXX<message>Hello world!</message>")
d.to_s
# => "XXX<message>Hello world!</message>"
d.write(out = '', 2)
out
# => "\n<message>\n Hello world!\n</message>"
Note that I think this leading "XXX" is invalid XML anyway, but is parsed successfully with REXML (Nokogiri fails in STRICT mode with this). I wonder if you also need to fail on text nodes before the root node similar to how you did it in #161 for those after the root node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's open a new issue and discuss this on the new issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have created #184 which fixes the above.
## Why? Fix REXML::Formatters::Pretty#write_document, which implicitly assumes that the XML file ends with a newline. If the XML file does not end with a newline, a space is added to the end of the first line. ``` Failure: test_indent(REXMLTests::TestDocument::WriteTest::ArgumentsTest) /Users/naitoh/ghq/github.com/naitoh/rexml/test/test_document.rb:270:in `test_indent' 267: output = "" 268: indent = 2 269: @document.write(output, indent) => 270: assert_equal(<<-EOX.chomp, output) 271: <?xml version='1.0' encoding='UTF-8'?> 272: <message> 273: Hello world! <"<?xml version='1.0' encoding='UTF-8'?>\n" + "<message>\n" + " Hello world!\n" + "</message>"> expected but was <"<?xml version='1.0' encoding='UTF-8'?> \n" + "<message>\n" + " Hello world!\n" + "</message>"> diff: ? <?xml version='1.0' encoding='UTF-8'?> <message> Hello world! </message> ```
df89ee2
to
01162bd
Compare
Likely a result of ruby/rexml#164
* Bump rexml to resolve security advisory changelog: Internal, Dependencies, Update dependencies to resolve security advisories * Add rexml as explicit dependency Since we use it in our code, it should be an explicit dependency See: https://github.com/18F/identity-idp/blob/ea8a6081961d6c373a870dd5fea31efce89fde7e/app/services/proofing/aamva/request/verification_request.rb#L60-L102 * Sync AAMVA fixture to expected output Likely a result of ruby/rexml#164
…fore root element (#184) ## Why? XML with content at the start of the document is invalid. https://www.w3.org/TR/2006/REC-xml11-20060816/#document ``` [1] document ::= ( prolog element Misc* ) - ( Char* RestrictedChar Char* ) ``` https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-prolog ``` [22] prolog ::= XMLDecl Misc* (doctypedecl Misc*)? ``` https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-XMLDecl ``` [23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>' ``` https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-Misc ``` [27] Misc ::= Comment | PI | S ``` https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-PI ``` [16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>' ``` https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-PITarget ``` [17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l')) ``` https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-doctypedecl ``` [28] doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' intSubset ']' S?)? '>' ``` See: #164 (comment)
If the XML file does not end with a newline, a space is added to the end of the first line.
This is happen because
REXML::Formatters::Pretty#write_document
has a logic that depends on the last text node.We should ignore all top-level text nodes with pretty format.