Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Stop adding extra new line after XML declaration with pretty format #164

Merged
merged 1 commit into from
Jul 11, 2024

Conversation

naitoh
Copy link
Contributor

@naitoh naitoh commented Jul 9, 2024

If the XML file does not end with a newline, a space is added to the end of the first line.

Failure: test_indent(REXMLTests::TestDocument::WriteTest::ArgumentsTest)
/Users/naitoh/ghq/github.com/naitoh/rexml/test/test_document.rb:270:in `test_indent'
     267:           output = ""
     268:           indent = 2
     269:           @document.write(output, indent)
  => 270:           assert_equal(<<-EOX.chomp, output)
     271: <?xml version='1.0' encoding='UTF-8'?>
     272: <message>
     273:   Hello world!
<"<?xml version='1.0' encoding='UTF-8'?>\n" +
"<message>\n" +
"  Hello world!\n" +
"</message>"> expected but was
<"<?xml version='1.0' encoding='UTF-8'?> \n" +
"<message>\n" +
"  Hello world!\n" +
"</message>">

diff:
? <?xml version='1.0' encoding='UTF-8'?>
  <message>
    Hello world!
  </message>

This is happen because REXML::Formatters::Pretty#write_document has a logic that depends on the last text node.

We should ignore all top-level text nodes with pretty format.

@@ -111,7 +111,7 @@ def write_document( node, output )
# itself, then we don't need a carriage return... which makes this
# logic more complex.
node.children.each { |child|
next if child == node.children[-1] and child.instance_of?(Text)
next if child == "\n" and child.instance_of?(Text)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Does this work with the following XML?

<?xml version="1.0" encoding="UTF-8"?>

<message>Hello world!</message>

It seems that we can ignore all top-level text nodes:

diff --git a/lib/rexml/formatters/pretty.rb b/lib/rexml/formatters/pretty.rb
index a1198b7..a838d83 100644
--- a/lib/rexml/formatters/pretty.rb
+++ b/lib/rexml/formatters/pretty.rb
@@ -111,7 +111,7 @@ module REXML
         # itself, then we don't need a carriage return... which makes this
         # logic more complex.
         node.children.each { |child|
-          next if child == node.children[-1] and child.instance_of?(Text)
+          next if child.instance_of?(Text)
           unless child == node.children[0] or child.instance_of?(Text) or
             (child == node.children[1] and !node.children[0].writethis)
             output << "\n"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the right correction, too.

If next if child.instance_of?(Text), then the following XML works.

<?xml version="1.0" encoding="UTF-8"?>

<message>Hello world!</message>

Copy link

@Fryguy Fryguy Jul 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@naitoh Interestingly if you have top-level text nodes that are not an ?xml node, then you end up with the strange behavior below where it outputs a leading blank line.

XXX<message>Hello world!</message>
d = REXML::Document.new("XXX<message>Hello world!</message>")
d.to_s
# => "XXX<message>Hello world!</message>"
d.write(out = '', 2)
out
# => "\n<message>\n  Hello world!\n</message>"

Note that I think this leading "XXX" is invalid XML anyway, but is parsed successfully with REXML (Nokogiri fails in STRICT mode with this). I wonder if you also need to fail on text nodes before the root node similar to how you did it in #161 for those after the root node?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if you also need to fail on text nodes before the root node similar to how you did it in #161 for those after the root node?

XXX<message>Hello world!</message>

This is invalid XML and I think should be an error.

@kou
What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's open a new issue and discuss this on the new issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have created #184 which fixes the above.

## Why?
Fix REXML::Formatters::Pretty#write_document, which implicitly assumes that the XML file ends with a newline.

If the XML file does not end with a newline, a space is added to the end of the first line.

```
Failure: test_indent(REXMLTests::TestDocument::WriteTest::ArgumentsTest)
/Users/naitoh/ghq/github.com/naitoh/rexml/test/test_document.rb:270:in `test_indent'
     267:           output = ""
     268:           indent = 2
     269:           @document.write(output, indent)
  => 270:           assert_equal(<<-EOX.chomp, output)
     271: <?xml version='1.0' encoding='UTF-8'?>
     272: <message>
     273:   Hello world!
<"<?xml version='1.0' encoding='UTF-8'?>\n" +
"<message>\n" +
"  Hello world!\n" +
"</message>"> expected but was
<"<?xml version='1.0' encoding='UTF-8'?> \n" +
"<message>\n" +
"  Hello world!\n" +
"</message>">

diff:
? <?xml version='1.0' encoding='UTF-8'?>
  <message>
    Hello world!
  </message>

```
@naitoh naitoh force-pushed the fix_formatters_pretty branch from df89ee2 to 01162bd Compare July 10, 2024 12:01
@naitoh naitoh requested a review from kou July 10, 2024 12:19
@kou kou changed the title fix REXML::Formatters::Pretty#write_document Stop adding extra new line after XML declaration by REXML::Formatters::Pretty#write_document Jul 11, 2024
@kou kou changed the title Stop adding extra new line after XML declaration by REXML::Formatters::Pretty#write_document Stop adding extra new line after XML declaration with pretty format Jul 11, 2024
@kou kou merged commit 5e140ed into ruby:master Jul 11, 2024
61 checks passed
@naitoh naitoh deleted the fix_formatters_pretty branch July 11, 2024 01:00
aduth added a commit to 18F/identity-idp that referenced this pull request Jul 17, 2024
aduth added a commit to 18F/identity-idp that referenced this pull request Jul 17, 2024
* Bump rexml to resolve security advisory

changelog: Internal, Dependencies, Update dependencies to resolve security advisories

* Add rexml as explicit dependency

Since we use it in our code, it should be an explicit dependency

See: https://github.com/18F/identity-idp/blob/ea8a6081961d6c373a870dd5fea31efce89fde7e/app/services/proofing/aamva/request/verification_request.rb#L60-L102

* Sync AAMVA fixture to expected output

Likely a result of ruby/rexml#164
kou pushed a commit that referenced this pull request Jul 22, 2024
…fore root element (#184)

## Why?

XML with content at the start of the document is invalid.

https://www.w3.org/TR/2006/REC-xml11-20060816/#document

```
[1]   document   ::=   ( prolog element Misc* ) - ( Char* RestrictedChar Char* )
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-prolog

```
[22]   	prolog	   ::=   	XMLDecl Misc* (doctypedecl Misc*)?
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-XMLDecl
```
[23]   	XMLDecl	   ::=   	'<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-Misc

```
[27]   	Misc	   ::=   	Comment | PI | S
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-PI

```
[16]   	PI	   ::=   	'<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-PITarget

```
[17]   	PITarget	   ::=   	Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-doctypedecl
```
[28]   	doctypedecl	   ::=   	'<!DOCTYPE' S Name (S ExternalID)? S? ('[' intSubset ']' S?)? '>'
```

See: #164 (comment)
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants