Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Fix a bug that Stream parser doesn't expand the user-defined entity references for "text" #200

Merged
merged 5 commits into from
Aug 21, 2024

Conversation

naitoh
Copy link
Contributor

@naitoh naitoh commented Aug 20, 2024

Why?

Pull parser expands character references and predefined entity references, but doesn't expand user-defined entity references.

Change

  • text_stream_unnormalize.rb
$LOAD_PATH.unshift(File.expand_path("lib"))
require 'rexml/document'
require 'rexml/parsers/sax2parser'
require 'rexml/parsers/pullparser'
require 'rexml/parsers/streamparser'
require 'rexml/streamlistener'

xml = <<EOS
<!DOCTYPE foo [
  <!ENTITY la "1234">
  <!ENTITY lala "--&la;--">
  <!ENTITY lalal "&la;&la;">
]><root><la>&la;</la><lala>&lala;</lala><a>&lt;P&gt; &lt;I&gt; &lt;B&gt; Text &lt;/B&gt; &lt;/I&gt;</a><b>test&#8482;</b></root>
EOS

class StListener
  include REXML::StreamListener

  def text(text)
    puts text
  end
end

puts "REXML(DOM)"
REXML::Document.new(xml).elements.each("/root/*") {|element| puts element.text}

puts ""
puts "REXML(Pull)"
parser = REXML::Parsers::PullParser.new(xml)
while parser.has_next?
  event = parser.pull
  case event.event_type
  when :text
    puts event[1]
  end
end

puts ""
puts "REXML(Stream)"
parser = REXML::Parsers::StreamParser.new(xml, StListener.new).parse

puts ""
puts "REXML(SAX)"
sax = REXML::Parsers::SAX2Parser.new(xml)
sax.listen(:characters) {|x| puts x }
sax.parse

Before (master)

$ ruby text_stream_unnormalize.rb
REXML(DOM)
1234
--1234--
<P> <I> <B> Text </B> </I>
test™

REXML(Pull)
1234
--1234--
<P> <I> <B> Text </B> </I>
test™

REXML(Stream)
&la;      #<= This
&lala;    #<= This
<P> <I> <B> Text </B> </I>
test™

REXML(SAX)
1234
--1234--
<P> <I> <B> Text </B> </I>
test™

After(This PR)

$ ruby text_stream_unnormalize.rb
REXML(DOM)
1234
--1234--
<P> <I> <B> Text </B> </I>
test™

REXML(Pull)
1234
--1234--
<P> <I> <B> Text </B> </I>
test™

REXML(Stream)
1234
--1234--
<P> <I> <B> Text </B> </I>
test™

REXML(SAX)
1234
--1234--
<P> <I> <B> Text </B> </I>
test™

naitoh added 4 commits August 21, 2024 07:41
…eferences for "text"

## Why?
Pull parser expands character references and predefined entity references, but doesn't expand user-defined entity references.

## Change
- text_stream_unnormalize.rb
```
$LOAD_PATH.unshift(File.expand_path("lib"))
require 'rexml/document'
require 'rexml/parsers/sax2parser'
require 'rexml/parsers/pullparser'
require 'rexml/parsers/streamparser'
require 'rexml/streamlistener'

xml = <<EOS
<!DOCTYPE foo [
  <!ENTITY la "1234">
  <!ENTITY lala "--&la;--">
  <!ENTITY lalal "&la;&la;">
]><root><la>&la;</la><lala>&lala;</lala><a>&lt;P&gt; &lt;I&gt; &lt;B&gt; Text &lt;/B&gt; &lt;/I&gt;</a><b>test&#8482;</b></root>
EOS

class StListener
  include REXML::StreamListener

  def text(text)
    puts text
  end
end

puts "REXML(DOM)"
REXML::Document.new(xml).elements.each("/root/*") {|element| puts element.text}

puts ""
puts "REXML(Pull)"
parser = REXML::Parsers::PullParser.new(xml)
while parser.has_next?
  event = parser.pull
  case event.event_type
  when :text
    puts event[1]
  end
end

puts ""
puts "REXML(Stream)"
parser = REXML::Parsers::StreamParser.new(xml, StListener.new).parse

puts ""
puts "REXML(SAX)"
sax = REXML::Parsers::SAX2Parser.new(xml)
sax.listen(:characters) {|x| puts x }
sax.parse
```

## Before (master)
```
$ ruby  text_stream_unnormalize.rb
REXML(DOM)
1234
--1234--
<P> <I> <B> Text </B> </I>
test™

REXML(Pull)
1234
--1234--
<P> <I> <B> Text </B> </I>
test™

REXML(Stream)
&la;           #<= This
&lala;         #<= This
<P> <I> <B> Text </B> </I>
test™

REXML(SAX)
1234
--1234--
<P> <I> <B> Text </B> </I>
test™
```

After(This PR)

```
$ ruby  text_stream_unnormalize.rb
REXML(DOM)
1234
--1234--
<P> <I> <B> Text </B> </I>
test™

REXML(Pull)
1234
--1234--
<P> <I> <B> Text </B> </I>
test™

REXML(Stream)
1234
--1234--
<P> <I> <B> Text </B> </I>
test™

REXML(SAX)
1234
--1234--
<P> <I> <B> Text </B> </I>
test™
```
## Why?

See:
- ruby#187
- ruby#195

## Change
- Supported `REXML::Security.entity_expansion_limit=` in Stream parser
- Supported `REXML::Security.entity_expansion_text_limit=` in Stream parser
## Why?
Because `StreamParser#entity_expansion_count` was added.
@naitoh naitoh marked this pull request as ready for review August 20, 2024 23:04
test/test_stream.rb Outdated Show resolved Hide resolved
@naitoh naitoh requested a review from kou August 21, 2024 00:54
@kou kou merged commit 6109e01 into ruby:master Aug 21, 2024
61 checks passed
@kou
Copy link
Member

kou commented Aug 21, 2024

Thanks.

@naitoh naitoh deleted the fix_stream_text_unnormalize branch August 21, 2024 07:10
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants