Optimize IOSource#read_until method (#210)
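## Why?
`IOSource#read_until` called `encode(term)` on every invocation. The result of `encode(term)` can be cached, so each distinct `term` is now encoded only once per source object.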
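
As a rough illustration of the caching idea, here is a minimal sketch of hash-based memoization with `||=`. The class `TermEncodingCache` and its method names are hypothetical stand-ins written for this example; they are not REXML's actual classes, and the real change lives in `lib/rexml/source.rb`.

```ruby
# Hypothetical stand-in for a source object; not REXML's real implementation.
class TermEncodingCache
  attr_reader :encode_calls

  def initialize(encoding)
    @encoding = encoding
    @cache = {}        # maps term => term transcoded to @encoding
    @encode_calls = 0  # counts how often the expensive path actually runs
  end

  # Returns the transcoded term, computing it at most once per distinct term.
  def encoded_term(term)
    @cache[term] ||= encode(term)
  end

  private

  # The operation being cached: transcode the terminator string.
  def encode(term)
    @encode_calls += 1
    term.encode(@encoding)
  end
end

cache = TermEncodingCache.new("UTF-16LE")
3.times { cache.encoded_term(">") }  # only the first call transcodes
cache.encoded_term("?>")             # a new term, so one more transcode
puts cache.encode_calls
```

In this sketch the cache is keyed by the term alone, so an entry stored under one target encoding would be stale if the encoding changed afterwards; the sketch sidesteps that by fixing the encoding at construction time.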

## Benchmark

```
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/naitoh/.rbenv/versions/3.3.4/bin/ruby -v -S benchmark-driver /Users/naitoh/ghq/github.com/naitoh/rexml/benchmark/parse.yaml
ruby 3.3.4 (2024-07-09 revision be1089c8ec) [arm64-darwin22]
Calculating -------------------------------------
                         before       after  before(YJIT)  after(YJIT)
                 dom     17.546      18.512        32.282       32.306 i/s -     100.000 times in 5.699323s 5.402026s 3.097658s 3.095448s
                 sax     25.435      28.294        47.526       50.074 i/s -     100.000 times in 3.931613s 3.534310s 2.104122s 1.997057s
                pull     29.471      31.870        54.400       57.554 i/s -     100.000 times in 3.393211s 3.137793s 1.838222s 1.737494s
              stream     29.169      31.153        51.613       52.898 i/s -     100.000 times in 3.428318s 3.209941s 1.937508s 1.890424s

Comparison:
                              dom
         after(YJIT):        32.3 i/s
        before(YJIT):        32.3 i/s - 1.00x  slower
               after:        18.5 i/s - 1.75x  slower
              before:        17.5 i/s - 1.84x  slower

                              sax
         after(YJIT):        50.1 i/s
        before(YJIT):        47.5 i/s - 1.05x  slower
               after:        28.3 i/s - 1.77x  slower
              before:        25.4 i/s - 1.97x  slower

                             pull
         after(YJIT):        57.6 i/s
        before(YJIT):        54.4 i/s - 1.06x  slower
               after:        31.9 i/s - 1.81x  slower
              before:        29.5 i/s - 1.95x  slower

                           stream
         after(YJIT):        52.9 i/s
        before(YJIT):        51.6 i/s - 1.02x  slower
               after:        31.2 i/s - 1.70x  slower
              before:        29.2 i/s - 1.81x  slower

```

- YJIT=ON : 1.00x - 1.06x faster
- YJIT=OFF : 1.05x - 1.11x faster
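
The speedup ranges above can be re-derived from the i/s columns of the benchmark table. The following is a small sketch that does the division; the `RESULTS` constant is a name chosen for this example, and its values are copied by hand from the table.

```ruby
# i/s figures copied from the benchmark table:
# [before, after, before(YJIT), after(YJIT)]
RESULTS = {
  "dom"    => [17.546, 18.512, 32.282, 32.306],
  "sax"    => [25.435, 28.294, 47.526, 50.074],
  "pull"   => [29.471, 31.870, 54.400, 57.554],
  "stream" => [29.169, 31.153, 51.613, 52.898],
}.freeze

# Speedup is "after" throughput divided by "before" throughput.
RESULTS.each do |name, (before, after, before_yjit, after_yjit)|
  printf("%-6s  YJIT=OFF: %.3fx  YJIT=ON: %.3fx\n",
         name, after / before, after_yjit / before_yjit)
end
```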
naitoh authored Oct 9, 2024
1 parent 622011f commit 1d0c362
Showing 2 changed files with 36 additions and 1 deletion.
3 changes: 2 additions & 1 deletion lib/rexml/source.rb
@@ -77,6 +77,7 @@ def initialize(arg, encoding=nil)
detect_encoding
end
@line = 0
+ @term_encord = {}
end

# The current buffer (what we're going to read next)
@@ -227,7 +228,7 @@ def read(term = nil, min_bytes = 1)

def read_until(term)
pattern = Private::PRE_DEFINED_TERM_PATTERNS[term] || /#{Regexp.escape(term)}/
- term = encode(term)
+ term = @term_encord[term] ||= encode(term)
until str = @scanner.scan_until(pattern)
break if @source.nil?
break if @source.eof?
34 changes: 34 additions & 0 deletions test/test_document.rb
@@ -403,6 +403,40 @@ def test_utf_16
assert_equal(expected_xml, actual_xml)
end
end

class ReadUntilTest < Test::Unit::TestCase
def test_utf_8
xml = <<-EOX.force_encoding("ASCII-8BIT")
<?xml version="1.0" encoding="UTF-8"?>
<message testing=">">Hello world!</message>
EOX
document = REXML::Document.new(xml)
assert_equal("UTF-8", document.encoding)
assert_equal(">", REXML::XPath.match(document, "/message")[0].attribute("testing").value)
end

def test_utf_16le
xml = <<-EOX.encode("UTF-16LE").force_encoding("ASCII-8BIT")
<?xml version="1.0" encoding="UTF-16"?>
<message testing=">">Hello world!</message>
EOX
bom = "\ufeff".encode("UTF-16LE").force_encoding("ASCII-8BIT")
document = REXML::Document.new(bom + xml)
assert_equal("UTF-16", document.encoding)
assert_equal(">", REXML::XPath.match(document, "/message")[0].attribute("testing").value)
end

def test_utf_16be
xml = <<-EOX.encode("UTF-16BE").force_encoding("ASCII-8BIT")
<?xml version="1.0" encoding="UTF-16"?>
<message testing=">">Hello world!</message>
EOX
bom = "\ufeff".encode("UTF-16BE").force_encoding("ASCII-8BIT")
document = REXML::Document.new(bom + xml)
assert_equal("UTF-16", document.encoding)
assert_equal(">", REXML::XPath.match(document, "/message")[0].attribute("testing").value)
end
end
end
end
end