Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Separate IOSource#ensure_buffer from IOSource#match. #118

Merged
merged 1 commit into from
Mar 6, 2024

Conversation

naitoh
Copy link
Contributor

@naitoh naitoh commented Mar 5, 2024

Why?

It would affect performance to do a read check in IOSource#match every time, Separate read processing from IOSource#ensure_buffer.

Use IOSource#ensure_buffer in the following cases where @source.buffer is empty.

  1. at the start of pull_event
  2. If a trailing '>' pattern matches, as in @source.match(/\s*>/um).

Benchmark

RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/naitoh/.rbenv/versions/3.3.0/bin/ruby -v -S benchmark-driver /Users/naitoh/ghq/github.com/naitoh/rexml/benchmark/parse.yaml
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin22]
Calculating -------------------------------------
                         before       after  before(YJIT)  after(YJIT)
                 dom     10.278      10.986        16.430       16.941 i/s -     100.000 times in 9.729858s 9.102574s 6.086579s 5.902885s
                 sax     30.166      30.496        49.851       51.596 i/s -     100.000 times in 3.315008s 3.279069s 2.005961s 1.938123s
                pull     35.459      36.380        60.266       63.134 i/s -     100.000 times in 2.820181s 2.748745s 1.659301s 1.583928s
              stream     33.762      34.636        55.173       55.859 i/s -     100.000 times in 2.961948s 2.887131s 1.812485s 1.790218s

Comparison:
                              dom
         after(YJIT):        16.9 i/s
        before(YJIT):        16.4 i/s - 1.03x  slower
               after:        11.0 i/s - 1.54x  slower
              before:        10.3 i/s - 1.65x  slower

                              sax
         after(YJIT):        51.6 i/s
        before(YJIT):        49.9 i/s - 1.04x  slower
               after:        30.5 i/s - 1.69x  slower
              before:        30.2 i/s - 1.71x  slower

                             pull
         after(YJIT):        63.1 i/s
        before(YJIT):        60.3 i/s - 1.05x  slower
               after:        36.4 i/s - 1.74x  slower
              before:        35.5 i/s - 1.78x  slower

                           stream
         after(YJIT):        55.9 i/s
        before(YJIT):        55.2 i/s - 1.01x  slower
               after:        34.6 i/s - 1.61x  slower
              before:        33.8 i/s - 1.65x  slower

  • YJIT=ON : 1.01x - 1.05x faster
  • YJIT=OFF : 1.01x - 1.06x faster

@kou
Copy link
Member

kou commented Mar 6, 2024

Can we give it better name?
I feel that next_read doesn't imply "read if buffer is empty".

ensure_buffered?

## Why?

It would affect performance to do a read check in `IOSource#match` every time,
Separate read processing from IOSource#ensure_buffer.

Use `IOSource#ensure_buffer` in the following cases:

1. at the start of pull_event
2. after a trailing '>' pattern such as `@source.match(/\s*>/um`)

## Benchmark

```
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/naitoh/.rbenv/versions/3.3.0/bin/ruby -v -S benchmark-driver /Users/naitoh/ghq/github.com/naitoh/rexml/benchmark/parse.yaml
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin22]
Calculating -------------------------------------
                         before       after  before(YJIT)  after(YJIT)
                 dom     10.278      10.986        16.430       16.941 i/s -     100.000 times in 9.729858s 9.102574s 6.086579s 5.902885s
                 sax     30.166      30.496        49.851       51.596 i/s -     100.000 times in 3.315008s 3.279069s 2.005961s 1.938123s
                pull     35.459      36.380        60.266       63.134 i/s -     100.000 times in 2.820181s 2.748745s 1.659301s 1.583928s
              stream     33.762      34.636        55.173       55.859 i/s -     100.000 times in 2.961948s 2.887131s 1.812485s 1.790218s

Comparison:
                              dom
         after(YJIT):        16.9 i/s
        before(YJIT):        16.4 i/s - 1.03x  slower
               after:        11.0 i/s - 1.54x  slower
              before:        10.3 i/s - 1.65x  slower

                              sax
         after(YJIT):        51.6 i/s
        before(YJIT):        49.9 i/s - 1.04x  slower
               after:        30.5 i/s - 1.69x  slower
              before:        30.2 i/s - 1.71x  slower

                             pull
         after(YJIT):        63.1 i/s
        before(YJIT):        60.3 i/s - 1.05x  slower
               after:        36.4 i/s - 1.74x  slower
              before:        35.5 i/s - 1.78x  slower

                           stream
         after(YJIT):        55.9 i/s
        before(YJIT):        55.2 i/s - 1.01x  slower
               after:        34.6 i/s - 1.61x  slower
              before:        33.8 i/s - 1.65x  slower

```

- YJIT=ON : 1.01x - 1.05x faster
- YJIT=OFF : 1.01x - 1.06x faster
@naitoh naitoh changed the title Separate IOSource#next_read from IOSource#match. Separate IOSource#ensure_buffer from IOSource#match. Mar 6, 2024
@naitoh
Copy link
Contributor Author

naitoh commented Mar 6, 2024

@kou

Can we give it better name? I feel that next_read doesn't imply "read if buffer is empty".

ensure_buffered?

I changed it to ensure_buffer.
What do you think?

@kou
Copy link
Member

kou commented Mar 6, 2024

OK. Let's use it.

@kou kou merged commit 77cb0dc into ruby:master Mar 6, 2024
39 checks passed
@naitoh naitoh deleted the add_next_read branch March 6, 2024 22:12
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants