Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Make parser faster #327

Merged
merged 1 commit into from
Jul 10, 2020
Merged

Make parser faster #327

merged 1 commit into from
Jul 10, 2020

Conversation

pocke
Copy link
Member

@pocke pocke commented Jul 10, 2020

This pull request makes the parser faster.

This pull request includes two hacks.

First, it reduces StringScanner#charpos method calls because it is heavy method.
https://github.com/ruby/strscan/blob/d0c82c20c64323c45ee275f09e5a951a291f1ee2/ext/strscan/strscan.c#L442-L453

profiling

$ stackprof stackprof-out -m new_token
RBS::Parser#new_token (/home/pocke/ghq/github.com/ruby/rbs/lib/rbs/parser.rb:138)
  samples:    42 self (20.0%)  /     48 total (22.9%)
  callers:
      48  (  100.0%)  RBS::Parser#next_token
  callees (6 total):
       3  (   50.0%)  RBS::Location#initialize
       3  (   50.0%)  RBS::Parser::LocatedValue#initialize
  code:
    1    (0.5%) /     1   (0.5%)  |   138  | def new_token(type, value = input.matched)
   20    (9.5%) /    20   (9.5%)  |   139  |   start_index = input.charpos - input.matched.size
   21   (10.0%) /    21  (10.0%)  |   140  |   end_index = input.charpos
                                  |   141  | 
    3    (1.4%)                   |   142  |   location = RBS::Location.new(buffer: buffer,
                                  |   143  |                                            start_pos: start_index,
                                  |   144  |                                            end_pos: end_index)
                                  |   145  | 
    3    (1.4%)                   |   146  |   [type, LocatedValue.new(location: location, value: value)]
                                  |   147  | end

(By the way, we need to modify parser.rb to display the line-level profiling by stackprof. module_eval(<<'...end parser.y/module_eval...', __FILE__, __LINE__ + 1))

Second, it replaces StringScanner#charpos with StringScanner#pos if the source only contains ASCII characters.
it improves performance if the content is ASCII only. But parsing will be slow if the content is not ASCII only.
I think most RBS files are ASCII only, so this hack improves performance on the whole.

Benchmark

require 'benchmark'
require 'rbs'
require 'pathname'

contents = Pathname.glob('stdlib/**/*.rbs').map(&:read)
contents_unicode = contents.map { |c| "# 💪\n\n" + c }

Benchmark.bmbm do |x|
  x.report('almost ascii'){10.times{contents.each { |content| RBS::Parser.parse_signature(content)}}}
  x.report('unicode'){10.times{contents_unicode.each { |content| RBS::Parser.parse_signature(content)}}}
end

before

Rehearsal ------------------------------------------------                                                                                                                                                                                                                                                                                                                                                                              
almost ascii   5.267882   0.532476   5.800358 (  5.809570)                                                                                                                                                                                                                                                                                                                                                                              
unicode        6.008536   0.552040   6.560576 (  6.573858)                                                                                                                                                                                                                                                                                                                                                                              -------------------------------------- total: 12.360934sec
                                                                                                          
                   user     system      total        real                                     
almost ascii   5.135175   0.752033   5.887208 (  5.896382)                              
unicode        6.054542   0.442207   6.496749 (  6.509640) 

Reduce charpos calls

This benchmark doesn't include ascii_only? hack.

1.22x faster for almost ascii case, and 1.27x faster for unicode case.

Rehearsal ------------------------------------------------
almost ascii   4.355789   0.516210   4.871999 (  4.879363)
unicode        4.745674   0.511966   5.257640 (  5.266774)
-------------------------------------- total: 10.129639sec

                   user     system      total        real
almost ascii   4.303953   0.505985   4.809938 (  4.817146)
unicode        4.903094   0.209526   5.112620 (  5.121698)

Reduce charpos calls + ascii_only? hack

The result of this pull request.

1.58x fasterfor almost ascii case, and 1.14x faster for unicode case.

Rehearsal ------------------------------------------------
almost ascii   3.704439   0.216363   3.920802 (  3.926681)
unicode        5.144221   0.212918   5.357139 (  5.366537)
--------------------------------------- total: 9.277941sec

                   user     system      total        real
almost ascii   3.607662   0.113144   3.720806 (  3.726305)
unicode        5.156968   0.542275   5.699243 (  5.708651)

Copy link
Member

@soutaro soutaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏 👏 🚀 🚅

@soutaro soutaro merged commit 3da88ed into ruby:master Jul 10, 2020
@pocke pocke deleted the make-parse-faster branch July 10, 2020 14:18
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants