-
-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Support for Ruby 2.4 casecmp? method #100
Comments
casecmp?
method
Great idea! I guess we'll have to update this cop. |
I have also considered this method. However this is a performance cop, it is doubtful that
# bench.rb
require 'benchmark/ips'
Benchmark.ips do |x|
x.report('upcase') { 'String'.upcase == 'string' }
x.report('downcase') { 'String'.downcase == 'string' }
x.report('casecmp') { 'String'.casecmp('string').zero? }
x.report('casecmp?') { 'String'.casecmp?('string') }
x.compare!
end % ruby -v
ruby 2.7.0p0 (2019-12-25 revision 647ee6f091) [x86_64-darwin17]
% ruby bench.rb
Warming up --------------------------------------
upcase 200.689k i/100ms
downcase 197.741k i/100ms
casecmp 225.196k i/100ms
casecmp? 164.628k i/100ms
Calculating -------------------------------------
upcase 4.208M (± 7.5%) i/s - 21.072M in 5.038656s
downcase 4.329M (± 5.1%) i/s - 21.752M in 5.039575s
casecmp 5.091M (± 3.7%) i/s - 25.447M in 5.005460s
casecmp? 3.014M (± 5.9%) i/s - 15.146M in 5.047086s
Comparison:
casecmp: 5090889.2 i/s
downcase: 4328831.1 i/s - 1.18x slower
upcase: 4207647.1 i/s - 1.21x slower
casecmp?: 3013803.1 i/s - 1.69x slower |
## Background `String#casecmp?` method was suggested to RuboCop Performance to use as a fast method. rubocop/rubocop-performance#100 However `String#casecmp?` is always slower than `String#casecmp("arg").zero?`. ## Summary This PR improves `String#casecmp?` and `Symbol#casecmp?` performance when comparing ASCII characters. OTOH, in the case of Non-ASCII characters, performance is slightly degraded by added condition. And I found ruby#1668 which is essentially the same. ## Benchmark The following is a benchmark case for `String#casecmp?` with similar cases. ```ruby require 'benchmark/ips' puts '* ASCII' Benchmark.ips do |x| x.report('upcase') { 'String'.upcase == 'string' } x.report('downcase') { 'String'.downcase == 'string' } x.report('casecmp.zero?') { 'String'.casecmp('string').zero? } x.report('casecmp?') { 'String'.casecmp?('string') } x.compare! end puts '* Non-ASCII' Benchmark.ips do |x| x.report('upcase') { 'äö '.upcase == ('ÄÖÜ') } x.report('downcase') { 'äö '.downcase == ('ÄÖÜ') } x.report('casecmp.zero?') { 'äö '.casecmp('ÄÖÜ').zero? } x.report('casecmp?') { 'äö '.casecmp?('ÄÖÜ') } x.compare! end ``` ### Before ```console * ASCII Warming up -------------------------------------- upcase 200.428k i/100ms downcase 197.503k i/100ms casecmp.zero? 216.244k i/100ms casecmp? 171.257k i/100ms Calculating ------------------------------------- upcase 4.326M (± 2.9%) i/s - 21.646M in 5.008511s downcase 4.350M (± 2.3%) i/s - 21.923M in 5.042694s casecmp.zero? 4.998M (± 3.7%) i/s - 25.084M in 5.025253s casecmp? 3.090M (± 2.9%) i/s - 15.584M in 5.047497s Comparison: casecmp.zero?: 4998357.4 i/s downcase: 4349766.0 i/s - 1.15x slower upcase: 4325765.3 i/s - 1.16x slower casecmp?: 3090194.2 i/s - 1.62x slower * Non-ASCII Warming up -------------------------------------- upcase 137.352k i/100ms downcase 136.735k i/100ms casecmp.zero? 206.341k i/100ms casecmp? 96.326k i/100ms Calculating ------------------------------------- upcase 2.215M (± 2.7%) i/s - 11.126M in 5.027582s downcase 2.199M (± 2.5%) i/s - 11.076M in 5.038781s casecmp.zero? 4.709M (± 1.8%) i/s - 23.729M in 5.041362s casecmp? 1.315M (± 1.6%) i/s - 6.646M in 5.054479s Comparison: casecmp.zero?: 4708502.7 i/s upcase: 2214590.5 i/s - 2.13x slower downcase: 2199478.9 i/s - 2.14x slower casecmp?: 1315326.8 i/s - 3.58x slower ``` ### After `casecmp?` performance for ASCII characters will be improved. ```console * ASCII Warming up -------------------------------------- upcase 203.640k i/100ms downcase 203.605k i/100ms casecmp.zero? 217.033k i/100ms casecmp? 227.628k i/100ms Calculating ------------------------------------- upcase 4.284M (± 1.4%) i/s - 21.586M in 5.039916s downcase 4.228M (± 1.6%) i/s - 21.175M in 5.009956s casecmp.zero? 5.202M (± 2.7%) i/s - 26.044M in 5.010598s casecmp? 5.308M (± 7.4%) i/s - 26.405M in 5.005934s Comparison: casecmp?: 5308087.4 i/s casecmp.zero?: 5201808.4 i/s - same-ish: difference falls within error upcase: 4283916.1 i/s - 1.24x slower downcase: 4227698.3 i/s - 1.26x slower * Non-ASCII Warming up -------------------------------------- upcase 134.896k i/100ms downcase 133.699k i/100ms casecmp.zero? 204.155k i/100ms casecmp? 90.405k i/100ms Calculating ------------------------------------- upcase 2.160M (± 5.7%) i/s - 10.792M in 5.013980s downcase 2.013M (±12.2%) i/s - 9.894M in 4.999781s casecmp.zero? 4.457M (± 4.8%) i/s - 22.253M in 5.004385s casecmp? 1.257M (± 3.2%) i/s - 6.328M in 5.038371s Comparison: casecmp.zero?: 4457430.6 i/s upcase: 2160130.1 i/s - 2.06x slower downcase: 2012931.9 i/s - 2.21x slower casecmp?: 1257370.8 i/s - 3.55x slower ``` `Symbol#casecmp?` have similar results. ## Other Information This is an off-topic for this PR. I inform RuboCop Performance that `String#casecmp` and `String#casecmp?` behave differently when using Non-ASCII characters. ```ruby 'äöü'.casecmp('ÄÖÜ').zero? #=> false 'äöü'.casecmp?('ÄÖÜ') #=> true ```
## Background `String#casecmp?` method was suggested to RuboCop Performance to use as a fast method. rubocop/rubocop-performance#100 However `String#casecmp?` is always slower than `String#casecmp("arg").zero?`. ## Summary This PR improves `String#casecmp?` and `Symbol#casecmp?` performance when comparing ASCII characters. OTOH, in the case of Non-ASCII characters, performance is slightly degraded by added condition. And I found ruby#1668 which is essentially the same. ## Benchmark The following is a benchmark case for `String#casecmp?` with similar cases. ```ruby require 'benchmark/ips' puts '* ASCII' Benchmark.ips do |x| x.report('upcase') { 'String'.upcase == 'string' } x.report('downcase') { 'String'.downcase == 'string' } x.report('casecmp.zero?') { 'String'.casecmp('string').zero? } x.report('casecmp?') { 'String'.casecmp?('string') } x.compare! end puts '* Non-ASCII' Benchmark.ips do |x| x.report('upcase') { 'äö '.upcase == ('ÄÖÜ') } x.report('downcase') { 'äö '.downcase == ('ÄÖÜ') } x.report('casecmp.zero?') { 'äö '.casecmp('ÄÖÜ').zero? } x.report('casecmp?') { 'äö '.casecmp?('ÄÖÜ') } x.compare! end ``` ### Before ```console * ASCII Warming up -------------------------------------- upcase 200.428k i/100ms downcase 197.503k i/100ms casecmp.zero? 216.244k i/100ms casecmp? 171.257k i/100ms Calculating ------------------------------------- upcase 4.326M (± 2.9%) i/s - 21.646M in 5.008511s downcase 4.350M (± 2.3%) i/s - 21.923M in 5.042694s casecmp.zero? 4.998M (± 3.7%) i/s - 25.084M in 5.025253s casecmp? 3.090M (± 2.9%) i/s - 15.584M in 5.047497s Comparison: casecmp.zero?: 4998357.4 i/s downcase: 4349766.0 i/s - 1.15x slower upcase: 4325765.3 i/s - 1.16x slower casecmp?: 3090194.2 i/s - 1.62x slower * Non-ASCII Warming up -------------------------------------- upcase 137.352k i/100ms downcase 136.735k i/100ms casecmp.zero? 206.341k i/100ms casecmp? 96.326k i/100ms Calculating ------------------------------------- upcase 2.215M (± 2.7%) i/s - 11.126M in 5.027582s downcase 2.199M (± 2.5%) i/s - 11.076M in 5.038781s casecmp.zero? 4.709M (± 1.8%) i/s - 23.729M in 5.041362s casecmp? 1.315M (± 1.6%) i/s - 6.646M in 5.054479s Comparison: casecmp.zero?: 4708502.7 i/s upcase: 2214590.5 i/s - 2.13x slower downcase: 2199478.9 i/s - 2.14x slower casecmp?: 1315326.8 i/s - 3.58x slower ``` ### After `casecmp?` performance for ASCII characters will be improved. ```console * ASCII Warming up -------------------------------------- upcase 203.640k i/100ms downcase 203.605k i/100ms casecmp.zero? 217.033k i/100ms casecmp? 227.628k i/100ms Calculating ------------------------------------- upcase 4.284M (± 1.4%) i/s - 21.586M in 5.039916s downcase 4.228M (± 1.6%) i/s - 21.175M in 5.009956s casecmp.zero? 5.202M (± 2.7%) i/s - 26.044M in 5.010598s casecmp? 5.308M (± 7.4%) i/s - 26.405M in 5.005934s Comparison: casecmp?: 5308087.4 i/s casecmp.zero?: 5201808.4 i/s - same-ish: difference falls within error upcase: 4283916.1 i/s - 1.24x slower downcase: 4227698.3 i/s - 1.26x slower * Non-ASCII Warming up -------------------------------------- upcase 134.896k i/100ms downcase 133.699k i/100ms casecmp.zero? 204.155k i/100ms casecmp? 90.405k i/100ms Calculating ------------------------------------- upcase 2.160M (± 5.7%) i/s - 10.792M in 5.013980s downcase 2.013M (±12.2%) i/s - 9.894M in 4.999781s casecmp.zero? 4.457M (± 4.8%) i/s - 22.253M in 5.004385s casecmp? 1.257M (± 3.2%) i/s - 6.328M in 5.038371s Comparison: casecmp.zero?: 4457430.6 i/s upcase: 2160130.1 i/s - 2.06x slower downcase: 2012931.9 i/s - 2.21x slower casecmp?: 1257370.8 i/s - 3.55x slower ``` `Symbol#casecmp?` have similar results. ## Other Information This is an off-topic for this PR. I inform RuboCop Performance that `String#casecmp` and `String#casecmp?` behave differently when using Non-ASCII characters. ```ruby 'äöü'.casecmp('ÄÖÜ').zero? #=> false 'äöü'.casecmp?('ÄÖÜ') #=> true ```
## Background `String#casecmp?` method was suggested to RuboCop Performance to use as a fast method. rubocop/rubocop-performance#100 However `String#casecmp?` is always slower than `String#casecmp("arg").zero?`. ## Summary This PR improves `String#casecmp?` and `Symbol#casecmp?` performance when comparing ASCII characters. OTOH, in the case of Non-ASCII characters, performance is slightly degraded by added condition. And I found ruby#1668 which is essentially the same. ## Benchmark The following is a benchmark case for `String#casecmp?` with similar cases. ```ruby require 'benchmark/ips' puts '* ASCII' Benchmark.ips do |x| x.report('upcase') { 'String'.upcase == 'string' } x.report('downcase') { 'String'.downcase == 'string' } x.report('casecmp.zero?') { 'String'.casecmp('string').zero? } x.report('casecmp?') { 'String'.casecmp?('string') } x.compare! end puts '* Non-ASCII' Benchmark.ips do |x| x.report('upcase') { 'äö '.upcase == ('ÄÖÜ') } x.report('downcase') { 'äö '.downcase == ('ÄÖÜ') } x.report('casecmp.zero?') { 'äö '.casecmp('ÄÖÜ').zero? } x.report('casecmp?') { 'äö '.casecmp?('ÄÖÜ') } x.compare! end ``` ### Before ```console * ASCII Warming up -------------------------------------- upcase 200.428k i/100ms downcase 197.503k i/100ms casecmp.zero? 216.244k i/100ms casecmp? 171.257k i/100ms Calculating ------------------------------------- upcase 4.326M (± 2.9%) i/s - 21.646M in 5.008511s downcase 4.350M (± 2.3%) i/s - 21.923M in 5.042694s casecmp.zero? 4.998M (± 3.7%) i/s - 25.084M in 5.025253s casecmp? 3.090M (± 2.9%) i/s - 15.584M in 5.047497s Comparison: casecmp.zero?: 4998357.4 i/s downcase: 4349766.0 i/s - 1.15x slower upcase: 4325765.3 i/s - 1.16x slower casecmp?: 3090194.2 i/s - 1.62x slower * Non-ASCII Warming up -------------------------------------- upcase 137.352k i/100ms downcase 136.735k i/100ms casecmp.zero? 206.341k i/100ms casecmp? 96.326k i/100ms Calculating ------------------------------------- upcase 2.215M (± 2.7%) i/s - 11.126M in 5.027582s downcase 2.199M (± 2.5%) i/s - 11.076M in 5.038781s casecmp.zero? 4.709M (± 1.8%) i/s - 23.729M in 5.041362s casecmp? 1.315M (± 1.6%) i/s - 6.646M in 5.054479s Comparison: casecmp.zero?: 4708502.7 i/s upcase: 2214590.5 i/s - 2.13x slower downcase: 2199478.9 i/s - 2.14x slower casecmp?: 1315326.8 i/s - 3.58x slower ``` ### After `casecmp?` performance for ASCII characters will be improved. ```console * ASCII Warming up -------------------------------------- upcase 206.368k i/100ms downcase 205.971k i/100ms casecmp.zero? 225.636k i/100ms casecmp? 232.030k i/100ms Calculating ------------------------------------- upcase 4.337M (± 1.3%) i/s - 21.875M in 5.045187s downcase 4.297M (± 1.5%) i/s - 21.627M in 5.033750s casecmp.zero? 5.240M (± 1.4%) i/s - 26.399M in 5.038613s casecmp? 5.548M (± 1.5%) i/s - 27.844M in 5.019895s Comparison: casecmp?: 5547954.3 i/s casecmp.zero?: 5240466.7 i/s - 1.06x slower upcase: 4336522.3 i/s - 1.28x slower downcase: 4297370.8 i/s - 1.29x slower ### Non-ASCII Warming up -------------------------------------- upcase 141.841k i/100ms downcase 141.233k i/100ms casecmp.zero? 215.401k i/100ms casecmp? 94.625k i/100ms Calculating ------------------------------------- upcase 2.238M (± 2.1%) i/s - 11.205M in 5.010247s downcase 2.243M (± 1.3%) i/s - 11.299M in 5.038967s casecmp.zero? 4.611M (± 2.6%) i/s - 23.048M in 5.001970s casecmp? 1.255M (± 2.4%) i/s - 6.340M in 5.055611s Comparison: casecmp.zero?: 4610838.3 i/s downcase: 2242621.9 i/s - 2.06x slower upcase: 2237553.1 i/s - 2.06x slower casecmp?: 1254739.1 i/s - 3.67x slower ``` `Symbol#casecmp?` have similar results. ## Other Information This is an off-topic for this PR. I inform RuboCop Performance that `String#casecmp` and `String#casecmp?` behave differently when using Non-ASCII characters. ```ruby 'äöü'.casecmp('ÄÖÜ').zero? #=> false 'äöü'.casecmp?('ÄÖÜ') #=> true ```
I've reported about this cop in 2017: rubocop/rubocop#4277 (comment)
|
And there are no actions in |
Funnily enough, str = "Straße"
str2 = str.upcase
# => "STRASSE"
str.downcase == str2.downcase
# => false
# ...because...
[str.downcase, str2.downcase]
# => ["straße", "strasse"]
# ...so, you might want to...
[str.downcase(:fold), str2.downcase(:fold)]
# => ["strasse", "strasse"]
# ...but you can just
str.casecmp?(str2)
# => true So, it seems that:
|
@zverok good case. Can we use |
@AlexWayfer counter-example would be EXACTLY the same :) German's "lowercase sharp s" ("ß") generally uppercased as "SS", but "uppercase sharp s" still exists (used for avoiding ambiguity in uppercase passport data, for example). So, peculiarly... "STRASSE" == "STRAẞE"
# => false
"STRASSE".upcase == "STRAẞE".upcase
# => false
"STRASSE".casecmp?("STRAẞE")
# => true Generally, case folding is a "right" thing to do for case-insensitive comparison, and it is complicated, and there is no "simple shortcut" around it, that's why we need it. |
Wow, it's interesting. These chars look similar in GitHub, but not in my terminal: Thank you. Then, if Ruby (and other languages, I've checked) works so, let's move this cop as suggested by @zverok. |
Is your feature request related to a problem? Please describe.
The suggestion for
casecmp
can be updated to use the newer Ruby 2.4 methodcasecmp?
instead ofcasecmp.zero?
which requires the user to know the contract for thecasecmp
method. The?
variation returns truthy or falsy values and is more succinct.Example:
str.casecmp('ABC').zero?
vs.str.casecmp?('ABC')
Describe the solution you'd like
Would it be possible to include
casecmp?
as a potentialGood
alternative in the casecmp cop?Describe alternatives you've considered
Removing the
.zero?
variation completely might not be preferable if we are supporting versions of Ruby before 2.4.Additional context
N/A
The text was updated successfully, but these errors were encountered: