Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Add Lint/Casecmp Cop #9753

Open
parkerfinch opened this issue Apr 29, 2021 · 0 comments
Open

Add Lint/Casecmp Cop #9753

parkerfinch opened this issue Apr 29, 2021 · 0 comments

Comments

@parkerfinch
Copy link
Contributor

Background

There has been continuous confusion around the Performance/Casecmp cop.

Right now the Performance/Casecmp cop suggests changing case-insensitive string comparisons from e.g. str.downcase == 'foo' to str.casecmp('foo').zero?.

A problem is that these are not necessarily equivalent, since #casecmp works only on ASCII characters.

# Note that this has a "LATIN SMALL LETTER SHARP S"
str1 = 'straße'

# And this one has a "LATIN CAPITAL LETTER SHARP S"
str2 = 'STRAẞE'

# String#casecmp fails to compare them correctly
str1.casecmp(str2).zero?
=> false

# String#downcase does compare them correctly
str1.downcase == str2.downcase
=> true

This is a known issue, which is why the Performance/Casecmp cop was marked as unsafe.

But there's another issue at play here, with String#downcase and String#upcase not round-tripping successfully and giving incorrect results for some unicode characters.

str1.downcase == str1.upcase.downcase
=> false

The String#casecmp? method performs case folding and compares the strings "correctly":

str1.casecmp?(str1.upcase.downcase)
=> true

A downside of the #casecmp? method is that it is slower than the #lower or the #casecmp approaches, which is why it hasn't been adopted in the performance cop. However, it does seem to be more correct, which is why I agree with @zverok that it should be a Linting cop.

Describe the solution you'd like

I would like to adopt the approach that @zverok suggested here and, instead of having a performance cop, make this a linting cop.

We can separately remove the performance cop (since it changes a program and would conflict with this one). (I'm happy to do that over in rubocop-performance if it sounds good).

We can create a new cop (called Lint/Casecmp, or maybe Lint/CasecmpP to prevent name clashes with the Performance cop) to suggest using casecmp? instead of the #downcase or #upcase approaches.

Let me know if this sounds like a good approach! I am happy to make a PR for this if it will move forward.

Describe alternatives you've considered

One alternative would be to consider casecmp(...).zero? a violation and suggest that it is changed to casecmp?(...). This could lead to confusing conflicts if people have a version of rubocop-performance with the Performance/Casecmp cop and the new Lint/Casecmp cop, so I think it makes sense to consider casecmp(...).zero? to be acceptable. (It's also the fastest approach when dealing with ASCII-only text, so there are many valid use cases for it.)

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant