What information should the fingerprint be based on? #69

Open
konstin opened this issue Sep 7, 2023 · 1 comment

konstin commented Sep 7, 2023

We want ruff to generate the right fingerprint values for its integration with GitLab Code Quality (PR). The question is: what information should the fingerprint be based on?

If we take, for example, the following Python code ...

def a(x=[]):
    x.append(1)
    print(x)


def b(y=[]):
    y.append(2)
    print(y)

... and run ruff on it, we get two B006 violations:

$ ruff --select B --show-source scratch.py
scratch.py:1:9: B006 [*] Do not use mutable data structures for argument defaults
  |
1 | def a(x=[]):
  |         ^^ B006
2 |     x.append(1)
3 |     print(x)
  |
  = help: Replace with `None`; initialize within function

scratch.py:6:9: B006 [*] Do not use mutable data structures for argument defaults
  |
6 | def b(y=[]):
  |         ^^ B006
7 |     y.append(2)
8 |     print(y)
  |
  = help: Replace with `None`; initialize within function

How should these be hashed for the fingerprint? If we include only the message and the source of the violation ([]), we get two identical fingerprints. If, on the other hand, we include the line number, the fingerprint will change whenever a line is inserted or removed before the violation.
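To make the trade-off concrete, here is a minimal sketch of both candidate schemes applied to the two B006 violations above. The `fingerprint` helper, the NUL separator, and the choice of SHA-256 are assumptions for illustration only, not what ruff does.

```python
import hashlib


def fingerprint(*parts: object) -> str:
    # Hypothetical helper: join the parts with a NUL separator and hash them.
    joined = "\0".join(str(p) for p in parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


MESSAGE = "Do not use mutable data structures for argument defaults"

# (path, rule code, message, flagged source text, line) for the two violations.
violation_a = ("scratch.py", "B006", MESSAGE, "[]", 1)
violation_b = ("scratch.py", "B006", MESSAGE, "[]", 6)

# Scheme 1: hash everything except the line number.
# Both violations produce the same fingerprint.
without_line_a = fingerprint(*violation_a[:4])
without_line_b = fingerprint(*violation_b[:4])
collides = without_line_a == without_line_b

# Scheme 2: also hash the line number.
# The fingerprints are now distinct ...
with_line_a = fingerprint(*violation_a)
with_line_b = fingerprint(*violation_b)

# ... but inserting a single line above `def b` moves it from line 6 to 7,
# which changes its fingerprint even though the violation itself is unchanged.
shifted_b = fingerprint(*violation_b[:4], 7)
unstable = shifted_b != with_line_b

print(f"scheme 1 collides: {collides}")
print(f"scheme 2 unstable under line shifts: {unstable}")
```

Neither scheme is satisfactory on its own: the first cannot tell the two violations apart, and the second reports an unchanged violation as new after any unrelated edit above it.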

pmhahn commented Feb 13, 2025

I also stumbled over the CC spec documentation, which is unclear to me:

fingerprint -- Optional. A unique, deterministic identifier for the specific issue being reported to allow a user to exclude it from future analyses.

Is the fingerprint unique per type or per instance? For example, if I have two overlong lines in two different files, should they share one fingerprint for the type "overlong line", or should they get two different fingerprints, as in "overlong line in location 1" and "overlong line in location 2"?

From what I found by reading other generators, it should be the latter, i.e. per instance: both should have the same check_name for the overlong line, but their fingerprints should differ.
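A rough sketch of what that per-instance reading would look like for two overlong lines. The field names follow my understanding of the GitLab Code Quality report format; the file names, the check name `line-too-long`, and the way the fingerprint is derived are all made up for illustration.

```python
import hashlib
import json


def make_entry(path: str, line: int, check_name: str, description: str) -> dict:
    # Hypothetical per-instance fingerprint: hash the check name together
    # with the location, so each occurrence gets its own value.
    key = f"{check_name}\0{path}\0{line}"
    return {
        "description": description,
        "check_name": check_name,
        "fingerprint": hashlib.sha256(key.encode("utf-8")).hexdigest(),
        "location": {"path": path, "lines": {"begin": line}},
    }


# Two overlong lines in two different (made-up) files.
entries = [
    make_entry("a.py", 10, "line-too-long", "Line too long"),
    make_entry("b.py", 42, "line-too-long", "Line too long"),
]

# Same check_name, different fingerprint per occurrence.
report = json.dumps(entries, indent=2)
print(report)
```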

Also, is there any limitation on the fingerprint? May it be an arbitrarily large string, or is there a length or character-set limitation that would require some kind of hashing?
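I don't know whether such a limit exists. If it did, hashing would sidestep it, since a digest has a fixed length and a restricted character set regardless of the input. A small sketch, with SHA-256 and the helper name chosen arbitrarily:

```python
import hashlib
import string


def hashed_fingerprint(raw: str) -> str:
    # Hypothetical helper: reduce an arbitrary string to a hex digest.
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


short_fp = hashed_fingerprint("E501")
# A very long input containing non-ASCII characters.
long_fp = hashed_fingerprint("scratch.py:" + "x" * 10_000 + " überlang")

# Both digests have the same length and consist only of hex digits.
same_length = len(short_fp) == len(long_fp)
only_hex = all(c in string.hexdigits for c in short_fp + long_fp)

print(len(short_fp), same_length, only_hex)
```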

Clarification on that is appreciated.
