What information should the fingerprint be based on? #69

konstin · 2023-09-07T16:31:20Z

We want to generate the right fingerprint values from ruff for integration with GitLab Code Quality (PR). The question is: on what information should the fingerprint be based?

If we take, for example, the following Python code ...

def a(x=[]):
    x.append(1)
    print(x)


def b(y=[]):
    y.append(2)
    print(y)

... and run ruff on it, we get two B006 violations:

$ ruff --select B --show-source scratch.py
scratch.py:1:9: B006 [*] Do not use mutable data structures for argument defaults
  |
1 | def a(x=[]):
  |         ^^ B006
2 |     x.append(1)
3 |     print(x)
  |
  = help: Replace with `None`; initialize within function

scratch.py:6:9: B006 [*] Do not use mutable data structures for argument defaults
  |
6 | def b(y=[]):
  |         ^^ B006
7 |     y.append(2)
8 |     print(y)
  |
  = help: Replace with `None`; initialize within function

How should these be hashed for the fingerprint? If we include only the message and the source of the violation ([]), we get two identical fingerprints. If, on the other hand, we include the line number, the fingerprint will change whenever a line is inserted or removed before the violation.
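One possible middle ground, sketched here only as an illustration and not as what ruff does, is to hash the file path, rule code and source snippet together with an occurrence counter instead of the line number. The function name `fingerprints` and the `(path, rule_code, snippet)` tuple shape are assumptions made up for this example.

```python
import hashlib
from collections import Counter


def fingerprints(violations):
    """Return one hex fingerprint per violation, in input order.

    `violations` is a hypothetical list of (path, rule_code, snippet)
    tuples; this shape is an assumption, not ruff's internal model.
    """
    seen = Counter()
    result = []
    for path, code, snippet in violations:
        key = (path, code, snippet)
        # Number identical violations in order of appearance so that
        # duplicates get distinct hashes without using line numbers.
        occurrence = seen[key]
        seen[key] += 1
        payload = "\0".join([path, code, snippet, str(occurrence)])
        result.append(hashlib.sha256(payload.encode("utf-8")).hexdigest())
    return result


# The two B006 violations from scratch.py share message and source ([]),
# but differ in their occurrence index.
fps = fingerprints([
    ("scratch.py", "B006", "[]"),
    ("scratch.py", "B006", "[]"),
])
print(fps[0] != fps[1])
```

With this scheme, inserting unrelated lines leaves both fingerprints unchanged. The trade-off is that adding or removing an identical violation earlier in the same file shifts the occurrence index of the ones after it.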

pmhahn · 2025-02-13T12:15:08Z

I also stumbled over the CC spec documentation, which is unclear to me:

fingerprint -- Optional. A unique, deterministic identifier for the specific issue being reported to allow a user to exclude it from future analyses.

Is the fingerprint unique per type or per instance? For example, if I have two overlong lines in two different files, should they share one fingerprint for the type "overlong line", or should they get two different fingerprints, as in "overlong line in location 1" and "overlong line in location 2"?

From what I found by reading other generators, it should be the latter, i.e. per instance: both should have the same check_name for the overlong line, but the fingerprints should differ.

Also, is there any limitation on fingerprint? May it be an arbitrarily large string, or is there a length or character-set limitation that would require some kind of hashing?

Clarification on that is appreciated.
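To make the per-instance reading concrete, here is a rough sketch of two report entries that share a check_name but carry different fingerprints. The helper `make_entry`, the rule code E501 and the file names are illustrative assumptions. The field layout follows my understanding of the GitLab Code Quality report format and should be checked against its documentation. Hashing is used so that the fingerprint is a fixed-length hex string whatever the input. Whether the format actually imposes a length or character-set limit is not settled in this thread.

```python
import hashlib
import json


def make_entry(check_name, description, path, line, source_line):
    """Build one report entry (hypothetical helper for illustration)."""
    # Hash check name, path and offending text, but not the line number,
    # so the fingerprint survives unrelated edits above the violation.
    payload = "\0".join([check_name, path, source_line])
    return {
        "description": description,
        "check_name": check_name,
        "fingerprint": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
        "severity": "minor",
        "location": {"path": path, "lines": {"begin": line}},
    }


long_line = "value = compute(" + ", ".join(["arg"] * 30) + ")"
first = make_entry("E501", "Line too long", "a.py", 10, long_line)
second = make_entry("E501", "Line too long", "b.py", 20, long_line)

print(first["check_name"] == second["check_name"])
print(first["fingerprint"] != second["fingerprint"])
print(json.dumps([first, second], indent=2))
```

This sketch does not disambiguate two identical offending lines within the same file; those would still collide and need an extra input to tell them apart.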
