Improve Copyright Detection #3929

Merged 6 commits on Oct 4, 2024

Conversation

AyanSinhaMahapatra
Member

Tasks

  • Reviewed contribution guidelines
  • PR is descriptively titled 📑 and links the original issue above 🔗
  • Tests pass -- look for a green checkbox ✔️ a few minutes after opening your PR
    Run tests locally to check for errors.
  • Commits are in uniquely-named feature branch and has no merge conflicts 📁
  • Updated documentation pages (if applicable)
  • Updated CHANGELOG.rst (if applicable)

Add a new matcher_order attribute to LicenseMatch and use it for sorting
matches rather than the matcher string.
This way we can ensure that there is a proper precedence between
matchers when two matches cover exactly the same text.

The new sort order for matchers is:
- 0: 1-hash
- 1: 2-aho
- 2: 1-spdx-id
- 3: 3-seq
- 4: 5-undetected
- 5: 5-aho-frag
- 6: 6-unknown

The outcome is that a hash or aho match for the same text at the same
position will take precedence over the SPDX id match, which makes it
possible to curate and correct some incorrect license expressions if needed.
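
The effect of sorting on a numeric order rather than on the matcher string can be sketched as follows. This is only an illustration under assumptions: the `Match` class, its `start`/`end` fields, and the two sort helpers are hypothetical stand-ins, not scancode's actual `LicenseMatch` or its sort key. Only the precedence table is taken from the list above.

```python
from dataclasses import dataclass

# Precedence table copied from the commit message above.
MATCHER_ORDER = {
    "1-hash": 0,
    "2-aho": 1,
    "1-spdx-id": 2,
    "3-seq": 3,
    "5-undetected": 4,
    "5-aho-frag": 5,
    "6-unknown": 6,
}


@dataclass(frozen=True)
class Match:
    # Hypothetical stand-in for illustration, not scancode's LicenseMatch.
    matcher: str
    start: int
    end: int

    @property
    def matcher_order(self) -> int:
        return MATCHER_ORDER[self.matcher]


def sort_by_matcher_string(matches):
    # Old behavior (assumed): ties on position are broken by the matcher string.
    return sorted(matches, key=lambda m: (m.start, m.end, m.matcher))


def sort_by_matcher_order(matches):
    # New behavior (assumed): ties on position are broken by matcher_order.
    return sorted(matches, key=lambda m: (m.start, m.end, m.matcher_order))


# Two matches over exactly the same text span.
matches = [Match("1-spdx-id", 0, 5), Match("2-aho", 0, 5)]

by_string = [m.matcher for m in sort_by_matcher_string(matches)]
by_order = [m.matcher for m in sort_by_matcher_order(matches)]

print(by_string)
print(by_order)
```

With the string key, `1-spdx-id` sorts ahead of `2-aho` because `"1" < "2"`. With the numeric key, the aho match comes first, which is the precedence described above.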

Reference: #3912
Reported-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
And also improve other copyright detections

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Enable CREDITs detection in main authors loop
And detect more copyrights

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
@pombredanne
Member

The Azure CI is flaky and randomly fails with:

E               ERROR: Unknown error:
E               Traceback (most recent call last):
E                 File "/home/vsts/work/1/s/src/scancode/interrupt.py", line 89, in interruptible
E                   create_signal(SIGALRM, handler)
E                 File "/opt/hostedtoolcache/Python/3.9.20/x64/lib/python3.9/signal.py", line 56, in signal
E                   handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
E               ValueError: signal only works in main thread of the main interpreter

This is a heisenbug that only shows up on Azure, so I am going to merge anyway.
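
For context, the `ValueError` in the traceback is standard CPython behavior: `signal.signal()` may only be called from the main thread of the main interpreter. The sketch below reproduces the message independently of scancode. It uses `SIGINT` instead of `SIGALRM` only for portability, and it does not explain why the Azure run ends up installing the handler outside the main thread.

```python
import signal
import threading

errors = []


def install_handler():
    # Installing a signal handler outside the main thread raises ValueError.
    try:
        signal.signal(signal.SIGINT, signal.default_int_handler)
    except ValueError as exc:
        errors.append(str(exc))


worker = threading.Thread(target=install_handler)
worker.start()
worker.join()

# One error message is collected from the worker thread.
print(errors)
```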

@pombredanne pombredanne merged commit 7d0d91a into develop Oct 4, 2024
37 of 39 checks passed
@pombredanne pombredanne deleted the misc-copyrights2 branch October 4, 2024 08:59