License not picked up for binaries like java (openjdk), node (nodejs) #2765

Open
mithunms333 opened this issue Apr 10, 2024 · 10 comments
Labels
enhancement New feature or request

Comments

@mithunms333

mithunms333 commented Apr 10, 2024

What happened:
I ran a syft scan on a container image that contains Java binaries extracted from an OpenJDK tarball (downloaded from GitHub, Adoptium) rather than installed as RPM Linux packages. The SBOM JSON (SPDX, CycloneDX) lists the binary component with the name 'java' and its correct version and location, but its license is not picked up. There is a LICENSE file at the path: '.../openjdk/legal/java.base/LICENSE'.
I believe the issue is the same for binaries of all types, whether Java or Node.js, and from all GitHub projects/vendors.

Additional trial details:
I also tried the following ideas, but they didn't work:
I went through the syft source file 'syft/internal/licenses/list.go' and, following its list of names, placed renamed copies of the LICENSE file in several folders, hoping that syft would pick up one of the names in one of them. The folders included:
'.../openjdk/legal/java.base/LICENSE'
'.../openjdk/bin/LICENSE'
'.../openjdk/LICENSE'

These trials did not succeed.

What you expected to happen:
The license value GPLv2+ should have been picked up and included in the JSON SBOM files, but it was not. The SPDX license fields show 'NOASSERTION'.

Steps to reproduce the issue:
Create a simple Linux image by downloading the OpenJDK tar binaries from GitHub (Adoptium). Then run a syft scan on it, generating SPDX or CycloneDX JSON output. Check the license field values for that component in the generated SBOM file.

Anything else we need to know?:
I believe the issue is the same for binaries of all types, whether Java or Node.js, and from all GitHub projects/vendors.

Environment:

  • Output of syft version:
    Application: syft
    Version: 0.99.0
    BuildDate: 2023-12-21T16:18:46Z
    GitCommit: 3cffa0b
    GitDescription: v0.99.0
    Platform: linux/amd64
    GoVersion: go1.21.5
    Compiler: gc
  • OS (e.g: cat /etc/os-release or similar): RHEL 8.9 / UBI minimal - series 8 or 9 - any.
@mithunms333 mithunms333 added the bug Something isn't working label Apr 10, 2024
@spiffcs spiffcs added the enhancement New feature or request label Apr 11, 2024
@willmurphyscode willmurphyscode removed the bug Something isn't working label Apr 11, 2024
@mithunms333
Author

Dear team,
I understand that this issue is tagged as an enhancement. Until that change is delivered, I still need syft to pick up the license names in my processes. Is there any change or manual workaround I can apply at my end to overcome this, such as a specific folder where I should keep the LICENSE file for these downloaded binaries so that syft picks it up? That would be very helpful!

@tgerla
Contributor

tgerla commented Apr 18, 2024

Hi @mithunms333, unfortunately we don't have a ready workaround for you in this case. We are discussing some improvements to the binary catalogers and how to handle some special cases like the JDK and JRE. We do have another issue discussing a possible framework for "hints" that would give you some tools to customize the output of the SBOM on a per-cataloger basis: #31

I will go ahead and keep this issue open for you until we have a resolution, and if you need anything else please feel free to open another issue.

@tgerla tgerla moved this to Ready in OSS Apr 18, 2024
@kzantow
Contributor

kzantow commented Apr 18, 2024

Developer note: after a discussion about implementing this feature, we think the following approach may work reasonably well and help to scale the binary classifiers without the need to add individual catalogers for each case:

  • Add a configuration to the binary classifiers which allows post-processing after a package has been identified
  • Specifically for licenses, a function to locate and identify licenses may be added that allows a relative path (and/or possibly an absolute path) to be specified to find license information present on the system.

An example of how this might look is (naming and exact details TBD, of course):

		{
			Class:    "java-binary-oracle",
			FileGlob: "**/java",
			EvidenceMatcher: FileContentsVersionMatcher(
					`(?m)\x00(?P<version>[0-9]+[.0-9]+[+][-0-9]+)\x00`),
			Package: "java/jre",
			PURL:    mustPURL("pkg:generic/java/jre@version"),
			CPEs:    singleCPE("cpe:2.3:a:oracle:jre:*:*:*:*:*:*:*:*"),
			Append:  licenseFromFiles("../legal/java.base/LICENSE", "./LICENSE"),
		},

So, in the event that a matching package is discovered by this cataloger, a secondary set of functions may run to append additional information to the package, in this example appending any license information found based on the paths relative to where the binary was located.
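
The path handling implied by the relative paths in that example could look roughly like the sketch below. It is only an illustration under stated assumptions: `licenseCandidates` is a hypothetical helper, not a Syft function, and the `/opt/openjdk` prefix is a made-up location. It resolves each relative path against the directory that holds the matched binary.

```go
package main

import (
	"fmt"
	"path"
)

// licenseCandidates is a hypothetical helper (not part of Syft). It resolves
// each relative path against the directory containing the matched binary and
// returns the cleaned candidate locations in the order given.
func licenseCandidates(binaryPath string, relPaths ...string) []string {
	dir := path.Dir(binaryPath)
	out := make([]string, 0, len(relPaths))
	for _, rel := range relPaths {
		// path.Join also cleans the result, so "../" segments are collapsed.
		out = append(out, path.Join(dir, rel))
	}
	return out
}

func main() {
	// Illustrative location only; the real prefix depends on the image.
	for _, p := range licenseCandidates("/opt/openjdk/bin/java",
		"../legal/java.base/LICENSE", "./LICENSE") {
		fmt.Println(p)
	}
}
```

A real implementation would still have to check which candidates exist in the scanned source and identify the license text found there; neither step is shown here.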

@mithunms333
Author

mithunms333 commented Apr 19, 2024

Hi @kzantow
Sharing the path locations for OpenJDK:
In the OpenJDK tar downloaded from GitHub, the LICENSE file is present at:
.../openjdk/legal/java.base/LICENSE

The java binary executable is found at:
.../openjdk/bin/java

There are a few other supporting jars, probably covered by the same LICENSE, at:
.../openjdk/lib/*.jar

@witchcraze
Contributor

witchcraze commented Nov 1, 2024

Starting with https://github.com/anchore/syft/releases/tag/v1.13.0, Syft collects more detailed information for JDK binaries.
I noticed that the JSON for some of them includes the LICENSE file under metadata (in the files list), like this.
How about using this file?

Currently, I check "buildType" and the installed path to judge whether a detected JDK binary is Oracle JDK or an Oracle build of OpenJDK.

"metadata": {
        "release": {
          "javaVersion": "1.8.0_152",
          "osArch": "amd64",
          "osName": "Linux",
          "osVersion": "2.6",
          "source": " .:01c1e35a6ade corba:819ee87a39ab deploy:d6ae396f7716 hotspot:61079977e79a hotspot/make/closed:029cef92e9db hotspot/src/closed:dab91f0c8557 install:ee3ffcd140dc jaxp:e85e84dd9244 jaxws:14d19efa7e5c jdk:14ec99ce504f jdk/make/closed:41cdd11d9b99 jdk/src/closed:31209ceb96b8 langtools:06967271fe02 nashorn:046ba9357cdc",
          "buildType": "commercial"
        },
        "files": [
          "/opt/jdk1.8.0_152/COPYRIGHT",
          "/opt/jdk1.8.0_152/LICENSE",
          "/opt/jdk1.8.0_152/README.html",
          "/opt/jdk1.8.0_152/THIRDPARTYLICENSEREADME.txt",
          "/opt/jdk1.8.0_152/bin/extcheck",

@mithunms333
Author

Hi @witchcraze ,
The issue/gap here is that syft is unable to pick up license info from the license file inside the OpenJDK binary tar.gz downloaded from the official site (not Oracle provided).
For example, I download the OpenJDK binary tar.gz from the following link:
https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.13%2B11/OpenJDK17U-jdk_x64_linux_hotspot_17.0.13_11.tar.gz

This tar.gz contains the LICENSE file, which is just a text file.
There is no other config or JSON file for this.

The metadata for this can be found at the following link:
https://api.adoptium.net/v3/assets/latest/17/hotspot?architecture=x64&image_type=jdk&os=linux&vendor=eclipse

Please suggest any workaround ideas you may have.

@witchcraze
Contributor

Sorry for the confusion. I have no concrete workaround.

As I wrote, current syft can list files, including LICENSE.
I commented because I thought this might be useful information.

$ syft version; syft -q /tmp/jdk-17.0.13+11 -o json | jq '.artifacts[] | select(.name == "java")'
Application: syft
Version:    0.99.0
BuildDate:  2023-12-21T16:18:46Z
GitCommit:  3cffa0b7fd276a35123c48e45407c4f402f2c58f
GitDescription: v0.99.0
Platform:   linux/amd64
GoVersion:  go1.21.5
Compiler:   gc
{
  "id": "2c3e81e7164670a7",
  "name": "java",
  "version": "17.0.13+11",
  "type": "binary",
  "foundBy": "binary-cataloger",
  "locations": [
    {
      "path": "/bin/java",
      "accessPath": "/bin/java",
      "annotations": {
        "evidence": "primary"
      }
    }
  ],
  "licenses": [],
  "language": "",
  "cpes": [
    "cpe:2.3:a:oracle:openjdk:17.0.13\\+11:*:*:*:*:*:*:*",
    "cpe:2.3:a:java:java:17.0.13\\+11:*:*:*:*:*:*:*"
  ],
  "purl": "pkg:generic/java@17.0.13+11",
  "metadataType": "binary-signature",
  "metadata": {
    "matches": [
      {
        "classifier": "java-binary-openjdk",
        "location": {
          "path": "/bin/java",
          "accessPath": "/bin/java",
          "annotations": {
            "evidence": "primary"
          }
        }
      },
      {
        "classifier": "java-binary-oracle",
        "location": {
          "path": "/bin/java",
          "accessPath": "/bin/java",
          "annotations": {
            "evidence": "primary"
          }
        }
      }
    ]
  }
}
$ syft version; syft -q /tmp/jdk-17.0.13+11 -o json | jq '.artifacts[] | select(.name == "openjdk")'
Application: syft
Version:    1.16.0
BuildDate:  2024-11-04T22:29:33Z
GitCommit:  8a41d772509d37267a65e0b425808e883e4b9dce
GitDescription: v1.16.0
Platform:   linux/amd64
GoVersion:  go1.22.8
Compiler:   gc
{
  "id": "05aef2ff9f375716",
  "name": "openjdk",
  "version": "17.0.13+11",
  "type": "binary",
  "foundBy": "java-jvm-cataloger",
  "locations": [
    {
      "path": "/release",
      "accessPath": "/release"
    }
  ],
  "licenses": [],
  "language": "",
  "cpes": [
    {
      "cpe": "cpe:2.3:a:oracle:openjdk:17.0.13:*:*:*:*:*:*:*",
      "source": "declared"
    }
  ],
  "purl": "pkg:generic/oracle/openjdk@17.0.13%2B11?repository_url=https://github.com/adoptium/jdk17u.git",
  "metadataType": "java-jvm-installation",
  "metadata": {
    "release": {
      "implementor": "Eclipse Adoptium",
      "implementorVersion": "Temurin-17.0.13+11",
      "javaRuntimeVersion": "17.0.13+11",
      "javaVersion": "17.0.13",
      "javaVersionDate": "2024-10-15",
      "libc": "gnu",
      "modules": [
        :
        module list
        :
      ],
      "osArch": "x86_64",
      "osName": "Linux",
      "source": ".:git:e022498bca4c",
      "buildSource": "git:08621c72262ba260e4bb6451a9c75bf1c0ab365d",
      "buildSourceRepo": "https://github.com/adoptium/temurin-build.git",
      "sourceRepo": "https://github.com/adoptium/jdk17u.git",
      "fullVersion": "17.0.13+11",
      "semanticVersion": "17.0.13+11",
      "buildInfo": "OS: Linux Version: 6.5.0-1025-azure",
      "jvmVariant": "Hotspot",
      "jvmVersion": "17.0.13+11",
      "imageType": "JDK"
    },
    "files": [
        :
        files
        :      
      "/legal/java.base/ADDITIONAL_LICENSE_INFO",
      "/legal/java.base/ASSEMBLY_EXCEPTION",
      "/legal/java.base/LICENSE",
        :
        files
        :
    ]
  }
}
$ head jdk-17.0.13+11/legal/java.base/LICENSE
The GNU General Public License (GPL)

Version 2, June 1991

Copyright (C) 1989, 1991 Free Software Foundation, Inc.
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

Everyone is permitted to copy and distribute verbatim copies of this license
document, but changing it is not allowed.
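
To make use of the file list shown in the JSON output above, one option is to post-process the SBOM outside of Syft. The sketch below is a rough illustration, not a supported workaround: it decodes only the fields visible in this thread (`artifacts`, `name`, `licenses`, `metadata.files`), the function name `unlicensedWithLicenseFiles` is hypothetical, and the sample document is a trimmed stand-in for real output. It reports packages whose `licenses` array is empty together with any listed file named `LICENSE`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path"
)

// artifact mirrors only the fields of the Syft JSON output shown in this thread.
type artifact struct {
	Name     string            `json:"name"`
	Licenses []json.RawMessage `json:"licenses"`
	Metadata json.RawMessage   `json:"metadata"`
}

type document struct {
	Artifacts []artifact `json:"artifacts"`
}

// unlicensedWithLicenseFiles (hypothetical name) maps each package that has
// no licenses to the LICENSE files listed in its metadata, if any.
func unlicensedWithLicenseFiles(sbom []byte) (map[string][]string, error) {
	var doc document
	if err := json.Unmarshal(sbom, &doc); err != nil {
		return nil, err
	}
	found := map[string][]string{}
	for _, a := range doc.Artifacts {
		if len(a.Licenses) > 0 {
			continue
		}
		var md struct {
			Files []string `json:"files"`
		}
		// Other metadata types may lack "files" or shape it differently; skip those.
		if err := json.Unmarshal(a.Metadata, &md); err != nil {
			continue
		}
		for _, f := range md.Files {
			if path.Base(f) == "LICENSE" {
				found[a.Name] = append(found[a.Name], f)
			}
		}
	}
	return found, nil
}

// sample is a trimmed stand-in for the v1.16.0 output in this thread.
const sample = `{"artifacts":[{"name":"openjdk","licenses":[],
  "metadata":{"files":["/legal/java.base/ASSEMBLY_EXCEPTION","/legal/java.base/LICENSE"]}}]}`

func main() {
	found, err := unlicensedWithLicenseFiles([]byte(sample))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for name, files := range found {
		fmt.Println(name, files)
	}
}
```

This only locates the files; deciding which license they contain and writing it back into the SBOM would be separate steps.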

@mithunms333
Author

Hi,
Sorry if my wording caused any misunderstanding. I need the license value to appear in the license field of syft's JSON output:
"licenses": [],

Syft's license cataloger should pick up this LICENSE file and add the license name to the above JSON field in the output.
Currently the field is empty.

@kzantow
Contributor

kzantow commented Nov 8, 2024

For openjdk, with the JDK cataloger, a reasonably easy enhancement would be to search the set of JVM files for known license file names and attempt to gather licenses from the matches.

Syft integrates a Google license classification library for this purpose elsewhere (like go license enrichment).
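
The file-name search described above could be sketched roughly as follows. This is a minimal illustration, not Syft's implementation: `knownLicenseNames` and `licenseFiles` are hypothetical names, and the name list is an assumption rather than the list Syft ships. The matched files would still need to be read and passed to a license classifier, which is not shown.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// knownLicenseNames is an illustrative list, NOT the list Syft maintains.
var knownLicenseNames = map[string]bool{
	"license":     true,
	"license.txt": true,
	"license.md":  true,
	"copying":     true,
}

// licenseFiles (hypothetical name) returns the paths whose base name matches
// a known license file name, compared case-insensitively.
func licenseFiles(files []string) []string {
	var out []string
	for _, f := range files {
		if knownLicenseNames[strings.ToLower(path.Base(f))] {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	// Paths taken from the JVM file list shown earlier in this thread.
	jvmFiles := []string{
		"/legal/java.base/ADDITIONAL_LICENSE_INFO",
		"/legal/java.base/ASSEMBLY_EXCEPTION",
		"/legal/java.base/LICENSE",
		"/bin/java",
	}
	for _, f := range licenseFiles(jvmFiles) {
		fmt.Println(f)
	}
}
```

Exact name matching keeps the sketch simple; files such as ADDITIONAL_LICENSE_INFO or ASSEMBLY_EXCEPTION would need their own handling if they are meant to influence the reported license.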

@mithunms333
Author

Could you kindly let me know whether this is planned for any upcoming enhancement or bug-fix cycle? I am not familiar with syft's tech stack and hence cannot fix it myself.
Since Java is one of the most popular application programming stacks, and many images include it, not being able to list its license is a significant gap.

Projects
Status: Ready
Development

No branches or pull requests

6 participants