Devise standard method of modeling artifacts with multiple names #405

Open
magnusbaeck opened this issue Nov 12, 2024 · 1 comment
magnusbaeck commented Nov 12, 2024

Summary

As someone who wants to add traceability of the Docker image used to build an artifact
I want a standard way of modeling Docker images available under many different names (e.g. debian:unstable and debian:unstable-20241016).
So that I can track which exact image was used in the build but also which image:tag was requested.

Context

There are cases when artifacts are available under multiple names/aliases. The main case I want to solve for is Docker (OCI) images, but there might be similar use cases with other kinds of artifacts. Such images have an identity based on the SHA-256 digest of parts of the image's contents (the manifest.json file, I believe). They can also have zero, one, or more names. The names are a convenience, but they also make it possible to locate and download an image.
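
A minimal sketch of the content-based identity, assuming (as guessed above) that the identity is the SHA-256 of the raw manifest bytes. The function name and the manifest content are made up for illustration.

```python
import hashlib


def manifest_digest(manifest_bytes: bytes) -> str:
    """Return a 'sha256:<hex>' digest string for the raw manifest bytes."""
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()


# Hypothetical manifest content, for illustration only.
manifest = b'{"schemaVersion": 2}'
digest = manifest_digest(manifest)

# The digest depends only on the bytes, never on the name used to fetch them.
same_digest = manifest_digest(bytes(manifest))
```

The point is that two names resolving to the same bytes yield the same identity, while a name by itself says nothing about the bytes.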

Image names have the general form [repository/]name[:tag], e.g. debian:unstable or registry.example.com/debian:unstable. These two example names could at some point refer to the same image, but tags are mutable so the meaning may change over time. It's also common for images to have multiple tags to suit different use cases. For example, debian:latest points to the latest Debian image no matter what, debian:unstable points to the latest in the unstable series, and debian:unstable-20241016 points to a specific release in the unstable series (and is probably immutable).
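
To make the name form concrete, here is a rough sketch that splits a name into the three parts of [repository/]name[:tag]. `ImageName` and `parse_image_name` are hypothetical helpers, and the split is deliberately simplified: it does not handle digests (`@sha256:...`) or Docker's default-registry rules.

```python
from typing import NamedTuple, Optional


class ImageName(NamedTuple):
    repository: Optional[str]  # everything before the last "/", if any
    name: str
    tag: Optional[str]  # None when no tag was given


def parse_image_name(ref: str) -> ImageName:
    """Split '[repository/]name[:tag]' into its parts (simplified)."""
    # Split on the last "/" first so a port such as ":5000" in the
    # repository part is not mistaken for a tag separator.
    prefix, slash, last = ref.rpartition("/")
    name, colon, tag = last.partition(":")
    return ImageName(prefix if slash else None, name, tag if colon else None)


short = parse_image_name("debian:unstable")
full = parse_image_name("registry.example.com/debian:unstable")
```

Both results have the same `name` and `tag` but different `repository` values, which is exactly why the name alone can't serve as the image's identity.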

An ArtC can have an ENVIRONMENT link to an ED, which in turn can have a RUNTIME_ENVIRONMENT link to another ArtC. The current limitation is that it isn't clear how to describe the image used to produce the artifact. You could link to an ArtC with data.identity containing the SHA-256 identity of the image, but then you can't express the intention of the environment, or the requested image. In other words, you can trace the exact image but not how we got there. It could matter whether the artifact was produced in an environment based on debian:unstable or debian:unstable-20241016.

Exemplification

There are a couple of problems we can solve:

  • What do images with mutable names (like debian:unstable) resolve to over time?
  • What exact Docker image was used in the build that produced an artifact?
  • What image did the build ask for?

Drawbacks

No response

Out of Scope

While there might be other cases of artifacts known under different names, no concrete examples have been identified and we're not trying to solve that problem preemptively. However, the solutions that use ArtP to express the names might be useful for other types of artifacts too.

Further links

No response

Acceptance Criteria

No response

Implementation Ideas

We discussed this matter at a community meeting on 2024-11-07 with two initial suggestions and a third one devised at the very end of the meeting.

New event type

The first proposal requires adding a new event type that expresses that an existing ArtC is also available under another identity. This wasn't dismissed, but the notion of artifacts having multiple identities didn't sit right with everyone. It was expressed that a Docker image can only have one identity, one that includes the SHA-256.

Use ArtP to express image names

Instead of the new event type, it was suggested that we use ArtP as a way to convey the location of the manifest.json file. That URI can trivially be transformed from e.g. https://registry.example.com/v2/debian/manifests/unstable to the corresponding string accepted by docker pull (registry.example.com/debian:unstable). To make this meaning of the URI clearer, an OCI_MANIFEST value for the data.locations.type enum seems like a good idea.
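
The transformation could look roughly like this. It is a sketch that assumes the manifest URL follows the `/v2/<name>/manifests/<reference>` layout of the example above; `manifest_url_to_image_name` is a hypothetical helper, not part of any Eiffel tooling.

```python
from urllib.parse import urlsplit


def manifest_url_to_image_name(url: str) -> str:
    """Turn a registry manifest URL into a string usable with docker pull."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected layout: v2 / <name, possibly several segments> / manifests / <reference>
    if len(segments) < 4 or segments[0] != "v2" or segments[-2] != "manifests":
        raise ValueError(f"not a manifest URL: {url}")
    name = "/".join(segments[1:-2])
    reference = segments[-1]
    # Tags cannot contain ":", so a reference with one is taken to be a digest.
    separator = "@" if ":" in reference else ":"
    return f"{parts.netloc}/{name}{separator}{reference}"


pull_name = manifest_url_to_image_name(
    "https://registry.example.com/v2/debian/manifests/unstable"
)
```

With the example URL, `pull_name` is `registry.example.com/debian:unstable`.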

The drawback of this suggestion is that we don't get to know the requested image, only the concrete one in the ArtC.

Use ArtP to express image names and link to them from the ED

A variation of the previous suggestion that alleviates its main drawback is to allow the target of the RUNTIME_ENVIRONMENT link to be an ArtP, which isn't allowed right now.
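
A sketch of how the events might be linked under this variation, written as plain Python dicts. Several things here are assumptions: the RUNTIME_ENVIRONMENT link targeting an ArtP is not allowed by the protocol today, OCI_MANIFEST is only a proposed enum value, the purl strings and digest are placeholders, and required meta fields such as version and time are left out for brevity.

```python
import uuid


def make_event(event_type, data, links=()):
    """Build a stripped-down Eiffel-like event (meta.version/time omitted)."""
    return {
        "meta": {"type": event_type, "id": str(uuid.uuid4())},
        "data": data,
        "links": [{"type": t, "target": e["meta"]["id"]} for t, e in links],
    }


def follow(event, link_type, events_by_id):
    """Return the target event of the first link of the given type."""
    for link in event["links"]:
        if link["type"] == link_type:
            return events_by_id[link["target"]]
    raise LookupError(f"no {link_type} link on {event['meta']['type']}")


# The one ArtC for the image, identified by content (placeholder digest).
image_artc = make_event(
    "EiffelArtifactCreatedEvent",
    {"identity": "pkg:docker/debian@sha256:" + "0" * 64},
)
# The ArtP carries the requested name via the manifest URL (proposed type).
image_artp = make_event(
    "EiffelArtifactPublishedEvent",
    {"locations": [{"type": "OCI_MANIFEST",
                    "uri": "https://registry.example.com/v2/debian/manifests/unstable"}]},
    links=[("ARTIFACT", image_artc)],
)
# The ED points at the ArtP instead of the ArtC (the proposed extension).
ed = make_event(
    "EiffelEnvironmentDefinedEvent",
    {"name": "build-environment"},
    links=[("RUNTIME_ENVIRONMENT", image_artp)],
)
built = make_event(
    "EiffelArtifactCreatedEvent",
    {"identity": "pkg:generic/my-artifact@1.0.0"},
    links=[("ENVIRONMENT", ed)],
)

events_by_id = {e["meta"]["id"]: e for e in (image_artc, image_artp, ed, built)}
requested = follow(follow(built, "ENVIRONMENT", events_by_id),
                   "RUNTIME_ENVIRONMENT", events_by_id)
exact = follow(requested, "ARTIFACT", events_by_id)
requested_uri = requested["data"]["locations"][0]["uri"]
exact_identity = exact["data"]["identity"]
```

Starting from the built artifact, the chain yields both the requested image (`requested_uri`) and the exact image (`exact_identity`), which is the gap the previous suggestion leaves open.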

magnusbaeck commented

This was discussed (again) at the 2024-12-05 community meeting with the following conclusions:

  • The third option, where the image name and registry location are expressed with the ArtP (implicitly, using the manifest URL) and which relies on extending ED to allow links to ArtP, was deemed the best option.
  • The ArtC for a container image shouldn't contain the registry URI since it goes against the spirit of the package URL. And if the registry URI is included in the purl, how would that be reconciled with the location given in the ArtP? What does it mean if they're different?
  • As a corollary to the previous item, there should be exactly one ArtC for a given container image.
  • There are clear parallels to how Git works. A commit is identified by its contents, not how it's accessed (typically via a branch or a tag).
  • While this all constitutes a minor protocol change, the best practices need to be thoroughly documented.

@magnusbaeck magnusbaeck self-assigned this Jan 10, 2025
@magnusbaeck magnusbaeck moved this to Todo in Eiffel Protocol Jan 10, 2025