Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

ref splitting code could perform more complete validation of content type #162

Open
halfline opened this issue Oct 9, 2024 · 1 comment

Comments

@halfline
Copy link

halfline commented Oct 9, 2024

Recently it was discovered that colons didn't work in the "filename" part of a file ref, because it would be treated as delimiter for the content type. This caused:

╎❯ omlmd push http://quay.io/rhatdan/tinyllama:latest '/home/dwalsh/.local/share/ramalama/repos/ollama/blobs/sha256:2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816' --empty-metadata

to fail, since the filename has a colon between the hash type and hash in it.

That issue was addressed by #161 which added some I/O (a file existence test) to decide whether or not a colon in the ref "belongs" to the filename or as the ref delimiter. That's the backstory, but during discussion it also came to light that the media type itself isn't validated. This lack of validation had little bearing on the original problem, and so it was decided it was out of scope for the original issue, but it may still merit further investigation. That's why I'm filing this issue (at the request of @vsoch and @tarilabs)

Specifically there are a few areas that could be looked into:

  1. Right now the media type after the colon is not validated, but the OCI spec requires a strictly defined format here
  2. There is some question of whether or not the media type should be allowed to be fully qualified or if only media types of the form maintype/subtype should be supported, not types with parameters (like say text/plain; charset=iso8859-1)
  3. There's some scratch code to do some initial validation and offer better "colon finding" heuristics here (informed by that validation). It's not ready to go as-is. it's untested, and the answer to 2 decides which direction the code needs to go. If parameters shouldn't be supported, the regex's need to be adapted to drop that. If parameters should be supported, ideally there would be some checks added to error out if there are duplicate parameters. It might make sense to just add a dependency on the rfc3986 module or similar
  4. It might also make sense to move the guts of split_path_and_content to the PathAndOptionalContent constructor. It's really just serving as a static constructor in practice.

Anyway, none of this is critical, and it's not something i'm personally going to spend more time on, so we could just close this ticket out if it's not a priority, but I'm filing the issue anyway because I was asked to.

@vsoch
Copy link
Contributor

vsoch commented Oct 9, 2024

Thank you! Much appreciated.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants