Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Allowed characters: User IDs and namespaces #478

Open
m-mohr opened this issue Dec 23, 2022 · 3 comments
Open

Allowed characters: User IDs and namespaces #478

m-mohr opened this issue Dec 23, 2022 · 3 comments
Milestone

Comments

@m-mohr
Copy link
Member

m-mohr commented Dec 23, 2022

Currently, the following is allowed in the API:

  • process namespaces (in process graphs) are not restricted wrt characters
  • user IDs are allowed to be: ^[\w\-\.~]+$

In #348 it became obvious that the user ID is likely to be used in process namespaces, but with #348 they are also meant to be part of the URI path. While you can use URI encoding, it's not ideal to allow all characters in the path segment, especially / is an issue here (e.g. due to OAI/OpenAPI-Specification#892).

Some things we've seen in implementations:

  • in the VITO implementation an "e-mail address" like construct (the EGI id) is used as namespace and user ID(?), which would not be valid right now.
  • In JS we do parsing of formulas, which includes processes and their namespaces. Thus symbols from math are problematic in namespaces (and we already run into inconsistencies with the -). The format is usually process@namespace.

I think the user ID and process namespace should cover the same range of allowed characters. While we can probably extend the allowed user IDs without issues in the API, but it might lead to issues when you store files under a folder with the user ID. We should probably restrict the namespace, but we can't enforce it directly in the process graphs in 1.x. Thus we could implicitly do so through #348 (and then it only applies if you implement the endpoints) and only later also apply it to the process graph.

The following set of characters could be allowed for both: ^[\w\-\.~\@]+$ (the : is not allowed in Windows filenames)

@soxofaan
Copy link
Member

soxofaan commented Jan 4, 2023

regarding

in the VITO implementation an "e-mail address" like construct (the EGI id) is used as namespace and user ID(?), which would not be valid right now.

@SerRichard @LukeWeidenwalker : In the EODC implementation you use a user id like us-28h3d94d-c534-4cb0-a4ec-7ae3k2 (e.g. as returned by GET /me). I guess you use an internal mapping to/from EGI user ids for that (or is it some kind of direct hashing/encoding?).
Apart from adhering to ^[\w\-\.~]+$ , do you know if there is/was any other reason not to use the EGI user id directly?
(e.g. to have ability to link multiple EGI user ids to single account?)

@SerRichard
Copy link

@soxofaan Both assumptions for these questions would be the reasons I'd offer for why it was set up like this. Currently we are undergoing a refactor for the api implementation on our side. This is actually a workflow we are likely to keep in place, do you know any obvious reasons not to do this? Though, from #348 my understanding is a backend has some internal id decoupled from the external ids given by providers?

@m-mohr Is there a reason the process namespaces can't be defined by the user?

I'm thinking using an id would probably lead to all back-ends effectively having different values for "/processes/{namespace}/{process_id}". Would it not also be an option to have an endpoint for creating a namespace? With requests going through the aggregator, then only namespaces that have not been used by any backend can be considered available. To me this approach could be simpler programmatically than the aggregator having to locate and merge diverging namespaces from different back-ends. I think this suggestion requires more work on side of definitions, but it could be viable?

@m-mohr
Copy link
Member Author

m-mohr commented Jan 30, 2023

The aim was to keep the specification slim. A whole new set of CRUD endpoints for process namespaces is not necessarily lightweight, especially if you factor federation in.

But federation was not actually thought of when we were defining this. Back in the days, it was simply a 1:1 mapping on a single back-end so this approach made sense. By the way, /processes/{namespace} is just part of an old PR as of now, it's not yet part of the API at all (except for the namespace attribute in process graphs itself).

Self-defining namespaces could still lead to conflicts in the aggregator though (IDs as well, but less likely). Currently, many people for example are still wokrking directly on VITO and if they define something that already has been defined on EODC for example, you get a conflict in the aggregator (assuming public access).

@m-mohr m-mohr added this to the 1.2.0 milestone Feb 10, 2023
@m-mohr m-mohr modified the milestones: 1.2.0, 1.3.0 Apr 18, 2023
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants