Issue with format file extensions #19

Open
joncison opened this issue Mar 15, 2020 · 5 comments

Comments

@joncison
Contributor

From edamontology/edamontology#421:

  • file_extension in EDAM must be given in lower case
  • the file_extension value must also appear in hasExactSynonym (preserving capitalisation variants, e.g. all uppercase, where these are the "canonical" variant in use)
@joncison
Contributor Author

@matuskalas - a small detail - do we give e.g. ".txt", "txt", or both? (Probably both?)

@joncison joncison self-assigned this Mar 18, 2020
@joncison
Contributor Author

joncison commented Mar 18, 2020

@albangaignard for my first foray in SPARQL, I'm tackling this query, which addresses (from above):

  • file_extension in EDAM must be given in lower case

but I notice that the pattern for the file_extension property currently allows the use of | (pipe) as a delimiter between multiple values, e.g. yaml|yml.

While this is compact / looks nice, it rather complicates the semantics and downstream uses: file_extension currently means "A string in which one or more commonly used file extensions for a data format are delimited by pipe character(s)." rather than simply "A commonly used file extension for a data format."

I think @matuskalas the right course is to refactor EDAM so that one extension is given per file_extension? In which case the query becomes:

  • file_extension in EDAM must contain lower-case alphanumeric characters only.
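
For illustration, the proposed rule could be sketched in plain Python as below. This is a minimal sketch and not the code from the notebook: the function names, the dictionary layout, and the choice of ASCII-only `[a-z0-9]` are assumptions made for the example, and the format keys are hypothetical placeholders rather than real EDAM identifiers.

```python
import re
from typing import Dict, List

# Assumption: a valid extension is one or more lowercase ASCII letters or
# digits, with no leading dot and no pipe delimiter.
EXTENSION_PATTERN = re.compile(r"[a-z0-9]+")


def is_valid_extension(value: str) -> bool:
    """Return True if value contains only lowercase alphanumeric characters."""
    return EXTENSION_PATTERN.fullmatch(value) is not None


def split_legacy_value(value: str) -> List[str]:
    """Split a pipe-delimited value such as 'yaml|yml' into single extensions."""
    return [part for part in value.split("|") if part]


def find_invalid_extensions(
    extensions_by_format: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Map each format key (hypothetical) to its values that break the rule."""
    invalid = {}
    for format_key, values in extensions_by_format.items():
        bad = [value for value in values if not is_valid_extension(value)]
        if bad:
            invalid[format_key] = bad
    return invalid
```

Under this rule a value such as `yaml|yml` is reported as invalid until it is refactored into two separate values, which `split_legacy_value` would give as `yaml` and `yml`.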

Thoughts please!

cc @hmenager @veitveit

@joncison
Contributor Author

PS. @albangaignard my hunch is that most or all the checks will require some Python programming, so your suggestion to use Jupyter notebooks is a very good one!

@joncison
Contributor Author

UPDATE

I just finished the query, taking the decision that only lowercase alphanumeric characters are allowed in EDAM Format file extensions. cc @matuskalas @veitveit

This being my first foray into Python and SPARQL, in case you have time @albangaignard @hmenager or @hansioan, I'd much appreciate some feedback on the quality of the code, which is included here (from this Jupyter notebook).

@joncison added the "done - pending review" label (Issue / check is implemented, but a review of it is needed.) Mar 20, 2020
@joncison
Contributor Author

Just added a check that a label or exact synonym matching the file extension is defined; see this notebook.

cc @albangaignard @hmenager
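
As a rough sketch of what such a check might look like (not the notebook's actual code), the comparison below assumes that a match means the label or an exact synonym equals the extension when case is ignored, so that an all-uppercase "canonical" variant still counts. The function name and its parameters are hypothetical.

```python
from typing import Iterable


def has_matching_name(
    extension: str, label: str, exact_synonyms: Iterable[str]
) -> bool:
    """Return True if the label or any exact synonym equals the extension.

    Assumption: the comparison ignores case and surrounding whitespace.
    """
    target = extension.casefold()
    candidates = [label, *exact_synonyms]
    return any(name.strip().casefold() == target for name in candidates)
```

For example, an extension `fasta` would be matched by a label `FASTA`, while an extension `txt` would not be matched by a label `Plain text format` unless `txt` (in some capitalisation) is also listed as an exact synonym.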
