the entirety of datasets through identifiers.org #15

Open
yarikoptic opened this issue Jan 10, 2018 · 1 comment

@yarikoptic

yarikoptic commented Jan 10, 2018

Cons

Analysis/possible difficulties

  • I do not yet see how to discover individual IDs/datasets for a particular prefix. I sent a question via their web interface; the answer was: not at the moment, but it sounded like an interesting feature to them, so it might come at some point.
  • Not all prefixes relate to "datasets"; some are known as "(data) collections": https://www.ebi.ac.uk/miriam/main/collections/
  • I do not think there is any versioning; most probably it is assumed that an identifier points to an immutable dataset.
  • There will be a lot of datasets, so we would need some sensible structure/hierarchy. First level would be the identifier. Then we could partition even further, splitting IDs on / and -.
  • There seems to be no "filename" information provided, so we would have choices:
    • like the default git-annex behavior: just use the entire URL to compose a unique filename
    • take one from the URL (often from the Content-Disposition header field), but that might lead to conflicts since we would allow only a flat structure:
      • we could pre-analyze the entire list of those first and see if conflicts arise. If there are conflicts, try to somehow deduce a disambiguating structure. But that is unreliable in case a dataset record changes, e.g. gains more files
      • just add a numeric index (arbitrary, or based on some metadata?) in addition
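
To make the last two points concrete, here is a minimal sketch of the hierarchy and of the numeric-index fallback. It assumes the first directory level is the identifiers.org prefix; the function names `dataset_path` and `unique_filename`, the `_N` suffix format, and the example prefix/ID values are all hypothetical and only for illustration, not an agreed design.

```python
import re
from pathlib import PurePosixPath


def dataset_path(prefix, identifier):
    """Build a nested path: the prefix first, then the ID split on '/' and '-'.

    Hypothetical layout; empty parts (e.g. from a doubled separator) are dropped.
    """
    parts = [p for p in re.split(r"[/-]", identifier) if p]
    return PurePosixPath(prefix, *parts)


def unique_filename(name, taken):
    """Return `name`, or `name` with a numeric index added if it is already taken.

    `taken` is a set of filenames already used in the (flat) dataset directory;
    the chosen name is added to it.
    """
    p = PurePosixPath(name)
    candidate, index = name, 0
    while candidate in taken:
        index += 1
        # Insert the index before the last extension: map.nii -> map_1.nii
        candidate = f"{p.stem}_{index}{p.suffix}"
    taken.add(candidate)
    return candidate


if __name__ == "__main__":
    # Made-up prefix and ID, just to show the resulting layout.
    print(dataset_path("some.prefix", "12-ab/3"))
    used = set()
    print(unique_filename("map.nii", used))
    print(unique_filename("map.nii", used))
```

The index here is assigned in order of arrival, so it stays stable only if files are always processed in the same order; an index derived from metadata would avoid that, if suitable metadata exists.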
@chrisgorgo

BTW NeuroVault uses identifiers.org: http://identifiers.org/neurovault.collection and http://identifiers.org/neurovault.image. Happy to answer any questions I can.
