Sanitize location of http resources in WT Catalog. Fixes #193 #266
Codecov Report
@@ Coverage Diff @@
## master #266 +/- ##
==========================================
+ Coverage 85.55% 85.66% +0.11%
==========================================
Files 38 38
Lines 2181 2198 +17
==========================================
+ Hits 1866 1883 +17
Misses 315 315
Noting that of the 29 items registered at the catalog root in production, 11 are from users trying to register DOIs for unsupported repositories (Dryad, Figshare, Zenodo). There are also several for supported providers (DataONE, Globus, DVN) that may predate the implementation. Tangential to this PR, but I think it would make sense to reject (with a sensible error message) any DOI that doesn't match a provider instead of falling back to HTTP. The same goes for handles.
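A rough sketch of what such a rejection could look like follows. It is an illustration only, not the project's code: `resolve_provider`, `UnsupportedIdentifierError`, the `providers` mapping and the `10.1234/` prefix are all hypothetical, and handles are not covered.

```python
import re
from typing import Callable, Dict

# Matches a bare DOI, a "doi:" prefixed DOI, or a doi.org resolver URL.
DOI_RE = re.compile(
    r"^(?:doi:|https?://(?:dx\.)?doi\.org/)?(10\.\d{4,9}/\S+)$",
    re.IGNORECASE,
)


class UnsupportedIdentifierError(ValueError):
    """Raised when a DOI does not belong to any supported provider."""


def resolve_provider(identifier: str,
                     providers: Dict[str, Callable[[str], bool]]) -> str:
    """Return the name of the provider that should handle `identifier`.

    `providers` maps a provider name to a predicate that decides whether
    a DOI belongs to it (hypothetical interface for this sketch).
    """
    match = DOI_RE.match(identifier.strip())
    if match is None:
        # Not a DOI at all: the generic HTTP provider still applies.
        return "HTTP"
    doi = match.group(1)
    for name, accepts in providers.items():
        if accepts(doi):
            return name
    # A DOI that nobody claims is rejected instead of falling back to HTTP.
    supported = ", ".join(sorted(providers)) or "none"
    raise UnsupportedIdentifierError(
        f"DOI {doi} does not match a supported provider "
        f"(supported: {supported})"
    )


# Placeholder provider; the prefix is made up for illustration.
EXAMPLE_PROVIDERS = {"ExampleRepo": lambda doi: doi.startswith("10.1234/")}
```

With this shape, plain URLs keep going through the http provider, while an unclaimed DOI produces an error that lists the supported providers.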
This approach makes a lot of sense to me and the implementation -- including -- work as expected.
Currently, files registered via the http provider end up within the root of the Whole Tale Catalog. Since we "reuse" Girder objects during the registration to avoid duplicates, it quickly leads to name collisions, e.g. if two users register a file called README.md, only one of those can be stored in WT.
Approach
In order to assign a pseudo-unique identifier to http resources we can map a url to a folder structure (using / as a separator) and register the file as a leaf item underneath it. E.g.
http://example.com/level1/level2/file.zip -> /collection/WholeTale Catalog/WholeTale Catalog/example.com/level1/level2/file.zip
http://use.yt/upload/4166454f -> /collection/WholeTale Catalog/WholeTale Catalog/use.yt/upload/4166454f/axis_test.zip (uses Content-Disposition for deriving file name)
How to test?
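Before registering anything, the mapping described under Approach can be checked offline. The following is a minimal sketch, not the plugin's actual code: `catalog_path` and `filename_from_content_disposition` are hypothetical helper names, and the rule that a Content-Disposition file name is appended as an extra leaf is inferred from the use.yt example.

```python
from email.message import Message
from typing import Optional
from urllib.parse import urlparse

# Catalog root as it appears in the examples in this PR.
CATALOG_ROOT = "/collection/WholeTale Catalog/WholeTale Catalog"


def filename_from_content_disposition(header: Optional[str]) -> Optional[str]:
    """Extract the filename parameter from a Content-Disposition value."""
    if not header:
        return None
    msg = Message()
    msg["Content-Disposition"] = header
    return msg.get_filename()


def catalog_path(url: str, content_disposition: Optional[str] = None) -> str:
    """Map an http(s) URL to a folder-like path under the catalog root.

    The host becomes the first folder and each URL path segment a nested
    folder. If the server advertises a file name that differs from the
    last segment, it is appended as the leaf item (assumption).
    """
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    parts = [parsed.netloc] + segments
    name = filename_from_content_disposition(content_disposition)
    if name and (not segments or segments[-1] != name):
        parts.append(name)
    return "/".join([CATALOG_ROOT] + parts)
```

Calling `catalog_path("http://use.yt/upload/4166454f", 'attachment; filename="axis_test.zip"')` should reproduce the second example path, which gives a quick reference value to compare against what appears in the catalog.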
/collection/WholeTale Catalog/WholeTale Catalog and confirm that /collection/WholeTale Catalog/WholeTale Catalog/www.gw-openscience.org/s/events/ exists
Testing migration script
girder-shell scripts/http_files_migrate.py within Girder container.
/collection/WholeTale Catalog/WholeTale Catalog
TODO: