Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Sanitize location of http resources in WT Catalog. Fixes #193 #266

Merged
merged 5 commits into from
Mar 18, 2019

Conversation

Xarthisius
Copy link
Collaborator

@Xarthisius Xarthisius commented Mar 13, 2019

Currently, files registered via the http provider end up within the root of the Whole Tale Catalog. Since we "reuse" Girder objects during the registration to avoid duplicates, it quickly leads to name collisions, e.g. if two users register a file called README.md, only one of those can be stored in WT.

Approach

In order to assign a pseudo-unique identifier to http resources we can map a url to a folder structure (using / as a separator) and register the file as a leaf item underneath it. E.g.

  • http://example.com/level1/level2/file.zip -> /collection/WholeTale Catalog/WholeTale Catalog/example.com/level1/level2/file.zip
  • http://use.yt/upload/4166454f -> /collection/WholeTale Catalog/WholeTale Catalog/use.yt/upload/4166454f/axis_test.zip (uses Content-Disposition for deriving file name)

How to test?

  1. Deploy locally
  2. Register LIGO Tale using provided script
  3. In Girder UI navigate to /collection/WholeTale Catalog/WholeTale Catalog and confirm that /collection/WholeTale Catalog/WholeTale Catalog/www.gw-openscience.org/s/events/ exists

Testing migration script

  1. Deploy locally with a production database migrated to v0.6 (remember to fix gridfs assetstore by removing replicaSet and setting a proper mongo uri).
  2. Execute girder-shell scripts/http_files_migrate.py within Girder container.
  3. Confirm that there are no items in /collection/WholeTale Catalog/WholeTale Catalog
  4. Launch "Ligo Tale" and confirm it's still working properly.

TODO:

  • Provide description and "how to test?" here
  • Provide migration script for existing http/https resources
  • Possibly fix tests

@codecov
Copy link

codecov bot commented Mar 15, 2019

Codecov Report

Merging #266 into master will increase coverage by 0.11%.
The diff coverage is 96%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #266      +/-   ##
==========================================
+ Coverage   85.55%   85.66%   +0.11%     
==========================================
  Files          38       38              
  Lines        2181     2198      +17     
==========================================
+ Hits         1866     1883      +17     
  Misses        315      315
Impacted Files Coverage Δ
server/rest/dataset.py 85.55% <100%> (+0.49%) ⬆️
server/lib/http_provider.py 87.5% <94.73%> (+3.5%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a59f534...a863a43. Read the comment docs.

@Xarthisius Xarthisius changed the title [WIP] Sanitize location of http resources in WT Catalog. Fixes #193 Sanitize location of http resources in WT Catalog. Fixes #193 Mar 15, 2019
@craig-willis
Copy link

Noting that of the 29 items registered at the catalog root in production, 11 are users trying to register DOIs for unsupported repositories (Dryad, Figshare, Zenodo). There are also several that are for supported providers (DataONE, Globus, DVN) but may predate the implementation. Tangential to this PR, but I think it would make sense to reject (with a sensible error message) any DOI that doesn't match a provider instead of falling back to HTTP. Same goes with handles.

Copy link

@craig-willis craig-willis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach makes a lot of sense to me and the implementation -- including -- work as expected.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants