Minor updates to BDBag/Deriva #519

Merged 6 commits on Feb 16, 2022

@Xarthisius commented on Feb 12, 2022

This PR adds the following enhancements to the BDBag and Deriva providers:

  1. manifest-<alg>.txt files are parsed and their checksums are stored on the imported Girder objects (see eb2a284).
  2. manifest.json is parsed to get additional metadata (see d8364fa). Most of it is stored raw on the Girder objects, with two exceptions: mimeType is now properly set on imported items, and item identifiers are taken from the bundledAs.uri section (see a1db594).
  3. The main identifier is set on the root of the dataset, and a method for retrieving it was added (see c4c9cbb). This makes WT bag export "just work"™.
  4. A proper unique object identifier is added to the registered dataset (see cc18930).
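
To make items 1 and 2 concrete, here is a minimal Python sketch of the two parsing steps. It is not the code from the referenced commits. The function names `parse_checksum_manifest` and `aggregate_metadata` are hypothetical, and the second function assumes manifest.json follows the Research Object Bundle layout, where each entry under `aggregates` may carry `mediatype` and `bundledAs.uri`.

```python
from typing import Dict, Optional


def parse_checksum_manifest(text: str) -> Dict[str, str]:
    """Map payload path -> checksum for the contents of a manifest-<alg>.txt file."""
    checksums: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # BagIt layout: "<checksum><whitespace><path>"; the path may contain spaces.
        # A line without a path raises ValueError here.
        digest, path = line.split(None, 1)
        checksums[path] = digest.lower()
    return checksums


def aggregate_metadata(manifest: dict) -> Dict[str, Dict[str, Optional[str]]]:
    """Collect mimeType and identifier per aggregated resource.

    Assumption: `manifest` is the parsed manifest.json and uses the keys
    "aggregates", "uri", "mediatype" and "bundledAs" -> "uri".
    """
    result: Dict[str, Dict[str, Optional[str]]] = {}
    for entry in manifest.get("aggregates", []):
        uri = entry.get("uri")
        if uri is None:
            continue  # nothing to key the entry on
        bundled = entry.get("bundledAs") or {}
        result[uri] = {
            "mimeType": entry.get("mediatype"),
            "identifier": bundled.get("uri"),
        }
    return result
```

The resulting dictionaries could then be merged onto the imported objects, for example by looking up each file's path in both mappings. How the providers actually store these values is defined by the commits listed above.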

TODO

  • tests

How to test?

  1. Click on https://girder.local.wholetale.org/api/v1/integration/deriva?url=https%3A%2F%2Fpbcconsortium.s3.amazonaws.com%2Fwholetale%2F5ad7cdf55b0d5007601015b7ff1ea8d6%2F2021-11-09_21.47.58%2FDataset_1-882P.zip&force=false
  2. After the Tale is imported, export it as a WT Bag (Tale > (tale menu ellipsis) > export Tale).
  3. Confirm that the bag is mostly empty in terms of files, but that manifest-md5.txt, fetch.txt and manifest.json contain many entries for remote files.
    NOTE: Bags are not necessarily complete or useful at this stage. Further enhancements will surely be needed.
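
The check in step 3 can also be done with a short script rather than by eye. The sketch below is hypothetical and not part of this PR. It assumes the export is a zip archive, that payload files live under a `data/` directory, and that manifest-md5.txt and fetch.txt hold one entry per non-empty line.

```python
import io
import zipfile
from typing import Dict


def count_bag_entries(zip_bytes: bytes) -> Dict[str, int]:
    """Count payload files and manifest/fetch entries in an exported bag zip."""
    counts = {"payload_files": 0, "manifest-md5.txt": 0, "fetch.txt": 0}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            parts = info.filename.split("/")
            name = parts[-1]
            if "data" in parts[:-1]:
                # Assumption: anything below a "data" directory is payload.
                counts["payload_files"] += 1
            elif name in ("manifest-md5.txt", "fetch.txt"):
                text = zf.read(info).decode("utf-8")
                # One entry per non-empty line.
                counts[name] += sum(1 for ln in text.splitlines() if ln.strip())
    return counts
```

For a downloaded export, read the file as bytes and pass it in, for example `count_bag_entries(open(path, "rb").read())`, where `path` is the location of the exported zip. A bag that matches step 3 would show a small `payload_files` count next to much larger manifest-md5.txt and fetch.txt counts.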


codecov bot commented Feb 12, 2022

Codecov Report

Merging #519 (74ac6fd) into master (a251646) will increase coverage by 0.77%.
The diff coverage is 95.83%.

@@            Coverage Diff             @@
##           master     #519      +/-   ##
==========================================
+ Coverage   92.15%   92.92%   +0.77%     
==========================================
  Files          58       58              
  Lines        4460     4508      +48     
==========================================
+ Hits         4110     4189      +79     
+ Misses        350      319      -31     

| Impacted Files | Coverage Δ |
|---|---|
| server/lib/deriva/provider.py | 96.66% <92.30%> (+28.66%) ⬆️ |
| server/lib/bdbag/bdbag_provider.py | 93.90% <96.15%> (+2.99%) ⬆️ |
| server/lib/resolvers.py | 94.02% <100.00%> (+11.94%) ⬆️ |
| server/lib/deriva/integration.py | 93.54% <0.00%> (+48.38%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a251646...74ac6fd.

@Xarthisius requested a review from hategan on February 12, 2022, 17:07

@craig-willis left a comment

This makes more sense to me and works as expected -- from the WT-centric view. I can see this changing in some way when we publish to DERIVA, particularly handling the metadata-CSV files.

tale=False,
)

def getDatasetUID(self, doc: object, user: object) -> str:

Where is this used?

@Xarthisius (Collaborator, Author) replied:

Mostly during export (our bags).

@Xarthisius merged commit 4ca6aa2 into master on Feb 16, 2022
@Xarthisius deleted the deriva_enhs branch on February 16, 2022, 15:56