Update dataone tales #114

Merged
merged 10 commits into from
Feb 19, 2021

ThomasThelen (Member) commented Oct 11, 2020

Fixes issue #107. This is a rewrite of a much uglier PR (#111). It does not include the refactored unit tests, but it gives a clearer view of the Tale-updating changes.

The first two commits just move code into a few classes to make later changes easier. The next few fix some PEP 8 issues. The last commit fixes an issue where one except clause was swallowing an exception raised inside a nested try/except. I flattened the try/except blocks, and things seem to be working fine.
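The exception-swallowing problem fixed by the last commit is easy to reproduce in isolation. A minimal sketch, where `upload_file`, `PublishError`, `publish_nested` and `publish_flat` are hypothetical stand-ins for the real publish code (which is not shown here):

```python
class PublishError(Exception):
    """Hypothetical error type for a failed publish step."""


def upload_file(name: str) -> None:
    """Hypothetical upload step that fails for one specific file."""
    if name == "bad.csv":
        raise ValueError(f"could not upload {name}")


def publish_nested(files):
    """Anti-pattern: a broad outer handler swallows the inner, more specific error."""
    try:
        try:
            for name in files:
                upload_file(name)
        except ValueError as exc:
            raise PublishError(f"upload failed: {exc}") from exc
        return "published"
    except Exception:
        # This also catches the PublishError raised just above, so the caller
        # only sees a generic failure and the real cause is lost.
        return "failed"


def publish_flat(files):
    """Flattened version: one specific handler per step lets the error surface."""
    for name in files:
        try:
            upload_file(name)
        except ValueError as exc:
            # Re-raise with context; the original ValueError is kept as __cause__.
            raise PublishError(f"upload failed: {exc}") from exc
    return "published"
```

Here `publish_nested(["a.csv", "bad.csv"])` returns "failed" with no hint of which file broke, while `publish_flat` raises `PublishError: upload failed: could not upload bad.csv` with the original `ValueError` chained as its cause.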

Reviewer Note

#115 should be merged after this PR is merged.

Updating

A Tale is 'updated' when it is published a second time. If a Tale is published to DataONE and a copy of it is then made and published, a datacite:IsDerivedFrom relation pointing to the first Tale's DOI is added to the resource map of the newly published package (see the example below).
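Conceptually, the relation is one extra triple in the new package's resource map. A minimal sketch, assuming the resource map is held as a plain set of (subject, predicate, object) tuples rather than the RDF graph type the project actually uses, and assuming `http://purl.org/spar/datacite/` as the DataCite namespace URI; `add_derived_from` and its parameters are hypothetical:

```python
from typing import Optional, Set, Tuple

# Assumed namespace URI for the DataCite ontology terms.
DATACITE = "http://purl.org/spar/datacite/"
IS_DERIVED_FROM = DATACITE + "IsDerivedFrom"

Triple = Tuple[str, str, str]


def add_derived_from(
    graph: Set[Triple], package_uri: str, source_doi: Optional[str]
) -> Set[Triple]:
    """Return a copy of `graph` with an IsDerivedFrom link to the source Tale's DOI.

    `source_doi` is None when the Tale being published is not a copy of a
    previously published Tale; the graph is then returned unchanged.
    """
    updated = set(graph)  # leave the caller's graph untouched
    if source_doi:
        updated.add((package_uri, IS_DERIVED_FROM, source_doi))
    return updated
```

Returning a new set keeps the original graph intact, and accepting None for Tales that were never copied lets the publish code call the helper unconditionally.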

When a Tale is published to DataONE a second time, the previous system metadata for its EML document is retrieved. A new system metadata document is built from it, with values such as the MD5 checksum and size updated. This document is uploaded along with the new EML document, which signals DataONE to obsolete the previous package.
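A minimal sketch of that derivation, assuming system metadata is held as a plain dict rather than the DataONE library's own system metadata type, and assuming the new record points back to the old one through an `obsoletes` field; `derive_system_metadata` and the dict keys are illustrative:

```python
import copy
import hashlib
from typing import Any, Dict


def derive_system_metadata(
    old_sysmeta: Dict[str, Any], new_pid: str, new_eml: bytes
) -> Dict[str, Any]:
    """Build system metadata for an updated EML document from the previous record.

    Fields such as the format ID, owner and access policy are carried over;
    only values that depend on the new document's bytes are replaced.
    """
    new_sysmeta = copy.deepcopy(old_sysmeta)  # don't mutate the retrieved record
    new_sysmeta["identifier"] = new_pid
    new_sysmeta["checksum"] = {
        "algorithm": "MD5",
        "value": hashlib.md5(new_eml).hexdigest(),
    }
    new_sysmeta["size"] = len(new_eml)
    # Assumed field: link the new record to the one it replaces.
    new_sysmeta["obsoletes"] = old_sysmeta["identifier"]
    return new_sysmeta
```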

Testing

  1. Create a Tale

  2. Publish it to DataONE

  3. View the landing page for that Tale in a new tab (don't close the tab!)

  4. Publish again

  5. Refresh the page from step 3; you should see a notification that an updated version is available

  6. Create a Tale

  7. Publish it to DataONE

  8. Make the Tale public

  9. Sign out and sign in under a new account

  10. Copy the Tale from step 6

  11. Publish it to DataONE

  12. Copy the ID (uuid) of the resource map

  13. Visit https://dev.nceas.ucsb.edu/knb/d1/mn/v2/object/<RESOURCE_MAP_ID>

  14. Search for "datacite"

  15. Check that it references the DOI of the Tale that was copied
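Steps 13 to 15 can also be scripted. A minimal sketch using only the standard library: it fetches the object from the dev member-node URL given in step 13 and looks for IsDerivedFrom elements by namespace URI instead of by text search. The function names are hypothetical, and the DataCite namespace URI is an assumption:

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumed namespace URI for the DataCite ontology terms.
DATACITE = "http://purl.org/spar/datacite/"
MN_OBJECT_URL = "https://dev.nceas.ucsb.edu/knb/d1/mn/v2/object/"


def fetch_resource_map(resource_map_id: str) -> bytes:
    """Download the resource map object from the dev member node (step 13)."""
    url = MN_OBJECT_URL + urllib.parse.quote(resource_map_id, safe="")
    with urllib.request.urlopen(url) as response:
        return response.read()


def derived_from_targets(rdf_xml: bytes) -> List[Optional[str]]:
    """Return the rdf:resource of every IsDerivedFrom element, whatever its prefix."""
    root = ET.fromstring(rdf_xml)
    return [
        element.get(f"{{{RDF}}}resource")
        for element in root.iter(f"{{{DATACITE}}}IsDerivedFrom")
    ]


def references_doi(rdf_xml: bytes, doi: str) -> bool:
    """Step 15: does the resource map point at the copied Tale's DOI?"""
    return doi in derived_from_targets(rdf_xml)
```

Usage would be `references_doi(fetch_resource_map("<RESOURCE_MAP_ID>"), "doi:<SOURCE_DOI>")`. Because ElementTree resolves prefixes to namespace URIs while parsing, the check passes whether the relation was serialized as `datacite:IsDerivedFrom` or `ns1:IsDerivedFrom`, whereas a text search for the `datacite:` prefix only finds the former.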

Example:

I published a Tale here, then copied it and republished it here. In the new resource map (found here), the datacite:IsDerivedFrom relation points to the DOI of the first package.

codecov bot commented Feb 18, 2021

Codecov Report

Merging #114 (8731b54) into master (0b3485f) will decrease coverage by 1.25%.
The diff coverage is 66.45%.


@@            Coverage Diff             @@
##           master     #114      +/-   ##
==========================================
- Coverage   62.51%   61.25%   -1.26%     
==========================================
  Files           9        9              
  Lines        1163     1226      +63     
==========================================
+ Hits          727      751      +24     
- Misses        436      475      +39     

Impacted Files                      Coverage Δ
gwvolman/lib/dataone/publish.py     74.34% <66.07%> (-8.64%) ⬇️
gwvolman/lib/dataone/metadata.py    84.18% <67.44%> (-4.48%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Xarthisius (Contributor) commented:

During step 11 I got:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/celery/app/trace.py", line 412, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/girder_worker/task.py", line 148, in __call__
    results = super(Task, self).__call__(*_t_args, **_t_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/celery/app/trace.py", line 704, in __protected_call__
    return self.run(*args, **kwargs)
  File "/gwvolman/gwvolman/tasks.py", line 516, in publish
    provider.publish()
  File "/gwvolman/gwvolman/lib/dataone/publish.py", line 300, in publish
    metadata.set_related_identifiers(manifest, eml_pid, self.tale,
  File "/gwvolman/gwvolman/lib/dataone/metadata.py", line 124, in set_related_identifiers
    old_publish = next(
StopIteration
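The traceback ends in a bare `next(` call in `set_related_identifiers`. `next()` raises `StopIteration` when its iterator is exhausted and no default is given, which is what happens when no earlier publication matches. The actual fix is not shown in this thread; a common remedy is to pass a default and check for it. The record layout below (`repository` and `pid` keys) and `find_publish_record` are hypothetical:

```python
from typing import Any, Dict, List, Optional


def find_publish_record(
    publish_info: List[Dict[str, Any]], repository: str
) -> Optional[Dict[str, Any]]:
    """Return the first publish record for `repository`, or None if there is none."""
    # Without the None default, next() would raise StopIteration here
    # whenever no record matches.
    return next(
        (record for record in publish_info if record.get("repository") == repository),
        None,
    )
```

The caller then skips adding the IsDerivedFrom relation when the result is None, instead of crashing the publish task.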

Xarthisius (Contributor) commented:

I fixed that ^^, but I think it also suggests that it didn't work as expected, since I can't find any 'datacite' in the resource map. FTR, these are the packages I've published, in order:

  1. https://dev.nceas.ucsb.edu/view/urn:uuid:1bce1e87-4803-48bc-bfe3-780c0d86f98e
  2. https://dev.nceas.ucsb.edu/view/urn:uuid:484c46b3-5e0d-4b5e-9362-88d90db4be9c
  3. https://dev.nceas.ucsb.edu/view/urn:uuid:42b1a9fc-36ec-43f8-a5eb-a74af07dd6f1

NOTE: I didn't make a Tale public and log in as a different account. Instead, I called POST /tale/:id/copy directly using Swagger. The source for my Tale is here: https://sandbox.zenodo.org/record/509115. It requires the Jupyter Lab image to be present, but for some reason our setup_girder.py names it JupyterLab, so make sure you fix that before importing.

ThomasThelen (Member, author) commented:

Thanks for that fix. I added a commit that always aliases the 'datacite' prefix in the namespace. In the packages above, the relations are present but under the auto-generated prefix ns1. Here's an example published with the latest commit, with its RDF here. You can confirm that the IsDerivedFrom relation, <datacite:IsDerivedFrom rdf:resource="doi:10.5072/FK2CC13G82"/>, points to the initial publication by a different user here.
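A prefix like `ns1` is what a serializer emits when no prefix is registered for a namespace URI. The project's serializer isn't shown here; as a stand-in, this sketch uses the standard library's `xml.etree.ElementTree`, whose `register_namespace` maps a URI to a fixed prefix (rdflib graphs have an analogous `bind` method). `derived_from_rdf` is a hypothetical helper, and the DataCite namespace URI is an assumption:

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumed namespace URI for the DataCite ontology terms.
DATACITE = "http://purl.org/spar/datacite/"

# Register the prefixes once, up front, so serialization always writes
# rdf: and datacite: instead of auto-generated prefixes such as ns0/ns1.
ET.register_namespace("rdf", RDF)
ET.register_namespace("datacite", DATACITE)


def derived_from_rdf(package_uri: str, source_doi: str) -> str:
    """Serialize a minimal RDF/XML description carrying one IsDerivedFrom link."""
    root = ET.Element(f"{{{RDF}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": package_uri}
    )
    ET.SubElement(
        description, f"{{{DATACITE}}}IsDerivedFrom", {f"{{{RDF}}}resource": source_doi}
    )
    return ET.tostring(root, encoding="unicode")
```

`register_namespace` changes module-wide state, so calling it once at import time, as here, keeps every later serialization consistent.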

Xarthisius merged commit 9c658e6 into master on Feb 19, 2021.
Xarthisius deleted the update_dataone_tales branch on February 19, 2021 at 18:51.