Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Refactor importing by adding write endpoints for all items #101

Closed
stefandesu opened this issue Jun 4, 2020 · 6 comments · Fixed by #127
Closed

Refactor importing by adding write endpoints for all items #101

stefandesu opened this issue Jun 4, 2020 · 6 comments · Fixed by #127
Labels
Milestone

Comments

@stefandesu
Copy link
Member

The import script is unnecessarily complicated and only usable from the command line. There should be a way to import concepts and schemes with HTTP endpoints and the importing process should become clearer.

@stefandesu
Copy link
Member Author

@nichtich In a private conversation, you said that you would like to remove the --reset parameter from the import script because it is confusing when something is deleted and it might lead to accidental data loss. How about the following:

  • We leave the import script's usage exactly the same as it currently is, except that --reset and any deletion of data is removed from the script (i.e. it can be called without being afraid of deleting something).
    • The question is what the script is doing with duplicates. For schemes I would suggest to override them, for concepts I would suggest to ignore them. Not sure about mappings/concordances.
  • We add a separate script for deletion.
    • In particular, we need a method to delete all concepts for a particular concept scheme. This is currently not implemented as an API endpoint. I could imagine using DELETE /voc/concepts and provide the scheme URI via an URL parameter.

I'm open for changes on all of this though. On the one hand, I would like to have a consistent behavior between different type of entities, on the other hand, we need to consider the usual use cases (in particular our own use cases) which demand slightly inconsistent behavior.

@stefandesu
Copy link
Member Author

stefandesu commented Jun 15, 2020

We now have write endpoints for concept schemes and concepts, so those parts of the import script can be rewritten now.

For the whole import script to be rewritten, there are a bunch more things necessary though:

And probably more that I haven't though of yet.

@stefandesu
Copy link
Member Author

As we are breaking the import script here, but not releasing a new major version, we should try to detect whether the new import script is called with old parameters and show an error with a link to the documentation.

@nichtich nichtich modified the milestones: 1.3.0, 1.2.0 Oct 1, 2020
@stefandesu
Copy link
Member Author

stefandesu commented Oct 2, 2020

The POST endpoints should also support files via upload or URL. This could then be used by the rewritten import script (which would only be a simple wrapper over the internal methods) and by the future admin interface (#39).

Edit: This is now implemented via json-anystream.

stefandesu added a commit that referenced this issue Oct 15, 2020
Note that this is just a workaround for now. The whole import script will be rewritten for #101.
stefandesu added a commit that referenced this issue Oct 15, 2020
Needed for new import script (#101). See also #124.
stefandesu added a commit that referenced this issue Oct 15, 2020
Bulk imports are now read and written to the database 5000 at a time. For non-bulk imports, the whole data is read into memory first and then written.
stefandesu added a commit that referenced this issue Oct 19, 2020
…101)

Notes:
- Currently, this is a separate file and is not tested.
- Next step is to write an additional script that replaces the --reset option of the import script.
- Then, the previous import script can be replaced and the tests adjusted so that they use the reset script instead of --reset.
- Mappings/concordances are still imported directly in the database because the service methods do not yet offer sufficient functionality (needs #98 and #99).
@stefandesu
Copy link
Member Author

stefandesu commented Oct 21, 2020

So I implemented a reset script that replaces everything the --reset parameter on the old script did and more. I also replaced the old import script with the new one and adjusted the tests to use the reset script where necessary. Everything seems to work very well. There are still some open tasks though:

  • Adjustments to documentation
  • Bulk deletion endpoints, at least for /voc/concepts; for now, it should be fine that it's only possible to bulk delete via command line I think, so this could be a separate issue
  • Check parameters for reset script (@nichtich)
    • I'll already note here that I opted for having the type as an option (-t) because otherwise it wouldn't be possible to call the script with a single URI without type. In theory, we could differentiate between type and URIs by checking whether the string conforms to a URI, but I like it the current way actually. But please take a look.
  • More tests, especially for the reset script
  • Separate issue: Creating concordances and importing concordances via endpoints (Create and save into concordances, assign mappings to a concordances #98, Support writing concordances #99) -> issue created
  • Adjust global commands (do we even need a global jskos-import command? I would have suggested replacing jskos-import with jskos-server-import and jskos-server-reset, or to just not offer global commands)
  • Import script: Import a list of mappings for a specific concordance (in case concordance and mappings need to be imported separately)
  • Look at overrideAndRemoveDuplicates.js and decide whether we need that anymore (-> I really don't know why we needed that, so I just deleted it. Good thing about source code repositories is that if we do still need it, it's easy to recover.)
  • Importing annotations (honestly, I simply forgot)

@stefandesu
Copy link
Member Author

I have added a bunch of tests for the import and reset script, trying to test many possibilities. There's probably still potential for more, but it should be enough for now. I will finally merge import-refactor into dev tomorrow and close this issue!

@stefandesu stefandesu linked a pull request Oct 27, 2020 that will close this issue
stefandesu added a commit that referenced this issue Oct 27, 2020
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants