Transitioning from Eurostat Bulk Download to API #243
Comments
@djhurio @antagomir I think that this is a good opportunity to think about the future of the package in the context of our new project and our planned ecosystem. I want to draw attention to an early-development-phase package, dataset, that I have sent for peer review to rOpenSci. In my opinion, it would be a better basis for all rOpenGov packages that interact with an API, as it creates a special tibble that contains provenance metadata and other relevant information about the data acquisition. The aim of the package, in the context of eurostat use, is to properly document the whole workflow, from downloading, through analysis, to the eventual publication of the end results on open science repositories or in knowledge graphs (RDF). As most EU open data services are moving towards RDF and SPARQL, I think that we should also think about moving in this direction. This would enable the user to create, for example, subjective interpretations of Eurostat datasets, and release them (such as our improved regional datasets) in full sync with the Eurostat datasets (i.e. whenever Eurostat releases a new version of the dataset, the changes go through the entire chain until the published subjective version of the dataset). I wanted to bring this up anyway next week when we start planning our 3-year project, but this deadline with Eurostat probably makes these new strategic developments more urgent. I would suggest starting a project that ends in November 2022 with a transition to the new Eurostat API and at the same time reviews potential breaking changes in the package, for example, moving from tibble to the inherited dataset class, which harmonizes well with connectors to Zenodo or the rdflib bindings. |
Also ping @pitkant |
@antagomir @pitkant I think that this would be a good opportunity to think through what we want to do in OpenMuse, and what we must do by November (these changes will also thoroughly affect the iotables package). We should carefully plan resources so that we can have a smooth transition to OpenMuse but avoid disrupting critical packages. |
So @djhurio we hope to have a solution in advance, but currently this depends a bit on the availability of a suitably skilled person to work on the implementation. Ideas and contributions are very welcome. We will report on the progress here. |
From Migrating_to_API_TSV.pdf pages 8-9 I got the impression that the current ("legacy") way of downloading data will not (at least immediately) be removed but will continue to function. Migrating to API should therefore be seen as a new functionality that we can implement at our own pace, instead of rushing it before November. I can try and get this confirmed from someone in Eurostat. |
Great, please do check if this can be confirmed. We would proceed with the updates as soon as possible but this may take a bit longer than November. |
One thing is very funny, the new dataset package that I would like to use eventually as a dependency for potentially all rOpenGov / eurostat related packages, actually implements some of these changes already. I think Eurostat just moves closer to the new SDMX/RDF standard reconciliation and that is what exactly the So in a way we are on track, but the dataset package is not expected to be rolled out by November, it is currently waiting for further reviewers in rOpenSci, and I would like to develop it in OpenMuse. |
I received the following answer from Eurostat user support: "The bulk download and the API will coexist for a few months. When the communication announcing the decommissioning will come, you can count around half a year before the decommissioning happens. Nevertheless, we could only advise you to transition to the new API as soon as possible." |
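To make the coexistence of the two download routes concrete, here is a minimal, language-agnostic sketch (written in Python rather than R) of how one dataset code could map to either a legacy bulk download URL or a new API URL. Both URL patterns, the query parameters, and the helper names `legacy_bulk_url`, `api_tsv_url` and `dataset_url` are assumptions made for illustration; they are not confirmed anywhere in this thread and should be checked against Eurostat's own migration documentation before use.

```python
from urllib.parse import urlencode

# ASSUMPTION: both base URLs are illustrative guesses at the legacy and new
# endpoints; verify them against Eurostat's documentation.
LEGACY_BASE = (
    "https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing"
)
API_BASE = "https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data"


def legacy_bulk_url(dataset_code: str) -> str:
    """Build a legacy bulk download URL (hypothetical pattern)."""
    # safe="/" keeps the slash in "data/<code>.tsv.gz" unescaped.
    query = urlencode({"file": f"data/{dataset_code}.tsv.gz"}, safe="/")
    return f"{LEGACY_BASE}?{query}"


def api_tsv_url(dataset_code: str) -> str:
    """Build a new-API URL requesting compressed TSV (hypothetical pattern)."""
    query = urlencode({"format": "TSV", "compressed": "true"})
    return f"{API_BASE}/{dataset_code}?{query}"


def dataset_url(dataset_code: str, use_api: bool = True) -> str:
    """Pick one route; a single flag lets callers fall back while both coexist."""
    return api_tsv_url(dataset_code) if use_api else legacy_bulk_url(dataset_code)


if __name__ == "__main__":
    print(dataset_url("nama_10_gdp"))
    print(dataset_url("nama_10_gdp", use_api=False))
```

Keeping the choice of route behind one flag would let a package default to the API while the legacy route remains available as a fallback during the coexistence period.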
There is also rsdmx... |
Yes, I think the advantage of eurostat pkg has been that it is more specific to eurostat and serves that particular use case better than the general-purpose rsdmx package. These things can be reconsidered if there is evidence otherwise. |
Yes. To be frank, I gave rsdmx a try but I really struggled (and failed) to understand how to connect to Eurostat (I should probably give the author some feedback...). Still, it could be useful to build on its infrastructure to present a much more user-friendly interface to the user, like the one in the {eurostat} package. |
If we have to rewrite the package with the new API changes, then this is potentially something to look at as one possible solution, at least. |
Dear all, it looks like the old Eurostat API is no longer operating, and this has broken the API download at the |
Thanks for the reporting. We are actively looking for a solution to this, any support will be appreciated. |
As mentioned in #251, bulk download is still working, so please use it. In the meantime I'm working on fixing the |
Message from the Eurostat BulkDownloadListing website:
We will aim to complete the migration from old BulkDownload to the new API and remove references to BulkDownload from code before October. |
The issue, as it was described in the opening message, is now fixed in v4-dev branch of eurostat: https://github.com/rOpenGov/eurostat/tree/v4-dev
Another avenue worth exploring might be to make more use of pure "SDMX-ML", as the above-mentioned rsdmx package does, at least for fetching dataset metadata from Data Structure Definition (DSD) files. I tried the package, and handling big XML files felt a bit slow, so I'm not sure if it's the way to go for most users. Probably the easiest way to achieve this would be to rely on the rsdmx package as a dependency / import and wrap its functions in functions similar to the other, existing ones. We're looking into that now but, as mentioned in the first message, the "Transitioning from Eurostat Bulk Download to API" part of this issue has now been fixed and it will be available on CRAN in the near future. @antaldaniel wrote on Sep 13, 2022:
I am personally not familiar with the relation between SDMX and RDF. The SDMX Roadmap 2021-2025 does mention that
and it would seem that while there certainly is interest in translating SDMX data / metadata to RDF and vice versa in building open data portals and other interoperable infrastructure, it is not something that is a core concern for the SDMX community and institutions that utilize SDMX, such as Eurostat and ECB, at least when they are providing documentation for end users who just need to fetch data regularly. I think eurostat package users also appreciate the "do one thing and do it well" aspect of the package. I would not lightly add packages like rdflib/redland as a dependency as they, in turn, depend on Redland RDF C library that complicates things in CI and testing. I think discussing these types of ideas for added features would be the best in Discussions. |
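On the earlier idea of fetching dataset metadata from Data Structure Definition (DSD) files: the sketch below (in Python, for illustration only) shows roughly what reading dimension names out of an SDMX-ML structure message involves. The XML fragment is a hand-written toy, not a real Eurostat response, the namespace URIs are assumed to be the SDMX 2.1 ones, and `dimension_ids` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: namespace URIs follow the SDMX-ML 2.1 schemas.
NS = {
    "mes": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message",
    "str": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/structure",
}

# Hand-written toy fragment for illustration; NOT real Eurostat output.
SAMPLE_DSD = """
<mes:Structure
    xmlns:mes="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message"
    xmlns:str="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/structure">
  <mes:Structures>
    <str:DataStructures>
      <str:DataStructure id="DEMO_DSD">
        <str:DataStructureComponents>
          <str:DimensionList id="DimensionDescriptor">
            <str:Dimension id="GEO" position="2"/>
            <str:Dimension id="FREQ" position="1"/>
            <str:TimeDimension id="TIME_PERIOD" position="3"/>
          </str:DimensionList>
        </str:DataStructureComponents>
      </str:DataStructure>
    </str:DataStructures>
  </mes:Structures>
</mes:Structure>
"""


def dimension_ids(dsd_xml: str) -> list[str]:
    """Return the ids of ordinary dimensions, ordered by their position."""
    root = ET.fromstring(dsd_xml.strip())
    # Only str:Dimension elements match; str:TimeDimension is a separate tag.
    dims = root.findall(".//str:Dimension", NS)
    dims.sort(key=lambda d: int(d.get("position", "0")))
    return [d.get("id") for d in dims]


if __name__ == "__main__":
    print(dimension_ids(SAMPLE_DSD))
```

Real DSD files are much larger than this toy fragment, which is in line with the slowness noted above when handling big XML files.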
Closed with the CRAN release of package version 4.0.0 |
I am sure you are aware of the planned changes for the Eurostat data dissemination. For example the Eurostat Bulk Download will be changed to API. It will happen this winter, probably starting from November 2022.
We are heavy users of the "eurostat" package. Thank you for the excellent tool you have developed! Have you planned to make the necessary changes for the "eurostat" package so it will be operational also after the mentioned changes in Eurostat data dissemination?
Let me know if you need help with development or testing regarding this issue.