request review of metadata #1

Open
MichaelTiemannOSC opened this issue Oct 8, 2021 · 4 comments

@MichaelTiemannOSC
Contributor

Please review and comment on the metadata implementation here:

https://github.com/os-climate/wri-gppd-ingestion-pipeline/blob/master/notebooks/WRI-gppd-ingest.ipynb

Relates to: os-climate/os_c_data_commons#48

@MichaelTiemannOSC
Contributor Author

I have pushed a new branch that adapts the pyarrow ideas that Vincent shared yesterday: https://github.com/os-climate/wri-gppd-ingestion-pipeline/tree/metadata-v1

Please have a look and comment.

@caldeirav
Contributor

I have now moved the metadata implementation to the DBT pipelines, since OpenMetadata can ingest metadata from the catalog.json file that is generated and versioned whenever the DBT documentation is built.
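
For anyone who wants to check what ends up in catalog.json before OpenMetadata picks it up, here is a minimal sketch that flattens the column metadata into rows. It is not how OpenMetadata performs its ingestion. It assumes the default `target/catalog.json` location written by `dbt docs generate` and the usual catalog layout (top-level `nodes` and `sources`, each entry carrying `metadata` and `columns`); the function names `load_catalog` and `catalog_columns` are made up for illustration.

```python
import json
from pathlib import Path


def load_catalog(path: str = "target/catalog.json") -> dict:
    """Read a dbt catalog file; the default path is an assumption about the project layout."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def catalog_columns(catalog: dict) -> list[dict]:
    """Return one row per column for every model and source listed in the catalog."""
    rows = []
    # Assumed layout: models live under "nodes", sources under "sources".
    for section in ("nodes", "sources"):
        for unique_id, entry in catalog.get(section, {}).items():
            table = entry.get("metadata", {})
            for column in entry.get("columns", {}).values():
                rows.append(
                    {
                        "unique_id": unique_id,
                        "schema": table.get("schema"),
                        "table": table.get("name"),
                        "column": column.get("name"),
                        "type": column.get("type"),
                        "comment": column.get("comment"),
                    }
                )
    return rows
```

Calling `catalog_columns(load_catalog())` from the DBT project root after `dbt docs generate` should list each column with its type and comment, which makes it easy to spot columns that still lack a description.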

@MichaelTiemannOSC
Contributor Author

I just reviewed the notebook and see that it is unchanged since October 2021. It needs a complete overhaul: the credentials.env variable names and the use of osc-ingest-tools, among other things. What's the best way to bring this notebook up to current standards and also align it with the new DBT+OpenMetadata approach?

@caldeirav
Contributor

caldeirav commented Aug 6, 2022

I suggest deprecating the older notebook, as I am essentially rebuilding the pipeline from scratch. But keep it around so you can have a look when I complete the end-to-end flow, as you may want to make some functional changes (note: most of the data processing that was in the notebook should be in DBT now).

I have already checked in the notebooks for extraction and loading, with the data transformation now being shifted to DBT together with metadata ingestion.
