OEP Data Review

Process and Workflow

An open GitHub Issue

  • A contributor has opened a Metadata Review Issue in the data-preprocessing repository using the issue template and assigned you. Try to help them by fixing all mistakes with a straightforward solution and documenting what you did. Ask for any information that is still missing for you to finish the review. For your reference: there are example and template files for the metadata string in the oemetadata repository.

Metadata string

  • If the submitter has not provided a string anywhere you can find it, kindly ask them to provide it to you
  • If the string is attached to the issue, download it and push it to a new branch named after the dataset: review/nameofdataset
  • If the user has already done these things, make sure that the naming is appropriate and continue with the next steps
  • Have a look at whether the contributor was able to work off all their checkboxes. If they could not, offer some help. Where possible, you may also complete a few of their tasks.

Check the license first

  • Only open data is allowed on the platform. Uploading anything else could have nasty legal consequences, so before deep-diving, check the string to see whether the data is available under an open license. The most common ones are CC0, dl-de/zero, CC-BY, dl-de/by, PDDL and ODbL-1.0, as described in the license recommendation. If no open license is available for the dataset, send a friendly reminder to the original contributor that only open data is allowed on the OEP. If there are reasons to review the metadata of a dataset that is not openly licensed, just establish that the data will not be published on the OEP.
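As a rough aid for this check, the license test can be sketched in a few lines of Python. This is not an official OEP tool: the metadata layout (a top-level licenses list with a name key) and the identifier spellings are assumptions; real strings may carry versioned identifiers, so adjust the allowed set to the oemetadata version under review.

```python
# Illustrative sketch only: the "licenses" layout and identifier spellings
# are assumptions; adapt them to the oemetadata version being reviewed.
OPEN_LICENSES = {"CC0", "dl-de/zero", "CC-BY", "dl-de/by", "PDDL", "ODbL-1.0"}

def is_open(metadata: dict) -> bool:
    """True if every declared license is in the allowed open-license set."""
    ids = {
        lic.get("name")
        for lic in metadata.get("licenses", [])
        if isinstance(lic, dict)
    }
    ids.discard(None)
    # No license at all also fails the check.
    return bool(ids) and ids <= OPEN_LICENSES
```

A dataset with no license entry fails the check by design, since "no license" is not open data either.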

Check string validity

  1. Check that the string is valid JSON with the tool of your choice. If you don't have one, try jsonlint.
  2. Check that omi can parse the string by running omi translate -f oep-v1.4 name_of_string.json. A valid string will simply be echoed back on the command line.
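Step 1 needs no extra tooling at all; a minimal sketch using Python's standard library (the function name is illustrative):

```python
import json
from typing import Optional

def check_json_syntax(text: str) -> Optional[str]:
    """Return None if text is valid JSON, else a short error message."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
```

For step 2, omi needs to be installed first (for example with pip install omi) and is then run from the command line exactly as shown above.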

Look at the string

  • Look at the string from top to bottom ...
  1. Check if all keys of the string are there and whether there are additional keys. Supplement missing ones, remove extraneous ones, and let the original contributor know about these changes.
  2. While looking at each item, try to interpret the entered values and make sure they conform to the metadata key description. The following points describe things to check in specific fields, ordered by their sequence in the string.
  3. Make sure the table name follows the OEP Naming Conventions:
  • content
    • name starts with the copyright owner, source, project or model name (e.g. zensus, eGo, oemof)
    • main value (e.g. population)
    • Use underscores as separators
    • separations with "by" (e.g. by_gender)
    • resolution info with "per" (e.g. per_mun)
  • format
    • only use lower case
    • use the singular instead of plural.
    • use ASCII characters only
    • no points, commas or spaces
    • avoid dates
  • Example: zensus_population_by_gender_per_mun
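The mechanical format rules above can be sketched as a small check. This is illustrative only, not an official OEP tool, and it cannot verify the semantic parts (owner prefix, "by"/"per" usage) automatically:

```python
import re

# Lower-case ASCII letters, digits and underscores; must start with a letter.
# Mechanical approximation of the format rules above; semantic rules
# (owner prefix, "by"/"per" segments) still need a human reviewer.
TABLE_NAME_RE = re.compile(r"[a-z][a-z0-9_]*")

def follows_format_rules(name: str) -> bool:
    """True if the table name passes the mechanical format checks."""
    return (
        TABLE_NAME_RE.fullmatch(name) is not None  # no dots, commas, spaces, upper case
        and "__" not in name                       # no empty separator segments
    )
```

A name like zensus_population_by_gender_per_mun passes, while anything with spaces, dots or capital letters does not.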
  4. Check all links in the string to make sure that
    • there are no dead links
    • links reference the intended location
    • sources and attributions are correct
  5. Add appropriate OEP tags in the list of keywords
  6. Check the dates for compliance with ISO 8601. This applies to the keys publicationDate, referenceDate, start, end and date.
  7. Make sure that there is an author with a contact, and add yourself as a reviewer in the contributors list.
  8. The table should be created in the schema model_draft and later moved to its final schema. Under resources, the key name holds the name of the table as it will be stored in the OEDB. The schema is specified by putting the schema name in front and separating it from the table name with a dot, so the name will read something like model_draft.tablename. When uploading the data/metadata via oem2orm, model_draft needs to be set as the schema in the resources name. Once that is done, change the schema to the intended final location of the dataset. A list of schemas can be viewed on the OEP.
  9. Check that the resource description reflects the provided data and make sure that a primary key is set.
  10. Only in the case of geographic data, make sure that:
    • the geometry column is named geom (for vector data) or rast (for raster data)
    • the data type is geometry (or raster)
    • one of the geometric types of PostgreSQL is set for each column
    • the CRS (SRID) is defined with an EPSG code. Common codes are WGS84 (EPSG:4326) and ETRS89 (EPSG:3035)
  11. Award a badge (see below for criteria)
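Several of the points above come together in the resources entry of the metadata string. The fragment below is a minimal illustration, not a complete valid string: all values are invented placeholders, and the key names only loosely follow the oemetadata layout, so compare against the official example and template files.

```json
{
  "publicationDate": "2024-01-31",
  "keywords": ["example", "population"],
  "resources": [
    {
      "name": "model_draft.zensus_population_by_gender_per_mun",
      "schema": {
        "fields": [
          {"name": "id", "type": "integer"},
          {"name": "geom", "type": "geometry", "description": "vector geometry, SRID EPSG:4326"}
        ],
        "primaryKey": ["id"]
      }
    }
  ]
}
```

Note the schema-qualified table name (model_draft while the review is ongoing), the ISO 8601 date, the geometry column named geom, and the explicit primary key.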

Optional Steps

  1. If you find new energy-related abbreviations, see if you can suggest adding them to the OEO by opening an issue, or perhaps to the openmod glossary
  2. If you notice anything that's not described here or if you have any other suggestions for this document, feel free to create an issue on review improvement.

Final step: Uploading

  • Go and check whether the submitter has already
    • created the table (with the upload tutorial)
    • attached the metadata string to the table
    • uploaded any data
  • Take over these tasks in case they weren't done yet
  • Check that a primary key is set
  • If any errors occurred, check that all columns are described in resources and have the right data format
  • Move the data to the final schema
  • Merge the related PR
  • Tick the last boxes of the issue and close it
  • Yay, good job!

Badge criteria

The database set-up of the OEP aims to support users in achieving good data quality. We developed our data management and publication guidelines with the information and guidelines of the Open Knowledge Foundation. A good resource to learn about working with open data is Software Carpentry. See for example this publication on good enough practices for scientific computing. When the number of users and reviewers becomes large enough, we hope to implement user evaluations and ratings on data quality. For now, we are implementing badges as a reference for quality, because badges come recommended.

The quality of data is indicated by a badge, i.e. Bronze, Silver, Gold or Platinum. A particular badge implies that all the criteria defined for it are fulfilled, including the ones of subordinate badges. So, for example, a dataset holding a gold badge also fulfills the criteria described for bronze and silver.

  1. Bronze (must-have)
  • Metadata exist
  • Primary key on table (for data on the OEP)
  • Has a name following the naming conventions
  2. Silver (should-have)
  • Metadata are exhaustive; specifically, context, a contact, sources and resources are provided
  • Data itself has been provided under an open license
  3. Gold (good-to-have)
  • Additional material, like documentation or an article, is linked
  4. Platinum (best-practice)
  • Approved/rated positively by a number of users OR a testing script is provided