OEP Data Review

Process and Workflow

An open GitHub Issue

  • A contributor has opened a Metadata Review Issue in the data-preprocessing repository using the issue template and assigned you. Try to help them by fixing all mistakes with a straightforward solution and documenting what you did. Ask for any information that is still missing for you to finish the review. For your reference: there are example and template files for the metadata string in the oemetadata repository.

Metadata string

  • If the submitter has not provided a string anywhere you can find it, kindly ask them to provide it to you
  • If the string is attached to the issue, download it and push it to a new branch named after the dataset: review/nameofdataset
  • If the user has already done these things, make sure that the naming is appropriate and continue with the next steps
  • Have a look at whether the contributor was able to work off all their checkboxes. If they could not, offer some help. Where possible, you may also complete a few of their tasks.

Check the license first

  • Only open data is allowed on the platform. Uploading anything else could have nasty legal consequences, so before deep-diving, check the string to see whether the data is available under an open license. The most common ones are CC0, dl-de/zero, CC-BY, dl-de/by, PDDL and ODbL-1.0, as described in the license recommendation. If no open license is available for the dataset, send a friendly reminder to the original contributor that only open data is allowed on the OEP. If there are reasons to review the metadata of a dataset that is not openly licensed, just establish that the data will not be published on the OEP.
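As a rough aid for this check, the license test can be sketched in a few lines of Python. This is not an official OEP tool: the metadata layout (a top-level licenses list with a name key) and the identifier spellings are assumptions; real strings may carry versioned identifiers, so adjust the allowed set to the oemetadata version under review.

```python
# Illustrative sketch only: the "licenses" layout and identifier spellings
# are assumptions; adapt them to the oemetadata version being reviewed.
OPEN_LICENSES = {"CC0", "dl-de/zero", "CC-BY", "dl-de/by", "PDDL", "ODbL-1.0"}

def is_open(metadata: dict) -> bool:
    """True if every declared license is in the allowed open-license set."""
    ids = {
        lic.get("name")
        for lic in metadata.get("licenses", [])
        if isinstance(lic, dict)
    }
    ids.discard(None)
    # No license at all also fails the check.
    return bool(ids) and ids <= OPEN_LICENSES
```

A dataset with no license entry fails the check by design, since "no license" is not open data either.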

Check string validity

  1. Check that the string is valid JSON with the tool of your choice. If you don't have one, try jsonlint.
  2. Check that omi can parse the string by running omi translate -f oep-v1.4 name_of_string.json. A valid string will simply be echoed back on the command line.
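Step 1 needs no extra tooling at all; a minimal sketch using Python's standard library (the function name is illustrative):

```python
import json
from typing import Optional

def check_json_syntax(text: str) -> Optional[str]:
    """Return None if text is valid JSON, else a short error message."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
```

For step 2, omi needs to be installed first (for example with pip install omi) and is then run from the command line exactly as shown above.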

Look at the string

  • Look at the string from top to bottom ...
  1. Check if all keys of the string are there and whether there are additional keys. Supplement missing ones, remove extraneous ones, and let the original contributor know about these changes.
  2. While looking at each item, try to interpret the entered values and make sure they conform to the metadata key description. The following points describe things to check in specific fields, ordered by their sequence in the string.
  3. Make sure the table name follows the OEP Naming Conventions:
  • content
    • name starts with the copyright owner, source, project or model name (e.g. zensus, eGo, oemof)
    • main value (e.g. population)
    • Use underscores as separators
    • separations with "by" (e.g. by_gender)
    • resolution info with "per" (e.g. per_mun)
  • format
    • only use lower case
    • use the singular instead of plural.
    • use ASCII characters only
    • no points, commas or spaces
    • avoid dates
  • Example: zensus_population_by_gender_per_mun
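The mechanical format rules above can be sketched as a small check. This is illustrative only, not an official OEP tool, and it cannot verify the semantic parts (owner prefix, "by"/"per" usage) automatically:

```python
import re

# Lower-case ASCII letters, digits and underscores; must start with a letter.
# Mechanical approximation of the format rules above; semantic rules
# (owner prefix, "by"/"per" segments) still need a human reviewer.
TABLE_NAME_RE = re.compile(r"[a-z][a-z0-9_]*")

def follows_format_rules(name: str) -> bool:
    """True if the table name passes the mechanical format checks."""
    return (
        TABLE_NAME_RE.fullmatch(name) is not None  # no dots, commas, spaces, upper case
        and "__" not in name                       # no empty separator segments
    )
```

A name like zensus_population_by_gender_per_mun passes, while anything with spaces, dots or capital letters does not.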
  4. Check all links in the string to make sure that
    • there are no dead links
    • links reference the intended location
    • sources and attributions are correct
  5. Add appropriate OEP tags in the list of keywords
  6. Check the dates for compliance with ISO 8601. This applies to the keys publicationDate, referenceDate, start, end and date.
  7. Make sure that there is an author with a contact, and add yourself as a reviewer in the contributors list.
  8. The table should be created in the schema model_draft and later moved to its final schema. Under resources, the key name holds the name of the table as it will be stored in the OEDB. The schema is specified by putting the schema name in front and separating it from the table name with a dot, so the name will read something like model_draft.tablename. When uploading the data/metadata via oem2orm, model_draft needs to be set as the schema in the resources name. Once that is done, change the schema to the intended final location of the dataset. A list of schemas can be viewed on the OEP.
  9. Check that the resource description reflects the provided data and make sure that a primary key is set.
  10. Only in the case of geographic data, make sure that:
    • the geometry column is named geom (for vector data) or rast (for raster data)
    • the data type is geometry (or raster)
    • one of the geometric types of PostgreSQL is set for each column
    • the CRS (SRID) is defined with an EPSG code. Common codes are WGS84 (EPSG:4326) and ETRS89 (EPSG:3035)
  11. Award a badge (see below for criteria)
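Several of the points above come together in the resources entry of the metadata string. The fragment below is a minimal illustration, not a complete valid string: all values are invented placeholders, and the key names only loosely follow the oemetadata layout, so compare against the official example and template files.

```json
{
  "publicationDate": "2024-01-31",
  "keywords": ["example", "population"],
  "resources": [
    {
      "name": "model_draft.zensus_population_by_gender_per_mun",
      "schema": {
        "fields": [
          {"name": "id", "type": "integer"},
          {"name": "geom", "type": "geometry", "description": "vector geometry, SRID EPSG:4326"}
        ],
        "primaryKey": ["id"]
      }
    }
  ]
}
```

Note the schema-qualified table name (model_draft while the review is ongoing), the ISO 8601 date, the geometry column named geom, and the explicit primary key.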

Optional Steps

  1. If you find new energy-related abbreviations, see if you can suggest adding them to the OEO by opening an issue, or perhaps to the openmod glossary
  2. If you notice anything that's not described here or if you have any other suggestions for this document, feel free to create an issue on review improvement.

Final step: Uploading

  • Go and check whether the submitter has already
    • created the table (with the upload tutorial)
    • attached the metadata string to the table
    • uploaded any data
  • Take over these tasks in case they weren't done yet
  • Check that a primary key is set
  • If any errors occurred, check that all columns are described in resources and have the right data format
  • Move the data to the final schema
  • Merge the related PR
  • Tick the last boxes of the issue and close it
  • Yay, good job!

Badge criteria

The database set-up of the OEP aims to support users in achieving good data quality. We developed our data management and publication guidelines with the information and guidelines of the Open Knowledge Foundation. A good resource to learn about working with open data is Software Carpentry. See for example this publication on good enough practices for scientific computing. When the number of users and reviewers becomes large enough, we hope to implement user evaluations and ratings on data quality. For now, we are implementing badges as a reference for quality, because badges come recommended.

The quality of data is indicated by a badge, i.e. Bronze, Silver, Gold or Platinum. A particular badge implies that all the criteria defined for it are fulfilled, including the ones of subordinate badges. So, for example, a dataset holding a gold badge also fulfills the criteria described for bronze and silver.

  1. Bronze (must-have)
  • Metadata exist
  • Primary key on table (for data on the OEP)
  • Has a name following the naming conventions
  2. Silver (should-have)
  • Metadata are exhaustive; specifically, context, a contact, sources and resources are provided
  • Data itself has been provided under an open license
  3. Gold (good-to-have)
  • Additional material, like documentation or an article, is linked
  4. Platinum (best-practice)
  • Approved/rated positively by a number of users OR a testing script is provided