Using standardised metadata descriptions makes datasets:
- More discoverable
- Easily syndicated
- Transferable
- Easily combined with other datasets
Ultimately, this makes it easier for datasets to be used in real-world situations to add value.
Learn more: Metadata standards for open data
Metadata is structured information about a stream or channel separate from the content itself (title, language, media type, etc.). It is stored in the blockchain as the value property of a claim.
ℹ️ | This document covers only specific areas for improvement. Please read the complete metadata specification. |
---|
These are all the metadata fields mentioned in this document:
Name | Description | Required |
---|---|---|
license | A valid SPDX license identifier or English acronym | Required |
license_url | A valid URL for the actual license | Not required |
description | A simple description of the content. It can include nested metadata (YFM) | Not required |
Copyright is a law that gives the owner of a work (for example, a book, movie, picture, song or website) the right to say how other people can use it. These rights include:
- The right to reproduce the work.
- The right to prepare derivative works.
- The right to distribute copies.
- The right to perform and display the work publicly.
It helps protect authors from other people copying their works without permission and/or for commercial purposes.
Including this information in the metadata is important to prevent unintentional copyright infringement, and it makes it easy for everyone to discover, share, reuse, or remix content legally.
Identifiers are short strings, so they take less space and are easy for other software to process.
By providing a short identifier, users can efficiently refer to a license without having to redundantly reproduce the full license.
They also help deal with typos and multilingual content. For example, take a look at these two licenses:
- Attribution-NonCommercial-ShareAlike 4.0 International
- Attribution - Pas d’Utilisation Commerciale - Partage dans les Mêmes Conditions 4.0 International
Unless you can read and understand both languages (English and French), it is difficult to tell whether they are the same license or different ones.
Example using the correct format:
{ "license": "CC-BY-NC-SA-4.0" }
Learn more: https://spdx.org/licenses/
There is no identifier registered for "All rights reserved" on the SPDX License List, but you can use the ARR acronym instead of the legacy string.
Example using the correct format:
{ "license": "ARR" }
For public domain content, it is recommended to use the CC0-1.0 SPDX license identifier or the English acronym PD instead of the legacy string "Public domain".
Example using the correct format:
{ "license": "CC0-1.0" }
With a valid SPDX license identifier there is no need to provide a URL, and the license_url field can be ignored. However, if your content is published under a license that is not registered on the SPDX License List, please include a valid URL for it.
Example using the correct format:
{ "license_url": "http://domain.com/custom_license/1.0/archive.txt" }
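The rule above (a `license` is required, and a `license_url` is only needed when the license is not on the SPDX list) can be sketched as a small check. This is a minimal illustration, not part of the specification: `KNOWN_SPDX_IDS` is a hypothetical three-entry subset of the real list, the function name is made up, and treating the ARR and PD acronyms as not needing a URL is an assumption.

```python
from urllib.parse import urlparse

# Hypothetical subset for illustration only; the full list is at https://spdx.org/licenses/
KNOWN_SPDX_IDS = {"CC-BY-NC-SA-4.0", "CC0-1.0", "MIT"}
# Acronyms named in this document (assumed here not to require a license_url).
ACRONYMS = {"ARR", "PD"}


def check_license_fields(metadata: dict) -> list:
    """Return a list of problems found in the license fields (empty if none)."""
    problems = []
    license_id = metadata.get("license")
    if not license_id:
        problems.append("license is required")
        return problems
    if license_id in KNOWN_SPDX_IDS or license_id in ACRONYMS:
        return problems  # license_url can be ignored
    # Unregistered license: expect an http(s) URL pointing at the license text.
    parsed = urlparse(metadata.get("license_url") or "")
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("license_url is needed for a license that is not on the SPDX list")
    return problems
```

For example, `check_license_fields({"license": "CC0-1.0"})` returns an empty list, while a custom license without a `license_url` yields one problem.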
Legacy strings are supported for compatibility with previously published metadata, but they will be deprecated in the future. You should use the English acronym instead.
Name | Legacy string |
---|---|
PD | Public Domain |
ARR | All rights reserved, Copyrighted |
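A client that reads older claims could translate legacy strings into the acronyms from the table above. The following is a minimal sketch: the function name is hypothetical, and matching case-insensitively after trimming whitespace is an assumption rather than something the specification states.

```python
# Mapping taken from the legacy-string table; keys are lowercased for matching.
LEGACY_TO_ACRONYM = {
    "public domain": "PD",
    "all rights reserved": "ARR",
    "copyrighted": "ARR",
}


def normalize_license(value: str) -> str:
    """Return the acronym for a legacy string, or the trimmed value unchanged."""
    cleaned = value.strip()
    # Assumption: legacy strings are matched case-insensitively.
    return LEGACY_TO_ACRONYM.get(cleaned.lower(), cleaned)
```

Values that are not legacy strings, such as SPDX identifiers, pass through untouched, so the function is safe to apply to every `license` field.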
Work in progress: we need more help and feedback from the community. |
---|
Some types of content require very specific metadata that is not provided by the current metadata schema.
Since most platforms interpret the description field as markdown, it is possible to include nested metadata within this field using YAML or JSON front matter.
Front matter is metadata located at the top of the markdown file.
Front matter examples:
YAML
---
key: value
---
Additional content (usually in markdown format)...
JSON
{ "key": "value" }
Additional content (usually in markdown format)...
The nested metadata included in the YAML block should be minimal and only used if the current metadata fields don't provide enough information.
Nested metadata keys should follow a specific naming convention and never try to replace the currently available metadata fields.
Nested metadata values should only include common data types such as strings or numbers.
If the nested metadata has an invalid syntax, format, or structure, or does not provide any relevant information, it should be ignored.
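The rules above can be sketched as a function that splits a description into nested metadata and body. This is a simplified illustration under stated assumptions: the function names are hypothetical, the YAML branch only understands flat `key: value` lines (a real client would likely use a full YAML parser), and the JSON branch expects a single object at the very start of the description. Invalid blocks are ignored and the description is returned untouched.

```python
import json


def _keep_simple_values(meta: dict) -> dict:
    """Keep only string and number values (booleans, lists and objects are dropped)."""
    return {
        key: value
        for key, value in meta.items()
        if isinstance(value, (str, int, float)) and not isinstance(value, bool)
    }


def split_front_matter(description: str):
    """Return (nested_metadata, body); nested_metadata is {} if the block is missing or invalid."""
    text = description.lstrip()
    lines = text.splitlines()

    # YAML-style block delimited by '---' lines (flat "key: value" pairs only).
    if lines and lines[0].strip() == "---":
        for end in range(1, len(lines)):
            if lines[end].strip() == "---":
                meta = {}
                for raw in lines[1:end]:
                    if not raw.strip():
                        continue  # skip blank lines inside the block
                    key, sep, value = raw.partition(":")
                    if not sep or not key.strip():
                        return {}, description  # invalid syntax: ignore the block
                    meta[key.strip()] = value.strip()
                return meta, "\n".join(lines[end + 1:])
        return {}, description  # no closing delimiter found

    # JSON block: a single object at the very start of the description.
    if text.startswith("{"):
        try:
            meta, end = json.JSONDecoder().raw_decode(text)
        except json.JSONDecodeError:
            return {}, description  # invalid JSON: ignore the block
        if isinstance(meta, dict):
            return _keep_simple_values(meta), text[end:].lstrip()

    return {}, description
```

Note that JSON front matter must be valid JSON, with keys and string values in double quotes; an unquoted block such as `{ key: value }` fails to parse and is therefore ignored.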
Schema.org is a collaborative, community activity with a mission to create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, and beyond.
Software or applications should validate the nested metadata against a clear, predefined schema before processing or interacting with it in any other way. Schema.org provides the preferred structured data schemas for extending the claim metadata. See the list of available schemas.