Access standard knowledge indexed from code repositories, connected to Linked Open Data access points
Access the web app at index.semanticscience.org
Query our knowledge graph using the OpenAPI at grlc.io/api-git/vemonet/shapes-of-you/subdir/api (powered by grlc.io and SPARQL)
Directly query the SPARQL endpoint on YASGUI at https://graphdb.dumontierlab.com/repositories/shapes-registry.
The SPARQL endpoint is also conveniently accessible in the web app's Active endpoints tab, since Shapes of You indexes its own SPARQL query files and computes metadata for its SPARQL endpoint.
Shapes of You is a global index of semantically descriptive files published to public Git repositories (GitHub, GitLab, and Gitee). It enables semantic web enthusiasts to connect those standard knowledge definitions to active Linked Open Data access points (SPARQL endpoints).
To be found by our indexer, make sure your repository description or topics on GitHub, GitLab, or Gitee include one of the resources mentioned below. We automatically index files from public repositories every week on Saturday at 1:00 GMT+1.
- SHACL shapes: we index RDF files (`.ttl`, `.rdf`, `.jsonld`, etc.) with all the `sh:NodeShape` they contain
- ShEx expressions: we index `.shex` files, and ShEx shapes defined in RDF files
- SPARQL queries: we index `.rq` and `.sparql` files, and parse grlc.io API metadata
- OWL ontologies: we index all RDF files with all the `owl:Class` they contain
- SKOS vocabularies: we index all RDF files with all the `skos:Concept` they contain
- RML mappings: we index RDF files with all the `r2rml:SubjectMap` and `rml:LogicalSource` they contain
- R2RML mappings: we index RDF files with all the `r2rml:SubjectMap` they contain
- CSVW metadata: we index RDF files with all the `csvw:Column` they contain
- Nanopublication templates: we index RDF files with all the `nt:AssertionTemplate` and inputs they contain
- OBO ontologies: we index all `.obo` files with all the terms they contain
- OpenAPI specifications: we index `.yml`, `.yaml` and `.json` files, and parse the spec to retrieve API metadata
- DCAT datasets: we index RDF files with all the `dcat:Dataset` they contain
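For illustration, here is a minimal sketch (not the actual indexer code) of how an RDF file can be checked for some of the resources listed above, using `rdflib`, one of the Python libraries the indexer relies on; the file name is hypothetical.

```python
# Hypothetical sketch: count some of the indexable resources declared in a local
# RDF file, similar in spirit to what the indexer looks for (not the actual code).
from rdflib import Graph, RDF, URIRef

SH = "http://www.w3.org/ns/shacl#"
OWL = "http://www.w3.org/2002/07/owl#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

g = Graph()
g.parse("my-shapes.ttl", format="turtle")  # hypothetical local file

node_shapes = list(g.subjects(RDF.type, URIRef(SH + "NodeShape")))
owl_classes = list(g.subjects(RDF.type, URIRef(OWL + "Class")))
skos_concepts = list(g.subjects(RDF.type, URIRef(SKOS + "Concept")))

print(f"{len(node_shapes)} SHACL shapes, {len(owl_classes)} OWL classes, "
      f"{len(skos_concepts)} SKOS concepts")
```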
If your repository or endpoint is missed by our indexer, you can add it manually:
- Additional GitHub repositories in the file `EXTRAS_GITHUB_REPOSITORIES.txt`
- Additional SPARQL endpoints in the file `EXTRAS_SPARQL_ENDPOINTS.txt`
This web service is composed of 4 main parts, described in more detail below:
- A Python script to retrieve SPARQL queries, SHACL and ShEx shapes files, with some metadata, from Git repositories. The retrieved data is described using RDF.
- A GitHub Actions workflow that runs every week on Saturday night to execute the Python script and publish the RDF output to the triplestore.
- A React web app written in TypeScript, which displays the files and metadata from the SPARQL endpoint, with filters and search.
  - The website is automatically deployed by a GitHub Actions workflow to GitHub Pages at each push to the `main` branch.
  - We use Expo to build this Progressive Web App (PWA), which can be installed as a native app on any desktop computer (Chrome recommended) or smartphone.
- A triplestore with a publicly available SPARQL endpoint at https://graphdb.dumontierlab.com/repositories/shapes-registry
- A grlc.io powered OpenAPI to query the SPARQL endpoint at http://grlc.io/api-git/vemonet/shapes-of-you
  - Most SPARQL queries used by the web app are also provided as API calls.
We defined and published a simple schema for our data as an OWL ontology, mainly re-using schema.org concepts.
Check out the OWL ontology in `website/assets/shapes-of-you-ontology.ttl`.
Here is an overview of the ontology (generated by gra.fo):
Just copy/paste this if you are missing some prefixes to query the Shapes of You knowledge graph:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX sio: <http://semanticscience.org/resource/SIO_>
PREFIX schema: <https://schema.org/>
PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX shex: <http://www.w3.org/ns/shex#>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX void-ext: <http://ldf.fi/void-ext#>
PREFIX sdm: <https://w3id.org/vocab/sdm#>
PREFIX r2rml: <http://www.w3.org/ns/r2rml#>
PREFIX rml: <http://semweb.mmlab.be/ns/rml#>
PREFIX nt: <https://w3id.org/np/o/ntemplate/>
PREFIX csvw: <http://www.w3.org/ns/csvw#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
- "Shape" files:
schema:SoftwareSourceCode
- Properties:
dcterms:hasPart
rdfs:comment
schema:codeRepository
>schema:DataCatalog
- Subclasses:
sh:Shape
(SHACL shape)shex:Schema
(ShEX schema)sh:SPARQLFunction
(SPARQL query) - additional properties:void:sparqlEndpoint
,schema:query
owl:Ontology
(OWL ontology)skos:ConceptScheme
(SKOS vocabulary)sio:000623
(OBO ontology)schema:APIReference
(OpenAPI)rml:LogicalSource
(RML and YARRRML mappings)r2rml:TriplesMap
(R2RML mappings)nt:AssertionTemplate
(Nanopublication templates)dcat:Dataset
(DCAT datasets)
- Properties:
- Git repositories:
schema:DataCatalog
- Properties:
rdfs:comment
- Properties:
- Active SPARQL endpoints:
schema:EntryPoint
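As a usage example, here is a minimal sketch (assuming the `SPARQLWrapper` library, which is not part of this repository) that queries the public SPARQL endpoint for a few indexed files and the Git repository they come from, using the `schema:SoftwareSourceCode` class and `schema:codeRepository` property described above.

```python
# Minimal sketch: list a few indexed files and their Git repositories from the
# public Shapes of You SPARQL endpoint (assumes `pip install SPARQLWrapper`).
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://graphdb.dumontierlab.com/repositories/shapes-registry")
sparql.setQuery("""
PREFIX schema: <https://schema.org/>
SELECT ?file ?repository WHERE {
  ?file a schema:SoftwareSourceCode ;
        schema:codeRepository ?repository .
} LIMIT 10
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["file"]["value"], "->", row["repository"]["value"])
```

Depending on how the data is typed in the triplestore, you may need to query one of the subclasses listed above (e.g. `sh:Shape` or `owl:Ontology`) instead of the parent `schema:SoftwareSourceCode` class.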
Requirements: npm and yarn installed.
Clone the repository:
git clone https://github.com/vemonet/shapes-of-you
cd shapes-of-you
Install dependencies:
yarn
Run the web app on http://localhost:19006; it should reload automatically at each change to the code:
yarn dev
Upgrade the package versions in `yarn.lock`:
yarn upgrade
This website is automatically deployed by a GitHub Actions workflow to GitHub Pages, and is accessible at http://index.semanticscience.org
You can also build the website locally in the `/web-build` folder and serve it on http://localhost:5000 (check out the `Dockerfile`):
yarn build
yarn serve
Deploy the Oxigraph triplestore and ElasticSearch index using Docker (requires Docker to be installed):
- Make sure the folder for ElasticSearch has the right permissions
mkdir -p /data/shapes-of-you/elasticsearch
sudo chown -R 1000:0 /data/shapes-of-you/elasticsearch
- Deploy the stack
docker-compose up -d
Check out the `docker-compose.yml` file to see how we run the Docker images.
Requirements: Python 3.6+, git
This script is run every day by the mighty `.github/workflows/index-shapes.yml` workflow.
The Python script retrieves shapes files from the APIs of various popular Git services (GitHub GraphQL API, GitLab API, Gitee API) and generates RDF data. The RDF data is then automatically published to the publicly available triplestore by the GitHub workflow.
You can find the Python scripts and requirements in the `etl` folder.
Use these commands to locally define the `API_GITHUB_TOKEN`, `GITLAB_TOKEN` and `GITEE_TOKEN` environment variables required to run the script (you might need to adapt them on Windows, but you should know better than me):
export API_GITHUB_TOKEN=MYGITHUBTOKEN000
export GITLAB_TOKEN=MYGITLABTOKEN000
export GITEE_TOKEN=MYGITEETOKEN000
Add those commands to your `.zshrc` or `.bashrc` to make them permanent.
For GitHub, you can create a new API key (aka. personal access token) at https://github.com/settings/tokens
Go to the `etl` folder:
cd etl
Install the requirements:
pip install -e .
Retrieve shapes files by searching the GitHub GraphQL API (you can also search by topic, e.g. `topic:sparql`):
python3 main.py github vemonet/shapes-of-you
Retrieve shapes files from the GitLab API using the `python-gitlab` package:
python3 main.py gitlab sparql
Retrieve shapes files from the Gitee API:
python3 main.py gitee ontology
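For reference, here is a minimal sketch (not the actual `etl` code) of the kind of GitHub GraphQL repository search the script performs, using `requests`; the `topic:sparql` search string is just an example, and the token comes from the `API_GITHUB_TOKEN` variable defined above.

```python
# Hypothetical sketch of a GitHub GraphQL repository search (not the actual etl code).
import os
import requests

graphql_query = """
query {
  search(query: "topic:sparql", type: REPOSITORY, first: 10) {
    nodes {
      ... on Repository { nameWithOwner description }
    }
  }
}
"""
resp = requests.post(
    "https://api.github.com/graphql",
    json={"query": graphql_query},
    headers={"Authorization": f"bearer {os.environ['API_GITHUB_TOKEN']}"},
)
resp.raise_for_status()
for repo in resp.json()["data"]["search"]["nodes"]:
    print(repo["nameWithOwner"], "-", repo.get("description"))
```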
This task is performed every day by the swifty `.github/workflows/analyze-endpoints.yml` workflow.
We use the `d2s` tool (aka. data2services) to generate HCLS metadata for a SPARQL endpoint:
pip install d2s
d2s metadata analyze https://graphdb.dumontierlab.com/repositories/shapes-registry -o metadata.ttl
We commit the generated metadata file to the `metadata` branch, to experiment with using git to version and track changes in the metadata generated for the SPARQL endpoints over time.
Enable WebDAV LDP on Virtuoso 7 (from the official Virtuoso documentation)
Start the `virtuoso-opensource-7` Docker image:
docker-compose up -d
The first time you start Virtuoso, or after you reset the database, you will need to run this script to prepare the Linked Data Platform:
./prepare_virtuoso.sh
To prepare for shapes-of-you, create the folders `github`, `gitlab`, `gitee`, `apis` and `endpoints` with the same owner and permissions as the `ldp` folder.
Test by uploading a Turtle file to the LDP (change the password first):
curl -u ldp:$ENDPOINT_PASSWORD --data-binary @shapes-rdf.ttl -H "Accept: text/turtle" -H "Content-type: text/turtle" -H "Slug: test-shapes-rdf" https://data.index.semanticscience.org/DAV/home/ldp/github
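The same upload can also be done from Python with `requests`; this is just a sketch equivalent to the curl command above (same URL, headers, and credentials).

```python
# Sketch: upload a Turtle file to the Virtuoso LDP, equivalent to the curl command above.
import os
import requests

with open("shapes-rdf.ttl", "rb") as f:
    resp = requests.post(
        "https://data.index.semanticscience.org/DAV/home/ldp/github",
        data=f.read(),
        auth=("ldp", os.environ["ENDPOINT_PASSWORD"]),
        headers={
            "Accept": "text/turtle",
            "Content-Type": "text/turtle",
            "Slug": "test-shapes-rdf",
        },
    )
print(resp.status_code)
```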
Enable CORS to query the Virtuoso SPARQL endpoint from JavaScript. See the Virtuoso CORS documentation.
- Go to Web Application Server > Virtual Domains & Directories
- Expand Interface for the Default Web Site
- Locate the `/sparql` Logical Path and click Edit
- Enter `*` in the Cross-Origin Resource Sharing input field
Contributions are welcome! See the guidelines to contribute.
RDF data hosted in an Oxigraph triplestore (open source)
OpenAPI powered by grlc.io
SPARQL query UI powered by Triply's YASGUI
Ontology built with gra.fo
Data processing workflows run for free using GitHub Actions open source plan
Files parsed using Python libraries: `rdflib`, `obonet`, `prance`