Link Traversal for Comunica
Learn more about Comunica on our website.
This is a monorepo that contains packages that enable link traversal-based query execution in Comunica. If you want to use a Link Traversal-enabled Comunica engine, have a look at Comunica SPARQL Link Traversal.
Concretely, link traversal is enabled in the following engines:
- Query engine configurations:
  - Comunica SPARQL Link Traversal: A Comunica query engine that includes all Link Traversal packages.
  - Comunica SPARQL Link Traversal Solid: A Comunica query engine that includes all Link Traversal and Solid-related packages.
These engines make use of the following packages:
- Seed URL actors:
  - Seed URL preprocessor: Actor that sets sources based on the given query, if no other sources were set.
- Join entries sort actors:
  - Zero-knowledge: Actor that orders join entries based on heuristics for plan selection in link traversal environments.
- Link extractors (all require Traverse RDF Resolve Hypermedia Links Actor and Traverse RDF Metadata Extract Actor):
  - All links extractor: Actor that extracts all URLs in a document for traversal.
  - Content policies extractor: Actor that extracts URLs matching content policies in a document for traversal.
  - Predicates extractor: Actor that extracts the object URLs of triples that match a configured predicate regex for traversal.
  - Quad pattern extractor: Actor that extracts all URLs that match the current quad pattern in a document for traversal.
  - Quad pattern query extractor: Actor that extracts all URLs that match any quad pattern in the current query in a document for traversal.
  - Solid type index extractor: Actor that extracts links to types via the Solid type index.
- Pruning actors:
  - Traverse Prune ShapeTrees RDF Resolve Hypermedia Links Actor: Actor that prunes links that are guaranteed not to match the current query, based on ShapeTrees metadata.
- Query termination actors:
  - Link count limit: Actor that limits the number of links that can be pushed into the link queue.
  - Link depth limit: Actor that limits the depth of link paths that can be pushed into the link queue.
- Source annotation actors:
  - Annotate Graph: Actor that annotates triples with their document's URL via the named graph.
- Buses:
  - Extract links: Bus that determines the links to follow from a metadata quad stream.
- Mediators:
  - Combine array: Mediator that concatenates the results of all actors into a single array.
- Other:
  - Context entries: Reusable context key definitions for link traversal.
  - Types: Reusable TypeScript interfaces and types for link traversal.
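To illustrate how some of the packages listed above relate to each other, the following is a minimal, self-contained sketch of a traversal loop that combines a predicate-based link extractor with link count and link depth limits. It does not use Comunica's actual interfaces: `Triple`, `Link`, `LimitedLinkQueue`, `extractByPredicate`, `traverse`, and `exampleWeb` are hypothetical names made up for this example, documents are simulated with an in-memory map instead of HTTP requests, and the sketch assumes that the seed URL counts towards the link count limit and that already-queued URLs are skipped.

```typescript
// Hypothetical types for illustration; these are NOT Comunica's interfaces.
interface Triple { subject: string; predicate: string; object: string; }
interface Link { url: string; depth: number; }

// Sketch of a predicates extractor: returns the object URLs of all triples
// whose predicate matches the given (non-global) regex.
function extractByPredicate(triples: Triple[], predicateRegex: RegExp): string[] {
  return triples
    .filter(triple => predicateRegex.test(triple.predicate))
    .map(triple => triple.object)
    .filter(object => /^https?:\/\//.test(object)); // Skip literals.
}

// Sketch of a link queue that enforces a link count limit and a depth limit.
class LimitedLinkQueue {
  private readonly links: Link[] = [];
  private readonly seen = new Set<string>();
  private readonly maxLinks: number;
  private readonly maxDepth: number;
  private pushed = 0;

  constructor(maxLinks: number, maxDepth: number) {
    this.maxLinks = maxLinks;
    this.maxDepth = maxDepth;
  }

  // Returns true if the link was accepted into the queue.
  push(link: Link): boolean {
    if (this.pushed >= this.maxLinks      // Link count limit (assumed to include the seed).
      || link.depth > this.maxDepth       // Link depth limit.
      || this.seen.has(link.url)) {       // Assumption: never queue a URL twice.
      return false;
    }
    this.seen.add(link.url);
    this.links.push(link);
    this.pushed++;
    return true;
  }

  pop(): Link | undefined {
    return this.links.shift();
  }
}

// Breadth-first traversal over an in-memory "web" (URL -> triples in that document).
function traverse(
  web: Map<string, Triple[]>,
  seed: string,
  predicateRegex: RegExp,
  maxLinks: number,
  maxDepth: number,
): string[] {
  const queue = new LimitedLinkQueue(maxLinks, maxDepth);
  const visited: string[] = [];
  queue.push({ url: seed, depth: 0 });
  let link: Link | undefined;
  while ((link = queue.pop()) !== undefined) {
    visited.push(link.url);
    const triples = web.get(link.url) ?? [];
    for (const url of extractByPredicate(triples, predicateRegex)) {
      queue.push({ url, depth: link.depth + 1 });
    }
  }
  return visited;
}

// Made-up example data: a -> b -> d -> e via "knows", and a -> c via "seeAlso".
const ex = 'https://example.org/';
const exampleWeb = new Map<string, Triple[]>([
  [ `${ex}a`, [
    { subject: `${ex}a`, predicate: 'http://xmlns.com/foaf/0.1/knows', object: `${ex}b` },
    { subject: `${ex}a`, predicate: 'http://www.w3.org/2000/01/rdf-schema#seeAlso', object: `${ex}c` },
    { subject: `${ex}a`, predicate: 'http://xmlns.com/foaf/0.1/name', object: 'Alice' },
  ]],
  [ `${ex}b`, [{ subject: `${ex}b`, predicate: 'http://xmlns.com/foaf/0.1/knows', object: `${ex}d` }]],
  [ `${ex}c`, []],
  [ `${ex}d`, [{ subject: `${ex}d`, predicate: 'http://xmlns.com/foaf/0.1/knows', object: `${ex}e` }]],
]);

// Follow only "knows" links, at most 2 hops from the seed: visits a, b, d.
console.log(traverse(exampleWeb, `${ex}a`, /knows$/, 10, 2));
// Follow any predicate, but accept at most 2 links in total: visits a, b.
console.log(traverse(exampleWeb, `${ex}a`, /./, 2, 10));
```

In the actual engines, these concerns are handled by separate actors that are wired together through configuration, so the extraction strategy and the termination limits can be swapped independently; the sketch merges them into one loop only for readability.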
Warning: All packages in this repo should be considered unstable, and breaking changes may occur at any time.
Click here to learn more about Link Traversal in Comunica, or to see live examples.
(JSDoc: https://comunica.github.io/comunica-feature-link-traversal/)
This repository should be used by Comunica module developers as it contains multiple Comunica modules that can be composed. This repository is managed as a monorepo using Lerna.
If you want to develop new features or use the (potentially unstable) in-development version, you can set up a development environment for Comunica.
Comunica requires Node.js 18.0 or higher and the Yarn package manager. Comunica is tested on macOS, Linux and Windows.
This project can be set up by cloning and installing it as follows:
$ git clone https://github.com/comunica/comunica-feature-link-traversal.git
$ cd comunica-feature-link-traversal
$ yarn install
Note: npm install is not supported at the moment, as this project makes use of Yarn's workspaces functionality.
This will install the dependencies of all modules, and bootstrap the Lerna monorepo.
After that, all Comunica packages are available in the packages/ folder and can be used in a development environment, such as querying with Comunica SPARQL Link Traversal (engines/query-sparql-link-traversal).
Furthermore, this will add pre-commit hooks to build, lint and test. These hooks can temporarily be disabled at your own risk by adding the -n flag to the commit command.
This code is copyrighted by Ghent University – imec and released under the MIT license.