Coding in R is useless without interesting research questions, and even the best questions remain unanswered without data. RStudio provides a number of convenient ways to access data, including the ability to write SQL code chunks in R Markdown, run them, and assign the query result directly to a variable of your choice. No such thing is available yet for SPARQL queries, the ones that allow you to navigate the gigantic knowledge graphs that incarnate the conscience of the semantic web. This is where the SPARQLchunks package steps in.
This package allows you to query SPARQL endpoints in two different ways:
- It allows you to run SPARQL chunks in R Markdown files.
- It provides inline functions to send SPARQL queries to a user-defined endpoint and retrieve data in data frame form (`sparql2df`) or list form (`sparql2list`).
Endpoints can be reached from behind corporate firewalls on Windows machines thanks to automatic proxy detection. See Execute SPARQL chunks in R Markdown.
Most users can install the package by running this command:
```r
remotes::install_github("aourednik/SPARQLchunks", build_vignettes = TRUE)
```
If you are behind a corporate firewall on a Windows machine, direct access to GitHub might be blocked. If that is the case, run this installation code instead:
```r
proxy_url <- curl::ie_get_proxy_for_url("https://github.com")
httr::set_config(httr::use_proxy(proxy_url))
remotes::install_url("https://github.com/aourednik/SPARQLchunks/archive/refs/heads/master.zip", build_vignettes = TRUE)
```
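The proxy lookup above relies on Windows settings, so a script shared across platforms may want to guard it. The following is a minimal sketch under that assumption; `detect_proxy` is a hypothetical helper, not part of SPARQLchunks, curl, or httr.

```r
# Hypothetical helper: only query the Windows proxy settings on Windows.
# Returns NULL on other platforms or when no proxy is configured.
detect_proxy <- function(url) {
  if (.Platform$OS.type != "windows") {
    return(NULL)
  }
  curl::ie_get_proxy_for_url(url)
}

proxy_url <- detect_proxy("https://github.com")

# Configure httr only when a proxy was actually found
if (!is.null(proxy_url)) {
  httr::set_config(httr::use_proxy(proxy_url))
}
```

With this guard, the same script can run unchanged on machines that reach GitHub directly.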
To use the full potential of the package, you need to load the library and tell knitr that a SPARQL engine exists:
```r
library(SPARQLchunks)
knitr::knit_engines$set(sparql = SPARQLchunks::eng_sparql)
```
Once you have done so, you can run SPARQL chunks. To retrieve a data frame, set the following chunk options:
- output.var: the name of the data frame you want to store the results in
- endpoint: the URL of the SPARQL endpoint
- autoproxy: whether to attempt automatic proxy detection (optional)
- auth: authentication information for the SPARQL endpoint (as an httr authentication object, optional)
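The `auth` option expects an httr authentication object. A minimal sketch of how such an object might be created is shown here; the environment variable names and fallback credentials are placeholders chosen for illustration, not something the package prescribes.

```r
library(httr)

# Hypothetical credentials, read from environment variables.
# The variable names and the fallback values are assumptions.
user <- Sys.getenv("SPARQL_USER", unset = "demo_user")
password <- Sys.getenv("SPARQL_PASSWORD", unset = "demo_password")

# authenticate() returns an httr config object; "basic" is its default type
sparql_auth <- authenticate(user, password, type = "basic")
```

Since knitr evaluates chunk options as R expressions, the object should then be usable in a chunk header as `auth=sparql_auth`, assuming the endpoint accepts basic authentication.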
Example 1 (Swiss administration endpoint)
```{sparql output.var="queryres_df", endpoint="https://lindas.admin.ch/query"}
PREFIX schema: <http://schema.org/>
SELECT * WHERE {
?sub a schema:DataCatalog .
?subtype a schema:DataType .
}
```
Example 2 (UniProt endpoint)
Note the attempt at automatic proxy detection (autoproxy=TRUE).
```{sparql output.var="tes5", endpoint="https://sparql.uniprot.org/sparql", autoproxy=TRUE}
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?taxon
FROM <http://sparql.uniprot.org/taxonomy>
WHERE {
?taxon a up:Taxon .
} LIMIT 500
```
Example 3 (Wikidata endpoint)
```{sparql output.var="res.df", endpoint="https://query.wikidata.org/sparql"}
SELECT DISTINCT ?item ?itemLabel ?country ?countryLabel ?linkTo ?linkToLabel
WHERE {
?item wdt:P1142 ?linkTo .
?linkTo wdt:P31 wd:Q12909644 .
VALUES ?type { wd:Q7278 wd:Q24649 }
?item wdt:P31 ?type .
?item wdt:P17 ?country .
MINUS { ?item wdt:P576 ?abolitionDate }
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . }
}
```
To retrieve a list instead of a data frame, set the following chunk options:
- output.var: the name of the list you want to store the results in
- endpoint: the URL of the SPARQL endpoint
- output.type: when set to "list", retrieves a list (tree structure) instead of a data frame
- autoproxy: whether to attempt automatic proxy detection (optional)
```{sparql output.var="queryres_list", endpoint="https://lindas.admin.ch/query", output.type="list"}
PREFIX schema: <http://schema.org/>
SELECT * WHERE {
?sub a schema:DataCatalog .
?subtype a schema:DataType .
}
```
The inline functions `sparql2df` and `sparql2list` both have the same pair of arguments: a SPARQL endpoint and a SPARQL query. Queries can be multi-line:
```r
endpoint <- "https://lindas.admin.ch/query"
query <- "PREFIX schema: <http://schema.org/>
SELECT * WHERE {
?sub a schema:DataCatalog .
?subtype a schema:DataType .
}"
result_df <- sparql2df(endpoint, query)
```
The same, with an attempt at automatic proxy detection:
```r
result_df <- sparql2df(endpoint, query, autoproxy = TRUE)
```
To retrieve a list instead of a data frame:
```r
result_list <- sparql2list(endpoint, query)
```
The same, with an attempt at automatic proxy detection:
```r
result_list <- sparql2list(endpoint, query, autoproxy = TRUE)
```
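Because the query is an ordinary R string, it can also be assembled programmatically before being passed to the inline functions. The sketch below uses a simplified version of the catalog query with a configurable LIMIT; `build_catalog_query` is a hypothetical helper, not part of SPARQLchunks.

```r
# Hypothetical helper (not part of SPARQLchunks): builds a simplified
# catalog query with a configurable LIMIT clause.
build_catalog_query <- function(limit = 10) {
  # Validate the input so that only a positive whole number reaches the query
  stopifnot(
    is.numeric(limit), length(limit) == 1,
    limit >= 1, limit == round(limit)
  )
  sprintf(
    "PREFIX schema: <http://schema.org/>
SELECT * WHERE {
  ?sub a schema:DataCatalog .
} LIMIT %d",
    as.integer(limit)
  )
}

endpoint <- "https://lindas.admin.ch/query"
query <- build_catalog_query(5)
cat(query, "\n")

# Requires the SPARQLchunks package and network access:
# result_df <- SPARQLchunks::sparql2df(endpoint, query)
```

Validating the value before it is interpolated keeps arbitrary text out of the query string.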