Configuring Databases

Irina Dragoste edited this page Nov 19, 2018 · 21 revisions

VLog supports loading facts from several databases at the same time. It also supports several database technologies. This section explains how a database source can be configured.

VLog loads the configuration of all data sources from a given file with the .conf extension. Each data source is associated with an EDB predicate. Note that there cannot be multiple EDB configurations for the same predicate name! The .conf file configures these predicates through properties dedicated to each EDB. A list of n data sources is configured by assigning n sets of property values, where the prefix of each property name identifies the EDB it belongs to. For example, a .conf file that configures two EDB predicates P and Q will contain the following lines:

EDB0_predname=P
EDB1_predname=Q
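
To make the prefix convention concrete, the following Python sketch groups the properties of a .conf file by their EDB index and rejects a predicate name that is configured twice. It is only an illustration of the rules described above, not VLog's own parser; the function name parse_edb_conf and the regular expression are assumptions made for this example.

```python
import re
from collections import defaultdict

# Matches lines such as "EDB0_predname=P": index, property name, value.
# (Hypothetical pattern for this sketch; VLog's parser may accept more.)
LINE_RE = re.compile(r"^EDB(\d+)_(\w+)=(.*)$")


def parse_edb_conf(text):
    """Group EDB properties by numeric prefix and reject duplicate predicates."""
    sources = defaultdict(dict)
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines between property sets are ignored
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognised line: {line!r}")
        index, prop, value = match.groups()
        sources[int(index)][prop] = value

    seen = {}  # predicate name -> index of the EDB that configured it
    for index in sorted(sources):
        name = sources[index].get("predname")
        if name is None:
            continue  # this sketch does not validate missing properties
        if name in seen:
            raise ValueError(
                f"predicate {name!r} is configured by EDB{seen[name]} and EDB{index}"
            )
        seen[name] = index
    return dict(sources)


conf = parse_edb_conf("EDB0_predname=P\nEDB1_predname=Q\n")
print(conf)
```

Grouping by index first keeps the duplicate check independent of the order in which the lines appear in the file.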

In-memory database, loaded from file

VLog supports file data sources of type .csv, as well as the RDF format N-Triples (.nt). Files can also be zipped, with the .gzip extension. The facts from the given file are dictionary-encoded and stored in memory. Each such file i must be configured using the following settings:

EDB[i]_predname=[predicate_name]
EDB[i]_type=INMEMORY
EDB[i]_param0=[path_to_file_parent_dir]
EDB[i]_param1=[file_name_without_extension]
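
Because param0 and param1 split a file path into its parent directory and its name without extension, a small helper can derive both from a full path. The Python sketch below is a hypothetical convenience, not part of VLog: the function name inmemory_entry and the list of accepted suffixes are assumptions, and both .gz and .gzip are accepted only because both spellings appear on this page.

```python
import os

# Suffixes assumed for this sketch; longer (compressed) suffixes come first
# so that "file.csv.gz" is not mistaken for an unknown extension.
KNOWN_SUFFIXES = (".csv.gzip", ".csv.gz", ".nt.gzip", ".nt.gz", ".csv", ".nt")


def inmemory_entry(index, predicate, file_path):
    """Return the four .conf lines for an INMEMORY data source."""
    parent, base = os.path.split(file_path)
    for suffix in KNOWN_SUFFIXES:
        if base.endswith(suffix):
            stem = base[: -len(suffix)]  # file name without its extension
            break
    else:
        raise ValueError(f"unsupported file extension: {base!r}")
    return [
        f"EDB{index}_predname={predicate}",
        f"EDB{index}_type=INMEMORY",
        f"EDB{index}_param0={parent}",
        f"EDB{index}_param1={stem}",
    ]


print("\n".join(inmemory_entry(0, "P", "/data/edb_files/p_csv_file.csv")))
```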

Trident database

Trident is a highly efficient in-house RDF store that is embedded into VLog. To initialise a Trident database from a (gzipped) N-Triples file, run the following command:

./vlog load -i [path_to_N-Triples_file_without_extension] -o [path_to_new_database_dir]

After the computation terminates, VLog will have created a new folder at the given [path_to_new_database_dir] location, to which the content of the database, containing the term triples from [path_to_N-Triples_file_without_extension], has been exported.

.conf file content, for a Trident database:

EDB[i]_predname=[predicate_name]
EDB[i]_type=Trident
EDB[i]_param0=[path_to_trident_database]

Example

Assume the following command was executed

./vlog load -i /path/to/ntriple_facts_file -o path/to/trident/db

in order to generate a Trident database initialised with the content of the N-Triples file ntriple_facts_file.nt.gz. The unzipped file (ntriple_facts_file.nt) can also be provided. If the path/to/trident directory exists, the command creates the db directory, to which all database files are exported.

Now, we associate all the term tuples in the database with the predicate TripleEDB:

EDB0_predname=TripleEDB
EDB0_type=Trident
EDB0_param0=path/to/trident/db
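
The two steps of this example, loading the database and writing its .conf entry, can be kept consistent by deriving both from the same directory path. The Python sketch below only builds the command and the configuration lines; it does not run VLog. The helper names trident_load_command and trident_entry are assumptions for this example, and only the -i and -o options shown above are used.

```python
def trident_load_command(input_without_ext, db_dir, vlog="./vlog"):
    """Build the argument list for the load command shown on this page."""
    # Only the -i and -o options documented above are used here.
    return [vlog, "load", "-i", input_without_ext, "-o", db_dir]


def trident_entry(index, predicate, db_dir):
    """Return the three .conf lines for a Trident data source."""
    return [
        f"EDB{index}_predname={predicate}",
        f"EDB{index}_type=Trident",
        f"EDB{index}_param0={db_dir}",
    ]


db_dir = "path/to/trident/db"
command = trident_load_command("/path/to/ntriple_facts_file", db_dir)
print(" ".join(command))
print("\n".join(trident_entry(0, "TripleEDB", db_dir)))
```

The argument list can be passed to subprocess.run(command, check=True) from the Python standard library to actually execute the load step.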

Remote SPARQL query database

VLog can integrate into its database the answers to a SPARQL query on a remote endpoint. Each such answer will be associated with an EDB predicate, as configured below.

EDB[i]_predname=[predicate_name]
EDB[i]_type=SPARQL
EDB[i]_param0=[URL_of_remote_endpoint]
EDB[i]_param1=[comma_separated_list_of_answer_variables]
EDB[i]_param2=[SPARQL_query_body]

The SPARQL_query_body is the content of the WHERE clause in the SPARQL query. Newline characters are not allowed in the SPARQL_query_body. Note that the same variable name can occur only once in [comma_separated_list_of_answer_variables].

Example

The example below associates the binary predicate parentSameChild with the answers to the following Wikidata SPARQL query:

SELECT ?mother ?father
WHERE
 { ?child wdt:P25 ?mother .
   ?child wdt:P22 ?father }

which computes all mother-father pairs of parents of the same child. See the Wikidata properties Property:P22 (father) and Property:P25 (mother).

EDB0_predname=parentSameChild
EDB0_type=SPARQL
EDB0_param0=http://query.wikidata.org/sparql
EDB0_param1=mother,father
EDB0_param2=?child wdt:P25 ?mother . ?child wdt:P22 ?father .
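
Since the query body must fit on a single line and each answer variable may be listed only once, an entry like the one above can be generated from a multi-line query with a short helper. The Python sketch below is an illustration under assumptions: the function name sparql_entry is hypothetical, and collapsing all whitespace is only safe for query bodies that contain no string literals with significant spacing.

```python
def sparql_entry(index, predicate, endpoint, variables, where_body):
    """Return the five .conf lines for a SPARQL data source."""
    if len(set(variables)) != len(variables):
        raise ValueError("each answer variable may be listed only once")
    # Collapse every run of whitespace, including newlines, into one space,
    # because newline characters are not allowed in the query body.
    body = " ".join(where_body.split())
    return [
        f"EDB{index}_predname={predicate}",
        f"EDB{index}_type=SPARQL",
        f"EDB{index}_param0={endpoint}",
        f"EDB{index}_param1={','.join(variables)}",
        f"EDB{index}_param2={body}",
    ]


where_body = """?child wdt:P25 ?mother .
   ?child wdt:P22 ?father ."""

lines = sparql_entry(
    0,
    "parentSameChild",
    "http://query.wikidata.org/sparql",
    ["mother", "father"],
    where_body,
)
print("\n".join(lines))
```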

Examples

The following lines of a .conf file configure four EDB predicates: P, Q, R and Triple.

EDB0_predname=P
EDB0_type=INMEMORY
EDB0_param0=/data/edb_files
EDB0_param1=p_csv_file

EDB1_predname=Q
EDB1_type=INMEMORY
EDB1_param0=/data/edb_files
EDB1_param1=q_zipped_csv_file

EDB2_predname=R
EDB2_type=INMEMORY
EDB2_param0=/data/edb_files
EDB2_param1=r_nt_file

EDB3_predname=Triple
EDB3_type=Trident
EDB3_param0=/data/trident_dbs/t_db

As you can see, predicate P is associated with the tuples of terms written in the CSV file located at /data/edb_files/p_csv_file.csv:

c1, c2, c3
c4, c5, c6

The resulting facts loaded from data source EDB0 are: P(c1, c2, c3), P(c4, c5, c6).
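
The mapping from CSV rows to facts can be mimicked in a few lines of Python. This sketch only reproduces the row-to-fact correspondence described above; it does not perform VLog's dictionary encoding, and the function name csv_to_facts is an assumption for this example.

```python
import csv
import io


def csv_to_facts(predicate, csv_text):
    """Turn each non-empty CSV row into a fact string for the given predicate."""
    # skipinitialspace drops the blank that follows each comma in the file.
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    return [f"{predicate}({', '.join(row)})" for row in reader if row]


facts = csv_to_facts("P", "c1, c2, c3\nc4, c5, c6\n")
print(facts)
```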

Predicate Q is associated with the tuples of terms written in the zipped CSV file located at /data/edb_files/q_zipped_csv_file.csv.gzip:

c7, c8
c9, c10

The resulting facts loaded from data source EDB1 are: Q(c7, c8), Q(c9, c10).

Predicate R is associated with the tuples of terms written in the zipped N-Triples file located at /data/edb_files/r_nt_file.nt.gzip:

<tarantino> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <director> .

The resulting facts loaded from data source EDB2 are: R(<tarantino>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <director>).

Data source EDB3 associates predicate Triple with the tuples stored in the Trident database found in directory /data/trident_dbs/t_db.