
Configuring Databases

Irina Dragoste edited this page Nov 19, 2018 · 21 revisions

VLog supports loading facts from several databases at the same time. It also supports several database technologies. This section explains how a database source can be configured.

VLog loads the configuration of all data sources from a given file with the .conf extension. Each data source is associated with an EDB predicate. Note that there cannot be multiple EDB configurations for the same predicate name! The .conf file configures these predicates through properties dedicated to each EDB. A list of n data sources is configured by assigning n sets of property values, where the prefix of each property name (EDB0, EDB1, and so on) identifies the EDB it belongs to. For example, a .conf file that configures two EDB predicates P and Q will contain the following lines:

EDB0_predname=P
EDB1_predname=Q
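The prefix convention and the one-configuration-per-predicate rule can be checked before handing a file to VLog. The following is a minimal Python sketch, not part of VLog: the function name `parse_edb_conf` is hypothetical, and it assumes every non-empty line has the form `EDB<i>_<property>=<value>`.

```python
import re
from collections import defaultdict

# Assumed line shape: EDB<i>_<property>=<value> (everything after the first '=' is the value).
LINE_RE = re.compile(r"^EDB(\d+)_(\w+)=(.*)$")


def parse_edb_conf(text):
    """Group properties by EDB index and reject duplicate predicate names."""
    sources = defaultdict(dict)
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate the property sets
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognised line: {line!r}")
        index, key, value = match.groups()
        sources[int(index)][key] = value

    seen = {}
    for index in sorted(sources):
        name = sources[index].get("predname")
        if name is None:
            raise ValueError(f"EDB{index} has no predname property")
        if name in seen:
            raise ValueError(
                f"predicate {name!r} is configured by both EDB{seen[name]} and EDB{index}"
            )
        seen[name] = index
    return dict(sources)


if __name__ == "__main__":
    print(parse_edb_conf("EDB0_predname=P\nEDB1_predname=Q\n"))
```

Grouping by the numeric index rather than by line order means the property sets may appear interleaved in the file; whether VLog itself accepts interleaved sets is not stated here.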

In-memory database, loaded from file

VLog supports file data sources of type .csv, as well as the RDF format N-Triples (.nt). Files can also be zipped, with the .gzip extension. The facts from the given file are dictionary-encoded and stored in memory. Each such file i must be configured using the following settings:

EDB[i]_predname=[predicate_name]
EDB[i]_type=INMEMORY
EDB[i]_param0=[path_to_file_parent_dir]
EDB[i]_param1=[file_name_without_extension]
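Because the file is described by its parent directory and its name without extension, it can help to derive both from a full path. Here is a small Python sketch under stated assumptions: the helper `inmemory_entry` is hypothetical, and it assumes that the extensions to strip are exactly the ones mentioned above (.csv or .nt, optionally followed by .gzip).

```python
from pathlib import PurePosixPath


def inmemory_entry(index, predicate, file_path):
    """Return the four property lines for an INMEMORY data source."""
    path = PurePosixPath(file_path)
    name = path.name
    # Assumption: a compressed file carries the .gzip suffix after its format suffix.
    if name.endswith(".gzip"):
        name = name[: -len(".gzip")]
    for ext in (".csv", ".nt"):
        if name.endswith(ext):
            name = name[: -len(ext)]
            break
    else:
        raise ValueError(f"unsupported file type: {file_path}")
    return [
        f"EDB{index}_predname={predicate}",
        f"EDB{index}_type=INMEMORY",
        f"EDB{index}_param0={path.parent}",
        f"EDB{index}_param1={name}",
    ]


if __name__ == "__main__":
    print("\n".join(inmemory_entry(0, "P", "/data/edb_files/p_csv_file.csv")))
```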

Trident database

To initialise a Trident database from an N-Triples file, run the following command:

./vlog load -i [path_to_N-Triples_file_without_extension] -o [path_to_new_database_dir]

After the command terminates, VLog has created a new folder at the given [path_to_new_database_dir] location, containing the database with the facts from [path_to_N-Triples_file].

.conf file content, for a Trident database:

EDB0_predname=[predicate_name]
EDB0_type=Trident
EDB0_param0=[path_to_trident_database]

Example

todo

Remote SPARQL query database

VLog can integrate into its database the answers to a SPARQL query on a remote endpoint. Each such answer will be associated with an EDB predicate, as configured below.

EDB[i]_predname=[predicate_name]
EDB[i]_type=SPARQL
EDB[i]_param0=[URL_of_remote_endpoint]
EDB[i]_param1=[comma_separated_list_of_answer_variables]
EDB[i]_param2=[SPARQL_query_body]
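The five properties above can be assembled programmatically. The Python sketch below is illustrative only: the helper `sparql_entry` is hypothetical, and so are the endpoint URL, variable names, and query body used in the demo. Variable names and the query body are passed through unchanged, since the exact syntax VLog expects for them is not specified here. Rejecting values that contain line breaks is an assumption based on the one-property-per-line layout.

```python
def sparql_entry(index, predicate, endpoint_url, answer_variables, query_body):
    """Return the five property lines for a SPARQL data source."""
    values = [predicate, endpoint_url, *answer_variables, query_body]
    # Assumption: each property must fit on a single line of the .conf file.
    if any("\n" in value for value in values):
        raise ValueError("property values must not contain line breaks")
    variables = ",".join(answer_variables)
    return [
        f"EDB{index}_predname={predicate}",
        f"EDB{index}_type=SPARQL",
        f"EDB{index}_param0={endpoint_url}",
        f"EDB{index}_param1={variables}",
        f"EDB{index}_param2={query_body}",
    ]


if __name__ == "__main__":
    # All values below are placeholders, not a tested endpoint or query.
    lines = sparql_entry(
        0,
        "Knows",
        "https://example.org/sparql",
        ["s", "o"],
        "?s <http://example.org/knows> ?o",
    )
    print("\n".join(lines))
```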

Example

todo

Examples

The following lines of a .conf file configure four EDB predicates: P, Q, R and Triple.

EDB0_predname=P
EDB0_type=INMEMORY
EDB0_param0=/data/edb_files
EDB0_param1=p_csv_file

EDB1_predname=Q
EDB1_type=INMEMORY
EDB1_param0=/data/edb_files
EDB1_param1=q_zipped_csv_file

EDB2_predname=R
EDB2_type=INMEMORY
EDB2_param0=/data/edb_files
EDB2_param1=r_nt_file

EDB3_predname=Triple
EDB3_type=Trident
EDB3_param0=/data/trident_dbs/t_db

As you can see, predicate P is associated with the tuples of terms written in the CSV file located at /data/edb_files/p_csv_file.csv:

c1, c2, c3
c4, c5, c6

The resulting facts loaded from data source EDB0 are: P(c1, c2, c3), P(c4, c5, c6).
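The mapping from CSV rows to facts can be pictured with a short Python sketch. This only illustrates the correspondence described above and is not how VLog loads data internally (VLog dictionary-encodes the terms). The function name `csv_to_facts` is hypothetical, and trimming the whitespace after each comma is an assumption that matches the example.

```python
import csv
import io


def csv_to_facts(predicate, csv_text):
    """Turn each non-empty CSV row into a fact string for the given predicate."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    facts = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        terms = ", ".join(cell.strip() for cell in row)
        facts.append(f"{predicate}({terms})")
    return facts


if __name__ == "__main__":
    for fact in csv_to_facts("P", "c1, c2, c3\nc4, c5, c6\n"):
        print(fact)
```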

Predicate Q is associated with the tuples of terms written in the zipped CSV file located at /data/edb_files/q_zipped_csv_file.csv.gzip:

c7, c8
c9, c10

The resulting facts loaded from data source EDB1 are: Q(c7, c8), Q(c9, c10).

Predicate R is associated with the tuples of terms written in the zipped N-Triples file located at /data/edb_files/r_nt_file.nt.gzip:

<tarantino> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <director> .

The resulting facts loaded from data source EDB2 are: R(<tarantino>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <director>).

Data source EDB3 associates predicate Triple with the tuples stored in the Trident database found in directory /data/trident_dbs/t_db.
