Configuring Databases
VLog supports loading facts from several databases at the same time. It also supports several database technologies. This section explains how a database source can be configured.
VLog loads the configuration of all data sources it uses from a given file with the .conf extension. Each data source is associated with an EDB predicate. Note that there cannot be multiple EDB configurations for the same predicate name! The .conf file configures these predicates through properties dedicated to each EDB source. A list of n data sources is configured by n sets of property values, where the prefix of each property name (EDB0_, EDB1_, ...) identifies the EDB source it belongs to. For example, a .conf file that configures two EDB predicates P and Q contains the following lines:
EDB0_predname=P
EDB1_predname=Q
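To make the prefix convention concrete, here is a minimal Python sketch that groups `EDB<i>_<property>` lines by their index and rejects a predicate name that is configured twice. The function `parse_edb_conf` is hypothetical and only illustrates the convention; it is not VLog's own parser, and it assumes one `key=value` pair per line.

```python
import re

# Matches keys such as "EDB0_predname" or "EDB12_param1".
KEY_RE = re.compile(r"^EDB(\d+)_(\w+)$")


def parse_edb_conf(text: str) -> dict[int, dict[str, str]]:
    """Group EDB<i>_<property>=<value> lines by their EDB index i.

    Hypothetical helper for illustration; not VLog's own parser.
    """
    sources: dict[int, dict[str, str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        match = KEY_RE.match(key.strip())
        if match is None:
            continue  # not an EDB property
        index, prop = int(match.group(1)), match.group(2)
        sources.setdefault(index, {})[prop] = value.strip()

    # Each predicate name may be configured by at most one EDB source.
    seen: set[str] = set()
    for index in sorted(sources):
        name = sources[index].get("predname")
        if name is None:
            raise ValueError(f"EDB{index} has no predname")
        if name in seen:
            raise ValueError(f"predicate {name!r} configured more than once")
        seen.add(name)
    return sources


if __name__ == "__main__":
    conf = "EDB0_predname=P\nEDB1_predname=Q\n"
    print(parse_edb_conf(conf))  # {0: {'predname': 'P'}, 1: {'predname': 'Q'}}
```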
VLog supports file data sources of type .csv, as well as the RDF format N-Triples (.nt). Files can also be zipped, with the .gzip extension. The facts from a given file are dictionary-encoded and stored in memory. Each such file i must be configured using the following settings:
EDB[i]_predname=[predicate_name]
EDB[i]_type=INMEMORY
EDB[i]_param0=[path_to_file_parent_dir]
EDB[i]_param1=[file_name_without_extension]
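The sketch below shows how the two parameters relate to a concrete file path: `param0` is the parent directory and `param1` is the file name without its `.csv` or `.nt` extension (and without `.gzip` for zipped files), as in the example further down. The helper `inmemory_entry` is hypothetical and only produces the text of the settings.

```python
from pathlib import PurePosixPath

# Extensions named in this section; ".gzip" marks a zipped file.
DATA_EXTENSIONS = (".csv", ".nt")
ZIP_EXTENSION = ".gzip"


def inmemory_entry(index: int, predicate: str, file_path: str) -> str:
    """Build the four INMEMORY lines for EDB source `index`.

    Hypothetical helper: param0 is the parent directory and param1 the
    file name with its data extension (and optional .gzip) removed.
    """
    path = PurePosixPath(file_path)
    name = path.name
    if name.endswith(ZIP_EXTENSION):
        name = name[: -len(ZIP_EXTENSION)]
    for ext in DATA_EXTENSIONS:
        if name.endswith(ext):
            name = name[: -len(ext)]
            break
    else:
        raise ValueError(f"unsupported file type: {file_path}")
    return "\n".join([
        f"EDB{index}_predname={predicate}",
        f"EDB{index}_type=INMEMORY",
        f"EDB{index}_param0={path.parent}",
        f"EDB{index}_param1={name}",
    ])


print(inmemory_entry(1, "Q", "/data/edb_files/q_zipped_csv_file.csv.gzip"))
```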
VLog can also read facts from a Trident database. To initialise a Trident database from an N-Triples file, run the following command:
./vlog load -i [path_to_N-Triples_file_without_extension] -o [path_to_new_database_dir]
Once the command terminates, VLog has created a new folder at [path_to_new_database_dir] containing the database built from the facts in [path_to_N-Triples_file].
The .conf file content for a Trident database is:
EDB0_predname=[predicate_name]
EDB0_type=Trident
EDB0_param0=[path_to_trident_database]
The following lines of a .conf file configure four EDB predicates: P, Q, R and Triple.
EDB0_predname=P
EDB0_type=INMEMORY
EDB0_param0=/data/edb_files
EDB0_param1=p_csv_file
EDB1_predname=Q
EDB1_type=INMEMORY
EDB1_param0=/data/edb_files
EDB1_param1=q_zipped_csv_file
EDB2_predname=R
EDB2_type=INMEMORY
EDB2_param0=/data/edb_files
EDB2_param1=r_nt_file
EDB3_predname=Triple
EDB3_type=Trident
EDB3_param0=/data/trident_dbs/t_db
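As a sketch, the same four entries can be generated programmatically. The hypothetical `build_conf` helper below numbers the sources, refuses duplicate predicate names, and checks the parameter count against the templates given earlier (two parameters for INMEMORY, one for Trident). It treats Triple as a Trident source, since its only parameter is a Trident database directory.

```python
# Parameters each source type expects, following the templates above.
REQUIRED_PARAMS = {"INMEMORY": 2, "Trident": 1}


def build_conf(sources: list[tuple[str, str, list[str]]]) -> str:
    """Render (predicate, type, params) triples as numbered EDB properties.

    Hypothetical helper for illustration; VLog only reads the resulting file.
    """
    lines: list[str] = []
    seen: set[str] = set()
    for index, (predicate, edb_type, params) in enumerate(sources):
        if predicate in seen:
            raise ValueError(f"predicate {predicate!r} configured twice")
        seen.add(predicate)
        expected = REQUIRED_PARAMS.get(edb_type)
        if expected is None:
            raise ValueError(f"unknown EDB type: {edb_type}")
        if len(params) != expected:
            raise ValueError(
                f"{edb_type} expects {expected} parameter(s), got {len(params)}"
            )
        lines.append(f"EDB{index}_predname={predicate}")
        lines.append(f"EDB{index}_type={edb_type}")
        lines.extend(f"EDB{index}_param{i}={p}" for i, p in enumerate(params))
    return "\n".join(lines) + "\n"


conf = build_conf([
    ("P", "INMEMORY", ["/data/edb_files", "p_csv_file"]),
    ("Q", "INMEMORY", ["/data/edb_files", "q_zipped_csv_file"]),
    ("R", "INMEMORY", ["/data/edb_files", "r_nt_file"]),
    ("Triple", "Trident", ["/data/trident_dbs/t_db"]),
])
print(conf, end="")
```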
As you can see, predicate P is associated with the tuples of terms in the CSV file located at /data/edb_files/p_csv_file.csv:
c1, c2, c3
c4, c5, c6
The resulting facts loaded from data source EDB0 are: P(c1, c2, c3), P(c4, c5, c6).
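A minimal sketch of this row-to-fact mapping in Python follows. It assumes terms are separated by commas and that the space after each comma is not part of the term; how VLog itself tokenizes CSV terms may differ. `csv_to_facts` is a hypothetical name.

```python
import csv
import io


def csv_to_facts(predicate: str, lines) -> list[str]:
    """Turn each CSV row into a fact predicate(t1, ..., tn).

    Sketch only: assumes comma-separated terms with the space after each
    comma ignored, which may differ from how VLog tokenizes terms.
    """
    reader = csv.reader(lines, skipinitialspace=True)
    # Blank lines yield empty rows, which are skipped.
    return [f"{predicate}({', '.join(row)})" for row in reader if row]


p_csv = io.StringIO("c1, c2, c3\nc4, c5, c6\n")
print(csv_to_facts("P", p_csv))  # ['P(c1, c2, c3)', 'P(c4, c5, c6)']
```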
Predicate Q is associated with the tuples of terms in the zipped CSV file located at /data/edb_files/q_zipped_csv_file.csv.gzip:
c7, c8
c9, c10
The resulting facts loaded from data source EDB1 are: Q(c7, c8), Q(c9, c10).
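Reading a zipped file adds only a decompression step. The sketch below also illustrates the dictionary encoding mentioned earlier, where each distinct term is replaced by an integer ID. The helper `load_encoded` and its first-appearance ID order are illustrative assumptions, not VLog's actual encoding scheme.

```python
import csv
import gzip
import os
import tempfile


def load_encoded(path: str) -> tuple[list[tuple[int, ...]], dict[str, int]]:
    """Read a gzip-compressed CSV and dictionary-encode its terms.

    Sketch only: IDs are assigned in order of first appearance, which is
    an illustrative choice and not necessarily VLog's encoding.
    """
    dictionary: dict[str, int] = {}
    tuples: list[tuple[int, ...]] = []
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.reader(handle, skipinitialspace=True):
            if row:
                # setdefault gives an unseen term the next free ID.
                tuples.append(
                    tuple(dictionary.setdefault(t, len(dictionary)) for t in row)
                )
    return tuples, dictionary


# Write a small zipped CSV to a temporary directory, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "q_zipped_csv_file.csv.gzip")
    with gzip.open(path, "wt", newline="") as out:
        out.write("c7, c8\nc9, c10\n")
    tuples, dictionary = load_encoded(path)

decode = {i: term for term, i in dictionary.items()}
print(["Q(" + ", ".join(decode[i] for i in t) + ")" for t in tuples])
# ['Q(c7, c8)', 'Q(c9, c10)']
print(tuples)  # [(0, 1), (2, 3)]
```

Storing integer tuples instead of strings keeps each fact compact; the dictionary is only needed to translate IDs back into terms.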