SynchDB is a PostgreSQL extension designed to replicate data from one or more heterogeneous databases (such as MySQL, MS SQLServer, Oracle, etc.) directly to PostgreSQL in a fast and reliable way. PostgreSQL serves as the destination from multiple heterogeneous database sources. No middleware or third-party software is required to orchestrate the data synchronization between heterogeneous databases and PostgreSQL. SynchDB extension itself is capable of handling all the data synchronization needs.
Visit SynchDB documentation site here for more details.
SynchDB extension consists of six major components:
- Debezium Runner Engine (Java)
- SynchDB Launcher
- SynchDB Worker
- Format Converter
- Replication Agent
- Table Synch Agent (TBD)
- A java application utilizing Debezium embedded library.
- Supports various connector implementations to replicate change data from various database types such as MySQL, Oracle, SQL Server, etc.
- Invoked by
SynchDB Worker
to initialize Debezium embedded library and receive change data. - Send the change data to
SynchDB Worker
in generalized JSON format for further processing.
- Responsible for creating and destroying SynchDB workers using PostgreSQL's background worker APIs.
- Configure each worker's connector type, destination database IPs, ports, etc.
- Instantiats a
Debezium Runner Engine
to replicate changes from a specific connector type. - Communicate with Debezium Runner via JNI to receive change data in JSON formats.
- Transfer the JSON change data to
Format Converter
module for further processing.
- Parse the JSON change data using PostgreSQL Jsonb APIs
- Transform DDL queries to PostgreSQL compatible queries and translate heterogeneous column data type to PostgreSQL compatible types.
- Transform DML queries to PostgreSQL compatible queries and handle data processing based on column data types
- Produces raw HeapTupleData which can be fed directly to Heap Access Method within PostgreSQL for faster executions.
- Processes the outputs from
Format Converter
. Format Converter
will produce HeapTupleData format outputs, thenReplication Agent
will invoke PostgreSQL's heap access method routines to handle them.
- Design details and implementation are not available yet. TBD
- Intended to provide a more efficient alternative to perform initial table synchronization.
The following software is required to build and run SynchDB. The versions listed are the versions tested during development. Older versions may still work.
- Java Development Kit 17 or later. Download here
- Apache Maven 3.6.3 or later. Download here
- PostgreSQL 16.3 Source. Git clone here. Refer to this wiki for PostgreSQL build requirements
- Docker compose 2.28.1 (for testing). Refer to here
- Unix based operating system like Ubuntu 22.04 or MacOS
Clone the PostgreSQL source and switch to 16.3 release tag
git clone https://github.com/postgres/postgres.git --branch REL_16_3
cd postgres
Clone the SynchDB source from within the extension folder
cd contrib/
git clone https://github.com/Hornetlabs/synchdb.git
If you are working on Ubuntu 22.04.4 LTS, install the Maven as below:
sudo apt install maven
if you are using MacOS, you can use the brew command to install maven (refer (here)[https://brew.sh/] for how to install Homebrew) without any other settings:
brew install maven
If you are working on Ubuntu 22.04.4 LTS, install the OpenJDK as below:
sudo apt install openjdk-21-jdk
If you are working on MacOS, please install the JDK with brew command:
brew install openjdk@22
This can be done by following the standard build and install procedure as described here. Generally, it consists of:
cd /home/$USER/postgres
./configure
make
sudo make install
You should build and install the default extensions as well:
cd /home/$USER/postgres/contrib
make
sudo make install
The commands below build and install the Debezium Runner Engine jar file to your PostgreSQL's lib folder.
cd /home/$USER/postgres/contrib/synchdb
make build_dbz
sudo make install_dbz
The commands below build and install SynchDB extension to your PostgreSQL's lib and share folder.
cd /home/$USER/postgres/contrib/synchdb
make
sudo make install
Lastly, we also need to tell your system's linker where the newly added Java library (libjvm.so) is located in your system.
# Dynamically set JDK paths
JAVA_PATH=$(which java)
JDK_HOME_PATH=$(readlink -f ${JAVA_PATH} | sed 's:/bin/java::')
JDK_LIB_PATH=${JDK_HOME_PATH}/lib
echo $JDK_LIB_PATH
echo $JDK_LIB_PATH/server
sudo echo "$JDK_LIB_PATH" | sudo tee -a /etc/ld.so.conf.d/x86_64-linux-gnu.conf
sudo echo "$JDK_LIB_PATH/server" | sudo tee -a /etc/ld.so.conf.d/x86_64-linux-gnu.conf
Note, for mac with M1/M2 chips, you need to the two lines into /etc/ld.so.conf.d/aarch64-linux-gnu.conf
Run ldconfig to reload:
sudo ldconfig
Ensure synchdo.so extension can link to libjvm Java library on your system:
ldd synchdb.so
linux-vdso.so.1 (0x00007ffeae35a000)
libjvm.so => /usr/lib/jdk-22.0.1/lib/server/libjvm.so (0x00007fc1276c1000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc127498000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fc127493000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc12748e000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fc127489000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc1273a0000)
/lib64/ld-linux-x86-64.so.2 (0x00007fc128b81000)
Refer to testenv/README.md
or visit our documentation page here to prepare a sample MySQL and SQLServer database instances for testing.
SynchDB extension requires pgcrypto to encrypt certain sensitive credential data. Please make sure it is installed prior to installing SynchDB. Alternatively, you can include CASCADE
clause in CREATE EXTENSION
to automatically install dependencies:
CREATE EXTENSION synchdb CASCADE;
With SynchDB installed, we can create one or more connector information entries that represent the details to connect to a remote heterogeneous database and what to replicate. This can be done with synchdb_add_conninfo()
function.
For example:
# create a mysql conninfo called 'mysqlconn' to replicate inventory.orders and inventory.customers tables from source database 'inventory' to destination database 'postgres' using default transform rules
SELECT synchdb_add_conninfo('mysqlconn','127.0.0.1',3306,'mysqluser', 'mysqlpwd', 'inventory', 'postgres', 'inventory.orders,inventory.customers', 'mysql', '');
# create a sqlserver conninfo called 'sqlserverconn' to replicate all tables from source database 'testDB' to destination database 'sqlserverdb' using default transform rules
SELECT synchdb_add_conninfo('sqlserverconn','127.0.0.1',1433,'sa', 'Password!', 'testDB', 'sqlserverdb', '', 'sqlserver', '');
All connection information are created in the table synchdb_conninfo
. We are free to view its content and make modification as required. Please note that the password of a user credential is encrypted by pgcrypto using a key only known to synchdb. So please do not modify the password field or it may be decrypted incorrectly if tempered. See below for an example output:
postgres=# \x
Expanded display is on.
postgres=# select * from synchdb_conninfo;
-[ RECORD 1 ]-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
name | mysqlconn
data | {"pwd": "\\xc30d040703024828cc4d982e47b07bd23901d03e40da5995d2a631fb89d49f748b87247aee94070f71ecacc4990c3e71cad9f68d57c440de42e35bcc78fd145feab03452e454284289db", "port": 3306, "user": "mysqluser", "dstdb": "postgres", "srcdb": "inventory", "table": "inventory.orders,inventory.customers", "hostname": "192.168.1.86", "connector": "mysql", "null"}
-[ RECORD 2 ]-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
name | sqlserverconn
data | {"pwd": "\\xc30d0407030231678e1bb0f8d3156ad23a010ca3a4b0ad35ed148f8181224885464cdcfcec42de9834878e2311b343cd184fde65e0051f75d6a12d5c91d0a0403549fe00e4219215eafe1b", "port": 1433, "user": "sa", "dstdb": "sqlserverdb", "srcdb": "testDB", "table": "null", "hostname": "192.168.1.86", "connector": "sqlserver", "null"}
Use synchdb_start_engine_bgw()
function to start a connector worker. It takes one argument which is the connection information name created above. the commands below will spawn a worker for each conenctor with the specified replication configuration.
select synchdb_start_engine_bgw('mysqlconn');
select synchdb_start_engine_bgw('sqlserverconn');
Use synchdb_state_view()
to examine all the running connectors and their states. Currently, synchdb can support up to 30 running workers.
postgres=# select * from synchdb_state_view;
id | connector | conninfo_name | pid | state | err | last_dbz_offset
----+-----------+----------------+--------+---------+----------+---------------------------------------------------------------------------------------------------
0 | mysql | mysqlconn | 461696 | syncing | no error | {"ts_sec":1725644339,"file":"mysql-bin.000004","pos":138466,"row":1,"server_id":223344,"event":2}
1 | sqlserver | sqlserverconn | 461739 | syncing | no error | {"event_serial_no":1,"commit_lsn":"00000100:00000c00:0003","change_lsn":"00000100:00000c00:0002"}
3 | null | | -1 | stopped | no error | no offset
4 | null | | -1 | stopped | no error | no offset
5 | null | | -1 | stopped | no error | no offset
...
...
Use synchdb_stop_engine_bgw()
SQL function to stop a running or paused connector worker. This function takes conninfo_name
as its only parameter, which can be found from the output of synchdb_get_state()
view.
select synchdb_stop_engine_bgw('mysqlconn');