Releases: Hornetlabs/synchdb
SynchDB 1.0
Release Date: December 24, 2024
We're thrilled to announce the release of SynchDB 1.0! This PostgreSQL extension is designed to seamlessly synchronize data from multiple heterogeneous databases, such as MySQL and MS SQL Server, directly to PostgreSQL. SynchDB manages all data synchronization without requiring middleware, making it an efficient, streamlined solution for real-time data replication and integration.
This release primarily resolves the performance and resource issues known in the 1.0 Beta1 release and adds several new utilities that let users fine-tune the behavior and performance of SynchDB.
Added
- added a data cache in DML parsing stage to prevent frequent access to PostgreSQL's catalog to obtain a table's tuple descriptor structure.
- added a variant of synchdb_start_engine_bgw(name, mode) that takes a second argument to indicate a custom snapshot mode to start the connector with. More detail here.
- added several new GUCs that can be adjusted to tune the performance of Debezium Runner. Refer to here for complete list.
- added a debug SQL function synchdb_log_jvm_meminfo(name) that causes specified connector to output current JVM heap memory usage summary in PostgreSQL log file.
- added a new VIEW synchdb_stats_view that prints statistic information for all connectors. More detail here.
- added a new SQL function synchdb_reset_stats(name) to clear statistic information of specified connector. More detail here.
- added a mess creation script to quickly generate test tables and data on MySQL database type.
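The data cache in the first item of this list avoids repeated catalog lookups for the same table. The general idea can be sketched in Python as a memoized lookup. This is only an illustration: the DescriptorCache class, the fetch_descriptor_from_catalog loader, and the table name are hypothetical, and SynchDB's actual cache is part of its C code and is not shown here.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for a tuple descriptor: (column name, type name) pairs.
Descriptor = Tuple[Tuple[str, str], ...]


class DescriptorCache:
    """Memoizes per-table descriptors so the loader runs once per table."""

    def __init__(self, loader: Callable[[str], Descriptor]) -> None:
        self._loader = loader            # the expensive lookup (assumed)
        self._entries: Dict[str, Descriptor] = {}
        self.catalog_hits = 0            # how many times the loader actually ran

    def get(self, table: str) -> Descriptor:
        if table not in self._entries:
            self.catalog_hits += 1
            self._entries[table] = self._loader(table)
        return self._entries[table]

    def invalidate(self, table: str) -> None:
        # A DDL change (e.g. ALTER TABLE) would make a cached descriptor stale.
        self._entries.pop(table, None)


def fetch_descriptor_from_catalog(table: str) -> Descriptor:
    # Hypothetical loader; a real one would consult the database catalog.
    return (("id", "int4"), ("name", "text"))


cache = DescriptorCache(fetch_descriptor_from_catalog)
for _ in range(1000):                    # 1000 DML events on the same table
    cache.get("inventory.orders")
```

With a cache like this, the loader runs once for the 1000 events instead of once per event. Any such cache needs an invalidation path, because replicated DDL can change a table's shape.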
Changed
- synchdb_state_view(): added a new field called stage that indicates the current stage of a connector (value can either be Initial Snapshot or Change Data Capture).
- synchdb_state_view(): will only show state of valid connectors.
- removed sending "partial batch completion" notification to Debezium Runner in case of error, because a batch is now handled by one PostgreSQL transaction, and partial completion is not allowed.
- SSL related parameters per connector can now be specified in the rule file.
- the maximum heap memory to allocate to JVM that runs the Debezium Runner can now be configured via GUC.
- the maximum number of connector background workers is now configurable instead of hardcoded to 30.
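The all-or-nothing batch handling described above (one transaction per batch, no partial completion) can be illustrated with a small sketch. It uses Python's built-in sqlite3 module in place of PostgreSQL purely so that it runs anywhere. The orders table and the apply_batch helper are hypothetical and are not SynchDB code.

```python
import sqlite3


def apply_batch(conn: sqlite3.Connection, events: list) -> bool:
    """Apply every event in one transaction; roll back all of them on any error."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for order_id, item in events:
                conn.execute(
                    "INSERT INTO orders (id, item) VALUES (?, ?)", (order_id, item)
                )
        return True
    except sqlite3.Error:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL)")
conn.commit()

ok = apply_batch(conn, [(1, "apple"), (2, "pear")])
# The duplicate key in the second batch fails, so row 3 is rolled back as well.
failed = apply_batch(conn, [(3, "plum"), (1, "duplicate")])

row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Because a failed batch leaves no rows behind, the sender has nothing to record as partially complete. It only needs to know whether the whole batch succeeded.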
Fixed
- fixed rapid memory buildups in Debezium runner in JVM by adding a throttle control in the receiving of change events.
- resolved the majority of memory leaks in both SynchDB and Debezium runner components.
- corrected the use of memory context in SynchDB such that heap memory can be correctly freed at the end of each change event processing.
- significantly increased the processing speed of SynchDB by processing a batch within a single PostgreSQL transaction rather than multiple.
- corrected SQLServer's default data type size mapping for char type from 0 to -1.
- resolved high memory usage during DML processing using SPI.
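One common way to implement throttle control of the kind named in the first fix is a bounded buffer that makes the producer wait whenever the consumer falls behind. The sketch below shows that general technique in Python. It is an assumption about the approach, not the Debezium Runner's Java implementation, and every name and constant in it is hypothetical.

```python
import queue
import threading

MAX_BUFFERED = 4           # hypothetical cap on events held in memory
TOTAL_EVENTS = 50

buffer = queue.Queue(maxsize=MAX_BUFFERED)
peak_depth = 0             # largest buffer depth the producer observed
applied = []               # events the consumer has processed, in order


def producer() -> None:
    global peak_depth
    for i in range(TOTAL_EVENTS):
        buffer.put(i)      # blocks while the buffer is full (the throttle)
        peak_depth = max(peak_depth, buffer.qsize())
    buffer.put(None)       # sentinel: no more events


def consumer() -> None:
    while True:
        event = buffer.get()
        if event is None:
            break
        applied.append(event)


threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Memory use stays bounded by the buffer size regardless of how large the source tables are, at the cost of slowing the producer to the consumer's pace.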
Known Issues
- Automatic connector launcher only launches connector workers created under the default postgres database (#71)
- ALTER TABLE ALTER COLUMN does not handle: (#77)
  - Complex data type changes (e.g., from TEXT -> INT)
  - Column index changes
  - Renamed columns
- Restarting a paused connector causes it to resume rather than start in the paused state (#80)
SynchDB 1.0 Beta1
Release Date: October 23, 2024
We're thrilled to announce the beta release of SynchDB 1.0! This PostgreSQL extension is designed to seamlessly synchronize data from multiple heterogeneous databases, such as MySQL and MS SQL Server, directly to PostgreSQL. SynchDB manages all data synchronization without requiring middleware, making it an efficient, streamlined solution for real-time data replication and integration.
Requirements
- PostgreSQL 16
- Java Runtime Environment 17 or later
Core Features
- Supports logical replication from heterogeneous databases (MySQL and SQLServer)
- Supports DDL replication
  - CREATE TABLE
  - DROP TABLE
  - ALTER TABLE ADD COLUMN
  - ALTER TABLE DROP COLUMN
  - ALTER TABLE ALTER COLUMN
- Supports DML replication (INSERT, UPDATE, DELETE)
- Supports max 30 concurrent connector workers
- Supports automatic connector launcher at PostgreSQL startup
- Supports global connector state and last error message views
- Supports selective databases and tables replication
- Supports change events in batches
- Supports connector restarts in different snapshot modes
- Supports offset management interfaces to select custom replication resume point
- Supports default data type and object name transform rules for supported heterogeneous databases
- Supports JSON rule file to define custom:
  - Data type transform rules
  - Column name transform rules
  - Table name transform rules
  - Data expression transform rules
- Supports 2 data apply modes (SPI, HeapAM API)
- Supports several utility functions to perform connector operations:
  - start
  - stop
  - pause
  - resume
Known Issues
- Automatic connector launcher only launches connector workers created under the default postgres database (#71)
- ALTER TABLE ALTER COLUMN does not handle: (#77)
  - Complex data type changes (e.g., from TEXT -> INT)
  - Column index changes
  - Renamed columns
- Cannot specify X509 certificate and private key to connect to heterogeneous databases via TLS (#78)
- The last_dbz_offset column from synchdb_state_view() does not reflect the current data offset, but rather the last offset value flushed to disk (#79)
- Restarting a paused connector causes it to resume rather than start in the paused state (#80)
- Memory leak issue in SynchDB
- Java heap memory in the JVM may run out if the source tables are too big; throttle control is needed there