Let's copy some bytes!
Download the release appropriate for your OS from GitHub releases.
Example: macOS (Apple Silicon)
curl -OL https://github.com/h7kanna/airbyte-replication-operator/releases/download/v0.0.1/airbyte-replication-operator-aarch64-apple-darwin.tar.gz
tar -xvf airbyte-replication-operator-aarch64-apple-darwin.tar.gz
./airbyte-replication-driver --help
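Example: Linux (x86_64). The artifact name below is an assumption based on the macOS naming convention (Rust target triples); check the releases page for the exact file name.
# assumed artifact name; verify against the releases page
curl -OL https://github.com/h7kanna/airbyte-replication-operator/releases/download/v0.0.1/airbyte-replication-operator-x86_64-unknown-linux-gnu.tar.gz
tar -xvf airbyte-replication-operator-x86_64-unknown-linux-gnu.tar.gz
./airbyte-replication-driver --help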
docker pull h7kanna/airbyte-replication-operator:latest
# or
docker pull ghcr.io/h7kanna/airbyte-replication-operator:latest
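To sanity-check the pulled image, you can invoke it directly. This sketch assumes the image's entrypoint is the driver binary, so the flag is passed straight through:
# assumes the image entrypoint is the driver binary
docker run --rm ghcr.io/h7kanna/airbyte-replication-operator:latest --help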
You need the Rust toolchain installed first. Check https://www.rust-lang.org/tools/install
After installing Rust, prepare a cup of your favorite drink and run the build.
git clone git@github.com:pravaah-dev/airbyte-replication-operator.git
cd airbyte-replication-operator
cargo build --bins --release
./target/release/airbyte-replication-driver --help
Airbyte replication connectors run as containers, so we need a container runtime such as Docker Desktop.
Unlike the official Airbyte Worker job, though, the driver here is agnostic to the container runtime.
The architecture differs in how the driver coordinates the replication process from Source to Destination.
The driver plays two roles, Source and Destination, and runs inside the connector containers.
The driver processes inside the containers coordinate through IPC (inter-process communication):
they execute an initialization handshake protocol to start the Source and Destination processes.
If process creation succeeds, the Source driver starts replicating data into the Destination process.
The Source driver takes care of tracking record flow and publishes stats and metrics.
The Destination driver takes care of state persistence.
Once the replication is complete, the drivers again coordinate the shutdown procedure through IPC and exit gracefully.
This protocol needs a shared volume for the process pipes, so an init command is executed at startup.
The net effect is that we run just two containers.
When using Docker, we can run the replication with Docker Compose.
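To make the moving parts concrete, here is a rough shell sketch of what a compose file automates. The image names and arguments below are illustrative assumptions, not the real interface; what matters is the shape: one shared volume for the process pipes, an init step, then one container per driver role.
# illustrative sketch only: image names and arguments are hypothetical
docker volume create replication-shared                                 # shared volume for the process pipes
docker run --rm -v replication-shared:/pipes example/driver init        # hypothetical init: set up the pipes
docker run -d -v replication-shared:/pipes example/source-driver        # driver in the Source role
docker run -d -v replication-shared:/pipes example/destination-driver   # driver in the Destination role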
Now that the introduction is out of the way, let's copy a file from one folder to another using Airbyte replication.
We'll use the Airbyte Source File connector with the Airbyte Destination CSV connector,
à la cp local/input.csv local/_airbyte_raw_test.csv! 😃
cd e2e/hello-airbyte-file-to-csv/docker
docker compose up
Let's verify
chmod +x assert.sh && ./assert.sh
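If you prefer to check by hand, you can inspect the output file directly. Note that Airbyte's CSV destination typically writes wrapped raw records rather than a byte-for-byte copy of the input:
ls -l local/_airbyte_raw_test.csv   # confirm the output file exists
head local/_airbyte_raw_test.csv    # eyeball the replicated records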
Cleanup
docker compose down && rm -rf ./local/_airbyte_raw_test.csv
You can find more example compose files here.
Again, since Airbyte replications run as containers, we can run them on any container platform.
The plan is to support running on various platforms like Amazon ECS, Fargate, etc.
For our darling Kubernetes, the Operator component is a Kubernetes Operator that runs the above driver as a Pod and manages the lifecycle of one replication.
The idea is that higher-level orchestrators can use this as a building block to provide features like scheduling, config management, a UI, etc.
Kubernetes is also one of the pluggable storage options for the replication state.
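To give a feel for how such an Operator is typically driven, here is a sketch of applying a custom resource with kubectl. The apiVersion, kind, and spec fields below are invented for illustration; consult the project for the actual CRD schema.
# hypothetical custom resource; the real CRD schema will differ
cat <<'EOF' | kubectl apply -f -
apiVersion: example.pravaah.dev/v1alpha1       # illustrative group/version
kind: Replication                              # illustrative kind
metadata:
  name: hello-airbyte-file-to-csv
spec:
  source: airbyte/source-file:latest           # assumed field: source connector image
  destination: airbyte/destination-csv:latest  # assumed field: destination connector image
EOF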
As an example, KubeVela can be used as that higher-level workflow engine. Check the details here.
A video demo is here.
The driver has a CLI with some useful operations to view the state and progress of replications.
The CLI also provides some scheduling (cron) and config management capability for simple use cases.
In the future, this could also power the envisioned Airbyte Desktop: a way to store your personal data in your personal data lake using Airbyte replications.
For all commands
./airbyte-replication-driver --help
Example: check the state of a replication
./airbyte-replication-driver --command state --replication hello-airbyte-file-to-csv --store-path tmp
Hopefully, the above tutorial has helped you understand how to use the replication Driver/Operator combo,
and more importantly, its value proposition.
So let us start contributing to the awesome Airbyte ecosystem in a unique way.
Shall we?
Check the Roadmap here.
Your feedback is much appreciated.