
Quick start

Let's copy some bytes!

Installation

From releases

Download the archive for your OS from the GitHub releases page.

Example: macOS (Apple Silicon)

curl -OL https://github.com/h7kanna/airbyte-replication-operator/releases/download/v0.0.1/airbyte-replication-operator-aarch64-apple-darwin.tar.gz
tar -xvf airbyte-replication-operator-aarch64-apple-darwin.tar.gz
./airbyte-replication-driver --help

From a container registry

docker pull h7kanna/airbyte-replication-operator:latest
# or
docker pull ghcr.io/h7kanna/airbyte-replication-operator:latest

From source

You need the Rust toolchain installed first. See https://www.rust-lang.org/tools/install

After installing Rust, prepare a cup of your favorite drink and run the build.

git clone git@github.com:pravaah-dev/airbyte-replication-operator.git
cd airbyte-replication-operator
cargo build --bins --release
./target/release/airbyte-replication-driver --help

Replication Driver

Airbyte connectors run as containers, so we need a container runtime such as Docker Desktop.
Unlike the official Airbyte worker job, however, the driver here is agnostic to the container runtime.
The architecture differs in how the driver coordinates the replication from Source to Destination.
The driver plays two roles, Source and Destination, and runs inside the connector containers.
The driver processes inside the containers coordinate with each other through IPC (inter-process communication).
They execute an initialization handshake protocol to start the Source and Destination processes.
If both processes start successfully, the Source driver begins replicating data into the Destination process.
The Source driver tracks the record flow and publishes stats and metrics.
The Destination driver takes care of state persistence.
Once the replication is complete, the drivers again coordinate the shutdown procedure through IPC and exit gracefully.
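
To make the handshake, record flow, and shutdown steps above concrete, here is a minimal in-process sketch in Rust. It is not the driver's actual implementation: two threads stand in for the Source and Destination driver processes, std::sync::mpsc channels stand in for the pipes on the shared volume, and the Control message names are hypothetical.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

// Hypothetical control messages; the real driver's wire format is not described here.
#[derive(Debug, PartialEq)]
enum Control {
    Ready,          // a driver reports that its connector process has started
    Record(String), // one record flowing from Source to Destination
    Done,           // Source has no more records
    Ack,            // Destination acknowledges the shutdown
}

/// Source side: handshake, stream records, then coordinate shutdown.
/// Returns the number of records sent (a stand-in for the "record flow" stat).
fn run_source(to_dest: Sender<Control>, from_dest: Receiver<Control>, records: Vec<String>) -> usize {
    // Initialization handshake: announce readiness, then wait for the Destination.
    to_dest.send(Control::Ready).expect("destination hung up");
    assert_eq!(from_dest.recv().expect("destination hung up"), Control::Ready);

    let mut sent = 0;
    for record in records {
        to_dest.send(Control::Record(record)).expect("destination hung up");
        sent += 1;
    }

    // Shutdown: signal completion and wait for the Destination's acknowledgement.
    to_dest.send(Control::Done).expect("destination hung up");
    assert_eq!(from_dest.recv().expect("destination hung up"), Control::Ack);
    sent
}

/// Destination side: answer the handshake, collect records until Done, then ack.
fn run_destination(to_src: Sender<Control>, from_src: Receiver<Control>) -> Vec<String> {
    assert_eq!(from_src.recv().expect("source hung up"), Control::Ready);
    to_src.send(Control::Ready).expect("source hung up");

    let mut written = Vec::new();
    loop {
        match from_src.recv().expect("source hung up") {
            Control::Record(r) => written.push(r),
            Control::Done => {
                to_src.send(Control::Ack).expect("source hung up");
                break;
            }
            other => panic!("unexpected message during replication: {:?}", other),
        }
    }
    written
}

/// Wires the two roles together, one channel per direction.
fn replicate(records: Vec<String>) -> (usize, Vec<String>) {
    let (src_tx, dest_rx) = channel::<Control>(); // Source -> Destination
    let (dest_tx, src_rx) = channel::<Control>(); // Destination -> Source
    let destination = thread::spawn(move || run_destination(dest_tx, dest_rx));
    let sent = run_source(src_tx, src_rx, records);
    let written = destination.join().expect("destination panicked");
    (sent, written)
}

fn main() {
    let (sent, written) = replicate(vec!["a,1".to_string(), "b,2".to_string()]);
    println!("sent {} records, destination wrote {:?}", sent, written);
}
```

Using one channel per direction mirrors a pair of pipes: each side only ever writes to one and reads from the other, so neither blocks on its own messages.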

This protocol needs a shared volume for the process pipes,
so an init command is executed at startup to prepare it.

[Diagram: Drivers]

The net effect is that we run just two containers.

When using Docker, we can run the replication with Docker Compose.

Now that the introduction is out of the way, let's copy a file from one folder to another using an Airbyte replication.
We'll replicate from the Airbyte Source File connector to the Airbyte Destination CSV connector.

À la cp local/input.csv local/_airbyte_raw_test.csv ! 😃

cd e2e/hello-airbyte-file-to-csv/docker
docker compose up

Let's verify

chmod +x assert.sh && ./assert.sh

Cleanup

docker compose down && rm -rf ./local/_airbyte_raw_test.csv

More example compose files are available here.

Replication Operator

Again, because Airbyte replications run as containers, we can run them on any container platform.
The plan is to support platforms such as Amazon ECS and Fargate.

For our darling Kubernetes, the Operator component is a Kubernetes Operator that runs the driver described above as a Pod and manages the lifecycle of a single replication.
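
Managing "the lifecycle of one replication" boils down to moving that replication through a small set of phases as the Pod progresses. The sketch below models only those transitions as a pure function; the Phase and Event names are hypothetical and are not the operator's actual status fields, and a real operator would drive this from a Kubernetes reconcile loop rather than a fixed event list.

```rust
// Hypothetical phases of a single replication.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Phase {
    Pending,   // replication requested, driver Pod not created yet
    Running,   // driver Pod is running the Source/Destination drivers
    Succeeded, // drivers finished the shutdown handshake
    Failed,    // Pod or connector exited with an error
}

// Hypothetical observations the operator makes about the driver Pod.
#[derive(Debug, Clone, Copy)]
enum Event {
    PodStarted,
    PodCompleted,
    PodErrored,
}

/// Computes the next phase. Terminal phases stay put, and events that
/// don't apply to the current phase leave it unchanged.
fn next(phase: Phase, event: Event) -> Phase {
    match (phase, event) {
        (Phase::Pending, Event::PodStarted) => Phase::Running,
        (Phase::Running, Event::PodCompleted) => Phase::Succeeded,
        (Phase::Pending, Event::PodErrored) | (Phase::Running, Event::PodErrored) => Phase::Failed,
        (p, _) => p,
    }
}

fn main() {
    let events = [Event::PodStarted, Event::PodCompleted];
    let final_phase = events.iter().fold(Phase::Pending, |p, e| next(p, *e));
    println!("{:?}", final_phase);
}
```

Keeping the transition logic pure makes it easy to test separately from any Kubernetes client code.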

The idea is that higher-level orchestrators can use this as a building block to provide features such as scheduling, config management, and a UI.

Kubernetes is also one of the pluggable storage options for the replication state.
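
"Pluggable storage" suggests a single state-store interface with several backends. Here is a minimal Rust sketch of that idea, assuming a hypothetical StateStore trait with an in-memory and a file-based backend; the trait, the one-JSON-file-per-replication layout, and the cursor payload are all illustrative assumptions, not the driver's actual storage API. A Kubernetes-backed store would implement the same trait but needs a Kubernetes client, so it is not shown.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::PathBuf;

// Hypothetical storage interface; the real driver's abstraction may differ.
trait StateStore {
    fn save(&mut self, replication: &str, state: &str) -> io::Result<()>;
    fn load(&self, replication: &str) -> io::Result<Option<String>>;
}

/// Keeps state in memory (useful for tests).
struct MemoryStore {
    states: HashMap<String, String>,
}

impl StateStore for MemoryStore {
    fn save(&mut self, replication: &str, state: &str) -> io::Result<()> {
        self.states.insert(replication.to_string(), state.to_string());
        Ok(())
    }
    fn load(&self, replication: &str) -> io::Result<Option<String>> {
        Ok(self.states.get(replication).cloned())
    }
}

/// Keeps one JSON file per replication under a root directory (assumed layout).
struct FileStore {
    root: PathBuf,
}

impl FileStore {
    fn path_for(&self, replication: &str) -> PathBuf {
        self.root.join(format!("{replication}.json"))
    }
}

impl StateStore for FileStore {
    fn save(&mut self, replication: &str, state: &str) -> io::Result<()> {
        fs::create_dir_all(&self.root)?;
        fs::write(self.path_for(replication), state)
    }
    fn load(&self, replication: &str) -> io::Result<Option<String>> {
        match fs::read_to_string(self.path_for(replication)) {
            Ok(s) => Ok(Some(s)),
            // A missing file just means no state has been persisted yet.
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
            Err(e) => Err(e),
        }
    }
}

/// What a Destination driver might do: persist a checkpoint, then read it back.
/// Works with any backend because it only depends on the trait.
fn persist_and_read(store: &mut dyn StateStore) -> io::Result<Option<String>> {
    store.save("hello-airbyte-file-to-csv", r#"{"cursor": 42}"#)?;
    store.load("hello-airbyte-file-to-csv")
}

fn main() -> io::Result<()> {
    let mut memory = MemoryStore { states: HashMap::new() };
    let mut files = FileStore { root: std::env::temp_dir().join("replication-state-sketch") };
    println!("{:?}", persist_and_read(&mut memory)?);
    println!("{:?}", persist_and_read(&mut files)?);
    Ok(())
}
```

The CLI's --store-path flag hints that a filesystem-backed option exists, but its on-disk format is not documented here.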

For example, KubeVela can serve as that higher-level workflow engine. Check the details here.
A video demo is here.

[Diagram: Drivers]

CLI

The driver has a CLI for useful operations such as viewing the state and progress of replications.
The CLI also provides basic scheduling (cron) and config management for simple use cases.
In the future, it could also power the envisioned Airbyte Desktop: a way to store your personal data in your own data lake using Airbyte replications.

To list all commands:

./airbyte-replication-driver --help

Example: Check the status of the replication

./airbyte-replication-driver --command state --replication hello-airbyte-file-to-csv --store-path tmp
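
To show how the three flags in the example fit together, here is a small std-only Rust sketch that parses exactly that invocation. It is not the driver's parser: it treats all three flags as required and rejects anything else (including --help), and the Args struct and parse_args function are hypothetical. Run --help for the authoritative list of options.

```rust
/// Hypothetical parsed form of the example invocation above.
#[derive(Debug, PartialEq)]
struct Args {
    command: String,     // e.g. "state"
    replication: String, // e.g. "hello-airbyte-file-to-csv"
    store_path: String,  // e.g. "tmp"
}

/// Parses `--flag value` pairs for the three flags used in the example.
fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<Args, String> {
    let mut command: Option<String> = None;
    let mut replication: Option<String> = None;
    let mut store_path: Option<String> = None;

    let mut it = args.into_iter();
    while let Some(flag) = it.next() {
        // Pick which field this flag fills.
        let slot = match flag.as_str() {
            "--command" => &mut command,
            "--replication" => &mut replication,
            "--store-path" => &mut store_path,
            other => return Err(format!("unknown flag: {other}")),
        };
        // The value is the next argument.
        *slot = Some(it.next().ok_or_else(|| format!("missing value for {flag}"))?);
    }

    Ok(Args {
        command: command.ok_or("--command is required")?,
        replication: replication.ok_or("--replication is required")?,
        store_path: store_path.ok_or("--store-path is required")?,
    })
}

fn main() {
    // Skip argv[0], the program name.
    match parse_args(std::env::args().skip(1)) {
        Ok(args) => println!("{:?}", args),
        Err(e) => eprintln!("error: {e}"),
    }
}
```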

Finally

Hopefully, this tutorial has helped you understand how to use the replication Driver/Operator combo
and, more importantly, its value proposition.

So let us start contributing to the awesome Airbyte ecosystem in a unique way.
Shall we?

Check the Roadmap here.

Your feedback is much appreciated.