This repository includes the artifacts required to compare an RDBMS such as PostgreSQL with Kafka. We primarily look at the cost of adding triggers to achieve complex event processing (CEP), as a stream processing system does, and analyze the effect on insertion speed. We then examine how decoupling ingestion from processing is beneficial, with Kafka as the storage system and KSQL (Kafka Streams) for processing.
Thanks for providing a Streaming API for RSVPs.
- We use Streaming API from for our experiment
- Clone this repository
- Run
to save data from the API
cd datagen
pip3 install requests
python3
- Once we have sufficient data we can transfer this to a GCS bucket with name
for use in benchmark
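The collection step above can be sketched roughly as follows. This is a minimal illustration, not the script shipped in datagen: the endpoint `STREAM_URL`, the output file `rsvps.json`, and the helpers `save_stream` and `main` are all hypothetical names, and it assumes the API emits one JSON document per line.

```python
import json
from typing import Iterable

# Hypothetical placeholder; substitute the real streaming endpoint.
STREAM_URL = "https://example.invalid/rsvps"


def save_stream(lines: Iterable[bytes], path: str, limit: int) -> int:
    """Append up to `limit` valid JSON lines to `path`; return how many were saved."""
    saved = 0
    if limit <= 0:
        return saved
    with open(path, "a", encoding="utf-8") as out:
        for raw in lines:
            if not raw:  # streaming APIs often send empty keep-alive lines
                continue
            try:
                record = json.loads(raw)
            except ValueError:  # malformed JSON or undecodable bytes
                continue
            out.write(json.dumps(record) + "\n")
            saved += 1
            if saved >= limit:
                break
    return saved


def main() -> None:
    import requests  # installed via `pip3 install requests`

    # stream=True keeps the connection open and yields lines as they arrive.
    with requests.get(STREAM_URL, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        print(save_stream(resp.iter_lines(), "rsvps.json", limit=1000))
```

Calling `main()` would collect 1000 records under these assumptions. Writing newline-delimited JSON keeps the file easy to replay line by line during the benchmark.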
Note: Ensure that the VM has sufficient cores and memory to run Kafka and the consumers in parallel, e.g. 15 cores and 50 GB of memory.
- Run
sudo apt update && sudo apt upgrade
- Copy the data generated from previous step from GCS Bucket into VM
gsutil cp -R gs://$BUCKET_NAME .
- Install
as described here: Install Docker via Convenience Script
curl -fsSL -o
sudo sh
sudo usermod -aG docker $USER
- Install
sudo curl -L "$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
- Install pip3 to install dependencies
sudo apt install python3-pip
- Clone this repository onto the VM
git clone
- Start containers
cd advanced-databases_ApacheKafka
docker-compose up -d
- (Optional) Set up local port forwarding to view the Control Center; run this command on your workstation
Ensure that SSH has been set up to connect to the instances (see GCP - Connecting to Instances)
ssh -L 5000:localhost:9021 [USERNAME]@[EXTERNAL_IP_ADDRESS]
- Move
- Setup database
docker-compose exec postgres psql -Ubenchmark -f /datagen/initdb.sql
- We need to install
to connect to PostgreSQL
pip3 install psycopg2-binary
- Run benchmark
python3 ingest_postgres.py
- Remove trigger from
one at a time and repeat the steps above
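To make the measurement concrete, the PostgreSQL ingestion benchmark can be sketched as a timed loop of single-row inserts. This is an illustrative sketch, not the repository's ingestion script: the table `rsvps`, its `payload` column, the data file `rsvps.json`, and every connection setting except the `benchmark` user are assumptions, and `measure_throughput` is a hypothetical helper.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_throughput(insert: Callable[[str], None], rows: Iterable[str]) -> Tuple[int, float]:
    """Call `insert` once per row; return (row count, elapsed seconds)."""
    count = 0
    start = time.perf_counter()
    for row in rows:
        insert(row)
        count += 1
    return count, time.perf_counter() - start


def main() -> None:
    import psycopg2  # installed via `pip3 install psycopg2-binary`

    # Assumed connection settings; only the user name comes from the steps above.
    conn = psycopg2.connect(host="localhost", port=5432, user="benchmark",
                            password="benchmark", dbname="benchmark")
    # `with conn` commits the transaction on success; it does not close the connection.
    with conn, conn.cursor() as cur, open("rsvps.json", encoding="utf-8") as f:
        # Hypothetical table and column; use the schema defined in initdb.sql.
        def insert(line: str) -> None:
            cur.execute("INSERT INTO rsvps (payload) VALUES (%s)", (line,))

        count, elapsed = measure_throughput(insert, f)
    conn.close()
    rate = count / elapsed if elapsed > 0 else 0.0
    print(f"{count} rows in {elapsed:.2f}s ({rate:.0f} rows/s)")
```

Because row-level triggers run as part of each `INSERT`, comparing the reported rate across runs with different trigger sets gives a rough view of what each trigger costs.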
- Create a new topic to inject data for benchmark
docker-compose exec broker kafka-topics --create \
  --zookeeper zookeeper:2181 \
  --replication-factor 1 --partitions 60 \
  --topic meetup
- Open multiple tabs or panes (a terminal multiplexer such as tmux, or several SSH connections, works well) and run the following in each
docker-compose exec ksqldb-cli ksql http://ksqldb-server:8088
- Create
from the Kafka topic
Refer to: kafka-processing.sql
- Install
to act as the producer for Kafka
pip3 install kafka-python
- Execute the other queries in multiple panes / windows
- Run benchmark
- Multiplex and run multiple producers to see high throughput
- Repeat steps 4 - 6 for various query combinations
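A producer for the Kafka benchmark could look roughly like the sketch below, which replays a newline-delimited JSON file into the `meetup` topic with kafka-python. The broker address `localhost:9092`, the data file `rsvps.json`, and the helper `send_all` are assumptions for illustration; the repository's own producer may differ.

```python
import json
from typing import Any, Iterable


def send_all(producer: Any, topic: str, lines: Iterable[str]) -> int:
    """Send each non-empty line to `topic` as a JSON value; return the number sent."""
    sent = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        producer.send(topic, json.loads(line))
        sent += 1
    producer.flush()  # block until buffered records have been delivered
    return sent


def main() -> None:
    from kafka import KafkaProducer  # installed via `pip3 install kafka-python`

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",  # assumed broker address
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    with open("rsvps.json", encoding="utf-8") as f:  # hypothetical data file
        print(send_all(producer, "meetup", f))
    producer.close()
```

Running several copies of such a script in separate panes is one way to drive multiple producers at once. Since no key is set, records are not pinned to a particular partition, so the load can spread across the topic's partitions.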