A little demo showing how to pull data into Greenplum DB (GPDB) from a Kafka topic, then push data back out into Kafka.
The simplest way to run this is to have GPDB and Kafka running on the same machine, so let's go over that.
- Log into a GPDB Single Node VM as user "gpadmin"
- Place a copy of this repo into `~gpadmin/`
- Create a database, "gpadmin", if it doesn't already exist
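GPDB is PostgreSQL-based, so the stock client utilities work for this step. A minimal sketch, assuming the `gpadmin` user's environment is already set up:

```
# Create the "gpadmin" database only if it doesn't already exist
psql -lqt | cut -d '|' -f 1 | grep -qw gpadmin || createdb gpadmin
```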
- Download and install the latest Apache Kafka release, per the Quick Start
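The exact release you fetch is up to you; this sketch grabs the 0.11.0.0 build that `kafka_env.sh` (below) points at by default:

```
# Version here matches the kafka_dir default in kafka_env.sh; adjust to taste
cd $HOME
curl -LO https://archive.apache.org/dist/kafka/0.11.0.0/kafka_2.11-0.11.0.0.tgz
tar -xzf kafka_2.11-0.11.0.0.tgz
```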
- Edit `kafka_env.sh` to suit your deployment:

```
# Set up environment, shared across scripts
export kafka_dir="$HOME/kafka_2.11-0.11.0.0"
export zk_host=localhost
export KAFKA_HEAP_OPTS="-Xmx16G -Xms16G"
```
```
cd $HOME/gpdb-kafka-round-trip/
```
- Start up Zookeeper: `./zk_start.sh`, and check `./zk.log` to ensure that was successful (also, note this log file can get large).
- Start up Kafka: `./kafka_start.sh`. Again, verify it's running by checking `./kafka.log`.
- Create a topic, `chicago_crimes`: `./kafka_create_topic.sh` (see the verification sketch just below)
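To confirm the topic exists (or to create it by hand), the stock Kafka CLI works. This is a sketch using the 0.11-era `--zookeeper` flag, with `$kafka_dir` and `$zk_host` coming from `kafka_env.sh` and Zookeeper assumed to be on its default port 2181:

```
source ./kafka_env.sh
# List topics; chicago_crimes should appear after kafka_create_topic.sh runs
$kafka_dir/bin/kafka-topics.sh --list --zookeeper $zk_host:2181
# Creating it manually would look roughly like this (single-node settings)
$kafka_dir/bin/kafka-topics.sh --create --zookeeper $zk_host:2181 \
  --replication-factor 1 --partitions 1 --topic chicago_crimes
```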
- Follow this procedure to install the underlying C Kafka client library (librdkafka). The two Go programs are dynamically linked against this library, so it needs to be installed on every segment host in your GPDB cluster (on the Single Node VM, there is only the one host). A build-from-source sketch is shown below.
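If the linked procedure isn't handy, building librdkafka from source is straightforward; this sketch assumes `git`, a C toolchain, and `sudo` access:

```
# Build and install librdkafka, the C client the Go programs link against
git clone https://github.com/edenhill/librdkafka.git
cd librdkafka
./configure && make && sudo make install
sudo ldconfig   # refresh the linker cache so the binaries can find the new .so
```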
- If there is a pre-compiled binary for your platform in `./bin`, you can just symlink each of them into `$HOME/` (a sketch of this is shown below) and skip the remainder of this section.
- Install Go, per these instructions.
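For the symlink shortcut above, something like this would do; the binary names come from the build loop later in this section, and the exact layout under `./bin` is an assumption:

```
# Symlink pre-built binaries (if present for your platform) into $HOME
for b in go-kafkacat producer_example; do
  ln -sf "$PWD/bin/$b" "$HOME/$b"
done
```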
- Refer to this link for guidance on installing the Go Kafka client.
```
git clone https://github.com/mgoddard-pivotal/confluent-kafka-go.git
cd ./confluent-kafka-go/examples/
```
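With pre-module Go toolchains, `go build` resolves the `github.com/confluentinc/confluent-kafka-go/kafka` import path via `$GOPATH`, so if the build below can't find the package, one workaround is to clone the fork into the canonical location instead. A sketch, assuming the Go 1.8+ default `GOPATH` of `$HOME/go`:

```
# Place the fork where the import path expects the upstream package
mkdir -p $HOME/go/src/github.com/confluentinc
cd $HOME/go/src/github.com/confluentinc
git clone https://github.com/mgoddard-pivotal/confluent-kafka-go.git
cd confluent-kafka-go/examples
```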
- Install a prerequisite for `go-kafkacat`: `go get gopkg.in/alecthomas/kingpin.v2`
- In the Bash shell, this loop should build each of the required binaries and copy it into `$HOME`:

```
for dir in go-kafkacat producer_example
do
  cd $dir
  go build . && cp ./$dir ~/
  cd -
done
```
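A quick smoke test that the build worked; `--help` is generated automatically by kingpin, so it is safe to call:

```
# Both binaries should now exist in $HOME and be executable
ls -l $HOME/go-kafkacat $HOME/producer_example
$HOME/go-kafkacat --help
```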
```
cd $HOME/gpdb-kafka-round-trip/
```
- Run `./kafka_gpdb_kafka_roundtrip_demo.sh`, and hit "enter" at each prompt.
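To watch messages flow on the Kafka side while the demo runs, the stock console consumer is handy in a second terminal. The flags shown are the 0.11-era ones, and Kafka is assumed to be on its default port 9092:

```
source ./kafka_env.sh
# Tail the topic from the beginning; Ctrl-C to stop
$kafka_dir/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic chicago_crimes --from-beginning
```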