A little demo showing how to pull data into Greenplum DB (GPDB) from a Kafka topic, then push data back out into Kafka.
The simplest way to run this is to have GPDB and Kafka running on the same machine, so let's go over that.
- Log into a GPDB Single Node VM as user "gpadmin"
- Place a copy of this repo into `~gpadmin/`
- Create a database, "gpadmin", if it doesn't already exist
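GPDB is PostgreSQL-based, so the stock client utilities work for this step. A minimal sketch, assuming the `gpadmin` user's environment is already set up:

```
# Create the "gpadmin" database only if it doesn't already exist
psql -lqt | cut -d '|' -f 1 | grep -qw gpadmin || createdb gpadmin
```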
- Download and install the latest Apache Kafka release, per the Quick Start
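The exact release you fetch is up to you; this sketch grabs the 0.11.0.0 build that `kafka_env.sh` (below) points at by default:

```
# Version here matches the kafka_dir default in kafka_env.sh; adjust to taste
cd $HOME
curl -LO https://archive.apache.org/dist/kafka/0.11.0.0/kafka_2.11-0.11.0.0.tgz
tar -xzf kafka_2.11-0.11.0.0.tgz
```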
- Edit `kafka_env.sh` to suit your deployment:

```
# Set up environment, shared across scripts
export kafka_dir="$HOME/kafka_2.11-0.11.0.0"
export zk_host=localhost
export KAFKA_HEAP_OPTS="-Xmx16G -Xms16G"
```
```
cd $HOME/gpdb-kafka-round-trip/
```
- Start up Zookeeper: `./zk_start.sh`, and check `./zk.log` to ensure that was successful (also, note this log file can get large).
- Start up Kafka: `./kafka_start.sh`. Again, verify it's running by checking `./kafka.log`.
- Create a topic, `chicago_crimes`: `./kafka_create_topic.sh` (see the verification sketch just below)
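To confirm the topic exists (or to create it by hand), the stock Kafka CLI works. This is a sketch using the 0.11-era `--zookeeper` flag, with `$kafka_dir` and `$zk_host` coming from `kafka_env.sh` and Zookeeper assumed to be on its default port 2181:

```
source ./kafka_env.sh
# List topics; chicago_crimes should appear after kafka_create_topic.sh runs
$kafka_dir/bin/kafka-topics.sh --list --zookeeper $zk_host:2181
# Creating it manually would look roughly like this (single-node settings)
$kafka_dir/bin/kafka-topics.sh --create --zookeeper $zk_host:2181 \
  --replication-factor 1 --partitions 1 --topic chicago_crimes
```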
- Follow this procedure to install the underlying C Kafka client library (librdkafka). The two Go programs are dynamically linked against this library, so it needs to be installed on every segment host in your GPDB cluster (on the Single Node VM, there is only the one host). A build-from-source sketch is shown below.
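If the linked procedure isn't handy, building librdkafka from source is straightforward; this sketch assumes `git`, a C toolchain, and `sudo` access:

```
# Build and install librdkafka, the C client the Go programs link against
git clone https://github.com/edenhill/librdkafka.git
cd librdkafka
./configure && make && sudo make install
sudo ldconfig   # refresh the linker cache so the binaries can find the new .so
```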
- If there is a pre-compiled binary for your platform in `./bin`, you can just symlink each of them into `$HOME/` (a sketch of this is shown below) and skip the remainder of this section.
- Install Go, per these instructions.
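For the symlink shortcut above, something like this would do; the binary names come from the build loop later in this section, and the exact layout under `./bin` is an assumption:

```
# Symlink pre-built binaries (if present for your platform) into $HOME
for b in go-kafkacat producer_example; do
  ln -sf "$PWD/bin/$b" "$HOME/$b"
done
```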
- Refer to this link for guidance on installing the Go Kafka client.
```
git clone https://github.com/mgoddard-pivotal/confluent-kafka-go.git
cd ./confluent-kafka-go/examples/
```
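With pre-module Go toolchains, `go build` resolves the `github.com/confluentinc/confluent-kafka-go/kafka` import path via `$GOPATH`, so if the build below can't find the package, one workaround is to clone the fork into the canonical location instead. A sketch, assuming the Go 1.8+ default `GOPATH` of `$HOME/go`:

```
# Place the fork where the import path expects the upstream package
mkdir -p $HOME/go/src/github.com/confluentinc
cd $HOME/go/src/github.com/confluentinc
git clone https://github.com/mgoddard-pivotal/confluent-kafka-go.git
cd confluent-kafka-go/examples
```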
- Install a prerequisite for `go-kafkacat`: `go get gopkg.in/alecthomas/kingpin.v2`
- In the Bash shell, this loop should build each of the required binaries and copy it into `$HOME`:

```
for dir in go-kafkacat producer_example
do
  cd $dir
  go build . && cp ./$dir ~/
  cd -
done
```
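A quick smoke test that the build worked; `--help` is generated automatically by kingpin, so it is safe to call:

```
# Both binaries should now exist in $HOME and be executable
ls -l $HOME/go-kafkacat $HOME/producer_example
$HOME/go-kafkacat --help
```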
```
cd $HOME/gpdb-kafka-round-trip/
```
- Run `./kafka_gpdb_kafka_roundtrip_demo.sh`, and hit "enter" at each prompt.
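To watch messages flow on the Kafka side while the demo runs, the stock console consumer is handy in a second terminal. The flags shown are the 0.11-era ones, and Kafka is assumed to be on its default port 9092:

```
source ./kafka_env.sh
# Tail the topic from the beginning; Ctrl-C to stop
$kafka_dir/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic chicago_crimes --from-beginning
```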