jistok/gpdb-kafka-round-trip

A little demo showing how to pull data into Greenplum DB (GPDB) from a Kafka topic, then also push data back out into Kafka

The simplest way to run this demo is with GPDB and Kafka on the same machine, so these instructions cover that setup.

  1. Log into a GPDB Single Node VM as user "gpadmin"
  2. Place a copy of this repo into ~gpadmin/
  3. Create a database, "gpadmin", if it doesn't already exist
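Step 3 can be scripted. The function below, `ensure_db`, is a hypothetical helper and not part of this repo. It is a minimal sketch that assumes `psql` and `createdb` from your GPDB installation are on your `PATH` and that you run it as gpadmin, so the default connection settings apply. It looks the name up in the `pg_database` catalog through the always-present `template1` database and calls `createdb` only when the database is missing.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): create a database only if it is missing.
# Assumes psql and createdb are on PATH and default connection settings work
# (e.g. you are logged in as gpadmin on the master host).
ensure_db() {
  local db="${1:-gpadmin}"
  # -t: tuples only, -A: unaligned output, -c: run one command and exit.
  # The name is interpolated directly into SQL, so only pass trusted names.
  if psql -d template1 -tAc "SELECT 1 FROM pg_database WHERE datname = '${db}'" | grep -q 1; then
    echo "database ${db} already exists"
  else
    createdb "${db}" && echo "created database ${db}"
  fi
}
```

Once the function is defined in your shell, run `ensure_db gpadmin`. It is safe to re-run because it does nothing when the database already exists.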

Set up Kafka

  1. Download and install the latest Apache Kafka release, per the Quick Start
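  2. Edit kafka_env.sh to suit your deployment. In particular, point kafka_dir at the directory where you unpacked Kafka (the default below assumes the 0.11.0.0 release built for Scala 2.11), and size the Kafka heap to fit your machine:
    # Set up environment, shared across scripts
    export kafka_dir="$HOME/kafka_2.11-0.11.0.0"
    export zk_host=localhost
    # Kafka JVM heap; shrink this on a small VM (Kafka's own default is 1 GB)
    export KAFKA_HEAP_OPTS="-Xmx16G -Xms16G"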
    
  3. cd $HOME/gpdb-kafka-round-trip/
  4. Start ZooKeeper with ./zk_start.sh, then check ./zk.log to confirm it started successfully (note that this log file can grow large).
  5. Start Kafka with ./kafka_start.sh, and again confirm it is running by checking ./kafka.log.
  6. Create a topic, chicago_crimes: ./kafka_create_topic.sh
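Steps 4 and 5 ask you to confirm startup by reading the logs. A polling helper like the hypothetical `wait_for_log` below (not included in the repo) can automate that. It greps a log file once per second until a pattern appears or a timeout, given in seconds, expires.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): poll a log file until a pattern appears.
# Usage: wait_for_log FILE PATTERN [TIMEOUT_SECONDS]
# Returns 0 when the pattern is found, 1 when the timeout expires.
wait_for_log() {
  local file="$1" pattern="$2" timeout="${3:-30}"
  local waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # The file may not exist yet if the daemon has only just been launched.
    if [ -f "$file" ] && grep -q -- "$pattern" "$file"; then
      echo "found '$pattern' in $file"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "timed out after ${timeout}s waiting for '$pattern' in $file" >&2
  return 1
}
```

For example, `wait_for_log ./zk.log "binding to port"` and `wait_for_log ./kafka.log "started (kafka.server.KafkaServer)"`. Both patterns are assumptions about what ZooKeeper 3.4 and Kafka 0.11 log on a successful start, so check your own logs and adjust them if needed. This also assumes `zk_start.sh` and `kafka_start.sh` return after launching the daemons in the background.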

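The contents of `kafka_create_topic.sh` aren't reproduced here. As a hedged sketch of what step 6 involves, the hypothetical `create_topic` function below calls the `kafka-topics.sh` tool that ships with Kafka 0.11, which registers topics through ZooKeeper. It assumes `kafka_dir` and `zk_host` are set by sourcing `kafka_env.sh` and that ZooKeeper listens on its default port, 2181. It uses a replication factor of 1 because a single broker cannot hold more than one replica. The default partition count of 1 is only an illustrative choice.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the repo's kafka_create_topic.sh): create a Kafka topic
# with the ZooKeeper-based kafka-topics.sh from Kafka 0.11.
# Assumes kafka_dir and zk_host come from kafka_env.sh; ZooKeeper on port 2181.
create_topic() {
  local topic="${1:-chicago_crimes}" partitions="${2:-1}"
  "${kafka_dir:?kafka_dir is not set; source kafka_env.sh first}/bin/kafka-topics.sh" --create \
    --zookeeper "${zk_host:-localhost}:2181" \
    --replication-factor 1 \
    --partitions "$partitions" \
    --topic "$topic"
}
```

Newer Kafka releases removed the `--zookeeper` option from `kafka-topics.sh` in favor of `--bootstrap-server`, so adjust the command if you are not on the 0.11 release that `kafka_env.sh` points at.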
Prepare the Go Kafka client programs

  1. Install librdkafka, the underlying C Kafka client library, following its installation procedure. The two Go programs are dynamically linked against this library, so it must be installed on each segment host in your GPDB cluster (on the Single Node VM, you install it in just one place).
  2. If ./bin contains pre-compiled binaries for your platform, you can symlink each of them into $HOME/ and skip the rest of this section.
  3. Install Go, per the official Go installation instructions.
  4. Refer to the confluent-kafka-go documentation for guidance on installing the Go Kafka client.
  5. git clone https://github.com/mgoddard-pivotal/confluent-kafka-go.git
  6. cd ./confluent-kafka-go/examples/
  7. Install a prerequisite for go-kafkacat: go get gopkg.in/alecthomas/kingpin.v2
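  8. In Bash, the following loop builds the two required binaries and copies them into $HOME:
    for dir in go-kafkacat producer_example
    do
      # Build in a subshell so a failed cd or build cannot leave the loop in the wrong directory
      ( cd "$dir" && go build . && cp "./$dir" ~/ ) || echo "Build failed for $dir" >&2
    done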
    
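Because the Go programs load librdkafka at run time, a host without the library cannot run them. A quick check such as the hypothetical `check_rdkafka` below (not in the repo, and Linux only because it relies on `ldd`) can be run on each segment host. It reports whether `ldd` resolves a librdkafka entry for the given binary or lists it as `not found`.

```shell
#!/usr/bin/env bash
# Hypothetical check (not in the repo): is BINARY linked against librdkafka,
# and can the dynamic loader find the library on this host? Linux only (ldd).
check_rdkafka() {
  local bin="$1" out
  out=$(ldd "$bin" 2>&1)
  if printf '%s\n' "$out" | grep -q 'librdkafka.* => not found'; then
    # Linked against librdkafka, but the library is not installed or not on the loader path.
    echo "$bin: librdkafka is required but not installed on this host" >&2
    return 1
  elif printf '%s\n' "$out" | grep -q 'librdkafka'; then
    echo "$bin: librdkafka found"
    return 0
  else
    echo "$bin: not linked against librdkafka" >&2
    return 1
  fi
}
```

For example: `for b in ~/go-kafkacat ~/producer_example; do check_rdkafka "$b"; done`.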

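If you take the pre-compiled route in step 2, the symlinks can be created in one loop. The `link_prebuilt` function below is a hypothetical sketch. It assumes the binaries for your platform sit directly in the directory you pass in (the real layout of `./bin` may differ, for example with per-platform subdirectories) and are named `go-kafkacat` and `producer_example`, matching the names the build loop produces.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): symlink prebuilt binaries into a target dir.
# Usage: link_prebuilt SRC_DIR [DEST_DIR]   (DEST_DIR defaults to $HOME)
link_prebuilt() {
  local src_dir="$1" dest="${2:-$HOME}" abs_src name
  # Resolve an absolute path so the links stay valid regardless of where they live.
  abs_src=$(cd "$src_dir" && pwd) || return 1
  for name in go-kafkacat producer_example; do
    if [ ! -x "$abs_src/$name" ]; then
      echo "missing or not executable: $abs_src/$name" >&2
      return 1
    fi
    # -f replaces any existing file or link with the same name.
    ln -sf "$abs_src/$name" "$dest/$name"
  done
}
```

For example, from `$HOME/gpdb-kafka-round-trip/`, run `link_prebuilt ./bin`, adjusting the path to wherever the binaries for your platform actually live.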
Run the demo

  1. cd $HOME/gpdb-kafka-round-trip/
  2. Run ./kafka_gpdb_kafka_roundtrip_demo.sh and press Enter at each prompt.
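Before running the demo script, it can help to confirm that the pieces from the earlier sections are in place. The hypothetical `preflight` function below (not part of the repo) checks only that the two Go binaries are executable in `$HOME` and that you are in the repo directory. It does not check that ZooKeeper, Kafka, or GPDB are running.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not in the repo) before running the demo script.
# Reports every problem it finds, then returns non-zero if any check failed.
preflight() {
  local status=0 name
  for name in go-kafkacat producer_example; do
    # -x follows symlinks, so this also works for binaries linked from ./bin.
    if [ ! -x "$HOME/$name" ]; then
      echo "missing executable: $HOME/$name" >&2
      status=1
    fi
  done
  if [ ! -f ./kafka_gpdb_kafka_roundtrip_demo.sh ]; then
    echo "run this from the gpdb-kafka-round-trip directory" >&2
    status=1
  fi
  return "$status"
}
```

For example: `preflight && ./kafka_gpdb_kafka_roundtrip_demo.sh`.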