A simple example Scala Flink job that consumes protobuf data from a Kinesis stream and stores it on S3 as partitioned Parquet.
The application is designed for deployment on Kinesis Analytics for Java Applications.
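The job is essentially a Kinesis source feeding a bulk-format StreamingFileSink. The sketch below only illustrates that shape using Flink 1.8 APIs; the event type, stream name, bucket path, and the Avro reflect writer (standing in for the project's protobuf Parquet writer) are illustrative assumptions, not the actual project code.

```scala
import java.util.Properties

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.core.fs.Path
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.DateTimeBucketAssigner
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer
import org.apache.flink.streaming.connectors.kinesis.config.{AWSConfigConstants, ConsumerConfigConstants}

// Hypothetical event type; the real job deserializes a protobuf message instead.
case class Event(id: String, payload: String)

object StreamingJob {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Bulk-format sinks only finalize Parquet part files on checkpoints,
    // so the checkpoint interval drives how often new files appear.
    env.enableCheckpointing(5 * 60 * 1000)

    // Kinesis consumer configuration (region and stream name are placeholders).
    val consumerConfig = new Properties()
    consumerConfig.setProperty(AWSConfigConstants.AWS_REGION, "eu-west-1")
    consumerConfig.setProperty(ConsumerConfigConstants.STREAM_INITIAL_POSITION, "LATEST")

    val events: DataStream[Event] = env
      .addSource(new FlinkKinesisConsumer[String]("example-stream", new SimpleStringSchema(), consumerConfig))
      .map(raw => Event(raw.hashCode.toString, raw)) // stand-in for protobuf parsing

    // Write partitioned Parquet: time-based buckets become folders under the S3 prefix.
    val sink: StreamingFileSink[Event] = StreamingFileSink
      .forBulkFormat(new Path("s3://example-bucket/data"), ParquetAvroWriters.forReflectRecord(classOf[Event]))
      .withBucketAssigner(new DateTimeBucketAssigner[Event]("yyyy-MM-dd--HH"))
      .build()

    events.addSink(sink)
    env.execute("kinesis-to-parquet-example")
  }
}
```

Because Parquet is a bulk format, part files can only be rolled on checkpoints, which is where the "new file roughly every five minutes" behaviour described below comes from.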
- Terraform
- AWS CLI
- sbt 1.3.5
- Java 1.8
- Maven > 3.1
Because of license issues, the Flink Kinesis Streams connector artifact is not publicly available (see the Flink documentation), so it has to be built locally.
First get the Flink 1.8.2 sources from GitHub, then build them with the Kinesis connector included:
mvn clean install -Pinclude-kinesis -DskipTests
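Once `mvn clean install` has put the connector into the local Maven repository, the application's build.sbt can resolve it from there. The exact coordinates depend on the project's build definition; the lines below assume the standard Flink 1.8.2 artifact built for Scala 2.11:

```scala
// build.sbt (sketch): resolve the locally installed Kinesis connector
resolvers += Resolver.mavenLocal

libraryDependencies += "org.apache.flink" % "flink-connector-kinesis_2.11" % "1.8.2"
```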
Then build the application fat jar:
sbt assembly
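The fat-jar step relies on the sbt-assembly plugin, which the project presumably already declares in project/plugins.sbt along these lines (the version shown is only an example):

```scala
// project/plugins.sbt (sketch)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")
```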
Create the AWS infrastructure with Terraform:
cd infrastructure
terraform init
terraform apply
Example deployment via the AWS CLI:
./create_application.sh
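create_application.sh registers the assembled jar as a Kinesis Analytics for Java application. If the job reads configuration (stream name, bucket, region) at runtime, a KDA application would typically do so through the Kinesis Analytics runtime API; the property group and key below are purely hypothetical and may not match what the script actually configures.

```scala
import com.amazonaws.services.kinesisanalytics.runtime.KinesisAnalyticsRuntime

object RuntimeConfig {
  // Hypothetical property group and key; the real names depend on the application setup.
  def s3BasePath: String = {
    val groups = KinesisAnalyticsRuntime.getApplicationProperties()
    Option(groups.get("FlinkApplicationProperties"))
      .flatMap(props => Option(props.getProperty("s3.path")))
      .getOrElse("s3://example-bucket/data")
  }
}
```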
Run KinesisProducerApp to put test records on the stream, then check the bucket for the Parquet data. A new file should be created roughly every five minutes.
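KinesisProducerApp is part of this repository; the sketch below is only a rough illustration of what such a producer does (region, stream name, and the plain-string payload standing in for a serialized protobuf message are assumptions):

```scala
import java.nio.ByteBuffer

import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder

object ProducerSketch {
  def main(args: Array[String]): Unit = {
    val kinesis = AmazonKinesisClientBuilder.standard().withRegion("eu-west-1").build()
    (1 to 100).foreach { i =>
      // The real producer serializes a protobuf message; a plain string stands in here.
      val payload = ByteBuffer.wrap(s"event-$i".getBytes("UTF-8"))
      kinesis.putRecord("example-stream", payload, s"partition-$i")
    }
  }
}
```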
The following script can be used to update the application code:
./update_request.sh

To delete the application:
./delete_request.sh
Clean the S3 bucket before destroying all resources; S3 buckets can't be deleted while they still contain data.
terraform destroy