Welcome to Byte to Eat, the tastiest way to explore Data Contracts in action! This demo simulates a restaurant kitchen running on Confluent Cloud, showcasing how data contracts work to manage recipe and order events.
This project demonstrates 4 key capabilities of Data Contracts:
- Data Quality Rules – Ensuring only valid recipes and orders get into the kitchen.
- Data Transformation Rules – Ensuring data is in the right format before cooking.
- Schema Migration Rules – Evolving schemas while ensuring no recipe goes bad.
- Data Encryption Rules – Keeping PII (personally identifiable information) safe.
- Recipe Producer: Sends recipe details, including ingredients, steps, and chef info to the Kafka topic in Confluent Cloud.
- Order Producer: Simulates customer orders referencing recipes by their `recipe_id`.
- Kafka Consumers: Consume orders and recipes, validating and transforming the data based on Avro schemas.
- Schema Registry: Ensures proper validation and schema management for the recipes and orders.
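
For orientation, the sketch below shows roughly what a producer like the Recipe Producer might look like. The topic name, the generated `Recipe` class, and the configuration values are illustrative assumptions, not the exact code in `ProducerAvroRecipe.java`.

```java
// Minimal Avro producer sketch. "Recipe" stands in for the class generated from the
// Avro schema (see mvn generate-sources below); topic and config values are placeholders.
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import io.confluent.kafka.serializers.KafkaAvroSerializer;

public class RecipeProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "<CONFLUENT_CLOUD_BOOTSTRAP>");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class.getName());
        props.put("schema.registry.url", "<SCHEMA_REGISTRY_URL>");
        // Confluent Cloud API keys / SASL settings omitted for brevity.

        try (KafkaProducer<String, Recipe> producer = new KafkaProducer<>(props)) {
            Recipe recipe = Recipe.newBuilder()
                    .setRecipeId("spaghetti-bolognese")
                    .setIngredients(List.of("spaghetti", "minced beef", "tomato passata"))
                    .build();
            producer.send(new ProducerRecord<>("raw.recipes", recipe.getRecipeId().toString(), recipe));
        }
    }
}
```
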
See the `demo-recording-480p.mp4` file in this directory for a recording of the demo.
- Java: Kafka Producers & Consumers
- Confluent Cloud: The data streaming platform
- Confluent Schema Registry: Manages Data Contracts
- Flink: To join the Recipe and Order events
- Avro: Used to define the Schema
- Clone the Repo

  ```
  git clone https://github.com/wvella/byte-to-eat.git
  ```

- Setup Variables via Terraform

  - Create a `terraform.tfvars` in the `terraform/confluent-cloud` directory with the following contents:

    ```
    confluent_cloud_api_key    = "<<confluent_cloud_api_key_for_terraform>>"
    confluent_cloud_api_secret = "<<confluent_cloud_api_secret_for_terraform>>"
    aws_kms_key_arn            = "<<arn for the key in AWS KMS>>"
    ```

- Deploy the Demo

  - Run `./terraform-apply.sh`. This script will deploy all the resources in Confluent Cloud and produce a Spaghetti Bolognese recipe to the topic. 🍝 Yum!
  - Grant Confluent access to the Key in AWS KMS by applying the policy directly on the key in KMS.
    - TODO: Automate this manual step!
- Flow
  - Pre-Preparation
    - Open VSCode
    - Ensure Docker is running
    - Open `ProducerAvroRecipe.java`
    - Open `data-governance.tf`
    - Log into Confluent Cloud
    - Open `schema-raw.recipe-value-v2.avsc`
    - Open 4 Terminal Windows and `cd java-demo-data_contracts-bytetoeat`:
      - Window 1: V1 Producer (White Background)
      - Window 2: V2 Producer (Black Background)
      - Window 3: V1 Consumer (White Background)
      - Window 4: V2 Consumer (Black Background)
  - Data Quality Rules
    - Show the `require_more_than_one_ingredient` rule definition in Terraform (`data-governance.tf`) / the Confluent Cloud UI.
    - Demonstrate by trying to produce a recipe that violates the rule: run `./run-producer-recipe.sh false` in the `byte-to-eat-v1` directory (see the sketch below).
    - Show the bad message ending up in the `raw.recipes.dlq` topic.
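
    As a rough illustration of this step from the Java side, here is a hedged sketch of producing a rule-violating recipe. It reuses the `producer` and `Recipe` names from the sketch earlier in this README, which are assumptions rather than the demo's actual code; the rule itself lives in `data-governance.tf`, not in the client.

    ```java
    // Sketch only: a recipe with a single ingredient, assuming the quality rule is a
    // CEL-style condition such as size(message.ingredients) > 1 with a DLQ failure action.
    Recipe badRecipe = Recipe.newBuilder()
            .setRecipeId("plain-toast")
            .setIngredients(List.of("bread")) // only one ingredient, so the rule fails
            .build();
    try {
        producer.send(new ProducerRecord<>("raw.recipes", badRecipe.getRecipeId().toString(), badRecipe));
    } catch (Exception e) {
        // With a DLQ failure action, serialization fails on the client and the offending
        // record is routed to the raw.recipes.dlq topic instead of the main topic.
        System.err.println("Recipe rejected by data quality rule: " + e.getMessage());
    }
    ```
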
  - Data Transformation Rules
    - Show the `transform_recipe_name_to_valid_recipe_id` rule definition in Terraform (`data-governance.tf`) / the Confluent Cloud UI.
    - Show the recipe ID in the Java code (`ProducerAvroRecipe.java`) in the `byte-to-eat-v1` directory.
    - Show how the recipe ID is transformed when it's written to the `raw.recipe` topic via the Data Transformation rule.
  - Data Encryption Rules
    - Show the `Orders` Data Contract in the Confluent Cloud UI. Orders have some PII tags.
    - Show the `ProducerAvroRecipe.java` application. There is no code to do the encryption; it just imports the `kafka-schema-rules` dependency (see the configuration sketch below).
    - Show the `encrypt_pii` rule definition in the Confluent Cloud UI.
    - Show the `aws-kek-for-csfle` definition under `Stream Governance` -> `Schema Registry` -> `Encryption Keys`.
    - Start up the `./run-producer-orders.sh` producer.
    - Show the `raw.orders` topic to see the `pii`-tagged `customer_address` field encrypted.
    - In the Confluent Cloud UI, add another `pii` tag to `customer_name` to show the schema change is instant. No code changes.
    - Show the `raw.orders` topic in the Confluent Cloud UI to show the `customer_name` field is now encrypted.
    - Start up `./run-consumer-orders.sh` to see how a Java consumer can decrypt the field.
    - Bonus: The consumer can only decrypt the field because it has access to the `aws-kek-for-csfle` key. Remove the access via the Confluent Cloud UI and the field won't be decrypted.
    - Bonus: Flink is joining the `Orders` and `Recipes` together, and the encrypted field is carried through.
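
    To make the "no encryption code" point concrete, here is a hedged sketch of the client-side configuration involved. The property names follow Confluent's client-side field level encryption (CSFLE) documentation, but treat them and the credential mechanism as assumptions; the demo's run scripts may wire this up differently.

    ```java
    // Sketch: producer properties for client-side field level encryption (CSFLE).
    // The encrypt_pii rule lives in the Data Contract, so the application only needs the
    // kafka-schema-rules dependency on the classpath plus credentials for the KEK in AWS KMS.
    java.util.Properties props = new java.util.Properties();
    props.put("bootstrap.servers", "<CONFLUENT_CLOUD_BOOTSTRAP>");
    props.put("schema.registry.url", "<SCHEMA_REGISTRY_URL>");
    props.put("key.serializer", org.apache.kafka.common.serialization.StringSerializer.class.getName());
    props.put("value.serializer", io.confluent.kafka.serializers.KafkaAvroSerializer.class.getName());
    // Credentials the rule executor uses to reach the aws-kek-for-csfle key (property names assumed):
    props.put("rule.executors._default_.param.access.key.id", "<AWS_ACCESS_KEY_ID>");
    props.put("rule.executors._default_.param.secret.access.key", "<AWS_SECRET_ACCESS_KEY>");
    // No other encryption code: fields tagged pii are encrypted transparently on produce
    // and decrypted on consume, as long as the client is allowed to use the KEK.
    ```
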
  - Schema Migration Rules
    - Switch to the `byte-to-eat-v2` folder.
    - In the Confluent Cloud UI, show the current version of the `raw.recipe-value` Data Contract, which has `application.major.version` set to 1.
    - Show `schema-raw.recipe-value-v2.avsc`, which now has `chef_first_name` and `chef_last_name` as separate fields. This would be a breaking change.
    - Run `register-migration-rules.sh` to register the new Data Contract and Migration Rules. Show `migration_rules.json`.
    - Show the new `raw.recipe-value` Data Contract in the Confluent Cloud UI. `application.major.version` is now set to 2.
    - Show the `split_chef_first_and_last_name` and `join_chef_first_and_last_name` migration rules in the Confluent Cloud UI.
    - Force V1 to use the first version of the Schema (see the configuration sketch below):
      - Set `use.schema.id` to the Schema ID of the first version in `byte-to-eat-v1/producer-recipe.template`.
      - Set `use.latest.version` to `false`.
    - Start up the V1 consumer in Window 3: `./run-consumer-recipe.sh`
    - Start up the V2 consumer in Window 4: `./run-consumer-recipe.sh`
    - Start up the V1 producer in Window 1: `./run-producer-recipe.sh`. Observe the V1 and V2 consumers' view of the data.
    - Start up the V2 producer in Window 2: `./run-producer-recipe.sh`. Observe the V1 and V2 consumers' view of the data.
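
    For reference, here is a hedged sketch of the serializer settings this step relies on. `use.schema.id`, `use.latest.version`, and `use.latest.with.metadata` are standard Confluent serializer/deserializer configs, but the concrete values and how `producer-recipe.template` supplies them are assumptions.

    ```java
    // Sketch: pin the V1 producer to the first schema version, and let the V2 consumer
    // resolve the schema registered with application.major.version=2 so the migration
    // rules can up/downgrade records between the versions. Values are placeholders.
    java.util.Properties v1ProducerProps = new java.util.Properties();
    v1ProducerProps.put("use.schema.id", "100001");      // Schema ID of the first version (placeholder)
    v1ProducerProps.put("use.latest.version", "false");  // do not auto-resolve the latest schema

    java.util.Properties v2ConsumerProps = new java.util.Properties();
    v2ConsumerProps.put("use.latest.with.metadata", "application.major.version=2");
    ```
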
  - Demo Cleanup
    - Run `./terraform-destroy.sh`.
- To generate the effective POM: `mvn help:effective-pom -Doutput=effective-pom.xml`
- To generate the Java Class from an AVRO Schema: `mvn generate-sources`
- Java 11 or later
- Confluent Cloud (Kafka and Schema Registry)
- Terraform