Developer guide

This guide details what you'll need to contribute to Materialize.

Materialize is written in Rust and should compile on any recent stable version of the Rust toolchain.

Materialize can be connected to many different types of event sources:

  • Local files with line-by-line textual events, where structured data can be extracted via regular expressions or CSV parsing.
  • Custom Kafka topics, where events are encoded with Protobuf or Avro, with support for additional encoding formats coming soon.
  • Kafka topics managed by a CDC tool like Debezium, where events adhere to a particular "envelope" format that distinguishes updates from insertions and deletions.
  • Streaming HTTP sources? Apache Pulsar sources? With a bit of elbow grease, support for any message bus can be added to Materialize!

Note that local file sources are intended only for ad-hoc experimentation and analysis. Production use cases are expected to use Kafka sources, which have a better availability and durability story.

Installing

Rust

Install Rust via rustup:

curl https://sh.rustup.rs -sSf | sh

Rustup will automatically select the correct toolchain version specified in materialize/rust-toolchain.

Confluent Platform

The Confluent Platform bundles Apache ZooKeeper and Apache Kafka with several non-free Confluent tools, like the Confluent Schema Registry and Control Center. For local development, the Confluent CLI makes it easy to manage these services.

On macOS, the easiest installation method is to use Homebrew:

brew install confluent-platform

On Debian-based Linux variants, it's a tad more involved:

curl https://packages.confluent.io/deb/5.2/archive.key | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://packages.confluent.io/deb/5.2 stable main"
sudo apt update
sudo apt install openjdk-8-jre-headless confluent-community-2.12

On other Linux variants, you'll need to make your own way through Confluent's installation instructions. Note that, at the time of writing, Java 8 is a strict requirement. Later versions of Java are not supported.

On Linux, you might want to consider using nix. It is a purely functional package manager, which is appealing because bits and pieces of state are usually to blame when package management goes wrong. Plus, getting started is easy:

cd materialize
nix-shell

This will start a new shell with all the necessary dependencies available and pinned to the correct version. Reach out to @jamii for more information.

Confluent CLI

As of Feb 24, 2020 you can run:

curl -L --http1.1 https://cnfl.io/cli | sh -s -- -b /usr/local/bin

However, if this ever stops working, check out these great docs on the Confluent CLI.

Building Materialize

Materialize is fully integrated with Cargo, so building it is dead simple:

git clone git@github.com:MaterializeInc/materialize.git
cd materialize
cargo run --bin materialized

Because the MaterializeInc organization requires two-factor authentication (2FA), you'll need to clone via SSH as indicated above, or configure a personal access token for use with HTTPS.

Prepping Confluent

As mentioned above, you need a few Confluent services running for Materialize to work. To start what you need (for the demo, at least), run the following:

confluent local start kafka     # Also starts zookeeper
confluent local start schema-registry

You can also use the included confluent CLI command to start and stop individual services. For example:

confluent local status        # View what services are currently running.
confluent local start kafka   # Start Kafka and any services it depends upon.
confluent local log kafka     # View Kafka log file.

Beware that the CLI is fairly buggy, especially around service management. Putting your computer to sleep often causes the service status to get out of sync. In other words, trust the output of confluent local log and ps ... | grep over the output of confluent local status. Still, it's reliable enough to be more convenient than managing each service manually.
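The advice to check the process table directly can be wrapped in a small shell helper. This is a minimal sketch, not part of the Confluent CLI or the Materialize repository: the function names `listing_matches` and `service_running` and the example pattern are hypothetical, and it assumes a POSIX-style `ps` that accepts `-A -o args=`.

```shell
# Hypothetical helper: succeed if any line of the process listing on stdin
# matches the extended regular expression given as the first argument.
listing_matches() {
    grep -E -q -- "$1"
}

# Hypothetical helper: succeed if some running process has a command line
# matching the pattern, regardless of what `confluent local status` claims.
service_running() {
    # Snapshot the process table first, so the grep that runs afterwards
    # is not itself part of the listing it searches.
    listing=$(ps -A -o args=) || return 2
    printf '%s\n' "$listing" | listing_matches "$1"
}

# Example usage (the pattern is an assumption; adjust it to your setup):
#   if service_running 'kafka'; then echo "Kafka appears to be running"; fi
```

Capturing the `ps` output before searching it avoids the classic problem of `grep` matching its own command line.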

Symbiosis mode

For the convenience of developers, Materialize has a semi-secret "symbiosis" mode that turns Materialize into a full HTAP system, rather than an OLAP system that must sit atop an OLTP system via a CDC pipeline. In other words, where you would normally need to plug MySQL into Debezium into Kafka into Materialize, and run all the Confluent services that this entails, you can instead run:

materialized --symbiosis postgres://localhost:5432

When symbiosis mode is active, all DDL statements and all writes will be routed to the specified PostgreSQL server. CREATE TABLE, for example, will create both a table in PostgreSQL and a source in Materialize that mirrors that table. INSERT, UPDATE, and DELETE statements that target that table will be reflected in Materialize for the next SELECT statement.
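To make the routing concrete, here is a small sketch of a session you might try against a `materialized` process running in symbiosis mode. The table name `demo_t`, the function name `symbiosis_demo_sql`, and the `psql` connection details in the comment are all assumptions for illustration; adjust them to your setup.

```shell
# Hypothetical helper: print a short SQL script that exercises the statements
# described above. The table name demo_t is made up for this example.
symbiosis_demo_sql() {
    cat <<'SQL'
CREATE TABLE demo_t (a int, b text);
INSERT INTO demo_t VALUES (1, 'one'), (2, 'two');
UPDATE demo_t SET b = 'uno' WHERE a = 1;
DELETE FROM demo_t WHERE a = 2;
SELECT * FROM demo_t;
SQL
}

# To run it, pipe the script into a PostgreSQL client connected to Materialize
# (host and port below are assumptions about a local setup):
#   symbiosis_demo_sql | psql -h localhost -p 6875
```

If symbiosis mode behaves as described, the final `SELECT` should reflect the preceding `INSERT`, `UPDATE`, and `DELETE`.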

Symbiosis mode is not suitable for production use, as its implementation is very inefficient. It is, however, excellent for manually taking Materialize for a spin without the hassle of setting up various Kafka topics and Avro schemas. It also powers our sqllogictest runner.

See the symbiosis crate documentation for more details.

Testing

Materialize's testing philosophy is sufficiently complex that it warrants its own document. See Developer guide: testing.

Git workflow

Submitting changes

We require that every change is first opened as a GitHub pull request and subjected to a CI run. GitHub will prevent you from pushing directly to master or merging a PR that does not have a green CI run.

Our CI provider is Buildkite (https://buildkite.com). It's like Travis CI or CircleCI, if you're familiar with either of those, except that it lets you bring your own infrastructure. Details about the setup are in ci/README.md, but the day-to-day interaction with Buildkite should be straightforward.

If you want more confidence that your PR will succeed in CI, you can run bin/pre-push before pushing your changes. You can configure Git to do this automatically by following the instructions in misc/githooks/pre-push. The pre-push checks don't run the full battery of tests, only a small subset that experience shows is the most frustrating to see fail in CI, such as the linters.
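The authoritative setup instructions are the ones in misc/githooks/pre-push. As a rough sketch of what wiring up such a hook usually looks like, Git runs an executable named `.git/hooks/pre-push` before each push, so a symlink to the checked-in script is one common approach. The function name `install_pre_push_hook` is hypothetical, and the sketch assumes a standard checkout where `.git` is a directory and the script is meant to be used directly as the hook.

```shell
# Hypothetical helper: link the checked-in pre-push script into the Git hooks
# directory of the checkout rooted at $1.
install_pre_push_hook() {
    repo_root="$1"
    hook_src="$repo_root/misc/githooks/pre-push"
    hook_dst="$repo_root/.git/hooks/pre-push"

    # Refuse to create a dangling link if the script is not where we expect.
    if [ ! -f "$hook_src" ]; then
        echo "missing $hook_src" >&2
        return 1
    fi

    mkdir -p "$repo_root/.git/hooks" || return 1
    # The link target is relative to .git/hooks, so the link keeps working
    # if the whole checkout is moved to another directory.
    ln -sf ../../misc/githooks/pre-push "$hook_dst"
}

# Example usage from the root of a checkout:
#   install_pre_push_hook .
```

Because the hook is a symlink, later changes to the checked-in script take effect without reinstalling it.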

While the team is small, we leave it up to you to decide whether your PR needs a review. Your first several PRs at Materialize should go through review no matter what, but once you learn the ropes you should feel free to land small, uncontroversial changes without review. It's not always possible to perfectly predict controversiality ahead of time, of course, but reverts are cheap and easy, so err on the side of merging for now.

Git details

Nikhil highly recommends that you configure git pull to use rebase instead of merge:

git config pull.rebase true
git config rebase.autoStash true

This keeps the Git history tidy, since it avoids creating merge commits when you git pull with unpushed changes. The rebase.autoStash option makes this workflow particularly ergonomic by stashing any uncommitted changes you have when you run git pull, then unstashing them after the rebase is complete.
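Note that `git config` without `--global` only affects the repository you run it in; add `--global` to apply the settings to every repository for your user. The sketch below applies both settings to a given checkout and prints what Git reports back, which is a quick way to confirm they took effect. The function name `enable_rebase_pull` and the example path are hypothetical.

```shell
# Hypothetical helper: apply the two settings to the checkout at $1 and
# print the values Git now reports for that repository.
enable_rebase_pull() {
    repo="$1"
    git -C "$repo" config pull.rebase true || return 1
    git -C "$repo" config rebase.autoStash true || return 1
    printf 'pull.rebase=%s\n' "$(git -C "$repo" config --get pull.rebase)"
    printf 'rebase.autoStash=%s\n' "$(git -C "$repo" config --get rebase.autoStash)"
}

# Example usage (the path is only an illustration):
#   enable_rebase_pull ~/src/materialize
```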

Other repositories

Several components of Materialize are maintained in separate Git repositories. Where possible, we prefer to keep things in the main repository (a "monorepo" approach), but when forking existing packages, maintaining a separate repository makes it easier to integrate changes from upstream.

Some notable repositories include:

  • mtrlz-setup, containing automatic development environment setup scripts;
  • rust-rdkafka, a fork of the Rust Kafka library;
  • sqlparser, a heavily-customized fork of a general SQL parsing package.

As mentioned before, because the MaterializeInc organization requires two-factor authentication (2FA), you'll need to either clone these repositories via SSH or configure a personal access token for use with HTTPS.