This repository has been archived by the owner on Mar 17, 2024. It is now read-only.

Documentation updates for 0.4.1 (#29)
* Doc updates for standalone mode and metrics filter

* Misc. cleanup
seglo authored Jun 6, 2019
1 parent d4dcf94 commit 3126369
Showing 9 changed files with 161 additions and 23 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -3,3 +3,4 @@ target
*.tgz
*.iml
application.conf
!examples/standalone/application.conf
138 changes: 121 additions & 17 deletions README.md
@@ -9,17 +9,23 @@
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->


- [Introduction](#introduction)
- [Metrics](#metrics)
- [Labels](#labels)
- [Configuration](#configuration)
- [Install with Helm](#install-with-helm)
- [Examples](#examples)
- [Run on Kubernetes](#run-on-kubernetes)
- [Configuration](#configuration)
- [Install with Helm](#install-with-helm)
- [Examples](#examples)
- [View the health endpoint](#view-the-health-endpoint)
- [View exporter logs](#view-exporter-logs)
- [View exporter logs](#view-exporter-logs)
- [Run Standalone](#run-standalone)
- [Configuration](#configuration-1)
- [Running Docker Image](#running-docker-image)
- [Estimate Consumer Group Time Lag](#estimate-consumer-group-time-lag)
- [Strimzi Kafka Cluster Watcher](#strimzi-kafka-cluster-watcher)
- [Monitoring with Grafana](#monitoring-with-grafana)
- [Filtering Metrics without Prometheus Server](#filtering-metrics-without-prometheus-server)
- [Development](#development)
- [Tests](#tests)
- [Testing with local `docker-compose.yaml`](#testing-with-local-docker-composeyaml)
@@ -33,7 +39,7 @@

## Introduction

Kafka Lag Exporter makes it easy to view the latency of your [Apache Kafka](https://kafka.apache.org/)
Kafka Lag Exporter makes it easy to view the latency (residence time) of your [Apache Kafka](https://kafka.apache.org/)
consumer groups. It can run anywhere, but it provides features to run easily on [Kubernetes](https://kubernetes.io/)
clusters against [Strimzi](https://strimzi.io/) Kafka clusters using the [Prometheus](https://prometheus.io/) and [Grafana](https://grafana.com/)
monitoring stack. Kafka Lag Exporter is an [Akka Typed](https://doc.akka.io/docs/akka/current/typed/index.html)
@@ -53,31 +59,31 @@ automatically detect the HTTP endpoint and scrape its data.

**`kafka_consumergroup_group_offset`**

Labels: `cluster_name, group, topic, partition, state, is_simple_consumer, member_host, consumer_id, client_id`
Labels: `cluster_name, group, topic, partition, member_host, consumer_id, client_id`

The last consumed offset for this partition of this topic for this group.

**`kafka_consumergroup_group_lag`**

Labels: `cluster_name, group, topic, partition, state, is_simple_consumer, member_host, consumer_id, client_id`
Labels: `cluster_name, group, topic, partition, member_host, consumer_id, client_id`

The difference between the last produced offset and the last consumed offset for this partition of this topic for this group.

**`kafka_consumergroup_group_lag_seconds`**

Labels: `cluster_name, group, topic, partition, state, is_simple_consumer, member_host, consumer_id, client_id`
Labels: `cluster_name, group, topic, partition, member_host, consumer_id, client_id`

The estimated lag in seconds. This metric correlates with lag in offsets. For more information on how this is calculated, see the [Estimate Consumer Group Time Lag](#estimate-consumer-group-time-lag) section below.

**`kafka_consumergroup_group_max_lag`**

Labels: `cluster_name, group, state, is_simple_consumer`
Labels: `cluster_name, group, is_simple_consumer`

The highest (maximum) lag in offsets for a given consumer group.

**`kafka_consumergroup_group_max_lag_seconds`**

Labels: `cluster_name, group, state, is_simple_consumer`
Labels: `cluster_name, group, is_simple_consumer`

The highest (maximum) lag in time for a given consumer group.
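The relationship between the per-partition lag metric and the group-level max lag metric can be sketched in a few lines. This is a hedged illustration only, not the exporter's actual code; the topic name and offset values are invented:

```python
# Hypothetical per-partition offsets for one consumer group.
latest_offsets = {("events", 0): 1200, ("events", 1): 980}  # last produced offset
group_offsets = {("events", 0): 1150, ("events", 1): 900}   # last consumed offset

# kafka_consumergroup_group_lag: offset lag per topic partition.
lag = {tp: latest_offsets[tp] - group_offsets[tp] for tp in latest_offsets}

# kafka_consumergroup_group_max_lag: the worst partition for the group.
max_lag = max(lag.values())

print(lag)      # {('events', 0): 50, ('events', 1): 80}
print(max_lag)  # 80
```

The same max-over-partitions idea applies to the `_seconds` variants, with time lag values in place of offset lag values.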

@@ -98,28 +104,28 @@ Each metric may include the following labels when reported.

The rest of the labels are passed along from the consumer group metadata requests.

* `state` - The state of the consumer group when the group data was polled.
* `is_simple_consumer` - Whether this group is using the old simple consumer API.
* `member_host` - The hostname or IP of the machine or container running the consumer group member that is assigned this partition.
* `client_id` - The id of the consumer group member. This is usually generated automatically by the group coordinator.
* `consumer_id` - The globally unique id of the consumer group member. This is usually a combination of the client_id and a GUID generated by the group coordinator.

Prometheus server may add additional labels based on your configuration. For example, Kubernetes pod information about the Kafka Lag Exporter pod where the metrics were scraped from.

## Configuration
## Run on Kubernetes

### Configuration

Details for configuration for the Helm Chart can be found in the [`values.yaml`](./charts/kafka-lag-exporter/values.yaml)
file of the accompanying Helm Chart.

## Install with Helm
### Install with Helm

You can install the chart from the local filesystem.

```
helm install https://github.com/lightbend/kafka-lag-exporter/releases/download/v0.4.0/kafka-lag-exporter-0.4.0.tgz
```

### Examples
#### Examples

Install with the [Strimzi](https://strimzi.io/) Kafka discovery feature.
See [Strimzi Kafka Cluster Watcher](#strimzi-kafka-cluster-watcher) for more details.
@@ -164,7 +170,7 @@ Ex)
kubectl port-forward service/kafka-lag-exporter-service 8080:8000 --namespace myproject
```

### View exporter logs
#### View exporter logs

To view the logs of the exporter, identify the pod name of the exporter and use the `kubectl logs` command.

@@ -174,9 +180,80 @@ Ex)
kubectl logs {POD_ID} --namespace myproject -f
```

## Run Standalone

To run the project in standalone mode, you must first define an `application.conf` configuration. This configuration must
contain at least the connection info for your Kafka cluster (`kafka-lag-exporter.clusters`). All other configuration has
defaults defined in the project itself. See [`reference.conf`](./src/main/resources/reference.conf) for defaults.

### Configuration

General Configuration (`kafka-lag-exporter{}`)

| Key | Default | Description |
|------------------------|--------------------|---------------------------------------------------------------------------------------------------------------------------------------|
| `port` | `8000` | The port to run the Prometheus endpoint on |
| `poll-interval` | `30 seconds` | How often to poll Kafka for latest and group offsets |
| `lookup-table-size` | `60` | The maximum window size of the look up table **per partition** |
| `client-group-id` | `kafkalagexporter` | Consumer group id of kafka-lag-exporter's client connections |
| `kafka-client-timeout` | `10 seconds` | Connection timeout when making API calls to Kafka |
| `clusters` | `[]` | A statically defined list of Kafka connection details. This list is optional if you choose to use the Strimzi auto-discovery feature |
| `watchers` | `{}` | Settings for Kafka cluster "watchers" used for auto-discovery. |

Kafka Cluster Connection Details (`kafka-lag-exporter.clusters[]`)

| Key | Default | Required | Description |
|---------------------|-------------|----------|--------------------------------------------------------------------|
| `name`              | `""`        | Yes      | A unique cluster name for this Kafka connection detail object      |
| `bootstrap-brokers` | `""` | Yes | Kafka bootstrap brokers. Comma delimited list of broker hostnames |
| `security-protocol` | `PLAINTEXT` | No | The Kafka security protocol. `PLAINTEXT` or `TLS`. |
| `sasl-mechanism` | `""` | No | Kafka SASL mechanism |
| `sasl-jaas-config` | `""` | No | Kafka JAAS configuration |

Watchers (`kafka-lag-exporter.watchers{}`)

| Key | Default | Description |
|---------------------|-------------|------------------------------------------|
| `strimzi` | `false` | Toggle for using Strimzi auto-discovery. |
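
For example, enabling the Strimzi watcher in `application.conf` might look like the following sketch. The exact key layout is defined by the project's [`reference.conf`](./src/main/resources/reference.conf), so treat this as an illustration rather than a definitive snippet:

```
kafka-lag-exporter {
  watchers = {
    strimzi = "true"
  }
}
```

With the watcher enabled, the `clusters` list can be left empty and connection details are discovered automatically.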


Ex) Expose metrics on port `9999`, explicitly set the lookup table size, and set up a single non-TLS cluster connection object.

```
kafka-lag-exporter {
port = 9999
lookup-table-size = 60
clusters = [
{
name = "a-cluster"
bootstrap-brokers = "a-1.cluster-a.xyzcorp.com:9092,a-2.cluster-a.xyzcorp.com:9092,a-3.cluster-a.xyzcorp.com:9092"
}
]
}
```

### Running Docker Image

Define an `application.conf` and optionally a `logback.xml` with your configuration.

Run the Docker image, exposing the metrics endpoint on host port `8000`. Mount a config dir containing your
`application.conf` and `logback.xml` into the container.

Ex)

```
docker run -p 8000:8000 \
-v $DIR:/opt/docker/conf/ \
lightbend/kafka-lag-exporter:0.4.0 \
/opt/docker/bin/kafka-lag-exporter \
-Dconfig.file=/opt/docker/conf/application.conf \
-Dlogback.configurationFile=/opt/docker/conf/logback.xml
```

See full example in [`./examples/standalone`](./examples/standalone).

## Estimate Consumer Group Time Lag

One of Kafka Lag Exporter’s more unique features is its ability to estimate the length of time that a consumer group is behind the last produced value for a particular partition, time lag. Offset lag is useful to indicate that the consumer group is lagging, but it doesn’t provide a sense of the actual latency of the consuming application.
One of Kafka Lag Exporter’s more distinctive features is its ability to estimate the length of time that a consumer group is behind the last produced value for a particular partition: its time lag (wait time). Offset lag is useful to indicate that the consumer group is lagging, but it doesn’t provide a sense of the actual latency of the consuming application.

For example, a topic with two consumer groups may have different lag characteristics. Application A is a consumer which performs CPU intensive (and slow) business logic on each message it receives. It’s distributed across many consumer group members to handle the high load, but since its processing throughput is slower it takes longer to process each message per partition. Meanwhile Application B is a consumer which performs a simple ETL operation to land streaming data in another system, such as an HDFS data lake. It may have similar offset lag to Application A, but because it has a higher processing throughput its lag in time may be significantly less.
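The idea behind the time lag estimate can be sketched as linear interpolation over a table of `(offset, timestamp)` points collected at each poll of the latest produced offsets: estimate when the consumer's current offset was produced, then subtract that time from now. This is a simplified illustration of the technique, not the exporter's actual implementation; the table contents and times below are invented:

```python
import bisect

# (offset, timestamp_seconds) points captured at each poll, sorted by offset.
table = [(100, 0.0), (200, 30.0), (300, 60.0)]

def estimate_produced_time(offset: int) -> float:
    """Linearly interpolate the time at which `offset` was produced."""
    offsets = [o for o, _ in table]
    i = bisect.bisect_left(offsets, offset)
    if i == 0:
        return table[0][1]          # before the window: clamp to oldest point
    if i == len(table):
        (o1, t1), (o2, t2) = table[-2], table[-1]  # extrapolate past the newest point
    else:
        (o1, t1), (o2, t2) = table[i - 1], table[i]
    return t1 + (offset - o1) / (o2 - o1) * (t2 - t1)

now = 90.0       # current poll time
consumed = 250   # the group's last consumed offset
time_lag = now - estimate_produced_time(consumed)
print(time_lag)  # offset 250 was produced at ~45s, so the group is ~45 seconds behind
```

In this sketch a higher-throughput consumer advances `consumed` toward recent table entries, shrinking `time_lag` even when its offset lag is similar to a slower group's.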

@@ -242,6 +319,25 @@ axis. The right Y axis has the sum of latest and last consumed offsets for all
![Max Consumer Group Time Lag Over Summed Offsets](./grafana/max_consumer_group_time_lag_over_summed_offsets.png)
4. **Kafka Lag Exporter JVM Metrics** - JVM metrics for the Kafka Lag Exporter itself.

## Filtering Metrics without Prometheus Server

It's possible to filter for specific metric names by passing HTTP query parameters to the metrics health endpoint.

To filter one or more metrics, use the query parameter pattern `name[]=prometheus_metric_name`.

Ex)

```
$ curl -X GET -g http://localhost:8080?name[]=kafka_consumergroup_group_max_lag
# HELP kafka_consumergroup_group_max_lag Max group offset lag
# TYPE kafka_consumergroup_group_max_lag gauge
kafka_consumergroup_group_max_lag{cluster_name="pipelines-strimzi",group="variable-throughput-runtime.f3-merge.in01",} 52.0
...
```
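
Several metrics can be requested at once by repeating the `name[]` parameter. A small sketch of building such a URL (the host and port are placeholders from the example above):

```python
from urllib.parse import urlencode

# Repeat the "name[]" key once per metric to filter for.
params = [
    ("name[]", "kafka_consumergroup_group_max_lag"),
    ("name[]", "kafka_consumergroup_group_max_lag_seconds"),
]
query = urlencode(params)  # brackets are percent-encoded as %5B %5D
url = f"http://localhost:8080?{query}"
print(url)
```

Passing the encoded URL to `curl` avoids the need for the `-g` (no-glob) flag used in the example above.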

This is an undocumented feature of the Prometheus HTTP server. For reference, consult the [`parseQuery` method](https://github.com/prometheus/client_java/blob/4e0e7527b048f1ffd0382dcb74c0b9dab23b4d9f/simpleclient_httpserver/src/main/java/io/prometheus/client/exporter/HTTPServer.java#L101) of the
HTTP server in the [`prometheus/client_java`](https://github.com/prometheus/client_java/) GitHub repository.

## Development

### Tests
@@ -338,6 +434,14 @@ required. Before running a release make sure the following pre-req's are met.

## Change log

0.4.1

* Remove labels `state` and `is_simple_consumer` from group topic partition metrics
* Document metric endpoint filtering [#24](https://github.com/lightbend/kafka-lag-exporter/issues/24)
* Document standalone deployment mode [#22](https://github.com/lightbend/kafka-lag-exporter/issues/22)
* Evict metrics from endpoint when they're no longer tracked by Kafka [#25](https://github.com/lightbend/kafka-lag-exporter/issues/25)
* Support clusters with TLS and SASL [#21](https://github.com/lightbend/kafka-lag-exporter/pull/21)

0.4.0

* Open Sourced! 🎆 [#17](https://github.com/lightbend/kafka-lag-exporter/issues/17)
9 changes: 9 additions & 0 deletions examples/standalone/application.conf
@@ -0,0 +1,9 @@
kafka-lag-exporter {
port = 8000
clusters = [
{
name = "a-cluster"
bootstrap-brokers = "a-1.cluster-a.xyzcorp.com:9092,a-2.cluster-a.xyzcorp.com:9092,a-3.cluster-a.xyzcorp.com:9092"
}
]
}
15 changes: 15 additions & 0 deletions examples/standalone/logback.xml
@@ -0,0 +1,15 @@
<configuration>
<variable name="ROOT_LOG_LEVEL" value="${ROOT_LOG_LEVEL:-INFO}" />
<variable name="KAFKA_LAG_EXPORTER_LOG_LEVEL" value="${KAFKA_LAG_EXPORTER_LOG_LEVEL:-INFO}" />
<variable name="KAFKA_LAG_EXPORTER_KAFKA_LOG_LEVEL" value="${KAFKA_LAG_EXPORTER_KAFKA_LOG_LEVEL:-INFO}" />
<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
<encoder>
<pattern>%date{ISO8601} %-5level %logger{36} %X{akkaSource} - %msg %ex%n</pattern>
</encoder>
</appender>
<logger name="org.apache.kafka" level="${KAFKA_LAG_EXPORTER_KAFKA_LOG_LEVEL}"/>
<logger name="com.lightbend.kafkalagexporter" level="${KAFKA_LAG_EXPORTER_LOG_LEVEL}"/>
<root level="${ROOT_LOG_LEVEL}">
<appender-ref ref="STDOUT" />
</root>
</configuration>
10 changes: 10 additions & 0 deletions examples/standalone/run-docker.sh
@@ -0,0 +1,10 @@
#!/usr/bin/env bash

DIR="$(cd "$(dirname "$0")" && pwd)"

docker run -p 8000:8000 \
-v $DIR:/opt/docker/conf/ \
lightbend/kafka-lag-exporter:0.4.0 \
/opt/docker/bin/kafka-lag-exporter \
-Dconfig.file=/opt/docker/conf/application.conf \
-Dlogback.configurationFile=/opt/docker/conf/logback.xml
@@ -8,7 +8,7 @@ import java.time.Clock

import akka.actor.Cancellable
import akka.actor.typed.scaladsl.Behaviors
import akka.actor.typed.{ActorRef, Behavior, PostStop, SupervisorStrategy}
import akka.actor.typed.{ActorRef, Behavior, SupervisorStrategy}
import com.lightbend.kafkalagexporter.KafkaClient.KafkaClientContract
import com.lightbend.kafkalagexporter.LookupTable.Table.{LagIsZero, Prediction, TooFewPoints}

@@ -172,7 +172,7 @@ class KafkaClient private[kafkalagexporter](cluster: KafkaCluster, groupId: Stri
}.toMap

def close(): Unit = {
adminClient.close(_clientTimeout.toMillis, TimeUnit.MILLISECONDS)
adminClient.close(_clientTimeout)
consumer.close(_clientTimeout)
}
}
@@ -4,8 +4,8 @@

package com.lightbend.kafkalagexporter

import akka.actor.typed.Behavior
import akka.actor.typed.scaladsl.Behaviors
import akka.actor.typed.{Behavior, PostStop}
import com.lightbend.kafkalagexporter.MetricsSink._

object MetricsReporter {
@@ -5,9 +5,8 @@
package com.lightbend.kafkalagexporter.watchers

import akka.actor.typed.scaladsl.Behaviors
import akka.actor.typed.{ActorRef, Behavior, PostStop}
import com.lightbend.kafkalagexporter.KafkaCluster
import com.lightbend.kafkalagexporter.KafkaClusterManager
import akka.actor.typed.{ActorRef, Behavior}
import com.lightbend.kafkalagexporter.{KafkaCluster, KafkaClusterManager}

object StrimziClusterWatcher {
val name: String = "strimzi"
