feat: world engine telemetry setup and config (#79)
Closes: WORLD-1195

## Overview

This PR adds new `world.toml` configuration options for telemetry, a `--telemetry` CLI flag, and two Docker services: Jaeger for tracing and Prometheus for metrics. For now these only apply to Nakama; Cardinal support will follow. For more context, check out this [Notion doc](https://www.notion.so/arguslabs/World-Engine-Telemetry-Config-119c63b376aa80f6a13cd1ef7752b8e6).

The behaviour of the `--telemetry` flag is as follows:
- if `--telemetry` is disabled, then all telemetry is disabled regardless of the values in `world.toml`
- if `--telemetry` is enabled, then the telemetry components (tracing, metrics, profiling) will be enabled based on the values in `world.toml`
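
The two rules above can be sketched as a small pure function. This is an illustration only, not the code in this commit: `telemetryServices` and the plain string service names are hypothetical, and the `env` map stands in for the values read from `world.toml`.

```go
package main

import "fmt"

// telemetryServices returns the extra containers that should be started.
// Hypothetical helper for illustration; env stands in for world.toml values.
func telemetryServices(telemetryFlag bool, env map[string]string) []string {
	var extra []string
	if !telemetryFlag {
		// The flag acts as a master switch: world.toml values are ignored.
		return extra
	}
	if env["NAKAMA_TRACE_ENABLED"] == "true" {
		extra = append(extra, "jaeger")
	}
	if env["NAKAMA_METRICS_ENABLED"] == "true" {
		extra = append(extra, "prometheus")
	}
	return extra
}

func main() {
	env := map[string]string{
		"NAKAMA_TRACE_ENABLED":   "true",
		"NAKAMA_METRICS_ENABLED": "false",
	}
	fmt.Println(telemetryServices(false, env)) // prints []
	fmt.Println(telemetryServices(true, env))  // prints [jaeger]
}
```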

**NOTE**: The scope of this PR is only toggling telemetry for Nakama. Telemetry for Cardinal will be handled in a separate PR.
**NOTE**: Some tests are failing because the Nakama docker image isn't created yet (waiting for [this PR](Argus-Labs/world-engine#799)).

## Brief Changelog
- Added Jaeger and Prometheus docker services
- Added `--telemetry` flag
- Added `world.toml` config options: `NAKAMA_TRACE_ENABLED`, `NAKAMA_TRACE_SAMPLE_RATE`, `NAKAMA_METRICS_ENABLED`, `NAKAMA_METRICS_INTERVAL`.

## Testing and Verifying

Manually tested and verified

### Notes for QA Team

Here are some test cases to consider including in the World CLI's test suite:
- `world cardinal start` without the `--telemetry` flag does not enable tracing or metrics in Nakama, and the Prometheus and Jaeger containers are not started.
- `world cardinal start --telemetry` with `NAKAMA_TRACE_ENABLED=true` and `NAKAMA_METRICS_ENABLED=false` starts Jaeger without Prometheus, and vice versa. If both options are set to `true`, both containers are started.

## Summary by CodeRabbit

- **New Features**
	- Introduced Jaeger container configuration for enhanced tracing capabilities.
	- Added Prometheus container configuration for improved monitoring and metrics collection.
	- Implemented a telemetry flag for flexible service management based on tracing and metrics settings.
	- Added configuration for Jaeger and Prometheus containers in the Docker environment.
	- Introduced telemetry options in the Cardinal command-line interface.

- **Updates**
	- Upgraded Nakama Docker image version from `1.2.7` to `1.2.9`.
	- Enhanced Nakama configuration with new environment variables for Jaeger integration and metrics collection.
	- Expanded the `purge` and `stop` commands to include the Jaeger and Prometheus services for improved cleanup.
	- Updated the Nakama service configuration to support new telemetry and metrics features.

- **Bug Fixes**
	- Improved cleanup process in tests to ensure all relevant services are purged.

## Release Notes

- **New Features**
	- Added support for configuring Jaeger and Prometheus containers in the Docker environment.
	- Introduced telemetry options in the configuration settings for enhanced monitoring capabilities.

- **Improvements**
	- Expanded the `purge` command to include Jaeger and Prometheus services.
	- Updated the `start` and `stop` commands to manage additional telemetry services based on user-defined settings.

- **Bug Fixes**
	- Enhanced test cleanup processes to ensure reliable testing environments by including additional services in purging operations.

rmrt1n committed Oct 11, 2024
1 parent 9888454 commit 88789d7
Showing 9 changed files with 149 additions and 22 deletions.
3 changes: 2 additions & 1 deletion cmd/world/cardinal/purge.go
@@ -34,7 +34,8 @@ This command stop all Docker services and remove all Docker volumes.`,
}
defer dockerClient.Close()

-err = dockerClient.Purge(cmd.Context(), service.Nakama, service.Cardinal, service.NakamaDB, service.Redis)
+err = dockerClient.Purge(cmd.Context(), service.Nakama, service.Cardinal,
+service.NakamaDB, service.Redis, service.Jaeger, service.Prometheus)
if err != nil {
return err
}
26 changes: 19 additions & 7 deletions cmd/world/cardinal/start.go
@@ -20,11 +20,12 @@ import (
/////////////////

const (
-flagBuild = "build"
-flagDebug = "debug"
-flagDetach = "detach"
-flagLogLevel = "log-level"
-flagEditor = "editor"
+flagBuild = "build"
+flagDebug = "debug"
+flagDetach = "detach"
+flagLogLevel = "log-level"
+flagEditor = "editor"
+flagTelemetry = "telemetry"

// DockerCardinalEnvLogLevel Environment variable name for Docker
DockerCardinalEnvLogLevel = "CARDINAL_LOG_LEVEL"
@@ -74,6 +75,10 @@ This will start the following Docker services and its dependencies:
if err := replaceBoolWithFlag(cmd, flagDetach, &cfg.Detach); err != nil {
return err
}
+
+if err := replaceBoolWithFlag(cmd, flagTelemetry, &cfg.Telemetry); err != nil {
+return err
+}
cfg.Timeout = -1

// Replace cardinal log level using flag value if flag is set
@@ -120,8 +125,14 @@ This will start the following Docker services and its dependencies:

// Start the World Engine stack
group.Go(func() error {
-if err := dockerClient.Start(ctx, service.NakamaDB,
-service.Redis, service.Cardinal, service.Nakama); err != nil {
+services := []service.Builder{service.NakamaDB, service.Redis, service.Cardinal, service.Nakama}
+if cfg.Telemetry && cfg.DockerEnv["NAKAMA_TRACE_ENABLED"] == "true" {
+services = append(services, service.Jaeger)
+}
+if cfg.Telemetry && cfg.DockerEnv["NAKAMA_METRICS_ENABLED"] == "true" {
+services = append(services, service.Prometheus)
+}
+if err := dockerClient.Start(ctx, services...); err != nil {
return eris.Wrap(err, "Encountered an error with Docker")
}
return eris.Wrap(ErrGracefulExit, "Stack terminated")
@@ -157,6 +168,7 @@ func init() {
startCmd.Flags().Bool(flagDetach, false, "Run in detached mode")
startCmd.Flags().String(flagLogLevel, "", "Set the log level")
startCmd.Flags().Bool(flagDebug, false, "Enable delve debugging")
+startCmd.Flags().Bool(flagTelemetry, false, "Enable tracing, metrics, and profiling")
}

// replaceBoolWithFlag overwrites the contents of vale with the contents of the given flag. If the flag
3 changes: 2 additions & 1 deletion cmd/world/cardinal/stop.go
@@ -38,7 +38,8 @@ This will stop the following Docker services:
}
defer dockerClient.Close()

-err = dockerClient.Stop(cmd.Context(), service.Nakama, service.Cardinal, service.NakamaDB, service.Redis)
+err = dockerClient.Stop(cmd.Context(), service.Nakama, service.Cardinal,
+service.NakamaDB, service.Redis, service.Jaeger, service.Prometheus)
if err != nil {
return err
}
11 changes: 5 additions & 6 deletions common/config/config.go
@@ -20,12 +20,10 @@ const (
flagForConfigFile = "config"
)

-var (
-// Items under these toml headers will be included in the environment variables when
-// running docker. An error will be generated if a duplicate key is found across
-// these sections.
-dockerEnvHeaders = []string{"cardinal", "evm", "nakama", "common"}
-)
+// Items under these toml headers will be included in the environment variables when
+// running docker. An error will be generated if a duplicate key is found across
+// these sections.
+var dockerEnvHeaders = []string{"cardinal", "evm", "nakama", "common"}

type Config struct {
RootDir string
@@ -34,6 +32,7 @@ type Config struct {
Build bool
Debug bool
DevDA bool
+Telemetry bool
Timeout int
DockerEnv map[string]string
}
7 changes: 4 additions & 3 deletions common/docker/client_test.go
@@ -35,7 +35,8 @@ func TestMain(m *testing.M) {
os.Exit(1)
}

-err = dockerClient.Purge(context.Background(), service.Nakama, service.Cardinal, service.Redis, service.NakamaDB)
+err = dockerClient.Purge(context.Background(), service.Nakama,
+service.Cardinal, service.Redis, service.NakamaDB, service.Jaeger, service.Prometheus)
if err != nil {
logger.Errorf("Failed to purge containers: %v", err)
os.Exit(1)
@@ -239,8 +240,8 @@ func redisIsDown(t *testing.T) bool {
func cleanUp(t *testing.T, dockerClient *Client) {
t.Cleanup(func() {
assert.NilError(t, dockerClient.Purge(context.Background(), service.Nakama,
-service.Cardinal, service.Redis,
-service.NakamaDB), "Failed to purge container during cleanup")
+service.Cardinal, service.Redis, service.NakamaDB, service.Jaeger, service.Prometheus),
+"Failed to purge container during cleanup")

assert.NilError(t, dockerClient.Close())
})
2 changes: 1 addition & 1 deletion common/docker/service/evm.go
@@ -23,7 +23,7 @@ func EVM(cfg *config.Config) Service {

faucetEnabled := cfg.DockerEnv["FAUCET_ENABLED"]
if faucetEnabled == "" {
-faucetEnabled = "false"
+faucetEnabled = "false" //nolint:goconst // default values should be local to the service
}

faucetAddress := cfg.DockerEnv["FAUCET_ADDRESS"]
28 changes: 28 additions & 0 deletions common/docker/service/jaeger.go
@@ -0,0 +1,28 @@
package service

import (
"fmt"

"github.com/docker/docker/api/types/container"

"pkg.world.dev/world-cli/common/config"
)

func getJaegerContainerName(cfg *config.Config) string {
return fmt.Sprintf("%s-jaeger", cfg.DockerEnv["CARDINAL_NAMESPACE"])
}

func Jaeger(cfg *config.Config) Service {
exposedPorts := []int{16686}

return Service{
Name: getJaegerContainerName(cfg),
Config: container.Config{
Image: "jaegertracing/all-in-one:1.61.0",
},
HostConfig: container.HostConfig{
PortBindings: newPortMap(exposedPorts),
NetworkMode: container.NetworkMode(cfg.DockerEnv["CARDINAL_NAMESPACE"]),
},
}
}
36 changes: 33 additions & 3 deletions common/docker/service/nakama.go
@@ -2,6 +2,7 @@ package service

import (
"fmt"
+"strconv"
"time"

"github.com/docker/docker/api/types/container"
@@ -34,28 +35,57 @@ func Nakama(cfg *config.Config) Service {
dbPassword = "very_unsecure_password_please_change" //nolint:gosec // This is a default password
}

+traceEnabled := cfg.DockerEnv["NAKAMA_TRACE_ENABLED"]
+if traceEnabled == "" || !cfg.Telemetry {
+traceEnabled = "false"
+}
+
+traceSampleRate := cfg.DockerEnv["NAKAMA_TRACE_SAMPLE_RATE"]
+if traceSampleRate == "" {
+traceSampleRate = "0.6"
+}
+
+metricsEnabled := false
+if cfg.Telemetry {
+cfgMetricsEnabled, err := strconv.ParseBool(cfg.DockerEnv["NAKAMA_METRICS_ENABLED"])
+if err == nil {
+metricsEnabled = cfgMetricsEnabled
+}
+}
+
+// prometheus metrics export is disabled if port is 0
+// src: https://heroiclabs.com/docs/nakama/getting-started/configuration/#metrics
+prometheusPort := 0
+if metricsEnabled {
+prometheusPort = 9100
+}
+
exposedPorts := []int{7349, 7350, 7351}

return Service{
Name: getNakamaContainerName(cfg),
Config: container.Config{
-Image: "ghcr.io/argus-labs/world-engine-nakama:1.2.7",
+Image: "ghcr.io/argus-labs/world-engine-nakama:1.2.9",
Env: []string{
fmt.Sprintf("CARDINAL_CONTAINER=%s", getCardinalContainerName(cfg)),
fmt.Sprintf("CARDINAL_ADDR=%s:4040", getCardinalContainerName(cfg)),
fmt.Sprintf("CARDINAL_NAMESPACE=%s", cfg.DockerEnv["CARDINAL_NAMESPACE"]),
fmt.Sprintf("DB_PASSWORD=%s", dbPassword),
fmt.Sprintf("ENABLE_ALLOWLIST=%s", enableAllowList),
fmt.Sprintf("OUTGOING_QUEUE_SIZE=%s", outgoingQueueSize),
+fmt.Sprintf("TRACE_ENABLED=%s", traceEnabled),
+fmt.Sprintf("JAEGER_ADDR=%s:4317", getJaegerContainerName(cfg)),
+fmt.Sprintf("JAEGER_SAMPLE_RATE=%s", traceSampleRate),
},
Entrypoint: []string{
"/bin/sh",
"-ec",
-fmt.Sprintf("/nakama/nakama migrate up --database.address root:%s@%s:26257/nakama && /nakama/nakama --config /nakama/data/local.yml --database.address root:%s@%s:26257/nakama --socket.outgoing_queue_size=64 --logger.level INFO", //nolint:lll
+fmt.Sprintf("/nakama/nakama migrate up --database.address root:%s@%s:26257/nakama && /nakama/nakama --config /nakama/data/local.yml --database.address root:%s@%s:26257/nakama --socket.outgoing_queue_size=64 --logger.level INFO --metrics.prometheus_port %d", //nolint:lll
dbPassword,
getNakamaDBContainerName(cfg),
dbPassword,
-getNakamaDBContainerName(cfg)),
+getNakamaDBContainerName(cfg),
+prometheusPort),
},
ExposedPorts: getExposedPorts(exposedPorts),
Healthcheck: &container.HealthConfig{
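
The hunk above derives three Nakama settings from the `--telemetry` flag and the `world.toml` values: tracing is forced to `"false"` when the flag is off or the value is empty, the sample rate defaults to `"0.6"`, and the Prometheus port is `9100` only when metrics are enabled (otherwise `0`, which disables the export). A minimal sketch of that resolution as a standalone function follows; `nakamaTelemetry` and `resolveNakamaTelemetry` are hypothetical names that do not exist in the repository.

```go
package main

import (
	"fmt"
	"strconv"
)

// nakamaTelemetry is a hypothetical holder for the resolved settings.
type nakamaTelemetry struct {
	TraceEnabled    string // passed through as a string, as in the diff
	TraceSampleRate string
	PrometheusPort  int // 0 disables Nakama's Prometheus export
}

// resolveNakamaTelemetry mirrors the defaulting rules shown in the diff.
func resolveNakamaTelemetry(telemetry bool, env map[string]string) nakamaTelemetry {
	out := nakamaTelemetry{
		TraceEnabled:    env["NAKAMA_TRACE_ENABLED"],
		TraceSampleRate: env["NAKAMA_TRACE_SAMPLE_RATE"],
	}
	if out.TraceEnabled == "" || !telemetry {
		out.TraceEnabled = "false"
	}
	if out.TraceSampleRate == "" {
		out.TraceSampleRate = "0.6"
	}
	if telemetry {
		// Unparseable or missing values leave metrics disabled.
		enabled, err := strconv.ParseBool(env["NAKAMA_METRICS_ENABLED"])
		if err == nil && enabled {
			out.PrometheusPort = 9100
		}
	}
	return out
}

func main() {
	env := map[string]string{
		"NAKAMA_TRACE_ENABLED":   "true",
		"NAKAMA_METRICS_ENABLED": "true",
	}
	fmt.Printf("%+v\n", resolveNakamaTelemetry(true, env))
	fmt.Printf("%+v\n", resolveNakamaTelemetry(false, env))
}
```

One detail the sketch makes visible: `strconv.ParseBool` accepts values such as `1` or `TRUE`, while `start.go` compares `NAKAMA_METRICS_ENABLED` against the literal `"true"`. A value like `1` would therefore open the metrics port in Nakama without starting the Prometheus container.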
55 changes: 55 additions & 0 deletions common/docker/service/prometheus.go
@@ -0,0 +1,55 @@
package service

import (
"fmt"
"strings"

"github.com/docker/docker/api/types/container"

"pkg.world.dev/world-cli/common/config"
)

var containerCmd = `sh -s <<EOF
cat > ./prometheus.yaml <<EON
global:
  scrape_interval: __NAKAMA_METRICS_INTERVAL__s
  evaluation_interval: __NAKAMA_METRICS_INTERVAL__s
scrape_configs:
  - job_name: nakama
    metrics_path: /
    static_configs:
      - targets: ['__NAKAMA_CONTAINER__:9100']
EON
prometheus --config.file=./prometheus.yaml
EOF
`

func getPrometheusContainerName(cfg *config.Config) string {
return fmt.Sprintf("%s-prometheus", cfg.DockerEnv["CARDINAL_NAMESPACE"])
}

func Prometheus(cfg *config.Config) Service {
nakamaMetricsInterval := cfg.DockerEnv["NAKAMA_METRICS_INTERVAL"]
if nakamaMetricsInterval == "" {
nakamaMetricsInterval = "30"
}

exposedPorts := []int{9090}

containerCmd = strings.ReplaceAll(containerCmd, "__NAKAMA_CONTAINER__", getNakamaContainerName(cfg))
containerCmd = strings.ReplaceAll(containerCmd, "__NAKAMA_METRICS_INTERVAL__", nakamaMetricsInterval)

return Service{
Name: getPrometheusContainerName(cfg),
Config: container.Config{
Image: "prom/prometheus:v2.54.1",
Entrypoint: []string{"/bin/sh", "-c"},
Cmd: []string{containerCmd},
},
HostConfig: container.HostConfig{
PortBindings: newPortMap(exposedPorts),
NetworkMode: container.NetworkMode(cfg.DockerEnv["CARDINAL_NAMESPACE"]),
},
}
}
