Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

feat: world engine telemetry setup and config #79

Merged
merged 1 commit into from
Oct 11, 2024
Merged

Conversation

rmrt1n
Copy link
Member

@rmrt1n rmrt1n commented Oct 8, 2024

Closes: WORLD-1195

Overview

This PR adds new world.toml configuration options related to telemetry, CLI flags, and docker services. The services added are Jaeger for tracing, and Prometheus for metrics. ATM these are only for Nakama, but will support Cardinal in the future. For more context, checkout this Notion doc.

The behaviour of the --telemetry flag is as follows:

  • if --telemetry is disabled, then all telemetry is disabled regardless of the values in world.toml
  • if --telemetry is enabled, then the telemetry components (tracing, metrics, profiling) will be enabled based on the values in world.toml

NOTE: The scope of this PR is only for toggling telemetry for Nakama. Cardinal's will be done in a separate PR.
NOTE: Some tests are failing because the Nakama docker image isn't created yet (waiting for this PR).

Brief Changelog

  • Added Jaeger and Prometheus docker services
  • Added --telemetry flag
  • Added world.toml config options: NAKAMA_TRACE_ENABLED, NAKAMA_TRACE_SAMPLE_RATE, NAKAMA_METRICS_ENABLED, NAKAMA_METRICS_INTERVAL.

Testing and Verifying

Manually tested and verified

Notes for QA Team

Here are some test cases to consider including in the World CLI's test suite:

  • world cardinal start without --telemetry flag won't enable tracing/metrics in Nakama and Prometheus and Jaeger containers won't be started.
  • world cardinal start --telemetry with NAKAMA_TRACES_ENABLED=true and NAKAMA_METRICS_ENABLED=false will start Jaeger without Prometheus. The same applies vice versa. If both options are set to true, then both containers will be started.

Summary by CodeRabbit

  • New Features

    • Introduced Jaeger container configuration for enhanced tracing capabilities.
    • Added Prometheus container configuration for improved monitoring and metrics collection.
    • Implemented a telemetry flag for flexible service management based on tracing and metrics settings.
    • Added configuration for Jaeger and Prometheus containers in the Docker environment.
    • Introduced telemetry options in the Cardinal command-line interface.
  • Updates

    • Upgraded Nakama Docker image version from 1.2.7 to 1.3.0.
    • Enhanced Nakama configuration with new environment variables for Jaeger integration and metrics collection.
    • Expanded purge command to include Jaeger and Prometheus services for improved cleanup.
    • Enhanced the purge and stop commands to manage Jaeger and Prometheus services.
    • Updated the Nakama service configuration to support new telemetry and metrics features.
  • Bug Fixes

    • Improved cleanup process in tests to ensure all relevant services are purged.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for configuring Jaeger and Prometheus containers in the Docker environment.
    • Introduced telemetry options in the configuration settings for enhanced monitoring capabilities.
  • Improvements

    • Expanded the purge command to include Jaeger and Prometheus services.
    • Updated the start and stop commands to manage additional telemetry services based on user-defined settings.
    • Enhanced the Nakama function with new environment variables for improved telemetry and metrics configuration.
  • Bug Fixes

    • Enhanced test cleanup processes to ensure reliable testing environments by including additional services in purging operations.

Copy link

graphite-app bot commented Oct 8, 2024

Your org has enabled the Graphite merge queue for merging into main

Add the label “graphite/merge” to the PR and Graphite will automatically add it to the merge queue when it’s ready to merge.

You must have a Graphite account and log in to Graphite in order to use the merge queue. # using this link.

Copy link

coderabbitai bot commented Oct 8, 2024

Caution

Review failed

The pull request is closed.

Walkthrough

The changes introduce modifications to the configuration and functionality of several services within a Docker environment. A new file jaeger.go is added to define functions for configuring a Jaeger container, while another new file prometheus.go serves a similar purpose for Prometheus. The Nakama service configuration is updated with a new Docker image tag and additional environment variables. The purge command is expanded to include Jaeger and Prometheus services, and the command-line interface for the Cardinal game shard stack is enhanced with a telemetry flag.

Changes

File Path Change Summary
common/config/config.go Added Telemetry bool to Config struct, updated dockerEnvHeaders variable declaration.
common/docker/service/evm.go Added comment to faucetEnabled variable assignment.
common/docker/service/jaeger.go Added functions getJaegerContainerName and Jaeger for Jaeger container configuration.
common/docker/service/nakama.go Updated Docker image tag to 1.2.9, added environment variables NAKAMA_TRACE_ENABLED and NAKAMA_TRACE_SAMPLE_RATE, modified entry point command.
common/docker/service/prometheus.go Added functions getPrometheusContainerName and Prometheus for Prometheus container configuration.
cmd/world/cardinal/purge.go Updated Purge method to include service.Jaeger and service.Prometheus.
cmd/world/cardinal/start.go Added flagTelemetry for enabling telemetry, modified RunE to conditionally start services based on telemetry settings, added getStartedServices function.
cmd/world/cardinal/stop.go Updated Stop method to include service.Jaeger and service.Prometheus.
common/docker/client_test.go Updated Purge method calls in tests to include service.Jaeger and service.Prometheus.

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@rmrt1n rmrt1n marked this pull request as draft October 8, 2024 17:57
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE

📥 Commits

Files that changed from the base of the PR and between 9888454 and 63eb4c6.

📒 Files selected for processing (3)
  • common/docker/service/jaeger.go (1 hunks)
  • common/docker/service/nakama.go (1 hunks)
  • common/docker/service/prometheus.go (1 hunks)
🧰 Additional context used
🪛 GitHub Check: Go
common/docker/service/jaeger.go

[failure] 6-6:
File is not gci-ed with --skip-generated -s standard -s default -s prefix(pkg.world.dev/world-cli) -s blank -s dot --custom-order (gci)

common/docker/service/nakama.go

[failure] 42-42:
line is 123 characters (lll)


[failure] 53-53:
printf: fmt.Sprintf format %.2f has arg 1 of wrong type int (govet)

common/docker/service/prometheus.go

[failure] 7-7:
File is not gci-ed with --skip-generated -s standard -s default -s prefix(pkg.world.dev/world-cli) -s blank -s dot --custom-order (gci)

🪛 GitHub Check: Unit & Coverage
common/docker/service/nakama.go

[failure] 53-53:
fmt.Sprintf format %.2f has arg 1 of wrong type int

🔇 Additional comments (4)
common/docker/service/jaeger.go (1)

1-8: LGTM: Package declaration and imports are appropriate.

The package name and imports are well-structured and relevant to the file's purpose. The use of the project's import path convention for the custom package is good practice.

🧰 Tools
🪛 GitHub Check: Go

[failure] 6-6:
File is not gci-ed with --skip-generated -s standard -s default -s prefix(pkg.world.dev/world-cli) -s blank -s dot --custom-order (gci)

common/docker/service/nakama.go (1)

Line range hint 1-87: Summary of changes and recommendations

The modifications to the Nakama service configuration align well with the PR objectives of enhancing telemetry capabilities. The main changes include updating the Docker image, adding Jaeger configuration, and including Prometheus metrics.

Key points to address:

  1. Ensure the correct Docker image tag is used after the referenced PR is merged.
  2. Implement configurability for Jaeger and Prometheus settings based on world.toml.
  3. Fix the format specifier for JAEGER_SAMPLE_RATE.
  4. Improve readability by splitting long lines, especially in the Nakama command.

These changes will significantly enhance the telemetry setup while maintaining code quality and flexibility.

🧰 Tools
🪛 GitHub Check: Go

[failure] 42-42:
line is 123 characters (lll)


[failure] 53-53:
printf: fmt.Sprintf format %.2f has arg 1 of wrong type int (govet)

🪛 GitHub Check: Unit & Coverage

[failure] 53-53:
fmt.Sprintf format %.2f has arg 1 of wrong type int

common/docker/service/prometheus.go (2)

34-34: Confirm the target port for Nakama metrics

The Prometheus configuration targets Nakama metrics on port 9100. Ensure that Nakama exposes its metrics on this port. If a different port is used, please update the configuration accordingly.


32-32: ⚠️ Potential issue

Verify the 'metrics_path' for Nakama

The metrics_path is set to /, but typically, Prometheus metrics are exposed at /metrics. Please confirm that / is the correct endpoint for scraping Nakama metrics. If not, update it to /metrics.

Apply this diff if /metrics is the correct path:

   - job_name: nakama
-    metrics_path: /
+    metrics_path: /metrics
     static_configs:

common/docker/service/jaeger.go Show resolved Hide resolved
common/docker/service/jaeger.go Show resolved Hide resolved
common/docker/service/jaeger.go Outdated Show resolved Hide resolved
common/docker/service/nakama.go Outdated Show resolved Hide resolved
common/docker/service/nakama.go Outdated Show resolved Hide resolved
common/docker/service/nakama.go Outdated Show resolved Hide resolved
common/docker/service/prometheus.go Outdated Show resolved Hide resolved
common/docker/service/prometheus.go Outdated Show resolved Hide resolved
@rmrt1n rmrt1n changed the title [WIP] feat: world engine telemetry setup and config feat: world engine telemetry setup and config Oct 9, 2024
@rmrt1n rmrt1n marked this pull request as ready for review October 9, 2024 18:11
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🧹 Outside diff range comments (3)
cmd/world/cardinal/stop.go (2)

Line range hint 46-46: Consider updating the success message

The success message "Cardinal successfully stopped" might not accurately reflect the operation if other services are also being stopped. Consider updating it to be more inclusive of all stopped services.

Suggestion:

- fmt.Println("Cardinal successfully stopped")
+ fmt.Println("Cardinal game shard stack successfully stopped")

This change would better align with the command's description and the potentially broader scope of services being stopped.


Line range hint 18-23: Update command description to reflect dynamic service stopping

The current command description lists specific services that will be stopped. However, with the introduction of dynamic service stopping, this list may no longer be accurate or complete.

Consider updating the command description to reflect the dynamic nature of the services being stopped. For example:

 This will stop the following Docker services:
-- Cardinal (Game shard)
-- Nakama (Relay) + DB
-- Redis (Cardinal dependency)`,
+- All running services associated with your Cardinal game shard stack`,

This change would make the description more flexible and aligned with the new implementation.

common/docker/service/evm.go (1)

Line range hint 1-68: Consider adding telemetry configuration for EVM service in the future

While this file doesn't require immediate changes related to the PR objectives, it might be beneficial to consider adding telemetry configuration for the EVM service in the future. This would align with the overall goal of enhancing telemetry capabilities across the system.

For now, this suggestion can be kept as a potential future improvement, as it's not directly related to the current PR scope focusing on Nakama telemetry.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE

📥 Commits

Files that changed from the base of the PR and between 63eb4c6 and 396c15c.

📒 Files selected for processing (8)
  • cmd/world/cardinal/purge.go (1 hunks)
  • cmd/world/cardinal/start.go (5 hunks)
  • cmd/world/cardinal/stop.go (1 hunks)
  • common/config/config.go (2 hunks)
  • common/docker/service/evm.go (1 hunks)
  • common/docker/service/jaeger.go (1 hunks)
  • common/docker/service/nakama.go (2 hunks)
  • common/docker/service/prometheus.go (1 hunks)
🧰 Additional context used
🔇 Additional comments (12)
common/docker/service/jaeger.go (4)

1-28: Overall, good implementation with room for improvements.

The jaeger.go file successfully introduces the necessary functionality for configuring a Jaeger container. The code is well-structured and follows good naming conventions. However, there are opportunities for enhancement:

  1. Implement error handling in getJaegerContainerName and propagate errors in Jaeger.
  2. Use constants for version numbers and port numbers to improve maintainability.
  3. Add comments to explain the purpose of specific configurations (e.g., exposed ports).
  4. Ensure imports are ordered according to the project's gci configuration.

Addressing these points will result in more robust, maintainable, and clear code.


3-9: 🧹 Nitpick (assertive)

Consider running gci on this file.

To ensure consistency with the project's import ordering standards, please run the following command:

gci write --skip-generated -s standard -s default -s "prefix(pkg.world.dev/world-cli)" -s blank -s dot --custom-order common/docker/service/jaeger.go

This will order the imports according to the project's gci configuration.


11-13: ⚠️ Potential issue

Add error handling for missing CARDINAL_NAMESPACE.

The function assumes CARDINAL_NAMESPACE is always present in the config. To improve robustness, consider adding error handling:

func getJaegerContainerName(cfg *config.Config) (string, error) {
    namespace, ok := cfg.DockerEnv["CARDINAL_NAMESPACE"]
    if !ok {
        return "", fmt.Errorf("CARDINAL_NAMESPACE not found in config")
    }
    return fmt.Sprintf("%s-jaeger", namespace), nil
}

This change will allow callers to handle the case where the required configuration is missing.


15-28: 🛠️ Refactor suggestion

Enhance Jaeger function for improved maintainability and error handling.

Consider the following improvements:

  1. Use a constant for the Jaeger image version for easier updates.
  2. Add error handling for the getJaegerContainerName call.
  3. Add a comment explaining the purpose of the exposed port (16686 for Jaeger UI).
  4. Return an error from the function to propagate any issues.

Here's a suggested refactoring:

const (
    JaegerImageVersion = "1.61.0"
    JaegerUIPort       = 16686
)

func Jaeger(cfg *config.Config) (Service, error) {
    containerName, err := getJaegerContainerName(cfg)
    if err != nil {
        return Service{}, err
    }

    return Service{
        Name: containerName,
        Config: container.Config{
            Image: fmt.Sprintf("jaegertracing/all-in-one:%s", JaegerImageVersion),
        },
        HostConfig: container.HostConfig{
            // Port 16686 is used for the Jaeger UI
            PortBindings: newPortMap([]int{JaegerUIPort}),
            NetworkMode:  container.NetworkMode(cfg.DockerEnv["CARDINAL_NAMESPACE"]),
        },
    }, nil
}

These changes will improve maintainability, error handling, and code clarity.

cmd/world/cardinal/stop.go (2)

Line range hint 1-50: Overall assessment: Changes improve flexibility with minor adjustments needed

The modifications to this file enhance the flexibility of the stop command by introducing dynamic service stopping. This is a positive change that allows for easier maintenance and configuration. However, there are a few areas where the documentation and messaging could be updated to better reflect these changes.

To summarize the suggestions:

  1. Verify the implementation of the getStartedServices function.
  2. Update the success message to be more inclusive of all stopped services.
  3. Modify the command description to reflect the dynamic nature of service stopping.

These minor adjustments will ensure that the code and its documentation are fully aligned with the new implementation.


40-41: Verify the implementation of getStartedServices function

The change from hardcoded service names to a dynamic retrieval using getStartedServices(cfg) improves flexibility and maintainability. However, ensure that the getStartedServices function correctly returns all necessary services to be stopped.

To verify the correct implementation, please run the following script:

This script will help us verify that the getStartedServices function is correctly implemented and returns all the previously hardcoded services.

common/docker/service/evm.go (1)

26-26: Approved: Linter directive improves code clarity

The added comment //nolint:goconst // default values should be local to the service is a good practice. It explicitly tells the linter to ignore this line for the goconst check, which would typically flag repeated string literals as potential constants. The comment also explains why this value should remain local to the service, improving code maintainability and preventing potential misuse of this default value in other parts of the codebase.

common/config/config.go (2)

23-26: Improved documentation for dockerEnvHeaders

The addition of the comment block above the dockerEnvHeaders variable declaration enhances code readability and maintainability. It clearly explains the purpose and behavior of the variable in the context of Docker environment variables.


35-35: Addition of Telemetry configuration option

The new Telemetry bool field in the Config struct aligns with the PR objectives to enhance telemetry capabilities. This boolean field is appropriate for enabling or disabling telemetry features.

cmd/world/cardinal/start.go (3)

28-28: Definition of flagTelemetry constant is correct.

The flagTelemetry constant is appropriately defined for use in the command-line flags.


79-81: Properly handling the flagTelemetry command-line flag.

The code correctly updates the configuration with the telemetry flag from the command line.


129-129: Service initialization with getStartedServices function is well-structured.

Calling getStartedServices(cfg) cleanly encapsulates the logic for determining which services to start.

cmd/world/cardinal/purge.go Show resolved Hide resolved
common/config/config.go Show resolved Hide resolved
common/docker/service/prometheus.go Outdated Show resolved Hide resolved
common/docker/service/prometheus.go Show resolved Hide resolved
common/docker/service/nakama.go Show resolved Hide resolved
common/docker/service/nakama.go Outdated Show resolved Hide resolved
common/docker/service/nakama.go Show resolved Hide resolved
cmd/world/cardinal/start.go Outdated Show resolved Hide resolved
cmd/world/cardinal/start.go Outdated Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range comments (1)
common/docker/client_test.go (1)

Inconsistent Usage of Purge Method: Missing Jaeger and Prometheus

The dockerClient.Purge method is not consistently including Jaeger and Prometheus across all calls.

  • common/docker/client_test.go: assert.NilError(t, dockerClient.Purge(ctx, service.Redis), "failed to purge container") is missing Jaeger and Prometheus.

Ensure that all Purge method calls include Jaeger and Prometheus to maintain consistency.

🔗 Analysis chain

Line range hint 1-244: Overall assessment: Changes are appropriate and align with PR objectives.

The modifications to common/docker/client_test.go successfully integrate Jaeger and Prometheus services into the test cleanup process. These changes ensure proper isolation between tests and maintain a clean testing environment, which is crucial for reliable test results.

The consistency between the TestMain function and the cleanUp function in handling the new services is commendable. The changes align well with the PR's goal of enhancing telemetry capabilities.

To ensure that all necessary services are being purged consistently throughout the codebase, run the following script:

This will help identify any other locations where the Purge method is called and verify that Jaeger and Prometheus are included consistently.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify consistent usage of Purge method with all services

# Test: Search for Purge method calls. Expect: All calls include Jaeger and Prometheus
rg --type go 'dockerClient\.Purge\(' -A 5

Length of output: 1240

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE

📥 Commits

Files that changed from the base of the PR and between 396c15c and a779603.

📒 Files selected for processing (4)
  • cmd/world/cardinal/start.go (5 hunks)
  • common/docker/client_test.go (2 hunks)
  • common/docker/service/nakama.go (2 hunks)
  • common/docker/service/prometheus.go (1 hunks)
🧰 Additional context used
📓 Learnings (1)
common/docker/service/nakama.go (1)
Learnt from: rmrt1n
PR: Argus-Labs/world-cli#79
File: common/docker/service/nakama.go:85-90
Timestamp: 2024-10-10T08:28:46.268Z
Learning: In `common/docker/service/nakama.go`, the Nakama entrypoint command does not require the `--trace.enabled` and `--trace.sample_rate` flags for telemetry tracing.
🔇 Additional comments (18)
common/docker/service/prometheus.go (7)

1-10: Code structure and imports are well-organized

The package declaration and import statements are correctly structured, following Go conventions for grouping standard library and third-party packages.


12-26: Prometheus configuration script is properly defined

The multi-line string containerCmd correctly defines the shell script to generate the prometheus.yaml configuration file with appropriate placeholders.


28-30: Function getPrometheusContainerName is correctly implemented

The function effectively generates the Prometheus container name based on the CARDINAL_NAMESPACE from the configuration.


33-36: Default metrics interval is appropriately set

The code retrieves the NAKAMA_METRICS_INTERVAL from the configuration and properly defaults to "30" if not specified.


38-38: Exposed ports are correctly assigned

The Prometheus service is set to expose port 9090, which is standard for Prometheus metrics.


40-41: Placeholder replacement in containerCmd is functional

The placeholders in the containerCmd string are correctly replaced with actual values from the configuration.


43-55: Prometheus service configuration is accurate

The Service struct for Prometheus is properly configured with the correct image, entry point, command, and host settings, including port bindings and network mode.

common/docker/service/nakama.go (6)

5-5: Necessary import added.

The addition of "strconv" is appropriate since strconv.ParseBool is used later in the code.


38-47: Telemetry configuration is implemented correctly.

The logic for setting traceEnabled and traceSampleRate handles default values appropriately and respects the global telemetry setting.


48-55: Metrics configuration logic is correct.

metricsEnabled is correctly determined based on the telemetry settings and the NAKAMA_METRICS_ENABLED environment variable. The error handling ensures that only valid boolean values are considered.


56-62: Prometheus port configuration is properly handled.

The prometheusPort is set based on metricsEnabled, ensuring Prometheus metrics are enabled when intended.


85-90: Entrypoint command updated correctly.

The entrypoint command includes the --metrics.prometheus_port flag, correctly utilizing the prometheusPort variable to enable or disable Prometheus metrics based on the configuration.


68-70: Update the image tag if the referenced PR has been merged.

The TODO comment indicates that the image tag should be updated once PR Argus-Labs/world-engine#799 is merged. Please update the Image field accordingly if the PR has been merged.

To check if the PR has been merged, you can run:

✅ Verification successful

PR Not Yet Merged

The referenced PR Argus-Labs/world-engine#799 has not been merged yet. Please retain the Image field as is and update it once the PR is merged.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if PR 799 in the world-engine repository has been merged.

gh pr view 799 --repo Argus-Labs/world-engine --json merged --jq '.merged'

Length of output: 703


Script:

#!/bin/bash
# Description: Check if PR 799 in the world-engine repository has been merged by verifying the 'mergedAt' field.

gh pr view 799 --repo Argus-Labs/world-engine --json mergedAt --jq '.mergedAt != null'

Length of output: 94

cmd/world/cardinal/start.go (5)

28-28: Proper addition of telemetry flag with default disabled

The flagTelemetry is added with a default value of false, which respects user privacy by requiring opt-in for telemetry features.


79-81: Consistent handling of telemetry flag in configuration

The use of replaceBoolWithFlag correctly updates cfg.Telemetry based on the --telemetry flag, ensuring command-line arguments override configuration values as intended.


128-129: Modular service initialization using getStartedServices

Refactoring service initialization into getStartedServices(cfg) improves code readability and maintainability by encapsulating the logic for determining which services to start.


165-165: Registration of telemetry flag in command initialization

The flagTelemetry is properly registered in the init function, allowing users to enable telemetry features via the command line.


182-191: Correct implementation of conditional service inclusion

The getStartedServices function accurately includes telemetry services (Jaeger and Prometheus) based on both the global telemetry setting and specific environment variables.

common/docker/client_test.go Show resolved Hide resolved
common/docker/client_test.go Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE

📥 Commits

Files that changed from the base of the PR and between a779603 and 2856220.

📒 Files selected for processing (2)
  • cmd/world/cardinal/start.go (4 hunks)
  • cmd/world/cardinal/stop.go (1 hunks)
🧰 Additional context used
🔇 Additional comments (1)
cmd/world/cardinal/start.go (1)

129-134: Previous Comment Still Applies: Consider Improving Boolean Checks

The previous review comment about improving the robustness of environment variable checks is still applicable here. The comparisons to "true" might miss other truthy values like "True" or "TRUE". Consider using strconv.ParseBool to handle various representations of boolean values.

cmd/world/cardinal/stop.go Show resolved Hide resolved
Copy link

codecov bot commented Oct 11, 2024

Codecov Report

Attention: Patch coverage is 79.41176% with 14 lines in your changes missing coverage. Please review.

Project coverage is 46.69%. Comparing base (9888454) to head (d7d0e1f).

Files with missing lines Patch % Lines
cmd/world/cardinal/start.go 22.22% 3 Missing and 4 partials ⚠️
common/docker/service/nakama.go 71.42% 4 Missing and 2 partials ⚠️
common/docker/service/evm.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #79      +/-   ##
==========================================
+ Coverage   45.85%   46.69%   +0.84%     
==========================================
  Files          54       56       +2     
  Lines        2617     2677      +60     
==========================================
+ Hits         1200     1250      +50     
- Misses       1216     1220       +4     
- Partials      201      207       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE

📥 Commits

Files that changed from the base of the PR and between 2856220 and d7d0e1f.

📒 Files selected for processing (1)
  • common/docker/service/nakama.go (2 hunks)
🧰 Additional context used
📓 Learnings (1)
common/docker/service/nakama.go (1)
Learnt from: rmrt1n
PR: Argus-Labs/world-cli#79
File: common/docker/service/nakama.go:85-90
Timestamp: 2024-10-10T08:28:46.268Z
Learning: In `common/docker/service/nakama.go`, the Nakama entrypoint command does not require the `--trace.enabled` and `--trace.sample_rate` flags for telemetry tracing.
🪛 GitHub Check: codecov/patch
common/docker/service/nakama.go

[warning] 50-52: common/docker/service/nakama.go#L50-L52
Added lines #L50 - L52 were not covered by tests


[warning] 60-60: common/docker/service/nakama.go#L60
Added line #L60 was not covered by tests

🔇 Additional comments (1)
common/docker/service/nakama.go (1)

58-61: Verify the handling of prometheusPort in the Entrypoint command

When metricsEnabled is false, prometheusPort is set to 0, and the Entrypoint command includes --metrics.prometheus_port 0. Confirm that passing a port of 0 correctly disables Prometheus metrics in Nakama. If Nakama requires the omission of the --metrics.prometheus_port flag to disable metrics, consider modifying the Entrypoint command to conditionally include this flag based on metricsEnabled.

You can verify this behavior by reviewing Nakama's documentation or testing the configuration.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 60-60: common/docker/service/nakama.go#L60
Added line #L60 was not covered by tests

common/docker/service/nakama.go Show resolved Hide resolved
common/docker/service/nakama.go Show resolved Hide resolved
Copy link

graphite-app bot commented Oct 11, 2024

Merge activity

Closes: WORLD-1195

## Overview

This PR adds new `world.toml` configuration options related to telemetry, CLI flags, and docker services. The services added are Jaeger for tracing, and Prometheus for metrics. ATM these are only for Nakama, but will support Cardinal in the future. For more context, checkout this [Notion doc](https://www.notion.so/arguslabs/World-Engine-Telemetry-Config-119c63b376aa80f6a13cd1ef7752b8e6).

The behaviour of the `--telemetry` flag is as follows:
- if `--telemetry` is disabled, then all telemetry is disabled regardless of the values in `world.toml`
- if `--telemetry` is enabled, then the telemetry components (tracing, metrics, profiling) will be enabled based on the values in `world.toml`

**NOTE**: The scope of this PR is only for toggling telemetry for Nakama. Cardinal's will be done in a separate PR.
**NOTE**: Some tests are failing because the Nakama docker image isn't created yet (waiting for [this PR](Argus-Labs/world-engine#799)).

## Brief Changelog
- Added Jaeger and Prometheus docker services
- Added `--telemetry` flag
- Added `world.toml` config options: `NAKAMA_TRACE_ENABLED`, `NAKAMA_TRACE_SAMPLE_RATE`, `NAKAMA_METRICS_ENABLED`, `NAKAMA_METRICS_INTERVAL`.

## Testing and Verifying

Manually tested and verified

### Notes for QA Team

Here are some test cases to consider including in the World CLI's test suite:
- `world cardinal start` without `--telemetry` flag won't enable tracing/metrics in Nakama and Prometheus and Jaeger containers won't be started.
- `world cardinal start --telemetry` with `NAKAMA_TRACES_ENABLED=true` and `NAKAMA_METRICS_ENABLED=false` will start Jaeger without Prometheus. The same applies vice versa. If both options are set to `true`, then both containers will be started.

## Summary by CodeRabbit

- **New Features**
	- Introduced Jaeger container configuration for enhanced tracing capabilities.
	- Added Prometheus container configuration for improved monitoring and metrics collection.
	- Implemented a telemetry flag for flexible service management based on tracing and metrics settings.
	- Added configuration for Jaeger and Prometheus containers in the Docker environment.
	- Introduced telemetry options in the Cardinal command-line interface.

- **Updates**
	- Upgraded Nakama Docker image version from `1.2.7` to `1.3.0`.
	- Enhanced Nakama configuration with new environment variables for Jaeger integration and metrics collection.
	- Expanded purge command to include Jaeger and Prometheus services for improved cleanup.
	- Enhanced the `purge` and `stop` commands to manage Jaeger and Prometheus services.
	- Updated the Nakama service configuration to support new telemetry and metrics features.

- **Bug Fixes**
	- Improved cleanup process in tests to ensure all relevant services are purged.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

## Release Notes

- **New Features**
	- Added support for configuring Jaeger and Prometheus containers in the Docker environment.
	- Introduced telemetry options in the configuration settings for enhanced monitoring capabilities.

- **Improvements**
	- Expanded the `purge` command to include Jaeger and Prometheus services.
	- Updated the `start` and `stop` commands to manage additional telemetry services based on user-defined settings.

- **Bug Fixes**
	- Enhanced test cleanup processes to ensure reliable testing environments by including additional services in purging operations.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
@graphite-app graphite-app bot merged commit 88789d7 into main Oct 11, 2024
5 of 7 checks passed
Copy link

sentry-io bot commented Oct 11, 2024

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ *eris.wrapError: Failed to remove containers: Errors: [Failed to inspect container defaultnamespace-nakama-db: Can... pkg.world.dev/world-cli/common/docker in (*Clie... View Issue
  • ‼️ *eris.wrapError: Encountered an error with Docker: Failed to start containers: Errors: [Post "http://%2Fvar%2Frun%... pkg.world.dev/world-cli/cmd/world/cardinal in i... View Issue
  • ‼️ *eris.wrapError: Encountered an error with Docker: Failed to build images: Errors: [Error decoding build output: c... pkg.world.dev/world-cli/cmd/world/cardinal in i... View Issue

Did you find this useful? React with a 👍 or 👎

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants