Add hadoop input plugin to categraf #1137

Merged
3 commits merged on Jan 20, 2025


JiaLiangC
Contributor


Add hadoop input plugin to categraf

feat: Add Hadoop input plugin to Categraf


PR description

New feature

This PR adds a Hadoop monitoring plugin that collects metrics over the JMX interface from the following components of a Hadoop cluster:

  • Yarn ResourceManager
  • Yarn NodeManager
  • Hadoop NameNode
  • Hadoop DataNode

Configuration

The plugin's configuration file is located at conf/input.hadoop/hadoop.toml and supports the following options:

Common configuration

[common]
useSASL = false
saslUsername = "HTTP/_HOST"
saslDisablePAFXFast = true
saslMechanism = "gssapi"
kerberosAuthType = "keytabAuth"
keyTabPath = "/path/to/keytab"
kerberosConfigPath = "/path/to/krb5.conf"
realm = "EXAMPLE.COM"

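Component configuration

Each component is configured in a [[components]] block, which supports the following fields:

  • name: component name (for example, YarnResourceManager).
  • port: JMX port.
  • processName: process name, used to decide dynamically whether this component's metrics should be collected.
  • allowRecursiveParse: whether to recursively parse the JSON returned by JMX.
  • allowMetricsWhiteList: whether to enable the whitelist.
  • jmxUrlSuffix: JMX URL suffix.
  • white_list: list of metric names to collect.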
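
To show how these fields might fit together, here is a minimal sketch that models one [[components]] entry and builds the URL to poll. It is written in Python for brevity and is not the plugin's Go code. The `Component` class, the `jmx_url` helper, the field defaults, the `http` scheme, and the `localhost` host are all assumptions made for illustration; the PR does not say how the plugin constructs the URL.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Illustrative mirror of one [[components]] block (hypothetical type)."""
    name: str
    port: int
    processName: str
    allowRecursiveParse: bool = False      # default is an assumption
    allowMetricsWhiteList: bool = False    # default is an assumption
    jmxUrlSuffix: str = "/jmx"
    white_list: list = field(default_factory=list)


def jmx_url(component, host="localhost", scheme="http"):
    """Build the JMX endpoint URL. Host and scheme are assumed, not from the PR."""
    return f"{scheme}://{host}:{component.port}{component.jmxUrlSuffix}"


rm = Component(
    name="YarnResourceManager",
    port=8088,
    processName="org.apache.hadoop.yarn.server.resourcemanager.ResourceManager",
    allowRecursiveParse=True,
    allowMetricsWhiteList=True,
    white_list=["NumActiveNMs", "NumUnhealthyNMs", "NumLostNMs"],
)
print(jmx_url(rm))
```

Under these assumptions, port and jmxUrlSuffix are the only fields that shape the request; the remaining fields control whether the component is polled and which values are kept.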

Example configuration:

[[components]]
name = "YarnResourceManager"
port = 8088
processName = "org.apache.hadoop.yarn.server.resourcemanager.ResourceManager"
allowRecursiveParse = true
allowMetricsWhiteList = true
jmxUrlSuffix = "/jmx"
white_list = [
    "NumActiveNMs", # number of active NodeManagers
    "NumUnhealthyNMs", # number of unhealthy NodeManagers
    "NumLostNMs", # number of NodeManagers that lost their connection
]

[[components]]
name = "YarnNodeManager"
port = 8042
processName = "Dproc_nodemanager"
allowRecursiveParse = true
allowMetricsWhiteList = true
jmxUrlSuffix = "/jmx"
white_list = [
    "ContainersLaunched",        # total containers launched
    "ContainersCompleted",       # total containers completed
    "ContainersFailed",          # total containers failed
]

[[components]]
name = "HadoopNameNode"
port = 50070
processName = "org.apache.hadoop.hdfs.server.namenode.NameNode"
allowRecursiveParse = true
allowMetricsWhiteList = true
jmxUrlSuffix = "/jmx"
white_list = [
    "FSState", # NameNode filesystem state (Operational, SafeMode, etc.)
    "HAState", # HA state (active/standby)
    "State", # NameNode state
]

[[components]]
name = "HadoopDataNode"
port = 1022
processName = "Dproc_datanode"
allowRecursiveParse = true
allowMetricsWhiteList = true
jmxUrlSuffix = "/jmx"
white_list = [
    "SystemCpuLoad",              # system CPU load
    "ProcessCpuLoad",             # DataNode process CPU load
    "HeapMemoryUsage",            # JVM heap memory usage
]
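
The processName values in this example look like fragments of a JVM command line: either a main class name or a `-Dproc_...` system property. The sketch below assumes that the plugin treats processName as a substring to look for in the command lines of running processes. The PR does not state the matching rule, so this is a guess at the behavior rather than the plugin's logic. The `is_process_running` helper and the sample command lines are hypothetical, and the sketch is in Python rather than the plugin's Go.

```python
def is_process_running(process_name, cmdlines):
    """Return True if any command line contains process_name as a substring."""
    return any(process_name in cmdline for cmdline in cmdlines)


# Hypothetical command lines, as they might appear on a DataNode host.
cmdlines = [
    "/usr/bin/java -Dproc_datanode -Xmx1g "
    "org.apache.hadoop.hdfs.server.datanode.DataNode",
    "/usr/sbin/sshd -D",
]

# Component name -> processName, copied from the example configuration.
process_names = {
    "YarnNodeManager": "Dproc_nodemanager",
    "HadoopNameNode": "org.apache.hadoop.hdfs.server.namenode.NameNode",
    "HadoopDataNode": "Dproc_datanode",
}

# Only components whose process is present on this host would be polled.
active = [
    name
    for name, proc in process_names.items()
    if is_process_running(proc, cmdlines)
]
print(active)
```

With a rule like this, one configuration file can be deployed to every node, and each host only polls the components it actually runs.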

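Role of the whitelist

  • Whitelist: white_list specifies the names of the metrics to collect. The plugin extracts the matching values from the JMX interface by metric name.
  • Dynamic collection: the plugin uses processName to check whether the process exists on the current machine; if it does, the whitelisted metrics are collected automatically.
  • Recursive parsing: when allowRecursiveParse is enabled, the plugin recursively parses the JSON returned by JMX and collects the whitelisted metrics.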
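
The whitelist and recursive-parsing behavior described above can be sketched as follows. This is an illustration in Python under stated assumptions, not the plugin's Go implementation. The `collect` function, the rule that a whitelisted parent selects all of its nested fields, and the `parent_child` naming of nested metrics are assumptions, and the sample payload values are made up. The `{"beans": [...]}` envelope is the shape returned by Hadoop's /jmx servlet. Values are kept as-is; turning string states such as FSState into numeric samples is left out.

```python
import json


def collect(node, white_list, use_white_list=True, recursive=True,
            prefix="", inherited=False):
    """Walk one JSON object and return {metric_name: value} for selected keys.

    A key is selected when the whitelist is disabled, when the key itself is
    whitelisted, or when it sits under a whitelisted parent (inherited=True).
    """
    metrics = {}
    for key, value in node.items():
        selected = inherited or not use_white_list or key in white_list
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested objects are only visited when recursive parsing is on.
            if recursive:
                metrics.update(collect(value, white_list, use_white_list,
                                       recursive, f"{name}_", selected))
        elif isinstance(value, list):
            continue  # arrays are ignored in this sketch
        elif selected:
            metrics[name] = value
    return metrics


# Made-up sample in the shape of a /jmx response.
payload = json.loads("""
{"beans": [
  {"name": "Hadoop:service=ResourceManager,name=ClusterMetrics",
   "NumActiveNMs": 3, "NumLostNMs": 0, "NumRebootedNMs": 0},
  {"name": "java.lang:type=Memory",
   "HeapMemoryUsage": {"committed": 1024, "init": 512,
                       "max": 2048, "used": 300}}
]}
""")

white_list = {"NumActiveNMs", "NumLostNMs", "HeapMemoryUsage"}
metrics = {}
for bean in payload["beans"]:
    metrics.update(collect(bean, white_list))
print(metrics)
```

In this sketch, calling `collect` with `recursive=False` drops HeapMemoryUsage entirely, because its value is a nested object rather than a scalar.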

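Testing

The following tests have passed:

  1. Deployed Categraf in a Hadoop cluster and verified that the plugin correctly collects metrics from Yarn ResourceManager, Yarn NodeManager, Hadoop NameNode, and Hadoop DataNode.
  2. Verified the whitelist: only whitelisted metrics are collected.
  3. Verified recursive parsing: nested JSON data is parsed correctly.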

Related issue

#1136


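Code changes

New files

  1. plugins/inputs/hadoop/hadoop.go: core implementation of the Hadoop plugin.
  2. conf/input.hadoop/hadoop.toml: configuration file template for the Hadoop plugin.
  3. plugins/inputs/hadoop/README.md: usage documentation for the Hadoop plugin.

Modified files

  1. plugins/inputs/inputs.go: registers the Hadoop plugin.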

@JiaLiangC changed the title from "add hadoop input plugin to categraf" to "Add hadoop input plugin to categraf" on Jan 17, 2025
@kongfei605
Collaborator

Thank you @JiaLiangC

@kongfei605 merged commit cc550db into flashcatcloud:main on Jan 20, 2025
2 of 3 checks passed
@JiaLiangC
Contributor Author

Thanks for helping review.
