feat(node): zfs monitor and grafana ui #559

Open: waitingsong wants to merge 21 commits into base: main

waitingsong (Contributor) commented Mar 31, 2025

feat(node): zfs monitor and grafana ui

Installation

Add parameters:

  • zfs_exporter_enabled: set up zfs_exporter on this node (default: false)
  • zfs_exporter_version: zfs_exporter version, currently 3.8.1
  • zfs_exporter_port: zfs_exporter listen port (default: 9134)
  • zfs_exporter_options: see "/roles/node_monitor/defaults/main.yml"

Prometheus

Add zfs_exporter (forked from pdf/zfs_exporter) with more metrics:

  • node:ins:zfs_pool_metrics
  • node:ins:zfs_dataset_metrics

Add alert rules:

  • ZPoolDegraded
  • ZPoolFaulted
  • ZPoolOffline
  • ZPoolUnavail
  • ZPoolRemoved
  • ZPoolSuspended
  • ZPoolReadonly
  • ZPoolSpaceFull
  • ZDatasetSpaceFull

Update alert rule:

  • NodeFsSpaceFull: add the fstype!="zfs" filter so that ZFS filesystems are excluded

Grafana

Add panels:

  • node-zfs.json
    • list ZFS Pools and Datasets
    • search by Node ID, Pool, or Dataset
    • summary of ZFS alerts

Update panels:

@waitingsong (Contributor Author)

node-overview.json

Screenshot: zfs-2025-03-31_141247

waitingsong (Contributor Author) commented Mar 31, 2025

node-zfs.json

Screenshot: zfs-2025-04-01_145502

waitingsong (Contributor Author) commented Mar 31, 2025

node-zfs.json

Unhealthy pools are listed first.

Screenshot: zfs-2025-04-01_151730

node-overview.json

Screenshot: zfs-2025-03-31_140046

@waitingsong (Contributor Author)

node-alert.json

  • ZPoolDegraded
  • ZDatasetSpaceFull

Screenshot: zfs-2025-03-31_140112

waitingsong (Contributor Author) commented Mar 31, 2025

ZPoolDegraded: jumping from the alert page carries the pool name as a filter.

Screenshot: zfs-2025-03-31_140942

@waitingsong (Contributor Author)

ZDatasetSpaceFull: jumping from the alert page carries the pool name and dataset name as filters.

Screenshot: zfs-2025-03-31_140914

waitingsong force-pushed the zfs-monitor branch 11 times, most recently from cef6110 to 9fb1852, on April 1, 2025 08:04
@waitingsong (Contributor Author)

zfs_exporter aliveness panels

Screenshot: zfs-2025-04-01_165111

## Installation
Add parameters:
- `zfs_exporter_enabled`: set up zfs_exporter on this node (default: `false`)
- `zfs_exporter_version`: zfs_exporter version, currently `3.8.1`
- `zfs_exporter_port`: zfs_exporter listen port (default: `9134`)
- `zfs_exporter_options`: see `/roles/node_monitor/defaults/main.yml`
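
As a rough illustration, enabling the exporter for a node could look like the fragment below. Only the parameter names, the version, and the default port come from this PR; where the variables are placed in the inventory is an assumption, and `zfs_exporter_options` is left out so the role default applies.

```yaml
# Hypothetical node-level vars; placement in the inventory is an assumption.
zfs_exporter_enabled: true    # defaults to false, so it must be switched on explicitly
zfs_exporter_version: 3.8.1   # current version named in this PR
zfs_exporter_port: 9134       # default listen port
```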

## Prometheus
Add [zfs_exporter][node_exporter] (forked from [pdf/zfs_exporter][pdf-node_exporter]) with more metrics:
- `node:zfs:pool_metrics`
- `node:zfs:dataset_metrics`

Add alert rules:
- `ZPoolDegraded`
- `ZPoolFaulted`
- `ZPoolOffline`
- `ZPoolUnavail`
- `ZPoolRemoved`
- `ZPoolSuspended`
- `ZPoolReadonly`
- `ZPoolSpaceFull`
- `ZDatasetSpaceFull`
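
For readers unfamiliar with how such rules are usually written, here is a minimal sketch of two of them. It is not the rule text from this PR: it assumes the fork keeps the upstream pdf/zfs_exporter metric names (`zfs_pool_health`, where 1 means DEGRADED, plus `zfs_pool_allocated_bytes` and `zfs_pool_size_bytes`), and the group name, `for` durations, and the 90% threshold are illustrative only.

```yaml
groups:
  - name: zfs-sketch                      # hypothetical group name
    rules:
      - alert: ZPoolDegraded
        # Assumption: upstream health codes, where 1 = DEGRADED.
        expr: zfs_pool_health == 1
        for: 1m                           # illustrative duration
        annotations:
          summary: 'ZFS pool {{ $labels.pool }} on {{ $labels.instance }} is degraded'
      - alert: ZPoolSpaceFull
        # Assumption: allocated/size ratio; the 0.90 threshold is illustrative.
        expr: zfs_pool_allocated_bytes / zfs_pool_size_bytes > 0.90
        for: 1m
        annotations:
          summary: 'ZFS pool {{ $labels.pool }} on {{ $labels.instance }} is over 90% allocated'
```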

Update alert rule:
- `NodeFsSpaceFull`: add the `fstype!="zfs"` filter so that ZFS filesystems are excluded
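
The effect of the `fstype!="zfs"` filter can be sketched with the standard node_exporter filesystem metrics. This is an assumption about the shape of the rule, not the actual expression in the repo, and the 0.90 threshold is illustrative.

```yaml
- alert: NodeFsSpaceFull
  # fstype!="zfs" keeps ZFS mounts out of the generic filesystem alert,
  # leaving them to ZPoolSpaceFull and ZDatasetSpaceFull.
  expr: |
    1 - node_filesystem_avail_bytes{fstype!="zfs"}
      / node_filesystem_size_bytes{fstype!="zfs"} > 0.90
```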

## Grafana
Add panels:
- [node-zfs.json]
  - list ZFS Pools and Datasets
  - list unhealthy pools first
  - search by `Node ID`, `Pool`, or `Dataset`
  - summary of ZFS alerts

Update panels:
- [node-overview.json](http://g.pigsty/d/node-overview/node-overview)
  - add a `ZFS Pools` row
  - list unhealthy pools first
- [node-alert.json](http://g.pigsty/d/node-alert/node-alert)
  - show ZFS alerts
  - cells link to the [node-overview.json] page, filtered by pool name and/or dataset name

[node_exporter]: https://github.com/waitingsong/zfs_exporter
[pdf-node_exporter]: https://github.com/pdf/zfs_exporter
[node-overview.json]: http://g.pigsty/d/node-overview/node-overview
[node-zfs.json]: http://g.pigsty/d/zfs-overview/node-zfs

Vonng (Member) commented Apr 5, 2025

zfs_exporter has now been added to the pigsty-infra repo:

pgsty/infra-pkg@906ab53

Vonng force-pushed the main branch 4 times, most recently from df9afaf to 47d9e6d, on April 5, 2025 14:29
waitingsong (Contributor Author) commented Apr 8, 2025

node-cluster.json
link: d/node-cluster/node-cluster

Screenshot: zfs-2025-04-10_210454

waitingsong force-pushed the zfs-monitor branch 5 times, most recently from d94bca3 to 04f70eb, on April 10, 2025 06:05
waitingsong changed the title from "feat(node): zfs monitor and grafana ui" to "WIP: feat(node): zfs monitor and grafana ui" on Apr 10, 2025
- node:ins:zfs_arc_utilization
- node:ins:zfs_arc_memory_ratio
- node:ins:zfs_arc_meta_usage
- node:ins:zfs_arc_hit_ratio
- node:ins:zfs_arc_hit_ratio_rate1m
- node:ins:zfs_arc_hit_ratio_rate5m
- node:ins:zfs_arc_usage_ratio
- node:ins:zfs_arc_pressure_ratio
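
A hit-ratio rule such as `node:ins:zfs_arc_hit_ratio_rate1m` above is conventionally hits divided by hits plus misses over the window. The sketch below assumes the ARC counters come from node_exporter's zfs collector (`node_zfs_arc_hits`, `node_zfs_arc_misses`); the group name is hypothetical and the actual expression in this PR may differ.

```yaml
groups:
  - name: zfs-arc-sketch                  # hypothetical group name
    rules:
      - record: node:ins:zfs_arc_hit_ratio_rate1m
        # Assumption: counters exposed by node_exporter's zfs collector.
        expr: |
          rate(node_zfs_arc_hits[1m])
            / (rate(node_zfs_arc_hits[1m]) + rate(node_zfs_arc_misses[1m]))
```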
waitingsong force-pushed the zfs-monitor branch 2 times, most recently from 662b303 to 38deca0, on April 11, 2025 06:17
- ARC Pressure
- ARC Memory Ratio
- ARC Utilization
- ARC Meta Usage
- ARC Dnode Size
- ARC Evict Skip
- ARC Hit (rate1m)
- ARC Hit Ratio (rate1m)
@waitingsong (Contributor Author)

node-instance.json
link: /d/node-instance/node-instance

Screenshot: zfs-2025-04-11_144225

waitingsong changed the title from "WIP: feat(node): zfs monitor and grafana ui" to "feat(node): zfs monitor and grafana ui" on Apr 11, 2025
waitingsong (Contributor Author) commented Apr 12, 2025

node-alert.json
link: d/node-alert/node-alert

Screenshot: zfs-2025-04-12_215458


2 participants