pageserver_io_operations_bytes_total not lifecycled correctly for secondaries #11156
Labels: a/observability (Area: related to observability), c/storage/pageserver (Component: storage: pageserver), t/bug (Issue Type: Bug)
Background
The `pageserver_io_operations_bytes_total` metric (STORAGE_IO_SIZE) is per-timeline.
Before, the tenant/shard/timeline ids were inferred from the path that was passed to VirtualFile::open. The first read/write operation on that VirtualFile would instantiate the metric via `with_label_values`.
This happens both for Attached and Secondary locations.
Attached locations properly lifecycle-manage the metric: during `Timeline::shutdown`, the metric gets removed.
#7202 avoids the `with_label_values` in the hot path but punts on fixing the problem described below.
Problem
Secondary locations don't lifecycle-manage the metric.
Thus, if a secondary location gets detached, the instantiated per-timeline metric remains in the registry causing
Solution
Somehow lifecycle-manage the metric.
Secondaries don't have infrastructure for this at the timeline level, so some generic infrastructure needs to be put in place first.
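One possible shape for such generic infrastructure is an RAII guard that owns the label set and removes the series when it is dropped. The sketch below is only an illustration under stated assumptions: `ToyCounterVec` is a std-only stand-in for the labelled counter family, and `TimelineMetricGuard`, `record_io`, and `simulate_secondary_detach` are hypothetical names, not existing pageserver or metrics-library APIs.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Label values identifying one per-timeline series:
/// (tenant_id, shard_id, timeline_id).
type Labels = (String, String, String);

/// Toy stand-in for a labelled counter family such as STORAGE_IO_SIZE.
/// This is NOT the real metrics registry; it only mimics
/// create-on-first-use and explicit removal of a series.
#[derive(Clone, Default)]
struct ToyCounterVec {
    series: Arc<Mutex<HashMap<Labels, u64>>>,
}

impl ToyCounterVec {
    /// Adds `bytes` to the series, creating it on first use.
    /// Create-on-first-use is what lets a series outlive its owner.
    fn add(&self, labels: &Labels, bytes: u64) {
        *self.series.lock().unwrap().entry(labels.clone()).or_insert(0) += bytes;
    }

    /// Removes the series for `labels`, if present.
    fn remove(&self, labels: &Labels) {
        self.series.lock().unwrap().remove(labels);
    }

    /// Number of series currently instantiated.
    fn series_count(&self) -> usize {
        self.series.lock().unwrap().len()
    }
}

/// Hypothetical RAII guard: owns the label set and removes the
/// series when dropped, independent of attached/secondary mode.
struct TimelineMetricGuard {
    vec: ToyCounterVec,
    labels: Labels,
}

impl TimelineMetricGuard {
    fn new(vec: &ToyCounterVec, tenant: &str, shard: &str, timeline: &str) -> Self {
        Self {
            vec: vec.clone(),
            labels: (tenant.to_string(), shard.to_string(), timeline.to_string()),
        }
    }

    /// Records an IO of `bytes` against this timeline's series.
    fn record_io(&self, bytes: u64) {
        self.vec.add(&self.labels, bytes);
    }
}

impl Drop for TimelineMetricGuard {
    fn drop(&mut self) {
        self.vec.remove(&self.labels);
    }
}

/// Returns (series count while the location exists, series count after detach).
fn simulate_secondary_detach() -> (usize, usize) {
    let vec = ToyCounterVec::default();
    let guard = TimelineMetricGuard::new(&vec, "tenant-a", "0001", "timeline-1");
    guard.record_io(4096);
    let while_present = vec.series_count();
    drop(guard); // detaching the secondary location drops its guard
    (while_present, vec.series_count())
}
```

Tying removal to `Drop` means no shutdown path has to remember to clean up explicitly. Whether a secondary location has a suitable per-timeline owner to hold such a guard is exactly the open question raised above.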