Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

blobstore: Add metrics_tag config field #205

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

minor-fixes
Copy link
Contributor

@minor-fixes minor-fixes commented May 11, 2024

This change adds a single string field to the base blobstore config message, the value of which gets propagated to a tag value on exported metrics for the corresponding backend instance.

In particular, this change allows for attaching different tags on separate uses of a single "label" backend in order to collect metrics specific to a particular instantiation. Without this change, all accesses to the referenced backend shared one combined set of metrics; with this change, it's possible to set separate metrics_tag values on each instantiation and collect separate metrics, as well as tagging the underlying backend and grabbing aggregate metrics as well.

TODO before marking non-draft:

  • pass presubmit checks
  • drop spurious bzlmod changes
  • format code
  • second pass on proto comments

Tested: see comment below

@minor-fixes
Copy link
Contributor Author

With the config attached below, I started a bb_storage instance, and issued 7 RPCs:

  • 1x "access through readCaching": grpc_cli call localhost:8980 ContentAddressableStorage.FindMissingBlobs 'instance_name: "" digest_function: 1 blob_digests { hash: "aed662c522d64bb9e73534f4cd9cd0d98c8407e737689f1a591e4409f4d9e0ce" size_bytes: 5 }'
  • 2x "access slow directly": grpc_cli call localhost:8980 ContentAddressableStorage.FindMissingBlobs 'instance_name: "slow" digest_function: 1 blob_digests { hash: "aed662c522d64bb9e73534f4cd9cd0d98c8407e737689f1a591e4409f4d9e0ce" size_bytes: 5 }'
  • 4x "access fast directly": grpc_cli call localhost:8980 ContentAddressableStorage.FindMissingBlobs 'instance_name: "fast" digest_function: 1 blob_digests { hash: "aed662c522d64bb9e73534f4cd9cd0d98c8407e737689f1a591e4409f4d9e0ce" size_bytes: 5 }'

The counts for the respective tags were:

  • top: 7 (sum of all RPCs)
  • read_cached: 1 (only one call to the readCaching backend)
  • slow_underlying: 3 (1 call to readCaching + 2 calls to 'slow' directly via instance name)
  • fast_underlying: 4 (0 calls to readCaching? + 4 calls to 'fast' directly via instance name)
  • slow_direct: 2 (counts only direct calls via instance name)
  • slow_cached: 1 (only one call to the readCaching backend)
  • fast_direct: 4 (counts only direct calls via instance name)
  • fast_cached: 0 (presumably; could not find metrics)

I wouldn't mind adding an appropriate test though; any ideas on a better testing approach?


Config:

local SmallCache = function(tag) {
  metricsTag: tag,
  'local': {
    keyLocationMapInMemory: { entries: 1000 },
    keyLocationMapMaximumGetAttempts: 8,
    keyLocationMapMaximumPutAttempts: 32,
    oldBlocks: 2,
    currentBlocks: 1,
    newBlocks: 1,
    blocksInMemory: {
      blockSizeBytes: 10 * 1000 * 1000,
    },
  },
};

{
  executeAuthorizer: { allow: {} },
  grpcServers: [
    {
      listenAddresses: ["127.0.0.1:8980"],
      authenticationPolicy: { allow: {} },
    },
  ],
  global: {
    diagnosticsHttpServer: {
      httpServers: [{ listen_addresses: ["127.0.0.1:9980"], authenticationPolicy: { allow: {} } }],
      enablePrometheus: true,
    },
  },
  contentAddressableStorage: {
    getAuthorizer: { allow: {} },
    putAuthorizer: { allow: {} },
    findMissingAuthorizer: { allow: {} },
    backend: {
      withLabels: {
        backend: {
          metricsTag: 'top',
          demultiplexing: {
            instanceNamePrefixes: {
              '': {
                backend: {
                  metricsTag: 'read_cached',
                  readCaching: {
                    slow: { label: 'slow', metricsTag: 'slow_cached' },
                    fast: { label: 'fast', metricsTag: 'fast_cached' },
                    replicator: { 'local': {} },
                  },
                },
              },
              'slow': {
                backend: {
                  label: 'slow',
                  metricsTag: 'slow_direct',
                },
              },
              'fast': {
                backend: {
                  label: 'fast',
                  metricsTag: 'fast_direct',
                },
              },
            },
          },
        },
        labels: {
          'fast': SmallCache('fast_underlying'),
          'slow': SmallCache('slow_underlying'),
        },
      },
    },
  },
}

@minor-fixes minor-fixes force-pushed the metrics_tag branch 3 times, most recently from fc91d76 to 2e717ee Compare May 12, 2024 23:11
WIP; more description to come
@EdSchouten
Copy link
Member

Instead of actually having a separate metrics_tag, what are your thoughts on reusing the existing with_label / label facility? As in, make it so that using with_label causes a label="..." to be added to any BlobAccess underneath.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants