Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[podresapi 2/3] resmgr,agent: generate topology hints from Pod Resource API. #419

Merged
merged 4 commits into from
Dec 4, 2024

Conversation

klihub
Copy link
Collaborator

@klihub klihub commented Dec 1, 2024

This patch series

  • adds support to query the kubelet's Pod Resource API via the agent
  • generates extra topology hints using the Pod Resource API, if it is enabled
  • updates Helm charts to provide access to the kubelet's Pod Resource API socket

Pod Resource API queries are triggered asynchronously during pod creation/handling RunPodSandbox NRI events. Container creation waits for any pending queries (with a short timeout) then generates extra topology hints if the results contain devices with declared NUMA-locality. The existing device allow-/deny-annotations can be used to explicitly allow or deny topology hints corresponding to Pod Resource API hints.

@klihub klihub force-pushed the devel/hints-from-pod-resource-api branch 2 times, most recently from 5f9f2ea to 5f1519e Compare December 1, 2024 17:53
@klihub klihub changed the title [Pod Resource API 2/3]: generate topology hints from Pod Resource API. [podresapi 2/3]: generate topology hints from Pod Resource API. Dec 1, 2024
@klihub
Copy link
Collaborator Author

klihub commented Dec 2, 2024

/cc @pfl

@klihub klihub changed the title [podresapi 2/3]: generate topology hints from Pod Resource API. [podresapi 2/3] generate topology hints from Pod Resource API. Dec 2, 2024
@klihub klihub marked this pull request as draft December 2, 2024 09:42
@klihub klihub marked this pull request as ready for review December 2, 2024 11:48
@klihub klihub changed the title [podresapi 2/3] generate topology hints from Pod Resource API. [podresapi 2/3] resmgr,agent: generate topology hints from Pod Resource API. Dec 2, 2024
@klihub klihub force-pushed the devel/hints-from-pod-resource-api branch 2 times, most recently from a91bbf4 to f0622a5 Compare December 3, 2024 09:09
@klihub klihub requested a review from kad December 3, 2024 12:35
@klihub klihub force-pushed the devel/hints-from-pod-resource-api branch from f0622a5 to dc900fa Compare December 3, 2024 15:22
Add configuration for the agent itself. Currently the
only control enables access to Node Resource Topology
Custom Resources and it is on by default. Update the
agent to honor its configuration.

Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Add support for tapping into the kubelet's pod resource
API to query extra information about resources assigned
to pods and containers. Use such a new client to expose
agent functions for listing and querying pod resources,
both synchronously and asynchronously.

Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Try querying Pod Resource API and generate extra topology
hints using container device assignments listed there.

Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Bind-mount kubelet pod-resources directory read-only to plugin
daemonset, to provide access to kubelet pod-resources socket.

Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
@klihub klihub force-pushed the devel/hints-from-pod-resource-api branch from dc900fa to c1a1417 Compare December 3, 2024 15:24
Copy link
Collaborator

@kad kad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

// these constants were obtained from NFD sources, cross-checked against
// https://github.com/kubernetes/kubernetes/blob/release-1.31/test/e2e_node/util.go#L83
defaultSocketPath = "/var/lib/kubelet/pod-resources/kubelet.sock"
timeout = 10 * time.Second
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we reduce timeout in future? it might be too long for our operations.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This timeout is now only used for establishing a client connection. For actually listing all pod resources or querying a single one the caller provides time.Duration timeout (in the agent interface) or a context.Context which can then be a context.WithTimeout to use a timeout for the call.

Maybe that const should be/have been called a connectionTimeout ?

Copy link
Collaborator Author

@klihub klihub Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Asynchronous List()s/Get()s we fire off with a 2 and 1 second timeouts respectively. However (these could also be much longer), since the async. functions return a channel that can be used to wait for and eventually obtain the result, (and therefore) it is possible to programmatically override them easily with a simple select with a channel receive and a time.After receive cases.

@kad kad merged commit b412a05 into containers:main Dec 4, 2024
4 checks passed
@klihub klihub deleted the devel/hints-from-pod-resource-api branch December 4, 2024 14:40
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants