-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
[podresapi 2/3] resmgr,agent: generate topology hints from Pod Resource API. #419
Conversation
5f9f2ea
to
5f1519e
Compare
/cc @pfl |
a91bbf4
to
f0622a5
Compare
f0622a5
to
dc900fa
Compare
Add configuration for the agent itself. Currently the only control enables access to Node Resource Topology Custom Resources and it is on by default. Update the agent to honor its configuration. Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Add support for tapping into the kubelet's pod resource API to query extra information about resources assigned to pods and containers. Use such a new client to expose agent functions for listing and querying pod resources, both synchronously and asynchronously. Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Try querying Pod Resource API and generate extra topology hints using container device assignments listed there. Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Bind-mount kubelet pod-resources directory read-only to plugin daemonset, to provide access to kubelet pod-resources socket. Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
dc900fa
to
c1a1417
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
// these constants were obtained from NFD sources, cross-checked against | ||
// https://github.com/kubernetes/kubernetes/blob/release-1.31/test/e2e_node/util.go#L83 | ||
defaultSocketPath = "/var/lib/kubelet/pod-resources/kubelet.sock" | ||
timeout = 10 * time.Second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we reduce timeout in future? it might be too long for our operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This timeout is now only used for establishing a client connection. For actually listing all pod resources or querying a single one the caller provides time.Duration
timeout (in the agent interface) or a context.Context
which can then be a context.WithTimeout
to use a timeout for the call.
Maybe that const should be/have been called a connectionTimeout
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Asynchronous List()s/Get()s we fire off with a 2 and 1 second timeouts respectively. However (these could also be much longer), since the async. functions return a channel that can be used to wait for and eventually obtain the result, (and therefore) it is possible to programmatically override them easily with a simple select with a channel receive and a time.After receive cases.
This patch series
Pod Resource API queries are triggered asynchronously during pod creation/handling
RunPodSandbox
NRI events. Container creation waits for any pending queries (with a short timeout) then generates extra topology hints if the results contain devices with declared NUMA-locality. The existing device allow-/deny-annotations can be used to explicitly allow or deny topology hints corresponding to Pod Resource API hints.