KEP-3953: Node Resource Hot Plug #3955

Open: wants to merge 19 commits into base: master

Conversation

@Karthik-K-N Karthik-K-N commented Apr 17, 2023

  • One-line PR description: Node Resource Hot Plug
  • Other comments:

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes, kind/kep, and sig/node labels Apr 17, 2023
@k8s-ci-robot k8s-ci-robot added the size/XL label Apr 17, 2023
@Karthik-K-N Karthik-K-N mentioned this pull request Apr 17, 2023
4 tasks
@Karthik-K-N Karthik-K-N changed the title from "Dynamic node resize" to "KEP-3953: Dynamic node resize" Apr 17, 2023
@bart0sh
Contributor

bart0sh commented Apr 17, 2023

/assign @mrunalp @SergeyKanzhelev @klueska

@kad
Member

kad commented Apr 28, 2023

/cc

@ffromani
Contributor

/cc

@k8s-ci-robot k8s-ci-robot requested a review from ffromani May 18, 2023 07:29
@k8s-ci-robot k8s-ci-robot added the cncf-cla: no label and removed the cncf-cla: yes label May 22, 2023
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes label and removed the cncf-cla: no label May 23, 2023
@fmuyassarov
Member

/cc

@Karthik-K-N
Author

> Sure, as long as that part isn't changed, I believe it's OK from the sig-scheduling side.

Yes, I will reach out if there are any changes in the future. Thank you for taking a look.

@Karthik-K-N
Author

> > Based on the conversation on Slack and here, I added a section about compatibility with Cluster Autoscaler: https://github.com/Karthik-K-N/enhancements/tree/node-resize/keps/sig-node/3953-node-resource-hot-plug#compatability-with-cluster-autoscaler.
> > Please take a look. Thanks.
>
> Thank you, the added section accurately captures the problem and possible solutions. Could you add some information on when you want to address this part? When going to beta with the feature?

Thanks for the review. We plan to provide compatibility with the autoscaler by storing the initial node resources in the node object, and we plan to provide this feature during beta graduation, so we can get started now. We are open to community feedback. Thank you.

@towca

towca commented May 30, 2025

> Thanks for the review. We plan to provide compatibility with the autoscaler by storing the initial node resources in the node object, and we plan to provide this feature during beta graduation, so we can get started now. We are open to community feedback. Thank you.

Sounds good to me, thanks!

@elmiko
Contributor

elmiko commented May 30, 2025

Gave the KEP a read. No comments or suggestions currently, but happy to see this work progressing.

* Enable the re-initialization of resource managers (CPU manager, memory manager) and kube runtime manager without reset to accommodate alterations in the node's resource allocation.
* Recalculating and updating swap memory limit for existing pods.

### Non-Goals

Contributor

I don't really see you call out devices. Is this in scope for this feature, i.e., adding GPU devices?

Author

No, it's not in scope. Maybe I can mention that explicitly for more clarity. Thanks.

@zvonkok

zvonkok commented Jun 16, 2025

AFAIU this is only for CPU and memory. What about hot-plug of PCIe devices? How will this work once we have cAdvisorless metrics? #2371

@Karthik-K-N
Author

> AFAIU this is only for CPU and memory. What about hot-plug of PCIe devices? How will this work once we have cAdvisorless metrics? #2371

Yes, correct: PCIe devices are out of scope for this KEP. For cAdvisorless metrics, I would seek help from @haircommander.

@haircommander
Contributor

Node-level metrics aren't covered by #2371, so we should be good there. Regardless of the source, if we're going through the stats manager there should be no difference.

Labels:
  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • kind/kep (Categorizes KEP tracking issues and PRs modifying the KEP directory)
  • sig/node (Categorizes an issue or PR as relevant to SIG Node.)
  • size/XL (Denotes a PR that changes 500-999 lines, ignoring generated files.)