[BUG] Concurrent calls to the create-VM API occasionally fail to allocate a GPU #21508

Open
66545shiwo opened this issue Nov 2, 2024 · 1 comment
Labels: bug, state/awaiting processing

What happened:
When the create-VM API is called concurrently, GPU allocation occasionally fails:

221940 [warning 2024-11-02 05:56:50 predicates.(*PredicateHelper).GetResult(predicates.go:89)]Filter Result: candidate: "0a70d90d-f1d5-4dc5-8aaa-0306d88936f9", filter: "host_isolated_device", is_ok: false, reason: "no enough resource: test, requested: 1, total: 8, free: 0, IsolatedDevice count not enough, request: 1, hostTotal: 8, hostFree: 0", error: <nil>

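The host has 8 GPUs in total. The create-VM API was called 8 times concurrently, each VM requesting 1 GPU, and the error above was reported.
After the calls completed, the host actually had 7 GPUs in use and 1 GPU free.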

Environment:
v3.10.15

  • OS (e.g. cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Host: (e.g. dmidecode | egrep -i 'manufacturer|product' |sort -u)
  • Service Version (e.g. kubectl exec -n onecloud $(kubectl get pods -n onecloud | grep climc | awk '{print $1}') -- climc version-list):
66545shiwo added the bug label Nov 2, 2024

66545shiwo commented Nov 4, 2024

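We added a retry strategy to the scheduler's memory and GPU predicates. It is a simple way to work around the concurrent allocation problem for both memory (#21301) and GPUs.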
