Hami 2.4.0 + AscendDevicePlugin reports UnexpectedAdmissionError #654

Open
ymbZzz opened this issue Nov 28, 2024 · 4 comments · May be fixed by Project-HAMi/ascend-device-plugin#14
Labels
kind/bug Something isn't working

Comments

@ymbZzz

ymbZzz commented Nov 28, 2024

Symptoms

[screenshot: pod events]
UnexpectedAdmissionError 14m kubelet Allocate failed due to rpc error: code = Unknown desc = parse pod annotation error: unknown uuid: , which is unexpected

Possible cause

HAMi injects an annotation into the pod, but the second entry of that annotation carries no UUID information.
[screenshot: pod annotation]

Pod information

The key point is the sidecar.istio.io/inject: "true" annotation.
This annotation dynamically injects a sidecar container into the pod.

@ymbZzz ymbZzz added the kind/bug Something isn't working label Nov 28, 2024
@lengrongfu
Member

/assign

@lengrongfu
Member

@ymbZzz can you provide your pod YAML?

@lengrongfu
Member

Below is a pod with two containers, only one of which requests a HAMi vGPU. Its annotation value is hami.io/vgpu-devices-allocated: GPU-ebe7c3f7-303d-558d-435e-99a160631fe4,NVIDIA,1000,10:;,,0,0:;

kind: Pod
apiVersion: v1
metadata:
  name: gpu-test-6f58db7c7c-nszdp
  generateName: gpu-test-6f58db7c7c-
  namespace: default
  uid: 81794181-4045-4880-86ba-5282d73056d7
  resourceVersion: '3276674'
  creationTimestamp: '2024-12-07T08:24:26Z'
  labels:
    app: gpu-test
    pod-template-hash: 6f58db7c7c
  annotations:
    cni.projectcalico.org/containerID: d95faeb288f6134b4cc65a3baeac1fa4d831b03cc92149bc6a641bc15d5d2537
    cni.projectcalico.org/podIP: 10.233.74.96/32
    cni.projectcalico.org/podIPs: 10.233.74.96/32
    hami.io/bind-phase: success
    hami.io/bind-time: '1733559866'
    hami.io/vgpu-devices-allocated: GPU-ebe7c3f7-303d-558d-435e-99a160631fe4,NVIDIA,1000,10:;,,0,0:;
    hami.io/vgpu-devices-to-allocate: ;,,0,0:;
    hami.io/vgpu-node: controller-node-1
    hami.io/vgpu-time: '1733559866'
spec:
  volumes:
    - name: kube-api-access-l46cm
      projected:
        sources:
          - serviceAccountToken:
              expirationSeconds: 3607
              path: token
          - configMap:
              name: kube-root-ca.crt
              items:
                - key: ca.crt
                  path: ca.crt
          - downwardAPI:
              items:
                - path: namespace
                  fieldRef:
                    apiVersion: v1
                    fieldPath: metadata.namespace
        defaultMode: 420
  containers:
    - name: container-1
      image: ubuntu:22.04
      command:
        - sleep
        - '1000000'
      env:
        - name: CUDA_TASK_PRIORITY
          value: '1'
      resources:
        limits:
          cpu: 250m
          memory: 512Mi
          nvidia.com/gpucores: '10'
          nvidia.com/gpumem: 1k
          nvidia.com/vgpu: '1'
        requests:
          cpu: 250m
          memory: 512Mi
          nvidia.com/gpucores: '10'
          nvidia.com/gpumem: 1k
          nvidia.com/vgpu: '1'
      volumeMounts:
        - name: kube-api-access-l46cm
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: IfNotPresent
    - name: container-2
      image: ubuntu:22.04
      command:
        - sleep
        - '1000000'
      resources:
        limits:
          cpu: 250m
          memory: 512Mi
        requests:
          cpu: 250m
          memory: 512Mi
      volumeMounts:
        - name: kube-api-access-l46cm
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: IfNotPresent
  restartPolicy: Always
  terminationGracePeriodSeconds: 30
  dnsPolicy: ClusterFirst
  serviceAccountName: default
  serviceAccount: default
  nodeName: controller-node-1
  schedulerName: hami-scheduler

@lengrongfu
Member

@ymbZzz I don't have the environment; can you help me test this PR, Project-HAMi/ascend-device-plugin#14?
