Commit cb6e45e

Merge pull request #841 from cdesiniotis/cherry-picks-for-v0.16.1
Cherry picks for the v0.16.1 release
2 parents d2eea55 + 5505131 commit cb6e45e

15 files changed: +52 -49 lines

CHANGELOG.md (+3)
@@ -1,5 +1,8 @@
 ## Changelog
 
+### v0.16.1
+- Bump nvidia-container-toolkit to v1.16.1 to fix a bug with CDI spec generation for MIG devices
+
 ### v0.16.0
 - Fixed logic of atomic writing of the feature file
 - Replaced `WithDialer` with `WithContextDialer`

README.md (+25 -25)
@@ -39,7 +39,7 @@ The NVIDIA device plugin for Kubernetes is a Daemonset that allows you to automa
 - Run GPU enabled containers in your Kubernetes cluster.
 
 This repository contains NVIDIA's official implementation of the [Kubernetes device plugin](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/).
-As of v0.16.0 this repository also holds the implementation for GPU Feature Discovery labels,
+As of v0.16.1 this repository also holds the implementation for GPU Feature Discovery labels,
 for further information on GPU Feature Discovery see [here](docs/gpu-feature-discovery/README.md).
 
 Please note that:
@@ -123,7 +123,7 @@ Once you have configured the options above on all the GPU nodes in your
 cluster, you can enable GPU support by deploying the following Daemonset:
 
 ```shell
-$ kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.16.0/deployments/static/nvidia-device-plugin.yml
+$ kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.16.1/deployments/static/nvidia-device-plugin.yml
 ```
 
 **Note:** This is a simple static daemonset meant to demonstrate the basic
@@ -558,11 +558,11 @@ $ helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
 $ helm repo update
 ```
 
-Then verify that the latest release (`v0.16.0`) of the plugin is available:
+Then verify that the latest release (`v0.16.1`) of the plugin is available:
 ```
 $ helm search repo nvdp --devel
 NAME CHART VERSION APP VERSION DESCRIPTION
-nvdp/nvidia-device-plugin 0.16.0 0.16.0 A Helm chart for ...
+nvdp/nvidia-device-plugin 0.16.1 0.16.1 A Helm chart for ...
 ```
 
 Once this repo is updated, you can begin installing packages from it to deploy
@@ -573,7 +573,7 @@ The most basic installation command without any options is then:
 helm upgrade -i nvdp nvdp/nvidia-device-plugin \
 --namespace nvidia-device-plugin \
 --create-namespace \
---version 0.16.0
+--version 0.16.1
 ```
 
 **Note:** You only need the to pass the `--devel` flag to `helm search repo`
@@ -582,7 +582,7 @@ version (e.g. `<version>-rc.1`). Full releases will be listed without this.
 
 ### Configuring the device plugin's `helm` chart
 
-The `helm` chart for the latest release of the plugin (`v0.16.0`) includes
+The `helm` chart for the latest release of the plugin (`v0.16.1`) includes
 a number of customizable values.
 
 Prior to `v0.12.0` the most commonly used values were those that had direct
@@ -592,7 +592,7 @@ case of the original values is then to override an option from the `ConfigMap`
 if desired. Both methods are discussed in more detail below.
 
 The full set of values that can be set are found here:
-[here](https://github.com/NVIDIA/k8s-device-plugin/blob/v0.16.0/deployments/helm/nvidia-device-plugin/values.yaml).
+[here](https://github.com/NVIDIA/k8s-device-plugin/blob/v0.16.1/deployments/helm/nvidia-device-plugin/values.yaml).
 
 #### Passing configuration to the plugin via a `ConfigMap`.
 
@@ -631,7 +631,7 @@ EOF
 And deploy the device plugin via helm (pointing it at this config file and giving it a name):
 ```
 $ helm upgrade -i nvdp nvdp/nvidia-device-plugin \
---version=0.16.0 \
+--version=0.16.1 \
 --namespace nvidia-device-plugin \
 --create-namespace \
 --set-file config.map.config=/tmp/dp-example-config0.yaml
@@ -653,7 +653,7 @@ $ kubectl create cm -n nvidia-device-plugin nvidia-plugin-configs \
 ```
 ```
 $ helm upgrade -i nvdp nvdp/nvidia-device-plugin \
---version=0.16.0 \
+--version=0.16.1 \
 --namespace nvidia-device-plugin \
 --create-namespace \
 --set config.name=nvidia-plugin-configs
@@ -681,7 +681,7 @@ EOF
 And redeploy the device plugin via helm (pointing it at both configs with a specified default).
 ```
 $ helm upgrade -i nvdp nvdp/nvidia-device-plugin \
---version=0.16.0 \
+--version=0.16.1 \
 --namespace nvidia-device-plugin \
 --create-namespace \
 --set config.default=config0 \
@@ -700,7 +700,7 @@ $ kubectl create cm -n nvidia-device-plugin nvidia-plugin-configs \
 ```
 ```
 $ helm upgrade -i nvdp nvdp/nvidia-device-plugin \
---version=0.16.0 \
+--version=0.16.1 \
 --namespace nvidia-device-plugin \
 --create-namespace \
 --set config.default=config0 \
@@ -783,7 +783,7 @@ chart values that are commonly overridden are:
 ```
 
 Please take a look in the
-[`values.yaml`](https://github.com/NVIDIA/k8s-device-plugin/blob/v0.16.0/deployments/helm/nvidia-device-plugin/values.yaml)
+[`values.yaml`](https://github.com/NVIDIA/k8s-device-plugin/blob/v0.16.1/deployments/helm/nvidia-device-plugin/values.yaml)
 file to see the full set of overridable parameters for the device plugin.
 
 Examples of setting these options include:
@@ -792,7 +792,7 @@ Enabling compatibility with the `CPUManager` and running with a request for
 100ms of CPU time and a limit of 512MB of memory.
 ```shell
 $ helm upgrade -i nvdp nvdp/nvidia-device-plugin \
---version=0.16.0 \
+--version=0.16.1 \
 --namespace nvidia-device-plugin \
 --create-namespace \
 --set compatWithCPUManager=true \
@@ -803,7 +803,7 @@ $ helm upgrade -i nvdp nvdp/nvidia-device-plugin \
 Enabling compatibility with the `CPUManager` and the `mixed` `migStrategy`
 ```shell
 $ helm upgrade -i nvdp nvdp/nvidia-device-plugin \
---version=0.16.0 \
+--version=0.16.1 \
 --namespace nvidia-device-plugin \
 --create-namespace \
 --set compatWithCPUManager=true \
@@ -822,7 +822,7 @@ Discovery to perform this labeling.
 To enable it, simply set `gfd.enabled=true` during helm install.
 ```
 helm upgrade -i nvdp nvdp/nvidia-device-plugin \
---version=0.16.0 \
+--version=0.16.1 \
 --namespace nvidia-device-plugin \
 --create-namespace \
 --set gfd.enabled=true
@@ -867,7 +867,7 @@ nvidia.com/gpu.product = A100-SXM4-40GB-MIG-1g.5gb-SHARED
 
 #### Deploying gpu-feature-discovery in standalone mode
 
-As of v0.16.0, the device plugin's helm chart has integrated support to deploy
+As of v0.16.1, the device plugin's helm chart has integrated support to deploy
 [`gpu-feature-discovery`](https://gitlab.com/nvidia/kubernetes/gpu-feature-discovery/-/tree/main)
 
 When gpu-feature-discovery in deploying standalone, begin by setting up the
@@ -878,13 +878,13 @@ $ helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
 $ helm repo update
 ```
 
-Then verify that the latest release (`v0.16.0`) of the plugin is available
+Then verify that the latest release (`v0.16.1`) of the plugin is available
 (Note that this includes the GFD chart):
 
 ```shell
 $ helm search repo nvdp --devel
 NAME CHART VERSION APP VERSION DESCRIPTION
-nvdp/nvidia-device-plugin 0.16.0 0.16.0 A Helm chart for ...
+nvdp/nvidia-device-plugin 0.16.1 0.16.1 A Helm chart for ...
 ```
 
 Once this repo is updated, you can begin installing packages from it to deploy
@@ -894,7 +894,7 @@ The most basic installation command without any options is then:
 
 ```
 $ helm upgrade -i nvdp nvdp/nvidia-device-plugin \
---version 0.16.0 \
+--version 0.16.1 \
 --namespace gpu-feature-discovery \
 --create-namespace \
 --set devicePlugin.enabled=false
@@ -905,7 +905,7 @@ the default namespace.
 
 ```shell
 $ helm upgrade -i nvdp nvdp/nvidia-device-plugin \
---version=0.16.0 \
+--version=0.16.1 \
 --set allowDefaultNamespace=true \
 --set nfd.enabled=false \
 --set migStrategy=mixed \
@@ -928,31 +928,31 @@ Using the default values for the flags:
 $ helm upgrade -i nvdp \
 --namespace nvidia-device-plugin \
 --create-namespace \
-https://nvidia.github.io/k8s-device-plugin/stable/nvidia-device-plugin-0.16.0.tgz
+https://nvidia.github.io/k8s-device-plugin/stable/nvidia-device-plugin-0.16.1.tgz
 ```
 
 ## Building and Running Locally
 
 The next sections are focused on building the device plugin locally and running it.
 It is intended purely for development and testing, and not required by most users.
-It assumes you are pinning to the latest release tag (i.e. `v0.16.0`), but can
+It assumes you are pinning to the latest release tag (i.e. `v0.16.1`), but can
 easily be modified to work with any available tag or branch.
 
 ### With Docker
 
 #### Build
 Option 1, pull the prebuilt image from [Docker Hub](https://hub.docker.com/r/nvidia/k8s-device-plugin):
 ```shell
-$ docker pull nvcr.io/nvidia/k8s-device-plugin:v0.16.0
-$ docker tag nvcr.io/nvidia/k8s-device-plugin:v0.16.0 nvcr.io/nvidia/k8s-device-plugin:devel
+$ docker pull nvcr.io/nvidia/k8s-device-plugin:v0.16.1
+$ docker tag nvcr.io/nvidia/k8s-device-plugin:v0.16.1 nvcr.io/nvidia/k8s-device-plugin:devel
 ```
 
 Option 2, build without cloning the repository:
 ```shell
 $ docker build \
 -t nvcr.io/nvidia/k8s-device-plugin:devel \
 -f deployments/container/Dockerfile.ubuntu \
-https://github.com/NVIDIA/k8s-device-plugin.git#v0.16.0
+https://github.com/NVIDIA/k8s-device-plugin.git#v0.16.1
 ```
 
 Option 3, if you want to modify the code:
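
The README hunks above change the expected `helm search repo nvdp --devel` output so that it lists chart and app version 0.16.1. A minimal Go sketch of checking that output programmatically, for example in a post-release smoke test, follows. `parseSearchOutput` and `hasRelease` are hypothetical helpers; the sketch assumes the NAME, CHART VERSION and APP VERSION columns contain no spaces, and that chart and app version are equal, as they are in this release's Chart.yaml.

```go
package main

import (
	"fmt"
	"strings"
)

// chartRow holds the first three columns of one `helm search repo` result row.
type chartRow struct {
	Name, ChartVersion, AppVersion string
}

// parseSearchOutput splits `helm search repo` output on whitespace and skips
// the NAME header. The DESCRIPTION column is ignored.
func parseSearchOutput(out string) []chartRow {
	var rows []chartRow
	for _, line := range strings.Split(out, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 3 || fields[0] == "NAME" {
			continue
		}
		rows = append(rows, chartRow{fields[0], fields[1], fields[2]})
	}
	return rows
}

// hasRelease reports whether chart name is listed with the given version as
// both its chart version and its app version.
func hasRelease(rows []chartRow, name, version string) bool {
	for _, r := range rows {
		if r.Name == name && r.ChartVersion == version && r.AppVersion == version {
			return true
		}
	}
	return false
}

func main() {
	// Sample output as quoted in the README.
	out := `NAME CHART VERSION APP VERSION DESCRIPTION
nvdp/nvidia-device-plugin 0.16.1 0.16.1 A Helm chart for ...`
	rows := parseSearchOutput(out)
	fmt.Println(hasRelease(rows, "nvdp/nvidia-device-plugin", "0.16.1")) // true
}
```

`strings.Fields` treats tabs and runs of spaces alike, so the same parser should work on helm's column-padded output.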

deployments/helm/nvidia-device-plugin/Chart.yaml (+2 -2)
@@ -2,8 +2,8 @@ apiVersion: v2
 name: nvidia-device-plugin
 type: application
 description: A Helm chart for the nvidia-device-plugin on Kubernetes
-version: "0.16.0"
-appVersion: "0.16.0"
+version: "0.16.1"
+appVersion: "0.16.1"
 kubeVersion: ">= 1.10.0-0"
 home: https://github.com/NVIDIA/k8s-device-plugin
 
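
Chart.yaml keeps `version` and `appVersion` in lockstep, and the static manifests in this commit pin the image tag as `v` followed by that same version. A hypothetical consistency check could look like the sketch below. It is line-based, not a YAML parser, and only handles top-level `key: "value"` scalars; the `v`-prefixed tag convention is inferred from this diff rather than read from the chart templates.

```go
package main

import (
	"fmt"
	"strings"
)

// chartField returns the value of a top-level `key: "value"` line in Chart.yaml.
// Only simple scalars are handled; quotes are stripped.
func chartField(chart, key string) (string, bool) {
	prefix := key + ":"
	for _, line := range strings.Split(chart, "\n") {
		if strings.HasPrefix(line, prefix) {
			v := strings.TrimSpace(strings.TrimPrefix(line, prefix))
			return strings.Trim(v, `"`), true
		}
	}
	return "", false
}

// expectedImage builds the image reference the static manifests in this
// commit use for a given appVersion (assumed convention: tag = "v" + appVersion).
func expectedImage(appVersion string) string {
	return "nvcr.io/nvidia/k8s-device-plugin:v" + appVersion
}

func main() {
	chart := `apiVersion: v2
name: nvidia-device-plugin
version: "0.16.1"
appVersion: "0.16.1"`
	version, _ := chartField(chart, "version")
	appVersion, _ := chartField(chart, "appVersion")
	// Prints: true nvcr.io/nvidia/k8s-device-plugin:v0.16.1
	fmt.Println(version == appVersion, expectedImage(appVersion))
}
```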

deployments/static/gpu-feature-discovery-daemonset-with-mig-mixed.yaml (+2 -2)
@@ -15,11 +15,11 @@ spec:
 metadata:
 labels:
 app.kubernetes.io/name: gpu-feature-discovery
-app.kubernetes.io/version: 0.16.0
+app.kubernetes.io/version: 0.16.1
 app.kubernetes.io/part-of: nvidia-gpu
 spec:
 containers:
-- image: nvcr.io/nvidia/k8s-device-plugin:v0.16.0
+- image: nvcr.io/nvidia/k8s-device-plugin:v0.16.1
 name: gpu-feature-discovery
 command: ["/usr/bin/gpu-feature-discovery"]
 volumeMounts:

deployments/static/gpu-feature-discovery-daemonset-with-mig-single.yaml (+3 -3)
@@ -4,7 +4,7 @@ metadata:
 name: gpu-feature-discovery
 labels:
 app.kubernetes.io/name: gpu-feature-discovery
-app.kubernetes.io/version: 0.16.0
+app.kubernetes.io/version: 0.16.1
 app.kubernetes.io/part-of: nvidia-gpu
 spec:
 selector:
@@ -15,11 +15,11 @@ spec:
 metadata:
 labels:
 app.kubernetes.io/name: gpu-feature-discovery
-app.kubernetes.io/version: 0.16.0
+app.kubernetes.io/version: 0.16.1
 app.kubernetes.io/part-of: nvidia-gpu
 spec:
 containers:
-- image: nvcr.io/nvidia/k8s-device-plugin:v0.16.0
+- image: nvcr.io/nvidia/k8s-device-plugin:v0.16.1
 name: gpu-feature-discovery
 command: ["/usr/bin/gpu-feature-discovery"]
 volumeMounts:

deployments/static/gpu-feature-discovery-daemonset.yaml (+3 -3)
@@ -4,7 +4,7 @@ metadata:
 name: gpu-feature-discovery
 labels:
 app.kubernetes.io/name: gpu-feature-discovery
-app.kubernetes.io/version: 0.16.0
+app.kubernetes.io/version: 0.16.1
 app.kubernetes.io/part-of: nvidia-gpu
 spec:
 selector:
@@ -15,11 +15,11 @@ spec:
 metadata:
 labels:
 app.kubernetes.io/name: gpu-feature-discovery
-app.kubernetes.io/version: 0.16.0
+app.kubernetes.io/version: 0.16.1
 app.kubernetes.io/part-of: nvidia-gpu
 spec:
 containers:
-- image: nvcr.io/nvidia/k8s-device-plugin:v0.16.0
+- image: nvcr.io/nvidia/k8s-device-plugin:v0.16.1
 name: gpu-feature-discovery
 command: ["/usr/bin/gpu-feature-discovery"]
 volumeMounts:
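
Each GFD manifest changes two values together: the `app.kubernetes.io/version` label (`0.16.1`) and the image tag (`v0.16.1`). A small hypothetical check for that pairing follows, assuming, as in this diff, that the tag is always the label with a single `v` prefix. `labelMatchesImage` is an illustrative name, not a helper from this repository.

```go
package main

import (
	"fmt"
	"strings"
)

// labelMatchesImage reports whether an app.kubernetes.io/version label value
// equals the image tag with one leading "v" removed.
func labelMatchesImage(label, image string) bool {
	// The tag follows the last ':' only if that ':' comes after the last '/';
	// otherwise the ':' belongs to a registry port and there is no tag.
	i := strings.LastIndex(image, ":")
	if i < 0 || i < strings.LastIndex(image, "/") {
		return false
	}
	return strings.TrimPrefix(image[i+1:], "v") == label
}

func main() {
	image := "nvcr.io/nvidia/k8s-device-plugin:v0.16.1"
	fmt.Println(labelMatchesImage("0.16.1", image)) // true
	fmt.Println(labelMatchesImage("0.16.0", image)) // false
}
```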

deployments/static/gpu-feature-discovery-job.yaml.template (+3 -3)
@@ -4,19 +4,19 @@ metadata:
 name: gpu-feature-discovery
 labels:
 app.kubernetes.io/name: gpu-feature-discovery
-app.kubernetes.io/version: 0.16.0
+app.kubernetes.io/version: 0.16.1
 app.kubernetes.io/part-of: nvidia-gpu
 spec:
 template:
 metadata:
 labels:
 app.kubernetes.io/name: gpu-feature-discovery
-app.kubernetes.io/version: 0.16.0
+app.kubernetes.io/version: 0.16.1
 app.kubernetes.io/part-of: nvidia-gpu
 spec:
 nodeName: NODE_NAME
 containers:
-- image: nvcr.io/nvidia/k8s-device-plugin:v0.16.0
+- image: nvcr.io/nvidia/k8s-device-plugin:v0.16.1
 name: gpu-feature-discovery
 command: ["/usr/bin/gpu-feature-discovery"]
 args:
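
The job template pins its pod to one node through a literal `NODE_NAME` placeholder, so the file has to be rendered before it can be applied. The rendering step itself is not part of this diff. The sketch below assumes plain textual substitution, the role a `sed` one-liner would usually play; `renderJob` and the node name `gpu-node-1` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// renderJob substitutes every NODE_NAME placeholder in the job template.
// Assumption: NODE_NAME is the only placeholder and is replaced verbatim.
func renderJob(template, nodeName string) (string, error) {
	if strings.TrimSpace(nodeName) == "" {
		return "", errors.New("node name must not be empty")
	}
	if !strings.Contains(template, "NODE_NAME") {
		return "", errors.New("template has no NODE_NAME placeholder")
	}
	return strings.ReplaceAll(template, "NODE_NAME", nodeName), nil
}

func main() {
	// Excerpt of the template; indentation is illustrative.
	tmpl := `    spec:
      nodeName: NODE_NAME
      containers:
      - image: nvcr.io/nvidia/k8s-device-plugin:v0.16.1`
	out, err := renderJob(tmpl, "gpu-node-1") // hypothetical node name
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Failing on a missing placeholder catches the case where an already-rendered file is passed in by mistake.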

deployments/static/nvidia-device-plugin-compat-with-cpumanager.yml (+1 -1)
@@ -38,7 +38,7 @@ spec:
 # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
 priorityClassName: "system-node-critical"
 containers:
-- image: nvcr.io/nvidia/k8s-device-plugin:v0.16.0
+- image: nvcr.io/nvidia/k8s-device-plugin:v0.16.1
 name: nvidia-device-plugin-ctr
 env:
 - name: FAIL_ON_INIT_ERROR

deployments/static/nvidia-device-plugin-privileged-with-service-account.yml (+1 -1)
@@ -124,7 +124,7 @@ spec:
 - env:
 - name: PASS_DEVICE_SPECS
 value: "true"
-image: nvcr.io/nvidia/k8s-device-plugin:v0.15.0
+image: nvcr.io/nvidia/k8s-device-plugin:v0.16.1
 name: nvidia-device-plugin-ctr
 securityContext:
 privileged: true

deployments/static/nvidia-device-plugin.yml (+1 -1)
@@ -38,7 +38,7 @@ spec:
 # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
 priorityClassName: "system-node-critical"
 containers:
-- image: nvcr.io/nvidia/k8s-device-plugin:v0.16.0
+- image: nvcr.io/nvidia/k8s-device-plugin:v0.16.1
 name: nvidia-device-plugin-ctr
 env:
 - name: FAIL_ON_INIT_ERROR

go.mod (+1 -1)
@@ -6,7 +6,7 @@ require (
 github.com/NVIDIA/go-gpuallocator v0.5.0
 github.com/NVIDIA/go-nvlib v0.6.0
 github.com/NVIDIA/go-nvml v0.12.4-0
-github.com/NVIDIA/nvidia-container-toolkit v1.16.0
+github.com/NVIDIA/nvidia-container-toolkit v1.16.1
 github.com/fsnotify/fsnotify v1.7.0
 github.com/google/renameio v1.0.1
 github.com/google/uuid v1.6.0

go.sum (+2 -2)
@@ -27,8 +27,8 @@ github.com/NVIDIA/go-nvlib v0.6.0 h1:zAMBzCYT9xeyRQo0tb7HJbStkzajD6e5joyaQqJ2OGU
 github.com/NVIDIA/go-nvlib v0.6.0/go.mod h1:9UrsLGx/q1OrENygXjOuM5Ey5KCtiZhbvBlbUIxtGWY=
 github.com/NVIDIA/go-nvml v0.12.4-0 h1:4tkbB3pT1O77JGr0gQ6uD8FrsUPqP1A/EOEm2wI1TUg=
 github.com/NVIDIA/go-nvml v0.12.4-0/go.mod h1:8Llmj+1Rr+9VGGwZuRer5N/aCjxGuR5nPb/9ebBiIEQ=
-github.com/NVIDIA/nvidia-container-toolkit v1.16.0 h1:NZyKfW0s8nfghoBSJJUth7OZB5ZzRGYbn3RaiTDYdHM=
-github.com/NVIDIA/nvidia-container-toolkit v1.16.0/go.mod h1:jJXYvHEdqqpDcRXvolaiFCBsgLxvCwmJWSBZM3zQPY8=
+github.com/NVIDIA/nvidia-container-toolkit v1.16.1 h1:PkY6RqYD1wIt1izCvYZ7kr7IitxK8e9+k/prO6b3vD0=
+github.com/NVIDIA/nvidia-container-toolkit v1.16.1/go.mod h1:jJXYvHEdqqpDcRXvolaiFCBsgLxvCwmJWSBZM3zQPY8=
 github.com/Shopify/logrus-bugsnag v0.0.0-20171204204709-577dee27f20d h1:UrqY+r/OJnIp5u0s1SbQ8dVfLCZJsnvazdBP5hS4iRs=
 github.com/Shopify/logrus-bugsnag v0.0.0-20171204204709-577dee27f20d/go.mod h1:HI8ITrYtUY+O+ZhtlqUnD8+KwNPOyugEhfP9fdUIaEQ=
 github.com/alecthomas/template v0.0.0-20160405071501-a0175ee3bccc/go.mod h1:LOuyumcjzFXgccqObfd/Ljyb9UuFJ6TxHnclSeseNhc=
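
Each module version contributes two go.sum lines: an `h1:` hash of the module's file tree and a `/go.mod` line hashing only its go.mod file. In this diff the `/go.mod` hash for nvidia-container-toolkit is identical for v1.16.0 and v1.16.1 while the tree hash changes, meaning the toolkit's own go.mod (and so its requirement list) did not change between the two tags. A minimal parsing sketch that makes this comparison follows; `parseGoSum` and `hashFor` are hypothetical helpers with no validation of the hash format.

```go
package main

import (
	"fmt"
	"strings"
)

// sumEntry is one go.sum line: module, version, whether the hash covers only
// the module's go.mod file, and the hash (including its "h1:" prefix).
type sumEntry struct {
	Module, Version string
	GoModOnly       bool
	Hash            string
}

// parseGoSum splits go.sum text into entries, skipping blank or malformed lines.
func parseGoSum(sum string) []sumEntry {
	var entries []sumEntry
	for _, line := range strings.Split(sum, "\n") {
		f := strings.Fields(line)
		if len(f) != 3 {
			continue
		}
		goModOnly := strings.HasSuffix(f[1], "/go.mod")
		entries = append(entries, sumEntry{f[0], strings.TrimSuffix(f[1], "/go.mod"), goModOnly, f[2]})
	}
	return entries
}

// hashFor looks up the recorded hash for module@version (tree or go.mod).
func hashFor(entries []sumEntry, module, version string, goModOnly bool) (string, bool) {
	for _, e := range entries {
		if e.Module == module && e.Version == version && e.GoModOnly == goModOnly {
			return e.Hash, true
		}
	}
	return "", false
}

func main() {
	const m = "github.com/NVIDIA/nvidia-container-toolkit"
	sum := m + " v1.16.0/go.mod h1:jJXYvHEdqqpDcRXvolaiFCBsgLxvCwmJWSBZM3zQPY8=\n" +
		m + " v1.16.1 h1:PkY6RqYD1wIt1izCvYZ7kr7IitxK8e9+k/prO6b3vD0=\n" +
		m + " v1.16.1/go.mod h1:jJXYvHEdqqpDcRXvolaiFCBsgLxvCwmJWSBZM3zQPY8="
	entries := parseGoSum(sum)
	oldMod, _ := hashFor(entries, m, "v1.16.0", true)
	newMod, _ := hashFor(entries, m, "v1.16.1", true)
	fmt.Println(oldMod == newMod) // true: the go.mod file itself did not change
}
```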

vendor/github.com/NVIDIA/nvidia-container-toolkit/internal/platform-support/dgpu/nvml.go (+3 -3)
(Generated file; diff not rendered.)

vendor/modules.txt (+1 -1)
@@ -43,7 +43,7 @@ github.com/NVIDIA/go-nvlib/pkg/pciids
 ## explicit; go 1.20
 github.com/NVIDIA/go-nvml/pkg/dl
 github.com/NVIDIA/go-nvml/pkg/nvml
-# github.com/NVIDIA/nvidia-container-toolkit v1.16.0
+# github.com/NVIDIA/nvidia-container-toolkit v1.16.1
 ## explicit; go 1.20
 github.com/NVIDIA/nvidia-container-toolkit/internal/config/image
 github.com/NVIDIA/nvidia-container-toolkit/internal/discover

versions.mk (+1 -1)
@@ -17,7 +17,7 @@ MODULE := github.com/NVIDIA/$(DRIVER_NAME)
 
 REGISTRY ?= nvcr.io/nvidia
 
-VERSION ?= v0.16.0
+VERSION ?= v0.16.1
 
 # vVERSION represents the version with a guaranteed v-prefix
 vVERSION := v$(VERSION:v%=%)
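
Because `VERSION ?= v0.16.1` only assigns a default, the value can be overridden from the environment or the make command line, and `vVERSION` normalizes whatever is passed in: the substitution reference `$(VERSION:v%=%)` (equivalent to `$(patsubst v%,%,$(VERSION))`) strips one leading `v`, and the literal `v` in front re-adds exactly one. A Go sketch of the same normalization for a single-word VERSION follows; `vVersion` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// vVersion mirrors `vVERSION := v$(VERSION:v%=%)`: drop one leading "v"
// (the v%=% pattern substitution), then prepend exactly one "v".
func vVersion(version string) string {
	return "v" + strings.TrimPrefix(version, "v")
}

func main() {
	fmt.Println(vVersion("v0.16.1"), vVersion("0.16.1")) // v0.16.1 v0.16.1
}
```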
