@unsc-oni-ancilla
Contributor

This PR contains the following updates:

Package               Update  Change
nvidia-device-plugin  minor   0.17.0 -> 0.18.0

Release Notes

NVIDIA/k8s-device-plugin (nvidia-device-plugin)

v0.18.0

Compare Source

  • Rename getHealthCheckXids and clarify documentation
  • Add support for explicitly enabling XIDs in health checks
  • Deduplicate requested device IDs
  • Check for nil before reading boolean config values
  • Make gated modes (GDS, MOFED, GDRCOPY) optional in CDI
  • Add support for setting gdrcopyEnabled
  • Ignore errors getting device memory using NVML
  • Ensure that directory volumes have Directory type
  • Switch to plain golang image for builds
  • Remove unneeded intermediate container
  • Update CI definitions
  • Switch to distroless golang image
  • Update README.md with RuntimeClass
  • Pass a single context throughout the device-plugin method call stack (#1284)
  • Remove internal logger in favour of klog (#1277)
  • Remove FAIL_ON_INIT_ERROR from static examples
  • Detect blackwell architecture
  • Updated .release:staging to stage device-plugin images in nvstaging
  • Use MiB instead of MB for gpu-memory
  • Ignore XID error 109
  • Update README.md adjust set docker runtime default
  • Remove nvidia.com/gpu.imex-domain label
  • Fix containerd runc config error when creating a kind cluster
  • Use stable nvidia-container-toolkit repo when creating a kind cluster
  • Switch to context package in go stdlib
  • Raise a warning instead of an error if GPU mode labeler fails
  • Add ada-lovelace architecture label for compute capability 8.9
  • Ensure FAIL_ON_INIT_ERROR boolean env is quoted
  • Honor fail-on-init-error when no resources are found
  • Enable hostPID in the mps-control-daemon pod (#1045)
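
As a rough illustration of the "Deduplicate requested device IDs" item above, the following sketch removes repeated IDs while keeping first-seen order. It is not the plugin's Go implementation; the function name and the sample IDs are hypothetical and show only the general technique.

```python
def dedupe_device_ids(requested):
    """Return the IDs in first-seen order with repeats removed."""
    # dict preserves insertion order (Python 3.7+), so its keys
    # behave like an ordered set.
    return list(dict.fromkeys(requested))


# Hypothetical device IDs, used only for illustration.
ids = ["GPU-aaa", "GPU-bbb", "GPU-aaa", "GPU-ccc", "GPU-bbb"]
unique_ids = dedupe_device_ids(ids)
```

Here `unique_ids` keeps one entry per device, so a request that names the same device twice is treated as a request for that device once.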

v0.17.4

Compare Source

What's Changed

Full Changelog: NVIDIA/k8s-device-plugin@v0.17.3...v0.17.4

v0.17.3

Compare Source

What's Changed

Full Changelog: NVIDIA/k8s-device-plugin@v0.17.2...v0.17.3

v0.17.2

Compare Source

What's Changed

  • Update nvidia.com/gpu.product label to include blackwell architectures
  • Update documentation to indicate that nvidia.com/gpu.memory label is in MiB instead of MB
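
The MiB/MB distinction matters to anything that parses the nvidia.com/gpu.memory label. A minimal sketch of the arithmetic, assuming a hypothetical label value of 24576 (a 24 GiB GPU); the helper name is illustrative and not part of the plugin:

```python
MIB = 1024 * 1024   # bytes in one mebibyte
MB = 1000 * 1000    # bytes in one megabyte


def mib_to_mb(mib):
    """Convert a value in MiB to decimal megabytes."""
    return mib * MIB / MB


# Hypothetical label value for a GPU reporting 24576 MiB (24 GiB).
label_mib = 24576
label_mb = mib_to_mb(label_mib)
```

The same memory expressed in MB is roughly 4.9% larger numerically (about 25770 here), so tooling that treats the label as MB would understate the available memory.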

Full Changelog: NVIDIA/k8s-device-plugin@v0.17.1...v0.17.2

v0.17.1

Compare Source

  • Ensure that generated CDI specs do not contain enable-cuda-compat hooks
  • Remove nvidia.com/gpu.imex-domain label
  • Ignore XID error 109
  • Add ada-lovelace architecture label for compute capability 8.9
  • Ensure FAIL_ON_INIT_ERROR boolean env is quoted
  • Honor fail-on-init-error when no resources are found

Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR has been generated by Renovate Bot.

@unsc-oni-ancilla (bot) added the renovate/helm, type/minor, and area/kubernetes (Changes made in the kubernetes directory) labels on Dec 30, 2025
@unsc-oni-ancilla
Contributor Author

--- kubernetes/apps/kube-system/nvidia-device-plugin/app Kustomization: kube-system/nvidia-device-plugin HelmRelease: kube-system/nvidia-device-plugin
+++ kubernetes/apps/kube-system/nvidia-device-plugin/app Kustomization: kube-system/nvidia-device-plugin HelmRelease: kube-system/nvidia-device-plugin
@@ -15,13 +15,13 @@
     spec:
       chart: nvidia-device-plugin
       sourceRef:
         kind: HelmRepository
         name: nvidia-device-plugin
         namespace: kube-system
-      version: 0.17.0
+      version: 0.18.0
   install:
     crds: CreateReplace
     remediation:
       retries: 3
     strategy:
       name: RetryOnFailure

@unsc-oni-ancilla
Contributor Author

--- HelmRelease: kube-system/nvidia-device-plugin DaemonSet: kube-system/nvidia-device-plugin
+++ HelmRelease: kube-system/nvidia-device-plugin DaemonSet: kube-system/nvidia-device-plugin
@@ -41,24 +41,25 @@
         securityContext:
           allowPrivilegeEscalation: false
           capabilities:
             drop:
             - ALL
         volumeMounts:
-        - name: device-plugin
+        - name: kubelet-device-plugins-dir
           mountPath: /var/lib/kubelet/device-plugins
         - name: mps-shm
           mountPath: /dev/shm
         - name: mps-root
           mountPath: /mps
         - name: cdi-root
           mountPath: /var/run/cdi
       volumes:
-      - name: device-plugin
+      - name: kubelet-device-plugins-dir
         hostPath:
           path: /var/lib/kubelet/device-plugins
+          type: Directory
       - name: mps-root
         hostPath:
           path: /run/nvidia/mps
           type: DirectoryOrCreate
       - name: mps-shm
         hostPath:
--- HelmRelease: kube-system/nvidia-device-plugin DaemonSet: kube-system/nvidia-device-plugin-mps-control-daemon
+++ HelmRelease: kube-system/nvidia-device-plugin DaemonSet: kube-system/nvidia-device-plugin-mps-control-daemon
@@ -22,12 +22,13 @@
         app.kubernetes.io/instance: nvidia-device-plugin
       annotations: {}
     spec:
       priorityClassName: system-node-critical
       runtimeClassName: nvidia
       securityContext: {}
+      hostPID: true
       initContainers:
       - image: nvcr.io/nvidia/k8s-device-plugin:v0.18.1
         name: mps-control-daemon-mounts
         command:
         - mps-control-daemon
         - mount-shm
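
One behavioural consequence of the DaemonSet diff is the new `type: Directory` on the /var/lib/kubelet/device-plugins hostPath volume. In Kubernetes, that type requires the directory to already exist on the node, whereas `DirectoryOrCreate` (used for mps-root) creates it when missing. The sketch below approximates that check so the difference is easy to see. It is a simplified model rather than kubelet code, and the function name and temporary paths are hypothetical.

```python
import os
import tempfile


def hostpath_would_mount(path, volume_type):
    """Approximate the pre-mount check for two hostPath types.

    Only "Directory" and "DirectoryOrCreate" are modelled here.
    """
    if volume_type == "Directory":
        # The directory must already exist on the node.
        return os.path.isdir(path)
    if volume_type == "DirectoryOrCreate":
        # A missing path would be created by the kubelet; this sketch
        # only reports that the mount could proceed. An existing path
        # that is not a directory is rejected.
        return os.path.isdir(path) or not os.path.exists(path)
    raise ValueError(f"unmodelled hostPath type: {volume_type}")


# A temporary directory stands in for the node filesystem.
with tempfile.TemporaryDirectory() as root:
    present = os.path.join(root, "device-plugins")
    os.mkdir(present)
    missing = os.path.join(root, "absent")
    results = {
        "present/Directory": hostpath_would_mount(present, "Directory"),
        "missing/Directory": hostpath_would_mount(missing, "Directory"),
        "missing/DirectoryOrCreate": hostpath_would_mount(
            missing, "DirectoryOrCreate"
        ),
    }
```

On a node running the kubelet the device-plugins directory normally exists already, so the stricter type should be harmless there; the case to watch is a node where that path is absent or relocated.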
