Add docs for 26.3.0 release #353

Open
a-mccarthy wants to merge 3 commits into NVIDIA:main from a-mccarthy:dev-26.3.0

Conversation

@a-mccarthy
Collaborator

No description provided.

@github-actions

Documentation preview

https://nvidia.github.io/cloud-native-docs/review/pr-353


* Added support for including extra manifests with the Helm chart.

* Added the ``sandboxWorkloads.mode`` field for managing sandbox workloads, with ``kubevirt`` and ``kata`` as valid values.
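The new field would be set through the Helm values; a minimal sketch, assuming the surrounding ``sandboxWorkloads`` keys from the GPU Operator chart (values here are illustrative):

```yaml
# Hypothetical values.yaml excerpt -- the mode field is from the release
# note above; the enabled key is assumed from the existing chart.
sandboxWorkloads:
  enabled: true
  mode: kubevirt   # valid values: "kubevirt" or "kata"
```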
Collaborator Author


need to add more context to this

Collaborator Author


still a WIP


This feature does not support an upgrade from an earlier version of the NVIDIA GPU Operator.
* This feature does not support an upgrade from an earlier version of the NVIDIA GPU Operator.
Contributor


Suggested change
* This feature does not support an upgrade from an earlier version of the NVIDIA GPU Operator.
* This feature does not support seamless upgrade from clusterpolicy managed drivers to nvidiadriver managed drivers. Existing driver pods will be terminated immediately if users switch from clusterpolicy to nvidiadriver CRD. Users are required to either use the default nvidiadriver CRD rendered by helm chart or create and manage their own custom nvidiadriver CRDs.

Comment on lines 38 to 39
You must uninstall an existing installation and then install the Operator again.
Uninstalling the Operator interrupts services and applications that require access to NVIDIA GPUs.
Contributor


Suggested change
You must uninstall an existing installation and then install the Operator again.
Uninstalling the Operator interrupts services and applications that require access to NVIDIA GPUs.


This feature does not support an upgrade from an earlier version of the NVIDIA GPU Operator.
* This feature does not support an upgrade from an earlier version of the NVIDIA GPU Operator.
You must uninstall an existing installation and then install the Operator again.
Contributor


Suggested change
You must uninstall an existing installation and then install the Operator again.


#. Create a file, such as ``nvd-precompiled-some.yaml``, with contents like the following:

.. literalinclude:: ./manifests/input/nvd-precompiled-some.yaml
Contributor


can we also change the driver version specified in these files?

.. code-block:: console

$ kubectl label node <node-name> --overwrite driver.precompiled="true"
$ kubectl label node <node-name> --overwrite driver.version="535"
Contributor


Let's use 580 here.


#. Create a file, such as ``nvd-driver-multiple.yaml``, with contents like the following:

.. literalinclude:: ./manifests/input/nvd-driver-multiple.yaml
Contributor


Same here. Let's use supported versions in this file.


#. Create a file, such as ``nvd-all.yaml``, with contents like the following:

.. literalinclude:: ./manifests/input/nvd-all.yaml
Contributor


Let's use an updated driver version in this file.

.. code-block:: console

$ kubectl patch nvidiadriver/demo-silver --type='json' \
-p='[{"op": "replace", "path": "/spec/version", "value": "525.125.06"}]'
Contributor


Let's use an updated version here.

@a-mccarthy force-pushed the dev-26.3.0 branch 2 times, most recently from 665840c to d2a85a2 on March 16, 2026 at 14:47
a-mccarthy and others added 3 commits March 17, 2026 13:53
Signed-off-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>

Co-authored-by: Rajath Agasthya <rajathagasthya@gmail.com>
Signed-off-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>
Signed-off-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>
Contributor

@cdesiniotis left a comment


Made a first pass. Will review again tomorrow.

Comment on lines +51 to +53
- NVIDIA Kubernetes Device Plugin v0.18.2
- NVIDIA MIG Manager for Kubernetes v0.13.1
- NVIDIA GPU Feature Discovery for Kubernetes v0.18.2
Contributor


Suggested change
- NVIDIA Kubernetes Device Plugin v0.18.2
- NVIDIA MIG Manager for Kubernetes v0.13.1
- NVIDIA GPU Feature Discovery for Kubernetes v0.18.2
- NVIDIA Kubernetes Device Plugin v0.19.0
- NVIDIA MIG Manager for Kubernetes v0.14.0
- NVIDIA GPU Feature Discovery for Kubernetes v0.19.0


* Updated software component versions:

- NVIDIA Driver Manager for Kubernetes v0.9.1
Contributor


Suggested change
- NVIDIA Driver Manager for Kubernetes v0.9.1
- NVIDIA Driver Manager for Kubernetes v0.10.0

Comment on lines +66 to +70
This feature requires CRI-O v1.34.0 or later or containerd v1.7.30, v2.1.x, or v2.2.x.
If you are not using the latest containerd version, check that both CDI and NRI are enabled in the containerd configuration file before deploying GPU Operator.

.. note::
OpenShift clusters do not support the Node Feature API yet.
Contributor


I discussed this with @tariq1890, and the issue encountered on OpenShift is actually not OpenShift-specific -- it can occur with vanilla k8s + cri-o. Because of this, we want to limit support for the NRI plugin to just containerd. We are hoping to remove this limitation in the future.

Suggested change
This feature requires CRI-O v1.34.0 or later or containerd v1.7.30, v2.1.x, or v2.2.x.
If you are not using the latest containerd version, check that both CDI and NRI are enabled in the containerd configuration file before deploying GPU Operator.
.. note::
OpenShift clusters do not support the Node Feature API yet.
This feature requires containerd v1.7.30, v2.1.x, or v2.2.x.
If you are not using the latest containerd version, check that both CDI and NRI are enabled in the containerd configuration file before deploying GPU Operator.
.. note::
Enabling the NRI plugin is not supported with cri-o.

Corresponding content in the cdi.rst page needs to be updated as well.
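For the "check that both CDI and NRI are enabled" step mentioned in the suggestion, a minimal sketch of the relevant containerd config keys (key paths here follow containerd 2.x; containerd 1.7 uses ``[plugins."io.containerd.grpc.v1.cri"]`` for the CDI setting -- verify against your containerd version):

```toml
# Sketch: /etc/containerd/config.toml excerpt (containerd 2.x key paths).
# Enable CDI device injection in the CRI runtime plugin.
[plugins.'io.containerd.cri.v1.runtime']
  enable_cdi = true

# Ensure the NRI plugin is not disabled.
[plugins.'io.containerd.nri.v1.nri']
  disable = false
```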

Comment on lines +78 to +81
* Added full support for the NVIDIA Driver Custom Resource Definition (CRD).
Previously available in Technology Preview, the NVIDIA Driver CRD is now generally available.
Use this feature to configure multiple driver types and versions on different nodes or multiple operating system versions on nodes.
Refer to the :doc:`NVIDIA Driver Custom Resource Definition documentation <gpu-driver-configuration>` for more information.
Contributor


Question -- do we want to call out the limitations regarding migration here?

Use this feature to configure multiple driver types and versions on different nodes or multiple operating system versions on nodes.
Refer to the :doc:`NVIDIA Driver Custom Resource Definition documentation <gpu-driver-configuration>` for more information.
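A minimal NVIDIADriver custom resource might look like the following -- a sketch only, with field names taken from the gpu-operator CRD as I understand it, and the name, version, and node-selector label as placeholders:

```yaml
apiVersion: nvidia.com/v1alpha1
kind: NVIDIADriver
metadata:
  name: demo-driver          # placeholder name
spec:
  driverType: gpu
  repository: nvcr.io/nvidia
  image: driver
  version: "580.65.06"       # placeholder; use a supported driver branch
  nodeSelector:
    driver.version: "580"    # placeholder label selecting target nodes
```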

* Added support for KubeVirt GPU passthrough with Ubuntu 24.04 LTS and the VFIO framework.
Contributor


Suggested change
* Added support for KubeVirt GPU passthrough with Ubuntu 24.04 LTS and the VFIO framework.
* Added support for KubeVirt with GPU passthrough on Ubuntu 24.04 LTS


* Added support for KubeVirt GPU passthrough with Ubuntu 24.04 LTS and the VFIO framework.

* Added support for vGPU precompiled driver container for Azure Linux.
Contributor


@rajathagasthya were we actually planning to call this out in the release notes? My gut says no...

* Added PodSecurityContext support for DaemonSets (`PR #2120 <https://github.com/NVIDIA/gpu-operator/pull/2120>`_).
In ClusterPolicy, set ``spec.daemonsets.podSecurityContext``; in NVIDIADriver, set ``spec.podSecurityContext``.
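The two placements named in the bullet could be sketched as follows -- the ``podSecurityContext`` body is a standard Kubernetes PodSecurityContext, and the ``fsGroup`` value is purely illustrative:

```yaml
# ClusterPolicy: spec.daemonsets.podSecurityContext
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: cluster-policy
spec:
  daemonsets:
    podSecurityContext:
      fsGroup: 1000      # illustrative value
---
# NVIDIADriver: spec.podSecurityContext
apiVersion: nvidia.com/v1alpha1
kind: NVIDIADriver
metadata:
  name: demo-driver      # placeholder name
spec:
  podSecurityContext:
    fsGroup: 1000        # illustrative value
```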

* See `PR #2014 <https://github.com/NVIDIA/gpu-operator/pull/2014>`_ for related changes.
Contributor


This bullet lacks context / feels out of place. Was this meant to be a sub-bullet for another bullet?

Comment on lines +118 to +119
* Improved the NVIDIA Kubernetes Device Plugin to avoid unnecessary GPU unbind/rebind operations during rolling updates of the vfio-manager DaemonSet.
This improves the stability of GPU passthrough workloads (KubeVirt, Kata Containers).
Contributor


This never made it in.

Suggested change
* Improved the NVIDIA Kubernetes Device Plugin to avoid unnecessary GPU unbind/rebind operations during rolling updates of the vfio-manager DaemonSet.
This improves the stability of GPU passthrough workloads (KubeVirt, Kata Containers).

* Improved the NVIDIA Kubernetes Device Plugin to avoid unnecessary GPU unbind/rebind operations during rolling updates of the vfio-manager DaemonSet.
This improves the stability of GPU passthrough workloads (KubeVirt, Kata Containers).
* Improved the Upgrade Controller to decrease unnecessary reconciliation in environments with Node Feature Discovery (NFD) enabled.
* Improved performance (`PR #2113 <https://github.com/NVIDIA/gpu-operator/pull/2113>`_).
Contributor


What perf improvements? We should be more specific.

A suggestion (but someone might have a better suggestion 😄 ):

Suggested change
* Improved performance (`PR #2113 <https://github.com/NVIDIA/gpu-operator/pull/2113>`_).
* Improved performance of the clusterpolicy controller by reducing the number of API calls made (`PR #2113 <https://github.com/NVIDIA/gpu-operator/pull/2113>`_).


* Marked unused field ``defaultRuntime`` as optional in the ClusterPolicy. (`PR #2000 <https://github.com/NVIDIA/gpu-operator/pull/2000>`_)
* The NVIDIA Kata Manager for Kubernetes is now deprecated.
To enable Kata Containers for GPUs, install the upstream kata-deploy Helm chart, which deploys all Kata runtime classes, including the NVIDIA-specific runtime classes.
Contributor


Should we link to the kata procedure (which you are adding in #365) here?
