gpu-operator/release-notes.rst
  * Added support for including extra manifests with the Helm chart.

  * Added the ``sandboxWorkloads.mode`` field to help manage sandbox workloads, with ``kubevirt`` and ``kata`` as valid values.
need to add more context to this
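To help add that context, here is a minimal hedged sketch of how the field might appear in the ClusterPolicy spec or Helm values. Only ``sandboxWorkloads.mode`` and its valid values come from the bullet above; the surrounding fields are assumptions:

```yaml
# Sketch only: `mode` and its values come from the release note above;
# `enabled` is assumed to be the existing companion field.
sandboxWorkloads:
  enabled: true
  mode: kubevirt   # valid values: "kubevirt", "kata"
```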
gpu-operator/gpu-operator-kata.rst
  * This feature does not support an upgrade from an earlier version of the NVIDIA GPU Operator.
Suggested change:
  - * This feature does not support an upgrade from an earlier version of the NVIDIA GPU Operator.
  + * This feature does not support seamless upgrade from clusterpolicy-managed drivers to nvidiadriver-managed drivers. Existing driver pods are terminated immediately if users switch from the clusterpolicy to the nvidiadriver CRD. Users must either use the default nvidiadriver CRD rendered by the Helm chart or create and manage their own custom nvidiadriver CRDs.
    You must uninstall an existing installation and then install the Operator again.
    Uninstalling the Operator interrupts services and applications that require access to NVIDIA GPUs.
Suggested change (remove these lines):
  - You must uninstall an existing installation and then install the Operator again.
  - Uninstalling the Operator interrupts services and applications that require access to NVIDIA GPUs.
#. Create a file, such as ``nvd-precompiled-some.yaml``, with contents like the following:
   .. literalinclude:: ./manifests/input/nvd-precompiled-some.yaml
can we also change the driver version specified in these files?
.. code-block:: console
   $ kubectl label node <node-name> --overwrite driver.precompiled="true"
   $ kubectl label node <node-name> --overwrite driver.version="535"
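For reference, a hedged sketch of what a precompiled-driver manifest such as ``nvd-precompiled-some.yaml`` could look like so that the labels above select it. The field names mirror the NVIDIADriver CRD, but the exact structure and values here are assumptions, not the file's real contents:

```yaml
# Hypothetical sketch, not the actual manifest shipped with the docs.
apiVersion: nvidia.com/v1alpha1
kind: NVIDIADriver
metadata:
  name: demo-precompiled
spec:
  driverType: gpu
  usePrecompiled: true
  version: "535"                 # precompiled images are keyed by driver branch
  nodeSelector:
    driver.precompiled: "true"   # matches the node labels applied above
    driver.version: "535"
```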
#. Create a file, such as ``nvd-driver-multiple.yaml``, with contents like the following:
   .. literalinclude:: ./manifests/input/nvd-driver-multiple.yaml
Same here. Let's use supported versions in this file.
#. Create a file, such as ``nvd-all.yaml``, with contents like the following:
   .. literalinclude:: ./manifests/input/nvd-all.yaml
Let's use an updated driver version in this file.
.. code-block:: console
   $ kubectl patch nvidiadriver/demo-silver --type='json' \
       -p='[{"op": "replace", "path": "/spec/version", "value": "525.125.06"}]'
Let's use an updated version here.
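For context, the JSON patch in the snippet above replaces the ``spec.version`` field of the custom resource, roughly equivalent to editing this fragment. This is a sketch; everything except the resource name and version rests on assumptions about the NVIDIADriver CRD:

```yaml
# Hypothetical fragment showing the field targeted by the JSON patch.
apiVersion: nvidia.com/v1alpha1
kind: NVIDIADriver
metadata:
  name: demo-silver
spec:
  version: "525.125.06"   # the value the patch replaces; swap in a currently supported version
```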
Signed-off-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com> Co-authored-by: Rajath Agasthya <rajathagasthya@gmail.com>
cdesiniotis left a comment
Made a first pass. Will review again tomorrow.
  - NVIDIA Kubernetes Device Plugin v0.18.2
  - NVIDIA MIG Manager for Kubernetes v0.13.1
  - NVIDIA GPU Feature Discovery for Kubernetes v0.18.2
Suggested change:
  - - NVIDIA Kubernetes Device Plugin v0.18.2
  - - NVIDIA MIG Manager for Kubernetes v0.13.1
  - - NVIDIA GPU Feature Discovery for Kubernetes v0.18.2
  + - NVIDIA Kubernetes Device Plugin v0.19.0
  + - NVIDIA MIG Manager for Kubernetes v0.14.0
  + - NVIDIA GPU Feature Discovery for Kubernetes v0.19.0
* Updated software component versions:
  - NVIDIA Driver Manager for Kubernetes v0.9.1
Suggested change:
  - - NVIDIA Driver Manager for Kubernetes v0.9.1
  + - NVIDIA Driver Manager for Kubernetes v0.10.0
  This feature requires CRI-O v1.34.0 or later, or containerd v1.7.30, v2.1.x, or v2.2.x.
  If you are not using the latest containerd version, check that both CDI and NRI are enabled in the containerd configuration file before deploying the GPU Operator.
  .. note::
     OpenShift clusters do not support the Node Feature API yet.
I discussed this with @tariq1890, and the issue encountered on OpenShift is actually not OpenShift-specific -- it can occur with vanilla k8s + cri-o. Because of this, we want to limit support for the NRI plugin to just containerd. We are hoping to remove this limitation in the future.
Suggested change:
  - This feature requires CRI-O v1.34.0 or later or containerd v1.7.30, v2.1.x, or v2.2.x.
  - If you are not using the latest containerd version, check that both CDI and NRI are enabled in the containerd configuration file before deploying GPU Operator.
  - .. note::
  -    OpenShift clusters do not support the Node Feature API yet.
  + This feature requires containerd v1.7.30, v2.1.x, or v2.2.x.
  + If you are not using the latest containerd version, check that both CDI and NRI are enabled in the containerd configuration file before deploying GPU Operator.
  + .. note::
  +    Enabling the NRI plugin is not supported with cri-o.
Corresponding content in the cdi.rst page needs to be updated as well.
* Added full support for the NVIDIA Driver Custom Resource Definition (CRD).
  Previously available in Technology Preview, the NVIDIA Driver CRD is now generally available.
  Use this feature to configure multiple driver types and versions on different nodes or multiple operating system versions on nodes.
  Refer to the :doc:`NVIDIA Driver Custom Resource Definition documentation <gpu-driver-configuration>` for more information.
Question -- do we want to call out the limitations regarding migration here?
  Use this feature to configure multiple driver types and versions on different nodes or multiple operating system versions on nodes.
  Refer to the :doc:`NVIDIA Driver Custom Resource Definition documentation <gpu-driver-configuration>` for more information.
* Added support for KubeVirt GPU passthrough with Ubuntu 24.04 LTS and the VFIO framework.
Suggested change:
  - * Added support for KubeVirt GPU passthrough with Ubuntu 24.04 LTS and the VFIO framework.
  + * Added support for KubeVirt with GPU passthrough on Ubuntu 24.04 LTS
* Added support for KubeVirt GPU passthrough with Ubuntu 24.04 LTS and the VFIO framework.
* Added support for vGPU precompiled driver container for Azure Linux.
@rajathagasthya were we actually planning to call this out in the release notes? My gut says no...
* Added PodSecurityContext support for DaemonSets (`PR #2120 <https://github.com/NVIDIA/gpu-operator/pull/2120>`_).
  In ClusterPolicy, set ``spec.daemonsets.podSecurityContext``; in NVIDIADriver, set ``spec.podSecurityContext``.
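A hedged sketch of the new ClusterPolicy field follows. The ``podSecurityContext`` body is a standard Kubernetes ``PodSecurityContext``; the specific values shown are assumptions for illustration only:

```yaml
# Illustrative values only; any standard PodSecurityContext fields apply.
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: cluster-policy
spec:
  daemonsets:
    podSecurityContext:
      runAsUser: 0
      seccompProfile:
        type: RuntimeDefault
```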
* See `PR #2014 <https://github.com/NVIDIA/gpu-operator/pull/2014>`_ for related changes.
This bullet lacks context / feels out of place. Was this meant to be a sub-bullet for another bullet?
* Improved the NVIDIA Kubernetes Device Plugin to avoid unnecessary GPU unbind/rebind operations during rolling updates of the vfio-manager DaemonSet.
  This improves the stability of GPU passthrough workloads (KubeVirt, Kata Containers).
This never made it in.
Suggested change (remove these lines):
  - * Improved the NVIDIA Kubernetes Device Plugin to avoid unnecessary GPU unbind/rebind operations during rolling updates of the vfio-manager DaemonSet.
  -   This improves the stability of GPU passthrough workloads (KubeVirt, Kata Containers).
* Improved the NVIDIA Kubernetes Device Plugin to avoid unnecessary GPU unbind/rebind operations during rolling updates of the vfio-manager DaemonSet.
  This improves the stability of GPU passthrough workloads (KubeVirt, Kata Containers).
* Improved the Upgrade Controller to decrease unnecessary reconciliation in environments with Node Feature Discovery (NFD) enabled.
* Improved performance (`PR #2113 <https://github.com/NVIDIA/gpu-operator/pull/2113>`_).
What perf improvements? We should be more specific.
A suggestion (but someone might have a better suggestion 😄 ):
Suggested change:
  - * Improved performance (`PR #2113 <https://github.com/NVIDIA/gpu-operator/pull/2113>`_).
  + * Improved performance of the clusterpolicy controller by reducing the number of API calls made (`PR #2113 <https://github.com/NVIDIA/gpu-operator/pull/2113>`_).
* Marked unused field ``defaultRuntime`` as optional in the ClusterPolicy (`PR #2000 <https://github.com/NVIDIA/gpu-operator/pull/2000>`_).
* The NVIDIA Kata Manager for Kubernetes is now deprecated.
  To enable Kata Containers for GPUs, install the upstream kata-deploy Helm chart, which deploys all Kata runtime classes, including the NVIDIA-specific runtime classes.
Should we link to the kata procedure (which you are adding in #365) here?