gpu-operator/release-notes.rst
  * Added support for including extra manifests with the Helm chart.

  * Added the ``sandboxWorkloads.mode`` field to help manage sandbox workloads, with ``kubevirt`` and ``kata`` as valid values.
need to add more context to this
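To help add that context, here is a minimal hedged sketch of how the field might appear in the ClusterPolicy spec or Helm values. Only ``sandboxWorkloads.mode`` and its valid values come from the bullet above; the surrounding fields are assumptions:

```yaml
# Sketch only: `mode` and its values come from the release note above;
# `enabled` is assumed to be the existing companion field.
sandboxWorkloads:
  enabled: true
  mode: kubevirt   # valid values: "kubevirt", "kata"
```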
gpu-operator/gpu-operator-kata.rst
  * This feature does not support an upgrade from an earlier version of the NVIDIA GPU Operator.
Suggested change:
  - * This feature does not support an upgrade from an earlier version of the NVIDIA GPU Operator.
  + * This feature does not support seamless upgrade from clusterpolicy-managed drivers to nvidiadriver-managed drivers. Existing driver pods are terminated immediately if users switch from the clusterpolicy to the nvidiadriver CRD. Users must either use the default nvidiadriver CRD rendered by the Helm chart or create and manage their own custom nvidiadriver CRDs.
    You must uninstall an existing installation and then install the Operator again.
    Uninstalling the Operator interrupts services and applications that require access to NVIDIA GPUs.
Suggested change (remove these lines):
  - You must uninstall an existing installation and then install the Operator again.
  - Uninstalling the Operator interrupts services and applications that require access to NVIDIA GPUs.
#. Create a file, such as ``nvd-precompiled-some.yaml``, with contents like the following:
   .. literalinclude:: ./manifests/input/nvd-precompiled-some.yaml
can we also change the driver version specified in these files?
.. code-block:: console
   $ kubectl label node <node-name> --overwrite driver.precompiled="true"
   $ kubectl label node <node-name> --overwrite driver.version="535"
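For reference, a hedged sketch of what a precompiled-driver manifest such as ``nvd-precompiled-some.yaml`` could look like so that the labels above select it. The field names mirror the NVIDIADriver CRD, but the exact structure and values here are assumptions, not the file's real contents:

```yaml
# Hypothetical sketch, not the actual manifest shipped with the docs.
apiVersion: nvidia.com/v1alpha1
kind: NVIDIADriver
metadata:
  name: demo-precompiled
spec:
  driverType: gpu
  usePrecompiled: true
  version: "535"                 # precompiled images are keyed by driver branch
  nodeSelector:
    driver.precompiled: "true"   # matches the node labels applied above
    driver.version: "535"
```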
#. Create a file, such as ``nvd-driver-multiple.yaml``, with contents like the following:
   .. literalinclude:: ./manifests/input/nvd-driver-multiple.yaml
Same here. Let's use supported versions in this file.
#. Create a file, such as ``nvd-all.yaml``, with contents like the following:
   .. literalinclude:: ./manifests/input/nvd-all.yaml
Let's use an updated driver version in this file.
.. code-block:: console
   $ kubectl patch nvidiadriver/demo-silver --type='json' \
       -p='[{"op": "replace", "path": "/spec/version", "value": "525.125.06"}]'
Let's use an updated version here.
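For context, the JSON patch in the snippet above replaces the ``spec.version`` field of the custom resource, roughly equivalent to editing this fragment. This is a sketch; everything except the resource name and version rests on assumptions about the NVIDIADriver CRD:

```yaml
# Hypothetical fragment showing the field targeted by the JSON patch.
apiVersion: nvidia.com/v1alpha1
kind: NVIDIADriver
metadata:
  name: demo-silver
spec:
  version: "525.125.06"   # the value the patch replaces; swap in a currently supported version
```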
Signed-off-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com> Co-authored-by: Rajath Agasthya <rajathagasthya@gmail.com>
cdesiniotis left a comment
Made a first pass. Will review again tomorrow.
  - NVIDIA Kubernetes Device Plugin v0.18.2
  - NVIDIA MIG Manager for Kubernetes v0.13.1
  - NVIDIA GPU Feature Discovery for Kubernetes v0.18.2
Suggested change:
  - - NVIDIA Kubernetes Device Plugin v0.18.2
  - - NVIDIA MIG Manager for Kubernetes v0.13.1
  - - NVIDIA GPU Feature Discovery for Kubernetes v0.18.2
  + - NVIDIA Kubernetes Device Plugin v0.19.0
  + - NVIDIA MIG Manager for Kubernetes v0.14.0
  + - NVIDIA GPU Feature Discovery for Kubernetes v0.19.0
* Updated software component versions:
  - NVIDIA Driver Manager for Kubernetes v0.9.1
Suggested change:
  - - NVIDIA Driver Manager for Kubernetes v0.9.1
  + - NVIDIA Driver Manager for Kubernetes v0.10.0
  This feature requires CRI-O v1.34.0 or later, or containerd v1.7.30, v2.1.x, or v2.2.x.
  If you are not using the latest containerd version, check that both CDI and NRI are enabled in the containerd configuration file before deploying the GPU Operator.
  .. note::
     OpenShift clusters do not support the Node Feature API yet.
I discussed this with @tariq1890, and the issue encountered on OpenShift is actually not OpenShift-specific -- it can occur with vanilla k8s + cri-o. Because of this, we want to limit support for the NRI plugin to just containerd. We are hoping to remove this limitation in the future.
Suggested change:
  - This feature requires CRI-O v1.34.0 or later or containerd v1.7.30, v2.1.x, or v2.2.x.
  - If you are not using the latest containerd version, check that both CDI and NRI are enabled in the containerd configuration file before deploying GPU Operator.
  - .. note::
  -    OpenShift clusters do not support the Node Feature API yet.
  + This feature requires containerd v1.7.30, v2.1.x, or v2.2.x.
  + If you are not using the latest containerd version, check that both CDI and NRI are enabled in the containerd configuration file before deploying GPU Operator.
  + .. note::
  +    Enabling the NRI plugin is not supported with cri-o.
Corresponding content in the cdi.rst page needs to be updated as well.
* Added full support for the NVIDIA Driver Custom Resource Definition (CRD).
  Previously available in Technology Preview, the NVIDIA Driver CRD is now generally available.
  Use this feature to configure multiple driver types and versions on different nodes or multiple operating system versions on nodes.
  Refer to the :doc:`NVIDIA Driver Custom Resource Definition documentation <gpu-driver-configuration>` for more information.
Question -- do we want to call out the limitations regarding migration here?
  Use this feature to configure multiple driver types and versions on different nodes or multiple operating system versions on nodes.
  Refer to the :doc:`NVIDIA Driver Custom Resource Definition documentation <gpu-driver-configuration>` for more information.
* Added support for KubeVirt GPU passthrough with Ubuntu 24.04 LTS and the VFIO framework.
Suggested change:
  - * Added support for KubeVirt GPU passthrough with Ubuntu 24.04 LTS and the VFIO framework.
  + * Added support for KubeVirt with GPU passthrough on Ubuntu 24.04 LTS
* Added support for KubeVirt GPU passthrough with Ubuntu 24.04 LTS and the VFIO framework.
* Added support for vGPU precompiled driver container for Azure Linux.
@rajathagasthya were we actually planning to call this out in the release notes? My gut says no...
* Added PodSecurityContext support for DaemonSets (`PR #2120 <https://github.com/NVIDIA/gpu-operator/pull/2120>`_).
  In ClusterPolicy, set ``spec.daemonsets.podSecurityContext``; in NVIDIADriver, set ``spec.podSecurityContext``.
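A hedged sketch of the new ClusterPolicy field follows. The ``podSecurityContext`` body is a standard Kubernetes ``PodSecurityContext``; the specific values shown are assumptions for illustration only:

```yaml
# Illustrative values only; any standard PodSecurityContext fields apply.
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: cluster-policy
spec:
  daemonsets:
    podSecurityContext:
      runAsUser: 0
      seccompProfile:
        type: RuntimeDefault
```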
* See `PR #2014 <https://github.com/NVIDIA/gpu-operator/pull/2014>`_ for related changes.
This bullet lacks context / feels out of place. Was this meant to be a sub-bullet for another bullet?
* Improved the NVIDIA Kubernetes Device Plugin to avoid unnecessary GPU unbind/rebind operations during rolling updates of the vfio-manager DaemonSet.
  This improves the stability of GPU passthrough workloads (KubeVirt, Kata Containers).
This never made it in.
Suggested change (remove these lines):
  - * Improved the NVIDIA Kubernetes Device Plugin to avoid unnecessary GPU unbind/rebind operations during rolling updates of the vfio-manager DaemonSet.
  -   This improves the stability of GPU passthrough workloads (KubeVirt, Kata Containers).
* Improved the NVIDIA Kubernetes Device Plugin to avoid unnecessary GPU unbind/rebind operations during rolling updates of the vfio-manager DaemonSet.
  This improves the stability of GPU passthrough workloads (KubeVirt, Kata Containers).
* Improved the Upgrade Controller to decrease unnecessary reconciliation in environments with Node Feature Discovery (NFD) enabled.
* Improved performance (`PR #2113 <https://github.com/NVIDIA/gpu-operator/pull/2113>`_).
What perf improvements? We should be more specific.
A suggestion (but someone might have a better suggestion 😄 ):
Suggested change:
  - * Improved performance (`PR #2113 <https://github.com/NVIDIA/gpu-operator/pull/2113>`_).
  + * Improved performance of the clusterpolicy controller by reducing the number of API calls made (`PR #2113 <https://github.com/NVIDIA/gpu-operator/pull/2113>`_).
* Marked unused field ``defaultRuntime`` as optional in the ClusterPolicy (`PR #2000 <https://github.com/NVIDIA/gpu-operator/pull/2000>`_).
* The NVIDIA Kata Manager for Kubernetes is now deprecated.
  To enable Kata Containers for GPUs, install the upstream kata-deploy Helm chart, which deploys all Kata runtime classes, including the NVIDIA-specific runtime classes.
Should we link to the kata procedure (which you are adding in #365) here?