diff --git a/gpu-operator/graphics/CoCo-Architecture.png b/confidential-containers/graphics/CoCo-Architecture.png similarity index 100% rename from gpu-operator/graphics/CoCo-Architecture.png rename to confidential-containers/graphics/CoCo-Architecture.png diff --git a/confidential-containers/graphics/CoCo-Reference-Architecture.png b/confidential-containers/graphics/CoCo-Reference-Architecture.png new file mode 100644 index 000000000..decb5f3da Binary files /dev/null and b/confidential-containers/graphics/CoCo-Reference-Architecture.png differ diff --git a/confidential-containers/graphics/CoCo-Sample-Workflow.png b/confidential-containers/graphics/CoCo-Sample-Workflow.png new file mode 100644 index 000000000..57fb11717 Binary files /dev/null and b/confidential-containers/graphics/CoCo-Sample-Workflow.png differ diff --git a/confidential-containers/index.rst b/confidential-containers/index.rst new file mode 100644 index 000000000..084a6e728 --- /dev/null +++ b/confidential-containers/index.rst @@ -0,0 +1,67 @@ +.. license-header + SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. + SPDX-License-Identifier: Apache-2.0 + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + +.. headings # #, * *, =, -, ^, " + +********************************************************** +NVIDIA Confidential Containers Architecture (Early Access) +********************************************************** + +.. 
toctree:: + :caption: NVIDIA Confidential Containers Architecture + :hidden: + :titlesonly: + + Overview + Deploy Confidential Containers with NVIDIA GPU Operator + + +This is the documentation for NVIDIA's Early Access implementation of Confidential Containers, including reference architecture information and supported platforms. + + +.. grid:: 3 + :gutter: 3 + + .. grid-item-card:: :octicon:`book;1.5em;sd-mr-1` Overview + :link: overview + :link-type: doc + + Introduction and approach to Confidential Containers. + + .. grid-item-card:: :octicon:`project;1.5em;sd-mr-1` Architecture + :link: coco-architecture + :link-type: ref + + High-level flow and diagram for Confidential Containers architecture. + + .. grid-item-card:: :octicon:`briefcase;1.5em;sd-mr-1` Use Cases + :link: coco-use-cases + :link-type: ref + + Regulated industries and workloads that benefit from confidential computing. + + .. grid-item-card:: :octicon:`package;1.5em;sd-mr-1` Components + :link: coco-supported-platforms-components + :link-type: ref + + Key software components for confidential containers. + + .. grid-item-card:: :octicon:`server;1.5em;sd-mr-1` Supported Platforms + :link: coco-supported-platforms + :link-type: ref + + Platform and feature support scope for Early Access (EA). + diff --git a/confidential-containers/overview.rst b/confidential-containers/overview.rst new file mode 100644 index 000000000..ee76042f8 --- /dev/null +++ b/confidential-containers/overview.rst @@ -0,0 +1,245 @@ +.. license-header + SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. + SPDX-License-Identifier: Apache-2.0 + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + +.. headings # #, * *, =, -, ^, " + + +****************************************************** +NVIDIA Confidential Containers Overview (Early Access) +****************************************************** + +.. admonition:: Early Access + + Confidential Containers are available as Early Access (EA) with curated platform and feature support. EA features are not supported in production and are not functionally complete. API and architectural designs are not final and may change. + +.. _confidential-containers-overview: + +Overview +======== +NVIDIA GPUs power the training and deployment of Frontier Models—world-class Large Language Models (LLMs) that define the state of the art in AI reasoning and capability. + +As organizations adopt these models in regulated industries such as financial services, healthcare, and the public sector, protecting model intellectual property and sensitive user data becomes essential. Additionally, the model deployment landscape is evolving to include public clouds, enterprise on-premises, and edge. A zero-trust posture on cloud-native platforms such as Kubernetes is essential to secure assets (model IP and enterprise private data) from untrusted infrastructure with privileged user access. + +Securing data at rest and in transit is industry standard. Protecting data in use remains a critical gap. Confidential Computing (CC) addresses this gap by providing isolation, encryption, and integrity verification of proprietary application code and sensitive data during processing.
CC uses hardware-based Trusted Execution Environments (TEEs), such as AMD SEV-SNP and Intel TDX, together with NVIDIA Confidential Computing capabilities, to create trusted enclaves. + +In addition to TEEs, Confidential Computing provides Remote Attestation features. Attestation enables remote systems or users to interrogate the security state of a TEE before interacting with it and providing any secrets or sensitive data. + +`Confidential Containers `_ (CoCo) is the cloud-native approach to CC on Kubernetes. +The Confidential Containers architecture leverages Kata Containers to provide sandboxing capabilities. `Kata Containers `_ is an open-source project that provides lightweight Utility Virtual Machines (UVMs) that feel and perform like containers while providing strong workload isolation. Along with the Confidential Containers project, Kata enables the orchestration of secure, GPU-accelerated workloads in Kubernetes. + +.. _coco-architecture: + +Architecture Overview +===================== + +NVIDIA's approach to the Confidential Containers architecture delivers on the key promise of Confidential Computing: confidentiality, integrity, and verifiability. +By integrating open-source and NVIDIA software components with the Confidential Computing capabilities of NVIDIA GPUs, the Reference Architecture for Confidential Containers is designed to be the secure and trusted deployment model for AI workloads. + +.. image:: graphics/CoCo-Reference-Architecture.png + :alt: High-Level Reference Architecture for Confidential Containers + +*High-Level Reference Architecture for Confidential Containers* + +The key value propositions of this architecture are: + +1. **Built on OSS standards** - The Reference Architecture for Confidential Containers is built on key OSS components such as Kata, Trustee, QEMU, OVMF, and Node Feature Discovery (NFD), along with hardened NVIDIA components like NVIDIA GPU Operator. +2.
**Highest level of isolation** - The Confidential Containers architecture is built on Kata Containers, the industry standard for hardened sandbox isolation, augmented with support for GPU passthrough to form the base of the Trusted Execution Environment (TEE). +3. **Zero-trust execution with Attestation** - Full-stack verification through Attestation ensures the trust of model providers and data owners. Integrating NVIDIA GPU attestation capabilities with the Trustee-based architecture provides composite attestation, the basis for secure, attestation-based key release for encrypted workloads deployed inside the TEE. + +.. _coco-use-cases: + +Use Cases +========= + +Confidential Containers aim to enable model providers (closed and open source) and enterprises to leverage the advancements of generative AI, agnostic to the deployment model (cloud, enterprise, or edge). Some of the key use cases that CC and Confidential Containers enable are: + +* **Zero-Trust AI & IP Protection:** You can deploy proprietary models (like LLMs) on third-party or private infrastructure. The model weights remain encrypted and are only decrypted inside the hardware-protected enclave, ensuring strong IP protection from the host. +* **Data Clean Rooms:** This allows you to process sensitive enterprise data (like financial analytics or healthcare records) securely. Neither the infrastructure provider nor the model builder can see the raw data. + +.. image:: graphics/CoCo-Sample-Workflow.png + :alt: Sample Workflow for Securing Model IP on Untrusted Infrastructure with CoCo + +*Sample Workflow for Securing Model IP on Untrusted Infrastructure with CoCo* + +.. _coco-supported-platforms-components: + +Software Components for Confidential Containers +=============================================== + +The following is a brief overview of the software components for Confidential Containers.
+ +**Kata Containers** + +Acts as the secure isolation layer by running standard Kubernetes Pods inside lightweight, hardware-isolated Utility VMs (UVMs) rather than sharing the untrusted host kernel. Kata Containers is integrated with the Kubernetes `Agent Sandbox `_ project to deliver sandboxing capabilities. + +**NVIDIA GPU Operator** + +Automates GPU lifecycle management. For Confidential Containers, it securely provisions GPU support and handles VFIO-based GPU passthrough directly into the Kata confidential VM without breaking the hardware trust boundary. + +The GPU Operator deploys the components needed to run Confidential Containers to simplify managing the software required for confidential computing and deploying confidential container workloads: + +* NVIDIA Confidential Computing Manager (cc-manager) for Kubernetes - to set the confidential computing (CC) mode on the NVIDIA GPUs. +* NVIDIA Sandbox Device Plugin - to discover NVIDIA GPUs along with their capabilities, to advertise these to Kubernetes, and to allocate GPUs during pod deployment. +* NVIDIA VFIO Manager - to bind discovered NVIDIA GPUs to the vfio-pci driver for VFIO passthrough. +* NVIDIA Kata Manager for Kubernetes - to create host-side CDI specifications for GPU passthrough. + +**Kata Deploy** + +Deployment mechanism (often managed via Helm) that installs the Kata runtime binaries, UVM kernels, and TEE-specific shims (such as ``kata-qemu-nvidia-gpu-snp`` or ``kata-qemu-nvidia-gpu-tdx``) onto the cluster's worker nodes. + +**Node Feature Discovery (NFD)** + +Bootstraps the node by advertising its features as labels, enabling scheduling decisions such as installing the Kata/CoCo stack only on nodes that meet the CC prerequisites for CPU and GPU. The Operator installs node feature rules that detect CPU security features and NVIDIA GPU hardware.
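+
+On nodes with an NVIDIA Hopper family GPU and AMD SEV-SNP, NFD adds labels such as ``feature.node.kubernetes.io/cpu-security.sev.snp.enabled=true`` and ``nvidia.com/cc.capable=true``, and the Operator deploys the confidential containers operands only on nodes carrying the latter label. As an illustrative sketch (the node name is a placeholder and exact output depends on your cluster), you can confirm the rules are installed and a node was labeled with:
+
+.. code-block:: console
+
+   $ kubectl get nodefeaturerules nvidia-nfd-nodefeaturerules
+   $ kubectl describe node <node-name> | grep cc.capable
+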
+ +**Trustee** + +Attestation and key brokering framework (which includes the Key Broker Service and Attestation Service). It acts as the cryptographic gatekeeper, verifying hardware/software evidence and only releasing secrets if the environment is proven secure. + +**Snapshotter (e.g., Nydus)** + +Handles the "Guest Pull" functionality. It bypasses the host to fetch and unpack encrypted container images directly inside the protected guest memory, keeping proprietary code hidden. + +**Kata Agent Policy** + +Runs inside the guest VM to manage the container lifecycle while enforcing a strict, immutable allow-list policy written in Rego (evaluated with regorus). This blocks the untrusted host from executing unauthorized commands, such as a malicious ``kubectl exec``. + +**Confidential Data Hub (CDH)** + +An in-guest component that securely receives decrypted secrets from Trustee and transparently manages encrypted persistent storage and image decryption for the workload. + +**NVRC (NVIDIA runcom)** + +A minimal, chiseled, and hardened init system that securely bootstraps the guest environment, launches the Kata Agent and manages its lifecycle, and provides health checks on the helper daemons it starts, all while drastically reducing the attack surface. + +Software Stack and Component Versions +-------------------------------------- + +The following is the component stack that supports the open Reference Architecture (RA), along with the proposed versions of the software components. + +..
flat-table:: + :header-rows: 1 + + * - Category + - Component + - Release/Version + * - :rspan:`1` **HW Platform** + - GPU Platform + - | Hopper 100/200 + | Blackwell B200 + | Blackwell RTX Pro 6000 + * - CPU Platform + - | AMD Genoa / Milan + | Intel ER / GR + * - :rspan:`7` **Host SW Components** + - Host OS + - Ubuntu 25.10 + * - Host Kernel + - 6.17+ + * - Guest OS + - Distroless + * - Guest Kernel + - 6.18.5 + * - OVMF + - edk2-stable202511 + * - QEMU + - 10.1 \+ Patches + * - Containerd + - 2.2.2 \+ + * - Kubernetes + - 1.32 \+ + * - :rspan:`2` **Confidential Containers Core Components** + - NFD + - v0.6.0 + * - NVIDIA/gpu-operator + | - NVIDIA VFIO Manager + | - NVIDIA Sandbox Device Plugin + | - NVIDIA Confidential Computing Manager for Kubernetes + | - NVIDIA Kata Manager for Kubernetes + - v25.10.0 and higher + * - CoCo release (EA) + | - Kata 3.25 (w/ kata-deploy helm) + | - Trustee/Guest components 0.17.0 + | - KBS protocol 0.4.0 + - v0.18.0 + + +Cluster Topology Considerations +------------------------------- + +You can configure all the worker nodes in your cluster for running GPU workloads with confidential containers, or you can configure some nodes for Confidential Containers and the others for traditional containers. Consider the following example where node A is configured to run traditional containers and node B is configured to run confidential containers. + +..
list-table:: + :widths: 50 50 + :header-rows: 1 + + * - Node A - Traditional Containers receives the following software components + - Node B - Kata CoCo receives the following software components + * - * NVIDIA Driver Manager for Kubernetes + * NVIDIA Container Toolkit + * NVIDIA Device Plugin for Kubernetes + * NVIDIA DCGM and DCGM Exporter + * NVIDIA MIG Manager for Kubernetes + * Node Feature Discovery + * NVIDIA GPU Feature Discovery + - * NVIDIA Kata Manager for Kubernetes + * NVIDIA Confidential Computing Manager for Kubernetes + * NVIDIA Sandbox Device Plugin + * NVIDIA VFIO Manager + * Node Feature Discovery + +This configuration can be controlled via node labelling, as described in the `GPU Operator confidential containers deployment guide `_. + +.. _coco-supported-platforms: + +Supported Platforms +=================== + +The following is the platform and feature support scope for the Early Access (EA) release of the Confidential Containers open Reference Architecture published by NVIDIA. + +.. flat-table:: Supported Platforms + :header-rows: 1 + + * - Category + - Support + * - GPU Platform + - Hopper 100/200 + * - TEE + - AMD SEV-SNP only + * - Feature Support + - Confidential Containers w/ Kata; Single GPU Passthrough only + * - Attestation Support + - Composite Attestation for CPU \+ GPU; integration with Trustee for local verification. + +Refer to the *Confidential Computing Deployment Guide* at the `Confidential Computing `_ website for information about supported NVIDIA GPUs, such as the NVIDIA Hopper H100, and specifically to `CC deployment guide for SEV-SNP `_ for setup specific to AMD SEV-SNP machines. + +The following topics in the deployment guide apply to a cloud-native environment: + +* Hardware selection and initial hardware configuration, such as BIOS settings. +* Host operating system selection, initial configuration, and validation.
+ +When following the cloud-native sections in the deployment guide linked above, use Ubuntu 25.10 as the host OS with its default kernel version and configuration. + +The remaining configuration topics in the deployment guide do not apply to a cloud-native environment. NVIDIA GPU Operator performs the actions that are described in these topics. + +Limitations and Restrictions for CoCo EA +---------------------------------------- + +* Only the AMD platform using SEV-SNP is supported for Confidential Containers Early Access. +* GPUs are available to containers as a single GPU in passthrough mode only. Multi-GPU passthrough and vGPU are not supported. +* Support is limited to initial installation and configuration only. Upgrading existing clusters to configure confidential computing is not supported. +* Support for confidential computing environments is limited to the implementation described on this page. +* NVIDIA supports the GPU Operator and confidential computing with the containerd runtime only. +* NFD does not label all Confidential Container-capable nodes as such automatically. In some cases, users must manually label nodes to deploy the NVIDIA Confidential Computing Manager for Kubernetes operand onto these nodes, as described in the deployment guide.
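+
+Where manual labeling is required, a single ``kubectl`` command is enough. This is an illustrative sketch that assumes a hypothetical node name ``worker-1`` and the ``nvidia.com/cc.capable`` label that gates the confidential containers operands:
+
+.. code-block:: console
+
+   $ kubectl label node worker-1 nvidia.com/cc.capable=true
+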
diff --git a/confidential-containers/versions1.json b/confidential-containers/versions1.json new file mode 100644 index 000000000..4d9c5bd4a --- /dev/null +++ b/confidential-containers/versions1.json @@ -0,0 +1,7 @@ +[ + { + "preferred": "true", + "url": "../1.0.0", + "version": "1.0.0" + } + ] \ No newline at end of file diff --git a/gpu-operator/confidential-containers-deploy.rst b/gpu-operator/confidential-containers-deploy.rst index 0a6cd204c..faa306a02 100644 --- a/gpu-operator/confidential-containers-deploy.rst +++ b/gpu-operator/confidential-containers-deploy.rst @@ -4,19 +4,22 @@ Deploy Confidential Containers with NVIDIA GPU Operator ******************************************************* -This page describes how to deploy Confidential Containers using the NVIDIA GPU Operator. -For an overview of Confidential Containers, refer to :ref:`early-access-gpu-operator-confidential-containers-kata`. - .. note:: Early Access features are not supported in production environments and are not functionally complete. Early Access features provide a preview of upcoming product features, enabling customers to test functionality and provide feedback during the development process. These releases may not have complete documentation, and testing is limited. Additionally, API and architectural designs are not final and may change in the future. + +This page describes deploying Confidential Containers with the NVIDIA GPU Operator. +The implementation relies on the Kata Containers project to provide the lightweight Utility Virtual Machines (UVMs) that feel and perform like containers but provide strong workload isolation. + +Refer to the `Confidential Containers overview `_ for details on the reference architecture and supported platforms. + .. _coco-prerequisites: Prerequisites ============= -* You are using a supported platform for confidential containers. For more information, refer to :ref:`supported-platforms`.
In particular: +* You are using a supported platform for confidential containers. For more information, refer to `Confidential Containers supported platforms `_. In particular: + + * You selected and configured your hardware and BIOS to support confidential computing. + * You installed and configured Ubuntu 25.10 as host OS with its default kernel to support confidential computing. @@ -34,7 +37,9 @@ Prerequisites + * Run ``sudo update-grub`` after making the change to configure the bootloader. Reboot the host after configuring the bootloader. + -* You have a Kubernetes cluster and you have cluster administrator privileges. For this cluster, you are using containerd 2.1 and Kubernetes version v1.34. These versions have been validated with the kata-containers project and are recommended. You use a ``runtimeRequestTimeout`` of more than 5 minutes in your `kubelet configuration `_ (the current method to pull container images within the confidential container may exceed the two minute default timeout in case of using large container images). +* You have a Kubernetes cluster and you have cluster administrator privileges. + * For this cluster, you are using containerd 2.1 and Kubernetes version v1.34. These versions have been validated with the kata-containers project and are recommended. You use a ``runtimeRequestTimeout`` of more than 5 minutes in your `kubelet configuration `_ (the current method to pull container images within the confidential container may exceed the two-minute default timeout when large container images are used). + * Make sure ``KubeletPodResourcesGet`` is enabled on your cluster. The NVIDIA GPU runtime classes use VFIO cold-plug, which requires the Kata runtime to query the Kubelet's Pod Resources API to discover allocated GPU devices during sandbox creation. For Kubernetes versions older than 1.34, you must explicitly enable the ``KubeletPodResourcesGet`` feature gate in your Kubelet configuration.
For Kubernetes 1.34 and later, this feature is enabled by default. .. _installation-and-configuration: diff --git a/gpu-operator/confidential-containers.rst b/gpu-operator/confidential-containers.rst deleted file mode 100644 index 760bf64f7..000000000 --- a/gpu-operator/confidential-containers.rst +++ /dev/null @@ -1,168 +0,0 @@ -.. _early-access-gpu-operator-confidential-containers-kata: - -**************************************************************************** -Early Access: NVIDIA GPU Operator with Confidential Containers based on Kata -**************************************************************************** - -.. note:: - - **Early Access Support** - - Early Access (EA) features are not supported in production environments and are not functionally complete. EA features provide a preview of upcoming product features, enabling customers to test functionality and provide feedback during the development process. These releases may not have complete documentation, and testing is limited. Additionally, API and architectural designs are not final and may change in the future. - -.. note:: - - This EA release only supports the AMD platform using SEV-SNP. - Intel TDX support is planned for a future release. - -.. _confidential-containers-nvidia-gpu-early-access: - - - -Overview -======== - - -NVIDIA GPUs power the training and deployment of Frontier Models—world-class Large Language Models (LLMs) that define the state of the art in AI reasoning and capability. As organizations adopt these models in regulated industries such as financial services, healthcare, and the public sector, protecting model intellectual property and sensitive user data becomes essential. - -While securing data at rest and in transit is industry standard, protecting data in use remains a critical gap. Confidential Computing (CC) addresses this gap by providing isolation, encryption, and integrity verification of proprietary application code and sensitive data during processing. 
CC uses hardware-based Trusted Execution Environments (TEEs)—such as AMD SEV—to create protected enclaves in both CPU and GPU. - -The TEE provides embedded encryption keys and an attestation mechanism through cryptographic verification to ensure that keys are only accessible by authorized application code. - -`Confidential Containers `_ (CoCo) is the cloud-native approach of CC in Kubernetes. `Kata Containers `_ is an open source project that provides lightweight utility Virtual Machines (UVMs) that feel and perform like containers but provide strong workload isolation. Along with the CoCo project, Kata enables the orchestration of secure, GPU-accelerated workloads in Kubernetes. - -This page describes NVIDIA's Early Access implementation for orchestrating CoCo with Kata using the NVIDIA GPU Operator. - -Architecture Overview ---------------------- - -The following high-level flow and diagram show some fundamental concepts for CoCo. The NVIDIA GPU operator is a central component to enable this workflow. In the following section, we describe the most important components and the deployment scenario. - -.. image:: graphics/CoCo-Architecture.png - :alt: High-Level Logical Diagram of Software Components and Communication Paths - -*High-Level Logical Diagram of Software Components and Communication Paths* - -1. The GPU Operator sets up the nodes to be Confidential Computing ready. - - * During installation, the GPU Operator deploys the components needed to run Confidential Containers to nodes that meet Confidential Container requirements. - -2. Application Owner deploys container with Kata-Confidential Containers runtime class. - - * The user deploys a confidential container GPU workload, which is placed onto a specific node by the Kubernetes control plane. On this node, the local Kubelet instructs its container runtime to create this pod. - * Containerd is configured to run a Kata runtime to start the Kata CVM. 
- * The Kata runtime starts the Kata CVM using the upstream Kata Containers kernel and NVIDIA initial RAM disk containing the VM's root filesystem. - * In the Kata CVM's early boot phase, the `NVRC `_ prepares the passthrough GPU for container access. - * Kata agent starts containers in the Kata CVM. - * The confidential containers attestation agent exercises remote attestation based on the Remote ATtestation ProcedureS (RATS) model in concert with the Confidential Containers' Trustee solution. As part of this, the attestation agent transitions the GPU into the Ready state. Refer to the attestation section for more details. - -.. _key-software-components-gpu-operator: - -Key Software Components of the NVIDIA GPU Operator -=================================================== - -NVIDIA GPU Operator brings together the following software components to simplify managing the software required for confidential computing and deploying confidential container workloads: - -**NVIDIA Kata Manager for Kubernetes** - -GPU Operator deploys the NVIDIA Kata Manager for Kubernetes, k8s-kata-manager. The manager is responsible for creating host-side CDI specifications for GPU passthrough. - -**NVIDIA Confidential Computing Manager for Kubernetes** - -GPU Operator deploys the manager, k8s-cc-manager, to set the confidential computing (CC) mode on the NVIDIA GPUs. - -**NVIDIA Sandbox Device Plugin** - -GPU Operator deploys the sandbox device plugin, nvidia-sandbox-device-plugin-daemonset, to discover NVIDIA GPUs along with their capabilities, to advertise these to Kubernetes, and to allocate GPUs during pod deployment. - -**NVIDIA VFIO Manager** - -GPU Operator deploys the VFIO manager, nvidia-vfio-manager, to bind discovered NVIDIA GPUs to the vfio-pci driver for VFIO passthrough. - -**Node Feature Discovery (NFD)** - -When you install NVIDIA GPU Operator for confidential computing, you must specify the ``nfd.nodefeaturerules=true`` option. 
This option directs the Operator to install node feature rules that detect CPU security features and the NVIDIA GPU hardware. You can confirm the rules are installed by running ``kubectl get nodefeaturerules nvidia-nfd-nodefeaturerules``. - -On nodes that have NVIDIA Hopper family GPU and AMD SEV-SNP, NFD adds labels to the node such as ``"feature.node.kubernetes.io/cpu-security.sev.snp.enabled": "true"`` and ``"nvidia.com/cc.capable": "true"``. NVIDIA GPU Operator only deploys the operands for confidential containers on nodes that have the ``"nvidia.com/cc.capable": "true"`` label. - -Cluster Topology Considerations ---------------------------------- - -You can configure all the worker nodes in your cluster for running GPU workloads with confidential containers or you configure some nodes for confidential containers and the others for traditional containers. Consider the following example where node A is configured to run traditional containers and node B is configured to run confidential containers. - -.. list-table:: - :widths: 50 50 - :header-rows: 1 - - * - Node A - Traditional Containers receives the following software components - - Node B - Kata CoCo receives the following software components - * - * NVIDIA Driver Manager for Kubernetes - * NVIDIA Container Toolkit - * NVIDIA Device Plugin for Kubernetes - * NVIDIA DCGM and DCGM Exporter - * NVIDIA MIG Manager for Kubernetes - * Node Feature Discovery - * NVIDIA GPU Feature Discovery - - * NVIDIA Kata Manager for Kubernetes - * NVIDIA Confidential Computing Manager for Kubernetes - * NVIDIA Sandbox Device Plugin - * NVIDIA VFIO Manager - * Node Feature Discovery - -This configuration can be controlled through node labelling as described in :ref:`confidential-containers-deploy`. - -.. 
_supported-platforms: - -Supported Platforms -=================== - -Refer to the *Confidential Computing Deployment Guide* at the https://docs.nvidia.com/confidential-computing website for information about supported NVIDIA GPUs, such as the NVIDIA Hopper H100, and specifically to https://docs.nvidia.com/cc-deployment-guide-snp.pdf for setup specific to AMD SEV-SNP machines. - -The following topics in the deployment guide apply to a cloud-native environment: - -* Hardware selection and initial hardware configuration, such as BIOS settings. -* Host operating system selection, initial configuration, and validation. - -When following the cloud-native sections in above linked deployment guide, use Ubuntu 25.10 as host OS with its default kernel version and configuration. - -The remaining configuration topics in the deployment guide do not apply to a cloud-native environment. NVIDIA GPU Operator performs the actions that are described in these topics. - -For scope of this EA, the following is the validated support matrix. Any other combination has not been evaluated: - -.. list-table:: - :widths: 50 50 - :header-rows: 1 - - * - Component - - Release - * - GPU Platform - - Hopper 100/200 - * - GPU Driver - - R580 TRD 3 - * - kata-containers/kata-containers - - 3.24.0 - * - NVIDIA/gpu-operator - - v25.10.0 and higher - -.. _limitations-and-restrictions: - -Limitations and Restrictions -============================= - -* Only the AMD platform using SEV-SNP is supported for Confidential Containers Early Access. -* GPUs are available to containers as a single GPU in passthrough mode only. Multi-GPU passthrough and vGPU are not supported. -* Support is limited to initial installation and configuration only. Upgrade and configuration of existing clusters to configure confidential computing is not supported. -* Support for confidential computing environments is limited to the implementation described on this page. 
-* NVIDIA supports the GPU Operator and confidential computing with the containerd runtime only. -* OpenShift is not supported in the Early Access release. -* NFD doesn't label all Confidential Container capable nodes as such automatically. In some cases, users must manually label nodes to deploy the NVIDIA Confidential Computing Manager for Kubernetes operand onto these nodes as described below. - -Deployment and Configuration -============================= - -For detailed instructions on deploying and configuring confidential containers with the NVIDIA GPU Operator, refer to the following guide: - -.. toctree:: - :maxdepth: 2 - - confidential-containers-deploy diff --git a/gpu-operator/index.rst b/gpu-operator/index.rst index afa96c50b..05f5bbf78 100644 --- a/gpu-operator/index.rst +++ b/gpu-operator/index.rst @@ -55,8 +55,8 @@ :titlesonly: :hidden: - KubeVirt - Confidential Containers + KubeVirt + Confidential Containers .. toctree:: :caption: Specialized Networks diff --git a/repo.toml b/repo.toml index 0510a4fd1..abc9b5ab1 100644 --- a/repo.toml +++ b/repo.toml @@ -82,6 +82,7 @@ project_build_order = [ "gpu-telemetry", "openshift", "gpu-operator", + "confidential-containers", "edge", "kubernetes", "partner-validated", @@ -194,6 +195,7 @@ redirects = [ { path="openshift/install-gpu-ocp.html", project="openshift", target="install-gpu-ocp.html" }, { path="dra-crds.html", target="dra-intro-install.html" }, { path="dra-gpus.html", target="dra-intro-install.html" }, + { path="confidential-containers.html", target="confidential-containers-deploy.html" }, ] [repo_docs.projects.gpu-operator.builds.linkcheck] @@ -201,6 +203,18 @@ build_by_default = false output_format = "linkcheck" +[repo_docs.projects.confidential-containers] +docs_root = "${root}/confidential-containers" +project = "confidential-containers" +name = "NVIDIA Confidential Containers Architecture" +version = "25.10" +copyright_start = 2020 + +[repo_docs.projects.confidential-containers.builds.linkcheck] 
+build_by_default = false +output_format = "linkcheck" + + [repo_docs.projects.openshift] docs_root = "${root}/openshift" project = "gpu-operator-openshift" diff --git a/review/index.rst b/review/index.rst index 6b5c3afa5..a23a6098b 100644 --- a/review/index.rst +++ b/review/index.rst @@ -28,3 +28,4 @@ Refer to the following URLs for the review HTML: * `NVIDIA GPU Operator on Red Hat OpenShift Container Platform <./openshift/latest/index.html>`__ * `NVIDIA GPUs and Edge Computing <./edge/latest/index.html>`__ * `Partner-Validated Configurations <./partner-validated/latest/index.html>`__ +* `NVIDIA Confidential Containers <./confidential-containers/latest/index.html>`__