From 2f0c9bbe425db5b2946e3b385d13cd5a4ce5dce0 Mon Sep 17 00:00:00 2001 From: Ben Ashbaugh Date: Wed, 1 May 2024 17:38:40 -0700 Subject: [PATCH 01/18] initial version of the cl_khr_unified_svm extension --- api/cl_khr_unified_svm.asciidoc | 16 + extensions/cl_khr_unified_svm.asciidoc | 1082 ++++++++++++++++++++++++ xml/cl.xml | 238 +++++- 3 files changed, 1332 insertions(+), 4 deletions(-) create mode 100644 api/cl_khr_unified_svm.asciidoc create mode 100644 extensions/cl_khr_unified_svm.asciidoc diff --git a/api/cl_khr_unified_svm.asciidoc b/api/cl_khr_unified_svm.asciidoc new file mode 100644 index 000000000..a34bfecea --- /dev/null +++ b/api/cl_khr_unified_svm.asciidoc @@ -0,0 +1,16 @@ +// Copyright 2024 The Khronos Group Inc. +// SPDX-License-Identifier: CC-BY-4.0 + +include::{generated}/meta/{refprefix}cl_khr_unified_svm.txt[] + +=== Other Extension Metadata + +TODO + +=== Description + +TODO + +=== Version History + +TODO diff --git a/extensions/cl_khr_unified_svm.asciidoc b/extensions/cl_khr_unified_svm.asciidoc new file mode 100644 index 000000000..0a735aed9 --- /dev/null +++ b/extensions/cl_khr_unified_svm.asciidoc @@ -0,0 +1,1082 @@ += cl_khr_unified_svm + +// This section needs to be after the document title. +:doctype: book +:toc2: +:toc: left +:encoding: utf-8 +:lang: en + +:blank: pass:[ +] + +// Set the default source code type in this document to C, +// for syntax highlighting purposes. +:language: c + +// This is what is needed for C++, since docbook uses c++ +// and everything else uses cpp. This doesn't work when +// source blocks are in table cells, though, so don't use +// C++ unless it is required. +//:language: {basebackend@docbook:c++:cpp} + +== Name Strings + +`cl_khr_unified_svm` + +== Contact + +Ben Ashbaugh, Intel (ben 'dot' ashbaugh 'at' intel 'dot' com) + +== Contributors + +// spell-checker: disable +* Brice Videau, Argonne National Laboratory +* Kévin Petit, Arm Ltd. +* Ewan Crawford, Codeplay Software Ltd. 
+* Paul Fradgley, Imagination Technologies +* Pekka Jääskeläinen, Intel +* Nikhil Joshi, NVIDIA +* Balaji Calidas, Qualcomm Technologies Inc. +* TODO +// spell-checker: enable + +== Notice + +include::../copyrights.txt[] + +== Status + +Working Draft + +== Version + +Built On: {docdate} + +Revision: 0.2.0 + +== Dependencies + +This extension is written against the OpenCL API Specification Version 3.0.17. +This extension uses and extends the SVM APIs from OpenCL 2.0 and hence requires an OpenCL 2.0 platform; however, it is intended to be implementable by devices supporting many diverse OpenCL versions. + +== Overview + +This extension adds additional types of Shared Virtual Memory (SVM) to OpenCL. +Compared to Coarse-Grained and Fine-Grained SVM in OpenCL 2.0 and newer, the additional types of Shared Virtual Memory added by this extension provide: + +* Sufficient functionality to implement "Unified Shared Memory" (USM) in other APIs, such as SYCL. + +* Additional control over the ownership and accessibility of SVM allocations, to more precisely choose between application performance and programmer convenience. + +* A simpler programming model, by automatically migrating more SVM allocations between devices and the host, or by accessing more SVM allocations on the host without needing to map or unmap the allocation. + +Specifically, this extension provides: + +* Extensible interfaces to support many types of SVM, including the SVM types defined in core OpenCL, in this extension, and additional SVM types defined by other combinations of SVM capabilities. + +* Explicit control over memory placement and migration by supporting device-owned SVM allocations for best performance, host-owned SVM allocations for wide visibility, and shared SVM allocations that may migrate between devices and the host. + +* The ability to query detailed SVM capabilities for each SVM allocation type supported by a platform and device. 
+ +* Additional properties to control how memory is allocated and freed, including properties to associate an SVM allocation with both a device and a context. + +* A mechanism to indicate that a kernel may access SVM allocations indirectly, without passing a set of indirectly accessed SVM allocations to the kernel, improving usability and reducing driver overhead for kernels that access many SVM allocations. + +* A new query function to query properties of an SVM allocation. + +* A new function to suggest an SVM allocation type for a set of SVM capabilities. + +== New API Functions + +[source] +---- +void* clSVMAllocWithPropertiesKHR( + cl_context context, + const cl_svm_alloc_properties_khr* properties, + cl_uint svm_type_index, + size_t size, + cl_int* errcode_ret); + +cl_int clSVMFreeWithPropertiesKHR( + cl_context context, + const cl_svm_free_properties_khr* properties, + cl_svm_free_flags_khr flags, + void* ptr); + +cl_int clGetSVMPointerInfoKHR( + cl_context context, + cl_device_id device, // optional - generic input? 
+ const void* ptr, + cl_svm_pointer_info_khr param_name, + size_t param_value_size, + void* param_value, + size_t* param_value_size_ret); + +cl_int clGetSVMSuggestedTypeIndexKHR( + cl_context context, + cl_svm_capabilities_khr required_capabilities, + cl_svm_capabilities_khr desired_capabilities, + const cl_svm_alloc_properties_khr* properties, + size_t size, + cl_uint* suggested_svm_type_index); +---- + +== New API Enums + +Bitfield type and bits describing the SVM capabilities for a SVM allocation type: + +[source] +---- +typedef cl_bitfield cl_svm_capabilities_khr; + +/* cl_svm_capabilities_khr */ +#define CL_SVM_CAPABILITY_SINGLE_ADDRESS_SPACE_KHR (1 << 0) +#define CL_SVM_CAPABILITY_SYSTEM_ALLOCATED_KHR (1 << 1) +#define CL_SVM_CAPABILITY_DEVICE_OWNED_KHR (1 << 2) +#define CL_SVM_CAPABILITY_DEVICE_UNASSOCIATED_KHR (1 << 3) +#define CL_SVM_CAPABILITY_CONTEXT_ACCESS_KHR (1 << 4) +#define CL_SVM_CAPABILITY_HOST_OWNED_KHR (1 << 5) +#define CL_SVM_CAPABILITY_HOST_READ_KHR (1 << 6) +#define CL_SVM_CAPABILITY_HOST_WRITE_KHR (1 << 7) +#define CL_SVM_CAPABILITY_HOST_MAP_KHR (1 << 8) +#define CL_SVM_CAPABILITY_DEVICE_READ_KHR (1 << 9) +#define CL_SVM_CAPABILITY_DEVICE_WRITE_KHR (1 << 10) +#define CL_SVM_CAPABILITY_DEVICE_ATOMIC_ACCESS_KHR (1 << 11) +#define CL_SVM_CAPABILITY_CONCURRENT_ACCESS_KHR (1 << 12) +#define CL_SVM_CAPABILITY_CONCURRENT_ATOMIC_ACCESS_KHR (1 << 13) +#define CL_SVM_CAPABILITY_INDIRECT_ACCESS_KHR (1 << 14) +---- + +Convenience macros describing required properties for several common SVM allocation types: + +[source] +---- +#define CL_SVM_TYPE_MACRO_COARSE_GRAIN_BUFFER_KHR /* ... */ +#define CL_SVM_TYPE_MACRO_FINE_GRAIN_BUFFER_KHR /* ... */ +#define CL_SVM_TYPE_MACRO_DEVICE_KHR /* ... */ +#define CL_SVM_TYPE_MACRO_HOST_KHR /* ... */ +#define CL_SVM_TYPE_MACRO_SINGLE_DEVICE_SHARED_KHR /* ... */ +#define CL_SVM_TYPE_MACRO_SYSTEM_KHR /* ... 
*/ +---- + +Accepted value for the _param_name_ parameter to *clGetPlatformInfo* to query combinations of SVM capabilities defining the SVM types supported by an OpenCL platform: + +[source] +---- +/* cl_platform_info */ +#define CL_PLATFORM_SVM_TYPE_CAPABILITIES_KHR 0x0909 +---- + +Accepted value for the _param_name_ parameter to *clGetDeviceInfo* to query combinations of SVM capabilities defining the SVM types supported by an OpenCL device: + +[source] +---- +/* cl_device_info */ +#define CL_DEVICE_SVM_TYPE_CAPABILITIES_KHR 0x1077 +---- + +Type to describe optional SVM allocation properties, and allocation properties added by this extension: + +[source] +---- +typedef cl_properties cl_svm_alloc_properties_khr; + +/* cl_svm_alloc_properties_khr */ +#define CL_SVM_ALLOC_ASSOCIATED_DEVICE_HANDLE_KHR 0x2078 +#define CL_SVM_ALLOC_ACCESS_FLAGS_KHR 0x2079 +#define CL_SVM_ALLOC_ALIGNMENT_KHR 0x207A + +typedef cl_bitfield cl_svm_alloc_access_flags_khr; + +/* cl_svm_alloc_access_flags_khr */ +#define CL_SVM_ALLOC_ACCESS_HOST_NOREAD_KHR (1 << 0) +#define CL_SVM_ALLOC_ACCESS_HOST_NOWRITE_KHR (1 << 1) +/* bits 2 through 7 are reserved for additional host access flags */ +#define CL_SVM_ALLOC_ACCESS_DEVICE_NOREAD_KHR (1 << 8) +#define CL_SVM_ALLOC_ACCESS_DEVICE_NOWRITE_KHR (1 << 9) +/* bits 10 through 15 are reserved for additional device access flags */ +/* bits 16 and beyond are reserved for future use */ +---- + +Type to describe optional SVM free properties. +No free properties are added by this extension: + +[source] +---- +typedef cl_properties cl_svm_free_properties_khr; +---- + +Type to describe SVM free flags, and SVM free flags added by this extension: + +[source] +---- +// TODO: should this be a bitfield, or is this an enum? +// If it is an enum, it should be renamed. 
+typedef cl_bitfield cl_svm_free_flags_khr; + +/* cl_svm_free_flags_khr */ +#define CL_SVM_FREE_BLOCKING_KHR (1 << 0) +---- + +Enumeration type and values for the _param_name_ parameter to *clGetSVMPointerInfoKHR* to query information about an SVM allocation: + +[source] +---- +typedef cl_uint cl_svm_pointer_info_khr; + +#define CL_SVM_INFO_TYPE_INDEX_KHR 0x2088 +#define CL_SVM_INFO_CAPABILITIES_KHR 0x2089 +#define CL_SVM_INFO_PROPERTIES_KHR 0x208A +#define CL_SVM_INFO_ACCESS_FLAGS_KHR 0x208B +#define CL_SVM_INFO_BASE_PTR_KHR 0x419B +#define CL_SVM_INFO_SIZE_KHR 0x419C +#define CL_SVM_INFO_ASSOCIATED_DEVICE_HANDLE_KHR 0x419D +---- + +Accepted value for the _param_name_ parameter to *clSetKernelExecInfo* to enable and disable indirect access to SVM allocations made by *clSVMAllocWithPropertiesKHR* or *clSVMAlloc*: + +[source] +---- +/* cl_kernel_exec_info */ +#define CL_KERNEL_EXEC_INFO_SVM_INDIRECT_ACCESS_KHR 0x11BB +---- + +== Modifications to the OpenCL API Specification + +=== Section 4.1 - Querying Platform Info: + +Add to Table 3 - List of supported param_names by *clGetPlatformInfo*: + +[caption="Table 3. "] +.List of supported param_names by clGetPlatformInfo +[width="100%",cols="<30%,<20%,<50%",options="header"] +|==== +| Platform Info | Return Type | Description +| `CL_PLATFORM_SVM_TYPE_CAPABILITIES_KHR` + | `cl_svm_capabilities_khr[]` + | Queries the combinations of SVM capabilities defining the SVM types supported by OpenCL devices in the OpenCL platform. + Returns an array of bitfields, where each bitfield in the array describes the SVM capabilities for one SVM type. + Each SVM type must be supported by at least one device in the platform, but may not be supported by all devices in the platform. + To determine the combinations of SVM capabilities defining the SVM types supported by a device, use the query `CL_DEVICE_SVM_TYPE_CAPABILITIES_KHR`. + + Please refer to the <> table for capability values and their description. 
+|==== + +=== Section 4.2 - Querying Devices: + +Add to Table 5 - List of supported param_names by *clGetDeviceInfo*: + +[caption="Table 5. "] +.List of supported param_names by clGetDeviceInfo +[width="100%",cols="<30%,<20%,<50%",options="header"] +|==== +| Device Info | Return Type | Description +| `CL_DEVICE_SVM_TYPE_CAPABILITIES_KHR` + | `cl_svm_capabilities_khr[]` + | Queries the combinations of SVM capabilities describing the SVM types supported by an OpenCL device. + Returns an array of bitfields, where each bitfield in the array describes the SVM capabilities for one SVM type. + The size of the returned array must match the size of the array returned by the platform query `CL_PLATFORM_SVM_TYPE_CAPABILITIES_KHR`. + + Each entry in the returned array must be either a super-set of the entry in the array returned by the platform query `CL_PLATFORM_SVM_TYPE_CAPABILITIES_KHR`, indicating that the SVM type is supported by the device, or zero, indicating that the SVM type is not supported by this device. + + Please refer to the <> table for valid capability values and their description. +|==== + +[[svm-capabilities-table]] +[caption="Table X. "] +.List of SVM capabilities +[width="100%",cols="2,3",options="header"] +|==== +| Capability | Description +| `CL_SVM_CAPABILITY_SINGLE_ADDRESS_SPACE_KHR` + | There is a single address space for this type of SVM. + The same pointer may be used on the host and the device; the pointer has _address equivalence_. +| `CL_SVM_CAPABILITY_SYSTEM_ALLOCATED_KHR` + | This type of SVM provides access to the entire host virtual memory, including memory allocated by a system allocator such as `malloc` or `new` or objects allocated on the stack, and does not require calling *clSVMAllocWithPropertiesKHR* or *clSVMAlloc*. +| `CL_SVM_CAPABILITY_DEVICE_OWNED_KHR` + | This type of SVM is owned by an associated device handle and is not intended to migrate to another device or the host. 
+ Allocations that are owned by a device generally trade off access limitations for higher performance. +| `CL_SVM_CAPABILITY_DEVICE_UNASSOCIATED_KHR` + | This type of SVM does not need to be associated with a device handle. +| `CL_SVM_CAPABILITY_CONTEXT_ACCESS_KHR` + | This type of SVM is accessible to other devices in the context that support the SVM type. +| `CL_SVM_CAPABILITY_HOST_OWNED_KHR` + | This type of SVM is owned by the host and is not intended to migrate to a device. + Allocations that are owned by the host generally trade off wide accessibility for potentially higher per-access costs. +| `CL_SVM_CAPABILITY_HOST_READ_KHR` + | This type of SVM is readable on the host without needing to map or unmap the allocation. +| `CL_SVM_CAPABILITY_HOST_WRITE_KHR` + | This type of SVM is writeable on the host without needing to map or unmap the allocation. +| `CL_SVM_CAPABILITY_HOST_MAP_KHR` + | This type of SVM is accessible on the host but requires mapping and unmapping the allocation. +| `CL_SVM_CAPABILITY_DEVICE_READ_KHR` + | This type of SVM is accessible on the device for reading. +| `CL_SVM_CAPABILITY_DEVICE_WRITE_KHR` + | This type of SVM is accessible on the device for writing. +| `CL_SVM_CAPABILITY_DEVICE_ATOMIC_ACCESS_KHR` + | This type of SVM is accessible on the device using atomic built-in functions. +| `CL_SVM_CAPABILITY_CONCURRENT_ACCESS_KHR` + | This type of SVM supports concurrent access from the host and a device, or from multiple devices. +| `CL_SVM_CAPABILITY_CONCURRENT_ATOMIC_ACCESS_KHR` + | This type of SVM supports concurrent atomic access from the host and a device, or from multiple devices. +| `CL_SVM_CAPABILITY_INDIRECT_ACCESS_KHR` + | This type of SVM supports a single kernel enable to indicate that the kernel may access any allocation of this type, rather than passing a list of indirectly accessed allocations to the kernel. +|==== + +[NOTE] +==== +* SVM types that are `DEVICE_OWNED` must not be `DEVICE_UNASSOCIATED`. 
+* SVM types that are `HOST_OWNED` must be `DEVICE_UNASSOCIATED`. +* SVM types that are `HOST_OWNED` must be `HOST_ACCESSIBLE`. +* ... +==== + +[NOTE] +==== +The following table provides a high-level summary of SVM capabilities for some common SVM types: + +.High-Level Summary of Shared Virtual Memory Types and Capabilities +[width="100%",options="header"] +|==== +| SVM Type | Initial Location 2+| Accessible By 2+| Migratable To + +.2+| **Coarse-Grain Buffer SVM** .2+| Unspecified +| Host | Yes, with Map | Host | Yes, with Map +| Any Device | Yes | Device | Yes + +.2+| **Fine-Grain Buffer SVM** .2+| Unspecified +| Host | Yes | Host | Yes +| Any Device | Yes | Device | Yes + +.3+| **Device SVM** .3+| Associated Device +| Host | No | Host | No +| Associated Device | Yes | Device | N/A +| Another Device | Not With This Extension | Another Device | No + +.2+| **Host SVM** .2+| Host +| Host | Yes | Host | N/A +| Any Device | Yes (perhaps over a bus, such as PCIe) | Device | No + +.3+| **Shared SVM** .3+| Host, or Associated Device, or Unspecified +| Host | Yes | Host | Yes +| Associated Device | Yes | Device | Yes +| Another Device | Not With This Extension | Another Device | Not With This Extension + +.2+| **Shared System SVM** .2+| Host +| Host | Yes | Host | Yes +| Device | Yes | Device | Yes + +|==== +==== + +[NOTE] +==== +The following table describes the detailed set of SVM capabilities for some common SVM types: + +// Table shortcuts: +:O: Optional + +[[minimum-svm-capabilities-table]] +[caption="Table X. 
"] +.Set of SVM Capabilities for Common SVM Types +[width="100%",cols="2,^1,^1,^1,^1,^1,^1",options="header"] +|==== +| SVM Capability | Coarse-Grain Buffer SVM | Fine-Grain Buffer SVM | Device SVM | Host SVM | Single-Device Shared SVM | System SVM +// CG FG Dev Host SDS Sys +| `CL_SVM_CAPABILITY_SINGLE_ADDRESS_SPACE_KHR` | Y | Y | Y | Y | Y | Y +| `CL_SVM_CAPABILITY_SYSTEM_ALLOCATED_KHR` | | | | | | Y +| `CL_SVM_CAPABILITY_DEVICE_OWNED_KHR` | | | Y | | | +| `CL_SVM_CAPABILITY_DEVICE_UNASSOCIATED_KHR` | Y | Y | | Y | | Y +| `CL_SVM_CAPABILITY_CONTEXT_ACCESS_KHR` | Y | Y | | Y | | Y +| `CL_SVM_CAPABILITY_HOST_OWNED_KHR` | | | | Y | | +| `CL_SVM_CAPABILITY_HOST_READ_KHR` | | Y | | Y | Y | Y +| `CL_SVM_CAPABILITY_HOST_WRITE_KHR` | | Y | | Y | Y | Y +| `CL_SVM_CAPABILITY_HOST_MAP_KHR` | Y | Y | | | | Y? +| `CL_SVM_CAPABILITY_DEVICE_READ_KHR` | Y | Y | Y | Y | Y | Y +| `CL_SVM_CAPABILITY_DEVICE_WRITE_KHR` | Y | Y | Y | Y | Y | Y +| `CL_SVM_CAPABILITY_DEVICE_ATOMIC_ACCESS_KHR` | Y | Y | Y | {O} | {O} | Y +| `CL_SVM_CAPABILITY_CONCURRENT_ACCESS_KHR` | | Y | {O} | {O} | {O} | Y +| `CL_SVM_CAPABILITY_CONCURRENT_ATOMIC_ACCESS_KHR` | | {O} | {O} | {O} | {O} | Y +| `CL_SVM_CAPABILITY_INDIRECT_ACCESS_KHR` | {O} | {O} | Y | Y | Y | Y +|==== + +In this table: + +* The capabilities marked Y are supported by the SVM type. +* The capabilities marked {O} or blank may be optionally supported capabilities for the SVM type on some devices. +** The capabilities marked {O} are likely to be supported by some devices supporting the SVM type. +** The capabilities that are blank may be supported by some devices, but support is likely to be less common. + +// Un-set table shortcuts: +:!O: +==== + +=== Section 5.6 - Shared Virtual Memory: + +TODO: Probably ought to substantially rewrite portions of Section 5.6.1 and perhaps 5.6.2. 
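
As a non-normative illustration of how the capability arrays returned by `CL_DEVICE_SVM_TYPE_CAPABILITIES_KHR` relate to the _svm_type_index_ passed to the allocation function, the sketch below scans such an array for the first SVM type whose capabilities are a super-set of a required set. It deliberately avoids the OpenCL headers so that it stands alone: the `cl_svm_capabilities_khr` typedef is a local stand-in (assuming a 64-bit bitfield), the three capability bits are copied from the "New API Enums" section, and `find_svm_type_index` and `NO_SVM_TYPE` are hypothetical names that are not part of this extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the OpenCL type; assumes cl_bitfield is a
 * 64-bit unsigned integer. Not taken from a real header. */
typedef uint64_t cl_svm_capabilities_khr;

/* A few capability bits, copied from the "New API Enums" section. */
#define CL_SVM_CAPABILITY_HOST_READ_KHR    (1 << 6)
#define CL_SVM_CAPABILITY_HOST_WRITE_KHR   (1 << 7)
#define CL_SVM_CAPABILITY_DEVICE_READ_KHR  (1 << 9)

/* Hypothetical sentinel mirroring the use of CL_UINT_MAX for "no type". */
#define NO_SVM_TYPE UINT32_MAX

/* Hypothetical helper: returns the index of the first entry in caps
 * that contains every bit in required, or NO_SVM_TYPE if none does.
 * Entries equal to zero are skipped, because a zero entry in the
 * device array means the SVM type is unsupported on that device. */
uint32_t find_svm_type_index(const cl_svm_capabilities_khr* caps,
                             size_t count,
                             cl_svm_capabilities_khr required)
{
    assert(caps != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (caps[i] != 0 && (caps[i] & required) == required) {
            return (uint32_t)i;
        }
    }
    return NO_SVM_TYPE;
}
```

In a real application the array would presumably come from *clGetDeviceInfo*, and *clGetSVMSuggestedTypeIndexKHR* may well be the better way to choose a type, since the implementation can also weigh desired capabilities, properties, and size.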
+ +==== Allocating SVM With Properties: + +The function + +[source] +---- +void* clSVMAllocWithPropertiesKHR( + cl_context context, + const cl_svm_alloc_properties_khr* properties, + cl_uint svm_type_index, + size_t size, + cl_int* errcode_ret); +---- + +allocates shared virtual memory with optional properties. + +_context_ is a valid OpenCL context used to allocate the shared virtual memory. + +_properties_ is an optional list of allocation properties and their corresponding values. +The list is terminated with the special property `0`. +If no allocation properties are required, _properties_ may be `NULL`. +Please refer to the <> table for valid SVM allocation properties and their description. + +_svm_type_index_ is an index into the array of supported SVM types returned by `CL_PLATFORM_SVM_TYPE_CAPABILITIES_KHR` or `CL_DEVICE_SVM_TYPE_CAPABILITIES_KHR` that specifies the type of SVM to allocate. + +_size_ is the size in bytes of the requested SVM allocation. + +_errcode_ret_ may return an appropriate error code. +If _errcode_ret_ is `NULL` then no error code will be returned. + +*clSVMAllocWithPropertiesKHR* will return a valid non-`NULL` address and `CL_SUCCESS` will be returned in _errcode_ret_ if the shared virtual memory is allocated successfully. +Otherwise, `NULL` will be returned, and _errcode_ret_ will be set to one of the following error values: + +* `CL_INVALID_CONTEXT` if _context_ is not a valid context. +* `CL_INVALID_PROPERTY` if a memory property name in _properties_ is not a supported property name, if the value specified for a supported property name is not valid, or if the same property name is specified more than once. +* `CL_INVALID_OPERATION` if no devices in _context_ support the SVM type specified by _svm_type_index_, or if a device associated with the SVM allocation does not support the SVM type specified by _svm_type_index_. 
+* `CL_INVALID_VALUE` if _svm_type_index_ is greater than the number of SVM types supported by the devices in _context_. +* `CL_INVALID_BUFFER_SIZE` if _size_ is zero or greater than `CL_DEVICE_MAX_MEM_ALLOC_SIZE` for any OpenCL device in _context_ that supports the specified SVM type, or if _size_ is greater than `CL_DEVICE_MAX_MEM_ALLOC_SIZE` for a device associated with the SVM allocation. +TODO: update depending on the updated queries for available SVM sizes. +* `CL_OUT_OF_RESOURCES` if there is a failure to allocate resources required by the OpenCL implementation on the device. +* `CL_OUT_OF_HOST_MEMORY` if there is a failure to allocate resources required by the OpenCL implementation on the host. + +TODO: Do we want to document any specific error conditions for invalid property values? + +[[svm-alloc-properties-table]] +[caption="Table X. "] +.List of supported SVM allocation properties by *clSVMAllocWithPropertiesKHR* +[width="100%",cols="2,1,3",options="header"] +|==== +| Allocation Property | Property Value | Description +| `CL_SVM_ALLOC_ASSOCIATED_DEVICE_HANDLE_KHR` + | `cl_device_id` + | Associates the allocation with a specific device handle. + The associated device handle property is required unless the specified + SVM type contains `CL_SVM_CAPABILITY_DEVICE_UNASSOCIATED_KHR`. + + The default value is `NULL`, which indicates that the allocation is not + associated with a specific device handle. +| `CL_SVM_ALLOC_ACCESS_FLAGS_KHR` + | `cl_svm_alloc_access_flags_khr` + | Flags specifying access information for the allocation. + If these access flags are violated, behavior is undefined. + This is a bitfield type that may be set to a combination of the following values: + + `CL_SVM_ALLOC_ACCESS_HOST_NOREAD_KHR`: the host will not read this allocation. + + `CL_SVM_ALLOC_ACCESS_HOST_NOWRITE_KHR`: the host will not write this allocation. + + `CL_SVM_ALLOC_ACCESS_DEVICE_NOREAD_KHR`: the device will not read this allocation. 
+ + `CL_SVM_ALLOC_ACCESS_DEVICE_NOWRITE_KHR`: the device will not write this allocation. + + The default value is `0`, which indicates no special access behavior for + the host or the device for this allocation. + +| `CL_SVM_ALLOC_ALIGNMENT_KHR` + | `size_t` + | Specifies the minimum alignment in bytes for the SVM allocation. + The alignment must be a power of two and must be equal to or smaller + than the size of the largest data type supported by any OpenCL device in + _context_. + + The default value is `0`, which specifies an alignment that is equal to + the size of the largest data type supported by any OpenCL device in + _context_. + +|==== + +===== Freeing SVM Allocations + +The function + +[source] +---- +cl_int clSVMFreeWithPropertiesKHR( + cl_context context, + const cl_svm_free_properties_khr* properties, + cl_svm_free_flags_khr flags, + void* ptr); +---- + +frees an SVM allocation with optional properties. + +_context_ is a valid OpenCL context used to free the SVM allocation. + +_properties_ is an optional list of allocation properties and their corresponding values. +The list is terminated with the special property `0`. +If no free properties are required, _properties_ may be `NULL`. +This extension does not define any free properties. + +_flags_ is used to specify how the SVM allocation is freed. +Please refer to the <> table for valid SVM free flags and their description. + +_ptr_ is the SVM allocation to free. +It must be a value returned by *clSVMAlloc*, *clSVMAllocWithPropertiesKHR*, or a `NULL` pointer. +If _ptr_ is `NULL` then no action occurs. + +*clSVMFreeWithPropertiesKHR* will return `CL_SUCCESS` if the function executes successfully. +Otherwise, it returns one of the following errors: + +* `CL_INVALID_CONTEXT` if _context_ is not a valid context. 
+* `CL_INVALID_PROPERTY` if a memory property name in _properties_ is not a supported property name, if the value specified for a supported property name is not valid, or if the same property name is specified more than once. +* `CL_INVALID_VALUE` if _flags_ contains an invalid SVM free flag. +* `CL_INVALID_VALUE` if _ptr_ is not a value returned by *clSVMAlloc*, *clSVMAllocWithPropertiesKHR*, or a `NULL` pointer. +* `CL_OUT_OF_RESOURCES` if there is a failure to allocate resources required by the OpenCL implementation on the device. +* `CL_OUT_OF_HOST_MEMORY` if there is a failure to allocate resources required by the OpenCL implementation on the host. + +By default, *clSVMFreeWithPropertiesKHR* does not wait for previously enqueued commands that may be using _ptr_ to finish before freeing _ptr_. +It is the responsibility of the application to make sure enqueued commands that use _ptr_ are complete before freeing _ptr_. +Behavior is undefined if a previously enqueued command that may be using _ptr_ is still executing. +Applications should take particular care freeing memory allocations with kernels that may access memory indirectly, since a kernel that accesses memory indirectly may be using any memory allocation of the specified type or types. +To wait for previously enqueued commands that may be using _ptr_ to finish before freeing _ptr_, use the flag `CL_SVM_FREE_BLOCKING_KHR`. + +[[svm-free-flags-table]] +[caption="Table 40. "] +.List of supported SVM free flag values +[width="100%",cols="1,1",options="header"] +|==== +| SVM Free Flags | Description +| `CL_SVM_FREE_BLOCKING_KHR` + | Waits for all previously executing commands that may be using the SVM allocation to finish before freeing the SVM allocation. 
+|==== + +===== Querying SVM Allocations + +The function + +[source] +---- +cl_int clGetSVMPointerInfoKHR( + cl_context context, + cl_device_id device, + const void* ptr, + cl_svm_pointer_info_khr param_name, + size_t param_value_size, + void* param_value, + size_t* param_value_size_ret); +---- + +queries information about an SVM allocation. + +_context_ is a valid OpenCL context to query for information about the SVM allocation. + +_device_ is an optional OpenCL device handle to query for information about the SVM allocation. +If _device_ is `NULL`, the default device is the device associated with the SVM allocation, or all devices in the _context_ if there is no device associated with the SVM allocation. + +_ptr_ is a pointer into an SVM allocation to query. +_ptr_ need not be a value returned by *clSVMAlloc* or *clSVMAllocWithPropertiesKHR*, but the query may be faster if it is. + +_param_name_ specifies the information to query. +The list of supported _param_name_ values and the information returned in _param_value_ is described in the <> table. + +_param_value_ is a pointer to memory where the appropriate result being queried is returned. +If _param_value_ is `NULL`, it is ignored. + +_param_value_size_ specifies the size in bytes of memory pointed to by _param_value_. +This size must be greater than or equal to the size of return type as described in the <> table. +If _param_value_ is `NULL`, it is ignored. + +_param_value_size_ret_ returns the actual size in bytes of data being queried by _param_name_. +If _param_value_size_ret_ is `NULL`, it is ignored. + +*clGetSVMPointerInfoKHR* returns `CL_SUCCESS` if the function is executed successfully. +Otherwise, it will return one of the following error values: + +* `CL_INVALID_CONTEXT` if _context_ is not a valid context. +* `CL_INVALID_DEVICE` if _device_ is not a valid device or is not associated with _context_. +* `CL_INVALID_VALUE` if _param_name_ is not a valid SVM allocation query. 
+* `CL_INVALID_VALUE` if _param_value_ is not `NULL` and _param_value_size_ is smaller than the size of the query return type. +* `CL_OUT_OF_RESOURCES` if there is a failure to allocate resources required by the OpenCL implementation on the device. +* `CL_OUT_OF_HOST_MEMORY` if there is a failure to allocate resources required by the OpenCL implementation on the host. + +[[svm-queries-table]] +.List of supported param_names by clGetSVMPointerInfoKHR +[width="100%",cols="<34%,<33%,<33%",options="header"] +|==== +| *cl_svm_pointer_info_khr* | Return type | Info. returned in _param_value_ +| `CL_SVM_INFO_TYPE_INDEX_KHR` + | `cl_uint` + | Returns the SVM type index used to allocate the SVM allocation. + + Returns `CL_UINT_MAX` if _ptr_ does not point into an SVM allocation returned from *clSVMAllocWithPropertiesKHR* or *clSVMAlloc* for _context_. + +| `CL_SVM_INFO_CAPABILITIES_KHR` + | `cl_svm_capabilities_khr` + | Returns the SVM capabilities for the SVM allocation for the specified _device_. + If _device_ is `NULL` and there is a device associated with the SVM allocation, returns the SVM capabilities for the device associated with the SVM allocation. + If _device_ is `NULL` and there is no device associated with the SVM allocation, returns the SVM capabilities for all devices in _context_ supporting the SVM allocation. + + Returns `0` if _ptr_ does not point into an SVM allocation returned from *clSVMAllocWithPropertiesKHR* or *clSVMAlloc* for _context_. + +| `CL_SVM_INFO_PROPERTIES_KHR` + | `cl_svm_alloc_properties_khr` + | Returns the properties argument specified in *clSVMAllocWithPropertiesKHR* when _ptr_ was allocated. + + If the properties argument specified in *clSVMAllocWithPropertiesKHR* was not `NULL`, the implementation must return the values specified in the properties argument in the same order and without including additional properties. 
+ + If the properties argument specified in *clSVMAllocWithPropertiesKHR* was `NULL`, or if _ptr_ was allocated using *clSVMAlloc*, or if _ptr_ does not point into an SVM allocation returned from *clSVMAllocWithPropertiesKHR* or *clSVMAlloc* for _context_, the implementation must return _param_value_size_ret_ equal to `0`, indicating that there are no properties to be returned. + +| `CL_SVM_INFO_ACCESS_FLAGS_KHR` + | `cl_svm_alloc_access_flags_khr` + | Returns access flags for the SVM allocation, specified by the `CL_SVM_ALLOC_ACCESS_FLAGS_KHR` property. + + Returns `0` if the `CL_SVM_ALLOC_ACCESS_FLAGS_KHR` property was not specified when _ptr_ was allocated, or if _ptr_ does not point into an SVM allocation returned from *clSVMAllocWithPropertiesKHR* or *clSVMAlloc* for _context_. + + TODO: Check if `0` is the right default in all cases. + If _ptr_ does not point into an SVM allocation returned from *clSVMAllocWithPropertiesKHR* or *clSVMAlloc* for _context_ should we return `NOREAD | NOWRITE` instead? + What if _device_ is different than the device associated with the SVM allocation? + +| `CL_SVM_INFO_BASE_PTR_KHR` + | `void*` + | Returns the base address of the SVM allocation. + + Returns `NULL` if _ptr_ does not point into an SVM allocation returned from *clSVMAllocWithPropertiesKHR* or *clSVMAlloc* for _context_. + +| `CL_SVM_INFO_SIZE_KHR` + | `size_t` + | Returns the size in bytes of the SVM allocation. + + Returns `0` if _ptr_ does not point into an SVM allocation returned from *clSVMAllocWithPropertiesKHR* or *clSVMAlloc* for _context_. + +| `CL_SVM_INFO_ASSOCIATED_DEVICE_HANDLE_KHR` + | `cl_device_id` + | Returns the device associated with the SVM allocation. + + Returns `NULL` if the SVM allocation has no associated device handle, or if _ptr_ does not point into an SVM allocation returned from *clSVMAllocWithPropertiesKHR* or *clSVMAlloc* for _context_. 
+|==== + +===== Suggesting an SVM Type + +The function + +[source] +---- +cl_int clGetSVMSuggestedTypeIndexKHR( + cl_context context, + cl_svm_capabilities_khr required_capabilities, + cl_svm_capabilities_khr desired_capabilities, + const cl_svm_alloc_properties_khr* properties, + size_t size, + cl_uint* suggested_svm_type_index); +---- + +suggests an SVM allocation type that meets the required SVM capabilities. + +_context_ is a valid OpenCL context to query. + +_required_capabilities_ specifies SVM capabilities that must be supported by the suggested SVM type. + +_desired_capabilities_ specifies additional desired SVM capabilities that may influence the suggested SVM type, but that may not be supported by the suggested SVM type. +_desired_capabilities_ may be zero if no capabilities are desired other than those specified by _required_capabilities_. + +_properties_ is an optional list of allocation properties and their corresponding values. +The list is terminated with the special property `0`. +If no allocation properties are required, _properties_ may be `NULL`. +Please refer to the <> table for valid SVM allocation properties and their description. + +_size_ is the size in bytes for the suggestion. +If _size_ is `0`, it is ignored. + +_suggested_svm_type_index_ is a pointer that will contain the result of the query. +The suggested SVM type may be `CL_UINT_MAX`, indicating that there is no SVM allocation type for the _context_ and devices in _device_list_ that support the _required_capabilities_. + +*clGetSVMSuggestedTypeIndexKHR* returns `CL_SUCCESS` if the query executed successfully. Otherwise, it returns one of the following errors: + +* `CL_INVALID_CONTEXT` if _context_ is not a valid context. +* `CL_INVALID_PROPERTY` if a memory property name in _properties_ is not a supported property name, if the value specified for a supported property name is not valid, or if the same property name is specified more than once. 
+* `CL_INVALID_VALUE` if _required_capabilities_ or _desired_capabilities_ contains an invalid SVM capability. +* `CL_INVALID_BUFFER_SIZE` if _size_ is greater than `CL_DEVICE_MAX_MEM_ALLOC_SIZE` for any OpenCL device in _context_ or if _size_ is greater than `CL_DEVICE_MAX_MEM_ALLOC_SIZE` for a device associated with the SVM allocation. +TODO: update depending on the updated queries for available SVM sizes. +* `CL_INVALID_VALUE` if _suggested_svm_type_index_ is `NULL`. +* `CL_OUT_OF_RESOURCES` if there is a failure to allocate resources required by the OpenCL implementation on the device. +* `CL_OUT_OF_HOST_MEMORY` if there is a failure to allocate resources required by the OpenCL implementation on the host. + +===== Using SVM with Kernels + +SVM allocations may be accessed by kernels indirectly, without passing a pointer to the allocation as a kernel argument. +The new _param_name_ values described below may be used with the existing *clSetKernelExecInfo* function to describe how SVM allocations are accessed indirectly by a kernel: + +[caption="Table 28. "] +.List of supported param_names by clSetKernelExecInfo +[width="100%",cols="<34%,<33%,<33%",options="header"] +|==== +| *cl_kernel_exec_info* | Type | Description +| `CL_KERNEL_EXEC_INFO_SVM_INDIRECT_ACCESS_KHR` + | `cl_bool` + | Specifies whether SVM allocations from *clSVMAlloc* or *clSVMAllocWithPropertiesKHR* may be accessed indirectly within a kernel. + + When `CL_KERNEL_EXEC_INFO_SVM_INDIRECT_ACCESS_KHR` is `CL_FALSE`, the kernel may only access SVM allocations from *clSVMAlloc* or *clSVMAllocWithPropertiesKHR* that are explicitly passed as kernel arguments or using `CL_KERNEL_EXEC_INFO_SVM_PTRS`. + + When `CL_KERNEL_EXEC_INFO_SVM_INDIRECT_ACCESS_KHR` is `CL_TRUE`, the kernel may access any SVM pointers allocated by *clSVMAlloc* or *clSVMAllocWithPropertiesKHR* on any device where the SVM allocation type includes `CL_SVM_CAPABILITY_INDIRECT_ACCESS_KHR`. 
+ + By default, indirect access is disabled for all SVM allocations (except fine-grain system SVM allocations, see `CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM`), indicating that the kernel will only access SVM allocations that are explicitly passed as kernel arguments or using `CL_KERNEL_EXEC_INFO_SVM_PTRS`. +|==== + +The following errors may be returned by *clSetKernelExecInfo* for these new _param_name_ values: + +* `CL_INVALID_OPERATION` if _param_name_ is `CL_KERNEL_EXEC_INFO_SVM_INDIRECT_ACCESS_KHR` and no devices in the context associated with _kernel_ support SVM. + +== Interactions with Other Extensions + +TODO + +`cl_intel_unified_shared_memory`: + +* Need to document interaction with individual indirect access enable flags. +* Plus more interactions. + +Interactions with command buffers? + +== Issues + +. Is there a minimum supported granularity for concurrent access? For example, might it be possible to concurrently access different pages of an allocation, but not different bytes within the same page? ++ +-- +*UNRESOLVED*: +Need to solve now. +Check the Vulkan query for `nonCoherentAtomSize`. +-- + +. What other SVM allocation properties should we support? ++ +-- +`RESOLVED`: We decided not to accept any `cl_mem_flags` or `cl_svm_mem_flags`, and added access properties instead. +-- + +. Do we need separate "concurrent access" capabilities for host access vs. device access? ++ +-- +`RESOLVED`: +The initial version of this extension will only have a single capability for all types of concurrent access. +-- + +. What would we need to add to support system allocations? ++ +-- +`RESOLVED`: No longer applicable. +-- + +. Do we need the ability to "register" or "use" existing host allocations? ++ +-- +`RESOLVED`: +The initial version of this extension will only support allocating host memory. +-- + +. Do we want to support both a _flags_ argument and a _properties_ argument to the USM allocation APIs?
++ +-- +`RESOLVED`: No, we will not support a _flags_ argument, and we will only support _properties_. +-- + +. What should the behavior be for *clGetSVMPointerInfoKHR* if the passed-in _ptr_ is `NULL` or doesn't point into an SVM allocation? ++ +-- +`RESOLVED`: The behavior is defined for all queries for this case. +-- + +. Do we want separate "memset" APIs to set to different sized "value", such as 8-bits, 16-bits, 32-bits, or others? Do we want to go back to a "fill" API? ++ +-- +`RESOLVED`: We are reusing the "fill" API. +-- + +. What are the restrictions for the _dst_ptr_ values that can be passed to the "fill" API? ++ +-- +*UNRESOLVED*: +Need to close on: + +* Can a device "fill" another device's allocation? (Recommendation: Yes, if accessible.) +* Can a device "fill" arbitrary host memory? (Recommendation: Maybe?) +* Can a device "fill" a USM allocation from another context? (Recommendation: No.) +-- + +. What are the restrictions for the _src_ptr_ and _dst_ptr_ values that can be passed to the "memcpy" API? ++ +-- +*UNRESOLVED*: +Need to close on: + +* Can a device "memcpy" from another device's allocation? +* Can a device "memcpy" to another device's allocation? +* Can a device "memcpy" to or from a USM allocation in another context? (Recommendation: No?) +* Can a device "memcpy" to arbitrary host memory? (Recommendation: Yes.) +* Can a device "memcpy" from arbitrary host memory? (Recommendation: Yes.) +* Can a device "memcpy" from arbitrary host memory to arbitrary host memory? (Recommendation: Yes.) +* Can the memory region to copy to overlap the memory region to copy from? (Recommendation: No.) +-- + +. Do we want to support migrating to devices other than the device associated with _command_queue_? ++ +-- +`RESOLVED`: +The initial version of this extension will not extend *clEnqueueSVMMigrateMem*, and hence will only support migrating to the device or to the host. +-- + +. Should we support migrating an array of pointers with one API call?
++ +-- +`RESOLVED`: This is supported by *clEnqueueSVMMigrateMem*. +-- + +. Could the associated device be `NULL` if there is no need to associate a shared allocation with a specific device? ++ +-- +`RESOLVED`: Yes, the associated device may be `NULL`, if the SVM type supports the `CL_SVM_CAPABILITY_DEVICE_UNASSOCIATED_KHR` capability. +-- + +. Should we allow querying the associated device for a USM allocation using *clGetSVMPointerInfoKHR*? ++ +-- +`RESOLVED`: Yes, we should. +-- + +. Should we add explicit mem alloc flags for `CACHED` and `UNCACHED`? ++ +-- +*UNRESOLVED*: +Could be specific capabilities rather than mem alloc flags. +Solve (or at least have explored a layered extension) for the final spec. +-- + +. At least for HOST and SHARED allocations, should we have separate mem alloc flags for the host and the device? ++ +-- +`RESOLVED`: We removed the _flags_ argument entirely. +-- + +. What are invalid values for `ptr` and `size` for *clEnqueueSVMMigrateMem*? +How about *clEnqueueSVMMemFill* and *clEnqueueSVMMemcpy*? +Specifically, is `NULL` a valid value for `ptr`? +Is `size` equal to zero valid? ++ +-- +*UNRESOLVED*: +-- + +. Should we add a device query for a maximum supported SVM alignment, or should the maximum supported alignment implicitly be defined by the size of the largest data type supported by the device? +Should we allow implementation-defined behavior for alignments larger than the size of the largest data type supported by the device? ++ +-- +*UNRESOLVED*: +A device query would allow for larger supported alignments, such as page alignment. +Note that supported alignments should always be a power of two. + +Note that there are no maximum supported alignments defined for `posix_memalign` or `_aligned_malloc`, and supported alignments for the standard `aligned_alloc` and `std::aligned_alloc` are implementation-defined. + +Suggest adding a device query and using it to determine when the error for an unsupported alignment is returned. +-- + +.
Should we add a device query for a maximum supported SVM fill pattern size, or should the maximum supported fill pattern size implicitly be defined by the size of the largest data type supported by the device? ++ +-- +`RESOLVED`: +The initial version of this extension will not support larger fill patterns. +-- + +. Can a pointer to a device, host, or shared SVM allocation be used to create a `cl_mem` using `CL_MEM_USE_HOST_PTR`? ++ +-- +*UNRESOLVED*: +Trending "no" in all cases. +If the SVM allocation is from the same context this could be an error, such as `CL_INVALID_HOST_PTR`. +If the SVM allocation is from a different context then behavior could be undefined. +-- + +. Can a pointer to a device, host, or shared SVM allocation be used to create a `cl_mem` buffer using `CL_MEM_COPY_HOST_PTR`? ++ +-- +*UNRESOLVED*: +Trending "no" for device and shared USM allocations. +If the USM allocation is from the same context this could be an error, such as `CL_INVALID_HOST_PTR`. +If the USM allocation is from a different context then behavior could be undefined. + +Trending "yes" for host USM allocations, both when the host USM allocation is from this context and from another context. +-- + +. Can a pointer to a device, host, or shared SVM allocation be passed to API functions to read from or write to `cl_mem` objects, such as *clEnqueueReadBuffer* or *clEnqueueWriteImage*? ++ +-- +*UNRESOLVED*: +Trending "yes" for device SVM allocations, so long as the device SVM allocation is accessible by the device associated with the command-queue, and the device allocation was made against the context associated with the command-queue. + +Trending "yes" for host USM allocations, both when the host USM allocation is from this context and from another context. + +Trending "no" for shared USM allocations. +If the shared USM allocation is from the same context this could be an error, such as `CL_INVALID_HOST_PTR`. 
+If the shared USM allocation is from a different context then behavior could be undefined. +-- + +. Can a pointer to a device, host, or shared USM allocation be passed to API functions to fill a `cl_mem`, SVM allocation, or USM allocation, such as *clEnqueueFillBuffer*? ++ +-- +*UNRESOLVED*: +Trending "no" for device and shared allocations. +If the USM allocation is from the same context this could be an error, such as `CL_INVALID_HOST_PTR`. +If the USM allocation is from a different context then behavior could be undefined. + +Trending "yes" for host USM allocations, both when the host USM allocation is from this context and from another context. +-- + +. Should we support passing traditional `cl_mem_flags` via the USM allocation properties? ++ +-- +*UNRESOLVED*: +Trending "no", this functionality is better expressed by optional access properties. +-- + +. Exactly how do the additional SVM types affect the memory model? ++ +-- +*UNRESOLVED*: +-- + +. Should it be an error to set an unknown pointer as a kernel argument using *clSetKernelArgSVMPointer* if no devices support shared system allocations? ++ +-- +*UNRESOLVED*: +Returning an error for an unknown pointer is helpful to identify and diagnose possible programming errors sooner, but passing a pointer to arbitrary memory to a function on the host is not an error until the pointer is dereferenced. + +If we relax the error condition for *clSetKernelArgSVMPointer* then we could also consider relaxing the error condition for *clSetKernelExecInfo*(`CL_KERNEL_EXEC_INFO_SVM_PTRS`) similarly. + +Note that if the error condition is removed we can still check for possible programming errors via optional USM checking layers, such as the https://github.com/intel/opencl-intercept-layer/blob/master/docs/controls.md#usmchecking-bool[USMChecking] functionality in the https://github.com/intel/opencl-intercept-layer[OpenCL Intercept Layer]. +-- + +. Should we support a "rect" memcpy similar to *clEnqueueCopyBufferRect*? 
++ +-- +*UNRESOLVED*: +This would be a fairly straightforward addition if it is useful. +-- + +. Should there be an upper limit on the size of an SVM allocation? +If so, what should the upper limit be? ++ +-- +*UNRESOLVED*: +The upper limit is currently defined by `CL_DEVICE_MAX_MEM_ALLOC_SIZE` and if the allocation size exceeds this value then the error code `CL_INVALID_BUFFER_SIZE` is returned. + +This behavior is consistent with *clSVMAlloc* (although *clSVMAlloc* does not return an error code it is specified to return a `NULL` pointer in this case) and *clCreateBuffer*. +However, for host allocations, some implementations are able to support larger allocation sizes. + +Possible resolutions: + +* Add a new query representing the maximum host memory allocation size supported by the device, e.g. `CL_DEVICE_MAX_HOST_MEM_ALLOC_SIZE_KHR`. +For some devices, this query will return the same value as `CL_DEVICE_MAX_MEM_ALLOC_SIZE`, but for other devices this query will return a larger value. +* Relax the error behavior so implementations may return `CL_INVALID_BUFFER_SIZE`, but they would not be required to return an error if they support larger allocation sizes. +* Do nothing and keep the existing error behavior. +-- + +. Should it be an error to allocate zero bytes? ++ +-- +*UNRESOLVED*: +Currently, attempting to allocate zero bytes fails and returns `CL_INVALID_BUFFER_SIZE`. +This is consistent with SVM, where *clSVMAlloc* fails and returns a `NULL` pointer if the size to allocate is zero. +It is also consistent with CUDA, where *cuMemAlloc*, etc. returns an error if the size to allocate is zero. + +However, it is not necessarily consistent with other memory allocation functions. For example: + +* The result of calling `malloc(0)` is implementation-defined: it can either return a `NULL` pointer or a unique non-null pointer that must be freed. +If a `NULL` pointer is returned then `errno` may be set to an implementation-defined value. 
+If a unique non-null pointer is returned then it cannot be dereferenced. +* Allocating an array of zero elements using `new` must return a non-null pointer, though dereferencing the pointer is undefined. + +Possible resolutions: + +* Allow zero-sized allocations and require returning a non-null pointer that must be freed. +* Allow zero-sized allocations but allow returning a `NULL` pointer. No error would be generated, even if a `NULL` pointer is returned. +* Specify that this case is implementation-defined. +* Do nothing and keep the existing error behavior. +-- + +Note: The following issues were added to the KHR USM extension: + +[start=30] +. Should we add a synchronous memadvise function? Do we need to support memadvise at all? ++ +-- +*RESOLVED*: +We decided not to support a memadvise function in the initial version of this specification. + +For reference, for other APIs: +* The Level Zero memadvise function `zeCommandListAppendMemAdvise` appears to be asynchronous, but the implementation actually seems to be synchronous. +* It is unclear whether the CUDA memadvise functions `cudaMemAdvise` / `cuMemAdvise` are synchronous or asynchronous. + +-- + +. What about devices and sub-devices? ++ +-- +*UNRESOLVED*: +-- + +. Should we move more of the *clSVMAllocWithPropertiesKHR* arguments to properties? ++ +-- +*RESOLVED*: +We moved the access flags and alignment to properties, so the only required arguments are now the properties, the SVM type index, and the SVM allocation size. +-- + +. Does the *clGetSVMSuggestedTypeIndexKHR* query apply to _all_ of the devices in the device list or context, or to _any_ of the devices in the device list or context? ++ +-- +*UNRESOLVED*: The query should probably apply to _all_ of the devices in the device list or context, though other interpretations may make sense in some cases. + +This is especially important if the required SVM capabilities contain e.g. "device owned". +-- + +.
Should we support a mechanism to enable indirect access for all SVM allocation types with a single call? ++ +-- +*RESOLVED*: Yes, we should. We now have: + +* `CL_KERNEL_EXEC_INFO_SVM_INDIRECT_ACCESS_KHR`, added by this extension, which enables indirect access for all SVM allocations made through the driver (by calling *clSVMAlloc* or *clSVMAllocWithPropertiesKHR*). +Indirect access for these types of allocations is **disabled** by default. +* `CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM`, already in the core specification, which enables indirect access for SVM allocations made using a system allocator. +Indirect access for these types of allocations is **enabled** by default, though it is ignored for devices that do not support system SVM. +-- + +. How should an SVM allocation with the access flag *NOWRITE* be initialized? ++ +-- +*RESOLVED*: For this extension, if an allocation is created with the *HOST_NOWRITE* flag, then it can only be initialized on the device. +If an allocation is created with the *DEVICE_NOWRITE* flag, then it can only be initialized on the host. +This extension does not support initializing an allocation with both the *HOST_NOWRITE* and *DEVICE_NOWRITE* flags. + +If desired, a layered extension could add a new property to *clSVMAllocWithPropertiesKHR* that would specify a pointer with the initial contents of an SVM allocation with both the *HOST_NOWRITE* and *DEVICE_NOWRITE* access flags. +-- + +== Revision History + +[cols="5,15,15,70"] +[grid="rows"] +[options="header"] +|======================================== +|Version|Date|Author|Changes +|0.2.0|2024-10-29|Ben Ashbaugh|Initial public revision. +|======================================== + +//************************************************************************ +//Other formatting suggestions: +// +//* Use *bold* text for host APIs, or [source] syntax highlighting. +//* Use `mono` text for device APIs, or [source] syntax highlighting.
+//* Use `mono` text for extension names, types, or enum values. +//* Use _italics_ for parameters. +//************************************************************************ diff --git a/xml/cl.xml b/xml/cl.xml index 33f45ce82..ea7fb977b 100644 --- a/xml/cl.xml +++ b/xml/cl.xml @@ -255,6 +255,12 @@ server's OpenCL/api-docs repository. typedef cl_bitfield cl_platform_command_buffer_capabilities_khr; typedef cl_bitfield cl_mutable_dispatch_asserts_khr typedef cl_bitfield cl_device_kernel_clock_capabilities_khr; + typedef cl_bitfield cl_svm_capabilities_khr; + typedef cl_properties cl_svm_alloc_properties_khr; + typedef cl_bitfield cl_svm_alloc_access_flags_khr; + typedef cl_properties cl_svm_free_properties_khr; + typedef cl_bitfield cl_svm_free_flags_khr; + typedef cl_uint cl_svm_pointer_info_khr; Structure types @@ -382,6 +388,70 @@ server's OpenCL/api-docs repository. const size_t* global_work_size const size_t* local_work_size + + #define CL_SVM_TYPE_MACRO_COARSE_GRAIN_BUFFER_KHR \ + (CL_SVM_CAPABILITY_SINGLE_ADDRESS_SPACE_KHR | \ + CL_SVM_CAPABILITY_DEVICE_UNASSOCIATED_KHR | \ + CL_SVM_CAPABILITY_CONTEXT_ACCESS_KHR | \ + CL_SVM_CAPABILITY_HOST_MAP_KHR | \ + CL_SVM_CAPABILITY_DEVICE_READ_KHR | \ + CL_SVM_CAPABILITY_DEVICE_WRITE_KHR | \ + CL_SVM_CAPABILITY_DEVICE_ATOMIC_ACCESS_KHR) + + #define CL_SVM_TYPE_MACRO_FINE_GRAIN_BUFFER_KHR \ + (CL_SVM_CAPABILITY_SINGLE_ADDRESS_SPACE_KHR | \ + CL_SVM_CAPABILITY_DEVICE_UNASSOCIATED_KHR | \ + CL_SVM_CAPABILITY_CONTEXT_ACCESS_KHR | \ + CL_SVM_CAPABILITY_HOST_READ_KHR | \ + CL_SVM_CAPABILITY_HOST_WRITE_KHR | \ + CL_SVM_CAPABILITY_HOST_MAP_KHR | \ + CL_SVM_CAPABILITY_DEVICE_READ_KHR | \ + CL_SVM_CAPABILITY_DEVICE_WRITE_KHR | \ + CL_SVM_CAPABILITY_DEVICE_ATOMIC_ACCESS_KHR | \ + CL_SVM_CAPABILITY_CONCURRENT_ACCESS_KHR) + + #define CL_SVM_TYPE_MACRO_DEVICE_KHR \ + (CL_SVM_CAPABILITY_SINGLE_ADDRESS_SPACE_KHR | \ + CL_SVM_CAPABILITY_DEVICE_OWNED_KHR | \ + CL_SVM_CAPABILITY_DEVICE_READ_KHR | \ + CL_SVM_CAPABILITY_DEVICE_WRITE_KHR 
| \ + CL_SVM_CAPABILITY_DEVICE_ATOMIC_ACCESS_KHR | \ + CL_SVM_CAPABILITY_INDIRECT_ACCESS_KHR) + + #define CL_SVM_TYPE_MACRO_HOST_KHR \ + (CL_SVM_CAPABILITY_SINGLE_ADDRESS_SPACE_KHR | \ + CL_SVM_CAPABILITY_DEVICE_UNASSOCIATED_KHR | \ + CL_SVM_CAPABILITY_CONTEXT_ACCESS_KHR | \ + CL_SVM_CAPABILITY_HOST_OWNED_KHR | \ + CL_SVM_CAPABILITY_HOST_READ_KHR | \ + CL_SVM_CAPABILITY_HOST_WRITE_KHR | \ + CL_SVM_CAPABILITY_DEVICE_READ_KHR | \ + CL_SVM_CAPABILITY_DEVICE_WRITE_KHR | \ + CL_SVM_CAPABILITY_INDIRECT_ACCESS_KHR) + + #define CL_SVM_TYPE_MACRO_SINGLE_DEVICE_SHARED_KHR \ + (CL_SVM_CAPABILITY_SINGLE_ADDRESS_SPACE_KHR | \ + CL_SVM_CAPABILITY_HOST_READ_KHR | \ + CL_SVM_CAPABILITY_HOST_WRITE_KHR | \ + CL_SVM_CAPABILITY_DEVICE_READ_KHR | \ + CL_SVM_CAPABILITY_DEVICE_WRITE_KHR | \ + CL_SVM_CAPABILITY_INDIRECT_ACCESS_KHR) + + #define CL_SVM_TYPE_MACRO_SYSTEM_KHR \ + (CL_SVM_CAPABILITY_SINGLE_ADDRESS_SPACE_KHR | \ + CL_SVM_CAPABILITY_SYSTEM_ALLOCATED_KHR | \ + CL_SVM_CAPABILITY_DEVICE_UNASSOCIATED_KHR | \ + CL_SVM_CAPABILITY_CONTEXT_ACCESS_KHR | \ + CL_SVM_CAPABILITY_HOST_READ_KHR | \ + CL_SVM_CAPABILITY_HOST_WRITE_KHR | \ + CL_SVM_CAPABILITY_HOST_MAP_KHR | \ + CL_SVM_CAPABILITY_DEVICE_READ_KHR | \ + CL_SVM_CAPABILITY_DEVICE_WRITE_KHR | \ + CL_SVM_CAPABILITY_DEVICE_ATOMIC_ACCESS_KHR | \ + CL_SVM_CAPABILITY_CONCURRENT_ACCESS_KHR | \ + CL_SVM_CAPABILITY_CONCURRENT_ATOMIC_ACCESS_KHR | \ + CL_SVM_CAPABILITY_INDIRECT_ACCESS_KHR) + @@ -1233,12 +1303,40 @@ server's OpenCL/api-docs repository. + + + + + + + + + + + + + + + + + + + + + + + + + + + + @@ -1253,6 +1351,7 @@ server's OpenCL/api-docs repository. + @@ -1384,6 +1483,11 @@ server's OpenCL/api-docs repository. + + + + + In order to synchronize vendor IDs across Khronos APIs, Vulkan's vk.xml @@ -1411,7 +1515,8 @@ server's OpenCL/api-docs repository. - + + @@ -1544,7 +1649,8 @@ server's OpenCL/api-docs repository. - + + @@ -1723,7 +1829,8 @@ server's OpenCL/api-docs repository. 
- + + @@ -1887,7 +1994,18 @@ server's OpenCL/api-docs repository. - + + + + + + + + + + + + @@ -2192,8 +2310,11 @@ server's OpenCL/api-docs repository. + + + @@ -3313,6 +3434,40 @@ server's OpenCL/api-docs repository. cl_mem buffer cl_mem content_size_buffer + + void* clSVMAllocWithPropertiesKHR + cl_context context + const cl_svm_alloc_properties_khr* properties + cl_uint svm_type_index + size_t size + cl_int* errcode_ret + + + cl_int clSVMFreeWithPropertiesKHR + cl_context context + const cl_svm_free_properties_khr* properties + cl_svm_free_flags_khr flags + void* ptr + + + cl_int clGetSVMPointerInfoKHR + cl_context context + cl_device_id device + const void* ptr + cl_svm_pointer_info_khr param_name + size_t param_value_size + void* param_value + size_t* param_value_size_ret + + + cl_int clGetSVMSuggestedTypeIndexKHR + cl_context context + cl_svm_capabilities_khr required_capabilities + cl_svm_capabilities_khr desired_capabilities + const cl_svm_alloc_properties_khr* properties + size_t size + cl_uint* suggested_svm_type_index + cl_int clGetPlatformIDs cl_uint num_entries @@ -7497,5 +7652,80 @@ server's OpenCL/api-docs repository. 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + From f371ea31416962c4cded5660158b5e4c09a94b64 Mon Sep 17 00:00:00 2001 From: Ben Ashbaugh Date: Thu, 7 Nov 2024 17:43:31 -0800 Subject: [PATCH 02/18] fix asciidoctor build error --- extensions/cl_khr_unified_svm.asciidoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/extensions/cl_khr_unified_svm.asciidoc b/extensions/cl_khr_unified_svm.asciidoc index 0a735aed9..e47323e38 100644 --- a/extensions/cl_khr_unified_svm.asciidoc +++ b/extensions/cl_khr_unified_svm.asciidoc @@ -637,7 +637,7 @@ Otherwise, it will return one of the following error values: Returns `0` if the `CL_SVM_ALLOC_ACCESS_FLAGS_KHR` property was not specified when _ptr_ was allocated, or if _ptr_ does not point into an SVM allocation returned from *clSVMAllocWithPropertiesKHR* or *clSVMAlloc* for _context_. TODO: Check if `0` is the right default in all cases. - If _ptr_ does not point into an SVM allocation returned from *clSVMAllocWithPropertiesKHR* or *clSVMAlloc* for _context_ should we return `NOREAD | NOWRITE` instead? + If _ptr_ does not point into an SVM allocation returned from *clSVMAllocWithPropertiesKHR* or *clSVMAlloc* for _context_ should we return `NOREAD \| NOWRITE` instead? What if _device_ is different than the device associated with the SVM allocation? 
| `CL_SVM_INFO_BASE_PTR_KHR` From 24792fb2c981a7cfd7947cc1552c7235a20ca0c3 Mon Sep 17 00:00:00 2001 From: Ben Ashbaugh Date: Mon, 18 Nov 2024 17:44:18 -0800 Subject: [PATCH 03/18] editorial updates --- extensions/cl_khr_unified_svm.asciidoc | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/extensions/cl_khr_unified_svm.asciidoc b/extensions/cl_khr_unified_svm.asciidoc index e47323e38..88ac3c244 100644 --- a/extensions/cl_khr_unified_svm.asciidoc +++ b/extensions/cl_khr_unified_svm.asciidoc @@ -311,7 +311,7 @@ Add to Table 5 - List of supported param_names by *clGetDeviceInfo*: | `CL_SVM_CAPABILITY_DEVICE_READ_KHR` | This type of SVM is accessible on the device for reading. | `CL_SVM_CAPABILITY_DEVICE_WRITE_KHR` - | This type of SVM is writeable on the device for writing. + | This type of SVM is accessible on the device for writing. | `CL_SVM_CAPABILITY_DEVICE_ATOMIC_ACCESS_KHR` | This type of SVM is accessible on the device using atomic built-in functions. | `CL_SVM_CAPABILITY_CONCURRENT_ACCESS_KHR` @@ -922,7 +922,7 @@ If the shared USM allocation is from the same context this could be an error, su If the shared USM allocation is from a different context then behavior could be undefined. -- -. Can a pointer to a device, host, or shared USM allocation be passed to API functions to fill a `cl_mem`, SVM allocation, or USM allocation, such as *clEnqueueFillBuffer*? +. Can a pointer to a device, host, or shared USM allocation be passed as the `pattern` argument to API functions to fill a `cl_mem`, SVM allocation, or USM allocation, such as *clEnqueueFillBuffer*? + -- *UNRESOLVED*: @@ -936,8 +936,7 @@ Trending "yes" for host USM allocations, both when the host USM allocation is fr . Should we support passing traditional `cl_mem_flags` via the USM allocation properties? + -- -*UNRESOLVED*: -Trending "no", this functionality is better expressed by optional access properties. 
+`RESOLVED`: We decided not to accept any `cl_mem_flags` or `cl_svm_mem_flags`, and added access properties instead. -- . Exactly how do the additional SVM types affect the memory model? @@ -1011,7 +1010,7 @@ Note: The following issues were added to the KHR USM extension: . Should we add a synchronous memadvise function? Do we need to support memadvise at all? + -- -*RESOLVED*: +`RESOLVED`: We decided not to support a memadvise function in the initial version of this specification. For reference, for other APIs: @@ -1029,7 +1028,7 @@ For reference, for other APIs: . Should we move more of the *clSVMAllocWithPropertiesKHR* arguments to properties? + -- -*RESOLVED*: +`RESOLVED`: We moved the access flags and alignment to properties, so the only required arguments are now the properties, the SVM type index, and the SVM allocation size. -- @@ -1044,7 +1043,7 @@ This is especially important if the required SVM capabilities contain e.g. "dev . Should we support a mechanism to enable indirect access for all SVM allocation types with a single call? + -- -*RESOLVED*: Yes, we should. We now have: +`RESOLVED`: Yes, we should. We now have: * `CL_KERNEL_EXEC_INFO_SVM_INDIRECT_ACCESS_KHR`, added by this extension, which enables indirect access for all SVM allocations made through the driver (by calling *clSVMAlloc* or *clSVMAllocWithPropertiesKHR*). Indirect access for these types of allocations is **disabled** by default. @@ -1055,7 +1054,7 @@ Indirect access for these types of allocations is **enabled** by default, though . How should an SVM allocation with the access flag *NOWRITE* be initialized? + -- -*RESOLVED*: For this extension, if an allocation is created with the *HOST_NOWRITE* flag, then it can only be initialized on the device. +`RESOLVED`: For this extension, if an allocation is created with the *HOST_NOWRITE* flag, then it can only be initialized on the device. If an allocation is created with the *DEVICE_NOWRITE* flag, then it can only be initialized on the host.
This extension does not support initializing an allocation with both the *HOST_NOWRITE* and *DEVICE_NOWRITE* flags. From 19a81cf0ef177ca043ae0548199978fca0c4eefa Mon Sep 17 00:00:00 2001 From: Ben Ashbaugh Date: Sun, 16 Mar 2025 11:20:44 -0700 Subject: [PATCH 04/18] remove CL_SVM_FREE_BLOCKING_KHR --- extensions/cl_khr_unified_svm.asciidoc | 34 +++++++++----------------- xml/cl.xml | 6 +---- 2 files changed, 12 insertions(+), 28 deletions(-) diff --git a/extensions/cl_khr_unified_svm.asciidoc b/extensions/cl_khr_unified_svm.asciidoc index 88ac3c244..df4ce0014 100644 --- a/extensions/cl_khr_unified_svm.asciidoc +++ b/extensions/cl_khr_unified_svm.asciidoc @@ -198,23 +198,19 @@ typedef cl_bitfield cl_svm_alloc_access_flags_khr; ---- Type to describe optional SVM free properties. -No free properties are added by this extension: +No SVM free properties are added by this extension: [source] ---- typedef cl_properties cl_svm_free_properties_khr; ---- -Type to describe SVM free flags, and SVM free flags added by this extension: +Type to describe SVM free flags. +No SVM free flags are added by this extension: [source] ---- -// TODO: should this be a bitfield, or is this an enum? -// If it is an enum, it should be renamed. typedef cl_bitfield cl_svm_free_flags_khr; - -/* cl_svm_free_flags_khr */ -#define CL_SVM_FREE_BLOCKING_KHR (1 << 0) ---- Enumeration type and values for the _param_name_ parameter to *clGetSVMPointerInfoKHR* to query information about an SVM allocation. @@ -522,10 +518,12 @@ If no free properties are required, _properties_ may be `NULL`. This extension does not define any free properties. _flags_ is used to specify how the SVM allocation is freed. -Please refer to the <> table for valid SVM free flags and their description. +This extension does not define any free flags. _ptr_ is the SVM allocation to free. It must be a value returned by *clSVMAlloc*, *clSVMAllocWithPropertiesKHR*, or a `NULL` pointer.
+It is the responsibility of the application to make sure enqueued commands that use _ptr_ are complete before freeing _ptr_. +Behavior is undefined if a previously enqueued command that may be using _ptr_ is still executing. If _ptr_ is `NULL` then no action occurs. *clSVMFreeWithPropertiesKHR* will return `CL_SUCCESS` if the function executes successfully. @@ -538,21 +536,11 @@ Otherwise, it returns one of the following errors: * `CL_OUT_OF_RESOURCES` if there is a failure to allocate resources required by the OpenCL implementation on the device. * `CL_OUT_OF_HOST_MEMORY` if there is a failure to allocate resources required by the OpenCL implementation on the host. -By default, *clSVMFreeWithPropertiesKHR* does not wait for previously enqueued commands that may be using _ptr_ to finish before freeing _ptr_. -It is the responsibility of the application to make sure enqueued commands that use _ptr_ are complete before freeing _ptr_. -Behavior is undefined if a previously enqueued command that may be using _ptr_ is still executing. -Applications should take particular care freeing memory allocations with kernels that may access memory indirectly, since a kernel that accesses memory indirectly may be using any memory allocation of the specified type or types. -To wait for previously enqueued commands to finish that may be using _ptr_ before freeing _ptr_, use the flag `CL_SVM_FREE_BLOCKING_KHR`. - -[[svm-free-flags-table]] -[caption="Table 40. "] -.List of supported SVM free flag values -[width="100%",cols="1,1",options="header"] -|==== -| SVM Free Flags | Description -| `CL_SVM_FREE_BLOCKING_KHR` - | Waits for all previously executing commands fo finish that may be using the SVM allocation before freeing the SVM allocation. -|==== +[NOTE] +==== +Whether *clSVMFree* or *clSVMFreeWithPropertiesKHR* is blocking or non-blocking is unspecified. 
+Applications should not rely on *clSVMFree* or *clSVMFreeWithPropertiesKHR* for synchronization, nor assume that *clSVMFree* or *clSVMFreeWithPropertiesKHR* cannot cause deadlocks. +==== ===== Querying SVM Allocations diff --git a/xml/cl.xml b/xml/cl.xml index ea7fb977b..b13a3eea5 100644 --- a/xml/cl.xml +++ b/xml/cl.xml @@ -1484,8 +1484,7 @@ server's OpenCL/api-docs repository. - - + @@ -7697,9 +7696,6 @@ server's OpenCL/api-docs repository. - - - From 97479c9d5cfb81b0cf51cc38dde94a59d1218655 Mon Sep 17 00:00:00 2001 From: Ben Ashbaugh Date: Tue, 15 Apr 2025 08:37:52 -0700 Subject: [PATCH 05/18] integrate feedback from IWOCL discussions --- extensions/cl_khr_unified_svm.asciidoc | 42 +++++++++++++++++--------- 1 file changed, 28 insertions(+), 14 deletions(-) diff --git a/extensions/cl_khr_unified_svm.asciidoc b/extensions/cl_khr_unified_svm.asciidoc index df4ce0014..d5d2b2d01 100644 --- a/extensions/cl_khr_unified_svm.asciidoc +++ b/extensions/cl_khr_unified_svm.asciidoc @@ -680,7 +680,7 @@ _size_ is the size in bytes for the suggestion. If _size_ is `0`, it is ignored. _suggested_svm_type_index_ is a pointer that will contain the result of the query. -The suggested SVM type may be `CL_UINT_MAX`, indicating that there is no SVM allocation type for the _context_ and devices in _device_list_ that support the _required_capabilities_. +The suggested SVM type may be `CL_UINT_MAX`, indicating that there is no SVM allocation type for the _context_ and devices in _device_list_ that support the _required_capabilities_ and _properties_. *clGetSVMSuggestedTypeIndexKHR* returns `CL_SUCCESS` if the query executed successfully. Otherwise, it returns one of the following errors: @@ -737,6 +737,7 @@ Interactions with command buffers? *UNRESOLVED*: Need to solve now. Check the Vulkan query for `nonCoherentAtomSize`.
+Basic idea: Add a device query like `CL_DEVICE_SVM_CONCURRENT_ACCESS_ATOM_SIZE_KHR` and concurrent access is allowed if the memory type allows it and the concurrent access is to two different blocks defined by the device query. -- . What other SVM allocation properties should we support? @@ -789,9 +790,19 @@ The initial version of this extension will only support allocating host memory. *UNRESOLVED*: Need to close on: -* Can a device "fill" another device's allocation? (Recommendation: Yes, if accessible.) -* Can a device "fill" arbitrary host memory? (Recommendation: Maybe?) -* Can a device "fill" a USM allocation from another context? (Recommendation: No.) +* Can a device "fill" another device's allocation? (Recommendation: Yes, but finalize as part of multi-device support.) +* Can a device "fill" arbitrary host memory? (No, undefined behavior unless system SVM is supported.) +* Can a device "fill" a USM allocation from another context? (No, contexts provide isolation, this will need export and import.) + +Three possibilities for filling arbitrary host memory: + +1. Allocated using clSVMAlloc from the same context --> good. +2. Allocated using clSVMAlloc from a different context --> undefined behavior. +3. Not allocated using clSVMAlloc --> good iff supports system SVM, else undefined behavior. + +Check whether the existing CTS test supports clEnqueueSVMMemFill on an arbitrary host allocation before resolving. + +Need to reword the existing spec text because it's confusing. -- . What are the restrictions for the _src_ptr_ and _dst_ptr_ values that can be passed to the "memcpy" API? @@ -800,13 +811,15 @@ Need to close on: *UNRESOLVED*: Need to close on: -* Can a device "memcpy" from another device's allocation? -* Can a device "memcpy" to another device's allocation? -* Can a device "memcpy" to or from a USM allocation in another context? (Recommendation: No?) -* Can a device "memcpy" to arbitrary host memory? (Recommendation: Yes.) 
-* Can a device "memcpy" from arbitrary host memory? (Recommendation: Yes.) -* Can a device "memcpy" from arbitrary host memory to arbitrary host memory? (Recommendation: Yes.) -* Can the memory region to copy to overlap the memory region to copy from? (Recommendation: No.) +* Can a device "memcpy" from another device's allocation? (Recommendation: Yes, but finalize as part of multi-device support.) +* Can a device "memcpy" to another device's allocation? (Recommendation: Yes, but finalize as part of multi-device support.) +* Can a device "memcpy" to or from a SVM allocation in another context? (No, undefined behavior.) +* Can a device "memcpy" to arbitrary host memory? (Yes, we already have tests.) +* Can a device "memcpy" from arbitrary host memory? (Yes, we already have tests.) +* Can a device "memcpy" from arbitrary host memory to arbitrary host memory? (Yes, we already have tests.) +* Can the memory region to copy to overlap the memory region to copy from? (No, already an error.) + +Check whether we have a negative test for overlapping regions before resolving. -- . Do we want to support migrating to devices other than the device associated with _command_queue_? @@ -837,9 +850,10 @@ The initial version of this extension will not extend *clEnqueueSVMMigrateMem*, . Should we add explicit mem alloc flags for `CACHED` and `UNCACHED`? + -- -*UNRESOLVED*: -Could be specific capabilities rather than mem alloc flags. -Solve (or at least have explored a layered extension) for the final spec. +`RESOLVED`: +Recommend to solve by adding cacheability properties to a layered extension. +Cacheability properties are preferred vs. capabilities to avoid a combinatorial explosion. +Could have separate cacheability properties for host vs. device, or even for specific cache levels, if desired. -- . At least for HOST and SHARED allocations, should we have separate mem alloc flags for the host and the device? 
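Illustration for the "Basic idea" recorded under the concurrent access issue in this patch: a minimal sketch, assuming the value returned by the proposed `CL_DEVICE_SVM_CONCURRENT_ACCESS_ATOM_SIZE_KHR` query is a power-of-two block size in bytes, that a value of zero means concurrent access is not supported, and that blocks are aligned to their own size. The helper name `svm_ranges_in_distinct_atoms` is hypothetical and is not part of the extension. It only checks the "two different blocks" condition; the separate requirement that the memory type allows concurrent access is not modeled here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not defined by the extension): returns true when the
 * byte ranges [a, a + a_size) and [b, b + b_size) touch no common block of
 * atom_size bytes. atom_size stands in for the value an application would
 * obtain from the proposed device query. */
static bool svm_ranges_in_distinct_atoms(const void *a, size_t a_size,
                                         const void *b, size_t b_size,
                                         size_t atom_size)
{
    /* Assumption: zero means concurrent access is not supported, and any
     * other value must be a power of two. */
    if (atom_size == 0 || (atom_size & (atom_size - 1)) != 0) {
        return false;
    }

    /* Be conservative for empty ranges rather than guessing a rule. */
    if (a_size == 0 || b_size == 0) {
        return false;
    }

    /* Index of the first and last block touched by each range.
     * Address arithmetic is assumed not to wrap around. */
    uintptr_t a_first = (uintptr_t)a / atom_size;
    uintptr_t a_last  = ((uintptr_t)a + (a_size - 1)) / atom_size;
    uintptr_t b_first = (uintptr_t)b / atom_size;
    uintptr_t b_last  = ((uintptr_t)b + (b_size - 1)) / atom_size;

    /* The ranges are in different blocks only if their block index
     * intervals do not intersect. */
    return a_last < b_first || b_last < a_first;
}
```

With an atom size of 64, two 64-byte regions starting at 0x1000 and 0x1040 fall in different blocks, while two 32-byte regions starting at 0x1000 and 0x1020 share one block and would not qualify under this reading.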
From 79f8229c402028df48afe6c64cff2f718dadae6a Mon Sep 17 00:00:00 2001 From: Ben Ashbaugh Date: Mon, 21 Apr 2025 19:31:40 -0700 Subject: [PATCH 06/18] add CL_DEVICE_SVM_CONCURRENT_ACCESS_ATOM_SIZE_KHR update a few issues --- extensions/cl_khr_unified_svm.asciidoc | 74 +++++++++++++++++--------- xml/cl.xml | 4 +- 2 files changed, 53 insertions(+), 25 deletions(-) diff --git a/extensions/cl_khr_unified_svm.asciidoc b/extensions/cl_khr_unified_svm.asciidoc index d5d2b2d01..f10a17cef 100644 --- a/extensions/cl_khr_unified_svm.asciidoc +++ b/extensions/cl_khr_unified_svm.asciidoc @@ -174,6 +174,14 @@ Accepted value for the _param_name_ parameter to *clGetDeviceInfo* to query comb #define CL_DEVICE_SVM_TYPE_CAPABILITIES_KHR 0x1077 ---- +Accepted value for the _param_name_ parameter to *clGetDeviceInfo* to query the size and alignment for concurrent access to SVM allocations: + +[source] +---- +/* cl_device_info */ +#define CL_DEVICE_SVM_CONCURRENT_ACCESS_ATOM_SIZE_KHR 0x1078 +---- + Type to describe optional SVM allocation properties, and allocation properties added by this extension: [source] @@ -275,6 +283,13 @@ Add to Table 5 - List of supported param_names by *clGetDeviceInfo*: Each entry in the returned array must be either a super-set of the entry in the array returned by the platform query `CL_PLATFORM_SVM_TYPE_CAPABILITIES_KHR`, indicating that the SVM type is supported by the device, or zero, indicating that the SVM type is not supported by this device. Please refer to the <> table for valid capability values and their description. +| `CL_DEVICE_SVM_CONCURRENT_ACCESS_ATOM_SIZE_KHR` + | `size_t` + | Queries the size and alignment in bytes that bound concurrent access to SVM allocations. + The returned value must be zero or a power of two. + + If the device does not support concurrent access to SVM allocations, then the query must return `0`, indicating that concurrent access is not supported. 
+ If the query for `CL_DEVICE_SVM_CAPABILITIES` includes `CL_DEVICE_SVM_FINE_GRAIN_BUFFER` or `CL_DEVICE_SVM_FINE_GRAIN_SYSTEM`, then the query must return one, indicating that concurrent access occurs at the granularity of individual bytes within allocations. |==== [[svm-capabilities-table]] @@ -312,6 +327,7 @@ Add to Table 5 - List of supported param_names by *clGetDeviceInfo*: | This type of SVM is accessible on the device using atomic built-in functions. | `CL_SVM_CAPABILITY_CONCURRENT_ACCESS_KHR` | This type of SVM supports concurrent access from the host and a device, or from multiple devices. + Use `CL_DEVICE_SVM_CONCURRENT_ACCESS_ATOM_SIZE_KHR` to determine the supported granularity for concurrent access. | `CL_SVM_CAPABILITY_CONCURRENT_ATOMIC_ACCESS_KHR` | This type of SVM supports concurrent atomic access from the host and a device, or from multiple devices. | `CL_SVM_CAPABILITY_INDIRECT_ACCESS_KHR` @@ -735,9 +751,14 @@ Interactions with command buffers? + -- *UNRESOLVED*: -Need to solve now. -Check the Vulkan query for `nonCoherentAtomSize`. -Basic idea: Add a device query like `CL_DEVICE_SVM_CONCURRENT_ACCESS_ATOM_SIZE_KHR` and concurrent access is allowed if the memory type allows it and the concurrent access is to two different blocks defined by the device query. +Added a device query for `CL_DEVICE_SVM_CONCURRENT_ACCESS_ATOM_SIZE_KHR`. + +Still TODO: + +* What should the return type be for this query? Specifically, should it be a `size_t` or a `cl_uint`? +* Do we want to distinguish between host concurrent access and device concurrent access, even if the only query we define for now is for host concurrent access? +* Are we OK with a single query for all types of SVM, or do we need a separate query for each SVM type? +* Is this only for concurrent access without atomic access? -- . What other SVM allocation properties should we support? @@ -787,29 +808,19 @@ The initial version of this extension will only support allocating host memory. . 
What are the restrictions for the _dst_ptr_ values that can be passed to the "fill" API? + -- -*UNRESOLVED*: -Need to close on: +`RESOLVED`: * Can a device "fill" another device's allocation? (Recommendation: Yes, but finalize as part of multi-device support.) * Can a device "fill" arbitrary host memory? (No, undefined behavior unless system SVM is supported.) -* Can a device "fill" a USM allocation from another context? (No, contexts provide isolation, this will need export and import.) - -Three possibilities for filling arbitrary host memory: +* Can a device "fill" a USM allocation from another context? (No, undefined behavior.) -1. Allocated using clSVMAlloc from the same context --> good. -2. Allocated using clSVMAlloc from a different context --> undefined behavior. -3. Not allocated using clSVMAlloc --> good iff supports system SVM, else undefined behavior. - -Check whether the existing CTS test supports clEnqueueSVMMemFill on an arbitrary host allocation before resolving. - -Need to reword the existing spec text because it's confusing. +Note, there are no existing CTS tests that pass an arbitrary host allocation to *clEnqueueSVMMemFill*. -- . What are the restrictions for the _src_ptr_ and _dst_ptr_ values that can be passed to the "memcpy" API? + -- -*UNRESOLVED*: -Need to close on: +`RESOLVED`: * Can a device "memcpy" from another device's allocation? (Recommendation: Yes, but finalize as part of multi-device support.) * Can a device "memcpy" to another device's allocation? (Recommendation: Yes, but finalize as part of multi-device support.) @@ -818,14 +829,12 @@ Need to close on: * Can a device "memcpy" from arbitrary host memory? (Yes, we already have tests.) * Can a device "memcpy" from arbitrary host memory to arbitrary host memory? (Yes, we already have tests.) * Can the memory region to copy to overlap the memory region to copy from? (No, already an error.) - -Check whether we have a negative test for overlapping regions before resolving. -- . 
Do we want to support migrating to devices other than the device associated with _command_queue_? + -- -`RESOLVED`*: +`RESOLVED`: The initial version of this extension will not extend *clEnqueueSVMMigrateMem*, and hence will only support migrating to the device or to the host. -- @@ -851,9 +860,9 @@ The initial version of this extension will not extend *clEnqueueSVMMigrateMem*, + -- `RESOLVED`: -Recommend to solve by adding cacheability properties to a layered extension. -Cacheability properties are preferred vs. capabilities to avoid a combinatorial explosion. -Could have separate cacheability properties for host vs. device, or even for specific cache levels, if desired. +Support for cacheability controls will not be included in this extension but they can be supported in a layered extension, if desired. +In a layered extension, we recommend adding cacheability properties instead of cacheability capabilities to avoid an explosion of capability combinations. +The layered extension could add coarse `CACHED` and `UNCACHED` properties, or separate properties for host vs. device, or even separate properties for specific cache levels. -- . At least for HOST and SHARED allocations, should we have separate mem alloc flags for the host and the device? @@ -869,6 +878,22 @@ Is `size` equal to zero valid? + -- *UNRESOLVED*: + +Possible resolutions: + +* A `size` equal to zero is valid. This appears to be the specified behavior for the C `memcpy` and `memset` functions. +* A `size` equal to zero is undefined behavior. +* A `size` equal to zero is an error. + +* A `ptr` equal to `NULL` is valid if and only if `size` is equal to zero, otherwise it is an error. +* A `ptr` equal to `NULL` is undefined behavior. This appears to be the specified behavior for the C `memcpy` and `memset` functions. +* A `ptr` equal to `NULL` is an error. 
+ +Having this be either valid or undefined behavior provides the most flexibility for other language runtimes built on top of OpenCL, although disallowing these cases reduces testing surface area and could be handled with an enqueued marker. + +In the current OpenCL spec, it is unconditionally a `CL_INVALID_VALUE` error to pass a `NULL` `ptr` to *clEnqueueSVMMigrateMem*, *clEnqueueSVMMemcpy*, or *clEnqueueSVMMemFill*. + +There is no defined error behavior for `size` equal to zero. -- . Should we add a device query for a maximum supported SVM alignment, or should the maximum supported alignment implicitly be defined by the size of the largest data type supported by the device? @@ -881,7 +906,8 @@ Note that supported alignments should always be a power of two. Note that there are no maximum supported alignments defined for `posix_memalign` or `_aligned_alloc`, and supported alignments for the standard `aligned_alloc` and `std::aligned_alloc` are implementation-defined. -Suggest adding a device query and use it to determine the maximum supported alignment error code. +Suggest adding a device query, perhaps as a layered extension, and use it to determine the maximum supported alignment error code. +See internal merge request 198. -- . Should we add a device query for a maximum supported SVM fill pattern size, or should the maximum supported fill pattern size implicitly be defined by the size of the largest data type supported by the device? diff --git a/xml/cl.xml b/xml/cl.xml index e84a873be..cb195841d 100644 --- a/xml/cl.xml +++ b/xml/cl.xml @@ -1650,7 +1650,8 @@ server's OpenCL/api-docs repository. - + + @@ -7697,6 +7698,7 @@ server's OpenCL/api-docs repository. 
+ From 1d2d7be54ae1a2600bbdfa677c414c18c755c62e Mon Sep 17 00:00:00 2001 From: Ben Ashbaugh Date: Tue, 29 Apr 2025 11:04:15 -0700 Subject: [PATCH 07/18] add another option for behavior when allocating zero bytes --- extensions/cl_khr_unified_svm.asciidoc | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/extensions/cl_khr_unified_svm.asciidoc b/extensions/cl_khr_unified_svm.asciidoc index f10a17cef..2550ce270 100644 --- a/extensions/cl_khr_unified_svm.asciidoc +++ b/extensions/cl_khr_unified_svm.asciidoc @@ -894,6 +894,8 @@ Having this be either valid or undefined behavior provides the most flexibility In the current OpenCL spec, it is unconditionally a `CL_INVALID_VALUE` error to pass a `NULL` `ptr` to *clEnqueueSVMMigrateMem*, *clEnqueueSVMMemcpy*, or *clEnqueueSVMMemFill*. There is no defined error behavior for `size` equal to zero. + +Whether we allow a `NULL` `ptr` could be dependent on the behavior when allocating zero bytes, see below. -- . Should we add a device query for a maximum supported SVM alignment, or should the maximum supported alignment implicitly be defined by the size of the largest data type supported by the device? @@ -1026,10 +1028,11 @@ If a unique non-null pointer is returned then it cannot be dereferenced. Possible resolutions: -* Allow zero-sized allocations and require returning a non-null pointer that must be freed. -* Allow zero-sized allocations but allow returning a `NULL` pointer. No error would be generated, even if a `NULL` pointer is returned. -* Specify that this case is implementation-defined. -* Do nothing and keep the existing error behavior. +.. Allow zero-sized allocations and require returning a non-null pointer that must be freed. +.. Allow zero-sized allocations and require returning a `NULL` pointer. Note, it is not currently an error to free a `NULL` pointer. +.. Allow zero-sized allocations but allow returning a `NULL` pointer. 
No error would be generated, even if a `NULL` pointer is returned. +.. Specify that this case is implementation-defined. +.. Do nothing and keep the existing error behavior. -- Note: The following issues were added to the KHR USM extension: @@ -1044,7 +1047,6 @@ We decided not to support a memadvise function in the initial version of this sp For reference, for other APIs: * The Level Zero memadvise function `zeCommandListAppendMemAdvise` appears to be asynchronous, but the implementation actually seems to be synchronous. * It is unclear whether the CUDA memadvise functions `cudaMemAdvise` / `cuMemAdvise` are synchronous or asynchronous. - -- . What about devices and sub-devices? From e754d5222970d0eb9d2a38ea6505c0a7b41234fa Mon Sep 17 00:00:00 2001 From: Ben Ashbaugh Date: Wed, 7 May 2025 11:33:33 -0700 Subject: [PATCH 08/18] updates from the May 6th memory subgroup --- extensions/cl_khr_unified_svm.asciidoc | 29 +++++++++++++++++--------- 1 file changed, 19 insertions(+), 10 deletions(-) diff --git a/extensions/cl_khr_unified_svm.asciidoc b/extensions/cl_khr_unified_svm.asciidoc index 2550ce270..b11def2a2 100644 --- a/extensions/cl_khr_unified_svm.asciidoc +++ b/extensions/cl_khr_unified_svm.asciidoc @@ -881,13 +881,12 @@ Is `size` equal to zero valid? Possible resolutions: -* A `size` equal to zero is valid. This appears to be the specified behavior for the C `memcpy` and `memset` functions. -* A `size` equal to zero is undefined behavior. -* A `size` equal to zero is an error. - -* A `ptr` equal to `NULL` is valid if and only if `size` is equal to zero, otherwise it is an error. -* A `ptr` equal to `NULL` is undefined behavior. This appears to be the specified behavior for the C `memcpy` and `memset` functions. -* A `ptr` equal to `NULL` is an error. +.. A `size` equal to zero is valid. This appears to be the specified behavior for the C `memcpy` and `memset` functions. +.. [.line-through]#A `size` equal to zero is undefined behavior.# +.. 
A `size` equal to zero is an error. +.. A `ptr` equal to `NULL` is valid if and only if `size` is equal to zero, otherwise it is an error. +.. [.line-through]#A `ptr` equal to `NULL` is undefined behavior. This appears to be the specified behavior for the C `memcpy` and `memset` functions.# +.. A `ptr` equal to `NULL` is an error. Having this be either valid or undefined behavior provides the most flexibility for other language runtimes built on top of OpenCL, although disallowing these cases reduces testing surface area and could be handled with an enqueued marker. @@ -895,6 +894,12 @@ In the current OpenCL spec, it is unconditionally a `CL_INVALID_VALUE` error to There is no defined error behavior for `size` equal to zero. +We are leaning towards some kind of defined behavior, though we have not decided which defined behavior we prefer. +In practice, if it is an error then the application needs to check the size and the pointer and enqueue a marker, whereas if it is valid then the implementation needs to check the size and pointer and enqueue the equivalent of a marker. + +If we allow a size of zero as valid, could there be a problem mutating a memory fill or a copy in a command buffer from a nonzero size to a size of zero, or vice versa? +This is not a problem today, but it could be a problem when we have the proposed `cl_khr_command_buffer_muatble_memory_commands` extension. + Whether we allow a `NULL` `ptr` could be dependent on the behavior when allocating zero bytes, see below. -- @@ -1028,11 +1033,14 @@ If a unique non-null pointer is returned then it cannot be dereferenced. Possible resolutions: -.. Allow zero-sized allocations and require returning a non-null pointer that must be freed. +.. [.line-through]#Allow zero-sized allocations and require returning a non-null pointer that must be freed.# .. Allow zero-sized allocations and require returning a `NULL` pointer. Note, it is not currently an error to free a `NULL` pointer. -.. 
Allow zero-sized allocations but allow returning a `NULL` pointer. No error would be generated, even if a `NULL` pointer is returned. -.. Specify that this case is implementation-defined. +.. [.line-through]#Allow zero-sized allocations but allow returning a `NULL` pointer. No error would be generated, even if a `NULL` pointer is returned.# +.. [.line-through]#Specify that this case is implementation-defined.# .. Do nothing and keep the existing error behavior. + +We are leaning towards returning `NULL` for a zero-byte allocation, but we are still deciding whether it should be an error. +In practice, the main difference is that if it is an error then a well-behaving application would be required to check the size. -- Note: The following issues were added to the KHR USM extension: @@ -1045,6 +1053,7 @@ Note: The following issues were added to the KHR USM extension: We decided not to support a memadvise function in the initial version of this specification. For reference, for other APIs: + * The Level Zero memadvise function `zeCommandListAppendMemAdvise` appears to be asynchronous, but the implementation actually seems to be synchronous. * It is unclear whether the CUDA memadvise functions `cudaMemAdvise` / `cuMemAdvise` are synchronous or asynchronous. -- From 4d741a93e460f4f24df613b357cd27b8d5ed1e92 Mon Sep 17 00:00:00 2001 From: Ben Ashbaugh Date: Tue, 3 Jun 2025 16:34:53 -0700 Subject: [PATCH 09/18] add a memory model issue about querying the event status --- extensions/cl_khr_unified_svm.asciidoc | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/extensions/cl_khr_unified_svm.asciidoc b/extensions/cl_khr_unified_svm.asciidoc index b11def2a2..a086254ca 100644 --- a/extensions/cl_khr_unified_svm.asciidoc +++ b/extensions/cl_khr_unified_svm.asciidoc @@ -977,7 +977,12 @@ Trending "yes" for host USM allocations, both when the host USM allocation is fr . Exactly how do the additional SVM types affect the memory model? 
+ -- -*UNRESOLVED*: +*UNRESOLVED*: This issue may be easier to resolve now that this is a "unified SVM" extension vs. a "USM" extension, but it will still need more thought. + +One particular enhancement we may want to consider, though, is whether calling *clGetEventInfo* and passing `CL_EVENT_COMMAND_EXECUTION_STATUS` to query the event status is a synchronization point. +In the current specification, this is explicitly not a synchronization point. +However, in other APIs, querying the event status and observing that the event is complete is a synchronization point. +Should we adopt this behavior also, or do we want users to call *clWaitForEvents* to define a synchronization point? -- . Should it be an error to set an unknown pointer as a kernel argument using *clSetKernelArgSVMPointer* if no devices support shared system allocations? From 1e5e4189e3380a9a5faab9ddd2c1e7a54ea999c1 Mon Sep 17 00:00:00 2001 From: Ben Ashbaugh Date: Tue, 1 Jul 2025 16:37:29 -0700 Subject: [PATCH 10/18] updates from the July 1st memory subgroup --- extensions/cl_khr_unified_svm.asciidoc | 68 ++++++++++++++++---------- 1 file changed, 41 insertions(+), 27 deletions(-) diff --git a/extensions/cl_khr_unified_svm.asciidoc b/extensions/cl_khr_unified_svm.asciidoc index a086254ca..74b8202cb 100644 --- a/extensions/cl_khr_unified_svm.asciidoc +++ b/extensions/cl_khr_unified_svm.asciidoc @@ -879,28 +879,30 @@ Is `size` equal to zero valid? -- *UNRESOLVED*: -Possible resolutions: +Tentative resolution: -.. A `size` equal to zero is valid. This appears to be the specified behavior for the C `memcpy` and `memset` functions. -.. [.line-through]#A `size` equal to zero is undefined behavior.# -.. A `size` equal to zero is an error. +.. A `size` equal to zero is valid. +When `size` is zero, the call to *clEnqueueSVMMigrateMem*, *clEnqueueSVMMemFill*, and *clEnqueueSVMMemcpy* trivially succeeds, similar to an enqueued marker. 
+This appears to be the specified behavior for the C `memcpy` and `memset` functions. .. A `ptr` equal to `NULL` is valid if and only if `size` is equal to zero, otherwise it is an error. -.. [.line-through]#A `ptr` equal to `NULL` is undefined behavior. This appears to be the specified behavior for the C `memcpy` and `memset` functions.# -.. A `ptr` equal to `NULL` is an error. -Having this be either valid or undefined behavior provides the most flexibility for other language runtimes built on top of OpenCL, although disallowing these cases reduces testing surface area and could be handled with an enqueued marker. +Allowing `size` to be zero and `ptr` to be `NULL` provides the most flexibility for other language runtimes built on top of OpenCL and the additional testing is manageable. -In the current OpenCL spec, it is unconditionally a `CL_INVALID_VALUE` error to pass a `NULL` `ptr` to *clEnqueueSVMMigrateMem*, *clEnqueueSVMMemcpy*, or *clEnqueueSVMMemFill*. +Note that in the current OpenCL spec, it is unconditionally a `CL_INVALID_VALUE` error to pass `ptr` equal to `NULL` for *clEnqueueSVMMigrateMem*, *clEnqueueSVMMemcpy*, or *clEnqueueSVMMemFill*, so this will need to be explicitly relaxed for implementations supporting this extension. -There is no defined error behavior for `size` equal to zero. +There is currently no defined error behavior for `size` equal to zero, so this will not need to be explicitly relaxed in this extension, but it will need to be stated explicitly and tested. -We are leaning towards some kind of defined behavior, though we have not decided which defined behavior we prefer. -In practice, if it is an error then the application needs to check the size and the pointer and enqueue a marker, whereas if it is valid then the implementation needs to check the size and pointer and enqueue the equivalent of a marker. 
+We should carefully consider whether we need additional restrictions around mutating a memory fill or a copy in a command buffer from a nonzero size to a size of zero, or vice versa. +This is not a problem today, but it could be a problem when we have the proposed `cl_khr_command_buffer_mutable_memory_commands` extension. -If we allow a size of zero as valid, could there be a problem mutating a memory fill or a copy in a command buffer from a nonzero size to a size of zero, or vice versa? -This is not a problem today, but it could be a problem when we have the proposed `cl_khr_command_buffer_muatble_memory_commands` extension. +For reference, the full set of options we considered were: -Whether we allow a `NULL` `ptr` could be dependent on the behavior when allocating zero bytes, see below. +.. A `size` equal to zero is valid. This appears to be the specified behavior for the C `memcpy` and `memset` functions. +.. [.line-through]#A `size` equal to zero is undefined behavior.# +.. A `size` equal to zero is an error. +.. A `ptr` equal to `NULL` is valid if and only if `size` is equal to zero, otherwise it is an error. +.. [.line-through]#A `ptr` equal to `NULL` is undefined behavior. This appears to be the specified behavior for the C `memcpy` and `memset` functions.# +.. A `ptr` equal to `NULL` is an error. -- . Should we add a device query for a maximum supported SVM alignment, or should the maximum supported alignment implicitly be defined by the size of the largest data type supported by the device? @@ -996,11 +998,16 @@ If we relax the error condition for *clSetKernelArgSVMPointer* then we could als Note that if the error condition is removed we can still check for possible programming errors via optional USM checking layers, such as the https://github.com/intel/opencl-intercept-layer/blob/master/docs/controls.md#usmchecking-bool[USMChecking] functionality in the https://github.com/intel/opencl-intercept-layer[OpenCL Intercept Layer]. -- -.
Should we support a "rect" memcpy similar to *clEnqueueCopyBufferRect*? +. Should we support a "rect" or "2D" memcpy similar to *clEnqueueCopyBufferRect*? + -- *UNRESOLVED*: -This would be a fairly straightforward addition if it is useful. + +Tentative resolution: +Do not include a "rect" memcpy in the initial version of this extension. +A "rect" memcpy can always be added with a layered extension later, if desired. + +Note that standard SYCL does not include a "rect" memcpy, though the https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/supported/sycl_ext_oneapi_memcpy2d.asciidoc[sycl_ext_oneapi_memcpy2d] extension does include 2D memory copies and memory fills. -- . Should there be an upper limit on the size of an SVM allocation? @@ -1025,27 +1032,31 @@ For some devices, this query will return the same value as `CL_DEVICE_MAX_MEM_AL + -- *UNRESOLVED*: -Currently, attempting to allocate zero bytes fails and returns `CL_INVALID_BUFFER_SIZE`. -This is consistent with SVM, where *clSVMAlloc* fails and returns a `NULL` pointer if the size to allocate is zero. -It is also consistent with CUDA, where *cuMemAlloc*, etc. returns an error if the size to allocate is zero. -However, it is not necessarily consistent with other memory allocation functions. For example: +Tentative resolution: Allow zero-sized allocations and require returning a `NULL` pointer. +This is considered a successful operation and no error will be returned. + +We evaluated many scenarios and determined that there is no clearly correct behavior. +The scenarios we evaluated were: +* For OpenCL 2.0 SVM, *clSVMAlloc* with a size of zero is specified to return a `NULL` pointer. +Because *clSVMAlloc* has no mechanism to return an error code, it is unspecified whether this is considered an error. +* For `cl_intel_unified_shared_memory`, calling *clDeviceMemAllocINTEL*, etc. returns `CL_INVALID_BUFFER_SIZE` if the size to allocate is zero. +* For CUDA, calling *cuMemAlloc*, etc.
returns an error if the size to allocate is zero. * The result of calling `malloc(0)` is implementation-defined: it can either return a `NULL` pointer or a unique non-null pointer that must be freed. If a `NULL` pointer is returned then `errno` may be set to an implementation-defined value. If a unique non-null pointer is returned then it cannot be dereferenced. * Allocating an array of zero elements using `new` must return a non-null pointer, though dereferencing the pointer is undefined. -Possible resolutions: +For reference, the full set of options we considered were: .. [.line-through]#Allow zero-sized allocations and require returning a non-null pointer that must be freed.# -.. Allow zero-sized allocations and require returning a `NULL` pointer. Note, it is not currently an error to free a `NULL` pointer. +.. Allow zero-sized allocations and require returning a `NULL` pointer. +No error will be generated. +Note, it is not currently an error to free a `NULL` pointer. .. [.line-through]#Allow zero-sized allocations but allow returning a `NULL` pointer. No error would be generated, even if a `NULL` pointer is returned.# .. [.line-through]#Specify that this case is implementation-defined.# -.. Do nothing and keep the existing error behavior. - -We are leaning towards returning `NULL` for a zero-byte allocation, but we are still deciding whether it should be an error. -In practice, the main difference is that if it is an error then a well-behaving application would be required to check the size. +.. Specify that this case is an error. -- Note: The following issues were added to the KHR USM extension: @@ -1066,7 +1077,10 @@ For reference, for other APIs: . What about devices and sub-devices? + -- -*UNRESOLVED*: +*UNRESOLVED*: Neither the OpenCL specification nor this extension specification currently says much about how SVM behaves for devices and sub-devices. 
+ +My rough expectations are that if an allocation is made against a context with a device and a sub-device, and the allocation is associated with the device, then the allocation is also accessible to the sub-device. +Unless additional clarification is needed, perhaps this issue may simply be resolved. -- . Should we move more of the *clSVMAllocWithProperties* arguments to properties? From 4cef078f9f5daf930831cdd37ee04801971557c5 Mon Sep 17 00:00:00 2001 From: Ben Ashbaugh Date: Mon, 15 Sep 2025 10:29:19 -0700 Subject: [PATCH 11/18] update spec version to v0.9.0, consistent with the XML file --- extensions/cl_khr_unified_svm.asciidoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/extensions/cl_khr_unified_svm.asciidoc b/extensions/cl_khr_unified_svm.asciidoc index 74b8202cb..9f069418e 100644 --- a/extensions/cl_khr_unified_svm.asciidoc +++ b/extensions/cl_khr_unified_svm.asciidoc @@ -51,7 +51,7 @@ Working Draft == Version Built On: {docdate} + -Revision: 0.2.0 +Revision: 0.9.0 == Dependencies @@ -1126,7 +1126,7 @@ If desired, a layered extension could add a new property to *clSVMAllocWithPrope [options="header"] |======================================== |Version|Date|Author|Changes -|0.2.0|2024-10-29|Ben Ashbaugh|Initial public revision. +|0.9.0|2024-10-29|Ben Ashbaugh|Initial public revision. 
|======================================== //************************************************************************ From 55ae4a259afd448bcc37557619e0cfad62c86797 Mon Sep 17 00:00:00 2001 From: Ben Ashbaugh Date: Mon, 15 Sep 2025 11:02:23 -0700 Subject: [PATCH 12/18] switch to generated asciidoctor attributes --- extensions/cl_khr_unified_svm.asciidoc | 396 ++++++++++++------------- 1 file changed, 188 insertions(+), 208 deletions(-) diff --git a/extensions/cl_khr_unified_svm.asciidoc b/extensions/cl_khr_unified_svm.asciidoc index 9f069418e..8286bc727 100644 --- a/extensions/cl_khr_unified_svm.asciidoc +++ b/extensions/cl_khr_unified_svm.asciidoc @@ -1,3 +1,13 @@ +// Copyright 2025 The Khronos Group. This work is licensed under a +// Creative Commons Attribution 4.0 International License; see +// http://creativecommons.org/licenses/by/4.0/ + +:data-uri: +:icons: font +include::../config/attribs.txt[] +include::{generated}/api/api-dictionary-no-links.asciidoc[] +:source-highlighter: coderay + = cl_khr_unified_svm // This section needs to be after the document title. @@ -158,7 +168,7 @@ Convenience macros describing required properties for several common SVM allocat #define CL_SVM_TYPE_MACRO_SYSTEM_KHR /* ... 
*/ ---- -Accepted value for the _param_name_ parameter to *clGetPlatformInfo* to query combinations of SVM capabilities defining the SVM types supported by an OpenCL platform: +Accepted value for the _param_name_ parameter to {clGetPlatformInfo} to query combinations of SVM capabilities defining the SVM types supported by an OpenCL platform: [source] ---- @@ -166,7 +176,7 @@ Accepted value for the _param_name_ parameter to *clGetPlatformInfo* to query co #define CL_PLATFORM_SVM_TYPE_CAPABILITIES_KHR 0x0909 ---- -Accepted value for the _param_name_ parameter to *clGetDeviceInfo* to query combinations of SVM capabilities defining the SVM types supported by an OpenCL device: +Accepted value for the _param_name_ parameter to {clGetDeviceInfo} to query combinations of SVM capabilities defining the SVM types supported by an OpenCL device: [source] ---- @@ -174,7 +184,7 @@ Accepted value for the _param_name_ parameter to *clGetDeviceInfo* to query comb #define CL_DEVICE_SVM_TYPE_CAPABILITIES_KHR 0x1077 ---- -Accepted value for the _param_name_ parameter to *clGetDeviceInfo* to query the size and alignment for concurrent access to SVM allocations: +Accepted value for the _param_name_ parameter to {clGetDeviceInfo} to query the size and alignment for concurrent access to SVM allocations: [source] ---- @@ -221,7 +231,7 @@ No SVM free flags are added by this extension: typedef cl_bitfield cl_svm_free_flags_khr; ---- -Enumeration type and values for the _param_name_ parameter to *clGetSVMPointerInfoKHR* to query information about an SVM allocation. +Enumeration type and values for the _param_name_ parameter to {clGetSVMPointerInfoKHR} to query information about an SVM allocation. 
[source] ---- @@ -236,7 +246,7 @@ typedef cl_uint cl_svm_pointer_info_khr; #define CL_SVM_INFO_ASSOCIATED_DEVICE_HANDLE_KHR 0x419D ---- -Accepted values for the _param_name_ parameter to *clSetKernelExecInfo* to enable and disable indirect access to SVM allocations made by *clSVMAllocWithPropertiesKHR* or *clSVMAlloc*: +Accepted values for the _param_name_ parameter to {clSetKernelExecInfo} to enable and disable indirect access to SVM allocations made by {clSVMAllocWithPropertiesKHR} or {clSVMAlloc}: [source] ---- @@ -248,48 +258,48 @@ Accepted values for the _param_name_ parameter to *clSetKernelExecInfo* to enabl === Section 4.1 - Querying Platform Info: -Add to Table 3 - List of supported param_names by *clGetPlatformInfo*: +Add to Table 3 - List of supported param_names by {clGetPlatformInfo}: [caption="Table 5. "] .List of supported param_names by clGetDeviceInfo [width="100%",cols="<30%,<20%,<50%",options="header"] |==== | Device Info | Return Type | Description -| `CL_PLATFORM_SVM_TYPE_CAPABILITIES_KHR` - | `cl_svm_capabilities_khr[]` +| {CL_PLATFORM_SVM_TYPE_CAPABILITIES_KHR} + | {cl_svm_capabilities_khr_TYPE}[] | Queries the combinations of SVM capabilities defining the SVM types supported by OpenCL devices in the OpenCL platform. Returns an array of bitfields, where each bitfield in the array describes the SVM capabilities for one SVM type. Each SVM type must be supported by at least one device in the platform, but may not be supported by all devices in the platform. - To determine the combinations of SVM capabilities defining the SVM types supported by a device, use the query `CL_DEVICE_SVM_TYPE_CAPABILITIES_KHR`. + To determine the combinations of SVM capabilities defining the SVM types supported by a device, use the query {CL_DEVICE_SVM_TYPE_CAPABILITIES_KHR}. Please refer to the <<svm-capabilities-table>> table for capability values and their description.
|==== === Section 4.2 - Querying Devices: -Add to Table 5 - List of supported param_names by *clGetDeviceInfo*: +Add to Table 5 - List of supported param_names by {clGetDeviceInfo}: [caption="Table 5. "] .List of supported param_names by clGetDeviceInfo [width="100%",cols="<30%,<20%,<50%",options="header"] |==== | Device Info | Return Type | Description -| `CL_DEVICE_SVM_TYPE_CAPABILITIES_KHR` - | `cl_svm_capabilities_khr[]` +| {CL_DEVICE_SVM_TYPE_CAPABILITIES_KHR} + | {cl_svm_capabilities_khr_TYPE}[] | Queries the combinations of SVM capabilities describing the SVM types supported by an OpenCL device. Returns an array of bitfields, where each bitfield in the array describes the SVM capabilities for one SVM type. - The size of the returned array must match the size of the array returned by the platform query `CL_PLATFORM_SVM_TYPE_CAPABILITIES_KHR`. + The size of the returned array must match the size of the array returned by the platform query {CL_PLATFORM_SVM_TYPE_CAPABILITIES_KHR}. - Each entry in the returned array must be either a super-set of the entry in the array returned by the platform query `CL_PLATFORM_SVM_TYPE_CAPABILITIES_KHR`, indicating that the SVM type is supported by the device, or zero, indicating that the SVM type is not supported by this device. + Each entry in the returned array must be either a super-set of the entry in the array returned by the platform query {CL_PLATFORM_SVM_TYPE_CAPABILITIES_KHR}, indicating that the SVM type is supported by the device, or zero, indicating that the SVM type is not supported by this device. Please refer to the <<svm-capabilities-table>> table for valid capability values and their description. -| `CL_DEVICE_SVM_CONCURRENT_ACCESS_ATOM_SIZE_KHR` - | `size_t` +| {CL_DEVICE_SVM_CONCURRENT_ACCESS_ATOM_SIZE_KHR} + | {size_t_TYPE} | Queries the size and alignment in bytes that bound concurrent access to SVM allocations. The returned value must be zero or a power of two.
If the device does not support concurrent access to SVM allocations, then the query must return `0`, indicating that concurrent access is not supported. - If the query for `CL_DEVICE_SVM_CAPABILITIES` includes `CL_DEVICE_SVM_FINE_GRAIN_BUFFER` or `CL_DEVICE_SVM_FINE_GRAIN_SYSTEM`, then the query must return one, indicating that concurrent access occurs at the granularity of individual bytes within allocations. + If the query for {CL_DEVICE_SVM_CAPABILITIES} includes {CL_DEVICE_SVM_FINE_GRAIN_BUFFER} or {CL_DEVICE_SVM_FINE_GRAIN_SYSTEM}, then the query must return one, indicating that concurrent access occurs at the granularity of individual bytes within allocations. |==== [[svm-capabilities-table]] @@ -298,39 +308,39 @@ Add to Table 5 - List of supported param_names by *clGetDeviceInfo*: [width="100%",cols="2,3",options="header"] |==== | Capability | Description -| `CL_SVM_CAPABILITY_SINGLE_ADDRESS_SPACE_KHR` +| {CL_SVM_CAPABILITY_SINGLE_ADDRESS_SPACE_KHR} | There is a single address space for this type of SVM. The same pointer may be used on the host and the device; the pointer has _address equivalence_. -| `CL_SVM_CAPABILITY_SYSTEM_ALLOCATED_KHR` - | This type of SVM provides access to the entire host virtual memory, including memory allocated by a system allocator such as `malloc` or `new` or objects allocated on the stack, and does not require calling *clSVMAllocWithPropertiesKHR* or *clSVMAlloc*. -| `CL_SVM_CAPABILITY_DEVICE_OWNED_KHR` +| {CL_SVM_CAPABILITY_SYSTEM_ALLOCATED_KHR} + | This type of SVM provides access to the entire host virtual memory, including memory allocated by a system allocator such as `malloc` or `new` or objects allocated on the stack, and does not require calling {clSVMAllocWithPropertiesKHR} or {clSVMAlloc}. +| {CL_SVM_CAPABILITY_DEVICE_OWNED_KHR} | This type of SVM is owned by an associated device handle and is not intended to migrate to another device or the host. 
Allocations that are owned by a device generally trade off access limitations for higher performance. -| `CL_SVM_CAPABILITY_DEVICE_UNASSOCIATED_KHR` +| {CL_SVM_CAPABILITY_DEVICE_UNASSOCIATED_KHR} | This type of SVM does not need to be associated with a device handle. -| `CL_SVM_CAPABILITY_CONTEXT_ACCESS_KHR` +| {CL_SVM_CAPABILITY_CONTEXT_ACCESS_KHR} | This type of SVM is accessible to other devices in the context that support the SVM type. -| `CL_SVM_CAPABILITY_HOST_OWNED_KHR` +| {CL_SVM_CAPABILITY_HOST_OWNED_KHR} | This type of SVM is owned by the host and is not intended to migrate to a device. Allocations that are owned by the host generally trade off wide accessibility for potentially higher per-access costs. -| `CL_SVM_CAPABILITY_HOST_READ_KHR` +| {CL_SVM_CAPABILITY_HOST_READ_KHR} | This type of SVM is readable on the host without needing to map or unmap the allocation. -| `CL_SVM_CAPABILITY_HOST_WRITE_KHR` +| {CL_SVM_CAPABILITY_HOST_WRITE_KHR} | This type of SVM is writeable on the host without needing to map or unmap the allocation. -| `CL_SVM_CAPABILITY_HOST_MAP_KHR` +| {CL_SVM_CAPABILITY_HOST_MAP_KHR} | This type of SVM is accessible on the host but requires mapping and unmapping the allocation. -| `CL_SVM_CAPABILITY_DEVICE_READ_KHR` +| {CL_SVM_CAPABILITY_DEVICE_READ_KHR} | This type of SVM is accessible on the device for reading. -| `CL_SVM_CAPABILITY_DEVICE_WRITE_KHR` +| {CL_SVM_CAPABILITY_DEVICE_WRITE_KHR} | This type of SVM is accessible on the device for writing. -| `CL_SVM_CAPABILITY_DEVICE_ATOMIC_ACCESS_KHR` +| {CL_SVM_CAPABILITY_DEVICE_ATOMIC_ACCESS_KHR} | This type of SVM is accessible on the device using atomic built-in functions. -| `CL_SVM_CAPABILITY_CONCURRENT_ACCESS_KHR` +| {CL_SVM_CAPABILITY_CONCURRENT_ACCESS_KHR} | This type of SVM supports concurrent access from the host and a device, or from multiple devices. - Use `CL_DEVICE_SVM_CONCURRENT_ACCESS_ATOM_SIZE_KHR` to determine the supported granularity for concurrent access. 
-| `CL_SVM_CAPABILITY_CONCURRENT_ATOMIC_ACCESS_KHR` + Use {CL_DEVICE_SVM_CONCURRENT_ACCESS_ATOM_SIZE_KHR} to determine the supported granularity for concurrent access. +| {CL_SVM_CAPABILITY_CONCURRENT_ATOMIC_ACCESS_KHR} | This type of SVM supports concurrent atomic access from the host and a device, or from multiple devices. -| `CL_SVM_CAPABILITY_INDIRECT_ACCESS_KHR` +| {CL_SVM_CAPABILITY_INDIRECT_ACCESS_KHR} | This type of SVM supports a single kernel enable to indicate that the kernel may allocate any allocation of this type, rather than passing a list of indirectly accessed allocations to the kernel. |==== @@ -394,21 +404,21 @@ The following table describes the detailed set of SVM capabilities for some comm |==== | SVM Capability | Coarse-Grain Buffer SVM | Fine-Grain Buffer SVM | Device SVM | Host SVM | Single-Device Shared SVM | System SVM // CG FG Dev Host SDS Sys -| `CL_SVM_CAPABILITY_SINGLE_ADDRESS_SPACE_KHR` | Y | Y | Y | Y | Y | Y -| `CL_SVM_CAPABILITY_SYSTEM_ALLOCATED_KHR` | | | | | | Y -| `CL_SVM_CAPABILITY_DEVICE_OWNED_KHR` | | | Y | | | -| `CL_SVM_CAPABILITY_DEVICE_UNASSOCIATED_KHR` | Y | Y | | Y | | Y -| `CL_SVM_CAPABILITY_CONTEXT_ACCESS_KHR` | Y | Y | | Y | | Y -| `CL_SVM_CAPABILITY_HOST_OWNED_KHR` | | | | Y | | -| `CL_SVM_CAPABILITY_HOST_READ_KHR` | | Y | | Y | Y | Y -| `CL_SVM_CAPABILITY_HOST_WRITE_KHR` | | Y | | Y | Y | Y -| `CL_SVM_CAPABILITY_HOST_MAP_KHR` | Y | Y | | | | Y? 
-| `CL_SVM_CAPABILITY_DEVICE_READ_KHR` | Y | Y | Y | Y | Y | Y -| `CL_SVM_CAPABILITY_DEVICE_WRITE_KHR` | Y | Y | Y | Y | Y | Y -| `CL_SVM_CAPABILITY_DEVICE_ATOMIC_ACCESS_KHR` | Y | Y | Y | {O} | {O} | Y -| `CL_SVM_CAPABILITY_CONCURRENT_ACCESS_KHR` | | Y | {O} | {O} | {O} | Y -| `CL_SVM_CAPABILITY_CONCURRENT_ATOMIC_ACCESS_KHR` | | {O} | {O} | {O} | {O} | Y -| `CL_SVM_CAPABILITY_INDIRECT_ACCESS_KHR` | {O} | {O} | Y | Y | Y | Y +| {CL_SVM_CAPABILITY_SINGLE_ADDRESS_SPACE_KHR} | Y | Y | Y | Y | Y | Y +| {CL_SVM_CAPABILITY_SYSTEM_ALLOCATED_KHR} | | | | | | Y +| {CL_SVM_CAPABILITY_DEVICE_OWNED_KHR} | | | Y | | | +| {CL_SVM_CAPABILITY_DEVICE_UNASSOCIATED_KHR} | Y | Y | | Y | | Y +| {CL_SVM_CAPABILITY_CONTEXT_ACCESS_KHR} | Y | Y | | Y | | Y +| {CL_SVM_CAPABILITY_HOST_OWNED_KHR} | | | | Y | | +| {CL_SVM_CAPABILITY_HOST_READ_KHR} | | Y | | Y | Y | Y +| {CL_SVM_CAPABILITY_HOST_WRITE_KHR} | | Y | | Y | Y | Y +| {CL_SVM_CAPABILITY_HOST_MAP_KHR} | Y | Y | | | | Y? +| {CL_SVM_CAPABILITY_DEVICE_READ_KHR} | Y | Y | Y | Y | Y | Y +| {CL_SVM_CAPABILITY_DEVICE_WRITE_KHR} | Y | Y | Y | Y | Y | Y +| {CL_SVM_CAPABILITY_DEVICE_ATOMIC_ACCESS_KHR} | Y | Y | Y | {O} | {O} | Y +| {CL_SVM_CAPABILITY_CONCURRENT_ACCESS_KHR} | | Y | {O} | {O} | {O} | Y +| {CL_SVM_CAPABILITY_CONCURRENT_ATOMIC_ACCESS_KHR} | | {O} | {O} | {O} | {O} | Y +| {CL_SVM_CAPABILITY_INDIRECT_ACCESS_KHR} | {O} | {O} | Y | Y | Y | Y |==== In this table: @@ -430,15 +440,8 @@ TODO: Probably ought to substantially rewrite portions of Section 5.6.1 and perh The function -[source] ----- -void* clSVMAllocWithPropertiesKHR( - cl_context context, - const cl_svm_alloc_properties_khr* properties, - cl_uint svm_type_index, - size_t size, - cl_int* errcode_ret); ----- +include::{generated}/api/protos/clSVMAllocWithPropertiesKHR.txt[] +include::{generated}/api/version-notes/clSVMAllocWithPropertiesKHR.asciidoc[] allocates shared virtual memory with optional properties. 
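As an illustration of how _svm_type_index_ relates to the capability queries, the following sketch models the two array rules stated in the tables above: a device entry is either zero or a super-set of the platform entry at the same index, and an SVM type index is usable on a device only when its entry is non-zero. This is a minimal, self-contained model and not code from the extension: `svm_caps_t`, the `CAP_*` bit values, `NO_SVM_TYPE`, and both helper functions are hypothetical stand-ins, and a real application would fill the arrays from `clGetPlatformInfo` and `clGetDeviceInfo`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for cl_svm_capabilities_khr (a bitfield). */
typedef uint64_t svm_caps_t;

/* Hypothetical bit values, for illustration only; the real values are
 * defined by the extension headers. */
#define CAP_SINGLE_ADDRESS_SPACE ((svm_caps_t)1 << 0)
#define CAP_DEVICE_OWNED         ((svm_caps_t)1 << 1)
#define CAP_HOST_READ            ((svm_caps_t)1 << 2)
#define CAP_HOST_WRITE           ((svm_caps_t)1 << 3)
#define CAP_DEVICE_READ          ((svm_caps_t)1 << 4)
#define CAP_DEVICE_WRITE         ((svm_caps_t)1 << 5)

/* Hypothetical "no matching type" marker. */
#define NO_SVM_TYPE UINT32_MAX

/* Returns 1 if every device entry is zero or a super-set of the
 * platform entry at the same index, and 0 otherwise. */
static int device_caps_consistent(const svm_caps_t *platform_caps,
                                  const svm_caps_t *device_caps,
                                  size_t count)
{
    assert(platform_caps != NULL && device_caps != NULL);
    for (size_t i = 0; i < count; i++) {
        if (device_caps[i] == 0)
            continue; /* SVM type not supported by this device */
        if ((device_caps[i] & platform_caps[i]) != platform_caps[i])
            return 0; /* a platform capability is missing */
    }
    return 1;
}

/* Returns the first index whose device capabilities are non-zero and
 * include all bits in `required`, or NO_SVM_TYPE if there is none. */
static uint32_t first_type_with(const svm_caps_t *device_caps, size_t count,
                                svm_caps_t required)
{
    assert(device_caps != NULL);
    for (size_t i = 0; i < count; i++) {
        if (device_caps[i] != 0 && (device_caps[i] & required) == required)
            return (uint32_t)i;
    }
    return NO_SVM_TYPE;
}
```

An index found this way is the kind of value an application would pass as _svm_type_index_; whether to take the first match or rank several candidates is left to the application.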
@@ -449,57 +452,57 @@ The list is terminated with the special property `0`. If no allocation properties are required, _properties_ may be `NULL`. Please refer to the <<svm-alloc-properties-table>> table for valid SVM allocation properties and their description. -_svm_type_index_ is an index into the array of supported SVM types returned by `CL_PLATFORM_SVM_TYPE_CAPABILITIES_KHR` or `CL_DEVICE_SVM_TYPE_CAPABILITIES_KHR` that specifies the type of SVM to allocate. +_svm_type_index_ is an index into the array of supported SVM types returned by {CL_PLATFORM_SVM_TYPE_CAPABILITIES_KHR} or {CL_DEVICE_SVM_TYPE_CAPABILITIES_KHR} that specifies the type of SVM to allocate. _size_ is the size in bytes of the requested SVM allocation. _errcode_ret_ may return an appropriate error code. If _errcode_ret_ is `NULL` then no error code will be returned. -*clSVMAllocWithPropertiesKHR* will return a valid non-`NULL` address and `CL_SUCCESS` will be returned in _errcode_ret_ if the shared virtual memory is allocated successfully. +{clSVMAllocWithPropertiesKHR} will return a valid non-`NULL` address and {CL_SUCCESS} will be returned in _errcode_ret_ if the shared virtual memory is allocated successfully. Otherwise, `NULL` will be returned, and _errcode_ret_ will be set to one of the following error values: -* `CL_INVALID_CONTEXT` if _context_ is not a valid context. -* `CL_INVALID_PROPERTY` if a memory property name in _properties_ is not a supported property name, if the value specified for a supported property name is not valid, or if the same property name is specified more than once. -* `CL_INVALID_OPERATION` if no devices in _context_ support the SVM type specified by _svm_type_index_, or if a device associated with the SVM allocation does not support the SVM type specified by _svm_type_index_. -* `CL_INVALID_VALUE` if _svm_type_index_ is greater than the number of SVM types supported the devices in _context_.
-* `CL_INVALID_BUFFER_SIZE` if _size_ is zero or greater than `CL_DEVICE_MAX_MEM_ALLOC_SIZE` for any OpenCL device in _context_ that supports the specified SVM type, or if _size_ is greater than `CL_DEVICE_MAX_MEM_ALLOC_SIZE` for a device associated with the SVM allocation. +* {CL_INVALID_CONTEXT} if _context_ is not a valid context. +* {CL_INVALID_PROPERTY} if a memory property name in _properties_ is not a supported property name, if the value specified for a supported property name is not valid, or if the same property name is specified more than once. +* {CL_INVALID_OPERATION} if no devices in _context_ support the SVM type specified by _svm_type_index_, or if a device associated with the SVM allocation does not support the SVM type specified by _svm_type_index_. +* {CL_INVALID_VALUE} if _svm_type_index_ is greater than the number of SVM types supported by the devices in _context_. +* {CL_INVALID_BUFFER_SIZE} if _size_ is zero or greater than {CL_DEVICE_MAX_MEM_ALLOC_SIZE} for any OpenCL device in _context_ that supports the specified SVM type, or if _size_ is greater than {CL_DEVICE_MAX_MEM_ALLOC_SIZE} for a device associated with the SVM allocation. TODO: update depending on the updated queries for available SVM sizes. -* `CL_OUT_OF_RESOURCES` if there is a failure to allocate resources required by the OpenCL implementation on the device. -* `CL_OUT_OF_HOST_MEMORY` if there is a failure to allocate resources required by the OpenCL implementation on the host. +* {CL_OUT_OF_RESOURCES} if there is a failure to allocate resources required by the OpenCL implementation on the device. +* {CL_OUT_OF_HOST_MEMORY} if there is a failure to allocate resources required by the OpenCL implementation on the host. TODO: Do we want to document any specific error conditions for invalid property values? [[svm-alloc-properties-table]] [caption="Table X.
"] -.List of supported SVM allocation properties by *clSVMAllocWithPropertiesKHR* +.List of supported SVM allocation properties by {clSVMAllocWithPropertiesKHR} [width="100%",cols="2,1,3",options="header"] |==== | Allocation Property | Property Value | Description -| `CL_SVM_ALLOC_ASSOCIATED_DEVICE_HANDLE_KHR` - | `cl_device_id` +| {CL_SVM_ALLOC_ASSOCIATED_DEVICE_HANDLE_KHR} + | {cl_device_id_TYPE} | Associates the allocation with a specific device handle. The associated device handle property is required unless the specified - SVM type contains `CL_SVM_CAPABILITY_DEVICE_UNASSOCIATED_KHR`. + SVM type contains {CL_SVM_CAPABILITY_DEVICE_UNASSOCIATED_KHR}. The default value is `NULL`, which indicates that the allocation is not associated with a specific device handle. -| `CL_SVM_ALLOC_ACCESS_FLAGS_KHR` - | `cl_svm_alloc_access_flags_khr` +| {CL_SVM_ALLOC_ACCESS_FLAGS_KHR} + | {cl_svm_alloc_access_flags_khr_TYPE} | Flags specifying access information for the allocation. If these access flags are violated, behavior is undefined. This is a bitfield type that may be set to a combination of the following values: - `CL_SVM_ALLOC_ACCESS_HOST_NOREAD_KHR`: the host will not read this allocation. + - `CL_SVM_ALLOC_ACCESS_HOST_NOWRITE_KHR`: the host will not write this allocation. + - `CL_SVM_ALLOC_ACCESS_DEVICE_NOREAD_KHR`: the device will not read this allocation. + - `CL_SVM_ALLOC_ACCESS_DEVICE_NOWRITE_KHR`: the device will not write this allocation. + {CL_SVM_ALLOC_ACCESS_HOST_NOREAD_KHR}: the host will not read this allocation. + + {CL_SVM_ALLOC_ACCESS_HOST_NOWRITE_KHR}: the host will not write this allocation. + + {CL_SVM_ALLOC_ACCESS_DEVICE_NOREAD_KHR}: the device will not read this allocation. + + {CL_SVM_ALLOC_ACCESS_DEVICE_NOWRITE_KHR}: the device will not write this allocation. The default value is `0`, which indicates no special access behavior for the host or the device for this allocation. 
-| `CL_SVM_ALLOC_ALIGNMENT_KHR` - | `size_t` +| {CL_SVM_ALLOC_ALIGNMENT_KHR} + | {size_t_TYPE} | Specifies the minimum alignment in bytes for the SVM allocation. The alignment must be a power of two and must be equal to or smaller than the size of the largest data type supported by any OpenCL device in @@ -515,14 +518,8 @@ TODO: Do we want to document any specific error conditions for invalid property The function -[source] ----- -cl_int clSVMFreeWithPropertiesKHR( - cl_context context, - const cl_svm_free_properties_khr* properties, - cl_svm_free_flags_khr flags, - void* ptr); ----- +include::{generated}/api/protos/clSVMFreeWithPropertiesKHR.txt[] +include::{generated}/api/version-notes/clSVMFreeWithPropertiesKHR.asciidoc[] frees an SVM allocation with optional properties. @@ -537,42 +534,33 @@ _flags_ is used to specify how the SVM allocation is freed. This extension does not define any free flags. _ptr_ is the SVM allocation to free. -It must be a value returned by *clSVMAlloc*, *clSVMAllocWithPropertiesKHR*, or a `NULL` pointer. +It must be a value returned by {clSVMAlloc}, {clSVMAllocWithPropertiesKHR}, or a `NULL` pointer. It is the responsibility of the application to make sure enqueued commands that use _ptr_ are complete before freeing _ptr_. Behavior is undefined if a previously enqueued command that may be using _ptr_ is still executing. If _ptr_ is `NULL` then no action occurs. -*clSVMFreeWithPropertiesKHR* will return `CL_SUCCESS` if the function executes successfully. +{clSVMFreeWithPropertiesKHR} will return {CL_SUCCESS} if the function executes successfully. Otherwise, it returns one of the following errors: -* `CL_INVALID_CONTEXT` if _context_ is not a valid context. -* `CL_INVALID_PROPERTY` if a memory property name in _properties_ is not a supported property name, if the value specified for a supported property name is not valid, or if the same property name is specified more than once. 
-* `CL_INVALID_VALUE` if _flags_ contains an invalid SVM free flag. -* `CL_INVALID_VALUE` if _ptr_ is not a value returned by *clSVMAlloc*, *clSVMAllocWithPropertiesKHR*, or a `NULL` pointer. -* `CL_OUT_OF_RESOURCES` if there is a failure to allocate resources required by the OpenCL implementation on the device. -* `CL_OUT_OF_HOST_MEMORY` if there is a failure to allocate resources required by the OpenCL implementation on the host. +* {CL_INVALID_CONTEXT} if _context_ is not a valid context. +* {CL_INVALID_PROPERTY} if a memory property name in _properties_ is not a supported property name, if the value specified for a supported property name is not valid, or if the same property name is specified more than once. +* {CL_INVALID_VALUE} if _flags_ contains an invalid SVM free flag. +* {CL_INVALID_VALUE} if _ptr_ is not a value returned by {clSVMAlloc}, {clSVMAllocWithPropertiesKHR}, or a `NULL` pointer. +* {CL_OUT_OF_RESOURCES} if there is a failure to allocate resources required by the OpenCL implementation on the device. +* {CL_OUT_OF_HOST_MEMORY} if there is a failure to allocate resources required by the OpenCL implementation on the host. [NOTE] ==== -Whether *clSVMFree* or *clSVMFreeWithPropertiesKHR* is blocking or non-blocking is unspecified. -Applications should not rely on *clSVMFree* or *clSVMFreeWithPropertiesKHR* for synchronization, nor assume that *clSVMFree* or *clVMFreeWithPropertiesKHR* cannot cause deadlocks. +Whether {clSVMFree} or {clSVMFreeWithPropertiesKHR} is blocking or non-blocking is unspecified. +Applications should not rely on {clSVMFree} or {clSVMFreeWithPropertiesKHR} for synchronization, nor assume that {clSVMFree} or {clSVMFreeWithPropertiesKHR} cannot cause deadlocks. 
==== ===== Querying SVM Allocations The function -[source] ----- -cl_int clGetSVMPointerInfoKHR( - cl_context context, - cl_device_id device, - const void* ptr, - cl_svm_pointer_info_khr param_name, - size_t param_value_size, - void* param_value, - size_t* param_value_size_ret); ----- +include::{generated}/api/protos/clGetSVMPointerInfoKHR.txt[] +include::{generated}/api/version-notes/clGetSVMPointerInfoKHR.asciidoc[] queries information about an SVM allocation. @@ -582,7 +570,7 @@ _device_ is an optional OpenCL device handle to query for information about the If _device_ is `NULL`, the default device is the device associated with the SVM allocation, or all devices in the _context_ if there is no device associated with the SVM allocation. _ptr_ is a pointer into an SVM allocation to query. -_ptr_ need not be a value returned by *clSVMAlloc* or *clSVMAllocWithProperties*, but the query may be faster if it is. +_ptr_ need not be a value returned by {clSVMAlloc} or {clSVMAllocWithPropertiesKHR}, but the query may be faster if it is. _param_name_ specifies the information to query. The list of supported _param_name_ values and the information returned in _param_value_ is described in the <<svm-queries-table>> table. @@ -597,86 +585,78 @@ If _param_value_ is `NULL`, it is ignored. _param_value_size_ret_ returns the actual size in bytes of data being queried by _param_name_. If _param_value_size_ret_ is `NULL`, it is ignored. -*clGetSVMPointerInfoKHR* returns `CL_SUCCESS` if the function is executed successfully. +{clGetSVMPointerInfoKHR} returns {CL_SUCCESS} if the function is executed successfully. Otherwise, it will return one of the following error values: -* `CL_INVALID_CONTEXT` if _context_ is not a valid context. -* `CL_INVALID_DEVICE` if _device_ is not a valid device or is not associated with _context_. -* `CL_INVALID_VALUE` if _param_name_ is not a valid SVM allocation query.
-* `CL_INVALID_VALUE` if _param_value_ is not `NULL` and _param_value_size_ is smaller than the size of the query return type. -* `CL_OUT_OF_RESOURCES` if there is a failure to allocate resources required by the OpenCL implementation on the device. -* `CL_OUT_OF_HOST_MEMORY` if there is a failure to allocate resources required by the OpenCL implementation on the host. +* {CL_INVALID_CONTEXT} if _context_ is not a valid context. +* {CL_INVALID_DEVICE} if _device_ is not a valid device or is not associated with _context_. +* {CL_INVALID_VALUE} if _param_name_ is not a valid SVM allocation query. +* {CL_INVALID_VALUE} if _param_value_ is not `NULL` and _param_value_size_ is smaller than the size of the query return type. +* {CL_OUT_OF_RESOURCES} if there is a failure to allocate resources required by the OpenCL implementation on the device. +* {CL_OUT_OF_HOST_MEMORY} if there is a failure to allocate resources required by the OpenCL implementation on the host. [[svm-queries-table]] .List of supported param_names by clGetSVMPointerInfoKHR [width="100%",cols="<34%,<33%,<33%",options="header"] |==== | *cl_svm_pointer_info_khr* | Return type | Info. returned in _param_value_ -| `CL_SVM_INFO_TYPE_INDEX_KHR` - | `cl_uint` +| {CL_SVM_INFO_TYPE_INDEX_KHR} + | {cl_uint_TYPE} | Returns the SVM type index used to allocate the SVM allocation. - Returns `CL_UINT_MAX` if _ptr_ does not point into an SVM allocation returned from *clSVMAllocWithPropertiesKHR* or *clSVMAlloc* for _context_. + Returns {CL_UINT_MAX} if _ptr_ does not point into an SVM allocation returned from {clSVMAllocWithPropertiesKHR} or {clSVMAlloc} for _context_. -| `CL_SVM_INFO_CAPABILITIES_KHR` - | `cl_svm_capabilities_khr` +| {CL_SVM_INFO_CAPABILITIES_KHR} + | {cl_svm_capabilities_khr_TYPE} | Returns the SVM capabilities for the SVM allocation for the specified _device_. 
If _device_ is `NULL` and there is a device associated with the SVM allocation, returns the SVM capabilities for the device associated with the SVM allocation. If _device_ is `NULL` and there is no device associated with the SVM allocation, returns the SVM capabilities for all devices in _context_ supporting the SVM allocation. - Returns `0` if _ptr_ does not point into an SVM allocation returned from *clSVMAllocWithPropertiesKHR* or *clSVMAlloc* for _context_. + Returns `0` if _ptr_ does not point into an SVM allocation returned from {clSVMAllocWithPropertiesKHR} or {clSVMAlloc} for _context_. -| `CL_SVM_INFO_PROPERTIES_KHR` - | `cl_svm_alloc_properties_khr` - | Returns the properties argument specified in *clSVMAllocWithPropertiesKHR* when _ptr_ was allocated. +| {CL_SVM_INFO_PROPERTIES_KHR} + | {cl_svm_alloc_properties_khr_TYPE} + | Returns the properties argument specified in {clSVMAllocWithPropertiesKHR} when _ptr_ was allocated. - If the properties argument specified in *clSVMAllocWithPropertiesKHR* was not `NULL`, the implementation must return the values specified in the properties argument in the same order and without including additional properties. + If the properties argument specified in {clSVMAllocWithPropertiesKHR} was not `NULL`, the implementation must return the values specified in the properties argument in the same order and without including additional properties. - If the properties argument specified in *clSVMAllocWithPropertiesKHR* was `NULL`, or if _ptr_ was allocated using *clSVMAlloc*, or if _ptr_ does not point into an SVM allocation returned from *clSVMAllocWithPropertiesKHR* or *clSVMAlloc* for _context_, the implementation must return _param_value_size_ret_ equal to `0`, indicating that there are no properties to be returned. 
+ If the properties argument specified in {clSVMAllocWithPropertiesKHR} was `NULL`, or if _ptr_ was allocated using {clSVMAlloc}, or if _ptr_ does not point into an SVM allocation returned from {clSVMAllocWithPropertiesKHR} or {clSVMAlloc} for _context_, the implementation must return _param_value_size_ret_ equal to `0`, indicating that there are no properties to be returned. -| `CL_SVM_INFO_ACCESS_FLAGS_KHR` - | `cl_svm_alloc_access_flags_khr` - | Returns access flags for the SVM allocation, specified by the `CL_SVM_ALLOC_ACCESS_FLAGS_KHR` property. +| {CL_SVM_INFO_ACCESS_FLAGS_KHR} + | {cl_svm_alloc_access_flags_khr_TYPE} + | Returns access flags for the SVM allocation, specified by the {CL_SVM_ALLOC_ACCESS_FLAGS_KHR} property. - Returns `0` if the `CL_SVM_ALLOC_ACCESS_FLAGS_KHR` property was not specified when _ptr_ was allocated, or if _ptr_ does not point into an SVM allocation returned from *clSVMAllocWithPropertiesKHR* or *clSVMAlloc* for _context_. + Returns `0` if the {CL_SVM_ALLOC_ACCESS_FLAGS_KHR} property was not specified when _ptr_ was allocated, or if _ptr_ does not point into an SVM allocation returned from {clSVMAllocWithPropertiesKHR} or {clSVMAlloc} for _context_. TODO: Check if `0` is the right default in all cases. - If _ptr_ does not point into an SVM allocation returned from *clSVMAllocWithPropertiesKHR* or *clSVMAlloc* for _context_ should we return `NOREAD \| NOWRITE` instead? + If _ptr_ does not point into an SVM allocation returned from {clSVMAllocWithPropertiesKHR} or {clSVMAlloc} for _context_ should we return `NOREAD \| NOWRITE` instead? What if _device_ is different than the device associated with the SVM allocation? -| `CL_SVM_INFO_BASE_PTR_KHR` +| {CL_SVM_INFO_BASE_PTR_KHR} | `void*` | Returns the base address of the SVM allocation. - Returns `NULL` if _ptr_ does not point into an SVM allocation returned from *clSVMAllocWithPropertiesKHR* or *clSVMAlloc* for _context_. 
+ Returns `NULL` if _ptr_ does not point into an SVM allocation returned from {clSVMAllocWithPropertiesKHR} or {clSVMAlloc} for _context_. -| `CL_SVM_INFO_SIZE_KHR` - | `size_t` +| {CL_SVM_INFO_SIZE_KHR} + | {size_t_TYPE} | Returns the size in bytes of the SVM allocation. - Returns `0` if _ptr_ does not point into an SVM allocation returned from *clSVMAllocWithPropertiesKHR* or *clSVMAlloc* for _context_. + Returns `0` if _ptr_ does not point into an SVM allocation returned from {clSVMAllocWithPropertiesKHR} or {clSVMAlloc} for _context_. -| `CL_SVM_INFO_ASSOCIATED_DEVICE_HANDLE_KHR` - | `cl_device_id` +| {CL_SVM_INFO_ASSOCIATED_DEVICE_HANDLE_KHR} + | {cl_device_id_TYPE} | Returns the device associated with the SVM allocation. - Returns `NULL` if the SVM allocation has no associated device handle, or if _ptr_ does not point into an SVM allocation returned from *clSVMAllocWithPropertiesKHR* or *clSVMAlloc* for _context_. + Returns `NULL` if the SVM allocation has no associated device handle, or if _ptr_ does not point into an SVM allocation returned from {clSVMAllocWithPropertiesKHR} or {clSVMAlloc} for _context_. |==== ===== Suggesting an SVM Type The function -[source] ----- -cl_int clGetSVMSuggestedTypeIndexKHR( - cl_context context, - cl_svm_capabilities_khr required_capabilities, - cl_svm_capabilities_khr desired_capabilities, - const cl_svm_alloc_properties_khr* properties, - size_t size, - cl_uint* suggested_svm_type_index); ----- +include::{generated}/api/protos/clGetSVMSuggestedTypeIndexKHR.txt[] +include::{generated}/api/version-notes/clGetSVMSuggestedTypeIndexKHR.asciidoc[] suggests an SVM allocation type that meets the required SVM capabilities. @@ -696,43 +676,43 @@ _size_ is the size in bytes for the suggestion. If _size_ is `0`, it is ignored. _suggested_svm_type_index_ is a pointer that will contain the result of the query. 
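The text here does not define how an implementation ranks candidate SVM types, so the following is only one plausible model of the suggestion query: require every bit in _required_capabilities_, then prefer the type that offers the most bits from _desired_capabilities_. All names in it are hypothetical (`svm_caps_t`, `NO_SVM_TYPE`, `count_bits`, `suggest_type_index`), with `NO_SVM_TYPE` playing the role of the `CL_UINT_MAX` result. The sketch ignores _properties_ and _size_, which the real query also takes as inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for cl_svm_capabilities_khr (a bitfield). */
typedef uint64_t svm_caps_t;

/* Hypothetical marker mirroring the CL_UINT_MAX "no suitable type" result. */
#define NO_SVM_TYPE UINT32_MAX

/* Counts the set bits in v by clearing the lowest set bit each pass. */
static unsigned count_bits(svm_caps_t v)
{
    unsigned n = 0;
    while (v != 0) {
        v &= v - 1;
        n++;
    }
    return n;
}

/* Assumed policy, not taken from the extension: among supported types
 * (non-zero entries) that contain every required bit, return the index
 * offering the most desired bits; ties go to the lowest index. */
static uint32_t suggest_type_index(const svm_caps_t *type_caps, size_t count,
                                   svm_caps_t required, svm_caps_t desired)
{
    assert(type_caps != NULL || count == 0);
    uint32_t best = NO_SVM_TYPE;
    unsigned best_score = 0;
    for (size_t i = 0; i < count; i++) {
        if (type_caps[i] == 0)
            continue; /* type not supported */
        if ((type_caps[i] & required) != required)
            continue; /* a required capability is missing */
        unsigned score = count_bits(type_caps[i] & desired);
        if (best == NO_SVM_TYPE || score > best_score) {
            best = (uint32_t)i;
            best_score = score;
        }
    }
    return best;
}
```

Under this model, a caller that treats host access as optional would put the device access bits in `required` and the host access bits in `desired`, then check the result against `NO_SVM_TYPE` before allocating.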
-The suggested SVM type may be `CL_UINT_MAX`, indicating that there is no SVM allocation type for the _context_ and devices in _device_list_ that support the _required_capabilities_ and _properties_. +The suggested SVM type may be {CL_UINT_MAX}, indicating that there is no SVM allocation type for the _context_ and devices in _device_list_ that support the _required_capabilities_ and _properties_. -*clGetSuggestedSVMTypeKHR* returns `CL_SUCCESS` if the query executed successfully. Otherwise, it returns one of the following errors: +{clGetSVMSuggestedTypeIndexKHR} returns {CL_SUCCESS} if the query executed successfully. Otherwise, it returns one of the following errors: -* `CL_INVALID_CONTEXT` if _context_ is not a valid context. -* `CL_INVALID_PROPERTY` if a memory property name in _properties_ is not a supported property name, if the value specified for a supported property name is not valid, or if the same property name is specified more than once. -* `CL_INVALID_VALUE` if _required_capabilities_ or _desired_capabilities_ contains an invalid SVM capability. -* `CL_INVALID_BUFFER_SIZE` if _size_ is greater than `CL_DEVICE_MAX_MEM_ALLOC_SIZE` for any OpenCL device in _context_ or if _size_ is greater than `CL_DEVICE_MAX_MEM_ALLOC_SIZE` for a device associated with the SVM allocation. +* {CL_INVALID_CONTEXT} if _context_ is not a valid context. +* {CL_INVALID_PROPERTY} if a memory property name in _properties_ is not a supported property name, if the value specified for a supported property name is not valid, or if the same property name is specified more than once. +* {CL_INVALID_VALUE} if _required_capabilities_ or _desired_capabilities_ contains an invalid SVM capability. +* {CL_INVALID_BUFFER_SIZE} if _size_ is greater than {CL_DEVICE_MAX_MEM_ALLOC_SIZE} for any OpenCL device in _context_ or if _size_ is greater than {CL_DEVICE_MAX_MEM_ALLOC_SIZE} for a device associated with the SVM allocation. 
TODO: update depending on the updated queries for available SVM sizes. -* `CL_INVALID_VALUE` if _suggested_svm_type_index_ is `NULL`. -* `CL_OUT_OF_RESOURCES` if there is a failure to allocate resources required by the OpenCL implementation on the device. -* `CL_OUT_OF_HOST_MEMORY` if there is a failure to allocate resources required by the OpenCL implementation on the host. +* {CL_INVALID_VALUE} if _suggested_svm_type_index_ is `NULL`. +* {CL_OUT_OF_RESOURCES} if there is a failure to allocate resources required by the OpenCL implementation on the device. +* {CL_OUT_OF_HOST_MEMORY} if there is a failure to allocate resources required by the OpenCL implementation on the host. ===== Using SVM with Kernels SVM allocations may be accessed by kernels indirectly, without passing a pointer to the allocation as a kernel argument. -The new _param_name_ values described below may be used with the existing *clSetKernelExecInfo* function to describe how SVM allocations are accessed indirectly by a kernel: +The new _param_name_ values described below may be used with the existing {clSetKernelExecInfo} function to describe how SVM allocations are accessed indirectly by a kernel: [caption="Table 28. "] .List of supported param_names by clSetKernelExecInfo [width="100%",cols="<34%,<33%,<33%",options="header"] |==== | *cl_kernel_exec_info* | Type | Description -| `CL_KERNEL_EXEC_INFO_SVM_INDIRECT_ACCESS_KHR` - | `cl_bool` - | Specifies whether SVM allocations from *clSVMAlloc* or *clSVMAllocWithPropertiesKHR* may be accessed indirectly within a kernel. +| {CL_KERNEL_EXEC_INFO_SVM_INDIRECT_ACCESS_KHR} + | {cl_bool_TYPE} + | Specifies whether SVM allocations from {clSVMAlloc} or {clSVMAllocWithPropertiesKHR} may be accessed indirectly within a kernel. 
- When `CL_KERNEL_EXEC_INFO_SVM_INDIRECT_ACCESS_KHR` is `CL_FALSE`, the kernel may only access SVM allocations from *clSVMAlloc* or *clSVMAllocWithPropertiesKHR* that are explicitly passed as kernel arguments or using `CL_KERNEL_EXEC_INFO_SVM_PTRS`. + When {CL_KERNEL_EXEC_INFO_SVM_INDIRECT_ACCESS_KHR} is {CL_FALSE}, the kernel may only access SVM allocations from {clSVMAlloc} or {clSVMAllocWithPropertiesKHR} that are explicitly passed as kernel arguments or using {CL_KERNEL_EXEC_INFO_SVM_PTRS}. - When `CL_KERNEL_EXEC_INFO_SVM_INDIRECT_ACCESS_KHR` is `CL_TRUE`, the kernel may access any SVM pointers allocated by *clSVMAlloc* or *clSVMAllocWithPropertiesKHR* on any device where the SVM allocation type includes `CL_SVM_CAPABILITY_INDIRECT_ACCESS_KHR`. + When {CL_KERNEL_EXEC_INFO_SVM_INDIRECT_ACCESS_KHR} is {CL_TRUE}, the kernel may access any SVM pointers allocated by {clSVMAlloc} or {clSVMAllocWithPropertiesKHR} on any device where the SVM allocation type includes {CL_SVM_CAPABILITY_INDIRECT_ACCESS_KHR}. - By default, indirect access is disabled for all SVM allocations (except fine-grain system SVM allocations, see `CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM`), indicating that the kernel will only access SVM allocations that are explicitly passed as kernel arguments or using `CL_KERNEL_EXEC_INFO_SVM_PTRS`. + By default, indirect access is disabled for all SVM allocations (except fine-grain system SVM allocations, see {CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM}), indicating that the kernel will only access SVM allocations that are explicitly passed as kernel arguments or using {CL_KERNEL_EXEC_INFO_SVM_PTRS}. |==== -The following errors may be returned by *clSetKernelExecInfo* for these new _param_name_ values: +The following errors may be returned by {clSetKernelExecInfo} for these new _param_name_ values: -* `CL_INVALID_OPERATION` if _param_name_ is `CL_KERNEL_EXEC_INFO_SVM_INDIRECT_ACCESS_KHR` and no devices in the context associated with _kernel_ support SVM. 
+* {CL_INVALID_OPERATION} if _param_name_ is {CL_KERNEL_EXEC_INFO_SVM_INDIRECT_ACCESS_KHR} and no devices in the context associated with _kernel_ support SVM. == Interactions with Other Extensions @@ -751,11 +731,11 @@ Interactions with command buffers? + -- *UNRESOLVED*: -Added a device query for `CL_DEVICE_SVM_CONCURRENT_ACCESS_ATOM_SIZE_KHR`. +Added a device query for {CL_DEVICE_SVM_CONCURRENT_ACCESS_ATOM_SIZE_KHR}. Still TODO: -* What should the return type be for this query? Specifically, should it be a `size_t` or a `cl_uint`? +* What should the return type be for this query? Specifically, should it be a {size_t_TYPE} or a {cl_uint_TYPE}? * Do we want to distinguish between host concurrent access and device concurrent access, even if the only query we define for now is for host concurrent access? * Are we OK with a single query for all types of SVM, or do we need a separate query for each SVM type? * Is this only for concurrent access without atomic access? @@ -764,7 +744,7 @@ Still TODO: . What other SVM allocation properties should we support? + -- -`RESOLVED`: We decided not to accept any `cl_mem_flags` or `cl_svm_mem_flags`, and added access properties instead. +`RESOLVED`: We decided not to accept any {cl_mem_flags_TYPE} or {cl_svm_mem_flags_TYPE}, and added access properties instead. -- . Do we need separate "concurrent access" capabilities for host access vs. device access? @@ -793,7 +773,7 @@ The initial version of this extension will only support allocating host memory. `RESOLVED`: No, we will not support a _flags_ argument, and we will only support _properties_. -- -. What should behavior be for *clGetSVMPointerInfoKHR* if the passed-in _ptr_ is `NULL` or doesn't point into an SVM allocation? +. What should behavior be for {clGetSVMPointerInfoKHR} if the passed-in _ptr_ is `NULL` or doesn't point into an SVM allocation? + -- `RESOLVED`: The behavior is defined for all queries for this case. 
@@ -814,7 +794,7 @@ The initial version of this extension will only support allocating host memory. * Can a device "fill" arbitrary host memory? (No, undefined behavior unless system SVM is supported.) * Can a device "fill" a USM allocation from another context? (No, undefined behavior.) -Note, there are no existing CTS tests that pass an arbitrary host allocation to *clEnqueueSVMMemFill*. +Note, there are no existing CTS tests that pass an arbitrary host allocation to {clEnqueueSVMMemFill}. -- . What are the restrictions for the _src_ptr_ and _dst_ptr_ values that can be passed to the "memcpy" API? @@ -835,22 +815,22 @@ Note, there are no existing CTS tests that pass an arbitrary host allocation to + -- `RESOLVED`: -The initial version of this extension will not extend *clEnqueueSVMMigrateMem*, and hence will only support migrating to the device or to the host. +The initial version of this extension will not extend {clEnqueueSVMMigrateMem}, and hence will only support migrating to the device or to the host. -- . Should we support migrating an array of pointers with one API call? + -- -`RESOLVED`: This is supported by *clEnqueueSVMMigrateMem*. +`RESOLVED`: This is supported by {clEnqueueSVMMigrateMem}. -- . Could the associated device be `NULL` if there is no need to associate a shared allocation to a specific device? + -- -`RESOLVED`: Yes, the associated device may be `NULL`, if the SVM type supports the `CL_SVM_CAPABILITY_DEVICE_UNASSOCIATED_KHR` capability. +`RESOLVED`: Yes, the associated device may be `NULL`, if the SVM type supports the {CL_SVM_CAPABILITY_DEVICE_UNASSOCIATED_KHR} capability. -- -. Should we allow querying the associated device for a USM allocation using *clGetSVMPointerInfoKHR*? +. Should we allow querying the associated device for a USM allocation using {clGetSVMPointerInfoKHR}? + -- `RESOLVED`: Yes, we should. 
@@ -871,8 +851,8 @@ The layered extension could add coarse `CACHED` and `UNCACHED` properties, or se `RESOLVED`: We removed the _flags_ argument entirely. -- -. What are invalid values for `ptr` and `size` for *clEnqueueSVMMigrateMem*? -How about *clEnqueueSVMMemFill* and *clEnqueueSVMMemcpy*? +. What are invalid values for `ptr` and `size` for {clEnqueueSVMMigrateMem}? +How about {clEnqueueSVMMemFill} and {clEnqueueSVMMemcpy}? Specifically, is `NULL` a valid value for `ptr`? Is `size` equal to zero valid? + @@ -882,13 +862,13 @@ Is `size` equal to zero valid? Tentative resolution: .. A `size` equal to zero is valid. -When `size` is zero, the call to *clEnqueueSVMMigrateMem*, *clEnqueueSVMMemFill*, and *clEnqueueSVMMemcpy* trivially succeeds, similar to an enqueued marker. +When `size` is zero, the call to {clEnqueueSVMMigrateMem}, {clEnqueueSVMMemFill}, and {clEnqueueSVMMemcpy} trivially succeeds, similar to an enqueued marker. This appears to be the specified behavior for the C `memcpy` and `memset` functions. .. A `ptr` equal to `NULL` is valid if and only if `size` is equal to zero, otherwise it is an error. Allowing `size` to be zero and `ptr` to be `NULL` provides the most flexibility for other language runtimes built on top of OpenCL and the additional testing is manageable. -Note that in the current OpenCL spec, it is unconditionally a `CL_INVALID_VALUE` error to pass `ptr` equal to `NULL` for *clEnqueueSVMMigrateMem*, *clEnqueueSVMMemcpy*, or *clEnqueueSVMMemFill*, so this will need to be explicitly relaxed for implementations supporting this extension. +Note that in the current OpenCL spec, it is unconditionally a {CL_INVALID_VALUE} error to pass `ptr` equal to `NULL` for {clEnqueueSVMMigrateMem}, {clEnqueueSVMMemcpy}, or {clEnqueueSVMMemFill}, so this will need to be explicitly relaxed for implementations supporting this extension. 
There is currently no defined error behavior for `size` equal to zero, so this will not need to be explicitly relaxed in this extension, but it will need to be stated explicitly and tested. @@ -926,27 +906,27 @@ See internal merge request 198. The initial version of this extension will not support larger fill patterns. -- -. Can a pointer to a device, host, or shared SVM allocation be used to create a `cl_mem` using `CL_MEM_USE_HOST_PTR`? +. Can a pointer to a device, host, or shared SVM allocation be used to create a {cl_mem_TYPE} using {CL_MEM_USE_HOST_PTR}? + -- *UNRESOLVED*: Trending "no" in all cases. -If the SVM allocation is from the same context this could be an error, such as `CL_INVALID_HOST_PTR`. +If the SVM allocation is from the same context this could be an error, such as {CL_INVALID_HOST_PTR}. If the SVM allocation is from a different context then behavior could be undefined. -- -. Can a pointer to a device, host, or shared SVM allocation be used to create a `cl_mem` buffer using `CL_MEM_COPY_HOST_PTR`? +. Can a pointer to a device, host, or shared SVM allocation be used to create a {cl_mem_TYPE} buffer using {CL_MEM_COPY_HOST_PTR}? + -- *UNRESOLVED*: Trending "no" for device and shared USM allocations. -If the USM allocation is from the same context this could be an error, such as `CL_INVALID_HOST_PTR`. +If the USM allocation is from the same context this could be an error, such as {CL_INVALID_HOST_PTR}. If the USM allocation is from a different context then behavior could be undefined. Trending "yes" for host USM allocations, both when the host USM allocation is from this context and from another context. -- -. Can a pointer to a device, host, or shared SVM allocation be passed to API functions to read from or write to `cl_mem` objects, such as *clEnqueueReadBuffer* or *clEnqueueWriteImage*? +. 
Can a pointer to a device, host, or shared SVM allocation be passed to API functions to read from or write to {cl_mem_TYPE} objects, such as {clEnqueueReadBuffer} or {clEnqueueWriteImage}? + -- *UNRESOLVED*: @@ -955,25 +935,25 @@ Trending "yes" for device SVM allocations, so long as the device SVM allocation Trending "yes" for host USM allocations, both when the host USM allocation is from this context and from another context. Trending "no" for shared USM allocations. -If the shared USM allocation is from the same context this could be an error, such as `CL_INVALID_HOST_PTR`. +If the shared USM allocation is from the same context this could be an error, such as {CL_INVALID_HOST_PTR}. If the shared USM allocation is from a different context then behavior could be undefined. -- -. Can a pointer to a device, host, or shared USM allocation be passed as the `pattern` argument to API functions to fill a `cl_mem`, SVM allocation, or USM allocation, such as *clEnqueueFillBuffer*? +. Can a pointer to a device, host, or shared USM allocation be passed as the `pattern` argument to API functions to fill a {cl_mem_TYPE}, SVM allocation, or USM allocation, such as {clEnqueueFillBuffer}? + -- *UNRESOLVED*: Trending "no" for device and shared allocations. -If the USM allocation is from the same context this could be an error, such as `CL_INVALID_HOST_PTR`. +If the USM allocation is from the same context this could be an error, such as {CL_INVALID_HOST_PTR}. If the USM allocation is from a different context then behavior could be undefined. Trending "yes" for host USM allocations, both when the host USM allocation is from this context and from another context. -- -. Should we support passing traditional `cl_mem_flags` via the USM allocation properties? +. Should we support passing traditional {cl_mem_flags_TYPE} via the USM allocation properties? + -- -`RESOLVED`: We decided not to accept any `cl_mem_flags` or `cl_svm_mem_flags`, and added access properties instead. 
+`RESOLVED`: We decided not to accept any {cl_mem_flags_TYPE} or {cl_svm_mem_flags_TYPE}, and added access properties instead. -- . Exactly how do the additional SVM types affect the memory model? @@ -981,24 +961,24 @@ Trending "yes" for host USM allocations, both when the host USM allocation is fr -- *UNRESOLVED*: This issue may be easier to resolve now that this is a "unified SVM" extension vs. a "USM" extension, but it will still need more thought. -One particular enhancement we may want to consider, though, is whether calling *clGetEventInfo* and passing `CL_EVENT_COMMAND_EXECUTION_STATUS` to query the event status is a synchronization point. +One particular enhancement we may want to consider, though, is whether calling {clGetEventInfo} and passing {CL_EVENT_COMMAND_EXECUTION_STATUS} to query the event status is a synchronization point. In the current specification, this is explicitly not a synchronization point. However, in other APIs, querying the event status and observing that the event is complete is a synchronization point. -Should we adopt this behavior also, or do we want users to call *clWaitForEvents* to define a synchronization point? +Should we adopt this behavior also, or do we want users to call {clWaitForEvents} to define a synchronization point? -- -. Should it be an error to set an unknown pointer as a kernel argument using *clSetKernelArgSVMPointer* if no devices support shared system allocations? +. Should it be an error to set an unknown pointer as a kernel argument using {clSetKernelArgSVMPointer} if no devices support shared system allocations? + -- *UNRESOLVED*: Returning an error for an unknown pointer is helpful to identify and diagnose possible programming errors sooner, but passing a pointer to arbitrary memory to a function on the host is not an error until the pointer is dereferenced. 
-If we relax the error condition for *clSetKernelArgSVMPointer* then we could also consider relaxing the error condition for *clSetKernelExecInfo*(`CL_KERNEL_EXEC_INFO_SVM_PTRS`) similarly. +If we relax the error condition for {clSetKernelArgSVMPointer} then we could also consider relaxing the error condition for {clSetKernelExecInfo}({CL_KERNEL_EXEC_INFO_SVM_PTRS}) similarly. Note that if the error condition is removed we can still check for possible programming errors via optional USM checking layers, such as the https://github.com/intel/opencl-intercept-layer/blob/master/docs/controls.md#usmchecking-bool[USMChecking] functionality in the https://github.com/intel/opencl-intercept-layer[OpenCL Intercept Layer]. -- -. Should we support a "rect" or "2D" memcpy similar to *clEnqueueCopyBufferRect*? +. Should we support a "rect" or "2D" memcpy similar to {clEnqueueCopyBufferRect}? + -- *UNRESOLVED*: @@ -1015,16 +995,16 @@ If so, what should the upper limit be? + -- *UNRESOLVED*: -The upper limit is currently defined by `CL_DEVICE_MAX_MEM_ALLOC_SIZE` and if the allocation size exceeds this value then the error code `CL_INVALID_BUFFER_SIZE` is returned. +The upper limit is currently defined by {CL_DEVICE_MAX_MEM_ALLOC_SIZE} and if the allocation size exceeds this value then the error code {CL_INVALID_BUFFER_SIZE} is returned. -This behavior is consistent with *clSVMAlloc* (although *clSVMAlloc* does not return an error code it is specified to return a `NULL` pointer in this case) and *clCreateBuffer*. +This behavior is consistent with {clSVMAlloc} (although {clSVMAlloc} does not return an error code it is specified to return a `NULL` pointer in this case) and {clCreateBuffer}. However, for host allocations, some implementations are able to support larger allocation sizes. Possible resolutions: * Add a new query representing the maximum host memory allocation size supported by the device, e.g. `CL_DEVICE_MAX_HOST_MEM_ALLOC_SIZE_KHR`. 
-For some devices, this query will return the same value as `CL_DEVICE_MAX_MEM_ALLOC_SIZE`, but for other devices this query will return a larger value. -* Relax the error behavior so implementations may return `CL_INVALID_BUFFER_SIZE`, but they would not be required to return an error if they support larger allocation sizes. +For some devices, this query will return the same value as {CL_DEVICE_MAX_MEM_ALLOC_SIZE}, but for other devices this query will return a larger value. +* Relax the error behavior so implementations may return {CL_INVALID_BUFFER_SIZE}, but they would not be required to return an error if they support larger allocation sizes. * Do nothing and keep the existing error behavior. -- @@ -1039,9 +1019,9 @@ This is considered a successful operation and no error will be returned. We evaluated many scenarios and determined that there is no clearly correct behavior. The scenarios we evaluated were: -* For OpenCL 2.0 SVM, *clSVMAlloc* with a size of zero is specified to return a `NULL` pointer. -Because *clSVMAlloc* has no mechanism to return an error code, it is unspecified whether this is considered an error. -* For `cl_intel_unified_shared_memory`, calling *clDeviceMemAllocINTEL*, etc. returns `CL_INVALID_BUFFER_SIZE` if the size to allocate is zero. +* For OpenCL 2.0 SVM, {clSVMAlloc} with a size of zero is specified to return a `NULL` pointer. +Because {clSVMAlloc} has no mechanism to return an error code, it is unspecified whether this is considered an error. +* For `cl_intel_unified_shared_memory`, calling {clDeviceMemAllocINTEL}, etc. returns {CL_INVALID_BUFFER_SIZE} if the size to allocate is zero. * For CUDA, calling *cuMemAlloc*, etc. returns an error if the size to allocate is zero. * The result of calling `malloc(0)` is implementation-defined: it can either return a `NULL` pointer or a unique non-null pointer that must be freed. If a `NULL` pointer is returned then `errno` may be set to an implementation-defined value. 
@@ -1083,14 +1063,14 @@ My rough expectations are that if an allocation is made against a context with a Unless additional clarification is needed, perhaps this issue may simply be resolved. -- -. Should we move more of the *clSVMAllocWithProperties* arguments to properties? +. Should we move more of the {clSVMAllocWithPropertiesKHR} arguments to properties? + -- `RESOLVED`: We moved the access flags and alignment to properties, so the only required arguments are now the properties, the SVM type index, and the SVM allocation size. -- -. Does the *clGetSuggestedSVMCapabilitiesKHR* query apply to _all_ of the devices in the device list or context, or to _any_ of the devices in the device list or context? +. Does the {clGetSVMSuggestedTypeIndexKHR} query apply to _all_ of the devices in the device list or context, or to _any_ of the devices in the device list or context? + -- *UNRESOLVED*: The query should probably apply to _all_ of the devices in the device list or context, though other interpretations may make sense in some cases. @@ -1103,9 +1083,9 @@ This is especially important if the required SVM capabilities contains e.g. "dev -- `RESOLVED`: Yes, we should. We now have: -* `CL_KERNEL_EXEC_INFO_SVM_INDIRECT_ACCESS_KHR`, added by this extension, which enables indirect access for all SVM allocations made through the driver (by calling *clSVMAlloc* or *clSVMAllocWithPropertiesKHR*). +* {CL_KERNEL_EXEC_INFO_SVM_INDIRECT_ACCESS_KHR}, added by this extension, which enables indirect access for all SVM allocations made through the driver (by calling {clSVMAlloc} or {clSVMAllocWithPropertiesKHR}). Indirect access for these types of allocations is **disabled** by default. -* `CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM`, already in the core specification, which enables indirect access for SVM allocations made using a system allocator. 
+* {CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM}, already in the core specification, which enables indirect access for SVM allocations made using a system allocator. Indirect access for these types of allocations is **enabled** by default, though it is ignored for devices that do not support system SVM. -- @@ -1116,7 +1096,7 @@ Indirect access for these types of allocations is **enabled** by default, though If an allocation is created with the *DEVICE_NOWRITE* flag, then it can only be initialized on the host. This extension does not support initialize an allocation with both the *HOST_NOWRITE* and *DEVICE_NOWRITE* flags. -If desired, a layered extension could add a new property to *clSVMAllocWithPropertiesKHR* that would specify a pointer with the initial contents of an SVM allocation with both the *HOST_NOWRITE* and *DEVICE_NOWRITE* access flags. +If desired, a layered extension could add a new property to {clSVMAllocWithPropertiesKHR} that would specify a pointer with the initial contents of an SVM allocation with both the *HOST_NOWRITE* and *DEVICE_NOWRITE* access flags. -- == Revision History From 6cacb89a838251479963b3a85162745c9ea3d621 Mon Sep 17 00:00:00 2001 From: Ben Ashbaugh Date: Mon, 15 Sep 2025 11:15:03 -0700 Subject: [PATCH 13/18] minor fixes for system SVM allocated by driver APIs --- extensions/cl_khr_unified_svm.asciidoc | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/extensions/cl_khr_unified_svm.asciidoc b/extensions/cl_khr_unified_svm.asciidoc index 8286bc727..20ad9aed5 100644 --- a/extensions/cl_khr_unified_svm.asciidoc +++ b/extensions/cl_khr_unified_svm.asciidoc @@ -231,7 +231,7 @@ No SVM free flags are added by this extension: typedef cl_bitfield cl_svm_free_flags_khr; ---- -Enumeration type and values for the _param_name_ parameter to {clGetSVMPointerInfoKHR} to query information about an SVM allocation. 
+Enumeration type and values for the _param_name_ parameter to {clGetSVMPointerInfoKHR} to query information about SVM allocations made by {clSVMAllocWithPropertiesKHR} or {clSVMAlloc}. [source] ---- @@ -378,12 +378,12 @@ The following table provides a high-level summary of SVM capabilities for some c | Host | Yes | Host | N/A | Any Device | Yes (perhaps over a bus, such as PCIe) | Device | No -.3+| **Shared SVM** .3+| Host, or Associated Device, or Unspecified +.3+| **Shared SVM** .3+| Associated Device or Unspecified | Host | Yes | Host | Yes | Associated Device | Yes | Device | Yes | Another Device | Not With This Extension | Another Device | Not With This Extension -.2+| **Shared System SVM** .2+| Host +.2+| **Shared System SVM** .2+| Associated Device or Unspecified | Host | Yes | Host | Yes | Device | Yes | Device | Yes From 7a0403f87aa93c59778dc3493d47c637eff31ea2 Mon Sep 17 00:00:00 2001 From: Ben Ashbaugh Date: Mon, 15 Sep 2025 11:15:33 -0700 Subject: [PATCH 14/18] system SVM should be mappable by the host --- extensions/cl_khr_unified_svm.asciidoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/extensions/cl_khr_unified_svm.asciidoc b/extensions/cl_khr_unified_svm.asciidoc index 20ad9aed5..d4968c03a 100644 --- a/extensions/cl_khr_unified_svm.asciidoc +++ b/extensions/cl_khr_unified_svm.asciidoc @@ -412,7 +412,7 @@ The following table describes the detailed set of SVM capabilities for some comm | {CL_SVM_CAPABILITY_HOST_OWNED_KHR} | | | | Y | | | {CL_SVM_CAPABILITY_HOST_READ_KHR} | | Y | | Y | Y | Y | {CL_SVM_CAPABILITY_HOST_WRITE_KHR} | | Y | | Y | Y | Y -| {CL_SVM_CAPABILITY_HOST_MAP_KHR} | Y | Y | | | | Y? 
+| {CL_SVM_CAPABILITY_HOST_MAP_KHR} | Y | Y | | | | Y | {CL_SVM_CAPABILITY_DEVICE_READ_KHR} | Y | Y | Y | Y | Y | Y | {CL_SVM_CAPABILITY_DEVICE_WRITE_KHR} | Y | Y | Y | Y | Y | Y | {CL_SVM_CAPABILITY_DEVICE_ATOMIC_ACCESS_KHR} | Y | Y | Y | {O} | {O} | Y From 6fb17166002d71bc99e47c326cdeb9099996af7f Mon Sep 17 00:00:00 2001 From: Ben Ashbaugh Date: Mon, 15 Sep 2025 11:31:23 -0700 Subject: [PATCH 15/18] switch to new error code convention --- extensions/cl_khr_unified_svm.asciidoc | 100 ++++++++++++++++++------- 1 file changed, 72 insertions(+), 28 deletions(-) diff --git a/extensions/cl_khr_unified_svm.asciidoc b/extensions/cl_khr_unified_svm.asciidoc index d4968c03a..fc2f80998 100644 --- a/extensions/cl_khr_unified_svm.asciidoc +++ b/extensions/cl_khr_unified_svm.asciidoc @@ -462,14 +462,27 @@ If _errcode_ret_ is `NULL` then no error code will be returned. {clSVMAllocWithPropertiesKHR} will return a valid non-`NULL` address and {CL_SUCCESS} will be returned in _errcode_ret_ if the shared virtual memory is allocated successfully. Otherwise, `NULL` will be returned, and _errcode_ret_ will be set to one of the following error values: -* {CL_INVALID_CONTEXT} if _context_ is not a valid context. -* {CL_INVALID_PROPERTY} if a memory property name in _properties_ is not a supported property name, if the value specified for a supported property name is not valid, or if the same property name is specified more than once. -* {CL_INVALID_OPERATION} if no devices in _context_ support the SVM type specified by _svm_type_index_, or if a device associated with the SVM allocation does not support the SVM type specified by _svm_type_index_. -* {CL_INVALID_VALUE} if _svm_type_index_ is greater than the number of SVM types supported the devices in _context_. 
-* {CL_INVALID_BUFFER_SIZE} if _size_ is zero or greater than {CL_DEVICE_MAX_MEM_ALLOC_SIZE} for any OpenCL device in _context_ that supports the specified SVM type, or if _size_ is greater than {CL_DEVICE_MAX_MEM_ALLOC_SIZE} for a device associated with the SVM allocation. -TODO: update depending on the updated queries for available SVM sizes. -* {CL_OUT_OF_RESOURCES} if there is a failure to allocate resources required by the OpenCL implementation on the device. -* {CL_OUT_OF_HOST_MEMORY} if there is a failure to allocate resources required by the OpenCL implementation on the host. + * {CL_INVALID_CONTEXT} + ** if _context_ is not a valid OpenCL context + * {CL_INVALID_PROPERTY} + ** if a memory property name in _properties_ is not a supported property name + ** if the value specified for a supported property name is not valid + ** if the same property name is specified more than once + * {CL_INVALID_OPERATION} + ** if no devices in _context_ support the SVM type specified by _svm_type_index_ + ** if a device associated with the SVM allocation does not support the SVM type specified by _svm_type_index_ + * {CL_INVALID_VALUE} + ** if _svm_type_index_ is greater than the number of SVM types supported the devices in _context_ + * {CL_INVALID_BUFFER_SIZE} +// TODO: update depending on the updated queries for available SVM sizes: + ** if _size_ is greater than {CL_DEVICE_MAX_MEM_ALLOC_SIZE} for any OpenCL device in _context_ that supports the specified SVM type + ** if _size_ is greater than {CL_DEVICE_MAX_MEM_ALLOC_SIZE} for a device associated with the SVM allocation + * {CL_OUT_OF_RESOURCES} + ** if there is a failure to allocate resources required by the OpenCL + implementation on the device + * {CL_OUT_OF_HOST_MEMORY} + ** if there is a failure to allocate resources required by the OpenCL + implementation on the host TODO: Do we want to document any specific error conditions for invalid property values? 
@@ -542,12 +555,21 @@ If _ptr_ is `NULL` then no action occurs. {clSVMFreeWithPropertiesKHR} will return {CL_SUCCESS} if the function executes successfully. Otherwise, it returns one of the following errors: -* {CL_INVALID_CONTEXT} if _context_ is not a valid context. -* {CL_INVALID_PROPERTY} if a memory property name in _properties_ is not a supported property name, if the value specified for a supported property name is not valid, or if the same property name is specified more than once. -* {CL_INVALID_VALUE} if _flags_ contains an invalid SVM free flag. -* {CL_INVALID_VALUE} if _ptr_ is not a value returned by {clSVMAlloc}, {clSVMAllocWithPropertiesKHR}, or a `NULL` pointer. -* {CL_OUT_OF_RESOURCES} if there is a failure to allocate resources required by the OpenCL implementation on the device. -* {CL_OUT_OF_HOST_MEMORY} if there is a failure to allocate resources required by the OpenCL implementation on the host. + * {CL_INVALID_CONTEXT} + ** if _context_ is not a valid OpenCL context + * {CL_INVALID_PROPERTY} + ** if a memory property name in _properties_ is not a supported property name + ** if the value specified for a supported property name is not valid + ** if the same property name is specified more than once + * {CL_INVALID_VALUE} + ** if _flags_ contains an invalid SVM free flag + ** if _ptr_ is not a value returned by {clSVMAlloc}, {clSVMAllocWithPropertiesKHR}, or a `NULL` pointer + * {CL_OUT_OF_RESOURCES} + ** if there is a failure to allocate resources required by the OpenCL + implementation on the device + * {CL_OUT_OF_HOST_MEMORY} + ** if there is a failure to allocate resources required by the OpenCL + implementation on the host [NOTE] ==== @@ -588,12 +610,22 @@ If _param_value_size_ret_ is `NULL`, it is ignored. {clGetSVMPointerInfoKHR} returns {CL_SUCCESS} if the function is executed successfully. Otherwise, it will return one of the following error values: -* {CL_INVALID_CONTEXT} if _context_ is not a valid context. 
-* {CL_INVALID_DEVICE} if _device_ is not a valid device or is not associated with _context_. -* {CL_INVALID_VALUE} if _param_name_ is not a valid SVM allocation query. -* {CL_INVALID_VALUE} if _param_value_ is not `NULL` and _param_value_size_ is smaller than the size of the query return type. -* {CL_OUT_OF_RESOURCES} if there is a failure to allocate resources required by the OpenCL implementation on the device. -* {CL_OUT_OF_HOST_MEMORY} if there is a failure to allocate resources required by the OpenCL implementation on the host. + * {CL_INVALID_CONTEXT} + ** if _context_ is not a valid OpenCL context + * {CL_INVALID_DEVICE} + ** if _device_ is not a valid device + ** if _device_ is not associated with _context_ + * {CL_INVALID_VALUE} + ** if _param_name_ is not one of the supported values + ** if the size in bytes specified by _param_value_size_ is less than the + size of the return type specified in the <<svm-queries-table>> + table and _param_value_ is not `NULL` + * {CL_OUT_OF_RESOURCES} + ** if there is a failure to allocate resources required by the OpenCL + implementation on the device + * {CL_OUT_OF_HOST_MEMORY} + ** if there is a failure to allocate resources required by the OpenCL + implementation on the host [[svm-queries-table]] .List of supported param_names by clGetSVMPointerInfoKHR @@ -680,14 +712,26 @@ The suggested SVM type may be {CL_UINT_MAX}, indicating that there is no SVM all {clGetSVMSuggestedTypeIndexKHR} returns {CL_SUCCESS} if the query executed successfully. Otherwise, it returns one of the following errors: -* {CL_INVALID_CONTEXT} if _context_ is not a valid context. -* {CL_INVALID_PROPERTY} if a memory property name in _properties_ is not a supported property name, if the value specified for a supported property name is not valid, or if the same property name is specified more than once. -* {CL_INVALID_VALUE} if _required_capabilities_ or _desired_capabilities_ contains an invalid SVM capability.
-* {CL_INVALID_BUFFER_SIZE} if _size_ is greater than {CL_DEVICE_MAX_MEM_ALLOC_SIZE} for any OpenCL device in _context_ or if _size_ is greater than {CL_DEVICE_MAX_MEM_ALLOC_SIZE} for a device associated with the SVM allocation. -TODO: update depending on the updated queries for available SVM sizes. -* {CL_INVALID_VALUE} if _suggested_svm_type_index_ is `NULL`. -* {CL_OUT_OF_RESOURCES} if there is a failure to allocate resources required by the OpenCL implementation on the device. -* {CL_OUT_OF_HOST_MEMORY} if there is a failure to allocate resources required by the OpenCL implementation on the host. + * {CL_INVALID_CONTEXT} + ** if _context_ is not a valid OpenCL context + * {CL_INVALID_PROPERTY} + ** if a memory property name in _properties_ is not a supported property name + ** if the value specified for a supported property name is not valid + ** if the same property name is specified more than once + * {CL_INVALID_VALUE} + ** if _required_capabilities_ contains an invalid SVM capability + ** if _desired_capabilities_ contains an invalid SVM capability + ** if _suggested_svm_type_index_ is `NULL` + * {CL_INVALID_BUFFER_SIZE} +// TODO: update depending on the updated queries for available SVM sizes: + ** if _size_ is greater than {CL_DEVICE_MAX_MEM_ALLOC_SIZE} for any OpenCL device in _context_ that supports the specified SVM type + ** if _size_ is greater than {CL_DEVICE_MAX_MEM_ALLOC_SIZE} for a device associated with the SVM allocation + * {CL_OUT_OF_RESOURCES} + ** if there is a failure to allocate resources required by the OpenCL + implementation on the device + * {CL_OUT_OF_HOST_MEMORY} + ** if there is a failure to allocate resources required by the OpenCL + implementation on the host ===== Using SVM with Kernels From 862a2804f8d1535b1c8ba3eb118950746f2bb1ae Mon Sep 17 00:00:00 2001 From: Ben Ashbaugh Date: Tue, 7 Oct 2025 08:18:07 -0700 Subject: [PATCH 16/18] tidy up and resolve a few open issues --- extensions/cl_khr_unified_svm.asciidoc | 59 
+++++++++++++++----------- 1 file changed, 35 insertions(+), 24 deletions(-) diff --git a/extensions/cl_khr_unified_svm.asciidoc b/extensions/cl_khr_unified_svm.asciidoc index fc2f80998..63e8a1f42 100644 --- a/extensions/cl_khr_unified_svm.asciidoc +++ b/extensions/cl_khr_unified_svm.asciidoc @@ -821,27 +821,30 @@ The initial version of this extension will only support allocating host memory. + -- `RESOLVED`: The behavior is defined for all queries for this case. + +This behavior is tested by the CTS test `unified_svm_api_query_defaults`. -- . Do we want separate "memset" APIs to set to different sized "value", such as 8-bits, 16-bits?, 32-bits, or others? Do we want to go back to a "fill" API? + -- -`RESOLVED`: We are reusing the "fill" API. +`RESOLVED`: We are reusing the "fill" API {clEnqueueSVMMemFill}. -- -. What are the restrictions for the _dst_ptr_ values that can be passed to the "fill" API? +. What are the restrictions for the _dst_ptr_ values that can be passed to {clEnqueueSVMMemFill}? + -- `RESOLVED`: +However, we should still tidy up the spec text for these cases. * Can a device "fill" another device's allocation? (Recommendation: Yes, but finalize as part of multi-device support.) * Can a device "fill" arbitrary host memory? (No, undefined behavior unless system SVM is supported.) * Can a device "fill" a USM allocation from another context? (No, undefined behavior.) -Note, there are no existing CTS tests that pass an arbitrary host allocation to {clEnqueueSVMMemFill}. +Note, there are not any existing CTS tests that pass an arbitrary host allocation to {clEnqueueSVMMemFill}. -- -. What are the restrictions for the _src_ptr_ and _dst_ptr_ values that can be passed to the "memcpy" API? +. What are the restrictions for the _src_ptr_ and _dst_ptr_ values that can be passed to {clEnqueueSVMMemcpy}? 
+ -- `RESOLVED`: @@ -853,6 +856,8 @@ Note, there are no existing CTS tests that pass an arbitrary host allocation to * Can a device "memcpy" from arbitrary host memory? (Yes, we already have tests.) * Can a device "memcpy" from arbitrary host memory to arbitrary host memory? (Yes, we already have tests.) * Can the memory region to copy to overlap the memory region to copy from? (No, already an error.) + +The valid cases are tested by the CTS tests `unified_svm_memcpy` and `unified_svm_corner_case_memcpy`. -- . Do we want to support migrating to devices other than the device associated with _command_queue_? @@ -877,7 +882,7 @@ The initial version of this extension will not extend {clEnqueueSVMMigrateMem}, . Should we allow querying the associated device for a USM allocation using {clGetSVMPointerInfoKHR}? + -- -`RESOLVED`: Yes, we should. +`RESOLVED`: Yes, we should, supported by {CL_SVM_INFO_ASSOCIATED_DEVICE_HANDLE_KHR}. -- . Should we add explicit mem alloc flags for `CACHED` and `UNCACHED`? @@ -889,7 +894,7 @@ In a layered extension, we recommend adding cacheability properties instead of c The layered extension could add coarse `CACHED` and `UNCACHED` properties, or separate properties for host vs. device, or even separate properties for specific cache levels. -- -. At least for HOST and SHARED allocations, should we have separate mem alloc flags for the host and the device? +. At least for `HOST` and `SHARED` allocations, should we have separate mem alloc flags for the host and the device? + -- `RESOLVED`: We removed the _flags_ argument entirely. @@ -901,9 +906,7 @@ Specifically, is `NULL` a valid value for `ptr`? Is `size` equal to zero valid? + -- -*UNRESOLVED*: - -Tentative resolution: +`RESOLVED`: .. A `size` equal to zero is valid. When `size` is zero, the call to {clEnqueueSVMMigrateMem}, {clEnqueueSVMMemFill}, and {clEnqueueSVMMemcpy} trivially succeeds, similar to an enqueued marker. 
@@ -923,10 +926,12 @@ For reference, the full set of options we considered were: .. A `size` equal to zero is valid. This appears to be the specified behavior for the C `memcpy` and `memset` functions. .. [.line-through]#A `size` equal to zero is undefined behavior.# -.. A `size` equal to zero is an error. +.. [.line-through]#A `size` equal to zero is an error.# .. A `ptr` equal to `NULL` is valid if and only if `size` is equal to zero, otherwise it is an error. .. [.line-through]#A `ptr` equal to `NULL` is undefined behavior. This appears to be the specified behavior for the C `memcpy` and `memset` functions.# -.. A `ptr` equal to `NULL` is an error. +.. [.line-through]#A `ptr` equal to `NULL` is an error.# + +These cases are tested by the CTS test `unified_svm_corner_case_migrate_mem`. -- . Should we add a device query for a maximum supported SVM alignment, or should the maximum supported alignment implicitly be defined by the size of the largest data type supported by the device? @@ -947,7 +952,8 @@ See internal merge request 198. + -- `RESOLVED`: -The initial version of this extension will not support larger fill patterns. +The initial version of this extension will not support larger fill patterns, therefore the maximum supported fill pattern size will implicitly be defined by the size of the largest data type supported by the device. +Supporting larger fill patterns could be added as a layered extension. -- . Can a pointer to a device, host, or shared SVM allocation be used to create a {cl_mem_TYPE} using {CL_MEM_USE_HOST_PTR}? @@ -1009,17 +1015,20 @@ One particular enhancement we may want to consider, though, is whether calling { In the current specification, this is explicitly not a synchronization point. However, in other APIs, querying the event status and observing that the event is complete is a synchronization point. Should we adopt this behavior also, or do we want users to call {clWaitForEvents} to define a synchronization point? 
+See internal issue 373. -- . Should it be an error to set an unknown pointer as a kernel argument using {clSetKernelArgSVMPointer} if no devices support shared system allocations? + -- -*UNRESOLVED*: -Returning an error for an unknown pointer is helpful to identify and diagnose possible programming errors sooner, but passing a pointer to arbitrary memory to a function on the host is not an error until the pointer is dereferenced. +`RESOLVED`: +It is not an error to set an unknown pointer as a kernel argument using {clSetKernelArgSVMPointer}. +This behavior matches passing a pointer to arbitrary memory to a function on the host, where it is not an error until the pointer is dereferenced. +Similarly, it is not an error to pass an unknown pointer via {clSetKernelExecInfo}({CL_KERNEL_EXEC_INFO_SVM_PTRS}). -If we relax the error condition for {clSetKernelArgSVMPointer} then we could also consider relaxing the error condition for {clSetKernelExecInfo}({CL_KERNEL_EXEC_INFO_SVM_PTRS}) similarly. +Note that we can still check for possible programming errors via optional USM checking layers, such as the https://github.com/intel/opencl-intercept-layer/blob/master/docs/controls.md#usmchecking-bool[USMChecking] functionality in the https://github.com/intel/opencl-intercept-layer[OpenCL Intercept Layer]. -Note that if the error condition is removed we can still check for possible programming errors via optional USM checking layers, such as the https://github.com/intel/opencl-intercept-layer/blob/master/docs/controls.md#usmchecking-bool[USMChecking] functionality in the https://github.com/intel/opencl-intercept-layer[OpenCL Intercept Layer]. +These cases are tested by the CTS tests `unified_svm_corner_case_set_kernel_arg` and `unified_svm_corner_case_set_kernel_exec_info`. -- . Should we support a "rect" or "2D" memcpy similar to {clEnqueueCopyBufferRect}? 
@@ -1046,8 +1055,9 @@ However, for host allocations, some implementations are able to support larger a Possible resolutions: -* Add a new query representing the maximum host memory allocation size supported by the device, e.g. `CL_DEVICE_MAX_HOST_MEM_ALLOC_SIZE_KHR`. -For some devices, this query will return the same value as {CL_DEVICE_MAX_MEM_ALLOC_SIZE}, but for other devices this query will return a larger value. +* Add a new query representing the maximum device-owned and host-owned memory allocation sizes supported by the device, e.g. `CL_DEVICE_MAX_DEVICE_OWNED_MEM_ALLOC_SIZE_KHR` and `CL_DEVICE_MAX_HOST_OWNED_MEM_ALLOC_SIZE_KHR`. +For some devices, these queries will return the same value as {CL_DEVICE_MAX_MEM_ALLOC_SIZE}, but for other devices the queries will return a larger value. +For SVM memory types that are not device-owned or host-owned, the existing limits will continue to apply. * Relax the error behavior so implementations may return {CL_INVALID_BUFFER_SIZE}, but they would not be required to return an error if they support larger allocation sizes. * Do nothing and keep the existing error behavior. -- @@ -1055,13 +1065,12 @@ For some devices, this query will return the same value as {CL_DEVICE_MAX_MEM_AL . Should it be an error to allocate zero bytes? + -- -*UNRESOLVED*: - -Tentative resolution: Allow zero-sized allocations and require returning a `NULL` pointer. +`RESOLVED`: +We will allow zero-sized allocations and require returning a `NULL` pointer. This is considered a successful operation and no error will be returned. We evaluated many scenarios and determined that there is no clearly correct behavior. -The scenarios we evaluated were: +For reference, the scenarios we evaluated were: * For OpenCL 2.0 SVM, {clSVMAlloc} with a size of zero is specified to return a `NULL` pointer. Because {clSVMAlloc} has no mechanism to return an error code, it is unspecified whether this is considered an error. 
@@ -1072,7 +1081,7 @@ If a `NULL` pointer is returned then `errno` may be set to an implementation-def If a unique non-null pointer is returned then it cannot be dereferenced. * Allocating an array of zero elements using `new` must return a non-null pointer, though dereferencing the pointer is undefined. -For reference, the full set of options we considered were: +Also for reference, the full set of options we considered were: .. [.line-through]#Allow zero-sized allocations and require returning a non-null pointer that must be freed.# .. Allow zero-sized allocations and require returning a `NULL` pointer. @@ -1080,7 +1089,9 @@ No error will be generated. Note, it is not currently an error to free a `NULL` pointer. .. [.line-through]#Allow zero-sized allocations but allow returning a `NULL` pointer. No error would be generated, even if a `NULL` pointer is returned.# .. [.line-through]#Specify that this case is implementation-defined.# -.. Specify that this case is an error. +.. [.line-through]#Specify that this case is an error.# + +This case is tested by the CTS test `unified_svm_corner_case_alloc_free`. -- Note: The following issues were added to the KHR USM extension: From 6d669008c4e906cfbd4f805854d39a746e6dba68 Mon Sep 17 00:00:00 2001 From: Ben Ashbaugh Date: Fri, 10 Oct 2025 15:39:41 -0700 Subject: [PATCH 17/18] resolve issue about devices and sub-devices --- extensions/cl_khr_unified_svm.asciidoc | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/extensions/cl_khr_unified_svm.asciidoc b/extensions/cl_khr_unified_svm.asciidoc index 63e8a1f42..e6ca25d2f 100644 --- a/extensions/cl_khr_unified_svm.asciidoc +++ b/extensions/cl_khr_unified_svm.asciidoc @@ -1112,10 +1112,10 @@ For reference, for other APIs: . What about devices and sub-devices? + -- -*UNRESOLVED*: Neither the OpenCL specification nor this extension specification currently says much about how SVM behaves for devices and sub-devices. 
+`RESOLVED`: +In this extension, there is no implicit requirement that an allocation owned by a sub-device is accessible to its parent device or that an allocation owned by a parent device is accessible to its sub-device. -My rough expectations are that if an allocation is made against a context with a device and a sub-device, and the allocation is associated with the device, then the allocation is also accessible to the sub-device. -Unless additional clarification is needed, perhaps this issue may simply be resolved. +The query for {clGetSVMPointerInfoKHR}({CL_SVM_ALLOC_ACCESS_FLAGS_KHR}) can be used to determine whether an allocation is accessible to a parent device or a sub-device, just like for any other device. -- . Should we move more of the {clSVMAllocWithPropertiesKHR} arguments to properties? From f932a6e0deb0ee56e8d220af35deff21e2076c79 Mon Sep 17 00:00:00 2001 From: Ben Ashbaugh Date: Mon, 3 Nov 2025 10:44:32 -0800 Subject: [PATCH 18/18] resolve issue about maximum SVM allocation size --- extensions/cl_khr_unified_svm.asciidoc | 34 ++++++++++++++------------ 1 file changed, 18 insertions(+), 16 deletions(-) diff --git a/extensions/cl_khr_unified_svm.asciidoc b/extensions/cl_khr_unified_svm.asciidoc index e6ca25d2f..0d64ca072 100644 --- a/extensions/cl_khr_unified_svm.asciidoc +++ b/extensions/cl_khr_unified_svm.asciidoc @@ -6,6 +6,7 @@ :icons: font include::../config/attribs.txt[] include::{generated}/api/api-dictionary-no-links.asciidoc[] +include::{generated}/api/ext-dictionary-no-links.asciidoc[] :source-highlighter: coderay = cl_khr_unified_svm @@ -472,11 +473,7 @@ Otherwise, `NULL` will be returned, and _errcode_ret_ will be set to one of the ** if no devices in _context_ support the SVM type specified by _svm_type_index_ ** if a device associated with the SVM allocation does not support the SVM type specified by _svm_type_index_ * {CL_INVALID_VALUE} - ** if _svm_type_index_ is greater than the number of SVM types supported the devices in _context_ 
- * {CL_INVALID_BUFFER_SIZE} -// TODO: update depending on the updated queries for available SVM sizes: - ** if _size_ is greater than {CL_DEVICE_MAX_MEM_ALLOC_SIZE} for any OpenCL device in _context_ that supports the specified SVM type - ** if _size_ is greater than {CL_DEVICE_MAX_MEM_ALLOC_SIZE} for a device associated with the SVM allocation + ** if _svm_type_index_ is greater than the number of SVM types supported by the devices in _context_ * {CL_OUT_OF_RESOURCES} ** if there is a failure to allocate resources required by the OpenCL implementation on the device @@ -486,6 +483,14 @@ Otherwise, `NULL` will be returned, and _errcode_ret_ will be set to one of the TODO: Do we want to document any specific error conditions for invalid property values? +[NOTE] +==== +When _context_ includes a device that supports {cl_khr_unified_svm_EXT}, _size_ may be zero and may exceed {CL_DEVICE_MAX_MEM_ALLOC_SIZE} for the device, for both {clSVMAlloc} and {clSVMAllocWithPropertiesKHR}. + +When _size_ is zero, {clSVMAlloc} and {clSVMAllocWithPropertiesKHR} must return a `NULL` pointer and must not generate an error. +When _size_ exceeds {CL_DEVICE_MAX_MEM_ALLOC_SIZE}, the allocation may succeed and return a non-`NULL` pointer, or the allocation may fail and return a `NULL` pointer and an implementation-defined error code. +==== + [[svm-alloc-properties-table]] [caption="Table X. "] .List of supported SVM allocation properties by {clSVMAllocWithPropertiesKHR} @@ -1047,19 +1052,16 @@ Note that standard SYCL does not include a "rect" memcpy, though the https://git If so, what should the upper limit be? + -- -*UNRESOLVED*: -The upper limit is currently defined by {CL_DEVICE_MAX_MEM_ALLOC_SIZE} and if the allocation size exceeds this value then the error code {CL_INVALID_BUFFER_SIZE} is returned. - -This behavior is consistent with {clSVMAlloc} (although {clSVMAlloc} does not return an error code it is specified to return a `NULL` pointer in this case) and {clCreateBuffer}. 
-However, for host allocations, some implementations are able to support larger allocation sizes. +`RESOLVED`: +This extension will not define an upper limit on the size of an SVM allocation. +Applications may discover the largest SVM allocation size experimentally or use implementation-defined mechanisms to determine the largest SVM allocation size. +This is consistent with `malloc`, which also has no defined upper or lower bound on the amount of memory that may be allocated, although this is different than both the non-extended {clSVMAlloc} and {clCreateBuffer}, where the upper limit is defined by {CL_DEVICE_MAX_MEM_ALLOC_SIZE}. -Possible resolutions: +By defining no limit on the size of an SVM allocation, we leave the door open for future mechanisms to query the largest SVM allocation size. +For example, we may add queries for the maximum device-owned or host-owned memory allocation sizes, or queries for each SVM type, or other more sophisticated queries. +We also do not constrain the maximum SVM allocation size to {CL_DEVICE_MAX_MEM_ALLOC_SIZE}, especially for implementations that are able to support larger allocation sizes from host memory. -* Add a new query representing the maximum device-owned and host-owned memory allocation sizes supported by the device, e.g. `CL_DEVICE_MAX_DEVICE_OWNED_MEM_ALLOC_SIZE_KHR` and `CL_DEVICE_MAX_HOST_OWNED_MEM_ALLOC_SIZE_KHR`. -For some devices, these queries will return the same value as {CL_DEVICE_MAX_MEM_ALLOC_SIZE}, but for other devices the queries will return a larger value. -For SVM memory types that are not device-owned or host-owned, the existing limits will continue to apply. -* Relax the error behavior so implementations may return {CL_INVALID_BUFFER_SIZE}, but they would not be required to return an error if they support larger allocation sizes. -* Do nothing and keep the existing error behavior. 
+Note that for the non-extended {clSVMAlloc} and {clCreateBuffer}, {CL_DEVICE_MAX_MEM_ALLOC_SIZE} only defines an upper bound on the amount of memory that _may_ be allocated, and there is no guarantee that an allocation of this size will succeed, say due to prior memory allocations, or memory fragmentation, or any other reason. -- . Should it be an error to allocate zero bytes?
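The allocation-size behavior resolved in the hunks above (zero-sized requests, and requests larger than {CL_DEVICE_MAX_MEM_ALLOC_SIZE}) can be summarized as a small decision function. The following is only an illustrative sketch of those rules, not spec text: the enum `svm_alloc_case`, its values, and the function `classify_svm_alloc_size` are hypothetical names invented for this example, and the function does not call into OpenCL at all.

```c
#include <stddef.h>

/*
 * Illustrative model only. The enum, its values, and the function name are
 * hypothetical and do not appear in the OpenCL headers or in this extension.
 */
typedef enum {
    /* size == 0: a NULL pointer must be returned and no error is generated. */
    SVM_ALLOC_NULL_WITHOUT_ERROR,
    /* size > CL_DEVICE_MAX_MEM_ALLOC_SIZE: the allocation may succeed, or it
     * may fail with NULL and an implementation-defined error code. */
    SVM_ALLOC_ABOVE_QUERIED_LIMIT,
    /* 0 < size <= CL_DEVICE_MAX_MEM_ALLOC_SIZE: permitted, but success is
     * still not guaranteed (prior allocations, fragmentation, and so on). */
    SVM_ALLOC_WITHIN_QUERIED_LIMIT
} svm_alloc_case;

/*
 * max_mem_alloc_size stands in for the value an application would read with
 * clGetDeviceInfo(CL_DEVICE_MAX_MEM_ALLOC_SIZE), which is a cl_ulong.
 */
static svm_alloc_case classify_svm_alloc_size(size_t size,
                                              unsigned long long max_mem_alloc_size)
{
    if (size == 0) {
        return SVM_ALLOC_NULL_WITHOUT_ERROR;
    }
    if ((unsigned long long)size > max_mem_alloc_size) {
        return SVM_ALLOC_ABOVE_QUERIED_LIMIT;
    }
    return SVM_ALLOC_WITHIN_QUERIED_LIMIT;
}
```

In practice, an application that needs a large allocation would most likely attempt it and check the returned pointer, rather than rejecting the request up front based on the device query, since the extension no longer treats the queried value as a hard limit.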