From 2e7c1b589bc58be0e155098cf87d8535d41adeab Mon Sep 17 00:00:00 2001
From: Tobias Burnus
Date: Wed, 26 Mar 2025 11:27:56 +0100
Subject: [PATCH] libgomp.texi: Document supported OpenMP 'interop' types for nvptx and gcn

Note that this commit also updates the API interface to OpenMP 6.0;
while 5.1 and 5.2 use 'int *' for the ret_code argument, OpenMP 6.0
changed this to omp_interop_rc_t *; this enum also exists in OpenMP 5.1.
However, C++ does not like this change such that unless NULL is passed
(i.e. the argument is ignored), OpenMP 5.x and 6.x are not compatible.
Note that GCC's omp.h already follows OpenMP 6.0 and is now in sync with
the documentation.

libgomp/ChangeLog:

	* libgomp.texi (OpenMP 5.1): Add @ref to offload-target specifics
	for 'interop'.
	(OpenMP 6.0): Mark dispatch's interop clause as implemented.
	(omp_get_interop_int, omp_get_interop_str, omp_get_interop_ptr,
	omp_get_interop_type_desc): Add @ref to Offload-Target Specifics;
	change ret_code argument type to 'omp_interop_rc_t *'.
	(Offload-Target Specifics): Document the supported OpenMP interop
	foreign runtimes on AMD and Nvidia GPUs.
---
 libgomp/libgomp.texi | 170 ++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 161 insertions(+), 9 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index db42c32e748..4217c29dd37 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -313,7 +313,7 @@ The OpenMP 4.5 specification is fully supported.
       clauses @tab N @tab
 @item Indirect calls to the device version of a procedure or function in
       @code{target} regions @tab Y @tab
-@item @code{interop} directive @tab N @tab
+@item @code{interop} directive @tab Y @tab Cf. @ref{Offload-Target Specifics}
 @item @code{omp_interop_t} object support in runtime routines @tab Y @tab
 @item @code{nowait} clause in @code{taskwait} directive @tab Y @tab
 @item Extensions to the @code{atomic} directive @tab Y @tab
@@ -544,7 +544,7 @@ to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
       @tab N @tab
 @item Semicolon-separated list to @code{uses_allocators} @tab N @tab
 @item New @code{need_device_addr} modifier to @code{adjust_args} clause @tab N @tab
-@item @code{interop} clause to @code{dispatch} @tab N @tab
+@item @code{interop} clause to @code{dispatch} @tab Y @tab
 @item Scope requirement changes for @code{declare_target} @tab N @tab
 @item @code{message} and @code{severity} clauses to @code{parallel}
       directive @tab N @tab
@@ -3047,7 +3047,7 @@ the initial device is unspecified.
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
 @item @emph{Prototype}: @tab @code{omp_intptr_t omp_get_interop_int(const omp_interop_t interop,
-                        omp_interop_property_t property_id, int *ret_code)}
+                        omp_interop_property_t property_id, omp_interop_rc_t *ret_code)}
 @end multitable
 
 @item @emph{Fortran}:
@@ -3061,7 +3061,8 @@ the initial device is unspecified.
 @end multitable
 
 @item @emph{See also}:
-@ref{omp_get_interop_ptr}, @ref{omp_get_interop_str}, @ref{omp_get_interop_rc_desc}
+@ref{omp_get_interop_ptr}, @ref{omp_get_interop_str}, @ref{omp_get_interop_rc_desc},
+@ref{Offload-Target Specifics}
 
 @item @emph{Reference}:
 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.2,
@@ -3092,7 +3093,7 @@ the initial device is unspecified.
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
 @item @emph{Prototype}: @tab @code{void *omp_get_interop_ptr(const omp_interop_t interop,
-                        omp_interop_property_t property_id, int *ret_code)}
+                        omp_interop_property_t property_id, omp_interop_rc_t *ret_code)}
 @end multitable
 
 @item @emph{Fortran}:
@@ -3106,7 +3107,8 @@ the initial device is unspecified.
 @end multitable
 
 @item @emph{See also}:
-@ref{omp_get_interop_int}, @ref{omp_get_interop_str}, @ref{omp_get_interop_rc_desc}
+@ref{omp_get_interop_int}, @ref{omp_get_interop_str}, @ref{omp_get_interop_rc_desc},
+@ref{Offload-Target Specifics}
 
 @item @emph{Reference}:
 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.3,
@@ -3136,7 +3138,7 @@ the initial device is unspecified.
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
 @item @emph{Prototype}: @tab @code{const char *omp_get_interop_str(const omp_interop_t interop,
-                        omp_interop_property_t property_id, int *ret_code)}
+                        omp_interop_property_t property_id, omp_interop_rc_t *ret_code)}
 @end multitable
 
 @item @emph{Fortran}:
@@ -3150,7 +3152,8 @@ the initial device is unspecified.
 @end multitable
 
 @item @emph{See also}:
-@ref{omp_get_interop_int}, @ref{omp_get_interop_ptr}, @ref{omp_get_interop_rc_desc}
+@ref{omp_get_interop_int}, @ref{omp_get_interop_ptr}, @ref{omp_get_interop_rc_desc},
+@ref{Offload-Target Specifics}
 
 @item @emph{Reference}:
 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.4,
@@ -3233,7 +3236,8 @@ a null pointer is returned. The effect of running this routine in a
 @end multitable
 
 @item @emph{See also}:
-@ref{omp_get_num_interop_properties}, @ref{omp_get_interop_name}
+@ref{omp_get_num_interop_properties}, @ref{omp_get_interop_name},
+@ref{Offload-Target Specifics}
 
 @item @emph{Reference}:
 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.6,
@@ -6836,6 +6840,10 @@ The following sections present notes on the offload-target specifics
 @node AMD Radeon
 @section AMD Radeon (GCN)
 
+@menu
+* Foreign-runtime support for AMD GPUs::
+@end menu
+
 On the hardware side, there is the hierarchy (fine to coarse):
 @itemize
 @item work item (thread)
@@ -6911,10 +6919,75 @@ The implementation remark:
 @end itemize
 
 
+@node Foreign-runtime support for AMD GPUs
+@subsection OpenMP @code{interop} -- Foreign-Runtime Support for AMD GPUs
+
+On AMD GPUs, the foreign runtimes are HIP (C++ Heterogeneous-Compute Interface
+for Portability) and HSA (Heterogeneous System Architecture),
+where HIP is the default. The interop object is created using OpenMP's
+@code{interop} directive or, implicitly, when invoking a @code{declare variant}
+procedure that has the @code{append_args} clause. In either case, the
+@code{prefer_type} modifier determines whether HIP or HSA is used.
+
+When specifying the @code{targetsync} modifier: For HIP, a stream is
+created using @code{hipStreamCreate}. For HSA, a queue is created of type
+@code{HSA_QUEUE_TYPE_MULTI} with a queue size of 64.
+
+Invoke the @ref{Interoperability Routines} on an interop object to obtain
+the following properties. For properties with integral (int), pointer (ptr),
+or string (str) data type, call @code{omp_get_interop_int},
+@code{omp_get_interop_ptr}, or @code{omp_get_interop_str}, respectively.
+Note that @code{device_num} is the OpenMP device number
+while @code{device} is the HIP device number or HSA device handle.
+
+For the API routine call, add the prefix @code{omp_ipr_} to the property name;
+for instance:
+@smallexample
+omp_interop_rc_t ret;
+int device_num = omp_get_interop_int (my_interop_obj, omp_ipr_device_num, &ret);
+@end smallexample
+
+@noindent
+Available properties for an HIP interop object:
+
+@multitable @columnfractions .20 .35 .20 .20
+@headitem Property @tab C data type @tab API routine @tab value (if constant)
+@item @code{fr_id} @tab @code{omp_interop_fr_t} @tab int @tab @code{omp_fr_hip}
+@item @code{fr_name} @tab @code{const char *} @tab str @tab @code{"hip"}
+@item @code{vendor} @tab @code{int} @tab int @tab @code{1}
+@item @code{vendor_name} @tab @code{const char *} @tab str @tab @code{"amd"}
+@item @code{device_num} @tab @code{int} @tab int @tab
+@item @code{platform} @tab N/A @tab @tab
+@item @code{device} @tab @code{hipDevice_t} @tab int @tab
+@item @code{device_context} @tab @code{hipCtx_t} @tab ptr @tab
+@item @code{targetsync} @tab @code{hipStream_t} @tab ptr @tab
+@end multitable
+
+@noindent
+Available properties for an HSA interop object:
+
+@multitable @columnfractions .20 .35 .20 .20
+@headitem Property @tab C data type @tab API routine @tab value (if constant)
+@item @code{fr_id} @tab @code{omp_interop_fr_t} @tab int @tab @code{omp_fr_hsa}
+@item @code{fr_name} @tab @code{const char *} @tab str @tab @code{"hsa"}
+@item @code{vendor} @tab @code{int} @tab int @tab @code{1}
+@item @code{vendor_name} @tab @code{const char *} @tab str @tab @code{"amd"}
+@item @code{device_num} @tab @code{int} @tab int @tab
+@item @code{platform} @tab N/A @tab @tab
+@item @code{device} @tab @code{hsa_agent *} @tab ptr @tab
+@item @code{device_context} @tab N/A @tab @tab
+@item @code{targetsync} @tab @code{hsa_queue *} @tab ptr @tab
+@end multitable
+
+
 
 @node nvptx
 @section nvptx
 
+@menu
+* Foreign-runtime support for Nvidia GPUs::
+@end menu
+
 On the hardware side, there is the hierarchy (fine to coarse):
 @itemize
 @item thread
@@ -7007,6 +7080,85 @@ The implementation remark:
 @end itemize
 
 
+@node Foreign-runtime support for Nvidia GPUs
+@subsection OpenMP @code{interop} -- Foreign-Runtime Support for Nvidia GPUs
+
+On Nvidia GPUs, the foreign-runtime APIs are the CUDA runtime API, the CUDA
+driver API, and HIP, the C++ Heterogeneous-Compute Interface for Portability
+that is---on CUDA-based systems---a very thin layer on top of the CUDA API. By
+default, CUDA is used. The interop object is created using OpenMP's
+@code{interop} directive or, implicitly, when invoking a @code{declare variant}
+procedure that has the @code{append_args} clause. In either case, the
+@code{prefer_type} modifier determines whether CUDA, CUDA driver, or HIP is
+used.
+
+When specifying the @code{targetsync} modifier, a CUDA stream is created using
+the @code{CU_STREAM_DEFAULT} flag.
+
+Invoke the @ref{Interoperability Routines} on an interop object to obtain
+the following properties. For properties with integral (int), pointer (ptr),
+or string (str) data type, call @code{omp_get_interop_int},
+@code{omp_get_interop_ptr}, or @code{omp_get_interop_str}, respectively.
+Note that @code{device_num} is the OpenMP device number while @code{device}
+is the CUDA, CUDA Driver, or HIP device number.
+
+For the API routine call, add the prefix @code{omp_ipr_} to the property name;
+for instance:
+@smallexample
+omp_interop_rc_t ret;
+int device_num = omp_get_interop_int (my_interop_obj, omp_ipr_device_num, &ret);
+@end smallexample
+
+@noindent
+Available properties for a CUDA runtime API interop object:
+
+@multitable @columnfractions .20 .35 .20 .20
+@headitem Property @tab C data type @tab API routine @tab value (if constant)
+@item @code{fr_id} @tab @code{omp_interop_fr_t} @tab int @tab @code{omp_fr_cuda}
+@item @code{fr_name} @tab @code{const char *} @tab str @tab @code{"cuda"}
+@item @code{vendor} @tab @code{int} @tab int @tab @code{11}
+@item @code{vendor_name} @tab @code{const char *} @tab str @tab @code{"nvidia"}
+@item @code{device_num} @tab @code{int} @tab int @tab
+@item @code{platform} @tab N/A @tab @tab
+@item @code{device} @tab @code{int} @tab int @tab
+@item @code{device_context} @tab N/A @tab @tab
+@item @code{targetsync} @tab @code{cudaStream_t} @tab ptr @tab
+@end multitable
+
+@noindent
+Available properties for a CUDA driver API interop object:
+
+@multitable @columnfractions .20 .35 .20 .20
+@headitem Property @tab C data type @tab API routine @tab value (if constant)
+@item @code{fr_id} @tab @code{omp_interop_fr_t} @tab int @tab @code{omp_fr_cuda_driver}
+@item @code{fr_name} @tab @code{const char *} @tab str @tab @code{"cuda_driver"}
+@item @code{vendor} @tab @code{int} @tab int @tab @code{11}
+@item @code{vendor_name} @tab @code{const char *} @tab str @tab @code{"nvidia"}
+@item @code{device_num} @tab @code{int} @tab int @tab
+@item @code{platform} @tab N/A @tab @tab
+@item @code{device} @tab @code{CUdevice} @tab int @tab
+@item @code{device_context} @tab @code{CUcontext} @tab ptr @tab
+@item @code{targetsync} @tab @code{CUstream} @tab ptr @tab
+@end multitable
+
+@noindent
+Available properties for an HIP interop object:
+
+@multitable @columnfractions .20 .35 .20 .20
+@headitem Property @tab C data type @tab API routine @tab value (if constant)
+@item @code{fr_id} @tab @code{omp_interop_fr_t} @tab int @tab @code{omp_fr_hip}
+@item @code{fr_name} @tab @code{const char *} @tab str @tab @code{"hip"}
+@item @code{vendor} @tab @code{int} @tab int @tab @code{11}
+@item @code{vendor_name} @tab @code{const char *} @tab str @tab @code{"nvidia"}
+@item @code{device_num} @tab @code{int} @tab int @tab
+@item @code{platform} @tab N/A @tab @tab
+@item @code{device} @tab @code{hipDevice_t} @tab int @tab
+@item @code{device_context} @tab @code{hipCtx_t} @tab ptr @tab
+@item @code{targetsync} @tab @code{hipStream_t} @tab ptr @tab
+@end multitable
+
+
+
 @c ---------------------------------------------------------------------
 @c The libgomp ABI
 @c ---------------------------------------------------------------------