diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi index db42c32e748..4217c29dd37 100644 --- a/libgomp/libgomp.texi +++ b/libgomp/libgomp.texi @@ -313,7 +313,7 @@ The OpenMP 4.5 specification is fully supported. clauses @tab N @tab @item Indirect calls to the device version of a procedure or function in @code{target} regions @tab Y @tab -@item @code{interop} directive @tab N @tab +@item @code{interop} directive @tab Y @tab Cf. @ref{Offload-Target Specifics} @item @code{omp_interop_t} object support in runtime routines @tab Y @tab @item @code{nowait} clause in @code{taskwait} directive @tab Y @tab @item Extensions to the @code{atomic} directive @tab Y @tab @@ -544,7 +544,7 @@ to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab @tab N @tab @item Semicolon-separated list to @code{uses_allocators} @tab N @tab @item New @code{need_device_addr} modifier to @code{adjust_args} clause @tab N @tab -@item @code{interop} clause to @code{dispatch} @tab N @tab +@item @code{interop} clause to @code{dispatch} @tab Y @tab @item Scope requirement changes for @code{declare_target} @tab N @tab @item @code{message} and @code{severity} clauses to @code{parallel} directive @tab N @tab @@ -3047,7 +3047,7 @@ the initial device is unspecified. @item @emph{C/C++}: @multitable @columnfractions .20 .80 @item @emph{Prototype}: @tab @code{omp_intptr_t omp_get_interop_int(const omp_interop_t interop, - omp_interop_property_t property_id, int *ret_code)} + omp_interop_property_t property_id, omp_interop_rc_t *ret_code)} @end multitable @item @emph{Fortran}: @@ -3061,7 +3061,8 @@ the initial device is unspecified. 
@end multitable @item @emph{See also}: -@ref{omp_get_interop_ptr}, @ref{omp_get_interop_str}, @ref{omp_get_interop_rc_desc} +@ref{omp_get_interop_ptr}, @ref{omp_get_interop_str}, @ref{omp_get_interop_rc_desc}, +@ref{Offload-Target Specifics} @item @emph{Reference}: @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.2, @@ -3092,7 +3093,7 @@ the initial device is unspecified. @item @emph{C/C++}: @multitable @columnfractions .20 .80 @item @emph{Prototype}: @tab @code{void *omp_get_interop_ptr(const omp_interop_t interop, - omp_interop_property_t property_id, int *ret_code)} + omp_interop_property_t property_id, omp_interop_rc_t *ret_code)} @end multitable @item @emph{Fortran}: @@ -3106,7 +3107,8 @@ the initial device is unspecified. @end multitable @item @emph{See also}: -@ref{omp_get_interop_int}, @ref{omp_get_interop_str}, @ref{omp_get_interop_rc_desc} +@ref{omp_get_interop_int}, @ref{omp_get_interop_str}, @ref{omp_get_interop_rc_desc}, +@ref{Offload-Target Specifics} @item @emph{Reference}: @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.3, @@ -3136,7 +3138,7 @@ the initial device is unspecified. @item @emph{C/C++}: @multitable @columnfractions .20 .80 @item @emph{Prototype}: @tab @code{const char *omp_get_interop_str(const omp_interop_t interop, - omp_interop_property_t property_id, int *ret_code)} + omp_interop_property_t property_id, omp_interop_rc_t *ret_code)} @end multitable @item @emph{Fortran}: @@ -3150,7 +3152,8 @@ the initial device is unspecified. @end multitable @item @emph{See also}: -@ref{omp_get_interop_int}, @ref{omp_get_interop_ptr}, @ref{omp_get_interop_rc_desc} +@ref{omp_get_interop_int}, @ref{omp_get_interop_ptr}, @ref{omp_get_interop_rc_desc}, +@ref{Offload-Target Specifics} @item @emph{Reference}: @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.4, @@ -3233,7 +3236,8 @@ a null pointer is returned. 
The effect of running this routine in a @end multitable @item @emph{See also}: -@ref{omp_get_num_interop_properties}, @ref{omp_get_interop_name} +@ref{omp_get_num_interop_properties}, @ref{omp_get_interop_name}, +@ref{Offload-Target Specifics} @item @emph{Reference}: @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.6, @@ -6836,6 +6840,10 @@ The following sections present notes on the offload-target specifics @node AMD Radeon @section AMD Radeon (GCN) +@menu +* Foreign-runtime support for AMD GPUs:: +@end menu + On the hardware side, there is the hierarchy (fine to coarse): @itemize @item work item (thread) @@ -6911,10 +6919,75 @@ The implementation remark: @end itemize +@node Foreign-runtime support for AMD GPUs +@subsection OpenMP @code{interop} -- Foreign-Runtime Support for AMD GPUs + +On AMD GPUs, the foreign runtimes are HIP (C++ Heterogeneous-Compute Interface +for Portability) and HSA (Heterogeneous System Architecture), +where HIP is the default. The interop object is created using OpenMP's +@code{interop} directive or, implicitly, when invoking a @code{declare variant} +procedure that has the @code{append_args} clause. In either case, the +@code{prefer_type} modifier determines whether HIP or HSA is used. + +When specifying the @code{targetsync} modifier: For HIP, a stream is +created using @code{hipStreamCreate}. For HSA, a queue is created of type +@code{HSA_QUEUE_TYPE_MULTI} with a queue size of 64. + +Invoke the @ref{Interoperability Routines} on an interop object to obtain +the following properties. For properties with integral (int), pointer (ptr), +or string (str) data type, call @code{omp_get_interop_int}, +@code{omp_get_interop_ptr}, or @code{omp_get_interop_str}, respectively. +Note that @code{device_num} is the OpenMP device number +while @code{device} is the HIP device number or HSA device handle. 
+ +For the API routine call, add the prefix @code{omp_ipr_} to the property name; +for instance: +@smallexample +omp_interop_rc_t ret; +int device_num = omp_get_interop_int (my_interop_obj, omp_ipr_device_num, &ret); +@end smallexample + +@noindent +Available properties for an HIP interop object: + +@multitable @columnfractions .20 .35 .20 .20 +@headitem Property @tab C data type @tab API routine @tab value (if constant) +@item @code{fr_id} @tab @code{omp_interop_fr_t} @tab int @tab @code{omp_fr_hip} +@item @code{fr_name} @tab @code{const char *} @tab str @tab @code{"hip"} +@item @code{vendor} @tab @code{int} @tab int @tab @code{1} +@item @code{vendor_name} @tab @code{const char *} @tab str @tab @code{"amd"} +@item @code{device_num} @tab @code{int} @tab int @tab +@item @code{platform} @tab N/A @tab @tab +@item @code{device} @tab @code{hipDevice_t} @tab int @tab +@item @code{device_context} @tab @code{hipCtx_t} @tab ptr @tab +@item @code{targetsync} @tab @code{hipStream_t} @tab ptr @tab +@end multitable + +@noindent +Available properties for an HSA interop object: + +@multitable @columnfractions .20 .35 .20 .20 +@headitem Property @tab C data type @tab API routine @tab value (if constant) +@item @code{fr_id} @tab @code{omp_interop_fr_t} @tab int @tab @code{omp_fr_hsa} +@item @code{fr_name} @tab @code{const char *} @tab str @tab @code{"hsa"} +@item @code{vendor} @tab @code{int} @tab int @tab @code{1} +@item @code{vendor_name} @tab @code{const char *} @tab str @tab @code{"amd"} +@item @code{device_num} @tab @code{int} @tab int @tab +@item @code{platform} @tab N/A @tab @tab +@item @code{device} @tab @code{hsa_agent_t *} @tab ptr @tab +@item @code{device_context} @tab N/A @tab @tab +@item @code{targetsync} @tab @code{hsa_queue_t *} @tab ptr @tab +@end multitable + + @node nvptx @section nvptx +@menu +* Foreign-runtime support for Nvidia GPUs:: +@end menu + On the hardware side, there is the hierarchy (fine to coarse): @itemize @item thread @@ -7007,6 +7080,85 @@ The 
implementation remark: @end itemize +@node Foreign-runtime support for Nvidia GPUs +@subsection OpenMP @code{interop} -- Foreign-Runtime Support for Nvidia GPUs + +On Nvidia GPUs, the foreign-runtime APIs are the CUDA runtime API, the CUDA +driver API, and HIP, the C++ Heterogeneous-Compute Interface for Portability +that is---on CUDA-based systems---a very thin layer on top of the CUDA API. By +default, CUDA is used. The interop object is created using OpenMP's +@code{interop} directive or, implicitly, when invoking a @code{declare variant} +procedure that has the @code{append_args} clause. In either case, the +@code{prefer_type} modifier determines whether CUDA, CUDA driver, or HIP is +used. + +When specifying the @code{targetsync} modifier, a CUDA stream is created using +the @code{CU_STREAM_DEFAULT} flag. + +Invoke the @ref{Interoperability Routines} on an interop object to obtain +the following properties. For properties with integral (int), pointer (ptr), +or string (str) data type, call @code{omp_get_interop_int}, +@code{omp_get_interop_ptr}, or @code{omp_get_interop_str}, respectively. +Note that @code{device_num} is the OpenMP device number while @code{device} +is the CUDA, CUDA driver, or HIP device number. 
+ +For the API routine call, add the prefix @code{omp_ipr_} to the property name; +for instance: +@smallexample +omp_interop_rc_t ret; +int device_num = omp_get_interop_int (my_interop_obj, omp_ipr_device_num, &ret); +@end smallexample + +@noindent +Available properties for a CUDA runtime API interop object: + +@multitable @columnfractions .20 .35 .20 .20 +@headitem Property @tab C data type @tab API routine @tab value (if constant) +@item @code{fr_id} @tab @code{omp_interop_fr_t} @tab int @tab @code{omp_fr_cuda} +@item @code{fr_name} @tab @code{const char *} @tab str @tab @code{"cuda"} +@item @code{vendor} @tab @code{int} @tab int @tab @code{11} +@item @code{vendor_name} @tab @code{const char *} @tab str @tab @code{"nvidia"} +@item @code{device_num} @tab @code{int} @tab int @tab +@item @code{platform} @tab N/A @tab @tab +@item @code{device} @tab @code{int} @tab int @tab +@item @code{device_context} @tab N/A @tab @tab +@item @code{targetsync} @tab @code{cudaStream_t} @tab ptr @tab +@end multitable + +@noindent +Available properties for a CUDA driver API interop object: + +@multitable @columnfractions .20 .35 .20 .20 +@headitem Property @tab C data type @tab API routine @tab value (if constant) +@item @code{fr_id} @tab @code{omp_interop_fr_t} @tab int @tab @code{omp_fr_cuda_driver} +@item @code{fr_name} @tab @code{const char *} @tab str @tab @code{"cuda_driver"} +@item @code{vendor} @tab @code{int} @tab int @tab @code{11} +@item @code{vendor_name} @tab @code{const char *} @tab str @tab @code{"nvidia"} +@item @code{device_num} @tab @code{int} @tab int @tab +@item @code{platform} @tab N/A @tab @tab +@item @code{device} @tab @code{CUdevice} @tab int @tab +@item @code{device_context} @tab @code{CUcontext} @tab ptr @tab +@item @code{targetsync} @tab @code{CUstream} @tab ptr @tab +@end multitable + +@noindent +Available properties for an HIP interop object: + +@multitable @columnfractions .20 .35 .20 .20 +@headitem Property @tab C data type @tab API routine @tab value (if 
constant) +@item @code{fr_id} @tab @code{omp_interop_fr_t} @tab int @tab @code{omp_fr_hip} +@item @code{fr_name} @tab @code{const char *} @tab str @tab @code{"hip"} +@item @code{vendor} @tab @code{int} @tab int @tab @code{11} +@item @code{vendor_name} @tab @code{const char *} @tab str @tab @code{"nvidia"} +@item @code{device_num} @tab @code{int} @tab int @tab +@item @code{platform} @tab N/A @tab @tab +@item @code{device} @tab @code{hipDevice_t} @tab int @tab +@item @code{device_context} @tab @code{hipCtx_t} @tab ptr @tab +@item @code{targetsync} @tab @code{hipStream_t} @tab ptr @tab +@end multitable + + + @c --------------------------------------------------------------------- @c The libgomp ABI @c ---------------------------------------------------------------------