The HIP header files recognize the used compiler, defaulting to either AMD
or Nvidia/CUDA; thus, the alternative way of explicitly defining a macro is
less prominently documented. With GCC, the user has to define the
preprocessor macro manually. Hence, as a service to the user, mention
__HIP_PLATFORM_AMD__ and __HIP_PLATFORM_NVIDIA__ in the interop documentation,
even though it has only indirectly to do with GCC and its interop support.
Note to commit-log readers, only: For Fortran, the hipfort modules can be
used; when compiling the hipfort package (defaults to use gfortran), it
generates the module (*.mod) files in include/hipfort/{amdgcn,nvidia}/ such
that the choice is made by setting the respective include path.
libgomp/ChangeLog:
* libgomp.texi (gcn interop, nvptx interop): For HIP with C/C++, add
a note about setting a preprocessor define.
When mapping an allocatable variable (or derived-type component), explicitly
or implicitly, all its allocated allocatable components will automatically be
mapped. The patch implements the target hooks, added for this feature to
omp-low.cc with commit r15-3895-ge4a58b6f28383c.
Namely, there is a check whether there are allocatable components at all:
gfc_omp_deep_mapping_p. Then gfc_omp_deep_mapping_cnt, counting the number
of required mappings; this is a dynamic value as it depends on array
bounds and whether an allocatable is allocated or not.
And, finally, the actual mapping: gfc_omp_deep_mapping.
Polymorphic variables are partially supported: the mapping of the _data
component is fully supported, but only components of the declared type
are processed for additional allocatables. Additionally, _vptr is not
touched. This means that everything needing _vtab information requires
unified shared memory; in particular, _size data is required when
accessing elements of polymorphic arrays.
However, for scalar arrays, accessing components of the declare type
should work just fine.
As polymorphic variables are not (really) supported and OpenMP 6
explicitly disallows them, there is now a warning (-Wopenmp) when
they are encountered. Unlimited polymorphics are rejected (error).
Additionally, PRIVATE and FIRSTPRIVATE are not quite supported for
allocatable components, polymorphic components and as polymorphic
variable. Thus, those are now rejected as well.
gcc/fortran/ChangeLog:
* f95-lang.cc (LANG_HOOKS_OMP_DEEP_MAPPING,
LANG_HOOKS_OMP_DEEP_MAPPING_P, LANG_HOOKS_OMP_DEEP_MAPPING_CNT):
Define.
* openmp.cc (gfc_match_omp_clause_reduction): Fix location setting.
(resolve_omp_clauses): Permit allocatable components, reject
them and polymorphic variables in PRIVATE/FIRSTPRIVATE.
* trans-decl.cc (add_clause): Set clause location.
* trans-openmp.cc (gfc_has_alloc_comps): Add ptr_ok and
shallow_alloc_only Boolean arguments.
(gfc_omp_replace_alloc_by_to_mapping): New.
(gfc_omp_private_outer_ref, gfc_walk_alloc_comps,
gfc_omp_clause_default_ctor, gfc_omp_clause_copy_ctor,
gfc_omp_clause_assign_op, gfc_omp_clause_dtor): Update call to it.
(gfc_omp_finish_clause): Minor cleanups, improve location data,
handle allocatable components.
(gfc_omp_deep_mapping_map, gfc_omp_deep_mapping_item,
gfc_omp_deep_mapping_comps, gfc_omp_gen_simple_loop,
gfc_omp_get_array_size, gfc_omp_elmental_loop,
gfc_omp_deep_map_kind_p, gfc_omp_deep_mapping_int_p,
gfc_omp_deep_mapping_p, gfc_omp_deep_mapping_do,
gfc_omp_deep_mapping_cnt, gfc_omp_deep_mapping): New.
(gfc_trans_omp_array_section): Save array descriptor in case
deep-mapping lang hook will need it.
(gfc_trans_omp_clauses): Likewise; use better clause location data.
* trans.h (gfc_omp_deep_mapping_p, gfc_omp_deep_mapping_cnt,
gfc_omp_deep_mapping): Add function prototypes.
libgomp/ChangeLog:
* libgomp.texi (5.0 Impl. Status): Mark mapping alloc comps as 'Y'.
* testsuite/libgomp.fortran/allocatable-comp.f90: New test.
* testsuite/libgomp.fortran/map-alloc-comp-3.f90: New test.
* testsuite/libgomp.fortran/map-alloc-comp-4.f90: New test.
* testsuite/libgomp.fortran/map-alloc-comp-5.f90: New test.
* testsuite/libgomp.fortran/map-alloc-comp-6.f90: New test.
* testsuite/libgomp.fortran/map-alloc-comp-7.f90: New test.
* testsuite/libgomp.fortran/map-alloc-comp-8.f90: New test.
* testsuite/libgomp.fortran/map-alloc-comp-9.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/map-alloc-comp-1.f90: Remove dg-error.
* gfortran.dg/gomp/polymorphic-mapping-2.f90: Update warn wording.
* gfortran.dg/gomp/polymorphic-mapping.f90: Change expected
diagnostic; some tests moved to ...
* gfortran.dg/gomp/polymorphic-mapping-1.f90: ... here as new test.
* gfortran.dg/gomp/polymorphic-mapping-3.f90: New test.
* gfortran.dg/gomp/polymorphic-mapping-4.f90: New test.
* gfortran.dg/gomp/polymorphic-mapping-5.f90: New test.
Note that this commit also updates the API interface to OpenMP 6.0;
while 5.1 and 5.2 use 'int *' for the the ret_code argument,
OpenMP 6.0 changed this to omp_interop_rc_t *; this enum also exists in
OpenMP 5.1. However, C++ does not like this change such that unless NULL
is passed (i.e. the argument is ignored), OpenMP 5.x and 6.x are not
compatible.
Note that GCC's omp.h already follows OpenMP 6.0 and is now in sync with
the documentation.
libgomp/ChangeLog:
* libgomp.texi (OpenMP 5.1): Add @ref to offload-target specifics
for 'interop'.
(OpenMP 6.0): Mark dispatch's interop clause as implemented.
(omp_get_interop_int, omp_get_interop_str,
omp_get_interop_ptr, omp_get_interop_type_desc): Add @ref to
Offload-Target Specifics; change ret_code argument type to
'omp_interop_rc_t *'.
(Offload-Target Specifics): Document the supported OpenMP
interop foreign runtimes on AMD and Nvidia GPUs.
This patch adds support for the case where #pragma omp declare variant
with append_args is used inside a #pragma omp dispatch interop that
specifies fewer interop args than required by the variant; new interop
objects are implicitly created and then destroyed around the call to the
variant, using the GOMP_interop builtin.
gcc/fortran/ChangeLog
* trans-openmp.cc (gfc_trans_omp_declare_variant): Remove accidental
redeclaration of pref.
gcc/ChangeLog
* gimplify.cc (modify_call_for_omp_dispatch): Adjust arguments.
Remove the "sorry" for the case where new interop objects must be
constructed, and add code to make it work instead.
(expand_variant_call_expr): Adjust arguments and call to
modify_call_for_omp_dispatch.
(gimplify_variant_call_expr): Simplify logic for calling
expand_variant_call_expr.
gcc/testsuite/ChangeLog
* c-c++-common/gomp/append-args-1.c: Adjust expected behavior.
* c-c++-common/gomp/append-args-interop.c: New.
* c-c++-common/gomp/dispatch-11.c: Adjust expected behavior.
* g++.dg/gomp/append-args-1.C: Likewise.
* gfortran.dg/gomp/append-args-interop.f90: New.
* gfortran.dg/gomp/declare-variant-mod-2.f90: Adjust expected behavior.
libgomp/ChangeLog
* libgomp.texi (OpenMP 5.1): Mark append_args as fully supported.
Co-Authored-By: Tobias Burnus <tburnus@baylibre.com>
libgomp/ChangeLog
* libgomp.texi (OpenMP 5.0): Mark metadirective and declare variant
as implemented.
(OpenMP 5.1): Mark target_device as supported.
Add changed interaction between declare target and OpenMP context
and dynamic selector support.
(OpenMP 5.2): Mark otherwise clause as supported, note that
default is also still accepted.
This fixes a large number of smaller and larger issues with the append_args
clause to 'declare variant' and adds Fortran support for it; it also contains
a larger number of testcases.
In particular, for Fortran, it also handles passing allocatable, pointer,
optional arguments to an interop dummy argument with or without value
attribute. And it changes the internal representation such that dumping the
tree does not lead to an ICE.
gcc/c/ChangeLog:
* c-parser.cc (c_finish_omp_declare_variant): Modify how
append_args is saved internally.
gcc/cp/ChangeLog:
* parser.cc (cp_finish_omp_declare_variant): Modify how append_args
is saved internally.
* pt.cc (tsubst_attribute): Likewise.
(tsubst_omp_clauses): Remove C_ORT_OMP_DECLARE_SIMD from interop
handling as no longer called for it.
* decl.cc (omp_declare_variant_finalize_one): Update append_args
changes; fixes for ADL input.
gcc/fortran/ChangeLog:
* gfortran.h (gfc_omp_declare_variant): Add append_args_list.
* openmp.cc (gfc_parser_omp_clause_init_modifiers): New;
splitt of from ...
(gfc_match_omp_init): ... here; call it.
(gfc_match_omp_declare_variant): Update to handle append_args
clause; some syntax handling fixes.
* trans-openmp.cc (gfc_trans_omp_declare_variant): Handle
append_args clause; add some diagnostic.
gcc/ChangeLog:
* gimplify.cc (gimplify_call_expr): For OpenMP's append_args clause
processed by 'omp dispatch', update for internal-representation
changes; fix handling of hidden arguments, add some comments and
handle Fortran's value dummy and optional/pointer/allocatable actual
args.
libgomp/ChangeLog:
* libgomp.texi (Impl. Status): Update for accumpulated changes
related to 'dispatch' and interop.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/append-args-1.c: Update dg-*.
* c-c++-common/gomp/append-args-3.c: Likewise.
* g++.dg/gomp/append-args-1.C: Likewise.
* gfortran.dg/gomp/adjust-args-1.f90: Likewise.
* gfortran.dg/gomp/adjust-args-3.f90: Likewise.
* gfortran.dg/gomp/declare-variant-2.f90: Likewise.
* c-c++-common/gomp/append-args-6.c: New test.
* c-c++-common/gomp/append-args-7.c: New test.
* c-c++-common/gomp/append-args-8.c: New test.
* c-c++-common/gomp/append-args-9.c: New test.
* g++.dg/gomp/append-args-4.C: New test.
* g++.dg/gomp/append-args-5.C: New test.
* g++.dg/gomp/append-args-6.C: New test.
* g++.dg/gomp/append-args-7.C: New test.
* gcc.dg/gomp/append-args-1.c: New test.
* gfortran.dg/gomp/append_args-1.f90: New test.
* gfortran.dg/gomp/append_args-2.f90: New test.
* gfortran.dg/gomp/append_args-3.f90: New test.
* gfortran.dg/gomp/append_args-4.f90: New test.
libgomp/ChangeLog:
* libgomp.texi (OpenMP 6.0): Fix typo.
(omp_get_default_device): Update the wording as the value
returned by omp_get_initial_device is now ambiguous.
(omp_get_num_devices): Minor wording tweak.
(omp_get_initial_device): Note that the function may also
return omp_initial_device since OpenMP 6.
PTX '%dynamic_smem_size' was "Introduced in PTX ISA version 4.1", and
"Requires 'sm_20' or higher". Given that GCC/nvptx generally supports
'sm_20', only the PTX ISA version matters here, and that's all fine if
just using GCC's defaults. Follow-up to
commit e9a19ead49
"openmp, nvptx: low-lat memory access traits".
libgomp/
* libgomp.texi: Clarify nvptx 'omp_low_lat_mem_space'
documentation.
Fixed a check to permit [[omp::decl(allocate,...)]] parsing in C.
Additionaly, we discussed that 'allocate align' should not affect
'alignof' to avoid issues like with:
int a;
_Alignas(_Alignof(a)) int b;
#pragma omp allocate(a) align(128)
_Alignas(_Alignof(a)) int c;
Thus, the alignment is no longer set in the C and Fortran front ends,
but for static variables now in varpool_node::finalize_decl.
(For stack variables, the alignment is handled in gimplify_bind_expr.)
NOTE: 'omp allocate' is not yet supported in C++.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_allocate): Only check scope if
not in_omp_decl_attribute. Remove setting the alignment.
gcc/ChangeLog:
* cgraphunit.cc (varpool_node::finalize_decl): Set alignment
based on OpenMP's 'omp allocate' attribute/directive.
gcc/fortran/ChangeLog:
* trans-decl.cc (gfc_finish_var_decl): Remove setting the alignment.
libgomp/ChangeLog:
* libgomp.texi (Memory allocation): Mention (non-)effect of 'align'
on _Alignof.
* testsuite/libgomp.c/allocate-7.c: New test.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/allocate-18.c: Check that alignof is unaffected
by 'omp allocate'.
* c-c++-common/gomp/allocate-19.c: Likewise.
libgomp/
* libgomp.texi (OpenMP Implementation Status): Change TR13 to
OpenMP 6.0, now released. Fix a typo in the omp_target_memset_async
routine name.
libgomp/ChangeLog:
* libgomp.texi (OpenMP Technical Report 13): Remove 'iterator'
in 'map' clause of 'declare mapper' as it is already the list above.
(Interoperability Routines): Add.
(omp_target_memcpy_async, omp_target_memcpy_rect_async):
Document that depobj_list may be omitted in C++ and Fortran.
It turned out that 'if (omp_is_initial_device() .eqv. true)' gave an ICE
due to comparing 'int' with 'logical(4)'. When digging deeper, it also
turned out that when the procedure pointer is needed, the builtin cannot
be used, either. (Follow up to r15-2799-gf1bfba3a9b3f31 )
Extend the code to also use the builtin acc_on_device with OpenACC,
which was previously only used in C/C++. Additionally, fix folding
when offloading is not enabled.
Fixes additionally the BT_BOOL data type, which was 'char'/integer(1)
instead of bool, backing the booleaness; use bool_type_node as the rest
of GCC.
gcc/fortran/ChangeLog:
* gfortran.h (gfc_option_t): Add disable_acc_on_device.
* options.cc (gfc_handle_option): Handle -fno-builtin-acc_on_device.
* trans-decl.cc (gfc_get_extern_function_decl): Move
__builtin_omp_is_initial_device handling to ...
* trans-expr.cc (get_builtin_fn): ... this new function.
(conv_function_val): Call it.
(update_builtin_function): New.
(gfc_conv_procedure_call): Call it.
* types.def (BT_BOOL): Fix type by using bool_type_node.
gcc/ChangeLog:
* gimple-fold.cc (gimple_fold_builtin_acc_on_device): Also fold
when offloading is not configured.
libgomp/ChangeLog:
* libgomp.texi (TR13): Fix minor typos.
(omp_is_initial_device): Improve wording.
(acc_on_device): Note how to disable the builtin.
* testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Remove TODO.
* testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Likewise.
Add -fno-builtin-acc_on_device.
* testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c: Update
dg- as !offloading_enabled now compile-time expands acc_on_device.
* testsuite/libgomp.fortran/target-is-initial-device-3.f90: New test.
* testsuite/libgomp.oacc-fortran/acc_on_device-2.f90: New test.
For the 'allocate' directive, remove the sorry for static variables and
just keep using normal memory, but honor the requested alignment and set
a DECL_ATTRIBUTE in case a target may want to make use of this later on.
The documentation is updated accordingly.
The C diagnostic to check for predefined allocators (req. for static vars)
failed to accept GCC's ompx_gnu_... allocator, now fixed. (Fortran was
already okay; but both now use new common #defined value for checking.)
And while Fortran common block variables are still rejected, the check
has been improved as before the sorry diagnostic did not work for
common blocks in modules.
Finally, for 'allocate' clause on the target/task/taskloop directives,
there is now a warning for omp_thread_mem_alloc (i.e. predefined allocator
with access = thread), which is undefined behavior according to the
OpenMP specification.
And, last, testing showed that var decl + static_assert sets TREE_USED
but does not produce a statement list in C, which did run into an assert
in gimplify. This special case is now also handled.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_allocate): Set alignment for alignof;
accept static variables and fix predef allocator check.
gcc/fortran/ChangeLog:
* openmp.cc (is_predefined_allocator): Use gomp-constants.h consts.
* trans-common.cc (translate_common): Reject OpenMP allocate directives.
* trans-decl.cc (gfc_finish_var_decl): Handle allocate directive
for static variables.
(gfc_trans_deferred_vars): Update for the latter.
gcc/ChangeLog:
* gimplify.cc (gimplify_bind_expr): Fix corner case for OpenMP
allocate directive.
(gimplify_scan_omp_clauses): Warn if omp_thread_mem_alloc is used
as allocator with the target/task/taskloop directive.
include/ChangeLog:
* gomp-constants.h (GOMP_OMP_PREDEF_ALLOC_MAX,
GOMP_OMPX_PREDEF_ALLOC_MIN, GOMP_OMPX_PREDEF_ALLOC_MAX,
GOMP_OMP_PREDEF_ALLOC_THREADS): New defines.
libgomp/ChangeLog:
* allocator.c: Add static asserts for news
GOMP_OMP{,X}_PREDEF_ALLOC_{MIN,MAX} range values.
* libgomp.texi (OpenMP Impl. Status): Allocate directive for
static vars is now supported. Refer to PR for allocate clause.
(Memory allocation): Update for static vars; minor word tweaking.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/allocate-9.c: Update for removed sorry.
* gfortran.dg/gomp/allocate-15.f90: Likewise.
* gfortran.dg/gomp/allocate-pinned-1.f90: Likewise.
* gfortran.dg/gomp/allocate-4.f90: Likewise; add dg-error for
previously missing diagnostic.
* c-c++-common/gomp/allocate-18.c: New test.
* c-c++-common/gomp/allocate-19.c: New test.
* gfortran.dg/gomp/allocate-clause.f90: New test.
* gfortran.dg/gomp/allocate-static-2.f90: New test.
* gfortran.dg/gomp/allocate-static.f90: New test.
Remove an item under "Other new TR 13 features" that since the last commit
(r15-3917-g6b7eaec20b046e) to this file is is covered by the added
"New @code{storage} map-type modifier; context-dependent @code{alloc} and
@code{release} are aliases"
"Update of the map-type decay for mapping and @code{declare_mapper}"
libgomp/
* libgomp.texi (TR13 status): Update semi-duplicated, semi-obsoleted
item; remove left-over half-sentence.
Those TR13/OpenMP 6.0 routines permit a reproducible offloading to
a specific device by mapping an OpenMP device number to a
unique ID (UID). The GPU device UIDs should be universally unique,
the one for the host is not.
gcc/ChangeLog:
* omp-general.cc (omp_runtime_api_procname): Add
get_device_from_uid and omp_get_uid_from_device routines.
include/ChangeLog:
* cuda/cuda.h (cuDeviceGetUuid): Declare.
(cuDeviceGetUuid_v2): Add prototype.
libgomp/ChangeLog:
* config/gcn/target.c (omp_get_uid_from_device,
omp_get_device_from_uid): Add stub implementation.
* config/nvptx/target.c (omp_get_uid_from_device,
omp_get_device_from_uid): Likewise.
* fortran.c (omp_get_uid_from_device_,
omp_get_uid_from_device_8_): New functions.
* libgomp-plugin.h (GOMP_OFFLOAD_get_uid): Add prototype.
* libgomp.h (struct gomp_device_descr): Add 'uid' and 'get_uid_func'.
* libgomp.map (GOMP_6.0): New, includind the new UID routines.
* libgomp.texi (OpenMP Technical Report 13): Mark UID routines as 'Y'.
(Device Information Routines): Document new UID routines.
(Offload-Target Specifics): Document UID format.
* omp.h.in (omp_get_device_from_uid, omp_get_uid_from_device):
New prototype.
* omp_lib.f90.in (omp_get_device_from_uid, omp_get_uid_from_device):
New interface.
* omp_lib.h.in: Likewise.
* plugin/cuda-lib.def: Add cuDeviceGetUuid and cuDeviceGetUuid_v2 via
CUDA_ONE_CALL_MAYBE_NULL.
* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_uid): New.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_uid): New.
* target.c (str_omp_initial_device): New static var.
(STR_OMP_DEV_PREFIX): Define.
(gomp_get_uid_for_device, omp_get_uid_from_device,
omp_get_device_from_uid): New.
(gomp_load_plugin_for_device): DLSYM_OPT the function 'get_uid'.
(gomp_target_init): Set the device's 'uid' field to NULL.
* testsuite/libgomp.c/device_uid.c: New test.
* testsuite/libgomp.fortran/device_uid.f90: New test.
The gfx803 "Fiji" device was deprecated in GCC 14, removed from LLVM 18, and
hasn't worked properly with the drivers since about ROCm 4.
This patch removes the device from GCC options and documentation, and removes
the direct mentions from the internals.
The TARGET_GCN3 support in the back-end is now unused and can be removed (in a
follow-up patch).
gcc/ChangeLog:
* config.gcc (amdgcn-*-*): Remove "fiji" from with_arch checks.
* config/gcn/gcn-hsa.h (ABI_VERSION_SPEC): Remove fiji alternative.
(NO_XNACK): Likewise.
(NO_SRAM_ECC): Likewise.
(ASM_SPEC): Remove "%{}" around ABI_VERSION_SPEC.
* config/gcn/gcn-opts.h (enum processor_type): Remove PROCESSOR_FIJI.
(TARGET_FIJI): Delete.
* config/gcn/gcn.cc (gcn_option_override): Remove Fiji.
(gcn_omp_device_kind_arch_isa): Likewise.
(output_file_start): Likewise.
* config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Likewise.
* config/gcn/gcn.opt (gpu_type): Likewise.
(march, mtune): Change default to PROCESSOR_VEGA10.
* config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX803): Delete.
(copy_early_debug_info): Remove elf_flags_actual.
Use ELFABIVERSION_AMDGPU_HSA_V4 unconditionally.
(get_arch): Remove Fiji.
(main): Remove gfx803.
* config/gcn/t-omp-device
(omp-device-properties-gcn): Remove fiji and gfx803.
* doc/install.texi (amdgcn*-*-*): Remove fiji and special instructions.
* doc/invoke.texi: Remove fiji.
libgomp/ChangeLog:
* libgomp.texi: Remove fiji and gfx803.
* testsuite/libgomp.c/declare-variant-4.h: Remove fiji and gfx803.
* testsuite/libgomp.c/declare-variant-4-fiji.c: Removed.
* testsuite/libgomp.c/declare-variant-4-gfx803.c: Removed.
To keep track of missing routine documentation (both implemented and not),
the libgomp.texi file contains all non-OMPT routines as commented items
in @menu. This commit adds the routines added in TR13 as commented fixme
items.
libgomp/ChangeLog:
* libgomp.texi (OpenMP Runtime Library Routines): Add TR13 routines
to @menu (commented out).
Update 'OpenMP Runtime Library Routines' by adding a note that invoking
inside a target region might invoke unspecified behavior. Additionally,
update omp_{get,set}_default_device for omp_{initial,invalid}_device
named constants.
libgomp/ChangeLog:
* libgomp.texi (OpenMP Runtime Library Routines): Add missing
title to some commented still undocumented items.
(Device Information Routines): Update.
This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP. This is not in the OpenMP standard so it uses the "ompx"
namespace and an independent enum baseline of 200 (selected to not clash with
other known implementations).
The allocator is equivalent to using a custom allocator with the pinned
trait and the null fallback trait. One motivation for having this feature is
for use by the (planned) -foffload-memory=pinned feature.
gcc/fortran/ChangeLog:
* openmp.cc (is_predefined_allocator): Update valid ranges to
incorporate ompx_gnu_pinned_mem_alloc.
libgomp/ChangeLog:
* allocator.c (ompx_gnu_min_predefined_alloc): New.
(ompx_gnu_max_predefined_alloc): New.
(predefined_alloc_mapping): Rename to ...
(predefined_omp_alloc_mapping): ... this.
(predefined_ompx_gnu_alloc_mapping): New.
(_Static_assert): Adjust for the new name, and add a new assert for the
new table.
(predefined_allocator_p): New.
(predefined_alloc_mapping): New.
(omp_aligned_alloc): Support ompx_gnu_pinned_mem_alloc.
Use predefined_allocator_p and predefined_alloc_mapping.
(omp_free): Likewise.
(omp_alligned_calloc): Likewise.
(omp_realloc): Likewise.
* env.c (parse_allocator): Add ompx_gnu_pinned_mem_alloc.
* libgomp.texi: Document ompx_gnu_pinned_mem_alloc.
* omp.h.in (omp_allocator_handle_t): Add ompx_gnu_pinned_mem_alloc.
* omp_lib.f90.in: Add ompx_gnu_pinned_mem_alloc.
* omp_lib.h.in: Add ompx_gnu_pinned_mem_alloc.
* testsuite/libgomp.c/alloc-pinned-5.c: New test.
* testsuite/libgomp.c/alloc-pinned-6.c: New test.
* testsuite/libgomp.fortran/alloc-pinned-1.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/allocate-pinned-1.f90: New test.
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
The implementation has been committed in r15-1037.
2024-06-06 Jakub Jelinek <jakub@redhat.com>
* libgomp.texi (OpenMP 5.1 status): Mark Loop transformation constructs
as implemented.
libgomp/ChangeLog:
* libgomp.texi (OpenMP 5.0 status): Mark 'requires' as done and
link to 'Offload-Target Specifics'.
(OpenMP 5.2 status): Add item about additional map-type modifiers
in 'declare mapper'.
If HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT is true,
all GPUs on the system support unified shared memory. That's
the case for APUs and MI200 devices when XNACK is enabled.
XNACK can be enabled by setting HSA_XNACK=1 as env var for
supported devices; otherwise, if disable, USM code will
use host fallback.
gcc/ChangeLog:
* config/gcn/gcn-hsa.h (gcn_local_sym_hash): Fix typo.
include/ChangeLog:
* hsa.h (HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT): Add
enum value.
libgomp/ChangeLog:
* libgomp.texi (gcn): Update USM handling
* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Handle
USM if HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT is true.
A few high-end nvptx devices support the attribute
CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS; for those, unified shared
memory is supported in hardware. This patch enables support for those -
if all installed nvptx devices have this feature (as the capabilities
are per device type).
This exposes a bug in gomp_copy_back_icvs as it did before use
omp_get_mapped_ptr to find mapped variables, but that returns
the unchanged pointer in cased of shared memory. But in this case,
we have a few actually mapped pointers - like the ICV variables.
Additionally, there was a mismatch with regards to '-1' for the
device number as gomp_copy_back_icvs and omp_get_mapped_ptr count
differently. Hence, do the lookup manually.
include/ChangeLog:
* cuda/cuda.h (CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS): Add.
libgomp/ChangeLog:
* libgomp.texi (nvptx): Update USM description.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices):
Claim support when requesting USM and all devices support
CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS.
* target.c (gomp_copy_back_icvs): Fix device ptr lookup.
(gomp_target_init): Set GOMP_OFFLOAD_CAP_SHARED_MEM is the
devices supports USM.
Like in r12-7519-g027e30414492d50feb2854aff38227b14300dc4b, I've done
git grep -v 'long long\|optab optab\|template template\|double double' | grep ' \([a-zA-Z]\+\) \1 '
This is just part of the changes, mostly for non-gcc directories.
I'll try to get to the rest soon. Obviously, the above command also
finds cases which are correct as is and shouldn't be changed, so one
needs to manually inspect everything.
I'd hope most of it is pretty obvious, but the config/ and libstdc++-v3/
hunks include a tweak in a license wording, though other copies of the
similar license have the wording right.
2024-04-02 Jakub Jelinek <jakub@redhat.com>
* Makefile.tpl: Fix duplicated words; returns returns ->
returns.
config/
* lcmessage.m4: Fix duplicated words; can can -> can,
package package -> package.
libdecnumber/
* decCommon.c (decFinalize): Fix duplicated words in
comment; the the -> the.
libgcc/
* unwind-dw2-fde.c (struct fde_accumulator): Fix duplicated
words in comment; is is -> is.
libgfortran/
* configure.host: Fix duplicated words; the the -> the.
libgm2/
* configure.host: Fix duplicated words; the the -> the.
libgomp/
* libgomp.texi (OpenMP 5.2): Fix duplicated words; with with ->
with.
(omp_target_associate_ptr): Fix duplicated words; either either ->
either.
(omp_init_allocator): Fix duplicated words; be be -> be.
(omp_realloc): Fix duplicated words; is is -> is.
(OMP_ALLOCATOR): Fix duplicated words; other other -> other.
* priority_queue.h (priority_queue_multi_p): Fix duplicated words;
to to -> to.
libiberty/
* regex.c (byte_re_match_2_internal): Fix duplicated words in comment;
next next -> next.
* dyn-string.c (dyn_string_init): Fix duplicated words in comment;
of of -> of.
libitm/
* beginend.cc (GTM::gtm_thread::begin_transaction): Fix duplicated
words in comment; not not -> not to.
libobjc/
* init.c (duplicate_classes): Fix duplicated words in comment; in in
-> in.
* sendmsg.c (__objc_prepare_dtable_for_class): Fix duplicated words
in comment; the the -> the.
* encoding.c (objc_layout_structure): Likewise.
libstdc++-v3/
* acinclude.m4: Fix duplicated words; file file -> file can.
* configure.host: Fix duplicated words; the the -> the.
libvtv/
* vtv_rts.cc (vtv_fail): Fix duplicated words; to to -> to.
* vtv_fail.cc (vtv_fail): Likewise.
While texinfo 7.0.3 does not warn, an older texinfo did complain about:
libgomp.texi:1964: warning: node next `omp_target_memcpy' in menu
`omp_target_memcpy_rect' and in sectioning `omp_target_memcpy_async' differ
libgomp/
* libgomp.texi (Device Memory Routines): Swap item order to match
the order of the '@node's of the '@subsection's.
These routines map simply to the C counterpart and are meanwhile
defined in OpenACC 3.3. (There are additional routine changes,
including the Fortran addition of acc_attach/acc_detach, that
require more work than a simple addition of an interface and
are therefore excluded.)
libgomp/ChangeLog:
* libgomp.texi (OpenACC Runtime Library Routines): Document new 3.3
routines that simply map to their C counterpart.
* openacc.f90 (openacc): Add them.
* openacc_lib.h: Likewise.
* testsuite/libgomp.oacc-fortran/acc_host_device_ptr.f90: New test.
* testsuite/libgomp.oacc-fortran/acc-memcpy.f90: New test.
* testsuite/libgomp.oacc-fortran/acc-memcpy-2.f90: New test.
* testsuite/libgomp.oacc-c-c++-common/lib-59.c: Crossref to f90 test.
* testsuite/libgomp.oacc-c-c++-common/lib-60.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-95.c: Likewise.
The main 'arch' context selector for nvptx is, well, 'nvptx';
however, as 'nvptx64' is used as by LLVM, it makes sense
to support it as well.
Note that LLVM has: "The triple architecture can be one of
``nvptx`` (32-bit PTX) or ``nvptx64`` (64-bit PTX)."
GCC effectively only supports the 64bit variant (at least for
offloading). Thus, GCC's 'nvptx' is not quite the same as LLVM's.
The device-compiler part (nvptx_omp_device_kind_arch_isa) uses
TARGET_ABI64 such that nvptx64 is only defined with -m64.
gcc/ChangeLog:
* config/nvptx/gen-omp-device-properties.sh: Add 'nvptx64' to arch.
* config/nvptx/nvptx.cc (nvptx_omp_device_kind_arch_isa): Likewise.
libgomp/ChangeLog:
* libgomp.texi (OpenMP Context Selectors): Add 'nvptx64' as additional
'arch' value for nvptx.
Support for indirect calls to procedures/functions in offloaded target
regions is now available for C, C++ and Fortran.
2024-02-15 Kwok Cheung Yeung <kcyeung@baylibre.com>
libgomp/
* libgomp.texi (OpenMP 5.1): Mark indirect call support as fully
implemented.
This patch adds support for parsing general lvalues ("locator list item
types") for OpenMP "map", "to" and "from" clauses to the C front-end,
similar to the previously-posted patch for C++. Such syntax is permitted
for OpenMP 5.0 and above. It was previously posted for mainline here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/609038.html
and for the og13 branch here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623355.html
2024-01-11 Julian Brown <julian@codesourcery.com>
gcc/c-family/
* c-pretty-print.cc (c_pretty_printer::postfix_expression,
c_pretty_printer::expression): Add OMP_ARRAY_SECTION support.
gcc/c/
* c-parser.cc (c_parser_braced_init, c_parser_conditional_expression):
Don't allow OpenMP array section.
(c_parser_postfix_expression): Don't allow array section in statement
expression.
(c_parser_postfix_expression_after_primary): Add support for OpenMP
array section parsing.
(c_parser_expr_list): Don't allow OpenMP array section here.
(c_parser_omp_variable_list): Change ALLOW_DEREF parameter to
MAP_LVALUE. Support parsing of general lvalues in "map", "to" and
"from" clauses.
(c_parser_omp_var_list_parens): Change ALLOW_DEREF parameter to
MAP_LVALUE. Update call to c_parser_omp_variable_list.
(c_parser_oacc_data_clause): Update calls to
c_parser_omp_var_list_parens.
(c_parser_omp_clause_reduction): Use OMP_ARRAY_SECTION tree node
instead of TREE_LIST for array sections.
(c_parser_omp_target): Allow GOMP_MAP_ATTACH.
* c-tree.h (c_omp_array_section_p): Add extern declaration.
(build_omp_array_section): Add prototype.
* c-typeck.cc (c_omp_array_section_p): Add flag.
(mark_exp_read): Support OMP_ARRAY_SECTION.
(build_omp_array_section): Add function.
(build_external_ref): Tweak error path for OpenMP array sections.
(handle_omp_array_sections_1): Use OMP_ARRAY_SECTION tree code instead
of TREE_LIST. Handle more kinds of expressions.
(c_oacc_check_attachments): Use OMP_ARRAY_SECTION instead of TREE_LIST
for array sections.
(c_finish_omp_clauses): Use OMP_ARRAY_SECTION instead of TREE_LIST.
Check for supported expression types.
gcc/testsuite/
* gcc.dg/gomp/bad-array-section-c-1.c: New test.
* gcc.dg/gomp/bad-array-section-c-2.c: New test.
* gcc.dg/gomp/bad-array-section-c-3.c: New test.
* gcc.dg/gomp/bad-array-section-c-4.c: New test.
* gcc.dg/gomp/bad-array-section-c-5.c: New test.
* gcc.dg/gomp/bad-array-section-c-6.c: New test.
* gcc.dg/gomp/bad-array-section-c-7.c: New test.
* gcc.dg/gomp/bad-array-section-c-8.c: New test.
libgomp/
* libgomp.texi: C/C++ lvalues are supported now for map/to/from.
* testsuite/libgomp.c-c++-common/ind-base-4.c: New test.
* testsuite/libgomp.c-c++-common/unary-ptr-1.c: New test.
Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall. Pinned allocations are performed using mmap, not malloc, to ensure
that they can be unpinned safely when freed.
This implementation will work OK for page-scale allocations, and finer-grained
allocations will be implemented in a future patch.
libgomp/ChangeLog:
* allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
(MEMSPACE_VALIDATE): Add PIN.
(omp_init_allocator): Use MEMSPACE_VALIDATE to check pinning.
(omp_aligned_alloc): Add pinning to all MEMSPACE_* calls.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
(omp_free): Likewise.
* config/linux/allocator.c: New file.
* config/nvptx/allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
(MEMSPACE_VALIDATE): Add PIN.
* config/gcn/allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
* libgomp.texi: Switch pinned trait to supported.
(MEMSPACE_VALIDATE): Add PIN.
* testsuite/libgomp.c/alloc-pinned-1.c: New test.
* testsuite/libgomp.c/alloc-pinned-2.c: New test.
* testsuite/libgomp.c/alloc-pinned-3.c: New test.
* testsuite/libgomp.c/alloc-pinned-4.c: New test.
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
This commit adds -fopenmp-allocators which enables support for
'omp allocators' and 'omp allocate' that are associated with a Fortran
allocate-stmt. If such a construct is encountered, an error is shown,
unless the -fopenmp-allocators flag is present.
With -fopenmp -fopenmp-allocators, those constructs get turned into
GOMP_alloc allocations, while -fopenmp-allocators (also without -fopenmp)
ensures deallocation and reallocation (via intrinsic assignments) are
properly directed to GOMP_free/omp_realloc - while normal Fortran
allocations are processed by free/realloc.
In order to distinguish a 'malloc'ed from a 'GOMP_alloc'ed memory, the
version field of the Fortran array discriptor is (mis)used: 0 indicates
the normal Fortran allocation while 1 denotes GOMP_alloc. For scalars,
there is record keeping in libgomp: GOMP_add_alloc(ptr) will add the
pointer address to a splay_tree while GOMP_is_alloc(ptr) will return
true it was previously added but also removes it from the list.
Besides Fortran FE work, BUILT_IN_GOMP_REALLOC is no part of
omp-builtins.def and libgomp gains the mentioned two new function.
gcc/ChangeLog:
* builtin-types.def (BT_FN_PTR_PTR_SIZE_PTRMODE_PTRMODE): New.
* omp-builtins.def (BUILT_IN_GOMP_REALLOC): New.
* builtins.cc (builtin_fnspec): Handle it.
* gimple-ssa-warn-access.cc (fndecl_alloc_p,
matching_alloc_calls_p): Likewise.
* gimple.cc (nonfreeing_call_p): Likewise.
* predict.cc (expr_expected_value_1): Likewise.
* tree-ssa-ccp.cc (evaluate_stmt): Likewise.
* tree.cc (fndecl_dealloc_argno): Likewise.
gcc/fortran/ChangeLog:
* dump-parse-tree.cc (show_omp_node): Handle EXEC_OMP_ALLOCATE
and EXEC_OMP_ALLOCATORS.
* f95-lang.cc (ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LIST):
Add 'ECF_LEAF | ECF_MALLOC' to existing 'ECF_NOTHROW'.
(ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LEAF_LIST): Define.
* gfortran.h (gfc_omp_clauses): Add contained_in_target_construct.
* invoke.texi (-fopenacc, -fopenmp): Update based on C version.
(-fopenmp-simd): New, based on C version.
(-fopenmp-allocators): New.
* lang.opt (fopenmp-allocators): Add.
* openmp.cc (resolve_omp_clauses): For allocators/allocate directive,
add target and no dynamic_allocators diagnostic and more invalid
diagnostic.
* parse.cc (decode_omp_directive): Set contains_teams_construct.
* trans-array.h (gfc_array_allocate): Update prototype.
(gfc_conv_descriptor_version): New prototype.
* trans-decl.cc (gfc_init_default_dt): Fix comment.
* trans-array.cc (gfc_conv_descriptor_version): New.
(gfc_array_allocate): Support GOMP_alloc allocation.
(gfc_alloc_allocatable_for_assignment, structure_alloc_comps):
Handle GOMP_free/omp_realloc as needed.
* trans-expr.cc (gfc_conv_procedure_call): Likewise.
(alloc_scalar_allocatable_for_assignment): Likewise.
* trans-intrinsic.cc (conv_intrinsic_move_alloc): Likewise.
* trans-openmp.cc (gfc_trans_omp_allocators,
gfc_trans_omp_directive): Handle allocators/allocate directive.
(gfc_omp_call_add_alloc, gfc_omp_call_is_alloc): New.
* trans-stmt.h (gfc_trans_allocate): Update prototype.
* trans-stmt.cc (gfc_trans_allocate): Support GOMP_alloc.
* trans-types.cc (gfc_get_dtype_rank_type): Set version field.
* trans.cc (gfc_allocate_using_malloc, gfc_allocate_allocatable):
Update to handle GOMP_alloc.
(gfc_deallocate_with_status, gfc_deallocate_scalar_with_status):
Handle GOMP_free.
(trans_code): Update call.
* trans.h (gfc_allocate_allocatable, gfc_allocate_using_malloc):
Update prototype.
(gfc_omp_call_add_alloc, gfc_omp_call_is_alloc): New prototype.
* types.def (BT_FN_PTR_PTR_SIZE_PTRMODE_PTRMODE): New.
libgomp/ChangeLog:
* allocator.c (struct fort_alloc_splay_tree_key_s,
fort_alloc_splay_compare, GOMP_add_alloc, GOMP_is_alloc): New.
* libgomp.h: Define splay_tree_static for 'reverse' splay tree.
* libgomp.map (GOMP_5.1.2): New; add GOMP_add_alloc and
GOMP_is_alloc; move GOMP_target_map_indirect_ptr from ...
(GOMP_5.1.1): ... here.
* libgomp.texi (Impl. Status, Memory management): Update for
allocators/allocate directives.
* splay-tree.c: Handle splay_tree_static define to declare all
functions as static.
(splay_tree_lookup_node): New.
* splay-tree.h: Handle splay_tree_decl_only define.
(splay_tree_lookup_node): New prototype.
* target.c: Define splay_tree_static for 'reverse'.
* testsuite/libgomp.fortran/allocators-1.f90: New test.
* testsuite/libgomp.fortran/allocators-2.f90: New test.
* testsuite/libgomp.fortran/allocators-3.f90: New test.
* testsuite/libgomp.fortran/allocators-4.f90: New test.
* testsuite/libgomp.fortran/allocators-5.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/allocate-14.f90: Add coarray and
not-listed tests.
* gfortran.dg/gomp/allocate-5.f90: Remove sorry dg-message.
* gfortran.dg/bind_c_array_params_2.f90: Update expected
dump for dtype '.version=0'.
* gfortran.dg/gomp/allocate-16.f90: New test.
* gfortran.dg/gomp/allocators-3.f90: New test.
* gfortran.dg/gomp/allocators-4.f90: New test.
This implements the OpenMP low-latency memory allocator for AMD GCN using the
small per-team LDS memory (Local Data Store).
Since addresses can now refer to LDS space, the "Global" address space is
no-longer compatible. This patch therefore switches the backend to use
entirely "Flat" addressing (which supports both memories). A future patch
will re-enable "global" instructions for cases where it is known to be safe
to do so.
gcc/ChangeLog:
* config/gcn/gcn-builtins.def (DISPATCH_PTR): New built-in.
* config/gcn/gcn.cc (gcn_init_machine_status): Disable global
addressing.
(gcn_expand_builtin_1): Implement GCN_BUILTIN_DISPATCH_PTR.
libgomp/ChangeLog:
* config/gcn/libgomp-gcn.h (TEAM_ARENA_START): Move to here.
(TEAM_ARENA_FREE): Likewise.
(TEAM_ARENA_END): Likewise.
(GCN_LOWLAT_HEAP): New.
* config/gcn/team.c (LITTLEENDIAN_CPU): New, and import hsa.h.
(__gcn_lowlat_init): New prototype.
(gomp_gcn_enter_kernel): Initialize the low-latency heap.
* libgomp.h (TEAM_ARENA_START): Move to libgomp.h.
(TEAM_ARENA_FREE): Likewise.
(TEAM_ARENA_END): Likewise.
* plugin/plugin-gcn.c (lowlat_size): New variable.
(print_kernel_dispatch): Label the group_segment_size purpose.
(init_environment_variables): Read GOMP_GCN_LOWLAT_POOL.
(create_kernel_dispatch): Pass low-latency head allocation to kernel.
(run_kernel): Use shadow; don't assume values.
* testsuite/libgomp.c/omp_alloc-traits.c: Enable for amdgcn.
* config/gcn/allocator.c: New file.
* libgomp.texi: Document low-latency implementation details.
The NVPTX low latency memory is not accessible outside the team that allocates
it, and therefore should be unavailable for allocators with the access trait
"all". This change means that the omp_low_lat_mem_alloc predefined
allocator no longer works (but omp_cgroup_mem_alloc still does).
libgomp/ChangeLog:
* allocator.c (MEMSPACE_VALIDATE): New macro.
(omp_init_allocator): Use MEMSPACE_VALIDATE.
(omp_aligned_alloc): Use OMP_LOW_LAT_MEM_ALLOC_INVALID.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
* config/nvptx/allocator.c (nvptx_memspace_validate): New function.
(MEMSPACE_VALIDATE): New macro.
(OMP_LOW_LAT_MEM_ALLOC_INVALID): New define.
* libgomp.texi: Document low-latency implementation details.
* testsuite/libgomp.c/omp_alloc-1.c (main): Add gnu_lowlat.
* testsuite/libgomp.c/omp_alloc-2.c (main): Add gnu_lowlat.
* testsuite/libgomp.c/omp_alloc-3.c (main): Add gnu_lowlat.
* testsuite/libgomp.c/omp_alloc-4.c (main): Add access trait.
* testsuite/libgomp.c/omp_alloc-5.c (main): Add gnu_lowlat.
* testsuite/libgomp.c/omp_alloc-6.c (main): Add access trait.
* testsuite/libgomp.c/omp_alloc-traits.c: New test.