This should have been part of commit r16-838-gb3d07ec7ac2ccd or
r16-883-g5d6ed6d604ff94 - all showing the same issue:
'!$omp target' followed by a metadirective with 'teams'; if
the metadirective cannot be early resolved, a diagnostic
error is shown about using directives between 'target' and
'teams'.
While the message is misleading, the problem is that the
host invokes 'target' differently when 'teams' is present;
in this case, host fallback + amdgcn offload require the
no-teams case, nvptx offload the teams case such that it
only can be resolved at runtime.
Mark the error as 'dg-bogus + xfail' to silence the FAIL,
when nvptx offloading is compiled for. (If not, the
metadirective can be resolved early during compilation.)
libgomp/ChangeLog:
PR middle-end/118694
* testsuite/libgomp.fortran/metadirective-1.f90: xfail when
compiling (also) for nvptx offloading as an error is then expected.
The check whether the location expression in map clause has allocatable
components was failing for some derived-type array expressions such as
map(var%tiles(1))
as the compiler produced
_4 = var.tiles;
MEMREF(_4, _5);
This commit now also handles this case.
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_omp_deep_mapping_do): Handle SSA_NAME if
a def_stmt is available.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/alloc-comp-4.f90: New test.
While the tests checked whether the CUDA/HIP runtime is available
before processing them, the execution was then done unconditionally,
leading to FAIL when the default device was the host (or the wrong
offload device).
Now the test is only executed ('run') when the default device is an
Nvidia or AMD GPU (depending on the test case, cf. the test file name).
Otherwise, only a 'link' test is done. (Except when the effective-target
check cannot find the runtime lib - then the test is skipped [as before].)
Note: The cublas/hipblas tests use variant functions and iterate over
all devices, such that the cublas or hipblas, respectively, is only
called when the active device is an AMD or Nvidia device, respectively,
while for the host and other device types the fallback is called.
libgomp/ChangeLog:
* testsuite/libgomp.c/interop-cuda-full.c: Use 'link' instead
of 'run' when the default device is "! offload_device_nvptx".
* testsuite/libgomp.c/interop-cuda-libonly.c: Likewise.
* testsuite/libgomp.c/interop-hip-nvidia-full.c: Likewise.
* testsuite/libgomp.c/interop-hip-nvidia-no-headers.c: Likewise.
* testsuite/libgomp.c/interop-hip-nvidia-no-hip-header.c: Likewise.
* testsuite/libgomp.fortran/interop-hip-nvidia-full.F90: Likewise.
* testsuite/libgomp.fortran/interop-hip-nvidia-no-module.F90: Likewise.
* testsuite/libgomp.c/interop-hip-amd-full.c: Use 'link' instead
of 'run' when the default device is "! offload_device_gcn".
* testsuite/libgomp.c/interop-hip-amd-no-hip-header.c: Likewise.
* testsuite/libgomp.fortran/interop-hip-amd-full.F90: Likewise.
* testsuite/libgomp.fortran/interop-hip-amd-no-module.F90: Likewise.
When host memory is device accessible - independent whether mapping is done or
not (i.e. self map), the 'vtab' pointer becomes accessible, which stores the
dynamic type's type and size information.
In principle, we want to test: USM available but mapping is still done, but
as there is no simple + reliable not-crashing way to test for this, those
checks are skipped in the (pre)existing test file map-alloc-comp-9.f90.
Or rather: those are only active with self-maps, which is currently only true
for the host.
This commit adds map-alloc-comp-9-usm.f90 which runs the same test with
'omp requires unified_shared_memory'. While OpenMP permits both actual
mapping and self maps with this flag, it in theory covers the missing cases.
However, currently, GCC always uses self maps with USM. Still, having a
device-run self-maps check is better than nothing, even if it misses the
most interesting case.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/map-alloc-comp-9.f90: Process differently
when USE_USM_REQUIREMENT is set.
* testsuite/libgomp.fortran/map-alloc-comp-9-usm.f90: New test.
This testcase, which is present on the OG13 and OG14 branches, was
overlooked when the Fortran support for 'omp allocate' was added to
mainline (commit d4b6d14792 from
December 2023).
libgomp/ChangeLog
* testsuite/libgomp.fortran/allocate-8a.f90: New test.
Add checks for nowait/depend and for checks that the returned
CUDA, CUDA_DRIVER and HIP interop objects actually work.
While the CUDA/CUDA_DRIVER ones are only for Nvidia GPUs, HIP
works on both AMD and Nvidia GPUs; on Nvidia GPUs, it is a
very thin wrapper around CUDA.
For Fortran, only a HIP test has been added - using hipfort.
While libgomp.c-c++-common/interop-2.c always works - even without
GPU - and checks for depend / nowait, all others require that
runtime libraries are found at link (and execution) time:
For Nvidia GPUs, libcuda + libcudart or libcublas,
For AMD GPUs, libamdhip64 or libhipblas.
The header files and hipfort modules do not need to be present as a
fallback has been implemented, but if they are, they get used.
Due to the combinations, the basic 1x C/C++, 4x C and 1x Fortran tests
yield 1x C/C++, 14x C and 4 Fortran run-test files.
libgomp/ChangeLog:
* testsuite/lib/libgomp.exp (check_effective_target_openacc_cublas,
check_effective_target_openacc_cudart): Update description as
the check requires more.
(check_effective_target_openacc_libcuda,
check_effective_target_openacc_libcublas,
check_effective_target_openacc_libcudart,
check_effective_target_gomp_hip_header_amd,
check_effective_target_gomp_hip_header_nvidia,
check_effective_target_gomp_hipfort_module,
check_effective_target_gomp_libamdhip64,
check_effective_target_gomp_libhipblas): New.
* testsuite/libgomp.c-c++-common/interop-2.c: New test.
* testsuite/libgomp.c/interop-cublas-full.c: New test.
* testsuite/libgomp.c/interop-cublas-libonly.c: New test.
* testsuite/libgomp.c/interop-cuda-full.c: New test.
* testsuite/libgomp.c/interop-cuda-libonly.c: New test.
* testsuite/libgomp.c/interop-hip-amd-full.c: New test.
* testsuite/libgomp.c/interop-hip-amd-no-hip-header.c: New test.
* testsuite/libgomp.c/interop-hip-nvidia-full.c: New test.
* testsuite/libgomp.c/interop-hip-nvidia-no-headers.c: New test.
* testsuite/libgomp.c/interop-hip-nvidia-no-hip-header.c: New test.
* testsuite/libgomp.c/interop-hip.h: New test.
* testsuite/libgomp.c/interop-hipblas-amd-full.c: New test.
* testsuite/libgomp.c/interop-hipblas-amd-no-hip-header.c: New test.
* testsuite/libgomp.c/interop-hipblas-nvidia-full.c: New test.
* testsuite/libgomp.c/interop-hipblas-nvidia-no-headers.c: New test.
* testsuite/libgomp.c/interop-hipblas-nvidia-no-hip-header.c: New test.
* testsuite/libgomp.c/interop-hipblas.h: New test.
* testsuite/libgomp.fortran/interop-hip-amd-full.F90: New test.
* testsuite/libgomp.fortran/interop-hip-amd-no-module.F90: New test.
* testsuite/libgomp.fortran/interop-hip-nvidia-full.F90: New test.
* testsuite/libgomp.fortran/interop-hip-nvidia-no-module.F90: New test.
* testsuite/libgomp.fortran/interop-hip.h: New test.
Add another testcase for Fortran deep mapping of allocatable components.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/target-enter-data-8.f90: New test.
When mapping an allocatable variable (or derived-type component), explicitly
or implicitly, all its allocated allocatable components will automatically be
mapped. The patch implements the target hooks, added for this feature to
omp-low.cc with commit r15-3895-ge4a58b6f28383c.
Namely, there is a check whether there are allocatable components at all:
gfc_omp_deep_mapping_p. Then gfc_omp_deep_mapping_cnt, counting the number
of required mappings; this is a dynamic value as it depends on array
bounds and whether an allocatable is allocated or not.
And, finally, the actual mapping: gfc_omp_deep_mapping.
Polymorphic variables are partially supported: the mapping of the _data
component is fully supported, but only components of the declared type
are processed for additional allocatables. Additionally, _vptr is not
touched. This means that everything needing _vtab information requires
unified shared memory; in particular, _size data is required when
accessing elements of polymorphic arrays.
However, for scalar arrays, accessing components of the declare type
should work just fine.
As polymorphic variables are not (really) supported and OpenMP 6
explicitly disallows them, there is now a warning (-Wopenmp) when
they are encountered. Unlimited polymorphics are rejected (error).
Additionally, PRIVATE and FIRSTPRIVATE are not quite supported for
allocatable components, polymorphic components and as polymorphic
variable. Thus, those are now rejected as well.
gcc/fortran/ChangeLog:
* f95-lang.cc (LANG_HOOKS_OMP_DEEP_MAPPING,
LANG_HOOKS_OMP_DEEP_MAPPING_P, LANG_HOOKS_OMP_DEEP_MAPPING_CNT):
Define.
* openmp.cc (gfc_match_omp_clause_reduction): Fix location setting.
(resolve_omp_clauses): Permit allocatable components, reject
them and polymorphic variables in PRIVATE/FIRSTPRIVATE.
* trans-decl.cc (add_clause): Set clause location.
* trans-openmp.cc (gfc_has_alloc_comps): Add ptr_ok and
shallow_alloc_only Boolean arguments.
(gfc_omp_replace_alloc_by_to_mapping): New.
(gfc_omp_private_outer_ref, gfc_walk_alloc_comps,
gfc_omp_clause_default_ctor, gfc_omp_clause_copy_ctor,
gfc_omp_clause_assign_op, gfc_omp_clause_dtor): Update call to it.
(gfc_omp_finish_clause): Minor cleanups, improve location data,
handle allocatable components.
(gfc_omp_deep_mapping_map, gfc_omp_deep_mapping_item,
gfc_omp_deep_mapping_comps, gfc_omp_gen_simple_loop,
gfc_omp_get_array_size, gfc_omp_elmental_loop,
gfc_omp_deep_map_kind_p, gfc_omp_deep_mapping_int_p,
gfc_omp_deep_mapping_p, gfc_omp_deep_mapping_do,
gfc_omp_deep_mapping_cnt, gfc_omp_deep_mapping): New.
(gfc_trans_omp_array_section): Save array descriptor in case
deep-mapping lang hook will need it.
(gfc_trans_omp_clauses): Likewise; use better clause location data.
* trans.h (gfc_omp_deep_mapping_p, gfc_omp_deep_mapping_cnt,
gfc_omp_deep_mapping): Add function prototypes.
libgomp/ChangeLog:
* libgomp.texi (5.0 Impl. Status): Mark mapping alloc comps as 'Y'.
* testsuite/libgomp.fortran/allocatable-comp.f90: New test.
* testsuite/libgomp.fortran/map-alloc-comp-3.f90: New test.
* testsuite/libgomp.fortran/map-alloc-comp-4.f90: New test.
* testsuite/libgomp.fortran/map-alloc-comp-5.f90: New test.
* testsuite/libgomp.fortran/map-alloc-comp-6.f90: New test.
* testsuite/libgomp.fortran/map-alloc-comp-7.f90: New test.
* testsuite/libgomp.fortran/map-alloc-comp-8.f90: New test.
* testsuite/libgomp.fortran/map-alloc-comp-9.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/map-alloc-comp-1.f90: Remove dg-error.
* gfortran.dg/gomp/polymorphic-mapping-2.f90: Update warn wording.
* gfortran.dg/gomp/polymorphic-mapping.f90: Change expected
diagnostic; some tests moved to ...
* gfortran.dg/gomp/polymorphic-mapping-1.f90: ... here as new test.
* gfortran.dg/gomp/polymorphic-mapping-3.f90: New test.
* gfortran.dg/gomp/polymorphic-mapping-4.f90: New test.
* gfortran.dg/gomp/polymorphic-mapping-5.f90: New test.
This is a fix for the GOMP_interop commit r15-8654-g99e2906ae255fc that
added GOMP_DEVICE_DEFAULT_OMP_61 alias omp_default_device, which is a
conforming device number. But that test used -5 as check for a
non-conforming device number.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/get-mapped-ptr-1.f90: Use -6
not -5 as non-conforming device number.
The test is a supposed to be a compile-only test but as dg-additional-sources
does not work with dg-compile (due to compiling with '-o'), dg-link had to
be used; as the code actually compiles (no diagnostic error), the linker
is actually invoked, which fails unless the system compiler by chance
provides the required files. Solution: move the test to libgomp.
PR fortran/115271
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/declare-variant-mod-1-use.f90: Move to
libgomp/testsuite/libgomp.fortran/.
* gfortran.dg/gomp/declare-variant-mod-1.f90: Likewise.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/declare-variant-mod-1-use.f90: Moved
from gcc/testsuite/gfortran.dg/gomp/.
* testsuite/libgomp.fortran/declare-variant-mod-1.f90: Likewise.
Without the target directive, the test would run on the host but still try to
use device pointers, which causes a segfault.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/dispatch-1.f90: Add missing target
directive.
This patch adds support for the `dispatch` construct and the `adjust_args`
clause to the Fortran front-end.
Handling of `adjust_args` across translation units is missing due to PR115271.
Minor modifications to the C++ FE and the ME are also folded into this patch as
a side effect of the Fortran work.
gcc/c-family/ChangeLog:
* c-attribs.cc: (c_common_gnu_attributes): Rename "omp declare variant
variant adjust_args" into "omp declare variant variant args" to also
accommodate append_args.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_dispatch): Handle INDIRECT_REF.
gcc/fortran/ChangeLog:
* dump-parse-tree.cc (show_omp_clauses): Handle novariants and nocontext
clauses.
(show_omp_node): Handle EXEC_OMP_DISPATCH.
(show_code_node): Likewise.
* frontend-passes.cc (gfc_code_walker): Handle novariants and nocontext.
* gfortran.h (enum gfc_statement): Add ST_OMP_DISPATCH.
(symbol_attribute): Add omp_declare_variant_need_device_ptr.
(gfc_omp_clauses): Add novariants and nocontext.
(gfc_omp_declare_variant): Add need_device_ptr_arg_list.
(enum gfc_exec_op): Add EXEC_OMP_DISPATCH.
* match.h (gfc_match_omp_dispatch): Declare.
* openmp.cc (gfc_free_omp_clauses): Free novariants and nocontext
clauses.
(gfc_free_omp_declare_variant_list): Free need_device_ptr_arg_list
namelist.
(enum omp_mask2): Add OMP_CLAUSE_NOVARIANTS and OMP_CLAUSE_NOCONTEXT.
(gfc_match_omp_clauses): Handle OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
(OMP_DISPATCH_CLAUSES): Define.
(gfc_match_omp_dispatch): New function.
(gfc_match_omp_declare_variant): Parse adjust_args.
(resolve_omp_clauses): Handle adjust_args, novariants and nocontext.
Adjust handling of OMP_LIST_IS_DEVICE_PTR.
(icode_code_error_callback): Handle EXEC_OMP_DISPATCH.
(omp_code_to_statement): Likewise.
(resolve_omp_dispatch): New function.
(gfc_resolve_omp_directive): Handle EXEC_OMP_DISPATCH.
* parse.cc (decode_omp_directive): Match dispatch.
(next_statement): Handle ST_OMP_DISPATCH.
(gfc_ascii_statement): Likewise.
(parse_omp_dispatch): New function.
(parse_executable): Handle ST_OMP_DISPATCH.
* resolve.cc (gfc_resolve_blocks): Handle EXEC_OMP_DISPATCH.
* st.cc (gfc_free_statement): Likewise.
* trans-decl.cc (create_function_arglist): Declare.
(gfc_get_extern_function_decl): Call it.
* trans-openmp.cc (gfc_trans_omp_clauses): Handle novariants and
nocontext.
(replace_omp_dispatch_call): New function.
(gfc_trans_omp_dispatch): New function.
(gfc_trans_omp_directive): Handle EXEC_OMP_DISPATCH.
(gfc_trans_omp_declare_variant): Handle adjust_args.
* trans.cc (trans_code): Handle EXEC_OMP_DISPATCH:.
gcc/ChangeLog:
* gimplify.cc (gimplify_call_expr): Fix handling of need_device_ptr for
type(c_ptr). Fix handling of nested function calls in a dispatch region.
(find_ifn_gomp_dispatch): Return the IFN without stripping it.
(gimplify_omp_dispatch): Keep IFN_GOMP_DISPATCH until
gimplify_call_expr.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/declare-variant-2-aux.f90: New test.
* testsuite/libgomp.fortran/declare-variant-2.f90: New test (xfail).
* testsuite/libgomp.fortran/dispatch-1.f90: New test.
* testsuite/libgomp.fortran/dispatch-2.f90: New test.
* testsuite/libgomp.fortran/dispatch-3.f90: New test.
gcc/testsuite/ChangeLog:
* g++.dg/gomp/dispatch-3.C: Update scan dumps.
* gfortran.dg/gomp/declare-variant-2.f90: Update dg-error.
* gfortran.dg/gomp/adjust-args-1.f90: New test.
* gfortran.dg/gomp/adjust-args-2.f90: New test.
* gfortran.dg/gomp/adjust-args-2a.f90: New test.
* gfortran.dg/gomp/adjust-args-3.f90: New test.
* gfortran.dg/gomp/adjust-args-4.f90: New test.
* gfortran.dg/gomp/adjust-args-5.f90: New test.
* gfortran.dg/gomp/adjust-args-6.f90: New test.
* gfortran.dg/gomp/adjust-args-7.f90: New test.
* gfortran.dg/gomp/adjust-args-8.f90: New test.
* gfortran.dg/gomp/adjust-args-9.f90: New test.
* gfortran.dg/gomp/dispatch-1.f90: New test.
* gfortran.dg/gomp/dispatch-2.f90: New test.
* gfortran.dg/gomp/dispatch-3.f90: New test.
* gfortran.dg/gomp/dispatch-4.f90: New test.
* gfortran.dg/gomp/dispatch-5.f90: New test.
* gfortran.dg/gomp/dispatch-6.f90: New test.
* gfortran.dg/gomp/dispatch-7.f90: New test.
* gfortran.dg/gomp/dispatch-8.f90: New test.
* gfortran.dg/gomp/dispatch-9.f90: New test.
* gfortran.dg/gomp/dispatch-9a.f90: New test.
* gfortran.dg/gomp/dispatch-10.f90: New test.
It turned out that 'if (omp_is_initial_device() .eqv. true)' gave an ICE
due to comparing 'int' with 'logical(4)'. When digging deeper, it also
turned out that when the procedure pointer is needed, the builtin cannot
be used, either. (Follow up to r15-2799-gf1bfba3a9b3f31 )
Extend the code to also use the builtin acc_on_device with OpenACC,
which was previously only used in C/C++. Additionally, fix folding
when offloading is not enabled.
Fixes additionally the BT_BOOL data type, which was 'char'/integer(1)
instead of bool, backing the booleaness; use bool_type_node as the rest
of GCC.
gcc/fortran/ChangeLog:
* gfortran.h (gfc_option_t): Add disable_acc_on_device.
* options.cc (gfc_handle_option): Handle -fno-builtin-acc_on_device.
* trans-decl.cc (gfc_get_extern_function_decl): Move
__builtin_omp_is_initial_device handling to ...
* trans-expr.cc (get_builtin_fn): ... this new function.
(conv_function_val): Call it.
(update_builtin_function): New.
(gfc_conv_procedure_call): Call it.
* types.def (BT_BOOL): Fix type by using bool_type_node.
gcc/ChangeLog:
* gimple-fold.cc (gimple_fold_builtin_acc_on_device): Also fold
when offloading is not configured.
libgomp/ChangeLog:
* libgomp.texi (TR13): Fix minor typos.
(omp_is_initial_device): Improve wording.
(acc_on_device): Note how to disable the builtin.
* testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Remove TODO.
* testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Likewise.
Add -fno-builtin-acc_on_device.
* testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c: Update
dg- as !offloading_enabled now compile-time expands acc_on_device.
* testsuite/libgomp.fortran/target-is-initial-device-3.f90: New test.
* testsuite/libgomp.oacc-fortran/acc_on_device-2.f90: New test.
The testcase was turned into a 'dg-do run' check to check for the alignment,
but this only works in testsuite/gfortran.dg, causing link errors for
out-of-tree testing. The test was added in r15-4104-ga8caeaacf499d5.
gcc/testsuite/:
* gfortran.dg/gomp/allocate-static.f90: Move to libgomp/testsuite/.
libgomp/:
* testsuite/libgomp.fortran/allocate-static.f90: Moved from
gcc/testsuite/ as it is a dg-do run test; use real omp_lib_kinds
instead of local definition
In Fortran, omp_get_device_from_uid can also accept substrings, which are
then not NUL terminated. Fixed by introducing a fortran.c wrapper function.
Additionally, in case of a fail the plugin functions now return NULL instead
of failing fatally such that a fall-back UID is generated.
gcc/ChangeLog:
* omp-general.cc (omp_runtime_api_procname): Strip "omp_" from
string; move get_device_from_uid as now a '_' suffix exists.
libgomp/ChangeLog:
* fortran.c (omp_get_device_from_uid_): New function.
* libgomp.map (GOMP_6.0): Add it.
* oacc-host.c (host_dispatch): Init '.uid' and '.get_uid_func'.
* omp_lib.f90.in: Make it used by removing bind(C).
* omp_lib.h.in: Likewise.
* target.c (omp_get_device_from_uid): Ensure the device is initialized.
* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_uid): Add function comment;
return NULL in case of an error.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_uid): Likewise.
* testsuite/libgomp.fortran/device_uid.f90: Update to test substrings.
Those TR13/OpenMP 6.0 routines permit a reproducible offloading to
a specific device by mapping an OpenMP device number to a
unique ID (UID). The GPU device UIDs should be universally unique,
the one for the host is not.
gcc/ChangeLog:
* omp-general.cc (omp_runtime_api_procname): Add
get_device_from_uid and omp_get_uid_from_device routines.
include/ChangeLog:
* cuda/cuda.h (cuDeviceGetUuid): Declare.
(cuDeviceGetUuid_v2): Add prototype.
libgomp/ChangeLog:
* config/gcn/target.c (omp_get_uid_from_device,
omp_get_device_from_uid): Add stub implementation.
* config/nvptx/target.c (omp_get_uid_from_device,
omp_get_device_from_uid): Likewise.
* fortran.c (omp_get_uid_from_device_,
omp_get_uid_from_device_8_): New functions.
* libgomp-plugin.h (GOMP_OFFLOAD_get_uid): Add prototype.
* libgomp.h (struct gomp_device_descr): Add 'uid' and 'get_uid_func'.
* libgomp.map (GOMP_6.0): New, includind the new UID routines.
* libgomp.texi (OpenMP Technical Report 13): Mark UID routines as 'Y'.
(Device Information Routines): Document new UID routines.
(Offload-Target Specifics): Document UID format.
* omp.h.in (omp_get_device_from_uid, omp_get_uid_from_device):
New prototype.
* omp_lib.f90.in (omp_get_device_from_uid, omp_get_uid_from_device):
New interface.
* omp_lib.h.in: Likewise.
* plugin/cuda-lib.def: Add cuDeviceGetUuid and cuDeviceGetUuid_v2 via
CUDA_ONE_CALL_MAYBE_NULL.
* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_uid): New.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_uid): New.
* target.c (str_omp_initial_device): New static var.
(STR_OMP_DEV_PREFIX): Define.
(gomp_get_uid_for_device, omp_get_uid_from_device,
omp_get_device_from_uid): New.
(gomp_load_plugin_for_device): DLSYM_OPT the function 'get_uid'.
(gomp_target_init): Set the device's 'uid' field to NULL.
* testsuite/libgomp.c/device_uid.c: New test.
* testsuite/libgomp.fortran/device_uid.f90: New test.
This commit adds OpenMP 5.1+'s interop enumeration, type and routine
declarations to the C/C++ header file and, new in OpenMP TR13, also to
the Fortran module and omp_lib.h header file.
While a stub implementation is provided, only with foreign runtime
support by the libgomp GPU plugins and with the 'interop' directive,
this becomes really useful.
libgomp/ChangeLog:
* fortran.c (omp_get_interop_str_, omp_get_interop_name_,
omp_get_interop_type_desc_, omp_get_interop_rc_desc_): Add.
* libgomp.map (GOMP_5.1.3): New; add interop routines.
* omp.h.in: Add interop typedefs, enum and prototypes.
(__GOMP_DEFAULT_NULL): Define.
(omp_target_memcpy_async, omp_target_memcpy_rect_async):
Use it for the optional depend argument.
* omp_lib.f90.in: Add paramters and interfaces for interop.
* omp_lib.h.in: Likewise; move F90 '&' to column 81 for
-ffree-length-80.
* target.c (omp_get_num_interop_properties, omp_get_interop_int,
omp_get_interop_ptr, omp_get_interop_str, omp_get_interop_name,
omp_get_interop_type_desc, omp_get_interop_rc_desc): Add.
* config/gcn/target.c (omp_get_num_interop_properties,
omp_get_interop_int, omp_get_interop_ptr, omp_get_interop_str,
omp_get_interop_name, omp_get_interop_type_desc,
omp_get_interop_rc_desc): Add.
* config/nvptx/target.c (omp_get_num_interop_properties,
omp_get_interop_int, omp_get_interop_ptr, omp_get_interop_str,
omp_get_interop_name, omp_get_interop_type_desc,
omp_get_interop_rc_desc): Add.
* testsuite/libgomp.c-c++-common/interop-routines-1.c: New test.
* testsuite/libgomp.c-c++-common/interop-routines-2.c: New test.
* testsuite/libgomp.fortran/interop-routines-1.F90: New test.
* testsuite/libgomp.fortran/interop-routines-2.F90: New test.
* testsuite/libgomp.fortran/interop-routines-3.F: New test.
* testsuite/libgomp.fortran/interop-routines-4.F: New test.
* testsuite/libgomp.fortran/interop-routines-5.F: New test.
* testsuite/libgomp.fortran/interop-routines-6.F: New test.
* testsuite/libgomp.fortran/interop-routines-7.F90: New test.
(Most of) the tests added in commit f1bfba3a9b
"OpenMP: Constructors and destructors for "declare target" static aggregates"
had a mismatch between dump file production and its scanning; the former needs
to use 'offload_target_nvptx' (like 'offload_target_amdgcn'), not
'offload_device_nvptx'.
libgomp/
* testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C:
Fix effective-target keyword.
* testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C:
Likewise.
* testsuite/libgomp.c-c++-common/target-is-initial-host-2.c:
Likewise.
* testsuite/libgomp.c-c++-common/target-is-initial-host.c:
Likewise.
* testsuite/libgomp.fortran/target-is-initial-host-2.f90:
Likewise.
* testsuite/libgomp.fortran/target-is-initial-host.f: Likewise.
* testsuite/libgomp.fortran/target-is-initial-host.f90: Likewise.
This commit also compile-time expands (__builtin_)omp_is_initial_device for
both Fortran and C/C++ (unless, -fno-builtin-omp_is_initial_device is used).
But the main change is:
This commit adds support for running constructors and destructors for
static (file-scope) aggregates for C++ objects which are marked with
"declare target" directives on OpenMP offload targets.
Before this commit, space is allocated on the target for such aggregates,
but nothing ever constructs them properly, so they end up zero-initialised.
(See the new test static-aggr-constructor-destructor-3.C for a reason
why running constructors on the target is preferable to e.g. constructing
on the host and then copying the resulting object to the target.)
2024-08-07 Julian Brown <julian@codesourcery.com>
Tobias Burnus <tobias@baylibre.com>
gcc/ChangeLog:
* builtins.def (DEF_GOMP_BUILTIN_COMPILER): Define
DEF_GOMP_BUILTIN_COMPILER to handle the non-prefix version.
* gimple-fold.cc (gimple_fold_builtin_omp_is_initial_device): New.
(gimple_fold_builtin): Call it.
* omp-builtins.def (BUILT_IN_OMP_IS_INITIAL_DEVICE): Define.
* tree.cc (get_file_function_name): Support names for on-target
constructor/destructor functions.
gcc/cp/
* decl2.cc (tree-inline.h): Include.
(static_init_fini_fns): Bump to four entries. Update comment.
(start_objects, start_partial_init_fini_fn): Add 'omp_target'
parameter. Support "declare target" decls. Update forward declaration.
(emit_partial_init_fini_fn): Add 'host_fn' parameter. Return tree for
the created function. Support "declare target".
(OMP_SSDF_IDENTIFIER): New macro.
(partition_vars_for_init_fini): Support partitioning "declare target"
variables also.
(generate_ctor_or_dtor_function): Add 'omp_target' parameter. Support
"declare target" decls.
(c_parse_final_cleanups): Support constructors/destructors on OpenMP
offload targets.
gcc/fortran/ChangeLog:
* gfortran.h (gfc_option_t): Add disable_omp_is_initial_device.
* lang.opt (fbuiltin-): Add.
* options.cc (gfc_handle_option): Handle
-fno-builtin-omp_is_initial_device.
* f95-lang.cc (gfc_init_builtin_functions): Handle
DEF_GOMP_BUILTIN_COMPILER.
* trans-decl.cc (gfc_get_extern_function_decl): Add code to use
DEF_GOMP_BUILTIN_COMPILER for 'omp_is_initial_device'.
libgomp/ChangeLog:
* testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C: New test.
* testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C: New test.
* testsuite/libgomp.c++/static-aggr-constructor-destructor-3.C: New test.
* testsuite/libgomp.c-c++-common/target-is-initial-host.c: New test.
* testsuite/libgomp.c-c++-common/target-is-initial-host-2.c: New test.
* testsuite/libgomp.fortran/target-is-initial-host.f: New test.
* testsuite/libgomp.fortran/target-is-initial-host.f90: New test.
* testsuite/libgomp.fortran/target-is-initial-host-2.f90: New test.
Co-authored-by: Tobias Burnus <tobias@baylibre.com>
As the PR and included testcase shows, replacing 'arr2' by its value expression
'*arr2$13$linkptr' failed for
MEM <uint128_t> [(c_char * {ref-all})&arr2]
which left 'arr2' in the code as unknown symbol. Now expand the value expression
already in pass_omp_target_link::execute's process_link_var_op walk_gimple_stmt
walk - and don't rely on gimple_regimplify_operands.
PR middle-end/115637
gcc/ChangeLog:
* gimplify.cc (gimplify_body): Fix macro name in the comment.
* omp-offload.cc (find_link_var_op): Rename to ...
(process_link_var_op): ... this. Replace value expr.
(pass_omp_target_link::execute): Update walk_gimple_stmt call.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/declare-target-link.f90: Uncomment
now working code.
Co-authored-by: Richard Biener <rguenther@suse.de
Contrary to a normal 'declare target', the 'declare target link' attribute
also needs to set node->offloadable and push the offload_vars in the front end.
Linked variables require that the data is mapped. For module variables, this
can happen anywhere. For variables in an external subprograms or the main
programm, this can only happen in the either that program itself or in an
internal subprogram. - Whether a variable is just normally mapped or linked then
becomes relevant if a device routine exists that can access that variable,
i.e. an internal procedure has then to be marked as declare target.
PR fortran/115559
gcc/fortran/ChangeLog:
* trans-common.cc (build_common_decl): Add 'omp declare target' and
'omp declare target link' variables to offload_vars.
* trans-decl.cc (add_attributes_to_decl): Likewise; update args and
call decl_attributes.
(get_proc_pointer_decl, gfc_get_extern_function_decl,
build_function_decl): Update calls.
(gfc_get_symbol_decl): Likewise; move after 'DECL_STATIC (t)=1'
to avoid errors with symtab_node::get_create.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/declare-target-link.f90: New test.
This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP. This is not in the OpenMP standard so it uses the "ompx"
namespace and an independent enum baseline of 200 (selected to not clash with
other known implementations).
The allocator is equivalent to using a custom allocator with the pinned
trait and the null fallback trait. One motivation for having this feature is
for use by the (planned) -foffload-memory=pinned feature.
gcc/fortran/ChangeLog:
* openmp.cc (is_predefined_allocator): Update valid ranges to
incorporate ompx_gnu_pinned_mem_alloc.
libgomp/ChangeLog:
* allocator.c (ompx_gnu_min_predefined_alloc): New.
(ompx_gnu_max_predefined_alloc): New.
(predefined_alloc_mapping): Rename to ...
(predefined_omp_alloc_mapping): ... this.
(predefined_ompx_gnu_alloc_mapping): New.
(_Static_assert): Adjust for the new name, and add a new assert for the
new table.
(predefined_allocator_p): New.
(predefined_alloc_mapping): New.
(omp_aligned_alloc): Support ompx_gnu_pinned_mem_alloc.
Use predefined_allocator_p and predefined_alloc_mapping.
(omp_free): Likewise.
(omp_alligned_calloc): Likewise.
(omp_realloc): Likewise.
* env.c (parse_allocator): Add ompx_gnu_pinned_mem_alloc.
* libgomp.texi: Document ompx_gnu_pinned_mem_alloc.
* omp.h.in (omp_allocator_handle_t): Add ompx_gnu_pinned_mem_alloc.
* omp_lib.f90.in: Add ompx_gnu_pinned_mem_alloc.
* omp_lib.h.in: Add ompx_gnu_pinned_mem_alloc.
* testsuite/libgomp.c/alloc-pinned-5.c: New test.
* testsuite/libgomp.c/alloc-pinned-6.c: New test.
* testsuite/libgomp.fortran/alloc-pinned-1.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/allocate-pinned-1.f90: New test.
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
This patch is largely rewritten version of the
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631764.html
patch set which I've promissed to adjust the way I'd like it but didn't
get to it until now.
The previous series together in diffstat was
176 files changed, 12107 insertions(+), 298 deletions(-)
This patch is
197 files changed, 10843 insertions(+), 212 deletions(-)
and diff between the old series and new patch is
268 files changed, 8053 insertions(+), 9231 deletions(-)
Only the 5.1/5.2 tile/unroll constructs are supported, in various
places some preparations for the other 6.0 loop transformations
constructs (interchange/reverse/fuse) are done, but certainly
not complete and not everywhere. The important difference is that
because tile/unroll partial map 1:1 the original loops to generated
canonical loops and add another set of generated loops without canonical
form inside of it, the tile/unroll partial constructs are terminal
for the generated loop, one can't have some loops from the tile or
unroll partial and some further loops from inside the body of that
construct.
The GENERIC representation attempts to match what the standard specifies,
so there are separate OMP_TILE and OMP_UNROLL trees. If for a particular
loop in a loop nest of some OpenMP loop it awaits a generated loop from a
nested loop, or if in OMP_LOOPXFORM_LOWERED OMP_TILE/UNROLL construct
a generated loop has been moved to some surrounding construct, that
particular loop is represented by all NULL_TREEs in the
OMP_FOR_{INIT,COND,INCR,ORIG_DECLS} vector.
The lowering of the loop transforming constructs is done at gimplification
time, at the start of gimplify_omp_for.
I think this way it is more maintainable over magic clauses with various
loop depths on the other looping constructs or the magic OMP_LOOP_TRANS
construct.
Though, I admit I'm still undecided how to represent the OpenMP 6.0
loop transformation case of say:
#pragma omp for collapse (4)
for (int i = 0; i < 32; ++i)
#pragma omp interchange permutation (2, 1)
#pragma omp reverse
for (int j = 0; j < 32; ++j)
#pragma omp reverse
for (int k = 0; k < 32; ++k)
for (int l = 0; l < 32; ++l)
;
Surely the i loop would go to first vector elements of OMP_FOR_*
of the work-sharing loop, then 2 loops are expecting generated loops
from interchange which would be inside of the body. But the innermost
l loop isn't part of the interchange, so the question is where to
put it. One possibility is to have it in the 4th loop of the OMP_FOR,
another possibility would be to add some artificial construct inside
of the OMP_INTERCHANGE and 2 OMP_REVERSE bodies which would contain
the inner loop(s), e.g. it could be OMP_INTERCHANGE without permutation
clause or some artificial ones or whatever.
I've recently raised various unclear things in the 5.1/5.2/TRs versions
regarding loop transformations, in particular
https://github.com/OpenMP/spec/issues/3908https://github.com/OpenMP/spec/issues/3909
(sorry, private links unless you have OpenMP membership). Until those
are resolved, I have a sorry on trying to mix generated loops with
non-rectangular loops (way too many questions need to be answered before
that can be done) and similarly for mixing non-perfectly nested loops
with generated loops (again, it can be implemented somehow, but is way
too unclear). The second issue is mostly about data sharing, which is
ambiguous, the patch makes the artificial iterators of the loops effectively
private in the associated constructs (more like local), but for user
iterators doesn't do anything in particular, so for now one needs to use
explicit data sharing clauses on the non-loop transformation OpenMP looping
constructs or surrounding parallel/task/target etc.
2024-06-05 Jakub Jelinek <jakub@redhat.com>
Frederik Harwath <frederik@codesourcery.com>
Sandra Loosemore <sandra@codesourcery.com>
gcc/
* tree.def (OMP_TILE, OMP_UNROLL): New tree codes.
* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_PARTIAL,
OMP_CLAUSE_FULL and OMP_CLAUSE_SIZES.
* tree.h (OMP_LOOPXFORM_CHECK): Define.
(OMP_LOOPXFORM_LOWERED): Define.
(OMP_CLAUSE_PARTIAL_EXPR): Define.
(OMP_CLAUSE_SIZES_LIST): Define.
* tree.cc (omp_clause_num_ops, omp_clause_code_name): Add entries
for OMP_CLAUSE_{PARTIAL,FULL,SIZES}.
* tree-pretty-print.cc (dump_omp_clause): Handle
OMP_CLAUSE_{PARTIAL,FULL,SIZES}.
(dump_generic_node): Handle OMP_TILE and OMP_UNROLL. Skip printing
loops with NULL OMP_FOR_INIT (node) vector element.
* gimplify.cc (is_gimple_stmt): Handle OMP_TILE and OMP_UNROLL.
(gimplify_omp_taskloop_expr): For SAVE_EXPR use gimplify_save_expr.
(gimplify_omp_loop_xform): New function.
(gimplify_omp_for): Call omp_maybe_apply_loop_xforms and if that
reshuffles what the passed pointer points to, retry or return GS_OK.
Handle OMP_TILE and OMP_UNROLL.
(gimplify_omp_loop): Call omp_maybe_apply_loop_xforms and if that
reshuffles what the passed pointer points to, return GS_OK.
(gimplify_expr): Handle OMP_TILE and OMP_UNROLL.
* omp-general.h (omp_loop_number_of_iterations,
omp_maybe_apply_loop_xforms): Declare.
* omp-general.cc (omp_adjust_for_condition): For LE_EXPR and GE_EXPR
with pointers, don't add/subtract one, but the size of what the
pointer points to.
(omp_loop_number_of_iterations, omp_apply_tile,
find_nested_loop_xform, omp_maybe_apply_loop_xforms): New functions.
gcc/c-family/
* c-common.h (c_omp_find_generated_loop): Declare.
* c-gimplify.cc (c_genericize_control_stmt): Handle OMP_TILE and
OMP_UNROLL.
* c-omp.cc (c_finish_omp_for): Handle generated loops.
(c_omp_is_loop_iterator): Likewise.
(c_find_nested_loop_xform_r, c_omp_find_generated_loop): New
functions.
(c_omp_check_loop_iv): Handle generated loops. For now sorry
on mixing non-rectangular loop with generated loops.
(c_omp_check_loop_binding_exprs): For now sorry on mixing
imperfect loops with generated loops.
(c_omp_directives): Uncomment tile and unroll entries.
* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_TILE and
PRAGMA_OMP_UNROLL, change PRAGMA_OMP__LAST_ to the latter.
(enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_FULL and
PRAGMA_OMP_CLAUSE_PARTIAL.
* c-pragma.cc (omp_pragmas_simd): Add tile and unroll omp pragmas.
gcc/c/
* c-parser.cc (c_parser_skip_std_attribute_spec_seq): New function.
(check_omp_intervening_code): Reject imperfectly nested tile.
(c_parser_compound_statement_nostart): If want_nested_loop, use
c_parser_omp_next_tokens_can_be_canon_loop instead of just checking
for RID_FOR keyword.
(c_parser_omp_clause_name): Handle full and partial clause names.
(c_parser_omp_clause_allocate): Remove spurious semicolon.
(c_parser_omp_clause_full, c_parser_omp_clause_partial): New
functions.
(c_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_FULL and
PRAGMA_OMP_CLAUSE_PARTIAL.
(c_parser_omp_next_tokens_can_be_canon_loop): New function.
(c_parser_omp_loop_nest): Parse C23 attributes. Handle tile/unroll
constructs. Use c_parser_omp_next_tokens_can_be_canon_loop instead
of just checking for RID_FOR keyword. Only add_stmt (body) if it is
non-NULL.
(c_parser_omp_for_loop): Rename tiling variable to oacc_tiling. For
OMP_CLAUSE_SIZES set collapse to list length of OMP_CLAUSE_SIZES_LIST.
Use c_parser_omp_next_tokens_can_be_canon_loop instead of just
checking for RID_FOR keyword. Remove spurious semicolon. Don't call
c_omp_check_loop_binding_exprs if stmt is NULL. Skip generated loops.
(c_parser_omp_tile_sizes, c_parser_omp_tile): New functions.
(OMP_UNROLL_CLAUSE_MASK): Define.
(c_parser_omp_unroll): New function.
(c_parser_omp_construct): Handle PRAGMA_OMP_TILE and
PRAGMA_OMP_UNROLL.
* c-typeck.cc (c_finish_omp_clauses): Adjust wording of some of the
conflicting clause diagnostic messages to include word clause.
Handle OMP_CLAUSE_{FULL,PARTIAL,SIZES} and diagnose full vs. partial
conflict.
gcc/cp/
* cp-tree.h (dependent_omp_for_p): Add another tree argument.
* parser.cc (check_omp_intervening_code): Reject imperfectly nested
tile.
(cp_parser_statement_seq_opt): If want_nested_loop, use
cp_parser_next_tokens_can_be_canon_loop instead of just checking
for RID_FOR keyword.
(cp_parser_omp_clause_name): Handle full and partial clause names.
(cp_parser_omp_clause_full, cp_parser_omp_clause_partial): New
functions.
(cp_parser_omp_all_clauses): Formatting fix. Handle
PRAGMA_OMP_CLAUSE_PARTIAL and PRAGMA_OMP_CLAUSE_FULL.
(cp_parser_next_tokens_can_be_canon_loop): New function.
(cp_parser_omp_loop_nest): Parse C++11 attributes. Handle tile/unroll
constructs. Use cp_parser_next_tokens_can_be_canon_loop instead
of just checking for RID_FOR keyword. Only add_stmt
cp_parser_omp_loop_nest result if it is non-NULL.
(cp_parser_omp_for_loop): Rename tiling variable to oacc_tiling. For
OMP_CLAUSE_SIZES set collapse to list length of OMP_CLAUSE_SIZES_LIST.
Use cp_parser_next_tokens_can_be_canon_loop instead of just
checking for RID_FOR keyword. Remove spurious semicolon. Don't call
c_omp_check_loop_binding_exprs if stmt is NULL. Skip and/or handle
generated loops. Remove spurious ()s around & operands.
(cp_parser_omp_tile_sizes, cp_parser_omp_tile): New functions.
(OMP_UNROLL_CLAUSE_MASK): Define.
(cp_parser_omp_unroll): New function.
(cp_parser_omp_construct): Handle PRAGMA_OMP_TILE and
PRAGMA_OMP_UNROLL.
(cp_parser_pragma): Likewise.
* semantics.cc (finish_omp_clauses): Don't call
fold_build_cleanup_point_expr for cases which obviously won't need it,
like checked INTEGER_CSTs. Handle OMP_CLAUSE_{FULL,PARTIAL,SIZES}
and diagnose full vs. partial conflict. Adjust wording of some of the
conflicting clause diagnostic messages to include word clause.
(finish_omp_for): Use decl equal to global_namespace as a marker for
generated loop. Pass also body to dependent_omp_for_p. Skip
generated loops.
(finish_omp_for_block): Skip generated loops.
* pt.cc (tsubst_omp_clauses): Handle OMP_CLAUSE_{FULL,PARTIAL,SIZES}.
(tsubst_stmt): Handle OMP_TILE and OMP_UNROLL. Handle or skip
generated loops.
(dependent_omp_for_p): Add body argument. If declv vector element
is NULL, find generated loop.
* cp-gimplify.cc (cp_gimplify_expr): Handle OMP_TILE and OMP_UNROLL.
(cp_fold_r): Likewise.
(cp_genericize_r): Likewise. Skip generated loops.
gcc/fortran/
* gfortran.h (enum gfc_statement): Add ST_OMP_UNROLL,
ST_OMP_END_UNROLL, ST_OMP_TILE and ST_OMP_END_TILE.
(struct gfc_omp_clauses): Add sizes_list, partial, full and erroneous
members.
(enum gfc_exec_op): Add EXEC_OMP_UNROLL and EXEC_OMP_TILE.
(gfc_expr_list_len): Declare.
* match.h (gfc_match_omp_tile, gfc_match_omp_unroll): Declare.
* openmp.cc (gfc_get_location): Declare.
(gfc_free_omp_clauses): Free sizes_list.
(match_oacc_expr_list): Rename to ...
(match_omp_oacc_expr_list): ... this. Add is_omp argument and
change diagnostic wording if it is true.
(enum omp_mask2): Add OMP_CLAUSE_{FULL,PARTIAL,SIZES}.
(gfc_match_omp_clauses): Parse full, partial and sizes clauses.
(gfc_match_oacc_wait): Use match_omp_oacc_expr_list instead of
match_oacc_expr_list.
(OMP_UNROLL_CLAUSES, OMP_TILE_CLAUSES): Define.
(gfc_match_omp_tile, gfc_match_omp_unroll): New functions.
(resolve_omp_clauses): Diagnose full vs. partial clause conflict.
Resolve sizes clause arguments.
(find_nested_loop_in_chain): Use switch instead of series of ifs.
Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL.
(gfc_resolve_omp_do_blocks): Set omp_current_do_collapse to
list length of sizes_list if present.
(gfc_resolve_do_iterator): Return for EXEC_OMP_TILE or
EXEC_OMP_UNROLL.
(restructure_intervening_code): Remove spurious ()s around & operands.
(is_outer_iteration_variable): Handle EXEC_OMP_TILE and
EXEC_OMP_UNROLL.
(check_nested_loop_in_chain): Likewise.
(expr_is_invariant): Likewise.
(resolve_omp_do): Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL. Diagnose
tile without sizes clause. Use sizes_list length for count if
non-NULL. Set code->ext.omp_clauses->erroneous on loops where we've
reported diagnostics. Sorry for mixing non-rectangular loops with
generated loops.
(omp_code_to_statement): Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL.
(gfc_resolve_omp_directive): Likewise.
* parse.cc (decode_omp_directive): Parse end tile, end unroll, tile
and unroll. Move nothing entry alphabetically.
(case_exec_markers): Add ST_OMP_TILE and ST_OMP_UNROLL.
(gfc_ascii_statement): Handle ST_OMP_END_TILE, ST_OMP_END_UNROLL,
ST_OMP_TILE and ST_OMP_UNROLL.
(parse_omp_do): Add nested argument. Handle ST_OMP_TILE and
ST_OMP_UNROLL.
(parse_omp_structured_block): Adjust parse_omp_do caller.
(parse_executable): Likewise. Handle ST_OMP_TILE and ST_OMP_UNROLL.
* resolve.cc (gfc_resolve_blocks): Handle EXEC_OMP_TILE and
EXEC_OMP_UNROLL.
(gfc_resolve_code): Likewise.
* st.cc (gfc_free_statement): Likewise.
* trans.cc (trans_code): Likewise.
* trans-openmp.cc (gfc_trans_omp_clauses): Handle full, partial and
sizes clauses. Use tree_cons + nreverse instead of
temporary vector and build_tree_list_vec for tile_list handling.
(gfc_expr_list_len): New function.
(gfc_trans_omp_do): Rename tile to oacc_tile. Handle sizes clause.
Don't assert code->op is EXEC_DO. Handle EXEC_OMP_TILE and
EXEC_OMP_UNROLL.
(gfc_trans_omp_directive): Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL.
* dump-parse-tree.cc (show_omp_clauses): Dump full, partial and
sizes clauses.
(show_omp_node): Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL.
(show_code_node): Likewise.
gcc/testsuite/
* c-c++-common/gomp/attrs-tile-1.c: New test.
* c-c++-common/gomp/attrs-tile-2.c: New test.
* c-c++-common/gomp/attrs-tile-3.c: New test.
* c-c++-common/gomp/attrs-tile-4.c: New test.
* c-c++-common/gomp/attrs-tile-5.c: New test.
* c-c++-common/gomp/attrs-tile-6.c: New test.
* c-c++-common/gomp/attrs-unroll-1.c: New test.
* c-c++-common/gomp/attrs-unroll-2.c: New test.
* c-c++-common/gomp/attrs-unroll-3.c: New test.
* c-c++-common/gomp/attrs-unroll-inner-1.c: New test.
* c-c++-common/gomp/attrs-unroll-inner-2.c: New test.
* c-c++-common/gomp/attrs-unroll-inner-3.c: New test.
* c-c++-common/gomp/attrs-unroll-inner-4.c: New test.
* c-c++-common/gomp/attrs-unroll-inner-5.c: New test.
* c-c++-common/gomp/imperfect-attributes.c: Adjust expected
diagnostics.
* c-c++-common/gomp/imperfect-loop-nest.c: New test.
* c-c++-common/gomp/ordered-5.c: New test.
* c-c++-common/gomp/scan-7.c: New test.
* c-c++-common/gomp/tile-1.c: New test.
* c-c++-common/gomp/tile-2.c: New test.
* c-c++-common/gomp/tile-3.c: New test.
* c-c++-common/gomp/tile-4.c: New test.
* c-c++-common/gomp/tile-5.c: New test.
* c-c++-common/gomp/tile-6.c: New test.
* c-c++-common/gomp/tile-7.c: New test.
* c-c++-common/gomp/tile-8.c: New test.
* c-c++-common/gomp/tile-9.c: New test.
* c-c++-common/gomp/tile-10.c: New test.
* c-c++-common/gomp/tile-11.c: New test.
* c-c++-common/gomp/tile-12.c: New test.
* c-c++-common/gomp/tile-13.c: New test.
* c-c++-common/gomp/tile-14.c: New test.
* c-c++-common/gomp/tile-15.c: New test.
* c-c++-common/gomp/unroll-1.c: New test.
* c-c++-common/gomp/unroll-2.c: New test.
* c-c++-common/gomp/unroll-3.c: New test.
* c-c++-common/gomp/unroll-4.c: New test.
* c-c++-common/gomp/unroll-5.c: New test.
* c-c++-common/gomp/unroll-6.c: New test.
* c-c++-common/gomp/unroll-7.c: New test.
* c-c++-common/gomp/unroll-8.c: New test.
* c-c++-common/gomp/unroll-9.c: New test.
* c-c++-common/gomp/unroll-inner-1.c: New test.
* c-c++-common/gomp/unroll-inner-2.c: New test.
* c-c++-common/gomp/unroll-inner-3.c: New test.
* c-c++-common/gomp/unroll-non-rect-1.c: New test.
* c-c++-common/gomp/unroll-non-rect-2.c: New test.
* c-c++-common/gomp/unroll-non-rect-3.c: New test.
* c-c++-common/gomp/unroll-simd-1.c: New test.
* gcc.dg/gomp/attrs-4.c: Adjust expected diagnostics.
* gcc.dg/gomp/for-1.c: Likewise.
* gcc.dg/gomp/for-11.c: Likewise.
* g++.dg/gomp/attrs-4.C: Likewise.
* g++.dg/gomp/for-1.C: Likewise.
* g++.dg/gomp/pr94512.C: Likewise.
* g++.dg/gomp/tile-1.C: New test.
* g++.dg/gomp/tile-2.C: New test.
* g++.dg/gomp/unroll-1.C: New test.
* g++.dg/gomp/unroll-2.C: New test.
* g++.dg/gomp/unroll-3.C: New test.
* gfortran.dg/gomp/inner-loops-1.f90: New test.
* gfortran.dg/gomp/inner-loops-2.f90: New test.
* gfortran.dg/gomp/pure-1.f90: Add tests for !$omp unroll
and !$omp tile.
* gfortran.dg/gomp/pure-2.f90: Remove those tests from here.
* gfortran.dg/gomp/scan-9.f90: New test.
* gfortran.dg/gomp/tile-1.f90: New test.
* gfortran.dg/gomp/tile-2.f90: New test.
* gfortran.dg/gomp/tile-3.f90: New test.
* gfortran.dg/gomp/tile-4.f90: New test.
* gfortran.dg/gomp/tile-5.f90: New test.
* gfortran.dg/gomp/tile-6.f90: New test.
* gfortran.dg/gomp/tile-7.f90: New test.
* gfortran.dg/gomp/tile-8.f90: New test.
* gfortran.dg/gomp/tile-9.f90: New test.
* gfortran.dg/gomp/tile-10.f90: New test.
* gfortran.dg/gomp/tile-imperfect-nest-1.f90: New test.
* gfortran.dg/gomp/tile-imperfect-nest-2.f90: New test.
* gfortran.dg/gomp/tile-inner-loops-1.f90: New test.
* gfortran.dg/gomp/tile-inner-loops-2.f90: New test.
* gfortran.dg/gomp/tile-inner-loops-3.f90: New test.
* gfortran.dg/gomp/tile-inner-loops-4.f90: New test.
* gfortran.dg/gomp/tile-inner-loops-5.f90: New test.
* gfortran.dg/gomp/tile-inner-loops-6.f90: New test.
* gfortran.dg/gomp/tile-inner-loops-7.f90: New test.
* gfortran.dg/gomp/tile-inner-loops-8.f90: New test.
* gfortran.dg/gomp/tile-non-rectangular-1.f90: New test.
* gfortran.dg/gomp/tile-non-rectangular-2.f90: New test.
* gfortran.dg/gomp/tile-non-rectangular-3.f90: New test.
* gfortran.dg/gomp/tile-unroll-1.f90: New test.
* gfortran.dg/gomp/tile-unroll-2.f90: New test.
* gfortran.dg/gomp/unroll-1.f90: New test.
* gfortran.dg/gomp/unroll-2.f90: New test.
* gfortran.dg/gomp/unroll-3.f90: New test.
* gfortran.dg/gomp/unroll-4.f90: New test.
* gfortran.dg/gomp/unroll-5.f90: New test.
* gfortran.dg/gomp/unroll-6.f90: New test.
* gfortran.dg/gomp/unroll-7.f90: New test.
* gfortran.dg/gomp/unroll-8.f90: New test.
* gfortran.dg/gomp/unroll-9.f90: New test.
* gfortran.dg/gomp/unroll-10.f90: New test.
* gfortran.dg/gomp/unroll-11.f90: New test.
* gfortran.dg/gomp/unroll-12.f90: New test.
* gfortran.dg/gomp/unroll-13.f90: New test.
* gfortran.dg/gomp/unroll-inner-loop-1.f90: New test.
* gfortran.dg/gomp/unroll-inner-loop-2.f90: New test.
* gfortran.dg/gomp/unroll-no-clause-1.f90: New test.
* gfortran.dg/gomp/unroll-non-rect-1.f90: New test.
* gfortran.dg/gomp/unroll-non-rect-2.f90: New test.
* gfortran.dg/gomp/unroll-simd-1.f90: New test.
* gfortran.dg/gomp/unroll-simd-2.f90: New test.
* gfortran.dg/gomp/unroll-simd-3.f90: New test.
* gfortran.dg/gomp/unroll-tile-1.f90: New test.
* gfortran.dg/gomp/unroll-tile-2.f90: New test.
* gfortran.dg/gomp/unroll-tile-inner-1.f90: New test.
libgomp/
* testsuite/libgomp.c-c++-common/imperfect-transform-1.c: New test.
* testsuite/libgomp.c-c++-common/imperfect-transform-2.c: New test.
* testsuite/libgomp.c-c++-common/matrix-1.h: New test.
* testsuite/libgomp.c-c++-common/matrix-constant-iter.h: New test.
* testsuite/libgomp.c-c++-common/matrix-helper.h: New test.
* testsuite/libgomp.c-c++-common/matrix-no-directive-1.c: New test.
* testsuite/libgomp.c-c++-common/matrix-no-directive-unroll-full-1.c:
New test.
* testsuite/libgomp.c-c++-common/matrix-omp-distribute-parallel-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/matrix-omp-for-1.c: New test.
* testsuite/libgomp.c-c++-common/matrix-omp-parallel-for-1.c: New test.
* testsuite/libgomp.c-c++-common/matrix-omp-parallel-masked-taskloop-1.c:
New test.
* testsuite/libgomp.c-c++-common/matrix-omp-parallel-masked-taskloop-simd-1.c:
New test.
* testsuite/libgomp.c-c++-common/matrix-omp-target-parallel-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/matrix-omp-target-teams-distribute-parallel-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/matrix-omp-taskloop-1.c: New test.
* testsuite/libgomp.c-c++-common/matrix-omp-teams-distribute-parallel-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/matrix-simd-1.c: New test.
* testsuite/libgomp.c-c++-common/matrix-transform-variants-1.h:
New test.
* testsuite/libgomp.c-c++-common/target-imperfect-transform-1.c:
New test.
* testsuite/libgomp.c-c++-common/target-imperfect-transform-2.c:
New test.
* testsuite/libgomp.c-c++-common/unroll-1.c: New test.
* testsuite/libgomp.c-c++-common/unroll-non-rect-1.c: New test.
* testsuite/libgomp.c++/matrix-no-directive-unroll-full-1.C: New test.
* testsuite/libgomp.c++/tile-2.C: New test.
* testsuite/libgomp.c++/tile-3.C: New test.
* testsuite/libgomp.c++/unroll-1.C: New test.
* testsuite/libgomp.c++/unroll-2.C: New test.
* testsuite/libgomp.c++/unroll-full-tile.C: New test.
* testsuite/libgomp.fortran/imperfect-transform-1.f90: New test.
* testsuite/libgomp.fortran/imperfect-transform-2.f90: New test.
* testsuite/libgomp.fortran/inner-1.f90: New test.
* testsuite/libgomp.fortran/nested-fn.f90: New test.
* testsuite/libgomp.fortran/target-imperfect-transform-1.f90: New test.
* testsuite/libgomp.fortran/target-imperfect-transform-2.f90: New test.
* testsuite/libgomp.fortran/tile-1.f90: New test.
* testsuite/libgomp.fortran/tile-2.f90: New test.
* testsuite/libgomp.fortran/tile-unroll-1.f90: New test.
* testsuite/libgomp.fortran/tile-unroll-2.f90: New test.
* testsuite/libgomp.fortran/tile-unroll-3.f90: New test.
* testsuite/libgomp.fortran/tile-unroll-4.f90: New test.
* testsuite/libgomp.fortran/unroll-1.f90: New test.
* testsuite/libgomp.fortran/unroll-2.f90: New test.
* testsuite/libgomp.fortran/unroll-3.f90: New test.
* testsuite/libgomp.fortran/unroll-4.f90: New test.
* testsuite/libgomp.fortran/unroll-5.f90: New test.
* testsuite/libgomp.fortran/unroll-6.f90: New test.
* testsuite/libgomp.fortran/unroll-7a.f90: New test.
* testsuite/libgomp.fortran/unroll-7b.f90: New test.
* testsuite/libgomp.fortran/unroll-7c.f90: New test.
* testsuite/libgomp.fortran/unroll-7.f90: New test.
* testsuite/libgomp.fortran/unroll-8.f90: New test.
* testsuite/libgomp.fortran/unroll-simd-1.f90: New test.
* testsuite/libgomp.fortran/unroll-tile-1.f90: New test.
* testsuite/libgomp.fortran/unroll-tile-2.f90: New test.
A splay-tree was previously used to lookup equivalent target addresses
for a given host address on offload targets. However, as splay-trees can
modify their structure on lookup, they are not suitable for concurrent
access from separate teams/threads without some form of locking. This
patch changes the lookup data structure to a hashtab instead, which does
not have these issues.
The call to build_indirect_map to initialize the data structure is now
called from just the first thread of the first team to avoid redundant
calls to this function.
2024-03-22 Kwok Cheung Yeung <kcyeung@baylibre.com>
libgomp/
* config/accel/target-indirect.c: Include string.h and hashtab.h.
Remove include of splay-tree.h. Update comments.
(splay_tree_prefix, splay_tree_c): Delete.
(struct indirect_map_t): New.
(hash_entry_type, htab_alloc, htab_free, htab_hash, htab_eq): New.
(GOMP_INDIRECT_ADD_MAP): Remove volatile qualifier.
(USE_SPLAY_TREE_LOOKUP): Rename to...
(USE_HASHTAB_LOOKUP): ..this.
(indirect_map, indirect_array): Delete.
(indirect_htab): New.
(build_indirect_map): Remove locking. Build indirect map using
hashtab.
(GOMP_target_map_indirect_ptr): Use indirect_htab to lookup target
address.
(GOMP_target_map_indirect_ptr): Remove volatile qualifier.
* config/gcn/team.c (gomp_gcn_enter_kernel): Call build_indirect_map
from first thread of first team only.
* config/nvptx/team.c (gomp_nvptx_main): Likewise.
* testsuite/libgomp.c-c++-common/declare-target-indirect-2.c (main):
Add missing break statements.
* testsuite/libgomp.fortran/declare-target-indirect-2.f90: Remove
xfail.
Dummy procedures look similar to variables but aren't - neither in Fortran
nor in OpenMP. As the middle end sees PARM_DECLs, mark them as predetermined
firstprivate for mapping (as already done in gfc_omp_predetermined_sharing).
This does not address the isses related to procedure pointers, which are
still discussed on spec level [see PR].
PR fortran/114283
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_omp_predetermined_mapping): Map dummy
procedures as firstprivate.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/declare-target-indirect-4.f90: New test.
2024-01-20 John David Anglin <danglin@gcc.gnu.org>
libgomp/ChangeLog:
* testsuite/libgomp.fortran/alloc-comp-3.f90: Increase
timeout by 2 on hppa*-*-*.
This patch has been separated out from the C++ "declare mapper"
support patch. It contains just the gimplify.cc rearrangement
work, mostly moving gimplification from gimplify_scan_omp_clauses
to gimplify_adjust_omp_clauses for map clauses.
The motivation for doing this was that we don't know if we need to
instantiate mappers implicitly until the body of an offload region has
been scanned, i.e. in gimplify_adjust_omp_clauses, but we also need the
un-gimplified form of clauses to sort by base-pointer dependencies after
mapper instantiation has taken place.
The patch also reimplements the "present" clause sorting code to avoid
another sorting pass on mapping nodes.
This version of the patch is based on the version posted for og13, and
additionally incorporates a follow-on fix for DECL_VALUE_EXPR handling
in gimplify_adjust_omp_clauses:
"OpenMP/OpenACC: Reorganise OMP map clause handling in gimplify.cc"
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622223.html
Parts of:
"OpenMP: OpenMP 5.2 semantics for pointers with unmapped target"
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623351.html
2023-12-16 Julian Brown <julian@codesourcery.com>
gcc/
* gimplify.cc (omp_segregate_mapping_groups): Handle "present" groups.
(gimplify_scan_omp_clauses): Use mapping group functionality to
iterate through mapping nodes. Remove most gimplification of
OMP_CLAUSE_MAP nodes from here, but still populate ctx->variables
splay tree.
(gimplify_adjust_omp_clauses): Move most gimplification of
OMP_CLAUSE_MAP nodes here.
libgomp/
* testsuite/libgomp.fortran/target-enter-data-6.f90: Remove XFAIL.
..., as in 'libgomp.c-c++-common/map-arrayofstruct-{2,3}.c'.
Minor fix-up for commit f5745dc142
"OpenMP/OpenACC: Unordered/non-constant component offset runtime diagnostic".
libgomp/
* testsuite/libgomp.fortran/map-subarray-5.f90: Restrict
'dg-output's to 'target offload_device_nonshared_as'.
This patch adds support for non-constant component offsets in "map"
clauses for OpenMP (and the equivalants for OpenACC), which are not able
to be sorted into order at compile time. Normally struct accesses in
such clauses are gathered together and sorted into increasing address
order after a "GOMP_MAP_STRUCT" node: if we have variable indices,
that is no longer possible.
This version of the patch scales back the previously-posted version to
merely add a diagnostic for incorrect usage of component accesses with
variably-indexed arrays of structs: the only permitted variant is where
we have multiple indices that are the same, but we could not prove so
at compile time. Rather than silently producing the wrong result for
cases where the indices are in fact different, we error out (e.g.,
"map(dtarr(i)%arrptr, dtarr(j)%arrptr(4:8))", for different i/j).
For now, multiple *constant* array indices are still supported (see
map-arrayofstruct-1.c). That could perhaps be addressed with a follow-up
patch, if necessary.
This version of the patch renumbers the GOMP_MAP_STRUCT_UNORD kind to
avoid clashing with the OpenACC "non-contiguous" dynamic array support
(though that is not yet applied to mainline).
2023-08-18 Julian Brown <julian@codesourcery.com>
gcc/
* gimplify.cc (extract_base_bit_offset): Add VARIABLE_OFFSET parameter.
(omp_get_attachment, omp_group_last, omp_group_base,
omp_directive_maps_explicitly): Add GOMP_MAP_STRUCT_UNORD support.
(omp_accumulate_sibling_list): Update calls to extract_base_bit_offset.
Support GOMP_MAP_STRUCT_UNORD.
(omp_build_struct_sibling_lists, gimplify_scan_omp_clauses,
gimplify_adjust_omp_clauses, gimplify_omp_target_update): Add
GOMP_MAP_STRUCT_UNORD support.
* omp-low.cc (lower_omp_target): Add GOMP_MAP_STRUCT_UNORD support.
* tree-pretty-print.cc (dump_omp_clause): Likewise.
include/
* gomp-constants.h (gomp_map_kind): Add GOMP_MAP_STRUCT_UNORD.
libgomp/
* oacc-mem.c (find_group_last, goacc_enter_data_internal,
goacc_exit_data_internal, GOACC_enter_exit_data): Add
GOMP_MAP_STRUCT_UNORD support.
* target.c (gomp_map_vars_internal): Add GOMP_MAP_STRUCT_UNORD support.
Detect incorrect use of variable indexing of arrays of structs.
(GOMP_target_enter_exit_data, gomp_target_task_fn): Add
GOMP_MAP_STRUCT_UNORD support.
* testsuite/libgomp.c-c++-common/map-arrayofstruct-1.c: New test.
* testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c: New test.
* testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c: New test.
* testsuite/libgomp.fortran/map-subarray-5.f90: New test.
This patch changes the mapping node arrangement used for array components
of derived types in order to accommodate for changes made in the previous
patch, particularly the use of "GOMP_MAP_ATTACH_DETACH" for pointer-typed
derived-type members instead of "GOMP_MAP_ALWAYS_POINTER".
We change the mapping nodes used for a derived-type mapping like this:
type T
integer, pointer, dimension(:) :: arrptr
end type T
type(T) :: tvar
[...]
!$omp target map(tofrom: tvar%arrptr)
So that the nodes used look like this:
1) map(to: tvar%arrptr) -->
GOMP_MAP_TO [implicit] *tvar%arrptr%data (the array data)
GOMP_MAP_TO_PSET tvar%arrptr (the descriptor)
GOMP_MAP_ATTACH_DETACH tvar%arrptr%data
2) map(tofrom: tvar%arrptr(3:8) -->
GOMP_MAP_TOFROM *tvar%arrptr%data(3) (size 8-3+1, etc.)
GOMP_MAP_TO_PSET tvar%arrptr
GOMP_MAP_ATTACH_DETACH tvar%arrptr%data (bias 3, etc.)
In this case, we can determine in the front-end that the
whole-array/pointer mapping (1) is only needed to map the pointer
-- so we drop it entirely. (Note also that we set -- early -- the
OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P flag for whole-array-via-pointer
mappings. See below.)
In the middle end, we process mappings using the struct sibling-list
handling machinery by moving the "GOMP_MAP_TO_PSET" node from the middle
of the group of three mapping nodes to the proper sorted position after
the GOMP_MAP_STRUCT mapping:
GOMP_MAP_STRUCT tvar (len: 1)
GOMP_MAP_TO_PSET tvar%arr (size: 64, etc.) <--. moved here
[...] |
GOMP_MAP_TOFROM *tvar%arrptr%data(3) ___|
GOMP_MAP_ATTACH_DETACH tvar%arrptr%data
In another case, if we have an array of derived-type values "dtarr",
and mappings like:
i = 1
j = 1
map(to: dtarr(i)%arrptr) map(tofrom: dtarr(j)%arrptr(3:8))
We still map the same way, but this time we cannot prove that the base
expressions "dtarr(i) and "dtarr(j)" are the same in the front-end.
So we keep both mappings, but we move the "[implicit]" mapping of the
full-array reference to the end of the clause list in gimplify.cc (by
adjusting the topological sorting algorithm):
GOMP_MAP_STRUCT dtvar (len: 2)
GOMP_MAP_TO_PSET dtvar(i)%arrptr
GOMP_MAP_TO_PSET dtvar(j)%arrptr
[...]
GOMP_MAP_TOFROM *dtvar(j)%arrptr%data(3) (size: 8-3+1)
GOMP_MAP_ATTACH_DETACH dtvar(j)%arrptr%data
GOMP_MAP_TO [implicit] *dtvar(i)%arrptr%data(1) (size: whole array)
GOMP_MAP_ATTACH_DETACH dtvar(i)%arrptr%data
Always moving "[implicit]" full-array mappings after array-section
mappings (without that bit set) means that we'll avoid copying the whole
array unnecessarily -- even in cases where we can't prove that the arrays
are the same.
The patch also fixes some bugs with "enter data" and "exit data"
directives with this new mapping arrangement. Also now if you have
mappings like this:
#pragma omp target enter data map(to: dv, dv%arr(1:20))
The whole of the derived-type variable "dv" is mapped, so the
GOMP_MAP_TO_PSET for the array-section mapping can be dropped:
GOMP_MAP_TO dv
GOMP_MAP_TO *dv%arr%data
GOMP_MAP_TO_PSET dv%arr <-- deleted (array section mapping)
GOMP_MAP_ATTACH_DETACH dv%arr%data
To accommodate for recent changes to mapping nodes made by
Tobias, this version of the patch avoids using GOMP_MAP_TO_PSET
for "exit data" directives, in favour of using the "correct"
GOMP_MAP_RELEASE/GOMP_MAP_DELETE kinds during early expansion. A new
flag is introduced so the middle-end knows when the latter two kinds
are being used specifically for an array descriptor.
This version of the patch fixes "omp target exit data" handling
for GOMP_MAP_DELETE, and adds pretty-printing dump output
for the OMP_CLAUSE_RELEASE_DESCRIPTOR flag (for a little extra
clarity).
Also I noticed the handling of descriptors on *OpenACC*
exit-data directives was inconsistent, so I've made those use
GOMP_MAP_RELEASE/GOMP_MAP_DELETE with the new flag in the same way as
OpenMP too. In the end it doesn't actually matter to the runtime,
which handles GOMP_MAP_RELEASE/GOMP_MAP_DELETE/GOMP_MAP_TO_PSET for
array descriptors on OpenACC "exit data" directives the same, anyway,
and doing it this way in the FE avoids needless divergence.
I've added a couple of new tests (gomp/target-enter-exit-data.f90 and
goacc/enter-exit-data-2.f90).
2023-12-07 Julian Brown <julian@codesourcery.com>
gcc/fortran/
* dependency.cc (gfc_omp_expr_prefix_same): New function.
* dependency.h (gfc_omp_expr_prefix_same): Add prototype.
* gfortran.h (gfc_omp_namelist): Add "duplicate_of" field to "u2"
union.
* trans-openmp.cc (dependency.h): Include.
(gfc_trans_omp_array_section): Adjust mapping node arrangement for
array descriptors. Use GOMP_MAP_TO_PSET or
GOMP_MAP_RELEASE/GOMP_MAP_DELETE with the OMP_CLAUSE_RELEASE_DESCRIPTOR
flag set.
(gfc_symbol_rooted_namelist): New function.
(gfc_trans_omp_clauses): Check subcomponent and subarray/element
accesses elsewhere in the clause list for pointers to derived types or
array descriptors, and adjust or drop mapping nodes appropriately.
Adjust for changes to mapping node arrangement.
(gfc_trans_oacc_executable_directive): Pass code op through.
gcc/
* gimplify.cc (omp_map_clause_descriptor_p): New function.
(build_omp_struct_comp_nodes, omp_get_attachment, omp_group_base): Use
above function.
(omp_tsort_mapping_groups): Process nodes that have
OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P set after those that don't. Add
enter_exit_data parameter.
(omp_resolve_clause_dependencies): Remove GOMP_MAP_TO_PSET mappings if
we're mapping the whole containing derived-type variable.
(omp_accumulate_sibling_list): Adjust GOMP_MAP_TO_PSET handling.
Remove GOMP_MAP_ALWAYS_POINTER handling.
(gimplify_scan_omp_clauses): Pass enter_exit argument to
omp_tsort_mapping_groups. Don't adjust/remove GOMP_MAP_TO_PSET
mappings for derived-type components here.
* tree.h (OMP_CLAUSE_RELEASE_DESCRIPTOR): New macro.
* tree-pretty-print.cc (dump_omp_clause): Show
OMP_CLAUSE_RELEASE_DESCRIPTOR in dump output (with
GOMP_MAP_TO_PSET-like syntax).
gcc/testsuite/
* gfortran.dg/goacc/enter-exit-data-2.f90: New test.
* gfortran.dg/goacc/finalize-1.f: Adjust scan output.
* gfortran.dg/gomp/map-9.f90: Adjust scan output.
* gfortran.dg/gomp/map-subarray-2.f90: New test.
* gfortran.dg/gomp/map-subarray.f90: New test.
* gfortran.dg/gomp/target-enter-exit-data.f90: New test.
libgomp/
* testsuite/libgomp.fortran/map-subarray.f90: New test.
* testsuite/libgomp.fortran/map-subarray-2.f90: New test.
* testsuite/libgomp.fortran/map-subarray-3.f90: New test.
* testsuite/libgomp.fortran/map-subarray-4.f90: New test.
* testsuite/libgomp.fortran/map-subarray-6.f90: New test.
* testsuite/libgomp.fortran/map-subarray-7.f90: New test.
* testsuite/libgomp.fortran/map-subarray-8.f90: New test.
* testsuite/libgomp.fortran/map-subcomponents.f90: New test.
* testsuite/libgomp.fortran/struct-elem-map-1.f90: Adjust for
descriptor-mapping changes. Remove XFAIL.
This patch reworks clause expansion in the C, C++ and (to a lesser
extent) Fortran front ends for OpenMP and OpenACC mapping nodes used in
GPU offloading support.
At present a single clause may be turned into several mapping nodes,
or have its mapping type changed, in several places scattered through
the front- and middle-end. The analysis relating to which particular
transformations are needed for some given expression has become quite hard
to follow. Briefly, we manipulate clause types in the following places:
1. During parsing, in c_omp_adjust_map_clauses. Depending on a set of
rules, we may change a FIRSTPRIVATE_POINTER (etc.) mapping into
ATTACH_DETACH, or mark the decl addressable.
2. In semantics.cc or c-typeck.cc, clauses are expanded in
handle_omp_array_sections (called via {c_}finish_omp_clauses, or in
finish_omp_clauses itself. The two cases are for processing array
sections (the former), or non-array sections (the latter).
3. In gimplify.cc, we build sibling lists for struct accesses, which
groups and sorts accesses along with their struct base, creating
new ALLOC/RELEASE nodes for pointers.
4. In gimplify.cc:gimplify_adjust_omp_clauses, mapping nodes may be
adjusted or created.
This patch doesn't completely disrupt this scheme, though clause
types are no longer adjusted in c_omp_adjust_map_clauses (step 1).
Clause expansion in step 2 (for C and C++) now uses a single, unified
mechanism, parts of which are also reused for analysis in step 3.
Rather than the kind-of "ad-hoc" pattern matching on addresses used to
expand clauses used at present, a new method for analysing addresses is
introduced. This does a recursive-descent tree walk on expression nodes,
and emits a vector of tokens describing each "part" of the address.
This tokenized address can then be translated directly into mapping nodes,
with the assurance that no part of the expression has been inadvertently
skipped or misinterpreted. In this way, all the variations of ways
pointers, arrays, references and component accesses might be combined
can be teased apart into easily-understood cases - and we know we've
"parsed" the whole address before we start analysis, so the right code
paths can easily be selected.
For example, a simple access "arr[idx]" might parse as:
base-decl access-indexed-array
or "mystruct->foo[x]" with a pointer "foo" component might parse as:
base-decl access-pointer component-selector access-pointer
A key observation is that support for "array" bases, e.g. accesses
whose root nodes are not structures, but describe scalars or arrays,
and also *one-level deep* structure accesses, have first-class support
in gimplify and beyond. Expressions that use deeper struct accesses
or e.g. multiple indirections were more problematic: some cases worked,
but lots of cases didn't. This patch reimplements the support for those
in gimplify.cc, again using the new "address tokenization" support.
An expression like "mystruct->foo->bar[0:10]" used in a mapping node will
translate the right-hand access directly in the front-end. The base for
the access will be "mystruct->foo". This is handled recursively in
gimplify.cc -- there may be several accesses of "mystruct"'s members
on the same directive, so the sibling-list building machinery can be
used again. (This was already being done for OpenACC, but the new
implementation differs somewhat in details, and is more robust.)
For OpenMP, in the case where the base pointer itself,
i.e. "mystruct->foo" here, is NOT mapped on the same directive, we
create a "fragile" mapping. This turns the "foo" component access
into a zero-length allocation (which is a new feature for the runtime,
so support has been added there too).
A couple of changes have been made to how mapping clauses are turned
into mapping nodes:
The first change is based on the observation that it is probably never
correct to use GOMP_MAP_ALWAYS_POINTER for component accesses (e.g. for
references), because if the containing struct is already mapped on the
target then the host version of the pointer in question will be corrupted
if the struct is copied back from the target. This patch removes all
such uses, across each of C, C++ and Fortran.
The second change is to the way that GOMP_MAP_ATTACH_DETACH nodes
are processed during sibling-list creation. For OpenMP, for pointer
components, we must map the base pointer separately from an array section
that uses the base pointer, so e.g. we must have both "map(mystruct.base)"
and "map(mystruct.base[0:10])" mappings. These create nodes such as:
GOMP_MAP_TOFROM mystruct.base
G_M_TOFROM *mystruct.base [len: 10*elemsize] G_M_ATTACH_DETACH mystruct.base
Instead of using the first of these directly when building the struct
sibling list then skipping the group using GOMP_MAP_ATTACH_DETACH,
leading to:
GOMP_MAP_STRUCT mystruct [len: 1] GOMP_MAP_TOFROM mystruct.base
we now introduce a new "mini-pass", omp_resolve_clause_dependencies, that
drops the GOMP_MAP_TOFROM for the base pointer, marks the second group
as having had a base-pointer mapping, then omp_build_struct_sibling_lists
can create:
GOMP_MAP_STRUCT mystruct [len: 1] GOMP_MAP_ALLOC mystruct.base [len: ptrsize]
This ends up working better in many cases, particularly those involving
references. (The "alloc" space is immediately overwritten by a pointer
attachment, so this is mildly more efficient than a redundant TO mapping
at runtime also.)
There is support in the address tokenizer for "arbitrary" base expressions
which aren't rooted at a decl, but that is not used as present because
such addresses are disallowed at parse time.
In the front-ends, the address tokenization machinery is mostly only
used for clause expansion and not for diagnostics at present. It could
be used for those too, which would allow more of my previous "address
inspector" implementation to be removed.
The new bits in gimplify.cc work with OpenACC also.
This version of the patch addresses several first-pass review comments
from Tobias, and fixes a few previously-missed cases for manually-managed
ragged array mappings (including cases using references). Some arbitrary
differences between handling of clause expansion for C vs. C++ have also
been fixed, and some fragments from later in the patch series have been
moved forward (where they were useful for fixing bugs). Several new
test cases have been added.
2023-11-29 Julian Brown <julian@codesourcery.com>
gcc/c-family/
* c-common.h (c_omp_region_type): Add C_ORT_EXIT_DATA,
C_ORT_OMP_EXIT_DATA and C_ORT_ACC_TARGET.
(omp_addr_token): Add forward declaration.
(c_omp_address_inspector): New class.
* c-omp.cc (c_omp_adjust_map_clauses): Mark decls addressable here, but
do not change any mapping node types.
(c_omp_address_inspector::unconverted_ref_origin,
c_omp_address_inspector::component_access_p,
c_omp_address_inspector::check_clause,
c_omp_address_inspector::get_root_term,
c_omp_address_inspector::map_supported_p,
c_omp_address_inspector::get_origin,
c_omp_address_inspector::maybe_unconvert_ref,
c_omp_address_inspector::maybe_zero_length_array_section,
c_omp_address_inspector::expand_array_base,
c_omp_address_inspector::expand_component_selector,
c_omp_address_inspector::expand_map_clause): New methods.
(omp_expand_access_chain): New function.
gcc/c/
* c-parser.cc (c_parser_oacc_all_clauses): Add TARGET_P parameter. Use
to select region type for c_finish_omp_clauses call.
(c_parser_oacc_loop): Update calls to c_parser_oacc_all_clauses.
(c_parser_oacc_compute): Likewise.
(c_parser_omp_target_data, c_parser_omp_target_enter_data): Support
ATTACH kind.
(c_parser_omp_target_exit_data): Support DETACH kind.
(check_clauses): Handle GOMP_MAP_POINTER and GOMP_MAP_ATTACH here.
* c-typeck.cc (handle_omp_array_sections_1,
handle_omp_array_sections, c_finish_omp_clauses): Use
c_omp_address_inspector class and OMP address tokenizer to analyze and
expand map clause expressions. Fix some diagnostics. Fix "is OpenACC"
condition for C_ORT_ACC_TARGET addition.
gcc/cp/
* parser.cc (cp_parser_oacc_all_clauses): Add TARGET_P parameter. Use
to select region type for finish_omp_clauses call.
(cp_parser_omp_target_data, cp_parser_omp_target_enter_data): Support
GOMP_MAP_ATTACH kind.
(cp_parser_omp_target_exit_data): Support GOMP_MAP_DETACH kind.
(cp_parser_oacc_declare): Update call to cp_parser_oacc_all_clauses.
(cp_parser_oacc_loop): Update calls to cp_parser_oacc_all_clauses.
(cp_parser_oacc_compute): Likewise.
* pt.cc (tsubst_expr): Use C_ORT_ACC_TARGET for call to
tsubst_omp_clauses for OpenACC compute regions.
* semantics.cc (cp_omp_address_inspector): New class, derived from
c_omp_address_inspector.
(handle_omp_array_sections_1, handle_omp_array_sections,
finish_omp_clauses): Use cp_omp_address_inspector class and OMP address
tokenizer to analyze and expand OpenMP map clause expressions. Fix
some diagnostics. Support C_ORT_ACC_TARGET.
(finish_omp_target): Handle GOMP_MAP_POINTER.
gcc/fortran/
* trans-openmp.cc (gfc_trans_omp_array_section): Add OPENMP parameter.
Use GOMP_MAP_ATTACH_DETACH instead of GOMP_MAP_ALWAYS_POINTER for
derived type components.
(gfc_trans_omp_clauses): Update calls to gfc_trans_omp_array_section.
gcc/
* gimplify.cc (build_struct_comp_nodes): Don't process
GOMP_MAP_ATTACH_DETACH "middle" nodes here.
(omp_mapping_group): Add REPROCESS_STRUCT and FRAGILE booleans for
nested struct handling.
(omp_strip_components_and_deref, omp_strip_indirections): Remove
functions.
(omp_get_attachment): Handle GOMP_MAP_DETACH here.
(omp_group_last): Handle GOMP_MAP_*, GOMP_MAP_DETACH,
GOMP_MAP_ATTACH_DETACH groups for "exit data" of reference-to-pointer
component array sections.
(omp_gather_mapping_groups_1): Initialise reprocess_struct and fragile
fields.
(omp_group_base): Handle GOMP_MAP_ATTACH_DETACH after GOMP_MAP_STRUCT.
(omp_index_mapping_groups_1): Skip reprocess_struct groups.
(omp_get_nonfirstprivate_group, omp_directive_maps_explicitly,
omp_resolve_clause_dependencies, omp_first_chained_access_token): New
functions.
(omp_check_mapping_compatibility): Adjust accepted node combinations
for "from" clauses using release instead of alloc.
(omp_accumulate_sibling_list): Add GROUP_MAP, ADDR_TOKENS, FRAGILE_P,
REPROCESSING_STRUCT, ADDED_TAIL parameters. Use OMP address tokenizer
to analyze addresses. Reimplement nested struct handling, and
implement "fragile groups".
(omp_build_struct_sibling_lists): Adjust for changes to
omp_accumulate_sibling_list. Recalculate bias for ATTACH_DETACH nodes
after GOMP_MAP_STRUCT nodes.
(gimplify_scan_omp_clauses): Call omp_resolve_clause_dependencies. Use
OMP address tokenizer.
(gimplify_adjust_omp_clauses_1): Use build_fold_indirect_ref_loc
instead of build_simple_mem_ref_loc.
* omp-general.cc (omp-general.h, tree-pretty-print.h): Include.
(omp_addr_tokenizer): New namespace.
(omp_addr_tokenizer::omp_addr_token): New.
(omp_addr_tokenizer::omp_parse_component_selector,
omp_addr_tokenizer::omp_parse_ref,
omp_addr_tokenizer::omp_parse_pointer,
omp_addr_tokenizer::omp_parse_access_method,
omp_addr_tokenizer::omp_parse_access_methods,
omp_addr_tokenizer::omp_parse_structure_base,
omp_addr_tokenizer::omp_parse_structured_expr,
omp_addr_tokenizer::omp_parse_array_expr,
omp_addr_tokenizer::omp_access_chain_p,
omp_addr_tokenizer::omp_accessed_addr): New functions.
(omp_parse_expr, debug_omp_tokenized_addr): New functions.
* omp-general.h (omp_addr_tokenizer::access_method_kinds,
omp_addr_tokenizer::structure_base_kinds,
omp_addr_tokenizer::token_type,
omp_addr_tokenizer::omp_addr_token,
omp_addr_tokenizer::omp_access_chain_p,
omp_addr_tokenizer::omp_accessed_addr): New.
(omp_addr_token, omp_parse_expr): New.
* omp-low.cc (scan_sharing_clauses): Skip error check for references
to pointers.
* tree.h (OMP_CLAUSE_ATTACHMENT_MAPPING_ERASED): New macro.
gcc/testsuite/
* c-c++-common/gomp/clauses-2.c: Fix error output.
* c-c++-common/gomp/target-implicit-map-2.c: Adjust scan output.
* c-c++-common/gomp/target-50.c: Adjust scan output.
* c-c++-common/gomp/target-enter-data-1.c: Adjust scan output.
* g++.dg/gomp/static-component-1.C: New test.
* gcc.dg/gomp/target-3.c: Adjust scan output.
* gfortran.dg/gomp/map-9.f90: Adjust scan output.
libgomp/
* target.c (gomp_map_pointer): Modify zero-length array section
pointer handling.
(gomp_attach_pointer): Likewise.
(gomp_map_fields_existing): Use gomp_map_0len_lookup.
(gomp_attach_pointer): Allow attaching null pointers (or Fortran
"unassociated" pointers).
(gomp_map_vars_internal): Handle zero-sized struct members. Add
diagnostic for unmapped struct pointer members.
* testsuite/libgomp.c-c++-common/baseptrs-1.c: New test.
* testsuite/libgomp.c-c++-common/baseptrs-2.c: New test.
* testsuite/libgomp.c-c++-common/baseptrs-6.c: New test.
* testsuite/libgomp.c-c++-common/baseptrs-7.c: New test.
* testsuite/libgomp.c-c++-common/ptr-attach-2.c: New test.
* testsuite/libgomp.c-c++-common/target-implicit-map-2.c: Fix missing
"free".
* testsuite/libgomp.c-c++-common/target-implicit-map-5.c: New test.
* testsuite/libgomp.c-c++-common/target-map-zlas-1.c: New test.
* testsuite/libgomp.c++/class-array-1.C: New test.
* testsuite/libgomp.c++/baseptrs-3.C: New test.
* testsuite/libgomp.c++/baseptrs-4.C: New test.
* testsuite/libgomp.c++/baseptrs-5.C: New test.
* testsuite/libgomp.c++/baseptrs-8.C: New test.
* testsuite/libgomp.c++/baseptrs-9.C: New test.
* testsuite/libgomp.c++/ref-mapping-1.C: New test.
* testsuite/libgomp.c++/target-48.C: New test.
* testsuite/libgomp.c++/target-49.C: New test.
* testsuite/libgomp.c++/target-exit-data-reftoptr-1.C: New test.
* testsuite/libgomp.c++/target-lambda-1.C: Update for OpenMP 5.2
semantics.
* testsuite/libgomp.c++/target-this-3.C: Likewise.
* testsuite/libgomp.c++/target-this-4.C: Likewise.
* testsuite/libgomp.fortran/struct-elem-map-1.f90: Add temporary XFAIL.
* testsuite/libgomp.fortran/target-enter-data-6.f90: Likewise.
This patch try to introduce the rwlock and split the read/write to
unit_root tree and unit_cache with rwlock instead of the mutex to
increase CPU efficiency. In the get_gfc_unit function, the percentage
to step into the insert_unit function is around 30%, in most instances,
we can get the unit in the phase of reading the unit_cache or unit_root
tree. So split the read/write phase by rwlock would be an approach to
make it more parallel.
BTW, the IPC metrics can gain around 9x in our test
server with 220 cores. The benchmark we used is
https://github.com/rwesson/NEAT
libgcc/ChangeLog:
* gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro.
(__gthrw): New function.
(__gthread_rwlock_rdlock): New function.
(__gthread_rwlock_tryrdlock): New function.
(__gthread_rwlock_wrlock): New function.
(__gthread_rwlock_trywrlock): New function.
(__gthread_rwlock_unlock): New function.
libgfortran/ChangeLog:
* io/async.c (DEBUG_LINE): New macro.
* io/async.h (RWLOCK_DEBUG_ADD): New macro.
(CHECK_RDLOCK): New macro.
(CHECK_WRLOCK): New macro.
(TAIL_RWLOCK_DEBUG_QUEUE): New macro.
(IN_RWLOCK_DEBUG_QUEUE): New macro.
(RDLOCK): New macro.
(WRLOCK): New macro.
(RWUNLOCK): New macro.
(RD_TO_WRLOCK): New macro.
(INTERN_RDLOCK): New macro.
(INTERN_WRLOCK): New macro.
(INTERN_RWUNLOCK): New macro.
* io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
a comment.
(unit_lock): Remove including associated internal_proto.
(unit_rwlock): New declarations including associated internal_proto.
(dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock
instead of __gthread_mutex_lock and __gthread_mutex_unlock on
unit_lock.
* io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on
unit_rwlock instead of LOCK and UNLOCK on unit_lock.
(st_write_done_worker): Likewise.
* io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules'
comment. Use unit_rwlock variable instead of unit_lock variable.
(get_gfc_unit_from_unit_root): New function.
(get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock
instead of LOCK and UNLOCK on unit_lock.
(close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of
LOCK and UNLOCK on unit_lock.
(close_units): Likewise.
(newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on
unit_lock.
* io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock
instead of LOCK and UNLOCK on unit_lock.
(flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead
of LOCK and UNLOCK on unit_lock.
This patch enables the use of mixed-types for simd clones for AArch64, adds
aarch64 as a target_vect_simd_clones and corrects the way the simdlen is chosen
for non-specified simdlen clauses according to the 'Vector Function Application
Binary Interface Specification for AArch64'.
Additionally this patch also restricts combinations of simdlen and
return/argument types that map to vectors larger than 128 bits as we currently
do not have a way to represent these types in a way that is consistent
internally and externally.
gcc/ChangeLog:
* config/aarch64/aarch64.cc (lane_size): New function.
(aarch64_simd_clone_compute_vecsize_and_simdlen): Determine simdlen according to NDS rule
and reject combination of simdlen and types that lead to vectors larger than 128bits.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp: Add aarch64 targets to vect_simd_clones.
* c-c++-common/gomp/declare-variant-14.c: Adapt test for aarch64.
* c-c++-common/gomp/pr60823-1.c: Likewise.
* c-c++-common/gomp/pr60823-2.c: Likewise.
* c-c++-common/gomp/pr60823-3.c: Likewise.
* g++.dg/gomp/attrs-10.C: Likewise.
* g++.dg/gomp/declare-simd-1.C: Likewise.
* g++.dg/gomp/declare-simd-3.C: Likewise.
* g++.dg/gomp/declare-simd-4.C: Likewise.
* g++.dg/gomp/declare-simd-7.C: Likewise.
* g++.dg/gomp/declare-simd-8.C: Likewise.
* g++.dg/gomp/pr88182.C: Likewise.
* gcc.dg/declare-simd.c: Likewise.
* gcc.dg/gomp/declare-simd-1.c: Likewise.
* gcc.dg/gomp/declare-simd-3.c: Likewise.
* gcc.dg/gomp/pr87887-1.c: Likewise.
* gcc.dg/gomp/pr87895-1.c: Likewise.
* gcc.dg/gomp/pr89246-1.c: Likewise.
* gcc.dg/gomp/pr99542.c: Likewise.
* gcc.dg/gomp/simd-clones-2.c: Likewise.
* gcc.dg/vect/vect-simd-clone-1.c: Likewise.
* gcc.dg/vect/vect-simd-clone-2.c: Likewise.
* gcc.dg/vect/vect-simd-clone-4.c: Likewise.
* gcc.dg/vect/vect-simd-clone-5.c: Likewise.
* gcc.dg/vect/vect-simd-clone-6.c: Likewise.
* gcc.dg/vect/vect-simd-clone-7.c: Likewise.
* gcc.dg/vect/vect-simd-clone-8.c: Likewise.
* gfortran.dg/gomp/declare-simd-2.f90: Likewise.
* gfortran.dg/gomp/declare-simd-coarray-lib.f90: Likewise.
* gfortran.dg/gomp/declare-variant-14.f90: Likewise.
* gfortran.dg/gomp/pr79154-1.f90: Likewise.
* gfortran.dg/gomp/pr83977.f90: Likewise.
libgomp/ChangeLog:
* testsuite/libgomp.c/declare-variant-1.c: Adapt test for aarch64.
* testsuite/libgomp.fortran/declare-simd-1.f90: Likewise.
This commit adds -fopenmp-allocators which enables support for
'omp allocators' and 'omp allocate' that are associated with a Fortran
allocate-stmt. If such a construct is encountered, an error is shown,
unless the -fopenmp-allocators flag is present.
With -fopenmp -fopenmp-allocators, those constructs get turned into
GOMP_alloc allocations, while -fopenmp-allocators (also without -fopenmp)
ensures deallocation and reallocation (via intrinsic assignments) are
properly directed to GOMP_free/omp_realloc - while normal Fortran
allocations are processed by free/realloc.
In order to distinguish a 'malloc'ed from a 'GOMP_alloc'ed memory, the
version field of the Fortran array discriptor is (mis)used: 0 indicates
the normal Fortran allocation while 1 denotes GOMP_alloc. For scalars,
there is record keeping in libgomp: GOMP_add_alloc(ptr) will add the
pointer address to a splay_tree while GOMP_is_alloc(ptr) will return
true it was previously added but also removes it from the list.
Besides Fortran FE work, BUILT_IN_GOMP_REALLOC is no part of
omp-builtins.def and libgomp gains the mentioned two new function.
gcc/ChangeLog:
* builtin-types.def (BT_FN_PTR_PTR_SIZE_PTRMODE_PTRMODE): New.
* omp-builtins.def (BUILT_IN_GOMP_REALLOC): New.
* builtins.cc (builtin_fnspec): Handle it.
* gimple-ssa-warn-access.cc (fndecl_alloc_p,
matching_alloc_calls_p): Likewise.
* gimple.cc (nonfreeing_call_p): Likewise.
* predict.cc (expr_expected_value_1): Likewise.
* tree-ssa-ccp.cc (evaluate_stmt): Likewise.
* tree.cc (fndecl_dealloc_argno): Likewise.
gcc/fortran/ChangeLog:
* dump-parse-tree.cc (show_omp_node): Handle EXEC_OMP_ALLOCATE
and EXEC_OMP_ALLOCATORS.
* f95-lang.cc (ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LIST):
Add 'ECF_LEAF | ECF_MALLOC' to existing 'ECF_NOTHROW'.
(ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LEAF_LIST): Define.
* gfortran.h (gfc_omp_clauses): Add contained_in_target_construct.
* invoke.texi (-fopenacc, -fopenmp): Update based on C version.
(-fopenmp-simd): New, based on C version.
(-fopenmp-allocators): New.
* lang.opt (fopenmp-allocators): Add.
* openmp.cc (resolve_omp_clauses): For allocators/allocate directive,
add target and no dynamic_allocators diagnostic and more invalid
diagnostic.
* parse.cc (decode_omp_directive): Set contains_teams_construct.
* trans-array.h (gfc_array_allocate): Update prototype.
(gfc_conv_descriptor_version): New prototype.
* trans-decl.cc (gfc_init_default_dt): Fix comment.
* trans-array.cc (gfc_conv_descriptor_version): New.
(gfc_array_allocate): Support GOMP_alloc allocation.
(gfc_alloc_allocatable_for_assignment, structure_alloc_comps):
Handle GOMP_free/omp_realloc as needed.
* trans-expr.cc (gfc_conv_procedure_call): Likewise.
(alloc_scalar_allocatable_for_assignment): Likewise.
* trans-intrinsic.cc (conv_intrinsic_move_alloc): Likewise.
* trans-openmp.cc (gfc_trans_omp_allocators,
gfc_trans_omp_directive): Handle allocators/allocate directive.
(gfc_omp_call_add_alloc, gfc_omp_call_is_alloc): New.
* trans-stmt.h (gfc_trans_allocate): Update prototype.
* trans-stmt.cc (gfc_trans_allocate): Support GOMP_alloc.
* trans-types.cc (gfc_get_dtype_rank_type): Set version field.
* trans.cc (gfc_allocate_using_malloc, gfc_allocate_allocatable):
Update to handle GOMP_alloc.
(gfc_deallocate_with_status, gfc_deallocate_scalar_with_status):
Handle GOMP_free.
(trans_code): Update call.
* trans.h (gfc_allocate_allocatable, gfc_allocate_using_malloc):
Update prototype.
(gfc_omp_call_add_alloc, gfc_omp_call_is_alloc): New prototype.
* types.def (BT_FN_PTR_PTR_SIZE_PTRMODE_PTRMODE): New.
libgomp/ChangeLog:
* allocator.c (struct fort_alloc_splay_tree_key_s,
fort_alloc_splay_compare, GOMP_add_alloc, GOMP_is_alloc): New.
* libgomp.h: Define splay_tree_static for 'reverse' splay tree.
* libgomp.map (GOMP_5.1.2): New; add GOMP_add_alloc and
GOMP_is_alloc; move GOMP_target_map_indirect_ptr from ...
(GOMP_5.1.1): ... here.
* libgomp.texi (Impl. Status, Memory management): Update for
allocators/allocate directives.
* splay-tree.c: Handle splay_tree_static define to declare all
functions as static.
(splay_tree_lookup_node): New.
* splay-tree.h: Handle splay_tree_decl_only define.
(splay_tree_lookup_node): New prototype.
* target.c: Define splay_tree_static for 'reverse'.
* testsuite/libgomp.fortran/allocators-1.f90: New test.
* testsuite/libgomp.fortran/allocators-2.f90: New test.
* testsuite/libgomp.fortran/allocators-3.f90: New test.
* testsuite/libgomp.fortran/allocators-4.f90: New test.
* testsuite/libgomp.fortran/allocators-5.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/allocate-14.f90: Add coarray and
not-listed tests.
* gfortran.dg/gomp/allocate-5.f90: Remove sorry dg-message.
* gfortran.dg/bind_c_array_params_2.f90: Update expected
dump for dtype '.version=0'.
* gfortran.dg/gomp/allocate-16.f90: New test.
* gfortran.dg/gomp/allocators-3.f90: New test.
* gfortran.dg/gomp/allocators-4.f90: New test.
gcc/fortran/ChangeLog:
* gfortran.h (ext_attr_t): Add omp_allocate flag.
* match.cc (gfc_free_omp_namelist): Void deleting same
u2.allocator multiple times now that a sequence can use
the same one.
* openmp.cc (gfc_match_omp_clauses, gfc_match_omp_allocate): Use
same allocator expr multiple times.
(is_predefined_allocator): Make static.
(gfc_resolve_omp_allocate): Update/extend restriction checks;
remove sorry message.
(resolve_omp_clauses): Reject corarrays in allocate/allocators
directive.
* parse.cc (check_omp_allocate_stmt): Permit procedure pointers
here (rejected later) for less misleading diagnostic.
* trans-array.cc (gfc_trans_auto_array_allocation): Propagate
size for GOMP_alloc and location to which it should be added to.
* trans-decl.cc (gfc_trans_deferred_vars): Handle 'omp allocate'
for stack variables; sorry for static variables/common blocks.
* trans-openmp.cc (gfc_trans_omp_clauses): Evaluate 'allocate'
clause's allocator only once; fix adding expressions to the
block.
(gfc_trans_omp_single): Pass a block to gfc_trans_omp_clauses.
gcc/ChangeLog:
* gimplify.cc (gimplify_bind_expr): Handle Fortran's
'omp allocate' for stack variables.
libgomp/ChangeLog:
* libgomp.texi (OpenMP Impl. Status): Mention that Fortran now
supports the allocate directive for stack variables.
* testsuite/libgomp.fortran/allocate-5.f90: New test.
* testsuite/libgomp.fortran/allocate-6.f90: New test.
* testsuite/libgomp.fortran/allocate-7.f90: New test.
* testsuite/libgomp.fortran/allocate-8.f90: New test.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/allocate-14.c: Fix directive name.
* c-c++-common/gomp/allocate-15.c: Likewise.
* c-c++-common/gomp/allocate-9.c: Fix comment typo.
* gfortran.dg/gomp/allocate-4.f90: Remove sorry dg-error.
* gfortran.dg/gomp/allocate-7.f90: Likewise.
* gfortran.dg/gomp/allocate-10.f90: New test.
* gfortran.dg/gomp/allocate-11.f90: New test.
* gfortran.dg/gomp/allocate-12.f90: New test.
* gfortran.dg/gomp/allocate-13.f90: New test.
* gfortran.dg/gomp/allocate-14.f90: New test.
* gfortran.dg/gomp/allocate-15.f90: New test.
* gfortran.dg/gomp/allocate-8.f90: New test.
* gfortran.dg/gomp/allocate-9.f90: New test.
For strictly structured blocks, a BLOCK was created but the code
was placed after the block the outer structured block. Additionally,
labelled blocks were mishandled. As the code is now properly in a
BLOCK, it solves additional issues.
gcc/fortran/ChangeLog:
* parse.cc (parse_omp_structured_block): Make the user code end
up inside of BLOCK construct for strictly structured blocks;
fix fallout for 'section' and 'teams'.
* openmp.cc (resolve_omp_target): Fix changed BLOCK handling
for teams in target checking.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/strictly-structured-block-1.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/block_17.f90: New test.
* gfortran.dg/gomp/strictly-structured-block-5.f90: New test.
OpenMP 5.0 removed the restriction that multiple collapsed loops must
be perfectly nested, allowing "intervening code" (including nested
BLOCKs) before or after each nested loop. In GCC this code is moved
into the inner loop body by the respective front ends.
In the Fortran front end, most of the semantic processing happens during
the translation phase, so the parse phase just collects the intervening
statements, checks them for errors, and splices them around the loop body.
gcc/fortran/ChangeLog
* gfortran.h (struct gfc_namespace): Add omp_structured_block bit.
* openmp.cc: Include omp-api.h.
(resolve_omp_clauses): Consolidate inscan reduction clause conflict
checking here.
(find_nested_loop_in_chain): New.
(find_nested_loop_in_block): New.
(gfc_resolve_omp_do_blocks): Set omp_current_do_collapse properly.
Handle imperfectly-nested loops when looking for nested omp scan.
Refactor to move inscan reduction clause conflict checking to
resolve_omp_clauses.
(gfc_resolve_do_iterator): Handle imperfectly-nested loops.
(struct icode_error_state): New.
(icode_code_error_callback): New.
(icode_expr_error_callback): New.
(diagnose_intervening_code_errors_1): New.
(diagnose_intervening_code_errors): New.
(make_structured_block): New.
(restructure_intervening_code): New.
(is_outer_iteration_variable): Do not assume loops are perfectly
nested.
(check_nested_loop_in_chain): New.
(check_nested_loop_in_block_state): New.
(check_nested_loop_in_block_symbol): New.
(check_nested_loop_in_block): New.
(expr_uses_intervening_var): New.
(is_intervening_var): New.
(expr_is_invariant): Do not assume loops are perfectly nested.
(resolve_omp_do): Handle imperfectly-nested loops.
* trans-stmt.cc (gfc_trans_block_construct): Generate
OMP_STRUCTURED_BLOCK if magic bit is set on block namespace.
gcc/testsuite/ChangeLog
* gfortran.dg/gomp/collapse1.f90: Adjust expected errors.
* gfortran.dg/gomp/collapse2.f90: Likewise.
* gfortran.dg/gomp/imperfect-gotos.f90: New.
* gfortran.dg/gomp/imperfect-invalid-scope.f90: New.
* gfortran.dg/gomp/imperfect1.f90: New.
* gfortran.dg/gomp/imperfect2.f90: New.
* gfortran.dg/gomp/imperfect3.f90: New.
* gfortran.dg/gomp/imperfect4.f90: New.
* gfortran.dg/gomp/imperfect5.f90: New.
libgomp/ChangeLog
* testsuite/libgomp.fortran/imperfect-destructor.f90: New.
* testsuite/libgomp.fortran/imperfect1.f90: New.
* testsuite/libgomp.fortran/imperfect2.f90: New.
* testsuite/libgomp.fortran/imperfect3.f90: New.
* testsuite/libgomp.fortran/imperfect4.f90: New.
* testsuite/libgomp.fortran/target-imperfect1.f90: New.
* testsuite/libgomp.fortran/target-imperfect2.f90: New.
* testsuite/libgomp.fortran/target-imperfect3.f90: New.
* testsuite/libgomp.fortran/target-imperfect4.f90: New.
When copying a 2D or 3D rectangular memmory block, the performance is
better when using CUDA's cuMemcpy2D/cuMemcpy3D instead of copying the
data one by one. That's what this commit does.
Additionally, it permits device-to-device copies, if neccessary using a
temporary variable on the host.
include/ChangeLog:
* cuda/cuda.h (CUlimit): Add CUDA_ERROR_NOT_INITIALIZED,
CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_INVALID_HANDLE.
(CUarray, CUmemorytype, CUDA_MEMCPY2D, CUDA_MEMCPY3D,
CUDA_MEMCPY3D_PEER): New typdefs.
(cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned,
cuMemcpy3D, cuMemcpy3DAsync, cuMemcpy3DPeer,
cuMemcpy3DPeerAsync): New prototypes.
libgomp/ChangeLog:
* libgomp-plugin.h (GOMP_OFFLOAD_memcpy2d,
GOMP_OFFLOAD_memcpy3d): New prototypes.
* libgomp.h (struct gomp_device_descr): Add memcpy2d_func
and memcpy3d_func.
* libgomp.texi (nvtpx): Document when cuMemcpy2D/cuMemcpy3D is used.
* oacc-host.c (memcpy2d_func, .memcpy3d_func): Init with NULL.
* plugin/cuda-lib.def (cuMemcpy2D, cuMemcpy2DUnaligned,
cuMemcpy3D): Invoke via CUDA_ONE_CALL.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d,
GOMP_OFFLOAD_memcpy3d): New.
* target.c (omp_target_memcpy_rect_worker):
(omp_target_memcpy_rect_check, omp_target_memcpy_rect_copy):
Permit all device-to-device copyies; invoke new plugins for
2D and 3D copying when available.
(gomp_load_plugin_for_device): DLSYM the new plugin functions.
* testsuite/libgomp.c/target-12.c: Fix dimension bug.
* testsuite/libgomp.fortran/target-12.f90: Likewise.
* testsuite/libgomp.fortran/target-memcpy-rect-1.f90: New test.
Before this commit, gfortran produced with OpenMP for 'do i = 1,10,2'
the code
for (count.0 = 0; count.0 < 5; count.0 = count.0 + 1)
i = count.0 * 2 + 1;
While such an inner loop can be collapsed, a non-rectangular could not.
With this commit and for all constant loop steps, a simple loop such
as 'for (i = 1; i <= 10; i = i + 2)' is created. (Before only for the
constant steps of 1 and -1.)
The constant step permits to know the direction (increasing/decreasing)
that is required for the loop condition.
The new code is only valid if one assumes no overflow of the loop variable.
However, the Fortran standard can be read that this must be ensured by
the user. Namely, the Fortran standard requires (F2023, 10.1.5.2.4):
"The execution of any numeric operation whose result is not defined by
the arithmetic used by the processor is prohibited."
And, for DO loops, F2023's "11.1.7.4.3 The execution cycle" has the
following: The number of loop iterations handled by an iteration count,
which would permit code like 'do i = huge(i)-5, huge(i),4'. However,
in step (3), this count is not only decremented by one but also:
"... The DO variable, if any, is incremented by the value of the
incrementation parameter m3."
And for the example above, 'i' would be 'huge(i)+3' in the last
execution cycle, which exceeds the largest model number and should
render the example as invalid.
PR fortran/107424
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_nonrect_loop_expr): Accept all
constant loop steps.
(gfc_trans_omp_do): Likewise; use sign to determine
loop direction.
libgomp/ChangeLog:
* libgomp.texi (Impl. Status 5.0): Add link to new PR110735.
* testsuite/libgomp.fortran/non-rectangular-loop-1.f90: Enable
commented tests.
* testsuite/libgomp.fortran/non-rectangular-loop-1a.f90: Remove
test file; tests are in non-rectangular-loop-1.f90.
* testsuite/libgomp.fortran/non-rectangular-loop-5.f90: Change
testcase to use a non-constant step to retain the 'sorry' test.
* testsuite/libgomp.fortran/non-rectangular-loop-6.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/linear-2.f90: Update dump to remove
the additional count variable.
The 'uses_allocators' clause to the 'target' construct accepts predefined
allocators and can also be used to define a new allocator for a target region.
As predefined allocators in GCC do not require special handling, those can and
are ignored after parsing, such that this feature now works. On the other hand,
defining a new allocator will fail for now with a 'sorry, unimplemented'.
Note that both the OpenMP 5.0/5.1 and 5.2 syntax for uses_allocators
is supported by this commit.
2023-07-17 Tobias Burnus <tobias@codesoucery.com>
Chung-Lin Tang <cltang@codesourcery.com>
gcc/fortran/ChangeLog:
* dump-parse-tree.cc (show_omp_namelist, show_omp_clauses): Dump
uses_allocators clause.
* gfortran.h (gfc_free_omp_namelist): Add memspace_sym to u union
and traits_sym to u2 union.
(OMP_LIST_USES_ALLOCATORS): New enum value.
(gfc_free_omp_namelist): Add 'bool free_mem_traits_space' arg.
* match.cc (gfc_free_omp_namelist): Likewise.
* openmp.cc (gfc_free_omp_clauses, gfc_match_omp_variable_list,
gfc_match_omp_to_link, gfc_match_omp_doacross_sink,
gfc_match_omp_clause_reduction, gfc_match_omp_allocate,
gfc_match_omp_flush): Update call.
(gfc_match_omp_clauses): Likewise. Parse uses_allocators clause.
(gfc_match_omp_clause_uses_allocators): New.
(enum omp_mask2): Add new OMP_CLAUSE_USES_ALLOCATORS.
(OMP_TARGET_CLAUSES): Accept it.
(resolve_omp_clauses): Resolve uses_allocators clause
* st.cc (gfc_free_statement): Update gfc_free_omp_namelist call.
* trans-openmp.cc (gfc_trans_omp_clauses): Handle
OMP_LIST_USES_ALLOCATORS; fail with sorry unless predefined allocator.
(gfc_split_omp_clauses): Handle uses_allocators.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/uses_allocators_1.f90: New test.
* testsuite/libgomp.fortran/uses_allocators_2.f90: New test.
Co-authored-by: Chung-Lin Tang <cltang@codesourcery.com>
Add a testcase for 'omp requires unified_address' that is currently supported
by all devices but was not tested for.
libgomp/
PR libgomp/109837
* testsuite/libgomp.c-c++-common/requires-unified-addr-1.c: New test.
* testsuite/libgomp.fortran/requires-unified-addr-1.f90: New test.