Removal of HSA offloading from gcc and libgomp
This patch removes the generation of HSAIL from the compiler, the HSA offloading plugin from libgomp, and the associated tests and infrastructure bits from the respective testsuites.

Apart from removing the obvious files, I removed bits that I found by searching for HSA-related terms and by retracing my steps through the patches that introduced HSA in the first place.

I did not remove everything these patches brought in, for example:

- The mechanism to pass offload-target-specific info from the application to the offloading plugin stays, because the same mechanism is also used to communicate the number of teams and the thread limit to all offload targets.

- The run_func hook in gomp_device_descr stays too, although it is now unused. If some future offload target would like the ability to refuse to offload some functions, it can use it. It is easy to remove as a follow-up if it is considered clutter, though.

- The configure options --with-hsa-runtime=PATH, --with-hsa-runtime-include=PATH and --with-hsa-runtime-lib=PATH remain because GCN uses them too.

- Surprisingly, GOMP_TARGET_ARG_HSA_KERNEL_ATTRIBUTES (a constant from gomp-constants.h) appears in the source of the amdgcn libgomp plugin, although I tend to think that code path is never used, and this patch certainly removes it from the compiler. Nevertheless, it seems to have potential value beyond HSAIL, so I have kept it. It can of course easily be removed in the future if GCN folk abandon it too.

- I assume the constants OFFLOAD_TARGET_TYPE_HSA and GOMP_DEVICE_HSA need to stay indefinitely too, just so that no future offload target picks that number.

- I have kept the dg-require-effective-target offload_device_nonshared_as requirement of the tests which have it.

It is quite probable that I missed some small HSA artifacts, but those should be easy to remove later as we find them.

include/ChangeLog:

2020-07-24  Martin Jambor  <mjambor@suse.cz>

	* gomp-constants.h (GOMP_VERSION_HSA): Remove.

gcc/ChangeLog:

2020-07-24  Martin Jambor  <mjambor@suse.cz>

	* hsa-brig-format.h: Moved to brig/brigfrontend.
	* hsa-brig.c: Removed.
	* hsa-builtins.def: Likewise.
	* hsa-common.c: Likewise.
	* hsa-common.h: Likewise.
	* hsa-dump.c: Likewise.
	* hsa-gen.c: Likewise.
	* hsa-regalloc.c: Likewise.
	* ipa-hsa.c: Likewise.
	* omp-grid.c: Likewise.
	* omp-grid.h: Likewise.
	* Makefile.in (BUILTINS_DEF): Remove hsa-builtins.def.
	(OBJS): Remove hsa-common.o, hsa-gen.o, hsa-regalloc.o, hsa-brig.o,
	hsa-dump.o, ipa-hsa.o and omp-grid.o.
	(GTFILES): Removed hsa-common.c and omp-expand.c.
	* builtins.def: Remove processing of hsa-builtins.def.
	(DEF_HSA_BUILTIN): Remove.
	* common.opt (flag_disable_hsa): Remove.
	(-Whsa): Ignore.
	* config.in (ENABLE_HSA): Removed.
	* configure.ac: Removed handling configuration for hsa offloading.
	(ENABLE_HSA): Removed.
	* configure: Regenerated.
	* doc/install.texi (--enable-offload-targets): Remove hsa from the
	example.
	(--with-hsa-runtime): Reword to reference any HSA run-time, not
	specifically HSA offloading.
	* doc/invoke.texi (Option Summary): Remove -Whsa.
	(Warning Options): Likewise.
	(Optimize Options): Remove hsa-gen-debug-stores.
	* doc/passes.texi (Regular IPA passes): Remove section on IPA HSA
	pass.
	* gimple-low.c (lower_stmt): Remove GIMPLE_OMP_GRID_BODY case.
	* gimple-pretty-print.c (dump_gimple_omp_for): Likewise.
	(dump_gimple_omp_block): Likewise.
	(pp_gimple_stmt_1): Likewise.
	* gimple-walk.c (walk_gimple_stmt): Likewise.
	* gimple.c (gimple_build_omp_grid_body): Removed function.
	(gimple_copy): Remove GIMPLE_OMP_GRID_BODY case.
	* gimple.def (GIMPLE_OMP_GRID_BODY): Removed.
	* gimple.h (gf_mask): Removed GF_OMP_PARALLEL_GRID_PHONY,
	GF_OMP_FOR_KIND_GRID_LOOP, GF_OMP_FOR_GRID_PHONY,
	GF_OMP_FOR_GRID_INTRA_GROUP, GF_OMP_FOR_GRID_GROUP_ITER and
	GF_OMP_TEAMS_GRID_PHONY.  Renumbered GF_OMP_FOR_KIND_SIMD and
	GF_OMP_TEAMS_HOST.
	(gimple_build_omp_grid_body): Removed declaration.
	(gimple_has_substatements): Remove GIMPLE_OMP_GRID_BODY case.
	(gimple_omp_for_grid_phony): Removed.
	(gimple_omp_for_set_grid_phony): Likewise.
	(gimple_omp_for_grid_intra_group): Likewise.
	(gimple_omp_for_set_grid_intra_group): Likewise.
	(gimple_omp_for_grid_group_iter): Likewise.
	(gimple_omp_for_set_grid_group_iter): Likewise.
	(gimple_omp_parallel_grid_phony): Likewise.
	(gimple_omp_parallel_set_grid_phony): Likewise.
	(gimple_omp_teams_grid_phony): Likewise.
	(gimple_omp_teams_set_grid_phony): Likewise.
	(CASE_GIMPLE_OMP): Remove GIMPLE_OMP_GRID_BODY case.
	* lto-section-in.c (lto_section_name): Removed hsa.
	* lto-streamer.h (lto_section_type): Removed LTO_section_ipa_hsa.
	* lto-wrapper.c (compile_images_for_offload_targets): Remove special
	handling of hsa.
	* omp-expand.c: Do not include hsa-common.h and gt-omp-expand.h.
	(parallel_needs_hsa_kernel_p): Removed.
	(grid_launch_attributes_trees): Likewise.
	(grid_create_kernel_launch_attr_types): Likewise.
	(grid_insert_store_range_dim): Likewise.
	(grid_get_kernel_launch_attributes): Likewise.
	(get_target_arguments): Remove code passing HSA grid sizes.
	(grid_expand_omp_for_loop): Remove.
	(grid_arg_decl_map): Likewise.
	(grid_remap_kernel_arg_accesses): Likewise.
	(grid_expand_target_grid_body): Likewise.
	(expand_omp): Remove call to grid_expand_target_grid_body.
	(omp_make_gimple_edges): Remove GIMPLE_OMP_GRID_BODY case.
	* omp-general.c: Do not include hsa-common.h.
	(omp_maybe_offloaded): Do not check for HSA offloading.
	(omp_context_selector_matches): Likewise.
	* omp-low.c: Do not include hsa-common.h and omp-grid.h.
	(build_outer_var_ref): Remove handling of GIMPLE_OMP_GRID_BODY.
	(scan_sharing_clauses): Remove handling of OMP_CLAUSE__GRIDDIM_.
	(scan_omp_parallel): Remove handling of the phony variant.
	(check_omp_nesting_restrictions): Remove handling of
	GIMPLE_OMP_GRID_BODY and GF_OMP_FOR_KIND_GRID_LOOP.
	(scan_omp_1_stmt): Remove handling of GIMPLE_OMP_GRID_BODY.
	(lower_omp_for_lastprivate): Remove handling of gridified loops.
	(lower_omp_for): Remove phony loop handling.
	(lower_omp_taskreg): Remove phony construct handling.
	(lower_omp_teams): Likewise.
	(lower_omp_grid_body): Removed.
	(lower_omp_1): Remove GIMPLE_OMP_GRID_BODY case.
	(execute_lower_omp): Do not call omp_grid_gridify_all_targets.
	* opts.c (common_handle_option): Do not handle hsa when processing
	OPT_foffload_.
	* params.opt (hsa-gen-debug-stores): Remove.
	* passes.def: Remove pass_ipa_hsa and pass_gen_hsail.
	* timevar.def: Remove TV_IPA_HSA.
	* toplev.c: Do not include hsa-common.h.
	(compile_file): Do not call hsa_output_brig.
	* tree-core.h (enum omp_clause_code): Remove OMP_CLAUSE__GRIDDIM_.
	(tree_omp_clause): Remove union field dimension.
	* tree-nested.c (convert_nonlocal_omp_clauses): Remove the
	OMP_CLAUSE__GRIDDIM_ case.
	(convert_local_omp_clauses): Likewise.
	* tree-pass.h (make_pass_gen_hsail): Remove declaration.
	(make_pass_ipa_hsa): Likewise.
	* tree-pretty-print.c (dump_omp_clause): Remove GIMPLE_OMP_GRID_BODY
	case.
	* tree.c (omp_clause_num_ops): Remove the element corresponding to
	OMP_CLAUSE__GRIDDIM_.
	(omp_clause_code_name): Likewise.
	(walk_tree_1): Remove GIMPLE_OMP_GRID_BODY case.
	* tree.h (OMP_CLAUSE__GRIDDIM__DIMENSION): Remove.
	(OMP_CLAUSE__GRIDDIM__SIZE): Likewise.
	(OMP_CLAUSE__GRIDDIM__GROUP): Likewise.

gcc/fortran/ChangeLog:

2020-07-24  Martin Jambor  <mjambor@suse.cz>

	* f95-lang.c (gfc_init_builtin_functions): Remove processing of
	hsa-builtins.def.

gcc/brig/ChangeLog:

2020-07-24  Martin Jambor  <mjambor@suse.cz>

	* brigfrontend/brig-util.h (hsa_type_packed_p): Declared.
	* brigfrontend/brig-util.cc (hsa_type_packed_p): Moved here from
	removed gcc/hsa-common.c.

libgomp/ChangeLog:

2020-07-24  Martin Jambor  <mjambor@suse.cz>

	* plugin/Makefrag.am: Remove configuration of HSA plugin.
	* aclocal.m4: Regenerated.
	* Makefile.in: Regenerated.
	* config.h.in: Regenerated.
	* configure: Regenerated.
	* plugin/configfrag.ac: Likewise.
	* plugin/hsa_ext_finalize.h: Removed.
	* plugin/plugin-hsa.c: Likewise.
	* testsuite/Makefile.in: Regenerated.
	* testsuite/lib/libgomp.exp
	(offload_target_to_openacc_device_type): Remove hsa case.
	(check_effective_target_hsa_offloading_selected_nocache): Removed.
	(check_effective_target_hsa_offloading_selected): Likewise.
	(libgomp_init): Do not add -Wno-hsa to additional_flags.
	* testsuite/libgomp.hsa.c/alloca-1.c: Removed test.
	* testsuite/libgomp.hsa.c/bitfield-1.c: Likewise.
	* testsuite/libgomp.hsa.c/bits-insns.c: Likewise.
	* testsuite/libgomp.hsa.c/builtins-1.c: Likewise.
	* testsuite/libgomp.hsa.c/c.exp: Likewise.
	* testsuite/libgomp.hsa.c/complex-1.c: Likewise.
	* testsuite/libgomp.hsa.c/complex-align-2.c: Likewise.
	* testsuite/libgomp.hsa.c/formal-actual-args-1.c: Likewise.
	* testsuite/libgomp.hsa.c/function-call-1.c: Likewise.
	* testsuite/libgomp.hsa.c/get-level-1.c: Likewise.
	* testsuite/libgomp.hsa.c/gridify-1.c: Likewise.
	* testsuite/libgomp.hsa.c/gridify-2.c: Likewise.
	* testsuite/libgomp.hsa.c/gridify-3.c: Likewise.
	* testsuite/libgomp.hsa.c/gridify-4.c: Likewise.
	* testsuite/libgomp.hsa.c/memory-operations-1.c: Likewise.
	* testsuite/libgomp.hsa.c/pr69568.c: Likewise.
	* testsuite/libgomp.hsa.c/pr82416.c: Likewise.
	* testsuite/libgomp.hsa.c/rotate-1.c: Likewise.
	* testsuite/libgomp.hsa.c/staticvar.c: Likewise.
	* testsuite/libgomp.hsa.c/switch-1.c: Likewise.
	* testsuite/libgomp.hsa.c/switch-branch-1.c: Likewise.
	* testsuite/libgomp.hsa.c/switch-sbr-2.c: Likewise.
	* testsuite/libgomp.hsa.c/tiling-1.c: Likewise.
	* testsuite/libgomp.hsa.c/tiling-2.c: Likewise.

gcc/testsuite/ChangeLog:

2020-07-24  Martin Jambor  <mjambor@suse.cz>

	* lib/target-supports.exp (check_effective_target_offload_hsa):
	Removed.
	* c-c++-common/gomp/gridify-1.c: Removed test.
	* c-c++-common/gomp/gridify-2.c: Likewise.
	* c-c++-common/gomp/gridify-3.c: Likewise.
	* c-c++-common/gomp/hsa-indirect-call-1.c: Likewise.
	* gfortran.dg/gomp/gridify-1.f90: Likewise.
	* gcc.dg/gomp/gomp.exp: Do not pass -Wno-hsa to tests.
	* g++.dg/gomp/gomp.exp: Likewise.
	* gfortran.dg/gomp/gomp.exp: Likewise.
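
The commit message argues that OFFLOAD_TARGET_TYPE_HSA and GOMP_DEVICE_HSA should stay so that no future offload target reuses their numbers. A minimal sketch of that "retired but reserved" pattern follows. Every name and numeric value here is invented for illustration and is not taken from gomp-constants.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device-type codes.  Names and numbers are assumptions made
   for this sketch only.  */
enum demo_device_type
{
  DEMO_DEVICE_HOST = 2,
  DEMO_DEVICE_PTX = 5,
  DEMO_DEVICE_RETIRED_7 = 7,	/* Former offload target; number kept reserved.  */
  DEMO_DEVICE_GCN = 8
};

/* Keeping the retired enumerator lets the compiler check that a live target
   does not silently reuse its number.  */
static_assert (DEMO_DEVICE_GCN != DEMO_DEVICE_RETIRED_7,
	       "a live target must not reuse a retired code");

/* Return a printable name for CODE, or NULL if no live target uses it.
   The retired code is recognized but deliberately maps to no target.  */
const char *
demo_device_name (int code)
{
  switch (code)
    {
    case DEMO_DEVICE_HOST:
      return "host";
    case DEMO_DEVICE_PTX:
      return "nvptx";
    case DEMO_DEVICE_GCN:
      return "gcn";
    case DEMO_DEVICE_RETIRED_7:
    default:
      return NULL;
    }
}
```

Old binaries that still carry the retired number are then rejected cleanly instead of being mistaken for a newer target.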
parent 9623f61b14
commit c56684fd61
90 changed files with 516 additions and 21260 deletions
gcc/Makefile.in
@@ -933,8 +933,7 @@ FIXED_VALUE_H = fixed-value.h
 RTL_H = $(RTL_BASE_H) $(FLAGS_H) genrtl.h
 READ_MD_H = $(OBSTACK_H) $(HASHTAB_H) read-md.h
 BUILTINS_DEF = builtins.def sync-builtins.def omp-builtins.def \
-	gtm-builtins.def sanitizer.def \
-	hsa-builtins.def
+	gtm-builtins.def sanitizer.def
 INTERNAL_FN_DEF = internal-fn.def
 INTERNAL_FN_H = internal-fn.h $(INTERNAL_FN_DEF)
 TREE_CORE_H = tree-core.h $(CORETYPES_H) all-tree.def tree.def \
@@ -1395,11 +1394,6 @@ OBJS = \
 	haifa-sched.o \
 	hash-map-tests.o \
 	hash-set-tests.o \
-	hsa-common.o \
-	hsa-gen.o \
-	hsa-regalloc.o \
-	hsa-brig.o \
-	hsa-dump.o \
 	hw-doloop.o \
 	hwint.o \
 	ifcvt.o \
@@ -1427,7 +1421,6 @@ OBJS = \
 	ipa-icf.o \
 	ipa-icf-gimple.o \
 	ipa-reference.o \
-	ipa-hsa.o \
 	ipa-ref.o \
 	ipa-utils.o \
 	ipa.o \
@@ -1471,7 +1464,6 @@ OBJS = \
 	omp-offload.o \
 	omp-expand.o \
 	omp-general.o \
-	omp-grid.o \
 	omp-low.o \
 	omp-simd-clone.o \
 	opt-problem.o \
@@ -2619,7 +2611,6 @@ GTFILES = $(CPPLIB_H) $(srcdir)/input.h $(srcdir)/coretypes.h \
   $(srcdir)/tree-profile.c $(srcdir)/tree-nested.c \
   $(srcdir)/omp-offload.h \
   $(srcdir)/omp-offload.c \
-  $(srcdir)/omp-expand.c \
   $(srcdir)/omp-general.c \
   $(srcdir)/omp-low.c \
   $(srcdir)/targhooks.c $(out_file) $(srcdir)/passes.c $(srcdir)/cgraphunit.c \
@@ -2643,7 +2634,6 @@ GTFILES = $(CPPLIB_H) $(srcdir)/input.h $(srcdir)/coretypes.h \
   $(srcdir)/sancov.c \
   $(srcdir)/ipa-devirt.c \
   $(srcdir)/internal-fn.h \
-  $(srcdir)/hsa-common.c \
   $(srcdir)/calls.c \
   $(srcdir)/omp-general.h \
   @all_gtfiles@
gcc/brig/brigfrontend/brig-util.cc
@@ -563,3 +563,12 @@ gccbrig_print_reg_use_info (FILE *dump, const regs_use_index &info)
 	}
     }
 }
+
+/* Return true if TYPE is a packed HSA type. */
+
+bool
+hsa_type_packed_p (BrigType16_t type)
+{
+  return (type & BRIG_TYPE_PACK_MASK) != BRIG_TYPE_PACK_NONE;
+}
+
gcc/brig/brigfrontend/brig-util.h
@@ -115,4 +115,6 @@ gccbrig_type_vector_subparts (const_tree type)
   return TYPE_VECTOR_SUBPARTS (type).to_constant ();
 }
 
+bool hsa_type_packed_p (BrigType16_t type);
+
 #endif
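
The relocated hsa_type_packed_p tests a bit field inside a 16-bit type code. A self-contained sketch of the same test is shown here. The field layout and the DEMO_ constants are assumptions made for illustration; the real BRIG_TYPE_PACK_MASK and BRIG_TYPE_PACK_NONE values come from hsa-brig-format.h and are not reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed demo layout: the low 5 bits hold a base type and the next 2 bits
   hold a packing code, where 0 means "not packed".  */
typedef uint16_t demo_type16_t;

enum
{
  DEMO_TYPE_BASE_MASK = (1 << 5) - 1,
  DEMO_TYPE_PACK_SHIFT = 5,
  DEMO_TYPE_PACK_MASK = ((1 << 2) - 1) << DEMO_TYPE_PACK_SHIFT,
  DEMO_TYPE_PACK_NONE = 0 << DEMO_TYPE_PACK_SHIFT,
  DEMO_TYPE_PACK_32 = 1 << DEMO_TYPE_PACK_SHIFT
};

static_assert ((DEMO_TYPE_BASE_MASK & DEMO_TYPE_PACK_MASK) == 0,
	       "base and pack fields must not overlap");

/* Return true if TYPE carries any packing code other than "none".  */
bool
demo_type_packed_p (demo_type16_t type)
{
  return (type & DEMO_TYPE_PACK_MASK) != DEMO_TYPE_PACK_NONE;
}
```

Comparing the masked value against the "none" code, rather than against zero, keeps the test correct even if the "none" encoding were ever nonzero.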
gcc/builtins.def
@@ -222,19 +222,6 @@ along with GCC; see the file COPYING3. If not see
 		|| flag_tree_parallelize_loops > 1 \
 		|| flag_offload_abi != OFFLOAD_ABI_UNSET))
 
-#undef DEF_HSA_BUILTIN
-#ifdef ENABLE_HSA
-#define DEF_HSA_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
-  DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE, \
-	       false, false, true, ATTRS, false, \
-	       (!flag_disable_hsa))
-#else
-#define DEF_HSA_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
-  DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE, \
-	       false, false, true, ATTRS, false, \
-	       (false))
-#endif
-
 /* Builtin used by the implementation of GNU TM. These
    functions are mapped to the actual implementation of the STM library. */
 #undef DEF_TM_BUILTIN
@@ -1063,9 +1050,6 @@ DEF_GCC_BUILTIN (BUILT_IN_LINE, "LINE", BT_FN_INT, ATTR_NOTHROW_LEAF_LIST)
 /* Offloading and Multi Processing builtins. */
 #include "omp-builtins.def"
 
-/* Heterogeneous Systems Architecture. */
-#include "hsa-builtins.def"
-
 /* GTM builtins. */
 #include "gtm-builtins.def"
 
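
The builtins.def hunks rely on the "X-macro" idiom: a definition list is expanded several times under different meanings of a DEF_ macro, so dropping the macro and the #include removes every generated entry at once. A rough sketch of the idiom follows. The DEMO_ names are hypothetical, and the list is an inline macro rather than a separate .def file so that the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a .def file: one DEMO_BUILTIN (ENUM, NAME) entry per builtin.
   In a real tree this list would live in its own file and be #included.  */
#define DEMO_BUILTIN_LIST \
  DEMO_BUILTIN (DEMO_BUILT_IN_TRAP, "trap") \
  DEMO_BUILTIN (DEMO_BUILT_IN_LINE, "line")

/* First expansion: one enumerator per entry.  */
#define DEMO_BUILTIN(ENUM, NAME) ENUM,
enum demo_builtin { DEMO_BUILTIN_LIST DEMO_BUILT_IN_COUNT };
#undef DEMO_BUILTIN

/* Second expansion: a name table in the same order, with the prefix glued on
   by string-literal concatenation.  */
#define DEMO_BUILTIN(ENUM, NAME) "__builtin_" NAME,
static const char *const demo_builtin_names[] = { DEMO_BUILTIN_LIST };
#undef DEMO_BUILTIN

static_assert (sizeof demo_builtin_names / sizeof demo_builtin_names[0]
	       == DEMO_BUILT_IN_COUNT,
	       "enum and name table must stay in sync");

/* Return the full name of CODE, or NULL if CODE is out of range.  */
const char *
demo_builtin_name (enum demo_builtin code)
{
  return (unsigned) code < DEMO_BUILT_IN_COUNT ? demo_builtin_names[code] : NULL;
}
```

Deleting an entry from the list removes both its enumerator and its table slot, which is the property that makes a removal like this one mechanical.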
gcc/common.opt
@@ -228,10 +228,6 @@ unsigned int flag_sanitize_coverage
 Variable
 bool dump_base_name_prefixed = false
 
-; Flag whether HSA generation has been explicitely disabled
-Variable
-bool flag_disable_hsa = false
-
 ###
 Driver
 
@@ -619,8 +615,8 @@ Common Var(warn_free_nonheap_object) Init(1) Warning
 Warn when attempting to free a non-heap object.
 
 Whsa
-Common Var(warn_hsa) Init(1) Warning
-Warn when a function cannot be expanded to HSAIL.
+Common Ignore Warning
+Does nothing. Preserved for backward compatibility.
 
 Wimplicit-fallthrough
 Common Alias(Wimplicit-fallthrough=,3,0) Warning
gcc/config.in
@@ -181,12 +181,6 @@
 #endif
 
 
-/* Define this to enable support for generating HSAIL. */
-#ifndef USED_FOR_TARGET
-#undef ENABLE_HSA
-#endif
-
-
 /* Define if gcc should always pass --build-id to linker. */
 #ifndef USED_FOR_TARGET
 #undef ENABLE_LD_BUILDID
gcc/configure (54 changed lines, vendored)
@@ -7948,30 +7948,26 @@ fi
 for tgt in `echo $enable_offload_targets | sed 's/,/ /g'`; do
   tgt=`echo $tgt | sed 's/=.*//'`
 
-  if echo "$tgt" | grep "^hsa" > /dev/null ; then
-    enable_hsa=1
-  else
-    enable_offloading=1
-    case "$tgt" in
-      *-intelmic-* | *-intelmicemul-*)
-        omp_device_property=omp-device-properties-i386
-        omp_device_property_tmake_file="${omp_device_property_tmake_file} \$(srcdir)/config/i386/t-omp-device"
-        ;;
-      amdgcn*)
-        omp_device_property=omp-device-properties-gcn
-        omp_device_property_tmake_file="${omp_device_property_tmake_file} \$(srcdir)/config/gcn/t-omp-device"
-        ;;
-      nvptx*)
-        omp_device_property=omp-device-properties-nvptx
-        omp_device_property_tmake_file="${omp_device_property_tmake_file} \$(srcdir)/config/nvptx/t-omp-device"
-        ;;
-      *)
-        as_fn_error $? "unknown offload target specified" "$LINENO" 5
-        ;;
-    esac
-    omp_device_properties="${omp_device_properties} ${tgt}=${omp_device_property}"
-    omp_device_property_deps="${omp_device_property_deps} ${omp_device_property}"
-  fi
+  enable_offloading=1
+  case "$tgt" in
+    *-intelmic-* | *-intelmicemul-*)
+      omp_device_property=omp-device-properties-i386
+      omp_device_property_tmake_file="${omp_device_property_tmake_file} \$(srcdir)/config/i386/t-omp-device"
+      ;;
+    amdgcn*)
+      omp_device_property=omp-device-properties-gcn
+      omp_device_property_tmake_file="${omp_device_property_tmake_file} \$(srcdir)/config/gcn/t-omp-device"
+      ;;
+    nvptx*)
+      omp_device_property=omp-device-properties-nvptx
+      omp_device_property_tmake_file="${omp_device_property_tmake_file} \$(srcdir)/config/nvptx/t-omp-device"
+      ;;
+    *)
+      as_fn_error $? "unknown offload target specified" "$LINENO" 5
+      ;;
+  esac
+  omp_device_properties="${omp_device_properties} ${tgt}=${omp_device_property}"
+  omp_device_property_deps="${omp_device_property_deps} ${omp_device_property}"
 
   if test x"$offload_targets" = x; then
     offload_targets=$tgt
@@ -7997,12 +7993,6 @@ $as_echo "#define ENABLE_OFFLOADING 0" >>confdefs.h
 
 fi
 
-if test x"$enable_hsa" = x1 ; then
-
-$as_echo "#define ENABLE_HSA 1" >>confdefs.h
-
-fi
-
 
 # Check whether --with-multilib-list was given.
 if test "${with_multilib_list+set}" = set; then :
@@ -19023,7 +19013,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 19026 "configure"
+#line 19016 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -19129,7 +19119,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 19132 "configure"
+#line 19122 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
gcc/configure.ac
@@ -1057,30 +1057,26 @@ AC_SUBST(accel_dir_suffix)
 for tgt in `echo $enable_offload_targets | sed 's/,/ /g'`; do
   tgt=`echo $tgt | sed 's/=.*//'`
 
-  if echo "$tgt" | grep "^hsa" > /dev/null ; then
-    enable_hsa=1
-  else
-    enable_offloading=1
-    case "$tgt" in
-      *-intelmic-* | *-intelmicemul-*)
-        omp_device_property=omp-device-properties-i386
-        omp_device_property_tmake_file="${omp_device_property_tmake_file} \$(srcdir)/config/i386/t-omp-device"
-        ;;
-      amdgcn*)
-        omp_device_property=omp-device-properties-gcn
-        omp_device_property_tmake_file="${omp_device_property_tmake_file} \$(srcdir)/config/gcn/t-omp-device"
-        ;;
-      nvptx*)
-        omp_device_property=omp-device-properties-nvptx
-        omp_device_property_tmake_file="${omp_device_property_tmake_file} \$(srcdir)/config/nvptx/t-omp-device"
-        ;;
-      *)
-        AC_MSG_ERROR([unknown offload target specified])
-        ;;
-    esac
-    omp_device_properties="${omp_device_properties} ${tgt}=${omp_device_property}"
-    omp_device_property_deps="${omp_device_property_deps} ${omp_device_property}"
-  fi
+  enable_offloading=1
+  case "$tgt" in
+    *-intelmic-* | *-intelmicemul-*)
+      omp_device_property=omp-device-properties-i386
+      omp_device_property_tmake_file="${omp_device_property_tmake_file} \$(srcdir)/config/i386/t-omp-device"
+      ;;
+    amdgcn*)
+      omp_device_property=omp-device-properties-gcn
+      omp_device_property_tmake_file="${omp_device_property_tmake_file} \$(srcdir)/config/gcn/t-omp-device"
+      ;;
+    nvptx*)
+      omp_device_property=omp-device-properties-nvptx
+      omp_device_property_tmake_file="${omp_device_property_tmake_file} \$(srcdir)/config/nvptx/t-omp-device"
+      ;;
+    *)
+      AC_MSG_ERROR([unknown offload target specified])
+      ;;
+  esac
+  omp_device_properties="${omp_device_properties} ${tgt}=${omp_device_property}"
+  omp_device_property_deps="${omp_device_property_deps} ${omp_device_property}"
 
   if test x"$offload_targets" = x; then
     offload_targets=$tgt
@@ -1101,11 +1097,6 @@ else
     [Define this to enable support for offloading.])
 fi
 
-if test x"$enable_hsa" = x1 ; then
-  AC_DEFINE(ENABLE_HSA, 1,
-    [Define this to enable support for generating HSAIL.])
-fi
-
 AC_ARG_WITH(multilib-list,
 [AS_HELP_STRING([--with-multilib-list], [select multilibs (AArch64, SH and x86-64 only)])],
 :,
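
With the special `hsa` branch gone, every entry of --enable-offload-targets goes straight into the `case` statement, so a bare `hsa` now reaches the catch-all arm and is reported as an unknown offload target. The C sketch below mimics that classification for illustration only. The function name is hypothetical and the real logic remains the shell code in configure.ac.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the device-property name for offload target TGT, or NULL when the
   target is unknown.  TGT may carry a "=path" suffix, which is ignored in
   the same way the configure loop strips it with sed.  */
const char *
demo_offload_property (const char *tgt)
{
  char name[128];
  size_t len;

  assert (tgt != NULL);
  len = strcspn (tgt, "=");	/* Length up to the first '=' or the end.  */
  if (len >= sizeof name)
    return NULL;
  memcpy (name, tgt, len);
  name[len] = '\0';

  /* Shell patterns *-intelmic-* and *-intelmicemul-*.  */
  if (strstr (name, "-intelmic-") || strstr (name, "-intelmicemul-"))
    return "omp-device-properties-i386";
  /* Shell pattern amdgcn*.  */
  if (strncmp (name, "amdgcn", 6) == 0)
    return "omp-device-properties-gcn";
  /* Shell pattern nvptx*.  */
  if (strncmp (name, "nvptx", 5) == 0)
    return "omp-device-properties-nvptx";
  /* Catch-all arm: "unknown offload target specified".  */
  return NULL;
}
```

For example, `demo_offload_property ("nvptx-none")` selects the nvptx properties, while `demo_offload_property ("hsa")` yields NULL, matching the error arm.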
gcc/doc/install.texi
@@ -2194,22 +2194,18 @@ specifying paths @var{path1}, @dots{}, @var{pathN}.
 
 @smallexample
 % @var{srcdir}/configure \
-    --enable-offload-targets=x86_64-intelmicemul-linux-gnu=/path/to/x86_64/compiler,nvptx-none,hsa
+    --enable-offload-targets=x86_64-intelmicemul-linux-gnu=/path/to/x86_64/compiler,nvptx-none
 @end smallexample
 
-If @samp{hsa} is specified as one of the targets, the compiler will be
-built with support for HSA GPU accelerators. Because the same
-compiler will emit the accelerator code, no path should be specified.
-
 @item --with-hsa-runtime=@var{pathname}
 @itemx --with-hsa-runtime-include=@var{pathname}
 @itemx --with-hsa-runtime-lib=@var{pathname}
 
-If you configure GCC with HSA offloading but do not have the HSA
-run-time library installed in a standard location then you can
-explicitly specify the directory where they are installed. The
-@option{--with-hsa-runtime=@/@var{hsainstalldir}} option is a
-shorthand for
+If you configure GCC with offloading which uses an HSA run-time such as
+AMDGCN but do not have the HSA run-time library installed in a standard
+location then you can explicitly specify the directory where they are
+installed. The @option{--with-hsa-runtime=@/@var{hsainstalldir}} option
+is a shorthand for
 @option{--with-hsa-runtime-lib=@/@var{hsainstalldir}/lib} and
 @option{--with-hsa-runtime-include=@/@var{hsainstalldir}/include}.
 
gcc/doc/invoke.texi
@@ -332,7 +332,7 @@ Objective-C and Objective-C++ Dialects}.
 -Wformat-security -Wformat-signedness -Wformat-truncation=@var{n} @gol
 -Wformat-y2k -Wframe-address @gol
 -Wframe-larger-than=@var{byte-size} -Wno-free-nonheap-object @gol
--Wno-hsa -Wno-if-not-aligned -Wno-ignored-attributes @gol
+-Wno-if-not-aligned -Wno-ignored-attributes @gol
 -Wignored-qualifiers -Wno-incompatible-pointer-types @gol
 -Wimplicit -Wimplicit-fallthrough -Wimplicit-fallthrough=@var{n} @gol
 -Wno-implicit-function-declaration -Wno-implicit-int @gol
@@ -8591,12 +8591,6 @@ Suppress warnings when a positional initializer is used to initialize
 a structure that has been marked with the @code{designated_init}
 attribute.
 
-@item -Wno-hsa
-@opindex Whsa
-@opindex Wno-hsa
-Do not warn when HSAIL cannot be emitted for the compiled function or
-OpenMP construct. These warnings are enabled by default.
-
 @end table
 
 @node Static Analyzer Options
@@ -13393,12 +13387,6 @@ Maximum depth of recursion when querying properties of SSA names in things
 like fold routines. One level of recursion corresponds to following a
 use-def chain.
 
-@item hsa-gen-debug-stores
-Enable emission of special debug stores within HSA kernels which are
-then read and reported by libgomp plugin. Generation of these stores
-is disabled by default, use @option{--param hsa-gen-debug-stores=1} to
-enable it.
-
 @item max-speculative-devirt-maydefs
 The maximum number of may-defs we analyze when looking for a must-def
 specifying the dynamic type of an object that invokes a virtual call
gcc/doc/passes.texi
@@ -360,13 +360,6 @@ target doesn't support constructors and destructors natively. The
 pass is located in @file{ipa.c} and is described by
 @code{pass_ipa_cdtor_merge}.
 
-@item IPA HSA
-
-This pass is part of the GCC support for HSA (Heterogeneous System
-Architecture) accelerators. It is responsible for creation of HSA
-clones and emitting HSAIL instructions for them. It is located in
-@file{ipa-hsa.c} and is described by @code{pass_ipa_hsa}.
-
 @item IPA function summary
 
 This pass provides function analysis for inter-procedural passes.
gcc/fortran/f95-lang.c
@@ -1238,17 +1238,6 @@ gfc_init_builtin_functions (void)
 #undef DEF_GOMP_BUILTIN
     }
 
-#ifdef ENABLE_HSA
-  if (!flag_disable_hsa)
-    {
-#undef DEF_HSA_BUILTIN
-#define DEF_HSA_BUILTIN(code, name, type, attr) \
-      gfc_define_builtin ("__builtin_" name, builtin_types[type], \
-			  code, name, attr);
-#include "../hsa-builtins.def"
-    }
-#endif
-
   gfc_define_builtin ("__builtin_trap", builtin_types[BT_FN_VOID],
 		      BUILT_IN_TRAP, NULL, ATTR_NOTHROW_LEAF_LIST);
   TREE_THIS_VOLATILE (builtin_decl_explicit (BUILT_IN_TRAP)) = 1;
gcc/gimple-low.c
@@ -393,7 +393,6 @@ lower_stmt (gimple_stmt_iterator *gsi, struct lower_data *data)
     case GIMPLE_OMP_TASK:
     case GIMPLE_OMP_TARGET:
     case GIMPLE_OMP_TEAMS:
-    case GIMPLE_OMP_GRID_BODY:
       data->cannot_fallthru = false;
       lower_omp_directive (gsi, data);
       data->cannot_fallthru = false;
gcc/gimple-pretty-print.c
@@ -1498,9 +1498,6 @@ dump_gimple_omp_for (pretty_printer *buffer, const gomp_for *gs, int spc,
     case GF_OMP_FOR_KIND_SIMD:
       pp_string (buffer, "#pragma omp simd");
       break;
-    case GF_OMP_FOR_KIND_GRID_LOOP:
-      pp_string (buffer, "#pragma omp for grid_loop");
-      break;
     default:
       gcc_unreachable ();
     }
@@ -1836,9 +1833,6 @@ dump_gimple_omp_block (pretty_printer *buffer, const gimple *gs, int spc,
     case GIMPLE_OMP_SECTION:
       pp_string (buffer, "#pragma omp section");
       break;
-    case GIMPLE_OMP_GRID_BODY:
-      pp_string (buffer, "#pragma omp gridified body");
-      break;
     default:
       gcc_unreachable ();
     }
@@ -2703,7 +2697,6 @@ pp_gimple_stmt_1 (pretty_printer *buffer, const gimple *gs, int spc,
 
     case GIMPLE_OMP_MASTER:
     case GIMPLE_OMP_SECTION:
-    case GIMPLE_OMP_GRID_BODY:
       dump_gimple_omp_block (buffer, gs, spc, flags);
       break;
 
gcc/gimple-walk.c
@@ -668,7 +668,6 @@ walk_gimple_stmt (gimple_stmt_iterator *gsi, walk_stmt_fn callback_stmt,
     case GIMPLE_OMP_SINGLE:
     case GIMPLE_OMP_TARGET:
     case GIMPLE_OMP_TEAMS:
-    case GIMPLE_OMP_GRID_BODY:
       ret = walk_gimple_seq_mod (gimple_omp_body_ptr (stmt), callback_stmt,
 				 callback_op, wi);
       if (ret)
gcc/gimple.c (15 changed lines)
@@ -1035,20 +1035,6 @@ gimple_build_omp_master (gimple_seq body)
   return p;
 }
 
-/* Build a GIMPLE_OMP_GRID_BODY statement.
-
-   BODY is the sequence of statements to be executed by the kernel. */
-
-gimple *
-gimple_build_omp_grid_body (gimple_seq body)
-{
-  gimple *p = gimple_alloc (GIMPLE_OMP_GRID_BODY, 0);
-  if (body)
-    gimple_omp_set_body (p, body);
-
-  return p;
-}
-
 /* Build a GIMPLE_OMP_TASKGROUP statement.
 
    BODY is the sequence of statements to be executed by the taskgroup
@@ -2018,7 +2004,6 @@ gimple_copy (gimple *stmt)
 
 	case GIMPLE_OMP_SECTION:
 	case GIMPLE_OMP_MASTER:
-	case GIMPLE_OMP_GRID_BODY:
 	copy_omp_body:
 	  new_seq = gimple_seq_copy (gimple_omp_body (stmt));
 	  gimple_omp_set_body (copy, new_seq);
gcc/gimple.def
@@ -384,10 +384,6 @@ DEFGSCODE(GIMPLE_OMP_TEAMS, "gimple_omp_teams", GSS_OMP_PARALLEL_LAYOUT)
    CLAUSES is an OMP_CLAUSE chain holding the associated clauses. */
 DEFGSCODE(GIMPLE_OMP_ORDERED, "gimple_omp_ordered", GSS_OMP_SINGLE_LAYOUT)
 
-/* GIMPLE_OMP_GRID_BODY <BODY> represents a parallel loop lowered for execution
-   on a GPU. It is an artificial statement created by omp lowering. */
-DEFGSCODE(GIMPLE_OMP_GRID_BODY, "gimple_omp_gpukernel", GSS_OMP)
-
 /* GIMPLE_PREDICT <PREDICT, OUTCOME> specifies a hint for branch prediction.
 
    PREDICT is one of the predictors from predict.def.
gcc/gimple.h (127 changed lines)
@@ -150,7 +150,6 @@ enum gf_mask {
     GF_CALL_BY_DESCRIPTOR	= 1 << 10,
     GF_CALL_NOCF_CHECK		= 1 << 11,
     GF_OMP_PARALLEL_COMBINED	= 1 << 0,
-    GF_OMP_PARALLEL_GRID_PHONY = 1 << 1,
     GF_OMP_TASK_TASKLOOP	= 1 << 0,
     GF_OMP_TASK_TASKWAIT	= 1 << 1,
     GF_OMP_FOR_KIND_MASK	= (1 << 3) - 1,
@@ -158,17 +157,9 @@ enum gf_mask {
     GF_OMP_FOR_KIND_DISTRIBUTE	= 1,
     GF_OMP_FOR_KIND_TASKLOOP	= 2,
     GF_OMP_FOR_KIND_OACC_LOOP	= 4,
-    GF_OMP_FOR_KIND_GRID_LOOP = 5,
-    GF_OMP_FOR_KIND_SIMD	= 6,
+    GF_OMP_FOR_KIND_SIMD	= 5,
     GF_OMP_FOR_COMBINED		= 1 << 3,
     GF_OMP_FOR_COMBINED_INTO	= 1 << 4,
-    /* The following flag must not be used on GF_OMP_FOR_KIND_GRID_LOOP loop
-       statements. */
-    GF_OMP_FOR_GRID_PHONY	= 1 << 5,
-    /* The following two flags should only be set on GF_OMP_FOR_KIND_GRID_LOOP
-       loop statements. */
-    GF_OMP_FOR_GRID_INTRA_GROUP	= 1 << 5,
-    GF_OMP_FOR_GRID_GROUP_ITER	= 1 << 6,
     GF_OMP_TARGET_KIND_MASK	= (1 << 4) - 1,
     GF_OMP_TARGET_KIND_REGION	= 0,
     GF_OMP_TARGET_KIND_DATA	= 1,
@@ -183,8 +174,7 @@ enum gf_mask {
     GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA = 10,
     GF_OMP_TARGET_KIND_OACC_DECLARE = 11,
     GF_OMP_TARGET_KIND_OACC_HOST_DATA = 12,
-    GF_OMP_TEAMS_GRID_PHONY	= 1 << 0,
-    GF_OMP_TEAMS_HOST		= 1 << 1,
+    GF_OMP_TEAMS_HOST		= 1 << 0,
 
     /* True on an GIMPLE_OMP_RETURN statement if the return does not require
        a thread synchronization via some sort of barrier. The exact barrier
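
In the gf_mask hunks, a loop "kind" occupies the low three bits of the statement subcode (GF_OMP_FOR_KIND_MASK) while flags such as GF_OMP_FOR_COMBINED occupy separate higher bits, so a kind value can be renumbered (here GF_OMP_FOR_KIND_SIMD from 6 to 5) without disturbing any flag. The sketch below illustrates that layout with invented DEMO_ names. It is not GCC's accessor API.

```c
#include <assert.h>
#include <stdbool.h>

/* Demo subcode layout: the low three bits hold a loop kind, higher bits are
   independent flags.  All names are assumptions made for this sketch.  */
enum demo_for_mask
{
  DEMO_FOR_KIND_MASK = (1 << 3) - 1,
  DEMO_FOR_KIND_FOR = 0,
  DEMO_FOR_KIND_SIMD = 5,
  DEMO_FOR_COMBINED = 1 << 3,
  DEMO_FOR_COMBINED_INTO = 1 << 4
};

static_assert ((DEMO_FOR_KIND_SIMD & ~DEMO_FOR_KIND_MASK) == 0,
	       "every kind value must fit inside the kind mask");

/* Extract the kind field from SUBCODE.  */
unsigned
demo_for_kind (unsigned subcode)
{
  return subcode & DEMO_FOR_KIND_MASK;
}

/* Return SUBCODE with its kind field replaced by KIND; flags are kept.  */
unsigned
demo_for_set_kind (unsigned subcode, unsigned kind)
{
  assert ((kind & ~(unsigned) DEMO_FOR_KIND_MASK) == 0);
  return (subcode & ~(unsigned) DEMO_FOR_KIND_MASK) | kind;
}

/* Return SUBCODE with FLAG set or cleared according to VALUE, the same
   set-or-clear pattern used by the removed *_set_grid_phony helpers.  */
unsigned
demo_set_flag (unsigned subcode, unsigned flag, bool value)
{
  return value ? (subcode | flag) : (subcode & ~flag);
}
```

Because the two fields never overlap, setting or clearing a flag leaves the kind readable through the mask, and changing the kind leaves the flags intact.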
@@ -1559,7 +1549,6 @@ gomp_task *gimple_build_omp_task (gimple_seq, tree, tree, tree, tree,
 				       tree, tree);
 gimple *gimple_build_omp_section (gimple_seq);
 gimple *gimple_build_omp_master (gimple_seq);
-gimple *gimple_build_omp_grid_body (gimple_seq);
 gimple *gimple_build_omp_taskgroup (gimple_seq, tree);
 gomp_continue *gimple_build_omp_continue (tree, tree);
 gomp_ordered *gimple_build_omp_ordered (gimple_seq, tree);
@@ -1830,7 +1819,6 @@ gimple_has_substatements (gimple *g)
     case GIMPLE_OMP_CRITICAL:
     case GIMPLE_WITH_CLEANUP_EXPR:
     case GIMPLE_TRANSACTION:
-    case GIMPLE_OMP_GRID_BODY:
       return true;
 
     default:
@@ -5440,76 +5428,6 @@ gimple_omp_for_set_pre_body (gimple *gs, gimple_seq pre_body)
   omp_for_stmt->pre_body = pre_body;
 }
 
-/* Return the kernel_phony of OMP_FOR statement. */
-
-static inline bool
-gimple_omp_for_grid_phony (const gomp_for *omp_for)
-{
-  gcc_checking_assert (gimple_omp_for_kind (omp_for)
-		       != GF_OMP_FOR_KIND_GRID_LOOP);
-  return (gimple_omp_subcode (omp_for) & GF_OMP_FOR_GRID_PHONY) != 0;
-}
-
-/* Set kernel_phony flag of OMP_FOR to VALUE. */
-
-static inline void
-gimple_omp_for_set_grid_phony (gomp_for *omp_for, bool value)
-{
-  gcc_checking_assert (gimple_omp_for_kind (omp_for)
-		       != GF_OMP_FOR_KIND_GRID_LOOP);
-  if (value)
-    omp_for->subcode |= GF_OMP_FOR_GRID_PHONY;
-  else
-    omp_for->subcode &= ~GF_OMP_FOR_GRID_PHONY;
-}
-
-/* Return the kernel_intra_group of a GRID_LOOP OMP_FOR statement. */
-
-static inline bool
-gimple_omp_for_grid_intra_group (const gomp_for *omp_for)
-{
-  gcc_checking_assert (gimple_omp_for_kind (omp_for)
-		       == GF_OMP_FOR_KIND_GRID_LOOP);
-  return (gimple_omp_subcode (omp_for) & GF_OMP_FOR_GRID_INTRA_GROUP) != 0;
-}
-
-/* Set kernel_intra_group flag of OMP_FOR to VALUE. */
-
-static inline void
-gimple_omp_for_set_grid_intra_group (gomp_for *omp_for, bool value)
-{
-  gcc_checking_assert (gimple_omp_for_kind (omp_for)
-		       == GF_OMP_FOR_KIND_GRID_LOOP);
-  if (value)
-    omp_for->subcode |= GF_OMP_FOR_GRID_INTRA_GROUP;
-  else
-    omp_for->subcode &= ~GF_OMP_FOR_GRID_INTRA_GROUP;
-}
-
-/* Return true if iterations of a grid OMP_FOR statement correspond to HSA
-   groups. */
-
-static inline bool
-gimple_omp_for_grid_group_iter (const gomp_for *omp_for)
-{
-  gcc_checking_assert (gimple_omp_for_kind (omp_for)
-		       == GF_OMP_FOR_KIND_GRID_LOOP);
-  return (gimple_omp_subcode (omp_for) & GF_OMP_FOR_GRID_GROUP_ITER) != 0;
-}
-
-/* Set group_iter flag of OMP_FOR to VALUE. */
-
-static inline void
-gimple_omp_for_set_grid_group_iter (gomp_for *omp_for, bool value)
-{
-  gcc_checking_assert (gimple_omp_for_kind (omp_for)
-		       == GF_OMP_FOR_KIND_GRID_LOOP);
-  if (value)
-    omp_for->subcode |= GF_OMP_FOR_GRID_GROUP_ITER;
-  else
-    omp_for->subcode &= ~GF_OMP_FOR_GRID_GROUP_ITER;
-}
-
 /* Return the clauses associated with OMP_PARALLEL GS. */
 
 static inline tree
|
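The grid accessors removed from gimple.h all share one pattern: a boolean property is stored as a single bit in the statement's subcode, read with a mask test, and written with `|=` or `&= ~`. The following is a minimal standalone sketch of that pattern, not GCC code. `stmt_sketch`, `get_flag`, `set_flag` and the two flag constants are hypothetical names, and the bit positions are made up for illustration; the real `GF_OMP_*` values are not shown in this patch excerpt.

```cpp
#include <cassert>

// Hypothetical flag bits; the real GF_OMP_* values are defined in gimple.h.
constexpr unsigned FLAG_GRID_PHONY = 1u << 0;
constexpr unsigned FLAG_GRID_INTRA_GROUP = 1u << 1;

// Hypothetical stand-in for a GIMPLE statement that carries a subcode word.
struct stmt_sketch
{
  unsigned subcode = 0;
};

// Return true if FLAG is set in the subcode of S.
inline bool
get_flag (const stmt_sketch &s, unsigned flag)
{
  return (s.subcode & flag) != 0;
}

// Set or clear FLAG in the subcode of S without touching other bits.
inline void
set_flag (stmt_sketch &s, unsigned flag, bool value)
{
  assert (flag != 0);
  if (value)
    s.subcode |= flag;
  else
    s.subcode &= ~flag;
}
```

Because each flag occupies its own bit, setting or clearing one flag leaves the others unchanged, which is why the removed setters could be deleted independently of the remaining subcode users.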
@@ -5595,25 +5513,6 @@ gimple_omp_parallel_set_data_arg (gomp_parallel *omp_parallel_stmt,
   omp_parallel_stmt->data_arg = data_arg;
 }
 
-/* Return the kernel_phony flag of OMP_PARALLEL_STMT. */
-
-static inline bool
-gimple_omp_parallel_grid_phony (const gomp_parallel *stmt)
-{
-  return (gimple_omp_subcode (stmt) & GF_OMP_PARALLEL_GRID_PHONY) != 0;
-}
-
-/* Set kernel_phony flag of OMP_PARALLEL_STMT to VALUE. */
-
-static inline void
-gimple_omp_parallel_set_grid_phony (gomp_parallel *stmt, bool value)
-{
-  if (value)
-    stmt->subcode |= GF_OMP_PARALLEL_GRID_PHONY;
-  else
-    stmt->subcode &= ~GF_OMP_PARALLEL_GRID_PHONY;
-}
-
 /* Return the clauses associated with OMP_TASK GS. */
 
 static inline tree
@@ -6165,25 +6064,6 @@ gimple_omp_teams_set_data_arg (gomp_teams *omp_teams_stmt, tree data_arg)
   omp_teams_stmt->data_arg = data_arg;
 }
 
-/* Return the kernel_phony flag of an OMP_TEAMS_STMT. */
-
-static inline bool
-gimple_omp_teams_grid_phony (const gomp_teams *omp_teams_stmt)
-{
-  return (gimple_omp_subcode (omp_teams_stmt) & GF_OMP_TEAMS_GRID_PHONY) != 0;
-}
-
-/* Set kernel_phony flag of an OMP_TEAMS_STMT to VALUE. */
-
-static inline void
-gimple_omp_teams_set_grid_phony (gomp_teams *omp_teams_stmt, bool value)
-{
-  if (value)
-    omp_teams_stmt->subcode |= GF_OMP_TEAMS_GRID_PHONY;
-  else
-    omp_teams_stmt->subcode &= ~GF_OMP_TEAMS_GRID_PHONY;
-}
-
 /* Return the host flag of an OMP_TEAMS_STMT. */
 
 static inline bool
@@ -6547,8 +6427,7 @@ gimple_return_set_retval (greturn *gs, tree retval)
     case GIMPLE_OMP_RETURN: \
     case GIMPLE_OMP_ATOMIC_LOAD: \
     case GIMPLE_OMP_ATOMIC_STORE: \
-    case GIMPLE_OMP_CONTINUE: \
-    case GIMPLE_OMP_GRID_BODY
+    case GIMPLE_OMP_CONTINUE
 
 static inline bool
 is_gimple_omp (const gimple *stmt)
2612
gcc/hsa-brig.c
File diff suppressed because it is too large
@@ -1,39 +0,0 @@
/* This file contains the definitions and documentation for the
   Offloading and Multi Processing builtins used in the GNU compiler.
   Copyright (C) 2005-2020 Free Software Foundation, Inc.

This file is part of GCC.

GCC is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 3, or (at your option) any later
version.

GCC is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3. If not see
<http://www.gnu.org/licenses/>. */

/* Before including this file, you should define a macro:

     DEF_HSA_BUILTIN (ENUM, NAME, TYPE, ATTRS)

   See builtins.def for details. */

/* The reason why they aren't in gcc/builtins.def is that the Fortran front end
   doesn't source those. */

DEF_HSA_BUILTIN (BUILT_IN_HSA_WORKGROUPID, "hsa_workgroupid",
                 BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
DEF_HSA_BUILTIN (BUILT_IN_HSA_WORKITEMID, "hsa_workitemid",
                 BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
DEF_HSA_BUILTIN (BUILT_IN_HSA_WORKITEMABSID, "hsa_workitemabsid",
                 BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
DEF_HSA_BUILTIN (BUILT_IN_HSA_GRIDSIZE, "hsa_gridsize",
                 BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
DEF_HSA_BUILTIN (BUILT_IN_HSA_CURRENTWORKGROUPSIZE, "hsa_currentworkgroupsize",
                 BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
996
gcc/hsa-common.c
@@ -1,996 +0,0 @@
|
|||
/* Implementation of commonly needed HSAIL related functions and methods.
|
||||
Copyright (C) 2013-2020 Free Software Foundation, Inc.
|
||||
Contributed by Martin Jambor <mjambor@suse.cz> and
|
||||
Martin Liska <mliska@suse.cz>.
|
||||
|
||||
This file is part of GCC.
|
||||
|
||||
GCC is free software; you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
the Free Software Foundation; either version 3, or (at your option)
|
||||
any later version.
|
||||
|
||||
GCC is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
GNU General Public License for more details.
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with GCC; see the file COPYING3. If not see
|
||||
<http://www.gnu.org/licenses/>. */
|
||||
|
||||
#include "config.h"
|
||||
#include "system.h"
|
||||
#include "coretypes.h"
|
||||
#include "tm.h"
|
||||
#include "is-a.h"
|
||||
#include "hash-set.h"
|
||||
#include "hash-map.h"
|
||||
#include "vec.h"
|
||||
#include "tree.h"
|
||||
#include "dumpfile.h"
|
||||
#include "gimple-pretty-print.h"
|
||||
#include "diagnostic-core.h"
|
||||
#include "alloc-pool.h"
|
||||
#include "cgraph.h"
|
||||
#include "print-tree.h"
|
||||
#include "stringpool.h"
|
||||
#include "symbol-summary.h"
|
||||
#include "hsa-common.h"
|
||||
#include "internal-fn.h"
|
||||
#include "ctype.h"
|
||||
#include "builtins.h"
|
||||
#include "stringpool.h"
|
||||
#include "attribs.h"
|
||||
|
||||
/* Structure containing intermediate HSA representation of the generated
|
||||
function. */
|
||||
class hsa_function_representation *hsa_cfun;
|
||||
|
||||
/* Element of the mapping vector between a host decl and an HSA kernel. */
|
||||
|
||||
struct GTY(()) hsa_decl_kernel_map_element
|
||||
{
|
||||
/* The decl of the host function. */
|
||||
tree decl;
|
||||
/* Name of the HSA kernel in BRIG. */
|
||||
char * GTY((skip)) name;
|
||||
/* Size of OMP data, if the kernel contains a kernel dispatch. */
|
||||
unsigned omp_data_size;
|
||||
/* True if the function is a gridified kernel. */
|
||||
bool gridified_kernel_p;
|
||||
};
|
||||
|
||||
/* Mapping between decls and corresponding HSA kernels in this compilation
|
||||
unit. */
|
||||
|
||||
static GTY (()) vec<hsa_decl_kernel_map_element, va_gc>
|
||||
*hsa_decl_kernel_mapping;
|
||||
|
||||
/* Mapping between decls and corresponding HSA kernels
|
||||
called by the function. */
|
||||
hash_map <tree, vec <const char *> *> *hsa_decl_kernel_dependencies;
|
||||
|
||||
/* Hash function to lookup a symbol for a decl. */
|
||||
hash_table <hsa_noop_symbol_hasher> *hsa_global_variable_symbols;
|
||||
|
||||
/* HSA summaries. */
|
||||
hsa_summary_t *hsa_summaries = NULL;
|
||||
|
||||
/* HSA number of threads. */
|
||||
hsa_symbol *hsa_num_threads = NULL;
|
||||
|
||||
/* HSA function that cannot be expanded to HSAIL. */
|
||||
hash_set <tree> *hsa_failed_functions = NULL;
|
||||
|
||||
/* True if compilation unit-wide data are already allocated and initialized. */
|
||||
static bool compilation_unit_data_initialized;
|
||||
|
||||
/* Return true if FNDECL represents an HSA-callable function. */
|
||||
|
||||
bool
|
||||
hsa_callable_function_p (tree fndecl)
|
||||
{
|
||||
return (lookup_attribute ("omp declare target", DECL_ATTRIBUTES (fndecl))
|
||||
&& !lookup_attribute ("oacc function", DECL_ATTRIBUTES (fndecl)));
|
||||
}
|
||||
|
||||
/* Allocate HSA structures that are used when dealing with different
|
||||
functions. */
|
||||
|
||||
void
|
||||
hsa_init_compilation_unit_data (void)
|
||||
{
|
||||
if (compilation_unit_data_initialized)
|
||||
return;
|
||||
|
||||
compilation_unit_data_initialized = true;
|
||||
|
||||
hsa_global_variable_symbols = new hash_table <hsa_noop_symbol_hasher> (8);
|
||||
hsa_failed_functions = new hash_set <tree> ();
|
||||
hsa_emitted_internal_decls = new hash_table <hsa_internal_fn_hasher> (2);
|
||||
}
|
||||
|
||||
/* Free data structures that are used when dealing with different
|
||||
functions. */
|
||||
|
||||
void
|
||||
hsa_deinit_compilation_unit_data (void)
|
||||
{
|
||||
gcc_assert (compilation_unit_data_initialized);
|
||||
|
||||
delete hsa_failed_functions;
|
||||
delete hsa_emitted_internal_decls;
|
||||
|
||||
for (hash_table <hsa_noop_symbol_hasher>::iterator it
|
||||
= hsa_global_variable_symbols->begin ();
|
||||
it != hsa_global_variable_symbols->end ();
|
||||
++it)
|
||||
{
|
||||
hsa_symbol *sym = *it;
|
||||
delete sym;
|
||||
}
|
||||
|
||||
delete hsa_global_variable_symbols;
|
||||
|
||||
if (hsa_num_threads)
|
||||
{
|
||||
delete hsa_num_threads;
|
||||
hsa_num_threads = NULL;
|
||||
}
|
||||
|
||||
compilation_unit_data_initialized = false;
|
||||
}
|
||||
|
||||
/* Return true if we are generating large HSA machine model. */
|
||||
|
||||
bool
|
||||
hsa_machine_large_p (void)
|
||||
{
|
||||
/* FIXME: I suppose this is technically wrong but should work for me now. */
|
||||
return (GET_MODE_BITSIZE (Pmode) == 64);
|
||||
}
|
||||
|
||||
/* Return the HSA profile we are using. */
|
||||
|
||||
bool
|
||||
hsa_full_profile_p (void)
|
||||
{
|
||||
return true;
|
||||
}
|
||||
|
||||
/* Return true if a register in operand number OPNUM of instruction
|
||||
is an output. False if it is an input. */
|
||||
|
||||
bool
|
||||
hsa_insn_basic::op_output_p (unsigned opnum)
|
||||
{
|
||||
switch (m_opcode)
|
||||
{
|
||||
case HSA_OPCODE_PHI:
|
||||
case BRIG_OPCODE_CBR:
|
||||
case BRIG_OPCODE_SBR:
|
||||
case BRIG_OPCODE_ST:
|
||||
case BRIG_OPCODE_SIGNALNORET:
|
||||
case BRIG_OPCODE_DEBUGTRAP:
|
||||
/* FIXME: There are probably missing cases here, double check. */
|
||||
return false;
|
||||
case BRIG_OPCODE_EXPAND:
|
||||
/* Example: expand_v4_b32_b128 (dest0, dest1, dest2, dest3), src0. */
|
||||
return opnum < operand_count () - 1;
|
||||
default:
|
||||
return opnum == 0;
|
||||
}
|
||||
}
|
||||
|
||||
/* Return true if OPCODE is a floating-point bit instruction opcode. */
|
||||
|
||||
bool
|
||||
hsa_opcode_floating_bit_insn_p (BrigOpcode16_t opcode)
|
||||
{
|
||||
switch (opcode)
|
||||
{
|
||||
case BRIG_OPCODE_NEG:
|
||||
case BRIG_OPCODE_ABS:
|
||||
case BRIG_OPCODE_CLASS:
|
||||
case BRIG_OPCODE_COPYSIGN:
|
||||
return true;
|
||||
default:
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
/* Return the number of destination operands for this INSN. */
|
||||
|
||||
unsigned
|
||||
hsa_insn_basic::input_count ()
|
||||
{
|
||||
switch (m_opcode)
|
||||
{
|
||||
default:
|
||||
return 1;
|
||||
|
||||
case BRIG_OPCODE_NOP:
|
||||
return 0;
|
||||
|
||||
case BRIG_OPCODE_EXPAND:
|
||||
return 2;
|
||||
|
||||
case BRIG_OPCODE_LD:
|
||||
/* ld_v[234] not yet handled. */
|
||||
return 1;
|
||||
|
||||
case BRIG_OPCODE_ST:
|
||||
return 0;
|
||||
|
||||
case BRIG_OPCODE_ATOMICNORET:
|
||||
return 0;
|
||||
|
||||
case BRIG_OPCODE_SIGNAL:
|
||||
return 1;
|
||||
|
||||
case BRIG_OPCODE_SIGNALNORET:
|
||||
return 0;
|
||||
|
||||
case BRIG_OPCODE_MEMFENCE:
|
||||
return 0;
|
||||
|
||||
case BRIG_OPCODE_RDIMAGE:
|
||||
case BRIG_OPCODE_LDIMAGE:
|
||||
case BRIG_OPCODE_STIMAGE:
|
||||
case BRIG_OPCODE_QUERYIMAGE:
|
||||
case BRIG_OPCODE_QUERYSAMPLER:
|
||||
sorry ("HSA image ops not handled");
|
||||
return 0;
|
||||
|
||||
case BRIG_OPCODE_CBR:
|
||||
case BRIG_OPCODE_BR:
|
||||
return 0;
|
||||
|
||||
case BRIG_OPCODE_SBR:
|
||||
return 0; /* ??? */
|
||||
|
||||
case BRIG_OPCODE_WAVEBARRIER:
|
||||
return 0; /* ??? */
|
||||
|
||||
case BRIG_OPCODE_BARRIER:
|
||||
case BRIG_OPCODE_ARRIVEFBAR:
|
||||
case BRIG_OPCODE_INITFBAR:
|
||||
case BRIG_OPCODE_JOINFBAR:
|
||||
case BRIG_OPCODE_LEAVEFBAR:
|
||||
case BRIG_OPCODE_RELEASEFBAR:
|
||||
case BRIG_OPCODE_WAITFBAR:
|
||||
return 0;
|
||||
|
||||
case BRIG_OPCODE_LDF:
|
||||
return 1;
|
||||
|
||||
case BRIG_OPCODE_ACTIVELANECOUNT:
|
||||
case BRIG_OPCODE_ACTIVELANEID:
|
||||
case BRIG_OPCODE_ACTIVELANEMASK:
|
||||
case BRIG_OPCODE_ACTIVELANEPERMUTE:
|
||||
return 1; /* ??? */
|
||||
|
||||
case BRIG_OPCODE_CALL:
|
||||
case BRIG_OPCODE_SCALL:
|
||||
case BRIG_OPCODE_ICALL:
|
||||
return 0;
|
||||
|
||||
case BRIG_OPCODE_RET:
|
||||
return 0;
|
||||
|
||||
case BRIG_OPCODE_ALLOCA:
|
||||
return 1;
|
||||
|
||||
case BRIG_OPCODE_CLEARDETECTEXCEPT:
|
||||
return 0;
|
||||
|
||||
case BRIG_OPCODE_SETDETECTEXCEPT:
|
||||
return 0;
|
||||
|
||||
case BRIG_OPCODE_PACKETCOMPLETIONSIG:
|
||||
case BRIG_OPCODE_PACKETID:
|
||||
case BRIG_OPCODE_CASQUEUEWRITEINDEX:
|
||||
case BRIG_OPCODE_LDQUEUEREADINDEX:
|
||||
case BRIG_OPCODE_LDQUEUEWRITEINDEX:
|
||||
case BRIG_OPCODE_STQUEUEREADINDEX:
|
||||
case BRIG_OPCODE_STQUEUEWRITEINDEX:
|
||||
return 1; /* ??? */
|
||||
|
||||
case BRIG_OPCODE_ADDQUEUEWRITEINDEX:
|
||||
return 1;
|
||||
|
||||
case BRIG_OPCODE_DEBUGTRAP:
|
||||
return 0;
|
||||
|
||||
case BRIG_OPCODE_GROUPBASEPTR:
|
||||
case BRIG_OPCODE_KERNARGBASEPTR:
|
||||
return 1; /* ??? */
|
||||
|
||||
case HSA_OPCODE_ARG_BLOCK:
|
||||
return 0;
|
||||
|
||||
case BRIG_KIND_DIRECTIVE_COMMENT:
|
||||
return 0;
|
||||
}
|
||||
}
|
||||
|
||||
/* Return the number of source operands for this INSN. */
|
||||
|
||||
unsigned
|
||||
hsa_insn_basic::num_used_ops ()
|
||||
{
|
||||
gcc_checking_assert (input_count () <= operand_count ());
|
||||
|
||||
return operand_count () - input_count ();
|
||||
}
|
||||
|
||||
/* Set alignment to VALUE. */
|
||||
|
||||
void
|
||||
hsa_insn_mem::set_align (BrigAlignment8_t value)
|
||||
{
|
||||
/* TODO: Perhaps remove this dump later on: */
|
||||
if (dump_file && (dump_flags & TDF_DETAILS) && value < m_align)
|
||||
{
|
||||
fprintf (dump_file, "Decreasing alignment to %u in instruction ", value);
|
||||
dump_hsa_insn (dump_file, this);
|
||||
}
|
||||
m_align = value;
|
||||
}
|
||||
|
||||
/* Return size of HSA type T in bits. */
|
||||
|
||||
unsigned
|
||||
hsa_type_bit_size (BrigType16_t t)
|
||||
{
|
||||
switch (t)
|
||||
{
|
||||
case BRIG_TYPE_B1:
|
||||
return 1;
|
||||
|
||||
case BRIG_TYPE_U8:
|
||||
case BRIG_TYPE_S8:
|
||||
case BRIG_TYPE_B8:
|
||||
return 8;
|
||||
|
||||
case BRIG_TYPE_U16:
|
||||
case BRIG_TYPE_S16:
|
||||
case BRIG_TYPE_B16:
|
||||
case BRIG_TYPE_F16:
|
||||
return 16;
|
||||
|
||||
case BRIG_TYPE_U32:
|
||||
case BRIG_TYPE_S32:
|
||||
case BRIG_TYPE_B32:
|
||||
case BRIG_TYPE_F32:
|
||||
case BRIG_TYPE_U8X4:
|
||||
case BRIG_TYPE_U16X2:
|
||||
case BRIG_TYPE_S8X4:
|
||||
case BRIG_TYPE_S16X2:
|
||||
case BRIG_TYPE_F16X2:
|
||||
return 32;
|
||||
|
||||
case BRIG_TYPE_U64:
|
||||
case BRIG_TYPE_S64:
|
||||
case BRIG_TYPE_F64:
|
||||
case BRIG_TYPE_B64:
|
||||
case BRIG_TYPE_U8X8:
|
||||
case BRIG_TYPE_U16X4:
|
||||
case BRIG_TYPE_U32X2:
|
||||
case BRIG_TYPE_S8X8:
|
||||
case BRIG_TYPE_S16X4:
|
||||
case BRIG_TYPE_S32X2:
|
||||
case BRIG_TYPE_F16X4:
|
||||
case BRIG_TYPE_F32X2:
|
||||
|
||||
return 64;
|
||||
|
||||
case BRIG_TYPE_B128:
|
||||
case BRIG_TYPE_U8X16:
|
||||
case BRIG_TYPE_U16X8:
|
||||
case BRIG_TYPE_U32X4:
|
||||
case BRIG_TYPE_U64X2:
|
||||
case BRIG_TYPE_S8X16:
|
||||
case BRIG_TYPE_S16X8:
|
||||
case BRIG_TYPE_S32X4:
|
||||
case BRIG_TYPE_S64X2:
|
||||
case BRIG_TYPE_F16X8:
|
||||
case BRIG_TYPE_F32X4:
|
||||
case BRIG_TYPE_F64X2:
|
||||
return 128;
|
||||
|
||||
default:
|
||||
gcc_assert (hsa_seen_error ());
|
||||
return t;
|
||||
}
|
||||
}
|
||||
|
||||
/* Return BRIG bit-type with BITSIZE length. */
|
||||
|
||||
BrigType16_t
|
||||
hsa_bittype_for_bitsize (unsigned bitsize)
|
||||
{
|
||||
switch (bitsize)
|
||||
{
|
||||
case 1:
|
||||
return BRIG_TYPE_B1;
|
||||
case 8:
|
||||
return BRIG_TYPE_B8;
|
||||
case 16:
|
||||
return BRIG_TYPE_B16;
|
||||
case 32:
|
||||
return BRIG_TYPE_B32;
|
||||
case 64:
|
||||
return BRIG_TYPE_B64;
|
||||
case 128:
|
||||
return BRIG_TYPE_B128;
|
||||
default:
|
||||
gcc_unreachable ();
|
||||
}
|
||||
}
|
||||
|
||||
/* Return BRIG unsigned int type with BITSIZE length. */
|
||||
|
||||
BrigType16_t
|
||||
hsa_uint_for_bitsize (unsigned bitsize)
|
||||
{
|
||||
switch (bitsize)
|
||||
{
|
||||
case 8:
|
||||
return BRIG_TYPE_U8;
|
||||
case 16:
|
||||
return BRIG_TYPE_U16;
|
||||
case 32:
|
||||
return BRIG_TYPE_U32;
|
||||
case 64:
|
||||
return BRIG_TYPE_U64;
|
||||
default:
|
||||
gcc_unreachable ();
|
||||
}
|
||||
}
|
||||
|
||||
/* Return BRIG float type with BITSIZE length. */
|
||||
|
||||
BrigType16_t
|
||||
hsa_float_for_bitsize (unsigned bitsize)
|
||||
{
|
||||
switch (bitsize)
|
||||
{
|
||||
case 16:
|
||||
return BRIG_TYPE_F16;
|
||||
case 32:
|
||||
return BRIG_TYPE_F32;
|
||||
case 64:
|
||||
return BRIG_TYPE_F64;
|
||||
default:
|
||||
gcc_unreachable ();
|
||||
}
|
||||
}
|
||||
|
||||
/* Return HSA bit-type with the same size as the type T. */
|
||||
|
||||
BrigType16_t
|
||||
hsa_bittype_for_type (BrigType16_t t)
|
||||
{
|
||||
return hsa_bittype_for_bitsize (hsa_type_bit_size (t));
|
||||
}
|
||||
|
||||
/* Return HSA unsigned integer type with the same size as the type T. */
|
||||
|
||||
BrigType16_t
|
||||
hsa_unsigned_type_for_type (BrigType16_t t)
|
||||
{
|
||||
return hsa_uint_for_bitsize (hsa_type_bit_size (t));
|
||||
}
|
||||
|
||||
/* Return true if TYPE is a packed HSA type. */
|
||||
|
||||
bool
|
||||
hsa_type_packed_p (BrigType16_t type)
|
||||
{
|
||||
return (type & BRIG_TYPE_PACK_MASK) != BRIG_TYPE_PACK_NONE;
|
||||
}
|
||||
|
||||
/* Return true if and only if TYPE is a floating point number type. */
|
||||
|
||||
bool
|
||||
hsa_type_float_p (BrigType16_t type)
|
||||
{
|
||||
switch (type & BRIG_TYPE_BASE_MASK)
|
||||
{
|
||||
case BRIG_TYPE_F16:
|
||||
case BRIG_TYPE_F32:
|
||||
case BRIG_TYPE_F64:
|
||||
return true;
|
||||
default:
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
/* Return true if and only if TYPE is an integer number type. */
|
||||
|
||||
bool
|
||||
hsa_type_integer_p (BrigType16_t type)
|
||||
{
|
||||
switch (type & BRIG_TYPE_BASE_MASK)
|
||||
{
|
||||
case BRIG_TYPE_U8:
|
||||
case BRIG_TYPE_U16:
|
||||
case BRIG_TYPE_U32:
|
||||
case BRIG_TYPE_U64:
|
||||
case BRIG_TYPE_S8:
|
||||
case BRIG_TYPE_S16:
|
||||
case BRIG_TYPE_S32:
|
||||
case BRIG_TYPE_S64:
|
||||
return true;
|
||||
default:
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
/* Return true if and only if TYPE is a bit-type. */
|
||||
|
||||
bool
|
||||
hsa_btype_p (BrigType16_t type)
|
||||
{
|
||||
switch (type & BRIG_TYPE_BASE_MASK)
|
||||
{
|
||||
case BRIG_TYPE_B8:
|
||||
case BRIG_TYPE_B16:
|
||||
case BRIG_TYPE_B32:
|
||||
case BRIG_TYPE_B64:
|
||||
case BRIG_TYPE_B128:
|
||||
return true;
|
||||
default:
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/* Return the HSA alignment encoding for an alignment of N bits. */

BrigAlignment8_t
hsa_alignment_encoding (unsigned n)
{
  gcc_assert (n >= 8 && !(n & (n - 1)));
  if (n >= 256)
    return BRIG_ALIGNMENT_32;

  switch (n)
    {
    case 8:
      return BRIG_ALIGNMENT_1;
    case 16:
      return BRIG_ALIGNMENT_2;
    case 32:
      return BRIG_ALIGNMENT_4;
    case 64:
      return BRIG_ALIGNMENT_8;
    case 128:
      return BRIG_ALIGNMENT_16;
    default:
      gcc_unreachable ();
    }
}

/* Return the HSA alignment encoding of the alignment of T as returned
   by get_object_alignment. */

BrigAlignment8_t
hsa_object_alignment (tree t)
{
  return hsa_alignment_encoding (get_object_alignment (t));
}

/* Return byte alignment for given BrigAlignment8_t value. */

unsigned
hsa_byte_alignment (BrigAlignment8_t alignment)
{
  gcc_assert (alignment != BRIG_ALIGNMENT_NONE);

  return 1 << (alignment - 1);
}
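
The two removed helpers above are inverses of each other up to a cap: `hsa_alignment_encoding` maps a power-of-two bit count to an alignment code and clamps everything from 256 bits upward to the 32-byte code, while `hsa_byte_alignment` decodes a code as `1 << (code - 1)` bytes. The sketch below reproduces that round trip in isolation. The `align_code` enum is a hypothetical stand-in whose numbering is an assumption chosen to be consistent with the `1 << (alignment - 1)` formula; it is not copied from the BRIG headers.

```cpp
#include <cassert>

// Hypothetical mirror of the alignment codes, assuming code k means
// 2^(k-1) bytes, which is what the decoding formula above implies.
enum align_code : unsigned
{
  ALIGN_NONE = 0,
  ALIGN_1 = 1,
  ALIGN_2 = 2,
  ALIGN_4 = 3,
  ALIGN_8 = 4,
  ALIGN_16 = 5,
  ALIGN_32 = 6
};

// Encode an alignment given in bits (a power of two, at least 8).
inline align_code
alignment_encoding (unsigned bits)
{
  assert (bits >= 8 && (bits & (bits - 1)) == 0);
  if (bits >= 256)
    return ALIGN_32;		// Clamp, as the removed function did.

  unsigned code = 1;
  for (unsigned b = 8; b < bits; b <<= 1)
    ++code;
  return static_cast<align_code> (code);
}

// Decode an alignment code back to a byte count.
inline unsigned
byte_alignment (align_code a)
{
  assert (a != ALIGN_NONE);
  return 1u << (a - 1);
}
```

Note that the clamp makes the mapping lossy: any request of 256 bits or more decodes back to 32 bytes.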
|
||||
|
||||
/* Return natural alignment of HSA TYPE. */
|
||||
|
||||
BrigAlignment8_t
|
||||
hsa_natural_alignment (BrigType16_t type)
|
||||
{
|
||||
return hsa_alignment_encoding (hsa_type_bit_size (type & ~BRIG_TYPE_ARRAY));
|
||||
}
|
||||
|
||||
/* Call the correct destructor of an HSA instruction. */
|
||||
|
||||
void
|
||||
hsa_destroy_insn (hsa_insn_basic *insn)
|
||||
{
|
||||
if (hsa_insn_phi *phi = dyn_cast <hsa_insn_phi *> (insn))
|
||||
phi->~hsa_insn_phi ();
|
||||
else if (hsa_insn_cbr *br = dyn_cast <hsa_insn_cbr *> (insn))
|
||||
br->~hsa_insn_cbr ();
|
||||
else if (hsa_insn_cmp *cmp = dyn_cast <hsa_insn_cmp *> (insn))
|
||||
cmp->~hsa_insn_cmp ();
|
||||
else if (hsa_insn_mem *mem = dyn_cast <hsa_insn_mem *> (insn))
|
||||
mem->~hsa_insn_mem ();
|
||||
else if (hsa_insn_atomic *atomic = dyn_cast <hsa_insn_atomic *> (insn))
|
||||
atomic->~hsa_insn_atomic ();
|
||||
else if (hsa_insn_seg *seg = dyn_cast <hsa_insn_seg *> (insn))
|
||||
seg->~hsa_insn_seg ();
|
||||
else if (hsa_insn_call *call = dyn_cast <hsa_insn_call *> (insn))
|
||||
call->~hsa_insn_call ();
|
||||
else if (hsa_insn_arg_block *block = dyn_cast <hsa_insn_arg_block *> (insn))
|
||||
block->~hsa_insn_arg_block ();
|
||||
else if (hsa_insn_sbr *sbr = dyn_cast <hsa_insn_sbr *> (insn))
|
||||
sbr->~hsa_insn_sbr ();
|
||||
else if (hsa_insn_br *br = dyn_cast <hsa_insn_br *> (insn))
|
||||
br->~hsa_insn_br ();
|
||||
else if (hsa_insn_comment *comment = dyn_cast <hsa_insn_comment *> (insn))
|
||||
comment->~hsa_insn_comment ();
|
||||
else
|
||||
insn->~hsa_insn_basic ();
|
||||
}
|
||||
|
||||
/* Call the correct destructor of an HSA operand. */
|
||||
|
||||
void
|
||||
hsa_destroy_operand (hsa_op_base *op)
|
||||
{
|
||||
if (hsa_op_code_list *list = dyn_cast <hsa_op_code_list *> (op))
|
||||
list->~hsa_op_code_list ();
|
||||
else if (hsa_op_operand_list *list = dyn_cast <hsa_op_operand_list *> (op))
|
||||
list->~hsa_op_operand_list ();
|
||||
else if (hsa_op_reg *reg = dyn_cast <hsa_op_reg *> (op))
|
||||
reg->~hsa_op_reg ();
|
||||
else if (hsa_op_immed *immed = dyn_cast <hsa_op_immed *> (op))
|
||||
immed->~hsa_op_immed ();
|
||||
else
|
||||
op->~hsa_op_base ();
|
||||
}
|
||||
|
||||
/* Create a mapping between the original function DECL and kernel name NAME. */
|
||||
|
||||
void
|
||||
hsa_add_kern_decl_mapping (tree decl, char *name, unsigned omp_data_size,
|
||||
bool gridified_kernel_p)
|
||||
{
|
||||
hsa_decl_kernel_map_element dkm;
|
||||
dkm.decl = decl;
|
||||
dkm.name = name;
|
||||
dkm.omp_data_size = omp_data_size;
|
||||
dkm.gridified_kernel_p = gridified_kernel_p;
|
||||
vec_safe_push (hsa_decl_kernel_mapping, dkm);
|
||||
}
|
||||
|
||||
/* Return the number of kernel decl name mappings. */
|
||||
|
||||
unsigned
|
||||
hsa_get_number_decl_kernel_mappings (void)
|
||||
{
|
||||
return vec_safe_length (hsa_decl_kernel_mapping);
|
||||
}
|
||||
|
||||
/* Return the decl in the Ith kernel decl name mapping. */
|
||||
|
||||
tree
|
||||
hsa_get_decl_kernel_mapping_decl (unsigned i)
|
||||
{
|
||||
return (*hsa_decl_kernel_mapping)[i].decl;
|
||||
}
|
||||
|
||||
/* Return the name in the Ith kernel decl name mapping. */
|
||||
|
||||
char *
|
||||
hsa_get_decl_kernel_mapping_name (unsigned i)
|
||||
{
|
||||
return (*hsa_decl_kernel_mapping)[i].name;
|
||||
}
|
||||
|
||||
/* Return maximum OMP size for kernel decl name mapping. */
|
||||
|
||||
unsigned
|
||||
hsa_get_decl_kernel_mapping_omp_size (unsigned i)
|
||||
{
|
||||
return (*hsa_decl_kernel_mapping)[i].omp_data_size;
|
||||
}
|
||||
|
||||
/* Return true if the function is a gridified kernel in decl name mapping. */
|
||||
|
||||
bool
|
||||
hsa_get_decl_kernel_mapping_gridified (unsigned i)
|
||||
{
|
||||
return (*hsa_decl_kernel_mapping)[i].gridified_kernel_p;
|
||||
}
|
||||
|
||||
/* Free the mapping between original decls and kernel names. */
|
||||
|
||||
void
|
||||
hsa_free_decl_kernel_mapping (void)
|
||||
{
|
||||
if (hsa_decl_kernel_mapping == NULL)
|
||||
return;
|
||||
|
||||
for (unsigned i = 0; i < hsa_decl_kernel_mapping->length (); ++i)
|
||||
free ((*hsa_decl_kernel_mapping)[i].name);
|
||||
ggc_free (hsa_decl_kernel_mapping);
|
||||
}
|
||||
|
||||
/* Add new kernel dependency. */

void
hsa_add_kernel_dependency (tree caller, const char *called_function)
{
  if (hsa_decl_kernel_dependencies == NULL)
    hsa_decl_kernel_dependencies = new hash_map<tree, vec<const char *> *> ();

  vec <const char *> *s = NULL;
  vec <const char *> **slot = hsa_decl_kernel_dependencies->get (caller);
  if (slot == NULL)
    {
      s = new vec <const char *> ();
      hsa_decl_kernel_dependencies->put (caller, s);
    }
  else
    s = *slot;

  s->safe_push (called_function);
}
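
The removed `hsa_add_kernel_dependency` is a get-or-create on a map from a caller to its list of called kernels: look the caller up, allocate an empty list on first use, then append. A rough equivalent using the C++ standard library rather than GCC's `hash_map` and `vec` might look as follows. The `dependency_map` alias, the `add_kernel_dependency` name and the use of strings as caller keys are assumptions made for this illustration only.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in: callers are keyed by name here, whereas the
// removed code keyed them by their tree decl.
using dependency_map
  = std::unordered_map<std::string, std::vector<std::string>>;

// Record that CALLER dispatches the kernel CALLED_FUNCTION.
inline void
add_kernel_dependency (dependency_map &deps, const std::string &caller,
		       const std::string &called_function)
{
  assert (!caller.empty ());
  // operator[] default-constructs an empty vector on first use, which
  // replaces the explicit NULL-slot check of the removed code.
  deps[caller].push_back (called_function);
}
```

Storing the vectors by value also removes the manual `new` of the original, at the cost of not matching GCC's own container conventions.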
|
||||
|
||||
/* Expansion to HSA needs a few gc roots to hold types, constructors etc. In
|
||||
order to minimize the number of GTY roots, we'll root them all in the
|
||||
following array. The individual elements should only be accessed by the
|
||||
very simple getters (of a pointer-to-tree) below. */
|
||||
|
||||
static GTY(()) tree hsa_tree_gt_roots[3];
|
||||
|
||||
tree *
|
||||
hsa_get_ctor_statements (void)
|
||||
{
|
||||
return &hsa_tree_gt_roots[0];
|
||||
}
|
||||
|
||||
tree *
|
||||
hsa_get_dtor_statements (void)
|
||||
{
|
||||
return &hsa_tree_gt_roots[1];
|
||||
}
|
||||
|
||||
tree *
|
||||
hsa_get_kernel_dispatch_type (void)
|
||||
{
|
||||
return &hsa_tree_gt_roots[2];
|
||||
}
|
||||
|
||||
/* Modify the name P in-place so that it is a valid HSA identifier. */

void
hsa_sanitize_name (char *p)
{
  for (; *p; p++)
    if (*p == '.' || *p == '-')
      *p = '_';
}

/* Clone the name P, set leading ampersand and sanitize the name. */

char *
hsa_brig_function_name (const char *p)
{
  unsigned len = strlen (p);
  char *buf = XNEWVEC (char, len + 2);

  buf[0] = '&';
  buf[len + 1] = '\0';
  memcpy (buf + 1, p, len);

  hsa_sanitize_name (buf);
  return buf;
}
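
To make the effect of the two removed helpers concrete: the function name is copied, prefixed with `&`, and every `.` or `-` is replaced by `_`. The sketch below mirrors that behaviour with `std::string` instead of a manually allocated buffer. `sanitize_name` and `brig_function_name` are hypothetical names for this illustration, and the input used in the usage note is made up.

```cpp
#include <cassert>
#include <string>

// Replace characters that the removed code treated as invalid in an
// HSA identifier ('.' and '-') with underscores.
inline void
sanitize_name (std::string &s)
{
  for (char &c : s)
    if (c == '.' || c == '-')
      c = '_';
}

// Return a sanitized copy of P with a leading ampersand.
inline std::string
brig_function_name (const char *p)
{
  assert (p != nullptr);
  std::string buf = "&";
  buf += p;
  sanitize_name (buf);
  return buf;
}
```

Under these assumptions, `brig_function_name ("foo.omp_fn-0")` yields `&foo_omp_fn_0`; the ampersand itself is untouched because sanitization only rewrites dots and dashes.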
|
||||
|
||||
/* Add a flatten attribute and disable vectorization for gpu implementation
|
||||
function decl GDECL. */
|
||||
|
||||
void hsa_summary_t::process_gpu_implementation_attributes (tree gdecl)
|
||||
{
|
||||
DECL_ATTRIBUTES (gdecl)
|
||||
= tree_cons (get_identifier ("flatten"), NULL_TREE,
|
||||
DECL_ATTRIBUTES (gdecl));
|
||||
|
||||
tree fn_opts = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl);
|
||||
if (fn_opts == NULL_TREE)
|
||||
fn_opts = optimization_default_node;
|
||||
fn_opts = copy_node (fn_opts);
|
||||
TREE_OPTIMIZATION (fn_opts)->x_flag_tree_loop_vectorize = false;
|
||||
TREE_OPTIMIZATION (fn_opts)->x_flag_tree_slp_vectorize = false;
|
||||
DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl) = fn_opts;
|
||||
}
|
||||
|
||||
void
|
||||
hsa_summary_t::link_functions (cgraph_node *gpu, cgraph_node *host,
|
||||
hsa_function_kind kind, bool gridified_kernel_p)
|
||||
{
|
||||
hsa_function_summary *gpu_summary = get_create (gpu);
|
||||
hsa_function_summary *host_summary = get_create (host);
|
||||
|
||||
gpu_summary->m_kind = kind;
|
||||
host_summary->m_kind = kind;
|
||||
|
||||
gpu_summary->m_gpu_implementation_p = true;
|
||||
host_summary->m_gpu_implementation_p = false;
|
||||
|
||||
gpu_summary->m_gridified_kernel_p = gridified_kernel_p;
|
||||
host_summary->m_gridified_kernel_p = gridified_kernel_p;
|
||||
|
||||
gpu_summary->m_bound_function = host;
|
||||
host_summary->m_bound_function = gpu;
|
||||
|
||||
process_gpu_implementation_attributes (gpu->decl);
|
||||
|
||||
/* Create reference between a kernel and a corresponding host implementation
|
||||
to guarantee LTO streaming to the same LTRANS. */
|
||||
if (kind == HSA_KERNEL)
|
||||
gpu->create_reference (host, IPA_REF_ADDR);
|
||||
}
|
||||
|
||||
/* Add a HOST function to HSA summaries. */
|
||||
|
||||
void
|
||||
hsa_register_kernel (cgraph_node *host)
|
||||
{
|
||||
if (hsa_summaries == NULL)
|
||||
hsa_summaries = new hsa_summary_t (symtab);
|
||||
hsa_function_summary *s = hsa_summaries->get_create (host);
|
||||
s->m_kind = HSA_KERNEL;
|
||||
}
|
||||
|
||||
/* Add a pair of functions to HSA summaries. GPU is an HSA implementation of
|
||||
a HOST function. */
|
||||
|
||||
void
|
||||
hsa_register_kernel (cgraph_node *gpu, cgraph_node *host)
|
||||
{
|
||||
if (hsa_summaries == NULL)
|
||||
hsa_summaries = new hsa_summary_t (symtab);
|
||||
hsa_summaries->link_functions (gpu, host, HSA_KERNEL, true);
|
||||
}
|
||||
|
||||
/* Return true if expansion of the current HSA function has already failed. */
|
||||
|
||||
bool
|
||||
hsa_seen_error (void)
|
||||
{
|
||||
return hsa_cfun->m_seen_error;
|
||||
}
|
||||
|
||||
/* Mark current HSA function as failed. */
|
||||
|
||||
void
|
||||
hsa_fail_cfun (void)
|
||||
{
|
||||
hsa_failed_functions->add (hsa_cfun->m_decl);
|
||||
hsa_cfun->m_seen_error = true;
|
||||
}
|
||||
|
||||
char *
|
||||
hsa_internal_fn::name ()
|
||||
{
|
||||
char *name = xstrdup (internal_fn_name (m_fn));
|
||||
for (char *ptr = name; *ptr; ptr++)
|
||||
*ptr = TOLOWER (*ptr);
|
||||
|
||||
const char *suffix = NULL;
|
||||
if (m_type_bit_size == 32)
|
||||
suffix = "f";
|
||||
|
||||
if (suffix)
|
||||
{
|
||||
char *name2 = concat (name, suffix, NULL);
|
||||
free (name);
|
||||
name = name2;
|
||||
}
|
||||
|
||||
hsa_sanitize_name (name);
|
||||
return name;
|
||||
}
|
||||
|
||||
unsigned
|
||||
hsa_internal_fn::get_arity ()
|
||||
{
|
||||
switch (m_fn)
|
||||
{
|
||||
case IFN_ACOS:
|
||||
case IFN_ASIN:
|
||||
case IFN_ATAN:
|
||||
case IFN_COS:
|
||||
case IFN_EXP:
|
||||
case IFN_EXP10:
|
||||
case IFN_EXP2:
|
||||
case IFN_EXPM1:
|
||||
case IFN_LOG:
|
||||
case IFN_LOG10:
|
||||
case IFN_LOG1P:
|
||||
case IFN_LOG2:
|
||||
case IFN_LOGB:
|
||||
case IFN_SIGNIFICAND:
|
||||
case IFN_SIN:
|
||||
case IFN_SQRT:
|
||||
case IFN_TAN:
|
||||
case IFN_CEIL:
|
||||
case IFN_FLOOR:
|
||||
case IFN_NEARBYINT:
|
||||
case IFN_RINT:
|
||||
case IFN_ROUND:
|
||||
case IFN_TRUNC:
|
||||
return 1;
|
||||
case IFN_ATAN2:
|
||||
case IFN_COPYSIGN:
|
||||
case IFN_FMOD:
|
||||
case IFN_POW:
|
||||
case IFN_REMAINDER:
|
||||
case IFN_SCALB:
|
||||
case IFN_LDEXP:
|
||||
return 2;
|
||||
case IFN_CLRSB:
|
||||
case IFN_CLZ:
|
||||
case IFN_CTZ:
|
||||
case IFN_FFS:
|
||||
case IFN_PARITY:
|
||||
case IFN_POPCOUNT:
|
||||
default:
|
||||
/* As we produce sorry message for unknown internal functions,
|
||||
reaching this label is definitely a bug. */
|
||||
gcc_unreachable ();
|
||||
}
|
||||
}
|
||||
|
||||
BrigType16_t
|
||||
hsa_internal_fn::get_argument_type (int n)
|
||||
{
|
||||
switch (m_fn)
|
||||
{
|
||||
case IFN_ACOS:
|
||||
case IFN_ASIN:
|
||||
case IFN_ATAN:
|
||||
case IFN_COS:
|
||||
case IFN_EXP:
|
||||
case IFN_EXP10:
|
||||
case IFN_EXP2:
|
||||
case IFN_EXPM1:
|
||||
case IFN_LOG:
|
||||
case IFN_LOG10:
|
||||
case IFN_LOG1P:
|
||||
case IFN_LOG2:
|
||||
case IFN_LOGB:
|
||||
case IFN_SIGNIFICAND:
|
||||
case IFN_SIN:
|
||||
case IFN_SQRT:
|
||||
case IFN_TAN:
|
||||
case IFN_CEIL:
|
||||
case IFN_FLOOR:
|
||||
case IFN_NEARBYINT:
|
||||
case IFN_RINT:
|
||||
case IFN_ROUND:
|
||||
case IFN_TRUNC:
|
||||
case IFN_ATAN2:
|
||||
case IFN_COPYSIGN:
|
||||
case IFN_FMOD:
|
||||
case IFN_POW:
|
||||
case IFN_REMAINDER:
|
||||
case IFN_SCALB:
|
||||
return hsa_float_for_bitsize (m_type_bit_size);
|
||||
case IFN_LDEXP:
|
||||
{
|
||||
if (n == -1 || n == 0)
|
||||
return hsa_float_for_bitsize (m_type_bit_size);
|
||||
else
|
||||
return BRIG_TYPE_S32;
|
||||
}
|
||||
default:
|
||||
/* As we produce sorry message for unknown internal functions,
|
||||
reaching this label is definitely a bug. */
|
||||
gcc_unreachable ();
|
||||
}
|
||||
}
|
||||
|
||||
#include "gt-hsa-common.h"
|
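For readers skimming the removed hsa_internal_fn::name above, here is a minimal standalone sketch of the naming rule it applied. It is not the removed GCC code: the helper name sketch_internal_fn_name is hypothetical, std::string replaces the xstrdup/concat/free sequence, the input is assumed to be an upper-case name such as internal_fn_name would return, and the final hsa_sanitize_name step is left out.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical stand-in for the removed hsa_internal_fn::name ():
// lower-case the internal function name and append "f" when the
// operand type is 32 bits wide.  The removed code also ran
// hsa_sanitize_name on the result; that step is omitted here.
std::string
sketch_internal_fn_name (const std::string &ifn_name, unsigned type_bit_size)
{
  std::string name;
  for (unsigned char c : ifn_name)
    name += static_cast<char> (std::tolower (c));
  if (type_bit_size == 32)
    name += "f";
  return name;
}
```

Under these assumptions, calling sketch_internal_fn_name ("SQRT", 32) yields the single-precision spelling with the "f" suffix, while a 64-bit size leaves the lower-cased name unchanged.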
1419
gcc/hsa-common.h
File diff suppressed because it is too large
1278
gcc/hsa-dump.c
File diff suppressed because it is too large
6694
gcc/hsa-gen.c
File diff suppressed because it is too large
|
@ -1,729 +0,0 @@
|
|||
/* HSAIL IL Register allocation and out-of-SSA.
|
||||
Copyright (C) 2013-2020 Free Software Foundation, Inc.
|
||||
Contributed by Michael Matz <matz@suse.de>
|
||||
|
||||
This file is part of GCC.
|
||||
|
||||
GCC is free software; you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
the Free Software Foundation; either version 3, or (at your option)
|
||||
any later version.
|
||||
|
||||
GCC is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
GNU General Public License for more details.
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with GCC; see the file COPYING3. If not see
|
||||
<http://www.gnu.org/licenses/>. */
|
||||
|
||||
#include "config.h"
|
||||
#include "system.h"
|
||||
#include "coretypes.h"
|
||||
#include "tm.h"
|
||||
#include "is-a.h"
|
||||
#include "vec.h"
|
||||
#include "tree.h"
|
||||
#include "dominance.h"
|
||||
#include "basic-block.h"
|
||||
#include "function.h"
|
||||
#include "cfganal.h"
|
||||
#include "cfg.h"
|
||||
#include "bitmap.h"
|
||||
#include "dumpfile.h"
|
||||
#include "cgraph.h"
|
||||
#include "print-tree.h"
|
||||
#include "cfghooks.h"
|
||||
#include "alloc-pool.h"
|
||||
#include "symbol-summary.h"
|
||||
#include "hsa-common.h"
|
||||
|
||||
|
||||
/* Process a PHI node PHI of basic block BB as a part of naive out-of-SSA. */
|
||||
|
||||
static void
|
||||
naive_process_phi (hsa_insn_phi *phi, const vec<edge> &predecessors)
|
||||
{
|
||||
unsigned count = phi->operand_count ();
|
||||
for (unsigned i = 0; i < count; i++)
|
||||
{
|
||||
gcc_checking_assert (phi->get_op (i));
|
||||
hsa_op_base *op = phi->get_op (i);
|
||||
hsa_bb *hbb;
|
||||
edge e;
|
||||
|
||||
if (!op)
|
||||
break;
|
||||
|
||||
e = predecessors[i];
|
||||
if (single_succ_p (e->src))
|
||||
hbb = hsa_bb_for_bb (e->src);
|
||||
else
|
||||
{
|
||||
basic_block old_dest = e->dest;
|
||||
hbb = hsa_init_new_bb (split_edge (e));
|
||||
|
||||
/* If switch insn used this edge, fix jump table. */
|
||||
hsa_bb *source = hsa_bb_for_bb (e->src);
|
||||
hsa_insn_sbr *sbr;
|
||||
if (source->m_last_insn
|
||||
&& (sbr = dyn_cast <hsa_insn_sbr *> (source->m_last_insn)))
|
||||
sbr->replace_all_labels (old_dest, hbb->m_bb);
|
||||
}
|
||||
|
||||
hsa_build_append_simple_mov (phi->m_dest, op, hbb);
|
||||
}
|
||||
}
|
||||
|
||||
/* Naive out-of-SSA. */
|
||||
|
||||
static void
|
||||
naive_outof_ssa (void)
|
||||
{
|
||||
basic_block bb;
|
||||
|
||||
hsa_cfun->m_in_ssa = false;
|
||||
|
||||
FOR_ALL_BB_FN (bb, cfun)
|
||||
{
|
||||
hsa_bb *hbb = hsa_bb_for_bb (bb);
|
||||
hsa_insn_phi *phi;
|
||||
|
||||
/* naive_process_phi can call split_edge on an incoming edge, which can change the order of
|
||||
the incoming edges to the basic block and thus make it inconsistent with
|
||||
the ordering of PHI arguments, so we collect them in advance. */
|
||||
auto_vec<edge, 8> predecessors;
|
||||
unsigned pred_count = EDGE_COUNT (bb->preds);
|
||||
for (unsigned i = 0; i < pred_count; i++)
|
||||
predecessors.safe_push (EDGE_PRED (bb, i));
|
||||
|
||||
for (phi = hbb->m_first_phi;
|
||||
phi;
|
||||
phi = phi->m_next ? as_a <hsa_insn_phi *> (phi->m_next) : NULL)
|
||||
naive_process_phi (phi, predecessors);
|
||||
|
||||
/* Zap PHI nodes, they will be deallocated when everything else will. */
|
||||
hbb->m_first_phi = NULL;
|
||||
hbb->m_last_phi = NULL;
|
||||
}
|
||||
}
|
||||
|
||||
/* Return register class number for the given HSA TYPE. 0 means the 'c' one
|
||||
bit register class, 1 means 's' 32 bit class, 2 stands for 'd' 64 bit class
|
||||
and 3 for 'q' 128 bit class. */
|
||||
|
||||
static int
|
||||
m_reg_class_for_type (BrigType16_t type)
|
||||
{
|
||||
switch (type)
|
||||
{
|
||||
case BRIG_TYPE_B1:
|
||||
return 0;
|
||||
|
||||
case BRIG_TYPE_U8:
|
||||
case BRIG_TYPE_U16:
|
||||
case BRIG_TYPE_U32:
|
||||
case BRIG_TYPE_S8:
|
||||
case BRIG_TYPE_S16:
|
||||
case BRIG_TYPE_S32:
|
||||
case BRIG_TYPE_F16:
|
||||
case BRIG_TYPE_F32:
|
||||
case BRIG_TYPE_B8:
|
||||
case BRIG_TYPE_B16:
|
||||
case BRIG_TYPE_B32:
|
||||
case BRIG_TYPE_U8X4:
|
||||
case BRIG_TYPE_S8X4:
|
||||
case BRIG_TYPE_U16X2:
|
||||
case BRIG_TYPE_S16X2:
|
||||
case BRIG_TYPE_F16X2:
|
||||
return 1;
|
||||
|
||||
case BRIG_TYPE_U64:
|
||||
case BRIG_TYPE_S64:
|
||||
case BRIG_TYPE_F64:
|
||||
case BRIG_TYPE_B64:
|
||||
case BRIG_TYPE_U8X8:
|
||||
case BRIG_TYPE_S8X8:
|
||||
case BRIG_TYPE_U16X4:
|
||||
case BRIG_TYPE_S16X4:
|
||||
case BRIG_TYPE_F16X4:
|
||||
case BRIG_TYPE_U32X2:
|
||||
case BRIG_TYPE_S32X2:
|
||||
case BRIG_TYPE_F32X2:
|
||||
return 2;
|
||||
|
||||
case BRIG_TYPE_B128:
|
||||
case BRIG_TYPE_U8X16:
|
||||
case BRIG_TYPE_S8X16:
|
||||
case BRIG_TYPE_U16X8:
|
||||
case BRIG_TYPE_S16X8:
|
||||
case BRIG_TYPE_F16X8:
|
||||
case BRIG_TYPE_U32X4:
|
||||
case BRIG_TYPE_U64X2:
|
||||
case BRIG_TYPE_S32X4:
|
||||
case BRIG_TYPE_S64X2:
|
||||
case BRIG_TYPE_F32X4:
|
||||
case BRIG_TYPE_F64X2:
|
||||
return 3;
|
||||
|
||||
default:
|
||||
gcc_unreachable ();
|
||||
}
|
||||
}
|
||||
|
||||
/* If the Ith operand of INSN is or contains a register (in an address),
|
||||
return the address of that register operand. If not return NULL. */
|
||||
|
||||
static hsa_op_reg **
|
||||
insn_reg_addr (hsa_insn_basic *insn, int i)
|
||||
{
|
||||
hsa_op_base *op = insn->get_op (i);
|
||||
if (!op)
|
||||
return NULL;
|
||||
hsa_op_reg *reg = dyn_cast <hsa_op_reg *> (op);
|
||||
if (reg)
|
||||
return (hsa_op_reg **) insn->get_op_addr (i);
|
||||
hsa_op_address *addr = dyn_cast <hsa_op_address *> (op);
|
||||
if (addr && addr->m_reg)
|
||||
return &addr->m_reg;
|
||||
return NULL;
|
||||
}
|
||||
|
||||
struct m_reg_class_desc
|
||||
{
|
||||
unsigned next_avail, max_num;
|
||||
unsigned used_num, max_used;
|
||||
uint64_t used[2];
|
||||
char cl_char;
|
||||
};
|
||||
|
||||
/* Rewrite the instructions in BB to observe spilled live ranges.
|
||||
CLASSES is the global register class state. */
|
||||
|
||||
static void
|
||||
rewrite_code_bb (basic_block bb, struct m_reg_class_desc *classes)
|
||||
{
|
||||
hsa_bb *hbb = hsa_bb_for_bb (bb);
|
||||
hsa_insn_basic *insn, *next_insn;
|
||||
|
||||
for (insn = hbb->m_first_insn; insn; insn = next_insn)
|
||||
{
|
||||
next_insn = insn->m_next;
|
||||
unsigned count = insn->operand_count ();
|
||||
for (unsigned i = 0; i < count; i++)
|
||||
{
|
||||
gcc_checking_assert (insn->get_op (i));
|
||||
hsa_op_reg **regaddr = insn_reg_addr (insn, i);
|
||||
|
||||
if (regaddr)
|
||||
{
|
||||
hsa_op_reg *reg = *regaddr;
|
||||
if (reg->m_reg_class)
|
||||
continue;
|
||||
gcc_assert (reg->m_spill_sym);
|
||||
|
||||
int cl = m_reg_class_for_type (reg->m_type);
|
||||
hsa_op_reg *tmp, *tmp2;
|
||||
if (insn->op_output_p (i))
|
||||
tmp = hsa_spill_out (insn, reg, &tmp2);
|
||||
else
|
||||
tmp = hsa_spill_in (insn, reg, &tmp2);
|
||||
|
||||
*regaddr = tmp;
|
||||
|
||||
tmp->m_reg_class = classes[cl].cl_char;
|
||||
tmp->m_hard_num = (char) (classes[cl].max_num + i);
|
||||
if (tmp2)
|
||||
{
|
||||
gcc_assert (cl == 0);
|
||||
tmp2->m_reg_class = classes[1].cl_char;
|
||||
tmp2->m_hard_num = (char) (classes[1].max_num + i);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/* Dump current function to dump file F, with info specific
|
||||
to register allocation. */
|
||||
|
||||
void
|
||||
dump_hsa_cfun_regalloc (FILE *f)
|
||||
{
|
||||
basic_block bb;
|
||||
|
||||
fprintf (f, "\nHSAIL IL for %s\n", hsa_cfun->m_name);
|
||||
|
||||
FOR_ALL_BB_FN (bb, cfun)
|
||||
{
|
||||
hsa_bb *hbb = (class hsa_bb *) bb->aux;
|
||||
bitmap_print (dump_file, hbb->m_livein, "m_livein ", "\n");
|
||||
dump_hsa_bb (f, hbb);
|
||||
bitmap_print (dump_file, hbb->m_liveout, "m_liveout ", "\n");
|
||||
}
|
||||
}
|
||||
|
||||
/* Given the global register allocation state CLASSES and a
|
||||
register REG, try to give it a hardware register. If successful,
|
||||
store that hardreg in REG and return it, otherwise return -1.
|
||||
Also changes CLASSES to accommodate for the allocated register. */
|
||||
|
||||
static int
|
||||
try_alloc_reg (struct m_reg_class_desc *classes, hsa_op_reg *reg)
|
||||
{
|
||||
int cl = m_reg_class_for_type (reg->m_type);
|
||||
int ret = -1;
|
||||
if (classes[1].used_num + classes[2].used_num * 2 + classes[3].used_num * 4
|
||||
>= 128 - 5)
|
||||
return -1;
|
||||
if (classes[cl].used_num < classes[cl].max_num)
|
||||
{
|
||||
unsigned int i;
|
||||
classes[cl].used_num++;
|
||||
if (classes[cl].used_num > classes[cl].max_used)
|
||||
classes[cl].max_used = classes[cl].used_num;
|
||||
for (i = 0; i < classes[cl].used_num; i++)
|
||||
if (! (classes[cl].used[i / 64] & (((uint64_t)1) << (i & 63))))
|
||||
break;
|
||||
ret = i;
|
||||
classes[cl].used[i / 64] |= (((uint64_t)1) << (i & 63));
|
||||
reg->m_reg_class = classes[cl].cl_char;
|
||||
reg->m_hard_num = i;
|
||||
}
|
||||
return ret;
|
||||
}
|
||||
|
||||
/* Free up hardregs used by REG, into allocation state CLASSES. */
|
||||
|
||||
static void
|
||||
free_reg (struct m_reg_class_desc *classes, hsa_op_reg *reg)
|
||||
{
|
||||
int cl = m_reg_class_for_type (reg->m_type);
|
||||
int ret = reg->m_hard_num;
|
||||
gcc_assert (reg->m_reg_class == classes[cl].cl_char);
|
||||
classes[cl].used_num--;
|
||||
classes[cl].used[ret / 64] &= ~(((uint64_t)1) << (ret & 63));
|
||||
}
|
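The try_alloc_reg / free_reg pair above tracks occupied hard registers of one class in a two-word bitmask. A minimal standalone sketch of that bookkeeping follows. The names slot_pool, pool_alloc and pool_free are hypothetical, the per-class fields are reduced to the three that matter here, and the global 128 - 5 register budget check of the removed code is not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reduction of m_reg_class_desc: a pool of at most 128
// slots whose occupancy is recorded in two 64-bit words.
struct slot_pool
{
  unsigned used_num = 0;          // number of slots currently taken
  unsigned max_num = 20;          // capacity, assumed to be <= 128
  std::uint64_t used[2] = {0, 0}; // bit i set means slot i is taken
};

// Take the lowest free slot and return its index, or -1 if the pool
// is full.  Because used_num always equals the number of set bits,
// a clear bit exists at some index <= used_num when the pool has room.
int
pool_alloc (slot_pool &p)
{
  if (p.used_num >= p.max_num)
    return -1;
  unsigned i = 0;
  while (p.used[i / 64] & (std::uint64_t (1) << (i & 63)))
    i++;
  p.used[i / 64] |= std::uint64_t (1) << (i & 63);
  p.used_num++;
  return static_cast<int> (i);
}

// Give slot SLOT back to the pool; it must currently be taken.
void
pool_free (slot_pool &p, int slot)
{
  std::uint64_t bit = std::uint64_t (1) << (slot & 63);
  assert (p.used[slot / 64] & bit);
  p.used[slot / 64] &= ~bit;
  p.used_num--;
}
```

Freeing a slot in the middle and allocating again reuses that slot, which is the same lowest-free-bit behaviour as the search loop in the removed function.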
||||
|
||||
/* Note that the live range for REG ends at least at END. */
|
||||
|
||||
static void
|
||||
note_lr_end (hsa_op_reg *reg, int end)
|
||||
{
|
||||
if (reg->m_lr_end < end)
|
||||
reg->m_lr_end = end;
|
||||
}
|
||||
|
||||
/* Note that the live range for REG starts at least at BEGIN. */
|
||||
|
||||
static void
|
||||
note_lr_begin (hsa_op_reg *reg, int begin)
|
||||
{
|
||||
if (reg->m_lr_begin > begin)
|
||||
reg->m_lr_begin = begin;
|
||||
}
|
||||
|
||||
/* Given two registers A and B, return -1, 0 or 1 if A's live range
|
||||
starts before, at or after B's live range. */
|
||||
|
||||
static int
|
||||
cmp_begin (const void *a, const void *b)
|
||||
{
|
||||
const hsa_op_reg * const *rega = (const hsa_op_reg * const *)a;
|
||||
const hsa_op_reg * const *regb = (const hsa_op_reg * const *)b;
|
||||
int ret;
|
||||
if (rega == regb)
|
||||
return 0;
|
||||
ret = (*rega)->m_lr_begin - (*regb)->m_lr_begin;
|
||||
if (ret)
|
||||
return ret;
|
||||
return ((*rega)->m_order - (*regb)->m_order);
|
||||
}
|
||||
|
||||
/* Given two registers REGA and REGB, return true if REGA's
|
||||
live range ends after REGB's. This results in a sorting order
|
||||
with earlier end points at the end. */
|
||||
|
||||
static bool
|
||||
cmp_end (hsa_op_reg * const &rega, hsa_op_reg * const &regb)
|
||||
{
|
||||
int ret;
|
||||
if (rega == regb)
|
||||
return false;
|
||||
ret = (regb)->m_lr_end - (rega)->m_lr_end;
|
||||
if (ret)
|
||||
return ret < 0;
|
||||
return (((regb)->m_order - (rega)->m_order)) < 0;
|
||||
}
|
||||
|
||||
/* Expire all old intervals in ACTIVE (a per-regclass vector),
|
||||
that is, those that end before the interval REG starts. Give
|
||||
back the resources freed this way into the state CLASSES. */
|
||||
|
||||
static void
|
||||
expire_old_intervals (hsa_op_reg *reg, vec<hsa_op_reg*> *active,
|
||||
struct m_reg_class_desc *classes)
|
||||
{
|
||||
for (int i = 0; i < 4; i++)
|
||||
while (!active[i].is_empty ())
|
||||
{
|
||||
hsa_op_reg *a = active[i].pop ();
|
||||
if (a->m_lr_end > reg->m_lr_begin)
|
||||
{
|
||||
active[i].quick_push (a);
|
||||
break;
|
||||
}
|
||||
free_reg (classes, a);
|
||||
}
|
||||
}
|
||||
|
||||
/* The interval REG didn't get a hardreg. Spill it or one of those
|
||||
from ACTIVE (if the latter, then REG will become allocated to the
|
||||
hardreg that formerly was used by it). */
|
||||
|
||||
static void
|
||||
spill_at_interval (hsa_op_reg *reg, vec<hsa_op_reg*> *active)
|
||||
{
|
||||
int cl = m_reg_class_for_type (reg->m_type);
|
||||
gcc_assert (!active[cl].is_empty ());
|
||||
hsa_op_reg *cand = active[cl][0];
|
||||
if (cand->m_lr_end > reg->m_lr_end)
|
||||
{
|
||||
reg->m_reg_class = cand->m_reg_class;
|
||||
reg->m_hard_num = cand->m_hard_num;
|
||||
active[cl].ordered_remove (0);
|
||||
unsigned place = active[cl].lower_bound (reg, cmp_end);
|
||||
active[cl].quick_insert (place, reg);
|
||||
}
|
||||
else
|
||||
cand = reg;
|
||||
|
||||
gcc_assert (!cand->m_spill_sym);
|
||||
BrigType16_t type = cand->m_type;
|
||||
if (type == BRIG_TYPE_B1)
|
||||
type = BRIG_TYPE_U8;
|
||||
cand->m_reg_class = 0;
|
||||
cand->m_spill_sym = hsa_get_spill_symbol (type);
|
||||
cand->m_spill_sym->m_name_number = cand->m_order;
|
||||
}
|
||||
|
||||
/* Given the global register state CLASSES allocate all HSA virtual
|
||||
registers either to hardregs or to a spill symbol. */
|
||||
|
||||
static void
|
||||
linear_scan_regalloc (struct m_reg_class_desc *classes)
|
||||
{
|
||||
/* Compute liveness. */
|
||||
bool changed;
|
||||
int i, n;
|
||||
int insn_order;
|
||||
int *bbs = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
|
||||
bitmap work = BITMAP_ALLOC (NULL);
|
||||
vec<hsa_op_reg*> ind2reg = vNULL;
|
||||
vec<hsa_op_reg*> active[4] = {vNULL, vNULL, vNULL, vNULL};
|
||||
hsa_insn_basic *m_last_insn;
|
||||
|
||||
/* We will need the reverse post order for linearization,
|
||||
and the post order for liveness analysis, which is the same
|
||||
backward. */
|
||||
n = pre_and_rev_post_order_compute (NULL, bbs, true);
|
||||
ind2reg.safe_grow_cleared (hsa_cfun->m_reg_count);
|
||||
|
||||
/* Give all instructions a linearized number, at the same time
|
||||
build a mapping from register index to register. */
|
||||
insn_order = 1;
|
||||
for (i = 0; i < n; i++)
|
||||
{
|
||||
basic_block bb = BASIC_BLOCK_FOR_FN (cfun, bbs[i]);
|
||||
hsa_bb *hbb = hsa_bb_for_bb (bb);
|
||||
hsa_insn_basic *insn;
|
||||
for (insn = hbb->m_first_insn; insn; insn = insn->m_next)
|
||||
{
|
||||
unsigned opi;
|
||||
insn->m_number = insn_order++;
|
||||
for (opi = 0; opi < insn->operand_count (); opi++)
|
||||
{
|
||||
gcc_checking_assert (insn->get_op (opi));
|
||||
hsa_op_reg **regaddr = insn_reg_addr (insn, opi);
|
||||
if (regaddr)
|
||||
ind2reg[(*regaddr)->m_order] = *regaddr;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/* Initialize all live ranges to [after-end, 0). */
|
||||
for (i = 0; i < hsa_cfun->m_reg_count; i++)
|
||||
if (ind2reg[i])
|
||||
ind2reg[i]->m_lr_begin = insn_order, ind2reg[i]->m_lr_end = 0;
|
||||
|
||||
/* Classic liveness analysis, as long as something changes:
|
||||
m_liveout is union (m_livein of successors)
|
||||
m_livein is m_liveout minus defs plus uses. */
|
||||
do
|
||||
{
|
||||
changed = false;
|
||||
for (i = n - 1; i >= 0; i--)
|
||||
{
|
||||
edge e;
|
||||
edge_iterator ei;
|
||||
basic_block bb = BASIC_BLOCK_FOR_FN (cfun, bbs[i]);
|
||||
hsa_bb *hbb = hsa_bb_for_bb (bb);
|
||||
|
||||
/* Union of successors m_livein (or empty if none). */
|
||||
bool first = true;
|
||||
FOR_EACH_EDGE (e, ei, bb->succs)
|
||||
if (e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun))
|
||||
{
|
||||
hsa_bb *succ = hsa_bb_for_bb (e->dest);
|
||||
if (first)
|
||||
{
|
||||
bitmap_copy (work, succ->m_livein);
|
||||
first = false;
|
||||
}
|
||||
else
|
||||
bitmap_ior_into (work, succ->m_livein);
|
||||
}
|
||||
if (first)
|
||||
bitmap_clear (work);
|
||||
|
||||
bitmap_copy (hbb->m_liveout, work);
|
||||
|
||||
/* Remove defs, include uses in a backward insn walk. */
|
||||
hsa_insn_basic *insn;
|
||||
for (insn = hbb->m_last_insn; insn; insn = insn->m_prev)
|
||||
{
|
||||
unsigned opi;
|
||||
unsigned ndefs = insn->input_count ();
|
||||
for (opi = 0; opi < ndefs && insn->get_op (opi); opi++)
|
||||
{
|
||||
gcc_checking_assert (insn->get_op (opi));
|
||||
hsa_op_reg **regaddr = insn_reg_addr (insn, opi);
|
||||
if (regaddr)
|
||||
bitmap_clear_bit (work, (*regaddr)->m_order);
|
||||
}
|
||||
for (; opi < insn->operand_count (); opi++)
|
||||
{
|
||||
gcc_checking_assert (insn->get_op (opi));
|
||||
hsa_op_reg **regaddr = insn_reg_addr (insn, opi);
|
||||
if (regaddr)
|
||||
bitmap_set_bit (work, (*regaddr)->m_order);
|
||||
}
|
||||
}
|
||||
|
||||
/* Note if that changed something. */
|
||||
if (bitmap_ior_into (hbb->m_livein, work))
|
||||
changed = true;
|
||||
}
|
||||
}
|
||||
while (changed);
|
||||
|
||||
/* Make one pass through all instructions in linear order,
|
||||
noting and merging possible live range start and end points. */
|
||||
m_last_insn = NULL;
|
||||
for (i = n - 1; i >= 0; i--)
|
||||
{
|
||||
basic_block bb = BASIC_BLOCK_FOR_FN (cfun, bbs[i]);
|
||||
hsa_bb *hbb = hsa_bb_for_bb (bb);
|
||||
hsa_insn_basic *insn;
|
||||
int after_end_number;
|
||||
unsigned bit;
|
||||
bitmap_iterator bi;
|
||||
|
||||
if (m_last_insn)
|
||||
after_end_number = m_last_insn->m_number;
|
||||
else
|
||||
after_end_number = insn_order;
|
||||
/* Everything live-out in this BB has at least an end point
|
||||
after us. */
|
||||
EXECUTE_IF_SET_IN_BITMAP (hbb->m_liveout, 0, bit, bi)
|
||||
note_lr_end (ind2reg[bit], after_end_number);
|
||||
|
||||
for (insn = hbb->m_last_insn; insn; insn = insn->m_prev)
|
||||
{
|
||||
unsigned opi;
|
||||
unsigned ndefs = insn->input_count ();
|
||||
for (opi = 0; opi < insn->operand_count (); opi++)
|
||||
{
|
||||
gcc_checking_assert (insn->get_op (opi));
|
||||
hsa_op_reg **regaddr = insn_reg_addr (insn, opi);
|
||||
if (regaddr)
|
||||
{
|
||||
hsa_op_reg *reg = *regaddr;
|
||||
if (opi < ndefs)
|
||||
note_lr_begin (reg, insn->m_number);
|
||||
else
|
||||
note_lr_end (reg, insn->m_number);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/* Everything live-in in this BB has a start point before
|
||||
our first insn. */
|
||||
int before_start_number;
|
||||
if (hbb->m_first_insn)
|
||||
before_start_number = hbb->m_first_insn->m_number;
|
||||
else
|
||||
before_start_number = after_end_number;
|
||||
before_start_number--;
|
||||
EXECUTE_IF_SET_IN_BITMAP (hbb->m_livein, 0, bit, bi)
|
||||
note_lr_begin (ind2reg[bit], before_start_number);
|
||||
|
||||
if (hbb->m_first_insn)
|
||||
m_last_insn = hbb->m_first_insn;
|
||||
}
|
||||
|
||||
for (i = 0; i < hsa_cfun->m_reg_count; i++)
|
||||
if (ind2reg[i])
|
||||
{
|
||||
/* All regs that still have their start after all code are actually
|
||||
defined at the start of the routine (prologue). */
|
||||
if (ind2reg[i]->m_lr_begin == insn_order)
|
||||
ind2reg[i]->m_lr_begin = 0;
|
||||
/* All regs that have no use but a def will have lr_end == 0,
|
||||
they are actually live from def until after the insn they are
|
||||
defined in. */
|
||||
if (ind2reg[i]->m_lr_end == 0)
|
||||
ind2reg[i]->m_lr_end = ind2reg[i]->m_lr_begin + 1;
|
||||
}
|
||||
|
||||
/* Sort all intervals by increasing start point. */
|
||||
gcc_assert (ind2reg.length () == (size_t) hsa_cfun->m_reg_count);
|
||||
|
||||
if (flag_checking)
|
||||
for (unsigned i = 0; i < ind2reg.length (); i++)
|
||||
gcc_assert (ind2reg[i]);
|
||||
|
||||
ind2reg.qsort (cmp_begin);
|
||||
for (i = 0; i < 4; i++)
|
||||
active[i].reserve_exact (hsa_cfun->m_reg_count);
|
||||
|
||||
/* Now comes the linear scan allocation. */
|
||||
for (i = 0; i < hsa_cfun->m_reg_count; i++)
|
||||
{
|
||||
hsa_op_reg *reg = ind2reg[i];
|
||||
if (!reg)
|
||||
continue;
|
||||
expire_old_intervals (reg, active, classes);
|
||||
int cl = m_reg_class_for_type (reg->m_type);
|
||||
if (try_alloc_reg (classes, reg) >= 0)
|
||||
{
|
||||
unsigned place = active[cl].lower_bound (reg, cmp_end);
|
||||
active[cl].quick_insert (place, reg);
|
||||
}
|
||||
else
|
||||
spill_at_interval (reg, active);
|
||||
|
||||
/* Some interesting dumping as we go. */
|
||||
if (dump_file && (dump_flags & TDF_DETAILS))
|
||||
{
|
||||
fprintf (dump_file, " reg%d: [%5d, %5d)->",
|
||||
reg->m_order, reg->m_lr_begin, reg->m_lr_end);
|
||||
if (reg->m_reg_class)
|
||||
fprintf (dump_file, "$%c%i", reg->m_reg_class, reg->m_hard_num);
|
||||
else
|
||||
fprintf (dump_file, "[%%__%s_%i]",
|
||||
hsa_seg_name (reg->m_spill_sym->m_segment),
|
||||
reg->m_spill_sym->m_name_number);
|
||||
for (int cl = 0; cl < 4; cl++)
|
||||
{
|
||||
bool first = true;
|
||||
hsa_op_reg *r;
|
||||
fprintf (dump_file, " {");
|
||||
for (int j = 0; active[cl].iterate (j, &r); j++)
|
||||
if (first)
|
||||
{
|
||||
fprintf (dump_file, "%d", r->m_order);
|
||||
first = false;
|
||||
}
|
||||
else
|
||||
fprintf (dump_file, ", %d", r->m_order);
|
||||
fprintf (dump_file, "}");
|
||||
}
|
||||
fprintf (dump_file, "\n");
|
||||
}
|
||||
}
|
||||
|
||||
BITMAP_FREE (work);
|
||||
free (bbs);
|
||||
|
||||
if (dump_file && (dump_flags & TDF_DETAILS))
|
||||
{
|
||||
fprintf (dump_file, "------- After liveness: -------\n");
|
||||
dump_hsa_cfun_regalloc (dump_file);
|
||||
fprintf (dump_file, " ----- Intervals:\n");
|
||||
for (i = 0; i < hsa_cfun->m_reg_count; i++)
|
||||
{
|
||||
hsa_op_reg *reg = ind2reg[i];
|
||||
if (!reg)
|
||||
continue;
|
||||
fprintf (dump_file, " reg%d: [%5d, %5d)->", reg->m_order,
|
||||
reg->m_lr_begin, reg->m_lr_end);
|
||||
if (reg->m_reg_class)
|
||||
fprintf (dump_file, "$%c%i\n", reg->m_reg_class, reg->m_hard_num);
|
||||
else
|
||||
fprintf (dump_file, "[%%__%s_%i]\n",
|
||||
hsa_seg_name (reg->m_spill_sym->m_segment),
|
||||
reg->m_spill_sym->m_name_number);
|
||||
}
|
||||
}
|
||||
|
||||
for (i = 0; i < 4; i++)
|
||||
active[i].release ();
|
||||
ind2reg.release ();
|
||||
}
|
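The "classic liveness analysis" loop in the removed linear_scan_regalloc iterates until no live-in set grows. The sketch below shows the same fixed point on a toy CFG. It is a simplification under stated assumptions: the block type and compute_liveness are hypothetical, std::set replaces GCC bitmaps, and each block carries precomputed defs and upward-exposed uses instead of being walked instruction by instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical block summary.  USES holds registers read before any
// definition in the block (upward-exposed uses).
struct block
{
  std::set<int> defs, uses;
  std::vector<int> succs;        // indices of successor blocks
  std::set<int> livein, liveout;
};

// Iterate to a fixed point:
//   liveout = union of livein over all successors
//   livein  = (liveout minus defs) plus uses
// Blocks are assumed to be stored in reverse post order, so walking
// the vector backwards visits them in post order.
void
compute_liveness (std::vector<block> &blocks)
{
  bool changed;
  do
    {
      changed = false;
      for (int i = static_cast<int> (blocks.size ()) - 1; i >= 0; i--)
        {
          block &b = blocks[i];
          std::set<int> work;
          for (int s : b.succs)
            work.insert (blocks[s].livein.begin (), blocks[s].livein.end ());
          b.liveout = work;
          for (int d : b.defs)
            work.erase (d);
          work.insert (b.uses.begin (), b.uses.end ());
          std::size_t before = b.livein.size ();
          b.livein.insert (work.begin (), work.end ());
          if (b.livein.size () != before)
            changed = true;
        }
    }
  while (changed);
}
```

As in the removed code, live-in sets only ever grow, so the loop terminates once a full pass adds nothing.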
||||
|
||||
/* Entry point for register allocation. */
|
||||
|
||||
static void
|
||||
regalloc (void)
|
||||
{
|
||||
basic_block bb;
|
||||
m_reg_class_desc classes[4];
|
||||
|
||||
/* If there are no registers used in the function, exit right away. */
|
||||
if (hsa_cfun->m_reg_count == 0)
|
||||
return;
|
||||
|
||||
memset (classes, 0, sizeof (classes));
|
||||
classes[0].next_avail = 0;
|
||||
classes[0].max_num = 7;
|
||||
classes[0].cl_char = 'c';
|
||||
classes[1].cl_char = 's';
|
||||
classes[2].cl_char = 'd';
|
||||
classes[3].cl_char = 'q';
|
||||
|
||||
for (int i = 1; i < 4; i++)
|
||||
{
|
||||
classes[i].next_avail = 0;
|
||||
classes[i].max_num = 20;
|
||||
}
|
||||
|
||||
linear_scan_regalloc (classes);
|
||||
|
||||
FOR_ALL_BB_FN (bb, cfun)
|
||||
rewrite_code_bb (bb, classes);
|
||||
}
|
||||
|
||||
/* Out of SSA and register allocation on HSAIL IL. */
|
||||
|
||||
void
|
||||
hsa_regalloc (void)
|
||||
{
|
||||
hsa_cfun->update_dominance ();
|
||||
naive_outof_ssa ();
|
||||
|
||||
if (dump_file && (dump_flags & TDF_DETAILS))
|
||||
{
|
||||
fprintf (dump_file, "------- After out-of-SSA: -------\n");
|
||||
dump_hsa_cfun (dump_file);
|
||||
}
|
||||
|
||||
regalloc ();
|
||||
|
||||
if (dump_file && (dump_flags & TDF_DETAILS))
|
||||
{
|
||||
fprintf (dump_file, "------- After register allocation: -------\n");
|
||||
dump_hsa_cfun (dump_file);
|
||||
}
|
||||
}
|
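The removed hsa-regalloc.c above is a linear scan allocator: intervals are sorted by start point, expired intervals return their registers, and when nothing is free the interval that ends last is spilled. A compact standalone sketch of that scheme follows. It is not the removed implementation: interval and linear_scan are hypothetical names, there is a single register class, the active list is unsorted (the removed code keeps it ordered with cmp_end), and a spill is only recorded as hard_reg == -1 rather than rewritten into loads and stores.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical live interval [begin, end) with its assigned register,
// or -1 when the interval is spilled.
struct interval
{
  int begin, end, hard_reg;
  interval (int b, int e) : begin (b), end (e), hard_reg (-1) {}
};

static bool
starts_before (const interval *a, const interval *b)
{
  return a->begin < b->begin;
}

static bool
ends_before (const interval *a, const interval *b)
{
  return a->end < b->end;
}

// Assign one of NUM_REGS registers to each interval, spilling when
// none is free.
void
linear_scan (std::vector<interval> &ivs, int num_regs)
{
  std::vector<interval *> order;
  for (interval &iv : ivs)
    order.push_back (&iv);
  std::stable_sort (order.begin (), order.end (), starts_before);

  std::vector<interval *> active;
  std::vector<int> free_regs;
  for (int r = num_regs - 1; r >= 0; r--)
    free_regs.push_back (r);

  for (interval *cur : order)
    {
      // Expire intervals that end at or before the current start.
      for (auto it = active.begin (); it != active.end ();)
        if ((*it)->end <= cur->begin)
          {
            free_regs.push_back ((*it)->hard_reg);
            it = active.erase (it);
          }
        else
          ++it;

      if (!free_regs.empty ())
        {
          cur->hard_reg = free_regs.back ();
          free_regs.pop_back ();
          active.push_back (cur);
          continue;
        }

      // No register free: spill whichever interval ends last.
      auto victim = std::max_element (active.begin (), active.end (),
                                      ends_before);
      if (victim != active.end () && (*victim)->end > cur->end)
        {
          cur->hard_reg = (*victim)->hard_reg;
          (*victim)->hard_reg = -1;
          *victim = cur;
        }
      // Otherwise CUR itself stays spilled (hard_reg == -1).
    }
}
```

Spilling the interval with the furthest end point is the same heuristic spill_at_interval applies when it compares cand->m_lr_end with reg->m_lr_end.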
336
gcc/ipa-hsa.c
|
@ -1,336 +0,0 @@
|
|||
/* Callgraph based analysis of static variables.
|
||||
Copyright (C) 2015-2020 Free Software Foundation, Inc.
|
||||
Contributed by Martin Liska <mliska@suse.cz>
|
||||
|
||||
This file is part of GCC.
|
||||
|
||||
GCC is free software; you can redistribute it and/or modify it under
|
||||
the terms of the GNU General Public License as published by the Free
|
||||
Software Foundation; either version 3, or (at your option) any later
|
||||
version.
|
||||
|
||||
GCC is distributed in the hope that it will be useful, but WITHOUT ANY
|
||||
WARRANTY; without even the implied warranty of MERCHANTABILITY or
|
||||
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
|
||||
for more details.
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with GCC; see the file COPYING3. If not see
|
||||
<http://www.gnu.org/licenses/>. */
|
||||
|
||||
/* Interprocedural HSA pass is responsible for creation of HSA clones.
|
||||
For all these HSA clones, we emit HSAIL instructions and pass processing
|
||||
is terminated. */
|
||||
|
||||
#include "config.h"
|
||||
#include "system.h"
|
||||
#include "coretypes.h"
|
||||
#include "tm.h"
|
||||
#include "is-a.h"
|
||||
#include "hash-set.h"
|
||||
#include "vec.h"
|
||||
#include "tree.h"
|
||||
#include "tree-pass.h"
|
||||
#include "function.h"
|
||||
#include "basic-block.h"
|
||||
#include "gimple.h"
|
||||
#include "dumpfile.h"
|
||||
#include "gimple-pretty-print.h"
|
||||
#include "tree-streamer.h"
|
||||
#include "stringpool.h"
|
||||
#include "cgraph.h"
|
||||
#include "print-tree.h"
|
||||
#include "alloc-pool.h"
|
||||
#include "symbol-summary.h"
|
||||
#include "hsa-common.h"
|
||||
|
||||
namespace {
|
||||
|
||||
/* If NODE is not versionable, warn about not emitting HSAIL and return false.
|
||||
Otherwise return true. */
|
||||
|
||||
static bool
|
||||
check_warn_node_versionable (cgraph_node *node)
|
||||
{
|
||||
if (!node->versionable)
|
||||
{
|
||||
warning_at (EXPR_LOCATION (node->decl), OPT_Whsa,
|
||||
"could not emit HSAIL for function %s: function cannot be "
|
||||
"cloned", node->dump_name ());
|
||||
return false;
|
||||
}
|
||||
return true;
|
||||
}
|
||||
|
||||
/* The function creates HSA clones for all functions that were either
|
||||
marked as HSA kernels or are callable HSA functions. Apart from that,
|
||||
we redirect all edges that come from an HSA clone and end in another
|
||||
HSA clone to connect these two functions. */
|
||||
|
||||
static unsigned int
|
||||
process_hsa_functions (void)
|
||||
{
|
||||
struct cgraph_node *node;
|
||||
|
||||
if (hsa_summaries == NULL)
|
||||
hsa_summaries = new hsa_summary_t (symtab);
|
||||
|
||||
FOR_EACH_DEFINED_FUNCTION (node)
|
||||
{
|
||||
hsa_function_summary *s = hsa_summaries->get (node);
|
||||
|
||||
/* A linked function is skipped. */
|
||||
if (s != NULL && s->m_bound_function != NULL)
|
||||
continue;
|
||||
|
||||
if (s != NULL)
|
||||
{
|
||||
if (!check_warn_node_versionable (node))
|
||||
continue;
|
||||
cgraph_node *clone
|
||||
= node->create_virtual_clone (vec <cgraph_edge *> (),
|
||||
NULL, NULL, "hsa", 0);
|
||||
TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
|
||||
clone->externally_visible = node->externally_visible;
|
||||
|
||||
clone->force_output = true;
|
||||
hsa_summaries->link_functions (clone, node, s->m_kind, false);
|
||||
|
||||
if (dump_file)
|
||||
fprintf (dump_file, "Created a new HSA clone: %s, type: %s\n",
|
||||
clone->dump_name (),
|
||||
s->m_kind == HSA_KERNEL ? "kernel" : "function");
|
||||
}
|
||||
else if (hsa_callable_function_p (node->decl)
|
||||
/* At this point, this is enough to identify clones for
|
||||
parallel, which for HSA would need to be kernels anyway. */
|
||||
&& !DECL_ARTIFICIAL (node->decl))
|
||||
{
|
||||
if (!check_warn_node_versionable (node))
|
||||
continue;
|
||||
cgraph_node *clone
|
||||
= node->create_virtual_clone (vec <cgraph_edge *> (),
|
||||
NULL, NULL, "hsa", 0);
|
||||
TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
|
||||
clone->externally_visible = node->externally_visible;
|
||||
|
||||
if (!node->local)
|
||||
clone->force_output = true;
|
||||
hsa_summaries->link_functions (clone, node, HSA_FUNCTION, false);
|
||||
|
||||
if (dump_file)
|
||||
fprintf (dump_file, "Created a new HSA function clone: %s\n",
|
||||
clone->dump_name ());
|
||||
}
|
||||
}
|
||||
|
||||
/* Redirect all edges that are between HSA clones. */
|
||||
FOR_EACH_DEFINED_FUNCTION (node)
|
||||
{
|
||||
cgraph_edge *e = node->callees;
|
||||
|
||||
while (e)
|
||||
{
|
||||
hsa_function_summary *src = hsa_summaries->get (node);
|
||||
if (src != NULL && src->m_gpu_implementation_p)
|
||||
{
|
||||
hsa_function_summary *dst = hsa_summaries->get (e->callee);
|
||||
if (dst != NULL && !dst->m_gpu_implementation_p)
|
||||
{
|
||||
e->redirect_callee (dst->m_bound_function);
|
||||
if (dump_file)
|
||||
fprintf (dump_file,
|
||||
"Redirecting edge to HSA function: %s->%s\n",
|
||||
e->caller->dump_name (),
|
||||
e->callee->dump_name ());
|
||||
}
|
||||
}
|
||||
|
||||
e = e->next_callee;
|
||||
}
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* Iterate all HSA functions and stream out HSA function summary. */
|
||||
|
||||
static void
|
||||
ipa_hsa_write_summary (void)
|
||||
{
|
||||
struct bitpack_d bp;
|
||||
struct cgraph_node *node;
|
||||
struct output_block *ob;
|
||||
unsigned int count = 0;
|
||||
lto_symtab_encoder_iterator lsei;
|
||||
lto_symtab_encoder_t encoder;
|
||||
|
||||
if (!hsa_summaries)
|
||||
return;
|
||||
|
||||
ob = create_output_block (LTO_section_ipa_hsa);
|
||||
encoder = ob->decl_state->symtab_node_encoder;
|
||||
ob->symbol = NULL;
|
||||
for (lsei = lsei_start_function_in_partition (encoder); !lsei_end_p (lsei);
|
||||
lsei_next_function_in_partition (&lsei))
|
||||
{
|
||||
node = lsei_cgraph_node (lsei);
|
||||
hsa_function_summary *s = hsa_summaries->get (node);
|
||||
|
||||
if (s != NULL)
|
||||
count++;
|
||||
}
|
||||
|
||||
streamer_write_uhwi (ob, count);
|
||||
|
||||
/* Process all of the functions. */
|
||||
for (lsei = lsei_start_function_in_partition (encoder); !lsei_end_p (lsei);
|
||||
lsei_next_function_in_partition (&lsei))
|
||||
{
|
||||
node = lsei_cgraph_node (lsei);
|
||||
hsa_function_summary *s = hsa_summaries->get (node);
|
||||
|
||||
if (s != NULL)
|
||||
{
|
||||
encoder = ob->decl_state->symtab_node_encoder;
|
||||
int node_ref = lto_symtab_encoder_encode (encoder, node);
|
||||
streamer_write_uhwi (ob, node_ref);
|
||||
|
||||
bp = bitpack_create (ob->main_stream);
|
||||
bp_pack_value (&bp, s->m_kind, 2);
|
||||
bp_pack_value (&bp, s->m_gpu_implementation_p, 1);
|
||||
bp_pack_value (&bp, s->m_bound_function != NULL, 1);
|
||||
streamer_write_bitpack (&bp);
|
||||
if (s->m_bound_function)
|
||||
stream_write_tree (ob, s->m_bound_function->decl, true);
|
||||
}
|
||||
}
|
||||
|
||||
streamer_write_char_stream (ob->main_stream, 0);
|
||||
produce_asm (ob, NULL);
|
||||
destroy_output_block (ob);
|
||||
}
|
||||
|
||||
/* Read section in file FILE_DATA of length LEN with data DATA. */
|
||||
|
||||
static void
|
||||
ipa_hsa_read_section (struct lto_file_decl_data *file_data, const char *data,
|
||||
size_t len)
|
||||
{
|
||||
const struct lto_function_header *header
|
||||
= (const struct lto_function_header *) data;
|
||||
const int cfg_offset = sizeof (struct lto_function_header);
|
||||
const int main_offset = cfg_offset + header->cfg_size;
|
||||
const int string_offset = main_offset + header->main_size;
|
||||
class data_in *data_in;
|
||||
unsigned int i;
|
||||
unsigned int count;
|
||||
|
||||
lto_input_block ib_main ((const char *) data + main_offset,
|
||||
header->main_size, file_data->mode_table);
|
||||
|
||||
data_in
|
||||
= lto_data_in_create (file_data, (const char *) data + string_offset,
|
||||
header->string_size, vNULL);
|
||||
count = streamer_read_uhwi (&ib_main);
|
||||
|
||||
for (i = 0; i < count; i++)
|
||||
{
|
||||
unsigned int index;
|
||||
struct cgraph_node *node;
|
||||
lto_symtab_encoder_t encoder;
|
||||
|
||||
index = streamer_read_uhwi (&ib_main);
|
||||
encoder = file_data->symtab_node_encoder;
|
||||
node = dyn_cast<cgraph_node *> (lto_symtab_encoder_deref (encoder,
|
||||
index));
|
||||
gcc_assert (node->definition);
|
||||
hsa_function_summary *s = hsa_summaries->get_create (node);
|
||||
|
||||
struct bitpack_d bp = streamer_read_bitpack (&ib_main);
|
||||
s->m_kind = (hsa_function_kind) bp_unpack_value (&bp, 2);
|
||||
s->m_gpu_implementation_p = bp_unpack_value (&bp, 1);
|
||||
bool has_tree = bp_unpack_value (&bp, 1);
|
||||
|
||||
if (has_tree)
|
||||
{
|
||||
tree decl = stream_read_tree (&ib_main, data_in);
|
||||
s->m_bound_function = cgraph_node::get_create (decl);
|
||||
}
|
||||
}
|
||||
lto_free_section_data (file_data, LTO_section_ipa_hsa, NULL, data,
|
||||
len);
|
||||
lto_data_in_delete (data_in);
|
||||
}
|
||||
|
||||
/* Load streamed HSA functions summary and assign the summary to a function. */
|
||||
|
||||
static void
|
||||
ipa_hsa_read_summary (void)
|
||||
{
|
||||
struct lto_file_decl_data **file_data_vec = lto_get_file_decl_data ();
|
||||
struct lto_file_decl_data *file_data;
|
||||
unsigned int j = 0;
|
||||
|
||||
if (hsa_summaries == NULL)
|
||||
hsa_summaries = new hsa_summary_t (symtab);
|
||||
|
||||
while ((file_data = file_data_vec[j++]))
|
||||
{
|
||||
size_t len;
|
||||
const char *data
|
||||
= lto_get_summary_section_data (file_data, LTO_section_ipa_hsa, &len);
|
||||
if (data)
|
||||
ipa_hsa_read_section (file_data, data, len);
|
||||
}
|
||||
}
|
||||
|
||||
const pass_data pass_data_ipa_hsa =
|
||||
{
|
||||
IPA_PASS, /* type */
|
||||
"hsa", /* name */
|
||||
OPTGROUP_OMP, /* optinfo_flags */
|
||||
TV_IPA_HSA, /* tv_id */
|
||||
0, /* properties_required */
|
||||
0, /* properties_provided */
|
||||
0, /* properties_destroyed */
|
||||
0, /* todo_flags_start */
|
||||
TODO_dump_symtab, /* todo_flags_finish */
|
||||
};
|
||||
|
||||
class pass_ipa_hsa : public ipa_opt_pass_d
|
||||
{
|
||||
public:
|
||||
pass_ipa_hsa (gcc::context *ctxt)
|
||||
: ipa_opt_pass_d (pass_data_ipa_hsa, ctxt,
|
||||
NULL, /* generate_summary */
|
||||
ipa_hsa_write_summary, /* write_summary */
|
||||
ipa_hsa_read_summary, /* read_summary */
|
||||
ipa_hsa_write_summary, /* write_optimization_summary */
|
||||
ipa_hsa_read_summary, /* read_optimization_summary */
|
||||
NULL, /* stmt_fixup */
|
||||
0, /* function_transform_todo_flags_start */
|
||||
NULL, /* function_transform */
|
||||
NULL) /* variable_transform */
|
||||
{}
|
||||
|
||||
/* opt_pass methods: */
|
||||
virtual bool gate (function *);
|
||||
|
||||
virtual unsigned int execute (function *) { return process_hsa_functions (); }
|
||||
|
||||
}; // class pass_ipa_reference
|
||||
|
||||
bool
|
||||
pass_ipa_hsa::gate (function *)
|
||||
{
|
||||
return hsa_gen_requested_p ();
|
||||
}
|
||||
|
||||
} // anon namespace
|
||||
|
||||
ipa_opt_pass_d *
|
||||
make_pass_ipa_hsa (gcc::context *ctxt)
|
||||
{
|
||||
return new pass_ipa_hsa (ctxt);
|
||||
}
|
|
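Note on the removed ipa-hsa.c streaming code above: each function summary is written as a small bit field (two bits of kind, one flag for a GPU implementation, one flag saying whether a bound-function decl follows in the stream). A minimal sketch of that layout in plain C follows. The struct, the function names and the exact bit positions are assumptions made for illustration; this is not GCC's bitpack API and makes no claim about its wire format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the streamed per-function record.  The field
   names echo hsa_function_summary, but this is not the GCC type.  */
struct summary_bits
{
  unsigned kind;            /* 2 bits, like s->m_kind.  */
  bool gpu_implementation;  /* 1 bit, like s->m_gpu_implementation_p.  */
  bool has_bound_function;  /* 1 bit, set when a decl follows.  */
};

/* Pack the three fields into the low four bits of a byte.  The bit
   positions are an assumption chosen for this example.  */
static uint8_t
pack_summary (struct summary_bits s)
{
  assert (s.kind < 4);
  return (uint8_t) (s.kind
                    | ((unsigned) s.gpu_implementation << 2)
                    | ((unsigned) s.has_bound_function << 3));
}

/* Inverse of pack_summary: the reader must unpack the fields in the
   same order and with the same widths the writer used.  */
static struct summary_bits
unpack_summary (uint8_t word)
{
  struct summary_bits s;
  s.kind = word & 0x3;
  s.gpu_implementation = (word >> 2) & 1;
  s.has_bound_function = (word >> 3) & 1;
  return s;
}
```

The point the sketch makes is the symmetry between writer and reader: the widths 2, 1, 1 appear in the same order in ipa_hsa_write_summary and ipa_hsa_read_section, so both had to go together.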
@ -53,7 +53,6 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
|
|||
"icf",
|
||||
"offload_table",
|
||||
"mode_table",
|
||||
"hsa",
|
||||
"lto",
|
||||
"ipa_sra",
|
||||
"odr_types",
|
||||
|
|
|
@ -224,7 +224,6 @@ enum lto_section_type
|
|||
LTO_section_ipa_icf,
|
||||
LTO_section_offload_table,
|
||||
LTO_section_mode_table,
|
||||
LTO_section_ipa_hsa,
|
||||
LTO_section_lto,
|
||||
LTO_section_ipa_sra,
|
||||
LTO_section_odr_types,
|
||||
|
|
|
@@ -976,7 +976,6 @@ compile_images_for_offload_targets (unsigned in_argc, char *in_argv[],
     return;
   unsigned num_targets = parse_env_var (target_names, &names, NULL);

-  int next_name_entry = 0;
   const char *compiler_path = getenv ("COMPILER_PATH");
   if (!compiler_path)
     goto out;
@@ -986,19 +985,13 @@ compile_images_for_offload_targets (unsigned in_argc, char *in_argv[],
   offload_names = XCNEWVEC (char *, num_targets + 1);
   for (unsigned i = 0; i < num_targets; i++)
     {
-      /* HSA does not use LTO-like streaming and a different compiler, skip
-         it. */
-      if (strcmp (names[i], "hsa") == 0)
-        continue;
-
-      offload_names[next_name_entry]
+      offload_names[i]
         = compile_offload_image (names[i], compiler_path, in_argc, in_argv,
                                  compiler_opts, compiler_opt_count,
                                  linker_opts, linker_opt_count);
-      if (!offload_names[next_name_entry])
+      if (!offload_names[i])
         fatal_error (input_location,
                      "problem with building target image for %s", names[i]);
-      next_name_entry++;
     }

  out:
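Note on the lto-wrapper hunk above: the old loop skipped the "hsa" entry and therefore needed a second index to keep the output array dense; with the skip gone, input and output positions coincide. The following is a reduced sketch of the two shapes, assuming hypothetical helper names and plain string arrays rather than the real compile_offload_image call.

```c
#include <assert.h>
#include <string.h>

/* Old shape: copy NAMES into OUT, skipping "hsa" and compacting the
   result through a separate output index.  Returns the entries written.
   Hypothetical helper, not a function from lto-wrapper.c.  */
static unsigned
collect_skipping_hsa (const char **names, unsigned n, const char **out)
{
  assert (names && out);
  unsigned next = 0;
  for (unsigned i = 0; i < n; i++)
    {
      if (strcmp (names[i], "hsa") == 0)
        continue;
      out[next++] = names[i];
    }
  return next;
}

/* New shape: every target is processed, so slot I of the output always
   corresponds to slot I of the input.  Also hypothetical.  */
static unsigned
collect_all (const char **names, unsigned n, const char **out)
{
  assert (names && out);
  for (unsigned i = 0; i < n; i++)
    out[i] = names[i];
  return n;
}
```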
457
gcc/omp-expand.c
|
@ -56,7 +56,6 @@ along with GCC; see the file COPYING3. If not see
|
|||
#include "symbol-summary.h"
|
||||
#include "gomp-constants.h"
|
||||
#include "gimple-pretty-print.h"
|
||||
#include "hsa-common.h"
|
||||
#include "stringpool.h"
|
||||
#include "attribs.h"
|
||||
|
||||
|
@ -484,37 +483,6 @@ gimple_build_cond_empty (tree cond)
|
|||
return gimple_build_cond (pred_code, lhs, rhs, NULL_TREE, NULL_TREE);
|
||||
}
|
||||
|
||||
/* Return true if a parallel REGION is within a declare target function or
|
||||
within a target region and is not a part of a gridified target. */
|
||||
|
||||
static bool
|
||||
parallel_needs_hsa_kernel_p (struct omp_region *region)
|
||||
{
|
||||
bool indirect = false;
|
||||
for (region = region->outer; region; region = region->outer)
|
||||
{
|
||||
if (region->type == GIMPLE_OMP_PARALLEL)
|
||||
indirect = true;
|
||||
else if (region->type == GIMPLE_OMP_TARGET)
|
||||
{
|
||||
gomp_target *tgt_stmt
|
||||
= as_a <gomp_target *> (last_stmt (region->entry));
|
||||
|
||||
if (omp_find_clause (gimple_omp_target_clauses (tgt_stmt),
|
||||
OMP_CLAUSE__GRIDDIM_))
|
||||
return indirect;
|
||||
else
|
||||
return true;
|
||||
}
|
||||
}
|
||||
|
||||
if (lookup_attribute ("omp declare target",
|
||||
DECL_ATTRIBUTES (current_function_decl)))
|
||||
return true;
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
/* Change DECL_CONTEXT of CHILD_FNDECL to that of the parent function.
|
||||
Add CHILD_FNDECL to decl chain of the supercontext of the block
|
||||
ENTRY_BLOCK - this is the block which originally contained the
|
||||
|
@ -772,13 +740,6 @@ expand_parallel_call (struct omp_region *region, basic_block bb,
|
|||
}
|
||||
force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
|
||||
false, GSI_CONTINUE_LINKING);
|
||||
|
||||
if (hsa_gen_requested_p ()
|
||||
&& parallel_needs_hsa_kernel_p (region))
|
||||
{
|
||||
cgraph_node *child_cnode = cgraph_node::get (child_fndecl);
|
||||
hsa_register_kernel (child_cnode);
|
||||
}
|
||||
}
|
||||
|
||||
/* Build the function call to GOMP_task to actually
|
||||
|
@ -8528,113 +8489,6 @@ mark_loops_in_oacc_kernels_region (basic_block region_entry,
|
|||
loop->in_oacc_kernels_region = true;
|
||||
}
|
||||
|
||||
/* Types used to pass grid and wortkgroup sizes to kernel invocation. */
|
||||
|
||||
struct GTY(()) grid_launch_attributes_trees
|
||||
{
|
||||
tree kernel_dim_array_type;
|
||||
tree kernel_lattrs_dimnum_decl;
|
||||
tree kernel_lattrs_grid_decl;
|
||||
tree kernel_lattrs_group_decl;
|
||||
tree kernel_launch_attributes_type;
|
||||
};
|
||||
|
||||
static GTY(()) struct grid_launch_attributes_trees *grid_attr_trees;
|
||||
|
||||
/* Create types used to pass kernel launch attributes to target. */
|
||||
|
||||
static void
|
||||
grid_create_kernel_launch_attr_types (void)
|
||||
{
|
||||
if (grid_attr_trees)
|
||||
return;
|
||||
grid_attr_trees = ggc_alloc <grid_launch_attributes_trees> ();
|
||||
|
||||
tree dim_arr_index_type
|
||||
= build_index_type (build_int_cst (integer_type_node, 2));
|
||||
grid_attr_trees->kernel_dim_array_type
|
||||
= build_array_type (uint32_type_node, dim_arr_index_type);
|
||||
|
||||
grid_attr_trees->kernel_launch_attributes_type = make_node (RECORD_TYPE);
|
||||
grid_attr_trees->kernel_lattrs_dimnum_decl
|
||||
= build_decl (BUILTINS_LOCATION, FIELD_DECL, get_identifier ("ndim"),
|
||||
uint32_type_node);
|
||||
DECL_CHAIN (grid_attr_trees->kernel_lattrs_dimnum_decl) = NULL_TREE;
|
||||
|
||||
grid_attr_trees->kernel_lattrs_grid_decl
|
||||
= build_decl (BUILTINS_LOCATION, FIELD_DECL, get_identifier ("grid_size"),
|
||||
grid_attr_trees->kernel_dim_array_type);
|
||||
DECL_CHAIN (grid_attr_trees->kernel_lattrs_grid_decl)
|
||||
= grid_attr_trees->kernel_lattrs_dimnum_decl;
|
||||
grid_attr_trees->kernel_lattrs_group_decl
|
||||
= build_decl (BUILTINS_LOCATION, FIELD_DECL, get_identifier ("group_size"),
|
||||
grid_attr_trees->kernel_dim_array_type);
|
||||
DECL_CHAIN (grid_attr_trees->kernel_lattrs_group_decl)
|
||||
= grid_attr_trees->kernel_lattrs_grid_decl;
|
||||
finish_builtin_struct (grid_attr_trees->kernel_launch_attributes_type,
|
||||
"__gomp_kernel_launch_attributes",
|
||||
grid_attr_trees->kernel_lattrs_group_decl, NULL_TREE);
|
||||
}
|
||||
|
||||
/* Insert before the current statement in GSI a store of VALUE to INDEX of
|
||||
array (of type kernel_dim_array_type) FLD_DECL of RANGE_VAR. VALUE must be
|
||||
of type uint32_type_node. */
|
||||
|
||||
static void
|
||||
grid_insert_store_range_dim (gimple_stmt_iterator *gsi, tree range_var,
|
||||
tree fld_decl, int index, tree value)
|
||||
{
|
||||
tree ref = build4 (ARRAY_REF, uint32_type_node,
|
||||
build3 (COMPONENT_REF,
|
||||
grid_attr_trees->kernel_dim_array_type,
|
||||
range_var, fld_decl, NULL_TREE),
|
||||
build_int_cst (integer_type_node, index),
|
||||
NULL_TREE, NULL_TREE);
|
||||
gsi_insert_before (gsi, gimple_build_assign (ref, value), GSI_SAME_STMT);
|
||||
}
|
||||
|
||||
/* Return a tree representation of a pointer to a structure with grid and
|
||||
work-group size information. Statements filling that information will be
|
||||
inserted before GSI, TGT_STMT is the target statement which has the
|
||||
necessary information in it. */
|
||||
|
||||
static tree
|
||||
grid_get_kernel_launch_attributes (gimple_stmt_iterator *gsi,
|
||||
gomp_target *tgt_stmt)
|
||||
{
|
||||
grid_create_kernel_launch_attr_types ();
|
||||
tree lattrs = create_tmp_var (grid_attr_trees->kernel_launch_attributes_type,
|
||||
"__kernel_launch_attrs");
|
||||
|
||||
unsigned max_dim = 0;
|
||||
for (tree clause = gimple_omp_target_clauses (tgt_stmt);
|
||||
clause;
|
||||
clause = OMP_CLAUSE_CHAIN (clause))
|
||||
{
|
||||
if (OMP_CLAUSE_CODE (clause) != OMP_CLAUSE__GRIDDIM_)
|
||||
continue;
|
||||
|
||||
unsigned dim = OMP_CLAUSE__GRIDDIM__DIMENSION (clause);
|
||||
max_dim = MAX (dim, max_dim);
|
||||
|
||||
grid_insert_store_range_dim (gsi, lattrs,
|
||||
grid_attr_trees->kernel_lattrs_grid_decl,
|
||||
dim, OMP_CLAUSE__GRIDDIM__SIZE (clause));
|
||||
grid_insert_store_range_dim (gsi, lattrs,
|
||||
grid_attr_trees->kernel_lattrs_group_decl,
|
||||
dim, OMP_CLAUSE__GRIDDIM__GROUP (clause));
|
||||
}
|
||||
|
||||
tree dimref = build3 (COMPONENT_REF, uint32_type_node, lattrs,
|
||||
grid_attr_trees->kernel_lattrs_dimnum_decl, NULL_TREE);
|
||||
gcc_checking_assert (max_dim <= 2);
|
||||
tree dimensions = build_int_cstu (uint32_type_node, max_dim + 1);
|
||||
gsi_insert_before (gsi, gimple_build_assign (dimref, dimensions),
|
||||
GSI_SAME_STMT);
|
||||
TREE_ADDRESSABLE (lattrs) = 1;
|
||||
return build_fold_addr_expr (lattrs);
|
||||
}
|
||||
|
||||
/* Build target argument identifier from the DEVICE identifier, value
|
||||
identifier ID and whether the element also has a SUBSEQUENT_PARAM. */
|
||||
|
||||
|
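Note on the removed grid_create_kernel_launch_attr_types and grid_get_kernel_launch_attributes above: together they build a record with a dimension count and two three-element arrays, fill one array slot per _griddim_ clause, and store the highest dimension seen plus one. A plain-C sketch of the same bookkeeping follows. The type and function names are hypothetical, the field order is assumed from the way the field decls are chained, and unlike the removed code this sketch zero-initializes the unused slots.

```c
#include <assert.h>
#include <stdint.h>

/* Plain-C mirror of the record the removed code built as a tree type.
   The name and field order are assumptions for this example.  */
struct kernel_launch_attrs
{
  uint32_t ndim;
  uint32_t grid_size[3];
  uint32_t group_size[3];
};

/* Hypothetical description of one _griddim_ clause.  */
struct griddim
{
  unsigned dim;    /* 0, 1 or 2.  */
  uint32_t size;   /* Grid size in this dimension.  */
  uint32_t group;  /* Work-group size in this dimension.  */
};

/* Fill the launch attributes from N clauses, tracking the highest
   dimension index so that ndim ends up as max_dim + 1.  */
static struct kernel_launch_attrs
fill_launch_attrs (const struct griddim *clauses, unsigned n)
{
  struct kernel_launch_attrs a = { 0 };
  unsigned max_dim = 0;
  for (unsigned i = 0; i < n; i++)
    {
      unsigned dim = clauses[i].dim;
      assert (dim <= 2);
      if (dim > max_dim)
        max_dim = dim;
      a.grid_size[dim] = clauses[i].size;
      a.group_size[dim] = clauses[i].group;
    }
  a.ndim = max_dim + 1;
  return a;
}
```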
@ -8725,16 +8579,6 @@ get_target_arguments (gimple_stmt_iterator *gsi, gomp_target *tgt_stmt)
|
|||
GOMP_TARGET_ARG_THREAD_LIMIT, t,
|
||||
&args);
|
||||
|
||||
/* Add HSA-specific grid sizes, if available. */
|
||||
if (omp_find_clause (gimple_omp_target_clauses (tgt_stmt),
|
||||
OMP_CLAUSE__GRIDDIM_))
|
||||
{
|
||||
int id = GOMP_TARGET_ARG_HSA_KERNEL_ATTRIBUTES;
|
||||
t = get_target_argument_identifier (GOMP_DEVICE_HSA, true, id);
|
||||
args.quick_push (t);
|
||||
args.quick_push (grid_get_kernel_launch_attributes (gsi, tgt_stmt));
|
||||
}
|
||||
|
||||
/* Produce more, perhaps device specific, arguments here. */
|
||||
|
||||
tree argarray = create_tmp_var (build_array_type_nelts (ptr_type_node,
|
||||
|
@ -9351,302 +9195,6 @@ expand_omp_target (struct omp_region *region)
|
|||
}
|
||||
}
|
||||
|
||||
/* Expand KFOR loop as a HSA grifidied kernel, i.e. as a body only with
|
||||
iteration variable derived from the thread number. INTRA_GROUP means this
|
||||
is an expansion of a loop iterating over work-items within a separate
|
||||
iteration over groups. */
|
||||
|
||||
static void
|
||||
grid_expand_omp_for_loop (struct omp_region *kfor, bool intra_group)
|
||||
{
|
||||
gimple_stmt_iterator gsi;
|
||||
gomp_for *for_stmt = as_a <gomp_for *> (last_stmt (kfor->entry));
|
||||
gcc_checking_assert (gimple_omp_for_kind (for_stmt)
|
||||
== GF_OMP_FOR_KIND_GRID_LOOP);
|
||||
size_t collapse = gimple_omp_for_collapse (for_stmt);
|
||||
struct omp_for_data_loop *loops
|
||||
= XALLOCAVEC (struct omp_for_data_loop,
|
||||
gimple_omp_for_collapse (for_stmt));
|
||||
struct omp_for_data fd;
|
||||
|
||||
remove_edge (BRANCH_EDGE (kfor->entry));
|
||||
basic_block body_bb = FALLTHRU_EDGE (kfor->entry)->dest;
|
||||
|
||||
gcc_assert (kfor->cont);
|
||||
omp_extract_for_data (for_stmt, &fd, loops);
|
||||
|
||||
gsi = gsi_start_bb (body_bb);
|
||||
|
||||
for (size_t dim = 0; dim < collapse; dim++)
|
||||
{
|
||||
tree type, itype;
|
||||
itype = type = TREE_TYPE (fd.loops[dim].v);
|
||||
if (POINTER_TYPE_P (type))
|
||||
itype = signed_type_for (type);
|
||||
|
||||
tree n1 = fd.loops[dim].n1;
|
||||
tree step = fd.loops[dim].step;
|
||||
n1 = force_gimple_operand_gsi (&gsi, fold_convert (type, n1),
|
||||
true, NULL_TREE, true, GSI_SAME_STMT);
|
||||
step = force_gimple_operand_gsi (&gsi, fold_convert (itype, step),
|
||||
true, NULL_TREE, true, GSI_SAME_STMT);
|
||||
tree threadid;
|
||||
if (gimple_omp_for_grid_group_iter (for_stmt))
|
||||
{
|
||||
gcc_checking_assert (!intra_group);
|
||||
threadid = build_call_expr (builtin_decl_explicit
|
||||
(BUILT_IN_HSA_WORKGROUPID), 1,
|
||||
build_int_cstu (unsigned_type_node, dim));
|
||||
}
|
||||
else if (intra_group)
|
||||
threadid = build_call_expr (builtin_decl_explicit
|
||||
(BUILT_IN_HSA_WORKITEMID), 1,
|
||||
build_int_cstu (unsigned_type_node, dim));
|
||||
else
|
||||
threadid = build_call_expr (builtin_decl_explicit
|
||||
(BUILT_IN_HSA_WORKITEMABSID), 1,
|
||||
build_int_cstu (unsigned_type_node, dim));
|
||||
threadid = fold_convert (itype, threadid);
|
||||
threadid = force_gimple_operand_gsi (&gsi, threadid, true, NULL_TREE,
|
||||
true, GSI_SAME_STMT);
|
||||
|
||||
tree startvar = fd.loops[dim].v;
|
||||
tree t = fold_build2 (MULT_EXPR, itype, threadid, step);
|
||||
if (POINTER_TYPE_P (type))
|
||||
t = fold_build_pointer_plus (n1, t);
|
||||
else
|
||||
t = fold_build2 (PLUS_EXPR, type, t, n1);
|
||||
t = fold_convert (type, t);
|
||||
t = force_gimple_operand_gsi (&gsi, t,
|
||||
DECL_P (startvar)
|
||||
&& TREE_ADDRESSABLE (startvar),
|
||||
NULL_TREE, true, GSI_SAME_STMT);
|
||||
gassign *assign_stmt = gimple_build_assign (startvar, t);
|
||||
gsi_insert_before (&gsi, assign_stmt, GSI_SAME_STMT);
|
||||
}
|
||||
/* Remove the omp for statement. */
|
||||
gsi = gsi_last_nondebug_bb (kfor->entry);
|
||||
gsi_remove (&gsi, true);
|
||||
|
||||
/* Remove the GIMPLE_OMP_CONTINUE statement. */
|
||||
gsi = gsi_last_nondebug_bb (kfor->cont);
|
||||
gcc_assert (!gsi_end_p (gsi)
|
||||
&& gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_CONTINUE);
|
||||
gsi_remove (&gsi, true);
|
||||
|
||||
/* Replace the GIMPLE_OMP_RETURN with a barrier, if necessary. */
|
||||
gsi = gsi_last_nondebug_bb (kfor->exit);
|
||||
gcc_assert (!gsi_end_p (gsi)
|
||||
&& gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_RETURN);
|
||||
if (intra_group)
|
||||
gsi_insert_before (&gsi, omp_build_barrier (NULL_TREE), GSI_SAME_STMT);
|
||||
gsi_remove (&gsi, true);
|
||||
|
||||
/* Fixup the much simpler CFG. */
|
||||
remove_edge (find_edge (kfor->cont, body_bb));
|
||||
|
||||
if (kfor->cont != body_bb)
|
||||
set_immediate_dominator (CDI_DOMINATORS, kfor->cont, body_bb);
|
||||
set_immediate_dominator (CDI_DOMINATORS, kfor->exit, kfor->cont);
|
||||
}
|
||||
|
||||
/* Structure passed to grid_remap_kernel_arg_accesses so that it can remap
|
||||
argument_decls. */
|
||||
|
||||
struct grid_arg_decl_map
|
||||
{
|
||||
tree old_arg;
|
||||
tree new_arg;
|
||||
};
|
||||
|
||||
/* Invoked through walk_gimple_op, will remap all PARM_DECLs to the ones
|
||||
pertaining to kernel function. */
|
||||
|
||||
static tree
|
||||
grid_remap_kernel_arg_accesses (tree *tp, int *walk_subtrees, void *data)
|
||||
{
|
||||
struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
|
||||
struct grid_arg_decl_map *adm = (struct grid_arg_decl_map *) wi->info;
|
||||
tree t = *tp;
|
||||
|
||||
if (t == adm->old_arg)
|
||||
*tp = adm->new_arg;
|
||||
*walk_subtrees = !TYPE_P (t) && !DECL_P (t);
|
||||
return NULL_TREE;
|
||||
}
|
||||
|
||||
/* If TARGET region contains a kernel body for loop, remove its region from the
|
||||
TARGET and expand it in HSA gridified kernel fashion. */
|
||||
|
||||
static void
|
||||
grid_expand_target_grid_body (struct omp_region *target)
|
||||
{
|
||||
if (!hsa_gen_requested_p ())
|
||||
return;
|
||||
|
||||
gomp_target *tgt_stmt = as_a <gomp_target *> (last_stmt (target->entry));
|
||||
struct omp_region **pp;
|
||||
|
||||
for (pp = &target->inner; *pp; pp = &(*pp)->next)
|
||||
if ((*pp)->type == GIMPLE_OMP_GRID_BODY)
|
||||
break;
|
||||
|
||||
struct omp_region *gpukernel = *pp;
|
||||
|
||||
tree orig_child_fndecl = gimple_omp_target_child_fn (tgt_stmt);
|
||||
if (!gpukernel)
|
||||
{
|
||||
/* HSA cannot handle OACC stuff. */
|
||||
if (gimple_omp_target_kind (tgt_stmt) != GF_OMP_TARGET_KIND_REGION)
|
||||
return;
|
||||
gcc_checking_assert (orig_child_fndecl);
|
||||
gcc_assert (!omp_find_clause (gimple_omp_target_clauses (tgt_stmt),
|
||||
OMP_CLAUSE__GRIDDIM_));
|
||||
cgraph_node *n = cgraph_node::get (orig_child_fndecl);
|
||||
|
||||
hsa_register_kernel (n);
|
||||
return;
|
||||
}
|
||||
|
||||
gcc_assert (omp_find_clause (gimple_omp_target_clauses (tgt_stmt),
|
||||
OMP_CLAUSE__GRIDDIM_));
|
||||
tree inside_block
|
||||
= gimple_block (first_stmt (single_succ (gpukernel->entry)));
|
||||
*pp = gpukernel->next;
|
||||
for (pp = &gpukernel->inner; *pp; pp = &(*pp)->next)
|
||||
if ((*pp)->type == GIMPLE_OMP_FOR)
|
||||
break;
|
||||
|
||||
struct omp_region *kfor = *pp;
|
||||
gcc_assert (kfor);
|
||||
gomp_for *for_stmt = as_a <gomp_for *> (last_stmt (kfor->entry));
|
||||
gcc_assert (gimple_omp_for_kind (for_stmt) == GF_OMP_FOR_KIND_GRID_LOOP);
|
||||
*pp = kfor->next;
|
||||
if (kfor->inner)
|
||||
{
|
||||
if (gimple_omp_for_grid_group_iter (for_stmt))
|
||||
{
|
||||
struct omp_region **next_pp;
|
||||
for (pp = &kfor->inner; *pp; pp = next_pp)
|
||||
{
|
||||
next_pp = &(*pp)->next;
|
||||
if ((*pp)->type != GIMPLE_OMP_FOR)
|
||||
continue;
|
||||
gomp_for *inner = as_a <gomp_for *> (last_stmt ((*pp)->entry));
|
||||
gcc_assert (gimple_omp_for_kind (inner)
|
||||
== GF_OMP_FOR_KIND_GRID_LOOP);
|
||||
grid_expand_omp_for_loop (*pp, true);
|
||||
*pp = (*pp)->next;
|
||||
next_pp = pp;
|
||||
}
|
||||
}
|
||||
expand_omp (kfor->inner);
|
||||
}
|
||||
if (gpukernel->inner)
|
||||
expand_omp (gpukernel->inner);
|
||||
|
||||
tree kern_fndecl = copy_node (orig_child_fndecl);
|
||||
DECL_NAME (kern_fndecl) = clone_function_name_numbered (kern_fndecl,
|
||||
"kernel");
|
||||
SET_DECL_ASSEMBLER_NAME (kern_fndecl, DECL_NAME (kern_fndecl));
|
||||
tree tgtblock = gimple_block (tgt_stmt);
|
||||
tree fniniblock = make_node (BLOCK);
|
||||
BLOCK_ABSTRACT_ORIGIN (fniniblock) = BLOCK_ORIGIN (tgtblock);
|
||||
BLOCK_SOURCE_LOCATION (fniniblock) = BLOCK_SOURCE_LOCATION (tgtblock);
|
||||
BLOCK_SOURCE_END_LOCATION (fniniblock) = BLOCK_SOURCE_END_LOCATION (tgtblock);
|
||||
BLOCK_SUPERCONTEXT (fniniblock) = kern_fndecl;
|
||||
DECL_INITIAL (kern_fndecl) = fniniblock;
|
||||
push_struct_function (kern_fndecl);
|
||||
cfun->function_end_locus = gimple_location (tgt_stmt);
|
||||
init_tree_ssa (cfun);
|
||||
pop_cfun ();
|
||||
|
||||
tree old_parm_decl = DECL_ARGUMENTS (kern_fndecl);
|
||||
gcc_assert (!DECL_CHAIN (old_parm_decl));
|
||||
tree new_parm_decl = copy_node (DECL_ARGUMENTS (kern_fndecl));
|
||||
DECL_CONTEXT (new_parm_decl) = kern_fndecl;
|
||||
DECL_ARGUMENTS (kern_fndecl) = new_parm_decl;
|
||||
gcc_assert (VOID_TYPE_P (TREE_TYPE (DECL_RESULT (kern_fndecl))));
|
||||
DECL_RESULT (kern_fndecl) = copy_node (DECL_RESULT (kern_fndecl));
|
||||
DECL_CONTEXT (DECL_RESULT (kern_fndecl)) = kern_fndecl;
|
||||
struct function *kern_cfun = DECL_STRUCT_FUNCTION (kern_fndecl);
|
||||
kern_cfun->curr_properties = cfun->curr_properties;
|
||||
|
||||
grid_expand_omp_for_loop (kfor, false);
|
||||
|
||||
/* Remove the omp for statement. */
|
||||
gimple_stmt_iterator gsi = gsi_last_nondebug_bb (gpukernel->entry);
|
||||
gsi_remove (&gsi, true);
|
||||
/* Replace the GIMPLE_OMP_RETURN at the end of the kernel region with a real
|
||||
return. */
|
||||
gsi = gsi_last_nondebug_bb (gpukernel->exit);
|
||||
gcc_assert (!gsi_end_p (gsi)
|
||||
&& gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_RETURN);
|
||||
gimple *ret_stmt = gimple_build_return (NULL);
|
||||
gsi_insert_after (&gsi, ret_stmt, GSI_SAME_STMT);
|
||||
gsi_remove (&gsi, true);
|
||||
|
||||
/* Statements in the first BB in the target construct have been produced by
|
||||
target lowering and must be copied inside the GPUKERNEL, with the two
|
||||
exceptions of the first OMP statement and the OMP_DATA assignment
|
||||
statement. */
|
||||
gsi = gsi_start_bb (single_succ (gpukernel->entry));
|
||||
tree data_arg = gimple_omp_target_data_arg (tgt_stmt);
|
||||
tree sender = data_arg ? TREE_VEC_ELT (data_arg, 0) : NULL;
|
||||
for (gimple_stmt_iterator tsi = gsi_start_bb (single_succ (target->entry));
|
||||
!gsi_end_p (tsi); gsi_next (&tsi))
|
||||
{
|
||||
gimple *stmt = gsi_stmt (tsi);
|
||||
if (is_gimple_omp (stmt))
|
||||
break;
|
||||
if (sender
|
||||
&& is_gimple_assign (stmt)
|
||||
&& TREE_CODE (gimple_assign_rhs1 (stmt)) == ADDR_EXPR
|
||||
&& TREE_OPERAND (gimple_assign_rhs1 (stmt), 0) == sender)
|
||||
continue;
|
||||
gimple *copy = gimple_copy (stmt);
|
||||
gsi_insert_before (&gsi, copy, GSI_SAME_STMT);
|
||||
gimple_set_block (copy, fniniblock);
|
||||
}
|
||||
|
||||
move_sese_region_to_fn (kern_cfun, single_succ (gpukernel->entry),
|
||||
gpukernel->exit, inside_block);
|
||||
|
||||
cgraph_node *kcn = cgraph_node::get_create (kern_fndecl);
|
||||
kcn->mark_force_output ();
|
||||
cgraph_node *orig_child = cgraph_node::get (orig_child_fndecl);
|
||||
|
||||
hsa_register_kernel (kcn, orig_child);
|
||||
|
||||
cgraph_node::add_new_function (kern_fndecl, true);
|
||||
push_cfun (kern_cfun);
|
||||
cgraph_edge::rebuild_edges ();
|
||||
|
||||
/* Re-map any mention of the PARM_DECL of the original function to the
|
||||
PARM_DECL of the new one.
|
||||
|
||||
TODO: It would be great if lowering produced references into the GPU
|
||||
kernel decl straight away and we did not have to do this. */
|
||||
struct grid_arg_decl_map adm;
|
||||
adm.old_arg = old_parm_decl;
|
||||
adm.new_arg = new_parm_decl;
|
||||
basic_block bb;
|
||||
FOR_EACH_BB_FN (bb, kern_cfun)
|
||||
{
|
||||
for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
|
||||
{
|
||||
gimple *stmt = gsi_stmt (gsi);
|
||||
struct walk_stmt_info wi;
|
||||
memset (&wi, 0, sizeof (wi));
|
||||
wi.info = &adm;
|
||||
walk_gimple_op (stmt, grid_remap_kernel_arg_accesses, &wi);
|
||||
}
|
||||
}
|
||||
pop_cfun ();
|
||||
|
||||
return;
|
||||
}
|
||||
|
||||
/* Expand the parallel region tree rooted at REGION. Expansion
|
||||
proceeds in depth-first order. Innermost regions are expanded
|
||||
first. This way, parallel regions that require a new function to
|
||||
|
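Note on the removed grid_expand_omp_for_loop in the hunk above: instead of emitting a loop, it assigned each iteration variable once, as the loop start plus the thread id times the step, where the thread id came from one of the HSA work-item or work-group builtins. A minimal sketch of that arithmetic, assuming plain long operands and a hypothetical function name in place of the tree-building calls:

```c
#include <assert.h>

/* Value of the iteration variable for one thread of a gridified loop:
   N1 is the loop start, STEP the increment and THREAD_ID stands in for
   the id that the removed code obtained from an HSA builtin.  */
static long
grid_iteration_value (long n1, long step, long thread_id)
{
  assert (thread_id >= 0);
  return n1 + thread_id * step;
}
```

For example, with a start of 10 and a step of 3, the thread with id 4 works on the value 22.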
@ -9666,8 +9214,6 @@ expand_omp (struct omp_region *region)
|
|||
region. */
|
||||
if (region->type == GIMPLE_OMP_PARALLEL)
|
||||
determine_parallel_type (region);
|
||||
else if (region->type == GIMPLE_OMP_TARGET)
|
||||
grid_expand_target_grid_body (region);
|
||||
|
||||
if (region->type == GIMPLE_OMP_FOR
|
||||
&& gimple_omp_for_combined_p (last_stmt (region->entry)))
|
||||
|
@ -10039,7 +9585,6 @@ omp_make_gimple_edges (basic_block bb, struct omp_region **region,
|
|||
case GIMPLE_OMP_TASKGROUP:
|
||||
case GIMPLE_OMP_CRITICAL:
|
||||
case GIMPLE_OMP_SECTION:
|
||||
case GIMPLE_OMP_GRID_BODY:
|
||||
cur_region = new_omp_region (bb, code, cur_region);
|
||||
fallthru = true;
|
||||
break;
|
||||
|
@ -10181,5 +9726,3 @@ omp_make_gimple_edges (basic_block bb, struct omp_region **region,
|
|||
|
||||
return fallthru;
|
||||
}
|
||||
|
||||
#include "gt-omp-expand.h"
|
||||
|
|
|
@ -39,7 +39,6 @@ along with GCC; see the file COPYING3. If not see
|
|||
#include "cgraph.h"
|
||||
#include "alloc-pool.h"
|
||||
#include "symbol-summary.h"
|
||||
#include "hsa-common.h"
|
||||
#include "tree-pass.h"
|
||||
#include "omp-device-properties.h"
|
||||
#include "tree-iterator.h"
|
||||
|
@@ -1052,14 +1051,12 @@ omp_offload_device_kind_arch_isa (const char *props, const char *prop)
 static bool
 omp_maybe_offloaded (void)
 {
-  if (!hsa_gen_requested_p ())
-    {
-      if (!ENABLE_OFFLOADING)
-        return false;
-      const char *names = getenv ("OFFLOAD_TARGET_NAMES");
-      if (names == NULL || *names == '\0')
-        return false;
-    }
+  if (!ENABLE_OFFLOADING)
+    return false;
+  const char *names = getenv ("OFFLOAD_TARGET_NAMES");
+  if (names == NULL || *names == '\0')
+    return false;
+
   if (symtab->state == PARSING)
     /* Maybe. */
     return true;
@ -1234,12 +1231,6 @@ omp_context_selector_matches (tree ctx)
|
|||
also offloading values. */
|
||||
if (!omp_maybe_offloaded ())
|
||||
return 0;
|
||||
if (strcmp (arch, "hsa") == 0
|
||||
&& hsa_gen_requested_p ())
|
||||
{
|
||||
ret = -1;
|
||||
continue;
|
||||
}
|
||||
if (ENABLE_OFFLOADING)
|
||||
{
|
||||
const char *arches = omp_offload_device_arch;
|
||||
|
@ -1360,12 +1351,6 @@ omp_context_selector_matches (tree ctx)
|
|||
also offloading values. */
|
||||
if (!omp_maybe_offloaded ())
|
||||
return 0;
|
||||
if (strcmp (prop, "gpu") == 0
|
||||
&& hsa_gen_requested_p ())
|
||||
{
|
||||
ret = -1;
|
||||
continue;
|
||||
}
|
||||
if (ENABLE_OFFLOADING)
|
||||
{
|
||||
const char *kinds = omp_offload_device_kind;
|
||||
|
|
1419
gcc/omp-grid.c
File diff suppressed because it is too large
|
@ -1,27 +0,0 @@
|
|||
/* Lowering and expansion of OpenMP directives for HSA GPU agents.
|
||||
|
||||
Copyright (C) 2013-2020 Free Software Foundation, Inc.
|
||||
|
||||
This file is part of GCC.
|
||||
|
||||
GCC is free software; you can redistribute it and/or modify it under
|
||||
the terms of the GNU General Public License as published by the Free
|
||||
Software Foundation; either version 3, or (at your option) any later
|
||||
version.
|
||||
|
||||
GCC is distributed in the hope that it will be useful, but WITHOUT ANY
|
||||
WARRANTY; without even the implied warranty of MERCHANTABILITY or
|
||||
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
|
||||
for more details.
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with GCC; see the file COPYING3. If not see
|
||||
<http://www.gnu.org/licenses/>. */
|
||||
|
||||
#ifndef GCC_OMP_GRID_H
|
||||
#define GCC_OMP_GRID_H
|
||||
|
||||
extern tree omp_grid_lastprivate_predicate (struct omp_for_data *fd);
|
||||
extern void omp_grid_gridify_all_targets (gimple_seq *body_p);
|
||||
|
||||
#endif /* GCC_OMP_GRID_H */
|
214
gcc/omp-low.c
|
@ -50,7 +50,6 @@ along with GCC; see the file COPYING3. If not see
|
|||
#include "splay-tree.h"
|
||||
#include "omp-general.h"
|
||||
#include "omp-low.h"
|
||||
#include "omp-grid.h"
|
||||
#include "gimple-low.h"
|
||||
#include "alloc-pool.h"
|
||||
#include "symbol-summary.h"
|
||||
|
@ -58,7 +57,6 @@ along with GCC; see the file COPYING3. If not see
|
|||
#include "context.h"
|
||||
#include "gomp-constants.h"
|
||||
#include "gimple-pretty-print.h"
|
||||
#include "hsa-common.h"
|
||||
#include "stringpool.h"
|
||||
#include "attribs.h"
|
||||
|
||||
|
@@ -681,15 +679,7 @@ build_outer_var_ref (tree var, omp_context *ctx,
         }
     }
   else if (outer)
-    {
-      if (gimple_code (outer->stmt) == GIMPLE_OMP_GRID_BODY)
-        {
-          outer = outer->outer;
-          gcc_assert (outer
-                      && gimple_code (outer->stmt) != GIMPLE_OMP_GRID_BODY);
-        }
-      x = lookup_decl (var, outer);
-    }
+    x = lookup_decl (var, outer);
   else if (omp_is_reference (var))
     /* This can happen with orphaned constructs. If var is reference, it is
        possible it is shared and as such valid. */
@ -1460,14 +1450,6 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
|
|||
}
|
||||
break;
|
||||
|
||||
case OMP_CLAUSE__GRIDDIM_:
|
||||
if (ctx->outer)
|
||||
{
|
||||
scan_omp_op (&OMP_CLAUSE__GRIDDIM__SIZE (c), ctx->outer);
|
||||
scan_omp_op (&OMP_CLAUSE__GRIDDIM__GROUP (c), ctx->outer);
|
||||
}
|
||||
break;
|
||||
|
||||
case OMP_CLAUSE_ORDER:
|
||||
ctx->order_concurrent = true;
|
||||
break;
|
||||
|
@ -1698,7 +1680,6 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
|
|||
case OMP_CLAUSE_AUTO:
|
||||
case OMP_CLAUSE_SEQ:
|
||||
case OMP_CLAUSE_TILE:
|
||||
case OMP_CLAUSE__GRIDDIM_:
|
||||
case OMP_CLAUSE__SIMT_:
|
||||
case OMP_CLAUSE_IF_PRESENT:
|
||||
case OMP_CLAUSE_FINALIZE:
|
||||
|
@@ -2021,11 +2002,8 @@ scan_omp_parallel (gimple_stmt_iterator *gsi, omp_context *outer_ctx)
   DECL_NAMELESS (name) = 1;
   TYPE_NAME (ctx->record_type) = name;
   TYPE_ARTIFICIAL (ctx->record_type) = 1;
-  if (!gimple_omp_parallel_grid_phony (stmt))
-    {
-      create_omp_child_function (ctx, false);
-      gimple_omp_parallel_set_child_fn (stmt, ctx->cb.dst_fn);
-    }
+  create_omp_child_function (ctx, false);
+  gimple_omp_parallel_set_child_fn (stmt, ctx->cb.dst_fn);

   scan_sharing_clauses (gimple_omp_parallel_clauses (stmt), ctx);
   scan_omp (gimple_omp_body_ptr (stmt), ctx);
@ -2801,11 +2779,6 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
|
|||
{
|
||||
tree c;
|
||||
|
||||
if (ctx && gimple_code (ctx->stmt) == GIMPLE_OMP_GRID_BODY)
|
||||
/* GRID_BODY is an artificial construct, nesting rules will be checked in
|
||||
the original copy of its contents. */
|
||||
return true;
|
||||
|
||||
/* No nesting of non-OpenACC STMT (that is, an OpenMP one, or a GOMP builtin)
|
||||
inside an OpenACC CTX. */
|
||||
if (!(is_gimple_omp (stmt)
|
||||
|
@ -2891,7 +2864,6 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
|
|||
{
|
||||
if ((gimple_code (stmt) != GIMPLE_OMP_FOR
|
||||
|| (gimple_omp_for_kind (stmt) != GF_OMP_FOR_KIND_DISTRIBUTE
|
||||
&& gimple_omp_for_kind (stmt) != GF_OMP_FOR_KIND_GRID_LOOP
|
||||
&& omp_find_clause (gimple_omp_for_clauses (stmt),
|
||||
OMP_CLAUSE_BIND) == NULL_TREE))
|
||||
&& gimple_code (stmt) != GIMPLE_OMP_PARALLEL)
|
||||
|
@@ -3783,7 +3755,6 @@ scan_omp_1_stmt (gimple_stmt_iterator *gsi, bool *handled_ops_p,
     case GIMPLE_OMP_MASTER:
     case GIMPLE_OMP_ORDERED:
     case GIMPLE_OMP_CRITICAL:
-    case GIMPLE_OMP_GRID_BODY:
       ctx = new_omp_context (stmt, ctx);
       scan_omp (gimple_omp_body_ptr (stmt), ctx);
       break;
@@ -9518,65 +9489,59 @@ lower_omp_for_lastprivate (struct omp_for_data *fd, gimple_seq *body_p,
       cond_code = EQ_EXPR;
     }

-  if (gimple_omp_for_kind (fd->for_stmt) == GF_OMP_FOR_KIND_GRID_LOOP
-      || gimple_omp_for_grid_phony (fd->for_stmt))
-    cond = omp_grid_lastprivate_predicate (fd);
-  else
+  tree n2 = fd->loop.n2;
+  if (fd->collapse > 1
+      && TREE_CODE (n2) != INTEGER_CST
+      && gimple_omp_for_combined_into_p (fd->for_stmt))
     {
-      tree n2 = fd->loop.n2;
-      if (fd->collapse > 1
-          && TREE_CODE (n2) != INTEGER_CST
-          && gimple_omp_for_combined_into_p (fd->for_stmt))
+      struct omp_context *taskreg_ctx = NULL;
+      if (gimple_code (ctx->outer->stmt) == GIMPLE_OMP_FOR)
         {
-          struct omp_context *taskreg_ctx = NULL;
-          if (gimple_code (ctx->outer->stmt) == GIMPLE_OMP_FOR)
+          gomp_for *gfor = as_a <gomp_for *> (ctx->outer->stmt);
+          if (gimple_omp_for_kind (gfor) == GF_OMP_FOR_KIND_FOR
+              || gimple_omp_for_kind (gfor) == GF_OMP_FOR_KIND_DISTRIBUTE)
             {
-              gomp_for *gfor = as_a <gomp_for *> (ctx->outer->stmt);
-              if (gimple_omp_for_kind (gfor) == GF_OMP_FOR_KIND_FOR
-                  || gimple_omp_for_kind (gfor) == GF_OMP_FOR_KIND_DISTRIBUTE)
+              if (gimple_omp_for_combined_into_p (gfor))
                 {
-                  if (gimple_omp_for_combined_into_p (gfor))
-                    {
-                      gcc_assert (ctx->outer->outer
-                                  && is_parallel_ctx (ctx->outer->outer));
-                      taskreg_ctx = ctx->outer->outer;
-                    }
-                  else
-                    {
-                      struct omp_for_data outer_fd;
-                      omp_extract_for_data (gfor, &outer_fd, NULL);
-                      n2 = fold_convert (TREE_TYPE (n2), outer_fd.loop.n2);
-                    }
+                  gcc_assert (ctx->outer->outer
+                              && is_parallel_ctx (ctx->outer->outer));
+                  taskreg_ctx = ctx->outer->outer;
                 }
+              else
+                {
+                  struct omp_for_data outer_fd;
+                  omp_extract_for_data (gfor, &outer_fd, NULL);
+                  n2 = fold_convert (TREE_TYPE (n2), outer_fd.loop.n2);
+                }
-              else if (gimple_omp_for_kind (gfor) == GF_OMP_FOR_KIND_TASKLOOP)
-                taskreg_ctx = ctx->outer->outer;
             }
-          else if (is_taskreg_ctx (ctx->outer))
-            taskreg_ctx = ctx->outer;
-          if (taskreg_ctx)
+          else if (gimple_omp_for_kind (gfor) == GF_OMP_FOR_KIND_TASKLOOP)
+            taskreg_ctx = ctx->outer->outer;
+        }
+      else if (is_taskreg_ctx (ctx->outer))
+        taskreg_ctx = ctx->outer;
+      if (taskreg_ctx)
+        {
+          int i;
+          tree taskreg_clauses
+            = gimple_omp_taskreg_clauses (taskreg_ctx->stmt);
+          tree innerc = omp_find_clause (taskreg_clauses,
+                                         OMP_CLAUSE__LOOPTEMP_);
+          gcc_assert (innerc);
+          for (i = 0; i < fd->collapse; i++)
             {
-              int i;
-              tree taskreg_clauses
-                = gimple_omp_taskreg_clauses (taskreg_ctx->stmt);
-              tree innerc = omp_find_clause (taskreg_clauses,
-                                             OMP_CLAUSE__LOOPTEMP_);
-              gcc_assert (innerc);
-              for (i = 0; i < fd->collapse; i++)
-                {
-                  innerc = omp_find_clause (OMP_CLAUSE_CHAIN (innerc),
-                                            OMP_CLAUSE__LOOPTEMP_);
-                  gcc_assert (innerc);
-                }
               innerc = omp_find_clause (OMP_CLAUSE_CHAIN (innerc),
                                         OMP_CLAUSE__LOOPTEMP_);
-              if (innerc)
-                n2 = fold_convert (TREE_TYPE (n2),
-                                   lookup_decl (OMP_CLAUSE_DECL (innerc),
-                                                taskreg_ctx));
+              gcc_assert (innerc);
             }
+          innerc = omp_find_clause (OMP_CLAUSE_CHAIN (innerc),
+                                    OMP_CLAUSE__LOOPTEMP_);
+          if (innerc)
+            n2 = fold_convert (TREE_TYPE (n2),
+                               lookup_decl (OMP_CLAUSE_DECL (innerc),
+                                            taskreg_ctx));
         }
-      cond = build2 (cond_code, boolean_type_node, fd->loop.v, n2);
     }
+  cond = build2 (cond_code, boolean_type_node, fd->loop.v, n2);

   clauses = gimple_omp_for_clauses (fd->for_stmt);
   stmts = NULL;
@@ -10638,24 +10603,17 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
                      ctx);
     }

-  bool phony_loop = (gimple_omp_for_kind (stmt) != GF_OMP_FOR_KIND_GRID_LOOP
-                     && gimple_omp_for_grid_phony (stmt));
   if ((ctx->scan_inclusive || ctx->scan_exclusive)
       && gimple_omp_for_kind (stmt) == GF_OMP_FOR_KIND_FOR)
-    {
-      gcc_assert (!phony_loop);
-      lower_omp_for_scan (&body, &dlist, stmt, &fd, ctx);
-    }
+    lower_omp_for_scan (&body, &dlist, stmt, &fd, ctx);
   else
     {
-      if (!phony_loop)
-        gimple_seq_add_stmt (&body, stmt);
+      gimple_seq_add_stmt (&body, stmt);
       gimple_seq_add_seq (&body, gimple_omp_body (stmt));
     }

-  if (!phony_loop)
-    gimple_seq_add_stmt (&body, gimple_build_omp_continue (fd.loop.v,
-                                                           fd.loop.v));
+  gimple_seq_add_stmt (&body, gimple_build_omp_continue (fd.loop.v,
+                                                         fd.loop.v));

   /* After the loop, add exit clauses. */
   lower_reduction_clauses (gimple_omp_for_clauses (stmt), &body, &clist, ctx);
@@ -10684,19 +10642,16 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)

   body = maybe_catch_exception (body);

-  if (!phony_loop)
-    {
-      /* Region exit marker goes at the end of the loop body. */
-      gimple *g = gimple_build_omp_return (fd.have_nowait);
-      gimple_seq_add_stmt (&body, g);
+  /* Region exit marker goes at the end of the loop body. */
+  gimple *g = gimple_build_omp_return (fd.have_nowait);
+  gimple_seq_add_stmt (&body, g);

-      gimple_seq_add_seq (&body, tred_dlist);
+  gimple_seq_add_seq (&body, tred_dlist);

-      maybe_add_implicit_barrier_cancel (ctx, g, &body);
+  maybe_add_implicit_barrier_cancel (ctx, g, &body);

-      if (rclauses)
-        OMP_CLAUSE_DECL (rclauses) = rtmp;
-    }
+  if (rclauses)
+    OMP_CLAUSE_DECL (rclauses) = rtmp;

   /* Add OpenACC joining and reduction markers just after the loop. */
   if (oacc_tail)
@@ -11279,14 +11234,6 @@ lower_omp_taskreg (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   gimple_seq par_olist = NULL;
   gimple_seq par_ilist = NULL;
   gimple_seq par_rlist = NULL;
-  bool phony_construct = gimple_code (stmt) == GIMPLE_OMP_PARALLEL
-    && gimple_omp_parallel_grid_phony (as_a <gomp_parallel *> (stmt));
-  if (phony_construct && ctx->record_type)
-    {
-      gcc_checking_assert (!ctx->receiver_decl);
-      ctx->receiver_decl = create_tmp_var
-        (build_reference_type (ctx->record_type), ".omp_rec");
-    }
   lower_rec_input_clauses (clauses, &par_ilist, &par_olist, ctx, NULL);
   lower_omp (&par_body, ctx);
   if (gimple_code (stmt) == GIMPLE_OMP_PARALLEL)
@@ -11345,11 +11292,8 @@ lower_omp_taskreg (gimple_stmt_iterator *gsi_p, omp_context *ctx)
     gimple_seq_add_stmt (&new_body,
                          gimple_build_omp_continue (integer_zero_node,
                                                     integer_zero_node));
-  if (!phony_construct)
-    {
-      gimple_seq_add_stmt (&new_body, gimple_build_omp_return (false));
-      gimple_omp_set_body (stmt, new_body);
-    }
+  gimple_seq_add_stmt (&new_body, gimple_build_omp_return (false));
+  gimple_omp_set_body (stmt, new_body);

   if (dep_bind && gimple_bind_block (par_bind) == NULL_TREE)
     bind = gimple_build_bind (NULL, NULL, make_node (BLOCK));
@@ -11357,10 +11301,7 @@ lower_omp_taskreg (gimple_stmt_iterator *gsi_p, omp_context *ctx)
     bind = gimple_build_bind (NULL, NULL, gimple_bind_block (par_bind));
   gsi_replace (gsi_p, dep_bind ? dep_bind : bind, true);
   gimple_bind_add_seq (bind, ilist);
-  if (!phony_construct)
-    gimple_bind_add_stmt (bind, stmt);
-  else
-    gimple_bind_add_seq (bind, new_body);
+  gimple_bind_add_stmt (bind, stmt);
   gimple_bind_add_seq (bind, olist);

   pop_gimplify_context (NULL);
@@ -12641,22 +12582,19 @@ lower_omp_teams (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   lower_omp (gimple_omp_body_ptr (teams_stmt), ctx);
   lower_reduction_clauses (gimple_omp_teams_clauses (teams_stmt), &olist,
                            NULL, ctx);
-  if (!gimple_omp_teams_grid_phony (teams_stmt))
-    {
-      gimple_seq_add_stmt (&bind_body, teams_stmt);
-      location_t loc = gimple_location (teams_stmt);
-      tree decl = builtin_decl_explicit (BUILT_IN_GOMP_TEAMS);
-      gimple *call = gimple_build_call (decl, 2, num_teams, thread_limit);
-      gimple_set_location (call, loc);
-      gimple_seq_add_stmt (&bind_body, call);
-    }
+  gimple_seq_add_stmt (&bind_body, teams_stmt);
+
+  location_t loc = gimple_location (teams_stmt);
+  tree decl = builtin_decl_explicit (BUILT_IN_GOMP_TEAMS);
+  gimple *call = gimple_build_call (decl, 2, num_teams, thread_limit);
+  gimple_set_location (call, loc);
+  gimple_seq_add_stmt (&bind_body, call);

   gimple_seq_add_seq (&bind_body, gimple_omp_body (teams_stmt));
   gimple_omp_set_body (teams_stmt, NULL);
   gimple_seq_add_seq (&bind_body, olist);
   gimple_seq_add_seq (&bind_body, dlist);
-  if (!gimple_omp_teams_grid_phony (teams_stmt))
-    gimple_seq_add_stmt (&bind_body, gimple_build_omp_return (true));
+  gimple_seq_add_stmt (&bind_body, gimple_build_omp_return (true));
   gimple_bind_set_body (bind, bind_body);

   pop_gimplify_context (bind);
@@ -12667,18 +12605,6 @@ lower_omp_teams (gimple_stmt_iterator *gsi_p, omp_context *ctx)
     TREE_USED (block) = 1;
 }

-/* Expand code within an artificial GIMPLE_OMP_GRID_BODY OMP construct. */
-
-static void
-lower_omp_grid_body (gimple_stmt_iterator *gsi_p, omp_context *ctx)
-{
-  gimple *stmt = gsi_stmt (*gsi_p);
-  lower_omp (gimple_omp_body_ptr (stmt), ctx);
-  gimple_seq_add_stmt (gimple_omp_body_ptr (stmt),
-                       gimple_build_omp_return (false));
-}
-
-
 /* Callback for lower_omp_1. Return non-NULL if *tp needs to be
    regimplified. If DATA is non-NULL, lower_omp_1 is outside
    of OMP context, but with task_shared_vars set. */
@@ -12897,11 +12823,6 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
       else
         lower_omp_teams (gsi_p, ctx);
       break;
-    case GIMPLE_OMP_GRID_BODY:
-      ctx = maybe_lookup_ctx (stmt);
-      gcc_assert (ctx);
-      lower_omp_grid_body (gsi_p, ctx);
-      break;
     case GIMPLE_CALL:
       tree fndecl;
       call_stmt = as_a <gcall *> (stmt);
@@ -13059,9 +12980,6 @@ execute_lower_omp (void)

   body = gimple_body (current_function_decl);

-  if (hsa_gen_requested_p ())
-    omp_grid_gridify_all_targets (&body);
-
   scan_omp (&body, NULL);
   gcc_assert (taskreg_nesting_level == 0);
   FOR_EACH_VEC_ELT (taskreg_contexts, i, ctx)
|
31
gcc/opts.c
|
@@ -2484,35 +2484,8 @@ common_handle_option (struct gcc_options *opts,
       break;

     case OPT_foffload_:
-      {
-        const char *p = arg;
-        opts->x_flag_disable_hsa = true;
-        while (*p != 0)
-          {
-            const char *comma = strchr (p, ',');
-
-            if ((strncmp (p, "disable", 7) == 0)
-                && (p[7] == ',' || p[7] == '\0'))
-              {
-                opts->x_flag_disable_hsa = true;
-                break;
-              }
-
-            if ((strncmp (p, "hsa", 3) == 0)
-                && (p[3] == ',' || p[3] == '\0'))
-              {
-#ifdef ENABLE_HSA
-                opts->x_flag_disable_hsa = false;
-#else
-                sorry ("HSA has not been enabled during configuration");
-#endif
-              }
-            if (!comma)
-              break;
-            p = comma + 1;
-          }
-        break;
-      }
+      /* Deferred. */
+      break;

 #ifndef ACCEL_COMPILER
     case OPT_foffload_abi_:
|
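The `OPT_foffload_` block removed above walks a comma-separated argument and looks for whole-token matches of `disable` and `hsa`. A minimal standalone sketch of that matching technique follows; `offload_list_has_target` is a hypothetical helper written for illustration only, not a function in GCC, and it returns a boolean rather than setting `x_flag_disable_hsa`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of GCC): return true if NAME occurs as a
   complete element of the comma-separated LIST.  A prefix match such as
   "hsa" inside "hsail" is rejected by checking the character that follows
   the candidate, as the removed opts.c code did with p[3] and p[7].  */
static bool
offload_list_has_target (const char *list, const char *name)
{
  assert (list != NULL && name != NULL && name[0] != '\0');
  size_t len = strlen (name);

  for (const char *p = list; *p != '\0'; )
    {
      if (strncmp (p, name, len) == 0 && (p[len] == ',' || p[len] == '\0'))
        return true;

      /* Advance to the element after the next comma, if there is one.  */
      const char *comma = strchr (p, ',');
      if (comma == NULL)
        break;
      p = comma + 1;
    }
  return false;
}
```

Checking the terminator after the candidate is what keeps `amdgcn-amdhsa` from being mistaken for `hsa`; the sketch assumes nothing about how the remaining "Deferred" handling treats the option.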
|
@@ -170,10 +170,6 @@ The number of most executed permilles of the profiled execution of the entire pr
 Common Joined UInteger Var(param_hot_bb_frequency_fraction) Init(1000) Param
 The denominator n of fraction 1/n of the execution frequency of the entry block of a function that a basic block of this function needs to at least have in order to be considered hot.

--param=hsa-gen-debug-stores=
-Common Joined UInteger Var(param_hsa_gen_debug_stores) IntegerRange(0, 1) Param
-Level of hsa debug stores verbosity.
-
 -param=inline-heuristics-hint-percent=
 Common Joined UInteger Var(param_inline_heuristics_hint_percent) Init(200) Optimization IntegerRange(100, 1000000) Param
 The scale (in percents) applied to inline-insns-single and auto limits when heuristics hints that inlining is very profitable.
|
|
@@ -153,7 +153,6 @@ along with GCC; see the file COPYING3. If not see
   NEXT_PASS (pass_ipa_cp);
   NEXT_PASS (pass_ipa_sra);
   NEXT_PASS (pass_ipa_cdtor_merge);
-  NEXT_PASS (pass_ipa_hsa);
   NEXT_PASS (pass_ipa_fn_summary);
   NEXT_PASS (pass_ipa_inline);
   NEXT_PASS (pass_ipa_pure_const);
@@ -402,7 +401,6 @@ along with GCC; see the file COPYING3. If not see
   NEXT_PASS (pass_gimple_isel);
   NEXT_PASS (pass_cleanup_cfg_post_optimizing);
   NEXT_PASS (pass_warn_function_noreturn);
-  NEXT_PASS (pass_gen_hsail);

   NEXT_PASS (pass_expand);

|
|
@@ -1,54 +0,0 @@
-/* { dg-do compile } */
-/* { dg-require-effective-target offload_hsa } */
-/* { dg-options "-fopenmp -fdump-tree-omplower-details" } */
-
-void
-foo1 (int n, int *a, int workgroup_size)
-{
-  int i;
-#pragma omp target
-#pragma omp teams thread_limit(workgroup_size)
-#pragma omp distribute parallel for shared(a) firstprivate(n) private(i)
-  for (i = 0; i < n; i++)
-    a[i]++;
-}
-
-void
-foo2 (int j, int n, int *a)
-{
-  int i;
-#pragma omp target teams
-#pragma omp distribute parallel for shared(a) firstprivate(n) private(i) firstprivate(j)
-  for (i = j + 1; i < n; i++)
-    a[i] = i;
-}
-
-void
-foo3 (int j, int n, int *a)
-{
-  int i;
-#pragma omp target teams
-#pragma omp distribute parallel for shared(a) firstprivate(n) private(i) firstprivate(j)
-  for (i = j + 1; i < n; i += 3)
-    a[i] = i;
-}
-
-void
-foo4 (int j, int n, int *a)
-{
-#pragma omp parallel
-  {
-    #pragma omp single
-    {
-      int i;
-#pragma omp target
-#pragma omp teams
-#pragma omp distribute parallel for shared(a) firstprivate(n) private(i) firstprivate(j)
-      for (i = j + 1; i < n; i += 3)
-        a[i] = i;
-    }
-  }
-}
-
-
-/* { dg-final { scan-tree-dump-times "Target construct will be turned into a gridified HSA kernel" 4 "omplower" } } */
|
@@ -1,66 +0,0 @@
-/* { dg-do compile } */
-/* { dg-require-effective-target offload_hsa } */
-/* { dg-options "-fopenmp -fdump-tree-omplower-details" } */
-
-#define BLOCK_SIZE 16
-
-
-void tiled_sgemm_tt(const int M, const int N, const int K, const float alpha, const float*A, const int LDA,
-   const float*B, const int LDB, const float beta, float*C, const int LDC){
-
-#pragma omp target teams map(to:A[M*K],B[K*N]) map(from:C[M*N])
-#pragma omp distribute collapse(2)
-  for (int C_row_start=0 ; C_row_start < M ; C_row_start+=BLOCK_SIZE)
-    for (int C_col_start=0 ; C_col_start < N ; C_col_start+=BLOCK_SIZE)
-      {
-        // Each team has a local copy of these mini matrices
-        float As[BLOCK_SIZE][BLOCK_SIZE];
-        float Bs[BLOCK_SIZE][BLOCK_SIZE];
-#pragma omp parallel
-        {
-          int C_row, C_col;
-          float Cval = 0.0;
-
-          for (int kblock = 0; kblock < K ; kblock += BLOCK_SIZE )
-            {
-#pragma omp for collapse(2)
-              for (int row=0 ; row < BLOCK_SIZE ; row++)
-                for (int col=0 ; col < BLOCK_SIZE ; col++)
-                  {
-                    C_row = C_row_start + row;
-                    C_col = C_col_start + col;
-                    if ((C_row < M) && (kblock + col < K))
-                      As[row][col] = A[(C_row*LDA)+ kblock + col];
-                    else
-                      As[row][col] = 0;
-                    if ((kblock + row < K) && C_col < N)
-                      Bs[row][col] = B[((kblock+row)*LDB)+ C_col];
-                    else
-                      Bs[row][col] = 0;
-                  }
-
-#pragma omp for collapse(2)
-              for (int row=0 ; row < BLOCK_SIZE ; row++)
-                for (int col=0 ; col < BLOCK_SIZE ; col++)
-                  {
-                    for (int e = 0; e < BLOCK_SIZE; ++e)
-                      Cval += As[row][e] * Bs[e][col];
-                  }
-            } /* End for kblock .. */
-
-
-#pragma omp for collapse(2)
-          for (int row=0 ; row < BLOCK_SIZE ; row++)
-            for (int col=0 ; col < BLOCK_SIZE ; col++)
-              {
-                C_row = C_row_start + row;
-                C_col = C_col_start + col;
-                if ((C_row < M) && (C_col < N))
-                  C[(C_row*LDC)+C_col] = alpha*Cval + beta*C[(C_row*LDC)+C_col];
-
-              }
-        } /* end parallel */
-      } /* end target teams distribute */
-}
-
-/* { dg-final { scan-tree-dump "Target construct will be turned into a gridified HSA kernel" "omplower" } } */
|
@@ -1,68 +0,0 @@
-/* { dg-do compile } */
-/* { dg-require-effective-target offload_hsa } */
-/* { dg-options "-fopenmp -fdump-tree-omplower-details" } */
-
-#define BLOCK_SIZE 16
-
-void tiled_sgemm_tt(const int M, const int N, const int K, const float alpha, const float*A, const int LDA,
-   const float*B, const int LDB, const float beta, float*C, const int LDC)
-{
-#pragma omp target teams map(to:A[M*K],B[K*N]) map(from:C[M*N])
-#pragma omp distribute collapse(2)
-  for (int C_row_start=0 ; C_row_start < M ; C_row_start+=BLOCK_SIZE)
-    for (int C_col_start=0 ; C_col_start < N ; C_col_start+=BLOCK_SIZE)
-      {
-        float As[BLOCK_SIZE][BLOCK_SIZE];
-        float Bs[BLOCK_SIZE][BLOCK_SIZE];
-        float Cs[BLOCK_SIZE][BLOCK_SIZE];
-        int C_row, C_col;
-
-#pragma omp parallel for collapse(2)
-        for (int row=0 ; row < BLOCK_SIZE ; row++)
-          for (int col=0 ; col < BLOCK_SIZE ; col++)
-            {
-              Cs[row][col] = 0.0;
-            }
-
-
-        for (int kblock = 0; kblock < K ; kblock += BLOCK_SIZE )
-          {
-#pragma omp parallel for collapse(2)
-            for (int row=0 ; row < BLOCK_SIZE ; row++)
-              for (int col=0 ; col < BLOCK_SIZE ; col++)
-                {
-                  C_row = C_row_start + row;
-                  C_col = C_col_start + col;
-                  if ((C_row < M) && (kblock + col < K))
-                    As[row][col] = A[(C_row*LDA)+ kblock + col];
-                  else
-                    As[row][col] = 0;
-                  if ((kblock + row < K) && C_col < N)
-                    Bs[row][col] = B[((kblock+row)*LDB)+ C_col];
-                  else
-                    Bs[row][col] = 0;
-                }
-
-#pragma omp parallel for collapse(2)
-            for (int row=0 ; row < BLOCK_SIZE ; row++)
-              for (int col=0 ; col < BLOCK_SIZE ; col++)
-                {
-                  for (int e = 0; e < BLOCK_SIZE; ++e)
-                    Cs[row][col] += As[row][e] * Bs[e][col];
-                }
-          } /* End for kblock .. */
-
-
-#pragma omp parallel for collapse(2)
-        for (int row=0 ; row < BLOCK_SIZE ; row++)
-          for (int col=0 ; col < BLOCK_SIZE ; col++)
-            {
-              C_row = C_row_start + row;
-              C_col = C_col_start + col;
-              if ((C_row < M) && (C_col < N))
-                C[(C_row*LDC)+C_col] = alpha*Cs[row][col] + beta*C[(C_row*LDC)+C_col];
-            }
-      } /* End distribute */
-}
-
-/* { dg-final { scan-tree-dump "Target construct will be turned into a gridified HSA kernel" "omplower" } } */
|
@@ -1,24 +0,0 @@
-/* Instead of ICE, we'd like "HSA does not implement indirect calls". */
-
-/* Reduced from 'libgomp.c/target-39.c'. */
-
-/* { dg-require-effective-target offload_hsa } */
-/* { dg-additional-options "-Whsa" } to override '{gcc,g++}.dg/gomp/gomp.exp'. */
-
-typedef void (*fnp) (void);
-void f1 (void) { }
-fnp f2 (void) { return f1; }
-#pragma omp declare target to (f1, f2)
-
-int
-main ()
-{
-#pragma omp target
-  {
-    fnp fnp = f2 ();
-    fnp (); /* { dg-message "note: support for HSA does not implement indirect calls" } */
-  }
-  return 0;
-}
-
-/* { dg-warning "could not emit HSAIL for the function" "" { target *-*-* } 0 } */
|
@@ -29,7 +29,7 @@ dg-init
 # Main loop.
 g++-dg-runtest [lsort [concat \
         [find $srcdir/$subdir *.C] \
-        [find $srcdir/c-c++-common/gomp *.c]]] "" "-fopenmp -Wno-hsa"
+        [find $srcdir/c-c++-common/gomp *.c]]] "" "-fopenmp"

 # All done.
 dg-finish
|
|
@@ -31,7 +31,7 @@ dg-init
 # Main loop.
 dg-runtest [lsort [concat \
         [find $srcdir/$subdir *.c] \
-        [find $srcdir/c-c++-common/gomp *.c]]] "" "-fopenmp -Wno-hsa"
+        [find $srcdir/c-c++-common/gomp *.c]]] "" "-fopenmp"

 # All done.
 dg-finish
|
|
@@ -30,7 +30,7 @@ dg-init

 # Main loop.
 gfortran-dg-runtest [lsort \
-       [find $srcdir/$subdir *.\[fF\]{,90,95,03,08} ] ] "" "-fopenmp -Wno-hsa"
+       [find $srcdir/$subdir *.\[fF\]{,90,95,03,08} ] ] "" "-fopenmp"

 # All done.
 dg-finish
|
|
@@ -1,16 +0,0 @@
-! { dg-do compile }
-! { dg-require-effective-target offload_hsa }
-! { dg-options "-fopenmp -fdump-tree-omplower-details" } */
-
-subroutine vector_square(n, a, b)
-  integer i, n, b(n), a(n)
-!$omp target teams
-!$omp distribute parallel do
-  do i=1,n
-    b(i) = a(i) * a(i)
-  enddo
-!$omp end distribute parallel do
-!$omp end target teams
-end subroutine vector_square
-
-! { dg-final { scan-tree-dump "Target construct will be turned into a gridified HSA kernel" "omplower" } }
|
@@ -9858,14 +9858,6 @@ proc check_effective_target_offload_nvptx { } {
     } "-foffload=nvptx-none" ]
 }

-# Return 1 if the compiler has been configured with hsa offloading.
-
-proc check_effective_target_offload_hsa { } {
-    return [check_no_compiler_messages offload_hsa assembly {
-        int main () {return 0;}
-    } "-foffload=hsa" ]
-}
-
 # Return 1 if the compiler has been configured with gcn offloading.

 proc check_effective_target_offload_gcn { } {
|
|
@@ -99,7 +99,6 @@ DEFTIMEVAR (TV_WHOPR_WPA_IO , "whopr wpa I/O")
 DEFTIMEVAR (TV_WHOPR_PARTITIONING , "whopr partitioning")
 DEFTIMEVAR (TV_WHOPR_LTRANS , "whopr ltrans")
 DEFTIMEVAR (TV_IPA_REFERENCE , "ipa reference")
-DEFTIMEVAR (TV_IPA_HSA , "ipa HSA")
 DEFTIMEVAR (TV_IPA_PROFILE , "ipa profile")
 DEFTIMEVAR (TV_IPA_AUTOFDO , "auto profile")
 DEFTIMEVAR (TV_IPA_PURE_CONST , "ipa pure const")
|
|
@@ -77,7 +77,6 @@ along with GCC; see the file COPYING3. If not see
 #include "ipa-prop.h"
 #include "gcse.h"
 #include "omp-offload.h"
-#include "hsa-common.h"
 #include "edit-context.h"
 #include "tree-pass.h"
 #include "dumpfile.h"
@@ -512,8 +511,6 @@ compile_file (void)

   omp_finish_file ();

-  hsa_output_brig ();
-
   output_shared_constant_pool ();
   output_object_blocks ();
   finish_tm_clone_pairs ();
|
|
@@ -488,10 +488,6 @@ enum omp_clause_code {
   /* OpenACC clause: tile ( size-expr-list ). */
   OMP_CLAUSE_TILE,

-  /* OpenMP internal-only clause to specify grid dimensions of a gridified
-     kernel. */
-  OMP_CLAUSE__GRIDDIM_,
-
   /* OpenACC clause: if_present. */
   OMP_CLAUSE_IF_PRESENT,

@@ -1557,9 +1553,6 @@ struct GTY(()) tree_omp_clause {
     enum omp_clause_defaultmap_kind defaultmap_kind;
     enum omp_clause_bind_kind bind_kind;
     enum omp_clause_device_type_kind device_type_kind;
-    /* The dimension a OMP_CLAUSE__GRIDDIM_ clause of a gridified target
-       construct describes. */
-    unsigned int dimension;
   } GTY ((skip)) subcode;

   /* The gimplification of OMP_CLAUSE_REDUCTION_{INIT,MERGE} for omp-low's
|
|
@@ -1394,7 +1394,6 @@ convert_nonlocal_omp_clauses (tree *pclauses, struct walk_stmt_info *wi)
         case OMP_CLAUSE__LOOPTEMP_:
         case OMP_CLAUSE__REDUCTEMP_:
         case OMP_CLAUSE__SIMDUID_:
-        case OMP_CLAUSE__GRIDDIM_:
         case OMP_CLAUSE__SIMT_:
           /* Anything else. */
         default:
@@ -2137,7 +2136,6 @@ convert_local_omp_clauses (tree *pclauses, struct walk_stmt_info *wi)
         case OMP_CLAUSE__LOOPTEMP_:
         case OMP_CLAUSE__REDUCTEMP_:
         case OMP_CLAUSE__SIMDUID_:
-        case OMP_CLAUSE__GRIDDIM_:
         case OMP_CLAUSE__SIMT_:
           /* Anything else. */
         default:
|
|
@@ -474,7 +474,6 @@ extern gimple_opt_pass *make_pass_sanopt (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_oacc_kernels (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_oacc (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_oacc_kernels (gcc::context *ctxt);
-extern gimple_opt_pass *make_pass_gen_hsail (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_warn_nonnull_compare (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_sprintf_length (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_walloca (gcc::context *ctxt);
@@ -508,7 +507,6 @@ extern ipa_opt_pass_d *make_pass_ipa_icf (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_devirt (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_odr (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_reference (gcc::context *ctxt);
-extern ipa_opt_pass_d *make_pass_ipa_hsa (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_pure_const (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_pta (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_tm (gcc::context *ctxt);
|
|
@@ -1246,17 +1246,6 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
       pp_right_paren (pp);
       break;

-    case OMP_CLAUSE__GRIDDIM_:
-      pp_string (pp, "_griddim_(");
-      pp_unsigned_wide_integer (pp, OMP_CLAUSE__GRIDDIM__DIMENSION (clause));
-      pp_colon (pp);
-      dump_generic_node (pp, OMP_CLAUSE__GRIDDIM__SIZE (clause), spc, flags,
-                         false);
-      pp_comma (pp);
-      dump_generic_node (pp, OMP_CLAUSE__GRIDDIM__GROUP (clause), spc, flags,
-                         false);
-      pp_right_paren (pp);
-      break;
     case OMP_CLAUSE_IF_PRESENT:
       pp_string (pp, "if_present");
       break;
|
|
@@ -357,7 +357,6 @@ unsigned const char omp_clause_num_ops[] =
   1, /* OMP_CLAUSE_NUM_WORKERS */
   1, /* OMP_CLAUSE_VECTOR_LENGTH */
   3, /* OMP_CLAUSE_TILE */
-  2, /* OMP_CLAUSE__GRIDDIM_ */
   0, /* OMP_CLAUSE_IF_PRESENT */
   0, /* OMP_CLAUSE_FINALIZE */
 };
@@ -442,7 +441,6 @@ const char * const omp_clause_code_name[] =
   "num_workers",
   "vector_length",
   "tile",
-  "_griddim_",
   "if_present",
   "finalize",
 };
@@ -12098,7 +12096,6 @@ walk_tree_1 (tree *tp, walk_tree_fn func, void *data,
       switch (OMP_CLAUSE_CODE (*tp))
         {
         case OMP_CLAUSE_GANG:
-        case OMP_CLAUSE__GRIDDIM_:
           WALK_SUBTREE (OMP_CLAUSE_OPERAND (*tp, 1));
           /* FALLTHRU */

|
|
@@ -1779,14 +1779,6 @@ class auto_suppress_location_wrappers
 #define OMP_CLAUSE_TILE_COUNT(NODE) \
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_TILE), 2)

-#define OMP_CLAUSE__GRIDDIM__DIMENSION(NODE) \
-  (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__GRIDDIM_)\
-   ->omp_clause.subcode.dimension)
-#define OMP_CLAUSE__GRIDDIM__SIZE(NODE) \
-  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__GRIDDIM_), 0)
-#define OMP_CLAUSE__GRIDDIM__GROUP(NODE) \
-  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__GRIDDIM_), 1)
-
 /* _CONDTEMP_ holding temporary with iteration count. */
 #define OMP_CLAUSE__CONDTEMP__ITER(NODE) \
   (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__CONDTEMP_)->base.public_flag)
|
|
@@ -243,7 +243,6 @@ enum gomp_map_kind
 #define GOMP_VERSION 1
 #define GOMP_VERSION_NVIDIA_PTX 1
 #define GOMP_VERSION_INTEL_MIC 0
-#define GOMP_VERSION_HSA 0
 #define GOMP_VERSION_GCN 1

 #define GOMP_VERSION_PACK(LIB, DEV) (((LIB) << 16) | (DEV))
|
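The `GOMP_VERSION_PACK` macro kept as context above combines a library version and a per-device version into one integer, which is why dropping `GOMP_VERSION_HSA` leaves the other targets untouched. A small sketch of the packing follows. The three macros are copied from the hunk; the unpacking helpers `packed_lib_version` and `packed_dev_version` are hypothetical names introduced here, and the 16-bit mask for the device half is an assumption read off the shift amount.

```c
#include <assert.h>

#define GOMP_VERSION 1
#define GOMP_VERSION_GCN 1
#define GOMP_VERSION_PACK(LIB, DEV) (((LIB) << 16) | (DEV))

/* Hypothetical helpers for illustration: split a packed value back into
   the library half (upper bits) and the device half (assumed to be the
   low 16 bits, matching the shift in GOMP_VERSION_PACK).  */
static unsigned int
packed_lib_version (unsigned int pack)
{
  return (pack >> 16) & 0xffff;
}

static unsigned int
packed_dev_version (unsigned int pack)
{
  return pack & 0xffff;
}

/* Pack the library version with the GCN device version and check that the
   library half round-trips.  */
static unsigned int
pack_gcn_version (void)
{
  unsigned int pack = GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_GCN);
  assert (packed_lib_version (pack) == GOMP_VERSION);
  return pack;
}
```

With both versions equal to 1 the packed value is `0x10001`; each offload target contributes only its own `DEV` half, so the remaining `GOMP_VERSION_*` constants are independent of the removed one.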
|
@@ -1,7 +1,7 @@
-# Makefile.in generated by automake 1.15.1 from Makefile.am.
+# Makefile.in generated by automake 1.16.1 from Makefile.am.
 # @configure_input@

-# Copyright (C) 1994-2017 Free Software Foundation, Inc.
+# Copyright (C) 1994-2018 Free Software Foundation, Inc.

 # This Makefile.in is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -119,9 +119,8 @@ build_triplet = @build@
 host_triplet = @host@
 target_triplet = @target@
 @PLUGIN_NVPTX_TRUE@am__append_1 = libgomp-plugin-nvptx.la
-@PLUGIN_HSA_TRUE@am__append_2 = libgomp-plugin-hsa.la
-@PLUGIN_GCN_TRUE@am__append_3 = libgomp-plugin-gcn.la
-@USE_FORTRAN_TRUE@am__append_4 = openacc.f90
+@PLUGIN_GCN_TRUE@am__append_2 = libgomp-plugin-gcn.la
+@USE_FORTRAN_TRUE@am__append_3 = openacc.f90
 subdir = .
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
 am__aclocal_m4_deps = $(top_srcdir)/../config/acx.m4 \
@@ -198,17 +197,6 @@ libgomp_plugin_gcn_la_LINK = $(LIBTOOL) $(AM_V_lt) --tag=CC \
         $(libgomp_plugin_gcn_la_LDFLAGS) $(LDFLAGS) -o $@
 @PLUGIN_GCN_TRUE@am_libgomp_plugin_gcn_la_rpath = -rpath \
 @PLUGIN_GCN_TRUE@ $(toolexeclibdir)
-@PLUGIN_HSA_TRUE@libgomp_plugin_hsa_la_DEPENDENCIES = libgomp.la \
-@PLUGIN_HSA_TRUE@ $(am__DEPENDENCIES_1)
-@PLUGIN_HSA_TRUE@am_libgomp_plugin_hsa_la_OBJECTS = \
-@PLUGIN_HSA_TRUE@ libgomp_plugin_hsa_la-plugin-hsa.lo
-libgomp_plugin_hsa_la_OBJECTS = $(am_libgomp_plugin_hsa_la_OBJECTS)
-libgomp_plugin_hsa_la_LINK = $(LIBTOOL) $(AM_V_lt) --tag=CC \
-        $(libgomp_plugin_hsa_la_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \
-        --mode=link $(CCLD) $(AM_CFLAGS) $(CFLAGS) \
-        $(libgomp_plugin_hsa_la_LDFLAGS) $(LDFLAGS) -o $@
-@PLUGIN_HSA_TRUE@am_libgomp_plugin_hsa_la_rpath = -rpath \
-@PLUGIN_HSA_TRUE@ $(toolexeclibdir)
 @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_DEPENDENCIES = libgomp.la \
 @PLUGIN_NVPTX_TRUE@ $(am__DEPENDENCIES_1)
 @PLUGIN_NVPTX_TRUE@am_libgomp_plugin_nvptx_la_OBJECTS = \
@@ -248,7 +236,32 @@ am__v_at_0 = @
 am__v_at_1 =
 DEFAULT_INCLUDES = -I.@am__isrc@
 depcomp = $(SHELL) $(top_srcdir)/../depcomp
-am__depfiles_maybe = depfiles
+am__maybe_remake_depfiles = depfiles
+am__depfiles_remade = ./$(DEPDIR)/affinity-fmt.Plo \
+        ./$(DEPDIR)/affinity.Plo ./$(DEPDIR)/alloc.Plo \
+        ./$(DEPDIR)/allocator.Plo ./$(DEPDIR)/atomic.Plo \
+        ./$(DEPDIR)/bar.Plo ./$(DEPDIR)/barrier.Plo \
+        ./$(DEPDIR)/critical.Plo ./$(DEPDIR)/env.Plo \
+        ./$(DEPDIR)/error.Plo ./$(DEPDIR)/fortran.Plo \
+        ./$(DEPDIR)/icv-device.Plo ./$(DEPDIR)/icv.Plo \
+        ./$(DEPDIR)/iter.Plo ./$(DEPDIR)/iter_ull.Plo \
+        ./$(DEPDIR)/libgomp-plugin.Plo \
+        ./$(DEPDIR)/libgomp_plugin_gcn_la-plugin-gcn.Plo \
+        ./$(DEPDIR)/libgomp_plugin_nvptx_la-plugin-nvptx.Plo \
+        ./$(DEPDIR)/lock.Plo ./$(DEPDIR)/loop.Plo \
+        ./$(DEPDIR)/loop_ull.Plo ./$(DEPDIR)/mutex.Plo \
+        ./$(DEPDIR)/oacc-async.Plo ./$(DEPDIR)/oacc-cuda.Plo \
+        ./$(DEPDIR)/oacc-host.Plo ./$(DEPDIR)/oacc-init.Plo \
+        ./$(DEPDIR)/oacc-mem.Plo ./$(DEPDIR)/oacc-parallel.Plo \
+        ./$(DEPDIR)/oacc-plugin.Plo ./$(DEPDIR)/oacc-profiling.Plo \
+        ./$(DEPDIR)/oacc-target.Plo ./$(DEPDIR)/ordered.Plo \
+        ./$(DEPDIR)/parallel.Plo ./$(DEPDIR)/priority_queue.Plo \
+        ./$(DEPDIR)/proc.Plo ./$(DEPDIR)/ptrlock.Plo \
+        ./$(DEPDIR)/sections.Plo ./$(DEPDIR)/sem.Plo \
+        ./$(DEPDIR)/single.Plo ./$(DEPDIR)/splay-tree.Plo \
+        ./$(DEPDIR)/target.Plo ./$(DEPDIR)/task.Plo \
+        ./$(DEPDIR)/team.Plo ./$(DEPDIR)/teams.Plo \
+        ./$(DEPDIR)/time.Plo ./$(DEPDIR)/work.Plo
 am__mv = mv -f
 COMPILE = $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) \
         $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS)
@ -281,7 +294,6 @@ am__v_FCLD_ = $(am__v_FCLD_@AM_DEFAULT_V@)
|
|||
am__v_FCLD_0 = @echo " FCLD " $@;
|
||||
am__v_FCLD_1 =
|
||||
SOURCES = $(libgomp_plugin_gcn_la_SOURCES) \
|
||||
$(libgomp_plugin_hsa_la_SOURCES) \
|
||||
$(libgomp_plugin_nvptx_la_SOURCES) $(libgomp_la_SOURCES)
|
||||
AM_V_DVIPS = $(am__v_DVIPS_@AM_V@)
|
||||
am__v_DVIPS_ = $(am__v_DVIPS_@AM_DEFAULT_V@)
|
||||
|
@ -450,10 +462,6 @@ PLUGIN_GCN = @PLUGIN_GCN@
|
|||
PLUGIN_GCN_CPPFLAGS = @PLUGIN_GCN_CPPFLAGS@
|
||||
PLUGIN_GCN_LDFLAGS = @PLUGIN_GCN_LDFLAGS@
|
||||
PLUGIN_GCN_LIBS = @PLUGIN_GCN_LIBS@
|
||||
PLUGIN_HSA = @PLUGIN_HSA@
|
||||
PLUGIN_HSA_CPPFLAGS = @PLUGIN_HSA_CPPFLAGS@
|
||||
PLUGIN_HSA_LDFLAGS = @PLUGIN_HSA_LDFLAGS@
|
||||
PLUGIN_HSA_LIBS = @PLUGIN_HSA_LIBS@
|
||||
PLUGIN_NVPTX = @PLUGIN_NVPTX@
|
||||
PLUGIN_NVPTX_CPPFLAGS = @PLUGIN_NVPTX_CPPFLAGS@
|
||||
PLUGIN_NVPTX_LDFLAGS = @PLUGIN_NVPTX_LDFLAGS@
|
||||
|
@ -550,8 +558,7 @@ libsubincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/include
|
|||
AM_CPPFLAGS = $(addprefix -I, $(search_path))
|
||||
AM_CFLAGS = $(XCFLAGS)
|
||||
AM_LDFLAGS = $(XLDFLAGS) $(SECTION_LDFLAGS) $(OPT_LDFLAGS)
|
||||
toolexeclib_LTLIBRARIES = libgomp.la $(am__append_1) $(am__append_2) \
|
||||
$(am__append_3)
|
||||
toolexeclib_LTLIBRARIES = libgomp.la $(am__append_1) $(am__append_2)
|
||||
nodist_toolexeclib_HEADERS = libgomp.spec
|
||||
|
||||
# -Wc is only a libtool option.
|
||||
|
@ -577,7 +584,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c \
|
|||
oacc-parallel.c oacc-host.c oacc-init.c oacc-mem.c \
|
||||
oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \
|
||||
affinity-fmt.c teams.c allocator.c oacc-profiling.c \
|
||||
oacc-target.c $(am__append_4)
|
||||
oacc-target.c $(am__append_3)
|
||||
|
||||
# Nvidia PTX OpenACC plugin.
|
||||
@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_version_info = -version-info $(libtool_VERSION)
|
||||
|
@ -589,18 +596,6 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c \
|
|||
@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_LIBADD = libgomp.la $(PLUGIN_NVPTX_LIBS)
|
||||
@PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_la_LIBTOOLFLAGS = --tag=disable-static
|
||||
|
||||
# Heterogenous Systems Architecture plugin
|
||||
@PLUGIN_HSA_TRUE@libgomp_plugin_hsa_version_info = -version-info $(libtool_VERSION)
|
||||
@PLUGIN_HSA_TRUE@libgomp_plugin_hsa_la_SOURCES = plugin/plugin-hsa.c
|
||||
@PLUGIN_HSA_TRUE@libgomp_plugin_hsa_la_CPPFLAGS = $(AM_CPPFLAGS) $(PLUGIN_HSA_CPPFLAGS) \
|
||||
@PLUGIN_HSA_TRUE@ -D_GNU_SOURCE
|
||||
|
||||
@PLUGIN_HSA_TRUE@libgomp_plugin_hsa_la_LDFLAGS = \
|
||||
@PLUGIN_HSA_TRUE@ $(libgomp_plugin_hsa_version_info) \
|
||||
@PLUGIN_HSA_TRUE@ $(lt_host_flags) $(PLUGIN_HSA_LDFLAGS)
|
||||
@PLUGIN_HSA_TRUE@libgomp_plugin_hsa_la_LIBADD = libgomp.la $(PLUGIN_HSA_LIBS)
|
||||
@PLUGIN_HSA_TRUE@libgomp_plugin_hsa_la_LIBTOOLFLAGS = --tag=disable-static
|
||||
|
||||
# AMD GCN plugin
|
||||
@PLUGIN_GCN_TRUE@libgomp_plugin_gcn_version_info = -version-info $(libtool_VERSION)
|
||||
@PLUGIN_GCN_TRUE@libgomp_plugin_gcn_la_SOURCES = plugin/plugin-gcn.c
|
||||
|
@ -674,8 +669,8 @@ Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status
|
|||
echo ' $(SHELL) ./config.status'; \
|
||||
$(SHELL) ./config.status;; \
|
||||
*) \
|
||||
echo ' cd $(top_builddir) && $(SHELL) ./config.status $@ $(am__depfiles_maybe)'; \
|
||||
cd $(top_builddir) && $(SHELL) ./config.status $@ $(am__depfiles_maybe);; \
|
||||
echo ' cd $(top_builddir) && $(SHELL) ./config.status $@ $(am__maybe_remake_depfiles)'; \
|
||||
cd $(top_builddir) && $(SHELL) ./config.status $@ $(am__maybe_remake_depfiles);; \
|
||||
esac;
|
||||
$(top_srcdir)/plugin/Makefrag.am $(top_srcdir)/../multilib.am $(am__empty):
|
||||
|
||||
|
@ -751,9 +746,6 @@ clean-toolexeclibLTLIBRARIES:
|
|||
libgomp-plugin-gcn.la: $(libgomp_plugin_gcn_la_OBJECTS) $(libgomp_plugin_gcn_la_DEPENDENCIES) $(EXTRA_libgomp_plugin_gcn_la_DEPENDENCIES)
|
||||
$(AM_V_CCLD)$(libgomp_plugin_gcn_la_LINK) $(am_libgomp_plugin_gcn_la_rpath) $(libgomp_plugin_gcn_la_OBJECTS) $(libgomp_plugin_gcn_la_LIBADD) $(LIBS)
|
||||
|
||||
libgomp-plugin-hsa.la: $(libgomp_plugin_hsa_la_OBJECTS) $(libgomp_plugin_hsa_la_DEPENDENCIES) $(EXTRA_libgomp_plugin_hsa_la_DEPENDENCIES)
|
||||
$(AM_V_CCLD)$(libgomp_plugin_hsa_la_LINK) $(am_libgomp_plugin_hsa_la_rpath) $(libgomp_plugin_hsa_la_OBJECTS) $(libgomp_plugin_hsa_la_LIBADD) $(LIBS)
|
||||
|
||||
libgomp-plugin-nvptx.la: $(libgomp_plugin_nvptx_la_OBJECTS) $(libgomp_plugin_nvptx_la_DEPENDENCIES) $(EXTRA_libgomp_plugin_nvptx_la_DEPENDENCIES)
|
||||
$(AM_V_CCLD)$(libgomp_plugin_nvptx_la_LINK) $(am_libgomp_plugin_nvptx_la_rpath) $(libgomp_plugin_nvptx_la_OBJECTS) $(libgomp_plugin_nvptx_la_LIBADD) $(LIBS)
|
||||
|
||||
|
@ -766,53 +758,58 @@ mostlyclean-compile:
|
|||
distclean-compile:
|
||||
-rm -f *.tab.c
|
||||
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/affinity-fmt.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/affinity.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/alloc.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/allocator.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/atomic.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/bar.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/barrier.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/critical.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/env.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/error.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/fortran.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/icv-device.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/icv.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/iter.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/iter_ull.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libgomp-plugin.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libgomp_plugin_gcn_la-plugin-gcn.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libgomp_plugin_hsa_la-plugin-hsa.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libgomp_plugin_nvptx_la-plugin-nvptx.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/lock.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/loop.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/loop_ull.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/mutex.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-async.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-cuda.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-host.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-init.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-mem.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-parallel.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-plugin.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-profiling.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-target.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ordered.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/parallel.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/priority_queue.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/proc.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ptrlock.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/sections.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/sem.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/single.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/splay-tree.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/target.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/task.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/team.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/teams.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/time.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/work.Plo@am__quote@
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/affinity-fmt.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/affinity.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/alloc.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/allocator.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/atomic.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/bar.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/barrier.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/critical.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/env.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/error.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/fortran.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/icv-device.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/icv.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/iter.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/iter_ull.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libgomp-plugin.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libgomp_plugin_gcn_la-plugin-gcn.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/libgomp_plugin_nvptx_la-plugin-nvptx.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/lock.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/loop.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/loop_ull.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/mutex.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-async.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-cuda.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-host.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-init.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-mem.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-parallel.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-plugin.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-profiling.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-target.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ordered.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/parallel.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/priority_queue.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/proc.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ptrlock.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/sections.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/sem.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/single.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/splay-tree.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/target.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/task.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/team.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/teams.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/time.Plo@am__quote@ # am--include-marker
|
||||
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/work.Plo@am__quote@ # am--include-marker
|
||||
|
||||
$(am__depfiles_remade):
|
||||
@$(MKDIR_P) $(@D)
|
||||
@echo '# dummy' >$@-t && $(am__mv) $@-t $@
|
||||
|
||||
am--depfiles: $(am__depfiles_remade)
|
||||
|
||||
.c.o:
|
||||
@am__fastdepCC_TRUE@ $(AM_V_CC)$(COMPILE) -MT $@ -MD -MP -MF $(DEPDIR)/$*.Tpo -c -o $@ $<
|
||||
|
@ -842,13 +839,6 @@ libgomp_plugin_gcn_la-plugin-gcn.lo: plugin/plugin-gcn.c
|
|||
@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
|
||||
@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(libgomp_plugin_gcn_la_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libgomp_plugin_gcn_la_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o libgomp_plugin_gcn_la-plugin-gcn.lo `test -f 'plugin/plugin-gcn.c' || echo '$(srcdir)/'`plugin/plugin-gcn.c
|
||||
|
||||
libgomp_plugin_hsa_la-plugin-hsa.lo: plugin/plugin-hsa.c
|
||||
@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(libgomp_plugin_hsa_la_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libgomp_plugin_hsa_la_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -MT libgomp_plugin_hsa_la-plugin-hsa.lo -MD -MP -MF $(DEPDIR)/libgomp_plugin_hsa_la-plugin-hsa.Tpo -c -o libgomp_plugin_hsa_la-plugin-hsa.lo `test -f 'plugin/plugin-hsa.c' || echo '$(srcdir)/'`plugin/plugin-hsa.c
|
||||
@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) $(DEPDIR)/libgomp_plugin_hsa_la-plugin-hsa.Tpo $(DEPDIR)/libgomp_plugin_hsa_la-plugin-hsa.Plo
|
||||
@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='plugin/plugin-hsa.c' object='libgomp_plugin_hsa_la-plugin-hsa.lo' libtool=yes @AMDEPBACKSLASH@
|
||||
@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@
|
||||
@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(libgomp_plugin_hsa_la_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libgomp_plugin_hsa_la_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -c -o libgomp_plugin_hsa_la-plugin-hsa.lo `test -f 'plugin/plugin-hsa.c' || echo '$(srcdir)/'`plugin/plugin-hsa.c
|
||||
|
||||
libgomp_plugin_nvptx_la-plugin-nvptx.lo: plugin/plugin-nvptx.c
|
||||
@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(libgomp_plugin_nvptx_la_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(libgomp_plugin_nvptx_la_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) -MT libgomp_plugin_nvptx_la-plugin-nvptx.lo -MD -MP -MF $(DEPDIR)/libgomp_plugin_nvptx_la-plugin-nvptx.Tpo -c -o libgomp_plugin_nvptx_la-plugin-nvptx.lo `test -f 'plugin/plugin-nvptx.c' || echo '$(srcdir)/'`plugin/plugin-nvptx.c
|
||||
@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) $(DEPDIR)/libgomp_plugin_nvptx_la-plugin-nvptx.Tpo $(DEPDIR)/libgomp_plugin_nvptx_la-plugin-nvptx.Plo
|
||||
|
@ -1205,7 +1195,52 @@ clean-am: clean-aminfo clean-generic clean-libtool clean-local \
|
|||
|
||||
distclean: distclean-recursive
|
||||
-rm -f $(am__CONFIG_DISTCLEAN_FILES)
|
||||
-rm -rf ./$(DEPDIR)
|
||||
-rm -f ./$(DEPDIR)/affinity-fmt.Plo
|
||||
-rm -f ./$(DEPDIR)/affinity.Plo
|
||||
-rm -f ./$(DEPDIR)/alloc.Plo
|
||||
-rm -f ./$(DEPDIR)/allocator.Plo
|
||||
-rm -f ./$(DEPDIR)/atomic.Plo
|
||||
-rm -f ./$(DEPDIR)/bar.Plo
|
||||
-rm -f ./$(DEPDIR)/barrier.Plo
|
||||
-rm -f ./$(DEPDIR)/critical.Plo
|
||||
-rm -f ./$(DEPDIR)/env.Plo
|
||||
-rm -f ./$(DEPDIR)/error.Plo
|
||||
-rm -f ./$(DEPDIR)/fortran.Plo
|
||||
-rm -f ./$(DEPDIR)/icv-device.Plo
|
||||
-rm -f ./$(DEPDIR)/icv.Plo
|
||||
-rm -f ./$(DEPDIR)/iter.Plo
|
||||
-rm -f ./$(DEPDIR)/iter_ull.Plo
|
||||
-rm -f ./$(DEPDIR)/libgomp-plugin.Plo
|
||||
-rm -f ./$(DEPDIR)/libgomp_plugin_gcn_la-plugin-gcn.Plo
|
||||
-rm -f ./$(DEPDIR)/libgomp_plugin_nvptx_la-plugin-nvptx.Plo
|
||||
-rm -f ./$(DEPDIR)/lock.Plo
|
||||
-rm -f ./$(DEPDIR)/loop.Plo
|
||||
-rm -f ./$(DEPDIR)/loop_ull.Plo
|
||||
-rm -f ./$(DEPDIR)/mutex.Plo
|
||||
-rm -f ./$(DEPDIR)/oacc-async.Plo
|
||||
-rm -f ./$(DEPDIR)/oacc-cuda.Plo
|
||||
-rm -f ./$(DEPDIR)/oacc-host.Plo
|
||||
-rm -f ./$(DEPDIR)/oacc-init.Plo
|
||||
-rm -f ./$(DEPDIR)/oacc-mem.Plo
|
||||
-rm -f ./$(DEPDIR)/oacc-parallel.Plo
|
||||
-rm -f ./$(DEPDIR)/oacc-plugin.Plo
|
||||
-rm -f ./$(DEPDIR)/oacc-profiling.Plo
|
||||
-rm -f ./$(DEPDIR)/oacc-target.Plo
|
||||
-rm -f ./$(DEPDIR)/ordered.Plo
|
||||
-rm -f ./$(DEPDIR)/parallel.Plo
|
||||
-rm -f ./$(DEPDIR)/priority_queue.Plo
|
||||
-rm -f ./$(DEPDIR)/proc.Plo
|
||||
-rm -f ./$(DEPDIR)/ptrlock.Plo
|
||||
-rm -f ./$(DEPDIR)/sections.Plo
|
||||
-rm -f ./$(DEPDIR)/sem.Plo
|
||||
-rm -f ./$(DEPDIR)/single.Plo
|
||||
-rm -f ./$(DEPDIR)/splay-tree.Plo
|
||||
-rm -f ./$(DEPDIR)/target.Plo
|
||||
-rm -f ./$(DEPDIR)/task.Plo
|
||||
-rm -f ./$(DEPDIR)/team.Plo
|
||||
-rm -f ./$(DEPDIR)/teams.Plo
|
||||
-rm -f ./$(DEPDIR)/time.Plo
|
||||
-rm -f ./$(DEPDIR)/work.Plo
|
||||
-rm -f Makefile
|
||||
distclean-am: clean-am distclean-compile distclean-generic \
|
||||
distclean-hdr distclean-libtool distclean-local distclean-tags
|
||||
|
@ -1346,7 +1381,52 @@ installcheck-am:
|
|||
maintainer-clean: maintainer-clean-recursive
|
||||
-rm -f $(am__CONFIG_DISTCLEAN_FILES)
|
||||
-rm -rf $(top_srcdir)/autom4te.cache
|
||||
-rm -rf ./$(DEPDIR)
|
||||
-rm -f ./$(DEPDIR)/affinity-fmt.Plo
|
||||
-rm -f ./$(DEPDIR)/affinity.Plo
|
||||
-rm -f ./$(DEPDIR)/alloc.Plo
|
||||
-rm -f ./$(DEPDIR)/allocator.Plo
|
||||
-rm -f ./$(DEPDIR)/atomic.Plo
|
||||
-rm -f ./$(DEPDIR)/bar.Plo
|
||||
-rm -f ./$(DEPDIR)/barrier.Plo
|
||||
-rm -f ./$(DEPDIR)/critical.Plo
|
||||
-rm -f ./$(DEPDIR)/env.Plo
|
||||
-rm -f ./$(DEPDIR)/error.Plo
|
||||
-rm -f ./$(DEPDIR)/fortran.Plo
|
||||
-rm -f ./$(DEPDIR)/icv-device.Plo
|
||||
-rm -f ./$(DEPDIR)/icv.Plo
|
||||
-rm -f ./$(DEPDIR)/iter.Plo
|
||||
-rm -f ./$(DEPDIR)/iter_ull.Plo
|
||||
-rm -f ./$(DEPDIR)/libgomp-plugin.Plo
|
||||
-rm -f ./$(DEPDIR)/libgomp_plugin_gcn_la-plugin-gcn.Plo
|
||||
-rm -f ./$(DEPDIR)/libgomp_plugin_nvptx_la-plugin-nvptx.Plo
|
||||
-rm -f ./$(DEPDIR)/lock.Plo
|
||||
-rm -f ./$(DEPDIR)/loop.Plo
|
||||
-rm -f ./$(DEPDIR)/loop_ull.Plo
|
||||
-rm -f ./$(DEPDIR)/mutex.Plo
|
||||
-rm -f ./$(DEPDIR)/oacc-async.Plo
|
||||
-rm -f ./$(DEPDIR)/oacc-cuda.Plo
|
||||
-rm -f ./$(DEPDIR)/oacc-host.Plo
|
||||
-rm -f ./$(DEPDIR)/oacc-init.Plo
|
||||
-rm -f ./$(DEPDIR)/oacc-mem.Plo
|
||||
-rm -f ./$(DEPDIR)/oacc-parallel.Plo
|
||||
-rm -f ./$(DEPDIR)/oacc-plugin.Plo
|
||||
-rm -f ./$(DEPDIR)/oacc-profiling.Plo
|
||||
-rm -f ./$(DEPDIR)/oacc-target.Plo
|
||||
-rm -f ./$(DEPDIR)/ordered.Plo
|
||||
-rm -f ./$(DEPDIR)/parallel.Plo
|
||||
-rm -f ./$(DEPDIR)/priority_queue.Plo
|
||||
-rm -f ./$(DEPDIR)/proc.Plo
|
||||
-rm -f ./$(DEPDIR)/ptrlock.Plo
|
||||
-rm -f ./$(DEPDIR)/sections.Plo
|
||||
-rm -f ./$(DEPDIR)/sem.Plo
|
||||
-rm -f ./$(DEPDIR)/single.Plo
|
||||
-rm -f ./$(DEPDIR)/splay-tree.Plo
|
||||
-rm -f ./$(DEPDIR)/target.Plo
|
||||
-rm -f ./$(DEPDIR)/task.Plo
|
||||
-rm -f ./$(DEPDIR)/team.Plo
|
||||
-rm -f ./$(DEPDIR)/teams.Plo
|
||||
-rm -f ./$(DEPDIR)/time.Plo
|
||||
-rm -f ./$(DEPDIR)/work.Plo
|
||||
-rm -f Makefile
|
||||
maintainer-clean-am: distclean-am maintainer-clean-aminfo \
|
||||
maintainer-clean-generic maintainer-clean-local
|
||||
|
@ -1373,8 +1453,8 @@ uninstall-am: uninstall-dvi-am uninstall-html-am uninstall-info-am \
|
|||
.MAKE: $(am__recursive_targets) all install-am install-strip
|
||||
|
||||
.PHONY: $(am__recursive_targets) CTAGS GTAGS TAGS all all-am all-local \
|
||||
am--refresh check check-am clean clean-aminfo clean-cscope \
|
||||
clean-generic clean-libtool clean-local \
|
||||
am--depfiles am--refresh check check-am clean clean-aminfo \
|
||||
clean-cscope clean-generic clean-libtool clean-local \
|
||||
clean-toolexeclibLTLIBRARIES cscope cscopelist-am ctags \
|
||||
ctags-am dist-info distclean distclean-compile \
|
||||
distclean-generic distclean-hdr distclean-libtool \
|
||||
|
|
189
libgomp/aclocal.m4
vendored
|
@ -1,6 +1,6 @@
|
|||
# generated automatically by aclocal 1.15.1 -*- Autoconf -*-
|
||||
# generated automatically by aclocal 1.16.1 -*- Autoconf -*-
|
||||
|
||||
# Copyright (C) 1996-2017 Free Software Foundation, Inc.
|
||||
# Copyright (C) 1996-2018 Free Software Foundation, Inc.
|
||||
|
||||
# This file is free software; the Free Software Foundation
|
||||
# gives unlimited permission to copy and/or distribute it,
|
||||
|
@ -20,7 +20,7 @@ You have another version of autoconf. It may work, but is not guaranteed to.
|
|||
If you have problems, you may need to regenerate the build system entirely.
|
||||
To do so, use the procedure documented by the package, typically 'autoreconf'.])])
|
||||
|
||||
# Copyright (C) 2002-2017 Free Software Foundation, Inc.
|
||||
# Copyright (C) 2002-2018 Free Software Foundation, Inc.
|
||||
#
|
||||
# This file is free software; the Free Software Foundation
|
||||
# gives unlimited permission to copy and/or distribute it,
|
||||
|
@ -32,10 +32,10 @@ To do so, use the procedure documented by the package, typically 'autoreconf'.])
|
|||
# generated from the m4 files accompanying Automake X.Y.
|
||||
# (This private macro should not be called outside this file.)
|
||||
AC_DEFUN([AM_AUTOMAKE_VERSION],
|
||||
[am__api_version='1.15'
|
||||
[am__api_version='1.16'
|
||||
dnl Some users find AM_AUTOMAKE_VERSION and mistake it for a way to
|
||||
dnl require some minimum version. Point them to the right macro.
|
||||
m4_if([$1], [1.15.1], [],
|
||||
m4_if([$1], [1.16.1], [],
|
||||
[AC_FATAL([Do not call $0, use AM_INIT_AUTOMAKE([$1]).])])dnl
|
||||
])
|
||||
|
||||
|
@ -51,14 +51,14 @@ m4_define([_AM_AUTOCONF_VERSION], [])
|
|||
# Call AM_AUTOMAKE_VERSION and AM_AUTOMAKE_VERSION so they can be traced.
|
||||
# This function is AC_REQUIREd by AM_INIT_AUTOMAKE.
|
||||
AC_DEFUN([AM_SET_CURRENT_AUTOMAKE_VERSION],
|
||||
[AM_AUTOMAKE_VERSION([1.15.1])dnl
|
||||
[AM_AUTOMAKE_VERSION([1.16.1])dnl
|
||||
m4_ifndef([AC_AUTOCONF_VERSION],
|
||||
[m4_copy([m4_PACKAGE_VERSION], [AC_AUTOCONF_VERSION])])dnl
|
||||
_AM_AUTOCONF_VERSION(m4_defn([AC_AUTOCONF_VERSION]))])
|
||||
|
||||
# AM_AUX_DIR_EXPAND -*- Autoconf -*-
|
||||
|
||||
# Copyright (C) 2001-2017 Free Software Foundation, Inc.
|
||||
# Copyright (C) 2001-2018 Free Software Foundation, Inc.
|
||||
#
|
||||
# This file is free software; the Free Software Foundation
|
||||
# gives unlimited permission to copy and/or distribute it,
|
||||
|
@ -110,7 +110,7 @@ am_aux_dir=`cd "$ac_aux_dir" && pwd`
|
|||
|
||||
# AM_CONDITIONAL -*- Autoconf -*-
|
||||
|
||||
# Copyright (C) 1997-2017 Free Software Foundation, Inc.
|
||||
# Copyright (C) 1997-2018 Free Software Foundation, Inc.
|
||||
#
|
||||
# This file is free software; the Free Software Foundation
|
||||
# gives unlimited permission to copy and/or distribute it,
|
||||
|
@ -141,7 +141,7 @@ AC_CONFIG_COMMANDS_PRE(
|
|||
Usually this means the macro was only invoked conditionally.]])
|
||||
fi])])
|
||||
|
||||
# Copyright (C) 1999-2017 Free Software Foundation, Inc.
|
||||
# Copyright (C) 1999-2018 Free Software Foundation, Inc.
|
||||
#
|
||||
# This file is free software; the Free Software Foundation
|
||||
# gives unlimited permission to copy and/or distribute it,
|
||||
|
@ -332,13 +332,12 @@ _AM_SUBST_NOTMAKE([am__nodep])dnl
|
|||
|
||||
# Generate code to set up dependency tracking. -*- Autoconf -*-
|
||||
|
||||
# Copyright (C) 1999-2017 Free Software Foundation, Inc.
|
||||
# Copyright (C) 1999-2018 Free Software Foundation, Inc.
|
||||
#
|
||||
# This file is free software; the Free Software Foundation
|
||||
# gives unlimited permission to copy and/or distribute it,
|
||||
# with or without modifications, as long as this notice is preserved.
|
||||
|
||||
|
||||
# _AM_OUTPUT_DEPENDENCY_COMMANDS
|
||||
# ------------------------------
|
||||
AC_DEFUN([_AM_OUTPUT_DEPENDENCY_COMMANDS],
|
||||
|
@ -346,49 +345,41 @@ AC_DEFUN([_AM_OUTPUT_DEPENDENCY_COMMANDS],
|
|||
# Older Autoconf quotes --file arguments for eval, but not when files
|
||||
# are listed without --file. Let's play safe and only enable the eval
|
||||
# if we detect the quoting.
|
||||
case $CONFIG_FILES in
|
||||
*\'*) eval set x "$CONFIG_FILES" ;;
|
||||
*) set x $CONFIG_FILES ;;
|
||||
esac
|
||||
# TODO: see whether this extra hack can be removed once we start
|
||||
# requiring Autoconf 2.70 or later.
|
||||
AS_CASE([$CONFIG_FILES],
|
||||
[*\'*], [eval set x "$CONFIG_FILES"],
|
||||
[*], [set x $CONFIG_FILES])
|
||||
shift
|
||||
for mf
|
||||
# Used to flag and report bootstrapping failures.
|
||||
am_rc=0
|
||||
for am_mf
|
||||
do
|
||||
# Strip MF so we end up with the name of the file.
|
||||
mf=`echo "$mf" | sed -e 's/:.*$//'`
|
||||
# Check whether this is an Automake generated Makefile or not.
|
||||
# We used to match only the files named 'Makefile.in', but
|
||||
# some people rename them; so instead we look at the file content.
|
||||
# Grep'ing the first line is not enough: some people post-process
|
||||
# each Makefile.in and add a new line on top of each file to say so.
|
||||
# Grep'ing the whole file is not good either: AIX grep has a line
|
||||
am_mf=`AS_ECHO(["$am_mf"]) | sed -e 's/:.*$//'`
|
||||
# Check whether this is an Automake generated Makefile which includes
|
||||
# dependency-tracking related rules and includes.
|
||||
# Grep'ing the whole file directly is not great: AIX grep has a line
|
||||
# limit of 2048, but all sed's we know have understand at least 4000.
|
||||
if sed -n 's,^#.*generated by automake.*,X,p' "$mf" | grep X >/dev/null 2>&1; then
|
||||
dirpart=`AS_DIRNAME("$mf")`
|
||||
else
|
||||
continue
|
||||
fi
|
||||
# Extract the definition of DEPDIR, am__include, and am__quote
|
||||
# from the Makefile without running 'make'.
|
||||
DEPDIR=`sed -n 's/^DEPDIR = //p' < "$mf"`
|
||||
test -z "$DEPDIR" && continue
|
||||
am__include=`sed -n 's/^am__include = //p' < "$mf"`
|
||||
test -z "$am__include" && continue
|
||||
am__quote=`sed -n 's/^am__quote = //p' < "$mf"`
|
||||
# Find all dependency output files, they are included files with
|
||||
# $(DEPDIR) in their names. We invoke sed twice because it is the
|
||||
# simplest approach to changing $(DEPDIR) to its actual value in the
|
||||
# expansion.
|
||||
for file in `sed -n "
|
||||
s/^$am__include $am__quote\(.*(DEPDIR).*\)$am__quote"'$/\1/p' <"$mf" | \
|
||||
sed -e 's/\$(DEPDIR)/'"$DEPDIR"'/g'`; do
|
||||
# Make sure the directory exists.
|
||||
test -f "$dirpart/$file" && continue
|
||||
fdir=`AS_DIRNAME(["$file"])`
|
||||
AS_MKDIR_P([$dirpart/$fdir])
|
||||
# echo "creating $dirpart/$file"
|
||||
echo '# dummy' > "$dirpart/$file"
|
||||
done
|
||||
sed -n 's,^am--depfiles:.*,X,p' "$am_mf" | grep X >/dev/null 2>&1 \
|
||||
|| continue
|
||||
am_dirpart=`AS_DIRNAME(["$am_mf"])`
|
||||
am_filepart=`AS_BASENAME(["$am_mf"])`
|
||||
AM_RUN_LOG([cd "$am_dirpart" \
|
||||
&& sed -e '/# am--include-marker/d' "$am_filepart" \
|
||||
| $MAKE -f - am--depfiles]) || am_rc=$?
|
||||
done
|
||||
if test $am_rc -ne 0; then
|
||||
AC_MSG_FAILURE([Something went wrong bootstrapping makefile fragments
|
||||
for automatic dependency tracking. Try re-running configure with the
|
||||
'--disable-dependency-tracking' option to at least be able to build
|
||||
the package (albeit without support for automatic dependency tracking).])
|
||||
fi
|
||||
AS_UNSET([am_dirpart])
|
||||
AS_UNSET([am_filepart])
|
||||
AS_UNSET([am_mf])
|
||||
AS_UNSET([am_rc])
|
||||
rm -f conftest-deps.mk
|
||||
}
|
||||
])# _AM_OUTPUT_DEPENDENCY_COMMANDS
|
||||
|
||||
|
@ -397,18 +388,17 @@ AC_DEFUN([_AM_OUTPUT_DEPENDENCY_COMMANDS],
|
|||
# -----------------------------
|
||||
# This macro should only be invoked once -- use via AC_REQUIRE.
|
||||
#
|
||||
# This code is only required when automatic dependency tracking
|
||||
# is enabled. FIXME. This creates each '.P' file that we will
|
||||
# need in order to bootstrap the dependency handling code.
|
||||
# This code is only required when automatic dependency tracking is enabled.
|
||||
# This creates each '.Po' and '.Plo' makefile fragment that we'll need in
|
||||
# order to bootstrap the dependency handling code.
|
||||
AC_DEFUN([AM_OUTPUT_DEPENDENCY_COMMANDS],
|
||||
[AC_CONFIG_COMMANDS([depfiles],
|
||||
[test x"$AMDEP_TRUE" != x"" || _AM_OUTPUT_DEPENDENCY_COMMANDS],
|
||||
[AMDEP_TRUE="$AMDEP_TRUE" ac_aux_dir="$ac_aux_dir"])
|
||||
])
|
||||
[AMDEP_TRUE="$AMDEP_TRUE" MAKE="${MAKE-make}"])])
|
||||
|
||||
# Do all the work for Automake. -*- Autoconf -*-
|
||||
|
||||
# Copyright (C) 1996-2017 Free Software Foundation, Inc.
|
||||
# Copyright (C) 1996-2018 Free Software Foundation, Inc.
|
||||
#
|
||||
# This file is free software; the Free Software Foundation
|
||||
# gives unlimited permission to copy and/or distribute it,
|
||||
|
@ -495,8 +485,8 @@ AC_REQUIRE([AM_PROG_INSTALL_STRIP])dnl
|
|||
AC_REQUIRE([AC_PROG_MKDIR_P])dnl
|
||||
# For better backward compatibility. To be removed once Automake 1.9.x
|
||||
# dies out for good. For more background, see:
|
||||
# <http://lists.gnu.org/archive/html/automake/2012-07/msg00001.html>
|
||||
# <http://lists.gnu.org/archive/html/automake/2012-07/msg00014.html>
|
||||
# <https://lists.gnu.org/archive/html/automake/2012-07/msg00001.html>
|
||||
# <https://lists.gnu.org/archive/html/automake/2012-07/msg00014.html>
|
||||
AC_SUBST([mkdir_p], ['$(MKDIR_P)'])
|
||||
# We need awk for the "check" target (and possibly the TAP driver). The
|
||||
# system "awk" is bad on some platforms.
|
||||
|
@ -563,7 +553,7 @@ END
|
|||
Aborting the configuration process, to ensure you take notice of the issue.
|
||||
|
||||
You can download and install GNU coreutils to get an 'rm' implementation
|
||||
that behaves properly: <http://www.gnu.org/software/coreutils/>.
|
||||
that behaves properly: <https://www.gnu.org/software/coreutils/>.
|
||||
|
||||
If you want to complete the configuration process using your problematic
|
||||
'rm' anyway, export the environment variable ACCEPT_INFERIOR_RM_PROGRAM
|
||||
|
@@ -605,7 +595,7 @@ for _am_header in $config_headers :; do
|
|||
done
|
||||
echo "timestamp for $_am_arg" >`AS_DIRNAME(["$_am_arg"])`/stamp-h[]$_am_stamp_count])
|
||||
|
||||
# Copyright (C) 2001-2017 Free Software Foundation, Inc.
|
||||
# Copyright (C) 2001-2018 Free Software Foundation, Inc.
|
||||
#
|
||||
# This file is free software; the Free Software Foundation
|
||||
# gives unlimited permission to copy and/or distribute it,
|
||||
|
@@ -629,7 +619,7 @@ AC_SUBST([install_sh])])
|
|||
# Add --enable-maintainer-mode option to configure. -*- Autoconf -*-
|
||||
# From Jim Meyering
|
||||
|
||||
# Copyright (C) 1996-2017 Free Software Foundation, Inc.
|
||||
# Copyright (C) 1996-2018 Free Software Foundation, Inc.
|
||||
#
|
||||
# This file is free software; the Free Software Foundation
|
||||
# gives unlimited permission to copy and/or distribute it,
|
||||
|
@@ -664,7 +654,7 @@ AC_MSG_CHECKING([whether to enable maintainer-specific portions of Makefiles])
|
|||
|
||||
# Check to see how 'make' treats includes. -*- Autoconf -*-
|
||||
|
||||
# Copyright (C) 2001-2017 Free Software Foundation, Inc.
|
||||
# Copyright (C) 2001-2018 Free Software Foundation, Inc.
|
||||
#
|
||||
# This file is free software; the Free Software Foundation
|
||||
# gives unlimited permission to copy and/or distribute it,
|
||||
|
@@ -672,49 +662,42 @@ AC_MSG_CHECKING([whether to enable maintainer-specific portions of Makefiles])
|
|||
|
||||
# AM_MAKE_INCLUDE()
|
||||
# -----------------
|
||||
# Check to see how make treats includes.
|
||||
# Check whether make has an 'include' directive that can support all
|
||||
# the idioms we need for our automatic dependency tracking code.
|
||||
AC_DEFUN([AM_MAKE_INCLUDE],
|
||||
[am_make=${MAKE-make}
|
||||
cat > confinc << 'END'
|
||||
[AC_MSG_CHECKING([whether ${MAKE-make} supports the include directive])
|
||||
cat > confinc.mk << 'END'
|
||||
am__doit:
|
||||
@echo this is the am__doit target
|
||||
@echo this is the am__doit target >confinc.out
|
||||
.PHONY: am__doit
|
||||
END
|
||||
# If we don't find an include directive, just comment out the code.
|
||||
AC_MSG_CHECKING([for style of include used by $am_make])
|
||||
am__include="#"
|
||||
am__quote=
|
||||
_am_result=none
|
||||
# First try GNU make style include.
|
||||
echo "include confinc" > confmf
|
||||
# Ignore all kinds of additional output from 'make'.
|
||||
case `$am_make -s -f confmf 2> /dev/null` in #(
|
||||
*the\ am__doit\ target*)
|
||||
am__include=include
|
||||
am__quote=
|
||||
_am_result=GNU
|
||||
;;
|
||||
esac
|
||||
# Now try BSD make style include.
|
||||
if test "$am__include" = "#"; then
|
||||
echo '.include "confinc"' > confmf
|
||||
case `$am_make -s -f confmf 2> /dev/null` in #(
|
||||
*the\ am__doit\ target*)
|
||||
am__include=.include
|
||||
am__quote="\""
|
||||
_am_result=BSD
|
||||
;;
|
||||
esac
|
||||
fi
|
||||
AC_SUBST([am__include])
|
||||
AC_SUBST([am__quote])
|
||||
AC_MSG_RESULT([$_am_result])
|
||||
rm -f confinc confmf
|
||||
])
|
||||
# BSD make does it like this.
|
||||
echo '.include "confinc.mk" # ignored' > confmf.BSD
|
||||
# Other make implementations (GNU, Solaris 10, AIX) do it like this.
|
||||
echo 'include confinc.mk # ignored' > confmf.GNU
|
||||
_am_result=no
|
||||
for s in GNU BSD; do
|
||||
AM_RUN_LOG([${MAKE-make} -f confmf.$s && cat confinc.out])
|
||||
AS_CASE([$?:`cat confinc.out 2>/dev/null`],
|
||||
['0:this is the am__doit target'],
|
||||
[AS_CASE([$s],
|
||||
[BSD], [am__include='.include' am__quote='"'],
|
||||
[am__include='include' am__quote=''])])
|
||||
if test "$am__include" != "#"; then
|
||||
_am_result="yes ($s style)"
|
||||
break
|
||||
fi
|
||||
done
|
||||
rm -f confinc.* confmf.*
|
||||
AC_MSG_RESULT([${_am_result}])
|
||||
AC_SUBST([am__include])])
|
||||
AC_SUBST([am__quote])])
|
||||
|
||||
# Fake the existence of programs that GNU maintainers use. -*- Autoconf -*-
|
||||
|
||||
# Copyright (C) 1997-2017 Free Software Foundation, Inc.
|
||||
# Copyright (C) 1997-2018 Free Software Foundation, Inc.
|
||||
#
|
||||
# This file is free software; the Free Software Foundation
|
||||
# gives unlimited permission to copy and/or distribute it,
|
||||
|
@@ -753,7 +736,7 @@ fi
|
|||
|
||||
# Helper functions for option handling. -*- Autoconf -*-
|
||||
|
||||
# Copyright (C) 2001-2017 Free Software Foundation, Inc.
|
||||
# Copyright (C) 2001-2018 Free Software Foundation, Inc.
|
||||
#
|
||||
# This file is free software; the Free Software Foundation
|
||||
# gives unlimited permission to copy and/or distribute it,
|
||||
|
@@ -782,7 +765,7 @@ AC_DEFUN([_AM_SET_OPTIONS],
|
|||
AC_DEFUN([_AM_IF_OPTION],
|
||||
[m4_ifset(_AM_MANGLE_OPTION([$1]), [$2], [$3])])
|
||||
|
||||
# Copyright (C) 1999-2017 Free Software Foundation, Inc.
|
||||
# Copyright (C) 1999-2018 Free Software Foundation, Inc.
|
||||
#
|
||||
# This file is free software; the Free Software Foundation
|
||||
# gives unlimited permission to copy and/or distribute it,
|
||||
|
@@ -829,7 +812,7 @@ AC_LANG_POP([C])])
|
|||
# For backward compatibility.
|
||||
AC_DEFUN_ONCE([AM_PROG_CC_C_O], [AC_REQUIRE([AC_PROG_CC])])
|
||||
|
||||
# Copyright (C) 2001-2017 Free Software Foundation, Inc.
|
||||
# Copyright (C) 2001-2018 Free Software Foundation, Inc.
|
||||
#
|
||||
# This file is free software; the Free Software Foundation
|
||||
# gives unlimited permission to copy and/or distribute it,
|
||||
|
@@ -848,7 +831,7 @@ AC_DEFUN([AM_RUN_LOG],
|
|||
|
||||
# Check to make sure that the build environment is sane. -*- Autoconf -*-
|
||||
|
||||
# Copyright (C) 1996-2017 Free Software Foundation, Inc.
|
||||
# Copyright (C) 1996-2018 Free Software Foundation, Inc.
|
||||
#
|
||||
# This file is free software; the Free Software Foundation
|
||||
# gives unlimited permission to copy and/or distribute it,
|
||||
|
@@ -929,7 +912,7 @@ AC_CONFIG_COMMANDS_PRE(
|
|||
rm -f conftest.file
|
||||
])
|
||||
|
||||
# Copyright (C) 2009-2017 Free Software Foundation, Inc.
|
||||
# Copyright (C) 2009-2018 Free Software Foundation, Inc.
|
||||
#
|
||||
# This file is free software; the Free Software Foundation
|
||||
# gives unlimited permission to copy and/or distribute it,
|
||||
|
@@ -989,7 +972,7 @@ AC_SUBST([AM_BACKSLASH])dnl
|
|||
_AM_SUBST_NOTMAKE([AM_BACKSLASH])dnl
|
||||
])
|
||||
|
||||
# Copyright (C) 2001-2017 Free Software Foundation, Inc.
|
||||
# Copyright (C) 2001-2018 Free Software Foundation, Inc.
|
||||
#
|
||||
# This file is free software; the Free Software Foundation
|
||||
# gives unlimited permission to copy and/or distribute it,
|
||||
|
@@ -1017,7 +1000,7 @@ fi
|
|||
INSTALL_STRIP_PROGRAM="\$(install_sh) -c -s"
|
||||
AC_SUBST([INSTALL_STRIP_PROGRAM])])
|
||||
|
||||
# Copyright (C) 2006-2017 Free Software Foundation, Inc.
|
||||
# Copyright (C) 2006-2018 Free Software Foundation, Inc.
|
||||
#
|
||||
# This file is free software; the Free Software Foundation
|
||||
# gives unlimited permission to copy and/or distribute it,
|
||||
|
@@ -1036,7 +1019,7 @@ AC_DEFUN([AM_SUBST_NOTMAKE], [_AM_SUBST_NOTMAKE($@)])
|
|||
|
||||
# Check how to create a tarball. -*- Autoconf -*-
|
||||
|
||||
# Copyright (C) 2004-2017 Free Software Foundation, Inc.
|
||||
# Copyright (C) 2004-2018 Free Software Foundation, Inc.
|
||||
#
|
||||
# This file is free software; the Free Software Foundation
|
||||
# gives unlimited permission to copy and/or distribute it,
|
||||
|
|
|
@@ -173,9 +173,6 @@
|
|||
/* Define to 1 if the GCN plugin is built, 0 if not. */
|
||||
#undef PLUGIN_GCN
|
||||
|
||||
/* Define to 1 if the HSA plugin is built, 0 if not. */
|
||||
#undef PLUGIN_HSA
|
||||
|
||||
/* Define to 1 if the NVIDIA plugin is built, 0 if not. */
|
||||
#undef PLUGIN_NVPTX
|
||||
|
||||
|
|
273
libgomp/configure
vendored
|
@@ -667,8 +667,6 @@ OPT_LDFLAGS
|
|||
SECTION_LDFLAGS
|
||||
PLUGIN_GCN_FALSE
|
||||
PLUGIN_GCN_TRUE
|
||||
PLUGIN_HSA_FALSE
|
||||
PLUGIN_HSA_TRUE
|
||||
PLUGIN_NVPTX_FALSE
|
||||
PLUGIN_NVPTX_TRUE
|
||||
offload_additional_lib_paths
|
||||
|
@@ -679,10 +677,6 @@ PLUGIN_GCN_LIBS
|
|||
PLUGIN_GCN_LDFLAGS
|
||||
PLUGIN_GCN_CPPFLAGS
|
||||
PLUGIN_GCN
|
||||
PLUGIN_HSA_LIBS
|
||||
PLUGIN_HSA_LDFLAGS
|
||||
PLUGIN_HSA_CPPFLAGS
|
||||
PLUGIN_HSA
|
||||
HSA_RUNTIME_LIB
|
||||
HSA_RUNTIME_INCLUDE
|
||||
PLUGIN_NVPTX_LIBS
|
||||
|
@@ -730,7 +724,6 @@ am__nodep
|
|||
AMDEPBACKSLASH
|
||||
AMDEP_FALSE
|
||||
AMDEP_TRUE
|
||||
am__quote
|
||||
am__include
|
||||
DEPDIR
|
||||
OBJEXT
|
||||
|
@@ -821,7 +814,8 @@ PACKAGE_VERSION
|
|||
PACKAGE_TARNAME
|
||||
PACKAGE_NAME
|
||||
PATH_SEPARATOR
|
||||
SHELL'
|
||||
SHELL
|
||||
am__quote'
|
||||
ac_subst_files=''
|
||||
ac_user_opts='
|
||||
enable_option_checking
|
||||
|
@@ -2891,7 +2885,7 @@ target_alias=${target_alias-$host_alias}
|
|||
# -Wall: turns on all automake warnings...
|
||||
# -Wno-portability: ...except this one, since GNU make is required.
|
||||
# -Wno-override: ... and this one, since we do want this in testsuite.
|
||||
am__api_version='1.15'
|
||||
am__api_version='1.16'
|
||||
|
||||
# Find a good install program. We prefer a C program (faster),
|
||||
# so one script is as good as another. But avoid the broken or
|
||||
|
@@ -3407,8 +3401,8 @@ MAKEINFO=${MAKEINFO-"${am_missing_run}makeinfo"}
|
|||
|
||||
# For better backward compatibility. To be removed once Automake 1.9.x
|
||||
# dies out for good. For more background, see:
|
||||
# <http://lists.gnu.org/archive/html/automake/2012-07/msg00001.html>
|
||||
# <http://lists.gnu.org/archive/html/automake/2012-07/msg00014.html>
|
||||
# <https://lists.gnu.org/archive/html/automake/2012-07/msg00001.html>
|
||||
# <https://lists.gnu.org/archive/html/automake/2012-07/msg00014.html>
|
||||
mkdir_p='$(MKDIR_P)'
|
||||
|
||||
# We need awk for the "check" target (and possibly the TAP driver). The
|
||||
|
@@ -3459,7 +3453,7 @@ END
|
|||
Aborting the configuration process, to ensure you take notice of the issue.
|
||||
|
||||
You can download and install GNU coreutils to get an 'rm' implementation
|
||||
that behaves properly: <http://www.gnu.org/software/coreutils/>.
|
||||
that behaves properly: <https://www.gnu.org/software/coreutils/>.
|
||||
|
||||
If you want to complete the configuration process using your problematic
|
||||
'rm' anyway, export the environment variable ACCEPT_INFERIOR_RM_PROGRAM
|
||||
|
@@ -4420,45 +4414,45 @@ DEPDIR="${am__leading_dot}deps"
|
|||
|
||||
ac_config_commands="$ac_config_commands depfiles"
|
||||
|
||||
|
||||
am_make=${MAKE-make}
|
||||
cat > confinc << 'END'
|
||||
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${MAKE-make} supports the include directive" >&5
|
||||
$as_echo_n "checking whether ${MAKE-make} supports the include directive... " >&6; }
|
||||
cat > confinc.mk << 'END'
|
||||
am__doit:
|
||||
@echo this is the am__doit target
|
||||
@echo this is the am__doit target >confinc.out
|
||||
.PHONY: am__doit
|
||||
END
|
||||
# If we don't find an include directive, just comment out the code.
|
||||
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for style of include used by $am_make" >&5
|
||||
$as_echo_n "checking for style of include used by $am_make... " >&6; }
|
||||
am__include="#"
|
||||
am__quote=
|
||||
_am_result=none
|
||||
# First try GNU make style include.
|
||||
echo "include confinc" > confmf
|
||||
# Ignore all kinds of additional output from 'make'.
|
||||
case `$am_make -s -f confmf 2> /dev/null` in #(
|
||||
*the\ am__doit\ target*)
|
||||
am__include=include
|
||||
am__quote=
|
||||
_am_result=GNU
|
||||
;;
|
||||
esac
|
||||
# Now try BSD make style include.
|
||||
if test "$am__include" = "#"; then
|
||||
echo '.include "confinc"' > confmf
|
||||
case `$am_make -s -f confmf 2> /dev/null` in #(
|
||||
*the\ am__doit\ target*)
|
||||
am__include=.include
|
||||
am__quote="\""
|
||||
_am_result=BSD
|
||||
# BSD make does it like this.
|
||||
echo '.include "confinc.mk" # ignored' > confmf.BSD
|
||||
# Other make implementations (GNU, Solaris 10, AIX) do it like this.
|
||||
echo 'include confinc.mk # ignored' > confmf.GNU
|
||||
_am_result=no
|
||||
for s in GNU BSD; do
|
||||
{ echo "$as_me:$LINENO: ${MAKE-make} -f confmf.$s && cat confinc.out" >&5
|
||||
(${MAKE-make} -f confmf.$s && cat confinc.out) >&5 2>&5
|
||||
ac_status=$?
|
||||
echo "$as_me:$LINENO: \$? = $ac_status" >&5
|
||||
(exit $ac_status); }
|
||||
case $?:`cat confinc.out 2>/dev/null` in #(
|
||||
'0:this is the am__doit target') :
|
||||
case $s in #(
|
||||
BSD) :
|
||||
am__include='.include' am__quote='"' ;; #(
|
||||
*) :
|
||||
am__include='include' am__quote='' ;;
|
||||
esac ;; #(
|
||||
*) :
|
||||
;;
|
||||
esac
|
||||
fi
|
||||
|
||||
|
||||
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $_am_result" >&5
|
||||
$as_echo "$_am_result" >&6; }
|
||||
rm -f confinc confmf
|
||||
esac
|
||||
if test "$am__include" != "#"; then
|
||||
_am_result="yes ($s style)"
|
||||
break
|
||||
fi
|
||||
done
|
||||
rm -f confinc.* confmf.*
|
||||
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: ${_am_result}" >&5
|
||||
$as_echo "${_am_result}" >&6; }
|
||||
|
||||
# Check whether --enable-dependency-tracking was given.
|
||||
if test "${enable_dependency_tracking+set}" = set; then :
|
||||
|
@@ -11435,7 +11429,7 @@ else
|
|||
lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
|
||||
lt_status=$lt_dlunknown
|
||||
cat > conftest.$ac_ext <<_LT_EOF
|
||||
#line 11438 "configure"
|
||||
#line 11432 "configure"
|
||||
#include "confdefs.h"
|
||||
|
||||
#if HAVE_DLFCN_H
|
||||
|
@@ -11541,7 +11535,7 @@ else
|
|||
lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
|
||||
lt_status=$lt_dlunknown
|
||||
cat > conftest.$ac_ext <<_LT_EOF
|
||||
#line 11544 "configure"
|
||||
#line 11538 "configure"
|
||||
#include "confdefs.h"
|
||||
|
||||
#if HAVE_DLFCN_H
|
||||
|
@@ -15256,15 +15250,6 @@ if test "x$HSA_RUNTIME_LIB" != x; then
|
|||
HSA_RUNTIME_LDFLAGS=-L$HSA_RUNTIME_LIB
|
||||
fi
|
||||
|
||||
PLUGIN_HSA=0
|
||||
PLUGIN_HSA_CPPFLAGS=
|
||||
PLUGIN_HSA_LDFLAGS=
|
||||
PLUGIN_HSA_LIBS=
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
PLUGIN_GCN=0
|
||||
PLUGIN_GCN_CPPFLAGS=
|
||||
PLUGIN_GCN_LDFLAGS=
|
||||
|
@@ -15346,45 +15331,6 @@ rm -f core conftest.err conftest.$ac_objext \
|
|||
;;
|
||||
esac
|
||||
;;
|
||||
hsa*)
|
||||
case "${target}" in
|
||||
x86_64-*-*)
|
||||
case " ${CC} ${CFLAGS} " in
|
||||
*" -m32 "*|*" -mx32 "*)
|
||||
PLUGIN_HSA=0
|
||||
;;
|
||||
*)
|
||||
tgt_plugin=hsa
|
||||
PLUGIN_HSA=$tgt
|
||||
PLUGIN_HSA_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
|
||||
PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS"
|
||||
PLUGIN_HSA_LIBS="-ldl"
|
||||
|
||||
PLUGIN_HSA_save_CPPFLAGS=$CPPFLAGS
|
||||
CPPFLAGS="$PLUGIN_HSA_CPPFLAGS $CPPFLAGS"
|
||||
PLUGIN_HSA_save_LDFLAGS=$LDFLAGS
|
||||
LDFLAGS="$PLUGIN_HSA_LDFLAGS $LDFLAGS"
|
||||
PLUGIN_HSA_save_LIBS=$LIBS
|
||||
LIBS="$PLUGIN_HSA_LIBS $LIBS"
|
||||
|
||||
PLUGIN_HSA=1
|
||||
CPPFLAGS=$PLUGIN_HSA_save_CPPFLAGS
|
||||
LDFLAGS=$PLUGIN_HSA_save_LDFLAGS
|
||||
LIBS=$PLUGIN_HSA_save_LIBS
|
||||
case $PLUGIN_HSA in
|
||||
hsa*)
|
||||
HSA_PLUGIN=0
|
||||
as_fn_error $? "HSA run-time package required for HSA support" "$LINENO" 5
|
||||
;;
|
||||
esac
|
||||
;;
|
||||
esac
|
||||
;;
|
||||
*-*-*)
|
||||
PLUGIN_HSA=0
|
||||
;;
|
||||
esac
|
||||
;;
|
||||
|
||||
amdgcn*)
|
||||
case "${target}" in
|
||||
|
@@ -15424,10 +15370,7 @@ rm -f core conftest.err conftest.$ac_objext \
|
|||
offload_targets=$offload_targets,$tgt
|
||||
fi
|
||||
# Configure additional search paths.
|
||||
if test "$tgt_plugin" = hsa; then
|
||||
# Offloading compilation is all handled by the target compiler.
|
||||
:
|
||||
elif test x"$tgt_dir" != x; then
|
||||
if test x"$tgt_dir" != x; then
|
||||
offload_additional_options="$offload_additional_options -B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin"
|
||||
offload_additional_lib_paths="$offload_additional_lib_paths:$tgt_dir/lib64:$tgt_dir/lib:$tgt_dir/lib32"
|
||||
else
|
||||
|
@@ -15457,19 +15400,6 @@ _ACEOF
|
|||
|
||||
cat >>confdefs.h <<_ACEOF
|
||||
#define PLUGIN_NVPTX_DYNAMIC $PLUGIN_NVPTX_DYNAMIC
|
||||
_ACEOF
|
||||
|
||||
if test $PLUGIN_HSA = 1; then
|
||||
PLUGIN_HSA_TRUE=
|
||||
PLUGIN_HSA_FALSE='#'
|
||||
else
|
||||
PLUGIN_HSA_TRUE='#'
|
||||
PLUGIN_HSA_FALSE=
|
||||
fi
|
||||
|
||||
|
||||
cat >>confdefs.h <<_ACEOF
|
||||
#define PLUGIN_HSA $PLUGIN_HSA
|
||||
_ACEOF
|
||||
|
||||
if test $PLUGIN_GCN = 1; then
|
||||
|
@@ -16756,7 +16686,7 @@ case "$host" in
|
|||
case "$enable_cet" in
|
||||
auto)
|
||||
# Check if target supports multi-byte NOPs
|
||||
# and if assembler supports CET insn.
|
||||
# and if compiler and assembler support CET insn.
|
||||
cet_save_CFLAGS="$CFLAGS"
|
||||
CFLAGS="$CFLAGS -fcf-protection"
|
||||
cat confdefs.h - <<_ACEOF >conftest.$ac_ext
|
||||
|
@@ -17247,10 +17177,6 @@ if test -z "${PLUGIN_NVPTX_TRUE}" && test -z "${PLUGIN_NVPTX_FALSE}"; then
|
|||
as_fn_error $? "conditional \"PLUGIN_NVPTX\" was never defined.
|
||||
Usually this means the macro was only invoked conditionally." "$LINENO" 5
|
||||
fi
|
||||
if test -z "${PLUGIN_HSA_TRUE}" && test -z "${PLUGIN_HSA_FALSE}"; then
|
||||
as_fn_error $? "conditional \"PLUGIN_HSA\" was never defined.
|
||||
Usually this means the macro was only invoked conditionally." "$LINENO" 5
|
||||
fi
|
||||
if test -z "${PLUGIN_GCN_TRUE}" && test -z "${PLUGIN_GCN_FALSE}"; then
|
||||
as_fn_error $? "conditional \"PLUGIN_GCN\" was never defined.
|
||||
Usually this means the macro was only invoked conditionally." "$LINENO" 5
|
||||
|
@@ -17869,7 +17795,7 @@ CC="$CC"
|
|||
CXX="$CXX"
|
||||
GFORTRAN="$GFORTRAN"
|
||||
GDC="$GDC"
|
||||
AMDEP_TRUE="$AMDEP_TRUE" ac_aux_dir="$ac_aux_dir"
|
||||
AMDEP_TRUE="$AMDEP_TRUE" MAKE="${MAKE-make}"
|
||||
|
||||
|
||||
# The HP-UX ksh and POSIX shell print the target directory to stdout
|
||||
|
@@ -18859,29 +18785,35 @@ esac ;;
|
|||
# Older Autoconf quotes --file arguments for eval, but not when files
|
||||
# are listed without --file. Let's play safe and only enable the eval
|
||||
# if we detect the quoting.
|
||||
case $CONFIG_FILES in
|
||||
*\'*) eval set x "$CONFIG_FILES" ;;
|
||||
*) set x $CONFIG_FILES ;;
|
||||
esac
|
||||
# TODO: see whether this extra hack can be removed once we start
|
||||
# requiring Autoconf 2.70 or later.
|
||||
case $CONFIG_FILES in #(
|
||||
*\'*) :
|
||||
eval set x "$CONFIG_FILES" ;; #(
|
||||
*) :
|
||||
set x $CONFIG_FILES ;; #(
|
||||
*) :
|
||||
;;
|
||||
esac
|
||||
shift
|
||||
for mf
|
||||
# Used to flag and report bootstrapping failures.
|
||||
am_rc=0
|
||||
for am_mf
|
||||
do
|
||||
# Strip MF so we end up with the name of the file.
|
||||
mf=`echo "$mf" | sed -e 's/:.*$//'`
|
||||
# Check whether this is an Automake generated Makefile or not.
|
||||
# We used to match only the files named 'Makefile.in', but
|
||||
# some people rename them; so instead we look at the file content.
|
||||
# Grep'ing the first line is not enough: some people post-process
|
||||
# each Makefile.in and add a new line on top of each file to say so.
|
||||
# Grep'ing the whole file is not good either: AIX grep has a line
|
||||
am_mf=`$as_echo "$am_mf" | sed -e 's/:.*$//'`
|
||||
# Check whether this is an Automake generated Makefile which includes
|
||||
# dependency-tracking related rules and includes.
|
||||
# Grep'ing the whole file directly is not great: AIX grep has a line
|
||||
# limit of 2048, but all sed's we know have understand at least 4000.
|
||||
if sed -n 's,^#.*generated by automake.*,X,p' "$mf" | grep X >/dev/null 2>&1; then
|
||||
dirpart=`$as_dirname -- "$mf" ||
|
||||
$as_expr X"$mf" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \
|
||||
X"$mf" : 'X\(//\)[^/]' \| \
|
||||
X"$mf" : 'X\(//\)$' \| \
|
||||
X"$mf" : 'X\(/\)' \| . 2>/dev/null ||
|
||||
$as_echo X"$mf" |
|
||||
sed -n 's,^am--depfiles:.*,X,p' "$am_mf" | grep X >/dev/null 2>&1 \
|
||||
|| continue
|
||||
am_dirpart=`$as_dirname -- "$am_mf" ||
|
||||
$as_expr X"$am_mf" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \
|
||||
X"$am_mf" : 'X\(//\)[^/]' \| \
|
||||
X"$am_mf" : 'X\(//\)$' \| \
|
||||
X"$am_mf" : 'X\(/\)' \| . 2>/dev/null ||
|
||||
$as_echo X"$am_mf" |
|
||||
sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{
|
||||
s//\1/
|
||||
q
|
||||
|
@@ -18899,53 +18831,48 @@ $as_echo X"$mf" |
|
|||
q
|
||||
}
|
||||
s/.*/./; q'`
|
||||
else
|
||||
continue
|
||||
fi
|
||||
# Extract the definition of DEPDIR, am__include, and am__quote
|
||||
# from the Makefile without running 'make'.
|
||||
DEPDIR=`sed -n 's/^DEPDIR = //p' < "$mf"`
|
||||
test -z "$DEPDIR" && continue
|
||||
am__include=`sed -n 's/^am__include = //p' < "$mf"`
|
||||
test -z "$am__include" && continue
|
||||
am__quote=`sed -n 's/^am__quote = //p' < "$mf"`
|
||||
# Find all dependency output files, they are included files with
|
||||
# $(DEPDIR) in their names. We invoke sed twice because it is the
|
||||
# simplest approach to changing $(DEPDIR) to its actual value in the
|
||||
# expansion.
|
||||
for file in `sed -n "
|
||||
s/^$am__include $am__quote\(.*(DEPDIR).*\)$am__quote"'$/\1/p' <"$mf" | \
|
||||
sed -e 's/\$(DEPDIR)/'"$DEPDIR"'/g'`; do
|
||||
# Make sure the directory exists.
|
||||
test -f "$dirpart/$file" && continue
|
||||
fdir=`$as_dirname -- "$file" ||
|
||||
$as_expr X"$file" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \
|
||||
X"$file" : 'X\(//\)[^/]' \| \
|
||||
X"$file" : 'X\(//\)$' \| \
|
||||
X"$file" : 'X\(/\)' \| . 2>/dev/null ||
|
||||
$as_echo X"$file" |
|
||||
sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{
|
||||
am_filepart=`$as_basename -- "$am_mf" ||
|
||||
$as_expr X/"$am_mf" : '.*/\([^/][^/]*\)/*$' \| \
|
||||
X"$am_mf" : 'X\(//\)$' \| \
|
||||
X"$am_mf" : 'X\(/\)' \| . 2>/dev/null ||
|
||||
$as_echo X/"$am_mf" |
|
||||
sed '/^.*\/\([^/][^/]*\)\/*$/{
|
||||
s//\1/
|
||||
q
|
||||
}
|
||||
/^X\(\/\/\)[^/].*/{
|
||||
/^X\/\(\/\/\)$/{
|
||||
s//\1/
|
||||
q
|
||||
}
|
||||
/^X\(\/\/\)$/{
|
||||
s//\1/
|
||||
q
|
||||
}
|
||||
/^X\(\/\).*/{
|
||||
/^X\/\(\/\).*/{
|
||||
s//\1/
|
||||
q
|
||||
}
|
||||
s/.*/./; q'`
|
||||
as_dir=$dirpart/$fdir; as_fn_mkdir_p
|
||||
# echo "creating $dirpart/$file"
|
||||
echo '# dummy' > "$dirpart/$file"
|
||||
done
|
||||
{ echo "$as_me:$LINENO: cd "$am_dirpart" \
|
||||
&& sed -e '/# am--include-marker/d' "$am_filepart" \
|
||||
| $MAKE -f - am--depfiles" >&5
|
||||
(cd "$am_dirpart" \
|
||||
&& sed -e '/# am--include-marker/d' "$am_filepart" \
|
||||
| $MAKE -f - am--depfiles) >&5 2>&5
|
||||
ac_status=$?
|
||||
echo "$as_me:$LINENO: \$? = $ac_status" >&5
|
||||
(exit $ac_status); } || am_rc=$?
|
||||
done
|
||||
if test $am_rc -ne 0; then
|
||||
{ { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
|
||||
$as_echo "$as_me: error: in \`$ac_pwd':" >&2;}
|
||||
as_fn_error $? "Something went wrong bootstrapping makefile fragments
|
||||
for automatic dependency tracking. Try re-running configure with the
|
||||
'--disable-dependency-tracking' option to at least be able to build
|
||||
the package (albeit without support for automatic dependency tracking).
|
||||
See \`config.log' for more details" "$LINENO" 5; }
|
||||
fi
|
||||
{ am_dirpart=; unset am_dirpart;}
|
||||
{ am_filepart=; unset am_filepart;}
|
||||
{ am_mf=; unset am_mf;}
|
||||
{ am_rc=; unset am_rc;}
|
||||
rm -f conftest-deps.mk
|
||||
}
|
||||
;;
|
||||
"libtool":C)
|
||||
|
|
|
@@ -39,20 +39,6 @@ libgomp_plugin_nvptx_la_LIBADD = libgomp.la $(PLUGIN_NVPTX_LIBS)
|
|||
libgomp_plugin_nvptx_la_LIBTOOLFLAGS = --tag=disable-static
|
||||
endif
|
||||
|
||||
if PLUGIN_HSA
|
||||
# Heterogenous Systems Architecture plugin
|
||||
libgomp_plugin_hsa_version_info = -version-info $(libtool_VERSION)
|
||||
toolexeclib_LTLIBRARIES += libgomp-plugin-hsa.la
|
||||
libgomp_plugin_hsa_la_SOURCES = plugin/plugin-hsa.c
|
||||
libgomp_plugin_hsa_la_CPPFLAGS = $(AM_CPPFLAGS) $(PLUGIN_HSA_CPPFLAGS) \
|
||||
-D_GNU_SOURCE
|
||||
libgomp_plugin_hsa_la_LDFLAGS = $(libgomp_plugin_hsa_version_info) \
|
||||
$(lt_host_flags)
|
||||
libgomp_plugin_hsa_la_LDFLAGS += $(PLUGIN_HSA_LDFLAGS)
|
||||
libgomp_plugin_hsa_la_LIBADD = libgomp.la $(PLUGIN_HSA_LIBS)
|
||||
libgomp_plugin_hsa_la_LIBTOOLFLAGS = --tag=disable-static
|
||||
endif
|
||||
|
||||
if PLUGIN_GCN
|
||||
# AMD GCN plugin
|
||||
libgomp_plugin_gcn_version_info = -version-info $(libtool_VERSION)
|
||||
|
|
|
@@ -128,15 +128,6 @@ if test "x$HSA_RUNTIME_LIB" != x; then
|
|||
HSA_RUNTIME_LDFLAGS=-L$HSA_RUNTIME_LIB
|
||||
fi
|
||||
|
||||
PLUGIN_HSA=0
|
||||
PLUGIN_HSA_CPPFLAGS=
|
||||
PLUGIN_HSA_LDFLAGS=
|
||||
PLUGIN_HSA_LIBS=
|
||||
AC_SUBST(PLUGIN_HSA)
|
||||
AC_SUBST(PLUGIN_HSA_CPPFLAGS)
|
||||
AC_SUBST(PLUGIN_HSA_LDFLAGS)
|
||||
AC_SUBST(PLUGIN_HSA_LIBS)
|
||||
|
||||
PLUGIN_GCN=0
|
||||
PLUGIN_GCN_CPPFLAGS=
|
||||
PLUGIN_GCN_LDFLAGS=
|
||||
|
@@ -207,45 +198,6 @@ if test x"$enable_offload_targets" != x; then
|
|||
;;
|
||||
esac
|
||||
;;
|
||||
hsa*)
|
||||
case "${target}" in
|
||||
x86_64-*-*)
|
||||
case " ${CC} ${CFLAGS} " in
|
||||
*" -m32 "*|*" -mx32 "*)
|
||||
PLUGIN_HSA=0
|
||||
;;
|
||||
*)
|
||||
tgt_plugin=hsa
|
||||
PLUGIN_HSA=$tgt
|
||||
PLUGIN_HSA_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
|
||||
PLUGIN_HSA_LDFLAGS="$HSA_RUNTIME_LDFLAGS"
|
||||
PLUGIN_HSA_LIBS="-ldl"
|
||||
|
||||
PLUGIN_HSA_save_CPPFLAGS=$CPPFLAGS
|
||||
CPPFLAGS="$PLUGIN_HSA_CPPFLAGS $CPPFLAGS"
|
||||
PLUGIN_HSA_save_LDFLAGS=$LDFLAGS
|
||||
LDFLAGS="$PLUGIN_HSA_LDFLAGS $LDFLAGS"
|
||||
PLUGIN_HSA_save_LIBS=$LIBS
|
||||
LIBS="$PLUGIN_HSA_LIBS $LIBS"
|
||||
|
||||
PLUGIN_HSA=1
|
||||
CPPFLAGS=$PLUGIN_HSA_save_CPPFLAGS
|
||||
LDFLAGS=$PLUGIN_HSA_save_LDFLAGS
|
||||
LIBS=$PLUGIN_HSA_save_LIBS
|
||||
case $PLUGIN_HSA in
|
||||
hsa*)
|
||||
HSA_PLUGIN=0
|
||||
AC_MSG_ERROR([HSA run-time package required for HSA support])
|
||||
;;
|
||||
esac
|
||||
;;
|
||||
esac
|
||||
;;
|
||||
*-*-*)
|
||||
PLUGIN_HSA=0
|
||||
;;
|
||||
esac
|
||||
;;
|
||||
|
||||
amdgcn*)
|
||||
case "${target}" in
|
||||
|
@@ -285,10 +237,7 @@ if test x"$enable_offload_targets" != x; then
|
|||
offload_targets=$offload_targets,$tgt
|
||||
fi
|
||||
# Configure additional search paths.
|
||||
if test "$tgt_plugin" = hsa; then
|
||||
# Offloading compilation is all handled by the target compiler.
|
||||
:
|
||||
elif test x"$tgt_dir" != x; then
|
||||
if test x"$tgt_dir" != x; then
|
||||
offload_additional_options="$offload_additional_options -B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin"
|
||||
offload_additional_lib_paths="$offload_additional_lib_paths:$tgt_dir/lib64:$tgt_dir/lib:$tgt_dir/lib32"
|
||||
else
|
||||
|
@@ -304,9 +253,6 @@ AC_DEFINE_UNQUOTED([PLUGIN_NVPTX], [$PLUGIN_NVPTX],
|
|||
[Define to 1 if the NVIDIA plugin is built, 0 if not.])
|
||||
AC_DEFINE_UNQUOTED([PLUGIN_NVPTX_DYNAMIC], [$PLUGIN_NVPTX_DYNAMIC],
|
||||
[Define to 1 if the NVIDIA plugin should dlopen libcuda.so.1, 0 if it should be linked against it.])
|
||||
AM_CONDITIONAL([PLUGIN_HSA], [test $PLUGIN_HSA = 1])
|
||||
AC_DEFINE_UNQUOTED([PLUGIN_HSA], [$PLUGIN_HSA],
|
||||
[Define to 1 if the HSA plugin is built, 0 if not.])
|
||||
AM_CONDITIONAL([PLUGIN_GCN], [test $PLUGIN_GCN = 1])
|
||||
AC_DEFINE_UNQUOTED([PLUGIN_GCN], [$PLUGIN_GCN],
|
||||
[Define to 1 if the GCN plugin is built, 0 if not.])
|
||||
|
|
|
@@ -1,270 +0,0 @@
|
|||
/* HSA Extensions API 1.0.1 representation description.
|
||||
Copyright (C) 2016-2020 Free Software Foundation, Inc.
|
||||
|
||||
This file is part of GCC.
|
||||
|
||||
GCC is free software; you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
the Free Software Foundation; either version 3, or (at your option)
|
||||
any later version.
|
||||
|
||||
GCC is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
GNU General Public License for more details.
|
||||
|
||||
Under Section 7 of GPL version 3, you are granted additional
|
||||
permissions described in the GCC Runtime Library Exception, version
|
||||
3.1, as published by the Free Software Foundation.
|
||||
|
||||
You should have received a copy of the GNU General Public License and
|
||||
a copy of the GCC Runtime Library Exception along with this program;
|
||||
see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
|
||||
<http://www.gnu.org/licenses/>.
|
||||
|
||||
The contents of the file was created by extracting data structures, enum,
|
||||
typedef and other definitions from HSA Runtime Programmer’s Reference Manual
|
||||
Version 1.0 (http://www.hsafoundation.com/standards/).
|
||||
|
||||
HTML version is provided on the following link:
|
||||
http://www.hsafoundation.com/html/Content/Runtime/Topics/Runtime_title_page.htm
|
||||
*/
|
||||
|
||||
|
||||
#ifndef _HSA_EXT_FINALIZE_H
|
||||
#define _HSA_EXT_FINALIZE_H 1
|
||||
|
||||
struct BrigModuleHeader;
|
||||
typedef struct BrigModuleHeader *BrigModule_t;
|
||||
|
||||
typedef enum {
|
||||
HSA_EXT_IMAGE_GEOMETRY_1D = 0,
|
||||
HSA_EXT_IMAGE_GEOMETRY_2D = 1,
|
||||
HSA_EXT_IMAGE_GEOMETRY_3D = 2,
|
||||
HSA_EXT_IMAGE_GEOMETRY_1DA = 3,
|
||||
HSA_EXT_IMAGE_GEOMETRY_2DA = 4,
|
||||
HSA_EXT_IMAGE_GEOMETRY_1DB = 5,
|
||||
HSA_EXT_IMAGE_GEOMETRY_2DDEPTH = 6,
|
||||
HSA_EXT_IMAGE_GEOMETRY_2DADEPTH = 7
|
||||
} hsa_ext_image_geometry_t;
|
||||
|
||||
typedef enum {
|
||||
HSA_EXT_IMAGE_CHANNEL_TYPE_SNORM_INT8 = 0,
|
||||
HSA_EXT_IMAGE_CHANNEL_TYPE_SNORM_INT16 = 1,
|
||||
HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_INT8 = 2,
|
||||
HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_INT16 = 3,
|
||||
HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_INT24 = 4,
|
||||
HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_SHORT_555 = 5,
|
||||
HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_SHORT_565 = 6,
|
||||
HSA_EXT_IMAGE_CHANNEL_TYPE_UNORM_SHORT_101010 = 7,
|
||||
HSA_EXT_IMAGE_CHANNEL_TYPE_SIGNED_INT8 = 8,
|
||||
HSA_EXT_IMAGE_CHANNEL_TYPE_SIGNED_INT16 = 9,
|
||||
HSA_EXT_IMAGE_CHANNEL_TYPE_SIGNED_INT32 = 10,
|
||||
HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT8 = 11,
|
||||
HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT16 = 12,
|
||||
-  HSA_EXT_IMAGE_CHANNEL_TYPE_UNSIGNED_INT32 = 13,
-  HSA_EXT_IMAGE_CHANNEL_TYPE_HALF_FLOAT = 14,
-  HSA_EXT_IMAGE_CHANNEL_TYPE_FLOAT = 15
-} hsa_ext_image_channel_type_t;
-
-typedef enum {
-  HSA_EXT_IMAGE_CHANNEL_ORDER_A = 0,
-  HSA_EXT_IMAGE_CHANNEL_ORDER_R = 1,
-  HSA_EXT_IMAGE_CHANNEL_ORDER_RX = 2,
-  HSA_EXT_IMAGE_CHANNEL_ORDER_RG = 3,
-  HSA_EXT_IMAGE_CHANNEL_ORDER_RGX = 4,
-  HSA_EXT_IMAGE_CHANNEL_ORDER_RA = 5,
-  HSA_EXT_IMAGE_CHANNEL_ORDER_RGB = 6,
-  HSA_EXT_IMAGE_CHANNEL_ORDER_RGBX = 7,
-  HSA_EXT_IMAGE_CHANNEL_ORDER_RGBA = 8,
-  HSA_EXT_IMAGE_CHANNEL_ORDER_BGRA = 9,
-  HSA_EXT_IMAGE_CHANNEL_ORDER_ARGB = 10,
-  HSA_EXT_IMAGE_CHANNEL_ORDER_ABGR = 11,
-  HSA_EXT_IMAGE_CHANNEL_ORDER_SRGB = 12,
-  HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBX = 13,
-  HSA_EXT_IMAGE_CHANNEL_ORDER_SRGBA = 14,
-  HSA_EXT_IMAGE_CHANNEL_ORDER_SBGRA = 15,
-  HSA_EXT_IMAGE_CHANNEL_ORDER_INTENSITY = 16,
-  HSA_EXT_IMAGE_CHANNEL_ORDER_LUMINANCE = 17,
-  HSA_EXT_IMAGE_CHANNEL_ORDER_DEPTH = 18,
-  HSA_EXT_IMAGE_CHANNEL_ORDER_DEPTH_STENCIL = 19
-} hsa_ext_image_channel_order_t;
-
-typedef struct hsa_ext_image_format_s
-{
-  hsa_ext_image_channel_type_t channel_type;
-  hsa_ext_image_channel_order_t channel_order;
-} hsa_ext_image_format_t;
-
-typedef struct hsa_ext_sampler_s
-{
-  uint64_t handle;
-} hsa_ext_sampler_t;
-typedef struct hsa_ext_image_data_info_s
-{
-  size_t size;
-  size_t alignment;
-} hsa_ext_image_data_info_t;
-typedef enum {
-  HSA_EXT_SAMPLER_ADDRESSING_MODE_UNDEFINED = 0,
-  HSA_EXT_SAMPLER_ADDRESSING_MODE_CLAMP_TO_EDGE = 1,
-  HSA_EXT_SAMPLER_ADDRESSING_MODE_CLAMP_TO_BORDER = 2,
-  HSA_EXT_SAMPLER_ADDRESSING_MODE_REPEAT = 3,
-  HSA_EXT_SAMPLER_ADDRESSING_MODE_MIRRORED_REPEAT = 4
-} hsa_ext_sampler_addressing_mode_t;
-typedef struct hsa_ext_image_s
-{
-  uint64_t handle;
-} hsa_ext_image_t;
-typedef enum {
-  HSA_EXT_IMAGE_CAPABILITY_NOT_SUPPORTED = 0x0,
-  HSA_EXT_IMAGE_CAPABILITY_READ_ONLY = 0x1,
-  HSA_EXT_IMAGE_CAPABILITY_WRITE_ONLY = 0x2,
-  HSA_EXT_IMAGE_CAPABILITY_READ_WRITE = 0x4,
-  HSA_EXT_IMAGE_CAPABILITY_READ_MODIFY_WRITE = 0x8,
-  HSA_EXT_IMAGE_CAPABILITY_ACCESS_INVARIANT_DATA_LAYOUT = 0x10
-} hsa_ext_image_capability_t;
-typedef struct hsa_ext_control_directives_s
-{
-  uint64_t control_directives_mask;
-  uint16_t break_exceptions_mask;
-  uint16_t detect_exceptions_mask;
-  uint32_t max_dynamic_group_size;
-  uint64_t max_flat_grid_size;
-  uint32_t max_flat_workgroup_size;
-  uint32_t reserved1;
-  uint64_t required_grid_size[3];
-  hsa_dim3_t required_workgroup_size;
-  uint8_t required_dim;
-  uint8_t reserved2[75];
-} hsa_ext_control_directives_t;
-typedef enum {
-  HSA_EXT_SAMPLER_FILTER_MODE_NEAREST = 0,
-  HSA_EXT_SAMPLER_FILTER_MODE_LINEAR = 1
-} hsa_ext_sampler_filter_mode_t;
-
-typedef enum {
-  HSA_EXT_SAMPLER_COORDINATE_MODE_UNNORMALIZED = 0,
-  HSA_EXT_SAMPLER_COORDINATE_MODE_NORMALIZED = 1
-} hsa_ext_sampler_coordinate_mode_t;
-typedef enum {
-  HSA_EXT_FINALIZER_CALL_CONVENTION_AUTO = -1
-} hsa_ext_finalizer_call_convention_t;
-typedef struct hsa_ext_program_s
-{
-  uint64_t handle;
-} hsa_ext_program_t;
-typedef struct hsa_ext_image_descriptor_s
-{
-  hsa_ext_image_geometry_t geometry;
-  size_t width;
-  size_t height;
-  size_t depth;
-  size_t array_size;
-  hsa_ext_image_format_t format;
-} hsa_ext_image_descriptor_t;
-typedef enum {
-  HSA_EXT_PROGRAM_INFO_MACHINE_MODEL = 0,
-  HSA_EXT_PROGRAM_INFO_PROFILE = 1,
-  HSA_EXT_PROGRAM_INFO_DEFAULT_FLOAT_ROUNDING_MODE = 2
-} hsa_ext_program_info_t;
-typedef BrigModule_t hsa_ext_module_t;
-typedef struct hsa_ext_sampler_descriptor_s
-{
-  hsa_ext_sampler_coordinate_mode_t coordinate_mode;
-  hsa_ext_sampler_filter_mode_t filter_mode;
-  hsa_ext_sampler_addressing_mode_t address_mode;
-} hsa_ext_sampler_descriptor_t;
-
-typedef struct hsa_ext_image_region_s
-{
-  hsa_dim3_t offset;
-  hsa_dim3_t range;
-} hsa_ext_image_region_t;
-hsa_status_t hsa_ext_image_export (hsa_agent_t agent, hsa_ext_image_t src_image,
-                                   void *dst_memory, size_t dst_row_pitch,
-                                   size_t dst_slice_pitch,
-                                   const hsa_ext_image_region_t *image_region);
-hsa_status_t hsa_ext_program_add_module (hsa_ext_program_t program,
-                                         hsa_ext_module_t module);
-hsa_status_t hsa_ext_program_iterate_modules (
-  hsa_ext_program_t program,
-  hsa_status_t (*callback) (hsa_ext_program_t program, hsa_ext_module_t module,
-                            void *data),
-  void *data);
-hsa_status_t hsa_ext_program_create (
-  hsa_machine_model_t machine_model, hsa_profile_t profile,
-  hsa_default_float_rounding_mode_t default_float_rounding_mode,
-  const char *options, hsa_ext_program_t *program);
-hsa_status_t
-hsa_ext_image_data_get_info (hsa_agent_t agent,
-                             const hsa_ext_image_descriptor_t *image_descriptor,
-                             hsa_access_permission_t access_permission,
-                             hsa_ext_image_data_info_t *image_data_info);
-
-hsa_status_t hsa_ext_image_import (hsa_agent_t agent, const void *src_memory,
-                                   size_t src_row_pitch, size_t src_slice_pitch,
-                                   hsa_ext_image_t dst_image,
-                                   const hsa_ext_image_region_t *image_region);
-hsa_status_t hsa_ext_program_get_info (hsa_ext_program_t program,
-                                       hsa_ext_program_info_t attribute,
-                                       void *value);
-enum
-{
-  HSA_EXT_STATUS_ERROR_IMAGE_FORMAT_UNSUPPORTED = 0x3000,
-  HSA_EXT_STATUS_ERROR_IMAGE_SIZE_UNSUPPORTED = 0x3001
-};
-hsa_status_t hsa_ext_image_destroy (hsa_agent_t agent, hsa_ext_image_t image);
-hsa_status_t hsa_ext_image_get_capability (
-  hsa_agent_t agent, hsa_ext_image_geometry_t geometry,
-  const hsa_ext_image_format_t *image_format, uint32_t *capability_mask);
-enum
-{
-  HSA_EXT_STATUS_ERROR_INVALID_PROGRAM = 0x2000,
-  HSA_EXT_STATUS_ERROR_INVALID_MODULE = 0x2001,
-  HSA_EXT_STATUS_ERROR_INCOMPATIBLE_MODULE = 0x2002,
-  HSA_EXT_STATUS_ERROR_MODULE_ALREADY_INCLUDED = 0x2003,
-  HSA_EXT_STATUS_ERROR_SYMBOL_MISMATCH = 0x2004,
-  HSA_EXT_STATUS_ERROR_FINALIZATION_FAILED = 0x2005,
-  HSA_EXT_STATUS_ERROR_DIRECTIVE_MISMATCH = 0x2006
-};
-hsa_status_t hsa_ext_sampler_destroy (hsa_agent_t agent,
-                                      hsa_ext_sampler_t sampler);
-hsa_status_t hsa_ext_program_finalize (
-  hsa_ext_program_t program, hsa_isa_t isa, int32_t call_convention,
-  hsa_ext_control_directives_t control_directives, const char *options,
-  hsa_code_object_type_t code_object_type, hsa_code_object_t *code_object);
-hsa_status_t hsa_ext_image_create (
-  hsa_agent_t agent, const hsa_ext_image_descriptor_t *image_descriptor,
-  const void *image_data, hsa_access_permission_t access_permission,
-  hsa_ext_image_t *image);
-hsa_status_t hsa_ext_program_destroy (hsa_ext_program_t program);
-hsa_status_t hsa_ext_image_copy (hsa_agent_t agent, hsa_ext_image_t src_image,
-                                 const hsa_dim3_t *src_offset,
-                                 hsa_ext_image_t dst_image,
-                                 const hsa_dim3_t *dst_offset,
-                                 const hsa_dim3_t *range);
-hsa_status_t hsa_ext_image_clear (hsa_agent_t agent, hsa_ext_image_t image,
-                                  const void *data,
-                                  const hsa_ext_image_region_t *image_region);
-enum
-{
-  HSA_EXT_AGENT_INFO_IMAGE_1D_MAX_ELEMENTS = 0x3000,
-  HSA_EXT_AGENT_INFO_IMAGE_1DA_MAX_ELEMENTS = 0x3001,
-  HSA_EXT_AGENT_INFO_IMAGE_1DB_MAX_ELEMENTS = 0x3002,
-  HSA_EXT_AGENT_INFO_IMAGE_2D_MAX_ELEMENTS = 0x3003,
-  HSA_EXT_AGENT_INFO_IMAGE_2DA_MAX_ELEMENTS = 0x3004,
-  HSA_EXT_AGENT_INFO_IMAGE_2DDEPTH_MAX_ELEMENTS = 0x3005,
-  HSA_EXT_AGENT_INFO_IMAGE_2DADEPTH_MAX_ELEMENTS = 0x3006,
-  HSA_EXT_AGENT_INFO_IMAGE_3D_MAX_ELEMENTS = 0x3007,
-  HSA_EXT_AGENT_INFO_IMAGE_ARRAY_MAX_LAYERS = 0x3008,
-  HSA_EXT_AGENT_INFO_MAX_IMAGE_RD_HANDLES = 0x3009,
-  HSA_EXT_AGENT_INFO_MAX_IMAGE_RORW_HANDLES = 0x300A,
-  HSA_EXT_AGENT_INFO_MAX_SAMPLER_HANDLERS = 0x300B
-};
-hsa_status_t
-hsa_ext_sampler_create (hsa_agent_t agent,
-                        const hsa_ext_sampler_descriptor_t *sampler_descriptor,
-                        hsa_ext_sampler_t *sampler);
-
-#endif /* _HSA_EXT_FINALIZE_H */
File diff suppressed because it is too large
@@ -1,7 +1,7 @@
-# Makefile.in generated by automake 1.15.1 from Makefile.am.
+# Makefile.in generated by automake 1.16.1 from Makefile.am.
 # @configure_input@
 
-# Copyright (C) 1994-2017 Free Software Foundation, Inc.
+# Copyright (C) 1994-2018 Free Software Foundation, Inc.
 
 # This Makefile.in is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
@@ -215,10 +215,6 @@ PLUGIN_GCN = @PLUGIN_GCN@
 PLUGIN_GCN_CPPFLAGS = @PLUGIN_GCN_CPPFLAGS@
 PLUGIN_GCN_LDFLAGS = @PLUGIN_GCN_LDFLAGS@
 PLUGIN_GCN_LIBS = @PLUGIN_GCN_LIBS@
-PLUGIN_HSA = @PLUGIN_HSA@
-PLUGIN_HSA_CPPFLAGS = @PLUGIN_HSA_CPPFLAGS@
-PLUGIN_HSA_LDFLAGS = @PLUGIN_HSA_LDFLAGS@
-PLUGIN_HSA_LIBS = @PLUGIN_HSA_LIBS@
 PLUGIN_NVPTX = @PLUGIN_NVPTX@
 PLUGIN_NVPTX_CPPFLAGS = @PLUGIN_NVPTX_CPPFLAGS@
 PLUGIN_NVPTX_LDFLAGS = @PLUGIN_NVPTX_LDFLAGS@
@@ -335,8 +331,8 @@ Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status
 	  *config.status*) \
 	    cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh;; \
 	  *) \
-	    echo ' cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe)'; \
-	    cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe);; \
+	    echo ' cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__maybe_remake_depfiles)'; \
+	    cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__maybe_remake_depfiles);; \
 	esac;
 
 $(top_builddir)/config.status: $(top_srcdir)/configure $(CONFIG_STATUS_DEPENDENCIES)
@@ -233,9 +233,6 @@ proc libgomp_init { args } {
     # Disable caret
     lappend ALWAYS_CFLAGS "additional_flags=-fno-diagnostics-show-caret"
 
-    # Disable HSA warnings by default.
-    lappend ALWAYS_CFLAGS "additional_flags=-Wno-hsa"
-
     # Disable color diagnostics
     lappend ALWAYS_CFLAGS "additional_flags=-fdiagnostics-color=never"
 
@@ -325,9 +322,6 @@ proc offload_target_to_openacc_device_type { offload_target } {
         disable {
             return "host"
         }
-        hsa* {
-            return ""
-        }
         *-intelmic* {
             return ""
         }
@@ -430,60 +424,6 @@ proc check_effective_target_openacc_host_selected { } {
     return [string match "host" $openacc_device_type]
 }
 
-# Return 1 if the selected OMP device is actually a HSA device
-
-proc check_effective_target_hsa_offloading_selected_nocache {} {
-    global tool
-
-    set src {
-        int main () {
-            int v = 1;
-            #pragma omp target map(from:v)
-            v = 0;
-            return v;
-        }
-    }
-
-    set result [check_compile hsa_offloading_src executable $src]
-    set lines [lindex $result 0]
-    set exe [lindex $result 1]
-
-    set ok 0
-    if { [string match "" $lines] } {
-        # No error messages, let us switch on HSA debugging output and run it
-        set prev_HSA_DEBUG [getenv HSA_DEBUG]
-        setenv HSA_DEBUG "1"
-        set result [remote_load target "./$exe"]
-        if { [string match "" $prev_HSA_DEBUG] } {
-            unsetenv HSA_DEBUG
-        } else {
-            setenv HSA_DEBUG $prev_HSA_DEBUG
-        }
-        set status [lindex $result 0]
-        if { $status != "pass" } {
-            remote_file build delete $exe
-            verbose "HSA availability test failed"
-            return 0
-        }
-        set output [lindex $result 1]
-        if { [string match "*HSA debug: Going to dispatch kernel*" $output] } {
-            verbose "HSA availability detected"
-            set ok 1
-        }
-    }
-    remote_file build delete $exe
-    return $ok
-}
-
-# Return 1 if the selected OMP device is actually a HSA device and
-# cache the result
-
-proc check_effective_target_hsa_offloading_selected {} {
-    return [check_cached_effective_target hsa_offloading_selected {
-        check_effective_target_hsa_offloading_selected_nocache
-    }]
-}
-
 # Return 1 if at least one AMD GPU is accessible.
 
 proc check_effective_target_openacc_radeon_accel_present { } {
@@ -1,25 +0,0 @@
-#define size 10
-int i, j, k;
-
-int
-main ()
-{
-  char *s = __builtin_malloc (size + 1);
-
-#pragma omp target teams
-  {
-#pragma omp distribute parallel for default(none) private(i) shared(s)
-    for (i = 0; i < size; ++i)
-      {
-	char *buffer = __builtin_alloca (10);
-	buffer[5] = 97 + i;
-	s[i] = buffer[5];
-      }
-  }
-
-  for (i = 0; i < size; ++i)
-    if (s[i] != 97 + i)
-      __builtin_abort ();
-
-  return 0;
-}
@@ -1,160 +0,0 @@
-#include <assert.h>
-
-#define ASSIGN_SX(N) \
-  s##N.a1 = 1; \
-  s##N.a2 = 2; \
-  s##N.a3 = 3; \
-  s##N.a4 = 4; \
-  s##N.a5 = 5; \
-  s##N.a6 = 6; \
-  s##N.a7 = 7; \
-  s##N.a8 = 8; \
-  s##N.a9 = 9; \
-  s##N.a10 = 10;
-
-#define ASSERT_SX(N) \
-  assert (s##N.a1 == 1); \
-  assert (s##N.a2 == 2); \
-  assert (s##N.a3 == 3); \
-  assert (s##N.a4 == 4); \
-  assert (s##N.a5 == 5); \
-  assert (s##N.a6 == 6); \
-  assert (s##N.a7 == 7); \
-  assert (s##N.a8 == 8); \
-  assert (s##N.a9 == 9); \
-  assert (s##N.a10 == 10);
-
-struct S1
-{
-  unsigned a : 10;
-  unsigned b : 20;
-};
-
-struct S2
-{
-  unsigned a1 : 10;
-  unsigned a2 : 10;
-  unsigned a3 : 10;
-  unsigned a4 : 10;
-  unsigned a5 : 10;
-  unsigned a6 : 10;
-  unsigned a7 : 10;
-  unsigned a8 : 10;
-  unsigned a9 : 10;
-  unsigned a10 : 10;
-};
-
-struct S3
-{
-  unsigned a1 : 10;
-  unsigned a2 : 9;
-  unsigned a3 : 8;
-  unsigned a4 : 7;
-  unsigned a5 : 6;
-  unsigned a6 : 5;
-  unsigned a7 : 6;
-  unsigned a8 : 7;
-  unsigned a9 : 8;
-  unsigned a10 : 9;
-};
-
-struct S4
-{
-  unsigned a1 : 10;
-  int a2 : 9;
-  unsigned a3 : 8;
-  int a4 : 7;
-  unsigned a5 : 6;
-  int a6 : 5;
-  unsigned a7 : 6;
-  int a8 : 7;
-  unsigned a9 : 8;
-  int a10 : 9;
-};
-
-struct S5
-{
-  unsigned a1 : 31;
-  int a2 : 9;
-  unsigned a3 : 17;
-  int a4 : 7;
-  unsigned a5 : 6;
-  int a6 : 5;
-  unsigned long a7 : 55;
-  int a8 : 7;
-  unsigned a9 : 8;
-  int a10 : 9;
-};
-
-int
-main ()
-{
-  struct S1 s1;
-
-#pragma omp target map(to: s1)
-  {
-    s1.a = 2;
-    s1.b = 3;
-  }
-
-  assert (s1.a == 2);
-  assert (s1.b == 3);
-
-  struct S2 s2;
-
-#pragma omp target map(to: s2)
-  {
-    ASSIGN_SX (2)
-  }
-
-  ASSERT_SX (2)
-
-  struct S3 s3;
-
-#pragma omp target map(to: s3)
-  {
-    ASSIGN_SX (3)
-  }
-
-  ASSERT_SX (3)
-
-  struct S4 s4;
-
-#pragma omp target map(to: s4)
-  {
-    ASSIGN_SX (4)
-  }
-
-  ASSERT_SX (4)
-
-  struct S4 s5;
-
-  s5.a1 = 0;
-  s5.a2 = 1;
-  s5.a3 = 2;
-  s5.a4 = 3;
-  s5.a5 = 4;
-  s5.a6 = 5;
-  s5.a7 = 6;
-  s5.a8 = 7;
-  s5.a9 = 8;
-  s5.a10 = 9;
-
-#pragma omp target map(to: s5)
-  {
-    s5.a1++;
-    s5.a2++;
-    s5.a3++;
-    s5.a4++;
-    s5.a5++;
-    s5.a6++;
-    s5.a7++;
-    s5.a8++;
-    s5.a9++;
-    s5.a10++;
-  }
-
-  ASSERT_SX (5)
-
-  return 0;
-}
@@ -1,73 +0,0 @@
-#include <math.h>
-
-#define N 12
-
-int main()
-{
-  unsigned int arguments[N] = {0u, 1u, 2u, 3u, 111u, 333u, 444u, 0x80000000u, 0x0000ffffu, 0xf0000000u, 0xff000000u, 0xffffffffu};
-  int clrsb[N] = {};
-  int clz[N] = {};
-  int ctz[N] = {};
-  int ffs[N] = {};
-  int parity[N] = {};
-  int popcount[N] = {};
-
-  int ref_clrsb[N] = {};
-  int ref_clz[N] = {};
-  int ref_ctz[N] = {};
-  int ref_ffs[N] = {};
-  int ref_parity[N] = {};
-  int ref_popcount[N] = {};
-
-  for (unsigned i = 0; i < N; i++)
-    {
-      ref_clrsb[i] = __builtin_clrsb (arguments[i]);
-      ref_clz[i] = __builtin_clz (arguments[i]);
-      ref_ctz[i] = __builtin_ctz (arguments[i]);
-      ref_ffs[i] = __builtin_ffs (arguments[i]);
-      ref_parity[i] = __builtin_parity (arguments[i]);
-      ref_popcount[i] = __builtin_popcount (arguments[i]);
-    }
-
-  #pragma omp target map(from:clz, ctz, ffs, parity, popcount)
-  {
-    for (unsigned i = 0; i < N; i++)
-      {
-	clrsb[i] = __builtin_clrsb (arguments[i]);
-	clz[i] = __builtin_clz (arguments[i]);
-	ctz[i] = __builtin_ctz (arguments[i]);
-	ffs[i] = __builtin_ffs (arguments[i]);
-	parity[i] = __builtin_parity (arguments[i]);
-	popcount[i] = __builtin_popcount (arguments[i]);
-      }
-  }
-
-  for (unsigned i = 0; i < N; i++)
-    if (ref_clrsb[i] != clrsb[i])
-      __builtin_abort ();
-
-  /* CLZ of zero is undefined for zero. */
-  for (unsigned i = 1; i < N; i++)
-    if (ref_clz[i] != clz[i])
-      __builtin_abort ();
-
-  /* Likewise for ctz */
-  for (unsigned i = 1; i < N; i++)
-    if (ref_ctz[i] != ctz[i])
-      __builtin_abort ();
-
-  for (unsigned i = 0; i < N; i++)
-    if (ref_ffs[i] != ffs[i])
-      __builtin_abort ();
-
-  for (unsigned i = 0; i < N; i++)
-    if (ref_parity[i] != parity[i])
-      __builtin_abort ();
-
-  for (unsigned i = 0; i < N; i++)
-    if (ref_popcount[i] != popcount[i])
-      __builtin_abort ();
-
-  return 0;
-}
-
@@ -1,97 +0,0 @@
-/* { dg-additional-options "-ffast-math" } */
-
-#include <assert.h>
-#include <math.h>
-
-#define N 10
-#define N2 14
-
-#define c1 1.2345f
-#define c2 1.2345
-
-#define DELTA 0.001
-
-#define TEST_BIT_BUILTINS(T, S, S2) \
-  { \
-    T arguments[N2] \
-      = {0##S, 1##S, 2##S, 3##S, \
-	 111##S, 333##S, 444##S, 0x80000000##S, \
-	 0x0000ffff##S, 0xf0000000##S, 0xff000000##S, 0xffffffff##S}; \
-    int clrsb[N2] = {}; \
-    int clz[N2] = {}; \
-    int ctz[N2] = {}; \
-    int ffs[N2] = {}; \
-    int parity[N2] = {}; \
-    int popcount[N2] = {}; \
-    \
-    _Pragma ("omp target map(to:clz[:N2], ctz[:N2], ffs[:N2], parity[:N2], popcount[:N2])") \
-    { \
-      for (unsigned i = 0; i < N2; i++) \
-	{ \
-	  clrsb[i] = __builtin_clrsb##S2 (arguments[i]); \
-	  clz[i] = __builtin_clz##S2 (arguments[i]); \
-	  ctz[i] = __builtin_ctz##S2 (arguments[i]); \
-	  ffs[i] = __builtin_ffs##S2 (arguments[i]); \
-	  parity[i] = __builtin_parity##S2 (arguments[i]); \
-	  popcount[i] = __builtin_popcount##S2 (arguments[i]); \
-	} \
-    } \
-    \
-    for (unsigned i = 0; i < N2; i++) \
-      { \
-	assert (clrsb[i] == __builtin_clrsb##S2 (arguments[i])); \
-	if (arguments[0] != 0) \
-	  { \
-	    assert (clz[i] == __builtin_clz##S2 (arguments[i])); \
-	    assert (ctz[i] == __builtin_ctz##S2 (arguments[i])); \
-	  } \
-	assert (ffs[i] == __builtin_ffs##S2 (arguments[i])); \
-	assert (parity[i] == __builtin_parity##S2 (arguments[i])); \
-	assert (popcount[i] == __builtin_popcount##S2 (arguments[i])); \
-      } \
-  }
-
-#define ASSERT(v1, v2) assert (fabs (v1 - v2) < DELTA)
-
-int
-main ()
-{
-  float f[N] = {};
-  float d[N] = {};
-
-  /* 1) test direct mapping to HSA insns. */
-
-#pragma omp target map(to: f[ : N], d[ : N])
-  {
-    f[0] = sinf (c1);
-    f[1] = cosf (c1);
-    f[2] = exp2f (c1);
-    f[3] = log2f (c1);
-    f[4] = truncf (c1);
-    f[5] = sqrtf (c1);
-
-    d[0] = trunc (c2);
-    d[1] = sqrt (c2);
-  }
-
-  ASSERT (f[0], sinf (c1));
-  ASSERT (f[1], cosf (c1));
-  ASSERT (f[2], exp2f (c1));
-  ASSERT (f[3], log2f (c1));
-  ASSERT (f[4], truncf (c1));
-  ASSERT (f[5], sqrtf (c1));
-
-  ASSERT (d[0], trunc (c2));
-  ASSERT (d[1], sqrt (c2));
-
-  /* 2) test bit builtins for unsigned int. */
-  TEST_BIT_BUILTINS (int, , );
-
-  /* 3) test bit builtins for unsigned long int. */
-  TEST_BIT_BUILTINS (long, l, l);
-
-  /* 4) test bit builtins for unsigned long long int. */
-  TEST_BIT_BUILTINS (long long, ll, ll);
-
-  return 0;
-}
@@ -1,42 +0,0 @@
-if [info exists lang_library_path] then {
-    unset lang_library_path
-    unset lang_link_flags
-}
-if [info exists lang_test_file] then {
-    unset lang_test_file
-}
-if [info exists lang_include_flags] then {
-    unset lang_include_flags
-}
-
-load_lib libgomp-dg.exp
-load_gcc_lib gcc-dg.exp
-
-# Initialize dg.
-dg-init
-
-# Turn on OpenMP.
-lappend ALWAYS_CFLAGS "additional_flags=-fopenmp"
-
-set ld_library_path $always_ld_library_path
-append ld_library_path [gcc-set-multilib-library-path $GCC_UNDER_TEST]
-set_ld_library_path_env_vars
-
-global DEFAULT_CFLAGS
-if [info exists DEFAULT_CFLAGS] then {
-    set CFLAGS_list [list "-O0" $DEFAULT_CFLAGS]
-} else {
-    set CFLAGS_list [list "-O0" "-O2"]
-}
-
-if [check_effective_target_hsa_offloading_selected] {
-    foreach USE_CFLAGS $CFLAGS_list {
-        # Gather a list of all tests.
-        set tests [lsort [find $srcdir/$subdir *.c]]
-        # Main loop.
-        dg-runtest $tests "" [concat $USE_CFLAGS "-Whsa"]
-    }
-}
-
-# All done.
-dg-finish
@@ -1,65 +0,0 @@
-#include <assert.h>
-#include <complex.h>
-#include <math.h>
-
-#define uchar unsigned char
-#define C 123
-
-#define TEST(type) \
-  type foo_##type (void) \
-  { \
-    _Complex type a = C + 45I; \
-    return __real__ a; \
-  }
-
-#pragma omp declare target
-TEST (char)
-TEST (uchar)
-TEST (short)
-TEST (int)
-
-float
-bar (float a, float b)
-{
-  _Complex float c = a + b * I;
-
-  c += 11.f + 12.f * I;
-
-  _Complex float d = 2.f + 4.44f * I;
-
-  return __real__(crealf (c + d) + cimag (d) * I);
-}
-
-#pragma omp end declare target
-
-int
-main (void)
-{
-  int v = 0;
-  float v2 = 0.0f;
-
-#pragma omp target map(to: v)
-  v = foo_char ();
-
-  assert (v == C);
-
-#pragma omp target map(to: v)
-  v = foo_uchar ();
-
-  assert (v == C);
-
-#pragma omp target map(to: v)
-  v = foo_short ();
-
-  assert (v == C);
-
-#pragma omp target map(to: v)
-  v = foo_int ();
-
-  assert (v == C);
-
-#pragma omp target map(to: v2)
-  v2 = bar (1.12f, 4.44f);
-
-  assert (fabs (v2 - 14.12) < 0.0001f);
-}
@@ -1,27 +0,0 @@
-#pragma omp declare target
-_Complex int *g;
-#pragma omp end declare target
-
-
-
-_Complex float f(void);
-
-int
-main ()
-{
-  _Complex int y;
-#pragma omp target map(from:y)
-  {
-    _Complex int x;
-    g = &x;
-    __imag__ x = 1;
-    __real__ x = 2;
-    y = x;
-  }
-
-  if ((__imag__ y != 1)
-      || (__real__ y != 2))
-    __builtin_abort ();
-  return 0;
-}
-
@@ -1,83 +0,0 @@
-#include <assert.h>
-
-struct Cube
-{
-  int x;
-  int y;
-  int z;
-};
-
-#pragma omp declare target
-int
-foo (short a)
-{
-  switch (a)
-    {
-    case 1:
-      return 11;
-      break;
-    case 33:
-      return 333;
-      break;
-    case 55:
-      return 55;
-      break;
-    default:
-      return -1;
-    }
-}
-
-int
-bar (int a)
-{
-  int *ptr = &a;
-
-  *ptr = 100;
-  return a + *ptr;
-}
-
-struct Cube
-baz (struct Cube c)
-{
-  c.x = 11;
-  return c;
-}
-
-#pragma omp end declare target
-
-#define s 100
-
-int
-main (int argc)
-{
-  /* Test 1: argument types: char to short. */
-
-  int array[s];
-#pragma omp target map(tofrom : array[ : s])
-  {
-    for (char i = 0; i < s; i++)
-      array[i] = foo (i);
-  }
-
-  for (int i = 0; i < s; i++)
-    assert (array[i] == foo (i));
-
-  /* Test 2: argument address is taken. */
-  int v = 2;
-
-#pragma omp target map(tofrom : v)
-  v = bar (v);
-
-  assert (v == 200);
-
-  /* Test 3: passing a structure as a function argument. */
-  struct Cube r;
-  struct Cube c = {.x = 1, .y = 2, .z = 3};
-
-#pragma omp target map(to : r) map(from : c)
-  r = baz (c);
-
-  assert (r.x == 11);
-  assert (r.y == c.y);
-  assert (r.z == c.z);
-}
@@ -1,50 +0,0 @@
-#define size 8
-
-#pragma omp declare target
-int
-identity (int x)
-{
-  return x;
-}
-
-int
-expx (int x, int n)
-{
-  for (int i = 0; i < n - 1; i++)
-    x *= x;
-
-  return x;
-}
-
-float
-init (int x, int y)
-{
-  int x1 = identity (identity (identity (identity (x))));
-  int y1 = identity (identity (identity (identity (y))));
-
-  int x2 = expx (x1, 2);
-  int y2 = expx (y1, 2);
-
-  return (x2 + y2);
-}
-#pragma omp end declare target
-
-int
-main ()
-{
-  int i, j;
-  int a[size][size];
-
-#pragma omp target teams map(to:a[:size][:size])
-#pragma omp distribute parallel for default(none) private(i, j) shared(a)
-  for (i = 0; i < size; ++i)
-    for (j = 0; j < size; ++j)
-      a[i][j] = init (i, j);
-
-  for (i = 0; i < size; ++i)
-    for (j = 0; j < size; ++j)
-      if (i * i + j * j != a[i][j])
-	__builtin_abort ();
-
-  return 0;
-}
@@ -1,26 +0,0 @@
-#include <omp.h>
-
-int
-main ()
-{
-  int i;
-  int level = -1;
-
-#pragma omp target map(tofrom : level)
-  {
-    level = omp_get_level ();
-  }
-
-  if (level != 0)
-    __builtin_abort ();
-
-#pragma omp target teams map(tofrom : level)
-#pragma omp distribute parallel for default(none) private(i) shared(level)
-  for (i = 0; i < 1; ++i)
-    level += omp_get_level ();
-
-  if (level != 1)
-    __builtin_abort ();
-
-  return 0;
-}
@@ -1,26 +0,0 @@
-void __attribute__((noinline, noclone))
-foo (int n, int *a, int workgroup_size)
-{
-  int i;
-#pragma omp target
-#pragma omp teams thread_limit(workgroup_size)
-#pragma omp distribute parallel for shared(a) firstprivate(n) private(i)
-  for (i = 0; i < n; i++)
-    a[i]++;
-}
-
-int main (int argc, char **argv)
-{
-  int n = 32;
-  int *a = __builtin_malloc (sizeof (int) * n);
-  int i;
-
-  __builtin_memset (a, 0, sizeof (int) * n);
-  foo (n, a, 32);
-  for (i = 0; i < n; i ++)
-    {
-      if (a[i] != 1)
-	__builtin_abort ();
-    }
-  return 0;
-}
@@ -1,26 +0,0 @@
-void __attribute__((noinline, noclone))
-foo (int j, int n, int *a)
-{
-  int i;
-#pragma omp target
-#pragma omp teams
-#pragma omp distribute parallel for shared(a) firstprivate(n) private(i) firstprivate(j)
-  for (i = j + 1; i < n; i++)
-    a[i] = i;
-}
-
-int main (int argc, char **argv)
-{
-  int n = 32;
-  int *a = __builtin_malloc (sizeof (int) * n);
-  int i, j = 4;
-
-  __builtin_memset (a, 0, sizeof (int) * n);
-  foo (j, n, a);
-  for (i = j + 1; i < n; i ++)
-    {
-      if (a[i] != i)
-	__builtin_abort ();
-    }
-  return 0;
-}
@@ -1,39 +0,0 @@
-#define THE_LOOP \
-  for (i = j + 1; i < n; i += 3) \
-    a[i] = i
-
-void __attribute__((noinline, noclone))
-foo (int j, int n, int *a)
-{
-  int i;
-#pragma omp target
-#pragma omp teams
-#pragma omp distribute parallel for shared(a) firstprivate(n) private(i) firstprivate(j)
-  THE_LOOP;
-}
-
-void __attribute__((noinline, noclone))
-bar (int j, int n, int *a)
-{
-  int i;
-  THE_LOOP;
-}
-
-int main (int argc, char **argv)
-{
-  int n = 32;
-  int *a = __builtin_malloc (sizeof (int) * n);
-  int *ref = __builtin_malloc (sizeof (int) * n);
-  int i, j = 4;
-
-  __builtin_memset (a, 0, sizeof (int) * n);
-  __builtin_memset (ref, 0, sizeof (int) * n);
-  bar (j, n, ref);
-  foo (j, n, a);
-  for (i = 0; i < n; i ++)
-    {
-      if (a[i] != ref[i])
-	__builtin_abort ();
-    }
-  return 0;
-}
@@ -1,45 +0,0 @@
-#define THE_LOOP \
-  for (i = j + 1; i < n; i += 3) \
-    a[i] = i
-
-void __attribute__((noinline, noclone))
-foo (int j, int n, int *a)
-{
-  #pragma omp parallel
-  {
-    #pragma omp single
-    {
-      int i;
-      #pragma omp target
-      #pragma omp teams
-      #pragma omp distribute parallel for shared(a) firstprivate(n) private(i) firstprivate(j)
-      THE_LOOP;
-    }
-  }
-}
-
-void __attribute__((noinline, noclone))
-bar (int j, int n, int *a)
-{
-  int i;
-  THE_LOOP;
-}
-
-int main (int argc, char **argv)
-{
-  int n = 32;
-  int *a = __builtin_malloc (sizeof (int) * n);
-  int *ref = __builtin_malloc (sizeof (int) * n);
-  int i, j = 4;
-
-  __builtin_memset (a, 0, sizeof (int) * n);
-  __builtin_memset (ref, 0, sizeof (int) * n);
-  bar (j, n, ref);
-  foo (j, n, a);
-  for (i = 0; i < n; i ++)
-    {
-      if (a[i] != ref[i])
-	__builtin_abort ();
-    }
-  return 0;
-}
@@ -1,92 +0,0 @@
-#include <assert.h>
-
-#define C 55
-
-int i, j, k;
-
-static void
-test_bzero (unsigned size)
-{
-  unsigned bsize = size * sizeof (int);
-  int *x = __builtin_malloc (bsize);
-  __builtin_memset (x, C, bsize);
-
-#pragma omp target map(tofrom: x[:size]) map(from: bsize)
-  {
-    __builtin_bzero (x, bsize);
-  }
-
-  char *buffer = (char *) x;
-  for (unsigned i = 0; i < bsize; ++i)
-    assert (buffer[i] == 0);
-}
-
-static void
-test_memcpy (unsigned size)
-{
-  unsigned bsize = size * sizeof (int);
-  int *x = __builtin_malloc (bsize);
-  __builtin_memset (x, C, bsize);
-  int *y = __builtin_malloc (bsize);
-
-#pragma omp target map(tofrom: x[:size], y[:size]) map(from: bsize)
-  {
-    __builtin_memcpy (y, x, bsize);
-  }
-
-  char *buffer = (char *) y;
-  for (unsigned i = 0; i < bsize; ++i)
-    assert (buffer[i] == C);
-}
-
-static void
-test_mempcpy (unsigned size)
-{
-  unsigned bsize = size * sizeof (int);
-  int *x = __builtin_malloc (bsize);
-  __builtin_memset (x, C, bsize);
-  int *y = __builtin_malloc (bsize);
-  int *ptr = 0;
-
-#pragma omp target map(tofrom :x[:size], y[:size], ptr) map(from: bsize)
-  {
-    ptr = __builtin_mempcpy (y, x, bsize);
-  }
-
-  char *buffer = (char *) y;
-  for (unsigned i = 0; i < bsize; ++i)
-    assert (buffer[i] == C);
-
-  assert (ptr == y + size);
-}
-
-static void
-test_memset (unsigned size)
-{
-  unsigned bsize = size * sizeof (int);
-  int *x = __builtin_malloc (bsize);
-  __builtin_bzero (x, bsize);
-
-#pragma omp target map(tofrom : x[:size]) map(from: bsize)
-  {
-    __builtin_memset (x, C, bsize);
-  }
-
-  char *buffer = (char *) x;
-  for (unsigned i = 0; i < bsize; ++i)
-    assert (buffer[i] == C);
-}
-
-int
-main (void)
-{
-  unsigned tests[] = {1, 2, 3, 4, 5, 8, 15, 17, 23, 33, 0};
-
-  for (unsigned i = 0; tests[i]; i++)
-    {
-      test_bzero (tests[i]);
-      test_memset (tests[i]);
-      test_memcpy (tests[i]);
-      test_mempcpy (tests[i]);
-    }
-}
@@ -1,41 +0,0 @@
-/* PR hsa/69568 */
-
-typedef float float2 __attribute__ ((vector_size (8)));
-float2 *output;
-
-void __attribute__((noinline, noclone))
-foo (int n, float2 *a, int workgroup_size)
-{
-  int i;
-#pragma omp target map(from:a[:n]) firstprivate(n, workgroup_size)
-#pragma omp teams thread_limit(workgroup_size)
-#pragma omp distribute parallel for shared(a) firstprivate(n) private(i)
-    for (i = 0; i < n; i++)
-      { float2 v;
-	v[0] = i;
-	v[1] = 1+i;
-	a[i] = v;
-      }
-}
-
-int main (int argc, char **argv)
-{
-  int n = 32;
-  float2 *a = __builtin_malloc (sizeof (float2) * n);
-  int i;
-
-  __builtin_memset (a, 0, sizeof (float2) * n);
-  foo (n, a, 32);
-  for (i = 0; i < n; i++)
-    {
-      float2 v = a[i];
-      if (__builtin_abs (v[0] - i) > 0.1
-	  || __builtin_abs (v[1] - i - 1) > 0.1)
-	{
-	  __builtin_abort ();
-	  return 1;
-	}
-    }
-  return 0;
-}
-
@@ -1,43 +0,0 @@
|
|||
char __attribute__ ((noipa))
|
||||
toup (char X)
|
||||
{
|
||||
if (X >= 97 && X <= 122)
|
||||
return X - 32;
|
||||
else
|
||||
return X;
|
||||
}
|
||||
|
||||
char
|
||||
target_toup_1 (char X)
|
||||
{
|
||||
char r;
|
||||
#pragma omp target map(to:X) map(from:r)
|
||||
{
|
||||
if (X >= 97 && X <= 122)
|
||||
r = X - 32;
|
||||
else
|
||||
r = X;
|
||||
}
|
||||
return r;
|
||||
}
|
||||
|
||||
char __attribute__ ((noipa))
|
||||
target_toup (char X)
|
||||
{
|
||||
return target_toup_1 (X);
|
||||
}
|
||||
|
||||
int main (int argc, char **argv)
|
||||
{
|
||||
char a = 'a';
|
||||
if (toup (a) != target_toup (a))
|
||||
__builtin_abort ();
|
||||
a = 'Z';
|
||||
if (toup (a) != target_toup (a))
|
||||
__builtin_abort ();
|
||||
a = 5;
|
||||
if (toup (a) != target_toup (a))
|
||||
__builtin_abort ();
|
||||
|
||||
return 0;
|
||||
}
|
|
@@ -1,39 +0,0 @@
|
|||
#include <assert.h>
|
||||
#include <limits.h>
|
||||
|
||||
#define T unsigned int
|
||||
#define BITSIZE CHAR_BIT * sizeof (T)
|
||||
|
||||
#define C1 123u
|
||||
|
||||
#pragma omp declare target
|
||||
T
|
||||
rotate (T value, T shift)
|
||||
{
|
||||
T r = (value << shift) | (value >> (BITSIZE - shift));
|
||||
return (r >> shift) | (r << (BITSIZE - shift));
|
||||
}
|
||||
#pragma omp end declare target
|
||||
|
||||
int
|
||||
main (int argc)
|
||||
{
|
||||
T v1, v2, v3, v4, v5;
|
||||
|
||||
#pragma omp target map(to: v1, v2, v3, v4, v5)
|
||||
{
|
||||
v1 = rotate (C1, 10);
|
||||
v2 = rotate (C1, 2);
|
||||
v3 = rotate (C1, 5);
|
||||
v4 = rotate (C1, 16);
|
||||
v5 = rotate (C1, 32);
|
||||
}
|
||||
|
||||
assert (v1 == C1);
|
||||
assert (v2 == C1);
|
||||
assert (v3 == C1);
|
||||
assert (v4 == C1);
|
||||
assert (v5 == C1);
|
||||
|
||||
return 0;
|
||||
}
|
|
@@ -1,23 +0,0 @@
|
|||
extern void abort (void);
|
||||
|
||||
#pragma omp declare target
|
||||
int
|
||||
foo (void)
|
||||
{
|
||||
static int s;
|
||||
return ++s;
|
||||
}
|
||||
#pragma omp end declare target
|
||||
|
||||
int
|
||||
main ()
|
||||
{
|
||||
int r;
|
||||
#pragma omp target map(from:r)
|
||||
{
|
||||
r = foo ();
|
||||
}
|
||||
if (r != 1)
|
||||
abort ();
|
||||
return 0;
|
||||
}
|
|
@@ -1,145 +0,0 @@
|
|||
#include <assert.h>
|
||||
|
||||
#define s 100
|
||||
|
||||
#pragma omp declare target
|
||||
int
|
||||
switch1 (int a)
|
||||
{
|
||||
switch (a)
|
||||
{
|
||||
case 1:
|
||||
return 11;
|
||||
case 33:
|
||||
return 333;
|
||||
case 55:
|
||||
return 55;
|
||||
default:
|
||||
return -1;
|
||||
}
|
||||
}
|
||||
|
||||
int
|
||||
switch2 (int a)
|
||||
{
|
||||
switch (a)
|
||||
{
|
||||
case 1 ... 11:
|
||||
return 11;
|
||||
break;
|
||||
case 33:
|
||||
return 333;
|
||||
break;
|
||||
case 55:
|
||||
return 55;
|
||||
break;
|
||||
default:
|
||||
return -1;
|
||||
}
|
||||
}
|
||||
|
||||
int
|
||||
switch3 (int a)
|
||||
{
|
||||
switch (a)
|
||||
{
|
||||
case 1 ... 11:
|
||||
return 11;
|
||||
case 12 ... 22:
|
||||
return 22;
|
||||
case 23 ... 33:
|
||||
return 33;
|
||||
case 34 ... 44:
|
||||
return 44;
|
||||
default:
|
||||
return 44;
|
||||
}
|
||||
}
|
||||
|
||||
int
|
||||
switch4 (int a, int b)
|
||||
{
|
||||
switch (a)
|
||||
{
|
||||
case 1 ... 11:
|
||||
return a;
|
||||
case 12 ... 22:
|
||||
return b;
|
||||
case 23 ... 33:
|
||||
return a;
|
||||
case 34 ... 44:
|
||||
return b;
|
||||
default:
|
||||
return 12345;
|
||||
}
|
||||
}
|
||||
|
||||
int
|
||||
switch5 (int a, int b)
|
||||
{
|
||||
switch (a)
|
||||
{
|
||||
case 1 ... 2:
|
||||
return 1;
|
||||
case 3 ... 4:
|
||||
return 2;
|
||||
case 5 ... 6:
|
||||
return 3;
|
||||
case 7 ... 11:
|
||||
return 4;
|
||||
}
|
||||
|
||||
return -1;
|
||||
}
|
||||
#pragma omp end declare target
|
||||
|
||||
int
|
||||
main (int argc)
|
||||
{
|
||||
int array[s];
|
||||
|
||||
#pragma omp target map(tofrom : array[:s])
|
||||
{
|
||||
for (int i = 0; i < s; i++)
|
||||
array[i] = switch1 (i);
|
||||
}
|
||||
|
||||
for (int i = 0; i < s; i++)
|
||||
assert (array[i] == switch1 (i));
|
||||
|
||||
#pragma omp target map(tofrom : array[:s])
|
||||
{
|
||||
for (int i = 0; i < s; i++)
|
||||
array[i] = switch2 (i);
|
||||
}
|
||||
|
||||
for (int i = 0; i < s; i++)
|
||||
assert (array[i] == switch2 (i));
|
||||
|
||||
#pragma omp target map(tofrom : array[:s])
|
||||
{
|
||||
for (int i = 0; i < s; i++)
|
||||
array[i] = switch3 (i);
|
||||
}
|
||||
|
||||
for (int i = 0; i < s; i++)
|
||||
assert (array[i] == switch3 (i));
|
||||
|
||||
#pragma omp target map(tofrom : array[:s])
|
||||
{
|
||||
for (int i = 0; i < s; i++)
|
||||
array[i] = switch4 (i, i + 1);
|
||||
}
|
||||
|
||||
for (int i = 0; i < s; i++)
|
||||
assert (array[i] == switch4 (i, i + 1));
|
||||
|
||||
#pragma omp target map(tofrom : array[:s])
|
||||
{
|
||||
for (int i = 0; i < s; i++)
|
||||
array[i] = switch5 (i, i + 1);
|
||||
}
|
||||
|
||||
for (int i = 0; i < s; i++)
|
||||
assert (array[i] == switch5 (i, i + 1));
|
||||
}
|
|
@@ -1,116 +0,0 @@
|
|||
#include <assert.h>
|
||||
|
||||
#define s 100
|
||||
|
||||
#pragma omp declare target
|
||||
int
|
||||
switch1 (unsigned a)
|
||||
{
|
||||
switch (a)
|
||||
{
|
||||
case 1 ... 11:
|
||||
return 11;
|
||||
case 12 ... 13:
|
||||
return 22;
|
||||
default:
|
||||
return 44;
|
||||
}
|
||||
}
|
||||
|
||||
int
|
||||
switch2 (unsigned a)
|
||||
{
|
||||
switch (a)
|
||||
{
|
||||
case 1 ... 5:
|
||||
return 1;
|
||||
case 9 ... 11:
|
||||
return a + 3;
|
||||
case 12 ... 13:
|
||||
return a + 3;
|
||||
default:
|
||||
return 44;
|
||||
}
|
||||
}
|
||||
|
||||
#define OFFSET 12
|
||||
|
||||
int
|
||||
switch3 (unsigned a)
|
||||
{
|
||||
switch (a)
|
||||
{
|
||||
case (OFFSET + 0):
|
||||
return 1;
|
||||
case (OFFSET + 1)...(OFFSET + 11):
|
||||
return 11;
|
||||
case (OFFSET + 12)...(OFFSET + 13):
|
||||
return (OFFSET + 22);
|
||||
default:
|
||||
return (OFFSET + 44);
|
||||
}
|
||||
}
|
||||
|
||||
int
|
||||
switch4 (unsigned a)
|
||||
{
|
||||
switch (a)
|
||||
{
|
||||
case -2:
|
||||
return 1;
|
||||
case -1:
|
||||
return a + 3;
|
||||
case 3:
|
||||
return a + 3;
|
||||
default:
|
||||
return 44;
|
||||
}
|
||||
}
|
||||
#pragma omp end declare target
|
||||
|
||||
#define low -33
|
||||
#define high 55
|
||||
|
||||
int
|
||||
main (int argc)
|
||||
{
|
||||
int array[s];
|
||||
|
||||
#pragma omp target map(tofrom : array[:s])
|
||||
{
|
||||
for (int i = low; i < high; i++)
|
||||
array[i - low] = switch1 (i);
|
||||
}
|
||||
|
||||
for (int i = low; i < high; i++)
|
||||
assert (array[i - low] == switch1 (i));
|
||||
|
||||
#pragma omp target map(tofrom : array[:s])
|
||||
{
|
||||
for (int i = low; i < high; i++)
|
||||
array[i - low] = switch2 (i);
|
||||
}
|
||||
|
||||
for (int i = low; i < high; i++)
|
||||
assert (array[i - low] == switch2 (i));
|
||||
|
||||
#pragma omp target map(tofrom : array[:s])
|
||||
{
|
||||
for (int i = low; i < high; i++)
|
||||
array[i - low] = switch3 (i);
|
||||
}
|
||||
|
||||
for (int i = low; i < high; i++)
|
||||
assert (array[i - low] == switch3 (i));
|
||||
|
||||
#pragma omp target map(tofrom : array[:s])
|
||||
{
|
||||
for (int i = low; i < high; i++)
|
||||
array[i - low] = switch4 (i);
|
||||
}
|
||||
|
||||
for (int i = low; i < high; i++)
|
||||
assert (array[i - low] == switch4 (i));
|
||||
|
||||
return 0;
|
||||
}
|
|
@@ -1,59 +0,0 @@
|
|||
/* { dg-additional-options "-fno-tree-switch-conversion" } */
|
||||
|
||||
#pragma omp declare target
|
||||
int
|
||||
foo (unsigned a)
|
||||
{
|
||||
switch (a)
|
||||
{
|
||||
case 1 ... 5:
|
||||
return 1;
|
||||
case 9 ... 11:
|
||||
return a + 3;
|
||||
case 12 ... 13:
|
||||
return a + 3;
|
||||
default:
|
||||
return 44;
|
||||
}
|
||||
}
|
||||
#pragma omp end declare target
|
||||
|
||||
#define s 100
|
||||
|
||||
void __attribute__((noinline, noclone))
|
||||
verify(int *a)
|
||||
{
|
||||
if (a[0] != 44)
|
||||
__builtin_abort ();
|
||||
|
||||
for (int i = 1; i <= 5; i++)
|
||||
if (a[i] != 1)
|
||||
__builtin_abort ();
|
||||
|
||||
for (int i = 6; i <= 8; i++)
|
||||
if (a[i] != 44)
|
||||
__builtin_abort ();
|
||||
|
||||
for (int i = 9; i <= 13; i++)
|
||||
if (a[i] != i + 3)
|
||||
__builtin_abort ();
|
||||
|
||||
for (int i = 14; i < s; i++)
|
||||
if (a[i] != 44)
|
||||
__builtin_abort ();
|
||||
}
|
||||
|
||||
int main(int argc)
|
||||
{
|
||||
int array[s];
|
||||
#pragma omp target
|
||||
{
|
||||
for (int i = 0; i < s; i++)
|
||||
{
|
||||
int v = foo (i);
|
||||
array[i] = v;
|
||||
}
|
||||
}
|
||||
verify (array);
|
||||
return 0;
|
||||
}
|
|
@@ -1,212 +0,0 @@
|
|||
/*
|
||||
|
||||
matmul.c : Matrix Multiplication with tiling for openmp4 example
|
||||
|
||||
*/
|
||||
|
||||
#include <stdlib.h>
|
||||
#include <math.h>
|
||||
|
||||
#define BLOCK_SIZE 16
|
||||
/*
|
||||
#define BLOCK_SIZE 32
|
||||
*/
|
||||
#define NSECPERSEC 1000000000L
|
||||
|
||||
typedef struct {
|
||||
int width;
|
||||
int height;
|
||||
int stride;
|
||||
int hpad;
|
||||
float* elements;
|
||||
} Matrix;
|
||||
|
||||
/* Correctly extract the number of nanoseconds from the two time structures */
|
||||
long int get_nanosecs( struct timespec start_time, struct timespec end_time) {
|
||||
long int nanosecs;
|
||||
if ((end_time.tv_nsec-start_time.tv_nsec)<0) nanosecs =
|
||||
((((long int) end_time.tv_sec- (long int) start_time.tv_sec )-1)*NSECPERSEC ) +
|
||||
( NSECPERSEC + (long int) end_time.tv_nsec - (long int) start_time.tv_nsec) ;
|
||||
else nanosecs =
|
||||
(((long int) end_time.tv_sec- (long int) start_time.tv_sec )*NSECPERSEC ) +
|
||||
( (long int) end_time.tv_nsec - (long int) start_time.tv_nsec );
|
||||
return nanosecs;
|
||||
}
|
||||
|
||||
void simple_sgemm_tt(const int M,const int N,const int K,const float alpha, const float* A,const int LDA,
|
||||
const float* B,const int LDB, const float beta,float* C, const int LDC) ;
|
||||
void simple_sgemm_tn(const int M,const int N,const int K,const float alpha, const float* A,const int LDA,
|
||||
const float* B,const int LDB, const float beta,float* C, const int LDC) ;
|
||||
void tiled_sgemm_tt(const int M,const int N,const int K,const float alpha, const float*A, const int LDA,
|
||||
const float* B,const int LDB, const float beta,float* C, const int LDC) ;
|
||||
|
||||
int verify(float* v_res, float* v_ref, int len) {
|
||||
int passed = 1;
|
||||
int i;
|
||||
for (i = 0; i < len; ++i) {
|
||||
if (fabs(v_res[i] - v_ref[i]) > 0.001*v_ref[i]) {
|
||||
__builtin_abort ();
|
||||
}
|
||||
}
|
||||
return passed;
|
||||
}
|
||||
|
||||
|
||||
int main(int argc, char* argv[]){
|
||||
|
||||
Matrix A,B,Bt,C,Cref;
|
||||
int a1,a2,a3,i,j;
|
||||
struct timespec start_time1, end_time1;
|
||||
struct timespec start_time2, end_time2;
|
||||
long int nanosecs,total_ops;
|
||||
float gflopsTiled,gflopsCPU;
|
||||
|
||||
a1 = 35;
|
||||
a2 = 28;
|
||||
a3 = 47;
|
||||
|
||||
A.height = a1;
|
||||
A.width = a2;
|
||||
A.stride = (((A.width-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
|
||||
A.hpad = (((A.height-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
|
||||
A.elements = (float*)malloc(A.stride * A.hpad* sizeof(float));
|
||||
|
||||
B.height = a2;
|
||||
B.width = a3;
|
||||
B.stride = (((B.width-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
|
||||
B.hpad = (((B.height-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
|
||||
B.elements = (float*)malloc(B.stride * B.hpad * sizeof(float));
|
||||
|
||||
/* Bt is same as B but stored in column-major order */
|
||||
Bt.height = B.height;
|
||||
Bt.width = B.width;
|
||||
Bt.stride = B.stride;
|
||||
Bt.hpad = B.hpad;
|
||||
Bt.elements = (float*)malloc(Bt.stride * Bt.hpad * sizeof(float));
|
||||
|
||||
C.height = a1;
|
||||
C.width = a3;
|
||||
C.stride = (((C.width-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
|
||||
C.hpad = (((C.height-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
|
||||
C.elements = (float*)malloc(C.stride * C.hpad * sizeof(float));
|
||||
|
||||
Cref.height = a1;
|
||||
Cref.width = a3;
|
||||
Cref.stride = (((Cref.width-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
|
||||
Cref.hpad = (((Cref.height-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
|
||||
Cref.elements = (float*)malloc(Cref.stride * Cref.hpad * sizeof(float));
|
||||
|
||||
for(i = 0; i < A.hpad ; i++)
|
||||
for(j = 0; j < A.stride; j++) {
|
||||
if (( j<A.width ) && (i<A.height)) {
|
||||
A.elements[i*A.stride + j] = (i % 3);
|
||||
} else {
|
||||
A.elements[i*A.stride + j] = 0.0;
|
||||
}
|
||||
}
|
||||
|
||||
/* Initialize B and Bt */
|
||||
for(i = 0; i < B.hpad ; i++)
|
||||
for(j = 0; j < B.stride; j++) {
|
||||
if (( j<B.width ) && (i<B.height)) {
|
||||
B.elements[i*B.stride+j] = (j % 2);
|
||||
Bt.elements[j*Bt.stride+i] = B.elements[i*B.stride+j] ;
|
||||
} else {
|
||||
B.elements[i*B.stride+j] = 0.0;
|
||||
Bt.elements[j*Bt.stride+i] = 0.0;
|
||||
}
|
||||
}
|
||||
|
||||
/* zero C, and Cref */
|
||||
for(i = 0; i < C.hpad; i++)
|
||||
for(j = 0; j < C.stride; j++) {
|
||||
C.elements[i*C.stride+j] = 0.0;
|
||||
Cref.elements[i*Cref.stride+j] = 0.0;
|
||||
}
|
||||
|
||||
simple_sgemm_tt(A.height,B.width,B.height,1.0,A.elements,A.stride,B.elements,B.stride,1.0,Cref.elements,Cref.stride);
|
||||
tiled_sgemm_tt(A.height,B.width,B.height,1.0,A.elements,A.stride,B.elements,B.stride,1.0,C.elements,C.stride);
|
||||
|
||||
verify(C.elements, Cref.elements, C.height * C.stride);
|
||||
return 0;
|
||||
}
|
||||
|
||||
void simple_sgemm_tt(const int M,const int N,const int K,const float alpha, const float* A,const int LDA,
|
||||
const float* B,const int LDB, const float beta,float* C, const int LDC) {
|
||||
/* A,B, and C are in row-major order */
|
||||
int c_row,c_col,inner;
|
||||
float sum;
|
||||
for (c_col = 0 ; c_col<N; c_col++ ) {
|
||||
for (c_row = 0 ; c_row<M; c_row++ ) {
|
||||
sum = 0.0 ;
|
||||
for (inner = 0 ; inner<K; inner++ ) {
|
||||
sum += A[c_row*LDA + inner] * B[inner*LDB + c_col] ;
|
||||
}
|
||||
C[c_row*LDC + c_col] = alpha*sum + beta*C[ c_row*LDC + c_col] ;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/***************************
|
||||
|
||||
tiled_sgemm_tt: Tiled matrix multiplication:
|
||||
|
||||
***************************/
|
||||
|
||||
void tiled_sgemm_tt(const int M, const int N, const int K, const float alpha, const float*A, const int LDA,
|
||||
const float*B, const int LDB, const float beta, float*C, const int LDC){
|
||||
|
||||
#pragma omp target teams map(to:A[M*K],B[K*N]) map(from:C[M*N])
|
||||
#pragma omp distribute collapse(2)
|
||||
for (int C_row_start=0 ; C_row_start < M ; C_row_start+=BLOCK_SIZE)
|
||||
for (int C_col_start=0 ; C_col_start < N ; C_col_start+=BLOCK_SIZE)
|
||||
{
|
||||
// Each team has a local copy of these mini matrices
|
||||
float As[BLOCK_SIZE][BLOCK_SIZE];
|
||||
float Bs[BLOCK_SIZE][BLOCK_SIZE];
|
||||
#pragma omp parallel
|
||||
{
|
||||
int C_row, C_col;
|
||||
float Cval = 0.0;
|
||||
|
||||
for (int kblock = 0; kblock < K ; kblock += BLOCK_SIZE )
|
||||
{
|
||||
#pragma omp for collapse(2)
|
||||
for (int row=0 ; row < BLOCK_SIZE ; row++)
|
||||
for (int col=0 ; col < BLOCK_SIZE ; col++)
|
||||
{
|
||||
C_row = C_row_start + row;
|
||||
C_col = C_col_start + col;
|
||||
if ((C_row < M) && (kblock + col < K))
|
||||
As[row][col] = A[(C_row*LDA)+ kblock + col];
|
||||
else
|
||||
As[row][col] = 0;
|
||||
if ((kblock + row < K) && C_col < N)
|
||||
Bs[row][col] = B[((kblock+row)*LDB)+ C_col];
|
||||
else
|
||||
Bs[row][col] = 0;
|
||||
}
|
||||
|
||||
#pragma omp for collapse(2)
|
||||
for (int row=0 ; row < BLOCK_SIZE ; row++)
|
||||
for (int col=0 ; col < BLOCK_SIZE ; col++)
|
||||
{
|
||||
for (int e = 0; e < BLOCK_SIZE; ++e)
|
||||
Cval += As[row][e] * Bs[e][col];
|
||||
}
|
||||
} /* End for kblock .. */
|
||||
|
||||
|
||||
#pragma omp for collapse(2)
|
||||
for (int row=0 ; row < BLOCK_SIZE ; row++)
|
||||
for (int col=0 ; col < BLOCK_SIZE ; col++)
|
||||
{
|
||||
C_row = C_row_start + row;
|
||||
C_col = C_col_start + col;
|
||||
if ((C_row < M) && (C_col < N))
|
||||
C[(C_row*LDC)+C_col] = alpha*Cval + beta*C[(C_row*LDC)+C_col];
|
||||
|
||||
}
|
||||
} /* end parallel */
|
||||
} /* end target teams distribute */
|
||||
}
|
|
@@ -1,258 +0,0 @@
|
|||
/*
|
||||
|
||||
matmul.c : Matrix Multiplication with tiling for openmp4 example
|
||||
|
||||
*/
|
||||
|
||||
#include <stdlib.h>
|
||||
#include <math.h>
|
||||
|
||||
#define BLOCK_SIZE 16
|
||||
/*
|
||||
#define BLOCK_SIZE 32
|
||||
*/
|
||||
#define NSECPERSEC 1000000000L
|
||||
|
||||
typedef struct {
|
||||
int width;
|
||||
int height;
|
||||
int stride;
|
||||
int hpad;
|
||||
float* elements;
|
||||
} Matrix;
|
||||
|
||||
/* Correctly extract the number of nanoseconds from the two time structures */
|
||||
long int get_nanosecs( struct timespec start_time, struct timespec end_time) {
|
||||
long int nanosecs;
|
||||
if ((end_time.tv_nsec-start_time.tv_nsec)<0) nanosecs =
|
||||
((((long int) end_time.tv_sec- (long int) start_time.tv_sec )-1)*NSECPERSEC ) +
|
||||
( NSECPERSEC + (long int) end_time.tv_nsec - (long int) start_time.tv_nsec) ;
|
||||
else nanosecs =
|
||||
(((long int) end_time.tv_sec- (long int) start_time.tv_sec )*NSECPERSEC ) +
|
||||
( (long int) end_time.tv_nsec - (long int) start_time.tv_nsec );
|
||||
return nanosecs;
|
||||
}
|
||||
|
||||
void simple_sgemm_tt(const int M,const int N,const int K,const float alpha, const float* A,const int LDA,
|
||||
const float* B,const int LDB, const float beta,float* C, const int LDC) ;
|
||||
void simple_sgemm_tn(const int M,const int N,const int K,const float alpha, const float* A,const int LDA,
|
||||
const float* B,const int LDB, const float beta,float* C, const int LDC) ;
|
||||
void tiled_sgemm_tt(const int M,const int N,const int K,const float alpha, const float*A, const int LDA,
|
||||
const float* B,const int LDB, const float beta,float* C, const int LDC) ;
|
||||
|
||||
int verify(float* v_res, float* v_ref, int len) {
|
||||
int passed = 1;
|
||||
int i;
|
||||
for (i = 0; i < len; ++i) {
|
||||
if (fabs(v_res[i] - v_ref[i]) > 0.001*v_ref[i]) {
|
||||
__builtin_abort ();
|
||||
}
|
||||
}
|
||||
return passed;
|
||||
}
|
||||
|
||||
|
||||
int main(int argc, char* argv[]){
|
||||
|
||||
Matrix A,B,Bt,C,Cref;
|
||||
int a1,a2,a3,i,j;
|
||||
struct timespec start_time1, end_time1;
|
||||
struct timespec start_time2, end_time2;
|
||||
long int nanosecs,total_ops;
|
||||
float gflopsTiled,gflopsCPU;
|
||||
|
||||
a1 = 35;
|
||||
a2 = 28;
|
||||
a3 = 47;
|
||||
|
||||
A.height = a1;
|
||||
A.width = a2;
|
||||
A.stride = (((A.width-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
|
||||
A.hpad = (((A.height-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
|
||||
A.elements = (float*)malloc(A.stride * A.hpad* sizeof(float));
|
||||
|
||||
B.height = a2;
|
||||
B.width = a3;
|
||||
B.stride = (((B.width-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
|
||||
B.hpad = (((B.height-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
|
||||
B.elements = (float*)malloc(B.stride * B.hpad * sizeof(float));
|
||||
|
||||
/* Bt is same as B but stored in column-major order */
|
||||
Bt.height = B.height;
|
||||
Bt.width = B.width;
|
||||
Bt.stride = B.stride;
|
||||
Bt.hpad = B.hpad;
|
||||
Bt.elements = (float*)malloc(Bt.stride * Bt.hpad * sizeof(float));
|
||||
|
||||
C.height = a1;
|
||||
C.width = a3;
|
||||
C.stride = (((C.width-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
|
||||
C.hpad = (((C.height-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
|
||||
C.elements = (float*)malloc(C.stride * C.hpad * sizeof(float));
|
||||
|
||||
Cref.height = a1;
|
||||
Cref.width = a3;
|
||||
Cref.stride = (((Cref.width-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
|
||||
Cref.hpad = (((Cref.height-1)/BLOCK_SIZE)+1) * BLOCK_SIZE;
|
||||
Cref.elements = (float*)malloc(Cref.stride * Cref.hpad * sizeof(float));
|
||||
|
||||
for(i = 0; i < A.hpad ; i++)
|
||||
for(j = 0; j < A.stride; j++) {
|
||||
if (( j<A.width ) && (i<A.height)) {
|
||||
A.elements[i*A.stride + j] = (i % 3);
|
||||
} else {
|
||||
A.elements[i*A.stride + j] = 0.0;
|
||||
}
|
||||
}
|
||||
|
||||
/* Initialize B and Bt */
|
||||
for(i = 0; i < B.hpad ; i++)
|
||||
for(j = 0; j < B.stride; j++) {
|
||||
if (( j<B.width ) && (i<B.height)) {
|
||||
B.elements[i*B.stride+j] = (j % 2);
|
||||
Bt.elements[j*Bt.stride+i] = B.elements[i*B.stride+j] ;
|
||||
} else {
|
||||
B.elements[i*B.stride+j] = 0.0;
|
||||
Bt.elements[j*Bt.stride+i] = 0.0;
|
||||
}
|
||||
}
|
||||
|
||||
/* zero C, and Cref */
|
||||
for(i = 0; i < C.hpad; i++)
|
||||
for(j = 0; j < C.stride; j++) {
|
||||
C.elements[i*C.stride+j] = 0.0;
|
||||
Cref.elements[i*Cref.stride+j] = 0.0;
|
||||
}
|
||||
|
||||
simple_sgemm_tt(A.height,B.width,B.height,1.0,A.elements,A.stride,B.elements,B.stride,1.0,Cref.elements,Cref.stride);
|
||||
tiled_sgemm_tt(A.height,B.width,B.height,1.0,A.elements,A.stride,B.elements,B.stride,1.0,C.elements,C.stride);
|
||||
|
||||
verify(C.elements, Cref.elements, C.height * C.stride);
|
||||
return 0;
|
||||
}
|
||||
|
||||
void simple_sgemm_tt(const int M,const int N,const int K,const float alpha, const float* A,const int LDA,
|
||||
const float* B,const int LDB, const float beta,float* C, const int LDC) {
|
||||
/* A,B, and C are in row-major order */
|
||||
int c_row,c_col,inner;
|
||||
float sum;
|
||||
for (c_col = 0 ; c_col<N; c_col++ ) {
|
||||
for (c_row = 0 ; c_row<M; c_row++ ) {
|
||||
sum = 0.0 ;
|
||||
for (inner = 0 ; inner<K; inner++ ) {
|
||||
sum += A[c_row*LDA + inner] * B[inner*LDB + c_col] ;
|
||||
}
|
||||
C[c_row*LDC + c_col] = alpha*sum + beta*C[ c_row*LDC + c_col] ;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/***************************
|
||||
|
||||
tiled_sgemm_tt: Tiled matrix multiplication:
|
||||
|
||||
***************************/
|
||||
|
||||
void tiled_sgemm_tt(const int M, const int N, const int K, const float alpha, const float*A, const int LDA,
|
||||
const float*B, const int LDB, const float beta, float*C, const int LDC){
|
||||
|
||||
#pragma omp target teams map(to:A[M*K],B[K*N]) map(from:C[M*N])
|
||||
#pragma omp distribute collapse(2)
|
||||
for (int C_row_start=0 ; C_row_start < M ; C_row_start+=BLOCK_SIZE) {
|
||||
for (int C_col_start=0 ; C_col_start < N ; C_col_start+=BLOCK_SIZE) {
|
||||
|
||||
// We now have M/BLOCK_SIZE * N/BLOCK_SIZE teams = (M*N)/(BLOCK_SIZE*BLOCK_SIZE)
|
||||
// The grid global dimensions are M,N,1
|
||||
// The grid local dimensions are BLOCK_SIZE,BLOCK_SIZE,1
|
||||
|
||||
// -------------------------------------------------------------------
|
||||
// The rest of this code forms the HSAIL kernel with the
|
||||
// pairs of "parallel for collapse(2)" loops replaced with a barrier.
|
||||
// The kernel initializes these values
|
||||
// C_row_start = get_group_id(0) * BLOCK_SIZE
|
||||
// C_col_start = get_group_id(1) * BLOCK_SIZE
|
||||
// row=get_local_id(0)
|
||||
// col=get_local_id(1)
|
||||
// -------------------------------------------------------------------
|
||||
|
||||
// Each team has a local copy of these mini matrices
|
||||
float As[BLOCK_SIZE][BLOCK_SIZE];
|
||||
float Bs[BLOCK_SIZE][BLOCK_SIZE];
|
||||
float Cs[BLOCK_SIZE][BLOCK_SIZE];
|
||||
int C_row, C_col;
|
||||
|
||||
/* Zero Cs for this BLOCK */
|
||||
// - - - - - - - - - - - - - - - - - - - -
|
||||
// REPLACE NEXT THREE LINES WITH A BARRIER
|
||||
#pragma omp parallel for collapse(2)
|
||||
for (int row=0 ; row < BLOCK_SIZE ; row++) {
|
||||
for (int col=0 ; col < BLOCK_SIZE ; col++) {
|
||||
// END BARRIER
|
||||
// - - - - - - - - - - - - - - - - - - - -
|
||||
Cs[row][col] = 0.0;
|
||||
}
|
||||
}
|
||||
|
||||
// This kblock loop is run on the master thread of each team
|
||||
for (int kblock = 0; kblock < K ; kblock += BLOCK_SIZE ) {
|
||||
|
||||
// Copy global memory values to local memory
|
||||
// - - - - - - - - - - - - - - - - - - - -
|
||||
// REPLACE NEXT THREE LINES WITH A BARRIER
|
||||
#pragma omp parallel for collapse(2)
|
||||
for (int row=0 ; row < BLOCK_SIZE ; row++) {
|
||||
for (int col=0 ; col < BLOCK_SIZE ; col++) {
|
||||
// END BARRIER
|
||||
// - - - - - - - - - - - - - - - - - - - -
|
||||
C_row = C_row_start + row;
|
||||
C_col = C_col_start + col;
|
||||
if ((C_row < M) && (kblock + col < K))
|
||||
As[row][col] = A[(C_row*LDA)+ kblock + col];
|
||||
else
|
||||
As[row][col] = 0;
|
||||
if ((kblock + row < K) && C_col < N)
|
||||
Bs[row][col] = B[((kblock+row)*LDB)+ C_col];
|
||||
else
|
||||
Bs[row][col] = 0;
|
||||
}
|
||||
}
|
||||
|
||||
// Calculate Cs <- Sum(As X Bs) across all kblocks
|
||||
// - - - - - - - - - - - - - - - - - - - -
|
||||
// REPLACE NEXT THREE LINES WITH A BARRIER
|
||||
#pragma omp parallel for collapse(2)
|
||||
for (int row=0 ; row < BLOCK_SIZE ; row++) {
|
||||
for (int col=0 ; col < BLOCK_SIZE ; col++) {
|
||||
// END BARRIER
|
||||
// - - - - - - - - - - - - - - - - - - - -
|
||||
for (int e = 0; e < BLOCK_SIZE; ++e)
|
||||
Cs[row][col] += As[row][e] * Bs[e][col];
|
||||
}
|
||||
}
|
||||
|
||||
} /* End for kblock .. */
|
||||
|
||||
|
||||
// Scale Update actual C from Cs
|
||||
// - - - - - - - - - - - - - - - - - - - -
|
||||
// REPLACE NEXT THREE LINES WITH A BARRIER
|
||||
#pragma omp parallel for collapse(2)
|
||||
for (int row=0 ; row < BLOCK_SIZE ; row++) {
|
||||
for (int col=0 ; col < BLOCK_SIZE ; col++) {
|
||||
// END BARRIER
|
||||
// - - - - - - - - - - - - - - - - - - - -
|
||||
C_row = C_row_start + row;
|
||||
C_col = C_col_start + col;
|
||||
if ((C_row < M) && (C_col < N)) {
|
||||
C[(C_row*LDC)+C_col] = alpha*Cs[row][col] + beta*C[(C_row*LDC)+C_col];
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// -------------------------------------------------------------------
|
||||
// This is the end of the kernel
|
||||
|
||||
}
|
||||
}
|
||||
|
||||
}
|