gcc/config
Kyrylo Tkachov 6d9fdf4bf5
Locality cloning pass: -fipa-reorder-for-locality
Implement partitioning and cloning in the callgraph to help locality.
A new -fipa-reorder-for-locality flag is used to enable this.
The optimization has two components:
* Partitioning the callgraph so that callers and callees that frequently call
each other are grouped in the same partition.
* Cloning functions that straddle multiple callchains so that each clone can
be local to the partition of its callchain.

The majority of the logic is in the new IPA pass in ipa-locality-cloning.cc.
It creates a partitioning plan and does the prerequisite cloning.
The partitioning is then implemented during the existing LTO partitioning pass.

To guide these locality heuristics we use PGO data.
In the absence of PGO data we use a static heuristic that uses the accumulated
estimated edge frequencies of the callees for each function to guide the
reordering.
We are investigating more elaborate static heuristics, in particular using
the demangled C++ names to group template instantiations together.
This looks promising, but we are still working out some kinks in the
implementation and plan to send it as a follow-up once we're more confident
in it.
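
As a rough illustration of the static heuristic mentioned above, the following
sketch sums the estimated frequencies of each function's outgoing call edges and
orders functions by that total. It is not the code in ipa-locality-cloning.cc:
the CallEdge type, both function names, and the choice to sort in descending
order are assumptions made for this example only.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a callgraph edge; this is not GCC's cgraph_edge.
struct CallEdge
{
  std::string caller;
  std::string callee;
  double est_freq; // estimated calls per invocation of the caller
};

// Sum the estimated frequencies of each function's outgoing call edges.
std::map<std::string, double>
accumulate_callee_freq (const std::vector<CallEdge> &edges)
{
  std::map<std::string, double> acc;
  for (const CallEdge &e : edges)
    {
      acc[e.caller] += e.est_freq;
      acc.emplace (e.callee, 0.0); // make sure leaf functions appear too
    }
  return acc;
}

// Order functions by descending accumulated frequency (assumed policy).
// Ties keep alphabetical order, so the result is deterministic.
std::vector<std::string>
order_by_accumulated_freq (const std::vector<CallEdge> &edges)
{
  std::map<std::string, double> acc = accumulate_callee_freq (edges);
  std::vector<std::string> order;
  for (const auto &kv : acc)
    order.push_back (kv.first);
  std::stable_sort (order.begin (), order.end (),
                    [&acc] (const std::string &a, const std::string &b)
                    { return acc.at (a) > acc.at (b); });
  return order;
}
```

In this sketch a function that makes many frequent calls sorts early, so it and
its callees become candidates for being placed together first. How the real pass
consumes the accumulated frequencies may well differ.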

A new bootstrap-lto-locality bootstrap config is added that allows us to test
this on GCC itself with either static or PGO heuristics.
GCC bootstraps with both (normal LTO bootstrap and profiledbootstrap).

As this new pass enables a new partitioning scheme, it is incompatible with
explicit -flto-partition= options, so an error is emitted when the user
specifies both flags explicitly.
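
One way to detect an explicit -flto-partition= is to give the option a sentinel
default and reject any other value when the locality flag is on. The sketch
below illustrates that idea only. The ChangeLog confirms that
LTO_PARTITION_DEFAULT exists and that opts.cc validates the combination, but
the types, function names, and diagnostic text here are hypothetical and are
not taken from opts.cc.

```cpp
#include <string>

// Hypothetical mirror of the partitioning models.  Only Default corresponds
// to something named in the ChangeLog (LTO_PARTITION_DEFAULT).
enum class PartitionModel { Default, None, One, Balanced, Max };

struct Options
{
  bool reorder_for_locality = false;
  PartitionModel partition = PartitionModel::Default; // sentinel: not set by user
};

// Return an empty string when the combination is acceptable, otherwise a
// diagnostic.  The wording is made up for this sketch.
std::string
validate (const Options &o)
{
  if (o.reorder_for_locality && o.partition != PartitionModel::Default)
    return "-fipa-reorder-for-locality conflicts with an explicit "
           "-flto-partition=";
  return "";
}

// After validation, replace the sentinel with a concrete model.  Balanced is
// the documented default for -flto-partition=; using it here is an assumption.
PartitionModel
resolve (const Options &o)
{
  return o.partition == PartitionModel::Default ? PartitionModel::Balanced
                                                : o.partition;
}
```

The point of the sentinel is that a user who writes -flto-partition=balanced can
be told apart from one who wrote nothing, even though both would otherwise end
up with the same model.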

With this optimization we are seeing good performance gains on some large
internal workloads that stress the parts of the processor that are sensitive
to code locality, but we'd appreciate wider performance evaluation.

Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for mainline?
Thanks,
Kyrill

Signed-off-by: Prachi Godbole <pgodbole@nvidia.com>
Co-authored-by: Kyrylo Tkachov <ktkachov@nvidia.com>

config/ChangeLog:

	* bootstrap-lto-locality.mk: New file.

gcc/ChangeLog:

	* Makefile.in (OBJS): Add ipa-locality-cloning.o.
	* cgraph.h (set_new_clone_decl_and_node_flags): Declare prototype.
	* cgraphclones.cc (set_new_clone_decl_and_node_flags): Remove static
	qualifier.
	* common.opt (fipa-reorder-for-locality): New flag.
	(LTO_PARTITION_DEFAULT): Declare.
	(flto-partition): Change default to LTO_PARTITION_DEFAULT.
	* doc/invoke.texi: Document -fipa-reorder-for-locality.
	* flag-types.h (enum lto_locality_cloning_model): Declare.
	(lto_partitioning_model): Add LTO_PARTITION_DEFAULT.
	* lto-cgraph.cc (lto_set_symtab_encoder_in_partition): Add dumping of
	node and index.
	* opts.cc (validate_ipa_reorder_locality_lto_partition): Define.
	(finish_options): Handle LTO_PARTITION_DEFAULT.
	* params.opt (lto_locality_cloning_model): New enum.
	(lto-partition-locality-cloning): New param.
	(lto-partition-locality-frequency-cutoff): Likewise.
	(lto-partition-locality-size-cutoff): Likewise.
	(lto-max-locality-partition): Likewise.
	* passes.def: Register pass_ipa_locality_cloning.
	* timevar.def (TV_IPA_LC): New timevar.
	* tree-pass.h (make_pass_ipa_locality_cloning): Declare.
	* ipa-locality-cloning.cc: New file.
	* ipa-locality-cloning.h: New file.

gcc/lto/ChangeLog:

	* lto-partition.cc (add_node_references_to_partition): Define.
	(create_partition): Likewise.
	(lto_locality_map): Likewise.
	(lto_promote_cross_file_statics): Add extra dumping.
	* lto-partition.h (lto_locality_map): Declare prototype.
	* lto.cc (do_whole_program_analysis): Handle
	flag_ipa_reorder_for_locality.
2025-04-15 16:35:44 +02:00
..
acinclude.m4 config: delete unused CYG_AC_PATH_LIBERTY macro 2024-01-10 19:51:07 -05:00
acx.m4 build: Don't check for host-prefixed 'cargo' program 2024-04-16 09:43:47 +02:00
asmcfi.m4
ax_check_define.m4
ax_count_cpus.m4
ax_cxx_compile_stdcxx.m4 configure: Also check C++11 (flags) for ${build} compiler not only for ${host} 2020-08-20 21:59:00 +02:00
ax_lib_socket_nsl.m4 build: libcody: Link with -lsocket -lnsl if necessary [PR98316] 2021-01-05 11:32:31 +01:00
ax_pthread.m4
bitfields.m4
bootstrap-asan.mk
bootstrap-cet.mk
bootstrap-debug-big.mk
bootstrap-debug-ckovw.mk
bootstrap-debug-lean.mk
bootstrap-debug-lib.mk
bootstrap-debug.mk
bootstrap-hwasan.mk libsanitizer: mid-end: Introduce stack variable handling for HWASAN 2020-11-25 16:38:06 +00:00
bootstrap-lto-lean.mk Fix PR bootstrap/102389: --with-build-config=bootstrap-lto is broken 2021-09-19 17:29:36 +00:00
bootstrap-lto-locality.mk Locality cloning pass: -fipa-reorder-for-locality 2025-04-15 16:35:44 +02:00
bootstrap-lto-noplugin.mk
bootstrap-lto.mk Fix PR bootstrap/102389: --with-build-config=bootstrap-lto is broken 2021-09-19 17:29:36 +00:00
bootstrap-O1.mk
bootstrap-O3.mk
bootstrap-Og.mk
bootstrap-time.mk
bootstrap-ubsan.mk
cet.m4 GCC_CET_HOST_FLAGS: Check if host supports multi-byte NOPs 2021-05-03 05:00:05 -07:00
ChangeLog Daily bump. 2024-11-26 00:19:26 +00:00
codeset.m4
depstand.m4
dfp.m4 aarch64: Enable DFP (Decimal Floating-point) (BID format) 2022-05-20 09:31:15 +02:00
elf.m4
enable.m4
extensions.m4
futex.m4
gc++filt.m4
gcc-plugin.m4 config: Fix host -rdynamic detection for build != host != target 2023-08-11 13:20:07 +00:00
gettext-sister.m4 *: add modern gettext 2023-11-14 00:47:11 +01:00
gettext.m4 *: add modern gettext 2023-11-14 00:47:11 +01:00
glibc21.m4
gthr.m4 gcc: Add 'mcf' thread model support from mcfgthread 2022-10-19 13:52:37 +00:00
gxx-include-dir.m4
hwcaps.m4 libiberty: Disable hwcaps for sha1.o 2023-11-30 10:06:23 +01:00
iconv.m4 *: add modern gettext 2023-11-14 00:47:11 +01:00
intdiv0.m4
intlmacosx.m4 *: add modern gettext 2023-11-14 00:47:11 +01:00
inttypes-pri.m4
inttypes.m4
inttypes_h.m4
isl.m4
largefile.m4 config: Sync largefile.m4 from binutils-gdb 2020-09-09 11:02:01 +02:00
lcmessage.m4 Fix up duplicated words mostly in comments, part 1 2024-04-02 13:39:11 +02:00
ld-symbolic.m4
lead-dot.m4
lib-ld.m4 egrep in binutils 2023-08-07 22:59:40 +02:00
lib-link.m4 Fixes after recent configure changes relating to static libraries 2020-02-01 00:34:28 +00:00
lib-prefix.m4
libstdc++-raw-cxx.m4
lthostflags.m4
math.m4 libgfortran: Provide some further math library fallbacks [PR94694] 2020-04-22 21:34:19 +02:00
mh-alpha-linux
mh-cygwin
mh-darwin configure: Allow host fragments to react to --enable-host-shared. 2021-08-18 19:46:32 +01:00
mh-djgpp
mh-mingw mh-mingw: drop unused BOOT_CXXFLAGS variable 2023-07-21 10:37:58 +01:00
mh-pa
mh-ppc-aix
mmap.m4
mt-alphaieee
mt-android
mt-d30v
mt-gnu
mt-loongarch-elf LoongArch: Reimplement multilib build option handling. 2023-09-15 10:42:12 +08:00
mt-loongarch-gnu LoongArch: Reimplement multilib build option handling. 2023-09-15 10:42:12 +08:00
mt-loongarch-mlib LoongArch: Reimplement multilib build option handling. 2023-09-15 10:42:12 +08:00
mt-mips-elfoabi
mt-mips-gnu
mt-mips16-compat
mt-ospace
mt-sde
mt-spu
multi.m4
nls.m4
no-executables.m4 Use a non-empty test program to test ability to link. 2020-02-12 13:22:07 -08:00
override.m4 PR27116, Spelling errors found by Debian style checker 2023-08-07 22:59:40 +02:00
picflag.m4 Deprecate a.out support for NetBSD targets. 2023-08-07 22:59:41 +02:00
pkg.m4 PKG_CHECK_MODULES: Properly check if $pkg_cv_[]$1[]_LIBS works 2023-08-07 22:59:41 +02:00
plugins.m4
po.m4
proginstall.m4
progtest.m4
sjlj.m4
stdint.m4
stdint_h.m4
target-posix
tcl.m4
tls.m4
toolexeclibdir.m4
uintmax_t.m4
ulonglong.m4
unwind_ipinfo.m4
warnings.m4
weakref.m4
zlib.m4
zstd.m4 configure: require libzstd >= 1.4.0 2023-08-07 22:59:37 +02:00