Locality cloning pass: -fipa-reorder-for-locality
Implement partitioning and cloning in the callgraph to help locality.
A new -fipa-reorder-for-locality flag is used to enable this.
The optimization has two components:
* Partitioning the callgraph so as to group callers and callees that
  frequently call each other in the same partition.
* Cloning functions that straddle multiple callchains and allowing each clone
  to be local to the partition of its callchain.

The majority of the logic is in the new IPA pass in ipa-locality-cloning.cc.
It creates a partitioning plan and does the prerequisite cloning.
The partitioning is then implemented during the existing LTO partitioning pass.

To guide these locality heuristics we use PGO data.
In the absence of PGO data we use a static heuristic that uses the accumulated
estimated edge frequencies of the callees for each function to guide the
reordering.
We are investigating some more elaborate static heuristics, in particular
using the demangled C++ names to group template instantiations together.
This is promising but we are currently working out some kinks in the
implementation and want to send that out as a follow-up once we're more
confident in it.

A new bootstrap-lto-locality bootstrap config is added that allows us to test
this on GCC itself with either static or PGO heuristics.
GCC bootstraps with both (normal LTO bootstrap and profiledbootstrap).

As this new pass enables a new partitioning scheme, it is incompatible with
explicit -flto-partition= options, so an error is introduced when the user
uses both flags explicitly.

With this optimization we are seeing good performance gains on some large
internal workloads that stress the parts of the processor that are sensitive
to code locality, but we'd appreciate wider performance evaluation.

Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for mainline?
Thanks,
Kyrill

Signed-off-by: Prachi Godbole <pgodbole@nvidia.com>
Co-authored-by: Kyrylo Tkachov <ktkachov@nvidia.com>

config/ChangeLog:

	* bootstrap-lto-locality.mk: New file.

gcc/ChangeLog:

	* Makefile.in (OBJS): Add ipa-locality-cloning.o.
	* cgraph.h (set_new_clone_decl_and_node_flags): Declare prototype.
	* cgraphclones.cc (set_new_clone_decl_and_node_flags): Remove static
	qualifier.
	* common.opt (fipa-reorder-for-locality): New flag.
	(LTO_PARTITION_DEFAULT): Declare.
	(flto-partition): Change default to LTO_PARTITION_DEFAULT.
	* doc/invoke.texi: Document -fipa-reorder-for-locality.
	* flag-types.h (enum lto_locality_cloning_model): Declare.
	(enum lto_partition_model): Add LTO_PARTITION_DEFAULT.
	* lto-cgraph.cc (lto_set_symtab_encoder_in_partition): Add dumping of
	node and index.
	* opts.cc (validate_ipa_reorder_locality_lto_partition): Define.
	(finish_options): Handle LTO_PARTITION_DEFAULT.
	* params.opt (lto_locality_cloning_model): New enum.
	(lto-partition-locality-cloning): New param.
	(lto-partition-locality-frequency-cutoff): Likewise.
	(lto-partition-locality-size-cutoff): Likewise.
	(lto-max-locality-partition): Likewise.
	* passes.def: Register pass_ipa_locality_cloning.
	* timevar.def (TV_IPA_LC): New timevar.
	* tree-pass.h (make_pass_ipa_locality_cloning): Declare.
	* ipa-locality-cloning.cc: New file.
	* ipa-locality-cloning.h: New file.

gcc/lto/ChangeLog:

	* lto-partition.cc (add_node_references_to_partition): Define.
	(create_partition): Likewise.
	(lto_locality_map): Likewise.
	(lto_promote_cross_file_statics): Add extra dumping.
	* lto-partition.h (lto_locality_map): Declare prototype.
	* lto.cc (do_whole_program_analysis): Handle
	flag_ipa_reorder_for_locality.
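The commit message describes two ideas: grouping callers with the callees they frequently call, and, when no profile data is available, weighting each function by the accumulated estimated frequencies of its callee edges. The C++ sketch below illustrates both on a toy call graph. It is not the algorithm in ipa-locality-cloning.cc, whose diff is not shown on this page. The `Func`/`Edge` types, `group_callchains`, the greedy hottest-callee-first walk and the `max_size` cap are all hypothetical, and cloning is left out entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy call graph; these types are not GCC's cgraph API.
struct Edge {
  int callee;    // index of the called function
  int64_t freq;  // estimated execution frequency of this call edge
};

struct Func {
  int size = 0;              // estimated size in instructions
  std::vector<Edge> callees; // outgoing call edges
};

// Static weight: accumulated estimated frequency of all outgoing call edges.
int64_t accumulated_callee_freq(const Func &f) {
  int64_t sum = 0;
  for (const Edge &e : f.callees)
    sum += e.freq;
  return sum;
}

// Greedy sketch: start a partition at the heaviest unassigned function and
// keep pulling in its hottest callees (depth first) while the partition stays
// within MAX_SIZE.  The root of a partition is always admitted.
std::vector<std::vector<int>> group_callchains(const std::vector<Func> &g,
                                               int max_size) {
  const int n = static_cast<int>(g.size());
  std::vector<int> order(n);
  for (int i = 0; i < n; ++i)
    order[i] = i;
  std::stable_sort(order.begin(), order.end(), [&](int a, int b) {
    return accumulated_callee_freq(g[a]) > accumulated_callee_freq(g[b]);
  });

  std::vector<bool> assigned(n, false);
  std::vector<std::vector<int>> parts;
  for (int root : order) {
    if (assigned[root])
      continue;
    std::vector<int> part;
    int size = 0;
    std::vector<int> worklist{root};
    while (!worklist.empty()) {
      int f = worklist.back();
      worklist.pop_back();
      if (assigned[f] || (!part.empty() && size + g[f].size > max_size))
        continue;  // already placed, or would overflow this partition
      assigned[f] = true;
      part.push_back(f);
      size += g[f].size;
      // Push callees coldest first so that the hottest one is popped next.
      std::vector<Edge> es = g[f].callees;
      std::stable_sort(es.begin(), es.end(), [](const Edge &a, const Edge &b) {
        return a.freq < b.freq;
      });
      for (const Edge &e : es)
        worklist.push_back(e.callee);
    }
    parts.push_back(part);
  }
  return parts;
}
```

Consider a graph where function 0 calls 1 (frequency 100) and 2 (frequency 1), 1 calls 3 (frequency 50), and 4 calls 2 (frequency 10). Every function has size 10 and the cap is 30. The walk keeps the hot chain 0, 1, 3 together and places 2 next to its other caller, 4.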
parent b4cf69503b
commit 6d9fdf4bf5
18 changed files with 1423 additions and 11 deletions

config/bootstrap-lto-locality.mk (new file, +20)
@@ -0,0 +1,20 @@
+# This option enables LTO and locality partitioning for stage2 and stage3 in slim mode
+
+STAGE2_CFLAGS += -flto=jobserver -frandom-seed=1 -fipa-reorder-for-locality
+STAGE3_CFLAGS += -flto=jobserver -frandom-seed=1 -fipa-reorder-for-locality
+STAGEprofile_CFLAGS += -flto=jobserver -frandom-seed=1 -fipa-reorder-for-locality
+STAGEtrain_CFLAGS += -flto=jobserver -frandom-seed=1 -fipa-reorder-for-locality
+STAGEfeedback_CFLAGS += -flto=jobserver -frandom-seed=1 -fipa-reorder-for-locality
+
+# assumes the host supports the linker plugin
+LTO_AR = $$r/$(HOST_SUBDIR)/prev-gcc/gcc-ar$(exeext) -B$$r/$(HOST_SUBDIR)/prev-gcc/
+LTO_RANLIB = $$r/$(HOST_SUBDIR)/prev-gcc/gcc-ranlib$(exeext) -B$$r/$(HOST_SUBDIR)/prev-gcc/
+LTO_NM = $$r/$(HOST_SUBDIR)/prev-gcc/gcc-nm$(exeext) -B$$r/$(HOST_SUBDIR)/prev-gcc/
+
+LTO_EXPORTS = AR="$(LTO_AR)"; export AR; \
+              RANLIB="$(LTO_RANLIB)"; export RANLIB; \
+              NM="$(LTO_NM)"; export NM;
+LTO_FLAGS_TO_PASS = AR="$(LTO_AR)" RANLIB="$(LTO_RANLIB)" NM="$(LTO_NM)"
+
+do-compare = $(SHELL) $(srcdir)/contrib/compare-lto $$f1 $$f2
+extra-compare = gcc/lto1$(exeext)

gcc/Makefile.in
@@ -1555,6 +1555,7 @@ OBJS = \
     incpath.o \
     init-regs.o \
     internal-fn.o \
+    ipa-locality-cloning.o \
     ipa-cp.o \
     ipa-sra.o \
     ipa-devirt.o \
@@ -3026,6 +3027,7 @@ GTFILES = $(CPPLIB_H) $(srcdir)/input.h $(srcdir)/coretypes.h \
   $(srcdir)/ipa-param-manipulation.h $(srcdir)/ipa-sra.cc \
   $(srcdir)/ipa-modref.h $(srcdir)/ipa-modref.cc \
   $(srcdir)/ipa-modref-tree.h \
+  $(srcdir)/ipa-locality-cloning.cc \
   $(srcdir)/signop.h \
   $(srcdir)/diagnostic-spec.h $(srcdir)/diagnostic-spec.cc \
   $(srcdir)/dwarf2out.h \

gcc/cgraph.h
@@ -2627,6 +2627,7 @@ void tree_function_versioning (tree, tree, vec<ipa_replace_map *, va_gc> *,
 void dump_callgraph_transformation (const cgraph_node *original,
                                     const cgraph_node *clone,
                                     const char *suffix);
+void set_new_clone_decl_and_node_flags (cgraph_node *new_node);
 /* In cgraphbuild.cc */
 int compute_call_stmt_bb_frequency (tree, basic_block bb);
 void record_references_in_initializer (tree, bool);

gcc/cgraphclones.cc
@@ -158,7 +158,7 @@ cgraph_edge::clone (cgraph_node *n, gcall *call_stmt, unsigned stmt_uid,
 /* Set flags of NEW_NODE and its decl. NEW_NODE is a newly created private
    clone or its thunk. */
 
-static void
+void
 set_new_clone_decl_and_node_flags (cgraph_node *new_node)
 {
   DECL_EXTERNAL (new_node->decl) = 0;

gcc/common.opt
@@ -2116,6 +2116,10 @@ fipa-modref
 Common Var(flag_ipa_modref) Optimization
 Perform interprocedural modref analysis.
 
+fipa-reorder-for-locality
+Common Var(flag_ipa_reorder_for_locality) Init(0) Optimization
+Perform reordering and cloning of functions to maximize locality.
+
 fipa-profile
 Common Var(flag_ipa_profile) Init(0) Optimization
 Perform interprocedural profile propagation.
@@ -2274,6 +2278,9 @@ Number of cache entries in incremental LTO after which to prune old entries.
 Enum
 Name(lto_partition_model) Type(enum lto_partition_model) UnknownError(unknown LTO partitioning model %qs)
 
+EnumValue
+Enum(lto_partition_model) String(default) Value(LTO_PARTITION_DEFAULT)
+
 EnumValue
 Enum(lto_partition_model) String(none) Value(LTO_PARTITION_NONE)
 
@@ -2293,7 +2300,7 @@ EnumValue
 Enum(lto_partition_model) String(cache) Value(LTO_PARTITION_CACHE)
 
 flto-partition=
-Common Joined RejectNegative Enum(lto_partition_model) Var(flag_lto_partition) Init(LTO_PARTITION_BALANCED)
+Common Joined RejectNegative Enum(lto_partition_model) Var(flag_lto_partition) Init(LTO_PARTITION_DEFAULT)
 Specify the algorithm to partition symbols and vars at linktime.
 
 ; The initial value of -1 comes from Z_DEFAULT_COMPRESSION in zlib.h.

gcc/doc/invoke.texi
@@ -593,7 +593,7 @@ Objective-C and Objective-C++ Dialects}.
 -finline-functions -finline-functions-called-once -finline-limit=@var{n}
 -finline-small-functions -fipa-modref -fipa-cp -fipa-cp-clone
 -fipa-bit-cp -fipa-vrp -fipa-pta -fipa-profile -fipa-pure-const
--fipa-reference -fipa-reference-addressable
+-fipa-reference -fipa-reference-addressable -fipa-reorder-for-locality
 -fipa-stack-alignment -fipa-icf -fira-algorithm=@var{algorithm}
 -flate-combine-instructions -flifetime-dse -flive-patching=@var{level}
 -fira-region=@var{region} -fira-hoist-pressure
@@ -13871,6 +13871,21 @@ Enabled by default at @option{-O1} and higher.
 Discover read-only, write-only and non-addressable static variables.
 Enabled by default at @option{-O1} and higher.
 
+@opindex fipa-reorder-for-locality
+@item -fipa-reorder-for-locality
+Group call chains close together in the binary layout to improve code
+locality and minimize jump distances between frequently called functions.
+Unlike @option{-freorder-functions} this pass considers the call
+chains between functions and groups them together, rather than grouping all
+hot/normal/cold/never-executed functions into separate sections.
+Unlike @option{-fprofile-reorder-functions} it aims to improve code locality
+throughout the runtime of the program rather than focusing on program startup.
+This option is incompatible with an explicit
+@option{-flto-partition=} option since it enforces a custom partitioning
+scheme.
+If using this option it is recommended to also use profile feedback, but this
+option is not enabled by default otherwise.
+
 @opindex fipa-stack-alignment
 @item -fipa-stack-alignment
 Reduce stack alignment on call sites if possible.
@@ -14606,11 +14621,13 @@ Enabled for x86 at levels @option{-O2}, @option{-O3}, @option{-Os}.
 @opindex freorder-functions
 @item -freorder-functions
 Reorder functions in the object file in order to
-improve code locality. This is implemented by using special
-subsections @code{.text.hot} for most frequently executed functions and
-@code{.text.unlikely} for unlikely executed functions. Reordering is done by
-the linker so object file format must support named sections and linker must
-place them in a reasonable way.
+improve code locality. Unlike @option{-fipa-reorder-for-locality} this option
+prioritises grouping all functions within a category
+(hot/normal/cold/never-executed) together.
+This is implemented by using special subsections @code{.text.hot} for most
+frequently executed functions and @code{.text.unlikely} for unlikely executed
+functions. Reordering is done by the linker so object file format must support
+named sections and linker must place them in a reasonable way.
 
 This option isn't effective unless you either provide profile feedback
 (see @option{-fprofile-arcs} for details) or manually annotate functions with
@@ -15635,7 +15652,8 @@ Enabled by @option{-fprofile-generate}, @option{-fprofile-use}, and
 @item -fprofile-reorder-functions
 Function reordering based on profile instrumentation collects
 first time of execution of a function and orders these functions
-in ascending order.
+in ascending order, aiming to optimize program startup through more
+efficient loading of text segments.
 
 Enabled with @option{-fprofile-use}.
 

gcc/flag-types.h
@@ -404,7 +404,15 @@ enum lto_partition_model {
   LTO_PARTITION_BALANCED = 2,
   LTO_PARTITION_1TO1 = 3,
   LTO_PARTITION_MAX = 4,
-  LTO_PARTITION_CACHE = 5
+  LTO_PARTITION_CACHE = 5,
+  LTO_PARTITION_DEFAULT= 6
 };
 
+/* flag_lto_locality_cloning initialization values. */
+enum lto_locality_cloning_model {
+  LTO_LOCALITY_NO_CLONING = 0,
+  LTO_LOCALITY_NON_INTERPOSABLE_CLONING = 1,
+  LTO_LOCALITY_MAXIMAL_CLONING = 2,
+};
+
 /* flag_lto_linker_output initialization values. */

gcc/ipa-locality-cloning.cc (new file, +1137)
File diff suppressed because it is too large.

gcc/ipa-locality-cloning.h (new file, +35)
@@ -0,0 +1,35 @@
+/* LTO partitioning logic routines.
+   Copyright The GNU Toolchain Authors
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3. If not see
+<http://www.gnu.org/licenses/>. */
+
+#ifndef IPA_LOCALITY_CLONING_H
+#define IPA_LOCALITY_CLONING_H
+
+/* Structure describing locality partitions. */
+struct locality_partition_def
+{
+  int part_id;
+  vec<cgraph_node *> nodes;
+  int insns;
+};
+
+typedef struct locality_partition_def *locality_partition;
+
+extern vec<locality_partition> locality_partitions;
+
+#endif /* IPA_LOCALITY_CLONING_H */

gcc/lto-cgraph.cc
@@ -229,6 +229,8 @@ lto_set_symtab_encoder_in_partition (lto_symtab_encoder_t encoder,
                                      symtab_node *node)
 {
   int index = lto_symtab_encoder_encode (encoder, node);
+  if (dump_file)
+    fprintf(dump_file, "Node %s, index %d\n", node->asm_name(), index);
   encoder->nodes[index].in_partition = true;
 }
 

gcc/lto/lto-partition.cc
@@ -37,6 +37,7 @@ along with GCC; see the file COPYING3. If not see
 #include "ipa-prop.h"
 #include "ipa-fnsummary.h"
 #include "lto-partition.h"
+#include "ipa-locality-cloning.h"
 
 #include <limits>
 
@@ -1418,6 +1419,126 @@ lto_balanced_map (int n_lto_partitions, int max_partition_size)
     }
 }
 
+/* Add all references of NODE into PARTITION. */
+
+static void
+add_node_references_to_partition (ltrans_partition partition, symtab_node *node)
+{
+  struct ipa_ref *ref = NULL;
+  varpool_node *vnode;
+  for (int j = 0; node->iterate_reference (j, ref); j++)
+    if (is_a <varpool_node *> (ref->referred))
+      {
+        vnode = dyn_cast <varpool_node *> (ref->referred);
+        if (!symbol_partitioned_p (vnode)
+            && !vnode->no_reorder
+            && vnode->get_partitioning_class () == SYMBOL_PARTITION)
+          {
+            add_symbol_to_partition (partition, vnode);
+            if (dump_file)
+              fprintf (dump_file, "Varpool Node: %s\n", vnode->dump_asm_name ());
+            add_node_references_to_partition (partition, vnode);
+          }
+      }
+
+  for (int j = 0; node->iterate_referring (j, ref); j++)
+    if (is_a <varpool_node *> (ref->referring))
+      {
+        vnode = dyn_cast <varpool_node *> (ref->referring);
+        gcc_assert (vnode->definition);
+        if (!symbol_partitioned_p (vnode)
+            && !vnode->no_reorder
+            && !vnode->can_remove_if_no_refs_p ()
+            && vnode->get_partitioning_class () == SYMBOL_PARTITION)
+          {
+            add_symbol_to_partition (partition, vnode);
+            if (dump_file)
+              fprintf (dump_file, "Varpool Node: %s\n", vnode->dump_asm_name ());
+            add_node_references_to_partition (partition, vnode);
+          }
+      }
+  if (cgraph_node *cnode = dyn_cast <cgraph_node *> (node))
+    {
+      struct cgraph_edge *e;
+
+      /* Add all inline clones and callees that are duplicated. */
+      for (e = cnode->callees; e; e = e->next_callee)
+        if (e->callee->get_partitioning_class () == SYMBOL_DUPLICATE)
+          add_node_references_to_partition (partition, e->callee);
+
+      /* Add all thunks associated with the function. */
+      for (e = cnode->callers; e; e = e->next_caller)
+        if (e->caller->thunk && !e->caller->inlined_to)
+          add_node_references_to_partition (partition, e->caller);
+    }
+
+}
+
+/* Create and return the created partition of name NAME. */
+
+static ltrans_partition
+create_partition (int &npartitions, const char *name)
+{
+  npartitions++;
+  return new_partition (name);
+}
+
+/* Partitioning for code locality.
+   The partitioning plan (and prerequisite cloning) will have been done by the
+   IPA locality cloning pass. This function just implements that plan by
+   assigning those partitions to ltrans_parititions. */
+
+void
+lto_locality_map (int max_partition_size)
+{
+  symtab_node *snode;
+  int npartitions = 0;
+
+  auto_vec<varpool_node *> varpool_order;
+  struct cgraph_node *node;
+
+  if (locality_partitions.length () == 0)
+    {
+      if (dump_file)
+        {
+          fprintf (dump_file, "Locality partition: falling back to balanced "
+                   "model\n");
+        }
+      lto_balanced_map (param_lto_partitions, param_max_partition_size);
+      return;
+    }
+  ltrans_partition partition = nullptr;
+  for (auto part : locality_partitions)
+    {
+      partition = create_partition (npartitions, "");
+      for (unsigned j = 0; j < part->nodes.length (); j++)
+        {
+          node = part->nodes[j];
+          if (symbol_partitioned_p (node))
+            continue;
+
+          add_symbol_to_partition (partition, node);
+          add_node_references_to_partition (partition, node);
+        }
+    }
+
+  int64_t partition_size = max_partition_size;
+  /* All other unpartitioned symbols. */
+  FOR_EACH_SYMBOL (snode)
+    {
+      if (snode->get_partitioning_class () == SYMBOL_PARTITION
+          && !symbol_partitioned_p (snode))
+        {
+          if (partition->insns > partition_size)
+            partition = create_partition (npartitions, "");
+
+          add_symbol_to_partition (partition, snode);
+          if (dump_file)
+            fprintf (dump_file, "Un-ordered Node: %s\n", snode->dump_asm_name ());
+        }
+    }
+}
+
 /* Return true if we must not change the name of the NODE. The name as
    extracted from the corresponding decl should be passed in NAME. */
 
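To make the control flow of lto_locality_map easier to follow, here is a standalone C++ model of it. Symbols are plain integer ids with an estimated size, `OutPartition` and `locality_map_model` are hypothetical stand-ins for ltrans_partition and the real function, and the reference walk done by add_node_references_to_partition is left out. FOR_EACH_SYMBOL order is assumed to be ascending id order. The model keeps the three steps of the hunk above. It falls back when there is no plan, signalled here by std::nullopt instead of a call to lto_balanced_map. It then emits one partition per planned locality partition, skipping symbols that are already placed. Finally it appends the leftover symbols, opening a new partition once the current one has grown past the size limit.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for an ltrans partition: symbol ids plus the
// accumulated estimated size of those symbols.
struct OutPartition {
  std::vector<int> symbols;
  int64_t insns = 0;
};

// PLAN lists the planned locality partitions (symbol ids); SYM_SIZE gives the
// estimated size of every symbol.  std::nullopt stands for "no plan, fall back
// to the balanced partitioner".
std::optional<std::vector<OutPartition>>
locality_map_model(const std::vector<std::vector<int>> &plan,
                   const std::vector<int64_t> &sym_size,
                   int64_t max_partition_size) {
  if (plan.empty())
    return std::nullopt;

  std::vector<bool> partitioned(sym_size.size(), false);
  std::vector<OutPartition> out;
  auto add = [&](OutPartition &p, int sym) {
    p.symbols.push_back(sym);
    p.insns += sym_size[sym];
    partitioned[sym] = true;
  };

  // One output partition per planned partition; a symbol that is already
  // placed is skipped, like the symbol_partitioned_p check in the real code.
  for (const std::vector<int> &part : plan) {
    out.emplace_back();
    for (int sym : part)
      if (!partitioned[sym])
        add(out.back(), sym);
  }

  // Leftover symbols are appended to the current (last) partition; a new one
  // is opened only once the current partition has grown past the limit.
  for (int sym = 0; sym < static_cast<int>(sym_size.size()); ++sym) {
    if (partitioned[sym])
      continue;
    if (out.back().insns > max_partition_size)
      out.emplace_back();
    add(out.back(), sym);
  }
  return out;
}
```

The size test runs before a symbol is added, so a partition can end up larger than `max_partition_size`. Take sizes {5, 5, 5, 40, 5, 5}, plan {{0, 1}, {2}} and a limit of 10. Symbol 3 still joins the second partition, bringing it to 45 insns, and only symbol 4 opens a third partition.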
@@ -1732,7 +1853,12 @@ lto_promote_cross_file_statics (void)
     {
       ltrans_partition part
         = ltrans_partitions[i];
+      if (dump_file)
+        fprintf (dump_file, "lto_promote_cross_file_statics for part %s %p\n",
+                 part->name, (void *)part->encoder);
       part->encoder = compute_ltrans_boundary (part->encoder);
+      if (dump_file)
+        fprintf (dump_file, "new encoder %p\n", (void *)part->encoder);
     }
 
   lto_clone_numbers = new hash_map<const char *, unsigned>;

gcc/lto/lto-partition.h
@@ -37,6 +37,7 @@ void lto_1_to_1_map (void);
 void lto_max_map (void);
 void lto_cache_map (int, int);
 void lto_balanced_map (int, int);
+void lto_locality_map (int);
 void lto_promote_cross_file_statics (void);
 void free_ltrans_partitions (void);
 void lto_promote_statics_nonwpa (void);

gcc/lto/lto.cc
@@ -547,7 +547,9 @@ do_whole_program_analysis (void)
 
   symtab_node::checking_verify_symtab_nodes ();
   bitmap_obstack_release (NULL);
-  if (flag_lto_partition == LTO_PARTITION_1TO1)
+  if (flag_ipa_reorder_for_locality)
+    lto_locality_map (param_max_locality_partition_size);
+  else if (flag_lto_partition == LTO_PARTITION_1TO1)
     lto_1_to_1_map ();
   else if (flag_lto_partition == LTO_PARTITION_MAX)
     lto_max_map ();

gcc/opts.cc (+23)
@@ -1037,6 +1037,25 @@ report_conflicting_sanitizer_options (struct gcc_options *opts, location_t loc,
     }
 }
 
+/* Validate from OPTS and OPTS_SET that when -fipa-reorder-for-locality is
+   enabled no explicit -flto-partition is also passed as the locality cloning
+   pass uses its own partitioning scheme. */
+
+static void
+validate_ipa_reorder_locality_lto_partition (struct gcc_options *opts,
+                                             struct gcc_options *opts_set)
+{
+  static bool validated_p = false;
+
+  if (opts->x_flag_lto_partition != LTO_PARTITION_DEFAULT)
+    {
+      if (opts_set->x_flag_ipa_reorder_for_locality && !validated_p)
+        error ("%<-fipa-reorder-for-locality%> is incompatible with"
+               " an explicit %qs option", "-flto-partition");
+    }
+  validated_p = true;
+}
+
 /* After all options at LOC have been read into OPTS and OPTS_SET,
    finalize settings of those options and diagnose incompatible
    combinations. */
@@ -1249,6 +1268,10 @@ finish_options (struct gcc_options *opts, struct gcc_options *opts_set,
   if (opts->x_flag_reorder_blocks_and_partition)
     SET_OPTION_IF_UNSET (opts, opts_set, flag_reorder_functions, 1);
 
+  validate_ipa_reorder_locality_lto_partition (opts, opts_set);
+  if (opts_set->x_flag_lto_partition != LTO_PARTITION_DEFAULT)
+    opts_set->x_flag_lto_partition = opts->x_flag_lto_partition = LTO_PARTITION_BALANCED;
+
   /* The -gsplit-dwarf option requires -ggnu-pubnames. */
   if (opts->x_dwarf_split_debug_info)
     opts->x_debug_generate_pub_sections = 2;

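The option handling above and the do_whole_program_analysis change in lto.cc together amount to a small set of rules. The model below captures the intent stated in the commit message. An explicit -flto-partition= is rejected together with -fipa-reorder-for-locality. Otherwise the locality partitioner takes precedence, and "default" resolves to the balanced partitioner. `Options`, `Partitioner`, `validate` and `select_partitioner` are hypothetical simplifications of gcc_options/opts_set and lto_partition_model, not GCC code.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical, simplified names; not GCC's internal identifiers.
enum class Partitioner { Default, None, OneToOne, Balanced, Max, Cache, Locality };

struct Options {
  bool reorder_for_locality = false;              // -fipa-reorder-for-locality
  std::optional<Partitioner> explicit_partition;  // -flto-partition=..., if given
};

// Reject the combination the commit message describes as incompatible.
std::optional<std::string> validate(const Options &o) {
  if (o.reorder_for_locality && o.explicit_partition)
    return "-fipa-reorder-for-locality is incompatible with an explicit "
           "-flto-partition option";
  return std::nullopt;
}

// Pick the partitioner: the locality flag wins, otherwise an explicit choice
// is honoured and "default" falls back to the balanced partitioner.
Partitioner select_partitioner(const Options &o) {
  if (o.reorder_for_locality)
    return Partitioner::Locality;
  Partitioner p = o.explicit_partition.value_or(Partitioner::Default);
  return p == Partitioner::Default ? Partitioner::Balanced : p;
}
```

The validation is kept separate from the selection so that an incompatible combination is diagnosed once, up front, and the later dispatch only ever sees a consistent set of options.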
gcc/params.opt
@@ -469,6 +469,33 @@ Minimal size of a partition for LTO (in estimated instructions).
 Common Joined UInteger Var(param_lto_partitions) Init(128) IntegerRange(1, 65536) Param
 Number of partitions the program should be split to.
 
+Enum
+Name(lto_locality_cloning_model) Type(enum lto_locality_cloning_model) UnknownError(unknown LTO partitioning model %qs)
+
+EnumValue
+Enum(lto_locality_cloning_model) String(no) Value(LTO_LOCALITY_NO_CLONING)
+
+EnumValue
+Enum(lto_locality_cloning_model) String(non_interposable) Value(LTO_LOCALITY_NON_INTERPOSABLE_CLONING)
+
+EnumValue
+Enum(lto_locality_cloning_model) String(maximal) Value(LTO_LOCALITY_MAXIMAL_CLONING)
+
+-param=lto-partition-locality-cloning=
+Common Joined RejectNegative Enum(lto_locality_cloning_model) Var(flag_lto_locality_cloning) Init(LTO_LOCALITY_MAXIMAL_CLONING) Optimization
+
+-param=lto-partition-locality-frequency-cutoff=
+Common Joined UInteger Var(param_lto_locality_frequency) Init(1) IntegerRange(0, 65536) Param Optimization
+The denominator n of fraction 1/n of the execution frequency of callee to be cloned for a particular caller. Special value of 0 dictates to always clone without a cut-off.
+
+-param=lto-partition-locality-size-cutoff=
+Common Joined UInteger Var(param_lto_locality_size) Init(1000) IntegerRange(1, 65536) Param Optimization
+Size cut-off for callee including inlined calls to be cloned for a particular caller.
+
+-param=lto-max-locality-partition=
+Common Joined UInteger Var(param_max_locality_partition_size) Init(1000000) Param
+Maximal size of a locality partition for LTO (in estimated instructions). Value of 0 results in default value being used.
+
 -param=max-average-unrolled-insns=
 Common Joined UInteger Var(param_max_average_unrolled_insns) Init(80) Param Optimization
 The maximum number of instructions to consider to unroll in a loop on average.

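The frequency cut-off parameter is described only in words. The sketch below gives one possible reading of it as a predicate, combined with the size cut-off and the three cloning models. Everything in it is an assumption: `CloningModel`, `CalleeInfo`, `should_clone_for_caller`, the `>=` comparison and the inclusive size limit. The sketch also assumes that the two non-trivial models differ only in whether interposable callees are eligible. The real decision logic lives in ipa-locality-cloning.cc, whose diff is not shown here.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the three values of the lto-partition-locality-cloning param.
enum class CloningModel { No, NonInterposable, Maximal };

// Hypothetical summary of a callee.
struct CalleeInfo {
  int64_t total_count;  // execution count of the callee over all callers
  int size;             // estimated size, including inlined calls
  bool interposable;    // may be replaced at link or load time
};

// Hypothetical decision: should CALLEE be cloned into the partition of a
// caller whose call edge executes EDGE_COUNT times?
bool should_clone_for_caller(CloningModel model, const CalleeInfo &callee,
                             int64_t edge_count, unsigned freq_cutoff_n,
                             int size_cutoff) {
  if (model == CloningModel::No)
    return false;
  // Assumption: only the non_interposable model excludes interposable callees.
  if (model == CloningModel::NonInterposable && callee.interposable)
    return false;
  if (callee.size > size_cutoff)
    return false;
  // freq_cutoff_n == 0 means "always clone, no frequency cut-off".
  if (freq_cutoff_n == 0)
    return true;
  // Require edge_count >= total_count / n, written without a division.
  return edge_count * static_cast<int64_t>(freq_cutoff_n) >= callee.total_count;
}
```

With the default denominator n = 1, a caller whose edge runs 400 times out of a callee's 1000 executions would not get a clone. With n = 4 the threshold drops to 250, so the clone would be made. With n = 0 the frequency test is skipped altogether.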
gcc/passes.def
@@ -162,6 +162,7 @@ along with GCC; see the file COPYING3. If not see
   NEXT_PASS (pass_ipa_sra);
   NEXT_PASS (pass_ipa_fn_summary);
   NEXT_PASS (pass_ipa_inline);
+  NEXT_PASS (pass_ipa_locality_cloning);
   NEXT_PASS (pass_ipa_pure_const);
   NEXT_PASS (pass_ipa_modref);
   NEXT_PASS (pass_ipa_free_fn_summary, false /* small_p */);

gcc/timevar.def
@@ -105,6 +105,7 @@ DEFTIMEVAR (TV_IPA_PURE_CONST , "ipa pure const")
 DEFTIMEVAR (TV_IPA_ICF , "ipa icf")
 DEFTIMEVAR (TV_IPA_PTA , "ipa points-to")
 DEFTIMEVAR (TV_IPA_SRA , "ipa SRA")
+DEFTIMEVAR (TV_IPA_LC , "ipa locality clone")
 DEFTIMEVAR (TV_IPA_FREE_LANG_DATA , "ipa free lang data")
 DEFTIMEVAR (TV_IPA_FREE_INLINE_SUMMARY, "ipa free inline summary")
 DEFTIMEVAR (TV_IPA_MODREF , "ipa modref")

gcc/tree-pass.h
@@ -551,6 +551,7 @@ extern ipa_opt_pass_d *make_pass_ipa_cdtor_merge (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_single_use (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_comdats (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_modref (gcc::context *ctxt);
+extern ipa_opt_pass_d *make_pass_ipa_locality_cloning (gcc::context *ctxt);
 
 extern gimple_opt_pass *make_pass_cleanup_cfg_post_optimizing (gcc::context
                                                                *ctxt);