diff --git a/libstdc++-v3/doc/html/manual/profile_mode.html b/libstdc++-v3/doc/html/manual/profile_mode.html
deleted file mode 100644
index 39c732180ac..00000000000
--- a/libstdc++-v3/doc/html/manual/profile_mode.html
+++ /dev/null
@@ -1,145 +0,0 @@
-
- Goal: Give performance improvement advice based on - recognition of suboptimal usage patterns of the standard library. -
- Method: Wrap the standard library code. Insert - calls to an instrumentation library to record the internal state of - various components at interesting entry/exit points to/from the standard - library. Process trace, recognize suboptimal patterns, give advice. - For details, see the - Perflint - paper presented at CGO 2009. -
- Strengths: -
- Unintrusive solution. The application code does not require any - modification. -
The advice is call context sensitive, thus capable of - identifying precisely interesting dynamic performance behavior. -
- The overhead model is pay-per-view. When you turn off a diagnostic class - at compile time, its overhead disappears. -
-
- Drawbacks: -
- You must recompile the application code with custom options. -
You must run the application on representative input. - The advice is input dependent. -
- The execution time will increase, in some cases by factors. -
-
- This is the anticipated common workflow for program foo.cc:
-
-$ cat foo.cc
-#include <vector>
-int main() {
-  std::vector<int> v;
-  for (int k = 0; k < 1024; ++k) v.insert(v.begin(), k);
-}
-
-$ g++ -D_GLIBCXX_PROFILE foo.cc
-$ ./a.out
-$ cat libstdcxx-profile.txt
-vector-to-list: improvement = 5: call stack = 0x804842c ...
-    : advice = change std::vector to std::list
-vector-size: improvement = 3: call stack = 0x804842c ...
-    : advice = change initial container size from 0 to 1024
-
-
- Anatomy of a warning: -
- Warning id. This is a short descriptive string for the class - that this warning belongs to. E.g., "vector-to-list". -
- Estimated improvement. This is an approximation of the benefit expected from implementing the change suggested by the warning. It is given on a log10 scale. Negative values mean that the alternative would actually do worse than the current choice. In the example above, 5 comes from the fact that the overhead of inserting at the beginning of a vector vs. a list is around 1024 * 1024 / 2 element moves, which is on the order of 10^5. The improvement from setting the initial size to 1024 is on the order of 10^3, since the overhead of dynamic resizing is linear in this case.
-
- Call stack. Currently, the addresses are printed without - symbol name or code location attribution. - Users are expected to postprocess the output using, for instance, addr2line. -
- The warning message. For some warnings, this is static text, e.g., - "change vector to list". For other warnings, such as the one above, - the message contains numeric advice, e.g., the suggested initial size - of the vector. -
-
Three files are generated. libstdcxx-profile.txt contains human readable advice. libstdcxx-profile.raw contains implementation specific data about each diagnostic. Its format is not documented. It is sufficient to generate all the advice given in libstdcxx-profile.txt. The advantage of keeping this raw format is that traces from multiple executions can be aggregated simply by concatenating the raw traces. We intend to offer an external utility program that can issue advice from a trace. libstdcxx-profile.conf.out lists the actual diagnostic parameters used. To alter parameters, edit this file and rename it to libstdcxx-profile.conf.
-
Advice is given regardless of whether the transformation is valid. For instance, we advise changing a map to an unordered_map even if the application semantics require that data be ordered. We believe such warnings can help users understand the performance behavior of their application better, which can lead to changes at a higher abstraction level.
-
Compile time switches and environment variables (see also file - profiler.h). Unless specified otherwise, they can be set at compile time - using -D_<name> or by setting variable <name> - in the environment where the program is run, before starting execution. -
- _GLIBCXX_PROFILE_NO_<diagnostic>: disable specific diagnostics. See section Diagnostics for possible values. (Environment variables not supported.)
-
- _GLIBCXX_PROFILE_TRACE_PATH_ROOT: set an alternative root path for the output files.
-
- _GLIBCXX_PROFILE_MAX_WARN_COUNT: set it to the maximum number of warnings desired. The default value is 10.
-
- _GLIBCXX_PROFILE_MAX_STACK_DEPTH: if set to 0, the advice will be collected and reported for the program as a whole, and not for each call context. This could also be used in continuous regression tests, where you just need to know whether there is a regression or not. The default value is 32.
-
- _GLIBCXX_PROFILE_MEM_PER_DIAGNOSTIC: set a limit on how much memory to use for the accounting tables for each diagnostic type. When this limit is reached, new events are ignored until the memory usage decreases under the limit. Generally, this means that newly created containers will not be instrumented until some live containers are deleted. The default is 128 MB.
-
- _GLIBCXX_PROFILE_NO_THREADS: Make the library not use threads. If thread local storage (TLS) is not available, you will get a preprocessor error asking you to set -D_GLIBCXX_PROFILE_NO_THREADS if your program is single-threaded. Multithreaded execution without TLS is not supported. (Environment variable not supported.)
-
- _GLIBCXX_HAVE_EXECINFO_H: This name should be defined automatically at library configuration time. If your library was configured without execinfo.h, but you have it in your include path, you can define it explicitly. Without it, advice is collected for the program as a whole, and not for each call context. (Environment variable not supported.)
-
-
- Many large projects use their own data structures instead of the ones in the
- standard library. If these data structures are similar in functionality
- to the standard library, they can be instrumented with the same hooks
- that are used to instrument the standard library.
- The instrumentation API is exposed in file
- profiler.h
(look for "Instrumentation hooks").
-
- Currently, the cost model uses formulas with predefined relative weights - for alternative containers or container implementations. For instance, - iterating through a vector is X times faster than iterating through a list. -
- (Under development.) - We are working on customizing this to a particular machine by providing - an automated way to compute the actual relative weights for operations - on the given machine. -
- (Under development.) We plan to provide a performance parameter database format that can be filled in either by hand or by an automated training mechanism. The analysis module will then use this database instead of the built-in generic parameters.
-
-
Table 19.1. Profile Code Location
Code Location | Use |
---|---|
libstdc++-v3/include/std/* | Preprocessor code to redirect to profile extension headers. |
libstdc++-v3/include/profile/* | Profile extension public headers (map, vector, ...). |
libstdc++-v3/include/profile/impl/* | Profile extension internals. Implementation files are only included from impl/profiler.h, which is the only file included from the public headers. |
-
- In order to get our instrumented library version included instead of the release one, we use the same wrapper model as the debug mode. We subclass entities from the release version. Wherever _GLIBCXX_PROFILE is defined, the release namespace is std::__norm, whereas the profile namespace is std::__profile. Using plain std translates into std::__profile.
-
- Whenever possible, we try to wrap at the public interface level, e.g., in unordered_set rather than in hashtable, in order not to depend on implementation.
-
- Mixing object files built with and without the profile mode must not affect the program execution. However, there are no guarantees about the accuracy of diagnostics when using even a single object not built with -D_GLIBCXX_PROFILE. Currently, mixing the profile mode with debug and parallel extensions is not allowed. Mixing them at compile time will result in preprocessor errors. Mixing them at link time is undefined.
-
- Instead of instrumenting every public entry and exit point, - we chose to add instrumentation on demand, as needed - by individual diagnostics. - The main reason is that some diagnostics require us to extract bits of - internal state that are particular only to that diagnostic. - We plan to formalize this later, after we learn more about the requirements - of several diagnostics. -
- All the instrumentation points can be switched on and off using -D[_NO]_GLIBCXX_PROFILE_<diagnostic> options. With all the instrumentation calls off, there should be negligible overhead over the release version. This property is needed to support diagnostics based on timing of internal operations. For such diagnostics, we anticipate turning most of the instrumentation off in order to prevent profiling overhead from polluting time measurements, and thus diagnostics.
-
- All the instrumentation on/off compile time switches live in
- include/profile/profiler.h
.
-
- For practical reasons, the instrumentation library processes the trace partially rather than dumping it to disk in raw form. Each event is processed when it occurs. It is usually assigned a cost and aggregated into the database of a specific diagnostic class. The cost model is based largely on the standard performance guarantees, but in some cases we use knowledge about GCC's standard library implementation.
-
- Information is indexed by (1) call stack and (2) instance id or address - to be able to understand and summarize precise creation-use-destruction - dynamic chains. Although the analysis is sensitive to dynamic instances, - the reports are only sensitive to call context. Whenever a dynamic instance - is destroyed, we accumulate its effect to the corresponding entry for the - call stack of its constructor location. -
- For details, see the Perflint paper presented at CGO 2009.
-
- Final analysis takes place offline, and it is based entirely on the - generated trace and debugging info in the application binary. - See section Diagnostics for a list of analysis types that we plan to support. -
- The input to the analysis is a table indexed by profile type and call stack. - The data type for each entry depends on the profile type. -
- While it is likely that cost models become complex as we get into - more sophisticated analysis, we will try to follow a simple set of rules - at the beginning. -
Relative benefit estimation:
- The idea is to estimate or measure the cost of all operations
- in the original scenario versus the scenario we advise to switch to.
- For instance, when advising to change a vector to a list, an occurrence
- of the insert
method will generally count as a benefit.
- Its magnitude depends on (1) the number of elements that get shifted
- and (2) whether it triggers a reallocation.
-
Synthetic measurements: - We will measure the relative difference between similar operations on - different containers. We plan to write a battery of small tests that - compare the times of the executions of similar methods on different - containers. The idea is to run these tests on the target machine. - If this training phase is very quick, we may decide to perform it at - library initialization time. The results can be cached on disk and reused - across runs. -
Timers: - We plan to use timers for operations of larger granularity, such as sort. - For instance, we can switch between different sort methods on the fly - and report the one that performs best for each call context. -
Show stoppers: We may decide that the presence of an operation nullifies the advice. For instance, when considering switching from set to unordered_set, if we detect use of operator ++, we will simply not issue the advice, since this could signal that the use case requires a sorted container.
-There are two types of reports. First, if we recognize a pattern for which we have a substitute that is likely to give better performance, we print the advice and estimated performance gain. The advice is usually associated with a code position and possibly a call stack.
-
-Second, we report performance characteristics for which we do not have a clear solution for improvement. For instance, we can point the user to the top 10 multimap locations that have the worst data locality in actual traversals. Although this does not offer a solution, it helps the user focus on the key problems and ignore the uninteresting ones.
-
- First, we want to make sure we preserve the behavior of the release mode.
- You can just type "make check-profile"
, which
- builds and runs the whole test suite in profile mode.
-
- Second, we want to test the correctness of each diagnostic.
- We created a profile
directory in the test suite.
- Each diagnostic must come with at least two tests, one for false positives
- and one for false negatives.
-
The profile mode headers are included with
- -D_GLIBCXX_PROFILE
through preprocessor directives in
- include/std/*
.
-
Instrumented implementations are provided in
- include/profile/*
. All instrumentation hooks are macros
- defined in include/profile/profiler.h
.
-
All the implementation of the instrumentation hooks is in include/profile/impl/*. Although all the code gets included, and thus is publicly visible, only a small number of functions are called from outside this directory. All calls to hook implementations must be done through macros defined in profiler.h. The macro must ensure (1) that the call is guarded against reentrance and (2) that the call can be turned off at compile time using a -D_GLIBCXX_PROFILE_... compiler option.
-
Let's say the diagnostic name is "magic". -
If you need to instrument a header not already under
- include/profile/*
, first edit the corresponding header
- under include/std/
and add a preprocessor directive such
- as the one in include/std/vector
:
-
-#ifdef _GLIBCXX_PROFILE
-# include <profile/vector>
-#endif
-
-
If the file you need to instrument is not yet under
- include/profile/
, make a copy of the one in
- include/debug
, or the main implementation.
- You'll need to include the main implementation and inherit the classes
- you want to instrument. Then define the methods you want to instrument,
- define the instrumentation hooks and add calls to them.
- Look at include/profile/vector
for an example.
-
Add macros for the instrumentation hooks in include/profile/impl/profiler.h. Hook names must start with __profcxx_. Make sure they expand to no code with -D_NO_GLIBCXX_PROFILE_MAGIC. Make sure all calls to any method in namespace __gnu_profile are protected against reentrance using macro _GLIBCXX_PROFILE_REENTRANCE_GUARD. All names of methods in namespace __gnu_profile called from profiler.h must start with __trace_magic_.
-
Add the implementation of the diagnostic. -
- Create new file include/profile/impl/profiler_magic.h
.
-
- Define class __magic_info: public __object_info_base. This is the representation of a line in the object table. The __merge method is used to aggregate information across all dynamic instances created at the same call context. The __magnitude method must return the estimation of the benefit as a number of small operations, e.g., number of words copied. The __write method is used to produce the raw trace. The __advice method is used to produce the advice string.
-
- Define class __magic_stack_info: public __magic_info
.
- This defines the content of a line in the stack table.
-
- Define class __trace_magic: public __trace_base<__magic_info,
- __magic_stack_info>
.
- It defines the content of the trace associated with this diagnostic.
-
-
Add initialization and reporting calls in
- include/profile/impl/profiler_trace.h
. Use
- __trace_vector_to_list
as an example.
-
Add documentation in file doc/xml/manual/profile_mode.xml
.
-
- Accurate stack traces are needed during profiling since we group events by - call context and dynamic instance. Without accurate traces, diagnostics - may be hard to interpret. For instance, when giving advice to the user - it is imperative to reference application code, not library code. -
- Currently we are using the libc backtrace routine to get stack traces. _GLIBCXX_PROFILE_MAX_STACK_DEPTH can be set to 0 if you are willing to give up call context information, or to a small positive value to reduce run time overhead.
-
- The profiling and analysis phases use only instruction addresses. - An external utility such as addr2line is needed to postprocess the result. - We do not plan to add symbolization support in the profile extension. - This would require access to symbol tables, debug information tables, - external programs or libraries and other system dependent information. -
- Our current model is simplistic, but precise. - We cannot afford to approximate because some of our diagnostics require - precise matching of operations to container instance and call context. - During profiling, we keep a single information table per diagnostic. - There is a single lock per information table. -
- As much as we would like to avoid uses of libstdc++ within our - instrumentation library, containers such as unordered_map are very - appealing. We plan to use them as long as they are named properly - to avoid ambiguity. -
- User applications/libraries can provide malloc hooks. When the implementation of the malloc hooks uses libstdc++, there can be an infinite cycle between the profile mode instrumentation and the malloc hook code.
-
- We protect against reentrance to the profile mode instrumentation code, - which should avoid this problem in most cases. - The protection mechanism is thread safe and exception safe. - This mechanism does not prevent reentrance to the malloc hook itself, - which could still result in deadlock, if, for instance, the malloc hook - uses non-recursive locks. - XXX: A definitive solution to this problem would be for the profile extension - to use a custom allocator internally, and perhaps not to use libstdc++. -
- The profiling library state is initialized at the first call to a profiling method. This allows us to record the construction of all global objects. However, we cannot do the same at destruction time. The trace is written by a function registered by atexit, thus invoked by exit.
-