diff --git a/libstdc++-v3/doc/html/manual/profile_mode.html b/libstdc++-v3/doc/html/manual/profile_mode.html deleted file mode 100644 index 39c732180ac..00000000000 --- a/libstdc++-v3/doc/html/manual/profile_mode.html +++ /dev/null @@ -1,145 +0,0 @@ - -Chapter 19. Profile Mode

Chapter 19. Profile Mode

Table of Contents

Intro
Using the Profile Mode
Tuning the Profile Mode
Design
Wrapper Model
Instrumentation
Run Time Behavior
Analysis and Diagnostics
Cost Model
Reports
Testing
Extensions for Custom Containers
Empirical Cost Model
Implementation Issues
Stack Traces
Symbolization of Instruction Addresses
Concurrency
Using the Standard Library in the Instrumentation Implementation
Malloc Hooks
Construction and Destruction of Global Objects
Developer Information
Big Picture
How To Add A Diagnostic
Diagnostics
Diagnostic Template
Containers
Hashtable Too Small
Hashtable Too Large
Inefficient Hash
Vector Too Small
Vector Too Large
Vector to Hashtable
Hashtable to Vector
Vector to List
List to Vector
List to Forward List (Slist)
Ordered to Unordered Associative Container
Algorithms
Sort Algorithm Performance
Data Locality
Need Software Prefetch
Linked Structure Locality
Multithreaded Data Access
Data Dependence Violations at Container Level
False Sharing
Statistics
Bibliography

Intro

- Goal: Give performance improvement advice based on recognition of suboptimal usage patterns of the standard library.

- Method: Wrap the standard library code. Insert calls to an instrumentation library to record the internal state of various components at interesting entry and exit points of the standard library. Process the trace, recognize suboptimal patterns, and give advice. For details, see the Perflint paper presented at CGO 2009.

- Strengths:

  • Unintrusive solution. The application code does not require any modification.

  • The advice is call-context sensitive and can therefore pinpoint interesting dynamic performance behavior.

  • The overhead model is pay-per-view. When you turn off a diagnostic class at compile time, its overhead disappears.

-

- Drawbacks:

  • You must recompile the application code with custom options.

  • You must run the application on representative input. The advice is input-dependent.

  • The execution time will increase, in some cases by a large factor.

-

Using the Profile Mode

- This is the anticipated common workflow for program foo.cc:

-$ cat foo.cc
-#include <vector>
-int main() {
-  std::vector<int> v;
-  for (int k = 0; k < 1024; ++k) v.insert(v.begin(), k);
-}
-
-$ g++ -D_GLIBCXX_PROFILE foo.cc
-$ ./a.out
-$ cat libstdcxx-profile.txt
-vector-to-list: improvement = 5: call stack = 0x804842c ...
-    : advice = change std::vector to std::list
-vector-size: improvement = 3: call stack = 0x804842c ...
-    : advice = change initial container size from 0 to 1024
-

-

- Anatomy of a warning:

  • Warning id. This is a short descriptive string for the class that this warning belongs to, for example "vector-to-list".

  • Estimated improvement. This is an approximation of the benefit expected from implementing the change suggested by the warning. It is given on a log10 scale. Negative values mean that the alternative would actually do worse than the current choice. In the example above, 5 comes from the fact that the overhead of inserting at the beginning of a vector vs. a list is around 1024 * 1024 / 2, which is on the order of 10^5. The improvement from setting the initial size to 1024 is on the order of 10^3, since the overhead of dynamic resizing is linear in this case.

  • Call stack. Currently, the addresses are printed without symbol name or code location attribution. Users are expected to postprocess the output using, for instance, addr2line.

  • The warning message. For some warnings, this is static text, e.g., "change vector to list". For other warnings, such as the one above, the message contains numeric advice, e.g., the suggested initial size of the vector.

-
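The arithmetic behind the two improvement figures can be checked with a few lines of C++. This is a sketch under assumptions: `front_insert_shifts` and `log10_magnitude` are hypothetical helpers, and rounding down to an integer is only a guess at how the reported value is derived.

```cpp
#include <cmath>
#include <cstddef>

// Element moves caused by inserting n elements, one at a time, at the front
// of a vector: the k-th insert shifts the k elements already stored, so the
// total is 0 + 1 + ... + (n - 1) = n * (n - 1) / 2.
std::size_t front_insert_shifts(std::size_t n) {
  std::size_t shifts = 0;
  for (std::size_t size = 0; size < n; ++size) shifts += size;
  return shifts;
}

// Order of magnitude on a log10 scale, rounded down (assumed rounding).
int log10_magnitude(std::size_t ops) {
  if (ops == 0) return 0;
  return static_cast<int>(std::floor(std::log10(static_cast<double>(ops))));
}
```

For n = 1024 this gives 523776 moves, close to the 1024 * 1024 / 2 quoted above, and a magnitude of 5. For the resizing advice, assuming the capacity doubles as the vector grows to 1024 elements, about 1 + 2 + ... + 512 = 1023 elements are copied, and `log10_magnitude(1023)` is 3.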

Three files are generated. libstdcxx-profile.txt contains human-readable advice. libstdcxx-profile.raw contains implementation-specific data about each diagnostic. Its format is not documented, but it is sufficient to generate all the advice given in libstdcxx-profile.txt. The advantage of keeping this raw format is that traces from multiple executions can be aggregated simply by concatenating the raw traces. We intend to offer an external utility program that can issue advice from a trace. libstdcxx-profile.conf.out lists the actual diagnostic parameters used. To alter parameters, edit this file and rename it to libstdcxx-profile.conf.

Advice is given regardless of whether the transformation is valid. For instance, we advise changing a map to an unordered_map even if the application semantics require that data be ordered. We believe such warnings can help users better understand the performance behavior of their application, which can lead to changes at a higher abstraction level.

Tuning the Profile Mode

Compile-time switches and environment variables (see also file profiler.h). Unless specified otherwise, they can be set at compile time using -D<name> or by setting the environment variable <name> before the program starts.

  • _GLIBCXX_PROFILE_NO_<diagnostic>: disable specific diagnostics. See section Diagnostics for possible values. (Environment variables not supported.)

  • _GLIBCXX_PROFILE_TRACE_PATH_ROOT: set an alternative root path for the output files.

  • _GLIBCXX_PROFILE_MAX_WARN_COUNT: set it to the maximum number of warnings desired. The default value is 10.

  • _GLIBCXX_PROFILE_MAX_STACK_DEPTH: if set to 0, the advice will be collected and reported for the program as a whole, and not for each call context. This is also useful in continuous regression tests, where you only need to know whether or not there is a regression. The default value is 32.

  • _GLIBCXX_PROFILE_MEM_PER_DIAGNOSTIC: set a limit on how much memory to use for the accounting tables for each diagnostic type. When this limit is reached, new events are ignored until memory usage drops below the limit. Generally, this means that newly created containers will not be instrumented until some live containers are deleted. The default is 128 MB.

  • _GLIBCXX_PROFILE_NO_THREADS: disable the library's use of threads. If thread-local storage (TLS) is not available, you will get a preprocessor error asking you to set -D_GLIBCXX_PROFILE_NO_THREADS if your program is single-threaded. Multithreaded execution without TLS is not supported. (Environment variable not supported.)

  • _GLIBCXX_HAVE_EXECINFO_H: this name should be defined automatically at library configuration time. If your library was configured without execinfo.h, but you have it in your include path, you can define it explicitly. Without it, advice is collected for the program as a whole, and not for each call context. (Environment variable not supported.)

-
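The "compile-time default, overridable from the environment" behavior described above might look like the following. This is a hypothetical sketch, not the code in profiler.h: `setting_from_env` is an invented helper, and only the variable name _GLIBCXX_PROFILE_MAX_WARN_COUNT and its default of 10 come from the list above.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: read a numeric setting from the environment, falling
// back to the compile-time default when the variable is unset, empty, not a
// plain decimal number, or out of range.
std::size_t setting_from_env(const char* name, std::size_t compile_time_default) {
  const char* text = std::getenv(name);
  if (text == nullptr || *text == '\0') return compile_time_default;
  for (const char* p = text; *p != '\0'; ++p)
    if (*p < '0' || *p > '9') return compile_time_default;  // reject "-1", "10x"
  errno = 0;
  unsigned long long value = std::strtoull(text, nullptr, 10);
  if (errno == ERANGE) return compile_time_default;
  return static_cast<std::size_t>(value);
}
```

For example, `setting_from_env("_GLIBCXX_PROFILE_MAX_WARN_COUNT", 10)` returns 10 unless the environment supplies a valid number. Falling back instead of aborting keeps a typo in the environment from ending the profiled run. Switches marked "Environment variable not supported" would only have the compile-time half.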

Bibliography

- Perflint: A Context Sensitive Performance Advisor for C++ Programs. Lixia Liu, Silvius Rus. Proceedings of the 2009 International Symposium on Code Generation and Optimization, 2009.

\ No newline at end of file diff --git a/libstdc++-v3/doc/html/manual/profile_mode_api.html b/libstdc++-v3/doc/html/manual/profile_mode_api.html deleted file mode 100644 index e63bd5701c6..00000000000 --- a/libstdc++-v3/doc/html/manual/profile_mode_api.html +++ /dev/null @@ -1,9 +0,0 @@ - -Extensions for Custom Containers

Extensions for Custom Containers

- Many large projects use their own data structures instead of the ones in the standard library. If these data structures are similar in functionality to the standard library containers, they can be instrumented with the same hooks that are used to instrument the standard library. The instrumentation API is exposed in file profiler.h (look for "Instrumentation hooks").

\ No newline at end of file diff --git a/libstdc++-v3/doc/html/manual/profile_mode_cost_model.html b/libstdc++-v3/doc/html/manual/profile_mode_cost_model.html deleted file mode 100644 index bc87048b4df..00000000000 --- a/libstdc++-v3/doc/html/manual/profile_mode_cost_model.html +++ /dev/null @@ -1,17 +0,0 @@ - -Empirical Cost Model

Empirical Cost Model

- Currently, the cost model uses formulas with predefined relative weights for alternative containers or container implementations. For instance, iterating through a vector is X times faster than iterating through a list.

- (Under development.) We are working on customizing this to a particular machine by providing an automated way to compute the actual relative weights for operations on the given machine.

- (Under development.) We plan to provide a performance parameter database format that can be filled in either by hand or by an automated training mechanism. The analysis module will then use this database instead of the built-in generic parameters.
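An automated way to compute a relative weight such as X could start from a timing harness like the one below. It is only a sketch of the idea: `time_traversal_ns` and `list_over_vector_weight` are hypothetical names, and a single timed traversal is far too noisy for a real training phase, which would repeat runs and cover several sizes and operations.

```cpp
#include <chrono>
#include <cstddef>
#include <list>
#include <numeric>
#include <vector>

// Time one full traversal of a container, in nanoseconds. The sum is added
// to `sink` so each traversal produces a value that is later used.
template <typename Container>
double time_traversal_ns(const Container& c, long long& sink) {
  auto start = std::chrono::steady_clock::now();
  sink += std::accumulate(c.begin(), c.end(), 0LL);
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::nano>(stop - start).count();
}

// Hypothetical training step: how many times slower is iterating a list
// than iterating a vector holding the same n elements on this machine?
double list_over_vector_weight(std::size_t n) {
  std::vector<int> v(n, 1);
  std::list<int> l(v.begin(), v.end());
  long long sink = 0;
  double tv = time_traversal_ns(v, sink);
  double tl = time_traversal_ns(l, sink);
  volatile long long keep = sink;  // observable use so traversals are kept
  (void)keep;
  return tv > 0.0 ? tl / tv : 0.0;  // 0 if the clock could not resolve tv
}
```

A database entry could then store the measured ratio per machine in place of a built-in constant.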

\ No newline at end of file diff --git a/libstdc++-v3/doc/html/manual/profile_mode_design.html b/libstdc++-v3/doc/html/manual/profile_mode_design.html deleted file mode 100644 index 8ce51c88950..00000000000 --- a/libstdc++-v3/doc/html/manual/profile_mode_design.html +++ /dev/null @@ -1,121 +0,0 @@ - -Design

Design

-

Table 19.1. Profile Code Location

Code Location | Use
libstdc++-v3/include/std/* | Preprocessor code to redirect to profile extension headers.
libstdc++-v3/include/profile/* | Profile extension public headers (map, vector, ...).
libstdc++-v3/include/profile/impl/* | Profile extension internals. Implementation files are only included from impl/profiler.h, which is the only file included from the public headers.

-

Wrapper Model

- In order to get our instrumented library version included instead of the release one, we use the same wrapper model as the debug mode. We subclass entities from the release version. Whenever _GLIBCXX_PROFILE is defined, the release namespace is std::__norm, whereas the profile namespace is std::__profile. Using plain std translates into std::__profile.

- Whenever possible, we try to wrap at the public interface level, e.g., in unordered_set rather than in hashtable, so as not to depend on implementation details.
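The wrapper model can be illustrated with a toy version outside namespace std. The names `demo::norm`, `demo::profile`, and `front_inserts` are hypothetical stand-ins for std::__norm, std::__profile, and the state a diagnostic records. The real headers select the namespace based on _GLIBCXX_PROFILE rather than with an unconditional using-declaration.

```cpp
#include <cstddef>
#include <vector>

namespace demo {

// Stand-in for the release implementation (std::__norm in the text).
namespace norm {
template <typename T>
class vector : public std::vector<T> {};
}  // namespace norm

// Instrumented wrapper (std::__profile in the text): it subclasses the
// release type and redefines only the member functions a diagnostic needs.
namespace profile {
template <typename T>
class vector : public norm::vector<T> {
  using base = norm::vector<T>;

 public:
  std::size_t front_inserts = 0;  // state a vector-to-list check could use

  typename base::iterator insert(typename base::const_iterator pos, const T& value) {
    if (pos == this->cbegin()) ++front_inserts;  // instrumentation point
    return base::insert(pos, value);             // forward to release code
  }
};
}  // namespace profile

// Plain demo::vector resolves to the instrumented version, the way plain
// std resolves to std::__profile when _GLIBCXX_PROFILE is defined.
using profile::vector;

}  // namespace demo
```

Because the wrapper derives from the release class and redefines only `insert`, every other member function is inherited unchanged, which keeps wrapping at the public interface level cheap.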

- Mixing object files built with and without the profile mode must not affect the program execution. However, there are no guarantees about the accuracy of diagnostics when even a single object is not built with -D_GLIBCXX_PROFILE. Currently, mixing the profile mode with debug and parallel extensions is not allowed. Mixing them at compile time will result in preprocessor errors. Mixing them at link time results in undefined behavior.

Instrumentation

- Instead of instrumenting every public entry and exit point, we chose to add instrumentation on demand, as needed by individual diagnostics. The main reason is that some diagnostics require us to extract bits of internal state that are specific to that diagnostic. We plan to formalize this later, after we learn more about the requirements of several diagnostics.

- All the instrumentation points can be switched on and off using -D[_NO]_GLIBCXX_PROFILE_<diagnostic> options. With all the instrumentation calls off, there should be negligible overhead over the release version. This property is needed to support diagnostics based on timing of internal operations. For such diagnostics, we anticipate turning most of the instrumentation off in order to prevent profiling overhead from polluting time measurements, and thus diagnostics.

- All the instrumentation on/off compile-time switches live in include/profile/impl/profiler.h.
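Here is a minimal sketch of how a compile-time switch can make a hook vanish entirely. The macro names `DEMO_PROFCXX_MAGIC_EVENT` and `DEMO_NO_PROFILE_MAGIC` and the counter are hypothetical. The real switches follow the -D[_NO]_GLIBCXX_PROFILE_<diagnostic> scheme described above.

```cpp
#include <cstddef>

namespace demo_profile {
// Hypothetical per-diagnostic counter standing in for real trace state.
inline std::size_t& magic_events() {
  static std::size_t count = 0;
  return count;
}
inline void trace_magic_event() { ++magic_events(); }
}  // namespace demo_profile

// The hook is a macro, so when the diagnostic is switched off at compile
// time the call disappears instead of becoming a call to an empty function.
#if defined(DEMO_NO_PROFILE_MAGIC)
#define DEMO_PROFCXX_MAGIC_EVENT() ((void)0)
#else
#define DEMO_PROFCXX_MAGIC_EVENT() ::demo_profile::trace_magic_event()
#endif

// Library code calls the hook unconditionally.
inline void library_operation() { DEMO_PROFCXX_MAGIC_EVENT(); }
```

Compiling with -DDEMO_NO_PROFILE_MAGIC turns the hook into `((void)0)`, so `library_operation` then contains no instrumentation at all.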

Run Time Behavior

- For practical reasons, the instrumentation library processes the trace partially rather than dumping it to disk in raw form. Each event is processed when it occurs. It is usually assigned a cost and aggregated into the database of a specific diagnostic class. The cost model is based largely on the standard performance guarantees, but in some cases we use knowledge about GCC's standard library implementation.

- Information is indexed by (1) call stack and (2) instance id or address, so that we can understand and summarize precise creation-use-destruction dynamic chains. Although the analysis is sensitive to dynamic instances, the reports are only sensitive to call context. Whenever a dynamic instance is destroyed, we add its effect to the entry for the call stack of its constructor location.
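The two-level indexing can be sketched as a pair of tables. One is keyed by instance address and holds live objects. The other is keyed by call stack and receives an instance's accumulated cost when that instance is destroyed. `trace_table` and its member functions are hypothetical, and the cost unit is left abstract.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <unordered_map>
#include <utility>
#include <vector>

using call_stack = std::vector<std::uintptr_t>;  // return addresses

// Per-instance record: where the object was constructed and what it cost.
struct instance_info {
  call_stack construction_site;
  std::size_t cost = 0;
};

class trace_table {
  std::unordered_map<const void*, instance_info> live_;  // keyed by address
  std::map<call_stack, std::size_t> by_context_;         // keyed by call stack

 public:
  void on_construct(const void* obj, call_stack site) {
    instance_info info;
    info.construction_site = std::move(site);
    live_[obj] = std::move(info);
  }
  void on_event(const void* obj, std::size_t cost) {
    auto it = live_.find(obj);
    if (it != live_.end()) it->second.cost += cost;
  }
  // On destruction, fold the instance's cost into its construction context.
  void on_destruct(const void* obj) {
    auto it = live_.find(obj);
    if (it == live_.end()) return;
    by_context_[it->second.construction_site] += it->second.cost;
    live_.erase(it);
  }
  std::size_t cost_for(const call_stack& site) const {
    auto it = by_context_.find(site);
    return it == by_context_.end() ? 0 : it->second;
  }
};
```

Two objects built at the same call stack end up in one report entry, even though their costs were tracked separately while they were alive.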

- For details, see the Perflint paper presented at CGO 2009.

Analysis and Diagnostics

- Final analysis takes place offline, and it is based entirely on the generated trace and debugging info in the application binary. See section Diagnostics for a list of analysis types that we plan to support.

- The input to the analysis is a table indexed by profile type and call stack. The data type for each entry depends on the profile type.

Cost Model

- While cost models will likely become complex as we get into more sophisticated analysis, we will try to follow a simple set of rules at the beginning.

  • Relative benefit estimation: The idea is to estimate or measure the cost of all operations in the original scenario versus the scenario we advise to switch to. For instance, when advising to change a vector to a list, an occurrence of the insert method will generally count as a benefit. Its magnitude depends on (1) the number of elements that get shifted and (2) whether it triggers a reallocation.

  • Synthetic measurements: We will measure the relative difference between similar operations on different containers. We plan to write a battery of small tests that compare the execution times of similar methods on different containers. The idea is to run these tests on the target machine. If this training phase is very quick, we may decide to perform it at library initialization time. The results can be cached on disk and reused across runs.

  • Timers: We plan to use timers for operations of larger granularity, such as sort. For instance, we can switch between different sort methods on the fly and report the one that performs best for each call context.

  • Show stoppers: We may decide that the presence of an operation nullifies the advice. For instance, when considering switching from set to unordered_set, if we detect use of operator ++, we will simply not issue the advice, since this could signal that the use case requires a sorted container.
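The relative benefit of a single insert, as described in the first rule, could be modeled as follows. The unit costs are assumptions for illustration. A vector insert moves every element after the insertion point, or every element when the buffer is full and must be reallocated. A list insert counts as one operation. `estimate_insert` and `list_benefit` are hypothetical names.

```cpp
#include <cstddef>

// Hypothetical unit costs for one insert, counted as element moves.
struct insert_costs {
  std::size_t vector_ops;
  std::size_t list_ops;
};

// Insert at index `pos` into a vector holding `size` elements with room for
// `capacity`. Without reallocation, the elements after pos shift right; with
// reallocation, every element is moved once into the new buffer. Either way
// the new element itself costs one more operation. A list links one node.
insert_costs estimate_insert(std::size_t size, std::size_t capacity, std::size_t pos) {
  const bool reallocates = (size == capacity);
  const std::size_t moved = reallocates ? size : size - pos;
  return {moved + 1, 1};
}

// Positive values favor switching to a list for this operation.
long long list_benefit(const insert_costs& c) {
  return static_cast<long long>(c.vector_ops) - static_cast<long long>(c.list_ops);
}
```

Summing `list_benefit` over every insert seen at one call context gives a per-context benefit that could then be reported on the log10 scale.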

Reports

-There are two types of reports. First, if we recognize a pattern for which we have a substitute that is likely to give better performance, we print the advice and estimated performance gain. The advice is usually associated with a code position and possibly a call stack.

-Second, we report performance characteristics for which we do not have a clear solution for improvement. For instance, we can point the user to the top 10 multimap locations that have the worst data locality in actual traversals. Although this does not offer a solution, it helps the user focus on the key problems and ignore the uninteresting ones.
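Selecting the locations to report, such as the top 10, amounts to a partial sort by estimated improvement. `warning` and `top_warnings` are hypothetical types for this sketch. A cap such as _GLIBCXX_PROFILE_MAX_WARN_COUNT could plausibly be passed as `max_count`.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical report entry: warning id plus estimated log10 improvement.
struct warning {
  std::string id;
  double improvement;
};

// Keep only the max_count entries with the largest estimated improvement,
// ordered best first. partial_sort avoids sorting the whole list.
std::vector<warning> top_warnings(std::vector<warning> all, std::size_t max_count) {
  const std::size_t n = std::min(max_count, all.size());
  auto mid = all.begin() + static_cast<std::ptrdiff_t>(n);
  std::partial_sort(all.begin(), mid, all.end(),
                    [](const warning& a, const warning& b) {
                      return a.improvement > b.improvement;
                    });
  all.erase(mid, all.end());
  return all;
}
```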

Testing

- First, we want to make sure we preserve the behavior of the release mode. You can just type "make check-profile", which builds and runs the whole test suite in profile mode.

- Second, we want to test the correctness of each diagnostic. We created a profile directory in the test suite. Each diagnostic must come with at least two tests, one for false positives and one for false negatives.

\ No newline at end of file diff --git a/libstdc++-v3/doc/html/manual/profile_mode_devel.html b/libstdc++-v3/doc/html/manual/profile_mode_devel.html deleted file mode 100644 index 768c610ba80..00000000000 --- a/libstdc++-v3/doc/html/manual/profile_mode_devel.html +++ /dev/null @@ -1,67 +0,0 @@ - -Developer Information

Developer Information

Big Picture

The profile mode headers are included with -D_GLIBCXX_PROFILE through preprocessor directives in include/std/*.

Instrumented implementations are provided in include/profile/*. All instrumentation hooks are macros defined in include/profile/impl/profiler.h.

The instrumentation hooks are implemented in include/profile/impl/*. Although all of this code gets included, and is therefore publicly visible, only a small number of functions are called from outside this directory. All calls to hook implementations must be done through macros defined in profiler.h. The macro must ensure (1) that the call is guarded against reentrance and (2) that the call can be turned off at compile time using a -D_GLIBCXX_PROFILE_... compiler option.

How To Add A Diagnostic

Let's say the diagnostic name is "magic".

If you need to instrument a header not already under include/profile/*, first edit the corresponding header under include/std/ and add a preprocessor directive such as the one in include/std/vector:

-#ifdef _GLIBCXX_PROFILE
-# include <profile/vector>
-#endif
-

-

If the file you need to instrument is not yet under include/profile/, make a copy of the one in include/debug, or of the main implementation. You'll need to include the main implementation and inherit the classes you want to instrument. Then define the methods you want to instrument, define the instrumentation hooks, and add calls to them. Look at include/profile/vector for an example.

Add macros for the instrumentation hooks in include/profile/impl/profiler.h. Hook names must start with __profcxx_. Make sure they expand to no code with -D_NO_GLIBCXX_PROFILE_MAGIC. Make sure all calls to any method in namespace __gnu_profile are protected against reentrance using macro _GLIBCXX_PROFILE_REENTRANCE_GUARD. All names of methods in namespace __gnu_profile called from profiler.h must start with __trace_magic_.
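The reentrance requirement can be sketched with a thread-local flag and an RAII guard, which makes the guard both thread safe and exception safe. Everything here is hypothetical. `reentrance_guard`, `profcxx_magic_event`, and `user_hook` only mimic the roles of _GLIBCXX_PROFILE_REENTRANCE_GUARD, a __profcxx_ hook, and a user malloc hook. The names avoid leading double underscores, which are reserved for the implementation.

```cpp
namespace demo_profile {

// One flag per thread: true while this thread is executing instrumentation.
inline bool& inside_instrumentation() {
  thread_local bool flag = false;
  return flag;
}

// RAII guard: claims the flag on entry and releases it on scope exit, also
// when an exception propagates. A nested entry on the same thread does not
// claim the flag and reports that it must bail out.
class reentrance_guard {
  bool owner_;

 public:
  reentrance_guard() : owner_(!inside_instrumentation()) {
    if (owner_) inside_instrumentation() = true;
  }
  ~reentrance_guard() {
    if (owner_) inside_instrumentation() = false;
  }
  reentrance_guard(const reentrance_guard&) = delete;
  reentrance_guard& operator=(const reentrance_guard&) = delete;
  bool entered() const { return owner_; }
};

int magic_calls = 0;            // stands in for the diagnostic's trace
void (*user_hook)() = nullptr;  // stands in for a user malloc hook

void trace_magic_event() {
  reentrance_guard guard;
  if (!guard.entered()) return;           // re-entered from our own work
  ++magic_calls;
  if (user_hook != nullptr) user_hook();  // work that may call back into us
}

}  // namespace demo_profile

// Hypothetical hook macro, analogous to a __profcxx_ macro in profiler.h.
#define profcxx_magic_event() ::demo_profile::trace_magic_event()
```

A nested call made through `user_hook` sees the flag already set and returns immediately, so each top-level event is counted exactly once.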

Add the implementation of the diagnostic.

  • Create new file include/profile/impl/profiler_magic.h.

  • Define class __magic_info: public __object_info_base. This is the representation of a line in the object table. The __merge method is used to aggregate information across all dynamic instances created at the same call context. The __magnitude method must return an estimate of the benefit as a number of small operations, e.g., the number of words copied. The __write method is used to produce the raw trace. The __advice method is used to produce the advice string.

  • Define class __magic_stack_info: public __magic_info. This defines the content of a line in the stack table.

  • Define class __trace_magic: public __trace_base<__magic_info, __magic_stack_info>. It defines the content of the trace associated with this diagnostic.

-
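The roles of these classes can be sketched in plain C++. The base class, the method names without leading underscores, and the `ops` field are hypothetical simplifications. The real __object_info_base and __trace_base interfaces are defined in the impl headers and may differ.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical simplified base: one row of the object table.
struct object_info_base {
  virtual ~object_info_base() = default;
  virtual float magnitude() const = 0;           // benefit in small operations
  virtual void write(std::FILE* out) const = 0;  // raw trace line
  virtual std::string advice() const = 0;        // human-readable advice
};

// Object-table row for the hypothetical "magic" diagnostic.
class magic_info : public object_info_base {
 public:
  std::size_t ops = 0;  // e.g. words copied that the advice would avoid

  // Aggregate another instance created at the same call context.
  void merge(const magic_info& other) { ops += other.ops; }
  float magnitude() const override { return static_cast<float>(ops); }
  void write(std::FILE* out) const override { std::fprintf(out, "%zu\n", ops); }
  std::string advice() const override {
    return "magic: avoidable operations = " + std::to_string(ops);
  }
};

// Stack-table row: same content, keyed by call stack instead of instance.
class magic_stack_info : public magic_info {
 public:
  explicit magic_stack_info(const magic_info& row) : magic_info(row) {}
};
```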

Add initialization and reporting calls in include/profile/impl/profiler_trace.h. Use __trace_vector_to_list as an example.

Add documentation in file doc/xml/manual/profile_mode.xml.

\ No newline at end of file diff --git a/libstdc++-v3/doc/html/manual/profile_mode_impl.html b/libstdc++-v3/doc/html/manual/profile_mode_impl.html deleted file mode 100644 index e9495273d52..00000000000 --- a/libstdc++-v3/doc/html/manual/profile_mode_impl.html +++ /dev/null @@ -1,50 +0,0 @@ - -Implementation Issues

Implementation Issues

Stack Traces

- Accurate stack traces are needed during profiling since we group events by call context and dynamic instance. Without accurate traces, diagnostics may be hard to interpret. For instance, when giving advice to the user it is imperative to reference application code, not library code.

- Currently we are using the libc backtrace routine to get stack traces. _GLIBCXX_PROFILE_MAX_STACK_DEPTH can be set to 0 if you are willing to give up call context information, or to a small positive value to reduce run time overhead.
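Capturing a bounded call stack with the glibc backtrace function might look like this. `capture_stack` is a hypothetical helper. A depth of 0 returns an empty stack, which corresponds to giving up call context information.

```cpp
#include <execinfo.h>

#include <cstddef>
#include <cstdint>
#include <vector>

// Capture at most max_depth return addresses of the current call stack.
// A depth of 0 (or less) collects nothing, i.e. no call context.
std::vector<std::uintptr_t> capture_stack(int max_depth) {
  std::vector<std::uintptr_t> stack;
  if (max_depth <= 0) return stack;
  std::vector<void*> frames(static_cast<std::size_t>(max_depth));
  int n = backtrace(frames.data(), max_depth);
  for (int i = 0; i < n; ++i)
    stack.push_back(reinterpret_cast<std::uintptr_t>(frames[i]));
  return stack;
}
```

A smaller depth means fewer frames to walk and store per event, which is where the run time saving comes from. The raw addresses are what an external tool such as addr2line would later map to source locations.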

Symbolization of Instruction Addresses

- The profiling and analysis phases use only instruction addresses. An external utility such as addr2line is needed to postprocess the result. We do not plan to add symbolization support in the profile extension. This would require access to symbol tables, debug information tables, external programs or libraries, and other system-dependent information.

Concurrency

- Our current model is simplistic, but precise. We cannot afford to approximate because some of our diagnostics require precise matching of operations to container instance and call context. During profiling, we keep a single information table per diagnostic. There is a single lock per information table.
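A single table guarded by a single lock can be sketched with std::mutex. `locked_table` is hypothetical and records only a per-instance cost. Every access, read or write, takes the same lock. This keeps the matching of operations to instances exact, at the price of contention between threads.

```cpp
#include <cstddef>
#include <mutex>
#include <unordered_map>

// Hypothetical per-diagnostic table: every access goes through one mutex.
class locked_table {
  std::mutex lock_;
  std::unordered_map<const void*, std::size_t> cost_;  // keyed by instance

 public:
  void record(const void* instance, std::size_t cost) {
    std::lock_guard<std::mutex> hold(lock_);
    cost_[instance] += cost;
  }
  std::size_t total(const void* instance) {
    std::lock_guard<std::mutex> hold(lock_);
    auto it = cost_.find(instance);
    return it == cost_.end() ? 0 : it->second;
  }
};
```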

Using the Standard Library in the Instrumentation Implementation

- As much as we would like to avoid uses of libstdc++ within our instrumentation library, containers such as unordered_map are very appealing. We plan to use them as long as they are named properly to avoid ambiguity.

Malloc Hooks

- User applications/libraries can provide malloc hooks. When the implementation of the malloc hooks uses libstdc++, there can be an infinite cycle between the profile mode instrumentation and the malloc hook code.

- We protect against reentrance to the profile mode instrumentation code, which should avoid this problem in most cases. The protection mechanism is thread safe and exception safe. This mechanism does not prevent reentrance to the malloc hook itself, which could still result in deadlock if, for instance, the malloc hook uses non-recursive locks. XXX: A definitive solution to this problem would be for the profile extension to use a custom allocator internally, and perhaps not to use libstdc++.

Construction and Destruction of Global Objects

- The profiling library state is initialized at the first call to a profiling method. This allows us to record the construction of all global objects. However, we cannot do the same at destruction time. The trace is written by a function registered with atexit, which exit invokes. Events that occur after that function runs, such as the destruction of some global objects, are therefore not recorded.
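Lazy initialization on first use combined with an atexit handler can be sketched as follows. The names are hypothetical. The state is allocated with new and never deleted, so it is still valid when the handler runs. A function-local static object would instead be destroyed before a handler registered during its own construction.

```cpp
#include <cstdio>
#include <cstdlib>

namespace demo_profile {

struct state {
  unsigned long events = 0;
};

void write_report();

// Created on the first instrumentation call, which may come from the
// constructor of a global object in any translation unit.
state& get_state() {
  static state* s = [] {
    std::atexit(write_report);  // runs when exit() processes its handlers
    return new state;           // never deleted: must outlive all statics
  }();
  return *s;
}

void write_report() {
  std::fprintf(stderr, "events recorded: %lu\n", get_state().events);
}

void record_event() { ++get_state().events; }

}  // namespace demo_profile
```

Events recorded after `write_report` has run still update the counter, but they never reach the output. This is the destruction-time gap described above.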

\ No newline at end of file