mid-end: Use integral time intervals in timevar.cc
On some AArch64 bootstrapped builds, we were getting a flaky test because the floating point operations in `get_time` were being fused with the floating point operations in `timevar_accumulate`. This meant that the rounding behaviour of our multiplication with `ticks_to_msec` was different when used in `timer::start` and when performed in `timer::stop`. These extra inaccuracies led to the testcase `g++.dg/ext/timevar1.C` being flaky on some hardware. ------------------------------ Avoiding the inlining which was agreed to be undesirable. Three alternative approaches: 1) Use `-ffp-contract=on` to avoid this particular optimisation. 2) Adjusting the code so that the "tolerance" is always of the order of a "tick". 3) Recording times and elapsed differences in integral values. - Could be in terms of a standard measurement (e.g. nanoseconds or microseconds). - Could be in terms of whatever integral value ("ticks" / secondsµseconds / "clock ticks") is returned from the syscall chosen at configure time. While `-ffp-contract=on` removes the problem that I bumped into, there has been a similar bug on x86 that was to do with a different floating point problem that also happens after `get_time` and `timevar_accumulate` both being inlined into the same function. Hence it seems worth choosing a different approach. Of the two other solutions, recording measurements in integral values seems the most robust against slightly "off" measurements being presented to the user -- even though it could avoid the ICE that creates a flaky test. I considered storing time in whatever units our syscall returns and normalising them at the time we print out rather than normalising them to nanoseconds at the point we record our "current time". The logic being that normalisation could have some rounding affect (e.g. if TICKS_PER_SECOND is 3) that would be taken into account in calculations. I decided against it in order to give the values recorded in `timevar_time_def` some interpretive value so it's easier to read the code. Compared to the small rounding that would represent a tiny amount of time and AIUI can not trigger the same kind of ICE's as we are attempting to fix, said interpretive value seems more valuable. Recording time in microseconds seemed reasonable since all obvious values for ticks and `getrusage` are at microsecond granularity or less precise. That said, since TICKS_PER_SECOND and CLOCKS_PER_SEC are both variables given to use by the host system I was not sure of that enough to make this decision. ------------------------------ timer::all_zero is ignoring rows which are inconsequential to the user and would be printed out as all zeros. Since upon printing rows we convert to the same double value and print out the same precision as before, we return true/false based on the same amount of time as before. timer::print_row casts to a floating point measurement in units of seconds as was printed out before. timer::validate_phases -- I'm printing out nanoseconds here rather than floating point seconds since this is an error message for when things have "gone wrong" printing out the actual nanoseconds that have been recorded seems like the best approach. N.b. since we now print out nanoseconds instead of floating point value the padding requirements are different. Originally we were padding to 24 characters and printing 18 decimal places. This looked odd with the now visually smaller values getting printed. I judged 13 characters (corresponding to 2 hours) to be a reasonable point at which our alignment could start to degrade and this provides a more compact output for the majority of cases (checked by triggering the error case via GDB). ------------------------------ N.b. I use a literal 1000000000 for "NANOSEC_PER_SEC". I believe this would fit in an integer on all hosts that GCC supports, but am not certain there are not strange integer sizes we support hence am pointing it out for special attention during review. ------------------------------ No expected change in generated code. Bootstrapped and regtested on AArch64 with no regressions. Hope this is acceptable -- I had originally planned to use `-ffp-contract` as agreed until I saw mention of the old x86 bug in the same area which was not to do with floating point contraction of operations (PR 99903). gcc/ChangeLog: PR middle-end/110316 PR middle-end/9903 * timevar.cc (NANOSEC_PER_SEC, TICKS_TO_NANOSEC, CLOCKS_TO_NANOSEC, nanosec_to_floating_sec, percent_of): New. (TICKS_TO_MSEC, CLOCKS_TO_MSEC): Remove these macros. (timer::validate_phases): Use integral arithmetic to check validity. (timer::print_row, timer::print): Convert from integral nanoseconds to floating point seconds before printing. (timer::all_zero): Change limit to nanosec count instead of fractional count of seconds. (make_json_for_timevar_time_def): Convert from integral nanoseconds to floating point seconds before recording. * timevar.h (struct timevar_time_def): Update all measurements to use uint64_t nanoseconds rather than seconds stored in a double.
This commit is contained in:
parent
04aa0edcac
commit
0782b01c9e
2 changed files with 56 additions and 44 deletions
|
@ -43,7 +43,7 @@ struct tms
|
|||
# define RUSAGE_SELF 0
|
||||
#endif
|
||||
|
||||
/* Calculation of scale factor to convert ticks to microseconds.
|
||||
/* Calculation of scale factor to convert ticks to seconds.
|
||||
We mustn't use CLOCKS_PER_SEC except with clock(). */
|
||||
#if HAVE_SYSCONF && defined _SC_CLK_TCK
|
||||
# define TICKS_PER_SECOND sysconf (_SC_CLK_TCK) /* POSIX 1003.1-1996 */
|
||||
|
@ -92,14 +92,15 @@ struct tms
|
|||
the underlying constants, and that can be very slow, so we have to
|
||||
precompute them. Whose wonderful idea was it to make all those
|
||||
_constants_ variable at run time, anyway? */
|
||||
#define NANOSEC_PER_SEC 1000000000
|
||||
#ifdef USE_TIMES
|
||||
static double ticks_to_msec;
|
||||
#define TICKS_TO_MSEC (1 / (double)TICKS_PER_SECOND)
|
||||
static uint64_t ticks_to_nanosec;
|
||||
#define TICKS_TO_NANOSEC (NANOSEC_PER_SEC / TICKS_PER_SECOND)
|
||||
#endif
|
||||
|
||||
#ifdef USE_CLOCK
|
||||
static double clocks_to_msec;
|
||||
#define CLOCKS_TO_MSEC (1 / (double)CLOCKS_PER_SEC)
|
||||
static uint64_t clocks_to_nanosec;
|
||||
#define CLOCKS_TO_NANOSEC (NANOSEC_PER_SEC / CLOCKS_PER_SEC)
|
||||
#endif
|
||||
|
||||
/* Non-NULL if timevars should be used. In GCC, this happens with
|
||||
|
@ -243,18 +244,20 @@ get_time (struct timevar_time_def *now)
|
|||
{
|
||||
#ifdef USE_TIMES
|
||||
struct tms tms;
|
||||
now->wall = times (&tms) * ticks_to_msec;
|
||||
now->user = tms.tms_utime * ticks_to_msec;
|
||||
now->sys = tms.tms_stime * ticks_to_msec;
|
||||
now->wall = times (&tms) * ticks_to_nanosec;
|
||||
now->user = tms.tms_utime * ticks_to_nanosec;
|
||||
now->sys = tms.tms_stime * ticks_to_nanosec;
|
||||
#endif
|
||||
#ifdef USE_GETRUSAGE
|
||||
struct rusage rusage;
|
||||
getrusage (RUSAGE_SELF, &rusage);
|
||||
now->user = rusage.ru_utime.tv_sec + rusage.ru_utime.tv_usec * 1e-6;
|
||||
now->sys = rusage.ru_stime.tv_sec + rusage.ru_stime.tv_usec * 1e-6;
|
||||
now->user = rusage.ru_utime.tv_sec * NANOSEC_PER_SEC
|
||||
+ rusage.ru_utime.tv_usec * 1000;
|
||||
now->sys = rusage.ru_stime.tv_sec * NANOSEC_PER_SEC
|
||||
+ rusage.ru_stime.tv_usec * 1000;
|
||||
#endif
|
||||
#ifdef USE_CLOCK
|
||||
now->user = clock () * clocks_to_msec;
|
||||
now->user = clock () * clocks_to_nanosec;
|
||||
#endif
|
||||
}
|
||||
}
|
||||
|
@ -305,10 +308,10 @@ timer::timer () :
|
|||
/* Initialize configuration-specific state.
|
||||
Ideally this would be one-time initialization. */
|
||||
#ifdef USE_TIMES
|
||||
ticks_to_msec = TICKS_TO_MSEC;
|
||||
ticks_to_nanosec = TICKS_TO_NANOSEC;
|
||||
#endif
|
||||
#ifdef USE_CLOCK
|
||||
clocks_to_msec = CLOCKS_TO_MSEC;
|
||||
clocks_to_nanosec = CLOCKS_TO_NANOSEC;
|
||||
#endif
|
||||
}
|
||||
|
||||
|
@ -617,12 +620,11 @@ timer::validate_phases (FILE *fp) const
|
|||
{
|
||||
unsigned int /* timevar_id_t */ id;
|
||||
const timevar_time_def *total = &m_timevars[TV_TOTAL].elapsed;
|
||||
double phase_user = 0.0;
|
||||
double phase_sys = 0.0;
|
||||
double phase_wall = 0.0;
|
||||
uint64_t phase_user = 0;
|
||||
uint64_t phase_sys = 0;
|
||||
uint64_t phase_wall = 0;
|
||||
size_t phase_ggc_mem = 0;
|
||||
static char phase_prefix[] = "phase ";
|
||||
const double tolerance = 1.000001; /* One part in a million. */
|
||||
|
||||
for (id = 0; id < (unsigned int) TIMEVAR_LAST; ++id)
|
||||
{
|
||||
|
@ -641,26 +643,32 @@ timer::validate_phases (FILE *fp) const
|
|||
}
|
||||
}
|
||||
|
||||
if (phase_user > total->user * tolerance
|
||||
|| phase_sys > total->sys * tolerance
|
||||
|| phase_wall > total->wall * tolerance
|
||||
|| phase_ggc_mem > total->ggc_mem * tolerance)
|
||||
if (phase_user > total->user
|
||||
|| phase_sys > total->sys
|
||||
|| phase_wall > total->wall
|
||||
|| phase_ggc_mem > total->ggc_mem)
|
||||
{
|
||||
|
||||
fprintf (fp, "Timing error: total of phase timers exceeds total time.\n");
|
||||
if (phase_user > total->user)
|
||||
fprintf (fp, "user %24.18e > %24.18e\n", phase_user, total->user);
|
||||
fprintf (fp, "user %13" PRIu64 " > %13" PRIu64 "\n",
|
||||
phase_user, total->user);
|
||||
if (phase_sys > total->sys)
|
||||
fprintf (fp, "sys %24.18e > %24.18e\n", phase_sys, total->sys);
|
||||
fprintf (fp, "sys %13" PRIu64 " > %13" PRIu64 "\n",
|
||||
phase_sys, total->sys);
|
||||
if (phase_wall > total->wall)
|
||||
fprintf (fp, "wall %24.18e > %24.18e\n", phase_wall, total->wall);
|
||||
fprintf (fp, "wall %13" PRIu64 " > %13" PRIu64 "\n",
|
||||
phase_wall, total->wall);
|
||||
if (phase_ggc_mem > total->ggc_mem)
|
||||
fprintf (fp, "ggc_mem %24lu > %24lu\n", (unsigned long)phase_ggc_mem,
|
||||
fprintf (fp, "ggc_mem %13lu > %13lu\n", (unsigned long)phase_ggc_mem,
|
||||
(unsigned long)total->ggc_mem);
|
||||
gcc_unreachable ();
|
||||
}
|
||||
}
|
||||
|
||||
#define nanosec_to_floating_sec(NANO) ((double)(NANO) * 1e-9)
|
||||
#define percent_of(TOTAL, SUBTOTAL) \
|
||||
((TOTAL) == 0 ? 0 : ((double)SUBTOTAL / TOTAL) * 100)
|
||||
/* Helper function for timer::print. */
|
||||
|
||||
void
|
||||
|
@ -674,22 +682,22 @@ timer::print_row (FILE *fp,
|
|||
#ifdef HAVE_USER_TIME
|
||||
/* Print user-mode time for this process. */
|
||||
fprintf (fp, "%7.2f (%3.0f%%)",
|
||||
elapsed.user,
|
||||
(total->user == 0 ? 0 : elapsed.user / total->user) * 100);
|
||||
nanosec_to_floating_sec (elapsed.user),
|
||||
percent_of (total->user, elapsed.user));
|
||||
#endif /* HAVE_USER_TIME */
|
||||
|
||||
#ifdef HAVE_SYS_TIME
|
||||
/* Print system-mode time for this process. */
|
||||
fprintf (fp, "%7.2f (%3.0f%%)",
|
||||
elapsed.sys,
|
||||
(total->sys == 0 ? 0 : elapsed.sys / total->sys) * 100);
|
||||
nanosec_to_floating_sec (elapsed.sys),
|
||||
percent_of (total->sys, elapsed.sys));
|
||||
#endif /* HAVE_SYS_TIME */
|
||||
|
||||
#ifdef HAVE_WALL_TIME
|
||||
/* Print wall clock time elapsed. */
|
||||
fprintf (fp, "%7.2f (%3.0f%%)",
|
||||
elapsed.wall,
|
||||
(total->wall == 0 ? 0 : elapsed.wall / total->wall) * 100);
|
||||
nanosec_to_floating_sec (elapsed.wall),
|
||||
percent_of (total->wall, elapsed.wall));
|
||||
#endif /* HAVE_WALL_TIME */
|
||||
|
||||
/* Print the amount of ggc memory allocated. */
|
||||
|
@ -707,7 +715,8 @@ timer::print_row (FILE *fp,
|
|||
bool
|
||||
timer::all_zero (const timevar_time_def &elapsed)
|
||||
{
|
||||
const double tiny = 5e-3;
|
||||
/* 5000000 nanosec == 5e-3 seconds. */
|
||||
uint64_t tiny = 5000000;
|
||||
return (elapsed.user < tiny
|
||||
&& elapsed.sys < tiny
|
||||
&& elapsed.wall < tiny
|
||||
|
@ -800,13 +809,13 @@ timer::print (FILE *fp)
|
|||
/* Print total time. */
|
||||
fprintf (fp, " %-35s:", "TOTAL");
|
||||
#ifdef HAVE_USER_TIME
|
||||
fprintf (fp, "%7.2f ", total->user);
|
||||
fprintf (fp, "%7.2f ", nanosec_to_floating_sec (total->user));
|
||||
#endif
|
||||
#ifdef HAVE_SYS_TIME
|
||||
fprintf (fp, "%8.2f ", total->sys);
|
||||
fprintf (fp, "%8.2f ", nanosec_to_floating_sec (total->sys));
|
||||
#endif
|
||||
#ifdef HAVE_WALL_TIME
|
||||
fprintf (fp, "%8.2f ", total->wall);
|
||||
fprintf (fp, "%8.2f ", nanosec_to_floating_sec (total->wall));
|
||||
#endif
|
||||
fprintf (fp, PRsa (7) "\n", SIZE_AMOUNT (total->ggc_mem));
|
||||
|
||||
|
@ -832,12 +841,16 @@ json::object *
|
|||
make_json_for_timevar_time_def (const timevar_time_def &ttd)
|
||||
{
|
||||
json::object *obj = new json::object ();
|
||||
obj->set ("user", new json::float_number (ttd.user));
|
||||
obj->set ("sys", new json::float_number (ttd.sys));
|
||||
obj->set ("wall", new json::float_number (ttd.wall));
|
||||
obj->set ("user",
|
||||
new json::float_number (nanosec_to_floating_sec (ttd.user)));
|
||||
obj->set ("sys", new json::float_number (nanosec_to_floating_sec (ttd.sys)));
|
||||
obj->set ("wall",
|
||||
new json::float_number (nanosec_to_floating_sec (ttd.wall)));
|
||||
obj->set ("ggc_mem", new json::integer_number (ttd.ggc_mem));
|
||||
return obj;
|
||||
}
|
||||
#undef nanosec_to_floating_sec
|
||||
#undef percent_of
|
||||
|
||||
/* Create a json value representing this object, suitable for use
|
||||
in SARIF output. */
|
||||
|
|
|
@ -46,7 +46,7 @@ namespace json { class value; }
|
|||
*/
|
||||
|
||||
/* This structure stores the various varieties of time that can be
|
||||
measured. Times are stored in seconds. The time may be an
|
||||
measured. Times are stored in nanoseconds. The time may be an
|
||||
absolute time or a time difference; in the former case, the time
|
||||
base is undefined, except that the difference between two times
|
||||
produces a valid time difference. */
|
||||
|
@ -54,14 +54,13 @@ namespace json { class value; }
|
|||
struct timevar_time_def
|
||||
{
|
||||
/* User time in this process. */
|
||||
double user;
|
||||
uint64_t user;
|
||||
|
||||
/* System time (if applicable for this host platform) in this
|
||||
process. */
|
||||
double sys;
|
||||
/* System time (if applicable for this host platform) in this process. */
|
||||
uint64_t sys;
|
||||
|
||||
/* Wall clock time. */
|
||||
double wall;
|
||||
uint64_t wall;
|
||||
|
||||
/* Garbage collector memory. */
|
||||
size_t ggc_mem;
|
||||
|
|
Loading…
Add table
Reference in a new issue