diff mbox

[RFC] Disable -fprofile-use related optimizations if corresponding .gcda file not found.

Message ID 5615442E.9090200@partner.samsung.com
State New
Headers show

Commit Message

max Oct. 7, 2015, 4:11 p.m. UTC
Hi,

when testing OpenSSL performance, I found out that sometimes PGO-built 
binaries can actually introduce performance regressions. We could 
identify affected object files and disable PGO for them by simply 
removing corresponding .gcda file. However, even if profile data is not 
presented, GCC doesn't switch back -fprofile-use dependent optimizations 
(e.g. -fbranch-probabilities, -fvpt, etc). This may also lead to 
performance degradations.

The issue had already raised quite time ago 
(https://gcc.gnu.org/ml/gcc-patches/2009-09/msg02119.html), but for some 
reasons wasn't discussed.

Here a draft patch that disables -fprofile-use related optimizations if 
profile data wasn't found (perhaps it makes sense to introduce a special 
switch for this?). Does this look reasonable?

Thanks,
-Maxim

Comments

Andrew Pinski Oct. 7, 2015, 4:18 p.m. UTC | #1
On Wed, Oct 7, 2015 at 9:11 AM, Maxim Ostapenko
<m.ostapenko@partner.samsung.com> wrote:
> Hi,
>
> when testing OpenSSL performance, I found out that sometimes PGO-built
> binaries can actually introduce performance regressions. We could identify
> affected object files and disable PGO for them by simply removing
> corresponding .gcda file. However, even if profile data is not presented,
> GCC doesn't switch back -fprofile-use dependent optimizations (e.g.
> -fbranch-probabilities, -fvpt, etc). This may also lead to performance
> degradations.
>
> The issue had already raised quite time ago
> (https://gcc.gnu.org/ml/gcc-patches/2009-09/msg02119.html), but for some
> reasons wasn't discussed.
>
> Here a draft patch that disables -fprofile-use related optimizations if
> profile data wasn't found (perhaps it makes sense to introduce a special
> switch for this?). Does this look reasonable?

I thought if profile is not present, then branch probabilities goes
back to the original heuristics?
Which option is really causing the performance degradation here?

Also I think your patch is very incomplete as someone could use
-frename-registers with -fprofile-use and then you just turned it off.

Thanks,
Andrew Pinski

>
> Thanks,
> -Maxim
max Oct. 7, 2015, 4:28 p.m. UTC | #2
On 07/10/15 19:18, Andrew Pinski wrote:
> On Wed, Oct 7, 2015 at 9:11 AM, Maxim Ostapenko
> <m.ostapenko@partner.samsung.com> wrote:
>> Hi,
>>
>> when testing OpenSSL performance, I found out that sometimes PGO-built
>> binaries can actually introduce performance regressions. We could identify
>> affected object files and disable PGO for them by simply removing
>> corresponding .gcda file. However, even if profile data is not presented,
>> GCC doesn't switch back -fprofile-use dependent optimizations (e.g.
>> -fbranch-probabilities, -fvpt, etc). This may also lead to performance
>> degradations.
>>
>> The issue had already raised quite time ago
>> (https://gcc.gnu.org/ml/gcc-patches/2009-09/msg02119.html), but for some
>> reasons wasn't discussed.
>>
>> Here a draft patch that disables -fprofile-use related optimizations if
>> profile data wasn't found (perhaps it makes sense to introduce a special
>> switch for this?). Does this look reasonable?
> I thought if profile is not present, then branch probabilities goes
> back to the original heuristics?
> Which option is really causing the performance degradation here?

-fprofile-use enabled -fpeel-loops that in turn enabled 
-frename-registers. This caused the scheduler to transform the code in 
sched2 pass.

> Also I think your patch is very incomplete as someone could use
> -frename-registers with -fprofile-use and then you just turned it off.
>
> Thanks,
> Andrew Pinski

Doesn't -fprofile-use enable -frename-registers transitively through 
-fpeel-loops?

>> Thanks,
>> -Maxim
Markus Trippelsdorf Oct. 7, 2015, 4:43 p.m. UTC | #3
On 2015.10.07 at 19:11 +0300, Maxim Ostapenko wrote:
> when testing OpenSSL performance, I found out that sometimes PGO-built 
> binaries can actually introduce performance regressions. We could 
> identify affected object files and disable PGO for them by simply 
> removing corresponding .gcda file. However, even if profile data is not 
> presented, GCC doesn't switch back -fprofile-use dependent optimizations 
> (e.g. -fbranch-probabilities, -fvpt, etc). This may also lead to 
> performance degradations.
> 
> The issue had already raised quite time ago 
> (https://gcc.gnu.org/ml/gcc-patches/2009-09/msg02119.html), but for some 
> reasons wasn't discussed.
> 
> Here a draft patch that disables -fprofile-use related optimizations if 
> profile data wasn't found (perhaps it makes sense to introduce a special 
> switch for this?). Does this look reasonable?

I find -fprofile-use a useful shortcut to: -fbranch-probabilities,
-fvpt, -funroll-loops, -fpeel-loops, -ftracer, -ftree-vectorize, and
ftree-loop-distribute-patterns.

There are many applications were the bulk of the speedup surprisingly
comes from these flags alone, even without any gcda file. E.g. with
tramp3d-v4 (-Ofast) I get ~5.3 percent speedup (without gcda) and an
additional 2.2 percent when using a gcda file.

> +/* Reset flags if a .gcda file is not found.  */ 
> +static void
> +disable_profile_use_flags (void)
> +{
> +  flag_branch_probabilities = flag_profile_values = flag_unroll_loops = false;
> +  flag_value_profile_transformations
> +   = flag_tree_loop_distribute_patterns = false;
> +  flag_rename_registers = flag_peel_loops = false;
> +  flag_profile_reorder_functions = flag_tracer = false;
> +}

Here you are disabling unrelated flags, e.g. flag_rename_registers.
Andrew Pinski Oct. 7, 2015, 5:07 p.m. UTC | #4
> On Oct 7, 2015, at 9:28 AM, Maxim Ostapenko <m.ostapenko@partner.samsung.com> wrote:
> 
> 
> 
>> On 07/10/15 19:18, Andrew Pinski wrote:
>> On Wed, Oct 7, 2015 at 9:11 AM, Maxim Ostapenko
>> <m.ostapenko@partner.samsung.com> wrote:
>>> Hi,
>>> 
>>> when testing OpenSSL performance, I found out that sometimes PGO-built
>>> binaries can actually introduce performance regressions. We could identify
>>> affected object files and disable PGO for them by simply removing
>>> corresponding .gcda file. However, even if profile data is not presented,
>>> GCC doesn't switch back -fprofile-use dependent optimizations (e.g.
>>> -fbranch-probabilities, -fvpt, etc). This may also lead to performance
>>> degradations.
>>> 
>>> The issue had already raised quite time ago
>>> (https://gcc.gnu.org/ml/gcc-patches/2009-09/msg02119.html), but for some
>>> reasons wasn't discussed.
>>> 
>>> Here a draft patch that disables -fprofile-use related optimizations if
>>> profile data wasn't found (perhaps it makes sense to introduce a special
>>> switch for this?). Does this look reasonable?
>> I thought if profile is not present, then branch probabilities goes
>> back to the original heuristics?
>> Which option is really causing the performance degradation here?
> 
> -fprofile-use enabled -fpeel-loops that in turn enabled -frename-registers. This caused the scheduler to transform the code in sched2 pass.

Why not figure out how much to improve that instead. Rename register is just doing renaming of register usage and not undoing any scheduling at all (well it might cause some moves to be removed).  I think you really should dig into deeper why it is causing a performance issue. 

> 
>> Also I think your patch is very incomplete as someone could use
>> -frename-registers with -fprofile-use and then you just turned it off.
>> 
>> Thanks,
>> Andrew Pinski
> 
> Doesn't -fprofile-use enable -frename-registers transitively through -fpeel-loops?

Yes. But I am saying some one could do -fprofile-use -frename-registers and expect rename registers to stay on even if there is no profile. 

Thanks,
Andrew


> 
>>> Thanks,
>>> -Maxim
>
Nikolai Bozhenov Oct. 8, 2015, 7:54 a.m. UTC | #5
On 10/07/2015 08:07 PM, pinskia@gmail.com wrote:
> Yes. But I am saying some one could do -fprofile-use -frename-registers and expect rename registers to stay on even if there is no profile.

That's true, we shouldn't disable any options that were explicitly 
requested by
the user.

> Why not figure out how much to improve that instead. Rename register is just doing renaming of register usage and not undoing any scheduling at all (well it might cause some moves to be removed).  I think you really should dig into deeper why it is causing a performance issue.

Yep, the better way is to fix all the problems in all the passes. We're
working on it but it may take a while. Meanwhile, we want to use PGO and we
need a way to avoid degradations caused by it. We want to fallback to the
usual compilation process for a source file if we don't like how it is
compiled with PGO. And what we want to do is to simply remove the
corresponding gcda file and have the compiler ignore the -fprofile-use 
option
for the translation unit.

Anyway, I find current GCC's behavior to be confusing. Currently it silently
ignores missed profile data but triggers some opaque internal options which
affect further compilation. I think in case of a missed gcda file GCC should
either ignore -fprofile-use completely or issue a warning/error.

Thanks,
Nikolai
Richard Biener Oct. 8, 2015, 10:23 a.m. UTC | #6
On Thu, Oct 8, 2015 at 9:54 AM, Nikolai Bozhenov <n.bozhenov@samsung.com> wrote:
>
> On 10/07/2015 08:07 PM, pinskia@gmail.com wrote:
>>
>> Yes. But I am saying some one could do -fprofile-use -frename-registers
>> and expect rename registers to stay on even if there is no profile.
>
>
> That's true, we shouldn't disable any options that were explicitly requested
> by
> the user.
>
>> Why not figure out how much to improve that instead. Rename register is
>> just doing renaming of register usage and not undoing any scheduling at all
>> (well it might cause some moves to be removed).  I think you really should
>> dig into deeper why it is causing a performance issue.
>
>
> Yep, the better way is to fix all the problems in all the passes. We're
> working on it but it may take a while. Meanwhile, we want to use PGO and we
> need a way to avoid degradations caused by it. We want to fallback to the
> usual compilation process for a source file if we don't like how it is
> compiled with PGO. And what we want to do is to simply remove the
> corresponding gcda file and have the compiler ignore the -fprofile-use
> option
> for the translation unit.
>
> Anyway, I find current GCC's behavior to be confusing. Currently it silently
> ignores missed profile data but triggers some opaque internal options which
> affect further compilation. I think in case of a missed gcda file GCC should
> either ignore -fprofile-use completely or issue a warning/error.

I agree that it is confusing.  You have to watch out for explicit
-fprofile-use -fXXX
and not clear flags set explicitely though.  Didn't look at the patch in detail.

Richard.

> Thanks,
> Nikolai
diff mbox

Patch

gcc/ChangeLog:

2015-10-07  Maxim Ostapenko  <m.ostapenko@partner.samsung.com>

	* coverage.c (disable_profile_use_flags): New function.
	(read_counts_file): Call it if corresponding .gcda file wasn't found.

diff --git a/gcc/coverage.c b/gcc/coverage.c
index 4c06fa4..37e16b7 100644
--- a/gcc/coverage.c
+++ b/gcc/coverage.c
@@ -134,6 +134,7 @@  static bool coverage_obj_init (void);
 static vec<constructor_elt, va_gc> *coverage_obj_fn
 (vec<constructor_elt, va_gc> *, tree, struct coverage_data const *);
 static void coverage_obj_finish (vec<constructor_elt, va_gc> *);
+static void disable_profile_use_flags (void);
 
 /* Return the type node for gcov_type.  */
 
@@ -190,7 +191,10 @@  read_counts_file (void)
   unsigned cfg_checksum = 0;
 
   if (!gcov_open (da_file_name, 1))
-    return;
+    {
+      disable_profile_use_flags ();
+      return;
+    }
 
   if (!gcov_magic (gcov_read_unsigned (), GCOV_DATA_MAGIC))
     {
@@ -1219,4 +1223,14 @@  coverage_finish (void)
   da_file_name = NULL;
 }
 
+/* Reset flags if a .gcda file is not found.  */ 
+static void
+disable_profile_use_flags (void)
+{
+  flag_branch_probabilities = flag_profile_values = flag_unroll_loops = false;
+  flag_value_profile_transformations
+   = flag_tree_loop_distribute_patterns = false;
+  flag_rename_registers = flag_peel_loops = false;
+  flag_profile_reorder_functions = flag_tracer = false;
+}
+
 #include "gt-coverage.h"