PR88751: Backport to GCC 8 and 9 branches?

Message ID bec04eb5-e703-279c-09f8-e62eac12fd3e@linux.ibm.com
State New
Series PR88751: Backport to GCC 8 and 9 branches?

Commit Message

Andreas Krebbel Sept. 6, 2019, 8:11 a.m. UTC
Hi,

Since this caused a critical performance regression in the OpenJ9 bytecode interpreter after
migrating from GCC 4.8 to GCC 7, I would like to backport this patch to the GCC 8 and 9 branches as well.

OK to commit once bootstrap and regression tests pass?

Andreas


commit d3dc20418aad41af83fe45ccba527deb0b334983
Author: krebbel <krebbel@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Thu Jun 6 11:35:04 2019 +0000

    Fix PR88751

    This patch implements a small improvement for the heuristic in lra
    which decides when it has to activate the simpler register allocation
    algorithm.

    gcc/ChangeLog:

    2019-06-06  Andreas Krebbel  <krebbel@linux.ibm.com>

            PR rtl-optimization/88751
            * ira.c (ira): Use the number of the actually referenced registers
            when calculating the threshold.



    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@271996 138bc75d-0d04-0410-961f-82ee72b054a4

Comments

Richard Biener Sept. 6, 2019, 10:48 a.m. UTC | #1
On Fri, Sep 6, 2019 at 10:11 AM Andreas Krebbel <krebbel@linux.ibm.com> wrote:
>
> Hi,
>
> Since this caused a critical performance regression in the OpenJ9 bytecode interpreter after
> migrating from GCC 4.8 to GCC 7, I would like to backport this patch to the GCC 8 and 9 branches as well.
>
> OK to commit once bootstrap and regression tests pass?

Looks reasonable to me.  But what about GCC 7?  I assume you also verified the
actual performance regression is gone.

Richard.

>
> Andreas
>
>
> commit d3dc20418aad41af83fe45ccba527deb0b334983
> Author: krebbel <krebbel@138bc75d-0d04-0410-961f-82ee72b054a4>
> Date:   Thu Jun 6 11:35:04 2019 +0000
>
>     Fix PR88751
>
>     This patch implements a small improvement for the heuristic in lra
>     which decides when it has to activate the simpler register allocation
>     algorithm.
>
>     gcc/ChangeLog:
>
>     2019-06-06  Andreas Krebbel  <krebbel@linux.ibm.com>
>
>             PR rtl-optimization/88751
>             * ira.c (ira): Use the number of the actually referenced registers
>             when calculating the threshold.
>
>
>
>     git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@271996 138bc75d-0d04-0410-961f-82ee72b054a4
>
>
> diff --git a/gcc/ira.c b/gcc/ira.c
> index 4a14fb31583..725636d8dc5 100644
> --- a/gcc/ira.c
> +++ b/gcc/ira.c
> @@ -5198,6 +5198,8 @@ ira (FILE *f)
>    int ira_max_point_before_emit;
>    bool saved_flag_caller_saves = flag_caller_saves;
>    enum ira_region saved_flag_ira_region = flag_ira_region;
> +  unsigned int i;
> +  int num_used_regs = 0;
>
>    clear_bb_flags ();
>
> @@ -5213,12 +5215,17 @@ ira (FILE *f)
>
>    ira_conflicts_p = optimize > 0;
>
> +  /* Determine the number of pseudos actually requiring coloring.  */
> +  for (i = FIRST_PSEUDO_REGISTER; i < DF_REG_SIZE (df); i++)
> +    num_used_regs += !!(DF_REG_USE_COUNT (i) + DF_REG_DEF_COUNT (i));
> +
>    /* If there are too many pseudos and/or basic blocks (e.g. 10K
>       pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
>       use simplified and faster algorithms in LRA.  */
>    lra_simple_p
>      = (ira_use_lra_p
> -       && max_reg_num () >= (1 << 26) / last_basic_block_for_fn (cfun));
> +       && num_used_regs >= (1 << 26) / last_basic_block_for_fn (cfun));
> +
>    if (lra_simple_p)
>      {
>        /* It permits to skip live range splitting in LRA.  */
>
Andreas Krebbel Sept. 20, 2019, 9:27 a.m. UTC | #2
On 06.09.19 12:48, Richard Biener wrote:
> On Fri, Sep 6, 2019 at 10:11 AM Andreas Krebbel <krebbel@linux.ibm.com> wrote:
>>
>> Hi,
>>
>> Since this caused a critical performance regression in the OpenJ9 bytecode interpreter after
>> migrating from GCC 4.8 to GCC 7, I would like to backport this patch to the GCC 8 and 9 branches as well.
>>
>> OK to commit once bootstrap and regression tests pass?
> 
> Looks reasonable to me.  But what about GCC 7?  I assume you also verified the
> actual performance regression is gone.

I've committed the patch to the GCC 7 and 8 branches after verifying that the change has the desired
effect on the source code file from OpenJ9.

The GCC 9 branch is currently frozen. OK to apply there as well?

Andreas

> 
> Richard.
> 
>>
>> Andreas
>>
>>
>> commit d3dc20418aad41af83fe45ccba527deb0b334983
>> Author: krebbel <krebbel@138bc75d-0d04-0410-961f-82ee72b054a4>
>> Date:   Thu Jun 6 11:35:04 2019 +0000
>>
>>     Fix PR88751
>>
>>     This patch implements a small improvement for the heuristic in lra
>>     which decides when it has to activate the simpler register allocation
>>     algorithm.
>>
>>     gcc/ChangeLog:
>>
>>     2019-06-06  Andreas Krebbel  <krebbel@linux.ibm.com>
>>
>>             PR rtl-optimization/88751
>>             * ira.c (ira): Use the number of the actually referenced registers
>>             when calculating the threshold.
>>
>>
>>
>>     git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@271996 138bc75d-0d04-0410-961f-82ee72b054a4
>>
>>
>> diff --git a/gcc/ira.c b/gcc/ira.c
>> index 4a14fb31583..725636d8dc5 100644
>> --- a/gcc/ira.c
>> +++ b/gcc/ira.c
>> @@ -5198,6 +5198,8 @@ ira (FILE *f)
>>    int ira_max_point_before_emit;
>>    bool saved_flag_caller_saves = flag_caller_saves;
>>    enum ira_region saved_flag_ira_region = flag_ira_region;
>> +  unsigned int i;
>> +  int num_used_regs = 0;
>>
>>    clear_bb_flags ();
>>
>> @@ -5213,12 +5215,17 @@ ira (FILE *f)
>>
>>    ira_conflicts_p = optimize > 0;
>>
>> +  /* Determine the number of pseudos actually requiring coloring.  */
>> +  for (i = FIRST_PSEUDO_REGISTER; i < DF_REG_SIZE (df); i++)
>> +    num_used_regs += !!(DF_REG_USE_COUNT (i) + DF_REG_DEF_COUNT (i));
>> +
>>    /* If there are too many pseudos and/or basic blocks (e.g. 10K
>>       pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
>>       use simplified and faster algorithms in LRA.  */
>>    lra_simple_p
>>      = (ira_use_lra_p
>> -       && max_reg_num () >= (1 << 26) / last_basic_block_for_fn (cfun));
>> +       && num_used_regs >= (1 << 26) / last_basic_block_for_fn (cfun));
>> +
>>    if (lra_simple_p)
>>      {
>>        /* It permits to skip live range splitting in LRA.  */
>>
Richard Biener Sept. 20, 2019, 12:02 p.m. UTC | #3
On Fri, Sep 20, 2019 at 11:28 AM Andreas Krebbel <krebbel@linux.ibm.com> wrote:
>
> On 06.09.19 12:48, Richard Biener wrote:
> > On Fri, Sep 6, 2019 at 10:11 AM Andreas Krebbel <krebbel@linux.ibm.com> wrote:
> >>
> >> Hi,
> >>
> >> Since this caused a critical performance regression in the OpenJ9 bytecode interpreter after
> >> migrating from GCC 4.8 to GCC 7, I would like to backport this patch to the GCC 8 and 9 branches as well.
> >>
> >> OK to commit once bootstrap and regression tests pass?
> >
> > Looks reasonable to me.  But what about GCC 7?  I assume you also verified the
> > actual performance regression is gone.
>
> I've committed the patch to the GCC 7 and 8 branches after verifying that the change has the desired
> effect on the source code file from OpenJ9.
>
> The GCC 9 branch is currently frozen. OK to apply there as well?

Yes, it shouldn't be frozen anymore...

Richard.

> Andreas
>
> >
> > Richard.
> >
> >>
> >> Andreas
> >>
> >>
> >> commit d3dc20418aad41af83fe45ccba527deb0b334983
> >> Author: krebbel <krebbel@138bc75d-0d04-0410-961f-82ee72b054a4>
> >> Date:   Thu Jun 6 11:35:04 2019 +0000
> >>
> >>     Fix PR88751
> >>
> >>     This patch implements a small improvement for the heuristic in lra
> >>     which decides when it has to activate the simpler register allocation
> >>     algorithm.
> >>
> >>     gcc/ChangeLog:
> >>
> >>     2019-06-06  Andreas Krebbel  <krebbel@linux.ibm.com>
> >>
> >>             PR rtl-optimization/88751
> >>             * ira.c (ira): Use the number of the actually referenced registers
> >>             when calculating the threshold.
> >>
> >>
> >>
> >>     git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@271996 138bc75d-0d04-0410-961f-82ee72b054a4
> >>
> >>
> >> diff --git a/gcc/ira.c b/gcc/ira.c
> >> index 4a14fb31583..725636d8dc5 100644
> >> --- a/gcc/ira.c
> >> +++ b/gcc/ira.c
> >> @@ -5198,6 +5198,8 @@ ira (FILE *f)
> >>    int ira_max_point_before_emit;
> >>    bool saved_flag_caller_saves = flag_caller_saves;
> >>    enum ira_region saved_flag_ira_region = flag_ira_region;
> >> +  unsigned int i;
> >> +  int num_used_regs = 0;
> >>
> >>    clear_bb_flags ();
> >>
> >> @@ -5213,12 +5215,17 @@ ira (FILE *f)
> >>
> >>    ira_conflicts_p = optimize > 0;
> >>
> >> +  /* Determine the number of pseudos actually requiring coloring.  */
> >> +  for (i = FIRST_PSEUDO_REGISTER; i < DF_REG_SIZE (df); i++)
> >> +    num_used_regs += !!(DF_REG_USE_COUNT (i) + DF_REG_DEF_COUNT (i));
> >> +
> >>    /* If there are too many pseudos and/or basic blocks (e.g. 10K
> >>       pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
> >>       use simplified and faster algorithms in LRA.  */
> >>    lra_simple_p
> >>      = (ira_use_lra_p
> >> -       && max_reg_num () >= (1 << 26) / last_basic_block_for_fn (cfun));
> >> +       && num_used_regs >= (1 << 26) / last_basic_block_for_fn (cfun));
> >> +
> >>    if (lra_simple_p)
> >>      {
> >>        /* It permits to skip live range splitting in LRA.  */
> >>
>

Patch

diff --git a/gcc/ira.c b/gcc/ira.c
index 4a14fb31583..725636d8dc5 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -5198,6 +5198,8 @@  ira (FILE *f)
   int ira_max_point_before_emit;
   bool saved_flag_caller_saves = flag_caller_saves;
   enum ira_region saved_flag_ira_region = flag_ira_region;
+  unsigned int i;
+  int num_used_regs = 0;

   clear_bb_flags ();

@@ -5213,12 +5215,17 @@  ira (FILE *f)

   ira_conflicts_p = optimize > 0;

+  /* Determine the number of pseudos actually requiring coloring.  */
+  for (i = FIRST_PSEUDO_REGISTER; i < DF_REG_SIZE (df); i++)
+    num_used_regs += !!(DF_REG_USE_COUNT (i) + DF_REG_DEF_COUNT (i));
+
   /* If there are too many pseudos and/or basic blocks (e.g. 10K
      pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
      use simplified and faster algorithms in LRA.  */
   lra_simple_p
     = (ira_use_lra_p
-       && max_reg_num () >= (1 << 26) / last_basic_block_for_fn (cfun));
+       && num_used_regs >= (1 << 26) / last_basic_block_for_fn (cfun));
+
   if (lra_simple_p)
     {
       /* It permits to skip live range splitting in LRA.  */