Message ID | bec04eb5-e703-279c-09f8-e62eac12fd3e@linux.ibm.com |
---|---|
State | New |
Series | PR88751: Backport to GCC 8 and 9 branches? |
On Fri, Sep 6, 2019 at 10:11 AM Andreas Krebbel <krebbel@linux.ibm.com> wrote:
>
> Hi,
>
> since this caused a critical performance regression in the OpenJ9 byte code interpreter after
> migrating from GCC 4.8 to GCC 7 I would like to backport this patch also to GCC 8 and 9 branch.
>
> Ok - after bootstrap and regression test went fine?

Looks reasonable to me. But what about GCC 7? I assume you also verified the
actual performance regression is gone.

Richard.

>
> Andreas
>
>
> commit d3dc20418aad41af83fe45ccba527deb0b334983
> Author: krebbel <krebbel@138bc75d-0d04-0410-961f-82ee72b054a4>
> Date:   Thu Jun 6 11:35:04 2019 +0000
>
>     Fix PR88751
>
>     This patch implements a small improvement for the heuristic in lra
>     which decides when it has to activate the simpler register allocation
>     algorithm.
>
>     gcc/ChangeLog:
>
>     2019-06-06  Andreas Krebbel  <krebbel@linux.ibm.com>
>
>             PR rtl-optimization/88751
>             * ira.c (ira): Use the number of the actually referenced registers
>             when calculating the threshold.
>
>     git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@271996 138bc75d-0d04-0410-961f-82ee72b054a4
>
> diff --git a/gcc/ira.c b/gcc/ira.c
> index 4a14fb31583..725636d8dc5 100644
> --- a/gcc/ira.c
> +++ b/gcc/ira.c
> @@ -5198,6 +5198,8 @@ ira (FILE *f)
>    int ira_max_point_before_emit;
>    bool saved_flag_caller_saves = flag_caller_saves;
>    enum ira_region saved_flag_ira_region = flag_ira_region;
> +  unsigned int i;
> +  int num_used_regs = 0;
>
>    clear_bb_flags ();
>
> @@ -5213,12 +5215,17 @@ ira (FILE *f)
>
>    ira_conflicts_p = optimize > 0;
>
> +  /* Determine the number of pseudos actually requiring coloring.  */
> +  for (i = FIRST_PSEUDO_REGISTER; i < DF_REG_SIZE (df); i++)
> +    num_used_regs += !!(DF_REG_USE_COUNT (i) + DF_REG_DEF_COUNT (i));
> +
>    /* If there are too many pseudos and/or basic blocks (e.g. 10K
>       pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
>       use simplified and faster algorithms in LRA.  */
>    lra_simple_p
>      = (ira_use_lra_p
> -       && max_reg_num () >= (1 << 26) / last_basic_block_for_fn (cfun));
> +       && num_used_regs >= (1 << 26) / last_basic_block_for_fn (cfun));
> +
>    if (lra_simple_p)
>      {
>        /* It permits to skip live range splitting in LRA.  */
>
On 06.09.19 12:48, Richard Biener wrote:
> On Fri, Sep 6, 2019 at 10:11 AM Andreas Krebbel <krebbel@linux.ibm.com> wrote:
>>
>> Hi,
>>
>> since this caused a critical performance regression in the OpenJ9 byte code interpreter after
>> migrating from GCC 4.8 to GCC 7 I would like to backport this patch also to GCC 8 and 9 branch.
>>
>> Ok - after bootstrap and regression test went fine?
>
> Looks reasonable to me. But what about GCC 7? I assume you also verified the
> actual performance regression is gone.

I've committed the patch to GCC 7 and 8 branch after verifying that the change has the desired
effect on the source code file from OpenJ9.

GCC 9 branch is currently frozen. Ok, to apply there as well?

Andreas

>
> Richard.
On Fri, Sep 20, 2019 at 11:28 AM Andreas Krebbel <krebbel@linux.ibm.com> wrote:
>
> On 06.09.19 12:48, Richard Biener wrote:
> > On Fri, Sep 6, 2019 at 10:11 AM Andreas Krebbel <krebbel@linux.ibm.com> wrote:
> >>
> >> Hi,
> >>
> >> since this caused a critical performance regression in the OpenJ9 byte code interpreter after
> >> migrating from GCC 4.8 to GCC 7 I would like to backport this patch also to GCC 8 and 9 branch.
> >>
> >> Ok - after bootstrap and regression test went fine?
> >
> > Looks reasonable to me. But what about GCC 7? I assume you also verified the
> > actual performance regression is gone.
>
> I've committed the patch to GCC 7 and 8 branch after verifying that the change has the desired
> effect on the source code file from OpenJ9.
>
> GCC 9 branch is currently frozen. Ok, to apply there as well?

Yes, it shouldn't be frozen anymore...

Richard.

> Andreas
diff --git a/gcc/ira.c b/gcc/ira.c
index 4a14fb31583..725636d8dc5 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -5198,6 +5198,8 @@ ira (FILE *f)
   int ira_max_point_before_emit;
   bool saved_flag_caller_saves = flag_caller_saves;
   enum ira_region saved_flag_ira_region = flag_ira_region;
+  unsigned int i;
+  int num_used_regs = 0;
 
   clear_bb_flags ();
 
@@ -5213,12 +5215,17 @@ ira (FILE *f)
 
   ira_conflicts_p = optimize > 0;
 
+  /* Determine the number of pseudos actually requiring coloring.  */
+  for (i = FIRST_PSEUDO_REGISTER; i < DF_REG_SIZE (df); i++)
+    num_used_regs += !!(DF_REG_USE_COUNT (i) + DF_REG_DEF_COUNT (i));
+
   /* If there are too many pseudos and/or basic blocks (e.g. 10K
      pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
      use simplified and faster algorithms in LRA.  */
   lra_simple_p
     = (ira_use_lra_p
-       && max_reg_num () >= (1 << 26) / last_basic_block_for_fn (cfun));
+       && num_used_regs >= (1 << 26) / last_basic_block_for_fn (cfun));
+
   if (lra_simple_p)
     {
       /* It permits to skip live range splitting in LRA.  */