Message ID | CAFULd4ZOgSxG5sn2Y7Rbpn8mcykO29mV_LenpKK+KuAte-0m_g@mail.gmail.com |
---|---|
State | New |
Headers | show |
Series | [i386] : Do not limit the cost of moves to/from XMM register to minimum 8. | expand |
On Fri, Aug 30, 2019 at 2:09 AM Uros Bizjak <ubizjak@gmail.com> wrote: > > 2019-08-28 Uroš Bizjak <ubizjak@gmail.com> > > * config/i386/i386.c (ix86_register_move_cost): Do not > limit the cost of moves to/from XMM register to minimum 8. > > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. > > Actually committed as r274994 with the wrong ChangeLog. > > Uros. There is 11% regression in 548.exchange_r of SPEC2017. Reason for the regression: For 548.exchange_r, a lot of movements between gpr and xmm are generated as expected, and it reduced clocksticks by 3%. But however maybe too many xmm registers are used, a frequency reduction issue is triggered(average frequency reduced by 13%). So totally it takes more time.
On Fri, Aug 30, 2019 at 8:10 AM Hongtao Liu <crazylht@gmail.com> wrote: > > On Fri, Aug 30, 2019 at 2:09 AM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > 2019-08-28 Uroš Bizjak <ubizjak@gmail.com> > > > > * config/i386/i386.c (ix86_register_move_cost): Do not > > limit the cost of moves to/from XMM register to minimum 8. > > > > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. > > > > Actually committed as r274994 with the wrong ChangeLog. > > > > Uros. > > There is 11% regression in 548.exchange_r of SPEC2017. > > Reason for the regression: > For 548.exchange_r, a lot of movements between gpr and xmm are > generated as expected, > and it reduced clocksticks by 3%. > But however maybe too many xmm registers are used, > a frequency reduction issue is triggered(average frequency reduced by 13%). > So totally it takes more time. > > > > -- > BR, > Hongtao Tested on skylake workstation.
On Fri, Aug 30, 2019 at 2:08 AM Hongtao Liu <crazylht@gmail.com> wrote: > > On Fri, Aug 30, 2019 at 2:09 AM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > 2019-08-28 Uroš Bizjak <ubizjak@gmail.com> > > > > * config/i386/i386.c (ix86_register_move_cost): Do not > > limit the cost of moves to/from XMM register to minimum 8. > > > > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. > > > > Actually committed as r274994 with the wrong ChangeLog. > > > > Uros. > > There is 11% regression in 548.exchange_r of SPEC2017. > > Reason for the regression: > For 548.exchange_r, a lot of movements between gpr and xmm are > generated as expected, > and it reduced clocksticks by 3%. This is OK, and expected from the patch. > But however maybe too many xmm registers are used, > a frequency reduction issue is triggered(average frequency reduced by 13%). > So totally it takes more time. This is a secondary effect that is currently not modelled by the compiler. However, I expected that SSE <-> int moves in x86-tune-cost.h will have to be retuned. Up to now, both directions were limited to minimum 8, so any value lower than 8 was ignored. However, minimum was set to work-around certain limitation in reload, which is not needed anymore. You can simply set the values of SSE <-> int moves to 8 (which is an arbitrary value!) to restore the previous behaviour, but I think that a more precise cost value should be determined, probably a different one for each direction. But until register pressure effects are modelled, any artificially higher value will represent a workaround and not the true reg-reg move cost. Uros.
On Fri, Aug 30, 2019 at 2:18 PM Uros Bizjak <ubizjak@gmail.com> wrote: > > On Fri, Aug 30, 2019 at 2:08 AM Hongtao Liu <crazylht@gmail.com> wrote: > > > > On Fri, Aug 30, 2019 at 2:09 AM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > > > 2019-08-28 Uroš Bizjak <ubizjak@gmail.com> > > > > > > * config/i386/i386.c (ix86_register_move_cost): Do not > > > limit the cost of moves to/from XMM register to minimum 8. > > > > > > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. > > > > > > Actually committed as r274994 with the wrong ChangeLog. > > > > > > Uros. > > > > There is 11% regression in 548.exchange_r of SPEC2017. > > > > Reason for the regression: > > For 548.exchange_r, a lot of movements between gpr and xmm are > > generated as expected, > > and it reduced clocksticks by 3%. > > This is OK, and expected from the patch. > > > But however maybe too many xmm registers are used, > > a frequency reduction issue is triggered(average frequency reduced by 13%). > > So totally it takes more time. > > This is a secondary effect that is currently not modelled by the compiler. > > However, I expected that SSE <-> int moves in x86-tune-cost.h will > have to be retuned. Up to now, both directions were limited to minimum > 8, so any value lower than 8 was ignored. However, minimum was set to > work-around certain limitation in reload, which is not needed anymore. > > You can simply set the values of SSE <-> int moves to 8 (which is an > arbitrary value!) to restore the previous behaviour, but I think that > a more precise cost value should be determined, probably a different > one for each direction. But until register pressure effects are > modelled, any artificially higher value will represent a workaround > and not the true reg-reg move cost. > > Uros. Yes, we're still working on skylake_cost model.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 49ab50ea41bf..11c75be113e0 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -18601,9 +18601,9 @@ ix86_register_move_cost (machine_mode mode, reg_class_t class1_i, where integer modes in SSE registers are not tieable because of missing QImode and HImode moves to, from or between MMX/SSE registers. */ - return MAX (8, SSE_CLASS_P (class1) - ? ix86_cost->hard_register.sse_to_integer - : ix86_cost->hard_register.integer_to_sse); + return (SSE_CLASS_P (class1) + ? ix86_cost->hard_register.sse_to_integer + : ix86_cost->hard_register.integer_to_sse); if (MAYBE_FLOAT_CLASS_P (class1)) return ix86_cost->hard_register.fp_move;