diff mbox series

[i386] : Do not limit the cost of moves to/from XMM register to minimum 8.

Message ID CAFULd4ZOgSxG5sn2Y7Rbpn8mcykO29mV_LenpKK+KuAte-0m_g@mail.gmail.com
State New
Headers show
Series [i386] : Do not limit the cost of moves to/from XMM register to minimum 8. | expand

Commit Message

Uros Bizjak Aug. 29, 2019, 6:08 p.m. UTC
2019-08-28  Uroš Bizjak  <ubizjak@gmail.com>

    * config/i386/i386.c (ix86_register_move_cost): Do not
    limit the cost of moves to/from XMM register to minimum 8.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Actually committed as r274994 with the wrong ChangeLog.

Uros.

Comments

Hongtao Liu Aug. 30, 2019, 12:10 a.m. UTC | #1
On Fri, Aug 30, 2019 at 2:09 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> 2019-08-28  Uroš Bizjak  <ubizjak@gmail.com>
>
>     * config/i386/i386.c (ix86_register_move_cost): Do not
>     limit the cost of moves to/from XMM register to minimum 8.
>
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>
> Actually committed as r274994 with the wrong ChangeLog.
>
> Uros.

There is 11% regression in 548.exchange_r of SPEC2017.

Reason for the regression:
For 548.exchange_r, a lot of movements between gpr and xmm are
generated as expected,
and it reduced  clocksticks by 3%.
But  however maybe too many xmm registers are used,
a frequency reduction issue is triggered(average frequency reduced by 13%).
So totally it takes more time.
Hongtao Liu Aug. 30, 2019, 12:15 a.m. UTC | #2
On Fri, Aug 30, 2019 at 8:10 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Fri, Aug 30, 2019 at 2:09 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > 2019-08-28  Uroš Bizjak  <ubizjak@gmail.com>
> >
> >     * config/i386/i386.c (ix86_register_move_cost): Do not
> >     limit the cost of moves to/from XMM register to minimum 8.
> >
> > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> >
> > Actually committed as r274994 with the wrong ChangeLog.
> >
> > Uros.
>
> There is 11% regression in 548.exchange_r of SPEC2017.
>
> Reason for the regression:
> For 548.exchange_r, a lot of movements between gpr and xmm are
> generated as expected,
> and it reduced  clocksticks by 3%.
> But  however maybe too many xmm registers are used,
> a frequency reduction issue is triggered(average frequency reduced by 13%).
> So totally it takes more time.
>
>
>
> --
> BR,
> Hongtao

Tested on skylake workstation.
Uros Bizjak Aug. 30, 2019, 6:18 a.m. UTC | #3
On Fri, Aug 30, 2019 at 2:08 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Fri, Aug 30, 2019 at 2:09 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > 2019-08-28  Uroš Bizjak  <ubizjak@gmail.com>
> >
> >     * config/i386/i386.c (ix86_register_move_cost): Do not
> >     limit the cost of moves to/from XMM register to minimum 8.
> >
> > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> >
> > Actually committed as r274994 with the wrong ChangeLog.
> >
> > Uros.
>
> There is 11% regression in 548.exchange_r of SPEC2017.
>
> Reason for the regression:
> For 548.exchange_r, a lot of movements between gpr and xmm are
> generated as expected,
> and it reduced  clocksticks by 3%.

This is OK, and expected from the patch.

> But  however maybe too many xmm registers are used,
> a frequency reduction issue is triggered(average frequency reduced by 13%).
> So totally it takes more time.

This is a secondary effect that is currently not modelled by the compiler.

However, I expected that SSE <-> int moves in x86-tune-cost.h will
have to be retuned. Up to now, both directions were limited to minimum
8, so any value lower than 8 was ignored. However, minimum was set to
work-around certain limitation in reload, which is not needed anymore.

You can simply set the values of SSE <-> int moves to 8 (which is an
arbitrary value!) to restore the previous behaviour, but I think that
a more precise cost value should be determined, probably a different
one for each direction. But until register pressure effects are
modelled, any artificially higher value will represent a workaround
and not the true reg-reg move cost.

Uros.
Hongtao Liu Aug. 30, 2019, 7:24 a.m. UTC | #4
On Fri, Aug 30, 2019 at 2:18 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Fri, Aug 30, 2019 at 2:08 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Fri, Aug 30, 2019 at 2:09 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > 2019-08-28  Uroš Bizjak  <ubizjak@gmail.com>
> > >
> > >     * config/i386/i386.c (ix86_register_move_cost): Do not
> > >     limit the cost of moves to/from XMM register to minimum 8.
> > >
> > > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> > >
> > > Actually committed as r274994 with the wrong ChangeLog.
> > >
> > > Uros.
> >
> > There is 11% regression in 548.exchange_r of SPEC2017.
> >
> > Reason for the regression:
> > For 548.exchange_r, a lot of movements between gpr and xmm are
> > generated as expected,
> > and it reduced  clocksticks by 3%.
>
> This is OK, and expected from the patch.
>
> > But  however maybe too many xmm registers are used,
> > a frequency reduction issue is triggered(average frequency reduced by 13%).
> > So totally it takes more time.
>
> This is a secondary effect that is currently not modelled by the compiler.
>
> However, I expected that SSE <-> int moves in x86-tune-cost.h will
> have to be retuned. Up to now, both directions were limited to minimum
> 8, so any value lower than 8 was ignored. However, minimum was set to
> work-around certain limitation in reload, which is not needed anymore.
>
> You can simply set the values of SSE <-> int moves to 8 (which is an
> arbitrary value!) to restore the previous behaviour, but I think that
> a more precise cost value should be determined, probably a different
> one for each direction. But until register pressure effects are
> modelled, any artificially higher value will represent a workaround
> and not the true reg-reg move cost.
>
> Uros.

Yes,  we're still working on skylake_cost model.
diff mbox series

Patch

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 49ab50ea41bf..11c75be113e0 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -18601,9 +18601,9 @@  ix86_register_move_cost (machine_mode mode, reg_class_t class1_i,
        where integer modes in SSE registers are not tieable
        because of missing QImode and HImode moves to, from or between
        MMX/SSE registers.  */
-    return MAX (8, SSE_CLASS_P (class1)
-		? ix86_cost->hard_register.sse_to_integer
-		: ix86_cost->hard_register.integer_to_sse);
+    return (SSE_CLASS_P (class1)
+	    ? ix86_cost->hard_register.sse_to_integer
+	    : ix86_cost->hard_register.integer_to_sse);
 
   if (MAYBE_FLOAT_CLASS_P (class1))
     return ix86_cost->hard_register.fp_move;