diff mbox series

middle-end: Correct calculation of mul_widen_cost and mul_highpart_cost.

Message ID 00a801d66e38$1d22eb60$5768c220$@nextmovesoftware.com
State New
Headers show
Series middle-end: Correct calculation of mul_widen_cost and mul_highpart_cost. | expand

Commit Message

Roger Sayle Aug. 9, 2020, 10:30 a.m. UTC
This patch fixes a subtle bug in the depths of GCC's synth_mult,
where the middle-end queries whether (how well) the target supports
widening and highpart multiplications by calling targetm.rtx_costs.
The code in init_expmed and init_expmed_one_mode iterates over various
RTL patterns querying the cost of each.  To avoid generating & garbage
collecting too much junk, it reuses the same RTL over and over, but
adjusting the modes between each call.

Alas this reuse of state is a little fragile, and at some point a
change to init_expmed_one_conv has resulted in the state (mode of
a register) being changed, but not reset before being used again.

The fallout is that GCC currently queries the backend for the cost
of non-sense RTL such as:

(mult:HI (zero_extend:HI (reg:TI 82))
    (zero_extend:HI (reg:TI 82)))

and

(lshiftrt:HI (mult:HI (zero_extend:HI (reg:TI 82))
        (zero_extend:HI (reg:TI 82)))
    (const_int 8 [0x8]))

The fix is to set the mode of the register back to its assumed
state, as (reg:QI 82) in the above patterns makes much more sense.

Using the old software engineering/defensive programming maxim of
"why fix a bug just once, if it can be fixed in multiple places",
this patch both restores the original value in init_expmed_one_conv,
and also sets it to the expected value in init_expmed_one_mode.
This should hopefully signal the need to be careful of invariants for
anyone modifying this code in future.

Alas things are rarely simple...  Fixing this obviously incorrect
logic causes a failure of gcc.target/i386/pr71321.c that tests for
a particular expansion from synth_mult.  The issue here is that this
test is checking for a specific multiplication expansion, when it
should really be checking that we don't generate the inefficient
"leal 0(,%rax,4), %edx" forms that were produced in GCC v6, as
reported in the PR target/71321.  Now that we use correct costs,
GCC uses a multiply instruction matching icc, LLVM and GCC prior
to v4.8.  I've even microbenchmarked this function (over several
minutes) with (disappointingly) no difference in timings.  Three
dependent leas has 3-cycle latency, exactly the same as a widening
byte multiply (on Haswell), so the shorter code splits the tie.
[I have a follow-up patch that may improve things further].

Before:
        movzbl  %dil, %eax
        leal    (%rax,%rax,4), %edx
        leal    (%rax,%rdx,8), %edx
        leal    (%rdx,%rdx,4), %edx
        shrw    $11, %dx
        leal    (%rdx,%rdx,4), %eax
        ...

After:
        movl    $-51, %edx
        movl    %edx, %eax
        mulb    %dil
        movl    %eax, %edx
        shrw    $11, %dx
        leal    (%rdx,%rdx,4), %eax
        ...


This patch has been tested on x86_64-pc-linux-gnu with a "make
bootstrap" and "make -k check" with no new failures.
Ok for mainline?


2020-08-09  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* expmed.c (init_expmed_one_conv): Restore all->reg's mode.
	(init_expmed_one_mode): Set all->reg to desired mode.

gcc/testsuite/ChangeLog
	PR target/71321
	* gcc.target/i386/pr71321.c: Check that the code doesn't use
	the 4B zero displacement lea, not that it uses lea.

Thanks in advance,
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

Comments

Richard Biener Aug. 9, 2020, 2:58 p.m. UTC | #1
On August 9, 2020 12:30:32 PM GMT+02:00, Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>This patch fixes a subtle bug in the depths of GCC's synth_mult,
>where the middle-end queries whether (how well) the target supports
>widening and highpart multiplications by calling targetm.rtx_costs.
>The code in init_expmed and init_expmed_one_mode iterates over various
>RTL patterns querying the cost of each.  To avoid generating & garbage
>collecting too much junk, it reuses the same RTL over and over, but
>adjusting the modes between each call.
>
>Alas this reuse of state is a little fragile, and at some point a
>change to init_expmed_one_conv has resulted in the state (mode of
>a register) being changed, but not reset before being used again.
>
>The fallout is that GCC currently queries the backend for the cost
>of non-sense RTL such as:
>
>(mult:HI (zero_extend:HI (reg:TI 82))
>    (zero_extend:HI (reg:TI 82)))
>
>and
>
>(lshiftrt:HI (mult:HI (zero_extend:HI (reg:TI 82))
>        (zero_extend:HI (reg:TI 82)))
>    (const_int 8 [0x8]))
>
>The fix is to set the mode of the register back to its assumed
>state, as (reg:QI 82) in the above patterns makes much more sense.
>
>Using the old software engineering/defensive programming maxim of
>"why fix a bug just once, if it can be fixed in multiple places",
>this patch both restores the original value in init_expmed_one_conv,
>and also sets it to the expected value in init_expmed_one_mode.
>This should hopefully signal the need to be careful of invariants for
>anyone modifying this code in future.
>
>Alas things are rarely simple...  Fixing this obviously incorrect
>logic causes a failure of gcc.target/i386/pr71321.c that tests for
>a particular expansion from synth_mult.  The issue here is that this
>test is checking for a specific multiplication expansion, when it
>should really be checking that we don't generate the inefficient
>"leal 0(,%rax,4), %edx" forms that were produced in GCC v6, as
>reported in the PR target/71321.  Now that we use correct costs,
>GCC uses a multiply instruction matching icc, LLVM and GCC prior
>to v4.8.  I've even microbenchmarked this function (over several
>minutes) with (disappointingly) no difference in timings.  Three
>dependent leas has 3-cycle latency, exactly the same as a widening
>byte multiply (on Haswell), so the shorter code splits the tie.
>[I have a follow-up patch that may improve things further].
>
>Before:
>        movzbl  %dil, %eax
>        leal    (%rax,%rax,4), %edx
>        leal    (%rax,%rdx,8), %edx
>        leal    (%rdx,%rdx,4), %edx
>        shrw    $11, %dx
>        leal    (%rdx,%rdx,4), %eax
>        ...
>
>After:
>        movl    $-51, %edx
>        movl    %edx, %eax
>        mulb    %dil
>        movl    %eax, %edx
>        shrw    $11, %dx
>        leal    (%rdx,%rdx,4), %eax
>        ...
>
>
>This patch has been tested on x86_64-pc-linux-gnu with a "make
>bootstrap" and "make -k check" with no new failures.
>Ok for mainline?

OK. 

Richard. 

>
>2020-08-09  Roger Sayle  <roger@nextmovesoftware.com>
>
>gcc/ChangeLog
>	* expmed.c (init_expmed_one_conv): Restore all->reg's mode.
>	(init_expmed_one_mode): Set all->reg to desired mode.
>
>gcc/testsuite/ChangeLog
>	PR target/71321
>	* gcc.target/i386/pr71321.c: Check that the code doesn't use
>	the 4B zero displacement lea, not that it uses lea.
>
>Thanks in advance,
>Roger
>--
>Roger Sayle
>NextMove Software
>Cambridge, UK
diff mbox series

Patch

diff --git a/gcc/expmed.c b/gcc/expmed.c
index 3d2d234..d34f0fb 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -155,6 +155,8 @@  init_expmed_one_conv (struct init_expmed_rtl *all, scalar_int_mode to_mode,
   PUT_MODE (all->reg, from_mode);
   set_convert_cost (to_mode, from_mode, speed,
 		    set_src_cost (which, to_mode, speed));
+  /* Restore all->reg's mode.  */
+  PUT_MODE (all->reg, to_mode);
 }
 
 static void
@@ -229,6 +231,7 @@  init_expmed_one_mode (struct init_expmed_rtl *all,
       if (GET_MODE_CLASS (int_mode_to) == MODE_INT
 	  && GET_MODE_WIDER_MODE (int_mode_to).exists (&wider_mode))
 	{
+	  PUT_MODE (all->reg, mode);
 	  PUT_MODE (all->zext, wider_mode);
 	  PUT_MODE (all->wide_mult, wider_mode);
 	  PUT_MODE (all->wide_lshr, wider_mode);
diff --git a/gcc/testsuite/gcc.target/i386/pr71321.c b/gcc/testsuite/gcc.target/i386/pr71321.c
index 86ad812..24d144b 100644
--- a/gcc/testsuite/gcc.target/i386/pr71321.c
+++ b/gcc/testsuite/gcc.target/i386/pr71321.c
@@ -12,5 +12,4 @@  unsigned cvt_to_2digit_ascii(uint8_t i)
 {
   return cvt_to_2digit(i, 10) + 0x0a3030;
 }
-/* { dg-final { scan-assembler-times "lea.\t\\(%\[0-9a-z\]+,%\[0-9a-z\]+,4" 3 } } */
-/* { dg-final { scan-assembler-times "lea.\t\\(%\[0-9a-z\]+,%\[0-9a-z\]+,8" 1 } } */
+/* { dg-final { scan-assembler-not "lea.*0" } } */