Patchwork combine: Undo canonicalizations when splitting (minus x (mult))

Submitter Bernd Schmidt
Date Aug. 20, 2010, 11:29 p.m.
Message ID <4C6F0FD1.7040207@codesourcery.com>
Permalink /patch/62331/
State New

Comments

Bernd Schmidt - Aug. 20, 2010, 11:29 p.m.
A week or two ago I posted a patch which added some new
canonicalizations in the combiner, to change
  (plus (mult X -Y) Z)
to
  (minus Z (mult X Y))

This helped on ARM, and to a lesser degree on x86.  I mentioned at the
time that I was aware of one counterexample, in the 20040709-2 testcase.
Here, we have two successive pairs of multiply and add insns, which can
be combined into a single multiply and add - however, that becomes
  (minus (const_int) (mult X (const_int)))
which is not recognizable on i686 even when split.

Since then I've also found a couple of additional examples on x86_64
with vectorization enabled.

The patch below corrects the problem by tweaking find_split_point.  If
we have
  (minus Z (mult X Y))
at the top level, and Y is not a constant power of 2, assume it is
easier for the target to split the (plus (mult X -Y) Z) form.

-       imull   $1103515245, %ecx, %r8d
-       addl    $12345, %r8d
-       imull   $1103515245, %r8d, %r8d
-       addl    $12345, %r8d
+       imull   $-1029531031, %ecx, %r8d
+       subl    $740551042, %r8d

Bootstrapped and tested on x86_64-linux.  Looking at code generation on
ARMv7 and x86_64, a few sequences were improved, and I did not discover
additional regressions.

Ok?


Bernd
Richard Earnshaw - Aug. 24, 2010, 10:55 a.m.
On Sat, 2010-08-21 at 01:29 +0200, Bernd Schmidt wrote:
> Ok?
This is OK.

R.

Patch

Index: testsuite/gcc.target/i386/combine-mul.c
===================================================================
--- testsuite/gcc.target/i386/combine-mul.c	(revision 0)
+++ testsuite/gcc.target/i386/combine-mul.c	(revision 0)
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "12345" } } */
+
+static inline unsigned int myrnd (void)
+{
+    static unsigned int s = 1388815473;
+    s *= 1103515245;
+    s += 12345;
+    return s;
+}
+
+struct __attribute__ ((packed)) A {
+    unsigned short i:1, l:1, j:3, k:11;
+};
+
+struct A sA;
+void testA (void)
+{
+    char *p = (char *) &sA;
+    *p++ = myrnd ();
+    *p++ = myrnd ();
+    sA.k = -1;
+}
Index: combine.c
===================================================================
--- combine.c	(revision 163389)
+++ combine.c	(working copy)
@@ -4771,6 +4771,23 @@  find_split_point (rtx *loc, rtx insn, bo
 
     case PLUS:
     case MINUS:
+      /* Canonicalization can produce (minus A (mult B C)), where C is a
+	 constant.  It may be better to try splitting (plus (mult B -C) A)
+	 instead if this isn't a multiply by a power of two.  */
+      if (set_src && code == MINUS && GET_CODE (XEXP (x, 1)) == MULT
+	  && GET_CODE (XEXP (XEXP (x, 1), 1)) == CONST_INT
+	  && exact_log2 (INTVAL (XEXP (XEXP (x, 1), 1))) < 0)
+	{
+	  enum machine_mode mode = GET_MODE (x);
+	  unsigned HOST_WIDE_INT this_int = INTVAL (XEXP (XEXP (x, 1), 1));
+	  HOST_WIDE_INT other_int = trunc_int_for_mode (-this_int, mode);
+	  SUBST (*loc, gen_rtx_PLUS (mode, gen_rtx_MULT (mode,
+							 XEXP (XEXP (x, 1), 0),
+							 GEN_INT (other_int)),
+				     XEXP (x, 0)));
+	  return find_split_point (loc, insn, set_src);
+	}
+
       /* Split at a multiply-accumulate instruction.  However if this is
          the SET_SRC, we likely do not have such an instruction and it's
          worthless to try this split.  */