diff mbox

[RFA] Fix combine to canonicalize (mult X pow2)) more often

Message ID 555DF8B5.1000501@redhat.com
State New
Headers show

Commit Message

Jeff Law May 21, 2015, 3:24 p.m. UTC
When combine needs to split a complex insn, it will canonicalize a 
simple (mult X (const_int Y)) where Y is a power of 2 into the expected 
(ashift X (const_int Y')) if the (mult ...) is selected as a split point.

However if the split point is (plus (mult (X (const_int Y)) Z) combine 
fails to canonicalize.

In this particular testcase we end up with two expressions which produce 
the same value, one is in canonical form using ASHIFT, the other in 
non-canonical form with a MULT.  Because the two forms differ, we fail 
to find the common subexpression in postreload-gcse.

Bootstrapped and regression tested on x86_64-unknown-linux-gnu.  Also 
successfully ran hppa.exp with hppa2.0w-hp-hpux11 cross compiler.  With 
this change we generate the same code for my 300+ files on the PA that 
we had prior to Venkat's combine changes.

The remaining patches in this series will just be cleaning up the PA 
backend, in particular removing all cases where we might generate 
non-canonical forms.  You could legitimately ask if that would eliminate 
the need for this change.  It will not because the MULT form is still 
canonical inside a MEM and we might need to split out an address 
calculation from the MEM to make an insn recognizable.

OK for the trunk?

Jeff

Comments

Segher Boessenkool May 22, 2015, 3:45 p.m. UTC | #1
On Thu, May 21, 2015 at 09:24:37AM -0600, Jeff Law wrote:
> When combine needs to split a complex insn, it will canonicalize a 
> simple (mult X (const_int Y)) where Y is a power of 2 into the expected 
> (ashift X (const_int Y')) if the (mult ...) is selected as a split point.
> 
> However if the split point is (plus (mult (X (const_int Y)) Z) combine 
> fails to canonicalize.

> OK for the trunk?

Okay.  Something went wrong with your changelog though.  Rebasing
changelogs is a recipe for disaster.


The following is just copy-paste, but anyway...

> +	  /* Similarly for (plus (mult FOO (const_int pow2))).  */
> +	  if (split_code == PLUS
> +	      && GET_CODE (XEXP (*split, 0)) == MULT
> +	      && CONST_INT_P (XEXP (XEXP (*split, 0), 1))
> +	      && INTVAL (XEXP (XEXP (*split, 0), 1)) > 0
> +	      && (i = exact_log2 (UINTVAL (XEXP (XEXP (*split, 0), 1)))) >= 0)

INTVAL (..) > 0  here disallows 1<<63 while exact_log2 allows it
(with a 64-bit HWI).  Most places don't check the > 0 and it doesn't
seem necessary in the places that do.

This won't ever trigger for a SImode 1<<31 either.

None of this matters when we're just looking at scaled indexing, of
course.  But exact_log2 is much harder to use correctly than to use
it incorrectly :-(


Segher
Jeff Law May 22, 2015, 8:05 p.m. UTC | #2
On 05/22/2015 09:45 AM, Segher Boessenkool wrote:
> On Thu, May 21, 2015 at 09:24:37AM -0600, Jeff Law wrote:
>> When combine needs to split a complex insn, it will canonicalize a
>> simple (mult X (const_int Y)) where Y is a power of 2 into the expected
>> (ashift X (const_int Y')) if the (mult ...) is selected as a split point.
>>
>> However if the split point is (plus (mult (X (const_int Y)) Z) combine
>> fails to canonicalize.
>
>> OK for the trunk?
>
> Okay.  Something went wrong with your changelog though.  Rebasing
> changelogs is a recipe for disaster.
Yup. I still haven't found a workflow that makes sense with ChangeLogs. 
Ultimately I end up having to check them before each commit/push, so 
assume it'll end up in the right place :-)

>
> The following is just copy-paste, but anyway...
>
>> +	  /* Similarly for (plus (mult FOO (const_int pow2))).  */
>> +	  if (split_code == PLUS
>> +	      && GET_CODE (XEXP (*split, 0)) == MULT
>> +	      && CONST_INT_P (XEXP (XEXP (*split, 0), 1))
>> +	      && INTVAL (XEXP (XEXP (*split, 0), 1)) > 0
>> +	      && (i = exact_log2 (UINTVAL (XEXP (XEXP (*split, 0), 1)))) >= 0)
>
> INTVAL (..) > 0  here disallows 1<<63 while exact_log2 allows it
> (with a 64-bit HWI).  Most places don't check the > 0 and it doesn't
> seem necessary in the places that do.
>
> This won't ever trigger for a SImode 1<<31 either.
>
> None of this matters when we're just looking at scaled indexing, of
> course.  But exact_log2 is much harder to use correctly than to use
> it incorrectly :-(
That test was just copied from the equivalent hunk of code that tests 
for (mult FOO (const_int pow2)) above. It's certainly ok for the INTVAL 
to reject things that might be allowed through the exact_log2 test.  But 
as you say, this isn't going to matter.

Jeff
diff mbox

Patch

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index f4012b7..425813c 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,8 @@ 
 2015-05-21  Jeff Law  <law@redhat.com>
 
+	* combine.c (try_combine): Canonicalize (plus (mult X pow2) Y) into
+	(plus (ashift X log2) Y) if it is a split point.
+
 	* combine.c (find_split_point): Handle ASHIFT like MULT to encourage
 	multiply-accumulate/shift-add insn generation.
 
diff --git a/gcc/combine.c b/gcc/combine.c
index 8c527a7..2cb9fd2 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -3749,6 +3749,21 @@  try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
 	      split_code = GET_CODE (*split);
 	    }
 
+	  /* Similarly for (plus (mult FOO (const_int pow2))).  */
+	  if (split_code == PLUS
+	      && GET_CODE (XEXP (*split, 0)) == MULT
+	      && CONST_INT_P (XEXP (XEXP (*split, 0), 1))
+	      && INTVAL (XEXP (XEXP (*split, 0), 1)) > 0
+	      && (i = exact_log2 (UINTVAL (XEXP (XEXP (*split, 0), 1)))) >= 0)
+	    {
+	      rtx nsplit = XEXP (*split, 0);
+	      SUBST (XEXP (*split, 0), gen_rtx_ASHIFT (GET_MODE (nsplit),
+					     XEXP (nsplit, 0), GEN_INT (i)));
+	      /* Update split_code because we may not have a multiply
+		 anymore.  */
+	      split_code = GET_CODE (*split);
+	    }
+
 #ifdef INSN_SCHEDULING
 	  /* If *SPLIT is a paradoxical SUBREG, when we split it, it should
 	     be written as a ZERO_EXTEND.  */
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 283644c..41a09bad 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,5 +1,6 @@ 
 2015-05-21  Jeff Law  <law@redhat.com>
 
+	* gcc.target/hppa/shadd-3.c: New test.
 	* gcc.target/hppa/shadd-2.c: New test.
 
 2015-05-21  Oleg Endo  <olegendo@gcc.gnu.org>
diff --git a/gcc/testsuite/gcc.target/hppa/shadd-3.c b/gcc/testsuite/gcc.target/hppa/shadd-3.c
new file mode 100644
index 0000000..f0443ea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/hppa/shadd-3.c
@@ -0,0 +1,41 @@ 
+/* { dg-do compile }  */
+/* { dg-options "-O2" }  */
+/* In this test we want to verify that combine canonicalizes the
+   MULT into an ASHIFT which in turn allows postreload-gcse to
+   find the common subexpression.
+
+   Neither pass dumps stuff in a format that is particularly good
+   for parsing here, so we count the shadd insns.  More is not
+   necessarily better in this test.  If this test is too fragile
+   over time we'll have to revisit the combine and/or postreload
+   dumps.  */
+/* { dg-final { scan-assembler-times "sh.add" 5 } }  */
+
+extern void oof (void);
+typedef struct simple_bitmap_def *sbitmap;
+struct simple_bitmap_def
+{
+  unsigned char *popcount;
+  unsigned int n_bits;
+  unsigned long elms[1];
+};
+__inline__ void
+SET_BIT (sbitmap map, unsigned int bitno)
+{
+  if (map->popcount)
+    {
+      unsigned char oldbit;
+      oldbit =
+	((map)->elms[bitno / 64]);
+      if (!oldbit)
+	oof ();
+    }
+  map->elms[bitno / 64] |= 1;
+}
+
+void
+fix_bb_placements (int indx1, int indx2, sbitmap in_queue)
+{
+  SET_BIT (in_queue, indx1);
+  SET_BIT (in_queue, indx2);
+}