Patchwork [RTL] Canonicalize commutative operations more

login
register
mail settings
Submitter Marc Glisse
Date March 19, 2013, 5:13 p.m.
Message ID <alpine.DEB.2.02.1303191715280.4515@stedding.saclay.inria.fr>
Download mbox | patch
Permalink /patch/229147/
State New
Headers show

Comments

Marc Glisse - March 19, 2013, 5:13 p.m.
Hello,

as explained in the PR, the goal of this patch is to have a more canonical 
form for RTL expressions, so it is easier to model them with a small 
number of patterns.

This patch passes bootstrap+testsuite on x86_64-linux-gnu. Using the 
opposite arbitrary order in compare_commutative_operands_precedence 
(exchange x and y in the line with GET_CODE) passes as well. The 
simplify-rtx bit is because I get an infinite recursion otherwise. 
Surprisingly, that infinite recursion occurs only in var-tracking, and 
only for a single file in bootstrap+testsuite (not the same one depending 
on the arbitrary order). I am not sure that is the best place to break the 
recursion, but it seems to work.

I don't really expect the patch to be accepted as is, but it seems good 
enough to start the conversation.

2013-03-19  Marc Glisse  <marc.glisse@inria.fr>

 	PR rtl-optimization/55611
 	* rtl.h (commutative_operand_precedence): Remove.
 	(compare_commutative_operands_precedence): Declare.
 	* optabs.c (swap_commutative_operands_with_target): Use
 	compare_commutative_operands_precedence.
 	* simplify-rtx.c (simplify_associative_operation): Break the
 	recursion cycle.
 	(simplify_plus_minus_op_data_cmp): Use
 	compare_commutative_operands_precedence.
 	* rtlanal.c (commutative_operand_precedence): Make static.
 	(compare_commutative_operands_precedence): New function.
 	(swap_commutative_operands_p): Use it.
Eric Botcazou - March 27, 2013, 10:01 a.m.
> This patch passes bootstrap+testsuite on x86_64-linux-gnu. Using the
> opposite arbitrary order in compare_commutative_operands_precedence
> (exchange x and y in the line with GET_CODE) passes as well. The
> simplify-rtx bit is because I get an infinite recursion otherwise.
> Surprisingly, that infinite recursion occurs only in var-tracking, and
> only for a single file in bootstrap+testsuite (not the same one depending
> on the arbitrary order). I am not sure that is the best place to break the
> recursion, but it seems to work.
> 
> I don't really expect the patch to be accepted as is, but it seems good
> enough to start the conversation.

I'm rather with Jakub here, I think that arbitrary canonicalization as the one 
proposed here will probably be worse than no canonicalization in practice, 
because it will generate much shuffling.  If you need more canonicalization, 
then why not propose new, precise canonicalization rules?
Marc Glisse - March 30, 2013, 12:49 p.m.
On Wed, 27 Mar 2013, Eric Botcazou wrote:

>> This patch passes bootstrap+testsuite on x86_64-linux-gnu. Using the
>> opposite arbitrary order in compare_commutative_operands_precedence
>> (exchange x and y in the line with GET_CODE) passes as well. The
>> simplify-rtx bit is because I get an infinite recursion otherwise.
>> Surprisingly, that infinite recursion occurs only in var-tracking, and
>> only for a single file in bootstrap+testsuite (not the same one depending
>> on the arbitrary order). I am not sure that is the best place to break the
>> recursion, but it seems to work.
>>
>> I don't really expect the patch to be accepted as is, but it seems good
>> enough to start the conversation.
>
> I'm rather with Jakub here, I think that arbitrary canonicalization as the one
> proposed here will probably be worse than no canonicalization in practice,
> because it will generate much shuffling.  If you need more canonicalization,
> then why not propose new, precise canonicalization rules?

Well, the goal was to have arbitrary canonicalization for all cases where 
precise rules are not already provided. In (a<<8)|(b>>24), there is no 
obvious reason why lshift should have higher or lower priority than 
rshift, but we don't want to have to write patterns with both orders in 
the target .md files. In v[0]+v[1] (v is a V2DF), it is likewise 
preferable if we know in which order the operands will appear, but I don't 
care what that order is (note that ordering this one requires a refinement 
that I didn't include in this first version of the patch). And this way we 
don't have to refine the order all the time, we just check what order this 
arbitrary rule generates and write the pattern accordingly.

I am not sure about adding just a few rules. If I just say that lshift is 
stronger than rshift, the relation is not an order (transitive) anymore. 
And it might already have some of the drawbacks you and Jakub fear (I 
don't have a good understanding of those, I may be wrong).

If there is a nice way to instead try permutations (note that for a 
pattern involving k commutative operations, there are 2^k versions), I am 
just as happy with that.

By the way, even if we don't take the canonicalization (just ignore the 
line comparing GET_CODEs), do we still want the refactoring that this 
patch brings, or is the current interface better?
Eric Botcazou - April 2, 2013, 8:55 a.m.
> Well, the goal was to have arbitrary canonicalization for all cases where
> precise rules are not already provided. In (a<<8)|(b>>24), there is no
> obvious reason why lshift should have higher or lower priority than
> rshift, but we don't want to have to write patterns with both orders in
> the target .md files. In v[0]+v[1] (v is a V2DF), it is likewise
> preferable if we know in which order the operands will appear, but I don't
> care what that order is (note that ordering this one requires a refinement
> that I didn't include in this first version of the patch). And this way we
> don't have to refine the order all the time, we just check what order this
> arbitrary rule generates and write the pattern accordingly.

The question is: do we really need to canonicalize everything?  The more you 
canonicalize, the more internal shuffling you will get in the middle-end for a 
practical benefit that might be elusive in most cases.

> I am not sure about adding just a few rules. If I just say that lshift is
> stronger than rshift, the relation is not an order (transitive) anymore.

Why?  Can't you give them precedences in commutative_operand_precedence that 
preserve the transitivity?

> And it might already have some of the drawbacks you and Jakub fear (I
> don't have a good understanding of those, I may be wrong).

The fear (at least mine) is that, by canonicalizing everything, you will make 
changes behind the back of back-ends that could disable some of their patterns 
silently.  This is also a concern for new canonicalization rules, but at least 
you can reasonably audit the back-ends for every new rule.

> If there is a nice way to instead try permutations (note that for a
> pattern involving k commutative operations, there are 2^k versions), I am
> just as happy with that.

No, of course, canonicalization is desirable.

> By the way, even if we don't take the canonicalization (just ignore the
> line comparing GET_CODEs), do we still want the refactoring that this
> patch brings, or is the current interface better?

I think that the current interface makes it clear that you need to tweak the 
precedence in order to change the canonicalization (thus preserving the 
transitivity).  Btw, there is something to fix in the head comment of 
commutative_operand_precedence because I cannot make sense of it.
Marc Glisse - April 2, 2013, 10:26 a.m.
On Tue, 2 Apr 2013, Eric Botcazou wrote:

>> I am not sure about adding just a few rules. If I just say that lshift is
>> stronger than rshift, the relation is not an order (transitive) anymore.
>
> Why?  Can't you give them precedences in commutative_operand_precedence that
> preserve the transitivity?

I can, but then I am giving lshift higher priority than every other 
operation, not just rshift. And if I want to give (vec_select x 0) a 
higher precedence than (vec_select x 1) but lower than (vec_select 
(vec_concat a b) 1), the weights may become complicated, whereas the 
comparison function could just recurse. But I understand that it has 
advantages over an arbitrary order, so if I ever feel the need again I 
will try to play with precedence values.

I might also experiment with the new transformation feature of .md files 
to write a pattern once and have it expand to 2 patterns for the 2 orders.

> The fear (at least mine) is that, by canonicalizing everything, you will 
> make changes behind the back of back-ends that could disable some of 
> their patterns silently.

I wonder if those issues might in most cases be bugs in the back-ends 
(optimizations missed depending on the order), that the canonicalization 
would make more noticable (and thus easier to fix).
Eric Botcazou - April 2, 2013, 10:54 a.m.
> I can, but then I am giving lshift higher priority than every other
> operation, not just rshift. And if I want to give (vec_select x 0) a
> higher precedence than (vec_select x 1) but lower than (vec_select
> (vec_concat a b) 1), the weights may become complicated, whereas the
> comparison function could just recurse.

You cannot have both transitivity and fast-tracks in the comparison function.
However it isn't clear (at least to me) if transitivity is really required.

(vec_select x 0) and (vec_select x 1) are the same pattern so they shouldn't 
be distinguished here.

> I wonder if those issues might in most cases be bugs in the back-ends
> (optimizations missed depending on the order), that the canonicalization
> would make more noticable (and thus easier to fix).

There are certainly such issues in the back-ends (and even ??? comments about 
it) so sensible canonicalization is desirable.
Jeff Law - April 2, 2013, 3:29 p.m.
On 04/02/2013 04:54 AM, Eric Botcazou wrote:
>> I wonder if those issues might in most cases be bugs in the back-ends
>> (optimizations missed depending on the order), that the canonicalization
>> would make more noticable (and thus easier to fix).
>
> There are certainly such issues in the back-ends (and even ??? comments about
> it) so sensible canonicalization is desirable.
Right.  The choice between canonicalization and code bloat to support 
multiple RTL forms computing the same value is a never-ending problem. 
Where it's clearly made sense we have canonicalized, where it doesn't, 
we haven't.  There's a huge no-mans land in the middle.

Jeff
Mike Stump - April 4, 2013, 3:15 p.m.
On Apr 2, 2013, at 1:55 AM, Eric Botcazou <ebotcazou@adacore.com> wrote:
> The question is: do we really need to canonicalize everything?

> The fear (at least mine) is that, by canonicalizing everything, you will make 
> changes behind the back of back-ends that could disable some of their patterns 
> silently.

My take, yes, we really need to.  My take, during construction, the canonicalization code should kick in and build it canonically.  If you had a total ordering predicate, and defined an ordering for any two rtls, then the builder could use it and then there would be no case in which things are not canonicalized.  If one tries to do this after the fact, we'll, let's just say, no good come from that.

Patch

Index: rtlanal.c

===================================================================
--- rtlanal.c	(revision 196633)

+++ rtlanal.c	(working copy)

@@ -3076,21 +3076,21 @@  regno_use_in (unsigned int regno, rtx x)

 
   return NULL_RTX;
 }
 
 /* Return a value indicating whether OP, an operand of a commutative
    operation, is preferred as the first or second operand.  The higher
    the value, the stronger the preference for being the first operand.
    We use negative values to indicate a preference for the first operand
    and positive values for the second operand.  */
 
-int

+static int

 commutative_operand_precedence (rtx op)
 {
   enum rtx_code code = GET_CODE (op);
 
   /* Constants always come the second operand.  Prefer "nice" constants.  */
   if (code == CONST_INT)
     return -8;
   if (code == CONST_DOUBLE)
     return -7;
   if (code == CONST_FIXED)
@@ -3138,28 +3138,43 @@  commutative_operand_precedence (rtx op)

     case RTX_UNARY:
       /* Then prefer NEG and NOT.  */
       if (code == NEG || code == NOT)
         return 1;
 
     default:
       return 0;
     }
 }
 
+/* Return < 0 if it is necessary to swap operands of commutative operation

+   in order to canonicalize expression, > 0 if it is wrong to swap the

+   operands, and 0 if we don't know.  */

+

+int

+compare_commutative_operands_precedence (rtx x, rtx y)

+{

+  int px = commutative_operand_precedence (x);

+  int py = commutative_operand_precedence (y);

+  if (px != py)

+    return px - py;

+

+  /* Use some arbitrary order to avoid pattern explosion.  */

+  return (int) GET_CODE (x) - (int) GET_CODE (y);

+}

+

 /* Return 1 iff it is necessary to swap operands of commutative operation
    in order to canonicalize expression.  */
 
 bool
 swap_commutative_operands_p (rtx x, rtx y)
 {
-  return (commutative_operand_precedence (x)

-	  < commutative_operand_precedence (y));

+  return compare_commutative_operands_precedence (x, y) < 0;

 }
 
 /* Return 1 if X is an autoincrement side effect and the register is
    not the stack pointer.  */
 int
 auto_inc_p (const_rtx x)
 {
   switch (GET_CODE (x))
     {
     case PRE_INC:
Index: optabs.c

===================================================================
--- optabs.c	(revision 196633)

+++ optabs.c	(working copy)

@@ -1285,33 +1285,29 @@  expand_simple_binop (enum machine_mode m

 		     rtx op1, rtx target, int unsignedp,
 		     enum optab_methods methods)
 {
   optab binop = code_to_optab (code);
   gcc_assert (binop);
 
   return expand_binop (mode, binop, op0, op1, target, unsignedp, methods);
 }
 
 /* Return whether OP0 and OP1 should be swapped when expanding a commutative
-   binop.  Order them according to commutative_operand_precedence and, if

-   possible, try to put TARGET or a pseudo first.  */

+   binop.  Order them according to compare_commutative_operands_precedence and,

+   if possible, try to put TARGET or a pseudo first.  */

 static bool
 swap_commutative_operands_with_target (rtx target, rtx op0, rtx op1)
 {
-  int op0_prec = commutative_operand_precedence (op0);

-  int op1_prec = commutative_operand_precedence (op1);

+  int cmp = compare_commutative_operands_precedence (op0, op1);

 
-  if (op0_prec < op1_prec)

-    return true;

-

-  if (op0_prec > op1_prec)

-    return false;

+  if (cmp != 0)

+    return cmp < 0;

 
   /* With equal precedence, both orders are ok, but it is better if the
      first operand is TARGET, or if both TARGET and OP0 are pseudos.  */
   if (target == 0 || REG_P (target))
     return (REG_P (op1) && !REG_P (op0)) || target == op1;
   else
     return rtx_equal_p (op1, target);
 }
 
 /* Return true if BINOPTAB implements a shift operation.  */
Index: simplify-rtx.c

===================================================================
--- simplify-rtx.c	(revision 196633)

+++ simplify-rtx.c	(working copy)

@@ -2076,22 +2076,24 @@  simplify_associative_operation (enum rtx

       if (! swap_commutative_operands_p (op1, op0))
 	return simplify_gen_binary (code, mode, op1, op0);
 
       tem = op0;
       op0 = op1;
       op1 = tem;
     }
 
   if (GET_CODE (op0) == code)
     {
-      /* Canonicalize "(x op c) op y" as "(x op y) op c".  */

-      if (swap_commutative_operands_p (XEXP (op0, 1), op1))

+      /* Canonicalize "(x op c) op y" as "(x op y) op c".

+	 GET_CODE != code is here to avoid an infinite recursion.  */

+      if (GET_CODE (XEXP (op0, 1)) != code

+	  && swap_commutative_operands_p (XEXP (op0, 1), op1))

 	{
 	  tem = simplify_gen_binary (code, mode, XEXP (op0, 0), op1);
 	  return simplify_gen_binary (code, mode, tem, XEXP (op0, 1));
 	}
 
       /* Attempt to simplify "(a op b) op c" as "a op (b op c)".  */
       tem = simplify_binary_operation (code, mode, XEXP (op0, 1), op1);
       if (tem != 0)
         return simplify_gen_binary (code, mode, XEXP (op0, 0), tem);
 
@@ -4158,26 +4160,24 @@  simplify_const_binary_operation (enum rt

 
 struct simplify_plus_minus_op_data
 {
   rtx op;
   short neg;
 };
 
 static bool
 simplify_plus_minus_op_data_cmp (rtx x, rtx y)
 {
-  int result;

+  int result = compare_commutative_operands_precedence (x, y);

 
-  result = (commutative_operand_precedence (y)

-	    - commutative_operand_precedence (x));

   if (result)
-    return result > 0;

+    return result < 0;

 
   /* Group together equal REGs to do more simplification.  */
   if (REG_P (x) && REG_P (y))
     return REGNO (x) > REGNO (y);
   else
     return false;
 }
 
 static rtx
 simplify_plus_minus (enum rtx_code code, enum machine_mode mode, rtx op0,
Index: rtl.h

===================================================================
--- rtl.h	(revision 196633)

+++ rtl.h	(working copy)

@@ -2015,21 +2015,21 @@  extern rtx get_call_rtx_from (rtx);

 extern HOST_WIDE_INT get_integer_term (const_rtx);
 extern rtx get_related_value (const_rtx);
 extern bool offset_within_block_p (const_rtx, HOST_WIDE_INT);
 extern void split_const (rtx, rtx *, rtx *);
 extern bool unsigned_reg_p (rtx);
 extern int reg_mentioned_p (const_rtx, const_rtx);
 extern int count_occurrences (const_rtx, const_rtx, int);
 extern int reg_referenced_p (const_rtx, const_rtx);
 extern int reg_used_between_p (const_rtx, const_rtx, const_rtx);
 extern int reg_set_between_p (const_rtx, const_rtx, const_rtx);
-extern int commutative_operand_precedence (rtx);

+extern int compare_commutative_operands_precedence (rtx, rtx);

 extern bool swap_commutative_operands_p (rtx, rtx);
 extern int modified_between_p (const_rtx, const_rtx, const_rtx);
 extern int no_labels_between_p (const_rtx, const_rtx);
 extern int modified_in_p (const_rtx, const_rtx);
 extern int reg_set_p (const_rtx, const_rtx);
 extern rtx single_set_2 (const_rtx, const_rtx);
 extern int multiple_sets (const_rtx);
 extern int set_noop_p (const_rtx);
 extern int noop_move_p (const_rtx);
 extern rtx find_last_value (rtx, rtx *, rtx, int);