Message ID | alpine.DEB.2.02.1303191715280.4515@stedding.saclay.inria.fr |
---|---|
State | New |
Headers | show |
> This patch passes bootstrap+testsuite on x86_64-linux-gnu. Using the > opposite arbitrary order in compare_commutative_operands_precedence > (exchange x and y in the line with GET_CODE) passes as well. The > simplify-rtx bit is because I get an infinite recursion otherwise. > Surprisingly, that infinite recursion occurs only in var-tracking, and > only for a single file in bootstrap+testsuite (not the same one depending > on the arbitrary order). I am not sure that is the best place to break the > recursion, but it seems to work. > > I don't really expect the patch to be accepted as is, but it seems good > enough to start the conversation. I'm rather with Jakub here, I think that arbitrary canonicalization as the one proposed here will probably be worse than no canonicalization in practice, because it will generate much shuffling. If you need more canonicalization, then why not propose new, precise canonicalization rules?
On Wed, 27 Mar 2013, Eric Botcazou wrote: >> This patch passes bootstrap+testsuite on x86_64-linux-gnu. Using the >> opposite arbitrary order in compare_commutative_operands_precedence >> (exchange x and y in the line with GET_CODE) passes as well. The >> simplify-rtx bit is because I get an infinite recursion otherwise. >> Surprisingly, that infinite recursion occurs only in var-tracking, and >> only for a single file in bootstrap+testsuite (not the same one depending >> on the arbitrary order). I am not sure that is the best place to break the >> recursion, but it seems to work. >> >> I don't really expect the patch to be accepted as is, but it seems good >> enough to start the conversation. > > I'm rather with Jakub here, I think that arbitrary canonicalization as the one > proposed here will probably be worse than no canonicalization in practice, > because it will generate much shuffling. If you need more canonicalization, > then why not propose new, precise canonicalization rules? Well, the goal was to have arbitrary canonicalization for all cases where precise rules are not already provided. In (a<<8)|(b>>24), there is no obvious reason why lshift should have higher or lower priority than rshift, but we don't want to have to write patterns with both orders in the target .md files. In v[0]+v[1] (v is a V2DF), it is likewise preferable if we know in which order the operands will appear, but I don't care what that order is (note that ordering this one requires a refinement that I didn't include in this first version of the patch). And this way we don't have to refine the order all the time, we just check what order this arbitrary rule generates and write the pattern accordingly. I am not sure about adding just a few rules. If I just say that lshift is stronger than rshift, the relation is not an order (transitive) anymore. And it might already have some of the drawbacks you and Jakub fear (I don't have a good understanding of those, I may be wrong). If there is a nice way to instead try permutations (note that for a pattern involving k commutative operations, there are 2^k versions), I am just as happy with that. By the way, even if we don't take the canonicalization (just ignore the line comparing GET_CODEs), do we still want the refactoring that this patch brings, or is the current interface better?
> Well, the goal was to have arbitrary canonicalization for all cases where > precise rules are not already provided. In (a<<8)|(b>>24), there is no > obvious reason why lshift should have higher or lower priority than > rshift, but we don't want to have to write patterns with both orders in > the target .md files. In v[0]+v[1] (v is a V2DF), it is likewise > preferable if we know in which order the operands will appear, but I don't > care what that order is (note that ordering this one requires a refinement > that I didn't include in this first version of the patch). And this way we > don't have to refine the order all the time, we just check what order this > arbitrary rule generates and write the pattern accordingly. The question is: do we really need to canonicalize everything? The more you canonicalize, the more internal shuffling you will get in the middle-end for a practical benefit that might be elusive in most cases. > I am not sure about adding just a few rules. If I just say that lshift is > stronger than rshift, the relation is not an order (transitive) anymore. Why? Can't you give them precedences in commutative_operand_precedence that preserve the transitivity? > And it might already have some of the drawbacks you and Jakub fear (I > don't have a good understanding of those, I may be wrong). The fear (at least mine) is that, by canonicalizing everything, you will make changes behind the back of back-ends that could disable some of their patterns silently. This is also a concern for new canonicalization rules, but at least you can reasonably audit the back-ends for every new rule. > If there is a nice way to instead try permutations (note that for a > pattern involving k commutative operations, there are 2^k versions), I am > just as happy with that. No, of course, canonicalization is desirable. > By the way, even if we don't take the canonicalization (just ignore the > line comparing GET_CODEs), do we still want the refactoring that this > patch brings, or is the current interface better? I think that the current interface makes it clear that you need to tweak the precedence in order to change the canonicalization (thus preserving the transitivity). Btw, there is something to fix in the head comment of commutative_operand_precedence because I cannot make sense of it.
On Tue, 2 Apr 2013, Eric Botcazou wrote: >> I am not sure about adding just a few rules. If I just say that lshift is >> stronger than rshift, the relation is not an order (transitive) anymore. > > Why? Can't you give them precedences in commutative_operand_precedence that > preserve the transitivity? I can, but then I am giving lshift higher priority than every other operation, not just rshift. And if I want to give (vec_select x 0) a higher precedence than (vec_select x 1) but lower than (vec_select (vec_concat a b) 1), the weights may become complicated, whereas the comparison function could just recurse. But I understand that it has advantages over an arbitrary order, so if I ever feel the need again I will try to play with precedence values. I might also experiment with the new transformation feature of .md files to write a pattern once and have it expand to 2 patterns for the 2 orders. > The fear (at least mine) is that, by canonicalizing everything, you will > make changes behind the back of back-ends that could disable some of > their patterns silently. I wonder if those issues might in most cases be bugs in the back-ends (optimizations missed depending on the order), that the canonicalization would make more noticable (and thus easier to fix).
> I can, but then I am giving lshift higher priority than every other > operation, not just rshift. And if I want to give (vec_select x 0) a > higher precedence than (vec_select x 1) but lower than (vec_select > (vec_concat a b) 1), the weights may become complicated, whereas the > comparison function could just recurse. You cannot have both transitivity and fast-tracks in the comparison function. However it isn't clear (at least to me) if transitivity is really required. (vec_select x 0) and (vec_select x 1) are the same pattern so they shouldn't be distinguished here. > I wonder if those issues might in most cases be bugs in the back-ends > (optimizations missed depending on the order), that the canonicalization > would make more noticable (and thus easier to fix). There are certainly such issues in the back-ends (and even ??? comments about it) so sensible canonicalization is desirable.
On 04/02/2013 04:54 AM, Eric Botcazou wrote: >> I wonder if those issues might in most cases be bugs in the back-ends >> (optimizations missed depending on the order), that the canonicalization >> would make more noticable (and thus easier to fix). > > There are certainly such issues in the back-ends (and even ??? comments about > it) so sensible canonicalization is desirable. Right. The choice between canonicalization and code bloat to support multiple RTL forms computing the same value is a never-ending problem. Where it's clearly made sense we have canonicalized, where it doesn't, we haven't. There's a huge no-mans land in the middle. Jeff
On Apr 2, 2013, at 1:55 AM, Eric Botcazou <ebotcazou@adacore.com> wrote: > The question is: do we really need to canonicalize everything? > The fear (at least mine) is that, by canonicalizing everything, you will make > changes behind the back of back-ends that could disable some of their patterns > silently. My take, yes, we really need to. My take, during construction, the canonicalization code should kick in and build it canonically. If you had a total ordering predicate, and defined an ordering for any two rtls, then the builder could use it and then there would be no case in which things are not canonicalized. If one tries to do this after the fact, we'll, let's just say, no good come from that.
Index: rtlanal.c =================================================================== --- rtlanal.c (revision 196633) +++ rtlanal.c (working copy) @@ -3076,21 +3076,21 @@ regno_use_in (unsigned int regno, rtx x) return NULL_RTX; } /* Return a value indicating whether OP, an operand of a commutative operation, is preferred as the first or second operand. The higher the value, the stronger the preference for being the first operand. We use negative values to indicate a preference for the first operand and positive values for the second operand. */ -int +static int commutative_operand_precedence (rtx op) { enum rtx_code code = GET_CODE (op); /* Constants always come the second operand. Prefer "nice" constants. */ if (code == CONST_INT) return -8; if (code == CONST_DOUBLE) return -7; if (code == CONST_FIXED) @@ -3138,28 +3138,43 @@ commutative_operand_precedence (rtx op) case RTX_UNARY: /* Then prefer NEG and NOT. */ if (code == NEG || code == NOT) return 1; default: return 0; } } +/* Return < 0 if it is necessary to swap operands of commutative operation + in order to canonicalize expression, > 0 if it is wrong to swap the + operands, and 0 if we don't know. */ + +int +compare_commutative_operands_precedence (rtx x, rtx y) +{ + int px = commutative_operand_precedence (x); + int py = commutative_operand_precedence (y); + if (px != py) + return px - py; + + /* Use some arbitrary order to avoid pattern explosion. */ + return (int) GET_CODE (x) - (int) GET_CODE (y); +} + /* Return 1 iff it is necessary to swap operands of commutative operation in order to canonicalize expression. */ bool swap_commutative_operands_p (rtx x, rtx y) { - return (commutative_operand_precedence (x) - < commutative_operand_precedence (y)); + return compare_commutative_operands_precedence (x, y) < 0; } /* Return 1 if X is an autoincrement side effect and the register is not the stack pointer. */ int auto_inc_p (const_rtx x) { switch (GET_CODE (x)) { case PRE_INC: Index: optabs.c =================================================================== --- optabs.c (revision 196633) +++ optabs.c (working copy) @@ -1285,33 +1285,29 @@ expand_simple_binop (enum machine_mode m rtx op1, rtx target, int unsignedp, enum optab_methods methods) { optab binop = code_to_optab (code); gcc_assert (binop); return expand_binop (mode, binop, op0, op1, target, unsignedp, methods); } /* Return whether OP0 and OP1 should be swapped when expanding a commutative - binop. Order them according to commutative_operand_precedence and, if - possible, try to put TARGET or a pseudo first. */ + binop. Order them according to compare_commutative_operands_precedence and, + if possible, try to put TARGET or a pseudo first. */ static bool swap_commutative_operands_with_target (rtx target, rtx op0, rtx op1) { - int op0_prec = commutative_operand_precedence (op0); - int op1_prec = commutative_operand_precedence (op1); + int cmp = compare_commutative_operands_precedence (op0, op1); - if (op0_prec < op1_prec) - return true; - - if (op0_prec > op1_prec) - return false; + if (cmp != 0) + return cmp < 0; /* With equal precedence, both orders are ok, but it is better if the first operand is TARGET, or if both TARGET and OP0 are pseudos. */ if (target == 0 || REG_P (target)) return (REG_P (op1) && !REG_P (op0)) || target == op1; else return rtx_equal_p (op1, target); } /* Return true if BINOPTAB implements a shift operation. */ Index: simplify-rtx.c =================================================================== --- simplify-rtx.c (revision 196633) +++ simplify-rtx.c (working copy) @@ -2076,22 +2076,24 @@ simplify_associative_operation (enum rtx if (! swap_commutative_operands_p (op1, op0)) return simplify_gen_binary (code, mode, op1, op0); tem = op0; op0 = op1; op1 = tem; } if (GET_CODE (op0) == code) { - /* Canonicalize "(x op c) op y" as "(x op y) op c". */ - if (swap_commutative_operands_p (XEXP (op0, 1), op1)) + /* Canonicalize "(x op c) op y" as "(x op y) op c". + GET_CODE != code is here to avoid an infinite recursion. */ + if (GET_CODE (XEXP (op0, 1)) != code + && swap_commutative_operands_p (XEXP (op0, 1), op1)) { tem = simplify_gen_binary (code, mode, XEXP (op0, 0), op1); return simplify_gen_binary (code, mode, tem, XEXP (op0, 1)); } /* Attempt to simplify "(a op b) op c" as "a op (b op c)". */ tem = simplify_binary_operation (code, mode, XEXP (op0, 1), op1); if (tem != 0) return simplify_gen_binary (code, mode, XEXP (op0, 0), tem); @@ -4158,26 +4160,24 @@ simplify_const_binary_operation (enum rt struct simplify_plus_minus_op_data { rtx op; short neg; }; static bool simplify_plus_minus_op_data_cmp (rtx x, rtx y) { - int result; + int result = compare_commutative_operands_precedence (x, y); - result = (commutative_operand_precedence (y) - - commutative_operand_precedence (x)); if (result) - return result > 0; + return result < 0; /* Group together equal REGs to do more simplification. */ if (REG_P (x) && REG_P (y)) return REGNO (x) > REGNO (y); else return false; } static rtx simplify_plus_minus (enum rtx_code code, enum machine_mode mode, rtx op0, Index: rtl.h =================================================================== --- rtl.h (revision 196633) +++ rtl.h (working copy) @@ -2015,21 +2015,21 @@ extern rtx get_call_rtx_from (rtx); extern HOST_WIDE_INT get_integer_term (const_rtx); extern rtx get_related_value (const_rtx); extern bool offset_within_block_p (const_rtx, HOST_WIDE_INT); extern void split_const (rtx, rtx *, rtx *); extern bool unsigned_reg_p (rtx); extern int reg_mentioned_p (const_rtx, const_rtx); extern int count_occurrences (const_rtx, const_rtx, int); extern int reg_referenced_p (const_rtx, const_rtx); extern int reg_used_between_p (const_rtx, const_rtx, const_rtx); extern int reg_set_between_p (const_rtx, const_rtx, const_rtx); -extern int commutative_operand_precedence (rtx); +extern int compare_commutative_operands_precedence (rtx, rtx); extern bool swap_commutative_operands_p (rtx, rtx); extern int modified_between_p (const_rtx, const_rtx, const_rtx); extern int no_labels_between_p (const_rtx, const_rtx); extern int modified_in_p (const_rtx, const_rtx); extern int reg_set_p (const_rtx, const_rtx); extern rtx single_set_2 (const_rtx, const_rtx); extern int multiple_sets (const_rtx); extern int set_noop_p (const_rtx); extern int noop_move_p (const_rtx); extern rtx find_last_value (rtx, rtx *, rtx, int);