diff mbox

Fix PR 61225

Message ID CACgzC7A7+JXBQ-HCcoqjhmbjoALCi8nngJy2J8SXHF1GRvyQPQ@mail.gmail.com
State New
Headers show

Commit Message

Zhenqiang Chen May 21, 2014, 9:58 a.m. UTC
Hi,

The patch fixes the gcc.target/i386/pr49095.c FAIL in PR61225. The
test case tends to check a peephole2 optimization, which optimizes the
following sequence

    2: bx:SI=ax:SI
    25: ax:SI=[bx:SI]
    7: {ax:SI=ax:SI-0x1;clobber flags:CC;}
    8: [bx:SI]=ax:SI
    9: flags:CCZ=cmp(ax:SI,0)
to
   2: bx:SI=ax:SI
   41: {flags:CCZ=cmp([bx:SI]-0x1,0);[bx:SI]=[bx:SI]-0x1;}

The enhanced shrink-wrapping, which calls copyprop_hardreg_forward
changes the INSN 25 to

    25: ax:SI=[ax:SI]

Then peephole2 can not optimize it since two memory_operands look like
different.

To fix it, the patch adds another peephole2 rule to read one more
insn. From the register copy, it knows the address is the same.

Bootstrap and no make check regression on X86-64.

OK for trunk?

Thanks!
-Zhenqiang

ChangeLog:
2014-05-21  Zhenqiang Chen  <zhenqiang.chen@linaro.org>

        Part of PR rtl-optimization/61225
        * config/i386/i386.md: New peephole2 rule.
        * rtl.c (rtx_equal_p_if_reg_equal): New function.
        * rtl.h (rtx_equal_p_if_reg_equal): New prototype.

testsuite/ChangeLog:
2014-05-21  Zhenqiang Chen  <zhenqiang.chen@linaro.org>

        * gcc.dg/pr61225.c: New test.

Comments

Steven Bosscher May 21, 2014, 12:43 p.m. UTC | #1
On Wed, May 21, 2014 at 11:58 AM, Zhenqiang Chen wrote:
> Hi,
>
> The patch fixes the gcc.target/i386/pr49095.c FAIL in PR61225. The
> test case tends to check a peephole2 optimization, which optimizes the
> following sequence
>
>     2: bx:SI=ax:SI
>     25: ax:SI=[bx:SI]
>     7: {ax:SI=ax:SI-0x1;clobber flags:CC;}
>     8: [bx:SI]=ax:SI
>     9: flags:CCZ=cmp(ax:SI,0)
> to
>    2: bx:SI=ax:SI
>    41: {flags:CCZ=cmp([bx:SI]-0x1,0);[bx:SI]=[bx:SI]-0x1;}
>
> The enhanced shrink-wrapping, which calls copyprop_hardreg_forward
> changes the INSN 25 to
>
>     25: ax:SI=[ax:SI]
>
> Then peephole2 can not optimize it since two memory_operands look like
> different.
>
> To fix it, the patch adds another peephole2 rule to read one more
> insn. From the register copy, it knows the address is the same.

That is one complex peephole2 to deal with a transformation like this.
It seems to be like it's a too specific solution for a bigger problem.

Could you please try one of the following solutions instead:

1. Track register values for peephole2 and try different alternatives
based on known register equivalences? E.g. in your example, perhaps
there is already a REG_EQUAL/REG_EQUIV note available on insn 25 after
copyprop_hardreg_forward, to annotate that [ax:SI] is equivalent to
[bx:SI] at that point (or if that information is not available, it is
not very difficult to make it available). Then you could try applying
peephole2 on the original pattern but also on patterns modified with
the known equivalences (i.e. try peephole2 on multiple equivalent
patterns for the same insn). This may expose other peephole2
opportunities, not just the specific one your patch addresses.

2. Avoid copy-prop'ing ax:SI into [bx:SI] to begin with. At insn 7,
both ax:SI and bx:SI are live, so insn 2 is not dead (i.e. cannot be
eliminated) and there is no benefit in this transformation. It only
hides (or at least makes it harder to see) that [ax:SI] in insn 25 is
the same memory reference as [bx:SI] in insn 8. So perhaps the copy
propagation should simply not be done unless it turns at least one
instruction into dead code.


Any reason why this transformation isn't done much earlier, e.g. in combine?

Ciao!
Steven
Zhenqiang Chen May 22, 2014, 2:01 a.m. UTC | #2
On 21 May 2014 20:43, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> On Wed, May 21, 2014 at 11:58 AM, Zhenqiang Chen wrote:
>> Hi,
>>
>> The patch fixes the gcc.target/i386/pr49095.c FAIL in PR61225. The
>> test case tends to check a peephole2 optimization, which optimizes the
>> following sequence
>>
>>     2: bx:SI=ax:SI
>>     25: ax:SI=[bx:SI]
>>     7: {ax:SI=ax:SI-0x1;clobber flags:CC;}
>>     8: [bx:SI]=ax:SI
>>     9: flags:CCZ=cmp(ax:SI,0)
>> to
>>    2: bx:SI=ax:SI
>>    41: {flags:CCZ=cmp([bx:SI]-0x1,0);[bx:SI]=[bx:SI]-0x1;}
>>
>> The enhanced shrink-wrapping, which calls copyprop_hardreg_forward
>> changes the INSN 25 to
>>
>>     25: ax:SI=[ax:SI]
>>
>> Then peephole2 can not optimize it since two memory_operands look like
>> different.
>>
>> To fix it, the patch adds another peephole2 rule to read one more
>> insn. From the register copy, it knows the address is the same.
>
> That is one complex peephole2 to deal with a transformation like this.
> It seems to be like it's a too specific solution for a bigger problem.
>
> Could you please try one of the following solutions instead:

Thanks for the comments.

> 1. Track register values for peephole2 and try different alternatives
> based on known register equivalences? E.g. in your example, perhaps
> there is already a REG_EQUAL/REG_EQUIV note available on insn 25 after
> copyprop_hardreg_forward, to annotate that [ax:SI] is equivalent to
> [bx:SI] at that point (or if that information is not available, it is
> not very difficult to make it available). Then you could try applying
> peephole2 on the original pattern but also on patterns modified with
> the known equivalences (i.e. try peephole2 on multiple equivalent
> patterns for the same insn). This may expose other peephole2
> opportunities, not just the specific one your patch addresses.

I will try this one.

> 2. Avoid copy-prop'ing ax:SI into [bx:SI] to begin with. At insn 7,
> both ax:SI and bx:SI are live, so insn 2 is not dead (i.e. cannot be
> eliminated) and there is no benefit in this transformation. It only
> hides (or at least makes it harder to see) that [ax:SI] in insn 25 is
> the same memory reference as [bx:SI] in insn 8. So perhaps the copy
> propagation should simply not be done unless it turns at least one
> instruction into dead code.

This is a good heuristics. But it will lead copy-prop much more
complexity. copy-prop pass scans INSN one by one to do the
propagation. If there have multi reference INSNs, you can not make the
decision until changing the last reference.

> Any reason why this transformation isn't done much earlier, e.g. in combine?

I do not know. The similar peephole2 rule was added to fix pr49095.
Will try it if Solution 1 does not work.

Thanks!
-Zhenqiang
diff mbox

Patch

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 44e80ec..40639a5 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -17021,6 +17021,49 @@ 
                                 operands[5], const0_rtx);
 })

+;; Attempt to use arith or logical operations with memory outputs with
+;; setting of flags.  The difference from previous pattern is that it includes
+;; one more register copy insn, which is copy-forwarded to the address.
+(define_peephole2
+  [(set (match_operand:SWI 6 "register_operand")
+       (match_operand:SWI 0 "register_operand"))
+   (set (match_dup 0)
+       (match_operand:SWI 1 "memory_operand"))
+   (parallel [(set (match_dup 0)
+                  (match_operator:SWI 3 "plusminuslogic_operator"
+                    [(match_dup 0)
+                     (match_operand:SWI 2 "<nonmemory_operand>")]))
+             (clobber (reg:CC FLAGS_REG))])
+   (set (match_operand:SWI 7 "memory_operand") (match_dup 0))
+   (set (reg FLAGS_REG) (compare (match_dup 0) (const_int 0)))]
+  "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+   && peep2_reg_dead_p (5, operands[0])
+   && !reg_overlap_mentioned_p (operands[0], operands[2])
+   && !reg_overlap_mentioned_p (operands[0], operands[6])
+   && (GET_MODE (operands[0]) == GET_MODE (operands[6]))
+   && (MEM_ADDR_SPACE (operands[1]) == MEM_ADDR_SPACE (operands[7]))
+   && rtx_equal_p_if_reg_equal (operands[0], operands[6],
+                               XEXP (operands[1], 0), XEXP (operands[7], 0))
+   && (<MODE>mode != QImode
+       || immediate_operand (operands[2], QImode)
+       || q_regs_operand (operands[2], QImode))
+   && ix86_match_ccmode (peep2_next_insn (4),
+                        (GET_CODE (operands[3]) == PLUS
+                         || GET_CODE (operands[3]) == MINUS)
+                        ? CCGOCmode : CCNOmode)"
+  [(set (match_dup 6) (match_dup 0))
+   (parallel [(set (match_dup 4) (match_dup 5))
+             (set (match_dup 1) (match_op_dup 3 [(match_dup 1)
+                                                 (match_dup 2)]))])]
+{
+  operands[4] = SET_DEST (PATTERN (peep2_next_insn (4)));
+  operands[5] = gen_rtx_fmt_ee (GET_CODE (operands[3]), <MODE>mode,
+                               copy_rtx (operands[1]),
+                               copy_rtx (operands[2]));
+  operands[5] = gen_rtx_COMPARE (GET_MODE (operands[4]),
+                                operands[5], const0_rtx);
+})
+
(define_peephole2
   [(parallel [(set (match_operand:SWI 0 "register_operand")
                   (match_operator:SWI 2 "plusminuslogic_operator"
diff --git a/gcc/rtl.c b/gcc/rtl.c
index 520f9a8..99418fc 100644
--- a/gcc/rtl.c
+++ b/gcc/rtl.c
@@ -654,6 +654,82 @@  rtx_equal_p (const_rtx x, const_rtx y)
   return 1;
 }

+/* Return 1 if X and Y are equal rtx's if RX used in X and RY used
+   Y are equal.  */
+
+int
+rtx_equal_p_if_reg_equal (const_rtx rx, const_rtx ry,
+                         const_rtx x, const_rtx y)
+{
+  int i;
+  enum rtx_code code;
+  const char *fmt;
+
+  gcc_assert (REG_P (rx) && REG_P (ry) && x && y);
+
+  code = GET_CODE (x);
+  /* Rtx's of different codes cannot be equal.  */
+  if (code != GET_CODE (y))
+    return 0;
+
+  if (rtx_equal_p (x, y) == 1)
+    return 1;
+
+  /* Since rx == ry, if x == rx && y == ry, x == y.  */
+  if (REG_P (x) && REG_P (y)
+      && GET_MODE (x) == GET_MODE (rx)
+      && REGNO (x) == REGNO (rx)
+      && GET_MODE (y) == GET_MODE (ry)
+      && REGNO (y) == REGNO (ry))
+    return 1;
+
+  /* Compare the elements.  If any pair of corresponding elements
+     fail to match, return 0 for the whole thing.  */
+
+  fmt = GET_RTX_FORMAT (code);
+  for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+    {
+      switch (fmt[i])
+       {
+       case 'e':
+         if (rtx_equal_p (XEXP (x, i), XEXP (y, i)) == 0)
+           {
+             rtx xi = XEXP (x, i);
+             rtx yi = XEXP (y, i);
+
+             if (!(REG_P (xi) && REG_P (yi)
+                   && GET_MODE (xi) == GET_MODE (rx)
+                   && REGNO (xi) == REGNO (rx)
+                   && GET_MODE (yi) == GET_MODE (ry)
+                   && REGNO (yi) == REGNO (ry)))
+               return 0;
+           }
+         break;
+       case 'w':
+         if (XWINT (x, i) != XWINT (y, i))
+           return 0;
+         break;
+
+       case 'n':
+       case 'i':
+         if (XINT (x, i) != XINT (y, i))
+           return 0;
+         break;
+
+       case 'u':
+       case '0':
+       case 't':
+         break;
+
+       default:
+         if (rtx_equal_p (XEXP (x, i), XEXP (y, i)) == 0)
+           return 0;
+         break;
+       }
+    }
+  return 1;
+}
+
 /* Iteratively hash rtx X.  */

 hashval_t
diff --git a/gcc/rtl.h b/gcc/rtl.h
index 10ae1e9..7f9e9e7 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -1983,6 +1983,8 @@  extern unsigned int rtx_size (const_rtx);
 extern rtx shallow_copy_rtx_stat (const_rtx MEM_STAT_DECL);
 #define shallow_copy_rtx(a) shallow_copy_rtx_stat (a MEM_STAT_INFO)
 extern int rtx_equal_p (const_rtx, const_rtx);
+extern int rtx_equal_p_if_reg_equal (const_rtx, const_rtx,
+                                    const_rtx, const_rtx);
 extern hashval_t iterative_hash_rtx (const_rtx, hashval_t);

 /* In emit-rtl.c */
diff --git a/gcc/testsuite/gcc.target/i386/pr61225.c
b/gcc/testsuite/gcc.target/i386/pr61225.c
new file mode 100644
index 0000000..1b33e9c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr61225.c
@@ -0,0 +1,23 @@ 
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+/* { dg-options "-Os -mregparm=2" { target ia32 } } */
+
+void foo (void *);
+
+int *
+f1 (int *x)
+{
+  if (!--*(x))
+    foo (x);
+  return x;
+}
+
+int *
+f2 (int *x)
+{
+  if (!--*(x + 4))
+    foo (x);
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "test\[lq\]" } } */