diff mbox

[v2,rtl-optimization] : Fix PR54457, [x32] Fail to combine 64bit index + constant

Message ID CAFULd4YiqSJPDNzg3XTyWx=mBphzCWSpuaQiQ7SbO17Bgc7SSA@mail.gmail.com
State New
Headers show

Commit Message

Uros Bizjak Sept. 27, 2012, 6:04 p.m. UTC
On Thu, Sep 27, 2012 at 4:25 PM, Richard Sandiford
<rdsandiford@googlemail.com> wrote:

>>>> I agree (subreg:M (op:N A C) 0) to (op:M (subreg:N (A 0)) C) is
>>>> a good transformation, but why do we need to handle as special
>>>> the case where the subreg is itself the operand of a plus or minus?
>>>> I think it should happen regardless of where the subreg occurs.
>>>
>>> Don't we need to restrict this to the low part though?
>>
>> I have tried this approach with attached patch.  Unfortunately,
>> although it survived bootstrap without libjava on x86_64, it failed
>> building libjava with:
>>
>> /home/uros/gcc-svn/trunk/libjava/classpath/javax/swing/plaf/basic/BasicSliderUI.java:1299:0:
>> error: insn does not satisfy its constraints:
>>    }
>>  ^
>> (insn 237 398 399 7 (set (reg:SI 1 dx [125])
>>         (plus:SI (subreg:SI (mult:DI (reg:DI 1 dx [orig:72 D.78627 ] [72])
>>                     (const_int 2 [0x2])) 0)
>>             (reg:SI 5 di)))
>> /home/uros/gcc-svn/trunk/libjava/classpath/javax/swing/plaf/basic/BasicSliderUI.java:1271
>> 240 {*leasi}
>>      (expr_list:REG_DEAD (reg:DI 5 di)
>>         (nil)))
>>
>> Original RTX was (subreg:SI (plus:DI (mult:DI (...) reg:DI))), which
>> is valid RTX pattern for lea insn, the above is not.
>>
>> Due to these problems, I think the safer approach is to limit the
>> transformation to (plus:SI (subreg:SI (plus:DI (...) 0)) RTXes, as was
>> the case with original patch. This approach would fix a specific
>> problem where simplify_plus_minus is not able to simplify the combined
>> RTX at combine time. Please note, that combined RTXes are always
>> checked for correctness at combine pass.
>
> I think instead the (subreg (plus ...)) handling should be applied
> to (subreg (mult ...)) too.  IMO the correct form of the above address
> ought to be:
>
>     (set (reg:SI 1 dx [125])
>          (plus:SI (mult:SI (reg:SI 1 dx [orig:72 D.78627 ] [72])
>                            (const_int 2 [0x2]))
>                   (reg:SI 5 di))

Great, this works as expected!

After some off-line discussion with Richard, attached is v2 of the patch.

2012-09-27  Uros Bizjak  <ubizjak@gmail.com>

        PR rtl-optimization/54457
        * simplify-rtx.c (simplify_subreg):
	Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0)
     	to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)).

testsuite/ChangeLog:

2012-09-27  Uros Bizjak  <ubizjak@gmail.com>

        PR rtl-optimization/54457
        * gcc.target/i386/pr54457.c: New test.

Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.

BTW: I propose that we start with limited selection of opcodes, so x32
autotester will pick and test the patch with SImode addresses.

OK for mainline?

Uros.

Comments

Paul Koning Sept. 27, 2012, 6:08 p.m. UTC | #1
On Sep 27, 2012, at 2:04 PM, Uros Bizjak wrote:

> 
> 
> 
>>>>> I agree (subreg:M (op:N A C) 0) to (op:M (subreg:N (A 0)) C) is
>>>>> a good transformation, but why do we need to handle as special
>>>>> the case where the subreg is itself the operand of a plus or minus?
>>>>> I think it should happen regardless of where the subreg occurs.
>>>> 
>>>> Don't we need to restrict this to the low part though?
>>> 
>>> ...
> 
> After some off-line discussion with Richard, attached is v2 of the patch.
> 
> 2012-09-27  Uros Bizjak  <ubizjak@gmail.com>
> 
>        PR rtl-optimization/54457
>        * simplify-rtx.c (simplify_subreg):
> 	Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0)
>     	to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)).
> ...

Is it just specific to DI -> SI, or is it for any large mode -> smaller mode, like SI -> HI?

	paul
Uros Bizjak Sept. 27, 2012, 6:13 p.m. UTC | #2
On Thu, Sep 27, 2012 at 8:08 PM,  <Paul_Koning@dell.com> wrote:

>>>>>> I agree (subreg:M (op:N A C) 0) to (op:M (subreg:N (A 0)) C) is
>>>>>> a good transformation, but why do we need to handle as special
>>>>>> the case where the subreg is itself the operand of a plus or minus?
>>>>>> I think it should happen regardless of where the subreg occurs.
>>>>>
>>>>> Don't we need to restrict this to the low part though?
>>>>
>>>> ...
>>
>> After some off-line discussion with Richard, attached is v2 of the patch.
>>
>> 2012-09-27  Uros Bizjak  <ubizjak@gmail.com>
>>
>>        PR rtl-optimization/54457
>>        * simplify-rtx.c (simplify_subreg):
>>       Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0)
>>       to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)).
>> ...
>
> Is it just specific to DI -> SI, or is it for any large mode -> smaller mode, like SI -> HI?

Oh, I just copied v1 ChangeLog. The patch converts all modes where
size of mode M < size of mode N. Updated ChangeLog reads:

2012-09-27  Uros Bizjak  <ubizjak@gmail.com>

        PR rtl-optimization/54457
        * simplify-rtx.c (simplify_subreg):
	Simplify (subreg:M (op:N ((x:N) (y:N)), 0)
     	to (op:M (subreg:M (x:N) 0) (subreg:M (x:N) 0)), where
	the outer subreg is effectively a truncation to the original mode M.

testsuite/ChangeLog:

2012-09-27  Uros Bizjak  <ubizjak@gmail.com>

        PR rtl-optimization/54457
        * gcc.target/i386/pr54457.c: New test.

Uros.
Jakub Jelinek Sept. 27, 2012, 6:20 p.m. UTC | #3
On Thu, Sep 27, 2012 at 08:04:58PM +0200, Uros Bizjak wrote:
> After some off-line discussion with Richard, attached is v2 of the patch.
> 
> 2012-09-27  Uros Bizjak  <ubizjak@gmail.com>
> 
>         PR rtl-optimization/54457
>         * simplify-rtx.c (simplify_subreg):
> 	Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0)
>      	to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)).

Is that a good idea even for WORD_REGISTER_OPERATIONS targets?

> --- simplify-rtx.c	(revision 191808)
> +++ simplify-rtx.c	(working copy)
> @@ -5689,6 +5689,28 @@ simplify_subreg (enum machine_mode outermode, rtx
>  	return CONST0_RTX (outermode);
>      }
>  
> +  /* Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0)
> +     to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)), where
> +     the outer subreg is effectively a truncation to the original mode.  */
> +  if ((GET_CODE (op) == PLUS
> +       || GET_CODE (op) == MINUS
> +       || GET_CODE (op) == MULT)
> +      && SCALAR_INT_MODE_P (outermode)
> +      && SCALAR_INT_MODE_P (innermode)
> +      && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode)
> +      && byte == subreg_lowpart_offset (outermode, innermode))
> +    {
> +      rtx op0 = simplify_gen_subreg (outermode, XEXP (op, 0),
> +                                     innermode, byte);
> +      if (op0)
> +        {
> +          rtx op1 = simplify_gen_subreg (outermode, XEXP (op, 1),
> +                                         innermode, byte);
> +          if (op1)
> +            return simplify_gen_binary (GET_CODE (op), outermode, op0, op1);
> +        }
> +    }
> +
>    /* Simplify (subreg:QI (lshiftrt:SI (sign_extend:SI (x:QI)) C), 0) into
>       to (ashiftrt:QI (x:QI) C), where C is a suitable small constant and
>       the outer subreg is effectively a truncation to the original mode.  */

	Jakub
Richard Sandiford Sept. 27, 2012, 8:58 p.m. UTC | #4
Jakub Jelinek <jakub@redhat.com> writes:
> On Thu, Sep 27, 2012 at 08:04:58PM +0200, Uros Bizjak wrote:
>> After some off-line discussion with Richard, attached is v2 of the patch.
>> 
>> 2012-09-27  Uros Bizjak  <ubizjak@gmail.com>
>> 
>>         PR rtl-optimization/54457
>>         * simplify-rtx.c (simplify_subreg):
>> 	Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0)
>>      	to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)).
>
> Is that a good idea even for WORD_REGISTER_OPERATIONS targets?

I think so.  Admittedly it means I'm going to have to change the
mips.md BADDU patterns from:

(define_insn "*baddu_si_el"
  [(set (match_operand:SI 0 "register_operand" "=d")
        (zero_extend:SI
	 (subreg:QI
	  (plus:SI (match_operand:SI 1 "register_operand" "d")
		   (match_operand:SI 2 "register_operand" "d")) 0)))]
  "ISA_HAS_BADDU && !BYTES_BIG_ENDIAN"
  "baddu\\t%0,%1,%2"
  [(set_attr "alu_type" "add")])

to:

(define_insn "*baddu_si_el"
  [(set (match_operand:SI 0 "register_operand" "=d")
        (zero_extend:SI
	 (plus:QI (match_operand:QI 1 "register_operand" "d")
		  (match_operand:QI 2 "register_operand" "d"))))]
  "ISA_HAS_BADDU"
  "baddu\\t%0,%1,%2"
  [(set_attr "alu_type" "add")])

But really, that's a better representation even on MIPS,
not least because it gets rid of the ugly endianness condition.

There will probably be fallout on other targets too.
But the upside is that we get rid of more subregs from .mds.
I think a few of the i386.md define_peephole2s could go too.
E.g. the second two of:

(define_peephole2
  [(set (match_operand:DI 0 "register_operand")
  	(zero_extend:DI
	  (plus:SI (match_operand:SI 1 "register_operand")
		   (match_operand:SI 2 "nonmemory_operand"))))]
  "TARGET_64BIT && !TARGET_OPT_AGU
   && REGNO (operands[0]) == REGNO (operands[1])
   && peep2_regno_dead_p (0, FLAGS_REG)"
  [(parallel [(set (match_dup 0)
		   (zero_extend:DI (plus:SI (match_dup 1) (match_dup 2))))
	      (clobber (reg:CC FLAGS_REG))])])

(define_peephole2
  [(set (match_operand:DI 0 "register_operand")
  	(zero_extend:DI
	  (plus:SI (match_operand:SI 1 "nonmemory_operand")
		   (match_operand:SI 2 "register_operand"))))]
  "TARGET_64BIT && !TARGET_OPT_AGU
   && REGNO (operands[0]) == REGNO (operands[2])
   && peep2_regno_dead_p (0, FLAGS_REG)"
  [(parallel [(set (match_dup 0)
		   (zero_extend:DI (plus:SI (match_dup 2) (match_dup 1))))
	      (clobber (reg:CC FLAGS_REG))])])

(define_peephole2
  [(set (match_operand:DI 0 "register_operand")
  	(zero_extend:DI
	  (subreg:SI (plus:DI (match_dup 0)
			      (match_operand:DI 1 "nonmemory_operand")) 0)))]
  "TARGET_64BIT && !TARGET_OPT_AGU
   && peep2_regno_dead_p (0, FLAGS_REG)"
  [(parallel [(set (match_dup 0)
		   (zero_extend:DI (plus:SI (match_dup 2) (match_dup 1))))
	      (clobber (reg:CC FLAGS_REG))])]
{
  operands[1] = gen_lowpart (SImode, operands[1]);
  operands[2] = gen_lowpart (SImode, operands[0]);
})

(define_peephole2
  [(set (match_operand:DI 0 "register_operand")
  	(zero_extend:DI
	  (subreg:SI (plus:DI (match_operand:DI 1 "nonmemory_operand")
		     	      (match_dup 0)) 0)))]
  "TARGET_64BIT && !TARGET_OPT_AGU
   && peep2_regno_dead_p (0, FLAGS_REG)"
  [(parallel [(set (match_dup 0)
		   (zero_extend:DI (plus:SI (match_dup 2) (match_dup 1))))
	      (clobber (reg:CC FLAGS_REG))])]
{
  operands[1] = gen_lowpart (SImode, operands[1]);
  operands[2] = gen_lowpart (SImode, operands[0]);
})

where we should now always generate the first two forms.

There's no semantic difference between the two rtxes, and I think
it would be confusing to have different canonical forms on different
targets.  If the caller really wants a word-mode operation on
WORD_REGISTER_OPERATIONS targets, then I think it's asking for
the wrong thing by taking this subreg.

Richard
Uros Bizjak Sept. 28, 2012, 3:32 p.m. UTC | #5
On Thu, Sep 27, 2012 at 8:20 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Sep 27, 2012 at 08:04:58PM +0200, Uros Bizjak wrote:
>> After some off-line discussion with Richard, attached is v2 of the patch.
>>
>> 2012-09-27  Uros Bizjak  <ubizjak@gmail.com>
>>
>>         PR rtl-optimization/54457
>>         * simplify-rtx.c (simplify_subreg):
>>       Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0)
>>       to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)).
>
> Is that a good idea even for WORD_REGISTER_OPERATIONS targets?

I have bootstrapped and regtested [1] the patch on
alphaev68-pc-linux-gnu, a WORD_REGISTER_OPERATIONS target, and there
were no additional failures.

[1] http://gcc.gnu.org/ml/gcc-testresults/2012-09/msg02828.html

Uros.
Richard Sandiford Sept. 30, 2012, 9:02 a.m. UTC | #6
Uros Bizjak <ubizjak@gmail.com> writes:
> On Thu, Sep 27, 2012 at 8:20 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Thu, Sep 27, 2012 at 08:04:58PM +0200, Uros Bizjak wrote:
>>> After some off-line discussion with Richard, attached is v2 of the patch.
>>>
>>> 2012-09-27  Uros Bizjak  <ubizjak@gmail.com>
>>>
>>>         PR rtl-optimization/54457
>>>         * simplify-rtx.c (simplify_subreg):
>>>       Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0)
>>>       to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)).
>>
>> Is that a good idea even for WORD_REGISTER_OPERATIONS targets?
>
> I have bootstrapped and regtested [1] the patch on
> alphaev68-pc-linux-gnu, a WORD_REGISTER_OPERATIONS target, and there
> were no additional failures.

Thanks.  Given Jakub's question/concern, I'd really prefer a third
opinion rather than approving it myself, but... OK if no-one objects
within 24hrs.

Richard
Andrew Pinski Oct. 2, 2012, 2:12 a.m. UTC | #7
On Thu, Sep 27, 2012 at 11:13 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Thu, Sep 27, 2012 at 8:08 PM,  <Paul_Koning@dell.com> wrote:
>
>>>>>>> I agree (subreg:M (op:N A C) 0) to (op:M (subreg:N (A 0)) C) is
>>>>>>> a good transformation, but why do we need to handle as special
>>>>>>> the case where the subreg is itself the operand of a plus or minus?
>>>>>>> I think it should happen regardless of where the subreg occurs.
>>>>>>
>>>>>> Don't we need to restrict this to the low part though?
>>>>>
>>>>> ...
>>>
>>> After some off-line discussion with Richard, attached is v2 of the patch.
>>>
>>> 2012-09-27  Uros Bizjak  <ubizjak@gmail.com>
>>>
>>>        PR rtl-optimization/54457
>>>        * simplify-rtx.c (simplify_subreg):
>>>       Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0)
>>>       to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)).
>>> ...
>>
>> Is it just specific to DI -> SI, or is it for any large mode -> smaller mode, like SI -> HI?
>
> Oh, I just copied v1 ChangeLog. The patch converts all modes where
> size of mode M < size of mode N. Updated ChangeLog reads:
>
> 2012-09-27  Uros Bizjak  <ubizjak@gmail.com>
>
>         PR rtl-optimization/54457
>         * simplify-rtx.c (simplify_subreg):
>         Simplify (subreg:M (op:N ((x:N) (y:N)), 0)
>         to (op:M (subreg:M (x:N) 0) (subreg:M (x:N) 0)), where
>         the outer subreg is effectively a truncation to the original mode M.


When I was doing something similar on our internal toolchain at
Cavium.  I found doing this caused a regression on MIPS64 n32 in
gcc.c-torture/execute/20040709-1.c Where:


(insn 15 14 16 2 (set (reg/v:DI 200 [ y ])
        (reg:DI 2 $2)) t.c:16 301 {*movdi_64bit}
     (expr_list:REG_DEAD (reg:DI 2 $2)
        (nil)))

(insn 16 15 17 2 (set (reg:DI 210)
        (zero_extract:DI (reg/v:DI 200 [ y ])
            (const_int 29 [0x1d])
            (const_int 0 [0]))) t.c:16 249 {extzvdi}
     (expr_list:REG_DEAD (reg/v:DI 200 [ y ])
        (nil)))

(insn 17 16 23 2 (set (reg:SI 211)
        (truncate:SI (reg:DI 210))) t.c:16 175 {truncdisi2}
     (expr_list:REG_DEAD (reg:DI 210)
        (nil)))

Gets converted to:
(insn 23 17 26 2 (set (reg/i:SI 2 $2)
        (and:SI (reg:SI 2 $2 [+4 ])
            (const_int 536870911 [0x1fffffff]))) t.c:18 156 {*andsi3}
     (nil))

Which is considered an ext instruction

And with the Octeon simulator which causes undefined arguments to
32bit word operations to come out as 0xDEADBEEF which showed the
regression.  I fixed it by changing it to produce TRUNCATE instead of
the subreg.

I did the simplification on ior/and rather than plus/minus/mult so the
issue is only when expanding to this to and/ior.

Thanks,
Andrew Pinski



>
> testsuite/ChangeLog:
>
> 2012-09-27  Uros Bizjak  <ubizjak@gmail.com>
>
>         PR rtl-optimization/54457
>         * gcc.target/i386/pr54457.c: New test.
>
> Uros.
Richard Sandiford Oct. 2, 2012, 7:33 p.m. UTC | #8
Andrew Pinski <andrew.pinski@caviumnetworks.com> writes:
> On Thu, Sep 27, 2012 at 11:13 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> 2012-09-27  Uros Bizjak  <ubizjak@gmail.com>
>>
>>         PR rtl-optimization/54457
>>         * simplify-rtx.c (simplify_subreg):
>>         Simplify (subreg:M (op:N ((x:N) (y:N)), 0)
>>         to (op:M (subreg:M (x:N) 0) (subreg:M (x:N) 0)), where
>>         the outer subreg is effectively a truncation to the original mode M.
>
>
> When I was doing something similar on our internal toolchain at
> Cavium.  I found doing this caused a regression on MIPS64 n32 in
> gcc.c-torture/execute/20040709-1.c Where:
>
>
> (insn 15 14 16 2 (set (reg/v:DI 200 [ y ])
>         (reg:DI 2 $2)) t.c:16 301 {*movdi_64bit}
>      (expr_list:REG_DEAD (reg:DI 2 $2)
>         (nil)))
>
> (insn 16 15 17 2 (set (reg:DI 210)
>         (zero_extract:DI (reg/v:DI 200 [ y ])
>             (const_int 29 [0x1d])
>             (const_int 0 [0]))) t.c:16 249 {extzvdi}
>      (expr_list:REG_DEAD (reg/v:DI 200 [ y ])
>         (nil)))
>
> (insn 17 16 23 2 (set (reg:SI 211)
>         (truncate:SI (reg:DI 210))) t.c:16 175 {truncdisi2}
>      (expr_list:REG_DEAD (reg:DI 210)
>         (nil)))
>
> Gets converted to:
> (insn 23 17 26 2 (set (reg/i:SI 2 $2)
>         (and:SI (reg:SI 2 $2 [+4 ])
>             (const_int 536870911 [0x1fffffff]))) t.c:18 156 {*andsi3}
>      (nil))
>
> Which is considered an ext instruction
>
> And with the Octeon simulator which causes undefined arguments to
> 32bit word operations to come out as 0xDEADBEEF which showed the
> regression.  I fixed it by changing it to produce TRUNCATE instead of
> the subreg.
>
> I did the simplification on ior/and rather than plus/minus/mult so the
> issue is only when expanding to this to and/ior.

Hmm, hadn't thought of that.  I think some of the existing subreg
optimisations suffer the same problem.  I.e. we can't assume that
subreg truncations of nested operands are OK just because the outer
subreg is OK.

I've got a patch I'm testing.

BTW, I haven't forgotten about your other ext patch.  Was hoping
to see whether we could finally take the opportunity to parameterise
the ext* patterns by mode, but got distracted with other patches.
Maybe I'll just have to admit I won't get time to try it for 4.8...

Richard
diff mbox

Patch

Index: simplify-rtx.c
===================================================================
--- simplify-rtx.c	(revision 191808)
+++ simplify-rtx.c	(working copy)
@@ -5689,6 +5689,28 @@  simplify_subreg (enum machine_mode outermode, rtx
 	return CONST0_RTX (outermode);
     }
 
+  /* Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0)
+     to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)), where
+     the outer subreg is effectively a truncation to the original mode.  */
+  if ((GET_CODE (op) == PLUS
+       || GET_CODE (op) == MINUS
+       || GET_CODE (op) == MULT)
+      && SCALAR_INT_MODE_P (outermode)
+      && SCALAR_INT_MODE_P (innermode)
+      && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode)
+      && byte == subreg_lowpart_offset (outermode, innermode))
+    {
+      rtx op0 = simplify_gen_subreg (outermode, XEXP (op, 0),
+                                     innermode, byte);
+      if (op0)
+        {
+          rtx op1 = simplify_gen_subreg (outermode, XEXP (op, 1),
+                                         innermode, byte);
+          if (op1)
+            return simplify_gen_binary (GET_CODE (op), outermode, op0, op1);
+        }
+    }
+
   /* Simplify (subreg:QI (lshiftrt:SI (sign_extend:SI (x:QI)) C), 0) into
      to (ashiftrt:QI (x:QI) C), where C is a suitable small constant and
      the outer subreg is effectively a truncation to the original mode.  */
Index: testsuite/gcc.target/i386/pr54457.c
===================================================================
--- testsuite/gcc.target/i386/pr54457.c	(revision 0)
+++ testsuite/gcc.target/i386/pr54457.c	(working copy)
@@ -0,0 +1,11 @@ 
+/* { dg-do compile { target { ! { ia32 } } } } */
+/* { dg-options "-O2 -mx32 -maddress-mode=short" } */
+
+extern char array[40];
+
+char foo (long long position)
+{
+  return array[position + 1];
+}
+
+/* { dg-final { scan-assembler-not "add\[lq\]?\[^\n\]*1" } } */