diff mbox

PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

Message ID 20111012223450.GA11516@intel.com
State New
Headers show

Commit Message

H.J. Lu Oct. 12, 2011, 10:34 p.m. UTC
Hi,

When combine tries to combine:

(insn 37 35 39 3 (set (reg:SI 90)
        (plus:SI (mult:SI (reg/v:SI 84 [ i ])
                (const_int 4 [0x4]))
            (reg:SI 106))) x.i:11 247 {*leasi_2}
     (nil))

(insn 39 37 41 3 (set (mem:SI (zero_extend:DI (reg:SI 90)) [3
MEM[symbol: x,
index: D.2741_12, step: 4, offset: 4294967292B]+0 S4 A32])
        (reg/v:SI 84 [ i ])) x.i:11 64 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 90)
        (nil)))

it optimizes

	(zero_extend:DI (plus:SI (mult:SI (reg/v:SI 84 [ i ])
                (const_int 4 [0x4]))
            (reg:SI 106)))

into

	(and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ])
                        (const_int 4 [0x4])) 0)
                (subreg:DI (reg:SI 106) 0))
            (const_int 4294967292 [0xfffffffc])) 

in make_compound_operation.  X86 backend doesn't accept the new
expression as valid address while (zero_extend:DI) works just fine.
This patches keeps ZERO_EXTEND when zero-extending address to Pmode.
It reduces number of lea from 24173 to 21428 in x32 libgfortran.so.
Does it make any senses?

Thanks.

H.J.
---
2011-10-12  H.J. Lu  <hongjiu.lu@intel.com>

	PR rtl-optimization/50696
	* combine.c (subst): If an address is zero-extended to Pmode,
	replace FROM with while keeping ZERO_EXTEND.

Comments

Richard Kenner Oct. 12, 2011, 10:40 p.m. UTC | #1
> X86 backend doesn't accept the new expression as valid address while
> (zero_extend:DI) works just fine.  This patches keeps ZERO_EXTEND
> when zero-extending address to Pmode.  It reduces number of lea from
> 24173 to 21428 in x32 libgfortran.so.  Does it make any senses?

I'd be inclined to have the x86 backend accept combine's canonicalized
form rather than doing a patch such as this.
H.J. Lu Oct. 12, 2011, 10:52 p.m. UTC | #2
On Wed, Oct 12, 2011 at 3:40 PM, Richard Kenner
<kenner@vlsi1.ultra.nyu.edu> wrote:
>> X86 backend doesn't accept the new expression as valid address while
>> (zero_extend:DI) works just fine.  This patches keeps ZERO_EXTEND
>> when zero-extending address to Pmode.  It reduces number of lea from
>> 24173 to 21428 in x32 libgfortran.so.  Does it make any senses?
>
> I'd be inclined to have the x86 backend accept combine's canonicalized
> form rather than doing a patch such as this.
>

The address format generated by combine is very unusual in
2 aspecst:

1. The placement of subreg in

(plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ])
                        (const_int 4 [0x4])) 0)
                (subreg:DI (reg:SI 106) 0))

isn't supported by x86 backend.

2. The biggest problem is optimizing mask 0xffffffff to
0xfffffffc by keeping track of non-zero bits in registers.
X86 backend doesn't have such information to know
ADDR & 0xfffffffc == ADDR & 0xffffffff.
Richard Kenner Oct. 12, 2011, 11:04 p.m. UTC | #3
> 1. The placement of subreg in
> 
> (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ])
>                         (const_int 4 [0x4])) 0)
>                 (subreg:DI (reg:SI 106) 0))
> 
> isn't supported by x86 backend.

That's easy to fix.

> 2. The biggest problem is optimizing mask 0xffffffff to
> 0xfffffffc by keeping track of non-zero bits in registers.
> X86 backend doesn't have such information to know
> ADDR & 0xfffffffc == ADDR & 0xffffffff.

But this indeed isn't.

I withdraw my comment.  

I still don't like the patch, but I'm no longer as familiar with the code
as I used to be so can't suggest a replacement.  Let's see what others 
think about it.
Paolo Bonzini Oct. 13, 2011, 10:17 a.m. UTC | #4
On 10/13/2011 01:04 AM, Richard Kenner wrote:
>
> I still don't like the patch, but I'm no longer as familiar with the code
> as I used to be so can't suggest a replacement.  Let's see what others
> think about it.

Same here, I don't like it but I hardly see any alternative.  The only 
possibility could be to prevent calling expand_compound_operation 
completely for addresses.  Richard, what do you think?  Don't worry, 
combine hasn't changed much since your days. :)

Paolo
Richard Kenner Oct. 13, 2011, 12:51 p.m. UTC | #5
> Same here, I don't like it but I hardly see any alternative.  The only 
> possibility could be to prevent calling expand_compound_operation 
> completely for addresses.  Richard, what do you think?  Don't worry, 
> combine hasn't changed much since your days. :)

The problem wasn't potential changes to combine, but my memory ...

But now I refreshed it and understand the issue.

expand_compound_operation and make_compond_operation were meant to be used
as a pair.  You first do the former, then see if the result can be
simplified, then call the latter in case it couldn't.

In the SET case, we call the latter in simplify_set.  But there's also this
code in combine_simplify_rtx:

    case MEM:
      /* Ensure that our address has any ASHIFTs converted to MULT in case
         address-recognizing predicates are called later.  */
      temp = make_compound_operation (XEXP (x, 0), MEM);
      SUBST (XEXP (x, 0), temp);
      break;

THAT'S the code that should do the transformation that this patch contains.
So I'd suggest doing some debugging and seeing why it isn't.  This could
just be a bug in make_compound_operation not handling the SUBREG.
Paolo Bonzini Oct. 13, 2011, 1:19 p.m. UTC | #6
On 10/13/2011 02:51 PM, Richard Kenner wrote:
>      case MEM:
>        /* Ensure that our address has any ASHIFTs converted to MULT in case
>           address-recognizing predicates are called later.  */
>        temp = make_compound_operation (XEXP (x, 0), MEM);
>        SUBST (XEXP (x, 0), temp);
>        break;
>
> THAT'S the code that should do the transformation that this patch contains.
> So I'd suggest doing some debugging and seeing why it isn't.  This could
> just be a bug in make_compound_operation not handling the SUBREG.

Or being fooled by the 0xfffffffc masking, perhaps.

Paolo
Richard Kenner Oct. 13, 2011, 2:14 p.m. UTC | #7
> Or being fooled by the 0xfffffffc masking, perhaps.

No, I'm pretty sure that's NOT the case.  The *whole point* of the
routine is to deal with that masking.
H.J. Lu Oct. 13, 2011, 3:55 p.m. UTC | #8
On Thu, Oct 13, 2011 at 7:14 AM, Richard Kenner
<kenner@vlsi1.ultra.nyu.edu> wrote:
>> Or being fooled by the 0xfffffffc masking, perhaps.
>
> No, I'm pretty sure that's NOT the case.  The *whole point* of the
> routine is to deal with that masking.
>

I got

(gdb) step
make_compound_operation (x=0x7ffff139c4c8, in_code=MEM)
    at /export/gnu/import/git/gcc/gcc/combine.c:7572
7572	  enum rtx_code code = GET_CODE (x);
(gdb) call debug_rtx (x)
(and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ])
                (const_int 4 [0x4])) 0)
        (subreg:DI (reg:SI 106) 0))
    (const_int 4294967292 [0xfffffffc]))

and it produces

(gdb) call debug_rtx (x)
(and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ])
                (const_int 4 [0x4])) 0)
        (subreg:DI (reg:SI 106) 0))
    (const_int 4294967292 [0xfffffffc]))

at the end.  make_compound_operation doesn't know how to
restore ZERO_EXTEND.

BTW, there is a small testcase at

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50696

You can reproduce it on Linux/x86-64.
Richard Kenner Oct. 13, 2011, 4:11 p.m. UTC | #9
> at the end.  make_compound_operation doesn't know how to
> restore ZERO_EXTEND.

It does in general.  See make_extraction, which it calls.  The question is
why it doesn't in this case.  That's the bug.
H.J. Lu Oct. 13, 2011, 4:30 p.m. UTC | #10
On Thu, Oct 13, 2011 at 9:11 AM, Richard Kenner
<kenner@vlsi1.ultra.nyu.edu> wrote:
>> at the end.  make_compound_operation doesn't know how to
>> restore ZERO_EXTEND.
>
> It does in general.  See make_extraction, which it calls.  The question is
> why it doesn't in this case.  That's the bug.
>

It never calls make_extraction.  There are several cases handled
for AND operation. But

(and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ])
               (const_int 4 [0x4])) 0)
       (subreg:DI (reg:SI 106) 0))
   (const_int 4294967292 [0xfffffffc]))

isn't one of them.
Richard Kenner Oct. 13, 2011, 4:35 p.m. UTC | #11
> It never calls make_extraction.  There are several cases handled
> for AND operation. But
> 
> (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ])
>                (const_int 4 [0x4])) 0)
>        (subreg:DI (reg:SI 106) 0))
>    (const_int 4294967292 [0xfffffffc]))
> 
> isn't one of them.

Yes, clearly.  Otherwise it would work!  The correct fix for this problem
is to make it to do that.  That's where this needs to be fixed: in
make_compound_operation.
Paolo Bonzini Oct. 13, 2011, 5:01 p.m. UTC | #12
On 10/13/2011 06:35 PM, Richard Kenner wrote:
>> It never calls make_extraction.  There are several cases handled
>> for AND operation. But
>>
>> (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ])
>>                 (const_int 4 [0x4])) 0)
>>         (subreg:DI (reg:SI 106) 0))
>>     (const_int 4294967292 [0xfffffffc]))
>>
>> isn't one of them.
>
> Yes, clearly.  Otherwise it would work!  The correct fix for this problem
> is to make it to do that.  That's where this needs to be fixed: in
> make_compound_operation.

An and:DI is cheaper than a zero_extend:DI of an and:SI.  So GCC is 
correct in not doing this transformation.  I think adding a case to 
make_compound_operation that simply undoes the transformation (without 
calling make_extraction) is fine if you guard it with if (in_code == MEM).

Paolo
Richard Kenner Oct. 13, 2011, 5:06 p.m. UTC | #13
> An and:DI is cheaper than a zero_extend:DI of an and:SI.  

That depends strongly on the constants and whether the machine is 32-bit
or 64-bit. 

But that's irrelevant in this case since the and:SI will be removed (it
reflects what already been done).
H.J. Lu Oct. 13, 2011, 5:19 p.m. UTC | #14
On Thu, Oct 13, 2011 at 10:01 AM, Paolo Bonzini <bonzini@gnu.org> wrote:
> On 10/13/2011 06:35 PM, Richard Kenner wrote:
>>>
>>> It never calls make_extraction.  There are several cases handled
>>> for AND operation. But
>>>
>>> (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ])
>>>                (const_int 4 [0x4])) 0)
>>>        (subreg:DI (reg:SI 106) 0))
>>>    (const_int 4294967292 [0xfffffffc]))
>>>
>>> isn't one of them.
>>
>> Yes, clearly.  Otherwise it would work!  The correct fix for this problem
>> is to make it to do that.  That's where this needs to be fixed: in
>> make_compound_operation.
>
> An and:DI is cheaper than a zero_extend:DI of an and:SI.  So GCC is correct
> in not doing this transformation.  I think adding a case to
> make_compound_operation that simply undoes the transformation (without
> calling make_extraction) is fine if you guard it with if (in_code == MEM).
>

We first expand zero_extend:DI address to and:DI and then try
to restore zero_extend:DI.   Why do we do this transformation
to begin with?
Paolo Bonzini Oct. 13, 2011, 5:21 p.m. UTC | #15
On Thu, Oct 13, 2011 at 19:19, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, Oct 13, 2011 at 10:01 AM, Paolo Bonzini <bonzini@gnu.org> wrote:
>> On 10/13/2011 06:35 PM, Richard Kenner wrote:
>>>>
>>>> It never calls make_extraction.  There are several cases handled
>>>> for AND operation. But
>>>>
>>>> (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ])
>>>>                (const_int 4 [0x4])) 0)
>>>>        (subreg:DI (reg:SI 106) 0))
>>>>    (const_int 4294967292 [0xfffffffc]))
>>>>
>>>> isn't one of them.
>>>
>>> Yes, clearly.  Otherwise it would work!  The correct fix for this problem
>>> is to make it to do that.  That's where this needs to be fixed: in
>>> make_compound_operation.
>>
>> An and:DI is cheaper than a zero_extend:DI of an and:SI.  So GCC is correct
>> in not doing this transformation.  I think adding a case to
>> make_compound_operation that simply undoes the transformation (without
>> calling make_extraction) is fine if you guard it with if (in_code == MEM).
>>
>
> We first expand zero_extend:DI address to and:DI and then try
> to restore zero_extend:DI.   Why do we do this transformation
> to begin with?

Because outside of a MEM it may be beneficial _not_ to restore
zero_extend:DI in this case (depending on rtx_costs).

Paolo
H.J. Lu Oct. 13, 2011, 5:22 p.m. UTC | #16
On Thu, Oct 13, 2011 at 10:21 AM, Paolo Bonzini <bonzini@gnu.org> wrote:
> On Thu, Oct 13, 2011 at 19:19, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Thu, Oct 13, 2011 at 10:01 AM, Paolo Bonzini <bonzini@gnu.org> wrote:
>>> On 10/13/2011 06:35 PM, Richard Kenner wrote:
>>>>>
>>>>> It never calls make_extraction.  There are several cases handled
>>>>> for AND operation. But
>>>>>
>>>>> (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ])
>>>>>                (const_int 4 [0x4])) 0)
>>>>>        (subreg:DI (reg:SI 106) 0))
>>>>>    (const_int 4294967292 [0xfffffffc]))
>>>>>
>>>>> isn't one of them.
>>>>
>>>> Yes, clearly.  Otherwise it would work!  The correct fix for this problem
>>>> is to make it to do that.  That's where this needs to be fixed: in
>>>> make_compound_operation.
>>>
>>> An and:DI is cheaper than a zero_extend:DI of an and:SI.  So GCC is correct
>>> in not doing this transformation.  I think adding a case to
>>> make_compound_operation that simply undoes the transformation (without
>>> calling make_extraction) is fine if you guard it with if (in_code == MEM).
>>>
>>
>> We first expand zero_extend:DI address to and:DI and then try
>> to restore zero_extend:DI.   Why do we do this transformation
>> to begin with?
>
> Because outside of a MEM it may be beneficial _not_ to restore
> zero_extend:DI in this case (depending on rtx_costs).
>

Why do we do it for MEM then?
Paolo Bonzini Oct. 13, 2011, 5:29 p.m. UTC | #17
On Thu, Oct 13, 2011 at 19:06, Richard Kenner
<kenner@vlsi1.ultra.nyu.edu> wrote:
>> An and:DI is cheaper than a zero_extend:DI of an and:SI.
>
> That depends strongly on the constants and whether the machine is 32-bit
> or 64-bit.

Yes, the rtx_costs take care of that.

> But that's irrelevant in this case since the and:SI will be removed (it
> reflects what already been done).

Do you refer to this in make_extraction:

  /* See if this can be done without an extraction.  We never can if the
     width of the field is not the same as that of some integer mode. For
     registers, we can only avoid the extraction if the position is at the
     low-order bit and this is either not in the destination or we have the
     appropriate STRICT_LOW_PART operation available.  */

and this call to force_to_mode in particular:

        new_rtx = force_to_mode (inner, tmode,
                             len >= HOST_BITS_PER_WIDE_INT
                             ? ~(unsigned HOST_WIDE_INT) 0
                             : ((unsigned HOST_WIDE_INT) 1 << len) - 1,
                             0);

and from there the call to simplify_and_const_int that does this:

  if (constop == nonzero)
    return varop;

?

Then indeed it should work if you call make_extraction more greedily
than what we do now (which is, just if the constant is one less than a
power of two).

The answer to H.J.'s "Why do we do it for MEM then?" is simply
"because no one ever thought about not doing it" (because there are no
other POINTERS_EXTEND_UNSIGNED == 1 machines).  In fact it may even be
advantageous to do it in general, even if in_code != MEM.  Only
experimentation can tell.

Paolo
Richard Kenner Oct. 13, 2011, 6:12 p.m. UTC | #18
> We first expand zero_extend:DI address to and:DI and then try
> to restore zero_extend:DI.   Why do we do this transformation
> to begin with?

Suppose there were an outer AND that duplicated what this one did.
Then when you combine those two, you merge it to one AND.  Then
make_compound_operation puts it back.  The net result is to eliminate
the outer AND.  There are lots of similar sorts of things.

As I said, the strategy there was to convert extractions and expansions
into the corresponding logical and shift operations, see if they can
merge with something outside (which is similarly converted), then convert
the result (possibly merged) back.

This, for example, is the code that will remove nested SIGN_EXTENDs.
Richard Kenner Oct. 13, 2011, 6:15 p.m. UTC | #19
> The answer to H.J.'s "Why do we do it for MEM then?" is simply
> "because no one ever thought about not doing it" 

No, that's false.  The same expand_compound_operation / make_compound_operation
pair is present in the MEM case as in the SET case.  It's just that
there's some bug here that's noticable in not making proper MEMs that
doesn't show up in the SET case because of the way the insns are structured.
diff mbox

Patch

diff --git a/gcc/combine.c b/gcc/combine.c
index 6c3b17c..45180e5 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -5078,6 +5078,23 @@  subst (rtx x, rtx from, rtx to, int in_dest, int in_cond, int unique_copy)
 	    }
 	}
     }
+#ifdef POINTERS_EXTEND_UNSIGNED
+  else if (POINTERS_EXTEND_UNSIGNED > 0
+	   && code == MEM
+	   && GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
+	   && GET_MODE (XEXP (x, 0)) == Pmode)
+    {
+      /* If an address is zero-extended to Pmode, replace FROM with
+	 TO while keeping ZERO_EXTEND.  */
+      new_rtx = subst (XEXP (XEXP (x, 0), 0), from, to, 0, 0,
+		       unique_copy);
+      /* Drop ZERO_EXTEND on constant.  */
+      if (CONST_INT_P (new_rtx))
+	SUBST (XEXP (x, 0), new_rtx);
+      else
+	SUBST (XEXP (XEXP (x, 0), 0), new_rtx);
+    }
+#endif
   else
     {
       len = GET_RTX_LENGTH (code);