diff mbox

PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

Message ID CAMe9rOopUGnQj-DmbOAcPfJGFEoAM+2BSwZ0bwjeQrW9fL=Hng@mail.gmail.com
State New
Headers show

Commit Message

H.J. Lu Oct. 13, 2011, 8:07 p.m. UTC
On Thu, Oct 13, 2011 at 11:15 AM, Richard Kenner
<kenner@vlsi1.ultra.nyu.edu> wrote:
>> The answer to H.J.'s "Why do we do it for MEM then?" is simply
>> "because no one ever thought about not doing it"
>
> No, that's false.  The same expand_compound_operation / make_compound_operation
> pair is present in the MEM case as in the SET case.  It's just that
> there's some bug here that's noticable in not making proper MEMs that
> doesn't show up in the SET case because of the way the insns are structured.
>

When we have (and (OP) M) where

(and (OP) M) == (and (OP) ((1 << ceil_log2 (M)) - 1) ))

(and (OP) M) is zero_extract bits 0 to ceil_log2 (M).

Does it look OK?

Thanks.

Comments

Richard Kenner Oct. 13, 2011, 9:23 p.m. UTC | #1
> Does it look OK?

No.  

If I understand your code correctly, there's essentially the same code
as you have a bit above that:

      /* If the constant is one less than a power of two, this might be 
         representable by an extraction even if no shift is present. 
         If it doesn't end up being a ZERO_EXTEND, we will ignore it unless
         we are in a COMPARE.  */
      else if ((i = exact_log2 (INTVAL (XEXP (x, 1)) + 1)) >= 0)
        new_rtx = make_extraction (mode,
                               make_compound_operation (XEXP (x, 0),
                                                        next_code),
                               0, NULL_RTX, i, 1, 0, in_code == COMPARE);

So you need to understand why your code "fires" and it doesn't.
H.J. Lu Oct. 13, 2011, 9:27 p.m. UTC | #2
On Thu, Oct 13, 2011 at 2:23 PM, Richard Kenner
<kenner@vlsi1.ultra.nyu.edu> wrote:
>> Does it look OK?
>
> No.
>
> If I understand your code correctly, there's essentially the same code
> as you have a bit above that:
>
>      /* If the constant is one less than a power of two, this might be
>         representable by an extraction even if no shift is present.
>         If it doesn't end up being a ZERO_EXTEND, we will ignore it unless
>         we are in a COMPARE.  */
>      else if ((i = exact_log2 (INTVAL (XEXP (x, 1)) + 1)) >= 0)
>        new_rtx = make_extraction (mode,
>                               make_compound_operation (XEXP (x, 0),
>                                                        next_code),
>                               0, NULL_RTX, i, 1, 0, in_code == COMPARE);
>
> So you need to understand why your code "fires" and it doesn't.
>
>

It is because mask 0xffffffff is optimized to 0xfffffffc by keeping track
of non-zero bits in registers and the above code doesn't take that
into account.
Richard Kenner Oct. 13, 2011, 9:30 p.m. UTC | #3
> It is because mask 0xffffffff is optimized to 0xfffffffc by keeping track
> of non-zero bits in registers and the above code doesn't take that
> into account.

Then I'd suggest modifying that code so that it does rather than
essentially duplicating it.  But I'd recommend running some
performance tests to verify that you're not pessimizing things when
you do that: this stuff can be very tricky and you want to make sure
that you're not converting something like (and X 3) into a bit
extraction unnecessarily.
H.J. Lu Oct. 13, 2011, 9:45 p.m. UTC | #4
On Thu, Oct 13, 2011 at 2:30 PM, Richard Kenner
<kenner@vlsi1.ultra.nyu.edu> wrote:
>> It is because mask 0xffffffff is optimized to 0xfffffffc by keeping track
>> of non-zero bits in registers and the above code doesn't take that
>> into account.
>
> Then I'd suggest modifying that code so that it does rather than
> essentially duplicating it.  But I'd recommend running some
> performance tests to verify that you're not pessimizing things when
> you do that: this stuff can be very tricky and you want to make sure
> that you're not converting something like (and X 3) into a bit
> extraction unnecessarily.
>

But the current code converts (and X 3) into a bit extraction
since ((i = exact_log2 (UINTVAL (XEXP (x, 1)) + 1)) >= 0) is true
when UINTVAL (XEXP (x, 1))  == 3.  Should we do it or not?
Richard Kenner Oct. 13, 2011, 10:21 p.m. UTC | #5
> But the current code converts (and X 3) into a bit extraction
> since ((i = exact_log2 (UINTVAL (XEXP (x, 1)) + 1)) >= 0) is true
> when UINTVAL (XEXP (x, 1)) == 3.  Should we do it or not?

By adding the test for nonzero bits, you'd potentially be doing the
conversion more often (which is the point of this patch, after all) and
it's therefore necessary to be sure you're not doing it *too* often.
Paolo Bonzini Oct. 14, 2011, 6:51 a.m. UTC | #6
On 10/13/2011 10:07 PM, H.J. Lu wrote:
> On Thu, Oct 13, 2011 at 11:15 AM, Richard Kenner
> <kenner@vlsi1.ultra.nyu.edu>  wrote:
>>> The answer to H.J.'s "Why do we do it for MEM then?" is simply
>>> "because no one ever thought about not doing it"
>>
>> No, that's false.  The same expand_compound_operation / make_compound_operation
>> pair is present in the MEM case as in the SET case.  It's just that
>> there's some bug here that's noticable in not making proper MEMs that
>> doesn't show up in the SET case because of the way the insns are structured.
>>
>
> When we have (and (OP) M) where
>
> (and (OP) M) == (and (OP) ((1<<  ceil_log2 (M)) - 1) ))
>
> (and (OP) M) is zero_extract bits 0 to ceil_log2 (M).
>
> Does it look OK?

Yes, it does.  How did you test it?

Paolo
H.J. Lu Oct. 14, 2011, 3:36 p.m. UTC | #7
On Thu, Oct 13, 2011 at 11:51 PM, Paolo Bonzini <bonzini@gnu.org> wrote:
> On 10/13/2011 10:07 PM, H.J. Lu wrote:
>>
>> On Thu, Oct 13, 2011 at 11:15 AM, Richard Kenner
>> <kenner@vlsi1.ultra.nyu.edu>  wrote:
>>>>
>>>> The answer to H.J.'s "Why do we do it for MEM then?" is simply
>>>> "because no one ever thought about not doing it"
>>>
>>> No, that's false.  The same expand_compound_operation /
>>> make_compound_operation
>>> pair is present in the MEM case as in the SET case.  It's just that
>>> there's some bug here that's noticable in not making proper MEMs that
>>> doesn't show up in the SET case because of the way the insns are
>>> structured.
>>>
>>
>> When we have (and (OP) M) where
>>
>> (and (OP) M) == (and (OP) ((1<<  ceil_log2 (M)) - 1) ))
>>
>> (and (OP) M) is zero_extract bits 0 to ceil_log2 (M).
>>
>> Does it look OK?
>
> Yes, it does.  How did you test it?
>

There is a testcase at

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50696

It passes with my patch.
Paolo Bonzini Oct. 14, 2011, 4:23 p.m. UTC | #8
On 10/14/2011 05:36 PM, H.J. Lu wrote:
> There is a testcase at
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50696
>
> It passes with my patch.

Cool, so let's wait for the results of testing.

Paolo
diff mbox

Patch

diff --git a/gcc/combine.c b/gcc/combine.c
index 6c3b17c..5962b1d 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7758,6 +7758,23 @@  make_compound_operation (rtx x, enum rtx_code in_code)
 							next_code),
 			       i, NULL_RTX, 1, 1, 0, 1);

+
+      /* If we are (and (OP) M) and M is an extraction mask, this is an
+	 extraction.  */
+      else
+	{
+	  unsigned HOST_WIDE_INT nonzero =
+	    nonzero_bits (XEXP (x, 0), GET_MODE (XEXP (x, 0)));
+	  unsigned HOST_WIDE_INT mask = INTVAL (XEXP (x, 1));
+	  unsigned HOST_WIDE_INT len = ceil_log2 (mask);
+	  if ((nonzero & (((unsigned HOST_WIDE_INT) 1 << len) - 1))
+	      == (nonzero & mask))
+	    {
+	      new_rtx = make_compound_operation (XEXP (x, 0), next_code);
+	      new_rtx = make_extraction (mode, new_rtx, 0, NULL_RTX,
+					 len, 1, 0, in_code == COMPARE);
+	    }
+	}
       break;

     case LSHIFTRT: