Patchwork PATCH: PR target/50603: [x32] Unnecessary lea

Submitter Uros Bizjak
Date Oct. 7, 2011, 7:13 a.m.
Message ID <CAFULd4ZZAdqdZNZ0oSRjobd2HkKkChqMJGZcDXj=ff=B7WRW-Q@mail.gmail.com>
Permalink /patch/118219/
State New

Comments

Uros Bizjak - Oct. 7, 2011, 7:13 a.m.
On Thu, Oct 6, 2011 at 11:33 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

>>>>>>> OTOH, x86_64 and i686 targets can also benefit from this change. If
>>>>>>> combine can't create more complex address (covered by lea), then it
>>>>>>> will simply propagate memory operand back into the add insn. It looks
>>>>>>> to me that we can't lose here, so:
>>>>>>>
>>>>>>>  /* Improve address combine.  */
>>>>>>>  if (code == PLUS && MEM_P (src2))
>>>>>>>    src2 = force_reg (mode, src2);
>>>>>>>
>>>>>>> Any opinions?
>>>>>>>
>>>>>>
>>>>>> It doesn't work with 64bit libstdc++:
>>>>>
>>>>> Yeah, yeah. ix86_output_mi_thunk has some ...  issues.
>>>>>
>>>>> Please try attached patch that introduces ix86_emit_binop and uses it
>>>>> in a bunch of places.
>>>
>>>> I tried it on GCC.  There are no regressions.  The bugs are fixed for x32.
>>>> Here are size comparisons of GCC runtime libraries on ia32, x32 and
>>>> x86-64:
>>>
>>>>    text    data     bss     dec     hex filename
>>>>  884093   18600   27064  929757   e2fdd old libstdc++.so
>>>>  884189   18600   27064  929853   e303d new libs/libstdc++.so
>>>>
>>>> The new code is
>>>>
>>>> mov    0xc(%edi),%eax
>>>> mov    %eax,0x8(%esi)
>>>> mov    -0xc(%eax),%eax
>>>> mov    0x10(%edi),%edx
>>>> lea    0x8(%esi,%eax,1),%eax
>>>>
>>>> The old one is
>>>>
>>>> mov    0xc(%edi),%edx
>>>> lea    0x8(%esi),%eax
>>>> mov    %edx,0x8(%esi)
>>>> add    -0xc(%edx),%eax
>>>> mov    0x10(%edi),%edx
>>>
>>> The new code merged lea+add into one lea, so it looks quite OK to me.
>>>
>>> Do you have some performance numbers?
>>>
>>
>> I will report performance numbers in a few days.
>
> The differences in SPEC CPU 2006 on ia32, x86-64 and
> x32 are within noise range.

Great.

Attached is a slightly updated patch, where we consider only
integer-mode PLUS RTXes.

2011-10-07  Uros Bizjak  <ubizjak@gmail.com>
	    H.J. Lu  <hongjiu.lu@intel.com>

	PR target/50603
	* config/i386/i386.c (ix86_fixup_binary_operands): Force src2 of
	integer PLUS RTX to a register to improve address combine.

testsuite/ChangeLog:

2011-10-07  Uros Bizjak  <ubizjak@gmail.com>
	    H.J. Lu  <hongjiu.lu@intel.com>

	PR target/50603
	* gcc.target/i386/pr50603.c: New test.

Tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN.

Uros.
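
For readers who want to convince themselves that the merged lea computes the same value as the old lea+add pair quoted above, here is a minimal sketch in C. It only models the 32-bit address arithmetic of the two sequences; the function names old_sequence and new_sequence, and the idea of passing the loaded memory value as a plain argument, are assumptions made for illustration and are not part of the patch.

```c
#include <stdint.h>

/* Hypothetical model of the old code:
 *   lea 0x8(%esi),%eax
 *   add -0xc(%edx),%eax
 * mem_val stands in for the value loaded from -0xc(%edx). */
uint32_t old_sequence(uint32_t esi, uint32_t mem_val)
{
    uint32_t eax = esi + 8u;  /* lea 0x8(%esi),%eax */
    eax += mem_val;           /* add with a memory operand */
    return eax;
}

/* Hypothetical model of the new code:
 *   mov -0xc(%eax),%eax
 *   lea 0x8(%esi,%eax,1),%eax
 * The memory operand is loaded into a register first, so the
 * whole sum fits a single base + index*scale + displacement lea. */
uint32_t new_sequence(uint32_t esi, uint32_t mem_val)
{
    uint32_t eax = mem_val;     /* mov from memory into a register */
    eax = esi + eax * 1u + 8u;  /* lea 0x8(%esi,%eax,1),%eax */
    return eax;
}
```

Both functions return esi + mem_val + 8 modulo 2^32, which is why the instruction count stays the same while the separate add disappears.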

Patch

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 179645)
+++ config/i386/i386.c	(working copy)
@@ -15798,6 +15798,12 @@  ix86_fixup_binary_operands (enum rtx_code code, en
   if (MEM_P (src1) && !rtx_equal_p (dst, src1))
     src1 = force_reg (mode, src1);
 
+  /* Improve address combine.  */
+  if (code == PLUS
+      && GET_MODE_CLASS (mode) == MODE_INT
+      && MEM_P (src2))
+    src2 = force_reg (mode, src2);
+
   operands[1] = src1;
   operands[2] = src2;
   return dst;
Index: testsuite/gcc.target/i386/pr50603.c
===================================================================
--- testsuite/gcc.target/i386/pr50603.c	(revision 0)
+++ testsuite/gcc.target/i386/pr50603.c	(revision 0)
@@ -0,0 +1,11 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+extern int *foo;
+
+int
+bar (int x)
+{
+  return foo[x];
+}
+/* { dg-final { scan-assembler-not "lea\[lq\]" } } */
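
The new test only declares foo as extern, so it compiles but does not link on its own. As a rough sketch for trying the function outside the DejaGnu harness, the version below supplies a definition for foo; the backing array named table and its contents are assumptions added for illustration and do not appear in the patch.

```c
/* Hypothetical backing storage so that foo has a definition. */
static int table[] = { 10, 20, 30, 40 };

/* In pr50603.c this is only declared: extern int *foo; */
int *foo = table;

/* Same body as the function in the test case. */
int
bar (int x)
{
  return foo[x];
}
```

This only exercises the behaviour of bar. The property the test actually checks is in the generated assembly, which can be inspected by compiling with -O2 -S and looking for lea instructions.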