Patchwork RFC: Turn on -fomit-frame-pointer and -fasynchronous-unwind-tables for Linux/i386

login
register
mail settings
Submitter H.J. Lu
Date July 15, 2010, 5:42 p.m.
Message ID <20100715174246.GA20583@intel.com>
Download mbox | patch
Permalink /patch/58999/
State New
Headers show

Comments

H.J. Lu - July 15, 2010, 5:42 p.m.
Hi,

This patch turns on -fomit-frame-pointer and -fasynchronous-unwind-tables
for Linux/i386.  Tested on Linux/ia32 and Linux/x86-64.  I am also
enclosing a spreadsheet of comparison of shared library segment sizes
in gcc.  Segment size differences range from -1% to 30%. The smaller
the DSO size is, the bigger its size increases. Overall, the size
Chris Lattner - July 15, 2010, 6:08 p.m.
On Jul 15, 2010, at 10:42 AM, H.J. Lu wrote:

> Hi,
> 
> This patch turns on -fomit-frame-pointer and -fasynchronous-unwind-tables
> for Linux/i386.  Tested on Linux/ia32 and Linux/x86-64.  I am also
> enclosing a spreadsheet of comparison of shared library segment sizes
> in gcc.  Segment size differences range from -1% to 30%. The smaller
> the DSO size is, the bigger its size increases. Overall, the size
> difference is close to 0%. This comparison may not be typical since
> C++ and Java libraries have .eh_frame sections anyway.  Any comments?

Does glibc backtrace() look at eh_frame on i386?  I suspect that most unwinders don't.  This is a borderline ABI change.

-Chris
H.J. Lu - July 15, 2010, 6:16 p.m.
On Thu, Jul 15, 2010 at 11:08 AM, Chris Lattner <clattner@apple.com> wrote:
>
> On Jul 15, 2010, at 10:42 AM, H.J. Lu wrote:
>
>> Hi,
>>
>> This patch turns on -fomit-frame-pointer and -fasynchronous-unwind-tables
>> for Linux/i386.  Tested on Linux/ia32 and Linux/x86-64.  I am also
>> enclosing a spreadsheet of comparison of shared library segment sizes
>> in gcc.  Segment size differences range from -1% to 30%. The smaller
>> the DSO size is, the bigger its size increases. Overall, the size
>> difference is close to 0%. This comparison may not be typical since
>> C++ and Java libraries have .eh_frame sections anyway.  Any comments?
>
> Does glibc backtrace() look at eh_frame on i386?  I suspect that most unwinders don't.  This is a borderline ABI change.
>

It does:

http://sourceware.org/git/?p=glibc.git;a=blob_plain;f=sysdeps/i386/backtrace.c;hb=HEAD
H.J. Lu - July 15, 2010, 7:39 p.m.
On Thu, Jul 15, 2010 at 11:16 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, Jul 15, 2010 at 11:08 AM, Chris Lattner <clattner@apple.com> wrote:
>>
>> On Jul 15, 2010, at 10:42 AM, H.J. Lu wrote:
>>
>>> Hi,
>>>
>>> This patch turns on -fomit-frame-pointer and -fasynchronous-unwind-tables
>>> for Linux/i386.  Tested on Linux/ia32 and Linux/x86-64.  I am also
>>> enclosing a spreadsheet of comparison of shared library segment sizes
>>> in gcc.  Segment size differences range from -1% to 30%. The smaller
>>> the DSO size is, the bigger its size increases. Overall, the size
>>> difference is close to 0%. This comparison may not be typical since
>>> C++ and Java libraries have .eh_frame sections anyway.  Any comments?
>>
>> Does glibc backtrace() look at eh_frame on i386?  I suspect that most unwinders don't.  This is a borderline ABI change.
>>
>
> It does:
>
> http://sourceware.org/git/?p=glibc.git;a=blob_plain;f=sysdeps/i386/backtrace.c;hb=HEAD
>

I updated the spread sheet with glibc 12.0 DSO sizes
I think -fomit-frame-pointer and -fasynchronous-unwind-tables
will increase size of DSO written in C by about 10%.
Roland McGrath - July 15, 2010, 7:49 p.m.
> I updated the spread sheet with glibc 12.0 DSO sizes
> I think -fomit-frame-pointer and -fasynchronous-unwind-tables
> will increase size of DSO written in C by about 10%.

Are you talking about code size or text size?  Of course
-fasynchronous-unwind-tables increases the size of .eh_frame,
but that is not usually a performance issue.  The actual code
size increase is what affects hot path cache load and so forth.


Thanks,
Roland
H.J. Lu - July 15, 2010, 8:46 p.m.
On Thu, Jul 15, 2010 at 12:49 PM, Roland McGrath <roland@redhat.com> wrote:
>> I updated the spread sheet with glibc 12.0 DSO sizes
>> I think -fomit-frame-pointer and -fasynchronous-unwind-tables
>> will increase size of DSO written in C by about 10%.
>
> Are you talking about code size or text size?  Of course
> -fasynchronous-unwind-tables increases the size of .eh_frame,
> but that is not usually a performance issue.  The actual code
> size increase is what affects hot path cache load and so forth.

-fomit-frame-pointer increases libc.so code size by 1%. We
have to save/restore EBP when using it GPR. But for some reason,
gcc will choose EBP even when ECX/EDX are available:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44958
Chris Lattner - July 15, 2010, 8:55 p.m.
On Jul 15, 2010, at 1:46 PM, H.J. Lu wrote:

> On Thu, Jul 15, 2010 at 12:49 PM, Roland McGrath <roland@redhat.com> wrote:
>>> I updated the spread sheet with glibc 12.0 DSO sizes
>>> I think -fomit-frame-pointer and -fasynchronous-unwind-tables
>>> will increase size of DSO written in C by about 10%.
>> 
>> Are you talking about code size or text size?  Of course
>> -fasynchronous-unwind-tables increases the size of .eh_frame,
>> but that is not usually a performance issue.  The actual code
>> size increase is what affects hot path cache load and so forth.
> 
> -fomit-frame-pointer increases libc.so code size by 1%. We
> have to save/restore EBP when using it GPR. But for some reason,
> gcc will choose EBP even when ECX/EDX are available:
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44958

Another issue is that X86 encoding of 42(%esp) takes more space than 42(%ebp).  The former requires a SIB byte, the later doesn't.

-Chris
Andi Kleen - July 16, 2010, 8:13 a.m.
"H.J. Lu" <hjl.tools@gmail.com> writes:

> On Thu, Jul 15, 2010 at 12:49 PM, Roland McGrath <roland@redhat.com> wrote:
>>> I updated the spread sheet with glibc 12.0 DSO sizes
>>> I think -fomit-frame-pointer and -fasynchronous-unwind-tables
>>> will increase size of DSO written in C by about 10%.
>>
>> Are you talking about code size or text size?  Of course
>> -fasynchronous-unwind-tables increases the size of .eh_frame,
>> but that is not usually a performance issue.  The actual code
>> size increase is what affects hot path cache load and so forth.
>
> -fomit-frame-pointer increases libc.so code size by 1%. We
> have to save/restore EBP when using it GPR. But for some reason,
> gcc will choose EBP even when ECX/EDX are available:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44958

Ideal would be if -Os could automatically switch on -fframe-pointer
when the function has many local variables on the stack
(reference through %rbp is one byte smaller than through another GPR)
and otherwise keep it off.

I believe MSVC does that and it's a useful optimization for x86.

But it's presumably hard to do in gcc because FP or no FP
gets decided fairly early. Still perhaps it could be done
with a heuristic?

-Andi
Dave Korn - July 16, 2010, 4:21 p.m.
On 16/07/2010 09:13, Andi Kleen wrote:

> (reference through %rbp is one byte smaller than through another GPR)

> I believe MSVC does that and it's a useful optimization for x86.

  I can't tell if you meant x86_64 or %ebp but those two statements don't
quite go together as-is! :)

    cheers,
      DaveK
Andi Kleen - July 16, 2010, 7:24 p.m.
On Fri, Jul 16, 2010 at 05:21:16PM +0100, Dave Korn wrote:
> On 16/07/2010 09:13, Andi Kleen wrote:
> 
> > (reference through %rbp is one byte smaller than through another GPR)
> 
> > I believe MSVC does that and it's a useful optimization for x86.
> 
>   I can't tell if you meant x86_64 or %ebp but those two statements don't
> quite go together as-is! :)

The optimization is valid for 32bit and 64bit.

When you have a lot of variable references onto the stack it's smaller
to generate a frame pointer.

However on 64bit since there are more registers this should generally happen somewhat 
less often, but there is always code that exceeds even the 16 registers on 64bit
too.

The problem in gcc would be really to figure this out in advance.

-Andi
Dave Korn - July 17, 2010, 7:59 p.m.
On 16/07/2010 20:24, Andi Kleen wrote:
> On Fri, Jul 16, 2010 at 05:21:16PM +0100, Dave Korn wrote:
>> On 16/07/2010 09:13, Andi Kleen wrote:
>>
>>> (reference through %rbp is one byte smaller than through another GPR)
>>> I believe MSVC does that and it's a useful optimization for x86.
>>   I can't tell if you meant x86_64 or %ebp but those two statements don't
>> quite go together as-is! :)
> 
> The optimization is valid for 32bit and 64bit.

  There's no such thing as %rbp in 32-bit x86 is all I meant.  Does the "one
byte smaller" also apply to %ebp indirection on x86?

    cheers,
      DaveK
Andi Kleen - July 17, 2010, 10:11 p.m.
On Sat, Jul 17, 2010 at 08:59:24PM +0100, Dave Korn wrote:
> On 16/07/2010 20:24, Andi Kleen wrote:
> > On Fri, Jul 16, 2010 at 05:21:16PM +0100, Dave Korn wrote:
> >> On 16/07/2010 09:13, Andi Kleen wrote:
> >>
> >>> (reference through %rbp is one byte smaller than through another GPR)
> >>> I believe MSVC does that and it's a useful optimization for x86.
> >>   I can't tell if you meant x86_64 or %ebp but those two statements don't
> >> quite go together as-is! :)
> > 
> > The optimization is valid for 32bit and 64bit.
> 
>   There's no such thing as %rbp in 32-bit x86 is all I meant.  Does the "one
> byte smaller" also apply to %ebp indirection on x86?

Yes. The two encodings are exactly the same.

-Andi

Patch

difference is close to 0%. This comparison may not be typical since
C++ and Java libraries have .eh_frame sections anyway.  Any comments?

Thanks.



H.J.
---
gcc/

2010-07-14  H.J. Lu  <hongjiu.lu@intel.com>

	* config/i386/i386.c (override_options): Default
	flag_omit_frame_pointer and flag_asynchronous_unwind_tables
	to TARGET_SUBTARGET32_OMIT_FRAME_POINTER_DEFAULT and
	TARGET_SUBTARGET32_ASYNCHRONOUS_UNWIND_TABLES_DEFAULT.

	* config/i386/i386.h (TARGET_SUBTARGET32_OMIT_FRAME_POINTER_DEFAULT):
	New.
	(TARGET_SUBTARGET32_ASYNCHRONOUS_UNWIND_TABLES_DEFAULT): Likewise.
	* config/i386/linux.h (TARGET_SUBTARGET32_OMIT_FRAME_POINTER_DEFAULT):
	Likewise.
	(TARGET_SUBTARGET32_ASYNCHRONOUS_UNWIND_TABLES_DEFAULT): Likewise.
	* config/i386/linux64.h (TARGET_SUBTARGET32_OMIT_FRAME_POINTER_DEFAULT):
	Likewise.
	(TARGET_SUBTARGET32_ASYNCHRONOUS_UNWIND_TABLES_DEFAULT): Likewise.

gcc/testsuite/

2010-07-15  H.J. Lu  <hongjiu.lu@intel.com>

	* gcc.target/i386/frame-pointer-1.c: New.
	* gcc.target/i386/frame-pointer-2.c: Likewise.
	* gcc.target/i386/frame-pointer-3.c: Likewise.
	* gcc.target/i386/frame-pointer-4.c: Likewise.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 4fd2aab..fe30bfd 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2985,10 +2985,23 @@  override_options (bool main_args_p)
     {
       if (flag_zee == 2)
         flag_zee = 0;
+      /* Unwind info is not correct around the CFG unless either a
+	 frame pointer is present or -maccumulate-outgoing-args is
+	 set. When both -fasynchronous-unwind-tables and
+	 -fomit-frame-pointer are turned on by default, turn off
+	 both if -mno-accumulate-outgoing-args is used.  */
       if (flag_omit_frame_pointer == 2)
-	flag_omit_frame_pointer = 0;
+	flag_omit_frame_pointer
+	  = (TARGET_SUBTARGET32_OMIT_FRAME_POINTER_DEFAULT
+	     && (!TARGET_SUBTARGET32_ASYNCHRONOUS_UNWIND_TABLES_DEFAULT
+		 || !(target_flags_explicit
+		      & MASK_ACCUMULATE_OUTGOING_ARGS)));
       if (flag_asynchronous_unwind_tables == 2)
-	flag_asynchronous_unwind_tables = 0;
+	flag_asynchronous_unwind_tables
+	  = (TARGET_SUBTARGET32_ASYNCHRONOUS_UNWIND_TABLES_DEFAULT
+	     && (!TARGET_SUBTARGET32_OMIT_FRAME_POINTER_DEFAULT
+		 || !(target_flags_explicit
+		      & MASK_ACCUMULATE_OUTGOING_ARGS)));
       if (flag_pcc_struct_return == 2)
 	flag_pcc_struct_return = DEFAULT_PCC_STRUCT_RETURN;
     }
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index eb3eb9f..c0ae95f 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -467,6 +467,8 @@  extern tree x86_mfence;
 /* Extra bits to force on w/ 32-bit mode.  */
 #define TARGET_SUBTARGET32_DEFAULT 0
 #define TARGET_SUBTARGET32_ISA_DEFAULT 0
+#define TARGET_SUBTARGET32_OMIT_FRAME_POINTER_DEFAULT 0
+#define TARGET_SUBTARGET32_ASYNCHRONOUS_UNWIND_TABLES_DEFAULT 0
 
 /* Extra bits to force on w/ 64-bit mode.  */
 #define TARGET_SUBTARGET64_DEFAULT 0
diff --git a/gcc/config/i386/linux.h b/gcc/config/i386/linux.h
index 81dfd1e..61d53b5 100644
--- a/gcc/config/i386/linux.h
+++ b/gcc/config/i386/linux.h
@@ -219,3 +219,10 @@  along with GCC; see the file COPYING3.  If not see
 /* i386 glibc provides __stack_chk_guard in %gs:0x14.  */
 #define TARGET_THREAD_SSP_OFFSET	0x14
 #endif
+
+/* Turn on -fomit-frame-pointer and -fasynchronous-unwind-tables by
+   default.  */
+#undef TARGET_SUBTARGET32_OMIT_FRAME_POINTER_DEFAULT
+#define TARGET_SUBTARGET32_OMIT_FRAME_POINTER_DEFAULT 1
+#undef TARGET_SUBTARGET32_ASYNCHRONOUS_UNWIND_TABLES_DEFAULT
+#define TARGET_SUBTARGET32_ASYNCHRONOUS_UNWIND_TABLES_DEFAULT 1
diff --git a/gcc/config/i386/linux64.h b/gcc/config/i386/linux64.h
index 33b4dc9..5a02205 100644
--- a/gcc/config/i386/linux64.h
+++ b/gcc/config/i386/linux64.h
@@ -123,3 +123,10 @@  see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    x86_64 glibc provides it in %fs:0x28.  */
 #define TARGET_THREAD_SSP_OFFSET	(TARGET_64BIT ? 0x28 : 0x14)
 #endif
+
+/* Turn on -fomit-frame-pointer and -fasynchronous-unwind-tables by
+   default.  */
+#undef TARGET_SUBTARGET32_OMIT_FRAME_POINTER_DEFAULT
+#define TARGET_SUBTARGET32_OMIT_FRAME_POINTER_DEFAULT 1
+#undef TARGET_SUBTARGET32_ASYNCHRONOUS_UNWIND_TABLES_DEFAULT
+#define TARGET_SUBTARGET32_ASYNCHRONOUS_UNWIND_TABLES_DEFAULT 1
diff --git a/gcc/testsuite/gcc.target/i386/frame-pointer-1.c b/gcc/testsuite/gcc.target/i386/frame-pointer-1.c
new file mode 100644
index 0000000..2495d09
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/frame-pointer-1.c
@@ -0,0 +1,11 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune=i686" } */
+/* { dg-require-effective-target ilp32 } */
+
+void bar (int);
+
+void
+foo (void)
+{
+  bar (1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/frame-pointer-2.c b/gcc/testsuite/gcc.target/i386/frame-pointer-2.c
new file mode 100644
index 0000000..1478aba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/frame-pointer-2.c
@@ -0,0 +1,11 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune=i686 -mno-accumulate-outgoing-args" } */
+/* { dg-require-effective-target ilp32 } */
+
+void bar (int);
+
+void
+foo (void)
+{
+  bar (1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/frame-pointer-3.c b/gcc/testsuite/gcc.target/i386/frame-pointer-3.c
new file mode 100644
index 0000000..2ff965f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/frame-pointer-3.c
@@ -0,0 +1,11 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune=i686 -mpush-args" } */
+/* { dg-require-effective-target ilp32 } */
+
+void bar (int);
+
+void
+foo (void)
+{
+  bar (1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/frame-pointer-4.c b/gcc/testsuite/gcc.target/i386/frame-pointer-4.c
new file mode 100644
index 0000000..2b1bf52
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/frame-pointer-4.c
@@ -0,0 +1,11 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune=i686 -mpush-args -mno-accumulate-outgoing-args" } */
+/* { dg-require-effective-target ilp32 } */
+
+void bar (int);
+
+void
+foo (void)
+{
+  bar (1);
+}