Patchwork Add -mno-r11 option to suppress load of ppc64 static chain in indirect calls

Submitter Michael Meissner
Date July 6, 2011, 10:29 p.m.
Message ID <20110706222922.GA23641@hungry-tiger.westford.ibm.com>
Permalink /patch/103588/
State New

Comments

Michael Meissner - July 6, 2011, 10:29 p.m.
This patch adds an option to not load the static chain (r11) for 64-bit PowerPC
calls through function pointers (or virtual functions).  Most of the languages
on the PowerPC do not need the static chain to be loaded when called, and
adding this instruction can slow down code that calls very short functions.
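
To make the affected code shape concrete, here is a minimal C sketch (not taken
from the patch) of a loop that calls very short functions through a pointer.
The names add_one, twice and apply_n are hypothetical.  On an AIX-style 64-bit
PowerPC ABI, each op(x) call below is the kind of indirect call for which the
static chain load would normally be emitted, even though neither callee is a
nested function.

```c
/* Two very short functions.  Neither is nested, so neither reads the
   static chain register.  */
long add_one(long x) { return x + 1; }
long twice(long x) { return x * 2; }

/* Applies `op` to `x` n times.  Every iteration performs one indirect
   call through the function pointer.  */
long apply_n(long (*op)(long), long x, int n)
{
    for (int i = 0; i < n; i++)
        x = op(x);
    return x;
}
```

For instance, apply_n(add_one, 0, 5) makes five indirect calls, each of which
does almost no work compared with the call sequence itself.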

In addition, if the function does not call alloca, setjmp or deal with
exceptions where the stack is modified, the compiler can move the store of the
TOC value for the current function to the prologue of the function, rather than
at each call site.
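
As a rough illustration of the condition just described, the following C sketch
models the decision as a predicate.  It is a simplified paraphrase, not the
patch's code: struct fn_props and can_hoist_toc_save are hypothetical names,
and the real check in rs6000.c tests additional flags on the current function.

```c
#include <stdbool.h>

/* Hypothetical summary of the properties that matter for one function.  */
struct fn_props {
    bool calls_alloca;   /* stack layout may change after the prologue */
    bool calls_setjmp;   /* control may re-enter with a different frame state */
    bool may_throw;      /* exception handling may modify the stack */
};

/* Returns true when the TOC save can be done once in the prologue
   instead of before every indirect call.  */
bool can_hoist_toc_save(const struct fn_props *p)
{
    return !p->calls_alloca && !p->calls_setjmp && !p->may_throw;
}
```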

The effect of these patches is to speed up 464.h264ref in the SPEC 2006
benchmark by about 7% if -mno-r11 is used, and 5% if it is not used (but the
save of the TOC register is hoisted).  I believe this is because the load that
restores the current function's TOC (r2) has to wait until the store queue has
drained the store issued just before the call.

Unfortunately, I do see a 3% slowdown in 429.mcf, whose cause I have not yet
identified.

I have bootstrapped the compiler and saw no regressions in make check.  Is it
OK to install on the trunk?

[gcc]
2011-07-06  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000-protos.h (rs6000_call_indirect_aix): New
	declaration.
	(rs6000_save_toc_in_prologue_p): Ditto.

	* config/rs6000/rs6000.opt (-mr11): New switch to disable loading
	up the static chain (r11) during indirect function calls.
	(-msave-toc-indirect): New undocumented debug switch.

	* config/rs6000/rs6000.c (struct machine_function): Add
	save_toc_in_prologue field to note whether the prologue needs to
	save the TOC value in the reserved stack location.
	(rs6000_emit_prologue): Use TOC_REGNUM instead of 2.  If we need
	to save the TOC in the prologue, do so.
	(rs6000_trampoline_init): Don't allow creating AIX style
	trampolines if -mno-r11 is in effect.
	(rs6000_call_indirect_aix): New function to create AIX style
	indirect calls, adding support for -mno-r11 to suppress loading
	the static chain, and saving the TOC in the prologue instead of
	the call body.
	(rs6000_save_toc_in_prologue_p): Return true if we are saving the
	TOC in the prologue.

	* config/rs6000/rs6000.md (STACK_POINTER_REGNUM): Add more fixed
	register numbers.
	(TOC_REGNUM): Ditto.
	(STATIC_CHAIN_REGNUM): Ditto.
	(ARG_POINTER_REGNUM): Ditto.
	(SFP_REGNO): Delete, unused.
	(TOC_SAVE_OFFSET_32BIT): Add constants for AIX TOC save and
	function descriptor offsets.
	(TOC_SAVE_OFFSET_64BIT): Ditto.
	(AIX_FUNC_DESC_TOC_32BIT): Ditto.
	(AIX_FUNC_DESC_TOC_64BIT): Ditto.
	(AIX_FUNC_DESC_SC_32BIT): Ditto.
	(AIX_FUNC_DESC_SC_64BIT): Ditto.
	(ptrload): New mode attribute for the appropriate load of a
	pointer.
	(call_indirect_aix32): Delete, rewrite AIX indirect function
	calls.
	(call_indirect_aix64): Ditto.
	(call_value_indirect_aix32): Ditto.
	(call_value_indirect_aix64): Ditto.
	(call_indirect_nonlocal_aix32_internal): Ditto.
	(call_indirect_nonlocal_aix32): Ditto.
	(call_indirect_nonlocal_aix64_internal): Ditto.
	(call_indirect_nonlocal_aix64): Ditto.
	(call): Rewrite AIX indirect function calls.  Add support for
	eliminating the static chain, and for moving the save of the TOC
	to the function prologue.
	(call_value): Ditto.
	(call_indirect_aix<ptrsize>): Ditto.
	(call_indirect_aix<ptrsize>_internal): Ditto.
	(call_indirect_aix<ptrsize>_internal2): Ditto.
	(call_indirect_aix<ptrsize>_nor11): Ditto.
	(call_value_indirect_aix<ptrsize>): Ditto.
	(call_value_indirect_aix<ptrsize>_internal): Ditto.
	(call_value_indirect_aix<ptrsize>_internal2): Ditto.
	(call_value_indirect_aix<ptrsize>_nor11): Ditto.
	(call_nonlocal_aix32): Relocate in the rs6000.md file.
	(call_nonlocal_aix64): Ditto.

	* doc/invoke.texi (RS/6000 and PowerPC Options): Add -mr11 and
	-mno-r11 documentation.
[gcc/testsuite]
2011-07-06  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/no-r11-1.c: New test for -mr11, -mno-r11.
	* gcc.target/powerpc/no-r11-2.c: Ditto.
	* gcc.target/powerpc/no-r11-3.c: Ditto.
David Edelsohn - July 6, 2011, 10:39 p.m.
Okay.

Thanks, David
Michael Meissner - July 6, 2011, 11:39 p.m.
I updated the HTML documents for my two recent changes:

*** changes.html.~1~	2011-07-06 19:26:37.000000000 -0400
--- changes.html	2011-07-06 19:35:22.000000000 -0400
***************
*** 48,54 ****
  <h2>General Optimizer Improvements</h2>
  
    <ul>
!     <li>...</li>
    </ul>
  
  <h2>New Languages and Language specific improvements</h2>
--- 48,57 ----
  <h2>General Optimizer Improvements</h2>
  
    <ul>
!     <li>Support for a new parameter <code>--param case-value-threshold=n</code>
!     was added to allow users to control the cutoff between doing switch statements
!     as a series of if statements and using a jump table.
!     </li>
    </ul>
  
  <h2>New Languages and Language specific improvements</h2>
*************** struct F: E { }; // error: deriving from
*** 230,235 ****
--- 233,246 ----
         instruction set.  Previously the GCC compiler did not adhere to the ABI
         for 128-bit vectors with 64-bit integer base types (PR 48857).
         This will also be fixed in the GCC 4.6.1 and 4.5.4 releases.</li>
+ 
+      <li>A new option (<code>-mno-r11</code>) was added to allow AIX
+        32-bit/64-bit and Linux 64-bit PowerPC users to specify that the compiler
+        should not load up the chain register (<i>r11</i>) before calling a
+        function through a pointer.  If you use this option, you cannot call
+        nested functions through a pointer, or call other languages that might
+        use the static chain.
+      </li>
    </ul>
  
  <h3>MIPS</h3>
Richard Guenther - July 7, 2011, 8:59 a.m.
On Thu, Jul 7, 2011 at 12:29 AM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> This patch adds an option to not load the static chain (r11) for 64-bit PowerPC
> calls through function pointers (or virtual function).  Most of the languages
> on the PowerPC do not need the static chain being loaded when called, and
> adding this instruction can slow down code that calls very short functions.
>
> In addition, if the function does not call alloca, setjmp or deal with
> exceptions where the stack is modified, the compiler can move the store of the
> TOC value for the current function to the prologue of the function, rather than
> at each call site.
>
> The effect of these patches is to speed up 464.h264ref in the Spec 2006
> benchmark by about 7% if -mno-r11 is used, and 5% if it is not used (but the
> save of the TOC register is hoisted).  I believe this is due to the load of the
> current function's TOC (r2) having to wait until the store queue is drained
> with the store just before the call.
>
> Unfortunately, I do see a 3% slowdown in 429.mcf, which I don't know what the
> cause is.
>
> I have bootstrapped the compiler and saw that there were no regressions in make
> check.  Is it ok to install in the trunk?

Hum.  Can't the compiler figure this out itself per-call-site?  At least
the name of the command-line switch -m[no-]r11 is meaningless to me.
Points-to information should be able to tell you if the function pointer
points to a nested function.

Richard.

Jakub Jelinek - July 7, 2011, 9:03 a.m.
On Thu, Jul 07, 2011 at 10:59:36AM +0200, Richard Guenther wrote:
> Hum.  Can't the compiler figure this out itself per-call-site?  At least
> the name of the command-line switch -m[no-]r11 is meaningless to me.
> Points-to information should be able to tell you if the function pointer
> points to a nested function.

Yeah.  E.g. for C++ virtual method calls I believe all function pointers in
vtables should always ignore the static chain pointer, etc., because you
can't have a nested method.
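
A C model of vtable-style dispatch may help picture this case.  It is only an
analogy for what a C++ front end generates, and struct shape, shape_vtable,
rect_area and shape_area are hypothetical names.  Every entry in the table is
an ordinary file-scope function, so a call through it never depends on the
static chain.

```c
struct shape;

/* A hand-written "vtable": a table of pointers to file-scope functions.  */
struct shape_vtable {
    long (*area)(const struct shape *);
};

struct shape {
    const struct shape_vtable *vt;
    long w, h;
};

/* The only implementation in this sketch; it is not a nested function.  */
long rect_area(const struct shape *s) { return s->w * s->h; }

const struct shape_vtable rect_vtable = { rect_area };

/* The "virtual call": an indirect call through the table entry.  */
long shape_area(const struct shape *s)
{
    return s->vt->area(s);
}
```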

	Jakub
Richard Guenther - July 7, 2011, 9:12 a.m.
On Thu, Jul 7, 2011 at 11:03 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Jul 07, 2011 at 10:59:36AM +0200, Richard Guenther wrote:
>> Hum.  Can't the compiler figure this out itself per-call-site?  At least
>> the name of the command-line switch -m[no-]r11 is meaningless to me.
>> Points-to information should be able to tell you if the function pointer
>> points to a nested function.
>
> Yeah.  E.g. for C++ virtual method calls I believe all function pointers in
> vtables should always ignore the static chain pointer, etc., because you
> can't have a nested method.

For this kind of FE specific info you could use a flag on the CALL_EXPR
as well.

Richard.
Michael Meissner - July 7, 2011, 3:47 p.m.
On Thu, Jul 07, 2011 at 10:59:36AM +0200, Richard Guenther wrote:
> On Thu, Jul 7, 2011 at 12:29 AM, Michael Meissner
> <meissner@linux.vnet.ibm.com> wrote:
> > This patch adds an option to not load the static chain (r11) for 64-bit PowerPC
> > calls through function pointers (or virtual function).  Most of the languages
> > on the PowerPC do not need the static chain being loaded when called, and
> > adding this instruction can slow down code that calls very short functions.
> >
> > In addition, if the function does not call alloca, setjmp or deal with
> > exceptions where the stack is modified, the compiler can move the store of the
> > TOC value for the current function to the prologue of the function, rather than
> > at each call site.
> >
> > The effect of these patches is to speed up 464.h264ref in the Spec 2006
> > benchmark by about 7% if -mno-r11 is used, and 5% if it is not used (but the
> > save of the TOC register is hoisted).  I believe this is due to the load of the
> > current function's TOC (r2) having to wait until the store queue is drained
> > with the store just before the call.
> >
> > Unfortunately, I do see a 3% slowdown in 429.mcf, which I don't know what the
> > cause is.
> >
> > I have bootstrapped the compiler and saw that there were no regressions in make
> > check.  Is it ok to install in the trunk?
> 
> Hum.  Can't the compiler figure this out itself per-call-site?  At least
> the name of the command-line switch -m[no-]r11 is meaningless to me.
> Points-to information should be able to tell you if the function pointer
> points to a nested function.

No, the compiler cannot figure it out.  Consider the case where a function is
passed a pointer to a function, such as the standard library function qsort.
The call may come from any random module, that isn't part of the compilation
suite, such as if the function being passed the pointer is in a shared library.
You don't know whether the function pointed to uses the static chain
(i.e. nested function call with trampoline, call to PL/I, or other language
that does use the static chain, which is part of the ABI).  The point of the
switch is similar to -ffast-math where you say you are willing to ignore some
corner cases in the standard in order to get better performance.
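
A small C sketch of the qsort situation, using only the standard library.  The
names cmp_int and sort_ints are hypothetical.  qsort receives nothing but a
pointer; nothing visible when qsort itself is compiled says whether the target
is a plain function like the one below or a nested function (or a routine in
another language) that expects the static chain to be set up.

```c
#include <stdlib.h>

/* An ordinary file-scope comparator: it needs no static chain.  */
int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Hands the comparator to qsort as an opaque function pointer.  The
   indirect calls happen inside the library, which was compiled without
   any knowledge of this caller.  */
void sort_ints(int *v, size_t n)
{
    qsort(v, n, sizeof v[0], cmp_int);
}
```

If some other module passed a GNU C nested function as the comparator instead,
the same call inside qsort would only work if the static chain were loaded.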

I certainly can call the switch -mno-static-chain, which is perhaps more
meaningful (at least to us compiler folk, I'm not sure static chain means much
to the normal programmer).
Richard Guenther - July 7, 2011, 3:53 p.m.
On Thu, Jul 7, 2011 at 5:47 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> On Thu, Jul 07, 2011 at 10:59:36AM +0200, Richard Guenther wrote:
>> On Thu, Jul 7, 2011 at 12:29 AM, Michael Meissner
>> <meissner@linux.vnet.ibm.com> wrote:
>> > This patch adds an option to not load the static chain (r11) for 64-bit PowerPC
>> > calls through function pointers (or virtual function).  Most of the languages
>> > on the PowerPC do not need the static chain being loaded when called, and
>> > adding this instruction can slow down code that calls very short functions.
>> >
>> > In addition, if the function does not call alloca, setjmp or deal with
>> > exceptions where the stack is modified, the compiler can move the store of the
>> > TOC value for the current function to the prologue of the function, rather than
>> > at each call site.
>> >
>> > The effect of these patches is to speed up 464.h264ref in the Spec 2006
>> > benchmark by about 7% if -mno-r11 is used, and 5% if it is not used (but the
>> > save of the TOC register is hoisted).  I believe this is due to the load of the
>> > current function's TOC (r2) having to wait until the store queue is drained
>> > with the store just before the call.
>> >
>> > Unfortunately, I do see a 3% slowdown in 429.mcf, which I don't know what the
>> > cause is.
>> >
>> > I have bootstrapped the compiler and saw that there were no regressions in make
>> > check.  Is it ok to install in the trunk?
>>
>> Hum.  Can't the compiler figure this out itself per-call-site?  At least
>> the name of the command-line switch -m[no-]r11 is meaningless to me.
>> Points-to information should be able to tell you if the function pointer
>> points to a nested function.
>
> No, the compiler cannot figure it out.  Consider the case where a function is
> passed a pointer to a function, such as the standard library function qsort.
> The call may come from any random module, that isn't part of the compilation
> suite, such as if the function being passed the pointer is in a shared library.
> You don't know whether the function pointed to uses the static chain
> (i.e. nested function call with trampoline, call to PL/I, or other language
> that does use the static chain, which is part of the ABI).  The point of the
> switch is similar to -ffast-math where you say you are willing to ignore some
> corner cases in the standard in order to get better performance.

Well, I guess you don't propose to build glibc with -mno-r11?  The compiler
certainly can't figure out in _all_ cases - but it should be able to handle
most of the cases (with LTO even more cases) ok, no?

I also wonder why loading a register is so expensive compared to the
actual call ...

> I certainly can call the switch -mno-static-chain, which is perhaps more
> meaningful (at least to us compiler folk, I'm not sure static chain means much
> to the normal programmer).

Well, that's up to the target maintainers to decide, maybe
-mno-nested-functions instead?

Richard.
Michael Meissner - July 7, 2011, 4:13 p.m.
On Thu, Jul 07, 2011 at 05:53:09PM +0200, Richard Guenther wrote:
> Well, I guess you don't propose to build glibc with -mno-r11?  The compiler
> certainly can't figure out in _all_ cases - but it should be able to handle
> most of the cases (with LTO even more cases) ok, no?

No, we are not proposing to build glibc or any standard library with -mno-r11.

> I also wonder why loading a register is so expensive compared to the
> actual call ...

We are trying to eliminate instructions in the indirect function call pathway,
and this happens to be the first and easiest.  As I said, I see a 5-7% gain in
h264ref, but a 2-3% drop in mcf.  Besides the saving from not loading r11,
perhaps more of the gain comes from not saving the TOC (r2) at the point of the
call, but moving the save into the prologue for functions that don't call alloca,
setjmp, or have exceptions.  This is because the instruction sequence before
the change was:

	ld r0,0(<ptr>)		/* load function address */
	mtctr r0		/* move to ctr register */
	std r2,40(r1)		/* save TOC value */
	ld r2,8(<ptr>)		/* load new TOC value */
	ld r11,16(<ptr>)	/* load static chain */
	bctrl			/* call function */
	ld r2,40(r1)		/* reload our TOC */

The final ld of r2 has to wait for the store queue to drain in some cases,
because it is loading the value that the store before the call has just written.
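
The sequence above can be mirrored in portable C to show what each step is for.
This is only an illustrative model, not how GCC implements the call: struct
func_desc, call_indirect, square and the two global variables standing in for
r2 and r11 are hypothetical, while the three-word descriptor layout (entry
address, TOC pointer, static chain value) follows the description in the patch.

```c
#include <stddef.h>

/* Model of an AIX-style function descriptor: three pointer-sized words.  */
struct func_desc {
    long (*entry)(long);   /* word 0: address of the function's code */
    void *toc;             /* word 1: the callee's TOC pointer */
    void *static_chain;    /* word 2: value for the static chain register */
};

/* Stand-ins for the machine registers r2 (TOC) and r11 (static chain).  */
void *reg_toc;
void *reg_static_chain;

/* A sample callee that ignores the static chain.  */
long square(long x) { return x * x; }

/* Performs the steps of the indirect call.  Passing load_chain == 0
   models the effect of -mno-r11: the third descriptor word is not read.  */
long call_indirect(const struct func_desc *fd, long arg, int load_chain)
{
    void *saved_toc = reg_toc;                /* save our TOC */
    reg_toc = fd->toc;                        /* load the callee's TOC */
    if (load_chain)
        reg_static_chain = fd->static_chain;  /* load the static chain */
    long result = fd->entry(arg);             /* call through the entry address */
    reg_toc = saved_toc;                      /* reload our TOC */
    return result;
}
```

In this model the save and reload of the TOC bracket every call; hoisting the
save into the prologue corresponds to assigning saved_toc once per function
rather than once per call.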


> > I certainly can call the switch -mno-static-chain, which is perhaps more
> > meaningful (at least to us compiler folk, I'm not sure static chain means much
> > to the normal programmer).
> 
> Well, that's up to the target maintainers to decide, maybe
> -mno-nested-functions instead?

David?
Tristan Gingold - July 7, 2011, 4:14 p.m.
[...]

On Jul 7, 2011, at 5:53 PM, Richard Guenther wrote:

> On Thu, Jul 7, 2011 at 5:47 PM, Michael Meissner
> <meissner@linux.vnet.ibm.com> wrote:
>> I certainly can call the switch -mno-static-chain, which is perhaps more
>> meaningful (at least to us compiler folk, I'm not sure static chain means much
>> to the normal programmer).
> 
> Well, that's up to the target maintainers to decide, maybe
> -mno-nested-functions instead?

Isn't that an issue of pointers to nested functions rather than nested functions?
So -mno-nested-function-pointers would be more accurate.

That's somewhat important from an Ada POV, as nested subprograms are common, but
accesses (pointers) to nested subprograms are not very usual.

My two cents.
Tristan.
David Edelsohn - July 7, 2011, 7:14 p.m.
On Thu, Jul 7, 2011 at 11:53 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:

> Well, that's up to the target maintainers to decide, maybe
> -mno-nested-functions instead?

Is -mno-nested-functions or -mno-nested-function-pointers too
C-centric or GCC-centric?  I don't know what wording would be more
informative, but the functionality is available in Pascal, PL/I, Ada,
GCC extensions and other languages.  We're open to suggestions.

> The compiler certainly can't figure out in _all_ cases - but it should be able to handle
> most of the cases (with LTO even more cases) ok, no?

-mno-r11 is an assertion to the compiler that no function calls
through pointers will require the static chain.  However, I agree that
the compiler conservatively should be able to figure out some cases
itself, which would be a good enhancement.

Thanks, David
Richard Guenther - July 7, 2011, 8:19 p.m.
On Thu, Jul 7, 2011 at 9:14 PM, David Edelsohn <dje.gcc@gmail.com> wrote:
> On Thu, Jul 7, 2011 at 11:53 AM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>
>> Well, that's up to the target maintainers to decide, maybe
>> -mno-nested-functions instead?
>
> Is -mno-nested-functions or -mno-nested-function-pointers too
> C-centric or GCC-centric?  I don't know what wording would be more
> informative, but the functionality is available in Pascal, PL/I, Ada,
> GCC extensions and other languages.  We're open to suggestions.
>
>> The compiler certainly can't figure out in _all_ cases - but it should be able to handle
>> most of the cases (with LTO even more cases) ok, no?
>
> -mno-r11 is an assertion to the compiler that no function calls
> through pointers will require the static chain.  However, I agree that
> the compiler conservatively should be able to figure out some cases
> itself, which would be a good enhancement.

Does XLC have a similar switch whose name we can use?

Richard.

> Thanks, David
>
David Edelsohn - July 11, 2011, 2:04 p.m.
On Thu, Jul 7, 2011 at 4:19 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:

> Does XLC have a similar switch whose name we can use?

The IBM XL compiler is discussing a similar feature, but it is not
implemented yet and does not have a formal command line option name.

- David

Patch

Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 175921)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -171,6 +171,8 @@  extern unsigned int rs6000_dbx_register_
 extern void rs6000_emit_epilogue (int);
 extern void rs6000_emit_eh_reg_restore (rtx, rtx);
 extern const char * output_isel (rtx *);
+extern void rs6000_call_indirect_aix (rtx, rtx, rtx);
+extern bool rs6000_save_toc_in_prologue_p (void);
 
 extern void rs6000_aix_asm_output_dwarf_table_ref (char *);
 
Index: gcc/config/rs6000/rs6000.opt
===================================================================
--- gcc/config/rs6000/rs6000.opt	(revision 175921)
+++ gcc/config/rs6000/rs6000.opt	(working copy)
@@ -521,4 +521,10 @@  mxilinx-fpu
 Target Var(rs6000_xilinx_fpu) Save
 Specify Xilinx FPU.
 
+mr11
+Target Report Var(TARGET_R11) Init(1) Save
+Use/do not use r11 to hold the static link in calls.
 
+msave-toc-indirect
+Target Undocumented Var(TARGET_SAVE_TOC_INDIRECT) Save Init(1)
+; Control whether we save the TOC in the prologue for indirect calls or generate the save inline
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 175921)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -130,6 +130,9 @@  typedef struct GTY(()) machine_function
   int ra_need_lr;
   /* Cache lr_save_p after expansion of builtin_eh_return.  */
   int lr_save_state;
+  /* Whether we need to save the TOC to the reserved stack location in the
+     function prologue.  */
+  bool save_toc_in_prologue;
   /* Offset from virtual_stack_vars_rtx to the start of the ABI_V4
      varargs save area.  */
   HOST_WIDE_INT varargs_save_offset;
@@ -20325,7 +20328,7 @@  rs6000_emit_prologue (void)
       JUMP_LABEL (jump) = toc_save_done;
       LABEL_NUSES (toc_save_done) += 1;
 
-      emit_frame_save (frame_reg_rtx, frame_ptr_rtx, reg_mode, 2,
+      emit_frame_save (frame_reg_rtx, frame_ptr_rtx, reg_mode, TOC_REGNUM,
 		       sp_offset + 5 * reg_size, info->total_size);
       emit_label (toc_save_done);
       if (using_static_chain_p)
@@ -20516,6 +20519,11 @@  rs6000_emit_prologue (void)
 	emit_move_insn (lr, gen_rtx_REG (Pmode, 0));
     }
 #endif
+
+  /* If we need to, save the TOC register after doing the stack setup.  */
+  if (rs6000_save_toc_in_prologue_p ())
+    emit_frame_save (sp_reg_rtx, sp_reg_rtx, reg_mode, TOC_REGNUM,
+		     5 * reg_size, info->total_size);
 }
 
 /* Write function prologue.  */
@@ -24469,9 +24477,14 @@  rs6000_trampoline_init (rtx m_tramp, tre
     /* Under AIX, just build the 3 word function descriptor */
     case ABI_AIX:
       {
-	rtx fnmem = gen_const_mem (Pmode, force_reg (Pmode, fnaddr));
-	rtx fn_reg = gen_reg_rtx (Pmode);
-	rtx toc_reg = gen_reg_rtx (Pmode);
+	rtx fnmem, fn_reg, toc_reg;
+
+	if (!TARGET_R11)
+	  error ("-mno-r11 must not be used if you have trampolines");
+
+	fnmem = gen_const_mem (Pmode, force_reg (Pmode, fnaddr));
+	fn_reg = gen_reg_rtx (Pmode);
+	toc_reg = gen_reg_rtx (Pmode);
 
   /* Macro to shorten the code expansions below.  */
 # define MEM_PLUS(MEM, OFFSET) adjust_address (MEM, Pmode, OFFSET)
@@ -27760,4 +27773,132 @@  rs6000_legitimate_constant_p (enum machi
 	  || easy_vector_constant (x, mode));
 }
 
+
+/* A function pointer under AIX is a pointer to a data area whose first word
+   contains the actual address of the function, whose second word contains a
+   pointer to its TOC, and whose third word contains a value to place in the
+   static chain register (r11).  Note that if we load the static chain, our
+   "trampoline" need not have any executable code.  */
+
+void
+rs6000_call_indirect_aix (rtx value, rtx func_desc, rtx flag)
+{
+  rtx func_addr;
+  rtx toc_reg;
+  rtx sc_reg;
+  rtx stack_ptr;
+  rtx stack_toc_offset;
+  rtx stack_toc_mem;
+  rtx func_toc_offset;
+  rtx func_toc_mem;
+  rtx func_sc_offset;
+  rtx func_sc_mem;
+  rtx insn;
+  rtx (*call_func) (rtx, rtx, rtx, rtx);
+  rtx (*call_value_func) (rtx, rtx, rtx, rtx, rtx);
+
+  stack_ptr = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM);
+  toc_reg = gen_rtx_REG (Pmode, TOC_REGNUM);
+
+  /* Load up address of the actual function.  */
+  func_desc = force_reg (Pmode, func_desc);
+  func_addr = gen_reg_rtx (Pmode);
+  emit_move_insn (func_addr, gen_rtx_MEM (Pmode, func_desc));
+
+  if (TARGET_32BIT)
+    {
+
+      stack_toc_offset = GEN_INT (TOC_SAVE_OFFSET_32BIT);
+      func_toc_offset = GEN_INT (AIX_FUNC_DESC_TOC_32BIT);
+      func_sc_offset = GEN_INT (AIX_FUNC_DESC_SC_32BIT);
+      if (TARGET_R11)
+	{
+	  call_func = gen_call_indirect_aix32bit;
+	  call_value_func = gen_call_value_indirect_aix32bit;
+	}
+      else
+	{
+	  call_func = gen_call_indirect_aix32bit_nor11;
+	  call_value_func = gen_call_value_indirect_aix32bit_nor11;
+	}
+    }
+  else
+    {
+      stack_toc_offset = GEN_INT (TOC_SAVE_OFFSET_64BIT);
+      func_toc_offset = GEN_INT (AIX_FUNC_DESC_TOC_64BIT);
+      func_sc_offset = GEN_INT (AIX_FUNC_DESC_SC_64BIT);
+      if (TARGET_R11)
+	{
+	  call_func = gen_call_indirect_aix64bit;
+	  call_value_func = gen_call_value_indirect_aix64bit;
+	}
+      else
+	{
+	  call_func = gen_call_indirect_aix64bit_nor11;
+	  call_value_func = gen_call_value_indirect_aix64bit_nor11;
+	}
+    }
+
+  /* Reserved spot to store the TOC.  */
+  stack_toc_mem = gen_frame_mem (Pmode,
+				 gen_rtx_PLUS (Pmode,
+					       stack_ptr,
+					       stack_toc_offset));
+
+  gcc_assert (cfun);
+  gcc_assert (cfun->machine);
+
+  /* Can we optimize saving the TOC in the prologue or do we need to do it at
+     every call?  */
+  if (TARGET_SAVE_TOC_INDIRECT && !cfun->calls_alloca
+      && !cfun->calls_setjmp && !cfun->has_nonlocal_label
+      && !cfun->can_throw_non_call_exceptions
+      && ((flags_from_decl_or_type (cfun->decl) & ECF_NOTHROW) == ECF_NOTHROW))
+    cfun->machine->save_toc_in_prologue = true;
+
+  else
+    {
+      MEM_VOLATILE_P (stack_toc_mem) = 1;
+      emit_move_insn (stack_toc_mem, toc_reg);
+    }
+
+  /* Calculate the address to load the TOC of the called function.  We don't
+     actually load this until the split after reload.  */
+  func_toc_mem = gen_rtx_MEM (Pmode,
+			      gen_rtx_PLUS (Pmode,
+					    func_desc,
+					    func_toc_offset));
+
+  /* If we have a static chain, load it up.  */
+  if (TARGET_R11)
+    {
+      func_sc_mem = gen_rtx_MEM (Pmode,
+				 gen_rtx_PLUS (Pmode,
+					       func_desc,
+					       func_sc_offset));
+
+      sc_reg = gen_rtx_REG (Pmode, STATIC_CHAIN_REGNUM);
+      emit_move_insn (sc_reg, func_sc_mem);
+    }
+
+  /* Create the call.  */
+  if (value)
+    insn = call_value_func (value, func_addr, flag, func_toc_mem,
+			    stack_toc_mem);
+  else
+    insn = call_func (func_addr, flag, func_toc_mem, stack_toc_mem);
+
+  emit_call_insn (insn);
+  return;
+}
+
+/* Return whether we need to always update the saved TOC pointer when we update
+   the stack pointer.  */
+
+bool
+rs6000_save_toc_in_prologue_p (void)
+{
+  return (cfun && cfun->machine && cfun->machine->save_toc_in_prologue);
+}
+
 #include "gt-rs6000.h"
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 175921)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -27,9 +27,14 @@ 
 ;;
 
 (define_constants
-  [(MQ_REGNO			64)
+  [(STACK_POINTER_REGNUM	1)
+   (TOC_REGNUM			2)
+   (STATIC_CHAIN_REGNUM		11)
+   (HARD_FRAME_POINTER_REGNUM	31)
+   (MQ_REGNO			64)
    (LR_REGNO			65)
    (CTR_REGNO			66)
+   (ARG_POINTER_REGNUM		67)
    (CR0_REGNO			68)
    (CR1_REGNO			69)
    (CR2_REGNO			70)
@@ -46,7 +51,19 @@  (define_constants
    (VSCR_REGNO			110)
    (SPE_ACC_REGNO		111)
    (SPEFSCR_REGNO		112)
-   (SFP_REGNO			113)
+   (FRAME_POINTER_REGNUM	113)
+
+   ; ABI defined stack offsets for storing the TOC pointer with AIX calls.
+   (TOC_SAVE_OFFSET_32BIT	20)
+   (TOC_SAVE_OFFSET_64BIT	40)
+
+   ; Function TOC offset in the AIX function descriptor.
+   (AIX_FUNC_DESC_TOC_32BIT	4)
+   (AIX_FUNC_DESC_TOC_64BIT	8)
+
+   ; Static chain offset in the AIX function descriptor.
+   (AIX_FUNC_DESC_SC_32BIT	8)
+   (AIX_FUNC_DESC_SC_64BIT	16)
   ])
 
 ;;
@@ -267,6 +284,9 @@  (define_mode_attr tptrsize [(SI "TARGET_
 (define_mode_attr mptrsize [(SI "si")
 			    (DI "di")])
 
+(define_mode_attr ptrload [(SI "{l|lwz}")
+			   (DI "ld")])
+
 (define_mode_attr rreg [(SF   "f")
 			(DF   "ws")
 			(V4SF "wf")
@@ -12178,87 +12198,7 @@  (define_insn "largetoc_low"
    "TARGET_ELF && TARGET_CMODEL != CMODEL_SMALL"
    "{cal %0,%2@l(%1)|addi %0,%1,%2@l}")
 
-;; A function pointer under AIX is a pointer to a data area whose first word
-;; contains the actual address of the function, whose second word contains a
-;; pointer to its TOC, and whose third word contains a value to place in the
-;; static chain register (r11).  Note that if we load the static chain, our
-;; "trampoline" need not have any executable code.
-
-(define_expand "call_indirect_aix32"
-  [(set (match_dup 2)
-	(mem:SI (match_operand:SI 0 "gpc_reg_operand" "")))
-   (set (mem:SI (plus:SI (reg:SI 1) (const_int 20)))
-	(reg:SI 2))
-   (set (reg:SI 11)
-	(mem:SI (plus:SI (match_dup 0)
-			 (const_int 8))))
-   (parallel [(call (mem:SI (match_dup 2))
-		    (match_operand 1 "" ""))
-	      (use (mem:SI (plus:SI (match_dup 0) (const_int 4))))
-	      (use (reg:SI 11))
-	      (use (mem:SI (plus:SI (reg:SI 1) (const_int 20))))
-	      (clobber (reg:SI LR_REGNO))])]
-  "TARGET_32BIT"
-  "
-{ operands[2] = gen_reg_rtx (SImode); }")
-
-(define_expand "call_indirect_aix64"
-  [(set (match_dup 2)
-	(mem:DI (match_operand:DI 0 "gpc_reg_operand" "")))
-   (set (mem:DI (plus:DI (reg:DI 1) (const_int 40)))
-	(reg:DI 2))
-   (set (reg:DI 11)
-	(mem:DI (plus:DI (match_dup 0)
-			 (const_int 16))))
-   (parallel [(call (mem:SI (match_dup 2))
-		    (match_operand 1 "" ""))
-	      (use (mem:DI (plus:DI (match_dup 0) (const_int 8))))
-	      (use (reg:DI 11))
-	      (use (mem:DI (plus:DI (reg:DI 1) (const_int 40))))
-	      (clobber (reg:SI LR_REGNO))])]
-  "TARGET_64BIT"
-  "
-{ operands[2] = gen_reg_rtx (DImode); }")
-
-(define_expand "call_value_indirect_aix32"
-  [(set (match_dup 3)
-	(mem:SI (match_operand:SI 1 "gpc_reg_operand" "")))
-   (set (mem:SI (plus:SI (reg:SI 1) (const_int 20)))
-	(reg:SI 2))
-   (set (reg:SI 11)
-	(mem:SI (plus:SI (match_dup 1)
-			 (const_int 8))))
-   (parallel [(set (match_operand 0 "" "")
-		   (call (mem:SI (match_dup 3))
-			 (match_operand 2 "" "")))
-	      (use (mem:SI (plus:SI (match_dup 1) (const_int 4))))
-	      (use (reg:SI 11))
-	      (use (mem:SI (plus:SI (reg:SI 1) (const_int 20))))
-	      (clobber (reg:SI LR_REGNO))])]
-  "TARGET_32BIT"
-  "
-{ operands[3] = gen_reg_rtx (SImode); }")
-
-(define_expand "call_value_indirect_aix64"
-  [(set (match_dup 3)
-	(mem:DI (match_operand:DI 1 "gpc_reg_operand" "")))
-   (set (mem:DI (plus:DI (reg:DI 1) (const_int 40)))
-	(reg:DI 2))
-   (set (reg:DI 11)
-	(mem:DI (plus:DI (match_dup 1)
-			 (const_int 16))))
-   (parallel [(set (match_operand 0 "" "")
-		   (call (mem:SI (match_dup 3))
-			 (match_operand 2 "" "")))
-	      (use (mem:DI (plus:DI (match_dup 1) (const_int 8))))
-	      (use (reg:DI 11))
-	      (use (mem:DI (plus:DI (reg:DI 1) (const_int 40))))
-	      (clobber (reg:SI LR_REGNO))])]
-  "TARGET_64BIT"
-  "
-{ operands[3] = gen_reg_rtx (DImode); }")
-
-;; Now the definitions for the call and call_value insns
+;; Call and call_value insns
 (define_expand "call"
   [(parallel [(call (mem:SI (match_operand 0 "address_operand" ""))
 		    (match_operand 1 "" ""))
@@ -12294,13 +12234,7 @@  (define_expand "call"
 	case ABI_AIX:
 	  /* AIX function pointers are really pointers to a three word
 	     area.  */
-	  emit_call_insn (TARGET_32BIT
-			  ? gen_call_indirect_aix32 (force_reg (SImode,
-							        operands[0]),
-						     operands[1])
-			  : gen_call_indirect_aix64 (force_reg (DImode,
-							        operands[0]),
-						     operands[1]));
+	  rs6000_call_indirect_aix (NULL_RTX, operands[0], operands[1]);
 	  DONE;
 
 	default:
@@ -12345,15 +12279,7 @@  (define_expand "call_value"
 	case ABI_AIX:
 	  /* AIX function pointers are really pointers to a three word
 	     area.  */
-	  emit_call_insn (TARGET_32BIT
-			  ? gen_call_value_indirect_aix32 (operands[0],
-							   force_reg (SImode,
-								      operands[1]),
-							   operands[2])
-			  : gen_call_value_indirect_aix64 (operands[0],
-							   force_reg (DImode,
-								      operands[1]),
-							   operands[2]));
+	  rs6000_call_indirect_aix (operands[0], operands[1], operands[2]);
 	  DONE;
 
 	default:
@@ -12447,149 +12373,202 @@  (define_insn "*call_value_local64"
   [(set_attr "type" "branch")
    (set_attr "length" "4,8")])
 
-;; Call to function which may be in another module.  Restore the TOC
-;; pointer (r2) after the call unless this is System V.
-;; Operand2 is nonzero if we are using the V.4 calling sequence and
-;; either the function was not prototyped, or it was prototyped as a
-;; variable argument function.  It is > 0 if FP registers were passed
-;; and < 0 if they were not.
+;; Call to indirect functions with the AIX ABI using a 3-word descriptor.
+;; Operand0 is the address of the function to call
+;; Operand1 is the flag for System V.4 for unprototyped or FP registers
+;; Operand2 is the location in the function descriptor to load r2 from
+;; Operand3 is the stack location to hold the current TOC pointer
 
-(define_insn_and_split "*call_indirect_nonlocal_aix32_internal"
-  [(call (mem:SI (match_operand:SI 0 "register_operand" "c,*l"))
-		 (match_operand 1 "" "g,g"))
-   (use (mem:SI (plus:SI (match_operand:SI 2 "register_operand" "b,b") (const_int 4))))
-   (use (reg:SI 11))
-   (use (mem:SI (plus:SI (reg:SI 1) (const_int 20))))
-   (clobber (reg:SI LR_REGNO))]
-  "TARGET_32BIT && DEFAULT_ABI == ABI_AIX"
+(define_insn_and_split "call_indirect_aix<ptrsize>"
+  [(call (mem:SI (match_operand:P 0 "register_operand" "c,*l"))
+	 (match_operand 1 "" "g,g"))
+   (use (match_operand:P 2 "memory_operand" "m,m"))
+   (use (match_operand:P 3 "memory_operand" "m,m"))
+   (use (reg:P STATIC_CHAIN_REGNUM))
+   (clobber (reg:P LR_REGNO))]
+  "DEFAULT_ABI == ABI_AIX && TARGET_R11"
   "#"
   "&& reload_completed"
-  [(set (reg:SI 2)
-	(mem:SI (plus:SI (match_dup 2) (const_int 4))))
+  [(set (reg:P TOC_REGNUM) (match_dup 2))
    (parallel [(call (mem:SI (match_dup 0))
 		    (match_dup 1))
-	      (use (reg:SI 2))
-	      (use (reg:SI 11))
-	      (set (reg:SI 2)
-		   (mem:SI (plus:SI (reg:SI 1) (const_int 20))))
-	      (clobber (reg:SI LR_REGNO))])]
+	      (use (reg:P TOC_REGNUM))
+	      (use (reg:P STATIC_CHAIN_REGNUM))
+	      (use (match_dup 3))
+	      (set (reg:P TOC_REGNUM) (match_dup 3))
+	      (clobber (reg:P LR_REGNO))])]
   ""
   [(set_attr "type" "jmpreg")
    (set_attr "length" "12")])
 
-(define_insn "*call_indirect_nonlocal_aix32"
-  [(call (mem:SI (match_operand:SI 0 "register_operand" "c,*l"))
+(define_insn "*call_indirect_aix<ptrsize>_internal"
+  [(call (mem:SI (match_operand:P 0 "register_operand" "c,*l"))
 	 (match_operand 1 "" "g,g"))
-   (use (reg:SI 2))
-   (use (reg:SI 11))
-   (set (reg:SI 2)
-	(mem:SI (plus:SI (reg:SI 1) (const_int 20))))
-   (clobber (reg:SI LR_REGNO))]
-  "TARGET_32BIT && DEFAULT_ABI == ABI_AIX && reload_completed"
-  "b%T0l\;{l|lwz} 2,20(1)"
+   (use (reg:P TOC_REGNUM))
+   (use (reg:P STATIC_CHAIN_REGNUM))
+   (use (match_operand:P 2 "memory_operand" "m,m"))
+   (set (reg:P TOC_REGNUM) (match_dup 2))
+   (clobber (reg:P LR_REGNO))]
+  "DEFAULT_ABI == ABI_AIX && reload_completed && TARGET_R11"
+  "b%T0l\;<ptrload> 2,%2"
   [(set_attr "type" "jmpreg")
    (set_attr "length" "8")])
 
-(define_insn "*call_nonlocal_aix32"
-  [(call (mem:SI (match_operand:SI 0 "symbol_ref_operand" "s"))
-	 (match_operand 1 "" "g"))
-   (use (match_operand:SI 2 "immediate_operand" "O"))
-   (clobber (reg:SI LR_REGNO))]
-  "TARGET_32BIT
-   && DEFAULT_ABI == ABI_AIX
-   && (INTVAL (operands[2]) & CALL_LONG) == 0"
-  "bl %z0\;%."
-  [(set_attr "type" "branch")
-   (set_attr "length" "8")])
-   
-(define_insn_and_split "*call_indirect_nonlocal_aix64_internal"
-  [(call (mem:SI (match_operand:DI 0 "register_operand" "c,*l"))
-		 (match_operand 1 "" "g,g"))
-   (use (mem:DI (plus:DI (match_operand:DI 2 "register_operand" "b,b")
-			 (const_int 8))))
-   (use (reg:DI 11))
-   (use (mem:DI (plus:DI (reg:DI 1) (const_int 40))))
-   (clobber (reg:SI LR_REGNO))]
-  "TARGET_64BIT && DEFAULT_ABI == ABI_AIX"
+;; Like call_indirect_aix<ptrsize>, except don't load the static chain
+;; Operand0 is the address of the function to call
+;; Operand1 is the flag for System V.4 for unprototyped or FP registers
+;; Operand2 is the location in the function descriptor to load r2 from
+;; Operand3 is the stack location to hold the current TOC pointer
+
+(define_insn_and_split "call_indirect_aix<ptrsize>_nor11"
+  [(call (mem:SI (match_operand:P 0 "register_operand" "c,*l"))
+	 (match_operand 1 "" "g,g"))
+   (use (match_operand:P 2 "memory_operand" "m,m"))
+   (use (match_operand:P 3 "memory_operand" "m,m"))
+   (clobber (reg:P LR_REGNO))]
+  "DEFAULT_ABI == ABI_AIX && !TARGET_R11"
   "#"
   "&& reload_completed"
-  [(set (reg:DI 2)
-	(mem:DI (plus:DI (match_dup 2) (const_int 8))))
+  [(set (reg:P TOC_REGNUM) (match_dup 2))
    (parallel [(call (mem:SI (match_dup 0))
 		    (match_dup 1))
-	      (use (reg:DI 2))
-	      (use (reg:DI 11))
-	      (set (reg:DI 2)
-		   (mem:DI (plus:DI (reg:DI 1) (const_int 40))))
-	      (clobber (reg:SI LR_REGNO))])]
+	      (use (reg:P TOC_REGNUM))
+	      (use (match_dup 3))
+	      (set (reg:P TOC_REGNUM) (match_dup 3))
+	      (clobber (reg:P LR_REGNO))])]
   ""
   [(set_attr "type" "jmpreg")
    (set_attr "length" "12")])
 
-(define_insn "*call_indirect_nonlocal_aix64"
-  [(call (mem:SI (match_operand:DI 0 "register_operand" "c,*l"))
+(define_insn "*call_indirect_aix<ptrsize>_internal2"
+  [(call (mem:SI (match_operand:P 0 "register_operand" "c,*l"))
 	 (match_operand 1 "" "g,g"))
-   (use (reg:DI 2))
-   (use (reg:DI 11))
-   (set (reg:DI 2)
-	(mem:DI (plus:DI (reg:DI 1) (const_int 40))))
-   (clobber (reg:SI LR_REGNO))]
-  "TARGET_64BIT && DEFAULT_ABI == ABI_AIX && reload_completed"
-  "b%T0l\;ld 2,40(1)"
+   (use (reg:P TOC_REGNUM))
+   (use (match_operand:P 2 "memory_operand" "m,m"))
+   (set (reg:P TOC_REGNUM) (match_dup 2))
+   (clobber (reg:P LR_REGNO))]
+  "DEFAULT_ABI == ABI_AIX && reload_completed && !TARGET_R11"
+  "b%T0l\;<ptrload> 2,%2"
   [(set_attr "type" "jmpreg")
    (set_attr "length" "8")])
 
-(define_insn "*call_nonlocal_aix64"
-  [(call (mem:SI (match_operand:DI 0 "symbol_ref_operand" "s"))
-	 (match_operand 1 "" "g"))
-   (use (match_operand:SI 2 "immediate_operand" "O"))
-   (clobber (reg:SI LR_REGNO))]
-  "TARGET_64BIT
-   && DEFAULT_ABI == ABI_AIX
-   && (INTVAL (operands[2]) & CALL_LONG) == 0"
-  "bl %z0\;%."
-  [(set_attr "type" "branch")
+;; Operand0 is the return result of the function
+;; Operand1 is the address of the function to call
+;; Operand2 is the flag for System V.4 for unprototyped or FP registers
+;; Operand3 is the location in the function descriptor to load r2 from
+;; Operand4 is the stack location to hold the current TOC pointer
+
+(define_insn_and_split "call_value_indirect_aix<ptrsize>"
+  [(set (match_operand 0 "" "")
+	(call (mem:SI (match_operand:P 1 "register_operand" "c,*l"))
+	      (match_operand 2 "" "g,g")))
+   (use (match_operand:P 3 "memory_operand" "m,m"))
+   (use (match_operand:P 4 "memory_operand" "m,m"))
+   (use (reg:P STATIC_CHAIN_REGNUM))
+   (clobber (reg:P LR_REGNO))]
+  "DEFAULT_ABI == ABI_AIX && TARGET_R11"
+  "#"
+  "&& reload_completed"
+  [(set (reg:P TOC_REGNUM) (match_dup 3))
+   (parallel [(set (match_dup 0)
+		   (call (mem:SI (match_dup 1))
+			 (match_dup 2)))
+	      (use (reg:P TOC_REGNUM))
+	      (use (reg:P STATIC_CHAIN_REGNUM))
+	      (use (match_dup 4))
+	      (set (reg:P TOC_REGNUM) (match_dup 4))
+	      (clobber (reg:P LR_REGNO))])]
+  ""
+  [(set_attr "type" "jmpreg")
+   (set_attr "length" "12")])
+
+(define_insn "*call_value_indirect_aix<ptrsize>_internal"
+  [(set (match_operand 0 "" "")
+	(call (mem:SI (match_operand:P 1 "register_operand" "c,*l"))
+	      (match_operand 2 "" "g,g")))
+   (use (reg:P TOC_REGNUM))
+   (use (reg:P STATIC_CHAIN_REGNUM))
+   (use (match_operand:P 3 "memory_operand" "m,m"))
+   (set (reg:P TOC_REGNUM) (match_dup 3))
+   (clobber (reg:P LR_REGNO))]
+  "DEFAULT_ABI == ABI_AIX && reload_completed && TARGET_R11"
+  "b%T1l\;<ptrload> 2,%3"
+  [(set_attr "type" "jmpreg")
    (set_attr "length" "8")])
 
-(define_insn_and_split "*call_value_indirect_nonlocal_aix32_internal"
+;; Like call_value_indirect_aix<ptrsize>, but don't load the static chain
+;; Operand0 is the return result of the function
+;; Operand1 is the address of the function to call
+;; Operand2 is the flag for System V.4 for unprototyped or FP registers
+;; Operand3 is the location in the function descriptor to load r2 from
+;; Operand4 is the stack location to hold the current TOC pointer
+
+(define_insn_and_split "call_value_indirect_aix<ptrsize>_nor11"
   [(set (match_operand 0 "" "")
-	(call (mem:SI (match_operand:SI 1 "register_operand" "c,*l"))
-		      (match_operand 2 "" "g,g")))
-	(use (mem:SI (plus:SI (match_operand:SI 3 "register_operand" "b,b")
-			      (const_int 4))))
-	(use (reg:SI 11))
-	(use (mem:SI (plus:SI (reg:SI 1) (const_int 20))))
-	(clobber (reg:SI LR_REGNO))]
-  "TARGET_32BIT && DEFAULT_ABI == ABI_AIX"
+	(call (mem:SI (match_operand:P 1 "register_operand" "c,*l"))
+	      (match_operand 2 "" "g,g")))
+   (use (match_operand:P 3 "memory_operand" "m,m"))
+   (use (match_operand:P 4 "memory_operand" "m,m"))
+   (clobber (reg:P LR_REGNO))]
+  "DEFAULT_ABI == ABI_AIX && !TARGET_R11"
   "#"
   "&& reload_completed"
-  [(set (reg:SI 2)
-	(mem:SI (plus:SI (match_dup 3) (const_int 4))))
-   (parallel [(set (match_dup 0) (call (mem:SI (match_dup 1))
-				       (match_dup 2)))
-	      (use (reg:SI 2))
-	      (use (reg:SI 11))
-	      (set (reg:SI 2)
-		   (mem:SI (plus:SI (reg:SI 1) (const_int 20))))
-	      (clobber (reg:SI LR_REGNO))])]
+  [(set (reg:P TOC_REGNUM) (match_dup 3))
+   (parallel [(set (match_dup 0)
+		   (call (mem:SI (match_dup 1))
+			 (match_dup 2)))
+	      (use (reg:P TOC_REGNUM))
+	      (use (match_dup 4))
+	      (set (reg:P TOC_REGNUM) (match_dup 4))
+	      (clobber (reg:P LR_REGNO))])]
   ""
   [(set_attr "type" "jmpreg")
    (set_attr "length" "12")])
 
-(define_insn "*call_value_indirect_nonlocal_aix32"
+(define_insn "*call_value_indirect_aix<ptrsize>_internal2"
   [(set (match_operand 0 "" "")
-	(call (mem:SI (match_operand:SI 1 "register_operand" "c,*l"))
+	(call (mem:SI (match_operand:P 1 "register_operand" "c,*l"))
 	      (match_operand 2 "" "g,g")))
-   (use (reg:SI 2))
-   (use (reg:SI 11))
-   (set (reg:SI 2)
-	(mem:SI (plus:SI (reg:SI 1) (const_int 20))))
-   (clobber (reg:SI LR_REGNO))]
-  "TARGET_32BIT && DEFAULT_ABI == ABI_AIX && reload_completed"
-  "b%T1l\;{l|lwz} 2,20(1)"
+   (use (reg:P TOC_REGNUM))
+   (use (match_operand:P 3 "memory_operand" "m,m"))
+   (set (reg:P TOC_REGNUM) (match_dup 3))
+   (clobber (reg:P LR_REGNO))]
+  "DEFAULT_ABI == ABI_AIX && reload_completed && !TARGET_R11"
+  "b%T1l\;<ptrload> 2,%3"
   [(set_attr "type" "jmpreg")
    (set_attr "length" "8")])
 
+;; Call to function which may be in another module.  Restore the TOC
+;; pointer (r2) after the call unless this is System V.
+;; Operand2 is nonzero if we are using the V.4 calling sequence and
+;; either the function was not prototyped, or it was prototyped as a
+;; variable argument function.  It is > 0 if FP registers were passed
+;; and < 0 if they were not.
+
+(define_insn "*call_nonlocal_aix32"
+  [(call (mem:SI (match_operand:SI 0 "symbol_ref_operand" "s"))
+	 (match_operand 1 "" "g"))
+   (use (match_operand:SI 2 "immediate_operand" "O"))
+   (clobber (reg:SI LR_REGNO))]
+  "TARGET_32BIT
+   && DEFAULT_ABI == ABI_AIX
+   && (INTVAL (operands[2]) & CALL_LONG) == 0"
+  "bl %z0\;%."
+  [(set_attr "type" "branch")
+   (set_attr "length" "8")])
+   
+(define_insn "*call_nonlocal_aix64"
+  [(call (mem:SI (match_operand:DI 0 "symbol_ref_operand" "s"))
+	 (match_operand 1 "" "g"))
+   (use (match_operand:SI 2 "immediate_operand" "O"))
+   (clobber (reg:SI LR_REGNO))]
+  "TARGET_64BIT
+   && DEFAULT_ABI == ABI_AIX
+   && (INTVAL (operands[2]) & CALL_LONG) == 0"
+  "bl %z0\;%."
+  [(set_attr "type" "branch")
+   (set_attr "length" "8")])
+
 (define_insn "*call_value_nonlocal_aix32"
   [(set (match_operand 0 "" "")
 	(call (mem:SI (match_operand:SI 1 "symbol_ref_operand" "s"))
@@ -12603,45 +12582,6 @@  (define_insn "*call_value_nonlocal_aix32
   [(set_attr "type" "branch")
    (set_attr "length" "8")])
 
-(define_insn_and_split "*call_value_indirect_nonlocal_aix64_internal"
-  [(set (match_operand 0 "" "")
-	(call (mem:SI (match_operand:DI 1 "register_operand" "c,*l"))
-		      (match_operand 2 "" "g,g")))
-	(use (mem:DI (plus:DI (match_operand:DI 3 "register_operand" "b,b")
-			      (const_int 8))))
-	(use (reg:DI 11))
-	(use (mem:DI (plus:DI (reg:DI 1) (const_int 40))))
-	(clobber (reg:SI LR_REGNO))]
-  "TARGET_64BIT && DEFAULT_ABI == ABI_AIX"
-  "#"
-  "&& reload_completed"
-  [(set (reg:DI 2)
-	(mem:DI (plus:DI (match_dup 3) (const_int 8))))
-   (parallel [(set (match_dup 0) (call (mem:SI (match_dup 1))
-				       (match_dup 2)))
-	      (use (reg:DI 2))
-	      (use (reg:DI 11))
-	      (set (reg:DI 2)
-		   (mem:DI (plus:DI (reg:DI 1) (const_int 40))))
-	      (clobber (reg:SI LR_REGNO))])]
-  ""
-  [(set_attr "type" "jmpreg")
-   (set_attr "length" "12")])
-
-(define_insn "*call_value_indirect_nonlocal_aix64"
-  [(set (match_operand 0 "" "")
-	(call (mem:SI (match_operand:DI 1 "register_operand" "c,*l"))
-	      (match_operand 2 "" "g,g")))
-   (use (reg:DI 2))
-   (use (reg:DI 11))
-   (set (reg:DI 2)
-	(mem:DI (plus:DI (reg:DI 1) (const_int 40))))
-   (clobber (reg:SI LR_REGNO))]
-  "TARGET_64BIT && DEFAULT_ABI == ABI_AIX && reload_completed"
-  "b%T1l\;ld 2,40(1)"
-  [(set_attr "type" "jmpreg")
-   (set_attr "length" "8")])
-
 (define_insn "*call_value_nonlocal_aix64"
   [(set (match_operand 0 "" "")
 	(call (mem:SI (match_operand:DI 1 "symbol_ref_operand" "s"))
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 175921)
+++ gcc/doc/invoke.texi	(working copy)
@@ -807,7 +807,7 @@  See RS/6000 and PowerPC Options.
 -msdata=@var{opt}  -mvxworks  -G @var{num}  -pthread @gol
 -mrecip -mrecip=@var{opt} -mno-recip -mrecip-precision @gol
 -mno-recip-precision @gol
--mveclibabi=@var{type} -mfriz -mno-friz}
+-mveclibabi=@var{type} -mfriz -mno-friz -mr11 -mno-r11}
 
 @emph{RX Options}
 @gccoptlist{-m64bit-doubles  -m32bit-doubles  -fpu  -nofpu@gol
@@ -16325,6 +16325,19 @@  Generate (do not generate) the @code{fri
 rounding a floating point value to 64-bit integer and back to floating
 point.  The @code{friz} instruction does not return the same value if
 the floating point number is too large to fit in an integer.
+
+@item -mr11
+@itemx -mno-r11
+@opindex mr11
+Generate (do not generate) code to load the static chain register
+(@var{r11}) when calling through a pointer on AIX and 64-bit Linux
+systems, where a function pointer points to a 3-word descriptor giving
+the function address, the TOC value to be loaded in register @var{r2},
+and the static chain value to be loaded in register @var{r11}.  The
+@option{-mr11} option is on by default.  If you use @option{-mno-r11},
+you will not be able to call through pointers to nested functions or
+pointers to functions compiled in other languages that use the static
+chain.
 @end table
 
 @node RX Options
Index: gcc/testsuite/gcc.target/powerpc/no-r11-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/no-r11-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/no-r11-1.c	(revision 0)
@@ -0,0 +1,11 @@ 
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { *-*-darwin* } { "*" } { "" } } */
+/* { dg-options "-O2 -mno-r11" } */
+
+int
+call_ptr (int (func) (void))
+{
+  return func () + 1;
+}
+
+/* { dg-final { scan-assembler-not "ld 11,16\\(3\\)" } } */
Index: gcc/testsuite/gcc.target/powerpc/no-r11-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/no-r11-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/no-r11-2.c	(revision 0)
@@ -0,0 +1,11 @@ 
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { *-*-darwin* } { "*" } { "" } } */
+/* { dg-options "-O2 -mr11" } */
+
+int
+call_ptr (int (func) (void))
+{
+  return func () + 1;
+}
+
+/* { dg-final { scan-assembler "ld 11,16" } } */
Index: gcc/testsuite/gcc.target/powerpc/no-r11-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/no-r11-3.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/no-r11-3.c	(revision 0)
@@ -0,0 +1,20 @@ 
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { *-*-darwin* } { "*" } { "" } } */
+/* { dg-options "-O2 -mno-r11" } */
+
+extern void ext_call (int (func) (void));
+
+int
+outer_func (int init)	/* { dg-error "-mno-r11 must not be used if you have trampolines" "" } */
+{
+  int value = init;
+
+  int inner (void)
+  {
+    return ++value;
+  }
+
+  ext_call (inner);
+  return value;
+}
+
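
For readers unfamiliar with the 3-word descriptor that the patch comments
describe, here is a minimal sketch in plain C of the 64-bit layout and of
which descriptor words an indirect call consumes.  It is only an
illustration and not part of the patch: the names func_desc64, call_regs
and prepare_indirect_call are hypothetical, and the load_r11 argument
merely stands in for the -mr11 / -mno-r11 choice.  The offsets 8 and 16
match AIX_FUNC_DESC_TOC_64BIT and AIX_FUNC_DESC_SC_64BIT in rs6000.md.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit function descriptor: three
   consecutive doublewords, as described in the patch comments.  */
struct func_desc64
{
  uint64_t entry;		/* word 0: address of the function's code */
  uint64_t toc;			/* word 1: callee's TOC pointer (goes in r2) */
  uint64_t static_chain;	/* word 2: static chain value (goes in r11) */
};

/* These offsets correspond to the 64-bit constants in rs6000.md.  */
_Static_assert (offsetof (struct func_desc64, toc) == 8,
		"TOC word is at offset 8");
_Static_assert (offsetof (struct func_desc64, static_chain) == 16,
		"static chain word is at offset 16");

/* Hypothetical snapshot of the registers an indirect call sets up.  */
struct call_regs
{
  uint64_t ctr;			/* branch target */
  uint64_t r2;			/* TOC pointer seen by the callee */
  uint64_t r11;			/* static chain seen by the callee */
};

/* Model the call setup.  When load_r11 is zero (the -mno-r11 case),
   the third descriptor word is never read and r11 keeps whatever
   value it had before, here passed in as old_r11.  */
static struct call_regs
prepare_indirect_call (const struct func_desc64 *fd, int load_r11,
		       uint64_t old_r11)
{
  struct call_regs regs;

  regs.ctr = fd->entry;
  regs.r2 = fd->toc;
  regs.r11 = load_r11 ? fd->static_chain : old_r11;
  return regs;
}
```

With load_r11 set to zero the callee sees a stale r11, which is why a
nested function (which reads its static chain from r11) cannot be called
through a pointer when -mno-r11 is in effect.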