
[V3,#7,of,10] , Implement PCREL_OPT relocation optimization

Message ID 20190826214341.GG11790@ibm-toto.the-meissners.org
State New
Series [V3,#7,of,10] , Implement PCREL_OPT relocation optimization

Commit Message

Michael Meissner Aug. 26, 2019, 9:43 p.m. UTC
This patch is a slight rework of V1 patch #7 (V1 patch #6 is not going
to be resubmitted at this time).

This patch adds a new RTL pass that supports creating the optimization
and flagging the appropriate load of external pc-relative addresses and
the use of that address in the basic block.

Here is the comment from the beginning of rs6000-pcrel.c that describes
the optimization.

/* This file implements an RTL pass that looks for pc-relative loads of the
   address of an external variable using the PCREL_GOT relocation and a single
   load/store that uses that GOT pointer.  If that is found we create the
   PCREL_OPT relocation to possibly convert:

	pld b,var@pcrel@got(0),1

	# possibly other instructions that do not use the base register 'b' or
        # the result register 'r'.

	lwz r,0(b)

   into:

	plwz r,var@pcrel(0),1

	# possibly other instructions that do not use the base register 'b' or
        # the result register 'r'.

	nop

   If the variable is not defined in the main program or the code using it is
   not in the main program, the linker puts the address in the .got section and
   does:

	.section .got
	.Lvar_got:	.dword var

	.section .text
	pld b,.Lvar_got@pcrel(0),1

	# possibly other instructions that do not use the base register 'b' or
        # the result register 'r'.

	lwz r,0(b)
	
   We only look for a single usage in the basic block where the GOT pointer is
   loaded.  Multiple uses or references in another basic block will force us to
   not use the PCREL_OPT relocation.  */
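The single-use rule in the comment above can be illustrated with a small C sketch. It assumes a hypothetical, simplified instruction record (struct toy_insn, listing the registers an insn reads and writes) instead of GCC's RTL and dataflow APIs, plus a caller-supplied flag saying whether the GOT register is live out of the block. find_single_got_use is an illustrative name, not a function from rs6000-pcrel.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified insn: registers read and the register written.  */
struct toy_insn
{
  int uses[3];  /* registers read, -1 for unused slots */
  int def;      /* register written, -1 if none */
};

static bool
toy_reads (const struct toy_insn *insn, int reg)
{
  for (int i = 0; i < 3; i++)
    if (insn->uses[i] == reg)
      return true;
  return false;
}

/* Return the index of the only insn after PLD_IDX in basic block BB that
   reads GOT_REG, or -1 if there is no such insn, more than one, or the
   GOT pointer is also used in another block (GOT_LIVE_OUT).  */
int
find_single_got_use (const struct toy_insn *bb, size_t n, size_t pld_idx,
                     int got_reg, bool got_live_out)
{
  assert (pld_idx < n);
  int use = -1;
  for (size_t i = pld_idx + 1; i < n; i++)
    {
      if (toy_reads (&bb[i], got_reg))
        {
          if (use >= 0)
            return -1;  /* a second use: no PCREL_OPT */
          use = (int) i;
        }
      if (bb[i].def == got_reg)
        return use;     /* redefined: later reads see another value */
    }
  /* Reached the end of the block: a use elsewhere also disqualifies.  */
  return got_live_out ? -1 : use;
}
```

For a pld 4 / addi 6,6,1 / lwz 5,0(4) sequence the sketch returns the index of the lwz; a second reader of r4, or r4 being live into another block, makes it return -1.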

I have built a bootstrap compiler on a little endian power8 system, and
there were no regressions when I ran make check.  Assuming the previous
patches are checked in, can I check this into the trunk?

[gcc]
2019-08-26  Michael Meissner  <meissner@linux.ibm.com>

	* config.gcc (powerpc*-*-*): Add rs6000-pcrel.c.
	(rs6000*-*-*): Add rs6000-pcrel.c.
	* config/rs6000/pcrel.md: New file.
	* config/rs6000/predicates.md (one_reg_memory_operand): New
	predicate.
	(pcrel_ext_mem_operand): New predicate.
	* config/rs6000/rs6000-cpus.def (ADDRESSING_FUTURE_MASKS): Add
	-mpcrel-opt.
	(POWERPC_MASKS): Add -mpcrel-opt.
	* config/rs6000/rs6000-passes.def: Add pcrel optimization pass.
	* config/rs6000/rs6000-pcrel.c: New file.
	* config/rs6000/rs6000-protos.h (make_pass_pcrel_opt): New
	declaration.
	* config/rs6000/rs6000.c (rs6000_option_override_internal): Add
	-mpcrel-opt support.
	(pcrel_opt_label_num): New state static flag.
	(rs6000_final_prescan_insn): Add -mpcrel-opt support.
	(rs6000_asm_output_opcode): Add -mpcrel-opt support.
	(rs6000_opt_masks): Add -mpcrel-opt.
	* config/rs6000/rs6000.md: Include pcrel.md.
	(pcrel_opt RTL attribute): New RTL attribute.
	* config/rs6000/t-rs6000 (rs6000-pcrel.o): Add build rules.
	(MD_INCLUDES): Add pcrel.md.

[gcc/testsuite]
2019-08-26  Michael Meissner  <meissner@linux.ibm.com>

	* gcc.target/powerpc/pcrel-opt-di.c: New test for -mpcrel-opt.

Comments

Michael Meissner Aug. 28, 2019, 9:30 p.m. UTC | #1
Note, there is a minor error in this patch.  However, since I will need to
create V4 patches shortly, I will fix the bug in those patches.
Segher Boessenkool Sept. 3, 2019, 10:56 p.m. UTC | #2
Hi!

On Mon, Aug 26, 2019 at 05:43:41PM -0400, Michael Meissner wrote:
> /* This file implements a RTL pass that looks for pc-relative loads of the
>    address of an external variable using the PCREL_GOT relocation and a single
>    load/store that uses that GOT pointer.

Does this work better than having a peephole for it?  Is there some reason
you cannot do this with a peephole?


Segher
Michael Meissner Sept. 3, 2019, 11:20 p.m. UTC | #3
On Tue, Sep 03, 2019 at 05:56:03PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Aug 26, 2019 at 05:43:41PM -0400, Michael Meissner wrote:
> > /* This file implements a RTL pass that looks for pc-relative loads of the
> >    address of an external variable using the PCREL_GOT relocation and a single
> >    load/store that uses that GOT pointer.
> 
> Does this work better than having a peephole for it?  Is there some reason
> you cannot do this with a peephole?

Yes.  Peepholes only look at adjacent insns.  This optimization allows the load
of the GOT address to be separated from the eventual load or store.

Peephole2's are likely too early, because you really, really, really don't want
any other pass moving things around.
Segher Boessenkool Sept. 3, 2019, 11:33 p.m. UTC | #4
On Tue, Sep 03, 2019 at 07:20:13PM -0400, Michael Meissner wrote:
> On Tue, Sep 03, 2019 at 05:56:03PM -0500, Segher Boessenkool wrote:
> > Hi!
> > 
> > On Mon, Aug 26, 2019 at 05:43:41PM -0400, Michael Meissner wrote:
> > > /* This file implements a RTL pass that looks for pc-relative loads of the
> > >    address of an external variable using the PCREL_GOT relocation and a single
> > >    load/store that uses that GOT pointer.
> > 
> > Does this work better than having a peephole for it?  Is there some reason
> > you cannot do this with a peephole?
> 
> Yes.  Peepholes only look at adjacent insns.

Huh.  Wow.  Would you believe I never knew that (or I forgot)?  Well, that
explains why peepholes aren't very effective for us at all, alright!

> This optimization allows the load
> of the GOT address to be separated from the eventual load or store.
> 
> Peephole2's are likely too early, because you really, really, really don't want
> any other pass moving things around.

That is a bit worrying...  What can go wrong?


Segher
Michael Meissner Sept. 4, 2019, 5:26 p.m. UTC | #5
On Tue, Sep 03, 2019 at 06:33:26PM -0500, Segher Boessenkool wrote:
> On Tue, Sep 03, 2019 at 07:20:13PM -0400, Michael Meissner wrote:
> > On Tue, Sep 03, 2019 at 05:56:03PM -0500, Segher Boessenkool wrote:
> > > Hi!
> > > 
> > > On Mon, Aug 26, 2019 at 05:43:41PM -0400, Michael Meissner wrote:
> > > > /* This file implements a RTL pass that looks for pc-relative loads of the
> > > >    address of an external variable using the PCREL_GOT relocation and a single
> > > >    load/store that uses that GOT pointer.
> > > 
> > > Does this work better than having a peephole for it?  Is there some reason
> > > you cannot do this with a peephole?
> > 
> > Yes.  Peepholes only look at adjacent insns.
> 
> Huh.  Wow.  Would you believe I never knew that (or I forgot)?  Well, that
> explains why peepholes aren't very effective for us at all, alright!
> 
> > This optimization allows the load
> > of the GOT address to be separated from the eventual load or store.
> > 
> > Peephole2's are likely too early, because you really, really, really don't want
> > any other pass moving things around.
> 
> That is a bit worrying...  What can go wrong?

As I say in the comments, with PCREL_OPT, you must have exactly one load of the
address and one load or store that references the load of the address.  If
something duplicates one of the loads or stores, or adds another reference to
the address, or just moves it so we can't link the loading of the address to
the final load/store, it will not work.

For stores, the value being stored must be live at both the loading of the
address and the store.

For loads, the register being loaded must not be used between the loading of
the address and the final load.

I.e. in:

		PLD r1,foo@got@pcrel
	.Lpcrel1:

		# other instructions

		.reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
		LWZ r2,0(r1)

If you get lucky and foo is defined in the same compilation unit, this will get
turned into:

		PLWZ r2,foo@pcrel

		# other instructions

		NOP

If foo is defined in a shared library (or you are linking for a shared library,
and foo is defined in the main program or another shared library), you get:

		PLD r1,.got.foo@pcrel

		# other instructions

		LWZ r2,0(r1)

		.section .got
	.got.foo: .quad	foo

So for loads, r2 must not be used between the PLD and LWZ instructions.
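A toy C model of the two link-time outcomes described above, for the load case, may help. It does not model real relocation processing: link_pcrel_opt_load, struct pcrel_pair and the resolves_locally flag (the symbol is defined in the main program and the main program is being linked) are hypothetical, and the .got.<sym> label spelling simply follows the example above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy model, not real linker code: the text of the two
   instruction slots covered by one PCREL_OPT load relocation.  */
struct pcrel_pair
{
  char first[64];   /* slot that held "pld rB,sym@got@pcrel" */
  char second[64];  /* slot that held "lwz rT,0(rB)" */
};

/* RESOLVES_LOCALLY: SYM is defined in the main program and the main
   program is being linked, so no GOT indirection is needed.  */
struct pcrel_pair
link_pcrel_opt_load (const char *sym, int base_reg, int target_reg,
                     bool resolves_locally)
{
  struct pcrel_pair out;
  assert (sym != NULL);
  memset (&out, 0, sizeof out);

  if (resolves_locally)
    {
      /* Load the variable directly into the target register and turn
         the second instruction into a nop.  */
      snprintf (out.first, sizeof out.first, "plwz %d,%s@pcrel",
                target_reg, sym);
      snprintf (out.second, sizeof out.second, "nop");
    }
  else
    {
      /* Keep the indirection: load the address from a .got entry and
         leave the original load unchanged.  */
      snprintf (out.first, sizeof out.first, "pld %d,.got.%s@pcrel",
                base_reg, sym);
      snprintf (out.second, sizeof out.second, "lwz %d,0(%d)",
                target_reg, base_reg);
    }
  return out;
}
```

In the local case the target register (r2) is written at the position of the PLD, which is why it must not be used between the two instructions.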

Similarly for stores:

		PLD r1,foo@got@pcrel
	.Lpcrel1:

		# other instructions

		.reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
		stw r2,0(r1)

If you get lucky, this becomes:

		PSTW r2,foo@pcrel

		# other instructions

		NOP

If foo is defined in a shared library (or you are linking for a shared library,
and foo is defined in the main program or another shared library), you get:

		PLD r1,.got.foo@pcrel

		# other instructions

		STW r2,0(r1)

		.section .got
	.got.foo: .quad	foo

So, as I said, r2 must be live between the PLD and STW, because you don't know
whether the PLD will be replaced with a PSTW.

So to keep other passes from 'improving' things, I opted to do the pass as the
last pass before final.
Segher Boessenkool Sept. 6, 2019, 12:09 p.m. UTC | #6
On Wed, Sep 04, 2019 at 01:26:27PM -0400, Michael Meissner wrote:

[snip]

> So to keep other passes from 'improving' things, I opted to do the pass as the
> last pass before final.

If the problem is that you do not properly analyse dependencies between
insns, well, fix that?

If this really needs to be done after everything else GCC does, that is
problematic.  What when you have two or more passes with that property?

If this really needs to be done after everything else GCC does, does it
belong in the compiler at all?  Should the assembler do it instead, or
the linker?


Segher
Michael Meissner Sept. 9, 2019, 8:32 p.m. UTC | #7
On Fri, Sep 06, 2019 at 07:09:45AM -0500, Segher Boessenkool wrote:
> On Wed, Sep 04, 2019 at 01:26:27PM -0400, Michael Meissner wrote:
> 
> [snip]
> 
> > So to keep other passes from 'improving' things, I opted to do the pass as the
> > last pass before final.
> 
> If the problem is that you do not properly analyse dependencies between
> insns, well, fix that?
> 
> If this really needs to be done after everything else GCC does, that is
> problematic.  What when you have two or more passes with that property?
> 
> If this really needs to be done after everything else GCC does, does it
> belong in the compiler at all?  Should the assembler do it instead, or
> the linker?

No, with the definition of the PCREL_OPT there can be only one reference.
Yeah, there might be other ways to do it, but fundamentally you need to do this
as late as possible and prevent any other optimizations from messing things up.

This is similar to figuring out whether a conditional branch is short enough or
you have to reverse the conditional branch and do an unconditional jump.  If
you add any more code at that point that changes the sizes, it makes the whole
calculation moot.
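For comparison, the branch-length side of this analogy can be sketched in C. The sketch assumes the standard PowerPC bc encoding, whose 14-bit BD field is shifted left by 2 to give a signed byte displacement in [-32768, 32764]; the function names are illustrative and are not how GCC's rs6000 backend spells the check. Once the length is chosen from the displacement, inserting code between the branch and its target changes the displacement and can invalidate the choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* bc keeps its displacement in a 14-bit BD field shifted left by 2,
   i.e. a signed 16-bit byte offset.  */
bool
bc_displacement_fits (int64_t disp)
{
  return disp >= -32768 && disp <= 32764;
}

/* Length in bytes of a conditional branch to a target DISP bytes away:
   4 for a plain bc, or 8 for an inverted bc that skips over an
   unconditional b when the target is out of range.  */
int
cond_branch_length (int64_t disp)
{
  assert (disp % 4 == 0);  /* instructions are 4-byte aligned */
  return bc_displacement_fits (disp) ? 4 : 8;
}
```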
Segher Boessenkool Sept. 9, 2019, 8:56 p.m. UTC | #8
On Mon, Sep 09, 2019 at 04:32:39PM -0400, Michael Meissner wrote:
> On Fri, Sep 06, 2019 at 07:09:45AM -0500, Segher Boessenkool wrote:
> > On Wed, Sep 04, 2019 at 01:26:27PM -0400, Michael Meissner wrote:
> > 
> > [snip]
> > 
> > > So to keep other passes from 'improving' things, I opted to do the pass as the
> > > last pass before final.
> > 
> > If the problem is that you do not properly analyse dependencies between
> > insns, well, fix that?
> > 
> > If this really needs to be done after everything else GCC does, that is
> > problematic.  What when you have two or more passes with that property?
> > 
> > If this really needs to be done after everything else GCC does, does it
> > belong in the compiler at all?  Should the assembler do it instead, or
> > the linker?
> 
> No, with the definition of the PCREL_OPT there can be only one reference.

I don't see why you think that argues for having to do it last?

> Yeah, there might be other ways to do it, but fundamentally you need to do this
> as late as possible and prevent any other optimizations from messing things up.

That is true for *everything*.


You haven't addressed the "if it should be after everything the compiler
does, does this belong in the compiler at all" question.


Segher
Michael Meissner Sept. 9, 2019, 10:39 p.m. UTC | #9
On Mon, Sep 09, 2019 at 03:56:52PM -0500, Segher Boessenkool wrote:
> On Mon, Sep 09, 2019 at 04:32:39PM -0400, Michael Meissner wrote:
> > On Fri, Sep 06, 2019 at 07:09:45AM -0500, Segher Boessenkool wrote:
> > > On Wed, Sep 04, 2019 at 01:26:27PM -0400, Michael Meissner wrote:
> > > 
> > > [snip]
> > > 
> > > > So to keep other passes from 'improving' things, I opted to do the pass as the
> > > > last pass before final.
> > > 
> > > If the problem is that you do not properly analyse dependencies between
> > > insns, well, fix that?
> > > 
> > > If this really needs to be done after everything else GCC does, that is
> > > problematic.  What when you have two or more passes with that property?
> > > 
> > > If this really needs to be done after everything else GCC does, does it
> > > belong in the compiler at all?  Should the assembler do it instead, or
> > > the linker?
> > 
> > No, with the definition of the PCREL_OPT there can be only one reference.
> 
> I don't see why you think that argues for having to do it last?
> 
> > Yeah, there might be other ways to do it, but fundamentally you need to do this
> > as late as possible and prevent any other optimizations from messing things up.
> 
> That is true for *everything*.
> 
> 
> You haven't addressed the "if it should be after everything the compiler
> does, does this belong in the compiler at all" question.

I believe it falls out of the basic PCREL_OPT description which I have in the
comments to the code.

For the load case, if you have:

		pld 4,esym@got@pcrel
		addi 6,6,1
		lwz 5,0(4)

I.e. load up the address of 'esym' into register 4.  If 'esym' is defined in
another module and both are in the main program, the linker converts the PLD
into:

		pla 4,esym@pcrel

If instead esym is defined in a shared library or you are linking a shared
library, the linker rewrites this as:

		pld 4,.esym.got@pcrel
		.section .got
	.esym.got:
		.quad esym
		.section .text

I.e. load up the address of 'esym' from an address in the data section that has
an external relocation to 'esym' and the runtime loader will fill in the
address after loading any shared libraries.

If you want to use the PCREL_OPT optimization, the following must be true:

    1) Between the PLD and LWZ, register 4 must not be referenced;
    2) Register 4 dies on the LWZ instruction;
    3) Register 5 is not used between PLD and LWZ.
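The three load conditions above can be written as a check over a simplified instruction list. This is a sketch under stated assumptions: struct toy_insn and pcrel_opt_load_ok are hypothetical, "referenced" is taken to mean read or written, and whether register 4 dies on the LWZ is passed in by the caller instead of being computed from liveness data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified insn: registers read and the register written.  */
struct toy_insn
{
  int uses[3];  /* registers read, -1 for unused slots */
  int def;      /* register written, -1 if none */
};

static bool
toy_mentions (const struct toy_insn *insn, int reg)
{
  if (insn->def == reg)
    return true;
  for (int i = 0; i < 3; i++)
    if (insn->uses[i] == reg)
      return true;
  return false;
}

/* BB[PLD] loads the GOT address into BASE; BB[LOAD] is "lwz TARGET,0(BASE)".
   BASE_DIES (condition 2) would come from liveness data in a real pass.  */
bool
pcrel_opt_load_ok (const struct toy_insn *bb, size_t pld, size_t load,
                   int base, int target, bool base_dies)
{
  assert (pld < load);
  if (!base_dies)
    return false;                           /* condition 2 */
  for (size_t i = pld + 1; i < load; i++)
    if (toy_mentions (&bb[i], base)         /* condition 1 */
        || toy_mentions (&bb[i], target))   /* condition 3 */
      return false;
  return true;
}
```

With the pld 4 / addi 6,6,1 / lwz 5,0(4) example the check passes; changing the middle instruction to addi 6,5,1 makes it fail, because r5 would already hold the loaded value after the linker rewrite.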

If these hold, you can modify it to use the PCREL_OPT optimization:

		pld 4,esym@got@pcrel
	.Lpcrel1:
		addi 6,6,1
		.reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
		lwz 5,0(4)

Then if 'esym' is in the main program, and you are linking for the main
program, the linker can change this to:

		plwz 5,esym@pcrel
		addi 6,6,1
		nop

Thus if any other pass duplicates the LWZ, uses the result of the PLD, or uses
register 5 in that sequence, the optimized code will be invalid.  Hence I think
this should be the last pass before final.

Similarly for the store case.  If you have:

		pld 4,esym@got@pcrel
		addi 6,6,1
		stw 5,0(4)

If you want to use the PCREL_OPT optimization, the following must be true:

    1) Between the PLD and STW, register 4 must not be referenced;
    2) Register 4 dies on the STW instruction;
    3) Register 5 must have the value in it at the time of the PLD, and it must
       not be modified between the PLD and STW.
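A similar sketch covers the store conditions, again over a hypothetical struct toy_insn with an illustrative function name. Unlike the load case, reading register 5 between the PLD and STW is harmless; only modifying it is rejected. The sketch also assumes the stored value must not live in the base register itself, since the PLD writes that register before the STW (the store_got pattern in the patch marks its base operand as early-clobber, "=&b").

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified insn: registers read and the register written.  */
struct toy_insn
{
  int uses[3];  /* registers read, -1 for unused slots */
  int def;      /* register written, -1 if none */
};

static bool
toy_mentions (const struct toy_insn *insn, int reg)
{
  if (insn->def == reg)
    return true;
  for (int i = 0; i < 3; i++)
    if (insn->uses[i] == reg)
      return true;
  return false;
}

/* BB[PLD] loads the GOT address into BASE; BB[STORE] is "stw VALUE,0(BASE)".  */
bool
pcrel_opt_store_ok (const struct toy_insn *bb, size_t pld, size_t store,
                    int base, int value, bool base_dies)
{
  assert (pld < store);
  /* VALUE must already hold the data when the pld executes, so it cannot
     be the register the pld writes; BASE must die on the store.  */
  if (value == base || !base_dies)
    return false;
  for (size_t i = pld + 1; i < store; i++)
    {
      if (toy_mentions (&bb[i], base))  /* condition 1 */
        return false;
      if (bb[i].def == value)           /* condition 3: value modified */
        return false;
    }
  return true;
}
```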

The compiler would generate:

		pld 4,esym@got@pcrel
	.Lpcrel2:
		addi 6,6,1
		.reloc .Lpcrel2-8,R_PPC64_PCREL_OPT,.-(.Lpcrel2-8)
		stw 5,0(4)

And if the symbol is defined in the main program, and you are linking for the
main program, the linker will transform this to:

		pstw 5,esym@pcrel
		addi 6,6,1
		nop

The .Lpcrel<x> label is defined after the PLD, and the relocation uses
.Lpcrel<x>-8, because the assembler may insert a NOP before a prefixed
instruction that would otherwise cross a 64-byte boundary.  A label placed
before the PLD could then point at the NOP, putting the relocation on the wrong
word.
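The label arithmetic can be checked with a short C sketch. It assumes the rule stated above: an 8-byte prefixed instruction must not cross a 64-byte boundary, so a 4-byte NOP is inserted first when it would start at offset 60 modulo 64. It models the .reloc operands .Lpcrel<x>-8 and .-(.Lpcrel<x>-8), where '.' is the address of the following load or store. The function and struct names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Start address of an 8-byte prefixed insn whose next free byte is CUR:
   a 4-byte nop is inserted first if it would straddle a 64-byte boundary.  */
uint64_t
prefixed_insn_start (uint64_t cur)
{
  assert (cur % 4 == 0);
  return (cur % 64 == 60) ? cur + 4 : cur;
}

/* Values of the two .reloc operands for a pld at CUR followed by
   BYTES_BETWEEN bytes of other instructions before the load/store.  */
struct pcrel_opt_reloc
{
  uint64_t pld_addr;  /* .Lpcrel<x>-8: where the relocation is attached */
  uint64_t addend;    /* .-(.Lpcrel<x>-8): distance to the load/store */
};

struct pcrel_opt_reloc
pcrel_opt_reloc_for (uint64_t cur, uint64_t bytes_between)
{
  uint64_t label = prefixed_insn_start (cur) + 8;  /* label follows the pld */
  struct pcrel_opt_reloc r;
  r.pld_addr = label - 8;
  r.addend = (label + bytes_between) - r.pld_addr;
  return r;
}
```

Starting at offset 60 the pld is pushed to 64, and .Lpcrel<x>-8 still yields 64; a label placed before the pld would instead name the NOP at 60.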

Patch

Index: gcc/config/rs6000/pcrel.md
===================================================================
--- gcc/config/rs6000/pcrel.md	(revision 274877)
+++ gcc/config/rs6000/pcrel.md	(working copy)
@@ -0,0 +1,563 @@ 
+;; PC relative support.
+;; Copyright (C) 2019 Free Software Foundation, Inc.
+;; Contributed by Peter Bergner <bergner@linux.ibm.com> and
+;;		  Michael Meissner <meissner@linux.ibm.com>
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;;
+;; UNSPEC usage
+;;
+
+(define_c_enum "unspec"
+  [UNSPEC_PCREL_LD
+   UNSPEC_PCREL_ST
+  ])
+
+
+;; Optimize references to external variables to combine loading up the external
+;; address from the GOT and doing the load or store operation.
+;;
+;; A typical optimization looks like:
+;;
+;;		pld b,var@pcrel@got(0),1
+;;	100:
+;;		...
+;;		.reloc 100b-8,R_PPC64_PCREL_OPT,0
+;;		lwz r,0(b)
+;;
+;; If 'var' is an external variable defined in another module in the main
+;; program, and the code is being linked for the main program, then the
+;; linker can optimize this to:
+;;
+;;		plwz r,var(0),1
+;;	100:
+;;		...
+;;		nop
+;;
+;; If either the variable or the code being linked is defined in a shared
+;; library, then the linker puts the address in the GOT area, and the pld will
+;; load up the pointer, and then that pointer is used for the load or store.
+;; If there is more than one reference to the GOT pointer, the compiler will
+;; not do this optimization, and use the GOT pointer normally.
+;;
+;; Having the label after the pld instruction and using label-8 in the .reloc
+;; addresses the prefixed instruction properly.  If we put the label before the
+;; pld instruction, then the relocation might point to the NOP that is
+;; generated if the prefixed instruction is not aligned.
+;;
+;; We need to rewrite the normal GOT load operation before register allocation
+;; to include setting the eventual destination register for loads, or referring
+;; to the value being stored for store operations so that the proper register
+;; lifetime is set in case the optimization is done and the pld/lwz is
+;; converted to plwz/nop.
+
+(define_mode_iterator PO [QI HI SI DI SF DF
+			  V16QI V8HI V4SI V4SF V2DI V2DF V1TI KF
+			  (TF "FLOAT128_IEEE_P (TFmode)")])
+
+;; Vector types for pcrel optimization
+(define_mode_iterator POV [V16QI V8HI V4SI V4SF V2DI V2DF V1TI KF
+			   (TF "FLOAT128_IEEE_P (TFmode)")])
+
+;; Define the constraints for each mode for pcrel_opt.  The order of the
+;; constraints should have the most natural register class first.
+(define_mode_attr PO_constraint [(QI    "r,d,v")
+				 (HI    "r,d,v")
+				 (SI    "r,d,v")
+				 (DI    "r,d,v")
+				 (SF    "d,v,r")
+				 (DF    "d,v,r")
+				 (V16QI "wa,wn,wn")
+				 (V8HI  "wa,wn,wn")
+				 (V4SI  "wa,wn,wn")
+				 (V4SF  "wa,wn,wn")
+				 (V2DI  "wa,wn,wn")
+				 (V2DF  "wa,wn,wn")
+				 (V1TI  "wa,wn,wn")
+				 (KF    "wa,wn,wn")
+				 (TF    "wa,wn,wn")])
+
+;; Combiner pattern that combines the load of the GOT along with the load.  The
+;; first split pass before register allocation will split this into the load of
+;; the GOT that indicates the resultant value may be created if the PCREL_OPT
+;; relocation is done.
+;;
+;; The (set (match_dup 0)
+;;	    (unspec:<MODE> [(const_int 0)] UNSPEC_PCREL_LD))
+;;
+;; Is to signal to the register allocator that the destination register may be
+;; set by the GOT operation (if the linker does the optimization).
+;;
+;; We need to set the "cost" explicitly so that the instruction length is not
+;; used.  We return the same cost as a normal load (4 if we are not optimizing
+;; for speed, 8 if we are optimizing for speed).
+
+(define_insn_and_split "*mov<mode>_pcrel_opt_load"
+  [(set (match_operand:PO 0 "gpc_reg_operand")
+	(match_operand:PO 1 "pcrel_ext_mem_operand"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(parallel [(set (match_dup 2)
+		   (match_dup 3))
+	      (set (match_dup 0)
+		   (unspec:<MODE> [(const_int 0)] UNSPEC_PCREL_LD))
+	      (use (const_int 0))])
+   (parallel [(set (match_dup 0)
+		   (match_dup 4))
+	      (use (match_dup 0))
+	      (use (const_int 0))])]
+{
+  rtx mem = operands[1];
+  rtx got = gen_reg_rtx (DImode);
+
+  operands[2] = got;
+  operands[3] = XEXP (mem, 0);
+  operands[4] = change_address (mem, <MODE>mode, got);
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "16")
+   (set (attr "cost")
+	(if_then_else (match_test "optimize_function_for_speed_p (cfun)")
+		      (const_string "8")
+		      (const_string "4")))
+   (set_attr "prefixed" "yes")])
+
+;; Zero extend combiner patterns
+(define_insn_and_split "*mov<mode>_pcrel_opt_zero_extend"
+  [(set (match_operand:DI 0 "gpc_reg_operand")
+	(zero_extend:DI
+	 (match_operand:QHSI 1 "pcrel_ext_mem_operand")))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(parallel [(set (match_dup 2)
+		   (match_dup 3))
+	      (set (match_dup 0)
+		   (unspec:DI [(const_int 0)] UNSPEC_PCREL_LD))
+	      (use (const_int 0))])
+   (parallel [(set (match_dup 0)
+		   (zero_extend:DI
+		    (match_dup 4)))
+	      (use (match_dup 0))
+	      (use (const_int 0))])]
+{
+  rtx mem = operands[1];
+  rtx got = gen_reg_rtx (DImode);
+
+  operands[2] = got;
+  operands[3] = XEXP (mem, 0);
+  operands[4] = change_address (mem, <MODE>mode, got);
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "16")
+   (set (attr "cost")
+	(if_then_else (match_test "optimize_function_for_speed_p (cfun)")
+		      (const_string "8")
+		      (const_string "4")))
+   (set_attr "prefixed" "yes")])
+
+;; Sign extend combiner patterns
+(define_insn_and_split "*mov<mode>_pcrel_opt_sign_extend"
+  [(set (match_operand:DI 0 "gpc_reg_operand")
+	(sign_extend:DI
+	 (match_operand:HSI 1 "pcrel_ext_mem_operand")))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(parallel [(set (match_dup 2)
+		   (match_dup 3))
+	      (set (match_dup 0)
+		   (unspec:DI [(const_int 0)] UNSPEC_PCREL_LD))
+	      (use (const_int 0))])
+   (parallel [(set (match_dup 0)
+		   (sign_extend:DI
+		    (match_dup 4)))
+	      (use (match_dup 0))
+	      (use (const_int 0))])]
+{
+  rtx mem = operands[1];
+  rtx got = gen_reg_rtx (DImode);
+
+  operands[2] = got;
+  operands[3] = XEXP (mem, 0);
+  operands[4] = change_address (mem, <MODE>mode, got);
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "16")
+   (set (attr "cost")
+	(if_then_else (match_test "optimize_function_for_speed_p (cfun)")
+		      (const_string "8")
+		      (const_string "4")))
+   (set_attr "prefixed" "yes")])
+
+;; Float extend combiner pattern
+(define_insn_and_split "*movdf_pcrel_opt_float_extend"
+  [(set (match_operand:DF 0 "gpc_reg_operand")
+	(float_extend:DF
+	 (match_operand:SF 1 "pcrel_ext_mem_operand")))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(parallel [(set (match_dup 2)
+		   (match_dup 3))
+	      (set (match_dup 0)
+		   (unspec:DF [(const_int 0)] UNSPEC_PCREL_LD))
+	      (use (const_int 0))])
+   (parallel [(set (match_dup 0)
+		   (float_extend:DF
+		    (match_dup 4)))
+	      (use (match_dup 0))
+	      (use (const_int 0))])]
+{
+  rtx mem = operands[1];
+  rtx got = gen_reg_rtx (DImode);
+
+  operands[2] = got;
+  operands[3] = XEXP (mem, 0);
+  operands[4] = change_address (mem, SFmode, got);
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "16")
+   (set (attr "cost")
+	(if_then_else (match_test "optimize_function_for_speed_p (cfun)")
+		      (const_string "8")
+		      (const_string "4")))
+   (set_attr "prefixed" "yes")])
+
+;; Patterns to load up the GOT address that may be changed into the load of the
+;; actual variable.
+(define_insn "*mov<mode>_pcrel_opt_load_got"
+  [(set (match_operand:DI 0 "base_reg_operand" "=b,b,b")
+	(match_operand:DI 1 "pcrel_ext_address"))
+   (set (match_operand:PO 2 "gpc_reg_operand" "=<PO_constraint>")
+	(unspec:PO [(const_int 0)] UNSPEC_PCREL_LD))
+   (use (match_operand:DI 3 "const_int_operand" "n,n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+{
+  return (INTVAL (operands[3])) ? "ld %0,%a1\n.Lpcrel%3:" : "ld %0,%a1";
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "12")
+   (set_attr "pcrel_opt" "load_got")
+   (set (attr "cost")
+	(if_then_else (match_test "optimize_function_for_speed_p (cfun)")
+		      (const_string "8")
+		      (const_string "4")))
+   (set_attr "prefixed" "yes")])
+
+;; The secondary load insns that use the GOT pointer and may become a NOP.
+(define_insn "*mov<mode>_pcrel_opt_load_mem"
+  [(set (match_operand:QHI 0 "gpc_reg_operand" "+r,wa")
+	(match_operand:QHI 1 "one_reg_memory_operand" "Q,Q"))
+   (use (match_operand:QHI 2 "gpc_reg_operand" "0,0"))
+   (use (match_operand:DI 3 "const_int_operand" "n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+   l<wd>z %0,%1
+   lxsi<wd>zx %x0,%y1"
+  [(set_attr "type" "load,fpload")
+   (set_attr "pcrel_opt" "load,no")
+   (set_attr "prefixed" "no")])
+
+(define_insn "*movsi_pcrel_opt_load_mem"
+  [(set (match_operand:SI 0 "gpc_reg_operand" "+r,d,v")
+	(match_operand:SI 1 "one_reg_memory_operand" "Q,Q,Q"))
+   (use (match_operand:SI 2 "gpc_reg_operand" "0,0,0"))
+   (use (match_operand:DI 3 "const_int_operand" "n,n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+   lwz %0,%1
+   lfiwzx %0,%y1
+   lxsiwzx %x0,%y1"
+  [(set_attr "type" "load,fpload,fpload")
+   (set_attr "pcrel_opt" "load,no,no")
+   (set_attr "prefixed" "no")])
+
+(define_insn "*movdi_pcrel_opt_load_mem"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "+r,d,v")
+	(match_operand:DI 1 "one_reg_memory_operand" "Q,Q,Q"))
+   (use (match_operand:DI 2 "gpc_reg_operand" "0,0,0"))
+   (use (match_operand:DI 3 "const_int_operand" "n,n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+   ld %0,%1
+   lfd %0,%1
+   lxsd %0,%1"
+  [(set_attr "type" "load,fpload,fpload")
+   (set_attr "pcrel_opt" "load")
+   (set_attr "prefixed" "no")])
+
+(define_insn "*movsf_pcrel_opt_load_mem"
+  [(set (match_operand:SF 0 "gpc_reg_operand" "+d,v,r")
+	(match_operand:SF 1 "one_reg_memory_operand" "Q,Q,Q"))
+   (use (match_operand:SF 2 "gpc_reg_operand" "0,0,0"))
+   (use (match_operand:DI 3 "const_int_operand" "n,n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+   lfs %0,%1
+   lxssp %0,%1
+   lwz %0,%1"
+  [(set_attr "type" "fpload,fpload,load")
+   (set_attr "pcrel_opt" "load")
+   (set_attr "prefixed" "no")])
+
+(define_insn "*movdf_pcrel_opt_load_mem"
+  [(set (match_operand:DF 0 "gpc_reg_operand" "+d,v,r")
+	(match_operand:DF 1 "one_reg_memory_operand" "Q,Q,Q"))
+   (use (match_operand:DF 2 "gpc_reg_operand" "0,0,0"))
+   (use (match_operand:DI 3 "const_int_operand" "n,n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+   lfd %0,%1
+   lxsd %0,%1
+   ld %0,%1"
+  [(set_attr "type" "fpload,fpload,load")
+   (set_attr "pcrel_opt" "load")
+   (set_attr "prefixed" "no")])
+
+(define_insn "*mov<mode>_pcrel_opt_load_mem"
+  [(set (match_operand:POV 0 "gpc_reg_operand" "+wa")
+	(match_operand:POV 1 "one_reg_memory_operand" "Q"))
+   (use (match_operand:POV 2 "gpc_reg_operand" "0"))
+   (use (match_operand:DI 3 "const_int_operand" "n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "lxv %x0,%1"
+  [(set_attr "type" "vecload")
+   (set_attr "pcrel_opt" "load")
+   (set_attr "prefixed" "no")])
+
+;; Zero extend insns
+(define_insn "*mov<mode>_pcrel_opt_load_zero_extend2"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "+r,wa")
+	(zero_extend:DI
+	 (match_operand:QHI 1 "one_reg_memory_operand" "Q,Q")))
+   (use (match_operand:DI 2 "gpc_reg_operand" "0,0"))
+   (use (match_operand:DI 3 "const_int_operand" "n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+   l<wd>z %0,%1
+   lxsi<wd>zx %x0,%y1"
+  [(set_attr "type" "load,fpload")
+   (set_attr "pcrel_opt" "load,no")
+   (set_attr "prefixed" "no")])
+
+(define_insn "*movsi_pcrel_opt_load_zero_extend2"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "+r,d,v")
+	(zero_extend:DI
+	 (match_operand:SI 1 "one_reg_memory_operand" "Q,Q,Q")))
+   (use (match_operand:DI 2 "gpc_reg_operand" "0,0,0"))
+   (use (match_operand:DI 3 "const_int_operand" "n,n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+   lwz %0,%1
+   lfiwzx %0,%y1
+   lxsiwzx %x0,%y1"
+  [(set_attr "type" "load,fpload,fpload")
+   (set_attr "pcrel_opt" "load,no,no")
+   (set_attr "prefixed" "no")])
+
+;; Sign extend insns
+(define_insn "*movsi_pcrel_opt_load_sign_extend2"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "+r,d,v")
+	(sign_extend:DI
+	 (match_operand:SI 1 "one_reg_memory_operand" "Q,Q,Q")))
+   (use (match_operand:DI 2 "gpc_reg_operand" "0,0,0"))
+   (use (match_operand:DI 3 "const_int_operand" "n,n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+   lwa %0,%1
+   lfiwax %0,%y1
+   lxsiwax %x0,%y1"
+  [(set_attr "type" "load,fpload,fpload")
+   (set_attr "pcrel_opt" "load,no,no")
+   (set_attr "prefixed" "no")])
+
+(define_insn_and_split "*movhi_pcrel_opt_load_sign_extend2"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "+r,v")
+	(sign_extend:DI
+	 (match_operand:HI 1 "one_reg_memory_operand" "Q,Q")))
+   (use (match_operand:DI 2 "gpc_reg_operand" "0,0"))
+   (use (match_operand:DI 3 "const_int_operand" "n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+   lha %0,%1
+   #"
+  "&& reload_completed && altivec_register_operand (operands[0], HImode)"
+  [(parallel [(set (match_dup 4)
+		   (match_dup 1))
+	      (use (match_dup 4))
+	      (use (const_int 0))])
+   (set (match_dup 0)
+	(sign_extend:DI
+	 (match_dup 4)))]
+{
+  operands[4] = gen_rtx_REG (HImode, REGNO (operands[0]));
+}
+  [(set_attr "type" "load,fpload")
+   (set_attr "pcrel_opt" "load,no")
+   (set_attr "length" "4,8")
+   (set_attr "prefixed" "no")])
+
+;; Floating point extend insn
+(define_insn "*movsf_pcrel_opt_load_float_extend2"
+  [(set (match_operand:DF 0 "gpc_reg_operand" "+d,v")
+	(float_extend:DF
+	 (match_operand:SF 1 "one_reg_memory_operand" "Q,Q")))
+   (use (match_operand:DF 2 "gpc_reg_operand" "0,0"))
+   (use (match_operand:DI 3 "const_int_operand" "n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+   lfs %0,%1
+   lxssp %0,%1"
+  [(set_attr "type" "fpload")
+   (set_attr "pcrel_opt" "load")
+   (set_attr "prefixed" "no")])
+
+;; Store combiner insns that merge together loading up the address of the
+;; external variable and doing the store.  This is split in the first split
+;; pass before register allocation.
+;;
+;; We need to set the "cost" explicitly so that the instruction length is not
+;; used.  We return the same cost as a normal store (4).
+(define_insn_and_split "*mov<mode>_pcrel_opt_store"
+  [(set (match_operand:PO 0 "pcrel_ext_mem_operand")
+	(match_operand:PO 1 "gpc_reg_operand"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 2)
+	(unspec:DI [(match_dup 1)
+		    (match_dup 3)
+		    (const_int 0)] UNSPEC_PCREL_ST))
+   (parallel [(set (match_dup 4)
+		   (match_dup 1))
+	      (use (const_int 0))])]
+{
+  rtx mem = operands[0];
+  rtx addr = XEXP (mem, 0);
+  rtx got = gen_reg_rtx (DImode);
+
+  operands[2] = got;
+  operands[3] = addr;
+  operands[4] = change_address (mem, <MODE>mode, got);
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "20")
+   (set_attr "pcrel_opt" "store_got")
+   (set_attr "cost" "4")
+   (set_attr "prefixed" "yes")])
+
+;; Load of the GOT address for a store operation that may be converted into a
+;; direct store.
+(define_insn "*mov<mode>_pcrel_opt_store_got"
+  [(set (match_operand:DI 0 "base_reg_operand" "=&b,&b,&b")
+	(unspec:DI [(match_operand:PO 1 "gpc_reg_operand" "<PO_constraint>")
+		    (match_operand:DI 2 "pcrel_ext_address")
+		    (match_operand:DI 3 "const_int_operand" "n,n,n")]
+		   UNSPEC_PCREL_ST))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+{
+  return (INTVAL (operands[3])) ? "ld %0,%a2\n.Lpcrel%3:" : "ld %0,%a2";
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "12")
+   (set_attr "pcrel_opt" "store_got")
+   (set_attr "cost" "4")
+   (set_attr "prefixed" "yes")])
+
+;; Secondary store instruction that uses the GOT pointer, and may be optimized
+;; into a NOP instruction.
+(define_insn "*mov<mode>_pcrel_opt_store_mem"
+  [(set (match_operand:QHI 0 "one_reg_memory_operand" "=Q,Q")
+	(match_operand:QHI 1 "gpc_reg_operand" "r,wa"))
+   (use (match_operand:DI 2 "const_int_operand" "n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+  st<wd> %1,%0
+  stxsi<wd>x %x1,%y0"
+  [(set_attr "type" "store,fpstore")
+   (set_attr "pcrel_opt" "store,no")
+   (set_attr "prefixed" "no")])
+
+(define_insn "*movsi_pcrel_opt_store_mem"
+  [(set (match_operand:SI 0 "one_reg_memory_operand" "=Q,Q,Q")
+	(match_operand:SI 1 "gpc_reg_operand" "r,d,v"))
+   (use (match_operand:DI 2 "const_int_operand" "n,n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+  stw %1,%0
+  stfiwx %1,%y0
+  stxsiwx %x1,%y0
+  [(set_attr "type" "store,fpstore,fpstore")
+   (set_attr "pcrel_opt" "store,no,no")
+   (set_attr "prefixed" "no")])
+
+(define_insn "*movdi_pcrel_opt_store_mem"
+  [(set (match_operand:DI 0 "one_reg_memory_operand" "=Q,Q,Q")
+	(match_operand:DI 1 "gpc_reg_operand" "r,d,v"))
+   (use (match_operand:DI 2 "const_int_operand" "n,n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+  std %1,%0
+  stfd %1,%0
+  stxsd %1,%0"
+  [(set_attr "type" "store,fpstore,fpstore")
+   (set_attr "pcrel_opt" "store")
+   (set_attr "prefixed" "no")])
+
+(define_insn "*movsf_pcrel_opt_store_mem"
+  [(set (match_operand:SF 0 "one_reg_memory_operand" "=Q,Q,Q")
+	(match_operand:SF 1 "gpc_reg_operand" "d,v,r"))
+   (use (match_operand:DI 2 "const_int_operand" "n,n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+  stfs %1,%0
+  stxssp %1,%0
+  stw %1,%0"
+  [(set_attr "type" "fpstore,fpstore,store")
+   (set_attr "pcrel_opt" "store")
+   (set_attr "prefixed" "no")])
+
+(define_insn "*movdf_pcrel_opt_store_mem"
+  [(set (match_operand:DF 0 "one_reg_memory_operand" "=Q,Q,Q")
+	(match_operand:DF 1 "gpc_reg_operand" "d,v,r"))
+   (use (match_operand:DI 2 "const_int_operand" "n,n,n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+  stfd %1,%0
+  stxsd %1,%0
+  std %1,%0"
+  [(set_attr "type" "fpstore,fpstore,store")
+   (set_attr "pcrel_opt" "store")
+   (set_attr "prefixed" "no")])
+
+(define_insn "*mov<mode>_pcrel_opt_store_mem"
+  [(set (match_operand:POV 0 "one_reg_memory_operand" "=Q")
+	(match_operand:POV 1 "gpc_reg_operand" "wa"))
+   (use (match_operand:DI 2 "const_int_operand" "n"))]
+  "TARGET_PCREL && TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "stxv %x1,%0"
+  [(set_attr "type" "vecstore")
+   (set_attr "pcrel_opt" "store")
+   (set_attr "prefixed" "no")])
Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(revision 274876)
+++ gcc/config/rs6000/predicates.md	(working copy)
@@ -775,6 +775,13 @@  (define_predicate "indexed_or_indirect_o
   return indexed_or_indirect_address (op, mode);
 })
 
+;; Return 1 if the operand uses a single register for the address.
+(define_predicate "one_reg_memory_operand"
+  (match_code "mem")
+{
+  return REG_P (XEXP (op, 0));
+})
+
 ;; Like indexed_or_indirect_operand, but also allow a GPR register if direct
 ;; moves are supported.
 (define_predicate "reg_or_indexed_operand"
@@ -1695,6 +1702,15 @@  (define_predicate "pcrel_ext_address"
   return (SYMBOL_REF_P (op) && !SYMBOL_REF_LOCAL_P (op));
 })
 
+;; Return 1 if op is a memory operand to an external variable when we
+;; support pc-relative addressing and the PCREL_OPT relocation to
+;; optimize references to it.
+(define_predicate "pcrel_ext_mem_operand"
+  (match_code "mem")
+{
+  return pcrel_ext_address (XEXP (op, 0), Pmode);
+})
+
 ;; Return 1 if op is a memory operand that is not prefixed.
 (define_predicate "non_prefixed_mem_operand"
   (match_code "mem")
Index: gcc/config/rs6000/rs6000-cpus.def
===================================================================
--- gcc/config/rs6000/rs6000-cpus.def	(revision 274875)
+++ gcc/config/rs6000/rs6000-cpus.def	(working copy)
@@ -86,6 +86,7 @@ 
    prefixed addressing, and we want to clear all of the addressing bits
    on targets that cannot support prefixed/pcrel addressing.  */
 #define ADDRESSING_FUTURE_MASKS	(OPTION_MASK_PCREL			\
+				 | OPTION_MASK_PCREL_OPT		\
 				 | OPTION_MASK_PREFIXED_ADDR)
 
 /* Flags that need to be turned off if -mno-future.  */
@@ -144,6 +145,7 @@ 
 				 | OPTION_MASK_P9_MISC			\
 				 | OPTION_MASK_P9_VECTOR		\
 				 | OPTION_MASK_PCREL			\
+				 | OPTION_MASK_PCREL_OPT		\
 				 | OPTION_MASK_POPCNTB			\
 				 | OPTION_MASK_POPCNTD			\
 				 | OPTION_MASK_POWERPC64		\
Index: gcc/config/rs6000/rs6000-passes.def
===================================================================
--- gcc/config/rs6000/rs6000-passes.def	(revision 274864)
+++ gcc/config/rs6000/rs6000-passes.def	(working copy)
@@ -25,3 +25,12 @@  along with GCC; see the file COPYING3.
  */
 
   INSERT_PASS_BEFORE (pass_cse, 1, pass_analyze_swaps);
+
+/* The pcrel_opt pass must run immediately before the final pass.  This pass
+   combines references to external pc-relative variables with their use.  The
+   GOT pointer that is loaded must have exactly one use for the optimization
+   to be done.  Otherwise we load the address (via PADDI if the label is local,
+   or via PLD from the GOT if it is defined in another module) and use it as a
+   base pointer.  */
+
+  INSERT_PASS_BEFORE (pass_final, 1, pass_pcrel_opt);
Index: gcc/config/rs6000/rs6000-pcrel.c
===================================================================
--- gcc/config/rs6000/rs6000-pcrel.c	(revision 274877)
+++ gcc/config/rs6000/rs6000-pcrel.c	(working copy)
@@ -0,0 +1,463 @@ 
+/* Subroutines used to support the pc-relative linker optimization.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file implements an RTL pass that looks for pc-relative loads of the
+   address of an external variable using the PCREL_GOT relocation and a single
+   load/store that uses that GOT pointer.  If one is found, we create the
+   PCREL_OPT relocation to possibly convert:
+
+	pld b,var@pcrel@got(0),1
+
+	# possibly other instructions that do not use the base register 'b' or
+        # the result register 'r'.
+
+	lwz r,0(b)
+
+   into:
+
+	plwz r,var@pcrel(0),1
+
+	# possibly other instructions that do not use the base register 'b' or
+        # the result register 'r'.
+
+	nop
+
+   If the variable is not defined in the main program or the code using it is
+   not in the main program, the linker puts the address in the .got section and
+   the code remains:
+
+	.section .got
+	.Lvar_got:	.dword var
+
+	.section .text
+	pld b,.Lvar_got@pcrel(0),1
+
+	# possibly other instructions that do not use the base register 'b' or
+        # the result register 'r'.
+
+	lwz r,0(b)
+
+   We only look for a single usage in the basic block where the GOT pointer is
+   loaded.  Multiple uses or references in another basic block will force us to
+   not use the PCREL_OPT relocation.
+
+   The support for emitting the leading 'p' in front of prefixed instructions,
+   and for emitting the R_PPC64_PCREL_OPT relocations, is in rs6000.c
+   (rs6000_final_prescan_insn and rs6000_asm_output_opcode).  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "rtl.h"
+#include "tree.h"
+#include "memmodel.h"
+#include "df.h"
+#include "tm_p.h"
+#include "ira.h"
+#include "print-tree.h"
+#include "varasm.h"
+#include "explow.h"
+#include "expr.h"
+#include "output.h"
+#include "tree-pass.h"
+#include "rtx-vector-builder.h"
+#include "print-rtl.h"
+#include "insn-attr.h"
+
+
+// Optimize pc-relative references
+const pass_data pass_data_pcrel =
+{
+  RTL_PASS,			// type
+  "pcrel",			// name
+  OPTGROUP_NONE,		// optinfo_flags
+  TV_NONE,			// tv_id
+  0,				// properties_required
+  0,				// properties_provided
+  0,				// properties_destroyed
+  0,				// todo_flags_start
+  TODO_df_finish,		// todo_flags_finish
+};
+
+// Pass data structures
+class pcrel : public rtl_opt_pass
+{
+private:
+  // Function to optimize pc relative loads/stores
+  unsigned int do_pcrel_opt (function *);
+
+  // A GOT pointer used for a load
+  void load_got (rtx_insn *);
+
+  // A load insn that uses the GOT pointer
+  void load_insn (rtx_insn *);
+
+  // A GOT pointer used for a store
+  void store_got (rtx_insn *);
+
+  // A store insn that uses the GOT pointer
+  void store_insn (rtx_insn *);
+
+  // Record the number of loads and stores optimized
+  unsigned long num_got_loads;
+  unsigned long num_got_stores;
+  unsigned long num_loads;
+  unsigned long num_stores;
+  unsigned long num_opt_loads;
+  unsigned long num_opt_stores;
+
+  // We record the GOT insn for each register that sets a GOT for a load or a
+  // store instruction.
+  rtx_insn *got_reg[32];
+
+public:
+  pcrel (gcc::context *ctxt)
+  : rtl_opt_pass (pass_data_pcrel, ctxt),
+    num_got_loads (0),
+    num_got_stores (0),
+    num_loads (0),
+    num_stores (0),
+    num_opt_loads (0),
+    num_opt_stores (0)
+  {}
+
+  ~pcrel (void)
+  {}
+
+  // opt_pass methods:
+  virtual bool gate (function *)
+  {
+    return TARGET_PCREL && TARGET_PCREL_OPT && optimize;
+  }
+
+  virtual unsigned int execute (function *fun)
+  {
+    return do_pcrel_opt (fun);
+  }
+
+  opt_pass *clone ()
+  {
+    return new pcrel (m_ctxt);
+  }
+};
+
+
+/* Return a marker to create the backward-pointing label that links the load or
+   store to the insn that loads the address of an external label with
+   PCREL_GOT.  This allows us to create the necessary R_PPC64_PCREL_OPT
+   relocation to link the two instructions.  */
+
+static rtx
+pcrel_marker (void)
+{
+  static unsigned int label_number = 0;
+
+  label_number++;
+  return GEN_INT (label_number);
+}
+
+
+// Record the PCREL_OPT load GOT insn in got_reg, indexed by the register
+// number of the GOT pointer that it loads.
+//
+// The PCREL_OPT LOAD_GOT insn looks like:
+//
+//	(parallel [(set (base) (addr))
+//		   (set (reg)  (unspec [(const_int 0)] UNSPEC_PCREL_LD))
+//		   (use (marker))])
+//
+// The base register is the GOT address, and the marker is a numeric label that
+// is created in this pass if the only use of the GOT load pointer is for a
+// single load.
+
+void
+pcrel::load_got (rtx_insn *insn)
+{
+  rtx pattern = PATTERN (insn);
+  rtx set = XVECEXP (pattern, 0, 0);
+  int got = REGNO (SET_DEST (set));
+
+  gcc_assert (IN_RANGE (got, FIRST_GPR_REGNO+1, LAST_GPR_REGNO));
+  got_reg[got] = insn;
+  num_got_loads++;
+}
+
+// See if the use of this load of a GOT pointer is the only usage.  If so,
+// allocate a marker to create a label.
+//
+// The PCREL_OPT LOAD insn looks like:
+//
+//	(parallel [(set (reg) (mem))
+//		   (use (reg))
+//		   (use (marker))])
+//
+// Between the reg and the memory might be a SIGN_EXTEND, ZERO_EXTEND, or
+// FLOAT_EXTEND:
+//
+//	(parallel [(set (reg) (sign_extend (mem)))
+//		   (use (reg))
+//		   (use (marker))])
+
+void
+pcrel::load_insn (rtx_insn *insn)
+{
+  num_loads++;
+
+  /* If the optimizer has changed the load instruction, just use the GOT
+     pointer as an address.  */
+  rtx pattern = PATTERN (insn);
+  if (GET_CODE (pattern) != PARALLEL || XVECLEN (pattern, 0) != 3)
+    return;
+
+  rtx set = XVECEXP (pattern, 0, 0);
+  if (GET_CODE (set) != SET
+      || GET_CODE (XVECEXP (pattern, 0, 1)) != USE
+      || GET_CODE (XVECEXP (pattern, 0, 2)) != USE)
+    return;
+
+  rtx dest = SET_DEST (set);
+  rtx src = SET_SRC (set);
+
+  if (!rtx_equal_p (dest, XEXP (XVECEXP (pattern, 0, 1), 0)))
+    return;
+
+  if (GET_CODE (src) == SIGN_EXTEND || GET_CODE (src) == ZERO_EXTEND
+      || GET_CODE (src) == FLOAT_EXTEND)
+    src = XEXP (src, 0);
+
+  if (!MEM_P (src))
+    return;
+
+  rtx addr = XEXP (src, 0);
+  if (!REG_P (addr))
+    return;
+
+  int r = REGNO (addr);
+  if (!IN_RANGE (r, FIRST_GPR_REGNO+1, LAST_GPR_REGNO))
+    return;
+
+  rtx_insn *got_insn = got_reg[r];
+
+  // See if this is the only reference, and there is a set of the GOT pointer
+  // previously in the same basic block.  If this is the only reference,
+  // optimize it.
+  if (got_insn
+      && get_attr_pcrel_opt (got_insn) == PCREL_OPT_LOAD_GOT
+      && !reg_used_between_p (addr, got_insn, insn)
+      && (find_reg_note (insn, REG_DEAD, addr) || rtx_equal_p (dest, addr)))
+    {
+      rtx marker = pcrel_marker ();
+      rtx got_use = XVECEXP (PATTERN (got_insn), 0, 2);
+      rtx insn_use = XVECEXP (pattern, 0, 2);
+
+      gcc_checking_assert (rtx_equal_p (XEXP (got_use, 0), const0_rtx));
+      gcc_checking_assert (rtx_equal_p (XEXP (insn_use, 0), const0_rtx));
+
+      XEXP (got_use, 0) = marker;
+      XEXP (insn_use, 0) = marker;
+      num_opt_loads++;
+    }
+
+  // Forget the GOT now that we've used it.
+  got_reg[r] = (rtx_insn *)0;
+}
+
+// Record the PCREL_OPT store GOT insn in got_reg, indexed by the register
+// number of the GOT pointer that it loads.
+//
+// The PCREL_OPT STORE_GOT insn looks like:
+//
+//	(set (base)
+//	     (unspec:DI [(src)
+//			 (addr)
+//			 (marker)] UNSPEC_PCREL_ST))
+//
+// The base register is the GOT address, and the marker is a numeric label that
+// is created in this pass or 0 to indicate there are other uses of the GOT
+// pointer.
+
+void
+pcrel::store_got (rtx_insn *insn)
+{
+  rtx pattern = PATTERN (insn);
+  int got = REGNO (SET_DEST (pattern));
+
+  gcc_checking_assert (IN_RANGE (got, FIRST_GPR_REGNO+1, LAST_GPR_REGNO));
+  got_reg[got] = insn;
+  num_got_stores++;
+}
+
+// See if the use of this store using a GOT pointer is the only usage.  If so,
+// allocate a marker to create a label.
+//
+// The PCREL_OPT STORE insn looks like:
+//
+//	(parallel [(set (mem) (reg))
+//		   (use (marker))])
+
+void
+pcrel::store_insn (rtx_insn *insn)
+{
+  num_stores++;
+
+  /* If the optimizer has changed the store instruction, just use the GOT
+     pointer as an address.  */
+  rtx pattern = PATTERN (insn);
+  if (GET_CODE (pattern) != PARALLEL || XVECLEN (pattern, 0) != 2)
+    return;
+
+  rtx set = XVECEXP (pattern, 0, 0);
+  if (GET_CODE (set) != SET || GET_CODE (XVECEXP (pattern, 0, 1)) != USE)
+    return;
+
+  rtx dest = SET_DEST (set);
+
+  if (!MEM_P (dest))
+    return;
+
+  rtx addr = XEXP (dest, 0);
+  if (!REG_P (addr))
+    return;
+
+  int r = REGNO (addr);
+  if (!IN_RANGE (r, FIRST_GPR_REGNO+1, LAST_GPR_REGNO))
+    return;
+
+  rtx_insn *got_insn = got_reg[r];
+
+  // See if this is the only reference, and there is a GOT pointer previously.
+  // If this is the only reference, optimize it.
+  if (got_insn
+      && get_attr_pcrel_opt (got_insn) == PCREL_OPT_STORE_GOT
+      && !reg_used_between_p (addr, got_insn, insn)
+      && find_reg_note (insn, REG_DEAD, addr))
+    {
+      rtx marker = pcrel_marker ();
+      rtx got_src = SET_SRC (PATTERN (got_insn));
+      rtx insn_use = XVECEXP (pattern, 0, 1);
+
+      gcc_checking_assert (rtx_equal_p (XVECEXP (got_src, 0, 2), const0_rtx));
+      gcc_checking_assert (rtx_equal_p (XEXP (insn_use, 0), const0_rtx));
+
+      XVECEXP (got_src, 0, 2) = marker;
+      XEXP (insn_use, 0) = marker;
+      num_opt_stores++;
+    }
+
+  // Forget the GOT now that we've used it.
+  got_reg[r] = (rtx_insn *)0;
+}
+
+// Optimize pcrel external variable references
+
+unsigned int
+pcrel::do_pcrel_opt (function *fun)
+{
+  basic_block bb;
+  rtx_insn *insn, *curr_insn = 0;
+
+  // Dataflow analysis for use-def chains.
+  df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
+  df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
+  df_analyze ();
+  df_set_flags (DF_DEFER_INSN_RESCAN | DF_LR_RUN_DCE);
+
+  // Look at each basic block to see if there is a load of an external
+  // variable's GOT address, and a single load/store using that GOT address.
+  FOR_ALL_BB_FN (bb, fun)
+    {
+      bool clear_got_p = true;
+
+      FOR_BB_INSNS_SAFE (bb, insn, curr_insn)
+	{
+	  if (clear_got_p)
+	    {
+	      memset ((void *) &got_reg[0], 0, sizeof (got_reg));
+	      clear_got_p = false;
+	    }
+
+	  if (NONJUMP_INSN_P (insn))
+	    {
+	      rtx pattern = PATTERN (insn);
+	      if (GET_CODE (pattern) == SET || GET_CODE (pattern) == PARALLEL)
+		{
+		  switch (get_attr_pcrel_opt (insn))
+		    {
+		    case PCREL_OPT_NO:
+		      break;
+
+		    case PCREL_OPT_LOAD_GOT:
+		      load_got (insn);
+		      break;
+
+		    case PCREL_OPT_LOAD:
+		      load_insn (insn);
+		      break;
+
+		    case PCREL_OPT_STORE_GOT:
+		      store_got (insn);
+		      break;
+
+		    case PCREL_OPT_STORE:
+		      store_insn (insn);
+		      break;
+
+		    default:
+		      gcc_unreachable ();
+		    }
+		}
+	    }
+
+	  /* Do not pair a GOT load that appears before a label, jump, or call
+	     with a dependent load/store that appears after it.  */
+	  else if (JUMP_P (insn) || CALL_P (insn) || LABEL_P (insn))
+	    clear_got_p = true;
+	}
+    }
+
+  // Rebuild ud chains.
+  df_remove_problem (df_chain);
+  df_process_deferred_rescans ();
+  df_set_flags (DF_RD_PRUNE_DEAD_DEFS | DF_LR_RUN_DCE);
+  df_chain_add_problem (DF_UD_CHAIN);
+  df_analyze ();
+
+  if (dump_file)
+    {
+      fprintf (dump_file, "\npc-relative optimizations:\n");
+      fprintf (dump_file, "\tgot loads        = %lu\n", num_got_loads);
+      fprintf (dump_file, "\tpotential loads  = %lu\n", num_loads);
+      fprintf (dump_file, "\toptimized loads  = %lu\n", num_opt_loads);
+      fprintf (dump_file, "\tgot stores       = %lu\n", num_got_stores);
+      fprintf (dump_file, "\tpotential stores = %lu\n", num_stores);
+      fprintf (dump_file, "\toptimized stores = %lu\n\n", num_opt_stores);
+    }
+
+  return 0;
+}
+
+
+rtl_opt_pass *
+make_pass_pcrel_opt (gcc::context *ctxt)
+{
+  return new pcrel (ctxt);
+}
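The pairing logic in do_pcrel_opt, load_got and load_insn can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not GCC code: model_insn, the K_* kinds and pair_got_loads are hypothetical names, RTL is replaced by a flat array for one basic block, base_dies stands in for the REG_DEAD note (or the load overwriting its own base register), and the reg_used_between_p check is approximated by forgetting the GOT pointer when an intervening instruction mentions its register.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of the instructions in one basic block.  */
enum pcrel_kind { K_OTHER, K_LOAD_GOT, K_LOAD, K_BARRIER };

struct model_insn
{
  enum pcrel_kind kind;
  int reg;          /* GOT register set (K_LOAD_GOT), or register mentioned
		       (K_LOAD base, K_OTHER); -1 if none.  */
  int base_dies;    /* K_LOAD only: nonzero if the base register dies here.  */
  unsigned marker;  /* Filled in by the sketch; 0 means "not paired".  */
};

#define NUM_GPRS 32

/* Give each GOT load that has exactly one following load in the same
   region a shared nonzero marker.  Returns the number of pairs.  */
static unsigned
pair_got_loads (struct model_insn *insns, int n)
{
  static unsigned label_number = 0;	/* plays the role of pcrel_marker () */
  struct model_insn *got_reg[NUM_GPRS];
  unsigned paired = 0;

  memset (got_reg, 0, sizeof got_reg);
  for (int i = 0; i < n; i++)
    {
      struct model_insn *insn = &insns[i];
      int r = insn->reg;

      switch (insn->kind)
	{
	case K_BARRIER:
	  /* A label, jump, or call ends the region: forget every GOT load.  */
	  memset (got_reg, 0, sizeof got_reg);
	  break;

	case K_LOAD_GOT:
	  assert (r > 0 && r < NUM_GPRS);
	  got_reg[r] = insn;
	  break;

	case K_OTHER:
	  /* Any other mention of the GOT pointer rules out the pairing.  */
	  if (r > 0 && r < NUM_GPRS)
	    got_reg[r] = NULL;
	  break;

	case K_LOAD:
	  if (r > 0 && r < NUM_GPRS)
	    {
	      struct model_insn *got = got_reg[r];
	      if (got != NULL && insn->base_dies)
		{
		  got->marker = insn->marker = ++label_number;
		  paired++;
		}
	      /* Forget the GOT pointer once it has been used.  */
	      got_reg[r] = NULL;
	    }
	  break;
	}
    }
  return paired;
}
```

A GOT load followed by a single dying use of r9 gets a marker, while a GOT load in r10 separated from its use by a label or call does not. Stores follow the same pattern in the real pass through store_got and store_insn.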
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 274874)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -266,6 +266,7 @@  extern bool rs6000_linux_float_exception
 namespace gcc { class context; }
 class rtl_opt_pass;
 
+extern rtl_opt_pass *make_pass_pcrel_opt (gcc::context *);
 extern rtl_opt_pass *make_pass_analyze_swaps (gcc::context *);
 extern bool rs6000_sum_of_two_registers_p (const_rtx expr);
 extern bool rs6000_quadword_masked_address_p (const_rtx exp);
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 274875)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -4415,7 +4415,7 @@  rs6000_option_override_internal (bool gl
 	  if ((rs6000_isa_flags_explicit & OPTION_MASK_PCREL) != 0)
 	    error ("%qs requires %qs", "-mpcrel", "-mcmodel=medium");
 
-	  rs6000_isa_flags &= ~OPTION_MASK_PCREL;
+	  rs6000_isa_flags &= ~(OPTION_MASK_PCREL | OPTION_MASK_PCREL_OPT);
 	}
 
       /* Enable defaults if desired.  */
@@ -4429,7 +4429,11 @@  rs6000_option_override_internal (bool gl
 
 	  if (!explicit_pcrel && TARGET_PCREL_DEFAULT
 	      && TARGET_CMODEL == CMODEL_MEDIUM)
-	    rs6000_isa_flags |= OPTION_MASK_PCREL;
+	    {
+	      rs6000_isa_flags |= OPTION_MASK_PCREL;
+	      if ((rs6000_isa_flags_explicit & OPTION_MASK_PCREL_OPT) == 0)
+		rs6000_isa_flags |= OPTION_MASK_PCREL_OPT;
+	    }
 	}
     }
 
@@ -4453,6 +4457,15 @@  rs6000_option_override_internal (bool gl
       rs6000_isa_flags &= ~OPTION_MASK_PCREL;
     }
 
+  /* Check -mfuture debug switches.  */
+  if (!TARGET_PCREL && TARGET_PCREL_OPT)
+    {
+      if ((rs6000_isa_flags_explicit & OPTION_MASK_PCREL_OPT) != 0)
+	error ("%qs requires %qs", "-mpcrel-opt", "-mpcrel");
+
+      rs6000_isa_flags &= ~OPTION_MASK_PCREL_OPT;
+    }
+
   if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
     rs6000_print_isa_options (stderr, 0, "after subtarget", rs6000_isa_flags);
 
@@ -14244,13 +14257,40 @@  prefixed_paddi_p (rtx_insn *insn)
    instruction is printed out.  */
 static bool next_insn_prefixed_p;
 
+/* Number of the label that follows the GOT load (the label minus 8 is the
+   GOT load itself) that the R_PPC64_PCREL_OPT relocation refers back to.  */
+static unsigned int pcrel_opt_label_num;
+
 /* Define FINAL_PRESCAN_INSN if some processing needs to be done before
    outputting the assembler code.  On the PowerPC, we remember if the current
-   insn is a prefixed insn where we need to emit a 'p' before the insn.  */
+   insn is a prefixed insn where we need to emit a 'p' before the insn.
+
+   In addition, if the insn is part of a pc-relative reference to an external
+   label optimization, this is recorded also.  */
 void
-rs6000_final_prescan_insn (rtx_insn *insn, rtx [], int)
+rs6000_final_prescan_insn (rtx_insn *insn, rtx operands[], int noperands)
 {
   next_insn_prefixed_p = (get_attr_prefixed (insn) != PREFIXED_NO);
+
+  enum attr_pcrel_opt pcrel_attr = get_attr_pcrel_opt (insn);
+
+  /* For the load and store instructions that are tied to a GOT pointer, we
+     know that operand 3 contains the marker for loads and operand 2 contains
+     the marker for stores.  If it is nonzero, it is the number of the label
+     that follows the GOT load instruction.  */
+  if (pcrel_attr == PCREL_OPT_LOAD)
+    {
+      gcc_assert (noperands >= 4);
+      pcrel_opt_label_num = INTVAL (operands[3]);
+    }
+  else if (pcrel_attr == PCREL_OPT_STORE)
+    {
+      gcc_assert (noperands >= 3);
+      pcrel_opt_label_num = INTVAL (operands[2]);
+    }
+  else
+    pcrel_opt_label_num = 0;
+
   return;
 }
 
@@ -14260,6 +14300,13 @@  rs6000_final_prescan_insn (rtx_insn *ins
 void
 rs6000_asm_output_opcode (FILE *stream)
 {
+  if (pcrel_opt_label_num)
+    {
+      fprintf (stream, ".reloc .Lpcrel%u-8,R_PPC64_PCREL_OPT,.-(.Lpcrel%u-8)\n\t",
+	       pcrel_opt_label_num, pcrel_opt_label_num);
+      pcrel_opt_label_num = 0;
+    }
+
   if (next_insn_prefixed_p)
     fputc ('p', stream);
 
@@ -23422,6 +23469,7 @@  static struct rs6000_opt_mask const rs60
   { "mulhw",			OPTION_MASK_MULHW,		false, true  },
   { "multiple",			OPTION_MASK_MULTIPLE,		false, true  },
   { "pcrel",			OPTION_MASK_PCREL,		false, true  },
+  { "pcrel-opt",		OPTION_MASK_PCREL_OPT,		false, true  },
   { "popcntb",			OPTION_MASK_POPCNTB,		false, true  },
   { "popcntd",			OPTION_MASK_POPCNTD,		false, true  },
   { "power8-fusion",		OPTION_MASK_P8_FUSION,		false, true  },
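As a rough illustration of the directive that rs6000_asm_output_opcode prints, the hypothetical helper format_pcrel_opt_reloc below formats it into a buffer with the same format string. Because the GOT load template prints the .LpcrelN label immediately after the 8-byte prefixed ld, .LpcrelN-8 is the address of the pld, and the addend .-(.LpcrelN-8) is the distance from the pld to the paired load or store. This is a sketch for illustration, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format the directive emitted before a load or store
   paired with GOT load label LABEL.  Returns the string length, or -1 if
   BUF is too small.  */
static int
format_pcrel_opt_reloc (char *buf, size_t len, unsigned label)
{
  /* The directive is only emitted when a nonzero marker was assigned.  */
  assert (label != 0);
  int n = snprintf (buf, len,
		    ".reloc .Lpcrel%u-8,R_PPC64_PCREL_OPT,.-(.Lpcrel%u-8)\n\t",
		    label, label);
  if (n < 0 || (size_t) n >= len)
    return -1;
  return n;
}
```

The trailing newline and tab keep the real instruction, printed next, on its own indented line.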
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 274874)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -258,6 +258,31 @@  (define_attr "var_shift" "no,yes"
 ;; Is copying of this instruction disallowed?
 (define_attr "cannot_copy" "no,yes" (const_string "no"))
 
+;; Whether this instruction is part of the two-instruction sequence that
+;; supports PCREL_OPT optimizations, where the linker can change code of the
+;; form:
+;;
+;;		pld b,var@got@pcrel
+;;	100:
+;;		# possibly other instructions
+;;		.reloc 100b-8,R_PPC64_PCREL_OPT,.-(100b-8)
+;;		lwz r,0(b)
+;;
+;; into the following if 'var' is in the main program:
+;;
+;;		plwz r,var@pcrel
+;;		# possibly other instructions
+;;		nop
+;;
+;; The states are:
+;;	no		-- insn is not involved with PCREL_OPT optimizations
+;;	load_got	-- insn loads up the GOT pointer for a load instruction
+;;	load		-- insn is an offsettable load that uses the GOT pointer
+;;	store_got	-- insn loads up the GOT pointer for a store instruction
+;;	store		-- insn is an offsettable store that uses the GOT pointer
+
+(define_attr "pcrel_opt" "no,load_got,load,store_got,store" (const_string "no"))
+
 ;; Whether an insn is a prefixed insn, and an initial 'p' should be printed
 ;; before the instruction.  A prefixed instruction has a prefix instruction
 ;; word that extends the immediate value of the instructions from 12-16 bits to
@@ -14726,6 +14751,7 @@  (define_insn "*cmpeqb_internal"
   [(set_attr "type" "logical")])
 
 
+(include "pcrel.md")
 (include "sync.md")
 (include "vector.md")
 (include "vsx.md")
Index: gcc/config/rs6000/rs6000.opt
===================================================================
--- gcc/config/rs6000/rs6000.opt	(revision 274864)
+++ gcc/config/rs6000/rs6000.opt	(working copy)
@@ -577,3 +577,7 @@  Generate (do not generate) prefixed memo
 mpcrel
 Target Report Mask(PCREL) Var(rs6000_isa_flags)
 Generate (do not generate) pc-relative memory addressing.
+
+mpcrel-opt
+Target Undocumented Mask(PCREL_OPT) Var(rs6000_isa_flags)
+Generate (do not generate) pc-relative memory optimizations for externals.
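The option handling in rs6000_option_override_internal (PCREL_OPT is turned on along with the pc-relative default unless the user set it explicitly, and it is dropped when -mpcrel is off, with an error only if -mpcrel-opt was given explicitly) can be modelled with plain bitmasks. MASK_PCREL, MASK_PCREL_OPT, enable_pcrel_default and check_pcrel_opt are hypothetical stand-ins for rs6000_isa_flags, rs6000_isa_flags_explicit and the real code; the -mcmodel=medium and TARGET_PCREL_DEFAULT conditions are left out of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for OPTION_MASK_PCREL and OPTION_MASK_PCREL_OPT.  */
#define MASK_PCREL     (1u << 0)
#define MASK_PCREL_OPT (1u << 1)

/* Default enabling: turning on pc-relative addressing also turns on
   PCREL_OPT unless -mpcrel-opt or -mno-pcrel-opt was given explicitly.  */
static unsigned
enable_pcrel_default (unsigned flags, unsigned explicit_flags)
{
  flags |= MASK_PCREL;
  if ((explicit_flags & MASK_PCREL_OPT) == 0)
    flags |= MASK_PCREL_OPT;
  return flags;
}

/* Consistency check: PCREL_OPT without PCREL is cleared, and *ERROR is set
   only if -mpcrel-opt was given explicitly.  */
static unsigned
check_pcrel_opt (unsigned flags, unsigned explicit_flags, int *error)
{
  assert (error != NULL);
  *error = 0;
  if ((flags & MASK_PCREL) == 0 && (flags & MASK_PCREL_OPT) != 0)
    {
      if ((explicit_flags & MASK_PCREL_OPT) != 0)
	*error = 1;	/* "-mpcrel-opt requires -mpcrel" */
      flags &= ~MASK_PCREL_OPT;
    }
  return flags;
}
```

Keeping the explicit mask separate from the flag value is what lets -mno-pcrel-opt survive the default enabling while an implicit PCREL_OPT is silently dropped.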
Index: gcc/config/rs6000/t-rs6000
===================================================================
--- gcc/config/rs6000/t-rs6000	(revision 274864)
+++ gcc/config/rs6000/t-rs6000	(working copy)
@@ -47,6 +47,10 @@  rs6000-call.o: $(srcdir)/config/rs6000/r
 	$(COMPILE) $<
 	$(POSTCOMPILE)
 
+rs6000-pcrel.o: $(srcdir)/config/rs6000/rs6000-pcrel.c
+	$(COMPILE) $<
+	$(POSTCOMPILE)
+
 $(srcdir)/config/rs6000/rs6000-tables.opt: $(srcdir)/config/rs6000/genopt.sh \
   $(srcdir)/config/rs6000/rs6000-cpus.def
 	$(SHELL) $(srcdir)/config/rs6000/genopt.sh $(srcdir)/config/rs6000 > \
@@ -79,6 +83,7 @@  MD_INCLUDES = $(srcdir)/config/rs6000/rs
 	$(srcdir)/config/rs6000/predicates.md \
 	$(srcdir)/config/rs6000/constraints.md \
 	$(srcdir)/config/rs6000/darwin.md \
+	$(srcdir)/config/rs6000/pcrel.md \
 	$(srcdir)/config/rs6000/sync.md \
 	$(srcdir)/config/rs6000/vector.md \
 	$(srcdir)/config/rs6000/vsx.md \