[V9, #1 of 4] Add basic PCREL_OPT support for loads

Message ID 20191116002338.GA3044@ibm-toto.the-meissners.org
State New
Series
  • [V9, #1 of 4] Add basic PCREL_OPT support for loads

Commit Message

Michael Meissner Nov. 16, 2019, 12:23 a.m. UTC
This patch adds basic support for the PCREL_OPT optimization for loads.  It is
on the long side because I needed to create the infrastructure for the
support.  It creates a new pass, run just before final, that looks for loads of
an external symbol's address followed by a single load or store using that
address.
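The pairing rule the new pass applies can be illustrated with a small standalone C sketch. It does not use GCC internals: struct toy_insn, enum toy_kind and find_pcrel_opt_load are hypothetical names that model one basic block as an array, and each insn is assumed to read and write at most one register. In the real pass this information comes from RTL and dataflow notes (reg_referenced_p, dead_or_set_p, reg_used_between_p), and in this patch only loads are paired.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified insn model for illustration only.  */
enum toy_kind { TOY_GOT_LOAD, TOY_LOAD, TOY_STORE, TOY_OTHER, TOY_BB_END };

struct toy_insn
{
  enum toy_kind kind;
  int uses_reg;         /* Register read (e.g. as a base), or -1.  */
  int sets_reg;         /* Register written, or -1.  */
  bool kills_uses_reg;  /* USES_REG dies in this insn (like REG_DEAD).  */
};

/* Same limit as the patch: give up after scanning this many insns.  */
#define MAX_PCREL_OPT_INSNS 10

/* Return the index of the load that can be paired with the GOT address
   load at GOT_IDX, or -1 if the PCREL_OPT conditions are not met.  */
static int
find_pcrel_opt_load (const struct toy_insn *insns, size_t n, size_t got_idx)
{
  assert (got_idx < n && insns[got_idx].kind == TOY_GOT_LOAD);
  int got_reg = insns[got_idx].sets_reg;
  bool had_store = false;
  int num_insns = 0;

  for (size_t i = got_idx + 1; i < n; i++)
    {
      if (++num_insns >= MAX_PCREL_OPT_INSNS)
        return -1;

      const struct toy_insn *insn = &insns[i];
      if (insn->kind == TOY_BB_END)
        return -1;              /* Only look inside one basic block.  */

      if (insn->uses_reg == got_reg)
        {
          /* The single use must be a load that kills the GOT register,
             with no stores in between.  */
          if (insn->kind != TOY_LOAD || !insn->kills_uses_reg || had_store)
            return -1;

          /* The loaded register must not be used or set in between.  */
          int dest = insn->sets_reg;
          for (size_t j = got_idx + 1; j < i; j++)
            if (insns[j].uses_reg == dest || insns[j].sets_reg == dest)
              return -1;
          return (int) i;
        }

      if (insn->sets_reg == got_reg)
        return -1;              /* GOT register overwritten before any use.  */

      if (insn->kind == TOY_STORE)
        had_store = true;
    }
  return -1;
}
```

Modeling the scan this way makes the bail-out cases easy to see: an intervening store, a use that does not kill the GOT register, or a clash with the loaded register all stop the optimization.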

I have bootstrapped a compiler with this patch on a little endian power8 system,
and there were no regressions in the test suite.  Can I check this into the FSF
trunk once the patches it depends on from the V7 series have been checked in?

2019-11-15  Michael Meissner  <meissner@linux.ibm.com>

	* config/rs6000/pcrel-opt.c: New file to implement the PCREL_OPT
	optimization as a new pass.
	* config/rs6000/rs6000-passes.def: Add comment for the analyze
	swaps pass.  Add new pass to do the PCREL_OPT optimization.
	* config/rs6000/rs6000-protos.h (enum non_prefixed_form): Add a
	new case to recognize memory that meets PCREL_OPT requirements.
	(reg_to_non_prefixed): New declaration.
	(make_pass_pcrel_opt): New declaration.
	* config/rs6000/rs6000.c (rs6000_option_override_internal): Add
	support for -mpcrel-opt.
	(rs6000_delegitimize_address): Convert PCREL_OPT unspec for GOT
	load back into a normal SYMBOL_REF.
	(print_operand): Add %r<n> to print the .reloc for PCREL_OPT.
	(rs6000_opt_masks): Add -mpcrel-opt.
	(address_to_insn_form): For addresses used with PCREL_OPT, only
	recognize addresses that can be used in a non-prefixed
	instruction.
	(reg_to_non_prefixed): Make global.
	* config/rs6000/rs6000.md (UNSPEC_PCREL_OPT_LD_GOT): New unspec.
	(UNSPEC_PCREL_OPT_LD_RELOC): New unspec.
	(pcrel_extern_addr): Make it a global insn.
	(PO mode iterator): New mode iterator for the PCREL_OPT
	optimization.
	(POV mode iterator): New mode iterator for the PCREL_OPT
	optimization.
	(pcrel_opt_ld_got<mode>, PO iterator): New insns for the PCREL_OPT
	optimization to load the address of an external symbol.
	(pcrel_opt_ld<mode>, QHSI iterator): New insns for the PCREL_OPT
	optimization to load the value of an external variable.
	(pcrel_opt_lddi): New insn for the PCREL_OPT optimization to load
	a DImode external variable.
	(pcrel_opt_ldsf): New insn for the PCREL_OPT optimization to load
	a SFmode external variable.
	(pcrel_opt_lddf): New insn for the PCREL_OPT optimization to load
	a DFmode external variable.
	(pcrel_opt_ld<mode>): New insns for the PCREL_OPT optimization to
	load external vector variables.
	* config/rs6000/rs6000.opt (-mpcrel-opt): New undocumented
	switch.
	* config/rs6000/t-rs6000 (pcrel-opt.o): Add build rules.
	* config.gcc (powerpc*-*-*): Add pcrel-opt.o.
	(rs6000*-*-*): Add pcrel-opt.o.
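The ChangeLog entries for pcrel_opt_ld_got<mode> and the new %r<n> print_operand code describe two pieces of assembly that must line up. The C sketch below reproduces that text with snprintf using the format strings from the patch. format_pcrel_opt_pair is a hypothetical helper, the register numbers and the lwz load are example values only, and the pld operand follows the spelling used in the patch's comments (in the compiler the leading 'p' comes from rs6000_asm_output_opcode). The label is placed after the 8-byte prefixed pld, so .Lpcrel<n>-8 names the pld itself, and .-(.Lpcrel<n>-8) is the distance from the pld to the load that follows the .reloc.

```c
#include <stdio.h>

/* Hypothetical helper: format the GOT-address load (with its trailing
   label) and the .reloc-marked load for PCREL_OPT label LABEL_NUM.
   Returns 0 on success, -1 if either buffer is too small.  */
static int
format_pcrel_opt_pair (char *got_buf, size_t got_len,
                       char *load_buf, size_t load_len,
                       unsigned int label_num, const char *sym,
                       int base_reg, int dest_reg)
{
  /* The label follows the 8-byte PLD, so .Lpcrel<n>-8 is the PLD even if
     the assembler inserted an alignment NOP in front of it.  */
  int n = snprintf (got_buf, got_len,
                    "pld %d,%s@got@pcrel(0),1\n.Lpcrel%u:",
                    base_reg, sym, label_num);
  if (n < 0 || (size_t) n >= got_len)
    return -1;

  /* Same .reloc format string as print_operand's 'r' case, followed by
     an ordinary non-prefixed load through the GOT pointer.  */
  n = snprintf (load_buf, load_len,
                ".reloc .Lpcrel%u-8,R_PPC64_PCREL_OPT,.-(.Lpcrel%u-8)\n\t"
                "lwz %d,0(%d)",
                label_num, label_num, dest_reg, base_reg);
  if (n < 0 || (size_t) n >= load_len)
    return -1;
  return 0;
}
```

Because both fragments are generated from the same label number, the linker can find the pld from the load's relocation without any extra symbol table entries.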

Patch

Index: gcc/config/rs6000/pcrel-opt.c
===================================================================
--- gcc/config/rs6000/pcrel-opt.c	(revision 278311)
+++ gcc/config/rs6000/pcrel-opt.c	(working copy)
@@ -0,0 +1,623 @@ 
+/* Subroutines used to support the pc-relative linker optimization.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file implements an RTL pass that looks for pc-relative loads of the
+   address of an external variable using the PCREL_GOT relocation and a single
+   load that uses that GOT pointer.  If that is found we create the PCREL_OPT
+   relocation to possibly convert:
+
+	pld b,var@got@pcrel(0),1
+
+	# possibly other instructions that do not use the base register 'b' or
+        # the result register 'r'.
+
+	lwz r,0(b)
+
+   into:
+
+	plwz r,var@pcrel(0),1
+
+	# possibly other instructions that do not use the base register 'b' or
+        # the result register 'r'.
+
+	nop
+
+   If the variable is not defined in the main program or the code using it is
+   not in the main program, the linker puts the address in the .got section and
+   generates:
+
+	.section .got
+	.Lvar_got:	.dword var
+
+	.section .text
+	pld b,.Lvar_got@pcrel(0),1
+
+	# possibly other instructions that do not use the base register 'b' or
+        # the result register 'r'.
+
+	lwz r,0(b)
+	
+   We only look for a single usage in the basic block where the GOT pointer is
+   loaded.  Multiple uses, or references in another basic block, prevent us
+   from using the PCREL_OPT relocation.  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "rtl.h"
+#include "tree.h"
+#include "memmodel.h"
+#include "expmed.h"
+#include "optabs.h"
+#include "recog.h"
+#include "df.h"
+#include "tm_p.h"
+#include "ira.h"
+#include "print-tree.h"
+#include "varasm.h"
+#include "explow.h"
+#include "expr.h"
+#include "output.h"
+#include "tree-pass.h"
+#include "rtx-vector-builder.h"
+#include "print-rtl.h"
+#include "insn-attr.h"
+#include "insn-codes.h"
+
+
+// Optimize pc-relative references
+const pass_data pass_data_pcrel_opt =
+{
+  RTL_PASS,			// type
+  "pcrel_opt",			// name
+  OPTGROUP_NONE,		// optinfo_flags
+  TV_NONE,			// tv_id
+  0,				// properties_required
+  0,				// properties_provided
+  0,				// properties_destroyed
+  0,				// todo_flags_start
+  TODO_df_finish,		// todo_flags_finish
+};
+
+// Maximum number of insns to scan between the load address and the load that
+// uses that address.
+const int MAX_PCREL_OPT_INSNS	= 10;
+
+/* Next PCREL_OPT label number.  */
+static unsigned int pcrel_opt_next_num;
+
+// Pass data structures
+class pcrel_opt : public rtl_opt_pass
+{
+private:
+  // Pass to look for insns loading the PC-relative GOT address and then
+  // possibly optimizing them.
+  unsigned int do_pcrel_opt_pass (function *);
+
+  // Given the load of a PC-relative GOT address, optimize it.
+  void do_pcrel_opt_got_addr (rtx_insn *);
+
+  // Optimize a particular PC-relative load
+  bool do_pcrel_opt_load (rtx_insn *, rtx_insn *);
+
+  // Various counters
+  struct {
+    unsigned long gots;
+    unsigned long loads;
+    unsigned long load_separation[MAX_PCREL_OPT_INSNS+1];
+  } counters;
+
+public:
+  pcrel_opt (gcc::context *ctxt)
+  : rtl_opt_pass (pass_data_pcrel_opt, ctxt)
+  {}
+
+  ~pcrel_opt (void)
+  {}
+
+  // opt_pass methods:
+  virtual bool gate (function *)
+  {
+    return TARGET_PCREL && TARGET_PCREL_OPT && optimize;
+  }
+
+  virtual unsigned int execute (function *fun)
+  {
+    return do_pcrel_opt_pass (fun);
+  }
+
+  opt_pass *clone ()
+  {
+    return new pcrel_opt (m_ctxt);
+  }
+};
+
+
+// Optimize a PC-relative load address to be used in a load.
+
+// If the sequence of insns is safe for the PCREL_OPT optimization (i.e. no
+// other uses of the address register, the address register dies at the load,
+// and the loaded register is untouched in between), convert insns of the form:
+//
+//	(set (reg:DI addr)
+//	     (symbol_ref:DI "ext_symbol"))
+//
+//	...
+//
+//	(set (reg:<MODE> value)
+//	     (mem:<MODE> (reg:DI addr)))
+//
+// into:
+//
+//	(parallel [(set (reg:DI addr)
+//                      (unspec:DI [(symbol_ref:DI "ext_symbol")
+//                                  (const_int label_num)]
+//                                 UNSPEC_PCREL_OPT_LD_GOT))
+//                 (clobber (reg:<MODE> value))])
+//
+//	...
+//
+//	(parallel [(set (reg:<MODE> value)
+//                      (unspec:<MODE> [(mem:<MODE> (reg:DI addr))
+//                                      (const_int label_num)]
+//                                     UNSPEC_PCREL_OPT_LD_RELOC))
+//                 (clobber (reg:DI addr))])
+//
+//
+// The UNSPEC_PCREL_OPT_LD_GOT insn will generate the load address plus a
+// definition of a label (.Lpcrel<n>), while the UNSPEC_PCREL_OPT_LD_RELOC insn
+// will generate the .reloc to tell the linker to tie the load address and load
+// using that address together.
+//
+//	pld b,ext_symbol@got@pcrel(0),1
+// .Lpcrel1:
+//
+//	...
+//
+//	.reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
+//	lwz r,0(b)
+//
+// If ext_symbol is defined in another object file in the main program and we
+// are linking the main program, the linker will convert the above instructions
+// to:
+//
+//	plwz r,ext_symbol@pcrel(0),1
+//
+//	...
+//
+//	nop
+//
+// Return true if the PCREL_OPT load optimization succeeded.
+
+bool
+pcrel_opt::do_pcrel_opt_load (rtx_insn *got_insn,	// insn loading GOT
+			      rtx_insn *load_insn)	// insn using GOT
+{
+  rtx got_set = PATTERN (got_insn);
+  rtx got = SET_DEST (got_set);
+  rtx got_addr = SET_SRC (got_set);
+  rtx load_set = single_set (load_insn);
+  rtx reg = SET_DEST (load_set);
+  rtx mem = SET_SRC (load_set);
+  machine_mode reg_mode = GET_MODE (reg);
+  machine_mode mem_mode = GET_MODE (mem);
+  rtx mem_inner = mem;
+  unsigned int reg_regno = reg_or_subregno (reg);
+
+  if (!MEM_P (mem_inner))
+    return false;
+
+  // If this is LFIWAX or a similar instruction that only has an indexed form,
+  // we can't do the optimization.
+  enum non_prefixed_form non_prefixed = reg_to_non_prefixed (reg, mem_mode);
+  if (non_prefixed == NON_PREFIXED_X)
+    return false;
+
+  // The optimization will only work on non-prefixed offsettable loads.
+  rtx addr = XEXP (mem_inner, 0);
+  enum insn_form iform = address_to_insn_form (addr, mem_mode, non_prefixed);
+  if (iform != INSN_FORM_BASE_REG
+      && iform != INSN_FORM_D
+      && iform != INSN_FORM_DS
+      && iform != INSN_FORM_DQ)
+    return false;
+
+  // Allocate a new PC-relative label, and update the GOT insn.  If the GOT
+  // register is not the same register being loaded, add a clobber just in case
+  // something runs after this pass.
+  //
+  // (parallel [(set (got)
+  //                 (unspec [(symbol_ref got_addr)
+  //                          (const_int label_num)]
+  //                         UNSPEC_PCREL_OPT_LD_GOT))
+  //            (clobber (reg))])
+
+  ++pcrel_opt_next_num;
+  unsigned int got_regno = reg_or_subregno (got);
+  rtx label_num = GEN_INT (pcrel_opt_next_num);
+  rtvec v_got = gen_rtvec (2, got_addr, label_num);
+  rtx got_unspec = gen_rtx_UNSPEC (Pmode, v_got, UNSPEC_PCREL_OPT_LD_GOT);
+  rtx got_new_set = gen_rtx_SET (got, got_unspec);
+  rtx got_clobber = gen_rtx_CLOBBER (VOIDmode,
+				     (got_regno == reg_regno
+				      ? gen_rtx_SCRATCH (reg_mode)
+				      : reg));
+
+  PATTERN (got_insn)
+    = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, got_new_set, got_clobber));
+
+  // Revalidate the insn, backing out of the optimization if the insn is not
+  // supported.
+  INSN_CODE (got_insn) = recog (PATTERN (got_insn), got_insn, 0);
+  if (INSN_CODE (got_insn) < 0)
+    {
+      PATTERN (got_insn) = got_set;
+      INSN_CODE (got_insn) = recog (PATTERN (got_insn), got_insn, 0);
+      return false;
+    }
+
+  // Update the load insn.  Add an explicit clobber of the GOT register just in
+  // case something runs after this pass.
+  //
+  // (parallel [(set (reg)
+  //                 (unspec:<MODE> [(mem (got))
+  //                                 (const_int label_num)]
+  //                                UNSPEC_PCREL_OPT_LD_RELOC))
+  //            (clobber (reg:DI got))])
+
+  rtvec v_load = gen_rtvec (2, mem_inner, label_num);
+  rtx new_load = gen_rtx_UNSPEC (GET_MODE (mem_inner), v_load,
+				 UNSPEC_PCREL_OPT_LD_RELOC);
+
+  rtx old_load_set = PATTERN (load_insn);
+  rtx new_load_set = gen_rtx_SET (reg, new_load);
+  rtx load_clobber = gen_rtx_CLOBBER (VOIDmode,
+				      (got_regno == reg_regno
+				       ? gen_rtx_SCRATCH (Pmode)
+				       : got));
+  PATTERN (load_insn)
+    = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, new_load_set, load_clobber));
+
+  // Revalidate the insn, backing out of the optimization if the insn is not
+  // supported.
+
+  INSN_CODE (load_insn) = recog (PATTERN (load_insn), load_insn, 0);
+  if (INSN_CODE (load_insn) < 0)
+    {
+      PATTERN (got_insn) = got_set;
+      INSN_CODE (got_insn) = recog (PATTERN (got_insn), got_insn, 0);
+
+      PATTERN (load_insn) = old_load_set;
+      INSN_CODE (load_insn) = recog (PATTERN (load_insn), load_insn, 0);
+      return false;
+    }
+
+  return true;
+}
+
+
+/* Given an insn, find the next active insn in the basic block.  Return NULL
+   if we reach the end of the basic block, such as a label, call or jump.  */
+
+static rtx_insn *
+next_active_insn_in_basic_block (rtx_insn *insn)
+{
+  insn = NEXT_INSN (insn);
+
+  while (insn != NULL_RTX)
+    {
+      /* If the basic block ends or there is a jump of some kind, exit the
+	 loop.  */
+      if (CALL_P (insn)
+	  || JUMP_P (insn)
+	  || JUMP_TABLE_DATA_P (insn)
+	  || LABEL_P (insn)
+	  || BARRIER_P (insn))
+	return NULL;
+
+      /* If this is a real insn, return it.  */
+      if (!insn->deleted ()
+	  && NONJUMP_INSN_P (insn)
+	  && GET_CODE (PATTERN (insn)) != USE
+	  && GET_CODE (PATTERN (insn)) != CLOBBER)
+	return insn;
+
+      /* Loop for USE, CLOBBER, DEBUG_INSN, NOTEs.  */
+      insn = NEXT_INSN (insn);
+    }
+
+  return NULL;
+}
+
+
+// Given an insn that loads a base register with the address of an
+// external symbol (GOT address), see if we can optimize it with the PCREL_OPT
+// optimization.
+
+void
+pcrel_opt::do_pcrel_opt_got_addr (rtx_insn *got_insn)
+{
+  int num_insns = 0;
+
+  // Do some basic validation.
+  rtx got_set = PATTERN (got_insn);
+  if (GET_CODE (got_set) != SET)
+    return;
+
+  rtx got = SET_DEST (got_set);
+  rtx got_addr = SET_SRC (got_set);
+
+  if (!base_reg_operand (got, Pmode)
+      || !pcrel_external_address (got_addr, Pmode))
+    return;
+
+  rtx_insn *insn = got_insn;
+  bool looping = true;
+  bool had_load = false;	// whether intermediate insns had a load
+  bool had_store = false;	// whether intermediate insns had a store
+  bool is_load = false;		// whether the current insn is a load
+  bool is_store = false;	// whether the current insn is a store
+
+  // Check the following insns and see if it is a load or store that uses the
+  // GOT address.  If we can't do the optimization, just return.
+  while (looping)
+    {
+      // Don't allow too many insns between the load of the GOT address and the
+      // eventual load or store.
+      if (++num_insns >= MAX_PCREL_OPT_INSNS)
+	return;
+
+      insn = next_active_insn_in_basic_block (insn);
+      if (!insn)
+	return;
+
+      // See if the current insn is a load or store
+      switch (get_attr_type (insn))
+	{
+	  // While a load of a GOT address is a 'load' for scheduling
+	  // purposes, it should be safe to allow other GOT address loads
+	  // between this one and the load or store using the GOT address.
+	case TYPE_LOAD:
+	  if (INSN_CODE (insn) == CODE_FOR_pcrel_extern_addr)
+	    {
+	      is_load = is_store = false;
+	      break;
+	    }
+	  else
+	    {
+	      rtx set = single_set (insn);
+	      if (set)
+		{
+		  rtx src = SET_SRC (set);
+		  if (GET_CODE (src) == UNSPEC
+		      && XINT (src, 1) == UNSPEC_PCREL_OPT_LD_GOT)
+		    {
+		      is_load = is_store = false;
+		      break;
+		    }
+		}
+	    }
+	  /* fall through */
+
+	case TYPE_FPLOAD:
+	case TYPE_VECLOAD:
+	  is_load = true;
+	  is_store = false;
+	  break;
+
+	case TYPE_STORE:
+	case TYPE_FPSTORE:
+	case TYPE_VECSTORE:
+	  is_load = false;
+	  is_store = true;
+	  break;
+
+	  // For a first pass, don't do the optimization through atomic
+	  // operations.
+	case TYPE_LOAD_L:
+	case TYPE_STORE_C:
+	case TYPE_HTM:
+	case TYPE_HTMSIMPLE:
+	  return;
+
+	default:
+	  is_load = is_store = false;
+	  break;
+	}
+
+      // If the GOT register was referenced, it must also die in the same insn.
+      if (reg_referenced_p (got, PATTERN (insn)))
+	{
+	  if (!dead_or_set_p (insn, got))
+	    return;
+
+	  looping = false;
+	}
+
+      // If it dies by being set without being referenced, exit.
+      else if (dead_or_set_p (insn, got))
+	return;
+
+      // If it isn't the insn we want, remember if there were loads or stores.
+      else
+	{
+	  had_load |= is_load;
+	  had_store |= is_store;
+	}
+    }
+
+  // If the insn does not use the GOT pointer, or the GOT pointer does not die
+  // at this insn, we can't do the optimization.
+  if (!reg_referenced_p (got, PATTERN (insn)) || !dead_or_set_p (insn, got))
+    return;
+
+  // If the last insn is not a load, we can't do the optimization.  If it is a
+  // load, get the register and memory.
+  rtx load_set = single_set (insn);
+  if (!load_set)
+    return;
+
+  rtx reg = NULL_RTX;
+  rtx mem = NULL_RTX;
+
+  // Get register and memory, and validate it.
+  if (is_load)
+    {
+      reg = SET_DEST (load_set);
+      mem = SET_SRC (load_set);
+      if (!MEM_P (mem))
+	return;
+
+      if (!REG_P (reg) && !SUBREG_P (reg))
+	return;
+
+      // If there were any stores in the insns between loading the GOT address
+      // and doing the load, turn off the optimization.
+      if (had_store)
+	return;
+    }
+
+  else
+    return;
+
+  machine_mode mode = GET_MODE (reg);
+  unsigned int regno = reg_or_subregno (reg);
+  unsigned int size = GET_MODE_SIZE (mode);
+
+  // Eliminate various possibilities involving multiple instructions.
+  if (get_attr_length (insn) != 4)
+    return;
+
+  if (size == 16 && !VSX_REGNO_P (regno))
+    return;
+
+  if (size > 16)
+    return;
+
+  if (mode == TFmode && !TARGET_IEEEQUAD)
+    return;
+
+  // If the register being loaded was used or set between the load of the GOT
+  // address and the load using the GOT address, we can't do the optimization.
+  if (reg_used_between_p (reg, got_insn, insn)
+      || reg_set_between_p (reg, got_insn, insn))
+    return;
+
+  // Process the load in detail
+  if (is_load)
+    {
+      if (do_pcrel_opt_load (got_insn, insn))
+	{
+	  counters.loads++;
+	  counters.load_separation[num_insns-1]++;
+	}
+    }
+
+  return;
+}
+
+
+// Optimize pcrel external variable references
+
+unsigned int
+pcrel_opt::do_pcrel_opt_pass (function *fun)
+{
+  basic_block bb;
+  rtx_insn *insn, *curr_insn = 0;
+
+  memset ((char *) &counters, '\0', sizeof (counters));
+
+  // Dataflow analysis for use-def chains.
+  df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
+  df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
+  df_note_add_problem ();
+  df_analyze ();
+  df_set_flags (DF_DEFER_INSN_RESCAN | DF_LR_RUN_DCE);
+
+  // Look at each basic block to see if there is a load of an external
+  // variable's GOT address, and a single load using that GOT address.
+  FOR_ALL_BB_FN (bb, fun)
+    {
+      FOR_BB_INSNS_SAFE (bb, insn, curr_insn)
+	{
+	  if (NONJUMP_INSN_P (insn)
+	      && INSN_CODE (insn) == CODE_FOR_pcrel_extern_addr)
+	    {
+	      counters.gots++;
+	      do_pcrel_opt_got_addr (insn);
+	    }
+	}
+    }
+
+  df_remove_problem (df_chain);
+  df_process_deferred_rescans ();
+  df_set_flags (DF_RD_PRUNE_DEAD_DEFS | DF_LR_RUN_DCE);
+  df_chain_add_problem (DF_UD_CHAIN);
+  df_note_add_problem ();
+  df_analyze ();
+
+  if (dump_file)
+    {
+      if (!counters.gots)
+	fprintf (dump_file, "\nNo external symbols were referenced\n");
+
+      else
+	{
+	  fprintf (dump_file,
+		   "\n# of loads of an address of an external symbol = %lu\n",
+		   counters.gots);
+
+	  if (!counters.loads)
+	    fprintf (dump_file,
+		     "\nNo PCREL_OPT load optimizations were done\n");
+
+	  else
+	    {
+	      fprintf (dump_file, "# of PCREL_OPT loads = %lu\n",
+		       counters.loads);
+
+	      fprintf (dump_file, "# of adjacent PCREL_OPT loads = %lu\n",
+		       counters.load_separation[0]);
+
+	      for (int i = 1; i < MAX_PCREL_OPT_INSNS; i++)
+		{
+		  if (counters.load_separation[i])
+		    fprintf (dump_file,
+			     "# of PCREL_OPT loads separated by %d insn%s = %lu\n",
+			     i, (i == 1) ? "" : "s",
+			     counters.load_separation[i]);
+		}
+	    }
+	}
+
+      fprintf (dump_file, "\n");
+    }
+
+  return 0;
+}
+
+
+rtl_opt_pass *
+make_pass_pcrel_opt (gcc::context *ctxt)
+{
+  return new pcrel_opt (ctxt);
+}
Index: gcc/config/rs6000/rs6000-passes.def
===================================================================
--- gcc/config/rs6000/rs6000-passes.def	(revision 278287)
+++ gcc/config/rs6000/rs6000-passes.def	(working copy)
@@ -24,4 +24,15 @@  along with GCC; see the file COPYING3.
    REPLACE_PASS (PASS, INSTANCE, TGT_PASS)
  */
 
+  /* Pass to add the appropriate vector swaps on power8 little endian systems.
+     The power8 does not have instructions that automatically do the byte swaps
+     for loads and stores.  */
   INSERT_PASS_BEFORE (pass_cse, 1, pass_analyze_swaps);
+
+  /* Pass to do the PCREL_OPT optimization that combines the load of an
+     external symbol's address with a single load or store using that
+     address as a base register.  This pass should be the last pass before
+     final, so that it can make sure the address being loaded up dies in a
+     single reference, and it doesn't have to worry about something else using
+     the address.  */
+  INSERT_PASS_BEFORE (pass_final, 1, pass_pcrel_opt);
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 278287)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -183,11 +183,13 @@  enum non_prefixed_form {
   NON_PREFIXED_D,		/* All 16-bits are valid.  */
   NON_PREFIXED_DS,		/* Bottom 2 bits must be 0.  */
   NON_PREFIXED_DQ,		/* Bottom 4 bits must be 0.  */
-  NON_PREFIXED_X		/* No offset memory form exists.  */
+  NON_PREFIXED_X,		/* No offset memory form exists.  */
+  NON_PREFIXED_PCREL_OPT	/* Offset for PCREL_OPT optimizations.  */
 };
 
 extern enum insn_form address_to_insn_form (rtx, machine_mode,
 					    enum non_prefixed_form);
+extern enum non_prefixed_form reg_to_non_prefixed (rtx, machine_mode);
 extern bool prefixed_load_p (rtx_insn *);
 extern bool prefixed_store_p (rtx_insn *);
 extern bool prefixed_paddi_p (rtx_insn *);
@@ -303,6 +305,7 @@  namespace gcc { class context; }
 class rtl_opt_pass;
 
 extern rtl_opt_pass *make_pass_analyze_swaps (gcc::context *);
+extern rtl_opt_pass *make_pass_pcrel_opt (gcc::context *);
 extern bool rs6000_sum_of_two_registers_p (const_rtx expr);
 extern bool rs6000_quadword_masked_address_p (const_rtx exp);
 extern rtx rs6000_gen_lvx (enum machine_mode, rtx, rtx);
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 278287)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -4213,7 +4213,7 @@  rs6000_option_override_internal (bool gl
 	  if ((rs6000_isa_flags_explicit & OPTION_MASK_PCREL) != 0)
 	    error ("%qs requires %qs", "-mpcrel", "-mcmodel=medium");
 
-	  rs6000_isa_flags &= ~OPTION_MASK_PCREL;
+	  rs6000_isa_flags &= ~(OPTION_MASK_PCREL | OPTION_MASK_PCREL_OPT);
 	}
 
       /* Enable defaults if desired.  */
@@ -4227,7 +4227,8 @@  rs6000_option_override_internal (bool gl
 
 	  if (!explicit_pcrel && TARGET_PCREL_DEFAULT
 	      && TARGET_CMODEL == CMODEL_MEDIUM)
-	    rs6000_isa_flags |= OPTION_MASK_PCREL;
+	    rs6000_isa_flags |= (OPTION_MASK_PCREL
+				 | OPTION_MASK_PCREL_OPT);
 	}
     }
 
@@ -4248,7 +4249,17 @@  rs6000_option_override_internal (bool gl
       if ((rs6000_isa_flags_explicit & OPTION_MASK_PCREL) != 0)
 	error ("%qs requires %qs", "-mpcrel", "-mprefixed-addr");
 
-      rs6000_isa_flags &= ~OPTION_MASK_PCREL;
+      rs6000_isa_flags &= ~(OPTION_MASK_PCREL
+			    | OPTION_MASK_PCREL_OPT);
+    }
+
+  /* Check -mfuture debug switches.  */
+  if (!TARGET_PCREL && TARGET_PCREL_OPT)
+    {
+      if ((rs6000_isa_flags_explicit & OPTION_MASK_PCREL_OPT) != 0)
+	error ("%qs requires %qs", "-mpcrel-opt", "-mpcrel");
+
+      rs6000_isa_flags &= ~OPTION_MASK_PCREL_OPT;
     }
 
   if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
@@ -8379,7 +8390,9 @@  rs6000_delegitimize_address (rtx orig_x)
 {
   rtx x, y, offset;
 
-  if (GET_CODE (orig_x) == UNSPEC && XINT (orig_x, 1) == UNSPEC_FUSION_GPR)
+  if (GET_CODE (orig_x) == UNSPEC
+      && (XINT (orig_x, 1) == UNSPEC_FUSION_GPR
+	  || XINT (orig_x, 1) == UNSPEC_PCREL_OPT_LD_GOT))
     orig_x = XVECEXP (orig_x, 0, 0);
 
   orig_x = delegitimize_mem_from_attrs (orig_x);
@@ -13016,6 +13029,19 @@  print_operand (FILE *file, rtx x, int co
 	fprintf (file, "%d", 128 >> (REGNO (x) - CR0_REGNO));
       return;
 
+    case 'r':
+      /* X is a label number for the PCREL_OPT optimization.  Emit the .reloc
+	 to enable this optimization, unless the value is 0.  */
+      gcc_assert (CONST_INT_P (x));
+      if (UINTVAL (x) != 0)
+	{
+	  unsigned int label_num = UINTVAL (x);
+	  fprintf (file,
+		   ".reloc .Lpcrel%u-8,R_PPC64_PCREL_OPT,.-(.Lpcrel%u-8)\n\t",
+		   label_num, label_num);
+	}
+      return;
+
     case 's':
       /* Low 5 bits of 32 - value */
       if (! INT_P (x))
@@ -22950,6 +22976,7 @@  static struct rs6000_opt_mask const rs60
   { "mulhw",			OPTION_MASK_MULHW,		false, true  },
   { "multiple",			OPTION_MASK_MULTIPLE,		false, true  },
   { "pcrel",			OPTION_MASK_PCREL,		false, true  },
+  { "pcrel-opt",		OPTION_MASK_PCREL_OPT,		false, true  },
   { "popcntb",			OPTION_MASK_POPCNTB,		false, true  },
   { "popcntd",			OPTION_MASK_POPCNTD,		false, true  },
   { "power8-fusion",		OPTION_MASK_P8_FUSION,		false, true  },
@@ -24911,8 +24938,32 @@  address_to_insn_form (rtx addr,
      local.  */
   if (TARGET_PCREL)
     {
+      /* Special case the PCREL_OPT optimization to only allow offsets that can
+	 fit in the worst case instruction format.  This test is done before
+	 register allocation, so we might miss an occasional offset that
+	 could have been used in some cases (for example, a SImode that isn't
+	 sign extended or a floating point scalar that is loaded into the
+	 traditional FPR registers instead of the traditional Altivec
+	 registers).  */
       if (SYMBOL_REF_P (op0) && !SYMBOL_REF_LOCAL_P (op0))
-	return INSN_FORM_PCREL_EXTERNAL;
+	{
+	  if (non_prefixed_format == NON_PREFIXED_PCREL_OPT)
+	    {
+	      if (!SIGNED_16BIT_OFFSET_P (offset))
+		return INSN_FORM_BAD;
+
+	      unsigned int size = GET_MODE_SIZE (mode);
+	      if (size >= 16 && (offset & 0xf) != 0)
+		return INSN_FORM_BAD;
+
+	      /* SImode might be sign extended (DS format).  SFmode and DFmode
+		 might be loaded into Altivec registers (DS format).  */
+	      if (size >= 4 && (offset & 0x2) != 0)
+		return INSN_FORM_BAD;
+	    }
+
+	  return INSN_FORM_PCREL_EXTERNAL;
+	}
 
       if (SYMBOL_REF_P (op0) || LABEL_REF_P (op0))
 	return INSN_FORM_PCREL_LOCAL;
@@ -24953,6 +25004,24 @@  address_to_insn_form (rtx addr,
 	non_prefixed_format = NON_PREFIXED_D;
     }
 
+  /* If we are validating the load or store off of the external pointer, be
+     stricter in terms of the offset allowed.  */
+  else if (non_prefixed_format == NON_PREFIXED_PCREL_OPT)
+    {
+      unsigned int size = GET_MODE_SIZE (mode);
+
+      if (size >= 16 && (offset & 0xf) != 0)
+	non_prefixed_format = NON_PREFIXED_DQ;
+
+      /* SImode might be sign extended (DS format).  SFmode and DFmode might be
+	 loaded into Altivec registers (DS format).  */
+      else if (size >= 4 && (offset & 0x2) != 0)
+	non_prefixed_format = NON_PREFIXED_DS;
+
+      else
+	non_prefixed_format = NON_PREFIXED_D;
+    }
+
   /* Classify the D/DS/DQ-form addresses.  */
   switch (non_prefixed_format)
     {
@@ -24992,7 +25061,7 @@  address_to_insn_form (rtx addr,
 /* Helper function to take a REG and a MODE and turn it into the non-prefixed
    instruction format (D/DS/DQ) used for offset memory.  */
 
-static enum non_prefixed_form
+enum non_prefixed_form
 reg_to_non_prefixed (rtx reg, machine_mode mode)
 {
   /* If it isn't a register, use the defaults.  */
@@ -25199,7 +25268,14 @@  void
 rs6000_asm_output_opcode (FILE *stream)
 {
   if (next_insn_prefixed_p)
-    fprintf (stream, "p");
+    {
+      fprintf (stream, "p");
+
+      /* Reset flag in case there are separate insn lines in the sequence, so
+	 the 'p' is only emitted for the first line.  This shows up in
+	 pcrel_opt_ld_got.  */
+      next_insn_prefixed_p = false;
+    }
 
   return;
 }
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 278287)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -150,6 +150,8 @@  (define_c_enum "unspec"
    UNSPEC_PLT16_HA
    UNSPEC_PLT16_LO
    UNSPEC_PLT_PCREL
+   UNSPEC_PCREL_OPT_LD_GOT
+   UNSPEC_PCREL_OPT_LD_RELOC
   ])
 
 ;;
@@ -10022,7 +10024,7 @@  (define_insn "*pcrel_local_addr"
 ;; to a PADDI.  Otherwise, it will create a GOT address that is relocated by
 ;; the dynamic linker and loaded up.  Print_operand_address will append a
 ;; @got@pcrel to the symbol.
-(define_insn "*pcrel_extern_addr"
+(define_insn "pcrel_extern_addr"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
 	(match_operand:DI 1 "pcrel_external_address"))]
   "TARGET_PCREL"
@@ -14774,6 +14776,94 @@  (define_insn "*cmpeqb_internal"
   "cmpeqb %0,%1,%2"
   [(set_attr "type" "logical")])
 
+;; Modes that are supported for PCREL_OPT
+(define_mode_iterator PO [QI HI SI DI TI SF DF KF
+			  V1TI V2DI V4SI V8HI V16QI V2DF V4SF
+			  (TF   "TARGET_FLOAT128_TYPE && TARGET_IEEEQUAD")])
+
+;; Vector modes for PCREL_OPT
+(define_mode_iterator POV [TI KF V1TI V2DI V4SI V8HI V16QI V2DF V4SF
+			   (TF   "TARGET_FLOAT128_TYPE && TARGET_IEEEQUAD")])
+
+;; Alternate form of pcrel_extern_addr used for the PCREL_OPT optimization for
+;; loads.  We need to put the label after the PLD instruction, because the
+;; assembler might insert a NOP before the PLD for alignment.
+(define_insn "pcrel_opt_ld_got<mode>"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=b")
+	(unspec:DI [(match_operand:DI 1 "pcrel_external_address")
+		    (match_operand 2 "const_int_operand" "n")]
+		UNSPEC_PCREL_OPT_LD_GOT))
+   (clobber (match_scratch:PO 3 "=rwaX"))]
+  "TARGET_PCREL_OPT"
+{
+  return (INTVAL (operands[2])) ? "ld %0,%a1\n.Lpcrel%2:" : "ld %0,%a1";
+}
+  [(set_attr "prefixed" "yes")
+   (set_attr "type" "load")])
+
+;; Alternate form of the loads that include a marker to identify whether we can
+;; do the PCREL_OPT optimization.
+(define_insn "*pcrel_opt_ld<mode>"
+  [(set (match_operand:QHSI 0 "gpc_reg_operand" "=r")
+	(unspec:QHSI [(match_operand:QHSI 1 "non_prefixed_memory" "o")
+		      (match_operand 2 "const_int_operand" "n")]
+		     UNSPEC_PCREL_OPT_LD_RELOC))
+   (clobber (match_scratch:DI 3 "=bX"))]
+  "TARGET_PCREL_OPT"
+  "%r2l<wd>z %0,%1"
+  [(set_attr "type" "load")])
+
+(define_insn "*pcrel_opt_lddi"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=r,d,v")
+	(unspec:DI [(match_operand:DI 1 "non_prefixed_memory" "o,o,o")
+		    (match_operand 2 "const_int_operand" "n,n,n")]
+		   UNSPEC_PCREL_OPT_LD_RELOC))
+   (clobber (match_scratch:DI 3 "=bX,bX,bX"))]
+  "TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+   %r2ld %0,%1
+   %r2lfd %0,%1
+   %r2lxsd %0,%1"
+  [(set_attr "type" "load,fpload,fpload")])
+
+(define_insn "*pcrel_opt_ldsf"
+  [(set (match_operand:SF 0 "gpc_reg_operand" "=d,v,r")
+	(unspec:SF [(match_operand:SF 1 "non_prefixed_memory" "o,o,o")
+		    (match_operand 2 "const_int_operand" "n,n,n")]
+		   UNSPEC_PCREL_OPT_LD_RELOC))
+   (clobber (match_scratch:DI 3 "=bX,bX,bX"))]
+  "TARGET_PCREL_OPT"
+  "@
+   %r2lfs %0,%1
+   %r2lxssp %0,%1
+   %r2lwz %0,%1"
+  [(set_attr "type" "fpload,fpload,load")])
+
+(define_insn "*pcrel_opt_lddf"
+  [(set (match_operand:DF 0 "gpc_reg_operand" "=d,v,r")
+	(unspec:DF [(match_operand:DF 1 "non_prefixed_memory" "o,o,o")
+		    (match_operand 2 "const_int_operand" "n,n,n")]
+		   UNSPEC_PCREL_OPT_LD_RELOC))
+   (clobber (match_scratch:DI 3 "=bX,bX,bX"))]
+  "TARGET_PCREL_OPT
+   && (TARGET_POWERPC64 || vsx_register_operand (operands[0], DFmode))"
+  "@
+   %r2lfd %0,%1
+   %r2lxsd %0,%1
+   %r2ld %0,%1"
+  [(set_attr "type" "fpload,fpload,load")])
+
+(define_insn "*pcrel_opt_ld<mode>"
+  [(set (match_operand:POV 0 "gpc_reg_operand" "=wa")
+	(unspec:POV [(match_operand:POV 1 "non_prefixed_memory" "o")
+		     (match_operand 2 "const_int_operand" "n")]
+		    UNSPEC_PCREL_OPT_LD_RELOC))
+   (clobber (match_scratch:DI 3 "=bX"))]
+  "TARGET_PCREL_OPT"
+  "%r2lxv %x0,%1"
+  [(set_attr "type" "vecload")])
+
+
 
 (include "sync.md")
 (include "vector.md")
Index: gcc/config/rs6000/rs6000.opt
===================================================================
--- gcc/config/rs6000/rs6000.opt	(revision 278287)
+++ gcc/config/rs6000/rs6000.opt	(working copy)
@@ -577,3 +577,7 @@  Generate (do not generate) prefixed memo
 mpcrel
 Target Report Mask(PCREL) Var(rs6000_isa_flags)
 Generate (do not generate) pc-relative memory addressing.
+
+mpcrel-opt
+Target Undocumented Mask(PCREL_OPT) Var(rs6000_isa_flags)
+Generate (do not generate) pc-relative memory optimizations for externals.
Index: gcc/config/rs6000/t-rs6000
===================================================================
--- gcc/config/rs6000/t-rs6000	(revision 278287)
+++ gcc/config/rs6000/t-rs6000	(working copy)
@@ -47,6 +47,10 @@  rs6000-call.o: $(srcdir)/config/rs6000/r
 	$(COMPILE) $<
 	$(POSTCOMPILE)
 
+pcrel-opt.o: $(srcdir)/config/rs6000/pcrel-opt.c
+	$(COMPILE) $<
+	$(POSTCOMPILE)
+
 $(srcdir)/config/rs6000/rs6000-tables.opt: $(srcdir)/config/rs6000/genopt.sh \
   $(srcdir)/config/rs6000/rs6000-cpus.def
 	$(SHELL) $(srcdir)/config/rs6000/genopt.sh $(srcdir)/config/rs6000 > \
Index: gcc/config.gcc
===================================================================
--- gcc/config.gcc	(revision 278287)
+++ gcc/config.gcc	(working copy)
@@ -502,7 +502,7 @@  or1k*-*-*)
 	;;
 powerpc*-*-*)
 	cpu_type=rs6000
-	extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o"
+	extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o pcrel-opt.o"
 	extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
 	extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h"
 	extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h"
@@ -517,6 +517,7 @@  powerpc*-*-*)
 	esac
 	extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt"
 	target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.c \$(srcdir)/config/rs6000/rs6000-call.c"
+	target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/pcrel-opt.c"
 	;;
 pru-*-*)
 	cpu_type=pru
@@ -528,8 +529,9 @@  riscv*)
 	;;
 rs6000*-*-*)
 	extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt"
-	extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o"
+	extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o pcrel-opt.o"
 	target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.c \$(srcdir)/config/rs6000/rs6000-call.c"
+	target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/pcrel-opt.c"
 	;;
 sparc*-*-*)
 	cpu_type=sparc
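The NON_PREFIXED_PCREL_OPT checks added to address_to_insn_form accept an offset only if it fits the most restrictive non-prefixed form the load might end up using after register allocation. A minimal C sketch of that worst-case rule follows. The toy_form enum and the helper names are hypothetical, and the alignment masks follow the comments on enum non_prefixed_form in rs6000-protos.h (DS: bottom 2 bits zero, DQ: bottom 4 bits zero). The patch itself tests offset & 0x2 for the DS case, so the sketch and the patch can differ for offsets that have only bit 0 set.

```c
#include <stdbool.h>

/* Hypothetical mirror of the D/DS/DQ non-prefixed forms.  */
enum toy_form { TOY_D, TOY_DS, TOY_DQ };

/* Most restrictive form a SIZE-byte load might use once registers are
   allocated: 16-byte vectors use DQ-form LXV, while 4- and 8-byte values
   may become DS-form loads (sign-extending LWA, LD, or LXSSP/LXSD into an
   Altivec register).  */
static enum toy_form
worst_case_form (unsigned int size)
{
  if (size >= 16)
    return TOY_DQ;
  if (size >= 4)
    return TOY_DS;
  return TOY_D;
}

/* True if OFFSET fits the signed 16-bit displacement of FORM and meets
   its alignment requirement.  */
static bool
offset_fits_form (long offset, enum toy_form form)
{
  if (offset < -32768 || offset > 32767)
    return false;
  switch (form)
    {
    case TOY_DQ:
      return (offset & 0xf) == 0;
    case TOY_DS:
      return (offset & 0x3) == 0;
    default:
      return true;
    }
}

/* Accept OFFSET for PCREL_OPT only if every form the load could use
   can encode it.  */
static bool
pcrel_opt_offset_ok (long offset, unsigned int size)
{
  return offset_fits_form (offset, worst_case_form (size));
}
```

Checking against the worst case is conservative: as the comment in address_to_insn_form notes, some offsets that would have worked for the register class actually chosen are rejected.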