Add power9 support to GCC, patches #2-5 committed

Message ID 20151110001659.GA416@ibm-tiger.the-meissners.org
State New

Commit Message

Michael Meissner Nov. 10, 2015, 12:16 a.m. UTC
David said I could commit patches 2-5 after fixing the points that Segher
Boessenkool raised.  I think I addressed most of the points.  If not, let me
know.  I now recall that I have not yet fixed the 'advanced fusion' vs. 'power9
fusion' wording in comments, and I will get to that shortly.

I updated the tests so that the integer power9 instructions (modulus, count
trailing 0's, extswsli) are tested separately from the power9 vector
instructions.  I added new tests for float128 both via software emulation and
via power9 instructions.
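
As a rough illustration of the integer operations mentioned above, the sketch
below shows the kind of C source for which a compiler targeting ISA 3.0 could
select the new instructions.  It is not taken from the committed test files;
the function names are hypothetical, and which instruction is actually chosen
depends on the compiler options and target.

```c
/* Hypothetical helpers, not copied from the GCC testsuite.  */

/* Signed remainder: a candidate for the ISA 3.0 modsw/modsd instructions.  */
long
smod (long a, long b)
{
  return a % b;
}

/* Unsigned remainder: a candidate for moduw/modud.  */
unsigned long
umod (unsigned long a, unsigned long b)
{
  return a % b;
}

/* Count trailing zeros of a non-zero value: a candidate for cnttzw.
   The result of __builtin_ctz is undefined when x is 0.  */
int
ctz32 (unsigned int x)
{
  return __builtin_ctz (x);
}

/* Sign-extend a 32-bit index and scale it by 8: on a 64-bit target this
   is the sign-extend-then-shift-left pattern that extswsli combines.  */
long
scale_index (int i)
{
  return (long) i * 8;
}
```

Without the new instructions, the remainder functions are typically expanded
as a divide followed by a multiply and subtract.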

I updated CTZ_DEFINED_VALUE_AT_ZERO to be 32/64 depending on whether you are
running in 32/64-bit mode.
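
A minimal software model of what a defined value at zero means, assuming the
value is the operand width (32 or 64) as described above.  The function name
and the `width` parameter are hypothetical and exist only for this sketch; the
real macro lives in rs6000.h and is not reproduced here.

```c
/* Software model of count trailing zeros with a defined result for 0.
   `width` is an assumption of this sketch and is expected to be 32 or 64.  */
unsigned int
ctz_model (unsigned long long x, unsigned int width)
{
  unsigned int n = 0;

  /* Keep only the low `width` bits of the operand.  */
  if (width < 64)
    x &= (1ULL << width) - 1;

  /* A zero operand yields the operand width rather than an undefined value.  */
  if (x == 0)
    return width;

  /* Otherwise count the zero bits below the lowest set bit.  */
  while ((x & 1ULL) == 0)
    {
      x >>= 1;
      n++;
    }

  return n;
}
```

By contrast, the C-level `__builtin_ctz` family leaves the result for a zero
argument undefined, so portable code should not rely on this value.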

I removed the empty constraints from the mod define_expand.

Inside of ashdi3_extswsli_dot, if we had split the move and we need to re-issue
the instruction, it calls ashdi3_extswsli_dot instead of ashdi3_extswsli_dot2.

I'm including the patch file for the changes I checked in.

[gcc]
2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/constraints.md (wF constraint): New constraints
	for power9/toc fusion.
	(wG constraint): Likewise.

	* config/rs6000/predicates.md (u6bit_cint_operand): New
	predicate, recognize 0..63.
	(upper16_cint_operand): New predicate for power9 and toc fusion.
	(fpr_reg_operand): Likewise.
	(toc_fusion_or_p9_reg_operand): Likewise.
	(toc_fusion_mem_raw): Likewise.
	(toc_fusion_mem_wrapped): Likewise.
	(fusion_gpr_addis): If power9 fusion, allow fusion for a larger
	address range.
	(fusion_gpr_mem_combo): Delete, use fusion_addis_mem_combo_load
	instead.
	(fusion_addis_mem_combo_load): Add support for power9 fusion of
	floating point loads, floating point stores, and gpr stores.
	(fusion_addis_mem_combo_store): Likewise.
	(fusion_offsettable_mem_operand): Likewise.

	* config/rs6000/rs6000-protos.h (emit_fusion_addis): Add
	declarations.
	(emit_fusion_load_store): Likewise.
	(fusion_p9_p): Likewise.
	(expand_fusion_p9_load): Likewise.
	(expand_fusion_p9_store): Likewise.
	(emit_fusion_p9_load): Likewise.
	(emit_fusion_p9_store): Likewise.
	(fusion_wrap_memory_address): Likewise.

	* config/rs6000/rs6000.c (struct rs6000_reg_addr): Add new
	elements for power9 fusion.
	(rs6000_debug_print_mode): Rework debug information to print more
	information about fusion.
	(rs6000_init_hard_regno_mode_ok): Setup for power9 fusion
	support.
	(rs6000_legitimate_address_p): Recognize toc fusion as a valid
	offsettable memory address.
	(rs6000_rtx_costs): Update costs for new ISA 3.0 instructions.
	(emit_fusion_gpr_load): Move most of the code from
	emit_fusion_gpr_load into emit_fusion_addis that handles both
	power8 and power9 fusion.
	(emit_fusion_addis): Likewise.
	(emit_fusion_load_store): Likewise.
	(fusion_wrap_memory_address): Add support for TOC fusion.
	(fusion_split_address): Likewise.
	(fusion_p9_p): Add support for power9 fusion.
	(expand_fusion_p9_load): Likewise.
	(expand_fusion_p9_store): Likewise.
	(emit_fusion_p9_load): Likewise.
	(emit_fusion_p9_store): Likewise.

	* config/rs6000/rs6000.h (TARGET_EXTSWSLI): Macros for support for
	new instructions in ISA 3.0.
	(TARGET_CTZ): Likewise.
	(TARGET_TOC_FUSION_INT): Macros for power9 fusion support.
	(TARGET_TOC_FUSION_FP): Likewise.

	* config/rs6000/rs6000.md (UNSPEC_FUSION_P9): New power9/toc
	fusion unspecs.
	(UNSPEC_FUSION_ADDIS): Likewise.
	(QHSI mode iterator): New iterator for power9 fusion.
	(GPR_FUSION): Likewise.
	(FPR_FUSION): Likewise.
	(mod<mode>3): Add support for ISA 3.0
	modulus instructions.
	(umod<mode>3): Likewise.
	(divmod peephole): Likewise.
	(udivmod peephole): Likewise.
	(ctz<mode>2): Add support for ISA 3.0 count trailing zeros scalar
	instructions.
	(ctz<mode>2_h): Likewise.
	(ashdi3_extswsli): Add support for ISA 3.0 EXTSWSLI instruction.
	(ashdi3_extswsli_dot): Likewise.
	(ashdi3_extswsli_dot2): Likewise.
	(power9 fusion splitter): New power9/toc fusion support.
	(toc_fusionload_<mode>): Likewise.
	(toc_fusionload_di): Likewise.
	(fusion_gpr_load_<mode>): Update predicate function.
	(power9 fusion peephole2s): New power9/toc fusion support.
	(fusion_gpr_<P:mode>_<GPR_FUSION:mode>_load): Likewise.
	(fusion_gpr_<P:mode>_<GPR_FUSION:mode>_store): Likewise.
	(fusion_fpr_<P:mode>_<FPR_FUSION:mode>_load): Likewise.
	(fusion_fpr_<P:mode>_<FPR_FUSION:mode>_store): Likewise.
	(fusion_p9_<mode>_constant): Likewise.

[gcc/testsuite]
2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* lib/target-supports.exp (check_p8vector_hw_available): Split
	long line.
	(check_vsx_hw_available): Likewise.
	(check_p9vector_hw_available): Add new checks for ISA 3.0 hardware
	support and for PowerPC float128 support.
	(check_p9modulo_hw_available): Likewise.
	(check_ppc_float128_sw_available): Likewise.
	(check_ppc_float128_hw_available): Likewise.
	(check_effective_target_powerpc_p9vector_ok): Likewise.
	(check_effective_target_powerpc_p9modulo_ok): Likewise.
	(check_effective_target_powerpc_float128_sw_ok): Likewise.
	(check_effective_target_powerpc_float128_hw_ok): Likewise.
	(is-effective-target): Add new PowerPC targets.
	(is-effective-target-keyword): Likewise.
	(check_vect_support_and_set_flags): If we have ISA 3.0 vector
	instructions, use it.

	* gcc.target/powerpc/mod-1.c: New test for ISA 3.0 instructions.
	* gcc.target/powerpc/mod-2.c: Likewise.
	* gcc.target/powerpc/ctz-1.c: Likewise.
	* gcc.target/powerpc/ctz-2.c: Likewise.
	* gcc.target/powerpc/extswsli-1.c: Likewise.
	* gcc.target/powerpc/extswsli-2.c: Likewise.
	* gcc.target/powerpc/extswsli-3.c: Likewise.

	* gcc.target/powerpc/fusion.c (fusion_vector): Move to fusion2.c
	and allow the test on PowerPC LE.
	* gcc.target/powerpc/fusion2.c (fusion_vector): Likewise.
	* gcc.target/powerpc/fusion3.c: New file, test power9 fusion.

	* gcc.target/powerpc/float128-call.c: Use powerpc_float128_sw_ok
	check instead of powerpc_vsx_ok.
	* gcc.target/powerpc/float128-mix.c: Likewise.

Comments

Michael Meissner Nov. 10, 2015, 12:20 a.m. UTC | #1
Actually, it looks like I changed advanced fusion -> power9 fusion.

Patch

Index: gcc/config/rs6000/constraints.md
===================================================================
--- gcc/config/rs6000/constraints.md	(revision 230064)
+++ gcc/config/rs6000/constraints.md	(working copy)
@@ -137,6 +137,16 @@  (define_constraint "wD"
   (and (match_code "const_int")
        (match_test "TARGET_VSX && (ival == VECTOR_ELEMENT_SCALAR_64BIT)")))
 
+;; Extended fusion store
+(define_memory_constraint "wF"
+  "Memory operand suitable for power9 fusion load/stores"
+  (match_operand 0 "fusion_addis_mem_combo_load"))
+
+;; Fusion gpr load.
+(define_memory_constraint "wG"
+  "Memory operand suitable for TOC fusion memory references"
+  (match_operand 0 "toc_fusion_mem_wrapped"))
+
 ;; Lq/stq validates the address for load/store quad
 (define_memory_constraint "wQ"
   "Memory operand suitable for the load/store quad instructions"
Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(revision 230064)
+++ gcc/config/rs6000/predicates.md	(working copy)
@@ -142,6 +142,11 @@  (define_predicate "u5bit_cint_operand"
   (and (match_code "const_int")
        (match_test "INTVAL (op) >= 0 && INTVAL (op) <= 31")))
 
+;; Return 1 if op is an unsigned 6-bit constant integer.
+(define_predicate "u6bit_cint_operand"
+  (and (match_code "const_int")
+       (match_test "INTVAL (op) >= 0 && INTVAL (op) <= 63")))
+
 ;; Return 1 if op is a signed 8-bit constant integer.
 ;; Integer multiplication complete more quickly
 (define_predicate "s8bit_cint_operand"
@@ -163,6 +168,12 @@  (define_predicate "u_short_cint_operand"
   (and (match_code "const_int")
        (match_test "satisfies_constraint_K (op)")))
 
+;; Return 1 if op is a constant integer that is a signed 16-bit constant
+;; shifted left 16 bits
+(define_predicate "upper16_cint_operand"
+  (and (match_code "const_int")
+       (match_test "satisfies_constraint_L (op)")))
+
 ;; Return 1 if op is a constant integer that cannot fit in a signed D field.
 (define_predicate "non_short_cint_operand"
   (and (match_code "const_int")
@@ -271,6 +282,70 @@  (define_predicate "base_reg_operand"
   return (REGNO (op) != FIRST_GPR_REGNO);
 })
 
+
+;; Return true if this is a traditional floating point register
+(define_predicate "fpr_reg_operand"
+  (match_code "reg,subreg")
+{
+  HOST_WIDE_INT r;
+
+  if (GET_CODE (op) == SUBREG)
+    op = SUBREG_REG (op);
+
+  if (!REG_P (op))
+    return 0;
+
+  r = REGNO (op);
+  if (r >= FIRST_PSEUDO_REGISTER)
+    return 1;
+
+  return FP_REGNO_P (r);
+})
+
+;; Return true if this is a register that can have D-form addressing (GPR and
+;; traditional FPR registers for scalars).  ISA 3.0 (power9) adds D-form
+;; addressing for scalars in Altivec registers.
+;;
+;; If this is a pseudo only allow for GPR fusion in power8.  If we have the
+;; power9 fusion allow the floating point types.
+(define_predicate "toc_fusion_or_p9_reg_operand"
+  (match_code "reg,subreg")
+{
+  HOST_WIDE_INT r;
+  bool gpr_p = (mode == QImode || mode == HImode || mode == SImode
+		|| mode == SFmode
+		|| (TARGET_POWERPC64 && (mode == DImode || mode == DFmode)));
+  bool fpr_p = (TARGET_P9_FUSION
+		&& (mode == DFmode || mode == SFmode
+		    || (TARGET_POWERPC64 && mode == DImode)));
+  bool vmx_p = (TARGET_P9_FUSION && TARGET_P9_VECTOR
+		&& (mode == DFmode || mode == SFmode));
+
+  if (!TARGET_P8_FUSION)
+    return 0;
+
+  if (GET_CODE (op) == SUBREG)
+    op = SUBREG_REG (op);
+
+  if (!REG_P (op))
+    return 0;
+
+  r = REGNO (op);
+  if (r >= FIRST_PSEUDO_REGISTER)
+    return (gpr_p || fpr_p || vmx_p);
+
+  if (INT_REGNO_P (r))
+    return gpr_p;
+
+  if (FP_REGNO_P (r))
+    return fpr_p;
+
+  if (ALTIVEC_REGNO_P (r))
+    return vmx_p;
+
+  return 0;
+})
+
 ;; Return 1 if op is a HTM specific SPR register.
 (define_predicate "htm_spr_reg_operand"
   (match_operand 0 "register_operand")
@@ -1598,6 +1673,35 @@  (define_predicate "small_toc_ref"
   return GET_CODE (op) == UNSPEC && XINT (op, 1) == UNSPEC_TOCREL;
 })
 
+;; Match the TOC memory operand that can be fused with an addis instruction.
+;; This is used in matching a potential fused address before register
+;; allocation.
+(define_predicate "toc_fusion_mem_raw"
+  (match_code "mem")
+{
+  if (!TARGET_TOC_FUSION_INT || !can_create_pseudo_p ())
+    return false;
+
+  return small_toc_ref (XEXP (op, 0), Pmode);
+})
+
+;; Match the memory operand that has been fused with an addis instruction and
+;; wrapped inside of an (unspec [...] UNSPEC_FUSION_ADDIS) wrapper.
+(define_predicate "toc_fusion_mem_wrapped"
+  (match_code "mem")
+{
+  rtx addr;
+
+  if (!TARGET_TOC_FUSION_INT)
+    return false;
+
+  if (!MEM_P (op))
+    return false;
+
+  addr = XEXP (op, 0);
+  return (GET_CODE (addr) == UNSPEC && XINT (addr, 1) == UNSPEC_FUSION_ADDIS);
+})
+
 ;; Match the first insn (addis) in fusing the combination of addis and loads to
 ;; GPR registers on power8.
 (define_predicate "fusion_gpr_addis"
@@ -1620,8 +1724,6 @@  (define_predicate "fusion_gpr_addis"
   else
     return 0;
 
-  /* Power8 currently will only do the fusion if the top 11 bits of the addis
-     value are all 1's or 0's.  */
   value = INTVAL (int_const);
   if ((value & (HOST_WIDE_INT)0xffff) != 0)
     return 0;
@@ -1629,6 +1731,12 @@  (define_predicate "fusion_gpr_addis"
   if ((value & (HOST_WIDE_INT)0xffff0000) == 0)
     return 0;
 
+  /* Power8 currently will only do the fusion if the top 11 bits of the addis
+     value are all 1's or 0's.  Ignore this restriction if we are testing
+     advanced fusion.  */
+  if (TARGET_P9_FUSION)
+    return 1;
+
   return (IN_RANGE (value >> 16, -32, 31));
 })
 
@@ -1694,13 +1802,14 @@  (define_predicate "fusion_gpr_mem_load"
 ;; Match a GPR load (lbz, lhz, lwz, ld) that uses a combined address in the
 ;; memory field with both the addis and the memory offset.  Sign extension
 ;; is not handled here, since lha and lwa are not fused.
-(define_predicate "fusion_gpr_mem_combo"
-  (match_code "mem,zero_extend")
+;; With extended fusion, also match a FPR load (lfd, lfs) and float_extend
+(define_predicate "fusion_addis_mem_combo_load"
+  (match_code "mem,zero_extend,float_extend")
 {
   rtx addr, base, offset;
 
-  /* Handle zero extend.  */
-  if (GET_CODE (op) == ZERO_EXTEND)
+  /* Handle zero/float extend.  */
+  if (GET_CODE (op) == ZERO_EXTEND || GET_CODE (op) == FLOAT_EXTEND)
     {
       op = XEXP (op, 0);
       mode = GET_MODE (op);
@@ -1721,6 +1830,71 @@  (define_predicate "fusion_gpr_mem_combo"
 	return 0;
       break;
 
+    case SFmode:
+    case DFmode:
+      if (!TARGET_P9_FUSION)
+	return 0;
+      break;
+
+    default:
+      return 0;
+    }
+
+  addr = XEXP (op, 0);
+  if (GET_CODE (addr) != PLUS && GET_CODE (addr) != LO_SUM)
+    return 0;
+
+  base = XEXP (addr, 0);
+  if (!fusion_gpr_addis (base, GET_MODE (base)))
+    return 0;
+
+  offset = XEXP (addr, 1);
+  if (GET_CODE (addr) == PLUS)
+    return satisfies_constraint_I (offset);
+
+  else if (GET_CODE (addr) == LO_SUM)
+    {
+      if (TARGET_XCOFF || (TARGET_ELF && TARGET_POWERPC64))
+	return small_toc_ref (offset, GET_MODE (offset));
+
+      else if (TARGET_ELF && !TARGET_POWERPC64)
+	return CONSTANT_P (offset);
+    }
+
+  return 0;
+})
+
+;; Like fusion_addis_mem_combo_load, but for stores
+(define_predicate "fusion_addis_mem_combo_store"
+  (match_code "mem")
+{
+  rtx addr, base, offset;
+
+  if (!MEM_P (op) || !TARGET_P9_FUSION)
+    return 0;
+
+  switch (mode)
+    {
+    case QImode:
+    case HImode:
+    case SImode:
+      break;
+
+    case DImode:
+      if (!TARGET_POWERPC64)
+	return 0;
+      break;
+
+    case SFmode:
+      if (!TARGET_SF_FPR)
+	return 0;
+      break;
+
+    case DFmode:
+      if (!TARGET_DF_FPR)
+	return 0;
+      break;
+
     default:
       return 0;
     }
@@ -1748,3 +1922,20 @@  (define_predicate "fusion_gpr_mem_combo"
 
   return 0;
 })
+
+;; Return true if the operand is a float_extend or zero extend of an
+;; offsettable memory operand suitable for use in fusion
+(define_predicate "fusion_offsettable_mem_operand"
+  (match_code "mem,zero_extend,float_extend")
+{
+  if (GET_CODE (op) == ZERO_EXTEND || GET_CODE (op) == FLOAT_EXTEND)
+    {
+      op = XEXP (op, 0);
+      mode = GET_MODE (op);
+    }
+
+  if (!memory_operand (op, mode))
+    return 0;
+
+  return offsettable_nonstrict_memref_p (op);
+})
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 230064)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -87,7 +87,15 @@  extern bool direct_move_p (rtx, rtx);
 extern bool quad_load_store_p (rtx, rtx);
 extern bool fusion_gpr_load_p (rtx, rtx, rtx, rtx);
 extern void expand_fusion_gpr_load (rtx *);
+extern void emit_fusion_addis (rtx, rtx, const char *, const char *);
+extern void emit_fusion_load_store (rtx, rtx, rtx, const char *);
 extern const char *emit_fusion_gpr_load (rtx, rtx);
+extern bool fusion_p9_p (rtx, rtx, rtx, rtx);
+extern void expand_fusion_p9_load (rtx *);
+extern void expand_fusion_p9_store (rtx *);
+extern const char *emit_fusion_p9_load (rtx, rtx, rtx);
+extern const char *emit_fusion_p9_store (rtx, rtx, rtx);
+extern rtx fusion_wrap_memory_address (rtx);
 extern enum reg_class (*rs6000_preferred_reload_class_ptr) (rtx,
 							    enum reg_class);
 extern enum reg_class (*rs6000_secondary_reload_class_ptr) (enum reg_class,
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 230064)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -376,8 +376,18 @@  struct rs6000_reg_addr {
   enum insn_code reload_fpr_gpr;	/* INSN to move from FPR to GPR.  */
   enum insn_code reload_gpr_vsx;	/* INSN to move from GPR to VSX.  */
   enum insn_code reload_vsx_gpr;	/* INSN to move from VSX to GPR.  */
+  enum insn_code fusion_gpr_ld;		/* INSN for fusing gpr ADDIS/loads.  */
+					/* INSNs for fusing addi with loads
+					   or stores for each reg. class.  */					   
+  enum insn_code fusion_addi_ld[(int)N_RELOAD_REG];
+  enum insn_code fusion_addi_st[(int)N_RELOAD_REG];
+					/* INSNs for fusing addis with loads
+					   or stores for each reg. class.  */					   
+  enum insn_code fusion_addis_ld[(int)N_RELOAD_REG];
+  enum insn_code fusion_addis_st[(int)N_RELOAD_REG];
   addr_mask_type addr_mask[(int)N_RELOAD_REG]; /* Valid address masks.  */
   bool scalar_in_vmx_p;			/* Scalar value can go in VMX.  */
+  bool fused_toc;			/* Mode supports TOC fusion.  */
 };
 
 static struct rs6000_reg_addr reg_addr[NUM_MACHINE_MODES];
@@ -2026,25 +2036,113 @@  DEBUG_FUNCTION void
 rs6000_debug_print_mode (ssize_t m)
 {
   ssize_t rc;
+  int spaces = 0;
+  bool fuse_extra_p;
 
   fprintf (stderr, "Mode: %-5s", GET_MODE_NAME (m));
   for (rc = 0; rc < N_RELOAD_REG; rc++)
     fprintf (stderr, " %s: %s", reload_reg_map[rc].name,
 	     rs6000_debug_addr_mask (reg_addr[m].addr_mask[rc], true));
 
+  if ((reg_addr[m].reload_store != CODE_FOR_nothing)
+      || (reg_addr[m].reload_load != CODE_FOR_nothing))
+    fprintf (stderr, "  Reload=%c%c",
+	     (reg_addr[m].reload_store != CODE_FOR_nothing) ? 's' : '*',
+	     (reg_addr[m].reload_load != CODE_FOR_nothing) ? 'l' : '*');
+  else
+    spaces += sizeof ("  Reload=sl") - 1;
+
+  if (reg_addr[m].scalar_in_vmx_p)
+    {
+      fprintf (stderr, "%*s  Upper=y", spaces, "");
+      spaces = 0;
+    }
+  else
+    spaces += sizeof ("  Upper=y") - 1;
+
+  fuse_extra_p = ((reg_addr[m].fusion_gpr_ld != CODE_FOR_nothing)
+		  || reg_addr[m].fused_toc);
+  if (!fuse_extra_p)
+    {
+      for (rc = 0; rc < N_RELOAD_REG; rc++)
+	{
+	  if (rc != RELOAD_REG_ANY)
+	    {
+	      if (reg_addr[m].fusion_addi_ld[rc]     != CODE_FOR_nothing
+		  || reg_addr[m].fusion_addi_ld[rc]  != CODE_FOR_nothing
+		  || reg_addr[m].fusion_addi_st[rc]  != CODE_FOR_nothing
+		  || reg_addr[m].fusion_addis_ld[rc] != CODE_FOR_nothing
+		  || reg_addr[m].fusion_addis_st[rc] != CODE_FOR_nothing)
+		{
+		  fuse_extra_p = true;
+		  break;
+		}
+	    }
+	}
+    }
+
+  if (fuse_extra_p)
+    {
+      fprintf (stderr, "%*s  Fuse:", spaces, "");
+      spaces = 0;
+
+      for (rc = 0; rc < N_RELOAD_REG; rc++)
+	{
+	  if (rc != RELOAD_REG_ANY)
+	    {
+	      char load, store;
+
+	      if (reg_addr[m].fusion_addis_ld[rc] != CODE_FOR_nothing)
+		load = 'l';
+	      else if (reg_addr[m].fusion_addi_ld[rc] != CODE_FOR_nothing)
+		load = 'L';
+	      else
+		load = '-';
+
+	      if (reg_addr[m].fusion_addis_st[rc] != CODE_FOR_nothing)
+		store = 's';
+	      else if (reg_addr[m].fusion_addi_st[rc] != CODE_FOR_nothing)
+		store = 'S';
+	      else
+		store = '-';
+
+	      if (load == '-' && store == '-')
+		spaces += 5;
+	      else
+		{
+		  fprintf (stderr, "%*s%c=%c%c", (spaces + 1), "",
+			   reload_reg_map[rc].name[0], load, store);
+		  spaces = 0;
+		}
+	    }
+	}
+
+      if (reg_addr[m].fusion_gpr_ld != CODE_FOR_nothing)
+	{
+	  fprintf (stderr, "%*sP8gpr", (spaces + 1), "");
+	  spaces = 0;
+	}
+      else
+	spaces += sizeof (" P8gpr") - 1;
+
+      if (reg_addr[m].fused_toc)
+	{
+	  fprintf (stderr, "%*sToc", (spaces + 1), "");
+	  spaces = 0;
+	}
+      else
+	spaces += sizeof (" Toc") - 1;
+    }
+  else
+    spaces += sizeof ("  Fuse: G=ls F=ls v=ls P8gpr Toc") - 1;
+
   if (rs6000_vector_unit[m] != VECTOR_NONE
-      || rs6000_vector_mem[m] != VECTOR_NONE
-      || (reg_addr[m].reload_store != CODE_FOR_nothing)
-      || (reg_addr[m].reload_load != CODE_FOR_nothing)
-      || reg_addr[m].scalar_in_vmx_p)
+      || rs6000_vector_mem[m] != VECTOR_NONE)
     {
-      fprintf (stderr,
-	       "  Vector-arith=%-10s Vector-mem=%-10s Reload=%c%c Upper=%c",
+      fprintf (stderr, "%*s  vector: arith=%-10s mem=%s",
+	       spaces, "",
 	       rs6000_debug_vector_unit (rs6000_vector_unit[m]),
-	       rs6000_debug_vector_unit (rs6000_vector_mem[m]),
-	       (reg_addr[m].reload_store != CODE_FOR_nothing) ? 's' : '*',
-	       (reg_addr[m].reload_load != CODE_FOR_nothing) ? 'l' : '*',
-	       (reg_addr[m].scalar_in_vmx_p) ? 'y' : 'n');
+	       rs6000_debug_vector_unit (rs6000_vector_mem[m]));
     }
 
   fputs ("\n", stderr);
@@ -3019,6 +3117,130 @@  rs6000_init_hard_regno_mode_ok (bool glo
 	reg_addr[SFmode].scalar_in_vmx_p = true;
     }
 
+  /* Setup the fusion operations.  */
+  if (TARGET_P8_FUSION)
+    {
+      reg_addr[QImode].fusion_gpr_ld = CODE_FOR_fusion_gpr_load_qi;
+      reg_addr[HImode].fusion_gpr_ld = CODE_FOR_fusion_gpr_load_hi;
+      reg_addr[SImode].fusion_gpr_ld = CODE_FOR_fusion_gpr_load_si;
+      if (TARGET_64BIT)
+	reg_addr[DImode].fusion_gpr_ld = CODE_FOR_fusion_gpr_load_di;
+    }
+
+  if (TARGET_P9_FUSION)
+    {
+      struct fuse_insns {
+	enum machine_mode mode;			/* mode of the fused type.  */
+	enum machine_mode pmode;		/* pointer mode.  */
+	enum rs6000_reload_reg_type rtype;	/* register type.  */
+	enum insn_code load;			/* load insn.  */
+	enum insn_code store;			/* store insn.  */
+      };
+
+      static const struct fuse_insns addis_insns[] = {
+	{ SFmode, DImode, RELOAD_REG_FPR,
+	  CODE_FOR_fusion_fpr_di_sf_load,
+	  CODE_FOR_fusion_fpr_di_sf_store },
+
+	{ SFmode, SImode, RELOAD_REG_FPR,
+	  CODE_FOR_fusion_fpr_si_sf_load,
+	  CODE_FOR_fusion_fpr_si_sf_store },
+
+	{ DFmode, DImode, RELOAD_REG_FPR,
+	  CODE_FOR_fusion_fpr_di_df_load,
+	  CODE_FOR_fusion_fpr_di_df_store },
+
+	{ DFmode, SImode, RELOAD_REG_FPR,
+	  CODE_FOR_fusion_fpr_si_df_load,
+	  CODE_FOR_fusion_fpr_si_df_store },
+
+	{ DImode, DImode, RELOAD_REG_FPR,
+	  CODE_FOR_fusion_fpr_di_di_load,
+	  CODE_FOR_fusion_fpr_di_di_store },
+
+	{ DImode, SImode, RELOAD_REG_FPR,
+	  CODE_FOR_fusion_fpr_si_di_load,
+	  CODE_FOR_fusion_fpr_si_di_store },
+
+	{ QImode, DImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_di_qi_load,
+	  CODE_FOR_fusion_gpr_di_qi_store },
+
+	{ QImode, SImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_si_qi_load,
+	  CODE_FOR_fusion_gpr_si_qi_store },
+
+	{ HImode, DImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_di_hi_load,
+	  CODE_FOR_fusion_gpr_di_hi_store },
+
+	{ HImode, SImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_si_hi_load,
+	  CODE_FOR_fusion_gpr_si_hi_store },
+
+	{ SImode, DImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_di_si_load,
+	  CODE_FOR_fusion_gpr_di_si_store },
+
+	{ SImode, SImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_si_si_load,
+	  CODE_FOR_fusion_gpr_si_si_store },
+
+	{ SFmode, DImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_di_sf_load,
+	  CODE_FOR_fusion_gpr_di_sf_store },
+
+	{ SFmode, SImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_si_sf_load,
+	  CODE_FOR_fusion_gpr_si_sf_store },
+
+	{ DImode, DImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_di_di_load,
+	  CODE_FOR_fusion_gpr_di_di_store },
+
+	{ DFmode, DImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_di_df_load,
+	  CODE_FOR_fusion_gpr_di_df_store },
+      };
+
+      enum machine_mode cur_pmode = Pmode;
+      size_t i;
+
+      for (i = 0; i < ARRAY_SIZE (addis_insns); i++)
+	{
+	  enum machine_mode xmode = addis_insns[i].mode;
+	  enum rs6000_reload_reg_type rtype = addis_insns[i].rtype;
+
+	  if (addis_insns[i].pmode != cur_pmode)
+	    continue;
+
+	  if (rtype == RELOAD_REG_FPR
+	      && (!TARGET_HARD_FLOAT || !TARGET_FPRS))
+	    continue;
+
+	  reg_addr[xmode].fusion_addis_ld[rtype] = addis_insns[i].load;
+	  reg_addr[xmode].fusion_addis_st[rtype] = addis_insns[i].store;
+	}
+    }
+
+  /* Note which types we support fusing TOC setup plus memory insn.  We only do
+     fused TOCs for medium/large code models.  */
+  if (TARGET_P8_FUSION && TARGET_TOC_FUSION && TARGET_POWERPC64
+      && (TARGET_CMODEL != CMODEL_SMALL))
+    {
+      reg_addr[QImode].fused_toc = true;
+      reg_addr[HImode].fused_toc = true;
+      reg_addr[SImode].fused_toc = true;
+      reg_addr[DImode].fused_toc = true;
+      if (TARGET_HARD_FLOAT && TARGET_FPRS)
+	{
+	  if (TARGET_SINGLE_FLOAT)
+	    reg_addr[SFmode].fused_toc = true;
+	  if (TARGET_DOUBLE_FLOAT)
+	    reg_addr[DFmode].fused_toc = true;
+	}
+    }
+
   /* Precalculate HARD_REGNO_NREGS.  */
   for (r = 0; r < FIRST_PSEUDO_REGISTER; ++r)
     for (m = 0; m < NUM_MACHINE_MODES; ++m)
@@ -8127,6 +8349,8 @@  rs6000_legitimate_address_p (machine_mod
       && legitimate_constant_pool_address_p (x, mode,
 					     reg_ok_strict || lra_in_progress))
     return 1;
+  if (reg_offset_p && reg_addr[mode].fused_toc && toc_fusion_mem_wrapped (x, mode))
+    return 1;
   /* For TImode, if we have load/store quad and TImode in VSX registers, only
      allow register indirect addresses.  This will allow the values to go in
      either GPRs or VSX registers without reloading.  The vector types would
@@ -31851,12 +32075,15 @@  rs6000_rtx_costs (rtx x, machine_mode mo
 	  else
 	    *total = rs6000_cost->divsi;
 	}
-      /* Add in shift and subtract for MOD. */
-      if (code == MOD || code == UMOD)
+      /* Add in shift and subtract for MOD unless we have a mod instruction. */
+      if (!TARGET_MODULO && (code == MOD || code == UMOD))
 	*total += COSTS_N_INSNS (2);
       return false;
 
     case CTZ:
+      *total = COSTS_N_INSNS (TARGET_CTZ ? 1 : 4);
+      return false;
+
     case FFS:
       *total = COSTS_N_INSNS (4);
       return false;
@@ -31931,6 +32158,17 @@  rs6000_rtx_costs (rtx x, machine_mode mo
       return false;
 
     case ASHIFT:
+      /* The EXTSWSLI instruction is a combined instruction.  Don't count both
+	 the sign extend and shift separately within the insn.  */
+      if (TARGET_EXTSWSLI && mode == DImode
+	  && GET_CODE (XEXP (x, 0)) == SIGN_EXTEND
+	  && GET_MODE (XEXP (XEXP (x, 0), 0)) == SImode)
+	{
+	  *total = 0;
+	  return false;
+	}
+      /* fall through */
+	  
     case ASHIFTRT:
     case LSHIFTRT:
     case ROTATE:
@@ -35202,72 +35440,21 @@  expand_fusion_gpr_load (rtx *operands)
   return;
 }
 
-/* Return a string to fuse an addis instruction with a gpr load to the same
-   register that we loaded up the addis instruction.  The address that is used
-   is the logical address that was formed during peephole2:
-	(lo_sum (high) (low-part))
-
-   The code is complicated, so we call output_asm_insn directly, and just
-   return "".  */
+/* Emit the addis instruction that will be part of a fused instruction
+   sequence.  */
 
-const char *
-emit_fusion_gpr_load (rtx target, rtx mem)
+void
+emit_fusion_addis (rtx target, rtx addis_value, const char *comment,
+		   const char *mode_name)
 {
-  rtx addis_value;
   rtx fuse_ops[10];
-  rtx addr;
-  rtx load_offset;
-  const char *addis_str = NULL;
-  const char *load_str = NULL;
-  const char *mode_name = NULL;
   char insn_template[80];
-  machine_mode mode;
+  const char *addis_str = NULL;
   const char *comment_str = ASM_COMMENT_START;
 
-  if (GET_CODE (mem) == ZERO_EXTEND)
-    mem = XEXP (mem, 0);
-
-  gcc_assert (REG_P (target) && MEM_P (mem));
-
   if (*comment_str == ' ')
     comment_str++;
 
-  addr = XEXP (mem, 0);
-  if (GET_CODE (addr) != PLUS && GET_CODE (addr) != LO_SUM)
-    gcc_unreachable ();
-
-  addis_value = XEXP (addr, 0);
-  load_offset = XEXP (addr, 1);
-
-  /* Now emit the load instruction to the same register.  */
-  mode = GET_MODE (mem);
-  switch (mode)
-    {
-    case QImode:
-      mode_name = "char";
-      load_str = "lbz";
-      break;
-
-    case HImode:
-      mode_name = "short";
-      load_str = "lhz";
-      break;
-
-    case SImode:
-      mode_name = "int";
-      load_str = "lwz";
-      break;
-
-    case DImode:
-      gcc_assert (TARGET_POWERPC64);
-      mode_name = "long";
-      load_str = "ld";
-      break;
-
-    default:
-      gcc_unreachable ();
-    }
-
   /* Emit the addis instruction.  */
   fuse_ops[0] = target;
   if (satisfies_constraint_L (addis_value))
@@ -35346,67 +35533,530 @@  emit_fusion_gpr_load (rtx target, rtx me
   if (!addis_str)
     fatal_insn ("Could not generate addis value for fusion", addis_value);
 
-  sprintf (insn_template, "%s\t\t%s gpr load fusion, type %s", addis_str,
-	   comment_str, mode_name);
+  sprintf (insn_template, "%s\t\t%s %s, type %s", addis_str, comment_str,
+	   comment, mode_name);
   output_asm_insn (insn_template, fuse_ops);
+}
 
-  /* Emit the D-form load instruction.  */
-  if (CONST_INT_P (load_offset) && satisfies_constraint_I (load_offset))
+/* Emit a D-form load or store instruction that is the second instruction
+   of a fusion sequence.  */
+
+void
+emit_fusion_load_store (rtx load_store_reg, rtx addis_reg, rtx offset,
+			const char *insn_str)
+{
+  rtx fuse_ops[10];
+  char insn_template[80];
+
+  fuse_ops[0] = load_store_reg;
+  fuse_ops[1] = addis_reg;
+
+  if (CONST_INT_P (offset) && satisfies_constraint_I (offset))
     {
-      sprintf (insn_template, "%s %%0,%%1(%%0)", load_str);
-      fuse_ops[1] = load_offset;
+      sprintf (insn_template, "%s %%0,%%2(%%1)", insn_str);
+      fuse_ops[2] = offset;
       output_asm_insn (insn_template, fuse_ops);
     }
 
-  else if (GET_CODE (load_offset) == UNSPEC
-	   && XINT (load_offset, 1) == UNSPEC_TOCREL)
+  else if (GET_CODE (offset) == UNSPEC
+	   && XINT (offset, 1) == UNSPEC_TOCREL)
     {
       if (TARGET_ELF)
-	sprintf (insn_template, "%s %%0,%%1@toc@l(%%0)", load_str);
+	sprintf (insn_template, "%s %%0,%%2@toc@l(%%1)", insn_str);
 
       else if (TARGET_XCOFF)
-	sprintf (insn_template, "%s %%0,%%1@l(%%0)", load_str);
+	sprintf (insn_template, "%s %%0,%%2@l(%%1)", insn_str);
 
       else
 	gcc_unreachable ();
 
-      fuse_ops[1] = XVECEXP (load_offset, 0, 0);
+      fuse_ops[2] = XVECEXP (offset, 0, 0);
       output_asm_insn (insn_template, fuse_ops);
     }
 
-  else if (GET_CODE (load_offset) == PLUS
-	   && GET_CODE (XEXP (load_offset, 0)) == UNSPEC
-	   && XINT (XEXP (load_offset, 0), 1) == UNSPEC_TOCREL
-	   && CONST_INT_P (XEXP (load_offset, 1)))
+  else if (GET_CODE (offset) == PLUS
+	   && GET_CODE (XEXP (offset, 0)) == UNSPEC
+	   && XINT (XEXP (offset, 0), 1) == UNSPEC_TOCREL
+	   && CONST_INT_P (XEXP (offset, 1)))
     {
-      rtx tocrel_unspec = XEXP (load_offset, 0);
+      rtx tocrel_unspec = XEXP (offset, 0);
       if (TARGET_ELF)
-	sprintf (insn_template, "%s %%0,%%1+%%2@toc@l(%%0)", load_str);
+	sprintf (insn_template, "%s %%0,%%2+%%3@toc@l(%%1)", insn_str);
 
       else if (TARGET_XCOFF)
-	sprintf (insn_template, "%s %%0,%%1+%%2@l(%%0)", load_str);
+	sprintf (insn_template, "%s %%0,%%2+%%3@l(%%1)", insn_str);
 
       else
 	gcc_unreachable ();
 
-      fuse_ops[1] = XVECEXP (tocrel_unspec, 0, 0);
-      fuse_ops[2] = XEXP (load_offset, 1);
+      fuse_ops[2] = XVECEXP (tocrel_unspec, 0, 0);
+      fuse_ops[3] = XEXP (offset, 1);
       output_asm_insn (insn_template, fuse_ops);
     }
 
-  else if (TARGET_ELF && !TARGET_POWERPC64 && CONSTANT_P (load_offset))
+  else if (TARGET_ELF && !TARGET_POWERPC64 && CONSTANT_P (offset))
     {
-      sprintf (insn_template, "%s %%0,%%1@l(%%0)", load_str);
+      sprintf (insn_template, "%s %%0,%%2@l(%%1)", insn_str);
 
-      fuse_ops[1] = load_offset;
+      fuse_ops[2] = offset;
       output_asm_insn (insn_template, fuse_ops);
     }
 
   else
-    fatal_insn ("Unable to generate load offset for fusion", load_offset);
+    fatal_insn ("Unable to generate load/store offset for fusion", offset);
+
+  return;
+}
+
+/* Wrap a TOC address that can be fused to indicate that special fusion
+   processing is needed.  */
+
+rtx
+fusion_wrap_memory_address (rtx old_mem)
+{
+  rtx old_addr = XEXP (old_mem, 0);
+  rtvec v = gen_rtvec (1, old_addr);
+  rtx new_addr = gen_rtx_UNSPEC (Pmode, v, UNSPEC_FUSION_ADDIS);
+  return replace_equiv_address_nv (old_mem, new_addr, false);
+}
+
+/* Given an address, convert it into the addis and load offset parts.  Addresses
+   created during the peephole2 process look like:
+	(lo_sum (high (unspec [(sym)] UNSPEC_TOCREL))
+		(unspec [(...)] UNSPEC_TOCREL))
+
+   Addresses created via toc fusion look like:
+	(unspec [(unspec [(...)] UNSPEC_TOCREL)] UNSPEC_FUSION_ADDIS)  */
+
+static void
+fusion_split_address (rtx addr, rtx *p_hi, rtx *p_lo)
+{
+  rtx hi, lo;
+
+  if (GET_CODE (addr) == UNSPEC && XINT (addr, 1) == UNSPEC_FUSION_ADDIS)
+    {
+      lo = XVECEXP (addr, 0, 0);
+      hi = gen_rtx_HIGH (Pmode, lo);
+    }
+  else if (GET_CODE (addr) == PLUS || GET_CODE (addr) == LO_SUM)
+    {
+      hi = XEXP (addr, 0);
+      lo = XEXP (addr, 1);
+    }
+  else
+    gcc_unreachable ();
+
+  *p_hi = hi;
+  *p_lo = lo;
+}
+
+/* Return a string to fuse an addis instruction with a gpr load to the same
+   register that the addis instruction set.  The address that is used
+   is the logical address that was formed during peephole2:
+	(lo_sum (high) (low-part))
+
+   Or the address is the TOC address that is wrapped before register allocation:
+	(unspec [(addr) (toc-reg)] UNSPEC_FUSION_ADDIS)
+
+   The code is complicated, so we call output_asm_insn directly, and just
+   return "".  */
+
+const char *
+emit_fusion_gpr_load (rtx target, rtx mem)
+{
+  rtx addis_value;
+  rtx addr;
+  rtx load_offset;
+  const char *load_str = NULL;
+  const char *mode_name = NULL;
+  machine_mode mode;
+
+  if (GET_CODE (mem) == ZERO_EXTEND)
+    mem = XEXP (mem, 0);
+
+  gcc_assert (REG_P (target) && MEM_P (mem));
+
+  addr = XEXP (mem, 0);
+  fusion_split_address (addr, &addis_value, &load_offset);
+
+  /* Now emit the load instruction to the same register.  */
+  mode = GET_MODE (mem);
+  switch (mode)
+    {
+    case QImode:
+      mode_name = "char";
+      load_str = "lbz";
+      break;
+
+    case HImode:
+      mode_name = "short";
+      load_str = "lhz";
+      break;
+
+    case SImode:
+    case SFmode:
+      mode_name = (mode == SFmode) ? "float" : "int";
+      load_str = "lwz";
+      break;
+
+    case DImode:
+    case DFmode:
+      gcc_assert (TARGET_POWERPC64);
+      mode_name = (mode == DFmode) ? "double" : "long";
+      load_str = "ld";
+      break;
+
+    default:
+      fatal_insn ("Bad GPR fusion", gen_rtx_SET (target, mem));
+    }
+
+  /* Emit the addis instruction.  */
+  emit_fusion_addis (target, addis_value, "gpr load fusion", mode_name);
+
+  /* Emit the D-form load instruction.  */
+  emit_fusion_load_store (target, target, load_offset, load_str);
+
+  return "";
+}
+
+
+/* Return true if the peephole2 can combine a load/store involving a
+   combination of an addis instruction and the memory operation.  This was
+   added to the ISA 3.0 (power9) hardware.  */
+
+bool
+fusion_p9_p (rtx addis_reg,		/* register set via addis.  */
+	     rtx addis_value,		/* addis value.  */
+	     rtx dest,			/* destination (memory or register). */
+	     rtx src)			/* source (register or memory).  */
+{
+  rtx addr, mem, offset;
+  enum machine_mode mode = GET_MODE (src);
+
+  /* Validate arguments.  */
+  if (!base_reg_operand (addis_reg, GET_MODE (addis_reg)))
+    return false;
+
+  if (!fusion_gpr_addis (addis_value, GET_MODE (addis_value)))
+    return false;
+
+  /* Ignore extend operations that are part of the load.  */
+  if (GET_CODE (src) == FLOAT_EXTEND || GET_CODE (src) == ZERO_EXTEND)
+    src = XEXP (src, 0);
+
+  /* Test for memory<-register or register<-memory.  */
+  if (fpr_reg_operand (src, mode) || int_reg_operand (src, mode))
+    {
+      if (!MEM_P (dest))
+	return false;
+
+      mem = dest;
+    }
+
+  else if (MEM_P (src))
+    {
+      if (!fpr_reg_operand (dest, mode) && !int_reg_operand (dest, mode))
+	return false;
+
+      mem = src;
+    }
+
+  else
+    return false;
+
+  addr = XEXP (mem, 0);			/* either PLUS or LO_SUM.  */
+  if (GET_CODE (addr) == PLUS)
+    {
+      if (!rtx_equal_p (addis_reg, XEXP (addr, 0)))
+	return false;
+
+      return satisfies_constraint_I (XEXP (addr, 1));
+    }
+
+  else if (GET_CODE (addr) == LO_SUM)
+    {
+      if (!rtx_equal_p (addis_reg, XEXP (addr, 0)))
+	return false;
+
+      offset = XEXP (addr, 1);
+      if (TARGET_XCOFF || (TARGET_ELF && TARGET_POWERPC64))
+	return small_toc_ref (offset, GET_MODE (offset));
+
+      else if (TARGET_ELF && !TARGET_POWERPC64)
+	return CONSTANT_P (offset);
+    }
+
+  return false;
+}
+
+/* During the peephole2 pass, adjust and expand the insns for an extended fusion
+   load sequence.
+
+   The operands are:
+	operands[0]	register set with addis
+	operands[1]	value set via addis
+	operands[2]	target register being loaded
+	operands[3]	D-form memory reference using operands[0].
+
+  This is similar to the fusion introduced with power8, except it scales to
+  both loads/stores and does not require the result register to be the same as
+  the base register.  At the moment, we only do this if the register set with
+  addis is dead.  */
+
+void
+expand_fusion_p9_load (rtx *operands)
+{
+  rtx tmp_reg = operands[0];
+  rtx addis_value = operands[1];
+  rtx target = operands[2];
+  rtx orig_mem = operands[3];
+  rtx  new_addr, new_mem, orig_addr, offset, set, clobber, insn;
+  enum rtx_code plus_or_lo_sum;
+  machine_mode target_mode = GET_MODE (target);
+  machine_mode extend_mode = target_mode;
+  machine_mode ptr_mode = Pmode;
+  enum rtx_code extend = UNKNOWN;
+
+  if (GET_CODE (orig_mem) == FLOAT_EXTEND || GET_CODE (orig_mem) == ZERO_EXTEND)
+    {
+      extend = GET_CODE (orig_mem);
+      orig_mem = XEXP (orig_mem, 0);
+      target_mode = GET_MODE (orig_mem);
+    }
+
+  gcc_assert (MEM_P (orig_mem));
+
+  orig_addr = XEXP (orig_mem, 0);
+  plus_or_lo_sum = GET_CODE (orig_addr);
+  gcc_assert (plus_or_lo_sum == PLUS || plus_or_lo_sum == LO_SUM);
+
+  offset = XEXP (orig_addr, 1);
+  new_addr = gen_rtx_fmt_ee (plus_or_lo_sum, ptr_mode, addis_value, offset);
+  new_mem = replace_equiv_address_nv (orig_mem, new_addr, false);
+
+  if (extend != UNKNOWN)
+    new_mem = gen_rtx_fmt_e (extend, extend_mode, new_mem);
+
+  new_mem = gen_rtx_UNSPEC (extend_mode, gen_rtvec (1, new_mem),
+			    UNSPEC_FUSION_P9);
+
+  set = gen_rtx_SET (target, new_mem);
+  clobber = gen_rtx_CLOBBER (VOIDmode, tmp_reg);
+  insn = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set, clobber));
+  emit_insn (insn);
+
+  return;
+}
+
+/* During the peephole2 pass, adjust and expand the insns for an extended fusion
+   store sequence.
+
+   The operands are:
+	operands[0]	register set with addis
+	operands[1]	value set via addis
+	operands[2]	target D-form memory being stored to
+	operands[3]	register being stored
+
+  This is similar to the fusion introduced with power8, except it scales to
+  both loads/stores and does not require the result register to be the same as
+  the base register.  At the moment, we only do this if the register set with
+  addis is dead.  */
+
+void
+expand_fusion_p9_store (rtx *operands)
+{
+  rtx tmp_reg = operands[0];
+  rtx addis_value = operands[1];
+  rtx orig_mem = operands[2];
+  rtx src = operands[3];
+  rtx  new_addr, new_mem, orig_addr, offset, set, clobber, insn, new_src;
+  enum rtx_code plus_or_lo_sum;
+  machine_mode target_mode = GET_MODE (orig_mem);
+  machine_mode ptr_mode = Pmode;
+
+  gcc_assert (MEM_P (orig_mem));
+
+  orig_addr = XEXP (orig_mem, 0);
+  plus_or_lo_sum = GET_CODE (orig_addr);
+  gcc_assert (plus_or_lo_sum == PLUS || plus_or_lo_sum == LO_SUM);
+
+  offset = XEXP (orig_addr, 1);
+  new_addr = gen_rtx_fmt_ee (plus_or_lo_sum, ptr_mode, addis_value, offset);
+  new_mem = replace_equiv_address_nv (orig_mem, new_addr, false);
+
+  new_src = gen_rtx_UNSPEC (target_mode, gen_rtvec (1, src),
+			    UNSPEC_FUSION_P9);
+
+  set = gen_rtx_SET (new_mem, new_src);
+  clobber = gen_rtx_CLOBBER (VOIDmode, tmp_reg);
+  insn = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set, clobber));
+  emit_insn (insn);
+
+  return;
+}
+
+/* Return a string to fuse an addis instruction with a load using extended
+   fusion.  The address that is used is the logical address that was formed
+   during peephole2: (lo_sum (high) (low-part))
+
+   The code is complicated, so we call output_asm_insn directly, and just
+   return "".  */
+
+const char *
+emit_fusion_p9_load (rtx reg, rtx mem, rtx tmp_reg)
+{
+  enum machine_mode mode = GET_MODE (reg);
+  rtx hi;
+  rtx lo;
+  rtx addr;
+  const char *load_string;
+  int r;
+
+  if (GET_CODE (mem) == FLOAT_EXTEND || GET_CODE (mem) == ZERO_EXTEND)
+    {
+      mem = XEXP (mem, 0);
+      mode = GET_MODE (mem);
+    }
+
+  if (GET_CODE (reg) == SUBREG)
+    {
+      gcc_assert (SUBREG_BYTE (reg) == 0);
+      reg = SUBREG_REG (reg);
+    }
+
+  if (!REG_P (reg))
+    fatal_insn ("emit_fusion_p9_load, bad reg #1", reg);
+
+  r = REGNO (reg);
+  if (FP_REGNO_P (r))
+    {
+      if (mode == SFmode)
+	load_string = "lfs";
+      else if (mode == DFmode || mode == DImode)
+	load_string = "lfd";
+      else
+	gcc_unreachable ();
+    }
+  else if (INT_REGNO_P (r))
+    {
+      switch (mode)
+	{
+	case QImode:
+	  load_string = "lbz";
+	  break;
+	case HImode:
+	  load_string = "lhz";
+	  break;
+	case SImode:
+	case SFmode:
+	  load_string = "lwz";
+	  break;
+	case DImode:
+	case DFmode:
+	  if (!TARGET_POWERPC64)
+	    gcc_unreachable ();
+	  load_string = "ld";
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+    }
+  else
+    fatal_insn ("emit_fusion_p9_load, bad reg #2", reg);
+
+  if (!MEM_P (mem))
+    fatal_insn ("emit_fusion_p9_load not MEM", mem);
+
+  addr = XEXP (mem, 0);
+  fusion_split_address (addr, &hi, &lo);
+
+  /* Emit the addis instruction.  */
+  emit_fusion_addis (tmp_reg, hi, "power9 load fusion", GET_MODE_NAME (mode));
+
+  /* Emit the D-form load instruction.  */
+  emit_fusion_load_store (reg, tmp_reg, lo, load_string);
 
   return "";
 }
+
+/* Return a string to fuse an addis instruction with a store using extended
+   fusion.  The address that is used is the logical address that was formed
+   during peephole2: (lo_sum (high) (low-part))
+
+   The code is complicated, so we call output_asm_insn directly, and just
+   return "".  */
+
+const char *
+emit_fusion_p9_store (rtx mem, rtx reg, rtx tmp_reg)
+{
+  enum machine_mode mode = GET_MODE (reg);
+  rtx hi;
+  rtx lo;
+  rtx addr;
+  const char *store_string;
+  int r;
+
+  if (GET_CODE (reg) == SUBREG)
+    {
+      gcc_assert (SUBREG_BYTE (reg) == 0);
+      reg = SUBREG_REG (reg);
+    }
+
+  if (!REG_P (reg))
+    fatal_insn ("emit_fusion_p9_store, bad reg #1", reg);
+
+  r = REGNO (reg);
+  if (FP_REGNO_P (r))
+    {
+      if (mode == SFmode)
+	store_string = "stfs";
+      else if (mode == DFmode)
+	store_string = "stfd";
+      else
+	gcc_unreachable ();
+    }
+  else if (INT_REGNO_P (r))
+    {
+      switch (mode)
+	{
+	case QImode:
+	  store_string = "stb";
+	  break;
+	case HImode:
+	  store_string = "sth";
+	  break;
+	case SImode:
+	case SFmode:
+	  store_string = "stw";
+	  break;
+	case DImode:
+	case DFmode:
+	  if (!TARGET_POWERPC64)
+	    gcc_unreachable ();
+	  store_string = "std";
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+    }
+  else
+    fatal_insn ("emit_fusion_p9_store, bad reg #2", reg);
+
+  if (!MEM_P (mem))
+    fatal_insn ("emit_fusion_p9_store not MEM", mem);
+
+  addr = XEXP (mem, 0);
+  fusion_split_address (addr, &hi, &lo);
+
+  /* Emit the addis instruction.  */
+  emit_fusion_addis (tmp_reg, hi, "power9 store fusion", GET_MODE_NAME (mode));
+
+  /* Emit the D-form store instruction.  */
+  emit_fusion_load_store (reg, tmp_reg, lo, store_string);
+
+  return "";
+}
+
 
 /* Analyze vector computations and remove unnecessary doubleword
    swaps (xxswapdi instructions).  This pass is performed only
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	(revision 230064)
+++ gcc/config/rs6000/rs6000.h	(working copy)
@@ -565,6 +565,8 @@  extern int rs6000_vector_align[];
 #define TARGET_FCFIDUS	TARGET_POPCNTD
 #define TARGET_FCTIDUZ	TARGET_POPCNTD
 #define TARGET_FCTIWUZ	TARGET_POPCNTD
+#define TARGET_CTZ	TARGET_MODULO
+#define TARGET_EXTSWSLI	(TARGET_MODULO && TARGET_POWERPC64)
 
 #define TARGET_XSCVDPSPN	(TARGET_DIRECT_MOVE || TARGET_P8_VECTOR)
 #define TARGET_XSCVSPDPN	(TARGET_DIRECT_MOVE || TARGET_P8_VECTOR)
@@ -701,6 +703,22 @@  extern int rs6000_vector_align[];
 			 && TARGET_DOUBLE_FLOAT \
 			 && (TARGET_PPC_GFXOPT || VECTOR_UNIT_VSX_P (DFmode)))
 
+/* Conditions to allow TOC fusion for loading/storing integers.  */
+#define TARGET_TOC_FUSION_INT	(TARGET_P8_FUSION			\
+				 && TARGET_TOC_FUSION			\
+				 && (TARGET_CMODEL != CMODEL_SMALL)	\
+				 && TARGET_POWERPC64)
+
+/* Conditions to allow TOC fusion for loading/storing floating point.  */
+#define TARGET_TOC_FUSION_FP	(TARGET_P9_FUSION			\
+				 && TARGET_TOC_FUSION			\
+				 && (TARGET_CMODEL != CMODEL_SMALL)	\
+				 && TARGET_POWERPC64			\
+				 && TARGET_HARD_FLOAT			\
+				 && TARGET_FPRS				\
+				 && TARGET_SINGLE_FLOAT			\
+				 && TARGET_DOUBLE_FLOAT)
+
 /* Whether the various reciprocal divide/square root estimate instructions
    exist, and whether we should automatically generate code for the instruction
    by default.  */
@@ -2095,8 +2113,12 @@  do {									     \
 #define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
   ((VALUE) = ((MODE) == SImode ? 32 : 64), 1)
 
-/* The CTZ patterns return -1 for input of zero.  */
-#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) ((VALUE) = -1, 1)
+/* The CTZ patterns that are implemented in terms of CLZ return -1 for input of
+   zero.  The hardware instructions added in Power9 return 32 or 64.  */
+#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE)				\
+  ((!TARGET_CTZ)							\
+   ? ((VALUE) = -1, 1)							\
+   : ((VALUE) = ((MODE) == SImode ? 32 : 64), 1))
 
 /* Specify the machine mode that pointers have.
    After generation of rtl, the compiler makes no further distinction
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 230064)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -141,6 +141,8 @@  (define_c_enum "unspec"
    UNSPEC_LSQ
    UNSPEC_FUSION_GPR
    UNSPEC_STACK_CHECK
+   UNSPEC_FUSION_P9
+   UNSPEC_FUSION_ADDIS
   ])
 
 ;;
@@ -327,12 +329,28 @@  (define_mode_iterator EXTSI [(DI "TARGET
 ; QImode or HImode for small atomic ops
 (define_mode_iterator QHI [QI HI])
 
+; QImode, HImode, SImode for fused ops only for GPR loads
+(define_mode_iterator QHSI [QI HI SI])
+
 ; HImode or SImode for sign extended fusion ops
 (define_mode_iterator HSI [HI SI])
 
 ; SImode or DImode, even if DImode doesn't fit in GPRs.
 (define_mode_iterator SDI [SI DI])
 
+; Types that can be fused with an ADDIS instruction to load or store a GPR
+; register that has reg+offset addressing.
+(define_mode_iterator GPR_FUSION [QI
+				  HI
+				  SI
+				  (DI	"TARGET_POWERPC64")
+				  SF
+				  (DF	"TARGET_POWERPC64")])
+
+; Types that can be fused with an ADDIS instruction to load or store a FPR
+; register that has reg+offset addressing.
+(define_mode_iterator FPR_FUSION [DI SF DF])
+
 ; The size of a pointer.  Also, the size of the value that a record-condition
 ; (one with a '.') will compare; and the size used for arithmetic carries.
 (define_mode_iterator P [(SI "TARGET_32BIT") (DI "TARGET_64BIT")])
@@ -2101,12 +2119,25 @@  (define_expand "ctz<mode>2"
 	      (clobber (reg:GPR CA_REGNO))])]
   ""
 {
+  if (TARGET_CTZ)
+    {
+      emit_insn (gen_ctz<mode>2_hw (operands[0], operands[1]));
+      DONE;
+    }
+
   operands[2] = gen_reg_rtx (<MODE>mode);
   operands[3] = gen_reg_rtx (<MODE>mode);
   operands[4] = gen_reg_rtx (<MODE>mode);
   operands[5] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode) - 1);
 })
 
+(define_insn "ctz<mode>2_hw"
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
+	(ctz:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")))]
+  "TARGET_CTZ"
+  "cnttz<wd> %0,%1"
+  [(set_attr "type" "cntlz")])
+
 (define_expand "ffs<mode>2"
   [(set (match_dup 2)
 	(neg:GPR (match_operand:GPR 1 "gpc_reg_operand" "")))
@@ -2885,9 +2916,9 @@  (define_insn_and_split "*div<mode>3_sra_
    (set_attr "cell_micro" "not")])
 
 (define_expand "mod<mode>3"
-  [(use (match_operand:GPR 0 "gpc_reg_operand" ""))
-   (use (match_operand:GPR 1 "gpc_reg_operand" ""))
-   (use (match_operand:GPR 2 "reg_or_cint_operand" ""))]
+  [(set (match_operand:GPR 0 "gpc_reg_operand")
+	(mod:GPR (match_operand:GPR 1 "gpc_reg_operand")
+		 (match_operand:GPR 2 "reg_or_cint_operand")))]
   ""
 {
   int i;
@@ -2897,16 +2928,93 @@  (define_expand "mod<mode>3"
   if (GET_CODE (operands[2]) != CONST_INT
       || INTVAL (operands[2]) <= 0
       || (i = exact_log2 (INTVAL (operands[2]))) < 0)
-    FAIL;
+    {
+      if (!TARGET_MODULO)
+	FAIL;
 
-  temp1 = gen_reg_rtx (<MODE>mode);
-  temp2 = gen_reg_rtx (<MODE>mode);
+      operands[2] = force_reg (<MODE>mode, operands[2]);
+    }
+  else
+    {
+      temp1 = gen_reg_rtx (<MODE>mode);
+      temp2 = gen_reg_rtx (<MODE>mode);
 
-  emit_insn (gen_div<mode>3 (temp1, operands[1], operands[2]));
-  emit_insn (gen_ashl<mode>3 (temp2, temp1, GEN_INT (i)));
-  emit_insn (gen_sub<mode>3 (operands[0], operands[1], temp2));
-  DONE;
+      emit_insn (gen_div<mode>3 (temp1, operands[1], operands[2]));
+      emit_insn (gen_ashl<mode>3 (temp2, temp1, GEN_INT (i)));
+      emit_insn (gen_sub<mode>3 (operands[0], operands[1], temp2));
+      DONE;
+    }
 })
+
+;; In order to enable using a peephole2 for combining div/mod to eliminate the
+;; mod, prefer putting the result of mod into a different register.
+(define_insn "*mod<mode>3"
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=&r")
+        (mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
+		 (match_operand:GPR 2 "gpc_reg_operand" "r")))]
+  "TARGET_MODULO"
+  "mods<wd> %0,%1,%2"
+  [(set_attr "type" "div")
+   (set_attr "size" "<bits>")])
+
+
+(define_insn "umod<mode>3"
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=&r")
+        (umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
+		  (match_operand:GPR 2 "gpc_reg_operand" "r")))]
+  "TARGET_MODULO"
+  "modu<wd> %0,%1,%2"
+  [(set_attr "type" "div")
+   (set_attr "size" "<bits>")])
+
+;; On machines with modulo support, do a combined div/mod the old-fashioned
+;; way, since the multiply/subtract is faster than doing the mod instruction
+;; after a divide.
+
+(define_peephole2
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "")
+	(div:GPR (match_operand:GPR 1 "gpc_reg_operand" "")
+		 (match_operand:GPR 2 "gpc_reg_operand" "")))
+   (set (match_operand:GPR 3 "gpc_reg_operand" "")
+	(mod:GPR (match_dup 1)
+		 (match_dup 2)))]
+  "TARGET_MODULO
+   && ! reg_mentioned_p (operands[0], operands[1])
+   && ! reg_mentioned_p (operands[0], operands[2])
+   && ! reg_mentioned_p (operands[3], operands[1])
+   && ! reg_mentioned_p (operands[3], operands[2])"
+  [(set (match_dup 0)
+	(div:GPR (match_dup 1)
+		 (match_dup 2)))
+   (set (match_dup 3)
+	(mult:GPR (match_dup 0)
+		  (match_dup 2)))
+   (set (match_dup 3)
+	(minus:GPR (match_dup 1)
+		   (match_dup 3)))])
+
+(define_peephole2
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "")
+	(udiv:GPR (match_operand:GPR 1 "gpc_reg_operand" "")
+		  (match_operand:GPR 2 "gpc_reg_operand" "")))
+   (set (match_operand:GPR 3 "gpc_reg_operand" "")
+	(umod:GPR (match_dup 1)
+		  (match_dup 2)))]
+  "TARGET_MODULO
+   && ! reg_mentioned_p (operands[0], operands[1])
+   && ! reg_mentioned_p (operands[0], operands[2])
+   && ! reg_mentioned_p (operands[3], operands[1])
+   && ! reg_mentioned_p (operands[3], operands[2])"
+  [(set (match_dup 0)
+	(udiv:GPR (match_dup 1)
+		  (match_dup 2)))
+   (set (match_dup 3)
+	(mult:GPR (match_dup 0)
+		  (match_dup 2)))
+   (set (match_dup 3)
+	(minus:GPR (match_dup 1)
+		   (match_dup 3)))])
+
 
 ;; Logical instructions
 ;; The logical instructions are mostly combined by using match_operator,
@@ -3843,6 +3951,127 @@  (define_insn_and_split "*ashl<mode>3_dot
    (set_attr "dot" "yes")
    (set_attr "length" "4,8")])
 
+;; Pretend we have a memory form of extswsli until register allocation is done
+;; so that we use LWZ to load the value from memory, instead of LWA.
+(define_insn_and_split "ashdi3_extswsli"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=r,r")
+	(ashift:DI
+	 (sign_extend:DI (match_operand:SI 1 "reg_or_mem_operand" "r,m"))
+	 (match_operand:DI 2 "u6bit_cint_operand" "n,n")))]
+  "TARGET_EXTSWSLI"
+  "@
+   extswsli %0,%1,%2
+   #"
+  "&& reload_completed && MEM_P (operands[1])"
+  [(set (match_dup 3)
+	(match_dup 1))
+   (set (match_dup 0)
+	(ashift:DI (sign_extend:DI (match_dup 3))
+		   (match_dup 2)))]
+{
+  operands[3] = gen_lowpart (SImode, operands[0]);
+}
+  [(set_attr "type" "shift")
+   (set_attr "maybe_var_shift" "no")])
+
+
+(define_insn_and_split "ashdi3_extswsli_dot"
+  [(set (match_operand:CC 3 "cc_reg_operand" "=x,?y,?x,??y")
+	(compare:CC
+	 (ashift:DI
+	  (sign_extend:DI (match_operand:SI 1 "reg_or_mem_operand" "r,r,m,m"))
+	  (match_operand:DI 2 "u6bit_cint_operand" "n,n,n,n"))
+	 (const_int 0)))
+   (clobber (match_scratch:DI 0 "=r,r,r,r"))]
+  "TARGET_EXTSWSLI"
+  "@
+   extswsli. %0,%1,%2
+   #
+   #
+   #"
+  "&& reload_completed
+   && (cc_reg_not_cr0_operand (operands[3], CCmode)
+       || memory_operand (operands[1], SImode))"
+  [(pc)]
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  rtx shift = operands[2];
+  rtx cr = operands[3];
+  rtx src2;
+
+  if (!MEM_P (src))
+    src2 = src;
+  else
+    {
+      src2 = gen_lowpart (SImode, dest);
+      emit_move_insn (src2, src);
+    }
+
+  if (REGNO (cr) == CR0_REGNO)
+    {
+      emit_insn (gen_ashdi3_extswsli_dot (dest, src2, shift, cr));
+      DONE;
+    }
+
+  emit_insn (gen_ashdi3_extswsli (dest, src2, shift));
+  emit_insn (gen_rtx_SET (cr, gen_rtx_COMPARE (CCmode, dest, const0_rtx)));
+  DONE;
+}
+  [(set_attr "type" "shift")
+   (set_attr "maybe_var_shift" "no")
+   (set_attr "dot" "yes")
+   (set_attr "length" "4,8,8,12")])
+
+(define_insn_and_split "ashdi3_extswsli_dot2"
+  [(set (match_operand:CC 3 "cc_reg_operand" "=x,?y,?x,??y")
+	(compare:CC
+	 (ashift:DI
+	  (sign_extend:DI (match_operand:SI 1 "reg_or_mem_operand" "r,r,m,m"))
+	  (match_operand:DI 2 "u6bit_cint_operand" "n,n,n,n"))
+	 (const_int 0)))
+   (set (match_operand:DI 0 "gpc_reg_operand" "=r,r,r,r")
+	(ashift:DI (sign_extend:DI (match_dup 1))
+		   (match_dup 2)))]
+  "TARGET_EXTSWSLI"
+  "@
+   extswsli. %0,%1,%2
+   #
+   #
+   #"
+  "&& reload_completed
+   && (cc_reg_not_cr0_operand (operands[3], CCmode)
+       || memory_operand (operands[1], SImode))"
+  [(pc)]
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  rtx shift = operands[2];
+  rtx cr = operands[3];
+  rtx src2;
+
+  if (!MEM_P (src))
+    src2 = src;
+  else
+    {
+      src2 = gen_lowpart (SImode, dest);
+      emit_move_insn (src2, src);
+    }
+
+  if (REGNO (cr) == CR0_REGNO)
+    {
+      emit_insn (gen_ashdi3_extswsli_dot2 (dest, src2, shift, cr));
+      DONE;
+    }
+
+  emit_insn (gen_ashdi3_extswsli (dest, src2, shift));
+  emit_insn (gen_rtx_SET (cr, gen_rtx_COMPARE (CCmode, dest, const0_rtx)));
+  DONE;
+}
+  [(set_attr "type" "shift")
+   (set_attr "maybe_var_shift" "no")
+   (set_attr "dot" "yes")
+   (set_attr "length" "4,8,8,12")])
 
 (define_insn "lshr<mode>3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
@@ -12381,6 +12610,66 @@  (define_insn "rs6000_mtfsf"
 ;; a GPR.  The addis instruction must be adjacent to the load, and use the same
 ;; register that is being loaded.  The fused ops must be physically adjacent.
 
+;; There are two parts to addis fusion.  The support for fused TOCs occurs
+;; before register allocation, and is meant to reduce the lifetime for the
+;; temporary register that holds the ADDIS result.  On Power8 GPR loads, we try
+;; to use the register that is being loaded.  The peephole2 then gathers any
+;; other fused possibilities that it can find after register allocation.  If
+;; power9 fusion is selected, we also fuse floating point loads/stores.
+
+;; Fused TOC support: Replace simple GPR loads with a fused form.  This is done
+;; before register allocation, so that we can avoid allocating a temporary base
+;; register that won't be used, and so that we try to load into base registers
+;; rather than register 0.  If we can't get a fused GPR load, generate a P9
+;; fusion (addis followed by load) even on power8.
+
+(define_split
+  [(set (match_operand:INT1 0 "toc_fusion_or_p9_reg_operand" "")
+	(match_operand:INT1 1 "toc_fusion_mem_raw" ""))]
+  "TARGET_TOC_FUSION_INT && can_create_pseudo_p ()"
+  [(parallel [(set (match_dup 0) (match_dup 2))
+	      (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
+	      (use (match_dup 3))
+	      (clobber (scratch:DI))])]
+{
+  operands[2] = fusion_wrap_memory_address (operands[1]);
+  operands[3] = gen_rtx_REG (Pmode, TOC_REGISTER);
+})
+
+(define_insn "*toc_fusionload_<mode>"
+  [(set (match_operand:QHSI 0 "int_reg_operand" "=&b,??r")
+	(match_operand:QHSI 1 "toc_fusion_mem_wrapped" "wG,wG"))
+   (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
+   (use (match_operand:DI 2 "base_reg_operand" "r,r"))
+   (clobber (match_scratch:DI 3 "=X,&b"))]
+  "TARGET_TOC_FUSION_INT"
+{
+  if (base_reg_operand (operands[0], <MODE>mode))
+    return emit_fusion_gpr_load (operands[0], operands[1]);
+
+  return emit_fusion_p9_load (operands[0], operands[1], operands[3]);
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "8")])
+
+(define_insn "*toc_fusionload_di"
+  [(set (match_operand:DI 0 "int_reg_operand" "=&b,??r,?d")
+	(match_operand:DI 1 "toc_fusion_mem_wrapped" "wG,wG,wG"))
+   (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
+   (use (match_operand:DI 2 "base_reg_operand" "r,r,r"))
+   (clobber (match_scratch:DI 3 "=X,&b,&b"))]
+  "TARGET_TOC_FUSION_INT && TARGET_POWERPC64
+   && (MEM_P (operands[1]) || int_reg_operand (operands[0], DImode))"
+{
+  if (base_reg_operand (operands[0], DImode))
+    return emit_fusion_gpr_load (operands[0], operands[1]);
+
+  return emit_fusion_p9_load (operands[0], operands[1], operands[3]);
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "8")])
+
+
 ;; Find cases where the addis that feeds into a load instruction is either used
 ;; once or is the same as the target register, and replace it with the fusion
 ;; insn
@@ -12404,7 +12693,7 @@  (define_peephole2
 
 (define_insn "fusion_gpr_load_<mode>"
   [(set (match_operand:INT1 0 "base_reg_operand" "=&b")
-	(unspec:INT1 [(match_operand:INT1 1 "fusion_gpr_mem_combo" "")]
+	(unspec:INT1 [(match_operand:INT1 1 "fusion_addis_mem_combo_load" "")]
 		     UNSPEC_FUSION_GPR))]
   "TARGET_P8_FUSION"
 {
@@ -12414,6 +12703,133 @@  (define_insn "fusion_gpr_load_<mode>"
    (set_attr "length" "8")])
 
 
+;; ISA 3.0 (power9) fusion support
+;; Merge addis with floating load/store to FPRs (or GPRs).
+(define_peephole2
+  [(set (match_operand:P 0 "base_reg_operand" "")
+	(match_operand:P 1 "fusion_gpr_addis" ""))
+   (set (match_operand:SFDF 2 "toc_fusion_or_p9_reg_operand" "")
+	(match_operand:SFDF 3 "fusion_offsettable_mem_operand" ""))]
+  "TARGET_P9_FUSION && peep2_reg_dead_p (2, operands[0])
+   && fusion_p9_p (operands[0], operands[1], operands[2], operands[3])"
+  [(const_int 0)]
+{
+  expand_fusion_p9_load (operands);
+  DONE;
+})
+
+(define_peephole2
+  [(set (match_operand:P 0 "base_reg_operand" "")
+	(match_operand:P 1 "fusion_gpr_addis" ""))
+   (set (match_operand:SFDF 2 "offsettable_mem_operand" "")
+	(match_operand:SFDF 3 "toc_fusion_or_p9_reg_operand" ""))]
+  "TARGET_P9_FUSION && peep2_reg_dead_p (2, operands[0])
+   && fusion_p9_p (operands[0], operands[1], operands[2], operands[3])"
+  [(const_int 0)]
+{
+  expand_fusion_p9_store (operands);
+  DONE;
+})
+
+(define_peephole2
+  [(set (match_operand:SDI 0 "int_reg_operand" "")
+	(match_operand:SDI 1 "upper16_cint_operand" ""))
+   (set (match_dup 0)
+	(ior:SDI (match_dup 0)
+		 (match_operand:SDI 2 "u_short_cint_operand" "")))]
+  "TARGET_P9_FUSION"
+  [(set (match_dup 0)
+	(unspec:SDI [(match_dup 1)
+		     (match_dup 2)] UNSPEC_FUSION_P9))])
+
+(define_peephole2
+  [(set (match_operand:SDI 0 "int_reg_operand" "")
+	(match_operand:SDI 1 "upper16_cint_operand" ""))
+   (set (match_operand:SDI 2 "int_reg_operand" "")
+	(ior:SDI (match_dup 0)
+		 (match_operand:SDI 3 "u_short_cint_operand" "")))]
+  "TARGET_P9_FUSION
+   && !rtx_equal_p (operands[0], operands[2])
+   && peep2_reg_dead_p (2, operands[0])"
+  [(set (match_dup 2)
+	(unspec:SDI [(match_dup 1)
+		     (match_dup 3)] UNSPEC_FUSION_P9))])
+
+;; Fusion insns, created by the define_peephole2 above (and eventually by
+;; reload).  Because we want to eventually have secondary_reload generate
+;; these, they have to have a single alternative that gives the register
+;; classes.  This means we need to have separate gpr/fpr/altivec versions.
+(define_insn "fusion_gpr_<P:mode>_<GPR_FUSION:mode>_load"
+  [(set (match_operand:GPR_FUSION 0 "int_reg_operand" "=r")
+	(unspec:GPR_FUSION
+	 [(match_operand:GPR_FUSION 1 "fusion_addis_mem_combo_load" "wF")]
+	 UNSPEC_FUSION_P9))
+   (clobber (match_operand:P 2 "base_reg_operand" "=&b"))]
+  "TARGET_P9_FUSION"
+{
+  /* This insn is a secondary reload insn, which cannot have alternatives.
+     If we are not loading up register 0, use the power8 fusion instead.  */
+  if (base_reg_operand (operands[0], <GPR_FUSION:MODE>mode))
+    return emit_fusion_gpr_load (operands[0], operands[1]);
+
+  return emit_fusion_p9_load (operands[0], operands[1], operands[2]);
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "8")])
+
+(define_insn "fusion_gpr_<P:mode>_<GPR_FUSION:mode>_store"
+  [(set (match_operand:GPR_FUSION 0 "fusion_addis_mem_combo_store" "=wF")
+	(unspec:GPR_FUSION
+	 [(match_operand:GPR_FUSION 1 "int_reg_operand" "r")]
+	 UNSPEC_FUSION_P9))
+   (clobber (match_operand:P 2 "base_reg_operand" "=&b"))]
+  "TARGET_P9_FUSION"
+{
+  return emit_fusion_p9_store (operands[0], operands[1], operands[2]);
+}
+  [(set_attr "type" "store")
+   (set_attr "length" "8")])
+
+(define_insn "fusion_fpr_<P:mode>_<FPR_FUSION:mode>_load"
+  [(set (match_operand:FPR_FUSION 0 "fpr_reg_operand" "=d")
+	(unspec:FPR_FUSION
+	 [(match_operand:FPR_FUSION 1 "fusion_addis_mem_combo_load" "wF")]
+	 UNSPEC_FUSION_P9))
+   (clobber (match_operand:P 2 "base_reg_operand" "=b"))]
+  "TARGET_P9_FUSION"
+{
+  return emit_fusion_p9_load (operands[0], operands[1], operands[2]);
+}
+  [(set_attr "type" "fpload")
+   (set_attr "length" "8")])
+
+(define_insn "fusion_fpr_<P:mode>_<FPR_FUSION:mode>_store"
+  [(set (match_operand:FPR_FUSION 0 "fusion_addis_mem_combo_store" "=wF")
+	(unspec:FPR_FUSION
+	 [(match_operand:FPR_FUSION 1 "fpr_reg_operand" "d")]
+	 UNSPEC_FUSION_P9))
+   (clobber (match_operand:P 2 "base_reg_operand" "=b"))]
+  "TARGET_P9_FUSION"
+{
+  return emit_fusion_p9_store (operands[0], operands[1], operands[2]);
+}
+  [(set_attr "type" "fpstore")
+   (set_attr "length" "8")])
+
+(define_insn "*fusion_p9_<mode>_constant"
+  [(set (match_operand:SDI 0 "int_reg_operand" "=r")
+	(unspec:SDI [(match_operand:SDI 1 "upper16_cint_operand" "L")
+		     (match_operand:SDI 2 "u_short_cint_operand" "K")]
+		    UNSPEC_FUSION_P9))]	
+  "TARGET_P9_FUSION"
+{
+  emit_fusion_addis (operands[0], operands[1], "constant", "<MODE>");
+  return "ori %0,%0,%2";
+}
+  [(set_attr "type" "two")
+   (set_attr "length" "8")])
+
+
 ;; Miscellaneous ISA 2.06 (power7) instructions
 (define_insn "addg6s"
   [(set (match_operand:SI 0 "register_operand" "=r")
@@ -12580,6 +12996,7 @@  (define_insn "pack<mode>"
   "xxpermdi %x0,%x1,%x2,0"
   [(set_attr "type" "vecperm")])
 
+
 
 
 (include "sync.md")
Index: gcc/testsuite/lib/target-supports.exp
===================================================================
--- gcc/testsuite/lib/target-supports.exp	(revision 230064)
+++ gcc/testsuite/lib/target-supports.exp	(working copy)
@@ -1616,7 +1616,9 @@  proc check_p8vector_hw_available { } {
     return [check_cached_effective_target p8vector_hw_available {
 	# Some simulators are known to not support VSX/power8 instructions.
 	# For now, disable on Darwin
-	if { [istarget powerpc-*-eabi] || [istarget powerpc*-*-eabispe] || [istarget *-*-darwin*]} {
+	if { [istarget powerpc-*-eabi]
+	     || [istarget powerpc*-*-eabispe]
+	     || [istarget *-*-darwin*]} {
 	    expr 0
 	} else {
 	    set options "-mpower8-vector"
@@ -1635,6 +1637,112 @@  proc check_p8vector_hw_available { } {
     }]
 }
 
+# Return 1 if the target supports executing power9 vector instructions, 0
+# otherwise.  Cache the result.
+
+proc check_p9vector_hw_available { } {
+    return [check_cached_effective_target p9vector_hw_available {
+	# Some simulators are known to not support VSX/power8/power9
+	# instructions.  For now, disable on Darwin.
+	if { [istarget powerpc-*-eabi]
+	     || [istarget powerpc*-*-eabispe]
+	     || [istarget *-*-darwin*]} {
+	    expr 0
+	} else {
+	    set options "-mpower9-vector"
+	    check_runtime_nocache p9vector_hw_available {
+		int main()
+		{
+		    long e = -1;
+		    vector double v = (vector double) { 0.0, 0.0 };
+		    asm ("xsxexpdp %0,%1" : "+r" (e) : "wa" (v));
+		    return e;
+		}
+	    } $options
+	}
+    }]
+}
+
+# Return 1 if the target supports executing power9 modulo instructions, 0
+# otherwise.  Cache the result.
+
+proc check_p9modulo_hw_available { } {
+    return [check_cached_effective_target p9modulo_hw_available {
+	# Some simulators are known to not support VSX/power8/power9
+	# instructions.  For now, disable on Darwin.
+	if { [istarget powerpc-*-eabi]
+	     || [istarget powerpc*-*-eabispe]
+	     || [istarget *-*-darwin*]} {
+	    expr 0
+	} else {
+	    set options "-mmodulo"
+	    check_runtime_nocache p9modulo_hw_available {
+		int main()
+		{
+		    int i = 5, j = 3, r = -1;
+		    asm ("modsw %0,%1,%2" : "+r" (r) : "r" (i), "r" (j));
+		    return (r != 2);
+		}
+	    } $options
+	}
+    }]
+}
+
+# Return 1 if the target supports executing __float128 on PowerPC via software
+# emulation, 0 otherwise.  Cache the result.
+
+proc check_ppc_float128_sw_available { } {
+    return [check_cached_effective_target ppc_float128_sw_available {
+	# Some simulators are known to not support VSX/power8/power9
+	# instructions.  For now, disable on Darwin.
+	if { [istarget powerpc-*-eabi]
+	     || [istarget powerpc*-*-eabispe]
+	     || [istarget *-*-darwin*]} {
+	    expr 0
+	} else {
+	    set options "-mfloat128 -mvsx"
+	    check_runtime_nocache ppc_float128_sw_available {
+		volatile __float128 x = 1.0q;
+		volatile __float128 y = 2.0q;
+		int main()
+		{
+		    __float128 z = x + y;
+		    return (z != 3.0q);
+		}
+	    } $options
+	}
+    }]
+}
+
+# Return 1 if the target supports executing __float128 on PowerPC via power9
+# hardware instructions, 0 otherwise.  Cache the result.
+
+proc check_ppc_float128_hw_available { } {
+    return [check_cached_effective_target ppc_float128_hw_available {
+	# Some simulators are known to not support VSX/power8/power9
+	# instructions.  For now, disable on Darwin.
+	if { [istarget powerpc-*-eabi]
+	     || [istarget powerpc*-*-eabispe]
+	     || [istarget *-*-darwin*]} {
+	    expr 0
+	} else {
+	    set options "-mfloat128-hardware"
+	    check_runtime_nocache ppc_float128_hw_available {
+		volatile __float128 x = 1.0q;
+		volatile __float128 y = 2.0q;
+		int main()
+		{
+		    __float128 z = x + y;
+		    __float128 w = -1.0q;
+
+		    __asm__ ("xsaddqp %0,%1,%2" : "+v" (w) : "v" (x), "v" (y));
+		    return ((z != 3.0q) || (z != w));
+		}
+	    } $options
+	}
+    }]
+}
+
 # Return 1 if the target supports executing VSX instructions, 0
 # otherwise.  Cache the result.
 
@@ -1642,7 +1750,9 @@  proc check_vsx_hw_available { } {
     return [check_cached_effective_target vsx_hw_available {
 	# Some simulators are known to not support VSX instructions.
 	# For now, disable on Darwin
-	if { [istarget powerpc-*-eabi] || [istarget powerpc*-*-eabispe] || [istarget *-*-darwin*]} {
+	if { [istarget powerpc-*-eabi]
+	     || [istarget powerpc*-*-eabispe]
+	     || [istarget *-*-darwin*]} {
 	    expr 0
 	} else {
 	    set options "-mvsx"
@@ -3358,6 +3468,108 @@  proc check_effective_target_powerpc_p8ve
     }
 }
 
+# Return 1 if this is a PowerPC target supporting -mpower9-vector
+
+proc check_effective_target_powerpc_p9vector_ok { } {
+    if { ([istarget powerpc*-*-*]
+         && ![istarget powerpc-*-linux*paired*])
+	 || [istarget rs6000-*-*] } {
+	# AltiVec is not supported on AIX before 5.3.
+	if { [istarget powerpc*-*-aix4*]
+	     || [istarget powerpc*-*-aix5.1*]
+	     || [istarget powerpc*-*-aix5.2*] } {
+	    return 0
+	}
+	return [check_no_compiler_messages powerpc_p9vector_ok object {
+	    int main (void) {
+		long e = -1;
+		vector double v = (vector double) { 0.0, 0.0 };
+		asm ("xsxexpdp %0,%1" : "+r" (e) : "wa" (v));
+		return e;
+	    }
+	} "-mpower9-vector"]
+    } else {
+	return 0
+    }
+}
+
+# Return 1 if this is a PowerPC target supporting -mmodulo
+
+proc check_effective_target_powerpc_p9modulo_ok { } {
+    if { ([istarget powerpc*-*-*]
+         && ![istarget powerpc-*-linux*paired*])
+	 || [istarget rs6000-*-*] } {
+	# AltiVec is not supported on AIX before 5.3.
+	if { [istarget powerpc*-*-aix4*]
+	     || [istarget powerpc*-*-aix5.1*]
+	     || [istarget powerpc*-*-aix5.2*] } {
+	    return 0
+	}
+	return [check_no_compiler_messages powerpc_p9modulo_ok object {
+	    int main (void) {
+		int i = 5, j = 3, r = -1;
+		asm ("modsw %0,%1,%2" : "+r" (r) : "r" (i), "r" (j));
+		return (r == 2);
+	    }
+	} "-mmodulo"]
+    } else {
+	return 0
+    }
+}
+
+# Return 1 if this is a PowerPC target supporting -mfloat128 via either
+# software emulation on power7/power8 systems or hardware support on power9.
+
+proc check_effective_target_powerpc_float128_sw_ok { } {
+    if { ([istarget powerpc*-*-*]
+         && ![istarget powerpc-*-linux*paired*])
+	 || [istarget rs6000-*-*] } {
+	# AltiVec is not supported on AIX before 5.3.
+	if { [istarget powerpc*-*-aix4*]
+	     || [istarget powerpc*-*-aix5.1*]
+	     || [istarget powerpc*-*-aix5.2*] } {
+	    return 0
+	}
+	return [check_no_compiler_messages powerpc_float128_sw_ok object {
+		volatile __float128 x = 1.0q;
+		volatile __float128 y = 2.0q;
+		int main() {
+		    __float128 z = x + y;
+		    return (z == 3.0q);
+		}
+	    } "-mfloat128 -mvsx"]
+    } else {
+	return 0
+    }
+}
+
+# Return 1 if this is a PowerPC target supporting -mfloat128 via hardware
+# support on power9.
+
+proc check_effective_target_powerpc_float128_hw_ok { } {
+    if { ([istarget powerpc*-*-*]
+         && ![istarget powerpc-*-linux*paired*])
+	 || [istarget rs6000-*-*] } {
+	# AltiVec is not supported on AIX before 5.3.
+	if { [istarget powerpc*-*-aix4*]
+	     || [istarget powerpc*-*-aix5.1*]
+	     || [istarget powerpc*-*-aix5.2*] } {
+	    return 0
+	}
+	return [check_no_compiler_messages powerpc_float128_hw_ok object {
+		volatile __float128 x = 1.0q;
+		volatile __float128 y = 2.0q;
+		int main() {
+		    __float128 z;
+		    __asm__ ("xsaddqp %0,%1,%2" : "=v" (z) : "v" (x), "v" (y));
+		    return (z == 3.0q);
+		}
+	} "-mfloat128-hardware"]
+    } else {
+	return 0
+    }
+}
+
 # Return 1 if this is a PowerPC target supporting -mvsx
 
 proc check_effective_target_powerpc_vsx_ok { } {
@@ -5459,6 +5671,10 @@  proc is-effective-target { arg } {
 	  "vmx_hw"         { set selected [check_vmx_hw_available] }
 	  "vsx_hw"         { set selected [check_vsx_hw_available] }
 	  "p8vector_hw"    { set selected [check_p8vector_hw_available] }
+	  "p9vector_hw"    { set selected [check_p9vector_hw_available] }
+	  "p9modulo_hw"    { set selected [check_p9modulo_hw_available] }
+	  "ppc_float128_sw" { set selected [check_ppc_float128_sw_available] }
+	  "ppc_float128_hw" { set selected [check_ppc_float128_hw_available] }
 	  "ppc_recip_hw"   { set selected [check_ppc_recip_hw_available] }
 	  "dfp_hw"         { set selected [check_dfp_hw_available] }
 	  "htm_hw"         { set selected [check_htm_hw_available] }
@@ -5483,6 +5699,10 @@  proc is-effective-target-keyword { arg }
 	  "vmx_hw"         { return 1 }
 	  "vsx_hw"         { return 1 }
 	  "p8vector_hw"    { return 1 }
+	  "p9vector_hw"    { return 1 }
+	  "p9modulo_hw"    { return 1 }
+	  "ppc_float128_sw" { return 1 }
+	  "ppc_float128_hw" { return 1 }
 	  "ppc_recip_hw"   { return 1 }
 	  "dfp_hw"         { return 1 }
 	  "htm_hw"         { return 1 }
@@ -6186,7 +6406,9 @@  proc check_vect_support_and_set_flags { 
         }
 
         lappend DEFAULT_VECTCFLAGS "-maltivec"
-        if [check_p8vector_hw_available] {
+        if [check_p9vector_hw_available] {
+            lappend DEFAULT_VECTCFLAGS "-mpower9-vector"
+        } elseif [check_p8vector_hw_available] {
             lappend DEFAULT_VECTCFLAGS "-mpower8-vector"
         } elseif [check_vsx_hw_available] {
             lappend DEFAULT_VECTCFLAGS "-mvsx" "-mno-allow-movmisalign"
Index: gcc/testsuite/gcc.target/powerpc/extswsli-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/extswsli-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/extswsli-1.c	(revision 0)
@@ -0,0 +1,20 @@ 
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-require-effective-target powerpc_p9modulo_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+static int mem;
+int *ptr = &mem;
+
+long
+add (long *p, int reg)
+{
+  __asm__ (" #foo %0" : "+r" (reg));
+  return p[reg] + p[mem];
+}
+
+/* { dg-final { scan-assembler-times "extswsli " 2 } } */
+/* { dg-final { scan-assembler-times "lwz "      1 } } */
+/* { dg-final { scan-assembler-not   "lwa "        } } */
+/* { dg-final { scan-assembler-not   "sldi "       } } */
+/* { dg-final { scan-assembler-not   "extsw "      } } */
Index: gcc/testsuite/gcc.target/powerpc/extswsli-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/extswsli-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/extswsli-2.c	(revision 0)
@@ -0,0 +1,37 @@ 
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-require-effective-target powerpc_p9modulo_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+long
+func1 (int reg, int *is_zero)
+{
+  long value;
+
+  __asm__ (" #foo %0" : "+r" (reg));
+  value = ((long)reg) << 4;
+
+  if (!value)
+    *is_zero = 1;
+
+  return value;
+}
+
+long
+func2 (int *ptr, int *is_zero)
+{
+  int reg = *ptr;
+  long value = ((long)reg) << 4;
+
+  if (!value)
+    *is_zero = 1;
+
+  return value;
+}
+
+/* { dg-final { scan-assembler     "extswsli\\. " } } */
+/* { dg-final { scan-assembler     "lwz "         } } */
+/* { dg-final { scan-assembler-not "lwa "         } } */
+/* { dg-final { scan-assembler-not "sldi "        } } */
+/* { dg-final { scan-assembler-not "sldi\\. "     } } */
+/* { dg-final { scan-assembler-not "extsw "       } } */
Index: gcc/testsuite/gcc.target/powerpc/extswsli-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/extswsli-3.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/extswsli-3.c	(revision 0)
@@ -0,0 +1,22 @@ 
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-require-effective-target powerpc_p9modulo_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+long
+do_ext_add (int *p, long a, long b)
+{
+  long l = *p;
+  long l2 = l << 4;
+  return l2 + ((l2 == 0) ? a : b);
+}
+
+long
+do_ext (int *p, long a, long b)
+{
+  long l = *p;
+  long l2 = l << 4;
+  return ((l2 == 0) ? a : b);
+}
+
+/* { dg-final { scan-assembler "extswsli\\. "} } */
Index: gcc/testsuite/gcc.target/powerpc/ctz-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/ctz-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/ctz-1.c	(revision 0)
@@ -0,0 +1,13 @@ 
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-require-effective-target powerpc_p9modulo_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+int i_trailing_zero (int a) { return __builtin_ctz (a); }
+int l_trailing_zero (long a) { return __builtin_ctzl (a); }
+int ll_trailing_zero (long long a) { return __builtin_ctzll (a); }
+
+/* { dg-final { scan-assembler     "cnttzw " } } */
+/* { dg-final { scan-assembler     "cnttzd " } } */
+/* { dg-final { scan-assembler-not "cntlzw " } } */
+/* { dg-final { scan-assembler-not "cntlzd " } } */
Index: gcc/testsuite/gcc.target/powerpc/ctz-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/ctz-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/ctz-2.c	(revision 0)
@@ -0,0 +1,9 @@ 
+/* { dg-do compile { target { powerpc*-*-* && ilp32 } } } */
+/* { dg-require-effective-target powerpc_p9modulo_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+int i_trailing_zero (int a) { return __builtin_ctz (a); }
+
+/* { dg-final { scan-assembler     "cnttzw " } } */
+/* { dg-final { scan-assembler-not "cntlzw " } } */
Index: gcc/testsuite/gcc.target/powerpc/float128-mix.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/float128-mix.c	(revision 230064)
+++ gcc/testsuite/gcc.target/powerpc/float128-mix.c	(working copy)
@@ -1,6 +1,5 @@ 
 /* { dg-do compile { target { powerpc*-*-linux* } } } */
-/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-require-effective-target powerpc_float128_sw_ok } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
 /* { dg-options "-O2 -mcpu=power7 -mfloat128" } */
 
Index: gcc/testsuite/gcc.target/powerpc/mod-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/mod-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/mod-1.c	(revision 0)
@@ -0,0 +1,20 @@ 
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-require-effective-target powerpc_p9modulo_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+int ismod (int a, int b) { return a%b; }
+long lsmod (long a, long b) { return a%b; }
+unsigned int iumod (unsigned int a, unsigned int b) { return a%b; }
+unsigned long lumod (unsigned long a, unsigned long b) { return a%b; }
+
+/* { dg-final { scan-assembler-times "modsw " 1 } } */
+/* { dg-final { scan-assembler-times "modsd " 1 } } */
+/* { dg-final { scan-assembler-times "moduw " 1 } } */
+/* { dg-final { scan-assembler-times "modud " 1 } } */
+/* { dg-final { scan-assembler-not   "mullw "   } } */
+/* { dg-final { scan-assembler-not   "mulld "   } } */
+/* { dg-final { scan-assembler-not   "divw "    } } */
+/* { dg-final { scan-assembler-not   "divd "    } } */
+/* { dg-final { scan-assembler-not   "divwu "   } } */
+/* { dg-final { scan-assembler-not   "divdu "   } } */
Index: gcc/testsuite/gcc.target/powerpc/mod-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/mod-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/mod-2.c	(revision 0)
@@ -0,0 +1,13 @@ 
+/* { dg-do compile { target { powerpc*-*-* && ilp32 } } } */
+/* { dg-require-effective-target powerpc_p9modulo_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+int ismod (int a, int b) { return a%b; }
+unsigned int iumod (unsigned int a, unsigned int b) { return a%b; }
+
+/* { dg-final { scan-assembler-times "modsw " 1 } } */
+/* { dg-final { scan-assembler-times "moduw " 1 } } */
+/* { dg-final { scan-assembler-not   "mullw "   } } */
+/* { dg-final { scan-assembler-not   "divw "    } } */
+/* { dg-final { scan-assembler-not   "divwu "   } } */
Index: gcc/testsuite/gcc.target/powerpc/fusion2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/fusion2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/fusion2.c	(revision 0)
@@ -0,0 +1,10 @@ 
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
+/* { dg-options "-mcpu=power7 -mtune=power8 -O3" } */
+
+vector double fusion_vector (vector double *p) { return p[2]; }
+
+/* { dg-final { scan-assembler-times "vector load fusion" 1 } } */
Index: gcc/testsuite/gcc.target/powerpc/fusion3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/fusion3.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/fusion3.c	(revision 0)
@@ -0,0 +1,18 @@ 
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
+/* { dg-options "-mcpu=power7 -mtune=power9 -O3" } */
+
+#define LARGE 0x12345
+
+int fusion_float_read (float *p){ return p[LARGE]; }
+int fusion_double_read (double *p){ return p[LARGE]; }
+
+void fusion_float_write (float *p, float f){ p[LARGE] = f; }
+void fusion_double_write (double *p, double d){ p[LARGE] = d; }
+
+/* { dg-final { scan-assembler "load fusion, type SF"  } } */
+/* { dg-final { scan-assembler "load fusion, type DF"  } } */
+/* { dg-final { scan-assembler "store fusion, type SF" } } */
+/* { dg-final { scan-assembler "store fusion, type DF" } } */
Index: gcc/testsuite/gcc.target/powerpc/fusion.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/fusion.c	(revision 230064)
+++ gcc/testsuite/gcc.target/powerpc/fusion.c	(working copy)
@@ -1,6 +1,5 @@ 
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
-/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
 /* { dg-options "-mcpu=power7 -mtune=power8 -O3" } */
@@ -14,10 +13,7 @@  int fusion_short (short *p){ return p[LA
 int fusion_int (int *p){ return p[LARGE]; }
 unsigned fusion_uns (unsigned *p){ return p[LARGE]; }
 
-vector double fusion_vector (vector double *p) { return p[2]; }
-
 /* { dg-final { scan-assembler-times "gpr load fusion"    6 } } */
-/* { dg-final { scan-assembler-times "vector load fusion" 1 } } */
 /* { dg-final { scan-assembler-times "lbz"                2 } } */
 /* { dg-final { scan-assembler-times "extsb"              1 } } */
 /* { dg-final { scan-assembler-times "lhz"                2 } } */
Index: gcc/testsuite/gcc.target/powerpc/float128-call.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/float128-call.c	(revision 230064)
+++ gcc/testsuite/gcc.target/powerpc/float128-call.c	(working copy)
@@ -1,6 +1,5 @@ 
 /* { dg-do compile { target { powerpc*-*-linux* } } } */
-/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-require-effective-target powerpc_float128_sw_ok } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
 /* { dg-options "-O2 -mcpu=power7 -mfloat128 -mno-regnames" } */