Add power9 support to GCC, patch #5 (ISA 3.0 fusion)

Message ID 20151109004204.GE17170@ibm-tiger.the-meissners.org
State New

Commit Message

Michael Meissner Nov. 9, 2015, 12:42 a.m. UTC
This patch adds support for new fusion forms in ISA 3.0 (power9).  In
particular, ISA 3.0 can fuse GPR loads of R0, FPR loads, GPR stores, FPR
stores, and some constant generation that ISA 2.07 (power8) could not
generate.

I have built this patch with a bootstrap build on a power8 little endian
system.  There were no regressions in the test suite.  Is this patch ok to
install in the trunk once patch #1 has been installed?

[gcc]
2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/constraints.md (wF constraint): New constraints
	for power9/toc fusion.
	(wG constraint): Likewise.

	* config/rs6000/predicates.md (upper16_cint_operand): New
	predicate for power9 and toc fusion.
	(fpr_reg_operand): Likewise.
	(toc_fusion_or_p9_reg_operand): Likewise.
	(toc_fusion_mem_raw): Likewise.
	(toc_fusion_mem_wrapped): Likewise.
	(fusion_gpr_addis): If power9 fusion, allow fusion for a larger
	address range.
	(fusion_gpr_mem_combo): Delete, use fusion_addis_mem_combo_load
	instead.
	(fusion_addis_mem_combo_load): Add support for power9 fusion of
	floating point loads, floating point stores, and gpr stores.
	(fusion_addis_mem_combo_store): Likewise.
	(fusion_offsettable_mem_operand): Likewise.

	* config/rs6000/rs6000-protos.h (emit_fusion_addis): Add
	declarations.
	(emit_fusion_load_store): Likewise.
	(fusion_p9_p): Likewise.
	(expand_fusion_p9_load): Likewise.
	(expand_fusion_p9_store): Likewise.
	(emit_fusion_p9_load): Likewise.
	(emit_fusion_p9_store): Likewise.
	(fusion_wrap_memory_address): Likewise.

	* config/rs6000/rs6000.c (struct rs6000_reg_addr): Add new
	elements for power9 fusion.
	(rs6000_debug_print_mode): Rework debug information to print more
	information about fusion.
	(rs6000_init_hard_regno_mode_ok): Setup for power9 fusion
	support.
	(rs6000_legitimate_address_p): Recognize toc fusion as a valid
	offsettable memory address.
	(emit_fusion_gpr_load): Move most of the code from
	emit_fusion_gpr_load into emit_fusion_addis that handles both
	power8 and power9 fusion.
	(emit_fusion_addis): Likewise.
	(emit_fusion_load_store): Likewise.
	(fusion_wrap_memory_address): Add support for TOC fusion.
	(fusion_split_address): Likewise.
	(fusion_p9_p): Add support for power9 fusion.
	(expand_fusion_p9_load): Likewise.
	(expand_fusion_p9_store): Likewise.
	(emit_fusion_p9_load): Likewise.
	(emit_fusion_p9_store): Likewise.

	* config/rs6000/rs6000.h (TARGET_TOC_FUSION_INT): New macros for
	power9 fusion support.
	(TARGET_TOC_FUSION_FP): Likewise.

	* config/rs6000/rs6000.md (UNSPEC_FUSION_P9): New power9/toc
	fusion unspecs.
	(UNSPEC_FUSION_ADDIS): Likewise.
	(QHSI mode iterator): New iterator for power9 fusion.
	(GPR_FUSION): Likewise.
	(FPR_FUSION): Likewise.
	(power9 fusion splitter): New power9/toc fusion support.
	(toc_fusionload_<mode>): Likewise.
	(toc_fusionload_di): Likewise.
	(fusion_gpr_load_<mode>): Update predicate function.
	(power9 fusion peephole2s): New power9/toc fusion support.
	(fusion_gpr_<P:mode>_<GPR_FUSION:mode>_load): Likewise.
	(fusion_gpr_<P:mode>_<GPR_FUSION:mode>_store): Likewise.
	(fusion_fpr_<P:mode>_<FPR_FUSION:mode>_load): Likewise.
	(fusion_fpr_<P:mode>_<FPR_FUSION:mode>_store): Likewise.
	(fusion_p9_<mode>_constant): Likewise.

[gcc/testsuite]
2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/fusion.c (fusion_vector): Move to fusion2.c
	and allow the test on PowerPC LE.
	* gcc.target/powerpc/fusion2.c (fusion_vector): Likewise.

	* gcc.target/powerpc/fusion3.c: New file, test power9 fusion.

Comments

Segher Boessenkool Nov. 9, 2015, 5:16 p.m. UTC | #1
On Sun, Nov 08, 2015 at 07:42:04PM -0500, Michael Meissner wrote:
> -  /* Power8 currently will only do the fusion if the top 11 bits of the addis
> -     value are all 1's or 0's.  */
>    value = INTVAL (int_const);
>    if ((value & (HOST_WIDE_INT)0xffff) != 0)

Space after cast, like  (HOST_WIDE_INT) 0xffff  .

> +  /* Power8 currently will only do the fusion if the top 11 bits of the addis
> +     value are all 1's or 0's.  Ignore this restriction if we are testing
> +     advanced fusion.  */
> +  if (TARGET_P9_FUSION)
> +    return 1;

This comment seems out of date?

>  ;; Match a GPR load (lbz, lhz, lwz, ld) that uses a combined address in the
>  ;; memory field with both the addis and the memory offset.  Sign extension
>  ;; is not handled here, since lha and lwa are not fused.
> -(define_predicate "fusion_gpr_mem_combo"
> -  (match_code "mem,zero_extend")
> +;; With extended fusion, also match a FPR load (lfd, lfs) and float_extend

And here?

> --- gcc/config/rs6000/rs6000.c	(revision 229975)
> +++ gcc/config/rs6000/rs6000.c	(working copy)
> @@ -376,8 +376,18 @@ struct rs6000_reg_addr {
>    enum insn_code reload_fpr_gpr;	/* INSN to move from FPR to GPR.  */
>    enum insn_code reload_gpr_vsx;	/* INSN to move from GPR to VSX.  */
>    enum insn_code reload_vsx_gpr;	/* INSN to move from VSX to GPR.  */
> +  enum insn_code fusion_gpr_ld;		/* INSN for fusing gpr ADDIS/loads.  */
> +					/* INSNs for fusing addi with loads
> +					   or stores for each reg. class.  */					   
> +  enum insn_code fusion_addi_ld[(int)N_RELOAD_REG];
> +  enum insn_code fusion_addi_st[(int)N_RELOAD_REG];
> +					/* INSNs for fusing addis with loads
> +					   or stores for each reg. class.  */					   

Trailing tabs.

> +/* Return true if the peephole2 can combine a load/store involving a
> +   combination of an addis instruction and the memory operation.  This was
> +   added to the ISA 3.0 (power9) hardware.  */
> +
> +bool
> +fusion_p9_p (rtx addis_reg,		/* register set via addis.  */
> +	     rtx addis_value,		/* addis value.  */
> +	     rtx dest,			/* destination (memory or register). */
> +	     rtx src)			/* source (register or memory).  */

The function header comment should explain the params, after which you
can use the normal style for the function declaration itself.
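
A minimal sketch of the layout being asked for here, with the parameter
descriptions moved into the header comment and a plain parameter list.  The
rtx typedef, the name fusion_example_p and the stub body are placeholders so
that the fragment compiles on its own; they are not the patch's code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for GCC's rtx type so that this fragment is self-contained.  */
typedef struct rtx_def *rtx;

/* Return true if the peephole2 can combine a load/store involving a
   combination of an addis instruction and the memory operation.
   ADDIS_REG is the register set via addis, ADDIS_VALUE is the addis value,
   DEST is the destination (memory or register) and SRC is the source
   (register or memory).

   The body is only a placeholder that rejects missing operands.  */

bool
fusion_example_p (rtx addis_reg, rtx addis_value, rtx dest, rtx src)
{
  return (addis_reg != NULL && addis_value != NULL
	  && dest != NULL && src != NULL);
}
```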

> +(define_insn "*toc_fusionload_<mode>"
> +  [(set (match_operand:QHSI 0 "int_reg_operand" "=&b,??r")
> +	(match_operand:QHSI 1 "toc_fusion_mem_wrapped" "wG,wG"))
> +   (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
> +   (use (match_operand:DI 2 "base_reg_operand" "r,r"))
> +   (clobber (match_scratch:DI 3 "=X,&b"))]
> +  "TARGET_TOC_FUSION_INT"

Do you need that "??r" alternative?  Same for the next define_insn.

Big patch, most looks good :-)


Segher

Michael Meissner Nov. 9, 2015, 5:34 p.m. UTC | #2
On Mon, Nov 09, 2015 at 11:16:27AM -0600, Segher Boessenkool wrote:
> On Sun, Nov 08, 2015 at 07:42:04PM -0500, Michael Meissner wrote:
> > -  /* Power8 currently will only do the fusion if the top 11 bits of the addis
> > -     value are all 1's or 0's.  */
> >    value = INTVAL (int_const);
> >    if ((value & (HOST_WIDE_INT)0xffff) != 0)
> 
> Space after cast, like  (HOST_WIDE_INT) 0xffff  .

Thanks.

> > +  /* Power8 currently will only do the fusion if the top 11 bits of the addis
> > +     value are all 1's or 0's.  Ignore this restriction if we are testing
> > +     advanced fusion.  */
> > +  if (TARGET_P9_FUSION)
> > +    return 1;
> 
> This comment seems out of date?

Yeah, when I first coded it, while the fusion semantics were still being
nailed down, I couldn't reference power9 in the branch, which was kept on the
FSF servers, so I just called it advanced fusion.  I evidently missed a few
places when changing the name during the merge.
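
For reference, the check that the quoted comment describes can be sketched as
standalone C.  This is a simplified model of the constant test in
fusion_gpr_addis, not the predicate itself: the function name is made up, the
p9_fusion parameter stands in for TARGET_P9_FUSION, and int64_t stands in for
HOST_WIDE_INT.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if VALUE is an addis constant that is eligible for fusion.
   P9_FUSION is a stand-in for the TARGET_P9_FUSION test in the patch.  */

static bool
addis_value_fusible_p (int64_t value, bool p9_fusion)
{
  /* The low 16 bits must be zero, since addis only sets the upper half.  */
  if ((value & (int64_t) 0xffff) != 0)
    return false;

  /* Bits 16..31 must not all be zero.  */
  if ((value & (int64_t) 0xffff0000) == 0)
    return false;

  /* Power9 fusion drops the power8 range restriction.  */
  if (p9_fusion)
    return true;

  /* Power8: the top 11 bits of the 16-bit addis immediate must be all 0's
     or all 1's, so the immediate has to lie in -32..31.  The division is
     exact because the low 16 bits are known to be zero.  */
  int64_t upper = value / 65536;
  return upper >= -32 && upper <= 31;
}
```

Under this model, an addis immediate of 31 is accepted in both modes, while an
immediate of 32 is accepted only when p9_fusion is true.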

> >  ;; Match a GPR load (lbz, lhz, lwz, ld) that uses a combined address in the
> >  ;; memory field with both the addis and the memory offset.  Sign extension
> >  ;; is not handled here, since lha and lwa are not fused.
> > -(define_predicate "fusion_gpr_mem_combo"
> > -  (match_code "mem,zero_extend")
> > +;; With extended fusion, also match a FPR load (lfd, lfs) and float_extend
> 
> And here?

Yes.

> > --- gcc/config/rs6000/rs6000.c	(revision 229975)
> > +++ gcc/config/rs6000/rs6000.c	(working copy)
> > @@ -376,8 +376,18 @@ struct rs6000_reg_addr {
> >    enum insn_code reload_fpr_gpr;	/* INSN to move from FPR to GPR.  */
> >    enum insn_code reload_gpr_vsx;	/* INSN to move from GPR to VSX.  */
> >    enum insn_code reload_vsx_gpr;	/* INSN to move from VSX to GPR.  */
> > +  enum insn_code fusion_gpr_ld;		/* INSN for fusing gpr ADDIS/loads.  */
> > +					/* INSNs for fusing addi with loads
> > +					   or stores for each reg. class.  */					   
> > +  enum insn_code fusion_addi_ld[(int)N_RELOAD_REG];
> > +  enum insn_code fusion_addi_st[(int)N_RELOAD_REG];
> > +					/* INSNs for fusing addis with loads
> > +					   or stores for each reg. class.  */					   
> 
> Trailing tabs.

Ok.

> > +/* Return true if the peephole2 can combine a load/store involving a
> > +   combination of an addis instruction and the memory operation.  This was
> > +   added to the ISA 3.0 (power9) hardware.  */
> > +
> > +bool
> > +fusion_p9_p (rtx addis_reg,		/* register set via addis.  */
> > +	     rtx addis_value,		/* addis value.  */
> > +	     rtx dest,			/* destination (memory or register). */
> > +	     rtx src)			/* source (register or memory).  */
> 
> The function header comment should explain the params, after which you
> can use the normal style for the function declaration itself.

Ok.

> > +(define_insn "*toc_fusionload_<mode>"
> > +  [(set (match_operand:QHSI 0 "int_reg_operand" "=&b,??r")
> > +	(match_operand:QHSI 1 "toc_fusion_mem_wrapped" "wG,wG"))
> > +   (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
> > +   (use (match_operand:DI 2 "base_reg_operand" "r,r"))
> > +   (clobber (match_scratch:DI 3 "=X,&b"))]
> > +  "TARGET_TOC_FUSION_INT"
> 
> Do you need that "??r" alternative?  Same for the next define_insn.

Yes, unfortunately.  The ??r catches the case where r0 is chosen.  R0 is not a
base register, and it can't be used for power8 gpr fusion (where you use the
value being loaded for the ADDIS instruction), but it can be used for power9
fusion (where the ADDIS must be adjacent, but it no longer has to be the
register being loaded).
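
The register constraint described above can be modelled in a few lines of C.
This is a deliberately simplified sketch: the function names are invented, it
only looks at register numbers, and it ignores every other fusion requirement
(offset range, memory mode, and so on).  The one hardware fact it relies on
is that an RA field of 0 in addis and in D-form loads means the constant 0
rather than the contents of r0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Base value used by addis and D-form loads: an RA field of 0 means the
   constant 0, not the contents of r0, so r0 cannot act as a base.  */

static int64_t
base_value (const int64_t gpr[32], unsigned ra)
{
  return ra == 0 ? 0 : gpr[ra];
}

/* Power8-style GPR load fusion: the addis target is also the register being
   loaded, so the loaded register doubles as the base and cannot be r0.  */

static bool
p8_gpr_load_fusible_p (unsigned addis_target, unsigned load_base,
		       unsigned load_target)
{
  return (addis_target == load_base
	  && addis_target == load_target
	  && load_base != 0);
}

/* Power9-style fusion: the addis still feeds the base register, but the
   register being loaded may differ, so loading into r0 is possible when a
   separate base register is available.  */

static bool
p9_gpr_load_fusible_p (unsigned addis_target, unsigned load_base,
		       unsigned load_target)
{
  (void) load_target;
  return addis_target == load_base && load_base != 0;
}
```

In this model a load into r0 through base r9 is rejected by the power8 test
and accepted by the power9 test, which corresponds to the second alternative
of the pattern needing a scratch base register.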

> Big patch, most looks good :-)

Thanks.

David Edelsohn Nov. 9, 2015, 6:57 p.m. UTC | #3
On Sun, Nov 8, 2015 at 4:42 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> This patch adds support for new fusion forms in ISA 3.0 (power9).  In
> particular, ISA 3.0 can fuse GPR loads of R0, FPR loads, GPR stores, FPR
> stores, and some constant generation that ISA 2.07 (power8) could not
> generate.
>
> I have built this patch with a bootstrap build on a power8 little endian
> system.  There were no regressions in the test suite.  Is this patch ok to
> install in the trunk once patch #1 has been installed?
>
> [gcc]
> 2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         * config/rs6000/constraints.md (wF constraint): New constraints
>         for power9/toc fusion.
>         (wG constraint): Likewise.
>
>         * config/rs6000/predicates.md (upper16_cint_operand): New
>         predicate for power9 and toc fusion.
>         (fpr_reg_operand): Likewise.
>         (toc_fusion_or_p9_reg_operand): Likewise.
>         (toc_fusion_mem_raw): Likewise.
>         (toc_fusion_mem_wrapped): Likewise.
>         (fusion_gpr_addis): If power9 fusion, allow fusion for a larger
>         address range.
>         (fusion_gpr_mem_combo): Delete, use fusion_addis_mem_combo_load
>         instead.
>         (fusion_addis_mem_combo_load): Add support for power9 fusion of
>         floating point loads, floating point stores, and gpr stores.
>         (fusion_addis_mem_combo_store): Likewise.
>         (fusion_offsettable_mem_operand): Likewise.
>
>         * config/rs6000/rs6000-protos.h (emit_fusion_addis): Add
>         declarations.
>         (emit_fusion_load_store): Likewise.
>         (fusion_p9_p): Likewise.
>         (expand_fusion_p9_load): Likewise.
>         (expand_fusion_p9_store): Likewise.
>         (emit_fusion_p9_load): Likewise.
>         (emit_fusion_p9_store): Likewise.
>         (fusion_wrap_memory_address): Likewise.
>
>         * config/rs6000/rs6000.c (struct rs6000_reg_addr): Add new
>         elements for power9 fusion.
>         (rs6000_debug_print_mode): Rework debug information to print more
>         information about fusion.
>         (rs6000_init_hard_regno_mode_ok): Setup for power9 fusion
>         support.
>         (rs6000_legitimate_address_p): Recognize toc fusion as a valid
>         offsettable memory address.
>         (emit_fusion_gpr_load): Move most of the code from
>         emit_fusion_gpr_load into emit_fusion_addis that handles both
>         power8 and power9 fusion.
>         (emit_fusion_addis): Likewise.
>         (emit_fusion_load_store): Likewise.
>         (fusion_wrap_memory_address): Add support for TOC fusion.
>         (fusion_split_address): Likewise.
>         (fusion_p9_p): Add support for power9 fusion.
>         (expand_fusion_p9_load): Likewise.
>         (expand_fusion_p9_store): Likewise.
>         (emit_fusion_p9_load): Likewise.
>         (emit_fusion_p9_store): Likewise.
>
>         * config/rs6000/rs6000.h (TARGET_TOC_FUSION_INT): New macros for
>         power9 fusion support.
>         (TARGET_TOC_FUSION_FP): Likewise.
>
>         * config/rs6000/rs6000.md (UNSPEC_FUSION_P9): New power9/toc
>         fusion unspecs.
>         (UNSPEC_FUSION_ADDIS): Likewise.
>         (QHSI mode iterator): New iterator for power9 fusion.
>         (GPR_FUSION): Likewise.
>         (FPR_FUSION): Likewise.
>         (power9 fusion splitter): New power9/toc fusion support.
>         (toc_fusionload_<mode>): Likewise.
>         (toc_fusionload_di): Likewise.
>         (fusion_gpr_load_<mode>): Update predicate function.
>         (power9 fusion peephole2s): New power9/toc fusion support.
>         (fusion_gpr_<P:mode>_<GPR_FUSION:mode>_load): Likewise.
>         (fusion_gpr_<P:mode>_<GPR_FUSION:mode>_store): Likewise.
>         (fusion_fpr_<P:mode>_<FPR_FUSION:mode>_load): Likewise.
>         (fusion_fpr_<P:mode>_<FPR_FUSION:mode>_store): Likewise.
>         (fusion_p9_<mode>_constant): Likewise.
>
> [gcc/testsuite]
> 2015-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>         * gcc.target/powerpc/fusion.c (fusion_vector): Move to fusion2.c
>         and allow the test on PowerPC LE.
>         * gcc.target/powerpc/fusion2.c (fusion_vector): Likewise.
>
>         * gcc.target/powerpc/fusion3.c: New file, test power9 fusion.

Okay, with the changes that you and Segher discussed.

Thanks, David

Segher Boessenkool Nov. 9, 2015, 7:57 p.m. UTC | #4
On Mon, Nov 09, 2015 at 12:34:20PM -0500, Michael Meissner wrote:
> > > +(define_insn "*toc_fusionload_<mode>"
> > > +  [(set (match_operand:QHSI 0 "int_reg_operand" "=&b,??r")
> > > +	(match_operand:QHSI 1 "toc_fusion_mem_wrapped" "wG,wG"))
> > > +   (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
> > > +   (use (match_operand:DI 2 "base_reg_operand" "r,r"))
> > > +   (clobber (match_scratch:DI 3 "=X,&b"))]
> > > +  "TARGET_TOC_FUSION_INT"
> > 
> > Do you need that "??r" alternative?  Same for the next define_insn.
> 
> Yes unfortunately.  The ??r catches the case where r0 is chosen.  R0 is not a
> base register, and it can't be used for power8 gpr fusion (where you use the
> value being loaded for the ADDIS instruction), but it can be used for power9
> fusion (where the ADDIS must be adjacent, but it no longer has to be the
> register being loaded).

If you have only "b", r0 will not be chosen.  Does that help?  Or are
you generating this pattern from somewhere else where you put in r0?


Segher

David Edelsohn Nov. 9, 2015, 9:11 p.m. UTC | #5
On Mon, Nov 9, 2015 at 11:57 AM, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> On Mon, Nov 09, 2015 at 12:34:20PM -0500, Michael Meissner wrote:
>> > > +(define_insn "*toc_fusionload_<mode>"
>> > > +  [(set (match_operand:QHSI 0 "int_reg_operand" "=&b,??r")
>> > > + (match_operand:QHSI 1 "toc_fusion_mem_wrapped" "wG,wG"))
>> > > +   (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
>> > > +   (use (match_operand:DI 2 "base_reg_operand" "r,r"))
>> > > +   (clobber (match_scratch:DI 3 "=X,&b"))]
>> > > +  "TARGET_TOC_FUSION_INT"
>> >
>> > Do you need that "??r" alternative?  Same for the next define_insn.
>>
>> Yes unfortunately.  The ??r catches the case where r0 is chosen.  R0 is not a
>> base register, and it can't be used for power8 gpr fusion (where you use the
>> value being loaded for the ADDIS instruction), but it can be used for power9
>> fusion (where the ADDIS must be adjacent, but it no longer has to be the
>> register being loaded).
>
> If you have only "b", r0 will not be chosen.  Does that help?  Or are
> you generating this pattern from somewhere else where you put in r0?

Mike,

What happens if you leave out the "r" alternative?  Does other code
explicitly generate that pattern with r0?

Thanks, David

Michael Meissner Nov. 9, 2015, 10:17 p.m. UTC | #6
On Mon, Nov 09, 2015 at 01:11:41PM -0800, David Edelsohn wrote:
> On Mon, Nov 9, 2015 at 11:57 AM, Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> > On Mon, Nov 09, 2015 at 12:34:20PM -0500, Michael Meissner wrote:
> >> > > +(define_insn "*toc_fusionload_<mode>"
> >> > > +  [(set (match_operand:QHSI 0 "int_reg_operand" "=&b,??r")
> >> > > + (match_operand:QHSI 1 "toc_fusion_mem_wrapped" "wG,wG"))
> >> > > +   (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
> >> > > +   (use (match_operand:DI 2 "base_reg_operand" "r,r"))
> >> > > +   (clobber (match_scratch:DI 3 "=X,&b"))]
> >> > > +  "TARGET_TOC_FUSION_INT"
> >> >
> >> > Do you need that "??r" alternative?  Same for the next define_insn.
> >>
> >> Yes unfortunately.  The ??r catches the case where r0 is chosen.  R0 is not a
> >> base register, and it can't be used for power8 gpr fusion (where you use the
> >> value being loaded for the ADDIS instruction), but it can be used for power9
> >> fusion (where the ADDIS must be adjacent, but it no longer has to be the
> >> register being loaded).
> >
> > If you have only "b", r0 will not be chosen.  Does that help?  Or are
> > you generating this pattern from somewhere else where you put in r0?
> 
> Mike,
> 
> What happens if you leave out the "r" alternative?  Does other code
> explicitly generate that pattern with r0?

Sometimes, one of the passes after reload (usually -fgcse-after-reload) decides
to redo the register allocation, and I would see a failure in building things
like Spec 2006.  I have tried not putting the "r" in there, or using
base_reg_operand instead of gpc_reg_operand, but I still got failures.

David Edelsohn Nov. 9, 2015, 10:33 p.m. UTC | #7
On Mon, Nov 9, 2015 at 2:17 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> On Mon, Nov 09, 2015 at 01:11:41PM -0800, David Edelsohn wrote:
>> On Mon, Nov 9, 2015 at 11:57 AM, Segher Boessenkool
>> <segher@kernel.crashing.org> wrote:
>> > On Mon, Nov 09, 2015 at 12:34:20PM -0500, Michael Meissner wrote:
>> >> > > +(define_insn "*toc_fusionload_<mode>"
>> >> > > +  [(set (match_operand:QHSI 0 "int_reg_operand" "=&b,??r")
>> >> > > + (match_operand:QHSI 1 "toc_fusion_mem_wrapped" "wG,wG"))
>> >> > > +   (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
>> >> > > +   (use (match_operand:DI 2 "base_reg_operand" "r,r"))
>> >> > > +   (clobber (match_scratch:DI 3 "=X,&b"))]
>> >> > > +  "TARGET_TOC_FUSION_INT"
>> >> >
>> >> > Do you need that "??r" alternative?  Same for the next define_insn.
>> >>
>> >> Yes unfortunately.  The ??r catches the case where r0 is chosen.  R0 is not a
>> >> base register, and it can't be used for power8 gpr fusion (where you use the
>> >> value being loaded for the ADDIS instruction), but it can be used for power9
>> >> fusion (where the ADDIS must be adjacent, but it no longer has to be the
>> >> register being loaded).
>> >
>> > If you have only "b", r0 will not be chosen.  Does that help?  Or are
>> > you generating this pattern from somewhere else where you put in r0?
>>
>> Mike,
>>
>> What happens if you leave out the "r" alternative?  Does other code
>> explicitly generate that pattern with r0?
>
> Sometimes, one of the passes after reload (usually -fgcse-after-reload) decides
> to redo the register allocation, and I would see a failure in building things
> like Spec 2006.  I have tried not putting the "r" in there, or using
> base_reg_operand instead of gpc_reg_operand, but I still got failures.

This seems like a bug in those other passes that should be tracked down.

Thanks, David

Segher Boessenkool Nov. 14, 2015, 10:58 p.m. UTC | #8
On Sun, Nov 08, 2015 at 07:42:04PM -0500, Michael Meissner wrote:
> This patch adds support for new fusion forms in ISA 3.0 (power9).  In
> particular, ISA 3.0 can fuse GPR loads of R0, FPR loads, GPR stores, FPR
> stores, and some constant generation that ISA 2.07 (power8) could not
> generate.

TOC fusion breaks thousands of testcases with -mlra -flto.

What happens is that LRA tries to reload the memory address (the unspec
FUSION_ADDIS) into a register, but there is no pattern that will let it
do that.

This can be fixed temporarily by not enabling TOC fusion if LRA is
enabled.
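
As a rough illustration of that temporary workaround, here is a sketch only.
The struct and its field names are hypothetical and do not correspond to
GCC's real option variables; the point is just that TOC fusion is switched
off whenever LRA is in use.

```c
#include <stdbool.h>

/* Hypothetical option flags; the field names are illustrative only.  */
struct fusion_options
{
  bool lra;		/* LRA is enabled (-mlra).  */
  bool toc_fusion;	/* TOC fusion has been requested.  */
};

/* Turn TOC fusion off whenever LRA is enabled, leaving it untouched
   otherwise.  */

static void
disable_toc_fusion_under_lra (struct fusion_options *opts)
{
  if (opts->lra)
    opts->toc_fusion = false;
}
```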

It seems that without -flto TOC fusion doesn't do much at all, btw?


Segher

Patch

Index: gcc/config/rs6000/constraints.md
===================================================================
--- gcc/config/rs6000/constraints.md	(revision 229970)
+++ gcc/config/rs6000/constraints.md	(working copy)
@@ -137,6 +137,16 @@  (define_constraint "wD"
   (and (match_code "const_int")
        (match_test "TARGET_VSX && (ival == VECTOR_ELEMENT_SCALAR_64BIT)")))
 
+;; Extended fusion store
+(define_memory_constraint "wF"
+  "Memory operand suitable for power9 fusion load/stores"
+  (match_operand 0 "fusion_addis_mem_combo_load"))
+
+;; Fusion gpr load.
+(define_memory_constraint "wG"
+  "Memory operand suitable for TOC fusion memory references"
+  (match_operand 0 "toc_fusion_mem_wrapped"))
+
 ;; Lq/stq validates the address for load/store quad
 (define_memory_constraint "wQ"
   "Memory operand suitable for the load/store quad instructions"
Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(revision 229975)
+++ gcc/config/rs6000/predicates.md	(working copy)
@@ -168,6 +168,12 @@  (define_predicate "u_short_cint_operand"
   (and (match_code "const_int")
        (match_test "satisfies_constraint_K (op)")))
 
+;; Return 1 if op is a constant integer that is a signed 16-bit constant
+;; shifted left 16 bits
+(define_predicate "upper16_cint_operand"
+  (and (match_code "const_int")
+       (match_test "satisfies_constraint_L (op)")))
+
 ;; Return 1 if op is a constant integer that cannot fit in a signed D field.
 (define_predicate "non_short_cint_operand"
   (and (match_code "const_int")
@@ -276,6 +282,70 @@  (define_predicate "base_reg_operand"
   return (REGNO (op) != FIRST_GPR_REGNO);
 })
 
+
+;; Return true if this is a traditional floating point register
+(define_predicate "fpr_reg_operand"
+  (match_code "reg,subreg")
+{
+  HOST_WIDE_INT r;
+
+  if (GET_CODE (op) == SUBREG)
+    op = SUBREG_REG (op);
+
+  if (!REG_P (op))
+    return 0;
+
+  r = REGNO (op);
+  if (r >= FIRST_PSEUDO_REGISTER)
+    return 1;
+
+  return FP_REGNO_P (r);
+})
+
+;; Return true if this is a register that has D-form addressing (GPR and
+;; traditional FPR registers for scalars).  ISA 3.0 (power9) adds D-form
+;; addressing for scalars in Altivec registers.
+;;
+;; If this is a pseudo only allow for GPR fusion in power8.  If we have the
+;; power9 fusion allow the floating point types.
+(define_predicate "toc_fusion_or_p9_reg_operand"
+  (match_code "reg,subreg")
+{
+  HOST_WIDE_INT r;
+  bool gpr_p = (mode == QImode || mode == HImode || mode == SImode
+		|| mode == SFmode
+		|| (TARGET_POWERPC64 && (mode == DImode || mode == DFmode)));
+  bool fpr_p = (TARGET_P9_FUSION
+		&& (mode == DFmode || mode == SFmode
+		    || (TARGET_POWERPC64 && mode == DImode)));
+  bool vmx_p = (TARGET_P9_FUSION && TARGET_P9_VECTOR
+		&& (mode == DFmode || mode == SFmode));
+
+  if (!TARGET_P8_FUSION)
+    return 0;
+
+  if (GET_CODE (op) == SUBREG)
+    op = SUBREG_REG (op);
+
+  if (!REG_P (op))
+    return 0;
+
+  r = REGNO (op);
+  if (r >= FIRST_PSEUDO_REGISTER)
+    return (gpr_p || fpr_p || vmx_p);
+
+  if (INT_REGNO_P (r))
+    return gpr_p;
+
+  if (FP_REGNO_P (r))
+    return fpr_p;
+
+  if (ALTIVEC_REGNO_P (r))
+    return vmx_p;
+
+  return 0;
+})
+
 ;; Return 1 if op is a HTM specific SPR register.
 (define_predicate "htm_spr_reg_operand"
   (match_operand 0 "register_operand")
@@ -1603,6 +1673,35 @@  (define_predicate "small_toc_ref"
   return GET_CODE (op) == UNSPEC && XINT (op, 1) == UNSPEC_TOCREL;
 })
 
+;; Match the TOC memory operand that can be fused with an addis instruction.
+;; This is used in matching a potential fused address before register
+;; allocation.
+(define_predicate "toc_fusion_mem_raw"
+  (match_code "mem")
+{
+  if (!TARGET_TOC_FUSION_INT || !can_create_pseudo_p ())
+    return false;
+
+  return small_toc_ref (XEXP (op, 0), Pmode);
+})
+
+;; Match the memory operand that has been fused with an addis instruction and
+;; wrapped inside of an (unspec [...] UNSPEC_FUSION_ADDIS) wrapper.
+(define_predicate "toc_fusion_mem_wrapped"
+  (match_code "mem")
+{
+  rtx addr;
+
+  if (!TARGET_TOC_FUSION_INT)
+    return false;
+
+  if (!MEM_P (op))
+    return false;
+
+  addr = XEXP (op, 0);
+  return (GET_CODE (addr) == UNSPEC && XINT (addr, 1) == UNSPEC_FUSION_ADDIS);
+})
+
 ;; Match the first insn (addis) in fusing the combination of addis and loads to
 ;; GPR registers on power8.
 (define_predicate "fusion_gpr_addis"
@@ -1625,8 +1724,6 @@  (define_predicate "fusion_gpr_addis"
   else
     return 0;
 
-  /* Power8 currently will only do the fusion if the top 11 bits of the addis
-     value are all 1's or 0's.  */
   value = INTVAL (int_const);
   if ((value & (HOST_WIDE_INT)0xffff) != 0)
     return 0;
@@ -1634,6 +1731,12 @@  (define_predicate "fusion_gpr_addis"
   if ((value & (HOST_WIDE_INT)0xffff0000) == 0)
     return 0;
 
+  /* Power8 currently will only do the fusion if the top 11 bits of the addis
+     value are all 1's or 0's.  Ignore this restriction if we are testing
+     advanced fusion.  */
+  if (TARGET_P9_FUSION)
+    return 1;
+
   return (IN_RANGE (value >> 16, -32, 31));
 })
 
@@ -1699,13 +1802,14 @@  (define_predicate "fusion_gpr_mem_load"
 ;; Match a GPR load (lbz, lhz, lwz, ld) that uses a combined address in the
 ;; memory field with both the addis and the memory offset.  Sign extension
 ;; is not handled here, since lha and lwa are not fused.
-(define_predicate "fusion_gpr_mem_combo"
-  (match_code "mem,zero_extend")
+;; With extended fusion, also match a FPR load (lfd, lfs) and float_extend
+(define_predicate "fusion_addis_mem_combo_load"
+  (match_code "mem,zero_extend,float_extend")
 {
   rtx addr, base, offset;
 
-  /* Handle zero extend.  */
-  if (GET_CODE (op) == ZERO_EXTEND)
+  /* Handle zero/float extend.  */
+  if (GET_CODE (op) == ZERO_EXTEND || GET_CODE (op) == FLOAT_EXTEND)
     {
       op = XEXP (op, 0);
       mode = GET_MODE (op);
@@ -1726,6 +1830,71 @@  (define_predicate "fusion_gpr_mem_combo"
 	return 0;
       break;
 
+    case SFmode:
+    case DFmode:
+      if (!TARGET_P9_FUSION)
+	return 0;
+      break;
+
+    default:
+      return 0;
+    }
+
+  addr = XEXP (op, 0);
+  if (GET_CODE (addr) != PLUS && GET_CODE (addr) != LO_SUM)
+    return 0;
+
+  base = XEXP (addr, 0);
+  if (!fusion_gpr_addis (base, GET_MODE (base)))
+    return 0;
+
+  offset = XEXP (addr, 1);
+  if (GET_CODE (addr) == PLUS)
+    return satisfies_constraint_I (offset);
+
+  else if (GET_CODE (addr) == LO_SUM)
+    {
+      if (TARGET_XCOFF || (TARGET_ELF && TARGET_POWERPC64))
+	return small_toc_ref (offset, GET_MODE (offset));
+
+      else if (TARGET_ELF && !TARGET_POWERPC64)
+	return CONSTANT_P (offset);
+    }
+
+  return 0;
+})
+
+;; Like fusion_addis_mem_combo_load, but for stores
+(define_predicate "fusion_addis_mem_combo_store"
+  (match_code "mem")
+{
+  rtx addr, base, offset;
+
+  if (!MEM_P (op) || !TARGET_P9_FUSION)
+    return 0;
+
+  switch (mode)
+    {
+    case QImode:
+    case HImode:
+    case SImode:
+      break;
+
+    case DImode:
+      if (!TARGET_POWERPC64)
+	return 0;
+      break;
+
+    case SFmode:
+      if (!TARGET_SF_FPR)
+	return 0;
+      break;
+
+    case DFmode:
+      if (!TARGET_DF_FPR)
+	return 0;
+      break;
+
     default:
       return 0;
     }
@@ -1753,3 +1922,20 @@  (define_predicate "fusion_gpr_mem_combo"
 
   return 0;
 })
+
+;; Return true if the operand is a float_extend or zero extend of an
+;; offsettable memory operand suitable for use in fusion
+(define_predicate "fusion_offsettable_mem_operand"
+  (match_code "mem,zero_extend,float_extend")
+{
+  if (GET_CODE (op) == ZERO_EXTEND || GET_CODE (op) == FLOAT_EXTEND)
+    {
+      op = XEXP (op, 0);
+      mode = GET_MODE (op);
+    }
+
+  if (!memory_operand (op, mode))
+    return 0;
+
+  return offsettable_nonstrict_memref_p (op);
+})
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 229970)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -87,7 +87,15 @@  extern bool direct_move_p (rtx, rtx);
 extern bool quad_load_store_p (rtx, rtx);
 extern bool fusion_gpr_load_p (rtx, rtx, rtx, rtx);
 extern void expand_fusion_gpr_load (rtx *);
+extern void emit_fusion_addis (rtx, rtx, const char *, const char *);
+extern void emit_fusion_load_store (rtx, rtx, rtx, const char *);
 extern const char *emit_fusion_gpr_load (rtx, rtx);
+extern bool fusion_p9_p (rtx, rtx, rtx, rtx);
+extern void expand_fusion_p9_load (rtx *);
+extern void expand_fusion_p9_store (rtx *);
+extern const char *emit_fusion_p9_load (rtx, rtx, rtx);
+extern const char *emit_fusion_p9_store (rtx, rtx, rtx);
+extern rtx fusion_wrap_memory_address (rtx);
 extern enum reg_class (*rs6000_preferred_reload_class_ptr) (rtx,
 							    enum reg_class);
 extern enum reg_class (*rs6000_secondary_reload_class_ptr) (enum reg_class,
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 229975)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -376,8 +376,18 @@  struct rs6000_reg_addr {
   enum insn_code reload_fpr_gpr;	/* INSN to move from FPR to GPR.  */
   enum insn_code reload_gpr_vsx;	/* INSN to move from GPR to VSX.  */
   enum insn_code reload_vsx_gpr;	/* INSN to move from VSX to GPR.  */
+  enum insn_code fusion_gpr_ld;		/* INSN for fusing gpr ADDIS/loads.  */
+					/* INSNs for fusing addi with loads
+					   or stores for each reg. class.  */					   
+  enum insn_code fusion_addi_ld[(int)N_RELOAD_REG];
+  enum insn_code fusion_addi_st[(int)N_RELOAD_REG];
+					/* INSNs for fusing addis with loads
+					   or stores for each reg. class.  */					   
+  enum insn_code fusion_addis_ld[(int)N_RELOAD_REG];
+  enum insn_code fusion_addis_st[(int)N_RELOAD_REG];
   addr_mask_type addr_mask[(int)N_RELOAD_REG]; /* Valid address masks.  */
   bool scalar_in_vmx_p;			/* Scalar value can go in VMX.  */
+  bool fused_toc;			/* Mode supports TOC fusion.  */
 };
 
 static struct rs6000_reg_addr reg_addr[NUM_MACHINE_MODES];
@@ -2026,25 +2036,113 @@  DEBUG_FUNCTION void
 rs6000_debug_print_mode (ssize_t m)
 {
   ssize_t rc;
+  int spaces = 0;
+  bool fuse_extra_p;
 
   fprintf (stderr, "Mode: %-5s", GET_MODE_NAME (m));
   for (rc = 0; rc < N_RELOAD_REG; rc++)
     fprintf (stderr, " %s: %s", reload_reg_map[rc].name,
 	     rs6000_debug_addr_mask (reg_addr[m].addr_mask[rc], true));
 
+  if ((reg_addr[m].reload_store != CODE_FOR_nothing)
+      || (reg_addr[m].reload_load != CODE_FOR_nothing))
+    fprintf (stderr, "  Reload=%c%c",
+	     (reg_addr[m].reload_store != CODE_FOR_nothing) ? 's' : '*',
+	     (reg_addr[m].reload_load != CODE_FOR_nothing) ? 'l' : '*');
+  else
+    spaces += sizeof ("  Reload=sl") - 1;
+
+  if (reg_addr[m].scalar_in_vmx_p)
+    {
+      fprintf (stderr, "%*s  Upper=y", spaces, "");
+      spaces = 0;
+    }
+  else
+    spaces += sizeof ("  Upper=y") - 1;
+
+  fuse_extra_p = ((reg_addr[m].fusion_gpr_ld != CODE_FOR_nothing)
+		  || reg_addr[m].fused_toc);
+  if (!fuse_extra_p)
+    {
+      for (rc = 0; rc < N_RELOAD_REG; rc++)
+	{
+	  if (rc != RELOAD_REG_ANY)
+	    {
+	      /* Any addi/addis load or store fusion for this class?  */
+	      if (reg_addr[m].fusion_addi_ld[rc]     != CODE_FOR_nothing
+		  || reg_addr[m].fusion_addi_st[rc]  != CODE_FOR_nothing
+		  || reg_addr[m].fusion_addis_ld[rc] != CODE_FOR_nothing
+		  || reg_addr[m].fusion_addis_st[rc] != CODE_FOR_nothing)
+		{
+		  fuse_extra_p = true;
+		  break;
+		}
+	    }
+	}
+    }
+
+  if (fuse_extra_p)
+    {
+      fprintf (stderr, "%*s  Fuse:", spaces, "");
+      spaces = 0;
+
+      for (rc = 0; rc < N_RELOAD_REG; rc++)
+	{
+	  if (rc != RELOAD_REG_ANY)
+	    {
+	      char load, store;
+
+	      if (reg_addr[m].fusion_addis_ld[rc] != CODE_FOR_nothing)
+		load = 'l';
+	      else if (reg_addr[m].fusion_addi_ld[rc] != CODE_FOR_nothing)
+		load = 'L';
+	      else
+		load = '-';
+
+	      if (reg_addr[m].fusion_addis_st[rc] != CODE_FOR_nothing)
+		store = 's';
+	      else if (reg_addr[m].fusion_addi_st[rc] != CODE_FOR_nothing)
+		store = 'S';
+	      else
+		store = '-';
+
+	      if (load == '-' && store == '-')
+		spaces += 5;
+	      else
+		{
+		  fprintf (stderr, "%*s%c=%c%c", (spaces + 1), "",
+			   reload_reg_map[rc].name[0], load, store);
+		  spaces = 0;
+		}
+	    }
+	}
+
+      if (reg_addr[m].fusion_gpr_ld != CODE_FOR_nothing)
+	{
+	  fprintf (stderr, "%*sP8gpr", (spaces + 1), "");
+	  spaces = 0;
+	}
+      else
+	spaces += sizeof (" P8gpr") - 1;
+
+      if (reg_addr[m].fused_toc)
+	{
+	  fprintf (stderr, "%*sToc", (spaces + 1), "");
+	  spaces = 0;
+	}
+      else
+	spaces += sizeof (" Toc") - 1;
+    }
+  else
+    spaces += sizeof ("  Fuse: G=ls F=ls v=ls P8gpr Toc") - 1;
+
   if (rs6000_vector_unit[m] != VECTOR_NONE
-      || rs6000_vector_mem[m] != VECTOR_NONE
-      || (reg_addr[m].reload_store != CODE_FOR_nothing)
-      || (reg_addr[m].reload_load != CODE_FOR_nothing)
-      || reg_addr[m].scalar_in_vmx_p)
+      || rs6000_vector_mem[m] != VECTOR_NONE)
     {
-      fprintf (stderr,
-	       "  Vector-arith=%-10s Vector-mem=%-10s Reload=%c%c Upper=%c",
+      fprintf (stderr, "%*s  vector: arith=%-10s mem=%s",
+	       spaces, "",
 	       rs6000_debug_vector_unit (rs6000_vector_unit[m]),
-	       rs6000_debug_vector_unit (rs6000_vector_mem[m]),
-	       (reg_addr[m].reload_store != CODE_FOR_nothing) ? 's' : '*',
-	       (reg_addr[m].reload_load != CODE_FOR_nothing) ? 'l' : '*',
-	       (reg_addr[m].scalar_in_vmx_p) ? 'y' : 'n');
+	       rs6000_debug_vector_unit (rs6000_vector_mem[m]));
     }
 
   fputs ("\n", stderr);
@@ -3019,6 +3117,130 @@  rs6000_init_hard_regno_mode_ok (bool glo
 	reg_addr[SFmode].scalar_in_vmx_p = true;
     }
 
+  /* Setup the fusion operations.  */
+  if (TARGET_P8_FUSION)
+    {
+      reg_addr[QImode].fusion_gpr_ld = CODE_FOR_fusion_gpr_load_qi;
+      reg_addr[HImode].fusion_gpr_ld = CODE_FOR_fusion_gpr_load_hi;
+      reg_addr[SImode].fusion_gpr_ld = CODE_FOR_fusion_gpr_load_si;
+      if (TARGET_64BIT)
+	reg_addr[DImode].fusion_gpr_ld = CODE_FOR_fusion_gpr_load_di;
+    }
+
+  if (TARGET_P9_FUSION)
+    {
+      struct fuse_insns {
+	enum machine_mode mode;			/* mode of the fused type.  */
+	enum machine_mode pmode;		/* pointer mode.  */
+	enum rs6000_reload_reg_type rtype;	/* register type.  */
+	enum insn_code load;			/* load insn.  */
+	enum insn_code store;			/* store insn.  */
+      };
+
+      static const struct fuse_insns addis_insns[] = {
+	{ SFmode, DImode, RELOAD_REG_FPR,
+	  CODE_FOR_fusion_fpr_di_sf_load,
+	  CODE_FOR_fusion_fpr_di_sf_store },
+
+	{ SFmode, SImode, RELOAD_REG_FPR,
+	  CODE_FOR_fusion_fpr_si_sf_load,
+	  CODE_FOR_fusion_fpr_si_sf_store },
+
+	{ DFmode, DImode, RELOAD_REG_FPR,
+	  CODE_FOR_fusion_fpr_di_df_load,
+	  CODE_FOR_fusion_fpr_di_df_store },
+
+	{ DFmode, SImode, RELOAD_REG_FPR,
+	  CODE_FOR_fusion_fpr_si_df_load,
+	  CODE_FOR_fusion_fpr_si_df_store },
+
+	{ DImode, DImode, RELOAD_REG_FPR,
+	  CODE_FOR_fusion_fpr_di_di_load,
+	  CODE_FOR_fusion_fpr_di_di_store },
+
+	{ DImode, SImode, RELOAD_REG_FPR,
+	  CODE_FOR_fusion_fpr_si_di_load,
+	  CODE_FOR_fusion_fpr_si_di_store },
+
+	{ QImode, DImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_di_qi_load,
+	  CODE_FOR_fusion_gpr_di_qi_store },
+
+	{ QImode, SImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_si_qi_load,
+	  CODE_FOR_fusion_gpr_si_qi_store },
+
+	{ HImode, DImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_di_hi_load,
+	  CODE_FOR_fusion_gpr_di_hi_store },
+
+	{ HImode, SImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_si_hi_load,
+	  CODE_FOR_fusion_gpr_si_hi_store },
+
+	{ SImode, DImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_di_si_load,
+	  CODE_FOR_fusion_gpr_di_si_store },
+
+	{ SImode, SImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_si_si_load,
+	  CODE_FOR_fusion_gpr_si_si_store },
+
+	{ SFmode, DImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_di_sf_load,
+	  CODE_FOR_fusion_gpr_di_sf_store },
+
+	{ SFmode, SImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_si_sf_load,
+	  CODE_FOR_fusion_gpr_si_sf_store },
+
+	{ DImode, DImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_di_di_load,
+	  CODE_FOR_fusion_gpr_di_di_store },
+
+	{ DFmode, DImode, RELOAD_REG_GPR,
+	  CODE_FOR_fusion_gpr_di_df_load,
+	  CODE_FOR_fusion_gpr_di_df_store },
+      };
+
+      enum machine_mode cur_pmode = Pmode;
+      size_t i;
+
+      for (i = 0; i < ARRAY_SIZE (addis_insns); i++)
+	{
+	  enum machine_mode xmode = addis_insns[i].mode;
+	  enum rs6000_reload_reg_type rtype = addis_insns[i].rtype;
+
+	  if (addis_insns[i].pmode != cur_pmode)
+	    continue;
+
+	  if (rtype == RELOAD_REG_FPR
+	      && (!TARGET_HARD_FLOAT || !TARGET_FPRS))
+	    continue;
+
+	  reg_addr[xmode].fusion_addis_ld[rtype] = addis_insns[i].load;
+	  reg_addr[xmode].fusion_addis_st[rtype] = addis_insns[i].store;
+	}
+    }
+
+  /* Note which types support fusing the TOC setup with a memory insn.  We only
+     do fused TOCs for medium/large code models.  */
+  if (TARGET_P8_FUSION && TARGET_TOC_FUSION && TARGET_POWERPC64
+      && (TARGET_CMODEL != CMODEL_SMALL))
+    {
+      reg_addr[QImode].fused_toc = true;
+      reg_addr[HImode].fused_toc = true;
+      reg_addr[SImode].fused_toc = true;
+      reg_addr[DImode].fused_toc = true;
+      if (TARGET_HARD_FLOAT && TARGET_FPRS)
+	{
+	  if (TARGET_SINGLE_FLOAT)
+	    reg_addr[SFmode].fused_toc = true;
+	  if (TARGET_DOUBLE_FLOAT)
+	    reg_addr[DFmode].fused_toc = true;
+	}
+    }
+
   /* Precalculate HARD_REGNO_NREGS.  */
   for (r = 0; r < FIRST_PSEUDO_REGISTER; ++r)
     for (m = 0; m < NUM_MACHINE_MODES; ++m)
@@ -8127,6 +8349,8 @@  rs6000_legitimate_address_p (machine_mod
       && legitimate_constant_pool_address_p (x, mode,
 					     reg_ok_strict || lra_in_progress))
     return 1;
+  if (reg_offset_p && reg_addr[mode].fused_toc && toc_fusion_mem_wrapped (x, mode))
+    return 1;
   /* For TImode, if we have load/store quad and TImode in VSX registers, only
      allow register indirect addresses.  This will allow the values to go in
      either GPRs or VSX registers without reloading.  The vector types would
@@ -35209,72 +35433,21 @@  expand_fusion_gpr_load (rtx *operands)
   return;
 }
 
-/* Return a string to fuse an addis instruction with a gpr load to the same
-   register that we loaded up the addis instruction.  The address that is used
-   is the logical address that was formed during peephole2:
-	(lo_sum (high) (low-part))
-
-   The code is complicated, so we call output_asm_insn directly, and just
-   return "".  */
+/* Emit the addis instruction that will be part of a fused instruction
+   sequence.  */
 
-const char *
-emit_fusion_gpr_load (rtx target, rtx mem)
+void
+emit_fusion_addis (rtx target, rtx addis_value, const char *comment,
+		   const char *mode_name)
 {
-  rtx addis_value;
   rtx fuse_ops[10];
-  rtx addr;
-  rtx load_offset;
-  const char *addis_str = NULL;
-  const char *load_str = NULL;
-  const char *mode_name = NULL;
   char insn_template[80];
-  machine_mode mode;
+  const char *addis_str = NULL;
   const char *comment_str = ASM_COMMENT_START;
 
-  if (GET_CODE (mem) == ZERO_EXTEND)
-    mem = XEXP (mem, 0);
-
-  gcc_assert (REG_P (target) && MEM_P (mem));
-
   if (*comment_str == ' ')
     comment_str++;
 
-  addr = XEXP (mem, 0);
-  if (GET_CODE (addr) != PLUS && GET_CODE (addr) != LO_SUM)
-    gcc_unreachable ();
-
-  addis_value = XEXP (addr, 0);
-  load_offset = XEXP (addr, 1);
-
-  /* Now emit the load instruction to the same register.  */
-  mode = GET_MODE (mem);
-  switch (mode)
-    {
-    case QImode:
-      mode_name = "char";
-      load_str = "lbz";
-      break;
-
-    case HImode:
-      mode_name = "short";
-      load_str = "lhz";
-      break;
-
-    case SImode:
-      mode_name = "int";
-      load_str = "lwz";
-      break;
-
-    case DImode:
-      gcc_assert (TARGET_POWERPC64);
-      mode_name = "long";
-      load_str = "ld";
-      break;
-
-    default:
-      gcc_unreachable ();
-    }
-
   /* Emit the addis instruction.  */
   fuse_ops[0] = target;
   if (satisfies_constraint_L (addis_value))
@@ -35353,68 +35526,531 @@  emit_fusion_gpr_load (rtx target, rtx me
   if (!addis_str)
     fatal_insn ("Could not generate addis value for fusion", addis_value);
 
-  sprintf (insn_template, "%s\t\t%s gpr load fusion, type %s", addis_str,
-	   comment_str, mode_name);
+  sprintf (insn_template, "%s\t\t%s %s, type %s", addis_str, comment_str,
+	   comment, mode_name);
   output_asm_insn (insn_template, fuse_ops);
+}
 
-  /* Emit the D-form load instruction.  */
-  if (CONST_INT_P (load_offset) && satisfies_constraint_I (load_offset))
+/* Emit a D-form load or store instruction that is the second instruction
+   of a fusion sequence.  */
+
+void
+emit_fusion_load_store (rtx load_store_reg, rtx addis_reg, rtx offset,
+			const char *insn_str)
+{
+  rtx fuse_ops[10];
+  char insn_template[80];
+
+  fuse_ops[0] = load_store_reg;
+  fuse_ops[1] = addis_reg;
+
+  if (CONST_INT_P (offset) && satisfies_constraint_I (offset))
     {
-      sprintf (insn_template, "%s %%0,%%1(%%0)", load_str);
-      fuse_ops[1] = load_offset;
+      sprintf (insn_template, "%s %%0,%%2(%%1)", insn_str);
+      fuse_ops[2] = offset;
       output_asm_insn (insn_template, fuse_ops);
     }
 
-  else if (GET_CODE (load_offset) == UNSPEC
-	   && XINT (load_offset, 1) == UNSPEC_TOCREL)
+  else if (GET_CODE (offset) == UNSPEC
+	   && XINT (offset, 1) == UNSPEC_TOCREL)
     {
       if (TARGET_ELF)
-	sprintf (insn_template, "%s %%0,%%1@toc@l(%%0)", load_str);
+	sprintf (insn_template, "%s %%0,%%2@toc@l(%%1)", insn_str);
 
       else if (TARGET_XCOFF)
-	sprintf (insn_template, "%s %%0,%%1@l(%%0)", load_str);
+	sprintf (insn_template, "%s %%0,%%2@l(%%1)", insn_str);
 
       else
 	gcc_unreachable ();
 
-      fuse_ops[1] = XVECEXP (load_offset, 0, 0);
+      fuse_ops[2] = XVECEXP (offset, 0, 0);
       output_asm_insn (insn_template, fuse_ops);
     }
 
-  else if (GET_CODE (load_offset) == PLUS
-	   && GET_CODE (XEXP (load_offset, 0)) == UNSPEC
-	   && XINT (XEXP (load_offset, 0), 1) == UNSPEC_TOCREL
-	   && CONST_INT_P (XEXP (load_offset, 1)))
+  else if (GET_CODE (offset) == PLUS
+	   && GET_CODE (XEXP (offset, 0)) == UNSPEC
+	   && XINT (XEXP (offset, 0), 1) == UNSPEC_TOCREL
+	   && CONST_INT_P (XEXP (offset, 1)))
     {
-      rtx tocrel_unspec = XEXP (load_offset, 0);
+      rtx tocrel_unspec = XEXP (offset, 0);
       if (TARGET_ELF)
-	sprintf (insn_template, "%s %%0,%%1+%%2@toc@l(%%0)", load_str);
+	sprintf (insn_template, "%s %%0,%%2+%%3@toc@l(%%1)", insn_str);
 
       else if (TARGET_XCOFF)
-	sprintf (insn_template, "%s %%0,%%1+%%2@l(%%0)", load_str);
+	sprintf (insn_template, "%s %%0,%%2+%%3@l(%%1)", insn_str);
 
       else
 	gcc_unreachable ();
 
-      fuse_ops[1] = XVECEXP (tocrel_unspec, 0, 0);
-      fuse_ops[2] = XEXP (load_offset, 1);
+      fuse_ops[2] = XVECEXP (tocrel_unspec, 0, 0);
+      fuse_ops[3] = XEXP (offset, 1);
       output_asm_insn (insn_template, fuse_ops);
     }
 
-  else if (TARGET_ELF && !TARGET_POWERPC64 && CONSTANT_P (load_offset))
+  else if (TARGET_ELF && !TARGET_POWERPC64 && CONSTANT_P (offset))
     {
-      sprintf (insn_template, "%s %%0,%%1@l(%%0)", load_str);
+      sprintf (insn_template, "%s %%0,%%2@l(%%1)", insn_str);
 
-      fuse_ops[1] = load_offset;
+      fuse_ops[2] = offset;
       output_asm_insn (insn_template, fuse_ops);
     }
 
   else
-    fatal_insn ("Unable to generate load offset for fusion", load_offset);
+    fatal_insn ("Unable to generate load/store offset for fusion", offset);
+
+  return;
+}
+
+/* Wrap a TOC address that can be fused to indicate that special fusion
+   processing is needed.  */
+
+rtx
+fusion_wrap_memory_address (rtx old_mem)
+{
+  rtx old_addr = XEXP (old_mem, 0);
+  rtvec v = gen_rtvec (1, old_addr);
+  rtx new_addr = gen_rtx_UNSPEC (Pmode, v, UNSPEC_FUSION_ADDIS);
+  return replace_equiv_address_nv (old_mem, new_addr, false);
+}
+
+/* Given an address, convert it into the addis and load offset parts.  Addresses
+   created during the peephole2 process look like:
+	(lo_sum (high (unspec [(sym)] UNSPEC_TOCREL))
+		(unspec [(...)] UNSPEC_TOCREL))
+
+   Addresses created via toc fusion look like:
+	(unspec [(unspec [(...)] UNSPEC_TOCREL)] UNSPEC_FUSION_ADDIS)  */
+
+static void
+fusion_split_address (rtx addr, rtx *p_hi, rtx *p_lo)
+{
+  rtx hi, lo;
+
+  if (GET_CODE (addr) == UNSPEC && XINT (addr, 1) == UNSPEC_FUSION_ADDIS)
+    {
+      lo = XVECEXP (addr, 0, 0);
+      hi = gen_rtx_HIGH (Pmode, lo);
+    }
+  else if (GET_CODE (addr) == PLUS || GET_CODE (addr) == LO_SUM)
+    {
+      hi = XEXP (addr, 0);
+      lo = XEXP (addr, 1);
+    }
+  else
+    gcc_unreachable ();
+
+  *p_hi = hi;
+  *p_lo = lo;
+}
+
+/* Return a string to fuse an addis instruction with a gpr load to the same
+   register that we loaded up the addis instruction.  The address that is used
+   is the logical address that was formed during peephole2:
+	(lo_sum (high) (low-part))
+
+   Or the address is the TOC address that is wrapped before register allocation:
+	(unspec [(addr)] UNSPEC_FUSION_ADDIS)
+
+   The code is complicated, so we call output_asm_insn directly, and just
+   return "".  */
+
+const char *
+emit_fusion_gpr_load (rtx target, rtx mem)
+{
+  rtx addis_value;
+  rtx addr;
+  rtx load_offset;
+  const char *load_str = NULL;
+  const char *mode_name = NULL;
+  machine_mode mode;
+
+  if (GET_CODE (mem) == ZERO_EXTEND)
+    mem = XEXP (mem, 0);
+
+  gcc_assert (REG_P (target) && MEM_P (mem));
+
+  addr = XEXP (mem, 0);
+  fusion_split_address (addr, &addis_value, &load_offset);
+
+  /* Now emit the load instruction to the same register.  */
+  mode = GET_MODE (mem);
+  switch (mode)
+    {
+    case QImode:
+      mode_name = "char";
+      load_str = "lbz";
+      break;
+
+    case HImode:
+      mode_name = "short";
+      load_str = "lhz";
+      break;
+
+    case SImode:
+    case SFmode:
+      mode_name = (mode == SFmode) ? "float" : "int";
+      load_str = "lwz";
+      break;
+
+    case DImode:
+    case DFmode:
+      gcc_assert (TARGET_POWERPC64);
+      mode_name = (mode == DFmode) ? "double" : "long";
+      load_str = "ld";
+      break;
+
+    default:
+      fatal_insn ("Bad GPR fusion", gen_rtx_SET (target, mem));
+    }
+
+  /* Emit the addis instruction.  */
+  emit_fusion_addis (target, addis_value, "gpr load fusion", mode_name);
+
+  /* Emit the D-form load instruction.  */
+  emit_fusion_load_store (target, target, load_offset, load_str);
 
   return "";
 }
 
+
+/* Return true if the peephole2 can combine a load/store involving a
+   combination of an addis instruction and the memory operation.  This was
+   added to the ISA 3.0 (power9) hardware.  */
+
+bool
+fusion_p9_p (rtx addis_reg,		/* register set via addis.  */
+	     rtx addis_value,		/* addis value.  */
+	     rtx dest,			/* destination (memory or register). */
+	     rtx src)			/* source (register or memory).  */
+{
+  rtx addr, mem, offset;
+  enum machine_mode mode = GET_MODE (src);
+
+  /* Validate arguments.  */
+  if (!base_reg_operand (addis_reg, GET_MODE (addis_reg)))
+    return false;
+
+  if (!fusion_gpr_addis (addis_value, GET_MODE (addis_value)))
+    return false;
+
+  /* Ignore extend operations that are part of the load.  */
+  if (GET_CODE (src) == FLOAT_EXTEND || GET_CODE (src) == ZERO_EXTEND)
+    src = XEXP (src, 0);
+
+  /* Test for memory<-register or register<-memory.  */
+  if (fpr_reg_operand (src, mode) || int_reg_operand (src, mode))
+    {
+      if (!MEM_P (dest))
+	return false;
+
+      mem = dest;
+    }
+
+  else if (MEM_P (src))
+    {
+      if (!fpr_reg_operand (dest, mode) && !int_reg_operand (dest, mode))
+	return false;
+
+      mem = src;
+    }
+
+  else
+    return false;
+
+  addr = XEXP (mem, 0);			/* either PLUS or LO_SUM.  */
+  if (GET_CODE (addr) == PLUS)
+    {
+      if (!rtx_equal_p (addis_reg, XEXP (addr, 0)))
+	return false;
+
+      return satisfies_constraint_I (XEXP (addr, 1));
+    }
+
+  else if (GET_CODE (addr) == LO_SUM)
+    {
+      if (!rtx_equal_p (addis_reg, XEXP (addr, 0)))
+	return false;
+
+      offset = XEXP (addr, 1);
+      if (TARGET_XCOFF || (TARGET_ELF && TARGET_POWERPC64))
+	return small_toc_ref (offset, GET_MODE (offset));
+
+      else if (TARGET_ELF && !TARGET_POWERPC64)
+	return CONSTANT_P (offset);
+    }
+
+  return false;
+}
+
+/* During the peephole2 pass, adjust and expand the insns for an extended fusion
+   load sequence.
+
+   The operands are:
+	operands[0]	register set with addis
+	operands[1]	value set via addis
+	operands[2]	target register being loaded
+	operands[3]	D-form memory reference using operands[0].
+
+   This is similar to the fusion introduced with power8, except it scales to
+   both loads/stores and does not require the result register to be the same as
+   the base register.  At the moment, we only do this if the register set with
+   addis is dead.  */
+
+void
+expand_fusion_p9_load (rtx *operands)
+{
+  rtx tmp_reg = operands[0];
+  rtx addis_value = operands[1];
+  rtx target = operands[2];
+  rtx orig_mem = operands[3];
+  rtx new_addr, new_mem, orig_addr, offset, set, clobber, insn;
+  enum rtx_code plus_or_lo_sum;
+  machine_mode target_mode = GET_MODE (target);
+  machine_mode extend_mode = target_mode;
+  machine_mode ptr_mode = Pmode;
+  enum rtx_code extend = UNKNOWN;
+
+  if (GET_CODE (orig_mem) == FLOAT_EXTEND || GET_CODE (orig_mem) == ZERO_EXTEND)
+    {
+      extend = GET_CODE (orig_mem);
+      orig_mem = XEXP (orig_mem, 0);
+      target_mode = GET_MODE (orig_mem);
+    }
+
+  gcc_assert (MEM_P (orig_mem));
+
+  orig_addr = XEXP (orig_mem, 0);
+  plus_or_lo_sum = GET_CODE (orig_addr);
+  gcc_assert (plus_or_lo_sum == PLUS || plus_or_lo_sum == LO_SUM);
+
+  offset = XEXP (orig_addr, 1);
+  new_addr = gen_rtx_fmt_ee (plus_or_lo_sum, ptr_mode, addis_value, offset);
+  new_mem = replace_equiv_address_nv (orig_mem, new_addr, false);
+
+  if (extend != UNKNOWN)
+    new_mem = gen_rtx_fmt_e (extend, extend_mode, new_mem);
+
+  new_mem = gen_rtx_UNSPEC (extend_mode, gen_rtvec (1, new_mem),
+			    UNSPEC_FUSION_P9);
+
+  set = gen_rtx_SET (target, new_mem);
+  clobber = gen_rtx_CLOBBER (VOIDmode, tmp_reg);
+  insn = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set, clobber));
+  emit_insn (insn);
+
+  return;
+}
+
+/* During the peephole2 pass, adjust and expand the insns for an extended fusion
+   store sequence.
+
+   The operands are:
+	operands[0]	register set with addis
+	operands[1]	value set via addis
+	operands[2]	target D-form memory being stored to
+	operands[3]	register being stored
+
+   This is similar to the fusion introduced with power8, except it scales to
+   both loads/stores and does not require the result register to be the same as
+   the base register.  At the moment, we only do this if the register set with
+   addis is dead.  */
+
+void
+expand_fusion_p9_store (rtx *operands)
+{
+  rtx tmp_reg = operands[0];
+  rtx addis_value = operands[1];
+  rtx orig_mem = operands[2];
+  rtx src = operands[3];
+  rtx new_addr, new_mem, orig_addr, offset, set, clobber, insn, new_src;
+  enum rtx_code plus_or_lo_sum;
+  machine_mode target_mode = GET_MODE (orig_mem);
+  machine_mode ptr_mode = Pmode;
+
+  gcc_assert (MEM_P (orig_mem));
+
+  orig_addr = XEXP (orig_mem, 0);
+  plus_or_lo_sum = GET_CODE (orig_addr);
+  gcc_assert (plus_or_lo_sum == PLUS || plus_or_lo_sum == LO_SUM);
+
+  offset = XEXP (orig_addr, 1);
+  new_addr = gen_rtx_fmt_ee (plus_or_lo_sum, ptr_mode, addis_value, offset);
+  new_mem = replace_equiv_address_nv (orig_mem, new_addr, false);
+
+  new_src = gen_rtx_UNSPEC (target_mode, gen_rtvec (1, src),
+			    UNSPEC_FUSION_P9);
+
+  set = gen_rtx_SET (new_mem, new_src);
+  clobber = gen_rtx_CLOBBER (VOIDmode, tmp_reg);
+  insn = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set, clobber));
+  emit_insn (insn);
+
+  return;
+}
+
+/* Return a string to fuse an addis instruction with a load using extended
+   fusion.  The address that is used is the logical address that was formed
+   during peephole2: (lo_sum (high) (low-part))
+
+   The code is complicated, so we call output_asm_insn directly, and just
+   return "".  */
+
+const char *
+emit_fusion_p9_load (rtx reg, rtx mem, rtx tmp_reg)
+{
+  enum machine_mode mode = GET_MODE (reg);
+  rtx hi;
+  rtx lo;
+  rtx addr;
+  const char *load_string;
+  int r;
+
+  if (GET_CODE (mem) == FLOAT_EXTEND || GET_CODE (mem) == ZERO_EXTEND)
+    {
+      mem = XEXP (mem, 0);
+      mode = GET_MODE (mem);
+    }
+
+  if (GET_CODE (reg) == SUBREG)
+    {
+      gcc_assert (SUBREG_BYTE (reg) == 0);
+      reg = SUBREG_REG (reg);
+    }
+
+  if (!REG_P (reg))
+    fatal_insn ("emit_fusion_p9_load, bad reg #1", reg);
+
+  r = REGNO (reg);
+  if (FP_REGNO_P (r))
+    {
+      if (mode == SFmode)
+	load_string = "lfs";
+      else if (mode == DFmode || mode == DImode)
+	load_string = "lfd";
+      else
+	gcc_unreachable ();
+    }
+  else if (INT_REGNO_P (r))
+    {
+      switch (mode)
+	{
+	case QImode:
+	  load_string = "lbz";
+	  break;
+	case HImode:
+	  load_string = "lhz";
+	  break;
+	case SImode:
+	case SFmode:
+	  load_string = "lwz";
+	  break;
+	case DImode:
+	case DFmode:
+	  if (!TARGET_POWERPC64)
+	    gcc_unreachable ();
+	  load_string = "ld";
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+    }
+  else
+    fatal_insn ("emit_fusion_p9_load, bad reg #2", reg);
+
+  if (!MEM_P (mem))
+    fatal_insn ("emit_fusion_p9_load not MEM", mem);
+
+  addr = XEXP (mem, 0);
+  fusion_split_address (addr, &hi, &lo);
+
+  /* Emit the addis instruction.  */
+  emit_fusion_addis (tmp_reg, hi, "power9 load fusion", GET_MODE_NAME (mode));
+
+  /* Emit the D-form load instruction.  */
+  emit_fusion_load_store (reg, tmp_reg, lo, load_string);
+
+  return "";
+}
+
+/* Return a string to fuse an addis instruction with a store using extended
+   fusion.  The address that is used is the logical address that was formed
+   during peephole2: (lo_sum (high) (low-part))
+
+   The code is complicated, so we call output_asm_insn directly, and just
+   return "".  */
+
+const char *
+emit_fusion_p9_store (rtx mem, rtx reg, rtx tmp_reg)
+{
+  enum machine_mode mode = GET_MODE (reg);
+  rtx hi;
+  rtx lo;
+  rtx addr;
+  const char *store_string;
+  int r;
+
+  if (GET_CODE (reg) == SUBREG)
+    {
+      gcc_assert (SUBREG_BYTE (reg) == 0);
+      reg = SUBREG_REG (reg);
+    }
+
+  if (!REG_P (reg))
+    fatal_insn ("emit_fusion_p9_store, bad reg #1", reg);
+
+  r = REGNO (reg);
+  if (FP_REGNO_P (r))
+    {
+      if (mode == SFmode)
+	store_string = "stfs";
+      else if (mode == DFmode || mode == DImode)
+	store_string = "stfd";
+      else
+	gcc_unreachable ();
+    }
+  else if (INT_REGNO_P (r))
+    {
+      switch (mode)
+	{
+	case QImode:
+	  store_string = "stb";
+	  break;
+	case HImode:
+	  store_string = "sth";
+	  break;
+	case SImode:
+	case SFmode:
+	  store_string = "stw";
+	  break;
+	case DImode:
+	case DFmode:
+	  if (!TARGET_POWERPC64)
+	    gcc_unreachable ();
+	  store_string = "std";
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+    }
+  else
+    fatal_insn ("emit_fusion_p9_store, bad reg #2", reg);
+
+  if (!MEM_P (mem))
+    fatal_insn ("emit_fusion_p9_store not MEM", mem);
+
+  addr = XEXP (mem, 0);
+  fusion_split_address (addr, &hi, &lo);
+
+  /* Emit the addis instruction.  */
+  emit_fusion_addis (tmp_reg, hi, "power9 store fusion", GET_MODE_NAME (mode));
+
+  /* Emit the D-form store instruction.  */
+  emit_fusion_load_store (reg, tmp_reg, lo, store_string);
+
+  return "";
+}
+
+
 /* Analyze vector computations and remove unnecessary doubleword
    swaps (xxswapdi instructions).  This pass is performed only
    for little-endian VSX code generation.
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	(revision 229975)
+++ gcc/config/rs6000/rs6000.h	(working copy)
@@ -703,6 +703,22 @@  extern int rs6000_vector_align[];
 			 && TARGET_DOUBLE_FLOAT \
 			 && (TARGET_PPC_GFXOPT || VECTOR_UNIT_VSX_P (DFmode)))
 
+/* Conditions to allow TOC fusion for loading/storing integers.  */
+#define TARGET_TOC_FUSION_INT	(TARGET_P8_FUSION			\
+				 && TARGET_TOC_FUSION			\
+				 && (TARGET_CMODEL != CMODEL_SMALL)	\
+				 && TARGET_POWERPC64)
+
+/* Conditions to allow TOC fusion for loading/storing floating point.  */
+#define TARGET_TOC_FUSION_FP	(TARGET_P9_FUSION			\
+				 && TARGET_TOC_FUSION			\
+				 && (TARGET_CMODEL != CMODEL_SMALL)	\
+				 && TARGET_POWERPC64			\
+				 && TARGET_HARD_FLOAT			\
+				 && TARGET_FPRS				\
+				 && TARGET_SINGLE_FLOAT			\
+				 && TARGET_DOUBLE_FLOAT)
+
 /* Whether the various reciprocal divide/square root estimate instructions
    exist, and whether we should automatically generate code for the instruction
    by default.  */
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 229975)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -141,6 +141,8 @@  (define_c_enum "unspec"
    UNSPEC_LSQ
    UNSPEC_FUSION_GPR
    UNSPEC_STACK_CHECK
+   UNSPEC_FUSION_P9
+   UNSPEC_FUSION_ADDIS
   ])
 
 ;;
@@ -327,12 +329,28 @@  (define_mode_iterator EXTSI [(DI "TARGET
 ; QImode or HImode for small atomic ops
 (define_mode_iterator QHI [QI HI])
 
+; QImode, HImode, or SImode for fused ops (GPR loads only)
+(define_mode_iterator QHSI [QI HI SI])
+
 ; HImode or SImode for sign extended fusion ops
 (define_mode_iterator HSI [HI SI])
 
 ; SImode or DImode, even if DImode doesn't fit in GPRs.
 (define_mode_iterator SDI [SI DI])
 
+; Types that can be fused with an ADDIS instruction to load or store a GPR
+; register that has reg+offset addressing.
+(define_mode_iterator GPR_FUSION [QI
+				  HI
+				  SI
+				  (DI	"TARGET_POWERPC64")
+				  SF
+				  (DF	"TARGET_POWERPC64")])
+
+; Types that can be fused with an ADDIS instruction to load or store a FPR
+; register that has reg+offset addressing.
+(define_mode_iterator FPR_FUSION [DI SF DF])
+
 ; The size of a pointer.  Also, the size of the value that a record-condition
 ; (one with a '.') will compare; and the size used for arithmetic carries.
 (define_mode_iterator P [(SI "TARGET_32BIT") (DI "TARGET_64BIT")])
@@ -12592,6 +12610,66 @@  (define_insn "rs6000_mtfsf"
 ;; a GPR.  The addis instruction must be adjacent to the load, and use the same
 ;; register that is being loaded.  The fused ops must be physically adjacent.
 
+;; There are two parts to addis fusion.  The support for fused TOCs occurs
+;; before register allocation, and is meant to reduce the lifetime of the
+;; temporary register that holds the ADDIS result.  On Power8 GPR loads, we try
+;; to use the register that is being loaded.  The peephole2 then gathers any
+;; other fused possibilities that it can find after register allocation.  If
+;; power9 fusion is selected, we also fuse floating point loads/stores.
+
+;; Fused TOC support: Replace simple GPR loads with a fused form.  This is done
+;; before register allocation, so that we can avoid allocating a temporary base
+;; register that won't be used, and so that we try to load into base registers
+;; rather than register 0.  If we can't get a fused GPR load, generate a P9
+;; fusion (addis followed by load) even on power8.
+
+(define_split
+  [(set (match_operand:INT1 0 "toc_fusion_or_p9_reg_operand" "")
+	(match_operand:INT1 1 "toc_fusion_mem_raw" ""))]
+  "TARGET_TOC_FUSION_INT && can_create_pseudo_p ()"
+  [(parallel [(set (match_dup 0) (match_dup 2))
+	      (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
+	      (use (match_dup 3))
+	      (clobber (scratch:DI))])]
+{
+  operands[2] = fusion_wrap_memory_address (operands[1]);
+  operands[3] = gen_rtx_REG (Pmode, TOC_REGISTER);
+})
+
+(define_insn "*toc_fusionload_<mode>"
+  [(set (match_operand:QHSI 0 "int_reg_operand" "=&b,??r")
+	(match_operand:QHSI 1 "toc_fusion_mem_wrapped" "wG,wG"))
+   (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
+   (use (match_operand:DI 2 "base_reg_operand" "r,r"))
+   (clobber (match_scratch:DI 3 "=X,&b"))]
+  "TARGET_TOC_FUSION_INT"
+{
+  if (base_reg_operand (operands[0], <MODE>mode))
+    return emit_fusion_gpr_load (operands[0], operands[1]);
+
+  return emit_fusion_p9_load (operands[0], operands[1], operands[3]);
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "8")])
+
+(define_insn "*toc_fusionload_di"
+  [(set (match_operand:DI 0 "int_reg_operand" "=&b,??r,?d")
+	(match_operand:DI 1 "toc_fusion_mem_wrapped" "wG,wG,wG"))
+   (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
+   (use (match_operand:DI 2 "base_reg_operand" "r,r,r"))
+   (clobber (match_scratch:DI 3 "=X,&b,&b"))]
+  "TARGET_TOC_FUSION_INT && TARGET_POWERPC64
+   && (MEM_P (operands[1]) || int_reg_operand (operands[0], DImode))"
+{
+  if (base_reg_operand (operands[0], DImode))
+    return emit_fusion_gpr_load (operands[0], operands[1]);
+
+  return emit_fusion_p9_load (operands[0], operands[1], operands[3]);
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "8")])
+
+
 ;; Find cases where the addis that feeds into a load instruction is either used
 ;; once or is the same as the target register, and replace it with the fusion
 ;; insn
@@ -12615,7 +12693,7 @@  (define_peephole2
 
 (define_insn "fusion_gpr_load_<mode>"
   [(set (match_operand:INT1 0 "base_reg_operand" "=&b")
-	(unspec:INT1 [(match_operand:INT1 1 "fusion_gpr_mem_combo" "")]
+	(unspec:INT1 [(match_operand:INT1 1 "fusion_addis_mem_combo_load" "")]
 		     UNSPEC_FUSION_GPR))]
   "TARGET_P8_FUSION"
 {
@@ -12625,6 +12703,133 @@  (define_insn "fusion_gpr_load_<mode>"
    (set_attr "length" "8")])
 
 
+;; ISA 3.0 (power9) fusion support
+;; Merge addis with floating load/store to FPRs (or GPRs).
+(define_peephole2
+  [(set (match_operand:P 0 "base_reg_operand" "")
+	(match_operand:P 1 "fusion_gpr_addis" ""))
+   (set (match_operand:SFDF 2 "toc_fusion_or_p9_reg_operand" "")
+	(match_operand:SFDF 3 "fusion_offsettable_mem_operand" ""))]
+  "TARGET_P9_FUSION && peep2_reg_dead_p (2, operands[0])
+   && fusion_p9_p (operands[0], operands[1], operands[2], operands[3])"
+  [(const_int 0)]
+{
+  expand_fusion_p9_load (operands);
+  DONE;
+})
+
+(define_peephole2
+  [(set (match_operand:P 0 "base_reg_operand" "")
+	(match_operand:P 1 "fusion_gpr_addis" ""))
+   (set (match_operand:SFDF 2 "offsettable_mem_operand" "")
+	(match_operand:SFDF 3 "toc_fusion_or_p9_reg_operand" ""))]
+  "TARGET_P9_FUSION && peep2_reg_dead_p (2, operands[0])
+   && fusion_p9_p (operands[0], operands[1], operands[2], operands[3])"
+  [(const_int 0)]
+{
+  expand_fusion_p9_store (operands);
+  DONE;
+})
+
+(define_peephole2
+  [(set (match_operand:SDI 0 "int_reg_operand" "")
+	(match_operand:SDI 1 "upper16_cint_operand" ""))
+   (set (match_dup 0)
+	(ior:SDI (match_dup 0)
+		 (match_operand:SDI 2 "u_short_cint_operand" "")))]
+  "TARGET_P9_FUSION"
+  [(set (match_dup 0)
+	(unspec:SDI [(match_dup 1)
+		     (match_dup 2)] UNSPEC_FUSION_P9))])
+
+(define_peephole2
+  [(set (match_operand:SDI 0 "int_reg_operand" "")
+	(match_operand:SDI 1 "upper16_cint_operand" ""))
+   (set (match_operand:SDI 2 "int_reg_operand" "")
+	(ior:SDI (match_dup 0)
+		 (match_operand:SDI 3 "u_short_cint_operand" "")))]
+  "TARGET_P9_FUSION
+   && !rtx_equal_p (operands[0], operands[2])
+   && peep2_reg_dead_p (2, operands[0])"
+  [(set (match_dup 2)
+	(unspec:SDI [(match_dup 1)
+		     (match_dup 3)] UNSPEC_FUSION_P9))])
+
+;; Fusion insns, created by the define_peephole2 above (and eventually by
+;; reload).  Because we want to eventually have secondary_reload generate
+;; these, they have to have a single alternative that gives the register
+;; classes.  This means we need to have separate gpr/fpr/altivec versions.
+(define_insn "fusion_gpr_<P:mode>_<GPR_FUSION:mode>_load"
+  [(set (match_operand:GPR_FUSION 0 "int_reg_operand" "=r")
+	(unspec:GPR_FUSION
+	 [(match_operand:GPR_FUSION 1 "fusion_addis_mem_combo_load" "wF")]
+	 UNSPEC_FUSION_P9))
+   (clobber (match_operand:P 2 "base_reg_operand" "=&b"))]
+  "TARGET_P9_FUSION"
+{
+  /* This insn is a secondary reload insn, which cannot have alternatives.
+     If we are not loading up register 0, use the power8 fusion instead.  */
+  if (base_reg_operand (operands[0], <GPR_FUSION:MODE>mode))
+    return emit_fusion_gpr_load (operands[0], operands[1]);
+
+  return emit_fusion_p9_load (operands[0], operands[1], operands[2]);
+}
+  [(set_attr "type" "load")
+   (set_attr "length" "8")])
+
+(define_insn "fusion_gpr_<P:mode>_<GPR_FUSION:mode>_store"
+  [(set (match_operand:GPR_FUSION 0 "fusion_addis_mem_combo_store" "=wF")
+	(unspec:GPR_FUSION
+	 [(match_operand:GPR_FUSION 1 "int_reg_operand" "r")]
+	 UNSPEC_FUSION_P9))
+   (clobber (match_operand:P 2 "base_reg_operand" "=&b"))]
+  "TARGET_P9_FUSION"
+{
+  return emit_fusion_p9_store (operands[0], operands[1], operands[2]);
+}
+  [(set_attr "type" "store")
+   (set_attr "length" "8")])
+
+(define_insn "fusion_fpr_<P:mode>_<FPR_FUSION:mode>_load"
+  [(set (match_operand:FPR_FUSION 0 "fpr_reg_operand" "=d")
+	(unspec:FPR_FUSION
+	 [(match_operand:FPR_FUSION 1 "fusion_addis_mem_combo_load" "wF")]
+	 UNSPEC_FUSION_P9))
+   (clobber (match_operand:P 2 "base_reg_operand" "=b"))]
+  "TARGET_P9_FUSION"
+{
+  return emit_fusion_p9_load (operands[0], operands[1], operands[2]);
+}
+  [(set_attr "type" "fpload")
+   (set_attr "length" "8")])
+
+(define_insn "fusion_fpr_<P:mode>_<FPR_FUSION:mode>_store"
+  [(set (match_operand:FPR_FUSION 0 "fusion_addis_mem_combo_store" "=wF")
+	(unspec:FPR_FUSION
+	 [(match_operand:FPR_FUSION 1 "fpr_reg_operand" "d")]
+	 UNSPEC_FUSION_P9))
+   (clobber (match_operand:P 2 "base_reg_operand" "=b"))]
+  "TARGET_P9_FUSION"
+{
+  return emit_fusion_p9_store (operands[0], operands[1], operands[2]);
+}
+  [(set_attr "type" "fpstore")
+   (set_attr "length" "8")])
+
+(define_insn "*fusion_p9_<mode>_constant"
+  [(set (match_operand:SDI 0 "int_reg_operand" "=r")
+	(unspec:SDI [(match_operand:SDI 1 "upper16_cint_operand" "L")
+		     (match_operand:SDI 2 "u_short_cint_operand" "K")]
+		    UNSPEC_FUSION_P9))]
+  "TARGET_P9_FUSION"
+{
+  emit_fusion_addis (operands[0], operands[1], "constant", "<MODE>");
+  return "ori %0,%0,%2";
+}
+  [(set_attr "type" "two")
+   (set_attr "length" "8")])
+
+
 ;; Miscellaneous ISA 2.06 (power7) instructions
 (define_insn "addg6s"
   [(set (match_operand:SI 0 "register_operand" "=r")
@@ -12791,6 +12996,7 @@  (define_insn "pack<mode>"
   "xxpermdi %x0,%x1,%x2,0"
   [(set_attr "type" "vecperm")])
 
+
 
 
 (include "sync.md")
Index: gcc/testsuite/gcc.target/powerpc/fusion2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/fusion2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/fusion2.c	(revision 0)
@@ -0,0 +1,10 @@ 
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
+/* { dg-options "-mcpu=power7 -mtune=power8 -O3" } */
+
+vector double fusion_vector (vector double *p) { return p[2]; }
+
+/* { dg-final { scan-assembler-times "vector load fusion" 1 } } */
Index: gcc/testsuite/gcc.target/powerpc/fusion3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/fusion3.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/fusion3.c	(revision 0)
@@ -0,0 +1,18 @@ 
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
+/* { dg-options "-mcpu=power7 -mtune=power9 -O3" } */
+
+#define LARGE 0x12345
+
+int fusion_float_read (float *p){ return p[LARGE]; }
+int fusion_double_read (double *p){ return p[LARGE]; }
+
+void fusion_float_write (float *p, float f){ p[LARGE] = f; }
+void fusion_double_write (double *p, double d){ p[LARGE] = d; }
+
+/* { dg-final { scan-assembler "load fusion, type SF"  } } */
+/* { dg-final { scan-assembler "load fusion, type DF"  } } */
+/* { dg-final { scan-assembler "store fusion, type SF" } } */
+/* { dg-final { scan-assembler "store fusion, type DF" } } */
Index: gcc/testsuite/gcc.target/powerpc/fusion.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/fusion.c	(revision 229970)
+++ gcc/testsuite/gcc.target/powerpc/fusion.c	(working copy)
@@ -1,6 +1,5 @@ 
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
-/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
 /* { dg-options "-mcpu=power7 -mtune=power8 -O3" } */
@@ -14,10 +13,7 @@  int fusion_short (short *p){ return p[LA
 int fusion_int (int *p){ return p[LARGE]; }
 unsigned fusion_uns (unsigned *p){ return p[LARGE]; }
 
-vector double fusion_vector (vector double *p) { return p[2]; }
-
 /* { dg-final { scan-assembler-times "gpr load fusion"    6 } } */
-/* { dg-final { scan-assembler-times "vector load fusion" 1 } } */
 /* { dg-final { scan-assembler-times "lbz"                2 } } */
 /* { dg-final { scan-assembler-times "extsb"              1 } } */
 /* { dg-final { scan-assembler-times "lhz"                2 } } */