PR 71977/70568/78823: Improve PowerPC code that uses SFmode in unions

Message ID 20161230002406.GA15372@ibm-tiger.the-meissners.org
State New

Commit Message

Michael Meissner Dec. 30, 2016, 12:24 a.m. UTC
This patch fixes a regression (present since GCC 4.9), improves the code generated for
GLIBC, and closes potential bugs exposed by the recent changes that allow small integers
(32-bit integers, SImode in particular) in floating point and vector registers.

The core of the problem is that when an SFmode (32-bit binary floating point) value
is held as a scalar in the floating point and vector registers, it is stored internally
in the 64-bit binary floating point format.  This means that if you look at the value
through the SUBREG mechanism, you can see the wrong bits.  Before the recent changes
to add small integer support went in, this was less of an issue, since the only
integer mode allowed in floating point and vector registers was 64-bit integers
(DImode).
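
To see why the bits differ, consider the value 1.0f: its 32-bit memory image is
0x3f800000, while the 64-bit image a floating point/vector register holds for the
same scalar is 0x3ff0000000000000.  Here is a small standalone C illustration (not
part of the patch, just showing the two bit patterns):

	#include <stdint.h>
	#include <stdio.h>
	#include <string.h>

	int
	main (void)
	{
	  float f = 1.0f;	/* 32-bit memory format.  */
	  double d = f;		/* format a VSX register uses for an SFmode scalar.  */
	  uint32_t sf_bits;
	  uint64_t df_bits;

	  memcpy (&sf_bits, &f, sizeof (sf_bits));
	  memcpy (&df_bits, &d, sizeof (df_bits));

	  /* Prints 0x3f800000 and 0x3ff0000000000000; neither half of the
	     64-bit image is the 32-bit SFmode bit pattern.  */
	  printf ("SF bits: 0x%08x\nDF bits: 0x%016llx\n",
		  (unsigned) sf_bits, (unsigned long long) df_bits);
	  return 0;
	}

So reading a raw 32-bit SUBREG of the register contents does not give the bits the
program expects.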

The regression is in how SFmode values are moved between the general purpose and
floating point/vector registers.  Up through the power7, the way to move an SFmode
value from one register set to the other was a store followed by a load.  Back-to-back
store and load to the same location can cause serious performance problems on recent
power systems.  On the power7 and power8, we emit a special NOP that forces the two
instructions into different dispatch groups, which helps somewhat.

When the power8 (ISA 2.07) came out, it added direct move instructions and
instructions to convert between scalar double precision and single precision.  I added
the appropriate secondary_reload support so that if the register allocator wanted to
move an SFmode value between register banks, it would create a temporary and emit the
appropriate instructions to move the value.  This worked in the GCC 4.8 time frame.

Some time in the GCC 4.9 time frame this broke, and the register allocator started
generating the store and load sequence more often instead of the direct moves.
Simple test cases continued to use the direct move instructions, however.  In
checking power9 (ISA 3.0) code, the generated code is more likely to use the
store/load sequence than the direct move.

On power8 spec runs we have seen the effect of these store/load sequences on
some benchmarks from the next generation of the Spec suite.

The optimization that the GLIBC implementers have requested (PR 71977) is to speed up
the code sequences they use in implementing the single precision math library.  They
often need to extract or modify bits of floating point values (for example, setting
exponents or mantissas).

For example from e_powf.c, you see code like this after macro expansion:

	typedef union
	{
	  float value;
	  u_int32_t word;
	} ieee_float_shape_type;

	float t1;
	int32_t is;
	/* ... */
	do
	  {
	    ieee_float_shape_type gf_u;
	    gf_u.value = (t1);
	    (is) = gf_u.word;
	  }
	while (0);
	do
	  {
	    ieee_float_shape_type sf_u;
	    sf_u.word = (is&0xfffff000);
	    (t1) = sf_u.value;
	  }
	while (0);
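
(For reference, the do/while blocks above are the expansions of glibc's
GET_FLOAT_WORD and SET_FLOAT_WORD macros from math_private.h, which look roughly
like this:)

	#define GET_FLOAT_WORD(i,d)		\
	do {					\
	  ieee_float_shape_type gf_u;		\
	  gf_u.value = (d);			\
	  (i) = gf_u.word;			\
	} while (0)

	#define SET_FLOAT_WORD(d,i)		\
	do {					\
	  ieee_float_shape_type sf_u;		\
	  sf_u.word = (i);			\
	  (d) = sf_u.value;			\
	} while (0)

Every float function that inspects or patches bits goes through this union round
trip, so the quality of the SImode/SFmode moves matters a great deal.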

Originally, I just wrote a peephole2 to catch the above code, and it worked in
small test cases on the power8.  But it didn't work on larger programs or on
the power9.  I also wanted to fix the performance issue that we've seen.

I also became convinced that for GCC 7 this was a ticking time bomb: eventually
somebody would write code that mixed SImode and SFmode and would get the wrong value.

The main part of the patch is to keep the compiler from generating:

	(set (reg:SF)
	     (subreg:SF (reg:SI)))

or

	(set (reg:SI)
	     (subreg:SI (reg:SF)))

Most of the register predicates eventually call gpc_reg_operand, so it was simple to
put the check there and to add it to the other predicates that do not call
gpc_reg_operand.

I created new insns to do the move between formats, allocating the needed temporary
with match_scratch.
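
Stripped of the constraints and the match_scratch clobber, the new patterns wrap the
conversion in an unspec so nothing can look through it as a SUBREG (these are the
movsi_from_sf and movsf_from_si insns in the patch below):

	(set (reg:SI)
	     (unspec:SI [(reg:SF)] UNSPEC_SI_FROM_SF))

	(set (reg:SF)
	     (unspec:SF [(reg:SI)] UNSPEC_SF_FROM_SI))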

There are places that must not have the check (the movsi/movsf expanders themselves,
and the insn splitters for the format conversion insns), so I added a separate
predicate for those places.

I have built the patches on a little endian power8, a big endian power8 (64-bit
only), and a big endian power7 (both 32-bit and 64-bit).  There were no
regression failures.

In addition, I built Spec 2006 with and without the fixes and did a quick comparison
of the results (1 run).  I am re-running the Spec results with the code merged to
today's trunk, using 3 runs to isolate the occasional benchmark that goes off into
the weeds.

Of the 29 benchmarks in Spec 2006 CPU, 6 benchmarks had changes in the
instructions generated (perlbench, gromacs, cactusADM, namd, povray, wrf).

In the single run I did, there were no regressions, and 2 or 3 benchmarks
improved:

	namd		6%
	tonto		3%
	libquantum	6%

However, single runs of libquantum have varied by as much as 10%, so without seeing
more runs I will discount it.  Namd was one of the benchmarks that saw changes in
code generation, but tonto had no code changes.  I suspect that having the separate
converter unspec insn allowed the scheduler to move things in between the move and
the use.

So, in summary, can I check these changes into the GCC 7 trunk?  Given that it fixes
a long-standing regression in GCC 6 that hurts performance, do you want me to develop
the changes for a GCC 6 backport as well?  I realize it is skating on thin ice whether
this counts as a feature or a fix.

[gcc]
2016-12-29  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/71977
	PR target/70568
	PR target/78823
	* config/rs6000/predicates.md (sf_subreg_operand): New predicate
	to return true if the operand contains a SUBREG mixing SImode and
	SFmode on 64-bit VSX systems with direct move.  This can be a
	problem in that we have to know whether the SFmode value should be
	represented in the 32-bit memory format or the 64-bit scalar
	format used within the floating point and vector registers.
	(altivec_register_operand): Do not return true if the operand
	contains a SUBREG mixing SImode and SFmode.
	(vsx_register_operand): Likewise.
	(vsx_reg_sfsubreg_ok): New predicate.  Like vsx_register_operand,
	but do not check if there is a SUBREG mixing SImode and SFmode on
	64-bit VSX systems with direct move.
	(vfloat_operand): Do not return true if the operand contains a
	SUBREG mixing SImode and SFmode.
	(vint_operand): Likewise.
	(vlogical_operand): Likewise.
	(gpc_reg_operand): Likewise.
	(int_reg_operand): Likewise.
	* config/rs6000/rs6000.c (valid_sf_si_move): New function to
	determine if a MOVSI or MOVSF operation contains SUBREGs that mix
	SImode and SFmode.
	(rs6000_emit_move): If we have a MOVSI or MOVSF operation that
	contains SUBREGs that mix SImode and SFmode, call special insns,
	that can allocate the necessary temporary registers to convert
	between SFmode and SImode within the registers.
	* config/rs6000/vsx.md (SFBOOL_*): Add peephole2 to recognize when
	we are converting a SFmode to a SImode, moving the result to a GPR
	register, doing a single AND/IOR/XOR operation, and then moving it
	back to a vector register.  Change the insns recognized to move
	the integer value to the vector register and do the operation
	there.  This code occurs quite a bit in the GLIBC math library in
	float math functions.
	(peephole2 to speed up GLIB math functions): Likewise.
	* config/rs6000/rs6000-protos.h (valid_sf_si_move): Add
	declaration.
	* config/rs6000/rs6000.h (TARGET_NO_SF_SUBREG): New internal
	target macros to say whether we need to avoid SUBREGs mixing
	SImode and SFmode.
	(TARGET_ALLOW_SF_SUBREG): Likewise.
	* config/rs6000/rs6000.md (UNSPEC_SF_FROM_SI): New unspecs.
	(UNSPEC_SI_FROM_SF): Likewise.
	(iorxor): Change spacing.
	(and_ior_xor): New iterator for AND, IOR, and XOR.
	(movsi_from_sf): New insn to handle where we are moving SFmode
	values to SImode registers and we need to convert the value to the
	memory format from the format used within the register.
	(movdi_from_sf_zero_ext): Optimize zero extending movsi_from_sf.
	(mov<mode>_hardfloat, FMOVE32 iterator): Don't allow moving
	SUBREGs mixing SImode and SFmode on 64-bit VSX systems with direct
	move.
	(movsf_from_si): New insn to handle where we are moving SImode
	values to SFmode registers and we need to convert the value from
	the 32-bit memory format to the format used within the floating
	point and vector registers.
	(fma<mode>4): Change register_operand to gpc_reg_operand to
	prevent SUBREGs mixing SImode and SFmode.
	(fms<mode>4): Likewise.
	(fnma<mode>4): Likewise.
	(fnms<mode>4): Likewise.
	(nfma<mode>4): Likewise.
	(nfms<mode>4): Likewise.

[gcc/testsuite]
2016-12-29  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/71977
	PR target/70568
	PR target/78823
	* gcc.target/powerpc/pr71977-1.c: New tests to check whether, on
	64-bit VSX systems with direct move, we optimize common code
	sequences in the GLIBC math library for float math functions.
	* gcc.target/powerpc/pr71977-2.c: Likewise.

Comments

David Edelsohn Jan. 4, 2017, 7:17 p.m. UTC | #1
On Thu, Dec 29, 2016 at 7:24 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:

Mike,

Thanks for the tremendous effort on this patch.  A few comments.

The ChangeLog contains a lot of extraneous commentary that should be
in the documentation comments.  And a few typos in comments.

>         * config/rs6000/predicates.md (sf_subreg_operand): New predicate
>         to return true if the operand contains a SUBREG mixing SImode and
>         SFmode on 64-bit VSX systems with direct move.  This can be a
>         problem in that we have to know whether the SFmode value should be
>         represented in the 32-bit memory format or the 64-bit scalar
>         format used within the floating point and vector registers.

The ChangeLog entry should just say "New predicate."  The description
is in the documentation of the function.

>         * config/rs6000/vsx.md (SFBOOL_*): Add peephole2 to recognize when
>         we are converting a SFmode to a SImode, moving the result to a GPR
>         register, doing a single AND/IOR/XOR operation, and then moving it
>         back to a vector register.  Change the insns recognized to move
>         the integer value to the vector register and do the operation
>         there.  This code occurs quite a bit in the GLIBC math library in
>         float math functions.

SFBOOL constants are not a new peephole2.  Please move the explanation
into a documentation comment in front of the peephole2.

>         (peephole2 to speed up GLIB math functions): Likewise.
                                                   ^^^ GLIBC?
New peephole2.

>         (movsi_from_sf): New insn to handle where we are moving SFmode
>         values to SImode registers and we need to convert the value to the
>         memory format from the format used within the register.

New define_insn_and_split.

>        (movdi_from_sf_zero_ext): Optimize zero extending movsi_from_sf.

New define_insn_and_split.

>         (movsf_from_si): New insn to handle where we are moving SImode
>         values to SFmode registers and we need to convert the value from
>         the 32-bit memory format to the format used within the floating
>         point and vector registers.

New define_insn_and_split.

+;; Return 1 if op is a SUBREG that is used to look at a SFmode value as
+;; and integer or vice versa.
     ^^ an integer?

+  /*.  Allow (set (SUBREG:SI (REG:SF)) (SUBREG:SI (REG:SF))).  */
      ^ No period, one space.

The change to rs6000_emit_move() really should have been in a helper
function. We have to stop adding to the complexity of the function.  I
won't insist that you split it out, but any addition of more than a
few lines in rs6000_emit_move(), prologue or epilogue should be placed
into a separate function.

Okay with those changes.  This change is too invasive for GCC 6
backport at this late phase of GCC 6 release.

We'll see if Segher catches anything additional when he wants some
light reading. :-)

Thanks, David
Michael Meissner Jan. 4, 2017, 8:10 p.m. UTC | #2
On Wed, Jan 04, 2017 at 02:17:16PM -0500, David Edelsohn wrote:
> On Thu, Dec 29, 2016 at 7:24 PM, Michael Meissner
> <meissner@linux.vnet.ibm.com> wrote:
> 
> Mike,
> 
> Thanks for the tremendous effort on this patch.  A few comments.
> 
> The ChangeLog contains a lot of extraneous commentary that should be
> in the documentation comments.  And a few typos in comments.
> 
> >         * config/rs6000/predicates.md (sf_subreg_operand): New predicate
> >         to return true if the operand contains a SUBREG mixing SImode and
> >         SFmode on 64-bit VSX systems with direct move.  This can be a
> >         problem in that we have to know whether the SFmode value should be
> >         represented in the 32-bit memory format or the 64-bit scalar
> >         format used within the floating point and vector registers.
> 
> The ChangeLog entry should just say "New predicate."  The description
> is in the documentation of the function.

Ok.

> >         * config/rs6000/vsx.md (SFBOOL_*): Add peephole2 to recognize when
> >         we are converting a SFmode to a SImode, moving the result to a GPR
> >         register, doing a single AND/IOR/XOR operation, and then moving it
> >         back to a vector register.  Change the insns recognized to move
> >         the integer value to the vector register and do the operation
> >         there.  This code occurs quite a bit in the GLIBC math library in
> >         float math functions.
> 
> SFBOOL constants are not a new peephole2.  Please move the explanation
> into a documentation comment in front of the peephole2.

Ok.

> >         (peephole2 to speed up GLIB math functions): Likewise.
>                                                    ^^^ GLIBC?
> New peephole2.
> 
> >         (movsi_from_sf): New insn to handle where we are moving SFmode
> >         values to SImode registers and we need to convert the value to the
> >         memory format from the format used within the register.
> 
> New define_insn_and_split.
> 
> >        (movdi_from_sf_zero_ext): Optimize zero extending movsi_from_sf.
> 
> New define_insn_and_split.
> 
> >         (movsf_from_si): New insn to handle where we are moving SImode
> >         values to SFmode registers and we need to convert the value from
> >         the 32-bit memory format to the format used within the floating
> >         point and vector registers.
> 
> New define_insn_and_split.
> 
> +;; Return 1 if op is a SUBREG that is used to look at a SFmode value as
> +;; and integer or vice versa.
>      ^^ an integer?
> 
> +  /*.  Allow (set (SUBREG:SI (REG:SF)) (SUBREG:SI (REG:SF))).  */
>       ^ No period, one space.

Whoops.  How did that get there?

> The change to rs6000_emit_move() really should have been in a helper
> function. We have to stop adding to the complexity of the function.  I
> won't insist that you split it out, but any addition of more than a
> few lines in rs6000_emit_move(), prologue or epilogue should be placed
> into a separate function.

I will see about moving it out to a helper function.

> Okay with those changes.  This change is too invasive for GCC 6
> backport at this late phase of GCC 6 release.

Yes, I figured it was too invasive for GCC 6, but I know some people want this
patch, ASAP.  In any case, I know it wouldn't just go in, since some of the
constraints are related to VSX small integer support, and that did not make GCC
6.

> We'll see if Segher catches anything additional when he wants some
> light reading. :-)
> 
> Thanks, David
>
Michael Meissner Jan. 4, 2017, 9 p.m. UTC | #3
On Wed, Jan 04, 2017 at 02:17:16PM -0500, David Edelsohn wrote:
> The change to rs6000_emit_move() really should have been in a helper
> function. We have to stop adding to the complexity of the function.  I
> won't insist that you split it out, but any addition of more than a
> few lines in rs6000_emit_move(), prologue or epilogue should be placed
> into a separate function.

I can certainly move it into a static helper function (the code in
rs6000_emit_move is only 20 lines once you eliminate the comments), but did you
want me to move some of the other special case parts that are in the same
region to their own helper functions?  I suspect this should be a GCC 8 work
item, but since I'm already moving stuff around, it would be simple to do it
now while I'm in the code.

These include support for block moves, thread local moves, splitting up 128-bit
IBM double double constants, and all of the decimal reload gunk.

> Okay with those changes.  This change is too invasive for GCC 6
> backport at this late phase of GCC 6 release.
> 
> We'll see if Segher catches anything additional when he wants some
> light reading. :-)
> 
> Thanks, David
>
David Edelsohn Jan. 5, 2017, 12:27 a.m. UTC | #4
On Wed, Jan 4, 2017 at 4:00 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> On Wed, Jan 04, 2017 at 02:17:16PM -0500, David Edelsohn wrote:
>> The change to rs6000_emit_move() really should have been in a helper
>> function. We have to stop adding to the complexity of the function.  I
>> won't insist that you split it out, but any addition of more than a
>> few lines in rs6000_emit_move(), prologue or epilogue should be placed
>> into a separate function.
>
> I can certainly move it into a static helper function (the code in
> rs6000_emit_move is only 20 lines once you eliminate the comments), but did you
> want me to move some of the other special case parts that are in the same
> region to their own helper functions?  I suspect this should be a GCC 8 work
> item, but since I'm already moving stuff around, it would be simple to do it
> now while I'm in the code.
>
> These include support for block moves, thread local moves, splitting up 128-bit
> IBM double double constants, and all of the decimal reload gunk.

Mike,

I am not requesting that you split out any other pieces. This isn't
the right time for general cleanups. During stage 3, if we add new
pieces, we can set the precedent of better design.

Thanks, David

Patch

Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 243966)
+++ gcc/config/rs6000/predicates.md	(.../gcc/config/rs6000)	(working copy)
@@ -31,12 +31,47 @@  (define_predicate "count_register_operan
        (match_test "REGNO (op) == CTR_REGNO
 		    || REGNO (op) > LAST_VIRTUAL_REGISTER")))
 
+;; Return 1 if op is a SUBREG that is used to look at a SFmode value as
+;; and integer or vice versa.
+;;
+;; In the normal case where SFmode is in a floating point/vector register, it
+;; is stored as a DFmode and has a different format.  If we don't transform the
+;; value, things that use logical operations on the values will get the wrong
+;; value.
+;;
+;; If we don't have 64-bit and direct move, this conversion will be done by
+;; store and load, instead of by fiddling with the bits within the register.
+(define_predicate "sf_subreg_operand"
+  (match_code "subreg")
+{
+  rtx inner_reg = SUBREG_REG (op);
+  machine_mode inner_mode = GET_MODE (inner_reg);
+
+  if (TARGET_ALLOW_SF_SUBREG || !REG_P (inner_reg))
+    return 0;
+
+  if ((mode == SFmode && GET_MODE_CLASS (inner_mode) == MODE_INT)
+       || (GET_MODE_CLASS (mode) == MODE_INT && inner_mode == SFmode))
+    {
+      if (INT_REGNO_P (REGNO (inner_reg)))
+	return 0;
+
+      return 1;
+    }
+  return 0;
+})
+
 ;; Return 1 if op is an Altivec register.
 (define_predicate "altivec_register_operand"
   (match_operand 0 "register_operand")
 {
   if (GET_CODE (op) == SUBREG)
-    op = SUBREG_REG (op);
+    {
+      if (TARGET_NO_SF_SUBREG && sf_subreg_operand (op, mode))
+	return 0;
+
+      op = SUBREG_REG (op);
+    }
 
   if (!REG_P (op))
     return 0;
@@ -52,6 +87,27 @@  (define_predicate "vsx_register_operand"
   (match_operand 0 "register_operand")
 {
   if (GET_CODE (op) == SUBREG)
+    {
+      if (TARGET_NO_SF_SUBREG && sf_subreg_operand (op, mode))
+	return 0;
+
+      op = SUBREG_REG (op);
+    }
+
+  if (!REG_P (op))
+    return 0;
+
+  if (REGNO (op) >= FIRST_PSEUDO_REGISTER)
+    return 1;
+
+  return VSX_REGNO_P (REGNO (op));
+})
+
+;; Like vsx_register_operand, but allow SF SUBREGS
+(define_predicate "vsx_reg_sfsubreg_ok"
+  (match_operand 0 "register_operand")
+{
+  if (GET_CODE (op) == SUBREG)
     op = SUBREG_REG (op);
 
   if (!REG_P (op))
@@ -69,7 +125,12 @@  (define_predicate "vfloat_operand"
   (match_operand 0 "register_operand")
 {
   if (GET_CODE (op) == SUBREG)
-    op = SUBREG_REG (op);
+    {
+      if (TARGET_NO_SF_SUBREG && sf_subreg_operand (op, mode))
+	return 0;
+
+      op = SUBREG_REG (op);
+    }
 
   if (!REG_P (op))
     return 0;
@@ -86,7 +147,12 @@  (define_predicate "vint_operand"
   (match_operand 0 "register_operand")
 {
   if (GET_CODE (op) == SUBREG)
-    op = SUBREG_REG (op);
+    {
+      if (TARGET_NO_SF_SUBREG && sf_subreg_operand (op, mode))
+	return 0;
+
+      op = SUBREG_REG (op);
+    }
 
   if (!REG_P (op))
     return 0;
@@ -103,7 +169,13 @@  (define_predicate "vlogical_operand"
   (match_operand 0 "register_operand")
 {
   if (GET_CODE (op) == SUBREG)
-    op = SUBREG_REG (op);
+    {
+      if (TARGET_NO_SF_SUBREG && sf_subreg_operand (op, mode))
+	return 0;
+
+      op = SUBREG_REG (op);
+    }
+
 
   if (!REG_P (op))
     return 0;
@@ -221,6 +293,9 @@  (define_predicate "const_0_to_15_operand
        (match_test "IN_RANGE (INTVAL (op), 0, 15)")))
 
 ;; Return 1 if op is a register that is not special.
+;; Disallow (SUBREG:SF (REG:SI)) and (SUBREG:SI (REG:SF)) on VSX systems where
+;; you need to be careful in moving a SFmode to SImode and vice versa due to
+;; the fact that SFmode is represented as DFmode in the VSX registers.
 (define_predicate "gpc_reg_operand"
   (match_operand 0 "register_operand")
 {
@@ -228,7 +303,12 @@  (define_predicate "gpc_reg_operand"
     return 0;
 
   if (GET_CODE (op) == SUBREG)
-    op = SUBREG_REG (op);
+    {
+      if (TARGET_NO_SF_SUBREG && sf_subreg_operand (op, mode))
+	return 0;
+
+      op = SUBREG_REG (op);
+    }
 
   if (!REG_P (op))
     return 0;
@@ -246,7 +326,8 @@  (define_predicate "gpc_reg_operand"
 })
 
 ;; Return 1 if op is a general purpose register.  Unlike gpc_reg_operand, don't
-;; allow floating point or vector registers.
+;; allow floating point or vector registers.  Since vector registers are not
+;; allowed, we don't have to reject SFmode/SImode subregs.
 (define_predicate "int_reg_operand"
   (match_operand 0 "register_operand")
 {
@@ -254,7 +335,12 @@  (define_predicate "int_reg_operand"
     return 0;
 
   if (GET_CODE (op) == SUBREG)
-    op = SUBREG_REG (op);
+    {
+      if (TARGET_NO_SF_SUBREG && sf_subreg_operand (op, mode))
+	return 0;
+
+      op = SUBREG_REG (op);
+    }
 
   if (!REG_P (op))
     return 0;
@@ -266,6 +352,8 @@  (define_predicate "int_reg_operand"
 })
 
 ;; Like int_reg_operand, but don't return true for pseudo registers
+;; We don't have to check for SF SUBREGS because pseudo registers
+;; are not allowed, and SF SUBREGs are ok within GPR registers.
 (define_predicate "int_reg_operand_not_pseudo"
   (match_operand 0 "register_operand")
 {
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 243966)
+++ gcc/config/rs6000/rs6000.c	(.../gcc/config/rs6000)	(working copy)
@@ -10341,6 +10341,39 @@  rs6000_emit_le_vsx_move (rtx dest, rtx s
     }
 }
 
+/* Return whether a SFmode or SImode move can be done without converting one
+   mode to another.  This arrises when we have:
+
+	(SUBREG:SF (REG:SI ...))
+	(SUBREG:SI (REG:SF ...))
+
+   and one of the values is in a floating point/vector register, where SFmode
+   scalars are stored in DFmode format.  */
+
+bool
+valid_sf_si_move (rtx dest, rtx src, machine_mode mode)
+{
+  if (TARGET_ALLOW_SF_SUBREG)
+    return true;
+
+  if (mode != SFmode && GET_MODE_CLASS (mode) != MODE_INT)
+    return true;
+
+  if (!SUBREG_P (src) || !sf_subreg_operand (src, mode))
+    return true;
+
+  /*.  Allow (set (SUBREG:SI (REG:SF)) (SUBREG:SI (REG:SF))).  */
+  if (SUBREG_P (dest))
+    {
+      rtx dest_subreg = SUBREG_REG (dest);
+      rtx src_subreg = SUBREG_REG (src);
+      return GET_MODE (dest_subreg) == GET_MODE (src_subreg);
+    }
+
+  return false;
+}
+
+
 /* Emit a move from SOURCE to DEST in mode MODE.  */
 void
 rs6000_emit_move (rtx dest, rtx source, machine_mode mode)
@@ -10371,6 +10404,39 @@  rs6000_emit_move (rtx dest, rtx source, 
       gcc_unreachable ();
     }
 
+  /* If we are running before register allocation on a 64-bit machine with
+     direct move, and we see either:
+
+	(set (reg:SF xxx) (subreg:SF (reg:SI yyy) zzz))		(or)
+	(set (reg:SI xxx) (subreg:SI (reg:SF yyy) zzz))
+
+     convert these into a form using UNSPEC.  This is due to SFmode being
+     stored within a vector register in the same format as DFmode.  We need to
+     convert the bits before we can use a direct move or operate on the bits in
+     the vector register as an integer type.
+
+     Skip things like (set (SUBREG:SI (...) (SUBREG:SI (...)).  */
+  if (TARGET_DIRECT_MOVE_64BIT && !reload_in_progress && !reload_completed
+      && !lra_in_progress
+      && (!SUBREG_P (dest) || !sf_subreg_operand (dest, mode))
+      && SUBREG_P (source) && sf_subreg_operand (source, mode))
+    {
+      rtx inner_source = SUBREG_REG (source);
+      machine_mode inner_mode = GET_MODE (inner_source);
+
+      if (mode == SImode && inner_mode == SFmode)
+	{
+	  emit_insn (gen_movsi_from_sf (dest, inner_source));
+	  return;
+	}
+
+      if (mode == SFmode && inner_mode == SImode)
+	{
+	  emit_insn (gen_movsf_from_si (dest, inner_source));
+	  return;
+	}
+    }
+
   /* Check if GCC is setting up a block move that will end up using FP
      registers as temporaries.  We must make sure this is acceptable.  */
   if (GET_CODE (operands[0]) == MEM
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 243966)
+++ gcc/config/rs6000/vsx.md	(.../gcc/config/rs6000)	(working copy)
@@ -3897,3 +3897,124 @@  (define_insn "*vinsert4b_di_internal"
   "TARGET_P9_VECTOR"
   "xxinsertw %x0,%x1,%3"
   [(set_attr "type" "vecperm")])
+
+
+
+;; Attempt to optimize some common operations using logical operations to pick
+;; apart SFmode operations.  This is tricky, because we can't just do the
+;; operation using VSX logical operations directly, because in the PowerPC,
+;; SFmode is represented internally as DFmode in the vector registers.
+
+;; The insns for dealing with SFmode in GPR registers looks like:
+;; (set (reg:V4SF reg2) (unspec:V4SF [(reg:SF reg1)] UNSPEC_VSX_CVDPSPN))
+;;
+;; (set (reg:DI reg3) (unspec:DI [(reg:V4SF reg2)] UNSPEC_P8V_RELOAD_FROM_VSX))
+;;
+;; (set (reg:DI reg3) (lshiftrt:DI (reg:DI reg3) (const_int 32)))
+;;
+;; (set (reg:DI reg5) (and:DI (reg:DI reg3) (reg:DI reg4)))
+;;
+;; (set (reg:DI reg6) (ashift:DI (reg:DI reg5) (const_int 32)))
+;;
+;; (set (reg:SF reg7) (unspec:SF [(reg:DI reg6)] UNSPEC_P8V_MTVSRD))
+;;
+;; (set (reg:SF reg7) (unspec:SF [(reg:SF reg7)] UNSPEC_VSX_CVSPDPN))
+
+(define_code_iterator sf_logical [and ior xor])
+
+(define_constants
+  [(SFBOOL_TMP_GPR		 0)		;; GPR temporary
+   (SFBOOL_TMP_VSX		 1)		;; vector temporary
+   (SFBOOL_MFVSR_D		 2)		;; move to gpr dest
+   (SFBOOL_MFVSR_A		 3)		;; move to gpr src
+   (SFBOOL_BOOL_D		 4)		;; and/ior/xor dest
+   (SFBOOL_BOOL_A1		 5)		;; and/ior/xor arg1
+   (SFBOOL_BOOL_A2		 6)		;; and/ior/xor arg1
+   (SFBOOL_SHL_D		 7)		;; shift left dest
+   (SFBOOL_SHL_A		 8)		;; shift left arg
+   (SFBOOL_MTVSR_D		 9)		;; move to vecter dest
+   (SFBOOL_BOOL_A_DI		10)		;; SFBOOL_BOOL_A1/A2 as DImode
+   (SFBOOL_TMP_VSX_DI		11)		;; SFBOOL_TMP_VSX as DImode
+   (SFBOOL_MTVSR_D_V4SF		12)])		;; SFBOOL_MTVSRD_D as V4SFmode
+
+(define_peephole2
+  [(match_scratch:DI SFBOOL_TMP_GPR "r")
+   (match_scratch:V4SF SFBOOL_TMP_VSX "wa")
+
+   ;; MFVSRD
+   (set (match_operand:DI SFBOOL_MFVSR_D "int_reg_operand")
+	(unspec:DI [(match_operand:V4SF SFBOOL_MFVSR_A "vsx_register_operand")]
+		   UNSPEC_P8V_RELOAD_FROM_VSX))
+
+   ;; SRDI
+   (set (match_dup SFBOOL_MFVSR_D)
+	(lshiftrt:DI (match_dup SFBOOL_MFVSR_D)
+		     (const_int 32)))
+
+   ;; AND/IOR/XOR operation on int
+   (set (match_operand:SI SFBOOL_BOOL_D "int_reg_operand")
+	(sf_logical:SI (match_operand:SI SFBOOL_BOOL_A1 "int_reg_operand")
+		       (match_operand:SI SFBOOL_BOOL_A2 "reg_or_cint_operand")))
+
+   ;; SLDI
+   (set (match_operand:DI SFBOOL_SHL_D "int_reg_operand")
+	(ashift:DI (match_operand:DI SFBOOL_SHL_A "int_reg_operand")
+		   (const_int 32)))
+
+   ;; MTVSRD
+   (set (match_operand:SF SFBOOL_MTVSR_D "vsx_register_operand")
+	(unspec:SF [(match_dup SFBOOL_SHL_D)] UNSPEC_P8V_MTVSRD))]
+
+  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE
+   /* The REG_P (xxx) tests prevents SUBREG's, which allows us to use REGNO
+      to compare registers, when the mode is different.  */
+   && REG_P (operands[SFBOOL_MFVSR_D]) && REG_P (operands[SFBOOL_BOOL_D])
+   && REG_P (operands[SFBOOL_BOOL_A1]) && REG_P (operands[SFBOOL_SHL_D])
+   && REG_P (operands[SFBOOL_SHL_A])   && REG_P (operands[SFBOOL_MTVSR_D])
+   && (REG_P (operands[SFBOOL_BOOL_A2])
+       || CONST_INT_P (operands[SFBOOL_BOOL_A2]))
+   && (REGNO (operands[SFBOOL_BOOL_D]) == REGNO (operands[SFBOOL_MFVSR_D])
+       || peep2_reg_dead_p (3, operands[SFBOOL_MFVSR_D]))
+   && (REGNO (operands[SFBOOL_MFVSR_D]) == REGNO (operands[SFBOOL_BOOL_A1])
+       || (REG_P (operands[SFBOOL_BOOL_A2])
+	   && REGNO (operands[SFBOOL_MFVSR_D])
+		== REGNO (operands[SFBOOL_BOOL_A2])))
+   && REGNO (operands[SFBOOL_BOOL_D]) == REGNO (operands[SFBOOL_SHL_A])
+   && (REGNO (operands[SFBOOL_SHL_D]) == REGNO (operands[SFBOOL_BOOL_D])
+       || peep2_reg_dead_p (4, operands[SFBOOL_BOOL_D]))
+   && peep2_reg_dead_p (5, operands[SFBOOL_SHL_D])"
+  [(set (match_dup SFBOOL_TMP_GPR)
+	(ashift:DI (match_dup SFBOOL_BOOL_A_DI)
+		   (const_int 32)))
+
+   (set (match_dup SFBOOL_TMP_VSX_DI)
+	(match_dup SFBOOL_TMP_GPR))
+
+   (set (match_dup SFBOOL_MTVSR_D_V4SF)
+	(sf_logical:V4SF (match_dup SFBOOL_MFVSR_A)
+			 (match_dup SFBOOL_TMP_VSX)))]
+{
+  rtx bool_a1 = operands[SFBOOL_BOOL_A1];
+  rtx bool_a2 = operands[SFBOOL_BOOL_A2];
+  int regno_mfvsr_d = REGNO (operands[SFBOOL_MFVSR_D]);
+  int regno_tmp_vsx = REGNO (operands[SFBOOL_TMP_VSX]);
+  int regno_mtvsr_d = REGNO (operands[SFBOOL_MTVSR_D]);
+
+  if (CONST_INT_P (bool_a2))
+    {
+      rtx tmp_gpr = operands[SFBOOL_TMP_GPR];
+      emit_move_insn (tmp_gpr, bool_a2);
+      operands[SFBOOL_BOOL_A_DI] = tmp_gpr;
+    }
+  else
+    {
+      int regno_bool_a1 = REGNO (bool_a1);
+      int regno_bool_a2 = REGNO (bool_a2);
+      int regno_bool_a = (regno_mfvsr_d == regno_bool_a1
+			  ? regno_bool_a2 : regno_bool_a1);
+      operands[SFBOOL_BOOL_A_DI] = gen_rtx_REG (DImode, regno_bool_a);
+    }
+
+  operands[SFBOOL_TMP_VSX_DI] = gen_rtx_REG (DImode, regno_tmp_vsx);
+  operands[SFBOOL_MTVSR_D_V4SF] = gen_rtx_REG (V4SFmode, regno_mtvsr_d);
+})
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 243966)
+++ gcc/config/rs6000/rs6000-protos.h	(.../gcc/config/rs6000)	(working copy)
@@ -153,6 +153,7 @@  extern void rs6000_fatal_bad_address (rt
 extern rtx create_TOC_reference (rtx, rtx);
 extern void rs6000_split_multireg_move (rtx, rtx);
 extern void rs6000_emit_le_vsx_move (rtx, rtx, machine_mode);
+extern bool valid_sf_si_move (rtx, rtx, machine_mode);
 extern void rs6000_emit_move (rtx, rtx, machine_mode);
 extern rtx rs6000_secondary_memory_needed_rtx (machine_mode);
 extern machine_mode rs6000_secondary_memory_needed_mode (machine_mode);
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 243966)
+++ gcc/config/rs6000/rs6000.h	(.../gcc/config/rs6000)	(working copy)
@@ -608,6 +608,12 @@  extern int rs6000_vector_align[];
 				 && TARGET_POWERPC64)
 #define TARGET_VEXTRACTUB	(TARGET_P9_VECTOR && TARGET_DIRECT_MOVE \
 				 && TARGET_UPPER_REGS_DI && TARGET_POWERPC64)
+
+
+/* Whether we should avoid (SUBREG:SI (REG:SF) and (SUBREG:SF (REG:SI).  */
+#define TARGET_NO_SF_SUBREG	TARGET_DIRECT_MOVE_64BIT
+#define TARGET_ALLOW_SF_SUBREG	(!TARGET_DIRECT_MOVE_64BIT)
+
 /* This wants to be set for p8 and newer.  On p7, overlapping unaligned
    loads are slow. */
 #define TARGET_EFFICIENT_OVERLAPPING_UNALIGNED TARGET_EFFICIENT_UNALIGNED_VSX
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 243966)
+++ gcc/config/rs6000/rs6000.md	(.../gcc/config/rs6000)	(working copy)
@@ -150,6 +150,8 @@  (define_c_enum "unspec"
    UNSPEC_IEEE128_CONVERT
    UNSPEC_SIGNBIT
    UNSPEC_DOLOOP
+   UNSPEC_SF_FROM_SI
+   UNSPEC_SI_FROM_SF
   ])
 
 ;;
@@ -564,7 +566,8 @@  (define_code_attr return_pred [(return "
 (define_code_attr return_str [(return "") (simple_return "simple_")])
 
 ; Logical operators.
-(define_code_iterator iorxor [ior xor])
+(define_code_iterator iorxor		[ior xor])
+(define_code_iterator and_ior_xor	[and ior xor])
 
 ; Signed/unsigned variants of ops.
 (define_code_iterator any_extend	[sign_extend zero_extend])
@@ -6754,6 +6757,157 @@  (define_insn "*movsi_internal1_single"
   [(set_attr "type" "*,*,load,store,*,*,*,mfjmpr,mtjmpr,*,*,fpstore,fpload")
    (set_attr "length" "4,4,4,4,4,4,8,4,4,4,4,4,4")])
 
+;; Like movsi, but adjust a SF value to be used in a SI context, i.e.
+;; (set (reg:SI ...) (subreg:SI (reg:SF ...) 0))
+;;
+;; Because SF values are actually stored as DF values within the vector
+;; registers, we need to convert the value to the vector SF format when
+;; we need to use the bits in a union or similar cases.  We only need
+;; to do this transformation when the value is a vector register.  Loads,
+;; stores, and transfers within GPRs are assumed to be safe.
+;;
+;; This is a more general case of reload_gpr_from_vsxsf.  That insn must have
+;; no alternatives, because the call is created as part of secondary_reload,
+;; and operand #2's register class is used to allocate the temporary register.
+;; This function is called before reload, and it creates the temporary as
+;; needed.
+
+;;		MR           LWZ          LFIWZX       LXSIWZX   STW
+;;		STFS         STXSSP       STXSSPX      VSX->GPR  MTVSRWZ
+;;		VSX->VSX
+
+(define_insn_and_split "movsi_from_sf"
+  [(set (match_operand:SI 0 "rs6000_nonimmediate_operand"
+		"=r,         r,           ?*wI,        ?*wH,     m,
+		 m,          wY,          Z,           r,        wIwH,
+		 ?wK")
+
+	(unspec:SI [(match_operand:SF 1 "input_operand"
+		"r,          m,           Z,           Z,        r,
+		 f,          wu,          wu,          wIwH,     r,
+		 wK")]
+		    UNSPEC_SI_FROM_SF))
+
+   (clobber (match_scratch:V4SF 2
+		"=X,         X,           X,           X,        X,
+		 X,          X,           X,           wa,       X,
+		 wa"))]
+
+  "TARGET_NO_SF_SUBREG
+   && (register_operand (operands[0], SImode)
+       || register_operand (operands[1], SFmode))"
+  "@
+   mr %0,%1
+   lwz%U1%X1 %0,%1
+   lfiwzx %0,%y1
+   lxsiwzx %x0,%y1
+   stw%U0%X0 %1,%0
+   stfs%U0%X0 %1,%0
+   stxssp %1,%0
+   stxsspx %x1,%y0
+   #
+   mtvsrwz %x0,%1
+   #"
+  "&& reload_completed
+   && register_operand (operands[0], SImode)
+   && vsx_reg_sfsubreg_ok (operands[1], SFmode)"
+  [(const_int 0)]
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  rtx op2 = operands[2];
+  rtx op0_di = gen_rtx_REG (DImode, REGNO (op0));
+
+  emit_insn (gen_vsx_xscvdpspn_scalar (op2, op1));
+
+  if (int_reg_operand (op0, SImode))
+    {
+      emit_insn (gen_p8_mfvsrd_4_disf (op0_di, op2));
+      emit_insn (gen_lshrdi3 (op0_di, op0_di, GEN_INT (32)));
+    }
+  else
+    {
+      rtx op1_v16qi = gen_rtx_REG (V16QImode, REGNO (op1));
+      rtx byte_off = VECTOR_ELT_ORDER_BIG ? const0_rtx : GEN_INT (12);
+      emit_insn (gen_vextract4b (op0_di, op1_v16qi, byte_off));
+    }
+
+  DONE;
+}
+  [(set_attr "type"
+		"*,          load,        fpload,      fpload,   store,
+		 fpstore,    fpstore,     fpstore,     mftgpr,   mffgpr,
+		 veclogical")
+
+   (set_attr "length"
+		"4,          4,           4,           4,        4,
+		 4,          4,           4,           12,       4,
+		 8")])
+
+;; movsi_from_sf with zero extension
+;;
+;;		RLDICL       LWZ          LFIWZX       LXSIWZX   VSX->GPR
+;;		MTVSRWZ      VSX->VSX
+
+(define_insn_and_split "*movdi_from_sf_zero_ext"
+  [(set (match_operand:DI 0 "gpc_reg_operand"
+		"=r,         r,           ?*wI,        ?*wH,     r,
+		wIwH,        ?wK")
+
+	(zero_extend:DI
+	 (unspec:SI [(match_operand:SF 1 "input_operand"
+		"r,          m,           Z,           Z,        wIwH,
+		 r,          wK")]
+		    UNSPEC_SI_FROM_SF)))
+
+   (clobber (match_scratch:V4SF 2
+		"=X,         X,           X,           X,        wa,
+		 X,          wa"))]
+
+  "TARGET_DIRECT_MOVE_64BIT
+   && (register_operand (operands[0], DImode)
+       || register_operand (operands[1], SImode))"
+  "@
+   rldicl %0,%1,0,32
+   lwz%U1%X1 %0,%1
+   lfiwzx %0,%y1
+   lxsiwzx %x0,%y1
+   #
+   mtvsrwz %x0,%1
+   #"
+  "&& reload_completed
+   && vsx_reg_sfsubreg_ok (operands[1], SFmode)"
+  [(const_int 0)]
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  rtx op2 = operands[2];
+
+  emit_insn (gen_vsx_xscvdpspn_scalar (op2, op1));
+
+  if (int_reg_operand (op0, DImode))
+    {
+      emit_insn (gen_p8_mfvsrd_4_disf (op0, op2));
+      emit_insn (gen_lshrdi3 (op0, op0, GEN_INT (32)));
+    }
+  else
+    {
+      rtx op0_si = gen_rtx_REG (SImode, REGNO (op0));
+      rtx op1_v16qi = gen_rtx_REG (V16QImode, REGNO (op1));
+      rtx byte_off = VECTOR_ELT_ORDER_BIG ? const0_rtx : GEN_INT (12);
+      emit_insn (gen_vextract4b (op0_si, op1_v16qi, byte_off));
+    }
+
+  DONE;
+}
+  [(set_attr "type"
+		"*,          load,        fpload,      fpload,  mftgpr,
+		 mffgpr,     veclogical")
+
+   (set_attr "length"
+		"4,          4,           4,           4,        12,
+		 4,          8")])
+
 ;; Split a load of a large constant into the appropriate two-insn
 ;; sequence.
 
@@ -6963,9 +7117,11 @@  (define_insn "mov<mode>_hardfloat"
 	 "m,         <f32_lm>,  <f32_lm2>, Z,         r,         <f32_sr>,
 	  <f32_sr2>, <f32_av>,  <zero_fp>, <zero_fp>, r,         <f32_dm>,
 	  f,         <f32_vsx>, r,         r,         *h,        0"))]
-  "(gpc_reg_operand (operands[0], <MODE>mode)
-   || gpc_reg_operand (operands[1], <MODE>mode))
-   && (TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_SINGLE_FLOAT)"
+  "(register_operand (operands[0], <MODE>mode)
+   || register_operand (operands[1], <MODE>mode))
+   && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_SINGLE_FLOAT
+   && (TARGET_ALLOW_SF_SUBREG
+       || valid_sf_si_move (operands[0], operands[1], <MODE>mode))"
   "@
    lwz%U1%X1 %0,%1
    <f32_li>
@@ -7007,6 +7163,75 @@  (define_insn "*mov<mode>_softfloat"
   [(set_attr "type" "*,mtjmpr,mfjmpr,load,store,*,*,*,*,*")
    (set_attr "length" "4,4,4,4,4,4,4,4,8,4")])
 
+;; Like movsf, but adjust a SI value to be used in a SF context, i.e.
+;; (set (reg:SF ...) (subreg:SF (reg:SI ...) 0))
+;;
+;; Because SF values are actually stored as DF values within the vector
+;; registers, we need to convert the value to the vector SF format when
+;; we need to use the bits in a union or similar cases.  We only need
+;; to do this transformation when the value is a vector register.  Loads,
+;; stores, and transfers within GPRs are assumed to be safe.
+;;
+;; This is a more general case of reload_vsx_from_gprsf.  That insn must have
+;; no alternatives, because the call is created as part of secondary_reload,
+;; and operand #2's register class is used to allocate the temporary register.
+;; This function is called before reload, and it creates the temporary as
+;; needed.
+
+;;	    LWZ          LFS        LXSSP      LXSSPX     STW        STFIWX
+;;	    STXSIWX      GPR->VSX   VSX->GPR   GPR->GPR
+(define_insn_and_split "movsf_from_si"
+  [(set (match_operand:SF 0 "rs6000_nonimmediate_operand"
+	    "=!r,       f,         wb,        wu,        m,         Z,
+	     Z,         wy,        ?r,        !r")
+
+	(unspec:SF [(match_operand:SI 1 "input_operand" 
+	    "m,         m,         wY,        Z,         r,         f,
+	     wu,        r,         wy,        r")]
+		   UNSPEC_SF_FROM_SI))
+
+   (clobber (match_scratch:DI 2
+	    "=X,        X,         X,         X,         X,         X,
+             X,         r,         X,         X"))]
+
+  "TARGET_NO_SF_SUBREG
+   && (register_operand (operands[0], SFmode)
+       || register_operand (operands[1], SImode))"
+  "@
+   lwz%U1%X1 %0,%1
+   lfs%U1%X1 %0,%1
+   lxssp %0,%1
+   lxsspx %x0,%y1
+   stw%U0%X0 %1,%0
+   stfiwx %1,%y0
+   stxsiwx %x1,%y0
+   #
+   mfvsrwz %0,%x1
+   mr %0,%1"
+
+  "&& reload_completed
+   && vsx_reg_sfsubreg_ok (operands[0], SFmode)
+   && int_reg_operand_not_pseudo (operands[1], SImode)"
+  [(const_int 0)]
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  rtx op2 = operands[2];
+  rtx op1_di = gen_rtx_REG (DImode, REGNO (op1));
+
+  /* Move SF value to upper 32-bits for xscvspdpn.  */
+  emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
+  emit_insn (gen_p8_mtvsrd_sf (op0, op2));
+  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
+  DONE;
+}
+  [(set_attr "length"
+	    "4,          4,         4,         4,         4,         4,
+	     4,          12,        4,         4")
+   (set_attr "type"
+	    "load,       fpload,    fpload,    fpload,    store,     fpstore,
+	     fpstore,    vecfloat,  mffgpr,    *")])
+
 
 ;; Move 64-bit binary/decimal floating point
 (define_expand "mov<mode>"
@@ -13217,11 +13442,11 @@  (define_insn "bpermd_<mode>"
 ;; Note that the conditions for expansion are in the FMA_F iterator.
 
 (define_expand "fma<mode>4"
-  [(set (match_operand:FMA_F 0 "register_operand" "")
+  [(set (match_operand:FMA_F 0 "gpc_reg_operand" "")
 	(fma:FMA_F
-	  (match_operand:FMA_F 1 "register_operand" "")
-	  (match_operand:FMA_F 2 "register_operand" "")
-	  (match_operand:FMA_F 3 "register_operand" "")))]
+	  (match_operand:FMA_F 1 "gpc_reg_operand" "")
+	  (match_operand:FMA_F 2 "gpc_reg_operand" "")
+	  (match_operand:FMA_F 3 "gpc_reg_operand" "")))]
   ""
   "")
 
@@ -13241,11 +13466,11 @@  (define_insn "*fma<mode>4_fpr"
 
 ; Altivec only has fma and nfms.
 (define_expand "fms<mode>4"
-  [(set (match_operand:FMA_F 0 "register_operand" "")
+  [(set (match_operand:FMA_F 0 "gpc_reg_operand" "")
 	(fma:FMA_F
-	  (match_operand:FMA_F 1 "register_operand" "")
-	  (match_operand:FMA_F 2 "register_operand" "")
-	  (neg:FMA_F (match_operand:FMA_F 3 "register_operand" ""))))]
+	  (match_operand:FMA_F 1 "gpc_reg_operand" "")
+	  (match_operand:FMA_F 2 "gpc_reg_operand" "")
+	  (neg:FMA_F (match_operand:FMA_F 3 "gpc_reg_operand" ""))))]
   "!VECTOR_UNIT_ALTIVEC_P (<MODE>mode)"
   "")
 
@@ -13265,34 +13490,34 @@  (define_insn "*fms<mode>4_fpr"
 
 ;; If signed zeros are ignored, -(a * b - c) = -a * b + c.
 (define_expand "fnma<mode>4"
-  [(set (match_operand:FMA_F 0 "register_operand" "")
+  [(set (match_operand:FMA_F 0 "gpc_reg_operand" "")
 	(neg:FMA_F
 	  (fma:FMA_F
-	    (match_operand:FMA_F 1 "register_operand" "")
-	    (match_operand:FMA_F 2 "register_operand" "")
-	    (neg:FMA_F (match_operand:FMA_F 3 "register_operand" "")))))]
+	    (match_operand:FMA_F 1 "gpc_reg_operand" "")
+	    (match_operand:FMA_F 2 "gpc_reg_operand" "")
+	    (neg:FMA_F (match_operand:FMA_F 3 "gpc_reg_operand" "")))))]
   "!HONOR_SIGNED_ZEROS (<MODE>mode)"
   "")
 
 ;; If signed zeros are ignored, -(a * b + c) = -a * b - c.
 (define_expand "fnms<mode>4"
-  [(set (match_operand:FMA_F 0 "register_operand" "")
+  [(set (match_operand:FMA_F 0 "gpc_reg_operand" "")
 	(neg:FMA_F
 	  (fma:FMA_F
-	    (match_operand:FMA_F 1 "register_operand" "")
-	    (match_operand:FMA_F 2 "register_operand" "")
-	    (match_operand:FMA_F 3 "register_operand" ""))))]
+	    (match_operand:FMA_F 1 "gpc_reg_operand" "")
+	    (match_operand:FMA_F 2 "gpc_reg_operand" "")
+	    (match_operand:FMA_F 3 "gpc_reg_operand" ""))))]
   "!HONOR_SIGNED_ZEROS (<MODE>mode) && !VECTOR_UNIT_ALTIVEC_P (<MODE>mode)"
   "")
 
 ; Not an official optab name, but used from builtins.
 (define_expand "nfma<mode>4"
-  [(set (match_operand:FMA_F 0 "register_operand" "")
+  [(set (match_operand:FMA_F 0 "gpc_reg_operand" "")
 	(neg:FMA_F
 	  (fma:FMA_F
-	    (match_operand:FMA_F 1 "register_operand" "")
-	    (match_operand:FMA_F 2 "register_operand" "")
-	    (match_operand:FMA_F 3 "register_operand" ""))))]
+	    (match_operand:FMA_F 1 "gpc_reg_operand" "")
+	    (match_operand:FMA_F 2 "gpc_reg_operand" "")
+	    (match_operand:FMA_F 3 "gpc_reg_operand" ""))))]
   "!VECTOR_UNIT_ALTIVEC_P (<MODE>mode)"
   "")
 
@@ -13313,12 +13538,12 @@  (define_insn "*nfma<mode>4_fpr"
 
 ; Not an official optab name, but used from builtins.
 (define_expand "nfms<mode>4"
-  [(set (match_operand:FMA_F 0 "register_operand" "")
+  [(set (match_operand:FMA_F 0 "gpc_reg_operand" "")
 	(neg:FMA_F
 	  (fma:FMA_F
-	    (match_operand:FMA_F 1 "register_operand" "")
-	    (match_operand:FMA_F 2 "register_operand" "")
-	    (neg:FMA_F (match_operand:FMA_F 3 "register_operand" "")))))]
+	    (match_operand:FMA_F 1 "gpc_reg_operand" "")
+	    (match_operand:FMA_F 2 "gpc_reg_operand" "")
+	    (neg:FMA_F (match_operand:FMA_F 3 "gpc_reg_operand" "")))))]
   ""
   "")
 
Index: gcc/testsuite/gcc.target/powerpc/pr71977-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr71977-1.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr71977-1.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 243966)
@@ -0,0 +1,31 @@ 
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-options "-mcpu=power8 -O2" } */
+
+#include <stdint.h>
+
+typedef union
+{
+  float value;
+  uint32_t word;
+} ieee_float_shape_type;
+
+float
+mask_and_float_var (float f, uint32_t mask)
+{ 
+  ieee_float_shape_type u;
+
+  u.value = f;
+  u.word &= mask;
+
+  return u.value;
+}
+
+/* { dg-final { scan-assembler     "\[ \t\]xxland " } } */
+/* { dg-final { scan-assembler-not "\[ \t\]and "    } } */
+/* { dg-final { scan-assembler-not "\[ \t\]mfvsrd " } } */
+/* { dg-final { scan-assembler-not "\[ \t\]stxv"    } } */
+/* { dg-final { scan-assembler-not "\[ \t\]lxv"     } } */
+/* { dg-final { scan-assembler-not "\[ \t\]srdi "   } } */
Index: gcc/testsuite/gcc.target/powerpc/pr71977-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr71977-2.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr71977-2.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 243966)
@@ -0,0 +1,31 @@ 
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-options "-mcpu=power8 -O2" } */
+
+#include <stdint.h>
+
+typedef union
+{
+  float value;
+  uint32_t word;
+} ieee_float_shape_type;
+
+float
+mask_and_float_sign (float f)
+{ 
+  ieee_float_shape_type u;
+
+  u.value = f;
+  u.word &= 0x80000000;
+
+  return u.value;
+}
+
+/* { dg-final { scan-assembler     "\[ \t\]xxland " } } */
+/* { dg-final { scan-assembler-not "\[ \t\]and "    } } */
+/* { dg-final { scan-assembler-not "\[ \t\]mfvsrd " } } */
+/* { dg-final { scan-assembler-not "\[ \t\]stxv"    } } */
+/* { dg-final { scan-assembler-not "\[ \t\]lxv"     } } */
+/* { dg-final { scan-assembler-not "\[ \t\]srdi "   } } */