diff mbox series

, Generate XXBR{H,W,D} for bswap{16,32,64} on PowerPC ISA 3.0 (power9)

Message ID 20171108131431.GA11821@ibm-tiger.the-meissners.org
State New
Headers show
Series , Generate XXBR{H,W,D} for bswap{16,32,64} on PowerPC ISA 3.0 (power9) | expand

Commit Message

Michael Meissner Nov. 8, 2017, 1:14 p.m. UTC
PowerPC ISA 3.0 does not have a byte-reverse instruction that operates on the
GPRs, but it does have vector byte swap half-word, word, double-word operations
in the VSX registers.  The enclosed patch enables generation of the byte
revseral instructions for register-register operations.  It still prefers to
generate the load with byte reverse (L{H,W,D}BRX) or store with byte reverse
(ST{H,W,D}BRX) instructions over the register sequence.

For 16-bit and 32-bit byte swaps, it typically does the tradational operation
in GPR registers, but it will generate XXBR{H,W} if the values are in vector
registers.

For 64-bit swaps, it no longer generates the 9 instruction sequence in favor of
XXBRD.  I did some timing runs on a prototype power9 system, and it was
slightly faster to do direct move to the vecter unit, XXBRD, and direct move
back to a GPR than the traditional sequence.

I did bootstraps on little endian Power8 and Power9 systems (with the default
cpu set to power8 and power9 respectively).  There were no regressions.  Can I
check this patch into the trunk?

[gcc]
2017-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000.md (bswaphi2_reg): On ISA 3.0 systems,
	enable generating XXBR{H,W} if the value is in a vector
	register.
	(bswapsi2_reg): Likewise.
	(bswapdi2_reg): On ISA 3.0 systems, use XXBRD to do bswap64
	instead of doing the GPR sequence used on previoius machines.
	(bswapdi2_xxbrd): Likewise.
	(bswapdi2_reg splitters): Use int_reg_operand instead of
	gpc_reg_operand to not match when XXBRD is generated.

[gcc/testsuite]
2017-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/p9-xxbr-3.c: New test.

Comments

Segher Boessenkool Nov. 8, 2017, 2:01 p.m. UTC | #1
Hi Mike,

On Wed, Nov 08, 2017 at 08:14:31AM -0500, Michael Meissner wrote:
> PowerPC ISA 3.0 does not have a byte-reverse instruction that operates on the
> GPRs, but it does have vector byte swap half-word, word, double-word operations
> in the VSX registers.  The enclosed patch enables generation of the byte
> revseral instructions for register-register operations.  It still prefers to
> generate the load with byte reverse (L{H,W,D}BRX) or store with byte reverse
> (ST{H,W,D}BRX) instructions over the register sequence.
> 
> For 16-bit and 32-bit byte swaps, it typically does the tradational operation
> in GPR registers, but it will generate XXBR{H,W} if the values are in vector
> registers.

You can do the byteswaps with just VMX, too, with some rotate instructions
(vrlh 8 for HI, vrlh 8 followed by vrlw 16 for SI, etc.)  For a future
date, I suppose.

> For 64-bit swaps, it no longer generates the 9 instruction sequence in favor of
> XXBRD.  I did some timing runs on a prototype power9 system, and it was
> slightly faster to do direct move to the vecter unit, XXBRD, and direct move
> back to a GPR than the traditional sequence.
> 
> I did bootstraps on little endian Power8 and Power9 systems (with the default
> cpu set to power8 and power9 respectively).  There were no regressions.  Can I
> check this patch into the trunk?
> 
> [gcc]
> 2017-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>
> 
> 	* config/rs6000/rs6000.md (bswaphi2_reg): On ISA 3.0 systems,
> 	enable generating XXBR{H,W} if the value is in a vector
> 	register.

Join the last two lines?  And you only mean XXBRH here.

> 	(bswapsi2_reg): Likewise.

And XXBRW here.

> 	(bswapdi2_reg): On ISA 3.0 systems, use XXBRD to do bswap64
> 	instead of doing the GPR sequence used on previoius machines.

Typo ("previous").

> 	(bswapdi2_xxbrd): Likewise.

Not "Likewise"; just "New define_insn."

Should this somehow be joined with p9_xxbrd_<mode>?  Or maybe you should
generate that, instead.

> 	(bswapdi2_reg splitters): Use int_reg_operand instead of
> 	gpc_reg_operand to not match when XXBRD is generated.

Please see if you can merge the define_splits to their corresponding
define_insns (making define_insn_and_split).  Shorter, no mismatch
possible between the two anymore, and you get a name for the patterns
too :-)

> @@ -2507,6 +2513,8 @@ (define_expand "bswapdi2"
>  	emit_insn (gen_bswapdi2_load (dest, src));
>        else if (MEM_P (dest))
>  	emit_insn (gen_bswapdi2_store (dest, src));
> +      else if (TARGET_P9_VECTOR)
> +	emit_insn (gen_bswapdi2_xxbrd (dest, src));
>        else
>  	emit_insn (gen_bswapdi2_reg (dest, src));
>        DONE;

Pity that you cannot easily do similar for HI and SI.

> @@ -2544,7 +2559,8 @@ (define_insn "bswapdi2_reg"
>     (clobber (match_scratch:DI 3 "=&r"))]
>    "TARGET_POWERPC64 && TARGET_LDBRX"
>    "#"
> -  [(set_attr "length" "36")])
> +  [(set_attr "length" "36")
> +   (set_attr "type" "*")])

Explicitly setting "type" (or any other attr) to the default value is
pretty useless; just leave it out.


Segher
Michael Meissner Nov. 8, 2017, 8:02 p.m. UTC | #2
On Wed, Nov 08, 2017 at 08:01:06AM -0600, Segher Boessenkool wrote:
> Hi Mike,
> 
> On Wed, Nov 08, 2017 at 08:14:31AM -0500, Michael Meissner wrote:
> > PowerPC ISA 3.0 does not have a byte-reverse instruction that operates on the
> > GPRs, but it does have vector byte swap half-word, word, double-word operations
> > in the VSX registers.  The enclosed patch enables generation of the byte
> > revseral instructions for register-register operations.  It still prefers to
> > generate the load with byte reverse (L{H,W,D}BRX) or store with byte reverse
> > (ST{H,W,D}BRX) instructions over the register sequence.
> > 
> > For 16-bit and 32-bit byte swaps, it typically does the tradational operation
> > in GPR registers, but it will generate XXBR{H,W} if the values are in vector
> > registers.
> 
> You can do the byteswaps with just VMX, too, with some rotate instructions
> (vrlh 8 for HI, vrlh 8 followed by vrlw 16 for SI, etc.)  For a future
> date, I suppose.
> 
> > For 64-bit swaps, it no longer generates the 9 instruction sequence in favor of
> > XXBRD.  I did some timing runs on a prototype power9 system, and it was
> > slightly faster to do direct move to the vecter unit, XXBRD, and direct move
> > back to a GPR than the traditional sequence.
> > 
> > I did bootstraps on little endian Power8 and Power9 systems (with the default
> > cpu set to power8 and power9 respectively).  There were no regressions.  Can I
> > check this patch into the trunk?
> > 
> > [gcc]
> > 2017-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>
> > 
> > 	* config/rs6000/rs6000.md (bswaphi2_reg): On ISA 3.0 systems,
> > 	enable generating XXBR{H,W} if the value is in a vector
> > 	register.
> 
> Join the last two lines?  And you only mean XXBRH here.
> 
> > 	(bswapsi2_reg): Likewise.
> 
> And XXBRW here.

No, I meant the comment for both insns, but I will separate them.

> > 	(bswapdi2_reg): On ISA 3.0 systems, use XXBRD to do bswap64
> > 	instead of doing the GPR sequence used on previoius machines.
> 
> Typo ("previous").

Ok.

> > 	(bswapdi2_xxbrd): Likewise.
> 
> Not "Likewise"; just "New define_insn."

Ok.

> Should this somehow be joined with p9_xxbrd_<mode>?  Or maybe you should
> generate that, instead.

No, since p9_xxbrd_<mode> doesn't include DImode.  We could add yet another
mode iterator to include DI + V2DI/V2DF modes, but that seems to be overkill.

> > 	(bswapdi2_reg splitters): Use int_reg_operand instead of
> > 	gpc_reg_operand to not match when XXBRD is generated.
> 
> Please see if you can merge the define_splits to their corresponding
> define_insns (making define_insn_and_split).  Shorter, no mismatch
> possible between the two anymore, and you get a name for the patterns
> too :-)

No, there are two different patterns that generate the register to register
bswap64.  One is the fall through path on bswapdi2 if the current target is not
64-bit or is 64-bit and doesn't support LDBRX.  The second is from bswapdi2_reg
doing the split.

I simplified it to only change the one insn that would normally match the
register to register case to skip if ISA 3.0.  I put a test on bswapdi2_reg as
well.

> > @@ -2507,6 +2513,8 @@ (define_expand "bswapdi2"
> >  	emit_insn (gen_bswapdi2_load (dest, src));
> >        else if (MEM_P (dest))
> >  	emit_insn (gen_bswapdi2_store (dest, src));
> > +      else if (TARGET_P9_VECTOR)
> > +	emit_insn (gen_bswapdi2_xxbrd (dest, src));
> >        else
> >  	emit_insn (gen_bswapdi2_reg (dest, src));
> >        DONE;
> 
> Pity that you cannot easily do similar for HI and SI.

Not really.  Bswap64 generates 9 separate insns, while bswap32 and bswap16 only
generate 3 insns.  So, having to add the move direct to/from the vector
registers would mean it would be slower than the normal case that is currently
generated.  But if the value happens to be in a VSX register, then we can do it
in one instruction.

Of course if we had a bswap insn in GPR registers, that would simplify things.
But we don't currently.

> > @@ -2544,7 +2559,8 @@ (define_insn "bswapdi2_reg"
> >     (clobber (match_scratch:DI 3 "=&r"))]
> >    "TARGET_POWERPC64 && TARGET_LDBRX"
> >    "#"
> > -  [(set_attr "length" "36")])
> > +  [(set_attr "length" "36")
> > +   (set_attr "type" "*")])
> 
> Explicitly setting "type" (or any other attr) to the default value is
> pretty useless; just leave it out.

Yep.  I missed that in simplifying the patch.  The first version, kept in the
GPR bswap patterns, and it needed to set the type vecperm on the VSX case.
After doing some testing, I yanked out the GPR support on ISA 3.0, but didn't
delete that line.

Here is the latest patch (tested and no regressions).  Can I check this into
the trunk?

After a burn-in period, I plan to backport a reduced version of the patch
(XXBRD only) to gcc 7, unless you think it shouldn't go into gcc 7.

[gcc]
2017-11-09  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000.md (bswaphi2_reg): On ISA 3.0 systems,
	enable generating XXBRH if the value is in a vector register.
	(bswapsi2_reg): On ISA 3.0 systems, enable generating XXBRW if the
	value is in a vector register.
	(bswapdi2_reg): On ISA 3.0 systems, always use XXBRD to do
	register to register bswap64's instead of doing the GPR sequence
	used on previous machines.
	(bswapdi2_xxbrd): New insn.
	(bswapdi2_reg): Disallow on ISA 3.0.
	(register to register bswap64 splitter): Do not split the insn on
	ISA 3.0 systems that use XXBRD.

[gcc/testsuite]
2017-11-09  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/p9-xxbr-3.c: New test.
Segher Boessenkool Nov. 10, 2017, 5:45 p.m. UTC | #3
Hi Mike,

On Wed, Nov 08, 2017 at 03:02:30PM -0500, Michael Meissner wrote:
> > Should this somehow be joined with p9_xxbrd_<mode>?  Or maybe you should
> > generate that, instead.
> 
> No, since p9_xxbrd_<mode> doesn't include DImode.  We could add yet another
> mode iterator to include DI + V2DI/V2DF modes, but that seems to be overkill.

Having separate DI and V2DI patterns for the same instruction isn't great
either.  If it is only this insn it is fine of course, but I suspect we'll
get many more later.  Well I guess we can solve it later ;-)

> I simplified it to only change the one insn that would normally match the
> register to register case to skip if ISA 3.0.  I put a test on bswapdi2_reg as
> well.

Thanks.

> 
> > > @@ -2507,6 +2513,8 @@ (define_expand "bswapdi2"
> > >  	emit_insn (gen_bswapdi2_load (dest, src));
> > >        else if (MEM_P (dest))
> > >  	emit_insn (gen_bswapdi2_store (dest, src));
> > > +      else if (TARGET_P9_VECTOR)
> > > +	emit_insn (gen_bswapdi2_xxbrd (dest, src));
> > >        else
> > >  	emit_insn (gen_bswapdi2_reg (dest, src));
> > >        DONE;
> > 
> > Pity that you cannot easily do similar for HI and SI.
> 
> Not really.  Bswap64 generates 9 separate insns, while bswap32 and bswap16 only
> generate 3 insns.  So, having to add the move direct to/from the vector
> registers would mean it would be slower than the normal case that is currently
> generated.  But if the value happens to be in a VSX register, then we can do it
> in one instruction.

I meant, have just two lines as above :-)

> After a burn-in period, I plan to backport a reduced version of the patch
> (XXBRD only) to gcc 7, unless you think it shouldn't go into gcc 7.

Well, why do we want it on 7?

> +(define_insn "bswapdi2_xxbrd"
> +  [(set (match_operand:DI 0 "gpc_reg_operand" "=wo")
> +	(bswap:DI (match_operand:DI 1 "gpc_reg_operand" "wo")))]
> +  "TARGET_POWERPC64 && TARGET_P9_VECTOR"
> +  "xxbrd %x0,%x1"
> +  [(set_attr "type" "vecperm")])

This doesn't need TARGET_POWERPC64 I think.

Please look at that.  The patch is okay for trunk.  Thanks!


Segher
Michael Meissner Nov. 10, 2017, 11:03 p.m. UTC | #4
On Fri, Nov 10, 2017 at 11:45:23AM -0600, Segher Boessenkool wrote:
> Hi Mike,
> 
> On Wed, Nov 08, 2017 at 03:02:30PM -0500, Michael Meissner wrote:
> > > Should this somehow be joined with p9_xxbrd_<mode>?  Or maybe you should
> > > generate that, instead.
> > 
> > No, since p9_xxbrd_<mode> doesn't include DImode.  We could add yet another
> > mode iterator to include DI + V2DI/V2DF modes, but that seems to be overkill.
> 
> Having separate DI and V2DI patterns for the same instruction isn't great
> either.  If it is only this insn it is fine of course, but I suspect we'll
> get many more later.  Well I guess we can solve it later ;-)
> 
> > I simplified it to only change the one insn that would normally match the
> > register to register case to skip if ISA 3.0.  I put a test on bswapdi2_reg as
> > well.
> 
> Thanks.
> 
> > 
> > > > @@ -2507,6 +2513,8 @@ (define_expand "bswapdi2"
> > > >  	emit_insn (gen_bswapdi2_load (dest, src));
> > > >        else if (MEM_P (dest))
> > > >  	emit_insn (gen_bswapdi2_store (dest, src));
> > > > +      else if (TARGET_P9_VECTOR)
> > > > +	emit_insn (gen_bswapdi2_xxbrd (dest, src));
> > > >        else
> > > >  	emit_insn (gen_bswapdi2_reg (dest, src));
> > > >        DONE;
> > > 
> > > Pity that you cannot easily do similar for HI and SI.
> > 
> > Not really.  Bswap64 generates 9 separate insns, while bswap32 and bswap16 only
> > generate 3 insns.  So, having to add the move direct to/from the vector
> > registers would mean it would be slower than the normal case that is currently
> > generated.  But if the value happens to be in a VSX register, then we can do it
> > in one instruction.
> 
> I meant, have just two lines as above :-)
> 
> > After a burn-in period, I plan to backport a reduced version of the patch
> > (XXBRD only) to gcc 7, unless you think it shouldn't go into gcc 7.
> 
> Well, why do we want it on 7?

Advanced customers and the kernel team using pre-production power9 servers have
asked for it.  Since GCC 8 likely won't be out for several months, I figured to
ask for it to go into GCC 7.

> > +(define_insn "bswapdi2_xxbrd"
> > +  [(set (match_operand:DI 0 "gpc_reg_operand" "=wo")
> > +	(bswap:DI (match_operand:DI 1 "gpc_reg_operand" "wo")))]
> > +  "TARGET_POWERPC64 && TARGET_P9_VECTOR"
> > +  "xxbrd %x0,%x1"
> > +  [(set_attr "type" "vecperm")])
> 
> This doesn't need TARGET_POWERPC64 I think.
> 
> Please look at that.  The patch is okay for trunk.  Thanks!

I removed the test for POWERPC64.  It isn't needed, but I don't have any way of
testing whether the resulting code is slower or faster.  I'll attach the patch
of the changes I checked in.
Segher Boessenkool Nov. 13, 2017, 4:39 p.m. UTC | #5
Hi!

On Fri, Nov 10, 2017 at 06:03:29PM -0500, Michael Meissner wrote:
> On Fri, Nov 10, 2017 at 11:45:23AM -0600, Segher Boessenkool wrote:
> > > After a burn-in period, I plan to backport a reduced version of the patch
> > > (XXBRD only) to gcc 7, unless you think it shouldn't go into gcc 7.
> > 
> > Well, why do we want it on 7?
> 
> Advanced customers and the kernel team using pre-production power9 servers have
> asked for it.  Since GCC 8 likely won't be out for several months, I figured to
> ask for it to go into GCC 7.

Well, convince me (off-list, if you want).

> > > +(define_insn "bswapdi2_xxbrd"
> > > +  [(set (match_operand:DI 0 "gpc_reg_operand" "=wo")
> > > +	(bswap:DI (match_operand:DI 1 "gpc_reg_operand" "wo")))]
> > > +  "TARGET_POWERPC64 && TARGET_P9_VECTOR"
> > > +  "xxbrd %x0,%x1"
> > > +  [(set_attr "type" "vecperm")])
> > 
> > This doesn't need TARGET_POWERPC64 I think.
> > 
> > Please look at that.  The patch is okay for trunk.  Thanks!
> 
> I removed the test for POWERPC64.  It isn't needed, but I don't have any way of
> testing whether the resulting code is slower or faster.  I'll attach the patch
> of the changes I checked in.

> +  if (TARGET_P9_VECTOR && !MEM_P (src) && !MEM_P (dest))
> +    {
> +      emit_insn (gen_bswapdi2_xxbrd (dest, src));
> +      DONE;
> +    }

That's not what I asked -- just remove the TARGET_POWERPC from the insn,
you shouldn't change the expander (unless it really works better?  I doubt
it though).


Segher
diff mbox series

Patch

Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 254516)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -2432,13 +2432,15 @@  (define_insn "bswap<mode>2_store"
   [(set_attr "type" "store")])
 
 (define_insn_and_split "bswaphi2_reg"
-  [(set (match_operand:HI 0 "gpc_reg_operand" "=&r")
+  [(set (match_operand:HI 0 "gpc_reg_operand" "=&r,wo")
 	(bswap:HI
-	 (match_operand:HI 1 "gpc_reg_operand" "r")))
-   (clobber (match_scratch:SI 2 "=&r"))]
+	 (match_operand:HI 1 "gpc_reg_operand" "r,wo")))
+   (clobber (match_scratch:SI 2 "=&r,X"))]
   ""
-  "#"
-  "reload_completed"
+  "@
+   #
+   xxbrh %x0,%x1"
+  "reload_completed && int_reg_operand (operands[0], HImode)"
   [(set (match_dup 3)
 	(and:SI (lshiftrt:SI (match_dup 4)
 			     (const_int 8))
@@ -2454,18 +2456,20 @@  (define_insn_and_split "bswaphi2_reg"
   operands[3] = simplify_gen_subreg (SImode, operands[0], HImode, 0);
   operands[4] = simplify_gen_subreg (SImode, operands[1], HImode, 0);
 }
-  [(set_attr "length" "12")
-   (set_attr "type" "*")])
+  [(set_attr "length" "12,4")
+   (set_attr "type" "*,vecperm")])
 
 ;; We are always BITS_BIG_ENDIAN, so the bit positions below in
 ;; zero_extract insns do not change for -mlittle.
 (define_insn_and_split "bswapsi2_reg"
-  [(set (match_operand:SI 0 "gpc_reg_operand" "=&r")
+  [(set (match_operand:SI 0 "gpc_reg_operand" "=&r,wo")
 	(bswap:SI
-	 (match_operand:SI 1 "gpc_reg_operand" "r")))]
+	 (match_operand:SI 1 "gpc_reg_operand" "r,wo")))]
   ""
-  "#"
-  "reload_completed"
+  "@
+   #
+   xxbrw %x0,%x1"
+  "reload_completed && int_reg_operand (operands[0], SImode)"
   [(set (match_dup 0)					; DABC
 	(rotate:SI (match_dup 1)
 		   (const_int 24)))
@@ -2481,7 +2485,9 @@  (define_insn_and_split "bswapsi2_reg"
 			(const_int 255))
 		(and:SI (match_dup 0)
 			(const_int -256))))]
-  "")
+  ""
+  [(set_attr "length" "12,4")
+   (set_attr "type" "*,vecperm")])
 
 ;; On systems with LDBRX/STDBRX generate the loads/stores directly, just like
 ;; we do for L{H,W}BRX and ST{H,W}BRX above.  If not, we have to generate more
@@ -2507,6 +2513,8 @@  (define_expand "bswapdi2"
 	emit_insn (gen_bswapdi2_load (dest, src));
       else if (MEM_P (dest))
 	emit_insn (gen_bswapdi2_store (dest, src));
+      else if (TARGET_P9_VECTOR)
+	emit_insn (gen_bswapdi2_xxbrd (dest, src));
       else
 	emit_insn (gen_bswapdi2_reg (dest, src));
       DONE;
@@ -2537,6 +2545,13 @@  (define_insn "bswapdi2_store"
   "stdbrx %1,%y0"
   [(set_attr "type" "store")])
 
+(define_insn "bswapdi2_xxbrd"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=wo")
+	(bswap:DI (match_operand:DI 1 "gpc_reg_operand" "wo")))]
+  "TARGET_POWERPC64 && TARGET_P9_VECTOR"
+  "xxbrd %x0,%x1"
+  [(set_attr "type" "vecperm")])
+
 (define_insn "bswapdi2_reg"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=&r")
 	(bswap:DI (match_operand:DI 1 "gpc_reg_operand" "r")))
@@ -2544,7 +2559,8 @@  (define_insn "bswapdi2_reg"
    (clobber (match_scratch:DI 3 "=&r"))]
   "TARGET_POWERPC64 && TARGET_LDBRX"
   "#"
-  [(set_attr "length" "36")])
+  [(set_attr "length" "36")
+   (set_attr "type" "*")])
 
 ;; Non-power7/cell, fall back to use lwbrx/stwbrx
 (define_insn "*bswapdi2_64bit"
@@ -2560,7 +2576,7 @@  (define_insn "*bswapdi2_64bit"
   [(set_attr "length" "16,12,36")])
 
 (define_split
-  [(set (match_operand:DI 0 "gpc_reg_operand" "")
+  [(set (match_operand:DI 0 "int_reg_operand" "")
 	(bswap:DI (match_operand:DI 1 "indexed_or_indirect_operand" "")))
    (clobber (match_operand:DI 2 "gpc_reg_operand" ""))
    (clobber (match_operand:DI 3 "gpc_reg_operand" ""))]
@@ -2625,7 +2641,7 @@  (define_split
 
 (define_split
   [(set (match_operand:DI 0 "indexed_or_indirect_operand" "")
-	(bswap:DI (match_operand:DI 1 "gpc_reg_operand" "")))
+	(bswap:DI (match_operand:DI 1 "int_reg_operand" "")))
    (clobber (match_operand:DI 2 "gpc_reg_operand" ""))
    (clobber (match_operand:DI 3 "gpc_reg_operand" ""))]
   "TARGET_POWERPC64 && !TARGET_LDBRX && reload_completed"
@@ -2687,10 +2703,10 @@  (define_split
 }")
 
 (define_split
-  [(set (match_operand:DI 0 "gpc_reg_operand" "")
-	(bswap:DI (match_operand:DI 1 "gpc_reg_operand" "")))
-   (clobber (match_operand:DI 2 "gpc_reg_operand" ""))
-   (clobber (match_operand:DI 3 "gpc_reg_operand" ""))]
+  [(set (match_operand:DI 0 "int_reg_operand" "")
+	(bswap:DI (match_operand:DI 1 "int_reg_operand" "")))
+   (clobber (match_operand:DI 2 "int_reg_operand" ""))
+   (clobber (match_operand:DI 3 "int_reg_operand" ""))]
   "TARGET_POWERPC64 && reload_completed"
   [(const_int 0)]
   "
@@ -2722,9 +2738,9 @@  (define_insn "bswapdi2_32bit"
   [(set_attr "length" "16,12,36")])
 
 (define_split
-  [(set (match_operand:DI 0 "gpc_reg_operand" "")
+  [(set (match_operand:DI 0 "int_reg_operand" "")
 	(bswap:DI (match_operand:DI 1 "indexed_or_indirect_operand" "")))
-   (clobber (match_operand:SI 2 "gpc_reg_operand" ""))]
+   (clobber (match_operand:SI 2 "int_reg_operand" ""))]
   "!TARGET_POWERPC64 && reload_completed"
   [(const_int 0)]
   "
Index: gcc/testsuite/gcc.target/powerpc/p9-xxbr-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/p9-xxbr-3.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/p9-xxbr-3.c	(working copy)
@@ -0,0 +1,99 @@ 
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mpower9-vector -O2" } */
+
+/* Verify that the XXBR{H,W} instructions are generated if the value is
+   forced to be in a vector register, and XXBRD is generated all of the
+   time for register bswap64's.  */
+
+unsigned short
+do_bswap16_mem (unsigned short *p)
+{
+  return __builtin_bswap16 (*p);	/* LHBRX.  */
+}
+
+unsigned short
+do_bswap16_reg (unsigned short a)
+{
+  return __builtin_bswap16 (a);		/* gpr sequences.  */
+}
+
+void
+do_bswap16_store (unsigned short *p, unsigned short a)
+{
+  *p = __builtin_bswap16 (a);		/* STHBRX.  */
+}
+
+unsigned short
+do_bswap16_vect (unsigned short a)
+{
+  __asm__ (" # %x0" : "+v" (a));
+  return __builtin_bswap16 (a);		/* XXBRW.  */
+}
+
+unsigned int
+do_bswap32_mem (unsigned int *p)
+{
+  return __builtin_bswap32 (*p);	/* LWBRX.  */
+}
+
+unsigned int
+do_bswap32_reg (unsigned int a)
+{
+  return __builtin_bswap32 (a);		/* gpr sequences.  */
+}
+
+void
+do_bswap32_store (unsigned int *p, unsigned int a)
+{
+  *p = __builtin_bswap32 (a);		/* STWBRX.  */
+}
+
+unsigned int
+do_bswap32_vect (unsigned int a)
+{
+  __asm__ (" # %x0" : "+v" (a));
+  return __builtin_bswap32 (a);		/* XXBRW.  */
+}
+
+unsigned long
+do_bswap64_mem (unsigned long *p)
+{
+  return __builtin_bswap64 (*p);	/* LDBRX.  */
+}
+
+unsigned long
+do_bswap64_reg (unsigned long a)
+{
+  return __builtin_bswap64 (a);		/* gpr sequences.  */
+}
+
+void
+do_bswap64_store (unsigned long *p, unsigned int a)
+{
+  *p = __builtin_bswap64 (a);		/* STDBRX.  */
+}
+
+double
+do_bswap64_double (unsigned long a)
+{
+  return (double) __builtin_bswap64 (a);	/* XXBRD.  */
+}
+
+unsigned long
+do_bswap64_vect (unsigned long a)
+{
+  __asm__ (" # %x0" : "+v" (a));	/* XXBRD.  */
+  return __builtin_bswap64 (a);
+}
+
+/* Make sure XXBR{H,W,D} is not generated by default.  */
+/* { dg-final { scan-assembler-times "xxbrd"  3  } } */
+/* { dg-final { scan-assembler-times "xxbrh"  1  } } */
+/* { dg-final { scan-assembler-times "xxbrw"  1  } } */
+/* { dg-final { scan-assembler-times "ldbrx"  1  } } */
+/* { dg-final { scan-assembler-times "lhbrx"  1  } } */
+/* { dg-final { scan-assembler-times "lwbrx"  1  } } */
+/* { dg-final { scan-assembler-times "stdbrx" 1  } } */
+/* { dg-final { scan-assembler-times "sthbrx" 1  } } */
+/* { dg-final { scan-assembler-times "stwbrx" 1  } } */