diff mbox series

[RS6000] Fix PR87870: ppc64 generates poor code when loading constants into TImode vars

Message ID 3bd3efa2-14af-1622-bcb9-d583137f7d5e@linux.ibm.com
State New
Headers show
Series [RS6000] Fix PR87870: ppc64 generates poor code when loading constants into TImode vars | expand

Commit Message

Peter Bergner Nov. 13, 2018, 5:29 p.m. UTC
PR87870 shows a problem loading simple constant values into TImode variables.
This is a regression ever since VSX was added and we started using the
vsx_mov<mode>_64bit pattern.  We still get the correct code on trunk if we
compile with -mno-vsx, since we fall back to using the older mov<mode>_ppc64
move pattern, which has an alternative "r" <- "n".

Our current vsx_mov<mode>_64bit pattern currently has two alternatives for
loading constants into GPRs, one using "*r" <- "jwM" and "??r" <- "W".
These look redundant to me, since "W" contains support for both all-zero
constants (ie, "j") and all-one constants (ie, wM) as well as a few more.
My patch below consolidates them both and uses a new mode iterator that
uses "W" for the vector modes and "n" for TImode like mov<mode>_ppc64
used.

I'll note I didn't change the vsx_mov<mode>_32bit pattern, since TImode
isn't supported with -m32.  However, if you want, I could remove the
redundant "*r" <- "jwM" alternative there too?

This passes bootstrap and regtesting with no regressions.  Ok for trunk?

Peter


gcc/
	PR target/87870
	* config/rs6000/vsx.md (nW): New mode iterator.
	(vsx_mov<mode>_64bit): Use it.  Remove redundant GPR 0/-1 alternative.

gcc/testsuite/
	PR target/87870
	* gcc.target/powerpc/pr87870.c: New test.

Comments

Segher Boessenkool Nov. 16, 2018, 5:06 p.m. UTC | #1
On Tue, Nov 13, 2018 at 11:29:20AM -0600, Peter Bergner wrote:
> PR87870 shows a problem loading simple constant values into TImode variables.
> This is a regression ever since VSX was added and we started using the
> vsx_mov<mode>_64bit pattern.  We still get the correct code on trunk if we
> compile with -mno-vsx, since we fall back to using the older mov<mode>_ppc64
> move pattern, which has an alternative "r" <- "n".
> 
> Our current vsx_mov<mode>_64bit pattern currently has two alternatives for
> loading constants into GPRs, one using "*r" <- "jwM" and "??r" <- "W".
> These look redundant to me, since "W" contains support for both all-zero
> constants (ie, "j") and all-one constants (ie, wM) as well as a few more.
> My patch below consolidates them both and uses a new mode iterator that
> uses "W" for the vector modes and "n" for TImode like mov<mode>_ppc64
> used.

Why does the "r" have a "*"?  Should it have been just a "?"?

"W" is easy_vector_constant, which requires const_vector always; is that
okay here?

> I'll note I didn't change the vsx_mov<mode>_32bit pattern, since TImode
> isn't supported with -m32.  However, if you want, I could remove the
> redundant "*r" <- "jwM" alternative there too?

Yeah, please keep the patterns in synch.


Segher
Peter Bergner Nov. 16, 2018, 7:33 p.m. UTC | #2
On 11/16/18 11:06 AM, Segher Boessenkool wrote:
> Why does the "r" have a "*"?  Should it have been just a "?"?

Maybe Mike remembers why he added it?  I'd guess it was probably to
try and steer the vector modes away from GPRs.  However, since that
alternative was also suppose to handle TImode, "*" is probably not
what we want for that.  The constraint for the new combines alternative
uses a mode iterator so that the vector modes use "??r" while TImode
uses "r".  I think that is what we want.


> "W" is easy_vector_constant, which requires const_vector always; is that
> okay here?

Well, "wM" calls all_ones_constant which also requires const_vector, so
if it isn't correct now, then it wasn't correct before my patch either.
Since I'm using the (new) mode iterator <nW> and the "W" is only used
for the vector modes, I think we're ok here, aren't we?


>> I'll note I didn't change the vsx_mov<mode>_32bit pattern, since TImode
>> isn't supported with -m32.  However, if you want, I could remove the
>> redundant "*r" <- "jwM" alternative there too?
> 
> Yeah, please keep the patterns in synch.

Ok, I'll do the same thing and retest.


Peter
Michael Meissner Nov. 16, 2018, 8:50 p.m. UTC | #3
On Fri, Nov 16, 2018 at 01:33:58PM -0600, Peter Bergner wrote:
> On 11/16/18 11:06 AM, Segher Boessenkool wrote:
> > Why does the "r" have a "*"?  Should it have been just a "?"?
> 
> Maybe Mike remembers why he added it?  I'd guess it was probably to
> try and steer the vector modes away from GPRs.  However, since that
> alternative was also suppose to handle TImode, "*" is probably not
> what we want for that.  The constraint for the new combines alternative
> uses a mode iterator so that the vector modes use "??r" while TImode
> uses "r".  I think that is what we want.

I imagine when I made the original change we had a different move for TImode
(and the -mvsx-timode hack) than for vector modes.  And it was likely pre-LRA
also.
Peter Bergner Nov. 16, 2018, 10:26 p.m. UTC | #4
On 11/16/18 11:06 AM, Segher Boessenkool wrote:
> On Tue, Nov 13, 2018 at 11:29:20AM -0600, Peter Bergner wrote:
>> I'll note I didn't change the vsx_mov<mode>_32bit pattern, since TImode
>> isn't supported with -m32.  However, if you want, I could remove the
>> redundant "*r" <- "jwM" alternative there too?
> 
> Yeah, please keep the patterns in synch.

Ok, here is the patch with the vsx_mov<mod>_32bit pattern changed too.

However, when I made the change below, the length attribute seems a
little off.  For *_64bit, we have a length of 4, but for *_32bit, we
have a length of 32.  The "4" looks correct for both *_64bit and *_32bit
if we're loading an easy_vector_constant into one of the vector regs.
For loading a TImode constant into a GPR, then it could be anything from
8 bytes to 40bytes (loading 0xdeadbeefcafebabefacefeedbaaaaaad) for
-m64.  Since TImode isn't supposrted in -m32 (yet?), who knows, probably
it would be 16 bytes to ??? bytes.

Should those sizes be updated too?  If so, what should they be?
The smallest, average or worst case lengths?  I assume we could use
another iterator to separate the vector lengths from the gpr lengths
if we need to.

Peter


gcc/
	PR target/87870
	* config/rs6000/vsx.md (nW): New mode iterator.
	(vsx_mov<mode>_64bit): Use it.  Remove redundant GPR 0/-1 alternative.

gcc/testsuite/
	PR target/87870
	* gcc.target/powerpc/pr87870.c: New test.

Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 265971)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -183,6 +183,18 @@ (define_mode_attr ??r	[(V16QI	"??r")
 			 (TF	"??r")
 			 (TI	"r")])
 
+;; A mode attribute used for 128-bit constant values.
+(define_mode_attr nW	[(V16QI	"W")
+			 (V8HI	"W")
+			 (V4SI	"W")
+			 (V4SF	"W")
+			 (V2DI	"W")
+			 (V2DF	"W")
+			 (V1TI	"W")
+			 (KF	"W")
+			 (TF	"W")
+			 (TI	"n")])
+
 ;; Same size integer type for floating point data
 (define_mode_attr VSi [(V4SF  "v4si")
 		       (V2DF  "v2di")
@@ -1193,17 +1205,17 @@ (define_insn_and_split "*xxspltib_<mode>
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
-;;              VSX 0/-1   GPR 0/-1   VMX const GPR const  LVX (VMX)   STVX (VMX)
+;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      <VSa>,     <VSa>,     r,         we,        ?wQ,
                 ?&r,       ??r,       ??Y,       <??r>,     wo,        v,
-                ?<VSa>,    *r,        v,         ??r,       wZ,        v")
+                ?<VSa>,    v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "<VSa>,     ZwO,       <VSa>,     we,        r,         r,
                 wQ,        Y,         r,         r,         wE,        jwM,
-                ?jwM,      jwM,       W,         W,         v,         wZ"))]
+                ?jwM,      W,         <nW>,      v,         wZ"))]
 
   "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
    && (register_operand (operands[0], <MODE>mode) 
@@ -1214,25 +1226,25 @@ (define_insn "vsx_mov<mode>_64bit"
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mffgpr,    mftgpr,    load,
                 store,     load,      store,     *,         vecsimple, vecsimple,
-                vecsimple, *,         *,         *,         vecstore,  vecload")
+                vecsimple, *,         *,         vecstore,  vecload")
 
    (set_attr "length"
                "4,         4,         4,         8,         4,         8,
                 8,         8,         8,         8,         4,         4,
-                4,         8,         20,        20,        4,         4")])
+                4,         20,        20,        4,         4")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
-;;              XXSPLTIB   VSPLTISW   VSX 0/-1   GPR 0/-1   VMX const  GPR const
+;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      <VSa>,     <VSa>,     ??r,       ??Y,       <??r>,
-                wo,        v,         ?<VSa>,    *r,        v,         ??r,
+                wo,        v,         ?<VSa>,    v,         <??r>,
                 wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "<VSa>,     ZwO,       <VSa>,     Y,         r,         r,
-                wE,        jwM,       ?jwM,      jwM,       W,         W,
+                wE,        jwM,       ?jwM,      W,         <nW>,
                 v,         wZ"))]
 
   "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
@@ -1243,12 +1255,12 @@ (define_insn "*vsx_mov<mode>_32bit"
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
-                vecsimple, vecsimple, vecsimple, *,         *,        *,
+                vecsimple, vecsimple, vecsimple, *,         *,
                 vecstore,  vecload")
 
    (set_attr "length"
                "4,         4,         4,         16,        16,        16,
-                4,         4,         4,         16,        20,        32,
+                4,         4,         4,         20,        32,
                 4,         4")])
 
 ;; Explicit  load/store expanders for the builtin functions
Index: gcc/testsuite/gcc.target/powerpc/pr87870.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr87870.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr87870.c	(working copy)
@@ -0,0 +1,28 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-options "-O2" } */
+
+__int128
+test0 (void)
+{
+  return 0;
+}
+
+__int128
+test1 (void)
+{
+  return 1;
+}
+
+__int128
+test2 (void)
+{
+  return -1;
+}
+
+__int128
+test3 (void)
+{
+  return ((__int128)0xdeadbeefcafebabe << 64) | 0xfacefeedbaaaaaad;
+}
+
+/* { dg-final { scan-assembler-not {\mld\M} } } */
Segher Boessenkool Nov. 16, 2018, 11:18 p.m. UTC | #5
On Fri, Nov 16, 2018 at 01:33:58PM -0600, Peter Bergner wrote:
> On 11/16/18 11:06 AM, Segher Boessenkool wrote:
> > "W" is easy_vector_constant, which requires const_vector always; is that
> > okay here?
> 
> Well, "wM" calls all_ones_constant which also requires const_vector, so

Does it?

(define_predicate "all_ones_constant"
  (and (match_code "const_int,const_double,const_wide_int,const_vector")
       (match_test "op == CONSTM1_RTX (mode) && !FLOAT_MODE_P (mode)")))

And "j" is defined as

(define_constraint "j"
  "Zero vector constant"
  (match_test "op == const0_rtx || op == CONST0_RTX (mode)"))

> if it isn't correct now, then it wasn't correct before my patch either.
> Since I'm using the (new) mode iterator <nW> and the "W" is only used
> for the vector modes, I think we're ok here, aren't we?

That's a good point yes.  Okay.

> >> I'll note I didn't change the vsx_mov<mode>_32bit pattern, since TImode
> >> isn't supported with -m32.  However, if you want, I could remove the
> >> redundant "*r" <- "jwM" alternative there too?
> > 
> > Yeah, please keep the patterns in synch.
> 
> Ok, I'll do the same thing and retest.

Could you also check (manually) that the compiler still handles things as
it should here?  Okay for trunk if it does.  Thanks!


Segher
Segher Boessenkool Nov. 16, 2018, 11:29 p.m. UTC | #6
On Fri, Nov 16, 2018 at 04:26:18PM -0600, Peter Bergner wrote:
> On 11/16/18 11:06 AM, Segher Boessenkool wrote:
> > On Tue, Nov 13, 2018 at 11:29:20AM -0600, Peter Bergner wrote:
> >> I'll note I didn't change the vsx_mov<mode>_32bit pattern, since TImode
> >> isn't supported with -m32.  However, if you want, I could remove the
> >> redundant "*r" <- "jwM" alternative there too?
> > 
> > Yeah, please keep the patterns in synch.
> 
> Ok, here is the patch with the vsx_mov<mod>_32bit pattern changed too.
> 
> However, when I made the change below, the length attribute seems a
> little off.  For *_64bit, we have a length of 4, but for *_32bit, we
> have a length of 32.  The "4" looks correct for both *_64bit and *_32bit
> if we're loading an easy_vector_constant into one of the vector regs.
> For loading a TImode constant into a GPR, then it could be anything from
> 8 bytes to 40bytes (loading 0xdeadbeefcafebabefacefeedbaaaaaad) for
> -m64.  Since TImode isn't supposrted in -m32 (yet?), who knows, probably
> it would be 16 bytes to ??? bytes.
> 
> Should those sizes be updated too?  If so, what should they be?
> The smallest, average or worst case lengths?  I assume we could use
> another iterator to separate the vector lengths from the gpr lengths
> if we need to.

Worst case.  This is required for correctness.


Segher
Peter Bergner Dec. 13, 2018, 4:59 p.m. UTC | #7
On 11/16/18 5:29 PM, Segher Boessenkool wrote:
> On Fri, Nov 16, 2018 at 04:26:18PM -0600, Peter Bergner wrote:
>> However, when I made the change below, the length attribute seems a
>> little off.  For *_64bit, we have a length of 4, but for *_32bit, we
>> have a length of 32.  The "4" looks correct for both *_64bit and *_32bit
>> if we're loading an easy_vector_constant into one of the vector regs.
>> For loading a TImode constant into a GPR, then it could be anything from
>> 8 bytes to 40bytes (loading 0xdeadbeefcafebabefacefeedbaaaaaad) for
>> -m64.  Since TImode isn't supposrted in -m32 (yet?), who knows, probably
>> it would be 16 bytes to ??? bytes.
>>
>> Should those sizes be updated too?  If so, what should they be?
>> The smallest, average or worst case lengths?  I assume we could use
>> another iterator to separate the vector lengths from the gpr lengths
>> if we need to.
> 
> Worst case.  This is required for correctness.

Ok, I looked into this and the point where we must have correct length info
is in final assembly generation, so very very late.  For the alternatives
I'm changing, we're loading into GPR regs and these alternatives are always
split (split2), so these length values are never used/seen at final assembly
time.

Given the above, I'm guessing we should probably go with the most common
length value (ie, 8 for 64-bit and 16 for 32-bit)?  The following patch
implements that.  Does this seem reasonable to you?

Peter

gcc/
	PR target/87870
	* config/rs6000/vsx.md (nW): New mode iterator.
	(vsx_mov<mode>_64bit): Use it.  Remove redundant GPR 0/-1 alternative.
	Update length attribute for (<??r>, <nW>)  alternative.
	(vsx_mov<mode>_32bit): Likewise.

gcc/testsuite/
	PR target/87870
	* gcc.target/powerpc/pr87870.c: New test.


Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 265971)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -183,6 +183,18 @@ (define_mode_attr ??r	[(V16QI	"??r")
 			 (TF	"??r")
 			 (TI	"r")])
 
+;; A mode attribute used for 128-bit constant values.
+(define_mode_attr nW	[(V16QI	"W")
+			 (V8HI	"W")
+			 (V4SI	"W")
+			 (V4SF	"W")
+			 (V2DI	"W")
+			 (V2DF	"W")
+			 (V1TI	"W")
+			 (KF	"W")
+			 (TF	"W")
+			 (TI	"n")])
+
 ;; Same size integer type for floating point data
 (define_mode_attr VSi [(V4SF  "v4si")
 		       (V2DF  "v2di")
@@ -1193,17 +1205,17 @@ (define_insn_and_split "*xxspltib_<mode>
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
-;;              VSX 0/-1   GPR 0/-1   VMX const GPR const  LVX (VMX)   STVX (VMX)
+;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      <VSa>,     <VSa>,     r,         we,        ?wQ,
                 ?&r,       ??r,       ??Y,       <??r>,     wo,        v,
-                ?<VSa>,    *r,        v,         ??r,       wZ,        v")
+                ?<VSa>,    v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "<VSa>,     ZwO,       <VSa>,     we,        r,         r,
                 wQ,        Y,         r,         r,         wE,        jwM,
-                ?jwM,      jwM,       W,         W,         v,         wZ"))]
+                ?jwM,      W,         <nW>,      v,         wZ"))]
 
   "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
    && (register_operand (operands[0], <MODE>mode) 
@@ -1214,25 +1226,25 @@ (define_insn "vsx_mov<mode>_64bit"
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mffgpr,    mftgpr,    load,
                 store,     load,      store,     *,         vecsimple, vecsimple,
-                vecsimple, *,         *,         *,         vecstore,  vecload")
+                vecsimple, *,         *,         vecstore,  vecload")
 
    (set_attr "length"
                "4,         4,         4,         8,         4,         8,
                 8,         8,         8,         8,         4,         4,
-                4,         8,         20,        20,        4,         4")])
+                4,         20,        8,         4,         4")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
-;;              XXSPLTIB   VSPLTISW   VSX 0/-1   GPR 0/-1   VMX const  GPR const
+;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      <VSa>,     <VSa>,     ??r,       ??Y,       <??r>,
-                wo,        v,         ?<VSa>,    *r,        v,         ??r,
+                wo,        v,         ?<VSa>,    v,         <??r>,
                 wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "<VSa>,     ZwO,       <VSa>,     Y,         r,         r,
-                wE,        jwM,       ?jwM,      jwM,       W,         W,
+                wE,        jwM,       ?jwM,      W,         <nW>,
                 v,         wZ"))]
 
   "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
@@ -1243,12 +1255,12 @@ (define_insn "*vsx_mov<mode>_32bit"
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
-                vecsimple, vecsimple, vecsimple, *,         *,        *,
+                vecsimple, vecsimple, vecsimple, *,         *,
                 vecstore,  vecload")
 
    (set_attr "length"
                "4,         4,         4,         16,        16,        16,
-                4,         4,         4,         16,        20,        32,
+                4,         4,         4,         20,        16,
                 4,         4")])
 
 ;; Explicit  load/store expanders for the builtin functions
Index: gcc/testsuite/gcc.target/powerpc/pr87870.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr87870.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr87870.c	(working copy)
@@ -0,0 +1,28 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-options "-O2" } */
+
+__int128
+test0 (void)
+{
+  return 0;
+}
+
+__int128
+test1 (void)
+{
+  return 1;
+}
+
+__int128
+test2 (void)
+{
+  return -1;
+}
+
+__int128
+test3 (void)
+{
+  return ((__int128)0xdeadbeefcafebabe << 64) | 0xfacefeedbaaaaaad;
+}
+
+/* { dg-final { scan-assembler-not {\mld\M} } } */
Segher Boessenkool Dec. 15, 2018, 2:23 a.m. UTC | #8
On Thu, Dec 13, 2018 at 10:59:36AM -0600, Peter Bergner wrote:
> On 11/16/18 5:29 PM, Segher Boessenkool wrote:
> > On Fri, Nov 16, 2018 at 04:26:18PM -0600, Peter Bergner wrote:
> >> However, when I made the change below, the length attribute seems a
> >> little off.  For *_64bit, we have a length of 4, but for *_32bit, we
> >> have a length of 32.  The "4" looks correct for both *_64bit and *_32bit
> >> if we're loading an easy_vector_constant into one of the vector regs.
> >> For loading a TImode constant into a GPR, then it could be anything from
> >> 8 bytes to 40bytes (loading 0xdeadbeefcafebabefacefeedbaaaaaad) for
> >> -m64.  Since TImode isn't supposrted in -m32 (yet?), who knows, probably
> >> it would be 16 bytes to ??? bytes.
> >>
> >> Should those sizes be updated too?  If so, what should they be?
> >> The smallest, average or worst case lengths?  I assume we could use
> >> another iterator to separate the vector lengths from the gpr lengths
> >> if we need to.
> > 
> > Worst case.  This is required for correctness.
> 
> Ok, I looked into this and the point where we must have correct length info
> is in final assembly generation, so very very late.

I'm not convinced this is true.  And problems with this only show up
with unusual code, so typically after releases :-(

Maybe we could make a mode where conditional jumps can jump only 128
bytes or similar, that would make testing much easier (problems will
show up much more often than with the 32kB max distance we have).

> For the alternatives
> I'm changing, we're loading into GPR regs and these alternatives are always
> split (split2), so these length values are never used/seen at final assembly
> time.

But some move instructions are created *after* split2.

> Given the above, I'm guessing we should probably go with the most common
> length value (ie, 8 for 64-bit and 16 for 32-bit)?  The following patch
> implements that.  Does this seem reasonable to you?

I do like the patch.  Let me sleep on it.


Segher
Peter Bergner Dec. 17, 2018, 8:23 p.m. UTC | #9
On 12/14/18 8:23 PM, Segher Boessenkool wrote:
> On Thu, Dec 13, 2018 at 10:59:36AM -0600, Peter Bergner wrote:
>> For the alternatives
>> I'm changing, we're loading into GPR regs and these alternatives are always
>> split (split2), so these length values are never used/seen at final assembly
>> time.
> 
> But some move instructions are created *after* split2.

That may be so, but I do not know how we could create a move insn using
this alternative, since rs6000_output_move_128bit() does the following
for this alternative (ie, loading a constant into a GPR):


  /* Constants.  */
  else if (dest_regno >= 0
           && (GET_CODE (src) == CONST_INT
               || GET_CODE (src) == CONST_WIDE_INT
               || GET_CODE (src) == CONST_DOUBLE
               || GET_CODE (src) == CONST_VECTOR))
    {
      if (dest_gpr_p)
        return "#";

This means we require the insn to eventually be split and not reach final
assembly, does it not?

Peter
Segher Boessenkool Dec. 17, 2018, 9:55 p.m. UTC | #10
On Mon, Dec 17, 2018 at 02:23:54PM -0600, Peter Bergner wrote:
> On 12/14/18 8:23 PM, Segher Boessenkool wrote:
> > On Thu, Dec 13, 2018 at 10:59:36AM -0600, Peter Bergner wrote:
> >> For the alternatives
> >> I'm changing, we're loading into GPR regs and these alternatives are always
> >> split (split2), so these length values are never used/seen at final assembly
> >> time.
> > 
> > But some move instructions are created *after* split2.
> 
> That may be so, but I do not know how we could create a move insn using
> this alternative, since rs6000_output_move_128bit() does the following
> for this alternative (ie, loading a constant into a GPR):
> 
> 
>   /* Constants.  */
>   else if (dest_regno >= 0
>            && (GET_CODE (src) == CONST_INT
>                || GET_CODE (src) == CONST_WIDE_INT
>                || GET_CODE (src) == CONST_DOUBLE
>                || GET_CODE (src) == CONST_VECTOR))
>     {
>       if (dest_gpr_p)
>         return "#";
> 
> This means we require the insn to eventually be split and not reach final
> assembly, does it not?

Yes, but is the length attribute used for something that matters before
that?  For correctness, not just for better code.  It isn't clear to me
that this will work.

OTOH I cannot currently show it does not work either.  So, okay for trunk,
and I hope it actually works :-)


Segher
Peter Bergner Dec. 17, 2018, 10:09 p.m. UTC | #11
On 12/17/18 3:55 PM, Segher Boessenkool wrote:
> On Mon, Dec 17, 2018 at 02:23:54PM -0600, Peter Bergner wrote:
>> This means we require the insn to eventually be split and not reach final
>> assembly, does it not?
> 
> Yes, but is the length attribute used for something that matters before
> that?  For correctness, not just for better code.  It isn't clear to me
> that this will work.
> 
> OTOH I cannot currently show it does not work either.  So, okay for trunk,
> and I hope it actually works :-)

Ok, committed.  I'll try and keep an eye out for any fallout, but I'm
fairly certain there won't be.  Thanks!

Peter
diff mbox series

Patch

Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 265971)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -183,6 +183,18 @@  (define_mode_attr ??r	[(V16QI	"??r")
 			 (TF	"??r")
 			 (TI	"r")])
 
+;; A mode attribute used for 128-bit constant values.
+(define_mode_attr nW	[(V16QI	"W")
+			 (V8HI	"W")
+			 (V4SI	"W")
+			 (V4SF	"W")
+			 (V2DI	"W")
+			 (V2DF	"W")
+			 (V1TI	"W")
+			 (KF	"W")
+			 (TF	"W")
+			 (TI	"n")])
+
 ;; Same size integer type for floating point data
 (define_mode_attr VSi [(V4SF  "v4si")
 		       (V2DF  "v2di")
@@ -1193,17 +1205,17 @@  (define_insn_and_split "*xxspltib_<mode>
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
-;;              VSX 0/-1   GPR 0/-1   VMX const GPR const  LVX (VMX)   STVX (VMX)
+;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      <VSa>,     <VSa>,     r,         we,        ?wQ,
                 ?&r,       ??r,       ??Y,       <??r>,     wo,        v,
-                ?<VSa>,    *r,        v,         ??r,       wZ,        v")
+                ?<VSa>,    v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "<VSa>,     ZwO,       <VSa>,     we,        r,         r,
                 wQ,        Y,         r,         r,         wE,        jwM,
-                ?jwM,      jwM,       W,         W,         v,         wZ"))]
+                ?jwM,      W,         <nW>,      v,         wZ"))]
 
   "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
    && (register_operand (operands[0], <MODE>mode) 
@@ -1214,12 +1226,12 @@  (define_insn "vsx_mov<mode>_64bit"
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mffgpr,    mftgpr,    load,
                 store,     load,      store,     *,         vecsimple, vecsimple,
-                vecsimple, *,         *,         *,         vecstore,  vecload")
+                vecsimple, *,         *,         vecstore,  vecload")
 
    (set_attr "length"
                "4,         4,         4,         8,         4,         8,
                 8,         8,         8,         8,         4,         4,
-                4,         8,         20,        20,        4,         4")])
+                4,         20,        20,        4,         4")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
 ;;              XXSPLTIB   VSPLTISW   VSX 0/-1   GPR 0/-1   VMX const  GPR const
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 265971)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -183,6 +183,18 @@  (define_mode_attr ??r	[(V16QI	"??r")
 			 (TF	"??r")
 			 (TI	"r")])
 
+;; A mode attribute used for 128-bit constant values.
+(define_mode_attr nW	[(V16QI	"W")
+			 (V8HI	"W")
+			 (V4SI	"W")
+			 (V4SF	"W")
+			 (V2DI	"W")
+			 (V2DF	"W")
+			 (V1TI	"W")
+			 (KF	"W")
+			 (TF	"W")
+			 (TI	"n")])
+
 ;; Same size integer type for floating point data
 (define_mode_attr VSi [(V4SF  "v4si")
 		       (V2DF  "v2di")
@@ -1193,17 +1205,17 @@  (define_insn_and_split "*xxspltib_<mode>
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
-;;              VSX 0/-1   GPR 0/-1   VMX const GPR const  LVX (VMX)   STVX (VMX)
+;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      <VSa>,     <VSa>,     r,         we,        ?wQ,
                 ?&r,       ??r,       ??Y,       <??r>,     wo,        v,
-                ?<VSa>,    *r,        v,         ??r,       wZ,        v")
+                ?<VSa>,    v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "<VSa>,     ZwO,       <VSa>,     we,        r,         r,
                 wQ,        Y,         r,         r,         wE,        jwM,
-                ?jwM,      jwM,       W,         W,         v,         wZ"))]
+                ?jwM,      W,         <nW>,      v,         wZ"))]
 
   "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
    && (register_operand (operands[0], <MODE>mode) 
@@ -1214,12 +1226,12 @@  (define_insn "vsx_mov<mode>_64bit"
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mffgpr,    mftgpr,    load,
                 store,     load,      store,     *,         vecsimple, vecsimple,
-                vecsimple, *,         *,         *,         vecstore,  vecload")
+                vecsimple, *,         *,         vecstore,  vecload")
 
    (set_attr "length"
                "4,         4,         4,         8,         4,         8,
                 8,         8,         8,         8,         4,         4,
-                4,         8,         20,        20,        4,         4")])
+                4,         20,        20,        4,         4")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
 ;;              XXSPLTIB   VSPLTISW   VSX 0/-1   GPR 0/-1   VMX const  GPR const
Index: gcc/testsuite/gcc.target/powerpc/pr87870.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr87870.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr87870.c	(working copy)
@@ -0,0 +1,28 @@ 
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-options "-O2" } */
+
+__int128
+test0 (void)
+{
+  return 0;
+}
+
+__int128
+test1 (void)
+{
+  return 1;
+}
+
+__int128
+test2 (void)
+{
+  return -1;
+}
+
+__int128
+test3 (void)
+{
+  return ((__int128)0xdeadbeefcafebabe << 64) | 0xfacefeedbaaaaaad;
+}
+
+/* { dg-final { scan-assembler-not {\mld\M} } } */