diff mbox series

V7, #1 of 7, Use PLI to load up 34-bit DImode constants

Message ID 20191114224010.GA7528@ibm-toto.the-meissners.org
State New
Headers show
Series V7, #1 of 7, Use PLI to load up 34-bit DImode constants | expand

Commit Message

Michael Meissner Nov. 14, 2019, 10:40 p.m. UTC
This patch adds an alternative to movdi to allow using PLI (PADDI) to load up
34-bit constants on the 'future' machine.

I have built compilers with this patch, and there were no regressions in the
test suite.  Can I check this into the trunk?

2019-11-14  Michael Meissner  <meissner@linux.ibm.com>

	* config/rs6000/rs6000.c (num_insns_constant_gpr): Add support for
	PADDI to load up and/or add 34-bit integer constants.
	(rs6000_rtx_costs): Treat constants loaded up with PADDI with the
	same cost as normal 16-bit constants.
	* config/rs6000/rs6000.md (movdi_internal64): Add support to load
	up 34-bit integer constants with PADDI.
	(movdi integer constant splitter): Add comment about PADDI.

Comments

Segher Boessenkool Nov. 23, 2019, 12:06 a.m. UTC | #1
On Thu, Nov 14, 2019 at 05:40:10PM -0500, Michael Meissner wrote:
> --- gcc/config/rs6000/rs6000.c	(revision 278173)
> +++ gcc/config/rs6000/rs6000.c	(working copy)
> @@ -5552,7 +5552,7 @@ static int
>  num_insns_constant_gpr (HOST_WIDE_INT value)
>  {
>    /* signed constant loadable with addi */
> -  if (((unsigned HOST_WIDE_INT) value + 0x8000) < 0x10000)
> +  if (SIGNED_16BIT_OFFSET_P (value))
>      return 1;

Hrm, so the SIGNED_nBIT_OFFSET_P should not be called "offset", if we use
them for other numbers as well.

> -;;              GPR store  GPR load   GPR move   GPR li     GPR lis     GPR #
> -;;              FPR store  FPR load   FPR move   AVX store  AVX store   AVX load
> -;;              AVX load   VSX move   P9 0       P9 -1      AVX 0/-1    VSX 0
> -;;              VSX -1     P9 const   AVX const  From SPR   To SPR      SPR<->SPR
> -;;              VSX->GPR   GPR->VSX
> +;;              GPR store  GPR load   GPR move   GPR li     GPR lis     GPR pli
> +;;              GPR #      FPR store  FPR load   FPR move   AVX store   AVX store
> +;;              AVX load   AVX load   VSX move   P9 0       P9 -1       AVX 0/-1
> +;;              VSX 0      VSX -1     P9 const   AVX const  From SPR    To SPR
> +;;              SPR<->SPR  VSX->GPR   GPR->VSX

I cannot make heads or tails of it this way.  Please just add the "pli",
don't rearrange everything else.

There do not have to be exactly six per line.  The only reason to have
some order here is to make it easier to read, not to make it *harder*!

So for this first line let's have three GPR moves, and then have four
load immediates.  Then in the future if we need to edit it again, make
the edited part make some sense, etc.

>  ; Some DImode loads are best done as a load of -1 followed by a mask
> -; instruction.
> +; instruction.  On systems that support the PADDI (PLI) instruction,
> +; num_insns_constant returns 1, so these splitter would not be used for things
> +; that be loaded with PLI.

That comment doesn't add much at all?  This splitter isn't used for
constants we can load in one insn, that's right.  That happily works
just fine if we have prefixed insns as well.


Okay for trunk with those things fixed.  Thanks!


Segher
Michael Meissner Nov. 25, 2019, 10:09 p.m. UTC | #2
On Fri, Nov 22, 2019 at 06:06:19PM -0600, Segher Boessenkool wrote:
> On Thu, Nov 14, 2019 at 05:40:10PM -0500, Michael Meissner wrote:
> > --- gcc/config/rs6000/rs6000.c	(revision 278173)
> > +++ gcc/config/rs6000/rs6000.c	(working copy)
> > @@ -5552,7 +5552,7 @@ static int
> >  num_insns_constant_gpr (HOST_WIDE_INT value)
> >  {
> >    /* signed constant loadable with addi */
> > -  if (((unsigned HOST_WIDE_INT) value + 0x8000) < 0x10000)
> > +  if (SIGNED_16BIT_OFFSET_P (value))
> >      return 1;
> 
> Hrm, so the SIGNED_nBIT_OFFSET_P should not be called "offset", if we use
> them for other numbers as well.
> 
> > -;;              GPR store  GPR load   GPR move   GPR li     GPR lis     GPR #
> > -;;              FPR store  FPR load   FPR move   AVX store  AVX store   AVX load
> > -;;              AVX load   VSX move   P9 0       P9 -1      AVX 0/-1    VSX 0
> > -;;              VSX -1     P9 const   AVX const  From SPR   To SPR      SPR<->SPR
> > -;;              VSX->GPR   GPR->VSX
> > +;;              GPR store  GPR load   GPR move   GPR li     GPR lis     GPR pli
> > +;;              GPR #      FPR store  FPR load   FPR move   AVX store   AVX store
> > +;;              AVX load   AVX load   VSX move   P9 0       P9 -1       AVX 0/-1
> > +;;              VSX 0      VSX -1     P9 const   AVX const  From SPR    To SPR
> > +;;              SPR<->SPR  VSX->GPR   GPR->VSX
> 
> I cannot make heads or tails of it this way.  Please just add the "pli",
> don't rearrange everything else.

You have to put PLI after LI and LIS and before the splitter insn.

> There do not have to be exactly six per line.  The only reason to have
> some order here is to make it easier to read, not to make it *harder*!

But in doing the transformation, I will modify all of the constraint and
attribute lines.  Otherwise it makes it impossible to add new things, and it
makes it impossible to find the appropriate insn.

> So for this first line let's have three GPR moves, and then have four
> load immediates.  Then in the future if we need to edit it again, make
> the edited part make some sense, etc.


So right now we have:

;;              GPR store  GPR load   GPR move   GPR li     GPR lis     GPR #
;;              FPR store  FPR load   FPR move   AVX store  AVX store   AVX load
;;              AVX load   VSX move   P9 0       P9 -1      AVX 0/-1    VSX 0
;;              VSX -1     P9 const   AVX const  From SPR   To SPR      SPR<->SPR
;;              VSX->GPR   GPR->VSX
(define_insn "*movdi_internal64"
  [(set (match_operand:DI 0 "nonimmediate_operand"
               "=YZ,       r,         r,         r,         r,          r,
                m,         ^d,        ^d,        wY,        Z,          $v,
                $v,        ^wa,       wa,        wa,        v,          wa,
                wa,        v,         v,         r,         *h,         *h,
                ?r,        ?wa")
	(match_operand:DI 1 "input_operand"
               "r,         YZ,        r,         I,         L,          nF,
                ^d,        m,         ^d,        ^v,        $v,         wY,
                Z,         ^wa,       Oj,        wM,        OjwM,       Oj,
                wM,        wS,        wB,        *h,        r,          0,
                wa,        r"))]
  "TARGET_POWERPC64
   && (gpc_reg_operand (operands[0], DImode)
       || gpc_reg_operand (operands[1], DImode))"
  "@
   std%U0%X0 %1,%0
   ld%U1%X1 %0,%1
   mr %0,%1
   li %0,%1
   lis %0,%v1
   #
   stfd%U0%X0 %1,%0
   lfd%U1%X1 %0,%1
   fmr %0,%1
   stxsd %1,%0
   stxsdx %x1,%y0
   lxsd %0,%1
   lxsdx %x0,%y1
   xxlor %x0,%x1,%x1
   xxspltib %x0,0
   xxspltib %x0,255
   #
   xxlxor %x0,%x0,%x0
   xxlorc %x0,%x0,%x0
   #
   #
   mf%1 %0
   mt%0 %1
   nop
   mfvsrd %0,%x1
   mtvsrd %x0,%1"
  [(set_attr "type"
               "store,      load,	*,         *,         *,         *,
                fpstore,    fpload,     fpsimple,  fpstore,   fpstore,   fpload,
                fpload,     veclogical, vecsimple, vecsimple, vecsimple, veclogical,
                veclogical, vecsimple,  vecsimple, mfjmpr,    mtjmpr,    *,
                mftgpr,    mffgpr")
   (set_attr "size" "64")
   (set_attr "length"
               "*,         *,         *,         *,         *,          20,
                *,         *,         *,         *,         *,          *,
                *,         *,         *,         *,         *,          *,
                *,         8,         *,         *,         *,          *,
                *,         *")
   (set_attr "isa"
               "*,         *,         *,         *,         *,          *,
                *,         *,         *,         p9v,       p7v,        p9v,
                p7v,       *,         p9v,       p9v,       p7v,        *,
                *,         p7v,       p7v,       *,         *,          *,
                p8v,       p8v")])

If I just change the columns, in order to be able to correlate which constraint
goes with which attribute, it would need to be modified to something like:

;;              GPR store  GPR load   GPR move
;;              GPR li     GPR lis    GPR pli    GPR #
;;              FPR store  FPR load   FPR move
;;              AVX store  AVX store  AVX load   AVX load   VSX move
;;              P9 0       P9 -1      AVX 0/-1   VSX 0      VSX -1
;;              P9 const   AVX const
;;              From SPR   To SPR     SPR<->SPR
;;              VSX->GPR   GPR->VSX

(define_insn "*movdi_internal64"
  [(set (match_operand:DI 0 "nonimmediate_operand"
               "=YZ,       r,         r,
                r,         r,         r,         r,
                m,         ^d,        ^d,
                wY,        Z,         $v,        $v,        ^wa,
                wa,        wa,        v,         wa,        wa,
                v,         v,
                r,         *h,        *h,
                ?r,        ?wa")

	(match_operand:DI 1 "input_operand"
               "r,         YZ,        r,
                I,         L,         eI,        nF,
                ^d,        m,         ^d,        ^v,        $v,
                wY,        Z,         ^wa,       Oj,        wM,
                OjwM,      Oj,        wM,        wS,        wB,
                *h,        r,         0,
                wa,        r"))]

  "TARGET_POWERPC64
   && (gpc_reg_operand (operands[0], DImode)
       || gpc_reg_operand (operands[1], DImode))"
  "@
   std%U0%X0 %1,%0
   ld%U1%X1 %0,%1
   mr %0,%1
   li %0,%1
   lis %0,%v1
   li %0,%1
   #
   stfd%U0%X0 %1,%0
   lfd%U1%X1 %0,%1
   fmr %0,%1
   stxsd %1,%0
   stxsdx %x1,%y0
   lxsd %0,%1
   lxsdx %x0,%y1
   xxlor %x0,%x1,%x1
   xxspltib %x0,0
   xxspltib %x0,255
   #
   xxlxor %x0,%x0,%x0
   xxlorc %x0,%x0,%x0
   #
   #
   mf%1 %0
   mt%0 %1
   nop
   mfvsrd %0,%x1
   mtvsrd %x0,%1"
  [(set_attr "type"
               "store,      load,	*,
                *,          *,          *,         *,
                fpstore,    fpload,    fpsimple,
                fpstore,    fpstore,   fpload,     fpload,     veclogical,
                vecsimple,  vecsimple, vecsimple,  veclogical, veclogical,
                vecsimple,  vecsimple,
                mfjmpr,     mtjmpr,    *,
                mftgpr,     mffgpr")
   (set_attr "size" "64")
   (set_attr "num_insns"
               "*,         *,         *,
                *,         *,         *,           5,
                *,         *,         *,
                *,         *,         *,           *,          *,
                *,         *,         *,           *,          *,
                2,         *,
                *,         *,         *,
                *,         *")
   (set_attr "isa"
               "*,         *,         *,
                *,         *,         fut,         *,
                *,         *,         *,
                p9v,       p7v,       p9v,         p7v,        *,
                p9v,       p9v,       p7v,         *,          *,
                p7v,       p7v,
                *,         *,         *,
                p8v,       p8v")])

Is this what you want?

Now, this is without moving any of the alternatives around.  Logically, you
could move gpr/fpr/vsx moves to be together (and maybe the direct moves also).
But I dunno whether this is getting into bike shedding.
 
> >  ; Some DImode loads are best done as a load of -1 followed by a mask
> > -; instruction.
> > +; instruction.  On systems that support the PADDI (PLI) instruction,
> > +; num_insns_constant returns 1, so these splitter would not be used for things
> > +; that be loaded with PLI.
> 
> That comment doesn't add much at all?  This splitter isn't used for
> constants we can load in one insn, that's right.  That happily works
> just fine if we have prefixed insns as well.

Well you had asked for the comment many months ago.
Segher Boessenkool Nov. 25, 2019, 11 p.m. UTC | #3
Hi Mike,

On Mon, Nov 25, 2019 at 05:09:10PM -0500, Michael Meissner wrote:
> On Fri, Nov 22, 2019 at 06:06:19PM -0600, Segher Boessenkool wrote:
> > > --- gcc/config/rs6000/rs6000.c	(revision 278173)
> > > +++ gcc/config/rs6000/rs6000.c	(working copy)
> > > @@ -5552,7 +5552,7 @@ static int
> > >  num_insns_constant_gpr (HOST_WIDE_INT value)
> > >  {
> > >    /* signed constant loadable with addi */
> > > -  if (((unsigned HOST_WIDE_INT) value + 0x8000) < 0x10000)
> > > +  if (SIGNED_16BIT_OFFSET_P (value))
> > >      return 1;
> > 
> > Hrm, so the SIGNED_nBIT_OFFSET_P should not be called "offset", if we use
> > them for other numbers as well.

Please do something about this?  Not in this patch, of course.

> Is this what you want?

Yeah, like that.

Please send a patch doing *only* this typographical change -- no code
changes at all -- before patches that change the code.  Or, alternatively,
just split up lines that you are touching anyway, and eventually we will
get something nice.

Either option is much easier to review than patches moving 60 constraints
around :-)

This is general advice: if you want to rearrange some code, or fix (some
non-trivial) white space, etc., do that in a separate patch *before* the
rest.  Same as with bugfixes.

That way, it can be immediately applied, whether the rest of the patches
needs further revisions or not.  And it makes reviewing the rest a lot
easier too.

> Now, this is without moving any of the alternatives around.  Logically, you
> could move gpr/fpr/vsx moves to be together (and maybe the direct moves also).
> But I dunno whether this is getting into bike shedding.

Yeah you cannot always move things the way you like them, the ordering
also matters for RA itself (sometimes).  It's all not a big deal except
we have so very very many alternatives in the mov patterns.


Segher
diff mbox series

Patch

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 278173)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -5552,7 +5552,7 @@  static int
 num_insns_constant_gpr (HOST_WIDE_INT value)
 {
   /* signed constant loadable with addi */
-  if (((unsigned HOST_WIDE_INT) value + 0x8000) < 0x10000)
+  if (SIGNED_16BIT_OFFSET_P (value))
     return 1;
 
   /* constant loadable with addis */
@@ -5560,6 +5560,10 @@  num_insns_constant_gpr (HOST_WIDE_INT va
 	   && (value >> 31 == -1 || value >> 31 == 0))
     return 1;
 
+  /* PADDI can support up to 34 bit signed integers.  */
+  else if (TARGET_PREFIXED_ADDR && SIGNED_34BIT_OFFSET_P (value))
+    return 1;
+
   else if (TARGET_POWERPC64)
     {
       HOST_WIDE_INT low  = ((value & 0xffffffff) ^ 0x80000000) - 0x80000000;
@@ -20679,7 +20683,8 @@  rs6000_rtx_costs (rtx x, machine_mode mo
 	    || outer_code == PLUS
 	    || outer_code == MINUS)
 	   && (satisfies_constraint_I (x)
-	       || satisfies_constraint_L (x)))
+	       || satisfies_constraint_L (x)
+	       || satisfies_constraint_eI (x)))
 	  || (outer_code == AND
 	      && (satisfies_constraint_K (x)
 		  || (mode == SImode
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 278173)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -8808,24 +8808,24 @@  (define_split
   DONE;
 })
 
-;;              GPR store  GPR load   GPR move   GPR li     GPR lis     GPR #
-;;              FPR store  FPR load   FPR move   AVX store  AVX store   AVX load
-;;              AVX load   VSX move   P9 0       P9 -1      AVX 0/-1    VSX 0
-;;              VSX -1     P9 const   AVX const  From SPR   To SPR      SPR<->SPR
-;;              VSX->GPR   GPR->VSX
+;;              GPR store  GPR load   GPR move   GPR li     GPR lis     GPR pli
+;;              GPR #      FPR store  FPR load   FPR move   AVX store   AVX store
+;;              AVX load   AVX load   VSX move   P9 0       P9 -1       AVX 0/-1
+;;              VSX 0      VSX -1     P9 const   AVX const  From SPR    To SPR
+;;              SPR<->SPR  VSX->GPR   GPR->VSX
 (define_insn "*movdi_internal64"
   [(set (match_operand:DI 0 "nonimmediate_operand"
                "=YZ,       r,         r,         r,         r,          r,
-                m,         ^d,        ^d,        wY,        Z,          $v,
-                $v,        ^wa,       wa,        wa,        v,          wa,
-                wa,        v,         v,         r,         *h,         *h,
-                ?r,        ?wa")
+                r,         m,         ^d,        ^d,        wY,         Z,
+                $v,        $v,        ^wa,       wa,        wa,         v,
+                wa,        wa,        v,         v,         r,          *h,
+                *h,        ?r,        ?wa")
 	(match_operand:DI 1 "input_operand"
-               "r,         YZ,        r,         I,         L,          nF,
-                ^d,        m,         ^d,        ^v,        $v,         wY,
-                Z,         ^wa,       Oj,        wM,        OjwM,       Oj,
-                wM,        wS,        wB,        *h,        r,          0,
-                wa,        r"))]
+               "r,         YZ,        r,         I,         L,          eI,
+                nF,        ^d,        m,         ^d,        ^v,         $v,
+                wY,        Z,         ^wa,       Oj,        wM,         OjwM,
+                Oj,        wM,        wS,        wB,        *h,         r,
+                0,         wa,        r"))]
   "TARGET_POWERPC64
    && (gpc_reg_operand (operands[0], DImode)
        || gpc_reg_operand (operands[1], DImode))"
@@ -8835,6 +8835,7 @@  (define_insn "*movdi_internal64"
    mr %0,%1
    li %0,%1
    lis %0,%v1
+   li %0,%1
    #
    stfd%U0%X0 %1,%0
    lfd%U1%X1 %0,%1
@@ -8858,26 +8859,28 @@  (define_insn "*movdi_internal64"
    mtvsrd %x0,%1"
   [(set_attr "type"
                "store,      load,	*,         *,         *,         *,
-                fpstore,    fpload,     fpsimple,  fpstore,   fpstore,   fpload,
-                fpload,     veclogical, vecsimple, vecsimple, vecsimple, veclogical,
-                veclogical, vecsimple,  vecsimple, mfjmpr,    mtjmpr,    *,
-                mftgpr,    mffgpr")
+                *,          fpstore,    fpload,    fpsimple,  fpstore,   fpstore,
+                fpload,     fpload,     veclogical,vecsimple, vecsimple, vecsimple,
+                veclogical, veclogical, vecsimple,  vecsimple, mfjmpr,   mtjmpr,
+                *,          mftgpr,    mffgpr")
    (set_attr "size" "64")
    (set_attr "length"
-               "*,         *,         *,         *,         *,          20,
-                *,         *,         *,         *,         *,          *,
+               "*,         *,         *,         *,         *,          *,
+                20,        *,         *,         *,         *,          *,
                 *,         *,         *,         *,         *,          *,
-                *,         8,         *,         *,         *,          *,
-                *,         *")
+                *,         *,         8,         *,         *,          *,
+                *,         *,         *")
    (set_attr "isa"
-               "*,         *,         *,         *,         *,          *,
-                *,         *,         *,         p9v,       p7v,        p9v,
-                p7v,       *,         p9v,       p9v,       p7v,        *,
-                *,         p7v,       p7v,       *,         *,          *,
-                p8v,       p8v")])
+               "*,         *,         *,         *,         *,          fut,
+                *,         *,         *,         *,         p9v,        p7v,
+                p9v,       p7v,       *,         p9v,       p9v,        p7v,
+                *,         *,         p7v,       p7v,       *,          *,
+                *,         p8v,       p8v")])
 
 ; Some DImode loads are best done as a load of -1 followed by a mask
-; instruction.
+; instruction.  On systems that support the PADDI (PLI) instruction,
+; num_insns_constant returns 1, so these splitter would not be used for things
+; that be loaded with PLI.
 (define_split
   [(set (match_operand:DI 0 "int_reg_operand_not_pseudo")
 	(match_operand:DI 1 "const_int_operand"))]