Mark XXSPLTIW/XXSPLTIDP as prefixed -- PR 104136

Message ID YesORmgtI+VXk751@toto.the-meissners.org
State New

Commit Message

Michael Meissner Jan. 21, 2022, 7:49 p.m. UTC
Mark XXSPLTIW/XXSPLTIDP as prefixed -- PR 104136

If you compile module_advect_em.F90 with -Ofast -mcpu=power10, one module
is large enough that we can't use a single conditional jump to span the
function.  Instead, GCC has to reverse the condition, and do a conditional
jump around an unconditional branch.  It turns out that when xxspltiw and
xxspltidp instructions were generated, they were not marked as being
prefixed (i.e. a length of 12 bytes instead of 4 bytes).  This meant the
branch length calculations were off, which in turn meant the assembler
raised an error because the conditional jump could not reach its target.
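
As a rough illustration of the failure mode described above, the following
C sketch compares the span of a code region when prefixed insns are counted
as 4 bytes versus 12 bytes.  The helper names (span_bytes,
cond_branch_reaches) and the insn counts in the usage note are hypothetical
and are not taken from GCC; the only architectural fact assumed is that a
PowerPC conditional branch carries a signed 16-bit byte displacement
(-32768 to +32764).

```c
#include <assert.h>
#include <stdbool.h>

/* Reach of a PowerPC conditional branch (bc): a 14-bit word displacement,
   i.e. a signed 16-bit byte offset that is a multiple of 4.  */
#define COND_BRANCH_MIN (-32768L)
#define COND_BRANCH_MAX (32764L)

/* Hypothetical helper: bytes spanned by n_plain ordinary 4-byte insns plus
   n_prefixed prefixed insns, where prefixed_len is the length the compiler
   assumes for each prefixed insn.  */
long span_bytes(long n_plain, long n_prefixed, long prefixed_len)
{
    assert(n_plain >= 0 && n_prefixed >= 0 && prefixed_len > 0);
    return n_plain * 4 + n_prefixed * prefixed_len;
}

/* Hypothetical helper: can a single conditional branch cover this byte
   displacement?  */
bool cond_branch_reaches(long displacement)
{
    return displacement >= COND_BRANCH_MIN && displacement <= COND_BRANCH_MAX;
}
```

With made-up counts of 7000 ordinary insns and 600 prefixed insns,
span_bytes (7000, 600, 4) is 30400, which a conditional branch can reach,
while span_bytes (7000, 600, 12) is 35200, which it cannot.  A compiler
working from the first number would emit a plain conditional branch that
the assembler then has to reject.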

The fix is to explicitly set the prefixed attribute when we are loading up
vector constants with the xxspltiw or xxspltidp instructions.

I have removed the code that sets the prefixed attribute for xxspltiw,
xxspltidp, and xxsplti32dx instructions, since it will no longer be invoked.

I have also explicitly set the prefixed attribute when loading SF and DF mode
constants with xxspltiw and xxspltidp.  Previously, it was not set on these
insns, but when the insn was split to get the XXSPLTIW/XXSPLTIDP forms, those
forms already had the prefixed attribute set.

I have tested this by doing bootstraps and make check on a power8 big endian
system using --with-cpu=power8, power9 little endian system using
--with-cpu=power9, and a power10 little endian system using
--with-cpu=power10.  There were no new errors with this patch.

I have also done a full SPEC 2017 rate build for power10 using the -Ofast
compilation option, and the entire suite now builds.

Can I install this patch to the trunk?

gcc/
2022-01-21  Michael Meissner  <meissner@the-meissners.org>

	PR target/104136
	* config/rs6000/rs6000-protos.h (prefixed_xxsplti_p): Delete.
	* config/rs6000/rs6000.cc (prefixed_xxsplti_p): Delete.
	* config/rs6000/rs6000.md (prefixed attribute): Delete section
	that sets the prefixed attribute for xxspltiw, xxspltidp, and
	xxsplti32dx instructions.
	(movsf_hardfloat): Explicitly set the prefixed attribute
	when xxspltiw and xxspltidp instructions are generated.
	(mov<mode>_hardfloat32): Likewise.
	(mov<mode>_hardfloat64): Likewise.
	* config/rs6000/vsx.md (vsx_mov<mode>_64bit): Explicitly set the
	prefixed attribute for xxspltiw and xxspltidp instructions.
	(vsx_mov<mode>_32bit): Likewise.
---
 gcc/config/rs6000/rs6000-protos.h |  1 -
 gcc/config/rs6000/rs6000.cc       | 38 -------------------------------
 gcc/config/rs6000/rs6000.md       | 24 ++++++++++++-------
 gcc/config/rs6000/vsx.md          | 12 +++++++++-
 4 files changed, 27 insertions(+), 48 deletions(-)

Comments

Segher Boessenkool Jan. 21, 2022, 9:35 p.m. UTC | #1
Hi!

On Fri, Jan 21, 2022 at 02:49:26PM -0500, Michael Meissner wrote:
> If you compile module_advect_em.F90 with -Ofast -mcpu=power10, one module
> is large enough that we can't use a single conditional jump to span the
> function.  Instead, GCC has to reverse the condition, and do a conditional
> jump around an unconditional branch.  It turns out when xxspltiw and
> xxspltidp instructions were generated, they were not marked as being
> prefixed (i.e. length of 12 bytes instead of 4 bytes).

(The prefixed insn itself is 8B, but there can be 4B more because
prefixed insns cannot cross 64B boundaries, necessitating an extra nop
insn or other 4B padding).
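
A minimal C sketch of that arithmetic, assuming insns are 4-byte aligned so
that an 8-byte prefixed insn can only straddle a 64-byte boundary when it
would start at offset 60 within the block.  The function names are
hypothetical and do not correspond to anything in GCC or binutils.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes of nop padding needed in front of an 8-byte
   prefixed insn that would otherwise start at addr, so that it does not
   cross a 64-byte boundary.  */
unsigned prefixed_padding(uint64_t addr)
{
    assert(addr % 4 == 0);          /* insns are word aligned */
    return (addr % 64 == 60) ? 4u : 0u;
}

/* Worst-case length to assume for a prefixed insn when its final address
   is not yet known: the 8-byte insn plus one 4-byte nop.  */
unsigned prefixed_worst_case_length(void)
{
    return 8u + 4u;
}
```

For example, prefixed_padding (56) is 0 because bytes 56..63 still fit in
the block, while prefixed_padding (60) is 4.  Since the final address is not
known when branch lengths are computed, the conservative length of 12 bytes
is the one to use.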

> This meant the
> calculations for the branch length were off, which in turn meant the
> assembler raised an error because it couldn't do the conditional jump.

That is the most common symptom, yup.  But there are other problems as
well (other correctness problems -- it obviously does not help
performance either).

> The fix is to explicitly set the prefixed attribute when we are loading up
> vector constants with the xxspltiw or xxspltidp instructions.

That attribute should be set on *all* xxsplti{w,dp} insns, and more in
general on all insns that are always prefixed.  The maybe_prefixed
attribute is only for insns for which both a prefixed and a non-prefixed
version exist, the prefixed version having a "p" prepended to the
mnemonic.

> I have removed the code that sets the prefixed attribute for xxspltiw,
> xxspltidp, and xxsplti32dx instructions, since it no longer will be invoked.

Great cleanup / simplification!

> I have also explicitly set the prefixed attribute for load SF and DF mode
> constants with xxspltiw and xxspltidp.  Previously, it was not set on these
> insns, but when the insn was split to get the XXSPLTIW/XXSPLTIDP forms, those
> forms already had the prefixed attribute set.

So now we have more correct information before the insn is split.  Good.

> -	 (eq_attr "type" "vecperm")
> -	 (if_then_else (match_test "prefixed_xxsplti_p (insn)")
>  		       (const_string "yes")
>  		       (const_string "no"))]

Excellent to see this go :-)

> +   (set_attr "prefixed"
> +	"*,          *,         *,          *,         *,         *,
> +	 *,          *,         *,          *,         *,         *,
> +	 *,          *,         *,          *,         yes")])

You could do some formula that computes it from isa==p10 btw.  But wrap
that in some helper, "is can have prefixed" or something.

Not really worth it unless you need this often; the four we have now
(which could be two perhaps, by merging each pair of patterns again)
aren't enough to warrant the extra indirection.

Okay for trunk.  Also fine for backports if you need them.

Thanks!


Segher

Patch

diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index e322ac0c199..3ea01023609 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -199,7 +199,6 @@  enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode);
 extern bool prefixed_load_p (rtx_insn *);
 extern bool prefixed_store_p (rtx_insn *);
 extern bool prefixed_paddi_p (rtx_insn *);
-extern bool prefixed_xxsplti_p (rtx_insn *);
 extern void rs6000_asm_output_opcode (FILE *);
 extern void output_pcrel_opt_reloc (rtx);
 extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int);
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index b34962da27d..7b8a3b5299a 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -26617,44 +26617,6 @@  prefixed_paddi_p (rtx_insn *insn)
   return (iform == INSN_FORM_PCREL_EXTERNAL || iform == INSN_FORM_PCREL_LOCAL);
 }
 
-/* Whether an instruction is a prefixed XXSPLTI* instruction.  This is called
-   from the prefixed attribute processing.  */
-
-bool
-prefixed_xxsplti_p (rtx_insn *insn)
-{
-  rtx set = single_set (insn);
-  if (!set)
-    return false;
-
-  rtx dest = SET_DEST (set);
-  rtx src = SET_SRC (set);
-  machine_mode mode = GET_MODE (dest);
-
-  if (!REG_P (dest) && !SUBREG_P (dest))
-    return false;
-
-  if (GET_CODE (src) == UNSPEC)
-    {
-      int unspec = XINT (src, 1);
-      return (unspec == UNSPEC_XXSPLTIW
-	      || unspec == UNSPEC_XXSPLTIDP
-	      || unspec == UNSPEC_XXSPLTI32DX);
-    }
-
-  vec_const_128bit_type vsx_const;
-  if (vec_const_128bit_to_bytes (src, mode, &vsx_const))
-    {
-      if (constant_generates_xxspltiw (&vsx_const))
-	return true;
-
-      if (constant_generates_xxspltidp (&vsx_const))
-	return true;
-    }
-
-  return false;
-}
-
 /* Whether the next instruction needs a 'p' prefix issued before the
    instruction is printed out.  */
 static bool prepend_p_to_next_insn;
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 59531b6d07e..4e221189028 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -314,11 +314,6 @@  (define_attr "prefixed" "no,yes"
 
 	 (eq_attr "type" "integer,add")
 	 (if_then_else (match_test "prefixed_paddi_p (insn)")
-		       (const_string "yes")
-		       (const_string "no"))
-
-	 (eq_attr "type" "vecperm")
-	 (if_then_else (match_test "prefixed_xxsplti_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))]
 
@@ -7857,7 +7852,11 @@  (define_insn "movsf_hardfloat"
    (set_attr "isa"
 	"*,          *,         p9v,        p8v,       *,         p9v,
 	 p8v,        *,         *,          *,         *,         *,
-	 *,          *,         *,          *,         p10")])
+	 *,          *,         *,          *,         p10")
+   (set_attr "prefixed"
+	"*,          *,         *,          *,         *,         *,
+	 *,          *,         *,          *,         *,         *,
+	 *,          *,         *,          *,         yes")])
 
 ;;	LWZ          LFIWZX     STW        STFIWX     MTVSRWZ    MFVSRWZ
 ;;	FMR          MR         MT%0       MF%1       NOP
@@ -8159,7 +8158,11 @@  (define_insn "*mov<mode>_hardfloat32"
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
-             *,           *,          *,          p10")])
+             *,           *,          *,          p10")
+   (set_attr "prefixed"
+            "*,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          yes")])
 
 ;;           STW      LWZ     MR      G-const H-const F-const
 
@@ -8232,7 +8235,12 @@  (define_insn "*mov<mode>_hardfloat64"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
              *,           *,          *,          *,          *,
-             *,           p8v,        p8v,        p10")])
+             *,           p8v,        p8v,        p10")
+   (set_attr "prefixed"
+            "*,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *")])
 
 ;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
 ;;           H-const  F-const  Special
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index e84ffb6a6db..c8c891e13f4 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1237,7 +1237,12 @@  (define_insn "vsx_mov<mode>_64bit"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
                 *,         *,         *,         *,         p9v,       *,
                 p10,       p10,
-                <VSisa>,   *,         *,         *,         *")])
+                <VSisa>,   *,         *,         *,         *")
+   (set_attr "prefixed"
+               "*,         *,         *,         *,         *,         *,
+                *,         *,         *,         *,         *,         *,
+                *,         yes,
+                *,         *,         *,         *,         *")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
 ;;              LXVKQ      XXSPLTI*
@@ -1276,6 +1281,11 @@  (define_insn "*vsx_mov<mode>_32bit"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
                 p10,       p10,
                 p9v,       *,         <VSisa>,   *,         *,
+                *,         *")
+   (set_attr "prefixed"
+               "*,         *,         *,         *,         *,         *,
+                *,         yes,
+                *,         *,         *,         *,         *,
                 *,         *")])
 
 ;; Explicit  load/store expanders for the builtin functions