Message ID | YSaeI5BVUxofem9y@toto.the-meissners.org |
---|---|
State | New |
Headers | show |
Series | Generate XXSPLTIDP on power10. | expand |
On Wed, 2021-08-25 at 15:46 -0400, Michael Meissner wrote: > Generate XXSPLTIDP on power10. > > This patch implements XXSPLTIDP support for SF and DF scalar constants and V2DF > vector constants. The XXSPLTIDP instruction is given a 32-bit immediate that > is converted to a vector of two DFmode constants. The immediate is in SFmode > format, so only constants that fit as SFmode values can be loaded with > XXSPLTIDP. ok > > I added a new constraint (eF) to match constants that can be loaded with the > XXSPLTIDP instruction. > > I have added a temporary switch (-mxxspltidp) to control whether or not the > XXSPLTIDP instruction is generated. How temporary? > > I added 3 new tests to test loading up SF/DF scalar and V2DF vector > constants. > > I have tested this with bootstrap compilers on power10 systems and there was no > regression. I have built GCC with these patches on little endian power9 and > big endian power8 systems, and there were no regressions. > > In addition, I have built and run the full Spec 2017 rate suite, comparing with > the patches enabled and not enabled. There were roughly 66,000 XXSPLTIDP's > generated in the rate build for Spec 2017. On a stand-alone system that is > running single threaded, blender_r has a 1.9% increase in performance, and rest > of the benchmarks are performance neutral. However, I would expect that in a > real world scenario, switching to use XXSPLTIDP will increase performance due > to removing all of the loads. ok > > Can I check this into the master branch? > > 2021-08-25 Michael Meissner <meissner@linux.ibm.com> > > gcc/ > * config/rs6000/constraints.md (eF): New constraint. > * config/rs6000/predicates.md (easy_fp_constant): If we can load > the scalar constant with XXSPLTIDP, the floating point constant is > easy. Could be shortened to something like ? Add clause to accept xxspltidp_operand as easy. > (xxspltidp_operand): New predicate. Will there ever be another instruction using the SF/DF CONST_DOUBLE or V2DF CONST_VECTOR ? I tentatively question the name of the operand, but defer.. > (easy_vector_constant): If we can generate XXSPLTIDP, mark the > vector constant as easy. Duplicated from above. > * config/rs6000/rs6000-protos.h (xxspltidp_constant_p): New > declaration. > (prefixed_permute_p): Likewise. > * config/rs6000/rs6000.c (xxspltidp_constant_p): New function. > (output_vec_const_move): Add support for XXSPLTIDP. > (prefixed_permute_p): New function. Duplicated. > * config/rs6000/rs6000.md (prefixed attribute): Add support for > permute prefixed instructions. > (movsf_hardfloat): Add XXSPLTIDP support. > (mov<mode>_hardfloat32, FMOVE64 iterator): Likewise. > (mov<mode>_hardfloat64, FMOVE64 iterator): Likewise. > * config/rs6000/rs6000.opt (-mxxspltidp): New switch. > * config/rs6000/vsx.md (vsx_move<mode>_64bit): Add XXSPLTIDP > support. > (vsx_move<mode>_32bit): Likewise. No e in mov (per patch contents below). > (vsx_splat_v2df_xxspltidp): New insn. > (XXSPLTIDP): New mode iterator. > (xxspltidp_<mode>_internal): New insn and splits. > (xxspltidp_<mode>_inst): Replace xxspltidp_v2df_inst with an > iterated form that also does SFmode, and DFmode. Swap "an iterated form" with "xxspltidp_<mode>_inst ? > > gcc/testsuite/ > * gcc.target/powerpc/vec-splat-constant-sf.c: New test. > * gcc.target/powerpc/vec-splat-constant-df.c: New test. > * gcc.target/powerpc/vec-splat-constant-v2df.c: New test. > --- > gcc/config/rs6000/constraints.md | 5 + > gcc/config/rs6000/predicates.md | 17 +++ > gcc/config/rs6000/rs6000-protos.h | 2 + > gcc/config/rs6000/rs6000.c | 106 ++++++++++++++++++ > gcc/config/rs6000/rs6000.md | 45 +++++--- > gcc/config/rs6000/rs6000.opt | 4 + > gcc/config/rs6000/vsx.md | 64 ++++++++++- > .../powerpc/vec-splat-constant-df.c | 60 ++++++++++ > .../powerpc/vec-splat-constant-sf.c | 60 ++++++++++ > .../powerpc/vec-splat-constant-v2df.c | 64 +++++++++++ > 10 files changed, 405 insertions(+), 22 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c > > diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md > index c8cff1a3038..ea2e4a267c3 100644 > --- a/gcc/config/rs6000/constraints.md > +++ b/gcc/config/rs6000/constraints.md > @@ -208,6 +208,11 @@ (define_constraint "P" > (and (match_code "const_int") > (match_test "((- (unsigned HOST_WIDE_INT) ival) + 0x8000) < 0x10000"))) > > +;; SF/DF/V2DF scalar or vector constant that can be loaded with XXSPLTIDP > +(define_constraint "eF" > + "A vector constant that can be loaded with the XXSPLTIDP instruction." > + (match_operand 0 "xxspltidp_operand")) > + > ;; 34-bit signed integer constant > (define_constraint "eI" > "A signed 34-bit integer constant if prefixed instructions are supported." > diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md > index 956e42bc514..134243e404b 100644 > --- a/gcc/config/rs6000/predicates.md > +++ b/gcc/config/rs6000/predicates.md > @@ -601,6 +601,11 @@ (define_predicate "easy_fp_constant" > if (TARGET_VSX && op == CONST0_RTX (mode)) > return 1; > > + /* If we have the ISA 3.1 XXSPLTIDP instruction, see if the constant can > + be loaded with that instruction. */ > + if (xxspltidp_operand (op, mode)) > + return 1; > + > /* Otherwise consider floating point constants hard, so that the > constant gets pushed to memory during the early RTL phases. This > has the advantage that double precision constants that can be > @@ -640,6 +645,15 @@ (define_predicate "xxspltib_constant_nosplit" > return num_insns == 1; > }) > > +;; Return 1 if operand is a SF/DF CONST_DOUBLE or V2DF CONST_VECTOR that can be > +;; loaded via the ISA 3.1 XXSPLTIDP instruction. "Return 1 if" doesnt seem right given the return statement here. > +(define_predicate "xxspltidp_operand" > + (match_code "const_double,const_vector,vec_duplicate") > +{ > + HOST_WIDE_INT value = 0; > + return xxspltidp_constant_p (op, mode, &value); > +}) > + > ;; Return 1 if the operand is a CONST_VECTOR and can be loaded into a > ;; vector register without using memory. > (define_predicate "easy_vector_constant" > @@ -653,6 +667,9 @@ (define_predicate "easy_vector_constant" > if (zero_constant (op, mode) || all_ones_constant (op, mode)) > return true; > > + if (xxspltidp_operand (op, mode)) > + return true; > + > if (TARGET_P9_VECTOR > && xxspltib_constant_p (op, mode, &num_insns, &value)) > return true; > diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h > index 14f6b313105..9bba57c22f2 100644 > --- a/gcc/config/rs6000/rs6000-protos.h > +++ b/gcc/config/rs6000/rs6000-protos.h > @@ -32,6 +32,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, int, int, int, > > extern int easy_altivec_constant (rtx, machine_mode); > extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *); > +extern bool xxspltidp_constant_p (rtx, machine_mode, HOST_WIDE_INT *); > extern int vspltis_shifted (rtx); > extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int); > extern bool macho_lo_sum_memory_operand (rtx, machine_mode); > @@ -198,6 +199,7 @@ enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode); > extern bool prefixed_load_p (rtx_insn *); > extern bool prefixed_store_p (rtx_insn *); > extern bool prefixed_paddi_p (rtx_insn *); > +extern bool prefixed_permute_p (rtx_insn *); > extern void rs6000_asm_output_opcode (FILE *); > extern void output_pcrel_opt_reloc (rtx); > extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int); > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > index e073b26b430..322b3c83925 100644 > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -6533,6 +6533,74 @@ xxspltib_constant_p (rtx op, > return true; > } > > +/* Return true if OP is of the given MODE and can be synthesized with ISA 3.1 > + XXSPLTIDP instruction. > + > + Return the constant that is being split via CONSTANT_PTR to use in the > + XXSPLTIDP instruction. */ Appears to return true or false. Is the "Return the constant" comment meant to go on the predicate definition earlier? > + > +bool > +xxspltidp_constant_p (rtx op, > + machine_mode mode, > + HOST_WIDE_INT *constant_ptr) > +{ > + *constant_ptr = 0; > + > + if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX) > + return false; > + > + if (mode == VOIDmode) > + mode = GET_MODE (op); > + > + rtx element = op; > + if (mode == V2DFmode) > + { > + if (CONST_VECTOR_P (op)) > + { > + element = CONST_VECTOR_ELT (op, 0); > + if (!rtx_equal_p (element, CONST_VECTOR_ELT (op, 1))) > + return false; > + } > + > + else if (GET_CODE (op) == VEC_DUPLICATE) > + element = XEXP (op, 0); > + > + else > + return false; > + > + mode = DFmode; > + } > + > + if (mode != SFmode && mode != DFmode) > + return false; > + > + if (GET_MODE (element) != mode) > + return false; > + > + if (!CONST_DOUBLE_P (element)) > + return false; > + > + /* Don't return true for 0.0 since that is easy to create without > + XXSPLTIDP. */ > + if (element == CONST0_RTX (mode)) > + return false; > + > + /* If the value doesn't fit in a SFmode, exactly, we can't use XXSPLTIDP. */ > + const struct real_value *rv = CONST_DOUBLE_REAL_VALUE (element); > + if (!exact_real_truncate (SFmode, rv)) > + return false; The 'exactly' caught my eye. Per a glance at comments in extract_real_truncate this indicates that the value is identical after conversion to the new format. Ok. > + > + long value; > + REAL_VALUE_TO_TARGET_SINGLE (*rv, value); > + > + /* Test for SFmode denormal (exponent is 0, mantissa field is non-zero). */ > + if (((value & 0x7F800000) == 0) && ((value & 0x7FFFFF) != 0)) > + return false; > + > + *constant_ptr = value; > + return true; > +} ok > + > const char * > output_vec_const_move (rtx *operands) > { > @@ -6548,6 +6616,7 @@ output_vec_const_move (rtx *operands) > { > bool dest_vmx_p = ALTIVEC_REGNO_P (REGNO (dest)); > int xxspltib_value = 256; > + HOST_WIDE_INT xxspltidp_value = 0; > int num_insns = -1; > > if (zero_constant (vec, mode)) > @@ -6577,6 +6646,12 @@ output_vec_const_move (rtx *operands) > gcc_unreachable (); > } > > + if (xxspltidp_constant_p (vec, mode, &xxspltidp_value)) > + { > + operands[2] = GEN_INT (xxspltidp_value); > + return "xxspltidp %x0,%2"; > + } > + > if (TARGET_P9_VECTOR > && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value)) > { ok > @@ -26219,6 +26294,37 @@ prefixed_paddi_p (rtx_insn *insn) > return (iform == INSN_FORM_PCREL_EXTERNAL || iform == INSN_FORM_PCREL_LOCAL); > } > > +/* Whether a permute type instruction is a prefixed instruction. This is > + called from the prefixed attribute processing. */ > + > +bool > +prefixed_permute_p (rtx_insn *insn) > +{ > + rtx set = single_set (insn); > + if (!set) > + return false; > + > + rtx dest = SET_DEST (set); > + rtx src = SET_SRC (set); > + machine_mode mode = GET_MODE (dest); > + > + if (!REG_P (dest) && !SUBREG_P (dest)) > + return false; > + > + switch (mode) > + { > + case DFmode: > + case SFmode: > + case V2DFmode: > + return xxspltidp_operand (src, mode); > + > + default: > + break; > + } > + > + return false; > +} > + ok > /* Whether the next instruction needs a 'p' prefix issued before the > instruction is printed out. */ > static bool prepend_p_to_next_insn; > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md > index a84438f8545..bf3bfed3b88 100644 > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -314,6 +314,11 @@ (define_attr "prefixed" "no,yes" > > (eq_attr "type" "integer,add") > (if_then_else (match_test "prefixed_paddi_p (insn)") > + (const_string "yes") > + (const_string "no")) > + > + (eq_attr "type" "vecperm") > + (if_then_else (match_test "prefixed_permute_p (insn)") > (const_string "yes") > (const_string "no"))] > > @@ -7723,17 +7728,17 @@ (define_split > ;; > ;; LWZ LFS LXSSP LXSSPX STFS STXSSP > ;; STXSSPX STW XXLXOR LI FMR XSCPSGNDP > -;; MR MT<x> MF<x> NOP > +;; MR MT<x> MF<x> NOP XXSPLTIDP > > (define_insn "movsf_hardfloat" > [(set (match_operand:SF 0 "nonimmediate_operand" > "=!r, f, v, wa, m, wY, > Z, m, wa, !r, f, wa, > - !r, *c*l, !r, *h") > + !r, *c*l, !r, *h, wa") > (match_operand:SF 1 "input_operand" > "m, m, wY, Z, f, v, > wa, r, j, j, f, wa, > - r, r, *h, 0"))] > + r, r, *h, 0, eF"))] > "(register_operand (operands[0], SFmode) > || register_operand (operands[1], SFmode)) > && TARGET_HARD_FLOAT > @@ -7755,15 +7760,16 @@ (define_insn "movsf_hardfloat" > mr %0,%1 > mt%0 %1 > mf%1 %0 > - nop" > + nop > + #" > [(set_attr "type" > "load, fpload, fpload, fpload, fpstore, fpstore, > fpstore, store, veclogical, integer, fpsimple, fpsimple, > - *, mtjmpr, mfjmpr, *") > + *, mtjmpr, mfjmpr, *, vecperm") > (set_attr "isa" > "*, *, p9v, p8v, *, p9v, > p8v, *, *, *, *, *, > - *, *, *, *")]) > + *, *, *, *, p10")]) OK, i think. The addition of vecperm for type and p10 for the isa entries catch my eye, but I expect this is obvious to others. > > ;; LWZ LFIWZX STW STFIWX MTVSRWZ MFVSRWZ > ;; FMR MR MT%0 MF%1 NOP > @@ -8023,18 +8029,18 @@ (define_split > > ;; STFD LFD FMR LXSD STXSD > ;; LXSD STXSD XXLOR XXLXOR GPR<-0 > -;; LWZ STW MR > +;; LWZ STW MR XXSPLTIDP > > > (define_insn "*mov<mode>_hardfloat32" > [(set (match_operand:FMOVE64 0 "nonimmediate_operand" > "=m, d, d, <f64_p9>, wY, > <f64_av>, Z, <f64_vsx>, <f64_vsx>, !r, > - Y, r, !r") > + Y, r, !r, wa") > (match_operand:FMOVE64 1 "input_operand" > "d, m, d, wY, <f64_p9>, > Z, <f64_av>, <f64_vsx>, <zero_fp>, <zero_fp>, > - r, Y, r"))] > + r, Y, r, eF"))] > "! TARGET_POWERPC64 && TARGET_HARD_FLOAT > && (gpc_reg_operand (operands[0], <MODE>mode) > || gpc_reg_operand (operands[1], <MODE>mode))" > @@ -8051,20 +8057,21 @@ (define_insn "*mov<mode>_hardfloat32" > # > # > # > + # > #" > [(set_attr "type" > "fpstore, fpload, fpsimple, fpload, fpstore, > fpload, fpstore, veclogical, veclogical, two, > - store, load, two") > + store, load, two, vecperm") > (set_attr "size" "64") > (set_attr "length" > "*, *, *, *, *, > *, *, *, *, 8, > - 8, 8, 8") > + 8, 8, 8, *") > (set_attr "isa" > "*, *, *, p9v, p9v, > p7v, p7v, *, *, *, > - *, *, *")]) > + *, *, *, p10")]) > > ;; STW LWZ MR G-const H-const F-const > > @@ -8091,19 +8098,19 @@ (define_insn "*mov<mode>_softfloat32" > ;; STFD LFD FMR LXSD STXSD > ;; LXSDX STXSDX XXLOR XXLXOR LI 0 > ;; STD LD MR MT{CTR,LR} MF{CTR,LR} > -;; NOP MFVSRD MTVSRD > +;; NOP MFVSRD MTVSRD XXSPLTIDP > > (define_insn "*mov<mode>_hardfloat64" > [(set (match_operand:FMOVE64 0 "nonimmediate_operand" > "=m, d, d, <f64_p9>, wY, > <f64_av>, Z, <f64_vsx>, <f64_vsx>, !r, > YZ, r, !r, *c*l, !r, > - *h, r, <f64_dm>") > + *h, r, <f64_dm>, wa") > (match_operand:FMOVE64 1 "input_operand" > "d, m, d, wY, <f64_p9>, > Z, <f64_av>, <f64_vsx>, <zero_fp>, <zero_fp>, > r, YZ, r, r, *h, > - 0, <f64_dm>, r"))] > + 0, <f64_dm>, r, eF"))] > "TARGET_POWERPC64 && TARGET_HARD_FLOAT > && (gpc_reg_operand (operands[0], <MODE>mode) > || gpc_reg_operand (operands[1], <MODE>mode))" > @@ -8125,18 +8132,19 @@ (define_insn "*mov<mode>_hardfloat64" > mf%1 %0 > nop > mfvsrd %0,%x1 > - mtvsrd %x0,%1" > + mtvsrd %x0,%1 > + #" > [(set_attr "type" > "fpstore, fpload, fpsimple, fpload, fpstore, > fpload, fpstore, veclogical, veclogical, integer, > store, load, *, mtjmpr, mfjmpr, > - *, mfvsr, mtvsr") > + *, mfvsr, mtvsr, vecperm") > (set_attr "size" "64") > (set_attr "isa" > "*, *, *, p9v, p9v, > p7v, p7v, *, *, *, > *, *, *, *, *, > - *, p8v, p8v")]) > + *, p8v, p8v, p10")]) > > ;; STD LD MR MT<SPR> MF<SPR> G-const > ;; H-const F-const Special Ok. > @@ -8170,6 +8178,7 @@ (define_insn "*mov<mode>_softfloat64" > (set_attr "length" > "*, *, *, *, *, 8, > 12, 16, *")]) > + > Unnecessarily blank line? > (define_expand "mov<mode>" > [(set (match_operand:FMOVE128 0 "general_operand") > diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt > index 0538db387dc..928c4fafe07 100644 > --- a/gcc/config/rs6000/rs6000.opt > +++ b/gcc/config/rs6000/rs6000.opt > @@ -639,3 +639,7 @@ Enable instructions that guard against return-oriented programming attacks. > mprivileged > Target Var(rs6000_privileged) Init(0) > Generate code that will run in privileged state. > + > +mxxspltidp > +Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save > +Generate (do not generate) XXSPLTIDP instructions. Ok. > diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md > index bf033e31c1c..af9a04870d4 100644 > --- a/gcc/config/rs6000/vsx.md > +++ b/gcc/config/rs6000/vsx.md > @@ -1191,16 +1191,19 @@ (define_insn_and_split "*xxspltib_<mode>_split" > ;; instruction). But generate XXLXOR/XXLORC if it will avoid a register move. > > ;; VSX store VSX load VSX move VSX->GPR GPR->VSX LQ (GPR) > +;; XXSPLTIDP > ;; STQ (GPR) GPR load GPR store GPR move XXSPLTIB VSPLTISW > ;; VSX 0/-1 VMX const GPR const LVX (VMX) STVX (VMX) > (define_insn "vsx_mov<mode>_64bit" > [(set (match_operand:VSX_M 0 "nonimmediate_operand" > "=ZwO, wa, wa, r, we, ?wQ, > + wa, > ?&r, ??r, ??Y, <??r>, wa, v, > ?wa, v, <??r>, wZ, v") > > (match_operand:VSX_M 1 "input_operand" > "wa, ZwO, wa, we, r, r, > + eF, > wQ, Y, r, r, wE, jwM, > ?jwM, W, <nW>, v, wZ"))] > > @@ -1212,36 +1215,44 @@ (define_insn "vsx_mov<mode>_64bit" > } > [(set_attr "type" > "vecstore, vecload, vecsimple, mtvsr, mfvsr, load, > + vecperm, > store, load, store, *, vecsimple, vecsimple, > vecsimple, *, *, vecstore, vecload") > (set_attr "num_insns" > "*, *, *, 2, *, 2, > + *, > 2, 2, 2, 2, *, *, > *, 5, 2, *, *") > (set_attr "max_prefixed_insns" > "*, *, *, *, *, 2, > + *, > 2, 2, 2, 2, *, *, > *, *, *, *, *") > (set_attr "length" > "*, *, *, 8, *, 8, > + *, > 8, 8, 8, 8, *, *, > *, 20, 8, *, *") > (set_attr "isa" > "<VSisa>, <VSisa>, <VSisa>, *, *, *, > + p10, > *, *, *, *, p9v, *, > <VSisa>, *, *, *, *")]) > > ;; VSX store VSX load VSX move GPR load GPR store GPR move > +;; XXSPLTIDP > ;; XXSPLTIB VSPLTISW VSX 0/-1 VMX const GPR const > ;; LVX (VMX) STVX (VMX) > (define_insn "*vsx_mov<mode>_32bit" > [(set (match_operand:VSX_M 0 "nonimmediate_operand" > "=ZwO, wa, wa, ??r, ??Y, <??r>, > + wa, > wa, v, ?wa, v, <??r>, > wZ, v") > > (match_operand:VSX_M 1 "input_operand" > "wa, ZwO, wa, Y, r, r, > + eF, > wE, jwM, ?jwM, W, <nW>, > v, wZ"))] > > @@ -1253,14 +1264,17 @@ (define_insn "*vsx_mov<mode>_32bit" > } > [(set_attr "type" > "vecstore, vecload, vecsimple, load, store, *, > + vecperm, > vecsimple, vecsimple, vecsimple, *, *, > vecstore, vecload") > (set_attr "length" > "*, *, *, 16, 16, 16, > + *, > *, *, *, 20, 16, > *, *") > (set_attr "isa" > "<VSisa>, <VSisa>, <VSisa>, *, *, *, > + p10, > p9v, *, <VSisa>, *, *, > *, *")]) > ok > @@ -4580,6 +4594,23 @@ (define_insn "vsx_splat_<mode>_reg" > mtvsrdd %x0,%1,%1" > [(set_attr "type" "vecperm,vecmove")]) > > +(define_insn "*vsx_splat_v2df_xxspltidp" > + [(set (match_operand:V2DF 0 "vsx_register_operand" "=wa") > + (vec_duplicate:V2DF > + (match_operand:DF 1 "xxspltidp_operand" "eF")))] > + "TARGET_POWER10" > +{ > + HOST_WIDE_INT value; > + > + if (!xxspltidp_constant_p (operands[1], DFmode, &value)) > + gcc_unreachable (); > + > + operands[2] = GEN_INT (value); > + return "xxspltidp %x0,%1"; > +} > + [(set_attr "type" "vecperm") > + (set_attr "prefixed" "yes")]) > + > (define_insn "vsx_splat_<mode>_mem" > [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa") > (vec_duplicate:VSX_D > @@ -6449,15 +6480,40 @@ (define_expand "xxspltidp_v2df" > DONE; > }) > > -(define_insn "xxspltidp_v2df_inst" > - [(set (match_operand:V2DF 0 "register_operand" "=wa") > - (unspec:V2DF [(match_operand:SI 1 "c32bit_cint_operand" "n")] > - UNSPEC_XXSPLTIDP))] > +(define_mode_iterator XXSPLTIDP [SF DF V2DF]) > + > +(define_insn "xxspltidp_<mode>_inst" > + [(set (match_operand:XXSPLTIDP 0 "register_operand" "=wa") > + (unspec:XXSPLTIDP [(match_operand:SI 1 "c32bit_cint_operand" "n")] > + UNSPEC_XXSPLTIDP))] > "TARGET_POWER10" > "xxspltidp %x0,%1" > [(set_attr "type" "vecperm") > (set_attr "prefixed" "yes")]) > > +;; Generate the XXSPLTIDP instruction to support SFmode and DFmode scalar > +;; constants and V2DF vector constants where both elements are the same. The > +;; constant has to be expressible as a SFmode constant that is not a SFmode > +;; denormal value. > +(define_insn_and_split "*xxspltidp_<mode>_internal" > + [(set (match_operand:XXSPLTIDP 0 "vsx_register_operand" "=wa") > + (match_operand:XXSPLTIDP 1 "xxspltidp_operand" "eF"))] Extra spaces there. > + "TARGET_POWER10" > + "#" > + "&& 1" > + [(set (match_operand:XXSPLTIDP 0 "vsx_register_operand") > + (unspec:XXSPLTIDP [(match_dup 2)] UNSPEC_XXSPLTIDP))] > +{ > + HOST_WIDE_INT value = 0; > + > + if (!xxspltidp_constant_p (operands[1], <MODE>mode, &value)) > + gcc_unreachable (); > + > + operands[2] = GEN_INT (value); > +} > + [(set_attr "type" "vecperm") > + (set_attr "prefixed" "yes")]) > + > ;; XXSPLTI32DX built-in function support > (define_expand "xxsplti32dx_v4si" > [(set (match_operand:V4SI 0 "register_operand" "=wa") ok Just briefly looed at testcases.. nothing jumped out at me below. Thanks -Will > diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c > new file mode 100644 > index 00000000000..8f6e176f9af > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c > @@ -0,0 +1,60 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target power10_ok } */ > +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ > + > +#include <math.h> > + > +/* Test generating DFmode constants with the ISA 3.1 (power10) XXSPLTIDP > + instruction. */ > + > +double > +scalar_double_0 (void) > +{ > + return 0.0; /* XXSPLTIB or XXLXOR. */ > +} > + > +double > +scalar_double_1 (void) > +{ > + return 1.0; /* XXSPLTIDP. */ > +} > + > +#ifndef __FAST_MATH__ > +double > +scalar_double_m0 (void) > +{ > + return -0.0; /* XXSPLTIDP. */ > +} > + > +double > +scalar_double_nan (void) > +{ > + return __builtin_nan (""); /* XXSPLTIDP. */ > +} > + > +double > +scalar_double_inf (void) > +{ > + return __builtin_inf (); /* XXSPLTIDP. */ > +} > + > +double > +scalar_double_m_inf (void) /* XXSPLTIDP. */ > +{ > + return - __builtin_inf (); > +} > +#endif > + > +double > +scalar_double_pi (void) > +{ > + return M_PI; /* PLFD. */ > +} > + > +double > +scalar_double_denorm (void) > +{ > + return 0x1p-149f; /* PLFD. */ > +} > + > +/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */ > diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c > new file mode 100644 > index 00000000000..72504bdfbbd > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c > @@ -0,0 +1,60 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target power10_ok } */ > +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ > + > +#include <math.h> > + > +/* Test generating SFmode constants with the ISA 3.1 (power10) XXSPLTIDP > + instruction. */ > + > +float > +scalar_float_0 (void) > +{ > + return 0.0f; /* XXSPLTIB or XXLXOR. */ > +} > + > +float > +scalar_float_1 (void) > +{ > + return 1.0f; /* XXSPLTIDP. */ > +} > + > +#ifndef __FAST_MATH__ > +float > +scalar_float_m0 (void) > +{ > + return -0.0f; /* XXSPLTIDP. */ > +} > + > +float > +scalar_float_nan (void) > +{ > + return __builtin_nanf (""); /* XXSPLTIDP. */ > +} > + > +float > +scalar_float_inf (void) > +{ > + return __builtin_inff (); /* XXSPLTIDP. */ > +} > + > +float > +scalar_float_m_inf (void) /* XXSPLTIDP. */ > +{ > + return - __builtin_inff (); > +} > +#endif > + > +float > +scalar_float_pi (void) > +{ > + return (float)M_PI; /* XXSPLTIDP. */ > +} > + > +float > +scalar_float_denorm (void) > +{ > + return 0x1p-149f; /* PLFS. */ > +} > + > +/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */ > diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c > new file mode 100644 > index 00000000000..82ffc86f8aa > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c > @@ -0,0 +1,64 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target power10_ok } */ > +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ > + > +#include <math.h> > + > +/* Test generating V2DFmode constants with the ISA 3.1 (power10) XXSPLTIDP > + instruction. */ > + > +vector double > +v2df_double_0 (void) > +{ > + return (vector double) { 0.0, 0.0 }; /* XXSPLTIB or XXLXOR. */ > +} > + > +vector double > +v2df_double_1 (void) > +{ > + return (vector double) { 1.0, 1.0 }; /* XXSPLTIDP. */ > +} > + > +#ifndef __FAST_MATH__ > +vector double > +v2df_double_m0 (void) > +{ > + return (vector double) { -0.0, -0.0 }; /* XXSPLTIDP. */ > +} > + > +vector double > +v2df_double_nan (void) > +{ > + return (vector double) { __builtin_nan (""), > + __builtin_nan ("") }; /* XXSPLTIDP. */ > +} > + > +vector double > +v2df_double_inf (void) > +{ > + return (vector double) { __builtin_inf (), > + __builtin_inf () }; /* XXSPLTIDP. */ > +} > + > +vector double > +v2df_double_m_inf (void) > +{ > + return (vector double) { - __builtin_inf (), > + - __builtin_inf () }; /* XXSPLTIDP. */ > +} > +#endif > + > +vector double > +v2df_double_pi (void) > +{ > + return (vector double) { M_PI, M_PI }; /* PLVX. */ > +} > + > +vector double > +v2df_double_denorm (void) > +{ > + return (vector double) { (double)0x1p-149f, > + (double)0x1p-149f }; /* PLVX. */ > +} > + > +/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */ > -- > 2.31.1 > >
On Thu, Aug 26, 2021 at 02:17:57PM -0500, will schmidt wrote: > On Wed, 2021-08-25 at 15:46 -0400, Michael Meissner wrote: > > Generate XXSPLTIDP on power10. > > > > I have added a temporary switch (-mxxspltidp) to control whether or not the > > XXSPLTIDP instruction is generated. > > How temporary? Until we decide we no longer need to disable the option to do tests. Probably at the end of stage1. > > > > I added 3 new tests to test loading up SF/DF scalar and V2DF vector > > constants. ... > > gcc/ > > * config/rs6000/constraints.md (eF): New constraint. > > * config/rs6000/predicates.md (easy_fp_constant): If we can load > > the scalar constant with XXSPLTIDP, the floating point constant is > > easy. > > Could be shortened to something like ? > Add clause to accept xxspltidp_operand as easy. Sure. > > (xxspltidp_operand): New predicate. > > Will there ever be another instruction using the SF/DF CONST_DOUBLE or > V2DF CONST_VECTOR ? I tentatively question the name of the operand, > but defer.. This is the convention I've used for adding other instructions like xxspltib. You use the name of the instruction followed by operand. The predicate only matches RTXes that we would use the XXSPLTIDP for. And then for operands like XXSPLTIDP and XXSPLTIB where you want additional information, there is the C++ function in rs6000.c that has the additional arguments. In the case of XXSPLTIDP, it returns the 32-bit immediate value that is used in the instruction. The predicate calls this internal function, adding the address of internal variables that aren't used in this case. This way we have just one place that centralizes the knowledge about the instruction. This means that the places that deal with decomposing the XXSPLTIDP instruction don't have to do their own parsing. And if we need to add new enhancements or restrictions, there is only one place to go. There is the XXSPLTI32DX instruction that can also load SFmode, DFmode, and V2DFmode constants. You use a pair of instructions to either fill in the top 32-bits or the bottom 32-bits of each 64-bit element. I have patches for adding XXSPLTI32DX, but so far, I'm not sure whether it is a win or not. This has the xxsplti32dx_operand predicate and the xxsplti32dx_constant_p internal function and a separate constraint ("eD") for matching it. If xxspltidp_constant_p returns false, then xxsplti32dx_operand and xxsplti32dx_constant_p return false. I.e. it only returns true if we should use XXSPLTI32DX instead of some other instruction. You can't combine XXSPLTIDP and XXSPLTI32DX because you have to set the various attributes differently (XXSPLTIDP is a single prefixed instruction, while before split, XXSPLTI32DX is two prefixed instructions). There is another instruction (XXSPLTIW) that can be used for loading up certain V16QImode, V8HImode, V4SImode, and V4SFmode constants. I have patches for this as well. At the moment, there are 1-2 regressions if I enable XXSPLTIW, and I'm trying to figure out how to improve them. But it likely will be submitted as a future patch also. This would have the xxspltiw_operand predicate, "eW" constraint and the xxspltiw_constant_p internal function . > > > (easy_vector_constant): If we can generate XXSPLTIDP, mark the > > vector constant as easy. > > Duplicated from above. Yep, both of the easy_*_functions need to call this. The easy_fp_function is for scalar floating point (i.e. SFmode and DFmode). The easy_vector_function is for vector (i.e. V2DFmode). What the easy_{fp,vector}_constant functions are used for is to decide whether we can load up the constant via insns that are created via define_splits or if we need to push the constant to memory (because it isn't 'easy' to load up the constants). These two functions predate my current involvement with PowerPC on GCC (i.e. 10 years or so). I don't recall if they were used back in the 1990's when I worked on PowerPC at Cygnus solutions. > > > * config/rs6000/rs6000-protos.h (xxspltidp_constant_p): New > > declaration. > > (prefixed_permute_p): Likewise. > > > > * config/rs6000/rs6000.c (xxspltidp_constant_p): New function. > > (output_vec_const_move): Add support for XXSPLTIDP. > > (prefixed_permute_p): New function. > > Duplicated. Yes. You need the external declaration in rs6000-protos.h and the definition in rs6000.c. So you will have the duplication. > > * config/rs6000/rs6000.md (prefixed attribute): Add support for > > permute prefixed instructions. > > (movsf_hardfloat): Add XXSPLTIDP support. > > (mov<mode>_hardfloat32, FMOVE64 iterator): Likewise. > > (mov<mode>_hardfloat64, FMOVE64 iterator): Likewise. > > * config/rs6000/rs6000.opt (-mxxspltidp): New switch. > > * config/rs6000/vsx.md (vsx_move<mode>_64bit): Add XXSPLTIDP > > support. > > (vsx_move<mode>_32bit): Likewise. > > No e in mov (per patch contents below). Thanks. > > (vsx_splat_v2df_xxspltidp): New insn. > > (XXSPLTIDP): New mode iterator. > > (xxspltidp_<mode>_internal): New insn and splits. > > (xxspltidp_<mode>_inst): Replace xxspltidp_v2df_inst with an > > iterated form that also does SFmode, and DFmode. > Swap "an iterated form" with "xxspltidp_<mode>_inst ? Ok. > > @@ -8170,6 +8178,7 @@ (define_insn "*mov<mode>_softfloat64" > > (set_attr "length" > > "*, *, *, *, *, 8, > > 12, 16, *")]) > > + > > > > Unnecessarily blank line? Probably. > > (define_expand "mov<mode>" > > [(set (match_operand:FMOVE128 0 "general_operand") > > diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt > > index 0538db387dc..928c4fafe07 100644 > > --- a/gcc/config/rs6000/rs6000.opt > > +++ b/gcc/config/rs6000/rs6000.opt > > @@ -639,3 +639,7 @@ Enable instructions that guard against return-oriented programming attacks. > > mprivileged > > Target Var(rs6000_privileged) Init(0) > > Generate code that will run in privileged state. > > + > > +mxxspltidp > > +Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save > > +Generate (do not generate) XXSPLTIDP instructions. > > > Ok.
On Thu, Aug 26, 2021 at 05:28:42PM -0400, Michael Meissner wrote: > I have patches for adding XXSPLTI32DX, but so far, I'm not sure whether it is a > win or not. This has the xxsplti32dx_operand predicate and the > xxsplti32dx_constant_p internal function and a separate constraint ("eD") for > matching it. If xxspltidp_constant_p returns false, then xxsplti32dx_operand > and xxsplti32dx_constant_p return false. I.e. it only returns true if we > should use XXSPLTI32DX instead of some other instruction. Whoops in re-reading that paragraph, I mixed up the return value. The sentence should have read: If xxspltidp_constant_p (and xxspltidp_operand) returns "true" then xxsplti32dx_constant_p (and xxsplti32dx_operand) would return false.
Ping patch. | Date: Wed, 25 Aug 2021 15:46:43 -0400 | Subject: [PATCH] Generate XXSPLTIDP on power10. | Message-ID: <YSaeI5BVUxofem9y@toto.the-meissners.org>
Hi! On Thu, Aug 26, 2021 at 05:28:42PM -0400, Michael Meissner wrote: > On Thu, Aug 26, 2021 at 02:17:57PM -0500, will schmidt wrote: > > On Wed, 2021-08-25 at 15:46 -0400, Michael Meissner wrote: > > > Generate XXSPLTIDP on power10. > > > > > > I have added a temporary switch (-mxxspltidp) to control whether or not the > > > XXSPLTIDP instruction is generated. > > > > How temporary? > > Until we decide we no longer need to disable the option to do tests. Probably > at the end of stage1. Don't do it at all please. If it is useful to disable some new strategy for generating constants, a (temporary or at least undocumented) flag for that can be handy. But a flag to disable separate insns is a liability, it makes the compiler much more fragile, makes changing the compiler hard because of all the surprises hidden. > > > (xxspltidp_operand): New predicate. > > > > Will there ever be another instruction using the SF/DF CONST_DOUBLE or > > V2DF CONST_VECTOR ? I tentatively question the name of the operand, > > but defer.. > > This is the convention I've used for adding other instructions like xxspltib. The only reason it is a good idea here is because of the strange behaviour this insn has with single precision subnormals. In general a better name here would be something like "sf_as_int_operand". The insn should probably not allow anything else than bit patterns, not floating point constants, have a separate pattern for that (that can then forward to the integer one). > This way we have just one place that centralizes the knowledge about the > instruction. That one place should be the define_insn for it. Segher
Hi! Please do two separate patches. The first that adds the instruction (with a bit pattern, i.e. integer, input), and perhaps a second pattern that has an fp as input and uses it if the constant is valid for the insn (survives being converted to SP and back to DP (or the other way around), and is not denormal). That can be two patches if you want, but :-) Having the integer intermediate step not only makes the code hugely less complicated, but is also allows e.g. === typedef unsigned long long v2u64 __attribute__ ((vector_size (16))); v2u64 f(void) { v2u64 x = { 0x8000000000000000, 0x8000000000000000 }; return x; } === to be optimised properly. The second part is letting the existing code use such FP (and integer!) contants. On Wed, Aug 25, 2021 at 03:46:43PM -0400, Michael Meissner wrote: > +;; SF/DF/V2DF scalar or vector constant that can be loaded with XXSPLTIDP > +(define_constraint "eF" > + "A vector constant that can be loaded with the XXSPLTIDP instruction." > + (match_operand 0 "xxspltidp_operand")) vector *or float*. It should allow all vectors, not just FP ones. > +;; Return 1 if operand is a SF/DF CONST_DOUBLE or V2DF CONST_VECTOR that can be > +;; loaded via the ISA 3.1 XXSPLTIDP instruction. > +(define_predicate "xxspltidp_operand" > + (match_code "const_double,const_vector,vec_duplicate") > +{ > + HOST_WIDE_INT value = 0; > + return xxspltidp_constant_p (op, mode, &value); > +}) Don't do that. Factor the code properly. A predicate function should never have side effects. Since this is the only place you want to convert the value to its bit pattern, you should just do that here. (Btw, initialising the value (although the function always writes it) is not defensive programming, it is hiding bugs. IMNSHO :-) ) > +bool > +xxspltidp_constant_p (rtx op, > + machine_mode mode, > + HOST_WIDE_INT *constant_ptr) > +{ > + *constant_ptr = 0; And a second time, too! Don't do either. > + if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX) > + return false; This is the wrong place to test these. It belongs in the caller. > + if (CONST_VECTOR_P (op)) > + { > + element = CONST_VECTOR_ELT (op, 0); > + if (!rtx_equal_p (element, CONST_VECTOR_ELT (op, 1))) > + return false; > + } const_vec_duplicate_p (But you actually should check if the bit pattern is valid, nothing more, nothing less). > + /* Don't return true for 0.0 since that is easy to create without > + XXSPLTIDP. */ > + if (element == CONST0_RTX (mode)) > + return false; Don't do that. Instead have whatever decides what insn to use choose more directly. > +/* Whether a permute type instruction is a prefixed instruction. This is > + called from the prefixed attribute processing. */ > + > +bool > +prefixed_permute_p (rtx_insn *insn) What does this have to do with this patch? > +{ > + rtx set = single_set (insn); > + if (!set) > + return false; > + > + rtx dest = SET_DEST (set); > + rtx src = SET_SRC (set); > + machine_mode mode = GET_MODE (dest); > + > + if (!REG_P (dest) && !SUBREG_P (dest)) > + return false; > + > + switch (mode) > + { > + case DFmode: > + case SFmode: > + case V2DFmode: > + return xxspltidp_operand (src, mode); ??!!?? That is not a permute insn at all. Perhaps you mean it is executed in the PM pipe on current implementations (all one of-em). That does not make it a permute insn. It is not a good idea to call insns that do not have semantics similar to permutations "permute". > @@ -7755,15 +7760,16 @@ (define_insn "movsf_hardfloat" > @@ -8051,20 +8057,21 @@ (define_insn "*mov<mode>_hardfloat32" > @@ -8091,19 +8098,19 @@ (define_insn "*mov<mode>_softfloat32" > @@ -8125,18 +8132,19 @@ (define_insn "*mov<mode>_hardfloat64" > @@ -8170,6 +8178,7 @@ (define_insn "*mov<mode>_softfloat64" It would be a good idea to merge many of these patterns again. We can do this now that we have the "isa" and "enabled" attributes. Segher
On Tue, Aug 31, 2021 at 05:52:48PM -0500, Segher Boessenkool wrote: > Hi! > > On Thu, Aug 26, 2021 at 05:28:42PM -0400, Michael Meissner wrote: > > On Thu, Aug 26, 2021 at 02:17:57PM -0500, will schmidt wrote: > > > On Wed, 2021-08-25 at 15:46 -0400, Michael Meissner wrote: > > > > Generate XXSPLTIDP on power10. > > > > > > > > I have added a temporary switch (-mxxspltidp) to control whether or not the > > > > XXSPLTIDP instruction is generated. > > > > > > How temporary? > > > > Until we decide we no longer need to disable the option to do tests. Probably > > at the end of stage1. > > Don't do it at all please. If it is useful to disable some new strategy > for generating constants, a (temporary or at least undocumented) flag > for that can be handy. But a flag to disable separate insns is a > liability, it makes the compiler much more fragile, makes changing the > compiler hard because of all the surprises hidden. > > > > > (xxspltidp_operand): New predicate. > > > > > > Will there ever be another instruction using the SF/DF CONST_DOUBLE or > > > V2DF CONST_VECTOR ? I tentatively question the name of the operand, > > > but defer.. > > > > This is the convention I've used for adding other instructions like xxspltib. > > The only reason it is a good idea here is because of the strange > behaviour this insn has with single precision subnormals. In general > a better name here would be something like "sf_as_int_operand". The > insn should probably not allow anything else than bit patterns, not > floating point constants, have a separate pattern for that (that can > then forward to the integer one). > > > This way we have just one place that centralizes the knowledge about the > > instruction. > > That one place should be the define_insn for it. But you need the predicate and constraint for this to work. That is what I was talking about. The define_insn is simple with this current method. Otherwise, the define_insn has to reparse the RTL.
On Tue, Aug 31, 2021 at 06:41:30PM -0500, Segher Boessenkool wrote: > Hi! > > Please do two separate patches. The first that adds the instruction > (with a bit pattern, i.e. integer, input), and perhaps a second pattern > that has an fp as input and uses it if the constant is valid for the > insn (survives being converted to SP and back to DP (or the other way > around), and is not denormal). That can be two patches if you want, > but :-) Well we already have the two stages for the built-in. But the point of the work is to make it work without the explicit builtin. > Having the integer intermediate step not only makes the code hugely less > complicated, but is also allows e.g. > === > typedef unsigned long long v2u64 __attribute__ ((vector_size (16))); > v2u64 f(void) > { > v2u64 x = { 0x8000000000000000, 0x8000000000000000 }; > return x; > } > === > > to be optimised properly. > > The second part is letting the existing code use such FP (and integer!) > contants. It is somewhat harder to do this as integer values, because you have to check whether all of the bits are correct, including skipping the SF denormal bits. I tend to think in real life, the only win will be be -0.0 (i.e. 0x8000000000000000 that you mentioned). I doubt many V2DI constants will be appropriate canidates for this. > > On Wed, Aug 25, 2021 at 03:46:43PM -0400, Michael Meissner wrote: > > +;; SF/DF/V2DF scalar or vector constant that can be loaded with XXSPLTIDP > > +(define_constraint "eF" > > + "A vector constant that can be loaded with the XXSPLTIDP instruction." > > + (match_operand 0 "xxspltidp_operand")) > > vector *or float*. It should allow all vectors, not just FP ones. > > > +;; Return 1 if operand is a SF/DF CONST_DOUBLE or V2DF CONST_VECTOR that can be > > +;; loaded via the ISA 3.1 XXSPLTIDP instruction. > > +(define_predicate "xxspltidp_operand" > > + (match_code "const_double,const_vector,vec_duplicate") > > +{ > > + HOST_WIDE_INT value = 0; > > + return xxspltidp_constant_p (op, mode, &value); > > +}) > > Don't do that. Factor the code properly. A predicate function should > never have side effects. > > Since this is the only place you want to convert the value to its bit > pattern, you should just do that here. > > (Btw, initialising the value (although the function always writes it) is > not defensive programming, it is hiding bugs. IMNSHO :-) ) And avoiding warnings. > > > +bool > > +xxspltidp_constant_p (rtx op, > > + machine_mode mode, > > + HOST_WIDE_INT *constant_ptr) > > +{ > > + *constant_ptr = 0; > > And a second time, too! Don't do either. > > > + if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX) > > + return false; > > This is the wrong place to test these. It belongs in the caller. > > > + if (CONST_VECTOR_P (op)) > > + { > > + element = CONST_VECTOR_ELT (op, 0); > > + if (!rtx_equal_p (element, CONST_VECTOR_ELT (op, 1))) > > + return false; > > + } > > const_vec_duplicate_p Ok. > (But you actually should check if the bit pattern is valid, nothing > more, nothing less). > > > + /* Don't return true for 0.0 since that is easy to create without > > + XXSPLTIDP. */ > > + if (element == CONST0_RTX (mode)) > > + return false; > > Don't do that. Instead have whatever decides what insn to use choose > more directly. > > > +/* Whether a permute type instruction is a prefixed instruction. This is > > + called from the prefixed attribute processing. */ > > + > > +bool > > +prefixed_permute_p (rtx_insn *insn) > > What does this have to do with this patch? This is used by the prefixed type attribute to know that the xxsplti{w,dp,32dx} instructions are prefixed by default, but unlike paddi, they don't want a leading 'p' in the instruction. The alternative is for each of the moves to add an explicit prefixed attribute, instead of letting the default be set. > > +{ > > + rtx set = single_set (insn); > > + if (!set) > > + return false; > > + > > + rtx dest = SET_DEST (set); > > + rtx src = SET_SRC (set); > > + machine_mode mode = GET_MODE (dest); > > + > > + if (!REG_P (dest) && !SUBREG_P (dest)) > > + return false; > > + > > + switch (mode) > > + { > > + case DFmode: > > + case SFmode: > > + case V2DFmode: > > + return xxspltidp_operand (src, mode); > > ??!!?? > > That is not a permute insn at all. > > Perhaps you mean it is executed in the PM pipe on current > implementations (all one of-em). That does not make it a permute insn. > It is not a good idea to call insns that do not have semantics similar > to permutations "permute". I'll come up with a different name. But because it includes SPLAT-ing to the other double word, I would tend to believe these instructions will always be a permute class instruction. > > > @@ -7755,15 +7760,16 @@ (define_insn "movsf_hardfloat" > > @@ -8051,20 +8057,21 @@ (define_insn "*mov<mode>_hardfloat32" > > @@ -8091,19 +8098,19 @@ (define_insn "*mov<mode>_softfloat32" > > @@ -8125,18 +8132,19 @@ (define_insn "*mov<mode>_hardfloat64" > > @@ -8170,6 +8178,7 @@ (define_insn "*mov<mode>_softfloat64" > > It would be a good idea to merge many of these patterns again. We can > do this now that we have the "isa" and "enabled" attributes. I don't plan to do this right now.
On Wed, Sep 01, 2021 at 04:22:13PM -0400, Michael Meissner wrote: > On Tue, Aug 31, 2021 at 06:41:30PM -0500, Segher Boessenkool wrote: > > Hi! > > > > Please do two separate patches. The first that adds the instruction > > (with a bit pattern, i.e. integer, input), and perhaps a second pattern > > that has an fp as input and uses it if the constant is valid for the > > insn (survives being converted to SP and back to DP (or the other way > > around), and is not denormal). That can be two patches if you want, > > but :-) > > Well we already have the two stages for the built-in. But the point of the > work is to make it work without the explicit builtin. There should be a define_insn that works with just as just a bit pattern, without floating point in sight. > > Having the integer intermediate step not only makes the code hugely less > > complicated, but is also allows e.g. > > > > === > > typedef unsigned long long v2u64 __attribute__ ((vector_size (16))); > > v2u64 f(void) > > { > > v2u64 x = { 0x8000000000000000, 0x8000000000000000 }; > > return x; > > } > > === > > > > to be optimised properly. > > > > The second part is letting the existing code use such FP (and integer!) > > contants. > > It is somewhat harder to do this as integer values, because you have to check > whether all of the bits are correct, including skipping the SF denormal bits. You have to *anyway*. And it is easy enough. > I tend to think in real life, the only win will be be -0.0 > (i.e. 0x8000000000000000 that you mentioned). I doubt many V2DI constants will > be appropriate canidates for this. There are a few more. But the point is that the code is *simpler* like this. It usually is -- shortcuts often take more time in the end. > > (Btw, initialising the value (although the function always writes it) is > > not defensive programming, it is hiding bugs. IMNSHO :-) ) > > And avoiding warnings. Shutting up warnings that your control flow is not good is not a good idea usually. Don't fight the compiler: it does not matter if you have better taste than it, it will win. Instead, write better control flow. There might even be an actual bug hiding in there! > > > +/* Whether a permute type instruction is a prefixed instruction. This is > > > + called from the prefixed attribute processing. */ > > > + > > > +bool > > > +prefixed_permute_p (rtx_insn *insn) > > > > What does this have to do with this patch? > > This is used by the prefixed type attribute to know that the xxsplti{w,dp,32dx} > instructions are prefixed by default, but unlike paddi, they don't want a > leading 'p' in the instruction. They are not permute insns, that's where your code turns into an enigma. You should not have a function for prefixed insns that are not a variant of non-prefixed insns: that is *most* prefixed insns (eventually it will be, right now you see memory insns mostly of course). Pick out the special case, instead (hint: if it has a "not" in the description, you probably picked the wrong side). So you have an "is_prefixed" thing, all prefixed insns, and an "is_prefixed_variant" (or some better name :-) ) function, that returns true for those prefixed insns written with a leading "p" on an existing base insn (and actually have the original opcode as suffix). > > > @@ -7755,15 +7760,16 @@ (define_insn "movsf_hardfloat" > > > @@ -8051,20 +8057,21 @@ (define_insn "*mov<mode>_hardfloat32" > > > @@ -8091,19 +8098,19 @@ (define_insn "*mov<mode>_softfloat32" > > > @@ -8125,18 +8132,19 @@ (define_insn "*mov<mode>_hardfloat64" > > > @@ -8170,6 +8178,7 @@ (define_insn "*mov<mode>_softfloat64" > > > > It would be a good idea to merge many of these patterns again. We can > > do this now that we have the "isa" and "enabled" attributes. > > I don't plan to do this right now. The longer you wait the more work it becomes, and the more work other things (like this patch) become as well. Segher
diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md index c8cff1a3038..ea2e4a267c3 100644 --- a/gcc/config/rs6000/constraints.md +++ b/gcc/config/rs6000/constraints.md @@ -208,6 +208,11 @@ (define_constraint "P" (and (match_code "const_int") (match_test "((- (unsigned HOST_WIDE_INT) ival) + 0x8000) < 0x10000"))) +;; SF/DF/V2DF scalar or vector constant that can be loaded with XXSPLTIDP +(define_constraint "eF" + "A vector constant that can be loaded with the XXSPLTIDP instruction." + (match_operand 0 "xxspltidp_operand")) + ;; 34-bit signed integer constant (define_constraint "eI" "A signed 34-bit integer constant if prefixed instructions are supported." diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md index 956e42bc514..134243e404b 100644 --- a/gcc/config/rs6000/predicates.md +++ b/gcc/config/rs6000/predicates.md @@ -601,6 +601,11 @@ (define_predicate "easy_fp_constant" if (TARGET_VSX && op == CONST0_RTX (mode)) return 1; + /* If we have the ISA 3.1 XXSPLTIDP instruction, see if the constant can + be loaded with that instruction. */ + if (xxspltidp_operand (op, mode)) + return 1; + /* Otherwise consider floating point constants hard, so that the constant gets pushed to memory during the early RTL phases. This has the advantage that double precision constants that can be @@ -640,6 +645,15 @@ (define_predicate "xxspltib_constant_nosplit" return num_insns == 1; }) +;; Return 1 if operand is a SF/DF CONST_DOUBLE or V2DF CONST_VECTOR that can be +;; loaded via the ISA 3.1 XXSPLTIDP instruction. +(define_predicate "xxspltidp_operand" + (match_code "const_double,const_vector,vec_duplicate") +{ + HOST_WIDE_INT value = 0; + return xxspltidp_constant_p (op, mode, &value); +}) + ;; Return 1 if the operand is a CONST_VECTOR and can be loaded into a ;; vector register without using memory. (define_predicate "easy_vector_constant" @@ -653,6 +667,9 @@ (define_predicate "easy_vector_constant" if (zero_constant (op, mode) || all_ones_constant (op, mode)) return true; + if (xxspltidp_operand (op, mode)) + return true; + if (TARGET_P9_VECTOR && xxspltib_constant_p (op, mode, &num_insns, &value)) return true; diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h index 14f6b313105..9bba57c22f2 100644 --- a/gcc/config/rs6000/rs6000-protos.h +++ b/gcc/config/rs6000/rs6000-protos.h @@ -32,6 +32,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, int, int, int, extern int easy_altivec_constant (rtx, machine_mode); extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *); +extern bool xxspltidp_constant_p (rtx, machine_mode, HOST_WIDE_INT *); extern int vspltis_shifted (rtx); extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int); extern bool macho_lo_sum_memory_operand (rtx, machine_mode); @@ -198,6 +199,7 @@ enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode); extern bool prefixed_load_p (rtx_insn *); extern bool prefixed_store_p (rtx_insn *); extern bool prefixed_paddi_p (rtx_insn *); +extern bool prefixed_permute_p (rtx_insn *); extern void rs6000_asm_output_opcode (FILE *); extern void output_pcrel_opt_reloc (rtx); extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int); diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index e073b26b430..322b3c83925 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -6533,6 +6533,74 @@ xxspltib_constant_p (rtx op, return true; } +/* Return true if OP is of the given MODE and can be synthesized with ISA 3.1 + XXSPLTIDP instruction. + + Return the constant that is being split via CONSTANT_PTR to use in the + XXSPLTIDP instruction. */ + +bool +xxspltidp_constant_p (rtx op, + machine_mode mode, + HOST_WIDE_INT *constant_ptr) +{ + *constant_ptr = 0; + + if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX) + return false; + + if (mode == VOIDmode) + mode = GET_MODE (op); + + rtx element = op; + if (mode == V2DFmode) + { + if (CONST_VECTOR_P (op)) + { + element = CONST_VECTOR_ELT (op, 0); + if (!rtx_equal_p (element, CONST_VECTOR_ELT (op, 1))) + return false; + } + + else if (GET_CODE (op) == VEC_DUPLICATE) + element = XEXP (op, 0); + + else + return false; + + mode = DFmode; + } + + if (mode != SFmode && mode != DFmode) + return false; + + if (GET_MODE (element) != mode) + return false; + + if (!CONST_DOUBLE_P (element)) + return false; + + /* Don't return true for 0.0 since that is easy to create without + XXSPLTIDP. */ + if (element == CONST0_RTX (mode)) + return false; + + /* If the value doesn't fit in a SFmode, exactly, we can't use XXSPLTIDP. */ + const struct real_value *rv = CONST_DOUBLE_REAL_VALUE (element); + if (!exact_real_truncate (SFmode, rv)) + return false; + + long value; + REAL_VALUE_TO_TARGET_SINGLE (*rv, value); + + /* Test for SFmode denormal (exponent is 0, mantissa field is non-zero). */ + if (((value & 0x7F800000) == 0) && ((value & 0x7FFFFF) != 0)) + return false; + + *constant_ptr = value; + return true; +} + const char * output_vec_const_move (rtx *operands) { @@ -6548,6 +6616,7 @@ output_vec_const_move (rtx *operands) { bool dest_vmx_p = ALTIVEC_REGNO_P (REGNO (dest)); int xxspltib_value = 256; + HOST_WIDE_INT xxspltidp_value = 0; int num_insns = -1; if (zero_constant (vec, mode)) @@ -6577,6 +6646,12 @@ output_vec_const_move (rtx *operands) gcc_unreachable (); } + if (xxspltidp_constant_p (vec, mode, &xxspltidp_value)) + { + operands[2] = GEN_INT (xxspltidp_value); + return "xxspltidp %x0,%2"; + } + if (TARGET_P9_VECTOR && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value)) { @@ -26219,6 +26294,37 @@ prefixed_paddi_p (rtx_insn *insn) return (iform == INSN_FORM_PCREL_EXTERNAL || iform == INSN_FORM_PCREL_LOCAL); } +/* Whether a permute type instruction is a prefixed instruction. This is + called from the prefixed attribute processing. */ + +bool +prefixed_permute_p (rtx_insn *insn) +{ + rtx set = single_set (insn); + if (!set) + return false; + + rtx dest = SET_DEST (set); + rtx src = SET_SRC (set); + machine_mode mode = GET_MODE (dest); + + if (!REG_P (dest) && !SUBREG_P (dest)) + return false; + + switch (mode) + { + case DFmode: + case SFmode: + case V2DFmode: + return xxspltidp_operand (src, mode); + + default: + break; + } + + return false; +} + /* Whether the next instruction needs a 'p' prefix issued before the instruction is printed out. */ static bool prepend_p_to_next_insn; diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index a84438f8545..bf3bfed3b88 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -314,6 +314,11 @@ (define_attr "prefixed" "no,yes" (eq_attr "type" "integer,add") (if_then_else (match_test "prefixed_paddi_p (insn)") + (const_string "yes") + (const_string "no")) + + (eq_attr "type" "vecperm") + (if_then_else (match_test "prefixed_permute_p (insn)") (const_string "yes") (const_string "no"))] @@ -7723,17 +7728,17 @@ (define_split ;; ;; LWZ LFS LXSSP LXSSPX STFS STXSSP ;; STXSSPX STW XXLXOR LI FMR XSCPSGNDP -;; MR MT<x> MF<x> NOP +;; MR MT<x> MF<x> NOP XXSPLTIDP (define_insn "movsf_hardfloat" [(set (match_operand:SF 0 "nonimmediate_operand" "=!r, f, v, wa, m, wY, Z, m, wa, !r, f, wa, - !r, *c*l, !r, *h") + !r, *c*l, !r, *h, wa") (match_operand:SF 1 "input_operand" "m, m, wY, Z, f, v, wa, r, j, j, f, wa, - r, r, *h, 0"))] + r, r, *h, 0, eF"))] "(register_operand (operands[0], SFmode) || register_operand (operands[1], SFmode)) && TARGET_HARD_FLOAT @@ -7755,15 +7760,16 @@ (define_insn "movsf_hardfloat" mr %0,%1 mt%0 %1 mf%1 %0 - nop" + nop + #" [(set_attr "type" "load, fpload, fpload, fpload, fpstore, fpstore, fpstore, store, veclogical, integer, fpsimple, fpsimple, - *, mtjmpr, mfjmpr, *") + *, mtjmpr, mfjmpr, *, vecperm") (set_attr "isa" "*, *, p9v, p8v, *, p9v, p8v, *, *, *, *, *, - *, *, *, *")]) + *, *, *, *, p10")]) ;; LWZ LFIWZX STW STFIWX MTVSRWZ MFVSRWZ ;; FMR MR MT%0 MF%1 NOP @@ -8023,18 +8029,18 @@ (define_split ;; STFD LFD FMR LXSD STXSD ;; LXSD STXSD XXLOR XXLXOR GPR<-0 -;; LWZ STW MR +;; LWZ STW MR XXSPLTIDP (define_insn "*mov<mode>_hardfloat32" [(set (match_operand:FMOVE64 0 "nonimmediate_operand" "=m, d, d, <f64_p9>, wY, <f64_av>, Z, <f64_vsx>, <f64_vsx>, !r, - Y, r, !r") + Y, r, !r, wa") (match_operand:FMOVE64 1 "input_operand" "d, m, d, wY, <f64_p9>, Z, <f64_av>, <f64_vsx>, <zero_fp>, <zero_fp>, - r, Y, r"))] + r, Y, r, eF"))] "! TARGET_POWERPC64 && TARGET_HARD_FLOAT && (gpc_reg_operand (operands[0], <MODE>mode) || gpc_reg_operand (operands[1], <MODE>mode))" @@ -8051,20 +8057,21 @@ (define_insn "*mov<mode>_hardfloat32" # # # + # #" [(set_attr "type" "fpstore, fpload, fpsimple, fpload, fpstore, fpload, fpstore, veclogical, veclogical, two, - store, load, two") + store, load, two, vecperm") (set_attr "size" "64") (set_attr "length" "*, *, *, *, *, *, *, *, *, 8, - 8, 8, 8") + 8, 8, 8, *") (set_attr "isa" "*, *, *, p9v, p9v, p7v, p7v, *, *, *, - *, *, *")]) + *, *, *, p10")]) ;; STW LWZ MR G-const H-const F-const @@ -8091,19 +8098,19 @@ (define_insn "*mov<mode>_softfloat32" ;; STFD LFD FMR LXSD STXSD ;; LXSDX STXSDX XXLOR XXLXOR LI 0 ;; STD LD MR MT{CTR,LR} MF{CTR,LR} -;; NOP MFVSRD MTVSRD +;; NOP MFVSRD MTVSRD XXSPLTIDP (define_insn "*mov<mode>_hardfloat64" [(set (match_operand:FMOVE64 0 "nonimmediate_operand" "=m, d, d, <f64_p9>, wY, <f64_av>, Z, <f64_vsx>, <f64_vsx>, !r, YZ, r, !r, *c*l, !r, - *h, r, <f64_dm>") + *h, r, <f64_dm>, wa") (match_operand:FMOVE64 1 "input_operand" "d, m, d, wY, <f64_p9>, Z, <f64_av>, <f64_vsx>, <zero_fp>, <zero_fp>, r, YZ, r, r, *h, - 0, <f64_dm>, r"))] + 0, <f64_dm>, r, eF"))] "TARGET_POWERPC64 && TARGET_HARD_FLOAT && (gpc_reg_operand (operands[0], <MODE>mode) || gpc_reg_operand (operands[1], <MODE>mode))" @@ -8125,18 +8132,19 @@ (define_insn "*mov<mode>_hardfloat64" mf%1 %0 nop mfvsrd %0,%x1 - mtvsrd %x0,%1" + mtvsrd %x0,%1 + #" [(set_attr "type" "fpstore, fpload, fpsimple, fpload, fpstore, fpload, fpstore, veclogical, veclogical, integer, store, load, *, mtjmpr, mfjmpr, - *, mfvsr, mtvsr") + *, mfvsr, mtvsr, vecperm") (set_attr "size" "64") (set_attr "isa" "*, *, *, p9v, p9v, p7v, p7v, *, *, *, *, *, *, *, *, - *, p8v, p8v")]) + *, p8v, p8v, p10")]) ;; STD LD MR MT<SPR> MF<SPR> G-const ;; H-const F-const Special @@ -8170,6 +8178,7 @@ (define_insn "*mov<mode>_softfloat64" (set_attr "length" "*, *, *, *, *, 8, 12, 16, *")]) + (define_expand "mov<mode>" [(set (match_operand:FMOVE128 0 "general_operand") diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt index 0538db387dc..928c4fafe07 100644 --- a/gcc/config/rs6000/rs6000.opt +++ b/gcc/config/rs6000/rs6000.opt @@ -639,3 +639,7 @@ Enable instructions that guard against return-oriented programming attacks. mprivileged Target Var(rs6000_privileged) Init(0) Generate code that will run in privileged state. + +mxxspltidp +Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save +Generate (do not generate) XXSPLTIDP instructions. diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index bf033e31c1c..af9a04870d4 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -1191,16 +1191,19 @@ (define_insn_and_split "*xxspltib_<mode>_split" ;; instruction). But generate XXLXOR/XXLORC if it will avoid a register move. ;; VSX store VSX load VSX move VSX->GPR GPR->VSX LQ (GPR) +;; XXSPLTIDP ;; STQ (GPR) GPR load GPR store GPR move XXSPLTIB VSPLTISW ;; VSX 0/-1 VMX const GPR const LVX (VMX) STVX (VMX) (define_insn "vsx_mov<mode>_64bit" [(set (match_operand:VSX_M 0 "nonimmediate_operand" "=ZwO, wa, wa, r, we, ?wQ, + wa, ?&r, ??r, ??Y, <??r>, wa, v, ?wa, v, <??r>, wZ, v") (match_operand:VSX_M 1 "input_operand" "wa, ZwO, wa, we, r, r, + eF, wQ, Y, r, r, wE, jwM, ?jwM, W, <nW>, v, wZ"))] @@ -1212,36 +1215,44 @@ (define_insn "vsx_mov<mode>_64bit" } [(set_attr "type" "vecstore, vecload, vecsimple, mtvsr, mfvsr, load, + vecperm, store, load, store, *, vecsimple, vecsimple, vecsimple, *, *, vecstore, vecload") (set_attr "num_insns" "*, *, *, 2, *, 2, + *, 2, 2, 2, 2, *, *, *, 5, 2, *, *") (set_attr "max_prefixed_insns" "*, *, *, *, *, 2, + *, 2, 2, 2, 2, *, *, *, *, *, *, *") (set_attr "length" "*, *, *, 8, *, 8, + *, 8, 8, 8, 8, *, *, *, 20, 8, *, *") (set_attr "isa" "<VSisa>, <VSisa>, <VSisa>, *, *, *, + p10, *, *, *, *, p9v, *, <VSisa>, *, *, *, *")]) ;; VSX store VSX load VSX move GPR load GPR store GPR move +;; XXSPLTIDP ;; XXSPLTIB VSPLTISW VSX 0/-1 VMX const GPR const ;; LVX (VMX) STVX (VMX) (define_insn "*vsx_mov<mode>_32bit" [(set (match_operand:VSX_M 0 "nonimmediate_operand" "=ZwO, wa, wa, ??r, ??Y, <??r>, + wa, wa, v, ?wa, v, <??r>, wZ, v") (match_operand:VSX_M 1 "input_operand" "wa, ZwO, wa, Y, r, r, + eF, wE, jwM, ?jwM, W, <nW>, v, wZ"))] @@ -1253,14 +1264,17 @@ (define_insn "*vsx_mov<mode>_32bit" } [(set_attr "type" "vecstore, vecload, vecsimple, load, store, *, + vecperm, vecsimple, vecsimple, vecsimple, *, *, vecstore, vecload") (set_attr "length" "*, *, *, 16, 16, 16, + *, *, *, *, 20, 16, *, *") (set_attr "isa" "<VSisa>, <VSisa>, <VSisa>, *, *, *, + p10, p9v, *, <VSisa>, *, *, *, *")]) @@ -4580,6 +4594,23 @@ (define_insn "vsx_splat_<mode>_reg" mtvsrdd %x0,%1,%1" [(set_attr "type" "vecperm,vecmove")]) +(define_insn "*vsx_splat_v2df_xxspltidp" + [(set (match_operand:V2DF 0 "vsx_register_operand" "=wa") + (vec_duplicate:V2DF + (match_operand:DF 1 "xxspltidp_operand" "eF")))] + "TARGET_POWER10" +{ + HOST_WIDE_INT value; + + if (!xxspltidp_constant_p (operands[1], DFmode, &value)) + gcc_unreachable (); + + operands[2] = GEN_INT (value); + return "xxspltidp %x0,%1"; +} + [(set_attr "type" "vecperm") + (set_attr "prefixed" "yes")]) + (define_insn "vsx_splat_<mode>_mem" [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa") (vec_duplicate:VSX_D @@ -6449,15 +6480,40 @@ (define_expand "xxspltidp_v2df" DONE; }) -(define_insn "xxspltidp_v2df_inst" - [(set (match_operand:V2DF 0 "register_operand" "=wa") - (unspec:V2DF [(match_operand:SI 1 "c32bit_cint_operand" "n")] - UNSPEC_XXSPLTIDP))] +(define_mode_iterator XXSPLTIDP [SF DF V2DF]) + +(define_insn "xxspltidp_<mode>_inst" + [(set (match_operand:XXSPLTIDP 0 "register_operand" "=wa") + (unspec:XXSPLTIDP [(match_operand:SI 1 "c32bit_cint_operand" "n")] + UNSPEC_XXSPLTIDP))] "TARGET_POWER10" "xxspltidp %x0,%1" [(set_attr "type" "vecperm") (set_attr "prefixed" "yes")]) +;; Generate the XXSPLTIDP instruction to support SFmode and DFmode scalar +;; constants and V2DF vector constants where both elements are the same. The +;; constant has to be expressible as a SFmode constant that is not a SFmode +;; denormal value. +(define_insn_and_split "*xxspltidp_<mode>_internal" + [(set (match_operand:XXSPLTIDP 0 "vsx_register_operand" "=wa") + (match_operand:XXSPLTIDP 1 "xxspltidp_operand" "eF"))] + "TARGET_POWER10" + "#" + "&& 1" + [(set (match_operand:XXSPLTIDP 0 "vsx_register_operand") + (unspec:XXSPLTIDP [(match_dup 2)] UNSPEC_XXSPLTIDP))] +{ + HOST_WIDE_INT value = 0; + + if (!xxspltidp_constant_p (operands[1], <MODE>mode, &value)) + gcc_unreachable (); + + operands[2] = GEN_INT (value); +} + [(set_attr "type" "vecperm") + (set_attr "prefixed" "yes")]) + ;; XXSPLTI32DX built-in function support (define_expand "xxsplti32dx_v4si" [(set (match_operand:V4SI 0 "register_operand" "=wa") diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c new file mode 100644 index 00000000000..8f6e176f9af --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c @@ -0,0 +1,60 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ + +#include <math.h> + +/* Test generating DFmode constants with the ISA 3.1 (power10) XXSPLTIDP + instruction. */ + +double +scalar_double_0 (void) +{ + return 0.0; /* XXSPLTIB or XXLXOR. */ +} + +double +scalar_double_1 (void) +{ + return 1.0; /* XXSPLTIDP. */ +} + +#ifndef __FAST_MATH__ +double +scalar_double_m0 (void) +{ + return -0.0; /* XXSPLTIDP. */ +} + +double +scalar_double_nan (void) +{ + return __builtin_nan (""); /* XXSPLTIDP. */ +} + +double +scalar_double_inf (void) +{ + return __builtin_inf (); /* XXSPLTIDP. */ +} + +double +scalar_double_m_inf (void) /* XXSPLTIDP. */ +{ + return - __builtin_inf (); +} +#endif + +double +scalar_double_pi (void) +{ + return M_PI; /* PLFD. */ +} + +double +scalar_double_denorm (void) +{ + return 0x1p-149f; /* PLFD. */ +} + +/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c new file mode 100644 index 00000000000..72504bdfbbd --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c @@ -0,0 +1,60 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ + +#include <math.h> + +/* Test generating SFmode constants with the ISA 3.1 (power10) XXSPLTIDP + instruction. */ + +float +scalar_float_0 (void) +{ + return 0.0f; /* XXSPLTIB or XXLXOR. */ +} + +float +scalar_float_1 (void) +{ + return 1.0f; /* XXSPLTIDP. */ +} + +#ifndef __FAST_MATH__ +float +scalar_float_m0 (void) +{ + return -0.0f; /* XXSPLTIDP. */ +} + +float +scalar_float_nan (void) +{ + return __builtin_nanf (""); /* XXSPLTIDP. */ +} + +float +scalar_float_inf (void) +{ + return __builtin_inff (); /* XXSPLTIDP. */ +} + +float +scalar_float_m_inf (void) /* XXSPLTIDP. */ +{ + return - __builtin_inff (); +} +#endif + +float +scalar_float_pi (void) +{ + return (float)M_PI; /* XXSPLTIDP. */ +} + +float +scalar_float_denorm (void) +{ + return 0x1p-149f; /* PLFS. */ +} + +/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c new file mode 100644 index 00000000000..82ffc86f8aa --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c @@ -0,0 +1,64 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ + +#include <math.h> + +/* Test generating V2DFmode constants with the ISA 3.1 (power10) XXSPLTIDP + instruction. */ + +vector double +v2df_double_0 (void) +{ + return (vector double) { 0.0, 0.0 }; /* XXSPLTIB or XXLXOR. */ +} + +vector double +v2df_double_1 (void) +{ + return (vector double) { 1.0, 1.0 }; /* XXSPLTIDP. */ +} + +#ifndef __FAST_MATH__ +vector double +v2df_double_m0 (void) +{ + return (vector double) { -0.0, -0.0 }; /* XXSPLTIDP. */ +} + +vector double +v2df_double_nan (void) +{ + return (vector double) { __builtin_nan (""), + __builtin_nan ("") }; /* XXSPLTIDP. */ +} + +vector double +v2df_double_inf (void) +{ + return (vector double) { __builtin_inf (), + __builtin_inf () }; /* XXSPLTIDP. */ +} + +vector double +v2df_double_m_inf (void) +{ + return (vector double) { - __builtin_inf (), + - __builtin_inf () }; /* XXSPLTIDP. */ +} +#endif + +vector double +v2df_double_pi (void) +{ + return (vector double) { M_PI, M_PI }; /* PLVX. */ +} + +vector double +v2df_double_denorm (void) +{ + return (vector double) { (double)0x1p-149f, + (double)0x1p-149f }; /* PLVX. */ +} + +/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */