Message ID | 20160919220208.GA5448@ibm-tiger.the-meissners.org |
---|---|
State | New |
On Mon, Sep 19, 2016 at 06:02:08PM -0400, Michael Meissner wrote:
> vector float combine (float a, float b, float c, float d)
> {
>   return (vector float) { a, b, c, d };
> }

[ ... ]

> However ISA 2.07 (i.e. power8) added the VMRGEW instruction, which can do this
> more simply:
>
>         xxpermdi 34,1,2,0
>         xxpermdi 32,3,4,0
>         xvcvdpsp 34,34
>         xvcvdpsp 32,32
>         vmrgew 2,2,0

This results in {a,c,b,d} instead?

> --- gcc/config/rs6000/rs6000.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 240142)
> +++ gcc/config/rs6000/rs6000.c (.../gcc/config/rs6000) (working copy)
> @@ -6821,11 +6821,26 @@ rs6000_expand_vector_init (rtx target, r
>            rtx op2 = force_reg (SFmode, XVECEXP (vals, 0, 2));
>            rtx op3 = force_reg (SFmode, XVECEXP (vals, 0, 3));
>
> -          emit_insn (gen_vsx_concat_v2sf (dbl_even, op0, op1));
> -          emit_insn (gen_vsx_concat_v2sf (dbl_odd, op2, op3));
> -          emit_insn (gen_vsx_xvcvdpsp (flt_even, dbl_even));
> -          emit_insn (gen_vsx_xvcvdpsp (flt_odd, dbl_odd));
> -          rs6000_expand_extract_even (target, flt_even, flt_odd);
> +          /* Use VMRGEW if we can instead of doing a permute.  */
> +          if (TARGET_P8_VECTOR)
> +            {
> +              emit_insn (gen_vsx_concat_v2sf (dbl_even, op0, op2));
> +              emit_insn (gen_vsx_concat_v2sf (dbl_odd, op1, op3));

But this looks correct, so just the example is pastoed?

Okay for trunk if you can clear that up.  Thanks,


Segher

On Mon, Sep 19, 2016 at 05:43:19PM -0500, Segher Boessenkool wrote:
> On Mon, Sep 19, 2016 at 06:02:08PM -0400, Michael Meissner wrote:
> > vector float combine (float a, float b, float c, float d)
> > {
> >   return (vector float) { a, b, c, d };
> > }
>
> [ ... ]
>
> > However ISA 2.07 (i.e. power8) added the VMRGEW instruction, which can do this
> > more simply:
> >
> >         xxpermdi 34,1,2,0
> >         xxpermdi 32,3,4,0
> >         xvcvdpsp 34,34
> >         xvcvdpsp 32,32
> >         vmrgew 2,2,0
>
> This results in {a,c,b,d} instead?

Yes.

> > --- gcc/config/rs6000/rs6000.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 240142)
> > +++ gcc/config/rs6000/rs6000.c (.../gcc/config/rs6000) (working copy)
> > @@ -6821,11 +6821,26 @@ rs6000_expand_vector_init (rtx target, r
> >            rtx op2 = force_reg (SFmode, XVECEXP (vals, 0, 2));
> >            rtx op3 = force_reg (SFmode, XVECEXP (vals, 0, 3));
> >
> > -          emit_insn (gen_vsx_concat_v2sf (dbl_even, op0, op1));
> > -          emit_insn (gen_vsx_concat_v2sf (dbl_odd, op2, op3));
> > -          emit_insn (gen_vsx_xvcvdpsp (flt_even, dbl_even));
> > -          emit_insn (gen_vsx_xvcvdpsp (flt_odd, dbl_odd));
> > -          rs6000_expand_extract_even (target, flt_even, flt_odd);
> > +          /* Use VMRGEW if we can instead of doing a permute.  */
> > +          if (TARGET_P8_VECTOR)
> > +            {
> > +              emit_insn (gen_vsx_concat_v2sf (dbl_even, op0, op2));
> > +              emit_insn (gen_vsx_concat_v2sf (dbl_odd, op1, op3));
>
> But this looks correct, so just the example is pastoed?

Yes, I pasted the code for -mcpu=power7 and -mcpu=power8.  The original code
puts the elements in a different order, and then fixes it up with a permute.  I
changed the order so that it would match how VMRGEW works, and I tested it on
both big and little endian power8's.

The original puts the values as:

        +-------+-------+-------+-------+
        |   a   | unused|   b   | unused|
        +-------+-------+-------+-------+

        +-------+-------+-------+-------+
        |   c   | unused|   d   | unused|
        +-------+-------+-------+-------+

The VMRGEW instruction wants the registers laid out as:

        +-------+-------+-------+-------+
        |   a   | unused|   c   | unused|
        +-------+-------+-------+-------+

        +-------+-------+-------+-------+
        |   b   | unused|   d   | unused|
        +-------+-------+-------+-------+

> Okay for trunk if you can clear that up.

Did that answer the question?

> Thanks,
>
>
> Segher
>

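The two register layouts discussed above can be checked with a small lane-level model. The sketch below is only an illustration, not GCC or hardware code: it assumes big-endian word numbering, ignores the little-endian operand swap in the patch, and uses hypothetical helper names (`concat`, `cvt_dp_to_sp`, `vmrgew`, `pasted_sequence`, `patched_sequence`). The `UNUSED` sentinel is also an assumption; the real contents of those lanes are undefined.

```c
#include <assert.h>

/* Stand-in for the word lanes marked "unused" in the diagrams.
   Assumption: the real hardware leaves these lanes undefined.  */
#define UNUSED (-1.0f)

typedef struct { double d[2]; } v2df; /* two doubleword lanes */
typedef struct { float w[4]; } v4sf;  /* four word lanes, big-endian numbering */

/* Model of "xxpermdi XT,XA,XB,0": doubleword 0 of each source.  */
v2df concat (double first, double second)
{
  v2df r = { { first, second } };
  return r;
}

/* Model of xvcvdpsp: each double is converted to single and placed
   in the even word of its doubleword (words 0 and 2).  */
v4sf cvt_dp_to_sp (v2df x)
{
  v4sf r = { { (float) x.d[0], UNUSED, (float) x.d[1], UNUSED } };
  return r;
}

/* Model of vmrgew: interleave the even words of the two inputs.  */
v4sf vmrgew (v4sf a, v4sf b)
{
  v4sf r = { { a.w[0], b.w[0], a.w[2], b.w[2] } };
  return r;
}

/* Pairing from the pasted example: (a,b) and (c,d).  */
v4sf pasted_sequence (float a, float b, float c, float d)
{
  return vmrgew (cvt_dp_to_sp (concat (a, b)), cvt_dp_to_sp (concat (c, d)));
}

/* Pairing the patch emits: (a,c) and (b,d).  */
v4sf patched_sequence (float a, float b, float c, float d)
{
  return vmrgew (cvt_dp_to_sp (concat (a, c)), cvt_dp_to_sp (concat (b, d)));
}

/* Return 1 if the four lanes of V equal the given values.  */
int same_lanes (v4sf v, float w0, float w1, float w2, float w3)
{
  return v.w[0] == w0 && v.w[1] == w1 && v.w[2] == w2 && v.w[3] == w3;
}

/* Self-check of both orderings with a=1, b=2, c=3, d=4.  */
void check_orders (void)
{
  assert (same_lanes (patched_sequence (1, 2, 3, 4), 1, 2, 3, 4));
  assert (same_lanes (pasted_sequence (1, 2, 3, 4), 1, 3, 2, 4));
}
```

In this model, pairing (a,c) with (b,d) yields {a,b,c,d}, while the pairing from the pasted example yields {a,c,b,d}, which matches the question raised in the review.
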
On Mon, Sep 19, 2016 at 06:51:42PM -0400, Michael Meissner wrote:
> > > However ISA 2.07 (i.e. power8) added the VMRGEW instruction, which can do this
> > > more simply:
> > >
> > >         xxpermdi 34,1,2,0
> > >         xxpermdi 32,3,4,0
> > >         xvcvdpsp 34,34
> > >         xvcvdpsp 32,32
> > >         vmrgew 2,2,0
> >
> > This results in {a,c,b,d} instead?
>
> Yes.

[ snip ]

My question was if you typoed/pastoed/thinkoed it here and you meant

        xxpermdi 34,1,3,0
        xxpermdi 32,2,4,0
        xvcvdpsp 34,34
        xvcvdpsp 32,32
        vmrgew 2,2,0

which a) works, and b) seems to be what the code generates :-)


Segher

On Mon, Sep 19, 2016 at 06:20:59PM -0500, Segher Boessenkool wrote:
> On Mon, Sep 19, 2016 at 06:51:42PM -0400, Michael Meissner wrote:
> > > > However ISA 2.07 (i.e. power8) added the VMRGEW instruction, which can do this
> > > > more simply:
> > > >
> > > >         xxpermdi 34,1,2,0
> > > >         xxpermdi 32,3,4,0
> > > >         xvcvdpsp 34,34
> > > >         xvcvdpsp 32,32
> > > >         vmrgew 2,2,0
> > >
> > > This results in {a,c,b,d} instead?
> >
> > Yes.
>
> [ snip ]
>
> My question was if you typoed/pastoed/thinkoed it here and you meant
>
>         xxpermdi 34,1,3,0
>         xxpermdi 32,2,4,0
>         xvcvdpsp 34,34
>         xvcvdpsp 32,32
>         vmrgew 2,2,0
>
> which a) works, and b) seems to be what the code generates :-)

Yes it works on both big endian and little endian power8's.  And yes, it is the
code generated by the patch.

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 240142)
+++ gcc/config/rs6000/rs6000.c (.../gcc/config/rs6000) (working copy)
@@ -6821,11 +6821,26 @@ rs6000_expand_vector_init (rtx target, r
           rtx op2 = force_reg (SFmode, XVECEXP (vals, 0, 2));
           rtx op3 = force_reg (SFmode, XVECEXP (vals, 0, 3));
 
-          emit_insn (gen_vsx_concat_v2sf (dbl_even, op0, op1));
-          emit_insn (gen_vsx_concat_v2sf (dbl_odd, op2, op3));
-          emit_insn (gen_vsx_xvcvdpsp (flt_even, dbl_even));
-          emit_insn (gen_vsx_xvcvdpsp (flt_odd, dbl_odd));
-          rs6000_expand_extract_even (target, flt_even, flt_odd);
+          /* Use VMRGEW if we can instead of doing a permute.  */
+          if (TARGET_P8_VECTOR)
+            {
+              emit_insn (gen_vsx_concat_v2sf (dbl_even, op0, op2));
+              emit_insn (gen_vsx_concat_v2sf (dbl_odd, op1, op3));
+              emit_insn (gen_vsx_xvcvdpsp (flt_even, dbl_even));
+              emit_insn (gen_vsx_xvcvdpsp (flt_odd, dbl_odd));
+              if (BYTES_BIG_ENDIAN)
+                emit_insn (gen_p8_vmrgew_v4sf_direct (target, flt_even, flt_odd));
+              else
+                emit_insn (gen_p8_vmrgew_v4sf_direct (target, flt_odd, flt_even));
+            }
+          else
+            {
+              emit_insn (gen_vsx_concat_v2sf (dbl_even, op0, op1));
+              emit_insn (gen_vsx_concat_v2sf (dbl_odd, op2, op3));
+              emit_insn (gen_vsx_xvcvdpsp (flt_even, dbl_even));
+              emit_insn (gen_vsx_xvcvdpsp (flt_odd, dbl_odd));
+              rs6000_expand_extract_even (target, flt_even, flt_odd);
+            }
         }
       return;
     }
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 240142)
+++ gcc/config/rs6000/altivec.md (.../gcc/config/rs6000) (working copy)
@@ -141,6 +141,7 @@ (define_c_enum "unspec"
    UNSPEC_VMRGH_DIRECT
    UNSPEC_VMRGL_DIRECT
    UNSPEC_VSPLT_DIRECT
+   UNSPEC_VMRGEW_DIRECT
    UNSPEC_VSUMSWS_DIRECT
    UNSPEC_VADDCUQ
    UNSPEC_VADDEUQM
@@ -1340,6 +1341,15 @@ (define_insn "p8_vmrgow"
 }
   [(set_attr "type" "vecperm")])
 
+(define_insn "p8_vmrgew_v4sf_direct"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+        (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "v")
+                      (match_operand:V4SF 2 "register_operand" "v")]
+                     UNSPEC_VMRGEW_DIRECT))]
+  "TARGET_P8_VECTOR"
+  "vmrgew %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
 (define_expand "vec_widen_umult_even_v16qi"
   [(use (match_operand:V8HI 0 "register_operand" ""))
    (use (match_operand:V16QI 1 "register_operand" ""))