Message ID | fc85253b-cc4c-02cf-3c65-717633b08d27@linux.ibm.com |
---|---|
State | New |
Series | [rs6000] new split pattern for TI to V1TI move [PR103124] |
On Sun, Dec 12, 2021 at 10:00 PM HAO CHEN GUI <guihaoc@linux.ibm.com> wrote:
>
> Hi,
> This patch defines a new split pattern for TI to V1TI move. The pattern concatenates two subreg:DI of
> a TI to a V2DI, then move the V2DI to V1TI. With the pattern, the subreg pass can do register split for
> TI when there is a TI to V1TI move. The patch optimizes one unnecessary "mr" out on P9. The new
> test case illustrates it.
>
> Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is this okay for trunk?
> Any recommendations? Thanks a lot.
>
> ChangeLog
> 2021-12-13 Haochen Gui <guihaoc@linux.ibm.com>
>
> gcc/
>         * config/rs6000/vsx.md (split pattern for TI to V1TI move): Defined.
>
> gcc/testsuite/
>         * gcc.target/powerpc/pr103124.c: New testcase.
>
>
> patch.diff
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index bf033e31c1c..7bca7780735 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -6589,3 +6589,19 @@ (define_insn "xxeval"
>     [(set_attr "type" "vecperm")
>      (set_attr "prefixed" "yes")])
>
> +;; split TI to V1TI move
> +(define_split
> +  [(set (match_operand:V1TI 0 "vsx_register_operand")
> +        (subreg:V1TI
> +          (match_operand:TI 1 "int_reg_operand") 0 ))]
> +  "TARGET_P9_VECTOR && !reload_completed"
> +  [(const_int 0)]
> +{
> +  rtx tmp1 = simplify_gen_subreg (DImode, operands[1], TImode, 0);
> +  rtx tmp2 = simplify_gen_subreg (DImode, operands[1], TImode, 8);
> +  rtx tmp3 = gen_reg_rtx (V2DImode);
> +  emit_insn (gen_vsx_concat_v2di (tmp3, tmp1, tmp2));
> +  rtx tmp4 = simplify_gen_subreg (V1TImode, tmp3, V2DImode, 0);
> +  emit_move_insn (operands[0], tmp4);
> +  DONE;
> +})
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103124.c b/gcc/testsuite/gcc.target/powerpc/pr103124.c
> new file mode 100644
> index 00000000000..724492dbcd2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr103124.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile { target { powerpc*-*-* && lp64 } } */

Please don't include the "powerpc" target selector in the
gcc.target/powerpc directory. Just use lp64.

> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
> +/* { dg-final { scan-assembler-not "\mmr\M" } } */
> +
> +vector __int128 add (long long a)
> +{
> +  vector __int128 b;
> +  b = (vector __int128) {a};
> +  return b;
> +}

Okay with that change.

Thanks, David
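The splitter's `simplify_gen_subreg (DImode, operands[1], TImode, 0)` and `simplify_gen_subreg (DImode, operands[1], TImode, 8)` calls take the two 64-bit pieces of the 128-bit TI operand at byte offsets 0 and 8, which `gen_vsx_concat_v2di` then combines into one vector register. As a rough host-side analogy only (plain C using the GCC/Clang `unsigned __int128` extension, not GCC's RTL API), the sketch below splits a 128-bit integer at those byte offsets and reassembles it in the same byte order. The names `split_ti`, `concat_ti` and `u128` are hypothetical. Which piece holds the high doubleword depends on byte order: on a big-endian target offset 0 is the high half, on a little-endian one it is the low half.

```c
#include <stdint.h>
#include <string.h>

/* Host-side analogy of taking two DImode subregs of a TImode value:
   the 64-bit pieces at byte offsets 0 and 8.  Plain C, not RTL.  */
typedef unsigned __int128 u128;

/* Hypothetical helper: copy out the pieces at byte offsets 0 and 8.  */
static void split_ti (u128 v, uint64_t *piece0, uint64_t *piece8)
{
  unsigned char bytes[sizeof v];
  memcpy (bytes, &v, sizeof v);
  memcpy (piece0, bytes, 8);
  memcpy (piece8, bytes + 8, 8);
}

/* Hypothetical helper: put the pieces back at the same byte offsets,
   so the round trip is lossless regardless of host endianness.  */
static u128 concat_ti (uint64_t piece0, uint64_t piece8)
{
  unsigned char bytes[sizeof (u128)];
  u128 v;
  memcpy (bytes, &piece0, 8);
  memcpy (bytes + 8, &piece8, 8);
  memcpy (&v, bytes, sizeof v);
  return v;
}
```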
Hi!

On Mon, Dec 13, 2021 at 05:22:06PM -0500, David Edelsohn wrote:
> On Sun, Dec 12, 2021 at 10:00 PM HAO CHEN GUI <guihaoc@linux.ibm.com> wrote:
> > --- a/gcc/config/rs6000/vsx.md
> > +++ b/gcc/config/rs6000/vsx.md
> > @@ -6589,3 +6589,19 @@ (define_insn "xxeval"
> >     [(set_attr "type" "vecperm")
> >      (set_attr "prefixed" "yes")])
> >
> > +;; split TI to V1TI move

Please comment that this splitter tries to generate mtvsrdd insns, and
don't say the obvious things :-)

> > +(define_split
> > +  [(set (match_operand:V1TI 0 "vsx_register_operand")
> > +        (subreg:V1TI
> > +          (match_operand:TI 1 "int_reg_operand") 0 ))]
> > +  "TARGET_P9_VECTOR && !reload_completed"

Why the "!reload_completed"? Is this generated after reload as well,
and that is bad for some reason?

> > +  [(const_int 0)]
> > +{
> > +  rtx tmp1 = simplify_gen_subreg (DImode, operands[1], TImode, 0);
> > +  rtx tmp2 = simplify_gen_subreg (DImode, operands[1], TImode, 8);
> > +  rtx tmp3 = gen_reg_rtx (V2DImode);
> > +  emit_insn (gen_vsx_concat_v2di (tmp3, tmp1, tmp2));
> > +  rtx tmp4 = simplify_gen_subreg (V1TImode, tmp3, V2DImode, 0);
> > +  emit_move_insn (operands[0], tmp4);
> > +  DONE;
> > +})

Ah, it is bad because it generates a pseudo.

So either you just make it work when everything is hard regs, or you do
this *and comment it*.

The first option is not very easy to do. You need to make sure you can
do those subregs (and get GPRs!), and you need to use a hard reg instead
of the new pseudo (you can use operand 0 for this here though, it can
never be the same as operand 1 :-) (but only do this if this *is* after
reload)).

But, it sounds like you actually saw problems when allowing it after
reload, so it sounds like it would actually be useful to do it then?

> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/pr103124.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do compile { target { powerpc*-*-* && lp64 } } */
>
> Please don't include the "powerpc" target selector in the
> gcc.target/powerpc directory. Just use lp64.

Or actually, don't use anything, and do a dg-require int128 instead.


Segher
Hi Segher,
Thanks for your advice. Please see my comments.

On 14/12/2021 6:59 AM, Segher Boessenkool wrote:
> Hi!
>
> On Mon, Dec 13, 2021 at 05:22:06PM -0500, David Edelsohn wrote:
>> On Sun, Dec 12, 2021 at 10:00 PM HAO CHEN GUI <guihaoc@linux.ibm.com> wrote:
>>> --- a/gcc/config/rs6000/vsx.md
>>> +++ b/gcc/config/rs6000/vsx.md
>>> @@ -6589,3 +6589,19 @@ (define_insn "xxeval"
>>>     [(set_attr "type" "vecperm")
>>>      (set_attr "prefixed" "yes")])
>>>
>>> +;; split TI to V1TI move
>
> Please comment that this splitter tries to generate mtvsrdd insns, and
> don't say the obvious things :-)
>
OK, I will modify it.

>>> +(define_split
>>> +  [(set (match_operand:V1TI 0 "vsx_register_operand")
>>> +        (subreg:V1TI
>>> +          (match_operand:TI 1 "int_reg_operand") 0 ))]
>>> +  "TARGET_P9_VECTOR && !reload_completed"
>
> Why the "!reload_completed"? Is this generated after reload as well,
> and that is bad for some reason?
>
>>> +  [(const_int 0)]
>>> +{
>>> +  rtx tmp1 = simplify_gen_subreg (DImode, operands[1], TImode, 0);
>>> +  rtx tmp2 = simplify_gen_subreg (DImode, operands[1], TImode, 8);
>>> +  rtx tmp3 = gen_reg_rtx (V2DImode);
>>> +  emit_insn (gen_vsx_concat_v2di (tmp3, tmp1, tmp2));
>>> +  rtx tmp4 = simplify_gen_subreg (V1TImode, tmp3, V2DImode, 0);
>>> +  emit_move_insn (operands[0], tmp4);
>>> +  DONE;
>>> +})
>
> Ah, it is bad because it generates a pseudo.
>
> So either you just make it work when everything is hard regs, or you do
> this *and comment it*.
>
> The first option is not very easy to do. You need to make sure you can
> do those subregs (and get GPRs!), and you need to use a hard reg instead
> of the new pseudo (you can use operand 0 for this here though, it can
> never be the same as operand 1 :-) (but only do this if this *is* after
> reload)).
>
> But, it sounds like you actually saw problems when allowing it after
> reload, so it sounds like it would actually be useful to do it then?

The purpose of this split pattern is to build the V1TI from the two DI
subregs of the TI, so that the subsequent subreg pass recognizes the TI
in the insn as splittable. As there is no subreg pass after reload, I
want the split to be done only before reload. Also, as you mentioned, my
patch generates a pseudo, so it doesn't work after reload. That's why I
set the "!reload_completed" condition.

>
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr103124.c
>>> @@ -0,0 +1,11 @@
>>> +/* { dg-do compile { target { powerpc*-*-* && lp64 } } */
>>
>> Please don't include the "powerpc" target selector in the
>> gcc.target/powerpc directory. Just use lp64.
>
> Or actually, don't use anything, and do a dg-require int128 instead.
>
Thanks, I will take it.

>
> Segher
>
```diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index bf033e31c1c..7bca7780735 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -6589,3 +6589,19 @@ (define_insn "xxeval"
   [(set_attr "type" "vecperm")
    (set_attr "prefixed" "yes")])
 
+;; split TI to V1TI move
+(define_split
+  [(set (match_operand:V1TI 0 "vsx_register_operand")
+        (subreg:V1TI
+          (match_operand:TI 1 "int_reg_operand") 0 ))]
+  "TARGET_P9_VECTOR && !reload_completed"
+  [(const_int 0)]
+{
+  rtx tmp1 = simplify_gen_subreg (DImode, operands[1], TImode, 0);
+  rtx tmp2 = simplify_gen_subreg (DImode, operands[1], TImode, 8);
+  rtx tmp3 = gen_reg_rtx (V2DImode);
+  emit_insn (gen_vsx_concat_v2di (tmp3, tmp1, tmp2));
+  rtx tmp4 = simplify_gen_subreg (V1TImode, tmp3, V2DImode, 0);
+  emit_move_insn (operands[0], tmp4);
+  DONE;
+})
diff --git a/gcc/testsuite/gcc.target/powerpc/pr103124.c b/gcc/testsuite/gcc.target/powerpc/pr103124.c
new file mode 100644
index 00000000000..724492dbcd2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr103124.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
+/* { dg-final { scan-assembler-not "\mmr\M" } } */
+
+vector __int128 add (long long a)
+{
+  vector __int128 b;
+  b = (vector __int128) {a};
+  return b;
+}
```
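In the testcase, `(vector __int128) {a}` converts the `long long` argument to a 128-bit integer, which sign-extends it. The two 64-bit halves of the TI operand that the splitter feeds to `gen_vsx_concat_v2di` are therefore a sign word (0 or all ones) and `a` itself; per Segher's review, the intent is to combine them with `mtvsrdd`, and the patch reports one `mr` saved on P9. A minimal host-side C sketch of that decomposition, assuming a compiler with `__int128` support; `int128_halves_from_ll` is a hypothetical helper, not part of the patch:

```c
#include <stdint.h>

/* Hypothetical helper (not part of the patch): the high and low 64-bit
   halves of the 128-bit value that (vector __int128) {a} holds.
   Converting a long long to __int128 sign-extends it.  */
static void int128_halves_from_ll (long long a, uint64_t *hi, uint64_t *lo)
{
  __int128 v = a;                                   /* sign extension */
  *hi = (uint64_t) ((unsigned __int128) v >> 64);   /* 0 or all ones */
  *lo = (uint64_t) v;                               /* a, modulo 2^64 */
}
```

For `a = -5` this gives `hi == UINT64_MAX` and `lo == (uint64_t) -5`; for any non-negative `a` the high half is 0.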