Message ID | YkHhmvwSJF7DUDhJ@toto.the-meissners.org |
---|---|
State | New |
Headers | show |
Series | Optimize vec_splats of vec_extract, PR target/99293 | expand |
On Mon, Mar 28, 2022 at 12:26:02PM -0400, Michael Meissner wrote: > However on power9 and power10 it generates: > > ;; vec_splats (vec_extract (src, 0)) > mfvsld 3,34 > mtvsrdd 34,9,9 > > ;; vec_splats (vec_extract (src, 1)) > mfvsrd 9,34 > mtvsrdd 34,9,9 > > This is due to the power9 having the mfvsrld instruction which can extract > either 64-bit element into a GPR. While there are alternatives for both > vector registers and GPR registers, the register allocator prefers to put > DImode into GPR registers. As I said in comment 2 in the PR, it is because we do not have this pattern yet, we only have vec_concat. The instruction combiner tries to use this pattern, but it doesn't exist :-) > +;; Optimize SPLAT of an extract from a V2DF/V2DI vector with a constant element > +;; PR target/99293 > +(define_insn "*vsx_splat_const_extract_<mode>" > + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa") > + (vec_duplicate:VSX_D > + (vec_select:<VS_scalar> > + (match_operand:VSX_D 1 "vsx_register_operand" "wa") > + (parallel [(match_operand 2 "const_0_to_1_operand" "n")]))))] > + "VECTOR_MEM_VSX_P (<MODE>mode)" > +{ > + int which_word = INTVAL (operands[2]); dword, not word. > + if (!BYTES_BIG_ENDIAN) > + which_word = 1 - which_word; > + > + operands[3] = GEN_INT (which_word ? 3 : 0); > + return "xxpermdi %x0,%x1,%x1,%3"; Please use gen_vsx_xxspltd_v2di, instead. Which itself should not use an unspec of course, but that is another patch. > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr99293.c > @@ -0,0 +1,36 @@ > +/* { dg-do compile { target powerpc*-*-* } } */ Don't. This is gcc.target/powerpc/ already. > +/* { dg-final { scan-assembler-times "xxpermdi" 4 } } */ \m \M Thanks, Segher
On Mon, Mar 28, 2022 at 12:14:09PM -0500, Segher Boessenkool wrote: > On Mon, Mar 28, 2022 at 12:26:02PM -0400, Michael Meissner wrote: > > However on power9 and power10 it generates: > > > > ;; vec_splats (vec_extract (src, 0)) > > mfvsld 3,34 > > mtvsrdd 34,9,9 > > > > ;; vec_splats (vec_extract (src, 1)) > > mfvsrd 9,34 > > mtvsrdd 34,9,9 > > > > This is due to the power9 having the mfvsrld instruction which can extract > > either 64-bit element into a GPR. While there are alternatives for both > > vector registers and GPR registers, the register allocator prefers to put > > DImode into GPR registers. > > As I said in comment 2 in the PR, it is because we do not have this > pattern yet, we only have vec_concat. The instruction combiner tries > to use this pattern, but it doesn't exist :-) Ok. I will create that function and see if it works. > > +;; Optimize SPLAT of an extract from a V2DF/V2DI vector with a constant element > > +;; PR target/99293 > > +(define_insn "*vsx_splat_const_extract_<mode>" > > + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa") > > + (vec_duplicate:VSX_D > > + (vec_select:<VS_scalar> > > + (match_operand:VSX_D 1 "vsx_register_operand" "wa") > > + (parallel [(match_operand 2 "const_0_to_1_operand" "n")]))))] > > + "VECTOR_MEM_VSX_P (<MODE>mode)" > > +{ > > + int which_word = INTVAL (operands[2]); > > dword, not word. Ok. > > + if (!BYTES_BIG_ENDIAN) > > + which_word = 1 - which_word; > > + > > + operands[3] = GEN_INT (which_word ? 3 : 0); > > + return "xxpermdi %x0,%x1,%x1,%3"; > > Please use gen_vsx_xxspltd_v2di, instead. Which itself should not use > an unspec of course, but that is another patch. There is also vsx_concat_<mode>_3 which does not use an UNSPEC. Maybe eventually rewrite vsx_xxspltd_v2di to be a define_insn_and_split. > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/powerpc/pr99293.c > > @@ -0,0 +1,36 @@ > > +/* { dg-do compile { target powerpc*-*-* } } */ > > Don't. This is gcc.target/powerpc/ already. Ok. > > +/* { dg-final { scan-assembler-times "xxpermdi" 4 } } */ > > \m \M Ok.
On Mon, Mar 28, 2022 at 06:30:41PM -0400, Michael Meissner wrote: > On Mon, Mar 28, 2022 at 12:14:09PM -0500, Segher Boessenkool wrote: > > On Mon, Mar 28, 2022 at 12:26:02PM -0400, Michael Meissner wrote: > > > However on power9 and power10 it generates: > > > > > > ;; vec_splats (vec_extract (src, 0)) > > > mfvsld 3,34 > > > mtvsrdd 34,9,9 > > > > > > ;; vec_splats (vec_extract (src, 1)) > > > mfvsrd 9,34 > > > mtvsrdd 34,9,9 > > > > > > This is due to the power9 having the mfvsrld instruction which can extract > > > either 64-bit element into a GPR. While there are alternatives for both > > > vector registers and GPR registers, the register allocator prefers to put > > > DImode into GPR registers. > > > > As I said in comment 2 in the PR, it is because we do not have this > > pattern yet, we only have vec_concat. The instruction combiner tries > > to use this pattern, but it doesn't exist :-) > > Ok. I will create that function and see if it works. That is what this patch *does*! *That* needs to be in the commit message! > > > + if (!BYTES_BIG_ENDIAN) > > > + which_word = 1 - which_word; > > > + > > > + operands[3] = GEN_INT (which_word ? 3 : 0); > > > + return "xxpermdi %x0,%x1,%x1,%3"; > > > > Please use gen_vsx_xxspltd_v2di, instead. Which itself should not use > > an unspec of course, but that is another patch. > > There is also vsx_concat_<mode>_3 which does not use an UNSPEC. Maybe > eventually rewrite vsx_xxspltd_v2di to be a define_insn_and_split. It almost always is a bad idea to have a 1->1 split: you are rewriting the canonical way to write something to something non-canonical. (Where "canonical" means "actually canonical, and/or what GCC generates anyway"). It doesn't save anything over an extra define_insn either, anyway? Segher
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index 15bd86dfdfb..ddafbe471dd 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -4616,6 +4616,25 @@ (define_insn "vsx_splat_v4si_di" [(set_attr "type" "vecperm") (set_attr "isa" "p8v,*")]) +;; Optimize SPLAT of an extract from a V2DF/V2DI vector with a constant element +;; PR target/99293 +(define_insn "*vsx_splat_const_extract_<mode>" + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa") + (vec_duplicate:VSX_D + (vec_select:<VS_scalar> + (match_operand:VSX_D 1 "vsx_register_operand" "wa") + (parallel [(match_operand 2 "const_0_to_1_operand" "n")]))))] + "VECTOR_MEM_VSX_P (<MODE>mode)" +{ + int which_word = INTVAL (operands[2]); + if (!BYTES_BIG_ENDIAN) + which_word = 1 - which_word; + + operands[3] = GEN_INT (which_word ? 3 : 0); + return "xxpermdi %x0,%x1,%x1,%3"; +} + [(set_attr "type" "vecperm")]) + ;; V4SF splat (ISA 3.0) (define_insn_and_split "vsx_splat_v4sf" [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,wa") diff --git a/gcc/testsuite/gcc.target/powerpc/pr99293.c b/gcc/testsuite/gcc.target/powerpc/pr99293.c new file mode 100644 index 00000000000..13b5ed5b0b1 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr99293.c @@ -0,0 +1,36 @@ +/* { dg-do compile { target powerpc*-*-* } } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mvsx" } */ + +/* Test for PR 99263, which wants to do: + __builtin_vec_splats (__builtin_vec_extract (v, n)) + + where v is a V2DF or V2DI vector and n is either 0 or 1. Previously the + compiler would do a direct move to the GPR registers to select the item and + a direct move from the GPR registers to do the splat. */ + +vector long long +splat_dup_ll_0 (vector long long v) +{ + return __builtin_vec_splats (__builtin_vec_extract (v, 0)); +} + +vector long long +splat_dup_ll_1 (vector long long v) +{ + return __builtin_vec_splats (__builtin_vec_extract (v, 1)); +} + +vector double +splat_dup_d_0 (vector double v) +{ + return __builtin_vec_splats (__builtin_vec_extract (v, 0)); +} + +vector double +splat_dup_d_1 (vector double v) +{ + return __builtin_vec_splats (__builtin_vec_extract (v, 1)); +} + +/* { dg-final { scan-assembler-times "xxpermdi" 4 } } */