Message ID | a342d677-867e-e5a2-dd56-b6ba784c1d50@suse.com |
---|---|
State | New |
Headers | show |
Series | [v2] x86: make VPTERNLOG* usable on less than 512-bit operands with just AVX512F | expand |
> -----Original Message----- > From: Jan Beulich <jbeulich@suse.com> > Sent: Friday, June 16, 2023 2:22 PM > To: gcc-patches@gcc.gnu.org > Cc: Kirill Yukhin <kirill.yukhin@gmail.com>; Liu, Hongtao > <hongtao.liu@intel.com> > Subject: [PATCH v2] x86: make VPTERNLOG* usable on less than 512-bit > operands with just AVX512F > > There's no reason to constrain this to AVX512VL, unless instructed so by - > mprefer-vector-width=, as the wider operation is unusable for more narrow > operands only when the possible memory source is a non-broadcast one. > This way even the scalar copysign<mode>3 can benefit from the operation > being a single-insn one (leaving aside moves which the compiler decides to > insert for unclear reasons, and leaving aside the fact that > bcst_mem_operand() is too restrictive for broadcast to be embedded right > into VPTERNLOG*). > > Along with this also request value duplication in ix86_expand_copysign()'s > call to ix86_build_signbit_mask(), eliminating excess space allocation > in .rodata.*, filled with zeros which are never read. > > gcc/ > > * config/i386/i386-expand.cc (ix86_expand_copysign): Request > value duplication by ix86_build_signbit_mask() when AVX512F and > not HFmode. > * config/i386/sse.md (*<avx512>_vternlog<mode>_all): Convert to > 2-alternative form. Adjust "mode" attribute. Add "enabled" > attribute. > (*<avx512>_vpternlog<mode>_1): Also permit when > TARGET_AVX512F > && !TARGET_PREFER_AVX256. > (*<avx512>_vpternlog<mode>_2): Likewise. > (*<avx512>_vpternlog<mode>_3): Likewise. > --- > I guess the underlying pattern, going along the lines of what > <mask_codefor>one_cmpl<mode>2<mask_name> uses, can be applied > elsewhere as well. > > HFmode could use embedded broadcast too for copysign and alike, but that > would need to be V2HF -> V8HF (for which I don't think there are any existing > patterns). > --- > v2: Respect -mprefer-vector-width=. > > --- a/gcc/config/i386/i386-expand.cc > +++ b/gcc/config/i386/i386-expand.cc > @@ -2266,7 +2266,7 @@ ix86_expand_copysign (rtx operands[]) > else > dest = NULL_RTX; > op1 = lowpart_subreg (vmode, force_reg (mode, operands[2]), mode); > - mask = ix86_build_signbit_mask (vmode, 0, 0); > + mask = ix86_build_signbit_mask (vmode, TARGET_AVX512F && mode != > + HFmode, 0); > > if (CONST_DOUBLE_P (operands[1])) > { > --- a/gcc/config/i386/sse.md > +++ b/gcc/config/i386/sse.md > @@ -12597,11 +12597,11 @@ > (set_attr "mode" "<sseinsnmode>")]) > > (define_insn "*<avx512>_vternlog<mode>_all" > - [(set (match_operand:V 0 "register_operand" "=v") > + [(set (match_operand:V 0 "register_operand" "=v,v") > (unspec:V > - [(match_operand:V 1 "register_operand" "0") > - (match_operand:V 2 "register_operand" "v") > - (match_operand:V 3 "bcst_vector_operand" "vmBr") > + [(match_operand:V 1 "register_operand" "0,0") > + (match_operand:V 2 "register_operand" "v,v") > + (match_operand:V 3 "bcst_vector_operand" "vBr,m") > (match_operand:SI 4 "const_0_to_255_operand")] > UNSPEC_VTERNLOG))] > "TARGET_AVX512F Change condition to <MODE_SIZE> == 64 || TARGET_AVX512VL || (TARGET_AVX512F && !TARGET_PREFER_AVX256) Also please add a testcase for case TARGET_AVX512F && !TARGET_PREFER_AVX256. > @@ -12609,10 +12609,22 @@ > it's not real AVX512FP16 instruction. */ > && (GET_MODE_SIZE (GET_MODE_INNER (<MODE>mode)) >= 4 > || GET_CODE (operands[3]) != VEC_DUPLICATE)" > - "vpternlog<ternlogsuffix>\t{%4, %3, %2, %0|%0, %2, %3, %4}" > +{ > + if (TARGET_AVX512VL) > + return "vpternlog<ternlogsuffix>\t{%4, %3, %2, %0|%0, %2, %3, %4}"; > + else > + return "vpternlog<ternlogsuffix>\t{%4, %g3, %g2, %g0|%g0, %g2, %g3, > +%4}"; } > [(set_attr "type" "sselog") > (set_attr "prefix" "evex") > - (set_attr "mode" "<sseinsnmode>")]) > + (set (attr "mode") > + (if_then_else (match_test "TARGET_AVX512VL") > + (const_string "<sseinsnmode>") > + (const_string "XI"))) > + (set (attr "enabled") > + (if_then_else (eq_attr "alternative" "1") > + (symbol_ref "<MODE_SIZE> == 64 || TARGET_AVX512VL") > + (const_string "*")))]) > > ;; There must be lots of other combinations like ;; @@ -12641,7 +12653,8 > @@ > (any_logic2:V > (match_operand:V 3 "regmem_or_bitnot_regmem_operand") > (match_operand:V 4 "regmem_or_bitnot_regmem_operand"))))] > - "(<MODE_SIZE> == 64 || TARGET_AVX512VL) > + "(<MODE_SIZE> == 64 || TARGET_AVX512VL > + || (TARGET_AVX512F && !TARGET_PREFER_AVX256)) > && ix86_pre_reload_split () > && (rtx_equal_p (STRIP_UNARY (operands[1]), > STRIP_UNARY (operands[4])) > @@ -12725,7 +12738,8 @@ > (match_operand:V 2 "regmem_or_bitnot_regmem_operand")) > (match_operand:V 3 "regmem_or_bitnot_regmem_operand")) > (match_operand:V 4 "regmem_or_bitnot_regmem_operand")))] > - "(<MODE_SIZE> == 64 || TARGET_AVX512VL) > + "(<MODE_SIZE> == 64 || TARGET_AVX512VL > + || (TARGET_AVX512F && !TARGET_PREFER_AVX256)) > && ix86_pre_reload_split () > && (rtx_equal_p (STRIP_UNARY (operands[1]), > STRIP_UNARY (operands[4])) > @@ -12808,7 +12822,8 @@ > (match_operand:V 1 "regmem_or_bitnot_regmem_operand") > (match_operand:V 2 "regmem_or_bitnot_regmem_operand")) > (match_operand:V 3 "regmem_or_bitnot_regmem_operand")))] > - "(<MODE_SIZE> == 64 || TARGET_AVX512VL) > + "(<MODE_SIZE> == 64 || TARGET_AVX512VL > + || (TARGET_AVX512F && !TARGET_PREFER_AVX256)) > && ix86_pre_reload_split ()" > "#" > "&& 1"
On 19.06.2023 04:07, Liu, Hongtao wrote: >> -----Original Message----- >> From: Jan Beulich <jbeulich@suse.com> >> Sent: Friday, June 16, 2023 2:22 PM >> >> --- a/gcc/config/i386/sse.md >> +++ b/gcc/config/i386/sse.md >> @@ -12597,11 +12597,11 @@ >> (set_attr "mode" "<sseinsnmode>")]) >> >> (define_insn "*<avx512>_vternlog<mode>_all" >> - [(set (match_operand:V 0 "register_operand" "=v") >> + [(set (match_operand:V 0 "register_operand" "=v,v") >> (unspec:V >> - [(match_operand:V 1 "register_operand" "0") >> - (match_operand:V 2 "register_operand" "v") >> - (match_operand:V 3 "bcst_vector_operand" "vmBr") >> + [(match_operand:V 1 "register_operand" "0,0") >> + (match_operand:V 2 "register_operand" "v,v") >> + (match_operand:V 3 "bcst_vector_operand" "vBr,m") >> (match_operand:SI 4 "const_0_to_255_operand")] >> UNSPEC_VTERNLOG))] >> "TARGET_AVX512F > Change condition to <MODE_SIZE> == 64 || TARGET_AVX512VL || (TARGET_AVX512F && !TARGET_PREFER_AVX256) May I ask why you think this is necessary? The condition of the insn already wasn't in sync with the condition used in all three splitters, and I didn't see any reason why now they would need to be brought in sync. First and foremost because of the use of the UNSPEC (equally before and after this patch). Furthermore, isn't it the case that I'm already mostly expressing this with the "enabled" attribute? At the very least I think I should drop that again then if following your request? > Also please add a testcase for case TARGET_AVX512F && !TARGET_PREFER_AVX256. Especially in a case like this one I'm wondering about the usefulness of a contrived testcase: It won't test more than one minor sub-case of the whole set of constructs covered here. But well, here as well as for the other change I'll invent something. Jan
On Mon, Jun 19, 2023 at 3:09 PM Jan Beulich via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > On 19.06.2023 04:07, Liu, Hongtao wrote: > >> -----Original Message----- > >> From: Jan Beulich <jbeulich@suse.com> > >> Sent: Friday, June 16, 2023 2:22 PM > >> > >> --- a/gcc/config/i386/sse.md > >> +++ b/gcc/config/i386/sse.md > >> @@ -12597,11 +12597,11 @@ > >> (set_attr "mode" "<sseinsnmode>")]) > >> > >> (define_insn "*<avx512>_vternlog<mode>_all" > >> - [(set (match_operand:V 0 "register_operand" "=v") > >> + [(set (match_operand:V 0 "register_operand" "=v,v") > >> (unspec:V > >> - [(match_operand:V 1 "register_operand" "0") > >> - (match_operand:V 2 "register_operand" "v") > >> - (match_operand:V 3 "bcst_vector_operand" "vmBr") > >> + [(match_operand:V 1 "register_operand" "0,0") > >> + (match_operand:V 2 "register_operand" "v,v") > >> + (match_operand:V 3 "bcst_vector_operand" "vBr,m") > >> (match_operand:SI 4 "const_0_to_255_operand")] > >> UNSPEC_VTERNLOG))] > >> "TARGET_AVX512F > > Change condition to <MODE_SIZE> == 64 || TARGET_AVX512VL || (TARGET_AVX512F && !TARGET_PREFER_AVX256) > > May I ask why you think this is necessary? The condition of the insn > already wasn't in sync with the condition used in all three splitters, I think it's a latent bug for the original *<avx512>_vternlog<mode>_all, it should be <MODE_SIZE> == 64 || TARGET_AVX512VL instead of TARGET_AVX512F, since TARGET_AVX512VL is needed for 128/256-bit vectors. The bug won't be exposed since the pattern is only generated by those 3 splitters which are guarded by TARGET_AVX512VL. But We can just fix this to make the pattern exactly correct. > and I didn't see any reason why now they would need to be brought in > sync. First and foremost because of the use of the UNSPEC (equally > before and after this patch). > > Furthermore, isn't it the case that I'm already mostly expressing > this with the "enabled" attribute? At the very least I think I > should drop that again then if following your request? You only handle alternative 1, but for alternative 0, it is still enabled when TARGET_PREFER_AVX256 && !TARGET_AVX512VL for 128/256-bit vectors. You don't need the drop that, alternative 1 still needs <MODE_SIZE> == 64 || TARGET_AVX512VL since memory_operand can't be operated with the zmm instruction for 128/256-bit vectors. > > > Also please add a testcase for case TARGET_AVX512F && !TARGET_PREFER_AVX256. > > Especially in a case like this one I'm wondering about the usefulness > of a contrived testcase: It won't test more than one minor sub-case of > the whole set of constructs covered here. But well, here as well as > for the other change I'll invent something. We don't need all sub-case, one is enough to guard that your optimization won't be corrupted by a later commit. .i.e typedef int v4si __attribute((vector_size(16))); v4si foo (v4si a, v4si b, v4si c) { return (a & b) | c; } We can now generate vpternlog with -mavx512f -O2 -mno-avx512vl -mprefer-vector-width=512. > > Jan
--- a/gcc/config/i386/i386-expand.cc +++ b/gcc/config/i386/i386-expand.cc @@ -2266,7 +2266,7 @@ ix86_expand_copysign (rtx operands[]) else dest = NULL_RTX; op1 = lowpart_subreg (vmode, force_reg (mode, operands[2]), mode); - mask = ix86_build_signbit_mask (vmode, 0, 0); + mask = ix86_build_signbit_mask (vmode, TARGET_AVX512F && mode != HFmode, 0); if (CONST_DOUBLE_P (operands[1])) { --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -12597,11 +12597,11 @@ (set_attr "mode" "<sseinsnmode>")]) (define_insn "*<avx512>_vternlog<mode>_all" - [(set (match_operand:V 0 "register_operand" "=v") + [(set (match_operand:V 0 "register_operand" "=v,v") (unspec:V - [(match_operand:V 1 "register_operand" "0") - (match_operand:V 2 "register_operand" "v") - (match_operand:V 3 "bcst_vector_operand" "vmBr") + [(match_operand:V 1 "register_operand" "0,0") + (match_operand:V 2 "register_operand" "v,v") + (match_operand:V 3 "bcst_vector_operand" "vBr,m") (match_operand:SI 4 "const_0_to_255_operand")] UNSPEC_VTERNLOG))] "TARGET_AVX512F @@ -12609,10 +12609,22 @@ it's not real AVX512FP16 instruction. */ && (GET_MODE_SIZE (GET_MODE_INNER (<MODE>mode)) >= 4 || GET_CODE (operands[3]) != VEC_DUPLICATE)" - "vpternlog<ternlogsuffix>\t{%4, %3, %2, %0|%0, %2, %3, %4}" +{ + if (TARGET_AVX512VL) + return "vpternlog<ternlogsuffix>\t{%4, %3, %2, %0|%0, %2, %3, %4}"; + else + return "vpternlog<ternlogsuffix>\t{%4, %g3, %g2, %g0|%g0, %g2, %g3, %4}"; +} [(set_attr "type" "sselog") (set_attr "prefix" "evex") - (set_attr "mode" "<sseinsnmode>")]) + (set (attr "mode") + (if_then_else (match_test "TARGET_AVX512VL") + (const_string "<sseinsnmode>") + (const_string "XI"))) + (set (attr "enabled") + (if_then_else (eq_attr "alternative" "1") + (symbol_ref "<MODE_SIZE> == 64 || TARGET_AVX512VL") + (const_string "*")))]) ;; There must be lots of other combinations like ;; @@ -12641,7 +12653,8 @@ (any_logic2:V (match_operand:V 3 "regmem_or_bitnot_regmem_operand") (match_operand:V 4 "regmem_or_bitnot_regmem_operand"))))] - "(<MODE_SIZE> == 64 || TARGET_AVX512VL) + "(<MODE_SIZE> == 64 || TARGET_AVX512VL + || (TARGET_AVX512F && !TARGET_PREFER_AVX256)) && ix86_pre_reload_split () && (rtx_equal_p (STRIP_UNARY (operands[1]), STRIP_UNARY (operands[4])) @@ -12725,7 +12738,8 @@ (match_operand:V 2 "regmem_or_bitnot_regmem_operand")) (match_operand:V 3 "regmem_or_bitnot_regmem_operand")) (match_operand:V 4 "regmem_or_bitnot_regmem_operand")))] - "(<MODE_SIZE> == 64 || TARGET_AVX512VL) + "(<MODE_SIZE> == 64 || TARGET_AVX512VL + || (TARGET_AVX512F && !TARGET_PREFER_AVX256)) && ix86_pre_reload_split () && (rtx_equal_p (STRIP_UNARY (operands[1]), STRIP_UNARY (operands[4])) @@ -12808,7 +12822,8 @@ (match_operand:V 1 "regmem_or_bitnot_regmem_operand") (match_operand:V 2 "regmem_or_bitnot_regmem_operand")) (match_operand:V 3 "regmem_or_bitnot_regmem_operand")))] - "(<MODE_SIZE> == 64 || TARGET_AVX512VL) + "(<MODE_SIZE> == 64 || TARGET_AVX512VL + || (TARGET_AVX512F && !TARGET_PREFER_AVX256)) && ix86_pre_reload_split ()" "#" "&& 1"