diff mbox series

i386: Fix aes/vaes patterns [PR114576]

Message ID Zg5nzNtvhCoTxaBi@tucnak
State New
Headers show
Series i386: Fix aes/vaes patterns [PR114576] | expand

Commit Message

Jakub Jelinek April 4, 2024, 8:41 a.m. UTC
On Wed, Apr 19, 2023 at 02:40:59AM +0000, Jiang, Haochen via Gcc-patches wrote:
> > >  (define_insn "aesenc"
> > > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > > -       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> > > -                      (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> > > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> > > +       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> > > +                      (match_operand:V2DI 2 "vector_operand"
> > > + "xBm,xm,vm")]
> > >                       UNSPEC_AESENC))]
> > > -  "TARGET_AES"
> > > +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
> > >    "@
> > >     aesenc\t{%2, %0|%0, %2}
> > > +   vaesenc\t{%2, %1, %0|%0, %1, %2}
> > >     vaesenc\t{%2, %1, %0|%0, %1, %2}"
> > > -  [(set_attr "isa" "noavx,avx")
> > > +  [(set_attr "isa" "noavx,aes,avx512vl")
> > Shouldn't it be vaes_avx512vl and then remove " || (TARGET_VAES &&
> > TARGET_AVX512VL)" from condition.
> 
> Since VAES should not imply AES, we need that "|| (TARGET_VAES && 
> TARGET_AVX512VL)"
> 
> And there is no need to add vaes_avx512vl since the last alternative will only
> be hit when there is no aes. When there is no aes, the pattern will need vaes
> and avx512vl both or we could not use this pattern. avx512vl here is just like
> a placeholder.

As the following testcase shows, the above change was incorrect.

Using aes isa for the second alternative is obviously wrong, aes is enabled
whenever -maes is, regardless of -mavx or -mno-avx, so the above change
means that for -maes -mno-avx RA can choose, either it matches the first
alternative with the dup operand, or it matches the second one (but that
is of course wrong because vaesenc VEX encoded insn needs AES & AVX CPUID).

The big question is if "Since VAES should not imply AES" is the case or not.
Looking around at what LLVM does on godbolt, seems since clang 6 which added
-mvaes support -mvaes there implies -maes, but GCC treats those two
independent.

Now, if we'd take the LLVM path of making -mvaes imply -maes and -mno-aes
imply -mno-vaes, then we should probably just revert the above patch and
tweak common/config/i386/ to do the implications (+ add the testcase from
this patch).

If we keep the current behavior, where AES and VAES are completely
independent extensions, then we need to do more changes as the following
patch attempts to do.
We should use the aesenc etc. insns for noavx as before, we know at that
point that TARGET_AES must be true because (TARGET_VAES && TARGET_AVX512VL)
won't be true when !TARGET_AVX - TARGET_AVX512VL implies TARGET_AVX.
For the second alternative, i.e. the AVX AES VEX encoded case, the patch
uses aes_avx isa which requires both.  Now, for the third one we can't
use avx512vl isa attribute, because one could compile with
-maes -mavx512vl -mno-vaes and in that case we want VEX encoded vaesenc
which can't use %xmm16+ (nor EGPRs), so we need vaes_avx512vl isa to
ensure it is enabled only for -mvaes -mavx512vl.  And there is another
problem, with -mno-aes -mvaes -mavx512vl we could emit VEX encoded vaesenc
which requires AES and AVX ISAs rather than the VAES and AVX512VL which
are enabled.  So the patch uses the {evex} prefix for those cases.
And similarly for the vaes*_<mode> instructions, if they aren't 128-bit
or use %xmm16+ registers, the current case is fine, but if they are 128-bit
and use only %xmm0-15 registers, assembler would again emit VEX encoded insn
which needs AES & AVX CPUID, rather than the EVEX encoded ones which need
VAES & AVX512VL CPUIDs.
Still, I wonder if -mvaes shouldn't imply at least -mavx512f and
-mno-avx512f shouldn't imply -mno-vaes, because otherwise can't see how
it could use 512-bit registers (this part not done in the patch).

The following patch has been successfully bootstrapped/regtested on
x86_64-linux and i686-linux.

2024-04-04  Jakub Jelinek  <jakub@redhat.com>

	PR target/114576
	* config/i386/i386.md (isa): Remove aes, add aes_avx, vaes_avx512vl.
	(enabled): Remove aes isa check, add aes_avx and vaes_avx512vl.
	* config/i386/sse.md (aesenc, aesenclast, aesdec, aesdeclast): Add
	4th alternative, emit {evex} prefix for the third one, use
	noavx,aes_avx,vaes_avx512vl,vaes_avx512vl isa attribute, use jm
	rather than m constraint on the 2nd and 3rd alternative input.
	(vaesdec_<mode>, vaesdeclast_<mode>, vaesenc_<mode>,
	vaesenclast_<mode>): Add second alternative with x instead of v
	and jm instead of m.

	* gcc.target/i386/aes-pr114576.c: New test.



	Jakub

Comments

Jiang, Haochen April 8, 2024, 12:33 p.m. UTC | #1
Hi Jakub,

Sorry for the late response since I am on vacation for now.

> As the following testcase shows, the above change was incorrect.
> 
> Using aes isa for the second alternative is obviously wrong, aes is enabled
> whenever -maes is, regardless of -mavx or -mno-avx, so the above change
> means that for -maes -mno-avx RA can choose, either it matches the first
> alternative with the dup operand, or it matches the second one (but that
> is of course wrong because vaesenc VEX encoded insn needs AES & AVX CPUID).

When I wrote that patch, I suppose it will never match the second one when
AVX is not enabled because it will immediately drop to the first one so the
second one is automatically AES && AVX, which is tricky here.

But this patch is buggy when "-maes -mavx512vl -mno-vaes" with %xmm16+ so
your change is needed, really appreciate that.

> 
> The big question is if "Since VAES should not imply AES" is the case or not.
> Looking around at what LLVM does on godbolt, seems since clang 6 which added
> -mvaes support -mvaes there implies -maes, but GCC treats those two
> independent.
> 
> Now, if we'd take the LLVM path of making -mvaes imply -maes and -mno-aes
> imply -mno-vaes, then we should probably just revert the above patch and
> tweak common/config/i386/ to do the implications (+ add the testcase from
> this patch).

LLVM always had less restrictions on ISA under such circumstances, I would like to
stick to how SDM did when implementing that, which is a little conservative.

However, I am also ok with VAES implying AES if there is no real HW that has
VAES w/o AES to reduce complexity in this scenario.

Thx,
Haochen
Jakub Jelinek April 8, 2024, 12:43 p.m. UTC | #2
On Mon, Apr 08, 2024 at 12:33:39PM +0000, Jiang, Haochen wrote:
> Sorry for the late response since I am on vacation for now.
> 
> > As the following testcase shows, the above change was incorrect.
> > 
> > Using aes isa for the second alternative is obviously wrong, aes is enabled
> > whenever -maes is, regardless of -mavx or -mno-avx, so the above change
> > means that for -maes -mno-avx RA can choose, either it matches the first
> > alternative with the dup operand, or it matches the second one (but that
> > is of course wrong because vaesenc VEX encoded insn needs AES & AVX CPUID).
> 
> When I wrote that patch, I suppose it will never match the second one when
> AVX is not enabled because it will immediately drop to the first one so the
> second one is automatically AES && AVX, which is tricky here.

Before the -mvaes changes the alternatives were noavx,avx isa and so clearly
it was either the first alternative is the solely available, or the second,
depending on TARGET_AVX.  But with noavx,aes on the first alternative is
enabled only for !TARGET_AVX, but the second one whenever TARGET_AES, which
is both if !TARGET_AVX and TARGET_AVX.  So, the RA is free to consider both
alternatives, and because the first one is more restrictive (requires
output matching input), if there is a match between those, it will use the
first alternative, but if there isn't, it will happily use the second
alternative.

> LLVM always had less restrictions on ISA under such circumstances, I would like to
> stick to how SDM did when implementing that, which is a little conservative.
> 
> However, I am also ok with VAES implying AES if there is no real HW that has
> VAES w/o AES to reduce complexity in this scenario.

I'm fine with -mvaes not implying -maes, just want to mention that it is
fairly user visible thing and so we shouldn't be changing it after deciding
if we do it one way or another.  Now, I thought -mvaes was added in GCC 14,
but it has been around for a few years, so that means it is likely a bad
idea to change it now.

	Jakub
Jiang, Haochen April 8, 2024, 12:46 p.m. UTC | #3
> -----Original Message-----
> From: Jakub Jelinek <jakub@redhat.com>
> Sent: Monday, April 8, 2024 9:43 PM
> To: Jiang, Haochen <haochen.jiang@intel.com>
> Cc: Hongtao Liu <crazylht@gmail.com>; gcc-patches@gcc.gnu.org; Liu, Hongtao
> <hongtao.liu@intel.com>; ubizjak@gmail.com
> Subject: Re: [PATCH] i386: Fix aes/vaes patterns [PR114576]
> 
> On Mon, Apr 08, 2024 at 12:33:39PM +0000, Jiang, Haochen wrote:
> > Sorry for the late response since I am on vacation for now.
> >
> > > As the following testcase shows, the above change was incorrect.
> > >
> > > Using aes isa for the second alternative is obviously wrong, aes is enabled
> > > whenever -maes is, regardless of -mavx or -mno-avx, so the above change
> > > means that for -maes -mno-avx RA can choose, either it matches the first
> > > alternative with the dup operand, or it matches the second one (but that
> > > is of course wrong because vaesenc VEX encoded insn needs AES & AVX
> CPUID).
> >
> > When I wrote that patch, I suppose it will never match the second one when
> > AVX is not enabled because it will immediately drop to the first one so the
> > second one is automatically AES && AVX, which is tricky here.
> 
> Before the -mvaes changes the alternatives were noavx,avx isa and so clearly
> it was either the first alternative is the solely available, or the second,
> depending on TARGET_AVX.  But with noavx,aes on the first alternative is
> enabled only for !TARGET_AVX, but the second one whenever TARGET_AES, which
> is both if !TARGET_AVX and TARGET_AVX.  So, the RA is free to consider both
> alternatives, and because the first one is more restrictive (requires
> output matching input), if there is a match between those, it will use the
> first alternative, but if there isn't, it will happily use the second
> alternative.
> 

Aha, I see. Thanks for the explanation.

Thx,
Haochen
Hongtao Liu April 9, 2024, 3:23 a.m. UTC | #4
On Thu, Apr 4, 2024 at 4:42 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Wed, Apr 19, 2023 at 02:40:59AM +0000, Jiang, Haochen via Gcc-patches wrote:
> > > >  (define_insn "aesenc"
> > > > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > > > -       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> > > > -                      (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> > > > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> > > > +       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> > > > +                      (match_operand:V2DI 2 "vector_operand"
> > > > + "xBm,xm,vm")]
> > > >                       UNSPEC_AESENC))]
> > > > -  "TARGET_AES"
> > > > +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
> > > >    "@
> > > >     aesenc\t{%2, %0|%0, %2}
> > > > +   vaesenc\t{%2, %1, %0|%0, %1, %2}
> > > >     vaesenc\t{%2, %1, %0|%0, %1, %2}"
> > > > -  [(set_attr "isa" "noavx,avx")
> > > > +  [(set_attr "isa" "noavx,aes,avx512vl")
> > > Shouldn't it be vaes_avx512vl and then remove " || (TARGET_VAES &&
> > > TARGET_AVX512VL)" from condition.
> >
> > Since VAES should not imply AES, we need that "|| (TARGET_VAES &&
> > TARGET_AVX512VL)"
> >
> > And there is no need to add vaes_avx512vl since the last alternative will only
> > be hit when there is no aes. When there is no aes, the pattern will need vaes
> > and avx512vl both or we could not use this pattern. avx512vl here is just like
> > a placeholder.
>
> As the following testcase shows, the above change was incorrect.
>
> Using aes isa for the second alternative is obviously wrong, aes is enabled
> whenever -maes is, regardless of -mavx or -mno-avx, so the above change
> means that for -maes -mno-avx RA can choose, either it matches the first
> alternative with the dup operand, or it matches the second one (but that
> is of course wrong because vaesenc VEX encoded insn needs AES & AVX CPUID).
>
> The big question is if "Since VAES should not imply AES" is the case or not.
> Looking around at what LLVM does on godbolt, seems since clang 6 which added
> -mvaes support -mvaes there implies -maes, but GCC treats those two
> independent.
>
> Now, if we'd take the LLVM path of making -mvaes imply -maes and -mno-aes
> imply -mno-vaes, then we should probably just revert the above patch and
> tweak common/config/i386/ to do the implications (+ add the testcase from
> this patch).
>
> If we keep the current behavior, where AES and VAES are completely
> independent extensions, then we need to do more changes as the following
> patch attempts to do.
> We should use the aesenc etc. insns for noavx as before, we know at that
> point that TARGET_AES must be true because (TARGET_VAES && TARGET_AVX512VL)
> won't be true when !TARGET_AVX - TARGET_AVX512VL implies TARGET_AVX.
> For the second alternative, i.e. the AVX AES VEX encoded case, the patch
> uses aes_avx isa which requires both.  Now, for the third one we can't
> use avx512vl isa attribute, because one could compile with
> -maes -mavx512vl -mno-vaes and in that case we want VEX encoded vaesenc
> which can't use %xmm16+ (nor EGPRs), so we need vaes_avx512vl isa to
> ensure it is enabled only for -mvaes -mavx512vl.  And there is another
> problem, with -mno-aes -mvaes -mavx512vl we could emit VEX encoded vaesenc
> which requires AES and AVX ISAs rather than the VAES and AVX512VL which
> are enabled.  So the patch uses the {evex} prefix for those cases.
> And similarly for the vaes*_<mode> instructions, if they aren't 128-bit
> or use %xmm16+ registers, the current case is fine, but if they are 128-bit
> and use only %xmm0-15 registers, assembler would again emit VEX encoded insn
> which needs AES & AVX CPUID, rather than the EVEX encoded ones which need
> VAES & AVX512VL CPUIDs.
> Still, I wonder if -mvaes shouldn't imply at least -mavx512f and
> -mno-avx512f shouldn't imply -mno-vaes, because otherwise can't see how
> it could use 512-bit registers (this part not done in the patch).
>
> The following patch has been successfully bootstrapped/regtested on
> x86_64-linux and i686-linux.
>
> 2024-04-04  Jakub Jelinek  <jakub@redhat.com>
>
>         PR target/114576
>         * config/i386/i386.md (isa): Remove aes, add aes_avx, vaes_avx512vl.
>         (enabled): Remove aes isa check, add aes_avx and vaes_avx512vl.
>         * config/i386/sse.md (aesenc, aesenclast, aesdec, aesdeclast): Add
>         4th alternative, emit {evex} prefix for the third one, use
>         noavx,aes_avx,vaes_avx512vl,vaes_avx512vl isa attribute, use jm
>         rather than m constraint on the 2nd and 3rd alternative input.
>         (vaesdec_<mode>, vaesdeclast_<mode>, vaesenc_<mode>,
>         vaesenclast_<mode>): Add second alternative with x instead of v
>         and jm instead of m.
>
>         * gcc.target/i386/aes-pr114576.c: New test.
>
> --- gcc/config/i386/i386.md.jj  2024-03-18 22:15:43.165839479 +0100
> +++ gcc/config/i386/i386.md     2024-04-04 00:48:46.575511556 +0200
> @@ -568,13 +568,14 @@ (define_attr "unit" "integer,i387,sse,mm
>
>  ;; Used to control the "enabled" attribute on a per-instruction basis.
>  (define_attr "isa" "base,x64,nox64,x64_sse2,x64_sse4,x64_sse4_noavx,
> -                   x64_avx,x64_avx512bw,x64_avx512dq,aes,apx_ndd,
> +                   x64_avx,x64_avx512bw,x64_avx512dq,apx_ndd,
>                     sse_noavx,sse2,sse2_noavx,sse3,sse3_noavx,sse4,sse4_noavx,
>                     avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,avx512f_512,
>                     noavx512f,avx512bw,avx512bw_512,noavx512bw,avx512dq,
>                     noavx512dq,fma_or_avx512vl,avx512vl,noavx512vl,avxvnni,
>                     avx512vnnivl,avx512fp16,avxifma,avx512ifmavl,avxneconvert,
> -                   avx512bf16vl,vpclmulqdqvl,avx_noavx512f,avx_noavx512vl"
> +                   avx512bf16vl,vpclmulqdqvl,avx_noavx512f,avx_noavx512vl,
> +                   aes_avx,vaes_avx512vl"
>    (const_string "base"))
>
>  ;; The (bounding maximum) length of an instruction immediate.
> @@ -915,7 +916,6 @@ (define_attr "enabled" ""
>            (symbol_ref "TARGET_64BIT && TARGET_AVX512BW")
>          (eq_attr "isa" "x64_avx512dq")
>            (symbol_ref "TARGET_64BIT && TARGET_AVX512DQ")
> -        (eq_attr "isa" "aes") (symbol_ref "TARGET_AES")
>          (eq_attr "isa" "sse_noavx")
>            (symbol_ref "TARGET_SSE && !TARGET_AVX")
>          (eq_attr "isa" "sse2") (symbol_ref "TARGET_SSE2")
> @@ -968,6 +968,10 @@ (define_attr "enabled" ""
>            (symbol_ref "TARGET_VPCLMULQDQ && TARGET_AVX512VL")
>          (eq_attr "isa" "apx_ndd")
>            (symbol_ref "TARGET_APX_NDD")
> +        (eq_attr "isa" "aes_avx")
> +          (symbol_ref "TARGET_AES && TARGET_AVX")
> +        (eq_attr "isa" "vaes_avx512vl")
> +          (symbol_ref "TARGET_VAES && TARGET_AVX512VL")
>
>          (eq_attr "mmx_isa" "native")
>            (symbol_ref "!TARGET_MMX_WITH_SSE")
> --- gcc/config/i386/sse.md.jj   2024-03-18 22:15:43.168839437 +0100
> +++ gcc/config/i386/sse.md      2024-04-04 00:58:56.482090689 +0200
> @@ -26277,75 +26277,79 @@ (define_insn "xop_vpermil2<mode>3"
>  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>
>  (define_insn "aesenc"
> -  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> -       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> -                      (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
> +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,x,v")
> +       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,x,v")
> +                      (match_operand:V2DI 2 "vector_operand" "xja,xjm,xjm,vm")]
>                       UNSPEC_AESENC))]
>    "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
>    "@
>     aesenc\t{%2, %0|%0, %2}
>     vaesenc\t{%2, %1, %0|%0, %1, %2}
> +   %{evex%} vaesenc\t{%2, %1, %0|%0, %1, %2}
I think we can merge alternative 2 with 3 to
*  return TARGET_AES ? \"vaesenc\t{%2, %1, %0|%0, %1, %2}"\" :
\"%{evex%} vaesenc\t{%2, %1, %0|%0, %1, %2}\";
Then it can handle vaes_avx512vl + -mno-aes case.
>     vaesenc\t{%2, %1, %0|%0, %1, %2}"
> -  [(set_attr "isa" "noavx,aes,avx512vl")
> +  [(set_attr "isa" "noavx,aes_avx,vaes_avx512vl,vaes_avx512vl")
>     (set_attr "type" "sselog1")
> -   (set_attr "addr" "gpr16,*,*")
> +   (set_attr "addr" "gpr16,*,*,*")
>     (set_attr "prefix_extra" "1")
> -   (set_attr "prefix" "orig,vex,evex")
> -   (set_attr "btver2_decode" "double,double,double")
> +   (set_attr "prefix" "orig,vex,evex,evex")
> +   (set_attr "btver2_decode" "double,double,double,double")
>     (set_attr "mode" "TI")])
>
>  (define_insn "aesenclast"
> -  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> -       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> -                      (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
> +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,x,v")
> +       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,x,v")
> +                      (match_operand:V2DI 2 "vector_operand" "xja,xjm,xjm,vm")]
>                       UNSPEC_AESENCLAST))]
>    "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
>    "@
>     aesenclast\t{%2, %0|%0, %2}
>     vaesenclast\t{%2, %1, %0|%0, %1, %2}
> +   %{evex%} vaesenclast\t{%2, %1, %0|%0, %1, %2}
Ditto.
>     vaesenclast\t{%2, %1, %0|%0, %1, %2}"
> -  [(set_attr "isa" "noavx,aes,avx512vl")
> +  [(set_attr "isa" "noavx,aes_avx,vaes_avx512vl,vaes_avx512vl")
>     (set_attr "type" "sselog1")
> -   (set_attr "addr" "gpr16,*,*")
> +   (set_attr "addr" "gpr16,*,*,*")
>     (set_attr "prefix_extra" "1")
> -   (set_attr "prefix" "orig,vex,evex")
> -   (set_attr "btver2_decode" "double,double,double")
> +   (set_attr "prefix" "orig,vex,evex,evex")
> +   (set_attr "btver2_decode" "double,double,double,double")
>     (set_attr "mode" "TI")])
>
>  (define_insn "aesdec"
> -  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> -       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> -                      (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
> +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,x,v")
> +       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,x,v")
> +                      (match_operand:V2DI 2 "vector_operand" "xja,xjm,xjm,vm")]
>                       UNSPEC_AESDEC))]
>    "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
>    "@
>     aesdec\t{%2, %0|%0, %2}
>     vaesdec\t{%2, %1, %0|%0, %1, %2}
> +   %{evex%} vaesdec\t{%2, %1, %0|%0, %1, %2}
Ditto.
>     vaesdec\t{%2, %1, %0|%0, %1, %2}"
> -  [(set_attr "isa" "noavx,aes,avx512vl")
> +  [(set_attr "isa" "noavx,aes_avx,vaes_avx512vl,vaes_avx512vl")
>     (set_attr "type" "sselog1")
> -   (set_attr "addr" "gpr16,*,*")
> +   (set_attr "addr" "gpr16,*,*,*")
>     (set_attr "prefix_extra" "1")
> -   (set_attr "prefix" "orig,vex,evex")
> -   (set_attr "btver2_decode" "double,double,double")
> +   (set_attr "prefix" "orig,vex,evex,evex")
> +   (set_attr "btver2_decode" "double,double,double,double")
>     (set_attr "mode" "TI")])
>
>  (define_insn "aesdeclast"
> -  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> -       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> -                      (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
> +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,x,v")
> +       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,x,v")
> +                      (match_operand:V2DI 2 "vector_operand" "xja,xjm,xjm,vm")]
>                       UNSPEC_AESDECLAST))]
>    "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
>    "@
>     aesdeclast\t{%2, %0|%0, %2}
>     vaesdeclast\t{%2, %1, %0|%0, %1, %2}
> +   %{evex%} vaesdeclast\t{%2, %1, %0|%0, %1, %2}
Ditto.
>     vaesdeclast\t{%2, %1, %0|%0, %1, %2}"
> -  [(set_attr "isa" "noavx,aes,avx512vl")
> -   (set_attr "addr" "gpr16,*,*")
> +  [(set_attr "isa" "noavx,aes_avx,vaes_avx512vl,vaes_avx512vl")
> +   (set_attr "addr" "gpr16,*,*,*")
>     (set_attr "type" "sselog1")
>     (set_attr "prefix_extra" "1")
> -   (set_attr "prefix" "orig,vex,evex")
> -   (set_attr "btver2_decode" "double,double,double")
> +   (set_attr "prefix" "orig,vex,evex,evex")
> +   (set_attr "btver2_decode" "double,double,double,double")
>     (set_attr "mode" "TI")])
>
>  (define_insn "aesimc"
> @@ -30246,44 +30250,60 @@ (define_insn "vpdpwssds_<mode>_maskz_1"
>     [(set_attr ("prefix") ("evex"))])
>
>  (define_insn "vaesdec_<mode>"
> -  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
> +  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v")
>         (unspec:VI1_AVX512VL_F
> -         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
> -          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
> +         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v")
> +          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")]
>           UNSPEC_VAESDEC))]
>    "TARGET_VAES"
> -  "vaesdec\t{%2, %1, %0|%0, %1, %2}"
> -)
> +{
> +  if (which_alternative == 0 && <MODE>mode == V16QImode)
> +    return "%{evex%} vaesdec\t{%2, %1, %0|%0, %1, %2}";
Similar, but something like
*  return TARGET_AES || <MODE>mode != V16QImode ? \"vaesenc\t{%2, %1,
%0|%0, %1, %2}"\" : \"%{evex%} vaesenc\t{%2, %1, %0|%0, %1, %2}\";

> +  else
> +    return "vaesdec\t{%2, %1, %0|%0, %1, %2}";
> +})
>
>  (define_insn "vaesdeclast_<mode>"
> -  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
> +  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v")
>         (unspec:VI1_AVX512VL_F
> -         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
> -          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
> +         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v")
> +          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")]
>           UNSPEC_VAESDECLAST))]
>    "TARGET_VAES"
> -  "vaesdeclast\t{%2, %1, %0|%0, %1, %2}"
> -)
> +{
> +  if (which_alternative == 0 && <MODE>mode == V16QImode)
> +    return "%{evex%} vaesdeclast\t{%2, %1, %0|%0, %1, %2}";
Ditto.
> +  else
> +    return "vaesdeclast\t{%2, %1, %0|%0, %1, %2}";
> +})
>
>  (define_insn "vaesenc_<mode>"
> -  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
> +  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v")
>         (unspec:VI1_AVX512VL_F
> -         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
> -          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
> +         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v")
> +          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")]
>           UNSPEC_VAESENC))]
>    "TARGET_VAES"
> -  "vaesenc\t{%2, %1, %0|%0, %1, %2}"
> -)
> +{
> +  if (which_alternative == 0 && <MODE>mode == V16QImode)
> +    return "%{evex%} vaesenc\t{%2, %1, %0|%0, %1, %2}";
Ditto.
> +  else
> +    return "vaesenc\t{%2, %1, %0|%0, %1, %2}";
> +})
>
>  (define_insn "vaesenclast_<mode>"
> -  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
> +  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v")
>         (unspec:VI1_AVX512VL_F
> -         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
> -          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
> +         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v")
> +          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")]
>           UNSPEC_VAESENCLAST))]
>    "TARGET_VAES"
> -  "vaesenclast\t{%2, %1, %0|%0, %1, %2}"
> -)
> +{
> +  if (which_alternative == 0 && <MODE>mode == V16QImode)
> +    return "%{evex%} vaesenclast\t{%2, %1, %0|%0, %1, %2}";
Ditto.
> +  else
> +    return "vaesenclast\t{%2, %1, %0|%0, %1, %2}";
> +})
>
>  (define_insn "vpclmulqdq_<mode>"
>    [(set (match_operand:VI8_FVL 0 "register_operand" "=v")
> --- gcc/testsuite/gcc.target/i386/aes-pr114576.c.jj     2024-04-04 09:50:17.117757179 +0200
> +++ gcc/testsuite/gcc.target/i386/aes-pr114576.c        2024-04-04 09:51:45.211544801 +0200
> @@ -0,0 +1,63 @@
> +/* PR target/114576 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -maes -mno-avx" } */
> +/* { dg-final { scan-assembler-times "\taesenc\t" 2 } } */
> +/* { dg-final { scan-assembler-times "\taesdec\t" 2 } } */
> +/* { dg-final { scan-assembler-times "\taesenclast\t" 2 } } */
> +/* { dg-final { scan-assembler-times "\taesdeclast\t" 2 } } */
> +/* { dg-final { scan-assembler-not "\tvaesenc" } } */
> +/* { dg-final { scan-assembler-not "\tvaesdec" } } */
> +
> +#include <immintrin.h>
> +
> +__m128i
> +f1 (__m128i x, __m128i y)
> +{
> +  return _mm_aesenc_si128 (x, y);
> +}
> +
> +__m128i
> +f2 (__m128i x, __m128i y)
> +{
> +  __m128i z = _mm_aesenc_si128 (x, y);
> +  return z + x + y;
> +}
> +
> +__m128i
> +f3 (__m128i x, __m128i y)
> +{
> +  return _mm_aesdec_si128 (x, y);
> +}
> +
> +__m128i
> +f4 (__m128i x, __m128i y)
> +{
> +  __m128i z = _mm_aesdec_si128 (x, y);
> +  return z + x + y;
> +}
> +
> +__m128i
> +f5 (__m128i x, __m128i y)
> +{
> +  return _mm_aesenclast_si128 (x, y);
> +}
> +
> +__m128i
> +f6 (__m128i x, __m128i y)
> +{
> +  __m128i z = _mm_aesenclast_si128 (x, y);
> +  return z + x + y;
> +}
> +
> +__m128i
> +f7 (__m128i x, __m128i y)
> +{
> +  return _mm_aesdeclast_si128 (x, y);
> +}
> +
> +__m128i
> +f8 (__m128i x, __m128i y)
> +{
> +  __m128i z = _mm_aesdeclast_si128 (x, y);
> +  return z + x + y;
> +}
>
>
>         Jakub
>
diff mbox series

Patch

--- gcc/config/i386/i386.md.jj	2024-03-18 22:15:43.165839479 +0100
+++ gcc/config/i386/i386.md	2024-04-04 00:48:46.575511556 +0200
@@ -568,13 +568,14 @@  (define_attr "unit" "integer,i387,sse,mm
 
 ;; Used to control the "enabled" attribute on a per-instruction basis.
 (define_attr "isa" "base,x64,nox64,x64_sse2,x64_sse4,x64_sse4_noavx,
-		    x64_avx,x64_avx512bw,x64_avx512dq,aes,apx_ndd,
+		    x64_avx,x64_avx512bw,x64_avx512dq,apx_ndd,
 		    sse_noavx,sse2,sse2_noavx,sse3,sse3_noavx,sse4,sse4_noavx,
 		    avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,avx512f_512,
 		    noavx512f,avx512bw,avx512bw_512,noavx512bw,avx512dq,
 		    noavx512dq,fma_or_avx512vl,avx512vl,noavx512vl,avxvnni,
 		    avx512vnnivl,avx512fp16,avxifma,avx512ifmavl,avxneconvert,
-		    avx512bf16vl,vpclmulqdqvl,avx_noavx512f,avx_noavx512vl"
+		    avx512bf16vl,vpclmulqdqvl,avx_noavx512f,avx_noavx512vl,
+		    aes_avx,vaes_avx512vl"
   (const_string "base"))
 
 ;; The (bounding maximum) length of an instruction immediate.
@@ -915,7 +916,6 @@  (define_attr "enabled" ""
 	   (symbol_ref "TARGET_64BIT && TARGET_AVX512BW")
 	 (eq_attr "isa" "x64_avx512dq")
 	   (symbol_ref "TARGET_64BIT && TARGET_AVX512DQ")
-	 (eq_attr "isa" "aes") (symbol_ref "TARGET_AES")
 	 (eq_attr "isa" "sse_noavx")
 	   (symbol_ref "TARGET_SSE && !TARGET_AVX")
 	 (eq_attr "isa" "sse2") (symbol_ref "TARGET_SSE2")
@@ -968,6 +968,10 @@  (define_attr "enabled" ""
 	   (symbol_ref "TARGET_VPCLMULQDQ && TARGET_AVX512VL")
 	 (eq_attr "isa" "apx_ndd")
 	   (symbol_ref "TARGET_APX_NDD")
+	 (eq_attr "isa" "aes_avx")
+	   (symbol_ref "TARGET_AES && TARGET_AVX")
+	 (eq_attr "isa" "vaes_avx512vl")
+	   (symbol_ref "TARGET_VAES && TARGET_AVX512VL")
 
 	 (eq_attr "mmx_isa" "native")
 	   (symbol_ref "!TARGET_MMX_WITH_SSE")
--- gcc/config/i386/sse.md.jj	2024-03-18 22:15:43.168839437 +0100
+++ gcc/config/i386/sse.md	2024-04-04 00:58:56.482090689 +0200
@@ -26277,75 +26277,79 @@  (define_insn "xop_vpermil2<mode>3"
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
 (define_insn "aesenc"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
-	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
-		       (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
+  [(set (match_operand:V2DI 0 "register_operand" "=x,x,x,v")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,x,v")
+		       (match_operand:V2DI 2 "vector_operand" "xja,xjm,xjm,vm")]
 		      UNSPEC_AESENC))]
   "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
    aesenc\t{%2, %0|%0, %2}
    vaesenc\t{%2, %1, %0|%0, %1, %2}
+   %{evex%} vaesenc\t{%2, %1, %0|%0, %1, %2}
    vaesenc\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,aes,avx512vl")
+  [(set_attr "isa" "noavx,aes_avx,vaes_avx512vl,vaes_avx512vl")
    (set_attr "type" "sselog1")
-   (set_attr "addr" "gpr16,*,*")
+   (set_attr "addr" "gpr16,*,*,*")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex,evex")
-   (set_attr "btver2_decode" "double,double,double")
+   (set_attr "prefix" "orig,vex,evex,evex")
+   (set_attr "btver2_decode" "double,double,double,double")
    (set_attr "mode" "TI")])
 
 (define_insn "aesenclast"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
-	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
-		       (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
+  [(set (match_operand:V2DI 0 "register_operand" "=x,x,x,v")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,x,v")
+		       (match_operand:V2DI 2 "vector_operand" "xja,xjm,xjm,vm")]
 		      UNSPEC_AESENCLAST))]
   "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
    aesenclast\t{%2, %0|%0, %2}
    vaesenclast\t{%2, %1, %0|%0, %1, %2}
+   %{evex%} vaesenclast\t{%2, %1, %0|%0, %1, %2}
    vaesenclast\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,aes,avx512vl")
+  [(set_attr "isa" "noavx,aes_avx,vaes_avx512vl,vaes_avx512vl")
    (set_attr "type" "sselog1")
-   (set_attr "addr" "gpr16,*,*")
+   (set_attr "addr" "gpr16,*,*,*")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex,evex")
-   (set_attr "btver2_decode" "double,double,double") 
+   (set_attr "prefix" "orig,vex,evex,evex")
+   (set_attr "btver2_decode" "double,double,double,double")
    (set_attr "mode" "TI")])
 
 (define_insn "aesdec"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
-	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
-		       (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
+  [(set (match_operand:V2DI 0 "register_operand" "=x,x,x,v")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,x,v")
+		       (match_operand:V2DI 2 "vector_operand" "xja,xjm,xjm,vm")]
 		      UNSPEC_AESDEC))]
   "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
    aesdec\t{%2, %0|%0, %2}
    vaesdec\t{%2, %1, %0|%0, %1, %2}
+   %{evex%} vaesdec\t{%2, %1, %0|%0, %1, %2}
    vaesdec\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,aes,avx512vl")
+  [(set_attr "isa" "noavx,aes_avx,vaes_avx512vl,vaes_avx512vl")
    (set_attr "type" "sselog1")
-   (set_attr "addr" "gpr16,*,*")
+   (set_attr "addr" "gpr16,*,*,*")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex,evex")
-   (set_attr "btver2_decode" "double,double,double") 
+   (set_attr "prefix" "orig,vex,evex,evex")
+   (set_attr "btver2_decode" "double,double,double,double") 
    (set_attr "mode" "TI")])
 
 (define_insn "aesdeclast"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
-	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
-		       (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
+  [(set (match_operand:V2DI 0 "register_operand" "=x,x,x,v")
+	(unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,x,v")
+		       (match_operand:V2DI 2 "vector_operand" "xja,xjm,xjm,vm")]
 		      UNSPEC_AESDECLAST))]
   "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
    aesdeclast\t{%2, %0|%0, %2}
    vaesdeclast\t{%2, %1, %0|%0, %1, %2}
+   %{evex%} vaesdeclast\t{%2, %1, %0|%0, %1, %2}
    vaesdeclast\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,aes,avx512vl")
-   (set_attr "addr" "gpr16,*,*")
+  [(set_attr "isa" "noavx,aes_avx,vaes_avx512vl,vaes_avx512vl")
+   (set_attr "addr" "gpr16,*,*,*")
    (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex,evex")
-   (set_attr "btver2_decode" "double,double,double")
+   (set_attr "prefix" "orig,vex,evex,evex")
+   (set_attr "btver2_decode" "double,double,double,double")
    (set_attr "mode" "TI")])
 
 (define_insn "aesimc"
@@ -30246,44 +30250,60 @@  (define_insn "vpdpwssds_<mode>_maskz_1"
    [(set_attr ("prefix") ("evex"))])
 
 (define_insn "vaesdec_<mode>"
-  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
+  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v")
 	(unspec:VI1_AVX512VL_F
-	  [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
-	   (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
+	  [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v")
+	   (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")]
 	  UNSPEC_VAESDEC))]
   "TARGET_VAES"
-  "vaesdec\t{%2, %1, %0|%0, %1, %2}"
-)
+{
+  if (which_alternative == 0 && <MODE>mode == V16QImode)
+    return "%{evex%} vaesdec\t{%2, %1, %0|%0, %1, %2}";
+  else
+    return "vaesdec\t{%2, %1, %0|%0, %1, %2}";
+})
 
 (define_insn "vaesdeclast_<mode>"
-  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
+  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v")
 	(unspec:VI1_AVX512VL_F
-	  [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
-	   (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
+	  [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v")
+	   (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")]
 	  UNSPEC_VAESDECLAST))]
   "TARGET_VAES"
-  "vaesdeclast\t{%2, %1, %0|%0, %1, %2}"
-)
+{
+  if (which_alternative == 0 && <MODE>mode == V16QImode)
+    return "%{evex%} vaesdeclast\t{%2, %1, %0|%0, %1, %2}";
+  else
+    return "vaesdeclast\t{%2, %1, %0|%0, %1, %2}";
+})
 
 (define_insn "vaesenc_<mode>"
-  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
+  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v")
 	(unspec:VI1_AVX512VL_F
-	  [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
-	   (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
+	  [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v")
+	   (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")]
 	  UNSPEC_VAESENC))]
   "TARGET_VAES"
-  "vaesenc\t{%2, %1, %0|%0, %1, %2}"
-)
+{
+  if (which_alternative == 0 && <MODE>mode == V16QImode)
+    return "%{evex%} vaesenc\t{%2, %1, %0|%0, %1, %2}";
+  else
+    return "vaesenc\t{%2, %1, %0|%0, %1, %2}";
+})
 
 (define_insn "vaesenclast_<mode>"
-  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
+  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v")
 	(unspec:VI1_AVX512VL_F
-	  [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
-	   (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
+	  [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v")
+	   (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")]
 	  UNSPEC_VAESENCLAST))]
   "TARGET_VAES"
-  "vaesenclast\t{%2, %1, %0|%0, %1, %2}"
-)
+{
+  if (which_alternative == 0 && <MODE>mode == V16QImode)
+    return "%{evex%} vaesenclast\t{%2, %1, %0|%0, %1, %2}";
+  else
+    return "vaesenclast\t{%2, %1, %0|%0, %1, %2}";
+})
 
 (define_insn "vpclmulqdq_<mode>"
   [(set (match_operand:VI8_FVL 0 "register_operand" "=v")
--- gcc/testsuite/gcc.target/i386/aes-pr114576.c.jj	2024-04-04 09:50:17.117757179 +0200
+++ gcc/testsuite/gcc.target/i386/aes-pr114576.c	2024-04-04 09:51:45.211544801 +0200
@@ -0,0 +1,63 @@ 
+/* PR target/114576 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -maes -mno-avx" } */
+/* { dg-final { scan-assembler-times "\taesenc\t" 2 } } */
+/* { dg-final { scan-assembler-times "\taesdec\t" 2 } } */
+/* { dg-final { scan-assembler-times "\taesenclast\t" 2 } } */
+/* { dg-final { scan-assembler-times "\taesdeclast\t" 2 } } */
+/* { dg-final { scan-assembler-not "\tvaesenc" } } */
+/* { dg-final { scan-assembler-not "\tvaesdec" } } */
+
+#include <immintrin.h>
+
+__m128i
+f1 (__m128i x, __m128i y)
+{
+  return _mm_aesenc_si128 (x, y);
+}
+
+__m128i
+f2 (__m128i x, __m128i y)
+{
+  __m128i z = _mm_aesenc_si128 (x, y);
+  return z + x + y;
+}
+
+__m128i
+f3 (__m128i x, __m128i y)
+{
+  return _mm_aesdec_si128 (x, y);
+}
+
+__m128i
+f4 (__m128i x, __m128i y)
+{
+  __m128i z = _mm_aesdec_si128 (x, y);
+  return z + x + y;
+}
+
+__m128i
+f5 (__m128i x, __m128i y)
+{
+  return _mm_aesenclast_si128 (x, y);
+}
+
+__m128i
+f6 (__m128i x, __m128i y)
+{
+  __m128i z = _mm_aesenclast_si128 (x, y);
+  return z + x + y;
+}
+
+__m128i
+f7 (__m128i x, __m128i y)
+{
+  return _mm_aesdeclast_si128 (x, y);
+}
+
+__m128i
+f8 (__m128i x, __m128i y)
+{
+  __m128i z = _mm_aesdeclast_si128 (x, y);
+  return z + x + y;
+}