diff mbox

[i386,AVX512] PR target/69228: Restrict default masks for prefetch gathers/scatters instructions.

Message ID 20160112115706.GA63858@msticlxl57.ims.intel.com
State New
Headers show

Commit Message

Alexander Fomin Jan. 12, 2016, 11:57 a.m. UTC
This patch addresses PR target/69228. Expanding non-mask builtins
for prefetch gather/scatter insns results in using default mask.
Although Intel ISA Extensions Programming Reference statement about
EVEX.aaa field in prefetch gather/scatter insns encoding is a bit
opaque, no default mask is allowed for that family.

Bootstrapped and regtested on x86_64-linux-gnu. OK for trunk?

Thanks,
Alexander
---
gcc/

	PR target/69228
	* config/i386/sse.md (define_expand "avx512pf_gatherpf<mode>sf"):
	Change first operand predicate from register_or_constm1_operand
	to register_operand.
	(define_expand "avx512pf_gatherpf<mode>df"): Likewise.
	(define_expand "avx512pf_scatterpf<mode>sf"): Likewise.
	(define_expand "avx512pf_scatterpf<mode>df"): Likewise.
	(define_insn "*avx512pf_gatherpf<mode>sf"): Remove.
	(define_insn "*avx512pf_gatherpf<mode>df"): Likewise.
	(define_insn "*avx512pf_scatterpf<mode>sf"): Likewise.
	(define_insn "*avx512pf_scatterpf<mode>df"): Likewise.
	* config/i386/i386.c (ix86_expand_builtin): Remove first operand
	comparison with constm1_rtx from vec_prefetch_gen part.

gcc/testsuite

	PR target/69228
	* gcc.target/i386/avx512pf-vscatterpf0dpd-1.c: Adjust.
	* gcc.target/i386/avx512pf-vscatterpf0dps-1.c: Likewise.
	* gcc.target/i386/avx512pf-vscatterpf0qpd-1.c: Likewise.
	* gcc.target/i386/avx512pf-vscatterpf0qps-1.c: Likewise.
	* gcc.target/i386/avx512pf-vscatterpf1dpd-1.c: Likewise.
	* gcc.target/i386/avx512pf-vscatterpf1dps-1.c: Likewise.
	* gcc.target/i386/avx512pf-vscatterpf1qpd-1.c: Likewise.
	* gcc.target/i386/avx512pf-vscatterpf1qps-1.c: Likewise.
---
 gcc/config/i386/i386.c                             |   5 +-
 gcc/config/i386/sse.md                             | 120 +--------------------
 .../gcc.target/i386/avx512pf-vscatterpf0dpd-1.c    |   3 +-
 .../gcc.target/i386/avx512pf-vscatterpf0dps-1.c    |   3 +-
 .../gcc.target/i386/avx512pf-vscatterpf0qpd-1.c    |   3 +-
 .../gcc.target/i386/avx512pf-vscatterpf0qps-1.c    |   3 +-
 .../gcc.target/i386/avx512pf-vscatterpf1dpd-1.c    |   3 +-
 .../gcc.target/i386/avx512pf-vscatterpf1dps-1.c    |   3 +-
 .../gcc.target/i386/avx512pf-vscatterpf1qpd-1.c    |   3 +-
 .../gcc.target/i386/avx512pf-vscatterpf1qps-1.c    |   3 +-
 10 files changed, 14 insertions(+), 135 deletions(-)

Comments

Kirill Yukhin Jan. 13, 2016, 12:41 p.m. UTC | #1
Hello Sasha,
On 12 Jan 14:57, Alexander Fomin wrote:
> This patch addresses PR target/69228. Expanding non-mask builtins
> for prefetch gather/scatter insns results in using default mask.
> Although Intel ISA Extensions Programming Reference statement about
> EVEX.aaa field in prefetch gather/scatter insns encoding is a bit
> opaque, no default mask is allowed for that family.
> 
> Bootstrapped and regtested on x86_64-linux-gnu. OK for trunk?
That's right, there's no such a thing like default-masked
gather/scatter prefetch.
Your patch is OK for main trunk, thanks.

--
K
Alexander Fomin Jan. 13, 2016, 2:45 p.m. UTC | #2
Checked into main trunk.
Could you please backport it to 5-branch?

Regards,
Alexander

On Wed, Jan 13, 2016 at 03:41:26PM +0300, Kirill Yukhin wrote:
> Hello Sasha,
> On 12 Jan 14:57, Alexander Fomin wrote:
> > This patch addresses PR target/69228. Expanding non-mask builtins
> > for prefetch gather/scatter insns results in using default mask.
> > Although Intel ISA Extensions Programming Reference statement about
> > EVEX.aaa field in prefetch gather/scatter insns encoding is a bit
> > opaque, no default mask is allowed for that family.
> > 
> > Bootstrapped and regtested on x86_64-linux-gnu. OK for trunk?
> That's right, there's no such a thing like default-masked
> gather/scatter prefetch.
> Your patch is OK for main trunk, thanks.
> 
> --
> K
Alexander Fomin Jan. 14, 2016, 12:08 p.m. UTC | #3
Now bootstrapped and regtested against GCC v5.
OK for 5-branch thus?

Regards,
Alexander

On Wed, Jan 13, 2016 at 03:41:26PM +0300, Kirill Yukhin wrote:
> Hello Sasha,
> On 12 Jan 14:57, Alexander Fomin wrote:
> > This patch addresses PR target/69228. Expanding non-mask builtins
> > for prefetch gather/scatter insns results in using default mask.
> > Although Intel ISA Extensions Programming Reference statement about
> > EVEX.aaa field in prefetch gather/scatter insns encoding is a bit
> > opaque, no default mask is allowed for that family.
> > 
> > Bootstrapped and regtested on x86_64-linux-gnu. OK for trunk?
> That's right, there's no such a thing like default-masked
> gather/scatter prefetch.
> Your patch is OK for main trunk, thanks.
> 
> --
> K
Kirill Yukhin Jan. 14, 2016, 3:14 p.m. UTC | #4
Hello Sasha,
On 14 Jan 15:08, Alexander Fomin wrote:
> Now bootstrapped and regtested against GCC v5.
> OK for 5-branch thus?
Your patch is OK for gcc-5 branch.

--
Thanks, K
> 
> Regards,
> Alexander
diff mbox

Patch

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index aac0847..c37eb74 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -41821,13 +41821,12 @@  rdseed_step:
 
       op0 = fixup_modeless_constant (op0, mode0);
 
-      if (GET_MODE (op0) == mode0
-	  || (GET_MODE (op0) == VOIDmode && op0 != constm1_rtx))
+      if (GET_MODE (op0) == mode0 || GET_MODE (op0) == VOIDmode)
 	{
 	  if (!insn_data[icode].operand[0].predicate (op0, mode0))
 	    op0 = copy_to_mode_reg (mode0, op0);
 	}
-      else if (op0 != constm1_rtx)
+      else
 	{
 	  op0 = copy_to_reg (op0);
 	  op0 = simplify_gen_subreg (mode0, op0, GET_MODE (op0), 0);
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 278dd38..b96be36 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15674,7 +15674,7 @@ 
 
 (define_expand "avx512pf_gatherpf<mode>sf"
   [(unspec
-     [(match_operand:<avx512fmaskmode> 0 "register_or_constm1_operand")
+     [(match_operand:<avx512fmaskmode> 0 "register_operand")
       (mem:<GATHER_SCATTER_SF_MEM_MODE>
 	(match_par_dup 5
 	  [(match_operand 2 "vsib_address_operand")
@@ -15716,37 +15716,10 @@ 
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "*avx512pf_gatherpf<mode>sf"
-  [(unspec
-     [(const_int -1)
-      (match_operator:<GATHER_SCATTER_SF_MEM_MODE> 4 "vsib_mem_operator"
-	[(unspec:P
-	   [(match_operand:P 1 "vsib_address_operand" "Tv")
-	    (match_operand:VI48_512 0 "register_operand" "v")
-	    (match_operand:SI 2 "const1248_operand" "n")]
-	   UNSPEC_VSIBADDR)])
-      (match_operand:SI 3 "const_2_to_3_operand" "n")]
-     UNSPEC_GATHER_PREFETCH)]
-  "TARGET_AVX512PF"
-{
-  switch (INTVAL (operands[3]))
-    {
-    case 3:
-      return "vgatherpf0<ssemodesuffix>ps\t{%4|%4}";
-    case 2:
-      return "vgatherpf1<ssemodesuffix>ps\t{%4|%4}";
-    default:
-      gcc_unreachable ();
-    }
-}
-  [(set_attr "type" "sse")
-   (set_attr "prefix" "evex")
-   (set_attr "mode" "XI")])
-
 ;; Packed double variants
 (define_expand "avx512pf_gatherpf<mode>df"
   [(unspec
-     [(match_operand:<avx512fmaskmode> 0 "register_or_constm1_operand")
+     [(match_operand:<avx512fmaskmode> 0 "register_operand")
       (mem:V8DF
 	(match_par_dup 5
 	  [(match_operand 2 "vsib_address_operand")
@@ -15788,37 +15761,10 @@ 
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "*avx512pf_gatherpf<mode>df"
-  [(unspec
-     [(const_int -1)
-      (match_operator:V8DF 4 "vsib_mem_operator"
-	[(unspec:P
-	   [(match_operand:P 1 "vsib_address_operand" "Tv")
-	    (match_operand:VI4_256_8_512 0 "register_operand" "v")
-	    (match_operand:SI 2 "const1248_operand" "n")]
-	   UNSPEC_VSIBADDR)])
-      (match_operand:SI 3 "const_2_to_3_operand" "n")]
-     UNSPEC_GATHER_PREFETCH)]
-  "TARGET_AVX512PF"
-{
-  switch (INTVAL (operands[3]))
-    {
-    case 3:
-      return "vgatherpf0<ssemodesuffix>pd\t{%4|%4}";
-    case 2:
-      return "vgatherpf1<ssemodesuffix>pd\t{%4|%4}";
-    default:
-      gcc_unreachable ();
-    }
-}
-  [(set_attr "type" "sse")
-   (set_attr "prefix" "evex")
-   (set_attr "mode" "XI")])
-
 ;; Packed float variants
 (define_expand "avx512pf_scatterpf<mode>sf"
   [(unspec
-     [(match_operand:<avx512fmaskmode> 0 "register_or_constm1_operand")
+     [(match_operand:<avx512fmaskmode> 0 "register_operand")
       (mem:<GATHER_SCATTER_SF_MEM_MODE>
 	(match_par_dup 5
 	  [(match_operand 2 "vsib_address_operand")
@@ -15862,39 +15808,10 @@ 
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "*avx512pf_scatterpf<mode>sf"
-  [(unspec
-     [(const_int -1)
-      (match_operator:<GATHER_SCATTER_SF_MEM_MODE> 4 "vsib_mem_operator"
-	[(unspec:P
-	   [(match_operand:P 1 "vsib_address_operand" "Tv")
-	    (match_operand:VI48_512 0 "register_operand" "v")
-	    (match_operand:SI 2 "const1248_operand" "n")]
-	   UNSPEC_VSIBADDR)])
-      (match_operand:SI 3 "const2367_operand" "n")]
-     UNSPEC_SCATTER_PREFETCH)]
-  "TARGET_AVX512PF"
-{
-  switch (INTVAL (operands[3]))
-    {
-    case 3:
-    case 7:
-      return "vscatterpf0<ssemodesuffix>ps\t{%4|%4}";
-    case 2:
-    case 6:
-      return "vscatterpf1<ssemodesuffix>ps\t{%4|%4}";
-    default:
-      gcc_unreachable ();
-    }
-}
-  [(set_attr "type" "sse")
-   (set_attr "prefix" "evex")
-   (set_attr "mode" "XI")])
-
 ;; Packed double variants
 (define_expand "avx512pf_scatterpf<mode>df"
   [(unspec
-     [(match_operand:<avx512fmaskmode> 0 "register_or_constm1_operand")
+     [(match_operand:<avx512fmaskmode> 0 "register_operand")
       (mem:V8DF
 	(match_par_dup 5
 	  [(match_operand 2 "vsib_address_operand")
@@ -15938,35 +15855,6 @@ 
    (set_attr "prefix" "evex")
    (set_attr "mode" "XI")])
 
-(define_insn "*avx512pf_scatterpf<mode>df"
-  [(unspec
-     [(const_int -1)
-      (match_operator:V8DF 4 "vsib_mem_operator"
-	[(unspec:P
-	   [(match_operand:P 1 "vsib_address_operand" "Tv")
-	    (match_operand:VI4_256_8_512 0 "register_operand" "v")
-	    (match_operand:SI 2 "const1248_operand" "n")]
-	   UNSPEC_VSIBADDR)])
-      (match_operand:SI 3 "const2367_operand" "n")]
-     UNSPEC_SCATTER_PREFETCH)]
-  "TARGET_AVX512PF"
-{
-  switch (INTVAL (operands[3]))
-    {
-    case 3:
-    case 7:
-      return "vscatterpf0<ssemodesuffix>pd\t{%4|%4}";
-    case 2:
-    case 6:
-      return "vscatterpf1<ssemodesuffix>pd\t{%4|%4}";
-    default:
-      gcc_unreachable ();
-    }
-}
-  [(set_attr "type" "sse")
-   (set_attr "prefix" "evex")
-   (set_attr "mode" "XI")])
-
 (define_insn "avx512er_exp2<mode><mask_name><round_saeonly_name>"
   [(set (match_operand:VF_512 0 "register_operand" "=v")
 	(unspec:VF_512
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0dpd-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0dpd-1.c
index ace50de..5a153ea 100644
--- a/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0dpd-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0dpd-1.c
@@ -1,7 +1,6 @@ 
 /* { dg-do compile } */
 /* { dg-options "-mavx512pf -O2" } */
-/* { dg-final { scan-assembler-times "vscatterpf0dpd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*\\)(?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "vscatterpf0dpd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*\\)\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vscatterpf0dpd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*\\)\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */
 #include <immintrin.h>
 
 volatile __m256i idx;
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0dps-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0dps-1.c
index d648b2ee9..d1173a2 100644
--- a/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0dps-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0dps-1.c
@@ -1,7 +1,6 @@ 
 /* { dg-do compile } */
 /* { dg-options "-mavx512pf -O2" } */
-/* { dg-final { scan-assembler-times "vscatterpf0dps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\]*\\)(?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "vscatterpf0dps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\]*\\)\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vscatterpf0dps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\]*\\)\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */
 
 #include <immintrin.h>
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0qpd-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0qpd-1.c
index d32345c..67529e7 100644
--- a/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0qpd-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0qpd-1.c
@@ -1,7 +1,6 @@ 
 /* { dg-do compile } */
 /* { dg-options "-mavx512pf -O2" } */
-/* { dg-final { scan-assembler-times "vscatterpf0qpd\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\]*\\)(?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "vscatterpf0qpd\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\]*\\)\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vscatterpf0qpd\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\]*\\)\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */
 
 #include <immintrin.h>
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0qps-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0qps-1.c
index 44c908f..9ff580f 100644
--- a/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0qps-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf0qps-1.c
@@ -1,7 +1,6 @@ 
 /* { dg-do compile } */
 /* { dg-options "-mavx512pf -O2" } */
-/* { dg-final { scan-assembler-times "vscatterpf0qps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\]*\\)(?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "vscatterpf0qps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\]*\\)\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vscatterpf0qps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\]*\\)\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */
 
 #include <immintrin.h>
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1dpd-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1dpd-1.c
index ff38338..73a029d 100644
--- a/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1dpd-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1dpd-1.c
@@ -1,7 +1,6 @@ 
 /* { dg-do compile } */
 /* { dg-options "-mavx512pf -O2" } */
-/* { dg-final { scan-assembler-times "vscatterpf1dpd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*\\)(?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "vscatterpf1dpd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*\\)\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vscatterpf1dpd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*\\)\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */
 
 #include <immintrin.h>
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1dps-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1dps-1.c
index 8ec3388..439bc853 100644
--- a/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1dps-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1dps-1.c
@@ -1,7 +1,6 @@ 
 /* { dg-do compile } */
 /* { dg-options "-mavx512pf -O2" } */
-/* { dg-final { scan-assembler-times "vscatterpf1dps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\]*\\)(?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "vscatterpf1dps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\]*\\)\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vscatterpf1dps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\]*\\)\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */
 
 #include <immintrin.h>
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1qpd-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1qpd-1.c
index 2c4eb2a..3ae16cd 100644
--- a/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1qpd-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1qpd-1.c
@@ -1,7 +1,6 @@ 
 /* { dg-do compile } */
 /* { dg-options "-mavx512pf -O2" } */
-/* { dg-final { scan-assembler-times "vscatterpf1qpd\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\]*\\)(?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "vscatterpf1qpd\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\]*\\)\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vscatterpf1qpd\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\]*\\)\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */
 
 #include <immintrin.h>
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1qps-1.c b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1qps-1.c
index 34bcb65..35cd7d3 100644
--- a/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1qps-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512pf-vscatterpf1qps-1.c
@@ -1,7 +1,6 @@ 
 /* { dg-do compile } */
 /* { dg-options "-mavx512pf -O2" } */
-/* { dg-final { scan-assembler-times "vscatterpf1qps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\]*\\)(?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "vscatterpf1qps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\]*\\)\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vscatterpf1qps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\]*\\)\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 2 } } */
 
 #include <immintrin.h>