[i386,AVX512] Match latest spec. Add CPUID prefetchwt1.

Message ID 20140225091323.GA31394@msticlxl7.ims.intel.com
State New

Commit Message

Ilya Tocar Feb. 25, 2014, 9:13 a.m. UTC
On 21 Feb 18:35, Uros Bizjak wrote:
> On Fri, Feb 21, 2014 at 4:25 PM, Ilya Tocar <tocarip.intel@gmail.com> wrote:
> >> > Latest version of AVX512 spec
> >> > http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf
> >> > Has a few changes.
> >> >
> >> > 1)PREFETCHWT1 instruction now has separate CPUID bit PREFETCHWT1.
> >> > We can either support new CPUID or disable PREFETCHWT1 from generating,
> >> > without removing code, and enable it in 4.9.1/latest version.
> >> > I am not sure that adding new -m flag and related stuff this late
> >> > is a good idea. Should I still add it?
> >>
> >> Please submit the patch anyway. We can relax release constraints on
> >> non-algorithmic patch a bit, weighting in benefits of having gcc
> >> release that fully conforms to some published specification.
> >>
> > Patch below adds -mprefetchwt1 flag, corresponding TARGET_PREFETCHWT1,
> > and uses them for prefetchwt1 instruction. Bootstraps/passes testing.
> > Ok for trunk?
> >

> >         * gcc.target/i386/avx-1.c: Update __builtin_prefetch.
> 
> Please also add new switch to gcc.target/i386/sse-{12,13,14}.c and
> g++.dg/other/i386-{2,3} and new options to
> gcc.target/i386/sse-{22,23}.c. Please re-test with new additions and
> repost the patch.
>

I've added the new switch to those tests. However, when I add prefetchwt1
to #pragma GCC target ("sse"), the sse-22a.c test fails with:
pmmintrin.h: In function ‘_mm_loaddup_pd’:
emmintrin.h:119:1: error: inlining failed in call to always_inline
‘_mm_load1_pd’: target specific option mismatch

I've checked, and this isn't a problem with prefetchwt1: I get the same
error when I add any other option (e.g. sha) to #pragma GCC target ("sse").
So I haven't added anything there. As that was the only failure,
I'm reposting this patch.

ChangeLog for GCC:

	* common/config/i386/i386-common.c (OPTION_MASK_ISA_PREFETCHWT1_SET),
	(OPTION_MASK_ISA_PREFETCHWT1_UNSET): New.
	(ix86_handle_option): Handle OPT_mprefetchwt1.
	* config/i386/cpuid.h (bit_PREFETCHWT1): New.
	* config/i386/driver-i386.c (host_detect_local_cpu): Detect
	PREFETCHWT1 CPUID.
	* config/i386/i386-c.c (ix86_target_macros_internal): Handle
	OPTION_MASK_ISA_PREFETCHWT1.
	* config/i386/i386.c (ix86_target_string): Handle mprefetchwt1.
	(PTA_PREFETCHWT1): New.
	(ix86_option_override_internal): Handle PTA_PREFETCHWT1.
	(ix86_valid_target_attribute_inner_p): Handle OPT_mprefetchwt1.
	* config/i386/i386.h (TARGET_PREFETCHWT1), (TARGET_PREFETCHWT1_P):
	  New.
	* config/i386/i386.md (prefetch): Check TARGET_PREFETCHWT1.
	(*prefetch_avx512pf_<mode>): Change into ...
	(*prefetch_prefetchwt1_<mode>): This.
	* config/i386/i386.opt (mprefetchwt1): New.
	* config/i386/xmmintrin.h (_mm_hint): Add _MM_HINT_ET1.
	(_mm_prefetch): Handle intent to write.
	* doc/invoke.texi (mprefetchwt1), (mno-prefetchwt1): Document.

ChangeLog for tests:

	* gcc.target/i386/avx-1.c: Update __builtin_prefetch.
	* gcc.target/i386/prefetchwt1-1.c: New.
	* g++.dg/other/i386-2.C: Add new option.
	* g++.dg/other/i386-3.C: Ditto.
	* gcc.target/i386/sse-12.c: Ditto.
	* gcc.target/i386/sse-13.c: Update __builtin_prefetch, add new option.
	* gcc.target/i386/sse-14.c: Add new option.
	* gcc.target/i386/sse-22.c: Add new option.
	* gcc.target/i386/sse-23.c: Update __builtin_prefetch, add new option.
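
For illustration only (this is not part of the patch): the xmmintrin.h change
splits the _mm_prefetch hint into the two trailing arguments of
__builtin_prefetch, using bit 2 as the read/write flag and the low two bits
as the locality. A minimal sketch of that split follows. The helper names
hint_rw and hint_locality and the EX_* constants are hypothetical; the values
are copied from enum _mm_hint.

```c
#include <assert.h>

/* Hypothetical mirror of enum _mm_hint; values match xmmintrin.h. */
enum ex_hint
{
  EX_HINT_ET1 = 6,   /* new: intent to write, T1 locality */
  EX_HINT_T0  = 3,
  EX_HINT_T1  = 2,
  EX_HINT_T2  = 1,
  EX_HINT_NTA = 0
};

/* ET1 is T1 with bit 2 (value 4) set. */
static_assert (EX_HINT_ET1 == (EX_HINT_T1 | 0x4), "ET1 is T1 plus write bit");

/* Second argument of __builtin_prefetch: 1 for write, 0 for read. */
int
hint_rw (int hint)
{
  return (hint & 0x4) >> 2;
}

/* Third argument of __builtin_prefetch: locality in the range 0..3. */
int
hint_locality (int hint)
{
  return hint & 0x3;
}
```

With these helpers, EX_HINT_ET1 decodes to rw = 1 and locality = 2, which is
the (const_int 1) (const_int 2) pair that *prefetch_prefetchwt1_<mode>
matches in i386.md. The read-only hints keep rw = 0 and their old locality.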

---
 gcc/common/config/i386/i386-common.c          | 15 +++++++++++++++
 gcc/config/i386/cpuid.h                       |  4 ++++
 gcc/config/i386/driver-i386.c                 |  7 +++++--
 gcc/config/i386/i386-c.c                      |  2 ++
 gcc/config/i386/i386.c                        |  6 ++++++
 gcc/config/i386/i386.h                        |  2 ++
 gcc/config/i386/i386.md                       | 13 ++++++-------
 gcc/config/i386/i386.opt                      |  4 ++++
 gcc/config/i386/xmmintrin.h                   |  6 ++++--
 gcc/doc/invoke.texi                           |  4 +++-
 gcc/testsuite/g++.dg/other/i386-2.C           |  2 +-
 gcc/testsuite/g++.dg/other/i386-3.C           |  2 +-
 gcc/testsuite/gcc.target/i386/avx-1.c         |  2 +-
 gcc/testsuite/gcc.target/i386/prefetchwt1-1.c | 14 ++++++++++++++
 gcc/testsuite/gcc.target/i386/sse-12.c        |  2 +-
 gcc/testsuite/gcc.target/i386/sse-13.c        |  4 ++--
 gcc/testsuite/gcc.target/i386/sse-14.c        |  2 +-
 gcc/testsuite/gcc.target/i386/sse-22.c        |  2 +-
 gcc/testsuite/gcc.target/i386/sse-23.c        |  4 ++--
 19 files changed, 75 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/prefetchwt1-1.c

Comments

Uros Bizjak Feb. 25, 2014, 9:37 a.m. UTC | #1
The patch is OK for mainline.

Thanks,
Uros.
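
As a rough illustration of the detection this patch adds to
host_detect_local_cpu: per the cpuid.h hunk, PREFETCHWT1 is reported in bit 0
of %ecx for CPUID leaf 7. The sketch below takes the %ecx value as a
parameter instead of executing CPUID, so it does not depend on the host. The
names EX_BIT_PREFETCHWT1, ex_has_prefetchwt1 and ex_prefetchwt1_flag are
hypothetical and are not taken from the patch.

```c
#include <assert.h>

/* Same value as bit_PREFETCHWT1 in the patched cpuid.h. */
#define EX_BIT_PREFETCHWT1 (1u << 0)

static_assert (EX_BIT_PREFETCHWT1 == 1u, "PREFETCHWT1 is bit 0 of %ecx");

/* Nonzero if the leaf-7 %ecx value advertises PREFETCHWT1. */
unsigned int
ex_has_prefetchwt1 (unsigned int leaf7_ecx)
{
  return leaf7_ecx & EX_BIT_PREFETCHWT1;
}

/* Option string chosen from the detected bit, in the style the driver
   uses for the other ISA flags. */
const char *
ex_prefetchwt1_flag (unsigned int leaf7_ecx)
{
  return ex_has_prefetchwt1 (leaf7_ecx) ? " -mprefetchwt1"
					: " -mno-prefetchwt1";
}
```

On a real host the %ecx value would come from a CPUID query for leaf 7,
sub-leaf 0; passing it in keeps the sketch easy to check by hand.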

Patch

diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index b7f9ff6..a6ab555 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -69,6 +69,7 @@  along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA_PRFCHW_SET OPTION_MASK_ISA_PRFCHW
 #define OPTION_MASK_ISA_RDSEED_SET OPTION_MASK_ISA_RDSEED
 #define OPTION_MASK_ISA_ADX_SET OPTION_MASK_ISA_ADX
+#define OPTION_MASK_ISA_PREFETCHWT1_SET OPTION_MASK_ISA_PREFETCHWT1
 
 /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same
    as -msse4.2.  */
@@ -154,6 +155,7 @@  along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA_PRFCHW_UNSET OPTION_MASK_ISA_PRFCHW
 #define OPTION_MASK_ISA_RDSEED_UNSET OPTION_MASK_ISA_RDSEED
 #define OPTION_MASK_ISA_ADX_UNSET OPTION_MASK_ISA_ADX
+#define OPTION_MASK_ISA_PREFETCHWT1_UNSET OPTION_MASK_ISA_PREFETCHWT1
 
 /* SSE4 includes both SSE4.1 and SSE4.2.  -mno-sse4 should the same
    as -mno-sse4.1. */
@@ -757,6 +759,19 @@  ix86_handle_option (struct gcc_options *opts,
 	}
       return true;
 
+    case OPT_mprefetchwt1:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_PREFETCHWT1_SET;
+	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_PREFETCHWT1_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_PREFETCHWT1_UNSET;
+	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_PREFETCHWT1_UNSET;
+	}
+      return true;
+
   /* Comes from final.c -- no real reason to change it.  */
 #define MAX_CODE_ALIGN 16
 
diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index c7a53dd..8c323ae 100644
--- a/gcc/config/i386/cpuid.h
+++ b/gcc/config/i386/cpuid.h
@@ -65,6 +65,7 @@ 
 #define bit_3DNOW	(1 << 31)
 
 /* Extended Features (%eax == 7) */
+/* %ebx */
 #define bit_FSGSBASE	(1 << 0)
 #define bit_BMI	(1 << 3)
 #define bit_HLE	(1 << 4)
@@ -79,6 +80,9 @@ 
 #define bit_AVX512CD	(1 << 28)
 #define bit_SHA		(1 << 29)
 
+/* %ecx */
+#define bit_PREFETCHWT1	  (1 << 0)
+
 /* Extended State Enumeration Sub-leaf (%eax == 13, %ecx == 1) */
 #define bit_XSAVEOPT	(1 << 0)
 
diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c
index 940ae20..1f5a11c 100644
--- a/gcc/config/i386/driver-i386.c
+++ b/gcc/config/i386/driver-i386.c
@@ -409,7 +409,7 @@  const char *host_detect_local_cpu (int argc, const char **argv)
   unsigned int has_rdseed = 0, has_prfchw = 0, has_adx = 0;
   unsigned int has_osxsave = 0, has_fxsr = 0, has_xsave = 0, has_xsaveopt = 0;
   unsigned int has_avx512er = 0, has_avx512pf = 0, has_avx512cd = 0;
-  unsigned int has_avx512f = 0, has_sha = 0;
+  unsigned int has_avx512f = 0, has_sha = 0, has_prefetchwt1 = 0;
 
   bool arch;
 
@@ -486,6 +486,8 @@  const char *host_detect_local_cpu (int argc, const char **argv)
       has_avx512pf = ebx & bit_AVX512PF;
       has_avx512cd = ebx & bit_AVX512CD;
       has_sha = ebx & bit_SHA;
+
+      has_prefetchwt1 = ecx & bit_PREFETCHWT1;
     }
 
   if (max_level >= 13)
@@ -883,6 +885,7 @@  const char *host_detect_local_cpu (int argc, const char **argv)
       const char *avx512er = has_avx512er ? " -mavx512er" : " -mno-avx512er";
       const char *avx512cd = has_avx512cd ? " -mavx512cd" : " -mno-avx512cd";
       const char *avx512pf = has_avx512pf ? " -mavx512pf" : " -mno-avx512pf";
+      const char *prefetchwt1 = has_prefetchwt1 ? " -mprefetchwt1" : " -mno-prefetchwt1";
 
       options = concat (options, mmx, mmx3dnow, sse, sse2, sse3, ssse3,
 			sse4a, cx16, sahf, movbe, aes, sha, pclmul,
@@ -890,7 +893,7 @@  const char *host_detect_local_cpu (int argc, const char **argv)
 			tbm, avx, avx2, sse4_2, sse4_1, lzcnt, rtm,
 			hle, rdrnd, f16c, fsgsbase, rdseed, prfchw, adx,
 			fxsr, xsave, xsaveopt, avx512f, avx512er,
-			avx512cd, avx512pf, NULL);
+			avx512cd, avx512pf, prefetchwt1, NULL);
     }
 
 done:
diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 0c50720..c9977bf 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -387,6 +387,8 @@  ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     def_or_undef (parse_in, "__XSAVE__");
   if (isa_flag & OPTION_MASK_ISA_XSAVEOPT)
     def_or_undef (parse_in, "__XSAVEOPT__");
+  if (isa_flag & OPTION_MASK_ISA_PREFETCHWT1)
+    def_or_undef (parse_in, "__PREFETCHWT1__");
   if ((fpmath & FPMATH_SSE) && (isa_flag & OPTION_MASK_ISA_SSE))
     def_or_undef (parse_in, "__SSE_MATH__");
   if ((fpmath & FPMATH_SSE) && (isa_flag & OPTION_MASK_ISA_SSE2))
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 4fead55..00773d8 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2622,6 +2622,7 @@  ix86_target_string (HOST_WIDE_INT isa, int flags, const char *arch,
     { "-mrtm",		OPTION_MASK_ISA_RTM },
     { "-mxsave",	OPTION_MASK_ISA_XSAVE },
     { "-mxsaveopt",	OPTION_MASK_ISA_XSAVEOPT },
+    { "-mprefetchwt1",	OPTION_MASK_ISA_PREFETCHWT1 },
   };
 
   /* Flag options.  */
@@ -3112,6 +3113,7 @@  ix86_option_override_internal (bool main_args_p,
 #define PTA_AVX512PF		(HOST_WIDE_INT_1 << 42)
 #define PTA_AVX512CD		(HOST_WIDE_INT_1 << 43)
 #define PTA_SHA			(HOST_WIDE_INT_1 << 45)
+#define PTA_PREFETCHWT1		(HOST_WIDE_INT_1 << 46)
 
 #define PTA_CORE2 \
   (PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3 | PTA_SSSE3 \
@@ -3666,6 +3668,9 @@  ix86_option_override_internal (bool main_args_p,
 	if (processor_alias_table[i].flags & PTA_AVX512CD
 	    && !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_AVX512CD))
 	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512CD;
+	if (processor_alias_table[i].flags & PTA_PREFETCHWT1
+	    && !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_PREFETCHWT1))
+	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_PREFETCHWT1;
 	if (processor_alias_table[i].flags & (PTA_PREFETCH_SSE | PTA_SSE))
 	  x86_prefetch_sse = true;
 
@@ -4547,6 +4552,7 @@  ix86_valid_target_attribute_inner_p (tree args, char *p_strings[],
     IX86_ATTR_ISA ("fxsr",	OPT_mfxsr),
     IX86_ATTR_ISA ("xsave",	OPT_mxsave),
     IX86_ATTR_ISA ("xsaveopt",	OPT_mxsaveopt),
+    IX86_ATTR_ISA ("prefetchwt1", OPT_mprefetchwt1),
 
     /* enum options */
     IX86_ATTR_ENUM ("fpmath=",	OPT_mfpmath_),
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 1b6460a..c80878b 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -130,6 +130,8 @@  see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define TARGET_XSAVE_P(x)	TARGET_ISA_XSAVE_P(x)
 #define TARGET_XSAVEOPT	TARGET_ISA_XSAVEOPT
 #define TARGET_XSAVEOPT_P(x)	TARGET_ISA_XSAVEOPT_P(x)
+#define TARGET_PREFETCHWT1	TARGET_ISA_PREFETCHWT1
+#define TARGET_PREFETCHWT1_P(x)	TARGET_ISA_PREFETCHWT1_P(x)
 
 #define TARGET_LP64	TARGET_ABI_64
 #define TARGET_LP64_P(x)	TARGET_ABI_64_P(x)
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 232a334..b9f1320 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -17856,7 +17856,7 @@ 
   [(prefetch (match_operand 0 "address_operand")
 	     (match_operand:SI 1 "const_int_operand")
 	     (match_operand:SI 2 "const_int_operand"))]
-  "TARGET_PREFETCH_SSE || TARGET_PRFCHW || TARGET_AVX512PF"
+  "TARGET_PREFETCH_SSE || TARGET_PRFCHW || TARGET_PREFETCHWT1"
 {
   bool write = INTVAL (operands[1]) != 0;
   int locality = INTVAL (operands[2]);
@@ -17867,8 +17867,8 @@ 
      supported by SSE counterpart or the SSE prefetch is not available
      (K6 machines).  Otherwise use SSE prefetch as it allows specifying
      of locality.  */
-  if (TARGET_AVX512PF && write)
-    operands[2] = const1_rtx;
+  if (TARGET_PREFETCHWT1 && write)
+    operands[2] = const2_rtx;
   else if (TARGET_PRFCHW && (write || !TARGET_PREFETCH_SSE))
     operands[2] = GEN_INT (3);
   else
@@ -17912,14 +17912,13 @@ 
 	(symbol_ref "memory_address_length (operands[0], false)"))
    (set_attr "memory" "none")])
 
-(define_insn "*prefetch_avx512pf_<mode>"
+(define_insn "*prefetch_prefetchwt1_<mode>"
   [(prefetch (match_operand:P 0 "address_operand" "p")
 	     (const_int 1)
-	     (const_int 1))]
-  "TARGET_AVX512PF"
+	     (const_int 2))]
+  "TARGET_PREFETCHWT1"
   "prefetchwt1\t%a0";
   [(set_attr "type" "sse")
-   (set_attr "prefix" "evex")
    (set (attr "length_address")
 	(symbol_ref "memory_address_length (operands[0], false)"))
    (set_attr "memory" "none")])
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index d5dd0fa..0f463a2 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -757,6 +757,10 @@  mf16c
 Target Report Mask(ISA_F16C) Var(ix86_isa_flags) Save
 Support F16C built-in functions and code generation
 
+mprefetchwt1
+Target Report Mask(ISA_PREFETCHWT1) Var(ix86_isa_flags) Save
+Support PREFETCHWT1 built-in functions and code generation
+
 mfentry
 Target Report Var(flag_fentry) Init(-1)
 Emit profiling counter call at function entry before prologue.
diff --git a/gcc/config/i386/xmmintrin.h b/gcc/config/i386/xmmintrin.h
index 0511dcf..9cefa2c 100644
--- a/gcc/config/i386/xmmintrin.h
+++ b/gcc/config/i386/xmmintrin.h
@@ -53,6 +53,8 @@  typedef float __v4sf __attribute__ ((__vector_size__ (16)));
 /* Constants for use with _mm_prefetch.  */
 enum _mm_hint
 {
+  /* _MM_HINT_ET is _MM_HINT_T with set 3rd bit.  */
+  _MM_HINT_ET1 = 6,
   _MM_HINT_T0 = 3,
   _MM_HINT_T1 = 2,
   _MM_HINT_T2 = 1,
@@ -1191,11 +1193,11 @@  _m_psadbw (__m64 __A, __m64 __B)
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_prefetch (const void *__P, enum _mm_hint __I)
 {
-  __builtin_prefetch (__P, 0, __I);
+  __builtin_prefetch (__P, (__I & 0x4) >> 2, __I & 0x3);
 }
 #else
 #define _mm_prefetch(P, I) \
-  __builtin_prefetch ((P), 0, (I))
+  __builtin_prefetch ((P), ((I & 0x4) >> 2), (I & 0x3))
 #endif
 
 /* Stores the data in A to the address P without polluting the caches.  */
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 959664c..7bcaa83 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -667,7 +667,7 @@  Objective-C and Objective-C++ Dialects}.
 -mvzeroupper -mprefer-avx128 @gol
 -mmmx  -msse  -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 -mavx @gol
 -mavx2 -mavx512f -mavx512pf -mavx512er -mavx512cd -msha @gol
--maes -mpclmul -mfsgsbase -mrdrnd -mf16c -mfma @gol
+-maes -mpclmul -mfsgsbase -mrdrnd -mf16c -mfma -mprefetchwt1 @gol
 -msse4a -m3dnow -mpopcnt -mabm -mbmi -mtbm -mfma4 -mxop -mlzcnt @gol
 -mbmi2 -mfxsr -mxsave -mxsaveopt -mrtm -mlwp -mthreads @gol
 -mno-align-stringops  -minline-all-stringops @gol
@@ -15264,6 +15264,8 @@  preferred alignment to @option{-mpreferred-stack-boundary=2}.
 @itemx -mno-f16c
 @itemx -mfma
 @itemx -mno-fma
+@itemx -mprefetchwt1
+@itemx -mno-prefetchwt1
 @itemx -msse4a
 @itemx -mno-sse4a
 @itemx -mfma4
diff --git a/gcc/testsuite/g++.dg/other/i386-2.C b/gcc/testsuite/g++.dg/other/i386-2.C
index a7ef6dc..2f8650a6 100644
--- a/gcc/testsuite/g++.dg/other/i386-2.C
+++ b/gcc/testsuite/g++.dg/other/i386-2.C
@@ -1,5 +1,5 @@ 
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha" } */
+/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1" } */
 
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
    xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
diff --git a/gcc/testsuite/g++.dg/other/i386-3.C b/gcc/testsuite/g++.dg/other/i386-3.C
index 4c443b1..df0bd27 100644
--- a/gcc/testsuite/g++.dg/other/i386-3.C
+++ b/gcc/testsuite/g++.dg/other/i386-3.C
@@ -1,5 +1,5 @@ 
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha" } */
+/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1" } */
 
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
    xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index f7e412d..12cfc68 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -152,7 +152,7 @@ 
 #define __builtin_ia32_shufpd(A, B, N) __builtin_ia32_shufpd(A, B, 0)
 
 /* xmmintrin.h */
-#define __builtin_prefetch(P, A, I) __builtin_prefetch(P, A, _MM_HINT_NTA)
+#define __builtin_prefetch(P, A, I) __builtin_prefetch(P, 0, _MM_HINT_NTA)
 #define __builtin_ia32_pshufw(A, N) __builtin_ia32_pshufw(A, 0)
 #define __builtin_ia32_vec_set_v4hi(A, D, N) \
   __builtin_ia32_vec_set_v4hi(A, D, 0)
diff --git a/gcc/testsuite/gcc.target/i386/prefetchwt1-1.c b/gcc/testsuite/gcc.target/i386/prefetchwt1-1.c
new file mode 100644
index 0000000..1b88516
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/prefetchwt1-1.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile } */
+/* { dg-options "-mprefetchwt1 -O2" } */
+/* { dg-final { scan-assembler "\[ \\t\]+prefetchwt1\[ \\t\]+" } } */
+
+#include <x86intrin.h>
+
+void *p;
+
+void extern
+prefetchw__test (void)
+{
+    _mm_prefetch (p, _MM_HINT_ET1);
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/sse-12.c b/gcc/testsuite/gcc.target/i386/sse-12.c
index cf91a9d..51de357 100644
--- a/gcc/testsuite/gcc.target/i386/sse-12.c
+++ b/gcc/testsuite/gcc.target/i386/sse-12.c
@@ -3,7 +3,7 @@ 
    popcntintrin.h and mm_malloc.h are usable
    with -O -std=c89 -pedantic-errors.  */
 /* { dg-do compile } */
-/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha" } */
+/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1" } */
 
 #include <x86intrin.h>
 
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index c0068a8..171e242 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1" } */
 
 #include <mm_malloc.h>
 
@@ -138,7 +138,7 @@ 
 #define __builtin_ia32_shufpd(A, B, N) __builtin_ia32_shufpd(A, B, 0)
 
 /* xmmintrin.h */
-#define __builtin_prefetch(P, A, I) __builtin_prefetch(P, A, _MM_HINT_NTA)
+#define __builtin_prefetch(P, A, I) __builtin_prefetch(P, 0, _MM_HINT_NTA)
 #define __builtin_ia32_pshufw(A, N) __builtin_ia32_pshufw(A, 0)
 #define __builtin_ia32_vec_set_v4hi(A, D, N) \
   __builtin_ia32_vec_set_v4hi(A, D, 0)
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index dbe05cb..10334a6 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha" } */
+/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1" } */
 
 #include <mm_malloc.h>
 
diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index 85a03da..51f04c2 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -99,7 +99,7 @@ 
 
 
 #ifndef DIFFERENT_PRAGMAS
-#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha")
+#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1")
 #endif
 
 /* Following intrinsics require immediate arguments.  They
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index c02b151..5b24618 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -90,7 +90,7 @@ 
 #define __builtin_ia32_shufpd(A, B, N) __builtin_ia32_shufpd(A, B, 0)
 
 /* xmmintrin.h */
-#define __builtin_prefetch(P, A, I) __builtin_prefetch(P, A, _MM_HINT_NTA)
+#define __builtin_prefetch(P, A, I) __builtin_prefetch(P, 0, _MM_HINT_NTA)
 #define __builtin_ia32_pshufw(A, N) __builtin_ia32_pshufw(A, 0)
 #define __builtin_ia32_vec_set_v4hi(A, D, N) \
   __builtin_ia32_vec_set_v4hi(A, D, 0)
@@ -385,7 +385,7 @@ 
 /* shaintrin.h */
 #define __builtin_ia32_sha1rnds4(A, B, C) __builtin_ia32_sha1rnds4(A, B, 1)
 
-#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha")
+#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1")
 #include <wmmintrin.h>
 #include <smmintrin.h>
 #include <mm3dnow.h>