From patchwork Wed Jul 7 21:37:11 2010 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "H.J. Lu" X-Patchwork-Id: 58193 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id 80EDBB6EF3 for ; Thu, 8 Jul 2010 07:37:38 +1000 (EST) Received: (qmail 30385 invoked by alias); 7 Jul 2010 21:37:37 -0000 Received: (qmail 30329 invoked by uid 22791); 7 Jul 2010 21:37:29 -0000 X-SWARE-Spam-Status: No, hits=-0.9 required=5.0 tests=AWL, BAYES_00, NO_DNS_FOR_FROM, TW_AV, TW_CL, TW_FS, TW_MX, TW_OV, TW_VX, TW_XV, T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from mga02.intel.com (HELO mga02.intel.com) (134.134.136.20) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 07 Jul 2010 21:37:18 +0000 Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga101.jf.intel.com with ESMTP; 07 Jul 2010 14:36:25 -0700 X-ExtLoop1: 1 Received: from gnu-6.sc.intel.com ([10.3.194.135]) by orsmga002.jf.intel.com with ESMTP; 07 Jul 2010 14:37:48 -0700 Received: by gnu-6.sc.intel.com (Postfix, from userid 500) id 3FEF42025D; Wed, 7 Jul 2010 14:37:11 -0700 (PDT) Date: Wed, 7 Jul 2010 14:37:11 -0700 From: "H.J. Lu" To: gcc-patches@gcc.gnu.org Subject: [ix86/gcc-4_5-branch] PATCH: AVX Programming Reference (June, 2010) Message-ID: <20100707213711.GA12717@intel.com> Reply-To: "H.J. Lu" MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.5.20 (2009-12-10) Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Hi, I checked in this patch to backport support for AVX Programming Reference (June, 2010). H.J. diff --git a/gcc/ChangeLog.ix86 b/gcc/ChangeLog.ix86 index 1e2f8c0..d29f35b 100644 --- a/gcc/ChangeLog.ix86 +++ b/gcc/ChangeLog.ix86 @@ -1,6 +1,110 @@ 2010-07-07 H.J. Lu Backport from mainline + 2010-07-07 H.J. Lu + + PR target/44844 + * config/i386/i386.md (rdrand): Changed to expand to + retry if the carry flag isn't valid. + (rdrand_1): New. + + 2010-07-05 H.J. Lu + + AVX Programming Reference (June, 2010) + * config/i386/cpuid.h (bit_F16C): New. + (bit_RDRND): Likewise. + (bit_FSGSBASE): Likewise. + + * config/i386/i386-builtin-types.def: Add + "DEF_FUNCTION_TYPE (UINT16)", function types for + float16 <-> float conversions and + "DEF_FUNCTION_TYPE (VOID, UINT64)". + + * config/i386/i386-c.c (ix86_target_macros_internal): Support + OPTION_MASK_ISA_FSGSBASE, OPTION_MASK_ISA_RDRND and + OPTION_MASK_ISA_F16C. + + * config/i386/i386.c (OPTION_MASK_ISA_FSGSBASE_SET): New. + (OPTION_MASK_ISA_RDRND_SET): Likewise. + (OPTION_MASK_ISA_F16C_SET): Likewise. + (OPTION_MASK_ISA_FSGSBASE_UNSET): Likewise. + (OPTION_MASK_ISA_RDRND_UNSET): Likewise. + (OPTION_MASK_ISA_F16C_UNSET): Likewise. + (OPTION_MASK_ISA_AVX_UNSET): Add OPTION_MASK_ISA_F16C_UNSET. + (ix86_handle_option): Handle OPT_mfsgsbase, OPT_mrdrnd and + OPT_mf16c. + (ix86_target_string): Support -mfsgsbase, -mrdrnd and -mf16c. + (pta_flags): Add PTA_FSGSBASE, PTA_RDRND and PTA_F16C. + (override_options): Handle them. + (ix86_valid_target_attribute_inner_p): Handle fsgsbase, rdrnd + and f16c. + (ix86_builtins): Add IX86_BUILTIN_RDFSBASE32, + IX86_BUILTIN_RDFSBASE64, IX86_BUILTIN_RDGSBASE32, + IX86_BUILTIN_RDGSBASE64, IX86_BUILTIN_WRFSBASE32, + IX86_BUILTIN_WRFSBASE64, IX86_BUILTIN_WRGSBASE32, + IX86_BUILTIN_WRGSBASE64, IX86_BUILTIN_RDRAND16, + IX86_BUILTIN_RDRAND32, IX86_BUILTIN_RDRAND64, + IX86_BUILTIN_CVTPH2PS, IX86_BUILTIN_CVTPH2PS256, + IX86_BUILTIN_CVTPS2PH and IX86_BUILTIN_CVTPS2PH256. + (bdesc_args): Likewise. + (ix86_expand_args_builtin): Handle V8SF_FTYPE_V8HI, + V4SF_FTYPE_V8HI, V8HI_FTYPE_V8SF_INT and V8HI_FTYPE_V4SF_INT. + (ix86_expand_special_args_builtin): Handle VOID_FTYPE_UINT64, + VOID_FTYPE_UNSIGNED, UNSIGNED_FTYPE_VOID and UINT16_FTYPE_VOID. + Handle non-memory store. + + * config/i386/i386.h (TARGET_FSGSBASE): New. + (TARGET_RDRND): Likewise. + (TARGET_F12C): Likewise. + + * config/i386/i386.md (UNSPEC_VCVTPH2PS): New. + (UNSPEC_VCVTPS2PH): Likewise. + (UNSPECV_RDFSBASE): Likewise. + (UNSPECV_RDGSBASE): Likewise. + (UNSPECV_WRFSBASE): Likewise. + (UNSPECV_WRGSBASE): Likewise. + (UNSPECV_RDRAND): Likewise. + (rdfsbase): Likewise. + (rdgsbase): Likewise. + (wrfsbase): Likewise. + (wrgsbase): Likewise. + (rdrand): Likewise. + + * config/i386/i386.opt: Add -mfsgsbase, -mrdrnd and -mf16c. + + * config/i386/immintrin.h (_rdrand_u16): New. + (_rdrand_u32): Likewise. + (_readfsbase_u32): Likewise. + (_readfsbase_u64): Likewise. + (_readgsbase_u32): Likewise. + (_readgsbase_u64): Likewise. + (_writefsbase_u32): Likewise. + (_writefsbase_u64): Likewise. + (_writegsbase_u32): Likewise. + (_writegsbase_u64): Likewise. + (_rdrand_u64): Likewise. + (_cvtsh_ss): Likewise. + (_mm_cvtph_ps): Likewise. + (_mm256_cvtph_ps): Likewise. + (_cvtss_sh): Likewise. + (_mm_cvtps_ph): Likewise. + (_mm256_cvtps_ph): Likewise. + + * config/i386/sse.md (vcvtph2ps): New. + (*vcvtph2ps_load): Likewise. + (vcvtph2ps256): Likewise. + (vcvtps2ph): Likewise. + (*vcvtps2ph): Likewise. + (*vcvtps2ph_store): Likewise. + (vcvtps2ph256): Likewise. + + * doc/extend.texi: Document FSGSBASE and RDRND built-in functions. + + * doc/invoke.texi: Document -mfsgsbase, -mrdrnd and -mf16c. + +2010-07-07 H.J. Lu + + Backport from mainline 2010-07-04 H.J. Lu PR rtl-optimization/44695 diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h index a9d90a6..11c2f1e 100644 --- a/gcc/config/i386/cpuid.h +++ b/gcc/config/i386/cpuid.h @@ -35,6 +35,8 @@ #define bit_XSAVE (1 << 26) #define bit_OSXSAVE (1 << 27) #define bit_AVX (1 << 28) +#define bit_F16C (1 << 29) +#define bit_RDRND (1 << 30) /* %edx */ #define bit_CMPXCHG8B (1 << 8) @@ -58,6 +60,8 @@ #define bit_3DNOWP (1 << 30) #define bit_3DNOW (1 << 31) +/* Extended Features (%eax == 7) */ +#define bit_FSGSBASE (1 << 0) #if defined(__i386__) && defined(__PIC__) /* %ebx may be the PIC register. */ diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index 10310e2..09dd9eb 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -128,6 +128,7 @@ DEF_POINTER_TYPE (PCV8SF, V8SF, CONST) DEF_FUNCTION_TYPE (FLOAT128) DEF_FUNCTION_TYPE (UINT64) DEF_FUNCTION_TYPE (UNSIGNED) +DEF_FUNCTION_TYPE (UINT16) DEF_FUNCTION_TYPE (VOID) DEF_FUNCTION_TYPE (PVOID) @@ -179,6 +180,7 @@ DEF_FUNCTION_TYPE (V4SF, V4DF) DEF_FUNCTION_TYPE (V4SF, V4SF) DEF_FUNCTION_TYPE (V4SF, V4SI) DEF_FUNCTION_TYPE (V4SF, V8SF) +DEF_FUNCTION_TYPE (V4SF, V8HI) DEF_FUNCTION_TYPE (V4SI, V16QI) DEF_FUNCTION_TYPE (V4SI, V2DF) DEF_FUNCTION_TYPE (V4SI, V4DF) @@ -194,10 +196,12 @@ DEF_FUNCTION_TYPE (V8SF, PCV4SF) DEF_FUNCTION_TYPE (V8SF, V4SF) DEF_FUNCTION_TYPE (V8SF, V8SF) DEF_FUNCTION_TYPE (V8SF, V8SI) +DEF_FUNCTION_TYPE (V8SF, V8HI) DEF_FUNCTION_TYPE (V8SI, V4SI) DEF_FUNCTION_TYPE (V8SI, V8SF) DEF_FUNCTION_TYPE (VOID, PCVOID) DEF_FUNCTION_TYPE (VOID, PVOID) +DEF_FUNCTION_TYPE (VOID, UINT64) DEF_FUNCTION_TYPE (VOID, UNSIGNED) DEF_FUNCTION_TYPE (DI, V2DI, INT) @@ -282,6 +286,8 @@ DEF_FUNCTION_TYPE (V8HI, V4SI, V4SI) DEF_FUNCTION_TYPE (V8HI, V8HI, INT) DEF_FUNCTION_TYPE (V8HI, V8HI, SI) DEF_FUNCTION_TYPE (V8HI, V8HI, V8HI) +DEF_FUNCTION_TYPE (V8HI, V8SF, INT) +DEF_FUNCTION_TYPE (V8HI, V4SF, INT) DEF_FUNCTION_TYPE (V8QI, V4HI, V4HI) DEF_FUNCTION_TYPE (V8QI, V8QI, V8QI) DEF_FUNCTION_TYPE (V8SF, PCV8SF, V8SF) diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c index 35eab49..e557059 100644 --- a/gcc/config/i386/i386-c.c +++ b/gcc/config/i386/i386-c.c @@ -240,6 +240,12 @@ ix86_target_macros_internal (int isa_flag, def_or_undef (parse_in, "__ABM__"); if (isa_flag & OPTION_MASK_ISA_POPCNT) def_or_undef (parse_in, "__POPCNT__"); + if (isa_flag & OPTION_MASK_ISA_FSGSBASE) + def_or_undef (parse_in, "__FSGSBASE__"); + if (isa_flag & OPTION_MASK_ISA_RDRND) + def_or_undef (parse_in, "__RDRND__"); + if (isa_flag & OPTION_MASK_ISA_F16C) + def_or_undef (parse_in, "__F16C__"); if ((fpmath & FPMATH_SSE) && (isa_flag & OPTION_MASK_ISA_SSE)) def_or_undef (parse_in, "__SSE_MATH__"); if ((fpmath & FPMATH_SSE) && (isa_flag & OPTION_MASK_ISA_SSE2)) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 9c3f351..35ca0e8 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -1985,6 +1985,11 @@ static int ix86_isa_flags_explicit; #define OPTION_MASK_ISA_MOVBE_SET OPTION_MASK_ISA_MOVBE #define OPTION_MASK_ISA_CRC32_SET OPTION_MASK_ISA_CRC32 +#define OPTION_MASK_ISA_FSGSBASE_SET OPTION_MASK_ISA_FSGSBASE +#define OPTION_MASK_ISA_RDRND_SET OPTION_MASK_ISA_RDRND +#define OPTION_MASK_ISA_F16C_SET \ + (OPTION_MASK_ISA_F16C | OPTION_MASK_ISA_AVX_SET) + /* Define a set of ISAs which aren't available when a given ISA is disabled. MMX and SSE ISAs are handled separately. */ @@ -2010,7 +2015,7 @@ static int ix86_isa_flags_explicit; (OPTION_MASK_ISA_SSE4_2 | OPTION_MASK_ISA_AVX_UNSET ) #define OPTION_MASK_ISA_AVX_UNSET \ (OPTION_MASK_ISA_AVX | OPTION_MASK_ISA_FMA_UNSET \ - | OPTION_MASK_ISA_FMA4_UNSET) + | OPTION_MASK_ISA_FMA4_UNSET | OPTION_MASK_ISA_F16C_UNSET) #define OPTION_MASK_ISA_FMA_UNSET OPTION_MASK_ISA_FMA /* SSE4 includes both SSE4.1 and SSE4.2. -mno-sse4 should the same @@ -2034,6 +2039,10 @@ static int ix86_isa_flags_explicit; #define OPTION_MASK_ISA_MOVBE_UNSET OPTION_MASK_ISA_MOVBE #define OPTION_MASK_ISA_CRC32_UNSET OPTION_MASK_ISA_CRC32 +#define OPTION_MASK_ISA_FSGSBASE_UNSET OPTION_MASK_ISA_FSGSBASE +#define OPTION_MASK_ISA_RDRND_UNSET OPTION_MASK_ISA_RDRND +#define OPTION_MASK_ISA_F16C_UNSET OPTION_MASK_ISA_F16C + /* Vectorization library interface and handlers. */ tree (*ix86_veclib_handler)(enum built_in_function, tree, tree) = NULL; static tree ix86_veclibabi_svml (enum built_in_function, tree, tree); @@ -2401,6 +2410,45 @@ ix86_handle_option (size_t code, const char *arg ATTRIBUTE_UNUSED, int value) } return true; + case OPT_mfsgsbase: + if (value) + { + ix86_isa_flags |= OPTION_MASK_ISA_FSGSBASE_SET; + ix86_isa_flags_explicit |= OPTION_MASK_ISA_FSGSBASE_SET; + } + else + { + ix86_isa_flags &= ~OPTION_MASK_ISA_FSGSBASE_UNSET; + ix86_isa_flags_explicit |= OPTION_MASK_ISA_FSGSBASE_UNSET; + } + return true; + + case OPT_mrdrnd: + if (value) + { + ix86_isa_flags |= OPTION_MASK_ISA_RDRND_SET; + ix86_isa_flags_explicit |= OPTION_MASK_ISA_RDRND_SET; + } + else + { + ix86_isa_flags &= ~OPTION_MASK_ISA_RDRND_UNSET; + ix86_isa_flags_explicit |= OPTION_MASK_ISA_RDRND_UNSET; + } + return true; + + case OPT_mf16c: + if (value) + { + ix86_isa_flags |= OPTION_MASK_ISA_F16C_SET; + ix86_isa_flags_explicit |= OPTION_MASK_ISA_F16C_SET; + } + else + { + ix86_isa_flags &= ~OPTION_MASK_ISA_F16C_UNSET; + ix86_isa_flags_explicit |= OPTION_MASK_ISA_F16C_UNSET; + } + return true; + default: return true; } @@ -2444,6 +2492,9 @@ ix86_target_string (int isa, int flags, const char *arch, const char *tune, { "-mcrc32", OPTION_MASK_ISA_CRC32 }, { "-maes", OPTION_MASK_ISA_AES }, { "-mpclmul", OPTION_MASK_ISA_PCLMUL }, + { "-mfsgsbase", OPTION_MASK_ISA_FSGSBASE }, + { "-mrdrnd", OPTION_MASK_ISA_RDRND }, + { "-mf16c", OPTION_MASK_ISA_F16C }, }; /* Flag options. */ @@ -2661,7 +2712,10 @@ override_options (bool main_args_p) PTA_MOVBE = 1 << 20, PTA_FMA4 = 1 << 21, PTA_XOP = 1 << 22, - PTA_LWP = 1 << 23 + PTA_LWP = 1 << 23, + PTA_FSGSBASE = 1 << 24, + PTA_RDRND = 1 << 25, + PTA_F16C = 1 << 26 }; static struct pta @@ -3028,6 +3082,15 @@ override_options (bool main_args_p) if (processor_alias_table[i].flags & PTA_PCLMUL && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_PCLMUL)) ix86_isa_flags |= OPTION_MASK_ISA_PCLMUL; + if (processor_alias_table[i].flags & PTA_FSGSBASE + && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_FSGSBASE)) + ix86_isa_flags |= OPTION_MASK_ISA_FSGSBASE; + if (processor_alias_table[i].flags & PTA_RDRND + && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_RDRND)) + ix86_isa_flags |= OPTION_MASK_ISA_RDRND; + if (processor_alias_table[i].flags & PTA_F16C + && !(ix86_isa_flags_explicit & OPTION_MASK_ISA_F16C)) + ix86_isa_flags |= OPTION_MASK_ISA_F16C; if (processor_alias_table[i].flags & (PTA_PREFETCH_SSE | PTA_SSE)) x86_prefetch_sse = true; @@ -3693,6 +3756,9 @@ ix86_valid_target_attribute_inner_p (tree args, char *p_strings[]) IX86_ATTR_ISA ("fma4", OPT_mfma4), IX86_ATTR_ISA ("xop", OPT_mxop), IX86_ATTR_ISA ("lwp", OPT_mlwp), + IX86_ATTR_ISA ("fsgsbase", OPT_mfsgsbase), + IX86_ATTR_ISA ("rdrnd", OPT_mrdrnd), + IX86_ATTR_ISA ("f16c", OPT_mf16c), /* string options */ IX86_ATTR_STR ("arch=", IX86_FUNCTION_SPECIFIC_ARCH), @@ -21351,6 +21417,27 @@ enum ix86_builtins IX86_BUILTIN_CLZS, + /* FSGSBASE instructions. */ + IX86_BUILTIN_RDFSBASE32, + IX86_BUILTIN_RDFSBASE64, + IX86_BUILTIN_RDGSBASE32, + IX86_BUILTIN_RDGSBASE64, + IX86_BUILTIN_WRFSBASE32, + IX86_BUILTIN_WRFSBASE64, + IX86_BUILTIN_WRGSBASE32, + IX86_BUILTIN_WRGSBASE64, + + /* RDRND instructions. */ + IX86_BUILTIN_RDRAND16, + IX86_BUILTIN_RDRAND32, + IX86_BUILTIN_RDRAND64, + + /* F16C instructions. */ + IX86_BUILTIN_CVTPH2PS, + IX86_BUILTIN_CVTPH2PS256, + IX86_BUILTIN_CVTPS2PH, + IX86_BUILTIN_CVTPS2PH256, + IX86_BUILTIN_MAX }; @@ -21625,6 +21712,20 @@ static const struct builtin_description bdesc_special_args[] = { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpinssi3, "__builtin_ia32_lwpins32", IX86_BUILTIN_LWPINS32, UNKNOWN, (int) UCHAR_FTYPE_UINT_UINT_UINT }, { OPTION_MASK_ISA_LWP, CODE_FOR_lwp_lwpinsdi3, "__builtin_ia32_lwpins64", IX86_BUILTIN_LWPINS64, UNKNOWN, (int) UCHAR_FTYPE_UINT64_UINT_UINT }, + /* FSGSBASE */ + { OPTION_MASK_ISA_FSGSBASE | OPTION_MASK_ISA_64BIT, CODE_FOR_rdfsbasesi, "__builtin_ia32_rdfsbase32", IX86_BUILTIN_RDFSBASE32, UNKNOWN, (int) UNSIGNED_FTYPE_VOID }, + { OPTION_MASK_ISA_FSGSBASE | OPTION_MASK_ISA_64BIT, CODE_FOR_rdfsbasedi, "__builtin_ia32_rdfsbase64", IX86_BUILTIN_RDFSBASE64, UNKNOWN, (int) UINT64_FTYPE_VOID }, + { OPTION_MASK_ISA_FSGSBASE | OPTION_MASK_ISA_64BIT, CODE_FOR_rdgsbasesi, "__builtin_ia32_rdgsbase32", IX86_BUILTIN_RDGSBASE32, UNKNOWN, (int) UNSIGNED_FTYPE_VOID }, + { OPTION_MASK_ISA_FSGSBASE | OPTION_MASK_ISA_64BIT, CODE_FOR_rdgsbasedi, "__builtin_ia32_rdgsbase64", IX86_BUILTIN_RDGSBASE64, UNKNOWN, (int) UINT64_FTYPE_VOID }, + { OPTION_MASK_ISA_FSGSBASE | OPTION_MASK_ISA_64BIT, CODE_FOR_wrfsbasesi, "__builtin_ia32_wrfsbase32", IX86_BUILTIN_WRFSBASE32, UNKNOWN, (int) VOID_FTYPE_UNSIGNED }, + { OPTION_MASK_ISA_FSGSBASE | OPTION_MASK_ISA_64BIT, CODE_FOR_wrfsbasedi, "__builtin_ia32_wrfsbase64", IX86_BUILTIN_WRFSBASE64, UNKNOWN, (int) VOID_FTYPE_UINT64 }, + { OPTION_MASK_ISA_FSGSBASE | OPTION_MASK_ISA_64BIT, CODE_FOR_wrgsbasesi, "__builtin_ia32_wrgsbase32", IX86_BUILTIN_WRGSBASE32, UNKNOWN, (int) VOID_FTYPE_UNSIGNED }, + { OPTION_MASK_ISA_FSGSBASE | OPTION_MASK_ISA_64BIT, CODE_FOR_wrgsbasedi, "__builtin_ia32_wrgsbase64", IX86_BUILTIN_WRGSBASE64, UNKNOWN, (int) VOID_FTYPE_UINT64 }, + + /* RDRND */ + { OPTION_MASK_ISA_RDRND, CODE_FOR_rdrandhi, "__builtin_ia32_rdrand16", IX86_BUILTIN_RDRAND16, UNKNOWN, (int) UINT16_FTYPE_VOID }, + { OPTION_MASK_ISA_RDRND, CODE_FOR_rdrandsi, "__builtin_ia32_rdrand32", IX86_BUILTIN_RDRAND32, UNKNOWN, (int) UNSIGNED_FTYPE_VOID }, + { OPTION_MASK_ISA_RDRND | OPTION_MASK_ISA_64BIT, CODE_FOR_rdranddi, "__builtin_ia32_rdrand64", IX86_BUILTIN_RDRAND64, UNKNOWN, (int) UINT64_FTYPE_VOID }, }; /* Builtins with variable number of arguments. */ @@ -22251,6 +22352,12 @@ static const struct builtin_description bdesc_args[] = { OPTION_MASK_ISA_AVX, CODE_FOR_avx_movmskps256, "__builtin_ia32_movmskps256", IX86_BUILTIN_MOVMSKPS256, UNKNOWN, (int) INT_FTYPE_V8SF }, { OPTION_MASK_ISA_ABM, CODE_FOR_clzhi2_abm, "__builtin_clzs", IX86_BUILTIN_CLZS, UNKNOWN, (int) UINT16_FTYPE_UINT16 }, + + /* F16C */ + { OPTION_MASK_ISA_F16C, CODE_FOR_vcvtph2ps, "__builtin_ia32_vcvtph2ps", IX86_BUILTIN_CVTPH2PS, UNKNOWN, (int) V4SF_FTYPE_V8HI }, + { OPTION_MASK_ISA_F16C, CODE_FOR_vcvtph2ps256, "__builtin_ia32_vcvtph2ps256", IX86_BUILTIN_CVTPH2PS256, UNKNOWN, (int) V8SF_FTYPE_V8HI }, + { OPTION_MASK_ISA_F16C, CODE_FOR_vcvtps2ph, "__builtin_ia32_vcvtps2ph", IX86_BUILTIN_CVTPS2PH, UNKNOWN, (int) V8HI_FTYPE_V4SF_INT }, + { OPTION_MASK_ISA_F16C, CODE_FOR_vcvtps2ph256, "__builtin_ia32_vcvtps2ph256", IX86_BUILTIN_CVTPS2PH256, UNKNOWN, (int) V8HI_FTYPE_V8SF_INT }, }; /* FMA4 and XOP. */ @@ -23491,6 +23598,7 @@ ix86_expand_args_builtin (const struct builtin_description *d, case V8SF_FTYPE_V8SF: case V8SF_FTYPE_V8SI: case V8SF_FTYPE_V4SF: + case V8SF_FTYPE_V8HI: case V4SI_FTYPE_V4SI: case V4SI_FTYPE_V16QI: case V4SI_FTYPE_V4SF: @@ -23507,6 +23615,7 @@ ix86_expand_args_builtin (const struct builtin_description *d, case V4SF_FTYPE_V4SI: case V4SF_FTYPE_V8SF: case V4SF_FTYPE_V4DF: + case V4SF_FTYPE_V8HI: case V4SF_FTYPE_V2DF: case V2DI_FTYPE_V2DI: case V2DI_FTYPE_V16QI: @@ -23609,6 +23718,8 @@ ix86_expand_args_builtin (const struct builtin_description *d, nargs_constant = 1; break; case V8HI_FTYPE_V8HI_INT: + case V8HI_FTYPE_V8SF_INT: + case V8HI_FTYPE_V4SF_INT: case V8SF_FTYPE_V8SF_INT: case V4SI_FTYPE_V4SI_INT: case V4SI_FTYPE_V8SI_INT: @@ -23856,7 +23967,16 @@ ix86_expand_special_args_builtin (const struct builtin_description *d, case VOID_FTYPE_VOID: emit_insn (GEN_FCN (icode) (target)); return 0; + case VOID_FTYPE_UINT64: + case VOID_FTYPE_UNSIGNED: + nargs = 0; + klass = store; + memory = 0; + break; + break; case UINT64_FTYPE_VOID: + case UNSIGNED_FTYPE_VOID: + case UINT16_FTYPE_VOID: nargs = 0; klass = load; memory = 0; @@ -23935,7 +24055,10 @@ ix86_expand_special_args_builtin (const struct builtin_description *d, arg = CALL_EXPR_ARG (exp, 0); op = expand_normal (arg); gcc_assert (target == 0); - target = gen_rtx_MEM (tmode, copy_to_mode_reg (Pmode, op)); + if (memory) + target = gen_rtx_MEM (tmode, copy_to_mode_reg (Pmode, op)); + else + target = force_reg (tmode, op); arg_adjust = 1; } else diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index 5bae99d..694d377 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -66,6 +66,9 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see #define TARGET_AES OPTION_ISA_AES #define TARGET_PCLMUL OPTION_ISA_PCLMUL #define TARGET_CMPXCHG16B OPTION_ISA_CX16 +#define TARGET_FSGSBASE OPTION_ISA_FSGSBASE +#define TARGET_RDRND OPTION_ISA_RDRND +#define TARGET_F16C OPTION_ISA_F16C /* SSE4.1 defines round instructions */ diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 3ddd816..e988546 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -226,6 +226,8 @@ (UNSPEC_MASKSTORE 171) (UNSPEC_CAST 172) (UNSPEC_VTESTP 173) + (UNSPEC_VCVTPH2PS 174) + (UNSPEC_VCVTPS2PH 175) ]) (define_constants @@ -254,6 +256,11 @@ (UNSPECV_SLWP_INTRINSIC 23) (UNSPECV_LWPVAL_INTRINSIC 24) (UNSPECV_LWPINS_INTRINSIC 25) + (UNSPECV_RDFSBASE 26) + (UNSPECV_RDGSBASE 27) + (UNSPECV_WRFSBASE 28) + (UNSPECV_WRGSBASE 29) + (UNSPECV_RDRAND 30) ]) ;; Constants to represent pcomtrue/pcomfalse variants @@ -20932,6 +20939,71 @@ (set (attr "length") (symbol_ref "ix86_attr_length_address_default (insn) + 9"))]) +(define_insn "rdfsbase" + [(set (match_operand:SWI48 0 "register_operand" "=r") + (unspec_volatile:SWI48 [(const_int 0)] UNSPECV_RDFSBASE))] + "TARGET_64BIT && TARGET_FSGSBASE" + "rdfsbase %0" + [(set_attr "type" "other") + (set_attr "prefix_extra" "2")]) + +(define_insn "rdgsbase" + [(set (match_operand:SWI48 0 "register_operand" "=r") + (unspec_volatile:SWI48 [(const_int 0)] UNSPECV_RDGSBASE))] + "TARGET_64BIT && TARGET_FSGSBASE" + "rdgsbase %0" + [(set_attr "type" "other") + (set_attr "prefix_extra" "2")]) + +(define_insn "wrfsbase" + [(unspec_volatile [(match_operand:SWI48 0 "register_operand" "r")] + UNSPECV_WRFSBASE)] + "TARGET_64BIT && TARGET_FSGSBASE" + "wrfsbase %0" + [(set_attr "type" "other") + (set_attr "prefix_extra" "2")]) + +(define_insn "wrgsbase" + [(unspec_volatile [(match_operand:SWI48 0 "register_operand" "r")] + UNSPECV_WRGSBASE)] + "TARGET_64BIT && TARGET_FSGSBASE" + "wrgsbase %0" + [(set_attr "type" "other") + (set_attr "prefix_extra" "2")]) + +(define_expand "rdrand" + [(set (match_operand:SWI248 0 "register_operand" "=r") + (unspec_volatile:SWI248 [(const_int 0)] UNSPECV_RDRAND))] + "TARGET_RDRND" +{ + rtx retry_label, insn, ccc; + + retry_label = gen_label_rtx (); + + emit_label (retry_label); + + /* Generate rdrand. */ + emit_insn (gen_rdrand_1 (operands[0])); + + /* Retry if the carry flag isn't valid. */ + ccc = gen_rtx_REG (CCCmode, FLAGS_REG); + ccc = gen_rtx_EQ (VOIDmode, ccc, const0_rtx); + ccc = gen_rtx_IF_THEN_ELSE (VOIDmode, ccc, pc_rtx, + gen_rtx_LABEL_REF (VOIDmode, retry_label)); + insn = emit_jump_insn (gen_rtx_SET (VOIDmode, pc_rtx, ccc)); + JUMP_LABEL (insn) = retry_label; + + DONE; +}) + +(define_insn "rdrand_1" + [(set (match_operand:SWI248 0 "register_operand" "=r") + (unspec_volatile:SWI248 [(const_int 0)] UNSPECV_RDRAND))] + "TARGET_RDRND" + "rdrand %0" + [(set_attr "type" "other") + (set_attr "prefix_extra" "1")]) + (include "mmx.md") (include "sse.md") (include "sync.md") diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt index 0afdd11..f264c42 100644 --- a/gcc/config/i386/i386.opt +++ b/gcc/config/i386/i386.opt @@ -363,3 +363,15 @@ Support PCLMUL built-in functions and code generation msse2avx Target Report Var(ix86_sse2avx) Encode SSE instructions with VEX prefix + +mfsgsbase +Target Report Mask(ISA_FSGSBASE) Var(ix86_isa_flags) VarExists Save +Support FSGSBASE built-in functions and code generation + +mrdrnd +Target Report Mask(ISA_RDRND) Var(ix86_isa_flags) VarExists Save +Support RDRND built-in functions and code generation + +mf16c +Target Report Mask(ISA_F16C) Var(ix86_isa_flags) VarExists Save +Support F16C built-in functions and code generation diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h index 7a2b9b9..3e69060 100644 --- a/gcc/config/i386/immintrin.h +++ b/gcc/config/i386/immintrin.h @@ -56,4 +56,148 @@ #include #endif +#ifdef __RDRND__ +extern __inline unsigned short +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_rdrand_u16 (void) +{ + return __builtin_ia32_rdrand16 (); +} + +extern __inline unsigned int +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_rdrand_u32 (void) +{ + return __builtin_ia32_rdrand32 (); +} +#endif /* __RDRND__ */ + +#ifdef __x86_64__ +#ifdef __FSGSBASE__ +extern __inline unsigned int +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_readfsbase_u32 (void) +{ + return __builtin_ia32_rdfsbase32 (); +} + +extern __inline unsigned long long +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_readfsbase_u64 (void) +{ + return __builtin_ia32_rdfsbase64 (); +} + +extern __inline unsigned int +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_readgsbase_u32 (void) +{ + return __builtin_ia32_rdgsbase32 (); +} + +extern __inline unsigned long long +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_readgsbase_u64 (void) +{ + return __builtin_ia32_rdgsbase64 (); +} + +extern __inline void +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_writefsbase_u32 (unsigned int __B) +{ + __builtin_ia32_wrfsbase32 (__B); +} + +extern __inline void +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_writefsbase_u64 (unsigned long long __B) +{ + __builtin_ia32_wrfsbase64 (__B); +} + +extern __inline void +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_writegsbase_u32 (unsigned int __B) +{ + __builtin_ia32_wrgsbase32 (__B); +} + +extern __inline void +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_writegsbase_u64 (unsigned long long __B) +{ + __builtin_ia32_wrgsbase64 (__B); +} +#endif /* __FSGSBASE__ */ + +#ifdef __RDRND__ +extern __inline unsigned long long +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_rdrand_u64 (void) +{ + return __builtin_ia32_rdrand64 (); +} +#endif /* __RDRND__ */ +#endif /* __x86_64__ */ + +#ifdef __F16C__ +extern __inline float __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_cvtsh_ss (unsigned short __S) +{ + __v8hi __H = __extension__ (__v8hi){ __S, 0, 0, 0, 0, 0, 0, 0 }; + __v4sf __A = __builtin_ia32_vcvtph2ps (__H); + return __builtin_ia32_vec_ext_v4sf (__A, 0); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtph_ps (__m128i __A) +{ + return (__m128) __builtin_ia32_vcvtph2ps ((__v8hi) __A); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtph_ps (__m128i __A) +{ + return (__m256) __builtin_ia32_vcvtph2ps256 ((__v8hi) __A); +} + +#ifdef __OPTIMIZE__ +extern __inline unsigned short __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_cvtss_sh (float __F, const int __I) +{ + __v4sf __A = __extension__ (__v4sf){ __F, 0, 0, 0 }; + __v8hi __H = __builtin_ia32_vcvtps2ph (__A, __I); + return (unsigned short) __builtin_ia32_vec_ext_v8hi (__H, 0); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtps_ph (__m128 __A, const int __I) +{ + return (__m128i) __builtin_ia32_vcvtps2ph ((__v4sf) __A, __I); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtps_ph (__m256 __A, const int __I) +{ + return (__m128i) __builtin_ia32_vcvtps2ph256 ((__v8sf) __A, __I); +} +#else +#define _cvtss_sh(__F, __I) \ + (__extension__ \ + ({ \ + __v4sf __A = __extension__ (__v4sf){ __F, 0, 0, 0 }; \ + __v8hi __H = __builtin_ia32_vcvtps2ph (__A, __I); \ + (unsigned short) __builtin_ia32_vec_ext_v8hi (__H, 0); \ + })) + +#define _mm_cvtps_ph(A, I) \ + ((__m128i) __builtin_ia32_vcvtps2ph ((__v4sf)(__m128) A, (int) (I))) + +#define _mm256_cvtps_ph(A, I) \ + ((__m128i) __builtin_ia32_vcvtps2ph256 ((__v8sf)(__m256) A, (int) (I))) +#endif + +#endif /* __F16C__ */ + #endif /* _IMMINTRIN_H_INCLUDED */ diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 629b4c4..6d32dbf 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -12329,3 +12329,81 @@ (set_attr "length_immediate" "1,*") (set_attr "prefix" "vex") (set_attr "mode" "")]) + +(define_insn "vcvtph2ps" + [(set (match_operand:V4SF 0 "register_operand" "=x") + (vec_select:V4SF + (unspec:V8SF [(match_operand:V8HI 1 "register_operand" "x")] + UNSPEC_VCVTPH2PS) + (parallel [(const_int 0) (const_int 1) + (const_int 1) (const_int 2)])))] + "TARGET_F16C" + "vcvtph2ps\t{%1, %0|%0, %1}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "vex") + (set_attr "mode" "V4SF")]) + +(define_insn "*vcvtph2ps_load" + [(set (match_operand:V4SF 0 "register_operand" "=x") + (unspec:V4SF [(match_operand:V4HI 1 "memory_operand" "m")] + UNSPEC_VCVTPH2PS))] + "TARGET_F16C" + "vcvtph2ps\t{%1, %0|%0, %1}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "vex") + (set_attr "mode" "V8SF")]) + +(define_insn "vcvtph2ps256" + [(set (match_operand:V8SF 0 "register_operand" "=x") + (unspec:V8SF [(match_operand:V8HI 1 "nonimmediate_operand" "xm")] + UNSPEC_VCVTPH2PS))] + "TARGET_F16C" + "vcvtph2ps\t{%1, %0|%0, %1}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "vex") + (set_attr "mode" "V8SF")]) + +(define_expand "vcvtps2ph" + [(set (match_operand:V8HI 0 "register_operand" "") + (vec_concat:V8HI + (unspec:V4HI [(match_operand:V4SF 1 "register_operand" "") + (match_operand:SI 2 "immediate_operand" "")] + UNSPEC_VCVTPS2PH) + (match_dup 3)))] + "TARGET_F16C" + "operands[3] = CONST0_RTX (V4HImode);") + +(define_insn "*vcvtps2ph" + [(set (match_operand:V8HI 0 "register_operand" "=x") + (vec_concat:V8HI + (unspec:V4HI [(match_operand:V4SF 1 "register_operand" "x") + (match_operand:SI 2 "immediate_operand" "N")] + UNSPEC_VCVTPS2PH) + (match_operand:V4HI 3 "const0_operand" "")))] + "TARGET_F16C" + "vcvtps2ph\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "vex") + (set_attr "mode" "V4SF")]) + +(define_insn "*vcvtps2ph_store" + [(set (match_operand:V4HI 0 "memory_operand" "=m") + (unspec:V4HI [(match_operand:V4SF 1 "register_operand" "x") + (match_operand:SI 2 "immediate_operand" "N")] + UNSPEC_VCVTPS2PH))] + "TARGET_F16C" + "vcvtps2ph\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "vex") + (set_attr "mode" "V4SF")]) + +(define_insn "vcvtps2ph256" + [(set (match_operand:V8HI 0 "nonimmediate_operand" "=xm") + (unspec:V8HI [(match_operand:V8SF 1 "register_operand" "x") + (match_operand:SI 2 "immediate_operand" "N")] + UNSPEC_VCVTPS2PH))] + "TARGET_F16C" + "vcvtps2ph\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "vex") + (set_attr "mode" "V8SF")]) diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 78d9093..ed4f5e6 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -8930,6 +8930,31 @@ used. Generates the @code{pclmulqdq} machine instruction. @end table +The following built-in function is available when @option{-mfsgsbase} is +used. All of them generate the machine instruction that is part of the +name. + +@smallexample +unsigned int __builtin_ia32_rdfsbase32 (void) +unsigned long long __builtin_ia32_rdfsbase64 (void) +unsigned int __builtin_ia32_rdgsbase32 (void) +unsigned long long __builtin_ia32_rdgsbase64 (void) +void _writefsbase_u32 (unsigned int) +void _writefsbase_u64 (unsigned long long) +void _writegsbase_u32 (unsigned int) +void _writegsbase_u64 (unsigned long long) +@end smallexample + +The following built-in function is available when @option{-mrdrnd} is +used. All of them generate the machine instruction that is part of the +name. + +@smallexample +unsigned short __builtin_ia32_rdrand16 (void) +unsigned int __builtin_ia32_rdrand32 (void) +unsigned long long __builtin_ia32_rdrand64 (void) +@end smallexample + The following built-in functions are available when @option{-msse4a} is used. All of them generate the machine instruction that is part of the name. diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index bf3cd18..2dd0ccd 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -592,7 +592,7 @@ Objective-C and Objective-C++ Dialects}. -mincoming-stack-boundary=@var{num} -mcld -mcx16 -msahf -mmovbe -mcrc32 -mrecip @gol -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 -mavx @gol --maes -mpclmul -mfused-madd @gol +-maes -mpclmul -mfsgsbase -mrdrnd -mf16c -mfused-madd @gol -msse4a -m3dnow -mpopcnt -mabm -mfma4 -mxop -mlwp @gol -mthreads -mno-align-stringops -minline-all-stringops @gol -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol @@ -12070,6 +12070,12 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}. @itemx -mno-aes @itemx -mpclmul @itemx -mno-pclmul +@itemx -mfsgsbase +@itemx -mno-fsgsbase +@itemx -mrdrnd +@itemx -mno-rdrnd +@itemx -mf16c +@itemx -mno-f16c @itemx -msse4a @itemx -mno-sse4a @itemx -mfma4 @@ -12091,8 +12097,8 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}. @opindex m3dnow @opindex mno-3dnow These switches enable or disable the use of instructions in the MMX, -SSE, SSE2, SSE3, SSSE3, SSE4.1, AVX, AES, PCLMUL, SSE4A, FMA4, XOP, -LWP, ABM or 3DNow!@: extended instruction sets. +SSE, SSE2, SSE3, SSSE3, SSE4.1, AVX, AES, PCLMUL, FSGSBASE, RDRND, +F16C, SSE4A, FMA4, XOP, LWP, ABM or 3DNow!@: extended instruction sets. These extensions are also available as built-in functions: see @ref{X86 Built-in Functions}, for details of the functions enabled and disabled by these switches. diff --git a/gcc/testsuite/ChangeLog.ix86 b/gcc/testsuite/ChangeLog.ix86 index b902c5b..4477d2d 100644 --- a/gcc/testsuite/ChangeLog.ix86 +++ b/gcc/testsuite/ChangeLog.ix86 @@ -1,6 +1,58 @@ 2010-07-07 H.J. Lu Backport from mainline + 2010-07-07 H.J. Lu + + PR target/44844 + * gcc.target/i386/rdrand-1.c: Scan "jnc". + * gcc.target/i386/rdrand-2.c: Likewise. + * gcc.target/i386/rdrand-3.c: Likewise. + + 2010-07-05 H.J. Lu + + AVX Programming Reference (June, 2010) + * g++.dg/other/i386-2.C: Add -mfsgsbase -mrdrnd -mf16c. + * g++.dg/other/i386-3.C: Likewise. + * gcc.target/i386/sse-12.c: Likewise. + + * gcc.target/i386/f16c-check.h: New. + * gcc.target/i386/rdfsbase-1.c: Likewise. + * gcc.target/i386/rdfsbase-2.c: Likewise. + * gcc.target/i386/rdgsbase-1.c: Likewise. + * gcc.target/i386/rdgsbase-2.c: Likewise. + * gcc.target/i386/rdrand-1.c: Likewise. + * gcc.target/i386/rdrand-2.c: Likewise. + * gcc.target/i386/rdrand-3.c: Likewise. + * gcc.target/i386/vcvtph2ps-1.c: Likewise. + * gcc.target/i386/vcvtph2ps-2.c: Likewise. + * gcc.target/i386/vcvtph2ps-3.c: Likewise. + * gcc.target/i386/vcvtps2ph-1.c: Likewise. + * gcc.target/i386/vcvtps2ph-2.c: Likewise. + * gcc.target/i386/vcvtps2ph-3.c: Likewise. + * gcc.target/i386/wrfsbase-1.c: Likewise. + * gcc.target/i386/wrfsbase-2.c: Likewise. + * gcc.target/i386/wrgsbase-1.c: Likewise. + * gcc.target/i386/wrgsbase-2.c: Likewise. + + * gcc.target/i386/sse-13.c: Add -mfsgsbase -mrdrnd -mf16c. + (__builtin_ia32_vcvtps2ph): New. + (__builtin_ia32_vcvtps2ph256): Likewise. + + * gcc.target/i386/sse-14.c: Add -mfsgsbase -mrdrnd -mf16c. + Test _cvtss_sh, _mm_cvtps_ph and _mm256_cvtps_ph. + + * gcc.target/i386/sse-22.c: Add fsgsbase,rdrnd,f16c. + Test _cvtss_sh, _mm_cvtps_ph and _mm256_cvtps_ph. + + * gcc.target/i386/sse-23.c (__builtin_ia32_vcvtps2ph): New. + (__builtin_ia32_vcvtps2ph256): Likewise. + Add fsgsbase,rdrnd,f16c. + + * lib/target-supports.exp (check_effective_target_f16c): New. + +2010-07-07 H.J. Lu + + Backport from mainline 2010-07-04 H.J. Lu PR rtl-optimization/44695 diff --git a/gcc/testsuite/g++.dg/other/i386-2.C b/gcc/testsuite/g++.dg/other/i386-2.C index 952fa14..7297068 100644 --- a/gcc/testsuite/g++.dg/other/i386-2.C +++ b/gcc/testsuite/g++.dg/other/i386-2.C @@ -1,5 +1,5 @@ /* { dg-do compile { target i?86-*-* x86_64-*-* } } */ -/* { dg-options "-O -pedantic-errors -march=k8 -m3dnow -mavx -mxop -maes -mpclmul -mpopcnt -mabm -mlwp" } */ +/* { dg-options "-O -pedantic-errors -march=k8 -m3dnow -mavx -mxop -maes -mpclmul -mpopcnt -mabm -mlwp -mfsgsbase -mrdrnd -mf16c" } */ /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, xopintrin.h, abmintrin.h, lwpintrin.h, popcntintrin.h and mm3dnow.h are usable with diff --git a/gcc/testsuite/g++.dg/other/i386-3.C b/gcc/testsuite/g++.dg/other/i386-3.C index 88dd769..75515ef 100644 --- a/gcc/testsuite/g++.dg/other/i386-3.C +++ b/gcc/testsuite/g++.dg/other/i386-3.C @@ -1,5 +1,5 @@ /* { dg-do compile { target i?86-*-* x86_64-*-* } } */ -/* { dg-options "-O -fkeep-inline-functions -march=k8 -m3dnow -mavx -mxop -maes -mpclmul -mpopcnt -mabm -mlwp" } */ +/* { dg-options "-O -fkeep-inline-functions -march=k8 -m3dnow -mavx -mxop -maes -mpclmul -mpopcnt -mabm -mlwp -mfsgsbase -mrdrnd -mf16c" } */ /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, xopintrin.h, abmintrin.h, lwpintrin.h, popcntintrin.h and mm3dnow.h are usable with diff --git a/gcc/testsuite/gcc.target/i386/f16c-check.h b/gcc/testsuite/gcc.target/i386/f16c-check.h new file mode 100644 index 0000000..af7f32c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/f16c-check.h @@ -0,0 +1,30 @@ +#include +#include +#include "cpuid.h" +#include "m256-check.h" + +static void f16c_test (void); + +int +main () +{ + unsigned int eax, ebx, ecx, edx; + + if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx)) + return 0; + + /* Run F16C test only if host has F16C support. */ + if (ecx & bit_F16C) + { + f16c_test (); +#ifdef DEBUG + printf ("PASSED\n"); +#endif + } +#ifdef DEBUG + else + printf ("SKIPPED\n"); +#endif + + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/rdfsbase-1.c b/gcc/testsuite/gcc.target/i386/rdfsbase-1.c new file mode 100644 index 0000000..c4808e9 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/rdfsbase-1.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-options "-O2 -mfsgsbase" } */ +/* { dg-final { scan-assembler "rdfsbase\[ \t]+(%|)eax" } } */ + +#include + +unsigned int +read_fs_base32 (void) +{ + return _readfsbase_u32 (); +} diff --git a/gcc/testsuite/gcc.target/i386/rdfsbase-2.c b/gcc/testsuite/gcc.target/i386/rdfsbase-2.c new file mode 100644 index 0000000..40b8f4a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/rdfsbase-2.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-options "-O2 -mfsgsbase" } */ +/* { dg-final { scan-assembler "rdfsbase\[ \t]+(%|)rax" } } */ + +#include + +unsigned long long +read_fs_base64 (void) +{ + return _readfsbase_u64 (); +} diff --git a/gcc/testsuite/gcc.target/i386/rdgsbase-1.c b/gcc/testsuite/gcc.target/i386/rdgsbase-1.c new file mode 100644 index 0000000..1e5a302 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/rdgsbase-1.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-options "-O2 -mfsgsbase" } */ +/* { dg-final { scan-assembler "rdgsbase\[ \t]+(%|)eax" } } */ + +#include + +unsigned int +read_gs_base32 (void) +{ + return _readgsbase_u32 (); +} diff --git a/gcc/testsuite/gcc.target/i386/rdgsbase-2.c b/gcc/testsuite/gcc.target/i386/rdgsbase-2.c new file mode 100644 index 0000000..1321582 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/rdgsbase-2.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-options "-O2 -mfsgsbase" } */ +/* { dg-final { scan-assembler "rdgsbase\[ \t]+(%|)rax" } } */ + +#include + +unsigned long long +read_gs_base64 (void) +{ + return _readgsbase_u64 (); +} diff --git a/gcc/testsuite/gcc.target/i386/rdrand-1.c b/gcc/testsuite/gcc.target/i386/rdrand-1.c new file mode 100644 index 0000000..4f6b9e1 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/rdrand-1.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mrdrnd " } */ +/* { dg-final { scan-assembler "rdrand\[ \t]+(%|)ax" } } */ +/* { dg-final { scan-assembler "jnc\[ \t]+" } } */ + +#include + +unsigned short +read_rdrand16 (void) +{ + return _rdrand_u16 (); +} diff --git a/gcc/testsuite/gcc.target/i386/rdrand-2.c b/gcc/testsuite/gcc.target/i386/rdrand-2.c new file mode 100644 index 0000000..2297383 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/rdrand-2.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mrdrnd " } */ +/* { dg-final { scan-assembler "rdrand\[ \t]+(%|)eax" } } */ +/* { dg-final { scan-assembler "jnc\[ \t]+" } } */ + +#include + +unsigned int +read_rdrand32 (void) +{ + return _rdrand_u32 (); +} diff --git a/gcc/testsuite/gcc.target/i386/rdrand-3.c b/gcc/testsuite/gcc.target/i386/rdrand-3.c new file mode 100644 index 0000000..17c7c6f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/rdrand-3.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-options "-O2 -mrdrnd " } */ +/* { dg-final { scan-assembler "rdrand\[ \t]+(%|)rax" } } */ +/* { dg-final { scan-assembler "jnc\[ \t]+" } } */ + +#include + +unsigned long long +read_rdrand64 (void) +{ + return _rdrand_u64 (); +} diff --git a/gcc/testsuite/gcc.target/i386/sse-12.c b/gcc/testsuite/gcc.target/i386/sse-12.c index 77baff0..2d50f41 100644 --- a/gcc/testsuite/gcc.target/i386/sse-12.c +++ b/gcc/testsuite/gcc.target/i386/sse-12.c @@ -2,7 +2,7 @@ abmintrin.h, lwpintrin.h, popcntintrin.h and mm_malloc.h are usable with -O -std=c89 -pedantic-errors. */ /* { dg-do compile } */ -/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -m3dnow -mavx -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlwp" } */ +/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -m3dnow -mavx -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlwp -mfsgsbase -mrdrnd -mf16c" } */ #include diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index 96214e0..01809d0 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mxop -maes -mpclmul -mpopcnt -mabm -mlwp" } */ +/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mxop -maes -mpclmul -mpopcnt -mabm -mlwp -mfsgsbase -mrdrnd -mf16c" } */ #include @@ -50,6 +50,8 @@ #define __builtin_ia32_vinsertf128_si256(X, Y, C) __builtin_ia32_vinsertf128_si256(X, Y, 1) #define __builtin_ia32_roundpd256(V, M) __builtin_ia32_roundpd256(V, 1) #define __builtin_ia32_roundps256(V, M) __builtin_ia32_roundps256(V, 1) +#define __builtin_ia32_vcvtps2ph(A, I) __builtin_ia32_vcvtps2ph(A, 1) +#define __builtin_ia32_vcvtps2ph256(A, I) __builtin_ia32_vcvtps2ph256(A, 1) /* wmmintrin.h */ #define __builtin_ia32_aeskeygenassist128(X, C) __builtin_ia32_aeskeygenassist128(X, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c index 96a3f21..d256e68 100644 --- a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mxop -msse4a -maes -mpclmul -mpopcnt -mabm -mlwp" } */ +/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -m3dnow -mavx -mxop -msse4a -maes -mpclmul -mpopcnt -mabm -mlwp -mfsgsbase -mrdrnd -mf16c" } */ #include @@ -89,6 +89,9 @@ test_2 (_mm256_insert_epi64, __m256i, __m256i, long long, 1) #endif test_1 (_mm256_round_pd, __m256d, __m256d, 1) test_1 (_mm256_round_ps, __m256, __m256, 1) +test_1 (_cvtss_sh, unsigned short, float, 1) +test_1 (_mm_cvtps_ph, __m128i, __m128, 1) +test_1 (_mm256_cvtps_ph, __m128i, __m256, 1) /* wmmintrin.h */ test_1 (_mm_aeskeygenassist_si128, __m128i, __m128i, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index 6d97697..bb0472d 100644 --- a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -39,7 +39,7 @@ #ifndef DIFFERENT_PRAGMAS -#pragma GCC target ("mmx,3dnow,sse,sse2,sse3,ssse3,sse4.1,sse4.2,sse4a,aes,pclmul,xop,popcnt,abm,lwp") +#pragma GCC target ("mmx,3dnow,sse,sse2,sse3,ssse3,sse4.1,sse4.2,sse4a,aes,pclmul,xop,popcnt,abm,lwp,fsgsbase,rdrnd,f16c") #endif /* Following intrinsics require immediate arguments. They @@ -179,3 +179,12 @@ test_2 ( __lwpins32, unsigned char, unsigned int, unsigned int, 1) test_2 ( __lwpval64, void, unsigned long long, unsigned int, 1) test_2 ( __lwpins64, unsigned char, unsigned long long, unsigned int, 1) #endif + +/* immintrin.h (F16C). */ +#ifdef DIFFERENT_PRAGMAS +#pragma GCC target ("f16c") +#endif +#include +test_1 (_cvtss_sh, unsigned short, float, 1) +test_1 (_mm_cvtps_ph, __m128i, __m128, 1) +test_1 (_mm256_cvtps_ph, __m128i, __m256, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index f74d3a7..0e15bb2 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -126,6 +126,8 @@ #define __builtin_ia32_vinsertf128_si256(X, Y, C) __builtin_ia32_vinsertf128_si256(X, Y, 1) #define __builtin_ia32_roundpd256(V, M) __builtin_ia32_roundpd256(V, 1) #define __builtin_ia32_roundps256(V, M) __builtin_ia32_roundps256(V, 1) +#define __builtin_ia32_vcvtps2ph(A, I) __builtin_ia32_vcvtps2ph(A, 1) +#define __builtin_ia32_vcvtps2ph256(A, I) __builtin_ia32_vcvtps2ph256(A, 1) /* xopintrin.h */ #define __builtin_ia32_vprotbi(A, B) __builtin_ia32_vprotbi(A,1) @@ -139,7 +141,7 @@ #define __builtin_ia32_lwpins32(D2, D1, F) __builtin_ia32_lwpins32 (D2, D1, 1) #define __builtin_ia32_lwpins64(D2, D1, F) __builtin_ia32_lwpins64 (D2, D1, 1) -#pragma GCC target ("3dnow,sse4,sse4a,aes,pclmul,xop,abm,popcnt,lwp") +#pragma GCC target ("3dnow,sse4,sse4a,aes,pclmul,xop,abm,popcnt,lwp,fsgsbase,rdrnd,f16c") #include #include #include diff --git a/gcc/testsuite/gcc.target/i386/vcvtph2ps-1.c b/gcc/testsuite/gcc.target/i386/vcvtph2ps-1.c new file mode 100644 index 0000000..3b46671 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/vcvtph2ps-1.c @@ -0,0 +1,28 @@ +/* { dg-do run } */ +/* { dg-require-effective-target f16c } */ +/* { dg-options "-O2 -mf16c" } */ + +#include "f16c-check.h" + +static void +f16c_test (void) +{ + union128i_w val; + union128 res; + float exp[4]; + + exp[0] = 1; + exp[1] = -2; + exp[2] = -1; + exp[3] = 2; + + val.a[0] = 0x3c00; + val.a[1] = 0xc000; + val.a[2] = 0xbc00; + val.a[3] = 0x4000; + + res.x = _mm_cvtph_ps (val.x); + + if (check_union128 (res, exp)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/vcvtph2ps-2.c b/gcc/testsuite/gcc.target/i386/vcvtph2ps-2.c new file mode 100644 index 0000000..1523dea --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/vcvtph2ps-2.c @@ -0,0 +1,36 @@ +/* { dg-do run } */ +/* { dg-require-effective-target f16c } */ +/* { dg-options "-O2 -mf16c" } */ + +#include "f16c-check.h" + +static void +f16c_test (void) +{ + union256 res; + union128i_w val; + float exp[8]; + + exp[0] = 1; + exp[1] = 2; + exp[2] = 4; + exp[3] = 8; + exp[4] = -1; + exp[5] = -2; + exp[6] = -4; + exp[7] = -8; + + val.a[0] = 0x3c00; + val.a[1] = 0x4000; + val.a[2] = 0x4400; + val.a[3] = 0x4800; + val.a[4] = 0xbc00; + val.a[5] = 0xc000; + val.a[6] = 0xc400; + val.a[7] = 0xc800; + + res.x = _mm256_cvtph_ps (val.x); + + if (check_union256 (res, exp)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/vcvtph2ps-3.c b/gcc/testsuite/gcc.target/i386/vcvtph2ps-3.c new file mode 100644 index 0000000..49b61f6 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/vcvtph2ps-3.c @@ -0,0 +1,18 @@ +/* { dg-do run } */ +/* { dg-require-effective-target f16c } */ +/* { dg-options "-O2 -mf16c" } */ + +#include "f16c-check.h" + +static void +f16c_test (void) +{ + unsigned short val = 0xc000; + float exp = -2; + float res; + + res = _cvtsh_ss (val); + + if (res != exp) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/vcvtps2ph-1.c b/gcc/testsuite/gcc.target/i386/vcvtps2ph-1.c new file mode 100644 index 0000000..c114c98 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/vcvtps2ph-1.c @@ -0,0 +1,32 @@ +/* { dg-do run } */ +/* { dg-require-effective-target f16c } */ +/* { dg-options "-O2 -mf16c" } */ + +#include "f16c-check.h" + +static void +f16c_test (void) +{ + union128 val; + union128i_w res; + short exp[8]; + + val.a[0] = 1; + val.a[1] = -2; + val.a[2] = -1; + val.a[3] = 2; + + exp[0] = 0x3c00; + exp[1] = 0xc000; + exp[2] = 0xbc00; + exp[3] = 0x4000; + exp[4] = 0; + exp[5] = 0; + exp[6] = 0; + exp[7] = 0; + + res.x = _mm_cvtps_ph (val.x, 0); + + if (check_union128i_w (res, exp)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/vcvtps2ph-2.c b/gcc/testsuite/gcc.target/i386/vcvtps2ph-2.c new file mode 100644 index 0000000..57436ae --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/vcvtps2ph-2.c @@ -0,0 +1,36 @@ +/* { dg-do run } */ +/* { dg-require-effective-target f16c } */ +/* { dg-options "-O2 -mf16c" } */ + +#include "f16c-check.h" + +static void +f16c_test (void) +{ + union256 val; + union128i_w res; + short exp[8]; + + val.a[0] = 1; + val.a[1] = 2; + val.a[2] = 4; + val.a[3] = 8; + val.a[4] = -1; + val.a[5] = -2; + val.a[6] = -4; + val.a[7] = -8; + + exp[0] = 0x3c00; + exp[1] = 0x4000; + exp[2] = 0x4400; + exp[3] = 0x4800; + exp[4] = 0xbc00; + exp[5] = 0xc000; + exp[6] = 0xc400; + exp[7] = 0xc800; + + res.x = _mm256_cvtps_ph (val.x, 0); + + if (check_union128i_w (res, exp)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/vcvtps2ph-3.c b/gcc/testsuite/gcc.target/i386/vcvtps2ph-3.c new file mode 100644 index 0000000..3b7cb5c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/vcvtps2ph-3.c @@ -0,0 +1,18 @@ +/* { dg-do run } */ +/* { dg-require-effective-target f16c } */ +/* { dg-options "-O2 -mf16c" } */ + +#include "f16c-check.h" + +static void +f16c_test (void) +{ + float val = -2; + unsigned short exp = 0xc000; + unsigned short res; + + res = _cvtss_sh (val, 0); + + if (res != exp) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/wrfsbase-1.c b/gcc/testsuite/gcc.target/i386/wrfsbase-1.c new file mode 100644 index 0000000..4b84926 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/wrfsbase-1.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-options "-O2 -mfsgsbase" } */ +/* { dg-final { scan-assembler "wrfsbase\[ \t]+(%|)edi" } } */ + +#include + +void +write_fs_base32 (unsigned int base) +{ + _writefsbase_u32 (base); +} diff --git a/gcc/testsuite/gcc.target/i386/wrfsbase-2.c b/gcc/testsuite/gcc.target/i386/wrfsbase-2.c new file mode 100644 index 0000000..5e1762d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/wrfsbase-2.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-options "-O2 -mfsgsbase" } */ +/* { dg-final { scan-assembler "wrfsbase\[ \t]+(%|)rdi" } } */ + +#include + +void +write_fs_base64 (unsigned long long base) +{ + _writefsbase_u64 (base); +} diff --git a/gcc/testsuite/gcc.target/i386/wrgsbase-1.c b/gcc/testsuite/gcc.target/i386/wrgsbase-1.c new file mode 100644 index 0000000..15d2d7f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/wrgsbase-1.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-options "-O2 -mfsgsbase" } */ +/* { dg-final { scan-assembler "wrgsbase\[ \t]+(%|)edi" } } */ + +#include + +void +write_gs_base32 (unsigned int base) +{ + _writegsbase_u32 (base); +} diff --git a/gcc/testsuite/gcc.target/i386/wrgsbase-2.c b/gcc/testsuite/gcc.target/i386/wrgsbase-2.c new file mode 100644 index 0000000..0a33d77 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/wrgsbase-2.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-options "-O2 -mfsgsbase" } */ +/* { dg-final { scan-assembler "wrgsbase\[ \t]+(%|)rdi" } } */ + +#include + +void +write_gs_base64 (unsigned long long base) +{ + _writegsbase_u64 (base); +} diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 482f406..f49f04e 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -3145,6 +3145,19 @@ proc check_effective_target_sse2 { } { } "-O2 -msse2" ] } +# Return 1 if F16C instructions can be compiled. + +proc check_effective_target_f16c { } { + return [check_no_compiler_messages f16c object { + #include "immintrin.h" + float + foo (unsigned short val) + { + return _cvtsh_ss (val); + } + } "-O2 -mf16c" ] +} + # Return 1 if C wchar_t type is compatible with char16_t. proc check_effective_target_wchar_t_char16_t_compatible { } {