From patchwork Thu Jun 18 15:09:43 2020
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 1312190
References: <20200617144348.301851-1-hjl.tools@gmail.com> <87366sssl3.fsf@oldenburg2.str.redhat.com>
In-Reply-To: <87366sssl3.fsf@oldenburg2.str.redhat.com>
Date: Thu, 18 Jun 2020 08:09:43 -0700
Subject: V2 [PATCH] x86: Update CPU feature detection
To: Florian Weimer
From: "H.J. Lu"
Reply-To: "H.J. Lu"
Cc: "H.J. Lu via Libc-alpha"
List-Id: Libc-alpha mailing list

On Thu, Jun 18, 2020 at 1:09 AM Florian Weimer wrote:
>
> * H. J. Lu via Libc-alpha:
>
> > 1. Add COMMON_CPUID_INDEX_7_ECX_1 for AVX512_BF16.
>
> Do you expect similar changes in the future?  This changes the layout of
> struct cpu_features.
>
> If __x86_get_cpu_features is exported, such a change will require a new
> symbol version for __x86_get_cpu_features and can therefore only be made
> at a release boundary.  Even then, statically linked libraries will
> silently use the old layout of struct cpu_features, silently giving
> incorrect results if linked against a newer glibc version.
>

Here is the updated patch with:

1. Divide architecture features into the usable features and the
preferred features.  The usable features are for correctness and can
be exported in a stable ABI.  The preferred features are for
performance and only for glibc internal use.
2. Change struct cpu_features to

struct cpu_features
{
  struct cpu_features_basic basic;
  unsigned int usable[USABLE_FEATURE_INDEX_MAX];
  struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX];
  unsigned int preferred[PREFERRED_FEATURE_INDEX_MAX];
  ...
};

so that

struct cpu_features
{
  struct cpu_features_basic basic;
  unsigned int usable[USABLE_FEATURE_INDEX_MAX];
  struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX];
};

can be exported via a stable ABI.  Currently there are 70 unused bits
in the usable array.  They should be sufficient for future use.  The
cpuid array can be expanded with backward binary compatibility for
both .o and .so files.

From 43e56445730d6e40b00a489afc1276511547c5b7 Mon Sep 17 00:00:00 2001
From: "H.J. Lu"
Date: Wed, 17 Jun 2020 06:34:46 -0700
Subject: [PATCH] x86: Update CPU feature detection

1.
Divide architecture features into the usable features and the
preferred features.  The usable features are for correctness and can
be exported in a stable ABI.  The preferred features are for
performance and only for glibc internal use.
2. Change struct cpu_features to

struct cpu_features
{
  struct cpu_features_basic basic;
  unsigned int usable[USABLE_FEATURE_INDEX_MAX];
  struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX];
  unsigned int preferred[PREFERRED_FEATURE_INDEX_MAX];
  ...
};

so that

struct cpu_features
{
  struct cpu_features_basic basic;
  unsigned int usable[USABLE_FEATURE_INDEX_MAX];
  struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX];
};

can be exported via a stable ABI.  Currently there are 70 unused bits
in the usable array.  They should be sufficient for future use.  The
cpuid array can be expanded with backward binary compatibility for
both .o and .so files.
3. Add COMMON_CPUID_INDEX_7_ECX_1 for AVX512_BF16.
4. Detect ENQCMD, PKS, AVX512_VP2INTERSECT, MD_CLEAR, SERIALIZE, HYBRID,
TSXLDTRK, L1D_FLUSH, CORE_CAPABILITIES and AVX512_BF16.
5. Rename CAPABILITIES to ARCH_CAPABILITIES.
6. Check if AVX512_VP2INTERSECT and AVX512_BF16 are usable.
7. Update CPU feature detection test.
---
 .../unix/sysv/linux/x86_64/64/dl-librecon.h   |   2 +-
 sysdeps/x86/cpu-features.c                    | 149 ++++---
 sysdeps/x86/cpu-features.h                    | 417 +++++++-----------
 sysdeps/x86/cpu-tunables.c                    |  14 +-
 sysdeps/x86/tst-get-cpu-features.c            |  66 +--
 5 files changed, 258 insertions(+), 390 deletions(-)

diff --git a/sysdeps/unix/sysv/linux/x86_64/64/dl-librecon.h b/sysdeps/unix/sysv/linux/x86_64/64/dl-librecon.h
index eeb2a4854d..8205ad11e6 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/dl-librecon.h
+++ b/sysdeps/unix/sysv/linux/x86_64/64/dl-librecon.h
@@ -33,7 +33,7 @@
     case 21: \
       if (!__libc_enable_secure \
           && memcmp (envline, "PREFER_MAP_32BIT_EXEC", 21) == 0) \
-        GLRO(dl_x86_cpu_features).feature[index_arch_Prefer_MAP_32BIT_EXEC] \
+        GLRO(dl_x86_cpu_features).preferred[index_arch_Prefer_MAP_32BIT_EXEC] \
           |= bit_arch_Prefer_MAP_32BIT_EXEC; \
       break;
 
diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index 5b4a30e220..ed465aac0a 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -90,11 +90,18 @@ get_common_indices (struct cpu_features *cpu_features,
     }
 
   if (cpu_features->basic.max_cpuid >= 7)
-    __cpuid_count (7, 0,
-                   cpu_features->cpuid[COMMON_CPUID_INDEX_7].eax,
-                   cpu_features->cpuid[COMMON_CPUID_INDEX_7].ebx,
-                   cpu_features->cpuid[COMMON_CPUID_INDEX_7].ecx,
-                   cpu_features->cpuid[COMMON_CPUID_INDEX_7].edx);
+    {
+      __cpuid_count (7, 0,
+                     cpu_features->cpuid[COMMON_CPUID_INDEX_7].eax,
+                     cpu_features->cpuid[COMMON_CPUID_INDEX_7].ebx,
+                     cpu_features->cpuid[COMMON_CPUID_INDEX_7].ecx,
+                     cpu_features->cpuid[COMMON_CPUID_INDEX_7].edx);
+      __cpuid_count (7, 1,
+                     cpu_features->cpuid[COMMON_CPUID_INDEX_7_ECX_1].eax,
+                     cpu_features->cpuid[COMMON_CPUID_INDEX_7_ECX_1].ebx,
+                     cpu_features->cpuid[COMMON_CPUID_INDEX_7_ECX_1].ecx,
+                     cpu_features->cpuid[COMMON_CPUID_INDEX_7_ECX_1].edx);
+    }
 
   if (cpu_features->basic.max_cpuid >= 0xd)
     __cpuid_count (0xd, 1,
@@ -116,39 +123,39 @@ get_common_indices (struct cpu_features *cpu_features,
           /* Determine if AVX is usable.
*/ if (CPU_FEATURES_CPU_P (cpu_features, AVX)) { - cpu_features->feature[index_arch_AVX_Usable] + cpu_features->usable[index_arch_AVX_Usable] |= bit_arch_AVX_Usable; /* The following features depend on AVX being usable. */ /* Determine if AVX2 is usable. */ if (CPU_FEATURES_CPU_P (cpu_features, AVX2)) { - cpu_features->feature[index_arch_AVX2_Usable] + cpu_features->usable[index_arch_AVX2_Usable] |= bit_arch_AVX2_Usable; /* Unaligned load with 256-bit AVX registers are faster on Intel/AMD processors with AVX2. */ - cpu_features->feature[index_arch_AVX_Fast_Unaligned_Load] + cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load] |= bit_arch_AVX_Fast_Unaligned_Load; } /* Determine if FMA is usable. */ if (CPU_FEATURES_CPU_P (cpu_features, FMA)) - cpu_features->feature[index_arch_FMA_Usable] + cpu_features->usable[index_arch_FMA_Usable] |= bit_arch_FMA_Usable; /* Determine if VAES is usable. */ if (CPU_FEATURES_CPU_P (cpu_features, VAES)) - cpu_features->feature[index_arch_VAES_Usable] + cpu_features->usable[index_arch_VAES_Usable] |= bit_arch_VAES_Usable; /* Determine if VPCLMULQDQ is usable. */ if (CPU_FEATURES_CPU_P (cpu_features, VPCLMULQDQ)) - cpu_features->feature[index_arch_VPCLMULQDQ_Usable] + cpu_features->usable[index_arch_VPCLMULQDQ_Usable] |= bit_arch_VPCLMULQDQ_Usable; /* Determine if XOP is usable. */ if (CPU_FEATURES_CPU_P (cpu_features, XOP)) - cpu_features->feature[index_arch_XOP_Usable] + cpu_features->usable[index_arch_XOP_Usable] |= bit_arch_XOP_Usable; /* Determine if F16C is usable. */ if (CPU_FEATURES_CPU_P (cpu_features, F16C)) - cpu_features->feature[index_arch_F16C_Usable] + cpu_features->usable[index_arch_F16C_Usable] |= bit_arch_F16C_Usable; } @@ -161,64 +168,73 @@ get_common_indices (struct cpu_features *cpu_features, /* Determine if AVX512F is usable. 
*/ if (CPU_FEATURES_CPU_P (cpu_features, AVX512F)) { - cpu_features->feature[index_arch_AVX512F_Usable] + cpu_features->usable[index_arch_AVX512F_Usable] |= bit_arch_AVX512F_Usable; /* Determine if AVX512CD is usable. */ if (CPU_FEATURES_CPU_P (cpu_features, AVX512CD)) - cpu_features->feature[index_arch_AVX512CD_Usable] + cpu_features->usable[index_arch_AVX512CD_Usable] |= bit_arch_AVX512CD_Usable; /* Determine if AVX512ER is usable. */ if (CPU_FEATURES_CPU_P (cpu_features, AVX512ER)) - cpu_features->feature[index_arch_AVX512ER_Usable] + cpu_features->usable[index_arch_AVX512ER_Usable] |= bit_arch_AVX512ER_Usable; /* Determine if AVX512PF is usable. */ if (CPU_FEATURES_CPU_P (cpu_features, AVX512PF)) - cpu_features->feature[index_arch_AVX512PF_Usable] + cpu_features->usable[index_arch_AVX512PF_Usable] |= bit_arch_AVX512PF_Usable; /* Determine if AVX512VL is usable. */ if (CPU_FEATURES_CPU_P (cpu_features, AVX512VL)) - cpu_features->feature[index_arch_AVX512VL_Usable] + cpu_features->usable[index_arch_AVX512VL_Usable] |= bit_arch_AVX512VL_Usable; /* Determine if AVX512DQ is usable. */ if (CPU_FEATURES_CPU_P (cpu_features, AVX512DQ)) - cpu_features->feature[index_arch_AVX512DQ_Usable] + cpu_features->usable[index_arch_AVX512DQ_Usable] |= bit_arch_AVX512DQ_Usable; /* Determine if AVX512BW is usable. */ if (CPU_FEATURES_CPU_P (cpu_features, AVX512BW)) - cpu_features->feature[index_arch_AVX512BW_Usable] + cpu_features->usable[index_arch_AVX512BW_Usable] |= bit_arch_AVX512BW_Usable; /* Determine if AVX512_4FMAPS is usable. */ if (CPU_FEATURES_CPU_P (cpu_features, AVX512_4FMAPS)) - cpu_features->feature[index_arch_AVX512_4FMAPS_Usable] + cpu_features->usable[index_arch_AVX512_4FMAPS_Usable] |= bit_arch_AVX512_4FMAPS_Usable; /* Determine if AVX512_4VNNIW is usable. 
*/ if (CPU_FEATURES_CPU_P (cpu_features, AVX512_4VNNIW)) - cpu_features->feature[index_arch_AVX512_4VNNIW_Usable] + cpu_features->usable[index_arch_AVX512_4VNNIW_Usable] |= bit_arch_AVX512_4VNNIW_Usable; /* Determine if AVX512_BITALG is usable. */ if (CPU_FEATURES_CPU_P (cpu_features, AVX512_BITALG)) - cpu_features->feature[index_arch_AVX512_BITALG_Usable] + cpu_features->usable[index_arch_AVX512_BITALG_Usable] |= bit_arch_AVX512_BITALG_Usable; /* Determine if AVX512_IFMA is usable. */ if (CPU_FEATURES_CPU_P (cpu_features, AVX512_IFMA)) - cpu_features->feature[index_arch_AVX512_IFMA_Usable] + cpu_features->usable[index_arch_AVX512_IFMA_Usable] |= bit_arch_AVX512_IFMA_Usable; /* Determine if AVX512_VBMI is usable. */ if (CPU_FEATURES_CPU_P (cpu_features, AVX512_VBMI)) - cpu_features->feature[index_arch_AVX512_VBMI_Usable] + cpu_features->usable[index_arch_AVX512_VBMI_Usable] |= bit_arch_AVX512_VBMI_Usable; /* Determine if AVX512_VBMI2 is usable. */ if (CPU_FEATURES_CPU_P (cpu_features, AVX512_VBMI2)) - cpu_features->feature[index_arch_AVX512_VBMI2_Usable] + cpu_features->usable[index_arch_AVX512_VBMI2_Usable] |= bit_arch_AVX512_VBMI2_Usable; /* Determine if is AVX512_VNNI usable. */ if (CPU_FEATURES_CPU_P (cpu_features, AVX512_VNNI)) - cpu_features->feature[index_arch_AVX512_VNNI_Usable] + cpu_features->usable[index_arch_AVX512_VNNI_Usable] |= bit_arch_AVX512_VNNI_Usable; /* Determine if AVX512_VPOPCNTDQ is usable. */ if (CPU_FEATURES_CPU_P (cpu_features, AVX512_VPOPCNTDQ)) - cpu_features->feature[index_arch_AVX512_VPOPCNTDQ_Usable] + cpu_features->usable[index_arch_AVX512_VPOPCNTDQ_Usable] |= bit_arch_AVX512_VPOPCNTDQ_Usable; + /* Determine if AVX512_VP2INTERSECT is usable. */ + if (CPU_FEATURES_CPU_P (cpu_features, + AVX512_VP2INTERSECT)) + cpu_features->usable[index_arch_AVX512_VP2INTERSECT_Usable] + |= bit_arch_AVX512_VP2INTERSECT_Usable; + /* Determine if AVX512_BF16 is usable. 
*/ + if (CPU_FEATURES_CPU_P (cpu_features, AVX512_BF16)) + cpu_features->usable[index_arch_AVX512_BF16_Usable] + |= bit_arch_AVX512_BF16_Usable; } } } @@ -284,7 +300,7 @@ get_common_indices (struct cpu_features *cpu_features, { cpu_features->xsave_state_size = ALIGN_UP (size + STATE_SAVE_OFFSET, 64); - cpu_features->feature[index_arch_XSAVEC_Usable] + cpu_features->usable[index_arch_XSAVEC_Usable] |= bit_arch_XSAVEC_Usable; } } @@ -344,7 +360,7 @@ init_cpu_features (struct cpu_features *cpu_features) case 0x1c: case 0x26: /* BSF is slow on Atom. */ - cpu_features->feature[index_arch_Slow_BSF] + cpu_features->preferred[index_arch_Slow_BSF] |= bit_arch_Slow_BSF; break; @@ -371,7 +387,7 @@ init_cpu_features (struct cpu_features *cpu_features) case 0x5d: /* Unaligned load versions are faster than SSSE3 on Silvermont. */ - cpu_features->feature[index_arch_Fast_Unaligned_Load] + cpu_features->preferred[index_arch_Fast_Unaligned_Load] |= (bit_arch_Fast_Unaligned_Load | bit_arch_Fast_Unaligned_Copy | bit_arch_Prefer_PMINUB_for_stringop @@ -383,7 +399,7 @@ init_cpu_features (struct cpu_features *cpu_features) case 0x9c: /* Enable rep string instructions, unaligned load, unaligned copy, pminub and avoid SSE 4.2 on Tremont. */ - cpu_features->feature[index_arch_Fast_Rep_String] + cpu_features->preferred[index_arch_Fast_Rep_String] |= (bit_arch_Fast_Rep_String | bit_arch_Fast_Unaligned_Load | bit_arch_Fast_Unaligned_Copy @@ -407,7 +423,7 @@ init_cpu_features (struct cpu_features *cpu_features) case 0x2f: /* Rep string instructions, unaligned load, unaligned copy, and pminub are fast on Intel Core i3, i5 and i7. */ - cpu_features->feature[index_arch_Fast_Rep_String] + cpu_features->preferred[index_arch_Fast_Rep_String] |= (bit_arch_Fast_Rep_String | bit_arch_Fast_Unaligned_Load | bit_arch_Fast_Unaligned_Copy @@ -442,10 +458,10 @@ init_cpu_features (struct cpu_features *cpu_features) if AVX512ER is available. 
Don't use AVX512 to avoid lower CPU frequency if AVX512ER isn't available. */ if (CPU_FEATURES_CPU_P (cpu_features, AVX512ER)) - cpu_features->feature[index_arch_Prefer_No_VZEROUPPER] + cpu_features->preferred[index_arch_Prefer_No_VZEROUPPER] |= bit_arch_Prefer_No_VZEROUPPER; else - cpu_features->feature[index_arch_Prefer_No_AVX512] + cpu_features->preferred[index_arch_Prefer_No_AVX512] |= bit_arch_Prefer_No_AVX512; } /* This spells out "AuthenticAMD" or "HygonGenuine". */ @@ -468,7 +484,7 @@ init_cpu_features (struct cpu_features *cpu_features) /* Since the FMA4 bit is in COMMON_CPUID_INDEX_80000001 and FMA4 requires AVX, determine if FMA4 is usable here. */ if (CPU_FEATURES_CPU_P (cpu_features, FMA4)) - cpu_features->feature[index_arch_FMA4_Usable] + cpu_features->usable[index_arch_FMA4_Usable] |= bit_arch_FMA4_Usable; } @@ -477,13 +493,13 @@ init_cpu_features (struct cpu_features *cpu_features) /* "Excavator" */ if (model >= 0x60 && model <= 0x7f) { - cpu_features->feature[index_arch_Fast_Unaligned_Load] + cpu_features->preferred[index_arch_Fast_Unaligned_Load] |= (bit_arch_Fast_Unaligned_Load | bit_arch_Fast_Copy_Backward); /* Unaligned AVX loads are slower.*/ - cpu_features->feature[index_arch_AVX_Fast_Unaligned_Load] - &= ~bit_arch_AVX_Fast_Unaligned_Load; + cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load] + &= ~bit_arch_AVX_Fast_Unaligned_Load; } } } @@ -505,41 +521,38 @@ init_cpu_features (struct cpu_features *cpu_features) { if (model == 0xf || model == 0x19) { - cpu_features->feature[index_arch_AVX_Usable] - &= (~bit_arch_AVX_Usable - & ~bit_arch_AVX2_Usable); + cpu_features->usable[index_arch_AVX_Usable] + &= ~(bit_arch_AVX_Usable | bit_arch_AVX2_Usable); - cpu_features->feature[index_arch_Slow_SSE4_2] - |= (bit_arch_Slow_SSE4_2); + cpu_features->preferred[index_arch_Slow_SSE4_2] + |= bit_arch_Slow_SSE4_2; - cpu_features->feature[index_arch_AVX_Fast_Unaligned_Load] - &= ~bit_arch_AVX_Fast_Unaligned_Load; + 
cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load] + &= ~bit_arch_AVX_Fast_Unaligned_Load; } } else if (family == 0x7) { - if (model == 0x1b) - { - cpu_features->feature[index_arch_AVX_Usable] - &= (~bit_arch_AVX_Usable - & ~bit_arch_AVX2_Usable); + if (model == 0x1b) + { + cpu_features->usable[index_arch_AVX_Usable] + &= ~(bit_arch_AVX_Usable | bit_arch_AVX2_Usable); - cpu_features->feature[index_arch_Slow_SSE4_2] - |= bit_arch_Slow_SSE4_2; + cpu_features->preferred[index_arch_Slow_SSE4_2] + |= bit_arch_Slow_SSE4_2; - cpu_features->feature[index_arch_AVX_Fast_Unaligned_Load] - &= ~bit_arch_AVX_Fast_Unaligned_Load; - } - else if (model == 0x3b) - { - cpu_features->feature[index_arch_AVX_Usable] - &= (~bit_arch_AVX_Usable - & ~bit_arch_AVX2_Usable); - - cpu_features->feature[index_arch_AVX_Fast_Unaligned_Load] - &= ~bit_arch_AVX_Fast_Unaligned_Load; - } - } + cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load] + &= ~bit_arch_AVX_Fast_Unaligned_Load; + } + else if (model == 0x3b) + { + cpu_features->usable[index_arch_AVX_Usable] + &= ~(bit_arch_AVX_Usable | bit_arch_AVX2_Usable); + + cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load] + &= ~bit_arch_AVX_Fast_Unaligned_Load; + } + } } else { @@ -549,11 +562,11 @@ init_cpu_features (struct cpu_features *cpu_features) /* Support i586 if CX8 is available. */ if (CPU_FEATURES_CPU_P (cpu_features, CX8)) - cpu_features->feature[index_arch_I586] |= bit_arch_I586; + cpu_features->preferred[index_arch_I586] |= bit_arch_I586; /* Support i686 if CMOV is available. 
*/
   if (CPU_FEATURES_CPU_P (cpu_features, CMOV))
-    cpu_features->feature[index_arch_I686] |= bit_arch_I686;
+    cpu_features->preferred[index_arch_I686] |= bit_arch_I686;
 
 #if !HAS_CPUID
 no_cpuid:
diff --git a/sysdeps/x86/cpu-features.h b/sysdeps/x86/cpu-features.h
index 722bcdc427..f615588a37 100644
--- a/sysdeps/x86/cpu-features.h
+++ b/sysdeps/x86/cpu-features.h
@@ -20,12 +20,22 @@
 
 enum
 {
-  /* The integer bit array index for the first set of internal feature
+  /* The integer bit array index for the first set of usable feature
      bits.  */
-  FEATURE_INDEX_1 = 0,
-  FEATURE_INDEX_2,
+  USABLE_FEATURE_INDEX_1 = 0,
+  USABLE_FEATURE_INDEX_2,
+  USABLE_FEATURE_INDEX_3,
   /* The current maximum size of the feature integer bit array.  */
-  FEATURE_INDEX_MAX
+  USABLE_FEATURE_INDEX_MAX
+};
+
+enum
+{
+  /* The integer bit array index for the first set of preferred feature
+     bits.  */
+  PREFERRED_FEATURE_INDEX_1 = 0,
+  /* The current maximum size of the feature integer bit array.  */
+  PREFERRED_FEATURE_INDEX_MAX
 };
 
 enum
@@ -36,6 +46,7 @@ enum
   COMMON_CPUID_INDEX_D_ECX_1,
   COMMON_CPUID_INDEX_80000007,
   COMMON_CPUID_INDEX_80000008,
+  COMMON_CPUID_INDEX_7_ECX_1,
   /* Keep the following line at the end.  */
   COMMON_CPUID_INDEX_MAX
 };
@@ -68,9 +79,10 @@ struct cpu_features_basic
 
 struct cpu_features
 {
-  struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX];
-  unsigned int feature[FEATURE_INDEX_MAX];
   struct cpu_features_basic basic;
+  unsigned int usable[USABLE_FEATURE_INDEX_MAX];
+  struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX];
+  unsigned int preferred[PREFERRED_FEATURE_INDEX_MAX];
   /* The state size for XSAVEC or XSAVE.
The type must be unsigned long int so that we use
@@ -102,7 +114,7 @@ extern const struct cpu_features *__get_cpu_features (void)
 # define CPU_FEATURES_CPU_P(ptr, name) \
   ((ptr->cpuid[index_cpu_##name].reg_##name & (bit_cpu_##name)) != 0)
 # define CPU_FEATURES_ARCH_P(ptr, name) \
-  ((ptr->feature[index_arch_##name] & (bit_arch_##name)) != 0)
+  ((ptr->feature_##name[index_arch_##name] & (bit_arch_##name)) != 0)
 
 /* HAS_CPU_FEATURE evaluates to true if CPU supports the feature.  */
 #define HAS_CPU_FEATURE(name) \
@@ -112,13 +124,12 @@ extern const struct cpu_features *__get_cpu_features (void)
 # define HAS_ARCH_FEATURE(name) \
   CPU_FEATURES_ARCH_P (__get_cpu_features (), name)
 /* CPU_FEATURE_USABLE evaluates to true if the feature is usable.  */
-#define CPU_FEATURE_USABLE(name) \
-  ((need_arch_feature_##name && HAS_ARCH_FEATURE (name##_Usable)) \
-   || (!need_arch_feature_##name && HAS_CPU_FEATURE(name)))
+#define CPU_FEATURE_USABLE(name) \
+  HAS_ARCH_FEATURE (name##_Usable)
 
 /* Architecture features.  */
 
-/* FEATURE_INDEX_1.  */
+/* USABLE_FEATURE_INDEX_1.
*/ #define bit_arch_AVX_Usable (1u << 0) #define bit_arch_AVX2_Usable (1u << 1) #define bit_arch_AVX512F_Usable (1u << 2) @@ -143,237 +154,62 @@ extern const struct cpu_features *__get_cpu_features (void) #define bit_arch_XOP_Usable (1u << 21) #define bit_arch_XSAVEC_Usable (1u << 22) #define bit_arch_F16C_Usable (1u << 23) - -#define index_arch_AVX_Usable FEATURE_INDEX_1 -#define index_arch_AVX2_Usable FEATURE_INDEX_1 -#define index_arch_AVX512F_Usable FEATURE_INDEX_1 -#define index_arch_AVX512CD_Usable FEATURE_INDEX_1 -#define index_arch_AVX512ER_Usable FEATURE_INDEX_1 -#define index_arch_AVX512PF_Usable FEATURE_INDEX_1 -#define index_arch_AVX512VL_Usable FEATURE_INDEX_1 -#define index_arch_AVX512BW_Usable FEATURE_INDEX_1 -#define index_arch_AVX512DQ_Usable FEATURE_INDEX_1 -#define index_arch_AVX512_4FMAPS_Usable FEATURE_INDEX_1 -#define index_arch_AVX512_4VNNIW_Usable FEATURE_INDEX_1 -#define index_arch_AVX512_BITALG_Usable FEATURE_INDEX_1 -#define index_arch_AVX512_IFMA_Usable FEATURE_INDEX_1 -#define index_arch_AVX512_VBMI_Usable FEATURE_INDEX_1 -#define index_arch_AVX512_VBMI2_Usable FEATURE_INDEX_1 -#define index_arch_AVX512_VNNI_Usable FEATURE_INDEX_1 -#define index_arch_AVX512_VPOPCNTDQ_Usable FEATURE_INDEX_1 -#define index_arch_FMA_Usable FEATURE_INDEX_1 -#define index_arch_FMA4_Usable FEATURE_INDEX_1 -#define index_arch_VAES_Usable FEATURE_INDEX_1 -#define index_arch_VPCLMULQDQ_Usable FEATURE_INDEX_1 -#define index_arch_XOP_Usable FEATURE_INDEX_1 -#define index_arch_XSAVEC_Usable FEATURE_INDEX_1 -#define index_arch_F16C_Usable FEATURE_INDEX_1 - -/* Unused. Compiler will optimize them out. 
*/ -#define bit_arch_SSE3_Usable (1u << 0) -#define bit_arch_PCLMULQDQ_Usable (1u << 0) -#define bit_arch_SSSE3_Usable (1u << 0) -#define bit_arch_CMPXCHG16B_Usable (1u << 0) -#define bit_arch_SSE4_1_Usable (1u << 0) -#define bit_arch_SSE4_2_Usable (1u << 0) -#define bit_arch_MOVBE_Usable (1u << 0) -#define bit_arch_POPCNT_Usable (1u << 0) -#define bit_arch_AES_Usable (1u << 0) -#define bit_arch_XSAVE_Usable (1u << 0) -#define bit_arch_OSXSAVE_Usable (1u << 0) -#define bit_arch_RDRAND_Usable (1u << 0) -#define bit_arch_FPU_Usable (1u << 0) -#define bit_arch_TSC_Usable (1u << 0) -#define bit_arch_MSR_Usable (1u << 0) -#define bit_arch_CX8_Usable (1u << 0) -#define bit_arch_SEP_Usable (1u << 0) -#define bit_arch_CMOV_Usable (1u << 0) -#define bit_arch_CLFSH_Usable (1u << 0) -#define bit_arch_MMX_Usable (1u << 0) -#define bit_arch_FXSR_Usable (1u << 0) -#define bit_arch_SSE_Usable (1u << 0) -#define bit_arch_SSE2_Usable (1u << 0) -#define bit_arch_FSGSBASE_Usable (1u << 0) -#define bit_arch_BMI1_Usable (1u << 0) -#define bit_arch_HLE_Usable (1u << 0) -#define bit_arch_BMI2_Usable (1u << 0) -#define bit_arch_ERMS_Usable (1u << 0) -#define bit_arch_RTM_Usable (1u << 0) -#define bit_arch_RDSEED_Usable (1u << 0) -#define bit_arch_ADX_Usable (1u << 0) -#define bit_arch_CLFLUSHOPT_Usable (1u << 0) -#define bit_arch_CLWB_Usable (1u << 0) -#define bit_arch_SHA_Usable (1u << 0) -#define bit_arch_PREFETCHWT1_Usable (1u << 0) -#define bit_arch_GFNI_Usable (1u << 0) -#define bit_arch_RDPID_Usable (1u << 0) -#define bit_arch_CLDEMOTE_Usable (1u << 0) -#define bit_arch_MOVDIRI_Usable (1u << 0) -#define bit_arch_MOVDIR64B_Usable (1u << 0) -#define bit_arch_FSRM_Usable (1u << 0) -#define bit_arch_LAHF64_SAHF64_Usable (1u << 0) -#define bit_arch_SVM_Usable (1u << 0) -#define bit_arch_LZCNT_Usable (1u << 0) -#define bit_arch_SSE4A_Usable (1u << 0) -#define bit_arch_PREFETCHW_Usable (1u << 0) -#define bit_arch_TBM_Usable (1u << 0) -#define bit_arch_SYSCALL_SYSRET_Usable (1u << 0) 
-#define bit_arch_RDTSCP_Usable (1u << 0) -#define bit_arch_XSAVEOPT_Usable (1u << 0) -#define bit_arch_XGETBV_ECX_1_Usable (1u << 0) -#define bit_arch_XSAVES_Usable (1u << 0) -#define bit_arch_INVARIANT_TSC_Usable (1u << 0) -#define bit_arch_WBNOINVD_Usable (1u << 0) - -/* Unused. Compiler will optimize them out. */ -#define index_arch_SSE3_Usable FEATURE_INDEX_1 -#define index_arch_PCLMULQDQ_Usable FEATURE_INDEX_1 -#define index_arch_SSSE3_Usable FEATURE_INDEX_1 -#define index_arch_CMPXCHG16B_Usable FEATURE_INDEX_1 -#define index_arch_SSE4_1_Usable FEATURE_INDEX_1 -#define index_arch_SSE4_2_Usable FEATURE_INDEX_1 -#define index_arch_MOVBE_Usable FEATURE_INDEX_1 -#define index_arch_POPCNT_Usable FEATURE_INDEX_1 -#define index_arch_AES_Usable FEATURE_INDEX_1 -#define index_arch_XSAVE_Usable FEATURE_INDEX_1 -#define index_arch_OSXSAVE_Usable FEATURE_INDEX_1 -#define index_arch_RDRAND_Usable FEATURE_INDEX_1 -#define index_arch_FPU_Usable FEATURE_INDEX_1 -#define index_arch_TSC_Usable FEATURE_INDEX_1 -#define index_arch_MSR_Usable FEATURE_INDEX_1 -#define index_arch_CX8_Usable FEATURE_INDEX_1 -#define index_arch_SEP_Usable FEATURE_INDEX_1 -#define index_arch_CMOV_Usable FEATURE_INDEX_1 -#define index_arch_CLFSH_Usable FEATURE_INDEX_1 -#define index_arch_MMX_Usable FEATURE_INDEX_1 -#define index_arch_FXSR_Usable FEATURE_INDEX_1 -#define index_arch_SSE_Usable FEATURE_INDEX_1 -#define index_arch_SSE2_Usable FEATURE_INDEX_1 -#define index_arch_FSGSBASE_Usable FEATURE_INDEX_1 -#define index_arch_BMI1_Usable FEATURE_INDEX_1 -#define index_arch_HLE_Usable FEATURE_INDEX_1 -#define index_arch_BMI2_Usable FEATURE_INDEX_1 -#define index_arch_ERMS_Usable FEATURE_INDEX_1 -#define index_arch_RTM_Usable FEATURE_INDEX_1 -#define index_arch_RDSEED_Usable FEATURE_INDEX_1 -#define index_arch_ADX_Usable FEATURE_INDEX_1 -#define index_arch_CLFLUSHOPT_Usable FEATURE_INDEX_1 -#define index_arch_CLWB_Usable FEATURE_INDEX_1 -#define index_arch_SHA_Usable FEATURE_INDEX_1 -#define 
index_arch_PREFETCHWT1_Usable FEATURE_INDEX_1 -#define index_arch_GFNI_Usable FEATURE_INDEX_1 -#define index_arch_RDPID_Usable FEATURE_INDEX_1 -#define index_arch_CLDEMOTE_Usable FEATURE_INDEX_1 -#define index_arch_MOVDIRI_Usable FEATURE_INDEX_1 -#define index_arch_MOVDIR64B_Usable FEATURE_INDEX_1 -#define index_arch_FSRM_Usable FEATURE_INDEX_1 -#define index_arch_LAHF64_SAHF64_Usable FEATURE_INDEX_1 -#define index_arch_LZCNT_Usable FEATURE_INDEX_1 -#define index_arch_SSE4A_Usable FEATURE_INDEX_1 -#define index_arch_PREFETCHW_Usable FEATURE_INDEX_1 -#define index_arch_TBM_Usable FEATURE_INDEX_1 -#define index_arch_SYSCALL_SYSRET_Usable FEATURE_INDEX_1 -#define index_arch_RDTSCP_Usable FEATURE_INDEX_1 -#define index_arch_XSAVEOPT_Usable FEATURE_INDEX_1 -#define index_arch_XGETBV_ECX_1_Usable FEATURE_INDEX_1 -#define index_arch_XSAVES_Usable FEATURE_INDEX_1 -#define index_arch_INVARIANT_TSC_Usable FEATURE_INDEX_1 -#define index_arch_WBNOINVD_Usable FEATURE_INDEX_1 - -/* COMMON_CPUID_INDEX_1. */ - -/* ECX. */ -#define need_arch_feature_SSE3 0 -#define need_arch_feature_PCLMULQDQ 0 -#define need_arch_feature_SSSE3 0 -#define need_arch_feature_FMA 1 -#define need_arch_feature_CMPXCHG16B 0 -#define need_arch_feature_SSE4_1 0 -#define need_arch_feature_SSE4_2 0 -#define need_arch_feature_MOVBE 0 -#define need_arch_feature_POPCNT 0 -#define need_arch_feature_AES 0 -#define need_arch_feature_XSAVE 0 -#define need_arch_feature_OSXSAVE 0 -#define need_arch_feature_AVX 1 -#define need_arch_feature_F16C 1 -#define need_arch_feature_RDRAND 0 - -/* EDX. */ -#define need_arch_feature_FPU 0 -#define need_arch_feature_TSC 0 -#define need_arch_feature_MSR 0 -#define need_arch_feature_CX8 0 -#define need_arch_feature_SEP 0 -#define need_arch_feature_CMOV 0 -#define need_arch_feature_CLFSH 0 -#define need_arch_feature_MMX 0 -#define need_arch_feature_FXSR 0 -#define need_arch_feature_SSE 0 -#define need_arch_feature_SSE2 0 - -/* COMMON_CPUID_INDEX_7. */ - -/* EBX. 
*/
-#define need_arch_feature_FSGSBASE 0
-#define need_arch_feature_BMI1 0
-#define need_arch_feature_HLE 0
-#define need_arch_feature_AVX2 1
-#define need_arch_feature_BMI2 0
-#define need_arch_feature_ERMS 0
-#define need_arch_feature_RTM 0
-#define need_arch_feature_AVX512F 1
-#define need_arch_feature_AVX512DQ 1
-#define need_arch_feature_RDSEED 0
-#define need_arch_feature_ADX 0
-#define need_arch_feature_AVX512_IFMA 1
-#define need_arch_feature_CLFLUSHOPT 0
-#define need_arch_feature_CLWB 0
-#define need_arch_feature_AVX512PF 1
-#define need_arch_feature_AVX512ER 1
-#define need_arch_feature_AVX512CD 1
-#define need_arch_feature_SHA 0
-#define need_arch_feature_AVX512BW 1
-#define need_arch_feature_AVX512VL 1
-
-/* ECX. */
-#define need_arch_feature_PREFETCHWT1 0
-#define need_arch_feature_AVX512_VBMI 1
-#define need_arch_feature_AVX512_VBMI2 1
-#define need_arch_feature_GFNI 0
-#define need_arch_feature_VAES 1
-#define need_arch_feature_VPCLMULQDQ 1
-#define need_arch_feature_AVX512_VNNI 1
-#define need_arch_feature_AVX512_BITALG 1
-#define need_arch_feature_AVX512_VPOPCNTDQ 1
-#define need_arch_feature_RDPID 0
-#define need_arch_feature_CLDEMOTE 0
-#define need_arch_feature_MOVDIRI 0
-#define need_arch_feature_MOVDIR64B 0
-
-/* EDX. */
-#define need_arch_feature_AVX512_4VNNIW 1
-#define need_arch_feature_AVX512_4FMAPS 1
-#define need_arch_feature_FSRM 0
-
-/* COMMON_CPUID_INDEX_80000001. */
-
-/* ECX. */
-#define need_arch_feature_LAHF64_SAHF64 0
-#define need_arch_feature_LZCNT 0
-#define need_arch_feature_SSE4A 0
-#define need_arch_feature_PREFETCHW 0
-#define need_arch_feature_XOP 1
-#define need_arch_feature_FMA4 1
-#define need_arch_feature_TBM 0
-#define need_arch_feature_SYSCALL_SYSRET 0
-#define need_arch_feature_RDTSCP 0
-#define need_arch_feature_XSAVEOPT 0
-#define need_arch_feature_XSAVEC 1
-#define need_arch_feature_XGETBV_ECX_1 0
-#define need_arch_feature_XSAVES 0
-#define need_arch_feature_INVARIANT_TSC 0
-#define need_arch_feature_WBNOINVD 0
+#define bit_arch_AVX512_VP2INTERSECT_Usable (1u << 24)
+#define bit_arch_AVX512_BF16_Usable (1u << 25)
+
+#define index_arch_AVX_Usable USABLE_FEATURE_INDEX_1
+#define index_arch_AVX2_Usable USABLE_FEATURE_INDEX_1
+#define index_arch_AVX512F_Usable USABLE_FEATURE_INDEX_1
+#define index_arch_AVX512CD_Usable USABLE_FEATURE_INDEX_1
+#define index_arch_AVX512ER_Usable USABLE_FEATURE_INDEX_1
+#define index_arch_AVX512PF_Usable USABLE_FEATURE_INDEX_1
+#define index_arch_AVX512VL_Usable USABLE_FEATURE_INDEX_1
+#define index_arch_AVX512BW_Usable USABLE_FEATURE_INDEX_1
+#define index_arch_AVX512DQ_Usable USABLE_FEATURE_INDEX_1
+#define index_arch_AVX512_4FMAPS_Usable USABLE_FEATURE_INDEX_1
+#define index_arch_AVX512_4VNNIW_Usable USABLE_FEATURE_INDEX_1
+#define index_arch_AVX512_BITALG_Usable USABLE_FEATURE_INDEX_1
+#define index_arch_AVX512_IFMA_Usable USABLE_FEATURE_INDEX_1
+#define index_arch_AVX512_VBMI_Usable USABLE_FEATURE_INDEX_1
+#define index_arch_AVX512_VBMI2_Usable USABLE_FEATURE_INDEX_1
+#define index_arch_AVX512_VNNI_Usable USABLE_FEATURE_INDEX_1
+#define index_arch_AVX512_VPOPCNTDQ_Usable USABLE_FEATURE_INDEX_1
+#define index_arch_FMA_Usable USABLE_FEATURE_INDEX_1
+#define index_arch_FMA4_Usable USABLE_FEATURE_INDEX_1
+#define index_arch_VAES_Usable USABLE_FEATURE_INDEX_1
+#define index_arch_VPCLMULQDQ_Usable USABLE_FEATURE_INDEX_1
+#define index_arch_XOP_Usable USABLE_FEATURE_INDEX_1
+#define index_arch_XSAVEC_Usable USABLE_FEATURE_INDEX_1
+#define index_arch_F16C_Usable USABLE_FEATURE_INDEX_1
+#define index_arch_AVX512_VP2INTERSECT_Usable USABLE_FEATURE_INDEX_1
+#define index_arch_AVX512_BF16_Usable USABLE_FEATURE_INDEX_1
+
+#define feature_AVX_Usable usable
+#define feature_AVX2_Usable usable
+#define feature_AVX512F_Usable usable
+#define feature_AVX512CD_Usable usable
+#define feature_AVX512ER_Usable usable
+#define feature_AVX512PF_Usable usable
+#define feature_AVX512VL_Usable usable
+#define feature_AVX512BW_Usable usable
+#define feature_AVX512DQ_Usable usable
+#define feature_AVX512_4FMAPS_Usable usable
+#define feature_AVX512_4VNNIW_Usable usable
+#define feature_AVX512_BITALG_Usable usable
+#define feature_AVX512_IFMA_Usable usable
+#define feature_AVX512_VBMI_Usable usable
+#define feature_AVX512_VBMI2_Usable usable
+#define feature_AVX512_VNNI_Usable usable
+#define feature_AVX512_VPOPCNTDQ_Usable usable
+#define feature_FMA_Usable usable
+#define feature_FMA4_Usable usable
+#define feature_VAES_Usable usable
+#define feature_VPCLMULQDQ_Usable usable
+#define feature_XOP_Usable usable
+#define feature_XSAVEC_Usable usable
+#define feature_F16C_Usable usable
+#define feature_AVX512_VP2INTERSECT_Usable usable
+#define feature_AVX512_BF16_Usable usable

 /* CPU features. */

@@ -494,17 +330,26 @@ extern const struct cpu_features *__get_cpu_features (void)
 #define bit_cpu_CLDEMOTE (1u << 25)
 #define bit_cpu_MOVDIRI (1u << 27)
 #define bit_cpu_MOVDIR64B (1u << 28)
+#define bit_cpu_ENQCMD (1u << 29)
 #define bit_cpu_SGX_LC (1u << 30)
+#define bit_cpu_PKS (1u << 31)

 /* EDX. */
 #define bit_cpu_AVX512_4VNNIW (1u << 2)
 #define bit_cpu_AVX512_4FMAPS (1u << 3)
 #define bit_cpu_FSRM (1u << 4)
+#define bit_cpu_AVX512_VP2INTERSECT (1u << 8)
+#define bit_cpu_MD_CLEAR (1u << 10)
+#define bit_cpu_SERIALIZE (1u << 14)
+#define bit_cpu_HYBRID (1u << 15)
+#define bit_cpu_TSXLDTRK (1u << 16)
 #define bit_cpu_PCONFIG (1u << 18)
 #define bit_cpu_IBT (1u << 20)
 #define bit_cpu_IBRS_IBPB (1u << 26)
 #define bit_cpu_STIBP (1u << 27)
-#define bit_cpu_CAPABILITIES (1u << 29)
+#define bit_cpu_L1D_FLUSH (1u << 28)
+#define bit_cpu_ARCH_CAPABILITIES (1u << 29)
+#define bit_cpu_CORE_CAPABILITIES (1u << 30)
 #define bit_cpu_SSBD (1u << 31)

 /* COMMON_CPUID_INDEX_80000001. */
@@ -545,6 +390,11 @@ extern const struct cpu_features *__get_cpu_features (void)
 /* EBX. */
 #define bit_cpu_WBNOINVD (1u << 9)

+/* COMMON_CPUID_INDEX_7_ECX_1. */
+
+/* EAX. */
+#define bit_cpu_AVX512_BF16 (1u << 5)
+
 /* COMMON_CPUID_INDEX_1. */

 /* ECX. */
@@ -662,17 +512,26 @@ extern const struct cpu_features *__get_cpu_features (void)
 #define index_cpu_CLDEMOTE COMMON_CPUID_INDEX_7
 #define index_cpu_MOVDIRI COMMON_CPUID_INDEX_7
 #define index_cpu_MOVDIR64B COMMON_CPUID_INDEX_7
+#define index_cpu_ENQCMD COMMON_CPUID_INDEX_7
 #define index_cpu_SGX_LC COMMON_CPUID_INDEX_7
+#define index_cpu_PKS COMMON_CPUID_INDEX_7

 /* EDX. */
 #define index_cpu_AVX512_4VNNIW COMMON_CPUID_INDEX_7
 #define index_cpu_AVX512_4FMAPS COMMON_CPUID_INDEX_7
 #define index_cpu_FSRM COMMON_CPUID_INDEX_7
+#define index_cpu_AVX512_VP2INTERSECT COMMON_CPUID_INDEX_7
+#define index_cpu_MD_CLEAR COMMON_CPUID_INDEX_7
+#define index_cpu_SERIALIZE COMMON_CPUID_INDEX_7
+#define index_cpu_HYBRID COMMON_CPUID_INDEX_7
+#define index_cpu_TSXLDTRK COMMON_CPUID_INDEX_7
 #define index_cpu_PCONFIG COMMON_CPUID_INDEX_7
 #define index_cpu_IBT COMMON_CPUID_INDEX_7
 #define index_cpu_IBRS_IBPB COMMON_CPUID_INDEX_7
 #define index_cpu_STIBP COMMON_CPUID_INDEX_7
-#define index_cpu_CAPABILITIES COMMON_CPUID_INDEX_7
+#define index_cpu_L1D_FLUSH COMMON_CPUID_INDEX_7
+#define index_cpu_ARCH_CAPABILITIES COMMON_CPUID_INDEX_7
+#define index_cpu_CORE_CAPABILITIES COMMON_CPUID_INDEX_7
 #define index_cpu_SSBD COMMON_CPUID_INDEX_7

 /* COMMON_CPUID_INDEX_80000001. */
@@ -713,6 +572,11 @@ extern const struct cpu_features *__get_cpu_features (void)
 /* EBX. */
 #define index_cpu_WBNOINVD COMMON_CPUID_INDEX_80000008

+/* COMMON_CPUID_INDEX_7_ECX_1. */
+
+/* EAX. */
+#define index_cpu_AVX512_BF16 COMMON_CPUID_INDEX_7_ECX_1
+
 /* COMMON_CPUID_INDEX_1. */

 /* ECX. */
@@ -830,17 +694,26 @@ extern const struct cpu_features *__get_cpu_features (void)
 #define reg_CLDEMOTE ecx
 #define reg_MOVDIRI ecx
 #define reg_MOVDIR64B ecx
+#define reg_ENQCMD ecx
 #define reg_SGX_LC ecx
+#define reg_PKS ecx

 /* EDX. */
 #define reg_AVX512_4VNNIW edx
 #define reg_AVX512_4FMAPS edx
 #define reg_FSRM edx
+#define reg_AVX512_VP2INTERSECT edx
+#define reg_MD_CLEAR edx
+#define reg_SERIALIZE edx
+#define reg_HYBRID edx
+#define reg_TSXLDTRK edx
 #define reg_PCONFIG edx
 #define reg_IBT edx
 #define reg_IBRS_IBPB edx
 #define reg_STIBP edx
-#define reg_CAPABILITIES edx
+#define reg_L1D_FLUSH edx
+#define reg_ARCH_CAPABILITIES edx
+#define reg_CORE_CAPABILITIES edx
 #define reg_SSBD edx

 /* COMMON_CPUID_INDEX_80000001. */
@@ -881,6 +754,11 @@ extern const struct cpu_features *__get_cpu_features (void)
 /* EBX. */
 #define reg_WBNOINVD ebx

+/* COMMON_CPUID_INDEX_7_ECX_1. */
+
+/* EAX. */
+#define reg_AVX512_BF16 eax
+
 /* FEATURE_INDEX_2. */
 #define bit_arch_I586 (1u << 0)
 #define bit_arch_I686 (1u << 1)
@@ -899,22 +777,39 @@ extern const struct cpu_features *__get_cpu_features (void)
 #define bit_arch_Prefer_No_AVX512 (1u << 14)
 #define bit_arch_MathVec_Prefer_No_AVX512 (1u << 15)

-#define index_arch_Fast_Rep_String FEATURE_INDEX_2
-#define index_arch_Fast_Copy_Backward FEATURE_INDEX_2
-#define index_arch_Slow_BSF FEATURE_INDEX_2
-#define index_arch_Fast_Unaligned_Load FEATURE_INDEX_2
-#define index_arch_Prefer_PMINUB_for_stringop FEATURE_INDEX_2
-#define index_arch_Fast_Unaligned_Copy FEATURE_INDEX_2
-#define index_arch_I586 FEATURE_INDEX_2
-#define index_arch_I686 FEATURE_INDEX_2
-#define index_arch_Slow_SSE4_2 FEATURE_INDEX_2
-#define index_arch_AVX_Fast_Unaligned_Load FEATURE_INDEX_2
-#define index_arch_Prefer_MAP_32BIT_EXEC FEATURE_INDEX_2
-#define index_arch_Prefer_No_VZEROUPPER FEATURE_INDEX_2
-#define index_arch_Prefer_ERMS FEATURE_INDEX_2
-#define index_arch_Prefer_No_AVX512 FEATURE_INDEX_2
-#define index_arch_MathVec_Prefer_No_AVX512 FEATURE_INDEX_2
-#define index_arch_Prefer_FSRM FEATURE_INDEX_2
+#define index_arch_Fast_Rep_String PREFERRED_FEATURE_INDEX_1
+#define index_arch_Fast_Copy_Backward PREFERRED_FEATURE_INDEX_1
+#define index_arch_Slow_BSF PREFERRED_FEATURE_INDEX_1
+#define index_arch_Fast_Unaligned_Load PREFERRED_FEATURE_INDEX_1
+#define index_arch_Prefer_PMINUB_for_stringop PREFERRED_FEATURE_INDEX_1
+#define index_arch_Fast_Unaligned_Copy PREFERRED_FEATURE_INDEX_1
+#define index_arch_I586 PREFERRED_FEATURE_INDEX_1
+#define index_arch_I686 PREFERRED_FEATURE_INDEX_1
+#define index_arch_Slow_SSE4_2 PREFERRED_FEATURE_INDEX_1
+#define index_arch_AVX_Fast_Unaligned_Load PREFERRED_FEATURE_INDEX_1
+#define index_arch_Prefer_MAP_32BIT_EXEC PREFERRED_FEATURE_INDEX_1
+#define index_arch_Prefer_No_VZEROUPPER PREFERRED_FEATURE_INDEX_1
+#define index_arch_Prefer_ERMS PREFERRED_FEATURE_INDEX_1
+#define index_arch_Prefer_No_AVX512 PREFERRED_FEATURE_INDEX_1
+#define index_arch_MathVec_Prefer_No_AVX512 PREFERRED_FEATURE_INDEX_1
+#define index_arch_Prefer_FSRM PREFERRED_FEATURE_INDEX_1
+
+#define feature_Fast_Rep_String preferred
+#define feature_Fast_Copy_Backward preferred
+#define feature_Slow_BSF preferred
+#define feature_Fast_Unaligned_Load preferred
+#define feature_Prefer_PMINUB_for_stringop preferred
+#define feature_Fast_Unaligned_Copy preferred
+#define feature_I586 preferred
+#define feature_I686 preferred
+#define feature_Slow_SSE4_2 preferred
+#define feature_AVX_Fast_Unaligned_Load preferred
+#define feature_Prefer_MAP_32BIT_EXEC preferred
+#define feature_Prefer_No_VZEROUPPER preferred
+#define feature_Prefer_ERMS preferred
+#define feature_Prefer_No_AVX512 preferred
+#define feature_MathVec_Prefer_No_AVX512 preferred
+#define feature_Prefer_FSRM preferred

 /* XCR0 Feature flags. */
 #define bit_XMM_state (1u << 1)
diff --git a/sysdeps/x86/cpu-tunables.c b/sysdeps/x86/cpu-tunables.c
index 38ad2c2e84..666ec571f2 100644
--- a/sysdeps/x86/cpu-tunables.c
+++ b/sysdeps/x86/cpu-tunables.c
@@ -54,7 +54,7 @@ extern __typeof (memcmp) DEFAULT_MEMCMP;
   _Static_assert (sizeof (#name) - 1 == len, #name " != " #len); \
   if (!DEFAULT_MEMCMP (f, #name, len)) \
     { \
-      cpu_features->feature[index_arch_##name] \
+      cpu_features->feature_##name[index_arch_##name] \
         &= ~bit_arch_##name; \
       break; \
     }
@@ -66,10 +66,10 @@ extern __typeof (memcmp) DEFAULT_MEMCMP;
   if (!DEFAULT_MEMCMP (f, #name, len)) \
     { \
       if (disable) \
-        cpu_features->feature[index_arch_##name] \
+        cpu_features->feature_##name[index_arch_##name] \
           &= ~bit_arch_##name; \
       else \
-        cpu_features->feature[index_arch_##name] \
+        cpu_features->feature_##name[index_arch_##name] \
           |= bit_arch_##name; \
       break; \
     }
@@ -82,10 +82,10 @@ extern __typeof (memcmp) DEFAULT_MEMCMP;
   if (!DEFAULT_MEMCMP (f, #name, len)) \
     { \
       if (disable) \
-        cpu_features->feature[index_arch_##name] \
+        cpu_features->feature_##name[index_arch_##name] \
           &= ~bit_arch_##name; \
       else if (CPU_FEATURES_ARCH_P (cpu_features, need)) \
-        cpu_features->feature[index_arch_##name] \
+        cpu_features->feature_##name[index_arch_##name] \
           |= bit_arch_##name; \
       break; \
     }
@@ -98,10 +98,10 @@ extern __typeof (memcmp) DEFAULT_MEMCMP;
   if (!DEFAULT_MEMCMP (f, #name, len)) \
     { \
       if (disable) \
-        cpu_features->feature[index_arch_##name] \
+        cpu_features->feature_##name[index_arch_##name] \
           &= ~bit_arch_##name; \
       else if (CPU_FEATURES_CPU_P (cpu_features, need)) \
-        cpu_features->feature[index_arch_##name] \
+        cpu_features->feature_##name[index_arch_##name] \
           |= bit_arch_##name; \
       break; \
     }
diff --git a/sysdeps/x86/tst-get-cpu-features.c b/sysdeps/x86/tst-get-cpu-features.c
index 0dcb906a86..56f31cbe40 100644
--- a/sysdeps/x86/tst-get-cpu-features.c
+++ b/sysdeps/x86/tst-get-cpu-features.c
@@ -174,15 +174,24 @@ do_test (void)
   CHECK_CPU_FEATURE (CLDEMOTE);
   CHECK_CPU_FEATURE (MOVDIRI);
   CHECK_CPU_FEATURE (MOVDIR64B);
+  CHECK_CPU_FEATURE (ENQCMD);
   CHECK_CPU_FEATURE (SGX_LC);
+  CHECK_CPU_FEATURE (PKS);
   CHECK_CPU_FEATURE (AVX512_4VNNIW);
   CHECK_CPU_FEATURE (AVX512_4FMAPS);
   CHECK_CPU_FEATURE (FSRM);
+  CHECK_CPU_FEATURE (AVX512_VP2INTERSECT);
+  CHECK_CPU_FEATURE (MD_CLEAR);
+  CHECK_CPU_FEATURE (SERIALIZE);
+  CHECK_CPU_FEATURE (HYBRID);
+  CHECK_CPU_FEATURE (TSXLDTRK);
   CHECK_CPU_FEATURE (PCONFIG);
   CHECK_CPU_FEATURE (IBT);
   CHECK_CPU_FEATURE (IBRS_IBPB);
   CHECK_CPU_FEATURE (STIBP);
-  CHECK_CPU_FEATURE (CAPABILITIES);
+  CHECK_CPU_FEATURE (L1D_FLUSH);
+  CHECK_CPU_FEATURE (ARCH_CAPABILITIES);
+  CHECK_CPU_FEATURE (CORE_CAPABILITIES);
   CHECK_CPU_FEATURE (SSBD);
   CHECK_CPU_FEATURE (LAHF64_SAHF64);
   CHECK_CPU_FEATURE (SVM);
@@ -204,84 +213,35 @@ do_test (void)
   CHECK_CPU_FEATURE (XSAVES);
   CHECK_CPU_FEATURE (INVARIANT_TSC);
   CHECK_CPU_FEATURE (WBNOINVD);
+  CHECK_CPU_FEATURE (AVX512_BF16);

   printf ("Usable CPU features:\n");
-  CHECK_CPU_FEATURE_USABLE (SSE3);
-  CHECK_CPU_FEATURE_USABLE (PCLMULQDQ);
-  CHECK_CPU_FEATURE_USABLE (SSSE3);
   CHECK_CPU_FEATURE_USABLE (FMA);
-  CHECK_CPU_FEATURE_USABLE (CMPXCHG16B);
-  CHECK_CPU_FEATURE_USABLE (SSE4_1);
-  CHECK_CPU_FEATURE_USABLE (SSE4_2);
-  CHECK_CPU_FEATURE_USABLE (MOVBE);
-  CHECK_CPU_FEATURE_USABLE (POPCNT);
-  CHECK_CPU_FEATURE_USABLE (AES);
-  CHECK_CPU_FEATURE_USABLE (XSAVE);
-  CHECK_CPU_FEATURE_USABLE (OSXSAVE);
   CHECK_CPU_FEATURE_USABLE (AVX);
   CHECK_CPU_FEATURE_USABLE (F16C);
-  CHECK_CPU_FEATURE_USABLE (RDRAND);
-  CHECK_CPU_FEATURE_USABLE (FPU);
-  CHECK_CPU_FEATURE_USABLE (TSC);
-  CHECK_CPU_FEATURE_USABLE (MSR);
-  CHECK_CPU_FEATURE_USABLE (CX8);
-  CHECK_CPU_FEATURE_USABLE (SEP);
-  CHECK_CPU_FEATURE_USABLE (CMOV);
-  CHECK_CPU_FEATURE_USABLE (CLFSH);
-  CHECK_CPU_FEATURE_USABLE (MMX);
-  CHECK_CPU_FEATURE_USABLE (FXSR);
-  CHECK_CPU_FEATURE_USABLE (SSE);
-  CHECK_CPU_FEATURE_USABLE (SSE2);
-  CHECK_CPU_FEATURE_USABLE (FSGSBASE);
-  CHECK_CPU_FEATURE_USABLE (BMI1);
-  CHECK_CPU_FEATURE_USABLE (HLE);
   CHECK_CPU_FEATURE_USABLE (AVX2);
-  CHECK_CPU_FEATURE_USABLE (BMI2);
-  CHECK_CPU_FEATURE_USABLE (ERMS);
   CHECK_CPU_FEATURE_USABLE (AVX512F);
   CHECK_CPU_FEATURE_USABLE (AVX512DQ);
-  CHECK_CPU_FEATURE_USABLE (RDSEED);
-  CHECK_CPU_FEATURE_USABLE (ADX);
   CHECK_CPU_FEATURE_USABLE (AVX512_IFMA);
-  CHECK_CPU_FEATURE_USABLE (CLFLUSHOPT);
-  CHECK_CPU_FEATURE_USABLE (CLWB);
   CHECK_CPU_FEATURE_USABLE (AVX512PF);
   CHECK_CPU_FEATURE_USABLE (AVX512ER);
   CHECK_CPU_FEATURE_USABLE (AVX512CD);
-  CHECK_CPU_FEATURE_USABLE (SHA);
   CHECK_CPU_FEATURE_USABLE (AVX512BW);
   CHECK_CPU_FEATURE_USABLE (AVX512VL);
-  CHECK_CPU_FEATURE_USABLE (PREFETCHWT1);
   CHECK_CPU_FEATURE_USABLE (AVX512_VBMI);
   CHECK_CPU_FEATURE_USABLE (AVX512_VBMI2);
-  CHECK_CPU_FEATURE_USABLE (GFNI);
   CHECK_CPU_FEATURE_USABLE (VAES);
   CHECK_CPU_FEATURE_USABLE (VPCLMULQDQ);
   CHECK_CPU_FEATURE_USABLE (AVX512_VNNI);
   CHECK_CPU_FEATURE_USABLE (AVX512_BITALG);
   CHECK_CPU_FEATURE_USABLE (AVX512_VPOPCNTDQ);
-  CHECK_CPU_FEATURE_USABLE (RDPID);
-  CHECK_CPU_FEATURE_USABLE (CLDEMOTE);
-  CHECK_CPU_FEATURE_USABLE (MOVDIRI);
-  CHECK_CPU_FEATURE_USABLE (MOVDIR64B);
   CHECK_CPU_FEATURE_USABLE (AVX512_4VNNIW);
   CHECK_CPU_FEATURE_USABLE (AVX512_4FMAPS);
-  CHECK_CPU_FEATURE_USABLE (FSRM);
-  CHECK_CPU_FEATURE_USABLE (LAHF64_SAHF64);
-  CHECK_CPU_FEATURE_USABLE (LZCNT);
-  CHECK_CPU_FEATURE_USABLE (SSE4A);
-  CHECK_CPU_FEATURE_USABLE (PREFETCHW);
+  CHECK_CPU_FEATURE_USABLE (AVX512_VP2INTERSECT);
   CHECK_CPU_FEATURE_USABLE (XOP);
   CHECK_CPU_FEATURE_USABLE (FMA4);
-  CHECK_CPU_FEATURE_USABLE (TBM);
-  CHECK_CPU_FEATURE_USABLE (SYSCALL_SYSRET);
-  CHECK_CPU_FEATURE_USABLE (RDTSCP);
-  CHECK_CPU_FEATURE_USABLE (XSAVEOPT);
   CHECK_CPU_FEATURE_USABLE (XSAVEC);
-  CHECK_CPU_FEATURE_USABLE (XGETBV_ECX_1);
-  CHECK_CPU_FEATURE_USABLE (XSAVES);
-  CHECK_CPU_FEATURE_USABLE (INVARIANT_TSC);
-  CHECK_CPU_FEATURE_USABLE (WBNOINVD);
+  CHECK_CPU_FEATURE_USABLE (AVX512_BF16);

   return 0;
 }
--
2.26.2