From patchwork Fri Dec 15 07:39:00 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Lane X-Patchwork-Id: 1876496 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ubuntu.com (client-ip=185.125.189.65; helo=lists.ubuntu.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=patchwork.ozlabs.org) Received: from lists.ubuntu.com (lists.ubuntu.com [185.125.189.65]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Ss1M30V7cz20LT for ; Fri, 15 Dec 2023 18:39:25 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=lists.ubuntu.com) by lists.ubuntu.com with esmtp (Exim 4.86_2) (envelope-from ) id 1rE2mw-0004Rn-4P; Fri, 15 Dec 2023 07:39:14 +0000 Received: from smtp-relay-canonical-1.internal ([10.131.114.174] helo=smtp-relay-canonical-1.canonical.com) by lists.ubuntu.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1rE2mm-0004Qo-HJ for kernel-team@lists.ubuntu.com; Fri, 15 Dec 2023 07:39:04 +0000 Received: from galactica.tail09933.ts.net (1.general.jlane.us.vpn [10.172.64.160]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-canonical-1.canonical.com (Postfix) with ESMTPSA id ECC01400F2 for ; Fri, 15 Dec 2023 07:39:03 +0000 (UTC) From: Jeff Lane To: kernel-team@lists.ubuntu.com Subject: [SRU L][PATCH 1/2] x86: don't use REP_GOOD or ERMS for small memory clearing Date: Fri, 15 Dec 2023 02:39:00 -0500 Message-Id: <20231215073901.1718677-2-jeffrey.lane@canonical.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231215073901.1718677-1-jeffrey.lane@canonical.com> References: <20231215073901.1718677-1-jeffrey.lane@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Linus Torvalds BugLink: https://bugs.launchpad.net/bugs/2020022 The modern target to use is FSRS (Fast Short REP STOS), and the other cases should only be used for bigger areas (ie mainly things like page clearing). Signed-off-by: Linus Torvalds (cherry picked from commit 20f3337d350c4e1b4ac66d731fd4e98565bf6cc0) Signed-off-by: Michael Reed --- arch/x86/lib/memset_64.S | 47 ++++++++++------------------------------ 1 file changed, 11 insertions(+), 36 deletions(-) diff --git a/arch/x86/lib/memset_64.S b/arch/x86/lib/memset_64.S index fc9ffd3ff3b2..2e5572c412c4 100644 --- a/arch/x86/lib/memset_64.S +++ b/arch/x86/lib/memset_64.S @@ -16,27 +16,22 @@ * rdx count (bytes) * * rax original destination + * + * The FSRS alternative should be done inline (avoiding the call and + * the disgusting return handling), but that would require some help + * from the compiler for better calling conventions. + * + * The 'rep stosb' itself is small enough to replace the call, but all + * the register moves blow up the code. And two of them are "needed" + * only for the return value that is the same as the source input, + * which the compiler could/should do much better anyway. */ SYM_FUNC_START(__memset) - /* - * Some CPUs support enhanced REP MOVSB/STOSB feature. It is recommended - * to use it when possible. If not available, use fast string instructions. - * - * Otherwise, use original memset function. - */ - ALTERNATIVE_2 "jmp memset_orig", "", X86_FEATURE_REP_GOOD, \ - "jmp memset_erms", X86_FEATURE_ERMS + ALTERNATIVE "jmp memset_orig", "", X86_FEATURE_FSRS movq %rdi,%r9 + movb %sil,%al movq %rdx,%rcx - andl $7,%edx - shrq $3,%rcx - /* expand byte value */ - movzbl %sil,%esi - movabs $0x0101010101010101,%rax - imulq %rsi,%rax - rep stosq - movl %edx,%ecx rep stosb movq %r9,%rax RET @@ -46,26 +41,6 @@ EXPORT_SYMBOL(__memset) SYM_FUNC_ALIAS_WEAK(memset, __memset) EXPORT_SYMBOL(memset) -/* - * ISO C memset - set a memory block to a byte value. This function uses - * enhanced rep stosb to override the fast string function. - * The code is simpler and shorter than the fast string function as well. - * - * rdi destination - * rsi value (char) - * rdx count (bytes) - * - * rax original destination - */ -SYM_FUNC_START_LOCAL(memset_erms) - movq %rdi,%r9 - movb %sil,%al - movq %rdx,%rcx - rep stosb - movq %r9,%rax - RET -SYM_FUNC_END(memset_erms) - SYM_FUNC_START_LOCAL(memset_orig) movq %rdi,%r10 From patchwork Fri Dec 15 07:39:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Lane X-Patchwork-Id: 1876497 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ubuntu.com (client-ip=185.125.189.65; helo=lists.ubuntu.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=patchwork.ozlabs.org) Received: from lists.ubuntu.com (lists.ubuntu.com [185.125.189.65]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Ss1M30Nr9z1ySd for ; Fri, 15 Dec 2023 18:39:25 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=lists.ubuntu.com) by lists.ubuntu.com with esmtp (Exim 4.86_2) (envelope-from ) id 1rE2ms-0004RI-Oa; Fri, 15 Dec 2023 07:39:10 +0000 Received: from smtp-relay-canonical-1.internal ([10.131.114.174] helo=smtp-relay-canonical-1.canonical.com) by lists.ubuntu.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1rE2mn-0004Qu-8q for kernel-team@lists.ubuntu.com; Fri, 15 Dec 2023 07:39:05 +0000 Received: from galactica.tail09933.ts.net (1.general.jlane.us.vpn [10.172.64.160]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-canonical-1.canonical.com (Postfix) with ESMTPSA id B2E483F6B8 for ; Fri, 15 Dec 2023 07:39:04 +0000 (UTC) From: Jeff Lane To: kernel-team@lists.ubuntu.com Subject: [SRU L][PATCH 2/2] x86/cpufeatures: Add macros for Intel's new fast rep string features Date: Fri, 15 Dec 2023 02:39:01 -0500 Message-Id: <20231215073901.1718677-3-jeffrey.lane@canonical.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231215073901.1718677-1-jeffrey.lane@canonical.com> References: <20231215073901.1718677-1-jeffrey.lane@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Jim Mattson BugLink: https://bugs.launchpad.net/bugs/2020022 KVM_GET_SUPPORTED_CPUID should reflect these host CPUID bits. The bits are already cached in word 12. Give the bits X86_FEATURE names, so that they can be easily referenced. Hide these bits from /proc/cpuinfo, since the host kernel makes no use of them at present. Signed-off-by: Jim Mattson Reviewed-by: Sean Christopherson Link: https://lore.kernel.org/r/20220901211811.2883855-1-jmattson@google.com Signed-off-by: Sean Christopherson (cherry picked from commit f8df91e73a6827a4569bb56cd53e55b4ea2f5b1f) Signed-off-by: Michael Reed --- arch/x86/include/asm/cpufeatures.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index cfd280a5aaca..5facbd897fa2 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -316,6 +316,9 @@ #define X86_FEATURE_AVX_VNNI (12*32+ 4) /* AVX VNNI instructions */ #define X86_FEATURE_AVX512_BF16 (12*32+ 5) /* AVX512 BFLOAT16 instructions */ #define X86_FEATURE_CMPCCXADD (12*32+ 7) /* "" CMPccXADD instructions */ +#define X86_FEATURE_FZRM (12*32+10) /* "" Fast zero-length REP MOVSB */ +#define X86_FEATURE_FSRS (12*32+11) /* "" Fast short REP STOSB */ +#define X86_FEATURE_FSRC (12*32+12) /* "" Fast short REP {CMPSB,SCASB} */ #define X86_FEATURE_AMX_FP16 (12*32+21) /* "" AMX fp16 Support */ #define X86_FEATURE_AVX_IFMA (12*32+23) /* "" Support for VPMADD52[H,L]UQ */