From patchwork Tue Jun 7 04:11:34 2022
X-Patchwork-Submitter: Noah Goldstein
X-Patchwork-Id: 1639783
To: libc-alpha@sourceware.org
Subject: [PATCH v6 8/8] x86: Shrink code size of memchr-evex.S
Date: Mon, 6 Jun 2022 21:11:34 -0700
Message-Id: <20220607041134.2369903-8-goldstein.w.n@gmail.com>
In-Reply-To: <20220607041134.2369903-1-goldstein.w.n@gmail.com>
References: <20220603044229.2180216-2-goldstein.w.n@gmail.com>
 <20220607041134.2369903-1-goldstein.w.n@gmail.com>
From: Noah Goldstein

This is not meant as a performance optimization. The previous code was
far too liberal in aligning targets and wasted code size unnecessarily.

The total code size saving is 64 bytes.

All benchmark changes are negligible.
Geometric mean of all benchmarks (New / Old): 1.000

Full xcheck passes on x86_64.

Reviewed-by: H.J. Lu
---
 sysdeps/x86_64/multiarch/memchr-evex.S | 46 ++++++++++++++------------
 1 file changed, 25 insertions(+), 21 deletions(-)

diff --git a/sysdeps/x86_64/multiarch/memchr-evex.S b/sysdeps/x86_64/multiarch/memchr-evex.S
index cfaf02907d..0fd11b7632 100644
--- a/sysdeps/x86_64/multiarch/memchr-evex.S
+++ b/sysdeps/x86_64/multiarch/memchr-evex.S
@@ -88,7 +88,7 @@
 # define PAGE_SIZE 4096
 
 	.section SECTION(.text),"ax",@progbits
-ENTRY (MEMCHR)
+ENTRY_P2ALIGN (MEMCHR, 6)
 # ifndef USE_AS_RAWMEMCHR
 	/* Check for zero length. */
 	test	%RDX_LP, %RDX_LP
@@ -131,22 +131,24 @@ L(zero):
 	xorl	%eax, %eax
 	ret
 
-	.p2align 5
+	.p2align 4
 L(first_vec_x0):
-	/* Check if first match was before length. */
-	tzcntl	%eax, %eax
-	xorl	%ecx, %ecx
-	cmpl	%eax, %edx
-	leaq	(%rdi, %rax, CHAR_SIZE), %rax
-	cmovle	%rcx, %rax
+	/* Check if first match was before length. NB: tzcnt has false data-
+	   dependency on destination. eax already had a data-dependency on esi
+	   so this should have no effect here. */
+	tzcntl	%eax, %esi
+# ifdef USE_AS_WMEMCHR
+	leaq	(%rdi, %rsi, CHAR_SIZE), %rdi
+# else
+	addq	%rsi, %rdi
+# endif
+	xorl	%eax, %eax
+	cmpl	%esi, %edx
+	cmovg	%rdi, %rax
 	ret
-# else
-	/* NB: first_vec_x0 is 17 bytes which will leave
-	   cross_page_boundary (which is relatively cold) close enough
-	   to ideal alignment. So only realign L(cross_page_boundary) if
-	   rawmemchr. */
-	.p2align 4
 # endif
+
+	.p2align 4
 L(cross_page_boundary):
 	/* Save pointer before aligning as its original value is
 	   necessary for computer return address if byte is found or
@@ -400,10 +402,14 @@ L(last_2x_vec):
 
 L(zero_end):
 	ret
+L(set_zero_end):
+	xorl	%eax, %eax
+	ret
 
 	.p2align 4
 L(first_vec_x1_check):
-	tzcntl	%eax, %eax
+	/* eax must be non-zero. Use bsfl to save code size. */
+	bsfl	%eax, %eax
 	/* Adjust length. */
 	subl	$-(CHAR_PER_VEC * 4), %edx
 	/* Check if match within remaining length. */
@@ -412,9 +418,6 @@ L(first_vec_x1_check):
 	/* NB: Multiply bytes by CHAR_SIZE to get the wchar_t count. */
 	leaq	VEC_SIZE(%rdi, %rax, CHAR_SIZE), %rax
 	ret
-L(set_zero_end):
-	xorl	%eax, %eax
-	ret
 
 	.p2align 4
 L(loop_4x_vec_end):
@@ -464,7 +467,7 @@ L(loop_4x_vec_end):
 # endif
 	ret
 
-	.p2align 4
+	.p2align 4,, 10
 L(last_vec_x1_return):
 	tzcntl	%eax, %eax
 # if defined USE_AS_WMEMCHR || RET_OFFSET != 0
@@ -496,6 +499,7 @@ L(last_vec_x3_return):
 # endif
 
 # ifndef USE_AS_RAWMEMCHR
+	.p2align 4,, 5
 L(last_4x_vec_or_less_cmpeq):
 	VPCMP	$0, (VEC_SIZE * 5)(%rdi), %YMMMATCH, %k0
 	kmovd	%k0, %eax
@@ -546,7 +550,7 @@ L(last_4x_vec):
 # endif
 	andl	%ecx, %eax
 	jz	L(zero_end2)
-	tzcntl	%eax, %eax
+	bsfl	%eax, %eax
 	leaq	(VEC_SIZE * 4)(%rdi, %rax, CHAR_SIZE), %rax
 L(zero_end2):
 	ret
@@ -562,6 +566,6 @@ L(last_vec_x3):
 	leaq	(VEC_SIZE * 3)(%rdi, %rax, CHAR_SIZE), %rax
 	ret
 # endif
-
+	/* 7 bytes from next cache line. */
 END (MEMCHR)
 #endif
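The rewritten L(first_vec_x0) path inverts the old select. The old code computed the match address and replaced it with NULL (cmovle). The new code zeroes the return register first and moves the match address in only when the match index is below the length (cmovg). Because tzcnt returns the operand width (32) for an all-zero mask, a vector with no match can never pass the length test, so no separate "no match" branch is needed. The following is a minimal C model of that logic for the byte (non-wmemchr) case. It assumes GCC/Clang for `__builtin_ctz` and a 32-byte vector, and `tzcnt32`, `match_mask` and `first_vec_x0_model` are hypothetical helpers, not glibc functions. Like the assembly, it reads a whole vector, so the caller must supply 32 readable bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed vector width in bytes (32 for the 256-bit EVEX variant). */
#define MODEL_VEC_SIZE 32

/* tzcnt semantics: trailing-zero count, or the operand width (32) for
   zero.  __builtin_ctz(0) is undefined in C, so zero is handled here. */
static uint32_t tzcnt32(uint32_t x)
{
    return x ? (uint32_t)__builtin_ctz(x) : 32u;
}

/* Scalar stand-in for VPCMP + kmovd: bit i is set iff v[i] == c. */
static uint32_t match_mask(const unsigned char *v, unsigned char c)
{
    uint32_t m = 0;
    for (int i = 0; i < MODEL_VEC_SIZE; i++)
        if (v[i] == c)
            m |= 1u << i;
    return m;
}

/* Hypothetical model of L(first_vec_x0): only reached when the length
   fits in one vector.  A zero mask yields idx == 32, which is never
   below len, so "no match" and "match past the end" share one test. */
static const unsigned char *first_vec_x0_model(const unsigned char *s,
                                               unsigned char c, uint32_t len)
{
    assert(len <= MODEL_VEC_SIZE);
    uint32_t idx = tzcnt32(match_mask(s, c));
    /* Mirrors: xorl %eax,%eax; cmpl %esi,%edx; cmovg %rdi,%rax */
    return len > idx ? s + idx : NULL;
}
```

Whether a compiler turns the final conditional into a cmov is up to the compiler. The model only shows the selection logic, not the exact instruction sequence.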
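Two tzcntl instructions become bsfl where the mask is guaranteed non-zero. In L(first_vec_x1_check) the patch's own comment states this, and in L(last_4x_vec) a preceding `jz L(zero_end2)` on the same mask ensures it. For a non-zero source both instructions return the index of the lowest set bit. They differ only for zero: tzcnt is defined to return the operand width, while bsf leaves the destination undefined. tzcnt is encoded as bsf with an F3 (rep) prefix, so each swap saves one byte. The sketch below models both under the same assumptions as before (GCC/Clang builtins, hypothetical helper names) and records the two register-to-register encodings for `%eax, %eax`.

```c
#include <assert.h>
#include <stdint.h>

/* tzcnt: defined for every input; returns 32 for a zero source. */
static uint32_t tzcnt32(uint32_t x)
{
    return x ? (uint32_t)__builtin_ctz(x) : 32u;
}

/* bsf: result is undefined for a zero source, so the model rejects it.
   Callers must guarantee x != 0, as the patched code paths do. */
static uint32_t bsf32(uint32_t x)
{
    assert(x != 0 && "bsf result is undefined for a zero source");
    return (uint32_t)__builtin_ctz(x);
}

/* Machine code for "bsfl %eax, %eax" and "tzcntl %eax, %eax":
   opcode 0F BC with ModRM C0; tzcnt adds the F3 prefix. */
static const unsigned char bsf_eax_eax[]   = {0x0F, 0xBC, 0xC0};
static const unsigned char tzcnt_eax_eax[] = {0xF3, 0x0F, 0xBC, 0xC0};
```

This is why the substitution is only safe after the mask has been tested. On a path where the mask may be zero, such as L(first_vec_x0), tzcnt's defined result of 32 is what makes the length check work.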
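Most of the savings come from replacing unconditional alignment with the three-operand GNU as form `.p2align 4,, N`. The first operand is the power-of-two alignment. The omitted second operand selects the default fill, which is NOPs in a code section. The third operand is the maximum number of bytes to skip: if reaching the boundary would take more than that, no padding is emitted. `ENTRY_P2ALIGN (MEMCHR, 6)` presumably requests 2^6 = 64-byte alignment of the entry point. The hypothetical helper below computes how many padding bytes such a directive inserts at a given section offset. It assumes the third operand is present and does not model the no-limit form.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of ".p2align pow2,, max_skip": padding bytes
   emitted at section offset `offset`, or 0 if more than max_skip
   bytes would be needed to reach the 2^pow2 boundary. */
static uint64_t p2align_padding(uint64_t offset, unsigned pow2,
                                uint64_t max_skip)
{
    assert(pow2 < 64);
    uint64_t align = (uint64_t)1 << pow2;
    /* Distance to the next boundary; 0 if already aligned. */
    uint64_t pad = (align - (offset & (align - 1))) & (align - 1);
    return pad <= max_skip ? pad : 0;
}
```

With `.p2align 4,, 10`, a label that sits one byte past a 16-byte boundary stays where it is, instead of costing 15 bytes of NOPs. A plain `.p2align 4` behaves like a max_skip of 15 and always pads.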