From patchwork Wed Mar 20 14:31:42 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Anton Youdkevitch
X-Patchwork-Id: 1059257
Date: Wed, 20 Mar 2019 17:31:42 +0300
From: Anton Youdkevitch
To: libc-alpha@sourceware.org
Subject: [PATCH] aarch64: thunderx2 memcpy implementation cleanup and streamlining
Message-ID: <20190320143141.GA13393@bell-sw.com>

Replaced named labels with numeric ones to reduce clutter.
Rewrote the branching scheme in the load-and-merge chunk so that
it is aligned with the most probable case.

ChangeLog:

	* sysdeps/aarch64/multiarch/memcpy_thunderx2.S: cleanup

diff --git a/sysdeps/aarch64/multiarch/memcpy_thunderx2.S b/sysdeps/aarch64/multiarch/memcpy_thunderx2.S
index b2215c1..b55dd52 100644
--- a/sysdeps/aarch64/multiarch/memcpy_thunderx2.S
+++ b/sysdeps/aarch64/multiarch/memcpy_thunderx2.S
@@ -352,7 +352,7 @@ L(bytes_32_to_48):
 	/* Small copies: 0..16 bytes. */
 L(memcopy16):
 	cmp count, 8
-	b.lo L(bytes_0_to_8)
+	b.lo 1f
 	ldr A_l, [src]
 	ldr A_h, [srcend, -8]
 	add dstend, dstin, count
@@ -361,8 +361,8 @@ L(memcopy16):
 	ret
 
 	.p2align 4
-L(bytes_0_to_8):
-	tbz count, 2, L(bytes_0_to_3)
+1:
+	tbz count, 2, 1f
 	ldr A_lw, [src]
 	ldr A_hw, [srcend, -4]
 	add dstend, dstin, count
@@ -372,8 +372,8 @@ L(bytes_0_to_8):
 
 	/* Copy 0..3 bytes. Use a branchless sequence that copies the same
 	   byte 3 times if count==1, or the 2nd byte twice if count==2. */
-L(bytes_0_to_3):
-	cbz count, L(end)
+1:
+	cbz count, 1f
 	lsr tmp1, count, 1
 	ldrb A_lw, [src]
 	ldrb A_hw, [srcend, -1]
@@ -382,7 +382,8 @@ L(bytes_0_to_3):
 	strb A_lw, [dstin]
 	strb B_lw, [dstin, tmp1]
 	strb A_hw, [dstend, -1]
-L(end): ret
+1:
+	ret
 
 	.p2align 4
 
@@ -404,7 +405,7 @@ L(memcpy_copy96):
 	   after tmp [0..15] gets added to it,
 	   count now is +48 */
 	cmp count, 80
-	b.gt L(copy96_medium)
+	b.gt 1f
 	ldr D_q, [src, 32]
 	stp B_q, C_q, [dst, 16]
 	str E_q, [dstend, -16]
@@ -412,17 +413,17 @@ L(memcpy_copy96):
 	ret
 
 	.p2align 4
-L(copy96_medium):
+1:
 	ldp D_q, A_q, [src, 32]
 	str B_q, [dst, 16]
 	cmp count, 96
-	b.gt L(copy96_large)
+	b.gt 1f
 	str E_q, [dstend, -16]
 	stp C_q, D_q, [dst, 32]
 	str A_q, [dst, 64]
 	ret
 
-L(copy96_large):
+1:
 	ldr F_q, [src, 64]
 	stp C_q, D_q, [dst, 32]
 	str E_q, [dstend, -16]
@@ -557,17 +558,9 @@ L(ext_size_ ## shft):;\
 	ext A_v.16b, C_v.16b, D_v.16b, 16-shft;\
 	ext B_v.16b, D_v.16b, E_v.16b, 16-shft;\
 	subs count, count, 32;\
-	b.ge 2f;\
+	b.lt 2f;\
 1:;\
 	stp A_q, B_q, [dst], #32;\
-	ext H_v.16b, E_v.16b, F_v.16b, 16-shft;\
-	ext I_v.16b, F_v.16b, G_v.16b, 16-shft;\
-	stp H_q, I_q, [dst], #16;\
-	add dst, dst, tmp1;\
-	str G_q, [dst], #16;\
-	b L(copy_long_check32);\
-2:;\
-	stp A_q, B_q, [dst], #32;\
 	prfm pldl1strm, [src, MEMCPY_PREFETCH_LDR];\
 	ldp D_q, J_q, [src], #32;\
 	ext H_v.16b, E_v.16b, F_v.16b, 16-shft;\
@@ -579,8 +572,15 @@ L(ext_size_ ## shft):;\
 	ext B_v.16b, D_v.16b, J_v.16b, 16-shft;\
 	mov E_v.16b, J_v.16b;\
 	subs count, count, 64;\
-	b.ge 2b;\
-	b 1b;\
+	b.ge 1b;\
+2:;\
+	stp A_q, B_q, [dst], #32;\
+	ext H_v.16b, E_v.16b, F_v.16b, 16-shft;\
+	ext I_v.16b, F_v.16b, G_v.16b, 16-shft;\
+	stp H_q, I_q, [dst], #16;\
+	add dst, dst, tmp1;\
+	str G_q, [dst], #16;\
+	b L(copy_long_check32);\
 
 EXT_CHUNK(1)
 EXT_CHUNK(2)