From patchwork Fri Mar 22 20:14:54 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Anton Youdkevitch
X-Patchwork-Id: 1061625
From: Anton Youdkevitch
Subject: [PATCH v3] aarch64: thunderx2 memcpy branches reordering
To: Wilco Dijkstra, libc-alpha@sourceware.org
Message-ID: <5C95423E.4050502@bell-sw.com>
Date: Fri, 22 Mar 2019 23:14:54 +0300

The "ext" chunk changes are:

1. The always-taken conditional branch at the beginning is removed.
2. The epilogue code is placed after the end of the loop to reduce
   the number of branches.
3. The redundant "mov" instruction inside the loop is removed.
4. Invariant code in the loop epilogue is no longer repeated for
   each chunk.

make check showed no regressions.

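For readers unfamiliar with the "ext" instruction that the EXT_CHUNK macro is built around, here is a minimal C sketch of its byte-level behaviour. It is an illustration only, not code from glibc: the names ext_model and ext_model_check are hypothetical, and the model assumes the usual AArch64 semantics of "ext Vd.16b, Vn.16b, Vm.16b, #imm" (bytes imm..15 of Vn followed by bytes 0..imm-1 of Vm). With imm = 16-shft, as used in the macro, each result holds the last shft bytes of the first operand followed by the first 16-shft bytes of the second.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of glibc): byte-wise model of
   "ext Vd.16b, Vn.16b, Vm.16b, #imm".  The result is bytes imm..15
   of n followed by bytes 0..imm-1 of m.  Valid for imm in 0..15.  */
static void
ext_model (uint8_t d[16], const uint8_t n[16], const uint8_t m[16],
           unsigned imm)
{
  uint8_t cat[32];
  memcpy (cat, n, 16);       /* low half of the concatenation  */
  memcpy (cat + 16, m, 16);  /* high half of the concatenation */
  memcpy (d, cat + imm, 16); /* 16-byte window starting at imm */
}

/* Hypothetical self-check: for every shft in 1..15, extracting with
   imm = 16-shft must give the last shft bytes of prev followed by
   the first 16-shft bytes of next.  Returns 1 on success.  */
static int
ext_model_check (void)
{
  uint8_t prev[16], next[16], out[16];

  for (unsigned i = 0; i < 16; i++)
    {
      prev[i] = (uint8_t) i;        /* bytes 0..15  */
      next[i] = (uint8_t) (16 + i); /* bytes 16..31 */
    }

  for (unsigned shft = 1; shft <= 15; shft++)
    {
      ext_model (out, prev, next, 16 - shft);
      for (unsigned i = 0; i < 16; i++)
        if (out[i] != (uint8_t) (16 - shft + i))
          return 0;
    }
  return 1;
}
```

Under these assumptions, each "ext" in the loop stitches two consecutive 16-byte source registers into one 16-byte value for the aligned destination. This is why the loop keeps the previously loaded register live across iterations.
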
diff --git a/sysdeps/aarch64/multiarch/memcpy_thunderx2.S b/sysdeps/aarch64/multiarch/memcpy_thunderx2.S
index b2215c1..f53bc2a 100644
--- a/sysdeps/aarch64/multiarch/memcpy_thunderx2.S
+++ b/sysdeps/aarch64/multiarch/memcpy_thunderx2.S
@@ -382,7 +382,8 @@ L(bytes_0_to_3):
 	strb	A_lw, [dstin]
 	strb	B_lw, [dstin, tmp1]
 	strb	A_hw, [dstend, -1]
-L(end): ret
+L(end):
+	ret
 
 	.p2align 4
 
@@ -544,6 +545,7 @@ L(dst_unaligned):
 	str	C_q, [dst], #16
 	ldp	F_q, G_q, [src], #32
 	bic	dst, dst, 15
+	subs	count, count, 32
 	adrp	tmp2, L(ext_table)
 	add	tmp2, tmp2, :lo12:L(ext_table)
 	add	tmp2, tmp2, tmp1, LSL #2
@@ -556,31 +558,24 @@ L(dst_unaligned):
 L(ext_size_ ## shft):;\
 	ext	A_v.16b, C_v.16b, D_v.16b, 16-shft;\
 	ext	B_v.16b, D_v.16b, E_v.16b, 16-shft;\
-	subs	count, count, 32;\
-	b.ge	2f;\
 1:;\
 	stp	A_q, B_q, [dst], #32;\
-	ext	H_v.16b, E_v.16b, F_v.16b, 16-shft;\
-	ext	I_v.16b, F_v.16b, G_v.16b, 16-shft;\
-	stp	H_q, I_q, [dst], #16;\
-	add	dst, dst, tmp1;\
-	str	G_q, [dst], #16;\
-	b	L(copy_long_check32);\
-2:;\
-	stp	A_q, B_q, [dst], #32;\
 	prfm	pldl1strm, [src, MEMCPY_PREFETCH_LDR];\
-	ldp	D_q, J_q, [src], #32;\
+	ldp	C_q, D_q, [src], #32;\
 	ext	H_v.16b, E_v.16b, F_v.16b, 16-shft;\
 	ext	I_v.16b, F_v.16b, G_v.16b, 16-shft;\
-	mov	C_v.16b, G_v.16b;\
 	stp	H_q, I_q, [dst], #32;\
+	ext	A_v.16b, G_v.16b, C_v.16b, 16-shft;\
 	ldp	F_q, G_q, [src], #32;\
-	ext	A_v.16b, C_v.16b, D_v.16b, 16-shft;\
-	ext	B_v.16b, D_v.16b, J_v.16b, 16-shft;\
-	mov	E_v.16b, J_v.16b;\
+	ext	B_v.16b, C_v.16b, D_v.16b, 16-shft;\
+	mov	E_v.16b, D_v.16b;\
 	subs	count, count, 64;\
-	b.ge	2b;\
-	b	1b;\
+	b.ge	1b;\
+2:;\
+	stp	A_q, B_q, [dst], #32;\
+	ext	H_v.16b, E_v.16b, F_v.16b, 16-shft;\
+	ext	I_v.16b, F_v.16b, G_v.16b, 16-shft;\
+	b	L(ext_tail);
 
 EXT_CHUNK(1)
 EXT_CHUNK(2)
@@ -598,6 +593,13 @@ EXT_CHUNK(13)
 EXT_CHUNK(14)
 EXT_CHUNK(15)
 
+L(ext_tail):
+	stp	H_q, I_q, [dst], #16
+	add	dst, dst, tmp1
+	str	G_q, [dst], #16
+	b	L(copy_long_check32)
+
+
 END (MEMCPY)
 	.section	.rodata
 	.p2align	4