From patchwork Tue May 12 13:32:56 2015
X-Patchwork-Submitter: Christophe Leroy
X-Patchwork-Id: 471348
From: Christophe Leroy
Subject: [PATCH 4/4] powerpc32: memcpy: use cacheable_memcpy
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, scottwood@freescale.com
Date: Tue, 12 May 2015 15:32:56 +0200 (CEST)
Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, Kyle Moffett

cacheable_memcpy uses the dcbz instruction and is more efficient than
memcpy when the destination is in RAM.

This patch renames memcpy to generic_memcpy and defines memcpy as a
prolog to cacheable_memcpy. Because dcbz must not be used on a
destination that is not cacheable RAM, this prolog checks that the
copy is at least L1_CACHE_BYTES long and that the destination buffer
is in RAM (its page frame number is below max_pfn). If either check
fails, it falls back to generic_memcpy(). memmove now falls through
to this prolog, and cacheable_memcpy itself falls back to
generic_memcpy() when the source and destination overlap.

On MPC885, this gives an increase of approximately 7% in transfer
rate when receiving over FTP.

Signed-off-by: Christophe Leroy
---
 arch/powerpc/lib/copy_32.S | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/lib/copy_32.S b/arch/powerpc/lib/copy_32.S
index d8a9a86..8f76d49 100644
--- a/arch/powerpc/lib/copy_32.S
+++ b/arch/powerpc/lib/copy_32.S
@@ -161,13 +161,27 @@ _GLOBAL(generic_memset)
  * We only use this version if the source and dest don't overlap.
  * -- paulus.
  */
+_GLOBAL(memmove)
+	cmplw	0,r3,r4
+	bgt	backwards_memcpy
+	/* fall through */
+
+_GLOBAL(memcpy)
+	cmplwi	r5,L1_CACHE_BYTES
+	blt-	generic_memcpy
+	lis	r8,max_pfn@ha
+	lwz	r8,max_pfn@l(r8)
+	tophys	(r9,r3)
+	srwi	r9,r9,PAGE_SHIFT
+	cmplw	r9,r8
+	bge-	generic_memcpy
 _GLOBAL(cacheable_memcpy)
 	add	r7,r3,r5		/* test if the src & dst overlap */
 	add	r8,r4,r5
 	cmplw	0,r4,r7
 	cmplw	1,r3,r8
 	crand	0,0,4			/* cr0.lt &= cr1.lt */
-	blt	memcpy			/* if regions overlap */
+	blt	generic_memcpy		/* if regions overlap */
 
 	addi	r4,r4,-4
 	addi	r6,r3,-4
@@ -233,12 +247,7 @@ _GLOBAL(cacheable_memcpy)
 	bdnz	40b
 65:	blr
 
-_GLOBAL(memmove)
-	cmplw	0,r3,r4
-	bgt	backwards_memcpy
-	/* fall through */
-
-_GLOBAL(memcpy)
+_GLOBAL(generic_memcpy)
 	srwi.	r7,r5,3
 	addi	r6,r3,-4
 	addi	r4,r4,-4