From patchwork Thu Nov 7 13:01:29 2013
X-Patchwork-Submitter: Philippe Bergheaud
X-Patchwork-Id: 289313
Message-ID: <527B8F29.2020204@linux.vnet.ibm.com>
Date: Thu, 07 Nov 2013 14:01:29 +0100
From: Philippe Bergheaud
To: Michael Neuling
Cc: Linuxppc-dev@lists.ozlabs.org, anton@samba.org
Subject: [PATCH v2] powerpc: memcpy optimization for 64bit LE
References: <1383640732-21449-1-git-send-email-felix@linux.vnet.ibm.com>
 <11438.1383718966@ale.ozlabs.ibm.com>
 <527A183F.7030500@linux.vnet.ibm.com>

Unaligned stores take alignment exceptions on POWER7 running in
little-endian mode. This is a dumb little-endian base memcpy that avoids
unaligned stores.

Once booted, the feature-fixup code switches over to the VMX copy loops
(which are already endian safe).

The question is what we do before that switch-over. The base 64bit
memcpy takes alignment exceptions on POWER7, so we cannot use it as is.
Fixing the causes of the alignment exceptions would slow it down,
because we would need to ensure that all loads and stores are aligned,
either through rotate tricks or byte-wise loads and stores. Either
approach would be bad for all other 64bit platforms.
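For reference, a minimal C sketch of the byte-at-a-time copy described
above (illustrative only; the function name is not part of the patch,
which implements the loop in assembly with lbzu/stbu):

#include <stddef.h>

/*
 * Illustrative C equivalent of the "dumb" little-endian memcpy:
 * copying one byte at a time means no store is ever unaligned,
 * so no alignment exceptions are taken on POWER7 in LE mode.
 * After boot the feature fixup switches to the VMX copy loops,
 * so this path only has to be correct, not fast.
 */
static void *dumb_memcpy(void *dest, const void *src, size_t n)
{
	char *d = dest;
	const char *s = src;

	while (n--)
		*d++ = *s++;

	return dest;	/* memcpy returns the destination pointer */
}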
Signed-off-by: Philippe Bergheaud
---
 arch/powerpc/include/asm/string.h |  4 ----
 arch/powerpc/kernel/ppc_ksyms.c   |  2 --
 arch/powerpc/lib/Makefile         |  2 --
 arch/powerpc/lib/memcpy_64.S      | 19 +++++++++++++++++++
 4 files changed, 19 insertions(+), 8 deletions(-)

--
1.7.10.4

diff --git a/arch/powerpc/include/asm/string.h b/arch/powerpc/include/asm/string.h
index 0dffad6..e40010a 100644
--- a/arch/powerpc/include/asm/string.h
+++ b/arch/powerpc/include/asm/string.h
@@ -10,9 +10,7 @@
 #define __HAVE_ARCH_STRNCMP
 #define __HAVE_ARCH_STRCAT
 #define __HAVE_ARCH_MEMSET
-#ifdef __BIG_ENDIAN__
 #define __HAVE_ARCH_MEMCPY
-#endif
 #define __HAVE_ARCH_MEMMOVE
 #define __HAVE_ARCH_MEMCMP
 #define __HAVE_ARCH_MEMCHR
@@ -24,9 +22,7 @@ extern int strcmp(const char *,const char *);
 extern int strncmp(const char *, const char *, __kernel_size_t);
 extern char * strcat(char *, const char *);
 extern void * memset(void *,int,__kernel_size_t);
-#ifdef __BIG_ENDIAN__
 extern void * memcpy(void *,const void *,__kernel_size_t);
-#endif
 extern void * memmove(void *,const void *,__kernel_size_t);
 extern int memcmp(const void *,const void *,__kernel_size_t);
 extern void * memchr(const void *,int,__kernel_size_t);
diff --git a/arch/powerpc/kernel/ppc_ksyms.c b/arch/powerpc/kernel/ppc_ksyms.c
index 526ad5c..0c2dd60 100644
--- a/arch/powerpc/kernel/ppc_ksyms.c
+++ b/arch/powerpc/kernel/ppc_ksyms.c
@@ -147,9 +147,7 @@ EXPORT_SYMBOL(__ucmpdi2);
 #endif
 long long __bswapdi2(long long);
 EXPORT_SYMBOL(__bswapdi2);
-#ifdef __BIG_ENDIAN__
 EXPORT_SYMBOL(memcpy);
-#endif
 EXPORT_SYMBOL(memset);
 EXPORT_SYMBOL(memmove);
 EXPORT_SYMBOL(memcmp);
diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 5310132..6670361 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -23,9 +23,7 @@ obj-y			+= checksum_$(CONFIG_WORD_SIZE).o
 obj-$(CONFIG_PPC64)	+= checksum_wrappers_64.o
 endif
 
-ifeq ($(CONFIG_CPU_LITTLE_ENDIAN),)
 obj-$(CONFIG_PPC64)	+= memcpy_power7.o memcpy_64.o
-endif
 
 obj-$(CONFIG_PPC_EMULATE_SSTEP)	+= sstep.o ldstfp.o
 
diff --git a/arch/powerpc/lib/memcpy_64.S b/arch/powerpc/lib/memcpy_64.S
index d2bbbc8..358cf74 100644
--- a/arch/powerpc/lib/memcpy_64.S
+++ b/arch/powerpc/lib/memcpy_64.S
@@ -12,10 +12,28 @@
 	.align	7
 _GLOBAL(memcpy)
 BEGIN_FTR_SECTION
+#ifdef __LITTLE_ENDIAN__
+	cmpdi	cr7,r5,0		/* dumb little-endian memcpy */
+#else
 	std	r3,48(r1)	/* save destination pointer for return value */
+#endif
 FTR_SECTION_ELSE
 	b	memcpy_power7
 ALT_FTR_SECTION_END_IFCLR(CPU_FTR_VMX_COPY)
+#ifdef __LITTLE_ENDIAN__
+	addi	r5,r5,-1
+	addi	r9,r3,-1
+	add	r5,r3,r5
+	subf	r5,r9,r5
+	addi	r4,r4,-1
+	mtctr	r5
+	beqlr	cr7
+1:
+	lbzu	r10,1(r4)
+	stbu	r10,1(r9)
+	bdnz	1b
+	blr
+#else
 	PPC_MTOCRF(0x01,r5)
 	cmpldi	cr1,r5,16
 	neg	r6,r3		# LS 3 bits = # bytes to 8-byte dest bdry
@@ -201,3 +219,4 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_LD_STD)
 	stb	r0,0(r3)
 4:	ld	r3,48(r1)	/* return dest pointer */
 	blr
+#endif
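If it helps review, here is a rough C rendering of the little-endian
path added to memcpy_64.S, assuming the usual PPC64 ELF ABI argument
registers (r3 = dest, r4 = src, r5 = len); the function name is
illustrative and not part of the patch:

/*
 * Rough C rendering of the __LITTLE_ENDIAN__ branch above, assuming
 * r3 = dest, r4 = src, r5 = len.  The pointers are biased by -1
 * because lbzu/stbu are update-form instructions that advance the
 * pointer before each access, and the addi/add/subf sequence leaves
 * the byte count back in r5 for mtctr.
 */
void *le_memcpy_sketch(void *dest, const void *src, unsigned long len)
{
	unsigned char *d;
	const unsigned char *s;
	unsigned long ctr;

	if (len == 0)				/* cmpdi cr7,r5,0 ... beqlr cr7 */
		return dest;

	d = (unsigned char *)dest - 1;		/* addi r9,r3,-1 (biased for stbu) */
	s = (const unsigned char *)src - 1;	/* addi r4,r4,-1 (biased for lbzu) */
	ctr = len;				/* mtctr r5 */

	do {
		*++d = *++s;			/* lbzu r10,1(r4); stbu r10,1(r9) */
	} while (--ctr);			/* bdnz 1b */

	return dest;				/* r3 is never modified */
}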