From patchwork Mon Mar 18 15:09:00 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Stefan Liebler
X-Patchwork-Id: 1057901
From: Stefan Liebler
To: libc-alpha@sourceware.org
Cc: Stefan Liebler
Subject: [PATCH 3/5] S390: Add arch13 memmove ifunc variant.
Date: Mon, 18 Mar 2019 16:09:00 +0100
In-Reply-To: <1552921742-31456-1-git-send-email-stli@linux.ibm.com>
References: <1552921742-31456-1-git-send-email-stli@linux.ibm.com>
Message-Id: <1552921742-31456-3-git-send-email-stli@linux.ibm.com>

This patch introduces the new arch13 ifunc variant for memmove.
For the forward or non-overlapping case it simply uses memcpy.
For the backward case it relies on the new instruction mvcrl.
The instruction copies up to 256 bytes at once. In case of an overlap,
it copies the bytes as if they were copied one by one from right to left.

ChangeLog:

	* sysdeps/s390/ifunc-memcpy.h (HAVE_MEMMOVE_ARCH13, MEMMOVE_ARCH13,
	HAVE_MEMMOVE_IFUNC_AND_ARCH13_SUPPORT): New defines.
	* sysdeps/s390/memcpy-z900.S: Add arch13 memmove implementation.
	* sysdeps/s390/memmove.c (memmove): Add arch13 variant in
	ifunc selector.
	* sysdeps/s390/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list):
	Add ifunc variant for arch13 memmove.
	* sysdeps/s390/multiarch/ifunc-resolve.h (S390_STFLE_BITS_ARCH13_MIE3,
	S390_IS_ARCH13_MIE3): New defines.
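
The control flow described above can be modelled in portable C. This is only a sketch for illustration, not the code in the patch: memmove_model and copy_right_to_left are hypothetical names, copy_right_to_left is a byte loop standing in for a single mvcrl, and a plain forward loop stands in for the jump into the z196 memcpy. The separate vll/vstl path for lengths up to 16 bytes is not modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one mvcrl: copies n bytes (1..256),
   starting with the rightmost byte.  */
static void
copy_right_to_left (unsigned char *dst, const unsigned char *src, size_t n)
{
  while (n-- > 0)
    dst[n] = src[n];
}

/* Hypothetical model of the arch13 memmove flow.  */
static void *
memmove_model (void *dstp, const void *srcp, size_t len)
{
  unsigned char *dst = dstp;
  const unsigned char *src = srcp;

  if (len == 0)
    return dstp;

  /* Unsigned comparison (dst - src >= len): a forward copy is safe.
     The patch branches into memcpy here; a forward byte loop is used
     in this model instead.  */
  if ((uintptr_t) dst - (uintptr_t) src >= len)
    {
      for (size_t i = 0; i < len; i++)
	dst[i] = src[i];
      return dstp;
    }

  /* Backward case: work with the highest index, as mvcrl does.  */
  size_t last = len - 1;
  size_t blocks = last >> 8;		/* Number of full 256-byte blocks.  */
  size_t tail = (last & 0xff) + 1;	/* Size of the block at the end.  */
  size_t off = blocks << 8;		/* Offset of the block at the end.  */

  /* First the "remaining" block at the end of the buffers ...  */
  copy_right_to_left (dst + off, src + off, tail);

  /* ... then 256-byte blocks, moving towards the start.  */
  while (blocks-- > 0)
    {
      off -= 256;
      copy_right_to_left (dst + off, src + off, 256);
    }
  return dstp;
}
```

Calling memmove_model (buf + 3, buf, 700) takes the backward path: one block of 188 bytes at offset 512, then two blocks of 256 bytes. Because the tail block is handled first, every later block starts at a multiple of 256 bytes from the start of the buffers.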
---
 sysdeps/s390/ifunc-memcpy.h              | 23 +++++++++-
 sysdeps/s390/memcpy-z900.S               | 55 ++++++++++++++++++++++++
 sysdeps/s390/memmove.c                   | 16 +++++--
 sysdeps/s390/multiarch/ifunc-impl-list.c |  5 +++
 sysdeps/s390/multiarch/ifunc-resolve.h   |  5 +++
 5 files changed, 99 insertions(+), 5 deletions(-)

diff --git a/sysdeps/s390/ifunc-memcpy.h b/sysdeps/s390/ifunc-memcpy.h
index b83ae73508..1badb30ed8 100644
--- a/sysdeps/s390/ifunc-memcpy.h
+++ b/sysdeps/s390/ifunc-memcpy.h
@@ -44,7 +44,7 @@
 #endif
 
 #if defined SHARED && defined USE_MULTIARCH && IS_IN (libc) \
-  && ! defined HAVE_S390_MIN_Z13_ZARCH_ASM_SUPPORT
+  && ! defined HAVE_S390_MIN_ARCH13_ZARCH_ASM_SUPPORT
 # define HAVE_MEMMOVE_IFUNC 1
 #else
 # define HAVE_MEMMOVE_IFUNC 0
@@ -56,14 +56,27 @@
 # define HAVE_MEMMOVE_IFUNC_AND_VX_SUPPORT 0
 #endif
 
-#if defined HAVE_S390_MIN_Z13_ZARCH_ASM_SUPPORT
+#ifdef HAVE_S390_ARCH13_ASM_SUPPORT
+# define HAVE_MEMMOVE_IFUNC_AND_ARCH13_SUPPORT HAVE_MEMMOVE_IFUNC
+#else
+# define HAVE_MEMMOVE_IFUNC_AND_ARCH13_SUPPORT 0
+#endif
+
+#if defined HAVE_S390_MIN_ARCH13_ZARCH_ASM_SUPPORT
+# define MEMMOVE_DEFAULT MEMMOVE_ARCH13
+# define HAVE_MEMMOVE_C 0
+# define HAVE_MEMMOVE_Z13 0
+# define HAVE_MEMMOVE_ARCH13 1
+#elif defined HAVE_S390_MIN_Z13_ZARCH_ASM_SUPPORT
 # define MEMMOVE_DEFAULT MEMMOVE_Z13
 # define HAVE_MEMMOVE_C 0
 # define HAVE_MEMMOVE_Z13 1
+# define HAVE_MEMMOVE_ARCH13 HAVE_MEMMOVE_IFUNC_AND_ARCH13_SUPPORT
 #else
 # define MEMMOVE_DEFAULT MEMMOVE_C
 # define HAVE_MEMMOVE_C 1
 # define HAVE_MEMMOVE_Z13 HAVE_MEMMOVE_IFUNC_AND_VX_SUPPORT
+# define HAVE_MEMMOVE_ARCH13 HAVE_MEMMOVE_IFUNC_AND_ARCH13_SUPPORT
 #endif
 
 #if HAVE_MEMCPY_Z900_G5
@@ -101,3 +114,9 @@
 #else
 # define MEMMOVE_Z13 NULL
 #endif
+
+#if HAVE_MEMMOVE_ARCH13
+# define MEMMOVE_ARCH13 __memmove_arch13
+#else
+# define MEMMOVE_ARCH13 NULL
+#endif
diff --git a/sysdeps/s390/memcpy-z900.S b/sysdeps/s390/memcpy-z900.S
index 90d5f7becc..307332fcf9 100644
--- a/sysdeps/s390/memcpy-z900.S
+++ b/sysdeps/s390/memcpy-z900.S
@@ -277,6 +277,61 @@ ENTRY(MEMMOVE_Z13)
 END(MEMMOVE_Z13)
 #endif /* HAVE_MEMMOVE_Z13 */
 
+#if HAVE_MEMMOVE_ARCH13
+ENTRY(MEMMOVE_ARCH13)
+	.machine "arch13"
+	.machinemode "zarch_nohighgprs"
+# if ! defined __s390x__
+	/* Note: The 31bit dst and src pointers are prefixed with zeroes. */
+	llgfr %r4,%r4
+	llgfr %r3,%r3
+	llgfr %r2,%r2
+# endif /* ! defined __s390x__ */
+	sgrk %r5,%r2,%r3
+	aghik %r0,%r4,-1 /* Both vstl and mvcrl needs highest index. */
+	clgijh %r4,16,.L_MEMMOVE_ARCH13_LARGE
+.L_MEMMOVE_ARCH13_SMALL:
+	jl .L_MEMMOVE_ARCH13_END /* Return if len was zero (cc of aghik). */
+	/* Store up to 16 bytes with vll/vstl (needs highest index). */
+	vll %v16,%r0,0(%r3)
+	vstl %v16,%r0,0(%r2)
+.L_MEMMOVE_ARCH13_END:
+	br %r14
+.L_MEMMOVE_ARCH13_LARGE:
+	lgr %r1,%r2 /* For memcpy: r1: Use as dest ; r2: Return dest */
+	/* The unsigned comparison (dst - src >= len) determines if we can
+	   execute the forward case with memcpy. */
+#if ! HAVE_MEMCPY_Z196
+# error The arch13 variant of memmove needs the z196 variant of memcpy!
+#endif
+	/* Backward case. */
+	clgrjhe %r5,%r4,.L_Z196_start2
+	clgijh %r0,255,.L_MEMMOVE_ARCH13_LARGER_256B
+	/* Move up to 256bytes with mvcrl (move right to left). */
+	mvcrl 0(%r1),0(%r3) /* Move (r0 + 1) bytes from r3 to r1. */
+	br %r14
+.L_MEMMOVE_ARCH13_LARGER_256B:
+	/* First move the "remaining" block of up to 256 bytes at the end of
+	   src/dst buffers. Then move blocks of 256bytes in a loop starting
+	   with the block at the end.
+	   (If src/dst pointers are aligned e.g. to 256 bytes, then the pointers
+	   passed to mvcrl instructions are aligned, too) */
+	risbgn %r5,%r0,8,128+63,56 /* r5 = r0 / 256 */
+	risbgn %r0,%r0,56,128+63,0 /* r0 = r0 & 0xFF */
+	slgr %r4,%r0
+	lay %r1,-1(%r4,%r1)
+	lay %r3,-1(%r4,%r3)
+	mvcrl 0(%r1),0(%r3) /* Move (r0 + 1) bytes from r3 to r1. */
+	lghi %r0,255 /* Always copy 256 bytes in the loop below! */
+.L_MEMMOVE_ARCH13_LARGE_256B_LOOP:
+	aghi %r1,-256
+	aghi %r3,-256
+	mvcrl 0(%r1),0(%r3) /* Move (r0 + 1) bytes from r3 to r1. */
+	brctg %r5,.L_MEMMOVE_ARCH13_LARGE_256B_LOOP
+	br %r14
+END(MEMMOVE_ARCH13)
+#endif /* HAVE_MEMMOVE_ARCH13 */
+
 #if ! HAVE_MEMCPY_IFUNC
 /* If we don't use ifunc, define an alias for mem[p]cpy here.
    Otherwise see sysdeps/s390/mem[p]cpy.c. */
diff --git a/sysdeps/s390/memmove.c b/sysdeps/s390/memmove.c
index fd4da377a3..fb6b69ae2f 100644
--- a/sysdeps/s390/memmove.c
+++ b/sysdeps/s390/memmove.c
@@ -36,9 +36,19 @@ extern __typeof (__redirect_memmove) MEMMOVE_C attribute_hidden;
 extern __typeof (__redirect_memmove) MEMMOVE_Z13 attribute_hidden;
 # endif
 
+# if HAVE_MEMMOVE_ARCH13
+extern __typeof (__redirect_memmove) MEMMOVE_ARCH13 attribute_hidden;
+# endif
+
 s390_libc_ifunc_expr (__redirect_memmove, memmove,
-		      (HAVE_MEMMOVE_Z13 && (hwcap & HWCAP_S390_VX))
-		      ? MEMMOVE_Z13
-		      : MEMMOVE_DEFAULT
+		      ({
+			s390_libc_ifunc_expr_stfle_init ();
+			(HAVE_MEMMOVE_ARCH13
+			 && S390_IS_ARCH13_MIE3 (stfle_bits))
+			  ? MEMMOVE_ARCH13
+			  : (HAVE_MEMMOVE_Z13 && (hwcap & HWCAP_S390_VX))
+			  ? MEMMOVE_Z13
+			  : MEMMOVE_DEFAULT;
+		      })
 		      )
 #endif
diff --git a/sysdeps/s390/multiarch/ifunc-impl-list.c b/sysdeps/s390/multiarch/ifunc-impl-list.c
index b54c52af36..d742d66a6a 100644
--- a/sysdeps/s390/multiarch/ifunc-impl-list.c
+++ b/sysdeps/s390/multiarch/ifunc-impl-list.c
@@ -169,6 +169,11 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 
 #if HAVE_MEMMOVE_IFUNC
     IFUNC_IMPL (i, name, memmove,
+# if HAVE_MEMMOVE_ARCH13
+		IFUNC_IMPL_ADD (array, i, memmove,
+				S390_IS_ARCH13_MIE3 (stfle_bits),
+				MEMMOVE_ARCH13)
+# endif
 # if HAVE_MEMMOVE_Z13
 		IFUNC_IMPL_ADD (array, i, memmove,
 				dl_hwcap & HWCAP_S390_VX, MEMMOVE_Z13)
diff --git a/sysdeps/s390/multiarch/ifunc-resolve.h b/sysdeps/s390/multiarch/ifunc-resolve.h
index b833dfef28..743de9e591 100644
--- a/sysdeps/s390/multiarch/ifunc-resolve.h
+++ b/sysdeps/s390/multiarch/ifunc-resolve.h
@@ -22,6 +22,11 @@
 
 #define S390_STFLE_BITS_Z10 34 /* General instructions extension */
 #define S390_STFLE_BITS_Z196 45 /* Distinct operands, pop ... */
+#define S390_STFLE_BITS_ARCH13_MIE3 61 /* Miscellaneous-Instruction-Extensions
+					   Facility 3, e.g. mvcrl. */
+
+#define S390_IS_ARCH13_MIE3(STFLE_BITS) \
+  ((STFLE_BITS & (1ULL << (63 - S390_STFLE_BITS_ARCH13_MIE3))) != 0)
 
 #define S390_IS_Z196(STFLE_BITS) \
   ((STFLE_BITS & (1ULL << (63 - S390_STFLE_BITS_Z196))) != 0)
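
The S390_IS_ARCH13_MIE3 test counts facility bits from the most significant end of the first STFLE doubleword, which is why the macro shifts by 63 minus the bit number. A small standalone sketch of the same arithmetic follows. The names has_facility and FACILITY_BIT_MIE3 are hypothetical and exist only for this illustration; the sketch covers facility bits 0..63 only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name; mirrors the value of S390_STFLE_BITS_ARCH13_MIE3.  */
#define FACILITY_BIT_MIE3 61

/* Return true if facility BIT (0..63, counted from the most significant
   bit) is set in the first STFLE doubleword.  */
static bool
has_facility (uint64_t stfle_dw0, unsigned int bit)
{
  return (stfle_dw0 & (1ULL << (63 - bit))) != 0;
}
```

With this numbering, facility bit 61 corresponds to the mask 1ULL << 2, so has_facility (0x4, FACILITY_BIT_MIE3) is true while has_facility (0x8, FACILITY_BIT_MIE3) is false.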