From: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
To: libc-alpha@sourceware.org
Cc: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
Subject: [PATCH] powerpc: Improve strcmp performance for shorter strings
Date: Mon, 6 Feb 2017 11:28:53 +0530
Message-Id: <1486360733-32462-1-git-send-email-raji@linux.vnet.ibm.com>
X-Patchwork-Id: 724362

For strings larger than 16B and smaller than 32B, the existing algorithm
takes more time than the default implementation when the strings are
placed close to the end of a page.  This is due to the byte-by-byte access
used to handle the page cross.  This is improved by following the >32B
code path, where the address is adjusted to aligned memory before doing
the doubleword load, instead of loading bytes.  Tested on powerpc64 and
powerpc64le.
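As a quick illustration of the check the patch tightens, here is a minimal C
sketch of the page-cross test quoted in the assembly comments, with ITER_SIZE
reduced from 32 to 16; may_cross_page and short_path_ok are hypothetical
helper names used only for this sketch, not functions in glibc.

#include <stdint.h>

#define PAGE_SIZE 4096
#define ITER_SIZE 16	/* Bytes read by the short-string fast path.  */

/* Hypothetical helper: nonzero when an ITER_SIZE-byte unaligned load
   starting at S could run past the 4K page containing S.  This mirrors
   (((size_t) s) % PAGE_SIZE > (PAGE_SIZE - ITER_SIZE)) from the
   assembly comment, implemented there with rldicl/cmpldi.  */
static inline int
may_cross_page (const char *s)
{
  return ((uintptr_t) s % PAGE_SIZE) > (PAGE_SIZE - ITER_SIZE);
}

/* The fast path is taken only when neither string can cross a page
   within the first ITER_SIZE bytes; otherwise control falls through to
   L(pagecross_check) in the assembly.  */
static inline int
short_path_ok (const char *s1, const char *s2)
{
  return !may_cross_page (s1) && !may_cross_page (s2);
}

With ITER_SIZE at 16 rather than 32, only the last 15 byte offsets of a page
are diverted to the page-cross path instead of the last 31, which is where the
gain for 16-32 byte strings placed near the end of a page comes from.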
2017-02-04  Rajalakshmi Srinivasaraghavan  <raji@linux.vnet.ibm.com>

	* sysdeps/powerpc/powerpc64/power8/strcmp.S: Adjust address for
	unaligned load for shorter strings.
	* sysdeps/powerpc/powerpc64/power9/strcmp.S: Likewise.
---
 sysdeps/powerpc/powerpc64/power8/strcmp.S | 30 ++++++++----------------------
 sysdeps/powerpc/powerpc64/power9/strcmp.S | 30 ++++++++----------------------
 2 files changed, 16 insertions(+), 44 deletions(-)

diff --git a/sysdeps/powerpc/powerpc64/power8/strcmp.S b/sysdeps/powerpc/powerpc64/power8/strcmp.S
index c34ff4a..d46bff8 100644
--- a/sysdeps/powerpc/powerpc64/power8/strcmp.S
+++ b/sysdeps/powerpc/powerpc64/power8/strcmp.S
@@ -30,21 +30,21 @@ EALIGN (strcmp, 4, 0)
 
 	li	r0,0
 
-	/* Check if [s1]+32 or [s2]+32 will cross a 4K page boundary using
+	/* Check if [s1]+16 or [s2]+16 will cross a 4K page boundary using
 	   the code:
 
 	    (((size_t) s1) % PAGE_SIZE > (PAGE_SIZE - ITER_SIZE))
 
-	   with PAGE_SIZE being 4096 and ITER_SIZE begin 32.  */
+	   with PAGE_SIZE being 4096 and ITER_SIZE begin 16.  */
 
 	rldicl	r7,r3,0,52
 	rldicl	r9,r4,0,52
-	cmpldi	cr7,r7,4096-32
+	cmpldi	cr7,r7,4096-16
 	bgt	cr7,L(pagecross_check)
-	cmpldi	cr5,r9,4096-32
+	cmpldi	cr5,r9,4096-16
 	bgt	cr5,L(pagecross_check)
 
-	/* For short string up to 32 bytes, load both s1 and s2 using
+	/* For short string up to 16 bytes, load both s1 and s2 using
 	   unaligned dwords and compare.  */
 	ld	r8,0(r3)
 	ld	r10,0(r4)
@@ -60,25 +60,11 @@ EALIGN (strcmp, 4, 0)
 	orc.	r9,r12,r11
 	bne	cr0,L(different_nocmpb)
 
-	ld	r8,16(r3)
-	ld	r10,16(r4)
-	cmpb	r12,r8,r0
-	cmpb	r11,r8,r10
-	orc.	r9,r12,r11
-	bne	cr0,L(different_nocmpb)
-
-	ld	r8,24(r3)
-	ld	r10,24(r4)
-	cmpb	r12,r8,r0
-	cmpb	r11,r8,r10
-	orc.	r9,r12,r11
-	bne	cr0,L(different_nocmpb)
-
-	addi	r7,r3,32
-	addi	r4,r4,32
+	addi	r7,r3,16
+	addi	r4,r4,16
 
 L(align_8b):
-	/* Now it has checked for first 32 bytes, align source1 to doubleword
+	/* Now it has checked for first 16 bytes, align source1 to doubleword
 	   and adjust source2 address.  */
 	rldicl	r9,r7,0,61	/* source1 alignment to doubleword  */
 	subf	r4,r9,r4	/* Adjust source2 address based on source1
diff --git a/sysdeps/powerpc/powerpc64/power9/strcmp.S b/sysdeps/powerpc/powerpc64/power9/strcmp.S
index 3e32396..17ec8c2 100644
--- a/sysdeps/powerpc/powerpc64/power9/strcmp.S
+++ b/sysdeps/powerpc/powerpc64/power9/strcmp.S
@@ -65,21 +65,21 @@ EALIGN (strcmp, 4, 0)
 
 	li	r0, 0
 
-	/* Check if [s1]+32 or [s2]+32 will cross a 4K page boundary using
+	/* Check if [s1]+16 or [s2]+16 will cross a 4K page boundary using
 	   the code:
 
 	    (((size_t) s1) % PAGE_SIZE > (PAGE_SIZE - ITER_SIZE))
 
-	   with PAGE_SIZE being 4096 and ITER_SIZE begin 32.  */
+	   with PAGE_SIZE being 4096 and ITER_SIZE begin 16.  */
 
 	rldicl	r7, r3, 0, 52
 	rldicl	r9, r4, 0, 52
-	cmpldi	cr7, r7, 4096-32
+	cmpldi	cr7, r7, 4096-16
 	bgt	cr7, L(pagecross_check)
-	cmpldi	cr5, r9, 4096-32
+	cmpldi	cr5, r9, 4096-16
 	bgt	cr5, L(pagecross_check)
 
-	/* For short strings up to 32 bytes, load both s1 and s2 using
+	/* For short strings up to 16 bytes, load both s1 and s2 using
 	   unaligned dwords and compare.  */
 	ld	r8, 0(r3)
 	ld	r10, 0(r4)
@@ -95,25 +95,11 @@ EALIGN (strcmp, 4, 0)
 	orc.	r9, r12, r11
 	bne	cr0, L(different_nocmpb)
 
-	ld	r8, 16(r3)
-	ld	r10, 16(r4)
-	cmpb	r12, r8, r0
-	cmpb	r11, r8, r10
-	orc.	r9, r12, r11
-	bne	cr0, L(different_nocmpb)
-
-	ld	r8, 24(r3)
-	ld	r10, 24(r4)
-	cmpb	r12, r8, r0
-	cmpb	r11, r8, r10
-	orc.	r9, r12, r11
-	bne	cr0, L(different_nocmpb)
-
-	addi	r7, r3, 32
-	addi	r4, r4, 32
+	addi	r7, r3, 16
+	addi	r4, r4, 16
 
 L(align):
-	/* Now it has checked for first 32 bytes.  */
+	/* Now it has checked for first 16 bytes.  */
 	vspltisb	v0, 0
 	vspltisb	v2, -1
 	lvsr	v6, 0, r4	/* Compute mask.  */
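For readers unfamiliar with the cmpb/orc. idiom that the retained fast path
relies on, a rough C model is given below; cmpb64 and chunk_has_nul_or_diff
are made-up names for illustration only, and the real cmpb is a single POWER
instruction rather than a loop.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Emulate the POWER cmpb instruction: for each byte position, the result
   byte is 0xff when the two input bytes are equal, 0x00 otherwise.  */
static inline uint64_t
cmpb64 (uint64_t a, uint64_t b)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    {
      uint64_t x = (a >> (8 * i)) & 0xff;
      uint64_t y = (b >> (8 * i)) & 0xff;
      if (x == y)
	r |= (uint64_t) 0xff << (8 * i);
    }
  return r;
}

/* Per-doubleword test of the fast path: nonzero means "this 8-byte chunk
   contains a NUL or a mismatch", which is what the cmpb/cmpb/orc. sequence
   computes before branching to L(different_nocmpb).  */
static inline int
chunk_has_nul_or_diff (uint64_t w1, uint64_t w2)
{
  uint64_t has_nul  = cmpb64 (w1, 0);	/* cmpb r12,r8,r0: 0xff where byte == 0.  */
  uint64_t is_equal = cmpb64 (w1, w2);	/* cmpb r11,r8,r10: 0xff where bytes match.  */
  return (has_nul | ~is_equal) != 0;	/* orc. r9,r12,r11  */
}

int
main (void)
{
  uint64_t a, b;
  /* Load 8 bytes of each string unaligned, as the ld instructions do.  */
  memcpy (&a, "abcdefgh", 8);
  memcpy (&b, "abcdefgx", 8);
  printf ("%d\n", chunk_has_nul_or_diff (a, b));	/* Prints 1: last byte differs.  */
  return 0;
}

A nonzero result corresponds to the bne cr0, L(different_nocmpb) branch in the
hunks above: either a NUL terminates the string inside this doubleword or the
two doublewords differ, and the exact byte is resolved in the
L(different_nocmpb) tail.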