{"id":817321,"url":"http://patchwork.ozlabs.org/api/patches/817321/?format=json","web_url":"http://patchwork.ozlabs.org/project/linuxppc-dev/patch/1505950480-14830-2-git-send-email-wei.guo.simon@gmail.com/","project":{"id":2,"url":"http://patchwork.ozlabs.org/api/projects/2/?format=json","name":"Linux PPC development","link_name":"linuxppc-dev","list_id":"linuxppc-dev.lists.ozlabs.org","list_email":"linuxppc-dev@lists.ozlabs.org","web_url":"https://github.com/linuxppc/wiki/wiki","scm_url":"https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git","webscm_url":"https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/","list_archive_url":"https://lore.kernel.org/linuxppc-dev/","list_archive_url_format":"https://lore.kernel.org/linuxppc-dev/{}/","commit_url_format":"https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id={}"},"msgid":"<1505950480-14830-2-git-send-email-wei.guo.simon@gmail.com>","list_archive_url":"https://lore.kernel.org/linuxppc-dev/1505950480-14830-2-git-send-email-wei.guo.simon@gmail.com/","date":"2017-09-20T23:34:38","name":"[v2,1/3] powerpc/64: Align bytes before fall back to .Lshort in powerpc64 memcmp().","commit_ref":null,"pull_url":null,"state":"superseded","archived":true,"hash":"a33f94cfaeffa5127153514689592bed324609c4","submitter":{"id":68632,"url":"http://patchwork.ozlabs.org/api/people/68632/?format=json","name":"Simon Guo","email":"wei.guo.simon@gmail.com"},"delegate":null,"mbox":"http://patchwork.ozlabs.org/project/linuxppc-dev/patch/1505950480-14830-2-git-send-email-wei.guo.simon@gmail.com/mbox/","series":[{"id":4540,"url":"http://patchwork.ozlabs.org/api/series/4540/?format=json","web_url":"http://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=4540","date":"2017-09-20T23:34:38","name":"powerpc/64: memcmp() 
optimization","version":2,"mbox":"http://patchwork.ozlabs.org/series/4540/mbox/"}],"comments":"http://patchwork.ozlabs.org/api/patches/817321/comments/","check":"pending","checks":"http://patchwork.ozlabs.org/api/patches/817321/checks/","tags":{},"related":[],"headers":{"Return-Path":"<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>","X-Original-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Delivered-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Received":["from lists.ozlabs.org (lists.ozlabs.org [103.22.144.68])\n\t(using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xz2Qv3xnlz9sRV\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri, 22 Sep 2017 15:41:27 +1000 (AEST)","from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\tby lists.ozlabs.org (Postfix) with ESMTP id 3xz2Qv2k9GzDsN1\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri, 22 Sep 2017 15:41:27 +1000 (AEST)","from mail-pf0-x241.google.com (mail-pf0-x241.google.com\n\t[IPv6:2607:f8b0:400e:c00::241])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128\n\tbits)) (No client certificate requested)\n\tby lists.ozlabs.org (Postfix) with ESMTPS id 3xz2Mt4DSjzDrJL\n\tfor <linuxppc-dev@lists.ozlabs.org>;\n\tFri, 22 Sep 2017 15:38:50 +1000 (AEST)","by mail-pf0-x241.google.com with SMTP id h4so65827pfk.0\n\tfor <linuxppc-dev@lists.ozlabs.org>;\n\tThu, 21 Sep 2017 22:38:50 -0700 (PDT)","from simonLocalRHEL7.x64 ([112.73.6.48])\n\tby smtp.gmail.com with ESMTPSA id\n\tr12sm6234639pfd.187.2017.09.21.22.38.45\n\t(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\n\tThu, 21 Sep 2017 22:38:48 -0700 (PDT)"],"Authentication-Results":["ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"l/m+3R5a\"; 
dkim-atps=neutral","lists.ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"l/m+3R5a\"; dkim-atps=neutral","ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=gmail.com\n\t(client-ip=2607:f8b0:400e:c00::241; helo=mail-pf0-x241.google.com;\n\tenvelope-from=wei.guo.simon@gmail.com; receiver=<UNKNOWN>)","lists.ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"l/m+3R5a\"; dkim-atps=neutral"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;\n\th=from:to:cc:subject:date:message-id:in-reply-to:references\n\t:mime-version:content-transfer-encoding;\n\tbh=6lXgsb55BH7QEtG975rnFegiAD6Xma1bNAxwWEsa07k=;\n\tb=l/m+3R5aMt+vlxqGD4lSO6dXlTupNVmT359M6xzRolE+dGIe1cGySShsgKiYTjx39X\n\tHQ2saHTpJ7u6K0vT/VjKHsbnbj3hmWbfX03vaiD8hFcLE4nj+chj+Bij962d9Ehg6j4M\n\tQUMlY7oBCJvBqKdgtZSy7Nws+EyE8aOhty2fFUCuqhJRWFSSB3t6rlxQ18gBVKC9QAF/\n\t+T+9hzAYkJAG94lXqnbJj8xK6e6HSJsfeU22qEvvKBg7jEvB1+vqrw0jshRZByqHJ8SI\n\tLp7PR4IwWapRkQ6CAQbtkqj3ff8R5W+lJDVj6/jq81KkJLTZZo/Fw90dhukkkOA5Jw3t\n\tUl1Q==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; 
s=20161025;\n\th=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to\n\t:references:mime-version:content-transfer-encoding;\n\tbh=6lXgsb55BH7QEtG975rnFegiAD6Xma1bNAxwWEsa07k=;\n\tb=e9p/o8fGphiRHdCtsjgCJqO34nAL11nZq9WnjQ4C8OMhNI2J89WhjVQ4kCsxIK52Du\n\tTh3VEzwkkfVYKvJQlEuonLgurSRxEEk69unJZtLtUEMLqd1GEBoEYAsjnrb7jtrhkM1s\n\tLCXAaLSeRFu0A8Vfu6cVmiymRVoVmOlvexLlQD5o1M3Rxa87fn0lJYn30cHBGt0lsdc/\n\tf12zGAInu3ZvoZ/+xsFaj9cMDR5Anj+KNindrRiK6QJxjED2nlyJQhuHlYtcXp6c7pMI\n\t/fT1r3NV00OmCChCunYCAqg2XriZzyOria4I9D6x8NB6kAo/dNASvI9taDbwbrNkRxW3\n\tNnYg==","X-Gm-Message-State":"AHPjjUjXVWan1ilK3zUb2/xQ92tm4FGsbuTbIbQ6pVKKJ+46wrl4+oae\n\t6yX9c0oTy23vAnlvKir8tFMBKQ==","X-Google-Smtp-Source":"AOwi7QASsPdahGb0HCrkp5bOhjQcWD1rEesmxVyNp/qPElQoMQp1ynBGGSkBntz46L4ukH/NfvkjKw==","X-Received":"by 10.101.76.141 with SMTP id m13mr2156798pgt.103.1506058728616; \n\tThu, 21 Sep 2017 22:38:48 -0700 (PDT)","From":"wei.guo.simon@gmail.com","To":"linuxppc-dev@lists.ozlabs.org","Subject":"[PATCH v2 1/3] powerpc/64: Align bytes before fall back to .Lshort\n\tin powerpc64 memcmp().","Date":"Thu, 21 Sep 2017 07:34:38 +0800","Message-Id":"<1505950480-14830-2-git-send-email-wei.guo.simon@gmail.com>","X-Mailer":"git-send-email 1.8.3.1","In-Reply-To":"<1505950480-14830-1-git-send-email-wei.guo.simon@gmail.com>","References":"<1505950480-14830-1-git-send-email-wei.guo.simon@gmail.com>","MIME-Version":"1.0","Content-Type":"text/plain; charset=UTF-8","Content-Transfer-Encoding":"8bit","X-BeenThere":"linuxppc-dev@lists.ozlabs.org","X-Mailman-Version":"2.1.24","Precedence":"list","List-Id":"Linux on PowerPC Developers Mail 
List\n\t<linuxppc-dev.lists.ozlabs.org>","List-Unsubscribe":"<https://lists.ozlabs.org/options/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=unsubscribe>","List-Archive":"<http://lists.ozlabs.org/pipermail/linuxppc-dev/>","List-Post":"<mailto:linuxppc-dev@lists.ozlabs.org>","List-Help":"<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=help>","List-Subscribe":"<https://lists.ozlabs.org/listinfo/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=subscribe>","Cc":"Simon Guo <wei.guo.simon@gmail.com>,\n\tDavid Laight <David.Laight@ACULAB.COM>, \n\t\"Naveen N.  Rao\" <naveen.n.rao@linux.vnet.ibm.com>","Errors-To":"linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org","Sender":"\"Linuxppc-dev\"\n\t<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>"},"content":"From: Simon Guo <wei.guo.simon@gmail.com>\n\nCurrently memcmp() 64bytes version in powerpc will fall back to .Lshort\n(compare per byte mode) if either src or dst address is not 8 bytes\naligned. 
It can be optimized if both addresses have the same offset\nfrom an 8-byte boundary.\n\nmemcmp() can first compare the unaligned bytes up to the 8-byte boundary\nand then compare the remaining 8-byte-aligned content in .Llong mode.\n\nThis patch optimizes memcmp() behavior in this situation.\n\nTest results:\n\n(1) 256 bytes\nTest with the existing tools/testing/selftests/powerpc/stringloops/memcmp:\n- without patch\n      50.996607479 seconds time elapsed                                          ( +- 0.01% )\n- with patch\n      28.033316997 seconds time elapsed                                          ( +- 0.01% )\n\t\t-> There is ~+81% improvement\n\n(2) 32 bytes\nTo observe performance impact on < 32 bytes, modify\ntools/testing/selftests/powerpc/stringloops/memcmp.c with the following:\n-------\n #include <string.h>\n #include \"utils.h\"\n\n-#define SIZE 256\n+#define SIZE 32\n #define ITERATIONS 10000\n\n int test_memcmp(const void *s1, const void *s2, size_t n);\n--------\n\n- Without patch\n       0.392578831 seconds time elapsed                                          ( +- 0.05% )\n- with patch\n       0.358446662 seconds time elapsed                                          ( +- 0.04% )\n\t\t-> There is ~+9% improvement\n\n(3) 0~8 bytes\nTo observe <8 bytes performance impact, modify\ntools/testing/selftests/powerpc/stringloops/memcmp.c with the following:\n-------\n #include <string.h>\n #include \"utils.h\"\n\n-#define SIZE 256\n-#define ITERATIONS 10000\n+#define SIZE 8\n+#define ITERATIONS 1000000\n\n int test_memcmp(const void *s1, const void *s2, size_t n);\n-------\n- Without patch\n       3.168752060 seconds time elapsed                                          ( +- 0.10% )\n- With patch\n       3.153030138 seconds time elapsed                                          ( +- 0.09% )\n\t\t-> They are nearly the same. 
(-0.4%)\n\nSigned-off-by: Simon Guo <wei.guo.simon@gmail.com>\n---\n arch/powerpc/lib/memcmp_64.S | 99 +++++++++++++++++++++++++++++++++++++++++---\n 1 file changed, 93 insertions(+), 6 deletions(-)","diff":"diff --git a/arch/powerpc/lib/memcmp_64.S b/arch/powerpc/lib/memcmp_64.S\nindex d75d18b..6dccfb8 100644\n--- a/arch/powerpc/lib/memcmp_64.S\n+++ b/arch/powerpc/lib/memcmp_64.S\n@@ -24,28 +24,35 @@\n #define rH\tr31\n \n #ifdef __LITTLE_ENDIAN__\n+#define LH\tlhbrx\n+#define LW\tlwbrx\n #define LD\tldbrx\n #else\n+#define LH\tlhzx\n+#define LW\tlwzx\n #define LD\tldx\n #endif\n \n _GLOBAL(memcmp)\n \tcmpdi\tcr1,r5,0\n \n-\t/* Use the short loop if both strings are not 8B aligned */\n-\tor\tr6,r3,r4\n+\t/* Use the short loop if the src/dst addresses are not\n+\t * with the same offset of 8 bytes align boundary.\n+\t */\n+\txor\tr6,r3,r4\n \tandi.\tr6,r6,7\n \n-\t/* Use the short loop if length is less than 32B */\n-\tcmpdi\tcr6,r5,31\n+\t/* fall back to short loop if compare at aligned addrs\n+\t * with less than 8 bytes.\n+\t */\n+\tcmpdi   cr6,r5,7\n \n \tbeq\tcr1,.Lzero\n \tbne\t.Lshort\n-\tbgt\tcr6,.Llong\n+\tbgt\tcr6,.L8bytes_make_align_start\n \n .Lshort:\n \tmtctr\tr5\n-\n 1:\tlbz\trA,0(r3)\n \tlbz\trB,0(r4)\n \tsubf.\trC,rB,rA\n@@ -78,6 +85,78 @@ _GLOBAL(memcmp)\n \tli\tr3,0\n \tblr\n \n+.L8bytes_make_align_start:\n+\t/* attempt to compare bytes not aligned with 8 bytes so that\n+\t * left comparison can run based on 8 bytes alignment.\n+\t */\n+\tandi.   
r6,r3,7\n+\tbeq     .L8bytes_aligned\n+\n+\t/* Try to compare the first double word which is not 8 bytes aligned:\n+\t * load the first double word at (src & ~7UL) and shift left appropriate\n+\t * bits before comparision.\n+\t */\n+\tclrlwi  r6,r3,29\n+\trlwinm  r6,r6,3,0,28\n+\tclrrdi\tr3,r3,3\n+\tclrrdi\tr4,r4,3\n+\tLD\trA,0,r3\n+\tLD\trB,0,r4\n+\tsld\trA,rA,r6\n+\tsld\trB,rB,r6\n+\tcmpld\tcr0,rA,rB\n+\tbne\tcr0,.LcmpAB_lightweight\n+\tsrwi\tr6,r6,3\n+\tsubfic  r6,r6,8\n+\tsubfc.\tr5,r6,r5\n+\tbeq\t.Lzero\n+\taddi\tr3,r3,8\n+\taddi\tr4,r4,8\n+\n+.L8bytes_aligned:\n+\t/* now we are aligned with 8 bytes.\n+\t * Use .Llong loop if left cmp bytes are equal or greater than 32B.\n+\t */\n+\tcmpdi   cr6,r5,31\n+\tbgt\tcr6,.Llong\n+\n+\tcmpdi   cr6,r5,7\n+\tbgt\tcr6,.Lcmp_8bytes_31bytes\n+\n+.Lcmp_rest_lt8bytes:\n+\t/* Here we have only less than 8 bytes to compare with. Addresses\n+\t * are aligned with 8 bytes.\n+\t * The next double words are load and shift right with appropriate\n+\t * bits.\n+\t */\n+\tsubfic  r6,r5,8\n+\trlwinm  r6,r6,3,0,28\n+\tLD\trA,0,r3\n+\tLD\trB,0,r4\n+\tsrd\trA,rA,r6\n+\tsrd\trB,rB,r6\n+\tcmpld\tcr0,rA,rB\n+\tbne\tcr0,.LcmpAB_lightweight\n+\tbeq\t.Lzero\n+\n+.Lcmp_8bytes_31bytes:\n+\t/* compare 8 ~ 31 bytes with 8 bytes aligned */\n+\tsrdi.   r0,r5,3\n+\tclrldi  r5,r5,61\n+\tmtctr   r0\n+831:\n+\tLD\trA,0,r3\n+\tLD\trB,0,r4\n+\tcmpld\tcr0,rA,rB\n+\tbne\tcr0,.LcmpAB_lightweight\n+\taddi\tr3,r3,8\n+\taddi\tr4,r4,8\n+\tbdnz\t831b\n+\n+\tcmpwi   r5,0\n+\tbeq\t.Lzero\n+\tb\t.Lcmp_rest_lt8bytes\n+\n .Lnon_zero:\n \tmr\tr3,rC\n \tblr\n@@ -232,4 +311,12 @@ _GLOBAL(memcmp)\n \tld\tr28,-32(r1)\n \tld\tr27,-40(r1)\n \tblr\n+\n+.LcmpAB_lightweight:   /* skip NV GPRS restore */\n+\tli\tr3,1\n+\tbgt\tcr0,8f\n+\tli\tr3,-1\n+8:\n+\tblr\n+\n EXPORT_SYMBOL(memcmp)\n","prefixes":["v2","1/3"]}