[{"id":1770821,"web_url":"http://patchwork.ozlabs.org/comment/1770821/","msgid":"<063D6719AE5E284EB5DD2968C1650D6DD0079D63@AcuExch.aculab.com>","date":"2017-09-19T10:12:50","subject":"RE: [PATCH v1 1/3] powerpc: Align bytes before fall back to .Lshort\n\tin powerpc memcmp","submitter":{"id":6689,"url":"http://patchwork.ozlabs.org/api/people/6689/","name":"David Laight","email":"David.Laight@ACULAB.COM"},"content":"From: wei.guo.simon@gmail.com\r\n> Sent: 19 September 2017 11:04\r\n> Currently memcmp() in powerpc will fall back to .Lshort (compare per byte\r\n> mode) if either src or dst address is not 8 bytes aligned. It can be\r\n> opmitized if both addresses are with the same offset with 8 bytes boundary.\r\n> \r\n> memcmp() can align the src/dst address with 8 bytes firstly and then\r\n> compare with .Llong mode.\r\n\r\nWhy not mask both addresses with ~7 and mask/shift the read value to ignore\r\nthe unwanted high (BE) or low (LE) bits.\r\n\r\nThe same can be done at the end of the compare with any final, partial word.\r\n\r\n\tDavid","headers":{"Return-Path":"<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>","X-Original-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Delivered-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Received":["from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\t(using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xxJfV5vFZz9ryr\n\tfor <patchwork-incoming@ozlabs.org>;\n\tTue, 19 Sep 2017 20:15:30 +1000 (AEST)","from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\tby lists.ozlabs.org (Postfix) with ESMTP id 3xxJfV4W3zzDqZ5\n\tfor <patchwork-incoming@ozlabs.org>;\n\tTue, 19 Sep 2017 20:15:30 +1000 (AEST)","from smtp-out6.electric.net (smtp-out6.electric.net\n\t[192.162.217.182])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 
(256/256\n\tbits)) (No client certificate requested)\n\tby lists.ozlabs.org (Postfix) with ESMTPS id 3xxJbZ1ky4zDqkd\n\tfor <linuxppc-dev@lists.ozlabs.org>;\n\tTue, 19 Sep 2017 20:12:57 +1000 (AEST)","from 1duFWO-0003aJ-Tk by out6c.electric.net with emc1-ok (Exim\n\t4.87) (envelope-from <David.Laight@ACULAB.COM>)\n\tid 1duFWO-0003e2-VX; Tue, 19 Sep 2017 03:12:52 -0700","by emcmailer; Tue, 19 Sep 2017 03:12:52 -0700","from [156.67.243.126] (helo=AcuExch.aculab.com)\n\tby out6c.electric.net with esmtps (TLSv1:AES128-SHA:128) (Exim 4.87)\n\t(envelope-from <David.Laight@ACULAB.COM>)\n\tid 1duFWO-0003aJ-Tk; Tue, 19 Sep 2017 03:12:52 -0700","from ACUEXCH.Aculab.com ([::1]) by AcuExch.aculab.com ([::1]) with\n\tmapi id 14.03.0123.003; Tue, 19 Sep 2017 11:12:52 +0100"],"Authentication-Results":"ozlabs.org;\n\tspf=softfail (mailfrom) smtp.mailfrom=aculab.com\n\t(client-ip=192.162.217.182; helo=smtp-out6.electric.net;\n\tenvelope-from=david.laight@aculab.com; receiver=<UNKNOWN>)","From":"David Laight <David.Laight@ACULAB.COM>","To":"\"'wei.guo.simon@gmail.com'\" <wei.guo.simon@gmail.com>,\n\t\"linuxppc-dev@lists.ozlabs.org\" <linuxppc-dev@lists.ozlabs.org>","Subject":"RE: [PATCH v1 1/3] powerpc: Align bytes before fall back to .Lshort\n\tin powerpc memcmp","Thread-Topic":"[PATCH v1 1/3] powerpc: Align bytes before fall back to\n\t.Lshort in powerpc memcmp","Thread-Index":"AQHTMS8pq6czDG0Sm0W4f6R53XnyeKK7/JSQ","Date":"Tue, 19 Sep 2017 10:12:50 +0000","Message-ID":"<063D6719AE5E284EB5DD2968C1650D6DD0079D63@AcuExch.aculab.com>","References":"<1505815439-18720-1-git-send-email-wei.guo.simon@gmail.com>\n\t<1505815439-18720-2-git-send-email-wei.guo.simon@gmail.com>","In-Reply-To":"<1505815439-18720-2-git-send-email-wei.guo.simon@gmail.com>","Accept-Language":"en-GB, en-US","Content-Language":"en-US","X-MS-Has-Attach":"","X-MS-TNEF-Correlator":"","x-originating-ip":"[10.202.99.200]","Content-Type":"text/plain; 
charset=\"utf-8\"","Content-Transfer-Encoding":"base64","MIME-Version":"1.0","X-Outbound-IP":"156.67.243.126","X-Env-From":"David.Laight@ACULAB.COM","X-Proto":"esmtps","X-Revdns":"","X-HELO":"AcuExch.aculab.com","X-TLS":"TLSv1:AES128-SHA:128","X-Authenticated_ID":"","X-PolicySMART":"3396946, 3397078","X-Virus-Status":["Scanned by VirusSMART (c)","Scanned by VirusSMART (s)"],"X-BeenThere":"linuxppc-dev@lists.ozlabs.org","X-Mailman-Version":"2.1.24","Precedence":"list","List-Id":"Linux on PowerPC Developers Mail List\n\t<linuxppc-dev.lists.ozlabs.org>","List-Unsubscribe":"<https://lists.ozlabs.org/options/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=unsubscribe>","List-Archive":"<http://lists.ozlabs.org/pipermail/linuxppc-dev/>","List-Post":"<mailto:linuxppc-dev@lists.ozlabs.org>","List-Help":"<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=help>","List-Subscribe":"<https://lists.ozlabs.org/listinfo/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=subscribe>","Cc":"\"Naveen N.  
Rao\" <naveen.n.rao@linux.vnet.ibm.com>","Errors-To":"linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org","Sender":"\"Linuxppc-dev\"\n\t<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>"}},{"id":1770946,"web_url":"http://patchwork.ozlabs.org/comment/1770946/","msgid":"<2355fa90-291c-3244-73b0-2d4cd1ce3d2c@c-s.fr>","date":"2017-09-19T12:20:57","subject":"Re: [PATCH v1 1/3] powerpc: Align bytes before fall back to .Lshort\n\tin powerpc memcmp","submitter":{"id":5234,"url":"http://patchwork.ozlabs.org/api/people/5234/","name":"Christophe Leroy","email":"christophe.leroy@c-s.fr"},"content":"Hi\n\nCould you write powerpc/64 instead of powerpc in the email/patch subject, \nas it doesn't apply to powerpc/32?\n\nOn 19/09/2017 at 12:03, wei.guo.simon@gmail.com wrote:\n> From: Simon Guo <wei.guo.simon@gmail.com>\n> \n> Currently memcmp() in powerpc will fall back to .Lshort (compare per byte\n\nSay powerpc/64 here too.\n\nChristophe\n\n> mode) if either src or dst address is not 8 bytes aligned. 
It can be\n> opmitized if both addresses are with the same offset with 8 bytes boundary.\n> \n> memcmp() can align the src/dst address with 8 bytes firstly and then\n> compare with .Llong mode.\n> \n> This patch optmizes memcmp() behavior in this situation.\n> \n> Test result:\n> \n> (1) 256 bytes\n> Test with the existing tools/testing/selftests/powerpc/stringloops/memcmp:\n> - without patch\n> \t50.715169506 seconds time elapsed                                          ( +-  0.04% )\n> - with patch\n> \t28.906602373 seconds time elapsed                                          ( +-  0.02% )\n> \t\t-> There is ~+75% percent improvement.\n> \n> (2) 32 bytes\n> To observe performance impact on < 32 bytes, modify\n> tools/testing/selftests/powerpc/stringloops/memcmp.c with following:\n> -------\n>   #include <string.h>\n>   #include \"utils.h\"\n> \n> -#define SIZE 256\n> +#define SIZE 32\n>   #define ITERATIONS 10000\n> \n>   int test_memcmp(const void *s1, const void *s2, size_t n);\n> --------\n> \n> - Without patch\n> \t0.390677136 seconds time elapsed                                          ( +-  0.03% )\n> - with patch\n> \t0.375685926 seconds time elapsed                                          ( +-  0.05% )\n> \t\t-> There is ～+4% improvement\n> \n> (3) 0~8 bytes\n> To observe <8 bytes performance impact, modify\n> tools/testing/selftests/powerpc/stringloops/memcmp.c with following:\n> -------\n>   #include <string.h>\n>   #include \"utils.h\"\n> \n> -#define SIZE 256\n> -#define ITERATIONS 10000\n> +#define SIZE 8\n> +#define ITERATIONS 100000\n> \n>   int test_memcmp(const void *s1, const void *s2, size_t n);\n> -------\n> - Without patch\n> \t3.169203981 seconds time elapsed                                          ( +-  0.23% )\n> - With patch\n> \t3.208257362 seconds time elapsed                                          ( +-  0.13% )\n> \t\t-> There is ~ -1% decrease.\n> (I don't know why yet, since there are the same number of instructions\n> in the 
code path for 0~8 bytes memcmp() with/without this patch.  Any\n> comments will be appreciated).\n> \n> Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>\n> ---\n>   arch/powerpc/lib/memcmp_64.S | 86 +++++++++++++++++++++++++++++++++++++++++---\n>   1 file changed, 82 insertions(+), 4 deletions(-)\n> \n> diff --git a/arch/powerpc/lib/memcmp_64.S b/arch/powerpc/lib/memcmp_64.S\n> index d75d18b..6dbafdb 100644\n> --- a/arch/powerpc/lib/memcmp_64.S\n> +++ b/arch/powerpc/lib/memcmp_64.S\n> @@ -24,25 +24,95 @@\n>   #define rH\tr31\n>   \n>   #ifdef __LITTLE_ENDIAN__\n> +#define LH\tlhbrx\n> +#define LW\tlwbrx\n>   #define LD\tldbrx\n>   #else\n> +#define LH\tlhzx\n> +#define LW\tlwzx\n>   #define LD\tldx\n>   #endif\n>   \n>   _GLOBAL(memcmp)\n>   \tcmpdi\tcr1,r5,0\n>   \n> -\t/* Use the short loop if both strings are not 8B aligned */\n> -\tor\tr6,r3,r4\n> +\t/* Use the short loop if the src/dst addresses are not\n> +\t * with the same offset of 8 bytes align boundary.\n> +\t */\n> +\txor\tr6,r3,r4\n>   \tandi.\tr6,r6,7\n>   \n> -\t/* Use the short loop if length is less than 32B */\n> -\tcmpdi\tcr6,r5,31\n> +\t/* fall back to short loop if compare at aligned addrs\n> +\t * with no greater than 8 bytes.\n> +\t */\n> +\tcmpdi   cr6,r5,8\n>   \n>   \tbeq\tcr1,.Lzero\n>   \tbne\t.Lshort\n> +\tble\tcr6,.Lshort\n> +\n> +.Lalignbytes_start:\n> +\t/* The bits 0/1/2 of src/dst addr are the same. 
*/\n> +\tneg\tr0,r3\n> +\tandi.\tr0,r0,7\n> +\tbeq\t.Lalign8bytes\n> +\n> +\tPPC_MTOCRF(1,r0)\n> +\tbf\t31,.Lalign2bytes\n> +\tlbz\trA,0(r3)\n> +\tlbz\trB,0(r4)\n> +\tcmplw\tcr0,rA,rB\n> +\tbne\tcr0,.LcmpAB_lightweight\n> +\taddi\tr3,r3,1\n> +\taddi\tr4,r4,1\n> +\tsubi\tr5,r5,1\n> +.Lalign2bytes:\n> +\tbf\t30,.Lalign4bytes\n> +\tLH\trA,0,r3\n> +\tLH\trB,0,r4\n> +\tcmplw\tcr0,rA,rB\n> +\tbne\tcr0,.LcmpAB_lightweight\n> +\tbne\t.Lnon_zero\n> +\taddi\tr3,r3,2\n> +\taddi\tr4,r4,2\n> +\tsubi\tr5,r5,2\n> +.Lalign4bytes:\n> +\tbf\t29,.Lalign8bytes\n> +\tLW\trA,0,r3\n> +\tLW\trB,0,r4\n> +\tcmpld\tcr0,rA,rB\n> +\tbne\tcr0,.LcmpAB_lightweight\n> +\taddi\tr3,r3,4\n> +\taddi\tr4,r4,4\n> +\tsubi\tr5,r5,4\n> +.Lalign8bytes:\n> +\t/* Now addrs are aligned with 8 bytes. Use the short loop if left\n> +\t * bytes are less than 8B.\n> +\t */\n> +\tcmpdi   cr6,r5,7\n> +\tble\tcr6,.Lshort\n> +\n> +\t/* Use .Llong loop if left cmp bytes are equal or greater than 32B */\n> +\tcmpdi   cr6,r5,31\n>   \tbgt\tcr6,.Llong\n>   \n> +.Lcmploop_8bytes_31bytes:\n> +\t/* handle 8 ~ 31 bytes with 8 bytes aligned addrs */\n> +\tsrdi.   
r0,r5,3\n> +\tclrldi  r5,r5,61\n> +\tmtctr   r0\n> +831:\n> +\tLD\trA,0,r3\n> +\tLD\trB,0,r4\n> +\tcmpld\tcr0,rA,rB\n> +\tbne\tcr0,.LcmpAB_lightweight\n> +\taddi\tr3,r3,8\n> +\taddi\tr4,r4,8\n> +\tbdnz\t831b\n> +\n> +\tcmpwi   r5,0\n> +\tbeq\t.Lzero\n> +\n>   .Lshort:\n>   \tmtctr\tr5\n>   \n> @@ -232,4 +302,12 @@ _GLOBAL(memcmp)\n>   \tld\tr28,-32(r1)\n>   \tld\tr27,-40(r1)\n>   \tblr\n> +\n> +.LcmpAB_lightweight:   /* skip NV GPRS restore */\n> +\tli\tr3,1\n> +\tbgt\tcr0,8f\n> +\tli\tr3,-1\n> +8:\n> +\tblr\n> +\n>   EXPORT_SYMBOL(memcmp)\n>","headers":{"Return-Path":"<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>","X-Original-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Delivered-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Received":["from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\t(using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xxMT23lNjz9rxl\n\tfor <patchwork-incoming@ozlabs.org>;\n\tTue, 19 Sep 2017 22:22:30 +1000 (AEST)","from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\tby lists.ozlabs.org (Postfix) with ESMTP id 3xxMT22bvHzDqY8\n\tfor <patchwork-incoming@ozlabs.org>;\n\tTue, 19 Sep 2017 22:22:30 +1000 (AEST)","from pegase1.c-s.fr (pegase1.c-s.fr [93.17.236.30])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby lists.ozlabs.org (Postfix) with ESMTPS id 3xxMRb4hkvzDq8f\n\tfor <linuxppc-dev@lists.ozlabs.org>;\n\tTue, 19 Sep 2017 22:21:13 +1000 (AEST)","from localhost (mailhub1-int [192.168.12.234])\n\tby localhost (Postfix) with ESMTP id 3xxMR55vNRz9ttBd;\n\tTue, 19 Sep 2017 14:20:49 +0200 (CEST)","from pegase1.c-s.fr ([192.168.12.234])\n\tby localhost (pegase1.c-s.fr [192.168.12.234]) (amavisd-new,\n\tport 10024)\n\twith ESMTP id 52nEtrisqq7r; Tue, 19 Sep 2017 14:20:49 +0200 (CEST)","from 
messagerie.si.c-s.fr (messagerie.si.c-s.fr [192.168.25.192])\n\tby pegase1.c-s.fr (Postfix) with ESMTP id 3xxMR54zC7z9ttBL;\n\tTue, 19 Sep 2017 14:20:49 +0200 (CEST)","from localhost (localhost [127.0.0.1])\n\tby messagerie.si.c-s.fr (Postfix) with ESMTP id E4FB68B820;\n\tTue, 19 Sep 2017 14:20:57 +0200 (CEST)","from messagerie.si.c-s.fr ([127.0.0.1])\n\tby localhost (messagerie.si.c-s.fr [127.0.0.1]) (amavisd-new,\n\tport 10023)\n\twith ESMTP id d_gojcTcIqwE; Tue, 19 Sep 2017 14:20:57 +0200 (CEST)","from PO15451 (po15451.idsi0.si.c-s.fr [172.25.231.1])\n\tby messagerie.si.c-s.fr (Postfix) with ESMTP id A6C618B810;\n\tTue, 19 Sep 2017 14:20:57 +0200 (CEST)"],"Authentication-Results":"ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=c-s.fr\n\t(client-ip=93.17.236.30; helo=pegase1.c-s.fr;\n\tenvelope-from=christophe.leroy@c-s.fr; receiver=<UNKNOWN>)","X-Virus-Scanned":["Debian amavisd-new at c-s.fr","amavisd-new at c-s.fr"],"Subject":"Re: [PATCH v1 1/3] powerpc: Align bytes before fall back to .Lshort\n\tin powerpc memcmp","To":"wei.guo.simon@gmail.com, linuxppc-dev@lists.ozlabs.org","References":"<1505815439-18720-1-git-send-email-wei.guo.simon@gmail.com>\n\t<1505815439-18720-2-git-send-email-wei.guo.simon@gmail.com>","From":"Christophe LEROY <christophe.leroy@c-s.fr>","Message-ID":"<2355fa90-291c-3244-73b0-2d4cd1ce3d2c@c-s.fr>","Date":"Tue, 19 Sep 2017 14:20:57 +0200","User-Agent":"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101\n\tThunderbird/52.3.0","MIME-Version":"1.0","In-Reply-To":"<1505815439-18720-2-git-send-email-wei.guo.simon@gmail.com>","Content-Type":"text/plain; charset=utf-8; format=flowed","Content-Language":"fr","Content-Transfer-Encoding":"8bit","X-BeenThere":"linuxppc-dev@lists.ozlabs.org","X-Mailman-Version":"2.1.24","Precedence":"list","List-Id":"Linux on PowerPC Developers Mail 
List\n\t<linuxppc-dev.lists.ozlabs.org>","List-Unsubscribe":"<https://lists.ozlabs.org/options/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=unsubscribe>","List-Archive":"<http://lists.ozlabs.org/pipermail/linuxppc-dev/>","List-Post":"<mailto:linuxppc-dev@lists.ozlabs.org>","List-Help":"<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=help>","List-Subscribe":"<https://lists.ozlabs.org/listinfo/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=subscribe>","Cc":"\"Naveen N. Rao\" <naveen.n.rao@linux.vnet.ibm.com>","Errors-To":"linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org","Sender":"\"Linuxppc-dev\"\n\t<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>"}},{"id":1771719,"web_url":"http://patchwork.ozlabs.org/comment/1771719/","msgid":"<20170920095635.GA3387@simonLocalRHEL7.x64>","date":"2017-09-20T09:56:35","subject":"Re: [PATCH v1 1/3] powerpc: Align bytes before fall back to .Lshort\n\tin powerpc memcmp","submitter":{"id":68632,"url":"http://patchwork.ozlabs.org/api/people/68632/","name":"Simon Guo","email":"wei.guo.simon@gmail.com"},"content":"On Tue, Sep 19, 2017 at 10:12:50AM +0000, David Laight wrote:\n> From: wei.guo.simon@gmail.com\n> > Sent: 19 September 2017 11:04\n> > Currently memcmp() in powerpc will fall back to .Lshort (compare per byte\n> > mode) if either src or dst address is not 8 bytes aligned. It can be\n> > opmitized if both addresses are with the same offset with 8 bytes boundary.\n> > \n> > memcmp() can align the src/dst address with 8 bytes firstly and then\n> > compare with .Llong mode.\n> \n> Why not mask both addresses with ~7 and mask/shift the read value to ignore\n> the unwanted high (BE) or low (LE) bits.\n> \n> The same can be done at the end of the compare with any final, partial word.\n> \n> \tDavid\n>  \n\nYes. That will be better. A prototyping shows ~5% improvement on 32 bytes \nsize comparison with v1. 
I will rework on v2.\n\nThanks for the suggestion.\n\nBR,\n- Simon","headers":{"Return-Path":"<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>","X-Original-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Delivered-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Received":["from lists.ozlabs.org (lists.ozlabs.org [103.22.144.68])\n\t(using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xxwCw4wDfz9s03\n\tfor <patchwork-incoming@ozlabs.org>;\n\tWed, 20 Sep 2017 19:58:04 +1000 (AEST)","from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\tby lists.ozlabs.org (Postfix) with ESMTP id 3xxwCw3847zDqY3\n\tfor <patchwork-incoming@ozlabs.org>;\n\tWed, 20 Sep 2017 19:58:04 +1000 (AEST)","from mail-pg0-x244.google.com (mail-pg0-x244.google.com\n\t[IPv6:2607:f8b0:400e:c05::244])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128\n\tbits)) (No client certificate requested)\n\tby lists.ozlabs.org (Postfix) with ESMTPS id 3xxwBK5wcMzDqBd\n\tfor <linuxppc-dev@lists.ozlabs.org>;\n\tWed, 20 Sep 2017 19:56:40 +1000 (AEST)","by mail-pg0-x244.google.com with SMTP id i130so1382726pgc.0\n\tfor <linuxppc-dev@lists.ozlabs.org>;\n\tWed, 20 Sep 2017 02:56:40 -0700 (PDT)","from localhost ([112.73.6.48]) by smtp.gmail.com with ESMTPSA id\n\tk73sm7698575pfg.81.2017.09.20.02.56.38\n\t(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\n\tWed, 20 Sep 2017 02:56:38 -0700 (PDT)"],"Authentication-Results":["ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"ccDsSZYM\"; dkim-atps=neutral","lists.ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"ccDsSZYM\"; dkim-atps=neutral","ozlabs.org;\n\tspf=pass 
(mailfrom) smtp.mailfrom=gmail.com\n\t(client-ip=2607:f8b0:400e:c05::244; helo=mail-pg0-x244.google.com;\n\tenvelope-from=wei.guo.simon@gmail.com; receiver=<UNKNOWN>)","lists.ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"ccDsSZYM\"; dkim-atps=neutral"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;\n\th=date:from:to:cc:subject:message-id:references:mime-version\n\t:content-disposition:in-reply-to:user-agent;\n\tbh=0CDEsWTg/vxoNtekgOSWbaqkxQaYBuQ3qPGdZ4/FVIU=;\n\tb=ccDsSZYML8QynDN5vEcCMsMIKwUW0dPWc3LNIU3G6VDsdSFKS2TDKhljP9xl2B5cpF\n\t5ENKSyt/XIliAz+BzY2fImXh0KnMyMtvHrSwUT2eewsIHNvBmTDRn6W6nlBTJxuC3XuP\n\tOwGAWpYVPcPBQ7vP8BpYfKR4S2AVy2RBxAuhZPrPHinAzwz57qyA7QOV9bwzvidh9DCU\n\tLaYaRWKMRayGL8U9b3Hd/Fwp2ukxQKdz0EX+wfhYTMypACIe484yWVQR5hjkZyn/4Zky\n\t8Jgyk3vRcDAexvnOxrrl4kObKvO5INC0DOoFmKPNeArkZc5k3537Mhevk5yp+kgdNulF\n\tGwkA==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:date:from:to:cc:subject:message-id:references\n\t:mime-version:content-disposition:in-reply-to:user-agent;\n\tbh=0CDEsWTg/vxoNtekgOSWbaqkxQaYBuQ3qPGdZ4/FVIU=;\n\tb=QvVO8174LubYi02NYXYHSWnGenQUh018ZH0gYdEu2N8lkSK9V3HO4pNpQ6lz/yxDYk\n\t9+x73IIYmHXpOOx9PkcfCTPzGSoHjp07ZO5ynRGlCuvfzSjuUS3nKJunZcKPAaVbkCUk\n\tFHF82TQB6NxH8bkQzyPGE1ESL/OxUfJY3cufCYJpJf54ddgLUN1WM1CYL0CPkDewKuTR\n\tmgWssv73Aze6A0DUNtzDy/7qHlZWbuBLp5t9HtPJR1sKA5LI4IEND1aHG1wZnrtu969X\n\tGD31/uJBvgZxfzVshFQG9tkvKRhkUnvXZ5Ub26ZBETL4fGcPhtFQ6D4tBx2tb9WTouZb\n\t0GAA==","X-Gm-Message-State":"AHPjjUgz/UWt1LZe6oUmfx1G5Ram+P3BGvEBPIE94IkeaARoWyNIHtPx\n\txep4ygCgSnJr5QB7RspGoN4=","X-Google-Smtp-Source":"AOwi7QAZwVI5g4Ae9EsO+LYGUIeCxp3+lbRskJn1vBd6k0ksYEYM5anIELgLBVAISuYkyfiSlWuW9w==","X-Received":"by 10.84.247.8 with SMTP id n8mr1555472pll.318.1505901399234;\n\tWed, 20 Sep 2017 02:56:39 -0700 (PDT)","Date":"Wed, 20 Sep 2017 17:56:35 +0800","From":"Simon Guo 
<wei.guo.simon@gmail.com>","To":"David Laight <David.Laight@ACULAB.COM>","Subject":"Re: [PATCH v1 1/3] powerpc: Align bytes before fall back to .Lshort\n\tin powerpc memcmp","Message-ID":"<20170920095635.GA3387@simonLocalRHEL7.x64>","References":"<1505815439-18720-1-git-send-email-wei.guo.simon@gmail.com>\n\t<1505815439-18720-2-git-send-email-wei.guo.simon@gmail.com>\n\t<063D6719AE5E284EB5DD2968C1650D6DD0079D63@AcuExch.aculab.com>","MIME-Version":"1.0","Content-Type":"text/plain; charset=us-ascii","Content-Disposition":"inline","In-Reply-To":"<063D6719AE5E284EB5DD2968C1650D6DD0079D63@AcuExch.aculab.com>","User-Agent":"Mutt/1.5.21 (2010-09-15)","X-BeenThere":"linuxppc-dev@lists.ozlabs.org","X-Mailman-Version":"2.1.24","Precedence":"list","List-Id":"Linux on PowerPC Developers Mail List\n\t<linuxppc-dev.lists.ozlabs.org>","List-Unsubscribe":"<https://lists.ozlabs.org/options/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=unsubscribe>","List-Archive":"<http://lists.ozlabs.org/pipermail/linuxppc-dev/>","List-Post":"<mailto:linuxppc-dev@lists.ozlabs.org>","List-Help":"<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=help>","List-Subscribe":"<https://lists.ozlabs.org/listinfo/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=subscribe>","Cc":"\"Naveen N.  
Rao\" <naveen.n.rao@linux.vnet.ibm.com>,\n\t\"linuxppc-dev@lists.ozlabs.org\" <linuxppc-dev@lists.ozlabs.org>","Errors-To":"linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org","Sender":"\"Linuxppc-dev\"\n\t<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>"}},{"id":1771722,"web_url":"http://patchwork.ozlabs.org/comment/1771722/","msgid":"<063D6719AE5E284EB5DD2968C1650D6DD007B037@AcuExch.aculab.com>","date":"2017-09-20T10:05:49","subject":"RE: [PATCH v1 1/3] powerpc: Align bytes before fall back to .Lshort\n\tin powerpc memcmp","submitter":{"id":6689,"url":"http://patchwork.ozlabs.org/api/people/6689/","name":"David Laight","email":"David.Laight@ACULAB.COM"},"content":"From: Simon Guo\n> Sent: 20 September 2017 10:57\n> On Tue, Sep 19, 2017 at 10:12:50AM +0000, David Laight wrote:\n> > From: wei.guo.simon@gmail.com\n> > > Sent: 19 September 2017 11:04\n> > > Currently memcmp() in powerpc will fall back to .Lshort (compare per byte\n> > > mode) if either src or dst address is not 8 bytes aligned. It can be\n> > > opmitized if both addresses are with the same offset with 8 bytes boundary.\n> > >\n> > > memcmp() can align the src/dst address with 8 bytes firstly and then\n> > > compare with .Llong mode.\n> >\n> > Why not mask both addresses with ~7 and mask/shift the read value to ignore\n> > the unwanted high (BE) or low (LE) bits.\n> >\n> > The same can be done at the end of the compare with any final, partial word.\n> \n> Yes. That will be better. A prototyping shows ~5% improvement on 32 bytes\n> size comparison with v1. 
I will rework on v2.\n\nClearly you have to be careful to return the correct +1/-1 on mismatch.\n\nFor systems that can do misaligned transfers you can compare the first\nword, then compare aligned words and finally the last word.\nRather like a memcpy() function I wrote (for NetBSD) that copied\nthe last word first, then a whole number of words aligned at the start.\n(Hope no one expected anything special for overlapping copies.)\n\n\tDavid","headers":{"Return-Path":"<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>","X-Original-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Delivered-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Received":["from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\t(using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xxwQD1pnPz9sPs\n\tfor <patchwork-incoming@ozlabs.org>;\n\tWed, 20 Sep 2017 20:07:00 +1000 (AEST)","from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\tby lists.ozlabs.org (Postfix) with ESMTP id 3xxwQD0jskzDqn6\n\tfor <patchwork-incoming@ozlabs.org>;\n\tWed, 20 Sep 2017 20:07:00 +1000 (AEST)","from smtp-out4.electric.net (smtp-out4.electric.net\n\t[192.162.216.194])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256\n\tbits)) (No client certificate requested)\n\tby lists.ozlabs.org (Postfix) with ESMTPS id 3xxwP029XxzDqBd\n\tfor <linuxppc-dev@lists.ozlabs.org>;\n\tWed, 20 Sep 2017 20:05:54 +1000 (AEST)","from 1dubt7-0001qj-VU by out4c.electric.net with emc1-ok (Exim\n\t4.87) (envelope-from <David.Laight@ACULAB.COM>)\n\tid 1dubt8-0001ud-UI; Wed, 20 Sep 2017 03:05:50 -0700","by emcmailer; Wed, 20 Sep 2017 03:05:50 -0700","from [156.67.243.126] (helo=AcuExch.aculab.com)\n\tby out4c.electric.net with esmtps (TLSv1:AES128-SHA:128) (Exim 4.87)\n\t(envelope-from <David.Laight@ACULAB.COM>)\n\tid 1dubt7-0001qj-VU; Wed, 20 Sep 
2017 03:05:49 -0700","from ACUEXCH.Aculab.com ([::1]) by AcuExch.aculab.com ([::1]) with\n\tmapi id 14.03.0123.003; Wed, 20 Sep 2017 11:05:50 +0100"],"Authentication-Results":"ozlabs.org;\n\tspf=softfail (mailfrom) smtp.mailfrom=aculab.com\n\t(client-ip=192.162.216.194; helo=smtp-out4.electric.net;\n\tenvelope-from=david.laight@aculab.com; receiver=<UNKNOWN>)","From":"David Laight <David.Laight@ACULAB.COM>","To":"'Simon Guo' <wei.guo.simon@gmail.com>","Subject":"RE: [PATCH v1 1/3] powerpc: Align bytes before fall back to .Lshort\n\tin powerpc memcmp","Thread-Topic":"[PATCH v1 1/3] powerpc: Align bytes before fall back to\n\t.Lshort in powerpc memcmp","Thread-Index":"AQHTMS8pq6czDG0Sm0W4f6R53XnyeKK7/JSQgAF9xoCAABHXEA==","Date":"Wed, 20 Sep 2017 10:05:49 +0000","Message-ID":"<063D6719AE5E284EB5DD2968C1650D6DD007B037@AcuExch.aculab.com>","References":"<1505815439-18720-1-git-send-email-wei.guo.simon@gmail.com>\n\t<1505815439-18720-2-git-send-email-wei.guo.simon@gmail.com>\n\t<063D6719AE5E284EB5DD2968C1650D6DD0079D63@AcuExch.aculab.com>\n\t<20170920095635.GA3387@simonLocalRHEL7.x64>","In-Reply-To":"<20170920095635.GA3387@simonLocalRHEL7.x64>","Accept-Language":"en-GB, en-US","Content-Language":"en-US","X-MS-Has-Attach":"","X-MS-TNEF-Correlator":"","x-originating-ip":"[10.202.99.200]","Content-Type":"text/plain; charset=\"Windows-1252\"","Content-Transfer-Encoding":"quoted-printable","MIME-Version":"1.0","X-Outbound-IP":"156.67.243.126","X-Env-From":"David.Laight@ACULAB.COM","X-Proto":"esmtps","X-Revdns":"","X-HELO":"AcuExch.aculab.com","X-TLS":"TLSv1:AES128-SHA:128","X-Authenticated_ID":"","X-PolicySMART":"3396946, 3397078","X-Virus-Status":["Scanned by VirusSMART (c)","Scanned by VirusSMART (s)"],"X-BeenThere":"linuxppc-dev@lists.ozlabs.org","X-Mailman-Version":"2.1.24","Precedence":"list","List-Id":"Linux on PowerPC Developers Mail 
List\n\t<linuxppc-dev.lists.ozlabs.org>","List-Unsubscribe":"<https://lists.ozlabs.org/options/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=unsubscribe>","List-Archive":"<http://lists.ozlabs.org/pipermail/linuxppc-dev/>","List-Post":"<mailto:linuxppc-dev@lists.ozlabs.org>","List-Help":"<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=help>","List-Subscribe":"<https://lists.ozlabs.org/listinfo/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=subscribe>","Cc":"\"Naveen N.  Rao\" <naveen.n.rao@linux.vnet.ibm.com>,\n\t\"linuxppc-dev@lists.ozlabs.org\" <linuxppc-dev@lists.ozlabs.org>","Errors-To":"linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org","Sender":"\"Linuxppc-dev\"\n\t<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>"}}]