{"id":817322,"url":"http://patchwork.ozlabs.org/api/patches/817322/?format=json","web_url":"http://patchwork.ozlabs.org/project/linuxppc-dev/patch/1505950480-14830-3-git-send-email-wei.guo.simon@gmail.com/","project":{"id":2,"url":"http://patchwork.ozlabs.org/api/projects/2/?format=json","name":"Linux PPC development","link_name":"linuxppc-dev","list_id":"linuxppc-dev.lists.ozlabs.org","list_email":"linuxppc-dev@lists.ozlabs.org","web_url":"https://github.com/linuxppc/wiki/wiki","scm_url":"https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git","webscm_url":"https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/","list_archive_url":"https://lore.kernel.org/linuxppc-dev/","list_archive_url_format":"https://lore.kernel.org/linuxppc-dev/{}/","commit_url_format":"https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id={}"},"msgid":"<1505950480-14830-3-git-send-email-wei.guo.simon@gmail.com>","list_archive_url":"https://lore.kernel.org/linuxppc-dev/1505950480-14830-3-git-send-email-wei.guo.simon@gmail.com/","date":"2017-09-20T23:34:39","name":"[v2,2/3] powerpc/64: enhance memcmp() with VMX instruction for long bytes comparision","commit_ref":null,"pull_url":null,"state":"superseded","archived":true,"hash":"8452b4f4059370944a05d975c399c87f8ca40d23","submitter":{"id":68632,"url":"http://patchwork.ozlabs.org/api/people/68632/?format=json","name":"Simon Guo","email":"wei.guo.simon@gmail.com"},"delegate":null,"mbox":"http://patchwork.ozlabs.org/project/linuxppc-dev/patch/1505950480-14830-3-git-send-email-wei.guo.simon@gmail.com/mbox/","series":[{"id":4540,"url":"http://patchwork.ozlabs.org/api/series/4540/?format=json","web_url":"http://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=4540","date":"2017-09-20T23:34:38","name":"powerpc/64: memcmp() 
optimization","version":2,"mbox":"http://patchwork.ozlabs.org/series/4540/mbox/"}],"comments":"http://patchwork.ozlabs.org/api/patches/817322/comments/","check":"pending","checks":"http://patchwork.ozlabs.org/api/patches/817322/checks/","tags":{},"related":[],"headers":{"Return-Path":"<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>","X-Original-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Delivered-To":["patchwork-incoming@ozlabs.org","linuxppc-dev@lists.ozlabs.org"],"Received":["from lists.ozlabs.org (lists.ozlabs.org [103.22.144.68])\n\t(using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xz2TB4vvsz9sRW\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri, 22 Sep 2017 15:43:26 +1000 (AEST)","from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\tby lists.ozlabs.org (Postfix) with ESMTP id 3xz2TB3nPwzDsMB\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri, 22 Sep 2017 15:43:26 +1000 (AEST)","from mail-pg0-x244.google.com (mail-pg0-x244.google.com\n\t[IPv6:2607:f8b0:400e:c05::244])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128\n\tbits)) (No client certificate requested)\n\tby lists.ozlabs.org (Postfix) with ESMTPS id 3xz2My0QsVzDsN0\n\tfor <linuxppc-dev@lists.ozlabs.org>;\n\tFri, 22 Sep 2017 15:38:53 +1000 (AEST)","by mail-pg0-x244.google.com with SMTP id i130so86197pgc.0\n\tfor <linuxppc-dev@lists.ozlabs.org>;\n\tThu, 21 Sep 2017 22:38:53 -0700 (PDT)","from simonLocalRHEL7.x64 ([112.73.6.48])\n\tby smtp.gmail.com with ESMTPSA id\n\tr12sm6234639pfd.187.2017.09.21.22.38.48\n\t(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\n\tThu, 21 Sep 2017 22:38:51 -0700 (PDT)"],"Authentication-Results":["ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"R4Nn8pPh\"; 
dkim-atps=neutral","lists.ozlabs.org;\n\tdkim=fail reason=\"signature verification failed\" (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"R4Nn8pPh\"; dkim-atps=neutral","ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=gmail.com\n\t(client-ip=2607:f8b0:400e:c05::244; helo=mail-pg0-x244.google.com;\n\tenvelope-from=wei.guo.simon@gmail.com; receiver=<UNKNOWN>)","lists.ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=gmail.com header.i=@gmail.com\n\theader.b=\"R4Nn8pPh\"; dkim-atps=neutral"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;\n\th=from:to:cc:subject:date:message-id:in-reply-to:references;\n\tbh=RqGkfPYpj0WDlieHPH/XpVhhwhDfY/eypKzSzxABFys=;\n\tb=R4Nn8pPhsvrD6jZgKzjM8I0mocb83ZP4/R1yT5Egbo5jgPEHkrpKENumX99PvfkvTN\n\t/4c+eLFwnSYOnyTx7exKMdMDZ3am6khR95xL+JIMiDiqIYgLaT6Z+ExhhjmV84RS2tzG\n\tg1n9xBTQBpnIGEtwPWzvmD6GmBlF8TBojnLrLIej4E2Q22xVrnEtNqzcWyYlkuKAHwRt\n\tlTTl4pVxMnM6o0xC/uyW1dB0cawbrpHICtIFXN5H4sIm34tVgz7oCezimr2oJ6yDrbsi\n\t7OgmfEpjjq1eR5gwPC6GuX9FxdAJjiUsvn6BVfp8nz1HEhWLtGFi7f2lf8ZrIBU6b03Z\n\tHknQ==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to\n\t:references;\n\tbh=RqGkfPYpj0WDlieHPH/XpVhhwhDfY/eypKzSzxABFys=;\n\tb=hCAeVP4py/xf0WBD04yRqblivA0OuuHyStsES/RtFWWgxFnH5o//l0nM5fDvkTmnex\n\tgLlJzwux4n1Y1sZMKGsarKvCvpDyQrO343fiV5wh4HcKfw1bo008DwB585dduL1hy5H8\n\tiPaZy0R0kFjLOGhJKZVIdyj+zNQ8NFTbR3YzxNcYZhJMfPMOnP9S0rlNyRRiqkaT0wfc\n\tuyzO3nITID0RkS1aqIc9Czj/lWixT46kDbIj9sA9GTtETI+ibqlG1h4OIWn5nzBeEn/y\n\tRCs+Wmq7qG2qcmK/VGfOzqae2eK0k0D8OFqu3LN209Q9xnekEenLE//a4kfc2agHj88q\n\t0C3Q==","X-Gm-Message-State":"AHPjjUhMZGvD+47V0rmF4Al0pprBvEUz/BFKpgIyM8BWBK5Z70/ueGN7\n\t+N+4+JGvWjk1VP9JgJKOyUtM3Q==","X-Google-Smtp-Source":"AOwi7QBpfaqxxLjBmwZ5t7AmjVM1lBsg+aXfN1x/t2vtYz/dHVA2LKU9e3rtUUShXd2h4Qao9YZhVA==","X-Received":"by 10.99.188.25 with SMTP id 
q25mr6076227pge.54.1506058731717;\n\tThu, 21 Sep 2017 22:38:51 -0700 (PDT)","From":"wei.guo.simon@gmail.com","To":"linuxppc-dev@lists.ozlabs.org","Subject":"[PATCH v2 2/3] powerpc/64: enhance memcmp() with VMX instruction for\n\tlong bytes comparision","Date":"Thu, 21 Sep 2017 07:34:39 +0800","Message-Id":"<1505950480-14830-3-git-send-email-wei.guo.simon@gmail.com>","X-Mailer":"git-send-email 1.8.3.1","In-Reply-To":"<1505950480-14830-1-git-send-email-wei.guo.simon@gmail.com>","References":"<1505950480-14830-1-git-send-email-wei.guo.simon@gmail.com>","X-BeenThere":"linuxppc-dev@lists.ozlabs.org","X-Mailman-Version":"2.1.24","Precedence":"list","List-Id":"Linux on PowerPC Developers Mail List\n\t<linuxppc-dev.lists.ozlabs.org>","List-Unsubscribe":"<https://lists.ozlabs.org/options/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=unsubscribe>","List-Archive":"<http://lists.ozlabs.org/pipermail/linuxppc-dev/>","List-Post":"<mailto:linuxppc-dev@lists.ozlabs.org>","List-Help":"<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=help>","List-Subscribe":"<https://lists.ozlabs.org/listinfo/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=subscribe>","Cc":"Simon Guo <wei.guo.simon@gmail.com>,\n\tDavid Laight <David.Laight@ACULAB.COM>, \n\t\"Naveen N.  
Rao\" <naveen.n.rao@linux.vnet.ibm.com>","Errors-To":"linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org","Sender":"\"Linuxppc-dev\"\n\t<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>"},"content":"From: Simon Guo <wei.guo.simon@gmail.com>\n\nThis patch adds VMX primitives to do memcmp() when the compare size\nexceeds 4K bytes.\n\nTest results with the following test program (replace the \"^>\" with \"\"):\n------\n># cat tools/testing/selftests/powerpc/stringloops/memcmp.c\n>#include <malloc.h>\n>#include <stdlib.h>\n>#include <string.h>\n>#include <time.h>\n>#include \"utils.h\"\n>#define SIZE (1024 * 1024 * 900)\n>#define ITERATIONS 40\n\nint test_memcmp(const void *s1, const void *s2, size_t n);\n\nstatic int testcase(void)\n{\n        char *s1;\n        char *s2;\n        unsigned long i;\n\n        s1 = memalign(128, SIZE);\n        if (!s1) {\n                perror(\"memalign\");\n                exit(1);\n        }\n\n        s2 = memalign(128, SIZE);\n        if (!s2) {\n                perror(\"memalign\");\n                exit(1);\n        }\n\n        for (i = 0; i < SIZE; i++)  {\n                s1[i] = i & 0xff;\n                s2[i] = i & 0xff;\n        }\n        for (i = 0; i < ITERATIONS; i++) {\n\t\tint ret = test_memcmp(s1, s2, SIZE);\n\n\t\tif (ret) {\n\t\t\tprintf(\"return %d at[%ld]! 
should have returned zero\\n\", ret, i);\n\t\t\tabort();\n\t\t}\n\t}\n\n        return 0;\n}\n\nint main(void)\n{\n        return test_harness(testcase, \"memcmp\");\n}\n------\nWithout VMX patch:\n       7.435191479 seconds time elapsed                                          ( +- 0.51% )\nWith VMX patch:\n       6.802038938 seconds time elapsed                                          ( +- 0.56% )\n\t\tThere is a ~8% improvement.\n\nHowever, I am not aware of any use case in the kernel for memcmp() on\nsuch large sizes yet.\n\nSigned-off-by: Simon Guo <wei.guo.simon@gmail.com>\n---\n arch/powerpc/include/asm/asm-prototypes.h |  2 +-\n arch/powerpc/lib/copypage_power7.S        |  2 +-\n arch/powerpc/lib/memcmp_64.S              | 82 +++++++++++++++++++++++++++++++\n arch/powerpc/lib/memcpy_power7.S          |  2 +-\n arch/powerpc/lib/vmx-helper.c             |  2 +-\n 5 files changed, 86 insertions(+), 4 deletions(-)","diff":"diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h\nindex 7330150..e6530d8 100644\n--- a/arch/powerpc/include/asm/asm-prototypes.h\n+++ b/arch/powerpc/include/asm/asm-prototypes.h\n@@ -49,7 +49,7 @@ void __trace_hcall_exit(long opcode, unsigned long retval,\n /* VMX copying */\n int enter_vmx_usercopy(void);\n int exit_vmx_usercopy(void);\n-int enter_vmx_copy(void);\n+int enter_vmx_ops(void);\n void * exit_vmx_copy(void *dest);\n \n /* Traps */\ndiff --git a/arch/powerpc/lib/copypage_power7.S b/arch/powerpc/lib/copypage_power7.S\nindex ca5fc8f..9e7729e 100644\n--- a/arch/powerpc/lib/copypage_power7.S\n+++ b/arch/powerpc/lib/copypage_power7.S\n@@ -60,7 +60,7 @@ _GLOBAL(copypage_power7)\n \tstd\tr4,-STACKFRAMESIZE+STK_REG(R30)(r1)\n \tstd\tr0,16(r1)\n \tstdu\tr1,-STACKFRAMESIZE(r1)\n-\tbl\tenter_vmx_copy\n+\tbl\tenter_vmx_ops\n \tcmpwi\tr3,0\n \tld\tr0,STACKFRAMESIZE+16(r1)\n \tld\tr3,STK_REG(R31)(r1)\ndiff --git a/arch/powerpc/lib/memcmp_64.S b/arch/powerpc/lib/memcmp_64.S\nindex 6dccfb8..40218fc 
100644\n--- a/arch/powerpc/lib/memcmp_64.S\n+++ b/arch/powerpc/lib/memcmp_64.S\n@@ -162,6 +162,13 @@ _GLOBAL(memcmp)\n \tblr\n \n .Llong:\n+#ifdef CONFIG_ALTIVEC\n+\t/* Try to use vmx loop if length is larger than 4K */\n+\tcmpldi  cr6,r5,4096\n+\tbgt\tcr6,.Lvmx_cmp\n+\n+.Llong_novmx_cmp:\n+#endif\n \tli\toff8,8\n \tli\toff16,16\n \tli\toff24,24\n@@ -319,4 +326,79 @@ _GLOBAL(memcmp)\n 8:\n \tblr\n \n+#ifdef CONFIG_ALTIVEC\n+.Lvmx_cmp:\n+\tmflr    r0\n+\tstd     r3,-STACKFRAMESIZE+STK_REG(R31)(r1)\n+\tstd     r4,-STACKFRAMESIZE+STK_REG(R30)(r1)\n+\tstd     r5,-STACKFRAMESIZE+STK_REG(R29)(r1)\n+\tstd     r0,16(r1)\n+\tstdu    r1,-STACKFRAMESIZE(r1)\n+\tbl      enter_vmx_ops\n+\tcmpwi   cr1,r3,0\n+\tld      r0,STACKFRAMESIZE+16(r1)\n+\tld      r3,STK_REG(R31)(r1)\n+\tld      r4,STK_REG(R30)(r1)\n+\tld      r5,STK_REG(R29)(r1)\n+\taddi\tr1,r1,STACKFRAMESIZE\n+\tmtlr    r0\n+\tbeq     cr1,.Llong_novmx_cmp\n+\n+3:\n+\t/* Enter with src/dst address 8 bytes aligned, and len is\n+\t * no less than 4KB. Need to align with 16 bytes further.\n+\t */\n+\tandi.\trA,r3,8\n+\tbeq\t4f\n+\tLD\trA,0,r3\n+\tLD\trB,0,r4\n+\tcmpld\tcr0,rA,rB\n+\tbne\tcr0,.LcmpAB_lightweight\n+\n+\taddi\tr3,r3,8\n+\taddi\tr4,r4,8\n+\taddi\tr5,r5,-8\n+\n+4:\n+\t/* compare 32 bytes for each loop */\n+\tsrdi\tr0,r5,5\n+\tmtctr\tr0\n+\tandi.\tr5,r5,31\n+\tli\toff16,16\n+\n+.balign 16\n+5:\n+\tlvx \tv0,0,r3\n+\tlvx \tv1,0,r4\n+\tvcmpequd. v0,v0,v1\n+\tbf\t24,7f\n+\tlvx \tv0,off16,r3\n+\tlvx \tv1,off16,r4\n+\tvcmpequd. 
v0,v0,v1\n+\tbf\t24,6f\n+\taddi\tr3,r3,32\n+\taddi\tr4,r4,32\n+\tbdnz\t5b\n+\n+\tcmpdi\tr5,0\n+\tbeq\t.Lzero\n+\tb\t.L8bytes_aligned\n+\n+6:\n+\taddi\tr3,r3,16\n+\taddi\tr4,r4,16\n+\n+7:\n+\tLD\trA,0,r3\n+\tLD\trB,0,r4\n+\tcmpld\tcr0,rA,rB\n+\tbne\tcr0,.LcmpAB_lightweight\n+\n+\tli\toff8,8\n+\tLD\trA,off8,r3\n+\tLD\trB,off8,r4\n+\tcmpld\tcr0,rA,rB\n+\tbne\tcr0,.LcmpAB_lightweight\n+\tb\t.Lzero\n+#endif\n EXPORT_SYMBOL(memcmp)\ndiff --git a/arch/powerpc/lib/memcpy_power7.S b/arch/powerpc/lib/memcpy_power7.S\nindex 193909a..682e386 100644\n--- a/arch/powerpc/lib/memcpy_power7.S\n+++ b/arch/powerpc/lib/memcpy_power7.S\n@@ -230,7 +230,7 @@ _GLOBAL(memcpy_power7)\n \tstd\tr5,-STACKFRAMESIZE+STK_REG(R29)(r1)\n \tstd\tr0,16(r1)\n \tstdu\tr1,-STACKFRAMESIZE(r1)\n-\tbl\tenter_vmx_copy\n+\tbl\tenter_vmx_ops\n \tcmpwi\tcr1,r3,0\n \tld\tr0,STACKFRAMESIZE+16(r1)\n \tld\tr3,STK_REG(R31)(r1)\ndiff --git a/arch/powerpc/lib/vmx-helper.c b/arch/powerpc/lib/vmx-helper.c\nindex bf925cd..923a9ab 100644\n--- a/arch/powerpc/lib/vmx-helper.c\n+++ b/arch/powerpc/lib/vmx-helper.c\n@@ -53,7 +53,7 @@ int exit_vmx_usercopy(void)\n \treturn 0;\n }\n \n-int enter_vmx_copy(void)\n+int enter_vmx_ops(void)\n {\n \tif (in_interrupt())\n \t\treturn 0;\n","prefixes":["v2","2/3"]}