Patch Detail
get:
Show a patch.
patch:
Partially update a patch (only the fields supplied are changed).
put:
Update a patch.
GET /api/patches/231/?format=api
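The JSON body returned by the request above can be consumed directly with any JSON parser; a minimal sketch that pulls a few fields out of an abridged copy of the response below (the real payload carries many more fields, including headers, content, and the diff):

```python
import json

# Abridged copy of the response body shown below; not the full payload.
body = """{
  "id": 231,
  "name": "64 bit csum_partial_copy_generic",
  "state": "changes-requested",
  "archived": true,
  "submitter": {"id": 131, "email": "jschopp@austin.ibm.com"}
}"""

patch = json.loads(body)
print(patch["id"], patch["state"])   # 231 changes-requested
print(patch["submitter"]["email"])   # jschopp@austin.ibm.com
```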
{ "id": 231, "url": "http://patchwork.ozlabs.org/api/patches/231/?format=api", "web_url": "http://patchwork.ozlabs.org/project/linuxppc-dev/patch/alpine.LFD.1.10.0809101502580.14991@localhost.localdomain/", "project": { "id": 2, "url": "http://patchwork.ozlabs.org/api/projects/2/?format=api", "name": "Linux PPC development", "link_name": "linuxppc-dev", "list_id": "linuxppc-dev.lists.ozlabs.org", "list_email": "linuxppc-dev@lists.ozlabs.org", "web_url": "https://github.com/linuxppc/wiki/wiki", "scm_url": "https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git", "webscm_url": "https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/", "list_archive_url": "https://lore.kernel.org/linuxppc-dev/", "list_archive_url_format": "https://lore.kernel.org/linuxppc-dev/{}/", "commit_url_format": "https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id={}" }, "msgid": "<alpine.LFD.1.10.0809101502580.14991@localhost.localdomain>", "list_archive_url": "https://lore.kernel.org/linuxppc-dev/alpine.LFD.1.10.0809101502580.14991@localhost.localdomain/", "date": "2008-09-10T20:15:23", "name": "64 bit csum_partial_copy_generic", "commit_ref": null, "pull_url": null, "state": "changes-requested", "archived": true, "hash": "267e6cca25bf8af9ab50d4c2c0219f14aacf93bc", "submitter": { "id": 131, "url": "http://patchwork.ozlabs.org/api/people/131/?format=api", "name": null, "email": "jschopp@austin.ibm.com" }, "delegate": { "id": 13, "url": "http://patchwork.ozlabs.org/api/users/13/?format=api", "username": "paulus", "first_name": "Paul", "last_name": "Mackerras", "email": "paulus@samba.org" }, "mbox": "http://patchwork.ozlabs.org/project/linuxppc-dev/patch/alpine.LFD.1.10.0809101502580.14991@localhost.localdomain/mbox/", "series": [], "comments": "http://patchwork.ozlabs.org/api/patches/231/comments/", "check": "pending", "checks": "http://patchwork.ozlabs.org/api/patches/231/checks/", "tags": {}, "related": [], "headers": { "Return-Path": 
"<linuxppc-dev-bounces+patchwork=ozlabs.org@ozlabs.org>", "X-Original-To": [ "patchwork@ozlabs.org", "linuxppc-dev@ozlabs.org" ], "Delivered-To": [ "patchwork@ozlabs.org", "linuxppc-dev@ozlabs.org" ], "Received": [ "from ozlabs.org (localhost [127.0.0.1])\n\tby ozlabs.org (Postfix) with ESMTP id 93307DE2FB\n\tfor <patchwork@ozlabs.org>; Thu, 11 Sep 2008 06:16:05 +1000 (EST)", "from e6.ny.us.ibm.com (e6.ny.us.ibm.com [32.97.182.146])\n\t(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))\n\t(Client CN \"e6.ny.us.ibm.com\", Issuer \"Equifax\" (verified OK))\n\tby ozlabs.org (Postfix) with ESMTPS id B0027DDF5D\n\tfor <linuxppc-dev@ozlabs.org>; Thu, 11 Sep 2008 06:15:45 +1000 (EST)", "from d01relay04.pok.ibm.com (d01relay04.pok.ibm.com [9.56.227.236])\n\tby e6.ny.us.ibm.com (8.13.8/8.13.8) with ESMTP id m8AKIAwq023117\n\tfor <linuxppc-dev@ozlabs.org>; Wed, 10 Sep 2008 16:18:10 -0400", "from d01av02.pok.ibm.com (d01av02.pok.ibm.com [9.56.224.216])\n\tby d01relay04.pok.ibm.com (8.13.8/8.13.8/NCO v9.1) with ESMTP id\n\tm8AKFO0h198042\n\tfor <linuxppc-dev@ozlabs.org>; Wed, 10 Sep 2008 16:15:24 -0400", "from d01av02.pok.ibm.com (loopback [127.0.0.1])\n\tby d01av02.pok.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id\n\tm8AKFORX024566\n\tfor <linuxppc-dev@ozlabs.org>; Wed, 10 Sep 2008 16:15:24 -0400", "from localhost.localdomain (gamma.ltc.austin.ibm.com [9.3.190.168])\n\tby d01av02.pok.ibm.com (8.12.11.20060308/8.12.11) with ESMTP id\n\tm8AKFNGW024510\n\t(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);\n\tWed, 10 Sep 2008 16:15:24 -0400", "from localhost.localdomain (gamma [127.0.0.1])\n\tby localhost.localdomain (8.14.1/8.14.1) with ESMTP id m8AKFN3n015021;\n\tWed, 10 Sep 2008 15:15:23 -0500", "from localhost (jschopp@localhost)\n\tby localhost.localdomain (8.14.1/8.14.1/Submit) with ESMTP id\n\tm8AKFNka015016; Wed, 10 Sep 2008 15:15:23 -0500" ], "Date": "Wed, 10 Sep 2008 15:15:23 -0500 (CDT)", "From": "jschopp@austin.ibm.com", "To": 
"linuxppc-dev@ozlabs.org", "Subject": "[PATCH/RFC] 64 bit csum_partial_copy_generic", "Message-ID": "<alpine.LFD.1.10.0809101502580.14991@localhost.localdomain>", "User-Agent": "Alpine 1.10 (LFD 962 2008-03-14)", "MIME-Version": "1.0", "Cc": "paulus@samba.org, anton@samba.org", "X-BeenThere": "linuxppc-dev@ozlabs.org", "X-Mailman-Version": "2.1.11", "Precedence": "list", "List-Id": "Linux on PowerPC Developers Mail List <linuxppc-dev.ozlabs.org>", "List-Unsubscribe": "<https://ozlabs.org/mailman/options/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@ozlabs.org?subject=unsubscribe>", "List-Archive": "<http://ozlabs.org/pipermail/linuxppc-dev>", "List-Post": "<mailto:linuxppc-dev@ozlabs.org>", "List-Help": "<mailto:linuxppc-dev-request@ozlabs.org?subject=help>", "List-Subscribe": "<https://ozlabs.org/mailman/listinfo/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@ozlabs.org?subject=subscribe>", "Content-Transfer-Encoding": "7bit", "Content-Type": "text/plain; charset=\"us-ascii\"; Format=\"flowed\"", "Sender": "linuxppc-dev-bounces+patchwork=ozlabs.org@ozlabs.org", "Errors-To": "linuxppc-dev-bounces+patchwork=ozlabs.org@ozlabs.org" }, "content": "The current 64 bit csum_partial_copy_generic function is based on the 32 \nbit version and never was optimized for 64 bit. This patch takes the 64 bit \nmemcpy and adapts it to also do the sum. It has been tested on a variety \nof input sizes and alignments on Power5 and Power6 processors. It gives \ncorrect output for all cases tested. 
It also runs 20-55% faster \nthan the implemention it replaces depending on size, alignment, and processor.\n\nI think there is still some room for improvement in the unaligned case, \nbut given that it is much faster than what we have now I figured I'd send \nit out.\n\nSigned-off-by: Joel Schopp<jschopp@austin.ibm.com>", "diff": "Index: 2.6.26/arch/powerpc/lib/checksum_64.S\n===================================================================\n--- 2.6.26.orig/arch/powerpc/lib/checksum_64.S\n+++ 2.6.26/arch/powerpc/lib/checksum_64.S\n@@ -22,8 +22,7 @@\n * len is in words and is always >= 5.\n *\n * In practice len == 5, but this is not guaranteed. So this code does not\n- * attempt to use doubleword instructions.\n- */\n+ * attempt to use doubleword instructions. */\n _GLOBAL(ip_fast_csum)\n \tlwz\tr0,0(r3)\n \tlwzu\tr5,4(r3)\n@@ -122,108 +121,286 @@ _GLOBAL(csum_partial)\n * to *src_err or *dst_err respectively, and (for an error on\n * src) zeroes the rest of dst.\n *\n- * This code needs to be reworked to take advantage of 64 bit sum+copy.\n- * However, due to tokenring halfword alignment problems this will be very\n- * tricky. 
For now we'll leave it until we instrument it somehow.\n+ * This returns a 32 bit 1s complement sum that can be folded to 16 bits and\n+ * notted to produce a 16bit tcp/ip checksum.\n *\n * csum_partial_copy_generic(r3=src, r4=dst, r5=len, r6=sum, r7=src_err, r8=dst_err)\n */\n _GLOBAL(csum_partial_copy_generic)\n-\taddic\tr0,r6,0\n-\tsubi\tr3,r3,4\n-\tsubi\tr4,r4,4\n-\tsrwi.\tr6,r5,2\n-\tbeq\t3f\t\t/* if we're doing < 4 bytes */\n-\tandi.\tr9,r4,2\t\t/* Align dst to longword boundary */\n-\tbeq+\t1f\n-81:\tlhz\tr6,4(r3)\t/* do 2 bytes to get aligned */\n-\taddi\tr3,r3,2\n-\tsubi\tr5,r5,2\n-91:\tsth\tr6,4(r4)\n-\taddi\tr4,r4,2\n-\taddc\tr0,r0,r6\n-\tsrwi.\tr6,r5,2\t\t/* # words to do */\n+\tstd\tr7,48(r1)\t/* we need to save the error pointers ...*/\n+\tstd\tr8,56(r1)\t/* we need to save the error pointers ...*/\n+\tPPC_MTOCRF\t0x01,r5\n+\tcmpldi\tcr1,r5,16\n+\tneg\tr11,r4\t\t# LS 3 bits = # bytes to 8-byte dest bdry\n+\tandi.\tr11,r11,7\n+\tdcbt\t0,r3\n+\tblt\tcr1,.Lshort_copy\n+\tbne\t.Ldst_unaligned\n+.Ldst_aligned:\n+\tandi.\tr0,r3,7\n+\taddi\tr4,r4,-16\n+\tbne\t.Lsrc_unaligned\n+\tsrdi\tr10,r5,4\t\t/* src and dst aligned */\n+80:\tld\tr9,0(r3)\n+\taddi\tr3,r3,-8\n+\tmtctr\tr10\n+\tandi.\tr5,r5,7\n+\tbf\tcr7*4+0,2f\n+\taddi\tr4,r4,8\n+\taddi\tr3,r3,8\n+\tmr\tr12,r9\n+\tblt\tcr1,3f\n+1:\n+81:\tld\tr9,8(r3)\n+82:\tstd\tr12,8(r4)\n+\tadde\tr6,r6,r12 \t/* add to checksum */\n+2:\n+83:\tldu\tr12,16(r3)\n+84:\tstdu\tr9,16(r4)\n+\tadde\tr6,r6,r9 \t/* add to checksum */\n+\tbdnz\t1b\n+3:\n+85:\tstd\tr12,8(r4)\n+\tadde\tr6,r6,r12 \t/* add to checksum */\n \tbeq\t3f\n-1:\tmtctr\tr6\n-82:\tlwzu\tr6,4(r3)\t/* the bdnz has zero overhead, so it should */\n-92:\tstwu\tr6,4(r4)\t/* be unnecessary to unroll this loop */\n-\tadde\tr0,r0,r6\n-\tbdnz\t82b\n-\tandi.\tr5,r5,3\n-3:\tcmpwi\t0,r5,2\n-\tblt+\t4f\n-83:\tlhz\tr6,4(r3)\n+\taddi\tr4,r4,16\n+\tld\tr9,8(r3)\n+.Ldo_tail:\n+\tbf\tcr7*4+1,1f\n+\trotldi\tr9,r9,32\n+86:\tstw\tr9,0(r4)\n+\tadde\tr6,r6,r9 \t/* add to checksum 
*/\n+\taddi\tr4,r4,4\n+1:\tbf\tcr7*4+2,2f\n+\trotldi\tr9,r9,16\n+87:\tsth\tr9,0(r4)\n+\tadde\tr6,r6,r9 \t/* add to checksum */\n+\taddi\tr4,r4,2\n+2:\tbf\tcr7*4+3,3f\n+\trotldi\tr9,r9,8\n+88:\tstb\tr9,0(r4)\n+\tadde\tr6,r6,r9 \t/* add to checksum */\n+3:\taddze\tr6,r6\t\t/* add in final carry (unlikely with 64-bit regs) */\n+ rldicl r9,r6,32,0 /* fold 64 bit value */\n+ add r3,r9,r6\n+ srdi r3,r3,32\n+\tblr\t\t\t/* return sum */\n+\n+.Lsrc_unaligned:\n+\tsrdi\tr11,r5,3\n+\taddi\tr5,r5,-16\n+\tsubf\tr3,r0,r3\n+\tsrdi\tr7,r5,4\n+\tsldi\tr10,r0,3\n+\tcmpdi\tcr6,r11,3\n+\tandi.\tr5,r5,7\n+\tmtctr\tr7\n+\tsubfic\tr12,r10,64\n+\tadd\tr5,r5,r0\n+\n+\tbt\tcr7*4+0,0f\n+\n+115:\tld\tr9,0(r3)\t# 3+2n loads, 2+2n stores\n+116:\tld\tr0,8(r3)\n+\tsld\tr11,r9,r10\n+117:\tldu\tr9,16(r3)\n+\tsrd\tr7,r0,r12\n+\tsld\tr8,r0,r10\n+\tor\tr7,r7,r11\n+\tblt\tcr6,4f\n+118:\tld\tr0,8(r3)\n+\t# s1<< in r8, d0=(s0<<|s1>>) in r7, s3 in r0, s2 in r9, nix in r11 & r12\n+\tb\t2f\n+\n+0:\n+113:\tld\tr0,0(r3)\t# 4+2n loads, 3+2n stores\n+114:\tldu\tr9,8(r3)\n+\tsld\tr8,r0,r10\n+\taddi\tr4,r4,-8\n+\tblt\tcr6,5f\n+119:\tld\tr0,8(r3)\n+\tmr\tr7,r12\t\t/* need more registers */\n+\tsrd\tr12,r9,r12\n+\tsld\tr11,r9,r10\n+120:\tldu\tr9,16(r3)\n+\tor\tr12,r8,r12\n+\tsrd\tr7,r0,r7\t/* lost value but can recreate from r10 */\n+\tsld\tr8,r0,r10\n+\taddi\tr4,r4,16\n+\tbeq\tcr6,3f\n+\n+\t# d0=(s0<<|s1>>) in r12, s1<< in r11, s2>> in r7, s2<< in r8, s3 in r9\n+1:\tor\tr7,r7,r11\n+89:\tld\tr0,8(r3)\n+90:\tstd\tr12,8(r4)\n+\tadde\tr6,r6,r12 \t/* add to checksum */\n+2:\tsubfic\tr12,r10,64\t/* recreate value from r10 */\n+\tsrd\tr12,r9,r12\n+\tsld\tr11,r9,r10\n+91:\tldu\tr9,16(r3)\n+\tor\tr12,r8,r12\n+92:\tstdu\tr7,16(r4)\n+\tadde\tr6,r6,r7 \t/* add to checksum */\n+\tsubfic\tr7,r10,64\t/* recreate value from r10 */\n+\tsrd\tr7,r0,r7\n+\tsld\tr8,r0,r10\n+\tbdnz\t1b\n+\n+3:\n+93:\tstd\tr12,8(r4)\n+\tadde\tr6,r6,r12 \t/* add to checksum */\n+\tor\tr7,r7,r11\n+4:\n+94:\tstd\tr7,16(r4)\n+\tadde\tr6,r6,r7 \t/* add to 
checksum */\n+5:\tsubfic\tr12,r10,64\t/* recreate value from r10 */\n+\tsrd\tr12,r9,r12\n+\tor\tr12,r8,r12\n+95:\tstd\tr12,24(r4)\n+\tadde\tr6,r6,r12 \t/* add to checksum */\n+\tbeq\t4f\n+\tcmpwi\tcr1,r5,8\n+\taddi\tr4,r4,32\n+\tsld\tr9,r9,r10\n+\tble\tcr1,.Ldo_tail\n+96:\tld\tr0,8(r3)\n+\tsrd\tr7,r0,r12\n+\tor\tr9,r7,r9\n+\tb\t.Ldo_tail\n+\n+.Ldst_unaligned:\n+\tPPC_MTOCRF\t0x01,r11\t\t# put #bytes to 8B bdry into cr7\n+\tsubf\tr5,r11,r5\n+\tli\tr10,0\n+\tcmpldi\tr1,r5,16\n+\tbf\tcr7*4+3,1f\n+97:\tlbz\tr0,0(r3)\n+98:\tstb\tr0,0(r4)\n+\tadde\tr6,r6,r0 \t/* add to checksum */\n+\taddi\tr10,r10,1\n+1:\tbf\tcr7*4+2,2f\n+99:\tlhzx\tr0,r10,r3\n+100:\tsthx\tr0,r10,r4\n+\tadde\tr6,r6,r0 \t/* add to checksum */\n+\taddi\tr10,r10,2\n+2:\tbf\tcr7*4+1,3f\n+101:\tlwzx\tr0,r10,r3\n+102:\tstwx\tr0,r10,r4\n+\tadde\tr6,r6,r0 \t/* add to checksum */\n+3:\tPPC_MTOCRF\t0x01,r5\n+\tadd\tr3,r11,r3\n+\tadd\tr4,r11,r4\n+\tb\t.Ldst_aligned\n+\n+.Lshort_copy:\n+\tbf\tcr7*4+0,1f\n+103:\tlwz\tr0,0(r3)\n+104:\tlwz\tr9,4(r3)\n+ \taddi\tr3,r3,8\n+105:\tstw\tr0,0(r4)\n+106:\tstw\tr9,4(r4)\n+\tadde\tr6,r6,r0\n+\tadde\tr6,r6,r9\n+\taddi\tr4,r4,8\n+1:\tbf\tcr7*4+1,2f\n+107:\tlwz\tr0,0(r3)\n+\taddi\tr3,r3,4\n+108:\tstw\tr0,0(r4)\n+\tadde\tr6,r6,r0\n+\taddi\tr4,r4,4\n+2:\tbf\tcr7*4+2,3f\n+109:\tlhz\tr0,0(r3)\n \taddi\tr3,r3,2\n-\tsubi\tr5,r5,2\n-93:\tsth\tr6,4(r4)\n+110:\tsth\tr0,0(r4)\n+\tadde\tr6,r6,r0\n \taddi\tr4,r4,2\n-\tadde\tr0,r0,r6\n-4:\tcmpwi\t0,r5,1\n-\tbne+\t5f\n-84:\tlbz\tr6,4(r3)\n-94:\tstb\tr6,4(r4)\n-\tslwi\tr6,r6,8\t\t/* Upper byte of word */\n-\tadde\tr0,r0,r6\n-5:\taddze\tr3,r0\t\t/* add in final carry (unlikely with 64-bit regs) */\n- rldicl r4,r3,32,0 /* fold 64 bit value */\n- add r3,r4,r3\n+3:\tbf\tcr7*4+3,4f\n+111:\tlbz\tr0,0(r3)\n+112:\tstb\tr0,0(r4)\n+\tadde\tr6,r6,r0\n+4:\taddze\tr6,r6\t\t/* add in final carry (unlikely with 64-bit regs) */\n+ rldicl r9,r6,32,0 /* fold 64 bit value */\n+ add r3,r9,r6\n srdi r3,r3,32\n-\tblr\n+\tblr\t\t\t/* return dest pointer */\n\n /* 
These shouldn't go in the fixup section, since that would\n cause the ex_table addresses to get out of order. */\n\n-\t.globl src_error_1\n-src_error_1:\n-\tli\tr6,0\n-\tsubi\tr5,r5,2\n-95:\tsth\tr6,4(r4)\n-\taddi\tr4,r4,2\n-\tsrwi.\tr6,r5,2\n-\tbeq\t3f\n-\tmtctr\tr6\n-\t.globl src_error_2\n-src_error_2:\n-\tli\tr6,0\n-96:\tstwu\tr6,4(r4)\n-\tbdnz\t96b\n-3:\tandi.\tr5,r5,3\n-\tbeq\tsrc_error\n-\t.globl src_error_3\n-src_error_3:\n-\tli\tr6,0\n-\tmtctr\tr5\n-\taddi\tr4,r4,3\n-97:\tstbu\tr6,1(r4)\n-\tbdnz\t97b\n+/* Load store exception handlers */\n \t.globl src_error\n src_error:\n-\tcmpdi\t0,r7,0\n+\tld\tr7,48(r1)\t/* restore src_error */\n+\n+\tli\tr11,0\n+\tmtctr\tr5\t\t/* Non-optimized zero out we will hopefully...*/\n+113:\tstbu\tr11,1(r4)\t\t/* never hit. */\n+\tbdnz\t113b\n+\tcmpdi\t0,r7,0\t\t/* if it isn't NULL write EFAULT into it */\n \tbeq\t1f\n-\tli\tr6,-EFAULT\n-\tstw\tr6,0(r7)\n-1:\taddze\tr3,r0\n+\tli\tr11,-EFAULT\n+\tstw\tr11,0(r7)\n+1:\taddze\tr3,r6\t\t/* add any carry */\n \tblr\n\n \t.globl dst_error\n dst_error:\n+\tld\tr8,56(r1)\t/* restore dst_error */\n \tcmpdi\t0,r8,0\n \tbeq\t1f\n-\tli\tr6,-EFAULT\n-\tstw\tr6,0(r8)\n-1:\taddze\tr3,r0\n+\tli\tr11,-EFAULT\n+\tstw\tr11,0(r8)\n+1:\taddze\tr3,r6\t\t/* add any carry */\n \tblr\n\n+\t.globl dst_error\n+\n .section __ex_table,\"a\"\n \t.align 3\n-\t.llong\t81b,src_error_1\n-\t.llong\t91b,dst_error\n-\t.llong\t82b,src_error_2\n-\t.llong\t92b,dst_error\n-\t.llong\t83b,src_error_3\n-\t.llong\t93b,dst_error\n-\t.llong\t84b,src_error_3\n-\t.llong\t94b,dst_error\n-\t.llong\t95b,dst_error\n-\t.llong\t96b,dst_error\n-\t.llong\t97b,dst_error\n+\t/* labels 80-120 are for load/stores that we have\n+\t * to catch exceptions and handle them\n+\t */\n+\t/*\n+\n+\t*/\n+\t.llong\t80b,src_error\n+\t.llong\t81b,src_error\n+\t.llong 82b,dst_error\n+\t.llong\t83b,src_error\n+\t.llong 84b,dst_error\n+\t.llong 85b,dst_error\n+\t.llong 86b,dst_error\n+\t.llong 87b,dst_error\n+\t.llong 
88b,dst_error\n+\t.llong\t115b,src_error\n+\t.llong\t116b,src_error\n+\t.llong\t117b,src_error\n+\t.llong\t118b,src_error\n+\t.llong\t113b,src_error\n+\t.llong\t114b,src_error\n+\t.llong\t119b,src_error\n+\t.llong\t120b,src_error\n+\t.llong 90b,dst_error\n+\t.llong\t91b,src_error\n+\t.llong 92b,dst_error\n+\t.llong 93b,dst_error\n+\t.llong 94b,dst_error\n+\t.llong 95b,dst_error\n+\t.llong\t96b,src_error\n+\t.llong\t97b,src_error\n+\t.llong 98b,dst_error\n+\t.llong\t99b,src_error\n+\t.llong 100b,dst_error\n+\t.llong\t101b,src_error\n+\t.llong 102b,dst_error\n+\t.llong\t103b,src_error\n+\t.llong\t104b,src_error\n+\t.llong 105b,dst_error\n+\t.llong 106b,dst_error\n+\t.llong\t107b,src_error\n+\t.llong 108b,dst_error\n+\t.llong\t109b,src_error\n+\t.llong 110b,dst_error\n+\t.llong\t111b,src_error\n+\t.llong 112b,dst_error\n+\t.llong 113b,dst_error\n", "prefixes": [] }
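The "patch" operation listed at the top (partial update) can be sketched with the standard library alone. This builds the request but does not send it; the token value is hypothetical, and the assumption here is that the server accepts Patchwork's usual "Authorization: Token &lt;key&gt;" header for authenticated writes:

```python
import json
import urllib.request

# Partial update of patch 231's state. The request object is constructed but
# not submitted; the API token below is a placeholder, not a real credential.
req = urllib.request.Request(
    "http://patchwork.ozlabs.org/api/patches/231/",
    data=json.dumps({"state": "accepted"}).encode("utf-8"),
    headers={
        "Authorization": "Token 0123456789abcdef",  # hypothetical token
        "Content-Type": "application/json",
    },
    method="PATCH",
)
# urllib.request.urlopen(req) would actually submit the update.
print(req.get_method(), req.full_url)  # PATCH http://patchwork.ozlabs.org/api/patches/231/
```

A PUT to the same URL would replace the resource rather than merge fields, which is why the API exposes both verbs.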