Cover Letter Detail
Show a single cover letter, identified by its numeric ID; the response includes the project, submitter, series, mail headers, and full message content.
GET /api/covers/811769/?format=api
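For illustration, here is a minimal sketch of issuing this request with Python's requests library. This is an assumption of convenience (any HTTP client works) and it targets the public patchwork.ozlabs.org instance shown in the response below.

import requests

# Sketch only: fetch the cover-letter resource documented below.
# Assumes the public patchwork.ozlabs.org instance and the third-party
# `requests` package; the Accept header asks the API for JSON output.
url = "http://patchwork.ozlabs.org/api/covers/811769/"
resp = requests.get(url, headers={"Accept": "application/json"}, timeout=30)
resp.raise_for_status()           # raise on HTTP errors (4xx/5xx)
cover = resp.json()               # parsed form of the response shown below
print(cover["name"])              # "[v3,00/20] Speculative page faults"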
{ "id": 811769, "url": "http://patchwork.ozlabs.org/api/covers/811769/?format=api", "web_url": "http://patchwork.ozlabs.org/project/linuxppc-dev/cover/1504894024-2750-1-git-send-email-ldufour@linux.vnet.ibm.com/", "project": { "id": 2, "url": "http://patchwork.ozlabs.org/api/projects/2/?format=api", "name": "Linux PPC development", "link_name": "linuxppc-dev", "list_id": "linuxppc-dev.lists.ozlabs.org", "list_email": "linuxppc-dev@lists.ozlabs.org", "web_url": "https://github.com/linuxppc/wiki/wiki", "scm_url": "https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git", "webscm_url": "https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/", "list_archive_url": "https://lore.kernel.org/linuxppc-dev/", "list_archive_url_format": "https://lore.kernel.org/linuxppc-dev/{}/", "commit_url_format": "https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id={}" }, "msgid": "<1504894024-2750-1-git-send-email-ldufour@linux.vnet.ibm.com>", "list_archive_url": "https://lore.kernel.org/linuxppc-dev/1504894024-2750-1-git-send-email-ldufour@linux.vnet.ibm.com/", "date": "2017-09-08T18:06:44", "name": "[v3,00/20] Speculative page faults", "submitter": { "id": 40248, "url": "http://patchwork.ozlabs.org/api/people/40248/?format=api", "name": "Laurent Dufour", "email": "ldufour@linux.vnet.ibm.com" }, "mbox": "http://patchwork.ozlabs.org/project/linuxppc-dev/cover/1504894024-2750-1-git-send-email-ldufour@linux.vnet.ibm.com/mbox/", "series": [ { "id": 2269, "url": "http://patchwork.ozlabs.org/api/series/2269/?format=api", "web_url": "http://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=2269", "date": "2017-09-08T18:06:44", "name": "Speculative page faults", "version": 3, "mbox": "http://patchwork.ozlabs.org/series/2269/mbox/" } ], "comments": "http://patchwork.ozlabs.org/api/covers/811769/comments/", "headers": { "Return-Path": "<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>", "X-Original-To": [ "patchwork-incoming@ozlabs.org", "linuxppc-dev@lists.ozlabs.org" ], "Delivered-To": [ "patchwork-incoming@ozlabs.org", "linuxppc-dev@lists.ozlabs.org" ], "Received": [ "from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\t(using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xplk83WZ0z9s8J\n\tfor <patchwork-incoming@ozlabs.org>;\n\tSat, 9 Sep 2017 04:10:56 +1000 (AEST)", "from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3])\n\tby lists.ozlabs.org (Postfix) with ESMTP id 3xplk82QyzzDqF4\n\tfor <patchwork-incoming@ozlabs.org>;\n\tSat, 9 Sep 2017 04:10:56 +1000 (AEST)", "from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com\n\t[148.163.156.1])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256\n\tbits)) (No client certificate requested)\n\tby lists.ozlabs.org (Postfix) with ESMTPS id 3xpldy6DsGzDrVj\n\tfor <linuxppc-dev@lists.ozlabs.org>;\n\tSat, 9 Sep 2017 04:07:18 +1000 (AEST)", "from pps.filterd (m0098409.ppops.net [127.0.0.1])\n\tby mx0a-001b2d01.pphosted.com (8.16.0.21/8.16.0.21) with SMTP id\n\tv88I4PCC096494\n\tfor <linuxppc-dev@lists.ozlabs.org>; Fri, 8 Sep 2017 14:07:16 -0400", "from e06smtp14.uk.ibm.com (e06smtp14.uk.ibm.com [195.75.94.110])\n\tby mx0a-001b2d01.pphosted.com with ESMTP id 2cuys513uj-1\n\t(version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT)\n\tfor <linuxppc-dev@lists.ozlabs.org>; Fri, 08 Sep 2017 14:07:15 -0400", "from localhost\n\tby e06smtp14.uk.ibm.com with IBM ESMTP SMTP Gateway: 
Authorized Use\n\tOnly! Violators will be prosecuted\n\tfor <linuxppc-dev@lists.ozlabs.org> from <ldufour@linux.vnet.ibm.com>;\n\tFri, 8 Sep 2017 19:07:12 +0100", "from b06cxnps4076.portsmouth.uk.ibm.com (9.149.109.198)\n\tby e06smtp14.uk.ibm.com (192.168.101.144) with IBM ESMTP SMTP\n\tGateway: Authorized Use Only! Violators will be prosecuted; \n\tFri, 8 Sep 2017 19:07:07 +0100", "from d06av24.portsmouth.uk.ibm.com (d06av24.portsmouth.uk.ibm.com\n\t[9.149.105.60])\n\tby b06cxnps4076.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with\n\tESMTP id v88I77j215794392; Fri, 8 Sep 2017 18:07:07 GMT", "from d06av24.portsmouth.uk.ibm.com (unknown [127.0.0.1])\n\tby IMSVA (Postfix) with ESMTP id 59B4342042;\n\tFri, 8 Sep 2017 19:03:34 +0100 (BST)", "from d06av24.portsmouth.uk.ibm.com (unknown [127.0.0.1])\n\tby IMSVA (Postfix) with ESMTP id 9997E42041;\n\tFri, 8 Sep 2017 19:03:32 +0100 (BST)", "from nimbus.lab.toulouse-stg.fr.ibm.com (unknown [9.145.31.125])\n\tby d06av24.portsmouth.uk.ibm.com (Postfix) with ESMTP;\n\tFri, 8 Sep 2017 19:03:32 +0100 (BST)" ], "Authentication-Results": "ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=linux.vnet.ibm.com\n\t(client-ip=148.163.156.1; helo=mx0a-001b2d01.pphosted.com;\n\tenvelope-from=ldufour@linux.vnet.ibm.com; receiver=<UNKNOWN>)", "From": "Laurent Dufour <ldufour@linux.vnet.ibm.com>", "To": "paulmck@linux.vnet.ibm.com, peterz@infradead.org,\n\takpm@linux-foundation.org, kirill@shutemov.name, ak@linux.intel.com, \n\tmhocko@kernel.org, dave@stgolabs.net, jack@suse.cz,\n\tMatthew Wilcox <willy@infradead.org>, benh@kernel.crashing.org,\n\tmpe@ellerman.id.au, paulus@samba.org,\n\tThomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, \n\thpa@zytor.com, Will Deacon <will.deacon@arm.com>,\n\tSergey Senozhatsky <sergey.senozhatsky@gmail.com>", "Subject": "[PATCH v3 00/20] Speculative page faults", "Date": "Fri, 8 Sep 2017 20:06:44 +0200", "X-Mailer": "git-send-email 2.7.4", "X-TM-AS-GCONF": "00", "x-cbid": "17090818-0016-0000-0000-000004EB9F4C", "X-IBM-AV-DETECTION": "SAVI=unused REMOTE=unused XFE=unused", "x-cbparentid": "17090818-0017-0000-0000-00002825A701", "Message-Id": "<1504894024-2750-1-git-send-email-ldufour@linux.vnet.ibm.com>", "X-Proofpoint-Virus-Version": "vendor=fsecure engine=2.50.10432:, ,\n\tdefinitions=2017-09-08_12:, , signatures=0", "X-Proofpoint-Spam-Details": "rule=outbound_notspam policy=outbound score=0\n\tspamscore=0 suspectscore=2\n\tmalwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam\n\tadjust=0 reason=mlx scancount=1 engine=8.0.1-1707230000\n\tdefinitions=main-1709080270", "X-BeenThere": "linuxppc-dev@lists.ozlabs.org", "X-Mailman-Version": "2.1.23", "Precedence": "list", "List-Id": "Linux on PowerPC Developers Mail List\n\t<linuxppc-dev.lists.ozlabs.org>", "List-Unsubscribe": "<https://lists.ozlabs.org/options/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=unsubscribe>", "List-Archive": "<http://lists.ozlabs.org/pipermail/linuxppc-dev/>", "List-Post": "<mailto:linuxppc-dev@lists.ozlabs.org>", "List-Help": "<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=help>", "List-Subscribe": "<https://lists.ozlabs.org/listinfo/linuxppc-dev>,\n\t<mailto:linuxppc-dev-request@lists.ozlabs.org?subject=subscribe>", "Cc": "linuxppc-dev@lists.ozlabs.org, x86@kernel.org,\n\tlinux-kernel@vger.kernel.org, npiggin@gmail.com, linux-mm@kvack.org,\n\tTim Chen <tim.c.chen@linux.intel.com>, \n\tharen@linux.vnet.ibm.com, khandual@linux.vnet.ibm.com", "Errors-To": 
"linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org", "Sender": "\"Linuxppc-dev\"\n\t<linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>" }, "content": "This is a port on kernel 4.13 of the work done by Peter Zijlstra to\nhandle page fault without holding the mm semaphore [1].\n\nThe idea is to try to handle user space page faults without holding the\nmmap_sem. This should allow better concurrency for massively threaded\nprocess since the page fault handler will not wait for other threads memory\nlayout change to be done, assuming that this change is done in another part\nof the process's memory space. This type page fault is named speculative\npage fault. If the speculative page fault fails because of a concurrency is\ndetected or because underlying PMD or PTE tables are not yet allocating, it\nis failing its processing and a classic page fault is then tried.\n\nThe speculative page fault (SPF) has to look for the VMA matching the fault\naddress without holding the mmap_sem, so the VMA list is now managed using\nSRCU allowing lockless walking. The only impact would be the deferred file\nderefencing in the case of a file mapping, since the file pointer is\nreleased once the SRCU cleaning is done. This patch relies on the change\ndone recently by Paul McKenney in SRCU which now runs a callback per CPU\ninstead of per SRCU structure [1].\n\nThe VMA's attributes checked during the speculative page fault processing\nhave to be protected against parallel changes. This is done by using a per\nVMA sequence lock. This sequence lock allows the speculative page fault\nhandler to fast check for parallel changes in progress and to abort the\nspeculative page fault in that case.\n\nOnce the VMA is found, the speculative page fault handler would check for\nthe VMA's attributes to verify that the page fault has to be handled\ncorrectly or not. Thus the VMA is protected through a sequence lock which\nallows fast detection of concurrent VMA changes. If such a change is\ndetected, the speculative page fault is aborted and a *classic* page fault\nis tried. VMA sequence locks are added when VMA attributes which are\nchecked during the page fault are modified.\n\nWhen the PTE is fetched, the VMA is checked to see if it has been changed,\nso once the page table is locked, the VMA is valid, so any other changes\nleading to touching this PTE will need to lock the page table, so no\nparallel change is possible at this time.\n\nCompared to the Peter's initial work, this series introduces a spin_trylock\nwhen dealing with speculative page fault. This is required to avoid dead\nlock when handling a page fault while a TLB invalidate is requested by an\nother CPU holding the PTE. Another change due to a lock dependency issue\nwith mapping->i_mmap_rwsem.\n\nIn addition some VMA field values which are used once the PTE is unlocked\nat the end the page fault path are saved into the vm_fault structure to\nused the values matching the VMA at the time the PTE was locked.\n\nThis series only support VMA with no vm_ops define, so huge page and mapped\nfile are not managed with the speculative path. In addition transparent\nhuge page are not supported. 
Once this series will be accepted upstream\nI'll extend the support to mapped files, and transparent huge pages.\n\nThis series builds on top of v4.13.9-mm1 and is functional on x86 and\nPowerPC.\n\nTests have been made using a large commercial in-memory database on a\nPowerPC system with 752 CPU using RFC v5 using a previous version of this\nseries. The results are very encouraging since the loading of the 2TB\ndatabase was faster by 14% with the speculative page fault.\n\nUsing ebizzy test [3], which spreads a lot of threads, the result are good\nwhen running on both a large or a small system. When using kernbench, the\nresult are quite similar which expected as not so much multithreaded\nprocesses are involved. But there is no performance degradation neither\nwhich is good.\n\n------------------\nBenchmarks results\n\nNote these test have been made on top of 4.13.0-mm1.\n\nEbizzy:\n-------\nThe test is counting the number of records per second it can manage, the\nhigher is the best. I run it like this 'ebizzy -mTRp'. To get consistent\nresult I repeated the test 100 times and measure the average result, mean\ndeviation, max and min.\n\n- 16 CPUs x86 VM\nRecords/s\t4.13.0-mm1\t4.13.0-mm1-spf\tdelta\nAverage\t\t13217.90 \t65765.94\t+397.55%\nMean deviation\t690.37\t\t2609.36\t\t+277.97%\nMax\t\t16726\t\t77675\t\t+364.40%\nMin\t\t12194\t\t616340\t\t+405.45%\n\t\t\n- 80 CPUs Power 8 node:\nRecords/s\t4.13.0-mm1\t4.13.0-mm1-spf\tdelta\nAverage\t\t38175.40\t67635.55\t77.17% \nMean deviation\t600.09\t \t2349.66\t\t291.55%\nMax\t\t39563\t\t74292\t\t87.78% \nMin\t\t35846\t\t62657\t\t74.79% \n\nThe number of record per second is far better with the speculative page\nfault. \nThe mean deviation is higher with the speculative page fault, may be\nbecause sometime the fault are not handled in a speculative way leading to\nmore variation.\nThe numbers for the x86 guest are really insane for the SPF case, but I\ndid the test several times and this leads each time this delta. I did again\nthe test using the previous version of the patch and I got similar\nnumbers. It happens that the host running the VM is far less loaded now\nleading to better results as more threads are eligible to run.\nTest on Power are done on a badly balanced node where the memory is only\nattached to one core.\n\nKernbench:\n----------\nThis test is building a 4.12 kernel using platform default config. The\nbuild has been run 5 times each time.\n\n- 16 CPUs x86 VM\nAverage Half load -j 8 Run (std deviation)\n \t\t 4.13.0-mm1\t\t4.13.0-mm1-spf\t\tdelta %\nElapsed Time 145.968 (0.402206)\t145.654 (0.533601)\t-0.22\nUser Time 1006.58 (2.74729)\t1003.7 (4.11294)\t-0.29\nSystem Time 108.464 (0.177567)\t111.034 (0.718213)\t+2.37\nPercent CPU \t 763.4 (1.34164)\t764.8 (1.30384)\t\t+0.18\nContext Switches 46599.6 (412.013)\t63771 (1049.95)\t\t+36.85\nSleeps 85313.2 (514.456)\t85532.2 (681.199)\t-0.26\n\nAverage Optimal load -j 16 Run (std deviation)\n \t\t 4.13.0-mm1\t\t4.13.0-mm1-spf\t\tdelta %\nElapsed Time 74.292 (0.75998)\t74.484 (0.723035)\t+0.26\nUser Time 959.949 (49.2036)\t956.057 (50.2993)\t-0.41\nSystem Time 100.203 (8.7119)\t101.984 (9.56099)\t+1.78\nPercent CPU \t 1058 (310.661)\t\t1054.3 (305.263)\t-0.35\nContext Switches 65713.8 (20161.7)\t86619.4 (24095.4)\t+31.81\nSleeps 90344.9 (5364.74)\t90877.4 (5655.87)\t-0.59\n\nThe elapsed time are similar, but the impact less important since there are\nless multithreaded processes involved here. 
\n\n- 80 CPUs Power 8 node:\nAverage Half load -j 40 Run (std deviation)\n\t\t 4.13.0-mm1\t\t4.13.0-mm1-spf\t\tdelta %\nElapsed Time \t 115.342 (0.321668)\t115.786 (0.427118)\t+0.38\nUser Time \t 4355.08 (10.1778)\t4371.77 (14.9715)\t+0.38\nSystem Time \t 127.612 (0.882083)\t130.048 (1.06258)\t+1.91\nPercent CPU \t 3885.8 (11.606)\t3887.4 (8.04984)\t+0.04\nContext Switches 80907.8 (657.481)\t81936.4 (729.538)\t+1.27\nSleeps\t\t 162109 (793.331)\t162057 (1414.08)\t+0.03\n\nAverage Optimal load -j 80 Run (std deviation)\n \t\t 4.13.0-mm1\t\t4.13.0-mm1-spf\nElapsed Time \t 110.308 (0.725445)\t109.78 (0.826862)\t-0.48\nUser Time \t 5893.12 (1621.33)\t5923.19 (1635.48)\t+0.51\nSystem Time \t 162.168 (36.4347)\t166.533 (38.4695)\t+2.69\nPercent CPU \t 5400.2 (1596.89)\t5440.4 (1637.71)\t+0.74\nContext Switches 129372 (51088.2)\t144529 (65985.5)\t+11.72\nSleeps\t\t 157312 (5113.57)\t158696 (4301.48)\t-0.87\n\nHere the elapsed time are similar the SPF release, but we remain in the error\nmargin. It has to be noted that this system is not correctly balanced on\nthe NUMA point of view as all the available memory is attached to one core.\n\n------------------------\nChanges since v2:\n - Perf event is renamed in PERF_COUNT_SW_SPF\n - On Power handle do_page_fault()'s cleaning\n - On Power if the VM_FAULT_ERROR is returned by\n handle_speculative_fault(), do not retry but jump to the error path\n - If VMA's flags are not matching the fault, directly returns\n VM_FAULT_SIGSEGV and not VM_FAULT_RETRY\n - Check for pud_trans_huge() to avoid speculative path\n - Handles _vm_normal_page()'s introduced by 6f16211df3bf\n (\"mm/device-public-memory: device memory cache coherent with CPU\")\n - add and review few comments in the code\nChanges since v1:\n - Remove PERF_COUNT_SW_SPF_FAILED perf event.\n - Add tracing events to details speculative page fault failures.\n - Cache VMA fields values which are used once the PTE is unlocked at the\n end of the page fault events.\n - Ensure that fields read during the speculative path are written and read\n using WRITE_ONCE and READ_ONCE.\n - Add checks at the beginning of the speculative path to abort it if the\n VMA is known to not be supported.\nChanges since RFC V5 [5]\n - Port to 4.13 kernel\n - Merging patch fixing lock dependency into the original patch\n - Replace the 2 parameters of vma_has_changed() with the vmf pointer\n - In patch 7, don't call __do_fault() in the speculative path as it may\n want to unlock the mmap_sem.\n - In patch 11-12, don't check for vma boundaries when\n page_add_new_anon_rmap() is called during the spf path and protect against\n anon_vma pointer's update.\n - In patch 13-16, add performance events to report number of successful\n and failed speculative events. 
\n\n[1] http://linux-kernel.2935.n7.nabble.com/RFC-PATCH-0-6-Another-go-at-speculative-page-faults-tt965642.html#none\n[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=da915ad5cf25b5f5d358dd3670c3378d8ae8c03e\n[3] http://ebizzy.sourceforge.net/\n[4] http://ck.kolivas.org/apps/kernbench/kernbench-0.50/\n[5] https://lwn.net/Articles/725607/\n\nLaurent Dufour (14):\n mm: Introduce pte_spinlock for FAULT_FLAG_SPECULATIVE\n mm: Protect VMA modifications using VMA sequence count\n mm: Cache some VMA fields in the vm_fault structure\n mm: Protect SPF handler against anon_vma changes\n mm/migrate: Pass vm_fault pointer to migrate_misplaced_page()\n mm: Introduce __lru_cache_add_active_or_unevictable\n mm: Introduce __maybe_mkwrite()\n mm: Introduce __vm_normal_page()\n mm: Introduce __page_add_new_anon_rmap()\n mm: Try spin lock in speculative path\n mm: Adding speculative page fault failure trace events\n perf: Add a speculative page fault sw event\n perf tools: Add support for the SPF perf event\n powerpc/mm: Add speculative page fault\n\nPeter Zijlstra (6):\n mm: Dont assume page-table invariance during faults\n mm: Prepare for FAULT_FLAG_SPECULATIVE\n mm: VMA sequence count\n mm: RCU free VMAs\n mm: Provide speculative fault infrastructure\n x86/mm: Add speculative pagefault handling\n\n arch/powerpc/include/asm/book3s/64/pgtable.h | 5 +\n arch/powerpc/mm/fault.c | 15 +\n arch/x86/include/asm/pgtable_types.h | 7 +\n arch/x86/mm/fault.c | 19 ++\n fs/proc/task_mmu.c | 5 +-\n fs/userfaultfd.c | 17 +-\n include/linux/hugetlb_inline.h | 2 +-\n include/linux/migrate.h | 4 +-\n include/linux/mm.h | 28 +-\n include/linux/mm_types.h | 3 +\n include/linux/pagemap.h | 4 +-\n include/linux/rmap.h | 12 +-\n include/linux/swap.h | 11 +-\n include/trace/events/pagefault.h | 87 +++++\n include/uapi/linux/perf_event.h | 1 +\n kernel/fork.c | 1 +\n mm/hugetlb.c | 2 +\n mm/init-mm.c | 1 +\n mm/internal.h | 19 ++\n mm/khugepaged.c | 5 +\n mm/madvise.c | 6 +-\n mm/memory.c | 478 ++++++++++++++++++++++-----\n mm/mempolicy.c | 51 ++-\n mm/migrate.c | 4 +-\n mm/mlock.c | 13 +-\n mm/mmap.c | 138 ++++++--\n mm/mprotect.c | 4 +-\n mm/mremap.c | 7 +\n mm/rmap.c | 5 +-\n mm/swap.c | 12 +-\n tools/include/uapi/linux/perf_event.h | 1 +\n tools/perf/util/evsel.c | 1 +\n tools/perf/util/parse-events.c | 4 +\n tools/perf/util/parse-events.l | 1 +\n tools/perf/util/python.c | 1 +\n 35 files changed, 796 insertions(+), 178 deletions(-)\n create mode 100644 include/trace/events/pagefault.h" }
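As a follow-up, a hedged sketch of how fields of the response above might be consumed, for example to identify the series this cover letter belongs to and download its mbox. The field names ("series", "mbox", "submitter", "name", "version", "email") are taken from the response shown; the helper function itself is illustrative, not part of the Patchwork API.

import requests

def save_series_mbox(cover: dict, path: str) -> None:
    # Illustrative helper: report the first series referenced by a
    # cover-letter resource (as shown above) and save its mbox to `path`.
    series = cover["series"][0]
    print(f'{series["name"]} v{series["version"]} '
          f'by {cover["submitter"]["name"]} <{cover["submitter"]["email"]}>')
    mbox = requests.get(series["mbox"], timeout=30)
    mbox.raise_for_status()
    with open(path, "wb") as fh:
        fh.write(mbox.content)

# Usage, assuming `cover` was fetched as in the earlier sketch:
# save_series_mbox(cover, "speculative-page-faults-v3.mbox")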