Message ID | 20171101101735.2318-1-khandual@linux.vnet.ibm.com (mailing list archive) |
---|---|
Headers | From: Anshuman Khandual <khandual@linux.vnet.ibm.com>; To: linuxppc-dev@lists.ozlabs.org; Cc: aneesh.kumar@linux.vnet.ibm.com, npiggin@gmail.com; Subject: [RFC 0/2] Enable ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH on POWER; Date: Wed, 1 Nov 2017 15:47:33 +0530; List-Id: Linux on PowerPC Developers Mail List <linuxppc-dev.lists.ozlabs.org> |
Series | Enable ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH on POWER |
From: Anshuman Khandual <Khandual@linux.vnet.ibm.com>

Batched TLB flushing in the reclaim path has been around for a couple of
years now and is enabled on the x86 platform. The idea is to batch multiple
page TLB invalidation requests together and then completely flush every CPU
that might hold a TLB entry for any of the unmapped pages, instead of
sending multiple IPIs and flushing individual pages each time reclaim
unmaps one page. This has the potential to improve performance for certain
types of workloads under memory pressure, provided some conditions related
to the costs of individual page TLB invalidation, CPU wide TLB invalidation,
system wide TLB invalidation, TLB reload, IPIs etc. are met. Please refer to
commit 72b252aed5 ("mm: send one IPI per CPU to TLB flush all entries after
unmapping pages") from Mel Gorman for more details on how it can impact
performance for various workloads.

This enablement improves performance for the original test case
'case-lru-file-mmap-read' from the vm-scalability bucket, but only from a
system time perspective.

time ./run case-lru-file-mmap-read

Without the patch:

    real    4m20.364s
    user    102m52.492s
    sys     433m26.190s

With the patch:

    real    4m15.942s    (-  1.69%)
    user    111m16.662s  (+  7.55%)
    sys     382m35.202s  (- 11.73%)

Parallel kernel compilation does not see any performance improvement or
degradation with or without this patch. It remains within the margin of
error.

Without the patch:

    real    1m13.850s
    user    39m21.803s
    sys     2m43.362s

With the patch:

    real    1m14.481s    (+ 0.85%)
    user    39m27.409s   (+ 0.23%)
    sys     2m44.656s    (+ 0.79%)

The series batches up multiple struct mm instances during reclaim and keeps
accumulating the superset of their cpu masks, that is, every CPU that might
hold a TLB entry which needs to be invalidated. A local, struct mm wide
invalidation is then performed on that cpu mask for all the batched ones.

Please review and let me know if there is a better way to do this. Thank
you.

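The percentages quoted next to the timings can be rechecked with a few lines of arithmetic. The sketch below is not part of the posted series; the helper names `to_seconds` and `percent_change` are made up for illustration, and the script assumes each change is measured relative to the run without the patch. Computed that way, most figures agree with the posted ones to within rounding, while the user time of the first test comes out nearer +8.2% than +7.55% (the posted value matches a division by the patched figure instead).

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style value such as '4m20.364s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time format: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def percent_change(before, after):
    """Relative change in percent, measured against the 'before' value."""
    old, new = to_seconds(before), to_seconds(after)
    return (new - old) / old * 100.0


# Timings copied from the cover letter: (without patch, with patch).
LRU_FILE_MMAP_READ = {
    "real": ("4m20.364s", "4m15.942s"),
    "user": ("102m52.492s", "111m16.662s"),
    "sys": ("433m26.190s", "382m35.202s"),
}
KERNEL_COMPILE = {
    "real": ("1m13.850s", "1m14.481s"),
    "user": ("39m21.803s", "39m27.409s"),
    "sys": ("2m43.362s", "2m44.656s"),
}

for label, results in (
    ("case-lru-file-mmap-read", LRU_FILE_MMAP_READ),
    ("parallel kernel compile", KERNEL_COMPILE),
):
    print(label)
    for field, (before, after) in results.items():
        print(f"  {field:<4} {percent_change(before, after):+7.2f}%")
```
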
Anshuman Khandual (2):
  mm/tlbbatch: Introduce arch_tlbbatch_should_defer()
  powerpc/mm: Enable deferred flushing of TLB during reclaim

 arch/powerpc/Kconfig                |  1 +
 arch/powerpc/include/asm/tlbbatch.h | 30 +++++++++++++++++++++++
 arch/powerpc/include/asm/tlbflush.h |  3 +++
 arch/powerpc/mm/tlb-radix.c         | 49 +++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/tlbflush.h     | 12 +++++++++
 mm/rmap.c                           |  9 +------
 6 files changed, 96 insertions(+), 8 deletions(-)
 create mode 100644 arch/powerpc/include/asm/tlbbatch.h
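
The accumulation scheme described in the cover letter can be pictured with a small toy model. This is only a sketch in Python, not the kernel code from the series: `Mm`, `UnmapBatch`, `flushes_unbatched` and `flushes_batched` are hypothetical names, a "flush" is simply counted per CPU, and the model deliberately ignores that a full flush costs more than a single-page invalidation and that flushed entries must be reloaded later.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Mm:
    """Toy stand-in for a struct mm: only the set of CPUs it may be cached on."""
    name: str
    cpus: frozenset


@dataclass
class UnmapBatch:
    """Accumulates the superset of CPU masks across unmaps in one reclaim pass."""
    cpumask: set = field(default_factory=set)
    pending: bool = False

    def add_mm(self, mm):
        # Defer the flush: only remember which CPUs will need one.
        self.cpumask |= mm.cpus
        self.pending = True

    def flush(self):
        """Issue one flush per accumulated CPU and return how many were issued."""
        if not self.pending:
            return 0
        issued = len(self.cpumask)
        self.cpumask.clear()
        self.pending = False
        return issued


def flushes_unbatched(unmaps):
    """One flush request per CPU of the owning mm for every unmapped page."""
    return sum(len(mm.cpus) for mm in unmaps)


def flushes_batched(unmaps):
    """Accumulate all unmaps first, then flush each affected CPU once."""
    batch = UnmapBatch()
    for mm in unmaps:
        batch.add_mm(mm)
    return batch.flush()


# Hypothetical workload: five pages unmapped from two address spaces.
mm_a = Mm("a", frozenset({0, 1}))
mm_b = Mm("b", frozenset({1, 2}))
unmaps = [mm_a, mm_a, mm_a, mm_b, mm_b]

print("unbatched flush requests:", flushes_unbatched(unmaps))
print("batched flush requests:  ", flushes_batched(unmaps))
```

In this made-up workload the unbatched path issues ten requests while the batched path issues three, one per CPU in the accumulated mask. Whether that trade pays off on real hardware depends on the cost conditions listed in the cover letter.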