mm: Expose lazy vfree pages to control via sysctl

Message ID 1546616141-486-1-git-send-email-amhetre@nvidia.com
State New
Series
  • mm: Expose lazy vfree pages to control via sysctl

Commit Message

Ashish Mhetre Jan. 4, 2019, 3:35 p.m.
From: Hiroshi Doyu <hdoyu@nvidia.com>

The purpose of lazy_max_pages is to gather virtual address space till it
reaches the lazy_max_pages limit and then purge with a TLB flush and hence
reduce the number of global TLB flushes.
The default value of lazy_max_pages with one CPU is 32MB and with 4 CPUs it
is 96MB i.e. for 4 cores, 96MB of vmalloc space will be gathered before it
is purged with a TLB flush.
This feature has shown random latency issues. For example, we have seen
that the kernel thread for some camera application spent 30ms in
__purge_vmap_area_lazy() with 4 CPUs.
So, create a "/proc/sys/vm/lazy_vfree_pages" file to control lazy vfree pages.
With this sysctl, the behaviour of lazy_vfree_pages can be controlled, and
systems which can't tolerate these latency issues can also disable it.
This sysctl is one of the ways in which lazy_vfree_pages can be controlled.
Another possible solution would be to configure lazy_vfree_pages through the
kernel cmdline.

Signed-off-by: Hiroshi Doyu <hdoyu@nvidia.com>
Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
---
 kernel/sysctl.c | 8 ++++++++
 mm/vmalloc.c    | 5 ++++-
 2 files changed, 12 insertions(+), 1 deletion(-)
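For reference, the default thresholds quoted in the commit message follow from lazy_max_pages(), which multiplies fls(num_online_cpus()) by 32MB worth of pages. The following is a minimal userspace sketch of that arithmetic, not kernel code: fls_sketch(), lazy_max_pages_sketch() and lazy_max_mib() are hypothetical names, and the page size is passed in as a parameter instead of using PAGE_SIZE.

```c
#include <assert.h>

/*
 * Userspace stand-in for the kernel's fls(): returns the 1-based index of
 * the highest set bit, or 0 when x is 0.
 */
static unsigned int fls_sketch(unsigned int x)
{
	unsigned int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/* Lazy-free threshold in pages: fls(cpus) times 32MB worth of pages. */
static unsigned long lazy_max_pages_sketch(unsigned int online_cpus,
					   unsigned long page_size)
{
	assert(page_size > 0);
	return fls_sketch(online_cpus) * (32UL * 1024 * 1024 / page_size);
}

/* The same threshold expressed in MiB, for comparison with the text. */
static unsigned long lazy_max_mib(unsigned int online_cpus,
				  unsigned long page_size)
{
	return lazy_max_pages_sketch(online_cpus, page_size) * page_size /
	       (1024UL * 1024);
}
```

With 4KB pages, lazy_max_mib(1, 4096) gives 32 and lazy_max_mib(4, 4096) gives 96, matching the figures in the commit message.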

Comments

Matthew Wilcox Jan. 4, 2019, 6:03 p.m. | #1
On Fri, Jan 04, 2019 at 09:05:41PM +0530, Ashish Mhetre wrote:
> From: Hiroshi Doyu <hdoyu@nvidia.com>
> 
> The purpose of lazy_max_pages is to gather virtual address space till it
> reaches the lazy_max_pages limit and then purge with a TLB flush and hence
> reduce the number of global TLB flushes.
> The default value of lazy_max_pages with one CPU is 32MB and with 4 CPUs it
> is 96MB i.e. for 4 cores, 96MB of vmalloc space will be gathered before it
> is purged with a TLB flush.
> This feature has shown random latency issues. For example, we have seen
> that the kernel thread for some camera application spent 30ms in
> __purge_vmap_area_lazy() with 4 CPUs.

You're not the first to report something like this.  Looking through the
kernel logs, I see:

commit 763b218ddfaf56761c19923beb7e16656f66ec62
Author: Joel Fernandes <joelaf@google.com>
Date:   Mon Dec 12 16:44:26 2016 -0800

    mm: add preempt points into __purge_vmap_area_lazy()

commit f9e09977671b618aeb25ddc0d4c9a84d5b5cde9d
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Dec 12 16:44:23 2016 -0800

    mm: turn vmap_purge_lock into a mutex

commit 80c4bd7a5e4368b680e0aeb57050a1b06eb573d8
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri May 20 16:57:38 2016 -0700

    mm/vmalloc: keep a separate lazy-free list

So the first thing I want to do is to confirm that you see this problem
on a modern kernel.  We've had trouble with NVidia before reporting
historical problems as if they were new.
kbuild test robot Jan. 4, 2019, 6:29 p.m. | #2
Hi Hiroshi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.20 next-20190103]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Ashish-Mhetre/mm-Expose-lazy-vfree-pages-to-control-via-sysctl/20190105-003852
config: sh-rsk7269_defconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.2.0 make.cross ARCH=sh 

All errors (new ones prefixed by >>):

>> kernel/sysctl.o:(.data+0x2d4): undefined reference to `sysctl_lazy_vfree_pages'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
kbuild test robot Jan. 4, 2019, 6:30 p.m. | #3
Hi Hiroshi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.20 next-20190103]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Ashish-Mhetre/mm-Expose-lazy-vfree-pages-to-control-via-sysctl/20190105-003852
config: c6x-evmc6678_defconfig (attached as .config)
compiler: c6x-elf-gcc (GCC) 8.1.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=8.1.0 make.cross ARCH=c6x 

All errors (new ones prefixed by >>):

>> kernel/sysctl.o:(.fardata+0x2d4): undefined reference to `sysctl_lazy_vfree_pages'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
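A note on the two build failures above: both configs (sh-rsk7269_defconfig and c6x-evmc6678_defconfig) appear to be built without CONFIG_MMU, in which case mm/vmalloc.c is not compiled and the new variable has no definition, while kernel/sysctl.c still references it unconditionally. One plausible fix would be to guard the extern and the vm_table entry with the same condition as the definition, much as kernel/sysctl.c already guards sysctl_nr_trim_pages with #ifndef CONFIG_MMU. The userspace sketch below illustrates that pattern only; SKETCH_CONFIG_MMU, struct sketch_ctl_entry and every sketch_* name are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for CONFIG_MMU; set to 0 to model a nommu build. */
#define SKETCH_CONFIG_MMU 1

/* Simplified stand-in for struct ctl_table: a name and a pointer to a value. */
struct sketch_ctl_entry {
	const char *procname;
	int *data;
};

#if SKETCH_CONFIG_MMU
/* Exists only when the "MMU" code is built, like a variable in mm/vmalloc.c. */
static int sketch_lazy_vfree_pages = 8192;
#endif

/* Always available, regardless of the MMU setting. */
static int sketch_overcommit_memory;

static struct sketch_ctl_entry sketch_vm_table[] = {
#if SKETCH_CONFIG_MMU
	/*
	 * Guarded by the same condition as the definition above, so a nommu
	 * build never references a symbol that does not exist.
	 */
	{ "lazy_vfree_pages", &sketch_lazy_vfree_pages },
#endif
	{ "overcommit_memory", &sketch_overcommit_memory },
	{ NULL, NULL },	/* table terminator */
};

/* Look up an entry by name; returns NULL if the table has no such entry. */
static struct sketch_ctl_entry *sketch_find(const char *name)
{
	struct sketch_ctl_entry *e;

	assert(name != NULL);
	for (e = sketch_vm_table; e->procname; e++)
		if (strcmp(e->procname, name) == 0)
			return e;
	return NULL;
}
```

Setting SKETCH_CONFIG_MMU to 0 removes both the variable and its table entry together, so the sketch still compiles and links; that is the property the posted patch lacks on nommu configs.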
Ashish Mhetre Jan. 6, 2019, 8:42 a.m. | #4
Matthew, this issue was last reported in September 2018 on K4.9.
I verified that the optimization patches you mentioned were not
present in our downstream kernel when we faced the issue. I will check
whether the issue still persists on a new kernel with all these patches
and report back.
Ashish Mhetre Jan. 21, 2019, 8:06 a.m. | #5
The issue is not seen on the new kernel, so this patch won't be needed. Thanks.

Patch

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 3ae223f..49523efc 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -111,6 +111,7 @@  extern int pid_max;
 extern int pid_max_min, pid_max_max;
 extern int percpu_pagelist_fraction;
 extern int latencytop_enabled;
+extern int sysctl_lazy_vfree_pages;
 extern unsigned int sysctl_nr_open_min, sysctl_nr_open_max;
 #ifndef CONFIG_MMU
 extern int sysctl_nr_trim_pages;
@@ -1251,6 +1252,13 @@  static struct ctl_table kern_table[] = {
 
 static struct ctl_table vm_table[] = {
 	{
+		.procname	= "lazy_vfree_pages",
+		.data		= &sysctl_lazy_vfree_pages,
+		.maxlen		= sizeof(sysctl_lazy_vfree_pages),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
+	{
 		.procname	= "overcommit_memory",
 		.data		= &sysctl_overcommit_memory,
 		.maxlen		= sizeof(sysctl_overcommit_memory),
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 97d4b25..fa07966 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -619,13 +619,16 @@  static void unmap_vmap_area(struct vmap_area *va)
  * code, and it will be simple to change the scale factor if we find that it
  * becomes a problem on bigger systems.
  */
+
+int sysctl_lazy_vfree_pages = 32UL * 1024 * 1024 / PAGE_SIZE;
+
 static unsigned long lazy_max_pages(void)
 {
 	unsigned int log;
 
 	log = fls(num_online_cpus());
 
-	return log * (32UL * 1024 * 1024 / PAGE_SIZE);
+	return log * sysctl_lazy_vfree_pages;
 }
 
 static atomic_t vmap_lazy_nr = ATOMIC_INIT(0);
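
As a rough illustration of how the patched lazy_max_pages() would react to values written through the sysctl, the userspace sketch below mirrors its arithmetic. patched_lazy_max_pages_sketch() is a hypothetical name, and log stands for fls(num_online_cpus()), which is 3 with 4 CPUs. Because the variable is a plain int handled by proc_dointvec, a negative value would be accepted and converted to unsigned before the multiplication.

```c
#include <assert.h>
#include <limits.h>

/*
 * Mirrors "return log * sysctl_lazy_vfree_pages;" from the patch.
 * log is an unsigned int, so the int operand is converted to unsigned int
 * before the multiplication, and the result is then widened to
 * unsigned long for the return value.
 */
static unsigned long patched_lazy_max_pages_sketch(unsigned int log,
						   int sysctl_value)
{
	assert(log > 0);	/* at least one CPU is online, so fls() >= 1 */
	return log * sysctl_value;
}
```

With the default of 8192 pages (32MB of 4KB pages) and log = 3, the result is 24576 pages. A value of 0 yields a threshold of 0 pages, which is presumably how lazy freeing would be disabled. A value of -1 wraps around to UINT_MAX - 2, a very large threshold. If negative values are not intended, using proc_dointvec_minmax with a lower bound would be one way to reject them.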