Patchwork [4/7] nandsim: Don't use PF_MEMALLOC

Submitter KOSAKI Motohiro
Date Nov. 17, 2009, 7:19 a.m.
Message ID <20091117161843.3DE0.A69D9226@jp.fujitsu.com>
Permalink /patch/38583/
State New

Comments

KOSAKI Motohiro - Nov. 17, 2009, 7:19 a.m.
Non-MM subsystems must not use PF_MEMALLOC. Memory reclaim needs a
little memory to run, and no one must prevent it from running. Otherwise
the system may hang mysteriously and/or invoke the OOM killer.

Cc: David Woodhouse <David.Woodhouse@intel.com>
Cc: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: linux-mtd@lists.infradead.org
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 drivers/mtd/nand/nandsim.c |   22 ++--------------------
 1 files changed, 2 insertions(+), 20 deletions(-)
Artem Bityutskiy - Nov. 23, 2009, 3 p.m.
On Tue, 2009-11-17 at 16:19 +0900, KOSAKI Motohiro wrote:
> Non-MM subsystems must not use PF_MEMALLOC. Memory reclaim needs a
> little memory to run, and no one must prevent it from running. Otherwise
> the system may hang mysteriously and/or invoke the OOM killer.
> 
> Cc: David Woodhouse <David.Woodhouse@intel.com>
> Cc: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
> Cc: linux-mtd@lists.infradead.org
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> ---
>  drivers/mtd/nand/nandsim.c |   22 ++--------------------
>  1 files changed, 2 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/mtd/nand/nandsim.c b/drivers/mtd/nand/nandsim.c
> index cd0711b..97a8bbb 100644
> --- a/drivers/mtd/nand/nandsim.c
> +++ b/drivers/mtd/nand/nandsim.c
> @@ -1322,34 +1322,18 @@ static int get_pages(struct nandsim *ns, struct file *file, size_t count, loff_t
>  	return 0;
>  }
>  
> -static int set_memalloc(void)
> -{
> -	if (current->flags & PF_MEMALLOC)
> -		return 0;
> -	current->flags |= PF_MEMALLOC;
> -	return 1;
> -}
> -
> -static void clear_memalloc(int memalloc)
> -{
> -	if (memalloc)
> -		current->flags &= ~PF_MEMALLOC;
> -}
> -
>  static ssize_t read_file(struct nandsim *ns, struct file *file, void *buf, size_t count, loff_t *pos)
>  {
>  	mm_segment_t old_fs;
>  	ssize_t tx;
> -	int err, memalloc;
> +	int err;
>  
>  	err = get_pages(ns, file, count, *pos);
>  	if (err)
>  		return err;
>  	old_fs = get_fs();
>  	set_fs(get_ds());
> -	memalloc = set_memalloc();
>  	tx = vfs_read(file, (char __user *)buf, count, pos);
> -	clear_memalloc(memalloc);
>  	set_fs(old_fs);
>  	put_pages(ns);
>  	return tx;
> @@ -1359,16 +1343,14 @@ static ssize_t write_file(struct nandsim *ns, struct file *file, void *buf, size
>  {
>  	mm_segment_t old_fs;
>  	ssize_t tx;
> -	int err, memalloc;
> +	int err;
>  
>  	err = get_pages(ns, file, count, *pos);
>  	if (err)
>  		return err;
>  	old_fs = get_fs();
>  	set_fs(get_ds());
> -	memalloc = set_memalloc();
>  	tx = vfs_write(file, (char __user *)buf, count, pos);
> -	clear_memalloc(memalloc);
>  	set_fs(old_fs);
>  	put_pages(ns);
>  	return tx;

I vaguely remember Adrian (CCed) did this on purpose. This is for the
case when nandsim emulates NAND flash on top of a file. So there are 2
file-systems involved: one sits on top of nandsim (e.g. UBIFS) and the
other owns the file which nandsim uses (e.g., ext3).

And I really cannot remember off the top of my head why he needed
PF_MEMALLOC, but I think Adrian wanted to prevent the direct reclaim
path from re-entering, say, UBIFS and causing a deadlock. But I'd think
that all the allocations in vfs_read()/vfs_write() should be GFP_NOFS,
so that should not be a problem?
Adrian Hunter - Nov. 23, 2009, 8:01 p.m.
Bityutskiy Artem (Nokia-D/Helsinki) wrote:
> On Tue, 2009-11-17 at 16:19 +0900, KOSAKI Motohiro wrote:
>> Non-MM subsystems must not use PF_MEMALLOC. Memory reclaim needs a
>> little memory to run, and no one must prevent it from running. Otherwise
>> the system may hang mysteriously and/or invoke the OOM killer.
>>
>> Cc: David Woodhouse <David.Woodhouse@intel.com>
>> Cc: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
>> Cc: linux-mtd@lists.infradead.org
>> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> ---
>>  drivers/mtd/nand/nandsim.c |   22 ++--------------------
>>  1 files changed, 2 insertions(+), 20 deletions(-)
>>
>> diff --git a/drivers/mtd/nand/nandsim.c b/drivers/mtd/nand/nandsim.c
>> index cd0711b..97a8bbb 100644
>> --- a/drivers/mtd/nand/nandsim.c
>> +++ b/drivers/mtd/nand/nandsim.c
>> @@ -1322,34 +1322,18 @@ static int get_pages(struct nandsim *ns, struct file *file, size_t count, loff_t
>>  	return 0;
>>  }
>>  
>> -static int set_memalloc(void)
>> -{
>> -	if (current->flags & PF_MEMALLOC)
>> -		return 0;
>> -	current->flags |= PF_MEMALLOC;
>> -	return 1;
>> -}
>> -
>> -static void clear_memalloc(int memalloc)
>> -{
>> -	if (memalloc)
>> -		current->flags &= ~PF_MEMALLOC;
>> -}
>> -
>>  static ssize_t read_file(struct nandsim *ns, struct file *file, void *buf, size_t count, loff_t *pos)
>>  {
>>  	mm_segment_t old_fs;
>>  	ssize_t tx;
>> -	int err, memalloc;
>> +	int err;
>>  
>>  	err = get_pages(ns, file, count, *pos);
>>  	if (err)
>>  		return err;
>>  	old_fs = get_fs();
>>  	set_fs(get_ds());
>> -	memalloc = set_memalloc();
>>  	tx = vfs_read(file, (char __user *)buf, count, pos);
>> -	clear_memalloc(memalloc);
>>  	set_fs(old_fs);
>>  	put_pages(ns);
>>  	return tx;
>> @@ -1359,16 +1343,14 @@ static ssize_t write_file(struct nandsim *ns, struct file *file, void *buf, size
>>  {
>>  	mm_segment_t old_fs;
>>  	ssize_t tx;
>> -	int err, memalloc;
>> +	int err;
>>  
>>  	err = get_pages(ns, file, count, *pos);
>>  	if (err)
>>  		return err;
>>  	old_fs = get_fs();
>>  	set_fs(get_ds());
>> -	memalloc = set_memalloc();
>>  	tx = vfs_write(file, (char __user *)buf, count, pos);
>> -	clear_memalloc(memalloc);
>>  	set_fs(old_fs);
>>  	put_pages(ns);
>>  	return tx;
> 
> I vaguely remember Adrian (CCed) did this on purpose. This is for the
> case when nandsim emulates NAND flash on top of a file. So there are 2
> file-systems involved: one sits on top of nandsim (e.g. UBIFS) and the
> other owns the file which nandsim uses (e.g., ext3).
> 
> And I really cannot remember off the top of my head why he needed
> PF_MEMALLOC, but I think Adrian wanted to prevent the direct reclaim
> path from re-entering, say, UBIFS and causing a deadlock. But I'd think
> that all the allocations in vfs_read()/vfs_write() should be GFP_NOFS,
> so that should not be a problem?
> 

Yes it needs PF_MEMALLOC to prevent deadlock because there can be a
file system on top of nandsim which, in this case, is on top of another
file system.

I do not see how mempools will help here.

Please offer an alternative solution.
KOSAKI Motohiro - Nov. 24, 2009, 10:46 a.m.
Hi

Thank you for these useful comments.

> > I vaguely remember Adrian (CCed) did this on purpose. This is for the
> > case when nandsim emulates NAND flash on top of a file. So there are 2
> > file-systems involved: one sits on top of nandsim (e.g. UBIFS) and the
> > other owns the file which nandsim uses (e.g., ext3).
> > 
> > And I really cannot remember off the top of my head why he needed
> > PF_MEMALLOC, but I think Adrian wanted to prevent the direct reclaim
> > path from re-entering, say, UBIFS and causing a deadlock. But I'd think
> > that all the allocations in vfs_read()/vfs_write() should be GFP_NOFS,
> > so that should not be a problem?
> > 
> 
> Yes it needs PF_MEMALLOC to prevent deadlock because there can be a
> file system on top of nandsim which, in this case, is on top of another
> file system.
> 
> I do not see how mempools will help here.
> 
> Please offer an alternative solution.

I have a few questions.

Can you please explain in more detail? Other stackable filesystems
(e.g. ecryptfs) don't have this problem. Why does nandsim have this issue?
Which lock causes the deadlock?
Adrian Hunter - Nov. 24, 2009, 11:56 a.m.
ext KOSAKI Motohiro wrote:
> Hi
> 
> Thank you for these useful comments.
> 
>>> I vaguely remember Adrian (CCed) did this on purpose. This is for the
>>> case when nandsim emulates NAND flash on top of a file. So there are 2
>>> file-systems involved: one sits on top of nandsim (e.g. UBIFS) and the
>>> other owns the file which nandsim uses (e.g., ext3).
>>>
>>> And I really cannot remember off the top of my head why he needed
>>> PF_MEMALLOC, but I think Adrian wanted to prevent the direct reclaim
>>> path from re-entering, say, UBIFS and causing a deadlock. But I'd think
>>> that all the allocations in vfs_read()/vfs_write() should be GFP_NOFS,
>>> so that should not be a problem?
>>>
>> Yes it needs PF_MEMALLOC to prevent deadlock because there can be a
>> file system on top of nandsim which, in this case, is on top of another
>> file system.
>>
>> I do not see how mempools will help here.
>>
>> Please offer an alternative solution.
> 
> I have a few questions.
> 
> Can you please explain in more detail? Other stackable filesystems
> (e.g. ecryptfs) don't have this problem. Why does nandsim have this issue?
> Which lock causes the deadlock?
> 
> 
> 

The file systems are not stacked.  One is over nandsim, which nandsim
does not know about because it is just a lowly NAND device, and, with
the file cache option, one file system below to provide the file cache.

The deadlock is the kernel writing out dirty pages to the top file system
which writes to nandsim which writes to the bottom file system which
allocates memory which causes dirty pages to be written out to the top
file system, which tries to write to nandsim => deadlock.
KOSAKI Motohiro - Nov. 25, 2009, 12:42 a.m.
> ext KOSAKI Motohiro wrote:
> > Hi
> > 
> > Thank you for these useful comments.
> > 
> >>> I vaguely remember Adrian (CCed) did this on purpose. This is for the
> >>> case when nandsim emulates NAND flash on top of a file. So there are 2
> >>> file-systems involved: one sits on top of nandsim (e.g. UBIFS) and the
> >>> other owns the file which nandsim uses (e.g., ext3).
> >>>
> >>> And I really cannot remember off the top of my head why he needed
> >>> PF_MEMALLOC, but I think Adrian wanted to prevent the direct reclaim
> >>> path from re-entering, say, UBIFS and causing a deadlock. But I'd think
> >>> that all the allocations in vfs_read()/vfs_write() should be GFP_NOFS,
> >>> so that should not be a problem?
> >>>
> >> Yes it needs PF_MEMALLOC to prevent deadlock because there can be a
> >> file system on top of nandsim which, in this case, is on top of another
> >> file system.
> >>
> >> I do not see how mempools will help here.
> >>
> >> Please offer an alternative solution.
> > 
> > I have a few questions.
> > 
> > Can you please explain in more detail? Other stackable filesystems
> > (e.g. ecryptfs) don't have this problem. Why does nandsim have this issue?
> > Which lock causes the deadlock?
> 
> The file systems are not stacked.  One is over nandsim, which nandsim
> does not know about because it is just a lowly NAND device, and, with
> the file cache option, one file system below to provide the file cache.
> 
> The deadlock is the kernel writing out dirty pages to the top file system
> which writes to nandsim which writes to the bottom file system which
> allocates memory which causes dirty pages to be written out to the top
> file system, which tries to write to nandsim => deadlock.

You mean you want to prevent pageout() rather than reclaim itself?
Dropping file cache doesn't seem to make a recursive call, right?
Adrian Hunter - Nov. 25, 2009, 7:13 a.m.
KOSAKI Motohiro wrote:
>> KOSAKI Motohiro wrote:
>>> Hi
>>>
>>> Thank you for these useful comments.
>>>
>>>>> I vaguely remember Adrian (CCed) did this on purpose. This is for the
>>>>> case when nandsim emulates NAND flash on top of a file. So there are 2
>>>>> file-systems involved: one sits on top of nandsim (e.g. UBIFS) and the
>>>>> other owns the file which nandsim uses (e.g., ext3).
>>>>>
>>>>> And I really cannot remember off the top of my head why he needed
>>>>> PF_MEMALLOC, but I think Adrian wanted to prevent the direct reclaim
>>>>> path from re-entering, say, UBIFS and causing a deadlock. But I'd think
>>>>> that all the allocations in vfs_read()/vfs_write() should be GFP_NOFS,
>>>>> so that should not be a problem?
>>>>>
>>>> Yes it needs PF_MEMALLOC to prevent deadlock because there can be a
>>>> file system on top of nandsim which, in this case, is on top of another
>>>> file system.
>>>>
>>>> I do not see how mempools will help here.
>>>>
>>>> Please offer an alternative solution.
>>> I have a few questions.
>>>
>>> Can you please explain in more detail? Other stackable filesystems
>>> (e.g. ecryptfs) don't have this problem. Why does nandsim have this issue?
>>> Which lock causes the deadlock?
>> The file systems are not stacked.  One is over nandsim, which nandsim
>> does not know about because it is just a lowly NAND device, and, with
>> the file cache option, one file system below to provide the file cache.
>>
>> The deadlock is the kernel writing out dirty pages to the top file system
>> which writes to nandsim which writes to the bottom file system which
>> allocates memory which causes dirty pages to be written out to the top
>> file system, which tries to write to nandsim => deadlock.
> 
> You mean you want to prevent pageout() rather than reclaim itself?

Yes

> Dropping file cache doesn't seem to make a recursive call, right?

Yes
KOSAKI Motohiro - Nov. 25, 2009, 7:18 a.m.
> KOSAKI Motohiro wrote:
> >> KOSAKI Motohiro wrote:
> >>> Hi
> >>>
> >>> Thank you for these useful comments.
> >>>
> >>>>> I vaguely remember Adrian (CCed) did this on purpose. This is for the
> >>>>> case when nandsim emulates NAND flash on top of a file. So there are 2
> >>>>> file-systems involved: one sits on top of nandsim (e.g. UBIFS) and the
> >>>>> other owns the file which nandsim uses (e.g., ext3).
> >>>>>
> >>>>> And I really cannot remember off the top of my head why he needed
> >>>>> PF_MEMALLOC, but I think Adrian wanted to prevent the direct reclaim
> >>>>> path from re-entering, say, UBIFS and causing a deadlock. But I'd think
> >>>>> that all the allocations in vfs_read()/vfs_write() should be GFP_NOFS,
> >>>>> so that should not be a problem?
> >>>>>
> >>>> Yes it needs PF_MEMALLOC to prevent deadlock because there can be a
> >>>> file system on top of nandsim which, in this case, is on top of another
> >>>> file system.
> >>>>
> >>>> I do not see how mempools will help here.
> >>>>
> >>>> Please offer an alternative solution.
> >>> I have a few questions.
> >>>
> >>> Can you please explain in more detail? Other stackable filesystems
> >>> (e.g. ecryptfs) don't have this problem. Why does nandsim have this issue?
> >>> Which lock causes the deadlock?
> >> The file systems are not stacked.  One is over nandsim, which nandsim
> >> does not know about because it is just a lowly NAND device, and, with
> >> the file cache option, one file system below to provide the file cache.
> >>
> >> The deadlock is the kernel writing out dirty pages to the top file system
> >> which writes to nandsim which writes to the bottom file system which
> >> allocates memory which causes dirty pages to be written out to the top
> >> file system, which tries to write to nandsim => deadlock.
> > 
> > You mean you want to prevent pageout() rather than reclaim itself?
> 
> Yes
> 
> > Dropping file cache doesn't seem to make a recursive call, right?
> 
> Yes

o.k.

I really think cache dropping shouldn't be prevented, because a
typical Linux box has lots of droppable file cache and very few free pages.
But preventing pageout() doesn't seem so problematic.

Thank you for the good information.

Patch

diff --git a/drivers/mtd/nand/nandsim.c b/drivers/mtd/nand/nandsim.c
index cd0711b..97a8bbb 100644
--- a/drivers/mtd/nand/nandsim.c
+++ b/drivers/mtd/nand/nandsim.c
@@ -1322,34 +1322,18 @@  static int get_pages(struct nandsim *ns, struct file *file, size_t count, loff_t
 	return 0;
 }
 
-static int set_memalloc(void)
-{
-	if (current->flags & PF_MEMALLOC)
-		return 0;
-	current->flags |= PF_MEMALLOC;
-	return 1;
-}
-
-static void clear_memalloc(int memalloc)
-{
-	if (memalloc)
-		current->flags &= ~PF_MEMALLOC;
-}
-
 static ssize_t read_file(struct nandsim *ns, struct file *file, void *buf, size_t count, loff_t *pos)
 {
 	mm_segment_t old_fs;
 	ssize_t tx;
-	int err, memalloc;
+	int err;
 
 	err = get_pages(ns, file, count, *pos);
 	if (err)
 		return err;
 	old_fs = get_fs();
 	set_fs(get_ds());
-	memalloc = set_memalloc();
 	tx = vfs_read(file, (char __user *)buf, count, pos);
-	clear_memalloc(memalloc);
 	set_fs(old_fs);
 	put_pages(ns);
 	return tx;
@@ -1359,16 +1343,14 @@  static ssize_t write_file(struct nandsim *ns, struct file *file, void *buf, size
 {
 	mm_segment_t old_fs;
 	ssize_t tx;
-	int err, memalloc;
+	int err;
 
 	err = get_pages(ns, file, count, *pos);
 	if (err)
 		return err;
 	old_fs = get_fs();
 	set_fs(get_ds());
-	memalloc = set_memalloc();
 	tx = vfs_write(file, (char __user *)buf, count, pos);
-	clear_memalloc(memalloc);
 	set_fs(old_fs);
 	put_pages(ns);
 	return tx;