Patchwork [natty/ti-omap4,CVE,1/1] oom: use pte pages in OOM score

Submitter Andy Whitcroft
Date Feb. 6, 2012, 9:37 a.m.
Message ID <1328521034-17401-2-git-send-email-apw@canonical.com>
Permalink /patch/139700/
State New

Comments

Andy Whitcroft - Feb. 6, 2012, 9:37 a.m.
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

PTE pages eat up memory just like anything else, but we do not account for
them in any way in the OOM scores.  They are also _guaranteed_ to get
freed up when a process is OOM killed, while RSS is not.

Reported-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@kernel.org>		[2.6.36+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit f755a042d82b51b54f3bdd0890e5ea56c0fb6807)
CVE-2011-2498
BugLink: http://bugs.launchpad.net/bugs/922374
Signed-off-by: Andy Whitcroft <apw@canonical.com>
---
 mm/oom_kill.c |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)
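
To make the effect of the change concrete, here is a minimal userspace sketch of the baseline score with and without page-table pages. It is not the kernel code: `struct mm_usage`, `baseline_old()` and `baseline_new()` are hypothetical names introduced only for illustration, and the kernel reads these counters from `p->mm` rather than from a plain struct.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-mm counters the kernel consults. */
struct mm_usage {
	uint64_t rss_pages;    /* resident pages (get_mm_rss() in the kernel) */
	uint64_t pte_pages;    /* page-table pages (mm->nr_ptes in the kernel) */
	uint64_t swap_entries; /* swapped-out pages (MM_SWAPENTS counter) */
};

/* Baseline before the patch: rss and swap as a share of RAM, scaled to 0..1000. */
uint64_t baseline_old(const struct mm_usage *u, uint64_t totalpages)
{
	return (u->rss_pages + u->swap_entries) * 1000 / totalpages;
}

/* Baseline after the patch: page-table pages are counted as well. */
uint64_t baseline_new(const struct mm_usage *u, uint64_t totalpages)
{
	uint64_t points = u->rss_pages + u->pte_pages;

	points += u->swap_entries;
	points *= 1000;
	points /= totalpages;
	return points;
}
```

With made-up numbers, a task holding 1000 resident pages and 200000 page-table pages on a machine with 1000000 pages goes from a baseline of 1 to a baseline of 201, so a task whose memory is mostly page tables no longer looks harmless to the OOM killer.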
Stefan Bader - Feb. 6, 2012, 9:44 a.m.
On 06.02.2012 10:37, Andy Whitcroft wrote:
> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> 
> PTE pages eat up memory just like anything else, but we do not account for
> them in any way in the OOM scores.  They are also _guaranteed_ to get
> freed up when a process is OOM killed, while RSS is not.
> 
> Reported-by: Dave Hansen <dave@linux.vnet.ibm.com>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Acked-by: David Rientjes <rientjes@google.com>
> Cc: <stable@kernel.org>		[2.6.36+]
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> 
> (cherry picked from commit f755a042d82b51b54f3bdd0890e5ea56c0fb6807)
> CVE-2011-2498
> BugLink: http://bugs.launchpad.net/bugs/922374
> Signed-off-by: Andy Whitcroft <apw@canonical.com>
> ---
>  mm/oom_kill.c |    9 ++++++---
>  1 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index ea16f72..49ea0cc 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -172,10 +172,13 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
>  
>  	/*
>  	 * The baseline for the badness score is the proportion of RAM that each
> -	 * task's rss and swap space use.
> +	 * task's rss, pagetable and swap space use.
>  	 */
> -	points = (get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS)) * 1000 /
> -			totalpages;
> +	points = get_mm_rss(p->mm) + p->mm->nr_ptes;
> +	points += get_mm_counter(p->mm, MM_SWAPENTS);
> +
> +	points *= 1000;
> +	points /= totalpages;
>  	task_unlock(p);
>  
>  	/*

Essentially only adds the pte count, and it is a cherry pick anyway...

Acked-by: Stefan Bader <smb@canonical.com>
Colin King - Feb. 6, 2012, 11:11 a.m.
On 06/02/12 09:37, Andy Whitcroft wrote:
> From: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>
>
> PTE pages eat up memory just like anything else, but we do not account for
> them in any way in the OOM scores.  They are also _guaranteed_ to get
> freed up when a process is OOM killed, while RSS is not.
>
> Reported-by: Dave Hansen<dave@linux.vnet.ibm.com>
> Signed-off-by: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>
> Cc: Hugh Dickins<hughd@google.com>
> Cc: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Oleg Nesterov<oleg@redhat.com>
> Acked-by: David Rientjes<rientjes@google.com>
> Cc:<stable@kernel.org>		[2.6.36+]
> Signed-off-by: Andrew Morton<akpm@linux-foundation.org>
> Signed-off-by: Linus Torvalds<torvalds@linux-foundation.org>
>
> (cherry picked from commit f755a042d82b51b54f3bdd0890e5ea56c0fb6807)
> CVE-2011-2498
> BugLink: http://bugs.launchpad.net/bugs/922374
> Signed-off-by: Andy Whitcroft<apw@canonical.com>
> ---
>   mm/oom_kill.c |    9 ++++++---
>   1 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index ea16f72..49ea0cc 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -172,10 +172,13 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
>
>   	/*
>   	 * The baseline for the badness score is the proportion of RAM that each
> -	 * task's rss and swap space use.
> +	 * task's rss, pagetable and swap space use.
>   	 */
> -	points = (get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS)) * 1000 /
> -			totalpages;
> +	points = get_mm_rss(p->mm) + p->mm->nr_ptes;
> +	points += get_mm_counter(p->mm, MM_SWAPENTS);
> +
> +	points *= 1000;
> +	points /= totalpages;
>   	task_unlock(p);
>
>   	/*

Makes sense to add in the pte count, and this is a cherry pick, so...

Acked-by: Colin King <colin.king@canonical.com>
Andy Whitcroft - Feb. 6, 2012, 11:40 a.m.
Applied.

-apw
Herton Ronaldo Krzesinski - Feb. 6, 2012, 12:43 p.m.
On Mon, Feb 06, 2012 at 09:37:14AM +0000, Andy Whitcroft wrote:
> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> 
> PTE pages eat up memory just like anything else, but we do not account for
> them in any way in the OOM scores.  They are also _guaranteed_ to get
> freed up when a process is OOM killed, while RSS is not.
> 
> Reported-by: Dave Hansen <dave@linux.vnet.ibm.com>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Acked-by: David Rientjes <rientjes@google.com>
> Cc: <stable@kernel.org>		[2.6.36+]
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> 
> (cherry picked from commit f755a042d82b51b54f3bdd0890e5ea56c0fb6807)
> CVE-2011-2498
> BugLink: http://bugs.launchpad.net/bugs/922374
> Signed-off-by: Andy Whitcroft <apw@canonical.com>
> ---
>  mm/oom_kill.c |    9 ++++++---
>  1 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index ea16f72..49ea0cc 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -172,10 +172,13 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
>  
>  	/*
>  	 * The baseline for the badness score is the proportion of RAM that each
> -	 * task's rss and swap space use.
> +	 * task's rss, pagetable and swap space use.
>  	 */
> -	points = (get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS)) * 1000 /
> -			totalpages;
> +	points = get_mm_rss(p->mm) + p->mm->nr_ptes;
> +	points += get_mm_counter(p->mm, MM_SWAPENTS);
> +
> +	points *= 1000;
> +	points /= totalpages;

Splitting up the computation introduced a bug on 64-bit arches, which
is fixed by commit ff05b6f. ARM should be unaffected, but natty has this
broken at least on x86_64; oneiric already got the fix through stable.
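
For readers following along, a rough sketch of why the split matters. This assumes, as I understand the upstream fix, that `points` was a 32-bit signed `int` at the time: the old single expression was evaluated in `unsigned long` and only the final 0..1000 result was stored in `points`, while the split version stores the intermediate `points *= 1000` in the `int` itself. The helper names below are hypothetical and the wraparound is computed with 64-bit arithmetic, so the example itself never overflows a signed type.

```c
#include <stdint.h>

/* Nonzero if pages * 1000 still fits in a signed 32-bit accumulator. */
int scaled_fits_in_int32(uint64_t pages)
{
	return pages <= (uint64_t)INT32_MAX / 1000;
}

/*
 * Value a 32-bit two's-complement accumulator would end up holding after
 * pages * 1000, derived with 64-bit arithmetic only.
 */
int64_t scaled_wrapped_to_int32(uint64_t pages)
{
	uint64_t low = (pages * 1000) & 0xffffffffu; /* keep the low 32 bits */

	if (low >= 0x80000000u)                      /* sign bit set */
		return (int64_t)low - 0x100000000LL;
	return (int64_t)low;
}
```

With an illustrative 3000000 pages (about 11.4 GiB at 4 KiB per page), `scaled_fits_in_int32()` returns 0 and `scaled_wrapped_to_int32()` returns -1294967296, i.e. the intermediate goes negative. A 32-bit task presumably cannot map enough pages to cross the roughly 2.1 million page threshold, which would match the remark that ARM is unaffected.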

>  	task_unlock(p);
>  
>  	/*
Tim Gardner - Feb. 6, 2012, 12:55 p.m.
On 02/06/2012 05:43 AM, Herton Ronaldo Krzesinski wrote:
> On Mon, Feb 06, 2012 at 09:37:14AM +0000, Andy Whitcroft wrote:
>> From: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>
>>
>> PTE pages eat up memory just like anything else, but we do not account for
>> them in any way in the OOM scores.  They are also _guaranteed_ to get
>> freed up when a process is OOM killed, while RSS is not.
>>
>> Reported-by: Dave Hansen<dave@linux.vnet.ibm.com>
>> Signed-off-by: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>
>> Cc: Hugh Dickins<hughd@google.com>
>> Cc: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>> Cc: Oleg Nesterov<oleg@redhat.com>
>> Acked-by: David Rientjes<rientjes@google.com>
>> Cc:<stable@kernel.org>		[2.6.36+]
>> Signed-off-by: Andrew Morton<akpm@linux-foundation.org>
>> Signed-off-by: Linus Torvalds<torvalds@linux-foundation.org>
>>
>> (cherry picked from commit f755a042d82b51b54f3bdd0890e5ea56c0fb6807)
>> CVE-2011-2498
>> BugLink: http://bugs.launchpad.net/bugs/922374
>> Signed-off-by: Andy Whitcroft<apw@canonical.com>
>> ---
>>   mm/oom_kill.c |    9 ++++++---
>>   1 files changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> index ea16f72..49ea0cc 100644
>> --- a/mm/oom_kill.c
>> +++ b/mm/oom_kill.c
>> @@ -172,10 +172,13 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
>>
>>   	/*
>>   	 * The baseline for the badness score is the proportion of RAM that each
>> -	 * task's rss and swap space use.
>> +	 * task's rss, pagetable and swap space use.
>>   	 */
>> -	points = (get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS)) * 1000 /
>> -			totalpages;
>> +	points = get_mm_rss(p->mm) + p->mm->nr_ptes;
>> +	points += get_mm_counter(p->mm, MM_SWAPENTS);
>> +
>> +	points *= 1000;
>> +	points /= totalpages;
>
> Splitting up the computation introduced a bug on 64-bit arches, which
> is fixed by commit ff05b6f. ARM should be unaffected, but natty has this
> broken at least on x86_64; oneiric already got the fix through stable.
>


Good catch. Applied ff05b6f to natty/master-next.

rtg
Andy Whitcroft - Feb. 6, 2012, 2:23 p.m.
On Mon, Feb 06, 2012 at 10:43:50AM -0200, Herton Ronaldo Krzesinski wrote:
> On Mon, Feb 06, 2012 at 09:37:14AM +0000, Andy Whitcroft wrote:
> > From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > 
> > PTE pages eat up memory just like anything else, but we do not account for
> > them in any way in the OOM scores.  They are also _guaranteed_ to get
> > freed up when a process is OOM killed, while RSS is not.
> > 
> > Reported-by: Dave Hansen <dave@linux.vnet.ibm.com>
> > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > Cc: Hugh Dickins <hughd@google.com>
> > Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > Cc: Oleg Nesterov <oleg@redhat.com>
> > Acked-by: David Rientjes <rientjes@google.com>
> > Cc: <stable@kernel.org>		[2.6.36+]
> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> > 
> > (cherry picked from commit f755a042d82b51b54f3bdd0890e5ea56c0fb6807)
> > CVE-2011-2498
> > BugLink: http://bugs.launchpad.net/bugs/922374
> > Signed-off-by: Andy Whitcroft <apw@canonical.com>
> > ---
> >  mm/oom_kill.c |    9 ++++++---
> >  1 files changed, 6 insertions(+), 3 deletions(-)
> > 
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index ea16f72..49ea0cc 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -172,10 +172,13 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
> >  
> >  	/*
> >  	 * The baseline for the badness score is the proportion of RAM that each
> > -	 * task's rss and swap space use.
> > +	 * task's rss, pagetable and swap space use.
> >  	 */
> > -	points = (get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS)) * 1000 /
> > -			totalpages;
> > +	points = get_mm_rss(p->mm) + p->mm->nr_ptes;
> > +	points += get_mm_counter(p->mm, MM_SWAPENTS);
> > +
> > +	points *= 1000;
> > +	points /= totalpages;
> 
> Splitting up the computation introduced a bug on 64-bit arches, which
> is fixed by commit ff05b6f. ARM should be unaffected, but natty has this
> broken at least on x86_64; oneiric already got the fix through stable.

Well spotted.  Thanks.

-apw

Patch

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index ea16f72..49ea0cc 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -172,10 +172,13 @@  unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
 
 	/*
 	 * The baseline for the badness score is the proportion of RAM that each
-	 * task's rss and swap space use.
+	 * task's rss, pagetable and swap space use.
 	 */
-	points = (get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS)) * 1000 /
-			totalpages;
+	points = get_mm_rss(p->mm) + p->mm->nr_ptes;
+	points += get_mm_counter(p->mm, MM_SWAPENTS);
+
+	points *= 1000;
+	points /= totalpages;
 	task_unlock(p);
 
 	/*