Patchwork [Maverick,1/1] UBUNTU: (pre-stable) mm: fix page table unmap for stack guard page properly

Submitter Leann Ogasawara
Date Aug. 18, 2010, 6:34 p.m.
Message ID <1282156496.2848.57.camel@emiko>
Permalink /patch/62079/
State Accepted
Delegated to: Leann Ogasawara

Comments

Leann Ogasawara - Aug. 18, 2010, 6:34 p.m.
Hi All,

On one of my test systems I noticed a regression that produced
multiple "BUG: scheduling while atomic" messages, after which the
system would hang.  The regression appears to have been introduced by
the 2.6.35.2 stable patches.  It has already been reported upstream
[1], and a fix has since been applied to mainline (CC'd to stable).  I've
built and tested a Maverick kernel with the patch applied and
confirmed it resolves the regression.  Even though the fix is likely to be
included in the next 2.6.35.3 stable update, I'm proposing we pull it in
as a "pre-stable" patch for Maverick, since I'm already seeing bug reports
in Launchpad which I believe this patch will resolve [2,3].

Thanks,
Leann

[1] https://bugzilla.kernel.org/show_bug.cgi?id=16588
[2] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/618846
[3] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/619335

From b674cf12c3792dc8c0cbf1082d5caf945378ca97 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sat, 14 Aug 2010 11:44:56 -0700
Subject: [PATCH] UBUNTU: (pre-stable) mm: fix page table unmap for stack guard page properly
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

We do in fact need to unmap the page table _before_ doing the whole
stack guard page logic, because if it is needed (mainly 32-bit x86 with
PAE and CONFIG_HIGHPTE, but other architectures may use it too) then it
will do a kmap_atomic/kunmap_atomic.

And those kmaps will create an atomic region that we cannot do
allocations in.  However, the whole stack expand code will need to do
anon_vma_prepare() and vma_lock_anon_vma() and they cannot do that in an
atomic region.

Now, a better model might actually be to do the anon_vma_prepare() when
_creating_ a VM_GROWSDOWN segment, and not have to worry about any of
this at page fault time.  But in the meantime, this is the
straightforward fix for the issue.

See https://bugzilla.kernel.org/show_bug.cgi?id=16588 for details.

Reported-by: Wylda <wylda@volny.cz>
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Reported-by: Mike Pagano <mpagano@gentoo.org>
Reported-by: François Valenduc <francois.valenduc@tvcablenet.be>
Tested-by: Ed Tomlinson <edt@aei.ca>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Greg KH <gregkh@suse.de>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 11ac552477e32835cb6970bf0a70c210807f5673)

Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com>
---
 mm/memory.c |   13 ++++++-------
 1 files changed, 6 insertions(+), 7 deletions(-)
Tim Gardner - Aug. 18, 2010, 6:42 p.m.
On 08/18/2010 12:34 PM, Leann Ogasawara wrote:
> [...]
>
> Even though this is likely to be
> included in the next 2.6.35.3 stable update, I'm proposing we pull it in
> as a "pre-stable" patch for Maverick as I'm already seeing bug reports
> in Launchpad which I believe will be resolved with this patch [2,3].
>
> [...]
Absolutely. This has rippled all the way down to 2.6.27.52.

Acked-by: Tim Gardner <tim.gardner@canonical.com>
Leann Ogasawara - Aug. 18, 2010, 6:51 p.m.
Applied to Maverick linux master.

Thanks,
Leann

On Wed, 2010-08-18 at 11:34 -0700, Leann Ogasawara wrote:
> [...]

Patch

diff --git a/mm/memory.c b/mm/memory.c
index 4e23287..4122947 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2792,24 +2792,23 @@  static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	spinlock_t *ptl;
 	pte_t entry;
 
-	if (check_stack_guard_page(vma, address) < 0) {
-		pte_unmap(page_table);
+	pte_unmap(page_table);
+
+	/* Check if we need to add a guard page to the stack */
+	if (check_stack_guard_page(vma, address) < 0)
 		return VM_FAULT_SIGBUS;
-	}
 
+	/* Use the zero-page for reads */
 	if (!(flags & FAULT_FLAG_WRITE)) {
 		entry = pte_mkspecial(pfn_pte(my_zero_pfn(address),
 						vma->vm_page_prot));
-		ptl = pte_lockptr(mm, pmd);
-		spin_lock(ptl);
+		page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
 		if (!pte_none(*page_table))
 			goto unlock;
 		goto setpte;
 	}
 
 	/* Allocate our own private page. */
-	pte_unmap(page_table);
-
 	if (unlikely(anon_vma_prepare(vma)))
 		goto oom;
 	page = alloc_zeroed_user_highpage_movable(vma, address);