Patchwork PowerPC BUG: using smp_processor_id() in preemptible code

login
register
mail settings
Submitter Hugh Dickins
Date Dec. 29, 2010, 10:54 p.m.
Message ID <alpine.LSU.2.00.1012291413360.22939@sister.anvils>
Download mbox | patch
Permalink /patch/76954/
State Rejected
Delegated to: Benjamin Herrenschmidt
Headers show

Comments

Hugh Dickins - Dec. 29, 2010, 10:54 p.m.
With recent 2.6.37-rc, with CONFIG_PREEMPT=y CONFIG_DEBUG_PREEMPT=y
on the PowerPC G5, I get spammed by BUG warnings each time I swapoff:

BUG: using smp_processor_id() in preemptible [00000000] code: swapoff/3974
caller is .hpte_need_flush+0x4c/0x2e8
Call Trace:
[c0000001b4a3f830] [c00000000000f3cc] .show_stack+0x6c/0x16c (unreliable)
[c0000001b4a3f8e0] [c00000000023eda0] .debug_smp_processor_id+0xe4/0x11c
[c0000001b4a3f970] [c00000000002f2f4] .hpte_need_flush+0x4c/0x2e8
[c0000001b4a3fa30] [c0000000000e7ef8] .vunmap_pud_range+0x148/0x200
[c0000001b4a3fb10] [c0000000000e8058] .vunmap_page_range+0xa8/0xd4
[c0000001b4a3fbb0] [c0000000000e80a4] .free_unmap_vmap_area+0x20/0x38
[c0000001b4a3fc40] [c0000000000e8138] .remove_vm_area+0x7c/0xb4
[c0000001b4a3fcd0] [c0000000000e8308] .__vunmap+0x50/0x104
[c0000001b4a3fd60] [c0000000000ef3fc] .SyS_swapoff+0x59c/0x6a8
[c0000001b4a3fe30] [c0000000000075a8] syscall_exit+0x0/0x40

I notice hpte_need_flush() itself acknowledges
 * Must be called from within some kind of spinlock/non-preempt region...

Though I didn't actually bisect, I believe this is since Jeremy's
64141da587241301ce8638cc945f8b67853156ec "vmalloc: eagerly clear ptes
on vunmap", which moves a call to vunmap_page_range() from one place
(which happened to be inside a spinlock) to another (where it's not).

I guess my warnings would be easily silenced by moving that call to
vunmap_page_range() down just inside the spinlock below it; but I'm
dubious that that's the right fix - it looked as if there are other
paths through vmalloc.c where vunmap_page_range() has been getting
called without preemption disabled, long before Jeremy's change,
just paths that I never happen to go down in my limited testing.

For the moment I'm using the obvious patch below to keep it quiet;
but I doubt that this is the right patch either.  I'm hoping that
ye who understand the importance of hpte_need_flush() will be best
able to judge what to do.  Or might there be other architectures
expecting to be unpreemptible there?

Thanks,
Hugh
Jeremy Fitzhardinge - Dec. 30, 2010, 1:26 a.m.
On 12/30/2010 09:54 AM, Hugh Dickins wrote:
> With recent 2.6.37-rc, with CONFIG_PREEMPT=y CONFIG_DEBUG_PREEMPT=y
> on the PowerPC G5, I get spammed by BUG warnings each time I swapoff:
>
> BUG: using smp_processor_id() in preemptible [00000000] code: swapoff/3974
> caller is .hpte_need_flush+0x4c/0x2e8
> Call Trace:
> [c0000001b4a3f830] [c00000000000f3cc] .show_stack+0x6c/0x16c (unreliable)
> [c0000001b4a3f8e0] [c00000000023eda0] .debug_smp_processor_id+0xe4/0x11c
> [c0000001b4a3f970] [c00000000002f2f4] .hpte_need_flush+0x4c/0x2e8
> [c0000001b4a3fa30] [c0000000000e7ef8] .vunmap_pud_range+0x148/0x200
> [c0000001b4a3fb10] [c0000000000e8058] .vunmap_page_range+0xa8/0xd4
> [c0000001b4a3fbb0] [c0000000000e80a4] .free_unmap_vmap_area+0x20/0x38
> [c0000001b4a3fc40] [c0000000000e8138] .remove_vm_area+0x7c/0xb4
> [c0000001b4a3fcd0] [c0000000000e8308] .__vunmap+0x50/0x104
> [c0000001b4a3fd60] [c0000000000ef3fc] .SyS_swapoff+0x59c/0x6a8
> [c0000001b4a3fe30] [c0000000000075a8] syscall_exit+0x0/0x40
>
> I notice hpte_need_flush() itself acknowledges
>  * Must be called from within some kind of spinlock/non-preempt region...
>
> Though I didn't actually bisect, I believe this is since Jeremy's
> 64141da587241301ce8638cc945f8b67853156ec "vmalloc: eagerly clear ptes
> on vunmap", which moves a call to vunmap_page_range() from one place
> (which happened to be inside a spinlock) to another (where it's not).

Yes.  I was  bit worried by the interaction between vb->lock and the pte
locks, and thought it safer to keep it outside.  Though I suppose kernel
mappings don't have pte locks, so that's a non-issue.

> I guess my warnings would be easily silenced by moving that call to
> vunmap_page_range() down just inside the spinlock below it; but I'm
> dubious that that's the right fix - it looked as if there are other
> paths through vmalloc.c where vunmap_page_range() has been getting
> called without preemption disabled, long before Jeremy's change,
> just paths that I never happen to go down in my limited testing.
>
> For the moment I'm using the obvious patch below to keep it quiet;
> but I doubt that this is the right patch either.  I'm hoping that
> ye who understand the importance of hpte_need_flush() will be best
> able to judge what to do.  Or might there be other architectures
> expecting to be unpreemptible there?

Since kernel mappings don't have a formal pte lock to rely on for
synchronization, each subsystem defines its own ad-hoc one.  In this
case that's probably vb->lock anyway, so the fix is to move
vunmap_page_range() back into its protection.

> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -37,11 +37,13 @@ static void vunmap_pte_range(pmd_t *pmd,
>  {
>  	pte_t *pte;
>  
> +	preempt_disable(); /* Stop __vunmap() triggering smp_processor_id() in preemptible from hpte_need_flush() */
>  	pte = pte_offset_kernel(pmd, addr);
>  	do {
>  		pte_t ptent = ptep_get_and_clear(&init_mm, addr, pte);
>  		WARN_ON(!pte_none(ptent) && !pte_present(ptent));
>  	} while (pte++, addr += PAGE_SIZE, addr != end);
> +	preempt_enable();
>  }
>  
>  static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end)

That looks OK, but it interferes with my plans to use
apply_to_page_range(_batch) to replace all this code.

    J
Benjamin Herrenschmidt - Dec. 30, 2010, 10:45 a.m.
On Wed, 2010-12-29 at 14:54 -0800, Hugh Dickins wrote:
> With recent 2.6.37-rc, with CONFIG_PREEMPT=y CONFIG_DEBUG_PREEMPT=y
> on the PowerPC G5, I get spammed by BUG warnings each time I swapoff:
> 
> BUG: using smp_processor_id() in preemptible [00000000] code: swapoff/3974
> caller is .hpte_need_flush+0x4c/0x2e8
> Call Trace:
> [c0000001b4a3f830] [c00000000000f3cc] .show_stack+0x6c/0x16c (unreliable)
> [c0000001b4a3f8e0] [c00000000023eda0] .debug_smp_processor_id+0xe4/0x11c
> [c0000001b4a3f970] [c00000000002f2f4] .hpte_need_flush+0x4c/0x2e8
> [c0000001b4a3fa30] [c0000000000e7ef8] .vunmap_pud_range+0x148/0x200
> [c0000001b4a3fb10] [c0000000000e8058] .vunmap_page_range+0xa8/0xd4
> [c0000001b4a3fbb0] [c0000000000e80a4] .free_unmap_vmap_area+0x20/0x38
> [c0000001b4a3fc40] [c0000000000e8138] .remove_vm_area+0x7c/0xb4
> [c0000001b4a3fcd0] [c0000000000e8308] .__vunmap+0x50/0x104
> [c0000001b4a3fd60] [c0000000000ef3fc] .SyS_swapoff+0x59c/0x6a8
> [c0000001b4a3fe30] [c0000000000075a8] syscall_exit+0x0/0x40
> 
> I notice hpte_need_flush() itself acknowledges
>  * Must be called from within some kind of spinlock/non-preempt region...

Yes, we assume that the PTE lock is always held when modifying page
tables...

> Though I didn't actually bisect, I believe this is since Jeremy's
> 64141da587241301ce8638cc945f8b67853156ec "vmalloc: eagerly clear ptes
> on vunmap", which moves a call to vunmap_page_range() from one place
> (which happened to be inside a spinlock) to another (where it's not).
> 
> I guess my warnings would be easily silenced by moving that call to
> vunmap_page_range() down just inside the spinlock below it; but I'm
> dubious that that's the right fix - it looked as if there are other
> paths through vmalloc.c where vunmap_page_range() has been getting
> called without preemption disabled, long before Jeremy's change,
> just paths that I never happen to go down in my limited testing.
> 
> For the moment I'm using the obvious patch below to keep it quiet;
> but I doubt that this is the right patch either.  I'm hoping that
> ye who understand the importance of hpte_need_flush() will be best
> able to judge what to do.  Or might there be other architectures
> expecting to be unpreemptible there?

Well, it looks like our kernel mappings tend to take some nasty
shortcuts with the PTE locking, which I suppose are legit but do break
some of my assumptions there. I need to have a closer look. Thanks for
the report !

Cheers,
Ben.

> Thanks,
> Hugh
> 
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -37,11 +37,13 @@ static void vunmap_pte_range(pmd_t *pmd,
>  {
>  	pte_t *pte;
>  
> +	preempt_disable(); /* Stop __vunmap() triggering smp_processor_id() in preemptible from hpte_need_flush() */
>  	pte = pte_offset_kernel(pmd, addr);
>  	do {
>  		pte_t ptent = ptep_get_and_clear(&init_mm, addr, pte);
>  		WARN_ON(!pte_none(ptent) && !pte_present(ptent));
>  	} while (pte++, addr += PAGE_SIZE, addr != end);
> +	preempt_enable();
>  }
>  
>  static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end)

Patch

--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -37,11 +37,13 @@  static void vunmap_pte_range(pmd_t *pmd,
 {
 	pte_t *pte;
 
+	preempt_disable(); /* Stop __vunmap() triggering smp_processor_id() in preemptible from hpte_need_flush() */
 	pte = pte_offset_kernel(pmd, addr);
 	do {
 		pte_t ptent = ptep_get_and_clear(&init_mm, addr, pte);
 		WARN_ON(!pte_none(ptent) && !pte_present(ptent));
 	} while (pte++, addr += PAGE_SIZE, addr != end);
+	preempt_enable();
 }
 
 static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end)