[v2,2/2] powerpc/mm: Ensure "special" zones are empty

Message ID: 1462434849-14935-2-git-send-email-oohall@gmail.com (mailing list archive)
State: Changes Requested

Commit Message

Oliver O'Halloran May 5, 2016, 7:54 a.m. UTC
The mm zone mechanism was traditionally used by arch-specific code to
partition memory into allocation zones. However, there are several zones
that are managed by the mm subsystem rather than the architecture. Most
architectures set the max PFN of these special zones to zero, but on
powerpc we set them to ~0ul. This, in conjunction with a bug in
free_area_init_nodes(), results in all of system memory being placed in
ZONE_DEVICE when it is enabled. Device memory cannot be used for regular
kernel memory allocations, so this causes a kernel panic at boot.

Given the planned addition of more mm-managed zones (ZONE_CMA), we should
be consistent with every other architecture and set the max PFN for
these zones to zero.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Cc: linux-mm@kvack.org
---
 arch/powerpc/mm/mem.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
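
The initializer in this patch relies on GNU C range designators. As a
reading aid, here is a minimal standalone sketch of how the two ranges
expand. The enum below is a hypothetical, simplified zone list, and
TOP_ZONE is assumed to be ZONE_NORMAL; the real values depend on the
kernel configuration. The second array is not part of the patch. It is
only a variant that follows the wording of the comment ("zones past
TOP_ZONE"), shown for comparison.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified zone list for illustration only; the real
 * enum zone_type depends on the kernel configuration. */
enum zone_type {
	ZONE_DMA,
	ZONE_NORMAL,
	ZONE_MOVABLE,
	ZONE_DEVICE,
	MAX_NR_ZONES
};

/* Assumption: TOP_ZONE is the last zone managed by the architecture. */
#define TOP_ZONE ZONE_NORMAL

/* Ranges exactly as written in the patch (GNU C range designators). */
static unsigned long as_posted[MAX_NR_ZONES] = {
	[0        ... TOP_ZONE     - 1] = ~0UL,
	[TOP_ZONE ... MAX_NR_ZONES - 1] = 0
};

/* Variant matching the comment text "zones past TOP_ZONE".
 * This is NOT from the patch; it is shown only for comparison. */
static unsigned long past_top_zone[MAX_NR_ZONES] = {
	[0            ... TOP_ZONE]         = ~0UL,
	[TOP_ZONE + 1 ... MAX_NR_ZONES - 1] = 0
};

/* Print both arrays side by side, one line per zone index. */
static void dump_zone_limits(void)
{
	/* The sketch assumes at least one zone exists past TOP_ZONE. */
	assert(TOP_ZONE + 1 < MAX_NR_ZONES);

	for (int i = 0; i < MAX_NR_ZONES; i++)
		printf("zone %d: as_posted=%#lx past_top_zone=%#lx\n",
		       i, as_posted[i], past_top_zone[i]);
}
```

Calling dump_zone_limits() from a small main() shows that, with the
ranges as posted, the entry for TOP_ZONE itself starts at zero, while
the comparison variant leaves it at ~0UL. Whether this is connected to
the breakage reported in the comments is not established in this thread.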

Comments

Michael Ellerman May 9, 2016, 11:55 p.m. UTC | #1
On Thu, 2016-05-05 at 07:54:09 UTC, Oliver O'Halloran wrote:
> The mm zone mechanism was traditionally used by arch specific code to
> partition memory into allocation zones. However there are several zones
> that are managed by the mm subsystem rather than the architecture. Most
> architectures set the max PFN of these special zones to zero, however on
> powerpc we set them to ~0ul. This, in conjunction with a bug in
> free_area_init_nodes() results in all of system memory being placed in
> ZONE_DEVICE when enabled. Device memory cannot be used for regular kernel
> memory allocations so this will cause a kernel panic at boot.

This is breaking my freescale machine:

  Sorting __ex_table...
  Unable to handle kernel paging request for data at address 0xc000000101e28020
  Faulting instruction address: 0xc0000000009ab698
  cpu 0x0: Vector: 300 (Data Access) at [c000000000acbb30]
      pc: c0000000009ab698: .reserve_bootmem_region+0x64/0x8c
      lr: c0000000009883d0: .free_all_bootmem+0x70/0x200
      sp: c000000000acbdb0
     msr: 80021000
     dar: c000000101e28020
   dsisr: 800000
    current = 0xc000000000a07640
    paca    = 0xc00000003fff5000	 softe: 0	 irq_happened: 0x01
      pid   = 0, comm = swapper
  Linux version 4.6.0-rc3-00160-gc09920947f23 (michael@ka1) (gcc version 5.3.0 (GCC) ) #5 SMP Tue May 10 09:44:11 AEST 2016
  enter ? for help
  [link register   ] c0000000009883d0 .free_all_bootmem+0x70/0x200
  [c000000000acbdb0] c000000000988398 .free_all_bootmem+0x38/0x200 (unreliable)
  [c000000000acbe80] c00000000097b700 .mem_init+0x5c/0x7c
  [c000000000acbef0] c000000000971a0c .start_kernel+0x28c/0x4e4
  [c000000000acbf90] c000000000000544 start_here_common+0x20/0x5c
  0:mon> ? 

I can give you access some time if you need to debug it.

cheers
Balbir Singh May 11, 2016, 7:11 a.m. UTC | #2
On Tue, 2016-05-10 at 09:55 +1000, Michael Ellerman wrote:
> On Thu, 2016-05-05 at 07:54:09 UTC, Oliver O'Halloran wrote:
> > 
> > The mm zone mechanism was traditionally used by arch specific code to
> > partition memory into allocation zones. However there are several zones
> > that are managed by the mm subsystem rather than the architecture. Most
> > architectures set the max PFN of these special zones to zero, however on
> > powerpc we set them to ~0ul. This, in conjunction with a bug in
> > free_area_init_nodes() results in all of system memory being placed in
> > ZONE_DEVICE when enabled. Device memory cannot be used for regular kernel
> > memory allocations so this will cause a kernel panic at boot.
> This is breaking my freescale machine:

>   Sorting __ex_table...
>   Unable to handle kernel paging request for data at address 0xc000000101e28020
>   Faulting instruction address: 0xc0000000009ab698
>   cpu 0x0: Vector: 300 (Data Access) at [c000000000acbb30]
>       pc: c0000000009ab698: .reserve_bootmem_region+0x64/0x8c
>       lr: c0000000009883d0: .free_all_bootmem+0x70/0x200
>       sp: c000000000acbdb0
>      msr: 80021000
>      dar: c000000101e28020
>    dsisr: 800000
>     current = 0xc000000000a07640
>     paca    = 0xc00000003fff5000	 softe: 0	 irq_happened: 0x01
>       pid   = 0, comm = swapper
>   Linux version 4.6.0-rc3-00160-gc09920947f23 (michael@ka1) (gcc version 5.3.0 (GCC) ) #5 SMP Tue May 10 09:44:11 AEST 2016
>   enter ? for help
>   [link register   ] c0000000009883d0 .free_all_bootmem+0x70/0x200
>   [c000000000acbdb0] c000000000988398 .free_all_bootmem+0x38/0x200 (unreliable)
>   [c000000000acbe80] c00000000097b700 .mem_init+0x5c/0x7c
>   [c000000000acbef0] c000000000971a0c .start_kernel+0x28c/0x4e4
>   [c000000000acbf90] c000000000000544 start_here_common+0x20/0x5c
>   0:mon> ? 

> I can give you access some time if you need to debug it.



Could you also please post the part of the boot log that contains the
zone and node information? That would give some indication of what is
broken. Or you could just send the whole dmesg.

Thanks,
Balbir Singh
Patch

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 8f4c19789a38..f0a058ebb6d7 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -239,8 +239,14 @@  static int __init mark_nonram_nosave(void)
 
 static bool zone_limits_final;
 
+/*
+ * The memory zones past TOP_ZONE are managed by generic mm code.
+ * These should be set to zero since that's what every other
+ * architecture does.
+ */
 static unsigned long max_zone_pfns[MAX_NR_ZONES] = {
-	[0 ... MAX_NR_ZONES - 1] = ~0UL
+	[0        ... TOP_ZONE     - 1] = ~0UL,
+	[TOP_ZONE ... MAX_NR_ZONES - 1] = 0
 };
 
 /*