diff mbox series

external/mambo: Disable MEMORY_OVERFLOW

Message ID 20200625115655.955736-1-mpe@ellerman.id.au
State Accepted
Headers show
Series external/mambo: Disable MEMORY_OVERFLOW | expand

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success Successfully applied on branch master (68dc040a6540c218d20517764ff5d740a3626c55)
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot success Test snowpatch/job/snowpatch-skiboot on branch master
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot-dco success Signed-off-by present

Commit Message

Michael Ellerman June 25, 2020, 11:56 a.m. UTC
Mambo has a strange feature called MEMORY_OVERFLOW, enabled by
default, which causes some accesses to non-existent memory addresses
to transparently "create" memory.

This can be confusing when debugging, eg:

  systemsim % mysim cpu 0 display spr pc
  0xC0000000000246B8
  systemsim % mysim memory display 0xC0000000000246B8 8
  0x0000000000000000

Appears to show that the memory at pc (NIP) is currently zeroes.

The astute observer will note that "mysim memory display" takes
physical addresses, not effective addresses. So unless this machine
has > 12XB of RAM, this access should have failed as there is no
memory at that address.

Turning MEMORY_OVERFLOW off gives us a much more sensible result:

  systemsim % mysim memory display 0xC0000000000246B8 8
  Illegal Address 0xC0000000000246B8

It doesn't appear to have any effect on accesses done from Linux, with
the setting enabled or disabled we still get a machine check for bad
accesses in real mode:

  0:mon> p $__mmu_off
  return value is 0x9000000030001003
  0:mon> d c008000000000000
  c008000000000000 [15447.242793] Disabling lock debugging due to kernel taint
  [15447.242824] MCE: CPU0: machine check (Severe) Host Real address Load/Store (foreign) [Not recovered]
  [15447.242868] MCE: CPU0: NIP: [c000000000103090] mread+0x90/0x190
  [15447.242897] MCE: CPU0: Initiator CPU
  [15447.242918] MCE: CPU0: Hardware error
  [15447.242939] opal: Hardware platform error: Unrecoverable Machine Check exception
  [15447.242978] CPU: 0 PID: 18 Comm: kopald Tainted: G   M              5.7.0-gcc-8.2.0-13354-gd475a86526b5-dirty #127
  [15447.243022] NIP:  c000000000103090 LR: c000000000103034 CTR: c0000000000c2050
  [15447.243059] REGS: c0000000fffbfd70 TRAP: 0200   Tainted: G   M               (5.7.0-gcc-8.2.0-13354-gd475a86526b5-dirty)
  [15447.243101] MSR:  9000000000201003 <SF,HV,ME,RI,LE>  CR: 28002224  XER: 20040000
  [15447.243150] CFAR: c000000000029660 DAR: c008000000000000 DSISR: 00000008 IRQMASK: 3
  [15447.243150] GPR00: 0000000028002224 c0000000f05074c0 c000000001775e00 0000000000000000
  [15447.243150] GPR04: c0000000f0507768 0000000000000010 0000000000000010 c0000000f0507768
  [15447.243150] GPR08: 0007ffff0faf8899 0000000000000000 c0000000f0507767 9000000030001003
  [15447.243150] GPR12: 0000000031c10000 c000000001970000 c0000000f0507738 0000000000000000
  [15447.243150] GPR16: 000000000000002e 0000000000000010 000000000000002e 0000000000000010
  [15447.243150] GPR20: c000000000e563d0 c000000000ffcf88 c000000000fddf30 c008000000000000
  [15447.243150] GPR24: c0000000f0507767 c000000000ffc678 0000000000000040 c0000000f0507878
  [15447.243150] GPR28: c0000000f0507768 c0000000f0507768 0000000000000010 0000000000000000
  [15447.243485] NIP [c000000000103090] mread+0x90/0x190
  [15447.243514] LR [c000000000103034] mread+0x34/0x190
  [15447.243538] Call Trace:
  [15447.243559] [c0000000f05074c0] [c0000000f0507510] 0xc0000000f0507510 (unreliable)
  [15447.243602] [c0000000f0507510] [c000000000107b48] xmon_core+0xd18/0x3a10
  [15447.243640] [c0000000f0507850] [c00000000010a870] xmon+0x30/0x80
  [15447.243677] [c0000000f05079f0] [c0000000001050d4] sysrq_handle_xmon+0xc4/0xd0
  [15447.243720] [c0000000f0507a20] [c000000000882954] __handle_sysrq+0xd4/0x220
  [15447.243758] [c0000000f0507ac0] [c00000000089db68] __hvc_poll+0x268/0x4b0
  [15447.243796] [c0000000f0507b60] [c00000000089e824] hvc_handle_interrupt+0x24/0x70
  [15447.243838] [c0000000f0507b80] [c0000000001c2474] __handle_irq_event_percpu+0xb4/0x350
  [15447.243880] [c0000000f0507c50] [c0000000001c2754] handle_irq_event_percpu+0x44/0xc0
  [15447.243923] [c0000000f0507c90] [c0000000001c2844] handle_irq_event+0x74/0xc0
  [15447.243965] [c0000000f0507cc0] [c0000000001c9770] handle_level_irq+0xe0/0x1c0
  [15447.244006] [c0000000f0507cf0] [c0000000001c0a44] generic_handle_irq+0x54/0x80
  [15447.244046] [c0000000f0507d10] [c0000000000ca050] opal_handle_events+0x130/0x150
  [15447.244088] [c0000000f0507d60] [c0000000000c1e7c] kopald+0x9c/0x140
  [15447.244124] [c0000000f0507db0] [c00000000016ea9c] kthread+0x19c/0x1b0
  [15447.244161] [c0000000f0507e20] [c00000000000d1a8] ret_from_kernel_thread+0x5c/0x74

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 external/mambo/skiboot.tcl | 1 +
 1 file changed, 1 insertion(+)

Comments

Gustavo Romero June 26, 2020, 5:31 p.m. UTC | #1
Hi Michael,

On 6/25/20 8:56 AM, Michael Ellerman wrote:
> Mambo has a strange feature called MEMORY_OVERFLOW, enabled by
> default, which causes some accesses to non-existent memory addresses
> to transparently "create" memory.
> 
> This can be confusing when debugging, eg:
> 
>    systemsim % mysim cpu 0 display spr pc
>    0xC0000000000246B8
>    systemsim % mysim memory display 0xC0000000000246B8 8
>    0x0000000000000000
> 
> Appears to show that the memory at pc (NIP) is currently zeroes.
> 
> The astute observer will note that "mysim memory display" takes
> physical addresses, not effective addresses. So unless this machine
> has > 12XB of RAM, this access should have failed as there is no
> memory at that address.
> 
> Turning MEMORY_OVERFLOW off gives us a much more sensible result:
> 
>    systemsim % mysim memory display 0xC0000000000246B8 8
>    Illegal Address 0xC0000000000246B8
> 
> It doesn't appear to have any effect on accesses done from Linux, with
> the setting enabled or disabled we still get a machine check for bad
> accesses in real mode:

With that change applied, on mambo P10 running on a POWER8 I'm getting
the following mambo exception that forbids the kernel to continue booting:

[...]
142233280: (536372251): [    0.001554] printk: bootconsole [udbg0] disabled
142387801: (537126772): [    0.001870] mempolicy: Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
142412629: (537151600): [    0.001919] pid_max: default: 32768 minimum: 301
142549799: (537888770): [    0.002187] Mount-cache hash table entries: 16384 (order: 1, 131072 bytes, linear)
142570582: (537909553): [    0.002228] Mountpoint-cache hash table entries: 16384 (order: 1, 131072 bytes, linear)
143544723: (541883694): [    0.004130] EEH: PowerNV platform initialized
143557364: (541896335): [    0.004155] POWER9 performance monitor hardware support registered
143629574: (541968545): [    0.004296] rcu: Hierarchical SRCU implementation.
143904253: (543143224): [    0.004833] smp: Bringing up secondary CPUs ...
WARNING: 145271326: (548660031): Write_Mapped_Memory_Reg: Unknown address: 0x00000E995A3AF7B0, length=8
FATAL ERROR: 145271326: (548660031): Attempt to store non-existent address 0x00000E995A3AF7B0
INFO: 145271326: (548660032): ** Execution stopped: Mambo Error,  **
145271326: ** finished running 548660032 instructions **
systemsim % c
145785082: (550674532): [    0.008506] smp: Brought up 2 nodes, 4 CPUs
145798101: (550687551): [    0.008527] numa: Node 0 CPUs: 0-1
145810514: (550799964): [    0.008551] numa: Node 1 CPUs: 2-3
145820419: (550809869): [    0.008575] Using standard scheduler topology
159140153: (604064242): [    0.034590] node 0 initialised, 118050 pages in 10ms
159167293: (604195562): [    0.034643] pgdatinit0 (31) used greatest stack depth: 13408 bytes left
160000000: [0:0:0]: (PC:0xC0000000001C164C) :      7.1 Mega-Inst/Sec :      7.1 Mega-Cycles/Sec [17 Zaps  0 PA-Zaps] *ON*  [0:0:0] pri=4 extra=0
160000000: [0:0:1]: (PC:0xC0000000003A162C) :      7.1 Mega-Inst/Sec *ON*  [0:0:1] pri=4 extra=0
160000000: [0:1:0]: (PC:0xC00000000019AA58) :      7.1 Mega-Inst/Sec *ON*  [0:1:0] pri=2 extra=0
160000000: [0:1:1]: (PC:0xC00000000019A5F4) :      7.1 Mega-Inst/Sec *ON*  [0:1:1] pri=2 extra=0
160134492: (608065254): [    0.036532] devtmpfs: initialized
163550958: (621787960): [    0.043205] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
163574637: (621811639): [    0.043251] futex hash table entries: 1024 (order: 1, 131072 bytes, linear)
164292187: (624696519): [    0.044653] NET: Registered protocol family 16
164770384: (626692640): [    0.045587] audit: initializing netlink subsys (disabled)
165130649: (628049925): [    0.046290] audit: type=2000 audit(1024007219.020:1): state=initialized audit_enabled=0 res=1
165359745: (629069307): [    0.046738] cpuidle: using governor menu
165482259: (629491821): [    0.046977] pstore: Registered nvram as persistent store backend
167333983: (636947373): [    0.050594] PCI: Probing PCI hardware
168184808: (640240704): [    0.052255] cpuidle-powernv: Default stop: psscr = 0x0000000000000300,mask=0x00000000003003ff
168204280: (640260428): [    0.052293] cpuidle-powernv: Deepest stop: psscr = 0x0000000000300322,mask=0x00000000003003ff
168223015: (640279240): [    0.052330] cpuidle-powernv: First stop level that may lose SPRs = 0x10
168238094: (640294710): [    0.052359] cpuidle-powernv: First stop level that may lose timebase = 0x10
175060241: (647321380): [    0.065684] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
175077028: (647338167): [    0.065717] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
177879965: (651314704): [    0.071191] iommu: Default domain type: Translated
177936270: (651386119): [    0.071302] vgaarb: loaded
178219574: (651669423): [    0.071855] SCSI subsystem initialized
178331796: (651781645): [    0.072074] usbcore: registered new interface driver usbfs
178364558: (651814407): [    0.072138] usbcore: registered new interface driver hub
178397036: (651981802): [    0.072201] usbcore: registered new device driver usb
179169296: (653522982): [    0.073710] clocksource: Switched to clocksource timebase
WARNING: 179516247: (653875633): Write_Mapped_Memory_Reg: Unknown address: 0x00000E995A3AF7B0, length=8
FATAL ERROR: 179516247: (653875633): Attempt to store non-existent address 0x00000E995A3AF7B0
INFO: 179516247: (653875634): ** Execution stopped: Mambo Error,  **
179516247: ** finished running 653875634 instructions **
systemsim %

I don't understand yet what's happening. Gleen does not hit it either
(he is running mambo on a POWER9 tho). Which mambo / host combination
have you tried it?

Thanks.


Cheers,
Gustavo
Michael Ellerman June 29, 2020, 1:27 a.m. UTC | #2
Gustavo Romero <gromero@linux.vnet.ibm.com> writes:
> Hi Michael,
>
> On 6/25/20 8:56 AM, Michael Ellerman wrote:
>> Mambo has a strange feature called MEMORY_OVERFLOW, enabled by
>> default, which causes some accesses to non-existent memory addresses
>> to transparently "create" memory.
>> 
>> This can be confusing when debugging, eg:
>> 
>>    systemsim % mysim cpu 0 display spr pc
>>    0xC0000000000246B8
>>    systemsim % mysim memory display 0xC0000000000246B8 8
>>    0x0000000000000000
>> 
>> Appears to show that the memory at pc (NIP) is currently zeroes.
>> 
>> The astute observer will note that "mysim memory display" takes
>> physical addresses, not effective addresses. So unless this machine
>> has > 12XB of RAM, this access should have failed as there is no
>> memory at that address.
>> 
>> Turning MEMORY_OVERFLOW off gives us a much more sensible result:
>> 
>>    systemsim % mysim memory display 0xC0000000000246B8 8
>>    Illegal Address 0xC0000000000246B8
>> 
>> It doesn't appear to have any effect on accesses done from Linux, with
>> the setting enabled or disabled we still get a machine check for bad
>> accesses in real mode:
>
> With that change applied, on mambo P10 running on a POWER8 I'm getting
> the following mambo exception that forbids the kernel to continue booting:

This looks like exactly the kind of thing we want to catch, so that's
"good" :)

> [...]
> 142233280: (536372251): [    0.001554] printk: bootconsole [udbg0] disabled
> 142387801: (537126772): [    0.001870] mempolicy: Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
> 142412629: (537151600): [    0.001919] pid_max: default: 32768 minimum: 301
> 142549799: (537888770): [    0.002187] Mount-cache hash table entries: 16384 (order: 1, 131072 bytes, linear)
> 142570582: (537909553): [    0.002228] Mountpoint-cache hash table entries: 16384 (order: 1, 131072 bytes, linear)
> 143544723: (541883694): [    0.004130] EEH: PowerNV platform initialized
> 143557364: (541896335): [    0.004155] POWER9 performance monitor hardware support registered
> 143629574: (541968545): [    0.004296] rcu: Hierarchical SRCU implementation.
> 143904253: (543143224): [    0.004833] smp: Bringing up secondary CPUs ...
> WARNING: 145271326: (548660031): Write_Mapped_Memory_Reg: Unknown address: 0x00000E995A3AF7B0, length=8
> FATAL ERROR: 145271326: (548660031): Attempt to store non-existent address 0x00000E995A3AF7B0
> INFO: 145271326: (548660032): ** Execution stopped: Mambo Error,  **
> 145271326: ** finished running 548660032 instructions **

Can you see where the bad store came from, "bt" should give you a backtrace.

Then "p pc" should give you the PC and "di <that value>" should show us
what instruction it was.

> systemsim % c
> 145785082: (550674532): [    0.008506] smp: Brought up 2 nodes, 4 CPUs
> 145798101: (550687551): [    0.008527] numa: Node 0 CPUs: 0-1
> 145810514: (550799964): [    0.008551] numa: Node 1 CPUs: 2-3

Does it still happen with a single CPU?

> 145820419: (550809869): [    0.008575] Using standard scheduler topology
> 159140153: (604064242): [    0.034590] node 0 initialised, 118050 pages in 10ms
> 159167293: (604195562): [    0.034643] pgdatinit0 (31) used greatest stack depth: 13408 bytes left
> 160000000: [0:0:0]: (PC:0xC0000000001C164C) :      7.1 Mega-Inst/Sec :      7.1 Mega-Cycles/Sec [17 Zaps  0 PA-Zaps] *ON*  [0:0:0] pri=4 extra=0
> 160000000: [0:0:1]: (PC:0xC0000000003A162C) :      7.1 Mega-Inst/Sec *ON*  [0:0:1] pri=4 extra=0
> 160000000: [0:1:0]: (PC:0xC00000000019AA58) :      7.1 Mega-Inst/Sec *ON*  [0:1:0] pri=2 extra=0
> 160000000: [0:1:1]: (PC:0xC00000000019A5F4) :      7.1 Mega-Inst/Sec *ON*  [0:1:1] pri=2 extra=0
> 160134492: (608065254): [    0.036532] devtmpfs: initialized
> 163550958: (621787960): [    0.043205] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
> 163574637: (621811639): [    0.043251] futex hash table entries: 1024 (order: 1, 131072 bytes, linear)
> 164292187: (624696519): [    0.044653] NET: Registered protocol family 16
> 164770384: (626692640): [    0.045587] audit: initializing netlink subsys (disabled)
> 165130649: (628049925): [    0.046290] audit: type=2000 audit(1024007219.020:1): state=initialized audit_enabled=0 res=1
> 165359745: (629069307): [    0.046738] cpuidle: using governor menu
> 165482259: (629491821): [    0.046977] pstore: Registered nvram as persistent store backend
> 167333983: (636947373): [    0.050594] PCI: Probing PCI hardware
> 168184808: (640240704): [    0.052255] cpuidle-powernv: Default stop: psscr = 0x0000000000000300,mask=0x00000000003003ff
> 168204280: (640260428): [    0.052293] cpuidle-powernv: Deepest stop: psscr = 0x0000000000300322,mask=0x00000000003003ff
> 168223015: (640279240): [    0.052330] cpuidle-powernv: First stop level that may lose SPRs = 0x10
> 168238094: (640294710): [    0.052359] cpuidle-powernv: First stop level that may lose timebase = 0x10
> 175060241: (647321380): [    0.065684] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
> 175077028: (647338167): [    0.065717] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
> 177879965: (651314704): [    0.071191] iommu: Default domain type: Translated
> 177936270: (651386119): [    0.071302] vgaarb: loaded
> 178219574: (651669423): [    0.071855] SCSI subsystem initialized
> 178331796: (651781645): [    0.072074] usbcore: registered new interface driver usbfs
> 178364558: (651814407): [    0.072138] usbcore: registered new interface driver hub
> 178397036: (651981802): [    0.072201] usbcore: registered new device driver usb
> 179169296: (653522982): [    0.073710] clocksource: Switched to clocksource timebase
> WARNING: 179516247: (653875633): Write_Mapped_Memory_Reg: Unknown address: 0x00000E995A3AF7B0, length=8
> FATAL ERROR: 179516247: (653875633): Attempt to store non-existent address 0x00000E995A3AF7B0
> INFO: 179516247: (653875634): ** Execution stopped: Mambo Error,  **
> 179516247: ** finished running 653875634 instructions **
> systemsim %
>
> I don't understand yet what's happening. Gleen does not hit it either
> (he is running mambo on a POWER9 tho). Which mambo / host combination
> have you tried it?

I tested Power8 / Power9 emulated on x86-64 and ppc64le.

cheers
Gustavo Romero July 2, 2020, 1:56 a.m. UTC | #3
Hello Michael,

On 6/28/20 10:27 PM, Michael Ellerman wrote:
> Gustavo Romero <gromero@linux.vnet.ibm.com> writes:
>> Hi Michael,
>>
>> On 6/25/20 8:56 AM, Michael Ellerman wrote:
>>> Mambo has a strange feature called MEMORY_OVERFLOW, enabled by
>>> default, which causes some accesses to non-existent memory addresses
>>> to transparently "create" memory.
>>>
>>> This can be confusing when debugging, eg:
>>>
>>>     systemsim % mysim cpu 0 display spr pc
>>>     0xC0000000000246B8
>>>     systemsim % mysim memory display 0xC0000000000246B8 8
>>>     0x0000000000000000
>>>
>>> Appears to show that the memory at pc (NIP) is currently zeroes.
>>>
>>> The astute observer will note that "mysim memory display" takes
>>> physical addresses, not effective addresses. So unless this machine
>>> has > 12XB of RAM, this access should have failed as there is no
>>> memory at that address.
>>>
>>> Turning MEMORY_OVERFLOW off gives us a much more sensible result:
>>>
>>>     systemsim % mysim memory display 0xC0000000000246B8 8
>>>     Illegal Address 0xC0000000000246B8
>>>
>>> It doesn't appear to have any effect on accesses done from Linux, with
>>> the setting enabled or disabled we still get a machine check for bad
>>> accesses in real mode:
>>
>> With that change applied, on mambo P10 running on a POWER8 I'm getting
>> the following mambo exception that forbids the kernel to continue booting:
> 
> This looks like exactly the kind of thing we want to catch, so that's
> "good" :)
> 
>> [...]
>> 142233280: (536372251): [    0.001554] printk: bootconsole [udbg0] disabled
>> 142387801: (537126772): [    0.001870] mempolicy: Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
>> 142412629: (537151600): [    0.001919] pid_max: default: 32768 minimum: 301
>> 142549799: (537888770): [    0.002187] Mount-cache hash table entries: 16384 (order: 1, 131072 bytes, linear)
>> 142570582: (537909553): [    0.002228] Mountpoint-cache hash table entries: 16384 (order: 1, 131072 bytes, linear)
>> 143544723: (541883694): [    0.004130] EEH: PowerNV platform initialized
>> 143557364: (541896335): [    0.004155] POWER9 performance monitor hardware support registered
>> 143629574: (541968545): [    0.004296] rcu: Hierarchical SRCU implementation.
>> 143904253: (543143224): [    0.004833] smp: Bringing up secondary CPUs ...
>> WARNING: 145271326: (548660031): Write_Mapped_Memory_Reg: Unknown address: 0x00000E995A3AF7B0, length=8
>> FATAL ERROR: 145271326: (548660031): Attempt to store non-existent address 0x00000E995A3AF7B0
>> INFO: 145271326: (548660032): ** Execution stopped: Mambo Error,  **
>> 145271326: ** finished running 548660032 instructions **
> 
> Can you see where the bad store came from, "bt" should give you a backtrace.
> 
> Then "p pc" should give you the PC and "di <that value>" should show us
> what instruction it was.

PC itself is parked, believe it or not, at a 'nop' instruction:

151630615: (157638813): [    0.040390] pstore: Registered nvram as persistent store backend
152359208: (158398639): [    0.041813] PCI: Probing PCI hardware
152666145: (158718713): [    0.042412] audit: type=2000 audit(1024007219.010:1): state=initialized audit_enabled=0 res=1
153055303: (159124575): [    0.043172] cpuidle-powernv: Default stop: psscr = 0x0000000000000300,mask=0x00000000003003ff
153074775: (159144887): [    0.043210] cpuidle-powernv: Deepest stop: psscr = 0x0000000000300322,mask=0x00000000003003ff
153093510: (159164435): [    0.043247] cpuidle-powernv: First stop level that may lose SPRs = 0x10
153108589: (159180151): [    0.043276] cpuidle-powernv: First stop level that may lose timebase = 0x10
WARNING: 156265663: (162472569): Write_Mapped_Memory_Reg: Unknown address: 0x00000BEE18C96D60, length=8
FATAL ERROR: 156265663: (162472569): Attempt to store non-existent address 0x00000BEE18C96D60
INFO: 156265663: (162472570): ** Execution stopped: Mambo Error,  **
156265663: ** finished running 162472570 instructions **
systemsim % p pc
0xC0000000003A2EC4
systemsim % di 0xC0000000003A2EC4
WARNING: 156265663: (162472570): Need to define a CPU
WARNING: 156265663: (162472570): Need to define a CPU
EADDR:0xC0000000003A2EC0 RADDR:0x003A2EC0 Enc:0x2C4A007C : dcbt    r0,r9,0
WARNING: 156265663: (162472570): Need to define a CPU
WARNING: 156265663: (162472570): Need to define a CPU
EADDR:0xC0000000003A2EC4 RADDR:0x003A2EC4 Enc:0x00000060 : nop
WARNING: 156265663: (162472570): Need to define a CPU
WARNING: 156265663: (162472570): Need to define a CPU
EADDR:0xC0000000003A2EC8 RADDR:0x003A2EC8 Enc:0x00000060 : nop
WARNING: 156265663: (162472570): Need to define a CPU
WARNING: 156265663: (162472570): Need to define a CPU


But just before the 'nop' there is a dcbt. But address passed to the dcbt,
in GPR 9, doesn't contain anything close to the address displayed by mambo:

systemsim % mysim cpu 0:0:0 display gpr 9
0x0000000000000240
systemsim % mysim mcm 0 cpu 0 thread 0 dtranslate 0x240
data address translation for 0x0000000000000240 failed
systemsim %

The dcbt instruction is in mm/slub.c; more context:

1241956         prefetch(object + s->offset);
1241957 c0000000003a2eb4:       20 00 3f 81     lwz     r9,32(r31)
1241958         if (unlikely(!x))
1241959 c0000000003a2eb8:       15 4a 3a 7d     add.    r9,r26,r9
1241960 c0000000003a2ebc:       08 00 82 41     beq     c0000000003a2ec4 <kmem_cache_alloc+0x114>
1241961         __asm__ __volatile__ ("dcbt 0,%0" : : "r" (x));
1241962 c0000000003a2ec0:       2c 4a 00 7c     dcbt    0,r9
1241963 c0000000003a2ec4:       00 00 00 60     nop
1241964 c0000000003a2ec8:       00 00 00 60     nop

so probably from a prefetch_freepointer() in
https://github.com/torvalds/linux/blob/master/mm/slub.c#L2815


>> systemsim % c
>> 145785082: (550674532): [    0.008506] smp: Brought up 2 nodes, 4 CPUs
>> 145798101: (550687551): [    0.008527] numa: Node 0 CPUs: 0-1
>> 145810514: (550799964): [    0.008551] numa: Node 1 CPUs: 2-3
> 
> Does it still happen with a single CPU?

It still happens with maxcpus=0 or =1. However if I disable radix passing
disable_radix=1 to command line I'm able to boot.

Today I've built upstream mambo and the same issue happens. I'm clueless
yet what's happening... so if you have additional things to try let
me know. It might be an issue with Mambo P10 running on P8.


Thanks,
Gustavo
Oliver O'Halloran July 2, 2020, 3:10 a.m. UTC | #4
On Thu, Jul 2, 2020 at 11:56 AM Gustavo Romero
<gromero@linux.vnet.ibm.com> wrote:
>
>
> But just before the 'nop' there is a dcbt. But address passed to the dcbt,
> in GPR 9, doesn't contain anything close to the address displayed by mambo:
>
> systemsim % mysim cpu 0:0:0 display gpr 9
> 0x0000000000000240

What in the other GPRs?

> systemsim % mysim mcm 0 cpu 0 thread 0 dtranslate 0x240
> data address translation for 0x0000000000000240 failed
> systemsim %
>
> The dcbt instruction is in mm/slub.c; more context:
>
> 1241956         prefetch(object + s->offset);
> 1241957 c0000000003a2eb4:       20 00 3f 81     lwz     r9,32(r31)
> 1241958         if (unlikely(!x))
> 1241959 c0000000003a2eb8:       15 4a 3a 7d     add.    r9,r26,r9
> 1241960 c0000000003a2ebc:       08 00 82 41     beq     c0000000003a2ec4 <kmem_cache_alloc+0x114>
> 1241961         __asm__ __volatile__ ("dcbt 0,%0" : : "r" (x));
> 1241962 c0000000003a2ec0:       2c 4a 00 7c     dcbt    0,r9
> 1241963 c0000000003a2ec4:       00 00 00 60     nop
> 1241964 c0000000003a2ec8:       00 00 00 60     nop
>
> so probably from a prefetch_freepointer() in
> https://github.com/torvalds/linux/blob/master/mm/slub.c#L2815
>
>
> >> systemsim % c
> >> 145785082: (550674532): [    0.008506] smp: Brought up 2 nodes, 4 CPUs
> >> 145798101: (550687551): [    0.008527] numa: Node 0 CPUs: 0-1
> >> 145810514: (550799964): [    0.008551] numa: Node 1 CPUs: 2-3
> >
> > Does it still happen with a single CPU?
>
> It still happens with maxcpus=0 or =1. However if I disable radix passing
> disable_radix=1 to command line I'm able to boot.

What about with the number of threads configured in skiboot.tcl set to 1?

> Today I've built upstream mambo and the same issue happens. I'm clueless
> yet what's happening... so if you have additional things to try let
> me know. It might be an issue with Mambo P10 running on P8.

It's probably just a P10 mambo bug. Even if it's not they're more
likely to have an answer for you about what's going on. Anyway, I'll
hold off on merging this until people figure it out.
Michael Ellerman July 2, 2020, 7:03 a.m. UTC | #5
Gustavo Romero <gromero@linux.vnet.ibm.com> writes:
> On 6/28/20 10:27 PM, Michael Ellerman wrote:
>> Gustavo Romero <gromero@linux.vnet.ibm.com> writes:
>>> On 6/25/20 8:56 AM, Michael Ellerman wrote:
>>>> Mambo has a strange feature called MEMORY_OVERFLOW, enabled by
>>>> default, which causes some accesses to non-existent memory addresses
>>>> to transparently "create" memory.
>>>>
>>>> This can be confusing when debugging, eg:
>>>>
>>>>     systemsim % mysim cpu 0 display spr pc
>>>>     0xC0000000000246B8
>>>>     systemsim % mysim memory display 0xC0000000000246B8 8
>>>>     0x0000000000000000
>>>>
>>>> Appears to show that the memory at pc (NIP) is currently zeroes.
>>>>
>>>> The astute observer will note that "mysim memory display" takes
>>>> physical addresses, not effective addresses. So unless this machine
>>>> has > 12XB of RAM, this access should have failed as there is no
>>>> memory at that address.
>>>>
>>>> Turning MEMORY_OVERFLOW off gives us a much more sensible result:
>>>>
>>>>     systemsim % mysim memory display 0xC0000000000246B8 8
>>>>     Illegal Address 0xC0000000000246B8
>>>>
>>>> It doesn't appear to have any effect on accesses done from Linux, with
>>>> the setting enabled or disabled we still get a machine check for bad
>>>> accesses in real mode:
>>>
>>> With that change applied, on mambo P10 running on a POWER8 I'm getting
>>> the following mambo exception that forbids the kernel to continue booting:
>> 
>> This looks like exactly the kind of thing we want to catch, so that's
>> "good" :)
>> 
>>> [...]
>>> 142233280: (536372251): [    0.001554] printk: bootconsole [udbg0] disabled
>>> 142387801: (537126772): [    0.001870] mempolicy: Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
>>> 142412629: (537151600): [    0.001919] pid_max: default: 32768 minimum: 301
>>> 142549799: (537888770): [    0.002187] Mount-cache hash table entries: 16384 (order: 1, 131072 bytes, linear)
>>> 142570582: (537909553): [    0.002228] Mountpoint-cache hash table entries: 16384 (order: 1, 131072 bytes, linear)
>>> 143544723: (541883694): [    0.004130] EEH: PowerNV platform initialized
>>> 143557364: (541896335): [    0.004155] POWER9 performance monitor hardware support registered
>>> 143629574: (541968545): [    0.004296] rcu: Hierarchical SRCU implementation.
>>> 143904253: (543143224): [    0.004833] smp: Bringing up secondary CPUs ...
>>> WARNING: 145271326: (548660031): Write_Mapped_Memory_Reg: Unknown address: 0x00000E995A3AF7B0, length=8
>>> FATAL ERROR: 145271326: (548660031): Attempt to store non-existent address 0x00000E995A3AF7B0
>>> INFO: 145271326: (548660032): ** Execution stopped: Mambo Error,  **
>>> 145271326: ** finished running 548660032 instructions **
>> 
>> Can you see where the bad store came from, "bt" should give you a backtrace.
>> 
>> Then "p pc" should give you the PC and "di <that value>" should show us
>> what instruction it was.
>
> PC itself is parked, believe it or not, at a 'nop' instruction:

Hmm, maybe we're looking at the wrong CPU/thread?

> 151630615: (157638813): [    0.040390] pstore: Registered nvram as persistent store backend
> 152359208: (158398639): [    0.041813] PCI: Probing PCI hardware
> 152666145: (158718713): [    0.042412] audit: type=2000 audit(1024007219.010:1): state=initialized audit_enabled=0 res=1
> 153055303: (159124575): [    0.043172] cpuidle-powernv: Default stop: psscr = 0x0000000000000300,mask=0x00000000003003ff
> 153074775: (159144887): [    0.043210] cpuidle-powernv: Deepest stop: psscr = 0x0000000000300322,mask=0x00000000003003ff
> 153093510: (159164435): [    0.043247] cpuidle-powernv: First stop level that may lose SPRs = 0x10
> 153108589: (159180151): [    0.043276] cpuidle-powernv: First stop level that may lose timebase = 0x10
> WARNING: 156265663: (162472569): Write_Mapped_Memory_Reg: Unknown address: 0x00000BEE18C96D60, length=8
> FATAL ERROR: 156265663: (162472569): Attempt to store non-existent address 0x00000BEE18C96D60
> INFO: 156265663: (162472570): ** Execution stopped: Mambo Error,  **
> 156265663: ** finished running 162472570 instructions **

I notice this doesn't tell us what CPU caused the bad access.

> systemsim % p pc
> 0xC0000000003A2EC4

"p" implicitly shows CPU 0 unless you've used the "target" command.

> systemsim % di 0xC0000000003A2EC4
> WARNING: 156265663: (162472570): Need to define a CPU
> WARNING: 156265663: (162472570): Need to define a CPU
> EADDR:0xC0000000003A2EC0 RADDR:0x003A2EC0 Enc:0x2C4A007C : dcbt    r0,r9,0
> WARNING: 156265663: (162472570): Need to define a CPU
> WARNING: 156265663: (162472570): Need to define a CPU
> EADDR:0xC0000000003A2EC4 RADDR:0x003A2EC4 Enc:0x00000060 : nop
> WARNING: 156265663: (162472570): Need to define a CPU
> WARNING: 156265663: (162472570): Need to define a CPU
> EADDR:0xC0000000003A2EC8 RADDR:0x003A2EC8 Enc:0x00000060 : nop
> WARNING: 156265663: (162472570): Need to define a CPU
> WARNING: 156265663: (162472570): Need to define a CPU
>
> But just before the 'nop' there is a dcbt. But address passed to the dcbt,
> in GPR 9, doesn't contain anything close to the address displayed by mambo:
>
> systemsim % mysim cpu 0:0:0 display gpr 9
> 0x0000000000000240
> systemsim % mysim mcm 0 cpu 0 thread 0 dtranslate 0x240
> data address translation for 0x0000000000000240 failed
> systemsim %
>
> The dcbt instruction is in mm/slub.c; more context:
>
> 1241956         prefetch(object + s->offset);
> 1241957 c0000000003a2eb4:       20 00 3f 81     lwz     r9,32(r31)
> 1241958         if (unlikely(!x))
> 1241959 c0000000003a2eb8:       15 4a 3a 7d     add.    r9,r26,r9
> 1241960 c0000000003a2ebc:       08 00 82 41     beq     c0000000003a2ec4 <kmem_cache_alloc+0x114>
> 1241961         __asm__ __volatile__ ("dcbt 0,%0" : : "r" (x));
> 1241962 c0000000003a2ec0:       2c 4a 00 7c     dcbt    0,r9
> 1241963 c0000000003a2ec4:       00 00 00 60     nop
> 1241964 c0000000003a2ec8:       00 00 00 60     nop
>
> so probably from a prefetch_freepointer() in
> https://github.com/torvalds/linux/blob/master/mm/slub.c#L2815

But we shouldn't be prefetching 0x240, that's a userspace address. So I
suspect something has gone wrong with the debug here.

>>> systemsim % c
>>> 145785082: (550674532): [    0.008506] smp: Brought up 2 nodes, 4 CPUs
>>> 145798101: (550687551): [    0.008527] numa: Node 0 CPUs: 0-1
>>> 145810514: (550799964): [    0.008551] numa: Node 1 CPUs: 2-3
>> 
>> Does it still happen with a single CPU?
>
> It still happens with maxcpus=0 or =1. However if I disable radix passing
> disable_radix=1 to command line I'm able to boot.

Hmm OK.

> Today I've built upstream mambo and the same issue happens. I'm clueless
> yet what's happening... so if you have additional things to try let
> me know. It might be an issue with Mambo P10 running on P8.

Just to be clear it doesn't happen with mambo simulating P9 right?

cheers
Gustavo Romero July 2, 2020, 10:52 p.m. UTC | #6
On 7/2/20 4:03 AM, Michael Ellerman wrote:
> Gustavo Romero <gromero@linux.vnet.ibm.com> writes:
>> On 6/28/20 10:27 PM, Michael Ellerman wrote:
>> Today I've built upstream mambo and the same issue happens. I'm clueless
>> yet what's happening... so if you have additional things to try let
>> me know. It might be an issue with Mambo P10 running on P8.
> 
> Just to be clear it doesn't happen with mambo simulating P9 right?

Yes, just on mambo P10. P9 is ok. Also, Gleen could not reproduce it
w/ mambo P10 running a P9 host (I'm using mambo on a P8 host). I'm not
sure if the host matter that much, but I've also tried with the same
kernel image and mambo config that Gleen used and I still hit the
issue on my P8 host.

I'll dig more tomorrow.

Thanks Michael and Oliver.


Cheers,
Gustavo
Oliver O'Halloran Aug. 4, 2020, 7:01 a.m. UTC | #7
On Fri, Jul 3, 2020 at 8:53 AM Gustavo Romero
<gromero@linux.vnet.ibm.com> wrote:
>
> On 7/2/20 4:03 AM, Michael Ellerman wrote:
> > Gustavo Romero <gromero@linux.vnet.ibm.com> writes:
> >> On 6/28/20 10:27 PM, Michael Ellerman wrote:
> >> Today I've built upstream mambo and the same issue happens. I'm clueless
> >> yet what's happening... so if you have additional things to try let
> >> me know. It might be an issue with Mambo P10 running on P8.
> >
> > Just to be clear it doesn't happen with mambo simulating P9 right?
>
> Yes, just on mambo P10. P9 is ok. Also, Gleen could not reproduce it
> w/ mambo P10 running a P9 host (I'm using mambo on a P8 host). I'm not
> sure if the host matter that much, but I've also tried with the same
> kernel image and mambo config that Gleen used and I still hit the
> issue on my P8 host.
>
> I'll dig more tomorrow.

Any updates on this?
Gustavo Romero Aug. 11, 2020, 10 p.m. UTC | #8
Hi Oliver,

On 8/4/20 4:01 AM, Oliver O'Halloran wrote:
> On Fri, Jul 3, 2020 at 8:53 AM Gustavo Romero
> <gromero@linux.vnet.ibm.com> wrote:
>>
>> On 7/2/20 4:03 AM, Michael Ellerman wrote:
>>> Gustavo Romero <gromero@linux.vnet.ibm.com> writes:
>>>> On 6/28/20 10:27 PM, Michael Ellerman wrote:
>>>> Today I've built upstream mambo and the same issue happens. I'm clueless
>>>> yet what's happening... so if you have additional things to try let
>>>> me know. It might be an issue with Mambo P10 running on P8.
>>>
>>> Just to be clear it doesn't happen with mambo simulating P9 right?
>>
>> Yes, just on mambo P10. P9 is ok. Also, Gleen could not reproduce it
>> w/ mambo P10 running a P9 host (I'm using mambo on a P8 host). I'm not
>> sure if the host matter that much, but I've also tried with the same
>> kernel image and mambo config that Gleen used and I still hit the
>> issue on my P8 host.
>>
>> I'll dig more tomorrow.
> 
> Any updates on this?

So the only update I have is that address in GPR 9 used by dcbt is
indeed correct, it happens that the MMU is already enabled, address
0x0000000000000240 is indeed correct, but will be translated... so
unfortunately not much progress since the last time.

I was only able to reproduce it on a given P8 machine we have in our
lab. Another thing (which doesn't make sense at all) is that it only
reproduces if that P8 machine is running the Ubuntu 18.04.4
4.15.0-110-generic stock kernel, so if I boot the machine
with some upstream one (for instance, 5.5.0) the issue doesn't
reproduce. Nonetheless, the kernel on the host should not affect
Mambo whatsoever. I've spent days digging into that issue but could
not find anything wrong with Mambo or with the kernel booted on
Mambo VM.

Additionally, I've noted that that particular P8 machine is crashing
mysteriously from time to time due to what looks like a HW issue,
so I don't trust it that much at this point.

Glenn was also never able to reproduce it at his side.

Thus, I won't hold the change from Michael anymore, I was caught once
by that Mambo feature being silently enabled when working on a soft/hard
lockup and it was quite misleading. Also, it's not how the real HW works.

So please, go ahead :)


Thanks,
Gustavo
diff mbox series

Patch

diff --git a/external/mambo/skiboot.tcl b/external/mambo/skiboot.tcl
index 6003fcbef..264846c22 100644
--- a/external/mambo/skiboot.tcl
+++ b/external/mambo/skiboot.tcl
@@ -122,6 +122,7 @@  myconf config processor/timebase_frequency 1/1
 myconf config enable_pseries_nvram false
 myconf config machine_option/NO_RAM TRUE
 myconf config machine_option/NO_ROM TRUE
+myconf config machine_option/MEMORY_OVERFLOW FALSE
 
 if { $default_config == "PEGASUS" } {
     # We need to be DD2 or greater on p8 for the HILE HID bit.