
Fix hang in pnv_platform_error_reboot path due to TOD failure.

Message ID 155326667329.5850.3915679443209236584.stgit@jupiter
State Accepted
Series Fix hang in pnv_platform_error_reboot path due to TOD failure.

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success Successfully applied on branch master (b392d785eb49630b9f00fef8d17944ed82b2c1fe)
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot success Test snowpatch/job/snowpatch-skiboot on branch master
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot-dco success Signed-off-by present

Commit Message

Mahesh J Salgaonkar March 22, 2019, 2:58 p.m. UTC
From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

On a TOD failure, with the TB stuck, when Linux heads down the
pnv_platform_error_reboot() path due to an unrecoverable HMI event, the
panic CPU gets stuck in OPAL inside ipmi_queue_msg_sync(). At this point
all other CPUs are in smp_handle_nmi_ipi(), waiting for the panic CPU to
proceed. But with the panic CPU stuck inside OPAL, Linux never recovers
or reboots.

p0 c1 t0
NIA : 0x000000003001dd3c <.time_wait+0x64>
CFAR : 0x000000003001dce4 <.time_wait+0xc>
MSR : 0x9000000002803002
LR : 0x000000003002ecf8 <.ipmi_queue_msg_sync+0xec>

STACK: SP NIA
0x0000000031c236e0 0x0000000031c23760 (big-endian)
0x0000000031c23760 0x000000003002ecf8 <.ipmi_queue_msg_sync+0xec>
0x0000000031c237f0 0x00000000300aa5f8 <.hiomap_queue_msg_sync+0x7c>
0x0000000031c23880 0x00000000300aaadc <.hiomap_window_move+0x150>
0x0000000031c23950 0x00000000300ab1d8 <.ipmi_hiomap_write+0xcc>
0x0000000031c23a90 0x00000000300a7b18 <.blocklevel_raw_write+0xbc>
0x0000000031c23b30 0x00000000300a7c34 <.blocklevel_write+0xfc>
0x0000000031c23bf0 0x0000000030030be0 <.flash_nvram_write+0xd4>
0x0000000031c23c90 0x000000003002c128 <.opal_write_nvram+0xd0>
0x0000000031c23d20 0x00000000300051e4 <opal_entry+0x134>
0xc000001fea6e7870 0xc0000000000a9060 <opal_nvram_write+0x80>
0xc000001fea6e78c0 0xc000000000030b84 <nvram_write_os_partition+0x94>
0xc000001fea6e7960 0xc0000000000310b0 <nvram_pstore_write+0xb0>
0xc000001fea6e7990 0xc0000000004792d4 <pstore_dump+0x1d4>
0xc000001fea6e7ad0 0xc00000000018a570 <kmsg_dump+0x140>
0xc000001fea6e7b40 0xc000000000028e5c <panic_flush_kmsg_end+0x2c>
0xc000001fea6e7b60 0xc0000000000a7168 <pnv_platform_error_reboot+0x68>
0xc000001fea6e7bd0 0xc0000000000ac9b8 <hmi_event_handler+0x1d8>
0xc000001fea6e7c80 0xc00000000012d6c8 <process_one_work+0x1b8>
0xc000001fea6e7d20 0xc00000000012da28 <worker_thread+0x88>
0xc000001fea6e7db0 0xc0000000001366f4 <kthread+0x164>
0xc000001fea6e7e20 0xc00000000000b65c <ret_from_kernel_thread+0x5c>

This is because there is a while loop towards the end of
ipmi_queue_msg_sync() that keeps spinning until "sync_msg" no longer
matches "msg". It loops over time_wait_ms() until the exit condition is
met. In the normal scenario time_wait_ms() runs the pollers, so the IPMI
backend gets a chance to check the IPMI response and set sync_msg to
NULL.

	while (sync_msg == msg)
                time_wait_ms(10);

But when the TB is in a failed state, time_wait_ms()->time_wait_poll()
returns immediately without running the pollers, and hence we end up
looping forever. This patch fixes the hang by calling opal_run_pollers()
in the TB-failed state as well.
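
For reference, the pre-patch tb_invalid path in time_wait_poll() looks
roughly like this (a simplified sketch based on the context lines of the
diff below, not the verbatim skiboot source):

	/*
	 * Sketch of core/timebase.c before this patch: once the
	 * timebase is marked invalid we bail out before the normal
	 * polling loop, so opal_run_pollers() never runs and a caller
	 * spinning on time_wait_ms() makes no progress.
	 */
	static void time_wait_poll(unsigned long duration)
	{
		if (this_cpu()->tb_invalid) {
			cpu_relax();
			return;	/* pollers never run -> caller spins forever */
		}

		/* normal path (elided here): poll until 'duration'
		 * ticks elapse, running opal_run_pollers() roughly
		 * every 5ms period */
	}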

Fixes: 1764f2452 ("opal: Fix hang in time_wait* calls on HMI for TB errors.")
Cc: skiboot-stable@lists.ozlabs.org
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 core/timebase.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Vasant Hegde March 27, 2019, 5:01 a.m. UTC | #1
On 03/22/2019 08:28 PM, Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> On a TOD failure, with the TB stuck, when Linux heads down the
> pnv_platform_error_reboot() path due to an unrecoverable HMI event, the
> panic CPU gets stuck in OPAL inside ipmi_queue_msg_sync(). At this point
> all other CPUs are in smp_handle_nmi_ipi(), waiting for the panic CPU to
> proceed. But with the panic CPU stuck inside OPAL, Linux never recovers
> or reboots.
> 
> p0 c1 t0
> NIA : 0x000000003001dd3c <.time_wait+0x64>
> CFAR : 0x000000003001dce4 <.time_wait+0xc>
> MSR : 0x9000000002803002
> LR : 0x000000003002ecf8 <.ipmi_queue_msg_sync+0xec>
> 
> STACK: SP NIA
> 0x0000000031c236e0 0x0000000031c23760 (big-endian)
> 0x0000000031c23760 0x000000003002ecf8 <.ipmi_queue_msg_sync+0xec>
> 0x0000000031c237f0 0x00000000300aa5f8 <.hiomap_queue_msg_sync+0x7c>
> 0x0000000031c23880 0x00000000300aaadc <.hiomap_window_move+0x150>
> 0x0000000031c23950 0x00000000300ab1d8 <.ipmi_hiomap_write+0xcc>
> 0x0000000031c23a90 0x00000000300a7b18 <.blocklevel_raw_write+0xbc>
> 0x0000000031c23b30 0x00000000300a7c34 <.blocklevel_write+0xfc>
> 0x0000000031c23bf0 0x0000000030030be0 <.flash_nvram_write+0xd4>
> 0x0000000031c23c90 0x000000003002c128 <.opal_write_nvram+0xd0>
> 0x0000000031c23d20 0x00000000300051e4 <opal_entry+0x134>
> 0xc000001fea6e7870 0xc0000000000a9060 <opal_nvram_write+0x80>
> 0xc000001fea6e78c0 0xc000000000030b84 <nvram_write_os_partition+0x94>
> 0xc000001fea6e7960 0xc0000000000310b0 <nvram_pstore_write+0xb0>
> 0xc000001fea6e7990 0xc0000000004792d4 <pstore_dump+0x1d4>
> 0xc000001fea6e7ad0 0xc00000000018a570 <kmsg_dump+0x140>
> 0xc000001fea6e7b40 0xc000000000028e5c <panic_flush_kmsg_end+0x2c>
> 0xc000001fea6e7b60 0xc0000000000a7168 <pnv_platform_error_reboot+0x68>
> 0xc000001fea6e7bd0 0xc0000000000ac9b8 <hmi_event_handler+0x1d8>
> 0xc000001fea6e7c80 0xc00000000012d6c8 <process_one_work+0x1b8>
> 0xc000001fea6e7d20 0xc00000000012da28 <worker_thread+0x88>
> 0xc000001fea6e7db0 0xc0000000001366f4 <kthread+0x164>
> 0xc000001fea6e7e20 0xc00000000000b65c <ret_from_kernel_thread+0x5c>
> 
> This is because there is a while loop towards the end of
> ipmi_queue_msg_sync() that keeps spinning until "sync_msg" no longer
> matches "msg". It loops over time_wait_ms() until the exit condition is
> met. In the normal scenario time_wait_ms() runs the pollers, so the IPMI
> backend gets a chance to check the IPMI response and set sync_msg to
> NULL.
> 
> 	while (sync_msg == msg)
>                  time_wait_ms(10);
> 
> But when the TB is in a failed state, time_wait_ms()->time_wait_poll()
> returns immediately without running the pollers, and hence we end up
> looping forever. This patch fixes the hang by calling opal_run_pollers()
> in the TB-failed state as well.
> 
> Fixes: 1764f2452 ("opal: Fix hang in time_wait* calls on HMI for TB errors.")
> Cc: skiboot-stable@lists.ozlabs.org
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Looks good to me.

Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>


-Vasant
Stewart Smith March 29, 2019, 4:52 a.m. UTC | #2
Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> writes:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>
> On a TOD failure, with the TB stuck, when Linux heads down the
> pnv_platform_error_reboot() path due to an unrecoverable HMI event, the
> panic CPU gets stuck in OPAL inside ipmi_queue_msg_sync(). At this point
> all other CPUs are in smp_handle_nmi_ipi(), waiting for the panic CPU to
> proceed. But with the panic CPU stuck inside OPAL, Linux never recovers
> or reboots.
>
> p0 c1 t0
> NIA : 0x000000003001dd3c <.time_wait+0x64>
> CFAR : 0x000000003001dce4 <.time_wait+0xc>
> MSR : 0x9000000002803002
> LR : 0x000000003002ecf8 <.ipmi_queue_msg_sync+0xec>
>
> STACK: SP NIA
> 0x0000000031c236e0 0x0000000031c23760 (big-endian)
> 0x0000000031c23760 0x000000003002ecf8 <.ipmi_queue_msg_sync+0xec>
> 0x0000000031c237f0 0x00000000300aa5f8 <.hiomap_queue_msg_sync+0x7c>
> 0x0000000031c23880 0x00000000300aaadc <.hiomap_window_move+0x150>
> 0x0000000031c23950 0x00000000300ab1d8 <.ipmi_hiomap_write+0xcc>
> 0x0000000031c23a90 0x00000000300a7b18 <.blocklevel_raw_write+0xbc>
> 0x0000000031c23b30 0x00000000300a7c34 <.blocklevel_write+0xfc>
> 0x0000000031c23bf0 0x0000000030030be0 <.flash_nvram_write+0xd4>
> 0x0000000031c23c90 0x000000003002c128 <.opal_write_nvram+0xd0>
> 0x0000000031c23d20 0x00000000300051e4 <opal_entry+0x134>
> 0xc000001fea6e7870 0xc0000000000a9060 <opal_nvram_write+0x80>
> 0xc000001fea6e78c0 0xc000000000030b84 <nvram_write_os_partition+0x94>
> 0xc000001fea6e7960 0xc0000000000310b0 <nvram_pstore_write+0xb0>
> 0xc000001fea6e7990 0xc0000000004792d4 <pstore_dump+0x1d4>
> 0xc000001fea6e7ad0 0xc00000000018a570 <kmsg_dump+0x140>
> 0xc000001fea6e7b40 0xc000000000028e5c <panic_flush_kmsg_end+0x2c>
> 0xc000001fea6e7b60 0xc0000000000a7168 <pnv_platform_error_reboot+0x68>
> 0xc000001fea6e7bd0 0xc0000000000ac9b8 <hmi_event_handler+0x1d8>
> 0xc000001fea6e7c80 0xc00000000012d6c8 <process_one_work+0x1b8>
> 0xc000001fea6e7d20 0xc00000000012da28 <worker_thread+0x88>
> 0xc000001fea6e7db0 0xc0000000001366f4 <kthread+0x164>
> 0xc000001fea6e7e20 0xc00000000000b65c <ret_from_kernel_thread+0x5c>
>
> This is because there is a while loop towards the end of
> ipmi_queue_msg_sync() that keeps spinning until "sync_msg" no longer
> matches "msg". It loops over time_wait_ms() until the exit condition is
> met. In the normal scenario time_wait_ms() runs the pollers, so the IPMI
> backend gets a chance to check the IPMI response and set sync_msg to
> NULL.
>
> 	while (sync_msg == msg)
>                 time_wait_ms(10);
>
> But when the TB is in a failed state, time_wait_ms()->time_wait_poll()
> returns immediately without running the pollers, and hence we end up
> looping forever. This patch fixes the hang by calling opal_run_pollers()
> in the TB-failed state as well.
>
> Fixes: 1764f2452 ("opal: Fix hang in time_wait* calls on HMI for TB errors.")
> Cc: skiboot-stable@lists.ozlabs.org
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> ---
>  core/timebase.c |   10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/core/timebase.c b/core/timebase.c
> index 0ae11d253..b1a02a4a1 100644
> --- a/core/timebase.c
> +++ b/core/timebase.c
> @@ -29,6 +29,16 @@ static void time_wait_poll(unsigned long duration)
>  	unsigned long period = msecs_to_tb(5);
>  
>  	if (this_cpu()->tb_invalid) {
> +		/*
> +		 * Run pollers to allow some backends to process response.
> +		 *
> +		 * In TOD failure case where TOD is unrecoverable, running
> +		 * pollers allows ipmi backend to deal with ipmi response
> +		 * from bmc and helps ipmi_queue_msg_sync() to get un-stuck.
> +		 * Thus it avoids linux kernel to hang during panic due to
> +		 * TOD failure.
> +		 */
> +		opal_run_pollers();
>  		cpu_relax();
>  		return;
>  	}

Hah! Well, that's obvious once you see it. I guess the bit that only
slightly worries me is if we have any poller that could run and then
depend on timebase or call time_wait_poll() and then do recursive
pollers... Although, since the testing for this didn't explode horribly,
it's probably at least *mostly* okay.

It'd be fun to do some chaos-generating tests (lots of I2C, PNOR
activity etc.) though and inject TOD errors to see if we could make it
explode... Actually, I'd almost bet on it.

I wonder if we can come up with some ways to help it *not* explode in
that state.

Anyway, merged to master as of 2c71d7032484f4267ca425b5a58c5a6f06a2eaf3.
Mahesh J Salgaonkar March 29, 2019, 8:42 a.m. UTC | #3
On 3/29/19 10:22 AM, Stewart Smith wrote:
> Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> writes:
>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>
>> On a TOD failure, with the TB stuck, when Linux heads down the
>> pnv_platform_error_reboot() path due to an unrecoverable HMI event, the
>> panic CPU gets stuck in OPAL inside ipmi_queue_msg_sync(). At this point
>> all other CPUs are in smp_handle_nmi_ipi(), waiting for the panic CPU to
>> proceed. But with the panic CPU stuck inside OPAL, Linux never recovers
>> or reboots.
>>
>> p0 c1 t0
>> NIA : 0x000000003001dd3c <.time_wait+0x64>
>> CFAR : 0x000000003001dce4 <.time_wait+0xc>
>> MSR : 0x9000000002803002
>> LR : 0x000000003002ecf8 <.ipmi_queue_msg_sync+0xec>
>>
>> STACK: SP NIA
>> 0x0000000031c236e0 0x0000000031c23760 (big-endian)
>> 0x0000000031c23760 0x000000003002ecf8 <.ipmi_queue_msg_sync+0xec>
>> 0x0000000031c237f0 0x00000000300aa5f8 <.hiomap_queue_msg_sync+0x7c>
>> 0x0000000031c23880 0x00000000300aaadc <.hiomap_window_move+0x150>
>> 0x0000000031c23950 0x00000000300ab1d8 <.ipmi_hiomap_write+0xcc>
>> 0x0000000031c23a90 0x00000000300a7b18 <.blocklevel_raw_write+0xbc>
>> 0x0000000031c23b30 0x00000000300a7c34 <.blocklevel_write+0xfc>
>> 0x0000000031c23bf0 0x0000000030030be0 <.flash_nvram_write+0xd4>
>> 0x0000000031c23c90 0x000000003002c128 <.opal_write_nvram+0xd0>
>> 0x0000000031c23d20 0x00000000300051e4 <opal_entry+0x134>
>> 0xc000001fea6e7870 0xc0000000000a9060 <opal_nvram_write+0x80>
>> 0xc000001fea6e78c0 0xc000000000030b84 <nvram_write_os_partition+0x94>
>> 0xc000001fea6e7960 0xc0000000000310b0 <nvram_pstore_write+0xb0>
>> 0xc000001fea6e7990 0xc0000000004792d4 <pstore_dump+0x1d4>
>> 0xc000001fea6e7ad0 0xc00000000018a570 <kmsg_dump+0x140>
>> 0xc000001fea6e7b40 0xc000000000028e5c <panic_flush_kmsg_end+0x2c>
>> 0xc000001fea6e7b60 0xc0000000000a7168 <pnv_platform_error_reboot+0x68>
>> 0xc000001fea6e7bd0 0xc0000000000ac9b8 <hmi_event_handler+0x1d8>
>> 0xc000001fea6e7c80 0xc00000000012d6c8 <process_one_work+0x1b8>
>> 0xc000001fea6e7d20 0xc00000000012da28 <worker_thread+0x88>
>> 0xc000001fea6e7db0 0xc0000000001366f4 <kthread+0x164>
>> 0xc000001fea6e7e20 0xc00000000000b65c <ret_from_kernel_thread+0x5c>
>>
>> This is because there is a while loop towards the end of
>> ipmi_queue_msg_sync() that keeps spinning until "sync_msg" no longer
>> matches "msg". It loops over time_wait_ms() until the exit condition is
>> met. In the normal scenario time_wait_ms() runs the pollers, so the IPMI
>> backend gets a chance to check the IPMI response and set sync_msg to
>> NULL.
>>
>> 	while (sync_msg == msg)
>>                 time_wait_ms(10);
>>
>> But when the TB is in a failed state, time_wait_ms()->time_wait_poll()
>> returns immediately without running the pollers, and hence we end up
>> looping forever. This patch fixes the hang by calling opal_run_pollers()
>> in the TB-failed state as well.
>>
>> Fixes: 1764f2452 ("opal: Fix hang in time_wait* calls on HMI for TB errors.")
>> Cc: skiboot-stable@lists.ozlabs.org
>> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>> ---
>>  core/timebase.c |   10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/core/timebase.c b/core/timebase.c
>> index 0ae11d253..b1a02a4a1 100644
>> --- a/core/timebase.c
>> +++ b/core/timebase.c
>> @@ -29,6 +29,16 @@ static void time_wait_poll(unsigned long duration)
>>  	unsigned long period = msecs_to_tb(5);
>>  
>>  	if (this_cpu()->tb_invalid) {
>> +		/*
>> +		 * Run pollers to allow some backends to process response.
>> +		 *
>> +		 * In TOD failure case where TOD is unrecoverable, running
>> +		 * pollers allows ipmi backend to deal with ipmi response
>> +		 * from bmc and helps ipmi_queue_msg_sync() to get un-stuck.
>> +		 * Thus it avoids linux kernel to hang during panic due to
>> +		 * TOD failure.
>> +		 */
>> +		opal_run_pollers();
>>  		cpu_relax();
>>  		return;
>>  	}
> 
> Hah! Well, that's obvious once you see it. I guess the bit that only
> slightly worries me is if we have any poller that could run and then
> depend on timebase or call time_wait_poll() and then do recursive
> pollers... Although, since the testing for this didn't explode horribly,
> it's probably at least *mostly* okay.
> 
> It'd be fun to do some chaos-generating tests (lots of I2C, PNOR
> activity etc.) though and inject TOD errors to see if we could make it
> explode... Actually, I'd almost bet on it.
> 
> I wonder if we can come up with some ways to help it *not* explode in
> that state.

How about a nested_poller check inside the tb_invalid case that returns
immediately to stop the recursion?

Thanks,
-Mahesh.

> 
> Anyway, merged to master as of 2c71d7032484f4267ca425b5a58c5a6f06a2eaf3.
>
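
A rough sketch of the kind of guard Mahesh suggests above (illustrative
only, not part of the merged patch; the flag name is made up and would
need a new bool field on struct cpu_thread):

	/*
	 * Hypothetical re-entrancy guard: if a poller itself ends up
	 * back in time_wait_poll() while the TB is invalid, skip the
	 * nested opal_run_pollers() call instead of recursing.
	 * 'in_tb_invalid_poll' does not exist in skiboot today.
	 */
	if (this_cpu()->tb_invalid) {
		if (!this_cpu()->in_tb_invalid_poll) {
			this_cpu()->in_tb_invalid_poll = true;
			opal_run_pollers();
			this_cpu()->in_tb_invalid_poll = false;
		}
		cpu_relax();
		return;
	}

The rest of time_wait_poll() would stay unchanged; only the TB-invalid
branch gains the guard.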

Patch

diff --git a/core/timebase.c b/core/timebase.c
index 0ae11d253..b1a02a4a1 100644
--- a/core/timebase.c
+++ b/core/timebase.c
@@ -29,6 +29,16 @@  static void time_wait_poll(unsigned long duration)
 	unsigned long period = msecs_to_tb(5);
 
 	if (this_cpu()->tb_invalid) {
+		/*
+		 * Run pollers to allow some backends to process response.
+		 *
+		 * In TOD failure case where TOD is unrecoverable, running
+		 * pollers allows ipmi backend to deal with ipmi response
+		 * from bmc and helps ipmi_queue_msg_sync() to get un-stuck.
+		 * Thus it avoids linux kernel to hang during panic due to
+		 * TOD failure.
+		 */
+		opal_run_pollers();
 		cpu_relax();
 		return;
 	}