hw/fsp/rtc: read/write cached rtc tod on fsp hir.

Submitted by ppaidipe@linux.vnet.ibm.com on April 3, 2017, 1:57 a.m.

Details

Message ID 1491184641-19738-1-git-send-email-ppaidipe@linux.vnet.ibm.com
State New

Commit Message

ppaidipe@linux.vnet.ibm.com April 3, 2017, 1:57 a.m.
Currently fsp-rtc reads/writes the cached RTC TOD on an FSP
reset. Use the fsp_in_rr() function so that the cached RTC value
is also used when the FSP reset is initiated by the HIR.

Below is the kernel trace seen when setting the hw clock as the HIR process starts.

[ 1727.775824] NMI watchdog: BUG: soft lockup - CPU#57 stuck for 23s! [hwclock:7688]
[ 1727.775856] Modules linked in: vmx_crypto ibmpowernv ipmi_powernv uio_pdrv_genirq ipmi_devintf powernv_op_panel uio ipmi_msghandler powernv_rng leds_powernv ip_tables x_tables autofs4 ses enclosure scsi_transport_sas crc32c_vpmsum lpfc ipr tg3 scsi_transport_fc
[ 1727.775883] CPU: 57 PID: 7688 Comm: hwclock Not tainted 4.10.0-14-generic #16-Ubuntu
[ 1727.775883] task: c000000fdfdc8400 task.stack: c000000fdfef4000
[ 1727.775884] NIP: c00000000090540c LR: c0000000000846f4 CTR: 000000003006dd70
[ 1727.775885] REGS: c000000fdfef79a0 TRAP: 0901   Not tainted  (4.10.0-14-generic)
[ 1727.775886] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
[ 1727.775889]   CR: 28024442  XER: 20000000
[ 1727.775890] CFAR: c00000000008472c SOFTE: 1
               GPR00: 0000000030005128 c000000fdfef7c20 c00000000144c900 fffffffffffffff4
               GPR04: 0000000028024442 c00000000090540c 9000000000009033 0000000000000000
               GPR08: 0000000000000000 0000000031fc4000 c000000000084710 9000000000001003
               GPR12: c0000000000846e8 c00000000fba0100
[ 1727.775897] NIP [c00000000090540c] opal_set_rtc_time+0x4c/0xb0
[ 1727.775899] LR [c0000000000846f4] opal_return+0xc/0x48
[ 1727.775899] Call Trace:
[ 1727.775900] [c000000fdfef7c20] [c00000000090540c] opal_set_rtc_time+0x4c/0xb0 (unreliable)
[ 1727.775901] [c000000fdfef7c60] [c000000000900828] rtc_set_time+0xb8/0x1b0
[ 1727.775903] [c000000fdfef7ca0] [c000000000902364] rtc_dev_ioctl+0x454/0x630
[ 1727.775904] [c000000fdfef7d40] [c00000000035b1f4] do_vfs_ioctl+0xd4/0x8c0
[ 1727.775906] [c000000fdfef7de0] [c00000000035bab4] SyS_ioctl+0xd4/0xf0
[ 1727.775907] [c000000fdfef7e30] [c00000000000b184] system_call+0x38/0xe0
[ 1727.775908] Instruction dump:
[ 1727.775909] f821ffc1 39200000 7c832378 91210028 38a10020 39200000 38810028 f9210020
[ 1727.775911] 4bfffe6d e8810020 80610028 4b77f61d <60000000> 7c7f1b78 3860000a 2fbffff4

This was found when executing the testcase
https://github.com/open-power/op-test-framework/blob/master/testcases/fspresetReload.py

With this fix, the FSP HIR torture testcase in the above test
runs fine.

Signed-off-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
---
 hw/fsp/fsp-rtc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Ananth N Mavinakayanahalli April 4, 2017, 8:33 a.m.
On Mon, Apr 03, 2017 at 07:27:21AM +0530, Pridhiviraj Paidipeddi wrote:
> Currently fsp-rtc reads/writes the cached RTC TOD on an FSP
> reset. Use the fsp_in_rr() function so that the cached RTC value
> is also used when the FSP reset is initiated by the HIR.
> 
> Below is the kernel trace seen when setting the hw clock as the HIR process starts.
> 
> [ 1727.775824] NMI watchdog: BUG: soft lockup - CPU#57 stuck for 23s! [hwclock:7688]
> [ 1727.775856] Modules linked in: vmx_crypto ibmpowernv ipmi_powernv uio_pdrv_genirq ipmi_devintf powernv_op_panel uio ipmi_msghandler powernv_rng leds_powernv ip_tables x_tables autofs4 ses enclosure scsi_transport_sas crc32c_vpmsum lpfc ipr tg3 scsi_transport_fc
> [ 1727.775883] CPU: 57 PID: 7688 Comm: hwclock Not tainted 4.10.0-14-generic #16-Ubuntu
> [ 1727.775883] task: c000000fdfdc8400 task.stack: c000000fdfef4000
> [ 1727.775884] NIP: c00000000090540c LR: c0000000000846f4 CTR: 000000003006dd70
> [ 1727.775885] REGS: c000000fdfef79a0 TRAP: 0901   Not tainted  (4.10.0-14-generic)
> [ 1727.775886] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
> [ 1727.775889]   CR: 28024442  XER: 20000000
> [ 1727.775890] CFAR: c00000000008472c SOFTE: 1
>                GPR00: 0000000030005128 c000000fdfef7c20 c00000000144c900 fffffffffffffff4
>                GPR04: 0000000028024442 c00000000090540c 9000000000009033 0000000000000000
>                GPR08: 0000000000000000 0000000031fc4000 c000000000084710 9000000000001003
>                GPR12: c0000000000846e8 c00000000fba0100
> [ 1727.775897] NIP [c00000000090540c] opal_set_rtc_time+0x4c/0xb0
> [ 1727.775899] LR [c0000000000846f4] opal_return+0xc/0x48
> [ 1727.775899] Call Trace:
> [ 1727.775900] [c000000fdfef7c20] [c00000000090540c] opal_set_rtc_time+0x4c/0xb0 (unreliable)
> [ 1727.775901] [c000000fdfef7c60] [c000000000900828] rtc_set_time+0xb8/0x1b0
> [ 1727.775903] [c000000fdfef7ca0] [c000000000902364] rtc_dev_ioctl+0x454/0x630
> [ 1727.775904] [c000000fdfef7d40] [c00000000035b1f4] do_vfs_ioctl+0xd4/0x8c0
> [ 1727.775906] [c000000fdfef7de0] [c00000000035bab4] SyS_ioctl+0xd4/0xf0
> [ 1727.775907] [c000000fdfef7e30] [c00000000000b184] system_call+0x38/0xe0
> [ 1727.775908] Instruction dump:
> [ 1727.775909] f821ffc1 39200000 7c832378 91210028 38a10020 39200000 38810028 f9210020
> [ 1727.775911] 4bfffe6d e8810020 80610028 4b77f61d <60000000> 7c7f1b78 3860000a 2fbffff4
> 
> This was found when executing the testcase
> https://github.com/open-power/op-test-framework/blob/master/testcases/fspresetReload.py
> 
> With this fix, the FSP HIR torture testcase in the above test
> runs fine.
> 
> Signed-off-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>

Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>

This will work, but we will also need to audit the other FSP_RESET_START
cases.

A particular sequence of actions (with timeouts) needs to be executed to
put an FSP in a state ready to be reset (HIR case). The actual RESET_START
notification is sent after the HIR sequence actually triggers the FSP
reset. The window between the start of the HIR sequence and the actual
notification is where this problem can occur.
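
The window described above can be illustrated with a small standalone model.
This is only a sketch, not skiboot code: the enum, the model_* functions and
the three-phase split are hypothetical names made up for illustration, and it
assumes that fsp_in_rr() returns true both during the HIR sequence and after
the RESET_START notification, while the old fsp_in_reset flag is set only
after the notification.

```c
#include <stdbool.h>

/* Hypothetical phases of an FSP reset; the names are illustrative only. */
enum model_phase {
	MODEL_RUNNING,        /* FSP is up and answering requests */
	MODEL_HIR_STARTED,    /* HIR sequence running, RESET_START not yet seen */
	MODEL_RESET_NOTIFIED, /* RESET_START notification has arrived */
};

/* Models the old test: true only once RESET_START has been received. */
static bool model_in_reset(enum model_phase p)
{
	return p == MODEL_RESET_NOTIFIED;
}

/* Models the assumed fsp_in_rr() behaviour: also true in the HIR window. */
static bool model_in_rr(enum model_phase p)
{
	return p == MODEL_HIR_STARTED || p == MODEL_RESET_NOTIFIED;
}

/*
 * True when an RTC request made in phase p would still be sent to the FSP
 * instead of being served from the cached TOD, given the check in use.
 */
static bool model_sent_to_fsp(enum model_phase p,
			      bool (*check)(enum model_phase))
{
	return !check(p);
}
```

In this model, model_sent_to_fsp(MODEL_HIR_STARTED, model_in_reset) is true,
so a request made in the window still goes to an FSP that is about to reset,
which is presumably where hwclock stalled. With model_in_rr as the check, the
same call returns false and the cached TOD is used instead.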


diff --git a/hw/fsp/fsp-rtc.c b/hw/fsp/fsp-rtc.c
index df0f679..b908ce9 100644
--- a/hw/fsp/fsp-rtc.c
+++ b/hw/fsp/fsp-rtc.c
@@ -280,7 +280,7 @@  static int64_t fsp_opal_rtc_read(uint32_t *year_month_day,
 	}
 
 	/* During R/R of FSP, read cached TOD */
-	if (fsp_in_reset) {
+	if (fsp_in_rr()) {
 		if (rtc_tod_state == RTC_TOD_VALID) {
 			rtc_cache_get_datetime(year_month_day,
 					       hour_minute_second_millisecond);
@@ -362,7 +362,7 @@  static int64_t fsp_rtc_send_write_request(uint32_t year_month_day,
 	}
 	prlog(PR_TRACE, " -> req at %p\n", msg);
 
-	if (fsp_in_reset) {
+	if (fsp_in_rr()) {
 		datetime_to_tm(msg->data.words[0],
 			       (u64) msg->data.words[1] << 32,  &tm);
 		rtc_cache_update(&tm);