Message ID | m3io51t6u0.fsf@oc8180480414.ibm.com |
---|---|
State | Superseded |
Headers | show |
On 11/17/2015 08:37 AM, Stewart Smith wrote: > Cédric Le Goater <clg@fr.ibm.com> writes: >> When the timeofday is considered INVALID by the FSP : >> >> System time is INVALID: 2015/10/14 20:13:53.305261 >> >> it will return a FSP_STATUS_TOD_RESET (0xA9) to any TOD read >> request done by OPAL. This status stays invalid until a write >> request is done. >> >> This is a problem for the kernel as the OPAL_RTC_READ calls >> are done in loops. The consequence is that the skiroot kernel >> hangs quite early in the boot looping on : >> >> [9163642153,3] RTC TOD in invalid state >> [9186642687,3] RTC TOD in invalid state >> [9204379669,3] RTC TOD in invalid state >> [9226043775,3] RTC TOD in invalid state >> [9240944791,3] RTC TOD in invalid state >> >> with other unpleasant consequences following. The system is >> left unusable and it is difficult for the user to know what >> is happening. >> >> This patch modifies the behavior of the read request when an >> invalid status is received. It simply ignores it and considers >> the read as successful. >> >> This breaks the loops done in the kernel on OPAL_RTC_READ calls >> and lets the boot complete. The status is expected to become >> valid after the first TOD write request. > > The loop in question is likely drivers/rtc/rtc-opal.c: > > static int opal_get_rtc_time(struct device *dev, struct rtc_time *tm) > { > ... > while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) { > rc = opal_rtc_read(&__y_m_d, &__h_m_s_ms); > if (rc == OPAL_BUSY_EVENT) > opal_poll_events(NULL); > else > msleep(10); > } > > if (rc != OPAL_SUCCESS) > return -EIO; > ... > } > > I don't think this is quite the right way to fix it though. > > I think this was introduced by 6cf8b663e7d7cb1e827b6d9c90e694ea583f6f87 > where I refactored fsp-rtc to use a state machine over holding locks > (and then we'd have problems with pollers). > > So what I think is the right fix is: > > diff --git a/hw/fsp/fsp-rtc.c b/hw/fsp/fsp-rtc.c > index 6b982179d542..df0f67940372 100644 > --- a/hw/fsp/fsp-rtc.c > +++ b/hw/fsp/fsp-rtc.c > @@ -173,19 +173,16 @@ static void fsp_rtc_process_read(struct fsp_msg *read_resp) > log_simple_error(&e_info(OPAL_RC_RTC_TOD), > "RTC TOD in invalid state\n"); > rtc_tod_state = RTC_TOD_INVALID; > - rtc_read_request_state = RTC_READ_NO_REQUEST; > break; > > case FSP_STATUS_TOD_PERMANENT_ERROR: > log_simple_error(&e_info(OPAL_RC_RTC_TOD), > "RTC TOD in permanent error state\n"); > rtc_tod_state = RTC_TOD_PERMANENT_ERROR; > - rtc_read_request_state = RTC_READ_NO_REQUEST; > break; > > case FSP_STATUS_SUCCESS: > /* Save the read RTC value in our cache */ > - rtc_read_request_state = RTC_READ_REQUEST_AVAILABLE; > rtc_tod_state = RTC_TOD_VALID; > datetime_to_tm(read_resp->data.words[0], > (u64) read_resp->data.words[1] << 32, &tm); > @@ -199,8 +196,8 @@ static void fsp_rtc_process_read(struct fsp_msg *read_resp) > log_simple_error(&e_info(OPAL_RC_RTC_TOD), > "RTC TOD read failed: %d\n", val); > rtc_tod_state = RTC_TOD_INVALID; > - rtc_read_request_state = RTC_READ_NO_REQUEST; > } > + rtc_read_request_state = RTC_READ_REQUEST_AVAILABLE; > } > > > This should mean we return OPAL_INTERNAL_ERROR on TOD error from FSP > which will adequately break out of the loop and return EIO to whoverer > was asking for RTC value from the kernel. > > does that look right? Yes. it does. > Do you have any way to make the FSP give us an invalid time? Yes. I found a special command for rtim. Go on the FSP and run : $ rtim forceClockValue Forcing clock to value specified (0), linuxToo(0) ELOG:0 $ rtim timeofday System time is INVALID: 1970/01/01 00:00:04.978498 You should see in the opal logs something like : [8904001884,8] Getting initial RTC TOD [8941797435,8] RTC completion 0x307fe928 [8941799832,3] RTC TOD in invalid state [15313006986,8] RTC read complete, state 1 [37883060390,8] Sending new RTC read request [37889469947,8] RTC completion 0x30afc350 [37889472230,3] RTC TOD in invalid state [37889493180,8] RTC read complete, state 1 [44953190290,8] Sending new RTC read request [44959735572,8] RTC completion 0x30afc350 [44959737978,3] RTC TOD in invalid state [44959765390,8] RTC read complete, state 1 [60946799142,8] Sending new RTC write request [60978712531,8] RTC completion 0x30afc350 [60984783166,8] Sending new RTC read request [60998259263,8] RTC completion 0x30afc350 [60998260514,8] FSP-RTC Got time: 2015-10-17 13:7:15 [60998272000,8] RTC read complete, state 0 [60998273063,8] FSP-RTC Cached datetime: 20151117 1307150000000000 Thanks, C.
diff --git a/hw/fsp/fsp-rtc.c b/hw/fsp/fsp-rtc.c index 6b982179d542..df0f67940372 100644 --- a/hw/fsp/fsp-rtc.c +++ b/hw/fsp/fsp-rtc.c @@ -173,19 +173,16 @@ static void fsp_rtc_process_read(struct fsp_msg *read_resp) log_simple_error(&e_info(OPAL_RC_RTC_TOD), "RTC TOD in invalid state\n"); rtc_tod_state = RTC_TOD_INVALID; - rtc_read_request_state = RTC_READ_NO_REQUEST; break; case FSP_STATUS_TOD_PERMANENT_ERROR: log_simple_error(&e_info(OPAL_RC_RTC_TOD), "RTC TOD in permanent error state\n"); rtc_tod_state = RTC_TOD_PERMANENT_ERROR; - rtc_read_request_state = RTC_READ_NO_REQUEST; break; case FSP_STATUS_SUCCESS: /* Save the read RTC value in our cache */ - rtc_read_request_state = RTC_READ_REQUEST_AVAILABLE; rtc_tod_state = RTC_TOD_VALID; datetime_to_tm(read_resp->data.words[0], (u64) read_resp->data.words[1] << 32, &tm); @@ -199,8 +196,8 @@ static void fsp_rtc_process_read(struct fsp_msg *read_resp) log_simple_error(&e_info(OPAL_RC_RTC_TOD), "RTC TOD read failed: %d\n", val); rtc_tod_state = RTC_TOD_INVALID; - rtc_read_request_state = RTC_READ_NO_REQUEST; } + rtc_read_request_state = RTC_READ_REQUEST_AVAILABLE; }