diff mbox

[v2] fsp/rtc: ignore TOD_RESET status

Message ID m3io51t6u0.fsf@oc8180480414.ibm.com
State Superseded
Headers show

Commit Message

Stewart Smith Nov. 17, 2015, 7:37 a.m. UTC
Cédric Le Goater <clg@fr.ibm.com> writes:
> When the timeofday is considered INVALID by the FSP :
>
> 	System time is INVALID: 2015/10/14 20:13:53.305261
>
> it will return a FSP_STATUS_TOD_RESET (0xA9) to any TOD read
> request done by OPAL. This status stays invalid until a write
> request is done.
>
> This is a problem for the kernel as the OPAL_RTC_READ calls
> are done in loops. The consequence is that the skiroot kernel
> hangs quite early in the boot looping on : 
>
> 	[9163642153,3] RTC TOD in invalid state
> 	[9186642687,3] RTC TOD in invalid state
> 	[9204379669,3] RTC TOD in invalid state
> 	[9226043775,3] RTC TOD in invalid state
> 	[9240944791,3] RTC TOD in invalid state
>
> with other unpleasant consequences following. The system is
> left unusable and it is difficult for the user to know what
> is happening.
>
> This patch modifies the behavior of the read request when an
> invalid status is received. It simply ignores it and considers
> the read as successful. 
>
> This breaks the loops done in the kernel on OPAL_RTC_READ calls
> and lets the boot complete. The status is expected to become
> valid after the first TOD write request. 

The loop in question is likely drivers/rtc/rtc-opal.c:

static int opal_get_rtc_time(struct device *dev, struct rtc_time *tm)
{
...
	while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
		rc = opal_rtc_read(&__y_m_d, &__h_m_s_ms);
		if (rc == OPAL_BUSY_EVENT)
			opal_poll_events(NULL);
		else
			msleep(10);
	}

	if (rc != OPAL_SUCCESS)
		return -EIO;
...
}

I don't think this is quite the right way to fix it though.

I think this was introduced by 6cf8b663e7d7cb1e827b6d9c90e694ea583f6f87
where I refactored fsp-rtc to use a state machine over holding locks
(and then we'd have problems with pollers).

So what I think is the right fix is:

This should mean we return OPAL_INTERNAL_ERROR on TOD error from FSP
which will adequately break out of the loop and return EIO to whoverer
was asking for RTC value from the kernel.

does that look right?

Do you have any way to make the FSP give us an invalid time?

Comments

Cédric Le Goater Nov. 17, 2015, 3:10 p.m. UTC | #1
On 11/17/2015 08:37 AM, Stewart Smith wrote:
> Cédric Le Goater <clg@fr.ibm.com> writes:
>> When the timeofday is considered INVALID by the FSP :
>>
>> 	System time is INVALID: 2015/10/14 20:13:53.305261
>>
>> it will return a FSP_STATUS_TOD_RESET (0xA9) to any TOD read
>> request done by OPAL. This status stays invalid until a write
>> request is done.
>>
>> This is a problem for the kernel as the OPAL_RTC_READ calls
>> are done in loops. The consequence is that the skiroot kernel
>> hangs quite early in the boot looping on : 
>>
>> 	[9163642153,3] RTC TOD in invalid state
>> 	[9186642687,3] RTC TOD in invalid state
>> 	[9204379669,3] RTC TOD in invalid state
>> 	[9226043775,3] RTC TOD in invalid state
>> 	[9240944791,3] RTC TOD in invalid state
>>
>> with other unpleasant consequences following. The system is
>> left unusable and it is difficult for the user to know what
>> is happening.
>>
>> This patch modifies the behavior of the read request when an
>> invalid status is received. It simply ignores it and considers
>> the read as successful. 
>>
>> This breaks the loops done in the kernel on OPAL_RTC_READ calls
>> and lets the boot complete. The status is expected to become
>> valid after the first TOD write request. 
> 
> The loop in question is likely drivers/rtc/rtc-opal.c:
> 
> static int opal_get_rtc_time(struct device *dev, struct rtc_time *tm)
> {
> ...
> 	while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
> 		rc = opal_rtc_read(&__y_m_d, &__h_m_s_ms);
> 		if (rc == OPAL_BUSY_EVENT)
> 			opal_poll_events(NULL);
> 		else
> 			msleep(10);
> 	}
> 
> 	if (rc != OPAL_SUCCESS)
> 		return -EIO;
> ...
> }
> 
> I don't think this is quite the right way to fix it though.
> 
> I think this was introduced by 6cf8b663e7d7cb1e827b6d9c90e694ea583f6f87
> where I refactored fsp-rtc to use a state machine over holding locks
> (and then we'd have problems with pollers).
> 
> So what I think is the right fix is:
> 
> diff --git a/hw/fsp/fsp-rtc.c b/hw/fsp/fsp-rtc.c
> index 6b982179d542..df0f67940372 100644
> --- a/hw/fsp/fsp-rtc.c
> +++ b/hw/fsp/fsp-rtc.c
> @@ -173,19 +173,16 @@ static void fsp_rtc_process_read(struct fsp_msg *read_resp)
>                 log_simple_error(&e_info(OPAL_RC_RTC_TOD),
>                                 "RTC TOD in invalid state\n");
>                 rtc_tod_state = RTC_TOD_INVALID;
> -               rtc_read_request_state = RTC_READ_NO_REQUEST;
>                 break;
>  
>         case FSP_STATUS_TOD_PERMANENT_ERROR:
>                 log_simple_error(&e_info(OPAL_RC_RTC_TOD),
>                         "RTC TOD in permanent error state\n");
>                 rtc_tod_state = RTC_TOD_PERMANENT_ERROR;
> -               rtc_read_request_state = RTC_READ_NO_REQUEST;
>                 break;
>  
>         case FSP_STATUS_SUCCESS:
>                 /* Save the read RTC value in our cache */
> -               rtc_read_request_state = RTC_READ_REQUEST_AVAILABLE;
>                 rtc_tod_state = RTC_TOD_VALID;
>                 datetime_to_tm(read_resp->data.words[0],
>                                (u64) read_resp->data.words[1] << 32, &tm);
> @@ -199,8 +196,8 @@ static void fsp_rtc_process_read(struct fsp_msg *read_resp)
>                 log_simple_error(&e_info(OPAL_RC_RTC_TOD),
>                                 "RTC TOD read failed: %d\n", val);
>                 rtc_tod_state = RTC_TOD_INVALID;
> -               rtc_read_request_state = RTC_READ_NO_REQUEST;
>         }
> +       rtc_read_request_state = RTC_READ_REQUEST_AVAILABLE;
>  }
>  
> 
> This should mean we return OPAL_INTERNAL_ERROR on TOD error from FSP
> which will adequately break out of the loop and return EIO to whoverer
> was asking for RTC value from the kernel.
> 
> does that look right?

Yes. it does. 

> Do you have any way to make the FSP give us an invalid time?

Yes. I found a special command for rtim. Go on the FSP and run :

	$ rtim  forceClockValue 
	Forcing clock to value specified (0), linuxToo(0)  ELOG:0 
	$ rtim timeofday 

	System time is INVALID: 1970/01/01 00:00:04.978498

You should see in the opal logs something like :

	[8904001884,8] Getting initial RTC TOD
	[8941797435,8] RTC completion 0x307fe928
	[8941799832,3] RTC TOD in invalid state
	[15313006986,8] RTC read complete, state 1
	[37883060390,8] Sending new RTC read request
	[37889469947,8] RTC completion 0x30afc350
	[37889472230,3] RTC TOD in invalid state
	[37889493180,8] RTC read complete, state 1
	[44953190290,8] Sending new RTC read request
	[44959735572,8] RTC completion 0x30afc350
	[44959737978,3] RTC TOD in invalid state
	[44959765390,8] RTC read complete, state 1
	[60946799142,8] Sending new RTC write request
	[60978712531,8] RTC completion 0x30afc350
	[60984783166,8] Sending new RTC read request
	[60998259263,8] RTC completion 0x30afc350
	[60998260514,8] FSP-RTC Got time: 2015-10-17 13:7:15
	[60998272000,8] RTC read complete, state 0
	[60998273063,8] FSP-RTC Cached datetime: 20151117 1307150000000000



Thanks,


C.
diff mbox

Patch

diff --git a/hw/fsp/fsp-rtc.c b/hw/fsp/fsp-rtc.c
index 6b982179d542..df0f67940372 100644
--- a/hw/fsp/fsp-rtc.c
+++ b/hw/fsp/fsp-rtc.c
@@ -173,19 +173,16 @@  static void fsp_rtc_process_read(struct fsp_msg *read_resp)
                log_simple_error(&e_info(OPAL_RC_RTC_TOD),
                                "RTC TOD in invalid state\n");
                rtc_tod_state = RTC_TOD_INVALID;
-               rtc_read_request_state = RTC_READ_NO_REQUEST;
                break;
 
        case FSP_STATUS_TOD_PERMANENT_ERROR:
                log_simple_error(&e_info(OPAL_RC_RTC_TOD),
                        "RTC TOD in permanent error state\n");
                rtc_tod_state = RTC_TOD_PERMANENT_ERROR;
-               rtc_read_request_state = RTC_READ_NO_REQUEST;
                break;
 
        case FSP_STATUS_SUCCESS:
                /* Save the read RTC value in our cache */
-               rtc_read_request_state = RTC_READ_REQUEST_AVAILABLE;
                rtc_tod_state = RTC_TOD_VALID;
                datetime_to_tm(read_resp->data.words[0],
                               (u64) read_resp->data.words[1] << 32, &tm);
@@ -199,8 +196,8 @@  static void fsp_rtc_process_read(struct fsp_msg *read_resp)
                log_simple_error(&e_info(OPAL_RC_RTC_TOD),
                                "RTC TOD read failed: %d\n", val);
                rtc_tod_state = RTC_TOD_INVALID;
-               rtc_read_request_state = RTC_READ_NO_REQUEST;
        }
+       rtc_read_request_state = RTC_READ_REQUEST_AVAILABLE;
 }