diff mbox

FSP/CONSOLE: Workaround for unresponsive ipmi daemon

Message ID 20170607065024.3918-1-hegdevasant@linux.vnet.ibm.com
State Superseded
Headers show

Commit Message

Vasant Hegde June 7, 2017, 6:50 a.m. UTC
We use TCE mapped area to write data to console. Console header
(fsp_serbuf_hdr) is modified by both FSP and OPAL (OPAL updates
next_in pointer in fsp_serbuf_hdr and FSP updates next_out pointer).

Kernel makes opal_console_write() OPAL call to write data to console.
OPAL write data to TCE mapped area and sends MBOX command to FSP.
If our console becomes full and we have data to write to console,
we keep on waiting until FSP reads data.

In some corner cases, where FSP is active but not responding to
console MBOX message (due to buggy IPMI) and we have heavy console
write happening from kernel, then eventually our console buffer
becomes full. At this point OPAL starts sending OPAL_BUSY_EVENT to
kernel. Kernel will keep on retrying. This is creating kernel soft
lockups. In some extreme case when every CPU is trying to write to
console, user will not be able to ssh and thinks system is hang.

If we reset FSP or restart IPMI daemon on FSP, system recovers and
everything becomes normal.

This patch adds workaround to above issue by returning OPAL_HARDWARE
when cosole is full. Side effect of this patch is, we may endup dropping
latest console data. But better to drop console data than system hang.

Alternative approach is to drop old data from console buffer, make space
for new data. But in normal condition only FSP can update 'next_out'
pointer and if we touch that pointer, it may introduce some other
race conditions. Hence we decided to just new console write request.

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
---
 hw/fsp/fsp-console.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Vasant Hegde June 8, 2017, 6:54 a.m. UTC | #1
On 06/07/2017 12:20 PM, Vasant Hegde wrote:
> We use TCE mapped area to write data to console. Console header
> (fsp_serbuf_hdr) is modified by both FSP and OPAL (OPAL updates
> next_in pointer in fsp_serbuf_hdr and FSP updates next_out pointer).
>
> Kernel makes opal_console_write() OPAL call to write data to console.
> OPAL write data to TCE mapped area and sends MBOX command to FSP.
> If our console becomes full and we have data to write to console,
> we keep on waiting until FSP reads data.
>
> In some corner cases, where FSP is active but not responding to
> console MBOX message (due to buggy IPMI) and we have heavy console
> write happening from kernel, then eventually our console buffer
> becomes full. At this point OPAL starts sending OPAL_BUSY_EVENT to
> kernel. Kernel will keep on retrying. This is creating kernel soft
> lockups. In some extreme case when every CPU is trying to write to
> console, user will not be able to ssh and thinks system is hang.
>
> If we reset FSP or restart IPMI daemon on FSP, system recovers and
> everything becomes normal.
>
> This patch adds workaround to above issue by returning OPAL_HARDWARE
> when cosole is full. Side effect of this patch is, we may endup dropping
> latest console data. But better to drop console data than system hang.
>
> Alternative approach is to drop old data from console buffer, make space
> for new data. But in normal condition only FSP can update 'next_out'
> pointer and if we touch that pointer, it may introduce some other
> race conditions. Hence we decided to just new console write request.

Stewart,

We have to backport this patch to 860.30 release as well.

I think it will apply cleanly on 830.30 branch. Let me know in case if you want 
me to send
backported patch for 860.30 series.

-Vasant
Stewart Smith June 9, 2017, 5:10 a.m. UTC | #2
Vasant Hegde <hegdevasant@linux.vnet.ibm.com> writes:

> We use TCE mapped area to write data to console. Console header
> (fsp_serbuf_hdr) is modified by both FSP and OPAL (OPAL updates
> next_in pointer in fsp_serbuf_hdr and FSP updates next_out pointer).
>
> Kernel makes opal_console_write() OPAL call to write data to console.
> OPAL write data to TCE mapped area and sends MBOX command to FSP.
> If our console becomes full and we have data to write to console,
> we keep on waiting until FSP reads data.
>
> In some corner cases, where FSP is active but not responding to
> console MBOX message (due to buggy IPMI) and we have heavy console
> write happening from kernel, then eventually our console buffer
> becomes full. At this point OPAL starts sending OPAL_BUSY_EVENT to
> kernel. Kernel will keep on retrying. This is creating kernel soft
> lockups. In some extreme case when every CPU is trying to write to
> console, user will not be able to ssh and thinks system is hang.
>
> If we reset FSP or restart IPMI daemon on FSP, system recovers and
> everything becomes normal.
>
> This patch adds workaround to above issue by returning OPAL_HARDWARE
> when cosole is full. Side effect of this patch is, we may endup dropping
> latest console data. But better to drop console data than system hang.
>
> Alternative approach is to drop old data from console buffer, make space
> for new data. But in normal condition only FSP can update 'next_out'
> pointer and if we touch that pointer, it may introduce some other
> race conditions. Hence we decided to just new console write request.
>
> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
> ---
>  hw/fsp/fsp-console.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/fsp/fsp-console.c b/hw/fsp/fsp-console.c
> index fd67b20..a7b0281 100644
> --- a/hw/fsp/fsp-console.c
> +++ b/hw/fsp/fsp-console.c
> @@ -610,7 +610,7 @@ static int64_t fsp_console_write(int64_t term_number, int64_t *length,
>  	*length = written;
>  	unlock(&fsp_con_lock);
>
> -	return written ? OPAL_SUCCESS : OPAL_BUSY_EVENT;
> +	return written ? OPAL_SUCCESS : OPAL_HARDWARE;
>  }
>
>  static int64_t fsp_console_write_buffer_space(int64_t term_number,

I've been thinking about this problem a bit... and I'm not quite
convinced that this is the best solution. This would have us start
dropping console output fairly soon after the FSP slows down or stops
responding for a bit.....

What about an approach where we start returting OPAL_HARDWARE if we haven't
seen any progress from the FSP in, say, a second or something? (and
probably also log an error log and/or do a HIR?)

That way, if the FSP is just heavily loaded and being a bit slow, it
should all be okay, and not potentially loose important console
information (e.g. panic(), bug, oops etc).

thoughts?
Stewart Smith June 9, 2017, 5:13 a.m. UTC | #3
Vasant Hegde <hegdevasant@linux.vnet.ibm.com> writes:
> On 06/07/2017 12:20 PM, Vasant Hegde wrote:
>> We use TCE mapped area to write data to console. Console header
>> (fsp_serbuf_hdr) is modified by both FSP and OPAL (OPAL updates
>> next_in pointer in fsp_serbuf_hdr and FSP updates next_out pointer).
>>
>> Kernel makes opal_console_write() OPAL call to write data to console.
>> OPAL write data to TCE mapped area and sends MBOX command to FSP.
>> If our console becomes full and we have data to write to console,
>> we keep on waiting until FSP reads data.
>>
>> In some corner cases, where FSP is active but not responding to
>> console MBOX message (due to buggy IPMI) and we have heavy console
>> write happening from kernel, then eventually our console buffer
>> becomes full. At this point OPAL starts sending OPAL_BUSY_EVENT to
>> kernel. Kernel will keep on retrying. This is creating kernel soft
>> lockups. In some extreme case when every CPU is trying to write to
>> console, user will not be able to ssh and thinks system is hang.
>>
>> If we reset FSP or restart IPMI daemon on FSP, system recovers and
>> everything becomes normal.
>>
>> This patch adds workaround to above issue by returning OPAL_HARDWARE
>> when cosole is full. Side effect of this patch is, we may endup dropping
>> latest console data. But better to drop console data than system hang.
>>
>> Alternative approach is to drop old data from console buffer, make space
>> for new data. But in normal condition only FSP can update 'next_out'
>> pointer and if we touch that pointer, it may introduce some other
>> race conditions. Hence we decided to just new console write request.
>
> Stewart,
>
> We have to backport this patch to 860.30 release as well.
>
> I think it will apply cleanly on 830.30 branch. Let me know in case if you want 
> me to send
> backported patch for 860.30 series.

860.30 is on skiboot 5.4.x
Vasant Hegde June 9, 2017, 5:19 a.m. UTC | #4
On 06/09/2017 10:40 AM, Stewart Smith wrote:
> Vasant Hegde <hegdevasant@linux.vnet.ibm.com> writes:
>
>> We use TCE mapped area to write data to console. Console header
>> (fsp_serbuf_hdr) is modified by both FSP and OPAL (OPAL updates
>> next_in pointer in fsp_serbuf_hdr and FSP updates next_out pointer).
>>
>> Kernel makes opal_console_write() OPAL call to write data to console.
>> OPAL write data to TCE mapped area and sends MBOX command to FSP.
>> If our console becomes full and we have data to write to console,
>> we keep on waiting until FSP reads data.
>>
>> In some corner cases, where FSP is active but not responding to
>> console MBOX message (due to buggy IPMI) and we have heavy console
>> write happening from kernel, then eventually our console buffer
>> becomes full. At this point OPAL starts sending OPAL_BUSY_EVENT to
>> kernel. Kernel will keep on retrying. This is creating kernel soft
>> lockups. In some extreme case when every CPU is trying to write to
>> console, user will not be able to ssh and thinks system is hang.
>>
>> If we reset FSP or restart IPMI daemon on FSP, system recovers and
>> everything becomes normal.
>>
>> This patch adds workaround to above issue by returning OPAL_HARDWARE
>> when cosole is full. Side effect of this patch is, we may endup dropping
>> latest console data. But better to drop console data than system hang.
>>
>> Alternative approach is to drop old data from console buffer, make space
>> for new data. But in normal condition only FSP can update 'next_out'
>> pointer and if we touch that pointer, it may introduce some other
>> race conditions. Hence we decided to just new console write request.
>>
>> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
>> ---
>>  hw/fsp/fsp-console.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/hw/fsp/fsp-console.c b/hw/fsp/fsp-console.c
>> index fd67b20..a7b0281 100644
>> --- a/hw/fsp/fsp-console.c
>> +++ b/hw/fsp/fsp-console.c
>> @@ -610,7 +610,7 @@ static int64_t fsp_console_write(int64_t term_number, int64_t *length,
>>  	*length = written;
>>  	unlock(&fsp_con_lock);
>>
>> -	return written ? OPAL_SUCCESS : OPAL_BUSY_EVENT;
>> +	return written ? OPAL_SUCCESS : OPAL_HARDWARE;
>>  }
>>
>>  static int64_t fsp_console_write_buffer_space(int64_t term_number,
>
> I've been thinking about this problem a bit... and I'm not quite
> convinced that this is the best solution. This would have us start
> dropping console output fairly soon after the FSP slows down or stops
> responding for a bit.....

We have 128K buffer. Slow response from FSP is fine. We hit this issue only when 
FSP stops responding to console message.

>
> What about an approach where we start returting OPAL_HARDWARE if we haven't
> seen any progress from the FSP in, say, a second or something? (and

We may still have space in buffer right? So lets use it until it becomes full.

> probably also log an error log and/or do a HIR?)

That's good option. Log error and start Host initiated Reset.

-Vasant
Vasant Hegde June 9, 2017, 5:20 a.m. UTC | #5
>> We have to backport this patch to 860.30 release as well.
>>
>> I think it will apply cleanly on 830.30 branch. Let me know in case if you want
>> me to send
>> backported patch for 860.30 series.
>
> 860.30 is on skiboot 5.4.x
>

Ah yes. I will verify on 5.4.x branch as well.

-Vasant
Vaidyanathan Srinivasan June 9, 2017, 2:08 p.m. UTC | #6
* Vasant Hegde <hegdevasant@linux.vnet.ibm.com> [2017-06-09 10:49:19]:

> On 06/09/2017 10:40 AM, Stewart Smith wrote:
> > Vasant Hegde <hegdevasant@linux.vnet.ibm.com> writes:
> > 
> > > We use TCE mapped area to write data to console. Console header
> > > (fsp_serbuf_hdr) is modified by both FSP and OPAL (OPAL updates
> > > next_in pointer in fsp_serbuf_hdr and FSP updates next_out pointer).
> > > 
> > > Kernel makes opal_console_write() OPAL call to write data to console.
> > > OPAL write data to TCE mapped area and sends MBOX command to FSP.
> > > If our console becomes full and we have data to write to console,
> > > we keep on waiting until FSP reads data.
> > > 
> > > In some corner cases, where FSP is active but not responding to
> > > console MBOX message (due to buggy IPMI) and we have heavy console
> > > write happening from kernel, then eventually our console buffer
> > > becomes full. At this point OPAL starts sending OPAL_BUSY_EVENT to
> > > kernel. Kernel will keep on retrying. This is creating kernel soft
> > > lockups. In some extreme case when every CPU is trying to write to
> > > console, user will not be able to ssh and thinks system is hang.
> > > 
> > > If we reset FSP or restart IPMI daemon on FSP, system recovers and
> > > everything becomes normal.
> > > 
> > > This patch adds workaround to above issue by returning OPAL_HARDWARE
> > > when cosole is full. Side effect of this patch is, we may endup dropping
> > > latest console data. But better to drop console data than system hang.
> > > 
> > > Alternative approach is to drop old data from console buffer, make space
> > > for new data. But in normal condition only FSP can update 'next_out'
> > > pointer and if we touch that pointer, it may introduce some other
> > > race conditions. Hence we decided to just new console write request.
> > > 
> > > Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>

Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>

> > > ---
> > >  hw/fsp/fsp-console.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/fsp/fsp-console.c b/hw/fsp/fsp-console.c
> > > index fd67b20..a7b0281 100644
> > > --- a/hw/fsp/fsp-console.c
> > > +++ b/hw/fsp/fsp-console.c
> > > @@ -610,7 +610,7 @@ static int64_t fsp_console_write(int64_t term_number, int64_t *length,
> > >  	*length = written;
> > >  	unlock(&fsp_con_lock);
> > > 
> > > -	return written ? OPAL_SUCCESS : OPAL_BUSY_EVENT;
> > > +	return written ? OPAL_SUCCESS : OPAL_HARDWARE;
> > >  }
> > > 
> > >  static int64_t fsp_console_write_buffer_space(int64_t term_number,
> > 
> > I've been thinking about this problem a bit... and I'm not quite
> > convinced that this is the best solution. This would have us start
> > dropping console output fairly soon after the FSP slows down or stops
> > responding for a bit.....
> 
> We have 128K buffer. Slow response from FSP is fine. We hit this issue only
> when FSP stops responding to console message.

Allowing the 128k console buffer in OPAL fill is good enough time.  We
do not need additional time out.

> > 
> > What about an approach where we start returting OPAL_HARDWARE if we haven't
> > seen any progress from the FSP in, say, a second or something? (and
> 
> We may still have space in buffer right? So lets use it until it becomes full.

As long as surveillance messages are going through, then FSP is deemed
to be active and mbox is functional.  Hence allowing the buffer to be
full takes quite a bit of time which will be equivalent to FSP not
responding to console print requests for few seconds.  Hence I feel we
do not need to add a time based warning, just add an error when this
buffer is full and also indicate that console data may be lost. 

> 
> > probably also log an error log and/or do a HIR?)
> 
> That's good option. Log error and start Host initiated Reset.

Log an error, but avoid host initiated RR for this case.  That is
a too heavy hammer for restoring console data flow.  As per the spec,
as long as surveillance heart beat works, host should not initiate FSP
reset.

FSP also has various means to restart IPMI daemon and other userspace
to resolve any data flow problem.  Issuing a FSP reset could makes it hard
to debug and further hitting too many FSP resets will run up other
limits.

--Vaidy
diff mbox

Patch

diff --git a/hw/fsp/fsp-console.c b/hw/fsp/fsp-console.c
index fd67b20..a7b0281 100644
--- a/hw/fsp/fsp-console.c
+++ b/hw/fsp/fsp-console.c
@@ -610,7 +610,7 @@  static int64_t fsp_console_write(int64_t term_number, int64_t *length,
 	*length = written;
 	unlock(&fsp_con_lock);
 
-	return written ? OPAL_SUCCESS : OPAL_BUSY_EVENT;
+	return written ? OPAL_SUCCESS : OPAL_HARDWARE;
 }
 
 static int64_t fsp_console_write_buffer_space(int64_t term_number,