
hmi: Fix a bug where partial hmi event was reported to host.

Message ID 20160322101839.18759.14744.stgit@mars
State Accepted

Commit Message

Mahesh J Salgaonkar March 22, 2016, 10:19 a.m. UTC
From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

The current code sends a partial HMI event (4 * 64 bits instead of
5 * 64 bits) to the host. The last 64 bits contain chip id/PIR info for
reporting checkstop events, so this bug affects only checkstop events.

Host console o/p without this patch:

[  305.628283] Fatal Hypervisor Maintenance interrupt [Not recovered]
[  305.628341]  Error detail: Malfunction Alert
[  305.628388] 	HMER: 8040000000000000
[  305.628423] 	CPU PIR: 00000000
[  305.628458] 	[Unit: VSU] Logic core check stop


Host console o/p with this patch:

[  200.122883] Fatal Hypervisor Maintenance interrupt [Not recovered]
[  200.122941]  Error detail: Malfunction Alert
[  200.122986] 	HMER: 8040000000000000
[  200.123021] 	CPU PIR: 000008e8
[  200.123055] 	[Unit: VSU] Logic core check stop

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 core/hmi.c |   15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

Comments

Stewart Smith March 31, 2016, 6:37 a.m. UTC | #1
Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> writes:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>
> The current code sends partial hmi event (4 * 64bits instead of
> 5 * 64bits) to host. The last 64 bits contains chip id/pir info for
> reporting checkstop events. This bug affects only checkstop events.
>
> Host console o/p without this patch:
>
> [  305.628283] Fatal Hypervisor Maintenance interrupt [Not recovered]
> [  305.628341]  Error detail: Malfunction Alert
> [  305.628388] 	HMER: 8040000000000000
> [  305.628423] 	CPU PIR: 00000000
> [  305.628458] 	[Unit: VSU] Logic core check stop
>
>
> Host console o/p with this patch:
>
> [  200.122883] Fatal Hypervisor Maintenance interrupt [Not recovered]
> [  200.122941]  Error detail: Malfunction Alert
> [  200.122986] 	HMER: 8040000000000000
> [  200.123021] 	CPU PIR: 000008e8
> [  200.123055] 	[Unit: VSU] Logic core check stop
>
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

This looks like it should also go to stable too, right? As in, to 5.1.x
and 5.2.x ?
Mahesh J Salgaonkar March 31, 2016, 6:44 a.m. UTC | #2
On 03/31/2016 12:07 PM, Stewart Smith wrote:
> Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> writes:
>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>
>> The current code sends partial hmi event (4 * 64bits instead of
>> 5 * 64bits) to host. The last 64 bits contains chip id/pir info for
>> reporting checkstop events. This bug affects only checkstop events.
>>
>> Host console o/p without this patch:
>>
>> [  305.628283] Fatal Hypervisor Maintenance interrupt [Not recovered]
>> [  305.628341]  Error detail: Malfunction Alert
>> [  305.628388] 	HMER: 8040000000000000
>> [  305.628423] 	CPU PIR: 00000000
>> [  305.628458] 	[Unit: VSU] Logic core check stop
>>
>>
>> Host console o/p with this patch:
>>
>> [  200.122883] Fatal Hypervisor Maintenance interrupt [Not recovered]
>> [  200.122941]  Error detail: Malfunction Alert
>> [  200.122986] 	HMER: 8040000000000000
>> [  200.123021] 	CPU PIR: 000008e8
>> [  200.123055] 	[Unit: VSU] Logic core check stop
>>
>> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> This looks like it should also go to stable too, right? As in, to 5.1.x
> and 5.2.x ?
> 

Yes.
Stewart Smith April 1, 2016, 3:59 a.m. UTC | #3
Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> writes:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>
> The current code sends partial hmi event (4 * 64bits instead of
> 5 * 64bits) to host. The last 64 bits contains chip id/pir info for
> reporting checkstop events. This bug affects only checkstop events.
>
> Host console o/p without this patch:
>
> [  305.628283] Fatal Hypervisor Maintenance interrupt [Not recovered]
> [  305.628341]  Error detail: Malfunction Alert
> [  305.628388] 	HMER: 8040000000000000
> [  305.628423] 	CPU PIR: 00000000
> [  305.628458] 	[Unit: VSU] Logic core check stop
>
>
> Host console o/p with this patch:
>
> [  200.122883] Fatal Hypervisor Maintenance interrupt [Not recovered]
> [  200.122941]  Error detail: Malfunction Alert
> [  200.122986] 	HMER: 8040000000000000
> [  200.123021] 	CPU PIR: 000008e8
> [  200.123055] 	[Unit: VSU] Logic core check stop
>
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> ---
>  core/hmi.c |   15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)

Thanks, applied to:
skiboot-5.1.x as of 2636009
skiboot-5.2.x as of d597168
skiboot master as of a56b9aa

Patch

diff --git a/core/hmi.c b/core/hmi.c
index d2cca90..a934438 100644
--- a/core/hmi.c
+++ b/core/hmi.c
@@ -217,7 +217,7 @@  static struct lock hmi_lock = LOCK_UNLOCKED;
 
 static int queue_hmi_event(struct OpalHMIEvent *hmi_evt, int recover)
 {
-	uint64_t *hmi_data;
+	size_t num_params;
 
 	/* Don't queue up event if recover == -1 */
 	if (recover == -1)
@@ -230,16 +230,17 @@  static int queue_hmi_event(struct OpalHMIEvent *hmi_evt, int recover)
 		hmi_evt->disposition = OpalHMI_DISPOSITION_NOT_RECOVERED;
 
 	/*
-	 * V2 of struct OpalHMIEvent is of (4 * 64 bits) size and well packed
+	 * V2 of struct OpalHMIEvent is of (5 * 64 bits) size and well packed
 	 * structure. Hence use uint64_t pointer to pass entire structure
-	 * using 4 params in generic message format.
+	 * using 5 params in generic message format. Instead of hard coding
+	 * num_params divide the struct size by 8 bytes to get exact
+	 * num_params value.
 	 */
-	hmi_data = (uint64_t *)hmi_evt;
+	num_params = ALIGN_UP(sizeof(*hmi_evt), sizeof(u64)) / sizeof(u64);
 
 	/* queue up for delivery to host. */
-	return opal_queue_msg(OPAL_MSG_HMI_EVT, NULL, NULL,
-				hmi_data[0], hmi_data[1], hmi_data[2],
-				hmi_data[3]);
+	return _opal_queue_msg(OPAL_MSG_HMI_EVT, NULL, NULL,
+				num_params, (uint64_t *)hmi_evt);
 }
 
 static int is_capp_recoverable(int chip_id)
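
The size arithmetic in the patch can be checked in isolation. The following is a minimal sketch, not skiboot code: `struct demo_hmi_event` is a hypothetical 40-byte stand-in for struct OpalHMIEvent (the real field layout is not shown in this patch), `ALIGN_UP` is redefined locally on the assumption that it rounds up to a power-of-two boundary, and `hmi_num_params()` and `pack_params()` are illustrative helpers that do not exist in skiboot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in: round x up to a multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/* Hypothetical layout, 5 * 64 bits in total. Not the real OpalHMIEvent. */
struct demo_hmi_event {
	uint8_t  version;
	uint8_t  severity;
	uint8_t  type;
	uint8_t  disposition;
	uint8_t  reserved[4];
	uint64_t hmer;
	uint64_t tfmr;
	uint64_t extra;          /* hypothetical filler word */
	uint64_t checkstop_info; /* 5th word: stand-in for chip id / PIR */
};

static_assert(sizeof(struct demo_hmi_event) == 40,
	      "demo struct is expected to be 5 * 64 bits");

/* Number of 64-bit message parameters needed to carry evt_size bytes. */
static size_t hmi_num_params(size_t evt_size)
{
	return ALIGN_UP(evt_size, sizeof(uint64_t)) / sizeof(uint64_t);
}

/*
 * Copy the event into a parameter array, zero-padding the tail.
 * Returns the number of parameters used, or 0 if they do not fit in max.
 */
static size_t pack_params(const struct demo_hmi_event *evt,
			  uint64_t *params, size_t max)
{
	size_t n = hmi_num_params(sizeof(*evt));

	if (n > max)
		return 0;
	memset(params, 0, n * sizeof(uint64_t));
	memcpy(params, evt, sizeof(*evt));
	return n;
}
```

With this layout, `hmi_num_params(sizeof(struct demo_hmi_event))` evaluates to 5, whereas a hard-coded count of 4 would drop the last word, which is the word holding the checkstop information in this sketch. Deriving the count from `sizeof` also means it keeps up if the structure grows again.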