
[5/5] target-ppc: Handle cases when multi-processors get machine-check

Message ID 20140825134554.2361.45051.stgit@aravindap
State New

Commit Message

Aravinda Prasad Aug. 25, 2014, 1:45 p.m. UTC
It is possible for multiple processors to experience a machine
check at or about the same time. As per PAPR, subsequent
processors serialize, waiting for the first processor to
issue the ibm,nmi-interlock call.

A second processor retries if the first processor, which
received the machine check, is still reading the error log
and has yet to issue the ibm,nmi-interlock call.

This patch implements this functionality.

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
---
 hw/ppc/spapr_hcall.c |   13 +++++++++++++
 hw/ppc/spapr_rtas.c  |    8 +++++++-
 2 files changed, 20 insertions(+), 1 deletion(-)
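
In outline, the serialization this adds looks as follows (a minimal C sketch
using the flag and handler names from the patch below; error log construction
and register setup are elided, and the types are simplified so the sketch
stands on its own):

    #include <stdbool.h>
    #include <stdint.h>

    static bool mc_in_progress;        /* set while one CPU owns the machine check path */

    /* KVMPPC_H_REPORT_ERR handler: 0 means "busy, retry", otherwise the error log address */
    static uint64_t h_report_mc_err(uint64_t err_log_addr)
    {
        /* hcalls are processed one at a time, so the flag needs no extra locking */
        if (mc_in_progress) {
            return 0;                  /* another CPU is still between report and interlock */
        }
        mc_in_progress = 1;
        return err_log_addr;           /* this CPU now proceeds to the guest OS handler */
    }

    /* ibm,nmi-interlock: the guest OS has finished reading the error log */
    static void rtas_ibm_nmi_interlock(void)
    {
        mc_in_progress = 0;            /* the next reporting CPU may proceed */
    }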

Comments

David Gibson Aug. 26, 2014, 6:04 a.m. UTC | #1
On Mon, Aug 25, 2014 at 07:15:54PM +0530, Aravinda Prasad wrote:
> It is possible for multi-processors to experience machine
> check at or about the same time. As per PAPR, subsequent
> processors serialize waiting for the first processor to
> issue the ibm,nmi-interlock call.
> 
> The second processor retries if the first processor which
> received a machine check is still reading the error log
> and is yet to issue ibm,nmi-interlock call.
> 
> This patch implements this functionality.

The previous patch is broken without this, so just fold the two
together.
Aravinda Prasad Aug. 26, 2014, 7:04 a.m. UTC | #2
On Tuesday 26 August 2014 11:34 AM, David Gibson wrote:
> On Mon, Aug 25, 2014 at 07:15:54PM +0530, Aravinda Prasad wrote:
>> It is possible for multi-processors to experience machine
>> check at or about the same time. As per PAPR, subsequent
>> processors serialize waiting for the first processor to
>> issue the ibm,nmi-interlock call.
>>
>> The second processor retries if the first processor which
>> received a machine check is still reading the error log
>> and is yet to issue ibm,nmi-interlock call.
>>
>> This patch implements this functionality.
> 
> The previous patch is broken without this, so just fold the two
> together.

sure.

Thanks for the review.

Regards,
Aravinda

>
Alexander Graf Aug. 27, 2014, 10:40 a.m. UTC | #3
On 25.08.14 15:45, Aravinda Prasad wrote:
> It is possible for multi-processors to experience machine
> check at or about the same time. As per PAPR, subsequent
> processors serialize waiting for the first processor to
> issue the ibm,nmi-interlock call.
> 
> The second processor retries if the first processor which
> received a machine check is still reading the error log
> and is yet to issue ibm,nmi-interlock call.
> 
> This patch implements this functionality.
> 
> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>

This patch doesn't make any sense. Both threads will issue an HCALL
which will get locked inside of QEMU, so we'll never see the case where
both hypercalls get processed at the same time.


Alex
Aravinda Prasad Aug. 28, 2014, 6:56 a.m. UTC | #4
On Wednesday 27 August 2014 04:10 PM, Alexander Graf wrote:
> 
> 
> On 25.08.14 15:45, Aravinda Prasad wrote:
>> It is possible for multi-processors to experience machine
>> check at or about the same time. As per PAPR, subsequent
>> processors serialize waiting for the first processor to
>> issue the ibm,nmi-interlock call.
>>
>> The second processor retries if the first processor which
>> received a machine check is still reading the error log
>> and is yet to issue ibm,nmi-interlock call.
>>
>> This patch implements this functionality.
>>
>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
> 
> This patch doesn't make any sense. Both threads will issue an HCALL
> which will get locked inside of QEMU, so we'll never see the case where
> both hypercalls get processed at the same time.

AFAIK, only one thread can succeed entering qemu upon parallel hcall
from different guest CPUs as it is gated by a lock. Hence one hcall is
processed at a time.

As per PAPR, we don't want any other KVMPPC_H_REPORT_ERR hcall to be
processed at the same time and further KVMPPC_H_REPORT_ERR hcall thus
issued should wait until the OS issues ibm,nmi-interlock.
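
To put the same thing as a timeline (assuming two vCPUs, CPU0 and CPU1, both
hitting a machine check):

    CPU0: KVMPPC_H_REPORT_ERR  -> sets mc_in_progress, gets the error log address
    CPU1: KVMPPC_H_REPORT_ERR  -> sees mc_in_progress set, gets 0 back
    CPU1: 0x200 trampoline sees r3 == 0 and re-issues the hcall (spins)
    CPU0: OS reads the log, calls ibm,nmi-interlock -> mc_in_progress cleared
    CPU1: next retry succeeds; CPU1 now owns the machine check path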

Thanks for the review.

Regards,
Aravinda

> 
> 
> Alex
>
Alexander Graf Aug. 28, 2014, 8:39 a.m. UTC | #5
On 28.08.14 08:56, Aravinda Prasad wrote:
> 
> 
> On Wednesday 27 August 2014 04:10 PM, Alexander Graf wrote:
>>
>>
>> On 25.08.14 15:45, Aravinda Prasad wrote:
>>> It is possible for multi-processors to experience machine
>>> check at or about the same time. As per PAPR, subsequent
>>> processors serialize waiting for the first processor to
>>> issue the ibm,nmi-interlock call.
>>>
>>> The second processor retries if the first processor which
>>> received a machine check is still reading the error log
>>> and is yet to issue ibm,nmi-interlock call.
>>>
>>> This patch implements this functionality.
>>>
>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>>
>> This patch doesn't make any sense. Both threads will issue an HCALL
>> which will get locked inside of QEMU, so we'll never see the case where
>> both hypercalls get processed at the same time.
> 
> AFAIK, only one thread can succeed entering qemu upon parallel hcall
> from different guest CPUs as it is gated by a lock. Hence one hcall is
> processed at a time.

Exactly, so at the point of the if(mc_in_progress == 1), mc_in_progress
will always be 0.


Alex
Alexander Graf Aug. 28, 2014, 8:42 a.m. UTC | #6
On 28.08.14 08:56, Aravinda Prasad wrote:
> 
> 
> On Wednesday 27 August 2014 04:10 PM, Alexander Graf wrote:
>>
>>
>> On 25.08.14 15:45, Aravinda Prasad wrote:
>>> It is possible for multi-processors to experience machine
>>> check at or about the same time. As per PAPR, subsequent
>>> processors serialize waiting for the first processor to
>>> issue the ibm,nmi-interlock call.
>>>
>>> The second processor retries if the first processor which
>>> received a machine check is still reading the error log
>>> and is yet to issue ibm,nmi-interlock call.
>>>
>>> This patch implements this functionality.
>>>
>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>>
>> This patch doesn't make any sense. Both threads will issue an HCALL
>> which will get locked inside of QEMU, so we'll never see the case where
>> both hypercalls get processed at the same time.
> 
> AFAIK, only one thread can succeed entering qemu upon parallel hcall
> from different guest CPUs as it is gated by a lock. Hence one hcall is
> processed at a time.
> 
> As per PAPR, we don't want any other KVMPPC_H_REPORT_ERR hcall to be
> processed at the same time and further KVMPPC_H_REPORT_ERR hcall thus
> issued should wait until the OS issues ibm,nmi-interlock.

Oh, now I understand. The locking time is from
[h_report_mc_err...rtas_ibm_nmi_interlock].

This should definitely go into the comment on the check in
h_report_mc_err. In fact, remove the fact that only one thread can
execute and instead write where the lock gets unset and that during that
phase only one vcpu may process the NMI.
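
Something along these lines, for example (a sketch of the wording and the
check, not final code):

    /*
     * mc_in_progress is set here and cleared only in rtas_ibm_nmi_interlock(),
     * i.e. once the guest OS has finished reading the error log.  Between
     * these two points only one vcpu may be processing the NMI; any other
     * vcpu that reports an error gets 0 back and its 0x200 trampoline retries
     * the hcall.  The flag itself needs no locking because hcalls are
     * processed one at a time.
     */
    if (mc_in_progress) {
        return 0;
    }
    mc_in_progress = 1;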


Alex
Aravinda Prasad Aug. 28, 2014, 5:45 p.m. UTC | #7
On Thursday 28 August 2014 02:12 PM, Alexander Graf wrote:
> 
> 
> On 28.08.14 08:56, Aravinda Prasad wrote:
>>
>>
>> On Wednesday 27 August 2014 04:10 PM, Alexander Graf wrote:
>>>
>>>
>>> On 25.08.14 15:45, Aravinda Prasad wrote:
>>>> It is possible for multi-processors to experience machine
>>>> check at or about the same time. As per PAPR, subsequent
>>>> processors serialize waiting for the first processor to
>>>> issue the ibm,nmi-interlock call.
>>>>
>>>> The second processor retries if the first processor which
>>>> received a machine check is still reading the error log
>>>> and is yet to issue ibm,nmi-interlock call.
>>>>
>>>> This patch implements this functionality.
>>>>
>>>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>>>
>>> This patch doesn't make any sense. Both threads will issue an HCALL
>>> which will get locked inside of QEMU, so we'll never see the case where
>>> both hypercalls get processed at the same time.
>>
>> AFAIK, only one thread can succeed entering qemu upon parallel hcall
>> from different guest CPUs as it is gated by a lock. Hence one hcall is
>> processed at a time.
>>
>> As per PAPR, we don't want any other KVMPPC_H_REPORT_ERR hcall to be
>> processed at the same time and further KVMPPC_H_REPORT_ERR hcall thus
>> issued should wait until the OS issues ibm,nmi-interlock.
> 
> Oh, now I understand. The locking time is from
> [h_report_mc_err...rtas_ibm_nmi_interlock].
> 
> This should definitely go into the comment on the check in
> h_report_mc_err. In fact, remove the fact that only one thread can
> execute and instead write where the lock gets unset and that during that
> phase only one vcpu may process the NMI.

Sure will add a comment.

Regards,
Aravinda

> 
> 
> Alex
>

Patch

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index be063f4..542d0b7 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -94,6 +94,9 @@  struct rtas_mc_log {
     struct rtas_error_log err_log;
 };
 
+/* Whether machine check handling is in progress by any CPU */
+bool mc_in_progress;
+
 static void do_spr_sync(void *arg)
 {
     struct SPRSyncState *s = arg;
@@ -674,6 +677,16 @@  static target_ulong h_report_mc_err(PowerPCCPU *cpu, sPAPREnvironment *spapr,
     PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
 
     /*
+     * No need for lock. Only one thread can be executing
+     * inside an hcall
+     */
+    if (mc_in_progress == 1) {
+        return 0;
+    }
+
+    mc_in_progress = 1;
+
+    /*
      * We save the original r3 register in SPRG2 in 0x200 vector,
      * which is patched during call to ibm.nmi-register. Original
      * r3 is required to be included in error log
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 1135d2b..8fe4db2 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -36,6 +36,8 @@ 
 
 #include <libfdt.h>
 
+extern bool mc_in_progress;
+
 static void rtas_display_character(PowerPCCPU *cpu, sPAPREnvironment *spapr,
                                    uint32_t token, uint32_t nargs,
                                    target_ulong args,
@@ -303,6 +305,9 @@  static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
         0x6063f004,    /* ori     r3,r3,f004  */
         /* Issue H_CALL */
         0x44000022,    /*  sc      1     */
+        0x2fa30000,    /* cmplwi r3,0 */
+        0x409e0008,    /* bne continue */
+        0x4800020a,    /* retry KVMPPC_H_REPORT_ERR */
         0x7c9243a6,    /* mtspr r4 sprg2 */
         0xe8830000,    /* ld r4, 0(r3) */
         0x7c9a03a6,    /* mtspr r4, srr0 */
@@ -333,7 +338,7 @@  static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
      * machine check address requested by OS
      */
     branch_inst |= guest_machine_check_addr;
-    memcpy(&trampoline[11], &branch_inst, sizeof(branch_inst));
+    memcpy(&trampoline[14], &branch_inst, sizeof(branch_inst));
 
     /* Handle all Host/Guest LE/BE combinations */
     if ((*pcc->interrupts_big_endian)(cpu)) {
@@ -359,6 +364,7 @@  static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
                                    target_ulong args,
                                    uint32_t nret, target_ulong rets)
 {
+    mc_in_progress = 0;
     rtas_st(rets, 0, RTAS_OUT_SUCCESS);
 }
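
For completeness: the three instructions added after the sc 1 make the 0x200
trampoline re-issue KVMPPC_H_REPORT_ERR whenever it returns 0, and because
they are inserted ahead of the branch placeholder, its slot in trampoline[]
moves from index 11 to index 14. In C terms the guest-side loop is roughly
the following (a sketch only; hcall() stands in for the sc 1 hypercall, and
the real code is the handcrafted assembly above):

    do {
        r3 = hcall(KVMPPC_H_REPORT_ERR);   /* sc 1 */
    } while (r3 == 0);                     /* 0: another CPU still owns the NMI path, retry */
    /* r3 holds the error log address; branch to the registered OS handler */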