diff mbox series

[v7] s390x/cpu: expose the guest crash information

Message ID 20180208095752.23960-1-borntraeger@de.ibm.com
State New
Headers show
Series [v7] s390x/cpu: expose the guest crash information | expand

Commit Message

Christian Borntraeger Feb. 8, 2018, 9:57 a.m. UTC
This patch is the s390 implementation of guest crash information,
similar to commit d187e08dc4 ("i386/cpu: add crash-information QOM
property") and the related commits. We will detect several crash
reasons, with the "disabled wait" being the most important one, since
this is used by all s390 guests as a "panic like" notification.

Demonstrate these ways with examples as follows.

  1. crash-information QOM property;

  Run qemu with -qmp unix:qmp-sock,server, then use utility "qmp-shell"
  to execute "qom-get" command, and might get the result like,

  (QEMU) (QEMU) qom-get path=/machine/unattached/device[0] \
      property=crash-information
  {"return": {"core": 0, "reason": "disabled-wait", "psw-mask": 562956395872256, \
      "type": "s390", "psw-addr": 1102832}}

  2. GUEST_PANICKED event reporting;

  Run qemu with a socket option, and telnet or nc to that,
  -chardev socket,id=qmp,port=4444,host=localhost,server \
  -mon chardev=qmp,mode=control,pretty=on \
  Negotiating the mode by { "execute": "qmp_capabilities" }, and the crash
  information will be reported on a guest crash event like,

  {
    "timestamp": {
        "seconds": 1518004739,
        "microseconds": 552563
    },
    "event": "GUEST_PANICKED",
    "data": {
        "action": "pause",
        "info": {
            "core": 0,
            "psw-addr": 1102832,
            "reason": "disabled-wait",
            "psw-mask": 562956395872256,
            "type": "s390"
        }
    }
  }

  3. log;

  Run qemu with the parameters: -D <logfile> -d guest_errors, to
  specify the logfile and log item. The results might be,

  Guest crashed on cpu 0: disabled-wait
  PSW: 0x0002000180000000 0x000000000010d3f0

Co-authored-by: Jing Liu <liujbjl@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
v6->v7:
	- word separation like disabled-wait instead of disabledwait 
	- use S390CrashReason_str instead of qapi_enum_lookup
	- fix linux user
	- improve description for extint-loop

 qapi/run-state.json   | 55 +++++++++++++++++++++++++++++++++++++++++++++++++--
 target/s390x/cpu.c    | 43 ++++++++++++++++++++++++++++++++++++++++
 target/s390x/cpu.h    |  2 ++
 target/s390x/helper.c |  5 ++++-
 target/s390x/kvm.c    | 15 +++++++-------
 vl.c                  | 11 +++++++++--
 6 files changed, 118 insertions(+), 13 deletions(-)

Comments

Eric Blake Feb. 8, 2018, 2:20 p.m. UTC | #1
On 02/08/2018 03:57 AM, Christian Borntraeger wrote:
> This patch is the s390 implementation of guest crash information,
> similar to commit d187e08dc4 ("i386/cpu: add crash-information QOM
> property") and the related commits. We will detect several crash
> reasons, with the "disabled wait" being the most important one, since
> this is used by all s390 guests as a "panic like" notification.
> 

> Co-authored-by: Jing Liu <liujbjl@linux.vnet.ibm.com>
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---

> +##
> +# @GuestPanicInformationS390:
> +#
> +# S390 specific guest panic information (PSW)
> +#
> +# @core: core id of the CPU that crashed
> +# @psw-mask: control fields of guest PSW
> +# @psw-addr: guest instruction address
> +# @reason: guest crash reason in human readable form
> +#
> +# Since: 2.12
> +##
> +{'struct': 'GuestPanicInformationS390',
> + 'data': { 'core': 'uint32',

Should core be optional,...


> +static GuestPanicInformation *s390_cpu_get_crash_info(CPUState *cs)
> +{
> +    GuestPanicInformation *panic_info;
> +    S390CPU *cpu = S390_CPU(cs);
> +
> +    cpu_synchronize_state(cs);
> +    panic_info = g_malloc0(sizeof(GuestPanicInformation));
> +
> +    panic_info->type = GUEST_PANIC_INFORMATION_TYPE_S390;
> +#if !defined(CONFIG_USER_ONLY)
> +    panic_info->u.s390.core = cpu->env.core_id;
> +#endif

...given that it is only conditionally assigned?  If so, you'd also need 
to set panic_info->u.s390.has_core when you have a core id to expose.


> +
> +static void s390_cpu_get_crash_info_qom(Object *obj, Visitor *v,
> +                                        const char *name, void *opaque,
> +                                        Error **errp)
> +{
> +    CPUState *cs = CPU(obj);
> +    GuestPanicInformation *panic_info;
> +
> +    if (!cs->crash_occurred) {
> +        error_setg(errp, "No crash occured");

s/occured/occurred/
Christian Borntraeger Feb. 8, 2018, 3:03 p.m. UTC | #2
On 02/08/2018 03:20 PM, Eric Blake wrote:
> On 02/08/2018 03:57 AM, Christian Borntraeger wrote:
>> This patch is the s390 implementation of guest crash information,
>> similar to commit d187e08dc4 ("i386/cpu: add crash-information QOM
>> property") and the related commits. We will detect several crash
>> reasons, with the "disabled wait" being the most important one, since
>> this is used by all s390 guests as a "panic like" notification.
>>
> 
>> Co-authored-by: Jing Liu <liujbjl@linux.vnet.ibm.com>
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> ---
> 
>> +##
>> +# @GuestPanicInformationS390:
>> +#
>> +# S390 specific guest panic information (PSW)
>> +#
>> +# @core: core id of the CPU that crashed
>> +# @psw-mask: control fields of guest PSW
>> +# @psw-addr: guest instruction address
>> +# @reason: guest crash reason in human readable form
>> +#
>> +# Since: 2.12
>> +##
>> +{'struct': 'GuestPanicInformationS390',
>> + 'data': { 'core': 'uint32',
> 
> Should core be optional,...
> 
> 
>> +static GuestPanicInformation *s390_cpu_get_crash_info(CPUState *cs)
>> +{
>> +    GuestPanicInformation *panic_info;
>> +    S390CPU *cpu = S390_CPU(cs);
>> +
>> +    cpu_synchronize_state(cs);
>> +    panic_info = g_malloc0(sizeof(GuestPanicInformation));
>> +
>> +    panic_info->type = GUEST_PANIC_INFORMATION_TYPE_S390;
>> +#if !defined(CONFIG_USER_ONLY)
>> +    panic_info->u.s390.core = cpu->env.core_id;
>> +#endif
> 
> ...given that it is only conditionally assigned?  If so, you'd also need to set panic_info->u.s390.has_core when you have a core id to expose.

Can we simply set it to 0 here and keep it mandatory? After all for
linux-user the qapi interface really makes not a lot of sense.


> 
> 
>> +
>> +static void s390_cpu_get_crash_info_qom(Object *obj, Visitor *v,
>> +                                        const char *name, void *opaque,
>> +                                        Error **errp)
>> +{
>> +    CPUState *cs = CPU(obj);
>> +    GuestPanicInformation *panic_info;
>> +
>> +    if (!cs->crash_occurred) {
>> +        error_setg(errp, "No crash occured");
> 
> s/occured/occurred/
> 
yes
Christian Borntraeger Feb. 9, 2018, 8:39 a.m. UTC | #3
On 02/08/2018 03:20 PM, Eric Blake wrote:
> On 02/08/2018 03:57 AM, Christian Borntraeger wrote:
>> This patch is the s390 implementation of guest crash information,
>> similar to commit d187e08dc4 ("i386/cpu: add crash-information QOM
>> property") and the related commits. We will detect several crash
>> reasons, with the "disabled wait" being the most important one, since
>> this is used by all s390 guests as a "panic like" notification.
>>
> 
>> Co-authored-by: Jing Liu <liujbjl@linux.vnet.ibm.com>
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> ---
> 
>> +##
>> +# @GuestPanicInformationS390:
>> +#
>> +# S390 specific guest panic information (PSW)
>> +#
>> +# @core: core id of the CPU that crashed
>> +# @psw-mask: control fields of guest PSW
>> +# @psw-addr: guest instruction address
>> +# @reason: guest crash reason in human readable form
>> +#
>> +# Since: 2.12
>> +##
>> +{'struct': 'GuestPanicInformationS390',
>> + 'data': { 'core': 'uint32',
> 
> Should core be optional,...
> 
> 
>> +static GuestPanicInformation *s390_cpu_get_crash_info(CPUState *cs)
>> +{
>> +    GuestPanicInformation *panic_info;
>> +    S390CPU *cpu = S390_CPU(cs);
>> +
>> +    cpu_synchronize_state(cs);
>> +    panic_info = g_malloc0(sizeof(GuestPanicInformation));
>> +
>> +    panic_info->type = GUEST_PANIC_INFORMATION_TYPE_S390;
>> +#if !defined(CONFIG_USER_ONLY)
>> +    panic_info->u.s390.core = cpu->env.core_id;
>> +#endif
> 
> ...given that it is only conditionally assigned?  If so, you'd also need to set panic_info->u.s390.has_core when you have a core id to expose.

So after some irc discussion with Daniel and Erik, I plan to add

#else
    panic_info->u.s390.core = 0;

for two reasons
- this interface is likely never used for linux-user
- linux-user does not know about CPU numbers, so using 0 is a good default

an optional item seems like overkill interface-wise.
diff mbox series

Patch

diff --git a/qapi/run-state.json b/qapi/run-state.json
index bca46a8785..4bd15ae54f 100644
--- a/qapi/run-state.json
+++ b/qapi/run-state.json
@@ -320,22 +320,29 @@ 
 #
 # An enumeration of the guest panic information types
 #
+# @hyper-v: hyper-v guest panic information type
+#
+# @s390: s390 guest panic information type (Since: 2.12)
+#
 # Since: 2.9
 ##
 { 'enum': 'GuestPanicInformationType',
-  'data': [ 'hyper-v'] }
+  'data': [ 'hyper-v', 's390' ] }
 
 ##
 # @GuestPanicInformation:
 #
 # Information about a guest panic
 #
+# @type: Crash type that defines the hypervisor specific information
+#
 # Since: 2.9
 ##
 {'union': 'GuestPanicInformation',
  'base': {'type': 'GuestPanicInformationType'},
  'discriminator': 'type',
- 'data': { 'hyper-v': 'GuestPanicInformationHyperV' } }
+ 'data': { 'hyper-v': 'GuestPanicInformationHyperV',
+           's390': 'GuestPanicInformationS390' } }
 
 ##
 # @GuestPanicInformationHyperV:
@@ -350,3 +357,47 @@ 
            'arg3': 'uint64',
            'arg4': 'uint64',
            'arg5': 'uint64' } }
+
+##
+# @S390CrashReason:
+#
+# Reason why the CPU is in a crashed state.
+#
+# @unknown: no crash reason was set
+#
+# @disabled-wait: the CPU has entered a disabled wait state
+#
+# @extint-loop: clock comparator or cpu timer interrupt with new PSW enabled
+#              for external interrupts
+#
+# @pgmint-loop: program interrupt with BAD new PSW
+#
+# @opint-loop: operation exception interrupt with invalid code at the program
+#             interrupt new PSW
+#
+# Since: 2.12
+##
+{ 'enum': 'S390CrashReason',
+  'data': [ 'unknown',
+            'disabled-wait',
+            'extint-loop',
+            'pgmint-loop',
+            'opint-loop' ] }
+
+##
+# @GuestPanicInformationS390:
+#
+# S390 specific guest panic information (PSW)
+#
+# @core: core id of the CPU that crashed
+# @psw-mask: control fields of guest PSW
+# @psw-addr: guest instruction address
+# @reason: guest crash reason in human readable form
+#
+# Since: 2.12
+##
+{'struct': 'GuestPanicInformationS390',
+ 'data': { 'core': 'uint32',
+           'psw-mask': 'uint64',
+           'psw-addr': 'uint64',
+           'reason': 'S390CrashReason' } }
diff --git a/target/s390x/cpu.c b/target/s390x/cpu.c
index d2e6b9f5c7..603e68e284 100644
--- a/target/s390x/cpu.c
+++ b/target/s390x/cpu.c
@@ -35,6 +35,8 @@ 
 #include "qemu/error-report.h"
 #include "trace.h"
 #include "qapi/visitor.h"
+#include "qapi-visit.h"
+#include "sysemu/hw_accel.h"
 #include "exec/exec-all.h"
 #include "hw/qdev-properties.h"
 #ifndef CONFIG_USER_ONLY
@@ -237,6 +239,44 @@  out:
     error_propagate(errp, err);
 }
 
+static GuestPanicInformation *s390_cpu_get_crash_info(CPUState *cs)
+{
+    GuestPanicInformation *panic_info;
+    S390CPU *cpu = S390_CPU(cs);
+
+    cpu_synchronize_state(cs);
+    panic_info = g_malloc0(sizeof(GuestPanicInformation));
+
+    panic_info->type = GUEST_PANIC_INFORMATION_TYPE_S390;
+#if !defined(CONFIG_USER_ONLY)
+    panic_info->u.s390.core = cpu->env.core_id;
+#endif
+    panic_info->u.s390.psw_mask = cpu->env.psw.mask;
+    panic_info->u.s390.psw_addr = cpu->env.psw.addr;
+    panic_info->u.s390.reason = cpu->env.crash_reason;
+
+    return panic_info;
+}
+
+static void s390_cpu_get_crash_info_qom(Object *obj, Visitor *v,
+                                        const char *name, void *opaque,
+                                        Error **errp)
+{
+    CPUState *cs = CPU(obj);
+    GuestPanicInformation *panic_info;
+
+    if (!cs->crash_occurred) {
+        error_setg(errp, "No crash occured");
+        return;
+    }
+
+    panic_info = s390_cpu_get_crash_info(cs);
+
+    visit_type_GuestPanicInformation(v, "crash-information", &panic_info,
+                                     errp);
+    qapi_free_GuestPanicInformation(panic_info);
+}
+
 static void s390_cpu_initfn(Object *obj)
 {
     CPUState *cs = CPU(obj);
@@ -249,6 +289,8 @@  static void s390_cpu_initfn(Object *obj)
     cs->env_ptr = env;
     cs->halted = 1;
     cs->exception_index = EXCP_HLT;
+    object_property_add(obj, "crash-information", "GuestPanicInformation",
+                        s390_cpu_get_crash_info_qom, NULL, NULL, NULL, NULL);
     s390_cpu_model_register_props(obj);
 #if !defined(CONFIG_USER_ONLY)
     qemu_get_timedate(&tm, 0);
@@ -482,6 +524,7 @@  static void s390_cpu_class_init(ObjectClass *oc, void *data)
     cc->do_interrupt = s390_cpu_do_interrupt;
 #endif
     cc->dump_state = s390_cpu_dump_state;
+    cc->get_crash_info = s390_cpu_get_crash_info;
     cc->set_pc = s390_cpu_set_pc;
     cc->gdb_read_register = s390_cpu_gdb_read_register;
     cc->gdb_write_register = s390_cpu_gdb_write_register;
diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index a1123ad621..a0feed6781 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -102,6 +102,8 @@  struct CPUS390XState {
 
     PSW psw;
 
+    S390CrashReason crash_reason;
+
     uint64_t cc_src;
     uint64_t cc_dst;
     uint64_t cc_vr;
diff --git a/target/s390x/helper.c b/target/s390x/helper.c
index 35d9741918..6d297716b1 100644
--- a/target/s390x/helper.c
+++ b/target/s390x/helper.c
@@ -84,12 +84,15 @@  static inline bool is_special_wait_psw(uint64_t psw_addr)
 
 void s390_handle_wait(S390CPU *cpu)
 {
+    CPUState *cs = CPU(cpu);
+
     if (s390_cpu_halt(cpu) == 0) {
 #ifndef CONFIG_USER_ONLY
         if (is_special_wait_psw(cpu->env.psw.addr)) {
             qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
         } else {
-            qemu_system_guest_panicked(NULL);
+            cpu->env.crash_reason = S390_CRASH_REASON_DISABLED_WAIT;
+            qemu_system_guest_panicked(cpu_get_crash_info(cs));
         }
 #endif
     }
diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
index 8736001156..a67b92774a 100644
--- a/target/s390x/kvm.c
+++ b/target/s390x/kvm.c
@@ -1568,15 +1568,14 @@  static int handle_instruction(S390CPU *cpu, struct kvm_run *run)
     return r;
 }
 
-static void unmanageable_intercept(S390CPU *cpu, const char *str, int pswoffset)
+static void unmanageable_intercept(S390CPU *cpu, S390CrashReason reason,
+                                   int pswoffset)
 {
     CPUState *cs = CPU(cpu);
 
-    error_report("Unmanageable %s! CPU%i new PSW: 0x%016lx:%016lx",
-                 str, cs->cpu_index, ldq_phys(cs->as, cpu->env.psa + pswoffset),
-                 ldq_phys(cs->as, cpu->env.psa + pswoffset + 8));
     s390_cpu_halt(cpu);
-    qemu_system_guest_panicked(NULL);
+    cpu->env.crash_reason = reason;
+    qemu_system_guest_panicked(cpu_get_crash_info(cs));
 }
 
 /* try to detect pgm check loops */
@@ -1606,7 +1605,7 @@  static int handle_oper_loop(S390CPU *cpu, struct kvm_run *run)
         !(oldpsw.mask & PSW_MASK_PSTATE) &&
         (newpsw.mask & PSW_MASK_ASC) == (oldpsw.mask & PSW_MASK_ASC) &&
         (newpsw.mask & PSW_MASK_DAT) == (oldpsw.mask & PSW_MASK_DAT)) {
-        unmanageable_intercept(cpu, "operation exception loop",
+        unmanageable_intercept(cpu, S390_CRASH_REASON_OPINT_LOOP,
                                offsetof(LowCore, program_new_psw));
         return EXCP_HALTED;
     }
@@ -1627,12 +1626,12 @@  static int handle_intercept(S390CPU *cpu)
             r = handle_instruction(cpu, run);
             break;
         case ICPT_PROGRAM:
-            unmanageable_intercept(cpu, "program interrupt",
+            unmanageable_intercept(cpu, S390_CRASH_REASON_PGMINT_LOOP,
                                    offsetof(LowCore, program_new_psw));
             r = EXCP_HALTED;
             break;
         case ICPT_EXT_INT:
-            unmanageable_intercept(cpu, "external interrupt",
+            unmanageable_intercept(cpu, S390_CRASH_REASON_EXTINT_LOOP,
                                    offsetof(LowCore, external_new_psw));
             r = EXCP_HALTED;
             break;
diff --git a/vl.c b/vl.c
index e517a8d995..7df480acff 100644
--- a/vl.c
+++ b/vl.c
@@ -1738,7 +1738,7 @@  void qemu_system_reset(ShutdownCause reason)
 
 void qemu_system_guest_panicked(GuestPanicInformation *info)
 {
-    qemu_log_mask(LOG_GUEST_ERROR, "Guest crashed\n");
+    qemu_log_mask(LOG_GUEST_ERROR, "Guest crashed");
 
     if (current_cpu) {
         current_cpu->crash_occurred = true;
@@ -1754,13 +1754,20 @@  void qemu_system_guest_panicked(GuestPanicInformation *info)
 
     if (info) {
         if (info->type == GUEST_PANIC_INFORMATION_TYPE_HYPER_V) {
-            qemu_log_mask(LOG_GUEST_ERROR, "HV crash parameters: (%#"PRIx64
+            qemu_log_mask(LOG_GUEST_ERROR, "\nHV crash parameters: (%#"PRIx64
                           " %#"PRIx64" %#"PRIx64" %#"PRIx64" %#"PRIx64")\n",
                           info->u.hyper_v.arg1,
                           info->u.hyper_v.arg2,
                           info->u.hyper_v.arg3,
                           info->u.hyper_v.arg4,
                           info->u.hyper_v.arg5);
+        } else if (info->type == GUEST_PANIC_INFORMATION_TYPE_S390) {
+            qemu_log_mask(LOG_GUEST_ERROR, " on cpu %d: %s\n"
+                          "PSW: 0x%016" PRIx64 " 0x%016" PRIx64"\n",
+                          info->u.s390.core,
+                          S390CrashReason_str(info->u.s390.reason),
+                          info->u.s390.psw_mask,
+                          info->u.s390.psw_addr);
         }
         qapi_free_GuestPanicInformation(info);
     }