Message ID | 20201215175445.1272776-17-pbonzini@redhat.com |
---|---|
State | New |
Headers | show |
Series | [PULL,01/45] remove preconfig state | expand |
On Tue, 15 Dec 2020 at 18:11, Paolo Bonzini <pbonzini@redhat.com> wrote: > > From: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> > > The current default action of pausing a guest after a panic event > is received leaves the responsibility to resume guest execution to the > management layer. The reasons for this behavior are discussed here: > https://lore.kernel.org/qemu-devel/52148F88.5000509@redhat.com/ > > However, in instances like the case of older guests (Linux and > Windows) using a pvpanic device but missing support for the > PVPANIC_CRASHLOADED event, and Windows guests using the hv-crash > enlightenment, it is desirable to allow the guests to continue > running after sending a PVPANIC_PANICKED event. This allows such > guests to proceed to capture a crash dump and automatically reboot > without intervention of a management layer. > > Add an option to avoid stopping a VM after a panic event is received, > by passing: > > -action panic=none > > in the command line arguments, or during runtime by using an upcoming > QMP command. Hi. This commit message doesn't say it's changing the default action, but the change does: > @@ -3899,6 +3899,8 @@ DEF("action", HAS_ARG, QEMU_OPTION_action, > " action when guest reboots [default=none]\n" > "-action shutdown=poweroff|pause\n" > " action when guest shuts down [default=poweroff]\n" > + "-action panic=poweroff|pause|none\n" > + " action when guest panics [default=poweroff]\n" > "-action watchdog=reset|shutdown|poweroff|inject-nmi|pause|debug|none\n" > " action when watchdog fires [default=reset]\n", > QEMU_ARCH_ALL) > RebootAction reboot_action = REBOOT_ACTION_NONE; > ShutdownAction shutdown_action = SHUTDOWN_ACTION_POWEROFF; > +PanicAction panic_action = PANIC_ACTION_POWEROFF; We used to default to 'pause' and now we default to 'poweroff'. We noticed this because it broke an in-flight test case for the pvpanic-pci device from Mihai (which was expecting to see the device in 'pause' state and found it was now in 'poweroff'). Test cases aren't very exciting, but was it really intentional to change the default behaviour? It's part of the user-facing surface of QEMU, so if we did intend a default change that ought really to be more clearly stated (and noted in the Changelog) I think. thanks -- PMM
On 1/19/2021 4:34 PM, Peter Maydell wrote: > On Tue, 15 Dec 2020 at 18:11, Paolo Bonzini <pbonzini@redhat.com> wrote: >> From: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> >> >> The current default action of pausing a guest after a panic event >> is received leaves the responsibility to resume guest execution to the >> management layer. The reasons for this behavior are discussed here: >> https://lore.kernel.org/qemu-devel/52148F88.5000509@redhat.com/ >> >> However, in instances like the case of older guests (Linux and >> Windows) using a pvpanic device but missing support for the >> PVPANIC_CRASHLOADED event, and Windows guests using the hv-crash >> enlightenment, it is desirable to allow the guests to continue >> running after sending a PVPANIC_PANICKED event. This allows such >> guests to proceed to capture a crash dump and automatically reboot >> without intervention of a management layer. >> >> Add an option to avoid stopping a VM after a panic event is received, >> by passing: >> >> -action panic=none >> >> in the command line arguments, or during runtime by using an upcoming >> QMP command. > Hi. This commit message doesn't say it's changing the default > action, but the change does: > >> @@ -3899,6 +3899,8 @@ DEF("action", HAS_ARG, QEMU_OPTION_action, >> " action when guest reboots [default=none]\n" >> "-action shutdown=poweroff|pause\n" >> " action when guest shuts down [default=poweroff]\n" >> + "-action panic=poweroff|pause|none\n" >> + " action when guest panics [default=poweroff]\n" >> "-action watchdog=reset|shutdown|poweroff|inject-nmi|pause|debug|none\n" >> " action when watchdog fires [default=reset]\n", >> QEMU_ARCH_ALL) >> RebootAction reboot_action = REBOOT_ACTION_NONE; >> ShutdownAction shutdown_action = SHUTDOWN_ACTION_POWEROFF; >> +PanicAction panic_action = PANIC_ACTION_POWEROFF; > We used to default to 'pause' and now we default to 'poweroff'. Hi Peter. My rationale for setting the panic action to 'poweroff' was to keep the default behavior of QEMU when '-no-shutdown' is not specified, and a panic occurs. I believe that in order to accomplish that, the default panic action should still be 'poweroff', but as you point out there is an instance where the behavior changes. Specifically, when '-no-shutdown' is not used there is now one fewer QMP event issued when a guest panic is detected, before stopping the VM and powering off. I tried to account for this scenario in the original patches, but I failed to catch the problem after the rebase when the changes were merged. I'll test and send a fix for this issue in the next few days. > > We noticed this because it broke an in-flight test case for > the pvpanic-pci device from Mihai (which was expecting to see > the device in 'pause' state and found it was now in 'poweroff'). The test is just checking for the arrival of the QMP event, and not actually expecting the VM to be paused, correct? I see that if a test/management app is expecting to receive a GUEST_PANICKED event with the specific 'pause' action, then it might be confused. But any such tests would only be able to check for the arrival of the QMP event, and not actually expect to issue any commands to a paused VM, since the next block of code in QEMU immediately powers off and shutdowns when '-no-shutdown' is not requested. This was the typical behavior before the patches. > Test cases aren't very exciting, but was it really intentional > to change the default behaviour? My intention was to preserve the default behavior. Perhaps Paolo wanted to reduce the number of GUEST_PANICKED events by removing the one with 'pause' action? You could consider it superfluous since it is immediately followed by another indicating the 'poweroff' action... Unless I hear otherwise from either of you, I'll work on a fix to keep the same number and type of events sent. Thank you, Alejandro > It's part of the user-facing > surface of QEMU, so if we did intend a default change that ought > really to be more clearly stated (and noted in the Changelog) I think. > > thanks > -- PMM
On Wed, 20 Jan 2021 at 05:28, Alejandro Jimenez <alejandro.j.jimenez@oracle.com> wrote: > On 1/19/2021 4:34 PM, Peter Maydell wrote: > > Test cases aren't very exciting, but was it really intentional > > to change the default behaviour? > My intention was to preserve the default behavior. Perhaps Paolo wanted > to reduce the number of GUEST_PANICKED events by removing the one with > 'pause' action? You could consider it superfluous since it is > immediately followed by another indicating the 'poweroff' action... > Unless I hear otherwise from either of you, I'll work on a fix to keep > the same number and type of events sent. I'm happy to defer to you/Paolo on the behaviour -- I mostly wanted to check if the change was intentional. -- PMM
On Wed, Jan 20, 2021 at 12:28:14AM -0500, Alejandro Jimenez wrote: > > > On 1/19/2021 4:34 PM, Peter Maydell wrote: > > On Tue, 15 Dec 2020 at 18:11, Paolo Bonzini <pbonzini@redhat.com> wrote: > > > From: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> > > > > > > The current default action of pausing a guest after a panic event > > > is received leaves the responsibility to resume guest execution to the > > > management layer. The reasons for this behavior are discussed here: > > > https://lore.kernel.org/qemu-devel/52148F88.5000509@redhat.com/ > > > > > > However, in instances like the case of older guests (Linux and > > > Windows) using a pvpanic device but missing support for the > > > PVPANIC_CRASHLOADED event, and Windows guests using the hv-crash > > > enlightenment, it is desirable to allow the guests to continue > > > running after sending a PVPANIC_PANICKED event. This allows such > > > guests to proceed to capture a crash dump and automatically reboot > > > without intervention of a management layer. > > > > > > Add an option to avoid stopping a VM after a panic event is received, > > > by passing: > > > > > > -action panic=none > > > > > > in the command line arguments, or during runtime by using an upcoming > > > QMP command. > > Hi. This commit message doesn't say it's changing the default > > action, but the change does: > > > > > @@ -3899,6 +3899,8 @@ DEF("action", HAS_ARG, QEMU_OPTION_action, > > > " action when guest reboots [default=none]\n" > > > "-action shutdown=poweroff|pause\n" > > > " action when guest shuts down [default=poweroff]\n" > > > + "-action panic=poweroff|pause|none\n" > > > + " action when guest panics [default=poweroff]\n" > > > "-action watchdog=reset|shutdown|poweroff|inject-nmi|pause|debug|none\n" > > > " action when watchdog fires [default=reset]\n", > > > QEMU_ARCH_ALL) > > > RebootAction reboot_action = REBOOT_ACTION_NONE; > > > ShutdownAction shutdown_action = SHUTDOWN_ACTION_POWEROFF; > > > +PanicAction panic_action = PANIC_ACTION_POWEROFF; > > We used to default to 'pause' and now we default to 'poweroff'. > Hi Peter. > > My rationale for setting the panic action to 'poweroff' was to keep the > default behavior of QEMU when '-no-shutdown' is not specified, and a panic > occurs. I believe that in order to accomplish that, the default panic action > should still be 'poweroff', but as you point out there is an instance where > the behavior changes. Specifically, when '-no-shutdown' is not used there is > now one fewer QMP event issued when a guest panic is detected, before > stopping the VM and powering off. > > I tried to account for this scenario in the original patches, but I failed > to catch the problem after the rebase when the changes were merged. I'll > test and send a fix for this issue in the next few days. > > > > > We noticed this because it broke an in-flight test case for > > the pvpanic-pci device from Mihai (which was expecting to see > > the device in 'pause' state and found it was now in 'poweroff'). > The test is just checking for the arrival of the QMP event, and not actually > expecting the VM to be paused, correct? I see that if a test/management app > is expecting to receive a GUEST_PANICKED event with the specific 'pause' > action, then it might be confused. But any such tests would only be able to > check for the arrival of the QMP event, and not actually expect to issue any > commands to a paused VM, since the next block of code in QEMU immediately > powers off and shutdowns when '-no-shutdown' is not requested. This was the > typical behavior before the patches. > > > Test cases aren't very exciting, but was it really intentional > > to change the default behaviour? > My intention was to preserve the default behavior. Perhaps Paolo wanted to > reduce the number of GUEST_PANICKED events by removing the one with 'pause' > action? You could consider it superfluous since it is immediately followed > by another indicating the 'poweroff' action... Unless I hear otherwise from > either of you, I'll work on a fix to keep the same number and type of events > sent. Why would pause be immediately followed by poweroff ? These are independant actions, and the mgmt app should be deciding what todo next after the pause action. It may wish to capture a guest crash image before poweroff. Regards, Daniel
On 19/01/21 22:34, Peter Maydell wrote: > We used to default to 'pause' and now we default to 'poweroff'. > > We noticed this because it broke an in-flight test case for > the pvpanic-pci device from Mihai (which was expecting to see > the device in 'pause' state and found it was now in 'poweroff'). > Test cases aren't very exciting, but was it really intentional > to change the default behaviour? It's part of the user-facing > surface of QEMU, so if we did intend a default change that ought > really to be more clearly stated (and noted in the Changelog) I think. To sum up the difference: 1) before, without -no-shutdown: {"event": "GUEST_PANICKED", "data": {"action": "pause"}} {"event": "STOP"} {"event": "GUEST_PANICKED", "data": {"action": "poweroff"}} {"event": "SHUTDOWN", "data": {"guest": true, "reason": "guest-panic"}} after, without -no-shutdown: {"event": "GUEST_PANICKED", "data": {"action": "poweroff"}} {"event": "STOP"} {"event": "SHUTDOWN", "data": {"guest": true, "reason": "guest-panic"}} 2) before, with -no-shutdown: {"event": "GUEST_PANICKED", "data": {"action": "pause"}} {"event": "STOP"} after, with -no-shutdown (aka -action panic=pause,shutdown=pause): {"event": "GUEST_PANICKED", "data": {"action": "pause"}} {"event": "STOP"} I think the new behavior without -no-shutdown is (albeit not intentionally) preferrable, since QEMU used to report twice that the guest panicked. Looking again at the -action CLI, though, there are some inconsistencies in the naming. I'll send a patches for them, but they will not affect the HMP output. Paolo
On 20/01/21 14:54, Daniel P. Berrangé wrote: > Why would pause be immediately followed by poweroff ? These are independant > actions, and the mgmt app should be deciding what todo next after the > pause action. It may wish to capture a guest crash image before poweroff. That's exactly why -action was introduced (and before that, why -no-shutdown affected the behavior of panicking too). Guest panic will exit QEMU by default, but you can use "-action panic=pause" or "-no-shutdown" to change the default. Paolo
diff --git a/include/sysemu/runstate-action.h b/include/sysemu/runstate-action.h index 51a461c4ac..cff45a047b 100644 --- a/include/sysemu/runstate-action.h +++ b/include/sysemu/runstate-action.h @@ -14,5 +14,6 @@ /* in softmmu/runstate-action.c */ extern RebootAction reboot_action; extern ShutdownAction shutdown_action; +extern PanicAction panic_action; #endif /* RUNSTATE_ACTION_H */ diff --git a/qapi/run-state.json b/qapi/run-state.json index 25e82d1be1..1f3b329f05 100644 --- a/qapi/run-state.json +++ b/qapi/run-state.json @@ -353,6 +353,18 @@ { 'enum': 'ShutdownAction', 'data': [ 'poweroff', 'pause' ] } +## +# @PanicAction: +# +# @none: Continue VM execution +# +# @pause: Pause the VM +# +# Since: 6.0 +## +{ 'enum': 'PanicAction', + 'data': [ 'poweroff', 'pause', 'none' ] } + ## # @watchdog-set-action: # @@ -372,6 +384,8 @@ # # @shutdown: @ShutdownAction action taken on guest shutdown. # +# @panic: @PanicAction action taken on guest panic. +# # @watchdog: @WatchdogAction action taken when watchdog timer expires . # # Returns: Nothing on success. @@ -383,12 +397,14 @@ # -> { "execute": "set-action", # "arguments": { "reboot": "shutdown", # "shutdown" : "pause", +# "panic": "pause", # "watchdog": "inject-nmi" } } # <- { "return": {} } ## { 'command': 'set-action', 'data': { '*reboot': 'RebootAction', '*shutdown': 'ShutdownAction', + '*panic': 'PanicAction', '*watchdog': 'WatchdogAction' }, 'allow-preconfig': true } diff --git a/qemu-options.hx b/qemu-options.hx index eb55cd0eea..94cdfcf32e 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -3899,6 +3899,8 @@ DEF("action", HAS_ARG, QEMU_OPTION_action, " action when guest reboots [default=none]\n" "-action shutdown=poweroff|pause\n" " action when guest shuts down [default=poweroff]\n" + "-action panic=poweroff|pause|none\n" + " action when guest panics [default=poweroff]\n" "-action watchdog=reset|shutdown|poweroff|inject-nmi|pause|debug|none\n" " action when watchdog fires [default=reset]\n", QEMU_ARCH_ALL) @@ -3911,6 +3913,7 @@ SRST Examples: + ``-action panic=none`` ``-action reboot=shutdown,shutdown=pause`` ``-watchdog i6300esb -action watchdog=pause`` diff --git a/softmmu/runstate-action.c b/softmmu/runstate-action.c index 44de01adaf..99ce880886 100644 --- a/softmmu/runstate-action.c +++ b/softmmu/runstate-action.c @@ -15,6 +15,7 @@ RebootAction reboot_action = REBOOT_ACTION_NONE; ShutdownAction shutdown_action = SHUTDOWN_ACTION_POWEROFF; +PanicAction panic_action = PANIC_ACTION_POWEROFF; /* * Receives actions to be applied for specific guest events @@ -30,6 +31,10 @@ void qmp_set_action(bool has_reboot, RebootAction reboot, reboot_action = reboot; } + if (has_panic) { + panic_action = panic; + } + if (has_watchdog) { qmp_watchdog_set_action(watchdog, errp); } diff --git a/softmmu/runstate.c b/softmmu/runstate.c index bd0522ed9e..636aab0add 100644 --- a/softmmu/runstate.c +++ b/softmmu/runstate.c @@ -469,13 +469,23 @@ void qemu_system_guest_panicked(GuestPanicInformation *info) if (current_cpu) { current_cpu->crash_occurred = true; } - qapi_event_send_guest_panicked(GUEST_PANIC_ACTION_PAUSE, - !!info, info); - vm_stop(RUN_STATE_GUEST_PANICKED); - if (shutdown_action == SHUTDOWN_ACTION_POWEROFF) { + /* + * TODO: Currently the available panic actions are: none, pause, and + * poweroff, but in principle debug and reset could be supported as well. + * Investigate any potential use cases for the unimplemented actions. + */ + if (panic_action == PANIC_ACTION_PAUSE) { + qapi_event_send_guest_panicked(GUEST_PANIC_ACTION_PAUSE, + !!info, info); + vm_stop(RUN_STATE_GUEST_PANICKED); + } else if (panic_action == PANIC_ACTION_POWEROFF) { qapi_event_send_guest_panicked(GUEST_PANIC_ACTION_POWEROFF, !!info, info); + vm_stop(RUN_STATE_GUEST_PANICKED); qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_PANIC); + } else { + qapi_event_send_guest_panicked(GUEST_PANIC_ACTION_RUN, + !!info, info); } if (info) { diff --git a/softmmu/vl.c b/softmmu/vl.c index 6d1a7ebb08..3921a04f77 100644 --- a/softmmu/vl.c +++ b/softmmu/vl.c @@ -489,6 +489,9 @@ static QemuOptsList qemu_action_opts = { },{ .name = "reboot", .type = QEMU_OPT_STRING, + },{ + .name = "panic", + .type = QEMU_OPT_STRING, },{ .name = "watchdog", .type = QEMU_OPT_STRING, @@ -3212,7 +3215,7 @@ void qemu_init(int argc, char **argv, char **envp) break; case QEMU_OPTION_no_shutdown: olist = qemu_find_opts("action"); - qemu_opts_parse_noisily(olist, "shutdown=pause", false); + qemu_opts_parse_noisily(olist, "panic=pause,shutdown=pause", false); break; case QEMU_OPTION_show_cursor: warn_report("The -show-cursor option is deprecated. Please "