diff mbox

[for-1.6,V2,0/2] pvpanic: Separate pvpanic from machine type

Message ID 52148F88.5000509@redhat.com
State New
Headers show

Commit Message

Paolo Bonzini Aug. 21, 2013, 9:59 a.m. UTC
Il 21/08/2013 11:42, Michael S. Tsirkin ha scritto:
> On Wed, Aug 21, 2013 at 10:18:23AM +0200, Paolo Bonzini wrote:
>> Il 21/08/2013 10:03, Marcel Apfelbaum ha scritto:
>>> On Wed, 2013-08-14 at 10:02 +0300, Ronen Hod wrote:
>>>> How about adding a flag that tells QEMU whether to pause or reboot the guest
>>>> after the panic?
>>>> We cannot assume that we always have a management layer that takes care
>>>> of this.
>>>> One example is Microsoft's WHQL that deliberately generates a BSOD, and then
>>>> examines the dump files.
>>> After this patch the pvpanic is not part of the global devices anymore so just
>>> don't enable it if you want to reboot on BSOD.
>>> In my opinion "reboot after panic" equals "run without pvpanic device"
>>
>> This is not entirely possible, since "reboot after panic" is a guest
>> setting while "run without pvpanic device" is a host setting (that the
>> guest administrator may not even have access to: Ronen's case is a good
>> example of this, because the "administrator" there is the WHQL harness).
>>
>> However, I think this is a driver problem.  The driver should just probe
>> the "reboot after panic" setting and not issue the outb to the pvpanic port.
> 
> This might or might not be possible on different OS-es.
> What exactly is gained by doing vmstop on outb of pvpanic?

Because events are edge-triggered, and can be lost if management dies at
the wrong time, each event that QEMU sends must go together with a way
for management to poll the state.

For panic, the way to poll the state is "info status".  This matches
what we do for watchdogs, for example.  Management can issue "info
status" to learn of the panic state, even if it happens while management
itself is not running:

     libvirtd                 QEMU                  guest
  ---------------------------------------------------------------
     stops
                                                 <- pvpanic outb
                              emits panic event
                              (no one receives it)
     starts
     info status ->
                              <- PANICKED


Because there is only one running state, this means the VM has to be
stopped.

But actually, fixing the driver would only be required if pvpanic were
mandatory.

Now that pvpanic is optional, "reboot after panic" can also be fixed in
libvirt.  Let's remove the "must reset after panic" limitation; then,
libvirt can simply do itself a "continue" after receiving the panicked
event (or after seeing that the guest is in panicked state).  The
panicked event will never be sent unless management explicitly requests
it (with "-device pvpanic"), so backwards compatibility is preserved.

The pause will still happen if management was stopped, but that's a fair
compromise IMHO.

It will mean also that "reboot after panic" will be broken in 1.6.0,
unfortunately.  Perhaps we can have a quick 1.6.1 release with this patch:



By the way, this means two things:

- I am now sold on the idea that explicitly enabling of pvpanic is the
right thing to do;

- on the other hand this is the proof that the change was not fully
understood, and rushing it in 1.6 was the wrong thing to do.

Paolo

> We want a notification about the panic but
> adding yet another way to halt seems kind of useless.
> Why not let VM continue? If it wants to stop it
> can always call halt.

Comments

Michael S. Tsirkin Aug. 21, 2013, 10:16 a.m. UTC | #1
On Wed, Aug 21, 2013 at 11:59:36AM +0200, Paolo Bonzini wrote:
> Il 21/08/2013 11:42, Michael S. Tsirkin ha scritto:
> > On Wed, Aug 21, 2013 at 10:18:23AM +0200, Paolo Bonzini wrote:
> >> Il 21/08/2013 10:03, Marcel Apfelbaum ha scritto:
> >>> On Wed, 2013-08-14 at 10:02 +0300, Ronen Hod wrote:
> >>>> How about adding a flag that tells QEMU whether to pause or reboot the guest
> >>>> after the panic?
> >>>> We cannot assume that we always have a management layer that takes care
> >>>> of this.
> >>>> One example is Microsoft's WHQL that deliberately generates a BSOD, and then
> >>>> examines the dump files.
> >>> After this patch the pvpanic is not part of the global devices anymore so just
> >>> don't enable it if you want to reboot on BSOD.
> >>> In my opinion "reboot after panic" equals "run without pvpanic device"
> >>
> >> This is not entirely possible, since "reboot after panic" is a guest
> >> setting while "run without pvpanic device" is a host setting (that the
> >> guest administrator may not even have access to: Ronen's case is a good
> >> example of this, because the "administrator" there is the WHQL harness).
> >>
> >> However, I think this is a driver problem.  The driver should just probe
> >> the "reboot after panic" setting and not issue the outb to the pvpanic port.
> > 
> > This might or might not be possible on different OS-es.
> > What exactly is gained by doing vmstop on outb of pvpanic?
> 
> Because events are edge-triggered, and can be lost if management dies at
> the wrong time, each event that QEMU sends must go together with a way
> for management to poll the state.
> 
> For panic, the way to poll the state is "info status".  This matches
> what we do for watchdogs, for example.  Management can issue "info
> status" to learn of the panic state, even if it happens while management
> itself is not running:
> 
>      libvirtd                 QEMU                  guest
>   ---------------------------------------------------------------
>      stops
>                                                  <- pvpanic outb
>                               emits panic event
>                               (no one receives it)
>      starts
>      info status ->
>                               <- PANICKED
> 
> 
> Because there is only one running state, this means the VM has to be
> stopped.
> 
> But actually, fixing the driver would only be required if pvpanic were
> mandatory.
> 
> Now that pvpanic is optional, "reboot after panic" can also be fixed in
> libvirt.  Let's remove the "must reset after panic" limitation; then,
> libvirt can simply do itself a "continue" after receiving the panicked
> event (or after seeing that the guest is in panicked state).  The
> panicked event will never be sent unless management explicitly requests
> it (with "-device pvpanic"), so backwards compatibility is preserved.
> 
> The pause will still happen if management was stopped, but that's a fair
> compromise IMHO.
> 
> It will mean also that "reboot after panic" will be broken in 1.6.0,
> unfortunately.  Perhaps we can have a quick 1.6.1 release with this patch:
> 
> diff --git a/vl.c b/vl.c
> index 25b8f2f..25e890a 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -685,8 +685,7 @@ int runstate_is_running(void)
>  bool runstate_needs_reset(void)
>  {
>      return runstate_check(RUN_STATE_INTERNAL_ERROR) ||
> -        runstate_check(RUN_STATE_SHUTDOWN) ||
> -        runstate_check(RUN_STATE_GUEST_PANICKED);
> +        runstate_check(RUN_STATE_SHUTDOWN);
>  }
> 
>  StatusInfo *qmp_query_status(Error **errp)
> 
> 
> By the way, this means two things:
> 
> - I am now sold on the idea that explicitly enabling of pvpanic is the
> right thing to do;
> 
> - on the other hand this is the proof that the change was not fully
> understood, and rushing it in 1.6 was the wrong thing to do.
> 
> Paolo

You mean 1.5.
pvpanic was a builtin in 1.5 and that was clearly the wrong thing to do.
We fixed that in 1.6, thankfully.

> > We want a notification about the panic but
> > adding yet another way to halt seems kind of useless.
> > Why not let VM continue? If it wants to stop it
> > can always call halt.
>
Paolo Bonzini Aug. 21, 2013, 10:37 a.m. UTC | #2
Il 21/08/2013 12:16, Michael S. Tsirkin ha scritto:
> > 
> > By the way, this means two things:
> > 
> > - I am now sold on the idea that explicitly enabling of pvpanic is the
> > right thing to do;
> > 
> > - on the other hand this is the proof that the change was not fully
> > understood, and rushing it in 1.6 was the wrong thing to do.
> 
> You mean 1.5.
> pvpanic was a builtin in 1.5 and that was clearly the wrong thing to do.
> We fixed that in 1.6, thankfully.

No, even taking it out in 1.6 is a mistake.  It happened too close to
the release.

Now we have 1.5 which has it, 1.6 which doesn't have it and requires
reset after panic, 1.7 which doesn't have it and in all likelihood will
not require reset after panic.

Paolo
diff mbox

Patch

diff --git a/vl.c b/vl.c
index 25b8f2f..25e890a 100644
--- a/vl.c
+++ b/vl.c
@@ -685,8 +685,7 @@  int runstate_is_running(void)
 bool runstate_needs_reset(void)
 {
     return runstate_check(RUN_STATE_INTERNAL_ERROR) ||
-        runstate_check(RUN_STATE_SHUTDOWN) ||
-        runstate_check(RUN_STATE_GUEST_PANICKED);
+        runstate_check(RUN_STATE_SHUTDOWN);
 }

 StatusInfo *qmp_query_status(Error **errp)