diff mbox

pvpanic plans?

Message ID 20131031143004.GA9948@redhat.com
State New
Headers show

Commit Message

Michael S. Tsirkin Oct. 31, 2013, 2:30 p.m. UTC
On Thu, Oct 24, 2013 at 10:39:03AM +0800, Hu Tao wrote:
> Hi All,
> 
> I know it's been a long time since this thread. But qemu 1.7 is
> releasing, do you have any consensus on this?
> 
> Thanks.


I think the biggest issue is the new PANICKED state.
Guests already have simple ways to halt the CPU,
and actually do.  I think a new state was a mistake.
So how about the following? Does it break anything?
(Untested).

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Comments

Eric Blake Oct. 31, 2013, 2:32 p.m. UTC | #1
On 10/31/2013 08:30 AM, Michael S. Tsirkin wrote:
> On Thu, Oct 24, 2013 at 10:39:03AM +0800, Hu Tao wrote:
>> Hi All,
>>
>> I know it's been a long time since this thread. But qemu 1.7 is
>> releasing, do you have any consensus on this?
>>
>> Thanks.
> 
> 
> I think the biggest issue is the new PANICKED state.
> Guests already have simple ways to halt the CPU,
> and actually do.  I think a new state was a mistake.
> So how about the following? Does it break anything?
> (Untested).
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> diff --git a/hw/misc/pvpanic.c b/hw/misc/pvpanic.c
> index 226e298..2055afc 100644
> --- a/hw/misc/pvpanic.c
> +++ b/hw/misc/pvpanic.c
> @@ -51,7 +51,6 @@ static void handle_event(int event)
>  
>      if (event & PVPANIC_PANICKED) {
>          panicked_mon_event("pause");
> -        vm_stop(RUN_STATE_GUEST_PANICKED);

Don't you still need to halt the guest on a panic event, for management
to have a chance to choose what to do about the panic?  I'm suspecting
this patch does break things.
Anthony Liguori Oct. 31, 2013, 2:34 p.m. UTC | #2
On Thu, Oct 31, 2013 at 3:32 PM, Eric Blake <eblake@redhat.com> wrote:
> On 10/31/2013 08:30 AM, Michael S. Tsirkin wrote:
>> On Thu, Oct 24, 2013 at 10:39:03AM +0800, Hu Tao wrote:
>>> Hi All,
>>>
>>> I know it's been a long time since this thread. But qemu 1.7 is
>>> releasing, do you have any consensus on this?
>>>
>>> Thanks.
>>
>>
>> I think the biggest issue is the new PANICKED state.
>> Guests already have simple ways to halt the CPU,
>> and actually do.  I think a new state was a mistake.
>> So how about the following? Does it break anything?
>> (Untested).
>>
>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>
>> diff --git a/hw/misc/pvpanic.c b/hw/misc/pvpanic.c
>> index 226e298..2055afc 100644
>> --- a/hw/misc/pvpanic.c
>> +++ b/hw/misc/pvpanic.c
>> @@ -51,7 +51,6 @@ static void handle_event(int event)
>>
>>      if (event & PVPANIC_PANICKED) {
>>          panicked_mon_event("pause");
>> -        vm_stop(RUN_STATE_GUEST_PANICKED);
>
> Don't you still need to halt the guest on a panic event, for management
> to have a chance to choose what to do about the panic?  I'm suspecting
> this patch does break things.

I would be happy to apply a patch that just reverted the whole dang
mess of this device.

Regards,

Anthony Liguori

> --
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
Paolo Bonzini Oct. 31, 2013, 2:39 p.m. UTC | #3
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Il 31/10/2013 15:32, Eric Blake ha scritto:
> On 10/31/2013 08:30 AM, Michael S. Tsirkin wrote:
>> On Thu, Oct 24, 2013 at 10:39:03AM +0800, Hu Tao wrote:
>>> Hi All,
>>> 
>>> I know it's been a long time since this thread. But qemu 1.7 is
>>> releasing, do you have any consensus on this?
>> 
>> I think the biggest issue is the new PANICKED state. Guests 
>> already have simple ways to halt the CPU, and actually do.  I 
>> think a new state was a mistake. So how about the following?
>> Does it break anything? (Untested).
> 
> Don't you still need to halt the guest on a panic event, for 
> management to have a chance to choose what to do about the panic? 
> I'm suspecting this patch does break things.

Yes, it does.  But I think that, once we make the pvpanic device is
optional, to a large extent there is no bug.  Adding the pvpanic
device to the VM will make libvirt obey <oncrash> instead of the
in-guest setting, and that's it.

Two months have passed and no casualties have been reported due to
pvpanic.  Let's just remove the auto-pvpanic from all machine types in
1.7 (yes, that's backwards incompatible in a strict sense), document
it in the release notes, and hope that the old QEMU versions with
mandatory pvpanic die of old age.

All the advantages/disadvantages from my original messages still
apply.  Let's ignore the disadvantages and just KISS.

Paolo
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJScmuVAAoJEBvWZb6bTYbyS6MP/Rb9japkUmjKy+bUW2Bf5vHD
hOrCL/20LC17OdIdzOngjfcY6OyPrtZ6dtjzvSjsWoxjW+QNuTwLM5h+0kOmvlM/
UCqX9MZiKFysVphnwIFy2fzTKmAVw5WqUlS17XMsiKxpvCquy5Fss5PWB3k3orMD
kRCoTTmQDID95T38OqezvzDaRjNvoHXOTuWFOhbzWt2OFUwg1WDy1GLKV/pSx0/o
GS2/RYo5iQkEtZk0muj0fmWEZX1K2nfaYmVQh39LJhFrWQLabHskYz1+n/OTmwQT
3DiUkfeqrxLahlyHbIOguLjLY8gH2fZa0gzHEB7D0b1aiuqQmsxM3ELdl8pJJzex
1ZGKt9X1u58v/SvfDHW14uCMAiv8jgQDnDOa6pgi7DezBQym+90+RnzC2Z5BcHIp
hsPEB0Bc47REPu69GOSj7XQ1uan5yQS38jSv8D11nJEW8VfXiV4smOZhivrEOicR
mdYYV6BNc985vcVOmvmkTJ5VkUOKeMzMDAJkSDqN6P0fQKTOJpCtJag3TcjHVRB4
ORXUrdO9NuICGjwB75T86INkxEsXFaw1aHIKpcxMk2PN6u/Zc+n7GXf65ReGorbL
2QUrepzUUt+mTYXJha9h7gEudRnixe5zK8AgqtSHAJPKP++LrFz4lrmzCHC6e3V1
6K2AhZl4EjBJaf6AMy70
=GBBt
-----END PGP SIGNATURE-----
Michael S. Tsirkin Oct. 31, 2013, 2:47 p.m. UTC | #4
On Thu, Oct 31, 2013 at 08:32:43AM -0600, Eric Blake wrote:
> On 10/31/2013 08:30 AM, Michael S. Tsirkin wrote:
> > On Thu, Oct 24, 2013 at 10:39:03AM +0800, Hu Tao wrote:
> >> Hi All,
> >>
> >> I know it's been a long time since this thread. But qemu 1.7 is
> >> releasing, do you have any consensus on this?
> >>
> >> Thanks.
> > 
> > 
> > I think the biggest issue is the new PANICKED state.
> > Guests already have simple ways to halt the CPU,
> > and actually do.  I think a new state was a mistake.
> > So how about the following? Does it break anything?
> > (Untested).
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > diff --git a/hw/misc/pvpanic.c b/hw/misc/pvpanic.c
> > index 226e298..2055afc 100644
> > --- a/hw/misc/pvpanic.c
> > +++ b/hw/misc/pvpanic.c
> > @@ -51,7 +51,6 @@ static void handle_event(int event)
> >  
> >      if (event & PVPANIC_PANICKED) {
> >          panicked_mon_event("pause");
> > -        vm_stop(RUN_STATE_GUEST_PANICKED);
> 
> Don't you still need to halt the guest on a panic event, for management
> to have a chance to choose what to do about the panic?

Guest can just call hlt to do this. Most guests do this on a panic
already.

> I'm suspecting
> this patch does break things.

http://xkcd.com/1172/

> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
Eric Blake Oct. 31, 2013, 2:49 p.m. UTC | #5
On 10/31/2013 08:47 AM, Michael S. Tsirkin wrote:

>>>      if (event & PVPANIC_PANICKED) {
>>>          panicked_mon_event("pause");
>>> -        vm_stop(RUN_STATE_GUEST_PANICKED);
>>
>> Don't you still need to halt the guest on a panic event, for management
>> to have a chance to choose what to do about the panic?
> 
> Guest can just call hlt to do this. Most guests do this on a panic
> already.

On the one hand, the fact that the guest already has to inform the host
means we are already trusting the guest behavior on a panic.  On the
other hand, assuming that the guest will ALWAYS halt after triggering a
panic is putting a lot more trust in the guest, compared to qemu
explicitly halting the guest so that management has a chance to choose
to dump the guest's state at the moment the panic was flagged.

The biggest argument for either removing all auto-pvpanic, or reverting
pvpanic altogether, is that no one seems to be actively using pvpanic in
the field yet.  I wish we could get more feedback from Fujitsu as the
original patch authors on what they are looking for in a working
solution, rather than repeatedly second-guessing everything downstream
and delaying the eradication of the buggy behavior even longer.
Michael S. Tsirkin Oct. 31, 2013, 2:52 p.m. UTC | #6
On Thu, Oct 31, 2013 at 03:39:17PM +0100, Paolo Bonzini wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Il 31/10/2013 15:32, Eric Blake ha scritto:
> > On 10/31/2013 08:30 AM, Michael S. Tsirkin wrote:
> >> On Thu, Oct 24, 2013 at 10:39:03AM +0800, Hu Tao wrote:
> >>> Hi All,
> >>> 
> >>> I know it's been a long time since this thread. But qemu 1.7 is
> >>> releasing, do you have any consensus on this?
> >> 
> >> I think the biggest issue is the new PANICKED state. Guests 
> >> already have simple ways to halt the CPU, and actually do.  I 
> >> think a new state was a mistake. So how about the following?
> >> Does it break anything? (Untested).
> > 
> > Don't you still need to halt the guest on a panic event, for 
> > management to have a chance to choose what to do about the panic? 
> > I'm suspecting this patch does break things.
> 
> Yes, it does.

What does it break exactly?

> But I think that, once we make the pvpanic device is
> optional, to a large extent there is no bug.  Adding the pvpanic
> device to the VM will make libvirt obey <oncrash> instead of the
> in-guest setting, and that's it.
> 
> Two months have passed and no casualties have been reported due to
> pvpanic.  Let's just remove the auto-pvpanic from all machine types in
> 1.7 (yes, that's backwards incompatible in a strict sense), document
> it in the release notes, and hope that the old QEMU versions with
> mandatory pvpanic die of old age.

Nod. I'm fine with that.

> All the advantages/disadvantages from my original messages still
> apply.  Let's ignore the disadvantages and just KISS.
> 
> Paolo

I think we still need to do get rid of the PANICKED state somehow.
If we can't replace it with RUNNING state, let's replace it with PAUSED.

For example, you can't continue from panicked for some reason.
You can't do a reset.
But you can pause and then continue.


> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.22 (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
> 
> iQIcBAEBAgAGBQJScmuVAAoJEBvWZb6bTYbyS6MP/Rb9japkUmjKy+bUW2Bf5vHD
> hOrCL/20LC17OdIdzOngjfcY6OyPrtZ6dtjzvSjsWoxjW+QNuTwLM5h+0kOmvlM/
> UCqX9MZiKFysVphnwIFy2fzTKmAVw5WqUlS17XMsiKxpvCquy5Fss5PWB3k3orMD
> kRCoTTmQDID95T38OqezvzDaRjNvoHXOTuWFOhbzWt2OFUwg1WDy1GLKV/pSx0/o
> GS2/RYo5iQkEtZk0muj0fmWEZX1K2nfaYmVQh39LJhFrWQLabHskYz1+n/OTmwQT
> 3DiUkfeqrxLahlyHbIOguLjLY8gH2fZa0gzHEB7D0b1aiuqQmsxM3ELdl8pJJzex
> 1ZGKt9X1u58v/SvfDHW14uCMAiv8jgQDnDOa6pgi7DezBQym+90+RnzC2Z5BcHIp
> hsPEB0Bc47REPu69GOSj7XQ1uan5yQS38jSv8D11nJEW8VfXiV4smOZhivrEOicR
> mdYYV6BNc985vcVOmvmkTJ5VkUOKeMzMDAJkSDqN6P0fQKTOJpCtJag3TcjHVRB4
> ORXUrdO9NuICGjwB75T86INkxEsXFaw1aHIKpcxMk2PN6u/Zc+n7GXf65ReGorbL
> 2QUrepzUUt+mTYXJha9h7gEudRnixe5zK8AgqtSHAJPKP++LrFz4lrmzCHC6e3V1
> 6K2AhZl4EjBJaf6AMy70
> =GBBt
> -----END PGP SIGNATURE-----
Paolo Bonzini Oct. 31, 2013, 2:56 p.m. UTC | #7
Il 31/10/2013 15:52, Michael S. Tsirkin ha scritto:
>> > Yes, it does.
> What does it break exactly?

The point of a panicked event is to examine the guest at a particular
moment in time (e.g. host-initiated crash dump).  If you let the guest
run, it may reboot and prevent you from getting a meaningful dump.

>> > But I think that, once we make the pvpanic device is
>> > optional, to a large extent there is no bug.  Adding the pvpanic
>> > device to the VM will make libvirt obey <oncrash> instead of the
>> > in-guest setting, and that's it.
>> > 
>> > Two months have passed and no casualties have been reported due to
>> > pvpanic.  Let's just remove the auto-pvpanic from all machine types in
>> > 1.7 (yes, that's backwards incompatible in a strict sense), document
>> > it in the release notes, and hope that the old QEMU versions with
>> > mandatory pvpanic die of old age.
> 
> Nod. I'm fine with that.
> 
> I think we still need to do get rid of the PANICKED state somehow.
> If we can't replace it with RUNNING state, let's replace it with PAUSED.
> 
> For example, you can't continue from panicked for some reason.
> You can't do a reset.  But you can pause and then continue.

We need to keep the PANICKED state, but we can make it a normal
"resumable" state.

Basically it's patches 1 and 2 at
http://permalink.gmane.org/gmane.comp.emulators.qemu/229131.  Rebasing
will fix the problem highlighted in the commit message of patch 2.

Paolo
Michael S. Tsirkin Oct. 31, 2013, 3:07 p.m. UTC | #8
On Thu, Oct 31, 2013 at 08:49:08AM -0600, Eric Blake wrote:
> On 10/31/2013 08:47 AM, Michael S. Tsirkin wrote:
> 
> >>>      if (event & PVPANIC_PANICKED) {
> >>>          panicked_mon_event("pause");
> >>> -        vm_stop(RUN_STATE_GUEST_PANICKED);
> >>
> >> Don't you still need to halt the guest on a panic event, for management
> >> to have a chance to choose what to do about the panic?
> > 
> > Guest can just call hlt to do this. Most guests do this on a panic
> > already.
> 
> On the one hand, the fact that the guest already has to inform the host
> means we are already trusting the guest behavior on a panic.  On the
> other hand, assuming that the guest will ALWAYS halt after triggering a
> panic is putting a lot more trust in the guest, compared to qemu
> explicitly halting the guest so that management has a chance to choose
> to dump the guest's state at the moment the panic was flagged.

I wouldn't call it *a lot* more trust. And again, this is guest policy:
if you want to do hlt from driver because you think it's safer, go for it.

> The biggest argument for either removing all auto-pvpanic, or reverting
> pvpanic altogether, is that no one seems to be actively using pvpanic in
> the field yet.  I wish we could get more feedback from Fujitsu as the
> original patch authors on what they are looking for in a working
> solution, rather than repeatedly second-guessing everything downstream
> and delaying the eradication of the buggy behavior even longer.


With my patch we have a benign device that merely reports io writes
on the monitor. No code -> no bugs.


> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
Michael S. Tsirkin Oct. 31, 2013, 3:09 p.m. UTC | #9
On Thu, Oct 31, 2013 at 03:56:42PM +0100, Paolo Bonzini wrote:
> Il 31/10/2013 15:52, Michael S. Tsirkin ha scritto:
> >> > Yes, it does.
> > What does it break exactly?
> 
> The point of a panicked event is to examine the guest at a particular
> moment in time (e.g. host-initiated crash dump).  If you let the guest
> run, it may reboot and prevent you from getting a meaningful dump.

Well we trust guest anyway, so I think we can trust it to call halt.


> >> > But I think that, once we make the pvpanic device is
> >> > optional, to a large extent there is no bug.  Adding the pvpanic
> >> > device to the VM will make libvirt obey <oncrash> instead of the
> >> > in-guest setting, and that's it.
> >> > 
> >> > Two months have passed and no casualties have been reported due to
> >> > pvpanic.  Let's just remove the auto-pvpanic from all machine types in
> >> > 1.7 (yes, that's backwards incompatible in a strict sense), document
> >> > it in the release notes, and hope that the old QEMU versions with
> >> > mandatory pvpanic die of old age.
> > 
> > Nod. I'm fine with that.
> > 
> > I think we still need to do get rid of the PANICKED state somehow.
> > If we can't replace it with RUNNING state, let's replace it with PAUSED.
> > 
> > For example, you can't continue from panicked for some reason.
> > You can't do a reset.  But you can pause and then continue.
> 
> We need to keep the PANICKED state, but we can make it a normal
> "resumable" state.

If it's resumable how is it different from PAUSED?

> Basically it's patches 1 and 2 at
> http://permalink.gmane.org/gmane.comp.emulators.qemu/229131.  Rebasing
> will fix the problem highlighted in the commit message of patch 2.
> 
> Paolo

Looks like all transitions from paused state should be allowed from panicked
state. So why keep it separate?
Paolo Bonzini Oct. 31, 2013, 3:26 p.m. UTC | #10
Il 31/10/2013 16:09, Michael S. Tsirkin ha scritto:
> On Thu, Oct 31, 2013 at 03:56:42PM +0100, Paolo Bonzini wrote:
>> Il 31/10/2013 15:52, Michael S. Tsirkin ha scritto:
>>>>> Yes, it does.
>>> What does it break exactly?
>>
>> The point of a panicked event is to examine the guest at a particular
>> moment in time (e.g. host-initiated crash dump).  If you let the guest
>> run, it may reboot and prevent you from getting a meaningful dump.
> 
> Well we trust guest anyway, so I think we can trust it to call halt.

No, we cannot.  Reboot-in-guest-after-dump-on-host is a perfectly fine
configuration.

>>>>> But I think that, once we make the pvpanic device is
>>>>> optional, to a large extent there is no bug.  Adding the pvpanic
>>>>> device to the VM will make libvirt obey <oncrash> instead of the
>>>>> in-guest setting, and that's it.
>>>>>
>>>>> Two months have passed and no casualties have been reported due to
>>>>> pvpanic.  Let's just remove the auto-pvpanic from all machine types in
>>>>> 1.7 (yes, that's backwards incompatible in a strict sense), document
>>>>> it in the release notes, and hope that the old QEMU versions with
>>>>> mandatory pvpanic die of old age.
>>>
>>> Nod. I'm fine with that.
>>>
>>> I think we still need to do get rid of the PANICKED state somehow.
>>> If we can't replace it with RUNNING state, let's replace it with PAUSED.
>>>
>>> For example, you can't continue from panicked for some reason.
>>> You can't do a reset.  But you can pause and then continue.
>>
>> We need to keep the PANICKED state, but we can make it a normal
>> "resumable" state.
> 
> If it's resumable how is it different from PAUSED?

If the guest panics while for some reason libvirtd went down, libvirt
can see what happened.  It is similar to the "I/O error" state in this
respect.

> Looks like all transitions from paused state should be allowed from panicked
> state. So why keep it separate?

Because you can poll for the state instead of watching an event.

Paolo
Michael S. Tsirkin Oct. 31, 2013, 3:45 p.m. UTC | #11
On Thu, Oct 31, 2013 at 04:26:13PM +0100, Paolo Bonzini wrote:
> Il 31/10/2013 16:09, Michael S. Tsirkin ha scritto:
> > On Thu, Oct 31, 2013 at 03:56:42PM +0100, Paolo Bonzini wrote:
> >> Il 31/10/2013 15:52, Michael S. Tsirkin ha scritto:
> >>>>> Yes, it does.
> >>> What does it break exactly?
> >>
> >> The point of a panicked event is to examine the guest at a particular
> >> moment in time (e.g. host-initiated crash dump).  If you let the guest
> >> run, it may reboot and prevent you from getting a meaningful dump.
> > 
> > Well we trust guest anyway, so I think we can trust it to call halt.
> 
> No, we cannot.  Reboot-in-guest-after-dump-on-host is a perfectly fine
> configuration.
>
> >>>>> But I think that, once we make the pvpanic device is
> >>>>> optional, to a large extent there is no bug.  Adding the pvpanic
> >>>>> device to the VM will make libvirt obey <oncrash> instead of the
> >>>>> in-guest setting, and that's it.
> >>>>>
> >>>>> Two months have passed and no casualties have been reported due to
> >>>>> pvpanic.  Let's just remove the auto-pvpanic from all machine types in
> >>>>> 1.7 (yes, that's backwards incompatible in a strict sense), document
> >>>>> it in the release notes, and hope that the old QEMU versions with
> >>>>> mandatory pvpanic die of old age.
> >>>
> >>> Nod. I'm fine with that.
> >>>
> >>> I think we still need to do get rid of the PANICKED state somehow.
> >>> If we can't replace it with RUNNING state, let's replace it with PAUSED.
> >>>
> >>> For example, you can't continue from panicked for some reason.
> >>> You can't do a reset.  But you can pause and then continue.
> >>
> >> We need to keep the PANICKED state, but we can make it a normal
> >> "resumable" state.
> > 
> > If it's resumable how is it different from PAUSED?
> 
> If the guest panics while for some reason libvirtd went down, libvirt
> can see what happened.  It is similar to the "I/O error" state in this
> respect.
> 
> > Looks like all transitions from paused state should be allowed from panicked
> > state. So why keep it separate?
> 
> Because you can poll for the state instead of watching an event.
> 
> Paolo

I see. Maybe it was a mistake to use a separate runtime state for
this, but oh well.

So it should be exactly like paused, we can just find all transitions
from PAUSED and it should be same for PANICKED?
Why isn't DEBUG allowed from PAUSED but allowed from PANICKED then?
Maybe it should be allowed for PAUSED?
Paolo Bonzini Oct. 31, 2013, 3:56 p.m. UTC | #12
Il 31/10/2013 16:45, Michael S. Tsirkin ha scritto:
> On Thu, Oct 31, 2013 at 04:26:13PM +0100, Paolo Bonzini wrote:
>> Il 31/10/2013 16:09, Michael S. Tsirkin ha scritto:
>>> On Thu, Oct 31, 2013 at 03:56:42PM +0100, Paolo Bonzini wrote:
>>>> Il 31/10/2013 15:52, Michael S. Tsirkin ha scritto:
>>>>>>> Yes, it does.
>>>>> What does it break exactly?
>>>>
>>>> The point of a panicked event is to examine the guest at a particular
>>>> moment in time (e.g. host-initiated crash dump).  If you let the guest
>>>> run, it may reboot and prevent you from getting a meaningful dump.
>>>
>>> Well we trust guest anyway, so I think we can trust it to call halt.
>>
>> No, we cannot.  Reboot-in-guest-after-dump-on-host is a perfectly fine
>> configuration.
>>
>>>>>>> But I think that, once we make the pvpanic device is
>>>>>>> optional, to a large extent there is no bug.  Adding the pvpanic
>>>>>>> device to the VM will make libvirt obey <oncrash> instead of the
>>>>>>> in-guest setting, and that's it.
>>>>>>>
>>>>>>> Two months have passed and no casualties have been reported due to
>>>>>>> pvpanic.  Let's just remove the auto-pvpanic from all machine types in
>>>>>>> 1.7 (yes, that's backwards incompatible in a strict sense), document
>>>>>>> it in the release notes, and hope that the old QEMU versions with
>>>>>>> mandatory pvpanic die of old age.
>>>>>
>>>>> Nod. I'm fine with that.
>>>>>
>>>>> I think we still need to do get rid of the PANICKED state somehow.
>>>>> If we can't replace it with RUNNING state, let's replace it with PAUSED.
>>>>>
>>>>> For example, you can't continue from panicked for some reason.
>>>>> You can't do a reset.  But you can pause and then continue.
>>>>
>>>> We need to keep the PANICKED state, but we can make it a normal
>>>> "resumable" state.
>>>
>>> If it's resumable how is it different from PAUSED?
>>
>> If the guest panics while for some reason libvirtd went down, libvirt
>> can see what happened.  It is similar to the "I/O error" state in this
>> respect.
>>
>>> Looks like all transitions from paused state should be allowed from panicked
>>> state. So why keep it separate?
>>
>> Because you can poll for the state instead of watching an event.
> 
> I see. Maybe it was a mistake to use a separate runtime state for
> this, but oh well.

Yes, we should have had a list of "reasons" why a guest is stopped (I/O
error, panic, gdb, ...) and a command to clear one or more of them;
there can be paused/running/waiting-for-migration/... states, but many
of the states we have are not necessarily in mutual exclusion.

But we cannot really choose now.

> So it should be exactly like paused, we can just find all transitions
> from PAUSED and it should be same for PANICKED?
> Why isn't DEBUG allowed from PAUSED but allowed from PANICKED then?
> Maybe it should be allowed for PAUSED?

PANICKED->DEBUG was added by commit bc7d0e667.  That commit can be
reverted if the panicked state is removed from runstate_needs_reset.

Paolo
Christian Borntraeger Nov. 4, 2013, 9:25 a.m. UTC | #13
On 31/10/13 15:30, Michael S. Tsirkin wrote:
> On Thu, Oct 24, 2013 at 10:39:03AM +0800, Hu Tao wrote:
>> Hi All,
>>
>> I know it's been a long time since this thread. But qemu 1.7 is
>> releasing, do you have any consensus on this?
>>
>> Thanks.
> 
> 
> I think the biggest issue is the new PANICKED state.

I thought the problem was that the new device broke windows and all
the following hazzle.

> Guests already have simple ways to halt the CPU,
> and actually do.  I think a new state was a mistake.
> So how about the following? Does it break anything?
> (Untested).

Please note that on s390 we also do the panic state (on a disabled wait)


"target-s390x/kvm.c"
...

                    monitor_protocol_event(QEVENT_GUEST_PANICKED, data);
                    qobject_decref(data);
                    vm_stop(RUN_STATE_GUEST_PANICKED);
...

Currently it is possible to restart libvirt, e.g. after an update and then it will
be able to fetch the full state of a guest via QMP. It will then also be able to
detect that this guest panicked some time ago.
I think one issue when removing the PANICKED state is that libvirt can then no 
longer detect that state, correct?

Christian


> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> diff --git a/hw/misc/pvpanic.c b/hw/misc/pvpanic.c
> index 226e298..2055afc 100644
> --- a/hw/misc/pvpanic.c
> +++ b/hw/misc/pvpanic.c
> @@ -51,7 +51,6 @@ static void handle_event(int event)
> 
>      if (event & PVPANIC_PANICKED) {
>          panicked_mon_event("pause");
> -        vm_stop(RUN_STATE_GUEST_PANICKED);
>          return;
>      }
>  }
> 
>
diff mbox

Patch

diff --git a/hw/misc/pvpanic.c b/hw/misc/pvpanic.c
index 226e298..2055afc 100644
--- a/hw/misc/pvpanic.c
+++ b/hw/misc/pvpanic.c
@@ -51,7 +51,6 @@  static void handle_event(int event)
 
     if (event & PVPANIC_PANICKED) {
         panicked_mon_event("pause");
-        vm_stop(RUN_STATE_GUEST_PANICKED);
         return;
     }
 }