Patchwork pvpanic plans?

Submitter Michael S. Tsirkin
Date Oct. 31, 2013, 4:14 p.m.
Message ID <20131031161404.GA10710@redhat.com>
Permalink /patch/287551/
State New

Comments

Michael S. Tsirkin - Oct. 31, 2013, 4:14 p.m.
On Thu, Oct 31, 2013 at 04:56:07PM +0100, Paolo Bonzini wrote:
> On 31/10/2013 16:45, Michael S. Tsirkin wrote:
> > On Thu, Oct 31, 2013 at 04:26:13PM +0100, Paolo Bonzini wrote:
> >> On 31/10/2013 16:09, Michael S. Tsirkin wrote:
> >>> On Thu, Oct 31, 2013 at 03:56:42PM +0100, Paolo Bonzini wrote:
> >>>> On 31/10/2013 15:52, Michael S. Tsirkin wrote:
> >>>>>>> Yes, it does.
> >>>>> What does it break exactly?
> >>>>
> >>>> The point of a panicked event is to examine the guest at a particular
> >>>> moment in time (e.g. host-initiated crash dump).  If you let the guest
> >>>> run, it may reboot and prevent you from getting a meaningful dump.
> >>>
> >>> Well, we trust the guest anyway, so I think we can trust it to call halt.
> >>
> >> No, we cannot.  Reboot-in-guest-after-dump-on-host is a perfectly fine
> >> configuration.
> >>
> >>>>>>> But I think that, once we make the pvpanic device
> >>>>>>> optional, to a large extent there is no bug.  Adding the pvpanic
> >>>>>>> device to the VM will make libvirt obey <oncrash> instead of the
> >>>>>>> in-guest setting, and that's it.
> >>>>>>>
> >>>>>>> Two months have passed and no casualties have been reported due to
> >>>>>>> pvpanic.  Let's just remove the auto-pvpanic from all machine types in
> >>>>>>> 1.7 (yes, that's backwards incompatible in a strict sense), document
> >>>>>>> it in the release notes, and hope that the old QEMU versions with
> >>>>>>> mandatory pvpanic die of old age.
> >>>>>
> >>>>> Nod. I'm fine with that.
> >>>>>
> >>>>> I think we still need to get rid of the PANICKED state somehow.
> >>>>> If we can't replace it with RUNNING state, let's replace it with PAUSED.
> >>>>>
> >>>>> For example, you can't continue from panicked for some reason.
> >>>>> You can't do a reset.  But you can pause and then continue.
> >>>>
> >>>> We need to keep the PANICKED state, but we can make it a normal
> >>>> "resumable" state.
> >>>
> >>> If it's resumable how is it different from PAUSED?
> >>
> >> If the guest panics while for some reason libvirtd went down, libvirt
> >> can see what happened.  It is similar to the "I/O error" state in this
> >> respect.
> >>
> >>> Looks like all transitions from paused state should be allowed from panicked
> >>> state. So why keep it separate?
> >>
> >> Because you can poll for the state instead of watching an event.
> > 
> > I see. Maybe it was a mistake to use a separate runtime state for
> > this, but oh well.
> 
> Yes, we should have had a list of "reasons" why a guest is stopped (I/O
> error, panic, gdb, ...) and a command to clear one or more of them;
> there can be paused/running/waiting-for-migration/... states, but many
> of the states we have are not necessarily in mutual exclusion.
> 
> But we cannot really choose now.
> 
> > So it should be exactly like paused, we can just find all transitions
> > from PAUSED and it should be same for PANICKED?
> > Why isn't DEBUG allowed from PAUSED but allowed from PANICKED then?
> > Maybe it should be allowed for PAUSED?
> 
> PANICKED->DEBUG was added by commit bc7d0e667.  That commit can be
> reverted if the panicked state is removed from runstate_needs_reset.
> 
> Paolo

Okay so let's drop the code duplication and explicitly make
them the same?

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
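
The "list of reasons" design that Paolo describes in the quoted text above (a set of independent stop reasons plus a command to clear one or more of them) could be sketched roughly as below. This is only an illustration of the idea, not code from QEMU: the StopReason flags, the VMStatus struct and the vm_* helpers are all hypothetical names invented for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flags: none of these names exist in QEMU. */
typedef enum {
    STOP_REASON_USER     = 1u << 0,  /* explicit stop requested by management */
    STOP_REASON_IO_ERROR = 1u << 1,  /* block layer reported an I/O error */
    STOP_REASON_PANIC    = 1u << 2,  /* guest signalled a panic */
    STOP_REASON_GDB      = 1u << 3,  /* debugger stopped the guest */
} StopReason;

typedef struct {
    uint32_t stop_reasons;  /* bitwise OR of StopReason values */
} VMStatus;

/* Record one or more reasons why the guest must not run. */
static void vm_add_stop_reasons(VMStatus *s, uint32_t reasons)
{
    s->stop_reasons |= reasons;
}

/* Clear one or more reasons; the others stay in effect. */
static void vm_clear_stop_reasons(VMStatus *s, uint32_t reasons)
{
    s->stop_reasons &= ~reasons;
}

/* The guest may run only when no stop reason is left. */
static bool vm_can_run(const VMStatus *s)
{
    return s->stop_reasons == 0;
}

/* Management can still poll for a panic, independent of other reasons. */
static bool vm_has_panicked(const VMStatus *s)
{
    return (s->stop_reasons & STOP_REASON_PANIC) != 0;
}
```

With this shape, reasons are not mutually exclusive: a guest can be stopped for both a panic and a debugger at once, and clearing one leaves the other in place.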
Paolo Bonzini - Oct. 31, 2013, 4:17 p.m.
On 31/10/2013 17:14, Michael S. Tsirkin wrote:
>> PANICKED->DEBUG was added by commit bc7d0e667.  That commit can be
>> reverted if the panicked state is removed from runstate_needs_reset.
> 
> Okay so let's drop the code duplication and explicitly make
> them the same?
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> 
> diff --git a/vl.c b/vl.c
> index 46c29c4..e12d317 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -638,10 +638,6 @@ static const RunStateTransition runstate_transitions_def[] = {
>      { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
>      { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
>  
> -    { RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED },
> -    { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
> -    { RUN_STATE_GUEST_PANICKED, RUN_STATE_DEBUG },
> -
>      { RUN_STATE_MAX, RUN_STATE_MAX },
>  };
>  
> @@ -660,6 +656,12 @@ static void runstate_init(void)
>  
>      for (p = &runstate_transitions_def[0]; p->from != RUN_STATE_MAX; p++) {
>          runstate_valid_transitions[p->from][p->to] = true;
> +        /* Panicked state is same as paused, we only made it different so
> +         * management can detect a panic.
> +         */
> +        if (p->from == RUN_STATE_PAUSED) {
> +            runstate_valid_transitions[RUN_STATE_GUEST_PANICKED][p->to] = true;

It only makes sense to me if you do that for IO_ERROR and WATCHDOG as
well, and perhaps there are others I'm missing.  Just add a comment
before runstate_transitions_def's entries for PANICKED, IO_ERROR and
WATCHDOG.

But again, it is somewhat separate from the issue at hand, which is to
finally make pvpanic usable and hopefully before 1.7.

Paolo

> +        }
>      }
>  }
>  
> @@ -686,8 +688,7 @@ int runstate_is_running(void)
>  bool runstate_needs_reset(void)
>  {
>      return runstate_check(RUN_STATE_INTERNAL_ERROR) ||
> -        runstate_check(RUN_STATE_SHUTDOWN) ||
> -        runstate_check(RUN_STATE_GUEST_PANICKED);
> +        runstate_check(RUN_STATE_SHUTDOWN);
>  }
>  
>  StatusInfo *qmp_query_status(Error **errp)
>
Michael S. Tsirkin - Oct. 31, 2013, 4:26 p.m.
On Thu, Oct 31, 2013 at 05:17:24PM +0100, Paolo Bonzini wrote:
> On 31/10/2013 17:14, Michael S. Tsirkin wrote:
> >> PANICKED->DEBUG was added by commit bc7d0e667.  That commit can be
> >> reverted if the panicked state is removed from runstate_needs_reset.
> > 
> > Okay so let's drop the code duplication and explicitly make
> > them the same?
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > 
> > diff --git a/vl.c b/vl.c
> > index 46c29c4..e12d317 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -638,10 +638,6 @@ static const RunStateTransition runstate_transitions_def[] = {
> >      { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
> >      { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
> >  
> > -    { RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED },
> > -    { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
> > -    { RUN_STATE_GUEST_PANICKED, RUN_STATE_DEBUG },
> > -
> >      { RUN_STATE_MAX, RUN_STATE_MAX },
> >  };
> >  
> > @@ -660,6 +656,12 @@ static void runstate_init(void)
> >  
> >      for (p = &runstate_transitions_def[0]; p->from != RUN_STATE_MAX; p++) {
> >          runstate_valid_transitions[p->from][p->to] = true;
> > +        /* Panicked state is same as paused, we only made it different so
> > +         * management can detect a panic.
> > +         */
> > +        if (p->from == RUN_STATE_PAUSED) {
> > +            runstate_valid_transitions[RUN_STATE_GUEST_PANICKED][p->to] = true;
> 
> It makes only sense to me if you do that for IO_ERROR and WATCHDOG as
> well, and perhaps there are others I'm missing.  Just add a comment
> before runstate_transitions_def's entries for PANICKED, IO_ERROR and
> WATCHDOG.
> 
> But again, it is somewhat separate from the issue at hand, which is to
> finally make pvpanic usable and hopefully before 1.7.
> 
> Paolo

The issue is that you can't continue from panicked state.
You should be able to do that without going through paused.

> > +        }
> >      }
> >  }
> >  
> > @@ -686,8 +688,7 @@ int runstate_is_running(void)
> >  bool runstate_needs_reset(void)
> >  {
> >      return runstate_check(RUN_STATE_INTERNAL_ERROR) ||
> > -        runstate_check(RUN_STATE_SHUTDOWN) ||
> > -        runstate_check(RUN_STATE_GUEST_PANICKED);
> > +        runstate_check(RUN_STATE_SHUTDOWN);
> >  }
> >  
> >  StatusInfo *qmp_query_status(Error **errp)
> >
Michael S. Tsirkin - Oct. 31, 2013, 4:28 p.m.
On Thu, Oct 31, 2013 at 05:17:24PM +0100, Paolo Bonzini wrote:
> On 31/10/2013 17:14, Michael S. Tsirkin wrote:
> >> PANICKED->DEBUG was added by commit bc7d0e667.  That commit can be
> >> reverted if the panicked state is removed from runstate_needs_reset.
> > 
> > Okay so let's drop the code duplication and explicitly make
> > them the same?
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > 
> > diff --git a/vl.c b/vl.c
> > index 46c29c4..e12d317 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -638,10 +638,6 @@ static const RunStateTransition runstate_transitions_def[] = {
> >      { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
> >      { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
> >  
> > -    { RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED },
> > -    { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
> > -    { RUN_STATE_GUEST_PANICKED, RUN_STATE_DEBUG },
> > -
> >      { RUN_STATE_MAX, RUN_STATE_MAX },
> >  };
> >  
> > @@ -660,6 +656,12 @@ static void runstate_init(void)
> >  
> >      for (p = &runstate_transitions_def[0]; p->from != RUN_STATE_MAX; p++) {
> >          runstate_valid_transitions[p->from][p->to] = true;
> > +        /* Panicked state is same as paused, we only made it different so
> > +         * management can detect a panic.
> > +         */
> > +        if (p->from == RUN_STATE_PAUSED) {
> > +            runstate_valid_transitions[RUN_STATE_GUEST_PANICKED][p->to] = true;
> 
> It makes only sense to me if you do that for IO_ERROR and WATCHDOG as
> well,

Yea, let's do that.

> and perhaps there are others I'm missing.
>  Just add a comment
> before runstate_transitions_def's entries for PANICKED, IO_ERROR and
> WATCHDOG.

comments don't compile :)

> But again, it is somewhat separate from the issue at hand, which is to
> finally make pvpanic usable and hopefully before 1.7.
> 
> Paolo
> 
> > +        }
> >      }
> >  }
> >  
> > @@ -686,8 +688,7 @@ int runstate_is_running(void)
> >  bool runstate_needs_reset(void)
> >  {
> >      return runstate_check(RUN_STATE_INTERNAL_ERROR) ||
> > -        runstate_check(RUN_STATE_SHUTDOWN) ||
> > -        runstate_check(RUN_STATE_GUEST_PANICKED);
> > +        runstate_check(RUN_STATE_SHUTDOWN);
> >  }
> >  
> >  StatusInfo *qmp_query_status(Error **errp)
> >
Paolo Bonzini - Oct. 31, 2013, 4:38 p.m.
On 31/10/2013 17:26, Michael S. Tsirkin wrote:
> On Thu, Oct 31, 2013 at 05:17:24PM +0100, Paolo Bonzini wrote:
>> On 31/10/2013 17:14, Michael S. Tsirkin wrote:
>>>> PANICKED->DEBUG was added by commit bc7d0e667.  That commit can be
>>>> reverted if the panicked state is removed from runstate_needs_reset.
>>>
>>> Okay so let's drop the code duplication and explicitly make
>>> them the same?
>>>
>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>>
>>>
>>> diff --git a/vl.c b/vl.c
>>> index 46c29c4..e12d317 100644
>>> --- a/vl.c
>>> +++ b/vl.c
>>> @@ -638,10 +638,6 @@ static const RunStateTransition runstate_transitions_def[] = {
>>>      { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
>>>      { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
>>>  
>>> -    { RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED },
>>> -    { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
>>> -    { RUN_STATE_GUEST_PANICKED, RUN_STATE_DEBUG },
>>> -
>>>      { RUN_STATE_MAX, RUN_STATE_MAX },
>>>  };
>>>  
>>> @@ -660,6 +656,12 @@ static void runstate_init(void)
>>>  
>>>      for (p = &runstate_transitions_def[0]; p->from != RUN_STATE_MAX; p++) {
>>>          runstate_valid_transitions[p->from][p->to] = true;
>>> +        /* Panicked state is same as paused, we only made it different so
>>> +         * management can detect a panic.
>>> +         */
>>> +        if (p->from == RUN_STATE_PAUSED) {
>>> +            runstate_valid_transitions[RUN_STATE_GUEST_PANICKED][p->to] = true;
>>
>> It makes only sense to me if you do that for IO_ERROR and WATCHDOG as
>> well, and perhaps there are others I'm missing.  Just add a comment
>> before runstate_transitions_def's entries for PANICKED, IO_ERROR and
>> WATCHDOG.
>>
>> But again, it is somewhat separate from the issue at hand, which is to
>> finally make pvpanic usable and hopefully before 1.7.
>>
>> Paolo
> 
> The issue is that you can't continue from panicked state.
> You should be able to do that without going through paused.

Yes, that's what my patch (I posted the link earlier) does:

-    { RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED },
+    { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING },
     { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
-    { RUN_STATE_GUEST_PANICKED, RUN_STATE_DEBUG },


Comments don't compile, but they are also easier to understand than code.
Special logic in runstate_init is unnecessarily complicated for a table
that hardly ever changes.  English works better: whoever modifies the
table has the comment right in front of them.

Paolo
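
The "explicit entries plus a comment" approach argued for above might look like the following self-contained sketch. It is a simplified model, not the actual vl.c table: the State enum, the ST_* names and the helper functions are assumptions made for illustration, and the real runstate_transitions_def has many more states and entries.

```c
#include <stdbool.h>

/* Simplified, hypothetical subset of run states (not QEMU's RunState). */
typedef enum {
    ST_RUNNING,
    ST_PAUSED,
    ST_FINISH_MIGRATE,
    ST_IO_ERROR,
    ST_WATCHDOG,
    ST_GUEST_PANICKED,
    ST_MAX
} State;

typedef struct {
    State from;
    State to;
} Transition;

static const Transition transitions_def[] = {
    { ST_RUNNING, ST_PAUSED },
    { ST_RUNNING, ST_FINISH_MIGRATE },
    { ST_RUNNING, ST_IO_ERROR },
    { ST_RUNNING, ST_WATCHDOG },
    { ST_RUNNING, ST_GUEST_PANICKED },

    { ST_PAUSED, ST_RUNNING },
    { ST_PAUSED, ST_FINISH_MIGRATE },

    /*
     * IO_ERROR, WATCHDOG and GUEST_PANICKED behave like PAUSED; they are
     * separate states only so that management can poll for the reason.
     * Keep their outgoing transitions identical to the PAUSED entries.
     */
    { ST_IO_ERROR, ST_RUNNING },
    { ST_IO_ERROR, ST_FINISH_MIGRATE },

    { ST_WATCHDOG, ST_RUNNING },
    { ST_WATCHDOG, ST_FINISH_MIGRATE },

    { ST_GUEST_PANICKED, ST_RUNNING },
    { ST_GUEST_PANICKED, ST_FINISH_MIGRATE },

    { ST_MAX, ST_MAX },  /* sentinel terminating the table */
};

static bool valid_transitions[ST_MAX][ST_MAX];

/* Expand the flat table into a lookup matrix; no special cases. */
static void transitions_init(void)
{
    const Transition *p;

    for (p = &transitions_def[0]; p->from != ST_MAX; p++) {
        valid_transitions[p->from][p->to] = true;
    }
}

static bool transition_is_valid(State from, State to)
{
    return valid_transitions[from][to];
}
```

The cost of this variant is that the PAUSED-like rows are duplicated by hand; the benefit is that the init loop stays trivial and the rule is stated next to the data it governs.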
Michael S. Tsirkin - Oct. 31, 2013, 5:01 p.m.
On Thu, Oct 31, 2013 at 05:38:40PM +0100, Paolo Bonzini wrote:
> On 31/10/2013 17:26, Michael S. Tsirkin wrote:
> > On Thu, Oct 31, 2013 at 05:17:24PM +0100, Paolo Bonzini wrote:
> >> On 31/10/2013 17:14, Michael S. Tsirkin wrote:
> >>>> PANICKED->DEBUG was added by commit bc7d0e667.  That commit can be
> >>>> reverted if the panicked state is removed from runstate_needs_reset.
> >>>
> >>> Okay so let's drop the code duplication and explicitly make
> >>> them the same?
> >>>
> >>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >>>
> >>>
> >>> diff --git a/vl.c b/vl.c
> >>> index 46c29c4..e12d317 100644
> >>> --- a/vl.c
> >>> +++ b/vl.c
> >>> @@ -638,10 +638,6 @@ static const RunStateTransition runstate_transitions_def[] = {
> >>>      { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
> >>>      { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
> >>>  
> >>> -    { RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED },
> >>> -    { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
> >>> -    { RUN_STATE_GUEST_PANICKED, RUN_STATE_DEBUG },
> >>> -
> >>>      { RUN_STATE_MAX, RUN_STATE_MAX },
> >>>  };
> >>>  
> >>> @@ -660,6 +656,12 @@ static void runstate_init(void)
> >>>  
> >>>      for (p = &runstate_transitions_def[0]; p->from != RUN_STATE_MAX; p++) {
> >>>          runstate_valid_transitions[p->from][p->to] = true;
> >>> +        /* Panicked state is same as paused, we only made it different so
> >>> +         * management can detect a panic.
> >>> +         */
> >>> +        if (p->from == RUN_STATE_PAUSED) {
> >>> +            runstate_valid_transitions[RUN_STATE_GUEST_PANICKED][p->to] = true;
> >>
> >> It makes only sense to me if you do that for IO_ERROR and WATCHDOG as
> >> well, and perhaps there are others I'm missing.  Just add a comment
> >> before runstate_transitions_def's entries for PANICKED, IO_ERROR and
> >> WATCHDOG.
> >>
> >> But again, it is somewhat separate from the issue at hand, which is to
> >> finally make pvpanic usable and hopefully before 1.7.
> >>
> >> Paolo
> > 
> > The issue is that you can't continue from panicked state.
> > You should be able to do that without going through paused.
> 
> Yes, that's what my patch (posted the link before) does:
> 
> -    { RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED },
> +    { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING },
>      { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
> -    { RUN_STATE_GUEST_PANICKED, RUN_STATE_DEBUG },
> 

Anyway, I agree with your patch. How about we drop the PANICKED->DEBUG
transition and drop RUN_STATE_GUEST_PANICKED from runstate_needs_reset?

> Comments don't compile, but are also easier to understand than code.
> Special logic in runstate_init is unnecessarily complicated, for a table
> that hardly sees any change.  English works better, whoever modifies the
> table has it under their eyes.
> 
> Paolo
Paolo Bonzini - Oct. 31, 2013, 5:10 p.m.
On 31/10/2013 18:01, Michael S. Tsirkin wrote:
> > Yes, that's what my patch (posted the link before) does:
> > 
> > -    { RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED },
> > +    { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING },
> >      { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
> > -    { RUN_STATE_GUEST_PANICKED, RUN_STATE_DEBUG },
> 
> Anyway I agree with your patch. How about we drop RUN_STATE_DEBUG and
> drop RUN_STATE_GUEST_PANICKED from need reset list?

Yes, and also modify gdbstub.c.  It's all in the URL I posted a few
hours ago.

Paolo
Michael S. Tsirkin - Oct. 31, 2013, 5:18 p.m.
On Thu, Oct 31, 2013 at 06:10:36PM +0100, Paolo Bonzini wrote:
> On 31/10/2013 18:01, Michael S. Tsirkin wrote:
> > > Yes, that's what my patch (posted the link before) does:
> > > 
> > > -    { RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED },
> > > +    { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING },
> > >      { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
> > > -    { RUN_STATE_GUEST_PANICKED, RUN_STATE_DEBUG },
> > 
> > Anyway I agree with your patch. How about we drop RUN_STATE_DEBUG and
> > drop RUN_STATE_GUEST_PANICKED from need reset list?
> 
> Yes, and also modify gdbstub.c.  It's all in the URL I posted a few
> hours ago.
> 
> Paolo

OK, so can you please post patches 1 and 2? I'll review and ack.
Paolo Bonzini - Oct. 31, 2013, 6:03 p.m.
On 31/10/2013 18:18, Michael S. Tsirkin wrote:
> On Thu, Oct 31, 2013 at 06:10:36PM +0100, Paolo Bonzini wrote:
>> On 31/10/2013 18:01, Michael S. Tsirkin wrote:
>>>> Yes, that's what my patch (posted the link before) does:
>>>>
>>>> -    { RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED },
>>>> +    { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING },
>>>>      { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
>>>> -    { RUN_STATE_GUEST_PANICKED, RUN_STATE_DEBUG },
>>>
>>> Anyway I agree with your patch. How about we drop RUN_STATE_DEBUG and
>>> drop RUN_STATE_GUEST_PANICKED from need reset list?
>>
>> Yes, and also modify gdbstub.c.  It's all in the URL I posted a few
>> hours ago.
>>
>> Paolo
> 
> OK, so can you pls post patches 1 and 2? I'll review and ack.

Next Monday I will.

Paolo

Patch

diff --git a/vl.c b/vl.c
index 46c29c4..e12d317 100644
--- a/vl.c
+++ b/vl.c
@@ -638,10 +638,6 @@  static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
     { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
 
-    { RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED },
-    { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
-    { RUN_STATE_GUEST_PANICKED, RUN_STATE_DEBUG },
-
     { RUN_STATE_MAX, RUN_STATE_MAX },
 };
 
@@ -660,6 +656,12 @@  static void runstate_init(void)
 
     for (p = &runstate_transitions_def[0]; p->from != RUN_STATE_MAX; p++) {
         runstate_valid_transitions[p->from][p->to] = true;
+        /* Panicked state is same as paused, we only made it different so
+         * management can detect a panic.
+         */
+        if (p->from == RUN_STATE_PAUSED) {
+            runstate_valid_transitions[RUN_STATE_GUEST_PANICKED][p->to] = true;
+        }
     }
 }
 
@@ -686,8 +688,7 @@  int runstate_is_running(void)
 bool runstate_needs_reset(void)
 {
     return runstate_check(RUN_STATE_INTERNAL_ERROR) ||
-        runstate_check(RUN_STATE_SHUTDOWN) ||
-        runstate_check(RUN_STATE_GUEST_PANICKED);
+        runstate_check(RUN_STATE_SHUTDOWN);
 }
 
 StatusInfo *qmp_query_status(Error **errp)
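
To see the effect of the runstate_init hunk in isolation, here is a minimal standalone model of the mirroring loop. It is a sketch under stated assumptions: the RS_* enum, the table contents and the function names are hypothetical stand-ins for a small subset of the real run states, chosen only to show which transitions PANICKED ends up with.

```c
#include <stdbool.h>

/* Hypothetical, reduced set of states; not QEMU's RunState enum. */
typedef enum {
    RS_RUNNING,
    RS_PAUSED,
    RS_DEBUG,
    RS_FINISH_MIGRATE,
    RS_GUEST_PANICKED,
    RS_MAX
} SketchState;

typedef struct {
    SketchState from;
    SketchState to;
} SketchTransition;

/* Note: no rows with RS_GUEST_PANICKED as the source state. */
static const SketchTransition sketch_transitions_def[] = {
    { RS_RUNNING, RS_PAUSED },
    { RS_RUNNING, RS_DEBUG },
    { RS_RUNNING, RS_FINISH_MIGRATE },
    { RS_RUNNING, RS_GUEST_PANICKED },

    { RS_DEBUG, RS_RUNNING },

    { RS_PAUSED, RS_RUNNING },
    { RS_PAUSED, RS_FINISH_MIGRATE },

    { RS_MAX, RS_MAX },  /* sentinel */
};

static bool sketch_valid[RS_MAX][RS_MAX];

static void sketch_runstate_init(void)
{
    const SketchTransition *p;

    for (p = &sketch_transitions_def[0]; p->from != RS_MAX; p++) {
        sketch_valid[p->from][p->to] = true;
        /* Mirror every PAUSED transition onto PANICKED, as in the patch. */
        if (p->from == RS_PAUSED) {
            sketch_valid[RS_GUEST_PANICKED][p->to] = true;
        }
    }
}

static bool sketch_transition_ok(SketchState from, SketchState to)
{
    return sketch_valid[from][to];
}
```

In this model PANICKED gains the PAUSED transitions to RUNNING and FINISH_MIGRATE, and loses the direct transitions to PAUSED and DEBUG that the removed table rows used to provide, because PAUSED itself has no such rows here.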