Patchwork [v3,2/3] qerror: Add a new MACHINE_STOPPED error message

login
register
mail settings
Submitter Amit Shah
Date Aug. 27, 2010, 5:27 a.m.
Message ID <51ec99ce2db02aeb34ec6683a76895b4a127057d.1282886503.git.amit.shah@redhat.com>
Download mbox | patch
Permalink /patch/62827/
State New
Headers show

Comments

Amit Shah - Aug. 27, 2010, 5:27 a.m.
This error message denotes some command was not successful in completing
as the guest was unresponsive.

Use it in the virtio-balloon code when showing older, cached data.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 hw/virtio-balloon.c |    1 +
 qerror.c            |    4 ++++
 qerror.h            |    3 +++
 3 files changed, 8 insertions(+), 0 deletions(-)
Daniel P. Berrange - Aug. 27, 2010, 9:29 a.m.
On Fri, Aug 27, 2010 at 10:57:10AM +0530, Amit Shah wrote:
> This error message denotes some command was not successful in completing
> as the guest was unresponsive.
> 
> Use it in the virtio-balloon code when showing older, cached data.
> 
> Signed-off-by: Amit Shah <amit.shah@redhat.com>
> ---
>  hw/virtio-balloon.c |    1 +
>  qerror.c            |    4 ++++
>  qerror.h            |    3 +++
>  3 files changed, 8 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c
> index d6c66cf..309c343 100644
> --- a/hw/virtio-balloon.c
> +++ b/hw/virtio-balloon.c
> @@ -140,6 +140,7 @@ static void complete_stats_request(VirtIOBalloon *vb)
>  
>  static void show_old_stats(void *opaque)
>  {
> +    qerror_report(QERR_MACHINE_STOPPED);
>      complete_stats_request(opaque);
>  }


NACK. It has always been allowed & valid to call query-balloon
to get the current balloon level. We must not throw an error
just because the recently added mem stats can't be refreshed.

Regards,
Daniel
Anthony Liguori - Aug. 27, 2010, 12:39 p.m.
On 08/27/2010 04:29 AM, Daniel P. Berrange wrote:
> On Fri, Aug 27, 2010 at 10:57:10AM +0530, Amit Shah wrote:
>    
>> This error message denotes some command was not successful in completing
>> as the guest was unresponsive.
>>
>> Use it in the virtio-balloon code when showing older, cached data.
>>
>> Signed-off-by: Amit Shah<amit.shah@redhat.com>
>> ---
>>   hw/virtio-balloon.c |    1 +
>>   qerror.c            |    4 ++++
>>   qerror.h            |    3 +++
>>   3 files changed, 8 insertions(+), 0 deletions(-)
>>
>> diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c
>> index d6c66cf..309c343 100644
>> --- a/hw/virtio-balloon.c
>> +++ b/hw/virtio-balloon.c
>> @@ -140,6 +140,7 @@ static void complete_stats_request(VirtIOBalloon *vb)
>>
>>   static void show_old_stats(void *opaque)
>>   {
>> +    qerror_report(QERR_MACHINE_STOPPED);
>>       complete_stats_request(opaque);
>>   }
>>      
>
> NACK. It has always been allowed&  valid to call query-balloon
> to get the current balloon level. We must not throw an error
> just because the recently added mem stats can't be refreshed.
>    

I think that's a fair comment but why even bother fixing the command.  
Let's introduce a new command that just gets a single piece of 
information instead of having a command return lots of information.

Regards,

Anthony Liguori

> Regards,
> Daniel
>
Daniel P. Berrange - Aug. 27, 2010, 12:58 p.m.
On Fri, Aug 27, 2010 at 07:39:37AM -0500, Anthony Liguori wrote:
> On 08/27/2010 04:29 AM, Daniel P. Berrange wrote:
> >On Fri, Aug 27, 2010 at 10:57:10AM +0530, Amit Shah wrote:
> >   
> >>This error message denotes some command was not successful in completing
> >>as the guest was unresponsive.
> >>
> >>Use it in the virtio-balloon code when showing older, cached data.
> >>
> >>Signed-off-by: Amit Shah<amit.shah@redhat.com>
> >>---
> >>  hw/virtio-balloon.c |    1 +
> >>  qerror.c            |    4 ++++
> >>  qerror.h            |    3 +++
> >>  3 files changed, 8 insertions(+), 0 deletions(-)
> >>
> >>diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c
> >>index d6c66cf..309c343 100644
> >>--- a/hw/virtio-balloon.c
> >>+++ b/hw/virtio-balloon.c
> >>@@ -140,6 +140,7 @@ static void complete_stats_request(VirtIOBalloon *vb)
> >>
> >>  static void show_old_stats(void *opaque)
> >>  {
> >>+    qerror_report(QERR_MACHINE_STOPPED);
> >>      complete_stats_request(opaque);
> >>  }
> >>     
> >
> >NACK. It has always been allowed&  valid to call query-balloon
> >to get the current balloon level. We must not throw an error
> >just because the recently added mem stats can't be refreshed.
> 
> I think that's a fair comment but why even bother fixing the command.  
> Let's introduce a new command that just gets a single piece of 
> information instead of having a command return lots of information.

The existing query-balloon command that has been around for years &
is used by all current apps has a significant regression since we added
the memstats code to it: a guest can now trivially inflict a DOS on the
mgmt app if it crashes or is malicious. IMHO we need to fix that regression
for 0.13 so that existing apps don't suffer[1]. Adding a timeout to silently
skip the stats refresh if the guest doesn't respond, but without raising
an error seems the best tradeoff we can do here.

Beyond fixing that regression, I agree that this command is terminally
flawed & we need to deprecate it & provide better specified new
replacement(s). This seems like 0.14 work to me though.

Regards,
Daniel

[1] I know that they could already suffer if there was a bug in qemu
    that prevented it responding, even if the guest was not being
    malicious/crashed.
Markus Armbruster - Aug. 27, 2010, 1:59 p.m.
"Daniel P. Berrange" <berrange@redhat.com> writes:

> On Fri, Aug 27, 2010 at 07:39:37AM -0500, Anthony Liguori wrote:
>> On 08/27/2010 04:29 AM, Daniel P. Berrange wrote:
>> >On Fri, Aug 27, 2010 at 10:57:10AM +0530, Amit Shah wrote:
>> >   
>> >>This error message denotes some command was not successful in completing
>> >>as the guest was unresponsive.
>> >>
>> >>Use it in the virtio-balloon code when showing older, cached data.
>> >>
>> >>Signed-off-by: Amit Shah<amit.shah@redhat.com>
>> >>---
>> >>  hw/virtio-balloon.c |    1 +
>> >>  qerror.c            |    4 ++++
>> >>  qerror.h            |    3 +++
>> >>  3 files changed, 8 insertions(+), 0 deletions(-)
>> >>
>> >>diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c
>> >>index d6c66cf..309c343 100644
>> >>--- a/hw/virtio-balloon.c
>> >>+++ b/hw/virtio-balloon.c
>> >>@@ -140,6 +140,7 @@ static void complete_stats_request(VirtIOBalloon *vb)
>> >>
>> >>  static void show_old_stats(void *opaque)
>> >>  {
>> >>+    qerror_report(QERR_MACHINE_STOPPED);
>> >>      complete_stats_request(opaque);
>> >>  }
>> >>     
>> >
>> >NACK. It has always been allowed&  valid to call query-balloon
>> >to get the current balloon level. We must not throw an error
>> >just because the recently added mem stats can't be refreshed.
>> 
>> I think that's a fair comment but why even bother fixing the command.  
>> Let's introduce a new command that just gets a single piece of 
>> information instead of having a command return lots of information.
>
> The existing query-balloon command that has been around for years &
> is used by all current apps has a significant regression since we added
> the memstats code to it: a guest can now trivially inflict a DOS on the
> mgmt app if it crashes or is malicious. IMHO we need to fix that regression
> for 0.13 so that existing apps don't suffer[1]. Adding a timeout to silently
> skip the stats refresh if the guest doesn't respond, but without raising
> an error seems the best tradeoff we can do here.

I agree.

Adding a roundtrip through the guest to an existing command was a
mistake.

> Beyond fixing that regression, I agree that this command is terminally
> flawed & we need to deprecate it & provide better specified new
> replacement(s). This seems like 0.14 work to me though.

Yup.

> Regards,
> Daniel
>
> [1] I know that they could already suffer if there was a bug in qemu
>     that prevented it responding, even if the guest was not being
>     malicious/crashed.
Luiz Capitulino - Aug. 27, 2010, 2:15 p.m.
On Fri, 27 Aug 2010 15:59:21 +0200
Markus Armbruster <armbru@redhat.com> wrote:

> "Daniel P. Berrange" <berrange@redhat.com> writes:
> 
> > On Fri, Aug 27, 2010 at 07:39:37AM -0500, Anthony Liguori wrote:
> >> On 08/27/2010 04:29 AM, Daniel P. Berrange wrote:
> >> >On Fri, Aug 27, 2010 at 10:57:10AM +0530, Amit Shah wrote:
> >> >   
> >> >>This error message denotes some command was not successful in completing
> >> >>as the guest was unresponsive.
> >> >>
> >> >>Use it in the virtio-balloon code when showing older, cached data.
> >> >>
> >> >>Signed-off-by: Amit Shah<amit.shah@redhat.com>
> >> >>---
> >> >>  hw/virtio-balloon.c |    1 +
> >> >>  qerror.c            |    4 ++++
> >> >>  qerror.h            |    3 +++
> >> >>  3 files changed, 8 insertions(+), 0 deletions(-)
> >> >>
> >> >>diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c
> >> >>index d6c66cf..309c343 100644
> >> >>--- a/hw/virtio-balloon.c
> >> >>+++ b/hw/virtio-balloon.c
> >> >>@@ -140,6 +140,7 @@ static void complete_stats_request(VirtIOBalloon *vb)
> >> >>
> >> >>  static void show_old_stats(void *opaque)
> >> >>  {
> >> >>+    qerror_report(QERR_MACHINE_STOPPED);
> >> >>      complete_stats_request(opaque);
> >> >>  }
> >> >>     
> >> >
> >> >NACK. It has always been allowed&  valid to call query-balloon
> >> >to get the current balloon level. We must not throw an error
> >> >just because the recently added mem stats can't be refreshed.
> >> 
> >> I think that's a fair comment but why even bother fixing the command.  
> >> Let's introduce a new command that just gets a single piece of 
> >> information instead of having a command return lots of information.
> >
> > The existing query-balloon command that has been around for years &
> > is used by all current apps has a significant regression since we added
> > the memstats code to it: a guest can now trivially inflict a DOS on the
> > mgmt app if it crashes or is malicious. IMHO we need to fix that regression
> > for 0.13 so that existing apps don't suffer[1]. Adding a timeout to silently
> > skip the stats refresh if the guest doesn't respond, but without raising
> > an error seems the best tradeoff we can do here.
> 
> I agree.
> 
> Adding a roundtrip through the guest to an existing command was a
> mistake.

I wondered if we could drop it for now to make it right in 0.14, but I
believe it's already part of the user monitor for some time and libvirt
uses the stats, right?

I think we need testing/unstable namespace in QMP, where commands can be
tested for while so that we reduce the risk of nasty surprises like this one.

> 
> > Beyond fixing that regression, I agree that this command is terminally
> > flawed & we need to deprecate it & provide better specified new
> > replacement(s). This seems like 0.14 work to me though.
> 
> Yup.
> 
> > Regards,
> > Daniel
> >
> > [1] I know that they could already suffer if there was a bug in qemu
> >     that prevented it responding, even if the guest was not being
> >     malicious/crashed.
>
Anthony Liguori - Aug. 27, 2010, 2:59 p.m.
On 08/27/2010 09:15 AM, Luiz Capitulino wrote:
>
> I wondered if we could drop it for now to make it right in 0.14, but I
> believe it's already part of the user monitor for some time and libvirt
> uses the stats, right?
>
> I think we need testing/unstable namespace in QMP, where commands can be
> tested for while so that we reduce the risk of nasty surprises like this one.
>    

It's trying to plug a sieve with a band-aid.  It's certainly an 
"improvement" but it's of question utility looking at the bigger picture.

Balloon is a perfect example of where what we really need to do is build 
interface interfaces that make sense, and then expose them in QMP.

What's a reasonable C-level API to query statistics that potentially may 
never return?  Building in a timeout is something of a crappy API 
because it puts policy deep in the API that is trivial to implement 
elsewhere.  What you'd probably do is something like:

BalloonStatsRequest *query_guest_balloon_stats(CompletionCallback *cb, 
void *opaque);
int cancel_guest_balloon_stats(BalloonStatsRequest *req);
void release_guest_balloon_stats(BalloonStatsRequest *req);

Regards,

Anthony Liguori

>>      
>>> Beyond fixing that regression, I agree that this command is terminally
>>> flawed&  we need to deprecate it&  provide better specified new
>>> replacement(s). This seems like 0.14 work to me though.
>>>        
>> Yup.
>>
>>      
>>> Regards,
>>> Daniel
>>>
>>> [1] I know that they could already suffer if there was a bug in qemu
>>>      that prevented it responding, even if the guest was not being
>>>      malicious/crashed.
>>>        
>>      
>
Daniel P. Berrange - Aug. 27, 2010, 3:33 p.m.
On Fri, Aug 27, 2010 at 09:59:55AM -0500, Anthony Liguori wrote:
> On 08/27/2010 09:15 AM, Luiz Capitulino wrote:
> >
> >I wondered if we could drop it for now to make it right in 0.14, but I
> >believe it's already part of the user monitor for some time and libvirt
> >uses the stats, right?
> >
> >I think we need testing/unstable namespace in QMP, where commands can be
> >tested for while so that we reduce the risk of nasty surprises like this 
> >one.
> >   
> 
> It's trying to plug a sieve with a band-aid.  It's certainly an 
> "improvement" but it's of question utility looking at the bigger picture.
> 
> Balloon is a perfect example of where what we really need to do is build 
> interface interfaces that make sense, and then expose them in QMP.
> 
> What's a reasonable C-level API to query statistics that potentially may 
> never return?  Building in a timeout is something of a crappy API 
> because it puts policy deep in the API that is trivial to implement 
> elsewhere.  What you'd probably do is something like:
> 
> BalloonStatsRequest *query_guest_balloon_stats(CompletionCallback *cb, 
> void *opaque);
> int cancel_guest_balloon_stats(BalloonStatsRequest *req);
> void release_guest_balloon_stats(BalloonStatsRequest *req);

IMHO this is flawed because it does not allow you to fetch the
balloon stats independantly of asking the guest OS to refresh
them. So if the guest dies, it is not possible to ask QEMU
what the last known stats were.

Although the current guest driver implementation requires the
host to request an update before the guest will refresh stats,
in theory someone could provide a guest driver impl thta does
a periodic push to the host without requiring a request.

Thus for a fully generalized access mechanism you need two
APIs and one event

 - API to request guest balloon stats update
 - API to get current known balloon stats
 - Event to notify client of a balloon stats update

And for the non-stats related balloon level we need

 - Event to notify client of balloon level change


Even if the API to request a guest balloon stats update is fully async
and cancellable, you still need the event to notify client of the
balloon stats update, because in a multiple monitor scenario the
client wanting notification may be different to the client requesting
the refresh.

Regards,
Daniel
Anthony Liguori - Aug. 27, 2010, 3:45 p.m.
On 08/27/2010 10:33 AM, Daniel P. Berrange wrote:
> On Fri, Aug 27, 2010 at 09:59:55AM -0500, Anthony Liguori wrote:
>    
>> On 08/27/2010 09:15 AM, Luiz Capitulino wrote:
>>      
>>> I wondered if we could drop it for now to make it right in 0.14, but I
>>> believe it's already part of the user monitor for some time and libvirt
>>> uses the stats, right?
>>>
>>> I think we need testing/unstable namespace in QMP, where commands can be
>>> tested for while so that we reduce the risk of nasty surprises like this
>>> one.
>>>
>>>        
>> It's trying to plug a sieve with a band-aid.  It's certainly an
>> "improvement" but it's of question utility looking at the bigger picture.
>>
>> Balloon is a perfect example of where what we really need to do is build
>> interface interfaces that make sense, and then expose them in QMP.
>>
>> What's a reasonable C-level API to query statistics that potentially may
>> never return?  Building in a timeout is something of a crappy API
>> because it puts policy deep in the API that is trivial to implement
>> elsewhere.  What you'd probably do is something like:
>>
>> BalloonStatsRequest *query_guest_balloon_stats(CompletionCallback *cb,
>> void *opaque);
>> int cancel_guest_balloon_stats(BalloonStatsRequest *req);
>> void release_guest_balloon_stats(BalloonStatsRequest *req);
>>      
> IMHO this is flawed because it does not allow you to fetch the
> balloon stats independantly of asking the guest OS to refresh
> them. So if the guest dies, it is not possible to ask QEMU
> what the last known stats were.
>    

The "last known" stats could be totally meaningless as they could be 
from 3 hours ago and there may have been a full reboot into a different 
OS since then.

Whether a "last known" stat is useful is a policy decision and belongs 
in the management tooling.

> Although the current guest driver implementation requires the
> host to request an update before the guest will refresh stats,
> in theory someone could provide a guest driver impl thta does
> a periodic push to the host without requiring a request.
>
> Thus for a fully generalized access mechanism you need two
> APIs and one event
>
>   - API to request guest balloon stats update
>   - API to get current known balloon stats
>    

Current known balloon stats is not meaningful within qemu.

>   - Event to notify client of a balloon stats update
>
> And for the non-stats related balloon level we need
>
>   - Event to notify client of balloon level change
>    

An alternative API would avoid a 1-1 relationship between requests and 
responses.   That would mean:

void balloon_set_target(BalloonInterface *b, uint64_t value);
uint64_t balloon_get_STAT_NAME(BalloonInterface *b);

void balloon_request_update(BalloonInterface *b);

void balloon_add_stat_change_notifier(BalloonInterface *b, StatNotifer 
*notifer, uint64_t mask);
void balloon_del_stat_change_notifier(BalloonInterface *b, StatNotifier 
*notifier);

This API allows you to set and get individual stats.  Because there are 
individual getters/setters, it provides a better backwards/future 
compatibility and is pretty discoverable.  It offers no real indication 
of the age of the stat.  It's unclear to me if that's a good or bad thing.

balloon_request_update() requests that a guest update it's status but 
there's no specific guarantees in the interface.  A mask might prove 
valuable here.

Then there's an interface to register notifications of when the 
statistics are updated.  I think this probably would cover most use cases.

A main difference between this API is that the API guarantees absolutely 
no "freshness" to the statistics.  It provides an interface to freshen 
them but provides no guarantees about when that will happen and even if 
it will happen.

Regards,

Anthony Liguori

> Even if the API to request a guest balloon stats update is fully async
> and cancellable, you still need the event to notify client of the
> balloon stats update, because in a multiple monitor scenario the
> client wanting notification may be different to the client requesting
> the refresh.
>
> Regards,
> Daniel
>
Luiz Capitulino - Aug. 27, 2010, 4:08 p.m.
On Fri, 27 Aug 2010 09:59:55 -0500
Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 08/27/2010 09:15 AM, Luiz Capitulino wrote:
> >
> > I wondered if we could drop it for now to make it right in 0.14, but I
> > believe it's already part of the user monitor for some time and libvirt
> > uses the stats, right?
> >
> > I think we need testing/unstable namespace in QMP, where commands can be
> > tested for while so that we reduce the risk of nasty surprises like this one.
> >    
> 
> It's trying to plug a sieve with a band-aid.  It's certainly an 
> "improvement" but it's of question utility looking at the bigger picture.

Are you talking about the testing namespace idea? It doesn't have anything
to do with balloon or how our interfaces are built. It would be just a way
to play with commands that has been converted to QMP but are not available
because they're not stable yet (eg. Jan's device_show).

> Balloon is a perfect example of where what we really need to do is build 
> interface interfaces that make sense, and then expose them in QMP.

Main question is: can we drop the stats the way they are today to do the
Right Thing for 0.14 or not?

I think we have agreed on the internal interfaces approach. My only
concern is whether this will conflict when extending the wire protocol
(eg. adding new arguments to existing commands). Not a problem if the
C API is not stable, of course.

> What's a reasonable C-level API to query statistics that potentially may 
> never return?  Building in a timeout is something of a crappy API 
> because it puts policy deep in the API that is trivial to implement 
> elsewhere.  What you'd probably do is something like:
> 
> BalloonStatsRequest *query_guest_balloon_stats(CompletionCallback *cb, 
> void *opaque);
> int cancel_guest_balloon_stats(BalloonStatsRequest *req);

Shouldn't the API provide a general cancel method? All functions that
talk to the guest will need one.

> void release_guest_balloon_stats(BalloonStatsRequest *req);
> 
> Regards,
> 
> Anthony Liguori
> 
> >>      
> >>> Beyond fixing that regression, I agree that this command is terminally
> >>> flawed&  we need to deprecate it&  provide better specified new
> >>> replacement(s). This seems like 0.14 work to me though.
> >>>        
> >> Yup.
> >>
> >>      
> >>> Regards,
> >>> Daniel
> >>>
> >>> [1] I know that they could already suffer if there was a bug in qemu
> >>>      that prevented it responding, even if the guest was not being
> >>>      malicious/crashed.
> >>>        
> >>      
> >    
>
Anthony Liguori - Aug. 27, 2010, 7:02 p.m.
On 08/27/2010 11:08 AM, Luiz Capitulino wrote:
>> It's trying to plug a sieve with a band-aid.  It's certainly an
>> "improvement" but it's of question utility looking at the bigger picture.
>>      
> Are you talking about the testing namespace idea? It doesn't have anything
> to do with balloon or how our interfaces are built. It would be just a way
> to play with commands that has been converted to QMP but are not available
> because they're not stable yet (eg. Jan's device_show).
>    

My point is that we shouldn't build any QMP APIs and we definitely 
shouldn't try to QMP-ize monitor commands.

Instead, we should design logical C APIs that we could consume within 
QEMU that we think we can support long term and then expose those over QMP.

Having a sandbox doesn't really solve the fundamental problem of making 
sure the interface is consumable.

>> Balloon is a perfect example of where what we really need to do is build
>> interface interfaces that make sense, and then expose them in QMP.
>>      
> Main question is: can we drop the stats the way they are today to do the
> Right Thing for 0.14 or not?
>    

I don't see how 0.13.0 is going to get releases with anything but the 
current behavior.  It's unfortunate but we're too delayed and can't 
afford a change like this this late in the game.

In terms of the stable branch, the least disruptive thing would be a 
timeout.

> I think we have agreed on the internal interfaces approach. My only
> concern is whether this will conflict when extending the wire protocol
> (eg. adding new arguments to existing commands). Not a problem if the
> C API is not stable, of course.
>    

We don't do that.  It's a recipe for disaster.  QEMU isn't written in 
Python and if we try to module our interfaces are if we were a Python 
library, we're destined to fail.

>> What's a reasonable C-level API to query statistics that potentially may
>> never return?  Building in a timeout is something of a crappy API
>> because it puts policy deep in the API that is trivial to implement
>> elsewhere.  What you'd probably do is something like:
>>
>> BalloonStatsRequest *query_guest_balloon_stats(CompletionCallback *cb,
>> void *opaque);
>> int cancel_guest_balloon_stats(BalloonStatsRequest *req);
>>      
> Shouldn't the API provide a general cancel method? All functions that
> talk to the guest will need one.
>    

See next proposal.  There's no cancel but I'd argue it's not needed.  
You don't care if the request succeeds or fails so there's no point in 
cancelling it.  Cancellation only works best when a request has a 
discrete life cycle but in the case of requesting a guest to update 
stats, there is not really a well define dstart and end.

Regards,

Anthony Liguori

>> void release_guest_balloon_stats(BalloonStatsRequest *req);
>>
>> Regards,
>>
>> Anthony Liguori
>>
>>      
>>>>
>>>>          
>>>>> Beyond fixing that regression, I agree that this command is terminally
>>>>> flawed&   we need to deprecate it&   provide better specified new
>>>>> replacement(s). This seems like 0.14 work to me though.
>>>>>
>>>>>            
>>>> Yup.
>>>>
>>>>
>>>>          
>>>>> Regards,
>>>>> Daniel
>>>>>
>>>>> [1] I know that they could already suffer if there was a bug in qemu
>>>>>       that prevented it responding, even if the guest was not being
>>>>>       malicious/crashed.
>>>>>
>>>>>            
>>>>
>>>>          
>>>
>>>        
>>      
>
Luiz Capitulino - Aug. 27, 2010, 7:24 p.m.
On Fri, 27 Aug 2010 14:02:45 -0500
Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 08/27/2010 11:08 AM, Luiz Capitulino wrote:
> >> It's trying to plug a sieve with a band-aid.  It's certainly an
> >> "improvement" but it's of question utility looking at the bigger picture.
> >>      
> > Are you talking about the testing namespace idea? It doesn't have anything
> > to do with balloon or how our interfaces are built. It would be just a way
> > to play with commands that has been converted to QMP but are not available
> > because they're not stable yet (eg. Jan's device_show).
> >    
> 
> My point is that we shouldn't build any QMP APIs and we definitely 
> shouldn't try to QMP-ize monitor commands.
> 
> Instead, we should design logical C APIs that we could consume within 
> QEMU that we think we can support long term and then expose those over QMP.

Comments on the Python comment below.

> Having a sandbox doesn't really solve the fundamental problem of making 
> sure the interface is consumable.

It doesn't and it shouldn't. It's just a way to test commands that might
not be stable yet. A very minor feature, anyway.

> >> Balloon is a perfect example of where what we really need to do is build
> >> interface interfaces that make sense, and then expose them in QMP.
> >>      
> > Main question is: can we drop the stats the way they are today to do the
> > Right Thing for 0.14 or not?
> >    
> 
> I don't see how 0.13.0 is going to get releases with anything but the 
> current behavior.  It's unfortunate but we're too delayed and can't 
> afford a change like this this late in the game.
> 
> In terms of the stable branch, the least disruptive thing would be a 
> timeout.

Okay.

> > I think we have agreed on the internal interfaces approach. My only
> > concern is whether this will conflict when extending the wire protocol
> > (eg. adding new arguments to existing commands). Not a problem if the
> > C API is not stable, of course.
> >    
> 
> We don't do that.  It's a recipe for disaster.  QEMU isn't written in 
> Python and if we try to module our interfaces are if we were a Python 
> library, we're destined to fail.

You mean we don't do simple protocol extensions?

So, if we need an new argument to an existing command, we add a new
command instead? Just because QEMU is not written in Python?

> >> What's a reasonable C-level API to query statistics that potentially may
> >> never return?  Building in a timeout is something of a crappy API
> >> because it puts policy deep in the API that is trivial to implement
> >> elsewhere.  What you'd probably do is something like:
> >>
> >> BalloonStatsRequest *query_guest_balloon_stats(CompletionCallback *cb,
> >> void *opaque);
> >> int cancel_guest_balloon_stats(BalloonStatsRequest *req);
> >>      
> > Shouldn't the API provide a general cancel method? All functions that
> > talk to the guest will need one.
> >    
> 
> See next proposal.  There's no cancel but I'd argue it's not needed.  
> You don't care if the request succeeds or fails so there's no point in 
> cancelling it.  Cancellation only works best when a request has a 
> discrete life cycle but in the case of requesting a guest to update 
> stats, there is not really a well define dstart and end.
> 
> Regards,
> 
> Anthony Liguori
> 
> >> void release_guest_balloon_stats(BalloonStatsRequest *req);
> >>
> >> Regards,
> >>
> >> Anthony Liguori
> >>
> >>      
> >>>>
> >>>>          
> >>>>> Beyond fixing that regression, I agree that this command is terminally
> >>>>> flawed&   we need to deprecate it&   provide better specified new
> >>>>> replacement(s). This seems like 0.14 work to me though.
> >>>>>
> >>>>>            
> >>>> Yup.
> >>>>
> >>>>
> >>>>          
> >>>>> Regards,
> >>>>> Daniel
> >>>>>
> >>>>> [1] I know that they could already suffer if there was a bug in qemu
> >>>>>       that prevented it responding, even if the guest was not being
> >>>>>       malicious/crashed.
> >>>>>
> >>>>>            
> >>>>
> >>>>          
> >>>
> >>>        
> >>      
> >    
>
Anthony Liguori - Aug. 27, 2010, 7:37 p.m.
On 08/27/2010 02:24 PM, Luiz Capitulino wrote:
>
>> I don't see how 0.13.0 is going to get releases with anything but the
>> current behavior.  It's unfortunate but we're too delayed and can't
>> afford a change like this this late in the game.
>>
>> In terms of the stable branch, the least disruptive thing would be a
>> timeout.
>>      
> Okay.
>
>    
>>> I think we have agreed on the internal interfaces approach. My only
>>> concern is whether this will conflict when extending the wire protocol
>>> (eg. adding new arguments to existing commands). Not a problem if the
>>> C API is not stable, of course.
>>>
>>>        
>> We don't do that.  It's a recipe for disaster.  QEMU isn't written in
>> Python and if we try to module our interfaces are if we were a Python
>> library, we're destined to fail.
>>      
> You mean we don't do simple protocol extensions?
>
> So, if we need an new argument to an existing command, we add a new
> command instead? Just because QEMU is not written in Python?
>    

Because it's too easy to get it wrong in QEMU.  Here's the rationale.

If I can't trivially call a QMP function in C, then I'm not going to use 
QMP functions within QEMU.  I'm not going to create an embedded JSON 
string just to call a function with three integer arguments.

Yes, if we need to do that, we can create a C API that both the QMP 
interface uses and we also use internally but why?  All that does is 
introduce the chance that the C API will have more features than the QMP 
interface.

If we don't use these functions in QEMU, then how do we know that these 
functions have reasonable semantics?  This is exactly the problem we 
suffer today.  We have internal APIs that do reasonable things but 
everything that deals with QMP is a special case.  That creates too many 
opportunities to get things wrong.

I think it's a vitally important requirement that all future QMP 
functions have direct mappings to a C interface.  The long term goal 
should be for that interface to be used by all of the command line 
arguments, SDL, and the human monitor.  If those things only relied on a 
single API and we exposed that API via QMP, than we would have an 
extremely useful interface.

Regards,

Anthony Liguori
Luiz Capitulino - Aug. 27, 2010, 8:58 p.m.
On Fri, 27 Aug 2010 14:37:54 -0500
Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 08/27/2010 02:24 PM, Luiz Capitulino wrote:
> >
> >> I don't see how 0.13.0 is going to get releases with anything but the
> >> current behavior.  It's unfortunate but we're too delayed and can't
> >> afford a change like this this late in the game.
> >>
> >> In terms of the stable branch, the least disruptive thing would be a
> >> timeout.
> >>      
> > Okay.
> >
> >    
> >>> I think we have agreed on the internal interfaces approach. My only
> >>> concern is whether this will conflict when extending the wire protocol
> >>> (eg. adding new arguments to existing commands). Not a problem if the
> >>> C API is not stable, of course.
> >>>
> >>>        
> >> We don't do that.  It's a recipe for disaster.  QEMU isn't written in
> >> Python and if we try to module our interfaces are if we were a Python
> >> library, we're destined to fail.
> >>      
> > You mean we don't do simple protocol extensions?
> >
> > So, if we need an new argument to an existing command, we add a new
> > command instead? Just because QEMU is not written in Python?
> >    
> 
> Because it's too easy to get it wrong in QEMU.  Here's the rationale.
> 
> If I can't trivially call a QMP function in C, then I'm not going to use 
> QMP functions within QEMU.  I'm not going to create an embedded JSON 
> string just to call a function with three integer arguments.

... And as a QMP client I don't see the point in having to use a new
command just because a new argument was needed. The language QEMU is
written is also unimportant. The wire format is all I see and it's
language independent.

> Yes, if we need to do that, we can create a C API that both the QMP 
> interface uses and we also use internally but why?  All that does is 
> introduce the chance that the C API will have more features than the QMP 
> interface.

Why? I mean, what would stop us on extending QMP when the C interface is
also extended? Examples?

At first, this seems a good middle ground to me: QMP is implemented on
top of the C API and the JSON stuff is limited to QMP.

Of course that C API functions returning structured data will have to
create and return qobjects. But that's needed even with the direct
mapping.

> If we don't use these functions in QEMU, then how do we know that these 
> functions have reasonable semantics?  This is exactly the problem we 
> suffer today.  We have internal APIs that do reasonable things but 
> everything that deals with QMP is a special case.  That creates too many 
> opportunities to get things wrong.
> 
> I think it's a vitally important requirement that all future QMP 
> functions have direct mappings to a C interface.  The long term goal 
> should be for that interface to be used by all of the command line 
> arguments, SDL, and the human monitor.  If those things only relied on a 
> single API and we exposed that API via QMP, than we would have an 
> extremely useful interface.

I agree, I just think an indirect mapping would be more beneficial to
QMP clients.
Amit Shah - Aug. 28, 2010, 12:52 a.m.
On (Fri) Aug 27 2010 [07:39:37], Anthony Liguori wrote:
> >
> >NACK. It has always been allowed&  valid to call query-balloon
> >to get the current balloon level. We must not throw an error
> >just because the recently added mem stats can't be refreshed.
> 
> I think that's a fair comment but why even bother fixing the
> command.  Let's introduce a new command that just gets a single
> piece of information instead of having a command return lots of
> information.

Instead, let's have query-balloon do the same thing as it did in 0.12
and add a new command for 0.13 (query-balloon-stats) that does the stats
with timeout?

That way we won't have regressions from 0.12 and have the new behaviour
only for 0.13, which we can say is 'experimental'.

		Amit
Markus Armbruster - Aug. 30, 2010, 8:30 a.m.
Amit Shah <amit.shah@redhat.com> writes:

> On (Fri) Aug 27 2010 [07:39:37], Anthony Liguori wrote:
>> >
>> >NACK. It has always been allowed&  valid to call query-balloon
>> >to get the current balloon level. We must not throw an error
>> >just because the recently added mem stats can't be refreshed.
>> 
>> I think that's a fair comment but why even bother fixing the
>> command.  Let's introduce a new command that just gets a single
>> piece of information instead of having a command return lots of
>> information.
>
> Instead, let's have query-balloon do the same thing as it did in 0.12
> and add a new command for 0.13 (query-balloon-stats) that does the stats
> with timeout?
>
> That way we won't have regressions from 0.12 and have the new behaviour
> only for 0.13, which we can say is 'experimental'.

Sounds sensible to me.  Not too late for fixing 0.13?  Anthony?
Anthony Liguori - Aug. 30, 2010, 1:06 p.m.
On 08/30/2010 03:30 AM, Markus Armbruster wrote:
> Amit Shah<amit.shah@redhat.com>  writes:
>
>    
>> On (Fri) Aug 27 2010 [07:39:37], Anthony Liguori wrote:
>>      
>>>> NACK. It has always been allowed&   valid to call query-balloon
>>>> to get the current balloon level. We must not throw an error
>>>> just because the recently added mem stats can't be refreshed.
>>>>          
>>> I think that's a fair comment but why even bother fixing the
>>> command.  Let's introduce a new command that just gets a single
>>> piece of information instead of having a command return lots of
>>> information.
>>>        
>> Instead, let's have query-balloon do the same thing as it did in 0.12
>> and add a new command for 0.13 (query-balloon-stats) that does the stats
>> with timeout?
>>
>> That way we won't have regressions from 0.12 and have the new behaviour
>> only for 0.13, which we can say is 'experimental'.
>>      
> Sounds sensible to me.  Not too late for fixing 0.13?  Anthony?
>    

No new commands for 0.13.  I think the only option is to revert the 
query-balloon behavior.  A new command is too risky as -rc1 is imminent.

Regards,

Anthony Liguori
Markus Armbruster - Aug. 30, 2010, 2:52 p.m.
Anthony Liguori <anthony@codemonkey.ws> writes:

> On 08/27/2010 02:24 PM, Luiz Capitulino wrote:
[...]
>>>> I think we have agreed on the internal interfaces approach. My only
>>>> concern is whether this will conflict when extending the wire protocol
>>>> (eg. adding new arguments to existing commands). Not a problem if the
>>>> C API is not stable, of course.
>>>>
>>>>        
>>> We don't do that.  It's a recipe for disaster.  QEMU isn't written in
>>> Python and if we try to module our interfaces are if we were a Python
>>> library, we're destined to fail.
>>>      
>> You mean we don't do simple protocol extensions?
>>
>> So, if we need an new argument to an existing command, we add a new
>> command instead? Just because QEMU is not written in Python?
>>    
>
> Because it's too easy to get it wrong in QEMU.  Here's the rationale.
>
> If I can't trivially call a QMP function in C, then I'm not going to
> use QMP functions within QEMU.  I'm not going to create an embedded
> JSON string just to call a function with three integer arguments.

Yes, an internal interface is better done in idiomatic C, not with JSON
strings.

> Yes, if we need to do that, we can create a C API that both the QMP
> interface uses and we also use internally but why?  All that does is
> introduce the chance that the C API will have more features than the
> QMP interface.

Why is that bad?

Internal and external interfaces have very different tradeoffs.

An internal interface should eschew backward compatibility and embrace
change.

An external interface needs to be stable, yet extensible.

It's therefore advisable to separate the two.  Otherwise the internal
interface gets bogged down with undue compatibility considerations
(backward & forward), or the external interface suffers unnecessary
churn.

When we designed QMP, we took special care to make it support compatible
evolution.  We consciously made it a proper protocol, not RPC to
internal C interfaces.  Are you proposing we go back to square one and
reargue the basics of QMP?

> If we don't use these functions in QEMU, then how do we know that
> these functions have reasonable semantics?  This is exactly the
> problem we suffer today.  We have internal APIs that do reasonable
> things but everything that deals with QMP is a special case.  That
> creates too many opportunities to get things wrong.

No, the problem we suffer today is that we didn't design the external
interface properly above the level of basic protocol.  We took a
shortcut and converted existing monitor commands.  We've since
discovered we don't like that approach.

Instead of giving up on protocol design without even trying and just
expose whatever internal interfaces strike us as useful via RPC, let's
design the external interface properly.

> I think it's a vitally important requirement that all future QMP
> functions have direct mappings to a C interface.

Why?

>                                                   The long term goal
> should be for that interface to be used by all of the command line
> arguments, SDL, and the human monitor.  If those things only relied on
> a single API and we exposed that API via QMP, than we would have an
> extremely useful interface.

Yes, command line, human monitor and QMP should use internal interfaces
to do the real work, thus separate the real work neatly from
interface-specific stuff like parsing text.

No, that doesn't mean we should expose internal interfaces via RPC.
Markus Armbruster - Aug. 30, 2010, 3:01 p.m.
Anthony Liguori <anthony@codemonkey.ws> writes:

> On 08/30/2010 03:30 AM, Markus Armbruster wrote:
>> Amit Shah<amit.shah@redhat.com>  writes:
>>
>>    
>>> On (Fri) Aug 27 2010 [07:39:37], Anthony Liguori wrote:
>>>      
>>>>> NACK. It has always been allowed&   valid to call query-balloon
>>>>> to get the current balloon level. We must not throw an error
>>>>> just because the recently added mem stats can't be refreshed.
>>>>>          
>>>> I think that's a fair comment but why even bother fixing the
>>>> command.  Let's introduce a new command that just gets a single
>>>> piece of information instead of having a command return lots of
>>>> information.
>>>>        
>>> Instead, let's have query-balloon do the same thing as it did in 0.12
>>> and add a new command for 0.13 (query-balloon-stats) that does the stats
>>> with timeout?
>>>
>>> That way we won't have regressions from 0.12 and have the new behaviour
>>> only for 0.13, which we can say is 'experimental'.
>>>      
>> Sounds sensible to me.  Not too late for fixing 0.13?  Anthony?
>>    
>
> No new commands for 0.13.  I think the only option is to revert the
> query-balloon behavior.  A new command is too risky as -rc1 is
> imminent.

Anyone care to submit a patch?
Anthony Liguori - Aug. 30, 2010, 3:28 p.m.
On 08/30/2010 09:52 AM, Markus Armbruster wrote:
>> Because it's too easy to get it wrong in QEMU.  Here's the rationale.
>>
>> If I can't trivially call a QMP function in C, then I'm not going to
>> use QMP functions within QEMU.  I'm not going to create an embedded
>> JSON string just to call a function with three integer arguments.
>>      
> Yes, an internal interface is better done in idiomatic C, not with JSON
> strings.
>
>    
>> Yes, if we need to do that, we can create a C API that both the QMP
>> interface uses and we also use internally but why?  All that does is
>> introduce the chance that the C API will have more features than the
>> QMP interface.
>>      
> Why is that bad?
>
> Internal and external interfaces have very different tradeoffs.
>
> An internal interface should eschew backward compatibility and embrace
> change.
>
> An external interface needs to be stable, yet extensible.
>    

Nope.

The internal interface should be the external interface.

Otherwise, the external interface is going to rot.

> It's therefore advisable to separate the two.  Otherwise the internal
> interface gets bogged down with undue compatibility considerations
> (backward&  forward), or the external interface suffers unnecessary
> churn.
>
> When we designed QMP, we took special care to make it support compatible
> evolution.  We consciously made it a proper protocol, not RPC to
> internal C interfaces.  Are you proposing we go back to square one and
> reargue the basics of QMP?
>    

All the pretty JSON interfaces don't matter if we're not exposing a 
useful API.

We're trying to do too much.  We've been more or less completing 
ignoring the problem of creating a useful API for users to consume.

We need to simplify.  We simplify by reducing scope.  Of the things that 
are important, a useful API is more important than whether it maps to 
your favorite dynamic language in an elegant way.

> No, the problem we suffer today is that we didn't design the external
> interface properly above the level of basic protocol.  We took a
> shortcut and converted existing monitor commands.  We've since
> discovered we don't like that approach.
>
> Instead of giving up on protocol design without even trying and just
> expose whatever internal interfaces strike us as useful via RPC, let's
> design the external interface properly.
>    

What does that even mean?  How do you describe the external interface 
properly?

It's a hell of a lot simpler to design the external interface as a C 
API, and then to implement the external interface as a C API.  Why make 
life harder than that?

>> I think it's a vitally important requirement that all future QMP
>> functions have direct mappings to a C interface.
>>      
> Why?
>    

So that we can consume those APIs within QEMU.

>>                                                    The long term goal
>> should be for that interface to be used by all of the command line
>> arguments, SDL, and the human monitor.  If those things only relied on
>> a single API and we exposed that API via QMP, than we would have an
>> extremely useful interface.
>>      
> Yes, command line, human monitor and QMP should use internal interfaces
> to do the real work, thus separate the real work neatly from
> interface-specific stuff like parsing text.
>
> No, that doesn't mean we should expose internal interfaces via RPC.
>    

Having two interfaces guarantees failure.  What's the separation between 
internal and external?  Is qdev internal or external?

Regards,

Anthony Liguori
Anthony Liguori - Aug. 30, 2010, 3:38 p.m.
On 08/30/2010 10:28 AM, Anthony Liguori wrote:
> Having two interfaces guarantees failure.  What's the separation 
> between internal and external?  Is qdev internal or external?

Let me put it another way, compatibility cannot be an after thought.

We need to think deeply about compatibility when we design our 
interfaces and we're going to have to redesign interfaces with 
compatibility in mind.  It's a hard problem but it's solvable.  Just 
defaulting arguments in QMP doesn't do anything to improve compatibility.

We cannot radically change our internal implementations and expect to 
bridge it all in some special sauce code somewhere.

This also suggests that we're going to have to practice deprecation to 
evolve our APIs in a reasonable fashion.

Regards,

Anthony Liguori

> Regards,
>
> Anthony Liguori
Luiz Capitulino - Aug. 30, 2010, 4:16 p.m.
On Mon, 30 Aug 2010 10:38:45 -0500
Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 08/30/2010 10:28 AM, Anthony Liguori wrote:
> > Having two interfaces guarantees failure.  What's the separation 
> > between internal and external?  Is qdev internal or external?
> 
> Let me put it another way, compatibility cannot be an after thought.
> 
> We need to think deeply about compatibility when we design our 
> interfaces and we're going to have to redesign interfaces with 
> compatibility in mind.  It's a hard problem but it's solvable.  Just 
> defaulting arguments in QMP doesn't do anything to improve compatibility.

The point is: C compat sucks, QMP's doesn't. QMP will suck too if we direct
map the two.

You seem to think it's worth it, we don't. How do we solve that?

> We cannot radically change our internal implementations and expect to 
> bridge it all in some special sauce code somewhere.
> 
> This also suggests that we're going to have to practice deprecation to 
> evolve our APIs in a reasonable fashion.

Deprecation should be mostly used for bad defined commands, not for simple
extensions.
Anthony Liguori - Aug. 30, 2010, 4:26 p.m.
On 08/30/2010 11:16 AM, Luiz Capitulino wrote:
> On Mon, 30 Aug 2010 10:38:45 -0500
> Anthony Liguori<anthony@codemonkey.ws>  wrote:
>
>    
>> On 08/30/2010 10:28 AM, Anthony Liguori wrote:
>>      
>>> Having two interfaces guarantees failure.  What's the separation
>>> between internal and external?  Is qdev internal or external?
>>>        
>> Let me put it another way, compatibility cannot be an after thought.
>>
>> We need to think deeply about compatibility when we design our
>> interfaces and we're going to have to redesign interfaces with
>> compatibility in mind.  It's a hard problem but it's solvable.  Just
>> defaulting arguments in QMP doesn't do anything to improve compatibility.
>>      
> The point is: C compat sucks, QMP's doesn't. QMP will suck too if we direct
> map the two.
>    

Prove it because I don't believe you :-)

Here's a counter example.  Let's say we have function do_something that 
takes three arguments.  We add a forth additional argument and default 
it.  And old client can call with three arguments and it just works 
because the default is "do what we used to do".

How do we discover if this new version of do_something supports four 
arguments instead of three?  Do we ignore unknown arguments?  We can't 
because that fourth argument may be important (but guess what, we do 
today and that's busted).  Because I might be passing "do something new" 
as the fourth argument because I expressly didn't want the old behavior.

Okay, so now we have to add a mechanism to enumerate function names and 
their arguments.  Now I can detect the fourth argument.  What if the 
argument is a structure and we want to add a new field to the structure?

How do I know, as a client, that I can add another field to the 
structure and that field will be respected?

You'd end up writing a full blown schema and then enforcing the schema.  
As a client, you'd have to download the schema and then try to detect 
the features a command supported.  That's an absolutely awful compat 
interface IMHO especially if you have different downstream schemas.

On the other hand, introducing a new function that takes a fourth 
argument, or a new type of structure with an added argument simplifies 
the problem tremendously.  It means we might have four versions of the 
same function but that's unavoidable.  Flattening this for the client 
makes their lives much simpler.

>> We cannot radically change our internal implementations and expect to
>> bridge it all in some special sauce code somewhere.
>>
>> This also suggests that we're going to have to practice deprecation to
>> evolve our APIs in a reasonable fashion.
>>      
> Deprecation should be mostly used for bad defined commands, not for simple
> extensions.
>    

It's better to talk about concrete things so please give an example of 
how you can provide compatibility in QMP that you can't do in C and we 
can discuss it.

My position is that we aren't any closer to having compatible APIs then 
we were with the human monitor.  I think we need to focus on 
compatibility and that that has to be solved as the QEMU interface 
level.  I contend that it's not solvable at the QMP level.

Regards,

Anthony Liguori

Regards,

Anthony Liguori
Markus Armbruster - Aug. 31, 2010, 8:47 a.m.
Anthony Liguori <anthony@codemonkey.ws> writes:

> On 08/30/2010 09:52 AM, Markus Armbruster wrote:
>>> Because it's too easy to get it wrong in QEMU.  Here's the rationale.
>>>
>>> If I can't trivially call a QMP function in C, then I'm not going to
>>> use QMP functions within QEMU.  I'm not going to create an embedded
>>> JSON string just to call a function with three integer arguments.
>>>      
>> Yes, an internal interface is better done in idiomatic C, not with JSON
>> strings.
>>
>>    
>>> Yes, if we need to do that, we can create a C API that both the QMP
>>> interface uses and we also use internally but why?  All that does is
>>> introduce the chance that the C API will have more features than the
>>> QMP interface.
>>>      
>> Why is that bad?
>>
>> Internal and external interfaces have very different tradeoffs.
>>
>> An internal interface should eschew backward compatibility and embrace
>> change.
>>
>> An external interface needs to be stable, yet extensible.
>>    
>
> Nope.

Which part(s) do you disagree with?

(1) Internal and external interfaces have very different tradeoffs.

(2) An internal interface should eschew backward compatibility and
    embrace change.

(3) An external interface needs to be stable, yet extensible.

If none of the above, then how do you propose to resolve the tension
between external and internal interfaces?

> The internal interface should be the external interface.
>
> Otherwise, the external interface is going to rot.
>
>> It's therefore advisable to separate the two.  Otherwise the internal
>> interface gets bogged down with undue compatibility considerations
>> (backward&  forward), or the external interface suffers unnecessary
>> churn.
>>
>> When we designed QMP, we took special care to make it support compatible
>> evolution.  We consciously made it a proper protocol, not RPC to
>> internal C interfaces.  Are you proposing we go back to square one and
>> reargue the basics of QMP?
>>    
>
> All the pretty JSON interfaces don't matter if we're not exposing a
> useful API.
>
> We're trying to do too much.  We've been more or less completing
> ignoring the problem of creating a useful API for users to consume.
>
> We need to simplify.  We simplify by reducing scope.  Of the things
> that are important, a useful API is more important than whether it
> maps to your favorite dynamic language in an elegant way.

So you do propose we go back to square one and reargue the basics of
QMP.

>> No, the problem we suffer today is that we didn't design the external
>> interface properly above the level of basic protocol.  We took a
>> shortcut and converted existing monitor commands.  We've since
>> discovered we don't like that approach.
>>
>> Instead of giving up on protocol design without even trying and just
>> expose whatever internal interfaces strike us as useful via RPC, let's
>> design the external interface properly.
>>    
>
> What does that even mean?  How do you describe the external interface
> properly?
>
> It's a hell of a lot simpler to design the external interface as a C
> API, and then to implement the external interface as a C API.  Why
> make life harder than that?

You define the external interace the same how you define any interface:
you specify operations, arguments, returns and so forth.

C declarations are not a specification.  For instance, a C prototype is
only a part of a function's specification.  It says too little about
possible values, nothing about error conditions, semantics and so forth.

There are "a few" successful internet protocols that aren't RPC to a C
interface.

>>> I think it's a vitally important requirement that all future QMP
>>> functions have direct mappings to a C interface.
>>>      
>> Why?
>
> So that we can consume those APIs within QEMU.

QMP functions should consume the internal APIs.

A QMP function is concerned with interfacing and compatibility, and
leaves the actual work to the internal APIs.  Separation of concerns.

>>>                                                    The long term goal
>>> should be for that interface to be used by all of the command line
>>> arguments, SDL, and the human monitor.  If those things only relied on
>>> a single API and we exposed that API via QMP, than we would have an
>>> extremely useful interface.
>>>      
>> Yes, command line, human monitor and QMP should use internal interfaces
>> to do the real work, thus separate the real work neatly from
>> interface-specific stuff like parsing text.
>>
>> No, that doesn't mean we should expose internal interfaces via RPC.
>>    
>
> Having two interfaces guarantees failure.

The kernel tries hard to separate external and internal interfaces.
It's been failing quite successfully.

>                                            What's the separation
> between internal and external?  Is qdev internal or external?

qdev is the device model, i.e. a generic interface to devices.  qdev.h
is an internal interface.  It has a part facing generic code: the
generic interface to devices.  And it has a part facing devices: the
abstract interface the devices need to implement, plus stuff to help
with that.

The external interface is layered onto the internal interface.  -device
/ device_add expose a selected part of qdev externally.  info qtree
exposes some more.

We could do a better job specifying either of the interfaces.  But
removing the distinction between internal and external interface doesn't
improve the specification of the external interface one bit.

The internal interface is much richer.  It contains things that are of
no interest remotely, including but not limited to the part facing
devices.

It's designed for local, not remote use.  In particular, it makes
liberal use of pointers.  How would you handle those over RPC?

It can be changed easily, because the impact is limited to QEMU itself.
We don't do compatibility there, we just change it and fix up the users.
Markus Armbruster - Aug. 31, 2010, 12:48 p.m.
Anthony Liguori <anthony@codemonkey.ws> writes:

[...]
> My position is that we aren't any closer to having compatible APIs
> then we were with the human monitor.  I think we need to focus on
> compatibility and that that has to be solved as the QEMU interface
> level.  I contend that it's not solvable at the QMP level.

We've argued from day 0 every step along the way.  And here we are, one
year later, still arguing about the very basics.

There's a fundamental disagreement.  I want to keep QMP the way it was
designed: supporting compatible evolution.  You want to remake it from
the ground up as RPC to internal C interfaces.

It seems exceedingly unlikely to me that we can agree on the wisdom of
such a remake.  We can repeat and elaborate on our arguments for a
while, but let's face it: we want different things.

I'm afraid I can't build you the thing you want.  The best I can offer
is to step out of the way and let you build it.
Luiz Capitulino - Aug. 31, 2010, 12:58 p.m.
On Tue, 31 Aug 2010 14:48:51 +0200
Markus Armbruster <armbru@redhat.com> wrote:

> Anthony Liguori <anthony@codemonkey.ws> writes:
> 
> [...]
> > My position is that we aren't any closer to having compatible APIs
> > then we were with the human monitor.  I think we need to focus on
> > compatibility and that that has to be solved as the QEMU interface
> > level.  I contend that it's not solvable at the QMP level.
> 
> We've argued from day 0 every step along the way.  And here we are, one
> year later, still arguing about the very basics.
> 
> There's a fundamental disagreement.  I want to keep QMP the way it was
> designed: supporting compatible evolution.  You want to remake it from
> the ground up as RPC to internal C interfaces.
> 
> It seems exceedingly unlikely to me that we can agree on the wisdom of
> such a remake.  We can repeat and elaborate on our arguments for a
> while, but let's face it: we want different things.

Yes, that's the fundamental problem here.

> I'm afraid I can't build you the thing you want.  The best I can offer
> is to step out of the way and let you build it.
>
Anthony Liguori - Aug. 31, 2010, 1:03 p.m.
On 08/31/2010 03:47 AM, Markus Armbruster wrote:
> Anthony Liguori<anthony@codemonkey.ws>  writes:
>
>    
>> On 08/30/2010 09:52 AM, Markus Armbruster wrote:
>>      
>>>> Because it's too easy to get it wrong in QEMU.  Here's the rationale.
>>>>
>>>> If I can't trivially call a QMP function in C, then I'm not going to
>>>> use QMP functions within QEMU.  I'm not going to create an embedded
>>>> JSON string just to call a function with three integer arguments.
>>>>
>>>>          
>>> Yes, an internal interface is better done in idiomatic C, not with JSON
>>> strings.
>>>
>>>
>>>        
>>>> Yes, if we need to do that, we can create a C API that both the QMP
>>>> interface uses and we also use internally but why?  All that does is
>>>> introduce the chance that the C API will have more features than the
>>>> QMP interface.
>>>>
>>>>          
>>> Why is that bad?
>>>
>>> Internal and external interfaces have very different tradeoffs.
>>>
>>> An internal interface should eschew backward compatibility and embrace
>>> change.
>>>
>>> An external interface needs to be stable, yet extensible.
>>>
>>>        
>> Nope.
>>      
> Which part(s) do you disagree with?
>
> (1) Internal and external interfaces have very different tradeoffs.
>
> (2) An internal interface should eschew backward compatibility and
>      embrace change.
>
> (3) An external interface needs to be stable, yet extensible.
>
> If none of the above, then how do you propose to resolve the tension
> between external and internal interfaces?
>    

What's the different between an internal an external interface?

You're "external" interface at some point needs code in QEMU that 
implements the interface.  Why can't that interface be used by another 
part of QEMU?  Why is it then not an internal interface?

A big part of my argument here, is that we should be eating our own dog 
food with respect to QMP.We need to simplify. We simplify by reducing 
scope. Of the things

>> that are important, a useful API is more important than whether it
>> maps to your favorite dynamic language in an elegant way.
>>      
> So you do propose we go back to square one and reargue the basics of
> QMP.
>    

I don't even know what you mean by that.  Very, very, concretely, I'm 
suggesting that we:

1) Declare what we have supported

2) Design useful interfaces in C that we can consume internally (see 
above comment about eating our dog food)

3) Bridge those useful interfaces via QMP.  Ideally, we would make the 
QMP code dead simple such that it could be generated.  If ya'll want to 
pretty-ify it by making things have default values or whatever type of 
syntactic sugar you want, that's fine.

4) I'm specifically suggesting that we don't start at the QMP level when 
defining new interfaces.

>> What does that even mean? How do you describe the external interface
>> properly?
>>
>> It's a hell of a lot simpler to design the external interface as a C
>> API, and then to implement the external interface as a C API.  Why
>> make life harder than that?
>>      
> You define the external interace the same how you define any interface:
> you specify operations, arguments, returns and so forth.
>
> C declarations are not a specification.  For instance, a C prototype is
> only a part of a function's specification.  It says too little about
> possible values, nothing about error conditions, semantics and so forth.
>    

Well here's how I've attempted to address it:

http://repo.or.cz/w/qemu/aliguori-queue.git/blob/refs/heads/qpi:/qpi-migration.h

We need our internal interfaces to be just as well specified as we need 
QMP to.

>>>> I think it's a vitally important requirement that all future QMP
>>>> functions have direct mappings to a C interface.
>>>>
>>>>          
>>> Why?
>>>        
>> So that we can consume those APIs within QEMU.
>>      
> QMP functions should consume the internal APIs.
>
> A QMP function is concerned with interfacing and compatibility, and
> leaves the actual work to the internal APIs.  Separation of concerns.
>    

I don't think you can separate the two.  I'd be happy to be proven wrong 
but so far, such an existence proof doesn't exist.

>>                                             What's the separation
>> between internal and external?  Is qdev internal or external?
>>      
> qdev is the device model, i.e. a generic interface to devices.  qdev.h
> is an internal interface.  It has a part facing generic code: the
> generic interface to devices.  And it has a part facing devices: the
> abstract interface the devices need to implement, plus stuff to help
> with that.
>
> The external interface is layered onto the internal interface.  -device
> / device_add expose a selected part of qdev externally.  info qtree
> exposes some more.
>    

device_add is intimately tied into qdev.  If you don't believe me, look 
at 
http://repo.or.cz/w/qemu/aliguori.git/shortlog/refs/heads/features/qdev-refactor 
to see how much work it took to separate the two.

Concretely, I'm looking at doing some significant modelling changes.  To 
consider virtio, we have virtio-net-pci today but I'd like to move to 
having a virtio-pci and adding virtio-net to virtio-pci.  This would 
totally change the -device interface and break backwards compatibility.

So how am I going to address that?  I don't know yet but I'm come to 
believe that -device as an interface is going to be a PITA to maintain 
and that we should have thought much harder about compatibility.

> We could do a better job specifying either of the interfaces.  But
> removing the distinction between internal and external interface doesn't
> improve the specification of the external interface one bit.
>
> The internal interface is much richer.  It contains things that are of
> no interest remotely, including but not limited to the part facing
> devices.
>    

Really, you're argument boils down to: if we separate 'internal' and 
'external' interfaces, we can totally ignore compatibility in the 
'internal' interface and implement compatibility solely in the 
'external' part of the interface.

I'm calling B.S. on that.  Compatibility is a hard problem and we need 
to think about it when introducing any interfaces.

> It's designed for local, not remote use.  In particular, it makes
> liberal use of pointers.  How would you handle those over RPC?
>    

A boxed type.  We discussed this as KVM Forum.  It would look like:

QBoxed *qboxed_new(const char *type, void *pointer);
void *qboxed_get_ptr(QBoxed *obj);

It marshals to a token.  Internally, there's a static global map that 
associates a tuple of (type, token) with pointer.

A QPI operation like:

MigrationState *qpi_migration_create(Error **err);

Maps to QMP like:

string: token qmp_migration_create(void)

Or we could use a special QMP type for handles, it's really just a detail.

Regards,

Anthony Liguori

> It can be changed easily, because the impact is limited to QEMU itself.
> We don't do compatibility there, we just change it and fix up the users.
>
Anthony Liguori - Aug. 31, 2010, 1:05 p.m.
On 08/31/2010 07:58 AM, Luiz Capitulino wrote:
> On Tue, 31 Aug 2010 14:48:51 +0200
> Markus Armbruster<armbru@redhat.com>  wrote:
>
>    
>> Anthony Liguori<anthony@codemonkey.ws>  writes:
>>
>> [...]
>>      
>>> My position is that we aren't any closer to having compatible APIs
>>> then we were with the human monitor.  I think we need to focus on
>>> compatibility and that that has to be solved as the QEMU interface
>>> level.  I contend that it's not solvable at the QMP level.
>>>        
>> We've argued from day 0 every step along the way.  And here we are, one
>> year later, still arguing about the very basics.
>>
>> There's a fundamental disagreement.  I want to keep QMP the way it was
>> designed: supporting compatible evolution.  You want to remake it from
>> the ground up as RPC to internal C interfaces.
>>
>> It seems exceedingly unlikely to me that we can agree on the wisdom of
>> such a remake.  We can repeat and elaborate on our arguments for a
>> while, but let's face it: we want different things.
>>      
> Yes, that's the fundamental problem here.
>    

Okay, so what's the path forward?

I've proposed something that I think can get us out of the rut that 
we're in.  I've got code to support that.

Ignoring what I've proposed, what do we do differently?

Regards,

Anthony Liguori

>> I'm afraid I can't build you the thing you want.  The best I can offer
>> is to step out of the way and let you build it.
>>
>>      
>

Patch

diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c
index d6c66cf..309c343 100644
--- a/hw/virtio-balloon.c
+++ b/hw/virtio-balloon.c
@@ -140,6 +140,7 @@  static void complete_stats_request(VirtIOBalloon *vb)
 
 static void show_old_stats(void *opaque)
 {
+    qerror_report(QERR_MACHINE_STOPPED);
     complete_stats_request(opaque);
 }
 
diff --git a/qerror.c b/qerror.c
index 0af3ab3..b7a9f7f 100644
--- a/qerror.c
+++ b/qerror.c
@@ -141,6 +141,10 @@  static const QErrorStringTable qerror_table[] = {
         .desc      = "Using KVM without %(capability), %(feature) unavailable",
     },
     {
+        .error_fmt = QERR_MACHINE_STOPPED,
+        .desc      = "The machine is stopped or the guest is taking too long to respond",
+    },
+    {
         .error_fmt = QERR_MIGRATION_EXPECTED,
         .desc      = "An incoming migration is expected before this command can be executed",
     },
diff --git a/qerror.h b/qerror.h
index 62802ea..470577a 100644
--- a/qerror.h
+++ b/qerror.h
@@ -121,6 +121,9 @@  QError *qobject_to_qerror(const QObject *obj);
 #define QERR_KVM_MISSING_CAP \
     "{ 'class': 'KVMMissingCap', 'data': { 'capability': %s, 'feature': %s } }"
 
+#define QERR_MACHINE_STOPPED \
+    "{ 'class': 'MachineStopped', 'data': {} }"
+
 #define QERR_MIGRATION_EXPECTED \
     "{ 'class': 'MigrationExpected', 'data': {} }"