diff mbox

[v8,04/17] qapi-introspect: Guarantee particular sorting

Message ID 1446052473-19170-5-git-send-email-eblake@redhat.com
State New
Headers show

Commit Message

Eric Blake Oct. 28, 2015, 5:14 p.m. UTC
Sorting the values of an enum makes it easier to look up whether
a particular value is present, via binary rather than linear search
(probably most visible with QKeyCode, which has grown over
several releases).  Additionally, QMP clients need not know which
C value is associated with an enum name, so sorting is an
effective way to hide that non-ABI aspect of qapi.

While we are at it, it is also easy to sort the members and
variants of objects, to allow for a similar binary search (although
less compelling, as any struct with that many members should
arguably be broken into hierarchical substructs), and equally
valid since JSON objects have no specific order in which keys must
appear.  There is no trivial or obvious way to sort the types of
an alternate, so that is left unchanged.

However, the overall SchemaInfo array remains unsorted.  It might
make sense to sort with 'meta-type' as a primary key and 'name'
as a secondary key, but it is not obvious that this will provide
benefits to end-user clients (we allow mutually recursive types,
so there is no posible topological sorting where a single pass
over the array could resolve all types, and while binary searches
could be made possible by sorting, it would be even more efficient
for clients to read the array into a hashtable for O(1) rather
than O(log n) random-access lookups, at which point pre-sorting is
wasted effort).

Document these conventions, so that clients will know what can
and cannot be relied on.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
TODO: should the documentation mention that the sorting is done
in the C locale? Is there anything required to ensure that python
sorts sanely (ie. that the choice of locale while building
doesn't cause inadvertent sorting differences such as turning on
case-insensitivity)?

v8: no change
v7: tweak commit wording
v6: no change from v5
---
 docs/qapi-code-gen.txt     | 21 +++++++++++++++++----
 qapi/introspect.json       | 22 +++++++++++++++++-----
 scripts/qapi-introspect.py |  9 ++++++---
 3 files changed, 40 insertions(+), 12 deletions(-)

Comments

Markus Armbruster Oct. 30, 2015, 11:20 a.m. UTC | #1
For now, only high-level review.

The main cost of sorting is interface complexity: we need to specify
which things are sorted, and what the sorting order is (see your TODO
below).  Once we've done so, we can't go back.

There's also implementation complexity, but your patch shows it's low
enough to be ignored.

Cost needs be justified by benefits.  Let's have a look at the benefits.

Eric Blake <eblake@redhat.com> writes:

> Sorting the values of an enum makes it easier to look up whether
> a particular value is present, via binary rather than linear search
> (probably most visible with QKeyCode, which has grown over
> several releases).

Making the interface guarantee "sorted" saves a client that wants it
sorted (say for binary search) the trouble of sorting itself.  We sort
at QEMU compile time instead of client run time.

My qmp-introspect.c has 48 SchemaInfoEnum, ranging from one to 125
members.  Histogram:

 #enums with this #members
      6 1
     11 2
      8 3
      9 4
      2 5
      1 6
      3 7
      1 8
      2 9
      1 15
      1 19
      1 27
      1 43
      1 125

Mean is less than eight.  Median is *three*.

Sorting these member arrays in the client won't have a noticeable
performance impact.  Heck, I suspect even using linear search wouldn't!
Therefore, we can't really claim client performance as a benefit.

We might be able to claim reduced client complexity as a benefit, but
I'm rather skeptical.  When your search space has a dozen members, you
better stick to simple search techniques.  Linear search in array or
list and binary search in array look adequate, hash table already
approaches overkill, and anything fancier definitely is.  Of these, only
binary search profits from a "sorted" guarantee.  But when you got data
in an array already, all it takes to sort it is a call to qsort() and a
comparison function.  Ten straightforward lines of code, tops.  Less in
a language that isn't quite as primitive as C.  Not much of a benefit,
I'm afraid.

>                     Additionally, QMP clients need not know which
> C value is associated with an enum name, so sorting is an
> effective way to hide that non-ABI aspect of qapi.

Sorting indeed hides the implementation detail of how enumerations are
encoded in the server.  However, I can't see what would tempt clients
into relying on it.  I can see for type names, and that's why we hide
them (commit 1a9a507).

One more potential benefit: when the schema changes in a way that
affects only order in introspection, sorting can hide the changes from
clients.  Can't think of a practical use of it, though.

> While we are at it, it is also easy to sort the members and
> variants of objects, to allow for a similar binary search (although
> less compelling, as any struct with that many members should
> arguably be broken into hierarchical substructs), and equally
> valid since JSON objects have no specific order in which keys must
> appear.

My qmp-introspect.c has 48 SchemaInfoObject, ranging from zero to 27
members.  Histogram:

  #objs with this #members
      3 0
    102 2
     58 3
     28 4
     14 5
     17 6
     15 7
      3 8
      2 9
      7 10
      2 11
      3 12
      1 13
      2 14
      1 15
      1 16
      1 27

Mean is 9.5, median is 9.

The variants are also enums, and therefore can't be any worse than
enums.

Again, client performance is hardly a benefit, and client complexity
isn't particularly convincing, either.

>          There is no trivial or obvious way to sort the types of
> an alternate, so that is left unchanged.

In the schema, an alternate's member has a name, a type and nothing
else, so that's what we could use to sort.  The name isn't visible in
introspection, and the type is masked.  Sorting by either hides schema
change from the client, but isn't of much use to the client otherwise.

If you want to let the client use binary search without having to sort
itself, you obviously have to sort by masked type.

My qmp-introspect.c has *two* alternate types, each with two members.

> However, the overall SchemaInfo array remains unsorted.  It might
> make sense to sort with 'meta-type' as a primary key and 'name'
> as a secondary key, but it is not obvious that this will provide
> benefits to end-user clients (we allow mutually recursive types,
> so there is no posible topological sorting where a single pass
> over the array could resolve all types, and while binary searches
> could be made possible by sorting, it would be even more efficient
> for clients to read the array into a hashtable for O(1) rather
> than O(log n) random-access lookups, at which point pre-sorting is
> wasted effort).

My qmp-introspect.c has:

      2 alternate
     55 array
      5 builtin
    126 command
     48 enum
     35 event
    260 object
    -------------
    531 total

Since they all share the same entity name space, I'd use a single hash
table to map from name to SchemaInfo, just like qapi.py's QAPISchema
does.

Since we visit stuff in a defined order, qmp-introspect.c's order is
stable.  The order just isn't particularly useful for clients, let alone
specified.

> Document these conventions, so that clients will know what can
> and cannot be relied on.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
>
> ---
> TODO: should the documentation mention that the sorting is done
> in the C locale?

I'd assume C locale because we're sorting plain ASCII strings.  But
spelling it out can't hurt.

>                  Is there anything required to ensure that python
> sorts sanely (ie. that the choice of locale while building
> doesn't cause inadvertent sorting differences such as turning on
> case-insensitivity)?

Good question.

> v8: no change
> v7: tweak commit wording
> v6: no change from v5
> ---
>  docs/qapi-code-gen.txt     | 21 +++++++++++++++++----
>  qapi/introspect.json       | 22 +++++++++++++++++-----
>  scripts/qapi-introspect.py |  9 ++++++---
>  3 files changed, 40 insertions(+), 12 deletions(-)
>
> diff --git a/docs/qapi-code-gen.txt b/docs/qapi-code-gen.txt
> index ba29bc6..163f547 100644
> --- a/docs/qapi-code-gen.txt
> +++ b/docs/qapi-code-gen.txt
> @@ -516,6 +516,13 @@ query-qmp-schema.  QGA currently doesn't support introspection.
>
>  query-qmp-schema returns a JSON array of SchemaInfo objects.  These
>  objects together describe the wire ABI, as defined in the QAPI schema.
> +There is no specified order to the SchemaInfo objects returned; a
> +client must search for a particular name and meta-type throughout the
> +entire array to learn more about that name.  For now, the name for
> +each SchemaInfo is unique thanks to qapi naming conventions; however
> +this may be changed in the future (such as allowing a command and an
> +event with the same name), so it is important that the client check
> +for the desired meta-type.

I feel separate name spaces aren't necessary and would be confusing, and
I'd be prepared to cast "single entity name space" in stone now.

>
>  However, the SchemaInfo can't reflect all the rules and restrictions
>  that apply to QMP.  It's interface introspection (figuring out what's
> @@ -596,7 +603,8 @@ any.  Each element is a JSON object with members "name" (the member's
>  name), "type" (the name of its type), and optionally "default".  The
>  member is optional if "default" is present.  Currently, "default" can
>  only have value null.  Other values are reserved for future
> -extensions.
> +extensions.  The "members" array is sorted by "name", so that clients
> +can use a binary search to see if a particular member is supported.
>
>  Example: the SchemaInfo for MyType from section Struct types
>
[...]

This all boils down to whether the increase in interface specification
complexity is justified by the benefits.  The cost is small, but I'm
having a hard time seeing the benefits, to be honest.

Am I missing something?
Eric Blake Oct. 30, 2015, 3:41 p.m. UTC | #2
On 10/30/2015 05:20 AM, Markus Armbruster wrote:
> For now, only high-level review.
> 
> The main cost of sorting is interface complexity: we need to specify
> which things are sorted, and what the sorting order is (see your TODO
> below).  Once we've done so, we can't go back.
> 
> There's also implementation complexity, but your patch shows it's low
> enough to be ignored.
> 
> Cost needs be justified by benefits.  Let's have a look at the benefits.
> 

> My qmp-introspect.c has 48 SchemaInfoEnum, ranging from one to 125
> members.  Histogram:
> 
>  #enums with this #members

>       1 27
>       1 43
>       1 125
> 
> Mean is less than eight.  Median is *three*.

Binary searches have more overhead on small lists; you don't get the
benefits of the better algorithm until you overcome the initial hit of
more expensive lookups (bsearch() has to use a callback function, which
is intrinsically slower than doing direct comparison; the speedups are
only present if the search set is large enough that you are able to skip
enough comparisons to overcome the higher cost of the extra
indirection).  So leaving enums unsorted is unlikely to slow clients
down (clients will have to do a linear search, but not the end of the
world since we have few large lists, and even among those, the size is
unlikely to grow much further).

> 
> Sorting these member arrays in the client won't have a noticeable
> performance impact.  Heck, I suspect even using linear search wouldn't!
> Therefore, we can't really claim client performance as a benefit.

So I'm inclined to agree with your conclusion.


>>                     Additionally, QMP clients need not know which
>> C value is associated with an enum name, so sorting is an
>> effective way to hide that non-ABI aspect of qapi.
> 
> Sorting indeed hides the implementation detail of how enumerations are
> encoded in the server.  However, I can't see what would tempt clients
> into relying on it.  I can see for type names, and that's why we hide
> them (commit 1a9a507).

You have a point - even if a client figures out what integer value an
enum name maps to, it doesn't really help, because the client never
sends the integer over the wire.

> 
> One more potential benefit: when the schema changes in a way that
> affects only order in introspection, sorting can hide the changes from
> clients.  Can't think of a practical use of it, though.

Because we send names, not numbers, over the wire, we can insert a new
enum member in slot 0.  Clients blindly doing strcmp(enum[0], "magic")
will get thrown off if "magic" moves to a different position - but this
is true whether or not we sort.  So the only working clients are those
that check every enum position, not just enum[0].  So I agree - broken
clients are broken whether or not we sort, and working clients don't
miss anything whether or not we sort.


> My qmp-introspect.c has 48 SchemaInfoObject, ranging from zero to 27
> members.  Histogram:
> 
>   #objs with this #members

>       1 15
>       1 16
>       1 27

Even less than the largest enum (and large structs are already unwieldy,
compared to making trees of substructs, so we're less likely to make
them too much bigger).

> Since we visit stuff in a defined order, qmp-introspect.c's order is
> stable.  The order just isn't particularly useful for clients, let alone
> specified.
> 
>> Document these conventions, so that clients will know what can
>> and cannot be relied on.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>
>> ---
>> TODO: should the documentation mention that the sorting is done
>> in the C locale?
> 
> I'd assume C locale because we're sorting plain ASCII strings.  But
> spelling it out can't hurt.
> 
>>                  Is there anything required to ensure that python
>> sorts sanely (ie. that the choice of locale while building
>> doesn't cause inadvertent sorting differences such as turning on
>> case-insensitivity)?
> 
> Good question.

And my biggest worry.

>> +++ b/docs/qapi-code-gen.txt
>> @@ -516,6 +516,13 @@ query-qmp-schema.  QGA currently doesn't support introspection.
>>
>>  query-qmp-schema returns a JSON array of SchemaInfo objects.  These
>>  objects together describe the wire ABI, as defined in the QAPI schema.
>> +There is no specified order to the SchemaInfo objects returned; a
>> +client must search for a particular name and meta-type throughout the
>> +entire array to learn more about that name.  For now, the name for
>> +each SchemaInfo is unique thanks to qapi naming conventions; however
>> +this may be changed in the future (such as allowing a command and an
>> +event with the same name), so it is important that the client check
>> +for the desired meta-type.
> 
> I feel separate name spaces aren't necessary and would be confusing, and
> I'd be prepared to cast "single entity name space" in stone now.

Sounds fair to me.

> This all boils down to whether the increase in interface specification
> complexity is justified by the benefits.  The cost is small, but I'm
> having a hard time seeing the benefits, to be honest.
> 
> Am I missing something?

No, I think you gave a fair review. I put this patch out there for
discussion and to see how hard it would be; but I'm not strongly tied to
it (if I have to pick only one patch out of 3/17 and 4/17, I'd rather
see 3/17 go in).  So now that we've looked at the benefits (or rather,
that they aren't very strong), I'm okay with the idea of making my v9
spin of this series use an alternative patch that _just_ documents the
"single entity name space" and the fact that clients cannot rely on any
ordering (we don't sort - we may be deterministic, but we make no
guarantees on what that order is, and we are free to change the order in
the future).
diff mbox

Patch

diff --git a/docs/qapi-code-gen.txt b/docs/qapi-code-gen.txt
index ba29bc6..163f547 100644
--- a/docs/qapi-code-gen.txt
+++ b/docs/qapi-code-gen.txt
@@ -516,6 +516,13 @@  query-qmp-schema.  QGA currently doesn't support introspection.

 query-qmp-schema returns a JSON array of SchemaInfo objects.  These
 objects together describe the wire ABI, as defined in the QAPI schema.
+There is no specified order to the SchemaInfo objects returned; a
+client must search for a particular name and meta-type throughout the
+entire array to learn more about that name.  For now, the name for
+each SchemaInfo is unique thanks to qapi naming conventions; however
+this may be changed in the future (such as allowing a command and an
+event with the same name), so it is important that the client check
+for the desired meta-type.

 However, the SchemaInfo can't reflect all the rules and restrictions
 that apply to QMP.  It's interface introspection (figuring out what's
@@ -596,7 +603,8 @@  any.  Each element is a JSON object with members "name" (the member's
 name), "type" (the name of its type), and optionally "default".  The
 member is optional if "default" is present.  Currently, "default" can
 only have value null.  Other values are reserved for future
-extensions.
+extensions.  The "members" array is sorted by "name", so that clients
+can use a binary search to see if a particular member is supported.

 Example: the SchemaInfo for MyType from section Struct types

@@ -610,7 +618,9 @@  Example: the SchemaInfo for MyType from section Struct types
 "variants" is a JSON array describing the object's variant members.
 Each element is a JSON object with members "case" (the value of type
 tag this element applies to) and "type" (the name of an object type
-that provides the variant members for this type tag value).
+that provides the variant members for this type tag value).  The
+"variants" array is sorted by "case", so it appears in the same
+order as the enum type matching "tag".

 Example: the SchemaInfo for flat union BlockdevOptions from section
 Union types
@@ -651,7 +661,8 @@  Union types
 The SchemaInfo for an alternate type has meta-type "alternate", and
 variant member "members".  "members" is a JSON array.  Each element is
 a JSON object with member "type", which names a type.  Values of the
-alternate type conform to exactly one of its member types.
+alternate type conform to exactly one of its member types.  There is
+no guarantee on the order in which "members" will be listed.

 Example: the SchemaInfo for BlockRef from section Alternate types

@@ -673,7 +684,9 @@  Example: the SchemaInfo for ['str']
       "element-type": "str" }

 The SchemaInfo for an enumeration type has meta-type "enum" and
-variant member "values".
+variant member "values".  The values are listed in sorted order,
+so clients can use a binary search to see if a particular value
+is present.

 Example: the SchemaInfo for MyEnum from section Enumeration types

diff --git a/qapi/introspect.json b/qapi/introspect.json
index cc50dc6..71632af 100644
--- a/qapi/introspect.json
+++ b/qapi/introspect.json
@@ -25,6 +25,11 @@ 
 # Returns: array of @SchemaInfo, where each element describes an
 # entity in the ABI: command, event, type, ...
 #
+# The order of the various SchemaInfo is unspecified.  For now, the
+# name of each SchemaInfo is unique regardless of meta-type, but to be
+# safe, clients should check that a given name has the expected
+# meta-type.
+#
 # Note: the QAPI schema is also used to help define *internal*
 # interfaces, by defining QAPI types.  These are not part of the QMP
 # wire ABI, and therefore not returned by this command.
@@ -78,7 +83,8 @@ 
 #        Commands and events have the name defined in the QAPI schema.
 #        Unlike command and event names, type names are not part of
 #        the wire ABI.  Consequently, type names are meaningless
-#        strings here.
+#        strings here.  Although all names are currently unique
+#        regardless of @meta-type, clients should not rely on this.
 #
 # All references to other SchemaInfo are by name.
 #
@@ -130,7 +136,9 @@ 
 #
 # Additional SchemaInfo members for meta-type 'enum'.
 #
-# @values: the enumeration type's values.
+# @values: the enumeration type's values.  The values are sorted, so
+#          clients can use a binary search to see if a particular value
+#          is present.
 #
 # Values of this type are JSON string on the wire.
 #
@@ -158,14 +166,18 @@ 
 #
 # Additional SchemaInfo members for meta-type 'object'.
 #
-# @members: the object type's (non-variant) members.
+# @members: the object type's (non-variant) members.  The members are
+#           sorted by name, so clients can use a binary search to see
+#           if a given member is present.
 #
 # @tag: #optional the name of the member serving as type tag.
 #       An element of @members with this name must exist.
 #
 # @variants: #optional variant members, i.e. additional members that
 #            depend on the type tag's value.  Present exactly when
-#            @tag is present.
+#            @tag is present.  The variants are sorted by case, which
+#            means they appear in the same order as the values of the
+#            enum type of the @tag.
 #
 # Values of this type are JSON object on the wire.
 #
@@ -219,7 +231,7 @@ 
 #
 # Additional SchemaInfo members for meta-type 'alternate'.
 #
-# @members: the alternate type's members.
+# @members: the alternate type's members, in no particular order.
 #           The members' wire encoding is distinct, see
 #           docs/qapi-code-gen.txt section Alternate types.
 #
diff --git a/scripts/qapi-introspect.py b/scripts/qapi-introspect.py
index 64f2cd0..6a5a843 100644
--- a/scripts/qapi-introspect.py
+++ b/scripts/qapi-introspect.py
@@ -10,6 +10,7 @@ 
 # See the COPYING file in the top-level directory.

 from qapi import *
+from operator import attrgetter


 # Caveman's json.dumps() replacement (we're stuck at Python 2.4)
@@ -126,7 +127,8 @@  const char %(c_name)s[] = %(c_string)s;

     def _gen_variants(self, tag_name, variants):
         return {'tag': tag_name,
-                'variants': [self._gen_variant(v) for v in variants]}
+                'variants': [self._gen_variant(v) for v in
+                             sorted(variants, key=attrgetter('name'))]}

     def _gen_variant(self, variant):
         return {'case': variant.name, 'type': self._use_type(variant.type)}
@@ -135,14 +137,15 @@  const char %(c_name)s[] = %(c_string)s;
         self._gen_json(name, 'builtin', {'json-type': json_type})

     def visit_enum_type(self, name, info, values, prefix):
-        self._gen_json(name, 'enum', {'values': values})
+        self._gen_json(name, 'enum', {'values': sorted(values)})

     def visit_array_type(self, name, info, element_type):
         element = self._use_type(element_type)
         self._gen_json('[' + element + ']', 'array', {'element-type': element})

     def visit_object_type_flat(self, name, info, members, variants):
-        obj = {'members': [self._gen_member(m) for m in members]}
+        obj = {'members': [self._gen_member(m) for m in
+                           sorted(members, key=attrgetter('name'))]}
         if variants:
             obj.update(self._gen_variants(variants.tag_member.name,
                                           variants.variants))