diff mbox

[RFC,v2,27/47] qapi-visit: Convert to QAPISchemaVisitor, fixing bugs

Message ID 1435782155-31412-28-git-send-email-armbru@redhat.com
State New
Headers show

Commit Message

Markus Armbruster July 1, 2015, 8:22 p.m. UTC
Fixes flat unions to visit the base's base members (the previous
commit merely added them to the struct).  Same test case.

Patch's effect on visit_type_UserDefFlatUnion():

     static void visit_type_UserDefFlatUnion_fields(Visitor *m, UserDefFlatUnion **obj, Error **errp)
     {
         Error *err = NULL;

    +    visit_type_int(m, &(*obj)->integer, "integer", &err);
    +    if (err) {
    +        goto out;
    +    }
         visit_type_str(m, &(*obj)->string, "string", &err);
         if (err) {
             goto out;

Test cases updated for the bug fix.

Fixes alternates to generate a visitor for their implicit enumeration
type.  None of them are currently used, obviously.  Example:
block-core.json's BlockdevRef now generates
visit_type_BlockdevRefKind().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 scripts/qapi-visit.py                   | 254 ++++++++++++--------------------
 tests/qapi-schema/qapi-schema-test.json |   3 -
 tests/test-qmp-input-strict.c           |   2 +-
 tests/test-qmp-input-visitor.c          |   4 +-
 4 files changed, 100 insertions(+), 163 deletions(-)

Comments

Eric Blake July 22, 2015, 10:28 p.m. UTC | #1
On 07/01/2015 02:22 PM, Markus Armbruster wrote:
> Fixes flat unions to visit the base's base members (the previous
> commit merely added them to the struct).  Same test case.
> 
> Patch's effect on visit_type_UserDefFlatUnion():
> 
>      static void visit_type_UserDefFlatUnion_fields(Visitor *m, UserDefFlatUnion **obj, Error **errp)
>      {
>          Error *err = NULL;
> 
>     +    visit_type_int(m, &(*obj)->integer, "integer", &err);
>     +    if (err) {
>     +        goto out;
>     +    }
>          visit_type_str(m, &(*obj)->string, "string", &err);
>          if (err) {
>              goto out;
> 
> Test cases updated for the bug fix.
> 
> Fixes alternates to generate a visitor for their implicit enumeration
> type.  None of them are currently used, obviously.  Example:
> block-core.json's BlockdevRef now generates
> visit_type_BlockdevRefKind().
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  scripts/qapi-visit.py                   | 254 ++++++++++++--------------------
>  tests/qapi-schema/qapi-schema-test.json |   3 -
>  tests/test-qmp-input-strict.c           |   2 +-
>  tests/test-qmp-input-visitor.c          |   4 +-
>  4 files changed, 100 insertions(+), 163 deletions(-)

Another conversion that results in a fairly large diffstat to the
generated files:

 qapi-visit.c                        | 4542
++++++++++++++++++------------------
 qapi-visit.h                        |  256 --
 qga/qapi-generated/qga-qapi-visit.c |   88
 qga/qapi-generated/qga-qapi-visit.h |   36
 4 files changed, 2355 insertions(+), 2567 deletions(-)

Same complaints as in 26/47, where splitting some of the cleanups into
separate patches would make it easier to validate that the final
conversion is correct.

Here, a very common thing in the generated .c file is that you end up
swapping the order of visit_type_foo and visit_type_fooList, for any
time when foo is an enum. [1]

> 
> diff --git a/scripts/qapi-visit.py b/scripts/qapi-visit.py
> index a52a572..135e7c1 100644
> --- a/scripts/qapi-visit.py
> +++ b/scripts/qapi-visit.py
> @@ -12,7 +12,6 @@
>  # This work is licensed under the terms of the GNU GPL, version 2.
>  # See the COPYING file in the top-level directory.
>  
> -from ordereddict import OrderedDict
>  from qapi import *
>  import re
>  
> @@ -24,13 +23,13 @@ def generate_visit_implicit_struct(type):
>          return ''
>      implicit_structs_seen.add(type)
>      ret = ''
> -    if type not in struct_fields_seen:
> +    if type.name not in struct_fields_seen:
>          # Need a forward declaration
>          ret += mcgen('''
>  
>  static void visit_type_%(c_type)s_fields(Visitor *m, %(c_type)s **obj, Error **errp);
>  ''',
> -                     c_type=type_name(type))
> +                     c_type=type.c_name())

This looks a little fishy on first read; why are we calling
type.c_name() (and not type.c_type()) when assigning to a placeholder
named %(c_type)s?  But on second thought, it looks correct: since we
really do want the name of the type (and not the magic '*' pointer suffix).

Still, it might be nicer to name things %(c_name)s here; and in that
case, it's more of a pre-existing cleanup that might be better floating
into one of your earlier patches.

>  
>      ret += mcgen('''
>  
> @@ -46,7 +45,7 @@ static void visit_type_implicit_%(c_type)s(Visitor *m, %(c_type)s **obj, Error *
>      error_propagate(errp, err);
>  }
>  ''',
> -                 c_type=type_name(type))
> +                 c_type=type.c_name())

same here.

>      return ret
>  
>  def generate_visit_struct_fields(name, members, base = None):
> @@ -74,24 +73,24 @@ if (err) {
>      goto out;
>  }
>  ''',
> -                     type=type_name(base), c_name=c_name('base'))
> +                     type=base.c_name(), c_name=c_name('base'))

And this one's pointless: c_name('base') == 'base'. Pointless since
commit 622f557 introduced type inheritance.  Why do we even need
%(c_name)s if we are always passing a constant string?

Oh, and that means our generator has a collision bug that none of my
added tests have exposed yet: you cannot have a base class and
simultaneously add a member named 'base':

{ 'struct': 'Base', 'data': { 'i': 'int' } }
{ 'struct': 'Sub', 'base': 'Base', 'data': { 'base': 'str' } }

because the generated C code is trying to use the name 'base' for its
own purposes.  I guess that means more pre-req patches to the series to
expose the bug, and either tighten the parser to reject things for now
(easiest) or update the generator to not collide (harder, and fine for a
later series).

By the way, now that we are emitting flat unions in such a way that you
can cast to the base class, why don't we change our C code to do
likewise?  That is, where we now have this generated C:

struct BlockdevOptionsGenericFormat {
    BlockdevRef *file;
};

struct BlockdevOptionsGenericCOWFormat {
    BlockdevOptionsGenericFormat *base;
    bool has_backing;
    BlockdevRef *backing;
};

why can't we instead have an unboxed representation:

struct BlockdevOptionsGenericFormat {
    BlockdevRef *file;
};

/* This struct can be cast to BlockdevOptionsGenericFormat */
struct BlockdevOptionsGenericCOWFormat {
    BlockdevRef *file;
    /* end of fields from base class BlockdevOptionsGenericFormat */
    bool has_backing;
    BlockdevRef *backing;
};

where client code that was referring to o->base->file now refers to o->file.

> +def gen_visit_union(name, base, variants):
> +    ret = ''
>  
>      if base:
> -        assert discriminator
> -        base_fields = find_struct(base)['data'].copy()
> -        del base_fields[discriminator]
> -        ret += generate_visit_struct_fields(name, base_fields)
> +        members = [m for m in base.members if m != variants.tag_member]
> +        ret += generate_visit_struct_fields(name, members)

Why not just visit ALL base class members, unconditionally?

>  
> -    if discriminator:
> -        for key in members:
> -            ret += generate_visit_implicit_struct(members[key])
> +    for var in variants.variants:
> +        if var.flat:
> +            ret += generate_visit_implicit_struct(var.type)

Okay, I see where you are using .flat from the initial parse.  I still
think it is a bit odd that you are defining '.flat' for each 'variant'
within 'variants', even though, for a given 'variants', all members will
have the same setting of '.flat'.  That makes me wonder if '.flat'
should belong instead to the top-level 'variants' struct rather than to
each 'variant' member.

But again I wonder what would happen if you had instead normalized the
input of simple unions into always having an implicit struct (with
single member 'data'), so that by the time you get here, you only have
to deal with a single representation of unions instead of having to
still emit different things for flat vs. simple (since on the wire, we
already proved simple is shorthand that can be duplicated by a flat union).

>  
>      ret += mcgen('''
>  
> @@ -300,41 +268,39 @@ void visit_type_%(c_name)s(Visitor *m, %(c_name)s **obj, const char *name, Error
>  ''',
>                       name=c_name(name))
>  
> -    if not discriminator:
> -        tag = 'kind'
> -        disc_key = "type"
> -    else:
> -        tag = discriminator
> -        disc_key = discriminator
> +    disc_key = variants.tag_member.name
> +    if not variants.tag_name:
> +        # we pointlessly use a different key for simple unions

We could fix that (as a separate patch); wonder how much C code it would
affect.  A lot of these things that we can alter in generated code are
certainly easier to see now that we have a clean generator :)

> +def gen_visit_decl(name, scalar=False):
> +    c_type = c_name(name) + ' *'
> +    if not scalar:
> +        c_type += '*'
>      return mcgen('''
> -
> -void visit_type_%(name)s(Visitor *m, %(name)s *obj, const char *name, Error **errp);
> +void visit_type_%(c_name)s(Visitor *m, %(c_type)sobj, const char *name, Error **errp);
>  ''',
> -                 name=c_name(name))
> +                 c_name=c_name(name), c_type=c_type)

Nice way to consolidate several near-identical copies.

> +
> +class QAPISchemaGenVisitVisitor(QAPISchemaVisitor):
> +    def __init__(self):
> +        self.decl = None
> +        self.defn = None
> +        self.btin = None
> +    def visit_begin(self):
> +        self.decl = ''
> +        self.defn = ''
> +        self.btin = guardstart('QAPI_VISIT_BUILTIN_VISITOR_DECL')
> +    def visit_end(self):
> +        # to avoid header dependency hell, we always generate
> +        # declarations for built-in types in our header files and
> +        # simply guard them
> +        self.btin += guardend('QAPI_VISIT_BUILTIN_VISITOR_DECL')
> +        self.decl = self.btin + self.decl
> +        self.btin = None
> +        # ...this doesn't work for cases where we link in multiple
> +        # objects that have the functions defined, so we use
> +        # do_builtins (option -b) to provide control

And once again, as in 26/47, this floats the .h file to have all builtin
representations in one chunk (for continuity with pre-patch), but fails
to do the same for the .c code...

> +    def visit_enum_type(self, name, info, values):
> +        self.decl += gen_visit_decl(name, scalar=True)
> +        self.defn += generate_visit_enum(name)
> +    def visit_array_type(self, name, info, element_type):
> +        decl = gen_visit_decl(name)
> +        defn = gen_visit_list(name, element_type)
> +        if isinstance(element_type, QAPISchemaBuiltinType):
> +            self.btin += decl
> +            if do_builtins:
> +                self.defn += defn

...where the builtins are now interleaved with everything else instead
of bunched together, making the generated diff larger and more confusing
than necessary.

> +        else:
> +            self.decl += decl
> +            self.defn += defn
> +    def visit_object_type(self, name, info, base, members, variants):
> +        if info:
> +            self.decl += gen_visit_decl(name)
> +            if variants:
> +                self.defn += gen_visit_union(name, base, variants)

Worth adding 'assert not members'?

> +            else:
> +                self.defn += gen_visit_struct(name, base, members)

Or maybe we can someday consolidate these two into a single
gen_visit_object, that handles all members and variants in a uniform
manner, instead of our current differences. I wonder how much C code
would be impacted?

> +    def visit_alternate_type(self, name, info, variants):
> +        self.decl += gen_visit_decl(name)
> +        self.defn += gen_visit_alternate(name, variants)
>  
>  do_builtins = False
>  
> @@ -442,56 +428,10 @@ fdecl.write(mcgen('''
>  ''',
>                    prefix=prefix))
>  
> -exprs = QAPISchema(input_file).get_exprs()
> -
> -# to avoid header dependency hell, we always generate declarations
> -# for built-in types in our header files and simply guard them
> -fdecl.write(guardstart("QAPI_VISIT_BUILTIN_VISITOR_DECL"))
> -for typename in builtin_types.keys():
> -    fdecl.write(generate_declaration(typename, builtin_type=True))
> -fdecl.write(guardend("QAPI_VISIT_BUILTIN_VISITOR_DECL"))
> -
> -# ...this doesn't work for cases where we link in multiple objects that
> -# have the functions defined, so we use -b option to provide control
> -# over these cases
> -if do_builtins:
> -    for typename in builtin_types.keys():
> -        fdef.write(generate_visit_list(typename))

Again, a well-placed sorted() over these two loops in a pre-req patch
will minimize the churn on the builtins.

> -
> -for expr in exprs:
> -    if expr.has_key('struct'):
> -        ret = generate_visit_struct(expr)
> -        ret += generate_visit_list(expr['struct'])
> -        fdef.write(ret)
> -
> -        ret = generate_declaration(expr['struct'])
> -        fdecl.write(ret)
> -    elif expr.has_key('union'):
> -        ret = generate_visit_union(expr)
> -        ret += generate_visit_list(expr['union'])
> -        fdef.write(ret)
> -
> -        enum_define = discriminator_find_enum_define(expr)
> -        ret = ""
> -        if not enum_define:
> -            ret = generate_decl_enum('%sKind' % expr['union'])

Nice that the new visitor automatically visits any implicit enum,
without us having to special case it.

> -        ret += generate_declaration(expr['union'])
> -        fdecl.write(ret)
> -    elif expr.has_key('alternate'):
> -        ret = generate_visit_alternate(expr['alternate'], expr['data'])
> -        ret += generate_visit_list(expr['alternate'])
> -        fdef.write(ret)
> -
> -        ret = generate_decl_enum('%sKind' % expr['alternate'])
> -        ret += generate_declaration(expr['alternate'])
> -        fdecl.write(ret)
> -    elif expr.has_key('enum'):
> -        ret = generate_visit_list(expr['enum'])
> -        ret += generate_visit_enum(expr['enum'])

[1] swapping these two lines in a pre-req patch will minimize the churn
of this conversion.

> -        fdef.write(ret)
> -
> -        ret = generate_decl_enum(expr['enum'])
> -        ret += generate_enum_declaration(expr['enum'])
> -        fdecl.write(ret)
> +schema = QAPISchema(input_file)
> +gen = QAPISchemaGenVisitVisitor()
> +schema.visit(gen)
> +fdef.write(gen.defn)
> +fdecl.write(gen.decl)

Again, overall impression is that your series is headed in the right
direction.

And nice that the TODOs in the testsuite pointed out what this fixes,
for visiting indirect bases.
Markus Armbruster July 27, 2015, 5:53 p.m. UTC | #2
Eric Blake <eblake@redhat.com> writes:

> On 07/01/2015 02:22 PM, Markus Armbruster wrote:
>> Fixes flat unions to visit the base's base members (the previous
>> commit merely added them to the struct).  Same test case.
>> 
>> Patch's effect on visit_type_UserDefFlatUnion():
>> 
>>      static void visit_type_UserDefFlatUnion_fields(Visitor *m, UserDefFlatUnion **obj, Error **errp)
>>      {
>>          Error *err = NULL;
>> 
>>     +    visit_type_int(m, &(*obj)->integer, "integer", &err);
>>     +    if (err) {
>>     +        goto out;
>>     +    }
>>          visit_type_str(m, &(*obj)->string, "string", &err);
>>          if (err) {
>>              goto out;
>> 
>> Test cases updated for the bug fix.
>> 
>> Fixes alternates to generate a visitor for their implicit enumeration
>> type.  None of them are currently used, obviously.  Example:
>> block-core.json's BlockdevRef now generates
>> visit_type_BlockdevRefKind().
>> 
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>  scripts/qapi-visit.py                   | 254 ++++++++++++--------------------
>>  tests/qapi-schema/qapi-schema-test.json |   3 -
>>  tests/test-qmp-input-strict.c           |   2 +-
>>  tests/test-qmp-input-visitor.c          |   4 +-
>>  4 files changed, 100 insertions(+), 163 deletions(-)
>
> Another conversion that results in a fairly large diffstat to the
> generated files:
>
>  qapi-visit.c                        | 4542 ++++++++++++++++++------------------
>  qapi-visit.h                        |  256 --
>  qga/qapi-generated/qga-qapi-visit.c |   88
>  qga/qapi-generated/qga-qapi-visit.h |   36
>  4 files changed, 2355 insertions(+), 2567 deletions(-)
>
> Same complaints as in 26/47, where splitting some of the cleanups into
> separate patches would make it easier to validate that the final
> conversion is correct.
>
> Here, a very common thing in the generated .c file is that you end up
> swapping the order of visit_type_foo and visit_type_fooList, for any
> time when foo is an enum. [1]

Again, this is because the old code generates the array visitor from
within the code generating the element type visitor, while the new code
makes all array types full-fledged types, which get visited like any
other type.

>> 
>> diff --git a/scripts/qapi-visit.py b/scripts/qapi-visit.py
>> index a52a572..135e7c1 100644
>> --- a/scripts/qapi-visit.py
>> +++ b/scripts/qapi-visit.py
>> @@ -12,7 +12,6 @@
>>  # This work is licensed under the terms of the GNU GPL, version 2.
>>  # See the COPYING file in the top-level directory.
>>  
>> -from ordereddict import OrderedDict
>>  from qapi import *
>>  import re
>>  
>> @@ -24,13 +23,13 @@ def generate_visit_implicit_struct(type):
>>          return ''
>>      implicit_structs_seen.add(type)
>>      ret = ''
>> -    if type not in struct_fields_seen:
>> +    if type.name not in struct_fields_seen:
>>          # Need a forward declaration
>>          ret += mcgen('''
>>  
>>  static void visit_type_%(c_type)s_fields(Visitor *m, %(c_type)s **obj, Error **errp);
>>  ''',
>> -                     c_type=type_name(type))
>> +                     c_type=type.c_name())
>
> This looks a little fishy on first read; why are we calling
> type.c_name() (and not type.c_type()) when assigning to a placeholder
> named %(c_type)s?  But on second thought, it looks correct: since we
> really do want the name of the type (and not the magic '*' pointer suffix).

Most of the time, c_type is an identifier naming a type (something like
typ.c_name()), but sometimes it's a type expression (something like
typ.c_type()).  If that turns out to be too confusing, we can use
different names for the two.  Not exactly looking forward to tracking
them down, though...

> Still, it might be nicer to name things %(c_name)s here; and in that
> case, it's more of a pre-existing cleanup that might be better floating
> into one of your earlier patches.

This patch tries very hard not to touch lines without need.  Cleanup is
left for PATCH 33.

I could do some (but not all) cleanup before converting to
QAPISchemaVisitor.  But I'm afraid that just complicates things even
more.

>>  
>>      ret += mcgen('''
>>  
>> @@ -46,7 +45,7 @@ static void visit_type_implicit_%(c_type)s(Visitor *m, %(c_type)s **obj, Error *
>>      error_propagate(errp, err);
>>  }
>>  ''',
>> -                 c_type=type_name(type))
>> +                 c_type=type.c_name())
>
> same here.
>
>>      return ret
>>  
>>  def generate_visit_struct_fields(name, members, base = None):
>> @@ -74,24 +73,24 @@ if (err) {
            ret += mcgen('''
    visit_type_implicit_%(c_type)s(m, &(*obj)->%(c_name)s, &err);
    if (err) {
>>      goto out;
>>  }
>>  ''',
>> -                     type=type_name(base), c_name=c_name('base'))
>> +                     type=base.c_name(), c_name=c_name('base'))
>
> And this one's pointless: c_name('base') == 'base'. Pointless since
> commit 622f557 introduced type inheritance.  Why do we even need
> %(c_name)s if we are always passing a constant string?

My best guess: to keep the code similar to other code generating
visit_type_FOO() calls?

> Oh, and that means our generator has a collision bug that none of my
> added tests have exposed yet: you cannot have a base class and
> simultaneously add a member named 'base':
>
> { 'struct': 'Base', 'data': { 'i': 'int' } }
> { 'struct': 'Sub', 'base': 'Base', 'data': { 'base': 'str' } }
>
> because the generated C code is trying to use the name 'base' for its
> own purposes.

*sigh*

>                I guess that means more pre-req patches to the series to
> expose the bug, and either tighten the parser to reject things for now
> (easiest) or update the generator to not collide (harder, and fine for a
> later series).

Yes.

Life would be easier if the original authors had adopted sane naming
conventions from the start.

> By the way, now that we are emitting flat unions in such a way that you
> can cast to the base class, why don't we change our C code to do
> likewise?  That is, where we now have this generated C:
>
> struct BlockdevOptionsGenericFormat {
>     BlockdevRef *file;
> };
>
> struct BlockdevOptionsGenericCOWFormat {
>     BlockdevOptionsGenericFormat *base;
>     bool has_backing;
>     BlockdevRef *backing;
> };
>
> why can't we instead have an unboxed representation:
>
> struct BlockdevOptionsGenericFormat {
>     BlockdevRef *file;
> };
>
> /* This struct can be cast to BlockdevOptionsGenericFormat */
> struct BlockdevOptionsGenericCOWFormat {
>     BlockdevRef *file;
>     /* end of fields from base class BlockdevOptionsGenericFormat */
>     bool has_backing;
>     BlockdevRef *backing;
> };
>
> where client code that was referring to o->base->file now refers to o->file.

That's where I want to go, but not in this series.

>> +def gen_visit_union(name, base, variants):
>> +    ret = ''
>>  
>>      if base:
>> -        assert discriminator
>> -        base_fields = find_struct(base)['data'].copy()
>> -        del base_fields[discriminator]
>> -        ret += generate_visit_struct_fields(name, base_fields)
>> +        members = [m for m in base.members if m != variants.tag_member]
>> +        ret += generate_visit_struct_fields(name, members)
>
> Why not just visit ALL base class members, unconditionally?

This patch converts to QAPISchemaVisitor.  It tries hard to keep the
generated code the same.

The old code generates the discriminator visit into the visit_type_FOO()
and not the the visit_type_FOO_fields() helper (can't tell why), so the
new code does the same.

Simply visiting all members makes more sense, but it needs to be done in
a followup patch.  Since this series is already huge, it'll likely be in
a followup series.

>> -    if discriminator:
>> -        for key in members:
>> -            ret += generate_visit_implicit_struct(members[key])
>> +    for var in variants.variants:
>> +        if var.flat:
>> +            ret += generate_visit_implicit_struct(var.type)
>
> Okay, I see where you are using .flat from the initial parse.  I still
> think it is a bit odd that you are defining '.flat' for each 'variant'
> within 'variants', even though, for a given 'variants', all members will
> have the same setting of '.flat'.  That makes me wonder if '.flat'
> should belong instead to the top-level 'variants' struct rather than to
> each 'variant' member.

Two reasons for putting flat where it is:

* The philosophical one: from the generator's point of view, there's no
  fundamental reason why all variants need to be flat or none.  The
  generator really doesn't care.

* The pragmatic one (a.k.a. the real one): there are places where I use
  v.flat, but don't have the variants owning v handy.

> But again I wonder what would happen if you had instead normalized the
> input of simple unions into always having an implicit struct (with
> single member 'data'), so that by the time you get here, you only have
> to deal with a single representation of unions instead of having to
> still emit different things for flat vs. simple (since on the wire, we
> already proved simple is shorthand that can be duplicated by a flat union).

I hope we can get there!  But at this stage of the conversion, I want to
minimize output change, and .flat makes preserving all its warts much
easier.

>>  
>>      ret += mcgen('''
>>  
>> @@ -300,41 +268,39 @@ void visit_type_%(c_name)s(Visitor *m, %(c_name)s **obj, const char *name, Error
>>  ''',
>>                       name=c_name(name))
>>  
>> -    if not discriminator:
>> -        tag = 'kind'
>> -        disc_key = "type"
>> -    else:
>> -        tag = discriminator
>> -        disc_key = discriminator
>> +    disc_key = variants.tag_member.name
>> +    if not variants.tag_name:
>> +        # we pointlessly use a different key for simple unions
>
> We could fix that (as a separate patch); wonder how much C code it would
> affect.  A lot of these things that we can alter in generated code are
> certainly easier to see now that we have a clean generator :)

Yup, the warts stand out now.

>> +def gen_visit_decl(name, scalar=False):
>> +    c_type = c_name(name) + ' *'
>> +    if not scalar:
>> +        c_type += '*'
>>      return mcgen('''
>> -
>> -void visit_type_%(name)s(Visitor *m, %(name)s *obj, const char *name, Error **errp);
>> +void visit_type_%(c_name)s(Visitor *m, %(c_type)sobj, const char *name, Error **errp);
>>  ''',
>> -                 name=c_name(name))
>> +                 c_name=c_name(name), c_type=c_type)
>
> Nice way to consolidate several near-identical copies.
>
>> +
>> +class QAPISchemaGenVisitVisitor(QAPISchemaVisitor):
>> +    def __init__(self):
>> +        self.decl = None
>> +        self.defn = None
>> +        self.btin = None
>> +    def visit_begin(self):
>> +        self.decl = ''
>> +        self.defn = ''
>> +        self.btin = guardstart('QAPI_VISIT_BUILTIN_VISITOR_DECL')
>> +    def visit_end(self):
>> +        # to avoid header dependency hell, we always generate
>> +        # declarations for built-in types in our header files and
>> +        # simply guard them
>> +        self.btin += guardend('QAPI_VISIT_BUILTIN_VISITOR_DECL')
>> +        self.decl = self.btin + self.decl
>> +        self.btin = None
>> +        # ...this doesn't work for cases where we link in multiple
>> +        # objects that have the functions defined, so we use
>> +        # do_builtins (option -b) to provide control
>
> And once again, as in 26/47, this floats the .h file to have all builtin
> representations in one chunk (for continuity with pre-patch), but fails
> to do the same for the .c code...

Yes, because they all need to be under the
QAPI_VISIT_BUILTIN_VISITOR_DECL guard.

>> +    def visit_enum_type(self, name, info, values):
>> +        self.decl += gen_visit_decl(name, scalar=True)
>> +        self.defn += generate_visit_enum(name)
>> +    def visit_array_type(self, name, info, element_type):
>> +        decl = gen_visit_decl(name)
>> +        defn = gen_visit_list(name, element_type)
>> +        if isinstance(element_type, QAPISchemaBuiltinType):
>> +            self.btin += decl
>> +            if do_builtins:
>> +                self.defn += defn
>
> ...where the builtins are now interleaved with everything else instead
> of bunched together, making the generated diff larger and more confusing
> than necessary.

Yes, because there's no need to separate them out.

>> +        else:
>> +            self.decl += decl
>> +            self.defn += defn
>> +    def visit_object_type(self, name, info, base, members, variants):
>> +        if info:
>> +            self.decl += gen_visit_decl(name)
>> +            if variants:
>> +                self.defn += gen_visit_union(name, base, variants)
>
> Worth adding 'assert not members'?

Yes, with a TODO, because it's an artificial restriction I'd like to
lift some day.

>> +            else:
>> +                self.defn += gen_visit_struct(name, base, members)
>
> Or maybe we can someday consolidate these two into a single
> gen_visit_object, that handles all members and variants in a uniform
> manner, instead of our current differences. I wonder how much C code
> would be impacted?

Yes, that's how we'd lift the restriction.

>> +    def visit_alternate_type(self, name, info, variants):
>> +        self.decl += gen_visit_decl(name)
>> +        self.defn += gen_visit_alternate(name, variants)
>>  
>>  do_builtins = False
>>  
>> @@ -442,56 +428,10 @@ fdecl.write(mcgen('''
>>  ''',
>>                    prefix=prefix))
>>  
>> -exprs = QAPISchema(input_file).get_exprs()
>> -
>> -# to avoid header dependency hell, we always generate declarations
>> -# for built-in types in our header files and simply guard them
>> -fdecl.write(guardstart("QAPI_VISIT_BUILTIN_VISITOR_DECL"))
>> -for typename in builtin_types.keys():
>> -    fdecl.write(generate_declaration(typename, builtin_type=True))
>> -fdecl.write(guardend("QAPI_VISIT_BUILTIN_VISITOR_DECL"))
>> -
>> -# ...this doesn't work for cases where we link in multiple objects that
>> -# have the functions defined, so we use -b option to provide control
>> -# over these cases
>> -if do_builtins:
>> -    for typename in builtin_types.keys():
>> -        fdef.write(generate_visit_list(typename))
>
> Again, a well-placed sorted() over these two loops in a pre-req patch
> will minimize the churn on the builtins.

Can try and see.

>> -
>> -for expr in exprs:
>> -    if expr.has_key('struct'):
>> -        ret = generate_visit_struct(expr)
>> -        ret += generate_visit_list(expr['struct'])
>> -        fdef.write(ret)
>> -
>> -        ret = generate_declaration(expr['struct'])
>> -        fdecl.write(ret)
>> -    elif expr.has_key('union'):
>> -        ret = generate_visit_union(expr)
>> -        ret += generate_visit_list(expr['union'])
>> -        fdef.write(ret)
>> -
>> -        enum_define = discriminator_find_enum_define(expr)
>> -        ret = ""
>> -        if not enum_define:
>> -            ret = generate_decl_enum('%sKind' % expr['union'])
>
> Nice that the new visitor automatically visits any implicit enum,
> without us having to special case it.

Making implicit stuff explicit in the internal representation kills
special cases and de-duplicates code :)

>> -        ret += generate_declaration(expr['union'])
>> -        fdecl.write(ret)
>> -    elif expr.has_key('alternate'):
>> -        ret = generate_visit_alternate(expr['alternate'], expr['data'])
>> -        ret += generate_visit_list(expr['alternate'])
>> -        fdef.write(ret)
>> -
>> -        ret = generate_decl_enum('%sKind' % expr['alternate'])
>> -        ret += generate_declaration(expr['alternate'])
>> -        fdecl.write(ret)
>> -    elif expr.has_key('enum'):
>> -        ret = generate_visit_list(expr['enum'])
>> -        ret += generate_visit_enum(expr['enum'])
>
> [1] swapping these two lines in a pre-req patch will minimize the churn
> of this conversion.

Can try and see.

>> -        fdef.write(ret)
>> -
>> -        ret = generate_decl_enum(expr['enum'])
>> -        ret += generate_enum_declaration(expr['enum'])
>> -        fdecl.write(ret)
>> +schema = QAPISchema(input_file)
>> +gen = QAPISchemaGenVisitVisitor()
>> +schema.visit(gen)
>> +fdef.write(gen.defn)
>> +fdecl.write(gen.decl)
>
> Again, overall impression is that your series is headed in the right
> direction.
>
> And nice that the TODOs in the testsuite pointed out what this fixes,
> for visiting indirect bases.

Thanks!
Eric Blake July 27, 2015, 7:01 p.m. UTC | #3
On 07/27/2015 11:53 AM, Markus Armbruster wrote:

>> Oh, and that means our generator has a collision bug that none of my
>> added tests have exposed yet: you cannot have a base class and
>> simultaneously add a member named 'base':
>>
>> { 'struct': 'Base', 'data': { 'i': 'int' } }
>> { 'struct': 'Sub', 'base': 'Base', 'data': { 'base': 'str' } }
>>
>> because the generated C code is trying to use the name 'base' for its
>> own purposes.
> 
> *sigh*
> 
>>                I guess that means more pre-req patches to the series to
>> expose the bug, and either tighten the parser to reject things for now
>> (easiest) or update the generator to not collide (harder, and fine for a
>> later series).
> 
> Yes.
> 
> Life would be easier if the original authors had adopted sane naming
> conventions from the start.

Like reserving ourselves a namespace based on single _ for internal use.
 We practically already have that - all user names either start with a
letter or double underscore, so we could use single (and triple)
underscore for internally-generated purposes, freeing up 'base',
'*Kind', '*_MAX', and other namespace abuses back to the user.  Well, we
may also need to reserve mid-name double-underscore (that is, the user
can only supply double underscore at the beginning, but not middle, of
an identifier).  Ah well, food for thought for later patches.


>> Okay, I see where you are using .flat from the initial parse.  I still
>> think it is a bit odd that you are defining '.flat' for each 'variant'
>> within 'variants', even though, for a given 'variants', all members will
>> have the same setting of '.flat'.  That makes me wonder if '.flat'
>> should belong instead to the top-level 'variants' struct rather than to
>> each 'variant' member.
> 
> Two reasons for putting flat where it is:
> 
> * The philosophical one: from the generator's point of view, there's no
>   fundamental reason why all variants need to be flat or none.  The
>   generator really doesn't care.

And we may decide to exploit that down the road to allow some sort of
qapi syntax for explicitly designating a union branch as flat or boxed,
rather than the current approach of the type of union determining the
status of all branch members.

> 
> * The pragmatic one (a.k.a. the real one): there are places where I use
>   v.flat, but don't have the variants owning v handy.
> 
>> But again I wonder what would happen if you had instead normalized the
>> input of simple unions into always having an implicit struct (with
>> single member 'data'), so that by the time you get here, you only have
>> to deal with a single representation of unions instead of having to
>> still emit different things for flat vs. simple (since on the wire, we
>> already proved simple is shorthand that can be duplicated by a flat union).
> 
> I hope we can get there!  But at this stage of the conversion, I want to
> minimize output change, and .flat makes preserving all its warts much
> easier.

Agreed.  By the end of the series, I was convinced that the use of
.flat, at least in this series, makes sense.


>>> +    disc_key = variants.tag_member.name
>>> +    if not variants.tag_name:
>>> +        # we pointlessly use a different key for simple unions
>>
>> We could fix that (as a separate patch); wonder how much C code it would
>> affect.  A lot of these things that we can alter in generated code are
>> certainly easier to see now that we have a clean generator :)
> 
> Yup, the warts stand out now.

And I've already demonstrated what sort of cleanups can be done to
attack some of the warts:
https://lists.gnu.org/archive/html/qemu-devel/2015-07/msg05266.html
Eric Blake July 27, 2015, 9:35 p.m. UTC | #4
On 07/01/2015 02:22 PM, Markus Armbruster wrote:
> Fixes flat unions to visit the base's base members (the previous
> commit merely added them to the struct).  Same test case.
> 
> Patch's effect on visit_type_UserDefFlatUnion():
> 
>      static void visit_type_UserDefFlatUnion_fields(Visitor *m, UserDefFlatUnion **obj, Error **errp)
>      {
>          Error *err = NULL;
> 
>     +    visit_type_int(m, &(*obj)->integer, "integer", &err);
>     +    if (err) {
>     +        goto out;
>     +    }
>          visit_type_str(m, &(*obj)->string, "string", &err);
>          if (err) {
>              goto out;
> 

> +def gen_visit_union(name, base, variants):
> +    ret = ''
>  
>      if base:
> -        assert discriminator
> -        base_fields = find_struct(base)['data'].copy()
> -        del base_fields[discriminator]
> -        ret += generate_visit_struct_fields(name, base_fields)
> +        members = [m for m in base.members if m != variants.tag_member]
> +        ret += generate_visit_struct_fields(name, members)

Ouch. This hurts.  If the same class is used as both the base class of a
flat union, and the base class of an ordinary struct, then the struct
tries to visit the base class, but no longer parses the field that the
union was using as its discriminator.

We don't have any code that demonstrates this, but probably should.  I
ran into it while working up my POC of what it would take to unbox
inherited structs (http://thread.gmane.org/gmane.comp.emulators.qemu/353204)

We want visit_FOO_fields to visit _all_ fields of the struct, no matter
who called generate_visit_struct_fields().  So what you must instead do
here is use the fact that we've already visited the discriminator...

>  
> -    if discriminator:
> -        for key in members:
> -            ret += generate_visit_implicit_struct(members[key])
> +    for var in variants.variants:
> +        if var.flat:
> +            ret += generate_visit_implicit_struct(var.type)
>  
>      ret += mcgen('''
>  
> @@ -300,41 +268,39 @@ void visit_type_%(c_name)s(Visitor *m, %(c_name)s **obj, const char *name, Error
>  ''',
>                       name=c_name(name))
>  
> -    if not discriminator:
> -        tag = 'kind'
> -        disc_key = "type"
> -    else:
> -        tag = discriminator
> -        disc_key = discriminator
> +    disc_key = variants.tag_member.name
> +    if not variants.tag_name:
> +        # we pointlessly use a different key for simple unions
> +        disc_key = 'type'
>      ret += mcgen('''
> -        visit_type_%(disc_type)s(m, &(*obj)->%(c_tag)s, "%(disc_key)s", &err);
> +        visit_type_%(disc_type)s(m, &(*obj)->%(c_name)s, "%(disc_key)s", &err);

...and omit this call if the flat union's base class already took care
of it.
Markus Armbruster July 28, 2015, 6:41 a.m. UTC | #5
Eric Blake <eblake@redhat.com> writes:

> On 07/27/2015 11:53 AM, Markus Armbruster wrote:
>
>>> Oh, and that means our generator has a collision bug that none of my
>>> added tests have exposed yet: you cannot have a base class and
>>> simultaneously add a member named 'base':
>>>
>>> { 'struct': 'Base', 'data': { 'i': 'int' } }
>>> { 'struct': 'Sub', 'base': 'Base', 'data': { 'base': 'str' } }
>>>
>>> because the generated C code is trying to use the name 'base' for its
>>> own purposes.
>> 
>> *sigh*
>> 
>>>                I guess that means more pre-req patches to the series to
>>> expose the bug, and either tighten the parser to reject things for now
>>> (easiest) or update the generator to not collide (harder, and fine for a
>>> later series).
>> 
>> Yes.
>> 
>> Life would be easier if the original authors had adopted sane naming
>> conventions from the start.
>
> Like reserving ourselves a namespace based on single _ for internal use.
>  We practically already have that - all user names either start with a
> letter or double underscore, so we could use single (and triple)
> underscore for internally-generated purposes, freeing up 'base',
> '*Kind', '*_MAX', and other namespace abuses back to the user.  Well, we
> may also need to reserve mid-name double-underscore (that is, the user
> can only supply double underscore at the beginning, but not middle, of
> an identifier).  Ah well, food for thought for later patches.

Another concern: we should take care not to generate reserved
identifiers.

* Potential issue with your proposal: identifiers that begin with an
  underscore and either an uppercase letter or another underscore are
  always reserved for any use.

* Existing issue: downstream extensions carry a __RFQDN_ prefix in the
  schema, which map to reserved C identifiers.

  Example: qapi-schema-test.json type '__org.qemu_x-Enum' generates

    typedef enum __org_qemu_x_Enum {
        ORG_QEMU_X_ENUM___ORG_QEMU_X_VALUE = 0,
        ORG_QEMU_X_ENUM_MAX = 1,
    } __org_qemu_x_Enum;

    extern const char *const __org_qemu_x_Enum_lookup[];

>>> Okay, I see where you are using .flat from the initial parse.  I still
>>> think it is a bit odd that you are defining '.flat' for each 'variant'
>>> within 'variants', even though, for a given 'variants', all members will
>>> have the same setting of '.flat'.  That makes me wonder if '.flat'
>>> should belong instead to the top-level 'variants' struct rather than to
>>> each 'variant' member.
>> 
>> Two reasons for putting flat where it is:
>> 
>> * The philosophical one: from the generator's point of view, there's no
>>   fundamental reason why all variants need to be flat or none.  The
>>   generator really doesn't care.
>
> And we may decide to exploit that down the road to allow some sort of
> qapi syntax for explicitly designating a union branch as flat or boxed,
> rather than the current approach of the type of union determining the
> status of all branch members.

I can't see a need now, but if one arises, we could do it.

>> * The pragmatic one (a.k.a. the real one): there are places where I use
>>   v.flat, but don't have the variants owning v handy.
>> 
>>> But again I wonder what would happen if you had instead normalized the
>>> input of simple unions into always having an implicit struct (with
>>> single member 'data'), so that by the time you get here, you only have
>>> to deal with a single representation of unions instead of having to
>>> still emit different things for flat vs. simple (since on the wire, we
>>> already proved simple is shorthand that can be duplicated by a flat union).
>> 
>> I hope we can get there!  But at this stage of the conversion, I want to
>> minimize output change, and .flat makes preserving all its warts much
>> easier.
>
> Agreed.  By the end of the series, I was convinced that the use of
> .flat, at least in this series, makes sense.

Good :)

>>>> +    disc_key = variants.tag_member.name
>>>> +    if not variants.tag_name:
>>>> +        # we pointlessly use a different key for simple unions
>>>
>>> We could fix that (as a separate patch); wonder how much C code it would
>>> affect.  A lot of these things that we can alter in generated code are
>>> certainly easier to see now that we have a clean generator :)
>> 
>> Yup, the warts stand out now.
>
> And I've already demonstrated what sort of cleanups can be done to
> attack some of the warts:
> https://lists.gnu.org/archive/html/qemu-devel/2015-07/msg05266.html

I only have time for a quick glance now.  It looks lovely!
Markus Armbruster July 28, 2015, 6:44 a.m. UTC | #6
Eric Blake <eblake@redhat.com> writes:

> On 07/01/2015 02:22 PM, Markus Armbruster wrote:
>> Fixes flat unions to visit the base's base members (the previous
>> commit merely added them to the struct).  Same test case.
>> 
>> Patch's effect on visit_type_UserDefFlatUnion():
>> 
>>      static void visit_type_UserDefFlatUnion_fields(Visitor *m, UserDefFlatUnion **obj, Error **errp)
>>      {
>>          Error *err = NULL;
>> 
>>     +    visit_type_int(m, &(*obj)->integer, "integer", &err);
>>     +    if (err) {
>>     +        goto out;
>>     +    }
>>          visit_type_str(m, &(*obj)->string, "string", &err);
>>          if (err) {
>>              goto out;
>> 
>
>> +def gen_visit_union(name, base, variants):
>> +    ret = ''
>>  
>>      if base:
>> -        assert discriminator
>> -        base_fields = find_struct(base)['data'].copy()
>> -        del base_fields[discriminator]
>> -        ret += generate_visit_struct_fields(name, base_fields)
>> +        members = [m for m in base.members if m != variants.tag_member]
>> +        ret += generate_visit_struct_fields(name, members)
>
> Ouch. This hurts.  If the same class is used as both the base class of a
> flat union, and the base class of an ordinary struct, then the struct
> tries to visit the base class, but no longer parses the field that the
> union was using as its discriminator.
>
> We don't have any code that demonstrates this, but probably should.  I
> ran into it while working up my POC of what it would take to unbox
> inherited structs (http://thread.gmane.org/gmane.comp.emulators.qemu/353204)

Is this broken in master, or do my patches break it?

Got a reproducer?

[...]
Eric Blake July 28, 2015, 2:46 p.m. UTC | #7
On 07/28/2015 12:41 AM, Markus Armbruster wrote:
>> Like reserving ourselves a namespace based on single _ for internal use.
>>  We practically already have that - all user names either start with a
>> letter or double underscore, so we could use single (and triple)
>> underscore for internally-generated purposes, freeing up 'base',
>> '*Kind', '*_MAX', and other namespace abuses back to the user.  Well, we
>> may also need to reserve mid-name double-underscore (that is, the user
>> can only supply double underscore at the beginning, but not middle, of
>> an identifier).  Ah well, food for thought for later patches.
> 
> Another concern: we should take care not to generate reserved
> identifiers.

And we do that for a number of identifiers already, by renaming
'default' in qapi to 'q_default' in C.

> 
> * Potential issue with your proposal: identifiers that begin with an
>   underscore and either an uppercase letter or another underscore are
>   always reserved for any use.

True, so only underscore and lower is guaranteed safe. But in practice,
we can often get away with more...

> 
> * Existing issue: downstream extensions carry a __RFQDN_ prefix in the
>   schema, which map to reserved C identifiers.

...the whole point of the __RFQDN_ prefix was that it is very unlikely
to conflict with any vendor extensions (unless you happen to be the same
vendor that writes the compiler and als use __RFQDN_ as part of your
compiler implementation).  Yes, it's risky, but a risk that hasn't hurt
any downstream clients yet :)  It's one of those things where if someone
reports an actual problem in their environment, we'll fix it; but until
then, it's not worth the headache of strict compliance to "fix"
something that happens to work in spite of being undefined behavior.
Eric Blake July 28, 2015, 8:41 p.m. UTC | #8
On 07/28/2015 12:44 AM, Markus Armbruster wrote:

>>
>>> +def gen_visit_union(name, base, variants):
>>> +    ret = ''
>>>  
>>>      if base:
>>> -        assert discriminator
>>> -        base_fields = find_struct(base)['data'].copy()
>>> -        del base_fields[discriminator]
>>> -        ret += generate_visit_struct_fields(name, base_fields)
>>> +        members = [m for m in base.members if m != variants.tag_member]
>>> +        ret += generate_visit_struct_fields(name, members)
>>
>> Ouch. This hurts.  If the same class is used as both the base class of a
>> flat union, and the base class of an ordinary struct, then the struct
>> tries to visit the base class, but no longer parses the field that the
>> union was using as its discriminator.
>>
>> We don't have any code that demonstrates this, but probably should.  I
>> ran into it while working up my POC of what it would take to unbox
>> inherited structs (http://thread.gmane.org/gmane.comp.emulators.qemu/353204)
> 
> Is this broken in master, or do my patches break it?
> 
> Got a reproducer?

Turns out I'm mistaken; we got lucky.  The call to
generate_visit_struct_fields() creates a function for 'name' (aka the
union name), and not for 'base' (aka the class name that owns the
fields).  So even if we have Base as a common struct between Child and
Union, the code emitted for Child generates visit_Base_fields(), while
the code emitted for Union generates visit_Union_fields().

So there is no reproducer, but _only_ as long as we reject unions as a
base class for any other object.  And there is redundancy: we could
reuse visit_Base_fields() for the sake of the union, then avoid
(re-)visiting the discriminator, and we would no longer need to emit
visit_Union_fields().  But I can do that as part of the followup
cleanups; since I don't see anything broken with your patch, we don't
have to worry about it during this series.
Markus Armbruster July 29, 2015, 7:59 a.m. UTC | #9
Eric Blake <eblake@redhat.com> writes:

> On 07/28/2015 12:41 AM, Markus Armbruster wrote:
>>> Like reserving ourselves a namespace based on single _ for internal use.
>>>  We practically already have that - all user names either start with a
>>> letter or double underscore, so we could use single (and triple)
>>> underscore for internally-generated purposes, freeing up 'base',
>>> '*Kind', '*_MAX', and other namespace abuses back to the user.  Well, we
>>> may also need to reserve mid-name double-underscore (that is, the user
>>> can only supply double underscore at the beginning, but not middle, of
>>> an identifier).  Ah well, food for thought for later patches.
>> 
>> Another concern: we should take care not to generate reserved
>> identifiers.
>
> And we do that for a number of identifiers already, by renaming
> 'default' in qapi to 'q_default' in C.
>
>> 
>> * Potential issue with your proposal: identifiers that begin with an
>>   underscore and either an uppercase letter or another underscore are
>>   always reserved for any use.
>
> True, so only underscore and lower is guaranteed safe.

And we'd get that for most names.  Only names we want to shout (macros,
enumeration constants) could result in a problematic underscore and
uppercase letter.

Note that we *strip* leading underscores from enumeration constants.
Example (from qapi-schema-test.json):

    { 'enum': '__org.qemu_x-Enum', 'data': [ '__org.qemu_x-value' ] }

generates

    typedef enum __org_qemu_x_Enum
    {
        ORG_QEMU_X_ENUM___ORG_QEMU_X_VALUE = 0,
        ORG_QEMU_X_ENUM_MAX = 1,
    } __org_qemu_x_Enum;

Now add this one:

    { 'enum': 'org_qemu_x-Enum', 'data': [ ] }

Just fine according to our documentation.  But of course it clashes:

    typedef enum org_qemu_x_Enum
    {
        ORG_QEMU_X_ENUM_MAX = 0,
    } org_qemu_x_Enum;

Short term, we should note in qapi-code-gen.txt: different names in the
schema can nevertheless clash in generated C, and when that happens, you
get to pick non-clashing names.

Longer term, we may want to rethink how we map names to C.

> we can often get away with more...
>
>> 
>> * Existing issue: downstream extensions carry a __RFQDN_ prefix in the
>>   schema, which map to reserved C identifiers.
>
> ...the whole point of the __RFQDN_ prefix was that it is very unlikely
> to conflict with any vendor extensions (unless you happen to be the same
> vendor that writes the compiler and als use __RFQDN_ as part of your
> compiler implementation).  Yes, it's risky, but a risk that hasn't hurt
> any downstream clients yet :)  It's one of those things where if someone
> reports an actual problem in their environment, we'll fix it; but until
> then, it's not worth the headache of strict compliance to "fix"
> something that happens to work in spite of being undefined behavior.

I'm not religious about reserved names, but this intrusion into the
reserved name space was entirely avoidable: we could've just as well
picked a non-reserved prefix.
Markus Armbruster July 29, 2015, 8 a.m. UTC | #10
Eric Blake <eblake@redhat.com> writes:

> On 07/28/2015 12:44 AM, Markus Armbruster wrote:
>
>>>
>>>> +def gen_visit_union(name, base, variants):
>>>> +    ret = ''
>>>>  
>>>>      if base:
>>>> -        assert discriminator
>>>> -        base_fields = find_struct(base)['data'].copy()
>>>> -        del base_fields[discriminator]
>>>> -        ret += generate_visit_struct_fields(name, base_fields)
>>>> +        members = [m for m in base.members if m != variants.tag_member]
>>>> +        ret += generate_visit_struct_fields(name, members)
>>>
>>> Ouch. This hurts.  If the same class is used as both the base class of a
>>> flat union, and the base class of an ordinary struct, then the struct
>>> tries to visit the base class, but no longer parses the field that the
>>> union was using as its discriminator.
>>>
>>> We don't have any code that demonstrates this, but probably should.  I
>>> ran into it while working up my POC of what it would take to unbox
>>> inherited structs (http://thread.gmane.org/gmane.comp.emulators.qemu/353204)
>> 
>> Is this broken in master, or do my patches break it?
>> 
>> Got a reproducer?
>
> Turns out I'm mistaken; we got lucky.  The call to
> generate_visit_struct_fields() creates a function for 'name' (aka the
> union name), and not for 'base' (aka the class name that owns the
> fields).  So even if we have Base as a common struct between Child and
> Union, the code emitted for Child generates visit_Base_fields(), while
> the code emitted for Union generates visit_Union_fields().
>
> So there is no reproducer, but _only_ as long as we reject unions as a
> base class for any other object.  And there is redundancy: we could
> reuse visit_Base_fields() for the sake of the union, then avoid
> (re-)visiting the discriminator, and we would no longer need to emit
> visit_Union_fields().  But I can do that as part of the followup
> cleanups; since I don't see anything broken with your patch, we don't
> have to worry about it during this series.

Good.  Thank you!
Eric Blake July 29, 2015, 4:56 p.m. UTC | #11
On 07/29/2015 02:00 AM, Markus Armbruster wrote:
>>>> We don't have any code that demonstrates this, but probably should.  I
>>>> ran into it while working up my POC of what it would take to unbox
>>>> inherited structs (http://thread.gmane.org/gmane.comp.emulators.qemu/353204)
>>>
>>> Is this broken in master, or do my patches break it?
>>>
>>> Got a reproducer?
>>
>> Turns out I'm mistaken; we got lucky.  The call to
>> generate_visit_struct_fields() creates a function for 'name' (aka the
>> union name), and not for 'base' (aka the class name that owns the
>> fields).  So even if we have Base as a common struct between Child and
>> Union, the code emitted for Child generates visit_Base_fields(), while
>> the code emitted for Union generates visit_Union_fields().
>>
>> So there is no reproducer, but _only_ as long as we reject unions as a
>> base class for any other object.  And there is redundancy: we could
>> reuse visit_Base_fields() for the sake of the union, then avoid
>> (re-)visiting the discriminator, and we would no longer need to emit
>> visit_Union_fields().  But I can do that as part of the followup
>> cleanups; since I don't see anything broken with your patch, we don't
>> have to worry about it during this series.
> 
> Good.  Thank you!

And here's a followup patch that cleans it up:
http://thread.gmane.org/gmane.comp.emulators.qemu/353204/focus=353728
diff mbox

Patch

diff --git a/scripts/qapi-visit.py b/scripts/qapi-visit.py
index a52a572..135e7c1 100644
--- a/scripts/qapi-visit.py
+++ b/scripts/qapi-visit.py
@@ -12,7 +12,6 @@ 
 # This work is licensed under the terms of the GNU GPL, version 2.
 # See the COPYING file in the top-level directory.
 
-from ordereddict import OrderedDict
 from qapi import *
 import re
 
@@ -24,13 +23,13 @@  def generate_visit_implicit_struct(type):
         return ''
     implicit_structs_seen.add(type)
     ret = ''
-    if type not in struct_fields_seen:
+    if type.name not in struct_fields_seen:
         # Need a forward declaration
         ret += mcgen('''
 
 static void visit_type_%(c_type)s_fields(Visitor *m, %(c_type)s **obj, Error **errp);
 ''',
-                     c_type=type_name(type))
+                     c_type=type.c_name())
 
     ret += mcgen('''
 
@@ -46,7 +45,7 @@  static void visit_type_implicit_%(c_type)s(Visitor *m, %(c_type)s **obj, Error *
     error_propagate(errp, err);
 }
 ''',
-                 c_type=type_name(type))
+                 c_type=type.c_name())
     return ret
 
 def generate_visit_struct_fields(name, members, base = None):
@@ -74,24 +73,24 @@  if (err) {
     goto out;
 }
 ''',
-                     type=type_name(base), c_name=c_name('base'))
+                     type=base.c_name(), c_name=c_name('base'))
 
-    for argname, argentry, optional in parse_args(members):
-        if optional:
+    for memb in members:
+        if memb.optional:
             ret += mcgen('''
 visit_optional(m, &(*obj)->has_%(c_name)s, "%(name)s", &err);
 if (!err && (*obj)->has_%(c_name)s) {
 ''',
-                         c_name=c_name(argname), name=argname)
+                         c_name=c_name(memb.name), name=memb.name)
             push_indent()
 
         ret += mcgen('''
 visit_type_%(type)s(m, &(*obj)->%(c_name)s, "%(name)s", &err);
 ''',
-                     type=type_name(argentry), c_name=c_name(argname),
-                     name=argname)
+                     type=memb.type.c_name(), c_name=c_name(memb.name),
+                     name=memb.name)
 
-        if optional:
+        if memb.optional:
             pop_indent()
             ret += mcgen('''
 }
@@ -132,12 +131,7 @@  def generate_visit_struct_body(name):
 
     return ret
 
-def generate_visit_struct(expr):
-
-    name = expr['struct']
-    members = expr['data']
-    base = expr.get('base')
-
+def gen_visit_struct(name, base, members):
     ret = generate_visit_struct_fields(name, members, base)
 
     ret += mcgen('''
@@ -154,10 +148,10 @@  void visit_type_%(name)s(Visitor *m, %(name)s **obj, const char *name, Error **e
 ''')
     return ret
 
-def generate_visit_list(name):
+def gen_visit_list(name, element_type):
     return mcgen('''
 
-void visit_type_%(name)sList(Visitor *m, %(name)sList **obj, const char *name, Error **errp)
+void visit_type_%(name)s(Visitor *m, %(name)s **obj, const char *name, Error **errp)
 {
     Error *err = NULL;
     GenericList *i, **prev;
@@ -170,8 +164,8 @@  void visit_type_%(name)sList(Visitor *m, %(name)sList **obj, const char *name, E
     for (prev = (GenericList **)obj;
          !err && (i = visit_next_list(m, prev, &err)) != NULL;
          prev = &i) {
-        %(name)sList *native_i = (%(name)sList *)i;
-        visit_type_%(name)s(m, &native_i->value, NULL, &err);
+        %(name)s *native_i = (%(name)s *)i;
+        visit_type_%(c_elt_type)s(m, &native_i->value, NULL, &err);
     }
 
     error_propagate(errp, err);
@@ -181,7 +175,8 @@  out:
     error_propagate(errp, err);
 }
 ''',
-                name=type_name(name))
+                 name=c_name(name),
+                 c_elt_type=element_type.c_name())
 
 def generate_visit_enum(name):
     return mcgen('''
@@ -193,7 +188,7 @@  void visit_type_%(c_name)s(Visitor *m, %(c_name)s *obj, const char *name, Error
 ''',
                  c_name=c_name(name), name=name)
 
-def generate_visit_alternate(name, members):
+def gen_visit_alternate(name, variants):
     ret = mcgen('''
 
 void visit_type_%(name)s(Visitor *m, %(name)s **obj, const char *name, Error **errp)
@@ -212,25 +207,17 @@  void visit_type_%(name)s(Visitor *m, %(name)s **obj, const char *name, Error **e
 ''',
                 name=c_name(name))
 
-    # For alternate, always use the default enum type automatically generated
-    # as name + 'Kind'
-    disc_type = c_name(name) + 'Kind'
-
-    for key in members:
-        assert (members[key] in builtin_types.keys()
-            or find_struct(members[key])
-            or find_union(members[key])
-            or find_enum(members[key])), "Invalid alternate member"
-
-        enum_full_value = c_enum_const(disc_type, key)
+    for var in variants.variants:
+        enum_full_value = c_enum_const(variants.tag_member.type.name,
+                                       var.name)
         ret += mcgen('''
     case %(enum_full_value)s:
         visit_type_%(c_type)s(m, &(*obj)->%(c_name)s, name, &err);
         break;
 ''',
                 enum_full_value = enum_full_value,
-                c_type = type_name(members[key]),
-                c_name = c_name(key))
+                c_type=var.type.c_name(),
+                c_name=c_name(var.name))
 
     ret += mcgen('''
     default:
@@ -247,35 +234,16 @@  out:
 
     return ret
 
-
-def generate_visit_union(expr):
-
-    name = expr['union']
-    members = expr['data']
-
-    base = expr.get('base')
-    discriminator = expr.get('discriminator')
-
-    enum_define = discriminator_find_enum_define(expr)
-    if enum_define:
-        # Use the enum type as discriminator
-        ret = ""
-        disc_type = c_name(enum_define['enum_name'])
-    else:
-        # There will always be a discriminator in the C switch code, by default
-        # it is an enum type generated silently
-        ret = generate_visit_enum(name + 'Kind')
-        disc_type = c_name(name) + 'Kind'
+def gen_visit_union(name, base, variants):
+    ret = ''
 
     if base:
-        assert discriminator
-        base_fields = find_struct(base)['data'].copy()
-        del base_fields[discriminator]
-        ret += generate_visit_struct_fields(name, base_fields)
+        members = [m for m in base.members if m != variants.tag_member]
+        ret += generate_visit_struct_fields(name, members)
 
-    if discriminator:
-        for key in members:
-            ret += generate_visit_implicit_struct(members[key])
+    for var in variants.variants:
+        if var.flat:
+            ret += generate_visit_implicit_struct(var.type)
 
     ret += mcgen('''
 
@@ -300,41 +268,39 @@  void visit_type_%(c_name)s(Visitor *m, %(c_name)s **obj, const char *name, Error
 ''',
                      name=c_name(name))
 
-    if not discriminator:
-        tag = 'kind'
-        disc_key = "type"
-    else:
-        tag = discriminator
-        disc_key = discriminator
+    disc_key = variants.tag_member.name
+    if not variants.tag_name:
+        # we pointlessly use a different key for simple unions
+        disc_key = 'type'
     ret += mcgen('''
-        visit_type_%(disc_type)s(m, &(*obj)->%(c_tag)s, "%(disc_key)s", &err);
+        visit_type_%(disc_type)s(m, &(*obj)->%(c_name)s, "%(disc_key)s", &err);
         if (err) {
             goto out_obj;
         }
         if (!visit_start_union(m, !!(*obj)->data, &err) || err) {
             goto out_obj;
         }
-        switch ((*obj)->%(c_tag)s) {
+        switch ((*obj)->%(c_name)s) {
 ''',
-                 disc_type = disc_type,
-                 c_tag=c_name(tag),
+                 disc_type=variants.tag_member.type.c_name(),
+                 c_name=c_name(variants.tag_member.name),
                  disc_key = disc_key)
 
-    for key in members:
-        if not discriminator:
+    for var in variants.variants:
+        if not var.flat:
             fmt = 'visit_type_%(c_type)s(m, &(*obj)->%(c_name)s, "data", &err);'
         else:
             fmt = 'visit_type_implicit_%(c_type)s(m, &(*obj)->%(c_name)s, &err);'
 
-        enum_full_value = c_enum_const(disc_type, key)
+        enum_full_value = c_enum_const(variants.tag_member.type.name, var.name)
         ret += mcgen('''
         case %(enum_full_value)s:
             ''' + fmt + '''
             break;
 ''',
                 enum_full_value = enum_full_value,
-                c_type=type_name(members[key]),
-                c_name=c_name(key))
+                c_type=var.type.c_name(),
+                c_name=c_name(var.name))
 
     ret += mcgen('''
         default:
@@ -355,37 +321,57 @@  out:
 
     return ret
 
-def generate_declaration(name, builtin_type=False):
-    ret = ""
-    if not builtin_type:
-        name = c_name(name)
-        ret += mcgen('''
-
-void visit_type_%(name)s(Visitor *m, %(name)s **obj, const char *name, Error **errp);
-''',
-                     name=name)
-
-    ret += mcgen('''
-void visit_type_%(name)sList(Visitor *m, %(name)sList **obj, const char *name, Error **errp);
-''',
-                 name=name)
-
-    return ret
-
-def generate_enum_declaration(name):
-    ret = mcgen('''
-void visit_type_%(name)sList(Visitor *m, %(name)sList **obj, const char *name, Error **errp);
-''',
-                name=c_name(name))
-
-    return ret
-
-def generate_decl_enum(name):
+def gen_visit_decl(name, scalar=False):
+    c_type = c_name(name) + ' *'
+    if not scalar:
+        c_type += '*'
     return mcgen('''
-
-void visit_type_%(name)s(Visitor *m, %(name)s *obj, const char *name, Error **errp);
+void visit_type_%(c_name)s(Visitor *m, %(c_type)sobj, const char *name, Error **errp);
 ''',
-                 name=c_name(name))
+                 c_name=c_name(name), c_type=c_type)
+
+class QAPISchemaGenVisitVisitor(QAPISchemaVisitor):
+    def __init__(self):
+        self.decl = None
+        self.defn = None
+        self.btin = None
+    def visit_begin(self):
+        self.decl = ''
+        self.defn = ''
+        self.btin = guardstart('QAPI_VISIT_BUILTIN_VISITOR_DECL')
+    def visit_end(self):
+        # to avoid header dependency hell, we always generate
+        # declarations for built-in types in our header files and
+        # simply guard them
+        self.btin += guardend('QAPI_VISIT_BUILTIN_VISITOR_DECL')
+        self.decl = self.btin + self.decl
+        self.btin = None
+        # ...this doesn't work for cases where we link in multiple
+        # objects that have the functions defined, so we use
+        # do_builtins (option -b) to provide control
+    def visit_enum_type(self, name, info, values):
+        self.decl += gen_visit_decl(name, scalar=True)
+        self.defn += generate_visit_enum(name)
+    def visit_array_type(self, name, info, element_type):
+        decl = gen_visit_decl(name)
+        defn = gen_visit_list(name, element_type)
+        if isinstance(element_type, QAPISchemaBuiltinType):
+            self.btin += decl
+            if do_builtins:
+                self.defn += defn
+        else:
+            self.decl += decl
+            self.defn += defn
+    def visit_object_type(self, name, info, base, members, variants):
+        if info:
+            self.decl += gen_visit_decl(name)
+            if variants:
+                self.defn += gen_visit_union(name, base, variants)
+            else:
+                self.defn += gen_visit_struct(name, base, members)
+    def visit_alternate_type(self, name, info, variants):
+        self.decl += gen_visit_decl(name)
+        self.defn += gen_visit_alternate(name, variants)
 
 do_builtins = False
 
@@ -442,56 +428,10 @@  fdecl.write(mcgen('''
 ''',
                   prefix=prefix))
 
-exprs = QAPISchema(input_file).get_exprs()
-
-# to avoid header dependency hell, we always generate declarations
-# for built-in types in our header files and simply guard them
-fdecl.write(guardstart("QAPI_VISIT_BUILTIN_VISITOR_DECL"))
-for typename in builtin_types.keys():
-    fdecl.write(generate_declaration(typename, builtin_type=True))
-fdecl.write(guardend("QAPI_VISIT_BUILTIN_VISITOR_DECL"))
-
-# ...this doesn't work for cases where we link in multiple objects that
-# have the functions defined, so we use -b option to provide control
-# over these cases
-if do_builtins:
-    for typename in builtin_types.keys():
-        fdef.write(generate_visit_list(typename))
-
-for expr in exprs:
-    if expr.has_key('struct'):
-        ret = generate_visit_struct(expr)
-        ret += generate_visit_list(expr['struct'])
-        fdef.write(ret)
-
-        ret = generate_declaration(expr['struct'])
-        fdecl.write(ret)
-    elif expr.has_key('union'):
-        ret = generate_visit_union(expr)
-        ret += generate_visit_list(expr['union'])
-        fdef.write(ret)
-
-        enum_define = discriminator_find_enum_define(expr)
-        ret = ""
-        if not enum_define:
-            ret = generate_decl_enum('%sKind' % expr['union'])
-        ret += generate_declaration(expr['union'])
-        fdecl.write(ret)
-    elif expr.has_key('alternate'):
-        ret = generate_visit_alternate(expr['alternate'], expr['data'])
-        ret += generate_visit_list(expr['alternate'])
-        fdef.write(ret)
-
-        ret = generate_decl_enum('%sKind' % expr['alternate'])
-        ret += generate_declaration(expr['alternate'])
-        fdecl.write(ret)
-    elif expr.has_key('enum'):
-        ret = generate_visit_list(expr['enum'])
-        ret += generate_visit_enum(expr['enum'])
-        fdef.write(ret)
-
-        ret = generate_decl_enum(expr['enum'])
-        ret += generate_enum_declaration(expr['enum'])
-        fdecl.write(ret)
+schema = QAPISchema(input_file)
+gen = QAPISchemaGenVisitVisitor()
+schema.visit(gen)
+fdef.write(gen.defn)
+fdecl.write(gen.decl)
 
 close_output(fdef, fdecl)
diff --git a/tests/qapi-schema/qapi-schema-test.json b/tests/qapi-schema/qapi-schema-test.json
index 257b4d4..b182174 100644
--- a/tests/qapi-schema/qapi-schema-test.json
+++ b/tests/qapi-schema/qapi-schema-test.json
@@ -39,8 +39,6 @@ 
   'data': { 'value1' : 'UserDefA',
             'value2' : 'UserDefB',
             'value3' : 'UserDefB' } }
-# FIXME generated visit_type_UserDefFlatUnion_fields() fails to visit
-# members of indirect base UserDefZero
 
 { 'struct': 'UserDefUnionBase',
   'base': 'UserDefZero',
@@ -57,7 +55,6 @@ 
 
 { 'alternate': 'UserDefAlternate',
   'data': { 'uda': 'UserDefA', 's': 'str', 'i': 'int' } }
-# FIXME only a declaration of visit_type_UserDefAlternateKind() generated
 
 { 'struct': 'UserDefC',
   'data': { 'string1': 'str', 'string2': 'str' } }
diff --git a/tests/test-qmp-input-strict.c b/tests/test-qmp-input-strict.c
index 68f855b..00c3e29 100644
--- a/tests/test-qmp-input-strict.c
+++ b/tests/test-qmp-input-strict.c
@@ -167,9 +167,9 @@  static void test_validate_union_flat(TestInputVisitorData *data,
 
     v = validate_test_init(data,
                            "{ 'enum1': 'value1', "
+                           "'integer': 41, "
                            "'string': 'str', "
                            "'boolean': true }");
-    /* TODO when generator bug is fixed, add 'integer': 41 */
 
     visit_type_UserDefFlatUnion(v, &tmp, NULL, &err);
     g_assert(!err);
diff --git a/tests/test-qmp-input-visitor.c b/tests/test-qmp-input-visitor.c
index b7a87ee..56da3c6 100644
--- a/tests/test-qmp-input-visitor.c
+++ b/tests/test-qmp-input-visitor.c
@@ -307,15 +307,15 @@  static void test_visitor_in_union_flat(TestInputVisitorData *data,
 
     v = visitor_input_test_init(data,
                                 "{ 'enum1': 'value1', "
+                                "'integer': 41, "
                                 "'string': 'str', "
                                 "'boolean': true }");
-    /* TODO when generator bug is fixed, add 'integer': 41 */
 
     visit_type_UserDefFlatUnion(v, &tmp, NULL, &err);
     g_assert(err == NULL);
     g_assert_cmpint(tmp->enum1, ==, ENUM_ONE_VALUE1);
     g_assert_cmpstr(tmp->string, ==, "str");
-    /* TODO g_assert_cmpint(tmp->integer, ==, 41); */
+    g_assert_cmpint(tmp->integer, ==, 41);
     g_assert_cmpint(tmp->value1->boolean, ==, true);
     qapi_free_UserDefFlatUnion(tmp);
 }