diff mbox

[RFC,v3,02/32] qapi: New QAPISchema intermediate reperesentation

Message ID 1438703896-12553-3-git-send-email-armbru@redhat.com
State New
Headers show

Commit Message

Markus Armbruster Aug. 4, 2015, 3:57 p.m. UTC
The QAPI code generators work with a syntax tree (nested dictionaries)
plus a few symbol tables (also dictionaries) on the side.

They have clearly outgrown these simple data structures.  There's lots
of rummaging around in dictionaries, and information is recomputed on
the fly.  For the work I'm going to do, I want more clearly defined
and more convenient interfaces.

Going forward, I also want less coupling between the back-ends and the
syntax tree, to make messing with the syntax easier.

Create a bunch of classes to represent QAPI schemata.

Have the QAPISchema initializer call the parser, then walk the syntax
tree to create the new internal representation, and finally perform
semantic analysis.

Shortcut: the semantic analysis still relies on existing check_exprs()
to do the actual semantic checking.  All this code needs to move into
the classes.  Mark as TODO.

We generate array types eagerly, even though most of them aren't used.
Mark as TODO.

Nothing uses the new intermediate representation just yet, thus no
change to generated files.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 scripts/qapi-commands.py       |   2 +-
 scripts/qapi-event.py          |   2 +-
 scripts/qapi-types.py          |   2 +-
 scripts/qapi-visit.py          |   2 +-
 scripts/qapi.py                | 361 ++++++++++++++++++++++++++++++++++++++++-
 tests/qapi-schema/test-qapi.py |   2 +-
 6 files changed, 357 insertions(+), 14 deletions(-)

Comments

Eric Blake Aug. 4, 2015, 9:53 p.m. UTC | #1
On 08/04/2015 09:57 AM, Markus Armbruster wrote:
> The QAPI code generators work with a syntax tree (nested dictionaries)
> plus a few symbol tables (also dictionaries) on the side.
> 
> They have clearly outgrown these simple data structures.  There's lots
> of rummaging around in dictionaries, and information is recomputed on
> the fly.  For the work I'm going to do, I want more clearly defined
> and more convenient interfaces.
> 
> Going forward, I also want less coupling between the back-ends and the
> syntax tree, to make messing with the syntax easier.
> 
> Create a bunch of classes to represent QAPI schemata.
> 
> Have the QAPISchema initializer call the parser, then walk the syntax
> tree to create the new internal representation, and finally perform
> semantic analysis.
> 
> Shortcut: the semantic analysis still relies on existing check_exprs()
> to do the actual semantic checking.  All this code needs to move into
> the classes.  Mark as TODO.
> 
> We generate array types eagerly, even though most of them aren't used.
> Mark as TODO.

I'm not sure if there are any array types that the rest of the code base
uses even though it doesn't appear as a directly used type within the
schemata.  Perhaps some of the builtin types (for example, if qom-get
needs to return ['uint8']). Besides builtins, maybe we can add some sort
of 'needs-array':'bool' key to each 'struct'/'union'/'enum'/'alternate',
which defaults to true only if the schema refers to the array type, but
which can be explicitly set to true to force the array type generation
even without a schema reference.  But as it is all properly marked TODO,
the idle ramblings in this paragraph don't affect review.

> 
> Nothing uses the new intermediate representation just yet, thus no
> change to generated files.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  scripts/qapi-commands.py       |   2 +-
>  scripts/qapi-event.py          |   2 +-
>  scripts/qapi-types.py          |   2 +-
>  scripts/qapi-visit.py          |   2 +-
>  scripts/qapi.py                | 361 ++++++++++++++++++++++++++++++++++++++++-
>  tests/qapi-schema/test-qapi.py |   2 +-
>  6 files changed, 357 insertions(+), 14 deletions(-)
> 

> +class QAPISchemaEnumType(QAPISchemaType):
> +    def __init__(self, name, info, values):
> +        QAPISchemaType.__init__(self, name, info)
> +        for v in values:
> +            assert isinstance(v, str)
> +        self.values = values
> +    def check(self, schema):
> +        assert len(set(self.values)) == len(self.values)

Doesn't check whether any of the distinct values map to the same C name.
 But not a show-stopper to this patch (the earlier semantic checking in
check_exprs() covers it, and your TODO about moving those checks here at
a later date is the right time to worry about it here).

> +
> +class QAPISchemaArrayType(QAPISchemaType):
> +    def __init__(self, name, info, element_type):
> +        QAPISchemaType.__init__(self, name, info)
> +        assert isinstance(element_type, str)
> +        self.element_type_name = element_type
> +        self.element_type = None
> +    def check(self, schema):
> +        self.element_type = schema.lookup_type(self.element_type_name)
> +        assert self.element_type

Is it worth adding:

assert not isinstance(self.element_type, QAPISchemaArrayType)

since we don't allow 2D arrays?

> +
> +class QAPISchemaObjectType(QAPISchemaType):
> +    def __init__(self, name, info, base, local_members, variants):
> +        QAPISchemaType.__init__(self, name, info)
> +        assert base == None or isinstance(base, str)
> +        for m in local_members:
> +            assert isinstance(m, QAPISchemaObjectTypeMember)
> +        if variants != None:
> +            assert isinstance(variants, QAPISchemaObjectTypeVariants)

Style nit: the 'base' and 'variants' checks are identical patterns
(checking for None or specific type), but only one uses an 'if'.
Possibly because of line-length issues, though, so I can live with it.

> +
> +class QAPISchemaObjectTypeVariant(QAPISchemaObjectTypeMember):
> +    def __init__(self, name, typ):
> +        QAPISchemaObjectTypeMember.__init__(self, name, typ, False)
> +    def check(self, schema, tag_type, seen):
> +        QAPISchemaObjectTypeMember.check(self, schema, [], seen)
> +        assert self.name in tag_type.values
> +    # TODO try to get rid of .simple_union_type()
> +    def simple_union_type(self):
> +        if isinstance(self.type, QAPISchemaObjectType) and not self.type.info:
> +            assert len(self.type.members) == 1
> +            assert not self.type.variants # not implemented

and probably never will be (looks like this assert is copy-and-pasted
from other locations where it makes sense that we might implement
support for variants, but I don't see it ever happening for the
generated ':obj-*-wrapper' type for the branch of a simple union)

At any rate, I concur that we have a difference in the generated code
for simple unions compared to flat unions (even if simple unions could
be rewritten in terms of a flat union from the QMP side, the generated C
side is not the same), so I agree that you need this function for now,
as well as the comment for if we later try to clean up the code base to
drop that difference.

> +class QAPISchema(object):

> +    def _make_array_type(self, element_type):
> +        name = element_type + 'List'
> +        if not self.lookup_type(name):
> +            self._def_entity(QAPISchemaArrayType(name, None, element_type))
> +        return name

Hmm - we probably have collisions if a user tries to explicitly name a
'struct' or other type with a 'List' suffix.  Not made worse by this
patch and not an actual problem with any of our existing .json files, so
we can save it for another day.

> +
> +    def _make_members(self, data):
> +        return [self._make_member(key, data[key]) for key in data.keys()]

You could write this as:

return [self._make_member(key, value) for (key, value) in data.iteritems()]

for fewer copies and lookups (dict.keys() and dict.items() creates
copies, while dict.iteritems() visits key/value pairs in place).

I don't care strongly enough to hold up review, whether or not you
change it.

> +
> +    def _def_union_type(self, expr, info):
> +        name = expr['union']
> +        data = expr['data']
> +        base = expr.get('base')
> +        tag_name = expr.get('discriminator')
> +        tag_enum = None
> +        if tag_name:
> +            variants = [self._make_variant(key, data[key])
> +                        for key in data.keys()]
> +        else:
> +            variants = [self._make_simple_variant(key, data[key])
> +                        for key in data.keys()]

Same comments about the possibility for writing these list
comprehensions more efficiently.  [and "list comprehensions" is my
latest bit of cool python language terminology that I learned the name
of, in order to do this review]

> +    def _def_command(self, expr, info):
> +        name = expr['command']
> +        data = expr.get('data')
> +        rets = expr.get('returns')
> +        gen = expr.get('gen', True)
> +        success_response = expr.get('success-response', True)
> +        if isinstance(data, OrderedDict):
> +            data = self._make_implicit_object_type(name, 'arg',
> +                                                   self._make_members(data))
> +        if isinstance(rets, list):
> +            assert len(rets) == 1
> +            rets = self._make_array_type(rets[0])
> +        elif isinstance(rets, OrderedDict):
> +            rets = self._make_implicit_object_type(name, 'ret',
> +                                                   self._make_members(rets))

Dead 'elif' branch (since you reject dictionaries in "[PATCH 21/26]
qapi: Command returning anonymous type doesn't work, outlaw")

Looks like there will be some minor tweaks when you drop the RFC, but
it's close enough at this point that I'm okay with:

Reviewed-by: Eric Blake <eblake@redhat.com>
Markus Armbruster Aug. 5, 2015, 6:23 a.m. UTC | #2
Eric Blake <eblake@redhat.com> writes:

> On 08/04/2015 09:57 AM, Markus Armbruster wrote:
>> The QAPI code generators work with a syntax tree (nested dictionaries)
>> plus a few symbol tables (also dictionaries) on the side.
>> 
>> They have clearly outgrown these simple data structures.  There's lots
>> of rummaging around in dictionaries, and information is recomputed on
>> the fly.  For the work I'm going to do, I want more clearly defined
>> and more convenient interfaces.
>> 
>> Going forward, I also want less coupling between the back-ends and the
>> syntax tree, to make messing with the syntax easier.
>> 
>> Create a bunch of classes to represent QAPI schemata.
>> 
>> Have the QAPISchema initializer call the parser, then walk the syntax
>> tree to create the new internal representation, and finally perform
>> semantic analysis.
>> 
>> Shortcut: the semantic analysis still relies on existing check_exprs()
>> to do the actual semantic checking.  All this code needs to move into
>> the classes.  Mark as TODO.
>> 
>> We generate array types eagerly, even though most of them aren't used.
>> Mark as TODO.
>
> I'm not sure if there are any array types that the rest of the code base
> uses even though it doesn't appear as a directly used type within the
> schemata.  Perhaps some of the builtin types (for example, if qom-get
> needs to return ['uint8']). Besides builtins, maybe we can add some sort
> of 'needs-array':'bool' key to each 'struct'/'union'/'enum'/'alternate',
> which defaults to true only if the schema refers to the array type, but
> which can be explicitly set to true to force the array type generation
> even without a schema reference.  But as it is all properly marked TODO,
> the idle ramblings in this paragraph don't affect review.

Because code for built-ins can be shared among schemata (see -b), we
probably have to keep generating code for array of built-in type
unconditionally.

Re needs-array: stupidest solution that could possibly work is to have
an otherwise useless struct mention all the needed arrays.

>> Nothing uses the new intermediate representation just yet, thus no
>> change to generated files.
>> 
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>  scripts/qapi-commands.py       |   2 +-
>>  scripts/qapi-event.py          |   2 +-
>>  scripts/qapi-types.py          |   2 +-
>>  scripts/qapi-visit.py          |   2 +-
>>  scripts/qapi.py                | 361 ++++++++++++++++++++++++++++++++++++++++-
>>  tests/qapi-schema/test-qapi.py |   2 +-
>>  6 files changed, 357 insertions(+), 14 deletions(-)
>> 
>
>> +class QAPISchemaEnumType(QAPISchemaType):
>> +    def __init__(self, name, info, values):
>> +        QAPISchemaType.__init__(self, name, info)
>> +        for v in values:
>> +            assert isinstance(v, str)
>> +        self.values = values
>> +    def check(self, schema):
>> +        assert len(set(self.values)) == len(self.values)
>
> Doesn't check whether any of the distinct values map to the same C name.

Pervasive issue.

>  But not a show-stopper to this patch (the earlier semantic checking in
> check_exprs() covers it, and your TODO about moving those checks here at
> a later date is the right time to worry about it here).
>
>> +
>> +class QAPISchemaArrayType(QAPISchemaType):
>> +    def __init__(self, name, info, element_type):
>> +        QAPISchemaType.__init__(self, name, info)
>> +        assert isinstance(element_type, str)
>> +        self.element_type_name = element_type
>> +        self.element_type = None
>> +    def check(self, schema):
>> +        self.element_type = schema.lookup_type(self.element_type_name)
>> +        assert self.element_type
>
> Is it worth adding:
>
> assert not isinstance(self.element_type, QAPISchemaArrayType)
>
> since we don't allow 2D arrays?

If the generators actually rely on it, yes.

If it's just an arbitrary schema language restriction, probably no.

>> +
>> +class QAPISchemaObjectType(QAPISchemaType):
>> +    def __init__(self, name, info, base, local_members, variants):
>> +        QAPISchemaType.__init__(self, name, info)
>> +        assert base == None or isinstance(base, str)
>> +        for m in local_members:
>> +            assert isinstance(m, QAPISchemaObjectTypeMember)
>> +        if variants != None:
>> +            assert isinstance(variants, QAPISchemaObjectTypeVariants)
>
> Style nit: the 'base' and 'variants' checks are identical patterns
> (checking for None or specific type), but only one uses an 'if'.
> Possibly because of line-length issues, though, so I can live with it.

I'll clean it up.

>> +
>> +class QAPISchemaObjectTypeVariant(QAPISchemaObjectTypeMember):
>> +    def __init__(self, name, typ):
>> +        QAPISchemaObjectTypeMember.__init__(self, name, typ, False)
>> +    def check(self, schema, tag_type, seen):
>> +        QAPISchemaObjectTypeMember.check(self, schema, [], seen)
>> +        assert self.name in tag_type.values
>> +    # TODO try to get rid of .simple_union_type()
>> +    def simple_union_type(self):
>> +        if isinstance(self.type, QAPISchemaObjectType) and not self.type.info:
>> +            assert len(self.type.members) == 1
>> +            assert not self.type.variants # not implemented
>
> and probably never will be (looks like this assert is copy-and-pasted
> from other locations where it makes sense that we might implement
> support for variants, but I don't see it ever happening for the
> generated ':obj-*-wrapper' type for the branch of a simple union)

True.  I'll drop the comment.

> At any rate, I concur that we have a difference in the generated code
> for simple unions compared to flat unions (even if simple unions could
> be rewritten in terms of a flat union from the QMP side, the generated C
> side is not the same), so I agree that you need this function for now,
> as well as the comment for if we later try to clean up the code base to
> drop that difference.

That's the plan.

>> +class QAPISchema(object):
>
>> +    def _make_array_type(self, element_type):
>> +        name = element_type + 'List'
>> +        if not self.lookup_type(name):
>> +            self._def_entity(QAPISchemaArrayType(name, None, element_type))
>> +        return name
>
> Hmm - we probably have collisions if a user tries to explicitly name a
> 'struct' or other type with a 'List' suffix.  Not made worse by this
> patch and not an actual problem with any of our existing .json files, so
> we can save it for another day.

qapi-code-gen.txt reserves the 'Kind' suffix.

We should either adopt a sane, non-colliding scheme for generated names,
or prevent collisions by rejecting reserved names with a sane error
message (documenting them is then optional), or document reserved names.
The latter two require us to figure out what names we reserve.  But as
you say, it's a task for another day.

>> +
>> +    def _make_members(self, data):
>> +        return [self._make_member(key, data[key]) for key in data.keys()]
>
> You could write this as:
>
> return [self._make_member(key, value) for (key, value) in data.iteritems()]
>
> for fewer copies and lookups (dict.keys() and dict.items() creates
> copies, while dict.iteritems() visits key/value pairs in place).
>
> I don't care strongly enough to hold up review, whether or not you
> change it.

I'll check it out.

>> +
>> +    def _def_union_type(self, expr, info):
>> +        name = expr['union']
>> +        data = expr['data']
>> +        base = expr.get('base')
>> +        tag_name = expr.get('discriminator')
>> +        tag_enum = None
>> +        if tag_name:
>> +            variants = [self._make_variant(key, data[key])
>> +                        for key in data.keys()]
>> +        else:
>> +            variants = [self._make_simple_variant(key, data[key])
>> +                        for key in data.keys()]
>
> Same comments about the possibility for writing these list
> comprehensions more efficiently.

There are a few more.  I'll either change all or none.

>                                   [and "list comprehensions" is my
> latest bit of cool python language terminology that I learned the name
> of, in order to do this review]

If I remember correctly, I first met them in an obscure hybrid of Prolog
and ML that was at the time a Professor's pet project, which naturally
meant every student must experience it, implemented in Lisp by a gifted
grad student, on an underpowered, memory-starved Mac.  Let's say not
every student was able to see the beauty within the beast.

>> +    def _def_command(self, expr, info):
>> +        name = expr['command']
>> +        data = expr.get('data')
>> +        rets = expr.get('returns')
>> +        gen = expr.get('gen', True)
>> +        success_response = expr.get('success-response', True)
>> +        if isinstance(data, OrderedDict):
>> +            data = self._make_implicit_object_type(name, 'arg',
>> +                                                   self._make_members(data))
>> +        if isinstance(rets, list):
>> +            assert len(rets) == 1
>> +            rets = self._make_array_type(rets[0])
>> +        elif isinstance(rets, OrderedDict):
>> +            rets = self._make_implicit_object_type(name, 'ret',
>> +                                                   self._make_members(rets))
>
> Dead 'elif' branch (since you reject dictionaries in "[PATCH 21/26]
> qapi: Command returning anonymous type doesn't work, outlaw")

Yup, will drop it.

> Looks like there will be some minor tweaks when you drop the RFC, but
> it's close enough at this point that I'm okay with:
>
> Reviewed-by: Eric Blake <eblake@redhat.com>

Thanks!
Eric Blake Aug. 5, 2015, 2:27 p.m. UTC | #3
On 08/05/2015 12:23 AM, Markus Armbruster wrote:
> Eric Blake <eblake@redhat.com> writes:
> 
>> On 08/04/2015 09:57 AM, Markus Armbruster wrote:
>>> The QAPI code generators work with a syntax tree (nested dictionaries)
>>> plus a few symbol tables (also dictionaries) on the side.
>>>

>>> +class QAPISchemaArrayType(QAPISchemaType):
>>> +    def __init__(self, name, info, element_type):
>>> +        QAPISchemaType.__init__(self, name, info)
>>> +        assert isinstance(element_type, str)
>>> +        self.element_type_name = element_type
>>> +        self.element_type = None
>>> +    def check(self, schema):
>>> +        self.element_type = schema.lookup_type(self.element_type_name)
>>> +        assert self.element_type
>>
>> Is it worth adding:
>>
>> assert not isinstance(self.element_type, QAPISchemaArrayType)
>>
>> since we don't allow 2D arrays?
> 
> If the generators actually rely on it, yes.

Hmm. What happens if you do
 { 'command': 'Foo', 'returns': [ 'intList' ] }

> 
> If it's just an arbitrary schema language restriction, probably no.

That's a tough judgment call. We don't currently allow [ [ 'int' ] ],
and the [ 'intList' ] hack is gross. On the other hand, I'm having a
tough time coming up with technical reasons why we can't do it (arrays
as a parameter or return type should work, and 2D arrays just add
another layer of '*' to the C code).

>>> +    def _make_array_type(self, element_type):
>>> +        name = element_type + 'List'
>>> +        if not self.lookup_type(name):
>>> +            self._def_entity(QAPISchemaArrayType(name, None, element_type))
>>> +        return name
>>
>> Hmm - we probably have collisions if a user tries to explicitly name a
>> 'struct' or other type with a 'List' suffix.  Not made worse by this
>> patch and not an actual problem with any of our existing .json files, so
>> we can save it for another day.
> 
> qapi-code-gen.txt reserves the 'Kind' suffix.
> 
> We should either adopt a sane, non-colliding scheme for generated names,
> or prevent collisions by rejecting reserved names with a sane error
> message (documenting them is then optional), or document reserved names.
> The latter two require us to figure out what names we reserve.  But as
> you say, it's a task for another day.

And that cleanup can worry about [ 'intList' ].
Markus Armbruster Aug. 6, 2015, 5:46 a.m. UTC | #4
Eric Blake <eblake@redhat.com> writes:

> On 08/05/2015 12:23 AM, Markus Armbruster wrote:
>> Eric Blake <eblake@redhat.com> writes:
>> 
>>> On 08/04/2015 09:57 AM, Markus Armbruster wrote:
>>>> The QAPI code generators work with a syntax tree (nested dictionaries)
>>>> plus a few symbol tables (also dictionaries) on the side.
>>>>
>
>>>> +class QAPISchemaArrayType(QAPISchemaType):
>>>> +    def __init__(self, name, info, element_type):
>>>> +        QAPISchemaType.__init__(self, name, info)
>>>> +        assert isinstance(element_type, str)
>>>> +        self.element_type_name = element_type
>>>> +        self.element_type = None
>>>> +    def check(self, schema):
>>>> +        self.element_type = schema.lookup_type(self.element_type_name)
>>>> +        assert self.element_type
>>>
>>> Is it worth adding:
>>>
>>> assert not isinstance(self.element_type, QAPISchemaArrayType)
>>>
>>> since we don't allow 2D arrays?
>> 
>> If the generators actually rely on it, yes.
>
> Hmm. What happens if you do
>  { 'command': 'Foo', 'returns': [ 'intList' ] }
>
>> 
>> If it's just an arbitrary schema language restriction, probably no.
>
> That's a tough judgment call. We don't currently allow [ [ 'int' ] ],
> and the [ 'intList' ] hack is gross. On the other hand, I'm having a
> tough time coming up with technical reasons why we can't do it (arrays
> as a parameter or return type should work, and 2D arrays just add
> another layer of '*' to the C code).

Perhaps a quick experiment can decide the nature of the restriction.

>>>> +    def _make_array_type(self, element_type):
>>>> +        name = element_type + 'List'
>>>> +        if not self.lookup_type(name):
>>>> +            self._def_entity(QAPISchemaArrayType(name, None, element_type))
>>>> +        return name
>>>
>>> Hmm - we probably have collisions if a user tries to explicitly name a
>>> 'struct' or other type with a 'List' suffix.  Not made worse by this
>>> patch and not an actual problem with any of our existing .json files, so
>>> we can save it for another day.
>> 
>> qapi-code-gen.txt reserves the 'Kind' suffix.
>> 
>> We should either adopt a sane, non-colliding scheme for generated names,
>> or prevent collisions by rejecting reserved names with a sane error
>> message (documenting them is then optional), or document reserved names.
>> The latter two require us to figure out what names we reserve.  But as
>> you say, it's a task for another day.
>
> And that cleanup can worry about [ 'intList' ].

Yes.
diff mbox

Patch

diff --git a/scripts/qapi-commands.py b/scripts/qapi-commands.py
index 890ce5d..12bdc4c 100644
--- a/scripts/qapi-commands.py
+++ b/scripts/qapi-commands.py
@@ -272,7 +272,7 @@  for o, a in opts:
     if o in ("-m", "--middle"):
         middle_mode = True
 
-exprs = parse_schema(input_file)
+exprs = QAPISchema(input_file).get_exprs()
 commands = filter(lambda expr: expr.has_key('command'), exprs)
 commands = filter(lambda expr: not expr.has_key('gen'), commands)
 
diff --git a/scripts/qapi-event.py b/scripts/qapi-event.py
index 7f238df..aec2d32 100644
--- a/scripts/qapi-event.py
+++ b/scripts/qapi-event.py
@@ -263,7 +263,7 @@  fdecl.write(mcgen('''
 ''',
                   prefix=prefix))
 
-exprs = parse_schema(input_file)
+exprs = QAPISchema(input_file).get_exprs()
 
 event_enum_name = c_name(prefix + "QAPIEvent", protect=False)
 event_enum_values = []
diff --git a/scripts/qapi-types.py b/scripts/qapi-types.py
index f2428f3..d162ca2 100644
--- a/scripts/qapi-types.py
+++ b/scripts/qapi-types.py
@@ -336,7 +336,7 @@  fdecl.write(mcgen('''
 #include <stdint.h>
 '''))
 
-exprs = parse_schema(input_file)
+exprs = QAPISchema(input_file).get_exprs()
 
 fdecl.write(guardstart("QAPI_TYPES_BUILTIN_STRUCT_DECL"))
 for typename in builtin_types.keys():
diff --git a/scripts/qapi-visit.py b/scripts/qapi-visit.py
index 3cd662b..c493964 100644
--- a/scripts/qapi-visit.py
+++ b/scripts/qapi-visit.py
@@ -446,7 +446,7 @@  fdecl.write(mcgen('''
 ''',
                   prefix=prefix))
 
-exprs = parse_schema(input_file)
+exprs = QAPISchema(input_file).get_exprs()
 
 # to avoid header dependency hell, we always generate declarations
 # for built-in types in our header files and simply guard them
diff --git a/scripts/qapi.py b/scripts/qapi.py
index 6297656..34e52c5 100644
--- a/scripts/qapi.py
+++ b/scripts/qapi.py
@@ -302,6 +302,7 @@  class QAPISchemaParser(object):
 
 #
 # Semantic analysis of schema expressions
+# TODO fold into QAPISchema
 #
 
 def find_base_fields(base):
@@ -755,15 +756,357 @@  def check_exprs(exprs):
         else:
             assert False, 'unexpected meta type'
 
-    return map(lambda expr_elem: expr_elem['expr'], exprs)
-
-def parse_schema(fname):
-    try:
-        schema = QAPISchemaParser(open(fname, "r"))
-        return check_exprs(schema.exprs)
-    except (QAPISchemaError, QAPIExprError), e:
-        print >>sys.stderr, e
-        exit(1)
+    return exprs
+
+#
+# Schema compiler frontend
+#
+
+class QAPISchemaEntity(object):
+    def __init__(self, name, info):
+        assert isinstance(name, str)
+        self.name = name
+        self.info = info
+    def check(self, schema):
+        pass
+
+class QAPISchemaType(QAPISchemaEntity):
+    pass
+
+class QAPISchemaBuiltinType(QAPISchemaType):
+    def __init__(self, name):
+        QAPISchemaType.__init__(self, name, None)
+
+class QAPISchemaEnumType(QAPISchemaType):
+    def __init__(self, name, info, values):
+        QAPISchemaType.__init__(self, name, info)
+        for v in values:
+            assert isinstance(v, str)
+        self.values = values
+    def check(self, schema):
+        assert len(set(self.values)) == len(self.values)
+
+class QAPISchemaArrayType(QAPISchemaType):
+    def __init__(self, name, info, element_type):
+        QAPISchemaType.__init__(self, name, info)
+        assert isinstance(element_type, str)
+        self.element_type_name = element_type
+        self.element_type = None
+    def check(self, schema):
+        self.element_type = schema.lookup_type(self.element_type_name)
+        assert self.element_type
+
+class QAPISchemaObjectType(QAPISchemaType):
+    def __init__(self, name, info, base, local_members, variants):
+        QAPISchemaType.__init__(self, name, info)
+        assert base == None or isinstance(base, str)
+        for m in local_members:
+            assert isinstance(m, QAPISchemaObjectTypeMember)
+        if variants != None:
+            assert isinstance(variants, QAPISchemaObjectTypeVariants)
+        self.base_name = base
+        self.base = None
+        self.local_members = local_members
+        self.variants = variants
+        self.members = None
+    def check(self, schema):
+        assert self.members != False    # not running in cycles
+        if self.members:
+            return
+        self.members = False            # mark as being checked
+        if self.base_name:
+            self.base = schema.lookup_type(self.base_name)
+            assert isinstance(self.base, QAPISchemaObjectType)
+            assert not self.base.variants # not implemented
+            self.base.check(schema)
+            members = list(self.base.members)
+        else:
+            members = []
+        seen = {}
+        for m in members:
+            seen[m.name] = m
+        for m in self.local_members:
+            m.check(schema, members, seen)
+        if self.variants:
+            self.variants.check(schema, members, seen)
+        self.members = members
+
+class QAPISchemaObjectTypeMember(object):
+    def __init__(self, name, typ, optional):
+        assert isinstance(name, str)
+        assert isinstance(typ, str)
+        assert isinstance(optional, bool)
+        self.name = name
+        self.type_name = typ
+        self.type = None
+        self.optional = optional
+    def check(self, schema, all_members, seen):
+        assert self.name not in seen
+        self.type = schema.lookup_type(self.type_name)
+        assert self.type
+        all_members.append(self)
+        seen[self.name] = self
+
+class QAPISchemaObjectTypeVariants(object):
+    def __init__(self, tag_name, tag_enum, variants):
+        assert tag_name == None or isinstance(tag_name, str)
+        assert tag_enum == None or isinstance(tag_enum, str)
+        for v in variants:
+            assert isinstance(v, QAPISchemaObjectTypeVariant)
+        self.tag_name = tag_name
+        if tag_name:
+            assert not tag_enum
+            self.tag_member = None
+        else:
+            self.tag_member = QAPISchemaObjectTypeMember('kind', tag_enum,
+                                                         False)
+        self.variants = variants
+    def check(self, schema, members, seen):
+        if self.tag_name:
+            self.tag_member = seen[self.tag_name]
+        else:
+            self.tag_member.check(schema, members, seen)
+        assert isinstance(self.tag_member.type, QAPISchemaEnumType)
+        for v in self.variants:
+            vseen = dict(seen)
+            v.check(schema, self.tag_member.type, vseen)
+
+class QAPISchemaObjectTypeVariant(QAPISchemaObjectTypeMember):
+    def __init__(self, name, typ):
+        QAPISchemaObjectTypeMember.__init__(self, name, typ, False)
+    def check(self, schema, tag_type, seen):
+        QAPISchemaObjectTypeMember.check(self, schema, [], seen)
+        assert self.name in tag_type.values
+    # TODO try to get rid of .simple_union_type()
+    def simple_union_type(self):
+        if isinstance(self.type, QAPISchemaObjectType) and not self.type.info:
+            assert len(self.type.members) == 1
+            assert not self.type.variants # not implemented
+            return self.type.members[0].type
+        return None
+
+class QAPISchemaAlternateType(QAPISchemaType):
+    def __init__(self, name, info, variants):
+        QAPISchemaType.__init__(self, name, info)
+        assert isinstance(variants, QAPISchemaObjectTypeVariants)
+        assert not variants.tag_name
+        self.variants = variants
+    def check(self, schema):
+        self.variants.check(schema, [], {})
+
+class QAPISchemaCommand(QAPISchemaEntity):
+    def __init__(self, name, info, arg_type, ret_type, gen, success_response):
+        QAPISchemaEntity.__init__(self, name, info)
+        assert not arg_type or isinstance(arg_type, str)
+        assert not ret_type or isinstance(ret_type, str)
+        self.arg_type_name = arg_type
+        self.arg_type = None
+        self.ret_type_name = ret_type
+        self.ret_type = None
+        self.gen = gen
+        self.success_response = success_response
+    def check(self, schema):
+        if self.arg_type_name:
+            self.arg_type = schema.lookup_type(self.arg_type_name)
+            assert isinstance(self.arg_type, QAPISchemaObjectType)
+            assert not self.arg_type.variants # not implemented
+        if self.ret_type_name:
+            self.ret_type = schema.lookup_type(self.ret_type_name)
+            assert isinstance(self.ret_type, QAPISchemaType)
+
+class QAPISchemaEvent(QAPISchemaEntity):
+    def __init__(self, name, info, arg_type):
+        QAPISchemaEntity.__init__(self, name, info)
+        assert not arg_type or isinstance(arg_type, str)
+        self.arg_type_name = arg_type
+        self.arg_type = None
+    def check(self, schema):
+        if self.arg_type_name:
+            self.arg_type = schema.lookup_type(self.arg_type_name)
+            assert isinstance(self.arg_type, QAPISchemaObjectType)
+            assert not self.arg_type.variants # not implemented
+
+class QAPISchema(object):
+    def __init__(self, fname):
+        try:
+            self.exprs = check_exprs(QAPISchemaParser(open(fname, "r")).exprs)
+        except (QAPISchemaError, QAPIExprError), err:
+            print >>sys.stderr, err
+            exit(1)
+        self.entity_dict = {}
+        self._def_predefineds()
+        self._def_exprs()
+        self.check()
+
+    def get_exprs(self):
+        return [expr_elem['expr'] for expr_elem in self.exprs]
+
+    def _def_entity(self, ent):
+        assert ent.name not in self.entity_dict
+        self.entity_dict[ent.name] = ent
+
+    def lookup_entity(self, name, typ=None):
+        ent = self.entity_dict.get(name)
+        if typ and not isinstance(ent, typ):
+            return None
+        return ent
+
+    def lookup_type(self, name):
+        return self.lookup_entity(name, QAPISchemaType)
+
+    def _def_builtin_type(self, name):
+        self._def_entity(QAPISchemaBuiltinType(name))
+        if name != '**':
+            self._make_array_type(name) # TODO really needed?
+
+    def _def_predefineds(self):
+        for t in ['str', 'number', 'int', 'int8', 'int16', 'int32', 'int64',
+                  'uint8', 'uint16', 'uint32', 'uint64', 'size', 'bool', '**']:
+            self._def_builtin_type(t)
+
+    def _make_implicit_enum_type(self, name, values):
+        name = name + 'Kind'
+        self._def_entity(QAPISchemaEnumType(name, None, values))
+        return name
+
+    def _make_array_type(self, element_type):
+        name = element_type + 'List'
+        if not self.lookup_type(name):
+            self._def_entity(QAPISchemaArrayType(name, None, element_type))
+        return name
+
+    def _make_implicit_object_type(self, name, role, members):
+        if not members:
+            return None
+        name = ':obj-%s-%s' % (name, role)
+        if not self.lookup_entity(name, QAPISchemaObjectType):
+            self._def_entity(QAPISchemaObjectType(name, None, None,
+                                                  members, None))
+        return name
+
+    def _def_enum_type(self, expr, info):
+        name = expr['enum']
+        data = expr['data']
+        self._def_entity(QAPISchemaEnumType(name, info, data))
+        self._make_array_type(name) # TODO really needed?
+
+    def _make_member(self, name, typ):
+        optional = False
+        if name.startswith('*'):
+            name = name[1:]
+            optional = True
+        if isinstance(typ, list):
+            assert len(typ) == 1
+            typ = self._make_array_type(typ[0])
+        return QAPISchemaObjectTypeMember(name, typ, optional)
+
+    def _make_members(self, data):
+        return [self._make_member(key, data[key]) for key in data.keys()]
+
+    def _def_struct_type(self, expr, info):
+        name = expr['struct']
+        base = expr.get('base')
+        data = expr['data']
+        self._def_entity(QAPISchemaObjectType(name, info, base,
+                                              self._make_members(data),
+                                              None))
+        self._make_array_type(name) # TODO really needed?
+
+    def _make_variant(self, case, typ):
+        return QAPISchemaObjectTypeVariant(case, typ)
+
+    def _make_simple_variant(self, case, typ):
+        if isinstance(typ, list):
+            assert len(typ) == 1
+            typ = self._make_array_type(typ[0])
+        typ = self._make_implicit_object_type(typ, 'wrapper',
+                                              [self._make_member('data', typ)])
+        return QAPISchemaObjectTypeVariant(case, typ)
+
+    def _make_tag_enum(self, type_name, variants):
+        return self._make_implicit_enum_type(type_name,
+                                             [v.name for v in variants])
+
+    def _def_union_type(self, expr, info):
+        name = expr['union']
+        data = expr['data']
+        base = expr.get('base')
+        tag_name = expr.get('discriminator')
+        tag_enum = None
+        if tag_name:
+            variants = [self._make_variant(key, data[key])
+                        for key in data.keys()]
+        else:
+            variants = [self._make_simple_variant(key, data[key])
+                        for key in data.keys()]
+            tag_enum = self._make_tag_enum(name, variants)
+        self._def_entity(QAPISchemaObjectType(name, info, base,
+                                    self._make_members(OrderedDict()),
+                                    QAPISchemaObjectTypeVariants(tag_name,
+                                                                 tag_enum,
+                                                                 variants)))
+        self._make_array_type(name) # TODO really needed?
+
+    def _def_alternate_type(self, expr, info):
+        name = expr['alternate']
+        data = expr['data']
+        variants = [self._make_variant(key, data[key])
+                    for key in data.keys()]
+        tag_enum = self._make_tag_enum(name, variants)
+        self._def_entity(QAPISchemaAlternateType(name, info,
+                                    QAPISchemaObjectTypeVariants(None,
+                                                                 tag_enum,
+                                                                 variants)))
+        self._make_array_type(name) # TODO really needed?
+
+    def _def_command(self, expr, info):
+        name = expr['command']
+        data = expr.get('data')
+        rets = expr.get('returns')
+        gen = expr.get('gen', True)
+        success_response = expr.get('success-response', True)
+        if isinstance(data, OrderedDict):
+            data = self._make_implicit_object_type(name, 'arg',
+                                                   self._make_members(data))
+        if isinstance(rets, list):
+            assert len(rets) == 1
+            rets = self._make_array_type(rets[0])
+        elif isinstance(rets, OrderedDict):
+            rets = self._make_implicit_object_type(name, 'ret',
+                                                   self._make_members(rets))
+        self._def_entity(QAPISchemaCommand(name, info, data, rets, gen,
+                                           success_response))
+
+    def _def_event(self, expr, info):
+        name = expr['event']
+        data = expr.get('data')
+        if isinstance(data, OrderedDict):
+            data = self._make_implicit_object_type(name, 'arg',
+                                                   self._make_members(data))
+        self._def_entity(QAPISchemaEvent(name, info, data))
+
+    def _def_exprs(self):
+        for expr_elem in self.exprs:
+            expr = expr_elem['expr']
+            info = expr_elem['info']
+            if 'enum' in expr:
+                self._def_enum_type(expr, info)
+            elif 'struct' in expr:
+                self._def_struct_type(expr, info)
+            elif 'union' in expr:
+                self._def_union_type(expr, info)
+            elif 'alternate' in expr:
+                self._def_alternate_type(expr, info)
+            elif 'command' in expr:
+                self._def_command(expr, info)
+            elif 'event' in expr:
+                self._def_event(expr, info)
+            else:
+                assert False
+
+    def check(self):
+        for ent in self.entity_dict.values():
+            ent.check(self)
 
 #
 # Code generation helpers
diff --git a/tests/qapi-schema/test-qapi.py b/tests/qapi-schema/test-qapi.py
index 634ef2d..461c713 100644
--- a/tests/qapi-schema/test-qapi.py
+++ b/tests/qapi-schema/test-qapi.py
@@ -16,7 +16,7 @@  import os
 import sys
 
 try:
-    exprs = parse_schema(sys.argv[1])
+    exprs = QAPISchema(sys.argv[1]).get_exprs()
 except SystemExit:
     raise