diff mbox series

[v3,03/16] qapi/expr.py: constrain incoming expression types

Message ID 20210223003408.964543-4-jsnow@redhat.com
State New
Headers show
Series qapi: static typing conversion, pt3 | expand

Commit Message

John Snow Feb. 23, 2021, 12:33 a.m. UTC
mypy does not know the types of values stored in Dicts that masquerade
as objects. Help the type checker out by constraining the type.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Cleber Rosa <crosa@redhat.com>
---
 scripts/qapi/expr.py | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

Comments

Markus Armbruster Feb. 24, 2021, 10:01 a.m. UTC | #1
John Snow <jsnow@redhat.com> writes:

> mypy does not know the types of values stored in Dicts that masquerade
> as objects. Help the type checker out by constraining the type.
>
> Signed-off-by: John Snow <jsnow@redhat.com>
> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
> Reviewed-by: Cleber Rosa <crosa@redhat.com>
> ---
>  scripts/qapi/expr.py | 25 ++++++++++++++++++++++---
>  1 file changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
> index 5694c501fa3..783282b53ce 100644
> --- a/scripts/qapi/expr.py
> +++ b/scripts/qapi/expr.py
> @@ -15,9 +15,17 @@
>  # See the COPYING file in the top-level directory.
>  
>  import re
> +from typing import MutableMapping, Optional
>  
>  from .common import c_name
>  from .error import QAPISemError
> +from .parser import QAPIDoc
> +from .source import QAPISourceInfo
> +
> +
> +# Expressions in their raw form are JSON-like structures with arbitrary forms.
> +# Minimally, their top-level form must be a mapping of strings to values.
> +Expression = MutableMapping[str, object]

MutableMapping, fancy.  It's only ever dict.  Why abstract from that?

The use of object is again owed to mypy's inability to do recursive
types.  What we really have here is something like

   Expression = Union[bool, str, dict[str, Expression], list[Expression]]

with the root further constrained to the Union's dict branch.  Spell
that out in a bit more detail, like you did in introspect.py?

Hmm, there you used Any, not object.  I guess that's because mypy lets
you get away with object here, but not there.  Correct?

Also, PEP 8 comment line length.

>  
>  
>  # Names must be letters, numbers, -, and _.  They must start with letter,
> @@ -287,9 +295,20 @@ def check_event(expr, info):
>  
>  def check_exprs(exprs):
>      for expr_elem in exprs:
> -        expr = expr_elem['expr']
> -        info = expr_elem['info']
> -        doc = expr_elem.get('doc')
> +        # Expression
> +        assert isinstance(expr_elem['expr'], dict)
> +        for key in expr_elem['expr'].keys():
> +            assert isinstance(key, str)
> +        expr: Expression = expr_elem['expr']

You're fine with repeating exp_elem['expr'] here, and ...

> +
> +        # QAPISourceInfo
> +        assert isinstance(expr_elem['info'], QAPISourceInfo)
> +        info: QAPISourceInfo = expr_elem['info']

... expr_elem['info'] here, but ...

> +
> +        # Optional[QAPIDoc]
> +        tmp = expr_elem.get('doc')
> +        assert tmp is None or isinstance(tmp, QAPIDoc)
> +        doc: Optional[QAPIDoc] = tmp

... you avoid repetition of expr_elem.get('doc') here.  Any particular
reason?

>  
>          if 'include' in expr:
>              continue
John Snow Feb. 24, 2021, 9:46 p.m. UTC | #2
On 2/24/21 5:01 AM, Markus Armbruster wrote:
> John Snow <jsnow@redhat.com> writes:
> 
>> mypy does not know the types of values stored in Dicts that masquerade
>> as objects. Help the type checker out by constraining the type.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
>> Reviewed-by: Cleber Rosa <crosa@redhat.com>
>> ---
>>   scripts/qapi/expr.py | 25 ++++++++++++++++++++++---
>>   1 file changed, 22 insertions(+), 3 deletions(-)
>>
>> diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
>> index 5694c501fa3..783282b53ce 100644
>> --- a/scripts/qapi/expr.py
>> +++ b/scripts/qapi/expr.py
>> @@ -15,9 +15,17 @@
>>   # See the COPYING file in the top-level directory.
>>   
>>   import re
>> +from typing import MutableMapping, Optional
>>   
>>   from .common import c_name
>>   from .error import QAPISemError
>> +from .parser import QAPIDoc
>> +from .source import QAPISourceInfo
>> +
>> +
>> +# Expressions in their raw form are JSON-like structures with arbitrary forms.
>> +# Minimally, their top-level form must be a mapping of strings to values.
>> +Expression = MutableMapping[str, object]
> 
> MutableMapping, fancy.  It's only ever dict.  Why abstract from that?
> 

I don't know! I referenced this in the cover letter. I cannot remember 
the reason anymore. It had R-Bs on it so I left it alone.

There are some differences, but I no longer remember why I thought they 
applied. Maybe some of my more exploratory work wanted it. Dunno.

> The use of object is again owed to mypy's inability to do recursive
> types.  What we really have here is something like
> 
>     Expression = Union[bool, str, dict[str, Expression], list[Expression]]
> 
> with the root further constrained to the Union's dict branch.  Spell
> that out in a bit more detail, like you did in introspect.py?
> 

Terminology problem?

I am using "Expression" to mean very specifically a top-level object as 
returned from parser.py, which *must* be an Object, so it *must* be a 
mapping of str => yaddayadda.

The type as I intended it is Expression = Dict[str, yaddayadda]

where yaddayadda is
Union[int, str, bool, List[yaddayadda], Dict[str, yaddayadda]]
expr.py is what validates the yaddayadda, so there's no point in trying 
to type it further, I think.

Probably worth a better comment.

> Hmm, there you used Any, not object.  I guess that's because mypy lets
> you get away with object here, but not there.  Correct?
> 

Yep. I can get away with the stricter type here because of how we use 
it, so I did. That's an artifact of it not being recursive and how 
expr.py's entire raison d'etre is using isinstance() checks to 
effectively downcast for us everywhere already.

> Also, PEP 8 comment line length.
> 

Augh.

Is there a way to set emacs mode highlighting in Python such that it 
highlights when I run past the 72-col margin, but only for comments?

I have the general-purpose highlighter on for the 80-col margin.

I'm not familiar with any setting like this for any of the linters or 
pycharm right away either.

>>   
>>   
>>   # Names must be letters, numbers, -, and _.  They must start with letter,
>> @@ -287,9 +295,20 @@ def check_event(expr, info):
>>   
>>   def check_exprs(exprs):
>>       for expr_elem in exprs:
>> -        expr = expr_elem['expr']
>> -        info = expr_elem['info']
>> -        doc = expr_elem.get('doc')
>> +        # Expression
>> +        assert isinstance(expr_elem['expr'], dict)
>> +        for key in expr_elem['expr'].keys():
>> +            assert isinstance(key, str)
>> +        expr: Expression = expr_elem['expr']
> 
> You're fine with repeating exp_elem['expr'] here, and ...
> 
>> +
>> +        # QAPISourceInfo
>> +        assert isinstance(expr_elem['info'], QAPISourceInfo)
>> +        info: QAPISourceInfo = expr_elem['info']
> 
> ... expr_elem['info'] here, but ...
> 
>> +
>> +        # Optional[QAPIDoc]
>> +        tmp = expr_elem.get('doc')
>> +        assert tmp is None or isinstance(tmp, QAPIDoc)
>> +        doc: Optional[QAPIDoc] = tmp
> 
> ... you avoid repetition of expr_elem.get('doc') here.  Any particular
> reason?
> 

Because this looks like garbage written by a drunkard:

assert expr_elem.get('doc') is None or isinstance(expr_elem.get('doc'), 
QAPIDoc)
doc: Optional[QAPIDoc] = expr_elem.get('doc')

>>   
>>           if 'include' in expr:
>>               continue
Markus Armbruster Feb. 25, 2021, 11:56 a.m. UTC | #3
John Snow <jsnow@redhat.com> writes:

> On 2/24/21 5:01 AM, Markus Armbruster wrote:
>> John Snow <jsnow@redhat.com> writes:
>> 
>>> mypy does not know the types of values stored in Dicts that masquerade
>>> as objects. Help the type checker out by constraining the type.
>>>
>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
>>> Reviewed-by: Cleber Rosa <crosa@redhat.com>
>>> ---
>>>   scripts/qapi/expr.py | 25 ++++++++++++++++++++++---
>>>   1 file changed, 22 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
>>> index 5694c501fa3..783282b53ce 100644
>>> --- a/scripts/qapi/expr.py
>>> +++ b/scripts/qapi/expr.py
>>> @@ -15,9 +15,17 @@
>>>   # See the COPYING file in the top-level directory.
>>>   
>>>   import re
>>> +from typing import MutableMapping, Optional
>>>   
>>>   from .common import c_name
>>>   from .error import QAPISemError
>>> +from .parser import QAPIDoc
>>> +from .source import QAPISourceInfo
>>> +
>>> +
>>> +# Expressions in their raw form are JSON-like structures with arbitrary forms.
>>> +# Minimally, their top-level form must be a mapping of strings to values.
>>> +Expression = MutableMapping[str, object]
>> 
>> MutableMapping, fancy.  It's only ever dict.  Why abstract from that?

OrderedDict, actually.  MutableMapping is misleading, because it doesn't
specify "orderedness".

> I don't know! I referenced this in the cover letter. I cannot remember 
> the reason anymore. It had R-Bs on it so I left it alone.
>
> There are some differences, but I no longer remember why I thought they 
> applied. Maybe some of my more exploratory work wanted it. Dunno.

Happens.  It's a long patch queue you're trying to flush.

>> The use of object is again owed to mypy's inability to do recursive
>> types.  What we really have here is something like
>> 
>>     Expression = Union[bool, str, dict[str, Expression], list[Expression]]
>> 
>> with the root further constrained to the Union's dict branch.  Spell
>> that out in a bit more detail, like you did in introspect.py?
>> 
>
> Terminology problem?
>
> I am using "Expression" to mean very specifically a top-level object as 
> returned from parser.py, which *must* be an Object, so it *must* be a 
> mapping of str => yaddayadda.

Aha!

We'll talk some more about naming of type aliases in review of PATCH 08.

> The type as I intended it is Expression = Dict[str, yaddayadda]
>
> where yaddayadda is
> Union[int, str, bool, List[yaddayadda], Dict[str, yaddayadda]]

Yes.

As qapi-code-gen.txt explains, we have two layers of syntax:

* The bottom layer is (heavily bastardized) JSON.  qapi-code-gen.txt
  specifies it by listing the differences to RFC 8259.  parser.py parses
  it into abstract syntax trees.

* The upper layer recognizes the abstract syntax trees that are valid as
  QAPI schema.  qapi-code-gen.txt specifies it with a context-free
  grammar.  expr.py checks the ASTs against that grammar.  It also
  expands shorthand forms into longhand.

Detail not documented in qapi-code-gen.txt: parser.py rejects non-object
at the top-level, so expr.py doesn't have to.

> expr.py is what validates the yaddayadda, so there's no point in trying 
> to type it further, I think.

If mypy could do recursive types, typing it further would be a
no-brainer: just state what is.

Since it can't, we need to stop typing / start cheating at some point.
Where exactly is not obvious.  Your idea is at least as good as mine.

> Probably worth a better comment.

Yes :)

>> Hmm, there you used Any, not object.  I guess that's because mypy lets
>> you get away with object here, but not there.  Correct?
>> 
>
> Yep. I can get away with the stricter type here because of how we use 
> it, so I did. That's an artifact of it not being recursive and how 
> expr.py's entire raison d'etre is using isinstance() checks to 
> effectively downcast for us everywhere already.
>
>> Also, PEP 8 comment line length.
>> 
>
> Augh.
>
> Is there a way to set emacs mode highlighting in Python such that it 
> highlights when I run past the 72-col margin, but only for comments?
>
> I have the general-purpose highlighter on for the 80-col margin.

Got a .emacs snippet for me?

> I'm not familiar with any setting like this for any of the linters or 
> pycharm right away either.

Hmm, ... okay, TIL from pycodestyle(1):

            --max-line-length=n  set maximum allowed line length (default: 79)
            --max-doc-length=n   set maximum allowed doc line length and perform these
                                 checks (unchecked if not set)

Let me know whether --max-doc-length=72 fits the bill.

>
>>>   
>>>   
>>>   # Names must be letters, numbers, -, and _.  They must start with letter,
>>> @@ -287,9 +295,20 @@ def check_event(expr, info):
>>>   
>>>   def check_exprs(exprs):
>>>       for expr_elem in exprs:
>>> -        expr = expr_elem['expr']
>>> -        info = expr_elem['info']
>>> -        doc = expr_elem.get('doc')
>>> +        # Expression
>>> +        assert isinstance(expr_elem['expr'], dict)

Must be an *ordered* mapping, actually.  It's only ever OrderedDict.

>>> +        for key in expr_elem['expr'].keys():
>>> +            assert isinstance(key, str)
>>> +        expr: Expression = expr_elem['expr']

*Unchecked* way to tell the type checker (I think):

             expr = cast(Expression, expr_elem['expr']

I like checking in general, but is it worth the bother here?

>> 
>> You're fine with repeating exp_elem['expr'] here, and ...
>> 
>>> +
>>> +        # QAPISourceInfo
>>> +        assert isinstance(expr_elem['info'], QAPISourceInfo)
>>> +        info: QAPISourceInfo = expr_elem['info']
>> 
>> ... expr_elem['info'] here, but ...
>> 
>>> +
>>> +        # Optional[QAPIDoc]
>>> +        tmp = expr_elem.get('doc')
>>> +        assert tmp is None or isinstance(tmp, QAPIDoc)
>>> +        doc: Optional[QAPIDoc] = tmp
>> 
>> ... you avoid repetition of expr_elem.get('doc') here.  Any particular
>> reason?
>> 
>
> Because this looks like garbage written by a drunkard:
>
> assert expr_elem.get('doc') is None or isinstance(expr_elem.get('doc'), 
> QAPIDoc)
> doc: Optional[QAPIDoc] = expr_elem.get('doc')

Unchecked way:

             doc = cast(Optional[QAPIDoc], expr_elem.get('doc'))

>>>   
>>>           if 'include' in expr:
>>>               continue
John Snow Feb. 25, 2021, 8:43 p.m. UTC | #4
On 2/25/21 6:56 AM, Markus Armbruster wrote:
> John Snow <jsnow@redhat.com> writes:
> 
>> On 2/24/21 5:01 AM, Markus Armbruster wrote:
>>> John Snow <jsnow@redhat.com> writes:
>>>
>>>> mypy does not know the types of values stored in Dicts that masquerade
>>>> as objects. Help the type checker out by constraining the type.
>>>>
>>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>>> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
>>>> Reviewed-by: Cleber Rosa <crosa@redhat.com>
>>>> ---
>>>>    scripts/qapi/expr.py | 25 ++++++++++++++++++++++---
>>>>    1 file changed, 22 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
>>>> index 5694c501fa3..783282b53ce 100644
>>>> --- a/scripts/qapi/expr.py
>>>> +++ b/scripts/qapi/expr.py
>>>> @@ -15,9 +15,17 @@
>>>>    # See the COPYING file in the top-level directory.
>>>>    
>>>>    import re
>>>> +from typing import MutableMapping, Optional
>>>>    
>>>>    from .common import c_name
>>>>    from .error import QAPISemError
>>>> +from .parser import QAPIDoc
>>>> +from .source import QAPISourceInfo
>>>> +
>>>> +
>>>> +# Expressions in their raw form are JSON-like structures with arbitrary forms.
>>>> +# Minimally, their top-level form must be a mapping of strings to values.
>>>> +Expression = MutableMapping[str, object]
>>>
>>> MutableMapping, fancy.  It's only ever dict.  Why abstract from that?
> 
> OrderedDict, actually.  MutableMapping is misleading, because it doesn't
> specify "orderedness".
> 

Yeah, I am realizing that Dict helps imply that constraint on 3.6+ but 
that MutableMapping doesn't.

I am worried about how hard it's gonna hurt when I remember why I wanted 
MutableMapping.

 >:|

For now, I'll go back to Dict.

>> I don't know! I referenced this in the cover letter. I cannot remember
>> the reason anymore. It had R-Bs on it so I left it alone.
>>
>> There are some differences, but I no longer remember why I thought they
>> applied. Maybe some of my more exploratory work wanted it. Dunno.
> 
> Happens.  It's a long patch queue you're trying to flush.
> 
>>> The use of object is again owed to mypy's inability to do recursive
>>> types.  What we really have here is something like
>>>
>>>      Expression = Union[bool, str, dict[str, Expression], list[Expression]]
>>>
>>> with the root further constrained to the Union's dict branch.  Spell
>>> that out in a bit more detail, like you did in introspect.py?
>>>
>>
>> Terminology problem?
>>
>> I am using "Expression" to mean very specifically a top-level object as
>> returned from parser.py, which *must* be an Object, so it *must* be a
>> mapping of str => yaddayadda.
> 
> Aha!
> 
> We'll talk some more about naming of type aliases in review of PATCH 08.
> 
>> The type as I intended it is Expression = Dict[str, yaddayadda]
>>
>> where yaddayadda is
>> Union[int, str, bool, List[yaddayadda], Dict[str, yaddayadda]]
> 
> Yes.
> 
> As qapi-code-gen.txt explains, we have two layers of syntax:
> 
> * The bottom layer is (heavily bastardized) JSON.  qapi-code-gen.txt
>    specifies it by listing the differences to RFC 8259.  parser.py parses
>    it into abstract syntax trees.
> 

Aside: A new realization about a deviation from JSON: objects are 
inherently unordered collections.

> * The upper layer recognizes the abstract syntax trees that are valid as
>    QAPI schema.  qapi-code-gen.txt specifies it with a context-free
>    grammar.  expr.py checks the ASTs against that grammar.  It also
>    expands shorthand forms into longhand.
> 
> Detail not documented in qapi-code-gen.txt: parser.py rejects non-object
> at the top-level, so expr.py doesn't have to.
> 

Yep.

>> expr.py is what validates the yaddayadda, so there's no point in trying
>> to type it further, I think.
> 
> If mypy could do recursive types, typing it further would be a
> no-brainer: just state what is.
> 
> Since it can't, we need to stop typing / start cheating at some point.
> Where exactly is not obvious.  Your idea is at least as good as mine.
> 
>> Probably worth a better comment.
> 
> Yes :)
> 

I'll look at Patch 8 and then revisit, but I will attempt to make a 
better comment. I think there are bits of part 5 that makes it a bit 
more obvious, because I create a Real Type :tm: to pass each 
"Expression" as a whole over to to expr.py.

(This is just kind of an intermediate form and as such it's not 
necessarily something I gave tremendous thought to.)

>>> Hmm, there you used Any, not object.  I guess that's because mypy lets
>>> you get away with object here, but not there.  Correct?
>>>
>>
>> Yep. I can get away with the stricter type here because of how we use
>> it, so I did. That's an artifact of it not being recursive and how
>> expr.py's entire raison d'etre is using isinstance() checks to
>> effectively downcast for us everywhere already.
>>
>>> Also, PEP 8 comment line length.
>>>
>>
>> Augh.
>>
>> Is there a way to set emacs mode highlighting in Python such that it
>> highlights when I run past the 72-col margin, but only for comments?
>>
>> I have the general-purpose highlighter on for the 80-col margin.
> 
> Got a .emacs snippet for me?
> 

I use only these bits:

  ;; Reflow-width defaults to 72.
  '(fill-column 72)

  ;; Highlight past column 80
  '(whitespace-line-column 80)

  ;; Theme whitespace highlighting as such:
  '(whitespace-style '(face empty tabs lines-tail trailing))

  ;; Don't insert tabs for spaces
  '(indent-tabs-mode nil)

>> I'm not familiar with any setting like this for any of the linters or
>> pycharm right away either.
> 
> Hmm, ... okay, TIL from pycodestyle(1):
> 
>              --max-line-length=n  set maximum allowed line length (default: 79)
>              --max-doc-length=n   set maximum allowed doc line length and perform these
>                                   checks (unchecked if not set)
> 
> Let me know whether --max-doc-length=72 fits the bill.
> 

It does.

I'd need to send a fixup patch in order to enable it, but I am not 
thrilled with the idea of having to squabble with you over how to break 
lines that are just barely overlong.

Least annoying for me: I write a draft patch to get the flake8 baseline 
for local testing, you copy-edit the patch to your stylistic liking, 
I'll ACK the edits.

>>
>>>>    
>>>>    
>>>>    # Names must be letters, numbers, -, and _.  They must start with letter,
>>>> @@ -287,9 +295,20 @@ def check_event(expr, info):
>>>>    
>>>>    def check_exprs(exprs):
>>>>        for expr_elem in exprs:
>>>> -        expr = expr_elem['expr']
>>>> -        info = expr_elem['info']
>>>> -        doc = expr_elem.get('doc')
>>>> +        # Expression
>>>> +        assert isinstance(expr_elem['expr'], dict)
> 
> Must be an *ordered* mapping, actually.  It's only ever OrderedDict.
> 

Allegedly. Lawsuit pending appeal.

>>>> +        for key in expr_elem['expr'].keys():
>>>> +            assert isinstance(key, str)
>>>> +        expr: Expression = expr_elem['expr']
> 
> *Unchecked* way to tell the type checker (I think):
> 
>               expr = cast(Expression, expr_elem['expr']
> 
> I like checking in general, but is it worth the bother here?
> 

It all goes away in the first half of part 5, where I create an 
Expression object that has typed fields for its components, and mypy's 
static checker does the rest of the lifting.

Could do casts, yeah, but I suppose I liked the assert to let me know 
that the types I saw on the back of my eyelids were the real ones.

>>>
>>> You're fine with repeating exp_elem['expr'] here, and ...
>>>
>>>> +
>>>> +        # QAPISourceInfo
>>>> +        assert isinstance(expr_elem['info'], QAPISourceInfo)
>>>> +        info: QAPISourceInfo = expr_elem['info']
>>>
>>> ... expr_elem['info'] here, but ...
>>>
>>>> +
>>>> +        # Optional[QAPIDoc]
>>>> +        tmp = expr_elem.get('doc')
>>>> +        assert tmp is None or isinstance(tmp, QAPIDoc)
>>>> +        doc: Optional[QAPIDoc] = tmp
>>>
>>> ... you avoid repetition of expr_elem.get('doc') here.  Any particular
>>> reason?
>>>
>>
>> Because this looks like garbage written by a drunkard:
>>
>> assert expr_elem.get('doc') is None or isinstance(expr_elem.get('doc'),
>> QAPIDoc)
>> doc: Optional[QAPIDoc] = expr_elem.get('doc')
> 
> Unchecked way:
> 
>               doc = cast(Optional[QAPIDoc], expr_elem.get('doc'))
> 
>>>>    
>>>>            if 'include' in expr:
>>>>                continue

I'll see if I can clean it up a little. I will take your suggestion of 
casts to mean that you'd be okay with actually not checking the form at 
runtime.
Markus Armbruster March 2, 2021, 5:23 a.m. UTC | #5
John Snow <jsnow@redhat.com> writes:

> On 2/25/21 6:56 AM, Markus Armbruster wrote:
>> John Snow <jsnow@redhat.com> writes:
>> 
>>> On 2/24/21 5:01 AM, Markus Armbruster wrote:
>>>> John Snow <jsnow@redhat.com> writes:
>>>>
>>>>> mypy does not know the types of values stored in Dicts that masquerade
>>>>> as objects. Help the type checker out by constraining the type.
>>>>>
>>>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>>>> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
>>>>> Reviewed-by: Cleber Rosa <crosa@redhat.com>
>>>>> ---
>>>>>    scripts/qapi/expr.py | 25 ++++++++++++++++++++++---
>>>>>    1 file changed, 22 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
>>>>> index 5694c501fa3..783282b53ce 100644
>>>>> --- a/scripts/qapi/expr.py
>>>>> +++ b/scripts/qapi/expr.py
>>>>> @@ -15,9 +15,17 @@
>>>>>    # See the COPYING file in the top-level directory.
>>>>>    
>>>>>    import re
>>>>> +from typing import MutableMapping, Optional
>>>>>    
>>>>>    from .common import c_name
>>>>>    from .error import QAPISemError
>>>>> +from .parser import QAPIDoc
>>>>> +from .source import QAPISourceInfo
>>>>> +
>>>>> +
>>>>> +# Expressions in their raw form are JSON-like structures with arbitrary forms.
>>>>> +# Minimally, their top-level form must be a mapping of strings to values.
>>>>> +Expression = MutableMapping[str, object]
>>>>
>>>> MutableMapping, fancy.  It's only ever dict.  Why abstract from that?
>> 
>> OrderedDict, actually.  MutableMapping is misleading, because it doesn't
>> specify "orderedness".
>> 
>
> Yeah, I am realizing that Dict helps imply that constraint on 3.6+ but 
> that MutableMapping doesn't.
>
> I am worried about how hard it's gonna hurt when I remember why I wanted 
> MutableMapping.
>
>  >:|
>
> For now, I'll go back to Dict.
>
>>> I don't know! I referenced this in the cover letter. I cannot remember
>>> the reason anymore. It had R-Bs on it so I left it alone.
>>>
>>> There are some differences, but I no longer remember why I thought they
>>> applied. Maybe some of my more exploratory work wanted it. Dunno.
>> 
>> Happens.  It's a long patch queue you're trying to flush.
>> 
>>>> The use of object is again owed to mypy's inability to do recursive
>>>> types.  What we really have here is something like
>>>>
>>>>      Expression = Union[bool, str, dict[str, Expression], list[Expression]]
>>>>
>>>> with the root further constrained to the Union's dict branch.  Spell
>>>> that out in a bit more detail, like you did in introspect.py?
>>>>
>>>
>>> Terminology problem?
>>>
>>> I am using "Expression" to mean very specifically a top-level object as
>>> returned from parser.py, which *must* be an Object, so it *must* be a
>>> mapping of str => yaddayadda.
>> 
>> Aha!
>> 
>> We'll talk some more about naming of type aliases in review of PATCH 08.
>> 
>>> The type as I intended it is Expression = Dict[str, yaddayadda]
>>>
>>> where yaddayadda is
>>> Union[int, str, bool, List[yaddayadda], Dict[str, yaddayadda]]
>> 
>> Yes.
>> 
>> As qapi-code-gen.txt explains, we have two layers of syntax:
>> 
>> * The bottom layer is (heavily bastardized) JSON.  qapi-code-gen.txt
>>    specifies it by listing the differences to RFC 8259.  parser.py parses
>>    it into abstract syntax trees.
>> 
>
> Aside: A new realization about a deviation from JSON: objects are 
> inherently unordered collections.

For a value of "new" :)

In JSON *syntax* (the thing defined by its grammar), order matters.
It doesn't in *semantics*.  Except when it does:

    JSON parsing libraries have been observed to differ as to whether or
    not they make the ordering of object members visible to calling
    software.  Implementations whose behavior does not depend on member
    ordering will be interoperable in the sense that they will not be
    affected by these differences.

This is RFC 8259 section 4, Objects.

qapi-code-gen.txt spells it out for the QAPI schema language.  Section
"Schema syntax":

    The order of members within JSON objects does not matter unless
    explicitly noted.

Later sections note explicitly.

>> * The upper layer recognizes the abstract syntax trees that are valid as
>>    QAPI schema.  qapi-code-gen.txt specifies it with a context-free
>>    grammar.  expr.py checks the ASTs against that grammar.  It also
>>    expands shorthand forms into longhand.
>> 
>> Detail not documented in qapi-code-gen.txt: parser.py rejects non-object
>> at the top-level, so expr.py doesn't have to.
>> 
>
> Yep.
>
>>> expr.py is what validates the yaddayadda, so there's no point in trying
>>> to type it further, I think.
>> 
>> If mypy could do recursive types, typing it further would be a
>> no-brainer: just state what is.
>> 
>> Since it can't, we need to stop typing / start cheating at some point.
>> Where exactly is not obvious.  Your idea is at least as good as mine.
>> 
>>> Probably worth a better comment.
>> 
>> Yes :)
>> 
>
> I'll look at Patch 8 and then revisit, but I will attempt to make a 
> better comment. I think there are bits of part 5 that makes it a bit 
> more obvious, because I create a Real Type :tm: to pass each 
> "Expression" as a whole over to to expr.py.
>
> (This is just kind of an intermediate form and as such it's not 
> necessarily something I gave tremendous thought to.)
>
>>>> Hmm, there you used Any, not object.  I guess that's because mypy lets
>>>> you get away with object here, but not there.  Correct?
>>>>
>>>
>>> Yep. I can get away with the stricter type here because of how we use
>>> it, so I did. That's an artifact of it not being recursive and how
>>> expr.py's entire raison d'etre is using isinstance() checks to
>>> effectively downcast for us everywhere already.
>>>
>>>> Also, PEP 8 comment line length.
>>>>
>>>
>>> Augh.
>>>
>>> Is there a way to set emacs mode highlighting in Python such that it
>>> highlights when I run past the 72-col margin, but only for comments?
>>>
>>> I have the general-purpose highlighter on for the 80-col margin.
>> 
>> Got a .emacs snippet for me?
>> 
>
> I use only these bits:
>
>   ;; Reflow-width defaults to 72.
>   '(fill-column 72)
>
>   ;; Highlight past column 80
>   '(whitespace-line-column 80)
>
>   ;; Theme whitespace highlighting as such:
>   '(whitespace-style '(face empty tabs lines-tail trailing))
>
>   ;; Don't insert tabs for spaces
>   '(indent-tabs-mode nil)
>
>>> I'm not familiar with any setting like this for any of the linters or
>>> pycharm right away either.
>> 
>> Hmm, ... okay, TIL from pycodestyle(1):
>> 
>>              --max-line-length=n  set maximum allowed line length (default: 79)
>>              --max-doc-length=n   set maximum allowed doc line length and perform these
>>                                   checks (unchecked if not set)
>> 
>> Let me know whether --max-doc-length=72 fits the bill.
>> 
>
> It does.
>
> I'd need to send a fixup patch in order to enable it, but I am not 
> thrilled with the idea of having to squabble with you over how to break 
> lines that are just barely overlong.
>
> Least annoying for me: I write a draft patch to get the flake8 baseline 
> for local testing, you copy-edit the patch to your stylistic liking, 
> I'll ACK the edits.

Go right ahead.

>>>
>>>>>    
>>>>>    
>>>>>    # Names must be letters, numbers, -, and _.  They must start with letter,
>>>>> @@ -287,9 +295,20 @@ def check_event(expr, info):
>>>>>    
>>>>>    def check_exprs(exprs):
>>>>>        for expr_elem in exprs:
>>>>> -        expr = expr_elem['expr']
>>>>> -        info = expr_elem['info']
>>>>> -        doc = expr_elem.get('doc')
>>>>> +        # Expression
>>>>> +        assert isinstance(expr_elem['expr'], dict)
>> 
>> Must be an *ordered* mapping, actually.  It's only ever OrderedDict.
>> 
>
> Allegedly. Lawsuit pending appeal.
>
>>>>> +        for key in expr_elem['expr'].keys():
>>>>> +            assert isinstance(key, str)
>>>>> +        expr: Expression = expr_elem['expr']
>> 
>> *Unchecked* way to tell the type checker (I think):
>> 
>>               expr = cast(Expression, expr_elem['expr']
>> 
>> I like checking in general, but is it worth the bother here?
>> 
>
> It all goes away in the first half of part 5, where I create an 
> Expression object that has typed fields for its components, and mypy's 
> static checker does the rest of the lifting.
>
> Could do casts, yeah, but I suppose I liked the assert to let me know 
> that the types I saw on the back of my eyelids were the real ones.
>
>>>>
>>>> You're fine with repeating exp_elem['expr'] here, and ...
>>>>
>>>>> +
>>>>> +        # QAPISourceInfo
>>>>> +        assert isinstance(expr_elem['info'], QAPISourceInfo)
>>>>> +        info: QAPISourceInfo = expr_elem['info']
>>>>
>>>> ... expr_elem['info'] here, but ...
>>>>
>>>>> +
>>>>> +        # Optional[QAPIDoc]
>>>>> +        tmp = expr_elem.get('doc')
>>>>> +        assert tmp is None or isinstance(tmp, QAPIDoc)
>>>>> +        doc: Optional[QAPIDoc] = tmp
>>>>
>>>> ... you avoid repetition of expr_elem.get('doc') here.  Any particular
>>>> reason?
>>>>
>>>
>>> Because this looks like garbage written by a drunkard:
>>>
>>> assert expr_elem.get('doc') is None or isinstance(expr_elem.get('doc'),
>>> QAPIDoc)
>>> doc: Optional[QAPIDoc] = expr_elem.get('doc')
>> 
>> Unchecked way:
>> 
>>               doc = cast(Optional[QAPIDoc], expr_elem.get('doc'))
>> 
>>>>>    
>>>>>            if 'include' in expr:
>>>>>                continue
>
> I'll see if I can clean it up a little. I will take your suggestion of 
> casts to mean that you'd be okay with actually not checking the form at 
> runtime.

I am.
diff mbox series

Patch

diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
index 5694c501fa3..783282b53ce 100644
--- a/scripts/qapi/expr.py
+++ b/scripts/qapi/expr.py
@@ -15,9 +15,17 @@ 
 # See the COPYING file in the top-level directory.
 
 import re
+from typing import MutableMapping, Optional
 
 from .common import c_name
 from .error import QAPISemError
+from .parser import QAPIDoc
+from .source import QAPISourceInfo
+
+
+# Expressions in their raw form are JSON-like structures with arbitrary forms.
+# Minimally, their top-level form must be a mapping of strings to values.
+Expression = MutableMapping[str, object]
 
 
 # Names must be letters, numbers, -, and _.  They must start with letter,
@@ -287,9 +295,20 @@  def check_event(expr, info):
 
 def check_exprs(exprs):
     for expr_elem in exprs:
-        expr = expr_elem['expr']
-        info = expr_elem['info']
-        doc = expr_elem.get('doc')
+        # Expression
+        assert isinstance(expr_elem['expr'], dict)
+        for key in expr_elem['expr'].keys():
+            assert isinstance(key, str)
+        expr: Expression = expr_elem['expr']
+
+        # QAPISourceInfo
+        assert isinstance(expr_elem['info'], QAPISourceInfo)
+        info: QAPISourceInfo = expr_elem['info']
+
+        # Optional[QAPIDoc]
+        tmp = expr_elem.get('doc')
+        assert tmp is None or isinstance(tmp, QAPIDoc)
+        doc: Optional[QAPIDoc] = tmp
 
         if 'include' in expr:
             continue