[16/22] qapi/parser: add docstrings

Message ID 20210422030720.3685766-17-jsnow@redhat.com
State New
Series qapi: static typing conversion, pt5a

Commit Message

John Snow April 22, 2021, 3:07 a.m. UTC
Signed-off-by: John Snow <jsnow@redhat.com>

---

My hubris is infinite.

OK, I only added a few -- to help me remember how the parser works at a glance.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 scripts/qapi/parser.py | 66 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

Comments

Markus Armbruster April 25, 2021, 1:27 p.m. UTC | #1
John Snow <jsnow@redhat.com> writes:

> Signed-off-by: John Snow <jsnow@redhat.com>
>
> ---
>
> My hubris is infinite.

Score one of the three principal virtues of a programmer ;)

> OK, I only added a few -- to help me remember how the parser works at a glance.
>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  scripts/qapi/parser.py | 66 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 66 insertions(+)
>
> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
> index dbbd0fcbc2f..8fc77808ace 100644
> --- a/scripts/qapi/parser.py
> +++ b/scripts/qapi/parser.py
> @@ -51,7 +51,24 @@ def __init__(self, parser: 'QAPISchemaParser', msg: str):
>  
>  
>  class QAPISchemaParser:
> +    """
> +    Performs parsing of a QAPI schema source file.

Actually, this parses one of two layers, see qapi-code-gen.txt section
"Schema syntax".  Pointing there might help.

>  
> +    :param fname: Path to the source file

Either "Source file name" or "Source pathname", please.  I prefer "file
name" for additional distance to "path" in the sense of a search path,
i.e. a list of directory names.

> +    :param previously_included:
> +        The absolute paths of previously included source files.

Either "absolute file name" or "absulute pathname".

> +        Only used by recursive calls to avoid re-parsing files.

Feels like detail, not sure it's needed here.

> +    :param incl_info:
> +       `QAPISourceInfo` for the parent document.
> +       This may be None if this is the root schema document.

Recommend s/This may be //.

qapi-code-gen.txt calls a QAPI schema that uses include directives
"modular", and the included files "sub-modules".  s/root schema
document/root module/?

> +
> +    :ivar exprs: Resulting parsed expressions.
> +    :ivar docs: Resulting parsed documentation blocks.

Uh, why are these here?  A doc string is interface documentation...

> +
> +    :raise OSError: For problems opening the root schema document.
> +    :raise QAPIParseError: For JSON or QAPIDoc syntax problems.
> +    :raise QAPISemError: For various semantic issues with the schema.

Should callers care for the difference between QAPIParseError and
QAPISemError?

> +    """
>      def __init__(self,
>                   fname: str,
>                   previously_included: Optional[Set[str]] = None,
> @@ -77,6 +94,11 @@ def __init__(self,
>          self._parse()
>  
>      def _parse(self) -> None:
> +        """
> +        Parse the QAPI schema document.
> +
> +        :return: None; results are stored in ``exprs`` and ``docs``.

Another ignorant doc string markup question...  how am I supposed to see
that exprs and docs are attributes, and not global variables?

> +        """
>          cur_doc = None
>  
>          with open(self._fname, 'r', encoding='utf-8') as fp:
> @@ -197,6 +219,50 @@ def _check(name: str, value: object) -> List[str]:
>              raise QAPISemError(info, "unknown pragma '%s'" % name)
>  
>      def accept(self, skip_comment: bool = True) -> None:
> +        """
> +        Read the next lexeme and process it into a token.
> +
> +        :Object state:
> +          :tok: represents the token type. See below for values.
> +          :pos: is the position of the first character in the lexeme.
> +          :cursor: is the position of the next character.

Define "position" :)  It's an index in self.src.

self.cursor and self.pos are not used outside accept().  Not sure they
belong into interface documentation.

> +          :val: is the variable value of the token, if any.

Missing: self.info, which *is* used outside accept().

> +
> +        Single-character tokens:
> +
> +        These include ``LBRACE``, ``RBRACE``, ``COLON``, ``COMMA``,
> +        ``LSQB``, and ``RSQB``.

"These include ..." is misleading.  This is the complete list of
single-character tokens.

> +        ``LSQB``, and ``RSQB``.  ``tok`` holds the single character
> +        lexeme.  ``val`` is ``None``.
> +
> +        Multi-character tokens:
> +
> +        - ``COMMENT``:
> +
> +          - This token is not normally yielded by the lexer, but it
> +            can be when ``skip_comment`` is False.
> +          - ``tok`` is the value ``"#"``.
> +          - ``val`` is a string including all chars until end-of-line.
> +
> +        - ``STRING``:
> +
> +          - ``tok`` is the ``"'"``, the single quote.
> +          - ``value`` is the string, *excluding* the quotes.
> +
> +        - ``TRUE`` and ``FALSE``:
> +
> +          - ``tok`` is either ``"t"`` or ``"f"`` accordingly.
> +          - ``val`` is either ``True`` or ``False`` accordingly.
> +
> +        - ``NEWLINE`` and ``SPACE``:
> +
> +          - These are consumed by the lexer directly. ``line_pos`` and
> +            ``info`` are advanced when ``NEWLINE`` is encountered.
> +            ``tok`` is set to ``None`` upon reaching EOF.
> +
> +        :param skip_comment:
> +            When false, return ``COMMENT`` tokens.
> +            This is used when reading documentation blocks.

The doc string mostly describes possible state on return of accept().
*Within* accept(), self.tok may be any character.

"Mostly" because item ``NEWLINE`` and ``SPACE`` is about something that
happens within accept().

Perhaps phrasing it as a postcondition would be clearer:

    Read and store the next token.

    On return, self.tok is the token type, self.info describes its
    source location, and self.value is the token's value.

    The possible token types and their values are

    ...

> +        """
>          while True:
>              self.tok = self.src[self.cursor]
>              self.pos = self.cursor
John Snow April 26, 2021, 6:26 p.m. UTC | #2
On 4/25/21 9:27 AM, Markus Armbruster wrote:
> John Snow <jsnow@redhat.com> writes:
> 
>> Signed-off-by: John Snow <jsnow@redhat.com>
>>
>> ---
>>
>> My hubris is infinite.
> 
> Score one of the three principal virtues of a programmer ;)
> 

It was written before the prior review, but I promise I am slowing down 
on adding these. I just genuinely left them to help remind myself how 
these modules are actually structured and work so that I will be able to 
"pop in" quickly in the future and make a tactical, informed edit.

>> OK, I only added a few -- to help me remember how the parser works at a glance.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>   scripts/qapi/parser.py | 66 ++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 66 insertions(+)
>>
>> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
>> index dbbd0fcbc2f..8fc77808ace 100644
>> --- a/scripts/qapi/parser.py
>> +++ b/scripts/qapi/parser.py
>> @@ -51,7 +51,24 @@ def __init__(self, parser: 'QAPISchemaParser', msg: str):
>>   
>>   
>>   class QAPISchemaParser:
>> +    """
>> +    Performs parsing of a QAPI schema source file.
> 
> Actually, this parses one of two layers, see qapi-code-gen.txt section
> "Schema syntax".  Pointing there might help.
> 

It sort of parses one-and-a-half layers, but yes ... I know the 
distinction you're drawing here. This is *mostly* the JSON/AST level.

(With some upper-level or mid-level parsing for Pragmas and Includes.)
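
To make that concrete (an illustrative fragment, not taken from any real
schema in the tree): given

    { 'include': 'sub-module.json' }
    { 'pragma': { 'doc-required': true } }
    { 'command': 'query-foo' }

the first two expressions are handled by the parser itself (an include
triggers a recursive parse; a pragma is applied as it is read), and only
the last one is passed up to the caller via .exprs.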

>>   
>> +    :param fname: Path to the source file
> 
> Either "Source file name" or "Source pathname", please.  I prefer "file
> name" for additional distance to "path" in the sense of a search path,
> i.e. a list of directory names.
> 

OK, I am not sure I have any ... prejudice about when to use which kind 
of description for these sorts of things. I'm happy to defer to you, but 
if there's some kind of existing standard vocabulary I'm trampling all 
over, feel free to point me to your preferred hacker dictionary.

Anyway, happy to adopt your phrasing here.

>> +    :param previously_included:
>> +        The absolute paths of previously included source files.
> 
> Either "absolute file name" or "absolute pathname".
> 

OK.

>> +        Only used by recursive calls to avoid re-parsing files.
> 
> Feels like detail, not sure it's needed here.
> 

You're probably right, but I suppose I wanted to hint/suggest that it 
was not necessary to feed it this argument for the root schema, but it 
was crucial for the recursive calls.

(Earlier I mentioned possibly just passing the parent parser in: that 
helps eliminate some of this ambiguity, too.)

>> +    :param incl_info:
>> +       `QAPISourceInfo` for the parent document.
>> +       This may be None if this is the root schema document.
> 
> Recommend s/This may be //.
> 
> qapi-code-gen.txt calls a QAPI schema that uses include directives
> "modular", and the included files "sub-modules".  s/root schema
> document/root module/?
> 

Sure. All in favor of phrasing consistency.

(By the way: I did write up a draft for converting qapi-code-gen.txt to 
ReST format, and if I had finished that, it might be nice to hotlink to 
it here. I stopped for now because I wanted to solidify some conventions 
on how to markup certain constructs first, and wanted ... not to 
overwhelm you with more doc-wrangling.)

>> +
>> +    :ivar exprs: Resulting parsed expressions.
>> +    :ivar docs: Resulting parsed documentation blocks.
> 
> Uh, why are these here?  A doc string is interface documentation...
> 

These *are* interface. It is how callers are expected to get the results 
of parsing.

We could change that, of course, but that is absolutely how this class 
works today.

>> +
>> +    :raise OSError: For problems opening the root schema document.
>> +    :raise QAPIParseError: For JSON or QAPIDoc syntax problems.
>> +    :raise QAPISemError: For various semantic issues with the schema.
> 
> Should callers care for the difference between QAPIParseError and
> QAPISemError?
> 

That's up to the caller, I suppose. I just dutifully reported the truth 
of the matter here.

(That's a real non-answer, I know.)

I could always document QAPISourceError instead, with a note about the 
subclasses used for completeness.

(The intent is that QAPIError is always assumed/implied to be sufficient 
for capturing absolutely everything raised directly by this package, if 
you want to ignore the meanings behind them.)
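
Roughly, the caller pattern I have in mind looks like this (a sketch
only; the import paths follow the in-tree package layout, but the
function and loop bodies are illustrative, not actual call sites):

    from qapi.error import QAPIError
    from qapi.parser import QAPISchemaParser

    def load_schema(fname: str) -> None:
        try:
            parser = QAPISchemaParser(fname)
        except QAPIError as err:
            # Catching the base class is enough; the subclasses mostly
            # add convenience on the raising side.
            print(err)
            return
        # Results are exposed as instance attributes, not return values.
        for expr in parser.exprs:
            ...
        for doc in parser.docs:
            ...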

>> +    """
>>       def __init__(self,
>>                    fname: str,
>>                    previously_included: Optional[Set[str]] = None,
>> @@ -77,6 +94,11 @@ def __init__(self,
>>           self._parse()
>>   
>>       def _parse(self) -> None:
>> +        """
>> +        Parse the QAPI schema document.
>> +
>> +        :return: None; results are stored in ``exprs`` and ``docs``.
> 
> Another ignorant doc string markup question...  how am I supposed to see
> that exprs and docs are attributes, and not global variables?
> 

I don't know, it's an unsolved mystery for me too. I need more time in 
the Sphinx dungeon to figure out how this stuff is supposed to work. 
You're right to wonder.

>> +        """
>>           cur_doc = None
>>   
>>           with open(self._fname, 'r', encoding='utf-8') as fp:
>> @@ -197,6 +219,50 @@ def _check(name: str, value: object) -> List[str]:
>>               raise QAPISemError(info, "unknown pragma '%s'" % name)
>>   
>>       def accept(self, skip_comment: bool = True) -> None:
>> +        """
>> +        Read the next lexeme and process it into a token.
>> +
>> +        :Object state:
>> +          :tok: represents the token type. See below for values.
>> +          :pos: is the position of the first character in the lexeme.
>> +          :cursor: is the position of the next character.
> 
> Define "position" :)  It's an index in self.src.
> 

Good call.

> self.cursor and self.pos are not used outside accept().  Not sure they
> belong into interface documentation.
> 

Fair point, though I was on a mission to document exactly how the parser 
works even at the internal level, because accept(), despite being 
"public", is really more of an internal function here.

I am somewhat partial to documenting these state variables for my own 
sake so that I can remember the way this lexer behaves.

>> +          :val: is the variable value of the token, if any.
> 
> Missing: self.info, which *is* used outside accept().
> 

Oh, yes.

>> +
>> +        Single-character tokens:
>> +
>> +        These include ``LBRACE``, ``RBRACE``, ``COLON``, ``COMMA``,
>> +        ``LSQB``, and ``RSQB``.
> 
> "These include ..." is misleading.  This is the complete list of
> single-character tokens.
> 

I'm just testing your ability to recognize the difference between proper 
and improper subsets.

(Joking. I'll reword to avoid that ambiguity.)

>> +        ``LSQB``, and ``RSQB``.  ``tok`` holds the single character
>> +        lexeme.  ``val`` is ``None``.
>> +
>> +        Multi-character tokens:
>> +
>> +        - ``COMMENT``:
>> +
>> +          - This token is not normally yielded by the lexer, but it
>> +            can be when ``skip_comment`` is False.
>> +          - ``tok`` is the value ``"#"``.
>> +          - ``val`` is a string including all chars until end-of-line.
>> +
>> +        - ``STRING``:
>> +
>> +          - ``tok`` is the ``"'"``, the single quote.
>> +          - ``value`` is the string, *excluding* the quotes.
>> +
>> +        - ``TRUE`` and ``FALSE``:
>> +
>> +          - ``tok`` is either ``"t"`` or ``"f"`` accordingly.
>> +          - ``val`` is either ``True`` or ``False`` accordingly.
>> +
>> +        - ``NEWLINE`` and ``SPACE``:
>> +
>> +          - These are consumed by the lexer directly. ``line_pos`` and
>> +            ``info`` are advanced when ``NEWLINE`` is encountered.
>> +            ``tok`` is set to ``None`` upon reaching EOF.
>> +
>> +        :param skip_comment:
>> +            When false, return ``COMMENT`` tokens.
>> +            This is used when reading documentation blocks.
> 
> The doc string mostly describes possible state on return of accept().
> *Within* accept(), self.tok may be any character.
> 
> "Mostly" because item ``NEWLINE`` and ``SPACE`` is about something that
> happens within accept().
> 

Almost kinda-sorta. The value of "tok" is important there, too.

> Perhaps phrasing it as a postcondition would be clearer:
> 
>      Read and store the next token.
> 
>      On return, self.tok is the token type, self.info describes its
>      source location, and self.value is the token's value.
> 
>      The possible token types and their values are
> 
>      ...
> 

OK, I will play with this suggestion while I try to clean up the docs.

>> +        """
>>           while True:
>>               self.tok = self.src[self.cursor]
>>               self.pos = self.cursor

Thanks for taking a look at this one.

--js
Markus Armbruster April 27, 2021, 9:03 a.m. UTC | #3
John Snow <jsnow@redhat.com> writes:

> On 4/25/21 9:27 AM, Markus Armbruster wrote:
>> John Snow <jsnow@redhat.com> writes:
>> 
>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>>
>>> ---
>>>
>>> My hubris is infinite.
>> 
>> Score one of the three principal virtues of a programmer ;)
>> 
>
> It was written before the prior review, but I promise I am slowing down 
> on adding these. I just genuinely left them to help remind myself how 
> these modules are actually structured and work so that I will be able to 
> "pop in" quickly in the future and make a tactical, informed edit.
>
>>> OK, I only added a few -- to help me remember how the parser works at a glance.
>>>
>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>> ---
>>>   scripts/qapi/parser.py | 66 ++++++++++++++++++++++++++++++++++++++++++
>>>   1 file changed, 66 insertions(+)
>>>
>>> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
>>> index dbbd0fcbc2f..8fc77808ace 100644
>>> --- a/scripts/qapi/parser.py
>>> +++ b/scripts/qapi/parser.py
>>> @@ -51,7 +51,24 @@ def __init__(self, parser: 'QAPISchemaParser', msg: str):
>>>   
>>>   
>>>   class QAPISchemaParser:
>>> +    """
>>> +    Performs parsing of a QAPI schema source file.
>> 
>> Actually, this parses one of two layers, see qapi-code-gen.txt section
>> "Schema syntax".  Pointing there might help.
>> 
>
> It sort of parses one-and-a-half layers, but yes ... I know the 
> distinction you're drawing here. This is *mostly* the JSON/AST level.
>
> (With some upper-level or mid-level parsing for Pragmas and Includes.)

True.  I chose simplicity over purity.

>>>   
>>> +    :param fname: Path to the source file
>> 
>> Either "Source file name" or "Source pathname", please.  I prefer "file
>> name" for additional distance to "path" in the sense of a search path,
>> i.e. a list of directory names.
>> 
>
> OK, I am not sure I have any ... prejudice about when to use which kind 
> of description for these sorts of things. I'm happy to defer to you, but 
> if there's some kind of existing standard vocabulary I'm trampling all 
> over, feel free to point me to your preferred hacker dictionary.
>
> Anyway, happy to adopt your phrasing here.
>
>>> +    :param previously_included:
>>> +        The absolute paths of previously included source files.
>> 
>> Either "absolute file name" or "absolute pathname".
>> 
>
> OK.
>
>>> +        Only used by recursive calls to avoid re-parsing files.
>> 
>> Feels like detail, not sure it's needed here.
>> 
>
> You're probably right, but I suppose I wanted to hint/suggest that it 
> was not necessary to feed it this argument for the root schema, but it 
> was crucial for the recursive calls.

To me "if root schema, then nothing was previously included" feels
obvious enough :)  But if you want to spell out proper use of the
parameter, I recommend to stick to the interface, i.e. when to pass it,
not what the function does with it (in the hope that the reader can
then guess when to pass it).

> (Earlier I mentioned possibly just passing the parent parser in: that 
> helps eliminate some of this ambiguity, too.)
>
>>> +    :param incl_info:
>>> +       `QAPISourceInfo` for the parent document.
>>> +       This may be None if this is the root schema document.
>> 
>> Recommend s/This may be //.
>> 
>> qapi-code-gen.txt calls a QAPI schema that uses include directives
>> "modular", and the included files "sub-modules".  s/root schema
>> document/root module/?
>> 
>
> Sure. All in favor of phrasing consistency.
>
> (By the way: I did write up a draft for converting qapi-code-gen.txt to 
> ReST format, and if I had finished that, it might be nice to hotlink to 
> it here. I stopped for now because I wanted to solidify some conventions 
> on how to markup certain constructs first, and wanted ... not to 
> overwhelm you with more doc-wrangling.)

Appreciated :)

>>> +
>>> +    :ivar exprs: Resulting parsed expressions.
>>> +    :ivar docs: Resulting parsed documentation blocks.
>> 
>> Uh, why are these here?  A doc string is interface documentation...
>> 
>
> These *are* interface. It is how callers are expected to get the results 
> of parsing.

You're right, but is the constructor the right place to document
attributes?

> We could change that, of course, but that is absolutely how this class 
> works today.
>
>>> +
>>> +    :raise OSError: For problems opening the root schema document.
>>> +    :raise QAPIParseError: For JSON or QAPIDoc syntax problems.
>>> +    :raise QAPISemError: For various semantic issues with the schema.
>> 
>> Should callers care for the difference between QAPIParseError and
>> QAPISemError?
>> 
>
> That's up to the caller, I suppose. I just dutifully reported the truth 
> of the matter here.
>
> (That's a real non-answer, I know.)
>
> I could always document QAPISourceError instead, with a note about the 
> subclasses used for completeness.
>
> (The intent is that QAPIError is always assumed/implied to be sufficient 
> for capturing absolutely everything raised directly by this package, if 
> you want to ignore the meanings behind them.)

I honestly can't think of a reason for catching anything but QAPIError.
The other classes exist only to give us more convenient ways to
construct instances of QAPIError.  We could replace them all by
functions returning QAPIError.
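
Rough illustration of that idea (sem_error() is made up; QAPIError and
QAPISourceInfo are the real classes):

    from qapi.error import QAPIError
    from qapi.source import QAPISourceInfo

    def sem_error(info: QAPISourceInfo, msg: str) -> QAPIError:
        # Would replace QAPISemError(info, msg); raising stays a
        # one-liner, e.g.:
        #     raise sem_error(info, "unknown pragma '%s'" % name)
        return QAPIError(f"{info}: {msg}")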

>>> +    """
>>>       def __init__(self,
>>>                    fname: str,
>>>                    previously_included: Optional[Set[str]] = None,
>>> @@ -77,6 +94,11 @@ def __init__(self,
>>>           self._parse()
>>>   
>>>       def _parse(self) -> None:
>>> +        """
>>> +        Parse the QAPI schema document.
>>> +
>>> +        :return: None; results are stored in ``exprs`` and ``docs``.
>> 
>> Another ignorant doc string markup question...  how am I supposed to see
>> that exprs and docs are attributes, and not global variables?
>> 
>
> I don't know, it's an unsolved mystery for me too. I need more time in 
> the Sphinx dungeon to figure out how this stuff is supposed to work. 
> You're right to wonder.

Use self.exprs and self.docs meanwhile?

>>> +        """
>>>           cur_doc = None
>>>   
>>>           with open(self._fname, 'r', encoding='utf-8') as fp:
>>> @@ -197,6 +219,50 @@ def _check(name: str, value: object) -> List[str]:
>>>               raise QAPISemError(info, "unknown pragma '%s'" % name)
>>>   
>>>       def accept(self, skip_comment: bool = True) -> None:
>>> +        """
>>> +        Read the next lexeme and process it into a token.
>>> +
>>> +        :Object state:
>>> +          :tok: represents the token type. See below for values.
>>> +          :pos: is the position of the first character in the lexeme.
>>> +          :cursor: is the position of the next character.
>> 
>> Define "position" :)  It's an index in self.src.
>> 
>
> Good call.
>
>> self.cursor and self.pos are not used outside accept().  Not sure they
>> belong into interface documentation.
>> 
>
> Fair point, though I was on a mission to document exactly how the parser 
> works even at the internal level, because accept(), despite being 
> "public", is really more of an internal function here.
>
> I am somewhat partial to documenting these state variables for my own 
> sake so that I can remember the way this lexer behaves.

I understand why you want to document how they work.  Since they're
internal to accept(), a comment in accept() seems more proper than
accept() doc string.  Admittedly doesn't matter that much, as accept()
is internal to the class.

>>> +          :val: is the variable value of the token, if any.
>> 
>> Missing: self.info, which *is* used outside accept().
>> 
>
> Oh, yes.
>
>>> +
>>> +        Single-character tokens:
>>> +
>>> +        These include ``LBRACE``, ``RBRACE``, ``COLON``, ``COMMA``,
>>> +        ``LSQB``, and ``RSQB``.
>> 
>> "These include ..." is misleading.  This is the complete list of
>> single-character tokens.
>> 
>
> I'm just testing your ability to recognize the difference between proper 
> and improper subsets.
>
> (Joking. I'll reword to avoid that ambiguity.)
>
>>> +        ``LSQB``, and ``RSQB``.  ``tok`` holds the single character
>>> +        lexeme.  ``val`` is ``None``.
>>> +
>>> +        Multi-character tokens:
>>> +
>>> +        - ``COMMENT``:
>>> +
>>> +          - This token is not normally yielded by the lexer, but it
>>> +            can be when ``skip_comment`` is False.
>>> +          - ``tok`` is the value ``"#"``.
>>> +          - ``val`` is a string including all chars until end-of-line.
>>> +
>>> +        - ``STRING``:
>>> +
>>> +          - ``tok`` is the ``"'"``, the single quote.
>>> +          - ``value`` is the string, *excluding* the quotes.
>>> +
>>> +        - ``TRUE`` and ``FALSE``:
>>> +
>>> +          - ``tok`` is either ``"t"`` or ``"f"`` accordingly.
>>> +          - ``val`` is either ``True`` or ``False`` accordingly.
>>> +
>>> +        - ``NEWLINE`` and ``SPACE``:
>>> +
>>> +          - These are consumed by the lexer directly. ``line_pos`` and
>>> +            ``info`` are advanced when ``NEWLINE`` is encountered.
>>> +            ``tok`` is set to ``None`` upon reaching EOF.
>>> +
>>> +        :param skip_comment:
>>> +            When false, return ``COMMENT`` tokens.
>>> +            This is used when reading documentation blocks.
>> 
>> The doc string mostly describes possible state on return of accept().
>> *Within* accept(), self.tok may be any character.
>> 
>> "Mostly" because item ``NEWLINE`` and ``SPACE`` is about something that
>> happens within accept().
>> 
>
> Almost kinda-sorta. The value of "tok" is important there, too.

--verbose?

>> Perhaps phrasing it as a postcondition would be clearer:
>> 
>>      Read and store the next token.
>> 
>>      On return, self.tok is the token type, self.info describes its
>>      source location, and self.value is the token's value.
>> 
>>      The possible token types and their values are
>> 
>>      ...
>> 
>
> OK, I will play with this suggestion while I try to clean up the docs.
>
>>> +        """
>>>           while True:
>>>               self.tok = self.src[self.cursor]
>>>               self.pos = self.cursor
>
> Thanks for taking a look at this one.

Thank *you* for documenting my[*] code!


[*] Some of it mine in the sense I wrote it, some of it mine in the
sense I maintain it.
John Snow May 6, 2021, 2:08 a.m. UTC | #4
On 4/27/21 5:03 AM, Markus Armbruster wrote:
> John Snow <jsnow@redhat.com> writes:
> 
>> On 4/25/21 9:27 AM, Markus Armbruster wrote:
>>> John Snow <jsnow@redhat.com> writes:
>>>
>>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>>>
>>>> ---
>>>>
>>>> My hubris is infinite.
>>>
>>> Score one of the three principal virtues of a programmer ;)
>>>
>>
>> It was written before the prior review, but I promise I am slowing down
>> on adding these. I just genuinely left them to help remind myself how
>> these modules are actually structured and work so that I will be able to
>> "pop in" quickly in the future and make a tactical, informed edit.
>>
>>>> OK, I only added a few -- to help me remember how the parser works at a glance.
>>>>
>>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>>> ---
>>>>    scripts/qapi/parser.py | 66 ++++++++++++++++++++++++++++++++++++++++++
>>>>    1 file changed, 66 insertions(+)
>>>>
>>>> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
>>>> index dbbd0fcbc2f..8fc77808ace 100644
>>>> --- a/scripts/qapi/parser.py
>>>> +++ b/scripts/qapi/parser.py
>>>> @@ -51,7 +51,24 @@ def __init__(self, parser: 'QAPISchemaParser', msg: str):
>>>>    
>>>>    
>>>>    class QAPISchemaParser:
>>>> +    """
>>>> +    Performs parsing of a QAPI schema source file.
>>>
>>> Actually, this parses one of two layers, see qapi-code-gen.txt section
>>> "Schema syntax".  Pointing there might help.
>>>
>>
>> It sort of parses one-and-a-half layers, but yes ... I know the
>> distinction you're drawing here. This is *mostly* the JSON/AST level.
>>
>> (With some upper-level or mid-level parsing for Pragmas and Includes.)
> 
> True.  I chose simplicity over purity.
> 
>>>>    
>>>> +    :param fname: Path to the source file
>>>
>>> Either "Source file name" or "Source pathname", please.  I prefer "file
>>> name" for additional distance to "path" in the sense of a search path,
>>> i.e. a list of directory names.
>>>
>>
>> OK, I am not sure I have any ... prejudice about when to use which kind
>> of description for these sorts of things. I'm happy to defer to you, but
>> if there's some kind of existing standard vocabulary I'm trampling all
>> over, feel free to point me to your preferred hacker dictionary.
>>
>> Anyway, happy to adopt your phrasing here.
>>
>>>> +    :param previously_included:
>>>> +        The absolute paths of previously included source files.
>>>
>>> Either "absolute file name" or "absolute pathname".
>>>
>>
>> OK.
>>
>>>> +        Only used by recursive calls to avoid re-parsing files.
>>>
>>> Feels like detail, not sure it's needed here.
>>>
>>
>> You're probably right, but I suppose I wanted to hint/suggest that it
>> was not necessary to feed it this argument for the root schema, but it
>> was crucial for the recursive calls.
> 
> To me "if root schema, then nothing was previously included" feels
> obvious enough :)  But if you want to spell out proper use of the
> parameter, I recommend to stick to the interface, i.e. when to pass it,
> not what the function does with it (in the hope that the reader can
> then guess when to pass it).
> 
>> (Earlier I mentioned possibly just passing the parent parser in: that
>> helps eliminate some of this ambiguity, too.)
>>
>>>> +    :param incl_info:
>>>> +       `QAPISourceInfo` for the parent document.
>>>> +       This may be None if this is the root schema document.
>>>
>>> Recommend s/This may be //.
>>>
>>> qapi-code-gen.txt calls a QAPI schema that uses include directives
>>> "modular", and the included files "sub-modules".  s/root schema
>>> document/root module/?
>>>
>>
>> Sure. All in favor of phrasing consistency.
>>
>> (By the way: I did write up a draft for converting qapi-code-gen.txt to
>> ReST format, and if I had finished that, it might be nice to hotlink to
>> it here. I stopped for now because I wanted to solidify some conventions
>> on how to markup certain constructs first, and wanted ... not to
>> overwhelm you with more doc-wrangling.)
> 
> Appreciated :)
> 
>>>> +
>>>> +    :ivar exprs: Resulting parsed expressions.
>>>> +    :ivar docs: Resulting parsed documentation blocks.
>>>
>>> Uh, why are these here?  A doc string is interface documentation...
>>>
>>
>> These *are* interface. It is how callers are expected to get the results
>> of parsing.
> 
> You're right, but is the constructor the right place to document
> attributes?
> 

This is the docstring for the class, actually.

https://www.python.org/dev/peps/pep-0257/ says:

"The docstring for a class should summarize its behavior and list the 
public methods and instance variables. If the class is intended to be 
subclassed, and has an additional interface for subclasses, this 
interface should be listed separately (in the docstring). The class 
constructor should be documented in the docstring for its __init__ 
method. Individual methods should be documented by their own docstring."

So that's where the parameters for the init method go, as well as class 
and instance variables.

One-stop shop for interface documentation.
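
Concretely, the shape I'm aiming for is roughly this (schematic wording
only, with the review feedback folded in; not the final patch text):

    class QAPISchemaParser:
        """
        Parse a QAPI schema source file.

        :param fname: Source file name.
        :param previously_included:
            Absolute names of previously included source files,
            if being invoked from another parser.
        :param incl_info:
            `QAPISourceInfo` belonging to the parent module.
            ``None`` implies this is the root module.

        :ivar exprs: Resulting parsed expressions.
        :ivar docs: Resulting parsed documentation blocks.

        :raise OSError: For problems reading the root schema document.
        :raise QAPIError: For errors in the schema source.
        """

        def __init__(self, fname, previously_included=None, incl_info=None):
            ...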

>> We could change that, of course, but that is absolutely how this class
>> works today.
>>
>>>> +
>>>> +    :raise OSError: For problems opening the root schema document.
>>>> +    :raise QAPIParseError: For JSON or QAPIDoc syntax problems.
>>>> +    :raise QAPISemError: For various semantic issues with the schema.
>>>
>>> Should callers care for the difference between QAPIParseError and
>>> QAPISemError?
>>>
>>
>> That's up to the caller, I suppose. I just dutifully reported the truth
>> of the matter here.
>>
>> (That's a real non-answer, I know.)
>>
>> I could always document QAPISourceError instead, with a note about the
>> subclasses used for completeness.
>>
>> (The intent is that QAPIError is always assumed/implied to be sufficient
>> for capturing absolutely everything raised directly by this package, if
>> you want to ignore the meanings behind them.)
> 
> I honestly can't think of a reason for catching anything but QAPIError.
> The other classes exist only to give us more convenient ways to
> construct instances of QAPIError.  We could replace them all by
> functions returning QAPIError.
> 

Summary it is.

>>>> +    """
>>>>        def __init__(self,
>>>>                     fname: str,
>>>>                     previously_included: Optional[Set[str]] = None,
>>>> @@ -77,6 +94,11 @@ def __init__(self,
>>>>            self._parse()
>>>>    
>>>>        def _parse(self) -> None:
>>>> +        """
>>>> +        Parse the QAPI schema document.
>>>> +
>>>> +        :return: None; results are stored in ``exprs`` and ``docs``.
>>>
>>> Another ignorant doc string markup question...  how am I supposed to see
>>> that exprs and docs are attributes, and not global variables?
>>>
>>
>> I don't know, it's an unsolved mystery for me too. I need more time in
>> the Sphinx dungeon to figure out how this stuff is supposed to work.
>> You're right to wonder.
> 
> Use self.exprs and self.docs meanwhile?
> 

If I don't accidentally trip and fall and decide to care more about it 
by the time I finish revising the docs tomorrow, yes.

>>>> +        """
>>>>            cur_doc = None
>>>>    
>>>>            with open(self._fname, 'r', encoding='utf-8') as fp:
>>>> @@ -197,6 +219,50 @@ def _check(name: str, value: object) -> List[str]:
>>>>                raise QAPISemError(info, "unknown pragma '%s'" % name)
>>>>    
>>>>        def accept(self, skip_comment: bool = True) -> None:
>>>> +        """
>>>> +        Read the next lexeme and process it into a token.
>>>> +
>>>> +        :Object state:
>>>> +          :tok: represents the token type. See below for values.
>>>> +          :pos: is the position of the first character in the lexeme.
>>>> +          :cursor: is the position of the next character.
>>>
>>> Define "position" :)  It's an index in self.src.
>>>
>>
>> Good call.
>>
>>> self.cursor and self.pos are not used outside accept().  Not sure they
>>> belong into interface documentation.
>>>
>>
>> Fair point, though I was on a mission to document exactly how the parser
>> works even at the internal level, because accept(), despite being
>> "public", is really more of an internal function here.
>>
>> I am somewhat partial to documenting these state variables for my own
>> sake so that I can remember the way this lexer behaves.
> 
> I understand why you want to document how they work.  Since they're
> internal to accept(), a comment in accept() seems more proper than
> accept() doc string.  Admittedly doesn't matter that much, as accept()
> is internal to the class.
> 

OK, I'll take it into consideration and see what subjectively looks and 
feels the nicest.

>>>> +          :val: is the variable value of the token, if any.
>>>
>>> Missing: self.info, which *is* used outside accept().
>>>
>>
>> Oh, yes.
>>
>>>> +
>>>> +        Single-character tokens:
>>>> +
>>>> +        These include ``LBRACE``, ``RBRACE``, ``COLON``, ``COMMA``,
>>>> +        ``LSQB``, and ``RSQB``.
>>>
>>> "These include ..." is misleading.  This is the complete list of
>>> single-character tokens.
>>>
>>
>> I'm just testing your ability to recognize the difference between proper
>> and improper subsets.
>>
>> (Joking. I'll reword to avoid that ambiguity.)
>>
>>>> +        ``LSQB``, and ``RSQB``.  ``tok`` holds the single character
>>>> +        lexeme.  ``val`` is ``None``.
>>>> +
>>>> +        Multi-character tokens:
>>>> +
>>>> +        - ``COMMENT``:
>>>> +
>>>> +          - This token is not normally yielded by the lexer, but it
>>>> +            can be when ``skip_comment`` is False.
>>>> +          - ``tok`` is the value ``"#"``.
>>>> +          - ``val`` is a string including all chars until end-of-line.
>>>> +
>>>> +        - ``STRING``:
>>>> +
>>>> +          - ``tok`` is the ``"'"``, the single quote.
>>>> +          - ``value`` is the string, *excluding* the quotes.
>>>> +
>>>> +        - ``TRUE`` and ``FALSE``:
>>>> +
>>>> +          - ``tok`` is either ``"t"`` or ``"f"`` accordingly.
>>>> +          - ``val`` is either ``True`` or ``False`` accordingly.
>>>> +
>>>> +        - ``NEWLINE`` and ``SPACE``:
>>>> +
>>>> +          - These are consumed by the lexer directly. ``line_pos`` and
>>>> +            ``info`` are advanced when ``NEWLINE`` is encountered.
>>>> +            ``tok`` is set to ``None`` upon reaching EOF.
>>>> +
>>>> +        :param skip_comment:
>>>> +            When false, return ``COMMENT`` tokens.
>>>> +            This is used when reading documentation blocks.
>>>
>>> The doc string mostly describes possible state on return of accept().
>>> *Within* accept(), self.tok may be any character.
>>>
>>> "Mostly" because item ``NEWLINE`` and ``SPACE`` is about something that
>>> happens within accept().
>>>
>>
>> Almost kinda-sorta. The value of "tok" is important there, too.
> 
> --verbose?
> 

Fair enough. I'll trim it down. There is some future bleed from some 
experimental stuff I cut out here.

(It's been banished to some realm even further beyond pt5c, the 
oft-feared but seldom-mentioned pt7. It is spoken of only in frightened 
whispers; leading QAPI scholars are as yet unable to confirm it truly 
exists.)

>>> Perhaps phrasing it as a postcondition would be clearer:
>>>
>>>       Read and store the next token.
>>>
>>>       On return, self.tok is the token type, self.info describes its
>>>       source location, and self.value is the token's value.
>>>
>>>       The possible token types and their values are
>>>
>>>       ...
>>>
>>
>> OK, I will play with this suggestion while I try to clean up the docs.
>>
>>>> +        """
>>>>            while True:
>>>>                self.tok = self.src[self.cursor]
>>>>                self.pos = self.cursor
>>
>> Thanks for taking a look at this one.
> 
> Thank *you* for documenting my[*] code!
> 
> 
> [*] Some of it mine in the sense I wrote it, some of it mine in the
> sense I maintain it.
> 
> 

I assure you it's entirely selfish. I have the memory of a goldfish and 
the docs I wrote myself here have *already* come in handy for reminding 
myself what's going on in here.

--js
John Snow May 7, 2021, 1:34 a.m. UTC | #5
On 4/25/21 9:27 AM, Markus Armbruster wrote:
> Another ignorant doc string markup question...  how am I supposed to see
> that exprs and docs are attributes, and not global variables?
> 

The syntax is apparently supposed to be :py:attr:`MyClass.attr`. Though, 
it doesn't seem to be working for me. I can write :py:attr:`bzbxglkdsgl` 
and the build succeeds. I gotta hunch:

Sphinx was designed to parse ReST written by hand. The " .. py:method::" 
directives are ones you'd use when using sphinx in that style. Those 
directives are what create an object in Sphinx's cross-reference system. 
Later, if you use :py:meth:`foo`, it references that specific object.

Sphinx autodoc is a system that parses your code and automatically 
generates py:method:: and py:class:: directives for you, allowing the 
reference syntax to work.

MY HUNCH is that for field list markup within a docstring -- things like 
:ivar: -- there is no corresponding object being created, which renders 
cross-references for things at that scope ineffective when using autodoc.

BOO, BOO, A THOUSAND TIMES BOO TO THIS.

Argh, yep.

If I use:

     .. py:attribute:: exprs

         Resulting parsed expressions.

instead of

:ivar exprs: Resulting parsed expressions

then the syntax :attr:`qapi.parser.QAPISchemaParser.exprs` does resolve 
into a clickable hyperlink on the rendered output.

  ____   ___   ___   ___  _
| __ ) / _ \ / _ \ / _ \| |
|  _ \| | | | | | | | | | |
| |_) | |_| | |_| | |_| |_|
|____/ \___/ \___/ \___/(_)


Sigh. Well, while I'm here doing the research and talking to myself, the 
syntax :attr:`exprs` also works when you have the target defined. It 
doesn't have to be as verbose. With my testing setup of using the 
default role of "any", even just `exprs` works.
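
For the record, the "default role" bit is just the stock Sphinx option in
my test conf.py, nothing QEMU-specific:

    # conf.py: make bare `name` references behave like :any:`name`,
    # so a plain `exprs` can resolve to the documented attribute.
    default_role = 'any'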

I wonder if there's the possibility of having sphinx enhance :ivar: and 
:cvar: to automatically create the same kind of reference target as 
py:attribute:: does.

Problems for later.

For now ...

``.exprs`` and ``.docs``?

--js
Markus Armbruster May 7, 2021, 8:25 a.m. UTC | #6
John Snow <jsnow@redhat.com> writes:

> On 4/25/21 9:27 AM, Markus Armbruster wrote:
>> Another ignorant doc string markup question...  how am I supposed to see
>> that exprs and docs are attributes, and not global variables?
>> 
>
> The syntax is apparently supposed to be :py:attr:`MyClass.attr`. Though, 
> it doesn't seem to be working for me. I can write :py:attr:`bzbxglkdsgl` 
> and the build succeeds. I gotta hunch:

[Problems...]

>
>   ____   ___   ___   ___  _
> | __ ) / _ \ / _ \ / _ \| |
> |  _ \| | | | | | | | | | |
> | |_) | |_| | |_| | |_| |_|
> |____/ \___/ \___/ \___/(_)
>
>
> Sigh. Well, while I'm here doing the research and talking to myself, the 
> syntax :attr:`exprs` also works when you have the target defined. It 
> doesn't have to be as verbose. With my testing setup of using the 
> default role of "any", even just `exprs` works.
>
> I wonder if there's the possibility of having sphinx enhance :ivar: and 
> :cvar: to automatically create the same kind of reference target as 
> py:attribute:: does.
>
> Problems for later.
>
> For now ...
>
> ``.exprs`` and ``.docs``?

Works for me.

Patch

diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
index dbbd0fcbc2f..8fc77808ace 100644
--- a/scripts/qapi/parser.py
+++ b/scripts/qapi/parser.py
@@ -51,7 +51,24 @@  def __init__(self, parser: 'QAPISchemaParser', msg: str):
 
 
 class QAPISchemaParser:
+    """
+    Performs parsing of a QAPI schema source file.
 
+    :param fname: Path to the source file.
+    :param previously_included:
+        The absolute paths of previously included source files.
+        Only used by recursive calls to avoid re-parsing files.
+    :param incl_info:
+       `QAPISourceInfo` for the parent document.
+       This may be None if this is the root schema document.
+
+    :ivar exprs: Resulting parsed expressions.
+    :ivar docs: Resulting parsed documentation blocks.
+
+    :raise OSError: For problems opening the root schema document.
+    :raise QAPIParseError: For JSON or QAPIDoc syntax problems.
+    :raise QAPISemError: For various semantic issues with the schema.
+    """
     def __init__(self,
                  fname: str,
                  previously_included: Optional[Set[str]] = None,
@@ -77,6 +94,11 @@  def __init__(self,
         self._parse()
 
     def _parse(self) -> None:
+        """
+        Parse the QAPI schema document.
+
+        :return: None; results are stored in ``exprs`` and ``docs``.
+        """
         cur_doc = None
 
         with open(self._fname, 'r', encoding='utf-8') as fp:
@@ -197,6 +219,50 @@  def _check(name: str, value: object) -> List[str]:
             raise QAPISemError(info, "unknown pragma '%s'" % name)
 
     def accept(self, skip_comment: bool = True) -> None:
+        """
+        Read the next lexeme and process it into a token.
+
+        :Object state:
+          :tok: represents the token type. See below for values.
+          :pos: is the position of the first character in the lexeme.
+          :cursor: is the position of the next character.
+          :val: is the variable value of the token, if any.
+
+        Single-character tokens:
+
+        These include ``LBRACE``, ``RBRACE``, ``COLON``, ``COMMA``,
+        ``LSQB``, and ``RSQB``.  ``tok`` holds the single character
+        lexeme.  ``val`` is ``None``.
+
+        Multi-character tokens:
+
+        - ``COMMENT``:
+
+          - This token is not normally yielded by the lexer, but it
+            can be when ``skip_comment`` is False.
+          - ``tok`` is the value ``"#"``.
+          - ``val`` is a string including all chars until end-of-line.
+
+        - ``STRING``:
+
+          - ``tok`` is the ``"'"``, the single quote.
+          - ``value`` is the string, *excluding* the quotes.
+
+        - ``TRUE`` and ``FALSE``:
+
+          - ``tok`` is either ``"t"`` or ``"f"`` accordingly.
+          - ``val`` is either ``True`` or ``False`` accordingly.
+
+        - ``NEWLINE`` and ``SPACE``:
+
+          - These are consumed by the lexer directly. ``line_pos`` and
+            ``info`` are advanced when ``NEWLINE`` is encountered.
+            ``tok`` is set to ``None`` upon reaching EOF.
+
+        :param skip_comment:
+            When false, return ``COMMENT`` tokens.
+            This is used when reading documentation blocks.
+        """
         while True:
             self.tok = self.src[self.cursor]
             self.pos = self.cursor