diff mbox series

[08/28] qapi: Support flat unions tag values with leading digit

Message ID 20210323094025.3569441-9-armbru@redhat.com
State New
Headers show
Series qapi: Enforce naming rules | expand

Commit Message

Markus Armbruster March 23, 2021, 9:40 a.m. UTC
Flat union tag values get checked twice: as enum member name, and as
union branch name.  The former accepts leading digits, the latter
doesn't.  The restriction feels arbitrary.  Skip the latter check.

This can expose c_name() to input it can't handle: a name starting
with a digit.  Improve it to return a valid C identifier for any
input.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 scripts/qapi/common.py | 8 ++++----
 scripts/qapi/expr.py   | 4 +++-
 2 files changed, 7 insertions(+), 5 deletions(-)

Comments

Eric Blake March 23, 2021, 2:11 p.m. UTC | #1
On 3/23/21 4:40 AM, Markus Armbruster wrote:
> Flat union tag values get checked twice: as enum member name, and as
> union branch name.  The former accepts leading digits, the latter
> doesn't.  The restriction feels arbitrary.  Skip the latter check.
> 
> This can expose c_name() to input it can't handle: a name starting
> with a digit.  Improve it to return a valid C identifier for any
> input.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  scripts/qapi/common.py | 8 ++++----
>  scripts/qapi/expr.py   | 4 +++-
>  2 files changed, 7 insertions(+), 5 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>
John Snow March 23, 2021, 2:49 p.m. UTC | #2
On 3/23/21 5:40 AM, Markus Armbruster wrote:
> Flat union tag values get checked twice: as enum member name, and as
> union branch name.  The former accepts leading digits, the latter
> doesn't.  The restriction feels arbitrary.  Skip the latter check.
> 
> This can expose c_name() to input it can't handle: a name starting
> with a digit.  Improve it to return a valid C identifier for any
> input.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Anything in particular inspire this?

> ---
>   scripts/qapi/common.py | 8 ++++----
>   scripts/qapi/expr.py   | 4 +++-
>   2 files changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/scripts/qapi/common.py b/scripts/qapi/common.py
> index 11b86beeab..cbd3fd81d3 100644
> --- a/scripts/qapi/common.py
> +++ b/scripts/qapi/common.py
> @@ -18,7 +18,6 @@
>   #: Magic string that gets removed along with all space to its right.
>   EATSPACE = '\033EATSPACE.'
>   POINTER_SUFFIX = ' *' + EATSPACE
> -_C_NAME_TRANS = str.maketrans('.-', '__')
>   
>   
>   def camel_to_upper(value: str) -> str:
> @@ -109,9 +108,10 @@ def c_name(name: str, protect: bool = True) -> str:
>                        'not_eq', 'or', 'or_eq', 'xor', 'xor_eq'])
>       # namespace pollution:
>       polluted_words = set(['unix', 'errno', 'mips', 'sparc', 'i386'])
> -    name = name.translate(_C_NAME_TRANS)
> -    if protect and (name in c89_words | c99_words | c11_words | gcc_words
> -                    | cpp_words | polluted_words):
> +    name = re.sub(r'[^A-Za-z0-9_]', '_', name)

The regex gets a little more powerful now. Instead of just . and - we 
now translate *everything* that's not an alphanumeric to _.

Does this have a visible impact anywhere, or are we constrained 
somewhere else?

> +    if protect and (name in (c89_words | c99_words | c11_words | gcc_words
> +                             | cpp_words | polluted_words)
> +                    or name[0].isdigit()):

Worth touching the docstring?

:param protect: If true, avoid returning certain ticklish identifiers 

                 (like C keywords) by prepending ``q_``. 


I know the formatting is a hot mess, but I still intend to get to that 
"all at once" with an "enable sphinx" pass later, so just touching the 
content so it gets included in that pass would be fine (to me, at least.)

>           return 'q_' + name
>       return name
>   
> diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
> index cf09fa9fd3..507550c340 100644
> --- a/scripts/qapi/expr.py
> +++ b/scripts/qapi/expr.py
> @@ -246,7 +246,9 @@ def check_union(expr, info):
>   
>       for (key, value) in members.items():
>           source = "'data' member '%s'" % key
> -        check_name_str(key, info, source)
> +        if discriminator is None:
> +            check_name_str(key, info, source)
> +        # else: name is in discriminator enum, which gets checked

I suppose the alternative would be to have allowed check_name_str to 
take an 'allow_leading_digits' parameter (instead of 'enum_member') and 
set that to true here and just deal with the mild inefficiency.

I might have a slight preference to just accept the inefficiency so that 
it's obvious that it's checked regardless of the discriminator 
condition, buuuuut not enough to be pushy about it, I think.

>           check_keys(value, info, source, ['type'], ['if'])
>           check_if(value, info, source)
>           check_type(value['type'], info, source, allow_array=not base)
>
Markus Armbruster March 23, 2021, 4:18 p.m. UTC | #3
John Snow <jsnow@redhat.com> writes:

> On 3/23/21 5:40 AM, Markus Armbruster wrote:
>> Flat union tag values get checked twice: as enum member name, and as
>> union branch name.  The former accepts leading digits, the latter
>> doesn't.  The restriction feels arbitrary.  Skip the latter check.
>> This can expose c_name() to input it can't handle: a name starting
>> with a digit.  Improve it to return a valid C identifier for any
>> input.
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>
> Anything in particular inspire this?

Just a desire for keeping things simple.  "Any enum type works as
discriminator" is simpler than "any enum works, but branches
corresponding to enum values starting with a digit cannot have members".
Let me elaborate.

This works:

    {'enum': 'Enu', 'data': ['0', 'eins', '2']}
    {'struct': 'St', 'data': {'s': 'str'}}
    {'union': 'Uni',
     'base': {'type': 'Enu'},
     'discriminator': 'type',
     'data': {
       'eins': 'St'}}

But if you change the last line to

       '0': 'St'}}

you get told off:

    scripts/qapi-gen.py: /dev/stdin: In union 'Uni':
    /dev/stdin:3: 'data' member '0' has an invalid name

>
>> ---
>>   scripts/qapi/common.py | 8 ++++----
>>   scripts/qapi/expr.py   | 4 +++-
>>   2 files changed, 7 insertions(+), 5 deletions(-)
>> diff --git a/scripts/qapi/common.py b/scripts/qapi/common.py
>> index 11b86beeab..cbd3fd81d3 100644
>> --- a/scripts/qapi/common.py
>> +++ b/scripts/qapi/common.py
>> @@ -18,7 +18,6 @@
>>   #: Magic string that gets removed along with all space to its right.
>>   EATSPACE = '\033EATSPACE.'
>>   POINTER_SUFFIX = ' *' + EATSPACE
>> -_C_NAME_TRANS = str.maketrans('.-', '__')
>>     
>>   def camel_to_upper(value: str) -> str:
>> @@ -109,9 +108,10 @@ def c_name(name: str, protect: bool = True) -> str:
>>                        'not_eq', 'or', 'or_eq', 'xor', 'xor_eq'])
>>       # namespace pollution:
>>       polluted_words = set(['unix', 'errno', 'mips', 'sparc', 'i386'])
>> -    name = name.translate(_C_NAME_TRANS)
>> -    if protect and (name in c89_words | c99_words | c11_words | gcc_words
>> -                    | cpp_words | polluted_words):
>> +    name = re.sub(r'[^A-Za-z0-9_]', '_', name)
>
> The regex gets a little more powerful now. Instead of just . and - we
> now translate *everything* that's not an alphanumeric to _.

Yes.

> Does this have a visible impact anywhere, or are we constrained
> somewhere else?

If it had, we'd generate C that doesn't compile.  Except when we're
unlucky.  Then it compiles, but means something different than we want.

I need to catch "C identifiers can't start with a digit" to make the
remainder of the patch work (right below).  Instead of doing just that,
I chose to do a complete job, and ensure the function always returns a
valid C identifier.

>> +    if protect and (name in (c89_words | c99_words | c11_words | gcc_words
>> +                             | cpp_words | polluted_words)
>> +                    or name[0].isdigit()):
>
> Worth touching the docstring?
>
> :param protect: If true, avoid returning certain ticklish identifiers 
>                 (like C keywords) by prepending ``q_``. 

It doesn't become wrong.  Care to suggest an improvement?

> I know the formatting is a hot mess, but I still intend to get to that
> "all at once" with an "enable sphinx" pass later, so just touching the 
> content so it gets included in that pass would be fine (to me, at least.)
>
>>           return 'q_' + name
>>       return name
>>   diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
>> index cf09fa9fd3..507550c340 100644
>> --- a/scripts/qapi/expr.py
>> +++ b/scripts/qapi/expr.py
>> @@ -246,7 +246,9 @@ def check_union(expr, info):
>>         for (key, value) in members.items():
>>           source = "'data' member '%s'" % key
>> -        check_name_str(key, info, source)
>> +        if discriminator is None:
>> +            check_name_str(key, info, source)
>> +        # else: name is in discriminator enum, which gets checked
>
> I suppose the alternative would be to have allowed check_name_str to
> take an 'allow_leading_digits' parameter (instead of 'enum_member')
> and set that to true here and just deal with the mild inefficiency.
>
> I might have a slight preference to just accept the inefficiency so
> that it's obvious that it's checked regardless of the discriminator 
> condition, buuuuut not enough to be pushy about it, I think.

Sounds like a defensible idea, but this series is heading in the
opposite direction; see the next few patches.

>>           check_keys(value, info, source, ['type'], ['if'])
>>           check_if(value, info, source)
>>           check_type(value['type'], info, source, allow_array=not base)
>>
Markus Armbruster March 23, 2021, 9:07 p.m. UTC | #4
Markus Armbruster <armbru@redhat.com> writes:

> John Snow <jsnow@redhat.com> writes:
>
>> On 3/23/21 5:40 AM, Markus Armbruster wrote:
>>> Flat union tag values get checked twice: as enum member name, and as
>>> union branch name.  The former accepts leading digits, the latter
>>> doesn't.  The restriction feels arbitrary.  Skip the latter check.
>>>
>>> This can expose c_name() to input it can't handle: a name starting
>>> with a digit.  Improve it to return a valid C identifier for any
>>> input.
>>>
>>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>>
>> Anything in particular inspire this?
>
> Just a desire for keeping things simple.  "Any enum type works as
> discriminator" is simpler than "any enum works, but branches
> corresponding to enum values starting with a digit cannot have members".
> Let me elaborate.
>
> This works:
>
>     {'enum': 'Enu', 'data': ['0', 'eins', '2']}
>     {'struct': 'St', 'data': {'s': 'str'}}
>     {'union': 'Uni',
>      'base': {'type': 'Enu'},
>      'discriminator': 'type',
>      'data': {
>        'eins': 'St'}}
>
> But if you change the last line to
>
>        '0': 'St'}}
>
> you get told off:
>
>     scripts/qapi-gen.py: /dev/stdin: In union 'Uni':
>     /dev/stdin:3: 'data' member '0' has an invalid name

Improved commit message:

    qapi: Permit flat union members for any tag value

    Flat union branch names match the tag enum's member names.  Omitted
    branches default to "no members for this tag value".

    Branch names starting with a digit get rejected like "'data' member
    '0' has an invalid name".  However, omitting the branch works.

    This is because flat union tag values get checked twice: as enum
    member name, and as union branch name.  The former accepts leading
    digits, the latter doesn't.

    Branches whose names start with a digit therefore cannot have members.
    Feels wrong.  Get rid of the restriction by skipping the latter check.

    This can expose c_name() to input it can't handle: a name starting
    with a digit.  Improve it to return a valid C identifier for any
    input.

    Signed-off-by: Markus Armbruster <armbru@redhat.com>

[...]
diff mbox series

Patch

diff --git a/scripts/qapi/common.py b/scripts/qapi/common.py
index 11b86beeab..cbd3fd81d3 100644
--- a/scripts/qapi/common.py
+++ b/scripts/qapi/common.py
@@ -18,7 +18,6 @@ 
 #: Magic string that gets removed along with all space to its right.
 EATSPACE = '\033EATSPACE.'
 POINTER_SUFFIX = ' *' + EATSPACE
-_C_NAME_TRANS = str.maketrans('.-', '__')
 
 
 def camel_to_upper(value: str) -> str:
@@ -109,9 +108,10 @@  def c_name(name: str, protect: bool = True) -> str:
                      'not_eq', 'or', 'or_eq', 'xor', 'xor_eq'])
     # namespace pollution:
     polluted_words = set(['unix', 'errno', 'mips', 'sparc', 'i386'])
-    name = name.translate(_C_NAME_TRANS)
-    if protect and (name in c89_words | c99_words | c11_words | gcc_words
-                    | cpp_words | polluted_words):
+    name = re.sub(r'[^A-Za-z0-9_]', '_', name)
+    if protect and (name in (c89_words | c99_words | c11_words | gcc_words
+                             | cpp_words | polluted_words)
+                    or name[0].isdigit()):
         return 'q_' + name
     return name
 
diff --git a/scripts/qapi/expr.py b/scripts/qapi/expr.py
index cf09fa9fd3..507550c340 100644
--- a/scripts/qapi/expr.py
+++ b/scripts/qapi/expr.py
@@ -246,7 +246,9 @@  def check_union(expr, info):
 
     for (key, value) in members.items():
         source = "'data' member '%s'" % key
-        check_name_str(key, info, source)
+        if discriminator is None:
+            check_name_str(key, info, source)
+        # else: name is in discriminator enum, which gets checked
         check_keys(value, info, source, ['type'], ['if'])
         check_if(value, info, source)
         check_type(value['type'], info, source, allow_array=not base)