diff mbox

[v12,21/36] qapi: Tighten the regex on valid names

Message ID 1447836791-369-22-git-send-email-eblake@redhat.com
State New
Headers show

Commit Message

Eric Blake Nov. 18, 2015, 8:52 a.m. UTC
We already documented that qapi names should match specific
patterns (such as starting with a letter unless it was an enum
value or a downstream extension).  Tighten that from a suggestion
into a hard requirement, which frees up names beginning with a
single underscore for qapi internal usage.  Also restrict things
to avoid 'a__b' or 'a--b' (that is, dash and underscore must not
occur consecutively).

Add a new test, reserved-member-underscore, to demonstrate the
tighter checking.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v12: new patch
---
 docs/qapi-code-gen.txt                            | 19 ++++++++++---------
 scripts/qapi.py                                   | 12 +++++++-----
 tests/Makefile                                    |  1 +
 tests/qapi-schema/reserved-member-underscore.err  |  1 +
 tests/qapi-schema/reserved-member-underscore.exit |  1 +
 tests/qapi-schema/reserved-member-underscore.json |  4 ++++
 tests/qapi-schema/reserved-member-underscore.out  |  0
 7 files changed, 24 insertions(+), 14 deletions(-)
 create mode 100644 tests/qapi-schema/reserved-member-underscore.err
 create mode 100644 tests/qapi-schema/reserved-member-underscore.exit
 create mode 100644 tests/qapi-schema/reserved-member-underscore.json
 create mode 100644 tests/qapi-schema/reserved-member-underscore.out

diff --git a/tests/qapi-schema/reserved-member-underscore.out b/tests/qapi-schema/reserved-member-underscore.out
new file mode 100644
index 0000000..e69de29

Comments

Markus Armbruster Nov. 18, 2015, 11:51 a.m. UTC | #1
Eric Blake <eblake@redhat.com> writes:

> We already documented that qapi names should match specific
> patterns (such as starting with a letter unless it was an enum
> value or a downstream extension).  Tighten that from a suggestion
> into a hard requirement, which frees up names beginning with a
> single underscore for qapi internal usage.

If we care enough about naming conventions to document them, enforcing
them makes only sense.

>                                             Also restrict things
> to avoid 'a__b' or 'a--b' (that is, dash and underscore must not
> occur consecutively).

I feel this is entering "foolish names" territory, where things like
"IcAnFiNdNaMeSeVeNlEsSrEaDaBlEtHaNStudlyCaps" live.  Catching fools
automatically is generally futile, they're too creative :)

Let's drop this part.

> Add a new test, reserved-member-underscore, to demonstrate the
> tighter checking.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
>
> ---
> v12: new patch
> ---
>  docs/qapi-code-gen.txt                            | 19 ++++++++++---------
>  scripts/qapi.py                                   | 12 +++++++-----
>  tests/Makefile                                    |  1 +
>  tests/qapi-schema/reserved-member-underscore.err  |  1 +
>  tests/qapi-schema/reserved-member-underscore.exit |  1 +
>  tests/qapi-schema/reserved-member-underscore.json |  4 ++++
>  tests/qapi-schema/reserved-member-underscore.out  |  0
>  7 files changed, 24 insertions(+), 14 deletions(-)
>  create mode 100644 tests/qapi-schema/reserved-member-underscore.err
>  create mode 100644 tests/qapi-schema/reserved-member-underscore.exit
>  create mode 100644 tests/qapi-schema/reserved-member-underscore.json
>  create mode 100644 tests/qapi-schema/reserved-member-underscore.out
>
> diff --git a/docs/qapi-code-gen.txt b/docs/qapi-code-gen.txt
> index ceb9a78..ec225d1 100644
> --- a/docs/qapi-code-gen.txt
> +++ b/docs/qapi-code-gen.txt
> @@ -118,17 +118,18 @@ tracking optional fields.
>
>  Any name (command, event, type, field, or enum value) beginning with
>  "x-" is marked experimental, and may be withdrawn or changed
> -incompatibly in a future release.  Downstream vendors may add
> -extensions; such extensions should begin with a prefix matching
> +incompatibly in a future release.  All names must begin with a letter,
> +and contain only ASCII letters, digits, dash, and underscore, where
> +dash and underscore cannot occur consecutively.  There are two
> +exceptions: enum values may start with a digit, and any extensions
> +added by downstream vendors should start with a prefix matching
>  "__RFQDN_" (for the reverse-fully-qualified-domain-name of the
>  vendor), even if the rest of the name uses dash (example:
> -__com.redhat_drive-mirror).  Other than downstream extensions (with
> -leading underscore and the use of dots), all names should begin with a
> -letter, and contain only ASCII letters, digits, dash, and underscore.
> -Names beginning with 'q_' are reserved for the generator: QMP names
> -that resemble C keywords or other problematic strings will be munged
> -in C to use this prefix.  For example, a field named "default" in
> -qapi becomes "q_default" in the generated C code.
> +__com.redhat_drive-mirror).  Names beginning with 'q_' are reserved
> +for the generator: QMP names that resemble C keywords or other
> +problematic strings will be munged in C to use this prefix.  For
> +example, a field named "default" in qapi becomes "q_default" in the
> +generated C code.
>
>  In the rest of this document, usage lines are given for each
>  expression type, with literal strings written in lower case and
> diff --git a/scripts/qapi.py b/scripts/qapi.py
> index 4a30bc0..e6d014b 100644
> --- a/scripts/qapi.py
> +++ b/scripts/qapi.py
> @@ -353,9 +353,11 @@ def discriminator_find_enum_define(expr):
>      return find_enum(discriminator_type)
>
>
> -# FIXME should enforce "other than downstream extensions [...], all
> -# names should begin with a letter".
> -valid_name = re.compile('^[a-zA-Z_][a-zA-Z0-9_.-]*$')
> +# Names must be letters, numbers, -, and _.  They must start with letter,
> +# except for downstream extensions which must start with __RFQDN_.
> +# Dots are only valid in the downstream extension prefix.
> +valid_name = re.compile('^(__[a-zA-Z][a-zA-Z0-9.]*_)?'
> +                        '[a-zA-Z][a-zA-Z0-9]*([_-][a-zA-Z0-9]+)*$')

This regexp consists of two parts: optional __RFQDN_ prefix and the name
proper.

The latter stays simpler if we don't attempt to catch adjacent [-_].

The former isn't quite right.  According to RFC 822 Appendix 1 - Domain
Name Syntax Specification:

* We must accept '-' in addition to digits.

* We require RFQDN to start with a letter.  This is still a loose match.
  The tight match is "labels separated by dots", where labels consist of
  letters, digits and '-', starting with a letter.  I wouldn't bother
  checking first characters are letters at all.

Recommend

   valid_name = re.compile('^(__[a-zA-Z0-9.-]+_)?'
                           '[a-zA-Z][a-zA-Z0-9_-]*$')

>
>
>  def check_name(expr_info, source, name, allow_optional=False,
> @@ -374,8 +376,8 @@ def check_name(expr_info, source, name, allow_optional=False,
>                                  % (source, name))
>      # Enum members can start with a digit, because the generated C
>      # code always prefixes it with the enum name
> -    if enum_member:
> -        membername = '_' + membername
> +    if enum_member and membername[0].isdigit():

What's wrong with the old condition?

> +        membername = 'D' + membername
>      # Reserve the entire 'q_' namespace for c_name()
>      if not valid_name.match(membername) or \
>         c_name(membername, False).startswith('q_'):
> diff --git a/tests/Makefile b/tests/Makefile
> index 1c2c8ee..b70d246 100644
> --- a/tests/Makefile
> +++ b/tests/Makefile
> @@ -320,6 +320,7 @@ qapi-schema += reserved-command-q.json
>  qapi-schema += reserved-member-has.json
>  qapi-schema += reserved-member-q.json
>  qapi-schema += reserved-member-u.json
> +qapi-schema += reserved-member-underscore.json
>  qapi-schema += reserved-type-kind.json
>  qapi-schema += reserved-type-list.json
>  qapi-schema += returns-alternate.json
> diff --git a/tests/qapi-schema/reserved-member-underscore.err b/tests/qapi-schema/reserved-member-underscore.err
> new file mode 100644
> index 0000000..65ff0da
> --- /dev/null
> +++ b/tests/qapi-schema/reserved-member-underscore.err
> @@ -0,0 +1 @@
> +tests/qapi-schema/reserved-member-underscore.json:4: Member of 'data' for struct 'Oops' uses invalid name '_oops'
> diff --git a/tests/qapi-schema/reserved-member-underscore.exit b/tests/qapi-schema/reserved-member-underscore.exit
> new file mode 100644
> index 0000000..d00491f
> --- /dev/null
> +++ b/tests/qapi-schema/reserved-member-underscore.exit
> @@ -0,0 +1 @@
> +1
> diff --git a/tests/qapi-schema/reserved-member-underscore.json b/tests/qapi-schema/reserved-member-underscore.json
> new file mode 100644
> index 0000000..4a3a017
> --- /dev/null
> +++ b/tests/qapi-schema/reserved-member-underscore.json
> @@ -0,0 +1,4 @@
> +# C member name collision
> +# We reject use of a single leading underscore in all names (names must
> +# begin with a letter or a downstream extension double-underscore prefix).
> +{ 'struct': 'Oops', 'data': { '_oops': 'str' } }
> diff --git a/tests/qapi-schema/reserved-member-underscore.out b/tests/qapi-schema/reserved-member-underscore.out
> new file mode 100644
> index 0000000..e69de29
Eric Blake Nov. 18, 2015, 4:57 p.m. UTC | #2
On 11/18/2015 04:51 AM, Markus Armbruster wrote:
> Eric Blake <eblake@redhat.com> writes:
> 
>> We already documented that qapi names should match specific
>> patterns (such as starting with a letter unless it was an enum
>> value or a downstream extension).  Tighten that from a suggestion
>> into a hard requirement, which frees up names beginning with a
>> single underscore for qapi internal usage.
> 
> If we care enough about naming conventions to document them, enforcing
> them makes only sense.
> 
>>                                             Also restrict things
>> to avoid 'a__b' or 'a--b' (that is, dash and underscore must not
>> occur consecutively).
> 
> I feel this is entering "foolish names" territory, where things like
> "IcAnFiNdNaMeSeVeNlEsSrEaDaBlEtHaNStudlyCaps" live.  Catching fools

IhAdToOmUcHfUnReAdInGtHaT

[It's amazing that scholars can ever manage to correctly interpret old
written languages, thanks to omitted word breaks (scriptio continua) or
omitted vowels (such as in Hebrew)]

> automatically is generally futile, they're too creative :)
> 
> Let's drop this part.

Sure, I can do that. I'll post a fixup patch, as it will affect docs too.

> 
>> Add a new test, reserved-member-underscore, to demonstrate the
>> tighter checking.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>
>> ---
>> v12: new patch
>> ---

>> +incompatibly in a future release.  All names must begin with a letter,
>> +and contain only ASCII letters, digits, dash, and underscore, where
>> +dash and underscore cannot occur consecutively.  There are two

Namely, right here.

>> +# Names must be letters, numbers, -, and _.  They must start with letter,
>> +# except for downstream extensions which must start with __RFQDN_.
>> +# Dots are only valid in the downstream extension prefix.
>> +valid_name = re.compile('^(__[a-zA-Z][a-zA-Z0-9.]*_)?'
>> +                        '[a-zA-Z][a-zA-Z0-9]*([_-][a-zA-Z0-9]+)*$')
> 
> This regexp consists of two parts: optional __RFQDN_ prefix and the name
> proper.
> 
> The latter stays simpler if we don't attempt to catch adjacent [-_].
> 
> The former isn't quite right.  According to RFC 822 Appendix 1 - Domain
> Name Syntax Specification:
> 
> * We must accept '-' in addition to digits.
> 
> * We require RFQDN to start with a letter.  This is still a loose match.
>   The tight match is "labels separated by dots", where labels consist of
>   letters, digits and '-', starting with a letter.  I wouldn't bother
>   checking first characters are letters at all.
> 
> Recommend
> 
>    valid_name = re.compile('^(__[a-zA-Z0-9.-]+_)?'
>                            '[a-zA-Z][a-zA-Z0-9_-]*$')

Sure, that works for me. It's tighter than what we had before, but
looser than what I proposed so that it allows more RFQDN.  It
potentially lets users confuse us by abusing 'foo__max' or similar, but
I'm less worried about that abuse - it's okay to stop the blatantly
obvious mistakes without worrying about the corner cases, if it makes
the maintenance easier for the cases we do catch.

> 
>>
>>
>>  def check_name(expr_info, source, name, allow_optional=False,
>> @@ -374,8 +376,8 @@ def check_name(expr_info, source, name, allow_optional=False,
>>                                  % (source, name))
>>      # Enum members can start with a digit, because the generated C
>>      # code always prefixes it with the enum name
>> -    if enum_member:
>> -        membername = '_' + membername
>> +    if enum_member and membername[0].isdigit():
> 
> What's wrong with the old condition?
> 
>> +        membername = 'D' + membername

The old code prepended a lone '_', which doesn't work any more with the
tighter valid_name regex, so we have to use some other prefix (I picked
'D').

>>      # Reserve the entire 'q_' namespace for c_name()
>>      if not valid_name.match(membername) or \
>>         c_name(membername, False).startswith('q_'):

The other thing is that I realized that an enum value of 'q-int' was
getting munged to '_q-int' which no longer gets flagged by the c_name()
filter.  But neither would 'Dq-int' get flagged.  So limiting the
munging to just enum values that start with a digit (where we know it
doesn't start with q) means we don't weaken the second condition.

I guess I need to call that out in the commit message; all the more
reason for me to send a fixup.
Markus Armbruster Nov. 18, 2015, 6:08 p.m. UTC | #3
Eric Blake <eblake@redhat.com> writes:

> On 11/18/2015 04:51 AM, Markus Armbruster wrote:
>> Eric Blake <eblake@redhat.com> writes:
>> 
>>> We already documented that qapi names should match specific
>>> patterns (such as starting with a letter unless it was an enum
>>> value or a downstream extension).  Tighten that from a suggestion
>>> into a hard requirement, which frees up names beginning with a
>>> single underscore for qapi internal usage.
>> 
>> If we care enough about naming conventions to document them, enforcing
>> them makes only sense.
>> 
>>>                                             Also restrict things
>>> to avoid 'a__b' or 'a--b' (that is, dash and underscore must not
>>> occur consecutively).
>> 
>> I feel this is entering "foolish names" territory, where things like
>> "IcAnFiNdNaMeSeVeNlEsSrEaDaBlEtHaNStudlyCaps" live.  Catching fools
>
> IhAdToOmUcHfUnReAdInGtHaT
>
> [It's amazing that scholars can ever manage to correctly interpret old
> written languages, thanks to omitted word breaks (scriptio continua) or
> omitted vowels (such as in Hebrew)]

Indeed.

>> automatically is generally futile, they're too creative :)
>> 
>> Let's drop this part.
>
> Sure, I can do that. I'll post a fixup patch, as it will affect docs too.
>
>> 
>>> Add a new test, reserved-member-underscore, to demonstrate the
>>> tighter checking.
>>>
>>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>>
>>> ---
>>> v12: new patch
>>> ---
>
>>> +incompatibly in a future release.  All names must begin with a letter,
>>> +and contain only ASCII letters, digits, dash, and underscore, where
>>> +dash and underscore cannot occur consecutively.  There are two
>
> Namely, right here.
>
>>> +# Names must be letters, numbers, -, and _.  They must start with letter,
>>> +# except for downstream extensions which must start with __RFQDN_.
>>> +# Dots are only valid in the downstream extension prefix.
>>> +valid_name = re.compile('^(__[a-zA-Z][a-zA-Z0-9.]*_)?'
>>> +                        '[a-zA-Z][a-zA-Z0-9]*([_-][a-zA-Z0-9]+)*$')
>> 
>> This regexp consists of two parts: optional __RFQDN_ prefix and the name
>> proper.
>> 
>> The latter stays simpler if we don't attempt to catch adjacent [-_].
>> 
>> The former isn't quite right.  According to RFC 822 Appendix 1 - Domain
>> Name Syntax Specification:
>> 
>> * We must accept '-' in addition to digits.
>> 
>> * We require RFQDN to start with a letter.  This is still a loose match.
>>   The tight match is "labels separated by dots", where labels consist of
>>   letters, digits and '-', starting with a letter.  I wouldn't bother
>>   checking first characters are letters at all.
>> 
>> Recommend
>> 
>>    valid_name = re.compile('^(__[a-zA-Z0-9.-]+_)?'
>>                            '[a-zA-Z][a-zA-Z0-9_-]*$')
>
> Sure, that works for me. It's tighter than what we had before, but
> looser than what I proposed so that it allows more RFQDN.  It
> potentially lets users confuse us by abusing 'foo__max' or similar, but
> I'm less worried about that abuse - it's okay to stop the blatantly
> obvious mistakes without worrying about the corner cases, if it makes
> the maintenance easier for the cases we do catch.
>
>> 
>>>
>>>
>>>  def check_name(expr_info, source, name, allow_optional=False,
>>> @@ -374,8 +376,8 @@ def check_name(expr_info, source, name, allow_optional=False,
>>>                                  % (source, name))
>>>      # Enum members can start with a digit, because the generated C
>>>      # code always prefixes it with the enum name
>>> -    if enum_member:
>>> -        membername = '_' + membername
>>> +    if enum_member and membername[0].isdigit():
>> 
>> What's wrong with the old condition?
>> 
>>> +        membername = 'D' + membername
>
> The old code prepended a lone '_', which doesn't work any more with the
> tighter valid_name regex, so we have to use some other prefix (I picked
> 'D').
>
>>>      # Reserve the entire 'q_' namespace for c_name()
>>>      if not valid_name.match(membername) or \
>>>         c_name(membername, False).startswith('q_'):
>
> The other thing is that I realized that an enum value of 'q-int' was
> getting munged to '_q-int' which no longer gets flagged by the c_name()
> filter.  But neither would 'Dq-int' get flagged.  So limiting the
> munging to just enum values that start with a digit (where we know it
> doesn't start with q) means we don't weaken the second condition.

Aha!  Worth a comment, I think.

> I guess I need to call that out in the commit message; all the more
> reason for me to send a fixup.

Yes, please!
diff mbox

Patch

diff --git a/docs/qapi-code-gen.txt b/docs/qapi-code-gen.txt
index ceb9a78..ec225d1 100644
--- a/docs/qapi-code-gen.txt
+++ b/docs/qapi-code-gen.txt
@@ -118,17 +118,18 @@  tracking optional fields.

 Any name (command, event, type, field, or enum value) beginning with
 "x-" is marked experimental, and may be withdrawn or changed
-incompatibly in a future release.  Downstream vendors may add
-extensions; such extensions should begin with a prefix matching
+incompatibly in a future release.  All names must begin with a letter,
+and contain only ASCII letters, digits, dash, and underscore, where
+dash and underscore cannot occur consecutively.  There are two
+exceptions: enum values may start with a digit, and any extensions
+added by downstream vendors should start with a prefix matching
 "__RFQDN_" (for the reverse-fully-qualified-domain-name of the
 vendor), even if the rest of the name uses dash (example:
-__com.redhat_drive-mirror).  Other than downstream extensions (with
-leading underscore and the use of dots), all names should begin with a
-letter, and contain only ASCII letters, digits, dash, and underscore.
-Names beginning with 'q_' are reserved for the generator: QMP names
-that resemble C keywords or other problematic strings will be munged
-in C to use this prefix.  For example, a field named "default" in
-qapi becomes "q_default" in the generated C code.
+__com.redhat_drive-mirror).  Names beginning with 'q_' are reserved
+for the generator: QMP names that resemble C keywords or other
+problematic strings will be munged in C to use this prefix.  For
+example, a field named "default" in qapi becomes "q_default" in the
+generated C code.

 In the rest of this document, usage lines are given for each
 expression type, with literal strings written in lower case and
diff --git a/scripts/qapi.py b/scripts/qapi.py
index 4a30bc0..e6d014b 100644
--- a/scripts/qapi.py
+++ b/scripts/qapi.py
@@ -353,9 +353,11 @@  def discriminator_find_enum_define(expr):
     return find_enum(discriminator_type)


-# FIXME should enforce "other than downstream extensions [...], all
-# names should begin with a letter".
-valid_name = re.compile('^[a-zA-Z_][a-zA-Z0-9_.-]*$')
+# Names must be letters, numbers, -, and _.  They must start with letter,
+# except for downstream extensions which must start with __RFQDN_.
+# Dots are only valid in the downstream extension prefix.
+valid_name = re.compile('^(__[a-zA-Z][a-zA-Z0-9.]*_)?'
+                        '[a-zA-Z][a-zA-Z0-9]*([_-][a-zA-Z0-9]+)*$')


 def check_name(expr_info, source, name, allow_optional=False,
@@ -374,8 +376,8 @@  def check_name(expr_info, source, name, allow_optional=False,
                                 % (source, name))
     # Enum members can start with a digit, because the generated C
     # code always prefixes it with the enum name
-    if enum_member:
-        membername = '_' + membername
+    if enum_member and membername[0].isdigit():
+        membername = 'D' + membername
     # Reserve the entire 'q_' namespace for c_name()
     if not valid_name.match(membername) or \
        c_name(membername, False).startswith('q_'):
diff --git a/tests/Makefile b/tests/Makefile
index 1c2c8ee..b70d246 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -320,6 +320,7 @@  qapi-schema += reserved-command-q.json
 qapi-schema += reserved-member-has.json
 qapi-schema += reserved-member-q.json
 qapi-schema += reserved-member-u.json
+qapi-schema += reserved-member-underscore.json
 qapi-schema += reserved-type-kind.json
 qapi-schema += reserved-type-list.json
 qapi-schema += returns-alternate.json
diff --git a/tests/qapi-schema/reserved-member-underscore.err b/tests/qapi-schema/reserved-member-underscore.err
new file mode 100644
index 0000000..65ff0da
--- /dev/null
+++ b/tests/qapi-schema/reserved-member-underscore.err
@@ -0,0 +1 @@ 
+tests/qapi-schema/reserved-member-underscore.json:4: Member of 'data' for struct 'Oops' uses invalid name '_oops'
diff --git a/tests/qapi-schema/reserved-member-underscore.exit b/tests/qapi-schema/reserved-member-underscore.exit
new file mode 100644
index 0000000..d00491f
--- /dev/null
+++ b/tests/qapi-schema/reserved-member-underscore.exit
@@ -0,0 +1 @@ 
+1
diff --git a/tests/qapi-schema/reserved-member-underscore.json b/tests/qapi-schema/reserved-member-underscore.json
new file mode 100644
index 0000000..4a3a017
--- /dev/null
+++ b/tests/qapi-schema/reserved-member-underscore.json
@@ -0,0 +1,4 @@ 
+# C member name collision
+# We reject use of a single leading underscore in all names (names must
+# begin with a letter or a downstream extension double-underscore prefix).
+{ 'struct': 'Oops', 'data': { '_oops': 'str' } }