diff mbox series

[v2,06/16] qapi: Restrict strings to printable ASCII

Message ID 20190910063724.28470-7-armbru@redhat.com
State New
Headers show
Series qapi: Schema language cleanups & doc improvements | expand

Commit Message

Markus Armbruster Sept. 10, 2019, 6:37 a.m. UTC
RFC 8259 on string contents:

   All Unicode characters may be placed within the quotation marks,
   except for the characters that MUST be escaped: quotation mark,
   reverse solidus, and the control characters (U+0000 through
   U+001F).

The QAPI schema parser accepts both less and more than JSON: it
accepts only ASCII with \u (less), and accepts control characters
other than LF (new line) unescaped.  How it treats unescaped non-ASCII
input differs between Python 2 and Python 3.

Make it accept strictly less: require printable ASCII.  Drop support
for \b, \f, \n, \r, \t.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 scripts/qapi/common.py                        | 25 ++++++-------------
 tests/Makefile.include                        |  3 ++-
 tests/qapi-schema/string-code-point-127.err   |  1 +
 ...de-str.exit => string-code-point-127.exit} |  0
 tests/qapi-schema/string-code-point-127.json  |  2 ++
 ...code-str.out => string-code-point-127.out} |  0
 tests/qapi-schema/string-code-point-31.err    |  1 +
 tests/qapi-schema/string-code-point-31.exit   |  1 +
 tests/qapi-schema/string-code-point-31.json   |  2 ++
 tests/qapi-schema/string-code-point-31.out    |  0
 tests/qapi-schema/unicode-str.err             |  1 -
 tests/qapi-schema/unicode-str.json            |  2 --
 12 files changed, 17 insertions(+), 21 deletions(-)
 create mode 100644 tests/qapi-schema/string-code-point-127.err
 rename tests/qapi-schema/{unicode-str.exit => string-code-point-127.exit} (100%)
 create mode 100644 tests/qapi-schema/string-code-point-127.json
 rename tests/qapi-schema/{unicode-str.out => string-code-point-127.out} (100%)
 create mode 100644 tests/qapi-schema/string-code-point-31.err
 create mode 100644 tests/qapi-schema/string-code-point-31.exit
 create mode 100644 tests/qapi-schema/string-code-point-31.json
 create mode 100644 tests/qapi-schema/string-code-point-31.out
 delete mode 100644 tests/qapi-schema/unicode-str.err
 delete mode 100644 tests/qapi-schema/unicode-str.json

Comments

Eric Blake Sept. 10, 2019, 3:22 p.m. UTC | #1
On 9/10/19 1:37 AM, Markus Armbruster wrote:
> RFC 8259 on string contents:
> 
>    All Unicode characters may be placed within the quotation marks,
>    except for the characters that MUST be escaped: quotation mark,
>    reverse solidus, and the control characters (U+0000 through
>    U+001F).
> 
> The QAPI schema parser accepts both less and more than JSON: it
> accepts only ASCII with \u (less), and accepts control characters
> other than LF (new line) unescaped.  How it treats unescaped non-ASCII
> input differs between Python 2 and Python 3.
> 
> Make it accept strictly less: require printable ASCII.  Drop support
> for \b, \f, \n, \r, \t.

Fair enough.  It doesn't prevent QMP clients from sending strings with
non-ASCII characters, merely that those strings will never match the
schema because we have guaranteed our schema is limited to ASCII.  (This
change means we are promising to never allow { "execute": "a\tb" } as a
valid QMP command, for instance.)

> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---

> @@ -523,17 +523,7 @@ class QAPISchemaParser(object):
>                      if ch == '\n':
>                          raise QAPIParseError(self, 'Missing terminating "\'"')
>                      if esc:
> -                        if ch == 'b':
> -                            string += '\b'
> -                        elif ch == 'f':
> -                            string += '\f'
> -                        elif ch == 'n':
> -                            string += '\n'

Is it worth a comment in the code that we are specifically not parsing
all possible JSON escapes, because of the later requirement that QAPI
strings be limited to the subset of printable ASCII?

Reviewed-by: Eric Blake <eblake@redhat.com>
Markus Armbruster Sept. 13, 2019, 2:28 p.m. UTC | #2
Eric Blake <eblake@redhat.com> writes:

> On 9/10/19 1:37 AM, Markus Armbruster wrote:
>> RFC 8259 on string contents:
>> 
>>    All Unicode characters may be placed within the quotation marks,
>>    except for the characters that MUST be escaped: quotation mark,
>>    reverse solidus, and the control characters (U+0000 through
>>    U+001F).
>> 
>> The QAPI schema parser accepts both less and more than JSON: it
>> accepts only ASCII with \u (less), and accepts control characters
>> other than LF (new line) unescaped.  How it treats unescaped non-ASCII
>> input differs between Python 2 and Python 3.
>> 
>> Make it accept strictly less: require printable ASCII.  Drop support
>> for \b, \f, \n, \r, \t.
>
> Fair enough.  It doesn't prevent QMP clients from sending strings with
> non-ASCII characters, merely that those strings will never match the
> schema because we have guaranteed our schema is limited to ASCII.  (This

Note that this only affects QMP commands, events, parameter and enum
value names, *not* non-enum string arguments.

> change means we are promising to never allow { "execute": "a\tb" } as a
> valid QMP command, for instance.)

We're not actually promising anything.  We're merely making it slightly
harder to do: need to revert this patch first.  Feels quite unlikely,
though.

>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>
>> @@ -523,17 +523,7 @@ class QAPISchemaParser(object):
>>                      if ch == '\n':
>>                          raise QAPIParseError(self, 'Missing terminating "\'"')
>>                      if esc:
>> -                        if ch == 'b':
>> -                            string += '\b'
>> -                        elif ch == 'f':
>> -                            string += '\f'
>> -                        elif ch == 'n':
>> -                            string += '\n'
>
> Is it worth a comment in the code that we are specifically not parsing
> all possible JSON escapes, because of the later requirement that QAPI
> strings be limited to the subset of printable ASCII?

Can't hurt.

> Reviewed-by: Eric Blake <eblake@redhat.com>

Thanks!
diff mbox series

Patch

diff --git a/scripts/qapi/common.py b/scripts/qapi/common.py
index 54d02458b5..c3dfad024f 100644
--- a/scripts/qapi/common.py
+++ b/scripts/qapi/common.py
@@ -523,17 +523,7 @@  class QAPISchemaParser(object):
                     if ch == '\n':
                         raise QAPIParseError(self, 'Missing terminating "\'"')
                     if esc:
-                        if ch == 'b':
-                            string += '\b'
-                        elif ch == 'f':
-                            string += '\f'
-                        elif ch == 'n':
-                            string += '\n'
-                        elif ch == 'r':
-                            string += '\r'
-                        elif ch == 't':
-                            string += '\t'
-                        elif ch == 'u':
+                        if ch == 'u':
                             value = 0
                             for _ in range(0, 4):
                                 ch = self.src[self.cursor]
@@ -552,20 +542,21 @@  class QAPISchemaParser(object):
                                                      'For now, \\u escape '
                                                      'only supports non-zero '
                                                      'values up to \\u007f')
-                            string += chr(value)
-                        elif ch in '\\/\'"':
-                            string += ch
-                        else:
+                            ch = chr(value)
+                        elif ch not in '\\/\'"':
                             raise QAPIParseError(self,
                                                  "Unknown escape \\%s" % ch)
                         esc = False
                     elif ch == '\\':
                         esc = True
+                        continue
                     elif ch == "'":
                         self.val = string
                         return
-                    else:
-                        string += ch
+                    if ord(ch) < 32 or ord(ch) >= 127:
+                        raise QAPIParseError(
+                            self, "Funny character in string")
+                    string += ch
             elif self.src.startswith('true', self.pos):
                 self.val = True
                 self.cursor += 3
diff --git a/tests/Makefile.include b/tests/Makefile.include
index f5ac09549c..403748579f 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -452,6 +452,8 @@  qapi-schema += returns-array-bad.json
 qapi-schema += returns-dict.json
 qapi-schema += returns-unknown.json
 qapi-schema += returns-whitelist.json
+qapi-schema += string-code-point-31.json
+qapi-schema += string-code-point-127.json
 qapi-schema += struct-base-clash-deep.json
 qapi-schema += struct-base-clash.json
 qapi-schema += struct-data-invalid.json
@@ -463,7 +465,6 @@  qapi-schema += type-bypass-bad-gen.json
 qapi-schema += unclosed-list.json
 qapi-schema += unclosed-object.json
 qapi-schema += unclosed-string.json
-qapi-schema += unicode-str.json
 qapi-schema += union-base-empty.json
 qapi-schema += union-base-no-discriminator.json
 qapi-schema += union-branch-case.json
diff --git a/tests/qapi-schema/string-code-point-127.err b/tests/qapi-schema/string-code-point-127.err
new file mode 100644
index 0000000000..c310910c23
--- /dev/null
+++ b/tests/qapi-schema/string-code-point-127.err
@@ -0,0 +1 @@ 
+tests/qapi-schema/string-code-point-127.json:2:14: Funny character in string
diff --git a/tests/qapi-schema/unicode-str.exit b/tests/qapi-schema/string-code-point-127.exit
similarity index 100%
rename from tests/qapi-schema/unicode-str.exit
rename to tests/qapi-schema/string-code-point-127.exit
diff --git a/tests/qapi-schema/string-code-point-127.json b/tests/qapi-schema/string-code-point-127.json
new file mode 100644
index 0000000000..480318a69f
--- /dev/null
+++ b/tests/qapi-schema/string-code-point-127.json
@@ -0,0 +1,2 @@ 
+# We accept printable ASCII: code points 32..126.  Test code point 127:
+{ 'command': '' }
diff --git a/tests/qapi-schema/unicode-str.out b/tests/qapi-schema/string-code-point-127.out
similarity index 100%
rename from tests/qapi-schema/unicode-str.out
rename to tests/qapi-schema/string-code-point-127.out
diff --git a/tests/qapi-schema/string-code-point-31.err b/tests/qapi-schema/string-code-point-31.err
new file mode 100644
index 0000000000..45797928d9
--- /dev/null
+++ b/tests/qapi-schema/string-code-point-31.err
@@ -0,0 +1 @@ 
+tests/qapi-schema/string-code-point-31.json:2:14: Funny character in string
diff --git a/tests/qapi-schema/string-code-point-31.exit b/tests/qapi-schema/string-code-point-31.exit
new file mode 100644
index 0000000000..d00491fd7e
--- /dev/null
+++ b/tests/qapi-schema/string-code-point-31.exit
@@ -0,0 +1 @@ 
+1
diff --git a/tests/qapi-schema/string-code-point-31.json b/tests/qapi-schema/string-code-point-31.json
new file mode 100644
index 0000000000..f186cbd720
--- /dev/null
+++ b/tests/qapi-schema/string-code-point-31.json
@@ -0,0 +1,2 @@ 
+# We accept printable ASCII: code points 32..126.  Test code point 127:
+{ 'command': '' }
diff --git a/tests/qapi-schema/string-code-point-31.out b/tests/qapi-schema/string-code-point-31.out
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/tests/qapi-schema/unicode-str.err b/tests/qapi-schema/unicode-str.err
deleted file mode 100644
index f621cd6448..0000000000
--- a/tests/qapi-schema/unicode-str.err
+++ /dev/null
@@ -1 +0,0 @@ 
-tests/qapi-schema/unicode-str.json:2: 'command' uses invalid name 'é'
diff --git a/tests/qapi-schema/unicode-str.json b/tests/qapi-schema/unicode-str.json
deleted file mode 100644
index 5253a1b9f3..0000000000
--- a/tests/qapi-schema/unicode-str.json
+++ /dev/null
@@ -1,2 +0,0 @@ 
-# we don't support full Unicode strings, yet
-{ 'command': 'é' }