Message ID | 20190910063724.28470-7-armbru@redhat.com |
---|---|
State | New |
Headers | show |
Series | qapi: Schema language cleanups & doc improvements | expand |
On 9/10/19 1:37 AM, Markus Armbruster wrote: > RFC 8259 on string contents: > > All Unicode characters may be placed within the quotation marks, > except for the characters that MUST be escaped: quotation mark, > reverse solidus, and the control characters (U+0000 through > U+001F). > > The QAPI schema parser accepts both less and more than JSON: it > accepts only ASCII with \u (less), and accepts control characters > other than LF (new line) unescaped. How it treats unescaped non-ASCII > input differs between Python 2 and Python 3. > > Make it accept strictly less: require printable ASCII. Drop support > for \b, \f, \n, \r, \t. Fair enough. It doesn't prevent QMP clients from sending strings with non-ASCII characters, merely that those strings will never match the schema because we have guaranteed our schema is limited to ASCII. (This change means we are promising to never allow { "execute": "a\tb" } as a valid QMP command, for instance.) > > Signed-off-by: Markus Armbruster <armbru@redhat.com> > --- > @@ -523,17 +523,7 @@ class QAPISchemaParser(object): > if ch == '\n': > raise QAPIParseError(self, 'Missing terminating "\'"') > if esc: > - if ch == 'b': > - string += '\b' > - elif ch == 'f': > - string += '\f' > - elif ch == 'n': > - string += '\n' Is it worth a comment in the code that we are specifically not parsing all possible JSON escapes, because of the later requirement that QAPI strings be limited to the subset of printable ASCII? Reviewed-by: Eric Blake <eblake@redhat.com>
Eric Blake <eblake@redhat.com> writes: > On 9/10/19 1:37 AM, Markus Armbruster wrote: >> RFC 8259 on string contents: >> >> All Unicode characters may be placed within the quotation marks, >> except for the characters that MUST be escaped: quotation mark, >> reverse solidus, and the control characters (U+0000 through >> U+001F). >> >> The QAPI schema parser accepts both less and more than JSON: it >> accepts only ASCII with \u (less), and accepts control characters >> other than LF (new line) unescaped. How it treats unescaped non-ASCII >> input differs between Python 2 and Python 3. >> >> Make it accept strictly less: require printable ASCII. Drop support >> for \b, \f, \n, \r, \t. > > Fair enough. It doesn't prevent QMP clients from sending strings with > non-ASCII characters, merely that those strings will never match the > schema because we have guaranteed our schema is limited to ASCII. (This Note that this only affects QMP commands, events, parameter and enum value names, *not* non-enum string arguments. > change means we are promising to never allow { "execute": "a\tb" } as a > valid QMP command, for instance.) We're not actually promising anything. We're merely making it slightly harder to do: need to revert this patch first. Feels quite unlikely, though. >> Signed-off-by: Markus Armbruster <armbru@redhat.com> >> --- > >> @@ -523,17 +523,7 @@ class QAPISchemaParser(object): >> if ch == '\n': >> raise QAPIParseError(self, 'Missing terminating "\'"') >> if esc: >> - if ch == 'b': >> - string += '\b' >> - elif ch == 'f': >> - string += '\f' >> - elif ch == 'n': >> - string += '\n' > > Is it worth a comment in the code that we are specifically not parsing > all possible JSON escapes, because of the later requirement that QAPI > strings be limited to the subset of printable ASCII? Can't hurt. > Reviewed-by: Eric Blake <eblake@redhat.com> Thanks!
diff --git a/scripts/qapi/common.py b/scripts/qapi/common.py index 54d02458b5..c3dfad024f 100644 --- a/scripts/qapi/common.py +++ b/scripts/qapi/common.py @@ -523,17 +523,7 @@ class QAPISchemaParser(object): if ch == '\n': raise QAPIParseError(self, 'Missing terminating "\'"') if esc: - if ch == 'b': - string += '\b' - elif ch == 'f': - string += '\f' - elif ch == 'n': - string += '\n' - elif ch == 'r': - string += '\r' - elif ch == 't': - string += '\t' - elif ch == 'u': + if ch == 'u': value = 0 for _ in range(0, 4): ch = self.src[self.cursor] @@ -552,20 +542,21 @@ class QAPISchemaParser(object): 'For now, \\u escape ' 'only supports non-zero ' 'values up to \\u007f') - string += chr(value) - elif ch in '\\/\'"': - string += ch - else: + ch = chr(value) + elif ch not in '\\/\'"': raise QAPIParseError(self, "Unknown escape \\%s" % ch) esc = False elif ch == '\\': esc = True + continue elif ch == "'": self.val = string return - else: - string += ch + if ord(ch) < 32 or ord(ch) >= 127: + raise QAPIParseError( + self, "Funny character in string") + string += ch elif self.src.startswith('true', self.pos): self.val = True self.cursor += 3 diff --git a/tests/Makefile.include b/tests/Makefile.include index f5ac09549c..403748579f 100644 --- a/tests/Makefile.include +++ b/tests/Makefile.include @@ -452,6 +452,8 @@ qapi-schema += returns-array-bad.json qapi-schema += returns-dict.json qapi-schema += returns-unknown.json qapi-schema += returns-whitelist.json +qapi-schema += string-code-point-31.json +qapi-schema += string-code-point-127.json qapi-schema += struct-base-clash-deep.json qapi-schema += struct-base-clash.json qapi-schema += struct-data-invalid.json @@ -463,7 +465,6 @@ qapi-schema += type-bypass-bad-gen.json qapi-schema += unclosed-list.json qapi-schema += unclosed-object.json qapi-schema += unclosed-string.json -qapi-schema += unicode-str.json qapi-schema += union-base-empty.json qapi-schema += union-base-no-discriminator.json qapi-schema += union-branch-case.json diff --git a/tests/qapi-schema/string-code-point-127.err b/tests/qapi-schema/string-code-point-127.err new file mode 100644 index 0000000000..c310910c23 --- /dev/null +++ b/tests/qapi-schema/string-code-point-127.err @@ -0,0 +1 @@ +tests/qapi-schema/string-code-point-127.json:2:14: Funny character in string diff --git a/tests/qapi-schema/unicode-str.exit b/tests/qapi-schema/string-code-point-127.exit similarity index 100% rename from tests/qapi-schema/unicode-str.exit rename to tests/qapi-schema/string-code-point-127.exit diff --git a/tests/qapi-schema/string-code-point-127.json b/tests/qapi-schema/string-code-point-127.json new file mode 100644 index 0000000000..480318a69f --- /dev/null +++ b/tests/qapi-schema/string-code-point-127.json @@ -0,0 +1,2 @@ +# We accept printable ASCII: code points 32..126. Test code point 127: +{ 'command': '' } diff --git a/tests/qapi-schema/unicode-str.out b/tests/qapi-schema/string-code-point-127.out similarity index 100% rename from tests/qapi-schema/unicode-str.out rename to tests/qapi-schema/string-code-point-127.out diff --git a/tests/qapi-schema/string-code-point-31.err b/tests/qapi-schema/string-code-point-31.err new file mode 100644 index 0000000000..45797928d9 --- /dev/null +++ b/tests/qapi-schema/string-code-point-31.err @@ -0,0 +1 @@ +tests/qapi-schema/string-code-point-31.json:2:14: Funny character in string diff --git a/tests/qapi-schema/string-code-point-31.exit b/tests/qapi-schema/string-code-point-31.exit new file mode 100644 index 0000000000..d00491fd7e --- /dev/null +++ b/tests/qapi-schema/string-code-point-31.exit @@ -0,0 +1 @@ +1 diff --git a/tests/qapi-schema/string-code-point-31.json b/tests/qapi-schema/string-code-point-31.json new file mode 100644 index 0000000000..f186cbd720 --- /dev/null +++ b/tests/qapi-schema/string-code-point-31.json @@ -0,0 +1,2 @@ +# We accept printable ASCII: code points 32..126. Test code point 127: +{ 'command': '' } diff --git a/tests/qapi-schema/string-code-point-31.out b/tests/qapi-schema/string-code-point-31.out new file mode 100644 index 0000000000..e69de29bb2 diff --git a/tests/qapi-schema/unicode-str.err b/tests/qapi-schema/unicode-str.err deleted file mode 100644 index f621cd6448..0000000000 --- a/tests/qapi-schema/unicode-str.err +++ /dev/null @@ -1 +0,0 @@ -tests/qapi-schema/unicode-str.json:2: 'command' uses invalid name 'é' diff --git a/tests/qapi-schema/unicode-str.json b/tests/qapi-schema/unicode-str.json deleted file mode 100644 index 5253a1b9f3..0000000000 --- a/tests/qapi-schema/unicode-str.json +++ /dev/null @@ -1,2 +0,0 @@ -# we don't support full Unicode strings, yet -{ 'command': 'é' }
RFC 8259 on string contents: All Unicode characters may be placed within the quotation marks, except for the characters that MUST be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F). The QAPI schema parser accepts both less and more than JSON: it accepts only ASCII with \u (less), and accepts control characters other than LF (new line) unescaped. How it treats unescaped non-ASCII input differs between Python 2 and Python 3. Make it accept strictly less: require printable ASCII. Drop support for \b, \f, \n, \r, \t. Signed-off-by: Markus Armbruster <armbru@redhat.com> --- scripts/qapi/common.py | 25 ++++++------------- tests/Makefile.include | 3 ++- tests/qapi-schema/string-code-point-127.err | 1 + ...de-str.exit => string-code-point-127.exit} | 0 tests/qapi-schema/string-code-point-127.json | 2 ++ ...code-str.out => string-code-point-127.out} | 0 tests/qapi-schema/string-code-point-31.err | 1 + tests/qapi-schema/string-code-point-31.exit | 1 + tests/qapi-schema/string-code-point-31.json | 2 ++ tests/qapi-schema/string-code-point-31.out | 0 tests/qapi-schema/unicode-str.err | 1 - tests/qapi-schema/unicode-str.json | 2 -- 12 files changed, 17 insertions(+), 21 deletions(-) create mode 100644 tests/qapi-schema/string-code-point-127.err rename tests/qapi-schema/{unicode-str.exit => string-code-point-127.exit} (100%) create mode 100644 tests/qapi-schema/string-code-point-127.json rename tests/qapi-schema/{unicode-str.out => string-code-point-127.out} (100%) create mode 100644 tests/qapi-schema/string-code-point-31.err create mode 100644 tests/qapi-schema/string-code-point-31.exit create mode 100644 tests/qapi-schema/string-code-point-31.json create mode 100644 tests/qapi-schema/string-code-point-31.out delete mode 100644 tests/qapi-schema/unicode-str.err delete mode 100644 tests/qapi-schema/unicode-str.json