diff mbox

[1/7] string-input-visitor: Fix uint64 parsing

Message ID 1443184788-18859-2-git-send-email-afaerber@suse.de
State New
Headers show

Commit Message

Andreas Färber Sept. 25, 2015, 12:39 p.m. UTC
All integers would get parsed by strtoll(), not handling the case of
UINT64 properties with the most significient bit set.

Implement a .type_uint64 visitor callback, reusing the existing
parse_str() code through a new argument, using strtoull().

As a bug fix, ignore warnings about preference of qemu_strto[u]ll().

Cc: qemu-stable@nongnu.org
Signed-off-by: Andreas Färber <afaerber@suse.de>
---
 qapi/string-input-visitor.c | 57 +++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 52 insertions(+), 5 deletions(-)

Comments

Eric Blake Sept. 25, 2015, 2:49 p.m. UTC | #1
On 09/25/2015 06:39 AM, Andreas Färber wrote:
> All integers would get parsed by strtoll(), not handling the case of
> UINT64 properties with the most significient bit set.
> 
> Implement a .type_uint64 visitor callback, reusing the existing
> parse_str() code through a new argument, using strtoull().
> 
> As a bug fix, ignore warnings about preference of qemu_strto[u]ll().
> 
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Andreas Färber <afaerber@suse.de>
> ---
>  qapi/string-input-visitor.c | 57 +++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 52 insertions(+), 5 deletions(-)
> 

> @@ -50,7 +50,11 @@ static void parse_str(StringInputVisitor *siv, Error **errp)
>  
>      do {
>          errno = 0;
> -        start = strtoll(str, &endptr, 0);
> +        if (u64) {
> +            start = strtoull(str, &endptr, 0);

accepts the range [-ULLONG_MAX, ULLONG_MAX] (with 2s complement
wraparound). Do you really want -1 being a synonym for ULLONG_MAX, or do
you want to explicitly reject leading '-' when parsing unsigned
(arguments can be made for both behaviors; in fact, libvirt has two
separate wrappers for parsing uint64_t depending on which behavior is
wanted)

> +        } else {
> +            start = strtoll(str, &endptr, 0);

accepts the range [LLONG_MIN, LLONG_MAX] (that is, roughly half the
range of the unsigned version)

> +        }
>          if (errno == 0 && endptr > str) {
>              if (*endptr == '\0') {
>                  cur = g_malloc0(sizeof(*cur));
> @@ -60,7 +64,7 @@ static void parse_str(StringInputVisitor *siv, Error **errp)
>                                                            range_compare);
>                  cur = NULL;
>                  str = NULL;
> -            } else if (*endptr == '-') {
> +            } else if (*endptr == '-' && !u64) {

Why do you not want to handle ranges when using unsigned numbers?


>  
> +static void parse_type_uint64(Visitor *v, uint64_t *obj, const char *name,
> +                              Error **errp)
> +{
> +    StringInputVisitor *siv = DO_UPCAST(StringInputVisitor, visitor, v);
> +
> +    if (!siv->string) {
> +        error_setg(errp, QERR_INVALID_PARAMETER_TYPE, name ? name : "null",
> +                   "integer");
> +        return;
> +    }
...

That's a lot of copy-and-paste. Can't you make parse_type_int64() and
parse_type_uint64() both call into a single helper method, that contains
the guts of the existing parse_type_int64() and adds a single parameter
for the one place where the two functions differ on their call to
parse_str()?
Markus Armbruster Sept. 30, 2015, 1:19 p.m. UTC | #2
Andreas Färber <afaerber@suse.de> writes:

> All integers would get parsed by strtoll(), not handling the case of
> UINT64 properties with the most significient bit set.

This mess is part of a bigger mess, I'm afraid.

The major ways integers get parsed are:

* QMP: parse_literal() in qmp/qobject/json-parser.c

  This is what parses QMP off the wire.

  RFC 7159 does not prescribe range or precision of JSON numbers.  Our
  implementation accepts the union of int64_t and double.

  If the lexer recognizes a floating-point number, we convert it with
  strtod() and represent it as double.

  If the lexer recognizes a decimal integer, and strtoll() can convert
  it, we represent it in int64_t.  Else, we convert it with strtod() and
  represent it as double.  Unclean: code assumes int64_t is long long.

  Yes, that means QMP can't currently support the full range of QAPI's
  uint64 type.

* QemuOpts: parse_option_number() in util/qemu-option.c

  This is what parses key=value,... strings for command line and other
  places.

  QemuOpts can be used in two ways.  If you fill out QemuOptDesc desc[],
  it rejects unknown keys and parses values of known keys.  If you leave
  it empty, it accepts all keys, and doesn't parse values.  Either way,
  it also stores raw string values.

  QemuOpts' parser only supports unsigned numbers, in decimal, octal and
  hex.  Error checking is very poor.  In particular, it considers
  negative input valid, and silently casts it to uint64_t.  I wouldn't
  be surprised if some code depended on that.

* String input visitor: parse_str() in qapi/string-input-visitor.c

  This appears to be used only by QOM so far:

  - object_property_get_enum()
  - object_property_get_uint16List()
  - object_property_parse()

  parse_str() appears to parse some fancy list syntax.  Comes from
  commit 659268f.  The commit message is useless.  I can't see offhand
  how this interacts with the visitor core.

  Anyway, if we ignore the fancy crap and just look at the parsing of a
  single integer, we see that it supports int64_t in decimal, octal and
  hex, it fails to check for ERANGE, and assumes int64_t is long long.

* Options visitor: opts_type_int() in opts qapi/opts-visitor.c

  This one is for converting QemuOpts to QAPI-defined C types.  It uses
  the raw string values, not the parsed ones.  The QemuOpts parser is
  neither needed nor wanted here.  You should use the options visitor
  with an empty desc[] array to bypass it.  Example: numa.c.

  We got fancy list syntax again.  This one looks like I could
  understand it with a bit of effort.  But let's look just at the
  parsing of a single integer.  It supports uint64_t in decimal, octal
  and hex, and *surprise* checks for errors carefully.

Fixing just a part of a mess can be okay.  I just don't want to lever
the bigger mess unmentioned.

> Implement a .type_uint64 visitor callback, reusing the existing
> parse_str() code through a new argument, using strtoull().

I'm afraid you're leaving the bug in the visitor core unfixed.

The (essentially undocumented) Visitor abstraction has the following
methods for integers:

* Mandatory: type_int()

  Interface uses int64_t for the value.  The implementation should
  ensure it fits into int64_t.

* Optional: type_int{8,16,32}()

  These use int{8,16,32}_t for the value.

  If present, it should ensure the value fits into the data type.

  If missing, the core falls back to type_int() plus appropriate range
  checking.

* Optional: type_int64()

  Same interface as type_int().

  If present, it should ensure the value fits into int64_t.

  If missing, the core falls back to type_int().

  Aside: setting type_int64() would be useful only when you want to
  distinguish QAPI types int and int64.  So far, nobody does.  In fact,
  nobody uses QAPI type int64!  I'm tempted to define QAPI type int as a
  mere alias for int64 and drop the redundant stuff.

* Optional: type_uint{8,16,32}()

  These use uint{8,16,32}_t for the value.

  If present, it should ensure the value fits into the data type.

  If missing, the core falls back to type_int() plus appropriate range
  checking.

* Optional: type_uint64()

  Now it gets interesting.  Interface uses uint64_t for the value.

  If present, it should ensure the value fits into uint64_t.

  If missing, the core falls back to type_int().  No range checking.  If
  type_int() performs range checking as it should, then uint64_t values
  not representable in int64_t get rejected (wrong), and negative values
  representable in int64_t get cast to uint64_t (also wrong).

  I think we need to make type_uint64() mandatory, and drop the
  fallback.

* Optional: type_size()

  Same interface as type_uint64().

  If present, it should ensure the value fits into uint64_t.

  If missing, the core first tries falling back to type_uint64() and
  then to type_int().  Falling back to type_int() is as wrong here as it
  is in type_uint64().

> As a bug fix, ignore warnings about preference of qemu_strto[u]ll().

I'm not sure I get this sentence.

> Cc: qemu-stable@nongnu.org
> Signed-off-by: Andreas Färber <afaerber@suse.de>

On the actual patch, I have nothing to add over Eric's review right now.
Andreas Färber Sept. 30, 2015, 1:23 p.m. UTC | #3
Am 30.09.2015 um 15:19 schrieb Markus Armbruster:
> Andreas Färber <afaerber@suse.de> writes:
>> As a bug fix, ignore warnings about preference of qemu_strto[u]ll().
> 
> I'm not sure I get this sentence.

This patch causes checkpatch warnings. I intentionally do not address
them in this bug-fix patch, but instead in a later patch in the series.

Andreas
Eric Blake Sept. 30, 2015, 1:47 p.m. UTC | #4
On 09/30/2015 07:19 AM, Markus Armbruster wrote:

> 
> The (essentially undocumented) Visitor abstraction has the following
> methods for integers:

I proposed documentation at:
https://lists.gnu.org/archive/html/qemu-devel/2015-09/msg05434.html

> 
> * Mandatory: type_int()
> 
>   Interface uses int64_t for the value.  The implementation should
>   ensure it fits into int64_t.
> 
> * Optional: type_int{8,16,32}()
> 
>   These use int{8,16,32}_t for the value.
> 
>   If present, it should ensure the value fits into the data type.
> 
>   If missing, the core falls back to type_int() plus appropriate range
>   checking.

No one implements them.  In fact, as part of preparing my documentation,
I actually proposed simplifying the visitor callback interface to drop them:
https://lists.gnu.org/archive/html/qemu-devel/2015-09/msg05432.html

> 
> * Optional: type_int64()
> 
>   Same interface as type_int().
> 
>   If present, it should ensure the value fits into int64_t.
> 
>   If missing, the core falls back to type_int().
> 
>   Aside: setting type_int64() would be useful only when you want to
>   distinguish QAPI types int and int64.  So far, nobody does.  In fact,
>   nobody uses QAPI type int64!  I'm tempted to define QAPI type int as a
>   mere alias for int64 and drop the redundant stuff.

Already part of my proposal.

> 
> * Optional: type_uint{8,16,32}()
> 
>   These use uint{8,16,32}_t for the value.
> 
>   If present, it should ensure the value fits into the data type.
> 
>   If missing, the core falls back to type_int() plus appropriate range
>   checking.

Also unused, and simplified above.

> 
> * Optional: type_uint64()
> 
>   Now it gets interesting.  Interface uses uint64_t for the value.
> 
>   If present, it should ensure the value fits into uint64_t.
> 
>   If missing, the core falls back to type_int().  No range checking.  If
>   type_int() performs range checking as it should, then uint64_t values
>   not representable in int64_t get rejected (wrong), and negative values
>   representable in int64_t get cast to uint64_t (also wrong).
> 
>   I think we need to make type_uint64() mandatory, and drop the
>   fallback.

Probably a good idea, although not done in my proposed patches.

> 
> * Optional: type_size()
> 
>   Same interface as type_uint64().
> 
>   If present, it should ensure the value fits into uint64_t.
> 
>   If missing, the core first tries falling back to type_uint64() and
>   then to type_int().  Falling back to type_int() is as wrong here as it
>   is in type_uint64().

Provided by the QemuOpts parser to allow '1k' to mean 1024, and so on.

> 
>> As a bug fix, ignore warnings about preference of qemu_strto[u]ll().
> 
> I'm not sure I get this sentence.
> 
>> Cc: qemu-stable@nongnu.org
>> Signed-off-by: Andreas Färber <afaerber@suse.de>
> 
> On the actual patch, I have nothing to add over Eric's review right now.
>
Eric Blake Sept. 30, 2015, 1:48 p.m. UTC | #5
On 09/30/2015 07:23 AM, Andreas Färber wrote:
> Am 30.09.2015 um 15:19 schrieb Markus Armbruster:
>> Andreas Färber <afaerber@suse.de> writes:
>>> As a bug fix, ignore warnings about preference of qemu_strto[u]ll().
>>
>> I'm not sure I get this sentence.
> 
> This patch causes checkpatch warnings. I intentionally do not address
> them in this bug-fix patch, but instead in a later patch in the series.

Maybe:

As this is a bug fix, the patch intentionally ignores checkpatch
warnings to prefer the use of qemu_strto[u]ll() to minimize size; a
later patch will further address that issue.
Andreas Färber Nov. 11, 2015, 7:26 p.m. UTC | #6
Am 25.09.2015 um 16:49 schrieb Eric Blake:
> On 09/25/2015 06:39 AM, Andreas Färber wrote:
>> All integers would get parsed by strtoll(), not handling the case of
>> UINT64 properties with the most significient bit set.
>>
>> Implement a .type_uint64 visitor callback, reusing the existing
>> parse_str() code through a new argument, using strtoull().
>>
>> As a bug fix, ignore warnings about preference of qemu_strto[u]ll().
>>
>> Cc: qemu-stable@nongnu.org
>> Signed-off-by: Andreas Färber <afaerber@suse.de>
>> ---
>>  qapi/string-input-visitor.c | 57 +++++++++++++++++++++++++++++++++++++++++----
>>  1 file changed, 52 insertions(+), 5 deletions(-)
>>
> 
>> @@ -50,7 +50,11 @@ static void parse_str(StringInputVisitor *siv, Error **errp)
>>  
>>      do {
>>          errno = 0;
>> -        start = strtoll(str, &endptr, 0);
>> +        if (u64) {
>> +            start = strtoull(str, &endptr, 0);
> 
> accepts the range [-ULLONG_MAX, ULLONG_MAX] (with 2s complement
> wraparound). Do you really want -1 being a synonym for ULLONG_MAX, or do
> you want to explicitly reject leading '-' when parsing unsigned
> (arguments can be made for both behaviors; in fact, libvirt has two
> separate wrappers for parsing uint64_t depending on which behavior is
> wanted)
> 
>> +        } else {
>> +            start = strtoll(str, &endptr, 0);
> 
> accepts the range [LLONG_MIN, LLONG_MAX] (that is, roughly half the
> range of the unsigned version)

No one has further commented on this, so I take it no further changes
are required here for now.

>> +        }
>>          if (errno == 0 && endptr > str) {
>>              if (*endptr == '\0') {
>>                  cur = g_malloc0(sizeof(*cur));
>> @@ -60,7 +64,7 @@ static void parse_str(StringInputVisitor *siv, Error **errp)
>>                                                            range_compare);
>>                  cur = NULL;
>>                  str = NULL;
>> -            } else if (*endptr == '-') {
>> +            } else if (*endptr == '-' && !u64) {
> 
> Why do you not want to handle ranges when using unsigned numbers?

For some reason I must've read this as handling negative numbers, which
we wouldn't have for unsigned numbers...

However, since there is only one .start_list() callback, which passes
!u64 to retain previous behavior, we would never actually run into this
code path today. I've reverted my change and duplicated the strtoull()
handling instead nonetheless.

>>  
>> +static void parse_type_uint64(Visitor *v, uint64_t *obj, const char *name,
>> +                              Error **errp)
>> +{
>> +    StringInputVisitor *siv = DO_UPCAST(StringInputVisitor, visitor, v);
>> +
>> +    if (!siv->string) {
>> +        error_setg(errp, QERR_INVALID_PARAMETER_TYPE, name ? name : "null",
>> +                   "integer");
>> +        return;
>> +    }
> ...
> 
> That's a lot of copy-and-paste. Can't you make parse_type_int64() and
> parse_type_uint64() both call into a single helper method, that contains
> the guts of the existing parse_type_int64() and adds a single parameter
> for the one place where the two functions differ on their call to
> parse_str()?

I don't see how. They have different signatures, and there's a lot of
gotos that differ in the error message. I'm all for sharing code but it
seems more work refactoring that code for reuse than duplication saved.
If you have a concrete suggestion how to improve it, please share a diff
or let's do that as follow-up.

Regards,
Andreas
diff mbox

Patch

diff --git a/qapi/string-input-visitor.c b/qapi/string-input-visitor.c
index bbd6a54..cf81f85 100644
--- a/qapi/string-input-visitor.c
+++ b/qapi/string-input-visitor.c
@@ -37,7 +37,7 @@  static void free_range(void *range, void *dummy)
     g_free(range);
 }
 
-static void parse_str(StringInputVisitor *siv, Error **errp)
+static void parse_str(StringInputVisitor *siv, bool u64, Error **errp)
 {
     char *str = (char *) siv->string;
     long long start, end;
@@ -50,7 +50,11 @@  static void parse_str(StringInputVisitor *siv, Error **errp)
 
     do {
         errno = 0;
-        start = strtoll(str, &endptr, 0);
+        if (u64) {
+            start = strtoull(str, &endptr, 0);
+        } else {
+            start = strtoll(str, &endptr, 0);
+        }
         if (errno == 0 && endptr > str) {
             if (*endptr == '\0') {
                 cur = g_malloc0(sizeof(*cur));
@@ -60,7 +64,7 @@  static void parse_str(StringInputVisitor *siv, Error **errp)
                                                           range_compare);
                 cur = NULL;
                 str = NULL;
-            } else if (*endptr == '-') {
+            } else if (*endptr == '-' && !u64) {
                 str = endptr + 1;
                 errno = 0;
                 end = strtoll(str, &endptr, 0);
@@ -122,7 +126,7 @@  start_list(Visitor *v, const char *name, Error **errp)
 {
     StringInputVisitor *siv = DO_UPCAST(StringInputVisitor, visitor, v);
 
-    parse_str(siv, errp);
+    parse_str(siv, false, errp);
 
     siv->cur_range = g_list_first(siv->ranges);
     if (siv->cur_range) {
@@ -190,7 +194,7 @@  static void parse_type_int(Visitor *v, int64_t *obj, const char *name,
         return;
     }
 
-    parse_str(siv, errp);
+    parse_str(siv, false, errp);
 
     if (!siv->ranges) {
         goto error;
@@ -221,6 +225,48 @@  error:
                "an int64 value or range");
 }
 
+static void parse_type_uint64(Visitor *v, uint64_t *obj, const char *name,
+                              Error **errp)
+{
+    StringInputVisitor *siv = DO_UPCAST(StringInputVisitor, visitor, v);
+
+    if (!siv->string) {
+        error_setg(errp, QERR_INVALID_PARAMETER_TYPE, name ? name : "null",
+                   "integer");
+        return;
+    }
+
+    parse_str(siv, true, errp);
+
+    if (!siv->ranges) {
+        goto error;
+    }
+
+    if (!siv->cur_range) {
+        Range *r;
+
+        siv->cur_range = g_list_first(siv->ranges);
+        if (!siv->cur_range) {
+            goto error;
+        }
+
+        r = siv->cur_range->data;
+        if (!r) {
+            goto error;
+        }
+
+        siv->cur = r->begin;
+    }
+
+    *obj = siv->cur;
+    siv->cur++;
+    return;
+
+error:
+    error_setg(errp, QERR_INVALID_PARAMETER_VALUE, name,
+               "a uint64 value or range");
+}
+
 static void parse_type_size(Visitor *v, uint64_t *obj, const char *name,
                             Error **errp)
 {
@@ -332,6 +378,7 @@  StringInputVisitor *string_input_visitor_new(const char *str)
 
     v->visitor.type_enum = input_type_enum;
     v->visitor.type_int = parse_type_int;
+    v->visitor.type_uint64 = parse_type_uint64;
     v->visitor.type_size = parse_type_size;
     v->visitor.type_bool = parse_type_bool;
     v->visitor.type_str = parse_type_str;