diff mbox

[1/8] cutils: unsigned int parsing functions

Message ID 1358360933-5323-2-git-send-email-ehabkost@redhat.com
State New
Headers show

Commit Message

Eduardo Habkost Jan. 16, 2013, 6:28 p.m. UTC
There are lots of duplicate parsing code using strto*() in QEMU, and
most of that code is broken in one way or another. Even the visitors
code have duplicate integer parsing code[1]. This introduces functions
to help parsing unsigned int values: parse_uint() and parse_uint_full().

Parsing functions for signed ints and floats will be submitted later.

parse_uint_full() has all the checks made by opts_type_uint64() at
opts-visitor.c:

 - Check for NULL (returns -EINVAL)
 - Check for negative numbers (returns -ERANGE)
 - Check for empty string (returns -EINVAL)
 - Check for overflow or other errno values set by strtoll() (returns
   -errno)
 - Check for end of string (reject invalid characters after number)
   (returns -EINVAL)

parse_uint() does everything above except checking for the end of the
string, so callers can continue parsing the remainder of string after
the number.

Unit tests included.

[1] string-input-visitor.c:parse_int() could use the same parsing code
    used by opts-visitor.c:opts_type_int(), instead of duplicating that
    logic.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
---
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Eric Blake <eblake@redhat.com>

Changes v2:
 - Trivial whitespace change
 - Add 'base' parameter to the functions
---
 include/qemu-common.h |   4 +
 tests/Makefile        |   3 +
 tests/test-cutils.c   | 216 ++++++++++++++++++++++++++++++++++++++++++++++++++
 util/cutils.c         |  79 ++++++++++++++++++
 4 files changed, 302 insertions(+)
 create mode 100644 tests/test-cutils.c

Comments

Laszlo Ersek Jan. 17, 2013, 6:29 p.m. UTC | #1
On 01/16/13 19:28, Eduardo Habkost wrote:

> +static void test_parse_uint_negative(void)
> +{
> +    unsigned long long i = 999;
> +    char f = 'X';
> +    char *endptr = &f;
> +    const char *str = " \t -321";
> +    int r;
> +
> +    r = parse_uint(str, &i, &endptr, 0);
> +
> +    g_assert_cmpint(r, ==, -ERANGE);
> +    g_assert_cmpint(i, ==, 0);
> +    g_assert(endptr == str + 3);
> +}

I think it would be more true to the strtol() family if in this case

(a) we reported -EINVAL (invalid subject sequence) -- but I certainly
don't insist on that,

(b) and, independently,

(b1) we either consumed all of the whitespace sequence *and* the subject
sequence (which would be consistent with ERANGE; see
test_parse_uint_overflow()),

(b2) or we didn't consume anything (not even part of the whitespace
sequence). This would be easy to implement and also consistent with the
strtol() family's behavior when it sees an invalid subject sequence:

"If the subject sequence is empty or does not have the expected form, no
conversion is performed; the value of /str/ is stored in the object
pointed to by /endptr/, provided that /endptr/ is not a null pointer."

But I don't insist on (b) either :)

Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Eduardo Habkost Jan. 17, 2013, 6:50 p.m. UTC | #2
On Thu, Jan 17, 2013 at 07:29:17PM +0100, Laszlo Ersek wrote:
> On 01/16/13 19:28, Eduardo Habkost wrote:
> 
> > +static void test_parse_uint_negative(void)
> > +{
> > +    unsigned long long i = 999;
> > +    char f = 'X';
> > +    char *endptr = &f;
> > +    const char *str = " \t -321";
> > +    int r;
> > +
> > +    r = parse_uint(str, &i, &endptr, 0);
> > +
> > +    g_assert_cmpint(r, ==, -ERANGE);
> > +    g_assert_cmpint(i, ==, 0);
> > +    g_assert(endptr == str + 3);
> > +}
> 
> I think it would be more true to the strtol() family if in this case
> 
> (a) we reported -EINVAL (invalid subject sequence) -- but I certainly
> don't insist on that,

It makes sense, as we didn't consume any number and simply aborted
parsing as soon as '-' was found.

> 
> (b) and, independently,
> 
> (b1) we either consumed all of the whitespace sequence *and* the subject
> sequence (which would be consistent with ERANGE; see
> test_parse_uint_overflow()),
> 
> (b2) or we didn't consume anything (not even part of the whitespace
> sequence). This would be easy to implement and also consistent with the
> strtol() family's behavior when it sees an invalid subject sequence:

This makes sense as well, especially if we return -EINVAL.

I will submit a new version of just this patch. Thanks!

> 
> "If the subject sequence is empty or does not have the expected form, no
> conversion is performed; the value of /str/ is stored in the object
> pointed to by /endptr/, provided that /endptr/ is not a null pointer."
> 
> But I don't insist on (b) either :)
> 
> Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Markus Armbruster Jan. 18, 2013, 10:01 a.m. UTC | #3
Eduardo Habkost <ehabkost@redhat.com> writes:

> There are lots of duplicate parsing code using strto*() in QEMU, and
> most of that code is broken in one way or another. Even the visitors
> code have duplicate integer parsing code[1]. This introduces functions
> to help parsing unsigned int values: parse_uint() and parse_uint_full().
>
> Parsing functions for signed ints and floats will be submitted later.
>
> parse_uint_full() has all the checks made by opts_type_uint64() at
> opts-visitor.c:
>
>  - Check for NULL (returns -EINVAL)
>  - Check for negative numbers (returns -ERANGE)
>  - Check for empty string (returns -EINVAL)
>  - Check for overflow or other errno values set by strtoll() (returns
>    -errno)
>  - Check for end of string (reject invalid characters after number)
>    (returns -EINVAL)
>
> parse_uint() does everything above except checking for the end of the
> string, so callers can continue parsing the remainder of string after
> the number.
>
> Unit tests included.
>
> [1] string-input-visitor.c:parse_int() could use the same parsing code
>     used by opts-visitor.c:opts_type_int(), instead of duplicating that
>     logic.
[...]
> diff --git a/util/cutils.c b/util/cutils.c
> index 80bb1dc..7f09251 100644
> --- a/util/cutils.c
> +++ b/util/cutils.c
> @@ -270,6 +270,85 @@ int64_t strtosz(const char *nptr, char **end)
>      return strtosz_suffix(nptr, end, STRTOSZ_DEFSUFFIX_MB);
>  }
>  
> +/* Try to parse an unsigned integer
> + *
> + * Error checks done by the function:
> + * - NULL pointer will return -EINVAL.
> + * - Empty strings will return -EINVAL.

Same for strings containing only whitespace.

> + * - Overflow errors or other errno values  set by strtoull() will

Extra space before 'set'.

> + *   return -errno (-ERANGE in case of overflow).
> + * - Differently from strtoull(), values starting with a minus sign are
> + *   rejected (returning -ERANGE).
> + *
> + * Sets endptr to point to the first invalid character.

Mention that unlike strtol() & friends, endptr must not be null?
Probably not necessary.

The two ERANGE cases treat endptr differently: it's either set to point
just behind the out-of-range number, or to point to the beginning of the
out-of-range number.  Ugh.

> +                                                        Callers may rely
> + * on *value and *endptr to be always set by the function, even in case of
> + * errors.

You neglect to specify what gets assigned to *value in error cases.

Your list of error checks isn't quite complete.  Here's my try:

/**
 * Parse unsigned integer
 *
 * @s: String to parse
 * @value: Destination for parsed integer value
 * @endptr: Destination for pointer to first character not consumed
 * @base: integer base, between 2 and 36 inclusive, or 0
 *
 * Parsed syntax is just like strol()'s: arbitrary whitespace, a single
 * optional '+' or '-', an optional "0x" if @base is 0 or 16, one or
 * more digits.
 *
 * If @s is null, or @base is invalid, or @s doesn't start with an
 * integer in the syntax above, set *@value to 0, *@endptr to @s, and
 * return -EINVAL.
 *
 * Set @endptr to point right beyond the parsed integer.
 *
 * If the integer is negative, set *@value to 0, and return -ERANGE.
 * If the integer overflows unsigned long long, set *@value to
 * ULLONG_MAX, and return -ERANGE.
 * Else, set *@value to the parsed integer, and return 0.
 */

> + *
> + * The 'base' parameter has the same meaning of 'base' on strtoull().
> + *
> + * Returns 0 on success, negative errno value on error.
> + */
> +int parse_uint(const char *s, unsigned long long *value, char **endptr,
> +               int base)
> +{
> +    int r = 0;
> +    char *endp = (char *)s;
> +    unsigned long long val = 0;
> +
> +    if (!s) {
> +        r = -EINVAL;
> +        goto out;
> +    }
> +
> +    /* make sure we reject negative numbers: */
> +    while (isspace((unsigned char)*s)) {
> +        ++s; endp++;

Mixing pre- and post-increment like that is ugly.  Recommend to stick to
post-increment.

I'd set endp after the loop.  Right before goto.

Or simply change the specification to set *endptr to point beyond the
integer instead of to the '-'.  I took the liberty to do exactly that in
my proposed function comment.

> +    }
> +    if (*s == '-') {
> +        r = -ERANGE;
> +        goto out;
> +    }

This creates the special case that made me go "ugh" above.  Suggest to
drop the if here, and check right after strtoull() instead, as shown
below.  That way, we get the same endptr behavior for all out-of-range
numbers.

> +
> +    errno = 0;
> +    val = strtoull(s, &endp, base);

       if (*s = '-') {
           r = -ERANGE;
           val = 0;
           goto out;
       }

> +    if (errno) {
> +        r = -errno;
> +        goto out;
> +    }
> +
> +    if (endp == s) {
> +        r = -EINVAL;
> +        goto out;
> +    }
> +
> +out:
> +    *value = val;
> +    *endptr = endp;
> +    return r;
> +}
> +
> +/* Try to parse an unsigned integer, making sure the whole string is parsed
> + *
> + * Have the same behavior of parse_uint(), but with an additional check
> + * for additional data after the parsed number (in that case, the function
> + * will return -EINVAL).

What's assigned to *value then?

> + */

Alternatively, you could make into parse_uint() do that check when
endptr is null, and drop this function.  Matter of taste.

> +int parse_uint_full(const char *s, unsigned long long *value, int base)
> +{
> +    char *endp;
> +    int r;
> +
> +    r = parse_uint(s, value, &endp, base);
> +    if (r < 0) {
> +        return r;
> +    }
> +    if (*endp) {

Answering my own question: the parsed integer is assigned to *value then.

> +        return -EINVAL;
> +    }
> +
> +    return 0;
> +}
> +
>  int qemu_parse_fd(const char *param)
>  {
>      int fd;
Eduardo Habkost Jan. 18, 2013, 1:26 p.m. UTC | #4
On Fri, Jan 18, 2013 at 11:01:14AM +0100, Markus Armbruster wrote:
> Eduardo Habkost <ehabkost@redhat.com> writes:
> 
> > There are lots of duplicate parsing code using strto*() in QEMU, and
> > most of that code is broken in one way or another. Even the visitors
> > code have duplicate integer parsing code[1]. This introduces functions
> > to help parsing unsigned int values: parse_uint() and parse_uint_full().
> >
> > Parsing functions for signed ints and floats will be submitted later.
> >
> > parse_uint_full() has all the checks made by opts_type_uint64() at
> > opts-visitor.c:
> >
> >  - Check for NULL (returns -EINVAL)
> >  - Check for negative numbers (returns -ERANGE)
> >  - Check for empty string (returns -EINVAL)
> >  - Check for overflow or other errno values set by strtoll() (returns
> >    -errno)
> >  - Check for end of string (reject invalid characters after number)
> >    (returns -EINVAL)
> >
> > parse_uint() does everything above except checking for the end of the
> > string, so callers can continue parsing the remainder of string after
> > the number.
> >
> > Unit tests included.
> >
> > [1] string-input-visitor.c:parse_int() could use the same parsing code
> >     used by opts-visitor.c:opts_type_int(), instead of duplicating that
> >     logic.
> [...]
> > diff --git a/util/cutils.c b/util/cutils.c
> > index 80bb1dc..7f09251 100644
> > --- a/util/cutils.c
> > +++ b/util/cutils.c
> > @@ -270,6 +270,85 @@ int64_t strtosz(const char *nptr, char **end)
> >      return strtosz_suffix(nptr, end, STRTOSZ_DEFSUFFIX_MB);
> >  }
> >  
> > +/* Try to parse an unsigned integer
> > + *
> > + * Error checks done by the function:
> > + * - NULL pointer will return -EINVAL.
> > + * - Empty strings will return -EINVAL.
> 
> Same for strings containing only whitespace.

Oh, you're right.

> 
> > + * - Overflow errors or other errno values  set by strtoull() will
> 
> Extra space before 'set'.

Oops.

> 
> > + *   return -errno (-ERANGE in case of overflow).
> > + * - Differently from strtoull(), values starting with a minus sign are
> > + *   rejected (returning -ERANGE).
> > + *
> > + * Sets endptr to point to the first invalid character.
> 
> Mention that unlike strtol() & friends, endptr must not be null?
> Probably not necessary.

I believe it is implicit if I document it as "Sets *endptr". But worth
documenting as it is different from strtoull() behavior.

> 
> The two ERANGE cases treat endptr differently: it's either set to point
> just behind the out-of-range number, or to point to the beginning of the
> out-of-range number.  Ugh.

This should be fixed in the newer version I sent.

> 
> > +                                                        Callers may rely
> > + * on *value and *endptr to be always set by the function, even in case of
> > + * errors.
> 
> You neglect to specify what gets assigned to *value in error cases.
> 

Undefined.  :-)

Anyway, I agree it not very useful to document that "*value and *endptr
will be always set" if it is undefined. I will try to reword it.


> Your list of error checks isn't quite complete.  Here's my try:
> 
> /**
>  * Parse unsigned integer
>  *
>  * @s: String to parse
>  * @value: Destination for parsed integer value
>  * @endptr: Destination for pointer to first character not consumed
>  * @base: integer base, between 2 and 36 inclusive, or 0
>  *
>  * Parsed syntax is just like strol()'s: arbitrary whitespace, a single
>  * optional '+' or '-', an optional "0x" if @base is 0 or 16, one or
>  * more digits.

The newer version has '-' as _not_ part of the syntax.

>  *
>  * If @s is null, or @base is invalid, or @s doesn't start with an
>  * integer in the syntax above, set *@value to 0, *@endptr to @s, and
>  * return -EINVAL.

This matches the behavior of the newer version. But I would like to keep
*value documented as undefined in case of errors.

>  *
>  * Set @endptr to point right beyond the parsed integer.
>  *
>  * If the integer is negative, set *@value to 0, and return -ERANGE.

This isn't the case in the newer version.

>  * If the integer overflows unsigned long long, set *@value to
>  * ULLONG_MAX, and return -ERANGE.
>  * Else, set *@value to the parsed integer, and return 0.
>  */

Thanks for taking the time to write this. I hate having to look up
what's the right syntax to use in docstrings, so I often just write
plain comments before functions.  :-)

I have a minor problem with the documentation above: as a developer, the
most important question I have is: what's the difference between using
parse_uint() and using strtoull() directly? But maybe it is a good
thing: instead of referencing the strtoull() specification (and an
implementation detail), now we have a well-defined and well-documented
behavior.

> 
> > + *
> > + * The 'base' parameter has the same meaning of 'base' on strtoull().
> > + *
> > + * Returns 0 on success, negative errno value on error.
> > + */
> > +int parse_uint(const char *s, unsigned long long *value, char **endptr,
> > +               int base)
> > +{
> > +    int r = 0;
> > +    char *endp = (char *)s;
> > +    unsigned long long val = 0;
> > +
> > +    if (!s) {
> > +        r = -EINVAL;
> > +        goto out;
> > +    }
> > +
> > +    /* make sure we reject negative numbers: */
> > +    while (isspace((unsigned char)*s)) {
> > +        ++s; endp++;
> 
> Mixing pre- and post-increment like that is ugly.  Recommend to stick to
> post-increment.

Agreed. I blame the copy & paste I made from opts-visitor. Later I added
the postfix increment without noticing how ugly it looked like

Anyway, this was changed in the newer patch version.

> 
> I'd set endp after the loop.  Right before goto.

Fixed in the newer version.

> 
> Or simply change the specification to set *endptr to point beyond the
> integer instead of to the '-'.  I took the liberty to do exactly that in
> my proposed function comment.

The newer version was changed to return -EINVAL and set endptr to the
beginning of the string.

> 
> > +    }
> > +    if (*s == '-') {
> > +        r = -ERANGE;
> > +        goto out;
> > +    }
> 
> This creates the special case that made me go "ugh" above.  Suggest to
> drop the if here, and check right after strtoull() instead, as shown
> below.  That way, we get the same endptr behavior for all out-of-range
> numbers.

Is returning -EINVAL acceptable for you, as well? In your proposal we
consider "-1234" part of the language but out-of-range. Returning
-EINVAL, we consider "-1234" not part of the language, and invalid
input. The newer version returns -EINVAL.

> 
> > +
> > +    errno = 0;
> > +    val = strtoull(s, &endp, base);
> 
>        if (*s = '-') {
>            r = -ERANGE;
>            val = 0;
>            goto out;
>        }

This has another bug: makes endptr point to the middle of the string in
case we are parsing "   xxx" or "   -1234".

> 
> > +    if (errno) {
> > +        r = -errno;
> > +        goto out;
> > +    }
> > +
> > +    if (endp == s) {
> > +        r = -EINVAL;
> > +        goto out;
> > +    }
> > +
> > +out:
> > +    *value = val;
> > +    *endptr = endp;
> > +    return r;
> > +}
> > +
> > +/* Try to parse an unsigned integer, making sure the whole string is parsed
> > + *
> > + * Have the same behavior of parse_uint(), but with an additional check
> > + * for additional data after the parsed number (in that case, the function
> > + * will return -EINVAL).
> 
> What's assigned to *value then?

Undefined.  :-)


> 
> > + */
> 
> Alternatively, you could make into parse_uint() do that check when
> endptr is null, and drop this function.  Matter of taste.

I considered doing that, but I believe it would be too subtle.

> 
> > +int parse_uint_full(const char *s, unsigned long long *value, int base)
> > +{
> > +    char *endp;
> > +    int r;
> > +
> > +    r = parse_uint(s, value, &endp, base);
> > +    if (r < 0) {
> > +        return r;
> > +    }
> > +    if (*endp) {
> 
> Answering my own question: the parsed integer is assigned to *value then.

I prefer to document it as undefined. If you want to use the parsed
integer even if there's additional data after it, you should use
parse_int() instead.

> 
> > +        return -EINVAL;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> >  int qemu_parse_fd(const char *param)
> >  {
> >      int fd;
Andreas Färber Jan. 18, 2013, 1:32 p.m. UTC | #5
Am 18.01.2013 14:26, schrieb Eduardo Habkost:
> 
> On Fri, Jan 18, 2013 at 11:01:14AM +0100, Markus Armbruster wrote:
>> Your list of error checks isn't quite complete.  Here's my try:

Even better would be:

>> /**

 * parse_uint:

>>  * @s: String to parse
>>  * @value: Destination for parsed integer value
>>  * @endptr: Destination for pointer to first character not consumed
>>  * @base: integer base, between 2 and 36 inclusive, or 0
>>  *

 * Parse unsigned integer

>>  * Parsed syntax is just like strol()'s: arbitrary whitespace, a single
>>  * optional '+' or '-', an optional "0x" if @base is 0 or 16, one or
>>  * more digits.
[snip]

Cheers,
Andreas
Markus Armbruster Jan. 18, 2013, 6:06 p.m. UTC | #6
Eduardo Habkost <ehabkost@redhat.com> writes:

> On Fri, Jan 18, 2013 at 11:01:14AM +0100, Markus Armbruster wrote:
>> Eduardo Habkost <ehabkost@redhat.com> writes:
>> 
>> > There are lots of duplicate parsing code using strto*() in QEMU, and
>> > most of that code is broken in one way or another. Even the visitors
>> > code have duplicate integer parsing code[1]. This introduces functions
>> > to help parsing unsigned int values: parse_uint() and parse_uint_full().
>> >
>> > Parsing functions for signed ints and floats will be submitted later.
>> >
>> > parse_uint_full() has all the checks made by opts_type_uint64() at
>> > opts-visitor.c:
>> >
>> >  - Check for NULL (returns -EINVAL)
>> >  - Check for negative numbers (returns -ERANGE)
>> >  - Check for empty string (returns -EINVAL)
>> >  - Check for overflow or other errno values set by strtoll() (returns
>> >    -errno)
>> >  - Check for end of string (reject invalid characters after number)
>> >    (returns -EINVAL)
>> >
>> > parse_uint() does everything above except checking for the end of the
>> > string, so callers can continue parsing the remainder of string after
>> > the number.
>> >
>> > Unit tests included.
>> >
>> > [1] string-input-visitor.c:parse_int() could use the same parsing code
>> >     used by opts-visitor.c:opts_type_int(), instead of duplicating that
>> >     logic.
>> [...]
>> > diff --git a/util/cutils.c b/util/cutils.c
>> > index 80bb1dc..7f09251 100644
>> > --- a/util/cutils.c
>> > +++ b/util/cutils.c
>> > @@ -270,6 +270,85 @@ int64_t strtosz(const char *nptr, char **end)
>> >      return strtosz_suffix(nptr, end, STRTOSZ_DEFSUFFIX_MB);
>> >  }
>> >  
>> > +/* Try to parse an unsigned integer
>> > + *
>> > + * Error checks done by the function:
>> > + * - NULL pointer will return -EINVAL.
>> > + * - Empty strings will return -EINVAL.
>> 
>> Same for strings containing only whitespace.
>
> Oh, you're right.
>
>> 
>> > + * - Overflow errors or other errno values  set by strtoull() will
>> 
>> Extra space before 'set'.
>
> Oops.
>
>> 
>> > + *   return -errno (-ERANGE in case of overflow).
>> > + * - Differently from strtoull(), values starting with a minus sign are
>> > + *   rejected (returning -ERANGE).
>> > + *
>> > + * Sets endptr to point to the first invalid character.
>> 
>> Mention that unlike strtol() & friends, endptr must not be null?
>> Probably not necessary.
>
> I believe it is implicit if I document it as "Sets *endptr". But worth
> documenting as it is different from strtoull() behavior.
>
>> 
>> The two ERANGE cases treat endptr differently: it's either set to point
>> just behind the out-of-range number, or to point to the beginning of the
>> out-of-range number.  Ugh.
>
> This should be fixed in the newer version I sent.
>
>> 
>> > +                                                        Callers may rely
>> > + * on *value and *endptr to be always set by the function, even in case of
>> > + * errors.
>> 
>> You neglect to specify what gets assigned to *value in error cases.
>> 
>
> Undefined.  :-)
>
> Anyway, I agree it not very useful to document that "*value and *endptr
> will be always set" if it is undefined. I will try to reword it.

Yes, please.

I'm not fond of undefined behavior, unless there's a good reason.

If you don't want to assign a defined value, consider not assigning
anything.

>> Your list of error checks isn't quite complete.  Here's my try:
>> 
>> /**
>>  * Parse unsigned integer
>>  *
>>  * @s: String to parse
>>  * @value: Destination for parsed integer value
>>  * @endptr: Destination for pointer to first character not consumed
>>  * @base: integer base, between 2 and 36 inclusive, or 0
>>  *
>>  * Parsed syntax is just like strol()'s: arbitrary whitespace, a single
>>  * optional '+' or '-', an optional "0x" if @base is 0 or 16, one or
>>  * more digits.
>
> The newer version has '-' as _not_ part of the syntax.
>
>>  *
>>  * If @s is null, or @base is invalid, or @s doesn't start with an
>>  * integer in the syntax above, set *@value to 0, *@endptr to @s, and
>>  * return -EINVAL.
>
> This matches the behavior of the newer version. But I would like to keep
> *value documented as undefined in case of errors.
>
>>  *
>>  * Set @endptr to point right beyond the parsed integer.
>>  *
>>  * If the integer is negative, set *@value to 0, and return -ERANGE.
>
> This isn't the case in the newer version.
>
>>  * If the integer overflows unsigned long long, set *@value to
>>  * ULLONG_MAX, and return -ERANGE.
>>  * Else, set *@value to the parsed integer, and return 0.
>>  */
>
> Thanks for taking the time to write this. I hate having to look up
> what's the right syntax to use in docstrings, so I often just write
> plain comments before functions.  :-)

I don't particularly like GLib-style doc-strings, but it's what we use
when we use anything, which is rarely.

> I have a minor problem with the documentation above: as a developer, the
> most important question I have is: what's the difference between using
> parse_uint() and using strtoull() directly? But maybe it is a good
> thing: instead of referencing the strtoull() specification (and an
> implementation detail), now we have a well-defined and well-documented
> behavior.

I understand why you're asking for the difference to stroull().  Trouble
is strtoull() is complex, and often misunderstood.

>> > + *
>> > + * The 'base' parameter has the same meaning of 'base' on strtoull().
>> > + *
>> > + * Returns 0 on success, negative errno value on error.
>> > + */
>> > +int parse_uint(const char *s, unsigned long long *value, char **endptr,
>> > +               int base)
>> > +{
>> > +    int r = 0;
>> > +    char *endp = (char *)s;
>> > +    unsigned long long val = 0;
>> > +
>> > +    if (!s) {
>> > +        r = -EINVAL;
>> > +        goto out;
>> > +    }
>> > +
>> > +    /* make sure we reject negative numbers: */
>> > +    while (isspace((unsigned char)*s)) {
>> > +        ++s; endp++;
>> 
>> Mixing pre- and post-increment like that is ugly.  Recommend to stick to
>> post-increment.
>
> Agreed. I blame the copy & paste I made from opts-visitor. Later I added
> the postfix increment without noticing how ugly it looked like
>
> Anyway, this was changed in the newer patch version.
>
>> 
>> I'd set endp after the loop.  Right before goto.
>
> Fixed in the newer version.
>
>> 
>> Or simply change the specification to set *endptr to point beyond the
>> integer instead of to the '-'.  I took the liberty to do exactly that in
>> my proposed function comment.
>
> The newer version was changed to return -EINVAL and set endptr to the
> beginning of the string.
>
>> 
>> > +    }
>> > +    if (*s == '-') {
>> > +        r = -ERANGE;
>> > +        goto out;
>> > +    }
>> 
>> This creates the special case that made me go "ugh" above.  Suggest to
>> drop the if here, and check right after strtoull() instead, as shown
>> below.  That way, we get the same endptr behavior for all out-of-range
>> numbers.
>
> Is returning -EINVAL acceptable for you, as well? In your proposal we
> consider "-1234" part of the language but out-of-range. Returning
> -EINVAL, we consider "-1234" not part of the language, and invalid
> input. The newer version returns -EINVAL.

I prefer -ERANGE on underflow, for symmetry with parsing *signed*
integers.  A similar parsing function for signed integers would surely
assign LLONG_MIN and return -ERANGE then.

A secondary, weak argument: caller can more easily distinguish the
failure modes:

* -EINVAL: @s invalid, @base invalid (both programming errors), or @s
  doesn't start with an integer.  I use "integer" in the mathematical
  sense here.

* -ERANGE, *value == 0: integer underflows *value

* -ERANGE, *value == ULLONG_MAX: integer overflows *value.

If we lump underflow into the -EINVAL case, you have to check *endptr ==
'-' to find it.

The argument is weak, because I don't have a caller ready that actually
wants to find it.

>> > +
>> > +    errno = 0;
>> > +    val = strtoull(s, &endp, base);
>> 
>>        if (*s = '-') {
>>            r = -ERANGE;
>>            val = 0;
>>            goto out;
>>        }
>
> This has another bug: makes endptr point to the middle of the string in
> case we are parsing "   xxx" or "   -1234".

It makes endptr point right behind the integer, doesn't it?

When parsing "   xxx", it points to the first 'x' (we skipped the spaces
already).

When parsing "   -1234", it points behind '4'.

>> > +    if (errno) {
>> > +        r = -errno;
>> > +        goto out;
>> > +    }
>> > +
>> > +    if (endp == s) {
>> > +        r = -EINVAL;
>> > +        goto out;
>> > +    }
>> > +
>> > +out:
>> > +    *value = val;
>> > +    *endptr = endp;
>> > +    return r;
>> > +}
>> > +
>> > +/* Try to parse an unsigned integer, making sure the whole string is parsed
>> > + *
>> > + * Have the same behavior of parse_uint(), but with an additional check
>> > + * for additional data after the parsed number (in that case, the function
>> > + * will return -EINVAL).
>> 
>> What's assigned to *value then?
>
> Undefined.  :-)
>
>
>> 
>> > + */
>> 
>> Alternatively, you could make into parse_uint() do that check when
>> endptr is null, and drop this function.  Matter of taste.
>
> I considered doing that, but I believe it would be too subtle.
>
>> 
>> > +int parse_uint_full(const char *s, unsigned long long *value, int base)
>> > +{
>> > +    char *endp;
>> > +    int r;
>> > +
>> > +    r = parse_uint(s, value, &endp, base);
>> > +    if (r < 0) {
>> > +        return r;
>> > +    }
>> > +    if (*endp) {
>> 
>> Answering my own question: the parsed integer is assigned to *value then.
>
> I prefer to document it as undefined. If you want to use the parsed
> integer even if there's additional data after it, you should use
> parse_int() instead.

Maybe.

What I want is a proper function contract.  That includes how *value is
changed.  *Especially* when it's an unspecified change.

>> > +        return -EINVAL;
>> > +    }
>> > +
>> > +    return 0;
>> > +}
>> > +
>> >  int qemu_parse_fd(const char *param)
>> >  {
>> >      int fd;
diff mbox

Patch

diff --git a/include/qemu-common.h b/include/qemu-common.h
index ca464bb..f134629 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -170,6 +170,10 @@  int qemu_fdatasync(int fd);
 int fcntl_setfl(int fd, int flag);
 int qemu_parse_fd(const char *param);
 
+int parse_uint(const char *s, unsigned long long *value, char **endptr,
+               int base);
+int parse_uint_full(const char *s, unsigned long long *value, int base);
+
 /*
  * strtosz() suffixes used to specify the default treatment of an
  * argument passed to strtosz() without an explicit suffix.
diff --git a/tests/Makefile b/tests/Makefile
index d97a571..2dba3e5 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -45,6 +45,8 @@  gcov-files-test-aio-$(CONFIG_WIN32) = aio-win32.c
 gcov-files-test-aio-$(CONFIG_POSIX) = aio-posix.c
 check-unit-y += tests/test-thread-pool$(EXESUF)
 gcov-files-test-thread-pool-y = thread-pool.c
+check-unit-y += tests/test-cutils$(EXESUF)
+gcov-files-test-cutils-y += util/cutils.c
 
 check-block-$(CONFIG_POSIX) += tests/qemu-iotests-quick.sh
 
@@ -84,6 +86,7 @@  tests/test-coroutine$(EXESUF): tests/test-coroutine.o $(block-obj-y) libqemuutil
 tests/test-aio$(EXESUF): tests/test-aio.o $(block-obj-y) libqemuutil.a libqemustub.a
 tests/test-thread-pool$(EXESUF): tests/test-thread-pool.o $(block-obj-y) libqemuutil.a libqemustub.a
 tests/test-iov$(EXESUF): tests/test-iov.o libqemuutil.a
+tests/test-cutils$(EXESUF): tests/test-cutils.o util/cutils.o
 
 tests/test-qapi-types.c tests/test-qapi-types.h :\
 $(SRC_PATH)/qapi-schema-test.json $(SRC_PATH)/scripts/qapi-types.py
diff --git a/tests/test-cutils.c b/tests/test-cutils.c
new file mode 100644
index 0000000..f4f6951
--- /dev/null
+++ b/tests/test-cutils.c
@@ -0,0 +1,216 @@ 
+/*
+ * cutils.c unit-tests
+ *
+ * Copyright (C) 2013 Red Hat Inc.
+ *
+ * Authors:
+ *  Eduardo Habkost <ehabkost@redhat.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include <glib.h>
+#include <errno.h>
+#include <string.h>
+
+#include "qemu-common.h"
+
+
+static void test_parse_uint_null(void)
+{
+    unsigned long long i = 999;
+    char f = 'X';
+    char *endptr = &f;
+    int r;
+
+    r = parse_uint(NULL, &i, &endptr, 0);
+
+    g_assert_cmpint(r, ==, -EINVAL);
+    g_assert_cmpint(i, ==, 0);
+    g_assert(endptr == NULL);
+}
+
+static void test_parse_uint_empty(void)
+{
+    unsigned long long i = 999;
+    char f = 'X';
+    char *endptr = &f;
+    const char *str = "";
+    int r;
+
+    r = parse_uint(str, &i, &endptr, 0);
+
+    g_assert_cmpint(r, ==, -EINVAL);
+    g_assert_cmpint(i, ==, 0);
+    g_assert(endptr == str);
+}
+
+static void test_parse_uint_trailing(void)
+{
+    unsigned long long i = 999;
+    char f = 'X';
+    char *endptr = &f;
+    const char *str = "123xxx";
+    int r;
+
+    r = parse_uint(str, &i, &endptr, 0);
+
+    g_assert_cmpint(r, ==, 0);
+    g_assert_cmpint(i, ==, 123);
+    g_assert(endptr == str + 3);
+}
+
+static void test_parse_uint_correct(void)
+{
+    unsigned long long i = 999;
+    char f = 'X';
+    char *endptr = &f;
+    const char *str = "123";
+    int r;
+
+    r = parse_uint(str, &i, &endptr, 0);
+
+    g_assert_cmpint(r, ==, 0);
+    g_assert_cmpint(i, ==, 123);
+    g_assert(endptr == str + strlen(str));
+}
+
+static void test_parse_uint_octal(void)
+{
+    unsigned long long i = 999;
+    char f = 'X';
+    char *endptr = &f;
+    const char *str = "0123";
+    int r;
+
+    r = parse_uint(str, &i, &endptr, 0);
+
+    g_assert_cmpint(r, ==, 0);
+    g_assert_cmpint(i, ==, 0123);
+    g_assert(endptr == str + strlen(str));
+}
+
+static void test_parse_uint_decimal(void)
+{
+    unsigned long long i = 999;
+    char f = 'X';
+    char *endptr = &f;
+    const char *str = "0123";
+    int r;
+
+    r = parse_uint(str, &i, &endptr, 10);
+
+    g_assert_cmpint(r, ==, 0);
+    g_assert_cmpint(i, ==, 123);
+    g_assert(endptr == str + strlen(str));
+}
+
+
+static void test_parse_uint_llong_max(void)
+{
+    unsigned long long i = 999;
+    char f = 'X';
+    char *endptr = &f;
+    char *str = g_strdup_printf("%llu", (unsigned long long)LLONG_MAX + 1);
+    int r;
+
+    r = parse_uint(str, &i, &endptr, 0);
+
+    g_assert_cmpint(r, ==, 0);
+    g_assert_cmpint(i, ==, (unsigned long long)LLONG_MAX + 1);
+    g_assert(endptr == str + strlen(str));
+
+    g_free(str);
+}
+
+static void test_parse_uint_overflow(void)
+{
+    unsigned long long i = 999;
+    char f = 'X';
+    char *endptr = &f;
+    const char *str = "99999999999999999999999999999999999999";
+    int r;
+
+    r = parse_uint(str, &i, &endptr, 0);
+
+    g_assert_cmpint(r, ==, -ERANGE);
+    g_assert_cmpint(i, ==, ULLONG_MAX);
+    g_assert(endptr == str + strlen(str));
+}
+
+static void test_parse_uint_negative(void)
+{
+    unsigned long long i = 999;
+    char f = 'X';
+    char *endptr = &f;
+    const char *str = " \t -321";
+    int r;
+
+    r = parse_uint(str, &i, &endptr, 0);
+
+    g_assert_cmpint(r, ==, -ERANGE);
+    g_assert_cmpint(i, ==, 0);
+    g_assert(endptr == str + 3);
+}
+
+
+static void test_parse_uint_full_trailing(void)
+{
+    unsigned long long i = 999;
+    const char *str = "123xxx";
+    int r;
+
+    r = parse_uint_full(str, &i, 0);
+
+    g_assert_cmpint(r, ==, -EINVAL);
+    g_assert_cmpint(i, ==, 123);
+}
+
+static void test_parse_uint_full_correct(void)
+{
+    unsigned long long i = 999;
+    const char *str = "123";
+    int r;
+
+    r = parse_uint_full(str, &i, 0);
+
+    g_assert_cmpint(r, ==, 0);
+    g_assert_cmpint(i, ==, 123);
+}
+
+int main(int argc, char **argv)
+{
+    g_test_init(&argc, &argv, NULL);
+
+    g_test_add_func("/cutils/parse_uint/null", test_parse_uint_null);
+    g_test_add_func("/cutils/parse_uint/empty", test_parse_uint_empty);
+    g_test_add_func("/cutils/parse_uint/trailing", test_parse_uint_trailing);
+    g_test_add_func("/cutils/parse_uint/correct", test_parse_uint_correct);
+    g_test_add_func("/cutils/parse_uint/octal", test_parse_uint_octal);
+    g_test_add_func("/cutils/parse_uint/decimal", test_parse_uint_decimal);
+    g_test_add_func("/cutils/parse_uint/llong_max", test_parse_uint_llong_max);
+    g_test_add_func("/cutils/parse_uint/overflow", test_parse_uint_overflow);
+    g_test_add_func("/cutils/parse_uint/negative", test_parse_uint_negative);
+    g_test_add_func("/cutils/parse_uint_full/trailing",
+                    test_parse_uint_full_trailing);
+    g_test_add_func("/cutils/parse_uint_full/correct",
+                    test_parse_uint_full_correct);
+
+    return g_test_run();
+}
diff --git a/util/cutils.c b/util/cutils.c
index 80bb1dc..7f09251 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -270,6 +270,85 @@  int64_t strtosz(const char *nptr, char **end)
     return strtosz_suffix(nptr, end, STRTOSZ_DEFSUFFIX_MB);
 }
 
+/* Try to parse an unsigned integer
+ *
+ * Error checks done by the function:
+ * - NULL pointer will return -EINVAL.
+ * - Empty strings will return -EINVAL.
+ * - Overflow errors or other errno values  set by strtoull() will
+ *   return -errno (-ERANGE in case of overflow).
+ * - Differently from strtoull(), values starting with a minus sign are
+ *   rejected (returning -ERANGE).
+ *
+ * Sets endptr to point to the first invalid character. Callers may rely
+ * on *value and *endptr to be always set by the function, even in case of
+ * errors.
+ *
+ * The 'base' parameter has the same meaning of 'base' on strtoull().
+ *
+ * Returns 0 on success, negative errno value on error.
+ */
+int parse_uint(const char *s, unsigned long long *value, char **endptr,
+               int base)
+{
+    int r = 0;
+    char *endp = (char *)s;
+    unsigned long long val = 0;
+
+    if (!s) {
+        r = -EINVAL;
+        goto out;
+    }
+
+    /* make sure we reject negative numbers: */
+    while (isspace((unsigned char)*s)) {
+        ++s; endp++;
+    }
+    if (*s == '-') {
+        r = -ERANGE;
+        goto out;
+    }
+
+    errno = 0;
+    val = strtoull(s, &endp, base);
+    if (errno) {
+        r = -errno;
+        goto out;
+    }
+
+    if (endp == s) {
+        r = -EINVAL;
+        goto out;
+    }
+
+out:
+    *value = val;
+    *endptr = endp;
+    return r;
+}
+
+/* Try to parse an unsigned integer, making sure the whole string is parsed
+ *
+ * Have the same behavior of parse_uint(), but with an additional check
+ * for additional data after the parsed number (in that case, the function
+ * will return -EINVAL).
+ */
+int parse_uint_full(const char *s, unsigned long long *value, int base)
+{
+    char *endp;
+    int r;
+
+    r = parse_uint(s, value, &endp, base);
+    if (r < 0) {
+        return r;
+    }
+    if (*endp) {
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
 int qemu_parse_fd(const char *param)
 {
     int fd;