diff mbox

[1/2] qemu-opts: introduce a function to compare option names

Message ID 1343233543-18561-2-git-send-email-aliguori@us.ibm.com
State New
Headers show

Commit Message

Anthony Liguori July 25, 2012, 4:25 p.m. UTC
We don't use the standard C functions for conversion because we don't want to
depend on the user's locale.  All option names in QEMU are en_US in plain ASCII.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
---
 qemu-option.c |   53 +++++++++++++++++++++++++++++++++++++++++++++--------
 qemu-option.h |    2 ++
 2 files changed, 47 insertions(+), 8 deletions(-)

Comments

Eric Blake July 25, 2012, 4:45 p.m. UTC | #1
On 07/25/2012 10:25 AM, Anthony Liguori wrote:
> We don't use the standard C functions for conversion because we don't want to
> depend on the user's locale.  All option names in QEMU are en_US in plain ASCII.

[Wondering if I should bring up the US 'canceled' vs. UK 'cancelled' as
a counterpoint to the claim of everything being en_US, but then again, I
didn't find any use of 'cancelled' as an option in qapi-schema.json.]

> 
> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
> ---
>  qemu-option.c |   53 +++++++++++++++++++++++++++++++++++++++++++++--------
>  qemu-option.h |    2 ++
>  2 files changed, 47 insertions(+), 8 deletions(-)
> 
> diff --git a/qemu-option.c b/qemu-option.c
> index 8334190..6494c99 100644
> --- a/qemu-option.c
> +++ b/qemu-option.c
> @@ -89,6 +89,43 @@ const char *get_opt_value(char *buf, int buf_size, const char *p)
>      return p;
>  }
>  
> +static int opt_tolower(int ch)
> +{
> +    if (ch >= 'A' && ch <= 'Z') {
> +        return 'a' + (ch - 'A');
> +    } else if (ch == '_') {
> +        return '-';

Slick - making case-insensitive comparison also fold '-' and '_' together :)

Reviewed-by: Eric Blake <eblake@redhat.com>
Eric Blake July 25, 2012, 4:46 p.m. UTC | #2
On 07/25/2012 10:45 AM, Eric Blake wrote:
> On 07/25/2012 10:25 AM, Anthony Liguori wrote:
>> We don't use the standard C functions for conversion because we don't want to
>> depend on the user's locale.  All option names in QEMU are en_US in plain ASCII.
> 
>>  
>> +static int opt_tolower(int ch)
>> +{
>> +    if (ch >= 'A' && ch <= 'Z') {
>> +        return 'a' + (ch - 'A');

P.S. This is not portable to EBCDIC, but I guess we don't care about
compilation of qemu on a non-ASCII machine, so my review still stands.

> Reviewed-by: Eric Blake <eblake@redhat.com>
>
Peter Maydell July 25, 2012, 4:53 p.m. UTC | #3
On 25 July 2012 17:25, Anthony Liguori <aliguori@us.ibm.com> wrote:
> We don't use the standard C functions for conversion because we don't want to
> depend on the user's locale.  All option names in QEMU are en_US in plain ASCII.
>
> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
> ---
>  qemu-option.c |   53 +++++++++++++++++++++++++++++++++++++++++++++--------
>  qemu-option.h |    2 ++
>  2 files changed, 47 insertions(+), 8 deletions(-)
>
> diff --git a/qemu-option.c b/qemu-option.c
> index 8334190..6494c99 100644
> --- a/qemu-option.c
> +++ b/qemu-option.c
> @@ -89,6 +89,43 @@ const char *get_opt_value(char *buf, int buf_size, const char *p)
>      return p;
>  }
>
> +static int opt_tolower(int ch)

This isn't actually doing a pure tolower() operation.
opt_canonicalize() is a bit long though, perhaps.
I guess it's static so not a big deal.

 +{
> +    if (ch >= 'A' && ch <= 'Z') {
> +        return 'a' + (ch - 'A');
> +    } else if (ch == '_') {
> +        return '-';
> +    }
> +
> +    return ch;
> +}
> +
> +int qemu_opt_name_cmp(const char *lhs, const char *rhs)
> +{
> +    int i;
> +
> +    g_assert(lhs && rhs);
> +
> +    for (i = 0; lhs[i] && rhs[i]; i++) {
> +        int ret;
> +
> +        ret = opt_tolower(lhs[i]) - opt_tolower(rhs[i]);

This is not in line with the return value that the C library
strcmp() would return. C99 7.21.4 says "The sign of a nonzero
value returned by the comparison functions memcmp, strcmp,
and strncmp is determined by the sign of the difference between
the values of the first pair of characters (both interpreted
as unsigned char) that differ in the objects being compared."
This code will use whatever the signed/unsignedess of plain
'char' is.

(None of the callers use the sign of the return value so this
is something of a nitpick.)

> +        if (ret < 0) {
> +            return -1;
> +        } else if (ret > 0) {
> +            return 1;
> +        }
> +    }
> +
> +    if (!lhs[i] && rhs[i]) {
> +        return -1;
> +    } else if (lhs[i] && !rhs[i]) {
> +        return 1;
> +    }

If you made the for() loop into a do..while so that we
execute the loop body for the 'final NUL' case you could
avoid having this pair of checks, I think (and you can
make the loop termination case just '!lhs[i]' since if
we get past the 'mismatch' exits we've definitely got
to the end of both strings and can return true).

> +
> +    return 0;
> +}

> --- a/qemu-option.h
> +++ b/qemu-option.h
> @@ -141,4 +141,6 @@ int qemu_opts_print(QemuOpts *opts, void *dummy);
>  int qemu_opts_foreach(QemuOptsList *list, qemu_opts_loopfunc func, void *opaque,
>                        int abort_on_failure);
>
> +int qemu_opt_name_cmp(const char *lhs, const char *rhs);

No documentation comment?

-- PMM
Anthony Liguori July 25, 2012, 5:33 p.m. UTC | #4
Peter Maydell <peter.maydell@linaro.org> writes:

> On 25 July 2012 17:25, Anthony Liguori <aliguori@us.ibm.com> wrote:
>> We don't use the standard C functions for conversion because we don't want to
>> depend on the user's locale.  All option names in QEMU are en_US in plain ASCII.
>>
>> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
>> ---
>>  qemu-option.c |   53 +++++++++++++++++++++++++++++++++++++++++++++--------
>>  qemu-option.h |    2 ++
>>  2 files changed, 47 insertions(+), 8 deletions(-)
>>
>> diff --git a/qemu-option.c b/qemu-option.c
>> index 8334190..6494c99 100644
>> --- a/qemu-option.c
>> +++ b/qemu-option.c
>> @@ -89,6 +89,43 @@ const char *get_opt_value(char *buf, int buf_size, const char *p)
>>      return p;
>>  }
>>
>> +static int opt_tolower(int ch)
>
> This isn't actually doing a pure tolower() operation.
> opt_canonicalize() is a bit long though, perhaps.
> I guess it's static so not a big deal.

Yeah.

>
>  +{
>> +    if (ch >= 'A' && ch <= 'Z') {
>> +        return 'a' + (ch - 'A');
>> +    } else if (ch == '_') {
>> +        return '-';
>> +    }
>> +
>> +    return ch;
>> +}
>> +
>> +int qemu_opt_name_cmp(const char *lhs, const char *rhs)
>> +{
>> +    int i;
>> +
>> +    g_assert(lhs && rhs);
>> +
>> +    for (i = 0; lhs[i] && rhs[i]; i++) {
>> +        int ret;
>> +
>> +        ret = opt_tolower(lhs[i]) - opt_tolower(rhs[i]);
>
> This is not in line with the return value that the C library
> strcmp() would return. C99 7.21.4 says "The sign of a nonzero
> value returned by the comparison functions memcmp, strcmp,
> and strncmp is determined by the sign of the difference between
> the values of the first pair of characters (both interpreted
> as unsigned char) that differ in the objects being compared."
> This code will use whatever the signed/unsignedess of plain
> 'char' is.
>
> (None of the callers use the sign of the return value so this
> is something of a nitpick.)

Sorry, how is this wrong?

This returns:

strcmp("a", "b") -> -1
qemu_opt_name_cmp("a", "b") -> -1

>> +        if (ret < 0) {
>> +            return -1;
>> +        } else if (ret > 0) {
>> +            return 1;
>> +        }
>> +    }
>> +
>> +    if (!lhs[i] && rhs[i]) {
>> +        return -1;
>> +    } else if (lhs[i] && !rhs[i]) {
>> +        return 1;
>> +    }
>
> If you made the for() loop into a do..while so that we
> execute the loop body for the 'final NUL' case you could
> avoid having this pair of checks, I think (and you can
> make the loop termination case just '!lhs[i]' since if
> we get past the 'mismatch' exits we've definitely got
> to the end of both strings and can return true).

I can poke around but not convinced it will result in better code.  I
must admit that I prefer explicit handling of edge cases like this.

>
>> +
>> +    return 0;
>> +}
>
>> --- a/qemu-option.h
>> +++ b/qemu-option.h
>> @@ -141,4 +141,6 @@ int qemu_opts_print(QemuOpts *opts, void *dummy);
>>  int qemu_opts_foreach(QemuOptsList *list, qemu_opts_loopfunc func, void *opaque,
>>                        int abort_on_failure);
>>
>> +int qemu_opt_name_cmp(const char *lhs, const char *rhs);
>
> No documentation comment?

Good point.

Regards,

Anthony Liguori

>
> -- PMM
Anthony Liguori July 25, 2012, 5:34 p.m. UTC | #5
Eric Blake <eblake@redhat.com> writes:

> On 07/25/2012 10:45 AM, Eric Blake wrote:
>> On 07/25/2012 10:25 AM, Anthony Liguori wrote:
>>> We don't use the standard C functions for conversion because we don't want to
>>> depend on the user's locale.  All option names in QEMU are en_US in plain ASCII.
>> 
>>>  
>>> +static int opt_tolower(int ch)
>>> +{
>>> +    if (ch >= 'A' && ch <= 'Z') {
>>> +        return 'a' + (ch - 'A');
>
> P.S. This is not portable to EBCDIC, but I guess we don't care about
> compilation of qemu on a non-ASCII machine, so my review still stands.

Fortunately, even on S390, Linux uses ASCII under normal circumstances
:-)

Regards,

Anthony Liguori

>
>> Reviewed-by: Eric Blake <eblake@redhat.com>
>> 
>
> -- 
> Eric Blake   eblake@redhat.com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
Peter Maydell July 25, 2012, 6:31 p.m. UTC | #6
On 25 July 2012 18:33, Anthony Liguori <aliguori@us.ibm.com> wrote:
> Peter Maydell <peter.maydell@linaro.org> writes:
>> This is not in line with the return value that the C library
>> strcmp() would return. C99 7.21.4 says "The sign of a nonzero
>> value returned by the comparison functions memcmp, strcmp,
>> and strncmp is determined by the sign of the difference between
>> the values of the first pair of characters (both interpreted
>> as unsigned char) that differ in the objects being compared."
>> This code will use whatever the signed/unsignedess of plain
>> 'char' is.
>>
>> (None of the callers use the sign of the return value so this
>> is something of a nitpick.)
>
> Sorry, how is this wrong?
>
> This returns:
>
> strcmp("a", "b") -> -1
> qemu_opt_name_cmp("a", "b") -> -1

 strcmp("\x90", "\x50") -> 64
 qemu_opt_name_cmp("\x90", "\x50") -> -1

(assuming a 'char is signed' architecture like x86, or use
"gcc -fsigned-char" if by some miracle you do your qemu
development on an ARM box :-))

>> If you made the for() loop into a do..while so that we
>> execute the loop body for the 'final NUL' case you could
>> avoid having this pair of checks, I think (and you can
>> make the loop termination case just '!lhs[i]' since if
>> we get past the 'mismatch' exits we've definitely got
>> to the end of both strings and can return true).
>
> I can poke around but not convinced it will result in better code.  I
> must admit that I prefer explicit handling of edge cases like this.

Typically these basic C library functions are arranged so
that the edge cases mostly fall out in the wash
(because the original implementation was done to be simple
to code rather than to provide particular semantics),
so if I see an implementation that does special case them
it makes me wonder if it's getting something wrong.

For instance the BSD naive strcmp is just four lines and
no special casing apart from "stop if we ran out of string":

    while (*s1 == *s2++)
        if (*s1++ == '\0')
            return (0);
    return (*(const unsigned char *)s1 - *(const unsigned char *)(s2 - 1));

-- PMM
Markus Armbruster July 27, 2012, 7:44 a.m. UTC | #7
Anthony Liguori <aliguori@us.ibm.com> writes:

> We don't use the standard C functions for conversion because we don't want to
> depend on the user's locale.  All option names in QEMU are en_US in plain ASCII.

Fails to explain the important part, namely the actual change to option
comparison!

> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
> ---
>  qemu-option.c |   53 +++++++++++++++++++++++++++++++++++++++++++++--------
>  qemu-option.h |    2 ++
>  2 files changed, 47 insertions(+), 8 deletions(-)
>
> diff --git a/qemu-option.c b/qemu-option.c
> index 8334190..6494c99 100644
> --- a/qemu-option.c
> +++ b/qemu-option.c
> @@ -89,6 +89,43 @@ const char *get_opt_value(char *buf, int buf_size, const char *p)
>      return p;
>  }
>  
> +static int opt_tolower(int ch)
> +{
> +    if (ch >= 'A' && ch <= 'Z') {
> +        return 'a' + (ch - 'A');
> +    } else if (ch == '_') {
> +        return '-';
> +    }
> +
> +    return ch;
> +}
> +
> +int qemu_opt_name_cmp(const char *lhs, const char *rhs)
> +{
> +    int i;
> +    
> +    g_assert(lhs && rhs);

Why bother?

> +
> +    for (i = 0; lhs[i] && rhs[i]; i++) {
> +        int ret;
> +
> +        ret = opt_tolower(lhs[i]) - opt_tolower(rhs[i]);
> +        if (ret < 0) {
> +            return -1;
> +        } else if (ret > 0) {
> +            return 1;
> +        }
> +    }
> +
> +    if (!lhs[i] && rhs[i]) {
> +        return -1;
> +    } else if (lhs[i] && !rhs[i]) {
> +        return 1;
> +    }
> +    
> +    return 0;
> +}
> +

Are you sure you want to make options case-insensitive?  If yes, please
mention it in the commit message.

The implementation is too longwinded for my taste :)

static unsigned char opt_canon_ch(char ch)
{
    if (ch >= 'A' && ch <= 'Z') {
        return 'a' + (ch - 'A');
    } else if (ch == '_') {
        return '-';
    }

    return ch;
}

int qemu_opt_name_cmp(const char *lhs, const char *rhs)
{
    unsigned char l, r;

    do {
        l = opt_canon_ch(*lhs++);
        r = opt_canon_ch(*rhs++);
    } while (l && l == r);

    return l - r;
}

Or maybe a bool qemu_opt_name_eq() instead.

>  int get_next_param_value(char *buf, int buf_size,
>                           const char *tag, const char **pstr)
>  {
> @@ -101,7 +138,7 @@ int get_next_param_value(char *buf, int buf_size,
>          if (*p != '=')
>              break;
>          p++;
> -        if (!strcmp(tag, option)) {
> +        if (!qemu_opt_name_cmp(tag, option)) {
>              *pstr = get_opt_value(buf, buf_size, p);
>              if (**pstr == ',') {
>                  (*pstr)++;

Aha, you cover the non-QemuOpts stuff, too.  Fine with me.

> @@ -137,7 +174,7 @@ int check_params(char *buf, int buf_size,
>          }
>          p++;
>          for (i = 0; params[i] != NULL; i++) {
> -            if (!strcmp(params[i], buf)) {
> +            if (!qemu_opt_name_cmp(params[i], buf)) {
>                  break;
>              }
>          }
> @@ -160,7 +197,7 @@ QEMUOptionParameter *get_option_parameter(QEMUOptionParameter *list,
>      const char *name)
>  {
>      while (list && list->name) {
> -        if (!strcmp(list->name, name)) {
> +        if (!qemu_opt_name_cmp(list->name, name)) {
>              return list;
>          }
>          list++;
> @@ -516,7 +553,7 @@ static QemuOpt *qemu_opt_find(QemuOpts *opts, const char *name)
>      QemuOpt *opt;
>  
>      QTAILQ_FOREACH_REVERSE(opt, &opts->head, QemuOptHead, next) {
> -        if (strcmp(opt->name, name) != 0)
> +        if (qemu_opt_name_cmp(opt->name, name) != 0)
>              continue;
>          return opt;
>      }
> @@ -599,7 +636,7 @@ static void opt_set(QemuOpts *opts, const char *name, const char *value,
>      int i;
>  
>      for (i = 0; desc[i].name != NULL; i++) {
> -        if (strcmp(desc[i].name, name) == 0) {
> +        if (qemu_opt_name_cmp(desc[i].name, name) == 0) {
>              break;
>          }
>      }
> @@ -660,7 +697,7 @@ int qemu_opt_set_bool(QemuOpts *opts, const char *name, bool val)
>      int i;
>  
>      for (i = 0; desc[i].name != NULL; i++) {
> -        if (strcmp(desc[i].name, name) == 0) {
> +        if (qemu_opt_name_cmp(desc[i].name, name) == 0) {
>              break;
>          }
>      }
> @@ -709,7 +746,7 @@ QemuOpts *qemu_opts_find(QemuOptsList *list, const char *id)
>              }
>              continue;
>          }
> -        if (strcmp(opts->id, id) != 0) {
> +        if (qemu_opt_name_cmp(opts->id, id) != 0) {
>              continue;
>          }
>          return opts;

There's a strcmp(option, "id") in opts_do_parse(), and a similar one in
qemu_opts_from_qdict_1().  Perhaps they should be compared with
qemu_opt_name_cmp(), too, just for consistency.

> @@ -1062,7 +1099,7 @@ void qemu_opts_validate(QemuOpts *opts, const QemuOptDesc *desc, Error **errp)
>          int i;
>  
>          for (i = 0; desc[i].name != NULL; i++) {
> -            if (strcmp(desc[i].name, opt->name) == 0) {
> +            if (qemu_opt_name_cmp(desc[i].name, opt->name) == 0) {
>                  break;
>              }
>          }
> diff --git a/qemu-option.h b/qemu-option.h
> index 951dec3..ee27a13 100644
> --- a/qemu-option.h
> +++ b/qemu-option.h
> @@ -141,4 +141,6 @@ int qemu_opts_print(QemuOpts *opts, void *dummy);
>  int qemu_opts_foreach(QemuOptsList *list, qemu_opts_loopfunc func, void *opaque,
>                        int abort_on_failure);
>  
> +int qemu_opt_name_cmp(const char *lhs, const char *rhs);
> +
>  #endif

Why external?
Peter Maydell July 27, 2012, 11:14 a.m. UTC | #8
On 27 July 2012 08:44, Markus Armbruster <armbru@redhat.com> wrote:
> The implementation is too longwinded for my taste :)
>
> static unsigned char opt_canon_ch(char ch)
> {
>     if (ch >= 'A' && ch <= 'Z') {
>         return 'a' + (ch - 'A');
>     } else if (ch == '_') {
>         return '-';
>     }
>
>     return ch;
> }
>
> int qemu_opt_name_cmp(const char *lhs, const char *rhs)
> {
>     unsigned char l, r;
>
>     do {
>         l = opt_canon_ch(*lhs++);
>         r = opt_canon_ch(*rhs++);
>     } while (l && l == r);
>
>     return l - r;
> }

Yes, I like the function naming and this implementation
is both shorter and more obviously correct.

-- PMM
diff mbox

Patch

diff --git a/qemu-option.c b/qemu-option.c
index 8334190..6494c99 100644
--- a/qemu-option.c
+++ b/qemu-option.c
@@ -89,6 +89,43 @@  const char *get_opt_value(char *buf, int buf_size, const char *p)
     return p;
 }
 
+static int opt_tolower(int ch)
+{
+    if (ch >= 'A' && ch <= 'Z') {
+        return 'a' + (ch - 'A');
+    } else if (ch == '_') {
+        return '-';
+    }
+
+    return ch;
+}
+
+int qemu_opt_name_cmp(const char *lhs, const char *rhs)
+{
+    int i;
+    
+    g_assert(lhs && rhs);
+
+    for (i = 0; lhs[i] && rhs[i]; i++) {
+        int ret;
+
+        ret = opt_tolower(lhs[i]) - opt_tolower(rhs[i]);
+        if (ret < 0) {
+            return -1;
+        } else if (ret > 0) {
+            return 1;
+        }
+    }
+
+    if (!lhs[i] && rhs[i]) {
+        return -1;
+    } else if (lhs[i] && !rhs[i]) {
+        return 1;
+    }
+    
+    return 0;
+}
+
 int get_next_param_value(char *buf, int buf_size,
                          const char *tag, const char **pstr)
 {
@@ -101,7 +138,7 @@  int get_next_param_value(char *buf, int buf_size,
         if (*p != '=')
             break;
         p++;
-        if (!strcmp(tag, option)) {
+        if (!qemu_opt_name_cmp(tag, option)) {
             *pstr = get_opt_value(buf, buf_size, p);
             if (**pstr == ',') {
                 (*pstr)++;
@@ -137,7 +174,7 @@  int check_params(char *buf, int buf_size,
         }
         p++;
         for (i = 0; params[i] != NULL; i++) {
-            if (!strcmp(params[i], buf)) {
+            if (!qemu_opt_name_cmp(params[i], buf)) {
                 break;
             }
         }
@@ -160,7 +197,7 @@  QEMUOptionParameter *get_option_parameter(QEMUOptionParameter *list,
     const char *name)
 {
     while (list && list->name) {
-        if (!strcmp(list->name, name)) {
+        if (!qemu_opt_name_cmp(list->name, name)) {
             return list;
         }
         list++;
@@ -516,7 +553,7 @@  static QemuOpt *qemu_opt_find(QemuOpts *opts, const char *name)
     QemuOpt *opt;
 
     QTAILQ_FOREACH_REVERSE(opt, &opts->head, QemuOptHead, next) {
-        if (strcmp(opt->name, name) != 0)
+        if (qemu_opt_name_cmp(opt->name, name) != 0)
             continue;
         return opt;
     }
@@ -599,7 +636,7 @@  static void opt_set(QemuOpts *opts, const char *name, const char *value,
     int i;
 
     for (i = 0; desc[i].name != NULL; i++) {
-        if (strcmp(desc[i].name, name) == 0) {
+        if (qemu_opt_name_cmp(desc[i].name, name) == 0) {
             break;
         }
     }
@@ -660,7 +697,7 @@  int qemu_opt_set_bool(QemuOpts *opts, const char *name, bool val)
     int i;
 
     for (i = 0; desc[i].name != NULL; i++) {
-        if (strcmp(desc[i].name, name) == 0) {
+        if (qemu_opt_name_cmp(desc[i].name, name) == 0) {
             break;
         }
     }
@@ -709,7 +746,7 @@  QemuOpts *qemu_opts_find(QemuOptsList *list, const char *id)
             }
             continue;
         }
-        if (strcmp(opts->id, id) != 0) {
+        if (qemu_opt_name_cmp(opts->id, id) != 0) {
             continue;
         }
         return opts;
@@ -1062,7 +1099,7 @@  void qemu_opts_validate(QemuOpts *opts, const QemuOptDesc *desc, Error **errp)
         int i;
 
         for (i = 0; desc[i].name != NULL; i++) {
-            if (strcmp(desc[i].name, opt->name) == 0) {
+            if (qemu_opt_name_cmp(desc[i].name, opt->name) == 0) {
                 break;
             }
         }
diff --git a/qemu-option.h b/qemu-option.h
index 951dec3..ee27a13 100644
--- a/qemu-option.h
+++ b/qemu-option.h
@@ -141,4 +141,6 @@  int qemu_opts_print(QemuOpts *opts, void *dummy);
 int qemu_opts_foreach(QemuOptsList *list, qemu_opts_loopfunc func, void *opaque,
                       int abort_on_failure);
 
+int qemu_opt_name_cmp(const char *lhs, const char *rhs);
+
 #endif