Patchwork qemu-sockets: Fix assertion failure

login
register
mail settings
Submitter Kevin Wolf
Date March 6, 2013, 10:48 a.m.
Message ID <1362566886-14073-1-git-send-email-kwolf@redhat.com>
Download mbox | patch
Permalink /patch/225474/
State New
Headers show

Comments

Kevin Wolf - March 6, 2013, 10:48 a.m.
inet_connect_opts() tries all possible addrinfos returned by
getaddrinfo(). If one fails with an error, the next one is tried. In
this case, the Error should be discarded because the whole operation is
successful if another addrinfo from the list succeeds; and if it
doesn't, setting an already set Error will trigger an assertion failure.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 util/qemu-sockets.c | 8 ++++++++
 1 file changed, 8 insertions(+)
Paolo Bonzini - March 6, 2013, 11:04 a.m.
Il 06/03/2013 11:48, Kevin Wolf ha scritto:
> inet_connect_opts() tries all possible addrinfos returned by
> getaddrinfo(). If one fails with an error, the next one is tried. In
> this case, the Error should be discarded because the whole operation is
> successful if another addrinfo from the list succeeds; and if it
> doesn't, setting an already set Error will trigger an assertion failure.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  util/qemu-sockets.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
> index 1350ccc..32e609a 100644
> --- a/util/qemu-sockets.c
> +++ b/util/qemu-sockets.c
> @@ -373,6 +373,14 @@ int inet_connect_opts(QemuOpts *opts, Error **errp,
>      }
>  
>      for (e = res; e != NULL; e = e->ai_next) {
> +
> +        /* Overwriting errors isn't allowed, so clear any error that may have
> +         * occured in the previous iteration */
> +        if (error_is_set(errp)) {
> +            error_free(*errp);
> +            *errp = NULL;
> +        }
> +
>          if (connect_state != NULL) {
>              connect_state->current_addr = e;
>          }
> 

Should we also do nothing if errp is not NULL on entry?

Paolo
Kevin Wolf - March 6, 2013, 11:11 a.m.
Am 06.03.2013 um 12:04 hat Paolo Bonzini geschrieben:
> Il 06/03/2013 11:48, Kevin Wolf ha scritto:
> > inet_connect_opts() tries all possible addrinfos returned by
> > getaddrinfo(). If one fails with an error, the next one is tried. In
> > this case, the Error should be discarded because the whole operation is
> > successful if another addrinfo from the list succeeds; and if it
> > doesn't, setting an already set Error will trigger an assertion failure.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  util/qemu-sockets.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
> > index 1350ccc..32e609a 100644
> > --- a/util/qemu-sockets.c
> > +++ b/util/qemu-sockets.c
> > @@ -373,6 +373,14 @@ int inet_connect_opts(QemuOpts *opts, Error **errp,
> >      }
> >  
> >      for (e = res; e != NULL; e = e->ai_next) {
> > +
> > +        /* Overwriting errors isn't allowed, so clear any error that may have
> > +         * occured in the previous iteration */
> > +        if (error_is_set(errp)) {
> > +            error_free(*errp);
> > +            *errp = NULL;
> > +        }
> > +
> >          if (connect_state != NULL) {
> >              connect_state->current_addr = e;
> >          }
> > 
> 
> Should we also do nothing if errp is not NULL on entry?

We could assert(!error_is_set(errp)) if we wanted. As soon as you've got
an Error, you must return instead of calling more functions with the
same error pointer.

Kevin
Laszlo Ersek - March 6, 2013, 2:46 p.m.
On 03/06/13 12:11, Kevin Wolf wrote:
> Am 06.03.2013 um 12:04 hat Paolo Bonzini geschrieben:
>> Il 06/03/2013 11:48, Kevin Wolf ha scritto:
>>> inet_connect_opts() tries all possible addrinfos returned by
>>> getaddrinfo(). If one fails with an error, the next one is tried. In
>>> this case, the Error should be discarded because the whole operation is
>>> successful if another addrinfo from the list succeeds; and if it
>>> doesn't, setting an already set Error will trigger an assertion failure.
>>>
>>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>>> ---
>>>  util/qemu-sockets.c | 8 ++++++++
>>>  1 file changed, 8 insertions(+)
>>>
>>> diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
>>> index 1350ccc..32e609a 100644
>>> --- a/util/qemu-sockets.c
>>> +++ b/util/qemu-sockets.c
>>> @@ -373,6 +373,14 @@ int inet_connect_opts(QemuOpts *opts, Error **errp,
>>>      }
>>>  
>>>      for (e = res; e != NULL; e = e->ai_next) {
>>> +
>>> +        /* Overwriting errors isn't allowed, so clear any error that may have
>>> +         * occured in the previous iteration */
>>> +        if (error_is_set(errp)) {
>>> +            error_free(*errp);
>>> +            *errp = NULL;
>>> +        }
>>> +
>>>          if (connect_state != NULL) {
>>>              connect_state->current_addr = e;
>>>          }
>>>
>>
>> Should we also do nothing if errp is not NULL on entry?
> 
> We could assert(!error_is_set(errp)) if we wanted. As soon as you've got
> an Error, you must return instead of calling more functions with the
> same error pointer.

I think Luiz would suggest (*) to receive any error into a
NULL-initialized local_err pointer; do the logic above on local_err, and
just before returning, error_propagate() it to errp.

(*) I hope you can see what I did there: if you disagree, you get to
take that to Luiz, even though he didn't say anything. I'm getting
better at working this list! :)

Laszlo
Paolo Bonzini - March 6, 2013, 3:04 p.m.
Il 06/03/2013 15:46, Laszlo Ersek ha scritto:
>> > We could assert(!error_is_set(errp)) if we wanted. As soon as you've got
>> > an Error, you must return instead of calling more functions with the
>> > same error pointer.
> I think Luiz would suggest (*) to receive any error into a
> NULL-initialized local_err pointer; do the logic above on local_err, and
> just before returning, error_propagate() it to errp.
> 
> (*) I hope you can see what I did there: if you disagree, you get to
> take that to Luiz, even though he didn't say anything. I'm getting
> better at working this list! :)

I agree with Laszlo.

Paolo
Markus Armbruster - March 6, 2013, 3:05 p.m.
[Note cc: Luiz]

Kevin Wolf <kwolf@redhat.com> writes:

> inet_connect_opts() tries all possible addrinfos returned by
> getaddrinfo(). If one fails with an error, the next one is tried. In
> this case, the Error should be discarded because the whole operation is
> successful if another addrinfo from the list succeeds; and if it
> doesn't, setting an already set Error will trigger an assertion failure.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  util/qemu-sockets.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
> index 1350ccc..32e609a 100644
> --- a/util/qemu-sockets.c
> +++ b/util/qemu-sockets.c
> @@ -373,6 +373,14 @@ int inet_connect_opts(QemuOpts *opts, Error **errp,
>      }
>  
>      for (e = res; e != NULL; e = e->ai_next) {
> +
> +        /* Overwriting errors isn't allowed, so clear any error that may have
> +         * occured in the previous iteration */
> +        if (error_is_set(errp)) {
> +            error_free(*errp);
> +            *errp = NULL;
> +        }
> +
>          if (connect_state != NULL) {
>              connect_state->current_addr = e;
>          }

I'm not sure this is the proper solution, but that could be just my
uncertainty on proper Error usage.

The convention as I understand it is that a function stores whatever
error it wants to return through its errp parameter, with
error_set(errp, ...).  error_set() does nothing when passed a null errp.
This lets callers ignore errors without fuss.

I guess we don't want callers to pass a non-null errp with *errp != NULL
ever, but I don't think that's spelled out anywhere.  I suspect code is
generally unprepared for such usage.

Your patch *clears* errors through its errp parameter.  I'm not saying
it's wrong, I just spooks poor ignorant me.

My gut feeling: as soon as we go beyond utterly trivial with errors,
it's best to put them in local variables, and error_propagate() to the
parameter only at the end.

Let's review some usage patterns, in the hope of learning more.

    /*
     * Do something on @arg.
     * On error, set *@errp if @errp isn't null.
     */
    void do_something(Argument *arg, Error **errp);

    // ignoring errors:
    do_something(arg, NULL);

    // checking and handling errors locally:
    Error *local_err = NULL;
    do_something(arg, &local_err);
    if (local_err) {    // some prefer error_is_set(local_err) here,
                        // but I think that only muddies the waters
        // error handling goes here
        error_free(local_err);
    }

    // propagating to caller via parameter Error *errp:
    Error *local_err = NULL;
    do_something(arg, &local_err);
    if (local_err) {
        // local error handling goes here
        error_propagate(errp, local_err);
    }

    // same, but "simpler" (except it doesn't work):
    do_something(arg, errp);
    if (error_is_set(errp)) {   // BUG: doesn't work when !errp
        // local error handling goes here
    }

    // same, when no local error handling is wanted:
    do_something(arg, errp);

    // but it works only once:
    do_something(arg1, errp);
    do_something(arg2, errp);  // BUG: trips assert() when both fail
    // pity

Let me stress: beware of error_is_set()!  If the argument may or may not
be null, the result is *worthless*.  If it can't be null, you could just
as well test the argument directly.  If it must be null, the result is
always false.

More patterns:
    
    // accumulating errors, first error wins
    Error *err = NULL;
    while (...) {
        Error *local_err = NULL:
        do_something(arg, &local_err);
        if (local_err) {
            // local error handling goes here
            error_propagate(&err, local_err);
        }
    }
    // err contains the first error
    // need to free or propagate it

    // same, but "simpler" (except it doesn't work)
    Error *err = NULL;
    while (...) {
        do_something(arg, &err);// BUG: trips assert() on second error
        if (err) {              // BUG: always true after first error
            // local error handling goes here
        }
    }

    // can omit err when propagating, just replace it by errp:
    // (works because error_propagate() is designed for this)
    while (...) {
        Error *local_err = NULL:
        do_something(arg, &local_err);
        if (local_err) {
            // local error handling goes here
            error_propagate(errp, local_err);
        }
    }
    // errp contains the first error

    // accumulating errors, last error wins
    Error *err = NULL;
    while (...) {
        Error *local_err = NULL:
        do_something(arg, &local_err);
        if (local_err) {
            // local error handling goes here
            // if we do the following often, we should perhaps provide a
            // function for it, just like error_propagate(), only "last
            // one wins" instead of "first one wins"
            if (err) {
                error_free(err);
            }
            err = local_err;
        }
    }
    // err contains the last error
    // need to free or propagate it

    // can omit err when propagating, but need to be careful, because
    // errp may be null:
    while (...) {
        Error *local_err = NULL:
        do_something(arg, &local_err);
        if (local_err) {
            // local error handling goes here
            if (error_is_set(errp)) {
                // exploits that error_is_set(errp) implies errp != NULL
                error_free(*errp);
                *errp = NULL;
            }
            error_propagate(errp, local_err);
        }
    }

Just found a tolerable use of error_is_set().  I'd prefer a function
error_clear(errp) to clear through a possibly null errp.

Still more patterns:

    // try a number of args, until one works
    Error *local_err = NULL;
    for (arg = first_one(); arg; arg = next_one()) {
        if (local_err) {
            error_free(local_err);
            local_err = NULL;
        }
        do_something(arg, &local_err);
        if (!local_err) {
            break;              // arg worked
        }
        // local error handling goes here
    }
    if (local_err) {
        // local error handling goes here
        // need to free or propagate local_err
    } else {
        // arg worked
    }

We can't omit local_err when propagating, because with just errp, we
can't test whether do_something() succeeded.

With a function that return a value that can be tested:

    /*
     * Get @arg's foo.
     * On success, return foo.
     * On failure, return null, and set *@errp if @errp isn't null.
     */
    Foo *get_foo(Argument *arg, Error *errp)

we can omit local_err:

    // try a number of args, until one works
    for (arg = first_one(); arg; arg = next_one()) {
        if (error_is_set(errp)) {
            // exploits that error_is_set(errp) implies errp != NULL
            error_free(*errp);
            *errp = NULL;
        }
        foo = get_foo(arg, errp);
        if (foo) {
            break;              // arg worked
        }
        // local error handling goes here
    }
    if (foo) {
        // arg worked
    } else {
        // local error handling goes here
        // need to free or propagate local_err
    }

This is roughly how your patch works.

Functions like get_foo() spook me, because when I do

    local_err = NULL;
    foo = get_foo(arg, &local_err);
    if (!foo) {
        // fail
        return;
    }
    // success
    // use foo (surely not null)

I worry about local_err being null at // fail, or non-null (and leaked)
at // success.  When I do

    local_err = NULL;
    foo = get_foo(arg, &local_err);
    if (local_err) {
        // fail
        return;
    }
    // success
    // use foo

I worry about foo being null at // success, or non-null (and leaked) at
// fail.

Perhaps I'm too easily worried.
Markus Armbruster - March 6, 2013, 3:05 p.m.
Kevin Wolf <kwolf@redhat.com> writes:

> Am 06.03.2013 um 12:04 hat Paolo Bonzini geschrieben:
>> Il 06/03/2013 11:48, Kevin Wolf ha scritto:
>> > inet_connect_opts() tries all possible addrinfos returned by
>> > getaddrinfo(). If one fails with an error, the next one is tried. In
>> > this case, the Error should be discarded because the whole operation is
>> > successful if another addrinfo from the list succeeds; and if it
>> > doesn't, setting an already set Error will trigger an assertion failure.
>> > 
>> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>> > ---
>> >  util/qemu-sockets.c | 8 ++++++++
>> >  1 file changed, 8 insertions(+)
>> > 
>> > diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
>> > index 1350ccc..32e609a 100644
>> > --- a/util/qemu-sockets.c
>> > +++ b/util/qemu-sockets.c
>> > @@ -373,6 +373,14 @@ int inet_connect_opts(QemuOpts *opts, Error **errp,
>> >      }
>> >  
>> >      for (e = res; e != NULL; e = e->ai_next) {
>> > +
>> > +        /* Overwriting errors isn't allowed, so clear any error that may have
>> > +         * occured in the previous iteration */
>> > +        if (error_is_set(errp)) {
>> > +            error_free(*errp);
>> > +            *errp = NULL;
>> > +        }
>> > +
>> >          if (connect_state != NULL) {
>> >              connect_state->current_addr = e;
>> >          }
>> > 
>> 
>> Should we also do nothing if errp is not NULL on entry?
>
> We could assert(!error_is_set(errp)) if we wanted. As soon as you've got
> an Error, you must return instead of calling more functions with the
> same error pointer.

I wouldn't bother asserting this just here.  It's a pervasive issue.  We
already got an assert() that covers actual error overwrites, in
error_set().
Kevin Wolf - March 6, 2013, 3:19 p.m.
Am 06.03.2013 um 16:04 hat Paolo Bonzini geschrieben:
> Il 06/03/2013 15:46, Laszlo Ersek ha scritto:
> >> > We could assert(!error_is_set(errp)) if we wanted. As soon as you've got
> >> > an Error, you must return instead of calling more functions with the
> >> > same error pointer.
> > I think Luiz would suggest (*) to receive any error into a
> > NULL-initialized local_err pointer; do the logic above on local_err, and
> > just before returning, error_propagate() it to errp.
> > 
> > (*) I hope you can see what I did there: if you disagree, you get to
> > take that to Luiz, even though he didn't say anything. I'm getting
> > better at working this list! :)
> 
> I agree with Laszlo.

I don't really understand the difference. As long as the function
doesn't depend on the Error object to be present (which it doesn't),
isn't it semantically exactly the same?

Also, Markus' reply makes me think that I should restrict myself to code
areas where errors are reported as -errno. That one I understand at
least...

Kevin
Laszlo Ersek - March 6, 2013, 3:38 p.m.
On 03/06/13 16:19, Kevin Wolf wrote:
> Am 06.03.2013 um 16:04 hat Paolo Bonzini geschrieben:
>> Il 06/03/2013 15:46, Laszlo Ersek ha scritto:
>>>>> We could assert(!error_is_set(errp)) if we wanted. As soon as you've got
>>>>> an Error, you must return instead of calling more functions with the
>>>>> same error pointer.
>>> I think Luiz would suggest (*) to receive any error into a
>>> NULL-initialized local_err pointer; do the logic above on local_err, and
>>> just before returning, error_propagate() it to errp.
>>>
>>> (*) I hope you can see what I did there: if you disagree, you get to
>>> take that to Luiz, even though he didn't say anything. I'm getting
>>> better at working this list! :)
>>
>> I agree with Laszlo.
> 
> I don't really understand the difference. As long as the function
> doesn't depend on the Error object to be present (which it doesn't),
> isn't it semantically exactly the same?

The difference is when the caller passes in an already set Error. In
this case you release that and replace it with your own error.

Usually we stick to the first error encountered. Under the above
suggestion you'd keep error handling internal to yourself, and in the
end make one attempt to propagate it outwards. If the caller has passed
in NULL, the error is dropped. If the caller's passed in a preexistent
error, then that one takes precedence and the new one is dropped (but it
doesn't interfere with the internal logic). Third, the caller can even
accept your error.

error_propagate() and error_set() deal with the overwrite attempt
differently. The former silently drops the newcomer, whereas the latter
assert()s.

Of course one wonders why a caller would pass in a preexistent Error.

Laszlo
Kevin Wolf - March 6, 2013, 3:47 p.m.
Am 06.03.2013 um 16:38 hat Laszlo Ersek geschrieben:
> On 03/06/13 16:19, Kevin Wolf wrote:
> > Am 06.03.2013 um 16:04 hat Paolo Bonzini geschrieben:
> >> Il 06/03/2013 15:46, Laszlo Ersek ha scritto:
> >>>>> We could assert(!error_is_set(errp)) if we wanted. As soon as you've got
> >>>>> an Error, you must return instead of calling more functions with the
> >>>>> same error pointer.
> >>> I think Luiz would suggest (*) to receive any error into a
> >>> NULL-initialized local_err pointer; do the logic above on local_err, and
> >>> just before returning, error_propagate() it to errp.
> >>>
> >>> (*) I hope you can see what I did there: if you disagree, you get to
> >>> take that to Luiz, even though he didn't say anything. I'm getting
> >>> better at working this list! :)
> >>
> >> I agree with Laszlo.
> > 
> > I don't really understand the difference. As long as the function
> > doesn't depend on the Error object to be present (which it doesn't),
> > isn't it semantically exactly the same?
> 
> The difference is when the caller passes in an already set Error. In
> this case you release that and replace it with your own error.
> 
> Usually we stick to the first error encountered. Under the above
> suggestion you'd keep error handling internal to yourself, and in the
> end make one attempt to propagate it outwards. If the caller has passed
> in NULL, the error is dropped. If the caller's passed in a preexistent
> error, then that one takes precedence and the new one is dropped (but it
> doesn't interfere with the internal logic). Third, the caller can even
> accept your error.
> 
> error_propagate() and error_set() deal with the overwrite attempt
> differently. The former silently drops the newcomer, whereas the latter
> assert()s.
> 
> Of course one wonders why a caller would pass in a preexistent Error.

Thanks, Laszlo, now I think I understand what Paolo and you were
suggesting.

However, I'd call any such caller buggy and don't feel like adding code
so that it doesn't break. This is what I meant when I said you should
return when you get an error, and not call other functions with the
already used error pointer.

Kevin
Markus Armbruster - March 6, 2013, 3:59 p.m.
Kevin Wolf <kwolf@redhat.com> writes:

> Am 06.03.2013 um 16:04 hat Paolo Bonzini geschrieben:
>> Il 06/03/2013 15:46, Laszlo Ersek ha scritto:
>> >> > We could assert(!error_is_set(errp)) if we wanted. As soon as you've got
>> >> > an Error, you must return instead of calling more functions with the
>> >> > same error pointer.
>> > I think Luiz would suggest (*) to receive any error into a
>> > NULL-initialized local_err pointer; do the logic above on local_err, and
>> > just before returning, error_propagate() it to errp.
>> > 
>> > (*) I hope you can see what I did there: if you disagree, you get to
>> > take that to Luiz, even though he didn't say anything. I'm getting
>> > better at working this list! :)
>> 
>> I agree with Laszlo.
>
> I don't really understand the difference. As long as the function
> doesn't depend on the Error object to be present (which it doesn't),
> isn't it semantically exactly the same?

I guess it is in this case (I didn't call your patch wrong).  However, I
figure that as soon as we go beyond utterly trivial with errors, it's
advisable to put them in local variables, and error_propagate() to the
parameter only at the end.  Messing around with errp parameters directly
feels error-prone, and I'm afraid even correct messing could easily lead
to incorrect messing, when folks imitate it without really understanding
what they're doing.

Yes, extra local variables make the error propagation code even bulkier.
No, I don't like it any more than you do.

> Also, Markus' reply makes me think that I should restrict myself to code
> areas where errors are reported as -errno. That one I understand at
> least...

I wish I could, but I stepped into the error swamp long ago, and the
muck is still sticking to my heels.

Whenever a simple error reporting mechanism such as non-null/null or
non-negative/-errno suffices, I very much recommend to use it.
Laszlo Ersek - March 6, 2013, 4:04 p.m.
On 03/06/13 16:47, Kevin Wolf wrote:
> Am 06.03.2013 um 16:38 hat Laszlo Ersek geschrieben:

>> Of course one wonders why a caller would pass in a preexistent Error.
> 
> Thanks, Laszlo, now I think I understand what Paolo and you were
> suggesting.
> 
> However, I'd call any such caller buggy and don't feel like adding code
> so that it doesn't break. This is what I meant when I said you should
> return when you get an error, and not call other functions with the
> already used error pointer.

I don't disagree. I'm certainly not blocking your patch! :)

Laszlo
Paolo Bonzini - March 6, 2013, 4:43 p.m.
Il 06/03/2013 16:59, Markus Armbruster ha scritto:
>> >
>> > I don't really understand the difference. As long as the function
>> > doesn't depend on the Error object to be present (which it doesn't),
>> > isn't it semantically exactly the same?
> I guess it is in this case (I didn't call your patch wrong).  However, I
> figure that as soon as we go beyond utterly trivial with errors, it's
> advisable to put them in local variables, and error_propagate() to the
> parameter only at the end.  Messing around with errp parameters directly
> feels error-prone, and I'm afraid even correct messing could easily lead
> to incorrect messing, when folks imitate it without really understanding
> what they're doing.

Yes, that's my thought more or less.  However, the possibilities are:

1) asserting that there is no error.  We hardly ever do this, and it's
not clear to me that we should start.  It would introduce one more
error-propagation idiom, and more confusion.

2) do nothing if there is an error.  This is what for example the QAPI
visitors do.

3) what Laszlo suggested.  However, it is almost certainly a bug to call
this function with a pre-existing error, because this function has quite
"strong" side effects.  Hence this might mask a bug and leave pending
connections.

4) just your patch.  This is just as wrong as (3) if there was a
pre-existing error, and it also overwrites the pre-existing error;
harder to debug, and inconsistent with functions using the local variables.


None is optimal, but (2) seems the best.

Paolo

> Yes, extra local variables make the error propagation code even bulkier.
> No, I don't like it any more than you do.
Luiz Capitulino - March 19, 2013, 8:34 p.m.
On Wed, 06 Mar 2013 15:46:45 +0100
Laszlo Ersek <lersek@redhat.com> wrote:

> On 03/06/13 12:11, Kevin Wolf wrote:
> > Am 06.03.2013 um 12:04 hat Paolo Bonzini geschrieben:
> >> Il 06/03/2013 11:48, Kevin Wolf ha scritto:
> >>> inet_connect_opts() tries all possible addrinfos returned by
> >>> getaddrinfo(). If one fails with an error, the next one is tried. In
> >>> this case, the Error should be discarded because the whole operation is
> >>> successful if another addrinfo from the list succeeds; and if it
> >>> doesn't, setting an already set Error will trigger an assertion failure.
> >>>
> >>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> >>> ---
> >>>  util/qemu-sockets.c | 8 ++++++++
> >>>  1 file changed, 8 insertions(+)
> >>>
> >>> diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
> >>> index 1350ccc..32e609a 100644
> >>> --- a/util/qemu-sockets.c
> >>> +++ b/util/qemu-sockets.c
> >>> @@ -373,6 +373,14 @@ int inet_connect_opts(QemuOpts *opts, Error **errp,
> >>>      }
> >>>  
> >>>      for (e = res; e != NULL; e = e->ai_next) {
> >>> +
> >>> +        /* Overwriting errors isn't allowed, so clear any error that may have
> >>> +         * occured in the previous iteration */
> >>> +        if (error_is_set(errp)) {
> >>> +            error_free(*errp);
> >>> +            *errp = NULL;
> >>> +        }
> >>> +
> >>>          if (connect_state != NULL) {
> >>>              connect_state->current_addr = e;
> >>>          }
> >>>
> >>
> >> Should we also do nothing if errp is not NULL on entry?
> > 
> > We could assert(!error_is_set(errp)) if we wanted. As soon as you've got
> > an Error, you must return instead of calling more functions with the
> > same error pointer.
> 
> I think Luiz would suggest (*) to receive any error into a
> NULL-initialized local_err pointer; do the logic above on local_err, and
> just before returning, error_propagate() it to errp.

Yes, I'd suggest that but it turns out that inet_connect_addr() error
reporting was and still is confusing, which causes callers to use it
incorrectly.

This patch (which has been applied by Anthony) solves the problem at
hand but it also introduces a new issue: errors from inet_connect_addr()
are only reported if they happen in the last loop interaction. Note that
a few other errors other than 'couldn't connect' can happen.

Laszlo's comment seemed to have triggered a discussion around Error **,
but this really has very little to do with it: the real problem is that
inet_connect_addr() is too confusing.

inet_connect_addr() has two users: inet_connect_opts() and wait_for_connect(),
with this patch both of them are now ignoring errors from inet_connect_addr().

Suggested solution: refactor inet_connect_addr() to return an errno value.
Callers use error_set() when they want to report an error upward.
Kevin Wolf - March 20, 2013, 8:39 a.m.
Am 19.03.2013 um 21:34 hat Luiz Capitulino geschrieben:
> On Wed, 06 Mar 2013 15:46:45 +0100
> Laszlo Ersek <lersek@redhat.com> wrote:
> 
> > On 03/06/13 12:11, Kevin Wolf wrote:
> > > Am 06.03.2013 um 12:04 hat Paolo Bonzini geschrieben:
> > >> Il 06/03/2013 11:48, Kevin Wolf ha scritto:
> > >>> inet_connect_opts() tries all possible addrinfos returned by
> > >>> getaddrinfo(). If one fails with an error, the next one is tried. In
> > >>> this case, the Error should be discarded because the whole operation is
> > >>> successful if another addrinfo from the list succeeds; and if it
> > >>> doesn't, setting an already set Error will trigger an assertion failure.
> > >>>
> > >>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > >>> ---
> > >>>  util/qemu-sockets.c | 8 ++++++++
> > >>>  1 file changed, 8 insertions(+)
> > >>>
> > >>> diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
> > >>> index 1350ccc..32e609a 100644
> > >>> --- a/util/qemu-sockets.c
> > >>> +++ b/util/qemu-sockets.c
> > >>> @@ -373,6 +373,14 @@ int inet_connect_opts(QemuOpts *opts, Error **errp,
> > >>>      }
> > >>>  
> > >>>      for (e = res; e != NULL; e = e->ai_next) {
> > >>> +
> > >>> +        /* Overwriting errors isn't allowed, so clear any error that may have
> > >>> +         * occured in the previous iteration */
> > >>> +        if (error_is_set(errp)) {
> > >>> +            error_free(*errp);
> > >>> +            *errp = NULL;
> > >>> +        }
> > >>> +
> > >>>          if (connect_state != NULL) {
> > >>>              connect_state->current_addr = e;
> > >>>          }
> > >>>
> > >>
> > >> Should we also do nothing if errp is not NULL on entry?
> > > 
> > > We could assert(!error_is_set(errp)) if we wanted. As soon as you've got
> > > an Error, you must return instead of calling more functions with the
> > > same error pointer.
> > 
> > I think Luiz would suggest (*) to receive any error into a
> > NULL-initialized local_err pointer; do the logic above on local_err, and
> > just before returning, error_propagate() it to errp.
> 
> Yes, I'd suggest that but it turns out that inet_connect_addr() error
> reporting was and still is confusing, which causes callers to use it
> incorrectly.
> 
> This patch (which has been applied by Anthony)

No, Anthony applied a different, but similar patch of his own. This is
why I don't feel particularly responsible for the specific problem any
more.

How to do error handling with Error right is the only reason for me to
continue the discussion.

> solves the problem at
> hand but it also introduces a new issue: errors from inet_connect_addr()
> are only reported if they happen in the last loop interaction. Note that
> a few other errors other than 'couldn't connect' can happen.

> Laszlo's comment seemed to have triggered a discussion around Error **,
> but this really has very little to do with it: the real problem is that
> inet_connect_addr() is too confusing.

Maybe we need to discuss first what the intended behaviour even is. My
interpretation was this: We may have several addresses to try. If one of
them works, the function as a whole has succeeded and must not return an
error, neither in errp nor as -errno. If none of them succeeds, the
function has to return an error, and returning the error of the last
attempt is as good as the error of any other attempt.

> inet_connect_addr() has two users: inet_connect_opts() and wait_for_connect(),
> with this patch both of them are now ignoring errors from inet_connect_addr().
> 
> Suggested solution: refactor inet_connect_addr() to return an errno value.
> Callers use error_set() when they want to report an error upward.

Doesn't change the problem that you need to know when to set a return
value != 0. So it doesn't help, but you'd lose some error information.

Kevin
Luiz Capitulino - March 20, 2013, 12:57 p.m.
On Wed, 20 Mar 2013 09:39:34 +0100
Kevin Wolf <kwolf@redhat.com> wrote:

> Am 19.03.2013 um 21:34 hat Luiz Capitulino geschrieben:
> > On Wed, 06 Mar 2013 15:46:45 +0100
> > Laszlo Ersek <lersek@redhat.com> wrote:
> > 
> > > On 03/06/13 12:11, Kevin Wolf wrote:
> > > > Am 06.03.2013 um 12:04 hat Paolo Bonzini geschrieben:
> > > >> Il 06/03/2013 11:48, Kevin Wolf ha scritto:
> > > >>> inet_connect_opts() tries all possible addrinfos returned by
> > > >>> getaddrinfo(). If one fails with an error, the next one is tried. In
> > > >>> this case, the Error should be discarded because the whole operation is
> > > >>> successful if another addrinfo from the list succeeds; and if it
> > > >>> doesn't, setting an already set Error will trigger an assertion failure.
> > > >>>
> > > >>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > > >>> ---
> > > >>>  util/qemu-sockets.c | 8 ++++++++
> > > >>>  1 file changed, 8 insertions(+)
> > > >>>
> > > >>> diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
> > > >>> index 1350ccc..32e609a 100644
> > > >>> --- a/util/qemu-sockets.c
> > > >>> +++ b/util/qemu-sockets.c
> > > >>> @@ -373,6 +373,14 @@ int inet_connect_opts(QemuOpts *opts, Error **errp,
> > > >>>      }
> > > >>>  
> > > >>>      for (e = res; e != NULL; e = e->ai_next) {
> > > >>> +
> > > >>> +        /* Overwriting errors isn't allowed, so clear any error that may have
> > > >>> +         * occured in the previous iteration */
> > > >>> +        if (error_is_set(errp)) {
> > > >>> +            error_free(*errp);
> > > >>> +            *errp = NULL;
> > > >>> +        }
> > > >>> +
> > > >>>          if (connect_state != NULL) {
> > > >>>              connect_state->current_addr = e;
> > > >>>          }
> > > >>>
> > > >>
> > > >> Should we also do nothing if errp is not NULL on entry?
> > > > 
> > > > We could assert(!error_is_set(errp)) if we wanted. As soon as you've got
> > > > an Error, you must return instead of calling more functions with the
> > > > same error pointer.
> > > 
> > > I think Luiz would suggest (*) to receive any error into a
> > > NULL-initialized local_err pointer; do the logic above on local_err, and
> > > just before returning, error_propagate() it to errp.
> > 
> > Yes, I'd suggest that but it turns out that inet_connect_addr() error
> > reporting was and still is confusing, which causes callers to use it
> > incorrectly.
> > 
> > This patch (which has been applied by Anthony)
> 
> No, Anthony applied a different, but similar patch of his own. This is
> why I don't feel particularly responsible for the specific problem any
> more.
> 
> How to do error handling with Error right is the only reason for me to
> continue the discussion.

I think we need a kind of "Error best practices" doc.

> > solves the problem at
> > hand but it also introduces a new issue: errors from inet_connect_addr()
> > are only reported if they happen in the last loop interaction. Note that
> > a few other errors other than 'couldn't connect' can happen.
> 
> > Laszlo's comment seemed to have triggered a discussion around Error **,
> > but this really has very little to do with it: the real problem is that
> > inet_connect_addr() is too confusing.
> 
> Maybe we need to discuss first what the intended behaviour even is. My
> interpretation was this: We may have several addresses to try. If one of
> them works, the function as a whole has succeeded and must not return an
> error, neither in errp nor as -errno. If none of them succeeds, the
> function has to return an error, and returning the error of the last
> attempt is as good as the error of any other attempt.

I agree. When I looked at the code yesterday I had the impression that
several other errors where possible, which made me wonder if we shouldn't
stop short the loop on non-"can't connect" type of errors.

But looking at it again we have only socket() and connect() calls, and
I'd expect that most (all?) non can't connect errors will happen in all
loop iterations, which will cause the error to be reported.

> > inet_connect_addr() has two users: inet_connect_opts() and wait_for_connect(),
> > with this patch both of them are now ignoring errors from inet_connect_addr().
> > 
> > Suggested solution: refactor inet_connect_addr() to return an errno value.
> > Callers use error_set() when they want to report an error upward.
> 
> Doesn't change the problem that you need to know when to set a return
> value != 0. So it doesn't help, but you'd lose some error information.

My real point is that it's easier to check against errno to find out
the error cause (compared to using Error for that).
Kevin Wolf - March 20, 2013, 1:37 p.m.
Am 20.03.2013 um 13:57 hat Luiz Capitulino geschrieben:
> On Wed, 20 Mar 2013 09:39:34 +0100
> Kevin Wolf <kwolf@redhat.com> wrote:
> 
> > Am 19.03.2013 um 21:34 hat Luiz Capitulino geschrieben:
> > > inet_connect_addr() has two users: inet_connect_opts() and wait_for_connect(),
> > > with this patch both of them are now ignoring errors from inet_connect_addr().
> > > 
> > > Suggested solution: refactor inet_connect_addr() to return an errno value.
> > > Callers use error_set() when they want to report an error upward.
> > 
> > Doesn't change the problem that you need to know when to set a return
> > value != 0. So it doesn't help, but you'd lose some error information.
> 
> My real point is that it's easier to check against errno to find out
> the error cause (compared to using Error for that).

You mean if the caller has to distinguish between different error codes?
I think I would agree that avoiding Error can be a good way then if it
doesn't lose error information. If we would lose information, using
error classes other than generic would be acceptable, right?

In the specific case, I don't think the callers make any difference and
all errors are just errors, so this is mostly about the theory.

Kevin
Luiz Capitulino - March 20, 2013, 1:52 p.m.
On Wed, 20 Mar 2013 14:37:59 +0100
Kevin Wolf <kwolf@redhat.com> wrote:

> Am 20.03.2013 um 13:57 hat Luiz Capitulino geschrieben:
> > On Wed, 20 Mar 2013 09:39:34 +0100
> > Kevin Wolf <kwolf@redhat.com> wrote:
> > 
> > > Am 19.03.2013 um 21:34 hat Luiz Capitulino geschrieben:
> > > > inet_connect_addr() has two users: inet_connect_opts() and wait_for_connect(),
> > > > with this patch both of them are now ignoring errors from inet_connect_addr().
> > > > 
> > > > Suggested solution: refactor inet_connect_addr() to return an errno value.
> > > > Callers use error_set() when they want to report an error upward.
> > > 
> > > Doesn't change the problem that you need to know when to set a return
> > > value != 0. So it doesn't help, but you'd lose some error information.
> > 
> > My real point is that it's easier to check against errno to find out
> > the error cause (compared to using Error for that).
> 
> You mean if the caller has to distinguish between different error codes?

Yes.

> I think I would agree that avoiding Error can be a good way then if it
> doesn't lose error information. If we would lose information, using
> error classes other than generic would be acceptable, right?

Yes, I think so.

I mean, even to me it's not entirely clear when new classes should be added.
The rule I had in mind was that a new class should be added when a QMP
client needs to distinguish a specific error. However, we're considering
QEMU subsystems to be QMP clients, which (taken to an extreme) would lead us
to our recent past where Error was trying to replace errno.

Markus once wrote about where to set boundaries between errno and Error, but I
don't remember if it was a private discussion or an email to the list. It's
time to write this down in docs/.

Patch

diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index 1350ccc..32e609a 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -373,6 +373,14 @@  int inet_connect_opts(QemuOpts *opts, Error **errp,
     }
 
     for (e = res; e != NULL; e = e->ai_next) {
+
+        /* Overwriting errors isn't allowed, so clear any error that may have
+         * occured in the previous iteration */
+        if (error_is_set(errp)) {
+            error_free(*errp);
+            *errp = NULL;
+        }
+
         if (connect_state != NULL) {
             connect_state->current_addr = e;
         }