Message ID | 20171112013936.5942-1-eblake@redhat.com |
---|---|
State | New |
Headers | show |
Series | [for-2.11] nbd: Don't crash when server reports NBD_CMD_READ failure | expand |
12.11.2017 04:39, Eric Blake wrote: > If a server fails a read, for example with EIO, but the connection > is still live, then we would crash trying to print a non-existent > error message. Bug introduced in commit f140e300. > > Signed-off-by: Eric Blake <eblake@redhat.com> > --- > block/nbd-client.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/block/nbd-client.c b/block/nbd-client.c > index b44d4d4a01..5f3375a970 100644 > --- a/block/nbd-client.c > +++ b/block/nbd-client.c > @@ -78,7 +78,7 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque) > while (!s->quit) { > assert(s->reply.handle == 0); > ret = nbd_receive_reply(s->ioc, &s->reply, &local_err); hmm. /* nbd_receive_reply * Returns 1 on success * 0 on eof, when no data was read (errp is not set) * negative errno on failure (errp is set) */ int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, Error **errp) - looks like errp should be set if ret < 0. may be the function should be fixed? - don't you think that this function returns transferred server error? it doesn't. > - if (ret < 0) { > + if (local_err) { > error_report_err(local_err); > } > if (ret <= 0) { > @@ -681,7 +681,7 @@ int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset, > > ret = nbd_co_receive_cmdread_reply(client, request.handle, offset, qiov, > &local_err); > - if (ret < 0) { > + if (local_err) { > error_report_err(local_err); > } > return ret; but here, it looks like you are right. My first attempt was to store server error to ret and server error msg to errp (bad idea, as I can see now), but you proposed to do not print every server error msg and deleted this functionality. And now we have a set of functions which can return ret < 0 and not set errp.. It looks very confusing, I think. Better solution is to refactor this somehow, may be not mixing server-error-reply with local normal errors.. For now, for second chunk: Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
On 11/13/2017 05:37 AM, Vladimir Sementsov-Ogievskiy wrote: > 12.11.2017 04:39, Eric Blake wrote: >> If a server fails a read, for example with EIO, but the connection >> is still live, then we would crash trying to print a non-existent >> error message. Bug introduced in commit f140e300. >> >> Signed-off-by: Eric Blake <eblake@redhat.com> >> --- >> block/nbd-client.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/block/nbd-client.c b/block/nbd-client.c >> index b44d4d4a01..5f3375a970 100644 >> --- a/block/nbd-client.c >> +++ b/block/nbd-client.c >> @@ -78,7 +78,7 @@ static coroutine_fn void nbd_read_reply_entry(void >> *opaque) >> while (!s->quit) { >> assert(s->reply.handle == 0); >> ret = nbd_receive_reply(s->ioc, &s->reply, &local_err); > > hmm. > > /* nbd_receive_reply > * Returns 1 on success > * 0 on eof, when no data was read (errp is not set) > * negative errno on failure (errp is set) > */ > int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, Error **errp) > > > - looks like errp should be set if ret < 0. may be the function should > be fixed? > - don't you think that this function returns transferred server error? > it doesn't. > >> - if (ret < 0) { >> + if (local_err) { >> error_report_err(local_err); In my testing, I did not trip on this error_report_err() crashing. Which means I think we are already honoring our contract: nbd_receive_reply() is setting local_err if it returns -1. Thus, for this call site, 'if (ret < 0)' is equivalent to 'if (local_err)'. The only reason to change it, then, is for consistency with the other 2 callers of error_report_err() in this file... >> } >> if (ret <= 0) { >> @@ -681,7 +681,7 @@ int nbd_client_co_preadv(BlockDriverState *bs, >> uint64_t offset, >> >> ret = nbd_co_receive_cmdread_reply(client, request.handle, >> offset, qiov, >> &local_err); >> - if (ret < 0) { >> + if (local_err) { >> error_report_err(local_err); >> } >> return ret; > > but here, it looks like you are right. My first attempt was to store > server error to ret and > server error msg to errp (bad idea, as I can see now), but you proposed > to do not print every server error msg and > deleted this functionality. And now we have a set of functions which can > return ret < 0 > and not set errp.. Indeed, this is the caller that MUST return a negative value to get the caller to report a failed operation, but where the negative value does NOT imply a dead connection unless local_err was also set, so here, we must not print anything unless local_err was set. I caught one of the two spots like this while revising your patch prior to soft freeze, but missed this one. > It looks very confusing, I think. Better solution is > to refactor this somehow, > may be not mixing server-error-reply with local normal errors.. A refactoring may be nice, but it would be 2.12 material; at this point, I want to minimize what further changes go into 2.11. That said, > > For now, for second chunk: I see no problem with both chunks going in. The second is a bug fix, but the first is for consistency, and I'd rather have all callers to error_report_err() look consistent, rather than trying to remember that 'ret < 0' does not always imply errp is set. > > Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> > > >
13.11.2017 17:45, Eric Blake wrote: > On 11/13/2017 05:37 AM, Vladimir Sementsov-Ogievskiy wrote: >> 12.11.2017 04:39, Eric Blake wrote: >>> If a server fails a read, for example with EIO, but the connection >>> is still live, then we would crash trying to print a non-existent >>> error message. Bug introduced in commit f140e300. >>> >>> Signed-off-by: Eric Blake <eblake@redhat.com> >>> --- >>> block/nbd-client.c | 4 ++-- >>> 1 file changed, 2 insertions(+), 2 deletions(-) >>> >>> diff --git a/block/nbd-client.c b/block/nbd-client.c >>> index b44d4d4a01..5f3375a970 100644 >>> --- a/block/nbd-client.c >>> +++ b/block/nbd-client.c >>> @@ -78,7 +78,7 @@ static coroutine_fn void nbd_read_reply_entry(void >>> *opaque) >>> while (!s->quit) { >>> assert(s->reply.handle == 0); >>> ret = nbd_receive_reply(s->ioc, &s->reply, &local_err); >> hmm. >> >> /* nbd_receive_reply >> * Returns 1 on success >> * 0 on eof, when no data was read (errp is not set) >> * negative errno on failure (errp is set) >> */ >> int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, Error **errp) >> >> >> - looks like errp should be set if ret < 0. may be the function should >> be fixed? >> - don't you think that this function returns transferred server error? >> it doesn't. >> >>> - if (ret < 0) { >>> + if (local_err) { >>> error_report_err(local_err); > In my testing, I did not trip on this error_report_err() crashing. > Which means I think we are already honoring our contract: > nbd_receive_reply() is setting local_err if it returns -1. Thus, for > this call site, 'if (ret < 0)' is equivalent to 'if (local_err)'. The > only reason to change it, then, is for consistency with the other 2 > callers of error_report_err() in this file... > >>> } >>> if (ret <= 0) { >>> @@ -681,7 +681,7 @@ int nbd_client_co_preadv(BlockDriverState *bs, >>> uint64_t offset, >>> >>> ret = nbd_co_receive_cmdread_reply(client, request.handle, >>> offset, qiov, >>> &local_err); >>> - if (ret < 0) { >>> + if (local_err) { >>> error_report_err(local_err); >>> } >>> return ret; >> but here, it looks like you are right. My first attempt was to store >> server error to ret and >> server error msg to errp (bad idea, as I can see now), but you proposed >> to do not print every server error msg and >> deleted this functionality. And now we have a set of functions which can >> return ret < 0 >> and not set errp.. > Indeed, this is the caller that MUST return a negative value to get the > caller to report a failed operation, but where the negative value does > NOT imply a dead connection unless local_err was also set, so here, we > must not print anything unless local_err was set. I caught one of the > two spots like this while revising your patch prior to soft freeze, but > missed this one. > >> It looks very confusing, I think. Better solution is >> to refactor this somehow, >> may be not mixing server-error-reply with local normal errors.. > A refactoring may be nice, but it would be 2.12 material; at this point, > I want to minimize what further changes go into 2.11. That said, > >> For now, for second chunk: > I see no problem with both chunks going in. The second is a bug fix, > but the first is for consistency, and I'd rather have all callers to > error_report_err() look consistent, rather than trying to remember that > 'ret < 0' does not always imply errp is set. ok, lets proceed with both chunks. > >> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> >> >> >>
On 11/11/2017 07:39 PM, Eric Blake wrote: > If a server fails a read, for example with EIO, but the connection > is still live, then we would crash trying to print a non-existent > error message. Bug introduced in commit f140e300. > > Signed-off-by: Eric Blake <eblake@redhat.com> > --- > block/nbd-client.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) Queued on my NBD tree; will be in 2.11 (I may do another pull request prior to -rc1, otherwise it will be -rc2).
diff --git a/block/nbd-client.c b/block/nbd-client.c index b44d4d4a01..5f3375a970 100644 --- a/block/nbd-client.c +++ b/block/nbd-client.c @@ -78,7 +78,7 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque) while (!s->quit) { assert(s->reply.handle == 0); ret = nbd_receive_reply(s->ioc, &s->reply, &local_err); - if (ret < 0) { + if (local_err) { error_report_err(local_err); } if (ret <= 0) { @@ -681,7 +681,7 @@ int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset, ret = nbd_co_receive_cmdread_reply(client, request.handle, offset, qiov, &local_err); - if (ret < 0) { + if (local_err) { error_report_err(local_err); } return ret;
If a server fails a read, for example with EIO, but the connection is still live, then we would crash trying to print a non-existent error message. Bug introduced in commit f140e300. Signed-off-by: Eric Blake <eblake@redhat.com> --- block/nbd-client.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)