diff mbox

[for-2.10,v2] nbd-client: Fix regression when server sends garbage

Message ID 6c97fa2e-a04c-57e7-430d-8dcecb1875e8@redhat.com
State New
Headers show

Commit Message

Eric Blake Aug. 15, 2017, 2:54 p.m. UTC
On 08/15/2017 09:45 AM, Vladimir Sementsov-Ogievskiy wrote:
> 15.08.2017 00:34, Eric Blake wrote:
>> When we switched NBD to use coroutines for qemu 2.9 (in particular,
>> commit a12a712a), we introduced a regression: if a server sends us
>> garbage (such as a corrupted magic number), we quit the read loop
>> but do not stop sending further queued commands, resulting in the
>> client hanging when it never reads the response to those additional
>> commands.  In qemu 2.8, we properly detected that the server is no
>> longer reliable, and cancelled all existing pending commands with
>> EIO, then tore down the socket so that all further command attempts
>> get EPIPE.
>>
>> Restore the proper behavior of quitting (almost) all communication
>> with a broken server: Once we know we are out of sync or otherwise
>> can't trust the server, we must assume that any further incoming
>> data is unreliable and therefore end all pending commands with EIO,
>> and quit trying to send any further commands.  As an exception, we
>> still (try to) send NBD_CMD_DISC to let the server know we are going
>> away (in part, because it is easier to do that than to further
>> refactor nbd_teardown_connection, and in part because it is the
>> only command where we do not have to wait for a reply).
>>
>> Based on a patch by Vladimir Sementsov-Ogievskiy.
>>

>> +++ b/block/nbd-client.c
>> @@ -73,7 +73,7 @@ static coroutine_fn void nbd_read_reply_entry(void
>> *opaque)
>>       int ret;
>>       Error *local_err = NULL;
>>
>> -    for (;;) {
>> +    while (!s->quit) {
> 
> looks like quit will never be true here
> 
>>           assert(s->reply.handle == 0);
>>           ret = nbd_receive_reply(s->ioc, &s->reply, &local_err);
>>           if (ret < 0) {
>> @@ -107,6 +107,9 @@ static coroutine_fn void nbd_read_reply_entry(void
>> *opaque)
>>           qemu_coroutine_yield();
>>       }
>>
>> +    if (ret < 0) {
>> +        s->quit = true;
> 
> so, you set quit only here.. if we fail on some write, reading coroutine
> will not
> know about it and will continue reading..

Looks like you are correct - your version set the flag in more places
than me, but it looks like you're right that we DO want to set the flag
when writing hits a known failure.

Here's what I plan to squash in:
diff mbox

Patch

diff --git i/block/nbd-client.c w/block/nbd-client.c
index bb17e3da45..422ecb4307 100644
--- i/block/nbd-client.c
+++ w/block/nbd-client.c
@@ -161,6 +161,9 @@  static int nbd_co_send_request(BlockDriverState *bs,
     } else {
         rc = nbd_send_request(s->ioc, request);
     }
+    if (rc < 0) {
+        s->quit = true;
+    }
     qemu_co_mutex_unlock(&s->send_mutex);
     return rc;
 }