[1/2] nbd: Drop connection if broken server is detected

Message ID 7d6be617-6907-5213-941b-38fe5d3f0fee@redhat.com
State New

Commit Message

Eric Blake Aug. 11, 2017, 2:15 p.m.
On 08/11/2017 02:48 AM, Vladimir Sementsov-Ogievskiy wrote:
> 11.08.2017 05:37, Eric Blake wrote:
>> As soon as the server is sending us garbage, we should quit
>> trying to send further messages to the server, and allow all
>> pending coroutines for any remaining replies to error out.
>> Failure to do so can let a malicious server cause the client
>> to hang, for example, if the server sends an invalid magic
>> number in its response.
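
For reference, here is a minimal C sketch of the kind of check the commit message refers to: validating the magic number of an NBD simple reply header. It assumes the protocol's 16-byte simple-reply layout (32-bit magic 0x67446698, 32-bit error, 64-bit handle, all big-endian). `Reply`, `be32()`, `be64()` and `parse_simple_reply()` are hypothetical helpers, not QEMU functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Simple-reply magic from the NBD protocol. */
#define NBD_REPLY_MAGIC 0x67446698u

/* Hypothetical decoded reply header (not QEMU's NBDReply). */
typedef struct Reply {
    uint32_t error;
    uint64_t handle;
} Reply;

/* Decode big-endian integers from the wire. */
static uint32_t be32(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

static uint64_t be64(const uint8_t *p)
{
    return (uint64_t)be32(p) << 32 | be32(p + 4);
}

/* Parse a 16-byte simple reply header.  -EINVAL means the server sent
 * garbage: the caller should tear down the connection rather than keep
 * waiting for a well-formed reply that will never arrive. */
static int parse_simple_reply(const uint8_t buf[16], Reply *r)
{
    if (be32(buf) != NBD_REPLY_MAGIC) {
        return -EINVAL;
    }
    r->error = be32(buf + 4);
    r->handle = be64(buf + 8);
    return 0;
}
```

Returning an error from the parser, instead of silently skipping the bad header, is what gives the reader coroutine a point at which it can decide to drop the connection.
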

>> @@ -107,8 +108,12 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
>>           qemu_coroutine_yield();
>>       }
>>
>> +    s->reply.handle = 0;
>>       nbd_recv_coroutines_enter_all(s);
>>       s->read_reply_co = NULL;
>> +    if (ret < 0) {
>> +        nbd_teardown_connection(bs);
>> +    }
> 
> what if it happens in parallel with nbd_co_send_request?
> nbd_teardown_connection destroys s->ioc, nbd_co_send_request
> checks s->ioc only once and then calls nbd_send_request (which is
> finally nbd_rwv and may yield). I think nbd_rwv is not
> prepared for sudden destruction of ioc.

The nbd_recv_coroutines_enter_all() call schedules all pending
nbd_co_send_request coroutines to fire as soon as the current coroutine
reaches a yield point. The next yield point is during the
BDRV_POLL_WHILE of nbd_teardown_connection - but this is AFTER we've
called qio_channel_shutdown() - so as long as nbd_rwv() is called with a
valid ioc, the qio code should recognize that we are shutting down the
connection and gracefully give an error on each write attempt.
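
The claim that writes fail gracefully once the connection is being shut down can be illustrated with plain POSIX sockets. In this sketch a Unix-domain socketpair stands in for the NBD connection and shutdown(2) plays the role of qio_channel_shutdown() (both are assumptions for illustration). A write() after shutdown returns -1 with errno EPIPE immediately rather than blocking; SIGPIPE is ignored so the process is not killed.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the errno from a write() issued after shutdown(), 0 if the write
 * unexpectedly succeeded, or -1 if the socketpair could not be created. */
static int errno_of_write_after_shutdown(void)
{
    int sv[2];
    int err = 0;

    /* Writing to a shut-down socket raises SIGPIPE unless it is ignored. */
    signal(SIGPIPE, SIG_IGN);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }

    /* Stand-in for shutting the channel down in both directions. */
    shutdown(sv[0], SHUT_RDWR);

    if (write(sv[0], "x", 1) < 0) {
        err = errno;   /* expected: EPIPE, reported without blocking */
    }
    close(sv[0]);
    close(sv[1]);
    return err;
}
```

This is the property the argument relies on: as long as the writer still holds a valid channel, shutdown turns every later write into a prompt error rather than a hang.
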

I see your point that coroutines can change hands between our two
writes for an NBD_CMD_WRITE in nbd_co_send_request() (the first write is
nbd_send_request() for the header, the second is nbd_rwv() for the
data): if we process a failing read between those two writes, we risk
re-reading s->ioc as NULL for the second write call.  But maybe this is
an appropriate fix - hanging on to the ioc that we learned when grabbing
the send_mutex:


But I'm really hoping Paolo will chime in on this thread.

Comments

Vladimir Sementsov-Ogievskiy Aug. 11, 2017, 2:53 p.m. | #1
11.08.2017 17:15, Eric Blake wrote:
> On 08/11/2017 02:48 AM, Vladimir Sementsov-Ogievskiy wrote:
>> 11.08.2017 05:37, Eric Blake wrote:
>>> As soon as the server is sending us garbage, we should quit
>>> trying to send further messages to the server, and allow all
>>> pending coroutines for any remaining replies to error out.
>>> Failure to do so can let a malicious server cause the client
>>> to hang, for example, if the server sends an invalid magic
>>> number in its response.
>>> @@ -107,8 +108,12 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
>>>            qemu_coroutine_yield();
>>>        }
>>>
>>> +    s->reply.handle = 0;
>>>        nbd_recv_coroutines_enter_all(s);
>>>        s->read_reply_co = NULL;
>>> +    if (ret < 0) {
>>> +        nbd_teardown_connection(bs);
>>> +    }
>> what if it happens in parallel with nbd_co_send_request?
>> nbd_teardown_connection destroys s->ioc, nbd_co_send_request
>> checks s->ioc only once and then calls nbd_send_request (which is
>> finally nbd_rwv and may yield). I think nbd_rwv is not
>> prepared for sudden destruction of ioc.
> The nbd_recv_coroutines_enter_all() call schedules all pending
> nbd_co_send_request coroutines to fire as soon as the current coroutine
> reaches a yield point. The next yield point is during the
> BDRV_POLL_WHILE of nbd_teardown_connection - but this is AFTER we've
> called qio_channel_shutdown() - so as long as nbd_rwv() is called with a
> valid ioc, the qio code should recognize that we are shutting down the
> connection and gracefully give an error on each write attempt.


Hmm, was it correct even before your patch? Is it safe to enter a coroutine
(which we've scheduled by nbd_recv_coroutines_enter_all()) that is actually
yielded inside nbd_rwv (not at our yield in nbd_co_receive_reply)?

>
> I see your point about the fact that coroutines can change hands in
> between our two writes for an NBD_CMD_WRITE in nbd_co_send_request()
> (the first write is nbd_send_request() for the header, the second is
> nbd_rwv() for the data) - if between those two writes we process a
> failing read, I see your point about us risking re-reading s->ioc as

But there are no yields between the two writes, so, if the previous logic
is correct, a read failure during the first write will return an error and
we will not go on to the second write. If it fails during the second
write, it should be OK too.

> NULL for the second write call.  But maybe this is an appropriate fix -
> hanging on to the ioc that we learned when grabbing the send_mutex:
>
>
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index 802d50b636..28b10f3fa2 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -122,6 +122,7 @@ static int nbd_co_send_request(BlockDriverState *bs,
>   {
>       NBDClientSession *s = nbd_get_client_session(bs);
>       int rc, ret, i;
> +    QIOChannel *ioc;
>
>       qemu_co_mutex_lock(&s->send_mutex);
>       while (s->in_flight == MAX_NBD_REQUESTS) {
> @@ -139,25 +140,26 @@ static int nbd_co_send_request(BlockDriverState *bs,
>       g_assert(qemu_in_coroutine());
>       assert(i < MAX_NBD_REQUESTS);
>       request->handle = INDEX_TO_HANDLE(s, i);
> +    ioc = s->ioc;
>
> -    if (!s->ioc) {
> +    if (!ioc) {
>           qemu_co_mutex_unlock(&s->send_mutex);
>           return -EPIPE;
>       }
>
>       if (qiov) {
> -        qio_channel_set_cork(s->ioc, true);
> -        rc = nbd_send_request(s->ioc, request);
> +        qio_channel_set_cork(ioc, true);
> +        rc = nbd_send_request(ioc, request);
>           if (rc >= 0) {
> -            ret = nbd_rwv(s->ioc, qiov->iov, qiov->niov, request->len, false,
> +            ret = nbd_rwv(ioc, qiov->iov, qiov->niov, request->len, false,
>                             NULL);
>               if (ret != request->len) {
>                   rc = -EIO;
>               }
>           }
> -        qio_channel_set_cork(s->ioc, false);
> +        qio_channel_set_cork(ioc, false);
>       } else {
> -        rc = nbd_send_request(s->ioc, request);
> +        rc = nbd_send_request(ioc, request);
>       }
>       qemu_co_mutex_unlock(&s->send_mutex);
>       return rc;
>
>
>
> But I'm really hoping Paolo will chime in on this thread.
>
Eric Blake Aug. 11, 2017, 7:41 p.m. | #2
On 08/11/2017 09:53 AM, Vladimir Sementsov-Ogievskiy wrote:
> 11.08.2017 17:15, Eric Blake wrote:
>> On 08/11/2017 02:48 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> 11.08.2017 05:37, Eric Blake wrote:
>>>> As soon as the server is sending us garbage, we should quit
>>>> trying to send further messages to the server, and allow all
>>>> pending coroutines for any remaining replies to error out.
>>>> Failure to do so can let a malicious server cause the client
>>>> to hang, for example, if the server sends an invalid magic
>>>> number in its response.

>> The nbd_recv_coroutines_enter_all() call schedules all pending
>> nbd_co_send_request coroutines to fire as soon as the current coroutine
>> reaches a yield point. The next yield point is during the
>> BDRV_POLL_WHILE of nbd_teardown_connection - but this is AFTER we've
>> called qio_channel_shutdown() - so as long as nbd_rwv() is called with a
>> valid ioc, the qio code should recognize that we are shutting down the
>> connection and gracefully give an error on each write attempt.
> 
> 
> Hmm, was it correct even before your patch? Is it safe to enter a coroutine
> (which we've scheduled by nbd_recv_coroutines_enter_all()), which is
> actually yielded inside nbd_rwv (not our yield in nbd_co_receive_reply)?

I'm honestly not sure how to answer the question. In my testing, I was
unable to catch a coroutine yielding inside of nbd_rwv(); I was able to
easily provoke a situation where the client can send two or more
commands prior to the server getting a chance to reply to either:

./qemu-io -f raw nbd://localhost:10809/foo \
   -c 'aio_read 0 512' -c 'aio_write 1k 1m'

where tracing the server proves that the server received both commands
before sending a reply; when the client sends two aio_read commands, it
was even the case that I could observe the server replying to the second
read before the first.  So I'm definitely provoking parallel coroutines.
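
Out-of-order replies like this are safe because each reply echoes the handle of the request it answers, so the client can route it to the right waiting coroutine. Below is a minimal sketch, assuming a hypothetical XOR-with-cookie handle encoding in the spirit of INDEX_TO_HANDLE (the real macro is not shown in this thread). `Client`, `send_req()` and `on_reply()` are illustrative names only. A reply carrying an unknown handle is exactly the kind of server garbage the patch wants to treat as fatal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_REQS 16   /* hypothetical; plays the role of MAX_NBD_REQUESTS */

typedef struct Client {
    uint64_t cookie;                 /* hypothetical per-client salt */
    bool in_use[MAX_REQS];           /* slot i has a request in flight */
    int completed_order[MAX_REQS];   /* slots in the order replies arrived */
    int ncompleted;
} Client;

static uint64_t index_to_handle(const Client *c, int i)
{
    return (uint64_t)i ^ c->cookie;
}

/* Claim a free slot; returns its index and the handle to put on the wire. */
static int send_req(Client *c, uint64_t *handle)
{
    for (int i = 0; i < MAX_REQS; i++) {
        if (!c->in_use[i]) {
            c->in_use[i] = true;
            *handle = index_to_handle(c, i);
            return i;
        }
    }
    return -1;
}

/* Route a reply to its slot.  -1 means the handle matches no request in
 * flight, i.e. the server is broken and the connection should be dropped. */
static int on_reply(Client *c, uint64_t handle)
{
    uint64_t i = handle ^ c->cookie;

    if (i >= MAX_REQS || !c->in_use[i]) {
        return -1;
    }
    c->in_use[i] = false;
    c->completed_order[c->ncompleted++] = (int)i;
    return 0;
}
```
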

But even without my tentative squash patch, I haven't been able to
observe s->ioc change from valid to NULL within the body of
nbd_co_send_request - either the entire request is skipped because ioc
was already cleared, or the entire request operates on a valid ioc
(although the request may still fail with EPIPE because the ioc has
started its efforts at shutdown).  I even tried varying the size of the
aio_write; with 1M, the client got the write request sent off before the
server's reply; but 2M was large enough that the server sent the read
reply before the client could send the write.  Since we are using a
mutex, at most one coroutine can attempt a write at once; but that
still says nothing about how many other parallel coroutines can wake up
to do a read.  I also think the fact that we are using qio's set_cork
around the paired writes is helping: although we have two calls to
nbd_rwv(), the first one is for the NBD_CMD_WRITE header, which is small
enough that the qio layer waits for the second nbd_rwv() call before
actually sending anything over the wire (if anything, we are more likely
to see s->ioc change before the final set_cork call, rather than between
the two writes - if that change can even happen).

>>
>> I see your point about the fact that coroutines can change hands in
>> between our two writes for an NBD_CMD_WRITE in nbd_co_send_request()
>> (the first write is nbd_send_request() for the header, the second is
>> nbd_rwv() for the data) - if between those two writes we process a
>> failing read, I see your point about us risking re-reading s->ioc as
> 
> But there are no yields between two writes, so, if previous logic is
> correct, if the read fails during first write it will return an error and
> we will not go into the second write. If it fails during the second write,
> it should be OK too.

If we can ever observe s->ioc changing to NULL, then my followup squash
patch is needed (if nothing else, calling qio_channel_set_cork(NULL,
false) will crash).  But I'm not familiar enough with coroutines to know
if it is possible, or just paranoia on my part.
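
The hazard and the proposed fix can be reduced to a standalone model. Everything below is hypothetical (`Channel`, `Session`, `teardown()` and the `between` hook are not QEMU types). The hook runs between the header and payload writes to model a coroutine switch in which the teardown path runs. The sender reads the shared `s->ioc` exactly once, as the squash patch does. The sketch also takes a reference on the snapshot: that part is an assumption beyond the patch, needed here because this model's teardown frees the object, and a bare pointer copy alone would not keep it alive.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not QEMU's QIOChannel or NBDClientSession. */
typedef struct Channel {
    int refcnt;
    int writes;      /* successful writes so far */
    int shut_down;   /* nonzero once teardown has shut the channel down */
} Channel;

typedef struct Session {
    Channel *ioc;    /* shared field; teardown sets it to NULL */
} Session;

static Channel *channel_new(void)
{
    Channel *c = calloc(1, sizeof(*c));
    if (c) {
        c->refcnt = 1;
    }
    return c;
}

static void channel_ref(Channel *c) { c->refcnt++; }

static void channel_unref(Channel *c)
{
    if (--c->refcnt == 0) {
        free(c);
    }
}

static int channel_write(Channel *c)
{
    if (c->shut_down) {
        return -EPIPE;   /* writes fail cleanly once shut down */
    }
    c->writes++;
    return 0;
}

static void teardown(Session *s)
{
    if (!s->ioc) {
        return;
    }
    s->ioc->shut_down = 1;
    channel_unref(s->ioc);   /* drop the session's own reference */
    s->ioc = NULL;
}

/* Header write, then payload write; 'between' models a coroutine switch. */
static int send_header_and_payload(Session *s, void (*between)(Session *))
{
    Channel *ioc = s->ioc;   /* read the shared field exactly once */
    int rc;

    if (!ioc) {
        return -EPIPE;
    }
    channel_ref(ioc);        /* assumption: keep the object alive */
    rc = channel_write(ioc);
    if (between) {
        between(s);
    }
    if (rc == 0) {
        rc = channel_write(ioc);   /* never touches s->ioc, which may be NULL */
    }
    channel_unref(ioc);
    return rc;
}
```

With teardown injected between the writes, the second write fails with -EPIPE instead of passing NULL to the channel functions.
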
Eric Blake Aug. 11, 2017, 8:01 p.m. | #3
On 08/11/2017 02:41 PM, Eric Blake wrote:
>> Hmm, was it correct even before your patch? Is it safe to enter a coroutine
>> (which we've scheduled by nbd_recv_coroutines_enter_all()), which is
>> actually
>> yielded inside nbd_rwv (not our yield in nbd_co_receive_reply)?
> 
> I'm honestly not sure how to answer the question. In my testing, I was
> unable to catch a coroutine yielding inside of nbd_rwv();

Single-stepping through nbd_rwv(), I see that I/O is performed by
sendmsg(), which either gets the message sent or, because of nonblocking
mode, fails with EAGAIN, which gets turned into QIO_CHANNEL_ERR_BLOCK
and indeed a call to qio_channel_yield() within nbd_rwv() - but it's
timing sensitive, so I still haven't been able to provoke this scenario
using gdb.
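
The yield path described above can be reproduced deterministically outside QEMU. In this sketch `write_all()` retries a nonblocking write and calls a hypothetical `yield` hook on EAGAIN, roughly the shape of a writer that maps EAGAIN to QIO_CHANNEL_ERR_BLOCK and then yields. The hook plays "other coroutines" by draining the peer, which frees buffer space. A 4 MiB payload over a Unix-domain socketpair is assumed to exceed the socket buffer, so the yield is actually taken.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical hook standing in for a coroutine yield. */
typedef void (*yield_fn)(void *opaque);

/* Write all of buf to a nonblocking fd, yielding instead of failing on EAGAIN. */
static ssize_t write_all(int fd, const char *buf, size_t len,
                         yield_fn yield, void *opaque)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            if (errno == EAGAIN || errno == EWOULDBLOCK) {
                yield(opaque);   /* other work runs here; then we retry */
                continue;
            }
            return -errno;
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* State for a yield hook that plays the peer by draining the socket. */
struct drain {
    int fd;
    size_t drained;
    int yields;
};

static void drain_peer(void *opaque)
{
    struct drain *d = opaque;
    char tmp[65536];
    ssize_t n;

    d->yields++;
    while ((n = read(d->fd, tmp, sizeof(tmp))) > 0) {
        d->drained += (size_t)n;
    }
}

/* Send len bytes over a nonblocking socketpair; returns bytes written. */
static ssize_t run_demo(size_t len, struct drain *d)
{
    int sv[2];
    char tmp[65536];
    ssize_t n, r;
    char *buf = malloc(len);

    if (!buf || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        free(buf);
        return -1;
    }
    memset(buf, 'x', len);
    fcntl(sv[0], F_SETFL, fcntl(sv[0], F_GETFL) | O_NONBLOCK);
    fcntl(sv[1], F_SETFL, fcntl(sv[1], F_GETFL) | O_NONBLOCK);
    d->fd = sv[1];
    d->drained = 0;
    d->yields = 0;

    n = write_all(sv[0], buf, len, drain_peer, d);
    while ((r = read(sv[1], tmp, sizeof(tmp))) > 0) {   /* collect the tail */
        d->drained += (size_t)r;
    }
    close(sv[0]);
    close(sv[1]);
    free(buf);
    return n;
}
```

A small write completes without yielding. Only a write that fills the socket buffer reaches the yield, which is consistent with the scenario being timing and size sensitive.
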

Patch

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 802d50b636..28b10f3fa2 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -122,6 +122,7 @@  static int nbd_co_send_request(BlockDriverState *bs,
 {
     NBDClientSession *s = nbd_get_client_session(bs);
     int rc, ret, i;
+    QIOChannel *ioc;

     qemu_co_mutex_lock(&s->send_mutex);
     while (s->in_flight == MAX_NBD_REQUESTS) {
@@ -139,25 +140,26 @@  static int nbd_co_send_request(BlockDriverState *bs,
     g_assert(qemu_in_coroutine());
     assert(i < MAX_NBD_REQUESTS);
     request->handle = INDEX_TO_HANDLE(s, i);
+    ioc = s->ioc;

-    if (!s->ioc) {
+    if (!ioc) {
         qemu_co_mutex_unlock(&s->send_mutex);
         return -EPIPE;
     }

     if (qiov) {
-        qio_channel_set_cork(s->ioc, true);
-        rc = nbd_send_request(s->ioc, request);
+        qio_channel_set_cork(ioc, true);
+        rc = nbd_send_request(ioc, request);
         if (rc >= 0) {
-            ret = nbd_rwv(s->ioc, qiov->iov, qiov->niov, request->len, false,
+            ret = nbd_rwv(ioc, qiov->iov, qiov->niov, request->len, false,
                           NULL);
             if (ret != request->len) {
                 rc = -EIO;
             }
         }
-        qio_channel_set_cork(s->ioc, false);
+        qio_channel_set_cork(ioc, false);
     } else {
-        rc = nbd_send_request(s->ioc, request);
+        rc = nbd_send_request(ioc, request);
     }
     qemu_co_mutex_unlock(&s->send_mutex);