
[1/3] nbd-client: enter read_reply_co during init to avoid crash

Message ID 20170824153345.2244-2-stefanha@redhat.com
State New

Commit Message

Stefan Hajnoczi Aug. 24, 2017, 3:33 p.m. UTC
The following segfault is encountered if the NBD server closes the UNIX
domain socket immediately after negotiation:

  Program terminated with signal SIGSEGV, Segmentation fault.
  #0  aio_co_schedule (ctx=0x0, co=0xd3c0ff2ef0) at util/async.c:441
  441	    QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
  (gdb) bt
  #0  0x000000d3c01a50f8 in aio_co_schedule (ctx=0x0, co=0xd3c0ff2ef0) at util/async.c:441
  #1  0x000000d3c012fa90 in nbd_coroutine_end (bs=bs@entry=0xd3c0fec650, request=<optimized out>) at block/nbd-client.c:207
  #2  0x000000d3c012fb58 in nbd_client_co_preadv (bs=0xd3c0fec650, offset=0, bytes=<optimized out>, qiov=0x7ffc10a91b20, flags=0) at block/nbd-client.c:237
  #3  0x000000d3c0128e63 in bdrv_driver_preadv (bs=bs@entry=0xd3c0fec650, offset=offset@entry=0, bytes=bytes@entry=512, qiov=qiov@entry=0x7ffc10a91b20, flags=0) at block/io.c:836
  #4  0x000000d3c012c3e0 in bdrv_aligned_preadv (child=child@entry=0xd3c0ff51d0, req=req@entry=0x7f31885d6e90, offset=offset@entry=0, bytes=bytes@entry=512, align=align@entry=1, qiov=qiov@entry=0x7ffc10a91b20, flags=0) at block/io.c:1086
  #5  0x000000d3c012c6b8 in bdrv_co_preadv (child=0xd3c0ff51d0, offset=offset@entry=0, bytes=bytes@entry=512, qiov=qiov@entry=0x7ffc10a91b20, flags=flags@entry=0) at block/io.c:1182
  #6  0x000000d3c011cc17 in blk_co_preadv (blk=0xd3c0ff4f80, offset=0, bytes=512, qiov=0x7ffc10a91b20, flags=0) at block/block-backend.c:1032
  #7  0x000000d3c011ccec in blk_read_entry (opaque=0x7ffc10a91b40) at block/block-backend.c:1079
  #8  0x000000d3c01bbb96 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:79
  #9  0x00007f3196cb8600 in __start_context () at /lib64/libc.so.6

The problem is that nbd_client_init() uses
nbd_client_attach_aio_context() -> aio_co_schedule(new_context,
client->read_reply_co).  Execution of read_reply_co is deferred to a BH
which doesn't run until later.

In the meantime blk_co_preadv() can be called and nbd_coroutine_end()
calls aio_co_wake() on read_reply_co.  At this point in time
read_reply_co's ctx isn't set because the coroutine has never been
entered yet.

This patch enters read_reply_co directly in
nbd_client_attach_aio_context().  This is safe because new_context is
acquired by the caller.  This ensures that read_reply_co reaches its
first yield point and its ctx is set up.

Note this only happens with UNIX domain sockets on Linux.  It doesn't
seem possible to reproduce this with TCP sockets.
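
To make the ordering concrete, here is a minimal toy model of the race
(plain C, not QEMU code; every name is invented for illustration).  The
point is that a coroutine's ctx pointer is only filled in the first time
the coroutine is entered, so waking it before the deferred first entry
has run dereferences a NULL pointer:

/* Toy model of the race -- not QEMU code, all names are invented. */
#include <stdio.h>
#include <stddef.h>

typedef struct ToyContext { const char *name; } ToyContext;
typedef struct ToyCoroutine { ToyContext *ctx; } ToyCoroutine;

/* The first entry records which context the coroutine runs in. */
static void toy_coroutine_enter(ToyContext *ctx, ToyCoroutine *co)
{
    co->ctx = ctx;
}

/* Waking a coroutine schedules it in co->ctx, which must not be NULL. */
static void toy_co_wake(ToyCoroutine *co)
{
    printf("waking coroutine in context %s\n", co->ctx->name);
}

int main(void)
{
    ToyContext new_context = { "new_context" };
    ToyCoroutine read_reply_co = { NULL };

    /* Buggy ordering: the scheduled first entry has not run yet, but a
     * request already completed and wakes the coroutine -> NULL deref:
     * toy_co_wake(&read_reply_co);
     */

    /* Patched ordering: enter the coroutine synchronously first, so its
     * ctx is set before anything can wake it. */
    toy_coroutine_enter(&new_context, &read_reply_co);
    toy_co_wake(&read_reply_co);
    return 0;
}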

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/nbd-client.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Paolo Bonzini Aug. 24, 2017, 4:21 p.m. UTC | #1
On 24/08/2017 17:33, Stefan Hajnoczi wrote:
> This patch enters read_reply_co directly in
> nbd_client_attach_aio_context().  This is safe because new_context is
> acquired by the caller.  This ensures that read_reply_co reaches its
> first yield point and its ctx is set up.

I'm not very confident about this patch.  aio_context_acquire/release is
going to go away, and then this becomes possible:

	main context			new_context
	qemu_aio_coroutine_enter
					send request
					wait for reply
	read first reply
	wake coroutine

where the "wake coroutine" part thinks it's running in new_context, and
thus simply enters the coroutine instead of using the bottom half.

But blk_co_preadv() should need the read_reply_co itself, in order to be
woken up after reading the reply header.  The core issue here is that
nbd_co_receive_reply was never called, I suspect.  And if it was never
called, read_reply_co should not be woken up by nbd_coroutine_end.

So the fix is:

1) assign NULL to s->recv_coroutine[i] when nbd_co_send_request fails

2) move this to nbd_co_receive_reply:

    s->recv_coroutine[i] = NULL;

    /* Kick the read_reply_co to get the next reply.  */
    if (s->read_reply_co) {
        aio_co_wake(s->read_reply_co);
    }

Does this make sense?  (Note that the read_reply_co idea actually came
from you, or from my recollections of your proposed design :)).

Paolo

> Note this only happens with UNIX domain sockets on Linux.  It doesn't
> seem possible to reproduce this with TCP sockets.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/nbd-client.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index 25bcaa2346..0a7f32779e 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -371,7 +371,7 @@ void nbd_client_attach_aio_context(BlockDriverState *bs,
>  {
>      NBDClientSession *client = nbd_get_client_session(bs);
>      qio_channel_attach_aio_context(QIO_CHANNEL(client->ioc), new_context);
> -    aio_co_schedule(new_context, client->read_reply_co);
> +    qemu_aio_coroutine_enter(new_context, client->read_reply_co);
>  }
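
To make the two-step fix described above concrete, here is a
self-contained toy model (plain C, not QEMU code; all names are invented
for illustration).  recv_slot_busy[] stands in for s->recv_coroutine[],
and the assertion marks the spot where the real code dereferenced a NULL
ctx: a request whose send failed never reaches the kick, because the kick
now lives only in the receive path.

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_REQS 4

struct session {
    bool read_reply_co_started;     /* set once read_reply_co first runs */
    bool recv_slot_busy[MAX_REQS];  /* stand-in for s->recv_coroutine[i] */
};

/* Stand-in for aio_co_wake(s->read_reply_co). */
static void kick_read_reply_co(struct session *s)
{
    /* In the real crash this is where ctx == NULL was dereferenced. */
    assert(s->read_reply_co_started);
    printf("read_reply_co kicked for the next reply\n");
}

static int send_request(struct session *s, int slot, bool server_alive)
{
    s->recv_slot_busy[slot] = true;
    if (!server_alive) {
        /* Step 1: no reply will ever arrive, so release the slot now. */
        s->recv_slot_busy[slot] = false;
        return -1;
    }
    return 0;
}

static void receive_reply(struct session *s, int slot)
{
    /* A reply for this slot has been read by read_reply_co. */
    s->recv_slot_busy[slot] = false;
    /* Step 2: only a request that actually got a reply kicks
     * read_reply_co; the kick is no longer in the common cleanup path. */
    kick_read_reply_co(s);
}

int main(void)
{
    struct session s = { .read_reply_co_started = false };

    /* Server closed the socket right after negotiation: the send fails
     * and nothing tries to wake the never-started read_reply_co. */
    if (send_request(&s, 0, false) < 0) {
        printf("request failed cleanly, no wake-up attempted\n");
    }

    /* Once read_reply_co has actually started, the normal path works. */
    s.read_reply_co_started = true;
    if (send_request(&s, 0, true) == 0) {
        receive_reply(&s, 0);
    }
    return 0;
}
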
Eric Blake Aug. 24, 2017, 5:37 p.m. UTC | #2
On 08/24/2017 11:21 AM, Paolo Bonzini wrote:
> On 24/08/2017 17:33, Stefan Hajnoczi wrote:
>> This patch enters read_reply_co directly in
>> nbd_client_attach_aio_context().  This is safe because new_context is
>> acquired by the caller.  This ensures that read_reply_co reaches its
>> first yield point and its ctx is set up.
> 
> I'm not very confident with this patch.  aio_context_acquire/release is
> going to go away, and this then becomes possible
> 
> 	main context			new_context
> 	qemu_aio_coroutine_enter
> 					send request
> 					wait for reply
> 	read first reply
> 	wake coroutine
> 
> where the "wake coroutine" part thinks it's running in new_context, and
> thus simply enters the coroutine instead of using the bottom half.
> 
> But blk_co_preadv() should need the read_reply_co itself, in order to be
> woken up after reading the reply header.  The core issue here is that
> nbd_co_receive_reply was never called, I suspect.  And if it was never
> called, read_reply_co should not be woken up by nbd_coroutine_end.
> 
> So the fix is:
> 
> 1) assign NULL to s->recv_coroutine[i] when nbd_co_send_request fails
> 
> 2) move this to nbd_co_receive_reply:
> 
>     s->recv_coroutine[i] = NULL;
> 
>     /* Kick the read_reply_co to get the next reply.  */
>     if (s->read_reply_co) {
>         aio_co_wake(s->read_reply_co);
>     }
> 
> Does this make sense?  (Note that the read_reply_co idea actually came
> from you, or from my recollections of your proposed design :)).

How much of this overlaps with Vladimir's proposal?
https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg00846.html
Paolo Bonzini Aug. 24, 2017, 5:42 p.m. UTC | #3
On 24/08/2017 19:37, Eric Blake wrote:
> On 08/24/2017 11:21 AM, Paolo Bonzini wrote:
>> On 24/08/2017 17:33, Stefan Hajnoczi wrote:
>>> This patch enters read_reply_co directly in
>>> nbd_client_attach_aio_context().  This is safe because new_context is
>>> acquired by the caller.  This ensures that read_reply_co reaches its
>>> first yield point and its ctx is set up.
>>
>> I'm not very confident with this patch.  aio_context_acquire/release is
>> going to go away, and this then becomes possible
>>
>> 	main context			new_context
>> 	qemu_aio_coroutine_enter
>> 					send request
>> 					wait for reply
>> 	read first reply
>> 	wake coroutine
>>
>> where the "wake coroutine" part thinks it's running in new_context, and
>> thus simply enters the coroutine instead of using the bottom half.
>>
>> But blk_co_preadv() should need the read_reply_co itself, in order to be
>> woken up after reading the reply header.  The core issue here is that
>> nbd_co_receive_reply was never called, I suspect.  And if it was never
>> called, read_reply_co should not be woken up by nbd_coroutine_end.
>>
>> So the fix is:
>>
>> 1) assign NULL to s->recv_coroutine[i] when nbd_co_send_request fails
>>
>> 2) move this to nbd_co_receive_reply:
>>
>>     s->recv_coroutine[i] = NULL;
>>
>>     /* Kick the read_reply_co to get the next reply.  */
>>     if (s->read_reply_co) {
>>         aio_co_wake(s->read_reply_co);
>>     }
>>
>> Does this make sense?  (Note that the read_reply_co idea actually came
>> from you, or from my recollections of your proposed design :)).
> 
> How much of this overlaps with Vladimir's proposal?
> https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg00846.html

The above should be about 15 lines added, 10 removed. :)

Paolo
Stefan Hajnoczi Aug. 25, 2017, 10:40 a.m. UTC | #4
On Thu, Aug 24, 2017 at 5:21 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 24/08/2017 17:33, Stefan Hajnoczi wrote:
>> This patch enters read_reply_co directly in
>> nbd_client_attach_aio_context().  This is safe because new_context is
>> acquired by the caller.  This ensures that read_reply_co reaches its
>> first yield point and its ctx is set up.
>
> I'm not very confident with this patch.  aio_context_acquire/release is
> going to go away, and this then becomes possible
>
>         main context                    new_context
>         qemu_aio_coroutine_enter
>                                         send request
>                                         wait for reply
>         read first reply
>         wake coroutine
>
> where the "wake coroutine" part thinks it's running in new_context, and
> thus simply enters the coroutine instead of using the bottom half.
>
> But blk_co_preadv() should need the read_reply_co itself, in order to be
> woken up after reading the reply header.  The core issue here is that
> nbd_co_receive_reply was never called, I suspect.  And if it was never
> called, read_reply_co should not be woken up by nbd_coroutine_end.
>
> So the fix is:
>
> 1) assign NULL to s->recv_coroutine[i] when nbd_co_send_request fails
>
> 2) move this to nbd_co_receive_reply:
>
>     s->recv_coroutine[i] = NULL;
>
>     /* Kick the read_reply_co to get the next reply.  */
>     if (s->read_reply_co) {
>         aio_co_wake(s->read_reply_co);
>     }
>
> Does this make sense?  (Note that the read_reply_co idea actually came
> from you, or from my recollections of your proposed design :)).

Yes, I think that would work.

Stefan
Vladimir Sementsov-Ogievskiy Aug. 25, 2017, 3:57 p.m. UTC | #5
24.08.2017 20:42, Paolo Bonzini wrote:
> On 24/08/2017 19:37, Eric Blake wrote:
>> On 08/24/2017 11:21 AM, Paolo Bonzini wrote:
>>> On 24/08/2017 17:33, Stefan Hajnoczi wrote:
>>>> This patch enters read_reply_co directly in
>>>> nbd_client_attach_aio_context().  This is safe because new_context is
>>>> acquired by the caller.  This ensures that read_reply_co reaches its
>>>> first yield point and its ctx is set up.
>>> I'm not very confident with this patch.  aio_context_acquire/release is
>>> going to go away, and this then becomes possible
>>>
>>> 	main context			new_context
>>> 	qemu_aio_coroutine_enter
>>> 					send request
>>> 					wait for reply
>>> 	read first reply
>>> 	wake coroutine
>>>
>>> where the "wake coroutine" part thinks it's running in new_context, and
>>> thus simply enters the coroutine instead of using the bottom half.
>>>
>>> But blk_co_preadv() should need the read_reply_co itself, in order to be
>>> woken up after reading the reply header.  The core issue here is that
>>> nbd_co_receive_reply was never called, I suspect.  And if it was never
>>> called, read_reply_co should not be woken up by nbd_coroutine_end.
>>>
>>> So the fix is:
>>>
>>> 1) assign NULL to s->recv_coroutine[i] when nbd_co_send_request fails
>>>
>>> 2) move this to nbd_co_receive_reply:
>>>
>>>      s->recv_coroutine[i] = NULL;
>>>
>>>      /* Kick the read_reply_co to get the next reply.  */
>>>      if (s->read_reply_co) {
>>>          aio_co_wake(s->read_reply_co);
>>>      }
>>>
>>> Does this make sense?  (Note that the read_reply_co idea actually came
>>> from you, or from my recollections of your proposed design :)).
>> How much of this overlaps with Vladimir's proposal?
>> https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg00846.html
> The above should be about 15 lines added, 10 removed. :)
>
> Paolo
>
>

I'll be on vacation for the next two weeks and will continue this work
after that.

I think we already have a lot of conflicts between my series and the
various fixes, so I'll rebase on top of them all, no problem.


PS: I think the recent problems and bugs in NBD are related to the
not-very-good separation between block/nbd-client.c and nbd/client.c.
Ideally, I think, block/nbd-client should not access the ioc directly;
it should just call one or two high-level functions of nbd/client.  The
big problem with this separation is the CMD_READ reply - only the real
client knows whether we should read a payload or not.  I have two ideas
on it:

1. We can add this information to the request handle, so nbd/client will
know from the handle that it should read the payload.  The length of the
payload should be in the handle too.  It looks possible, as the handle is
64 bit while the request len is 32 bit (see the sketch below).

2. Move part of NBDClientSession, with NBDClientRequest
requests[MAX_NBD_REQUESTS], to nbd/client, and implement one public
function nbd_request(state, *request, *reply) that does all the work for
block/nbd-client.
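
As an illustration of idea 1, here is a self-contained sketch of one
possible handle layout (hypothetical; none of these macros or helpers
exist in QEMU): the 64-bit handle carries the request index, a
"has payload" flag and the expected 32-bit payload length, so a generic
reply reader in nbd/client could decide from the handle alone whether and
how much payload to read.

#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define HANDLE_INDEX_BITS   16
#define HANDLE_PAYLOAD_BIT  (1ULL << HANDLE_INDEX_BITS)
#define HANDLE_LEN_SHIFT    32

static uint64_t make_handle(uint16_t index, bool has_payload, uint32_t len)
{
    return (uint64_t)index |
           (has_payload ? HANDLE_PAYLOAD_BIT : 0) |
           ((uint64_t)len << HANDLE_LEN_SHIFT);
}

static uint16_t handle_index(uint64_t handle)
{
    return handle & (HANDLE_PAYLOAD_BIT - 1);
}

static bool handle_has_payload(uint64_t handle)
{
    return handle & HANDLE_PAYLOAD_BIT;
}

static uint32_t handle_payload_len(uint64_t handle)
{
    return handle >> HANDLE_LEN_SHIFT;
}

int main(void)
{
    /* A CMD_READ of 512 bytes using request slot 3. */
    uint64_t h = make_handle(3, true, 512);

    assert(handle_index(h) == 3);
    assert(handle_has_payload(h));
    assert(handle_payload_len(h) == 512);
    printf("handle=0x%016" PRIx64 "\n", h);
    return 0;
}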

Patch

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 25bcaa2346..0a7f32779e 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -371,7 +371,7 @@  void nbd_client_attach_aio_context(BlockDriverState *bs,
 {
     NBDClientSession *client = nbd_get_client_session(bs);
     qio_channel_attach_aio_context(QIO_CHANNEL(client->ioc), new_context);
-    aio_co_schedule(new_context, client->read_reply_co);
+    qemu_aio_coroutine_enter(new_context, client->read_reply_co);
 }
 
 void nbd_client_close(BlockDriverState *bs)