[1/3] block: add BlockBackend->in_flight counter

Message ID 20180208171807.24267-2-stefanha@redhat.com
State New
Series block: fix blk_aio_*() segfault when blk->root == NULL

Commit Message

Stefan Hajnoczi Feb. 8, 2018, 5:18 p.m. UTC
BlockBackend currently relies on BlockDriverState->in_flight to track
requests for blk_drain().  There is a corner case where
BlockDriverState->in_flight cannot be used though: blk->root can be NULL
when there is no medium.  This results in a segfault when the NULL
pointer is dereferenced.

Introduce a BlockBackend->in_flight counter for aio requests so it works
even when blk->root == NULL.

Based on a patch by Kevin Wolf <kwolf@redhat.com>.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block.c               |  2 +-
 block/block-backend.c | 59 +++++++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 53 insertions(+), 8 deletions(-)
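
To make the corner case concrete, the sketch below is a toy model (plain C, not QEMU code) of the idea: requests must be counted on an object that still exists when no medium is inserted, which is what the new counter provides.

/* Toy model of the problem and the fix. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

typedef struct { atomic_uint in_flight; } Medium;   /* stands in for BlockDriverState */

typedef struct {
    Medium *root;               /* NULL when there is no medium */
    atomic_uint in_flight;      /* the counter this patch adds  */
} Backend;                      /* stands in for BlockBackend   */

static void start_request(Backend *blk)
{
    /* Counting on blk->root->in_flight would dereference NULL here when no
     * medium is inserted (e.g. an -ENOMEDIUM error callback is queued). */
    atomic_fetch_add(&blk->in_flight, 1);
}

static void complete_request(Backend *blk)
{
    atomic_fetch_sub(&blk->in_flight, 1);
}

static void drain(Backend *blk)
{
    /* In QEMU this loop would call aio_poll(); the toy completes inline. */
    while (atomic_load(&blk->in_flight) > 0) {
        complete_request(blk);
    }
}

int main(void)
{
    Backend blk = { .root = NULL };     /* no medium inserted */
    start_request(&blk);                /* counted on the backend, not the medium */
    drain(&blk);
    assert(atomic_load(&blk.in_flight) == 0);
    printf("drained with no medium\n");
    return 0;
}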

Comments

Eric Blake Feb. 9, 2018, 3:20 p.m. UTC | #1
On 02/08/2018 11:18 AM, Stefan Hajnoczi wrote:
> BlockBackend currently relies on BlockDriverState->in_flight to track
> requests for blk_drain().  There is a corner case where
> BlockDriverState->in_flight cannot be used though: blk->root can be NULL
> when there is no medium.  This results in a segfault when the NULL
> pointer is dereferenced.
> 
> Introduce a BlockBackend->in_flight counter for aio requests so it works
> even when blk->root == NULL.
> 
> Based on a patch by Kevin Wolf <kwolf@redhat.com>.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   block.c               |  2 +-
>   block/block-backend.c | 59 +++++++++++++++++++++++++++++++++++++++++++++------
>   2 files changed, 53 insertions(+), 8 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>
Paolo Bonzini Feb. 9, 2018, 4:28 p.m. UTC | #2
On 08/02/2018 18:18, Stefan Hajnoczi wrote:
> +    BlockDriverState *bs = blk_bs(blk);
> +
> +    if (bs) {
> +        bdrv_drained_begin(bs);
> +    }
> +
> +    /* We may have aio requests like -ENOMEDIUM in flight */
> +    while (atomic_mb_read(&blk->in_flight) > 0) {
> +        aio_poll(blk_get_aio_context(blk), true);
> +    }
> +
> +    if (bs) {
> +        bdrv_drained_end(bs);

This only works if you are in the default AioContext, otherwise you can
run handlers for one AioContext in two threads.

bdrv_dec_in_flight uses bdrv_wakeup and BDRV_POLL_WHILE to ensure that
this doesn't happen, so there would be more code that you have to copy
into block-backend.c.

Paolo
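
For readers unfamiliar with the drain machinery: the snippet below is a toy model (plain C with pthreads, not QEMU code) of the bdrv_wakeup()/BDRV_POLL_WHILE() pattern Paolo refers to. Only the thread that owns an AioContext may poll it; any other thread arms a wakeup flag and sleeps until the completion path signals it, instead of calling aio_poll() itself.

/* Toy model of "wait without polling someone else's context". */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_uint in_flight = 1;          /* one request pending */
static atomic_bool wakeup_needed = false;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  done = PTHREAD_COND_INITIALIZER;

/* Completion path, like bdrv_dec_in_flight() followed by bdrv_wakeup(). */
static void *complete_request(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    atomic_fetch_sub(&in_flight, 1);
    if (atomic_load(&wakeup_needed)) {
        pthread_cond_broadcast(&done);     /* wake the draining thread */
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t completer;
    pthread_create(&completer, NULL, complete_request, NULL);

    /* Draining thread that does NOT own the context: it must not call
     * aio_poll() on it, so it arms the flag and waits to be woken. */
    atomic_store(&wakeup_needed, true);
    pthread_mutex_lock(&lock);
    while (atomic_load(&in_flight) > 0) {
        pthread_cond_wait(&done, &lock);
    }
    pthread_mutex_unlock(&lock);

    pthread_join(completer, NULL);
    printf("drained without polling a foreign AioContext\n");
    return 0;
}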
Kevin Wolf Feb. 9, 2018, 5:23 p.m. UTC | #3
Am 09.02.2018 um 17:28 hat Paolo Bonzini geschrieben:
> On 08/02/2018 18:18, Stefan Hajnoczi wrote:
> > +    BlockDriverState *bs = blk_bs(blk);
> > +
> > +    if (bs) {
> > +        bdrv_drained_begin(bs);
> > +    }
> > +
> > +    /* We may have aio requests like -ENOMEDIUM in flight */
> > +    while (atomic_mb_read(&blk->in_flight) > 0) {
> > +        aio_poll(blk_get_aio_context(blk), true);
> > +    }
> > +
> > +    if (bs) {
> > +        bdrv_drained_end(bs);
> 
> This only works if you are in the default AioContext, otherwise you can
> run handlers for one AioContext in two threads.

Should aio_poll() assert that it's called in the right thread to make
sure we obey the rules?

> bdrv_dec_in_flight uses bdrv_wakeup and BDRV_POLL_WHILE to ensure that
> this doesn't happen, so there would be more code that you have to copy
> into block-backend.c.

Instead of copying, can't we generalise it into a POLL_WHILE(ctx,
wakeup, cond) and make BDRV_POLL_WHILE() a wrapper for that?

Kevin
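
Purely as a hypothetical illustration of that suggestion (the names below are invented for this sketch; this is not an existing QEMU API), the generalised primitive would take the context, the wakeup state and the condition, and BDRV_POLL_WHILE(bs, cond) would become a wrapper passing the BDS's context and wakeup field:

/* Hypothetical sketch only, compilable as a standalone toy. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    bool owned_by_current_thread;   /* stand-in for "is this my AioContext?" */
} ToyContext;

static bool toy_poll(ToyContext *ctx, bool blocking)
{
    (void)ctx; (void)blocking;      /* would dispatch ready handlers */
    return false;
}

#define TOY_POLL_WHILE(ctx, wakeup, cond) do {                              \
        ToyContext *ctx_ = (ctx);                                           \
        if (ctx_->owned_by_current_thread) {                                \
            while (cond) {                                                  \
                toy_poll(ctx_, true);      /* owner thread may poll */      \
            }                                                               \
        } else {                                                            \
            atomic_store((wakeup), true);  /* ask completions to kick us */ \
            while (cond) {                                                  \
                /* sleep until woken; never poll a foreign context */       \
            }                                                               \
            atomic_store((wakeup), false);                                  \
        }                                                                   \
    } while (0)

int main(void)
{
    ToyContext ctx = { .owned_by_current_thread = true };
    atomic_bool wakeup = false;
    atomic_uint in_flight = 0;          /* nothing pending in this toy */

    TOY_POLL_WHILE(&ctx, &wakeup, atomic_load(&in_flight) > 0);
    printf("done\n");
    return 0;
}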
Paolo Bonzini Feb. 9, 2018, 5:26 p.m. UTC | #4
On 09/02/2018 18:23, Kevin Wolf wrote:
> Am 09.02.2018 um 17:28 hat Paolo Bonzini geschrieben:
>> On 08/02/2018 18:18, Stefan Hajnoczi wrote:
>>> +    BlockDriverState *bs = blk_bs(blk);
>>> +
>>> +    if (bs) {
>>> +        bdrv_drained_begin(bs);
>>> +    }
>>> +
>>> +    /* We may have aio requests like -ENOMEDIUM in flight */
>>> +    while (atomic_mb_read(&blk->in_flight) > 0) {
>>> +        aio_poll(blk_get_aio_context(blk), true);
>>> +    }
>>> +
>>> +    if (bs) {
>>> +        bdrv_drained_end(bs);
>>
>> This only works if you are in the default AioContext, otherwise you can
>> run handlers for one AioContext in two threads.
> 
> Should aio_poll() assert that it's called in the right thread to make
> sure we obey the rules?
> 
>> bdrv_dec_in_flight uses bdrv_wakeup and BDRV_POLL_WHILE to ensure that
>> this doesn't happen, so there would be more code that you have to copy
>> into block-backend.c.
> 
> Instead of copying, can't we generalise it into a POLL_WHILE(ctx,
> wakeup, cond) and make BDRV_POLL_WHILE() a wrapper for that?

Yes, or even moving bdrv_wakeup to AioContext would do.  We already have
block layer-specific fields such as the linux-aio state(*).

	(*) though now that linux-aio has been improved to do something
	    like epoll, there may be a better reason to place linux-aio
	    state in AioContext.

Paolo
Stefan Hajnoczi Feb. 12, 2018, 2:08 p.m. UTC | #5
On Fri, Feb 09, 2018 at 06:26:41PM +0100, Paolo Bonzini wrote:
> On 09/02/2018 18:23, Kevin Wolf wrote:
> > Am 09.02.2018 um 17:28 hat Paolo Bonzini geschrieben:
> >> On 08/02/2018 18:18, Stefan Hajnoczi wrote:
> >>> +    BlockDriverState *bs = blk_bs(blk);
> >>> +
> >>> +    if (bs) {
> >>> +        bdrv_drained_begin(bs);
> >>> +    }
> >>> +
> >>> +    /* We may have aio requests like -ENOMEDIUM in flight */
> >>> +    while (atomic_mb_read(&blk->in_flight) > 0) {
> >>> +        aio_poll(blk_get_aio_context(blk), true);
> >>> +    }
> >>> +
> >>> +    if (bs) {
> >>> +        bdrv_drained_end(bs);
> >>
> >> This only works if you are in the default AioContext, otherwise you can
> >> run handlers for one AioContext in two threads.
> > 
> > Should aio_poll() assert that it's called in the right thread to make
> > sure we obey the rules?
> > 
> >> bdrv_dec_in_flight uses bdrv_wakeup and BDRV_POLL_WHILE to ensure that
> >> this doesn't happen, so there would be more code that you have to copy
> >> into block-backend.c.
> > 
> > Instead of copying, can't we generalise it into a POLL_WHILE(ctx,
> > wakeup, cond) and make BDRV_POLL_WHILE() a wrapper for that?
> 
> Yes, or even moving bdrv_wakeup to AioContext would do.  We already have
> block layer-specific fields such as the linux-aio state(*).
> 
> 	(*) though now that linux-aio has been improved to do something
> 	    like epoll, there may be a better reason to place linux-aio
> 	    state in AioContext.

Thanks for the ideas, will fix in v2.

Stefan

Patch

diff --git a/block.c b/block.c
index a8da4f2b25..140f7919f8 100644
--- a/block.c
+++ b/block.c
@@ -4716,7 +4716,7 @@  out:
 
 AioContext *bdrv_get_aio_context(BlockDriverState *bs)
 {
-    return bs->aio_context;
+    return bs ? bs->aio_context : qemu_get_aio_context();
 }
 
 void bdrv_coroutine_enter(BlockDriverState *bs, Coroutine *co)
diff --git a/block/block-backend.c b/block/block-backend.c
index baef8e7abc..4811d8bc55 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -71,6 +71,13 @@  struct BlockBackend {
     int quiesce_counter;
     VMChangeStateEntry *vmsh;
     bool force_allow_inactivate;
+
+    /* Number of in-flight aio requests.  BlockDriverState also counts
+     * in-flight requests but aio requests can exist even when blk->root is
+     * NULL, so we cannot rely on its counter for that case.
+     * Accessed with atomic ops.
+     */
+    unsigned int in_flight;
 };
 
 typedef struct BlockBackendAIOCB {
@@ -1223,11 +1230,21 @@  int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags)
     return bdrv_make_zero(blk->root, flags);
 }
 
+static void blk_inc_in_flight(BlockBackend *blk)
+{
+    atomic_inc(&blk->in_flight);
+}
+
+static void blk_dec_in_flight(BlockBackend *blk)
+{
+    atomic_dec(&blk->in_flight);
+}
+
 static void error_callback_bh(void *opaque)
 {
     struct BlockBackendAIOCB *acb = opaque;
 
-    bdrv_dec_in_flight(acb->common.bs);
+    blk_dec_in_flight(acb->blk);
     acb->common.cb(acb->common.opaque, acb->ret);
     qemu_aio_unref(acb);
 }
@@ -1238,7 +1255,7 @@  BlockAIOCB *blk_abort_aio_request(BlockBackend *blk,
 {
     struct BlockBackendAIOCB *acb;
 
-    bdrv_inc_in_flight(blk_bs(blk));
+    blk_inc_in_flight(blk);
     acb = blk_aio_get(&block_backend_aiocb_info, blk, cb, opaque);
     acb->blk = blk;
     acb->ret = ret;
@@ -1261,7 +1278,7 @@  static const AIOCBInfo blk_aio_em_aiocb_info = {
 static void blk_aio_complete(BlkAioEmAIOCB *acb)
 {
     if (acb->has_returned) {
-        bdrv_dec_in_flight(acb->common.bs);
+        blk_dec_in_flight(acb->rwco.blk);
         acb->common.cb(acb->common.opaque, acb->rwco.ret);
         qemu_aio_unref(acb);
     }
@@ -1282,7 +1299,7 @@  static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,
     BlkAioEmAIOCB *acb;
     Coroutine *co;
 
-    bdrv_inc_in_flight(blk_bs(blk));
+    blk_inc_in_flight(blk);
     acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
     acb->rwco = (BlkRwCo) {
         .blk    = blk,
@@ -1519,14 +1536,42 @@  int blk_flush(BlockBackend *blk)
 
 void blk_drain(BlockBackend *blk)
 {
-    if (blk_bs(blk)) {
-        bdrv_drain(blk_bs(blk));
+    BlockDriverState *bs = blk_bs(blk);
+
+    if (bs) {
+        bdrv_drained_begin(bs);
+    }
+
+    /* We may have aio requests like -ENOMEDIUM in flight */
+    while (atomic_mb_read(&blk->in_flight) > 0) {
+        aio_poll(blk_get_aio_context(blk), true);
+    }
+
+    if (bs) {
+        bdrv_drained_end(bs);
     }
 }
 
 void blk_drain_all(void)
 {
-    bdrv_drain_all();
+    BlockBackend *blk = NULL;
+
+    bdrv_drain_all_begin();
+
+    while ((blk = blk_all_next(blk)) != NULL) {
+        AioContext *ctx = blk_get_aio_context(blk);
+
+        aio_context_acquire(ctx);
+
+        /* We may have aio requests like -ENOMEDIUM in flight */
+        while (atomic_mb_read(&blk->in_flight) > 0) {
+            aio_poll(blk_get_aio_context(blk), true);
+        }
+
+        aio_context_release(ctx);
+    }
+
+    bdrv_drain_all_end();
 }
 
 void blk_set_on_error(BlockBackend *blk, BlockdevOnError on_read_error,