[for-2.9-rc5,v4,1/2] block: Walk bs->children carefully in bdrv_drain_recurse

Message ID 20170418143044.12187-2-famz@redhat.com
State New

Commit Message

Fam Zheng April 18, 2017, 2:30 p.m. UTC
The recursive bdrv_drain_recurse may run a block job completion BH that
drops nodes. The coming changes will make that more likely, and a
use-after-free would happen without this patch.

Stash the bs pointer and use bdrv_ref/bdrv_unref in addition to
QLIST_FOREACH_SAFE to prevent such a case from happening.

Since bdrv_unref accesses global state that is not protected by the AioContext
lock, we cannot use bdrv_ref/bdrv_unref unconditionally.  Fortunately the
protection is not needed in an IOThread because only the main loop can modify
a graph with the AioContext lock held.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/io.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)
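
The QLIST_FOREACH_SAFE part of the change can be illustrated outside QEMU.
The following is a minimal sketch, not QEMU code: Node, LIST_FOREACH_SAFE,
push, sum_and_free and demo_safe_iteration are hypothetical names made up for
this example. The macro is only modelled on the shape of QLIST_FOREACH_SAFE,
which caches the successor before the loop body runs so that the body may
remove the current element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical list node; it stands in for BdrvChild in this sketch. */
typedef struct Node {
    int value;
    struct Node *next;
} Node;

/* Same idea as QLIST_FOREACH_SAFE: read the successor into tmp before
 * the loop body runs, so the body may free the current node. */
#define LIST_FOREACH_SAFE(var, head, tmp) \
    for ((var) = (head); (var) && ((tmp) = (var)->next, 1); (var) = (tmp))

/* Prepends a new node and returns the new head. */
static Node *push(Node *head, int value)
{
    Node *n = malloc(sizeof(*n));
    assert(n != NULL);
    n->value = value;
    n->next = head;
    return n;
}

/* Frees every node while iterating and returns the sum of the values. */
static int sum_and_free(Node *head)
{
    Node *n, *tmp;
    int sum = 0;

    LIST_FOREACH_SAFE(n, head, tmp) {
        sum += n->value;
        free(n);    /* fine: the loop advances through tmp, not n->next */
    }
    return sum;
}

/* Builds a three-element list and tears it down during iteration. */
static int demo_safe_iteration(void)
{
    Node *head = NULL;

    head = push(head, 1);
    head = push(head, 2);
    head = push(head, 3);
    return sum_and_free(head);
}
```

Calling demo_safe_iteration() walks the list and frees each node without
touching freed memory. Caching the successor only covers removal of the
current element; it does not keep the object an element points to alive,
which is why the patch additionally takes a reference on bs.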

Comments

Kevin Wolf April 18, 2017, 2:46 p.m. UTC | #1
On 18.04.2017 at 16:30, Fam Zheng wrote:
> The recursive bdrv_drain_recurse may run a block job completion BH that
> drops nodes. The coming changes will make that more likely and use-after-free
> would happen without this patch
> 
> Stash the bs pointer and use bdrv_ref/bdrv_unref in addition to
> QLIST_FOREACH_SAFE to prevent such a case from happening.
> 
> Since bdrv_unref accesses global state that is not protected by the AioContext
> lock, we cannot use bdrv_ref/bdrv_unref unconditionally.  Fortunately the
> protection is not needed in IOThread because only main loop can modify a graph
> with the AioContext lock held.
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  block/io.c | 23 ++++++++++++++++++++---
>  1 file changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index 8706bfa..a0df8c4 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -158,7 +158,7 @@ bool bdrv_requests_pending(BlockDriverState *bs)
>  
>  static bool bdrv_drain_recurse(BlockDriverState *bs)
>  {
> -    BdrvChild *child;
> +    BdrvChild *child, *tmp;
>      bool waited;
>  
>      waited = BDRV_POLL_WHILE(bs, atomic_read(&bs->in_flight) > 0);
> @@ -167,8 +167,25 @@ static bool bdrv_drain_recurse(BlockDriverState *bs)
>          bs->drv->bdrv_drain(bs);
>      }
>  
> -    QLIST_FOREACH(child, &bs->children, next) {
> -        waited |= bdrv_drain_recurse(child->bs);
> +    QLIST_FOREACH_SAFE(child, &bs->children, next, tmp) {
> +        BlockDriverState *bs = child->bs;
> +        bool in_main_loop =
> +            qemu_get_current_aio_context() == qemu_get_aio_context();
> +        assert(bs->refcnt > 0);
> +        if (in_main_loop) {
> +            /* In case the resursive bdrv_drain_recurse processes a

s/resursive/recursive/

> +             * block_job_defer_to_main_loop BH and modifies the graph,
> +             * let's hold a reference to bs until we are done.
> +             *
> +             * IOThread doesn't have such a BH, and it is not safe to call
> +             * bdrv_unref without BQL, so skip doing it there.
> +             **/

And **/ is unusual, too.

> +            bdrv_ref(bs);
> +        }
> +        waited |= bdrv_drain_recurse(bs);
> +        if (in_main_loop) {
> +            bdrv_unref(bs);
> +        }
>      }

Other than this, the series looks good to me.

Kevin
Fam Zheng April 18, 2017, 2:54 p.m. UTC | #2
On Tue, 04/18 16:46, Kevin Wolf wrote:
> On 18.04.2017 at 16:30, Fam Zheng wrote:
> > The recursive bdrv_drain_recurse may run a block job completion BH that
> > drops nodes. The coming changes will make that more likely and use-after-free
> > would happen without this patch
> > 
> > Stash the bs pointer and use bdrv_ref/bdrv_unref in addition to
> > QLIST_FOREACH_SAFE to prevent such a case from happening.
> > 
> > Since bdrv_unref accesses global state that is not protected by the AioContext
> > lock, we cannot use bdrv_ref/bdrv_unref unconditionally.  Fortunately the
> > protection is not needed in IOThread because only main loop can modify a graph
> > with the AioContext lock held.
> > 
> > Signed-off-by: Fam Zheng <famz@redhat.com>
> > ---
> >  block/io.c | 23 ++++++++++++++++++++---
> >  1 file changed, 20 insertions(+), 3 deletions(-)
> > 
> > diff --git a/block/io.c b/block/io.c
> > index 8706bfa..a0df8c4 100644
> > --- a/block/io.c
> > +++ b/block/io.c
> > @@ -158,7 +158,7 @@ bool bdrv_requests_pending(BlockDriverState *bs)
> >  
> >  static bool bdrv_drain_recurse(BlockDriverState *bs)
> >  {
> > -    BdrvChild *child;
> > +    BdrvChild *child, *tmp;
> >      bool waited;
> >  
> >      waited = BDRV_POLL_WHILE(bs, atomic_read(&bs->in_flight) > 0);
> > @@ -167,8 +167,25 @@ static bool bdrv_drain_recurse(BlockDriverState *bs)
> >          bs->drv->bdrv_drain(bs);
> >      }
> >  
> > -    QLIST_FOREACH(child, &bs->children, next) {
> > -        waited |= bdrv_drain_recurse(child->bs);
> > +    QLIST_FOREACH_SAFE(child, &bs->children, next, tmp) {
> > +        BlockDriverState *bs = child->bs;
> > +        bool in_main_loop =
> > +            qemu_get_current_aio_context() == qemu_get_aio_context();
> > +        assert(bs->refcnt > 0);
> > +        if (in_main_loop) {
> > +            /* In case the resursive bdrv_drain_recurse processes a
> 
> s/resursive/recursive/
> 
> > +             * block_job_defer_to_main_loop BH and modifies the graph,
> > +             * let's hold a reference to bs until we are done.
> > +             *
> > +             * IOThread doesn't have such a BH, and it is not safe to call
> > +             * bdrv_unref without BQL, so skip doing it there.
> > +             **/
> 
> And **/ is unusual, too.
> 
> > +            bdrv_ref(bs);
> > +        }
> > +        waited |= bdrv_drain_recurse(bs);
> > +        if (in_main_loop) {
> > +            bdrv_unref(bs);
> > +        }
> >      }
> 
> Other than this, the series looks good to me.

Thanks, I'll fix them and send a pull request to Peter.

Fam

Patch

diff --git a/block/io.c b/block/io.c
index 8706bfa..a0df8c4 100644
--- a/block/io.c
+++ b/block/io.c
@@ -158,7 +158,7 @@  bool bdrv_requests_pending(BlockDriverState *bs)
 
 static bool bdrv_drain_recurse(BlockDriverState *bs)
 {
-    BdrvChild *child;
+    BdrvChild *child, *tmp;
     bool waited;
 
     waited = BDRV_POLL_WHILE(bs, atomic_read(&bs->in_flight) > 0);
@@ -167,8 +167,25 @@  static bool bdrv_drain_recurse(BlockDriverState *bs)
         bs->drv->bdrv_drain(bs);
     }
 
-    QLIST_FOREACH(child, &bs->children, next) {
-        waited |= bdrv_drain_recurse(child->bs);
+    QLIST_FOREACH_SAFE(child, &bs->children, next, tmp) {
+        BlockDriverState *bs = child->bs;
+        bool in_main_loop =
+            qemu_get_current_aio_context() == qemu_get_aio_context();
+        assert(bs->refcnt > 0);
+        if (in_main_loop) {
+            /* In case the recursive bdrv_drain_recurse processes a
+             * block_job_defer_to_main_loop BH and modifies the graph,
+             * let's hold a reference to bs until we are done.
+             *
+             * IOThread doesn't have such a BH, and it is not safe to call
+             * bdrv_unref without BQL, so skip doing it there.
+             */
+            bdrv_ref(bs);
+        }
+        waited |= bdrv_drain_recurse(bs);
+        if (in_main_loop) {
+            bdrv_unref(bs);
+        }
     }
 
     return waited;
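
The reference-holding pattern in the hunk above can be sketched in isolation.
This is an illustration under stated assumptions, not QEMU code: Obj, obj_new,
obj_ref, obj_unref, work_that_drops_owner_ref, process_with_ref and
demo_ref_hold are hypothetical names. obj_ref/obj_unref play the role of
bdrv_ref/bdrv_unref, and work_that_drops_owner_ref plays the role of a
recursive call that runs a completion BH and drops the node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object; it stands in for BlockDriverState. */
typedef struct Obj {
    int refcnt;
    bool *freed;    /* set to true on release, so the demo can observe it */
} Obj;

/* Allocates an object with one reference, owned by the caller. */
static Obj *obj_new(bool *freed)
{
    Obj *o = malloc(sizeof(*o));
    assert(o != NULL);
    o->refcnt = 1;
    o->freed = freed;
    *freed = false;
    return o;
}

static void obj_ref(Obj *o)
{
    o->refcnt++;
}

/* Drops one reference and frees the object when the count reaches zero. */
static void obj_unref(Obj *o)
{
    assert(o->refcnt > 0);
    if (--o->refcnt == 0) {
        *o->freed = true;
        free(o);
    }
}

/* Simulates work that, as a side effect, drops the owner's reference. */
static void work_that_drops_owner_ref(Obj *o)
{
    obj_unref(o);
}

/* Holds a temporary reference across the work so that o stays valid.
 * Returns true if the object was still alive right after the work. */
static bool process_with_ref(Obj *o, bool *freed)
{
    bool alive_after_work;

    obj_ref(o);
    work_that_drops_owner_ref(o);
    alive_after_work = !*freed;
    obj_unref(o);    /* last reference: the object is released here */
    return alive_after_work;
}

/* True if the object survived the work and was released afterwards. */
static bool demo_ref_hold(void)
{
    bool freed;
    Obj *o = obj_new(&freed);
    bool alive = process_with_ref(o, &freed);

    return alive && freed;
}
```

Without the obj_ref/obj_unref pair in process_with_ref, the object would be
freed inside the work function and any later access through o would be a
use-after-free. The sketch leaves out the in_main_loop condition from the
patch, which is there because bdrv_unref is not safe to call without the BQL.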