diff mbox

[v3] Do not delete BlockDriverState when deleting the drive

Message ID 20110328184616.GH20374@us.ibm.com
State New
Headers show

Commit Message

Ryan Harper March 28, 2011, 6:46 p.m. UTC
When removing a drive from the host-side via drive_del we currently have the
following path:

drive_del
qemu_aio_flush()
bdrv_close()    // zaps bs->drv, which makes any subsequent I/O get
                // dropped.  Works as designed
drive_uninit()
bdrv_delete()   // frees the bs.  Since the device is still connected to
                // bs, any subsequent I/O is a use-after-free.

The value of bs->drv becomes unpredictable on free.  As long as it
remains null, I/O still gets dropped, however it could become non-null at any
point after the free resulting SEGVs or other QEMU state corruption.

To resolve this issue as simply as possible, we can chose to not actually
delete the BlockDriverState pointer.  Since bdrv_close() handles setting the drv
pointer to NULL, we just need to remove the BlockDriverState from the QLIST
that is used to enumerate the block devices.  This is currently handled within
bdrv_delete, so move this into its own function, bdrv_make_anon().

The result is that we can now invoke drive_del, this closes the file descriptors
and sets BlockDriverState->drv to NULL which prevents futher IO to the device,
and since we do not free BlockDriverState, we don't have to worry about the copy
retained in the block devices.

We also don't attempt to remove the qdev property since we are no longer deleting
the BlockDriverState on drives with associated drives.  This also allows for
removing Drives with no devices associated either.

Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
---
v2->v3
 - Update drive_del use after free description
 - s/bdrv_remove/bdrv_make_anon/g
 - Don't remove qdev property since we don't delete bs any more
 - If (bs->peer) bdrv_make_anon else bdrv_delete to handle removing
   drives with no device.
v1->v2
  - NULL bs->device_name after removing from list to prevent
    second removal.

 block.c    |   13 ++++++++++---
 block.h    |    1 +
 blockdev.c |   25 ++++++++-----------------
 3 files changed, 19 insertions(+), 20 deletions(-)

Comments

Markus Armbruster March 29, 2011, 7:40 a.m. UTC | #1
Ryan Harper <ryanh@us.ibm.com> writes:

> When removing a drive from the host-side via drive_del we currently have the
> following path:
>
> drive_del
> qemu_aio_flush()
> bdrv_close()    // zaps bs->drv, which makes any subsequent I/O get
>                 // dropped.  Works as designed
> drive_uninit()
> bdrv_delete()   // frees the bs.  Since the device is still connected to
>                 // bs, any subsequent I/O is a use-after-free.
>
> The value of bs->drv becomes unpredictable on free.  As long as it
> remains null, I/O still gets dropped, however it could become non-null at any
> point after the free resulting SEGVs or other QEMU state corruption.
>
> To resolve this issue as simply as possible, we can chose to not actually
> delete the BlockDriverState pointer.  Since bdrv_close() handles setting the drv
> pointer to NULL, we just need to remove the BlockDriverState from the QLIST
> that is used to enumerate the block devices.  This is currently handled within
> bdrv_delete, so move this into its own function, bdrv_make_anon().
>
> The result is that we can now invoke drive_del, this closes the file descriptors
> and sets BlockDriverState->drv to NULL which prevents futher IO to the device,
> and since we do not free BlockDriverState, we don't have to worry about the copy
> retained in the block devices.
>
> We also don't attempt to remove the qdev property since we are no longer deleting
> the BlockDriverState on drives with associated drives.  This also allows for
> removing Drives with no devices associated either.
>
> Reported-by: Markus Armbruster <armbru@redhat.com>
> Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
> ---
> v2->v3
>  - Update drive_del use after free description
>  - s/bdrv_remove/bdrv_make_anon/g
>  - Don't remove qdev property since we don't delete bs any more
>  - If (bs->peer) bdrv_make_anon else bdrv_delete to handle removing
>    drives with no device.
> v1->v2
>   - NULL bs->device_name after removing from list to prevent
>     second removal.
>
>  block.c    |   13 ++++++++++---
>  block.h    |    1 +
>  blockdev.c |   25 ++++++++-----------------
>  3 files changed, 19 insertions(+), 20 deletions(-)
>
> diff --git a/block.c b/block.c
> index c8e2f97..6a5d3f2 100644
> --- a/block.c
> +++ b/block.c
> @@ -697,14 +697,21 @@ void bdrv_close_all(void)
>      }
>  }
>  
> +/* make a BlockDriverState anonymous by removing from bdrv_state list.
> +   Also, NULL terminate the device_name to prevent double remove */
> +void bdrv_make_anon(BlockDriverState *bs)
> +{
> +    if (bs->device_name[0] != '\0') {
> +        QTAILQ_REMOVE(&bdrv_states, bs, list);
> +    }

You lost

+    bs->device_name[0] = '\0';

since v2.  Oops.

> +}
> +
>  void bdrv_delete(BlockDriverState *bs)
>  {
>      assert(!bs->peer);
>  
>      /* remove from list, if necessary */
> -    if (bs->device_name[0] != '\0') {
> -        QTAILQ_REMOVE(&bdrv_states, bs, list);
> -    }
> +    bdrv_make_anon(bs);
>  
>      bdrv_close(bs);
>      if (bs->file != NULL) {
> diff --git a/block.h b/block.h
> index 5d78fc0..52e9cad 100644
> --- a/block.h
> +++ b/block.h
> @@ -66,6 +66,7 @@ int bdrv_create(BlockDriver *drv, const char* filename,
>      QEMUOptionParameter *options);
>  int bdrv_create_file(const char* filename, QEMUOptionParameter *options);
>  BlockDriverState *bdrv_new(const char *device_name);
> +void bdrv_make_anon(BlockDriverState *bs);
>  void bdrv_delete(BlockDriverState *bs);
>  int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags);
>  int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
> diff --git a/blockdev.c b/blockdev.c
> index ecf2252..2c0eb06 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -737,8 +737,6 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
>  {
>      const char *id = qdict_get_str(qdict, "id");
>      BlockDriverState *bs;
> -    BlockDriverState **ptr;
> -    Property *prop;
>  
>      bs = bdrv_find(id);
>      if (!bs) {
> @@ -755,24 +753,17 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
>      bdrv_flush(bs);
>      bdrv_close(bs);
>  
> -    /* clean up guest state from pointing to host resource by
> -     * finding and removing DeviceState "drive" property */
> +    /* if we have a device associated with this BlockDriverState (bs->peer)
> +     * then we need to make the drive anonymous until the device
> +     * can be removed.  If this is a drive with no device backing
> +     * then we can just get rid of the block driver state right here.
> +     */
>      if (bs->peer) {
> -        for (prop = bs->peer->info->props; prop && prop->name; prop++) {
> -            if (prop->info->type == PROP_TYPE_DRIVE) {
> -                ptr = qdev_get_prop_ptr(bs->peer, prop);
> -                if (*ptr == bs) {
> -                    bdrv_detach(bs, bs->peer);
> -                    *ptr = NULL;
> -                    break;
> -                }
> -            }
> -        }
> +        bdrv_make_anon(bs);
> +    } else {
> +        bdrv_delete(bs);

Don't you need drive_uninit() here?

>      }
>  
> -    /* clean up host side */
> -    drive_uninit(drive_get_by_blockdev(bs));
> -
>      return 0;
>  }
>  
> -- 
> 1.7.1
Markus Armbruster March 29, 2011, 9:01 a.m. UTC | #2
Since you have to respin anyway, would you mind limiting commit message
line length to 70-75 characters?  Thanks.
Ryan Harper March 29, 2011, 3:41 p.m. UTC | #3
* Markus Armbruster <armbru@redhat.com> [2011-03-29 02:44]:
> Ryan Harper <ryanh@us.ibm.com> writes:
> 
> > When removing a drive from the host-side via drive_del we currently have the
> > following path:
> >
> > drive_del
> > qemu_aio_flush()
> > bdrv_close()    // zaps bs->drv, which makes any subsequent I/O get
> >                 // dropped.  Works as designed
> > drive_uninit()
> > bdrv_delete()   // frees the bs.  Since the device is still connected to
> >                 // bs, any subsequent I/O is a use-after-free.
> >
> > The value of bs->drv becomes unpredictable on free.  As long as it
> > remains null, I/O still gets dropped, however it could become non-null at any
> > point after the free resulting SEGVs or other QEMU state corruption.
> >
> > To resolve this issue as simply as possible, we can chose to not actually
> > delete the BlockDriverState pointer.  Since bdrv_close() handles setting the drv
> > pointer to NULL, we just need to remove the BlockDriverState from the QLIST
> > that is used to enumerate the block devices.  This is currently handled within
> > bdrv_delete, so move this into its own function, bdrv_make_anon().
> >
> > The result is that we can now invoke drive_del, this closes the file descriptors
> > and sets BlockDriverState->drv to NULL which prevents futher IO to the device,
> > and since we do not free BlockDriverState, we don't have to worry about the copy
> > retained in the block devices.
> >
> > We also don't attempt to remove the qdev property since we are no longer deleting
> > the BlockDriverState on drives with associated drives.  This also allows for
> > removing Drives with no devices associated either.
> >
> > Reported-by: Markus Armbruster <armbru@redhat.com>
> > Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
> > ---
> > v2->v3
> >  - Update drive_del use after free description
> >  - s/bdrv_remove/bdrv_make_anon/g
> >  - Don't remove qdev property since we don't delete bs any more
> >  - If (bs->peer) bdrv_make_anon else bdrv_delete to handle removing
> >    drives with no device.
> > v1->v2
> >   - NULL bs->device_name after removing from list to prevent
> >     second removal.
> >
> >  block.c    |   13 ++++++++++---
> >  block.h    |    1 +
> >  blockdev.c |   25 ++++++++-----------------
> >  3 files changed, 19 insertions(+), 20 deletions(-)
> >
> > diff --git a/block.c b/block.c
> > index c8e2f97..6a5d3f2 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -697,14 +697,21 @@ void bdrv_close_all(void)
> >      }
> >  }
> >  
> > +/* make a BlockDriverState anonymous by removing from bdrv_state list.
> > +   Also, NULL terminate the device_name to prevent double remove */
> > +void bdrv_make_anon(BlockDriverState *bs)
> > +{
> > +    if (bs->device_name[0] != '\0') {
> > +        QTAILQ_REMOVE(&bdrv_states, bs, list);
> > +    }
> 
> You lost
> 
> +    bs->device_name[0] = '\0';
> 
> since v2.  Oops.

Crap.  

> 
> > +}
> > +
> >  void bdrv_delete(BlockDriverState *bs)
> >  {
> >      assert(!bs->peer);
> >  
> >      /* remove from list, if necessary */
> > -    if (bs->device_name[0] != '\0') {
> > -        QTAILQ_REMOVE(&bdrv_states, bs, list);
> > -    }
> > +    bdrv_make_anon(bs);
> >  
> >      bdrv_close(bs);
> >      if (bs->file != NULL) {
> > diff --git a/block.h b/block.h
> > index 5d78fc0..52e9cad 100644
> > --- a/block.h
> > +++ b/block.h
> > @@ -66,6 +66,7 @@ int bdrv_create(BlockDriver *drv, const char* filename,
> >      QEMUOptionParameter *options);
> >  int bdrv_create_file(const char* filename, QEMUOptionParameter *options);
> >  BlockDriverState *bdrv_new(const char *device_name);
> > +void bdrv_make_anon(BlockDriverState *bs);
> >  void bdrv_delete(BlockDriverState *bs);
> >  int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags);
> >  int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
> > diff --git a/blockdev.c b/blockdev.c
> > index ecf2252..2c0eb06 100644
> > --- a/blockdev.c
> > +++ b/blockdev.c
> > @@ -737,8 +737,6 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
> >  {
> >      const char *id = qdict_get_str(qdict, "id");
> >      BlockDriverState *bs;
> > -    BlockDriverState **ptr;
> > -    Property *prop;
> >  
> >      bs = bdrv_find(id);
> >      if (!bs) {
> > @@ -755,24 +753,17 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
> >      bdrv_flush(bs);
> >      bdrv_close(bs);
> >  
> > -    /* clean up guest state from pointing to host resource by
> > -     * finding and removing DeviceState "drive" property */
> > +    /* if we have a device associated with this BlockDriverState (bs->peer)
> > +     * then we need to make the drive anonymous until the device
> > +     * can be removed.  If this is a drive with no device backing
> > +     * then we can just get rid of the block driver state right here.
> > +     */
> >      if (bs->peer) {
> > -        for (prop = bs->peer->info->props; prop && prop->name; prop++) {
> > -            if (prop->info->type == PROP_TYPE_DRIVE) {
> > -                ptr = qdev_get_prop_ptr(bs->peer, prop);
> > -                if (*ptr == bs) {
> > -                    bdrv_detach(bs, bs->peer);
> > -                    *ptr = NULL;
> > -                    break;
> > -                }
> > -            }
> > -        }
> > +        bdrv_make_anon(bs);
> > +    } else {
> > +        bdrv_delete(bs);
> 
> Don't you need drive_uninit() here?

You're right.  We don't need it if we have a device associated since the
qdev cleanup code on device removal will remove it (via auto_delete).

I'll fix that up as well.
Ryan Harper March 29, 2011, 3:41 p.m. UTC | #4
* Markus Armbruster <armbru@redhat.com> [2011-03-29 04:06]:
> Since you have to respin anyway, would you mind limiting commit message
> line length to 70-75 characters?  Thanks.

yep
diff mbox

Patch

diff --git a/block.c b/block.c
index c8e2f97..6a5d3f2 100644
--- a/block.c
+++ b/block.c
@@ -697,14 +697,21 @@  void bdrv_close_all(void)
     }
 }
 
+/* make a BlockDriverState anonymous by removing from bdrv_state list.
+   Also, NULL terminate the device_name to prevent double remove */
+void bdrv_make_anon(BlockDriverState *bs)
+{
+    if (bs->device_name[0] != '\0') {
+        QTAILQ_REMOVE(&bdrv_states, bs, list);
+    }
+}
+
 void bdrv_delete(BlockDriverState *bs)
 {
     assert(!bs->peer);
 
     /* remove from list, if necessary */
-    if (bs->device_name[0] != '\0') {
-        QTAILQ_REMOVE(&bdrv_states, bs, list);
-    }
+    bdrv_make_anon(bs);
 
     bdrv_close(bs);
     if (bs->file != NULL) {
diff --git a/block.h b/block.h
index 5d78fc0..52e9cad 100644
--- a/block.h
+++ b/block.h
@@ -66,6 +66,7 @@  int bdrv_create(BlockDriver *drv, const char* filename,
     QEMUOptionParameter *options);
 int bdrv_create_file(const char* filename, QEMUOptionParameter *options);
 BlockDriverState *bdrv_new(const char *device_name);
+void bdrv_make_anon(BlockDriverState *bs);
 void bdrv_delete(BlockDriverState *bs);
 int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags);
 int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
diff --git a/blockdev.c b/blockdev.c
index ecf2252..2c0eb06 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -737,8 +737,6 @@  int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
 {
     const char *id = qdict_get_str(qdict, "id");
     BlockDriverState *bs;
-    BlockDriverState **ptr;
-    Property *prop;
 
     bs = bdrv_find(id);
     if (!bs) {
@@ -755,24 +753,17 @@  int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
     bdrv_flush(bs);
     bdrv_close(bs);
 
-    /* clean up guest state from pointing to host resource by
-     * finding and removing DeviceState "drive" property */
+    /* if we have a device associated with this BlockDriverState (bs->peer)
+     * then we need to make the drive anonymous until the device
+     * can be removed.  If this is a drive with no device backing
+     * then we can just get rid of the block driver state right here.
+     */
     if (bs->peer) {
-        for (prop = bs->peer->info->props; prop && prop->name; prop++) {
-            if (prop->info->type == PROP_TYPE_DRIVE) {
-                ptr = qdev_get_prop_ptr(bs->peer, prop);
-                if (*ptr == bs) {
-                    bdrv_detach(bs, bs->peer);
-                    *ptr = NULL;
-                    break;
-                }
-            }
-        }
+        bdrv_make_anon(bs);
+    } else {
+        bdrv_delete(bs);
     }
 
-    /* clean up host side */
-    drive_uninit(drive_get_by_blockdev(bs));
-
     return 0;
 }