Patchwork [v2] Do not delete BlockDriverState when deleting the drive

login
register
mail settings
Submitter Ryan Harper
Date March 7, 2011, 3:59 p.m.
Message ID <20110307155918.GV23238@us.ibm.com>
Download mbox | patch
Permalink /patch/85762/
State New
Headers show

Comments

Ryan Harper - March 7, 2011, 3:59 p.m.
When removing a drive from the host-side via drive_del we currently have the
following path:

drive_del
qemu_aio_flush()
bdrv_close()
drive_uninit()
bdrv_delete()

When we bdrv_delete() we end up qemu_free() the BlockDriverState pointer
however, the block devices retain a copy of this pointer, see
hw/virtio-blk.c:virtio_blk_init() where we s->bs = conf->bs.

We now have a use-after-free situation.  If the guest continues to issue IO
against the device, and we've reallocated the memory that the BlockDriverState
pointed at, then we will fail the bs->drv checks in the various bdrv_ methods.

To resolve this issue as simply as possible, we can chose to not actually
delete the BlockDriverState pointer.  Since bdrv_close() handles setting the drv
pointer to NULL, we just need to remove the BlockDriverState from the QLIST
that is used to enumerate the block devices.  This is currently handled within
bdrv_delete, so move this into it's own function, bdrv_remove().

The result is that we can now invoke drive_del, this closes the file descriptors
and sets BlockDriverState->drv to NULL which prevents futher IO to the device,
and since we do not free BlockDriverState, we don't have to worry about the copy
retained in the block devices.

Reported-by: Marcus Armbruster <armbru@redhat.com>
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
---
v1->v2
 - NULL bs->device_name after removing from list to prevent
   second removal.

 block.c    |   12 +++++++++---
 block.h    |    1 +
 blockdev.c |    2 +-
 3 files changed, 11 insertions(+), 4 deletions(-)
Markus Armbruster - March 15, 2011, 9:47 a.m.
Sorry for the long delay, I was out of action for a week.

Ryan Harper <ryanh@us.ibm.com> writes:

> When removing a drive from the host-side via drive_del we currently have the
> following path:
>
> drive_del
> qemu_aio_flush()
> bdrv_close()
> drive_uninit()
> bdrv_delete()
>
> When we bdrv_delete() we end up qemu_free() the BlockDriverState pointer
> however, the block devices retain a copy of this pointer, see
> hw/virtio-blk.c:virtio_blk_init() where we s->bs = conf->bs.
>
> We now have a use-after-free situation.  If the guest continues to issue IO
> against the device, and we've reallocated the memory that the BlockDriverState
> pointed at, then we will fail the bs->drv checks in the various bdrv_ methods.

"we will fail the bs->drv checks" is misleading, in my opinion.  Here's
what happens:

1. bdrv_close(bs) zaps bs->drv, which makes any subsequent I/O get
   dropped.  Works as designed.

2. drive_uninit() frees the bs.  Since the device is still connected to
   bs, any subsequent I/O is a use-after-free.

   The value of bs->drv becomes unpredictable on free.  As long as it
   remains null, I/O still gets dropped.  I/O crashes or worse once that
   changed.  Could be right on free, could be much later.

If you respin anyway, please clarify your description.

> To resolve this issue as simply as possible, we can chose to not actually
> delete the BlockDriverState pointer.  Since bdrv_close() handles setting the drv
> pointer to NULL, we just need to remove the BlockDriverState from the QLIST
> that is used to enumerate the block devices.  This is currently handled within
> bdrv_delete, so move this into it's own function, bdrv_remove().

Why do we remove the BlockDriverState from bdrv_states?  Because we want
drive_del make its *name* go away.

Begs the question: is the code prepared for a BlockDriverState object
that isn't on bdrv_states?  Turns out we're in luck: bdrv_new() already
creates such objects when the device_name is empty.  This is used for
internal BlockDriverStates such as COW backing files.  Your code makes
device_name empty when taking the object off bdrv_states, so we're good.

Begs yet another question: how does the behavior of a BlockDriverState
change when it's taken off bdrv_states, and is that the behavior we
want?  Changes:

* bdrv_delete() no longer takes it off bdrv_states.  Good.

* bdrv_close_all(), bdrv_commit_all() and bdrv_flush_all() no longer
  cover it.  Okay, because bdrv_close(), bdrv_commit() and bdrv_flush()
  do nothing anyway for closed BlockDriverStates.

* "info block" and "info blockstats" no longer show it, because
  bdrv_info() and bdrv_info_stats() no longer see it.  Okay.

* bdrv_find(), bdrv_next(), bdrv_iterate() no longer see it.  Impact?
  Please check their uses and report.

> The result is that we can now invoke drive_del, this closes the file descriptors
> and sets BlockDriverState->drv to NULL which prevents futher IO to the device,
> and since we do not free BlockDriverState, we don't have to worry about the copy
> retained in the block devices.

Yep.  But there's one more question: is the BlockDriverState freed when
the device using it gets destroyed?

qdev_free() runs prop->info->free() for all properties.  The drive
property's free() is free_drive():

  static void free_drive(DeviceState *dev, Property *prop)
  {
      BlockDriverState **ptr = qdev_get_prop_ptr(dev, prop);

      if (*ptr) {
          bdrv_detach(*ptr, dev);
          blockdev_auto_del(*ptr);
      }
  }

This should indeed delete the drive.  But only if the property still
points to it.  See below.

> Reported-by: Marcus Armbruster <armbru@redhat.com>
> Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
> ---
> v1->v2
>  - NULL bs->device_name after removing from list to prevent
>    second removal.
>
>  block.c    |   12 +++++++++---
>  block.h    |    1 +
>  blockdev.c |    2 +-
>  3 files changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/block.c b/block.c
> index 1544d81..0df9942 100644
> --- a/block.c
> +++ b/block.c
> @@ -697,14 +697,20 @@ void bdrv_close_all(void)
>      }
>  }
>  
> +void bdrv_remove(BlockDriverState *bs)
> +{
> +    if (bs->device_name[0] != '\0') {
> +        QTAILQ_REMOVE(&bdrv_states, bs, list);
> +    }
> +    bs->device_name[0] = '\0';
> +}
> +

I don't like this name.  What's the difference between "delete" and
"remove"?

The function zaps the device name.  bdrv_make_anon()?

>  void bdrv_delete(BlockDriverState *bs)
>  {
>      assert(!bs->peer);
>  
>      /* remove from list, if necessary */
> -    if (bs->device_name[0] != '\0') {
> -        QTAILQ_REMOVE(&bdrv_states, bs, list);
> -    }
> +    bdrv_remove(bs);
>  
>      bdrv_close(bs);
>      if (bs->file != NULL) {
> diff --git a/block.h b/block.h
> index 5d78fc0..8447397 100644
> --- a/block.h
> +++ b/block.h
> @@ -66,6 +66,7 @@ int bdrv_create(BlockDriver *drv, const char* filename,
>      QEMUOptionParameter *options);
>  int bdrv_create_file(const char* filename, QEMUOptionParameter *options);
>  BlockDriverState *bdrv_new(const char *device_name);
> +void bdrv_remove(BlockDriverState *bs);
>  void bdrv_delete(BlockDriverState *bs);
>  int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags);
>  int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
> diff --git a/blockdev.c b/blockdev.c
> index 0690cc8..1f57b50 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -760,7 +760,7 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)

Let me add a bit more context:

      bdrv_flush(bs);
      bdrv_close(bs);

      /* clean up guest state from pointing to host resource by
       * finding and removing DeviceState "drive" property */
      if (bs->peer) {
          for (prop = bs->peer->info->props; prop && prop->name; prop++) {
              if (prop->info->type == PROP_TYPE_DRIVE) {
                  ptr = qdev_get_prop_ptr(bs->peer, prop);
                  if (*ptr == bs) {
                      bdrv_detach(bs, bs->peer);
                      *ptr = NULL;
                      break;
                  }
              }
          }
>      }

This zaps the drive property.  A subsequent free_drive() will do
nothing.  We leak the BlockDriverState on device unplug, I'm afraid.

Any reason why we still want to zap the drive property?

>  
>      /* clean up host side */
> -    drive_uninit(drive_get_by_blockdev(bs));
> +    bdrv_remove(bs);
>  
>      return 0;
>  }
Ryan Harper - March 23, 2011, 1:53 a.m.
* Markus Armbruster <armbru@redhat.com> [2011-03-15 04:48]:
> Sorry for the long delay, I was out of action for a week.
> 
> Ryan Harper <ryanh@us.ibm.com> writes:
> 
> > When removing a drive from the host-side via drive_del we currently have the
> > following path:
> >
> > drive_del
> > qemu_aio_flush()
> > bdrv_close()
> > drive_uninit()
> > bdrv_delete()
> >
> > When we bdrv_delete() we end up qemu_free() the BlockDriverState pointer
> > however, the block devices retain a copy of this pointer, see
> > hw/virtio-blk.c:virtio_blk_init() where we s->bs = conf->bs.
> >
> > We now have a use-after-free situation.  If the guest continues to issue IO
> > against the device, and we've reallocated the memory that the BlockDriverState
> > pointed at, then we will fail the bs->drv checks in the various bdrv_ methods.
> 
> "we will fail the bs->drv checks" is misleading, in my opinion.  Here's
> what happens:
> 
> 1. bdrv_close(bs) zaps bs->drv, which makes any subsequent I/O get
>    dropped.  Works as designed.
> 
> 2. drive_uninit() frees the bs.  Since the device is still connected to
>    bs, any subsequent I/O is a use-after-free.
> 
>    The value of bs->drv becomes unpredictable on free.  As long as it
>    remains null, I/O still gets dropped.  I/O crashes or worse once that
>    changed.  Could be right on free, could be much later.
> 
> If you respin anyway, please clarify your description.

Sure.  I wasn't planning a new version, but I'll update and send anyhow
as I didn't see it get included in pull from the block branch.
> 
> > To resolve this issue as simply as possible, we can chose to not actually
> > delete the BlockDriverState pointer.  Since bdrv_close() handles setting the drv
> > pointer to NULL, we just need to remove the BlockDriverState from the QLIST
> > that is used to enumerate the block devices.  This is currently handled within
> > bdrv_delete, so move this into it's own function, bdrv_remove().
> 
> Why do we remove the BlockDriverState from bdrv_states?  Because we want
> drive_del make its *name* go away.
> 
> Begs the question: is the code prepared for a BlockDriverState object
> that isn't on bdrv_states?  Turns out we're in luck: bdrv_new() already
> creates such objects when the device_name is empty.  This is used for
> internal BlockDriverStates such as COW backing files.  Your code makes
> device_name empty when taking the object off bdrv_states, so we're good.
> 
> Begs yet another question: how does the behavior of a BlockDriverState
> change when it's taken off bdrv_states, and is that the behavior we
> want?  Changes:
> 
> * bdrv_delete() no longer takes it off bdrv_states.  Good.
> 
> * bdrv_close_all(), bdrv_commit_all() and bdrv_flush_all() no longer
>   cover it.  Okay, because bdrv_close(), bdrv_commit() and bdrv_flush()
>   do nothing anyway for closed BlockDriverStates.
> 
> * "info block" and "info blockstats" no longer show it, because
>   bdrv_info() and bdrv_info_stats() no longer see it.  Okay.
> 
> * bdrv_find(), bdrv_next(), bdrv_iterate() no longer see it.  Impact?
>   Please check their uses and report.

   1    664  block-migration.c <<block_load>>
             bs = bdrv_find(device_name);

    - no longer see it.  This is fine since we can't migrate a block
    device that has been removed

   2    562  blockdev.c <<do_commit>>
             bs = bdrv_find(device);

    - do_commit won't see it in either when calling bdrv_commit_all()
    Fine as you mention above.  If user specifies the device name
    we won't find it, that's OK because we can't commit data against
    a closed BlockDriverState.

   3    587  blockdev.c <<do_snapshot_blkdev>>
             bs = bdrv_find(device);

   - OK, cannot take a snapshot against a deleted BlockDriverState

   4    662  blockdev.c <<do_eject>>
             bs = bdrv_find(filename);

   - OK, cannot eject a deleted BlockDriverState; 

   5    676  blockdev.c <<do_block_set_passwd>>
             bs = bdrv_find(qdict_get_str(qdict, "device"));

   - OK, cannot set password a deleted BlockDriverState; 

   6    701  blockdev.c <<do_change_block>>
             bs = bdrv_find(device);

   - OK, cannot change the file/device of a deleted BlockDriverState; 

   7    732  blockdev.c <<do_drive_del>>
             bs = bdrv_find(id);

   - OK, cannot delete an already deleted Drive

   8    783  blockdev.c <<do_block_resize>>
             bs = bdrv_find(device);

   - OK, cannot resize a deleted Drive

   9    312  hw/qdev-properties.c <<parse_drive>>
             bs = bdrv_find(str);

   - Used when invoking qdev_prop_drive .parse method;  parse method is invoked via
    qdev_device_add() which calls set_property() which invokes parse.  AFAICT, this is OK
    since we won't be going down the device add path worrying about a
    deleted block device.

> 
> > The result is that we can now invoke drive_del, this closes the file descriptors
> > and sets BlockDriverState->drv to NULL which prevents futher IO to the device,
> > and since we do not free BlockDriverState, we don't have to worry about the copy
> > retained in the block devices.
> 
> Yep.  But there's one more question: is the BlockDriverState freed when
> the device using it gets destroyed?
> 
> qdev_free() runs prop->info->free() for all properties.  The drive
> property's free() is free_drive():
> 
>   static void free_drive(DeviceState *dev, Property *prop)
>   {
>       BlockDriverState **ptr = qdev_get_prop_ptr(dev, prop);
> 
>       if (*ptr) {
>           bdrv_detach(*ptr, dev);
>           blockdev_auto_del(*ptr);
>       }
>   }
> 
> This should indeed delete the drive.  But only if the property still
> points to it.  See below.
> 
> > Reported-by: Marcus Armbruster <armbru@redhat.com>
> > Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
> > ---
> > v1->v2
> >  - NULL bs->device_name after removing from list to prevent
> >    second removal.
> >
> >  block.c    |   12 +++++++++---
> >  block.h    |    1 +
> >  blockdev.c |    2 +-
> >  3 files changed, 11 insertions(+), 4 deletions(-)
> >
> > diff --git a/block.c b/block.c
> > index 1544d81..0df9942 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -697,14 +697,20 @@ void bdrv_close_all(void)
> >      }
> >  }
> >  
> > +void bdrv_remove(BlockDriverState *bs)
> > +{
> > +    if (bs->device_name[0] != '\0') {
> > +        QTAILQ_REMOVE(&bdrv_states, bs, list);
> > +    }
> > +    bs->device_name[0] = '\0';
> > +}
> > +
> 
> I don't like this name.  What's the difference between "delete" and
> "remove"?
> 
> The function zaps the device name.  bdrv_make_anon()?

bdrv_delist? bdrv_hide?  I'm also fine with make_anon.

> 
> >  void bdrv_delete(BlockDriverState *bs)
> >  {
> >      assert(!bs->peer);
> >  
> >      /* remove from list, if necessary */
> > -    if (bs->device_name[0] != '\0') {
> > -        QTAILQ_REMOVE(&bdrv_states, bs, list);
> > -    }
> > +    bdrv_remove(bs);
> >  
> >      bdrv_close(bs);
> >      if (bs->file != NULL) {
> > diff --git a/block.h b/block.h
> > index 5d78fc0..8447397 100644
> > --- a/block.h
> > +++ b/block.h
> > @@ -66,6 +66,7 @@ int bdrv_create(BlockDriver *drv, const char* filename,
> >      QEMUOptionParameter *options);
> >  int bdrv_create_file(const char* filename, QEMUOptionParameter *options);
> >  BlockDriverState *bdrv_new(const char *device_name);
> > +void bdrv_remove(BlockDriverState *bs);
> >  void bdrv_delete(BlockDriverState *bs);
> >  int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags);
> >  int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
> > diff --git a/blockdev.c b/blockdev.c
> > index 0690cc8..1f57b50 100644
> > --- a/blockdev.c
> > +++ b/blockdev.c
> > @@ -760,7 +760,7 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
> 
> Let me add a bit more context:
> 
>       bdrv_flush(bs);
>       bdrv_close(bs);
> 
>       /* clean up guest state from pointing to host resource by
>        * finding and removing DeviceState "drive" property */
>       if (bs->peer) {
>           for (prop = bs->peer->info->props; prop && prop->name; prop++) {
>               if (prop->info->type == PROP_TYPE_DRIVE) {
>                   ptr = qdev_get_prop_ptr(bs->peer, prop);
>                   if (*ptr == bs) {
>                       bdrv_detach(bs, bs->peer);
>                       *ptr = NULL;
>                       break;
>                   }
>               }
>           }
> >      }
> 
> This zaps the drive property.  A subsequent free_drive() will do
> nothing.  We leak the BlockDriverState on device unplug, I'm afraid.
> 
> Any reason why we still want to zap the drive property?

IIRC, it was to prevent qdev from keeping a ptr around to the bs; but if
we're keeping the bs around anyhow, then I don't think we need to remove
the property.  One last check would be to see what if the device still
shows up in qtree if we don't remove the property.
Markus Armbruster - March 24, 2011, 12:17 p.m.
Whoops, almost missed this.  Best to cc: me to avoid that.

Ryan Harper <ryanh@us.ibm.com> writes:

> * Markus Armbruster <armbru@redhat.com> [2011-03-15 04:48]:
>> Sorry for the long delay, I was out of action for a week.
>> 
>> Ryan Harper <ryanh@us.ibm.com> writes:
>> 
>> > When removing a drive from the host-side via drive_del we currently have the
>> > following path:
>> >
>> > drive_del
>> > qemu_aio_flush()
>> > bdrv_close()
>> > drive_uninit()
>> > bdrv_delete()
>> >
>> > When we bdrv_delete() we end up qemu_free() the BlockDriverState pointer
>> > however, the block devices retain a copy of this pointer, see
>> > hw/virtio-blk.c:virtio_blk_init() where we s->bs = conf->bs.
>> >
>> > We now have a use-after-free situation.  If the guest continues to issue IO
>> > against the device, and we've reallocated the memory that the BlockDriverState
>> > pointed at, then we will fail the bs->drv checks in the various bdrv_ methods.
>> 
>> "we will fail the bs->drv checks" is misleading, in my opinion.  Here's
>> what happens:
>> 
>> 1. bdrv_close(bs) zaps bs->drv, which makes any subsequent I/O get
>>    dropped.  Works as designed.
>> 
>> 2. drive_uninit() frees the bs.  Since the device is still connected to
>>    bs, any subsequent I/O is a use-after-free.
>> 
>>    The value of bs->drv becomes unpredictable on free.  As long as it
>>    remains null, I/O still gets dropped.  I/O crashes or worse once that
>>    changed.  Could be right on free, could be much later.
>> 
>> If you respin anyway, please clarify your description.
>
> Sure.  I wasn't planning a new version, but I'll update and send anyhow
> as I didn't see it get included in pull from the block branch.
>> 
>> > To resolve this issue as simply as possible, we can chose to not actually
>> > delete the BlockDriverState pointer.  Since bdrv_close() handles setting the drv
>> > pointer to NULL, we just need to remove the BlockDriverState from the QLIST
>> > that is used to enumerate the block devices.  This is currently handled within
>> > bdrv_delete, so move this into it's own function, bdrv_remove().
>> 
>> Why do we remove the BlockDriverState from bdrv_states?  Because we want
>> drive_del make its *name* go away.
>> 
>> Begs the question: is the code prepared for a BlockDriverState object
>> that isn't on bdrv_states?  Turns out we're in luck: bdrv_new() already
>> creates such objects when the device_name is empty.  This is used for
>> internal BlockDriverStates such as COW backing files.  Your code makes
>> device_name empty when taking the object off bdrv_states, so we're good.
>> 
>> Begs yet another question: how does the behavior of a BlockDriverState
>> change when it's taken off bdrv_states, and is that the behavior we
>> want?  Changes:
>> 
>> * bdrv_delete() no longer takes it off bdrv_states.  Good.
>> 
>> * bdrv_close_all(), bdrv_commit_all() and bdrv_flush_all() no longer
>>   cover it.  Okay, because bdrv_close(), bdrv_commit() and bdrv_flush()
>>   do nothing anyway for closed BlockDriverStates.
>> 
>> * "info block" and "info blockstats" no longer show it, because
>>   bdrv_info() and bdrv_info_stats() no longer see it.  Okay.
>> 
>> * bdrv_find(), bdrv_next(), bdrv_iterate() no longer see it.  Impact?
>>   Please check their uses and report.
>
>    1    664  block-migration.c <<block_load>>
>              bs = bdrv_find(device_name);
>
>     - no longer see it.  This is fine since we can't migrate a block
>     device that has been removed
>
>    2    562  blockdev.c <<do_commit>>
>              bs = bdrv_find(device);
>
>     - do_commit won't see it in either when calling bdrv_commit_all()
>     Fine as you mention above.  If user specifies the device name
>     we won't find it, that's OK because we can't commit data against
>     a closed BlockDriverState.
>
>    3    587  blockdev.c <<do_snapshot_blkdev>>
>              bs = bdrv_find(device);
>
>    - OK, cannot take a snapshot against a deleted BlockDriverState
>
>    4    662  blockdev.c <<do_eject>>
>              bs = bdrv_find(filename);
>
>    - OK, cannot eject a deleted BlockDriverState; 
>
>    5    676  blockdev.c <<do_block_set_passwd>>
>              bs = bdrv_find(qdict_get_str(qdict, "device"));
>
>    - OK, cannot set password a deleted BlockDriverState; 
>
>    6    701  blockdev.c <<do_change_block>>
>              bs = bdrv_find(device);
>
>    - OK, cannot change the file/device of a deleted BlockDriverState; 
>
>    7    732  blockdev.c <<do_drive_del>>
>              bs = bdrv_find(id);
>
>    - OK, cannot delete an already deleted Drive
>
>    8    783  blockdev.c <<do_block_resize>>
>              bs = bdrv_find(device);
>
>    - OK, cannot resize a deleted Drive
>
>    9    312  hw/qdev-properties.c <<parse_drive>>
>              bs = bdrv_find(str);
>
>    - Used when invoking qdev_prop_drive .parse method;  parse method is invoked via
>     qdev_device_add() which calls set_property() which invokes parse.  AFAICT, this is OK
>     since we won't be going down the device add path worrying about a
>     deleted block device.

Thanks for checking!

>> > The result is that we can now invoke drive_del, this closes the file descriptors
>> > and sets BlockDriverState->drv to NULL which prevents futher IO to the device,
>> > and since we do not free BlockDriverState, we don't have to worry about the copy
>> > retained in the block devices.
>> 
>> Yep.  But there's one more question: is the BlockDriverState freed when
>> the device using it gets destroyed?
>> 
>> qdev_free() runs prop->info->free() for all properties.  The drive
>> property's free() is free_drive():
>> 
>>   static void free_drive(DeviceState *dev, Property *prop)
>>   {
>>       BlockDriverState **ptr = qdev_get_prop_ptr(dev, prop);
>> 
>>       if (*ptr) {
>>           bdrv_detach(*ptr, dev);
>>           blockdev_auto_del(*ptr);
>>       }
>>   }
>> 
>> This should indeed delete the drive.  But only if the property still
>> points to it.  See below.
>> 
>> > Reported-by: Marcus Armbruster <armbru@redhat.com>
>> > Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
>> > ---
>> > v1->v2
>> >  - NULL bs->device_name after removing from list to prevent
>> >    second removal.
>> >
>> >  block.c    |   12 +++++++++---
>> >  block.h    |    1 +
>> >  blockdev.c |    2 +-
>> >  3 files changed, 11 insertions(+), 4 deletions(-)
>> >
>> > diff --git a/block.c b/block.c
>> > index 1544d81..0df9942 100644
>> > --- a/block.c
>> > +++ b/block.c
>> > @@ -697,14 +697,20 @@ void bdrv_close_all(void)
>> >      }
>> >  }
>> >  
>> > +void bdrv_remove(BlockDriverState *bs)
>> > +{
>> > +    if (bs->device_name[0] != '\0') {
>> > +        QTAILQ_REMOVE(&bdrv_states, bs, list);
>> > +    }
>> > +    bs->device_name[0] = '\0';
>> > +}
>> > +
>> 
>> I don't like this name.  What's the difference between "delete" and
>> "remove"?
>> 
>> The function zaps the device name.  bdrv_make_anon()?
>
> bdrv_delist? bdrv_hide?  I'm also fine with make_anon.

Any of the three works for me.  "delist" seems the least obvious, but
that's a matter of taste.

>> >  void bdrv_delete(BlockDriverState *bs)
>> >  {
>> >      assert(!bs->peer);
>> >  
>> >      /* remove from list, if necessary */
>> > -    if (bs->device_name[0] != '\0') {
>> > -        QTAILQ_REMOVE(&bdrv_states, bs, list);
>> > -    }
>> > +    bdrv_remove(bs);
>> >  
>> >      bdrv_close(bs);
>> >      if (bs->file != NULL) {
>> > diff --git a/block.h b/block.h
>> > index 5d78fc0..8447397 100644
>> > --- a/block.h
>> > +++ b/block.h
>> > @@ -66,6 +66,7 @@ int bdrv_create(BlockDriver *drv, const char* filename,
>> >      QEMUOptionParameter *options);
>> >  int bdrv_create_file(const char* filename, QEMUOptionParameter *options);
>> >  BlockDriverState *bdrv_new(const char *device_name);
>> > +void bdrv_remove(BlockDriverState *bs);
>> >  void bdrv_delete(BlockDriverState *bs);
>> >  int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags);
>> >  int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>> > diff --git a/blockdev.c b/blockdev.c
>> > index 0690cc8..1f57b50 100644
>> > --- a/blockdev.c
>> > +++ b/blockdev.c
>> > @@ -760,7 +760,7 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
>> 
>> Let me add a bit more context:
>> 
>>       bdrv_flush(bs);
>>       bdrv_close(bs);
>> 
>>       /* clean up guest state from pointing to host resource by
>>        * finding and removing DeviceState "drive" property */
>>       if (bs->peer) {
>>           for (prop = bs->peer->info->props; prop && prop->name; prop++) {
>>               if (prop->info->type == PROP_TYPE_DRIVE) {
>>                   ptr = qdev_get_prop_ptr(bs->peer, prop);
>>                   if (*ptr == bs) {
>>                       bdrv_detach(bs, bs->peer);
>>                       *ptr = NULL;
>>                       break;
>>                   }
>>               }
>>           }
>> >      }
>> 
>> This zaps the drive property.  A subsequent free_drive() will do
>> nothing.  We leak the BlockDriverState on device unplug, I'm afraid.
>> 
>> Any reason why we still want to zap the drive property?
>
> IIRC, it was to prevent qdev from keeping a ptr around to the bs; but if
> we're keeping the bs around anyhow, then I don't think we need to remove
> the property.

Agree.

>                One last check would be to see what if the device still
> shows up in qtree if we don't remove the property.

I expect it to show up as "drive = " instead of "drive = <null>",
because:

    static int print_drive(DeviceState *dev, Property *prop, char *dest, size_t len)
    {
        BlockDriverState **ptr = qdev_get_prop_ptr(dev, prop);
        return snprintf(dest, len, "%s",
                        *ptr ? bdrv_get_device_name(*ptr) : "<null>");
    }

and

    const char *bdrv_get_device_name(BlockDriverState *bs)
    {
        return bs->device_name;
    }

If the empty right hand side bothers you, you can change print_drive()
to print something else then, say "<anon>", or "<gone>".

Looking forward to v3 :)
Ryan Harper - March 28, 2011, 6:38 p.m.
* Markus Armbruster <armbru@redhat.com> [2011-03-24 07:27]:
> Whoops, almost missed this.  Best to cc: me to avoid that.
> 

It was sent directly to you:

>   Sender: qemu-devel-bounces+ryanh=us.ibm.com@nongnu.org
>   From: Ryan Harper <ryanh@us.ibm.com>
>   Subject: Re: [Qemu-devel] [PATCH v2] Do not delete BlockDriverState when deleting the drive
>   Date: Tue, 22 Mar 2011 20:53:47 -0500
>   Message-ID: <20110323015347.GA20139@us.ibm.com>
>   User-Agent: Mutt/1.5.6+20040907i
>   To: Markus Armbruster <armbru@redhat.com>
>   Cc: Kevin Wolf <kwolf@redhat.com>, Ryan Harper <ryanh@us.ibm.com>,
>       qemu-devel@nongnu.org


> Ryan Harper <ryanh@us.ibm.com> writes:
> 
> > * Markus Armbruster <armbru@redhat.com> [2011-03-15 04:48]:
> >> Sorry for the long delay, I was out of action for a week.
> >> 
> >> Ryan Harper <ryanh@us.ibm.com> writes:
> >> 
> >> > When removing a drive from the host-side via drive_del we currently have the
> >> > following path:
> >> >
> >> > drive_del
> >> > qemu_aio_flush()
> >> > bdrv_close()
> >> > drive_uninit()
> >> > bdrv_delete()
> >> >
> >> > When we bdrv_delete() we end up qemu_free() the BlockDriverState pointer
> >> > however, the block devices retain a copy of this pointer, see
> >> > hw/virtio-blk.c:virtio_blk_init() where we s->bs = conf->bs.
> >> >
> >> > We now have a use-after-free situation.  If the guest continues to issue IO
> >> > against the device, and we've reallocated the memory that the BlockDriverState
> >> > pointed at, then we will fail the bs->drv checks in the various bdrv_ methods.
> >> 
> >> "we will fail the bs->drv checks" is misleading, in my opinion.  Here's
> >> what happens:
> >> 
> >> 1. bdrv_close(bs) zaps bs->drv, which makes any subsequent I/O get
> >>    dropped.  Works as designed.
> >> 
> >> 2. drive_uninit() frees the bs.  Since the device is still connected to
> >>    bs, any subsequent I/O is a use-after-free.
> >> 
> >>    The value of bs->drv becomes unpredictable on free.  As long as it
> >>    remains null, I/O still gets dropped.  I/O crashes or worse once that
> >>    changed.  Could be right on free, could be much later.
> >> 
> >> If you respin anyway, please clarify your description.
> >
> > Sure.  I wasn't planning a new version, but I'll update and send anyhow
> > as I didn't see it get included in pull from the block branch.
> >> 
> >> > To resolve this issue as simply as possible, we can chose to not actually
> >> > delete the BlockDriverState pointer.  Since bdrv_close() handles setting the drv
> >> > pointer to NULL, we just need to remove the BlockDriverState from the QLIST
> >> > that is used to enumerate the block devices.  This is currently handled within
> >> > bdrv_delete, so move this into it's own function, bdrv_remove().
> >> 
> >> Why do we remove the BlockDriverState from bdrv_states?  Because we want
> >> drive_del make its *name* go away.
> >> 
> >> Begs the question: is the code prepared for a BlockDriverState object
> >> that isn't on bdrv_states?  Turns out we're in luck: bdrv_new() already
> >> creates such objects when the device_name is empty.  This is used for
> >> internal BlockDriverStates such as COW backing files.  Your code makes
> >> device_name empty when taking the object off bdrv_states, so we're good.
> >> 
> >> Begs yet another question: how does the behavior of a BlockDriverState
> >> change when it's taken off bdrv_states, and is that the behavior we
> >> want?  Changes:
> >> 
> >> * bdrv_delete() no longer takes it off bdrv_states.  Good.
> >> 
> >> * bdrv_close_all(), bdrv_commit_all() and bdrv_flush_all() no longer
> >>   cover it.  Okay, because bdrv_close(), bdrv_commit() and bdrv_flush()
> >>   do nothing anyway for closed BlockDriverStates.
> >> 
> >> * "info block" and "info blockstats" no longer show it, because
> >>   bdrv_info() and bdrv_info_stats() no longer see it.  Okay.
> >> 
> >> * bdrv_find(), bdrv_next(), bdrv_iterate() no longer see it.  Impact?
> >>   Please check their uses and report.
> >
> >    1    664  block-migration.c <<block_load>>
> >              bs = bdrv_find(device_name);
> >
> >     - no longer see it.  This is fine since we can't migrate a block
> >     device that has been removed
> >
> >    2    562  blockdev.c <<do_commit>>
> >              bs = bdrv_find(device);
> >
> >     - do_commit won't see it in either when calling bdrv_commit_all()
> >     Fine as you mention above.  If user specifies the device name
> >     we won't find it, that's OK because we can't commit data against
> >     a closed BlockDriverState.
> >
> >    3    587  blockdev.c <<do_snapshot_blkdev>>
> >              bs = bdrv_find(device);
> >
> >    - OK, cannot take a snapshot against a deleted BlockDriverState
> >
> >    4    662  blockdev.c <<do_eject>>
> >              bs = bdrv_find(filename);
> >
> >    - OK, cannot eject a deleted BlockDriverState; 
> >
> >    5    676  blockdev.c <<do_block_set_passwd>>
> >              bs = bdrv_find(qdict_get_str(qdict, "device"));
> >
> >    - OK, cannot set password a deleted BlockDriverState; 
> >
> >    6    701  blockdev.c <<do_change_block>>
> >              bs = bdrv_find(device);
> >
> >    - OK, cannot change the file/device of a deleted BlockDriverState; 
> >
> >    7    732  blockdev.c <<do_drive_del>>
> >              bs = bdrv_find(id);
> >
> >    - OK, cannot delete an already deleted Drive
> >
> >    8    783  blockdev.c <<do_block_resize>>
> >              bs = bdrv_find(device);
> >
> >    - OK, cannot resize a deleted Drive
> >
> >    9    312  hw/qdev-properties.c <<parse_drive>>
> >              bs = bdrv_find(str);
> >
> >    - Used when invoking qdev_prop_drive .parse method;  parse method is invoked via
> >     qdev_device_add() which calls set_property() which invokes parse.  AFAICT, this is OK
> >     since we won't be going down the device add path worrying about a
> >     deleted block device.
> 
> Thanks for checking!
> 
> >> > The result is that we can now invoke drive_del, this closes the file descriptors
> >> > and sets BlockDriverState->drv to NULL which prevents futher IO to the device,
> >> > and since we do not free BlockDriverState, we don't have to worry about the copy
> >> > retained in the block devices.
> >> 
> >> Yep.  But there's one more question: is the BlockDriverState freed when
> >> the device using it gets destroyed?
> >> 
> >> qdev_free() runs prop->info->free() for all properties.  The drive
> >> property's free() is free_drive():
> >> 
> >>   static void free_drive(DeviceState *dev, Property *prop)
> >>   {
> >>       BlockDriverState **ptr = qdev_get_prop_ptr(dev, prop);
> >> 
> >>       if (*ptr) {
> >>           bdrv_detach(*ptr, dev);
> >>           blockdev_auto_del(*ptr);
> >>       }
> >>   }
> >> 
> >> This should indeed delete the drive.  But only if the property still
> >> points to it.  See below.
> >> 
> >> > Reported-by: Marcus Armbruster <armbru@redhat.com>
> >> > Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
> >> > ---
> >> > v1->v2
> >> >  - NULL bs->device_name after removing from list to prevent
> >> >    second removal.
> >> >
> >> >  block.c    |   12 +++++++++---
> >> >  block.h    |    1 +
> >> >  blockdev.c |    2 +-
> >> >  3 files changed, 11 insertions(+), 4 deletions(-)
> >> >
> >> > diff --git a/block.c b/block.c
> >> > index 1544d81..0df9942 100644
> >> > --- a/block.c
> >> > +++ b/block.c
> >> > @@ -697,14 +697,20 @@ void bdrv_close_all(void)
> >> >      }
> >> >  }
> >> >  
> >> > +void bdrv_remove(BlockDriverState *bs)
> >> > +{
> >> > +    if (bs->device_name[0] != '\0') {
> >> > +        QTAILQ_REMOVE(&bdrv_states, bs, list);
> >> > +    }
> >> > +    bs->device_name[0] = '\0';
> >> > +}
> >> > +
> >> 
> >> I don't like this name.  What's the difference between "delete" and
> >> "remove"?
> >> 
> >> The function zaps the device name.  bdrv_make_anon()?
> >
> > bdrv_delist? bdrv_hide?  I'm also fine with make_anon.
> 
> Any of the three works for me.  "delist" seems the least obvious, but
> that's a matter of taste.
> 
> >> >  void bdrv_delete(BlockDriverState *bs)
> >> >  {
> >> >      assert(!bs->peer);
> >> >  
> >> >      /* remove from list, if necessary */
> >> > -    if (bs->device_name[0] != '\0') {
> >> > -        QTAILQ_REMOVE(&bdrv_states, bs, list);
> >> > -    }
> >> > +    bdrv_remove(bs);
> >> >  
> >> >      bdrv_close(bs);
> >> >      if (bs->file != NULL) {
> >> > diff --git a/block.h b/block.h
> >> > index 5d78fc0..8447397 100644
> >> > --- a/block.h
> >> > +++ b/block.h
> >> > @@ -66,6 +66,7 @@ int bdrv_create(BlockDriver *drv, const char* filename,
> >> >      QEMUOptionParameter *options);
> >> >  int bdrv_create_file(const char* filename, QEMUOptionParameter *options);
> >> >  BlockDriverState *bdrv_new(const char *device_name);
> >> > +void bdrv_remove(BlockDriverState *bs);
> >> >  void bdrv_delete(BlockDriverState *bs);
> >> >  int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags);
> >> >  int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
> >> > diff --git a/blockdev.c b/blockdev.c
> >> > index 0690cc8..1f57b50 100644
> >> > --- a/blockdev.c
> >> > +++ b/blockdev.c
> >> > @@ -760,7 +760,7 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
> >> 
> >> Let me add a bit more context:
> >> 
> >>       bdrv_flush(bs);
> >>       bdrv_close(bs);
> >> 
> >>       /* clean up guest state from pointing to host resource by
> >>        * finding and removing DeviceState "drive" property */
> >>       if (bs->peer) {
> >>           for (prop = bs->peer->info->props; prop && prop->name; prop++) {
> >>               if (prop->info->type == PROP_TYPE_DRIVE) {
> >>                   ptr = qdev_get_prop_ptr(bs->peer, prop);
> >>                   if (*ptr == bs) {
> >>                       bdrv_detach(bs, bs->peer);
> >>                       *ptr = NULL;
> >>                       break;
> >>                   }
> >>               }
> >>           }
> >> >      }
> >> 
> >> This zaps the drive property.  A subsequent free_drive() will do
> >> nothing.  We leak the BlockDriverState on device unplug, I'm afraid.
> >> 
> >> Any reason why we still want to zap the drive property?
> >
> > IIRC, it was to prevent qdev from keeping a ptr around to the bs; but if
> > we're keeping the bs around anyhow, then I don't think we need to remove
> > the property.
> 
> Agree.
> 
> >                One last check would be to see what if the device still
> > shows up in qtree if we don't remove the property.
> 
> I expect it to show up as "drive = " instead of "drive = <null>",
> because:
> 
>     static int print_drive(DeviceState *dev, Property *prop, char *dest, size_t len)
>     {
>         BlockDriverState **ptr = qdev_get_prop_ptr(dev, prop);
>         return snprintf(dest, len, "%s",
>                         *ptr ? bdrv_get_device_name(*ptr) : "<null>");
>     }
> 
> and
> 
>     const char *bdrv_get_device_name(BlockDriverState *bs)
>     {
>         return bs->device_name;
>     }
> 
> If the empty right hand side bothers you, you can change print_drive()
> to print something else then, say "<anon>", or "<gone>".
> 
> Looking forward to v3 :)

After our chat: here's what I've got in v3:

- Update drive_del use after free description
- s/bdrv_remove/bdrv_make_anon/g
- Don't remove qdev property since we don't delete bs any more
- If (bs->peer) bdrv_make_anon else bdrv_delete to handle removing
  drives with no device.

Sending out v3 as a top-post now.
Markus Armbruster - March 29, 2011, 7:34 a.m.
Ryan Harper <ryanh@us.ibm.com> writes:

> * Markus Armbruster <armbru@redhat.com> [2011-03-24 07:27]:
>> Whoops, almost missed this.  Best to cc: me to avoid that.
>> 
>
> It was sent directly to you:
>
>>   Sender: qemu-devel-bounces+ryanh=us.ibm.com@nongnu.org
>>   From: Ryan Harper <ryanh@us.ibm.com>
>>   Subject: Re: [Qemu-devel] [PATCH v2] Do not delete BlockDriverState when deleting the drive
>>   Date: Tue, 22 Mar 2011 20:53:47 -0500
>>   Message-ID: <20110323015347.GA20139@us.ibm.com>
>>   User-Agent: Mutt/1.5.6+20040907i
>>   To: Markus Armbruster <armbru@redhat.com>
>>   Cc: Kevin Wolf <kwolf@redhat.com>, Ryan Harper <ryanh@us.ibm.com>,
>>       qemu-devel@nongnu.org

Indeed.  Best to cc: me *and* to ping me when the cc: doesn't get a
timely response.  Thanks!

[...]

Patch

diff --git a/block.c b/block.c
index 1544d81..0df9942 100644
--- a/block.c
+++ b/block.c
@@ -697,14 +697,20 @@  void bdrv_close_all(void)
     }
 }
 
+void bdrv_remove(BlockDriverState *bs)
+{
+    if (bs->device_name[0] != '\0') {
+        QTAILQ_REMOVE(&bdrv_states, bs, list);
+    }
+    bs->device_name[0] = '\0';
+}
+
 void bdrv_delete(BlockDriverState *bs)
 {
     assert(!bs->peer);
 
     /* remove from list, if necessary */
-    if (bs->device_name[0] != '\0') {
-        QTAILQ_REMOVE(&bdrv_states, bs, list);
-    }
+    bdrv_remove(bs);
 
     bdrv_close(bs);
     if (bs->file != NULL) {
diff --git a/block.h b/block.h
index 5d78fc0..8447397 100644
--- a/block.h
+++ b/block.h
@@ -66,6 +66,7 @@  int bdrv_create(BlockDriver *drv, const char* filename,
     QEMUOptionParameter *options);
 int bdrv_create_file(const char* filename, QEMUOptionParameter *options);
 BlockDriverState *bdrv_new(const char *device_name);
+void bdrv_remove(BlockDriverState *bs);
 void bdrv_delete(BlockDriverState *bs);
 int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags);
 int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
diff --git a/blockdev.c b/blockdev.c
index 0690cc8..1f57b50 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -760,7 +760,7 @@  int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
     }
 
     /* clean up host side */
-    drive_uninit(drive_get_by_blockdev(bs));
+    bdrv_remove(bs);
 
     return 0;
 }