diff mbox

[2/2] v2 Fix Block Hotplug race with drive_unplug()

Message ID 1287498749-10400-3-git-send-email-ryanh@us.ibm.com
State New
Headers show

Commit Message

Ryan Harper Oct. 19, 2010, 2:32 p.m. UTC
Block hot unplug is racy since the guest is required to acknowlege the ACPI
unplug event; this may not happen synchronously with the device removal command

This series aims to close a gap where by mgmt applications that assume the
block resource has been removed without confirming that the guest has
acknowledged the removal may re-assign the underlying device to a second guest
leading to data leakage.

This series introduces a new montor command to decouple asynchornous device
removal from restricting guest access to a block device.  We do this by creating
a new monitor command drive_unplug which maps to a bdrv_unplug() command which
does a qemu_aio_flush; bdrv_flush() and bdrv_close().  Once complete, subsequent
IO is rejected from the device and the guest will get IO errors but continue to
function.

A subsequent device removal command can be issued to remove the device, to which
the guest may or maynot respond, but as long as the unplugged bit is set, no IO
will be sumbitted.

Changes since v1:
- Added qemu_aio_flush() before bdrv_flush() to wait on pending io

Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
---
 block.c         |    7 +++++++
 block.h         |    1 +
 blockdev.c      |   26 ++++++++++++++++++++++++++
 blockdev.h      |    1 +
 hmp-commands.hx |   15 +++++++++++++++
 5 files changed, 50 insertions(+), 0 deletions(-)

Comments

Stefan Hajnoczi Oct. 19, 2010, 3:21 p.m. UTC | #1
On Tue, Oct 19, 2010 at 3:32 PM, Ryan Harper <ryanh@us.ibm.com> wrote:
> Block hot unplug is racy since the guest is required to acknowlege the ACPI
> unplug event; this may not happen synchronously with the device removal command
>
> This series aims to close a gap where by mgmt applications that assume the
> block resource has been removed without confirming that the guest has
> acknowledged the removal may re-assign the underlying device to a second guest
> leading to data leakage.
>
> This series introduces a new montor command to decouple asynchornous device
> removal from restricting guest access to a block device.  We do this by creating
> a new monitor command drive_unplug which maps to a bdrv_unplug() command which
> does a qemu_aio_flush; bdrv_flush() and bdrv_close().  Once complete, subsequent
> IO is rejected from the device and the guest will get IO errors but continue to
> function.
>
> A subsequent device removal command can be issued to remove the device, to which
> the guest may or maynot respond, but as long as the unplugged bit is set, no IO
> will be sumbitted.
>
> Changes since v1:
> - Added qemu_aio_flush() before bdrv_flush() to wait on pending io
>
> Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
> ---
>  block.c         |    7 +++++++
>  block.h         |    1 +
>  blockdev.c      |   26 ++++++++++++++++++++++++++
>  blockdev.h      |    1 +
>  hmp-commands.hx |   15 +++++++++++++++
>  5 files changed, 50 insertions(+), 0 deletions(-)

Looks good to me.

Stefan
Daniel P. Berrangé Oct. 21, 2010, 1:27 p.m. UTC | #2
On Tue, Oct 19, 2010 at 09:32:29AM -0500, Ryan Harper wrote:
> Block hot unplug is racy since the guest is required to acknowlege the ACPI
> unplug event; this may not happen synchronously with the device removal command
> 
> This series aims to close a gap where by mgmt applications that assume the
> block resource has been removed without confirming that the guest has
> acknowledged the removal may re-assign the underlying device to a second guest
> leading to data leakage.
> 
> This series introduces a new montor command to decouple asynchornous device
> removal from restricting guest access to a block device.  We do this by creating
> a new monitor command drive_unplug which maps to a bdrv_unplug() command which
> does a qemu_aio_flush; bdrv_flush() and bdrv_close().  Once complete, subsequent
> IO is rejected from the device and the guest will get IO errors but continue to
> function.
> 
> A subsequent device removal command can be issued to remove the device, to which
> the guest may or maynot respond, but as long as the unplugged bit is set, no IO
> will be sumbitted.

The name 'drive_unplug' suggests to me that the drive object is
not being deleted/free()d ? Is that correct understanding, and if
so, what is responsible for finally free()ing the drive backend ?

Regards,
Daniel
Ryan Harper Oct. 21, 2010, 9:37 p.m. UTC | #3
* Daniel P. Berrange <berrange@redhat.com> [2010-10-21 08:29]:
> On Tue, Oct 19, 2010 at 09:32:29AM -0500, Ryan Harper wrote:
> > Block hot unplug is racy since the guest is required to acknowlege the ACPI
> > unplug event; this may not happen synchronously with the device removal command
> > 
> > This series aims to close a gap where by mgmt applications that assume the
> > block resource has been removed without confirming that the guest has
> > acknowledged the removal may re-assign the underlying device to a second guest
> > leading to data leakage.
> > 
> > This series introduces a new montor command to decouple asynchornous device
> > removal from restricting guest access to a block device.  We do this by creating
> > a new monitor command drive_unplug which maps to a bdrv_unplug() command which
> > does a qemu_aio_flush; bdrv_flush() and bdrv_close().  Once complete, subsequent
> > IO is rejected from the device and the guest will get IO errors but continue to
> > function.
> > 
> > A subsequent device removal command can be issued to remove the device, to which
> > the guest may or maynot respond, but as long as the unplugged bit is set, no IO
> > will be sumbitted.
> 
> The name 'drive_unplug' suggests to me that the drive object is
> not being deleted/free()d ? Is that correct understanding, and if
> so, what is responsible for finally free()ing the drive backend ?

It's technically the BlockDriverState Driver that we're closing.  To
fully release the remaining resources, a device_del is required (which
of course requires guest participation with the current
interface).

Once QEMU issues the removal request, the guest responds and the piix4
acpi handler for pciej_write writes invokes qdev_free() on the target
device.  qdev_free() on the pci device will make it's way to the qdev
exit handler registered for virtio-blk devices, virtio_blk_exit_pci().
virtio_blk_exit_pci() marks the drive structure for deletion.  When qdev
calls the properties handler, it invokes free_drive() on the disk and
that calls blockdev_auto_del() which will do a bdrv_delete() which nukes
the remaining objects (the acutal BlockDriverState).

I think I got the whole path in there.
Daniel P. Berrangé Oct. 22, 2010, 8:10 a.m. UTC | #4
On Thu, Oct 21, 2010 at 04:37:46PM -0500, Ryan Harper wrote:
> * Daniel P. Berrange <berrange@redhat.com> [2010-10-21 08:29]:
> > On Tue, Oct 19, 2010 at 09:32:29AM -0500, Ryan Harper wrote:
> > > Block hot unplug is racy since the guest is required to acknowlege the ACPI
> > > unplug event; this may not happen synchronously with the device removal command
> > > 
> > > This series aims to close a gap where by mgmt applications that assume the
> > > block resource has been removed without confirming that the guest has
> > > acknowledged the removal may re-assign the underlying device to a second guest
> > > leading to data leakage.
> > > 
> > > This series introduces a new montor command to decouple asynchornous device
> > > removal from restricting guest access to a block device.  We do this by creating
> > > a new monitor command drive_unplug which maps to a bdrv_unplug() command which
> > > does a qemu_aio_flush; bdrv_flush() and bdrv_close().  Once complete, subsequent
> > > IO is rejected from the device and the guest will get IO errors but continue to
> > > function.
> > > 
> > > A subsequent device removal command can be issued to remove the device, to which
> > > the guest may or maynot respond, but as long as the unplugged bit is set, no IO
> > > will be sumbitted.
> > 
> > The name 'drive_unplug' suggests to me that the drive object is
> > not being deleted/free()d ? Is that correct understanding, and if
> > so, what is responsible for finally free()ing the drive backend ?
> 
> It's technically the BlockDriverState Driver that we're closing.  To
> fully release the remaining resources, a device_del is required (which
> of course requires guest participation with the current
> interface).
> 
> Once QEMU issues the removal request, the guest responds and the piix4
> acpi handler for pciej_write writes invokes qdev_free() on the target
> device.  qdev_free() on the pci device will make it's way to the qdev
> exit handler registered for virtio-blk devices, virtio_blk_exit_pci().
> virtio_blk_exit_pci() marks the drive structure for deletion.  When qdev
> calls the properties handler, it invokes free_drive() on the disk and
> that calls blockdev_auto_del() which will do a bdrv_delete() which nukes
> the remaining objects (the acutal BlockDriverState).
> 
> I think I got the whole path in there.

Ok, thanks, that makes sense to me.

Sounds like we do still need a separate drive_del in the future to 
handle the different case of, drive_add, followed by a device_add
attempt which fails.

Regards,
Daniel
Kevin Wolf Oct. 22, 2010, 10:31 a.m. UTC | #5
Am 21.10.2010 23:37, schrieb Ryan Harper:
> * Daniel P. Berrange <berrange@redhat.com> [2010-10-21 08:29]:
>> On Tue, Oct 19, 2010 at 09:32:29AM -0500, Ryan Harper wrote:
>>> Block hot unplug is racy since the guest is required to acknowlege the ACPI
>>> unplug event; this may not happen synchronously with the device removal command
>>>
>>> This series aims to close a gap where by mgmt applications that assume the
>>> block resource has been removed without confirming that the guest has
>>> acknowledged the removal may re-assign the underlying device to a second guest
>>> leading to data leakage.
>>>
>>> This series introduces a new montor command to decouple asynchornous device
>>> removal from restricting guest access to a block device.  We do this by creating
>>> a new monitor command drive_unplug which maps to a bdrv_unplug() command which
>>> does a qemu_aio_flush; bdrv_flush() and bdrv_close().  Once complete, subsequent
>>> IO is rejected from the device and the guest will get IO errors but continue to
>>> function.
>>>
>>> A subsequent device removal command can be issued to remove the device, to which
>>> the guest may or maynot respond, but as long as the unplugged bit is set, no IO
>>> will be sumbitted.
>>
>> The name 'drive_unplug' suggests to me that the drive object is
>> not being deleted/free()d ? Is that correct understanding, and if
>> so, what is responsible for finally free()ing the drive backend ?
> 
> It's technically the BlockDriverState Driver that we're closing.  To
> fully release the remaining resources, a device_del is required (which
> of course requires guest participation with the current
> interface).

So is this basically what blockdev_del is supposed to become one day?

Copying Markus to have a look at this. I'm sure he has some thoughts on
it as he was planning to implement blockdev_add/del.

Kevin

> 
> Once QEMU issues the removal request, the guest responds and the piix4
> acpi handler for pciej_write writes invokes qdev_free() on the target
> device.  qdev_free() on the pci device will make it's way to the qdev
> exit handler registered for virtio-blk devices, virtio_blk_exit_pci().
> virtio_blk_exit_pci() marks the drive structure for deletion.  When qdev
> calls the properties handler, it invokes free_drive() on the disk and
> that calls blockdev_auto_del() which will do a bdrv_delete() which nukes
> the remaining objects (the acutal BlockDriverState).
> 
> I think I got the whole path in there.
Markus Armbruster Oct. 29, 2010, 1:33 p.m. UTC | #6
Kevin Wolf <kwolf@redhat.com> writes:

> Am 21.10.2010 23:37, schrieb Ryan Harper:
>> * Daniel P. Berrange <berrange@redhat.com> [2010-10-21 08:29]:
>>> On Tue, Oct 19, 2010 at 09:32:29AM -0500, Ryan Harper wrote:
>>>> Block hot unplug is racy since the guest is required to acknowlege the ACPI
>>>> unplug event; this may not happen synchronously with the device removal command
>>>>
>>>> This series aims to close a gap where by mgmt applications that assume the
>>>> block resource has been removed without confirming that the guest has
>>>> acknowledged the removal may re-assign the underlying device to a second guest
>>>> leading to data leakage.
>>>>
>>>> This series introduces a new montor command to decouple asynchornous device
>>>> removal from restricting guest access to a block device.  We do this by creating
>>>> a new monitor command drive_unplug which maps to a bdrv_unplug() command which
>>>> does a qemu_aio_flush; bdrv_flush() and bdrv_close().  Once complete, subsequent
>>>> IO is rejected from the device and the guest will get IO errors but continue to
>>>> function.
>>>>
>>>> A subsequent device removal command can be issued to remove the device, to which
>>>> the guest may or maynot respond, but as long as the unplugged bit is set, no IO
>>>> will be sumbitted.
>>>
>>> The name 'drive_unplug' suggests to me that the drive object is
>>> not being deleted/free()d ? Is that correct understanding, and if
>>> so, what is responsible for finally free()ing the drive backend ?
>> 
>> It's technically the BlockDriverState Driver that we're closing.  To
>> fully release the remaining resources, a device_del is required (which
>> of course requires guest participation with the current
>> interface).
>
> So is this basically what blockdev_del is supposed to become one day?
>
> Copying Markus to have a look at this. I'm sure he has some thoughts on
> it as he was planning to implement blockdev_add/del.

Yes, Ryan's drive_unplug is quite close to my blockdev_del.  However, my
blockdev_del is part of a more ambitious job, namely to cleanly separate
host and guest part of block devices.  A whole lot of preliminary
cleanups have made it in so far, but not the actual commands.

I'll reply in more detail to the latest version of the patch series.

[...]
diff mbox

Patch

diff --git a/block.c b/block.c
index a19374d..be47655 100644
--- a/block.c
+++ b/block.c
@@ -1328,6 +1328,13 @@  void bdrv_set_removable(BlockDriverState *bs, int removable)
     }
 }
 
+void bdrv_unplug(BlockDriverState *bs)
+{
+    qemu_aio_flush();
+    bdrv_flush(bs);
+    bdrv_close(bs);
+}
+
 int bdrv_is_removable(BlockDriverState *bs)
 {
     return bs->removable;
diff --git a/block.h b/block.h
index 5f64380..732f63e 100644
--- a/block.h
+++ b/block.h
@@ -171,6 +171,7 @@  void bdrv_set_on_error(BlockDriverState *bs, BlockErrorAction on_read_error,
                        BlockErrorAction on_write_error);
 BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read);
 void bdrv_set_removable(BlockDriverState *bs, int removable);
+void bdrv_unplug(BlockDriverState *bs);
 int bdrv_is_removable(BlockDriverState *bs);
 int bdrv_is_read_only(BlockDriverState *bs);
 int bdrv_is_sg(BlockDriverState *bs);
diff --git a/blockdev.c b/blockdev.c
index 5fc3b9b..68eb329 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -610,3 +610,29 @@  int do_change_block(Monitor *mon, const char *device,
     }
     return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
 }
+
+int do_drive_unplug(Monitor *mon, const QDict *qdict, QObject **ret_data)
+{
+    DriveInfo *dinfo;
+    BlockDriverState *bs;
+    const char *id;
+
+    if (!qdict_haskey(qdict, "id")) {
+        qerror_report(QERR_MISSING_PARAMETER, "id");
+        return -1;
+    }
+
+    id = qdict_get_str(qdict, "id");
+    dinfo = drive_get_by_id(id);
+    if (!dinfo) {
+        qerror_report(QERR_DEVICE_NOT_FOUND, id);
+        return -1;
+    }
+
+    /* mark block device unplugged */
+    bs = dinfo->bdrv;
+    bdrv_unplug(bs);
+
+    return 0;
+}
+ 
diff --git a/blockdev.h b/blockdev.h
index 19c6915..ecb9ac8 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -52,5 +52,6 @@  int do_eject(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_change_block(Monitor *mon, const char *device,
                     const char *filename, const char *fmt);
+int do_drive_unplug(Monitor *mon, const QDict *qdict, QObject **ret_data);
 
 #endif
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 81999aa..7a32a2e 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -68,6 +68,21 @@  Eject a removable medium (use -f to force it).
 ETEXI
 
     {
+        .name       = "drive_unplug",
+        .args_type  = "id:s",
+        .params     = "device",
+        .help       = "unplug block device",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_drive_unplug,
+    },
+
+STEXI
+@item unplug @var{device}
+@findex unplug
+Unplug block device.
+ETEXI
+
+    {
         .name       = "change",
         .args_type  = "device:B,target:F,arg:s?",
         .params     = "device filename [format]",