diff mbox

[04/21] block: Add bdrv_close_all() handlers

Message ID 1422300468-16216-5-git-send-email-mreitz@redhat.com
State New
Headers show

Commit Message

Max Reitz Jan. 26, 2015, 7:27 p.m. UTC
Every time a reference to a BlockBackend is taken, a notifier for
bdrv_close_all() has to be deposited so the reference holder can
relinquish its reference when bdrv_close_all() is called. That notifier
should be revoked on a bdrv_unref() call.

Add a Notifier * parameter to all the functions changing the reference
count of a BlockBackend: blk_new(), blk_new_with_bs(), blk_new_open(),
blk_ref() and blk_unref().

By dropping the manual reference handling in hw/block/xen_disk.c, the
blk_unref() in hw/ide/piix.c can be dropped as well.

NBD used close notifiers so far; this patch changes it to use the
bdrv_close_all() notifier as well. The change in behavior is the
following: First, the close notifier was of course called indirectly
through bdrv_close_all(), so this case will not change.

Second, as bdrv_close() could not be called because of a bdrv_delete()
(NBD holds a reference to the BB which in turn holds a BDS reference),
the only other way for the notifier to be called was because of an
eject. However, this was wrong: With the NBD attached to some BB, it
should stay attached until the NBD export is removed or the server
stopped; medium ejection should have nothing to do with this (other than
the NBD server not being able to read data while the virtual is open, if
there is one (and there is one if the NBD server is attached to the same
BB as a guest device with a tray is; which is correct behavior because
attaching an NBD server to a guest-used BB should result in the NBD
server behaving exactly the same as the guest device)).

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/block-backend.c          | 69 ++++++++++++++++++++++++++++++++++++------
 blockdev-nbd.c                 | 36 +---------------------
 blockdev.c                     | 62 ++++++++++++++++++++++++++++++++++---
 device-hotplug.c               |  2 +-
 hw/block/xen_disk.c            | 16 +++++++---
 hw/ide/piix.c                  |  1 -
 include/sysemu/block-backend.h | 13 +++++---
 include/sysemu/blockdev.h      |  3 ++
 nbd.c                          | 37 ++++++++++++++++++++--
 qemu-img.c                     | 39 ++++++++++++------------
 qemu-io.c                      |  6 ++--
 qemu-nbd.c                     |  6 ++--
 12 files changed, 202 insertions(+), 88 deletions(-)

Comments

Paolo Bonzini Jan. 26, 2015, 8:40 p.m. UTC | #1
On 26/01/2015 20:27, Max Reitz wrote:
> However, this was wrong: With the NBD attached to some BB, it
> should stay attached until the NBD export is removed or the server
> stopped; medium ejection should have nothing to do with this (other than
> the NBD server not being able to read data while the virtual is open, if
> there is one (and there is one if the NBD server is attached to the same
> BB as a guest device with a tray is; which is correct behavior because
> attaching an NBD server to a guest-used BB should result in the NBD
> server behaving exactly the same as the guest device)).

Not entirely; use of close notifiers was on purpose.  We do not want the
NBD client to start reading data from a different medium without noticing.

If the NBD server is attached to the BDS, it should keep serving the BDS.

If the NBD server is attached to the BB, the clients must have a way to
notice the medium change, and so far the solution was to close the
connection.  I'd be wary to change this, though I understand where you
come from.

Paolo
Max Reitz Jan. 26, 2015, 8:43 p.m. UTC | #2
On 2015-01-26 at 15:40, Paolo Bonzini wrote:
>
> On 26/01/2015 20:27, Max Reitz wrote:
>> However, this was wrong: With the NBD attached to some BB, it
>> should stay attached until the NBD export is removed or the server
>> stopped; medium ejection should have nothing to do with this (other than
>> the NBD server not being able to read data while the virtual is open, if
>> there is one (and there is one if the NBD server is attached to the same
>> BB as a guest device with a tray is; which is correct behavior because
>> attaching an NBD server to a guest-used BB should result in the NBD
>> server behaving exactly the same as the guest device)).
> Not entirely; use of close notifiers was on purpose.  We do not want the
> NBD client to start reading data from a different medium without noticing.
>
> If the NBD server is attached to the BDS, it should keep serving the BDS.

The problem is that it is no longer attached to the BDS, but to the BB.

> If the NBD server is attached to the BB, the clients must have a way to
> notice the medium change, and so far the solution was to close the
> connection.  I'd be wary to change this, though I understand where you
> come from.

Yes, I understand. However, I think the NBD server has to be attached to 
a separate BB if medium changes should not be allowed; but this would 
break backwards compatibility...

I think to retain compatibility we could either just do what we always 
did (although I find it wrong), or we could simply set up an eject 
blocker when attaching an NBD server to a BB. What do you think?

Max
Paolo Bonzini Jan. 26, 2015, 9:10 p.m. UTC | #3
On 26/01/2015 21:43, Max Reitz wrote:
>> If the NBD server is attached to the BDS, it should keep serving the BDS.
> 
> The problem is that it is no longer attached to the BDS, but to the BB.

That's not necessarily a problem. :)  It is the cause of the problem though.

Is it possible to attach two BBs to the same BDS?

Because part of the solution could be to introduce a new blockdev-serve
command that takes a BDS, creates a BB and exports that BB.

> I think to retain compatibility we could either just do what we always
> did (although I find it wrong), or we could simply set up an eject
> blocker when attaching an NBD server to a BB. What do you think?

An eject blocker would also break backwards-compatibility though.  What
about an eject notifier?  Would that concept make sense?

Paolo
Max Reitz Jan. 26, 2015, 9:13 p.m. UTC | #4
On 2015-01-26 at 16:10, Paolo Bonzini wrote:
>
> On 26/01/2015 21:43, Max Reitz wrote:
>>> If the NBD server is attached to the BDS, it should keep serving the BDS.
>> The problem is that it is no longer attached to the BDS, but to the BB.
> That's not necessarily a problem. :)  It is the cause of the problem though.
>
> Is it possible to attach two BBs to the same BDS?
>
> Because part of the solution could be to introduce a new blockdev-serve
> command that takes a BDS, creates a BB and exports that BB.
>
>> I think to retain compatibility we could either just do what we always
>> did (although I find it wrong), or we could simply set up an eject
>> blocker when attaching an NBD server to a BB. What do you think?
> An eject blocker would also break backwards-compatibility though.  What
> about an eject notifier?  Would that concept make sense?

It does make sense (in that it is the way I would implement "just do 
what we always did"), but I just don't like it for the fact that it 
makes NBD a special snowflake. I can live with it, though.

Max
Paolo Bonzini Jan. 26, 2015, 9:22 p.m. UTC | #5
On 26/01/2015 22:13, Max Reitz wrote:
>>>
>> An eject blocker would also break backwards-compatibility though.  What
>> about an eject notifier?  Would that concept make sense?
> 
> It does make sense (in that it is the way I would implement "just do
> what we always did"), but I just don't like it for the fact that it
> makes NBD a special snowflake. I can live with it, though.

Yes, it's weird.  But this is just the backwards-compatible solution.

I'm okay with implementing only the new solution, but:

- the old QMP (and HMP?) commands must be removed

- the new command probably must not reuse the same BB as the guest, and
I am not sure that this is possible.

Paolo
Eric Blake Jan. 28, 2015, 10:05 p.m. UTC | #6
On 01/26/2015 02:22 PM, Paolo Bonzini wrote:
> 
> 
> On 26/01/2015 22:13, Max Reitz wrote:
>>>>
>>> An eject blocker would also break backwards-compatibility though.  What
>>> about an eject notifier?  Would that concept make sense?
>>
>> It does make sense (in that it is the way I would implement "just do
>> what we always did"), but I just don't like it for the fact that it
>> makes NBD a special snowflake. I can live with it, though.
> 
> Yes, it's weird.  But this is just the backwards-compatible solution.
> 
> I'm okay with implementing only the new solution, but:
> 
> - the old QMP (and HMP?) commands must be removed

Back-compat is a bear to figure out. From libvirt's perspective, it is
okay to require a newer libvirt in order to drive newer qemu (the new
libvirt knowing how to probe whether old or new commands exist, and use
the right one); but it is still awkward any time upgrading qemu without
libvirt causes things to break gratuitously.

> 
> - the new command probably must not reuse the same BB as the guest, and
> I am not sure that this is possible.

We've had that design goal in the back of our minds for some time
(sharing a single BDS among multiple devices) - but I don't think it has
actually happened yet, so if this is the first time we make it happen,
there may be lots of details to get right.  But it makes the most sense
(exporting and NBD disk is a form of creating a _new_ BB - distinct from
a guest-visible device, but both uses are definitely backends; and
sharing the same BDS among both backends makes total sense, so that the
drive visible to the guest can change medium without invalidating the
NBD serving up the old contents).
Eric Blake Jan. 28, 2015, 10:44 p.m. UTC | #7
On 01/26/2015 12:27 PM, Max Reitz wrote:
> Every time a reference to a BlockBackend is taken, a notifier for
> bdrv_close_all() has to be deposited so the reference holder can
> relinquish its reference when bdrv_close_all() is called. That notifier
> should be revoked on a bdrv_unref() call.
> 

In addition to the design question about whether NBD exports should be
their own new BB,

> @@ -198,8 +207,12 @@ void blk_ref(BlockBackend *blk)
>   * If this drops it to zero, destroy @blk.
>   * For convenience, do nothing if @blk is null.
>   */
> -void blk_unref(BlockBackend *blk)
> +void blk_unref(BlockBackend *blk, Notifier *close_all_notifier)
>  {
> +    if (close_all_notifier) {
> +        notifier_remove(close_all_notifier);
> +    }
> +
>      if (blk) {

Does removing a notifier when blk is NULL make sense?
Max Reitz Jan. 28, 2015, 10:51 p.m. UTC | #8
On 2015-01-28 at 17:44, Eric Blake wrote:
> On 01/26/2015 12:27 PM, Max Reitz wrote:
>> Every time a reference to a BlockBackend is taken, a notifier for
>> bdrv_close_all() has to be deposited so the reference holder can
>> relinquish its reference when bdrv_close_all() is called. That notifier
>> should be revoked on a bdrv_unref() call.
>>
> In addition to the design question about whether NBD exports should be
> their own new BB,
>
>> @@ -198,8 +207,12 @@ void blk_ref(BlockBackend *blk)
>>    * If this drops it to zero, destroy @blk.
>>    * For convenience, do nothing if @blk is null.
>>    */
>> -void blk_unref(BlockBackend *blk)
>> +void blk_unref(BlockBackend *blk, Notifier *close_all_notifier)
>>   {
>> +    if (close_all_notifier) {
>> +        notifier_remove(close_all_notifier);
>> +    }
>> +
>>       if (blk) {
> Does removing a notifier when blk is NULL make sense?

Considering the pattern is probably "allocate notifier; if that fails, 
goto fail; create BB; if that fails, goto fail; ...; fail: blk_unref(); 
free notifier", the is indeed a case where the notifier may be non-NULL 
but not yet registered (if "create BB" failed). Therefore, to simplify 
the caller code, notifier_remove() should only be called if blk is not 
NULL, right.

Will fix.

Max
Max Reitz Jan. 28, 2015, 10:56 p.m. UTC | #9
On 2015-01-28 at 17:05, Eric Blake wrote:
> On 01/26/2015 02:22 PM, Paolo Bonzini wrote:
>>
>> On 26/01/2015 22:13, Max Reitz wrote:
>>>> An eject blocker would also break backwards-compatibility though.  What
>>>> about an eject notifier?  Would that concept make sense?
>>> It does make sense (in that it is the way I would implement "just do
>>> what we always did"), but I just don't like it for the fact that it
>>> makes NBD a special snowflake. I can live with it, though.
>> Yes, it's weird.  But this is just the backwards-compatible solution.
>>
>> I'm okay with implementing only the new solution, but:
>>
>> - the old QMP (and HMP?) commands must be removed
> Back-compat is a bear to figure out. From libvirt's perspective, it is
> okay to require a newer libvirt in order to drive newer qemu (the new
> libvirt knowing how to probe whether old or new commands exist, and use
> the right one); but it is still awkward any time upgrading qemu without
> libvirt causes things to break gratuitously.
>
>> - the new command probably must not reuse the same BB as the guest, and
>> I am not sure that this is possible.
> We've had that design goal in the back of our minds for some time
> (sharing a single BDS among multiple devices) - but I don't think it has
> actually happened yet, so if this is the first time we make it happen,
> there may be lots of details to get right.  But it makes the most sense
> (exporting and NBD disk is a form of creating a _new_ BB - distinct from
> a guest-visible device, but both uses are definitely backends; and
> sharing the same BDS among both backends makes total sense, so that the
> drive visible to the guest can change medium without invalidating the
> NBD serving up the old contents).

Well, I've looked up the discussion Markus, Kevin and me were having; 
our result was that some users may find it useful to have an own BB for 
an NBD server, while others may want to re-use an existing BB. The 
former would be requested by creating an NBD server on a node-name 
instead of giving a device name; if given a device name, NBD will reuse 
that BB, and will detach itself on eject.

Somehow I lost track of that final detail (I blame Christmas and New 
Year), so that's indeed what we decided upon. I will implement it in v2 
(an eject notifier for BBs, which is then used by NBD).

For the sake of completeness: Currently, it is impossible to have 
multiple BBs per BDS, so we cannot yet implement having a separate BB 
for the NBD server. This issue is however unrelated to this series, so 
we should be fine for now.

Max
Paolo Bonzini Jan. 29, 2015, 10:59 a.m. UTC | #10
On 28/01/2015 23:56, Max Reitz wrote:
> Somehow I lost track of that final detail (I blame Christmas and New
> Year), so that's indeed what we decided upon. I will implement it in v2
> (an eject notifier for BBs, which is then used by NBD).

Yeah, I was thinking again about it and an eject notifier really sounds
like the best plan.

Paolo
diff mbox

Patch

diff --git a/block/block-backend.c b/block/block-backend.c
index 7c18f8d..9f14f9b 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -74,7 +74,8 @@  static QTAILQ_HEAD(, BlockBackend) blk_backends =
  * Store an error through @errp on failure, unless it's null.
  * Return the new BlockBackend on success, null on failure.
  */
-BlockBackend *blk_new(const char *name, Error **errp)
+BlockBackend *blk_new(const char *name, Notifier *close_all_notifier,
+                      Error **errp)
 {
     BlockBackend *blk;
 
@@ -97,6 +98,9 @@  BlockBackend *blk_new(const char *name, Error **errp)
     blk = g_new0(BlockBackend, 1);
     blk->name = g_strdup(name);
     blk->refcnt = 1;
+    if (close_all_notifier) {
+        bdrv_add_close_all_notifier(close_all_notifier);
+    }
     QTAILQ_INSERT_TAIL(&blk_backends, blk, link);
     return blk;
 }
@@ -105,12 +109,13 @@  BlockBackend *blk_new(const char *name, Error **errp)
  * Create a new BlockBackend with a new BlockDriverState attached.
  * Otherwise just like blk_new(), which see.
  */
-BlockBackend *blk_new_with_bs(const char *name, Error **errp)
+BlockBackend *blk_new_with_bs(const char *name, Notifier *close_all_notifier,
+                              Error **errp)
 {
     BlockBackend *blk;
     BlockDriverState *bs;
 
-    blk = blk_new(name, errp);
+    blk = blk_new(name, close_all_notifier, errp);
     if (!blk) {
         return NULL;
     }
@@ -135,12 +140,12 @@  BlockBackend *blk_new_with_bs(const char *name, Error **errp)
  */
 BlockBackend *blk_new_open(const char *name, const char *filename,
                            const char *reference, QDict *options, int flags,
-                           Error **errp)
+                           Notifier *close_all_notifier, Error **errp)
 {
     BlockBackend *blk;
     int ret;
 
-    blk = blk_new_with_bs(name, errp);
+    blk = blk_new_with_bs(name, close_all_notifier, errp);
     if (!blk) {
         QDECREF(options);
         return NULL;
@@ -148,7 +153,7 @@  BlockBackend *blk_new_open(const char *name, const char *filename,
 
     ret = bdrv_open(&blk->bs, filename, reference, options, flags, NULL, errp);
     if (ret < 0) {
-        blk_unref(blk);
+        blk_unref(blk, close_all_notifier);
         return NULL;
     }
 
@@ -188,9 +193,13 @@  static void drive_info_del(DriveInfo *dinfo)
  * Increment @blk's reference count.
  * @blk must not be null.
  */
-void blk_ref(BlockBackend *blk)
+void blk_ref(BlockBackend *blk, Notifier *close_all_notifier)
 {
     blk->refcnt++;
+
+    if (close_all_notifier) {
+        bdrv_add_close_all_notifier(close_all_notifier);
+    }
 }
 
 /*
@@ -198,8 +207,12 @@  void blk_ref(BlockBackend *blk)
  * If this drops it to zero, destroy @blk.
  * For convenience, do nothing if @blk is null.
  */
-void blk_unref(BlockBackend *blk)
+void blk_unref(BlockBackend *blk, Notifier *close_all_notifier)
 {
+    if (close_all_notifier) {
+        notifier_remove(close_all_notifier);
+    }
+
     if (blk) {
         assert(blk->refcnt > 0);
         if (!--blk->refcnt) {
@@ -347,6 +360,21 @@  void blk_insert_bs(BlockBackend *blk, BlockDriverState *bs)
     bs->blk = blk;
 }
 
+typedef struct DevCloseAllNotifier {
+    Notifier n;
+    BlockBackend *blk;
+    QTAILQ_ENTRY(DevCloseAllNotifier) next;
+} DevCloseAllNotifier;
+
+static QTAILQ_HEAD(, DevCloseAllNotifier) close_all_notifiers =
+    QTAILQ_HEAD_INITIALIZER(close_all_notifiers);
+
+static void dev_close_all_notifier(Notifier *n, void *data)
+{
+    DevCloseAllNotifier *can = DO_UPCAST(DevCloseAllNotifier, n, n);
+    blk_detach_dev(can->blk, can->blk->dev);
+}
+
 /*
  * Attach device model @dev to @blk.
  * Return 0 on success, -EBUSY when a device model is attached already.
@@ -354,10 +382,19 @@  void blk_insert_bs(BlockBackend *blk, BlockDriverState *bs)
 int blk_attach_dev(BlockBackend *blk, void *dev)
 /* TODO change to DeviceState *dev when all users are qdevified */
 {
+    DevCloseAllNotifier *can;
+
     if (blk->dev) {
         return -EBUSY;
     }
-    blk_ref(blk);
+
+    can = g_new0(DevCloseAllNotifier, 1);
+    can->n.notify = dev_close_all_notifier;
+    can->blk = blk;
+
+    QTAILQ_INSERT_TAIL(&close_all_notifiers, can, next);
+
+    blk_ref(blk, &can->n);
     blk->dev = dev;
     blk_iostatus_reset(blk);
     return 0;
@@ -382,12 +419,24 @@  void blk_attach_dev_nofail(BlockBackend *blk, void *dev)
 void blk_detach_dev(BlockBackend *blk, void *dev)
 /* TODO change to DeviceState *dev when all users are qdevified */
 {
+    DevCloseAllNotifier *can;
+
     assert(blk->dev == dev);
     blk->dev = NULL;
     blk->dev_ops = NULL;
     blk->dev_opaque = NULL;
     blk->guest_block_size = 512;
-    blk_unref(blk);
+
+    QTAILQ_FOREACH(can, &close_all_notifiers, next) {
+        if (can->blk == blk) {
+            break;
+        }
+    }
+    assert(can);
+
+    blk_unref(blk, &can->n);
+    QTAILQ_REMOVE(&close_all_notifiers, can, next);
+    g_free(can);
 }
 
 /*
diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index 22e95d1..eb5f9a0 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -47,36 +47,11 @@  void qmp_nbd_server_start(SocketAddress *addr, Error **errp)
     }
 }
 
-/* Hook into the BlockDriverState notifiers to close the export when
- * the file is closed.
- */
-typedef struct NBDCloseNotifier {
-    Notifier n;
-    NBDExport *exp;
-    QTAILQ_ENTRY(NBDCloseNotifier) next;
-} NBDCloseNotifier;
-
-static QTAILQ_HEAD(, NBDCloseNotifier) close_notifiers =
-    QTAILQ_HEAD_INITIALIZER(close_notifiers);
-
-static void nbd_close_notifier(Notifier *n, void *data)
-{
-    NBDCloseNotifier *cn = DO_UPCAST(NBDCloseNotifier, n, n);
-
-    notifier_remove(&cn->n);
-    QTAILQ_REMOVE(&close_notifiers, cn, next);
-
-    nbd_export_close(cn->exp);
-    nbd_export_put(cn->exp);
-    g_free(cn);
-}
-
 void qmp_nbd_server_add(const char *device, bool has_writable, bool writable,
                         Error **errp)
 {
     BlockBackend *blk;
     NBDExport *exp;
-    NBDCloseNotifier *n;
 
     if (server_fd == -1) {
         error_setg(errp, "NBD server not running");
@@ -108,20 +83,11 @@  void qmp_nbd_server_add(const char *device, bool has_writable, bool writable,
     exp = nbd_export_new(blk, 0, -1, writable ? 0 : NBD_FLAG_READ_ONLY, NULL);
 
     nbd_export_set_name(exp, device);
-
-    n = g_new0(NBDCloseNotifier, 1);
-    n->n.notify = nbd_close_notifier;
-    n->exp = exp;
-    blk_add_close_notifier(blk, &n->n);
-    QTAILQ_INSERT_TAIL(&close_notifiers, n, next);
 }
 
 void qmp_nbd_server_stop(Error **errp)
 {
-    while (!QTAILQ_EMPTY(&close_notifiers)) {
-        NBDCloseNotifier *cn = QTAILQ_FIRST(&close_notifiers);
-        nbd_close_notifier(&cn->n, nbd_export_get_blockdev(cn->exp));
-    }
+    nbd_export_close_all();
 
     if (server_fd != -1) {
         qemu_set_fd_handler2(server_fd, NULL, NULL, NULL, NULL);
diff --git a/blockdev.c b/blockdev.c
index 9c5b6ac..fd3f50d 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -47,6 +47,15 @@ 
 #include "trace.h"
 #include "sysemu/arch_init.h"
 
+typedef struct BlockdevCloseAllNotifier {
+    Notifier n;
+    BlockBackend *blk;
+    QTAILQ_ENTRY(BlockdevCloseAllNotifier) next;
+} BlockdevCloseAllNotifier;
+
+static QTAILQ_HEAD(, BlockdevCloseAllNotifier) close_all_notifiers =
+    QTAILQ_HEAD_INITIALIZER(close_all_notifiers);
+
 static const char *const if_name[IF_COUNT] = {
     [IF_NONE] = "none",
     [IF_IDE] = "ide",
@@ -78,6 +87,37 @@  static int if_max_devs[IF_COUNT] = {
     [IF_SCSI] = 7,
 };
 
+static void monitor_blk_unref_with_can(BlockBackend *blk,
+                                       BlockdevCloseAllNotifier *can)
+{
+    if (can) {
+        assert(can->blk == blk);
+    } else {
+        QTAILQ_FOREACH(can, &close_all_notifiers, next) {
+            if (can->blk == blk) {
+                break;
+            }
+        }
+        assert(can);
+    }
+
+    blk_unref(blk, &can->n);
+    QTAILQ_REMOVE(&close_all_notifiers, can, next);
+    g_free(can);
+}
+
+void monitor_blk_unref(BlockBackend *blk)
+{
+    monitor_blk_unref_with_can(blk, NULL);
+}
+
+static void blockdev_close_all_notifier(Notifier *n, void *data)
+{
+    BlockdevCloseAllNotifier *can = DO_UPCAST(BlockdevCloseAllNotifier, n, n);
+
+    monitor_blk_unref_with_can(can->blk, can);
+}
+
 /**
  * Boards may call this to offer board-by-board overrides
  * of the default, global values.
@@ -140,7 +180,7 @@  void blockdev_auto_del(BlockBackend *blk)
     DriveInfo *dinfo = blk_legacy_dinfo(blk);
 
     if (dinfo && dinfo->auto_del) {
-        blk_unref(blk);
+        monitor_blk_unref(blk);
     }
 }
 
@@ -460,6 +500,7 @@  static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
     bool has_driver_specific_opts;
     BlockdevDetectZeroesOptions detect_zeroes =
         BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF;
+    BlockdevCloseAllNotifier *can;
 
     /* Check common options by copying from bs_opts to opts, all other options
      * stay in bs_opts for processing by bdrv_open(). */
@@ -536,14 +577,21 @@  static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
     }
 
     /* init */
+    can = g_new0(BlockdevCloseAllNotifier, 1);
+    can->n.notify = blockdev_close_all_notifier;
+
     if ((!file || !*file) && !has_driver_specific_opts) {
         BlockBackendRootState *blk_rs;
 
-        blk = blk_new(qemu_opts_id(opts), errp);
+        blk = blk_new(qemu_opts_id(opts), &can->n, errp);
         if (!blk) {
+            g_free(can);
             goto early_err;
         }
 
+        can->blk = blk;
+        QTAILQ_INSERT_TAIL(&close_all_notifiers, can, next);
+
         blk_rs = blk_get_root_state(blk);
         blk_rs->open_flags    = bdrv_flags;
         blk_rs->read_only     = !(bdrv_flags & BDRV_O_RDWR);
@@ -561,12 +609,16 @@  static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
         }
 
         blk = blk_new_open(qemu_opts_id(opts), file, NULL, bs_opts, bdrv_flags,
-                           errp);
+                           &can->n, errp);
         if (!blk) {
+            g_free(can);
             goto err_no_bs_opts;
         }
         bs = blk_bs(blk);
 
+        can->blk = blk;
+        QTAILQ_INSERT_TAIL(&close_all_notifiers, can, next);
+
         bs->detect_zeroes = detect_zeroes;
 
         /* disk I/O throttling */
@@ -2250,7 +2302,7 @@  int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
         blk_set_on_error(blk, BLOCKDEV_ON_ERROR_REPORT,
                          BLOCKDEV_ON_ERROR_REPORT);
     } else {
-        blk_unref(blk);
+        monitor_blk_unref(blk);
     }
 
     aio_context_release(aio_context);
@@ -3174,7 +3226,7 @@  void qmp_blockdev_add(BlockdevOptions *options, Error **errp)
 
     if (bs && bdrv_key_required(bs)) {
         if (blk) {
-            blk_unref(blk);
+            monitor_blk_unref(blk);
         } else {
             bdrv_unref(bs);
         }
diff --git a/device-hotplug.c b/device-hotplug.c
index 9e38cc4..aa05a71 100644
--- a/device-hotplug.c
+++ b/device-hotplug.c
@@ -77,6 +77,6 @@  void drive_hot_add(Monitor *mon, const QDict *qdict)
 
 err:
     if (dinfo) {
-        blk_unref(blk_by_legacy_dinfo(dinfo));
+        monitor_blk_unref(blk_by_legacy_dinfo(dinfo));
     }
 }
diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index 665e706..3f80cbe 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -909,8 +909,10 @@  static int blk_connect(struct XenDevice *xendev)
 
         /* setup via xenbus -> create new block driver instance */
         xen_be_printf(&blkdev->xendev, 2, "create new bdrv (xenbus setup)\n");
+        /* No need to give a close_all_notifier because blk_attach_dev_nofail()
+         * and blk_detach_dev() will keep track of our reference */
         blkdev->blk = blk_new_open(blkdev->dev, blkdev->filename, NULL, options,
-                                   qflags, &local_err);
+                                   qflags, NULL, &local_err);
         if (!blkdev->blk) {
             xen_be_printf(&blkdev->xendev, 0, "error: %s\n",
                           error_get_pretty(local_err));
@@ -926,11 +928,16 @@  static int blk_connect(struct XenDevice *xendev)
             blkdev->blk = NULL;
             return -1;
         }
-        /* blkdev->blk is not create by us, we get a reference
-         * so we can blk_unref() unconditionally */
-        blk_ref(blkdev->blk);
     }
+
     blk_attach_dev_nofail(blkdev->blk, blkdev);
+    if (!blkdev->dinfo) {
+        /* blkdev->blk has just been created; blk_attach_dev_nofail() counts
+         * the reference blkdev->blk, so we do not have to keep track (which
+         * allows us to ignore bdrv_close_all()) */
+        blk_unref(blkdev->blk, NULL);
+    }
+
     blkdev->file_size = blk_getlength(blkdev->blk);
     if (blkdev->file_size < 0) {
         BlockDriverState *bs = blk_bs(blkdev->blk);
@@ -1033,7 +1040,6 @@  static void blk_disconnect(struct XenDevice *xendev)
 
     if (blkdev->blk) {
         blk_detach_dev(blkdev->blk, blkdev);
-        blk_unref(blkdev->blk);
         blkdev->blk = NULL;
     }
     xen_be_unbind_evtchn(&blkdev->xendev);
diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index b0172fb..becf0f5 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -184,7 +184,6 @@  int pci_piix3_xen_ide_unplug(DeviceState *dev)
                 blk_detach_dev(blk, ds);
             }
             pci_ide->bus[di->bus].ifs[di->unit].blk = NULL;
-            blk_unref(blk);
         }
     }
     qdev_reset_all(DEVICE(dev));
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index c8b47f7..12a05df 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -13,6 +13,7 @@ 
 #ifndef BLOCK_BACKEND_H
 #define BLOCK_BACKEND_H
 
+#include "qemu/notify.h"
 #include "qemu/typedefs.h"
 #include "qapi/error.h"
 
@@ -60,13 +61,15 @@  typedef struct BlockDevOps {
     void (*resize_cb)(void *opaque);
 } BlockDevOps;
 
-BlockBackend *blk_new(const char *name, Error **errp);
-BlockBackend *blk_new_with_bs(const char *name, Error **errp);
+BlockBackend *blk_new(const char *name, Notifier *close_all_notifier,
+                      Error **errp);
+BlockBackend *blk_new_with_bs(const char *name, Notifier *close_all_notifier,
+                              Error **errp);
 BlockBackend *blk_new_open(const char *name, const char *filename,
                            const char *reference, QDict *options, int flags,
-                           Error **errp);
-void blk_ref(BlockBackend *blk);
-void blk_unref(BlockBackend *blk);
+                           Notifier *close_all_notifier, Error **errp);
+void blk_ref(BlockBackend *blk, Notifier *close_all_notifier);
+void blk_unref(BlockBackend *blk, Notifier *close_all_notifier);
 const char *blk_name(BlockBackend *blk);
 BlockBackend *blk_by_name(const char *name);
 BlockBackend *blk_next(BlockBackend *blk);
diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h
index 2a34332..b6091e0 100644
--- a/include/sysemu/blockdev.h
+++ b/include/sysemu/blockdev.h
@@ -67,4 +67,7 @@  DriveInfo *add_init_drive(const char *opts);
 
 void do_commit(Monitor *mon, const QDict *qdict);
 int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
+
+void monitor_blk_unref(BlockBackend *blk);
+
 #endif
diff --git a/nbd.c b/nbd.c
index 53cf82b..d1dc350 100644
--- a/nbd.c
+++ b/nbd.c
@@ -956,9 +956,25 @@  static void blk_aio_detach(void *opaque)
     exp->ctx = NULL;
 }
 
+typedef struct NBDCloseNotifier {
+    Notifier n;
+    NBDExport *exp;
+    QTAILQ_ENTRY(NBDCloseNotifier) next;
+} NBDCloseNotifier;
+
+static QTAILQ_HEAD(, NBDCloseNotifier) close_notifiers =
+    QTAILQ_HEAD_INITIALIZER(close_notifiers);
+
+static void nbd_close_notifier(Notifier *n, void *data)
+{
+    NBDCloseNotifier *cn = DO_UPCAST(NBDCloseNotifier, n, n);
+    nbd_export_close(cn->exp);
+}
+
 NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size,
                           uint32_t nbdflags, void (*close)(NBDExport *))
 {
+    NBDCloseNotifier *n = g_new0(NBDCloseNotifier, 1);
     NBDExport *exp = g_malloc0(sizeof(NBDExport));
     exp->refcount = 1;
     QTAILQ_INIT(&exp->clients);
@@ -968,7 +984,12 @@  NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size,
     exp->size = size == -1 ? blk_getlength(blk) : size;
     exp->close = close;
     exp->ctx = blk_get_aio_context(blk);
-    blk_ref(blk);
+
+    n->n.notify = nbd_close_notifier;
+    n->exp = exp;
+    QTAILQ_INSERT_TAIL(&close_notifiers, n, next);
+
+    blk_ref(blk, &n->n);
     blk_add_aio_context_notifier(blk, blk_aio_attached, blk_aio_detach, exp);
     /*
      * NBD exports are used for non-shared storage migration.  Make sure
@@ -1014,6 +1035,7 @@  void nbd_export_set_name(NBDExport *exp, const char *name)
 
 void nbd_export_close(NBDExport *exp)
 {
+    NBDCloseNotifier *n;
     NBDClient *client, *next;
 
     nbd_export_get(exp);
@@ -1022,11 +1044,22 @@  void nbd_export_close(NBDExport *exp)
     }
     nbd_export_set_name(exp, NULL);
     nbd_export_put(exp);
+
     if (exp->blk) {
+        QTAILQ_FOREACH(n, &close_notifiers, next) {
+            if (n->exp == exp) {
+                break;
+            }
+        }
+        assert(n);
+
         blk_remove_aio_context_notifier(exp->blk, blk_aio_attached,
                                         blk_aio_detach, exp);
-        blk_unref(exp->blk);
+        blk_unref(exp->blk, &n->n);
         exp->blk = NULL;
+
+        QTAILQ_REMOVE(&close_notifiers, n, next);
+        g_free(n);
     }
 }
 
diff --git a/qemu-img.c b/qemu-img.c
index 8b4139e..55c5262 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -300,7 +300,7 @@  static BlockBackend *img_open(const char *id, const char *filename,
         qdict_put_obj(options, "driver", QOBJECT(qstring_from_str(fmt)));
     }
 
-    blk = blk_new_open(id, filename, NULL, options, flags, &local_err);
+    blk = blk_new_open(id, filename, NULL, options, flags, NULL, &local_err);
     if (!blk) {
         error_report("Could not open '%s': %s", filename,
                      error_get_pretty(local_err));
@@ -322,7 +322,7 @@  static BlockBackend *img_open(const char *id, const char *filename,
     }
     return blk;
 fail:
-    blk_unref(blk);
+    blk_unref(blk, NULL);
     return NULL;
 }
 
@@ -711,7 +711,7 @@  static int img_check(int argc, char **argv)
 
 fail:
     qapi_free_ImageCheck(check);
-    blk_unref(blk);
+    blk_unref(blk, NULL);
     return ret;
 }
 
@@ -878,7 +878,7 @@  unref_backing:
 done:
     qemu_progress_end();
 
-    blk_unref(blk);
+    blk_unref(blk, NULL);
 
     if (local_err) {
         qerror_report_err(local_err);
@@ -1294,9 +1294,9 @@  static int img_compare(int argc, char **argv)
 out:
     qemu_vfree(buf1);
     qemu_vfree(buf2);
-    blk_unref(blk2);
+    blk_unref(blk2, NULL);
 out2:
-    blk_unref(blk1);
+    blk_unref(blk1, NULL);
 out3:
     qemu_progress_end();
     return ret;
@@ -1867,11 +1867,11 @@  out:
     qemu_opts_free(create_opts);
     qemu_vfree(buf);
     qemu_opts_del(sn_opts);
-    blk_unref(out_blk);
+    blk_unref(out_blk, NULL);
     g_free(bs);
     if (blk) {
         for (bs_i = 0; bs_i < bs_n; bs_i++) {
-            blk_unref(blk[bs_i]);
+            blk_unref(blk[bs_i], NULL);
         }
         g_free(blk);
     }
@@ -2006,7 +2006,7 @@  static ImageInfoList *collect_image_info_list(const char *filename,
         if (err) {
             error_report("%s", error_get_pretty(err));
             error_free(err);
-            blk_unref(blk);
+            blk_unref(blk, NULL);
             goto err;
         }
 
@@ -2015,7 +2015,7 @@  static ImageInfoList *collect_image_info_list(const char *filename,
         *last = elem;
         last = &elem->next;
 
-        blk_unref(blk);
+        blk_unref(blk, NULL);
 
         filename = fmt = NULL;
         if (chain) {
@@ -2303,7 +2303,7 @@  static int img_map(int argc, char **argv)
     dump_map_entry(output_format, &curr, NULL);
 
 out:
-    blk_unref(blk);
+    blk_unref(blk, NULL);
     return ret < 0;
 }
 
@@ -2427,7 +2427,7 @@  static int img_snapshot(int argc, char **argv)
     }
 
     /* Cleanup */
-    blk_unref(blk);
+    blk_unref(blk, NULL);
     if (ret) {
         return 1;
     }
@@ -2544,7 +2544,7 @@  static int img_rebase(int argc, char **argv)
 
         bdrv_get_backing_filename(bs, backing_name, sizeof(backing_name));
         blk_old_backing = blk_new_open("old_backing", backing_name, NULL,
-                                       options, src_flags, &local_err);
+                                       options, src_flags, NULL, &local_err);
         if (!blk_old_backing) {
             error_report("Could not open old backing file '%s': %s",
                          backing_name, error_get_pretty(local_err));
@@ -2562,7 +2562,8 @@  static int img_rebase(int argc, char **argv)
             }
 
             blk_new_backing = blk_new_open("new_backing", out_baseimg, NULL,
-                                           options, src_flags, &local_err);
+                                           options, src_flags, NULL,
+                                           &local_err);
             if (!blk_new_backing) {
                 error_report("Could not open new backing file '%s': %s",
                              out_baseimg, error_get_pretty(local_err));
@@ -2741,11 +2742,11 @@  out:
     qemu_progress_end();
     /* Cleanup */
     if (!unsafe) {
-        blk_unref(blk_old_backing);
-        blk_unref(blk_new_backing);
+        blk_unref(blk_old_backing, NULL);
+        blk_unref(blk_new_backing, NULL);
     }
 
-    blk_unref(blk);
+    blk_unref(blk, NULL);
     if (ret) {
         return 1;
     }
@@ -2868,7 +2869,7 @@  static int img_resize(int argc, char **argv)
         break;
     }
 out:
-    blk_unref(blk);
+    blk_unref(blk, NULL);
     if (ret) {
         return 1;
     }
@@ -3006,7 +3007,7 @@  static int img_amend(int argc, char **argv)
 out:
     qemu_progress_end();
 
-    blk_unref(blk);
+    blk_unref(blk, NULL);
     qemu_opts_del(opts);
     qemu_opts_free(create_opts);
     g_free(options);
diff --git a/qemu-io.c b/qemu-io.c
index bde97f8..ffbe0f1 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -38,7 +38,7 @@  static ReadLineState *readline_state;
 
 static int close_f(BlockBackend *blk, int argc, char **argv)
 {
-    blk_unref(qemuio_blk);
+    blk_unref(qemuio_blk, NULL);
     qemuio_blk = NULL;
     return 0;
 }
@@ -60,7 +60,7 @@  static int openfile(char *name, int flags, QDict *opts)
         return 1;
     }
 
-    qemuio_blk = blk_new_open("hda", name, NULL, opts, flags, &local_err);
+    qemuio_blk = blk_new_open("hda", name, NULL, opts, flags, NULL, &local_err);
     if (!qemuio_blk) {
         fprintf(stderr, "%s: can't open%s%s: %s\n", progname,
                 name ? " device " : "", name ?: "",
@@ -476,7 +476,7 @@  int main(int argc, char **argv)
      */
     bdrv_drain_all();
 
-    blk_unref(qemuio_blk);
+    blk_unref(qemuio_blk, NULL);
     g_free(readline_state);
     return 0;
 }
diff --git a/qemu-nbd.c b/qemu-nbd.c
index fdc9590..716053b 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -689,7 +689,8 @@  int main(int argc, char **argv)
     }
 
     srcpath = argv[optind];
-    blk = blk_new_open("hda", srcpath, NULL, options, flags, &local_err);
+    /* the reference will be passed on nbd_export_new() */
+    blk = blk_new_open("hda", srcpath, NULL, options, flags, NULL, &local_err);
     if (!blk) {
         errx(EXIT_FAILURE, "Failed to blk_new_open '%s': %s", argv[optind],
              error_get_pretty(local_err));
@@ -723,7 +724,9 @@  int main(int argc, char **argv)
         }
     }
 
+    /* nbd_export_new() takes the reference to blk */
     exp = nbd_export_new(blk, dev_offset, fd_size, nbdflags, nbd_export_closed);
+    blk_unref(blk, NULL);
 
     if (sockpath) {
         fd = unix_socket_incoming(sockpath);
@@ -768,7 +771,6 @@  int main(int argc, char **argv)
         }
     } while (state != TERMINATED);
 
-    blk_unref(blk);
     if (sockpath) {
         unlink(sockpath);
     }