[06/15] qed: avoid deadlock on emulated synchronous I/O

Message ID 1311774295-8696-7-git-send-email-stefanha@linux.vnet.ibm.com
State New

Commit Message

Stefan Hajnoczi July 27, 2011, 1:44 p.m.
The block layer emulates synchronous bdrv_read()/bdrv_write() for
drivers that only provide the asynchronous interfaces.  The emulation
issues an asynchronous request inside a new "async context" and waits
for that request to complete.  If currently outstanding requests
complete during this time, their completion functions are not invoked
until the async context is popped again.

This can lead to deadlock if an allocating write is being processed when
synchronous I/O emulation starts.  The emulated synchronous write will
be queued because an existing request is being processed.  But the
existing request cannot complete until the async context is popped.
The result is that qemu_aio_wait() sits in a deadlock.
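
The deferral described above can be sketched with a small toy model.  This
is not QEMU code: every name below (ToyReq, toy_ctx_push(), toy_poll() and
so on) is hypothetical and exists only to illustrate the idea that a
completion function runs only when the current async context matches the
context the request was issued in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; all names are hypothetical and not part of QEMU. */
#define TOY_MAX_REQS 8

typedef struct {
    int ctx_id;      /* async context the request was issued in */
    bool io_done;    /* the underlying I/O has finished */
    bool cb_invoked; /* the completion function has run */
} ToyReq;

static int toy_ctx_id;               /* 0 is the outermost context */
static ToyReq toy_reqs[TOY_MAX_REQS];
static size_t toy_nreqs;

static void toy_ctx_push(void)
{
    toy_ctx_id++;
}

static void toy_ctx_pop(void)
{
    assert(toy_ctx_id > 0);
    toy_ctx_id--;
}

static ToyReq *toy_submit(void)
{
    ToyReq *req;

    assert(toy_nreqs < TOY_MAX_REQS);
    req = &toy_reqs[toy_nreqs++];
    req->ctx_id = toy_ctx_id;
    req->io_done = false;
    req->cb_invoked = false;
    return req;
}

/* Invoke completions, but only for requests issued in the current context.
 * Returns the number of completion functions invoked.
 */
static int toy_poll(void)
{
    int invoked = 0;
    size_t i;

    for (i = 0; i < toy_nreqs; i++) {
        ToyReq *req = &toy_reqs[i];

        if (req->io_done && !req->cb_invoked && req->ctx_id == toy_ctx_id) {
            req->cb_invoked = true;
            invoked++;
        }
    }
    return invoked;
}

/* Returns true if the outer request's completion is held back while a
 * nested context is active and delivered once that context is popped.
 */
static bool toy_demo_deferred_completion(void)
{
    ToyReq *outer;
    int nested, after;
    bool deferred;

    toy_ctx_id = 0;
    toy_nreqs = 0;

    outer = toy_submit();      /* stands in for an in-flight allocating write */
    toy_ctx_push();            /* synchronous I/O emulation starts */
    outer->io_done = true;     /* the outer I/O finishes while nested */
    nested = toy_poll();       /* nothing is delivered in the nested context */
    deferred = !outer->cb_invoked;
    toy_ctx_pop();
    after = toy_poll();        /* delivered once back in context 0 */

    return nested == 0 && deferred && after == 1 && outer->cb_invoked;
}
```

In this model, a nested request that waits for the outer request to finish
can never make progress, because the wait loop runs in the nested context
and the outer completion is only delivered after the pop.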

Address this problem in two ways:

1. Add an assertion so that we instantly know if this corner case is
   hit.  This saves us time by giving a clear failure indication.

2. Ignore the copy-on-read hint for emulated synchronous reads.  This
   allows us to do emulated synchronous reads without hitting the
   deadlock.
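
The two checks can be sketched in isolation as plain predicates.  This is
an illustration under assumptions, not the patch itself: the helper names
and the TOY_COPY_ON_READ value are hypothetical, and the context id is
passed in as a parameter rather than read from get_async_context_id().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag value for illustration; the real QED_AIOCB_COPY_ON_READ
 * is defined in block/qed.h and may differ.
 */
#define TOY_COPY_ON_READ 0x1

/* Fix 1: queuing behind another allocating write is only safe in the
 * outermost async context (id 0).  The patch asserts this condition.
 */
static bool toy_may_queue_allocating_write(int ctx_id)
{
    return ctx_id == 0;
}

/* Fix 2: honor the copy-on-read hint only when there is a backing file,
 * the hint is set, and the read is not an emulated synchronous read.
 */
static bool toy_should_copy_on_read(bool has_backing, int flags, int ctx_id)
{
    return has_backing && (flags & TOY_COPY_ON_READ) && ctx_id == 0;
}
```

With these predicates, a read issued in a nested context (ctx_id != 0)
falls back to a plain read from the backing file and never reaches the
allocating write queue.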

Keep this as a separate commit instead of merging with previous QED
patches so it is easy to drop when coroutines are introduced and
eliminate async contexts.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
 block/qed.c |   12 +++++++++++-
 1 files changed, 11 insertions(+), 1 deletions(-)


diff --git a/block/qed.c b/block/qed.c
index 6ca57f2..ffdbc2d 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -1120,6 +1120,14 @@  static bool qed_start_allocating_write(QEDAIOCB *acb)
     if (acb != QSIMPLEQ_FIRST(&s->allocating_write_reqs) ||
         s->allocating_write_reqs_plugged) {
+        /* Queuing an emulated synchronous write causes deadlock since
+         * currently outstanding requests are not in the current async context
+         * and their completion will never be invoked.  Once the block layer
+         * moves to truly asynchronous semantics this failure case will be
+         * eliminated.
+         */
+        assert(get_async_context_id() == 0);
         return false;
     }
     return true;
@@ -1246,7 +1254,9 @@  static void qed_aio_read_data(void *opaque, int ret,
     } else if (ret != QED_CLUSTER_FOUND) {
         BlockDriverCompletionFunc *cb = qed_aio_next_io;
-        if (bs->backing_hd && (acb->flags & QED_AIOCB_COPY_ON_READ)) {
+        /* See qed_start_allocating_write() for get_async_context_id() hack */
+        if (bs->backing_hd && (acb->flags & QED_AIOCB_COPY_ON_READ) &&
+            get_async_context_id() == 0) {
             if (!qed_start_allocating_write(acb)) {
                 return; /* wait for current allocating write to complete */