
[2/2] mirror: Wait only for in-flight operations

Message ID: 20200325172326.22347-3-kwolf@redhat.com
State: New
Series: mirror: Fix hang (operation waiting for itself/circular dependency)

Commit Message

Kevin Wolf March 25, 2020, 5:23 p.m. UTC
mirror_wait_for_free_in_flight_slot() just picks a random operation to
wait for. However, a MirrorOp is already in s->ops_in_flight when
mirror_co_read() waits for free slots, so if not enough slots are
immediately available, an operation can end up waiting for itself, or
two or more operations can wait for each other to complete, which
results in a hang.

Fix this by adding a flag to MirrorOp that tells us if the request is
already in flight (and therefore occupies slots that it will later
free), and picking only such operations for waiting.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1794692
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/mirror.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
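
The selection rule described in the commit message can be illustrated with a small standalone sketch. This is not QEMU code: SketchOp and sketch_pick_op_to_wait_for() are hypothetical names invented for illustration, the struct keeps only the three flags that the patched condition tests, and a plain array stands in for s->ops_in_flight. It only shows which operation would be chosen, not the coroutine wait itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for MirrorOp: only the flags that
 * the wait condition looks at. */
typedef struct SketchOp {
    bool is_pseudo_op;
    bool is_active_write;
    bool is_in_flight;   /* true once the op has claimed its slots */
} SketchOp;

/* Return the index of the first op that is safe to wait on, or -1 if
 * there is none.  The condition mirrors the patched one: skip pseudo
 * ops, skip ops that have not claimed slots yet (waiting on those could
 * mean waiting on ourselves), and match the requested request type. */
static int sketch_pick_op_to_wait_for(const SketchOp *ops, size_t n,
                                      bool active)
{
    assert(ops != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (!ops[i].is_pseudo_op && ops[i].is_in_flight &&
            ops[i].is_active_write == active) {
            return (int)i;
        }
    }
    return -1;
}
```

With a list holding one op that is queued but not yet in flight, followed by one that is in flight, the sketch skips the first and returns the index of the second. Without the is_in_flight check it would return the first, which is the situation that can lead to an operation waiting for itself.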

Comments

Eric Blake March 25, 2020, 5:39 p.m. UTC | #1
On 3/25/20 12:23 PM, Kevin Wolf wrote:
> mirror_wait_for_free_in_flight_slot() just picks a random operation to
> wait for. However, a MirrorOp is already in s->ops_in_flight when
> mirror_co_read() waits for free slots, so if not enough slots are
> immediately available, an operation can end up waiting for itself, or
> two or more operations can wait for each other to complete, which
> results in a hang.
> 
> Fix this by adding a flag to MirrorOp that tells us if the request is
> already in flight (and therefore occupies slots that it will later
> free), and picking only such operations for waiting.
> 
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1794692
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>   block/mirror.c | 8 +++++++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 393131b135..7fef52ded2 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -102,6 +102,7 @@ struct MirrorOp {
>   
>       bool is_pseudo_op;
>       bool is_active_write;
> +    bool is_in_flight;
>       CoQueue waiting_requests;
>       Coroutine *co;
>   
> @@ -293,7 +294,9 @@ mirror_wait_for_any_operation(MirrorBlockJob *s, bool active)
>            * caller of this function.  Since there is only one pseudo op
>            * at any given time, we will always find some real operation
>            * to wait on. */
> -        if (!op->is_pseudo_op && op->is_active_write == active) {
> +        if (!op->is_pseudo_op && op->is_in_flight &&
> +            op->is_active_write == active)
> +        {
>               qemu_co_queue_wait(&op->waiting_requests, NULL);

Looks like a one-way transition - op->is_in_flight always starts as 
false, and only ever gets set to true (once the op is finished, op is no 
longer needed).  And being more selective on what you wait for here does 
look like it should work in more cases than what patch 1 reverted.

Reviewed-by: Eric Blake <eblake@redhat.com>

Patch

diff --git a/block/mirror.c b/block/mirror.c
index 393131b135..7fef52ded2 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -102,6 +102,7 @@  struct MirrorOp {
 
     bool is_pseudo_op;
     bool is_active_write;
+    bool is_in_flight;
     CoQueue waiting_requests;
     Coroutine *co;
 
@@ -293,7 +294,9 @@  mirror_wait_for_any_operation(MirrorBlockJob *s, bool active)
          * caller of this function.  Since there is only one pseudo op
          * at any given time, we will always find some real operation
          * to wait on. */
-        if (!op->is_pseudo_op && op->is_active_write == active) {
+        if (!op->is_pseudo_op && op->is_in_flight &&
+            op->is_active_write == active)
+        {
             qemu_co_queue_wait(&op->waiting_requests, NULL);
             return;
         }
@@ -367,6 +370,7 @@  static void coroutine_fn mirror_co_read(void *opaque)
     /* Copy the dirty cluster.  */
     s->in_flight++;
     s->bytes_in_flight += op->bytes;
+    op->is_in_flight = true;
     trace_mirror_one_iteration(s, op->offset, op->bytes);
 
     ret = bdrv_co_preadv(s->mirror_top_bs->backing, op->offset, op->bytes,
@@ -382,6 +386,7 @@  static void coroutine_fn mirror_co_zero(void *opaque)
     op->s->in_flight++;
     op->s->bytes_in_flight += op->bytes;
     *op->bytes_handled = op->bytes;
+    op->is_in_flight = true;
 
     ret = blk_co_pwrite_zeroes(op->s->target, op->offset, op->bytes,
                                op->s->unmap ? BDRV_REQ_MAY_UNMAP : 0);
@@ -396,6 +401,7 @@  static void coroutine_fn mirror_co_discard(void *opaque)
     op->s->in_flight++;
     op->s->bytes_in_flight += op->bytes;
     *op->bytes_handled = op->bytes;
+    op->is_in_flight = true;
 
     ret = blk_co_pdiscard(op->s->target, op->offset, op->bytes);
     mirror_write_complete(op, ret);