[v2,1/2] block: Fix early failure in multiwrite

Message ID 1278072457-8156-2-git-send-email-kwolf@redhat.com
State New

Commit Message

Kevin Wolf July 2, 2010, 12:07 p.m. UTC
bdrv_aio_writev may call the callback immediately (and it will commonly do so
in error cases). Current code doesn't consider this. For details see the
comment added by this patch.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c |   35 +++++++++++++++++++++++++++++------
 1 files changed, 29 insertions(+), 6 deletions(-)

Comments

Stefan Hajnoczi July 2, 2010, 1:18 p.m. UTC | #1
On Fri, Jul 2, 2010 at 1:07 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> bdrv_aio_writev may call the callback immediately (and it will commonly do so
> in error cases). Current code doesn't consider this. For details see the
> comment added by this patch.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block.c |   35 +++++++++++++++++++++++++++++------
>  1 files changed, 29 insertions(+), 6 deletions(-)
>
> diff --git a/block.c b/block.c
> index 9176dec..e65971c 100644
> --- a/block.c
> +++ b/block.c
> @@ -2183,8 +2183,29 @@ int bdrv_aio_multiwrite(BlockDriverState *bs, BlockRequest *reqs, int num_reqs)
>     // Check for mergable requests
>     num_reqs = multiwrite_merge(bs, reqs, num_reqs, mcb);
>
> -    // Run the aio requests
> +    /*
> +     * Run the aio requests. As soon as one request can't be submitted
> +     * successfully, fail all requests that are not yet submitted (we must
> +     * return failure for all requests anyway)
> +     *
> +     * num_requests cannot be set to the right value immediately: If
> +     * bdrv_aio_writev fails for some request, num_requests would be too high
> +     * and therefore multiwrite_cb() would never recognize the multiwrite
> +     * request as completed. We also cannot use the loop variable i to set it
> +     * when the first request fails because the callback may already have been
> +     * called for previously submitted requests. Thus, num_requests must be
> +     * incremented for each request that is submitted.
> +     *
> +     * The problem that callbacks may be called early also means that we need
> +     * to take care that num_requests doesn't become 0 before all requests are
> +     * submitted - multiwrite_cb() would consider the multiwrite request
> +     * completed. A dummy request that is "completed" by a manual call to
> +     * multiwrite_cb() takes care of this.
> +     */
> +    mcb->num_requests = 1;
> +
>     for (i = 0; i < num_reqs; i++) {
> +        mcb->num_requests++;
>         acb = bdrv_aio_writev(bs, reqs[i].sector, reqs[i].qiov,
>             reqs[i].nb_sectors, multiwrite_cb, mcb);
>
> @@ -2192,22 +2213,24 @@ int bdrv_aio_multiwrite(BlockDriverState *bs, BlockRequest *reqs, int num_reqs)
>             // We can only fail the whole thing if no request has been
>             // submitted yet. Otherwise we'll wait for the submitted AIOs to
>             // complete and report the error in the callback.
> -            if (mcb->num_requests == 0) {
> -                reqs[i].error = -EIO;
> +            if (i == 0) {
>                 goto fail;
>             } else {
> -                mcb->num_requests++;
>                 multiwrite_cb(mcb, -EIO);

When bdrv_aio_writev() fails we don't know if the callback has been
invoked by the block driver.  Qcow2 will invoke the callback in some
cases.  This is a problem because num_requests will be decremented
twice if we unconditionally call it here.

Stefan
Kevin Wolf July 2, 2010, 1:32 p.m. UTC | #2
On 02.07.2010 15:18, Stefan Hajnoczi wrote:
> On Fri, Jul 2, 2010 at 1:07 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> bdrv_aio_writev may call the callback immediately (and it will commonly do so
>> in error cases). Current code doesn't consider this. For details see the
>> comment added by this patch.
>>
>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>> ---
>>  block.c |   35 +++++++++++++++++++++++++++++------
>>  1 files changed, 29 insertions(+), 6 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index 9176dec..e65971c 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -2183,8 +2183,29 @@ int bdrv_aio_multiwrite(BlockDriverState *bs, BlockRequest *reqs, int num_reqs)
>>     // Check for mergable requests
>>     num_reqs = multiwrite_merge(bs, reqs, num_reqs, mcb);
>>
>> -    // Run the aio requests
>> +    /*
>> +     * Run the aio requests. As soon as one request can't be submitted
>> +     * successfully, fail all requests that are not yet submitted (we must
>> +     * return failure for all requests anyway)
>> +     *
>> +     * num_requests cannot be set to the right value immediately: If
>> +     * bdrv_aio_writev fails for some request, num_requests would be too high
>> +     * and therefore multiwrite_cb() would never recognize the multiwrite
>> +     * request as completed. We also cannot use the loop variable i to set it
>> +     * when the first request fails because the callback may already have been
>> +     * called for previously submitted requests. Thus, num_requests must be
>> +     * incremented for each request that is submitted.
>> +     *
>> +     * The problem that callbacks may be called early also means that we need
>> +     * to take care that num_requests doesn't become 0 before all requests are
>> +     * submitted - multiwrite_cb() would consider the multiwrite request
>> +     * completed. A dummy request that is "completed" by a manual call to
>> +     * multiwrite_cb() takes care of this.
>> +     */
>> +    mcb->num_requests = 1;
>> +
>>     for (i = 0; i < num_reqs; i++) {
>> +        mcb->num_requests++;
>>         acb = bdrv_aio_writev(bs, reqs[i].sector, reqs[i].qiov,
>>             reqs[i].nb_sectors, multiwrite_cb, mcb);
>>
>> @@ -2192,22 +2213,24 @@ int bdrv_aio_multiwrite(BlockDriverState *bs, BlockRequest *reqs, int num_reqs)
>>             // We can only fail the whole thing if no request has been
>>             // submitted yet. Otherwise we'll wait for the submitted AIOs to
>>             // complete and report the error in the callback.
>> -            if (mcb->num_requests == 0) {
>> -                reqs[i].error = -EIO;
>> +            if (i == 0) {
>>                 goto fail;
>>             } else {
>> -                mcb->num_requests++;
>>                 multiwrite_cb(mcb, -EIO);
> 
> When bdrv_aio_writev() fails we don't know if the callback has been
> invoked by the block driver.  Qcow2 will invoke the callback in some
> cases.  This is a problem because num_requests will be decremented
> twice if we unconditionally call it here.

Talked to Stefan on IRC and we came to the conclusion that it's not a
problem in fact: qcow_aio_writev() either returns NULL or calls a
callback, but it never does both.

If a block driver returned NULL and called a callback for the same
request that would be a bug in the block driver.

Kevin
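
The contract described in this reply (a submit call either returns NULL or invokes the callback, never both) can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `driver_writev`, `Handle`, `Probe`, `probe_cb` and `write_one` are hypothetical names invented for the illustration and are not QEMU functions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef void CompletionFunc(void *opaque, int ret);

/* Hypothetical request handle; a real driver would allocate one per request. */
typedef struct { int in_use; } Handle;
static Handle the_handle;

/*
 * Hypothetical driver entry point. It has two distinct failure paths:
 *  - the request cannot be submitted: return NULL, callback NOT invoked;
 *  - the request was accepted but failed: callback invoked with the error,
 *    non-NULL handle returned.
 * It never does both for the same request.
 */
static Handle *driver_writev(int nb_sectors, int medium_ok,
                             CompletionFunc *cb, void *opaque)
{
    if (nb_sectors <= 0) {
        return NULL;
    }
    the_handle.in_use = 1;
    cb(opaque, medium_ok ? 0 : -EIO);   /* early (synchronous) completion */
    return &the_handle;
}

/* Records how often the completion callback ran and with which result. */
typedef struct { int calls; int last_ret; } Probe;

static void probe_cb(void *opaque, int ret)
{
    Probe *p = opaque;
    p->calls++;
    p->last_ret = ret;
}

/* Caller side: report -EIO manually only when the driver returned NULL. */
static int write_one(int nb_sectors, int medium_ok, Probe *p)
{
    Handle *h = driver_writev(nb_sectors, medium_ok, probe_cb, p);
    if (h == NULL) {
        probe_cb(p, -EIO);   /* safe only because the driver did not call it */
    }
    return p->calls;
}
```

Under this contract every request sees exactly one completion, so a counter decremented in the callback stays balanced whichever failure path is taken.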

Patch

diff --git a/block.c b/block.c
index 9176dec..e65971c 100644
--- a/block.c
+++ b/block.c
@@ -2183,8 +2183,29 @@  int bdrv_aio_multiwrite(BlockDriverState *bs, BlockRequest *reqs, int num_reqs)
     // Check for mergable requests
     num_reqs = multiwrite_merge(bs, reqs, num_reqs, mcb);
 
-    // Run the aio requests
+    /*
+     * Run the aio requests. As soon as one request can't be submitted
+     * successfully, fail all requests that are not yet submitted (we must
+     * return failure for all requests anyway)
+     *
+     * num_requests cannot be set to the right value immediately: If
+     * bdrv_aio_writev fails for some request, num_requests would be too high
+     * and therefore multiwrite_cb() would never recognize the multiwrite
+     * request as completed. We also cannot use the loop variable i to set it
+     * when the first request fails because the callback may already have been
+     * called for previously submitted requests. Thus, num_requests must be
+     * incremented for each request that is submitted.
+     *
+     * The problem that callbacks may be called early also means that we need
+     * to take care that num_requests doesn't become 0 before all requests are
+     * submitted - multiwrite_cb() would consider the multiwrite request
+     * completed. A dummy request that is "completed" by a manual call to
+     * multiwrite_cb() takes care of this.
+     */
+    mcb->num_requests = 1;
+
     for (i = 0; i < num_reqs; i++) {
+        mcb->num_requests++;
         acb = bdrv_aio_writev(bs, reqs[i].sector, reqs[i].qiov,
             reqs[i].nb_sectors, multiwrite_cb, mcb);
 
@@ -2192,22 +2213,24 @@  int bdrv_aio_multiwrite(BlockDriverState *bs, BlockRequest *reqs, int num_reqs)
             // We can only fail the whole thing if no request has been
             // submitted yet. Otherwise we'll wait for the submitted AIOs to
             // complete and report the error in the callback.
-            if (mcb->num_requests == 0) {
-                reqs[i].error = -EIO;
+            if (i == 0) {
                 goto fail;
             } else {
-                mcb->num_requests++;
                 multiwrite_cb(mcb, -EIO);
                 break;
             }
-        } else {
-            mcb->num_requests++;
         }
     }
 
+    /* Complete the dummy request */
+    multiwrite_cb(mcb, 0);
+
     return 0;
 
 fail:
+    for (i = 0; i < mcb->num_callbacks; i++) {
+        reqs[i].error = -EIO;
+    }
     qemu_free(mcb);
     return -1;
 }
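
The counting scheme from the comment in this patch can be tried out in isolation. The following is a minimal standalone sketch, not the QEMU code: `Batch`, `batch_cb`, `fake_submit`, `batch_submit` and the `SUBMIT_*` constants are hypothetical names, and the simulated driver is assumed to follow the rule discussed in the thread (a failed submission does not invoke the callback).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the multiwrite control block. */
typedef struct {
    int num_requests;   /* outstanding requests, including the dummy */
    int error;          /* first error seen, 0 if none */
    bool completed;     /* set once the whole batch has finished */
} Batch;

/* Completion callback: the batch is done when the counter reaches zero. */
static void batch_cb(Batch *b, int ret)
{
    assert(b->num_requests > 0);
    if (ret < 0 && b->error == 0) {
        b->error = ret;
    }
    if (--b->num_requests == 0) {
        b->completed = true;
    }
}

/* Simulated driver behaviours (assumptions for this sketch). */
enum { SUBMIT_PENDING, SUBMIT_DONE_NOW, SUBMIT_FAIL };

/* Returns false if submission failed; the callback is not invoked then. */
static bool fake_submit(Batch *b, int behaviour)
{
    switch (behaviour) {
    case SUBMIT_DONE_NOW:
        batch_cb(b, 0);      /* callback runs before submit returns */
        return true;
    case SUBMIT_PENDING:
        return true;         /* callback will run later */
    default:
        return false;
    }
}

/* Returns -1 only if the very first submission fails, 0 otherwise. */
static int batch_submit(Batch *b, const int *behaviour, int n)
{
    int i;

    b->error = 0;
    b->completed = false;
    b->num_requests = 1;     /* the dummy request keeps the count above 0 */

    for (i = 0; i < n; i++) {
        b->num_requests++;   /* count the request before submitting it */
        if (!fake_submit(b, behaviour[i])) {
            if (i == 0) {
                return -1;   /* nothing was submitted: fail the whole batch */
            }
            batch_cb(b, -EIO);   /* account for the failed request */
            break;
        }
    }

    batch_cb(b, 0);          /* complete the dummy request */
    return 0;
}
```

With three requests that all complete immediately, the counter goes 1, 2, 1, 2, 1, 2, 1 and reaches 0 only when the dummy request is completed. Without the dummy it would hit 0 after the first request, and the batch would be reported as finished too early.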