diff mbox

block: fix streaming/closing race

Message ID 1333037320-20564-1-git-send-email-pbonzini@redhat.com
State New
Headers show

Commit Message

Paolo Bonzini March 29, 2012, 4:08 p.m. UTC
Streaming can issue I/O while qcow2_close is running.  This causes the
L2 caches to become very confused or, alternatively, could cause a
segfault when the streaming coroutine is reentered after closing its
block device.  The fix is to cancel streaming jobs when closing their
underlying device.  The cancellation must be synchronous, so add a flag
saying whether streaming has in-flight I/O.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c        |   14 ++++++++++++++
 block/stream.c |    4 +++-
 block_int.h    |    2 ++
 3 files changed, 19 insertions(+), 1 deletions(-)

Comments

Stefan Hajnoczi March 30, 2012, 7:19 a.m. UTC | #1
On Thu, Mar 29, 2012 at 06:08:40PM +0200, Paolo Bonzini wrote:
> Streaming can issue I/O while qcow2_close is running.  This causes the
> L2 caches to become very confused or, alternatively, could cause a
> segfault when the streaming coroutine is reentered after closing its
> block device.  The fix is to cancel streaming jobs when closing their
> underlying device.  The cancellation must be synchronous, so add a flag
> saying whether streaming has in-flight I/O.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block.c        |   14 ++++++++++++++
>  block/stream.c |    4 +++-
>  block_int.h    |    2 ++
>  3 files changed, 19 insertions(+), 1 deletions(-)
> 
> diff --git a/block.c b/block.c
> index b88ee90..1aab4bf 100644
> --- a/block.c
> +++ b/block.c
> @@ -813,6 +813,9 @@ unlink_and_fail:
>  void bdrv_close(BlockDriverState *bs)
>  {
>      if (bs->drv) {
> +        if (bs->job) {
> +            block_job_cancel_sync(bs->job);
> +        }
>          if (bs == bs_snapshots) {
>              bs_snapshots = NULL;
>          }
> @@ -4069,3 +4072,14 @@ bool block_job_is_cancelled(BlockJob *job)
>  {
>      return job->cancelled;
>  }
> +
> +void block_job_cancel_sync(BlockJob *job)
> +{
> +    BlockDriverState *bs = job->bs;
> +
> +    assert(bs->job == job);
> +    block_job_cancel(job);
> +    while (bs->job != NULL && bs->job->busy) {

It's not clear to me why we have a busy flag.  Is canceling and waiting
for bs->job == NULL not enough?  Perhaps you're trying to take a
shortcut and not wait until the job is fully stopped - it's not worth it
since the streaming block job does no significant work after detecting
cancelation.

Stefan
Paolo Bonzini March 30, 2012, 8:31 a.m. UTC | #2
Il 30/03/2012 09:19, Stefan Hajnoczi ha scritto:
>> > +void block_job_cancel_sync(BlockJob *job)
>> > +{
>> > +    BlockDriverState *bs = job->bs;
>> > +
>> > +    assert(bs->job == job);
>> > +    block_job_cancel(job);
>> > +    while (bs->job != NULL && bs->job->busy) {
> It's not clear to me why we have a busy flag.

Because the coroutine does not restart if you do qemu_aio_wait and the
coroutine is waiting in co_sleep.  So the busy flag communicates that
the coroutine is quiescent and, when cancelled, will not issue any new I/O.

However, even this is not enough.  It fixes a race with closing, but not
with deleting bs.  So the bug does not show anymore when you quit QEMU
(because the coroutine is not restarted), but it is still there if you
hot-unplug a device while streaming is in effect.

> Is canceling and waiting for bs->job == NULL not enough?  Perhaps
> you're trying to take a shortcut and not wait until the job is fully
> stopped

No, simply it's impossible to wait for full completion with qemu_aio_wait.

Paolo
Paolo Bonzini March 30, 2012, 10 a.m. UTC | #3
Il 30/03/2012 10:31, Paolo Bonzini ha scritto:
> However, even this is not enough.  It fixes a race with closing, but not
> with deleting bs.  So the bug does not show anymore when you quit QEMU
> (because the coroutine is not restarted), but it is still there if you
> hot-unplug a device while streaming is in effect.

Nevermind, hot-unplug is handled by the reference counting mechanism; so
this patch is enough.  However, we may still want to cancel the job when
a device is hot-unplugged.  I'll resubmit this patch as a series and
with a better commit message.

Paolo
Stefan Hajnoczi March 30, 2012, 10:25 a.m. UTC | #4
On Fri, Mar 30, 2012 at 9:31 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 30/03/2012 09:19, Stefan Hajnoczi ha scritto:
>>> > +void block_job_cancel_sync(BlockJob *job)
>>> > +{
>>> > +    BlockDriverState *bs = job->bs;
>>> > +
>>> > +    assert(bs->job == job);
>>> > +    block_job_cancel(job);
>>> > +    while (bs->job != NULL && bs->job->busy) {
>> It's not clear to me why we have a busy flag.
>
> Because the coroutine does not restart if you do qemu_aio_wait and the
> coroutine is waiting in co_sleep.  So the busy flag communicates that
> the coroutine is quiescent and, when cancelled, will not issue any new I/O.

Ah, yes.  This is tricky, a comment would be nice.

Stefan
Paolo Bonzini March 30, 2012, 10:29 a.m. UTC | #5
Il 30/03/2012 12:25, Stefan Hajnoczi ha scritto:
> On Fri, Mar 30, 2012 at 9:31 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> Il 30/03/2012 09:19, Stefan Hajnoczi ha scritto:
>>>>> +void block_job_cancel_sync(BlockJob *job)
>>>>> +{
>>>>> +    BlockDriverState *bs = job->bs;
>>>>> +
>>>>> +    assert(bs->job == job);
>>>>> +    block_job_cancel(job);
>>>>> +    while (bs->job != NULL && bs->job->busy) {
>>> It's not clear to me why we have a busy flag.
>>
>> Because the coroutine does not restart if you do qemu_aio_wait and the
>> coroutine is waiting in co_sleep.  So the busy flag communicates that
>> the coroutine is quiescent and, when cancelled, will not issue any new I/O.
> 
> Ah, yes.  This is tricky, a comment would be nice.

I'll just add gtkdoc comments for the blockjob interface.  We need to
start somewhere...

Paolo
diff mbox

Patch

diff --git a/block.c b/block.c
index b88ee90..1aab4bf 100644
--- a/block.c
+++ b/block.c
@@ -813,6 +813,9 @@  unlink_and_fail:
 void bdrv_close(BlockDriverState *bs)
 {
     if (bs->drv) {
+        if (bs->job) {
+            block_job_cancel_sync(bs->job);
+        }
         if (bs == bs_snapshots) {
             bs_snapshots = NULL;
         }
@@ -4069,3 +4072,14 @@  bool block_job_is_cancelled(BlockJob *job)
 {
     return job->cancelled;
 }
+
+void block_job_cancel_sync(BlockJob *job)
+{
+    BlockDriverState *bs = job->bs;
+
+    assert(bs->job == job);
+    block_job_cancel(job);
+    while (bs->job != NULL && bs->job->busy) {
+        qemu_aio_wait();
+    }
+}
diff --git a/block/stream.c b/block/stream.c
index d1b3986..b8a1628 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -175,7 +175,7 @@  retry:
             break;
         }
 
-
+        s->common.busy = true;
         if (base) {
             ret = is_allocated_base(bs, base, sector_num,
                                     STREAM_BUFFER_SIZE / BDRV_SECTOR_SIZE, &n);
@@ -189,6 +189,7 @@  retry:
             if (s->common.speed) {
                 uint64_t delay_ns = ratelimit_calculate_delay(&s->limit, n);
                 if (delay_ns > 0) {
+                    s->common.busy = false;
                     co_sleep_ns(rt_clock, delay_ns);
 
                     /* Recheck cancellation and that sectors are unallocated */
@@ -208,6 +209,7 @@  retry:
         /* Note that even when no rate limit is applied we need to yield
          * with no pending I/O here so that qemu_aio_flush() returns.
          */
+        s->common.busy = false;
         co_sleep_ns(rt_clock, 0);
     }
 
diff --git a/block_int.h b/block_int.h
index b460c36..f5c3dff 100644
--- a/block_int.h
+++ b/block_int.h
@@ -89,6 +89,7 @@  struct BlockJob {
     const BlockJobType *job_type;
     BlockDriverState *bs;
     bool cancelled;
+    bool busy;
 
     /* These fields are published by the query-block-jobs QMP API */
     int64_t offset;
@@ -329,6 +330,7 @@  void block_job_complete(BlockJob *job, int ret);
 int block_job_set_speed(BlockJob *job, int64_t value);
 void block_job_cancel(BlockJob *job);
 bool block_job_is_cancelled(BlockJob *job);
+void block_job_cancel_sync(BlockJob *job);
 
 int stream_start(BlockDriverState *bs, BlockDriverState *base,
                  const char *base_id, BlockDriverCompletionFunc *cb,