[v10,01/16] block: Pause all jobs during bdrv_reopen_multiple()

Message ID 950ab25638c15b988aa336474ab21f6020028828.1475757437.git.berto@igalia.com
State New

Commit Message

Alberto Garcia Oct. 6, 2016, 1:02 p.m. UTC
When a BlockDriverState is about to be reopened it can trigger certain
operations that need to write to disk. During this process a different
block job can be woken up. If that block job completes and also needs
to call bdrv_reopen() it can happen that it needs to do it on the same
BlockDriverState that is still in the process of being reopened.

This can have fatal consequences, like in this example:

  1) Block job A starts and sleeps after a while.
  2) Block job B starts and tries to reopen node1 (a qcow2 file).
  3) Reopening node1 means flushing and replacing its qcow2 cache.
  4) While the qcow2 cache is being flushed, job A wakes up.
  5) Job A completes and reopens node1, replacing its cache.
  6) Job B resumes, but the cache that was being flushed no longer
     exists.

This patch pauses all block jobs during bdrv_reopen_multiple(), so
that step 4 can never happen and the operation is safe.

Note that this scenario can only happen if both bdrv_reopen() calls
are made by block jobs on the same backing chain. Otherwise there's no
chance that the same BlockDriverState appears in both reopen queues.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

Comments

Kevin Wolf Oct. 10, 2016, 3:37 p.m. UTC | #1
On 06.10.2016 at 15:02, Alberto Garcia wrote:
> When a BlockDriverState is about to be reopened it can trigger certain
> operations that need to write to disk. During this process a different
> block job can be woken up. If that block job completes and also needs
> to call bdrv_reopen() it can happen that it needs to do it on the same
> BlockDriverState that is still in the process of being reopened.
> 
> This can have fatal consequences, like in this example:
> 
>   1) Block job A starts and sleeps after a while.
>   2) Block job B starts and tries to reopen node1 (a qcow2 file).
>   3) Reopening node1 means flushing and replacing its qcow2 cache.
>   4) While the qcow2 cache is being flushed, job A wakes up.
>   5) Job A completes and reopens node1, replacing its cache.
>   6) Job B resumes, but the cache that was being flushed no longer
>      exists.
> 
> This patch pauses all block jobs during bdrv_reopen_multiple(), so
> that step 4 can never happen and the operation is safe.
> 
> Note that this scenario can only happen if both bdrv_reopen() calls
> are made by block jobs on the same backing chain. Otherwise there's no
> chance that the same BlockDriverState appears in both reopen queues.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/block.c b/block.c
> index bb1f1ec..c80b528 100644
> --- a/block.c
> +++ b/block.c
> @@ -2087,9 +2087,19 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
>      int ret = -1;
>      BlockReopenQueueEntry *bs_entry, *next;
>      Error *local_err = NULL;
> +    BlockJob *job = NULL;
>  
>      assert(bs_queue != NULL);
>  
> +    /* Pause all block jobs */
> +    while ((job = block_job_next(job))) {
> +        AioContext *aio_context = blk_get_aio_context(job->blk);
> +
> +        aio_context_acquire(aio_context);
> +        block_job_pause(job);
> +        aio_context_release(aio_context);
> +    }
> +
>      bdrv_drain_all();

We already have a bdrv_drain_all() here, which does the same thing (and
more) internally, except that it resumes all jobs before it returns.
Maybe what we should do is split bdrv_drain_all() into a begin/end
pair, too.
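
Something like this, maybe (completely untested, and the names are made
up, just to illustrate the shape):

    void bdrv_drain_all_begin(void)
    {
        BlockJob *job = NULL;
        BlockDriverState *bs;
        BdrvNextIterator it;

        /* Pause all block jobs so they stop issuing new requests */
        while ((job = block_job_next(job))) {
            AioContext *aio_context = blk_get_aio_context(job->blk);

            aio_context_acquire(aio_context);
            block_job_pause(job);
            aio_context_release(aio_context);
        }

        /* Quiesce all other users and drain what is in flight; the
         * real version would poll all AioContexts at once like
         * bdrv_drain_all() does today instead of draining node by
         * node */
        for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
            AioContext *aio_context = bdrv_get_aio_context(bs);

            aio_context_acquire(aio_context);
            bdrv_parent_drained_begin(bs);
            aio_disable_external(aio_context);
            bdrv_drain(bs);
            aio_context_release(aio_context);
        }
    }

with a matching bdrv_drain_all_end() that undoes everything in reverse
order (aio_enable_external(), bdrv_parent_drained_end(),
block_job_resume()).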

If we don't split it, we'd have to do the "and more" part here as well,
disabling all other potential users of the BDSes. This would involve at
least calling bdrv_parent_drained_begin/end().

The other point I'm wondering now is whether bdrv_drain_all() should
have the aio_disable/enable_external() pair that bdrv_drain() has.

Paolo, any opinion?

>      QSIMPLEQ_FOREACH(bs_entry, bs_queue, entry) {
> @@ -2120,6 +2130,17 @@ cleanup:
>          g_free(bs_entry);
>      }
>      g_free(bs_queue);
> +
> +    /* Resume all block jobs */
> +    job = NULL;
> +    while ((job = block_job_next(job))) {
> +        AioContext *aio_context = blk_get_aio_context(job->blk);
> +
> +        aio_context_acquire(aio_context);
> +        block_job_resume(job);
> +        aio_context_release(aio_context);
> +    }
> +
>      return ret;
>  }

Kevin
Paolo Bonzini Oct. 10, 2016, 4:41 p.m. UTC | #2
On 10/10/2016 17:37, Kevin Wolf wrote:
> > +    while ((job = block_job_next(job))) {
> > +        AioContext *aio_context = blk_get_aio_context(job->blk);
> > +
> > +        aio_context_acquire(aio_context);
> > +        block_job_pause(job);
> > +        aio_context_release(aio_context);
> > +    }
> > +
> >      bdrv_drain_all();
>
> We already have a bdrv_drain_all() here, which does the same thing (and
> more) internally, except that it resumes all jobs before it returns.
> Maybe what we should do is split bdrv_drain_all() into a begin/end
> pair, too.

Hey, haven't I just suggested the same? :)  I swear I hadn't read this
before.

> The other point I'm wondering now is whether bdrv_drain_all() should
> have the aio_disable/enable_external() pair that bdrv_drain() has.

bdrv_drain_all need not have it, but its start/end replacement should.

Paolo
Kevin Wolf Oct. 11, 2016, 9:39 a.m. UTC | #3
On 10.10.2016 at 18:41, Paolo Bonzini wrote:
> On 10/10/2016 17:37, Kevin Wolf wrote:
> > > +    while ((job = block_job_next(job))) {
> > > +        AioContext *aio_context = blk_get_aio_context(job->blk);
> > > +
> > > +        aio_context_acquire(aio_context);
> > > +        block_job_pause(job);
> > > +        aio_context_release(aio_context);
> > > +    }
> > > +
> > >      bdrv_drain_all();
> >
> > We already have a bdrv_drain_all() here, which does the same thing (and
> > more) internally, except that it resumes all jobs before it returns.
> > Maybe what we should do is split bdrv_drain_all() into a begin/end
> > pair, too.
> 
> Hey, haven't I just suggested the same? :)  I swear I hadn't read this
> before.
> 
> > The other point I'm wondering now is whether bdrv_drain_all() should
> > have the aio_disable/enable_external() pair that bdrv_drain() has.
> 
> bdrv_drain_all need not have it, but its start/end replacement should.

Doesn't need it because it holds the AioContext lock?

Kevin
Paolo Bonzini Oct. 11, 2016, 9:54 a.m. UTC | #4
On 11/10/2016 11:39, Kevin Wolf wrote:
> On 10.10.2016 at 18:41, Paolo Bonzini wrote:
>> On 10/10/2016 17:37, Kevin Wolf wrote:
>>>> +    while ((job = block_job_next(job))) {
>>>> +        AioContext *aio_context = blk_get_aio_context(job->blk);
>>>> +
>>>> +        aio_context_acquire(aio_context);
>>>> +        block_job_pause(job);
>>>> +        aio_context_release(aio_context);
>>>> +    }
>>>> +
>>>>      bdrv_drain_all();
>>>
>>> We already have a bdrv_drain_all() here, which does the same thing (and
>>> more) internally, except that it resumes all jobs before it returns.
>>> Maybe what we should do is split bdrv_drain_all() into a begin/end
>>> pair, too.
>>
>> Hey, haven't I just suggested the same? :)  I swear I hadn't read this
>> before.
>>
>>> The other point I'm wondering now is whether bdrv_drain_all() should
>>> have the aio_disable/enable_external() pair that bdrv_drain() has.
>>
>> bdrv_drain_all need not have it, but its start/end replacement should.
> 
> Doesn't need it because it holds the AioContext lock?

No, because as soon as bdrv_drain_all exits, external file descriptors
can be triggered again so I don't think the aio_disable/enable_external
actually provides any protection.

bdrv_drain_all should really only be used in cases where you already
have some kind of "hold" on external file descriptors, like bdrv_close
uses bdrv_drain():

    bdrv_drained_begin(bs); /* complete I/O */
    bdrv_flush(bs);
    bdrv_drain(bs); /* in case flush left pending I/O */

That said, because the simplest implementation of bdrv_drain_all() does

    bdrv_drained_all_start();
    bdrv_drained_all_end();

just like bdrv_drain() does, it's not a very interesting point.
bdrv_drain_all() will probably disable/enable external file descriptors;
it just doesn't need to.
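
(To spell it out with the names above, bdrv_drain_all() would just
become a trivial wrapper:

    void bdrv_drain_all(void)
    {
        bdrv_drained_all_start();
        bdrv_drained_all_end();
    }

and a caller like bdrv_reopen_multiple() that needs the quiescent
section to stay open would use the start/end pair directly instead.)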

Paolo
Kevin Wolf Oct. 11, 2016, 11:07 a.m. UTC | #5
On 11.10.2016 at 11:54, Paolo Bonzini wrote:
> On 11/10/2016 11:39, Kevin Wolf wrote:
> > On 10.10.2016 at 18:41, Paolo Bonzini wrote:
> >> On 10/10/2016 17:37, Kevin Wolf wrote:
> >>>> +    while ((job = block_job_next(job))) {
> >>>> +        AioContext *aio_context = blk_get_aio_context(job->blk);
> >>>> +
> >>>> +        aio_context_acquire(aio_context);
> >>>> +        block_job_pause(job);
> >>>> +        aio_context_release(aio_context);
> >>>> +    }
> >>>> +
> >>>>      bdrv_drain_all();
> >>>
> >>> We already have a bdrv_drain_all() here, which does the same thing (and
> >>> more) internally, except that it resumes all jobs before it returns.
> >>> Maybe what we should do is split bdrv_drain_all() into a begin/end
> >>> pair, too.
> >>
> >> Hey, haven't I just suggested the same? :)  I swear I hadn't read this
> >> before.
> >>
> >>> The other point I'm wondering now is whether bdrv_drain_all() should
> >>> have the aio_disable/enable_external() pair that bdrv_drain() has.
> >>
> >> bdrv_drain_all need not have it, but its start/end replacement should.
> > 
> > Doesn't need it because it holds the AioContext lock?
> 
> No, because as soon as bdrv_drain_all exits, external file descriptors
> can be triggered again so I don't think the aio_disable/enable_external
> actually provides any protection.

Right, what I was thinking of was more like new requests coming in
while bdrv_drain_all() is still running. If that happened, the result
after bdrv_drain_all() wouldn't be different, but if external file
descriptors are still active and the guest keeps sending requests, it
could take forever until it returns.

But that part is addressed by the AioContext lock, I think.

> bdrv_drain_all should really only be used in cases where you already
> have some kind of "hold" on external file descriptors, like bdrv_close
> uses bdrv_drain():
> 
>     bdrv_drained_begin(bs); /* complete I/O */
>     bdrv_flush(bs);
>     bdrv_drain(bs); /* in case flush left pending I/O */
> 
> That said, because the simplest implementation of bdrv_drain_all() does
> 
>     bdrv_drained_all_start();
>     bdrv_drained_all_end();
> 
> just like bdrv_drain() does, it's not a very interesting point.
> bdrv_drain_all() will probably disable/enable external file descriptors;
> it just doesn't need to.

Yes, of course.

Kevin

Patch

diff --git a/block.c b/block.c
index bb1f1ec..c80b528 100644
--- a/block.c
+++ b/block.c
@@ -2087,9 +2087,19 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
     int ret = -1;
     BlockReopenQueueEntry *bs_entry, *next;
     Error *local_err = NULL;
+    BlockJob *job = NULL;
 
     assert(bs_queue != NULL);
 
+    /* Pause all block jobs */
+    while ((job = block_job_next(job))) {
+        AioContext *aio_context = blk_get_aio_context(job->blk);
+
+        aio_context_acquire(aio_context);
+        block_job_pause(job);
+        aio_context_release(aio_context);
+    }
+
     bdrv_drain_all();
 
     QSIMPLEQ_FOREACH(bs_entry, bs_queue, entry) {
@@ -2120,6 +2130,17 @@ cleanup:
         g_free(bs_entry);
     }
     g_free(bs_queue);
+
+    /* Resume all block jobs */
+    job = NULL;
+    while ((job = block_job_next(job))) {
+        AioContext *aio_context = blk_get_aio_context(job->blk);
+
+        aio_context_acquire(aio_context);
+        block_job_resume(job);
+        aio_context_release(aio_context);
+    }
+
     return ret;
 }