
qed: fix bdrv_qed_drain

Message ID 1455638000-18051-1-git-send-email-pbonzini@redhat.com
State New

Commit Message

Paolo Bonzini Feb. 16, 2016, 3:53 p.m. UTC
The current implementation of bdrv_qed_drain can cause a double
completion of a request.

The problem is that bdrv_qed_drain calls qed_plug_allocating_write_reqs
unconditionally, but this is not correct if an allocating write
is queued.  In this case, qed_unplug_allocating_write_reqs will
restart the allocating write and possibly cause it to complete.
The aiocb however is still in use for the L2/L1 table writes,
and will then be completed again as soon as the table writes
are stable.

The fix is to call qed_plug_allocating_write_reqs and
bdrv_aio_flush (which is the same as the timer callback---the patch
makes this explicit) only if the timer was scheduled in the first
place.  This fixes qemu-iotests test 011.

Cc: qemu-stable@nongnu.org
Cc: qemu-block@nongnu.org
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/qed.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)
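
For reference, the timer callback that the patch starts invoking directly
looked roughly like this in block/qed.c at the time (abridged):

    static void qed_need_check_timer_cb(void *opaque)
    {
        BDRVQEDState *s = opaque;

        /* The timer only fires once allocating writes have drained */
        assert(!QSIMPLEQ_FIRST(&s->allocating_write_reqs));

        qed_plug_allocating_write_reqs(s);

        /* Ensure writes reach disk before clearing the need-check flag */
        bdrv_aio_flush(s->bs, qed_clear_need_check, s);
    }

This is why the commit message can describe the old plug + flush pair as
"the same as the timer callback".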

Comments

Fam Zheng Feb. 17, 2016, 2:57 a.m. UTC | #1
On Tue, 02/16 16:53, Paolo Bonzini wrote:
> The current implementation of bdrv_qed_drain can cause a double
> completion of a request.
> 
> The problem is that bdrv_qed_drain calls qed_plug_allocating_write_reqs
> unconditionally, but this is not correct if an allocating write
> is queued.  In this case, qed_unplug_allocating_write_reqs will
> restart the allocating write and possibly cause it to complete.
> The aiocb however is still in use for the L2/L1 table writes,
> and will then be completed again as soon as the table writes
> are stable.
> 
> The fix is to call qed_plug_allocating_write_reqs and
> bdrv_aio_flush (which is the same as the timer callback---the patch
> makes this explicit) only if the timer was scheduled in the first
> place.  This fixes qemu-iotests test 011.
> 
> Cc: qemu-stable@nongnu.org
> Cc: qemu-block@nongnu.org
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block/qed.c | 13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/block/qed.c b/block/qed.c
> index 404be1e..ebba220 100644
> --- a/block/qed.c
> +++ b/block/qed.c
> @@ -380,12 +380,13 @@ static void bdrv_qed_drain(BlockDriverState *bs)
>  {
>      BDRVQEDState *s = bs->opaque;
>  
> -    /* Cancel timer and start doing I/O that were meant to happen as if it
> -     * fired, that way we get bdrv_drain() taking care of the ongoing requests
> -     * correctly. */
> -    qed_cancel_need_check_timer(s);
> -    qed_plug_allocating_write_reqs(s);
> -    bdrv_aio_flush(s->bs, qed_clear_need_check, s);
> +    /* Fire the timer immediately in order to start doing I/O as soon as the
> +     * header is flushed.
> +     */
> +    if (s->need_check_timer && timer_pending(s->need_check_timer)) {

We can assert(s->need_check_timer);

> +        qed_cancel_need_check_timer(s);
> +        qed_need_check_timer_cb(s);
> +    }

What if an allocating write is queued (the else branch case)? Its completion
will be in bdrv_drain and it could arm the need_check_timer which is wrong.

We need to drain the allocating_write_reqs queue before checking the timer.

Fam
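
For context, the re-arming Fam describes happens at the tail of
qed_aio_complete() in block/qed.c, which at the time read roughly as
follows (abridged):

    if (acb == QSIMPLEQ_FIRST(&s->allocating_write_reqs)) {
        QSIMPLEQ_REMOVE_HEAD(&s->allocating_write_reqs, next);
        acb = QSIMPLEQ_FIRST(&s->allocating_write_reqs);
        if (acb) {
            /* Restart the next queued allocating write */
            qed_aio_next_io(acb, 0);
        } else if (s->header.features & QED_F_NEED_CHECK) {
            /* Queue drained: this arms the timer, even when the
             * completion runs inside bdrv_drain() */
            qed_start_need_check_timer(s);
        }
    }

so an allocating write completing during bdrv_drain() can re-arm the very
timer that bdrv_qed_drain() found idle.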
Paolo Bonzini Feb. 17, 2016, 11:28 a.m. UTC | #2
On 17/02/2016 03:57, Fam Zheng wrote:
> On Tue, 02/16 16:53, Paolo Bonzini wrote:
>> The current implementation of bdrv_qed_drain can cause a double
>> completion of a request.
>>
>> The problem is that bdrv_qed_drain calls qed_plug_allocating_write_reqs
>> unconditionally, but this is not correct if an allocating write
>> is queued.  In this case, qed_unplug_allocating_write_reqs will
>> restart the allocating write and possibly cause it to complete.
>> The aiocb however is still in use for the L2/L1 table writes,
>> and will then be completed again as soon as the table writes
>> are stable.
>>
>> The fix is to call qed_plug_allocating_write_reqs and
>> bdrv_aio_flush (which is the same as the timer callback---the patch
>> makes this explicit) only if the timer was scheduled in the first
>> place.  This fixes qemu-iotests test 011.
>>
>> Cc: qemu-stable@nongnu.org
>> Cc: qemu-block@nongnu.org
>> Cc: Stefan Hajnoczi <stefanha@redhat.com>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  block/qed.c | 13 +++++++------
>>  1 file changed, 7 insertions(+), 6 deletions(-)
>>
>> diff --git a/block/qed.c b/block/qed.c
>> index 404be1e..ebba220 100644
>> --- a/block/qed.c
>> +++ b/block/qed.c
>> @@ -380,12 +380,13 @@ static void bdrv_qed_drain(BlockDriverState *bs)
>>  {
>>      BDRVQEDState *s = bs->opaque;
>>  
>> -    /* Cancel timer and start doing I/O that were meant to happen as if it
>> -     * fired, that way we get bdrv_drain() taking care of the ongoing requests
>> -     * correctly. */
>> -    qed_cancel_need_check_timer(s);
>> -    qed_plug_allocating_write_reqs(s);
>> -    bdrv_aio_flush(s->bs, qed_clear_need_check, s);
>> +    /* Fire the timer immediately in order to start doing I/O as soon as the
>> +     * header is flushed.
>> +     */
>> +    if (s->need_check_timer && timer_pending(s->need_check_timer)) {
> 
> We can assert(s->need_check_timer);

I've seen it NULL, but didn't check why.  This was also a source of
segmentation faults.

>> +        qed_cancel_need_check_timer(s);
>> +        qed_need_check_timer_cb(s);
>> +    }
> 
> What if an allocating write is queued (the else branch case)? Its completion
> will be in bdrv_drain and it could arm the need_check_timer which is wrong.
> 
> We need to drain the allocating_write_reqs queue before checking the timer.

You're right, but how?  That's what bdrv_drain(bs) does, it's a
chicken-and-egg problem.

In my new series, draining works separately on each BlockDriverState
along the chain, from parent (bs) to children (bs->file->bs).  We could
then have .before_drain and .after_drain; .before_drain disables the
timer, while .after_drain does the same operation as the timer.  But
that couldn't be backported, and 2.5 would remain broken forever.  This
patch at least is a band-aid to make QED functional.
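
A hypothetical sketch of those two hooks (neither .before_drain nor
.after_drain exists in the tree; the names and the split are illustrative
only):

    static void bdrv_qed_before_drain(BlockDriverState *bs)
    {
        BDRVQEDState *s = bs->opaque;

        /* Keep the timer from firing while requests drain */
        qed_cancel_need_check_timer(s);
    }

    static void bdrv_qed_after_drain(BlockDriverState *bs)
    {
        BDRVQEDState *s = bs->opaque;

        /* Everything has drained; do what the timer would have done */
        if (s->header.features & QED_F_NEED_CHECK) {
            qed_need_check_timer_cb(s);
        }
    }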

Perhaps the alternative is to remove write support for QED altogether
(and the drain callback with it, since it's the only user).  Not sure
why anyone would use it these days.

Paolo
Fam Zheng Feb. 23, 2016, 5:57 a.m. UTC | #3
On Wed, 02/17 12:28, Paolo Bonzini wrote:
> 
> 
> On 17/02/2016 03:57, Fam Zheng wrote:
> > On Tue, 02/16 16:53, Paolo Bonzini wrote:
> >> The current implementation of bdrv_qed_drain can cause a double
> >> completion of a request.
> >>
> >> The problem is that bdrv_qed_drain calls qed_plug_allocating_write_reqs
> >> unconditionally, but this is not correct if an allocating write
> >> is queued.  In this case, qed_unplug_allocating_write_reqs will
> >> restart the allocating write and possibly cause it to complete.
> >> The aiocb however is still in use for the L2/L1 table writes,
> >> and will then be completed again as soon as the table writes
> >> are stable.
> >>
> >> The fix is to call qed_plug_allocating_write_reqs and
> >> bdrv_aio_flush (which is the same as the timer callback---the patch
> >> makes this explicit) only if the timer was scheduled in the first
> >> place.  This fixes qemu-iotests test 011.
> >>
> >> Cc: qemu-stable@nongnu.org
> >> Cc: qemu-block@nongnu.org
> >> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> >> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> >> ---
> >>  block/qed.c | 13 +++++++------
> >>  1 file changed, 7 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/block/qed.c b/block/qed.c
> >> index 404be1e..ebba220 100644
> >> --- a/block/qed.c
> >> +++ b/block/qed.c
> >> @@ -380,12 +380,13 @@ static void bdrv_qed_drain(BlockDriverState *bs)
> >>  {
> >>      BDRVQEDState *s = bs->opaque;
> >>  
> >> -    /* Cancel timer and start doing I/O that were meant to happen as if it
> >> -     * fired, that way we get bdrv_drain() taking care of the ongoing requests
> >> -     * correctly. */
> >> -    qed_cancel_need_check_timer(s);
> >> -    qed_plug_allocating_write_reqs(s);
> >> -    bdrv_aio_flush(s->bs, qed_clear_need_check, s);
> >> +    /* Fire the timer immediately in order to start doing I/O as soon as the
> >> +     * header is flushed.
> >> +     */
> >> +    if (s->need_check_timer && timer_pending(s->need_check_timer)) {
> > 
> > We can assert(s->need_check_timer);
> 
> I've seen it NULL, but didn't check why.  This was also a source of
> segmentation faults.
> 
> >> +        qed_cancel_need_check_timer(s);
> >> +        qed_need_check_timer_cb(s);
> >> +    }
> > 
> > What if an allocating write is queued (the else branch case)? Its completion
> > will be in bdrv_drain and it could arm the need_check_timer which is wrong.
> > 
> > We need to drain the allocating_write_reqs queue before checking the timer.
> 
> You're right, but how?  That's what bdrv_drain(bs) does, it's a
> chicken-and-egg problem.

Maybe use an aio_poll loop before the if?

Fam
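
Concretely, the suggestion amounts to something like the following at the
top of bdrv_qed_drain(), a minimal sketch assuming the existing
allocating_write_reqs queue:

    /* Let queued allocating writes finish before touching the timer */
    while (!QSIMPLEQ_EMPTY(&s->allocating_write_reqs)) {
        aio_poll(bdrv_get_aio_context(bs), true);
    }

In effect this is a private drain loop, which is Paolo's objection below.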
Paolo Bonzini Feb. 23, 2016, 10:43 a.m. UTC | #4
On 23/02/2016 06:57, Fam Zheng wrote:
>>>> +        qed_cancel_need_check_timer(s);
>>>> +        qed_need_check_timer_cb(s);
>>>> +    }
>>>
>>> What if an allocating write is queued (the else branch case)? Its completion
>>> will be in bdrv_drain and it could arm the need_check_timer which is wrong.
>>>
>>> We need to drain the allocating_write_reqs queue before checking the timer.
>>
>> You're right, but how?  That's what bdrv_drain(bs) does, it's a
>> chicken-and-egg problem.
> 
> Maybe use an aio_poll loop before the if?

That would not change the fact that you're reimplementing bdrv_drain
inside bdrv_qed_drain.

Perhaps for now it's simplest to just remove the QED .bdrv_drain
callback, if you think this patch is not a good stopgap measure to avoid
the segmentation faults.

Once the bdrv_drain rework is in, we can move the callback _after_ I/O
is drained on bs and before it is drained on bs->file->bs.

Paolo
Fam Zheng Feb. 23, 2016, 12:49 p.m. UTC | #5
On Tue, 02/23 11:43, Paolo Bonzini wrote:
> 
> 
> On 23/02/2016 06:57, Fam Zheng wrote:
> >>>> +        qed_cancel_need_check_timer(s);
> >>>> +        qed_need_check_timer_cb(s);
> >>>> +    }
> >>>
> >>> What if an allocating write is queued (the else branch case)? Its completion
> >>> will be in bdrv_drain and it could arm the need_check_timer which is wrong.
> >>>
> >>> We need to drain the allocating_write_reqs queue before checking the timer.
> >>
> >> You're right, but how?  That's what bdrv_drain(bs) does, it's a
> >> chicken-and-egg problem.
> > 
> > Maybe use an aio_poll loop before the if?
> 
> That would not change the fact that you're reimplementing bdrv_drain
> inside bdrv_qed_drain.
> 

But it fulfills the contract of .bdrv_drain. This is the easy way, the hard way
would be iterating through the allocating_write_reqs list and process reqs one
by one synchronously, which still involves aio_poll indirectly.

> Perhaps for now it's simplest to just remove the QED .bdrv_drain
> callback, if you think this patch is not a good stopgap measure to avoid
> the segmentation faults.

OK, I'm fine with this as a stopgap measure.

> 
> Once the bdrv_drain rework is in, we can move the callback _after_ I/O
> is drained on bs and before it is drained on bs->file->bs.

Sounds good.

Fam
Paolo Bonzini Feb. 23, 2016, 1:54 p.m. UTC | #6
On 23/02/2016 13:49, Fam Zheng wrote:
> On Tue, 02/23 11:43, Paolo Bonzini wrote:
>>
>>
>> On 23/02/2016 06:57, Fam Zheng wrote:
>>>>>> +        qed_cancel_need_check_timer(s);
>>>>>> +        qed_need_check_timer_cb(s);
>>>>>> +    }
>>>>>
>>>>> What if an allocating write is queued (the else branch case)? Its completion
>>>>> will be in bdrv_drain and it could arm the need_check_timer which is wrong.
>>>>>
>>>>> We need to drain the allocating_write_reqs queue before checking the timer.
>>>>
>>>> You're right, but how?  That's what bdrv_drain(bs) does, it's a
>>>> chicken-and-egg problem.
>>>
>>> Maybe use an aio_poll loop before the if?
>>
>> That would not change the fact that you're reimplementing bdrv_drain
>> inside bdrv_qed_drain.
> 
> But it fulfills the contract of .bdrv_drain. This is the easy way, the hard way
> would be iterating through the allocating_write_reqs list and process reqs one
> by one synchronously, which still involves aio_poll indirectly.

The easy way would be better then.

Stefan, any second opinion?

Paolo

>> Perhaps for now it's simplest to just remove the QED .bdrv_drain
>> callback, if you think this patch is not a good stopgap measure to avoid
>> the segmentation faults.
> 
> OK, I'm fine with this as a stopgap measure.
> 
>> Once the bdrv_drain rework is in, we can move the callback _after_ I/O
>> is drained on bs and before it is drained on bs->file->bs.
> 
> Sounds good.
Kevin Wolf March 7, 2016, 4:57 p.m. UTC | #7
Am 23.02.2016 um 14:54 hat Paolo Bonzini geschrieben:
> 
> 
> On 23/02/2016 13:49, Fam Zheng wrote:
> > On Tue, 02/23 11:43, Paolo Bonzini wrote:
> >>
> >>
> >> On 23/02/2016 06:57, Fam Zheng wrote:
> >>>>>> +        qed_cancel_need_check_timer(s);
> >>>>>> +        qed_need_check_timer_cb(s);
> >>>>>> +    }
> >>>>>
> >>>>> What if an allocating write is queued (the else branch case)? Its completion
> >>>>> will be in bdrv_drain and it could arm the need_check_timer which is wrong.
> >>>>>
> >>>>> We need to drain the allocating_write_reqs queue before checking the timer.
> >>>>
> >>>> You're right, but how?  That's what bdrv_drain(bs) does, it's a
> >>>> chicken-and-egg problem.
> >>>
> >>> Maybe use an aio_poll loop before the if?
> >>
> >> That would not change the fact that you're reimplementing bdrv_drain
> >> inside bdrv_qed_drain.
> > 
> > But it fulfills the contract of .bdrv_drain. This is the easy way, the hard way
> > would be iterating through the allocating_write_reqs list and process reqs one
> > by one synchronously, which still involves aio_poll indirectly.
> 
> The easy way would be better then.
> 
> Stefan, any second opinion?

What's the status here? It would be good to have qed not segfaulting all
the time.

Kevin
Stefan Hajnoczi March 7, 2016, 8:56 p.m. UTC | #8
On Mon, Mar 07, 2016 at 05:57:41PM +0100, Kevin Wolf wrote:
> Am 23.02.2016 um 14:54 hat Paolo Bonzini geschrieben:
> > 
> > 
> > On 23/02/2016 13:49, Fam Zheng wrote:
> > > On Tue, 02/23 11:43, Paolo Bonzini wrote:
> > >>
> > >>
> > >> On 23/02/2016 06:57, Fam Zheng wrote:
> > >>>>>> +        qed_cancel_need_check_timer(s);
> > >>>>>> +        qed_need_check_timer_cb(s);
> > >>>>>> +    }
> > >>>>>
> > >>>>> What if an allocating write is queued (the else branch case)? Its completion
> > >>>>> will be in bdrv_drain and it could arm the need_check_timer which is wrong.
> > >>>>>
> > >>>>> We need to drain the allocating_write_reqs queue before checking the timer.
> > >>>>
> > >>>> You're right, but how?  That's what bdrv_drain(bs) does, it's a
> > >>>> chicken-and-egg problem.
> > >>>
> > >>> Maybe use an aio_poll loop before the if?
> > >>
> > >> That would not change the fact that you're reimplementing bdrv_drain
> > >> inside bdrv_qed_drain.
> > > 
> > > But it fulfills the contract of .bdrv_drain. This is the easy way, the hard way
> > > would be iterating through the allocating_write_reqs list and process reqs one
> > > by one synchronously, which still involves aio_poll indirectly.
> > 
> > The easy way would be better then.
> > 
> > Stefan, any second opinion?
> 
> What's the status here? It would be good to have qed not segfaulting all
> the time.

I think the timer concept itself is troublesome.  A radical approach but
limited to changing QED itself is to drop the timer and instead keep a
timestamp for the last allocating write request.  Next time a write
request (allocating or rewriting) is about to complete, do the flush and
clear the need check flag as part of the write request (if 5 seconds
have passed since the timestamp).

Because the need check flag clearing is part of the write request
completion, it will integrate properly with drain.

Stefan
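
A rough sketch of the timerless idea follows; the last_alloc_time field
and the helper name are invented for illustration and are not in qed.c:

    static bool qed_need_check_expired(BDRVQEDState *s)
    {
        int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);

        /* True once the last allocating write is at least
         * QED_NEED_CHECK_TIMEOUT (5) seconds in the past */
        return now - s->last_alloc_time >=
               QED_NEED_CHECK_TIMEOUT * 1000000000LL;
    }

A completing write request would call this and, when it returns true,
perform the flush and qed_clear_need_check() inline instead of arming a
timer.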
Stefan Hajnoczi March 7, 2016, 8:57 p.m. UTC | #9
By the way, I'll send a patch on Tuesday showing the timerless need
check mechanism.  It's a little too late tonight to start it.

Stefan
Paolo Bonzini March 7, 2016, 9:22 p.m. UTC | #10
On 07/03/2016 21:56, Stefan Hajnoczi wrote:
> I think the timer concept itself is troublesome.  A radical approach but
> limited to changing QED itself is to drop the timer and instead keep a
> timestamp for the last allocating write request.  Next time a write
> request (allocating or rewriting) is about to complete, do the flush and
> clear the need check flag as part of the write request (if 5 seconds
> have passed since the timestamp).

bdrv_qed_drain should be easy to fix in my new drain implementation,
which is based on draining the parent before the BdrvChild-ren.  It's
just troublesome in the current one which alternates flushing and draining.

I would just revert the patch that introduced bdrv_qed_drain for now,
and reintroduce it later (note however that something was messed up in
commit df9a681, "qed: Implement .bdrv_drain", 2015-11-12, because it
includes some dirty bitmap stuff).

Or perhaps the idea of making QED read-only has some merit...

Paolo
Stefan Hajnoczi March 8, 2016, 9:52 a.m. UTC | #11
On Mon, Mar 07, 2016 at 10:22:58PM +0100, Paolo Bonzini wrote:
> On 07/03/2016 21:56, Stefan Hajnoczi wrote:
> > I think the timer concept itself is troublesome.  A radical approach but
> > limited to changing QED itself is to drop the timer and instead keep a
> > timestamp for the last allocating write request.  Next time a write
> > request (allocating or rewriting) is about to complete, do the flush and
> > clear the need check flag as part of the write request (if 5 seconds
> > have passed since the timestamp).
> 
> bdrv_qed_drain should be easy to fix in my new drain implementation,
> which is based on draining the parent before the BdrvChild-ren.  It's
> just troublesome in the current one which alternates flushing and draining.
> 
> I would just revert the patch that introduced bdrv_qed_drain for now,
> and reintroduce it later (note however that something was messed up in
> commit df9a681, "qed: Implement .bdrv_drain", 2015-11-12, because it
> includes some dirty bitmap stuff).

You're right, that might be the best solution for the time being.  AFAIK
the need check write is harmless.  It does not result in a guest-visible
change and is basically just a flush + header update.

Stefan
Kevin Wolf March 8, 2016, 9:58 a.m. UTC | #12
Am 07.03.2016 um 21:56 hat Stefan Hajnoczi geschrieben:
> On Mon, Mar 07, 2016 at 05:57:41PM +0100, Kevin Wolf wrote:
> > Am 23.02.2016 um 14:54 hat Paolo Bonzini geschrieben:
> > > 
> > > 
> > > On 23/02/2016 13:49, Fam Zheng wrote:
> > > > On Tue, 02/23 11:43, Paolo Bonzini wrote:
> > > >>
> > > >>
> > > >> On 23/02/2016 06:57, Fam Zheng wrote:
> > > >>>>>> +        qed_cancel_need_check_timer(s);
> > > >>>>>> +        qed_need_check_timer_cb(s);
> > > >>>>>> +    }
> > > >>>>>
> > > >>>>> What if an allocating write is queued (the else branch case)? Its completion
> > > >>>>> will be in bdrv_drain and it could arm the need_check_timer which is wrong.
> > > >>>>>
> > > >>>>> We need to drain the allocating_write_reqs queue before checking the timer.
> > > >>>>
> > > >>>> You're right, but how?  That's what bdrv_drain(bs) does, it's a
> > > >>>> chicken-and-egg problem.
> > > >>>
> > > >>> Maybe use an aio_poll loop before the if?
> > > >>
> > > >> That would not change the fact that you're reimplementing bdrv_drain
> > > >> inside bdrv_qed_drain.
> > > > 
> > > > But it fulfills the contract of .bdrv_drain. This is the easy way, the hard way
> > > > would be iterating through the allocating_write_reqs list and process reqs one
> > > > by one synchronously, which still involves aio_poll indirectly.
> > > 
> > > The easy way would be better then.
> > > 
> > > Stefan, any second opinion?
> > 
> > What's the status here? It would be good to have qed not segfaulting all
> > the time.
> 
> I think the timer concept itself is troublesome.  A radical approach but
> limited to changing QED itself is to drop the timer and instead keep a
> timestamp for the last allocating write request.  Next time a write
> request (allocating or rewriting) is about to complete, do the flush and
> clear the need check flag as part of the write request (if 5 seconds
> have passed since the timestamp).

Isn't that kind of backwards, though, because it's very likely that
after some new activity a second write request will come in soon and
mark the image dirty again?

Apart from that, doesn't it fail to provide the intended effect, that
the image doesn't stay dirty for a long time and a repair isn't usually
needed after a crash?

I think rather than doing this, I'd just remove the mechanism altogether
and instead mark the image clean when the guest flushes the disk cache.

Kevin

> Because the need check flag clearing is part of the write request
> completion, it will integrate properly with drain.
> 
> Stefan
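
A hypothetical sketch of Kevin's alternative (QED implements no such
flush hook today, so the callback below is purely illustrative):

    static coroutine_fn int bdrv_qed_co_flush_to_os(BlockDriverState *bs)
    {
        BDRVQEDState *s = bs->opaque;

        /* The guest requested a flush, so metadata is consistent;
         * clear the flag and persist the header as part of the flush */
        if (s->header.features & QED_F_NEED_CHECK) {
            s->header.features &= ~QED_F_NEED_CHECK;
            return qed_write_header_sync(s);
        }
        return 0;
    }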
Kevin Wolf March 8, 2016, 9:59 a.m. UTC | #13
Am 08.03.2016 um 10:52 hat Stefan Hajnoczi geschrieben:
> On Mon, Mar 07, 2016 at 10:22:58PM +0100, Paolo Bonzini wrote:
> > On 07/03/2016 21:56, Stefan Hajnoczi wrote:
> > > I think the timer concept itself is troublesome.  A radical approach but
> > > limited to changing QED itself is to drop the timer and instead keep a
> > > timestamp for the last allocating write request.  Next time a write
> > > request (allocating or rewriting) is about to complete, do the flush and
> > > clear the need check flag as part of the write request (if 5 seconds
> > > have passed since the timestamp).
> > 
> > bdrv_qed_drain should be easy to fix in my new drain implementation,
> > which is based on draining the parent before the BdrvChild-ren.  It's
> > just troublesome in the current one which alternates flushing and draining.
> > 
> > I would just revert the patch that introduced bdrv_qed_drain for now,
> > and reintroduce it later (note however that something was messed up in
> > commit df9a681, "qed: Implement .bdrv_drain", 2015-11-12, because it
> > includes some dirty bitmap stuff).
> 
> You're right, that might be the best solution for the time being.  AFAIK
> the need check write is harmless.  It does not result in a guest-visible
> change and is basically just a flush + header update.

I think it caused occasional assertion failures.

Kevin
Stefan Hajnoczi March 9, 2016, 3:37 p.m. UTC | #14
On Tue, Mar 08, 2016 at 10:58:47AM +0100, Kevin Wolf wrote:
> Am 07.03.2016 um 21:56 hat Stefan Hajnoczi geschrieben:
> > On Mon, Mar 07, 2016 at 05:57:41PM +0100, Kevin Wolf wrote:
> > > Am 23.02.2016 um 14:54 hat Paolo Bonzini geschrieben:
> > > > 
> > > > 
> > > > On 23/02/2016 13:49, Fam Zheng wrote:
> > > > > On Tue, 02/23 11:43, Paolo Bonzini wrote:
> > > > >>
> > > > >>
> > > > >> On 23/02/2016 06:57, Fam Zheng wrote:
> > > > >>>>>> +        qed_cancel_need_check_timer(s);
> > > > >>>>>> +        qed_need_check_timer_cb(s);
> > > > >>>>>> +    }
> > > > >>>>>
> > > > >>>>> What if an allocating write is queued (the else branch case)? Its completion
> > > > >>>>> will be in bdrv_drain and it could arm the need_check_timer which is wrong.
> > > > >>>>>
> > > > >>>>> We need to drain the allocating_write_reqs queue before checking the timer.
> > > > >>>>
> > > > >>>> You're right, but how?  That's what bdrv_drain(bs) does, it's a
> > > > >>>> chicken-and-egg problem.
> > > > >>>
> > > > >>> Maybe use an aio_poll loop before the if?
> > > > >>
> > > > >> That would not change the fact that you're reimplementing bdrv_drain
> > > > >> inside bdrv_qed_drain.
> > > > > 
> > > > > But it fulfills the contract of .bdrv_drain. This is the easy way, the hard way
> > > > > would be iterating through the allocating_write_reqs list and process reqs one
> > > > > by one synchronously, which still involves aio_poll indirectly.
> > > > 
> > > > The easy way would be better then.
> > > > 
> > > > Stefan, any second opinion?
> > > 
> > > What's the status here? It would be good to have qed not segfaulting all
> > > the time.
> > 
> > I think the timer concept itself is troublesome.  A radical approach but
> > limited to changing QED itself is to drop the timer and instead keep a
> > timestamp for the last allocating write request.  Next time a write
> > request (allocating or rewriting) is about to complete, do the flush and
> > clear the need check flag as part of the write request (if 5 seconds
> > have passed since the timestamp).
> 
> Isn't that kind of backwards, though, because it's very likely that
> after some new activity a second write request will come in soon and
> mark the image dirty again?
> 
> Apart from that, doesn't it fail to provide the intended effect, that
> the image doesn't stay dirty for a long time and a repair isn't usually
> needed after a crash?

Yes, it relies on a non-allocating write request after the timeout.

I've reverted the drain patch for now.

Stefan

Patch

diff --git a/block/qed.c b/block/qed.c
index 404be1e..ebba220 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -380,12 +380,13 @@  static void bdrv_qed_drain(BlockDriverState *bs)
 {
     BDRVQEDState *s = bs->opaque;
 
-    /* Cancel timer and start doing I/O that were meant to happen as if it
-     * fired, that way we get bdrv_drain() taking care of the ongoing requests
-     * correctly. */
-    qed_cancel_need_check_timer(s);
-    qed_plug_allocating_write_reqs(s);
-    bdrv_aio_flush(s->bs, qed_clear_need_check, s);
+    /* Fire the timer immediately in order to start doing I/O as soon as the
+     * header is flushed.
+     */
+    if (s->need_check_timer && timer_pending(s->need_check_timer)) {
+        qed_cancel_need_check_timer(s);
+        qed_need_check_timer_cb(s);
+    }
 }
 
 static int bdrv_qed_open(BlockDriverState *bs, QDict *options, int flags,