diff mbox

[v10,12/14] block: add transactional properties

Message ID 1445644612-12702-13-git-send-email-jsnow@redhat.com
State New
Headers show

Commit Message

John Snow Oct. 23, 2015, 11:56 p.m. UTC
Add both transactional properties to the QMP transactional interface,
and add the BlockJobTxn that we create as a result of the err-cancel
property to the BlkActionState structure.

[split up from a patch originally by Stefan and Fam. --js]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>

Signed-off-by: John Snow <jsnow@redhat.com>
---
 blockdev.c       | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++++---
 qapi-schema.json | 48 +++++++++++++++++++++++++++++++---
 qmp-commands.hx  |  2 +-
 3 files changed, 120 insertions(+), 8 deletions(-)

Comments

Stefan Hajnoczi Nov. 3, 2015, 3:17 p.m. UTC | #1
On Fri, Oct 23, 2015 at 07:56:50PM -0400, John Snow wrote:
> @@ -1732,6 +1757,10 @@ static void block_dirty_bitmap_add_prepare(BlkActionState *common,
>      BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
>                                               common, common);
>  
> +    if (action_check_cancel_mode(common, errp) < 0) {
> +        return;
> +    }
> +
>      action = common->action->block_dirty_bitmap_add;
>      /* AIO context taken and released within qmp_block_dirty_bitmap_add */
>      qmp_block_dirty_bitmap_add(action->node, action->name,
> @@ -1767,6 +1796,10 @@ static void block_dirty_bitmap_clear_prepare(BlkActionState *common,
>                                               common, common);
>      BlockDirtyBitmap *action;
>  
> +    if (action_check_cancel_mode(common, errp) < 0) {
> +        return;
> +    }
> +
>      action = common->action->block_dirty_bitmap_clear;
>      state->bitmap = block_dirty_bitmap_lookup(action->node,
>                                                action->name,

Why do the bitmap add/clear actions not support err-cancel=all?

I understand why other block jobs don't support it, but it's not clear
why these non-block job actions cannot.
Eric Blake Nov. 3, 2015, 3:23 p.m. UTC | #2
On 10/23/2015 05:56 PM, John Snow wrote:
> Add both transactional properties to the QMP transactional interface,
> and add the BlockJobTxn that we create as a result of the err-cancel
> property to the BlkActionState structure.
> 
> [split up from a patch originally by Stefan and Fam. --js]
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Fam Zheng <famz@redhat.com>
> Signed-off-by: John Snow <jsnow@redhat.com>
> 
> Signed-off-by: John Snow <jsnow@redhat.com>

Double S-o-b looks odd.

> ---
>  blockdev.c       | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++++---
>  qapi-schema.json | 48 +++++++++++++++++++++++++++++++---
>  qmp-commands.hx  |  2 +-
>  3 files changed, 120 insertions(+), 8 deletions(-)
> 


> +##
>  # @transaction
>  #
>  # Executes a number of transactionable QMP commands atomically. If any
>  # operation fails, then the entire set of actions will be abandoned and the
>  # appropriate error returned.
>  #
> -#  List of:
> -#  @TransactionAction: information needed for the respective operation
> +# @actions: List of @TransactionAction;
> +#           information needed for the respective operations.
> +#
> +# @properties: Optional structure of additional options to control the

Elsewhere, we've spelled it '#optional'; Marc-Andre has patches that
rely on that spelling to turn it .json into documentation.

But otherwise, looks good on first glance.
John Snow Nov. 3, 2015, 5:27 p.m. UTC | #3
On 11/03/2015 10:17 AM, Stefan Hajnoczi wrote:
> On Fri, Oct 23, 2015 at 07:56:50PM -0400, John Snow wrote:
>> @@ -1732,6 +1757,10 @@ static void block_dirty_bitmap_add_prepare(BlkActionState *common,
>>      BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
>>                                               common, common);
>>  
>> +    if (action_check_cancel_mode(common, errp) < 0) {
>> +        return;
>> +    }
>> +
>>      action = common->action->block_dirty_bitmap_add;
>>      /* AIO context taken and released within qmp_block_dirty_bitmap_add */
>>      qmp_block_dirty_bitmap_add(action->node, action->name,
>> @@ -1767,6 +1796,10 @@ static void block_dirty_bitmap_clear_prepare(BlkActionState *common,
>>                                               common, common);
>>      BlockDirtyBitmap *action;
>>  
>> +    if (action_check_cancel_mode(common, errp) < 0) {
>> +        return;
>> +    }
>> +
>>      action = common->action->block_dirty_bitmap_clear;
>>      state->bitmap = block_dirty_bitmap_lookup(action->node,
>>                                                action->name,
> 
> Why do the bitmap add/clear actions not support err-cancel=all?
> 
> I understand why other block jobs don't support it, but it's not clear
> why these non-block job actions cannot.
> 

Because they don't have a callback to invoke if the rest of the job fails.

I could create a BlockJob for them complete with a callback to invoke,
but basically it's just because there's no interface to unwind them, or
an interface to join them with the transaction.

They're small, synchronous non-job actions. Which makes them weird.
John Snow Nov. 3, 2015, 5:31 p.m. UTC | #4
On 11/03/2015 10:23 AM, Eric Blake wrote:
> On 10/23/2015 05:56 PM, John Snow wrote:
>> Add both transactional properties to the QMP transactional interface,
>> and add the BlockJobTxn that we create as a result of the err-cancel
>> property to the BlkActionState structure.
>>
>> [split up from a patch originally by Stefan and Fam. --js]
>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>> Signed-off-by: Fam Zheng <famz@redhat.com>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
> 
> Double S-o-b looks odd.
> 
>> ---
>>  blockdev.c       | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++++---
>>  qapi-schema.json | 48 +++++++++++++++++++++++++++++++---
>>  qmp-commands.hx  |  2 +-
>>  3 files changed, 120 insertions(+), 8 deletions(-)
>>
> 
> 
>> +##
>>  # @transaction
>>  #
>>  # Executes a number of transactionable QMP commands atomically. If any
>>  # operation fails, then the entire set of actions will be abandoned and the
>>  # appropriate error returned.
>>  #
>> -#  List of:
>> -#  @TransactionAction: information needed for the respective operation
>> +# @actions: List of @TransactionAction;
>> +#           information needed for the respective operations.
>> +#
>> +# @properties: Optional structure of additional options to control the
> 
> Elsewhere, we've spelled it '#optional'; Marc-Andre has patches that
> rely on that spelling to turn it .json into documentation.
> 
> But otherwise, looks good on first glance.
> 

Do you mean:

"# @properties: #optional structure of "... ?

--js
Eric Blake Nov. 3, 2015, 5:35 p.m. UTC | #5
On 11/03/2015 10:31 AM, John Snow wrote:
> 
> 
> On 11/03/2015 10:23 AM, Eric Blake wrote:
>> On 10/23/2015 05:56 PM, John Snow wrote:
>>> Add both transactional properties to the QMP transactional interface,
>>> and add the BlockJobTxn that we create as a result of the err-cancel
>>> property to the BlkActionState structure.
>>>


>>> +# @actions: List of @TransactionAction;
>>> +#           information needed for the respective operations.
>>> +#
>>> +# @properties: Optional structure of additional options to control the
>>
>> Elsewhere, we've spelled it '#optional'; Marc-Andre has patches that
>> rely on that spelling to turn it .json into documentation.
>>
>> But otherwise, looks good on first glance.
>>
> 
> Do you mean:
> 
> "# @properties: #optional structure of "... ?

Yes.
Stefan Hajnoczi Nov. 5, 2015, 10:47 a.m. UTC | #6
On Tue, Nov 03, 2015 at 12:27:19PM -0500, John Snow wrote:
> 
> 
> On 11/03/2015 10:17 AM, Stefan Hajnoczi wrote:
> > On Fri, Oct 23, 2015 at 07:56:50PM -0400, John Snow wrote:
> >> @@ -1732,6 +1757,10 @@ static void block_dirty_bitmap_add_prepare(BlkActionState *common,
> >>      BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
> >>                                               common, common);
> >>  
> >> +    if (action_check_cancel_mode(common, errp) < 0) {
> >> +        return;
> >> +    }
> >> +
> >>      action = common->action->block_dirty_bitmap_add;
> >>      /* AIO context taken and released within qmp_block_dirty_bitmap_add */
> >>      qmp_block_dirty_bitmap_add(action->node, action->name,
> >> @@ -1767,6 +1796,10 @@ static void block_dirty_bitmap_clear_prepare(BlkActionState *common,
> >>                                               common, common);
> >>      BlockDirtyBitmap *action;
> >>  
> >> +    if (action_check_cancel_mode(common, errp) < 0) {
> >> +        return;
> >> +    }
> >> +
> >>      action = common->action->block_dirty_bitmap_clear;
> >>      state->bitmap = block_dirty_bitmap_lookup(action->node,
> >>                                                action->name,
> > 
> > Why do the bitmap add/clear actions not support err-cancel=all?
> > 
> > I understand why other block jobs don't support it, but it's not clear
> > why these non-block job actions cannot.
> > 
> 
> Because they don't have a callback to invoke if the rest of the job fails.
> 
> I could create a BlockJob for them complete with a callback to invoke,
> but basically it's just because there's no interface to unwind them, or
> an interface to join them with the transaction.
> 
> They're small, synchronous non-job actions. Which makes them weird.

Funny, we've been looking at the same picture while seeing different
things:
https://en.wikipedia.org/wiki/Rabbit%E2%80%93duck_illusion

I think I understand your idea: the transaction should include both
immediate actions as well as block jobs.

My mental model was different: immediate actions commit/abort along with
the 'transaction' command.  Block jobs are separate and complete/cancel
together in a group.

In practice I think the two end up being similar because we won't be
able to implement immediate action commit/abort together with
long-running block jobs because the immediate actions rely on
quiescing/pausing the guest for atomic commit/abort.

So with your mental model the QMP client has to submit 2 'transaction'
commands: 1 for the immediate actions, 1 for the block jobs.

In my mental model the QMP client submits 1 command but the immediate
actions and block jobs are two separate transaction scopes.  This means
if the block jobs fail, the client needs to be aware of the immediate
actions that have committed.  Because of this, it becomes just as much
client effort as submitting two separate 'transaction' commands in your
model.

Can anyone see a practical difference?  I think I'm happy with John's
model.

Stefan
John Snow Nov. 5, 2015, 6:52 p.m. UTC | #7
On 11/05/2015 05:47 AM, Stefan Hajnoczi wrote:
> On Tue, Nov 03, 2015 at 12:27:19PM -0500, John Snow wrote:
>>
>>
>> On 11/03/2015 10:17 AM, Stefan Hajnoczi wrote:
>>> On Fri, Oct 23, 2015 at 07:56:50PM -0400, John Snow wrote:
>>>> @@ -1732,6 +1757,10 @@ static void block_dirty_bitmap_add_prepare(BlkActionState *common,
>>>>      BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
>>>>                                               common, common);
>>>>  
>>>> +    if (action_check_cancel_mode(common, errp) < 0) {
>>>> +        return;
>>>> +    }
>>>> +
>>>>      action = common->action->block_dirty_bitmap_add;
>>>>      /* AIO context taken and released within qmp_block_dirty_bitmap_add */
>>>>      qmp_block_dirty_bitmap_add(action->node, action->name,
>>>> @@ -1767,6 +1796,10 @@ static void block_dirty_bitmap_clear_prepare(BlkActionState *common,
>>>>                                               common, common);
>>>>      BlockDirtyBitmap *action;
>>>>  
>>>> +    if (action_check_cancel_mode(common, errp) < 0) {
>>>> +        return;
>>>> +    }
>>>> +
>>>>      action = common->action->block_dirty_bitmap_clear;
>>>>      state->bitmap = block_dirty_bitmap_lookup(action->node,
>>>>                                                action->name,
>>>
>>> Why do the bitmap add/clear actions not support err-cancel=all?
>>>
>>> I understand why other block jobs don't support it, but it's not clear
>>> why these non-block job actions cannot.
>>>
>>
>> Because they don't have a callback to invoke if the rest of the job fails.
>>
>> I could create a BlockJob for them complete with a callback to invoke,
>> but basically it's just because there's no interface to unwind them, or
>> an interface to join them with the transaction.
>>
>> They're small, synchronous non-job actions. Which makes them weird.
> 
> Funny, we've been looking at the same picture while seeing different
> things:
> https://en.wikipedia.org/wiki/Rabbit%E2%80%93duck_illusion
> 
> I think I understand your idea: the transaction should include both
> immediate actions as well as block jobs.
> 
> My mental model was different: immediate actions commit/abort along with
> the 'transaction' command.  Block jobs are separate and complete/cancel
> together in a group.
> 
> In practice I think the two end up being similar because we won't be
> able to implement immediate action commit/abort together with
> long-running block jobs because the immediate actions rely on
> quiescing/pausing the guest for atomic commit/abort.
> 
> So with your mental model the QMP client has to submit 2 'transaction'
> commands: 1 for the immediate actions, 1 for the block jobs.
> 
> In my mental model the QMP client submits 1 command but the immediate
> actions and block jobs are two separate transaction scopes.  This means
> if the block jobs fail, the client needs to be aware of the immediate
> actions that have committed.  Because of this, it becomes just as much
> client effort as submitting two separate 'transaction' commands in your
> model.
> 
> Can anyone see a practical difference?  I think I'm happy with John's
> model.
> 
> Stefan
> 

We discussed this off-list, but for the sake of the archive:

== How it is now ==

Currently, transactions have two implicit phases: the first is the
synchronous phase. If this phase completes successfully, we consider the
transaction a success. The second phase is the asynchronous phase where
jobs launched by the synchronous phase run to completion.

all synchronous commands must complete for the transaction to "succeed."
There are currently (pre-patch) no guarantees about asynchronous command
completion. As long as all synchronous actions complete, asynchronous
actions are free to succeed or fail individually.

== My Model ==

The current behavior is my "err-cancel = none" scenario: we offer no
guarantee about the success or failure of the transaction as a whole
after the synchronous portion has completed.

What I was proposing is "err-cancel = all," which to me means that _ALL_
commands in this transaction are to succeed (synchronous or not) before
_any_ actions are irrevocably committed. This means that for a
hypothetical mixed synchronous-asynchronous transaction, that even after
the transaction succeeded (it passed the synchronous phase), if an
asynchronous action later fails, all actions both synchronous and non
are rolled-back -- a kind of retroactive failure of the transaction.
This is clearly not possible in all cases, so commands that cannot
support these semantics will refuse "err-cancel = all" during the
synchronous phase.

In practice, only asynchronous actions can tolerate these semantics, but
from a user perspective, it's clear that any transaction successfully
launched with "err-cancel = all" applies to *all* actions, regardless.

== Stefan's Model ==

Stefan's model was to imply that the "err-cancel" parameter applied only
to the *asynchronous* phase, because the synchronous phase has already
reported back success to the user as a return from the qmp-transaction
command. This would mean that to Stefan, "err-cancel = all" was implying
that only the asynchronous actions had to participate in the "all or
none" behavior of the transaction -- synchronous portions were exempt.

== Equivalence ==

Both models wind up being equivalent:

In Stefan's model, you need no foreknowledge of which actions are
synchronous or not. Upon failure during the asynchronous phase you will
need to understand which actions rolled back and which ones didn't, however.

In my model, you need foreknowledge of which actions are synchronous and
which ones are not, because synchronous actions will refuse the
"err-cancel = all" parameter. There is no sifting through failure states
when the command fails.

It's mostly a matter of when you need to know the difference between the
two classes of actions. In one model, it's before. In the other, it's
after a failure.

My model also allows for an emulation of Stefan's model, using the
hypothetical "err-cancel = jobs-only" mode, which would only enforce the
transaction semantics in the asynchronous phase.


For this reason, I think the two approaches to thinking about the
problem wind up having the same effect. I would perhaps argue that my
model is more explicit -- but I'm biased. I wrote it :)

--js
Markus Armbruster Nov. 5, 2015, 7:35 p.m. UTC | #8
John Snow <jsnow@redhat.com> writes:

> On 11/05/2015 05:47 AM, Stefan Hajnoczi wrote:
>> On Tue, Nov 03, 2015 at 12:27:19PM -0500, John Snow wrote:
>>>
>>>
>>> On 11/03/2015 10:17 AM, Stefan Hajnoczi wrote:
>>>> On Fri, Oct 23, 2015 at 07:56:50PM -0400, John Snow wrote:
>>>>> @@ -1732,6 +1757,10 @@ static void block_dirty_bitmap_add_prepare(BlkActionState *common,
>>>>>      BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
>>>>>                                               common, common);
>>>>>  
>>>>> +    if (action_check_cancel_mode(common, errp) < 0) {
>>>>> +        return;
>>>>> +    }
>>>>> +
>>>>>      action = common->action->block_dirty_bitmap_add;
>>>>>      /* AIO context taken and released within qmp_block_dirty_bitmap_add */
>>>>>      qmp_block_dirty_bitmap_add(action->node, action->name,
>>>>> @@ -1767,6 +1796,10 @@ static void block_dirty_bitmap_clear_prepare(BlkActionState *common,
>>>>>                                               common, common);
>>>>>      BlockDirtyBitmap *action;
>>>>>  
>>>>> +    if (action_check_cancel_mode(common, errp) < 0) {
>>>>> +        return;
>>>>> +    }
>>>>> +
>>>>>      action = common->action->block_dirty_bitmap_clear;
>>>>>      state->bitmap = block_dirty_bitmap_lookup(action->node,
>>>>>                                                action->name,
>>>>
>>>> Why do the bitmap add/clear actions not support err-cancel=all?
>>>>
>>>> I understand why other block jobs don't support it, but it's not clear
>>>> why these non-block job actions cannot.
>>>>
>>>
>>> Because they don't have a callback to invoke if the rest of the job fails.
>>>
>>> I could create a BlockJob for them complete with a callback to invoke,
>>> but basically it's just because there's no interface to unwind them, or
>>> an interface to join them with the transaction.
>>>
>>> They're small, synchronous non-job actions. Which makes them weird.
>> 
>> Funny, we've been looking at the same picture while seeing different
>> things:
>> https://en.wikipedia.org/wiki/Rabbit%E2%80%93duck_illusion
>> 
>> I think I understand your idea: the transaction should include both
>> immediate actions as well as block jobs.
>> 
>> My mental model was different: immediate actions commit/abort along with
>> the 'transaction' command.  Block jobs are separate and complete/cancel
>> together in a group.
>> 
>> In practice I think the two end up being similar because we won't be
>> able to implement immediate action commit/abort together with
>> long-running block jobs because the immediate actions rely on
>> quiescing/pausing the guest for atomic commit/abort.
>> 
>> So with your mental model the QMP client has to submit 2 'transaction'
>> commands: 1 for the immediate actions, 1 for the block jobs.
>> 
>> In my mental model the QMP client submits 1 command but the immediate
>> actions and block jobs are two separate transaction scopes.  This means
>> if the block jobs fail, the client needs to be aware of the immediate
>> actions that have committed.  Because of this, it becomes just as much
>> client effort as submitting two separate 'transaction' commands in your
>> model.
>> 
>> Can anyone see a practical difference?  I think I'm happy with John's
>> model.
>> 
>> Stefan
>> 
>
> We discussed this off-list, but for the sake of the archive:
>
> == How it is now ==
>
> Currently, transactions have two implicit phases: the first is the
> synchronous phase. If this phase completes successfully, we consider the
> transaction a success. The second phase is the asynchronous phase where
> jobs launched by the synchronous phase run to completion.
>
> all synchronous commands must complete for the transaction to "succeed."
> There are currently (pre-patch) no guarantees about asynchronous command
> completion. As long as all synchronous actions complete, asynchronous
> actions are free to succeed or fail individually.
>
> == My Model ==
>
> The current behavior is my "err-cancel = none" scenario: we offer no
> guarantee about the success or failure of the transaction as a whole
> after the synchronous portion has completed.
>
> What I was proposing is "err-cancel = all," which to me means that _ALL_
> commands in this transaction are to succeed (synchronous or not) before
> _any_ actions are irrevocably committed. This means that for a
> hypothetical mixed synchronous-asynchronous transaction, that even after
> the transaction succeeded (it passed the synchronous phase), if an
> asynchronous action later fails, all actions both synchronous and non
> are rolled-back -- a kind of retroactive failure of the transaction.
> This is clearly not possible in all cases, so commands that cannot
> support these semantics will refuse "err-cancel = all" during the
> synchronous phase.
>
> In practice, only asynchronous actions can tolerate these semantics, but
> from a user perspective, it's clear that any transaction successfully
> launched with "err-cancel = all" applies to *all* actions, regardless.
>
> == Stefan's Model ==
>
> Stefan's model was to imply that the "err-cancel" parameter applied only
> to the *asynchronous* phase, because the synchronous phase has already
> reported back success to the user as a return from the qmp-transaction
> command. This would mean that to Stefan, "err-cancel = all" was implying
> that only the asynchronous actions had to participate in the "all or
> none" behavior of the transaction -- synchronous portions were exempt.
>
> == Equivalence ==
>
> Both models wind up being equivalent:
>
> In Stefan's model, you need no foreknowledge of which actions are
> synchronous or not. Upon failure during the asynchronous phase you will
> need to understand which actions rolled back and which ones didn't, however.
>
> In my model, you need foreknowledge of which actions are synchronous and
> which ones are not, because synchronous actions will refuse the
> "err-cancel = all" parameter. There is no sifting through failure states
> when the command fails.
>
> It's mostly a matter of when you need to know the difference between the
> two classes of actions. In one model, it's before. In the other, it's
> after a failure.
>
> My model also allows for an emulation of Stefan's model, using the
> hypothetical "err-cancel = jobs-only" mode, which would only enforce the
> transaction semantics in the asynchronous phase.
>
>
> For this reason, I think the two approaches to thinking about the
> problem wind up having the same effect. I would perhaps argue that my
> model is more explicit -- but I'm biased. I wrote it :)

I haven't followed this topic, and my opinions are therefore quite
uninformed.  Here goes anway.

In John's model, you need to know whether an action is synchronous.  If
you get it wrong, your attempt to err-cancel=all will fail.  Sounds like
a nice early failure to me.

In Stefan's model, you "need to understand which actions rolled back" to
make sense of a failure.  How?  Are exactly the asynchronous ones rolled
back?  What happens if you get it wrong?
John Snow Nov. 5, 2015, 7:43 p.m. UTC | #9
On 11/05/2015 02:35 PM, Markus Armbruster wrote:
> John Snow <jsnow@redhat.com> writes:
> 
>> On 11/05/2015 05:47 AM, Stefan Hajnoczi wrote:
>>> On Tue, Nov 03, 2015 at 12:27:19PM -0500, John Snow wrote:
>>>>
>>>>
>>>> On 11/03/2015 10:17 AM, Stefan Hajnoczi wrote:
>>>>> On Fri, Oct 23, 2015 at 07:56:50PM -0400, John Snow wrote:
>>>>>> @@ -1732,6 +1757,10 @@ static void block_dirty_bitmap_add_prepare(BlkActionState *common,
>>>>>>      BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
>>>>>>                                               common, common);
>>>>>>  
>>>>>> +    if (action_check_cancel_mode(common, errp) < 0) {
>>>>>> +        return;
>>>>>> +    }
>>>>>> +
>>>>>>      action = common->action->block_dirty_bitmap_add;
>>>>>>      /* AIO context taken and released within qmp_block_dirty_bitmap_add */
>>>>>>      qmp_block_dirty_bitmap_add(action->node, action->name,
>>>>>> @@ -1767,6 +1796,10 @@ static void block_dirty_bitmap_clear_prepare(BlkActionState *common,
>>>>>>                                               common, common);
>>>>>>      BlockDirtyBitmap *action;
>>>>>>  
>>>>>> +    if (action_check_cancel_mode(common, errp) < 0) {
>>>>>> +        return;
>>>>>> +    }
>>>>>> +
>>>>>>      action = common->action->block_dirty_bitmap_clear;
>>>>>>      state->bitmap = block_dirty_bitmap_lookup(action->node,
>>>>>>                                                action->name,
>>>>>
>>>>> Why do the bitmap add/clear actions not support err-cancel=all?
>>>>>
>>>>> I understand why other block jobs don't support it, but it's not clear
>>>>> why these non-block job actions cannot.
>>>>>
>>>>
>>>> Because they don't have a callback to invoke if the rest of the job fails.
>>>>
>>>> I could create a BlockJob for them complete with a callback to invoke,
>>>> but basically it's just because there's no interface to unwind them, or
>>>> an interface to join them with the transaction.
>>>>
>>>> They're small, synchronous non-job actions. Which makes them weird.
>>>
>>> Funny, we've been looking at the same picture while seeing different
>>> things:
>>> https://en.wikipedia.org/wiki/Rabbit%E2%80%93duck_illusion
>>>
>>> I think I understand your idea: the transaction should include both
>>> immediate actions as well as block jobs.
>>>
>>> My mental model was different: immediate actions commit/abort along with
>>> the 'transaction' command.  Block jobs are separate and complete/cancel
>>> together in a group.
>>>
>>> In practice I think the two end up being similar because we won't be
>>> able to implement immediate action commit/abort together with
>>> long-running block jobs because the immediate actions rely on
>>> quiescing/pausing the guest for atomic commit/abort.
>>>
>>> So with your mental model the QMP client has to submit 2 'transaction'
>>> commands: 1 for the immediate actions, 1 for the block jobs.
>>>
>>> In my mental model the QMP client submits 1 command but the immediate
>>> actions and block jobs are two separate transaction scopes.  This means
>>> if the block jobs fail, the client needs to be aware of the immediate
>>> actions that have committed.  Because of this, it becomes just as much
>>> client effort as submitting two separate 'transaction' commands in your
>>> model.
>>>
>>> Can anyone see a practical difference?  I think I'm happy with John's
>>> model.
>>>
>>> Stefan
>>>
>>
>> We discussed this off-list, but for the sake of the archive:
>>
>> == How it is now ==
>>
>> Currently, transactions have two implicit phases: the first is the
>> synchronous phase. If this phase completes successfully, we consider the
>> transaction a success. The second phase is the asynchronous phase where
>> jobs launched by the synchronous phase run to completion.
>>
>> all synchronous commands must complete for the transaction to "succeed."
>> There are currently (pre-patch) no guarantees about asynchronous command
>> completion. As long as all synchronous actions complete, asynchronous
>> actions are free to succeed or fail individually.
>>
>> == My Model ==
>>
>> The current behavior is my "err-cancel = none" scenario: we offer no
>> guarantee about the success or failure of the transaction as a whole
>> after the synchronous portion has completed.
>>
>> What I was proposing is "err-cancel = all," which to me means that _ALL_
>> commands in this transaction are to succeed (synchronous or not) before
>> _any_ actions are irrevocably committed. This means that for a
>> hypothetical mixed synchronous-asynchronous transaction, that even after
>> the transaction succeeded (it passed the synchronous phase), if an
>> asynchronous action later fails, all actions both synchronous and non
>> are rolled-back -- a kind of retroactive failure of the transaction.
>> This is clearly not possible in all cases, so commands that cannot
>> support these semantics will refuse "err-cancel = all" during the
>> synchronous phase.
>>
>> In practice, only asynchronous actions can tolerate these semantics, but
>> from a user perspective, it's clear that any transaction successfully
>> launched with "err-cancel = all" applies to *all* actions, regardless.
>>
>> == Stefan's Model ==
>>
>> Stefan's model was to imply that the "err-cancel" parameter applied only
>> to the *asynchronous* phase, because the synchronous phase has already
>> reported back success to the user as a return from the qmp-transaction
>> command. This would mean that to Stefan, "err-cancel = all" was implying
>> that only the asynchronous actions had to participate in the "all or
>> none" behavior of the transaction -- synchronous portions were exempt.
>>
>> == Equivalence ==
>>
>> Both models wind up being equivalent:
>>
>> In Stefan's model, you need no foreknowledge of which actions are
>> synchronous or not. Upon failure during the asynchronous phase you will
>> need to understand which actions rolled back and which ones didn't, however.
>>
>> In my model, you need foreknowledge of which actions are synchronous and
>> which ones are not, because synchronous actions will refuse the
>> "err-cancel = all" parameter. There is no sifting through failure states
>> when the command fails.
>>
>> It's mostly a matter of when you need to know the difference between the
>> two classes of actions. In one model, it's before. In the other, it's
>> after a failure.
>>
>> My model also allows for an emulation of Stefan's model, using the
>> hypothetical "err-cancel = jobs-only" mode, which would only enforce the
>> transaction semantics in the asynchronous phase.
>>
>>
>> For this reason, I think the two approaches to thinking about the
>> problem wind up having the same effect. I would perhaps argue that my
>> model is more explicit -- but I'm biased. I wrote it :)
> 
> I haven't followed this topic, and my opinions are therefore quite
> uninformed.  Here goes anway.
> 
> In John's model, you need to know whether an action is synchronous.  If
> you get it wrong, your attempt to err-cancel=all will fail.  Sounds like
> a nice early failure to me.
> 

Yes.

> In Stefan's model, you "need to understand which actions rolled back" to
> make sense of a failure.  How?  Are exactly the asynchronous ones rolled
> back?  What happens if you get it wrong?
> 

You need to understand which actions were synchronous (i.e. not jobs)
and which ones were jobs. Only jobs will be rolled back, the synchronous
actions will not be -- so to retry, you need to have a sense of /which/
actions to retry.

If you get it wrong, you get weird stuff you didn't expect, of course.

Either model implies you need to understand which actions are which
kind, but the patchset as it is right now will yell at you if you got
that wrong from the get-go.

To be fair, in Stefan's model, the QMP events will hint to you which
actions actually failed, because you'll be able to count the block job
failure events. The qmp-transaction return of "{return: {}}" is the
implicit "All non-jobs finished OK."

--js
Kevin Wolf Nov. 6, 2015, 8:32 a.m. UTC | #10
Am 05.11.2015 um 19:52 hat John Snow geschrieben:
> 
> 
> On 11/05/2015 05:47 AM, Stefan Hajnoczi wrote:
> > On Tue, Nov 03, 2015 at 12:27:19PM -0500, John Snow wrote:
> >>
> >>
> >> On 11/03/2015 10:17 AM, Stefan Hajnoczi wrote:
> >>> On Fri, Oct 23, 2015 at 07:56:50PM -0400, John Snow wrote:
> >>>> @@ -1732,6 +1757,10 @@ static void block_dirty_bitmap_add_prepare(BlkActionState *common,
> >>>>      BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
> >>>>                                               common, common);
> >>>>  
> >>>> +    if (action_check_cancel_mode(common, errp) < 0) {
> >>>> +        return;
> >>>> +    }
> >>>> +
> >>>>      action = common->action->block_dirty_bitmap_add;
> >>>>      /* AIO context taken and released within qmp_block_dirty_bitmap_add */
> >>>>      qmp_block_dirty_bitmap_add(action->node, action->name,
> >>>> @@ -1767,6 +1796,10 @@ static void block_dirty_bitmap_clear_prepare(BlkActionState *common,
> >>>>                                               common, common);
> >>>>      BlockDirtyBitmap *action;
> >>>>  
> >>>> +    if (action_check_cancel_mode(common, errp) < 0) {
> >>>> +        return;
> >>>> +    }
> >>>> +
> >>>>      action = common->action->block_dirty_bitmap_clear;
> >>>>      state->bitmap = block_dirty_bitmap_lookup(action->node,
> >>>>                                                action->name,
> >>>
> >>> Why do the bitmap add/clear actions not support err-cancel=all?
> >>>
> >>> I understand why other block jobs don't support it, but it's not clear
> >>> why these non-block job actions cannot.
> >>>
> >>
> >> Because they don't have a callback to invoke if the rest of the job fails.
> >>
> >> I could create a BlockJob for them complete with a callback to invoke,
> >> but basically it's just because there's no interface to unwind them, or
> >> an interface to join them with the transaction.
> >>
> >> They're small, synchronous non-job actions. Which makes them weird.
> > 
> > Funny, we've been looking at the same picture while seeing different
> > things:
> > https://en.wikipedia.org/wiki/Rabbit%E2%80%93duck_illusion
> > 
> > I think I understand your idea: the transaction should include both
> > immediate actions as well as block jobs.
> > 
> > My mental model was different: immediate actions commit/abort along with
> > the 'transaction' command.  Block jobs are separate and complete/cancel
> > together in a group.
> > 
> > In practice I think the two end up being similar because we won't be
> > able to implement immediate action commit/abort together with
> > long-running block jobs because the immediate actions rely on
> > quiescing/pausing the guest for atomic commit/abort.
> > 
> > So with your mental model the QMP client has to submit 2 'transaction'
> > commands: 1 for the immediate actions, 1 for the block jobs.
> > 
> > In my mental model the QMP client submits 1 command but the immediate
> > actions and block jobs are two separate transaction scopes.  This means
> > if the block jobs fail, the client needs to be aware of the immediate
> > actions that have committed.  Because of this, it becomes just as much
> > client effort as submitting two separate 'transaction' commands in your
> > model.
> > 
> > Can anyone see a practical difference?  I think I'm happy with John's
> > model.
> > 
> > Stefan
> > 
> 
> We discussed this off-list, but for the sake of the archive:
> 
> == How it is now ==
> 
> Currently, transactions have two implicit phases: the first is the
> synchronous phase. If this phase completes successfully, we consider the
> transaction a success. The second phase is the asynchronous phase where
> jobs launched by the synchronous phase run to completion.
> 
> all synchronous commands must complete for the transaction to "succeed."
> There are currently (pre-patch) no guarantees about asynchronous command
> completion. As long as all synchronous actions complete, asynchronous
> actions are free to succeed or fail individually.

You're overcomplicating this. All actions are currently synchronous and
what you consider asynchronous transaction actions aren't actually part
of the transaction at all. The action is "start block job X", not "run
block job X".

> == My Model ==
> 
> The current behavior is my "err-cancel = none" scenario: we offer no
> guarantee about the success or failure of the transaction as a whole
> after the synchronous portion has completed.
> 
> What I was proposing is "err-cancel = all," which to me means that _ALL_
> commands in this transaction are to succeed (synchronous or not) before
> _any_ actions are irrevocably committed. This means that for a
> hypothetical mixed synchronous-asynchronous transaction, that even after
> the transaction succeeded (it passed the synchronous phase), if an
> asynchronous action later fails, all actions both synchronous and non
> are rolled-back -- a kind of retroactive failure of the transaction.
> This is clearly not possible in all cases, so commands that cannot
> support these semantics will refuse "err-cancel = all" during the
> synchronous phase.

Is this possible in any case? You're losing transaction semantics the
lastest when you drop the BQL that the monitor holds. At least atomicity
and isolation aren't given any more.

You can try to undo some parts of what you did later one, but if any
involved BDS was used in the meantime by anything other than the block
job, you don't have transactional behaviour any more.

And isn't the management tool perfectly capable of cleaning up all the
block jobs without magic happening in qemu if one of them fails? Do we
actually need atomic failure later on? And if so, do we need atomic
failure only of block jobs started in the same transaction? Why?

Kevin
Stefan Hajnoczi Nov. 6, 2015, 4:36 p.m. UTC | #11
On Fri, Nov 06, 2015 at 09:32:19AM +0100, Kevin Wolf wrote:
> Am 05.11.2015 um 19:52 hat John Snow geschrieben:
> > 
> > 
> > On 11/05/2015 05:47 AM, Stefan Hajnoczi wrote:
> > > On Tue, Nov 03, 2015 at 12:27:19PM -0500, John Snow wrote:
> > >>
> > >>
> > >> On 11/03/2015 10:17 AM, Stefan Hajnoczi wrote:
> > >>> On Fri, Oct 23, 2015 at 07:56:50PM -0400, John Snow wrote:
> > >>>> @@ -1732,6 +1757,10 @@ static void block_dirty_bitmap_add_prepare(BlkActionState *common,
> > >>>>      BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
> > >>>>                                               common, common);
> > >>>>  
> > >>>> +    if (action_check_cancel_mode(common, errp) < 0) {
> > >>>> +        return;
> > >>>> +    }
> > >>>> +
> > >>>>      action = common->action->block_dirty_bitmap_add;
> > >>>>      /* AIO context taken and released within qmp_block_dirty_bitmap_add */
> > >>>>      qmp_block_dirty_bitmap_add(action->node, action->name,
> > >>>> @@ -1767,6 +1796,10 @@ static void block_dirty_bitmap_clear_prepare(BlkActionState *common,
> > >>>>                                               common, common);
> > >>>>      BlockDirtyBitmap *action;
> > >>>>  
> > >>>> +    if (action_check_cancel_mode(common, errp) < 0) {
> > >>>> +        return;
> > >>>> +    }
> > >>>> +
> > >>>>      action = common->action->block_dirty_bitmap_clear;
> > >>>>      state->bitmap = block_dirty_bitmap_lookup(action->node,
> > >>>>                                                action->name,
> > >>>
> > >>> Why do the bitmap add/clear actions not support err-cancel=all?
> > >>>
> > >>> I understand why other block jobs don't support it, but it's not clear
> > >>> why these non-block job actions cannot.
> > >>>
> > >>
> > >> Because they don't have a callback to invoke if the rest of the job fails.
> > >>
> > >> I could create a BlockJob for them complete with a callback to invoke,
> > >> but basically it's just because there's no interface to unwind them, or
> > >> an interface to join them with the transaction.
> > >>
> > >> They're small, synchronous non-job actions. Which makes them weird.
> > > 
> > > Funny, we've been looking at the same picture while seeing different
> > > things:
> > > https://en.wikipedia.org/wiki/Rabbit%E2%80%93duck_illusion
> > > 
> > > I think I understand your idea: the transaction should include both
> > > immediate actions as well as block jobs.
> > > 
> > > My mental model was different: immediate actions commit/abort along with
> > > the 'transaction' command.  Block jobs are separate and complete/cancel
> > > together in a group.
> > > 
> > > In practice I think the two end up being similar because we won't be
> > > able to implement immediate action commit/abort together with
> > > long-running block jobs because the immediate actions rely on
> > > quiescing/pausing the guest for atomic commit/abort.
> > > 
> > > So with your mental model the QMP client has to submit 2 'transaction'
> > > commands: 1 for the immediate actions, 1 for the block jobs.
> > > 
> > > In my mental model the QMP client submits 1 command but the immediate
> > > actions and block jobs are two separate transaction scopes.  This means
> > > if the block jobs fail, the client needs to be aware of the immediate
> > > actions that have committed.  Because of this, it becomes just as much
> > > client effort as submitting two separate 'transaction' commands in your
> > > model.
> > > 
> > > Can anyone see a practical difference?  I think I'm happy with John's
> > > model.
> > > 
> > > Stefan
> > > 
> > 
> > We discussed this off-list, but for the sake of the archive:
> > 
> > == How it is now ==
> > 
> > Currently, transactions have two implicit phases: the first is the
> > synchronous phase. If this phase completes successfully, we consider the
> > transaction a success. The second phase is the asynchronous phase where
> > jobs launched by the synchronous phase run to completion.
> > 
> > all synchronous commands must complete for the transaction to "succeed."
> > There are currently (pre-patch) no guarantees about asynchronous command
> > completion. As long as all synchronous actions complete, asynchronous
> > actions are free to succeed or fail individually.
> 
> You're overcomplicating this. All actions are currently synchronous and
> what you consider asynchronous transaction actions aren't actually part
> of the transaction at all. The action is "start block job X", not "run
> block job X".

Yes, this is how I've thought of it too.

> > == My Model ==
> > 
> > The current behavior is my "err-cancel = none" scenario: we offer no
> > guarantee about the success or failure of the transaction as a whole
> > after the synchronous portion has completed.
> > 
> > What I was proposing is "err-cancel = all," which to me means that _ALL_
> > commands in this transaction are to succeed (synchronous or not) before
> > _any_ actions are irrevocably committed. This means that for a
> > hypothetical mixed synchronous-asynchronous transaction, that even after
> > the transaction succeeded (it passed the synchronous phase), if an
> > asynchronous action later fails, all actions both synchronous and non
> > are rolled-back -- a kind of retroactive failure of the transaction.
> > This is clearly not possible in all cases, so commands that cannot
> > support these semantics will refuse "err-cancel = all" during the
> > synchronous phase.
> 
> Is this possible in any case? You're losing transaction semantics the
> lastest when you drop the BQL that the monitor holds. At least atomicity
> and isolation aren't given any more.
> 
> You can try to undo some parts of what you did later one, but if any
> involved BDS was used in the meantime by anything other than the block
> job, you don't have transactional behaviour any more.
> 
> And isn't the management tool perfectly capable of cleaning up all the
> block jobs without magic happening in qemu if one of them fails? Do we
> actually need atomic failure later on? And if so, do we need atomic
> failure only of block jobs started in the same transaction? Why?

I think we do because the backup block job, when given a dirty bitmap to
copy out, will either discard that dirty bitmap upon completion or merge
the dirty bitmap back into the currently active dirty bitmap for the
device on failure (that way your incremental backup can be retried).

Now when you take an incremental backup of multiple drives at the same
instant in time, it would be a pain to have one or more jobs complete
(and discard the bitmap) but others fail.  Then you would no longer have
a single point in time when the incremental backup was taken...

That is the motivation for this feature.

Stefan
John Snow Nov. 6, 2015, 6:46 p.m. UTC | #12
On 11/06/2015 03:32 AM, Kevin Wolf wrote:
> Am 05.11.2015 um 19:52 hat John Snow geschrieben:
>>
>>
>> On 11/05/2015 05:47 AM, Stefan Hajnoczi wrote:
>>> On Tue, Nov 03, 2015 at 12:27:19PM -0500, John Snow wrote:
>>>>
>>>>
>>>> On 11/03/2015 10:17 AM, Stefan Hajnoczi wrote:
>>>>> On Fri, Oct 23, 2015 at 07:56:50PM -0400, John Snow wrote:
>>>>>> @@ -1732,6 +1757,10 @@ static void block_dirty_bitmap_add_prepare(BlkActionState *common,
>>>>>>      BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
>>>>>>                                               common, common);
>>>>>>  
>>>>>> +    if (action_check_cancel_mode(common, errp) < 0) {
>>>>>> +        return;
>>>>>> +    }
>>>>>> +
>>>>>>      action = common->action->block_dirty_bitmap_add;
>>>>>>      /* AIO context taken and released within qmp_block_dirty_bitmap_add */
>>>>>>      qmp_block_dirty_bitmap_add(action->node, action->name,
>>>>>> @@ -1767,6 +1796,10 @@ static void block_dirty_bitmap_clear_prepare(BlkActionState *common,
>>>>>>                                               common, common);
>>>>>>      BlockDirtyBitmap *action;
>>>>>>  
>>>>>> +    if (action_check_cancel_mode(common, errp) < 0) {
>>>>>> +        return;
>>>>>> +    }
>>>>>> +
>>>>>>      action = common->action->block_dirty_bitmap_clear;
>>>>>>      state->bitmap = block_dirty_bitmap_lookup(action->node,
>>>>>>                                                action->name,
>>>>>
>>>>> Why do the bitmap add/clear actions not support err-cancel=all?
>>>>>
>>>>> I understand why other block jobs don't support it, but it's not clear
>>>>> why these non-block job actions cannot.
>>>>>
>>>>
>>>> Because they don't have a callback to invoke if the rest of the job fails.
>>>>
>>>> I could create a BlockJob for them complete with a callback to invoke,
>>>> but basically it's just because there's no interface to unwind them, or
>>>> an interface to join them with the transaction.
>>>>
>>>> They're small, synchronous non-job actions. Which makes them weird.
>>>
>>> Funny, we've been looking at the same picture while seeing different
>>> things:
>>> https://en.wikipedia.org/wiki/Rabbit%E2%80%93duck_illusion
>>>
>>> I think I understand your idea: the transaction should include both
>>> immediate actions as well as block jobs.
>>>
>>> My mental model was different: immediate actions commit/abort along with
>>> the 'transaction' command.  Block jobs are separate and complete/cancel
>>> together in a group.
>>>
>>> In practice I think the two end up being similar because we won't be
>>> able to implement immediate action commit/abort together with
>>> long-running block jobs because the immediate actions rely on
>>> quiescing/pausing the guest for atomic commit/abort.
>>>
>>> So with your mental model the QMP client has to submit 2 'transaction'
>>> commands: 1 for the immediate actions, 1 for the block jobs.
>>>
>>> In my mental model the QMP client submits 1 command but the immediate
>>> actions and block jobs are two separate transaction scopes.  This means
>>> if the block jobs fail, the client needs to be aware of the immediate
>>> actions that have committed.  Because of this, it becomes just as much
>>> client effort as submitting two separate 'transaction' commands in your
>>> model.
>>>
>>> Can anyone see a practical difference?  I think I'm happy with John's
>>> model.
>>>
>>> Stefan
>>>
>>
>> We discussed this off-list, but for the sake of the archive:
>>
>> == How it is now ==
>>
>> Currently, transactions have two implicit phases: the first is the
>> synchronous phase. If this phase completes successfully, we consider the
>> transaction a success. The second phase is the asynchronous phase where
>> jobs launched by the synchronous phase run to completion.
>>
>> all synchronous commands must complete for the transaction to "succeed."
>> There are currently (pre-patch) no guarantees about asynchronous command
>> completion. As long as all synchronous actions complete, asynchronous
>> actions are free to succeed or fail individually.
> 
> You're overcomplicating this. All actions are currently synchronous and
> what you consider asynchronous transaction actions aren't actually part
> of the transaction at all. The action is "start block job X", not "run
> block job X".
> 

"OK," except the entire goal of the series was to allow the "aren't
actually part of the transaction at all" parts to live and die together
based on the transaction they were launched from. This really implies
they belong to a second asynchronous phase of the transaction.

Otherwise, why would totally unrelated jobs fail because a different one
did?

I realize this is an /extension/ of the existing semantics vs a "fix,"
and part of the confusion is how I and everyone else was looking at it
differently. How could this happen, though? Let's look at our
transaction documentation:

"Operations that are currently supported:" [...] "- drive-backup" [...]
"If there is any failure
performing any of the operations, all operations for the group are
abandoned."

Where do we say "Except drive-backup, though, because technically the
drive-backup action only starts the job, and we don't really consider
this to be part of the transaction" ? We've never actually defined this
behavior as part of our API as far as I can see.

Regardless: the net effect is still the same. We want jobs launched by
transactions to (optionally!) cancel themselves on failure. The
complications only arise because people want the exact semantic meaning
to be precise for the QMP API.

The concept is simple, the language to describe it has been muddy.

>> == My Model ==
>>
>> The current behavior is my "err-cancel = none" scenario: we offer no
>> guarantee about the success or failure of the transaction as a whole
>> after the synchronous portion has completed.
>>
>> What I was proposing is "err-cancel = all," which to me means that _ALL_
>> commands in this transaction are to succeed (synchronous or not) before
>> _any_ actions are irrevocably committed. This means that for a
>> hypothetical mixed synchronous-asynchronous transaction, that even after
>> the transaction succeeded (it passed the synchronous phase), if an
>> asynchronous action later fails, all actions both synchronous and non
>> are rolled-back -- a kind of retroactive failure of the transaction.
>> This is clearly not possible in all cases, so commands that cannot
>> support these semantics will refuse "err-cancel = all" during the
>> synchronous phase.
> 
> Is this possible in any case? You're losing transaction semantics the
> lastest when you drop the BQL that the monitor holds. At least atomicity
> and isolation aren't given any more.
> 

It might be possible in at least *some* cases. I currently don't even
attempt it, though. This is why some transaction actions just refuse
this parameter if it shows up.

It'd be pretty easy to undo a bitmap-add or a bitmap-clear, as long as I
gave these actions a conditional success callback.

The "undo" semantics of the jobs are somewhat different. I am not
suggesting we can teleport back in time to before we tried to do the
backup, but we can at least abort the backup and make it like you never
issued the command -- which is important for incremental backups.

The rollback behavior for each action needs to be spelled out in our
documentation ... as well as categorizing which actions complete before
qmp_transactions return and which may have lingering effects (like
drive-backup.)

> You can try to undo some parts of what you did later one, but if any
> involved BDS was used in the meantime by anything other than the block
> job, you don't have transactional behaviour any more.
> 
> And isn't the management tool perfectly capable of cleaning up all the
> block jobs without magic happening in qemu if one of them fails? Do we
> actually need atomic failure later on? And if so, do we need atomic
> failure only of block jobs started in the same transaction? Why?
> 

There's always a debate about who is responsible for cleaning things up
on failures: QEMU or the management tool? Luckily: it's an optional
parameter, so the tool can decide at run-time about what semantics it wants.

IMO: It's a little late in this series to question if we need this or
not, but I'll oblige.

The original objection from Stefan to the incremental backup
https://soundcloud.com/tothejazz/gyms-flappy-the-happy-sealtransaction
semantics was that if two incremental backup jobs launched by a
transaction had only partial success, the management tool would have to
take on extra state and possibly issue some corrective actions to QEMU
in order to undo the damage.

QEMU, however, could just unravel the failure fairly trivially and allow
the backup commands to maintain a simple binary success/failure state.
This was agreed to be the better, less complicated management scenario.

In the case that incremental backups partially complete, you'll have one
bitmap deleted and a different bitmap merged back onto the BDS. The
management tool can at this point absolutely not delete the one
incremental that got made and needs to leave it in place before
re-issuing the incremental backup. It's then responsible for either
squashing the two incremental backups it made, or otherwise managing the
disparity in the number of actual backup files.

For QEMU's trouble, we don't delete incremental backup data until after
the backup operation, so it's trivial to hold onto the data until we
know everything's OK.

We decided it was nicest for QEMU to just roll back all of the jobs in
this case. If the management tool disagrees, it can just roll with the
original semantics.


I still believe strongly that this is the right way to go. It's
flexible, allows for either party to manage the failure, the parameter
is completely non-ambiguous, provides for early failure if the expected
semantics are not possible, and the code is already here and mostly
reviewed... except for the API names.

--js
John Snow Nov. 6, 2015, 8:35 p.m. UTC | #13
On 11/06/2015 01:46 PM, John Snow wrote:
> 
> 
> On 11/06/2015 03:32 AM, Kevin Wolf wrote:
>> Am 05.11.2015 um 19:52 hat John Snow geschrieben:
>>>
>>>
>>> On 11/05/2015 05:47 AM, Stefan Hajnoczi wrote:
>>>> On Tue, Nov 03, 2015 at 12:27:19PM -0500, John Snow wrote:
>>>>>
>>>>>
>>>>> On 11/03/2015 10:17 AM, Stefan Hajnoczi wrote:
>>>>>> On Fri, Oct 23, 2015 at 07:56:50PM -0400, John Snow wrote:
>>>>>>> @@ -1732,6 +1757,10 @@ static void block_dirty_bitmap_add_prepare(BlkActionState *common,
>>>>>>>      BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
>>>>>>>                                               common, common);
>>>>>>>  
>>>>>>> +    if (action_check_cancel_mode(common, errp) < 0) {
>>>>>>> +        return;
>>>>>>> +    }
>>>>>>> +
>>>>>>>      action = common->action->block_dirty_bitmap_add;
>>>>>>>      /* AIO context taken and released within qmp_block_dirty_bitmap_add */
>>>>>>>      qmp_block_dirty_bitmap_add(action->node, action->name,
>>>>>>> @@ -1767,6 +1796,10 @@ static void block_dirty_bitmap_clear_prepare(BlkActionState *common,
>>>>>>>                                               common, common);
>>>>>>>      BlockDirtyBitmap *action;
>>>>>>>  
>>>>>>> +    if (action_check_cancel_mode(common, errp) < 0) {
>>>>>>> +        return;
>>>>>>> +    }
>>>>>>> +
>>>>>>>      action = common->action->block_dirty_bitmap_clear;
>>>>>>>      state->bitmap = block_dirty_bitmap_lookup(action->node,
>>>>>>>                                                action->name,
>>>>>>
>>>>>> Why do the bitmap add/clear actions not support err-cancel=all?
>>>>>>
>>>>>> I understand why other block jobs don't support it, but it's not clear
>>>>>> why these non-block job actions cannot.
>>>>>>
>>>>>
>>>>> Because they don't have a callback to invoke if the rest of the job fails.
>>>>>
>>>>> I could create a BlockJob for them complete with a callback to invoke,
>>>>> but basically it's just because there's no interface to unwind them, or
>>>>> an interface to join them with the transaction.
>>>>>
>>>>> They're small, synchronous non-job actions. Which makes them weird.
>>>>
>>>> Funny, we've been looking at the same picture while seeing different
>>>> things:
>>>> https://en.wikipedia.org/wiki/Rabbit%E2%80%93duck_illusion
>>>>
>>>> I think I understand your idea: the transaction should include both
>>>> immediate actions as well as block jobs.
>>>>
>>>> My mental model was different: immediate actions commit/abort along with
>>>> the 'transaction' command.  Block jobs are separate and complete/cancel
>>>> together in a group.
>>>>
>>>> In practice I think the two end up being similar because we won't be
>>>> able to implement immediate action commit/abort together with
>>>> long-running block jobs because the immediate actions rely on
>>>> quiescing/pausing the guest for atomic commit/abort.
>>>>
>>>> So with your mental model the QMP client has to submit 2 'transaction'
>>>> commands: 1 for the immediate actions, 1 for the block jobs.
>>>>
>>>> In my mental model the QMP client submits 1 command but the immediate
>>>> actions and block jobs are two separate transaction scopes.  This means
>>>> if the block jobs fail, the client needs to be aware of the immediate
>>>> actions that have committed.  Because of this, it becomes just as much
>>>> client effort as submitting two separate 'transaction' commands in your
>>>> model.
>>>>
>>>> Can anyone see a practical difference?  I think I'm happy with John's
>>>> model.
>>>>
>>>> Stefan
>>>>
>>>
>>> We discussed this off-list, but for the sake of the archive:
>>>
>>> == How it is now ==
>>>
>>> Currently, transactions have two implicit phases: the first is the
>>> synchronous phase. If this phase completes successfully, we consider the
>>> transaction a success. The second phase is the asynchronous phase where
>>> jobs launched by the synchronous phase run to completion.
>>>
>>> all synchronous commands must complete for the transaction to "succeed."
>>> There are currently (pre-patch) no guarantees about asynchronous command
>>> completion. As long as all synchronous actions complete, asynchronous
>>> actions are free to succeed or fail individually.
>>
>> You're overcomplicating this. All actions are currently synchronous and
>> what you consider asynchronous transaction actions aren't actually part
>> of the transaction at all. The action is "start block job X", not "run
>> block job X".
>>
> 
> "OK," except the entire goal of the series was to allow the "aren't
> actually part of the transaction at all" parts to live and die together
> based on the transaction they were launched from. This really implies
> they belong to a second asynchronous phase of the transaction.
> 
> Otherwise, why would totally unrelated jobs fail because a different one
> did?
> 
> I realize this is an /extension/ of the existing semantics vs a "fix,"
> and part of the confusion is how I and everyone else was looking at it
> differently. How could this happen, though? Let's look at our
> transaction documentation:
> 
> "Operations that are currently supported:" [...] "- drive-backup" [...]
> "If there is any failure
> performing any of the operations, all operations for the group are
> abandoned."
> 
> Where do we say "Except drive-backup, though, because technically the
> drive-backup action only starts the job, and we don't really consider
> this to be part of the transaction" ? We've never actually defined this
> behavior as part of our API as far as I can see.
> 
> Regardless: the net effect is still the same. We want jobs launched by
> transactions to (optionally!) cancel themselves on failure. The
> complications only arise because people want the exact semantic meaning
> to be precise for the QMP API.
> 
> The concept is simple, the language to describe it has been muddy.
> 
>>> == My Model ==
>>>
>>> The current behavior is my "err-cancel = none" scenario: we offer no
>>> guarantee about the success or failure of the transaction as a whole
>>> after the synchronous portion has completed.
>>>
>>> What I was proposing is "err-cancel = all," which to me means that _ALL_
>>> commands in this transaction are to succeed (synchronous or not) before
>>> _any_ actions are irrevocably committed. This means that for a
>>> hypothetical mixed synchronous-asynchronous transaction, that even after
>>> the transaction succeeded (it passed the synchronous phase), if an
>>> asynchronous action later fails, all actions both synchronous and non
>>> are rolled-back -- a kind of retroactive failure of the transaction.
>>> This is clearly not possible in all cases, so commands that cannot
>>> support these semantics will refuse "err-cancel = all" during the
>>> synchronous phase.
>>
>> Is this possible in any case? You're losing transaction semantics the
>> lastest when you drop the BQL that the monitor holds. At least atomicity
>> and isolation aren't given any more.
>>
> 
> It might be possible in at least *some* cases. I currently don't even
> attempt it, though. This is why some transaction actions just refuse
> this parameter if it shows up.
> 
> It'd be pretty easy to undo a bitmap-add or a bitmap-clear, as long as I
> gave these actions a conditional success callback.
> 
> The "undo" semantics of the jobs are somewhat different. I am not
> suggesting we can teleport back in time to before we tried to do the
> backup, but we can at least abort the backup and make it like you never
> issued the command -- which is important for incremental backups.
> 
> The rollback behavior for each action needs to be spelled out in our
> documentation ... as well as categorizing which actions complete before
> qmp_transactions return and which may have lingering effects (like
> drive-backup.)
> 
>> You can try to undo some parts of what you did later one, but if any
>> involved BDS was used in the meantime by anything other than the block
>> job, you don't have transactional behaviour any more.
>>
>> And isn't the management tool perfectly capable of cleaning up all the
>> block jobs without magic happening in qemu if one of them fails? Do we
>> actually need atomic failure later on? And if so, do we need atomic
>> failure only of block jobs started in the same transaction? Why?
>>
> 
> There's always a debate about who is responsible for cleaning things up
> on failures: QEMU or the management tool? Luckily: it's an optional
> parameter, so the tool can decide at run-time about what semantics it wants.
> 
> IMO: It's a little late in this series to question if we need this or
> not, but I'll oblige.
> 
> The original objection from Stefan to the incremental backup

> https://soundcloud.com/tothejazz/gyms-flappy-the-happy-sealtransaction
> semantics was that if two incremental backup jobs launched by a

LOL. I mispasted a soundcloud link in here. Well... Enjoy this stupid
song that made me laugh that I was trying to link to someone else.
Sigh!!! (So embarrassed.)

> transaction had only partial success, the management tool would have to
> take on extra state and possibly issue some corrective actions to QEMU
> in order to undo the damage.
> 
> QEMU, however, could just unravel the failure fairly trivially and allow
> the backup commands to maintain a simple binary success/failure state.
> This was agreed to be the better, less complicated management scenario.
> 
> In the case that incremental backups partially complete, you'll have one
> bitmap deleted and a different bitmap merged back onto the BDS. The
> management tool can at this point absolutely not delete the one
> incremental that got made and needs to leave it in place before
> re-issuing the incremental backup. It's then responsible for either
> squashing the two incremental backups it made, or otherwise managing the
> disparity in the number of actual backup files.
> 
> For QEMU's trouble, we don't delete incremental backup data until after
> the backup operation, so it's trivial to hold onto the data until we
> know everything's OK.
> 
> We decided it was nicest for QEMU to just roll back all of the jobs in
> this case. If the management tool disagrees, it can just roll with the
> original semantics.
> 
> 
> I still believe strongly that this is the right way to go. It's
> flexible, allows for either party to manage the failure, the parameter
> is completely non-ambiguous, provides for early failure if the expected
> semantics are not possible, and the code is already here and mostly
> reviewed... except for the API names.
> 
> --js
>
diff mbox

Patch

diff --git a/blockdev.c b/blockdev.c
index d1dcf68..9b5e2fa 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1033,7 +1033,7 @@  static void blockdev_do_action(int kind, void *data, Error **errp)
     action.data = data;
     list.value = &action;
     list.next = NULL;
-    qmp_transaction(&list, errp);
+    qmp_transaction(&list, false, NULL, errp);
 }
 
 void qmp_blockdev_snapshot_sync(bool has_device, const char *device,
@@ -1248,6 +1248,7 @@  typedef struct BlkActionOps {
  *
  * @action: QAPI-defined enum identifying which Action to perform.
  * @ops: Table of ActionOps this Action can perform.
+ * @block_job_txn: Transaction which this action belongs to.
  * @entry: List membership for all Actions in this Transaction.
  *
  * This structure must be arranged as first member in a subclassed type,
@@ -1257,6 +1258,8 @@  typedef struct BlkActionOps {
 struct BlkActionState {
     TransactionAction *action;
     const BlkActionOps *ops;
+    BlockJobTxn *block_job_txn;
+    TransactionProperties *txn_props;
     QSIMPLEQ_ENTRY(BlkActionState) entry;
 };
 
@@ -1268,6 +1271,20 @@  typedef struct InternalSnapshotState {
     QEMUSnapshotInfo sn;
 } InternalSnapshotState;
 
+
+static int action_check_cancel_mode(BlkActionState *s, Error **errp)
+{
+    if (s->txn_props->err_cancel != ACTION_CANCEL_MODE_NONE) {
+        error_setg(errp,
+                   "Action '%s' does not support Transaction property "
+                   "err-cancel = %s",
+                   TransactionActionKind_lookup[s->action->kind],
+                   ActionCancelMode_lookup[s->txn_props->err_cancel]);
+        return -1;
+    }
+    return 0;
+}
+
 static void internal_snapshot_prepare(BlkActionState *common,
                                       Error **errp)
 {
@@ -1293,6 +1310,10 @@  static void internal_snapshot_prepare(BlkActionState *common,
     name = internal->name;
 
     /* 2. check for validation */
+    if (action_check_cancel_mode(common, errp) < 0) {
+        return;
+    }
+
     blk = blk_by_name(device);
     if (!blk) {
         error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
@@ -1444,6 +1465,10 @@  static void external_snapshot_prepare(BlkActionState *common,
     }
 
     /* start processing */
+    if (action_check_cancel_mode(common, errp) < 0) {
+        return;
+    }
+
     state->old_bs = bdrv_lookup_bs(has_device ? device : NULL,
                                    has_node_name ? node_name : NULL,
                                    &local_err);
@@ -1600,7 +1625,7 @@  static void drive_backup_prepare(BlkActionState *common, Error **errp)
                     backup->has_bitmap, backup->bitmap,
                     backup->has_on_source_error, backup->on_source_error,
                     backup->has_on_target_error, backup->on_target_error,
-                    NULL, &local_err);
+                    common->block_job_txn, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         return;
@@ -1685,7 +1710,7 @@  static void blockdev_backup_prepare(BlkActionState *common, Error **errp)
                        backup->has_speed, backup->speed,
                        backup->has_on_source_error, backup->on_source_error,
                        backup->has_on_target_error, backup->on_target_error,
-                       NULL, &local_err);
+                       common->block_job_txn, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         return;
@@ -1732,6 +1757,10 @@  static void block_dirty_bitmap_add_prepare(BlkActionState *common,
     BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
                                              common, common);
 
+    if (action_check_cancel_mode(common, errp) < 0) {
+        return;
+    }
+
     action = common->action->block_dirty_bitmap_add;
     /* AIO context taken and released within qmp_block_dirty_bitmap_add */
     qmp_block_dirty_bitmap_add(action->node, action->name,
@@ -1767,6 +1796,10 @@  static void block_dirty_bitmap_clear_prepare(BlkActionState *common,
                                              common, common);
     BlockDirtyBitmap *action;
 
+    if (action_check_cancel_mode(common, errp) < 0) {
+        return;
+    }
+
     action = common->action->block_dirty_bitmap_clear;
     state->bitmap = block_dirty_bitmap_lookup(action->node,
                                               action->name,
@@ -1869,19 +1902,50 @@  static const BlkActionOps actions[] = {
     }
 };
 
+/**
+ * Allocate a TransactionProperties structure if necessary, and fill
+ * that structure with desired defaults if they are unset.
+ */
+static TransactionProperties *get_transaction_properties(
+    TransactionProperties *props)
+{
+    if (!props) {
+        props = g_new0(TransactionProperties, 1);
+    }
+
+    if (!props->has_err_cancel) {
+        props->has_err_cancel = true;
+        props->err_cancel = ACTION_CANCEL_MODE_NONE;
+    }
+
+    return props;
+}
+
 /*
  * 'Atomic' group operations.  The operations are performed as a set, and if
  * any fail then we roll back all operations in the group.
  */
-void qmp_transaction(TransactionActionList *dev_list, Error **errp)
+void qmp_transaction(TransactionActionList *dev_list,
+                     bool has_props,
+                     struct TransactionProperties *props,
+                     Error **errp)
 {
     TransactionActionList *dev_entry = dev_list;
+    BlockJobTxn *block_job_txn = NULL;
     BlkActionState *state, *next;
     Error *local_err = NULL;
 
     QSIMPLEQ_HEAD(snap_bdrv_states, BlkActionState) snap_bdrv_states;
     QSIMPLEQ_INIT(&snap_bdrv_states);
 
+    /* Does this transaction get canceled as a group on failure?
+     * If not, we don't really need to make a BlockJobTxn.
+     */
+    props = get_transaction_properties(props);
+    if (props->err_cancel != ACTION_CANCEL_MODE_NONE) {
+        block_job_txn = block_job_txn_new();
+    }
+
     /* drain all i/o before any operations */
     bdrv_drain_all();
 
@@ -1901,6 +1965,8 @@  void qmp_transaction(TransactionActionList *dev_list, Error **errp)
         state = g_malloc0(ops->instance_size);
         state->ops = ops;
         state->action = dev_info;
+        state->block_job_txn = block_job_txn;
+        state->txn_props = props;
         QSIMPLEQ_INSERT_TAIL(&snap_bdrv_states, state, entry);
 
         state->ops->prepare(state, &local_err);
@@ -1933,6 +1999,10 @@  exit:
         }
         g_free(state);
     }
+    if (!has_props) {
+        qapi_free_TransactionProperties(props);
+    }
+    block_job_txn_unref(block_job_txn);
 }
 
 
diff --git a/qapi-schema.json b/qapi-schema.json
index a73c2c9..86323e1 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1520,6 +1520,24 @@ 
   'data': { } }
 
 ##
+# @ActionCancelMode
+#
+# An enumeration of Transactional cancellation modes.
+#
+# @none: Do not attempt to cancel any other Actions if any
+#        actions fail after the Transaction request succeeds.
+#        This is the default.
+#
+# @all:  If any action fails after the Transaction succeeds,
+#        cancel all actions. May be rejected by Actions that
+#        do not support this cancellation mode.
+#
+# Since: 2.5
+##
+{ 'enum': 'ActionCancelMode',
+  'data': [ 'none', 'all' ] }
+
+##
 # @TransactionAction
 #
 # A discriminated record of operations that can be performed with
@@ -1546,14 +1564,35 @@ 
    } }
 
 ##
+# @TransactionProperties
+#
+# Optional arguments to modify the behavior of a Transaction.
+#
+# @err-cancel: How failures that occur in jobs launched asynchronously
+#              by a Transaction that has already completed should be handled.
+#              See @ActionCancelMode for details.
+#
+# Since: 2.5
+##
+{ 'struct': 'TransactionProperties',
+  'data': {
+       '*err-cancel': 'ActionCancelMode'
+  }
+}
+
+##
 # @transaction
 #
 # Executes a number of transactionable QMP commands atomically. If any
 # operation fails, then the entire set of actions will be abandoned and the
 # appropriate error returned.
 #
-#  List of:
-#  @TransactionAction: information needed for the respective operation
+# @actions: List of @TransactionAction;
+#           information needed for the respective operations.
+#
+# @properties: Optional structure of additional options to control the
+#              execution of the transaction. See @TransactionProperties
+#              for additional detail.
 #
 # Returns: nothing on success
 #          Errors depend on the operations of the transaction
@@ -1565,7 +1604,10 @@ 
 # Since 1.1
 ##
 { 'command': 'transaction',
-  'data': { 'actions': [ 'TransactionAction' ] } }
+  'data': { 'actions': [ 'TransactionAction' ],
+            '*properties': 'TransactionProperties'
+          }
+}
 
 ##
 # @human-monitor-command:
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 2b52980..4a2cd15 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1262,7 +1262,7 @@  EQMP
     },
     {
         .name       = "transaction",
-        .args_type  = "actions:q",
+        .args_type  = "actions:q,properties:q?",
         .mhandler.cmd_new = qmp_marshal_transaction,
     },