diff mbox

block: Make op blocker recursive

Message ID 1403208081-18247-2-git-send-email-benoit.canet@irqsave.net
State New
Headers show

Commit Message

Benoît Canet June 19, 2014, 8:01 p.m. UTC
As the code will start to operate on arbitratry nodes we need the op blocker
to recursively block or unblock whole BDS subtrees.

Also add a function to reset all blocker from a BDS.

This patch also take care of changing blocker user so they are not broken.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
 block.c                   | 87 +++++++++++++++++++++++++++++++++++++++++++++++
 block/blkverify.c         | 10 ++++++
 block/commit.c            |  1 +
 block/mirror.c            |  1 +
 block/quorum.c            | 13 +++++++
 include/block/block.h     | 12 +++++++
 include/block/block_int.h |  5 +++
 7 files changed, 129 insertions(+)

Comments

Eric Blake June 19, 2014, 8:13 p.m. UTC | #1
On 06/19/2014 02:01 PM, Benoît Canet wrote:
> As the code will start to operate on arbitratry nodes we need the op blocker

s/arbitratry/arbitrary/

> to recursively block or unblock whole BDS subtrees.
> 
> Also add a function to reset all blocker from a BDS.
> 
> This patch also take care of changing blocker user so they are not broken.
> 
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---

> +
> +/* This remove unconditionally all blockers of type op of the subtree */

This unconditionally removes all blockers of type op of the subtree

Yikes - is that really what we want?  Or do we need to start doing
blocker reference counting?

Consider:

base <- snap1 <- active

Looking at Jeff's proposal of making blockers based on access patterns
rather than operations, we want the mere act of being a backing file to
automatically put a guest_write block on base and snap1 (we must not
modify the backing chain out of underneath active).  But now suppose we
do two operations in parallel - we take a fleecing export of active, and
we start a drive-mirror on active.

base <- snap1 <- active
              |        \-- fleecing
              \-- copy

Both of those actions should be doable in parallel, and both of them
probably put additional blocker restrictions on the chain.  But if we
unconditionally clear those additional restrictions on the first of the
two jobs ending, that would inappropriately stop blocking that action
from the still on-going second action.  The only way I see around that
is via reference-counted blocking.  Definitely not 2.1 material (but
good to be thinking about it now, so we can get it in early in the 2.2
cycle).


>  
> +/* This remove unconditionally all blockers */

Unconditionally remove all blockers


>  
> +/* Used to prevent recursion loop. A case exists in block commit mirror usage */
> +static BlockDriverState *recurse_op_bs = NULL;
> +/* take note of the recursion depth to allow assigning recurse_op_bs once */
> +static uint64_t recurse_op_depth = 0;

The '= 0' is redundant; the C language guarantees that all static
variables are 0-initialized.

> +
> +/* set or unset an op blocker to a BDS whether set is true or false */
> +void bdrv_op_block_action(BlockDriverState *bs, BlockOpType op,
> +                          BlockerAction action, Error *reason)
> +{

Not sure I follow that comment, since 'set' is not one of the parameter
names.


> +
> +/* Recursively set or unset an op block to a BDS tree whether set is true or
> + * false
> + */
> +void bdrv_recurse_op_block(BlockDriverState *bs, BlockOpType op,
> +                           BlockerAction action, Error *reason)

and again

> +{
> +    /* If recursion is detected simply return */
> +    if (recurse_op_bs == bs) {
> +        return;
> +    }
> +
> +    /* if we are starting recursion remeber the bs for later comparison */

s/remeber/remember/

> +    if (!recurse_op_depth) {
> +        recurse_op_bs = bs;
> +    }
> +
> +    /* try to set or unset on bs->file and bs->backing_hd first */
> +    bdrv_op_block_action(bs->file, op, action, reason);
> +    bdrv_op_block_action(bs->backing_hd, op, action, reason);
> +
> +    /* if the BDS is a filter with multiple childs ask the driver to recurse */

s/childs/children/


> +static void blkverify_recurse_op_block(BlockDriverState *bs, BlockOpType op,
> +                                       BlockerAction action, Error *reason)
> +{
> +    BDRVBlkverifyState *s = bs->opaque;
> +    bdrv_op_block_action(bs->file, op , action, reason);
> +    bdrv_op_block_action(s->test_file, op , action, reason);

s/ ,/,/ twice

> +++ b/include/block/block_int.h
> @@ -86,6 +86,11 @@ struct BlockDriver {
>       */
>      bool (*bdrv_recurse_is_first_non_filter)(BlockDriverState *bs,
>                                               BlockDriverState *candidate);
> +    /* In order to be able to recursively block operation on BDS trees filter
> +     * like quorum can implement this callback

s/trees filter/trees, a filter/
Benoît Canet June 19, 2014, 8:20 p.m. UTC | #2
The Thursday 19 Jun 2014 à 14:13:20 (-0600), Eric Blake wrote :
> On 06/19/2014 02:01 PM, Benoît Canet wrote:
> > As the code will start to operate on arbitratry nodes we need the op blocker
> 
> s/arbitratry/arbitrary/
> 
> > to recursively block or unblock whole BDS subtrees.
> > 
> > Also add a function to reset all blocker from a BDS.
> > 
> > This patch also take care of changing blocker user so they are not broken.
> > 
> > Signed-off-by: Benoit Canet <benoit@irqsave.net>
> > ---
> 
> > +
> > +/* This remove unconditionally all blockers of type op of the subtree */
> 
> This unconditionally removes all blockers of type op of the subtree
> 
> Yikes - is that really what we want?  Or do we need to start doing
> blocker reference counting?
> 
> Consider:
> 
> base <- snap1 <- active
> 
> Looking at Jeff's proposal of making blockers based on access patterns
> rather than operations, we want the mere act of being a backing file to
> automatically put a guest_write block on base and snap1 (we must not
> modify the backing chain out of underneath active).  But now suppose we
> do two operations in parallel - we take a fleecing export of active, and
> we start a drive-mirror on active.
> 
> base <- snap1 <- active
>               |        \-- fleecing
>               \-- copy
> 
> Both of those actions should be doable in parallel, and both of them
> probably put additional blocker restrictions on the chain.  But if we
> unconditionally clear those additional restrictions on the first of the
> two jobs ending, that would inappropriately stop blocking that action
> from the still on-going second action.  The only way I see around that
> is via reference-counted blocking.  Definitely not 2.1 material (but
> good to be thinking about it now, so we can get it in early in the 2.2
> cycle).

I added this reset function for the case where a whole BDS subtree is detached
from the graph and will be destroyed.

It does happen in drive mirror and bdrv_unrefing it would lead to a failed
assertion.

So the reset function take care of removing blocker of dead subtrees.

What would be a cleaner solution ?

Best regards

Benoît

> 
> 
> >  
> > +/* This remove unconditionally all blockers */
> 
> Unconditionally remove all blockers
> 
> 
> >  
> > +/* Used to prevent recursion loop. A case exists in block commit mirror usage */
> > +static BlockDriverState *recurse_op_bs = NULL;
> > +/* take note of the recursion depth to allow assigning recurse_op_bs once */
> > +static uint64_t recurse_op_depth = 0;
> 
> The '= 0' is redundant; the C language guarantees that all static
> variables are 0-initialized.
> 
> > +
> > +/* set or unset an op blocker to a BDS whether set is true or false */
> > +void bdrv_op_block_action(BlockDriverState *bs, BlockOpType op,
> > +                          BlockerAction action, Error *reason)
> > +{
> 
> Not sure I follow that comment, since 'set' is not one of the parameter
> names.
> 
> 
> > +
> > +/* Recursively set or unset an op block to a BDS tree whether set is true or
> > + * false
> > + */
> > +void bdrv_recurse_op_block(BlockDriverState *bs, BlockOpType op,
> > +                           BlockerAction action, Error *reason)
> 
> and again
> 
> > +{
> > +    /* If recursion is detected simply return */
> > +    if (recurse_op_bs == bs) {
> > +        return;
> > +    }
> > +
> > +    /* if we are starting recursion remeber the bs for later comparison */
> 
> s/remeber/remember/
> 
> > +    if (!recurse_op_depth) {
> > +        recurse_op_bs = bs;
> > +    }
> > +
> > +    /* try to set or unset on bs->file and bs->backing_hd first */
> > +    bdrv_op_block_action(bs->file, op, action, reason);
> > +    bdrv_op_block_action(bs->backing_hd, op, action, reason);
> > +
> > +    /* if the BDS is a filter with multiple childs ask the driver to recurse */
> 
> s/childs/children/
> 
> 
> > +static void blkverify_recurse_op_block(BlockDriverState *bs, BlockOpType op,
> > +                                       BlockerAction action, Error *reason)
> > +{
> > +    BDRVBlkverifyState *s = bs->opaque;
> > +    bdrv_op_block_action(bs->file, op , action, reason);
> > +    bdrv_op_block_action(s->test_file, op , action, reason);
> 
> s/ ,/,/ twice
> 
> > +++ b/include/block/block_int.h
> > @@ -86,6 +86,11 @@ struct BlockDriver {
> >       */
> >      bool (*bdrv_recurse_is_first_non_filter)(BlockDriverState *bs,
> >                                               BlockDriverState *candidate);
> > +    /* In order to be able to recursively block operation on BDS trees filter
> > +     * like quorum can implement this callback
> 
> s/trees filter/trees, a filter/
> 
> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
Eric Blake June 19, 2014, 8:26 p.m. UTC | #3
On 06/19/2014 02:20 PM, Benoît Canet wrote:

>> This unconditionally removes all blockers of type op of the subtree
>>
>> Yikes - is that really what we want?  Or do we need to start doing
>> blocker reference counting?
>>
>> Consider:
>>
>> base <- snap1 <- active
>>
>> Looking at Jeff's proposal of making blockers based on access patterns
>> rather than operations, we want the mere act of being a backing file to
>> automatically put a guest_write block on base and snap1 (we must not
>> modify the backing chain out of underneath active).  But now suppose we
>> do two operations in parallel - we take a fleecing export of active, and
>> we start a drive-mirror on active.
>>
>> base <- snap1 <- active
>>               |        \-- fleecing
>>               \-- copy
>>
>> Both of those actions should be doable in parallel, and both of them
>> probably put additional blocker restrictions on the chain.  But if we
>> unconditionally clear those additional restrictions on the first of the
>> two jobs ending, that would inappropriately stop blocking that action
>> from the still on-going second action.  The only way I see around that
>> is via reference-counted blocking.  Definitely not 2.1 material (but
>> good to be thinking about it now, so we can get it in early in the 2.2
>> cycle).
> 
> I added this reset function for the case where a whole BDS subtree is detached
> from the graph and will be destroyed.
> 
> It does happen in drive mirror and bdrv_unrefing it would lead to a failed
> assertion.

Okay, you may have a use case there.  Or you may just be highlighting a
bug.  Consider what happens if we have:

base <- snap1 <- active

then open a fleecing NBD view of snap1:

base <- snap1 <- active
              \-- fleecing

then do a blockpull into active:

active
base <- snap1 <- fleecing

that is, as long as the fleecing operation is live, we STILL need to
block base and snap1 from modification; even though active is no longer
dependent on them.  Dropping the backing chain of active decreases the
reference count, but does not delete the BDS for base or snap1 because
those BDS are still in use by the fleecing operation.

> 
> So the reset function take care of removing blocker of dead subtrees.
> 
> What would be a cleaner solution ?

I'm not honestly sure.  Which is why we're thinking about design gotchas
at the moment ;)
Benoît Canet June 19, 2014, 8:36 p.m. UTC | #4
The Thursday 19 Jun 2014 à 14:26:04 (-0600), Eric Blake wrote :

> On 06/19/2014 02:20 PM, Benoît Canet wrote:
> 
> >> This unconditionally removes all blockers of type op of the subtree
> >>
> >> Yikes - is that really what we want?  Or do we need to start doing
> >> blocker reference counting?
> >>
> >> Consider:
> >>
> >> base <- snap1 <- active
> >>
> >> Looking at Jeff's proposal of making blockers based on access patterns
> >> rather than operations, we want the mere act of being a backing file to
> >> automatically put a guest_write block on base and snap1 (we must not
> >> modify the backing chain out of underneath active).  But now suppose we
> >> do two operations in parallel - we take a fleecing export of active, and
> >> we start a drive-mirror on active.
> >>
> >> base <- snap1 <- active
> >>               |        \-- fleecing
> >>               \-- copy
> >>
> >> Both of those actions should be doable in parallel, and both of them
> >> probably put additional blocker restrictions on the chain.  But if we
> >> unconditionally clear those additional restrictions on the first of the
> >> two jobs ending, that would inappropriately stop blocking that action
> >> from the still on-going second action.  The only way I see around that
> >> is via reference-counted blocking.  Definitely not 2.1 material (but
> >> good to be thinking about it now, so we can get it in early in the 2.2
> >> cycle).
> > 
> > I added this reset function for the case where a whole BDS subtree is detached
> > from the graph and will be destroyed.
> > 
> > It does happen in drive mirror and bdrv_unrefing it would lead to a failed
> > assertion.
> 
> Okay, you may have a use case there.  Or you may just be highlighting a
> bug.  Consider what happens if we have:
> 
> base <- snap1 <- active
> 
> then open a fleecing NBD view of snap1:
> 
> base <- snap1 <- active
>               \-- fleecing
> 
> then do a blockpull into active:
> 
> active
> base <- snap1 <- fleecing
> 
> that is, as long as the fleecing operation is live, we STILL need to
> block base and snap1 from modification; even though active is no longer
> dependent on them.  Dropping the backing chain of active decreases the
> reference count, but does not delete the BDS for base or snap1 because
> those BDS are still in use by the fleecing operation.

Blockers are already refcount like as they are implemented as linked list grouped
by operation types.
As long an error message  is still in a list the BDS is blocker.

Best regards

Benoît

> 
> > 
> > So the reset function take care of removing blocker of dead subtrees.
> > 
> > What would be a cleaner solution ?
> 
> I'm not honestly sure.  Which is why we're thinking about design gotchas
> at the moment ;)
> 
> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
Fam Zheng June 20, 2014, 5:01 a.m. UTC | #5
On Thu, 06/19 22:20, Benoît Canet wrote:
> The Thursday 19 Jun 2014 à 14:13:20 (-0600), Eric Blake wrote :
> > On 06/19/2014 02:01 PM, Benoît Canet wrote:
> > > As the code will start to operate on arbitratry nodes we need the op blocker
> > 
> > s/arbitratry/arbitrary/
> > 
> > > to recursively block or unblock whole BDS subtrees.

I don't get the reason, can you elaborate?

> > > 
> > > Also add a function to reset all blocker from a BDS.
> > > 
> > > This patch also take care of changing blocker user so they are not broken.
> > > 
> > > Signed-off-by: Benoit Canet <benoit@irqsave.net>
> > > ---
> > 
> > > +
> > > +/* This remove unconditionally all blockers of type op of the subtree */
> > 
> > This unconditionally removes all blockers of type op of the subtree
> > 
> > Yikes - is that really what we want?  Or do we need to start doing
> > blocker reference counting?
> > 
> > Consider:
> > 
> > base <- snap1 <- active
> > 
> > Looking at Jeff's proposal of making blockers based on access patterns
> > rather than operations, we want the mere act of being a backing file to
> > automatically put a guest_write block on base and snap1 (we must not
> > modify the backing chain out of underneath active).  But now suppose we
> > do two operations in parallel - we take a fleecing export of active, and
> > we start a drive-mirror on active.
> > 
> > base <- snap1 <- active
> >               |        \-- fleecing
> >               \-- copy
> > 
> > Both of those actions should be doable in parallel, and both of them
> > probably put additional blocker restrictions on the chain.  But if we
> > unconditionally clear those additional restrictions on the first of the
> > two jobs ending, that would inappropriately stop blocking that action
> > from the still on-going second action.  The only way I see around that
> > is via reference-counted blocking.  Definitely not 2.1 material (but
> > good to be thinking about it now, so we can get it in early in the 2.2
> > cycle).
> 
> I added this reset function for the case where a whole BDS subtree is detached
> from the graph and will be destroyed.
> 
> It does happen in drive mirror and bdrv_unrefing it would lead to a failed
> assertion.

Which assertion is it? Maybe it is a bug somewhere else. The caller of reset
wouldn't know about other blockers, why is ignoring their blocking reason and
forcing the reset safe here?

A BDS and its subtree will only be destroyed if their refcnts go to 0. In this
case, there should be no other blockers to be released other than the last
user's.  So we don't need the unconditional reset anyway, and I _do_ think
doing this is wrong and counter-design.

> 
> So the reset function take care of removing blocker of dead subtrees.
> 
> What would be a cleaner solution ?

What is the question to solve?

Fam
Eric Blake June 20, 2014, 3:30 p.m. UTC | #6
On 06/19/2014 11:01 PM, Fam Zheng wrote:
> On Thu, 06/19 22:20, Benoît Canet wrote:
>> The Thursday 19 Jun 2014 à 14:13:20 (-0600), Eric Blake wrote :
>>> On 06/19/2014 02:01 PM, Benoît Canet wrote:
>>>> As the code will start to operate on arbitratry nodes we need the op blocker
>>>
>>> s/arbitratry/arbitrary/
>>>
>>>> to recursively block or unblock whole BDS subtrees.
> 
> I don't get the reason, can you elaborate?

Consider what happens if I have:

base <- snap1 <- active

then I start a fleecing NBD server on the state as it was at snap1:

base <- snap1 <- active
              \- fleecing

then I do a blockpull into active:

base <- snap1 <- fleecing
active

at this point, base and snap1 are no longer tied to active, but they
STILL must be protected from operations that would modify their contents
in a way that would break the fleecing operation.  The solution we are
looking at is making BDS blockers recursive to every element of the
chain, not just the top-level device.

Another example: consider:

base <- snap1 <- active

then someone uses Jeff's proposed new change-backing-file QMP command to
rewrite the snap1 metadata to point to base via a relative name instead
of an absolute name.  It shouldn't matter whether active is blocked, but
only whether snap1 is blocked.  But to know if snap1 is blocked, we have
to propagate the blockers of active down recursively to its backing files.

>> What would be a cleaner solution ?
> 
> What is the question to solve?

I think Jeff's idea is on target - rather than blocking by operation, we
should instead be blocking on access patterns (various operations
trigger several access patterns):
https://lists.gnu.org/archive/html/qemu-devel/2014-06/msg04752.html

Jeff's initial list included:

> So if I think of operations that are done on block devices from a
> block job, and chuck them into categories, I think we have:
> 
> 1) Read of guest-visible data
> 2) Write of guest-visible data
> 3) Read of host-visible data (e.g. image file metadata)
> 4) Write of host-visible data (e.g. image file metadata, such as
> the backing-file)
> 5) Block chain manipulations (e.g. movement of a BDS, change to r/w
> instead of r/o, etc..)
> 6) I/O attribute changes (e.g. throttling, etc..)
Fam Zheng June 21, 2014, 8:53 a.m. UTC | #7
On Fri, 06/20 09:30, Eric Blake wrote:
> On 06/19/2014 11:01 PM, Fam Zheng wrote:
> > On Thu, 06/19 22:20, Benoît Canet wrote:
> >> The Thursday 19 Jun 2014 à 14:13:20 (-0600), Eric Blake wrote :
> >>> On 06/19/2014 02:01 PM, Benoît Canet wrote:
> >>>> As the code will start to operate on arbitratry nodes we need the op blocker
> >>>
> >>> s/arbitratry/arbitrary/
> >>>
> >>>> to recursively block or unblock whole BDS subtrees.
> > 
> > I don't get the reason, can you elaborate?
> 
> Consider what happens if I have:
> 
> base <- snap1 <- active
> 
> then I start a fleecing NBD server on the state as it was at snap1:
> 
> base <- snap1 <- active
>               \- fleecing
> 
> then I do a blockpull into active:
> 
> base <- snap1 <- fleecing
> active
> 
> at this point, base and snap1 are no longer tied to active, but they
> STILL must be protected from operations that would modify their contents
> in a way that would break the fleecing operation.  The solution we are
> looking at is making BDS blockers recursive to every element of the
> chain, not just the top-level device.

This would already have been protected by backing blocker of fleecing target.

> 
> Another example: consider:
> 
> base <- snap1 <- active
> 
> then someone uses Jeff's proposed new change-backing-file QMP command to
> rewrite the snap1 metadata to point to base via a relative name instead
> of an absolute name.  It shouldn't matter whether active is blocked, but
> only whether snap1 is blocked.  But to know if snap1 is blocked, we have
> to propagate the blockers of active down recursively to its backing files.

Why do we need to block changging of metadata? I think this operation is safe
in most cases.

Correct me if I'm missing anything, but even if snap1 _is_ blocked, it would be
because snap1 is serving as backing of active. In this case, the actual blocker
should be active->backing_blocker.

> 
> >> What would be a cleaner solution ?
> > 
> > What is the question to solve?
> 
> I think Jeff's idea is on target - rather than blocking by operation, we
> should instead be blocking on access patterns (various operations
> trigger several access patterns):
> https://lists.gnu.org/archive/html/qemu-devel/2014-06/msg04752.html
> 
> Jeff's initial list included:
> 
> > So if I think of operations that are done on block devices from a
> > block job, and chuck them into categories, I think we have:
> > 
> > 1) Read of guest-visible data
> > 2) Write of guest-visible data
> > 3) Read of host-visible data (e.g. image file metadata)
> > 4) Write of host-visible data (e.g. image file metadata, such as
> > the backing-file)
> > 5) Block chain manipulations (e.g. movement of a BDS, change to r/w
> > instead of r/o, etc..)
> > 6) I/O attribute changes (e.g. throttling, etc..)

Most operations looks safe to me, given the way how IOThreads and coroutine
work now. It's only the chain manpulations in long running block jobs that are
exclusive, and by nature it should be checked per chain.  Can we set some op
blockers on the bottom BDS and check it each time, to prevent user from
starting a second chain manipulator?

Fam
Benoît Canet June 21, 2014, 10:45 a.m. UTC | #8
The Saturday 21 Jun 2014 à 16:53:58 (+0800), Fam Zheng wrote :
> On Fri, 06/20 09:30, Eric Blake wrote:
> > On 06/19/2014 11:01 PM, Fam Zheng wrote:
> > > On Thu, 06/19 22:20, Benoît Canet wrote:
> > >> The Thursday 19 Jun 2014 à 14:13:20 (-0600), Eric Blake wrote :
> > >>> On 06/19/2014 02:01 PM, Benoît Canet wrote:
> > >>>> As the code will start to operate on arbitratry nodes we need the op blocker
> > >>>
> > >>> s/arbitratry/arbitrary/
> > >>>
> > >>>> to recursively block or unblock whole BDS subtrees.
> > > 
> > > I don't get the reason, can you elaborate?
> > 
> > Consider what happens if I have:
> > 
> > base <- snap1 <- active
> > 
> > then I start a fleecing NBD server on the state as it was at snap1:
> > 
> > base <- snap1 <- active
> >               \- fleecing
> > 
> > then I do a blockpull into active:
> > 
> > base <- snap1 <- fleecing
> > active
> > 
> > at this point, base and snap1 are no longer tied to active, but they
> > STILL must be protected from operations that would modify their contents
> > in a way that would break the fleecing operation.  The solution we are
> > looking at is making BDS blockers recursive to every element of the
> > chain, not just the top-level device.
> 
> This would already have been protected by backing blocker of fleecing target.
> 
> > 
> > Another example: consider:
> > 
> > base <- snap1 <- active
> > 
> > then someone uses Jeff's proposed new change-backing-file QMP command to
> > rewrite the snap1 metadata to point to base via a relative name instead
> > of an absolute name.  It shouldn't matter whether active is blocked, but
> > only whether snap1 is blocked.  But to know if snap1 is blocked, we have
> > to propagate the blockers of active down recursively to its backing files.
> 
> Why do we need to block changging of metadata? I think this operation is safe
> in most cases.
> 
> Correct me if I'm missing anything, but even if snap1 _is_ blocked, it would be
> because snap1 is serving as backing of active. In this case, the actual blocker
> should be active->backing_blocker.
> 
> > 
> > >> What would be a cleaner solution ?
> > > 
> > > What is the question to solve?
> > 
> > I think Jeff's idea is on target - rather than blocking by operation, we
> > should instead be blocking on access patterns (various operations
> > trigger several access patterns):
> > https://lists.gnu.org/archive/html/qemu-devel/2014-06/msg04752.html
> > 
> > Jeff's initial list included:
> > 
> > > So if I think of operations that are done on block devices from a
> > > block job, and chuck them into categories, I think we have:
> > > 
> > > 1) Read of guest-visible data
> > > 2) Write of guest-visible data
> > > 3) Read of host-visible data (e.g. image file metadata)
> > > 4) Write of host-visible data (e.g. image file metadata, such as
> > > the backing-file)
> > > 5) Block chain manipulations (e.g. movement of a BDS, change to r/w
> > > instead of r/o, etc..)
> > > 6) I/O attribute changes (e.g. throttling, etc..)
> 
> Most operations looks safe to me, given the way how IOThreads and coroutine
> work now. It's only the chain manpulations in long running block jobs that are
> exclusive, and by nature it should be checked per chain.  Can we set some op
> blockers on the bottom BDS and check it each time, to prevent user from
> starting a second chain manipulator?

I don't know if bottom BDS locking is any good because some driver like quorum
have multiple childs.
Locking everytime the root (top) of the tree seems a feasible solution indeed.

Best regards

Benoît


> 
> Fam
>
Fam Zheng June 21, 2014, 3:15 p.m. UTC | #9
On Sat, 06/21 12:45, Benoît Canet wrote:
> The Saturday 21 Jun 2014 à 16:53:58 (+0800), Fam Zheng wrote :
> > On Fri, 06/20 09:30, Eric Blake wrote:
> > > On 06/19/2014 11:01 PM, Fam Zheng wrote:
> > > > On Thu, 06/19 22:20, Benoît Canet wrote:
> > > >> The Thursday 19 Jun 2014 à 14:13:20 (-0600), Eric Blake wrote :
> > > >>> On 06/19/2014 02:01 PM, Benoît Canet wrote:
> > > >>>> As the code will start to operate on arbitratry nodes we need the op blocker
> > > >>>
> > > >>> s/arbitratry/arbitrary/
> > > >>>
> > > >>>> to recursively block or unblock whole BDS subtrees.
> > > > 
> > > > I don't get the reason, can you elaborate?
> > > 
> > > Consider what happens if I have:
> > > 
> > > base <- snap1 <- active
> > > 
> > > then I start a fleecing NBD server on the state as it was at snap1:
> > > 
> > > base <- snap1 <- active
> > >               \- fleecing
> > > 
> > > then I do a blockpull into active:
> > > 
> > > base <- snap1 <- fleecing
> > > active
> > > 
> > > at this point, base and snap1 are no longer tied to active, but they
> > > STILL must be protected from operations that would modify their contents
> > > in a way that would break the fleecing operation.  The solution we are
> > > looking at is making BDS blockers recursive to every element of the
> > > chain, not just the top-level device.
> > 
> > This would already have been protected by backing blocker of fleecing target.
> > 
> > > 
> > > Another example: consider:
> > > 
> > > base <- snap1 <- active
> > > 
> > > then someone uses Jeff's proposed new change-backing-file QMP command to
> > > rewrite the snap1 metadata to point to base via a relative name instead
> > > of an absolute name.  It shouldn't matter whether active is blocked, but
> > > only whether snap1 is blocked.  But to know if snap1 is blocked, we have
> > > to propagate the blockers of active down recursively to its backing files.
> > 
> > Why do we need to block changging of metadata? I think this operation is safe
> > in most cases.
> > 
> > Correct me if I'm missing anything, but even if snap1 _is_ blocked, it would be
> > because snap1 is serving as backing of active. In this case, the actual blocker
> > should be active->backing_blocker.
> > 
> > > 
> > > >> What would be a cleaner solution ?
> > > > 
> > > > What is the question to solve?
> > > 
> > > I think Jeff's idea is on target - rather than blocking by operation, we
> > > should instead be blocking on access patterns (various operations
> > > trigger several access patterns):
> > > https://lists.gnu.org/archive/html/qemu-devel/2014-06/msg04752.html
> > > 
> > > Jeff's initial list included:
> > > 
> > > > So if I think of operations that are done on block devices from a
> > > > block job, and chuck them into categories, I think we have:
> > > > 
> > > > 1) Read of guest-visible data
> > > > 2) Write of guest-visible data
> > > > 3) Read of host-visible data (e.g. image file metadata)
> > > > 4) Write of host-visible data (e.g. image file metadata, such as
> > > > the backing-file)
> > > > 5) Block chain manipulations (e.g. movement of a BDS, change to r/w
> > > > instead of r/o, etc..)
> > > > 6) I/O attribute changes (e.g. throttling, etc..)
> > 
> > Most operations looks safe to me, given the way how IOThreads and coroutine
> > work now. It's only the chain manpulations in long running block jobs that are
> > exclusive, and by nature it should be checked per chain.  Can we set some op
> > blockers on the bottom BDS and check it each time, to prevent user from
> > starting a second chain manipulator?
> 
> I don't know if bottom BDS locking is any good because some driver like quorum
> have multiple childs.
> Locking everytime the root (top) of the tree seems a feasible solution indeed.

Quorom doesn't change the convensions of backing chain, so each child belongs
to its own backing chain, and that chain has a deterministic top and bottom.

Blocking flag on bottom saves us from adding reverse looking up (->overlay
pointer), because we already have the ->backing_hd pointer in BDS.

Fam
Benoît Canet June 21, 2014, 3:39 p.m. UTC | #10
The Saturday 21 Jun 2014 à 23:15:19 (+0800), Fam Zheng wrote :
> On Sat, 06/21 12:45, Benoît Canet wrote:
> > The Saturday 21 Jun 2014 à 16:53:58 (+0800), Fam Zheng wrote :
> > > On Fri, 06/20 09:30, Eric Blake wrote:
> > > > On 06/19/2014 11:01 PM, Fam Zheng wrote:
> > > > > On Thu, 06/19 22:20, Benoît Canet wrote:
> > > > >> The Thursday 19 Jun 2014 à 14:13:20 (-0600), Eric Blake wrote :
> > > > >>> On 06/19/2014 02:01 PM, Benoît Canet wrote:
> > > > >>>> As the code will start to operate on arbitratry nodes we need the op blocker
> > > > >>>
> > > > >>> s/arbitratry/arbitrary/
> > > > >>>
> > > > >>>> to recursively block or unblock whole BDS subtrees.
> > > > > 
> > > > > I don't get the reason, can you elaborate?
> > > > 
> > > > Consider what happens if I have:
> > > > 
> > > > base <- snap1 <- active
> > > > 
> > > > then I start a fleecing NBD server on the state as it was at snap1:
> > > > 
> > > > base <- snap1 <- active
> > > >               \- fleecing
> > > > 
> > > > then I do a blockpull into active:
> > > > 
> > > > base <- snap1 <- fleecing
> > > > active
> > > > 
> > > > at this point, base and snap1 are no longer tied to active, but they
> > > > STILL must be protected from operations that would modify their contents
> > > > in a way that would break the fleecing operation.  The solution we are
> > > > looking at is making BDS blockers recursive to every element of the
> > > > chain, not just the top-level device.
> > > 
> > > This would already have been protected by backing blocker of fleecing target.
> > > 
> > > > 
> > > > Another example: consider:
> > > > 
> > > > base <- snap1 <- active
> > > > 
> > > > then someone uses Jeff's proposed new change-backing-file QMP command to
> > > > rewrite the snap1 metadata to point to base via a relative name instead
> > > > of an absolute name.  It shouldn't matter whether active is blocked, but
> > > > only whether snap1 is blocked.  But to know if snap1 is blocked, we have
> > > > to propagate the blockers of active down recursively to its backing files.
> > > 
> > > Why do we need to block changging of metadata? I think this operation is safe
> > > in most cases.
> > > 
> > > Correct me if I'm missing anything, but even if snap1 _is_ blocked, it would be
> > > because snap1 is serving as backing of active. In this case, the actual blocker
> > > should be active->backing_blocker.
> > > 
> > > > 
> > > > >> What would be a cleaner solution ?
> > > > > 
> > > > > What is the question to solve?
> > > > 
> > > > I think Jeff's idea is on target - rather than blocking by operation, we
> > > > should instead be blocking on access patterns (various operations
> > > > trigger several access patterns):
> > > > https://lists.gnu.org/archive/html/qemu-devel/2014-06/msg04752.html
> > > > 
> > > > Jeff's initial list included:
> > > > 
> > > > > So if I think of operations that are done on block devices from a
> > > > > block job, and chuck them into categories, I think we have:
> > > > > 
> > > > > 1) Read of guest-visible data
> > > > > 2) Write of guest-visible data
> > > > > 3) Read of host-visible data (e.g. image file metadata)
> > > > > 4) Write of host-visible data (e.g. image file metadata, such as
> > > > > the backing-file)
> > > > > 5) Block chain manipulations (e.g. movement of a BDS, change to r/w
> > > > > instead of r/o, etc..)
> > > > > 6) I/O attribute changes (e.g. throttling, etc..)
> > > 
> > > Most operations looks safe to me, given the way how IOThreads and coroutine
> > > work now. It's only the chain manpulations in long running block jobs that are
> > > exclusive, and by nature it should be checked per chain.  Can we set some op
> > > blockers on the bottom BDS and check it each time, to prevent user from
> > > starting a second chain manipulator?
> > 
> > I don't know if bottom BDS locking is any good because some driver like quorum
> > have multiple childs.
> > Locking everytime the root (top) of the tree seems a feasible solution indeed.
> 
> Quorom doesn't change the convensions of backing chain, so each child belongs
> to its own backing chain, and that chain has a deterministic top and bottom.
> 
> Blocking flag on bottom saves us from adding reverse looking up (->overlay
> pointer), because we already have the ->backing_hd pointer in BDS.

I like the consequence that when a loop is formed like in commit's drive-mirror
run code and must be unlocked the bottom locked BDS will act as a guard to prevent
unlock loop cycling.

We still have the issue of unlocking the bottom BDS when a subtree is detached
from the graphs by a swap. (It does happen in my drive-mirror arbitrary node
replacement series).

From my understanding the unlocking of the root BDS is done by drive_mirror_complete
while the mirror code tries to unref the orphaned subtree _before_ drive_mirror_complete
is called.

So the bottom BDS would be unrefed before being unlocked.

Best regards

Benoît

> 
> Fam
>
Benoît Canet June 21, 2014, 3:40 p.m. UTC | #11
The Saturday 21 Jun 2014 à 17:39:11 (+0200), Benoît Canet wrote :
> The Saturday 21 Jun 2014 à 23:15:19 (+0800), Fam Zheng wrote :
> > On Sat, 06/21 12:45, Benoît Canet wrote:
> > > The Saturday 21 Jun 2014 à 16:53:58 (+0800), Fam Zheng wrote :
> > > > On Fri, 06/20 09:30, Eric Blake wrote:
> > > > > On 06/19/2014 11:01 PM, Fam Zheng wrote:
> > > > > > On Thu, 06/19 22:20, Benoît Canet wrote:
> > > > > >> The Thursday 19 Jun 2014 à 14:13:20 (-0600), Eric Blake wrote :
> > > > > >>> On 06/19/2014 02:01 PM, Benoît Canet wrote:
> > > > > >>>> As the code will start to operate on arbitratry nodes we need the op blocker
> > > > > >>>
> > > > > >>> s/arbitratry/arbitrary/
> > > > > >>>
> > > > > >>>> to recursively block or unblock whole BDS subtrees.
> > > > > > 
> > > > > > I don't get the reason, can you elaborate?
> > > > > 
> > > > > Consider what happens if I have:
> > > > > 
> > > > > base <- snap1 <- active
> > > > > 
> > > > > then I start a fleecing NBD server on the state as it was at snap1:
> > > > > 
> > > > > base <- snap1 <- active
> > > > >               \- fleecing
> > > > > 
> > > > > then I do a blockpull into active:
> > > > > 
> > > > > base <- snap1 <- fleecing
> > > > > active
> > > > > 
> > > > > at this point, base and snap1 are no longer tied to active, but they
> > > > > STILL must be protected from operations that would modify their contents
> > > > > in a way that would break the fleecing operation.  The solution we are
> > > > > looking at is making BDS blockers recursive to every element of the
> > > > > chain, not just the top-level device.
> > > > 
> > > > This would already have been protected by backing blocker of fleecing target.
> > > > 
> > > > > 
> > > > > Another example: consider:
> > > > > 
> > > > > base <- snap1 <- active
> > > > > 
> > > > > then someone uses Jeff's proposed new change-backing-file QMP command to
> > > > > rewrite the snap1 metadata to point to base via a relative name instead
> > > > > of an absolute name.  It shouldn't matter whether active is blocked, but
> > > > > only whether snap1 is blocked.  But to know if snap1 is blocked, we have
> > > > > to propagate the blockers of active down recursively to its backing files.
> > > > 
> > > > Why do we need to block changging of metadata? I think this operation is safe
> > > > in most cases.
> > > > 
> > > > Correct me if I'm missing anything, but even if snap1 _is_ blocked, it would be
> > > > because snap1 is serving as backing of active. In this case, the actual blocker
> > > > should be active->backing_blocker.
> > > > 
> > > > > 
> > > > > >> What would be a cleaner solution ?
> > > > > > 
> > > > > > What is the question to solve?
> > > > > 
> > > > > I think Jeff's idea is on target - rather than blocking by operation, we
> > > > > should instead be blocking on access patterns (various operations
> > > > > trigger several access patterns):
> > > > > https://lists.gnu.org/archive/html/qemu-devel/2014-06/msg04752.html
> > > > > 
> > > > > Jeff's initial list included:
> > > > > 
> > > > > > So if I think of operations that are done on block devices from a
> > > > > > block job, and chuck them into categories, I think we have:
> > > > > > 
> > > > > > 1) Read of guest-visible data
> > > > > > 2) Write of guest-visible data
> > > > > > 3) Read of host-visible data (e.g. image file metadata)
> > > > > > 4) Write of host-visible data (e.g. image file metadata, such as
> > > > > > the backing-file)
> > > > > > 5) Block chain manipulations (e.g. movement of a BDS, change to r/w
> > > > > > instead of r/o, etc..)
> > > > > > 6) I/O attribute changes (e.g. throttling, etc..)
> > > > 
> > > > Most operations looks safe to me, given the way how IOThreads and coroutine
> > > > work now. It's only the chain manpulations in long running block jobs that are
> > > > exclusive, and by nature it should be checked per chain.  Can we set some op
> > > > blockers on the bottom BDS and check it each time, to prevent user from
> > > > starting a second chain manipulator?
> > > 
> > > I don't know if bottom BDS locking is any good because some driver like quorum
> > > have multiple childs.
> > > Locking everytime the root (top) of the tree seems a feasible solution indeed.
> > 
> > Quorom doesn't change the convensions of backing chain, so each child belongs
> > to its own backing chain, and that chain has a deterministic top and bottom.
> > 
> > Blocking flag on bottom saves us from adding reverse looking up (->overlay
> > pointer), because we already have the ->backing_hd pointer in BDS.
> 
> I like the consequence that when a loop is formed like in commit's drive-mirror
> run code and must be unlocked the bottom locked BDS will act as a guard to prevent
> unlock loop cycling.
> 
> We still have the issue of unlocking the bottom BDS when a subtree is detached
> from the graphs by a swap. (It does happen in my drive-mirror arbitrary node
> replacement series).
> 
> From my understanding the unlocking of the root BDS is done by drive_mirror_complete
> while the mirror code tries to unref the orphaned subtree _before_ drive_mirror_complete
> is called.

One fixe to my sentence:
s/drive_mirror_complete/block_job_complete/


> 
> So the bottom BDS would be unrefed before being unlocked.
> 
> Best regards
> 
> Benoît
> 
> > 
> > Fam
> >
Fam Zheng June 23, 2014, 4:32 a.m. UTC | #12
On Sat, 06/21 17:40, Benoît Canet wrote:
> The Saturday 21 Jun 2014 à 17:39:11 (+0200), Benoît Canet wrote :
> > We still have the issue of unlocking the bottom BDS when a subtree is detached
> > from the graphs by a swap. (It does happen in my drive-mirror arbitrary node
> > replacement series).
> > 
> > From my understanding the unlocking of the root BDS is done by drive_mirror_complete
> > while the mirror code tries to unref the orphaned subtree _before_ drive_mirror_complete
> > is called.
> 
> One fixe to my sentence:
> s/drive_mirror_complete/block_job_complete/
> 
> 
> > 
> > So the bottom BDS would be unrefed before being unlocked.

I don't see a problem with that, we can do the unlock before unref the node if
we want.

Fam
Benoît Canet June 23, 2014, 5:17 a.m. UTC | #13
The Monday 23 Jun 2014 à 12:32:30 (+0800), Fam Zheng wrote :
> On Sat, 06/21 17:40, Benoît Canet wrote:
> > The Saturday 21 Jun 2014 à 17:39:11 (+0200), Benoît Canet wrote :
> > > We still have the issue of unlocking the bottom BDS when a subtree is detached
> > > from the graphs by a swap. (It does happen in my drive-mirror arbitrary node
> > > replacement series).
> > > 
> > > From my understanding the unlocking of the root BDS is done by drive_mirror_complete
> > > while the mirror code tries to unref the orphaned subtree _before_ drive_mirror_complete
> > > is called.
> > 
> > One fixe to my sentence:
> > s/drive_mirror_complete/block_job_complete/
> > 
> > 
> > > 
> > > So the bottom BDS would be unrefed before being unlocked.
> 
> I don't see a problem with that, we can do the unlock before unref the node if
> we want.

My concern is that mirror.c does the unref and don't own the Blocker by itself.
The blocker is owned by the blockjob so it's difficult for mirror.c to do the unblock.

> 
> Fam
Fam Zheng June 23, 2014, 7:07 a.m. UTC | #14
On Mon, 06/23 07:17, Benoît Canet wrote:
> The Monday 23 Jun 2014 à 12:32:30 (+0800), Fam Zheng wrote :
> > On Sat, 06/21 17:40, Benoît Canet wrote:
> > > The Saturday 21 Jun 2014 à 17:39:11 (+0200), Benoît Canet wrote :
> > > > We still have the issue of unlocking the bottom BDS when a subtree is detached
> > > > from the graphs by a swap. (It does happen in my drive-mirror arbitrary node
> > > > replacement series).
> > > > 
> > > > From my understanding the unlocking of the root BDS is done by drive_mirror_complete
> > > > while the mirror code tries to unref the orphaned subtree _before_ drive_mirror_complete
> > > > is called.
> > > 
> > > One fixe to my sentence:
> > > s/drive_mirror_complete/block_job_complete/
> > > 
> > > 
> > > > 
> > > > So the bottom BDS would be unrefed before being unlocked.
> > 
> > I don't see a problem with that, we can do the unlock before unref the node if
> > we want.
> 
> My concern is that mirror.c does the unref and don't own the Blocker by itself.
> The blocker is owned by the blockjob so it's difficult for mirror.c to do the unblock.

For example, drive-backup doesn't change the chain, so this (chain
manipulation) blocker should be hold by mirror.c, not by blockjob.c.

Fam
diff mbox

Patch

diff --git a/block.c b/block.c
index 1fdfa66..fe78f42 100644
--- a/block.c
+++ b/block.c
@@ -5414,6 +5414,8 @@  void bdrv_op_block(BlockDriverState *bs, BlockOpType op, Error *reason)
     blocker = g_malloc0(sizeof(BdrvOpBlocker));
     blocker->reason = reason;
     QLIST_INSERT_HEAD(&bs->op_blockers[op], blocker, list);
+
+    bdrv_recurse_op_block(bs, op, ACTION_BLOCK, reason);
 }
 
 void bdrv_op_unblock(BlockDriverState *bs, BlockOpType op, Error *reason)
@@ -5426,6 +5428,20 @@  void bdrv_op_unblock(BlockDriverState *bs, BlockOpType op, Error *reason)
             g_free(blocker);
         }
     }
+
+    bdrv_recurse_op_block(bs, op, ACTION_UNBLOCK, reason);
+}
+
+/* This remove unconditionally all blockers of type op of the subtree */
+void bdrv_op_reset(BlockDriverState *bs, BlockOpType op)
+{
+    BdrvOpBlocker *blocker, *next;
+    QLIST_FOREACH_SAFE(blocker, &bs->op_blockers[op], list, next) {
+        QLIST_REMOVE(blocker, list);
+        g_free(blocker);
+    }
+
+    bdrv_recurse_op_block(bs, op, ACTION_RESET, NULL);
 }
 
 void bdrv_op_block_all(BlockDriverState *bs, Error *reason)
@@ -5444,6 +5460,15 @@  void bdrv_op_unblock_all(BlockDriverState *bs, Error *reason)
     }
 }
 
+/* This remove unconditionally all blockers */
+void bdrv_op_reset_all(BlockDriverState *bs)
+{
+    int i;
+    for (i = 0; i < BLOCK_OP_TYPE_MAX; i++) {
+        bdrv_op_reset(bs, i);
+    }
+}
+
 bool bdrv_op_blocker_is_empty(BlockDriverState *bs)
 {
     int i;
@@ -5456,6 +5481,68 @@  bool bdrv_op_blocker_is_empty(BlockDriverState *bs)
     return true;
 }
 
+/* Used to prevent recursion loop. A case exists in block commit mirror usage */
+static BlockDriverState *recurse_op_bs = NULL;
+/* take note of the recursion depth to allow assigning recurse_op_bs once */
+static uint64_t recurse_op_depth = 0;
+
+/* set or unset an op blocker to a BDS whether set is true or false */
+void bdrv_op_block_action(BlockDriverState *bs, BlockOpType op,
+                          BlockerAction action, Error *reason)
+{
+    if (!bs) {
+        return;
+    }
+
+    recurse_op_depth++;
+    switch (action) {
+    case ACTION_BLOCK:
+        bdrv_op_block(bs, op, reason);
+        break;
+    case ACTION_UNBLOCK:
+        bdrv_op_unblock(bs, op, reason);
+        break;
+    case ACTION_RESET:
+        bdrv_op_reset(bs, op);
+        break;
+    default:
+        assert(false);
+    }
+    recurse_op_depth--;
+}
+
+/* Recursively set or unset an op block to a BDS tree whether set is true or
+ * false
+ */
+void bdrv_recurse_op_block(BlockDriverState *bs, BlockOpType op,
+                           BlockerAction action, Error *reason)
+{
+    /* If recursion is detected simply return */
+    if (recurse_op_bs == bs) {
+        return;
+    }
+
+    /* if we are starting recursion remeber the bs for later comparison */
+    if (!recurse_op_depth) {
+        recurse_op_bs = bs;
+    }
+
+    /* try to set or unset on bs->file and bs->backing_hd first */
+    bdrv_op_block_action(bs->file, op, action, reason);
+    bdrv_op_block_action(bs->backing_hd, op, action, reason);
+
+    /* if the BDS is a filter with multiple childs ask the driver to recurse */
+    if (bs->drv && bs->drv->bdrv_recurse_op_block) {
+         recurse_op_depth++;
+         bs->drv->bdrv_recurse_op_block(bs, op, action, reason);
+         recurse_op_depth--;
+    }
+
+    if (!recurse_op_depth) {
+        recurse_op_bs = NULL;
+    }
+}
+
 void bdrv_iostatus_enable(BlockDriverState *bs)
 {
     bs->iostatus_enabled = true;
diff --git a/block/blkverify.c b/block/blkverify.c
index 621b785..dd0859a 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -304,6 +304,14 @@  static bool blkverify_recurse_is_first_non_filter(BlockDriverState *bs,
     return bdrv_recurse_is_first_non_filter(s->test_file, candidate);
 }
 
+static void blkverify_recurse_op_block(BlockDriverState *bs, BlockOpType op,
+                                       BlockerAction action, Error *reason)
+{
+    BDRVBlkverifyState *s = bs->opaque;
+    bdrv_op_block_action(bs->file, op , action, reason);
+    bdrv_op_block_action(s->test_file, op , action, reason);
+}
+
 /* Propagate AioContext changes to ->test_file */
 static void blkverify_detach_aio_context(BlockDriverState *bs)
 {
@@ -339,6 +347,8 @@  static BlockDriver bdrv_blkverify = {
 
     .is_filter                        = true,
     .bdrv_recurse_is_first_non_filter = blkverify_recurse_is_first_non_filter,
+
+    .bdrv_recurse_op_block            = blkverify_recurse_op_block,
 };
 
 static void bdrv_blkverify_init(void)
diff --git a/block/commit.c b/block/commit.c
index 5c09f44..d1ca7bd 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -141,6 +141,7 @@  wait:
 
     if (!block_job_is_cancelled(&s->common) && sector_num == end) {
         /* success */
+        bdrv_op_reset_all(active);
         ret = bdrv_drop_intermediate(active, top, base);
     }
 
diff --git a/block/mirror.c b/block/mirror.c
index 94c8661..a99417d 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -502,6 +502,7 @@  immediate_exit:
             bdrv_unref(p);
         }
     }
+    bdrv_op_reset_all(s->target);
     bdrv_unref(s->target);
     block_job_completed(&s->common, ret);
 }
diff --git a/block/quorum.c b/block/quorum.c
index 426077a..7ec135a 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -683,6 +683,17 @@  static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
     return false;
 }
 
+static void quorum_recurse_op_block(BlockDriverState *bs, BlockOpType op,
+                                    BlockerAction action, Error *reason)
+{
+    BDRVQuorumState *s = bs->opaque;
+    int i;
+
+    for (i = 0; i < s->num_children; i++) {
+        bdrv_op_block_action(s->bs[i], op , action, reason);
+    }
+}
+
 static int quorum_valid_threshold(int threshold, int num_children, Error **errp)
 {
 
@@ -891,6 +902,8 @@  static BlockDriver bdrv_quorum = {
 
     .is_filter                          = true,
     .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
+
+    .bdrv_recurse_op_block              = quorum_recurse_op_block,
 };
 
 static void bdrv_quorum_init(void)
diff --git a/include/block/block.h b/include/block/block.h
index f15b99b..13ebc94 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -182,6 +182,12 @@  typedef enum BlockOpType {
     BLOCK_OP_TYPE_MAX,
 } BlockOpType;
 
+typedef enum BlockerAction {
+    ACTION_BLOCK,
+    ACTION_UNBLOCK,
+    ACTION_RESET,
+} BlockerAction;
+
 void bdrv_iostatus_enable(BlockDriverState *bs);
 void bdrv_iostatus_reset(BlockDriverState *bs);
 void bdrv_iostatus_disable(BlockDriverState *bs);
@@ -476,9 +482,15 @@  void bdrv_unref(BlockDriverState *bs);
 bool bdrv_op_is_blocked(BlockDriverState *bs, BlockOpType op, Error **errp);
 void bdrv_op_block(BlockDriverState *bs, BlockOpType op, Error *reason);
 void bdrv_op_unblock(BlockDriverState *bs, BlockOpType op, Error *reason);
+void bdrv_op_reset(BlockDriverState *bs, BlockOpType op);
 void bdrv_op_block_all(BlockDriverState *bs, Error *reason);
 void bdrv_op_unblock_all(BlockDriverState *bs, Error *reason);
+void bdrv_op_reset_all(BlockDriverState *bs);
 bool bdrv_op_blocker_is_empty(BlockDriverState *bs);
+void bdrv_op_block_action(BlockDriverState *bs, BlockOpType op,
+                          BlockerAction action, Error *reason);
+void bdrv_recurse_op_block(BlockDriverState *bs, BlockOpType op,
+                           BlockerAction action, Error *reason);
 
 enum BlockAcctType {
     BDRV_ACCT_READ,
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 7aa2213..de4a75e 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -86,6 +86,11 @@  struct BlockDriver {
      */
     bool (*bdrv_recurse_is_first_non_filter)(BlockDriverState *bs,
                                              BlockDriverState *candidate);
+    /* In order to be able to recursively block operation on BDS trees filter
+     * like quorum can implement this callback
+     */
+    void (*bdrv_recurse_op_block)(BlockDriverState *bs, BlockOpType op,
+                                  BlockerAction action, Error *reason);
 
     int (*bdrv_probe)(const uint8_t *buf, int buf_size, const char *filename);
     int (*bdrv_probe_device)(const char *filename);