mbox series

[0/4] mirror: Do not dereference invalid pointers

Message ID 20190912135632.13925-1-mreitz@redhat.com
Headers show
Series mirror: Do not dereference invalid pointers | expand

Message

Max Reitz Sept. 12, 2019, 1:56 p.m. UTC
Hi,

The fix (patch 1) is pretty straightforward; patch 2 (which I need for
the test) may not be.

The biggest problem with patch 2 is that you can use it to uncover where
our permission handling is broken.  For example, devising the test case
(patch 4) was very difficult because I kept running into the
&error_abort that mirror_exit_common() passes when dropping the
mirror_top_bs.

The problem is that mirror_top_bs does not take the same permissions
that its parent takes.  Ergo using &error_abort when dropping it is
wrong: The parent may require more permissions that mirror_top_bs did,
and so dropping mirror_top_bs may fail.

Now what’s really bad is that this cannot be fixed with our current
permission system.  mirror_top_bs was introduced precisely so it does
not take CONSISTENT_READ, but can still allow parents to take it (for
active commits).  But what if there is actually something besides the
mirror job that unshares CONSISTENT_READ?


Imagine this:

      mirror target BB       mirror source BB
                  |             |
                  v             v
mirror_top_bs -> top -> mid -> base
                  ^
                  |
             other_parent

The source BB unshares CONSISTENT_READ on the base.  mirror_top_bs
ensures that its parents can read from top even though top itself cannot
allow CONSISTENT_READ to be taken.  So far so good.

But what if other_parent also unshares CONSISTENT_READ?  Then,
mirror_top_bs has no business allowing its parents to take it.

No idea how to fix that.  (I suppose mirror_top_bs would need some way
to verify that there is no other party that has unshared CONSISTENT_READ
but its associated source BB.  In the future, we want the source BB to
go away and instead have the source be an immediate BdrvChild of
mirror_top_bs.  Maybe we can then build something into the block layer
so that a node can only restore CONSISTENT_READ when it was that node
that broke it?)


Anyway.  You can see something arising from this problem simply by
unsharing CONSISTENT_READ on the target node.  (Just drop the src-perm
node from the test I add in patch 4.)  Replacing the source with the
target will then work fine (because mirror_top_bs doesn’t care about
CONSISTENT_READ being removed), but then you cannot drop mirror_top_bs –
because its parent does want CONSISTENT_READ.  Thus, the &error_abort
aborts.


While this is a more special case, I have no idea how to fix this one
either.


Soo...  This series just fixes one thing, and leaves another unfixed
because I have no idea how to fix it.  Worse, it adds parameters to
blkdebug to actually see the problem.  Do we want to let blkdebug be
able to crash qemu (because of a bug in qemu)?


Max Reitz (4):
  mirror: Do not dereference invalid pointers
  blkdebug: Allow taking/unsharing permissions
  iotests: Add @error to wait_until_completed
  iotests: Add test for failing mirror complete

 qapi/block-core.json          |  29 +++++++++-
 block/blkdebug.c              | 106 +++++++++++++++++++++++++++++++++-
 block/mirror.c                |  13 +++--
 tests/qemu-iotests/041        |  44 ++++++++++++++
 tests/qemu-iotests/041.out    |   4 +-
 tests/qemu-iotests/iotests.py |  18 ++++--
 6 files changed, 200 insertions(+), 14 deletions(-)

Comments

Vladimir Sementsov-Ogievskiy Sept. 18, 2019, 3:38 p.m. UTC | #1
12.09.2019 16:56, Max Reitz wrote:
> Hi,
> 
> The fix (patch 1) is pretty straightforward; patch 2 (which I need for
> the test) may not be.
> 
> The biggest problem with patch 2 is that you can use it to uncover where
> our permission handling is broken.  For example, devising the test case
> (patch 4) was very difficult because I kept running into the
> &error_abort that mirror_exit_common() passes when dropping the
> mirror_top_bs.
> 
> The problem is that mirror_top_bs does not take the same permissions
> that its parent takes.  Ergo using &error_abort when dropping it is
> wrong: The parent may require more permissions that mirror_top_bs did,
> and so dropping mirror_top_bs may fail.
> 
> Now what’s really bad is that this cannot be fixed with our current
> permission system.  mirror_top_bs was introduced precisely so it does
> not take CONSISTENT_READ, but can still allow parents to take it (for
> active commits).  But what if there is actually something besides the
> mirror job that unshares CONSISTENT_READ?
> 
> 
> Imagine this:
> 
>        mirror target BB       mirror source BB
>                    |             |
>                    v             v
> mirror_top_bs -> top -> mid -> base
>                    ^
>                    |
>               other_parent
> 
> The source BB unshares CONSISTENT_READ on the base.  mirror_top_bs
> ensures that its parents can read from top even though top itself cannot
> allow CONSISTENT_READ to be taken.  So far so good.
> 
> But what if other_parent also unshares CONSISTENT_READ?  Then,
> mirror_top_bs has no business allowing its parents to take it.
> 
> No idea how to fix that.  (I suppose mirror_top_bs would need some way
> to verify that there is no other party that has unshared CONSISTENT_READ
> but its associated source BB.

May be we need grouped permissions?

Some way to define group of children, which may unshare read permission
for other children (out of the group), but still children in group may
have read permission?

But it don't work here as we are saying about children on different
nodes.. And propagated through backing chain permissions..


>  In the future, we want the source BB to
> go away and instead have the source be an immediate BdrvChild of
> mirror_top_bs.  Maybe we can then build something into the block layer
> so that a node can only restore CONSISTENT_READ when it was that node
> that broke it?)
> 
> 
> Anyway.  You can see something arising from this problem simply by
> unsharing CONSISTENT_READ on the target node.  (Just drop the src-perm
> node from the test I add in patch 4.)  Replacing the source with the
> target will then work fine (because mirror_top_bs doesn’t care about
> CONSISTENT_READ being removed), but then you cannot drop mirror_top_bs –
> because its parent does want CONSISTENT_READ.  Thus, the &error_abort
> aborts.
> 
> 
> While this is a more special case, I have no idea how to fix this one
> either.
> 
> 
> Soo...  This series just fixes one thing, and leaves another unfixed
> because I have no idea how to fix it.  Worse, it adds parameters to
> blkdebug to actually see the problem.  Do we want to let blkdebug be
> able to crash qemu (because of a bug in qemu)?
> 

blkdebug is for debugging and not used by end users like libvirt, yes?

> 
> Max Reitz (4):
>    mirror: Do not dereference invalid pointers
>    blkdebug: Allow taking/unsharing permissions
>    iotests: Add @error to wait_until_completed
>    iotests: Add test for failing mirror complete
> 
>   qapi/block-core.json          |  29 +++++++++-
>   block/blkdebug.c              | 106 +++++++++++++++++++++++++++++++++-
>   block/mirror.c                |  13 +++--
>   tests/qemu-iotests/041        |  44 ++++++++++++++
>   tests/qemu-iotests/041.out    |   4 +-
>   tests/qemu-iotests/iotests.py |  18 ++++--
>   6 files changed, 200 insertions(+), 14 deletions(-)
>
Max Reitz Sept. 19, 2019, 4:45 p.m. UTC | #2
On 18.09.19 17:38, Vladimir Sementsov-Ogievskiy wrote:
> 12.09.2019 16:56, Max Reitz wrote:
>> Hi,
>>
>> The fix (patch 1) is pretty straightforward; patch 2 (which I need for
>> the test) may not be.
>>
>> The biggest problem with patch 2 is that you can use it to uncover where
>> our permission handling is broken.  For example, devising the test case
>> (patch 4) was very difficult because I kept running into the
>> &error_abort that mirror_exit_common() passes when dropping the
>> mirror_top_bs.
>>
>> The problem is that mirror_top_bs does not take the same permissions
>> that its parent takes.  Ergo using &error_abort when dropping it is
>> wrong: The parent may require more permissions that mirror_top_bs did,
>> and so dropping mirror_top_bs may fail.
>>
>> Now what’s really bad is that this cannot be fixed with our current
>> permission system.  mirror_top_bs was introduced precisely so it does
>> not take CONSISTENT_READ, but can still allow parents to take it (for
>> active commits).  But what if there is actually something besides the
>> mirror job that unshares CONSISTENT_READ?
>>
>>
>> Imagine this:
>>
>>        mirror target BB       mirror source BB
>>                    |             |
>>                    v             v
>> mirror_top_bs -> top -> mid -> base
>>                    ^
>>                    |
>>               other_parent
>>
>> The source BB unshares CONSISTENT_READ on the base.  mirror_top_bs
>> ensures that its parents can read from top even though top itself cannot
>> allow CONSISTENT_READ to be taken.  So far so good.
>>
>> But what if other_parent also unshares CONSISTENT_READ?  Then,
>> mirror_top_bs has no business allowing its parents to take it.
>>
>> No idea how to fix that.  (I suppose mirror_top_bs would need some way
>> to verify that there is no other party that has unshared CONSISTENT_READ
>> but its associated source BB.
> 
> May be we need grouped permissions?
> 
> Some way to define group of children, which may unshare read permission
> for other children (out of the group), but still children in group may
> have read permission?

Hm, is that different from my idea below where one of mirror_top's
children unshares the read permission, and another is allowed to take it
still?

(The problem is always that if some BDS has a parent that unshares this
permission, this condition propagates upwards through its other parents,
and we need to keep track of who unshared it in the first place.)

> But it don't work here as we are saying about children on different
> nodes.. And propagated through backing chain permissions..

Yep.

>>  In the future, we want the source BB to
>> go away and instead have the source be an immediate BdrvChild of
>> mirror_top_bs.  Maybe we can then build something into the block layer
>> so that a node can only restore CONSISTENT_READ when it was that node
>> that broke it?)
>>
>>
>> Anyway.  You can see something arising from this problem simply by
>> unsharing CONSISTENT_READ on the target node.  (Just drop the src-perm
>> node from the test I add in patch 4.)  Replacing the source with the
>> target will then work fine (because mirror_top_bs doesn’t care about
>> CONSISTENT_READ being removed), but then you cannot drop mirror_top_bs –
>> because its parent does want CONSISTENT_READ.  Thus, the &error_abort
>> aborts.
>>
>>
>> While this is a more special case, I have no idea how to fix this one
>> either.
>>
>>
>> Soo...  This series just fixes one thing, and leaves another unfixed
>> because I have no idea how to fix it.  Worse, it adds parameters to
>> blkdebug to actually see the problem.  Do we want to let blkdebug be
>> able to crash qemu (because of a bug in qemu)?
>>
> 
> blkdebug is for debugging and not used by end users like libvirt, yes?

Correct.

Max

>>
>> Max Reitz (4):
>>    mirror: Do not dereference invalid pointers
>>    blkdebug: Allow taking/unsharing permissions
>>    iotests: Add @error to wait_until_completed
>>    iotests: Add test for failing mirror complete
>>
>>   qapi/block-core.json          |  29 +++++++++-
>>   block/blkdebug.c              | 106 +++++++++++++++++++++++++++++++++-
>>   block/mirror.c                |  13 +++--
>>   tests/qemu-iotests/041        |  44 ++++++++++++++
>>   tests/qemu-iotests/041.out    |   4 +-
>>   tests/qemu-iotests/iotests.py |  18 ++++--
>>   6 files changed, 200 insertions(+), 14 deletions(-)
>>
> 
>
Vladimir Sementsov-Ogievskiy Sept. 19, 2019, 4:50 p.m. UTC | #3
19.09.2019 19:45, Max Reitz wrote:
> On 18.09.19 17:38, Vladimir Sementsov-Ogievskiy wrote:
>> 12.09.2019 16:56, Max Reitz wrote:
>>> Hi,
>>>
>>> The fix (patch 1) is pretty straightforward; patch 2 (which I need for
>>> the test) may not be.
>>>
>>> The biggest problem with patch 2 is that you can use it to uncover where
>>> our permission handling is broken.  For example, devising the test case
>>> (patch 4) was very difficult because I kept running into the
>>> &error_abort that mirror_exit_common() passes when dropping the
>>> mirror_top_bs.
>>>
>>> The problem is that mirror_top_bs does not take the same permissions
>>> that its parent takes.  Ergo using &error_abort when dropping it is
>>> wrong: The parent may require more permissions that mirror_top_bs did,
>>> and so dropping mirror_top_bs may fail.
>>>
>>> Now what’s really bad is that this cannot be fixed with our current
>>> permission system.  mirror_top_bs was introduced precisely so it does
>>> not take CONSISTENT_READ, but can still allow parents to take it (for
>>> active commits).  But what if there is actually something besides the
>>> mirror job that unshares CONSISTENT_READ?
>>>
>>>
>>> Imagine this:
>>>
>>>         mirror target BB       mirror source BB
>>>                     |             |
>>>                     v             v
>>> mirror_top_bs -> top -> mid -> base
>>>                     ^
>>>                     |
>>>                other_parent
>>>
>>> The source BB unshares CONSISTENT_READ on the base.  mirror_top_bs
>>> ensures that its parents can read from top even though top itself cannot
>>> allow CONSISTENT_READ to be taken.  So far so good.
>>>
>>> But what if other_parent also unshares CONSISTENT_READ?  Then,
>>> mirror_top_bs has no business allowing its parents to take it.
>>>
>>> No idea how to fix that.  (I suppose mirror_top_bs would need some way
>>> to verify that there is no other party that has unshared CONSISTENT_READ
>>> but its associated source BB.
>>
>> May be we need grouped permissions?
>>
>> Some way to define group of children, which may unshare read permission
>> for other children (out of the group), but still children in group may
>> have read permission?
> 
> Hm, is that different from my idea below where one of mirror_top's
> children unshares the read permission, and another is allowed to take it
> still?

I just tried to imagine something generic

> 
> (The problem is always that if some BDS has a parent that unshares this
> permission, this condition propagates upwards through its other parents,
> and we need to keep track of who unshared it in the first place.)
> 
>> But it don't work here as we are saying about children on different
>> nodes.. And propagated through backing chain permissions..
> 
> Yep.
> 
>>>   In the future, we want the source BB to
>>> go away and instead have the source be an immediate BdrvChild of
>>> mirror_top_bs.  Maybe we can then build something into the block layer
>>> so that a node can only restore CONSISTENT_READ when it was that node
>>> that broke it?)
>>>
>>>
>>> Anyway.  You can see something arising from this problem simply by
>>> unsharing CONSISTENT_READ on the target node.  (Just drop the src-perm
>>> node from the test I add in patch 4.)  Replacing the source with the
>>> target will then work fine (because mirror_top_bs doesn’t care about
>>> CONSISTENT_READ being removed), but then you cannot drop mirror_top_bs –
>>> because its parent does want CONSISTENT_READ.  Thus, the &error_abort
>>> aborts.
>>>
>>>
>>> While this is a more special case, I have no idea how to fix this one
>>> either.
>>>
>>>
>>> Soo...  This series just fixes one thing, and leaves another unfixed
>>> because I have no idea how to fix it.  Worse, it adds parameters to
>>> blkdebug to actually see the problem.  Do we want to let blkdebug be
>>> able to crash qemu (because of a bug in qemu)?
>>>
>>
>> blkdebug is for debugging and not used by end users like libvirt, yes?
> 
> Correct.
> 
> 
>>>
>>> Max Reitz (4):
>>>     mirror: Do not dereference invalid pointers
>>>     blkdebug: Allow taking/unsharing permissions
>>>     iotests: Add @error to wait_until_completed
>>>     iotests: Add test for failing mirror complete
>>>
>>>    qapi/block-core.json          |  29 +++++++++-
>>>    block/blkdebug.c              | 106 +++++++++++++++++++++++++++++++++-
>>>    block/mirror.c                |  13 +++--
>>>    tests/qemu-iotests/041        |  44 ++++++++++++++
>>>    tests/qemu-iotests/041.out    |   4 +-
>>>    tests/qemu-iotests/iotests.py |  18 ++++--
>>>    6 files changed, 200 insertions(+), 14 deletions(-)
>>>
>>
>>
> 
>