diff mbox

[RFC] block: Tolerate existing writers on read only BdrvChild

Message ID 20170301081504.23595-1-famz@redhat.com
State New
Headers show

Commit Message

Fam Zheng March 1, 2017, 8:15 a.m. UTC
In an ideal world, read-write access to an image is inherently
exclusive, because we cannot guarantee other readers and writers a
consistency view of the whole image at all point. That's what the
current permission system does, and it is okay as long as it is entirely
internal. But that would change with the coming image locking. In
practice, both end users and our test cases use tools like qemu-img and
qemu-io to peek at images while guest is running.

Relax a bit and accept other writers in this case.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Kevin Wolf March 1, 2017, 9:49 a.m. UTC | #1
Am 01.03.2017 um 09:15 hat Fam Zheng geschrieben:
> In an ideal world, read-write access to an image is inherently
> exclusive, because we cannot guarantee other readers and writers a
> consistency view of the whole image at all point. That's what the
> current permission system does, and it is okay as long as it is entirely
> internal. But that would change with the coming image locking. In
> practice, both end users and our test cases use tools like qemu-img and
> qemu-io to peek at images while guest is running.
> 
> Relax a bit and accept other writers in this case.
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>

Hm. On the one hand I think you're right that people are using things
like this, on other hand it's also true that the result might not be
consistent and therefore image locking is right about forbidding these
actions.

I think your patch is too permissive, it allows even launching a second
long-running VM on the image, which will definitely see corrupted data
sooner or later.

Maybe what we can do is allow shared writers for read-only images if
CONSISTENT_READ isn't requested. It's still not 100% correct because we
can get inconsistent metadata and cause unexpected failure, but this is
probably tolerable. We would then have to change the allowed actions to
not request this permission.

In qemu-img, we have these read-only users:

* qemu-img check (without -r): Let's keep this blocked, it will only
  report lots of leaks and leads to invalid bug reports. I've had my
  share of them.

* qemu-img compare: Not sure if this makes sense with an image that is
  in active use?

* qemu-img convert source: Similarly to qemu-img compare, this doesn't
  make a whole lot of sense, with one exception: People are using -s to
  extract internal snapshots while the VM is still running, and this
  usually works because the snapshot doesn't change. However, I don't
  want to know what happens if you delete the snapshort from the qemu
  process while qemu-img convert is running... Doing 'qemu-img snapshot
  -s ...' with a running VM is more tolerated abuse than a supported
  feature.

* qemu-img info: This one makes perfect sense even with a running VM

* qemu-img map: Results are potentially meaningless with concurrent I/O,
  but there may be cases where it makes sense.

* qemu-img snapshot -l: Somewhat similar to qemu-img convert -s, except
  that it's very quick and doesn't access actual data (could even be
  BDRV_O_NOIO, I think). Allowing this makes sense, I guess.

* qemu-img rebase, safe mode, backing files: If we allowed concurrent
  writes, it wouldn't be safe.

* qemu-img bench: Well... No. You don't need this on an image with an
  active VM.

* qemu-img dd: Same as convert, except that there is no -s.

So the list of subcommands where we want to support it is rather short.
We can change blk_new_open() to clear CONSISTENT_READ for BDRV_O_NOIO,
which could cover 'info' and 'snapshot -l'.

That leaves us with qemu-io, 'convert -s' and 'map', all of which can be
imagined to be useful even with a running VM, but all of which can also
easily produce wrong results in this case. An explicit command line
option to disable CONSISTENT_READ might be the right tool here.

What do you think?

Kevin
Fam Zheng March 1, 2017, 12:39 p.m. UTC | #2
On Wed, 03/01 10:49, Kevin Wolf wrote:
> Am 01.03.2017 um 09:15 hat Fam Zheng geschrieben:
> > In an ideal world, read-write access to an image is inherently
> > exclusive, because we cannot guarantee other readers and writers a
> > consistency view of the whole image at all point. That's what the
> > current permission system does, and it is okay as long as it is entirely
> > internal. But that would change with the coming image locking. In
> > practice, both end users and our test cases use tools like qemu-img and
> > qemu-io to peek at images while guest is running.
> > 
> > Relax a bit and accept other writers in this case.
> > 
> > Signed-off-by: Fam Zheng <famz@redhat.com>
> 
> Hm. On the one hand I think you're right that people are using things
> like this, on other hand it's also true that the result might not be
> consistent and therefore image locking is right about forbidding these
> actions.
> 
> I think your patch is too permissive, it allows even launching a second
> long-running VM on the image, which will definitely see corrupted data
> sooner or later.
> 
> Maybe what we can do is allow shared writers for read-only images if
> CONSISTENT_READ isn't requested. It's still not 100% correct because we
> can get inconsistent metadata and cause unexpected failure, but this is
> probably tolerable. We would then have to change the allowed actions to
> not request this permission.
> 
> In qemu-img, we have these read-only users:
> 
> * qemu-img check (without -r): Let's keep this blocked, it will only
>   report lots of leaks and leads to invalid bug reports. I've had my
>   share of them.
> 
> * qemu-img compare: Not sure if this makes sense with an image that is
>   in active use?
> 
> * qemu-img convert source: Similarly to qemu-img compare, this doesn't
>   make a whole lot of sense, with one exception: People are using -s to
>   extract internal snapshots while the VM is still running, and this
>   usually works because the snapshot doesn't change. However, I don't
>   want to know what happens if you delete the snapshort from the qemu
>   process while qemu-img convert is running... Doing 'qemu-img snapshot
>   -s ...' with a running VM is more tolerated abuse than a supported
>   feature.
> 
> * qemu-img info: This one makes perfect sense even with a running VM
> 
> * qemu-img map: Results are potentially meaningless with concurrent I/O,
>   but there may be cases where it makes sense.
> 
> * qemu-img snapshot -l: Somewhat similar to qemu-img convert -s, except
>   that it's very quick and doesn't access actual data (could even be
>   BDRV_O_NOIO, I think). Allowing this makes sense, I guess.
> 
> * qemu-img rebase, safe mode, backing files: If we allowed concurrent
>   writes, it wouldn't be safe.
> 
> * qemu-img bench: Well... No. You don't need this on an image with an
>   active VM.
> 
> * qemu-img dd: Same as convert, except that there is no -s.
> 
> So the list of subcommands where we want to support it is rather short.
> We can change blk_new_open() to clear CONSISTENT_READ for BDRV_O_NOIO,
> which could cover 'info' and 'snapshot -l'.
> 
> That leaves us with qemu-io, 'convert -s' and 'map', all of which can be
> imagined to be useful even with a running VM, but all of which can also
> easily produce wrong results in this case. An explicit command line
> option to disable CONSISTENT_READ might be the right tool here.

I'm not sure about this because: 1) this is intrusive from a user PoV, many
scripts and upper layer tools will stop working; 2) CONSISTENT_READ is enforced
by qcow2 in its .bdrv_child_perm implementation even if blk_new_open() doesn't
ask for it, therefore such an option has to impact the whole graph; 3) this
isn't only about asking for "persistent read" perm, but more about granting
"write" in shared_perm, so it feels messy.

A perhaps more contained way is to add a BDRV_O_RELAXED_LOCK flag and use it in
those subcommands, then in the image locking code, the "no other writer" byte is
not locked if it's set. This has a simpler semantic and a more manageable scope.

Fam
Kevin Wolf March 1, 2017, 3:16 p.m. UTC | #3
Am 01.03.2017 um 13:39 hat Fam Zheng geschrieben:
> On Wed, 03/01 10:49, Kevin Wolf wrote:
> > Am 01.03.2017 um 09:15 hat Fam Zheng geschrieben:
> > > In an ideal world, read-write access to an image is inherently
> > > exclusive, because we cannot guarantee other readers and writers a
> > > consistency view of the whole image at all point. That's what the
> > > current permission system does, and it is okay as long as it is entirely
> > > internal. But that would change with the coming image locking. In
> > > practice, both end users and our test cases use tools like qemu-img and
> > > qemu-io to peek at images while guest is running.
> > > 
> > > Relax a bit and accept other writers in this case.
> > > 
> > > Signed-off-by: Fam Zheng <famz@redhat.com>
> > 
> > Hm. On the one hand I think you're right that people are using things
> > like this, on other hand it's also true that the result might not be
> > consistent and therefore image locking is right about forbidding these
> > actions.
> > 
> > I think your patch is too permissive, it allows even launching a second
> > long-running VM on the image, which will definitely see corrupted data
> > sooner or later.
> > 
> > Maybe what we can do is allow shared writers for read-only images if
> > CONSISTENT_READ isn't requested. It's still not 100% correct because we
> > can get inconsistent metadata and cause unexpected failure, but this is
> > probably tolerable. We would then have to change the allowed actions to
> > not request this permission.
> > 
> > In qemu-img, we have these read-only users:
> > [...]
> > 
> > So the list of subcommands where we want to support it is rather short.
> > We can change blk_new_open() to clear CONSISTENT_READ for BDRV_O_NOIO,
> > which could cover 'info' and 'snapshot -l'.
> > 
> > That leaves us with qemu-io, 'convert -s' and 'map', all of which can be
> > imagined to be useful even with a running VM, but all of which can also
> > easily produce wrong results in this case. An explicit command line
> > option to disable CONSISTENT_READ might be the right tool here.
> 
> I'm not sure about this because: 1) this is intrusive from a user PoV, many
> scripts and upper layer tools will stop working;

Are you sure? I don't expect user scripts or even proper management
tools to use qemu-io on running VMs. I do expect that some users are
using 'convert -s' with running VMs despite our recommendation against
it.

If they are aware that they're doing something that works only in the
right circumstances, then breaking it isn't nice. But my gut feeling is
that most of them don't know about the implications of accessing a live
image. In this case breaking their use case and making them think about
whether they want to add something like a --force option sounds like a
good thing because they aren't caught by surprise when things go wrong
eventually.

> 2) CONSISTENT_READ is enforced by qcow2 in its .bdrv_child_perm
> implementation even if blk_new_open() doesn't ask for it, therefore
> such an option has to impact the whole graph;

Yes, the perm |= CONSISTENT_READ statement would become conditional as
well so that it isn't set if all parents are read-only and don't need
CONSISTENT_READ themselves.

This is strictly speaking wrong, but after all the whole point of the
change would be to allow things that are strictly speaking wrong.

> 3) this isn't only about asking for "persistent read" perm, but more
> about granting "write" in shared_perm, so it feels messy.

Yes, exceptions make everything messy.

> A perhaps more contained way is to add a BDRV_O_RELAXED_LOCK flag and
> use it in those subcommands, then in the image locking code, the "no
> other writer" byte is not locked if it's set. This has a simpler
> semantic and a more manageable scope.

No new flags, please (at least none that are used in places deeper than
blk_new_open()).

I'm also not sure whether applying different rules within qemu and
externally is a really good idea. If an external process can open an
image read-only, why wouldn't it be possible internally?

Kevin
Fam Zheng March 1, 2017, 4:10 p.m. UTC | #4
On Wed, 03/01 16:16, Kevin Wolf wrote:
> > I'm not sure about this because: 1) this is intrusive from a user PoV, many
> > scripts and upper layer tools will stop working;
> 
> Are you sure? I don't expect user scripts or even proper management
> tools to use qemu-io on running VMs. I do expect that some users are
> using 'convert -s' with running VMs despite our recommendation against
> it.
> 
> If they are aware that they're doing something that works only in the
> right circumstances, then breaking it isn't nice. But my gut feeling is
> that most of them don't know about the implications of accessing a live
> image. In this case breaking their use case and making them think about
> whether they want to add something like a --force option sounds like a
> good thing because they aren't caught by surprise when things go wrong
> eventually.

Yes, the use case is poor for qcow2, and your points stand there. But image
locking will be at the posix level, which has a wider range of users. I cannot
draw the same conclusion on raw images as easily.
Kevin Wolf March 1, 2017, 4:22 p.m. UTC | #5
Am 01.03.2017 um 17:10 hat Fam Zheng geschrieben:
> On Wed, 03/01 16:16, Kevin Wolf wrote:
> > > I'm not sure about this because: 1) this is intrusive from a user PoV, many
> > > scripts and upper layer tools will stop working;
> > 
> > Are you sure? I don't expect user scripts or even proper management
> > tools to use qemu-io on running VMs. I do expect that some users are
> > using 'convert -s' with running VMs despite our recommendation against
> > it.
> > 
> > If they are aware that they're doing something that works only in the
> > right circumstances, then breaking it isn't nice. But my gut feeling is
> > that most of them don't know about the implications of accessing a live
> > image. In this case breaking their use case and making them think about
> > whether they want to add something like a --force option sounds like a
> > good thing because they aren't caught by surprise when things go wrong
> > eventually.
> 
> Yes, the use case is poor for qcow2, and your points stand there. But image
> locking will be at the posix level, which has a wider range of users. I cannot
> draw the same conclusion on raw images as easily.

Well, with raw, I'm even less concerned about breaking the commands
related to internal snapshots. :-)

Kevin
Fam Zheng March 2, 2017, 11:21 a.m. UTC | #6
On Wed, 03/01 17:22, Kevin Wolf wrote:
> Am 01.03.2017 um 17:10 hat Fam Zheng geschrieben:
> > On Wed, 03/01 16:16, Kevin Wolf wrote:
> > > > I'm not sure about this because: 1) this is intrusive from a user PoV, many
> > > > scripts and upper layer tools will stop working;
> > > 
> > > Are you sure? I don't expect user scripts or even proper management
> > > tools to use qemu-io on running VMs. I do expect that some users are
> > > using 'convert -s' with running VMs despite our recommendation against
> > > it.
> > > 
> > > If they are aware that they're doing something that works only in the
> > > right circumstances, then breaking it isn't nice. But my gut feeling is
> > > that most of them don't know about the implications of accessing a live
> > > image. In this case breaking their use case and making them think about
> > > whether they want to add something like a --force option sounds like a
> > > good thing because they aren't caught by surprise when things go wrong
> > > eventually.
> > 
> > Yes, the use case is poor for qcow2, and your points stand there. But image
> > locking will be at the posix level, which has a wider range of users. I cannot
> > draw the same conclusion on raw images as easily.
> 
> Well, with raw, I'm even less concerned about breaking the commands
> related to internal snapshots. :-)

Yes, I'm agree with a --force there. For qemu-img map and qemu-io, personally I
think it's better to keep the default working. qemu-io is a expert mode tool,
whoever using it at all should already know what he's doing, --force doesn't add
much protection for the innocent.

Fam
Kevin Wolf March 2, 2017, 2:23 p.m. UTC | #7
Am 02.03.2017 um 12:21 hat Fam Zheng geschrieben:
> On Wed, 03/01 17:22, Kevin Wolf wrote:
> > Am 01.03.2017 um 17:10 hat Fam Zheng geschrieben:
> > > On Wed, 03/01 16:16, Kevin Wolf wrote:
> > > > > I'm not sure about this because: 1) this is intrusive from a user PoV, many
> > > > > scripts and upper layer tools will stop working;
> > > > 
> > > > Are you sure? I don't expect user scripts or even proper management
> > > > tools to use qemu-io on running VMs. I do expect that some users are
> > > > using 'convert -s' with running VMs despite our recommendation against
> > > > it.
> > > > 
> > > > If they are aware that they're doing something that works only in the
> > > > right circumstances, then breaking it isn't nice. But my gut feeling is
> > > > that most of them don't know about the implications of accessing a live
> > > > image. In this case breaking their use case and making them think about
> > > > whether they want to add something like a --force option sounds like a
> > > > good thing because they aren't caught by surprise when things go wrong
> > > > eventually.
> > > 
> > > Yes, the use case is poor for qcow2, and your points stand there. But image
> > > locking will be at the posix level, which has a wider range of users. I cannot
> > > draw the same conclusion on raw images as easily.
> > 
> > Well, with raw, I'm even less concerned about breaking the commands
> > related to internal snapshots. :-)
> 
> Yes, I'm agree with a --force there. For qemu-img map and qemu-io, personally I
> think it's better to keep the default working. qemu-io is a expert mode tool,
> whoever using it at all should already know what he's doing, --force doesn't add
> much protection for the innocent.

Being an expert doesn't protect you from stupid mistakes like forgetting
that a VM is still running. --force doesn't prevent you from performing
your evil action, but it prevents accidents where you didn't even intend
to be evil for a change.

I think qemu-io and qemu-img map are tools that only human users should
be using while a VM is running, so breaking command line syntax
compatibility by requiring a --force option wouldn't hurt too much.

Kevin
Fam Zheng March 3, 2017, 1:46 a.m. UTC | #8
On Thu, 03/02 15:23, Kevin Wolf wrote:
> Am 02.03.2017 um 12:21 hat Fam Zheng geschrieben:
> > On Wed, 03/01 17:22, Kevin Wolf wrote:
> > > Am 01.03.2017 um 17:10 hat Fam Zheng geschrieben:
> > > > On Wed, 03/01 16:16, Kevin Wolf wrote:
> > > > > > I'm not sure about this because: 1) this is intrusive from a user PoV, many
> > > > > > scripts and upper layer tools will stop working;
> > > > > 
> > > > > Are you sure? I don't expect user scripts or even proper management
> > > > > tools to use qemu-io on running VMs. I do expect that some users are
> > > > > using 'convert -s' with running VMs despite our recommendation against
> > > > > it.
> > > > > 
> > > > > If they are aware that they're doing something that works only in the
> > > > > right circumstances, then breaking it isn't nice. But my gut feeling is
> > > > > that most of them don't know about the implications of accessing a live
> > > > > image. In this case breaking their use case and making them think about
> > > > > whether they want to add something like a --force option sounds like a
> > > > > good thing because they aren't caught by surprise when things go wrong
> > > > > eventually.
> > > > 
> > > > Yes, the use case is poor for qcow2, and your points stand there. But image
> > > > locking will be at the posix level, which has a wider range of users. I cannot
> > > > draw the same conclusion on raw images as easily.
> > > 
> > > Well, with raw, I'm even less concerned about breaking the commands
> > > related to internal snapshots. :-)
> > 
> > Yes, I'm agree with a --force there. For qemu-img map and qemu-io, personally I
> > think it's better to keep the default working. qemu-io is a expert mode tool,
> > whoever using it at all should already know what he's doing, --force doesn't add
> > much protection for the innocent.
> 
> Being an expert doesn't protect you from stupid mistakes like forgetting
> that a VM is still running. --force doesn't prevent you from performing
> your evil action, but it prevents accidents where you didn't even intend
> to be evil for a change.
> 
> I think qemu-io and qemu-img map are tools that only human users should
> be using while a VM is running, so breaking command line syntax
> compatibility by requiring a --force option wouldn't hurt too much.

OK, that sounds fair. I don't know how to implement the option, though, would
you like to take care of it? :)

Fam
diff mbox

Patch

diff --git a/block.c b/block.c
index f293ccb..9bdd77c 100644
--- a/block.c
+++ b/block.c
@@ -1685,12 +1685,12 @@  void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
         /* Format drivers may touch metadata even if the guest doesn't write */
         if (!bdrv_is_read_only(bs)) {
             perm |= BLK_PERM_WRITE | BLK_PERM_RESIZE;
+            shared &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE);
         }
 
         /* bs->file always needs to be consistent because of the metadata. We
          * can never allow other users to resize or write to it. */
         perm |= BLK_PERM_CONSISTENT_READ;
-        shared &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE);
     } else {
         /* We want consistent read from backing files if the parent needs it.
          * No other operations are performed on backing files. */