diff mbox series

[v2,10/10] file-posix: Make auto-read-only dynamic

Message ID 20190311165017.32247-11-kwolf@redhat.com
State New
Headers show
Series file-posix: Make auto-read-only dynamic | expand

Commit Message

Kevin Wolf March 11, 2019, 4:50 p.m. UTC
Until now, with auto-read-only=on we tried to open the file read-write
first and if that failed, read-only was tried. This is actually not good
enough for libvirt, which gives QEMU SELinux permissions for read-write
only as soon as it actually intends to write to the image. So we need to
be able to switch between read-only and read-write at runtime.

This patch makes auto-read-only dynamic, i.e. the file is opened
read-only as long as no user of the node has requested write
permissions, but it is automatically reopened read-write as soon as the
first writer is attached. Conversely, if the last writer goes away, the
file is reopened read-only again.

bs->read_only is no longer set for auto-read-only=on files even if the
file descriptor is opened read-only because it will be transparently
upgraded as soon as a writer is attached. This changes the output of
qemu-iotests 232.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/file-posix.c         | 36 +++++++++++++++++-------------------
 tests/qemu-iotests/232.out | 12 ++++++------
 2 files changed, 23 insertions(+), 25 deletions(-)

Comments

Eric Blake March 11, 2019, 5:26 p.m. UTC | #1
On 3/11/19 11:50 AM, Kevin Wolf wrote:
> Until now, with auto-read-only=on we tried to open the file read-write
> first and if that failed, read-only was tried. This is actually not good
> enough for libvirt, which gives QEMU SELinux permissions for read-write
> only as soon as it actually intends to write to the image. So we need to
> be able to switch between read-only and read-write at runtime.
> 
> This patch makes auto-read-only dynamic, i.e. the file is opened
> read-only as long as no user of the node has requested write
> permissions, but it is automatically reopened read-write as soon as the
> first writer is attached. Conversely, if the last writer goes away, the
> file is reopened read-only again.
> 
> bs->read_only is no longer set for auto-read-only=on files even if the
> file descriptor is opened read-only because it will be transparently
> upgraded as soon as a writer is attached. This changes the output of
> qemu-iotests 232.

auto-read-only was introduced in 3.1, at which point we intentionally
had sufficiently loose wording to permit (but not require) dynamic state
checking; so you are not breaking the interface.  On the other hand, is
libvirt going to have problems introspecting whether it can use
auto-read-only and get the dynamic behavior it needs?  Or is there
enough else in the way of libvirt's switch to -blockdev that it won't
attempt anything that needs auto-read-only without other 4.0 interfaces
anyway, at which point detecting the presence of the field (but not
whether the field has a guarantee of dynamic behavior) on 3.1 doesn't
matter?
Peter Krempa March 11, 2019, 7:59 p.m. UTC | #2
On Mon, Mar 11, 2019 at 12:26:08 -0500, Eric Blake wrote:
> On 3/11/19 11:50 AM, Kevin Wolf wrote:
> > Until now, with auto-read-only=on we tried to open the file read-write
> > first and if that failed, read-only was tried. This is actually not good
> > enough for libvirt, which gives QEMU SELinux permissions for read-write
> > only as soon as it actually intends to write to the image. So we need to
> > be able to switch between read-only and read-write at runtime.
> > 
> > This patch makes auto-read-only dynamic, i.e. the file is opened
> > read-only as long as no user of the node has requested write
> > permissions, but it is automatically reopened read-write as soon as the
> > first writer is attached. Conversely, if the last writer goes away, the
> > file is reopened read-only again.
> > 
> > bs->read_only is no longer set for auto-read-only=on files even if the
> > file descriptor is opened read-only because it will be transparently
> > upgraded as soon as a writer is attached. This changes the output of
> > qemu-iotests 232.
> 
> auto-read-only was introduced in 3.1, at which point we intentionally
> had sufficiently loose wording to permit (but not require) dynamic state
> checking; so you are not breaking the interface.  On the other hand, is
> libvirt going to have problems introspecting whether it can use
> auto-read-only and get the dynamic behavior it needs?  Or is there
> enough else in the way of libvirt's switch to -blockdev that it won't
> attempt anything that needs auto-read-only without other 4.0 interfaces
> anyway, at which point detecting the presence of the field (but not
> whether the field has a guarantee of dynamic behavior) on 3.1 doesn't
> matter?

I think we can use Stefan's capability detection mechanism he introduced
for the migration with cache enabled for local files to add a flag for
this as well.
Eric Blake March 11, 2019, 8:10 p.m. UTC | #3
On 3/11/19 2:59 PM, Peter Krempa wrote:

>> auto-read-only was introduced in 3.1, at which point we intentionally
>> had sufficiently loose wording to permit (but not require) dynamic state
>> checking; so you are not breaking the interface.  On the other hand, is
>> libvirt going to have problems introspecting whether it can use
>> auto-read-only and get the dynamic behavior it needs?  Or is there
>> enough else in the way of libvirt's switch to -blockdev that it won't
>> attempt anything that needs auto-read-only without other 4.0 interfaces
>> anyway, at which point detecting the presence of the field (but not
>> whether the field has a guarantee of dynamic behavior) on 3.1 doesn't
>> matter?
> 
> I think we can use Stefan's capability detection mechanism he introduced
> for the migration with cache enabled for local files to add a flag for
> this as well.

Except I thought we decided that the most recent version of his QMP
changes was now fully-introspectible, thanks to using conditional
compilation.

https://lists.gnu.org/archive/html/qemu-devel/2019-03/msg02510.html

Well, that may prove to be a short-lived hiatus, if libvirt would
happily attempt to use qemu 3.1 and fail without some other
introspectible hook to know whether auto-read-only has required semantics.
Peter Krempa March 11, 2019, 8:25 p.m. UTC | #4
On Mon, Mar 11, 2019 at 15:10:36 -0500, Eric Blake wrote:
> On 3/11/19 2:59 PM, Peter Krempa wrote:
> 
> >> auto-read-only was introduced in 3.1, at which point we intentionally
> >> had sufficiently loose wording to permit (but not require) dynamic state
> >> checking; so you are not breaking the interface.  On the other hand, is
> >> libvirt going to have problems introspecting whether it can use
> >> auto-read-only and get the dynamic behavior it needs?  Or is there
> >> enough else in the way of libvirt's switch to -blockdev that it won't
> >> attempt anything that needs auto-read-only without other 4.0 interfaces
> >> anyway, at which point detecting the presence of the field (but not
> >> whether the field has a guarantee of dynamic behavior) on 3.1 doesn't
> >> matter?
> > 
> > I think we can use Stefan's capability detection mechanism he introduced
> > for the migration with cache enabled for local files to add a flag for
> > this as well.
> 
> Except I thought we decided that the most recent version of his QMP
> changes was now fully-introspectible, thanks to using conditional
> compilation.
> 
> https://lists.gnu.org/archive/html/qemu-devel/2019-03/msg02510.html

Oh, bummer, I missed that it was no longer needed. I still think it's
worth adding that for future use once it will be necessary to detect
that certain things were patched and require libvirt to change behaviour
if that's the case.

> Well, that may prove to be a short-lived hiatus, if libvirt would
> happily attempt to use qemu 3.1 and fail without some other
> introspectible hook to know whether auto-read-only has required semantics.
> 
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org
>
Markus Armbruster March 11, 2019, 9:15 p.m. UTC | #5
Peter Krempa <pkrempa@redhat.com> writes:

> On Mon, Mar 11, 2019 at 15:10:36 -0500, Eric Blake wrote:
>> On 3/11/19 2:59 PM, Peter Krempa wrote:
>> 
>> >> auto-read-only was introduced in 3.1, at which point we intentionally
>> >> had sufficiently loose wording to permit (but not require) dynamic state
>> >> checking; so you are not breaking the interface.  On the other hand, is
>> >> libvirt going to have problems introspecting whether it can use
>> >> auto-read-only and get the dynamic behavior it needs?  Or is there
>> >> enough else in the way of libvirt's switch to -blockdev that it won't
>> >> attempt anything that needs auto-read-only without other 4.0 interfaces
>> >> anyway, at which point detecting the presence of the field (but not
>> >> whether the field has a guarantee of dynamic behavior) on 3.1 doesn't
>> >> matter?
>> > 
>> > I think we can use Stefan's capability detection mechanism he introduced
>> > for the migration with cache enabled for local files to add a flag for
>> > this as well.
>> 
>> Except I thought we decided that the most recent version of his QMP
>> changes was now fully-introspectible, thanks to using conditional
>> compilation.
>> 
>> https://lists.gnu.org/archive/html/qemu-devel/2019-03/msg02510.html
>
> Oh, bummer, I missed that it was no longer needed. I still think it's
> worth adding that for future use once it will be necessary to detect
> that certain things were patched and require libvirt to change behaviour
> if that's the case.

We'll add it when we have a compelling use for it.

>> Well, that may prove to be a short-lived hiatus, if libvirt would
>> happily attempt to use qemu 3.1 and fail without some other
>> introspectible hook to know whether auto-read-only has required semantics.
Max Reitz April 29, 2019, 8:16 p.m. UTC | #6
On 11.03.19 17:50, Kevin Wolf wrote:
> Until now, with auto-read-only=on we tried to open the file read-write
> first and if that failed, read-only was tried. This is actually not good
> enough for libvirt, which gives QEMU SELinux permissions for read-write
> only as soon as it actually intends to write to the image. So we need to
> be able to switch between read-only and read-write at runtime.
> 
> This patch makes auto-read-only dynamic, i.e. the file is opened
> read-only as long as no user of the node has requested write
> permissions, but it is automatically reopened read-write as soon as the
> first writer is attached. Conversely, if the last writer goes away, the
> file is reopened read-only again.
> 
> bs->read_only is no longer set for auto-read-only=on files even if the
> file descriptor is opened read-only because it will be transparently
> upgraded as soon as a writer is attached. This changes the output of
> qemu-iotests 232.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/file-posix.c         | 36 +++++++++++++++++-------------------
>  tests/qemu-iotests/232.out | 12 ++++++------
>  2 files changed, 23 insertions(+), 25 deletions(-)

https://bugzilla.redhat.com/show_bug.cgi?id=1703793 seems to be caused
by this patch: When the mirror job completes, it drops all permissions
on its target BB with an &error_abort.  As of this patch, this may
result in file-posix attempting to reopen the FD, which may fail.

There are two problems I can see: First, the previous patch should have
updated s->open_flags along with s->fd when the FD is switched.  As it
is now, s->open_flags is not updated, so it stays on O_RDONLY and every
time the permissions are checked, the FD is reconfigured and then switched.

That's simple to fix, just add BDRVRawState.perm_change_flags and set it
to open_flags after raw_reconfigure_getfd() returned a ret != s->fd
(when s->perm_change_fd is set).

That fixes the problem of file-posix attempting to reopen the FD to
O_RDWR all the time, which caused the crash.

But that gives us another crash, because now dropping the permissions
(correctly) reopens the FD to O_RDONLY, with the exact same implications
as above: If the target becomes unavailable, opening the new FD will
fail, and qemu will crash.

I don't know what to do about this.  In the spirit of "dropping
permissions should always work", I presume raw_reconfigure_getfd()
should just return the old FD if it had more permissions than the new
one would have.  But if the user issues an explicit reopen command, they
probably want such an error to be reported.

Max
Kevin Wolf April 30, 2019, 11:31 a.m. UTC | #7
Am 29.04.2019 um 22:16 hat Max Reitz geschrieben:
> On 11.03.19 17:50, Kevin Wolf wrote:
> > Until now, with auto-read-only=on we tried to open the file read-write
> > first and if that failed, read-only was tried. This is actually not good
> > enough for libvirt, which gives QEMU SELinux permissions for read-write
> > only as soon as it actually intends to write to the image. So we need to
> > be able to switch between read-only and read-write at runtime.
> > 
> > This patch makes auto-read-only dynamic, i.e. the file is opened
> > read-only as long as no user of the node has requested write
> > permissions, but it is automatically reopened read-write as soon as the
> > first writer is attached. Conversely, if the last writer goes away, the
> > file is reopened read-only again.
> > 
> > bs->read_only is no longer set for auto-read-only=on files even if the
> > file descriptor is opened read-only because it will be transparently
> > upgraded as soon as a writer is attached. This changes the output of
> > qemu-iotests 232.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  block/file-posix.c         | 36 +++++++++++++++++-------------------
> >  tests/qemu-iotests/232.out | 12 ++++++------
> >  2 files changed, 23 insertions(+), 25 deletions(-)
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1703793 seems to be caused
> by this patch: When the mirror job completes, it drops all permissions
> on its target BB with an &error_abort.  As of this patch, this may
> result in file-posix attempting to reopen the FD, which may fail.

Hm, yes. But we already agreed that dropping permissions should never
fail, so maybe we should just implement that now (in block.c).

> There are two problems I can see: First, the previous patch should have
> updated s->open_flags along with s->fd when the FD is switched.  As it
> is now, s->open_flags is not updated, so it stays on O_RDONLY and every
> time the permissions are checked, the FD is reconfigured and then switched.
> 
> That's simple to fix, just add BDRVRawState.perm_change_flags and set it
> to open_flags after raw_reconfigure_getfd() returned a ret != s->fd
> (when s->perm_change_fd is set).

...and actually do s->open_flags = s->perm_change_flags in
raw_set_perm(), otherwise it doesn't help much. :-)

> That fixes the problem of file-posix attempting to reopen the FD to
> O_RDWR all the time, which caused the crash.

Good.

> But that gives us another crash, because now dropping the permissions
> (correctly) reopens the FD to O_RDONLY, with the exact same implications
> as above: If the target becomes unavailable, opening the new FD will
> fail, and qemu will crash.
> 
> I don't know what to do about this.  In the spirit of "dropping
> permissions should always work", I presume raw_reconfigure_getfd()
> should just return the old FD if it had more permissions than the new
> one would have.  But if the user issues an explicit reopen command, they
> probably want such an error to be reported.

Yes, I think file-posix should let the operation fail (because it
actually failed) and the permission functions in block.c should ignore
the error for dropping permissions. This way, reopen will still
correctly return the error because it has a different call chain.

Kevin
diff mbox series

Patch

diff --git a/block/file-posix.c b/block/file-posix.c
index e41e0779c6..d41059d139 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -376,13 +376,21 @@  static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
     }
 }
 
-static void raw_parse_flags(int bdrv_flags, int *open_flags)
+static void raw_parse_flags(int bdrv_flags, int *open_flags, bool has_writers)
 {
+    bool read_write = false;
     assert(open_flags != NULL);
 
     *open_flags |= O_BINARY;
     *open_flags &= ~O_ACCMODE;
-    if (bdrv_flags & BDRV_O_RDWR) {
+
+    if (bdrv_flags & BDRV_O_AUTO_RDONLY) {
+        read_write = has_writers;
+    } else if (bdrv_flags & BDRV_O_RDWR) {
+        read_write = true;
+    }
+
+    if (read_write) {
         *open_flags |= O_RDWR;
     } else {
         *open_flags |= O_RDONLY;
@@ -518,24 +526,12 @@  static int raw_open_common(BlockDriverState *bs, QDict *options,
                                                false);
 
     s->open_flags = open_flags;
-    raw_parse_flags(bdrv_flags, &s->open_flags);
+    raw_parse_flags(bdrv_flags, &s->open_flags, false);
 
     s->fd = -1;
     fd = qemu_open(filename, s->open_flags, 0644);
     ret = fd < 0 ? -errno : 0;
 
-    if (ret == -EACCES || ret == -EROFS) {
-        /* Try to degrade to read-only, but if it doesn't work, still use the
-         * normal error message. */
-        if (bdrv_apply_auto_read_only(bs, NULL, NULL) == 0) {
-            bdrv_flags &= ~BDRV_O_RDWR;
-            raw_parse_flags(bdrv_flags, &s->open_flags);
-            assert(!(s->open_flags & O_CREAT));
-            fd = qemu_open(filename, s->open_flags);
-            ret = fd < 0 ? -errno : 0;
-        }
-    }
-
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Could not open '%s'", filename);
         if (ret == -EROFS) {
@@ -846,12 +842,14 @@  static int raw_handle_perm_lock(BlockDriverState *bs,
 }
 
 static int raw_reconfigure_getfd(BlockDriverState *bs, int flags,
-                                 int *open_flags, bool force_dup,
+                                 int *open_flags, uint64_t perm, bool force_dup,
                                  Error **errp)
 {
     BDRVRawState *s = bs->opaque;
     int fd = -1;
     int ret;
+    bool has_writers = perm &
+        (BLK_PERM_WRITE | BLK_PERM_WRITE_UNCHANGED | BLK_PERM_RESIZE);
     int fcntl_flags = O_APPEND | O_NONBLOCK;
 #ifdef O_NOATIME
     fcntl_flags |= O_NOATIME;
@@ -862,7 +860,7 @@  static int raw_reconfigure_getfd(BlockDriverState *bs, int flags,
         *open_flags |= O_NONBLOCK;
     }
 
-    raw_parse_flags(flags, open_flags);
+    raw_parse_flags(flags, open_flags, has_writers);
 
 #ifdef O_ASYNC
     /* Not all operating systems have O_ASYNC, and those that don't
@@ -942,7 +940,7 @@  static int raw_reopen_prepare(BDRVReopenState *state,
     qemu_opts_to_qdict(opts, state->options);
 
     rs->fd = raw_reconfigure_getfd(state->bs, state->flags, &rs->open_flags,
-                                   true, &local_err);
+                                   state->perm, true, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         ret = -1;
@@ -2720,7 +2718,7 @@  static int raw_check_perm(BlockDriverState *bs, uint64_t perm, uint64_t shared,
         s->perm_change_fd = rs->fd;
     } else {
         /* We may need a new fd if auto-read-only switches the mode */
-        ret = raw_reconfigure_getfd(bs, bs->open_flags, &open_flags,
+        ret = raw_reconfigure_getfd(bs, bs->open_flags, &open_flags, perm,
                                     false, errp);
         if (ret < 0) {
             return ret;
diff --git a/tests/qemu-iotests/232.out b/tests/qemu-iotests/232.out
index 5036ef13d4..5bcc44bb62 100644
--- a/tests/qemu-iotests/232.out
+++ b/tests/qemu-iotests/232.out
@@ -22,12 +22,12 @@  NODE_NAME: TEST_DIR/t.IMGFMT (file, read-only)
 NODE_NAME: TEST_DIR/t.IMGFMT (file, read-only)
 
 QEMU_PROG: -drive driver=file,file=TEST_DIR/t.IMGFMT,if=none,read-only=off,auto-read-only=off: Could not open 'TEST_DIR/t.IMGFMT': Permission denied
-NODE_NAME: TEST_DIR/t.IMGFMT (file, read-only)
-NODE_NAME: TEST_DIR/t.IMGFMT (file, read-only)
+NODE_NAME: TEST_DIR/t.IMGFMT (file)
+NODE_NAME: TEST_DIR/t.IMGFMT (file)
 
 QEMU_PROG: -drive driver=file,file=TEST_DIR/t.IMGFMT,if=none,auto-read-only=off: Could not open 'TEST_DIR/t.IMGFMT': Permission denied
-NODE_NAME: TEST_DIR/t.IMGFMT (file, read-only)
-NODE_NAME: TEST_DIR/t.IMGFMT (file, read-only)
+NODE_NAME: TEST_DIR/t.IMGFMT (file)
+NODE_NAME: TEST_DIR/t.IMGFMT (file)
 
 === -blockdev with read-write image: read-only/auto-read-only combinations ===
 
@@ -50,11 +50,11 @@  node0: TEST_DIR/t.IMGFMT (file, read-only)
 node0: TEST_DIR/t.IMGFMT (file, read-only)
 
 QEMU_PROG: -blockdev driver=file,filename=TEST_DIR/t.IMGFMT,node-name=node0,read-only=off,auto-read-only=off: Could not open 'TEST_DIR/t.IMGFMT': Permission denied
-node0: TEST_DIR/t.IMGFMT (file, read-only)
+node0: TEST_DIR/t.IMGFMT (file)
 QEMU_PROG: -blockdev driver=file,filename=TEST_DIR/t.IMGFMT,node-name=node0,read-only=off: Could not open 'TEST_DIR/t.IMGFMT': Permission denied
 
 QEMU_PROG: -blockdev driver=file,filename=TEST_DIR/t.IMGFMT,node-name=node0,auto-read-only=off: Could not open 'TEST_DIR/t.IMGFMT': Permission denied
-node0: TEST_DIR/t.IMGFMT (file, read-only)
+node0: TEST_DIR/t.IMGFMT (file)
 QEMU_PROG: -blockdev driver=file,filename=TEST_DIR/t.IMGFMT,node-name=node0: Could not open 'TEST_DIR/t.IMGFMT': Permission denied
 
 === Try commit to backing file with auto-read-only ===