Message ID | 20190311165017.32247-11-kwolf@redhat.com |
---|---|
State | New |
Headers | show |
Series | file-posix: Make auto-read-only dynamic | expand |
On 3/11/19 11:50 AM, Kevin Wolf wrote: > Until now, with auto-read-only=on we tried to open the file read-write > first and if that failed, read-only was tried. This is actually not good > enough for libvirt, which gives QEMU SELinux permissions for read-write > only as soon as it actually intends to write to the image. So we need to > be able to switch between read-only and read-write at runtime. > > This patch makes auto-read-only dynamic, i.e. the file is opened > read-only as long as no user of the node has requested write > permissions, but it is automatically reopened read-write as soon as the > first writer is attached. Conversely, if the last writer goes away, the > file is reopened read-only again. > > bs->read_only is no longer set for auto-read-only=on files even if the > file descriptor is opened read-only because it will be transparently > upgraded as soon as a writer is attached. This changes the output of > qemu-iotests 232. auto-read-only was introduced in 3.1, at which point we intentionally had sufficiently loose wording to permit (but not require) dynamic state checking; so you are not breaking the interface. On the other hand, is libvirt going to have problems introspecting whether it can use auto-read-only and get the dynamic behavior it needs? Or is there enough else in the way of libvirt's switch to -blockdev that it won't attempt anything that needs auto-read-only without other 4.0 interfaces anyway, at which point detecting the presence of the field (but not whether the field has a guarantee of dynamic behavior) on 3.1 doesn't matter?
On Mon, Mar 11, 2019 at 12:26:08 -0500, Eric Blake wrote: > On 3/11/19 11:50 AM, Kevin Wolf wrote: > > Until now, with auto-read-only=on we tried to open the file read-write > > first and if that failed, read-only was tried. This is actually not good > > enough for libvirt, which gives QEMU SELinux permissions for read-write > > only as soon as it actually intends to write to the image. So we need to > > be able to switch between read-only and read-write at runtime. > > > > This patch makes auto-read-only dynamic, i.e. the file is opened > > read-only as long as no user of the node has requested write > > permissions, but it is automatically reopened read-write as soon as the > > first writer is attached. Conversely, if the last writer goes away, the > > file is reopened read-only again. > > > > bs->read_only is no longer set for auto-read-only=on files even if the > > file descriptor is opened read-only because it will be transparently > > upgraded as soon as a writer is attached. This changes the output of > > qemu-iotests 232. > > auto-read-only was introduced in 3.1, at which point we intentionally > had sufficiently loose wording to permit (but not require) dynamic state > checking; so you are not breaking the interface. On the other hand, is > libvirt going to have problems introspecting whether it can use > auto-read-only and get the dynamic behavior it needs? Or is there > enough else in the way of libvirt's switch to -blockdev that it won't > attempt anything that needs auto-read-only without other 4.0 interfaces > anyway, at which point detecting the presence of the field (but not > whether the field has a guarantee of dynamic behavior) on 3.1 doesn't > matter? I think we can use Stefan's capability detection mechanism he introduced for the migration with cache enabled for local files to add a flag for this as well.
On 3/11/19 2:59 PM, Peter Krempa wrote: >> auto-read-only was introduced in 3.1, at which point we intentionally >> had sufficiently loose wording to permit (but not require) dynamic state >> checking; so you are not breaking the interface. On the other hand, is >> libvirt going to have problems introspecting whether it can use >> auto-read-only and get the dynamic behavior it needs? Or is there >> enough else in the way of libvirt's switch to -blockdev that it won't >> attempt anything that needs auto-read-only without other 4.0 interfaces >> anyway, at which point detecting the presence of the field (but not >> whether the field has a guarantee of dynamic behavior) on 3.1 doesn't >> matter? > > I think we can use Stefan's capability detection mechanism he introduced > for the migration with cache enabled for local files to add a flag for > this as well. Except I thought we decided that the most recent version of his QMP changes was now fully-introspectible, thanks to using conditional compilation. https://lists.gnu.org/archive/html/qemu-devel/2019-03/msg02510.html Well, that may prove to be a short-lived hiatus, if libvirt would happily attempt to use qemu 3.1 and fail without some other introspectible hook to know whether auto-read-only has required semantics.
On Mon, Mar 11, 2019 at 15:10:36 -0500, Eric Blake wrote: > On 3/11/19 2:59 PM, Peter Krempa wrote: > > >> auto-read-only was introduced in 3.1, at which point we intentionally > >> had sufficiently loose wording to permit (but not require) dynamic state > >> checking; so you are not breaking the interface. On the other hand, is > >> libvirt going to have problems introspecting whether it can use > >> auto-read-only and get the dynamic behavior it needs? Or is there > >> enough else in the way of libvirt's switch to -blockdev that it won't > >> attempt anything that needs auto-read-only without other 4.0 interfaces > >> anyway, at which point detecting the presence of the field (but not > >> whether the field has a guarantee of dynamic behavior) on 3.1 doesn't > >> matter? > > > > I think we can use Stefan's capability detection mechanism he introduced > > for the migration with cache enabled for local files to add a flag for > > this as well. > > Except I thought we decided that the most recent version of his QMP > changes was now fully-introspectible, thanks to using conditional > compilation. > > https://lists.gnu.org/archive/html/qemu-devel/2019-03/msg02510.html Oh, bummer, I missed that it was no longer needed. I still think it's worth adding that for future use once it will be necessary to detect that certain things were patched and require libvirt to change behaviour if that's the case. > Well, that may prove to be a short-lived hiatus, if libvirt would > happily attempt to use qemu 3.1 and fail without some other > introspectible hook to know whether auto-read-only has required semantics. > > -- > Eric Blake, Principal Software Engineer > Red Hat, Inc. +1-919-301-3226 > Virtualization: qemu.org | libvirt.org >
Peter Krempa <pkrempa@redhat.com> writes: > On Mon, Mar 11, 2019 at 15:10:36 -0500, Eric Blake wrote: >> On 3/11/19 2:59 PM, Peter Krempa wrote: >> >> >> auto-read-only was introduced in 3.1, at which point we intentionally >> >> had sufficiently loose wording to permit (but not require) dynamic state >> >> checking; so you are not breaking the interface. On the other hand, is >> >> libvirt going to have problems introspecting whether it can use >> >> auto-read-only and get the dynamic behavior it needs? Or is there >> >> enough else in the way of libvirt's switch to -blockdev that it won't >> >> attempt anything that needs auto-read-only without other 4.0 interfaces >> >> anyway, at which point detecting the presence of the field (but not >> >> whether the field has a guarantee of dynamic behavior) on 3.1 doesn't >> >> matter? >> > >> > I think we can use Stefan's capability detection mechanism he introduced >> > for the migration with cache enabled for local files to add a flag for >> > this as well. >> >> Except I thought we decided that the most recent version of his QMP >> changes was now fully-introspectible, thanks to using conditional >> compilation. >> >> https://lists.gnu.org/archive/html/qemu-devel/2019-03/msg02510.html > > Oh, bummer, I missed that it was no longer needed. I still think it's > worth adding that for future use once it will be necessary to detect > that certain things were patched and require libvirt to change behaviour > if that's the case. We'll add it when we have a compelling use for it. >> Well, that may prove to be a short-lived hiatus, if libvirt would >> happily attempt to use qemu 3.1 and fail without some other >> introspectible hook to know whether auto-read-only has required semantics.
On 11.03.19 17:50, Kevin Wolf wrote: > Until now, with auto-read-only=on we tried to open the file read-write > first and if that failed, read-only was tried. This is actually not good > enough for libvirt, which gives QEMU SELinux permissions for read-write > only as soon as it actually intends to write to the image. So we need to > be able to switch between read-only and read-write at runtime. > > This patch makes auto-read-only dynamic, i.e. the file is opened > read-only as long as no user of the node has requested write > permissions, but it is automatically reopened read-write as soon as the > first writer is attached. Conversely, if the last writer goes away, the > file is reopened read-only again. > > bs->read_only is no longer set for auto-read-only=on files even if the > file descriptor is opened read-only because it will be transparently > upgraded as soon as a writer is attached. This changes the output of > qemu-iotests 232. > > Signed-off-by: Kevin Wolf <kwolf@redhat.com> > --- > block/file-posix.c | 36 +++++++++++++++++------------------- > tests/qemu-iotests/232.out | 12 ++++++------ > 2 files changed, 23 insertions(+), 25 deletions(-) https://bugzilla.redhat.com/show_bug.cgi?id=1703793 seems to be caused by this patch: When the mirror job completes, it drops all permissions on its target BB with an &error_abort. As of this patch, this may result in file-posix attempting to reopen the FD, which may fail. There are two problems I can see: First, the previous patch should have updated s->open_flags along with s->fd when the FD is switched. As it is now, s->open_flags is not updated, so it stays on O_RDONLY and every time the permissions are checked, the FD is reconfigured and then switched. That's simple to fix, just add BDRVRawState.perm_change_flags and set it to open_flags after raw_reconfigure_getfd() returned a ret != s->fd (when s->perm_change_fd is set). That fixes the problem of file-posix attempting to reopen the FD to O_RDWR all the time, which caused the crash. But that gives us another crash, because now dropping the permissions (correctly) reopens the FD to O_RDONLY, with the exact same implications as above: If the target becomes unavailable, opening the new FD will fail, and qemu will crash. I don't know what to do about this. In the spirit of "dropping permissions should always work", I presume raw_reconfigure_getfd() should just return the old FD if it had more permissions than the new one would have. But if the user issues an explicit reopen command, they probably want such an error to be reported. Max
Am 29.04.2019 um 22:16 hat Max Reitz geschrieben: > On 11.03.19 17:50, Kevin Wolf wrote: > > Until now, with auto-read-only=on we tried to open the file read-write > > first and if that failed, read-only was tried. This is actually not good > > enough for libvirt, which gives QEMU SELinux permissions for read-write > > only as soon as it actually intends to write to the image. So we need to > > be able to switch between read-only and read-write at runtime. > > > > This patch makes auto-read-only dynamic, i.e. the file is opened > > read-only as long as no user of the node has requested write > > permissions, but it is automatically reopened read-write as soon as the > > first writer is attached. Conversely, if the last writer goes away, the > > file is reopened read-only again. > > > > bs->read_only is no longer set for auto-read-only=on files even if the > > file descriptor is opened read-only because it will be transparently > > upgraded as soon as a writer is attached. This changes the output of > > qemu-iotests 232. > > > > Signed-off-by: Kevin Wolf <kwolf@redhat.com> > > --- > > block/file-posix.c | 36 +++++++++++++++++------------------- > > tests/qemu-iotests/232.out | 12 ++++++------ > > 2 files changed, 23 insertions(+), 25 deletions(-) > > https://bugzilla.redhat.com/show_bug.cgi?id=1703793 seems to be caused > by this patch: When the mirror job completes, it drops all permissions > on its target BB with an &error_abort. As of this patch, this may > result in file-posix attempting to reopen the FD, which may fail. Hm, yes. But we already agreed that dropping permissions should never fail, so maybe we should just implement that now (in block.c). > There are two problems I can see: First, the previous patch should have > updated s->open_flags along with s->fd when the FD is switched. As it > is now, s->open_flags is not updated, so it stays on O_RDONLY and every > time the permissions are checked, the FD is reconfigured and then switched. > > That's simple to fix, just add BDRVRawState.perm_change_flags and set it > to open_flags after raw_reconfigure_getfd() returned a ret != s->fd > (when s->perm_change_fd is set). ...and actually do s->open_flags = s->perm_change_flags in raw_set_perm(), otherwise it doesn't help much. :-) > That fixes the problem of file-posix attempting to reopen the FD to > O_RDWR all the time, which caused the crash. Good. > But that gives us another crash, because now dropping the permissions > (correctly) reopens the FD to O_RDONLY, with the exact same implications > as above: If the target becomes unavailable, opening the new FD will > fail, and qemu will crash. > > I don't know what to do about this. In the spirit of "dropping > permissions should always work", I presume raw_reconfigure_getfd() > should just return the old FD if it had more permissions than the new > one would have. But if the user issues an explicit reopen command, they > probably want such an error to be reported. Yes, I think file-posix should let the operation fail (because it actually failed) and the permission functions in block.c should ignore the error for dropping permissions. This way, reopen will still correctly return the error because it has a different call chain. Kevin
diff --git a/block/file-posix.c b/block/file-posix.c index e41e0779c6..d41059d139 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -376,13 +376,21 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp) } } -static void raw_parse_flags(int bdrv_flags, int *open_flags) +static void raw_parse_flags(int bdrv_flags, int *open_flags, bool has_writers) { + bool read_write = false; assert(open_flags != NULL); *open_flags |= O_BINARY; *open_flags &= ~O_ACCMODE; - if (bdrv_flags & BDRV_O_RDWR) { + + if (bdrv_flags & BDRV_O_AUTO_RDONLY) { + read_write = has_writers; + } else if (bdrv_flags & BDRV_O_RDWR) { + read_write = true; + } + + if (read_write) { *open_flags |= O_RDWR; } else { *open_flags |= O_RDONLY; @@ -518,24 +526,12 @@ static int raw_open_common(BlockDriverState *bs, QDict *options, false); s->open_flags = open_flags; - raw_parse_flags(bdrv_flags, &s->open_flags); + raw_parse_flags(bdrv_flags, &s->open_flags, false); s->fd = -1; fd = qemu_open(filename, s->open_flags, 0644); ret = fd < 0 ? -errno : 0; - if (ret == -EACCES || ret == -EROFS) { - /* Try to degrade to read-only, but if it doesn't work, still use the - * normal error message. */ - if (bdrv_apply_auto_read_only(bs, NULL, NULL) == 0) { - bdrv_flags &= ~BDRV_O_RDWR; - raw_parse_flags(bdrv_flags, &s->open_flags); - assert(!(s->open_flags & O_CREAT)); - fd = qemu_open(filename, s->open_flags); - ret = fd < 0 ? -errno : 0; - } - } - if (ret < 0) { error_setg_errno(errp, -ret, "Could not open '%s'", filename); if (ret == -EROFS) { @@ -846,12 +842,14 @@ static int raw_handle_perm_lock(BlockDriverState *bs, } static int raw_reconfigure_getfd(BlockDriverState *bs, int flags, - int *open_flags, bool force_dup, + int *open_flags, uint64_t perm, bool force_dup, Error **errp) { BDRVRawState *s = bs->opaque; int fd = -1; int ret; + bool has_writers = perm & + (BLK_PERM_WRITE | BLK_PERM_WRITE_UNCHANGED | BLK_PERM_RESIZE); int fcntl_flags = O_APPEND | O_NONBLOCK; #ifdef O_NOATIME fcntl_flags |= O_NOATIME; @@ -862,7 +860,7 @@ static int raw_reconfigure_getfd(BlockDriverState *bs, int flags, *open_flags |= O_NONBLOCK; } - raw_parse_flags(flags, open_flags); + raw_parse_flags(flags, open_flags, has_writers); #ifdef O_ASYNC /* Not all operating systems have O_ASYNC, and those that don't @@ -942,7 +940,7 @@ static int raw_reopen_prepare(BDRVReopenState *state, qemu_opts_to_qdict(opts, state->options); rs->fd = raw_reconfigure_getfd(state->bs, state->flags, &rs->open_flags, - true, &local_err); + state->perm, true, &local_err); if (local_err) { error_propagate(errp, local_err); ret = -1; @@ -2720,7 +2718,7 @@ static int raw_check_perm(BlockDriverState *bs, uint64_t perm, uint64_t shared, s->perm_change_fd = rs->fd; } else { /* We may need a new fd if auto-read-only switches the mode */ - ret = raw_reconfigure_getfd(bs, bs->open_flags, &open_flags, + ret = raw_reconfigure_getfd(bs, bs->open_flags, &open_flags, perm, false, errp); if (ret < 0) { return ret; diff --git a/tests/qemu-iotests/232.out b/tests/qemu-iotests/232.out index 5036ef13d4..5bcc44bb62 100644 --- a/tests/qemu-iotests/232.out +++ b/tests/qemu-iotests/232.out @@ -22,12 +22,12 @@ NODE_NAME: TEST_DIR/t.IMGFMT (file, read-only) NODE_NAME: TEST_DIR/t.IMGFMT (file, read-only) QEMU_PROG: -drive driver=file,file=TEST_DIR/t.IMGFMT,if=none,read-only=off,auto-read-only=off: Could not open 'TEST_DIR/t.IMGFMT': Permission denied -NODE_NAME: TEST_DIR/t.IMGFMT (file, read-only) -NODE_NAME: TEST_DIR/t.IMGFMT (file, read-only) +NODE_NAME: TEST_DIR/t.IMGFMT (file) +NODE_NAME: TEST_DIR/t.IMGFMT (file) QEMU_PROG: -drive driver=file,file=TEST_DIR/t.IMGFMT,if=none,auto-read-only=off: Could not open 'TEST_DIR/t.IMGFMT': Permission denied -NODE_NAME: TEST_DIR/t.IMGFMT (file, read-only) -NODE_NAME: TEST_DIR/t.IMGFMT (file, read-only) +NODE_NAME: TEST_DIR/t.IMGFMT (file) +NODE_NAME: TEST_DIR/t.IMGFMT (file) === -blockdev with read-write image: read-only/auto-read-only combinations === @@ -50,11 +50,11 @@ node0: TEST_DIR/t.IMGFMT (file, read-only) node0: TEST_DIR/t.IMGFMT (file, read-only) QEMU_PROG: -blockdev driver=file,filename=TEST_DIR/t.IMGFMT,node-name=node0,read-only=off,auto-read-only=off: Could not open 'TEST_DIR/t.IMGFMT': Permission denied -node0: TEST_DIR/t.IMGFMT (file, read-only) +node0: TEST_DIR/t.IMGFMT (file) QEMU_PROG: -blockdev driver=file,filename=TEST_DIR/t.IMGFMT,node-name=node0,read-only=off: Could not open 'TEST_DIR/t.IMGFMT': Permission denied QEMU_PROG: -blockdev driver=file,filename=TEST_DIR/t.IMGFMT,node-name=node0,auto-read-only=off: Could not open 'TEST_DIR/t.IMGFMT': Permission denied -node0: TEST_DIR/t.IMGFMT (file, read-only) +node0: TEST_DIR/t.IMGFMT (file) QEMU_PROG: -blockdev driver=file,filename=TEST_DIR/t.IMGFMT,node-name=node0: Could not open 'TEST_DIR/t.IMGFMT': Permission denied === Try commit to backing file with auto-read-only ===
Until now, with auto-read-only=on we tried to open the file read-write first and if that failed, read-only was tried. This is actually not good enough for libvirt, which gives QEMU SELinux permissions for read-write only as soon as it actually intends to write to the image. So we need to be able to switch between read-only and read-write at runtime. This patch makes auto-read-only dynamic, i.e. the file is opened read-only as long as no user of the node has requested write permissions, but it is automatically reopened read-write as soon as the first writer is attached. Conversely, if the last writer goes away, the file is reopened read-only again. bs->read_only is no longer set for auto-read-only=on files even if the file descriptor is opened read-only because it will be transparently upgraded as soon as a writer is attached. This changes the output of qemu-iotests 232. Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- block/file-posix.c | 36 +++++++++++++++++------------------- tests/qemu-iotests/232.out | 12 ++++++------ 2 files changed, 23 insertions(+), 25 deletions(-)