diff mbox series

[v2,2/8] block: Add auto-read-only option

Message ID 20181012115532.12645-3-kwolf@redhat.com
State New
Series block: Add auto-read-only option

Commit Message

Kevin Wolf Oct. 12, 2018, 11:55 a.m. UTC
If a management application builds the block graph node by node, the
protocol layer doesn't inherit its read-only option from the format
layer any more, so it must be set explicitly.

Backing files should work on read-only storage, but at the same time, a
block job like commit should be able to reopen them read-write if they
are on read-write storage. However, without option inheritance, reopen
only changes the read-only option for the root node (typically the
format layer), but not the protocol layer, so reopening fails (the
format layer wants to get write permissions, but the protocol layer is
still read-only).

A simple workaround for the problem in the management tool would be to
open the protocol layer always read-write and to make only the format
layer read-only for backing files. However, sometimes the file is
actually stored on read-only storage and we don't know whether the image
can be opened read-write (for example, for NBD it depends on the server
we're trying to connect to). This adds an option that makes QEMU try to
open the image read-write, but allows it to degrade to a read-only mode
without returning an error.
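
As a rough illustration of the "try read-write, degrade to read-only" idea
described above, here is a minimal sketch at the POSIX file level. It is not
the QEMU implementation: open_auto_read_only() is a hypothetical helper, and
the choice of EACCES and EROFS as the errors that trigger the fallback is an
assumption made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical helper, not QEMU code: try to open `path` read-write and,
 * if `auto_read_only` is set, degrade to read-only when the failure looks
 * like a permission problem. On success, *read_only reports the mode that
 * was actually obtained.
 */
static int open_auto_read_only(const char *path, bool auto_read_only,
                               bool *read_only)
{
    int fd;

    assert(path != NULL && read_only != NULL);

    fd = open(path, O_RDWR);
    if (fd >= 0) {
        *read_only = false;
        return fd;
    }

    /* Assumption: only these errors can be avoided by a read-only open. */
    if (auto_read_only && (errno == EACCES || errno == EROFS)) {
        fd = open(path, O_RDONLY);
        if (fd >= 0) {
            *read_only = true;
            return fd;
        }
    }

    return -1; /* errno is left as set by the last failing open() */
}
```

A caller would check the returned descriptor and then consult *read_only to
learn whether the open was degraded, instead of treating the read-only case
as an error.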

The documentation for this option is consciously phrased in a way that
allows QEMU to switch to a better model eventually: Instead of trying
when the image is first opened, making the read-only flag dynamic and
changing it automatically whenever the first BLK_PERM_WRITE user is
attached or the last one is detached would be much more useful
behaviour.

Unfortunately, this more useful behaviour is also a lot harder to
implement, and libvirt needs a solution now before it can switch to
-blockdev, so let's start with this easier approach for now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 qapi/block-core.json  |  6 ++++++
 include/block/block.h |  2 ++
 block.c               | 21 ++++++++++++++++++++-
 block/vvfat.c         |  1 +
 4 files changed, 29 insertions(+), 1 deletion(-)

Comments

Eric Blake Oct. 12, 2018, 4:47 p.m. UTC | #1
On 10/12/18 6:55 AM, Kevin Wolf wrote:
> If a management application builds the block graph node by node, the
> protocol layer doesn't inherit its read-only option from the format
> layer any more, so it must be set explicitly.
> 

> The documentation for this option is consciously phrased in a way that
> allows QEMU to switch to a better model eventually: Instead of trying
> when the image is first opened, making the read-only flag dynamic and
> changing it automatically whenever the first BLK_PERM_WRITE user is
> attached or the last one is detached would be much more useful
> behaviour.
> 
> Unfortunately, this more useful behaviour is also a lot harder to
> implement, and libvirt needs a solution now before it can switch to
> -blockdev, so let's start with this easier approach for now.

I agree both with the approach of getting the simpler implementation in 
now (always writable, even when we don't need to write) as well as 
wording the documentation to permit a future stricter approach (only 
writable at the points where we need to write).

> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>   qapi/block-core.json  |  6 ++++++
>   include/block/block.h |  2 ++
>   block.c               | 21 ++++++++++++++++++++-
>   block/vvfat.c         |  1 +
>   4 files changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index cfb37f8c1d..3a899298de 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -3651,6 +3651,11 @@
>   #                 either generally or in certain configurations. In this case,
>   #                 the default value does not work and the option must be
>   #                 specified explicitly.
> +# @auto-read-only: if true, QEMU may ignore the @read-only option and
> +#                  automatically decide whether to open the image read-only or
> +#                  read-write (and switch between the modes later), e.g.
> +#                  depending on whether the image file is writable or whether a
> +#                  writing user is attached to the node (default: false).

Bike-shedding: Do we really want to ignore @read-only? Here's the table 
of 9 combinations ('t'rue, 'f'alse, 'o'mitted), with '*' on the rows 
that must be preserved for back-compat:

RO   Auto   effect
o    o      *open for write, fail if not possible
f    o      *open for write, fail if not possible
t    o      *open for read, no conversion to write
o    f      open for write, fail if not possible
f    f      open for write, fail if not possible
t    f      open for read, no conversion to write
o    t      attempt write but graceful fall back to read
f    t      attempt write but graceful fall back to read
t    t      ignore RO flag, attempt write anyway

That last row is weird, why not make it an explicit error instead of 
ignoring the implied difference in semantics between the two?

Or, another idea: is it worth trying to support a single tri-state 
member (via an alternative between bool and enum, since the existing 
code uses a JSON bool):

"read-only": false (open for write, fail if not possible)
"read-only": true (open read-only, no later switching)
"read-only": "auto" (switch as needed; or for initial implementation 
attempt for write with graceful fallback to read)
omitting read-only: same as "read-only":false for back-compat


> @@ -1328,6 +1338,11 @@ QemuOptsList bdrv_runtime_opts = {
>               .type = QEMU_OPT_BOOL,
>               .help = "Node is opened in read-only mode",
>           },
> +        {
> +            .name = BDRV_OPT_AUTO_READ_ONLY,
> +            .type = QEMU_OPT_BOOL,
> +            .help = "Node can become read-only if opening read-write fails",
> +        },

If we keep your current approach, is it worth mentioning that 
auto-read-only true overrides read-only true?

The code looks okay, but I'd like discussion on the bikeshed points 
before giving R-b.

Kevin Wolf Oct. 15, 2018, 9:37 a.m. UTC | #2
Am 12.10.2018 um 18:47 hat Eric Blake geschrieben:
> On 10/12/18 6:55 AM, Kevin Wolf wrote:
> > If a management application builds the block graph node by node, the
> > protocol layer doesn't inherit its read-only option from the format
> > layer any more, so it must be set explicitly.
> > 
> 
> > The documentation for this option is consciously phrased in a way that
> > allows QEMU to switch to a better model eventually: Instead of trying
> > when the image is first opened, making the read-only flag dynamic and
> > changing it automatically whenever the first BLK_PERM_WRITE user is
> > attached or the last one is detached would be much more useful
> > behaviour.
> > 
> > Unfortunately, this more useful behaviour is also a lot harder to
> > implement, and libvirt needs a solution now before it can switch to
> > -blockdev, so let's start with this easier approach for now.
> 
> I agree both with the approach of getting the simpler implementation in now
> (always writable, even when we don't need to write) as well as wording the
> documentation to permit a future stricter approach (only writable at the
> points where we need to write).
> 
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >   qapi/block-core.json  |  6 ++++++
> >   include/block/block.h |  2 ++
> >   block.c               | 21 ++++++++++++++++++++-
> >   block/vvfat.c         |  1 +
> >   4 files changed, 29 insertions(+), 1 deletion(-)
> > 
> > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > index cfb37f8c1d..3a899298de 100644
> > --- a/qapi/block-core.json
> > +++ b/qapi/block-core.json
> > @@ -3651,6 +3651,11 @@
> >   #                 either generally or in certain configurations. In this case,
> >   #                 the default value does not work and the option must be
> >   #                 specified explicitly.
> > +# @auto-read-only: if true, QEMU may ignore the @read-only option and
> > +#                  automatically decide whether to open the image read-only or
> > +#                  read-write (and switch between the modes later), e.g.
> > +#                  depending on whether the image file is writable or whether a
> > +#                  writing user is attached to the node (default: false).
> 
> Bike-shedding: Do we really want to ignore @read-only? Here's the table of 9
> combinations ('t'rue, 'f'alse, 'o'mitted), with '*' on the rows that must be
> preserved for back-compat:
> 
> RO   Auto   effect
> o    o      *open for write, fail if not possible
> f    o      *open for write, fail if not possible
> t    o      *open for read, no conversion to write
> o    f      open for write, fail if not possible
> f    f      open for write, fail if not possible
> t    f      open for read, no conversion to write
> o    t      attempt write but graceful fall back to read
> f    t      attempt write but graceful fall back to read
> t    t      ignore RO flag, attempt write anyway
> 
> That last row is weird, why not make it an explicit error instead of
> ignoring the implied difference in semantics between the two?

You're right that the description allows this. In practice,
auto-read-only can only make a node go from rw to ro, not the other way
round.

So our options are to document the current behaviour (auto-read-only has
no effect when the image is already read-only) or to make it an error.
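
To make that reading concrete, here is a small model of it in C. This is
only a sketch of the semantics as described in this mail, not code from the
patch: effective_mode() and the OpenResult enum are hypothetical names, an
omitted option is treated like false, and "storage_writable" stands in for
whatever the driver finds out when it tries to open the image.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of opening a node; not a QEMU type. */
typedef enum {
    MODE_READ_WRITE,
    MODE_READ_ONLY,
    MODE_OPEN_FAILS,
} OpenResult;

/*
 * Model of the reading above: auto-read-only can only turn a read-write
 * request into a read-only node, never the other way round.
 */
static OpenResult effective_mode(bool read_only, bool auto_read_only,
                                 bool storage_writable)
{
    if (read_only) {
        /* Already read-only: auto-read-only has no effect. */
        return MODE_READ_ONLY;
    }
    if (storage_writable) {
        return MODE_READ_WRITE;
    }
    /* Read-write was requested but is not possible. */
    return auto_read_only ? MODE_READ_ONLY : MODE_OPEN_FAILS;
}
```

With this model, the "t t" row of the table simply gives a read-only node,
for example effective_mode(true, true, true) yields MODE_READ_ONLY.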

One thought I had is that for convenience options like -hda (or in fact
-drive), auto-read-only=on could be the default, and only -blockdev and
blockdev-add would disable it by default. That would suggest that we
don't want to make it an error.

> Or, another idea: is it worth trying to support a single tri-state member
> (via an alternative between bool and enum, since the existing code uses a
> JSON bool):
> 
> "read-only": false (open for write, fail if not possible)
> "read-only": true (open read-only, no later switching)
> "read-only": "auto" (switch as needed; or for initial implementation attempt
> for write with graceful fallback to read)
> omitting read-only: same as "read-only":false for back-compat

If read-only were new, I would probably make it an enum, but adding it
now isn't very practical. I did actually start with an alternate and it
just wasn't very nice. One thing I remember is places that directly
accessed the options QDict, for which you could now have either a bool, a
string, an int or not present. It becomes a bit too much.
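
Roughly what every such place would have to do is sketched below. None of
this is QEMU code: OptValue is a hypothetical stand-in for a QDict entry,
read_only_mode() is a made-up helper, and the accepted strings and the 0/1
handling of integers are assumptions for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a value found in an options dictionary. */
typedef enum { VAL_ABSENT, VAL_BOOL, VAL_STRING, VAL_INT } ValueKind;

typedef struct {
    ValueKind kind;
    union {
        bool b;
        const char *s;
        long i;
    } u;
} OptValue;

typedef enum { RO_OFF, RO_ON, RO_AUTO, RO_INVALID } ReadOnlyMode;

/* Each reader of "read-only" would need all of these cases. */
static ReadOnlyMode read_only_mode(const OptValue *v)
{
    switch (v->kind) {
    case VAL_ABSENT:
        return RO_OFF;                  /* omitted: same as false */
    case VAL_BOOL:
        return v->u.b ? RO_ON : RO_OFF;
    case VAL_STRING:
        if (strcmp(v->u.s, "auto") == 0) {
            return RO_AUTO;
        }
        if (strcmp(v->u.s, "on") == 0) {
            return RO_ON;
        }
        if (strcmp(v->u.s, "off") == 0) {
            return RO_OFF;
        }
        return RO_INVALID;
    case VAL_INT:
        /* Assumption: only 0 and 1 are meaningful as integers. */
        if (v->u.i == 0) {
            return RO_OFF;
        }
        return v->u.i == 1 ? RO_ON : RO_INVALID;
    }
    return RO_INVALID;
}
```

A separate boolean keeps each reader down to a single boolean lookup, which
is the main argument for adding auto-read-only as its own option.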

As read-only is optional, we could make it true/false/absent without
introducing an alternate and the additional int/string options, but I
don't like that very much either.


While we're talking about the schema, another thing I considered was
making auto-read-only an option only for the specific drivers that
support it so introspection could tell the management tool whether the
functionality is available. However, if we do this, we can't parse it in
block.c code and use a flag any more, but need to parse it in each
driver individually. Maybe it would be a better design anyway?

> > @@ -1328,6 +1338,11 @@ QemuOptsList bdrv_runtime_opts = {
> >               .type = QEMU_OPT_BOOL,
> >               .help = "Node is opened in read-only mode",
> >           },
> > +        {
> > +            .name = BDRV_OPT_AUTO_READ_ONLY,
> > +            .type = QEMU_OPT_BOOL,
> > +            .help = "Node can become read-only if opening read-write fails",
> > +        },
> 
> If we keep your current approach, is it worth mentioning that
> auto-read-only true overrides read-only true?

This help text is never printed anywhere anyway... Maybe we should just
delete it. What we refer to is the QAPI documentation anyway.

Kevin

Eric Blake Oct. 16, 2018, 6:46 p.m. UTC | #3
On 10/15/18 4:37 AM, Kevin Wolf wrote:
> Am 12.10.2018 um 18:47 hat Eric Blake geschrieben:
>> On 10/12/18 6:55 AM, Kevin Wolf wrote:
>>> If a management application builds the block graph node by node, the
>>> protocol layer doesn't inherit its read-only option from the format
>>> layer any more, so it must be set explicitly.
>>>
>>

>>
>> Bike-shedding: Do we really want to ignore @read-only? Here's the table of 9
>> combinations ('t'rue, 'f'alse, 'o'mitted), with '*' on the rows that must be
>> preserved for back-compat:
>>
>> RO   Auto   effect
>> o    o      *open for write, fail if not possible
>> f    o      *open for write, fail if not possible
>> t    o      *open for read, no conversion to write
>> o    f      open for write, fail if not possible
>> f    f      open for write, fail if not possible
>> t    f      open for read, no conversion to write
>> o    t      attempt write but graceful fall back to read
>> f    t      attempt write but graceful fall back to read
>> t    t      ignore RO flag, attempt write anyway
>>
>> That last row is weird, why not make it an explicit error instead of
>> ignoring the implied difference in semantics between the two?
> 
> You're right that the description allows this. In practice,
> auto-read-only can only make a node go from rw to ro, not the other way
> round.
> 
> So our options are to document the current behaviour (auto-read-only has
> no effect when the image is already read-only) or to make it an error.

Ah, that's different. I was reading it as "auto-read-only true lets you
write if possible, overriding an explicit read-only request", while you
are reading it as "auto-read-only true allows graceful fallback to
read-only, and is thus a no-op if you already requested read-only".

I like yours better, so it's just a matter of coming up with the correct 
documentation wording.

> 
> One thought I had is that for convenience options like -hda (or in fact
> -drive), auto-read-only=on could be the default, and only -blockdev and
> blockdev-add would disable it by default. That would suggest that we
> don't want to make it an error.

Yes, having convenience options set auto-read-only would not be too 
terrible (since those are already magic and designed for short-hand 
human use), as long as the low-level QMP commands don't add the magic 
(explicit control is better at the low levels).

> 
>> Or, another idea: is it worth trying to support a single tri-state member
>> (via an alternative between bool and enum, since the existing code uses a
>> JSON bool):
>>
>> "read-only": false (open for write, fail if not possible)
>> "read-only": true (open read-only, no later switching)
>> "read-only": "auto" (switch as needed; or for initial implementation attempt
>> for write with graceful fallback to read)
>> omitting read-only: same as "read-only":false for back-compat
> 
> If read-only were new, I would probably make it an enum, but adding it
> now isn't very practical. I did actually start with an alternate and it
> just wasn't very nice. One thing I remember is places that directly
> accessed the options QDict, for which you could now have either a bool, a
> string, an int or not present. It becomes a bit too much.

Fair enough. Maybe it's worth a commit message note that we at least 
considered and rejected alternate implementations.

> 
> As read-only is optional, we could make it true/false/absent without
> introducing an alternate and the additional int/string options, but I
> don't like that very much either.

No, that way is not introspectible.  Adding auto-read-only is much 
friendlier.

> 
> 
> While we're talking about the schema, another thing I considered was
> making auto-read-only an option only for the specific drivers that
> support it so introspection could tell the management tool whether the
> functionality is available. However, if we do this, we can't parse it in
> block.c code and use a flag any more, but need to parse it in each
> driver individually. Maybe it would be a better design anyway?

Which drivers do you have in mind? Ones like file-posix, gluster, and 
NBD that actually have a notion of opening either read-write or 
read-only, or others that are read-only no matter what?

I'm still not convinced that a per-driver option is smart, and am 
reasonably happy with you adding it globally.

> 
>>> @@ -1328,6 +1338,11 @@ QemuOptsList bdrv_runtime_opts = {
>>>                .type = QEMU_OPT_BOOL,
>>>                .help = "Node is opened in read-only mode",
>>>            },
>>> +        {
>>> +            .name = BDRV_OPT_AUTO_READ_ONLY,
>>> +            .type = QEMU_OPT_BOOL,
>>> +            .help = "Node can become read-only if opening read-write fails",
>>> +        },
>>
>> If we keep your current approach, is it worth mentioning that
>> auto-read-only true overrides read-only true?
> 
> This help text is never printed anywhere anyway... Maybe we should just
> delete it. What we refer to is the QAPI documentation anyway.

Are you sure it never gets printed, with some of the recent patches 
around trying to improve help output?

Patch

diff --git a/qapi/block-core.json b/qapi/block-core.json
index cfb37f8c1d..3a899298de 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3651,6 +3651,11 @@ 
 #                 either generally or in certain configurations. In this case,
 #                 the default value does not work and the option must be
 #                 specified explicitly.
+# @auto-read-only: if true, QEMU may ignore the @read-only option and
+#                  automatically decide whether to open the image read-only or
+#                  read-write (and switch between the modes later), e.g.
+#                  depending on whether the image file is writable or whether a
+#                  writing user is attached to the node (default: false).
 # @detect-zeroes: detect and optimize zero writes (Since 2.1)
 #                 (default: off)
 # @force-share:   force share all permission on added nodes.
@@ -3666,6 +3671,7 @@ 
             '*discard': 'BlockdevDiscardOptions',
             '*cache': 'BlockdevCacheOptions',
             '*read-only': 'bool',
+            '*auto-read-only': 'bool',
             '*force-share': 'bool',
             '*detect-zeroes': 'BlockdevDetectZeroesOptions' },
   'discriminator': 'driver',
diff --git a/include/block/block.h b/include/block/block.h
index b189cf422e..580b3716c3 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -115,6 +115,7 @@  typedef struct HDGeometry {
                                       select an appropriate protocol driver,
                                       ignoring the format layer */
 #define BDRV_O_NO_IO       0x10000 /* don't initialize for I/O */
+#define BDRV_O_AUTO_RDONLY 0x20000 /* degrade to read-only if opening read-write fails */
 
 #define BDRV_O_CACHE_MASK  (BDRV_O_NOCACHE | BDRV_O_NO_FLUSH)
 
@@ -125,6 +126,7 @@  typedef struct HDGeometry {
 #define BDRV_OPT_CACHE_DIRECT   "cache.direct"
 #define BDRV_OPT_CACHE_NO_FLUSH "cache.no-flush"
 #define BDRV_OPT_READ_ONLY      "read-only"
+#define BDRV_OPT_AUTO_READ_ONLY "auto-read-only"
 #define BDRV_OPT_DISCARD        "discard"
 #define BDRV_OPT_FORCE_SHARE    "force-share"
 
diff --git a/block.c b/block.c
index d7bd6d29b4..f999393e28 100644
--- a/block.c
+++ b/block.c
@@ -930,6 +930,7 @@  static void bdrv_inherited_options(int *child_flags, QDict *child_options,
 
     /* Inherit the read-only option from the parent if it's not set */
     qdict_copy_default(child_options, parent_options, BDRV_OPT_READ_ONLY);
+    qdict_copy_default(child_options, parent_options, BDRV_OPT_AUTO_READ_ONLY);
 
     /* Our block drivers take care to send flushes and respect unmap policy,
      * so we can default to enable both on lower layers regardless of the
@@ -1053,6 +1054,7 @@  static void bdrv_backing_options(int *child_flags, QDict *child_options,
 
     /* backing files always opened read-only */
     qdict_set_default_str(child_options, BDRV_OPT_READ_ONLY, "on");
+    qdict_set_default_str(child_options, BDRV_OPT_AUTO_READ_ONLY, "off");
     flags &= ~BDRV_O_COPY_ON_READ;
 
     /* snapshot=on is handled on the top layer */
@@ -1142,6 +1144,10 @@  static void update_flags_from_options(int *flags, QemuOpts *opts)
         *flags |= BDRV_O_RDWR;
     }
 
+    assert(qemu_opt_find(opts, BDRV_OPT_AUTO_READ_ONLY));
+    if (qemu_opt_get_bool_del(opts, BDRV_OPT_AUTO_READ_ONLY, false)) {
+        *flags |= BDRV_O_AUTO_RDONLY;
+    }
 }
 
 static void update_options_from_flags(QDict *options, int flags)
@@ -1156,6 +1162,10 @@  static void update_options_from_flags(QDict *options, int flags)
     if (!qdict_haskey(options, BDRV_OPT_READ_ONLY)) {
         qdict_put_bool(options, BDRV_OPT_READ_ONLY, !(flags & BDRV_O_RDWR));
     }
+    if (!qdict_haskey(options, BDRV_OPT_AUTO_READ_ONLY)) {
+        qdict_put_bool(options, BDRV_OPT_AUTO_READ_ONLY,
+                       flags & BDRV_O_AUTO_RDONLY);
+    }
 }
 
 static void bdrv_assign_node_name(BlockDriverState *bs,
@@ -1328,6 +1338,11 @@  QemuOptsList bdrv_runtime_opts = {
             .type = QEMU_OPT_BOOL,
             .help = "Node is opened in read-only mode",
         },
+        {
+            .name = BDRV_OPT_AUTO_READ_ONLY,
+            .type = QEMU_OPT_BOOL,
+            .help = "Node can become read-only if opening read-write fails",
+        },
         {
             .name = "detect-zeroes",
             .type = QEMU_OPT_STRING,
@@ -1430,7 +1445,9 @@  static int bdrv_open_common(BlockDriverState *bs, BlockBackend *file,
     assert(atomic_read(&bs->copy_on_read) == 0);
 
     if (bs->open_flags & BDRV_O_COPY_ON_READ) {
-        if (!bs->read_only) {
+        if ((bs->open_flags & (BDRV_O_RDWR | BDRV_O_AUTO_RDONLY))
+            == BDRV_O_RDWR)
+        {
             bdrv_enable_copy_on_read(bs);
         } else {
             error_setg(errp, "Can't use copy-on-read on read-only device");
@@ -2486,6 +2503,8 @@  BlockDriverState *bdrv_open_blockdev_ref(BlockdevRef *ref, Error **errp)
         qdict_set_default_str(qdict, BDRV_OPT_CACHE_DIRECT, "off");
         qdict_set_default_str(qdict, BDRV_OPT_CACHE_NO_FLUSH, "off");
         qdict_set_default_str(qdict, BDRV_OPT_READ_ONLY, "off");
+        qdict_set_default_str(qdict, BDRV_OPT_AUTO_READ_ONLY, "off");
+
     }
 
     bs = bdrv_open_inherit(NULL, reference, qdict, 0, NULL, NULL, errp);
diff --git a/block/vvfat.c b/block/vvfat.c
index f2e7d501cf..98ba5e2bac 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -3130,6 +3130,7 @@  static void vvfat_qcow_options(int *child_flags, QDict *child_options,
                                int parent_flags, QDict *parent_options)
 {
     qdict_set_default_str(child_options, BDRV_OPT_READ_ONLY, "off");
+    qdict_set_default_str(child_options, BDRV_OPT_AUTO_READ_ONLY, "off");
     qdict_set_default_str(child_options, BDRV_OPT_CACHE_NO_FLUSH, "on");
 }
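
For readers who want to see the flag logic of the block.c hunks in
isolation, the following is a minimal standalone sketch. It is a
simplification, not the code from the patch: flags_from_options() and
copy_on_read_allowed() are hypothetical helpers that take plain booleans
instead of QemuOpts, BDRV_O_AUTO_RDONLY uses the value from the patch, and
the BDRV_O_RDWR value is an assumption made so the example compiles on its
own.

```c
#include <assert.h>
#include <stdbool.h>

#define BDRV_O_RDWR        0x0002   /* assumed value, for illustration */
#define BDRV_O_AUTO_RDONLY 0x20000  /* value taken from the patch */

/* Simplified analogue of update_flags_from_options(). */
static int flags_from_options(bool read_only, bool auto_read_only)
{
    int flags = 0;

    if (!read_only) {
        flags |= BDRV_O_RDWR;
    }
    if (auto_read_only) {
        flags |= BDRV_O_AUTO_RDONLY;
    }
    return flags;
}

/*
 * Mirrors the condition in the bdrv_open_common() hunk: copy-on-read is
 * only enabled when the node is read-write and cannot silently degrade
 * to read-only.
 */
static bool copy_on_read_allowed(int open_flags)
{
    return (open_flags & (BDRV_O_RDWR | BDRV_O_AUTO_RDONLY)) == BDRV_O_RDWR;
}
```

Under these assumptions, only read-only=off together with
auto-read-only=off passes the check; every other combination ends up in the
"Can't use copy-on-read on read-only device" error path.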