diff mbox

add reopen to blockdev-transaction

Message ID 1330614819-26929-1-git-send-email-fsimonce@redhat.com
State New
Headers show

Commit Message

Federico Simoncelli March 1, 2012, 3:13 p.m. UTC
Signed-off-by: Federico Simoncelli <fsimonce@redhat.com>
---
 blockdev.c       |    8 ++++++++
 qapi-schema.json |   12 ++++++++++++
 qmp-commands.hx  |    6 +++++-
 3 files changed, 25 insertions(+), 1 deletions(-)

Comments

Paolo Bonzini March 1, 2012, 3:36 p.m. UTC | #1
Il 01/03/2012 16:13, Federico Simoncelli ha scritto:
> Signed-off-by: Federico Simoncelli <fsimonce@redhat.com>
> ---
>  blockdev.c       |    8 ++++++++
>  qapi-schema.json |   12 ++++++++++++
>  qmp-commands.hx  |    6 +++++-
>  3 files changed, 25 insertions(+), 1 deletions(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 56da5c9..36fe07c 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -798,6 +798,14 @@ void qmp_blockdev_transaction(BlockdevActionList *dev_list,
>                                           dev_info->mirror->target);
>              break;
>  
> +        case BLOCKDEV_ACTION_KIND_REOPEN:
> +            device = dev_info->reopen->device;
> +            if (dev_info->format->has_format) {
> +                format = dev_info->reopen->format;
> +            }
> +            new_source = g_strdup(dev_info->reopen->target);
> +            break;
> +
>          default:
>              abort();
>          }
> diff --git a/qapi-schema.json b/qapi-schema.json
> index b33875d..17f7548 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -1145,6 +1145,17 @@
>  { 'type': 'BlockdevMirror',
>    'data': { 'device': 'str', 'target': 'str', '*format': 'str',
>              '*reuse': 'bool' } }
> +##
> +# @BlockdevReopen
> +#
> +# @device: the name of the device to reopen.
> +#
> +# @target: the target of the new image.
> +#
> +# @format: #optional the format of the new image, default is 'qcow2'.
> +##
> +{ 'type': 'BlockdevReopen',
> +  'data': { 'device': 'str', 'target': 'str', '*format': 'str' } }
>  
>  ##
>  # @BlockdevAction
> @@ -1156,6 +1167,7 @@
>    'data': {
>         'snapshot': 'BlockdevSnapshot',
>         'mirror': 'BlockdevMirror',
> +       'reopen': 'BlockdevReopen',
>     } }
>  
>  ##
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index 50ac5a0..317c448 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -720,7 +720,7 @@ Arguments:
>  
>  actions array:
>      - "type": the operation to perform.  The only supported
> -      values are "snapshot" and "mirror". (json-string)
> +      values are "snapshot", "mirror" and "reopen". (json-string)
>      - "data": a dictionary.  The contents depend on the value
>        of "type".  When "type" is "snapshot":
>        - "device": device name to snapshot (json-string)
> @@ -734,6 +734,10 @@ actions array:
>        - "format": format of new image (json-string, optional)
>        - "reuse": whether QEMU should look for an existing image file
>          (json-bool, optional, default false)
> +      When "type" is "reopen":
> +      - "device": device name to reopen (json-string)
> +      - "target": name of destination image file (json-string)
> +      - "format": format of new image (json-string, optional)
>  
>  Example:
>  

This keeps the whole "tower" of backing_hds from the old block device,
right? (including the blkmirror!)  I think that instead you need to
modify bdrv_append, probably by saving the action kind in
BlkTransactionStates.

But you can even keep from your first patch the drive-reopen command and
not make it atomic, that shouldn't be a problem.

Paolo
Eric Blake March 1, 2012, 4:23 p.m. UTC | #2
On 03/01/2012 08:36 AM, Paolo Bonzini wrote:
> Il 01/03/2012 16:13, Federico Simoncelli ha scritto:
>> Signed-off-by: Federico Simoncelli <fsimonce@redhat.com>
>> ---
>>  blockdev.c       |    8 ++++++++
>>  qapi-schema.json |   12 ++++++++++++
>>  qmp-commands.hx  |    6 +++++-
>>  3 files changed, 25 insertions(+), 1 deletions(-)
>>

>> @@ -734,6 +734,10 @@ actions array:
>>        - "format": format of new image (json-string, optional)
>>        - "reuse": whether QEMU should look for an existing image file
>>          (json-bool, optional, default false)
>> +      When "type" is "reopen":
>> +      - "device": device name to reopen (json-string)
>> +      - "target": name of destination image file (json-string)
>> +      - "format": format of new image (json-string, optional)
>>  
>>  Example:
>>  
> 
> This keeps the whole "tower" of backing_hds from the old block device,
> right? (including the blkmirror!)  I think that instead you need to
> modify bdrv_append, probably by saving the action kind in
> BlkTransactionStates.

Do we need more flexibility in what is being reopened?

I see several possibilities.  In the examples below, I'm using '*' for
any file that qemu has an open fd for, and [] to show the contents of a
backing file field within that file (which can be relative or absolute).

1. post-copy, oVirt creates the snapshot with a relative backing file
name.  That is, when using blockdev-transaction with the 'reuse' flag
set to true, qemu is merely opening the new file, ignoring the backing
file name present in the file it just opens, and using the current open
fd as the backing file of the just-opened fd.  That would give this setup:

> */store1/base <- */store1/snap1[backed by 'base'] <-\
>  /store2/base <-  /store2/snap1[backed by 'base']    \- */store2/snap2[backed by 'snap1'] <- qemu

and the action of the reopen will actually follow the relative backing
file name.  That is, 'snap2' did not technically have to be reopened, so
much as 'snap1'; but the act of reopening the entire 'snap2' chain
causes the swap so that all open files are now from /store2.

>  /store1/base <-  /store1/snap1[backed by 'base']
> */store2/base <- */store2/snap1[backed by 'base'] <- */store2/snap2[backed by 'snap1'] <- qemu

2. post-copy, letting qemu create the snapshot with absolute backing
file name.  We are starting with:

> */store1/base <- */store1/snap1[backed by '/store1/base'] <-\
>  /store2/base <-  /store2/snap1[backed by '/store2/base']    \- */store2/snap2[backed by '/store1/snap1'] <- qemu

and here, we actually need to rewrite the snap2 file to point to a
different backing file.  Again, we don't technically need to reopen
'snap2', so much as we need to reopen a different 'snap1' as well as
rewrite the backing file of snap2, to end with:

>  /store1/base <-  /store1/snap1[backed by '/store1/base']
> */store2/base <- */store2/snap1[backed by '/store2/base'] <- */store2/snap2[backed by '/store2/snap1'] <- qemu

3. mirrored migration, where oVirt pre-creates the files with relative
backing names.  qemu doesn't open the full chain of the mirror file, so
we are starting with:

> */store1/base <- */store1/snap1[backed by 'base'] <-+-- */store1/snap2[backed by 'snap1'] <- qemu r/w
>  /store2/base <-  /store2/snap1[backed by 'base']    \- */store2/snap2[backed by 'snap1'] <- qemu write only

and here, we really want the reopen to drop the side of the mirror that
we are no longer using, as well as to follow the full chain of the side
that we are swapping to.  Again, it is not technically necessary to
reopen store2/snap2 (which is already open), so much as to open the
backing chain, to end with:

>  /store1/base <-  /store1/snap1[backed by 'base'] <-  /store1/snap2[backed by 'snap1']
> */store2/base <- */store2/snap1[backed by 'base'] <- */store2/snap2[backed by 'snap1'] <- qemu r/w

4. mirrored migration, where qemu creates the snapshot and mirror with
absolute backing names.  We are starting with:

> */store1/base <- */store1/snap1[backed by '/store1/base'] <-+-- */store1/snap2[backed by '/store1/snap1'] <- qemu r/w
>  /store2/base <-  /store2/snap1[backed by '/store2/base']    \- */store2/snap2[backed by '/store1/snap1'] <- qemu write only

and just as in scenario 2, we want to rewrite the backing file stored in
store2/snap2 as part of the reopen, to end with:

>  /store1/base <-  /store1/snap1[backed by '/store1/base'] <-  /store1/snap2[backed by '/store1/snap1']
> */store2/base <- */store2/snap1[backed by '/store2/base'] <- */store2/snap2[backed by '/store2/snap1'] <- qemu r/w


Notice that in all 4 cases, the right file in current use by qemu was
already open; what this operation is really doing is reopening an
updated backing chain, as well as possibly discarding a mirror and/or
rewriting the backing file of the top level file as part of opening the
new backing chain.  I think we may need an optional argument that says
that when reopening a file, we also want to rewrite its backing chain to
the specified file name and type, as in:

{ 'type': 'BlockdevReopen',
  'data': { 'device': 'str', 'target': 'str', '*format': 'str',
            '*backing': 'str', '*backing-format', 'str' } }

Swap 'device' to use the reopened 'target' (with given 'format') as its
current file, as well as reopening the entire backing file chain of
'target'.  This discards any mirror files that were previously in use.
Optionally, if 'backing' is given, rewrite 'target' to use 'backing'
rather than using the backing file currently recorded in target.

> But you can even keep from your first patch the drive-reopen command and
> not make it atomic, that shouldn't be a problem.

I'm not sure whether it makes sense for a separate drive-reopen or
whether to just add this to blockdev-transaction (or even both); I can
make libvirt use whichever color bikeshed we pick.  There's definitely a
transaction aspect here, though - we are reopening _all_ files along the
backing chain, so we must be able to roll back without rewriting any
backing file or altering the blockdev stack of in-use files if any of
the opens fail.
Paolo Bonzini March 1, 2012, 4:52 p.m. UTC | #3
> Do we need more flexibility in what is being reopened?
> 
> I see several possibilities.  In the examples below, I'm using '*' for
> any file that qemu has an open fd for, and [] to show the contents of a
> backing file field within that file (which can be relative or absolute).
> 
> 1. post-copy, oVirt creates the snapshot with a relative backing file
> name.  That is, when using blockdev-transaction with the 'reuse' flag
> set to true, qemu is merely opening the new file, ignoring the backing
> file name present in the file it just opens, and using the current open
> fd as the backing file of the just-opened fd.
> 
> 2. post-copy, letting qemu create the snapshot with absolute backing
> file name.  We are starting with:
> 
>> */store1/base <- */store1/snap1[backed by '/store1/base'] <-\
>>  /store2/base <-  /store2/snap1[backed by '/store2/base']    \- */store2/snap2[backed by '/store1/snap1'] <- qemu
> 
> and here, we actually need to rewrite the snap2 file to point to a
> different backing file.

You don't want to do this.  We could add:

* a "relative-backing-file" boolean to the snapshot JSON.  It would
compute a relative path to the current image file and store it.

* a "relative-backing-file" boolean, *or* a
"backing-file"/"backing-format" pair of strings to the mirror JSON.  The
latter is obvious, but I'm not sure I like it.  relative-path would
compute a relative path from the current image file to its backing file,
and store it.  So if you had

   /store0/templates/base <- /store0/vms/snap1

it would store ../templates/base in the newly-created image.

There's a third case, in which you do not want a snapshot to be created
at all, for example when migrating with streaming.  So overall you'd have:

* snapshot = true, relative-backing-file = false (default)
* snapshot = true, relative-backing-file = true (oVirt desired)
* snapshot = false, relative-backing-file doesn't matter

In any case, rewriting after the fact is wrong.

> Again, we don't technically need to reopen
> 'snap2', so much as we need to reopen a different 'snap1' to end with:
> 
>>  /store1/base <-  /store1/snap1[backed by '/store1/base']
>> */store2/base <- */store2/snap1[backed by '/store2/base'] <- */store2/snap2[backed by '/store2/snap1'] <- qemu

So in both cases we need to reopen the full tower here.

> Notice that in all 4 cases, the right file in current use by qemu was
> already open; what this operation is really doing is reopening an
> updated backing chain, as well as possibly discarding a mirror and/or
> rewriting the backing file of the top level file as part of opening the
> new backing chain.  I think we may need an optional argument that says
> that when reopening a file, we also want to rewrite its backing chain to
> the specified file name and type, as in:
> 
> { 'type': 'BlockdevReopen',
>   'data': { 'device': 'str', 'target': 'str', '*format': 'str',
>             '*backing': 'str', '*backing-format', 'str' } }

I think you're complicating the thing a lot.  Just ignore cases (2) and
(4), they are wrong for you and you are not going to do them anyway.
You just want to *reopen the device*.  Let QEMU pick the backing files.
 If they're wrong, something is wrong in the previous steps, and it's
there that we should expand the JSON.  (In the meanwhile, you can work
around it with reuse:true).

>> But you can even keep from your first patch the drive-reopen command and
>> not make it atomic, that shouldn't be a problem.
> 
> I'm not sure whether it makes sense for a separate drive-reopen or
> whether to just add this to blockdev-transaction (or even both); I can
> make libvirt use whichever color bikeshed we pick.  There's definitely a
> transaction aspect here

It's not so much atomicity, it's just safety.  The drive-reopen command
must be implemented in a similar way to bdrv_append; it must not do a
close+reopen in the same way as the existing blockdev-snapshot-sync
command, but that's just that blockdev-snapshot-sync was implemented
poorly.

Paolo
Kevin Wolf March 2, 2012, 1 p.m. UTC | #4
Am 01.03.2012 17:52, schrieb Paolo Bonzini:
>>> But you can even keep from your first patch the drive-reopen command and
>>> not make it atomic, that shouldn't be a problem.
>>
>> I'm not sure whether it makes sense for a separate drive-reopen or
>> whether to just add this to blockdev-transaction (or even both); I can
>> make libvirt use whichever color bikeshed we pick.  There's definitely a
>> transaction aspect here
> 
> It's not so much atomicity, it's just safety.  The drive-reopen command
> must be implemented in a similar way to bdrv_append; it must not do a
> close+reopen in the same way as the existing blockdev-snapshot-sync
> command, but that's just that blockdev-snapshot-sync was implemented
> poorly.

For reopen this is a bit harder because you deal with already opened
images and you must never have the same image opened twice at the same time.

Kevin
Anthony Liguori March 2, 2012, 1:08 p.m. UTC | #5
On 03/01/2012 09:13 AM, Federico Simoncelli wrote:
> Signed-off-by: Federico Simoncelli<fsimonce@redhat.com>

This is a good example of the introspection comment.  libvirt could never figure 
out this was available unless this came in with the original command.

Regards,

Anthony Liguori

> ---
>   blockdev.c       |    8 ++++++++
>   qapi-schema.json |   12 ++++++++++++
>   qmp-commands.hx  |    6 +++++-
>   3 files changed, 25 insertions(+), 1 deletions(-)
>
> diff --git a/blockdev.c b/blockdev.c
> index 56da5c9..36fe07c 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -798,6 +798,14 @@ void qmp_blockdev_transaction(BlockdevActionList *dev_list,
>                                            dev_info->mirror->target);
>               break;
>
> +        case BLOCKDEV_ACTION_KIND_REOPEN:
> +            device = dev_info->reopen->device;
> +            if (dev_info->format->has_format) {
> +                format = dev_info->reopen->format;
> +            }
> +            new_source = g_strdup(dev_info->reopen->target);
> +            break;
> +
>           default:
>               abort();
>           }
> diff --git a/qapi-schema.json b/qapi-schema.json
> index b33875d..17f7548 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -1145,6 +1145,17 @@
>   { 'type': 'BlockdevMirror',
>     'data': { 'device': 'str', 'target': 'str', '*format': 'str',
>               '*reuse': 'bool' } }
> +##
> +# @BlockdevReopen
> +#
> +# @device: the name of the device to reopen.
> +#
> +# @target: the target of the new image.
> +#
> +# @format: #optional the format of the new image, default is 'qcow2'.
> +##
> +{ 'type': 'BlockdevReopen',
> +  'data': { 'device': 'str', 'target': 'str', '*format': 'str' } }
>
>   ##
>   # @BlockdevAction
> @@ -1156,6 +1167,7 @@
>     'data': {
>          'snapshot': 'BlockdevSnapshot',
>          'mirror': 'BlockdevMirror',
> +       'reopen': 'BlockdevReopen',
>      } }
>
>   ##
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index 50ac5a0..317c448 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -720,7 +720,7 @@ Arguments:
>
>   actions array:
>       - "type": the operation to perform.  The only supported
> -      values are "snapshot" and "mirror". (json-string)
> +      values are "snapshot", "mirror" and "reopen". (json-string)
>       - "data": a dictionary.  The contents depend on the value
>         of "type".  When "type" is "snapshot":
>         - "device": device name to snapshot (json-string)
> @@ -734,6 +734,10 @@ actions array:
>         - "format": format of new image (json-string, optional)
>         - "reuse": whether QEMU should look for an existing image file
>           (json-bool, optional, default false)
> +      When "type" is "reopen":
> +      - "device": device name to reopen (json-string)
> +      - "target": name of destination image file (json-string)
> +      - "format": format of new image (json-string, optional)
>
>   Example:
>
Paolo Bonzini March 2, 2012, 1:25 p.m. UTC | #6
Il 02/03/2012 14:00, Kevin Wolf ha scritto:
> Am 01.03.2012 17:52, schrieb Paolo Bonzini:
>>>> But you can even keep from your first patch the drive-reopen command and
>>>> not make it atomic, that shouldn't be a problem.
>>>
>>> I'm not sure whether it makes sense for a separate drive-reopen or
>>> whether to just add this to blockdev-transaction (or even both); I can
>>> make libvirt use whichever color bikeshed we pick.  There's definitely a
>>> transaction aspect here
>>
>> It's not so much atomicity, it's just safety.  The drive-reopen command
>> must be implemented in a similar way to bdrv_append; it must not do a
>> close+reopen in the same way as the existing blockdev-snapshot-sync
>> command, but that's just that blockdev-snapshot-sync was implemented
>> poorly.
> 
> For reopen this is a bit harder because you deal with already opened
> images and you must never have the same image opened twice at the same time.

This is only for read-write images, and the backing files are read-only,
so this shouldn't be a problem, no?

Hmm, actually there could one problem.  Say you switch from base->old to
base->new; if old is the outcome of a live snapshot operation, base will
still be open read-write.

So perhaps we do need a new method like bdrv_freeze; after bdrv_freeze
you know that bdrv_close will not need to write to disk.  So e.g. for
QED bdrv_freeze will turn off the need-check bit.

Paolo
Kevin Wolf March 2, 2012, 1:38 p.m. UTC | #7
Am 02.03.2012 14:25, schrieb Paolo Bonzini:
> Il 02/03/2012 14:00, Kevin Wolf ha scritto:
>> Am 01.03.2012 17:52, schrieb Paolo Bonzini:
>>>>> But you can even keep from your first patch the drive-reopen command and
>>>>> not make it atomic, that shouldn't be a problem.
>>>>
>>>> I'm not sure whether it makes sense for a separate drive-reopen or
>>>> whether to just add this to blockdev-transaction (or even both); I can
>>>> make libvirt use whichever color bikeshed we pick.  There's definitely a
>>>> transaction aspect here
>>>
>>> It's not so much atomicity, it's just safety.  The drive-reopen command
>>> must be implemented in a similar way to bdrv_append; it must not do a
>>> close+reopen in the same way as the existing blockdev-snapshot-sync
>>> command, but that's just that blockdev-snapshot-sync was implemented
>>> poorly.
>>
>> For reopen this is a bit harder because you deal with already opened
>> images and you must never have the same image opened twice at the same time.
> 
> This is only for read-write images, and the backing files are read-only,
> so this shouldn't be a problem, no?

Opening an image read-write that is still open read-only may break the
read-only instance.

You can argue that opening an image read-only while a read-write
instance is open can be tolerated if you flushed the image and made sure
no new requests are coming in. This is what happens with live migration.
It's a case that has given us enough headaches that I would not want to
introduce similar behaviour in more cases.

So in short: Regardless of ro/rw, opening images twice is bad. Just
don't do it.

If anything, a possible solution could look like the bdrv_reopen
proposal which already includes prepare/commit/abort functions in the
block driver.

Kevin
diff mbox

Patch

diff --git a/blockdev.c b/blockdev.c
index 56da5c9..36fe07c 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -798,6 +798,14 @@  void qmp_blockdev_transaction(BlockdevActionList *dev_list,
                                          dev_info->mirror->target);
             break;
 
+        case BLOCKDEV_ACTION_KIND_REOPEN:
+            device = dev_info->reopen->device;
+            if (dev_info->format->has_format) {
+                format = dev_info->reopen->format;
+            }
+            new_source = g_strdup(dev_info->reopen->target);
+            break;
+
         default:
             abort();
         }
diff --git a/qapi-schema.json b/qapi-schema.json
index b33875d..17f7548 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1145,6 +1145,17 @@ 
 { 'type': 'BlockdevMirror',
   'data': { 'device': 'str', 'target': 'str', '*format': 'str',
             '*reuse': 'bool' } }
+##
+# @BlockdevReopen
+#
+# @device: the name of the device to reopen.
+#
+# @target: the target of the new image.
+#
+# @format: #optional the format of the new image, default is 'qcow2'.
+##
+{ 'type': 'BlockdevReopen',
+  'data': { 'device': 'str', 'target': 'str', '*format': 'str' } }
 
 ##
 # @BlockdevAction
@@ -1156,6 +1167,7 @@ 
   'data': {
        'snapshot': 'BlockdevSnapshot',
        'mirror': 'BlockdevMirror',
+       'reopen': 'BlockdevReopen',
    } }
 
 ##
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 50ac5a0..317c448 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -720,7 +720,7 @@  Arguments:
 
 actions array:
     - "type": the operation to perform.  The only supported
-      values are "snapshot" and "mirror". (json-string)
+      values are "snapshot", "mirror" and "reopen". (json-string)
     - "data": a dictionary.  The contents depend on the value
       of "type".  When "type" is "snapshot":
       - "device": device name to snapshot (json-string)
@@ -734,6 +734,10 @@  actions array:
       - "format": format of new image (json-string, optional)
       - "reuse": whether QEMU should look for an existing image file
         (json-bool, optional, default false)
+      When "type" is "reopen":
+      - "device": device name to reopen (json-string)
+      - "target": name of destination image file (json-string)
+      - "format": format of new image (json-string, optional)
 
 Example: