
[v4,3/6] add backup related monitor commands

Message ID 1361352723-218544-4-git-send-email-dietmar@proxmox.com
State New

Commit Message

Dietmar Maurer Feb. 20, 2013, 9:32 a.m. UTC
We use a generic BackupDriver struct to encapsulate all archive format
related functions.

Another option would be to simply dump <devid,cluster_num,cluster_data> to
the output fh (pipe), and an external binary saves the data. That way we
could move the whole archive format related code out of qemu.

Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
---
 backup.h         |   12 ++
 blockdev.c       |  423 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hmp-commands.hx  |   31 ++++
 hmp.c            |   63 ++++++++
 hmp.h            |    3 +
 monitor.c        |    7 +
 qapi-schema.json |   95 ++++++++++++
 qmp-commands.hx  |   27 ++++
 8 files changed, 661 insertions(+), 0 deletions(-)

Comments

Stefan Hajnoczi Feb. 20, 2013, 2:29 p.m. UTC | #1
On Wed, Feb 20, 2013 at 10:32:00AM +0100, Dietmar Maurer wrote:
> We use a generic BackupDriver struct to encapsulate all archive format
> related function.
> 
> Another option would be to simply dump <devid,cluster_num,cluster_data> to
> the output fh (pipe), and an external binary saves the data. That way we
> could move the whole archive format related code out of qemu.

That sounds like the NBD option - write the backup to an NBD disk image.
The NBD server process can take the WRITE requests and do whatever it
wants.

The disadvantage of using NBD is that it strictly transports block data.
It doesn't really have the concept of VM configuration or device state.

> diff --git a/backup.h b/backup.h
> index d9395bc..c8ba153 100644
> --- a/backup.h
> +++ b/backup.h
> @@ -29,4 +29,16 @@ int backup_job_create(BlockDriverState *bs, BackupDumpFunc *backup_dump_cb,
>                        BlockDriverCompletionFunc *backup_complete_cb,
>                        void *opaque, int64_t speed);
>  
> +typedef struct BackupDriver {
> +    const char *format;
> +    void *(*open_cb)(const char *filename, uuid_t uuid, Error **errp);
> +    int (*close_cb)(void *opaque, Error **errp);
> +    int (*register_config_cb)(void *opaque, const char *name, gpointer data,
> +                              size_t data_len);
> +    int (*register_stream_cb)(void *opaque, const char *devname, size_t size);
> +    int (*dump_cb)(void *opaque, uint8_t dev_id, int64_t cluster_num,
> +                   unsigned char *buf, size_t *zero_bytes);
> +    int (*complete_cb)(void *opaque, uint8_t dev_id, int ret);

No need to suffix the functions with _cb.
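
With the suffix dropped, a concrete driver can then be wired up roughly like
this (sketch only - the vma_writer_* names are made up here, and it assumes
the struct members are renamed as well):

/* Hypothetical driver instance; the vma_writer_* helpers would live in the
 * archive format code, outside blockdev.c. */
static const BackupDriver vma_backup_driver = {
    .format          = "vma",
    .open            = vma_writer_open,
    .close           = vma_writer_close,
    .register_config = vma_writer_register_config,
    .register_stream = vma_writer_register_stream,
    .dump            = vma_writer_dump,
    .complete        = vma_writer_complete,
};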

> +    /* add configuration file to archive */
> +    if (has_config_file) {
> +        char *cdata = NULL;
> +        gsize clen = 0;
> +        GError *err = NULL;
> +        if (!g_file_get_contents(config_file, &cdata, &clen, &err)) {
> +            error_setg(errp, "unable to read file '%s'", config_file);
> +            goto err;
> +        }
> +
> +        const char *basename = g_path_get_basename(config_file);
> +        if (driver->register_config_cb(writer, basename, cdata, clen) < 0) {
> +            error_setg(errp, "register_config failed");
> +            g_free(cdata);
> +            goto err;
> +        }
> +        g_free(cdata);
> +    }

This is doing too much inside QEMU.

First off, when QEMU is sandboxed or run with SELinux, we cannot expect
to be able to access arbitrary files.  This is why new QMP commands of
this sort usually support file descriptor passing - then a more
privileged component can give QEMU access.
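
For example (sketch only - the helper and a possible config-fd argument are
hypothetical, not part of this series), the command could take the name of a
descriptor previously handed over with the 'getfd' command:

/* Sketch: read the configuration through a descriptor passed via 'getfd'
 * under the name @fdname, instead of opening a path inside QEMU.
 * The caller g_free()s the returned buffer. */
static char *backup_read_config_fd(const char *fdname, size_t *len, Error **errp)
{
    int fd = monitor_get_fd(cur_mon, fdname, errp);
    GString *data;
    char buf[4096];
    ssize_t n;

    if (fd < 0) {
        return NULL;
    }
    data = g_string_new(NULL);
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        g_string_append_len(data, buf, n);
    }
    close(fd);
    *len = data->len;
    return g_string_free(data, FALSE);
}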

g_file_get_contents() hangs the VM if the file is over NFS while the
server is down.  This is bad for reliability.

Management tools (proxmox, libvirt, or something else) handle the VM
configuration.  It may not be a single file.  Saving external VM
configuration is out of scope for QEMU.

> +##
> +# @backup:
> +#
> +# Starts a VM backup.
> +#
> +# @backup-file: the backup file name
> +#
> +# @format: format of the backup file
> +#
> +# @config-filename: #optional name of a configuration file to include into
> +# the backup archive.
> +#
> +# @speed: #optional the maximum speed, in bytes per second
> +#
> +# @devlist: #optional list of block device names (separated by ',', ';'
> +# or ':'). By default the backup includes all writable block devices.
> +#
> +# Returns: the uuid of the backup job
> +#
> +# Since: 1.5.0
> +##
> +{ 'command': 'backup', 'data': { 'backup-file': 'str',
> +                                 '*format': 'BackupFormat',
> +                                 '*config-file': 'str',
> +                                 '*devlist': 'str', '*speed': 'int' },

devlist should be ['String'].

> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index 799adea..17e381b 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -889,6 +889,18 @@ EQMP
>      },
>  
>      {
> +        .name       = "backup",
> +        .args_type  = "backup-file:s,format:s?,config-file:F?,speed:o?,devlist:s?",
> +        .mhandler.cmd_new = qmp_marshal_input_backup,
> +    },
> +
> +    {
> +        .name       = "backup-cancel",
> +        .args_type  = "",
> +        .mhandler.cmd_new = qmp_marshal_input_backup_cancel,
> +    },

We might want to run more than one backup if the guest has multiple disks.
For example, the database disk is backed up independently of the main OS
disk.

This API only allows one backup to run at a time.
Dietmar Maurer Feb. 21, 2013, 6:21 a.m. UTC | #2
> > Another option would be to simply dump
> > <devid,cluster_num,cluster_data> to the output fh (pipe), and an
> > external binary saves the data. That way we could move the whole archive
> format related code out of qemu.
> 
> That sounds like the NBD option - write the backup to an NBD disk image.
> The NBD server process can take the WRITE requests and do whatever it wants.
> 
> The disadvantage of using NBD is that it strictly transports block data.
> It doesn't really have the concept of VM configuration or device state.

My thought was that the BackupDriver is the simplest way to allow different backends.
For example, you can easily create a BackupDriver to write to an NBD server (sure, you still need
to find a way to store the configuration).

> > diff --git a/backup.h b/backup.h
> > index d9395bc..c8ba153 100644
> > --- a/backup.h
> > +++ b/backup.h
> > @@ -29,4 +29,16 @@ int backup_job_create(BlockDriverState *bs,
> BackupDumpFunc *backup_dump_cb,
> >                        BlockDriverCompletionFunc *backup_complete_cb,
> >                        void *opaque, int64_t speed);
> >
> > +typedef struct BackupDriver {
> > +    const char *format;
> > +    void *(*open_cb)(const char *filename, uuid_t uuid, Error **errp);
> > +    int (*close_cb)(void *opaque, Error **errp);
> > +    int (*register_config_cb)(void *opaque, const char *name, gpointer data,
> > +                              size_t data_len);
> > +    int (*register_stream_cb)(void *opaque, const char *devname, size_t size);
> > +    int (*dump_cb)(void *opaque, uint8_t dev_id, int64_t cluster_num,
> > +                   unsigned char *buf, size_t *zero_bytes);
> > +    int (*complete_cb)(void *opaque, uint8_t dev_id, int ret);
> 
> No need to suffix the functions with _cb.

Ok, will remove that.

> 
> > +    /* add configuration file to archive */
> > +    if (has_config_file) {
> > +        char *cdata = NULL;
> > +        gsize clen = 0;
> > +        GError *err = NULL;
> > +        if (!g_file_get_contents(config_file, &cdata, &clen, &err)) {
> > +            error_setg(errp, "unable to read file '%s'", config_file);
> > +            goto err;
> > +        }
> > +
> > +        const char *basename = g_path_get_basename(config_file);
> > +        if (driver->register_config_cb(writer, basename, cdata, clen) < 0) {
> > +            error_setg(errp, "register_config failed");
> > +            g_free(cdata);
> > +            goto err;
> > +        }
> > +        g_free(cdata);
> > +    }
> 
> This is doing too much inside QEMU.
> 
> First off, when QEMU is sandboxed or run with SELinux, we cannot expect to be
> able to access arbitrary files.  This is why new QMP commands of this sort
> usually support file descriptor passing - then a more privileged component can
> give QEMU access.

We can allow passing an fd for that later (if someone really uses it that way).

> g_file_get_contents() hangs the VM if the file is over NFS while the server is
> down.  This is bad for reliability.

The solution is not to pass a path that is on NFS. But that is up to the management tool.
 
> Management tools (proxmox, libvirt, or something else) handle the VM
> configuration.  It may not be a single file.  Saving external VM configuration is
> out of scope for QEMU.

Backup includes the configuration, so you need a way to save that.
But I do not really get your suggestion - creating a backup without the configuration
does not make much sense, does it?

In the future, we can allow passing multiple config files - the vma archive format can
already handle that.

> > +##
> > +# @backup:
> > +#
> > +# Starts a VM backup.
> > +#
> > +# @backup-file: the backup file name
> > +#
> > +# @format: format of the backup file
> > +#
> > +# @config-filename: #optional name of a configuration file to include
> > +into # the backup archive.
> > +#
> > +# @speed: #optional the maximum speed, in bytes per second # #
> > +@devlist: #optional list of block device names (separated by ',', ';'
> > +# or ':'). By default the backup includes all writable block devices.
> > +#
> > +# Returns: the uuid of the backup job # # Since: 1.5.0 ## {
> > +'command': 'backup', 'data': { 'backup-file': 'str',
> > +                                 '*format': 'BackupFormat',
> > +                                 '*config-file': 'str',
> > +                                 '*devlist': 'str', '*speed': 'int'
> > +},
> 
> devlist should be ['String'].

I want to be able to use that command on the human monitor.
That is no longer possible if I use ['String']?

> 
> > diff --git a/qmp-commands.hx b/qmp-commands.hx index 799adea..17e381b
> > 100644
> > --- a/qmp-commands.hx
> > +++ b/qmp-commands.hx
> > @@ -889,6 +889,18 @@ EQMP
> >      },
> >
> >      {
> > +        .name       = "backup",
> > +        .args_type  = "backup-file:s,format:s?,config-file:F?,speed:o?,devlist:s?",
> > +        .mhandler.cmd_new = qmp_marshal_input_backup,
> > +    },
> > +
> > +    {
> > +        .name       = "backup-cancel",
> > +        .args_type  = "",
> > +        .mhandler.cmd_new = qmp_marshal_input_backup_cancel,
> > +    },
> 
> We might want to more than one backup if the guest has multiple disks.
> For example, the database disk is backed up independently of the main OS disk.
> 
> This API only allows one backup to run at a time.

I do not want multiple backup jobs. You can easily run two jobs in sequence to solve the above use case.
Stefan Hajnoczi Feb. 21, 2013, 1:03 p.m. UTC | #3
On Thu, Feb 21, 2013 at 06:21:32AM +0000, Dietmar Maurer wrote:
> > > +    /* add configuration file to archive */
> > > +    if (has_config_file) {
> > > +        char *cdata = NULL;
> > > +        gsize clen = 0;
> > > +        GError *err = NULL;
> > > +        if (!g_file_get_contents(config_file, &cdata, &clen, &err)) {
> > > +            error_setg(errp, "unable to read file '%s'", config_file);
> > > +            goto err;
> > > +        }
> > > +
> > > +        const char *basename = g_path_get_basename(config_file);
> > > +        if (driver->register_config_cb(writer, basename, cdata, clen) < 0) {
> > > +            error_setg(errp, "register_config failed");
> > > +            g_free(cdata);
> > > +            goto err;
> > > +        }
> > > +        g_free(cdata);
> > > +    }
> > 
> > This is doing too much inside QEMU.
> > 
> > First off, when QEMU is sandboxed or run with SELinux, we cannot expect to be
> > able to access arbitrary files.  This is why new QMP commands of this sort
> > usually support file descriptor passing - then a more privileged component can
> > give QEMU access.
> 
> We can allow to pass fd for that later (if someone really uses it that way).
> 
> > g_file_get_contents() hangs the VM if the file is over NFS while the server is
> > down.  This is bad for reliability.
> 
> The solution is to not pass a path which is on NFS. But that is up to the management tool.
>  
> > Management tools (proxmox, libvirt, or something else) handle the VM
> > configuration.  It may not be a single file.  Saving external VM configuration is
> > out of scope for QEMU.
> 
> Backup includes configuration, so you need a way to save that.
> But I do not really get your suggestion - creating a backup without configuration
> does not make much sense?
> 
> In future, we can allow to pass multiple config files - the vma archive format can
> already handle that.

My point is that QEMU has no business dealing with the management tool's
VM configuration file.  And I think the management tool-specific archive
format also shouldn't be in QEMU.

How will a proxmox user restore a VMA file generated with oVirt since
the configuration file comes from the management layer?  They can't
because the VMA is specific to the management layer and should be
handled there, not inside QEMU.

QEMU must provide the mechanism for point-in-time backups of block
devices - your backup block job implements this.

Where I disagree with this patch series is that you put the management
tool-specific archive format writer into QEMU.  That is outside the
scope of QEMU.

The management tool should tell QEMU to take the backups of block
devices and do a live migration to file.

The management tool can use a NBD server if it wants to capture all the
block device backups into a single archive.  And the management tool can
bundle the VM configuration into that archive too.  But those steps are
beyond the scope of QEMU.

The approach I'm suggesting is more modular.  For example, the backup
block job can also be used to copy out the state of a disk into a new
qcow2 file.

> > > +##
> > > +# @backup:
> > > +#
> > > +# Starts a VM backup.
> > > +#
> > > +# @backup-file: the backup file name
> > > +#
> > > +# @format: format of the backup file
> > > +#
> > > +# @config-filename: #optional name of a configuration file to include
> > > +into # the backup archive.
> > > +#
> > > +# @speed: #optional the maximum speed, in bytes per second # #
> > > +@devlist: #optional list of block device names (separated by ',', ';'
> > > +# or ':'). By default the backup includes all writable block devices.
> > > +#
> > > +# Returns: the uuid of the backup job # # Since: 1.5.0 ## {
> > > +'command': 'backup', 'data': { 'backup-file': 'str',
> > > +                                 '*format': 'BackupFormat',
> > > +                                 '*config-file': 'str',
> > > +                                 '*devlist': 'str', '*speed': 'int'
> > > +},
> > 
> > devlist should be ['String'].
> 
> I want to be able to use that command on the human monitor.
> That is no longer possible if I use ['String']?

Good question.  I don't know the answer but perhaps Luiz or Anthony do
(CCed)?

> > > diff --git a/qmp-commands.hx b/qmp-commands.hx index 799adea..17e381b
> > > 100644
> > > --- a/qmp-commands.hx
> > > +++ b/qmp-commands.hx
> > > @@ -889,6 +889,18 @@ EQMP
> > >      },
> > >
> > >      {
> > > +        .name       = "backup",
> > > +        .args_type  = "backup-file:s,format:s?,config-file:F?,speed:o?,devlist:s?",
> > > +        .mhandler.cmd_new = qmp_marshal_input_backup,
> > > +    },
> > > +
> > > +    {
> > > +        .name       = "backup-cancel",
> > > +        .args_type  = "",
> > > +        .mhandler.cmd_new = qmp_marshal_input_backup_cancel,
> > > +    },
> > 
> > We might want to more than one backup if the guest has multiple disks.
> > For example, the database disk is backed up independently of the main OS disk.
> > 
> > This API only allows one backup to run at a time.
> 
> I do not want multiple backup jobs. You can easily run 2 jobs in sequence to solve above use case.

Why do you not want multiple backup jobs?  It makes perfect sense to
separate database disks from main OS disks.  They have different backup
characteristics (how often to back up, how to restore) so it's likely
that users will ask for multiple backup jobs at the same time.  Let's
get the QMP interfaces right so that it can be supported in the future,
if not right away.

Stefan
Stefan Hajnoczi Feb. 21, 2013, 1:47 p.m. UTC | #4
On Thu, Feb 21, 2013 at 06:21:32AM +0000, Dietmar Maurer wrote:
> > > +##
> > > +# @backup:
> > > +#
> > > +# Starts a VM backup.
> > > +#
> > > +# @backup-file: the backup file name
> > > +#
> > > +# @format: format of the backup file
> > > +#
> > > +# @config-filename: #optional name of a configuration file to include
> > > +into # the backup archive.
> > > +#
> > > +# @speed: #optional the maximum speed, in bytes per second # #
> > > +@devlist: #optional list of block device names (separated by ',', ';'
> > > +# or ':'). By default the backup includes all writable block devices.
> > > +#
> > > +# Returns: the uuid of the backup job # # Since: 1.5.0 ## {
> > > +'command': 'backup', 'data': { 'backup-file': 'str',
> > > +                                 '*format': 'BackupFormat',
> > > +                                 '*config-file': 'str',
> > > +                                 '*devlist': 'str', '*speed': 'int'
> > > +},
> > 
> > devlist should be ['String'].
> 
> I want to be able to use that command on the human monitor.
> That is no longer possible if I use ['String']?

I forgot to CC Luiz and Anthony.  Question for them.

Stefan
Dietmar Maurer Feb. 21, 2013, 3:48 p.m. UTC | #5
> > In future, we can allow to pass multiple config files - the vma
> > archive format can already handle that.
> 
> My point is that QEMU has no business dealing with the management tool's VM
> configuration file.  And I think the management tool-specific archive format also
> shouldn't be in QEMU.
> 
> How will a proxmox user restore a VMA file generated with oVirt since the
> configuration file comes from the management layer?  They can't because the
> VMA is specific to the management layer and should be handled there, not inside
> QEMU.

The management tool just needs to convert the config - looks quite easy to me.

> QEMU must provide the mechanism for point-in-time backups of block devices -
> your backup block job implements this.
> 
> Where I disagree with this patch series is that you put the management tool-
> specific archive format writer into QEMU.  That is outside the scope of QEMU.
> The management tool should tell QEMU to take the backups of block devices
> and do a live migration to file.
> 
> The management tool can use a NBD server if it wants to capture all the block
> device backups into a single archive.  And the management tool can bundle the
> VM configuration into that archive too.  But those steps are beyond the scope of
> QEMU.
> 
> The approach I'm suggesting is more modular.  For example, the backup block
> job can also be used to copy out the state of a disk into a new
> qcow2 file.

OK, I can try to split the code. But I think I will simply define a 'protocol' instead
of using an NBD server (what for?)

> > > > +##
> > > > +# @backup:
> > > > +#
> > > > +# Starts a VM backup.
> > > > +#
> > > > +# @backup-file: the backup file name # # @format: format of the
> > > > +backup file # # @config-filename: #optional name of a
> > > > +configuration file to include into # the backup archive.
> > > > +#
> > > > +# @speed: #optional the maximum speed, in bytes per second # #
> > > > +@devlist: #optional list of block device names (separated by ',', ';'
> > > > +# or ':'). By default the backup includes all writable block devices.
> > > > +#
> > > > +# Returns: the uuid of the backup job # # Since: 1.5.0 ## {
> > > > +'command': 'backup', 'data': { 'backup-file': 'str',
> > > > +                                 '*format': 'BackupFormat',
> > > > +                                 '*config-file': 'str',
> > > > +                                 '*devlist': 'str', '*speed': 'int'
> > > > +},
> > >
> > > devlist should be ['String'].
> >
> > I want to be able to use that command on the human monitor.
> > That is no longer possible if I use ['String']?
> 
> Good question.  I don't know the answer but perhaps Luiz or Anthony do (CCed)?
> 
> > > > diff --git a/qmp-commands.hx b/qmp-commands.hx index
> > > > 799adea..17e381b
> > > > 100644
> > > > --- a/qmp-commands.hx
> > > > +++ b/qmp-commands.hx
> > > > @@ -889,6 +889,18 @@ EQMP
> > > >      },
> > > >
> > > >      {
> > > > +        .name       = "backup",
> > > > +        .args_type  = "backup-file:s,format:s?,config-
> file:F?,speed:o?,devlist:s?",
> > > > +        .mhandler.cmd_new = qmp_marshal_input_backup,
> > > > +    },
> > > > +
> > > > +    {
> > > > +        .name       = "backup-cancel",
> > > > +        .args_type  = "",
> > > > +        .mhandler.cmd_new = qmp_marshal_input_backup_cancel,
> > > > +    },
> > >
> > > We might want to more than one backup if the guest has multiple disks.
> > > For example, the database disk is backed up independently of the main OS
> disk.
> > >
> > > This API only allows one backup to run at a time.
> >
> > I do not want multiple backup jobs. You can easily run 2 jobs in sequence to
> solve above use case.
> 
> Why do you not want multiple backup jobs?  It makes perfect sense to separate
> database disks from main OS disks.  They have different backup characteristics
> (how often to back up, how to restore) so it's likely that users will ask for
> multiple backup jobs at the same time.  Let's get the QMP interfaces right so that
> it can be supported in the future, if not right away.

So you want to pass the 'uuid' to backup-cancel?
Luiz Capitulino Feb. 21, 2013, 5:41 p.m. UTC | #6
On Thu, 21 Feb 2013 14:47:08 +0100
Stefan Hajnoczi <stefanha@gmail.com> wrote:

> On Thu, Feb 21, 2013 at 06:21:32AM +0000, Dietmar Maurer wrote:
> > > > +##
> > > > +# @backup:
> > > > +#
> > > > +# Starts a VM backup.
> > > > +#
> > > > +# @backup-file: the backup file name
> > > > +#
> > > > +# @format: format of the backup file
> > > > +#
> > > > +# @config-filename: #optional name of a configuration file to include
> > > > +into # the backup archive.
> > > > +#
> > > > +# @speed: #optional the maximum speed, in bytes per second # #
> > > > +@devlist: #optional list of block device names (separated by ',', ';'
> > > > +# or ':'). By default the backup includes all writable block devices.
> > > > +#
> > > > +# Returns: the uuid of the backup job # # Since: 1.5.0 ## {
> > > > +'command': 'backup', 'data': { 'backup-file': 'str',
> > > > +                                 '*format': 'BackupFormat',
> > > > +                                 '*config-file': 'str',
> > > > +                                 '*devlist': 'str', '*speed': 'int'
> > > > +},
> > > 
> > > devlist should be ['String'].
> > 
> > I want to be able to use that command on the human monitor.
> > That is no longer possible if I use ['String']?

Unless I'm missing something, it should work just fine. The HMP command
can have an argument which is a device list, then it will have to parse
that list and transform it into the list expected by QAPI functions, which
is what is going to be passed to the qmp_ function.
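
Something along these lines should do (sketch only; it assumes the generated
types follow the usual String/StringList pattern):

/* Sketch: turn the HMP "drive0,drive1" argument into the QAPI list. */
static StringList *devlist_parse(const char *str)
{
    StringList *list = NULL;
    char **devs = g_strsplit(str, ",", 0);
    int i;

    for (i = 0; devs[i]; i++) {
        StringList *entry = g_malloc0(sizeof(*entry));
        entry->value = g_malloc0(sizeof(*entry->value));
        entry->value->str = g_strdup(devs[i]);
        entry->next = list;
        list = entry;
    }
    g_strfreev(devs);
    return list;
}

The HMP handler passes the result to the qmp_ function and frees it with the
generated qapi_free_StringList() afterwards.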
Stefan Hajnoczi Feb. 22, 2013, 9:57 a.m. UTC | #7
On Thu, Feb 21, 2013 at 03:48:57PM +0000, Dietmar Maurer wrote:
> > > In future, we can allow to pass multiple config files - the vma
> > > archive format can already handle that.
> > 
> > My point is that QEMU has no business dealing with the management tool's VM
> > configuration file.  And I think the management tool-specific archive format also
> > shouldn't be in QEMU.
> > 
> > How will a proxmox user restore a VMA file generated with oVirt since the
> > configuration file comes from the management layer?  They can't because the
> > VMA is specific to the management layer and should be handled there, not inside
> > QEMU.
> 
> The management tooI just needs to convert the config - looks quite easy to me.

It's not an easy problem.  This is why there is no central vm-images.com
where everyone can share/sell virtual appliances.  You cannot trivially
convert between VMware, oVirt, proxmox, Xen, EC2, etc.

> > QEMU must provide the mechanism for point-in-time backups of block devices -
> > your backup block job implements this.
> > 
> > Where I disagree with this patch series is that you put the management tool-
> > specific archive format writer into QEMU.  That is outside the scope of QEMU.
> > The management tool should tell QEMU to take the backups of block devices
> > and do a live migration to file.
> > 
> > The management tool can use a NBD server if it wants to capture all the block
> > device backups into a single archive.  And the management tool can bundle the
> > VM configuration into that archive too.  But those steps are beyond the scope of
> > QEMU.
> > 
> > The approach I'm suggesting is more modular.  For example, the backup block
> > job can also be used to copy out the state of a disk into a new
> > qcow2 file.
> 
> OK, I can try to split the code. But I think I will simply define a 'protocol' instead
> of using an NBD server (what for?)

block/nbd.c already exists so it saves you from writing the QEMU-side
code to export data to another process.

The archive writer program opens a listen socket for each block device
that is being backed up.  Then it handles NBD WRITE requests from QEMU.

> > > > > diff --git a/qmp-commands.hx b/qmp-commands.hx index
> > > > > 799adea..17e381b
> > > > > 100644
> > > > > --- a/qmp-commands.hx
> > > > > +++ b/qmp-commands.hx
> > > > > @@ -889,6 +889,18 @@ EQMP
> > > > >      },
> > > > >
> > > > >      {
> > > > > +        .name       = "backup",
> > > > > +        .args_type  = "backup-file:s,format:s?,config-
> > file:F?,speed:o?,devlist:s?",
> > > > > +        .mhandler.cmd_new = qmp_marshal_input_backup,
> > > > > +    },
> > > > > +
> > > > > +    {
> > > > > +        .name       = "backup-cancel",
> > > > > +        .args_type  = "",
> > > > > +        .mhandler.cmd_new = qmp_marshal_input_backup_cancel,
> > > > > +    },
> > > >
> > > > We might want to more than one backup if the guest has multiple disks.
> > > > For example, the database disk is backed up independently of the main OS
> > disk.
> > > >
> > > > This API only allows one backup to run at a time.
> > >
> > > I do not want multiple backup jobs. You can easily run 2 jobs in sequence to
> > solve above use case.
> > 
> > Why do you not want multiple backup jobs?  It makes perfect sense to separate
> > database disks from main OS disks.  They have different backup characteristics
> > (how often to back up, how to restore) so it's likely that users will ask for
> > multiple backup jobs at the same time.  Let's get the QMP interfaces right so that
> > it can be supported in the future, if not right away.
> 
> So you want to pass the 'uuid' to backup-cancel?

Yes, that works.  Or let the user specify the ID for this backup job.
That way the management tool can use its own IDs.  QEMU doesn't care, it
treats the IDs as opaque strings.
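
The cancel path then looks something like this (sketch; BackupJob and the
lookup helpers are made up):

void qmp_backup_cancel(bool has_id, const char *id, Error **errp)
{
    /* backup_job_find(): look up a running job by id;
     * backup_job_only(): the single running job, if exactly one exists */
    BackupJob *job = has_id ? backup_job_find(id) : backup_job_only();

    if (!job) {
        error_setg(errp, "no matching backup job");
        return;
    }
    backup_job_cancel(job);
}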

Stefan
Dietmar Maurer Feb. 22, 2013, 10:14 a.m. UTC | #8
> > The management tooI just needs to convert the config - looks quite easy to
> me.
> 
> It's not an easy problem.  This is why there is no central vm-images.com where
> everyone can share/sell virtual appliances.  You cannot trivially convert between
> VMware, oVirt, proxmox, Xen, EC2, etc.

For that case, I added the ability to write several different config files into a VMA archive.

So if someone wants to distribute an appliance, they can simply add a config file for each manager.
 
> > > QEMU must provide the mechanism for point-in-time backups of block
> > > devices - your backup block job implements this.
> > >
> > > Where I disagree with this patch series is that you put the
> > > management tool- specific archive format writer into QEMU.  That is outside
> the scope of QEMU.
> > > The management tool should tell QEMU to take the backups of block
> > > devices and do a live migration to file.
> > >
> > > The management tool can use a NBD server if it wants to capture all
> > > the block device backups into a single archive.  And the management
> > > tool can bundle the VM configuration into that archive too.  But
> > > those steps are beyond the scope of QEMU.
> > >
> > > The approach I'm suggesting is more modular.  For example, the
> > > backup block job can also be used to copy out the state of a disk
> > > into a new
> > > qcow2 file.
> >
> > OK, I can try to split the code. But I think I will simply define a
> > 'protocol' instead of using an NBD server (what for?)
> 
> block/nbd.c already exists so it saves you from writing the QEMU-side code to
> export data to another process.

Is there some documentation about that NBD protocol?

> The archive writer program opens a listen socket for each block device that is
> being backed up.  Then it handles NBD WRITE requests from QEMU.

I still think using a specific protocol is wrong. You don't want to use NBD if you can talk directly
to some backup server?
Stefan Hajnoczi Feb. 25, 2013, 1:45 p.m. UTC | #9
On Fri, Feb 22, 2013 at 10:14:03AM +0000, Dietmar Maurer wrote:
> > > > QEMU must provide the mechanism for point-in-time backups of block
> > > > devices - your backup block job implements this.
> > > >
> > > > Where I disagree with this patch series is that you put the
> > > > management tool- specific archive format writer into QEMU.  That is outside
> > the scope of QEMU.
> > > > The management tool should tell QEMU to take the backups of block
> > > > devices and do a live migration to file.
> > > >
> > > > The management tool can use a NBD server if it wants to capture all
> > > > the block device backups into a single archive.  And the management
> > > > tool can bundle the VM configuration into that archive too.  But
> > > > those steps are beyond the scope of QEMU.
> > > >
> > > > The approach I'm suggesting is more modular.  For example, the
> > > > backup block job can also be used to copy out the state of a disk
> > > > into a new
> > > > qcow2 file.
> > >
> > > OK, I can try to split the code. But I think I will simply define a
> > > 'protocol' instead of using an NBD server (what for?)
> > 
> > block/nbd.c already exists so it saves you from writing the QEMU-side code to
> > export data to another process.
> 
> Is there some documentation about that NBD protocol?

See /usr/include/linux/nbd.h.  There are four commands in the protocol:

enum {
        NBD_CMD_READ = 0,
        NBD_CMD_WRITE = 1,
        NBD_CMD_DISC = 2,
        /* there is a gap here to match userspace */
        NBD_CMD_TRIM = 4
};

NBD_CMD_DISC is "disconnect".  NBD_CMD_TRIM discards regions of the
device.

The command flow is request-response.  Each request has a unique
identifier called the "handle", which allows the client to issue
multiple requests in parallel:

/*
 * This is the packet used for communication between client and
 * server. All data are in network byte order.
 */
struct nbd_request {
        __be32 magic;
        __be32 type;    /* == READ || == WRITE  */
        char handle[8];
        __be64 from;
        __be32 len;
} __attribute__((packed));

/*
 * This is the reply packet that nbd-server sends back to the client
 * after it has completed an I/O request (or an error occurs).
 */
struct nbd_reply {
	__be32 magic;
	__be32 error;           /* 0 = ok, else error   */
	char handle[8];         /* handle you got
				   from request  */
};
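
So a client issuing a 64 KB write for one cluster just fills the header in
network byte order and follows it with the payload (sketch; writen() is a
hypothetical write-exactly helper):

/* Uses <linux/nbd.h>, <arpa/inet.h>, <endian.h> and <string.h>. */
static void send_write_request(int sock, uint64_t cookie, int64_t cluster_num,
                               const void *data)
{
    struct nbd_request req;

    req.magic = htonl(NBD_REQUEST_MAGIC);
    req.type = htonl(NBD_CMD_WRITE);
    memcpy(req.handle, &cookie, sizeof(req.handle)); /* echoed in the reply */
    req.from = htobe64((uint64_t)cluster_num * 65536);
    req.len = htonl(65536);

    writen(sock, &req, sizeof(req));
    writen(sock, data, 65536);
}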

> > The archive writer program opens a listen socket for each block device that is
> > being backed up.  Then it handles NBD WRITE requests from QEMU.
> 
> I still think using a specific protocol is wrong. You don't want to use NDB if you can directly
> talk to some backup server?

NBD enables interprocess communication - any form of IPC requires a
protocol and NBD is quite a trivial one.  What is a simpler way of
talking to a backup server?

The only other vendor-independent method is writing a raw file to an
iSCSI, NFS, or CIFS export on the backup server.  QEMU also supports
that.

Those are two reasonable options, depending on whether you want to
implement a simple protocol (NBD) or you already have an existing
iSCSI/NFS/CIFS stack.

Stefan
Dietmar Maurer Feb. 27, 2013, 3:50 p.m. UTC | #10
> NBD enables interprocess communication - any form of IPC requires a protocol
> and NBD is quite a trivial one.  What is a simpler way of talking to a backup
> server?

Unfortunately, NBD adds considerable overhead. I guess the socket communication copies data.
This is really unnecessary if I can write directly to the output stream.
Stefan Hajnoczi Feb. 28, 2013, 2:49 p.m. UTC | #11
On Wed, Feb 27, 2013 at 03:50:53PM +0000, Dietmar Maurer wrote:
> > NBD enables interprocess communication - any form of IPC requires a protocol
> > and NBD is quite a trivial one.  What is a simpler way of talking to a backup
> > server?
> 
> Unfortunately, NBD add considerable overheads. I guess the socket communications copies data.
> This is really unnecessary if I can write directly to the output stream.

The disk is the bottleneck, not memory bandwidth.  Hard disks only do
10-100 MB/sec and SSDs only do a couple 100 MB/sec.  Memory copy is
insignificant compared to the I/O activity required to copy out the
entire disk image, not to mention delaying guest writes until we read
the original data from the disk.

Unless there's a concrete performance problem here this is premature
optimization.

Stefan
Dietmar Maurer Feb. 28, 2013, 3:24 p.m. UTC | #12
> > Unfortunately, NBD add considerable overheads. I guess the socket
> communications copies data.
> > This is really unnecessary if I can write directly to the output stream.
> 
> The disk is the bottleneck, not memory bandwidth.  Hard disks only do
> 10-100 MB/sec and SSDs only do a couple 100 MB/sec.  Memory copy is
> insignificant compared to the I/O activity required to copy out the entire disk
> image, not to mention delaying guest writes until we read the original data from
> the disk.
> 
> Unless there's a concrete performance problem here this is premature
> optimization.

I currently test with about 700MB/s read speed, and get a slowdown by a factor of 1.7.
So memory copy is very significant, or there is something wrong with nbd.c.

RAID with normal HDs can easily give you 500MB/s, and RAID with multiple SSDs can
give you more than 1000MB/s read speed.

There are some high-performance IPC libraries, for example the one from corosync:

http://kernel.org/doc/ols/2009/ols2009-pages-61-68.pdf

Such libraries try to avoid copying data.
Dietmar Maurer Feb. 28, 2013, 3:32 p.m. UTC | #13
> The disk is the bottleneck, not memory bandwidth.  Hard disks only do
> 10-100 MB/sec and SSDs only do a couple 100 MB/sec.  Memory copy is
> insignificant compared to the I/O activity required to copy out the entire disk
> image, not to mention delaying guest writes until we read the original data from
> the disk.

Not to mention Fusion-io cards, which are currently at 6GB/s.
Stefan Hajnoczi March 4, 2013, 12:58 p.m. UTC | #14
On Thu, Feb 28, 2013 at 03:24:27PM +0000, Dietmar Maurer wrote:
> > > Unfortunately, NBD add considerable overheads. I guess the socket
> > communications copies data.
> > > This is really unnecessary if I can write directly to the output stream.
> > 
> > The disk is the bottleneck, not memory bandwidth.  Hard disks only do
> > 10-100 MB/sec and SSDs only do a couple 100 MB/sec.  Memory copy is
> > insignificant compared to the I/O activity required to copy out the entire disk
> > image, not to mention delaying guest writes until we read the original data from
> > the disk.
> > 
> > Unless there's a concrete performance problem here this is premature
> > optimization.
> 
> I currently test with about 700MB read speed, and get a slow down by factor 1.7.
> So memory copy is very significant , or there is something wrong with nbd.c.

What are the details of the test?

Is it using 64 KB writes and have you tried 256 KB writes?

Stefan
Dietmar Maurer March 4, 2013, 1:16 p.m. UTC | #15
> What are the details of the test?
> 
> Is it using 64 KB writes and have you tried 256 KB writes?

I use a modified 'qemu-img convert' at 64KB block size (I need 64KB for backup).
Kevin Wolf March 4, 2013, 1:55 p.m. UTC | #16
Am 04.03.2013 um 14:16 hat Dietmar Maurer geschrieben:
> > What are the details of the test?
> > 
> > Is it using 64 KB writes and have you tried 256 KB writes?
> 
> I use a modified 'qemu-img convert' at 64KB block size (I need 64KB for backup).

Maybe you'd better use a different output format that doesn't restrict
you to 64k writes.

Kevin
Dietmar Maurer March 4, 2013, 2:33 p.m. UTC | #17
> > > Is it using 64 KB writes and have you tried 256 KB writes?
> >
> > I use a modified 'qemu-img convert' at 64KB block size (I need 64KB for
> backup).
> 
> Maybe you'd better use a different output format that doesn't restrict you to
> 64k writes.

The output format is not really the restriction. The problem is that an additional
IPC layer adds overhead, and I do not want that (because it is totally unnecessary).
Stefan Hajnoczi March 6, 2013, 12:44 p.m. UTC | #18
On Mon, Mar 04, 2013 at 02:33:16PM +0000, Dietmar Maurer wrote:
> > > > Is it using 64 KB writes and have you tried 256 KB writes?
> > >
> > > I use a modified 'qemu-img convert' at 64KB block size (I need 64KB for
> > backup).
> > 
> > Maybe you'd better use a different output format that doesn't restrict you to
> > 64k writes.
> 
> The output format is not really the restriction. The problem is that an additional
> IPC layer add overhead, an d I do not want that (because it is totally unnecessary).

I missed the reason why you cannot increase the block size.  At least
increase the NBD write size - the VMA process can break up a large write
however it likes.  Do this and the overhead is divided by 4.

Stefan
Dietmar Maurer March 6, 2013, 2:42 p.m. UTC | #19
> > > Maybe you'd better use a different output format that doesn't
> > > restrict you to 64k writes.
> >
> > The output format is not really the restriction. The problem is that
> > an additional IPC layer add overhead, an d I do not want that (because it is
> totally unnecessary).
> 
> I missed the reason why you cannot increase the block size.  

When we run a backup, we need to read such a block on every write from the guest.
So if we increase the block size we get additional delays.
Kevin Wolf March 6, 2013, 3:22 p.m. UTC | #20
Am 06.03.2013 um 15:42 hat Dietmar Maurer geschrieben:
> > > > Maybe you'd better use a different output format that doesn't
> > > > restrict you to 64k writes.
> > >
> > > The output format is not really the restriction. The problem is that
> > > an additional IPC layer add overhead, an d I do not want that (because it is
> > totally unnecessary).
> > 
> > I missed the reason why you cannot increase the block size.  
> 
> When we run backup, we need to read such block on every write from the guest.
> So if we increase block size we get additional delays.

How about variable block sizes? I mean this is a stream format that has
a header for each block anyway. Include a size there and be done.
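
I.e. something like this (sketch, not the actual VMA layout):

/* Sketch: per-block stream header with an explicit payload size. */
typedef struct BlockHeader {
    uint8_t  dev_id;       /* which exported drive the block belongs to */
    uint64_t cluster_num;  /* starting 64 KB cluster within that drive */
    uint32_t size;         /* payload size in bytes, no longer fixed at 64 KB */
} BlockHeader;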

Kevin
Dietmar Maurer March 6, 2013, 3:33 p.m. UTC | #21
> > When we run backup, we need to read such block on every write from the
> guest.
> > So if we increase block size we get additional delays.
> 
> How about variable block sizes? I mean this is a stream format that has a header
> for each block anyway. Include a size there and be done.

You can make that as complex as you want. I simply do not need that.
Kevin Wolf March 6, 2013, 3:48 p.m. UTC | #22
Am 06.03.2013 um 16:33 hat Dietmar Maurer geschrieben:
> > > When we run backup, we need to read such block on every write from the
> > guest.
> > > So if we increase block size we get additional delays.
> > 
> > How about variable block sizes? I mean this is a stream format that has a header
> > for each block anyway. Include a size there and be done.
> 
> You can make that as complex as you want. I simply do not need that.

Then you also don't need the performance that you lose by using NBD.

Kevin
Dietmar Maurer March 6, 2013, 5:39 p.m. UTC | #23
> > > How about variable block sizes? I mean this is a stream format that
> > > has a header for each block anyway. Include a size there and be done.
> >
> > You can make that as complex as you want. I simply do not need that.
> 
> Then you also don't need the performance that you lose by using NBD.

Why exactly?
Kevin Wolf March 7, 2013, 9:04 a.m. UTC | #24
Am 06.03.2013 um 18:39 hat Dietmar Maurer geschrieben:
> > > > How about variable block sizes? I mean this is a stream format that
> > > > has a header for each block anyway. Include a size there and be done.
> > >
> > > You can make that as complex as you want. I simply do not need that.
> > 
> > Then you also don't need the performance that you lose by using NBD.
> 
> Why exactly?

Because your format kills more performance than any NBD connection
could.

Kevin
Stefan Hajnoczi March 7, 2013, 9:05 a.m. UTC | #25
On Wed, Mar 06, 2013 at 02:42:57PM +0000, Dietmar Maurer wrote:
> > > > Maybe you'd better use a different output format that doesn't
> > > > restrict you to 64k writes.
> > >
> > > The output format is not really the restriction. The problem is that
> > > an additional IPC layer add overhead, an d I do not want that (because it is
> > totally unnecessary).
> > 
> > I missed the reason why you cannot increase the block size.  
> 
> When we run backup, we need to read such block on every write from the guest.
> So if we increase block size we get additional delays.

Don't increase the bitmap block size.

Just let the block job do larger reads.  This is the bulk of the I/O
workload.  You can use large reads here independently of the bitmap
block size.

That way guest writes still only have a 64 KB read overhead but you
reduce the overhead of doing so many 64 KB writes from the backup block
job.
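
Roughly like this (sketch; error handling is omitted and the cluster
bookkeeping helpers are made up):

/* Sketch: bulk copy loop reading 4 clusters (256 KB) per bdrv_read() while
 * still dumping and marking the bitmap per 64 KB cluster. */
static void backup_copy_bulk(BlockDriverState *bs, int64_t start, int64_t end)
{
    uint8_t buf[4 * 64 * 1024];
    int64_t c;
    int i, n;

    for (c = start; c < end; c += 4) {
        n = MIN(4, end - c);
        bdrv_read(bs, c * 128, buf, n * 128);   /* 128 sectors == 64 KB */
        for (i = 0; i < n; i++) {
            if (!cluster_already_saved(c + i)) {           /* hypothetical */
                dump_cluster(c + i, buf + i * 64 * 1024);  /* hypothetical */
                mark_cluster_saved(c + i);                 /* hypothetical */
            }
        }
    }
}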

Stefan
Dietmar Maurer March 7, 2013, 9:22 a.m. UTC | #26
> > > > You can make that as complex as you want. I simply do not need that.
> > >
> > > Then you also don't need the performance that you lose by using NBD.
> >
> > Why exactly?
> 
> Because your format kills more performance than any NBD connection could.

That is not true.
Dietmar Maurer March 7, 2013, 9:28 a.m. UTC | #27
> > When we run backup, we need to read such block on every write from the
> guest.
> > So if we increase block size we get additional delays.
> 
> Don't increase the bitmap block size.
> 
> Just let the block job do larger reads.  This is the bulk of the I/O workload.  You
> can use large reads here independently of the bitmap block size.
> 
> That way guest writes still only have a 64 KB read overhead but you reduce the
> overhead of doing so many 64 KB writes from the backup block job.

If there are many writes from the guest, you still get many 64KB blocks.

Anyway, an additional RPC layer always adds overhead, and it can be completely avoided.
Maybe not much, but I want to make backup as efficient as possible.
Stefan Hajnoczi March 8, 2013, 10:48 a.m. UTC | #28
On Thu, Mar 07, 2013 at 09:28:40AM +0000, Dietmar Maurer wrote:
> > > When we run backup, we need to read such block on every write from the
> > guest.
> > > So if we increase block size we get additional delays.
> > 
> > Don't increase the bitmap block size.
> > 
> > Just let the block job do larger reads.  This is the bulk of the I/O workload.  You
> > can use large reads here independently of the bitmap block size.
> > 
> > That way guest writes still only have a 64 KB read overhead but you reduce the
> > overhead of doing so many 64 KB writes from the backup block job.
> 
> If there are many writes from the guest, you still get many 64KB block.

In the common case the backup block job does significantly more I/O than
the guest, unless your workload is dd if=/dev/vda of=/dev/null.
Remember the block job is reading the *entire* disk!

> Anyways, and additional RPC layer always adds overhead, and It can be completely avoided.
> Maybe not much, but I want to make backup as efficient as possible.

The drawbacks outweight the performance advantage:

1. QEMU can neither back up nor restore without help from the management
   tool.  This is a strong indicator that the backup archive code should
   live outside QEMU.  It doesn't make sense for proxmox, oVirt,
   OpenStack, and others to each maintain their backup archive code
   inside qemu.git, tied to QEMU's C codebase, release cycle, and
   license.

2. QEMU already has interfaces to export the vmstate and block device
   snapshots: migration/savevm and BlockDriverState (NBD for IPC or
   raw/qcow2/vmdk for file).  It is not necessary to add a
   special-purpose interface just for backup.

3. The backup block job can be composed together with other QMP commands
   to achieve scenarios besides just VMA backup.  It's more flexible to
   add simple primitives that can be combined instead of adding a
   monolithic backup command.

For these reasons, I'm against putting backup archive code into QEMU.

If performance is key, please look into incremental backups.  Taking a
full copy of the disk image every time affects guest performance much
more than IPC overhead.  Plus there'll be less IPC if only dirty blocks
are backed up :).

Stefan
Dietmar Maurer March 8, 2013, 11:01 a.m. UTC | #29
> > Anyways, and additional RPC layer always adds overhead, and It can be
> completely avoided.
> > Maybe not much, but I want to make backup as efficient as possible.
> 
> The drawbacks outweight the performance advantage:

not for me.

> 1. QEMU can neither backup nor restore without help from the management
>    tool.  

Backup works perfectly with the current patches. You can easily trigger a backup using
an HMP command. This is not really important, but it works.

> This is a strong indicator that the backup archive code should
>    live outside QEMU.  I doesn't make sense for proxmox, oVirt,
>    OpenStack, and others to each maintain their backup archive code
>    inside qemu.git, tied to QEMU's C codebase, release cycle, and
>    license.
> 2. QEMU already has interfaces to export the vmstate and block device
>    snapshots: migration/savevm and BlockDriverState (NBD for IPC or
>    raw/qcow2/vmdk for file).  It is not necessary to add a
>    special-purpose interface just for backup.
> 
> 3. The backup block job can be composed together with other QMP commands
>    to achieve scenarios besides just VMA backup.  It's more flexible to
>    add simple primitives that can be combined instead of adding a
>    monolithic backup command.
> 
> For these reasons, I'm against putting backup archive code into QEMU.

That is OK for me - I already maintain the code outside of qemu.
Stefan Hajnoczi March 8, 2013, 4:29 p.m. UTC | #30
On Fri, Mar 8, 2013 at 12:01 PM, Dietmar Maurer <dietmar@proxmox.com> wrote:
>> > Anyways, and additional RPC layer always adds overhead, and It can be
>> completely avoided.
>> > Maybe not much, but I want to make backup as efficient as possible.
>>
>> The drawbacks outweight the performance advantage:
>
> not for me.
>
>> 1. QEMU can neither backup nor restore without help from the management
>>    tool.
>
> Backup works perfectly with the current patches. You can easily trigger a backup using
> a HMP command. This is not really important, but works.

If you send me a VMA file I can't restore it without knowing your
command-line and possibly setting up the environment.  QEMU just can't
do that, it's out of scope.  That is why the backup archive is a
management tool concept.

>> This is a strong indicator that the backup archive code should
>>    live outside QEMU.  I doesn't make sense for proxmox, oVirt,
>>    OpenStack, and others to each maintain their backup archive code
>>    inside qemu.git, tied to QEMU's C codebase, release cycle, and
>>    license.
>> 2. QEMU already has interfaces to export the vmstate and block device
>>    snapshots: migration/savevm and BlockDriverState (NBD for IPC or
>>    raw/qcow2/vmdk for file).  It is not necessary to add a
>>    special-purpose interface just for backup.
>>
>> 3. The backup block job can be composed together with other QMP commands
>>    to achieve scenarios besides just VMA backup.  It's more flexible to
>>    add simple primitives that can be combined instead of adding a
>>    monolithic backup command.
>>
>> For these reasons, I'm against putting backup archive code into QEMU.
>
> That is OK for me - I already maintain the code outside of qemu.

Does this mean you will keep this patch series out-of-tree?

What I am looking for is a stripped down patch series with just a
backup block job (no backup archive writer or migration code).  That
would be easily merged and saves you from rebasing this series as
QEMU changes.

Stefan
Dietmar Maurer March 8, 2013, 5:44 p.m. UTC | #31
> >> 1. QEMU can neither backup nor restore without help from the management
> >>    tool.
> >
> > Backup works perfectly with the current patches. You can easily
> > trigger a backup using a HMP command. This is not really important, but works.
> 
> If you send me a VMA file I can't restore it without knowing your command-line
> and possibly setting up the environment.  QEMU just can't do that, it's out of
> scope.  That is why the backup archive is a management tool concept.

I don't really know why you insist on that claim. I can do a backup with an HMP command,
and then restore it later using the management tools. I just need to issue the
correct HMP backup command.

But this is not relevant for this discussion, so let us continue. We talked about the overhead
of the additional IPC layer.

> 
> >> This is a strong indicator that the backup archive code should
> >>    live outside QEMU.  I doesn't make sense for proxmox, oVirt,
> >>    OpenStack, and others to each maintain their backup archive code
> >>    inside qemu.git, tied to QEMU's C codebase, release cycle, and
> >>    license.
> >> 2. QEMU already has interfaces to export the vmstate and block device
> >>    snapshots: migration/savevm and BlockDriverState (NBD for IPC or
> >>    raw/qcow2/vmdk for file).  It is not necessary to add a
> >>    special-purpose interface just for backup.
> >>
> >> 3. The backup block job can be composed together with other QMP
> commands
> >>    to achieve scenarios besides just VMA backup.  It's more flexible to
> >>    add simple primitives that can be combined instead of adding a
> >>    monolithic backup command.
> >>
> >> For these reasons, I'm against putting backup archive code into QEMU.
> >
> > That is OK for me - I already maintain the code outside of qemu.
> 
> Does this mean you will keep this patch series out-of-tree?
> 
> What I am looking for is a stripped down patch series with just a backup block
> job (no backup archive writer or migration code).  That would be easily merged
> and saves you front rebasing this series as QEMU changes.

That is Patch 2/6?
Dietmar Maurer March 9, 2013, 2:13 p.m. UTC | #32
> >> For these reasons, I'm against putting backup archive code into QEMU.
> >
> > That is OK for me - I already maintain the code outside of qemu.
> 
> Does this mean you will keep this patch series out-of-tree?

You are 'against putting backup archive code into QEMU', so I need to
maintain it out-of-tree.
Stefan Hajnoczi March 10, 2013, 6:43 a.m. UTC | #33
On Fri, Mar 8, 2013 at 6:44 PM, Dietmar Maurer <dietmar@proxmox.com> wrote:
>> >> This is a strong indicator that the backup archive code should
>> >>    live outside QEMU.  I doesn't make sense for proxmox, oVirt,
>> >>    OpenStack, and others to each maintain their backup archive code
>> >>    inside qemu.git, tied to QEMU's C codebase, release cycle, and
>> >>    license.
>> >> 2. QEMU already has interfaces to export the vmstate and block device
>> >>    snapshots: migration/savevm and BlockDriverState (NBD for IPC or
>> >>    raw/qcow2/vmdk for file).  It is not necessary to add a
>> >>    special-purpose interface just for backup.
>> >>
>> >> 3. The backup block job can be composed together with other QMP
>> commands
>> >>    to achieve scenarios besides just VMA backup.  It's more flexible to
>> >>    add simple primitives that can be combined instead of adding a
>> >>    monolithic backup command.
>> >>
>> >> For these reasons, I'm against putting backup archive code into QEMU.
>> >
>> > That is OK for me - I already maintain the code outside of qemu.
>>
>> Does this mean you will keep this patch series out-of-tree?
>>
>> What I am looking for is a stripped down patch series with just a backup block
>> job (no backup archive writer or migration code).  That would be easily merged
>> and saves you front rebasing this series as QEMU changes.
>
> That is Patch 2/6?

Yes.  I sent an RFC series that shows this approach.  It is just a
prototype but it demonstrates that the NBD approach works and performs
reasonably well for a quick Python hack.

The stripped down patch series needs:
1. (Optional) query-block virtual_size field if management tool needs
to query drive size
2. Patch 2/6 with review comments addressed
3. 'block-backup' QMP command (and optionally HMP command)

That's it!

Stefan
Dietmar Maurer March 10, 2013, 9:32 a.m. UTC | #34
> >> What I am looking for is a stripped down patch series with just a
> >> backup block job (no backup archive writer or migration code).  That
> >> would be easily merged and saves you front rebasing this series as QEMU
> changes.
> >
> > That is Patch 2/6?
> 
> Yes.  I sent an RFC series that shows this approach.  It is just a prototype but it
> demonstrates that the NBD approach works and performs reasonably well for a
> quick Python hack.
> 
> The stripped down patch series needs:
> 1. (Optional) query-block virtual_size field if management tool needs to query
> drive size 2. Patch 2/6 with review comments addressed 3. 'block-backup' QMP
> command (and optionally HMP command)
> 
> That's it!

Thanks. I also have an implementation here using that approach (using nbd.c).
The problem is the performance you get. A 28% delay is not acceptable.

In fact, I want to have the fastest possible solution (additional delay = 0).

You want to move the code out of qemu, so that management frameworks
can easily plug in their own backup tools. That is perfectly valid, but we should do
it in a way that does not influence performance.

So are there any other ideas?
Stefan Hajnoczi March 10, 2013, 11:11 a.m. UTC | #35
On Sun, Mar 10, 2013 at 10:32 AM, Dietmar Maurer <dietmar@proxmox.com> wrote:
>> >> What I am looking for is a stripped down patch series with just a
>> >> backup block job (no backup archive writer or migration code).  That
>> >> would be easily merged and saves you front rebasing this series as QEMU
>> changes.
>> >
>> > That is Patch 2/6?
>>
>> Yes.  I sent an RFC series that shows this approach.  It is just a prototype but it
>> demonstrates that the NBD approach works and performs reasonably well for a
>> quick Python hack.
>>
>> The stripped down patch series needs:
>> 1. (Optional) query-block virtual_size field if management tool needs to query
>> drive size 2. Patch 2/6 with review comments addressed 3. 'block-backup' QMP
>> command (and optionally HMP command)
>>
>> That's it!
>
> Thanks. I also have an  implementation here using that approach (using nbd.c).
> The problem is the performance you get. 28% delay is not acceptable.
>
> In fact, I want to have the fastest possible solution (additional delay = 0).
>
> You want to move the code out of qemu, so that management frameworks
> can easily plug their own backup tools. That is perfectly valid, but we should do
> it in a way that does not influence performance.
>
> So are there any other ideas?

Same discussion in the other thread, so let's move there where we both
just posted.

Stefan

Patch

diff --git a/backup.h b/backup.h
index d9395bc..c8ba153 100644
--- a/backup.h
+++ b/backup.h
@@ -29,4 +29,16 @@  int backup_job_create(BlockDriverState *bs, BackupDumpFunc *backup_dump_cb,
                       BlockDriverCompletionFunc *backup_complete_cb,
                       void *opaque, int64_t speed);
 
+typedef struct BackupDriver {
+    const char *format;
+    void *(*open_cb)(const char *filename, uuid_t uuid, Error **errp);
+    int (*close_cb)(void *opaque, Error **errp);
+    int (*register_config_cb)(void *opaque, const char *name, gpointer data,
+                              size_t data_len);
+    int (*register_stream_cb)(void *opaque, const char *devname, size_t size);
+    int (*dump_cb)(void *opaque, uint8_t dev_id, int64_t cluster_num,
+                   unsigned char *buf, size_t *zero_bytes);
+    int (*complete_cb)(void *opaque, uint8_t dev_id, int ret);
+} BackupDriver;
+
 #endif /* QEMU_BACKUP_H */
diff --git a/blockdev.c b/blockdev.c
index 63e6f1e..c340fde 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -20,6 +20,7 @@ 
 #include "qmp-commands.h"
 #include "trace.h"
 #include "sysemu/arch_init.h"
+#include "backup.h"
 
 static QTAILQ_HEAD(drivelist, DriveInfo) drives = QTAILQ_HEAD_INITIALIZER(drives);
 
@@ -1334,6 +1335,428 @@  void qmp_drive_mirror(const char *device, const char *target,
     drive_get_ref(drive_get_by_blockdev(bs));
 }
 
+/* Backup related function */
+
+static void backup_run_next_job(void);
+
+static struct GenericBackupState {
+    Error *error;
+    bool cancel;
+    uuid_t uuid;
+    char uuid_str[37];
+    int64_t speed;
+    time_t start_time;
+    time_t end_time;
+    char *backup_file;
+    const BackupDriver *driver;
+    void *writer;
+    GList *bcb_list;
+    size_t total;
+    size_t transferred;
+    size_t zero_bytes;
+} backup_state;
+
+typedef struct BackupCB {
+    BlockDriverState *bs;
+    uint8_t dev_id;
+    bool started;
+    bool completed;
+    size_t size;
+    size_t transferred;
+    size_t zero_bytes;
+} BackupCB;
+
+static int backup_dump_cb(void *opaque, BlockDriverState *bs,
+                          int64_t cluster_num, unsigned char *buf)
+{
+    BackupCB *bcb = opaque;
+
+    assert(backup_state.driver);
+    assert(backup_state.writer);
+    assert(backup_state.driver->dump_cb);
+
+    size_t zero_bytes = 0;
+    int bytes = backup_state.driver->dump_cb(backup_state.writer,
+                                             bcb->dev_id, cluster_num,
+                                             buf, &zero_bytes);
+
+    if (bytes > 0) {
+        bcb->transferred += bytes;
+        backup_state.transferred += bytes;
+        if (zero_bytes) {
+            bcb->zero_bytes += bytes;
+            backup_state.zero_bytes += zero_bytes;
+        }
+    }
+
+    return bytes;
+}
+
+static void backup_cleanup(void)
+{
+    if (backup_state.writer && backup_state.driver) {
+        backup_state.end_time = time(NULL);
+        Error *local_err = NULL;
+        backup_state.driver->close_cb(backup_state.writer, &local_err);
+        error_propagate(&backup_state.error, local_err);
+        backup_state.writer = NULL;
+    }
+
+    if (backup_state.bcb_list) {
+        GList *l = backup_state.bcb_list;
+        while (l) {
+            BackupCB *bcb = l->data;
+            l = g_list_next(l);
+            drive_put_ref_bh_schedule(drive_get_by_blockdev(bcb->bs));
+            g_free(bcb);
+        }
+        g_list_free(backup_state.bcb_list);
+        backup_state.bcb_list = NULL;
+    }
+}
+
+static void backup_complete_cb(void *opaque, int ret)
+{
+    BackupCB *bcb = opaque;
+
+    assert(backup_state.driver);
+    assert(backup_state.writer);
+    assert(backup_state.driver->complete_cb);
+    assert(backup_state.driver->close_cb);
+
+    bcb->completed = true;
+
+    backup_state.driver->complete_cb(backup_state.writer, bcb->dev_id, ret);
+
+    if (!backup_state.cancel) {
+        backup_run_next_job();
+    }
+}
+
+static void backup_cancel(void)
+{
+    backup_state.cancel = true;
+
+    if (!backup_state.error) {
+        error_setg(&backup_state.error, "backup cancelled");
+    }
+
+    /* drain all i/o (awake jobs waiting for aio) */
+    bdrv_drain_all();
+
+    int job_count = 0;
+    GList *l = backup_state.bcb_list;
+    while (l) {
+        BackupCB *bcb = l->data;
+        l = g_list_next(l);
+        BlockJob *job = bcb->bs->job;
+        if (job) {
+            job_count++;
+            if (!bcb->started) {
+                bcb->started = true;
+                backup_job_start(bcb->bs, true);
+            }
+            if (!bcb->completed) {
+                block_job_cancel_sync(job);
+            }
+        }
+    }
+
+    backup_cleanup();
+}
+
+void qmp_backup_cancel(Error **errp)
+{
+    backup_cancel();
+}
+
+static void backup_run_next_job(void)
+{
+    GList *l = backup_state.bcb_list;
+    while (l) {
+        BackupCB *bcb = l->data;
+        l = g_list_next(l);
+
+        if (bcb->bs && bcb->bs->job && !bcb->completed) {
+            if (!bcb->started) {
+                bcb->started = true;
+                bool cancel = backup_state.error || backup_state.cancel;
+                backup_job_start(bcb->bs, cancel);
+            }
+            return;
+        }
+    }
+
+    backup_cleanup();
+}
+
+static void backup_start_jobs(void)
+{
+    /* create all jobs (one for each device), start first one */
+    GList *l = backup_state.bcb_list;
+    while (l) {
+        BackupCB *bcb = l->data;
+        l = g_list_next(l);
+
+        if (backup_job_create(bcb->bs, backup_dump_cb, backup_complete_cb,
+                              bcb, backup_state.speed) != 0) {
+            error_setg(&backup_state.error, "backup_job_create failed");
+            backup_cancel();
+            return;
+        }
+    }
+
+    backup_run_next_job();
+}
+
+char *qmp_backup(const char *backup_file, bool has_format, BackupFormat format,
+                 bool has_config_file, const char *config_file,
+                 bool has_devlist, const char *devlist,
+                 bool has_speed, int64_t speed, Error **errp)
+{
+    BlockDriverState *bs;
+    Error *local_err = NULL;
+    uuid_t uuid;
+    void *writer = NULL;
+    gchar **devs = NULL;
+    GList *bcblist = NULL;
+
+    if (backup_state.bcb_list) {
+        error_set(errp, ERROR_CLASS_GENERIC_ERROR,
+                  "previous backup not finished");
+        return NULL;
+    }
+
+    /* Todo: try to auto-detect format based on file name */
+    format = has_format ? format : BACKUP_FORMAT_VMA;
+
+    /* fixme: find driver for specified format */
+    const BackupDriver *driver = NULL;
+
+    if (!driver) {
+        error_set(errp, ERROR_CLASS_GENERIC_ERROR, "unknown backup format");
+        return NULL;
+    }
+
+    if (has_devlist) {
+        devs = g_strsplit_set(devlist, ",;:", -1);
+
+        gchar **d = devs;
+        while (d && *d) {
+            bs = bdrv_find(*d);
+            if (bs) {
+                if (bdrv_is_read_only(bs)) {
+                    error_set(errp, QERR_DEVICE_IS_READ_ONLY, *d);
+                    goto err;
+                }
+                if (!bdrv_is_inserted(bs)) {
+                    error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, *d);
+                    goto err;
+                }
+                BackupCB *bcb = g_new0(BackupCB, 1);
+                bcb->bs = bs;
+                bcblist = g_list_append(bcblist, bcb);
+            } else {
+                error_set(errp, QERR_DEVICE_NOT_FOUND, *d);
+                goto err;
+            }
+            d++;
+        }
+
+    } else {
+
+        bs = NULL;
+        while ((bs = bdrv_next(bs))) {
+
+            if (!bdrv_is_inserted(bs) || bdrv_is_read_only(bs)) {
+                continue;
+            }
+
+            BackupCB *bcb = g_new0(BackupCB, 1);
+            bcb->bs = bs;
+            bcblist = g_list_append(bcblist, bcb);
+        }
+    }
+
+    if (!bcblist) {
+        error_set(errp, ERROR_CLASS_GENERIC_ERROR, "empty device list");
+        goto err;
+    }
+
+    GList *l = bcblist;
+    while (l) {
+        BackupCB *bcb = l->data;
+        l = g_list_next(l);
+        if (bcb->bs->job) {
+            error_set(errp, QERR_DEVICE_IN_USE, bdrv_get_device_name(bcb->bs));
+            goto err;
+        }
+    }
+
+    uuid_generate(uuid);
+
+    writer = driver->open_cb(backup_file, uuid, &local_err);
+    if (!writer) {
+        if (error_is_set(&local_err)) {
+            error_propagate(errp, local_err);
+        }
+        goto err;
+    }
+
+    size_t total = 0;
+
+    /* register all devices for vma writer */
+    l = bcblist;
+    while (l) {
+        BackupCB *bcb = l->data;
+        l = g_list_next(l);
+
+        int64_t size = bdrv_getlength(bcb->bs);
+        const char *devname = bdrv_get_device_name(bcb->bs);
+        bcb->dev_id = driver->register_stream_cb(writer, devname, size);
+        if (bcb->dev_id <= 0) {
+            error_set(errp, ERROR_CLASS_GENERIC_ERROR,
+                      "register_stream failed");
+            goto err;
+        }
+        bcb->size = size;
+        total += size;
+    }
+
+    /* add configuration file to archive */
+    if (has_config_file) {
+        char *cdata = NULL;
+        gsize clen = 0;
+        GError *err = NULL;
+        if (!g_file_get_contents(config_file, &cdata, &clen, &err)) {
+            error_setg(errp, "unable to read file '%s'", config_file);
+            goto err;
+        }
+
+        const char *basename = g_path_get_basename(config_file);
+        if (driver->register_config_cb(writer, basename, cdata, clen) < 0) {
+            error_setg(errp, "register_config failed");
+            g_free(cdata);
+            goto err;
+        }
+        g_free(cdata);
+    }
+
+    /* initialize global backup_state now */
+
+    backup_state.cancel = false;
+
+    if (backup_state.error) {
+        error_free(backup_state.error);
+        backup_state.error = NULL;
+    }
+
+    backup_state.driver = driver;
+
+    backup_state.speed = (has_speed && speed > 0) ? speed : 0;
+
+    backup_state.start_time = time(NULL);
+    backup_state.end_time = 0;
+
+    if (backup_state.backup_file) {
+        g_free(backup_state.backup_file);
+    }
+    backup_state.backup_file = g_strdup(backup_file);
+
+    backup_state.writer = writer;
+
+    uuid_copy(backup_state.uuid, uuid);
+    uuid_unparse_lower(uuid, backup_state.uuid_str);
+
+    backup_state.bcb_list = bcblist;
+
+    backup_state.total = total;
+    backup_state.transferred = 0;
+    backup_state.zero_bytes = 0;
+
+    /* Grab a reference so hotplug does not delete the
+     * BlockDriverState from underneath us.
+     */
+    l = bcblist;
+    while (l) {
+        BackupCB *bcb = l->data;
+        l = g_list_next(l);
+        drive_get_ref(drive_get_by_blockdev(bcb->bs));
+    }
+
+    backup_start_jobs();
+
+    return g_strdup(backup_state.uuid_str);
+
+err:
+
+    l = bcblist;
+    while (l) {
+        g_free(l->data);
+        l = g_list_next(l);
+    }
+    g_list_free(bcblist);
+
+    if (devs) {
+        g_strfreev(devs);
+    }
+
+    if (writer) {
+        unlink(backup_file);
+        if (driver) {
+            Error *err = NULL;
+            driver->close_cb(writer, &err);
+        }
+    }
+
+    return NULL;
+}
+
+BackupStatus *qmp_query_backup(Error **errp)
+{
+    BackupStatus *info = g_malloc0(sizeof(*info));
+
+    if (!backup_state.start_time) {
+        /* not started, return {} */
+        return info;
+    }
+
+    info->has_status = true;
+    info->has_start_time = true;
+    info->start_time = backup_state.start_time;
+
+    if (backup_state.backup_file) {
+        info->has_backup_file = true;
+        info->backup_file = g_strdup(backup_state.backup_file);
+    }
+
+    info->has_uuid = true;
+    info->uuid = g_strdup(backup_state.uuid_str);
+
+    if (backup_state.end_time) {
+        if (backup_state.error) {
+            info->status = g_strdup("error");
+            info->has_errmsg = true;
+            info->errmsg = g_strdup(error_get_pretty(backup_state.error));
+        } else {
+            info->status = g_strdup("done");
+        }
+        info->has_end_time = true;
+        info->end_time = backup_state.end_time;
+    } else {
+        info->status = g_strdup("active");
+    }
+
+    info->has_total = true;
+    info->total = backup_state.total;
+    info->has_zero_bytes = true;
+    info->zero_bytes = backup_state.zero_bytes;
+    info->has_transferred = true;
+    info->transferred = backup_state.transferred;
+
+    return info;
+}
+
 static BlockJob *find_block_job(const char *device)
 {
     BlockDriverState *bs;
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 64008a9..0f178d8 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -83,6 +83,35 @@  STEXI
 Copy data from a backing file into a block device.
 ETEXI
 
+   {
+        .name       = "backup",
+        .args_type  = "backupfile:s,speed:o?,devlist:s?",
+        .params     = "backupfile [speed [devlist]]",
+        .help       = "create a VM Backup.",
+        .mhandler.cmd = hmp_backup,
+    },
+
+STEXI
+@item backup
+@findex backup
+Create a VM backup.
+ETEXI
+
+    {
+        .name       = "backup_cancel",
+        .args_type  = "",
+        .params     = "",
+        .help       = "cancel the current VM backup",
+        .mhandler.cmd = hmp_backup_cancel,
+    },
+
+STEXI
+@item backup_cancel
+@findex backup_cancel
+Cancel the current VM backup.
+
+ETEXI
+
     {
         .name       = "block_job_set_speed",
         .args_type  = "device:B,speed:o",
@@ -1630,6 +1659,8 @@  show CPU statistics
 show user network stack connection states
 @item info migrate
 show migration status
+@item info backup
+show backup status
 @item info migrate_capabilities
 show current migration capabilities
 @item info migrate_cache_size
diff --git a/hmp.c b/hmp.c
index 2f47a8a..b2c1f23 100644
--- a/hmp.c
+++ b/hmp.c
@@ -131,6 +131,38 @@  void hmp_info_mice(Monitor *mon, const QDict *qdict)
     qapi_free_MouseInfoList(mice_list);
 }
 
+void hmp_info_backup(Monitor *mon, const QDict *qdict)
+{
+    BackupStatus *info;
+
+    info = qmp_query_backup(NULL);
+    if (info->has_status) {
+        if (info->has_errmsg) {
+            monitor_printf(mon, "Backup status: %s - %s\n",
+                           info->status, info->errmsg);
+        } else {
+            monitor_printf(mon, "Backup status: %s\n", info->status);
+        }
+    }
+    if (info->has_backup_file) {
+        int per = (info->has_total && info->total &&
+            info->has_transferred && info->transferred) ?
+            (info->transferred * 100)/info->total : 0;
+        int zero_per = (info->has_total && info->total &&
+                        info->has_zero_bytes && info->zero_bytes) ?
+            (info->zero_bytes * 100)/info->total : 0;
+        monitor_printf(mon, "Backup file: %s\n", info->backup_file);
+        monitor_printf(mon, "Backup uuid: %s\n", info->uuid);
+        monitor_printf(mon, "Total size: %zd\n", info->total);
+        monitor_printf(mon, "Transferred bytes: %zd (%d%%)\n",
+                       info->transferred, per);
+        monitor_printf(mon, "Zero bytes: %zd (%d%%)\n",
+                       info->zero_bytes, zero_per);
+    }
+
+    qapi_free_BackupStatus(info);
+}
+
 void hmp_info_migrate(Monitor *mon, const QDict *qdict)
 {
     MigrationInfo *info;
@@ -998,6 +1030,37 @@  void hmp_block_stream(Monitor *mon, const QDict *qdict)
     hmp_handle_error(mon, &error);
 }
 
+void hmp_backup_cancel(Monitor *mon, const QDict *qdict)
+{
+    Error *errp = NULL;
+
+    qmp_backup_cancel(&errp);
+
+    if (error_is_set(&errp)) {
+        monitor_printf(mon, "%s\n", error_get_pretty(errp));
+        error_free(errp);
+        return;
+    }
+}
+
+void hmp_backup(Monitor *mon, const QDict *qdict)
+{
+    const char *backup_file = qdict_get_str(qdict, "backup-file");
+    const char *devlist = qdict_get_try_str(qdict, "devlist");
+    int64_t speed = qdict_get_try_int(qdict, "speed", 0);
+
+    Error *errp = NULL;
+
+    qmp_backup(backup_file, true, BACKUP_FORMAT_VMA, false, NULL, !!devlist,
+               devlist, qdict_haskey(qdict, "speed"), speed, &errp);
+
+    if (error_is_set(&errp)) {
+        monitor_printf(mon, "%s\n", error_get_pretty(errp));
+        error_free(errp);
+        return;
+    }
+}
+
 void hmp_block_job_set_speed(Monitor *mon, const QDict *qdict)
 {
     Error *error = NULL;
diff --git a/hmp.h b/hmp.h
index 30b3c20..ad4cf80 100644
--- a/hmp.h
+++ b/hmp.h
@@ -28,6 +28,7 @@  void hmp_info_mice(Monitor *mon, const QDict *qdict);
 void hmp_info_migrate(Monitor *mon, const QDict *qdict);
 void hmp_info_migrate_capabilities(Monitor *mon, const QDict *qdict);
 void hmp_info_migrate_cache_size(Monitor *mon, const QDict *qdict);
+void hmp_info_backup(Monitor *mon, const QDict *qdict);
 void hmp_info_cpus(Monitor *mon, const QDict *qdict);
 void hmp_info_block(Monitor *mon, const QDict *qdict);
 void hmp_info_blockstats(Monitor *mon, const QDict *qdict);
@@ -65,6 +66,8 @@  void hmp_eject(Monitor *mon, const QDict *qdict);
 void hmp_change(Monitor *mon, const QDict *qdict);
 void hmp_block_set_io_throttle(Monitor *mon, const QDict *qdict);
 void hmp_block_stream(Monitor *mon, const QDict *qdict);
+void hmp_backup(Monitor *mon, const QDict *qdict);
+void hmp_backup_cancel(Monitor *mon, const QDict *qdict);
 void hmp_block_job_set_speed(Monitor *mon, const QDict *qdict);
 void hmp_block_job_cancel(Monitor *mon, const QDict *qdict);
 void hmp_block_job_pause(Monitor *mon, const QDict *qdict);
diff --git a/monitor.c b/monitor.c
index 6a0f257..e4a810c 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2666,6 +2666,13 @@  static mon_cmd_t info_cmds[] = {
     },
 #endif
     {
+        .name       = "backup",
+        .args_type  = "",
+        .params     = "",
+        .help       = "show backup status",
+        .mhandler.cmd = hmp_info_backup,
+    },
+    {
         .name       = "migrate",
         .args_type  = "",
         .params     = "",
diff --git a/qapi-schema.json b/qapi-schema.json
index 7275b5d..09ca8ef 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -425,6 +425,39 @@ 
 { 'type': 'EventInfo', 'data': {'name': 'str'} }
 
 ##
+# @BackupStatus:
+#
+# Detailed backup status.
+#
+# @status: #optional string describing the current backup status.
+#          This can be 'active', 'done', 'error'. If this field is not
+#          returned, no backup process has been initiated
+#
+# @errmsg: #optional error message (only returned if status is 'error')
+#
+# @total: #optional total amount of bytes involved in the backup process
+#
+# @transferred: #optional amount of bytes already backed up.
+#
+# @zero-bytes: #optional amount of 'zero' bytes detected.
+#
+# @start-time: #optional time (epoch) when backup job started.
+#
+# @end-time: #optional time (epoch) when backup job finished.
+#
+# @backupfile: #optional backup file name
+#
+# @uuid: #optional uuid for this backup job
+#
+# Since: 1.5.0
+##
+{ 'type': 'BackupStatus',
+  'data': {'*status': 'str', '*errmsg': 'str', '*total': 'int',
+           '*transferred': 'int', '*zero-bytes': 'int',
+           '*start-time': 'int', '*end-time': 'int',
+           '*backup-file': 'str', '*uuid': 'str' } }
+
+##
 # @query-events:
 #
 # Return a list of supported QMP events by this server
@@ -1824,6 +1857,68 @@ 
   'data': { 'path': 'str' },
   'returns': [ 'ObjectPropertyInfo' ] }
 
+
+##
+# @BackupFormat
+#
+# An enumeration of supported backup formats.
+#
+# @vma: Proxmox vma backup format
+##
+{ 'enum': 'BackupFormat',
+  'data': [ 'vma' ] }
+
+##
+# @backup:
+#
+# Starts a VM backup.
+#
+# @backup-file: the backup file name
+#
+# @format: format of the backup file
+#
+# @config-filename: #optional name of a configuration file to include into
+# the backup archive.
+#
+# @speed: #optional the maximum speed, in bytes per second
+#
+# @devlist: #optional list of block device names (separated by ',', ';'
+# or ':'). By default the backup includes all writable block devices.
+#
+# Returns: the uuid of the backup job
+#
+# Since: 1.5.0
+##
+{ 'command': 'backup', 'data': { 'backup-file': 'str',
+                                 '*format': 'BackupFormat',
+                                 '*config-file': 'str',
+                                 '*devlist': 'str', '*speed': 'int' },
+  'returns': 'str' }
+
+##
+# @query-backup
+#
+# Returns information about current/last backup task.
+#
+# Returns: @BackupStatus
+#
+# Since: 1.5.0
+##
+{ 'command': 'query-backup', 'returns': 'BackupStatus' }
+
+##
+# @backup-cancel
+#
+# Cancel the current executing backup process.
+#
+# Returns: nothing on success
+#
+# Notes: This command succeeds even if there is no backup process running.
+#
+# Since: 1.5.0
+##
+{ 'command': 'backup-cancel' }
+
 ##
 # @qom-get:
 #
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 799adea..17e381b 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -889,6 +889,18 @@  EQMP
     },
 
     {
+        .name       = "backup",
+        .args_type  = "backup-file:s,format:s?,config-file:F?,speed:o?,devlist:s?",
+        .mhandler.cmd_new = qmp_marshal_input_backup,
+    },
+
+    {
+        .name       = "backup-cancel",
+        .args_type  = "",
+        .mhandler.cmd_new = qmp_marshal_input_backup_cancel,
+    },
+
+    {
         .name       = "block-job-set-speed",
         .args_type  = "device:B,speed:o",
         .mhandler.cmd_new = qmp_marshal_input_block_job_set_speed,
@@ -2566,6 +2578,21 @@  EQMP
     },
 
 SQMP
+
+query-backup
+-------------
+
+Backup status.
+
+EQMP
+
+    {
+        .name       = "query-backup",
+        .args_type  = "",
+        .mhandler.cmd_new = qmp_marshal_input_query_backup,
+    },
+
+SQMP
 migrate-set-capabilities
 -------