diff mbox

[28/47] qmp: add drive-mirror command

Message ID 1343127865-16608-29-git-send-email-pbonzini@redhat.com
State New
Headers show

Commit Message

Paolo Bonzini July 24, 2012, 11:04 a.m. UTC
This adds the monitor commands that start the mirroring job.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 blockdev.c       |  133 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 hmp-commands.hx  |   21 +++++++++
 hmp.c            |   28 ++++++++++++
 hmp.h            |    1 +
 qapi-schema.json |   35 ++++++++++++++
 qmp-commands.hx  |   42 +++++++++++++++++
 trace-events     |    2 +-
 7 files changed, 258 insertions(+), 4 deletions(-)

Comments

Eric Blake July 26, 2012, 11:42 p.m. UTC | #1
On 07/24/2012 05:04 AM, Paolo Bonzini wrote:
> This adds the monitor commands that start the mirroring job.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  blockdev.c       |  133 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  hmp-commands.hx  |   21 +++++++++
>  hmp.c            |   28 ++++++++++++
>  hmp.h            |    1 +
>  qapi-schema.json |   35 ++++++++++++++
>  qmp-commands.hx  |   42 +++++++++++++++++
>  trace-events     |    2 +-
>  7 files changed, 258 insertions(+), 4 deletions(-)

This command only allows mirroring from the top or the complete disk,
but no intermediate points.  That is, given:
base <- sn1 <- sn2
I can mirror sn2 in isolation, or the entire chain including base, but I
cannot mirror exactly sn1+sn2.  Instead, I would have to do a mirror of
sn2 then follow up with a partial block-stream to rebase that mirror on
top of base.  I guess this is okay, but it feels a bit limited to have
to do it in two steps instead of one.

[Hmm, your later patches introduce the ability to have a persistent
mapping file; perhaps this can be exploited to have the user pre-create
a mapping file that shows the portions allocated by sn1+sn2 as the
starting point, but I'm not there in the patch series yet to know if
this is the case.]

> +++ b/qapi-schema.json
> @@ -1367,6 +1367,41 @@
>    'returns': 'str' }
>  
>  ##
> +# @drive-mirror
> +#
> +# Start mirroring a block device's writes to a new destination.
> +#
> +# @device:  the name of the device whose writes should be mirrored.
> +#
> +# @target: the target of the new image. If the file exists, or if it
> +#          is a device, the existing file/device will be used as the new
> +#          destination.  If it does not exist, a new file will be created.
> +#
> +# @format: #optional the format of the new destination, default is to
> +#          probe is @mode is 'existing', else the format of the source

s/probe is/probe if/

> +#
> +# @mode: #optional whether and how QEMU should create a new image, default is
> +#        'absolute-paths'.
> +#
> +# @speed:  #optional the maximum speed, in bytes per second
> +#
> +# @sync: what parts of the disk image should be copied to the destination
> +#        (all the disk, only the sectors allocated in the topmost image, or
> +#        only new I/O).

Is there any reason 'sync' is listed last here, but before 'mode' in the
JSON?

> +#
> +# Returns: nothing on success
> +#          If @device is not a valid block device, DeviceNotFound
> +#          If @target can't be opened, OpenFileFailed
> +#          If @format is invalid, InvalidBlockFormat
> +#
> +# Since 1.2
> +##
> +{ 'command': 'drive-mirror',
> +  'data': { 'device': 'str', 'target': 'str', '*format': 'str',
> +            'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
> +            '*speed': 'int' } }
> +

> +drive-mirror
> +------------
> +
> +Start mirroring a block device's writes to a new destination. target
> +specifies the target of the new image. If the file exists, or if it is
> +a device, it will be used as the new destination for writes. If does not
> +exist, a new file will be created. format specifies the format of the
> +mirror image, default is to probe if mode='existing', else the format
> +of the source.
> +
> +Arguments:
> +
> +- "device": device name to operate on (json-string)
> +- "target": name of new image file (json-string)
> +- "format": format of new image (json-string, optional)
> +- "mode": how an image file should be created into the target
> +  file/device (NewImageMode, optional, default 'absolute-paths')

Should we call out all the possible 'NewImageMode' strings here,...

> +- "speed": maximum speed of the streaming job, in bytes per second
> +  (json-int)

Mention that this is optional.

> +- "sync": what parts of the disk image should be copied to the destination;
> +  possibilities include "full" for all the disk, "top" for only the sectors
> +  allocated in the topmost image, or "none" to only replicate new I/O
> +  (MirrorSyncMode).

...given that we did the same for all the possible MirrorSyncMode strings?
Paolo Bonzini July 27, 2012, 7:04 a.m. UTC | #2
Il 27/07/2012 01:42, Eric Blake ha scritto:
> On 07/24/2012 05:04 AM, Paolo Bonzini wrote:
>> This adds the monitor commands that start the mirroring job.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  blockdev.c       |  133 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
>>  hmp-commands.hx  |   21 +++++++++
>>  hmp.c            |   28 ++++++++++++
>>  hmp.h            |    1 +
>>  qapi-schema.json |   35 ++++++++++++++
>>  qmp-commands.hx  |   42 +++++++++++++++++
>>  trace-events     |    2 +-
>>  7 files changed, 258 insertions(+), 4 deletions(-)
> 
> This command only allows mirroring from the top or the complete disk,
> but no intermediate points.  That is, given:
> base <- sn1 <- sn2
> I can mirror sn2 in isolation, or the entire chain including base, but I
> cannot mirror exactly sn1+sn2.  Instead, I would have to do a mirror of
> sn2 then follow up with a partial block-stream to rebase that mirror on
> top of base.  I guess this is okay, but it feels a bit limited to have
> to do it in two steps instead of one.

I couldn't think of a use case for this:

- preparing for a failing disk requires to keep all snapshots
unmodified, so mirroring only the top

- migration with non-shared storage requires to copy the whole disk
contents (unless you want to do part of the copy outside QEMU)

- collapsing a "tower" of images to raw for performance requires a copy
of the whole disk contents

And we need the sync parameter anyway (for 'none' and in the future for
the dirty bitmap), so I preferred to keep the API simple.

> [Hmm, your later patches introduce the ability to have a persistent
> mapping file; perhaps this can be exploited to have the user pre-create
> a mapping file that shows the portions allocated by sn1+sn2 as the
> starting point, but I'm not there in the patch series yet to know if
> this is the case.]

These are not in the posted patches (and in fact have not been developed
yet :)).

>> +++ b/qapi-schema.json
>> @@ -1367,6 +1367,41 @@
>>    'returns': 'str' }
>>  
>>  ##
>> +# @drive-mirror
>> +#
>> +# Start mirroring a block device's writes to a new destination.
>> +#
>> +# @device:  the name of the device whose writes should be mirrored.
>> +#
>> +# @target: the target of the new image. If the file exists, or if it
>> +#          is a device, the existing file/device will be used as the new
>> +#          destination.  If it does not exist, a new file will be created.
>> +#
>> +# @format: #optional the format of the new destination, default is to
>> +#          probe is @mode is 'existing', else the format of the source
> 
> s/probe is/probe if/
> 
>> +#
>> +# @mode: #optional whether and how QEMU should create a new image, default is
>> +#        'absolute-paths'.
>> +#
>> +# @speed:  #optional the maximum speed, in bytes per second
>> +#
>> +# @sync: what parts of the disk image should be copied to the destination
>> +#        (all the disk, only the sectors allocated in the topmost image, or
>> +#        only new I/O).
> 
> Is there any reason 'sync' is listed last here, but before 'mode' in the
> JSON?

No, no reason.

>> +#
>> +# Returns: nothing on success
>> +#          If @device is not a valid block device, DeviceNotFound
>> +#          If @target can't be opened, OpenFileFailed
>> +#          If @format is invalid, InvalidBlockFormat
>> +#
>> +# Since 1.2
>> +##
>> +{ 'command': 'drive-mirror',
>> +  'data': { 'device': 'str', 'target': 'str', '*format': 'str',
>> +            'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
>> +            '*speed': 'int' } }
>> +
> 
>> +drive-mirror
>> +------------
>> +
>> +Start mirroring a block device's writes to a new destination. target
>> +specifies the target of the new image. If the file exists, or if it is
>> +a device, it will be used as the new destination for writes. If does not
>> +exist, a new file will be created. format specifies the format of the
>> +mirror image, default is to probe if mode='existing', else the format
>> +of the source.
>> +
>> +Arguments:
>> +
>> +- "device": device name to operate on (json-string)
>> +- "target": name of new image file (json-string)
>> +- "format": format of new image (json-string, optional)
>> +- "mode": how an image file should be created into the target
>> +  file/device (NewImageMode, optional, default 'absolute-paths')
> 
> Should we call out all the possible 'NewImageMode' strings here,...
> 
>> +- "speed": maximum speed of the streaming job, in bytes per second
>> +  (json-int)
> 
> Mention that this is optional.
> 
>> +- "sync": what parts of the disk image should be copied to the destination;
>> +  possibilities include "full" for all the disk, "top" for only the sectors
>> +  allocated in the topmost image, or "none" to only replicate new I/O
>> +  (MirrorSyncMode).
> 
> ...given that we did the same for all the possible MirrorSyncMode strings?

We could.  The difference between the two is that NewImageMode is used
also in other commands.  In the end, I'm not even sure of the usefulness
of this documentation since the .json schema is much better and could be
converted to a format with hyperlinks.

Paolo
Kevin Wolf July 31, 2012, 9:26 a.m. UTC | #3
Am 24.07.2012 13:04, schrieb Paolo Bonzini:
> This adds the monitor commands that start the mirroring job.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

[ Moving the discussion upstream ]

> Why make all of it inaccessible?  Everything except target device access
> does have a stable API.  The target device access can be delayed to 1.3,
> together with the much-needed QMP schema introspection.

I'm not even sure about the QMP mirror command itself.

I don't really like it, it does too many things at once: It can create
the target image file, it opens the target and it actually starts the
mirroring. It's rather bad at the first two steps, because it doesn't
take any options. This means that it can't create qcow2v3 images, for
example. Or you can't mirror into a backup with cache=unsafe while
running your real VM on cache=writethrough.

Having an all-in-one mirror command is a nice feature for HMP, but for
QMP it's more like a design problem.

Now I see you have called it drive-mirror, so that kind of implies that
it's not the final blockdev-mirror but just a QMP version of a command
primarily designed for HMP. As such this restricted functionality may be
acceptable, but it's not like everything is already perfect and there's
no room for discussion.

Kevin
Paolo Bonzini July 31, 2012, 9:33 a.m. UTC | #4
Il 31/07/2012 11:26, Kevin Wolf ha scritto:
> Am 24.07.2012 13:04, schrieb Paolo Bonzini:
>> This adds the monitor commands that start the mirroring job.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> 
> [ Moving the discussion upstream ]
> 
>> Why make all of it inaccessible?  Everything except target device access
>> does have a stable API.  The target device access can be delayed to 1.3,
>> together with the much-needed QMP schema introspection.
> 
> I'm not even sure about the QMP mirror command itself.
> 
> I don't really like it, it does too many things at once: It can create
> the target image file, it opens the target and it actually starts the
> mirroring. It's rather bad at the first two steps, because it doesn't
> take any options. This means that it can't create qcow2v3 images, for
> example. Or you can't mirror into a backup with cache=unsafe while
> running your real VM on cache=writethrough.

Yes, though this can be worked around with mode: 'existing'.

> Having an all-in-one mirror command is a nice feature for HMP, but for
> QMP it's more like a design problem.
> 
> Now I see you have called it drive-mirror

I thought this was your idea. :)

> , so that kind of implies that
> it's not the final blockdev-mirror but just a QMP version of a command
> primarily designed for HMP. As such this restricted functionality may be
> acceptable, but it's not like everything is already perfect and there's
> no room for discussion.

We keep going back to the same point that we do not have -blockdev, but
it's becoming a bit frustrating to always rehash this same point...

Paolo
Kevin Wolf July 31, 2012, 9:46 a.m. UTC | #5
Am 31.07.2012 11:33, schrieb Paolo Bonzini:
> Il 31/07/2012 11:26, Kevin Wolf ha scritto:
>> Am 24.07.2012 13:04, schrieb Paolo Bonzini:
>>> This adds the monitor commands that start the mirroring job.
>>>
>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>
>> [ Moving the discussion upstream ]
>>
>>> Why make all of it inaccessible?  Everything except target device access
>>> does have a stable API.  The target device access can be delayed to 1.3,
>>> together with the much-needed QMP schema introspection.
>>
>> I'm not even sure about the QMP mirror command itself.
>>
>> I don't really like it, it does too many things at once: It can create
>> the target image file, it opens the target and it actually starts the
>> mirroring. It's rather bad at the first two steps, because it doesn't
>> take any options. This means that it can't create qcow2v3 images, for
>> example. Or you can't mirror into a backup with cache=unsafe while
>> running your real VM on cache=writethrough.
> 
> Yes, though this can be worked around with mode: 'existing'.

True. Only the problem with image creation, though, not the one with
bdrv_open() flags, right?

>> Having an all-in-one mirror command is a nice feature for HMP, but for
>> QMP it's more like a design problem.
>>
>> Now I see you have called it drive-mirror
> 
> I thought this was your idea. :)

Hm, then probably we discussed similar things before. :-)

>> , so that kind of implies that
>> it's not the final blockdev-mirror but just a QMP version of a command
>> primarily designed for HMP. As such this restricted functionality may be
>> acceptable, but it's not like everything is already perfect and there's
>> no room for discussion.
> 
> We keep going back to the same point that we do not have -blockdev, but
> it's becoming a bit frustrating to always rehash this same point...

The question is whether we need it at all. We do have a drive_add
if=none, and for creating a mirror target that should actually be enough.

Kevin
Paolo Bonzini July 31, 2012, 10:02 a.m. UTC | #6
Il 31/07/2012 11:46, Kevin Wolf ha scritto:
>>> I'm not even sure about the QMP mirror command itself.
>>>
>>> I don't really like it, it does too many things at once: It can create
>>> the target image file, it opens the target and it actually starts the
>>> mirroring. It's rather bad at the first two steps, because it doesn't
>>> take any options. This means that it can't create qcow2v3 images, for
>>> example. Or you can't mirror into a backup with cache=unsafe while
>>> running your real VM on cache=writethrough.
>>
>> Yes, though this can be worked around with mode: 'existing'.
> 
> True. Only the problem with image creation, though, not the one with
> bdrv_open() flags, right?

Yeah, but do you really care about for example io=threads vs. io=native?
 The only interesting one is cache=unsafe; the mirror should enable
writeback caching on the target (bdrv_swap will disable it if needed;
I'll change this in the next submission), so cache=writethrough vs.
writeback doesn't matter.

>>> Having an all-in-one mirror command is a nice feature for HMP, but for
>>> QMP it's more like a design problem.
>>>
>>> Now I see you have called it drive-mirror
>>
>> I thought this was your idea. :)
> 
> Hm, then probably we discussed similar things before. :-)
> 
>>> , so that kind of implies that
>>> it's not the final blockdev-mirror but just a QMP version of a command
>>> primarily designed for HMP. As such this restricted functionality may be
>>> acceptable, but it's not like everything is already perfect and there's
>>> no room for discussion.
>>
>> We keep going back to the same point that we do not have -blockdev, but
>> it's becoming a bit frustrating to always rehash this same point...
> 
> The question is whether we need it at all. We do have a drive_add
> if=none, and for creating a mirror target that should actually be enough.

But not for creating images.  That would require qemu-img invocation.

If you're okay with always using an existing image in the QMP case (and
moving image creation to the HMP implementation), we can do it.  But I'm
not sure I like it, I think it's excessive in the other direction.

Paolo
Kevin Wolf July 31, 2012, 10:25 a.m. UTC | #7
Am 31.07.2012 12:02, schrieb Paolo Bonzini:
> Il 31/07/2012 11:46, Kevin Wolf ha scritto:
>>>> I'm not even sure about the QMP mirror command itself.
>>>>
>>>> I don't really like it, it does too many things at once: It can create
>>>> the target image file, it opens the target and it actually starts the
>>>> mirroring. It's rather bad at the first two steps, because it doesn't
>>>> take any options. This means that it can't create qcow2v3 images, for
>>>> example. Or you can't mirror into a backup with cache=unsafe while
>>>> running your real VM on cache=writethrough.
>>>
>>> Yes, though this can be worked around with mode: 'existing'.
>>
>> True. Only the problem with image creation, though, not the one with
>> bdrv_open() flags, right?
> 
> Yeah, but do you really care about for example io=threads vs. io=native?
>  The only interesting one is cache=unsafe; the mirror should enable
> writeback caching on the target (bdrv_swap will disable it if needed;
> I'll change this in the next submission), so cache=writethrough vs.
> writeback doesn't matter.

Can we really make it writeback unconditionally? For a passive mirror it
probably doesn't make a difference, but what happens when the user stops
the mirroring and switches to the target? Will it stay writeback?

The same goes for aio=native/threads. Probably not interesting for the
mirror, but well afterwards.

Another interesting thing is I/O throttling. The mirror currently
implements rate limiting itself, but is there really a reason why we
can't reuse regular I/O throttling on the target?

>>>> Having an all-in-one mirror command is a nice feature for HMP, but for
>>>> QMP it's more like a design problem.
>>>>
>>>> Now I see you have called it drive-mirror
>>>
>>> I thought this was your idea. :)
>>
>> Hm, then probably we discussed similar things before. :-)
>>
>>>> , so that kind of implies that
>>>> it's not the final blockdev-mirror but just a QMP version of a command
>>>> primarily designed for HMP. As such this restricted functionality may be
>>>> acceptable, but it's not like everything is already perfect and there's
>>>> no room for discussion.
>>>
>>> We keep going back to the same point that we do not have -blockdev, but
>>> it's becoming a bit frustrating to always rehash this same point...
>>
>> The question is whether we need it at all. We do have a drive_add
>> if=none, and for creating a mirror target that should actually be enough.
> 
> But not for creating images.  That would require qemu-img invocation.

Yeah, either qemu-img or another monitor command. I believe that in
practice libvirt will do this anyway if this is the only way to specify
image creation options.

> If you're okay with always using an existing image in the QMP case (and
> moving image creation to the HMP implementation), we can do it.  But I'm
> not sure I like it, I think it's excessive in the other direction.

If you think it's helpful, we could make it optional and have a mode
'blockdev' where you don't specify a file name but a blockdev name. But
this is an approach that feels a bit HMPish...

Kevin
Paolo Bonzini July 31, 2012, 10:51 a.m. UTC | #8
Il 31/07/2012 12:25, Kevin Wolf ha scritto:
>> Yeah, but do you really care about for example io=threads vs. io=native?
>>  The only interesting one is cache=unsafe; the mirror should enable
>> writeback caching on the target (bdrv_swap will disable it if needed;
>> I'll change this in the next submission), so cache=writethrough vs.
>> writeback doesn't matter.
> 
> Can we really make it writeback unconditionally? For a passive mirror it
> probably doesn't make a difference, but what happens when the user stops
> the mirroring and switches to the target? Will it stay writeback?

bdrv_swap takes care of it just fine.

> The same goes for aio=native/threads. Probably not interesting for the
> mirror, but well afterwards.

Actually it is interesting for the mirror.  Passive mirroring can only
benefit from lower latency.

But yes, bdrv_swap would not copy this one.  Right now we always use the
same aio method as the source (at worst it is ignored by the protocol),
so it is not a problem.

> Another interesting thing is I/O throttling. The mirror currently
> implements rate limiting itself, but is there really a reason why we
> can't reuse regular I/O throttling on the target?

I thought about it, but I'm worried of what happens when I/O throttling
kicks in, and how it interacts with pause/resume/cancel.

>>>>> Having an all-in-one mirror command is a nice feature for HMP, but for
>>>>> QMP it's more like a design problem.
>>>>>
>>>>> Now I see you have called it drive-mirror
>>>>
>>>> I thought this was your idea. :)
>>>
>>> Hm, then probably we discussed similar things before. :-)
>>>
>>>>> , so that kind of implies that
>>>>> it's not the final blockdev-mirror but just a QMP version of a command
>>>>> primarily designed for HMP. As such this restricted functionality may be
>>>>> acceptable, but it's not like everything is already perfect and there's
>>>>> no room for discussion.
>>>>
>>>> We keep going back to the same point that we do not have -blockdev, but
>>>> it's becoming a bit frustrating to always rehash this same point...
>>>
>>> The question is whether we need it at all. We do have a drive_add
>>> if=none, and for creating a mirror target that should actually be enough.
>>
>> But not for creating images.  That would require qemu-img invocation.
> 
> Yeah, either qemu-img or another monitor command. I believe that in
> practice libvirt will do this anyway if this is the only way to specify
> image creation options.

Playing devil's advocate because you've almost convinced me, we have the
same problem for blockdev-snapshot-sync.  Now drive-mirror is a bit
different because you can use it to "reshape" an image to something
else, but the same could be done with snapshot + streaming in many cases.

>> If you're okay with always using an existing image in the QMP case (and
>> moving image creation to the HMP implementation), we can do it.  But I'm
>> not sure I like it, I think it's excessive in the other direction.
> 
> If you think it's helpful, we could make it optional and have a mode
> 'blockdev' where you don't specify a file name but a blockdev name. But
> this is an approach that feels a bit HMPish...

I think having a few limited knobs for image creation make some sense
(not all QMP clients need to be as sophisticated as libvirt), but that's
actually an interesting idea (as it is in general to piggyback on
drive_add).

Still, it leaves something to be desired.  It's not that it feels
HMP-ish, it's that it's overloading target a bit too much.  I would
prefer to keep drive-mirror for simple clients, and have a separate
blockdev-mirror that must have a blockdev target.  But doing the same
with blockdev-snapshot-sync will always look like duct-tape, because the
blockdev name is already taken. :(  Man, sometimes it feels like we're
not getting one thing right.

Paolo
Kevin Wolf July 31, 2012, 11:13 a.m. UTC | #9
Am 31.07.2012 12:51, schrieb Paolo Bonzini:
> Il 31/07/2012 12:25, Kevin Wolf ha scritto:
>>> Yeah, but do you really care about for example io=threads vs. io=native?
>>>  The only interesting one is cache=unsafe; the mirror should enable
>>> writeback caching on the target (bdrv_swap will disable it if needed;
>>> I'll change this in the next submission), so cache=writethrough vs.
>>> writeback doesn't matter.
>>
>> Can we really make it writeback unconditionally? For a passive mirror it
>> probably doesn't make a difference, but what happens when the user stops
>> the mirroring and switches to the target? Will it stay writeback?
> 
> bdrv_swap takes care of it just fine.

Ah, the switch uses bdrv_swap? Then that one is fine indeed.

>> The same goes for aio=native/threads. Probably not interesting for the
>> mirror, but well afterwards.
> 
> Actually it is interesting for the mirror.  Passive mirroring can only
> benefit from lower latency.
> 
> But yes, bdrv_swap would not copy this one.  Right now we always use the
> same aio method as the source (at worst it is ignored by the protocol),
> so it is not a problem.

Fair enough, though there may be cases where you'd really want to
switch, like migrating from a block device to a file on NFS.

>> Another interesting thing is I/O throttling. The mirror currently
>> implements rate limiting itself, but is there really a reason why we
>> can't reuse regular I/O throttling on the target?
> 
> I thought about it, but I'm worried of what happens when I/O throttling
> kicks in, and how it interacts with pause/resume/cancel.

bdrv_co_write won't return until the request has been throttled, so it
should be mostly transparent. The effect that it could have is that
pausing the mirror could take a little bit longer to complete (though
not much, as there is only one mirror request at the same time). But
iirc, pausing a block job was async anyway.

Any other aspect I'm missing?

>>>>>> Having an all-in-one mirror command is a nice feature for HMP, but for
>>>>>> QMP it's more like a design problem.
>>>>>>
>>>>>> Now I see you have called it drive-mirror
>>>>>
>>>>> I thought this was your idea. :)
>>>>
>>>> Hm, then probably we discussed similar things before. :-)
>>>>
>>>>>> , so that kind of implies that
>>>>>> it's not the final blockdev-mirror but just a QMP version of a command
>>>>>> primarily designed for HMP. As such this restricted functionality may be
>>>>>> acceptable, but it's not like everything is already perfect and there's
>>>>>> no room for discussion.
>>>>>
>>>>> We keep going back to the same point that we do not have -blockdev, but
>>>>> it's becoming a bit frustrating to always rehash this same point...
>>>>
>>>> The question is whether we need it at all. We do have a drive_add
>>>> if=none, and for creating a mirror target that should actually be enough.
>>>
>>> But not for creating images.  That would require qemu-img invocation.
>>
>> Yeah, either qemu-img or another monitor command. I believe that in
>> practice libvirt will do this anyway if this is the only way to specify
>> image creation options.
> 
> Playing devil's advocate because you've almost convinced me, we have the
> same problem for blockdev-snapshot-sync.  Now drive-mirror is a bit
> different because you can use it to "reshape" an image to something
> else, but the same could be done with snapshot + streaming in many cases.

Yes, blockdev-snapshot-sync is more or less the same case. We were aware
from the beginning that it's not the right command, but apparently
didn't think of drive_add.

>>> If you're okay with always using an existing image in the QMP case (and
>>> moving image creation to the HMP implementation), we can do it.  But I'm
>>> not sure I like it, I think it's excessive in the other direction.
>>
>> If you think it's helpful, we could make it optional and have a mode
>> 'blockdev' where you don't specify a file name but a blockdev name. But
>> this is an approach that feels a bit HMPish...
> 
> I think having a few limited knobs for image creation make some sense
> (not all QMP clients need to be as sophisticated as libvirt), but that's
> actually an interesting idea (as it is in general to piggyback on
> drive_add).
> 
> Still, it leaves something to be desired.  It's not that it feels
> HMP-ish, it's that it's overloading target a bit too much.  I would
> prefer to keep drive-mirror for simple clients, and have a separate
> blockdev-mirror that must have a blockdev target.  But doing the same
> with blockdev-snapshot-sync will always look like duct-tape, because the
> blockdev name is already taken. :(  Man, sometimes it feels like we're
> not getting one thing right.

blockdev-snapshot isn't taken yet. However, having the two side by side
would imply that blockdev-snapshot is async, which I believe is
currently not the most urgent of our concerns...

Or actually, it might not even matter any more, because the thing that
really takes some time is creating and opening the image. Once you have
the blockdev, there's no point in making things async any more.

Kevin
Paolo Bonzini July 31, 2012, 11:25 a.m. UTC | #10
Il 31/07/2012 13:13, Kevin Wolf ha scritto:
> Am 31.07.2012 12:51, schrieb Paolo Bonzini:
>> Il 31/07/2012 12:25, Kevin Wolf ha scritto:
>>>> Yeah, but do you really care about for example io=threads vs. io=native?
>>>>  The only interesting one is cache=unsafe; the mirror should enable
>>>> writeback caching on the target (bdrv_swap will disable it if needed;
>>>> I'll change this in the next submission), so cache=writethrough vs.
>>>> writeback doesn't matter.
>>>
>>> Can we really make it writeback unconditionally? For a passive mirror it
>>> probably doesn't make a difference, but what happens when the user stops
>>> the mirroring and switches to the target? Will it stay writeback?
>>
>> bdrv_swap takes care of it just fine.
> 
> Ah, the switch uses bdrv_swap? Then that one is fine indeed.
> 
>>> The same goes for aio=native/threads. Probably not interesting for the
>>> mirror, but well afterwards.
>>
>> Actually it is interesting for the mirror.  Passive mirroring can only
>> benefit from lower latency.
>>
>> But yes, bdrv_swap would not copy this one.  Right now we always use the
>> same aio method as the source (at worst it is ignored by the protocol),
>> so it is not a problem.
> 
> Fair enough, though there may be cases where you'd really want to
> switch, like migrating from a block device to a file on NFS.
> 
>>> Another interesting thing is I/O throttling. The mirror currently
>>> implements rate limiting itself, but is there really a reason why we
>>> can't reuse regular I/O throttling on the target?
>>
>> I thought about it, but I'm worried of what happens when I/O throttling
>> kicks in, and how it interacts with pause/resume/cancel.
> 
> bdrv_co_write won't return until the request has been throttled, so it
> should be mostly transparent.

At the end of this series I use bdrv_aio_readv/writev.

> The effect that it could have is that
> pausing the mirror could take a little bit longer to complete (though
> not much, as there is only one mirror request at the same time).

Not anymore. :)

> But iirc, pausing a block job was async anyway.

Yes, it is, and job->busy nicely abstracts the hairy parts.

> Any other aspect I'm missing?

No, that should be ok.  Though I'm not sure if it's so useful to apply
throttling on the target.  It's more useful to throttle the source
(making writes slower than reads will help the job's convergence) and
copy at full steam to the target.

>>>> If you're okay with always using an existing image in the QMP case (and
>>>> moving image creation to the HMP implementation), we can do it.  But I'm
>>>> not sure I like it, I think it's excessive in the other direction.
>>>
>>> If you think it's helpful, we could make it optional and have a mode
>>> 'blockdev' where you don't specify a file name but a blockdev name. But
>>> this is an approach that feels a bit HMPish...
>>
>> I think having a few limited knobs for image creation make some sense
>> (not all QMP clients need to be as sophisticated as libvirt), but that's
>> actually an interesting idea (as it is in general to piggyback on
>> drive_add).
>>
>> Still, it leaves something to be desired.  It's not that it feels
>> HMP-ish, it's that it's overloading target a bit too much.  I would
>> prefer to keep drive-mirror for simple clients, and have a separate
>> blockdev-mirror that must have a blockdev target.  But doing the same
>> with blockdev-snapshot-sync will always look like duct-tape, because the
>> blockdev name is already taken. :(  Man, sometimes it feels like we're
>> not getting one thing right.
> 
> blockdev-snapshot isn't taken yet. However, having the two side by side
> would imply that blockdev-snapshot is async, which I believe is
> currently not the most urgent of our concerns...
> 
> Or actually, it might not even matter any more, because the thing that
> really takes some time is creating and opening the image. Once you have
> the blockdev, there's no point in making things async any more.

Right, blockdev-snapshot would really be just a bdrv_append operation.
/me smiles. :)  So let's keep drive-mirror as is, and later add
blockdev-mirror.

Paolo
Kevin Wolf July 31, 2012, 12:17 p.m. UTC | #11
Am 31.07.2012 13:25, schrieb Paolo Bonzini:
> Il 31/07/2012 13:13, Kevin Wolf ha scritto:
>> Am 31.07.2012 12:51, schrieb Paolo Bonzini:
>>> Il 31/07/2012 12:25, Kevin Wolf ha scritto:
>>>> Another interesting thing is I/O throttling. The mirror currently
>>>> implements rate limiting itself, but is there really a reason why we
>>>> can't reuse regular I/O throttling on the target?
>>>
>>> I thought about it, but I'm worried of what happens when I/O throttling
>>> kicks in, and how it interacts with pause/resume/cancel.
>>
>> bdrv_co_write won't return until the request has been throttled, so it
>> should be mostly transparent.
> 
> At the end of this series I use bdrv_aio_readv/writev.
> 
>> The effect that it could have is that
>> pausing the mirror could take a little bit longer to complete (though
>> not much, as there is only one mirror request at the same time).
> 
> Not anymore. :)

Hm, I see. Makes it a bit more involved, but then the logic should be
almost the same as you already need to complete the mirror.

>> But iirc, pausing a block job was async anyway.
> 
> Yes, it is, and job->busy nicely abstracts the hairy parts.
> 
>> Any other aspect I'm missing?
> 
> No, that should be ok.  Though I'm not sure if it's so useful to apply
> throttling on the target.  It's more useful to throttle the source
> (making writes slower than reads will help the job's convergence) and
> copy at full steam to the target.

But doesn't the rate limiting of the mirror already throttle the target?
Which isn't too bad, because I think at least in the initial phase
you'll want to have both source and target throttled (later the target
is automatically throttled to the level of the source, except for bitmap
granularity artefacts).

>>>>> If you're okay with always using an existing image in the QMP case (and
>>>>> moving image creation to the HMP implementation), we can do it.  But I'm
>>>>> not sure I like it, I think it's excessive in the other direction.
>>>>
>>>> If you think it's helpful, we could make it optional and have a mode
>>>> 'blockdev' where you don't specify a file name but a blockdev name. But
>>>> this is an approach that feels a bit HMPish...
>>>
>>> I think having a few limited knobs for image creation make some sense
>>> (not all QMP clients need to be as sophisticated as libvirt), but that's
>>> actually an interesting idea (as it is in general to piggyback on
>>> drive_add).
>>>
>>> Still, it leaves something to be desired.  It's not that it feels
>>> HMP-ish, it's that it's overloading target a bit too much.  I would
>>> prefer to keep drive-mirror for simple clients, and have a separate
>>> blockdev-mirror that must have a blockdev target.  But doing the same
>>> with blockdev-snapshot-sync will always look like duct-tape, because the
>>> blockdev name is already taken. :(  Man, sometimes it feels like we're
>>> not getting one thing right.
>>
>> blockdev-snapshot isn't taken yet. However, having the two side by side
>> would imply that blockdev-snapshot is async, which I believe is
>> currently not the most urgent of our concerns...
>>
>> Or actually, it might not even matter any more, because the thing that
>> really takes some time is creating and opening the image. Once you have
>> the blockdev, there's no point in making things async any more.
> 
> Right, blockdev-snapshot would really be just a bdrv_append operation.
> /me smiles. :)  So let's keep drive-mirror as is, and later add
> blockdev-mirror.

Ok, that's fair enough.

Kevin
Paolo Bonzini July 31, 2012, 12:52 p.m. UTC | #12
Il 31/07/2012 14:17, Kevin Wolf ha scritto:
>> No, that should be ok.  Though I'm not sure if it's so useful to apply
>> throttling on the target.  It's more useful to throttle the source
>> (making writes slower than reads will help the job's convergence) and
>> copy at full steam to the target.
> 
> But doesn't the rate limiting of the mirror already throttle the target?

Of course whatever you throttle (any of job, source, target) will have
an effect on the other two as well.

IMO, the target is perhaps the least useful to throttle.  It is more
interesting to play with the source, because that's guest visible.
Slowing down the target, while letting the guest run at full speed is
unlikely to help convergence of the job.

On the other hand, the job and target speeds are really duplicates of
each other, so the job speed is really just as useless.

So it sounds like removing the job speed is a good idea.  If needed,
libvirt can implement it later with a named block device for the target
+ I/O throttling.

> Which isn't too bad, because I think at least in the initial phase
> you'll want to have both source and target throttled (later the target
> is automatically throttled to the level of the source, except for bitmap
> granularity artefacts).

The target is always throttled to the level of the source and vice
versa.  The target can never be written faster than you read the source;
and slowing down the target will keep buffers busy so you cannot read
more from the source.

Paolo
Kevin Wolf Sept. 13, 2012, 1:15 p.m. UTC | #13
Am 24.07.2012 13:04, schrieb Paolo Bonzini:
> This adds the monitor commands that start the mirroring job.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  blockdev.c       |  133 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  hmp-commands.hx  |   21 +++++++++
>  hmp.c            |   28 ++++++++++++
>  hmp.h            |    1 +
>  qapi-schema.json |   35 ++++++++++++++
>  qmp-commands.hx  |   42 +++++++++++++++++
>  trace-events     |    2 +-
>  7 files changed, 258 insertions(+), 4 deletions(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 192a9db..4b4574a 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -21,6 +21,8 @@
>  #include "trace.h"
>  #include "arch_init.h"
>  
> +static void block_job_cb(void *opaque, int ret);
> +
>  static QTAILQ_HEAD(drivelist, DriveInfo) drives = QTAILQ_HEAD_INITIALIZER(drives);
>  
>  static const char *const if_name[IF_COUNT] = {
> @@ -825,6 +827,131 @@ exit:
>      return;
>  }
>  
> +void qmp_drive_mirror(const char *device, const char *target,
> +                      bool has_format, const char *format,
> +                      enum MirrorSyncMode sync,
> +                      bool has_mode, enum NewImageMode mode,
> +                      bool has_speed, int64_t speed, Error **errp)
> +{
> +    BlockDriverInfo bdi;
> +    BlockDriverState *bs;
> +    BlockDriverState *source, *target_bs;
> +    BlockDriver *proto_drv;
> +    BlockDriver *drv = NULL;
> +    Error *local_err = NULL;
> +    int flags;
> +    uint64_t size;
> +    int ret;
> +
> +    if (!has_speed) {
> +        speed = 0;
> +    }
> +    if (!has_mode) {
> +        mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
> +    }
> +
> +    bs = bdrv_find(device);
> +    if (!bs) {
> +        error_set(errp, QERR_DEVICE_NOT_FOUND, device);
> +        return;
> +    }
> +
> +    if (!has_format) {
> +        format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : bs->drv->format_name;
> +    }
> +    if (format) {
> +        drv = bdrv_find_format(format);
> +        if (!drv) {
> +            error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
> +            return;
> +        }
> +    }
> +
> +    if (!bdrv_is_inserted(bs)) {
> +        error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
> +        return;
> +    }
> +
> +    if (bdrv_in_use(bs)) {
> +        error_set(errp, QERR_DEVICE_IN_USE, device);
> +        return;
> +    }
> +
> +    flags = bs->open_flags | BDRV_O_RDWR;

Do we take care to make the image read-only again after completion?

Kevin
Paolo Bonzini Sept. 13, 2012, 1:24 p.m. UTC | #14
Il 13/09/2012 15:15, Kevin Wolf ha scritto:
>> > +    flags = bs->open_flags | BDRV_O_RDWR;
> Do we take care to make the image read-only again after completion?

Not at the file-descriptor level, but bs->read_only is indeed restored
to "true" via bdrv_swap.

Doing it on the file descriptor is possible with Jeff's bdrv_reopen patches.

Paolo
Kevin Wolf Sept. 13, 2012, 1:26 p.m. UTC | #15
Am 13.09.2012 15:24, schrieb Paolo Bonzini:
> Il 13/09/2012 15:15, Kevin Wolf ha scritto:
>>>> +    flags = bs->open_flags | BDRV_O_RDWR;
>> Do we take care to make the image read-only again after completion?
> 
> Not at the file-descriptor level, but bs->read_only is indeed restored
> to "true" via bdrv_swap.
> 
> Doing it on the file descriptor is possible with Jeff's bdrv_reopen patches.

Ah, right, obviously you can't do it before Jeff's patches are in. But
yes, this is what I meant.

Kevin
Paolo Bonzini Sept. 13, 2012, 1:38 p.m. UTC | #16
Il 13/09/2012 15:26, Kevin Wolf ha scritto:
>>>>> +    flags = bs->open_flags | BDRV_O_RDWR;
>>> >> Do we take care to make the image read-only again after completion?
>> > 
>> > Not at the file-descriptor level, but bs->read_only is indeed restored
>> > to "true" via bdrv_swap.
>> > 
>> > Doing it on the file descriptor is possible with Jeff's bdrv_reopen patches.
> Ah, right, obviously you can't do it before Jeff's patches are in. But
> yes, this is what I meant.

Still, the guest won't be able to issue writes after switching to the
destination, if it couldn't do so before.

Paolo
diff mbox

Patch

diff --git a/blockdev.c b/blockdev.c
index 192a9db..4b4574a 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -21,6 +21,8 @@ 
 #include "trace.h"
 #include "arch_init.h"
 
+static void block_job_cb(void *opaque, int ret);
+
 static QTAILQ_HEAD(drivelist, DriveInfo) drives = QTAILQ_HEAD_INITIALIZER(drives);
 
 static const char *const if_name[IF_COUNT] = {
@@ -825,6 +827,131 @@  exit:
     return;
 }
 
+void qmp_drive_mirror(const char *device, const char *target,
+                      bool has_format, const char *format,
+                      enum MirrorSyncMode sync,
+                      bool has_mode, enum NewImageMode mode,
+                      bool has_speed, int64_t speed, Error **errp)
+{
+    BlockDriverInfo bdi;
+    BlockDriverState *bs;
+    BlockDriverState *source, *target_bs;
+    BlockDriver *proto_drv;
+    BlockDriver *drv = NULL;
+    Error *local_err = NULL;
+    int flags;
+    uint64_t size;
+    int ret;
+
+    if (!has_speed) {
+        speed = 0;
+    }
+    if (!has_mode) {
+        mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
+    }
+
+    bs = bdrv_find(device);
+    if (!bs) {
+        error_set(errp, QERR_DEVICE_NOT_FOUND, device);
+        return;
+    }
+
+    if (!has_format) {
+        format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : bs->drv->format_name;
+    }
+    if (format) {
+        drv = bdrv_find_format(format);
+        if (!drv) {
+            error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
+            return;
+        }
+    }
+
+    if (!bdrv_is_inserted(bs)) {
+        error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
+        return;
+    }
+
+    if (bdrv_in_use(bs)) {
+        error_set(errp, QERR_DEVICE_IN_USE, device);
+        return;
+    }
+
+    flags = bs->open_flags | BDRV_O_RDWR;
+    source = bs->backing_hd;
+    if (!source && sync == MIRROR_SYNC_MODE_TOP) {
+        sync = MIRROR_SYNC_MODE_FULL;
+    }
+
+    proto_drv = bdrv_find_protocol(target);
+    if (!proto_drv) {
+        error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
+        return;
+    }
+
+    if (sync == MIRROR_SYNC_MODE_FULL && mode != NEW_IMAGE_MODE_EXISTING) {
+        /* create new image w/o backing file */
+        assert(format && drv);
+        bdrv_get_geometry(bs, &size);
+        size *= 512;
+        ret = bdrv_img_create(target, format,
+                              NULL, NULL, NULL, size, flags);
+    } else {
+        switch (mode) {
+        case NEW_IMAGE_MODE_EXISTING:
+            ret = 0;
+            break;
+        case NEW_IMAGE_MODE_ABSOLUTE_PATHS:
+            /* create new image with backing file */
+            ret = bdrv_img_create(target, format,
+                                  source->filename,
+                                  source->drv->format_name,
+                                  NULL, -1, flags);
+            break;
+        default:
+            abort();
+        }
+    }
+
+    if (ret) {
+        error_set(errp, QERR_OPEN_FILE_FAILED, target);
+        return;
+    }
+
+    target_bs = bdrv_new("");
+    ret = bdrv_open(target_bs, target, flags | BDRV_O_NO_BACKING, drv);
+
+    if (ret < 0) {
+        bdrv_delete(target_bs);
+        error_set(errp, QERR_OPEN_FILE_FAILED, target);
+        return;
+    }
+
+    /* We need a backing file if we will copy parts of a cluster.  */
+    if (bdrv_get_info(target_bs, &bdi) >= 0 && bdi.cluster_size != 0 &&
+        bdi.cluster_size >= BDRV_SECTORS_PER_DIRTY_CHUNK * 512) {
+        ret = bdrv_ensure_backing_file(target_bs);
+        if (ret < 0) {
+            bdrv_delete(target_bs);
+            error_set(errp, QERR_OPEN_FILE_FAILED, target);
+            return;
+        }
+    }
+
+    mirror_start(bs, target_bs, speed, sync, block_job_cb, bs, &local_err);
+    if (local_err != NULL) {
+        bdrv_delete(target_bs);
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    /* Grab a reference so hotplug does not delete the BlockDriverState from
+     * underneath us.
+     */
+    drive_get_ref(drive_get_by_blockdev(bs));
+}
+
+
 
 static void eject_device(BlockDriverState *bs, int force, Error **errp)
 {
@@ -1049,12 +1176,12 @@  void qmp_block_resize(const char *device, int64_t size, Error **errp)
     }
 }
 
-static void block_stream_cb(void *opaque, int ret)
+static void block_job_cb(void *opaque, int ret)
 {
     BlockDriverState *bs = opaque;
     QObject *obj;
 
-    trace_block_stream_cb(bs, bs->job, ret);
+    trace_block_job_cb(bs, bs->job, ret);
 
     assert(bs->job);
     obj = qobject_from_block_job(bs->job);
@@ -1101,7 +1228,7 @@  void qmp_block_stream(const char *device, bool has_base,
     }
 
     stream_start(bs, base_bs, base, has_speed ? speed : 0,
-                 on_error, block_stream_cb, bs, &local_err);
+                 on_error, block_job_cb, bs, &local_err);
     if (error_is_set(&local_err)) {
         error_propagate(errp, local_err);
         return;
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 7a72122..a6db333 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -973,6 +973,27 @@  Snapshot device, using snapshot file as target if provided
 ETEXI
 
     {
+        .name       = "drive_mirror",
+        .args_type  = "reuse:-n,full:-f,device:B,target:s,format:s?",
+        .params     = "[-n] [-f] device target [format]",
+        .help       = "initiates live storage\n\t\t\t"
+                      "migration for a device. The device's contents are\n\t\t\t"
+                      "copied to the new image file, including data that\n\t\t\t"
+                      "is written after the command is started.\n\t\t\t"
+                      "The -n flag requests QEMU to reuse the image found\n\t\t\t"
+                      "in new-image-file, instead of recreating it from scratch.\n\t\t\t"
+                      "The -f flag requests QEMU to copy the whole disk,\n\t\t\t"
+                      "so that the result does not need a backing file.\n\t\t\t",
+        .mhandler.cmd = hmp_drive_mirror,
+    },
+STEXI
+@item drive_mirror
+@findex drive_mirror
+Start mirroring a block device's writes to a new destination,
+using the specified target.
+ETEXI
+
+    {
         .name       = "drive_add",
         .args_type  = "pci_addr:s,opts:s",
         .params     = "[[<domain>:]<bus>:]<slot>\n"
diff --git a/hmp.c b/hmp.c
index 206e920..1d6a606 100644
--- a/hmp.c
+++ b/hmp.c
@@ -695,6 +695,34 @@  void hmp_block_resize(Monitor *mon, const QDict *qdict)
     hmp_handle_error(mon, &errp);
 }
 
+void hmp_drive_mirror(Monitor *mon, const QDict *qdict)
+{
+    const char *device = qdict_get_str(qdict, "device");
+    const char *filename = qdict_get_str(qdict, "target");
+    const char *format = qdict_get_try_str(qdict, "format");
+    int reuse = qdict_get_try_bool(qdict, "reuse", 0);
+    int full = qdict_get_try_bool(qdict, "full", 0);
+    enum NewImageMode mode;
+    Error *errp = NULL;
+
+    if (!filename) {
+        error_set(&errp, QERR_MISSING_PARAMETER, "target");
+        hmp_handle_error(mon, &errp);
+        return;
+    }
+
+    if (reuse) {
+        mode = NEW_IMAGE_MODE_EXISTING;
+    } else {
+        mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
+    }
+
+    qmp_drive_mirror(device, filename, !!format, format,
+                     full ? MIRROR_SYNC_MODE_FULL : MIRROR_SYNC_MODE_TOP,
+                     true, mode, false, 0, &errp);
+    hmp_handle_error(mon, &errp);
+}
+
 void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict)
 {
     const char *device = qdict_get_str(qdict, "device");
diff --git a/hmp.h b/hmp.h
index 24e551e..300efc2 100644
--- a/hmp.h
+++ b/hmp.h
@@ -48,6 +48,7 @@  void hmp_block_passwd(Monitor *mon, const QDict *qdict);
 void hmp_balloon(Monitor *mon, const QDict *qdict);
 void hmp_block_resize(Monitor *mon, const QDict *qdict);
 void hmp_snapshot_blkdev(Monitor *mon, const QDict *qdict);
+void hmp_drive_mirror(Monitor *mon, const QDict *qdict);
 void hmp_migrate_cancel(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_downtime(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_speed(Monitor *mon, const QDict *qdict);
diff --git a/qapi-schema.json b/qapi-schema.json
index b7f16c4..f5c20ca 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1367,6 +1367,41 @@ 
   'returns': 'str' }
 
 ##
+# @drive-mirror
+#
+# Start mirroring a block device's writes to a new destination.
+#
+# @device:  the name of the device whose writes should be mirrored.
+#
+# @target: the target of the new image. If the file exists, or if it
+#          is a device, the existing file/device will be used as the new
+#          destination.  If it does not exist, a new file will be created.
+#
+# @format: #optional the format of the new destination, default is to
+#          probe is @mode is 'existing', else the format of the source
+#
+# @mode: #optional whether and how QEMU should create a new image, default is
+#        'absolute-paths'.
+#
+# @speed:  #optional the maximum speed, in bytes per second
+#
+# @sync: what parts of the disk image should be copied to the destination
+#        (all the disk, only the sectors allocated in the topmost image, or
+#        only new I/O).
+#
+# Returns: nothing on success
+#          If @device is not a valid block device, DeviceNotFound
+#          If @target can't be opened, OpenFileFailed
+#          If @format is invalid, InvalidBlockFormat
+#
+# Since 1.2
+##
+{ 'command': 'drive-mirror',
+  'data': { 'device': 'str', 'target': 'str', '*format': 'str',
+            'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
+            '*speed': 'int' } }
+
+##
 # @migrate_cancel
 #
 # Cancel the current executing migration process.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 56f953d..7a32bb6 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -837,6 +837,48 @@  Example:
 EQMP
 
     {
+        .name       = "drive-mirror",
+        .args_type  = "sync:s,device:B,target:s,speed:i?,mode:s?,format:s?",
+        .mhandler.cmd_new = qmp_marshal_input_drive_mirror,
+    },
+
+SQMP
+drive-mirror
+------------
+
+Start mirroring a block device's writes to a new destination. target
+specifies the target of the new image. If the file exists, or if it is
+a device, it will be used as the new destination for writes. If does not
+exist, a new file will be created. format specifies the format of the
+mirror image, default is to probe if mode='existing', else the format
+of the source.
+
+Arguments:
+
+- "device": device name to operate on (json-string)
+- "target": name of new image file (json-string)
+- "format": format of new image (json-string, optional)
+- "mode": how an image file should be created into the target
+  file/device (NewImageMode, optional, default 'absolute-paths')
+- "speed": maximum speed of the streaming job, in bytes per second
+  (json-int)
+- "sync": what parts of the disk image should be copied to the destination;
+  possibilities include "full" for all the disk, "top" for only the sectors
+  allocated in the topmost image, or "none" to only replicate new I/O
+  (MirrorSyncMode).
+
+
+Example:
+
+-> { "execute": "drive-mirror", "arguments": { "device": "ide-hd0",
+                                               "target": "/some/place/my-image",
+                                               "sync": "full",
+                                               "format": "qcow2" } }
+<- { "return": {} }
+
+EQMP
+
+    {
         .name       = "balloon",
         .args_type  = "value:M",
         .mhandler.cmd_new = qmp_marshal_input_balloon,
diff --git a/trace-events b/trace-events
index 1e65c77..ac58f3a 100644
--- a/trace-events
+++ b/trace-events
@@ -87,7 +87,7 @@  qmp_block_job_cancel(void *job) "job %p"
 qmp_block_job_pause(void *job) "job %p"
 qmp_block_job_resume(void *job) "job %p"
 qmp_block_job_complete(void *job) "job %p"
-block_stream_cb(void *bs, void *job, int ret) "bs %p job %p ret %d"
+block_job_cb(void *bs, void *job, int ret) "bs %p job %p ret %d"
 qmp_block_stream(void *bs, void *job) "bs %p job %p"
 
 # hw/virtio-blk.c