
[1/5] block: added lock image option and callback

Message ID 1450856816-9816-2-git-send-email-den@openvz.org
State New

Commit Message

Denis V. Lunev Dec. 23, 2015, 7:46 a.m. UTC
From: Olga Krishtal <okrishtal@virtuozzo.com>

While opening the image we want to be sure that we are the
one who works with image, anf if it is not true -
opening the image for writing should fail.

There are 2 ways at the moment: no lock at all and lock the file
image.

Signed-off-by: Olga Krishtal <okrishtal@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: Fam Zheng <famz@redhat.com>
---
 block.c                   | 41 +++++++++++++++++++++++++++++++++++++++++
 include/block/block.h     |  1 +
 include/block/block_int.h |  1 +
 qapi/block-core.json      |  9 +++++++++
 4 files changed, 52 insertions(+)

Comments

Eric Blake Dec. 23, 2015, 11:48 p.m. UTC | #1
On 12/23/2015 12:46 AM, Denis V. Lunev wrote:
> From: Olga Krishtal <okrishtal@virtuozzo.com>
> 
> While opening the image we want to be sure that we are the
> one who works with image, anf if it is not true -

s/anf/and/

> opening the image for writing should fail.
> 
> There are 2 ways at the moment: no lock at all and lock the file
> image.
> 
> Signed-off-by: Olga Krishtal <okrishtal@virtuozzo.com>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Max Reitz <mreitz@redhat.com>
> CC: Eric Blake <eblake@redhat.com>
> CC: Fam Zheng <famz@redhat.com>
> ---

> @@ -895,6 +897,12 @@ static QemuOptsList bdrv_runtime_opts = {
>              .type = QEMU_OPT_BOOL,
>              .help = "Ignore flush requests",
>          },
> +        {
> +            .name = "lock",
> +            .type = QEMU_OPT_STRING,
> +            .help = "How to lock the image: none or lockfile",
> +            .def_value_str = "none",

If locking is not on by default, then it is not providing the protection
we want.  Having multiple lock styles doesn't help much - advisory
locking is only useful if all clients use the same style.
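
(As a standalone illustration of that last point, not part of the patch: with flock() a
second locker is refused, but a writer that never calls flock() is not stopped at all.
The "disk.img" name and the error handling below are made up for the demo.)

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/file.h>
    #include <unistd.h>

    int main(void)
    {
        /* two independent open file descriptions, as two processes would have */
        int locker = open("disk.img", O_RDWR | O_CREAT, 0644);
        int writer = open("disk.img", O_RDWR);

        if (locker < 0 || writer < 0) {
            perror("open");
            return 1;
        }

        if (flock(locker, LOCK_EX | LOCK_NB) < 0) {
            perror("flock");
            return 1;
        }

        /* a well-behaved client using the same style is refused ... */
        if (flock(writer, LOCK_EX | LOCK_NB) < 0) {
            printf("second locker refused, as intended\n");
        }

        /* ... but a client that never calls flock() writes anyway */
        if (pwrite(writer, "oops", 4, 0) == 4) {
            printf("non-locking writer still modified the image\n");
        }

        close(writer);
        close(locker);
        return 0;
    }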


> +++ b/qapi/block-core.json
> @@ -2408,3 +2408,12 @@
>  ##
>  { 'command': 'block-set-write-threshold',
>    'data': { 'node-name': 'str', 'write-threshold': 'uint64' } }
> +
> +##
> +# @BdrvLockImage:
> +#
> +#An enumeration of lock types to lock image file.
> +# @none - do not lock the image file
> +# @lockfile

I know you said this series was rough, but you didn't document much
about what lockfile means.

> +## Since 2.6
> +{ 'enum': 'BdrvLockImage', 'data':['none', 'lockfile']}
>
Kevin Wolf Jan. 11, 2016, 5:31 p.m. UTC | #2
Am 23.12.2015 um 08:46 hat Denis V. Lunev geschrieben:
> From: Olga Krishtal <okrishtal@virtuozzo.com>
> 
> While opening the image we want to be sure that we are the
> one who works with image, anf if it is not true -
> opening the image for writing should fail.
> 
> There are 2 ways at the moment: no lock at all and lock the file
> image.
> 
> Signed-off-by: Olga Krishtal <okrishtal@virtuozzo.com>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Max Reitz <mreitz@redhat.com>
> CC: Eric Blake <eblake@redhat.com>
> CC: Fam Zheng <famz@redhat.com>

As long as locking is disabled by default, it's useless and won't
prevent people from corrupting their images. These corruptions happen
exactly because people don't know how to use qemu properly. You can't
expect them to enable locking manually.

Also, you probably need to consider bdrv_reopen() and live migration.
I think live migration would be blocked if source and destination both
see the lock; which is admittedly less likely than with the qcow2 patch
(and generally a problem of this series), but with localhost migration
and potentially with some NFS setups it can be the case.

Kevin
Daniel P. Berrangé Jan. 11, 2016, 5:58 p.m. UTC | #3
On Mon, Jan 11, 2016 at 06:31:06PM +0100, Kevin Wolf wrote:
> Am 23.12.2015 um 08:46 hat Denis V. Lunev geschrieben:
> > From: Olga Krishtal <okrishtal@virtuozzo.com>
> > 
> > While opening the image we want to be sure that we are the
> > one who works with image, anf if it is not true -
> > opening the image for writing should fail.
> > 
> > There are 2 ways at the moment: no lock at all and lock the file
> > image.
> > 
> > Signed-off-by: Olga Krishtal <okrishtal@virtuozzo.com>
> > Signed-off-by: Denis V. Lunev <den@openvz.org>
> > CC: Kevin Wolf <kwolf@redhat.com>
> > CC: Max Reitz <mreitz@redhat.com>
> > CC: Eric Blake <eblake@redhat.com>
> > CC: Fam Zheng <famz@redhat.com>
> 
> As long as locking is disabled by default, it's useless and won't
> prevent people from corrupting their images. These corruptions happen
> exactly because people don't know how to use qemu properly. You can't
> expect them to enable locking manually.
> 
> Also, you probably need to consider bdrv_reopen() and live migration.
> I think live migration would be blocked if source and destination both
> see the lock; which is admittedly less likely than with the qcow2 patch
> (and generally a problem of this series), but with localhost migration
> and potentially with some NFS setups it can be the case.

Note that when libvirt does locking it will release locks when a VM
is paused, and acquire locks prior to resuming CPUs. This allows live
migration to work because you never have CPUs running on both src + dst
at the same time. This does mean that libvirt does not allow QEMU to
automatically re-start CPUs when migration completes, as it needs to
take some action to acquire locks before allowing the dst to start
running again.

Regards,
Daniel
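
(A standalone sketch of the ordering Daniel describes, using plain flock(); libvirt's real
locking goes through its own lock manager, virtlockd, rather than bare flock(), and the
single process with two descriptors plus the "disk.img" name are only stand-ins for the
source and destination QEMUs.)

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/file.h>
    #include <unistd.h>

    int main(void)
    {
        /* src/dst stand in for the two QEMU processes during migration */
        int src = open("disk.img", O_RDWR | O_CREAT, 0644);
        int dst = open("disk.img", O_RDWR);

        if (src < 0 || dst < 0) {
            perror("open");
            return 1;
        }

        flock(src, LOCK_EX | LOCK_NB);      /* source VM is running */

        /* while the source still holds the lock, the destination
         * must keep its CPUs paused */
        if (flock(dst, LOCK_EX | LOCK_NB) < 0) {
            printf("dst: image locked by src, CPUs stay paused\n");
        }

        /* source pauses CPUs at the end of migration and drops the lock */
        flock(src, LOCK_UN);

        /* only now may the destination take the lock and resume CPUs */
        if (flock(dst, LOCK_EX | LOCK_NB) == 0) {
            printf("dst: lock acquired, CPUs may be resumed\n");
        }

        close(dst);
        close(src);
        return 0;
    }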
Kevin Wolf Jan. 11, 2016, 6:35 p.m. UTC | #4
Am 11.01.2016 um 18:58 hat Daniel P. Berrange geschrieben:
> On Mon, Jan 11, 2016 at 06:31:06PM +0100, Kevin Wolf wrote:
> > Am 23.12.2015 um 08:46 hat Denis V. Lunev geschrieben:
> > > From: Olga Krishtal <okrishtal@virtuozzo.com>
> > > 
> > > While opening the image we want to be sure that we are the
> > > one who works with image, anf if it is not true -
> > > opening the image for writing should fail.
> > > 
> > > There are 2 ways at the moment: no lock at all and lock the file
> > > image.
> > > 
> > > Signed-off-by: Olga Krishtal <okrishtal@virtuozzo.com>
> > > Signed-off-by: Denis V. Lunev <den@openvz.org>
> > > CC: Kevin Wolf <kwolf@redhat.com>
> > > CC: Max Reitz <mreitz@redhat.com>
> > > CC: Eric Blake <eblake@redhat.com>
> > > CC: Fam Zheng <famz@redhat.com>
> > 
> > As long as locking is disabled by default, it's useless and won't
> > prevent people from corrupting their images. These corruptions happen
> > exactly because people don't know how to use qemu properly. You can't
> > expect them to enable locking manually.
> > 
> > Also, you probably need to consider bdrv_reopen() and live migration.
> > I think live migration would be blocked if source and destination both
> > see the lock; which is admittedly less likely than with the qcow2 patch
> > (and generally a problem of this series), but with localhost migration
> > and potentially with some NFS setups it can be the case.
> 
> Note that when libvirt does locking it will release locks when a VM
> is paused, and acquire locks prior to resuming CPUs. This allows live
> migration to work because you never have CPUs running on both src + dst
> at the same time. This does mean that libvirt does not allow QEMU to
> automatically re-start CPUs when migration completes, as it needs to
> take some action to acquire locks before allowing the dst to start
> running again.

This assumes that block devices can only be written to if CPUs are
running. In the days of qemu 0.9, this was probably right, but with
things like block jobs and built-in NBD servers, I wouldn't be as sure
these days.

Kevin
Denis V. Lunev Jan. 12, 2016, 5:38 a.m. UTC | #5
On 01/11/2016 08:31 PM, Kevin Wolf wrote:
> Am 23.12.2015 um 08:46 hat Denis V. Lunev geschrieben:
>> From: Olga Krishtal <okrishtal@virtuozzo.com>
>>
>> While opening the image we want to be sure that we are the
>> one who works with image, anf if it is not true -
>> opening the image for writing should fail.
>>
>> There are 2 ways at the moment: no lock at all and lock the file
>> image.
>>
>> Signed-off-by: Olga Krishtal <okrishtal@virtuozzo.com>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> CC: Max Reitz <mreitz@redhat.com>
>> CC: Eric Blake <eblake@redhat.com>
>> CC: Fam Zheng <famz@redhat.com>
> As long as locking is disabled by default, it's useless and won't
> prevent people from corrupting their images. These corruptions happen
> exactly because people don't know how to use qemu properly. You can't
> expect them to enable locking manually.
You are right. Though this is not a big problem to enable locking.
If there are several mechanics, we could save one into the
qcow2 header.


> Also, you probably need to consider bdrv_reopen() and live migration.
> I think live migration would be blocked if source and destination both
> see the lock; which is admittedly less likely than with the qcow2 patch
> (and generally a problem of this series), but with localhost migration
> and potentially with some NFS setups it can be the case.
>
> Kevin
for live migration the situation should be not that problematic.
The disk is opened in RW mode only on one node. Am I right?

The lock is taken when the image is opened in RW mode only.

Den
Kevin Wolf Jan. 12, 2016, 10:10 a.m. UTC | #6
Am 12.01.2016 um 06:38 hat Denis V. Lunev geschrieben:
> On 01/11/2016 08:31 PM, Kevin Wolf wrote:
> >Am 23.12.2015 um 08:46 hat Denis V. Lunev geschrieben:
> >>From: Olga Krishtal <okrishtal@virtuozzo.com>
> >>
> >>While opening the image we want to be sure that we are the
> >>one who works with image, anf if it is not true -
> >>opening the image for writing should fail.
> >>
> >>There are 2 ways at the moment: no lock at all and lock the file
> >>image.
> >>
> >>Signed-off-by: Olga Krishtal <okrishtal@virtuozzo.com>
> >>Signed-off-by: Denis V. Lunev <den@openvz.org>
> >>CC: Kevin Wolf <kwolf@redhat.com>
> >>CC: Max Reitz <mreitz@redhat.com>
> >>CC: Eric Blake <eblake@redhat.com>
> >>CC: Fam Zheng <famz@redhat.com>
> >As long as locking is disabled by default, it's useless and won't
> >prevent people from corrupting their images. These corruptions happen
> >exactly because people don't know how to use qemu properly. You can't
> >expect them to enable locking manually.
> You are right. Though this is not a big problem to enable locking.
> If there are several mechanics, we could save one into the
> qcow2 header.

The problem is that libvirt already takes a lock, as Dan mentioned in
another reply in this thread, so we can't enable locking in qemu by
default. It would always fail when run under libvirt.

Unless I'm seriously mistaken, this means that flock() inside qemu is
dead.

> >Also, you probably need to consider bdrv_reopen() and live migration.
> >I think live migration would be blocked if source and destination both
> >see the lock; which is admittedly less likely than with the qcow2 patch
> >(and generally a problem of this series), but with localhost migration
> >and potentially with some NFS setups it can be the case.
> >
> >Kevin
> for live migration the situation should be not that problematic.
> The disk is opened in RW mode only on one node. Am I right?
> The lock is taken when the image is opened in RW mode only.

You wish. :-)

That's exactly the reason why my patch series contains all these patches
with inactivate/invalidate_cache. These patches still don't prevent
the file descriptor from being opened r/w on both sides, but at least they
ensure that it's clearly defined who actually uses it to write.

Kevin
Fam Zheng Jan. 12, 2016, 11:33 a.m. UTC | #7
On Tue, 01/12 11:10, Kevin Wolf wrote:
> 
> The problem is that libvirt already takes a lock, as Dan mentioned in
> another reply in this thread, so we can't enable locking in qemu by
> default. It would always fail when run under libvirt.
> 
> Unless I'm seriously mistaken, this means that flock() inside qemu is
> dead.

Yes, I see the problem with libvirt, but can we instead do these?

  1) Do a soft flock() in QEMU invocation. If it fails, sliently ignore.
  2) Do a hard flock() in qemu-img invocation. If it fails, report and exit.

This way, if libvirt is holding flock, we can assume libvirt is actually
"using" the image: 1) just works as before, but 2) will not break the qcow2.
That is still a slight improvement, and does solve the reckless "qemu-img
snapshot create" user's problem.

Fam
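
(A sketch of what that soft/hard split could look like, not code from this series; the
helper name, the qemu-img-style wiring in main() and the error message are illustrative.)

    #include <errno.h>
    #include <fcntl.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <sys/file.h>

    /* soft=true: QEMU case, silently ignore a busy lock.
     * soft=false: qemu-img case, report and give up. */
    static int lock_image_fd(int fd, bool soft)
    {
        if (flock(fd, LOCK_EX | LOCK_NB) == 0) {
            return 0;                               /* got the lock */
        }
        if (errno == EWOULDBLOCK) {
            if (soft) {
                return 0;                           /* in use, carry on anyway */
            }
            fprintf(stderr, "image is in use by another process\n");
            return -EBUSY;
        }
        return -errno;                              /* flock() not supported, I/O error, ... */
    }

    int main(void)
    {
        int fd = open("disk.img", O_RDWR | O_CREAT, 0644);

        /* qemu-img style: refuse to touch an in-use image */
        return (fd >= 0 && lock_image_fd(fd, false) == 0) ? 0 : 1;
    }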
Denis V. Lunev Jan. 12, 2016, 12:24 p.m. UTC | #8
On 01/12/2016 02:33 PM, Fam Zheng wrote:
> On Tue, 01/12 11:10, Kevin Wolf wrote:
>> The problem is that libvirt already takes a lock, as Dan mentioned in
>> another reply in this thread, so we can't enable locking in qemu by
>> default. It would always fail when run under libvirt.
>>
>> Unless I'm seriously mistaken, this means that flock() inside qemu is
>> dead.
> Yes, I see the problem with libvirt, but can we instead do these?
>
>    1) Do a soft flock() in QEMU invocation. If it fails, sliently ignore.
>    2) Do a hard flock() in qemu-img invocation. If it fails, report and exit.
>
> This way, if libvirt is holding flock, we can assume libvirt is actually
> "using" the image: 1) just works as before, but 2) will not break the qcow2.
> That is still a slight improvement, and does solve the reckless "qemu-img
> snapshot create" user's problem.
>
> Fam
Assuming that we should lock "by default", this looks good to me.

Den
Kevin Wolf Jan. 12, 2016, 12:28 p.m. UTC | #9
Am 12.01.2016 um 12:33 hat Fam Zheng geschrieben:
> On Tue, 01/12 11:10, Kevin Wolf wrote:
> > 
> > The problem is that libvirt already takes a lock, as Dan mentioned in
> > another reply in this thread, so we can't enable locking in qemu by
> > default. It would always fail when run under libvirt.
> > 
> > Unless I'm seriously mistaken, this means that flock() inside qemu is
> > dead.
> 
> Yes, I see the problem with libvirt, but can we instead do these?
> 
>   1) Do a soft flock() in QEMU invocation. If it fails, sliently ignore.
>   2) Do a hard flock() in qemu-img invocation. If it fails, report and exit.
> 
> This way, if libvirt is holding flock, we can assume libvirt is actually
> "using" the image: 1) just works as before, but 2) will not break the qcow2.
> That is still a slight improvement, and does solve the reckless "qemu-img
> snapshot create" user's problem.

This makes two assumptions:

1. qemu is only ever invoked by libvirt
2. qemu-img is only ever invoked by human users

Both of them are wrong. 1. just means that manually started QEMUs are
unprotected (which is already bad), but 2. means that qemu-img called by
libvirt fails (which is obviously not acceptable).

Kevin
Fam Zheng Jan. 12, 2016, 1:17 p.m. UTC | #10
On Tue, 01/12 13:28, Kevin Wolf wrote:
> Am 12.01.2016 um 12:33 hat Fam Zheng geschrieben:
> > On Tue, 01/12 11:10, Kevin Wolf wrote:
> > > 
> > > The problem is that libvirt already takes a lock, as Dan mentioned in
> > > another reply in this thread, so we can't enable locking in qemu by
> > > default. It would always fail when run under libvirt.
> > > 
> > > Unless I'm seriously mistaken, this means that flock() inside qemu is
> > > dead.
> > 
> > Yes, I see the problem with libvirt, but can we instead do these?
> > 
> >   1) Do a soft flock() in QEMU invocation. If it fails, sliently ignore.
> >   2) Do a hard flock() in qemu-img invocation. If it fails, report and exit.
> > 
> > This way, if libvirt is holding flock, we can assume libvirt is actually
> > "using" the image: 1) just works as before, but 2) will not break the qcow2.
> > That is still a slight improvement, and does solve the reckless "qemu-img
> > snapshot create" user's problem.
> 
> This makes two assumptions:
> 
> 1. qemu is only ever invoked by libvirt
> 2. qemu-img is only ever invoked by human users
> 
> Both of them are wrong. 1. just means that manually started QEMUs are
> unprotected (which is already bad), but 2. means that qemu-img called by
> libvirt fails (which is obviously not acceptable).

No, my assumptions are:

a. libvirt calls flock() when invoking qemu;
b. libvirt doesn't call flock() when invoking qemu-img; (if I read the libvirt
   code correctly, input from libvirt folks needed);

So, 1. means that multiple manually started QEMU instances writing to the same
image are NOT protected, that's the limitation. 2. means that qemu-img called
by either libvirt or a human user is prevented from corrupting an in-use qcow2.

As long as I'm not wrong about b.

Fam
Daniel P. Berrangé Jan. 12, 2016, 1:24 p.m. UTC | #11
On Tue, Jan 12, 2016 at 09:17:51PM +0800, Fam Zheng wrote:
> On Tue, 01/12 13:28, Kevin Wolf wrote:
> > Am 12.01.2016 um 12:33 hat Fam Zheng geschrieben:
> > > On Tue, 01/12 11:10, Kevin Wolf wrote:
> > > > 
> > > > The problem is that libvirt already takes a lock, as Dan mentioned in
> > > > another reply in this thread, so we can't enable locking in qemu by
> > > > default. It would always fail when run under libvirt.
> > > > 
> > > > Unless I'm seriously mistaken, this means that flock() inside qemu is
> > > > dead.
> > > 
> > > Yes, I see the problem with libvirt, but can we instead do these?
> > > 
> > >   1) Do a soft flock() in QEMU invocation. If it fails, sliently ignore.
> > >   2) Do a hard flock() in qemu-img invocation. If it fails, report and exit.
> > > 
> > > This way, if libvirt is holding flock, we can assume libvirt is actually
> > > "using" the image: 1) just works as before, but 2) will not break the qcow2.
> > > That is still a slight improvement, and does solve the reckless "qemu-img
> > > snapshot create" user's problem.
> > 
> > This makes two assumptions:
> > 
> > 1. qemu is only ever invoked by libvirt
> > 2. qemu-img is only ever invoked by human users
> > 
> > Both of them are wrong. 1. just means that manually started QEMUs are
> > unprotected (which is already bad), but 2. means that qemu-img called by
> > libvirt fails (which is obviously not acceptable).
> 
> No, my assumptions are:
> 
> a. libvirt calls flock() when invoking qemu;
> b. libvirt doesn't call flock() when invoking qemu-img; (if I read the libvirt
>    code correctly, input from libvirt folks needed);

b. is /currently/ true, but I wouldn't guarantee that will always be
true, because we've (vague) plans to extend our locking infrastructure
to cover our storage pools APIs too, at which point we'd likely
have locking around qemu-img based API calls too. There's also a likelihood
we'll make our locking API public, in which case it is possible that
an app using libvirt could have acquired locks on the file.

> So, 1. means that multiple manually started QEMU instances writing to the same
> image are NOT protected, that's the limitation. 2. means that qemu-img called
> by either libvirt or a human user is prevented from corrupting an in-use qcow2.
> 
> As long as I'm not wrong about b.

Regards,
Daniel
Denis V. Lunev Jan. 12, 2016, 3:59 p.m. UTC | #12
On 01/12/2016 02:33 PM, Fam Zheng wrote:
> On Tue, 01/12 11:10, Kevin Wolf wrote:
>> The problem is that libvirt already takes a lock, as Dan mentioned in
>> another reply in this thread, so we can't enable locking in qemu by
>> default. It would always fail when run under libvirt.
>>
>> Unless I'm seriously mistaken, this means that flock() inside qemu is
>> dead.
> Yes, I see the problem with libvirt, but can we instead do these?
>
>    1) Do a soft flock() in QEMU invocation. If it fails, sliently ignore.
>    2) Do a hard flock() in qemu-img invocation. If it fails, report and exit.
>
> This way, if libvirt is holding flock, we can assume libvirt is actually
> "using" the image: 1) just works as before, but 2) will not break the qcow2.
> That is still a slight improvement, and does solve the reckless "qemu-img
> snapshot create" user's problem.
>
> Fam
There is a better way though.

If we will switch default in my patch from 'nolock' to 'lock' then
poor guys who are calling qemu-img etc. will see the lock
as necessary while 'proper management software' aka libvirt
will be able to call qemu/qemu-img etc with proper 'nolock'
flag as they do care about the locking.

Though from my POV all locks should be taken in the responsible
entity, i.e. qemu or qemu-img as if locks are held by libvirt then
they should be re-taken on f.e. daemon restart, which could happen.
This is not that convenient.

Den
Fam Zheng Jan. 13, 2016, 12:08 a.m. UTC | #13
On Tue, 01/12 13:24, Daniel P. Berrange wrote:
> On Tue, Jan 12, 2016 at 09:17:51PM +0800, Fam Zheng wrote:
> > On Tue, 01/12 13:28, Kevin Wolf wrote:
> > > Am 12.01.2016 um 12:33 hat Fam Zheng geschrieben:
> > > > On Tue, 01/12 11:10, Kevin Wolf wrote:
> > > > > 
> > > > > The problem is that libvirt already takes a lock, as Dan mentioned in
> > > > > another reply in this thread, so we can't enable locking in qemu by
> > > > > default. It would always fail when run under libvirt.
> > > > > 
> > > > > Unless I'm seriously mistaken, this means that flock() inside qemu is
> > > > > dead.
> > > > 
> > > > Yes, I see the problem with libvirt, but can we instead do these?
> > > > 
> > > >   1) Do a soft flock() in QEMU invocation. If it fails, sliently ignore.
> > > >   2) Do a hard flock() in qemu-img invocation. If it fails, report and exit.
> > > > 
> > > > This way, if libvirt is holding flock, we can assume libvirt is actually
> > > > "using" the image: 1) just works as before, but 2) will not break the qcow2.
> > > > That is still a slight improvement, and does solve the reckless "qemu-img
> > > > snapshot create" user's problem.
> > > 
> > > This makes two assumptions:
> > > 
> > > 1. qemu is only ever invoked by libvirt
> > > 2. qemu-img is only ever invoked by human users
> > > 
> > > Both of them are wrong. 1. just means that manually started QEMUs are
> > > unprotected (which is already bad), but 2. means that qemu-img called by
> > > libvirt fails (which is obviously not acceptable).
> > 
> > No, my assumptions are:
> > 
> > a. libvirt calls flock() when invoking qemu;
> > b. libvirt doesn't call flock() when invoking qemu-img; (if I read the libvirt
> >    code correctly, input from libvirt folks needed);
> 
> b. is /currently/ true, but I wouldn't guarantee that will always be
> true, because we've (vague) plans to extend our locking infrastructure
> to cover our storage pools APIs too, at which point we'd likely
> have locking around qemu-img based API calls too. There's also a likelihood
> we'll make our locking API public, in which case it is possible that
> an app using libvirt could have acquired locks on the file.
> 

This is not a problem. When you extend that in libvirt, you can meanwhile
modify it to always specify "nolock=on" when invoking the new qemu-img so that
it doesn't check flock().

Fam
Fam Zheng Jan. 13, 2016, 12:10 a.m. UTC | #14
On Tue, 01/12 18:59, Denis V. Lunev wrote:
> On 01/12/2016 02:33 PM, Fam Zheng wrote:
> >On Tue, 01/12 11:10, Kevin Wolf wrote:
> >>The problem is that libvirt already takes a lock, as Dan mentioned in
> >>another reply in this thread, so we can't enable locking in qemu by
> >>default. It would always fail when run under libvirt.
> >>
> >>Unless I'm seriously mistaken, this means that flock() inside qemu is
> >>dead.
> >Yes, I see the problem with libvirt, but can we instead do these?
> >
> >   1) Do a soft flock() in QEMU invocation. If it fails, sliently ignore.
> >   2) Do a hard flock() in qemu-img invocation. If it fails, report and exit.
> >
> >This way, if libvirt is holding flock, we can assume libvirt is actually
> >"using" the image: 1) just works as before, but 2) will not break the qcow2.
> >That is still a slight improvement, and does solve the reckless "qemu-img
> >snapshot create" user's problem.
> >
> >Fam
> There is a better way though.
> 
> If we will switch default in my patch from 'nolock' to 'lock' then
> poor guys who are calling qemu-img etc. will see the lock
> as necessary while 'proper management software' aka libvirt
> will be able to call qemu/qemu-img etc with proper 'nolock'
> flag as they do care about the locking.

That is wrong because then we break old libvirt with the new qemu-img (acquires
lock by default), which is IMO a breakage of backward compatibility.

Fam

> 
> Though from my POV all locks should be taken in the responsible
> entity, i.e. qemu or qemu-img as if locks are held by libvirt then
> they should be re-taken on f.e. daemon restart, which could happen.
> This is not that convenient.
> 
> Den
Markus Armbruster Jan. 13, 2016, 8:52 a.m. UTC | #15
Kevin Wolf <kwolf@redhat.com> writes:

> Am 11.01.2016 um 18:58 hat Daniel P. Berrange geschrieben:
>> On Mon, Jan 11, 2016 at 06:31:06PM +0100, Kevin Wolf wrote:
>> > Am 23.12.2015 um 08:46 hat Denis V. Lunev geschrieben:
>> > > From: Olga Krishtal <okrishtal@virtuozzo.com>
>> > > 
>> > > While opening the image we want to be sure that we are the
>> > > one who works with image, anf if it is not true -
>> > > opening the image for writing should fail.
>> > > 
>> > > There are 2 ways at the moment: no lock at all and lock the file
>> > > image.
>> > > 
>> > > Signed-off-by: Olga Krishtal <okrishtal@virtuozzo.com>
>> > > Signed-off-by: Denis V. Lunev <den@openvz.org>
>> > > CC: Kevin Wolf <kwolf@redhat.com>
>> > > CC: Max Reitz <mreitz@redhat.com>
>> > > CC: Eric Blake <eblake@redhat.com>
>> > > CC: Fam Zheng <famz@redhat.com>
>> > 
>> > As long as locking is disabled by default, it's useless and won't
>> > prevent people from corrupting their images. These corruptions happen
>> > exactly because people don't know how to use qemu properly. You can't
>> > expect them to enable locking manually.
>> > 
>> > Also, you probably need to consider bdrv_reopen() and live migration.
>> > I think live migration would be blocked if source and destination both
>> > see the lock; which is admittedly less likely than with the qcow2 patch
>> > (and generally a problem of this series), but with localhost migration
>> > and potentially with some NFS setups it can be the case.
>> 
>> Note that when libvirt does locking it will release locks when a VM
>> is paused, and acquire locks prior to resuming CPUs. This allows live
>> migration to work because you never have CPUs running on both src + dst
>> at the same time. This does mean that libvirt does not allow QEMU to
>> automatically re-start CPUs when migration completes, as it needs to
>> take some action to acquire locks before allowing the dst to start
>> running again.
>
> This assumes that block devices can only be written to if CPUs are
> running. In the days of qemu 0.9, this was probably right, but with
> things like block jobs and built-in NBD servers, I wouldn't be as sure
> these days.

Sounds like QEMU and libvirt should cooperate more closely to get the
locking less wrong.

QEMU should have more accurate knowledge on how it is using the image.
Libvirt may be able to provide better locks, with the help of its
virtlockd daemon.
Denis V. Lunev Jan. 13, 2016, 9:12 a.m. UTC | #16
On 01/13/2016 11:52 AM, Markus Armbruster wrote:
> Kevin Wolf <kwolf@redhat.com> writes:
>
>> Am 11.01.2016 um 18:58 hat Daniel P. Berrange geschrieben:
>>> On Mon, Jan 11, 2016 at 06:31:06PM +0100, Kevin Wolf wrote:
>>>> Am 23.12.2015 um 08:46 hat Denis V. Lunev geschrieben:
>>>>> From: Olga Krishtal <okrishtal@virtuozzo.com>
>>>>>
>>>>> While opening the image we want to be sure that we are the
>>>>> one who works with image, anf if it is not true -
>>>>> opening the image for writing should fail.
>>>>>
>>>>> There are 2 ways at the moment: no lock at all and lock the file
>>>>> image.
>>>>>
>>>>> Signed-off-by: Olga Krishtal <okrishtal@virtuozzo.com>
>>>>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>>>>> CC: Kevin Wolf <kwolf@redhat.com>
>>>>> CC: Max Reitz <mreitz@redhat.com>
>>>>> CC: Eric Blake <eblake@redhat.com>
>>>>> CC: Fam Zheng <famz@redhat.com>
>>>> As long as locking is disabled by default, it's useless and won't
>>>> prevent people from corrupting their images. These corruptions happen
>>>> exactly because people don't know how to use qemu properly. You can't
>>>> expect them to enable locking manually.
>>>>
>>>> Also, you probably need to consider bdrv_reopen() and live migration.
>>>> I think live migration would be blocked if source and destination both
>>>> see the lock; which is admittedly less likely than with the qcow2 patch
>>>> (and generally a problem of this series), but with localhost migration
>>>> and potentially with some NFS setups it can be the case.
>>> Note that when libvirt does locking it will release locks when a VM
>>> is paused, and acquire locks prior to resuming CPUs. This allows live
>>> migration to work because you never have CPUs running on both src + dst
>>> at the same time. This does mean that libvirt does not allow QEMU to
>>> automatically re-start CPUs when migration completes, as it needs to
>>> take some action to acquire locks before allowing the dst to start
>>> running again.
>> This assumes that block devices can only be written to if CPUs are
>> running. In the days of qemu 0.9, this was probably right, but with
>> things like block jobs and built-in NBD servers, I wouldn't be as sure
>> these days.
> Sounds like QEMU and libvirt should cooperate more closely to get the
> locking less wrong.
>
> QEMU should have more accurate knowledge on how it is using the image.
> Libvirt may be able to provide better locks, with the help of its
> virtlockd daemon.
daemon owning locks is a problem:
- there are distributed cases
- daemons restart from time to time
Daniel P. Berrangé Jan. 13, 2016, 9:50 a.m. UTC | #17
On Wed, Jan 13, 2016 at 12:12:10PM +0300, Denis V. Lunev wrote:
> On 01/13/2016 11:52 AM, Markus Armbruster wrote:
> >Kevin Wolf <kwolf@redhat.com> writes:
> >
> >>Am 11.01.2016 um 18:58 hat Daniel P. Berrange geschrieben:
> >>>On Mon, Jan 11, 2016 at 06:31:06PM +0100, Kevin Wolf wrote:
> >>>>Am 23.12.2015 um 08:46 hat Denis V. Lunev geschrieben:
> >>>>>From: Olga Krishtal <okrishtal@virtuozzo.com>
> >>>>>
> >>>>>While opening the image we want to be sure that we are the
> >>>>>one who works with image, anf if it is not true -
> >>>>>opening the image for writing should fail.
> >>>>>
> >>>>>There are 2 ways at the moment: no lock at all and lock the file
> >>>>>image.
> >>>>>
> >>>>>Signed-off-by: Olga Krishtal <okrishtal@virtuozzo.com>
> >>>>>Signed-off-by: Denis V. Lunev <den@openvz.org>
> >>>>>CC: Kevin Wolf <kwolf@redhat.com>
> >>>>>CC: Max Reitz <mreitz@redhat.com>
> >>>>>CC: Eric Blake <eblake@redhat.com>
> >>>>>CC: Fam Zheng <famz@redhat.com>
> >>>>As long as locking is disabled by default, it's useless and won't
> >>>>prevent people from corrupting their images. These corruptions happen
> >>>>exactly because people don't know how to use qemu properly. You can't
> >>>>expect them to enable locking manually.
> >>>>
> >>>>Also, you probably need to consider bdrv_reopen() and live migration.
> >>>>I think live migration would be blocked if source and destination both
> >>>>see the lock; which is admittedly less likely than with the qcow2 patch
> >>>>(and generally a problem of this series), but with localhost migration
> >>>>and potentially with some NFS setups it can be the case.
> >>>Note that when libvirt does locking it will release locks when a VM
> >>>is paused, and acquire locks prior to resuming CPUs. This allows live
> >>>migration to work because you never have CPUs running on both src + dst
> >>>at the same time. This does mean that libvirt does not allow QEMU to
> >>>automatically re-start CPUs when migration completes, as it needs to
> >>>take some action to acquire locks before allowing the dst to start
> >>>running again.
> >>This assumes that block devices can only be written to if CPUs are
> >>running. In the days of qemu 0.9, this was probably right, but with
> >>things like block jobs and built-in NBD servers, I wouldn't be as sure
> >>these days.
> >Sounds like QEMU and libvirt should cooperate more closely to get the
> >locking less wrong.
> >
> >QEMU should have more accurate knowledge on how it is using the image.
> >Libvirt may be able to provide better locks, with the help of its
> >virtlockd daemon.
> daemon owning locks is a problem:
> - there are distributed cases
> - daemons restart from time to time

The virtlockd daemon copes with both of these cases just fine. There is
one daemon per virtualization host, and they can be configured to acquire
locks in a way that they will be enforced across all hosts. The reason
we do it in a separate virtlockd daemon instead of libvirtd is that we
designed it to be able to re-exec() itself while maintaining all locks
to allow for seamless upgrade.

Regards,
Daniel
Daniel P. Berrangé Jan. 13, 2016, 9:51 a.m. UTC | #18
On Mon, Jan 11, 2016 at 07:35:57PM +0100, Kevin Wolf wrote:
> Am 11.01.2016 um 18:58 hat Daniel P. Berrange geschrieben:
> > On Mon, Jan 11, 2016 at 06:31:06PM +0100, Kevin Wolf wrote:
> > > Am 23.12.2015 um 08:46 hat Denis V. Lunev geschrieben:
> > > > From: Olga Krishtal <okrishtal@virtuozzo.com>
> > > > 
> > > > While opening the image we want to be sure that we are the
> > > > one who works with image, anf if it is not true -
> > > > opening the image for writing should fail.
> > > > 
> > > > There are 2 ways at the moment: no lock at all and lock the file
> > > > image.
> > > > 
> > > > Signed-off-by: Olga Krishtal <okrishtal@virtuozzo.com>
> > > > Signed-off-by: Denis V. Lunev <den@openvz.org>
> > > > CC: Kevin Wolf <kwolf@redhat.com>
> > > > CC: Max Reitz <mreitz@redhat.com>
> > > > CC: Eric Blake <eblake@redhat.com>
> > > > CC: Fam Zheng <famz@redhat.com>
> > > 
> > > As long as locking is disabled by default, it's useless and won't
> > > prevent people from corrupting their images. These corruptions happen
> > > exactly because people don't know how to use qemu properly. You can't
> > > expect them to enable locking manually.
> > > 
> > > Also, you probably need to consider bdrv_reopen() and live migration.
> > > I think live migration would be blocked if source and destination both
> > > see the lock; which is admittedly less likely than with the qcow2 patch
> > > (and generally a problem of this series), but with localhost migration
> > > and potentially with some NFS setups it can be the case.
> > 
> > Note that when libvirt does locking it will release locks when a VM
> > is paused, and acquire locks prior to resuming CPUs. This allows live
> > migration to work because you never have CPUs running on both src + dst
> > at the same time. This does mean that libvirt does not allow QEMU to
> > automatically re-start CPUs when migration completes, as it needs to
> > take some action to acquire locks before allowing the dst to start
> > running again.
> 
> This assumes that block devices can only be written to if CPUs are
> running. In the days of qemu 0.9, this was probably right, but with
> things like block jobs and built-in NBD servers, I wouldn't be as sure
> these days.

True, libvirt knows when it is using block jobs & NBD servers, so it
should not be difficult to address this.

Regards,
Daniel
Eric Blake Jan. 13, 2016, 4:44 p.m. UTC | #19
On 01/12/2016 05:10 PM, Fam Zheng wrote:

>> If we will switch default in my patch from 'nolock' to 'lock' then
>> poor guys who are calling qemu-img etc. will see the lock
>> as necessary while 'proper management software' aka libvirt
>> will be able to call qemu/qemu-img etc with proper 'nolock'
>> flag as they do care about the locking.
> 
> That is wrong because then we break old libvirt with the new qemu-img (acquires
> lock by default), which is IMO a breakage of backward compatibility.

In the big software stack picture, it is okay to reject 'old libvirt/new
qemu' as an invalid combination.  Upgrade-wise, we specifically support
'new libvirt/old qemu' - but it is fair game to say that 'if you want to
run new qemu, you must first upgrade to new libvirt that knows how to
drive it'.

That said, minimizing back-compat breaks, so that old libvirt can
(usually) correctly drive new qemu, is a worthy design goal for qemu.
Denis V. Lunev Jan. 14, 2016, 7:23 a.m. UTC | #20
On 01/13/2016 07:44 PM, Eric Blake wrote:
> On 01/12/2016 05:10 PM, Fam Zheng wrote:
>
>>> If we will switch default in my patch from 'nolock' to 'lock' then
>>> pour guys which are calling qemu-img etc stuff will see the lock
>>> as necessary while 'proper management software' aka libvirt
>>> will be able to call qemu/qemu-img etc with proper 'nolock'
>>> flag as they do care about the locking.
>> That is wrong because then we break old libvirt with the new qemu-img (acquires
>> lock by default), which is IMO a breakage of backward compatibility.
> In the big software stack picture, it is okay to reject 'old libvirt/new
> qemu' as an invalid combination.  Upgrade-wise, we specifically support
> 'new libvirt/old qemu' - but it is fair game to say that 'if you want to
> run new qemu, you must first upgrade to new libvirt that knows how to
> drive it'.
>
> That said, minimizing back-compat breaks, so that old libvirt can
> (usually) correctly drive new qemu, is a worthy design goal for qemu.
>
There is one other thing I originally missed adding to the
picture.

Locking can be complex and format-specific. In the original
Parallels disk format (an entire bundle, not a single image file),
locking is performed on a dedicated special file.

Thus either libvirt must know the exact format details, or it must
rely on something that really does know them, i.e. QEMU/qemu-img.

Den
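
(A hypothetical sketch of that bundle case: the lock is taken on one dedicated file inside
the bundle directory rather than on the image data itself. The ".lck" name and the
"disk.hdd" path are invented for illustration; the real layout is a format detail the
driver would know.)

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/file.h>
    #include <unistd.h>

    /* Lock a bundle by locking one well-known file inside it. */
    static int lock_bundle(const char *bundle_dir)
    {
        char path[4096];
        int fd;

        snprintf(path, sizeof(path), "%s/.lck", bundle_dir);
        fd = open(path, O_RDWR | O_CREAT, 0644);
        if (fd < 0) {
            return -1;
        }
        if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
            close(fd);
            return -1;                  /* somebody else owns the bundle */
        }
        return fd;                      /* keep it open for as long as the lock is needed */
    }

    int main(int argc, char **argv)
    {
        int fd = lock_bundle(argc > 1 ? argv[1] : "disk.hdd");

        if (fd < 0) {
            fprintf(stderr, "bundle is in use\n");
            return 1;
        }
        /* ... work with the bundle ... */
        close(fd);                      /* closing the fd releases the flock too */
        return 0;
    }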

Patch

diff --git a/block.c b/block.c
index 411edbf..74228b8 100644
--- a/block.c
+++ b/block.c
@@ -40,6 +40,8 @@ 
 #include "qemu/timer.h"
 #include "qapi-event.h"
 #include "block/throttle-groups.h"
+#include "qapi-types.h"
+#include "qapi/util.h"
 
 #ifdef CONFIG_BSD
 #include <sys/types.h>
@@ -895,6 +897,12 @@  static QemuOptsList bdrv_runtime_opts = {
             .type = QEMU_OPT_BOOL,
             .help = "Ignore flush requests",
         },
+        {
+            .name = "lock",
+            .type = QEMU_OPT_STRING,
+            .help = "How to lock the image: none or lockfile",
+            .def_value_str = "none",
+        },
         { /* end of list */ }
     },
 };
@@ -914,6 +922,8 @@  static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,
     QemuOpts *opts;
     BlockDriver *drv;
     Error *local_err = NULL;
+    const char *lock_image;
+    int lock;
 
     assert(bs->file == NULL);
     assert(options != NULL && bs->options != options);
@@ -1020,6 +1030,18 @@  static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,
         goto free_and_fail;
     }
 
+    lock_image = qemu_opt_get(opts, "lock");
+    lock = qapi_enum_parse(BdrvLockImage_lookup, lock_image,
+                BDRV_LOCK_IMAGE__MAX, BDRV_LOCK_IMAGE_NONE, &local_err);
+    if (!bs->read_only && lock != BDRV_LOCK_IMAGE_NONE) {
+        ret = bdrv_lock_image(bs, lock);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "Could not lock the image");
+            bdrv_close(bs);
+            goto fail_opts;
+        }
+    }
+
     if (bs->encrypted) {
         error_report("Encrypted images are deprecated");
         error_printf("Support for them will be removed in a future release.\n"
@@ -4320,3 +4342,22 @@  void bdrv_refresh_filename(BlockDriverState *bs)
         QDECREF(json);
     }
 }
+
+int bdrv_lock_image(BlockDriverState *bs, BdrvLockImage lock)
+{
+    BlockDriver *drv = bs->drv;
+    if (lock != BDRV_LOCK_IMAGE_LOCKFILE) {
+        return -EOPNOTSUPP;
+    }
+    if (drv != NULL && drv->bdrv_lock_image != NULL) {
+        return bs->drv->bdrv_lock_image(bs, lock);
+    }
+    if (bs->file == NULL) {
+        return -EOPNOTSUPP;
+    }
+    drv = bs->file->bs->drv;
+    if (drv != NULL && drv->bdrv_lock_image != NULL) {
+        return drv->bdrv_lock_image(bs, lock);
+    }
+    return -EOPNOTSUPP;
+}
diff --git a/include/block/block.h b/include/block/block.h
index db8e096..27fc434 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -506,6 +506,7 @@  void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
 void bdrv_dirty_iter_init(BdrvDirtyBitmap *bitmap, struct HBitmapIter *hbi);
 void bdrv_set_dirty_iter(struct HBitmapIter *hbi, int64_t offset);
 int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap);
+int bdrv_lock_image(BlockDriverState *bs, BdrvLockImage lock);
 
 void bdrv_enable_copy_on_read(BlockDriverState *bs);
 void bdrv_disable_copy_on_read(BlockDriverState *bs);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 256609d..755f342 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -135,6 +135,7 @@  struct BlockDriver {
     int (*bdrv_create)(const char *filename, QemuOpts *opts, Error **errp);
     int (*bdrv_set_key)(BlockDriverState *bs, const char *key);
     int (*bdrv_make_empty)(BlockDriverState *bs);
+    int (*bdrv_lock_image)(BlockDriverState *bs, BdrvLockImage lock);
 
     void (*bdrv_refresh_filename)(BlockDriverState *bs, QDict *options);
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 1a5d9ce..b82589b 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2408,3 +2408,12 @@ 
 ##
 { 'command': 'block-set-write-threshold',
   'data': { 'node-name': 'str', 'write-threshold': 'uint64' } }
+
+##
+# @BdrvLockImage:
+#
+#An enumeration of lock types to lock image file.
+# @none - do not lock the image file
+# @lockfile
+## Since 2.6
+{ 'enum': 'BdrvLockImage', 'data':['none', 'lockfile']}
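
(For reference, one way a protocol driver could implement the callback declared above.
This is only a sketch, not patch 2/5 of the series: it assumes it would sit in
block/raw-posix.c, where BDRVRawState keeps the image file descriptor in s->fd, and it
leaves out the unlock/cleanup path and the wiring into the driver's BlockDriver table
via .bdrv_lock_image = raw_lock_image.)

    static int raw_lock_image(BlockDriverState *bs, BdrvLockImage lock)
    {
        BDRVRawState *s = bs->opaque;       /* raw-posix keeps its open fd here */

        if (lock != BDRV_LOCK_IMAGE_LOCKFILE) {
            return -EOPNOTSUPP;
        }
        if (flock(s->fd, LOCK_EX | LOCK_NB) < 0) {
            /* EWOULDBLOCK: some other process already holds the image */
            return -errno;
        }
        return 0;
    }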