Message ID | 1450856816-9816-2-git-send-email-den@openvz.org |
---|---|
State | New |
On 12/23/2015 12:46 AM, Denis V. Lunev wrote:
> From: Olga Krishtal <okrishtal@virtuozzo.com>
>
> While opening the image we want to be sure that we are the
> one who works with image, anf if it is not true -

s/anf/and/

> opening the image for writing should fail.
>
> There are 2 ways at the moment: no lock at all and lock the file
> image.
>
> Signed-off-by: Olga Krishtal <okrishtal@virtuozzo.com>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Max Reitz <mreitz@redhat.com>
> CC: Eric Blake <eblake@redhat.com>
> CC: Fam Zheng <famz@redhat.com>
> ---

> @@ -895,6 +897,12 @@ static QemuOptsList bdrv_runtime_opts = {
>              .type = QEMU_OPT_BOOL,
>              .help = "Ignore flush requests",
>          },
> +        {
> +            .name = "lock",
> +            .type = QEMU_OPT_STRING,
> +            .help = "How to lock the image: none or lockfile",
> +            .def_value_str = "none",

If locking is not on by default, then it is not providing the protection
we want.  Having multiple lock styles doesn't help much - advisory
locking is only useful if all clients use the same style.

> +++ b/qapi/block-core.json
> @@ -2408,3 +2408,12 @@
>  ##
>  { 'command': 'block-set-write-threshold',
>    'data': { 'node-name': 'str', 'write-threshold': 'uint64' } }
> +
> +##
> +# @BdrvLockImage:
> +#
> +# An enumeration of lock types to lock image file.
> +# @none - do not lock the image file
> +# @lockfile

I know you said this series was rough, but you didn't document much
about what lockfile means.

> +## Since 2.6
> +{ 'enum': 'BdrvLockImage', 'data':['none', 'lockfile']}
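Eric's point about multiple lock styles is worth unpacking: flock(2) locks are advisory, and on Linux they don't even interact with POSIX fcntl()/lockf() locks, so two clients using different styles won't exclude each other at all. A quick sketch using Python's thin wrappers over the two APIs (the temporary file merely stands in for an image; none of this is QEMU code):

```python
import fcntl
import tempfile

img = tempfile.NamedTemporaryFile(suffix=".qcow2")  # stand-in for an image file

holder = open(img.name, "r+b")
fcntl.flock(holder, fcntl.LOCK_EX)          # client A: BSD-style flock(2) lock

# A client using the *other* advisory style is not excluded on Linux
# local filesystems, because fcntl locks and flock locks don't interact:
other = open(img.name, "r+b")
fcntl.lockf(other, fcntl.LOCK_EX | fcntl.LOCK_NB)   # POSIX fcntl lock: succeeds
other.write(b"scribble")                    # plain writes ignore advisory locks entirely
other.flush()

# Only a client using the *same* style actually conflicts:
same = open(img.name, "r+b")
try:
    fcntl.flock(same, fcntl.LOCK_EX | fcntl.LOCK_NB)
    conflict = False
except OSError:                             # EWOULDBLOCK
    conflict = True
```

This is the sense in which advisory locking only helps if every client agrees on the style: qemu, qemu-img, and libvirt would all have to pick the same mechanism.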
Am 23.12.2015 um 08:46 hat Denis V. Lunev geschrieben: > From: Olga Krishtal <okrishtal@virtuozzo.com> > > While opening the image we want to be sure that we are the > one who works with image, anf if it is not true - > opening the image for writing should fail. > > There are 2 ways at the moment: no lock at all and lock the file > image. > > Signed-off-by: Olga Krishtal <okrishtal@virtuozzo.com> > Signed-off-by: Denis V. Lunev <den@openvz.org> > CC: Kevin Wolf <kwolf@redhat.com> > CC: Max Reitz <mreitz@redhat.com> > CC: Eric Blake <eblake@redhat.com> > CC: Fam Zheng <famz@redhat.com> As long as locking is disabled by default, it's useless and won't prevent people from corrupting their images. These corruptions happen exactly because people don't know how to use qemu properly. You can't expect them to enable locking manually. Also, you probably need to consider bdrv_reopen() and live migration. I think live migration would be blocked if source and destination both see the lock; which is admittedly less likely than with the qcow2 patch (and generally a problem of this series), but with localhost migration and potentially with some NFS setups it can be the case. Kevin
On Mon, Jan 11, 2016 at 06:31:06PM +0100, Kevin Wolf wrote: > Am 23.12.2015 um 08:46 hat Denis V. Lunev geschrieben: > > From: Olga Krishtal <okrishtal@virtuozzo.com> > > > > While opening the image we want to be sure that we are the > > one who works with image, anf if it is not true - > > opening the image for writing should fail. > > > > There are 2 ways at the moment: no lock at all and lock the file > > image. > > > > Signed-off-by: Olga Krishtal <okrishtal@virtuozzo.com> > > Signed-off-by: Denis V. Lunev <den@openvz.org> > > CC: Kevin Wolf <kwolf@redhat.com> > > CC: Max Reitz <mreitz@redhat.com> > > CC: Eric Blake <eblake@redhat.com> > > CC: Fam Zheng <famz@redhat.com> > > As long as locking is disabled by default, it's useless and won't > prevent people from corrupting their images. These corruptions happen > exactly because people don't know how to use qemu properly. You can't > expect them to enable locking manually. > > Also, you probably need to consider bdrv_reopen() and live migration. > I think live migration would be blocked if source and destination both > see the lock; which is admittedly less likely than with the qcow2 patch > (and generally a problem of this series), but with localhost migration > and potentially with some NFS setups it can be the case. Note that when libvirt does locking it will release locks when a VM is paused, and acquire locks prior to resuming CPUs. This allows live migration to work because you never have CPUs running on both src + dst at the same time. This does mean that libvirt does not allow QEMU to automatically re-start CPUs when migration completes, as it needs to take some action to acquire locks before allowing the dst to start running again. Regards, Daniel
Am 11.01.2016 um 18:58 hat Daniel P. Berrange geschrieben: > On Mon, Jan 11, 2016 at 06:31:06PM +0100, Kevin Wolf wrote: > > Am 23.12.2015 um 08:46 hat Denis V. Lunev geschrieben: > > > From: Olga Krishtal <okrishtal@virtuozzo.com> > > > > > > While opening the image we want to be sure that we are the > > > one who works with image, anf if it is not true - > > > opening the image for writing should fail. > > > > > > There are 2 ways at the moment: no lock at all and lock the file > > > image. > > > > > > Signed-off-by: Olga Krishtal <okrishtal@virtuozzo.com> > > > Signed-off-by: Denis V. Lunev <den@openvz.org> > > > CC: Kevin Wolf <kwolf@redhat.com> > > > CC: Max Reitz <mreitz@redhat.com> > > > CC: Eric Blake <eblake@redhat.com> > > > CC: Fam Zheng <famz@redhat.com> > > > > As long as locking is disabled by default, it's useless and won't > > prevent people from corrupting their images. These corruptions happen > > exactly because people don't know how to use qemu properly. You can't > > expect them to enable locking manually. > > > > Also, you probably need to consider bdrv_reopen() and live migration. > > I think live migration would be blocked if source and destination both > > see the lock; which is admittedly less likely than with the qcow2 patch > > (and generally a problem of this series), but with localhost migration > > and potentially with some NFS setups it can be the case. > > Note that when libvirt does locking it will release locks when a VM > is paused, and acquire locks prior to resuming CPUs. This allows live > migration to work because you never have CPUs running on both src + dst > at the same time. This does mean that libvirt does not allow QEMU to > automatically re-start CPUs when migration completes, as it needs to > take some action to acquire locks before allowing the dst to start > running again. This assumes that block devices can only be written to if CPUs are running. 
In the days of qemu 0.9, this was probably right, but with things like block jobs and built-in NBD servers, I wouldn't be as sure these days. Kevin
On 01/11/2016 08:31 PM, Kevin Wolf wrote: > Am 23.12.2015 um 08:46 hat Denis V. Lunev geschrieben: >> From: Olga Krishtal <okrishtal@virtuozzo.com> >> >> While opening the image we want to be sure that we are the >> one who works with image, anf if it is not true - >> opening the image for writing should fail. >> >> There are 2 ways at the moment: no lock at all and lock the file >> image. >> >> Signed-off-by: Olga Krishtal <okrishtal@virtuozzo.com> >> Signed-off-by: Denis V. Lunev <den@openvz.org> >> CC: Kevin Wolf <kwolf@redhat.com> >> CC: Max Reitz <mreitz@redhat.com> >> CC: Eric Blake <eblake@redhat.com> >> CC: Fam Zheng <famz@redhat.com> > As long as locking is disabled by default, it's useless and won't > prevent people from corrupting their images. These corruptions happen > exactly because people don't know how to use qemu properly. You can't > expect them to enable locking manually. You are right. Though enabling locking by default is not a big problem. If there are several mechanisms, we could record the chosen one in the qcow2 header. > Also, you probably need to consider bdrv_reopen() and live migration. > I think live migration would be blocked if source and destination both > see the lock; which is admittedly less likely than with the qcow2 patch > (and generally a problem of this series), but with localhost migration > and potentially with some NFS setups it can be the case. > > Kevin for live migration the situation should not be that problematic. The disk is opened in RW mode only on one node. Am I right? The lock is taken when the image is opened in RW mode only. Den
Am 12.01.2016 um 06:38 hat Denis V. Lunev geschrieben: > On 01/11/2016 08:31 PM, Kevin Wolf wrote: > >Am 23.12.2015 um 08:46 hat Denis V. Lunev geschrieben: > >>From: Olga Krishtal <okrishtal@virtuozzo.com> > >> > >>While opening the image we want to be sure that we are the > >>one who works with image, anf if it is not true - > >>opening the image for writing should fail. > >> > >>There are 2 ways at the moment: no lock at all and lock the file > >>image. > >> > >>Signed-off-by: Olga Krishtal <okrishtal@virtuozzo.com> > >>Signed-off-by: Denis V. Lunev <den@openvz.org> > >>CC: Kevin Wolf <kwolf@redhat.com> > >>CC: Max Reitz <mreitz@redhat.com> > >>CC: Eric Blake <eblake@redhat.com> > >>CC: Fam Zheng <famz@redhat.com> > >As long as locking is disabled by default, it's useless and won't > >prevent people from corrupting their images. These corruptions happen > >exactly because people don't know how to use qemu properly. You can't > >expect them to enable locking manually. > You are right. Though this is not a big problem to enable locking. > If there are several mechanics, we could save one into the > qcow2 header. The problem is that libvirt already takes a lock, as Dan mentioned in another reply in this thread, so we can't enable locking in qemu by default. It would always fail when run under libvirt. Unless I'm seriously mistaken, this means that flock() inside qemu is dead. > >Also, you probably need to consider bdrv_reopen() and live migration. > >I think live migration would be blocked if source and destination both > >see the lock; which is admittedly less likely than with the qcow2 patch > >(and generally a problem of this series), but with localhost migration > >and potentially with some NFS setups it can be the case. > > > >Kevin > for live migration the situation should be not that problematic. > The disk is opened in RW mode only on one node. Am I right? > The lock is taken when the image is opened in RW mode only. You wish. 
:-) That's exactly the reason why my patch series contains all these patches with inactivate/invalidate_cache. These patches still don't avoid that the file descriptor is opened r/w on both sides, but at least they ensure that it's clearly defined who actually uses it to write. Kevin
On Tue, 01/12 11:10, Kevin Wolf wrote: > > The problem is that libvirt already takes a lock, as Dan mentioned in > another reply in this thread, so we can't enable locking in qemu by > default. It would always fail when run under libvirt. > > Unless I'm seriously mistaken, this means that flock() inside qemu is > dead. Yes, I see the problem with libvirt, but can we instead do these? 1) Do a soft flock() in QEMU invocation. If it fails, silently ignore. 2) Do a hard flock() in qemu-img invocation. If it fails, report and exit. This way, if libvirt is holding flock, we can assume libvirt is actually "using" the image: 1) just works as before, but 2) will not break the qcow2. That is still a slight improvement, and does solve the reckless "qemu-img snapshot create" user's problem. Fam
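Fam's soft/hard split can be sketched like this (Python for brevity; `acquire_image_lock` is a hypothetical helper invented for illustration, not QEMU or qemu-img code):

```python
import fcntl
import tempfile

def acquire_image_lock(f, hard):
    """Try an exclusive, non-blocking flock(2) on the image file.
    Soft mode (QEMU): on conflict, silently carry on unlocked.
    Hard mode (qemu-img): on conflict, refuse to touch the image."""
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:                      # EWOULDBLOCK: someone else holds it
        if hard:
            raise RuntimeError("image is locked by another process")
        return False

img = tempfile.NamedTemporaryFile(suffix=".qcow2")   # stand-in image file
libvirt_fd = open(img.name, "r+b")
fcntl.flock(libvirt_fd, fcntl.LOCK_EX)               # libvirt already holds the lock

# 1) QEMU: the soft attempt fails quietly and the guest runs anyway.
qemu_fd = open(img.name, "r+b")
soft_ok = acquire_image_lock(qemu_fd, hard=False)

# 2) qemu-img: the hard attempt reports the conflict and bails out.
try:
    acquire_image_lock(open(img.name, "r+b"), hard=True)
    hard_failed = False
except RuntimeError:
    hard_failed = True
```

Note that flock(2) locks attach to the open file description, so the two extra open() calls conflict with the holder even within a single demo process, which is what makes this testable without forking.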
On 01/12/2016 02:33 PM, Fam Zheng wrote: > On Tue, 01/12 11:10, Kevin Wolf wrote: >> The problem is that libvirt already takes a lock, as Dan mentioned in >> another reply in this thread, so we can't enable locking in qemu by >> default. It would always fail when run under libvirt. >> >> Unless I'm seriously mistaken, this means that flock() inside qemu is >> dead. > Yes, I see the problem with libvirt, but can we instead do these? > > 1) Do a soft flock() in QEMU invocation. If it fails, sliently ignore. > 2) Do a hard flock() in qemu-img invocation. If it fails, report and exit. > > This way, if libvirt is holding flock, we can assume libvirt is actually > "using" the image: 1) just works as before, but 2) will not break the qcow2. > That is still a slight improvement, and does solve the reckless "qemu-img > snapshot create" user's problem. > > Fam assuming that we should lock "by default". This looks good to me. Den
Am 12.01.2016 um 12:33 hat Fam Zheng geschrieben: > On Tue, 01/12 11:10, Kevin Wolf wrote: > > > > The problem is that libvirt already takes a lock, as Dan mentioned in > > another reply in this thread, so we can't enable locking in qemu by > > default. It would always fail when run under libvirt. > > > > Unless I'm seriously mistaken, this means that flock() inside qemu is > > dead. > > Yes, I see the problem with libvirt, but can we instead do these? > > 1) Do a soft flock() in QEMU invocation. If it fails, sliently ignore. > 2) Do a hard flock() in qemu-img invocation. If it fails, report and exit. > > This way, if libvirt is holding flock, we can assume libvirt is actually > "using" the image: 1) just works as before, but 2) will not break the qcow2. > That is still a slight improvement, and does solve the reckless "qemu-img > snapshot create" user's problem. This makes two assumptions: 1. qemu is only ever invoked by libvirt 2. qemu-img is only ever invoked by human users Both of them are wrong. 1. just means that manually started QEMUs are unprotected (which is already bad), but 2. means that qemu-img called by libvirt fails (which is obviously not acceptable). Kevin
On Tue, 01/12 13:28, Kevin Wolf wrote: > Am 12.01.2016 um 12:33 hat Fam Zheng geschrieben: > > On Tue, 01/12 11:10, Kevin Wolf wrote: > > > > > > The problem is that libvirt already takes a lock, as Dan mentioned in > > > another reply in this thread, so we can't enable locking in qemu by > > > default. It would always fail when run under libvirt. > > > > > > Unless I'm seriously mistaken, this means that flock() inside qemu is > > > dead. > > > > Yes, I see the problem with libvirt, but can we instead do these? > > > > 1) Do a soft flock() in QEMU invocation. If it fails, sliently ignore. > > 2) Do a hard flock() in qemu-img invocation. If it fails, report and exit. > > > > This way, if libvirt is holding flock, we can assume libvirt is actually > > "using" the image: 1) just works as before, but 2) will not break the qcow2. > > That is still a slight improvement, and does solve the reckless "qemu-img > > snapshot create" user's problem. > > This makes two assumptions: > > 1. qemu is only ever invoked by libvirt > 2. qemu-img is only ever invoked by human users > > Both of them are wrong. 1. just means that manually started QEMUs are > unprotected (which is already bad), but 2. means that qemu-img called by > libvirt fails (which is obviously not acceptable). No, my assumptions are: a. libvirt calls flock() when invoking qemu; b. libvirt doesn't call flock() when invoking qemu-img; (if I read the libvirt code correctly, input from libvirt folks needed); So, 1. means that multiple manually started QEMU instances writing to the same image are NOT protected, that's the limitation. 2. means that qemu-img called by either libvirt or a human user is prevented from corrupting an in-use qcow2. As long as I'm not wrong about b. Fam
On Tue, Jan 12, 2016 at 09:17:51PM +0800, Fam Zheng wrote: > On Tue, 01/12 13:28, Kevin Wolf wrote: > > Am 12.01.2016 um 12:33 hat Fam Zheng geschrieben: > > > On Tue, 01/12 11:10, Kevin Wolf wrote: > > > > > > > > The problem is that libvirt already takes a lock, as Dan mentioned in > > > > another reply in this thread, so we can't enable locking in qemu by > > > > default. It would always fail when run under libvirt. > > > > > > > > Unless I'm seriously mistaken, this means that flock() inside qemu is > > > > dead. > > > > > > Yes, I see the problem with libvirt, but can we instead do these? > > > > > > 1) Do a soft flock() in QEMU invocation. If it fails, sliently ignore. > > > 2) Do a hard flock() in qemu-img invocation. If it fails, report and exit. > > > > > > This way, if libvirt is holding flock, we can assume libvirt is actually > > > "using" the image: 1) just works as before, but 2) will not break the qcow2. > > > That is still a slight improvement, and does solve the reckless "qemu-img > > > snapshot create" user's problem. > > > > This makes two assumptions: > > > > 1. qemu is only ever invoked by libvirt > > 2. qemu-img is only ever invoked by human users > > > > Both of them are wrong. 1. just means that manually started QEMUs are > > unprotected (which is already bad), but 2. means that qemu-img called by > > libvirt fails (which is obviously not acceptable). > > No, my assumptions are: > > a. libvirt calls flock() when invoking qemu; > b. libvirt doesn't call flock() when invoking qemu-img; (if I read the libvirt > code correctly, input from libvirt folks needed); b. is /currently/ true, but I wouldn't guarantee that will always be true, because we've (vague) plans to extend our locking infrastructure to cover our storage pools APIs too, at which point we'd likely have locking around qemu-img based API calls too.
There's also a likelihood we'll make our locking API public, in which case it is possible that an app using libvirt could have acquired locks on the file. > So, 1. means that multiple manually started QEMU instances writing to the same > image are NOT protected, that's the limitation. 2. means that qemu-img called > by either libvirt or a human user is prevented from corrupting an in-use qcow2. > > As long as I'm not wrong about b. Regards, Daniel
On 01/12/2016 02:33 PM, Fam Zheng wrote: > On Tue, 01/12 11:10, Kevin Wolf wrote: >> The problem is that libvirt already takes a lock, as Dan mentioned in >> another reply in this thread, so we can't enable locking in qemu by >> default. It would always fail when run under libvirt. >> >> Unless I'm seriously mistaken, this means that flock() inside qemu is >> dead. > Yes, I see the problem with libvirt, but can we instead do these? > > 1) Do a soft flock() in QEMU invocation. If it fails, sliently ignore. > 2) Do a hard flock() in qemu-img invocation. If it fails, report and exit. > > This way, if libvirt is holding flock, we can assume libvirt is actually > "using" the image: 1) just works as before, but 2) will not break the qcow2. > That is still a slight improvement, and does solve the reckless "qemu-img > snapshot create" user's problem. > > Fam There is a better way, though. If we switch the default in my patch from 'nolock' to 'lock', then the poor guys who are calling qemu-img and the like by hand will get the lock as necessary, while 'proper management software' aka libvirt will be able to call qemu/qemu-img with an explicit 'nolock' flag, as they take care of the locking themselves. Though from my POV all locks should be taken by the responsible entity, i.e. qemu or qemu-img, since if locks are held by libvirt they would have to be re-taken on e.g. a daemon restart, which can happen. This is not that convenient. Den
On Tue, 01/12 13:24, Daniel P. Berrange wrote: > On Tue, Jan 12, 2016 at 09:17:51PM +0800, Fam Zheng wrote: > > On Tue, 01/12 13:28, Kevin Wolf wrote: > > > Am 12.01.2016 um 12:33 hat Fam Zheng geschrieben: > > > > On Tue, 01/12 11:10, Kevin Wolf wrote: > > > > > > > > > > The problem is that libvirt already takes a lock, as Dan mentioned in > > > > > another reply in this thread, so we can't enable locking in qemu by > > > > > default. It would always fail when run under libvirt. > > > > > > > > > > Unless I'm seriously mistaken, this means that flock() inside qemu is > > > > > dead. > > > > > > > > Yes, I see the problem with libvirt, but can we instead do these? > > > > > > > > 1) Do a soft flock() in QEMU invocation. If it fails, sliently ignore. > > > > 2) Do a hard flock() in qemu-img invocation. If it fails, report and exit. > > > > > > > > This way, if libvirt is holding flock, we can assume libvirt is actually > > > > "using" the image: 1) just works as before, but 2) will not break the qcow2. > > > > That is still a slight improvement, and does solve the reckless "qemu-img > > > > snapshot create" user's problem. > > > > > > This makes two assumptions: > > > > > > 1. qemu is only ever invoked by libvirt > > > 2. qemu-img is only ever invoked by human users > > > > > > Both of them are wrong. 1. just means that manually started QEMUs are > > > unprotected (which is already bad), but 2. means that qemu-img called by > > > libvirt fails (which is obviously not acceptable). > > > > No, my assumptions are: > > > > a. libvirt calls flock() when invoking qemu; > > b. libvirt doesn't call flock() when invoking qemu-img; (if I read the libvirt > > code correctly, input from libvirt folks needed); > > b. is /currently/ true, but I wouldn't guarantee that will always be > true, because we've (vague) plans to extend our locking infrastructure > to cover our storage pools APIs too, at which point we'd likely be > have locking around qemu-img based API calls too. 
There's also likelihood > we'll make our locking API public, in which case it is possible that > an app using libvirt could have acquired locks on the file. > This is not a problem. When you extend that in libvirt, you can at the same time modify it to always specify "nolock=on" when invoking the new qemu-img so that it doesn't check flock(). Fam
On Tue, 01/12 18:59, Denis V. Lunev wrote: > On 01/12/2016 02:33 PM, Fam Zheng wrote: > >On Tue, 01/12 11:10, Kevin Wolf wrote: > >>The problem is that libvirt already takes a lock, as Dan mentioned in > >>another reply in this thread, so we can't enable locking in qemu by > >>default. It would always fail when run under libvirt. > >> > >>Unless I'm seriously mistaken, this means that flock() inside qemu is > >>dead. > >Yes, I see the problem with libvirt, but can we instead do these? > > > > 1) Do a soft flock() in QEMU invocation. If it fails, sliently ignore. > > 2) Do a hard flock() in qemu-img invocation. If it fails, report and exit. > > > >This way, if libvirt is holding flock, we can assume libvirt is actually > >"using" the image: 1) just works as before, but 2) will not break the qcow2. > >That is still a slight improvement, and does solve the reckless "qemu-img > >snapshot create" user's problem. > > > >Fam > There is a better way though. > > If we will switch default in my patch from 'nolock' to 'lock' then > pour guys which are calling qemu-img etc stuff will see the lock > as necessary while 'proper management software' aka libvirt > will be able to call qemu/qemu-img etc with proper 'nolock' > flag as they do care about the locking. That is wrong because then we break old libvirt with the new qemu-img (acquires lock by default), which is IMO a breakage of backward compatibility. Fam > > Though from my POW all locks should be taken in the responsible > entity, i.e. qemu or qemu-img as if locks are held by libvirt then > they should be re-taken on f.e. daemon restart, which could happen. > This is not that convenient. > > Den
Kevin Wolf <kwolf@redhat.com> writes: > Am 11.01.2016 um 18:58 hat Daniel P. Berrange geschrieben: >> On Mon, Jan 11, 2016 at 06:31:06PM +0100, Kevin Wolf wrote: >> > Am 23.12.2015 um 08:46 hat Denis V. Lunev geschrieben: >> > > From: Olga Krishtal <okrishtal@virtuozzo.com> >> > > >> > > While opening the image we want to be sure that we are the >> > > one who works with image, anf if it is not true - >> > > opening the image for writing should fail. >> > > >> > > There are 2 ways at the moment: no lock at all and lock the file >> > > image. >> > > >> > > Signed-off-by: Olga Krishtal <okrishtal@virtuozzo.com> >> > > Signed-off-by: Denis V. Lunev <den@openvz.org> >> > > CC: Kevin Wolf <kwolf@redhat.com> >> > > CC: Max Reitz <mreitz@redhat.com> >> > > CC: Eric Blake <eblake@redhat.com> >> > > CC: Fam Zheng <famz@redhat.com> >> > >> > As long as locking is disabled by default, it's useless and won't >> > prevent people from corrupting their images. These corruptions happen >> > exactly because people don't know how to use qemu properly. You can't >> > expect them to enable locking manually. >> > >> > Also, you probably need to consider bdrv_reopen() and live migration. >> > I think live migration would be blocked if source and destination both >> > see the lock; which is admittedly less likely than with the qcow2 patch >> > (and generally a problem of this series), but with localhost migration >> > and potentially with some NFS setups it can be the case. >> >> Note that when libvirt does locking it will release locks when a VM >> is paused, and acquire locks prior to resuming CPUs. This allows live >> migration to work because you never have CPUs running on both src + dst >> at the same time. This does mean that libvirt does not allow QEMU to >> automatically re-start CPUs when migration completes, as it needs to >> take some action to acquire locks before allowing the dst to start >> running again. 
> > This assumes that block devices can only be written to if CPUs are > running. In the days of qemu 0.9, this was probably right, but with > things like block jobs and built-in NBD servers, I wouldn't be as sure > these days. Sounds like QEMU and libvirt should cooperate more closely to get the locking less wrong. QEMU should have more accurate knowledge on how it is using the image. Libvirt may be able to provide better locks, with the help of its virtlockd daemon.
On 01/13/2016 11:52 AM, Markus Armbruster wrote: > Kevin Wolf <kwolf@redhat.com> writes: > >> Am 11.01.2016 um 18:58 hat Daniel P. Berrange geschrieben: >>> On Mon, Jan 11, 2016 at 06:31:06PM +0100, Kevin Wolf wrote: >>>> Am 23.12.2015 um 08:46 hat Denis V. Lunev geschrieben: >>>>> From: Olga Krishtal <okrishtal@virtuozzo.com> >>>>> >>>>> While opening the image we want to be sure that we are the >>>>> one who works with image, anf if it is not true - >>>>> opening the image for writing should fail. >>>>> >>>>> There are 2 ways at the moment: no lock at all and lock the file >>>>> image. >>>>> >>>>> Signed-off-by: Olga Krishtal <okrishtal@virtuozzo.com> >>>>> Signed-off-by: Denis V. Lunev <den@openvz.org> >>>>> CC: Kevin Wolf <kwolf@redhat.com> >>>>> CC: Max Reitz <mreitz@redhat.com> >>>>> CC: Eric Blake <eblake@redhat.com> >>>>> CC: Fam Zheng <famz@redhat.com> >>>> As long as locking is disabled by default, it's useless and won't >>>> prevent people from corrupting their images. These corruptions happen >>>> exactly because people don't know how to use qemu properly. You can't >>>> expect them to enable locking manually. >>>> >>>> Also, you probably need to consider bdrv_reopen() and live migration. >>>> I think live migration would be blocked if source and destination both >>>> see the lock; which is admittedly less likely than with the qcow2 patch >>>> (and generally a problem of this series), but with localhost migration >>>> and potentially with some NFS setups it can be the case. >>> Note that when libvirt does locking it will release locks when a VM >>> is paused, and acquire locks prior to resuming CPUs. This allows live >>> migration to work because you never have CPUs running on both src + dst >>> at the same time. This does mean that libvirt does not allow QEMU to >>> automatically re-start CPUs when migration completes, as it needs to >>> take some action to acquire locks before allowing the dst to start >>> running again. 
>> This assumes that block devices can only be written to if CPUs are >> running. In the days of qemu 0.9, this was probably right, but with >> things like block jobs and built-in NBD servers, I wouldn't be as sure >> these days. > Sounds like QEMU and libvirt should cooperate more closely to get the > locking less wrong. > > QEMU should have more accurate knowledge on how it is using the image. > Libvirt may be able to provide better locks, with the help of its > virtlockd daemon. daemon owning locks is a problem: - there are distributed cases - daemons restart from time to time
On Wed, Jan 13, 2016 at 12:12:10PM +0300, Denis V. Lunev wrote: > On 01/13/2016 11:52 AM, Markus Armbruster wrote: > >Kevin Wolf <kwolf@redhat.com> writes: > > > >>Am 11.01.2016 um 18:58 hat Daniel P. Berrange geschrieben: > >>>On Mon, Jan 11, 2016 at 06:31:06PM +0100, Kevin Wolf wrote: > >>>>Am 23.12.2015 um 08:46 hat Denis V. Lunev geschrieben: > >>>>>From: Olga Krishtal <okrishtal@virtuozzo.com> > >>>>> > >>>>>While opening the image we want to be sure that we are the > >>>>>one who works with image, anf if it is not true - > >>>>>opening the image for writing should fail. > >>>>> > >>>>>There are 2 ways at the moment: no lock at all and lock the file > >>>>>image. > >>>>> > >>>>>Signed-off-by: Olga Krishtal <okrishtal@virtuozzo.com> > >>>>>Signed-off-by: Denis V. Lunev <den@openvz.org> > >>>>>CC: Kevin Wolf <kwolf@redhat.com> > >>>>>CC: Max Reitz <mreitz@redhat.com> > >>>>>CC: Eric Blake <eblake@redhat.com> > >>>>>CC: Fam Zheng <famz@redhat.com> > >>>>As long as locking is disabled by default, it's useless and won't > >>>>prevent people from corrupting their images. These corruptions happen > >>>>exactly because people don't know how to use qemu properly. You can't > >>>>expect them to enable locking manually. > >>>> > >>>>Also, you probably need to consider bdrv_reopen() and live migration. > >>>>I think live migration would be blocked if source and destination both > >>>>see the lock; which is admittedly less likely than with the qcow2 patch > >>>>(and generally a problem of this series), but with localhost migration > >>>>and potentially with some NFS setups it can be the case. > >>>Note that when libvirt does locking it will release locks when a VM > >>>is paused, and acquire locks prior to resuming CPUs. This allows live > >>>migration to work because you never have CPUs running on both src + dst > >>>at the same time. 
This does mean that libvirt does not allow QEMU to > >>>automatically re-start CPUs when migration completes, as it needs to > >>>take some action to acquire locks before allowing the dst to start > >>>running again. > >>This assumes that block devices can only be written to if CPUs are > >>running. In the days of qemu 0.9, this was probably right, but with > >>things like block jobs and built-in NBD servers, I wouldn't be as sure > >>these days. > >Sounds like QEMU and libvirt should cooperate more closely to get the > >locking less wrong. > > > >QEMU should have more accurate knowledge on how it is using the image. > >Libvirt may be able to provide better locks, with the help of its > >virtlockd daemon. > daemon owning locks is a problem: > - there are distributed cases > - daemons restart from time to time The virtlockd daemon copes with both of these cases just fine. There is one daemon per virtualization host, and they can be configured to acquire locks in a way that they will be enforced across all hosts. The reason we do it in a separate virtlockd daemon instead of libvirtd is that we designed it to be able to re-exec() itself while maintaining all locks to allow for seamless upgrade. Regards, Daniel
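The re-exec() trick Daniel describes works because flock(2) locks belong to the open file description rather than to the process image, so a lock survives exec() as long as the descriptor itself stays open. A small sketch of that property (illustrative only, not virtlockd code), using dup() to stand in for the fd being carried across a re-exec:

```python
import fcntl
import os
import tempfile

img = tempfile.NamedTemporaryFile()
fd = os.open(img.name, os.O_RDWR)
fcntl.flock(fd, fcntl.LOCK_EX)       # the daemon holds the lock on this fd

# A re-exec'ing daemon keeps its fds across exec() (no O_CLOEXEC);
# model "the descriptor survives, the process image changes" with dup():
kept = os.dup(fd)
os.close(fd)                         # original fd number gone, description still open

# The lock is still held through `kept`: a fresh open cannot acquire it.
probe = os.open(img.name, os.O_RDWR)
try:
    fcntl.flock(probe, fcntl.LOCK_EX | fcntl.LOCK_NB)
    still_locked = False
except OSError:                      # EWOULDBLOCK
    still_locked = True
```

The same property is what makes locks released only when the last descriptor on the open file description is closed, which is why a daemon restart (close everything, start fresh) drops them while a re-exec (keep the fds) does not.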
On Mon, Jan 11, 2016 at 07:35:57PM +0100, Kevin Wolf wrote:
> Am 11.01.2016 um 18:58 hat Daniel P. Berrange geschrieben:
> > On Mon, Jan 11, 2016 at 06:31:06PM +0100, Kevin Wolf wrote:
> > > Am 23.12.2015 um 08:46 hat Denis V. Lunev geschrieben:
> > > > From: Olga Krishtal <okrishtal@virtuozzo.com>
> > > >
> > > > While opening the image we want to be sure that we are the
> > > > one who works with the image, and if that is not true -
> > > > opening the image for writing should fail.
> > > >
> > > > There are 2 ways at the moment: no lock at all and lock the file
> > > > image.
> > > >
> > > > Signed-off-by: Olga Krishtal <okrishtal@virtuozzo.com>
> > > > Signed-off-by: Denis V. Lunev <den@openvz.org>
> > > > CC: Kevin Wolf <kwolf@redhat.com>
> > > > CC: Max Reitz <mreitz@redhat.com>
> > > > CC: Eric Blake <eblake@redhat.com>
> > > > CC: Fam Zheng <famz@redhat.com>
> > >
> > > As long as locking is disabled by default, it's useless and won't
> > > prevent people from corrupting their images. These corruptions happen
> > > exactly because people don't know how to use qemu properly. You can't
> > > expect them to enable locking manually.
> > >
> > > Also, you probably need to consider bdrv_reopen() and live migration.
> > > I think live migration would be blocked if source and destination both
> > > see the lock; which is admittedly less likely than with the qcow2 patch
> > > (and generally a problem of this series), but with localhost migration
> > > and potentially with some NFS setups it can be the case.
> >
> > Note that when libvirt does locking it will release locks when a VM
> > is paused, and acquire locks prior to resuming CPUs. This allows live
> > migration to work because you never have CPUs running on both src + dst
> > at the same time. This does mean that libvirt does not allow QEMU to
> > automatically re-start CPUs when migration completes, as it needs to
> > take some action to acquire locks before allowing the dst to start
> > running again.
>
> This assumes that block devices can only be written to if CPUs are
> running. In the days of qemu 0.9, this was probably right, but with
> things like block jobs and built-in NBD servers, I wouldn't be as sure
> these days.

True, libvirt knows when it is using block jobs & NBD servers, so it
should not be difficult to address this.

Regards,
Daniel
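The migration handoff Daniel describes — the lock is never held by both sides while CPUs run — can be sketched with flock as a hypothetical stand-in for libvirt's lock manager. The ordering is what matters: the source drops the lock when it pauses CPUs, and only then may the destination acquire it and resume:

```python
import fcntl
import tempfile


def acquire(f):
    """Non-blocking exclusive lock; True if this side may run CPUs."""
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        return False


image = tempfile.NamedTemporaryFile(delete=False)
src = open(image.name, "r+b")  # source QEMU's open file description
dst = open(image.name, "r+b")  # destination QEMU's open file description

src_running = acquire(src)      # source holds the lock while its CPUs run
dst_blocked = not acquire(dst)  # destination must not start CPUs yet

# Migration completes: source pauses CPUs and releases the lock...
fcntl.flock(src, fcntl.LOCK_UN)
# ...then the destination acquires it before resuming CPUs.
dst_running = acquire(dst)
```

This is also why, as noted above, libvirt cannot let QEMU auto-restart CPUs on the destination: the acquire step has to happen first.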
On 01/12/2016 05:10 PM, Fam Zheng wrote:
>> If we switch the default in my patch from 'nolock' to 'lock' then the
>> poor guys who are calling qemu-img etc. will get the lock as
>> necessary, while 'proper management software' aka libvirt will be
>> able to call qemu/qemu-img etc. with the proper 'nolock' flag, as
>> they do care about the locking.
>
> That is wrong because then we break old libvirt with the new qemu-img
> (acquires lock by default), which is IMO a breakage of backward
> compatibility.

In the big software stack picture, it is okay to reject 'old libvirt/new
qemu' as an invalid combination. Upgrade-wise, we specifically support
'new libvirt/old qemu' - but it is fair game to say that 'if you want to
run new qemu, you must first upgrade to new libvirt that knows how to
drive it'.

That said, minimizing back-compat breaks, so that old libvirt can
(usually) correctly drive new qemu, is a worthy design goal for qemu.
On 01/13/2016 07:44 PM, Eric Blake wrote:
> On 01/12/2016 05:10 PM, Fam Zheng wrote:
>
>>> If we switch the default in my patch from 'nolock' to 'lock' then the
>>> poor guys who are calling qemu-img etc. will get the lock as
>>> necessary, while 'proper management software' aka libvirt will be
>>> able to call qemu/qemu-img etc. with the proper 'nolock' flag, as
>>> they do care about the locking.
>> That is wrong because then we break old libvirt with the new qemu-img
>> (acquires lock by default), which is IMO a breakage of backward
>> compatibility.
> In the big software stack picture, it is okay to reject 'old libvirt/new
> qemu' as an invalid combination. Upgrade-wise, we specifically support
> 'new libvirt/old qemu' - but it is fair game to say that 'if you want to
> run new qemu, you must first upgrade to new libvirt that knows how to
> drive it'.
>
> That said, minimizing back-compat breaks, so that old libvirt can
> (usually) correctly drive new qemu, is a worthy design goal for qemu.
>
There is one other thing I originally missed adding to the picture.
Locking can be complex and format specific. In the original Parallels
disk format (not a single image but an entire bundle), the locking is
performed on a very special file. Thus either libvirt must know the
exact format details, or it must rely on something which really does
know them, i.e. QEMU/qemu-img.

Den
diff --git a/block.c b/block.c
index 411edbf..74228b8 100644
--- a/block.c
+++ b/block.c
@@ -40,6 +40,8 @@
 #include "qemu/timer.h"
 #include "qapi-event.h"
 #include "block/throttle-groups.h"
+#include "qapi-types.h"
+#include "qapi/util.h"

 #ifdef CONFIG_BSD
 #include <sys/types.h>
@@ -895,6 +897,12 @@ static QemuOptsList bdrv_runtime_opts = {
             .type = QEMU_OPT_BOOL,
             .help = "Ignore flush requests",
         },
+        {
+            .name = "lock",
+            .type = QEMU_OPT_STRING,
+            .help = "How to lock the image: none or lockfile",
+            .def_value_str = "none",
+        },
         { /* end of list */ }
     },
 };
@@ -914,6 +922,8 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,
     QemuOpts *opts;
     BlockDriver *drv;
     Error *local_err = NULL;
+    const char *lock_image;
+    int lock;

     assert(bs->file == NULL);
     assert(options != NULL && bs->options != options);
@@ -1020,6 +1030,18 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,
         goto free_and_fail;
     }

+    lock_image = qemu_opt_get(opts, "lock");
+    lock = qapi_enum_parse(BdrvLockImage_lookup, lock_image,
+                           BDRV_LOCK_IMAGE__MAX, BDRV_LOCK_IMAGE_NONE, &local_err);
+    if (!bs->read_only && lock != BDRV_LOCK_IMAGE_NONE) {
+        ret = bdrv_lock_image(bs, lock);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "Could not lock the image");
+            bdrv_close(bs);
+            goto fail_opts;
+        }
+    }
+
     if (bs->encrypted) {
         error_report("Encrypted images are deprecated");
         error_printf("Support for them will be removed in a future release.\n"
@@ -4320,3 +4342,22 @@ void bdrv_refresh_filename(BlockDriverState *bs)
         QDECREF(json);
     }
 }
+
+int bdrv_lock_image(BlockDriverState *bs, BdrvLockImage lock)
+{
+    BlockDriver *drv = bs->drv;
+
+    if (lock != BDRV_LOCK_IMAGE_LOCKFILE) {
+        return -EOPNOTSUPP;
+    }
+    if (drv != NULL && drv->bdrv_lock_image != NULL) {
+        return bs->drv->bdrv_lock_image(bs, lock);
+    }
+    if (bs->file == NULL) {
+        return -EOPNOTSUPP;
+    }
+    drv = bs->file->bs->drv;
+    if (drv != NULL && drv->bdrv_lock_image != NULL) {
+        return drv->bdrv_lock_image(bs, lock);
+    }
+    return -EOPNOTSUPP;
+}
diff --git a/include/block/block.h b/include/block/block.h
index db8e096..27fc434 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -506,6 +506,7 @@ void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
 void bdrv_dirty_iter_init(BdrvDirtyBitmap *bitmap, struct HBitmapIter *hbi);
 void bdrv_set_dirty_iter(struct HBitmapIter *hbi, int64_t offset);
 int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap);
+int bdrv_lock_image(BlockDriverState *bs, BdrvLockImage lock);

 void bdrv_enable_copy_on_read(BlockDriverState *bs);
 void bdrv_disable_copy_on_read(BlockDriverState *bs);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 256609d..755f342 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -135,6 +135,7 @@ struct BlockDriver {
     int (*bdrv_create)(const char *filename, QemuOpts *opts, Error **errp);
     int (*bdrv_set_key)(BlockDriverState *bs, const char *key);
     int (*bdrv_make_empty)(BlockDriverState *bs);
+    int (*bdrv_lock_image)(BlockDriverState *bs, BdrvLockImage lock);

     void (*bdrv_refresh_filename)(BlockDriverState *bs, QDict *options);

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 1a5d9ce..b82589b 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2408,3 +2408,12 @@
 ##
 { 'command': 'block-set-write-threshold',
   'data': { 'node-name': 'str', 'write-threshold': 'uint64' } }
+
+##
+# @BdrvLockImage:
+#
+# An enumeration of lock types to lock image file.
+# @none - do not lock the image file
+# @lockfile
+## Since 2.6
+{ 'enum': 'BdrvLockImage', 'data': ['none', 'lockfile'] }
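Eric's review point stands: the series never documents what 'lockfile' means. One plausible semantic — a hypothetical illustration, not taken from the patch — is an O_EXCL sidecar file next to the image whose mere existence denies a second opener; the helper name and errno choice below are the sketch's own:

```python
import errno
import os
import tempfile


def lockfile_lock(image_path):
    """Hypothetical 'lockfile' mode: create <image>.lock atomically with
    O_EXCL. Returns 0 on success, a negative errno if the lock exists."""
    try:
        fd = os.open(image_path + ".lock", os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return 0
    except FileExistsError:
        return -errno.EAGAIN


image = tempfile.mkstemp(suffix=".img")[1]
first = lockfile_lock(image)   # lock acquired, open may proceed
second = lockfile_lock(image)  # refused: image already locked
```

A known downside of this scheme, and likely relevant to the review discussion, is that the sidecar file survives a QEMU crash, leaving a stale lock that needs manual or heuristic cleanup — unlike kernel advisory locks, which the OS releases automatically.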