
[for-2.7,v2,07/17] rbd: Implement image locking

Message ID 1460690887-32751-8-git-send-email-famz@redhat.com
State New

Commit Message

Fam Zheng April 15, 2016, 3:27 a.m. UTC
librbd has the locking API that can be used to implement .bdrv_lockf.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block/rbd.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

Comments

Jason Dillaman April 23, 2016, 1:57 a.m. UTC | #1
Since this cannot automatically recover from a crashed QEMU client holding
an RBD image lock, perhaps this RBD locking should not be enabled by
default.  Additionally, this will conflict with the "exclusive-lock"
feature available since the Ceph Hammer release, since both utilize the
same locking construct.

As a quick background, the optional exclusive-lock feature can be
enabled/disabled per image and safely/automatically handles recovery
from a crashed client.  Under normal conditions, the RBD
exclusive-lock feature automatically acquires the lock upon the first
attempt to write to the image and transparently transitions ownership of
the lock between two or more clients -- used for QEMU live-migration.

I'd be happy to add a new librbd API method to explicitly expose acquiring
and releasing the RBD exclusive lock since it certainly solves a couple
compromises in our current QEMU integration.
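
To make the "transparent" part concrete, here is a minimal, self-contained
sketch of a client writing to an image that has exclusive-lock enabled;
librbd takes and manages the lock on the first write without any locking
call from the application.  The pool name, image name, and config path are
placeholders, not anything taken from this series.

#include <rados/librados.h>
#include <rbd/librbd.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    rados_t cluster;
    rados_ioctx_t ioctx;
    rbd_image_t image;
    char buf[512];
    ssize_t n;
    int r;

    /* Standard librados/librbd setup; most error handling trimmed. */
    rados_create(&cluster, NULL);
    rados_conf_read_file(cluster, "/etc/ceph/ceph.conf");
    r = rados_connect(cluster);
    if (r < 0) {
        fprintf(stderr, "rados_connect: %s\n", strerror(-r));
        return 1;
    }
    rados_ioctx_create(cluster, "rbd", &ioctx);
    rbd_open(ioctx, "test-image", &image, NULL);

    /*
     * No explicit locking call.  With the exclusive-lock feature enabled on
     * the image, librbd acquires the lock behind the scenes on this first
     * write and will later hand it over to another client (for example the
     * live-migration destination) when that client starts writing.
     */
    memset(buf, 0, sizeof(buf));
    n = rbd_write(image, 0, sizeof(buf), buf);
    if (n < 0) {
        fprintf(stderr, "rbd_write: %s\n", strerror((int)-n));
    }

    rbd_close(image);
    rados_ioctx_destroy(ioctx);
    rados_shutdown(cluster);
    return 0;
}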

On Thu, Apr 14, 2016 at 11:27 PM, Fam Zheng <famz@redhat.com> wrote:

> librbd has the locking API that can be used to implement .bdrv_lockf.
>
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  block/rbd.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
>
> diff --git a/block/rbd.c b/block/rbd.c
> index 5bc5b32..a495083 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -810,6 +810,30 @@ static int qemu_rbd_truncate(BlockDriverState *bs,
> int64_t offset)
>      return 0;
>  }
>
> +static int qemu_rbd_lockf(BlockDriverState *bs, BdrvLockfCmd cmd)
> +{
> +    int ret;
> +    BDRVRBDState *s = bs->opaque;
> +
> +    /* XXX: RBD locks are not released automatically when program exits,
> which
> +     * means if QEMU dies it cannot open the image next time until
> manually
> +     * unlocked. */
> +    switch (cmd) {
> +    case BDRV_LOCKF_RWLOCK:
> +        ret = rbd_lock_exclusive(s->image, NULL);
> +        break;
> +    case BDRV_LOCKF_ROLOCK:
> +        ret = rbd_lock_shared(s->image, NULL, NULL);
> +        break;
> +    case BDRV_LOCKF_UNLOCK:
> +        ret = rbd_unlock(s->image, NULL);
> +        break;
> +    default:
> +        abort();
> +    }
> +    return ret;
> +}
> +
>  static int qemu_rbd_snap_create(BlockDriverState *bs,
>                                  QEMUSnapshotInfo *sn_info)
>  {
> @@ -998,6 +1022,7 @@ static BlockDriver bdrv_rbd = {
>      .bdrv_aio_discard       = qemu_rbd_aio_discard,
>  #endif
>
> +    .bdrv_lockf             = qemu_rbd_lockf,
>      .bdrv_snapshot_create   = qemu_rbd_snap_create,
>      .bdrv_snapshot_delete   = qemu_rbd_snap_remove,
>      .bdrv_snapshot_list     = qemu_rbd_snap_list,
> --
> 2.8.0
>
>
>
Fam Zheng April 25, 2016, 12:42 a.m. UTC | #2
On Fri, 04/22 21:57, Jason Dillaman wrote:
> Since this cannot automatically recover from a crashed QEMU client with an
> RBD image, perhaps this RBD locking should not default to enabled.
> Additionally, this will conflict with the "exclusive-lock" feature
> available since the Ceph Hammer-release since both utilize the same locking
> construct.
> 
> As a quick background, the optional exclusive-lock feature can be
> enabled/disabled on image and safely/automatically handles the case of
> recovery from a crashed client.  Under normal conditions, the RBD
> exclusive-lock feature automatically acquires the lock upon the first
> attempt to write to the image and transparently transitions ownership of
> the lock between two or more clients -- used for QEMU live-migration.

Is it enabled by default?

> 
> I'd be happy to add a new librbd API method to explicitly expose acquiring
> and releasing the RBD exclusive lock since it certainly solves a couple
> compromises in our current QEMU integration.

Does the API do enable/disable or acquire/release? Could you explain the
differences between it and rbd_lock_exclusive?

Fam
Jason Dillaman April 26, 2016, 3:42 p.m. UTC | #3
On Sun, Apr 24, 2016 at 7:42 PM, Fam Zheng <famz@redhat.com> wrote:
> On Fri, 04/22 21:57, Jason Dillaman wrote:
>> Since this cannot automatically recover from a crashed QEMU client with an
>> RBD image, perhaps this RBD locking should not default to enabled.
>> Additionally, this will conflict with the "exclusive-lock" feature
>> available since the Ceph Hammer-release since both utilize the same locking
>> construct.
>>
>> As a quick background, the optional exclusive-lock feature can be
>> enabled/disabled on image and safely/automatically handles the case of
>> recovery from a crashed client.  Under normal conditions, the RBD
>> exclusive-lock feature automatically acquires the lock upon the first
>> attempt to write to the image and transparently transitions ownership of
>> the lock between two or more clients -- used for QEMU live-migration.
>
> Is it enabled by default?
>

Starting with the Jewel release of Ceph it is enabled by default.
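
A quick way to confirm what a particular image is configured with is to
check its feature bits.  A minimal sketch against the librbd C API,
assuming an already-open rbd_image_t handle; the exclusive-lock bit is the
RBD_FEATURE_EXCLUSIVE_LOCK flag, and the feature itself can be toggled per
image with the existing enable/disable tooling mentioned below.

#include <rbd/librbd.h>
#include <stdint.h>

/* Returns 1 if the exclusive-lock feature bit is set on the image,
 * 0 if it is not, or a negative errno value from librbd. */
static int image_has_exclusive_lock(rbd_image_t image)
{
    uint64_t features = 0;
    int r = rbd_get_features(image, &features);

    if (r < 0) {
        return r;
    }
    return (features & RBD_FEATURE_EXCLUSIVE_LOCK) ? 1 : 0;
}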

>>
>> I'd be happy to add a new librbd API method to explicitly expose acquiring
>> and releasing the RBD exclusive lock since it certainly solves a couple
>> compromises in our current QEMU integration.
>
> Does the API do enable/disable or acquire/release? Could you explain the
> differences between it and rbd_lock_exclusive?

There is already an API for enabling/disabling the exclusive-lock
feature (and associated CLI tooling).  This would be a new API method
to explicitly acquire / release the exclusive-lock (instead of
implicitly acquiring the lock upon first write request as it currently
does in order to support QEMU live-migration).

The rbd_lock_exclusive/rbd_lock_shared calls are essentially advisory
locks.  Nothing stops a client from accessing the image if it doesn't
attempt to acquire the lock (even if another client already has the
image locked exclusively).  They also don't support automatic recovery
from failed clients.
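
To make the advisory nature concrete, a sketch of the calls this patch
uses, assuming an already-open rbd_image_t handle and an arbitrary cookie
string.  Nothing stops a client that skips these calls from writing, and a
lock left behind by a crashed process has to be cleaned up out of band
(rbd_list_lockers()/rbd_break_lock(), or the rbd lock ls / rbd lock rm
CLI).

#include <rbd/librbd.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Take the advisory exclusive lock, do some work, release it again.
 * The lock is purely cooperative: another client that never calls
 * rbd_lock_* can still read and write the image. */
static int with_advisory_lock(rbd_image_t image)
{
    const char *cookie = "qemu-example";  /* arbitrary; must match on unlock */
    int r;

    r = rbd_lock_exclusive(image, cookie);
    if (r == -EBUSY || r == -EEXIST) {
        /* Someone else holds the lock -- possibly a client that has since
         * crashed, because librbd never cleans these locks up by itself. */
        fprintf(stderr, "image already locked: %s\n", strerror(-r));
        return r;
    } else if (r < 0) {
        return r;
    }

    /* ... exclusive section: writes, snapshot creation, etc. ... */

    return rbd_unlock(image, cookie);
}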
Fam Zheng April 27, 2016, 12:20 a.m. UTC | #4
On Tue, 04/26 10:42, Jason Dillaman wrote:
> On Sun, Apr 24, 2016 at 7:42 PM, Fam Zheng <famz@redhat.com> wrote:
> > On Fri, 04/22 21:57, Jason Dillaman wrote:
> >> Since this cannot automatically recover from a crashed QEMU client with an
> >> RBD image, perhaps this RBD locking should not default to enabled.
> >> Additionally, this will conflict with the "exclusive-lock" feature
> >> available since the Ceph Hammer-release since both utilize the same locking
> >> construct.
> >>
> >> As a quick background, the optional exclusive-lock feature can be
> >> enabled/disabled on image and safely/automatically handles the case of
> >> recovery from a crashed client.  Under normal conditions, the RBD
> >> exclusive-lock feature automatically acquires the lock upon the first
> >> attempt to write to the image and transparently transitions ownership of
> >> the lock between two or more clients -- used for QEMU live-migration.
> >
> > Is it enabled by default?
> >
> 
> Starting with the Jewel release of Ceph it is enabled by default.

OK, then I'll leave rbd in this QEMU series for now.

> 
> >>
> >> I'd be happy to add a new librbd API method to explicitly expose acquiring
> >> and releasing the RBD exclusive lock since it certainly solves a couple
> >> compromises in our current QEMU integration.
> >
> > Does the API do enable/disable or acquire/release? Could you explain the
> > differences between it and rbd_lock_exclusive?
> 
> There is already an API for enabling/disabling the exclusive-lock
> feature (and associated CLI tooling).  This would be a new API method
> to explicitly acquire / release the exclusive-lock (instead of
> implicitly acquiring the lock upon first write request as it currently
> does in order to support QEMU live-migration).
> 
> The rbd_lock_exclusive/rbd_lock_shared are essentially advisory locks.
> Nothing stops a client from accessing the image if it doesn't attempt
> to acquire the lock (even if another client already has the image
> locked exclusively).  It also doesn't support automatic recovery from
> failed clients.

I see, thanks for the explanation!

Fam
Jason Dillaman April 27, 2016, 6:18 p.m. UTC | #5
On Tue, Apr 26, 2016 at 7:20 PM, Fam Zheng <famz@redhat.com> wrote:
> On Tue, 04/26 10:42, Jason Dillaman wrote:
>> On Sun, Apr 24, 2016 at 7:42 PM, Fam Zheng <famz@redhat.com> wrote:
>> > On Fri, 04/22 21:57, Jason Dillaman wrote:
>> >> Since this cannot automatically recover from a crashed QEMU client with an
>> >> RBD image, perhaps this RBD locking should not default to enabled.
>> >> Additionally, this will conflict with the "exclusive-lock" feature
>> >> available since the Ceph Hammer-release since both utilize the same locking
>> >> construct.
>> >>
>> >> As a quick background, the optional exclusive-lock feature can be
>> >> enabled/disabled on image and safely/automatically handles the case of
>> >> recovery from a crashed client.  Under normal conditions, the RBD
>> >> exclusive-lock feature automatically acquires the lock upon the first
>> >> attempt to write to the image and transparently transitions ownership of
>> >> the lock between two or more clients -- used for QEMU live-migration.
>> >
>> > Is it enabled by default?
>> >
>>
>> Starting with the Jewel release of Ceph it is enabled by default.
>
> OK, then I'll leave rbd in this QEMU series for now.

Without exposing some new API methods, this patch will, unfortunately,
directly conflict with the new Jewel rbd defaults and will actually
result in the image becoming read-only since librbd won't be able to
acquire the exclusive lock when QEMU owns the advisory lock.

We can probably get the new API methods upstream within a week or two
[1].  If that's too long of a delay, I'd recommend dropping rbd
locking from the series for now.

[1] http://tracker.ceph.com/issues/15632
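
For reference, the tracker item above is about adding explicit
acquire/release calls for the managed exclusive lock.  A sketch of how such
an API could be used, written against the rbd_lock_acquire() /
rbd_lock_release() names that later librbd releases ended up providing;
treat the exact names and availability as something to verify against your
librbd version, since they did not exist when this thread was written.

#include <rbd/librbd.h>

/* Explicitly take and drop the managed exclusive lock -- the same lock the
 * exclusive-lock feature otherwise acquires implicitly on the first write.
 * Unlike the advisory rbd_lock_* calls, this lock is managed by librbd and
 * can be recovered if the holder dies. */
static int with_managed_lock(rbd_image_t image)
{
    int r = rbd_lock_acquire(image, RBD_LOCK_MODE_EXCLUSIVE);

    if (r < 0) {
        return r;
    }

    /* ... I/O that must not race with other writers ... */

    return rbd_lock_release(image);
}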
Fam Zheng April 28, 2016, 1:33 a.m. UTC | #6
On Wed, 04/27 13:18, Jason Dillaman wrote:
> On Tue, Apr 26, 2016 at 7:20 PM, Fam Zheng <famz@redhat.com> wrote:
> > On Tue, 04/26 10:42, Jason Dillaman wrote:
> >> On Sun, Apr 24, 2016 at 7:42 PM, Fam Zheng <famz@redhat.com> wrote:
> >> > On Fri, 04/22 21:57, Jason Dillaman wrote:
> >> >> Since this cannot automatically recover from a crashed QEMU client with an
> >> >> RBD image, perhaps this RBD locking should not default to enabled.
> >> >> Additionally, this will conflict with the "exclusive-lock" feature
> >> >> available since the Ceph Hammer-release since both utilize the same locking
> >> >> construct.
> >> >>
> >> >> As a quick background, the optional exclusive-lock feature can be
> >> >> enabled/disabled on image and safely/automatically handles the case of
> >> >> recovery from a crashed client.  Under normal conditions, the RBD
> >> >> exclusive-lock feature automatically acquires the lock upon the first
> >> >> attempt to write to the image and transparently transitions ownership of
> >> >> the lock between two or more clients -- used for QEMU live-migration.
> >> >
> >> > Is it enabled by default?
> >> >
> >>
> >> Starting with the Jewel release of Ceph it is enabled by default.
> >
> > OK, then I'll leave rbd in this QEMU series for now.
> 
> Without exposing some new API methods, this patch will, unfortunately,
> directly conflict with the new Jewel rbd defaults and will actually
> result in the image becoming read-only since librbd won't be able to
> acquire the exclusive lock when QEMU owns the advisory lock.
> 
> We can probably get the new API methods upstream within a week or two
> [1].  If that's too long of a delay, I'd recommend dropping rbd
> locking from the series for now.

Yes, you are right, I meant "drop" when I wrote "leave" :)

Thanks,
Fam

> 
> [1] http://tracker.ceph.com/issues/15632
> 
> -- 
> Jason

Patch

diff --git a/block/rbd.c b/block/rbd.c
index 5bc5b32..a495083 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -810,6 +810,30 @@  static int qemu_rbd_truncate(BlockDriverState *bs, int64_t offset)
     return 0;
 }
 
+static int qemu_rbd_lockf(BlockDriverState *bs, BdrvLockfCmd cmd)
+{
+    int ret;
+    BDRVRBDState *s = bs->opaque;
+
+    /* XXX: RBD locks are not released automatically when program exits, which
+     * means if QEMU dies it cannot open the image next time until manually
+     * unlocked. */
+    switch (cmd) {
+    case BDRV_LOCKF_RWLOCK:
+        ret = rbd_lock_exclusive(s->image, NULL);
+        break;
+    case BDRV_LOCKF_ROLOCK:
+        ret = rbd_lock_shared(s->image, NULL, NULL);
+        break;
+    case BDRV_LOCKF_UNLOCK:
+        ret = rbd_unlock(s->image, NULL);
+        break;
+    default:
+        abort();
+    }
+    return ret;
+}
+
 static int qemu_rbd_snap_create(BlockDriverState *bs,
                                 QEMUSnapshotInfo *sn_info)
 {
@@ -998,6 +1022,7 @@  static BlockDriver bdrv_rbd = {
     .bdrv_aio_discard       = qemu_rbd_aio_discard,
 #endif
 
+    .bdrv_lockf             = qemu_rbd_lockf,
     .bdrv_snapshot_create   = qemu_rbd_snap_create,
     .bdrv_snapshot_delete   = qemu_rbd_snap_remove,
     .bdrv_snapshot_list     = qemu_rbd_snap_list,