Message ID: cd07f4c1-6b18-ecfc-b37c-a5d9369a13bb@gmail.com
On Tue, May 17, 2016 at 6:03 AM, Sebastian Färber <sfaerber82@gmail.com> wrote:
> Hi Kevin,
>
>> A correct reopen implementation must consider all options and flags that
>> .bdrv_open() looked at.
>>
>> The options are okay, as both "filename" and "password-secret" aren't
>> things that we want to allow a reopen to change. However, in the flags
>> BDRV_O_NOCACHE makes a difference:
>>
>>     if (flags & BDRV_O_NOCACHE) {
>>         rados_conf_set(s->cluster, "rbd_cache", "false");
>>     } else {
>>         rados_conf_set(s->cluster, "rbd_cache", "true");
>>     }
>>
>> A reopen must either update the setting, or if it can't (e.g. because
>> librbd doesn't support it) any attempt to change the flag must fail.
>
> Thanks for the feedback.
> As far as i can tell it's not possible to update the cache settings
> without reconnecting. I've added a check in the following patch.
> Would be great if someone who knows the internals of ceph/rbd could
> have a look as well.

Correct, you cannot currently dynamically enable or disable the cache.
You would need to close the image and re-open it to change the cache
settings.

> Sebastian
>
> -- >8 --
> Subject: [PATCH] block/rbd: add .bdrv_reopen_prepare() stub
>
> Add support for reopen() by adding the .bdrv_reopen_prepare() stub
>
> Signed-off-by: Sebastian Färber <sfaerber82@gmail.com>
> ---
>  block/rbd.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/block/rbd.c b/block/rbd.c
> index 5bc5b32..8ecf096 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -577,6 +577,19 @@ failed_opts:
>      return r;
>  }
>
> +/* Note that this will not re-establish a connection with the Ceph cluster
> +   - it is effectively a NOP. */
> +static int qemu_rbd_reopen_prepare(BDRVReopenState *state,
> +                                   BlockReopenQueue *queue, Error **errp)
> +{
> +    if (state->flags & BDRV_O_NOCACHE &&
> +        ((state->bs->open_flags & BDRV_O_NOCACHE) == 0)) {
> +        error_setg(errp, "Cannot turn off rbd_cache during reopen");
> +        return -EINVAL;
> +    }
> +    return 0;
> +}
> +
>  static void qemu_rbd_close(BlockDriverState *bs)
>  {
>      BDRVRBDState *s = bs->opaque;
> @@ -976,6 +989,7 @@ static BlockDriver bdrv_rbd = {
>      .instance_size      = sizeof(BDRVRBDState),
>      .bdrv_needs_filename = true,
>      .bdrv_file_open     = qemu_rbd_open,
> +    .bdrv_reopen_prepare = qemu_rbd_reopen_prepare,
>      .bdrv_close         = qemu_rbd_close,
>      .bdrv_create        = qemu_rbd_create,
>      .bdrv_has_zero_init = bdrv_has_zero_init_1,
> --
> 1.8.3.1
On 05/17/2016 03:03 AM, Sebastian Färber wrote:
> Hi Kevin,
>
>> A correct reopen implementation must consider all options and flags that
>> .bdrv_open() looked at.
>>
>> The options are okay, as both "filename" and "password-secret" aren't
>> things that we want to allow a reopen to change. However, in the flags
>> BDRV_O_NOCACHE makes a difference:
>>
>>     if (flags & BDRV_O_NOCACHE) {
>>         rados_conf_set(s->cluster, "rbd_cache", "false");
>>     } else {
>>         rados_conf_set(s->cluster, "rbd_cache", "true");
>>     }
>>
>> A reopen must either update the setting, or if it can't (e.g. because
>> librbd doesn't support it) any attempt to change the flag must fail.

Updating this setting on an open image won't do anything, but if you
rbd_close() and rbd_open() it again the setting will take effect.
rbd_close() will force a flush of any pending I/O in librbd and free
the memory for librbd's ImageCtx, which may or may not be desired here.

> Thanks for the feedback.
> As far as i can tell it's not possible to update the cache settings
> without reconnecting. I've added a check in the following patch.
> Would be great if someone who knows the internals of ceph/rbd could
> have a look as well.

There's no need to reset the librados state, so connections to the
cluster can stick around. I'm a bit unclear on the bdrv_reopen_*
functions though - what is their intended use and semantics?

> Sebastian
>
> -- >8 --
> Subject: [PATCH] block/rbd: add .bdrv_reopen_prepare() stub
>
> Add support for reopen() by adding the .bdrv_reopen_prepare() stub
>
> Signed-off-by: Sebastian Färber <sfaerber82@gmail.com>
> ---
>  block/rbd.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/block/rbd.c b/block/rbd.c
> index 5bc5b32..8ecf096 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -577,6 +577,19 @@ failed_opts:
>      return r;
>  }
>
> +/* Note that this will not re-establish a connection with the Ceph cluster
> +   - it is effectively a NOP. */
> +static int qemu_rbd_reopen_prepare(BDRVReopenState *state,
> +                                   BlockReopenQueue *queue, Error **errp)
> +{
> +    if (state->flags & BDRV_O_NOCACHE &&
> +        ((state->bs->open_flags & BDRV_O_NOCACHE) == 0)) {
> +        error_setg(errp, "Cannot turn off rbd_cache during reopen");
> +        return -EINVAL;
> +    }
> +    return 0;
> +}
> +
>  static void qemu_rbd_close(BlockDriverState *bs)
>  {
>      BDRVRBDState *s = bs->opaque;
> @@ -976,6 +989,7 @@ static BlockDriver bdrv_rbd = {
>      .instance_size      = sizeof(BDRVRBDState),
>      .bdrv_needs_filename = true,
>      .bdrv_file_open     = qemu_rbd_open,
> +    .bdrv_reopen_prepare = qemu_rbd_reopen_prepare,
>      .bdrv_close         = qemu_rbd_close,
>      .bdrv_create        = qemu_rbd_create,
>      .bdrv_has_zero_init = bdrv_has_zero_init_1,
> --
> 1.8.3.1
On Tue, May 17, 2016 at 12:03:53PM +0200, Sebastian Färber wrote:
> Hi Kevin,
>
> > A correct reopen implementation must consider all options and flags that
> > .bdrv_open() looked at.
> >
> > The options are okay, as both "filename" and "password-secret" aren't
> > things that we want to allow a reopen to change. However, in the flags
> > BDRV_O_NOCACHE makes a difference:
> >
> >     if (flags & BDRV_O_NOCACHE) {
> >         rados_conf_set(s->cluster, "rbd_cache", "false");
> >     } else {
> >         rados_conf_set(s->cluster, "rbd_cache", "true");
> >     }
> >
> > A reopen must either update the setting, or if it can't (e.g. because
> > librbd doesn't support it) any attempt to change the flag must fail.
>
> Thanks for the feedback.
> As far as i can tell it's not possible to update the cache settings
> without reconnecting. I've added a check in the following patch.
> Would be great if someone who knows the internals of ceph/rbd could
> have a look as well.
>
> Sebastian
>

Is there any reason you cannot create a new connection, and then tear
down the old one, and switch over to use the new connection?

The reopen functions have three operations:

* _prepare(): Parses flags, creates new connections / file descriptors,
  etc., but does not do anything that cannot be undone, or anything that
  makes the reopen operation "live".

* _commit(): (if applicable) switches over to the new fd, makes the
  reopen operation complete, and cleans up anything that needs to be
  cleaned up from _prepare().

* _abort(): cleanly undoes anything done in _prepare() - called if there
  is an error in _prepare(), or if the reopen operation is part of a
  larger transaction of multiple reopens, one of which failed.

For instance, it may be useful to look at how it is handled in the
gluster driver, and see if you can do something similar here in ceph.
For gluster, we create a new gluster connection in _prepare(), and in
_commit() we close the old fd and switch the image driver instance
state fd over to the new one.

> -- >8 --
> Subject: [PATCH] block/rbd: add .bdrv_reopen_prepare() stub
>
> Add support for reopen() by adding the .bdrv_reopen_prepare() stub
>
> Signed-off-by: Sebastian Färber <sfaerber82@gmail.com>
> ---
>  block/rbd.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/block/rbd.c b/block/rbd.c
> index 5bc5b32..8ecf096 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -577,6 +577,19 @@ failed_opts:
>      return r;
>  }
>
> +/* Note that this will not re-establish a connection with the Ceph cluster
> +   - it is effectively a NOP. */
> +static int qemu_rbd_reopen_prepare(BDRVReopenState *state,
> +                                   BlockReopenQueue *queue, Error **errp)
> +{
> +    if (state->flags & BDRV_O_NOCACHE &&
> +        ((state->bs->open_flags & BDRV_O_NOCACHE) == 0)) {
> +        error_setg(errp, "Cannot turn off rbd_cache during reopen");
> +        return -EINVAL;
> +    }
> +    return 0;
> +}
> +
>  static void qemu_rbd_close(BlockDriverState *bs)
>  {
>      BDRVRBDState *s = bs->opaque;
> @@ -976,6 +989,7 @@ static BlockDriver bdrv_rbd = {
>      .instance_size      = sizeof(BDRVRBDState),
>      .bdrv_needs_filename = true,
>      .bdrv_file_open     = qemu_rbd_open,
> +    .bdrv_reopen_prepare = qemu_rbd_reopen_prepare,
>      .bdrv_close         = qemu_rbd_close,
>      .bdrv_create        = qemu_rbd_create,
>      .bdrv_has_zero_init = bdrv_has_zero_init_1,
> --
> 1.8.3.1
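Sketched out, the gluster-style sequence described above might map onto
the rbd driver roughly as follows. This is only an untested illustration,
not the posted patch: the BDRVRBDReopenState struct is hypothetical, the
BDRVRBDState fields (cluster, io_ctx, image, name, snap) are taken from
the block/rbd.c of this era, and a full implementation would also need
to restore the previous rbd_cache value in _abort():

    /* Hypothetical driver-private reopen state (not part of the patch) */
    typedef struct BDRVRBDReopenState {
        rbd_image_t image;   /* second handle, opened with new settings */
    } BDRVRBDReopenState;

    static int qemu_rbd_reopen_prepare(BDRVReopenState *state,
                                       BlockReopenQueue *queue, Error **errp)
    {
        BDRVRBDState *s = state->bs->opaque;
        BDRVRBDReopenState *rs = g_new0(BDRVRBDReopenState, 1);
        int r;

        /* The cluster connection can stick around; only the image-level
         * cache setting changes. librbd picks the new value up on the
         * next rbd_open(). */
        rados_conf_set(s->cluster, "rbd_cache",
                       (state->flags & BDRV_O_NOCACHE) ? "false" : "true");

        /* Open a second handle to the same image; the old handle stays
         * usable if this fails, so nothing "live" has happened yet. */
        r = rbd_open(s->io_ctx, s->name, &rs->image, s->snap);
        if (r < 0) {
            error_setg_errno(errp, -r, "failed to reopen rbd image");
            g_free(rs);
            return r;
        }

        state->opaque = rs;
        return 0;
    }

    static void qemu_rbd_reopen_commit(BDRVReopenState *state)
    {
        BDRVRBDState *s = state->bs->opaque;
        BDRVRBDReopenState *rs = state->opaque;

        /* Switch over to the new handle and drop the old one; all
         * in-flight requests were flushed before the reopen started. */
        rbd_close(s->image);
        s->image = rs->image;
        g_free(rs);
        state->opaque = NULL;
    }

    static void qemu_rbd_reopen_abort(BDRVReopenState *state)
    {
        BDRVRBDReopenState *rs = state->opaque;

        /* Undo _prepare(): close the new handle, keep using the old. */
        rbd_close(rs->image);
        g_free(rs);
        state->opaque = NULL;
    }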
>
> There's no need to reset the librados state, so connections to the
> cluster can stick around. I'm a bit unclear on the bdrv_reopen_*
> functions though - what is their intended use and semantics?

My motivation for implementing this basic reopen support is getting
active block commit in qemu (driven by libvirt) to work for rbd. For
example:

- VM is running and using an rbd device as its primary disk
- Create an external local qcow2 snapshot with rbd as the backing device
- All writes end up in the local qcow2 file
- After a while, do an active block commit to get the changes back to rbd

Without the patch the last part (active block commit) fails in qemu:

    Block format 'rbd' used by device '' does not support reopening files

I think block commit in this case wants to switch the rbd device to RW
to make sure it can write back the changes, which is unnecessary here
because it is already writeable.

Unfortunately I don't think I can implement proper/full-blown reopen
support (like gluster) for rbd - this would be way above my head ...

Sebastian
Am 17.05.2016 um 20:48 hat Josh Durgin geschrieben:
> On 05/17/2016 03:03 AM, Sebastian Färber wrote:
> >Hi Kevin,
> >
> >>A correct reopen implementation must consider all options and flags that
> >>.bdrv_open() looked at.
> >>
> >>The options are okay, as both "filename" and "password-secret" aren't
> >>things that we want to allow a reopen to change. However, in the flags
> >>BDRV_O_NOCACHE makes a difference:
> >>
> >>    if (flags & BDRV_O_NOCACHE) {
> >>        rados_conf_set(s->cluster, "rbd_cache", "false");
> >>    } else {
> >>        rados_conf_set(s->cluster, "rbd_cache", "true");
> >>    }
> >>
> >>A reopen must either update the setting, or if it can't (e.g. because
> >>librbd doesn't support it) any attempt to change the flag must fail.
>
> Updating this setting on an open image won't do anything, but if you
> rbd_close() and rbd_open() it again the setting will take effect.
> rbd_close() will force a flush of any pending I/O in librbd and
> free the memory for librbd's ImageCtx, which may or may not be desired
> here.

First rbd_close() and then rbd_open() risks that the rbd_open() fails
and we end up with no usable image at all. Can we open a second instance
of the image first and only close the first one if that succeeded?

We already flush all requests before calling this, so that part
shouldn't make a difference.

> >Thanks for the feedback.
> >As far as i can tell it's not possible to update the cache settings
> >without reconnecting. I've added a check in the following patch.
> >Would be great if someone who knows the internals of ceph/rbd could
> >have a look as well.
>
> There's no need to reset the librados state, so connections to the
> cluster can stick around. I'm a bit unclear on the bdrv_reopen_*
> functions though - what is their intended use and semantics?

They change the options and flags that were specified in .bdrv_open().
The most important use case today is switching between read-only and
read-write, but changing the cache mode or any other option can be
requested as well.

> >Sebastian
> >
> >-- >8 --
> >Subject: [PATCH] block/rbd: add .bdrv_reopen_prepare() stub
> >
> >Add support for reopen() by adding the .bdrv_reopen_prepare() stub
> >
> >Signed-off-by: Sebastian Färber <sfaerber82@gmail.com>
> >---
> > block/rbd.c | 14 ++++++++++++++
> > 1 file changed, 14 insertions(+)
> >
> >diff --git a/block/rbd.c b/block/rbd.c
> >index 5bc5b32..8ecf096 100644
> >--- a/block/rbd.c
> >+++ b/block/rbd.c
> >@@ -577,6 +577,19 @@ failed_opts:
> >     return r;
> > }
> >
> >+/* Note that this will not re-establish a connection with the Ceph cluster
> >+   - it is effectively a NOP. */
> >+static int qemu_rbd_reopen_prepare(BDRVReopenState *state,
> >+                                   BlockReopenQueue *queue, Error **errp)
> >+{
> >+    if (state->flags & BDRV_O_NOCACHE &&
> >+        ((state->bs->open_flags & BDRV_O_NOCACHE) == 0)) {

This misses the other direction, where you try to turn on caching. If we
don't implement the real functionality, we should always error out if
the bit changes. The most readable check is probably:

    (state->flags & BDRV_O_NOCACHE) !=
    (state->bs->open_flags & BDRV_O_NOCACHE)

> >+        error_setg(errp, "Cannot turn off rbd_cache during reopen");
> >+        return -EINVAL;
> >+    }
> >+    return 0;
> >+}

Kevin
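For reference, a stub using the symmetric check Kevin suggests would
read roughly like this (an illustration assembled from his comments, not
a posted patch):

    static int qemu_rbd_reopen_prepare(BDRVReopenState *state,
                                       BlockReopenQueue *queue, Error **errp)
    {
        /* Reject a change of the BDRV_O_NOCACHE bit in either direction,
         * since the stub cannot apply it to the open image. */
        if ((state->flags & BDRV_O_NOCACHE) !=
            (state->bs->open_flags & BDRV_O_NOCACHE)) {
            error_setg(errp, "Cannot change rbd_cache setting during reopen");
            return -EINVAL;
        }
        return 0;
    }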
On Wed, May 18, 2016 at 4:19 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>> Updating this setting on an open image won't do anything, but if you
>> rbd_close() and rbd_open() it again the setting will take effect.
>> rbd_close() will force a flush of any pending I/O in librbd and
>> free the memory for librbd's ImageCtx, which may or may not be desired
>> here.
>
> First rbd_close() and then rbd_open() risks that the rbd_open() fails
> and we end up with no usable image at all. Can we open a second instance
> of the image first and only close the first one if that succeeded?
>
> We already flush all requests before calling this, so that part
> shouldn't make a difference.

You can open the same image twice without issue. It's perfectly fine to
rbd_open() with the new settings and upon success rbd_close() the
original image handle.
diff --git a/block/rbd.c b/block/rbd.c
index 5bc5b32..8ecf096 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -577,6 +577,19 @@ failed_opts:
     return r;
 }
 
+/* Note that this will not re-establish a connection with the Ceph cluster
+   - it is effectively a NOP. */
+static int qemu_rbd_reopen_prepare(BDRVReopenState *state,
+                                   BlockReopenQueue *queue, Error **errp)
+{
+    if (state->flags & BDRV_O_NOCACHE &&
+        ((state->bs->open_flags & BDRV_O_NOCACHE) == 0)) {
+        error_setg(errp, "Cannot turn off rbd_cache during reopen");
+        return -EINVAL;
+    }
+    return 0;
+}
+
 static void qemu_rbd_close(BlockDriverState *bs)
 {
     BDRVRBDState *s = bs->opaque;
@@ -976,6 +989,7 @@ static BlockDriver bdrv_rbd = {
     .instance_size      = sizeof(BDRVRBDState),
     .bdrv_needs_filename = true,
     .bdrv_file_open     = qemu_rbd_open,
+    .bdrv_reopen_prepare = qemu_rbd_reopen_prepare,
     .bdrv_close         = qemu_rbd_close,
     .bdrv_create        = qemu_rbd_create,
     .bdrv_has_zero_init = bdrv_has_zero_init_1,