Message ID | 1365552342-22840-1-git-send-email-josh.durgin@inktank.com |
---|---|
State | New |
Headers | show |
On Wed, Apr 10, 2013 at 2:05 AM, Josh Durgin <josh.durgin@inktank.com> wrote: > This allows the rbd block driver to detect symbols in the installed > version of librbd, and enable or disable features appropriately. This > obviates the #ifdefs regarding librbd versions. > > Loading librbd dynamically also makes the rbd block driver easier to > install and package, since it removes the dependency on librbd at > build time. > > Add structures containing the necessary function pointer signatures > and types from librbd, and fill them in the first time the rbd module > is used. Use glib's g_module interface so we don't preclude future > portability, and don't have to deal with odd dlopen behavior directly. > > Internally, librbd and some libraries it depends on use C++ templates, > which mean that they each contain a defined weak symbol for their > definition. Due to the way the linker resolves duplicate symbols, the > libraries loaded by librbd end up using the template definitions from > librbd, creating a circular dependency. This means that once librbd is > loaded, it won't be unloaded. Changing this behavior might work with a > Sun ld, but there doesn't seem to be a portable (or even working with > GNU ld) way to hide these C++ symbols correctly. Instead, never unload > librbd, and explicitly make it resident. > > Signed-off-by: Josh Durgin <josh.durgin@inktank.com> > --- > block/Makefile.objs | 3 +- > block/rbd.c | 292 ++++++++++++++++++++++++++++++++++++--------------- > block/rbd_types.h | 95 +++++++++++++++++ > configure | 41 +------- > 4 files changed, 309 insertions(+), 122 deletions(-) > create mode 100644 block/rbd_types.h NACK I think we're solving the problem at the wrong level. Writing our own dynamic linker and adding boilerplate to juggle function pointers every time we use a library dependency is ugly. There are two related problems here: 1. Packagers do not want to enable niche dependencies since users will complain that the package is bloated and pulls in too much stuff. 2. QEMU linked against a newer library version fails to run on hosts that have an older library. Problem #1 has several solutions: 1. Let packagers take care of it. For example, vim is often shipped in several packages that have various amounts of dependencies (vim-tiny, vim-gtk, etc). Packagers create the specialized packages for specific groups of users to meet their demands without dragging in too many dependencies. 2. Make QEMU modular - host devices should be shared libraries that are loaded at runtime. There should be no stable API so that development stays flexible and we discourage binary-only modules. This lets packagers easily ship a qemu-rbd package, for example, that drops in a .so file that QEMU can load at runtime. Problem #2 is already solved: The dynamic linker will refuse to load the program if there are missing symbols. It's not possible to mix and match binaries across environments while downgrading their library dependencies. With effort, this could be doable but it's not an interesting use case that many users care about - they get their binaries from a distro or build them from source with correct dependencies. Maybe it's time to move block drivers and other components into modules? Stefan
On 04/10/2013 01:09 AM, Stefan Hajnoczi wrote: > On Wed, Apr 10, 2013 at 2:05 AM, Josh Durgin <josh.durgin@inktank.com> wrote: >> This allows the rbd block driver to detect symbols in the installed >> version of librbd, and enable or disable features appropriately. This >> obviates the #ifdefs regarding librbd versions. >> >> Loading librbd dynamically also makes the rbd block driver easier to >> install and package, since it removes the dependency on librbd at >> build time. >> >> Add structures containing the necessary function pointer signatures >> and types from librbd, and fill them in the first time the rbd module >> is used. Use glib's g_module interface so we don't preclude future >> portability, and don't have to deal with odd dlopen behavior directly. >> >> Internally, librbd and some libraries it depends on use C++ templates, >> which mean that they each contain a defined weak symbol for their >> definition. Due to the way the linker resolves duplicate symbols, the >> libraries loaded by librbd end up using the template definitions from >> librbd, creating a circular dependency. This means that once librbd is >> loaded, it won't be unloaded. Changing this behavior might work with a >> Sun ld, but there doesn't seem to be a portable (or even working with >> GNU ld) way to hide these C++ symbols correctly. Instead, never unload >> librbd, and explicitly make it resident. >> >> Signed-off-by: Josh Durgin <josh.durgin@inktank.com> >> --- >> block/Makefile.objs | 3 +- >> block/rbd.c | 292 ++++++++++++++++++++++++++++++++++++--------------- >> block/rbd_types.h | 95 +++++++++++++++++ >> configure | 41 +------- >> 4 files changed, 309 insertions(+), 122 deletions(-) >> create mode 100644 block/rbd_types.h > > NACK > > I think we're solving the problem at the wrong level. Writing our own > dynamic linker and adding boilerplate to juggle function pointers > every time we use a library dependency is ugly. I wouldn't call this writing our own dynamic linker. To load a library at runtime requires some boilerplate and using function pointers, although there could be cleaner ways of setting it up. > There are two related problems here: > > 1. Packagers do not want to enable niche dependencies since users will > complain that the package is bloated and pulls in too much stuff. > > 2. QEMU linked against a newer library version fails to run on hosts > that have an older library. Related to 2, there's also a need to rebuild QEMU to be able to access new functionality in a newer version of a library than QEMU was linked against. This would be easier to do if the devices were modules themselves, since only the module that used the library would need to be rebuilt. > Problem #1 has several solutions: > > 1. Let packagers take care of it. For example, vim is often shipped > in several packages that have various amounts of dependencies > (vim-tiny, vim-gtk, etc). Packagers create the specialized packages > for specific groups of users to meet their demands without dragging in > too many dependencies. > > 2. Make QEMU modular - host devices should be shared libraries that > are loaded at runtime. There should be no stable API so that > development stays flexible and we discourage binary-only modules. > This lets packagers easily ship a qemu-rbd package, for example, that > drops in a .so file that QEMU can load at runtime. > > Problem #2 is already solved: > > The dynamic linker will refuse to load the program if there are > missing symbols. It's not possible to mix and match binaries across > environments while downgrading their library dependencies. With > effort, this could be doable but it's not an interesting use case that > many users care about - they get their binaries from a distro or build > them from source with correct dependencies. > > Maybe it's time to move block drivers and other components into modules? I think that would be great. I'm not too familiar with the other components, so I'll discuss block drivers as a first step. I could see this happening in at a few different ways. These could be incremental steps or final implementations, increasing in complexity and effort: 1) It seems like it would be possible to achieve an initial version of runtime loading with minimal driver changes by loading modules instead of using block_init(), and using the existing driver-specific block_init() methods (renamed to be the same across modules) as the only function. This would avoid the function pointer ugliness since it would be hidden by the existing BlockDriver structure. 2) The existing BlockDriver methods could become the module interface, with a couple extra functions for defining the current data members (format_name, protocol_name, instance_size, and create_options). The block layer could build up a BlockDriver for each module based on the functions it exports, detected by g_module_symbol(). 3) An entirely different module interface from what exists today in the block drivers. Thoughts? Josh
Stefan Hajnoczi <stefanha@gmail.com> writes: > On Wed, Apr 10, 2013 at 2:05 AM, Josh Durgin <josh.durgin@inktank.com> wrote: > NACK > > I think we're solving the problem at the wrong level. Writing our own > dynamic linker and adding boilerplate to juggle function pointers > every time we use a library dependency is ugly. > > There are two related problems here: > > 1. Packagers do not want to enable niche dependencies since users will > complain that the package is bloated and pulls in too much stuff. > > 2. QEMU linked against a newer library version fails to run on hosts > that have an older library. > > Problem #1 has several solutions: > > 1. Let packagers take care of it. For example, vim is often shipped > in several packages that have various amounts of dependencies > (vim-tiny, vim-gtk, etc). Packagers create the specialized packages > for specific groups of users to meet their demands without dragging in > too many dependencies. > > 2. Make QEMU modular - host devices should be shared libraries that > are loaded at runtime. There should be no stable API so that > development stays flexible and we discourage binary-only modules. > This lets packagers easily ship a qemu-rbd package, for example, that > drops in a .so file that QEMU can load at runtime. > > Problem #2 is already solved: > > The dynamic linker will refuse to load the program if there are > missing symbols. It's not possible to mix and match binaries across > environments while downgrading their library dependencies. With > effort, this could be doable but it's not an interesting use case that > many users care about - they get their binaries from a distro or build > them from source with correct dependencies. > > Maybe it's time to move block drivers and other components into > modules? This is really a build system issue more than anything else. There are no internal API changes needed. All that's needed is to something like (in module.h): /* This should not be used directly. Use block_init etc. instead. */ #ifdef CONFIG_MODULE #define module_init(function, type) \ const gchar *g_module_check_init(GModule *module) \ { \ register_module_init(function, type); \ return NULL; \ } #else #define module_init(function, type) \ static void __attribute__((constructor)) do_qemu_init_ ## function(void) { \ register_module_init(function, type); \ } #endif We then also need a way to load modules prior to calling init using the GModule interfaces. Easiest thing to do is just load all .so's in a single directory (/usr/lib/qemu/modules/*.so?) prior to calling any module init functions. What we need from the build system is the ability to build things either builtin or as modules. Paolo has a GSoC proposal to integrate kconfig. This would be a great approach to solving this problem. Doing it this way would let us build not only block drivers but also devices as modules. This would let us make QXL a module making it easier for distros to not have a hard dependence on libspice for the QEMU package. Regards, Anthony Liguori > > Stefan
Il 10/04/2013 17:08, Anthony Liguori ha scritto: > /* This should not be used directly. Use block_init etc. instead. */ > #ifdef CONFIG_MODULE > #define module_init(function, type) \ > const gchar *g_module_check_init(GModule *module) \ > { \ > register_module_init(function, type); \ > return NULL; \ > } > #else > #define module_init(function, type) \ > static void __attribute__((constructor)) do_qemu_init_ ## function(void) { \ > register_module_init(function, type); \ > } > #endif Not even that is needed. Modules can have constructor functions that use symbols in the main executable. Basically, modules would be opened with G_MODULE_BIND_LOCAL and communicate with QEMU via constructor functions only (registering driver modules or QOM types). It is really more of a build-system hacking project than anything else. Paolo > We then also need a way to load modules prior to calling init using the > GModule interfaces. Easiest thing to do is just load all .so's in a > single directory (/usr/lib/qemu/modules/*.so?) prior to calling any > module init functions.
On Wed, Apr 10, 2013 at 10:08:20AM -0500, Anthony Liguori wrote: > Stefan Hajnoczi <stefanha@gmail.com> writes: > > > On Wed, Apr 10, 2013 at 2:05 AM, Josh Durgin <josh.durgin@inktank.com> wrote: > > NACK > > > > I think we're solving the problem at the wrong level. Writing our own > > dynamic linker and adding boilerplate to juggle function pointers > > every time we use a library dependency is ugly. > > > > There are two related problems here: > > > > 1. Packagers do not want to enable niche dependencies since users will > > complain that the package is bloated and pulls in too much stuff. > > > > 2. QEMU linked against a newer library version fails to run on hosts > > that have an older library. > > > > Problem #1 has several solutions: > > > > 1. Let packagers take care of it. For example, vim is often shipped > > in several packages that have various amounts of dependencies > > (vim-tiny, vim-gtk, etc). Packagers create the specialized packages > > for specific groups of users to meet their demands without dragging in > > too many dependencies. > > > > 2. Make QEMU modular - host devices should be shared libraries that > > are loaded at runtime. There should be no stable API so that > > development stays flexible and we discourage binary-only modules. > > This lets packagers easily ship a qemu-rbd package, for example, that > > drops in a .so file that QEMU can load at runtime. > > > > Problem #2 is already solved: > > > > The dynamic linker will refuse to load the program if there are > > missing symbols. It's not possible to mix and match binaries across > > environments while downgrading their library dependencies. With > > effort, this could be doable but it's not an interesting use case that > > many users care about - they get their binaries from a distro or build > > them from source with correct dependencies. > > > > Maybe it's time to move block drivers and other components into > > modules? > > This is really a build system issue more than anything else. There are > no internal API changes needed. > > All that's needed is to something like (in module.h): > > /* This should not be used directly. Use block_init etc. instead. */ > #ifdef CONFIG_MODULE > #define module_init(function, type) \ > const gchar *g_module_check_init(GModule *module) \ > { \ > register_module_init(function, type); \ > return NULL; \ > } > #else > #define module_init(function, type) \ > static void __attribute__((constructor)) do_qemu_init_ ## function(void) { \ > register_module_init(function, type); \ > } > #endif > > We then also need a way to load modules prior to calling init using the > GModule interfaces. Easiest thing to do is just load all .so's in a > single directory (/usr/lib/qemu/modules/*.so?) prior to calling any > module init functions. > > What we need from the build system is the ability to build things either > builtin or as modules. Paolo has a GSoC proposal to integrate kconfig. > This would be a great approach to solving this problem. > > Doing it this way would let us build not only block drivers but also > devices as modules. This would let us make QXL a module making it > easier for distros to not have a hard dependence on libspice for the > QEMU package. I'd love to do this for the net subsystem as well so distros can provide the qemu-net-vde package without requiring a VDE dependency from the core QEMU package. Stefan
On Wed, Apr 10, 2013 at 11:19:03PM +0200, Paolo Bonzini wrote: > Il 10/04/2013 17:08, Anthony Liguori ha scritto: > > /* This should not be used directly. Use block_init etc. instead. */ > > #ifdef CONFIG_MODULE > > #define module_init(function, type) \ > > const gchar *g_module_check_init(GModule *module) \ > > { \ > > register_module_init(function, type); \ > > return NULL; \ > > } > > #else > > #define module_init(function, type) \ > > static void __attribute__((constructor)) do_qemu_init_ ## function(void) { \ > > register_module_init(function, type); \ > > } > > #endif > > Not even that is needed. Modules can have constructor functions that > use symbols in the main executable. > > Basically, modules would be opened with G_MODULE_BIND_LOCAL and > communicate with QEMU via constructor functions only (registering driver > modules or QOM types). > > It is really more of a build-system hacking project than anything else. Great. So it seems like we just need a function early in QEMU startup that enumerates all shared libraries in QEMU_MODULE_PATH and opens them with G_MODULE_BIND_LOCAL. Perhaps in the future we'll load modules on-demand to avoid reduce footprint, but for now we can just load all of them just like you get when you build a QEMU binary with everything included. Stefan
Hi Anthony, Stefan, Paolo, [Resurrecting an old thread, here!] On Wed, 10 Apr 2013 Anthony Liguori wrote: > Stefan Hajnoczi <stefanha@gmail.com> writes: > > NACK > > > > I think we're solving the problem at the wrong level. Writing our own > > dynamic linker and adding boilerplate to juggle function pointers > > every time we use a library dependency is ugly. > > > > There are two related problems here: > > > > 1. Packagers do not want to enable niche dependencies since users will > > complain that the package is bloated and pulls in too much stuff. > > > > 2. QEMU linked against a newer library version fails to run on hosts > > that have an older library. > > > > Problem #1 has several solutions: > > > > 1. Let packagers take care of it. For example, vim is often shipped > > in several packages that have various amounts of dependencies > > (vim-tiny, vim-gtk, etc). Packagers create the specialized packages > > for specific groups of users to meet their demands without dragging in > > too many dependencies. > > > > 2. Make QEMU modular - host devices should be shared libraries that > > are loaded at runtime. There should be no stable API so that > > development stays flexible and we discourage binary-only modules. > > This lets packagers easily ship a qemu-rbd package, for example, that > > drops in a .so file that QEMU can load at runtime. > > > > Problem #2 is already solved: > > > > The dynamic linker will refuse to load the program if there are > > missing symbols. It's not possible to mix and match binaries across > > environments while downgrading their library dependencies. With > > effort, this could be doable but it's not an interesting use case that > > many users care about - they get their binaries from a distro or build > > them from source with correct dependencies. > > > > Maybe it's time to move block drivers and other components into > > modules? > > This is really a build system issue more than anything else. There are > no internal API changes needed. > > All that's needed is to something like (in module.h): > > /* This should not be used directly. Use block_init etc. instead. */ > #ifdef CONFIG_MODULE > #define module_init(function, type) \ > const gchar *g_module_check_init(GModule *module) \ > { \ > register_module_init(function, type); \ > return NULL; \ > } > #else > #define module_init(function, type) \ > static void __attribute__((constructor)) do_qemu_init_ ## function(void) { \ > register_module_init(function, type); \ > } > #endif Has any progress been made toward a generic dynamic linking solution for block drivers? It is a conceptually simple change, but non-trivial for anyone unfamiliar with the build system. In the mean time, the inability to dynamically link librbd is continuing to cause significant pain for distro users. Would you consider merging the rbd-specific linking as an interim solution until something is generically available to qemu? sage
--On 21 June 2013 11:42:34 -0700 Sage Weil <sage@inktank.com> wrote: > Has any progress been made toward a generic dynamic linking solution for > block drivers? It is a conceptually simple change, but non-trivial for > anyone unfamiliar with the build system. > > In the mean time, the inability to dynamically link librbd is continuing > to cause significant pain for distro users. Would you consider merging > the rbd-specific linking as an interim solution until something is > generically available to qemu? FWIW I have this implemented here: https://github.com/flexiant/qemu/commits/v1.0-rbd-add-async-flush specifically this commit: https://github.com/flexiant/qemu/commit/6fa2e9c95bdaca7c814881e27f04424fb6cc 2960 This is a backport of qemu 1.5's librbd to qemu 1.0 (which is slighly confusing) but the #pragma weak technique is basically the same. I had to use a slightly different technique to Josh's original version to get it to compile with old includes and still run with a new library, as well as vice versa. It would take me only a few minutes to bring this back as a patch for 1.5 if anyone is interested. I assumed they would not be.
Sage Weil <sage@inktank.com> writes: > Hi Anthony, Stefan, Paolo, > > [Resurrecting an old thread, here!] > > On Wed, 10 Apr 2013 Anthony Liguori wrote: > > Has any progress been made toward a generic dynamic linking solution for > block drivers? It is a conceptually simple change, but non-trivial for > anyone unfamiliar with the build system. There are patches on the list in fact. > In the mean time, the inability to dynamically link librbd is continuing > to cause significant pain for distro users. Would you consider merging > the rbd-specific linking as an interim solution until something is > generically available to qemu? Regards, Anthony Liguori > > sage
diff --git a/block/Makefile.objs b/block/Makefile.objs index c067f38..2ad32f7 100644 --- a/block/Makefile.objs +++ b/block/Makefile.objs @@ -8,10 +8,9 @@ block-obj-$(CONFIG_POSIX) += raw-posix.o block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o ifeq ($(CONFIG_POSIX),y) -block-obj-y += nbd.o sheepdog.o +block-obj-y += nbd.o rbd.o sheepdog.o block-obj-$(CONFIG_LIBISCSI) += iscsi.o block-obj-$(CONFIG_CURL) += curl.o -block-obj-$(CONFIG_RBD) += rbd.o block-obj-$(CONFIG_GLUSTERFS) += gluster.o endif diff --git a/block/rbd.c b/block/rbd.c index 037d82b..0bf5c40 100644 --- a/block/rbd.c +++ b/block/rbd.c @@ -11,13 +11,13 @@ * GNU GPL, version 2 or (at your option) any later version. */ +#include <gmodule.h> #include <inttypes.h> #include "qemu-common.h" #include "qemu/error-report.h" #include "block/block_int.h" - -#include <rbd/librbd.h> +#include "rbd_types.h" /* * When specifying the image filename use: @@ -44,13 +44,6 @@ * leading "\". */ -/* rbd_aio_discard added in 0.1.2 */ -#if LIBRBD_VERSION_CODE >= LIBRBD_VERSION(0, 1, 2) -#define LIBRBD_SUPPORTS_DISCARD -#else -#undef LIBRBD_SUPPORTS_DISCARD -#endif - #define OBJ_MAX_SIZE (1UL << OBJ_DEFAULT_OBJ_ORDER) #define RBD_MAX_CONF_NAME_SIZE 128 @@ -106,6 +99,12 @@ typedef struct BDRVRBDState { RADOSCB *event_rcb; } BDRVRBDState; +static LibradosFuncs librados; +static LibrbdFuncs librbd; +static bool librbd_loaded; +static GModule *librbd_handle; + +static int qemu_rbd_load_libs(void); static void rbd_aio_bh_cb(void *opaque); static int qemu_rbd_next_tok(char *dst, int dst_len, @@ -267,7 +266,7 @@ static int qemu_rbd_set_conf(rados_t cluster, const char *conf) qemu_rbd_unescape(value); if (strcmp(name, "conf") == 0) { - ret = rados_conf_read_file(cluster, value); + ret = (*librados.rados_conf_read_file)(cluster, value); if (ret < 0) { error_report("error reading conf file %s", value); break; @@ -275,7 +274,7 @@ static int qemu_rbd_set_conf(rados_t cluster, const char *conf) } else if (strcmp(name, "id") == 0) { /* ignore, this is parsed by qemu_rbd_parse_clientname() */ } else { - ret = rados_conf_set(cluster, name, value); + ret = (*librados.rados_conf_set)(cluster, name, value); if (ret < 0) { error_report("invalid conf option %s", name); ret = -EINVAL; @@ -310,6 +309,10 @@ static int qemu_rbd_create(const char *filename, QEMUOptionParameter *options) return -EINVAL; } + if (qemu_rbd_load_libs() < 0) { + return -EIO; + } + /* Read out options */ while (options && options->name) { if (!strcmp(options->name, BLOCK_OPT_SIZE)) { @@ -332,38 +335,38 @@ static int qemu_rbd_create(const char *filename, QEMUOptionParameter *options) } clientname = qemu_rbd_parse_clientname(conf, clientname_buf); - if (rados_create(&cluster, clientname) < 0) { + if ((*librados.rados_create)(&cluster, clientname) < 0) { error_report("error initializing"); return -EIO; } if (strstr(conf, "conf=") == NULL) { /* try default location, but ignore failure */ - rados_conf_read_file(cluster, NULL); + (*librados.rados_conf_read_file)(cluster, NULL); } if (conf[0] != '\0' && qemu_rbd_set_conf(cluster, conf) < 0) { error_report("error setting config options"); - rados_shutdown(cluster); + (*librados.rados_shutdown)(cluster); return -EIO; } - if (rados_connect(cluster) < 0) { + if ((*librados.rados_connect)(cluster) < 0) { error_report("error connecting"); - rados_shutdown(cluster); + (*librados.rados_shutdown)(cluster); return -EIO; } - if (rados_ioctx_create(cluster, pool, &io_ctx) < 0) { + if ((*librados.rados_ioctx_create)(cluster, pool, &io_ctx) < 0) { error_report("error opening pool %s", pool); - rados_shutdown(cluster); + (*librados.rados_shutdown)(cluster); return -EIO; } - ret = rbd_create(io_ctx, name, bytes, &obj_order); - rados_ioctx_destroy(io_ctx); - rados_shutdown(cluster); + ret = (*librbd.rbd_create)(io_ctx, name, bytes, &obj_order); + (*librados.rados_ioctx_destroy)(io_ctx); + (*librados.rados_shutdown)(cluster); return ret; } @@ -459,8 +462,12 @@ static int qemu_rbd_open(BlockDriverState *bs, const char *filename, return -EINVAL; } + if (qemu_rbd_load_libs() < 0) { + return -EIO; + } + clientname = qemu_rbd_parse_clientname(conf, clientname_buf); - r = rados_create(&s->cluster, clientname); + r = (*librados.rados_create)(&s->cluster, clientname); if (r < 0) { error_report("error initializing"); return r; @@ -479,14 +486,14 @@ static int qemu_rbd_open(BlockDriverState *bs, const char *filename, * be set up, fall back to no caching. */ if (flags & BDRV_O_NOCACHE) { - rados_conf_set(s->cluster, "rbd_cache", "false"); + (*librados.rados_conf_set)(s->cluster, "rbd_cache", "false"); } else { - rados_conf_set(s->cluster, "rbd_cache", "true"); + (*librados.rados_conf_set)(s->cluster, "rbd_cache", "true"); } if (strstr(conf, "conf=") == NULL) { /* try default location, but ignore failure */ - rados_conf_read_file(s->cluster, NULL); + (*librados.rados_conf_read_file)(s->cluster, NULL); } if (conf[0] != '\0') { @@ -497,19 +504,19 @@ static int qemu_rbd_open(BlockDriverState *bs, const char *filename, } } - r = rados_connect(s->cluster); + r = (*librados.rados_connect)(s->cluster); if (r < 0) { error_report("error connecting"); goto failed_shutdown; } - r = rados_ioctx_create(s->cluster, pool, &s->io_ctx); + r = (*librados.rados_ioctx_create)(s->cluster, pool, &s->io_ctx); if (r < 0) { error_report("error opening pool %s", pool); goto failed_shutdown; } - r = rbd_open(s->io_ctx, s->name, &s->image, s->snap); + r = (*librbd.rbd_open)(s->io_ctx, s->name, &s->image, s->snap); if (r < 0) { error_report("error reading header from %s", s->name); goto failed_open; @@ -532,11 +539,11 @@ static int qemu_rbd_open(BlockDriverState *bs, const char *filename, return 0; failed: - rbd_close(s->image); + (*librbd.rbd_close)(s->image); failed_open: - rados_ioctx_destroy(s->io_ctx); + (*librados.rados_ioctx_destroy)(s->io_ctx); failed_shutdown: - rados_shutdown(s->cluster); + (*librados.rados_shutdown)(s->cluster); g_free(s->snap); return r; } @@ -549,10 +556,10 @@ static void qemu_rbd_close(BlockDriverState *bs) close(s->fds[1]); qemu_aio_set_fd_handler(s->fds[RBD_FD_READ], NULL, NULL, NULL, NULL); - rbd_close(s->image); - rados_ioctx_destroy(s->io_ctx); + (*librbd.rbd_close)(s->image); + (*librados.rados_ioctx_destroy)(s->io_ctx); g_free(s->snap); - rados_shutdown(s->cluster); + (*librados.rados_shutdown)(s->cluster); } /* @@ -618,8 +625,8 @@ static int qemu_rbd_send_pipe(BDRVRBDState *s, RADOSCB *rcb) static void rbd_finish_aiocb(rbd_completion_t c, RADOSCB *rcb) { int ret; - rcb->ret = rbd_aio_get_return_value(c); - rbd_aio_release(c); + rcb->ret = (*librbd.rbd_aio_get_return_value)(c); + (*librbd.rbd_aio_release)(c); ret = qemu_rbd_send_pipe(rcb->s, rcb); if (ret < 0) { error_report("failed writing to acb->s->fds"); @@ -647,28 +654,6 @@ static void rbd_aio_bh_cb(void *opaque) } } -static int rbd_aio_discard_wrapper(rbd_image_t image, - uint64_t off, - uint64_t len, - rbd_completion_t comp) -{ -#ifdef LIBRBD_SUPPORTS_DISCARD - return rbd_aio_discard(image, off, len, comp); -#else - return -ENOTSUP; -#endif -} - -static int rbd_aio_flush_wrapper(rbd_image_t image, - rbd_completion_t comp) -{ -#ifdef LIBRBD_SUPPORTS_AIO_FLUSH - return rbd_aio_flush(image, comp); -#else - return -ENOTSUP; -#endif -} - static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs, int64_t sector_num, QEMUIOVector *qiov, @@ -718,23 +703,25 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs, rcb->buf = buf; rcb->s = acb->s; rcb->size = size; - r = rbd_aio_create_completion(rcb, (rbd_callback_t) rbd_finish_aiocb, &c); + r = (*librbd.rbd_aio_create_completion)(rcb, + (rbd_callback_t) rbd_finish_aiocb, + &c); if (r < 0) { goto failed; } switch (cmd) { case RBD_AIO_WRITE: - r = rbd_aio_write(s->image, off, size, buf, c); + r = (*librbd.rbd_aio_write)(s->image, off, size, buf, c); break; case RBD_AIO_READ: - r = rbd_aio_read(s->image, off, size, buf, c); + r = (*librbd.rbd_aio_read)(s->image, off, size, buf, c); break; case RBD_AIO_DISCARD: - r = rbd_aio_discard_wrapper(s->image, off, size, c); + r = (*librbd.rbd_aio_discard)(s->image, off, size, c); break; case RBD_AIO_FLUSH: - r = rbd_aio_flush_wrapper(s->image, c); + r = (*librbd.rbd_aio_flush)(s->image, c); break; default: r = -EINVAL; @@ -784,13 +771,11 @@ static BlockDriverAIOCB *qemu_rbd_aio_flush(BlockDriverState *bs, static int qemu_rbd_co_flush(BlockDriverState *bs) { -#if LIBRBD_VERSION_CODE >= LIBRBD_VERSION(0, 1, 1) - /* rbd_flush added in 0.1.1 */ BDRVRBDState *s = bs->opaque; - return rbd_flush(s->image); -#else + if (librbd.rbd_flush) { + return (*librbd.rbd_flush)(s->image); + } return 0; -#endif } static int qemu_rbd_getinfo(BlockDriverState *bs, BlockDriverInfo *bdi) @@ -799,7 +784,7 @@ static int qemu_rbd_getinfo(BlockDriverState *bs, BlockDriverInfo *bdi) rbd_image_info_t info; int r; - r = rbd_stat(s->image, &info, sizeof(info)); + r = (*librbd.rbd_stat)(s->image, &info, sizeof(info)); if (r < 0) { return r; } @@ -814,7 +799,7 @@ static int64_t qemu_rbd_getlength(BlockDriverState *bs) rbd_image_info_t info; int r; - r = rbd_stat(s->image, &info, sizeof(info)); + r = (*librbd.rbd_stat)(s->image, &info, sizeof(info)); if (r < 0) { return r; } @@ -827,7 +812,7 @@ static int qemu_rbd_truncate(BlockDriverState *bs, int64_t offset) BDRVRBDState *s = bs->opaque; int r; - r = rbd_resize(s->image, offset); + r = (*librbd.rbd_resize)(s->image, offset); if (r < 0) { return r; } @@ -858,7 +843,7 @@ static int qemu_rbd_snap_create(BlockDriverState *bs, return -ERANGE; } - r = rbd_snap_create(s->image, sn_info->name); + r = (*librbd.rbd_snap_create)(s->image, sn_info->name); if (r < 0) { error_report("failed to create snap: %s", strerror(-r)); return r; @@ -873,7 +858,7 @@ static int qemu_rbd_snap_remove(BlockDriverState *bs, BDRVRBDState *s = bs->opaque; int r; - r = rbd_snap_remove(s->image, snapshot_name); + r = (*librbd.rbd_snap_remove)(s->image, snapshot_name); return r; } @@ -883,7 +868,7 @@ static int qemu_rbd_snap_rollback(BlockDriverState *bs, BDRVRBDState *s = bs->opaque; int r; - r = rbd_snap_rollback(s->image, snapshot_name); + r = (*librbd.rbd_snap_rollback)(s->image, snapshot_name); return r; } @@ -898,7 +883,7 @@ static int qemu_rbd_snap_list(BlockDriverState *bs, do { snaps = g_malloc(sizeof(*snaps) * max_snaps); - snap_count = rbd_snap_list(s->image, snaps, &max_snaps); + snap_count = (*librbd.rbd_snap_list)(s->image, snaps, &max_snaps); if (snap_count < 0) { g_free(snaps); } @@ -922,14 +907,13 @@ static int qemu_rbd_snap_list(BlockDriverState *bs, sn_info->date_nsec = 0; sn_info->vm_clock_nsec = 0; } - rbd_snap_list_end(snaps); + (*librbd.rbd_snap_list_end)(snaps); done: *psn_tab = sn_tab; return snap_count; } -#ifdef LIBRBD_SUPPORTS_DISCARD static BlockDriverAIOCB* qemu_rbd_aio_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors, @@ -939,7 +923,6 @@ static BlockDriverAIOCB* qemu_rbd_aio_discard(BlockDriverState *bs, return rbd_start_aio(bs, sector_num, NULL, nb_sectors, cb, opaque, RBD_AIO_DISCARD); } -#endif static QEMUOptionParameter qemu_rbd_create_options[] = { { @@ -972,11 +955,7 @@ static BlockDriver bdrv_rbd = { .bdrv_aio_flush = qemu_rbd_aio_flush, .bdrv_co_flush_to_disk = qemu_rbd_co_flush, - -#ifdef LIBRBD_SUPPORTS_DISCARD .bdrv_aio_discard = qemu_rbd_aio_discard, -#endif - .bdrv_snapshot_create = qemu_rbd_snap_create, .bdrv_snapshot_delete = qemu_rbd_snap_remove, .bdrv_snapshot_list = qemu_rbd_snap_list, @@ -988,4 +967,153 @@ static void bdrv_rbd_init(void) bdrv_register(&bdrv_rbd); } +typedef struct LibSymbol { + const char *name; + gpointer *addr; +} LibSymbol; + +static int qemu_rbd_set_functions(GModule *lib, const LibSymbol *funcs) +{ + int i = 0; + while (funcs[i].name) { + const char *name = funcs[i].name; + if (!g_module_symbol(lib, name, funcs[i].addr)) { + error_report("%s could not be loaded from librbd or librados: %s", + name, g_module_error()); + return -1; + } + ++i; + } + return 0; +} + +/* + * Set function pointers for basic librados and librbd + * functions that have always been present in these libraries. + */ +static int qemu_rbd_set_mandatory_functions(void) +{ + LibSymbol symbols[] = { + {"rados_create", + (gpointer *) &librados.rados_create}, + {"rados_connect", + (gpointer *) &librados.rados_connect}, + {"rados_shutdown", + (gpointer *) &librados.rados_shutdown}, + {"rados_conf_read_file", + (gpointer *) &librados.rados_conf_read_file}, + {"rados_conf_set", + (gpointer *) &librados.rados_conf_set}, + {"rados_ioctx_create", + (gpointer *) &librados.rados_ioctx_create}, + {"rados_ioctx_destroy", + (gpointer *) &librados.rados_ioctx_destroy}, + {"rbd_create", + (gpointer *) &librbd.rbd_create}, + {"rbd_open", + (gpointer *) &librbd.rbd_open}, + {"rbd_close", + (gpointer *) &librbd.rbd_close}, + {"rbd_resize", + (gpointer *) &librbd.rbd_resize}, + {"rbd_stat", + (gpointer *) &librbd.rbd_stat}, + {"rbd_snap_list", + (gpointer *) &librbd.rbd_snap_list}, + {"rbd_snap_list_end", + (gpointer *) &librbd.rbd_snap_list_end}, + {"rbd_snap_create", + (gpointer *) &librbd.rbd_snap_create}, + {"rbd_snap_remove", + (gpointer *) &librbd.rbd_snap_remove}, + {"rbd_snap_rollback", + (gpointer *) &librbd.rbd_snap_rollback}, + {"rbd_aio_write", + (gpointer *) &librbd.rbd_aio_write}, + {"rbd_aio_read", + (gpointer *) &librbd.rbd_aio_read}, + {"rbd_aio_create_completion", + (gpointer *) &librbd.rbd_aio_create_completion}, + {"rbd_aio_get_return_value", + (gpointer *) &librbd.rbd_aio_get_return_value}, + {"rbd_aio_release", + (gpointer *) &librbd.rbd_aio_release}, + {NULL} + }; + + if (qemu_rbd_set_functions(librbd_handle, symbols) < 0) { + return -1; + } + + return 0; +} + +/* + * Detect whether the installed version of librbd + * supports newer functionality, and enable or disable + * it appropriately in bdrv_rbd. + */ +static void qemu_rbd_set_optional_functions(void) +{ + if (g_module_symbol(librbd_handle, "rbd_flush", + (gpointer *) &librbd.rbd_flush)) { + bdrv_rbd.bdrv_aio_flush = NULL; + bdrv_rbd.bdrv_co_flush_to_disk = qemu_rbd_co_flush; + } else { + librbd.rbd_flush = NULL; + bdrv_rbd.bdrv_co_flush_to_disk = NULL; + } + + if (g_module_symbol(librbd_handle, "rbd_aio_flush", + (gpointer *) &librbd.rbd_aio_flush)) { + bdrv_rbd.bdrv_co_flush_to_disk = NULL; + bdrv_rbd.bdrv_aio_flush = qemu_rbd_aio_flush; + } else { + librbd.rbd_aio_flush = NULL; + bdrv_rbd.bdrv_aio_flush = NULL; + } + + if (g_module_symbol(librbd_handle, "rbd_aio_discard", + (gpointer *) &librbd.rbd_aio_discard)) { + bdrv_rbd.bdrv_aio_discard = qemu_rbd_aio_discard; + } else { + librbd.rbd_aio_discard = NULL; + bdrv_rbd.bdrv_aio_discard = NULL; + } +} + +static int qemu_rbd_load_libs(void) +{ + if (librbd_loaded) { + return 0; + } + + if (!g_module_supported()) { + error_report("modules are not supported on this platform: %s", + g_module_error()); + return -1; + } + + librbd_handle = g_module_open("librbd.so.1", 0); + if (!librbd_handle) { + error_report("error loading librbd: %s", g_module_error()); + return -1; + } + + /* + * Due to c++ templates used in librbd/librados and their + * dependencies, and linker duplicate trimming rules, closing + * librbd would leave it mapped. Make this explicit. + */ + g_module_make_resident(librbd_handle); + + if (qemu_rbd_set_mandatory_functions() < 0) { + return -1; + } + qemu_rbd_set_optional_functions(); + librbd_loaded = true; + + return 0; +} + block_init(bdrv_rbd_init); diff --git a/block/rbd_types.h b/block/rbd_types.h new file mode 100644 index 0000000..54df8d6 --- /dev/null +++ b/block/rbd_types.h @@ -0,0 +1,95 @@ +/* + * Types and signatures for librados and librbd + * + * Copyright (C) 2013 Inktank Storage Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, see <http://www.gnu.org/licenses/>. + */ + +#ifndef QEMU_BLOCK_RBD_TYPES_H +#define QEMU_BLOCK_RBD_TYPES_H + +/* types from librados used by the rbd block driver */ + +typedef void *rados_t; +typedef void *rados_ioctx_t; + +typedef struct LibradosFuncs { + int (*rados_create)(rados_t *cluster, const char * const id); + int (*rados_connect)(rados_t cluster); + void (*rados_shutdown)(rados_t cluster); + int (*rados_conf_read_file)(rados_t cluster, const char *path); + int (*rados_conf_set)(rados_t cluster, const char *option, + const char *value); + int (*rados_ioctx_create)(rados_t cluster, const char *pool_name, + rados_ioctx_t *ioctx); + void (*rados_ioctx_destroy)(rados_ioctx_t io); +} LibradosFuncs; + +/* types from librbd used by the rbd block driver*/ + +typedef void *rbd_image_t; +typedef void *rbd_completion_t; +typedef void (*rbd_callback_t)(rbd_completion_t cb, void *arg); + +typedef struct { + uint64_t id; + uint64_t size; + const char *name; +} rbd_snap_info_t; + +#define RBD_MAX_IMAGE_NAME_SIZE 96 +#define RBD_MAX_BLOCK_NAME_SIZE 24 + +typedef struct { + uint64_t size; + uint64_t obj_size; + uint64_t num_objs; + int order; + char block_name_prefix[RBD_MAX_BLOCK_NAME_SIZE]; + int64_t parent_pool; + char parent_name[RBD_MAX_IMAGE_NAME_SIZE]; +} rbd_image_info_t; + +typedef struct LibrbdFuncs { + int (*rbd_create)(rados_ioctx_t io, const char *name, uint64_t size, + int *order); + int (*rbd_open)(rados_ioctx_t io, const char *name, rbd_image_t *image, + const char *snap_name); + int (*rbd_close)(rbd_image_t image); + int (*rbd_resize)(rbd_image_t image, uint64_t size); + int (*rbd_stat)(rbd_image_t image, rbd_image_info_t *info, + size_t infosize); + int (*rbd_snap_list)(rbd_image_t image, rbd_snap_info_t *snaps, + int *max_snaps); + void (*rbd_snap_list_end)(rbd_snap_info_t *snaps); + int (*rbd_snap_create)(rbd_image_t image, const char *snapname); + int (*rbd_snap_remove)(rbd_image_t image, const char *snapname); + int (*rbd_snap_rollback)(rbd_image_t image, const char *snapname); + int (*rbd_aio_write)(rbd_image_t image, uint64_t off, size_t len, + const char *buf, rbd_completion_t c); + int (*rbd_aio_read)(rbd_image_t image, uint64_t off, size_t len, + char *buf, rbd_completion_t c); + int (*rbd_aio_discard)(rbd_image_t image, uint64_t off, uint64_t len, + rbd_completion_t c); + int (*rbd_aio_create_completion)(void *cb_arg, + rbd_callback_t complete_cb, + rbd_completion_t *c); + ssize_t (*rbd_aio_get_return_value)(rbd_completion_t c); + void (*rbd_aio_release)(rbd_completion_t c); + int (*rbd_flush)(rbd_image_t image); + int (*rbd_aio_flush)(rbd_image_t image, rbd_completion_t c); +} LibrbdFuncs; + +#endif diff --git a/configure b/configure index f2af714..05e062a 100755 --- a/configure +++ b/configure @@ -214,7 +214,6 @@ zero_malloc="" trace_backend="nop" trace_file="trace" spice="" -rbd="" smartcard_nss="" usb_redir="" glx="" @@ -862,10 +861,6 @@ for opt do ;; --enable-glx) glx="yes" ;; - --disable-rbd) rbd="no" - ;; - --enable-rbd) rbd="yes" - ;; --disable-xfsctl) xfs="no" ;; --enable-xfsctl) xfs="yes" @@ -1148,7 +1143,6 @@ echo " --with-trace-file=NAME Full PATH,NAME of file to store traces" echo " Default:trace-<pid>" echo " --disable-spice disable spice" echo " --enable-spice enable spice" -echo " --enable-rbd enable building the rados block device (rbd)" echo " --disable-libiscsi disable iscsi support" echo " --enable-libiscsi enable iscsi support" echo " --disable-smartcard-nss disable smartcard nss support" @@ -2195,10 +2189,10 @@ if test "$mingw32" = yes; then else glib_req_ver=2.12 fi -if $pkg_config --atleast-version=$glib_req_ver gthread-2.0 > /dev/null 2>&1 +if $pkg_config --atleast-version=$glib_req_ver gthread-2.0 gmodule-2.0 > /dev/null 2>&1 then - glib_cflags=`$pkg_config --cflags gthread-2.0 2>/dev/null` - glib_libs=`$pkg_config --libs gthread-2.0 2>/dev/null` + glib_cflags=`$pkg_config --cflags gthread-2.0 gmodule-2.0 2>/dev/null` + glib_libs=`$pkg_config --libs gthread-2.0 gmodule-2.0 2>/dev/null` LIBS="$glib_libs $LIBS" libs_qga="$glib_libs $libs_qga" else @@ -2305,31 +2299,6 @@ if test "$mingw32" != yes -a "$pthread" = no; then fi ########################################## -# rbd probe -if test "$rbd" != "no" ; then - cat > $TMPC <<EOF -#include <stdio.h> -#include <rbd/librbd.h> -int main(void) { - rados_t cluster; - rados_create(&cluster, NULL); - return 0; -} -EOF - rbd_libs="-lrbd -lrados" - if compile_prog "" "$rbd_libs" ; then - rbd=yes - libs_tools="$rbd_libs $libs_tools" - libs_softmmu="$rbd_libs $libs_softmmu" - else - if test "$rbd" = "yes" ; then - feature_not_found "rados block device" - fi - rbd=no - fi -fi - -########################################## # linux-aio probe if test "$linux_aio" != "no" ; then @@ -3427,7 +3396,6 @@ echo "vhost-net support $vhost_net" echo "Trace backend $trace_backend" echo "Trace output file $trace_file-<pid>" echo "spice support $spice ($spice_protocol_version/$spice_server_version)" -echo "rbd support $rbd" echo "xfsctl support $xfs" echo "nss used $smartcard_nss" echo "usb net redir $usb_redir" @@ -3764,9 +3732,6 @@ echo "CONFIG_UNAME_RELEASE=\"$uname_release\"" >> $config_host_mak if test "$zero_malloc" = "yes" ; then echo "CONFIG_ZERO_MALLOC=y" >> $config_host_mak fi -if test "$rbd" = "yes" ; then - echo "CONFIG_RBD=y" >> $config_host_mak -fi if test "$coroutine_backend" = "ucontext" ; then echo "CONFIG_UCONTEXT_COROUTINE=y" >> $config_host_mak
This allows the rbd block driver to detect symbols in the installed version of librbd, and enable or disable features appropriately. This obviates the #ifdefs regarding librbd versions. Loading librbd dynamically also makes the rbd block driver easier to install and package, since it removes the dependency on librbd at build time. Add structures containing the necessary function pointer signatures and types from librbd, and fill them in the first time the rbd module is used. Use glib's g_module interface so we don't preclude future portability, and don't have to deal with odd dlopen behavior directly. Internally, librbd and some libraries it depends on use C++ templates, which mean that they each contain a defined weak symbol for their definition. Due to the way the linker resolves duplicate symbols, the libraries loaded by librbd end up using the template definitions from librbd, creating a circular dependency. This means that once librbd is loaded, it won't be unloaded. Changing this behavior might work with a Sun ld, but there doesn't seem to be a portable (or even working with GNU ld) way to hide these C++ symbols correctly. Instead, never unload librbd, and explicitly make it resident. Signed-off-by: Josh Durgin <josh.durgin@inktank.com> --- block/Makefile.objs | 3 +- block/rbd.c | 292 ++++++++++++++++++++++++++++++++++++--------------- block/rbd_types.h | 95 +++++++++++++++++ configure | 41 +------- 4 files changed, 309 insertions(+), 122 deletions(-) create mode 100644 block/rbd_types.h