
ceph/rbd block driver for qemu-kvm (v8)

Message ID 20101117214223.GA31321@sir.home
State New

Commit Message

Christian Brunner Nov. 17, 2010, 9:42 p.m. UTC
Here is another update for the ceph storage driver. It includes changes
for the annotations Stefan made last week and a few more things Sage
discovered while looking over the driver again.

I really hope that this time we are not only close, but have reached
a quality that everyone is satisfied with. Of course, suggestions for
further improvements are always welcome.

Regards,
Christian


RBD is a block driver for the distributed file system Ceph
(http://ceph.newdream.net/). The driver uses librados (which
is part of the Ceph server) for direct access to the Ceph object
store and runs entirely in userspace. (Yehuda also wrote a
driver for the Linux kernel that can be used to access rbd
volumes as a block device.)
---
 Makefile.objs     |    1 +
 block/rbd.c       | 1059 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 block/rbd_types.h |   71 ++++
 configure         |   31 ++
 4 files changed, 1162 insertions(+), 0 deletions(-)
 create mode 100644 block/rbd.c
 create mode 100644 block/rbd_types.h

Comments

Stefan Hajnoczi Nov. 18, 2010, 10:48 a.m. UTC | #1
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Christian Brunner Nov. 26, 2010, 9:59 p.m. UTC | #2
Thanks for the review. What am I supposed to do now?

Christian

2010/11/18 Stefan Hajnoczi <stefanha@gmail.com>:
> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Stefan Hajnoczi Nov. 27, 2010, 7:12 a.m. UTC | #3
On Fri, Nov 26, 2010 at 9:59 PM, Christian Brunner
<c.m.brunner@gmail.com> wrote:
> Thanks for the review. What am I supposed to do now?

Kevin is the block maintainer.  His review is the next step, I have
CCed him.  After that rbd would be ready to merge.

Stefan
Kevin Wolf Nov. 29, 2010, 8:59 a.m. UTC | #4
On 27.11.2010 08:12, Stefan Hajnoczi wrote:
> On Fri, Nov 26, 2010 at 9:59 PM, Christian Brunner
> <c.m.brunner@gmail.com> wrote:
>> Thanks for the review. What am I supposed to do now?
> 
> Kevin is the block maintainer.  His review is the next step, I have
> CCed him.  After that rbd would be ready to merge.

If I don't find anything really obvious and it doesn't break the build,
I'll merge it based on your review.

Kevin
Kevin Wolf Nov. 29, 2010, 10:02 a.m. UTC | #5
On 29.11.2010 09:59, Kevin Wolf wrote:
> On 27.11.2010 08:12, Stefan Hajnoczi wrote:
>> On Fri, Nov 26, 2010 at 9:59 PM, Christian Brunner
>> <c.m.brunner@gmail.com> wrote:
>>> Thanks for the review. What am I supposed to do now?
>>
>> Kevin is the block maintainer.  His review is the next step, I have
>> CCed him.  After that rbd would be ready to merge.
> 
> If I don't find anything really obvious and it doesn't break the build,
> I'll merge it based on your review.

Which librados version is this supposed to require? My F12 one seems to
be too old; however, configure still automatically enables it (so the
build fails in the default configuration for me). I think you need to
add a version check there.

$ rpm -q ceph-devel
ceph-devel-0.20.2-1.fc12.x86_64

$ LANG=C make
  CC    block/rbd.o
block/rbd.c: In function 'rbd_register_image':
block/rbd.c:191: error: 'CEPH_OSD_TMAP_SET' undeclared (first use in
this function)
block/rbd.c:191: error: (Each undeclared identifier is reported only once
block/rbd.c:191: error: for each function it appears in.)
cc1: warnings being treated as errors
block/rbd.c: In function 'rbd_set_snapc':
block/rbd.c:468: error: implicit declaration of function
'rados_set_snap_context'
block/rbd.c:468: error: nested extern declaration of
'rados_set_snap_context'
block/rbd.c: In function 'rbd_snap_create':
block/rbd.c:844: error: implicit declaration of function
'rados_selfmanaged_snap_create'
block/rbd.c:844: error: nested extern declaration of
'rados_selfmanaged_snap_create'
make: *** [block/rbd.o] Error 1

Kevin
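A check along these lines would avoid this. The following is a minimal sketch only, not Yehuda's actual configure fix: it compiles a one-line probe against librados, using CEPH_OSD_TMAP_SET (one of the symbols the failing build above misses), and leaves rbd disabled when the compile fails.

```shell
# Probe for a sufficiently new librados before enabling CONFIG_RBD.
# Styled after qemu's configure compile tests; names are illustrative.
cat > rbd-probe.c <<EOF
#include <rados/librados.h>
int main(void) { return CEPH_OSD_TMAP_SET; }
EOF
if cc rbd-probe.c -o rbd-probe -lrados 2>/dev/null; then
    rbd=yes
else
    # headers missing or too old (e.g. ceph 0.20.x): leave rbd disabled
    rbd=no
fi
rm -f rbd-probe rbd-probe.c
echo "rbd support       $rbd"
```

With an old ceph-devel such as the 0.20.2 package above, the probe fails to compile and configure would report "rbd support no" instead of breaking the build.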
Kevin Wolf Dec. 6, 2010, 12:48 p.m. UTC | #6
On 17.11.2010 22:42, Christian Brunner wrote:
> Here is another update for the ceph storage driver. It includes changes
> for the annotations Stefan made last week and a bit more things Sage
> discovered while looking over the driver again.
> 
> I really hope that this time we are not only close, but have reached
> a quality that everyone is satisfied with. - Of course suggestions for 
> further improvements are always welcome.
> 
> Regards,
> Christian
> 
> 
> RBD is an block driver for the distributed file system Ceph
> (http://ceph.newdream.net/). This driver uses librados (which
> is part of the Ceph server) for direct access to the Ceph object
> store and is running entirely in userspace (Yehuda also
> wrote a driver for the linux kernel, that can be used to access
> rbd volumes as a block device).
> ---
>  Makefile.objs     |    1 +
>  block/rbd.c       | 1059 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  block/rbd_types.h |   71 ++++
>  configure         |   31 ++
>  4 files changed, 1162 insertions(+), 0 deletions(-)
>  create mode 100644 block/rbd.c
>  create mode 100644 block/rbd_types.h

This lacks a Signed-off-by. Please merge Yehuda's fix for configure when
you resend the patch.

What's the easiest way to try it out? I tried to use vstart.sh and copy
the generated ceph.conf to /etc/ceph/ceph.conf so that qemu-img etc.
find the monitor address. However, that leads to a hang when I try "rbd
list" or "./qemu-img create -f rbd rbd:data/test.img 4G", so I seem to
be missing something.

The only thing I have achieved so far with my attempts at trying it
out (and trying wrong things, of course) is that I stumbled over the
following segfault in librados:

Program received signal SIGSEGV, Segmentation fault.
Objecter::shutdown (this=0x0) at osdc/Objecter.cc:59
59        assert(client_lock.is_locked());  // otherwise event
cancellation is unsafe
(gdb) bt
#0  Objecter::shutdown (this=0x0) at osdc/Objecter.cc:59
#1  0x00007ffff7ca5ce4 in RadosClient::shutdown (this=0xa58a90) at
librados.cc:392
#2  0x00007ffff7ca8ccc in rados_deinitialize () at librados.cc:1770
#3  0x000000000043150c in rbd_create (filename=<value optimized out>,
options=<value optimized out>) at block/rbd.c:304
#4  0x0000000000405f10 in img_create (argc=5, argv=0x7fffffffde80) at
qemu-img.c:409
#5  0x0000003c9f01eb1d in __libc_start_main () from /lib64/libc.so.6
#6  0x0000000000403999 in _start ()

Kevin
Yehuda Sadeh Weinraub Dec. 6, 2010, 7:22 p.m. UTC | #7
On Mon, Dec 6, 2010 at 4:48 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>
>
> What's the easiest way to try it out? I tried to use vstart.sh and copy
> the generated ceph.conf to /etc/ceph/ceph.conf so that qemu-img etc.
> find the monitor address. However, that leads to a hang when I try "rbd
> list" or "./qemu-img create -f rbd rbd:data/test.img 4G", so I seem to
> be missing something.

What ceph version are you running? Is your system up? What's the 'ceph
-s' output?

>
> The only thing I have achieved until now with my attempts of trying it
> out (and trying wrong things, of course) is that I stumbled over the the
> following segfault in librados:
>
> Program received signal SIGSEGV, Segmentation fault.
> Objecter::shutdown (this=0x0) at osdc/Objecter.cc:59
> 59        assert(client_lock.is_locked());  // otherwise event
> cancellation is unsafe
> (gdb) bt
> #0  Objecter::shutdown (this=0x0) at osdc/Objecter.cc:59
> #1  0x00007ffff7ca5ce4 in RadosClient::shutdown (this=0xa58a90) at
> librados.cc:392
> #2  0x00007ffff7ca8ccc in rados_deinitialize () at librados.cc:1770
> #3  0x000000000043150c in rbd_create (filename=<value optimized out>,
> options=<value optimized out>) at block/rbd.c:304
> #4  0x0000000000405f10 in img_create (argc=5, argv=0x7fffffffde80) at
> qemu-img.c:409
> #5  0x0000003c9f01eb1d in __libc_start_main () from /lib64/libc.so.6
> #6  0x0000000000403999 in _start ()
>

This was a bug in the librados C interface. Basically it ignored
errors during initialization, and later, when it tried to clean up
after a failed operation (since init had failed), it crashed. I pushed
a fix for that to the ceph rc branch (and also to the unstable branch).

The question is still why it failed to initialize in the first place.
Were there any other messages printed? It could be that it still
couldn't find the monitors, or that it failed to authenticate for some
reason (if cephx was being used). You can turn on logging for several
ceph modules by adding the following to your ceph.conf in the global
section:

    debug ms = 1
    debug rados = 20
    debug monc = 10
    debug objecter = 10
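For illustration, in a minimal ceph.conf these switches sit in the global section like so (the `auth supported` line is just an example of an existing setting there):

```ini
[global]
        auth supported = none
        debug ms = 1
        debug rados = 20
        debug monc = 10
        debug objecter = 10
```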

If everything seems OK and it still doesn't work, you can try running
the rbd utility:

  $ ./rbd create test.img --size=4096 -p data

and you can add '--debug-ms=1 --debug-rados=20 --debug-...' to the
command line too.

Let us know if you still have any problems.

Thanks,
Yehuda
Christian Brunner Dec. 6, 2010, 8:15 p.m. UTC | #8
2010/12/6 Kevin Wolf <kwolf@redhat.com>:

Hi Kevin,

> This lacks a Signed-off-by. Please merge Yehuda's fix for configure when
> you resend the patch.

I've sent an updated patch.

> What's the easiest way to try it out? I tried to use vstart.sh and copy
> the generated ceph.conf to /etc/ceph/ceph.conf so that qemu-img etc.
> find the monitor address. However, that leads to a hang when I try "rbd
> list" or "./qemu-img create -f rbd rbd:data/test.img 4G", so I seem to
> be missing something.

The simplest ceph.conf I can think of is the following:

[global]
        auth supported = none

[mon]
        mon data = /ceph/mon$id

[mon0]
        host = {hostname}
        mon addr = 127.0.0.1:6789

[osd]
        osd data = /ceph/osd$id

[osd0]
        host = {hostname}
        btrfs devs = {devicename}


Replace {hostname} with your `hostname -s` and {devicename} with the
name of an empty volume. Create a directory for the monitor and a
mountpoint for the osd volume:

# mkdir -p /ceph/mon0
# mkdir -p /ceph/osd0

After you have created the ceph.conf file, you can create your
ceph-filesystem with the following command (attention - this will
format the configured volume):

# mkcephfs -c /etc/ceph/ceph.conf --mkbtrfs -a

Now you should be able to start ceph (assuming you are using the Red Hat RPM):

# service ceph start

Check if ceph is running with `ceph -w` or `rados df`. `qemu-img
create -f rbd rbd:data/test.img 4G` should work now, too.
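Condensed into one script, the sequence looks roughly like this. It is an illustrative sketch only: it uses /tmp/ceph instead of /ceph so it can run unprivileged, and it skips the cluster commands when the ceph tools are not installed.

```shell
# Sketch of the setup sequence described above.
# Paths are illustrative; match them to your ceph.conf.
mkdir -p /tmp/ceph/mon0 /tmp/ceph/osd0     # monitor dir and osd mountpoint

if command -v mkcephfs >/dev/null 2>&1; then
    # CAUTION: --mkbtrfs formats the configured btrfs volume
    mkcephfs -c /etc/ceph/ceph.conf --mkbtrfs -a
    service ceph start
    rados df                               # sanity check: is the cluster up?
    qemu-img create -f rbd rbd:data/test.img 4G
else
    echo "ceph tools not installed, skipping cluster setup"
fi
```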

Regards
Christian

Patch

diff --git a/Makefile.objs b/Makefile.objs
index 6ee077c..56a13c1 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -19,6 +19,7 @@  block-nested-y += parallels.o nbd.o blkdebug.o
 block-nested-$(CONFIG_WIN32) += raw-win32.o
 block-nested-$(CONFIG_POSIX) += raw-posix.o
 block-nested-$(CONFIG_CURL) += curl.o
+block-nested-$(CONFIG_RBD) += rbd.o
 
 block-obj-y +=  $(addprefix block/, $(block-nested-y))
 
diff --git a/block/rbd.c b/block/rbd.c
new file mode 100644
index 0000000..249a590
--- /dev/null
+++ b/block/rbd.c
@@ -0,0 +1,1059 @@ 
+/*
+ * QEMU Block driver for RADOS (Ceph)
+ *
+ * Copyright (C) 2010 Christian Brunner <chb@muc.de>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include "qemu-error.h"
+
+#include "rbd_types.h"
+#include "block_int.h"
+
+#include <rados/librados.h>
+
+
+
+/*
+ * When specifying the image filename use:
+ *
+ * rbd:poolname/devicename
+ *
+ * poolname must be the name of an existing rados pool
+ *
+ * devicename is the basename for all objects used to
+ * emulate the raw device.
+ *
+ * Metadata information (image size, ...) is stored in an
+ * object with the name "devicename.rbd".
+ *
+ * The raw device is split into 4MB sized objects by default.
+ * The sequencenumber is encoded in a 12 byte long hex-string,
+ * and is attached to the devicename, separated by a dot.
+ * e.g. "devicename.1234567890ab"
+ *
+ */
+
+#define OBJ_MAX_SIZE (1UL << OBJ_DEFAULT_OBJ_ORDER)
+
+typedef struct RBDAIOCB {
+    BlockDriverAIOCB common;
+    QEMUBH *bh;
+    int ret;
+    QEMUIOVector *qiov;
+    char *bounce;
+    int write;
+    int64_t sector_num;
+    int aiocnt;
+    int error;
+    struct BDRVRBDState *s;
+    int cancelled;
+} RBDAIOCB;
+
+typedef struct RADOSCB {
+    int rcbid;
+    RBDAIOCB *acb;
+    struct BDRVRBDState *s;
+    int done;
+    int64_t segsize;
+    char *buf;
+    int ret;
+} RADOSCB;
+
+#define RBD_FD_READ 0
+#define RBD_FD_WRITE 1
+
+typedef struct BDRVRBDState {
+    int fds[2];
+    rados_pool_t pool;
+    rados_pool_t header_pool;
+    char name[RBD_MAX_OBJ_NAME_SIZE];
+    char block_name[RBD_MAX_BLOCK_NAME_SIZE];
+    uint64_t size;
+    uint64_t objsize;
+    int qemu_aio_count;
+    int event_reader_pos;
+    RADOSCB *event_rcb;
+} BDRVRBDState;
+
+typedef struct rbd_obj_header_ondisk RbdHeader1;
+
+static void rbd_aio_bh_cb(void *opaque);
+
+static int rbd_next_tok(char *dst, int dst_len,
+                        char *src, char delim,
+                        const char *name,
+                        char **p)
+{
+    int l;
+    char *end;
+
+    *p = NULL;
+
+    if (delim != '\0') {
+        end = strchr(src, delim);
+        if (end) {
+            *p = end + 1;
+            *end = '\0';
+        }
+    }
+    l = strlen(src);
+    if (l >= dst_len) {
+        error_report("%s too long", name);
+        return -EINVAL;
+    } else if (l == 0) {
+        error_report("%s too short", name);
+        return -EINVAL;
+    }
+
+    pstrcpy(dst, dst_len, src);
+
+    return 0;
+}
+
+static int rbd_parsename(const char *filename,
+                         char *pool, int pool_len,
+                         char *snap, int snap_len,
+                         char *name, int name_len)
+{
+    const char *start;
+    char *p, *buf;
+    int ret;
+
+    if (!strstart(filename, "rbd:", &start)) {
+        return -EINVAL;
+    }
+
+    buf = qemu_strdup(start);
+    p = buf;
+
+    ret = rbd_next_tok(pool, pool_len, p, '/', "pool name", &p);
+    if (ret < 0 || !p) {
+        ret = -EINVAL;
+        goto done;
+    }
+    ret = rbd_next_tok(name, name_len, p, '@', "object name", &p);
+    if (ret < 0) {
+        goto done;
+    }
+    if (!p) {
+        *snap = '\0';
+        goto done;
+    }
+
+    ret = rbd_next_tok(snap, snap_len, p, '\0', "snap name", &p);
+
+done:
+    qemu_free(buf);
+    return ret;
+}
+
+static int create_tmap_op(uint8_t op, const char *name, char **tmap_desc)
+{
+    uint32_t len = strlen(name);
+    uint32_t len_le = cpu_to_le32(len);
+    /* total_len = encoding op + name + empty buffer */
+    uint32_t total_len = 1 + (sizeof(uint32_t) + len) + sizeof(uint32_t);
+    uint8_t *desc = NULL;
+
+    desc = qemu_malloc(total_len);
+
+    *tmap_desc = (char *)desc;
+
+    *desc = op;
+    desc++;
+    memcpy(desc, &len_le, sizeof(len_le));
+    desc += sizeof(len_le);
+    memcpy(desc, name, len);
+    desc += len;
+    len = 0; /* no need for endian conversion for 0 */
+    memcpy(desc, &len, sizeof(len));
+    desc += sizeof(len);
+
+    return (char *)desc - *tmap_desc;
+}
+
+static void free_tmap_op(char *tmap_desc)
+{
+    qemu_free(tmap_desc);
+}
+
+static int rbd_register_image(rados_pool_t pool, const char *name)
+{
+    char *tmap_desc;
+    const char *dir = RBD_DIRECTORY;
+    int ret;
+
+    ret = create_tmap_op(CEPH_OSD_TMAP_SET, name, &tmap_desc);
+    if (ret < 0) {
+        return ret;
+    }
+
+    ret = rados_tmap_update(pool, dir, tmap_desc, ret);
+    free_tmap_op(tmap_desc);
+
+    return ret;
+}
+
+static int touch_rbd_info(rados_pool_t pool, const char *info_oid)
+{
+    int r = rados_write(pool, info_oid, 0, NULL, 0);
+    if (r < 0) {
+        return r;
+    }
+    return 0;
+}
+
+static int rbd_assign_bid(rados_pool_t pool, uint64_t *id)
+{
+    uint64_t out[1];
+    const char *info_oid = RBD_INFO;
+
+    *id = 0;
+
+    int r = touch_rbd_info(pool, info_oid);
+    if (r < 0) {
+        return r;
+    }
+
+    r = rados_exec(pool, info_oid, "rbd", "assign_bid", NULL,
+                   0, (char *)out, sizeof(out));
+    if (r < 0) {
+        return r;
+    }
+
+    le64_to_cpus(out);
+    *id = out[0];
+
+    return 0;
+}
+
+static int rbd_create(const char *filename, QEMUOptionParameter *options)
+{
+    int64_t bytes = 0;
+    int64_t objsize;
+    uint64_t size;
+    time_t mtime;
+    uint8_t obj_order = RBD_DEFAULT_OBJ_ORDER;
+    char pool[RBD_MAX_SEG_NAME_SIZE];
+    char n[RBD_MAX_SEG_NAME_SIZE];
+    char name[RBD_MAX_OBJ_NAME_SIZE];
+    char snap_buf[RBD_MAX_SEG_NAME_SIZE];
+    char *snap = NULL;
+    RbdHeader1 header;
+    rados_pool_t p;
+    uint64_t bid;
+    uint32_t hi, lo;
+    int ret;
+
+    if (rbd_parsename(filename,
+                      pool, sizeof(pool),
+                      snap_buf, sizeof(snap_buf),
+                      name, sizeof(name)) < 0) {
+        return -EINVAL;
+    }
+    if (snap_buf[0] != '\0') {
+        snap = snap_buf;
+    }
+
+    snprintf(n, sizeof(n), "%s%s", name, RBD_SUFFIX);
+
+    /* Read out options */
+    while (options && options->name) {
+        if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
+            bytes = options->value.n;
+        } else if (!strcmp(options->name, BLOCK_OPT_CLUSTER_SIZE)) {
+            if (options->value.n) {
+                objsize = options->value.n;
+                if ((objsize - 1) & objsize) {    /* not a power of 2? */
+                    error_report("obj size needs to be power of 2");
+                    return -EINVAL;
+                }
+                if (objsize < 4096) {
+                    error_report("obj size too small");
+                    return -EINVAL;
+                }
+                obj_order = ffs(objsize) - 1;
+            }
+        }
+        options++;
+    }
+
+    memset(&header, 0, sizeof(header));
+    pstrcpy(header.text, sizeof(header.text), RBD_HEADER_TEXT);
+    pstrcpy(header.signature, sizeof(header.signature), RBD_HEADER_SIGNATURE);
+    pstrcpy(header.version, sizeof(header.version), RBD_HEADER_VERSION);
+    header.image_size = cpu_to_le64(bytes);
+    header.options.order = obj_order;
+    header.options.crypt_type = RBD_CRYPT_NONE;
+    header.options.comp_type = RBD_COMP_NONE;
+    header.snap_seq = 0;
+    header.snap_count = 0;
+
+    if (rados_initialize(0, NULL) < 0) {
+        error_report("error initializing");
+        return -EIO;
+    }
+
+    if (rados_open_pool(pool, &p)) {
+        error_report("error opening pool %s", pool);
+        rados_deinitialize();
+        return -EIO;
+    }
+
+    /* check for existing rbd header file */
+    ret = rados_stat(p, n, &size, &mtime);
+    if (ret == 0) {
+        ret = -EEXIST;
+        goto done;
+    }
+
+    ret = rbd_assign_bid(p, &bid);
+    if (ret < 0) {
+        error_report("failed assigning block id");
+        rados_deinitialize();
+        return -EIO;
+    }
+    hi = bid >> 32;
+    lo = bid & 0xFFFFFFFF;
+    snprintf(header.block_name, sizeof(header.block_name), "rb.%x.%x", hi, lo);
+
+    /* create header file */
+    ret = rados_write(p, n, 0, (const char *)&header, sizeof(header));
+    if (ret < 0) {
+        goto done;
+    }
+
+    ret = rbd_register_image(p, name);
+done:
+    rados_close_pool(p);
+    rados_deinitialize();
+
+    return ret;
+}
+
+/*
+ * This aio completion is being called from rbd_aio_event_reader() and
+ * runs in qemu context. It schedules a bh, but just in case the aio
+ * was not cancelled before.
+ */
+static void rbd_complete_aio(RADOSCB *rcb)
+{
+    RBDAIOCB *acb = rcb->acb;
+    int64_t r;
+
+    acb->aiocnt--;
+
+    if (acb->cancelled) {
+        if (!acb->aiocnt) {
+            qemu_vfree(acb->bounce);
+            qemu_aio_release(acb);
+        }
+        goto done;
+    }
+
+    r = rcb->ret;
+
+    if (acb->write) {
+        if (r < 0) {
+            acb->ret = r;
+            acb->error = 1;
+        } else if (!acb->error) {
+            acb->ret += rcb->segsize;
+        }
+    } else {
+        if (r == -ENOENT) {
+            memset(rcb->buf, 0, rcb->segsize);
+            if (!acb->error) {
+                acb->ret += rcb->segsize;
+            }
+        } else if (r < 0) {
+            memset(rcb->buf, 0, rcb->segsize);
+            acb->ret = r;
+            acb->error = 1;
+        } else if (r < rcb->segsize) {
+            memset(rcb->buf + r, 0, rcb->segsize - r);
+            if (!acb->error) {
+                acb->ret += rcb->segsize;
+            }
+        } else if (!acb->error) {
+            acb->ret += r;
+        }
+    }
+    /* Note that acb->bh can be NULL in case where the aio was cancelled */
+    if (!acb->aiocnt) {
+        acb->bh = qemu_bh_new(rbd_aio_bh_cb, acb);
+        qemu_bh_schedule(acb->bh);
+    }
+done:
+    qemu_free(rcb);
+}
+
+/*
+ * aio fd read handler. It runs in the qemu context and calls the
+ * completion handling of completed rados aio operations.
+ */
+static void rbd_aio_event_reader(void *opaque)
+{
+    BDRVRBDState *s = opaque;
+
+    ssize_t ret;
+
+    do {
+        char *p = (char *)&s->event_rcb;
+
+        /* now read the rcb pointer that was sent from a non qemu thread */
+        if ((ret = read(s->fds[RBD_FD_READ], p + s->event_reader_pos,
+                        sizeof(s->event_rcb) - s->event_reader_pos)) > 0) {
+            if (ret > 0) {
+                s->event_reader_pos += ret;
+                if (s->event_reader_pos == sizeof(s->event_rcb)) {
+                    s->event_reader_pos = 0;
+                    rbd_complete_aio(s->event_rcb);
+                    s->qemu_aio_count--;
+                }
+            }
+        }
+    } while (ret < 0 && errno == EINTR);
+}
+
+static int rbd_aio_flush_cb(void *opaque)
+{
+    BDRVRBDState *s = opaque;
+
+    return (s->qemu_aio_count > 0);
+}
+
+
+static int rbd_set_snapc(rados_pool_t pool, const char *snap, RbdHeader1 *header)
+{
+    uint32_t snap_count = le32_to_cpu(header->snap_count);
+    rados_snap_t *snaps = NULL;
+    rados_snap_t seq;
+    uint32_t i;
+    uint64_t snap_names_len = le64_to_cpu(header->snap_names_len);
+    int r;
+    rados_snap_t snapid = 0;
+
+    if (snap_count) {
+        const char *header_snap = (const char *)&header->snaps[snap_count];
+        const char *end = header_snap + snap_names_len;
+        snaps = qemu_malloc(sizeof(rados_snap_t) * header->snap_count);
+
+        for (i=0; i < snap_count; i++) {
+            snaps[i] = le64_to_cpu(header->snaps[i].id);
+
+            if (snap && strcmp(snap, header_snap) == 0) {
+                snapid = snaps[i];
+            }
+
+            header_snap += strlen(header_snap) + 1;
+            if (header_snap > end) {
+                error_report("bad header, snapshot list broken");
+            }
+        }
+    }
+
+    if (snap && !snapid) {
+        error_report("snapshot not found");
+        qemu_free(snaps);
+        return -ENOENT;
+    }
+    seq = le32_to_cpu(header->snap_seq);
+
+    r = rados_set_snap_context(pool, seq, snaps, snap_count);
+
+    rados_set_snap(pool, snapid);
+
+    qemu_free(snaps);
+
+    return r;
+}
+
+#define BUF_READ_START_LEN    4096
+
+static int rbd_read_header(BDRVRBDState *s, char **hbuf)
+{
+    char *buf = NULL;
+    char n[RBD_MAX_SEG_NAME_SIZE];
+    uint64_t len = BUF_READ_START_LEN;
+    int r;
+
+    snprintf(n, sizeof(n), "%s%s", s->name, RBD_SUFFIX);
+
+    buf = qemu_malloc(len);
+
+    r = rados_read(s->header_pool, n, 0, buf, len);
+    if (r < 0) {
+        goto failed;
+    }
+
+    if (r < len) {
+        goto done;
+    }
+
+    qemu_free(buf);
+    buf = qemu_malloc(len);
+
+    r = rados_stat(s->header_pool, n, &len, NULL);
+    if (r < 0) {
+        goto failed;
+    }
+
+    r = rados_read(s->header_pool, n, 0, buf, len);
+    if (r < 0) {
+        goto failed;
+    }
+
+done:
+    *hbuf = buf;
+    return 0;
+
+failed:
+    qemu_free(buf);
+    return r;
+}
+
+static int rbd_open(BlockDriverState *bs, const char *filename, int flags)
+{
+    BDRVRBDState *s = bs->opaque;
+    RbdHeader1 *header;
+    char pool[RBD_MAX_SEG_NAME_SIZE];
+    char snap_buf[RBD_MAX_SEG_NAME_SIZE];
+    char *snap = NULL;
+    char *hbuf = NULL;
+    int r;
+
+    if (rbd_parsename(filename, pool, sizeof(pool),
+                      snap_buf, sizeof(snap_buf),
+                      s->name, sizeof(s->name)) < 0) {
+        return -EINVAL;
+    }
+    if (snap_buf[0] != '\0') {
+        snap = snap_buf;
+    }
+
+    if ((r = rados_initialize(0, NULL)) < 0) {
+        error_report("error initializing");
+        return r;
+    }
+
+    if ((r = rados_open_pool(pool, &s->pool))) {
+        error_report("error opening pool %s", pool);
+        rados_deinitialize();
+        return r;
+    }
+
+    if ((r = rados_open_pool(pool, &s->header_pool))) {
+        error_report("error opening pool %s", pool);
+        rados_deinitialize();
+        return r;
+    }
+
+    if ((r = rbd_read_header(s, &hbuf)) < 0) {
+        error_report("error reading header from %s", s->name);
+        goto failed;
+    }
+
+    if (memcmp(hbuf + 64, RBD_HEADER_SIGNATURE, 4)) {
+        error_report("Invalid header signature");
+        r = -EMEDIUMTYPE;
+        goto failed;
+    }
+
+    if (memcmp(hbuf + 68, RBD_HEADER_VERSION, 8)) {
+        error_report("Unknown image version");
+        r = -EMEDIUMTYPE;
+        goto failed;
+    }
+
+    header = (RbdHeader1 *) hbuf;
+    s->size = le64_to_cpu(header->image_size);
+    s->objsize = 1ULL << header->options.order;
+    memcpy(s->block_name, header->block_name, sizeof(header->block_name));
+
+    r = rbd_set_snapc(s->pool, snap, header);
+    if (r < 0) {
+        error_report("failed setting snap context: %s", strerror(-r));
+        goto failed;
+    }
+
+    bs->read_only = (snap != NULL);
+
+    s->event_reader_pos = 0;
+    r = qemu_pipe(s->fds);
+    if (r < 0) {
+        error_report("error opening eventfd");
+        goto failed;
+    }
+    fcntl(s->fds[0], F_SETFL, O_NONBLOCK);
+    fcntl(s->fds[1], F_SETFL, O_NONBLOCK);
+    qemu_aio_set_fd_handler(s->fds[RBD_FD_READ], rbd_aio_event_reader, NULL,
+        rbd_aio_flush_cb, NULL, s);
+
+    qemu_free(hbuf);
+
+    return 0;
+
+failed:
+    qemu_free(hbuf);
+
+    rados_close_pool(s->header_pool);
+    rados_close_pool(s->pool);
+    rados_deinitialize();
+    return r;
+}
+
+static void rbd_close(BlockDriverState *bs)
+{
+    BDRVRBDState *s = bs->opaque;
+
+    qemu_aio_set_fd_handler(s->fds[RBD_FD_READ], NULL, NULL, NULL, NULL,
+        NULL);
+    close(s->fds[0]);
+    close(s->fds[1]);
+
+    rados_close_pool(s->header_pool);
+    rados_close_pool(s->pool);
+    rados_deinitialize();
+}
+
+/*
+ * Cancel aio. Since we don't reference acb in a non qemu threads,
+ * it is safe to access it here.
+ */
+static void rbd_aio_cancel(BlockDriverAIOCB *blockacb)
+{
+    RBDAIOCB *acb = (RBDAIOCB *) blockacb;
+    acb->cancelled = 1;
+}
+
+static AIOPool rbd_aio_pool = {
+    .aiocb_size = sizeof(RBDAIOCB),
+    .cancel = rbd_aio_cancel,
+};
+
+/*
+ * This is the callback function for rados_aio_read and _write
+ *
+ * Note: this function is being called from a non qemu thread so
+ * we need to be careful about what we do here. Generally we only
+ * write to the block notification pipe, and do the rest of the
+ * io completion handling from rbd_aio_event_reader() which
+ * runs in a qemu context.
+ */
+static void rbd_finish_aiocb(rados_completion_t c, RADOSCB *rcb)
+{
+    int ret;
+    rcb->ret = rados_aio_get_return_value(c);
+    rados_aio_release(c);
+    while (1) {
+        fd_set wfd;
+        int fd = rcb->s->fds[RBD_FD_WRITE];
+
+        /* send the rcb pointer to the qemu thread that is responsible
+           for the aio completion. Must do it in a qemu thread context */
+        ret = write(fd, (void *)&rcb, sizeof(rcb));
+        if (ret >= 0) {
+            break;
+        }
+        if (errno == EINTR) {
+            continue;
+        }
+        if (errno != EAGAIN) {
+            break;
+        }
+
+        FD_ZERO(&wfd);
+        FD_SET(fd, &wfd);
+        do {
+            ret = select(fd + 1, NULL, &wfd, NULL, NULL);
+        } while (ret < 0 && errno == EINTR);
+    }
+
+    if (ret < 0) {
+        error_report("failed writing to acb->s->fds");
+        qemu_free(rcb);
+    }
+}
+
+/* Callback when all queued rados_aio requests are complete */
+
+static void rbd_aio_bh_cb(void *opaque)
+{
+    RBDAIOCB *acb = opaque;
+
+    if (!acb->write) {
+        qemu_iovec_from_buffer(acb->qiov, acb->bounce, acb->qiov->size);
+    }
+    qemu_vfree(acb->bounce);
+    acb->common.cb(acb->common.opaque, (acb->ret > 0 ? 0 : acb->ret));
+    qemu_bh_delete(acb->bh);
+    acb->bh = NULL;
+
+    qemu_aio_release(acb);
+}
+
+static BlockDriverAIOCB *rbd_aio_rw_vector(BlockDriverState *bs,
+                                           int64_t sector_num,
+                                           QEMUIOVector *qiov,
+                                           int nb_sectors,
+                                           BlockDriverCompletionFunc *cb,
+                                           void *opaque, int write)
+{
+    RBDAIOCB *acb;
+    RADOSCB *rcb;
+    rados_completion_t c;
+    char n[RBD_MAX_SEG_NAME_SIZE];
+    int64_t segnr, segoffs, segsize, last_segnr;
+    int64_t off, size;
+    char *buf;
+
+    BDRVRBDState *s = bs->opaque;
+
+    acb = qemu_aio_get(&rbd_aio_pool, bs, cb, opaque);
+    acb->write = write;
+    acb->qiov = qiov;
+    acb->bounce = qemu_blockalign(bs, qiov->size);
+    acb->aiocnt = 0;
+    acb->ret = 0;
+    acb->error = 0;
+    acb->s = s;
+    acb->cancelled = 0;
+    acb->bh = NULL;
+
+    if (write) {
+        qemu_iovec_to_buffer(acb->qiov, acb->bounce);
+    }
+
+    buf = acb->bounce;
+
+    off = sector_num * BDRV_SECTOR_SIZE;
+    size = nb_sectors * BDRV_SECTOR_SIZE;
+    segnr = off / s->objsize;
+    segoffs = off % s->objsize;
+    segsize = s->objsize - segoffs;
+
+    last_segnr = ((off + size - 1) / s->objsize);
+    acb->aiocnt = (last_segnr - segnr) + 1;
+
+    s->qemu_aio_count += acb->aiocnt; /* All the RADOSCB */
+
+    while (size > 0) {
+        if (size < segsize) {
+            segsize = size;
+        }
+
+        snprintf(n, sizeof(n), "%s.%012" PRIx64, s->block_name,
+                 segnr);
+
+        rcb = qemu_malloc(sizeof(RADOSCB));
+        rcb->done = 0;
+        rcb->acb = acb;
+        rcb->segsize = segsize;
+        rcb->buf = buf;
+        rcb->s = acb->s;
+
+        if (write) {
+            rados_aio_create_completion(rcb, NULL,
+                                        (rados_callback_t) rbd_finish_aiocb,
+                                        &c);
+            rados_aio_write(s->pool, n, segoffs, buf, segsize, c);
+        } else {
+            rados_aio_create_completion(rcb,
+                                        (rados_callback_t) rbd_finish_aiocb,
+                                        NULL, &c);
+            rados_aio_read(s->pool, n, segoffs, buf, segsize, c);
+        }
+
+        buf += segsize;
+        size -= segsize;
+        segoffs = 0;
+        segsize = s->objsize;
+        segnr++;
+    }
+
+    return &acb->common;
+}
+
+static BlockDriverAIOCB *rbd_aio_readv(BlockDriverState * bs,
+                                       int64_t sector_num, QEMUIOVector * qiov,
+                                       int nb_sectors,
+                                       BlockDriverCompletionFunc * cb,
+                                       void *opaque)
+{
+    return rbd_aio_rw_vector(bs, sector_num, qiov, nb_sectors, cb, opaque, 0);
+}
+
+static BlockDriverAIOCB *rbd_aio_writev(BlockDriverState *bs,
+                                        int64_t sector_num, QEMUIOVector *qiov,
+                                        int nb_sectors,
+                                        BlockDriverCompletionFunc *cb,
+                                        void *opaque)
+{
+    return rbd_aio_rw_vector(bs, sector_num, qiov, nb_sectors, cb, opaque, 1);
+}
+
+static int rbd_getinfo(BlockDriverState *bs, BlockDriverInfo *bdi)
+{
+    BDRVRBDState *s = bs->opaque;
+    bdi->cluster_size = s->objsize;
+    return 0;
+}
+
+static int64_t rbd_getlength(BlockDriverState *bs)
+{
+    BDRVRBDState *s = bs->opaque;
+
+    return s->size;
+}
+
+static int rbd_snap_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)
+{
+    BDRVRBDState *s = bs->opaque;
+    char inbuf[512], outbuf[128];
+    uint64_t snap_id;
+    int r;
+    char *p = inbuf;
+    char *end = inbuf + sizeof(inbuf);
+    char n[RBD_MAX_SEG_NAME_SIZE];
+    char *hbuf = NULL;
+    RbdHeader1 *header;
+
+    if (sn_info->name[0] == '\0') {
+        return -EINVAL; /* we need a name for rbd snapshots */
+    }
+
+    /*
+     * rbd snapshots are using the name as the user controlled unique identifier
+     * we can't use the rbd snapid for that purpose, as it can't be set
+     */
+    if (sn_info->id_str[0] != '\0' &&
+        strcmp(sn_info->id_str, sn_info->name) != 0) {
+        return -EINVAL;
+    }
+
+    if (strlen(sn_info->name) >= sizeof(sn_info->id_str)) {
+        return -ERANGE;
+    }
+
+    r = rados_selfmanaged_snap_create(s->header_pool, &snap_id);
+    if (r < 0) {
+        error_report("failed to create snap id: %s", strerror(-r));
+        return r;
+    }
+
+    *(uint32_t *)p = strlen(sn_info->name);
+    cpu_to_le32s((uint32_t *)p);
+    p += sizeof(uint32_t);
+    strncpy(p, sn_info->name, end - p);
+    p += strlen(p);
+    if (p + sizeof(snap_id) > end) {
+        error_report("invalid input parameter");
+        return -EINVAL;
+    }
+
+    *(uint64_t *)p = snap_id;
+    cpu_to_le64s((uint64_t *)p);
+
+    snprintf(n, sizeof(n), "%s%s", s->name, RBD_SUFFIX);
+
+    r = rados_exec(s->header_pool, n, "rbd", "snap_add", inbuf,
+                   sizeof(inbuf), outbuf, sizeof(outbuf));
+    if (r < 0) {
+        error_report("rbd.snap_add execution failed: %s", strerror(-r));
+        return r;
+    }
+
+    pstrcpy(sn_info->id_str, sizeof(sn_info->id_str), sn_info->name);
+
+    r = rbd_read_header(s, &hbuf);
+    if (r < 0) {
+        error_report("failed reading header: %s", strerror(-r));
+        return r;
+    }
+
+    header = (RbdHeader1 *)hbuf;
+    r = rbd_set_snapc(s->pool, sn_info->name, header);
+    if (r < 0) {
+        error_report("failed setting snap context: %s", strerror(-r));
+        goto failed;
+    }
+
+    qemu_free(header); /* frees hbuf; don't leak it on the success path */
+
+    return 0;
+
+failed:
+    qemu_free(header);
+    return r;
+}
+
+static int decode32(char **p, const char *end, uint32_t *v)
+{
+    if (*p + 4 > end) {
+        return -ERANGE;
+    }
+
+    *v = *(uint32_t *)(*p);
+    le32_to_cpus(v);
+    *p += 4;
+    return 0;
+}
+
+static int decode64(char **p, const char *end, uint64_t *v)
+{
+    if (*p + 8 > end) {
+        return -ERANGE;
+    }
+
+    *v = *(uint64_t *)(*p);
+    le64_to_cpus(v);
+    *p += 8;
+    return 0;
+}
+
+static int decode_str(char **p, const char *end, char **s)
+{
+    uint32_t len;
+    int r;
+
+    if ((r = decode32(p, end, &len)) < 0) {
+        return r;
+    }
+
+    if (*p + len > end) {
+        return -ERANGE;
+    }
+
+    *s = qemu_malloc(len + 1);
+    memcpy(*s, *p, len);
+    *p += len;
+    (*s)[len] = '\0';
+
+    return len;
+}
+
+static int rbd_snap_list(BlockDriverState *bs, QEMUSnapshotInfo **psn_tab)
+{
+    BDRVRBDState *s = bs->opaque;
+    char n[RBD_MAX_SEG_NAME_SIZE];
+    QEMUSnapshotInfo *sn_info, *sn_tab = NULL;
+    RbdHeader1 *header;
+    char *hbuf = NULL;
+    char *outbuf = NULL, *end, *buf;
+    uint64_t len;
+    uint64_t snap_seq;
+    uint32_t snap_count;
+    int r, i;
+
+    /* read header to estimate how much space we need to read the snap
+     * list */
+    if ((r = rbd_read_header(s, &hbuf)) < 0) {
+        goto done_err;
+    }
+    header = (RbdHeader1 *)hbuf;
+    len = le64_to_cpu(header->snap_names_len);
+    len += 1024; /* should already be enough, but new snapshots might have
+                    been created since we read the header; allocate a bit
+                    more so that in most cases it will suffice anyway */
+    qemu_free(hbuf);
+
+    snprintf(n, sizeof(n), "%s%s", s->name, RBD_SUFFIX);
+    while (1) {
+        qemu_free(outbuf);
+        outbuf = qemu_malloc(len);
+
+        r = rados_exec(s->header_pool, n, "rbd", "snap_list", NULL, 0,
+                       outbuf, len);
+        if (r < 0) {
+            error_report("rbd.snap_list execution failed: %s", strerror(-r));
+            goto done_err;
+        }
+        if (r != len) {
+            break;
+        }
+
+        /* if we're here, we probably raced with snapshot creation */
+        len *= 2;
+    }
+    buf = outbuf;
+    end = buf + len;
+
+    if ((r = decode64(&buf, end, &snap_seq)) < 0) {
+        goto done_err;
+    }
+    if ((r = decode32(&buf, end, &snap_count)) < 0) {
+        goto done_err;
+    }
+
+    sn_tab = qemu_mallocz(snap_count * sizeof(QEMUSnapshotInfo));
+    for (i = 0; i < snap_count; i++) {
+        uint64_t id, image_size;
+        char *snap_name;
+
+        if ((r = decode64(&buf, end, &id)) < 0) {
+            goto done_err;
+        }
+        if ((r = decode64(&buf, end, &image_size)) < 0) {
+            goto done_err;
+        }
+        if ((r = decode_str(&buf, end, &snap_name)) < 0) {
+            goto done_err;
+        }
+
+        sn_info = sn_tab + i;
+        pstrcpy(sn_info->id_str, sizeof(sn_info->id_str), snap_name);
+        pstrcpy(sn_info->name, sizeof(sn_info->name), snap_name);
+        qemu_free(snap_name);
+
+        sn_info->vm_state_size = image_size;
+        sn_info->date_sec = 0;
+        sn_info->date_nsec = 0;
+        sn_info->vm_clock_nsec = 0;
+    }
+    *psn_tab = sn_tab;
+    qemu_free(outbuf);
+    return snap_count;
+done_err:
+    qemu_free(sn_tab);
+    qemu_free(outbuf);
+    return r;
+}
+
+static QEMUOptionParameter rbd_create_options[] = {
+    {
+     .name = BLOCK_OPT_SIZE,
+     .type = OPT_SIZE,
+     .help = "Virtual disk size"
+    },
+    {
+     .name = BLOCK_OPT_CLUSTER_SIZE,
+     .type = OPT_SIZE,
+     .help = "RBD object size"
+    },
+    {NULL}
+};
+
+static BlockDriver bdrv_rbd = {
+    .format_name        = "rbd",
+    .instance_size      = sizeof(BDRVRBDState),
+    .bdrv_file_open     = rbd_open,
+    .bdrv_close         = rbd_close,
+    .bdrv_create        = rbd_create,
+    .bdrv_get_info      = rbd_getinfo,
+    .create_options     = rbd_create_options,
+    .bdrv_getlength     = rbd_getlength,
+    .protocol_name      = "rbd",
+
+    .bdrv_aio_readv     = rbd_aio_readv,
+    .bdrv_aio_writev    = rbd_aio_writev,
+
+    .bdrv_snapshot_create = rbd_snap_create,
+    .bdrv_snapshot_list = rbd_snap_list,
+};
+
+static void bdrv_rbd_init(void)
+{
+    bdrv_register(&bdrv_rbd);
+}
+
+block_init(bdrv_rbd_init);
diff --git a/block/rbd_types.h b/block/rbd_types.h
new file mode 100644
index 0000000..f4cca99
--- /dev/null
+++ b/block/rbd_types.h
@@ -0,0 +1,71 @@ 
+/*
+ * Ceph - scalable distributed file system
+ *
+ * Copyright (C) 2004-2010 Sage Weil <sage@newdream.net>
+ *
+ * This is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License version 2.1, as published by the Free Software
+ * Foundation.  See file COPYING.LIB.
+ *
+ */
+
+#ifndef CEPH_RBD_TYPES_H
+#define CEPH_RBD_TYPES_H
+
+
+/*
+ * rbd image 'foo' consists of objects
+ *   foo.rbd      - image metadata
+ *   foo.00000000
+ *   foo.00000001
+ *   ...          - data
+ */
+
+#define RBD_SUFFIX              ".rbd"
+#define RBD_DIRECTORY           "rbd_directory"
+#define RBD_INFO                "rbd_info"
+
+#define RBD_DEFAULT_OBJ_ORDER   22   /* 4MB */
+
+#define RBD_MAX_OBJ_NAME_SIZE   96
+#define RBD_MAX_BLOCK_NAME_SIZE 24
+#define RBD_MAX_SEG_NAME_SIZE   128
+
+#define RBD_COMP_NONE           0
+#define RBD_CRYPT_NONE          0
+
+#define RBD_HEADER_TEXT         "<<< Rados Block Device Image >>>\n"
+#define RBD_HEADER_SIGNATURE    "RBD"
+#define RBD_HEADER_VERSION      "001.005"
+
+struct rbd_info {
+    uint64_t max_id;
+} __attribute__ ((packed));
+
+struct rbd_obj_snap_ondisk {
+    uint64_t id;
+    uint64_t image_size;
+} __attribute__((packed));
+
+struct rbd_obj_header_ondisk {
+    char text[40];
+    char block_name[RBD_MAX_BLOCK_NAME_SIZE];
+    char signature[4];
+    char version[8];
+    struct {
+        uint8_t order;
+        uint8_t crypt_type;
+        uint8_t comp_type;
+        uint8_t unused;
+    } __attribute__((packed)) options;
+    uint64_t image_size;
+    uint64_t snap_seq;
+    uint32_t snap_count;
+    uint32_t reserved;
+    uint64_t snap_names_len;
+    struct rbd_obj_snap_ondisk snaps[0];
+} __attribute__((packed));
+
+
+#endif
diff --git a/configure b/configure
index af50607..5d8f620 100755
--- a/configure
+++ b/configure
@@ -325,6 +325,7 @@  cpu_emulation="yes"
 check_utests="no"
 user_pie="no"
 zero_malloc=""
+rbd=""
 
 # OS specific
 if check_define __linux__ ; then
@@ -724,6 +725,10 @@  for opt do
   ;;
   --*dir)
   ;;
+  --disable-rbd) rbd="no"
+  ;;
+  --enable-rbd) rbd="yes"
+  ;;
   *) echo "ERROR: unknown option $opt"; show_help="yes"
   ;;
   esac
@@ -909,6 +914,7 @@  echo "  --enable-docs            enable documentation build"
 echo "  --disable-docs           disable documentation build"
 echo "  --disable-vhost-net      disable vhost-net acceleration support"
 echo "  --enable-vhost-net       enable vhost-net acceleration support"
+echo "  --enable-rbd             enable building the rados block device (rbd)"
 echo ""
 echo "NOTE: The object files are built at the place where configure is launched"
 exit 1
@@ -1755,6 +1761,27 @@  if test "$mingw32" != yes -a "$pthread" = no; then
 fi
 
 ##########################################
+# rbd probe
+if test "$rbd" != "no" ; then
+  cat > $TMPC <<EOF
+#include <stdio.h>
+#include <rados/librados.h>
+int main(void) { rados_initialize(0, NULL); return 0; }
+EOF
+  rbd_libs="-lrados -lcrypto"
+  if compile_prog "" "$rbd_libs" ; then
+    rbd=yes
+    libs_tools="$rbd_libs $libs_tools"
+    libs_softmmu="$rbd_libs $libs_softmmu"
+  else
+    if test "$rbd" = "yes" ; then
+      feature_not_found "rados block device"
+    fi
+    rbd=no
+  fi
+fi
+
+##########################################
 # linux-aio probe
 
 if test "$linux_aio" != "no" ; then
@@ -2256,6 +2283,7 @@  echo "preadv support    $preadv"
 echo "fdatasync         $fdatasync"
 echo "uuid support      $uuid"
 echo "vhost-net support $vhost_net"
+echo "rbd support       $rbd"
 
 if test $sdl_too_old = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -2498,6 +2526,9 @@  echo "CONFIG_UNAME_RELEASE=\"$uname_release\"" >> $config_host_mak
 if test "$zero_malloc" = "yes" ; then
   echo "CONFIG_ZERO_MALLOC=y" >> $config_host_mak
 fi
+if test "$rbd" = "yes" ; then
+  echo "CONFIG_RBD=y" >> $config_host_mak
+fi
 
 # USB host support
 case "$usb" in
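[Not part of the patch] Outside the configure machinery, the rbd probe boils down to a compile-and-link test against librados. A rough stand-alone equivalent — compile_prog and $TMPC are replaced with an illustrative temp file and plain cc:

```shell
# Stand-alone sketch of the rbd probe (illustrative paths; configure
# itself uses its compile_prog helper and $TMPC instead).
cat > /tmp/rbd-probe.c <<EOF
#include <rados/librados.h>
int main(void) { rados_initialize(0, NULL); return 0; }
EOF
if cc /tmp/rbd-probe.c -lrados -lcrypto -o /tmp/rbd-probe 2>/dev/null; then
    echo "CONFIG_RBD=y"       # what configure writes to config_host_mak
else
    echo "rbd support: no"    # or feature_not_found if --enable-rbd was given
fi
rm -f /tmp/rbd-probe.c /tmp/rbd-probe
```

Building then needs the Ceph development headers installed; `./configure --enable-rbd` turns a missing librados into a hard error instead of silently disabling the driver.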