
[v2] rbd: add an asynchronous flush

Message ID 1364587403-30689-1-git-send-email-josh.durgin@inktank.com
State New

Commit Message

Josh Durgin March 29, 2013, 8:03 p.m. UTC
The existing bdrv_co_flush_to_disk implementation uses rbd_flush(),
which is synchronous and causes the main qemu thread to block until it
is complete. This results in unresponsiveness and extra latency for
the guest.

Fix this by using an asynchronous version of flush.  This was added to
librbd with a special #define to indicate its presence, since it will
be backported to stable versions. Thus, there is no need to check the
version of librbd.

Implement this as bdrv_aio_flush, since it matches other aio functions
in the rbd block driver, and leave out bdrv_co_flush_to_disk when the
asynchronous version is available.

Reported-by: Oliver Francke <oliver@filoo.de>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
---

v2:
*  include a hunk treating write, discard, and flush completions
   the same, since they have no result data

 block/rbd.c |   37 +++++++++++++++++++++++++++++++++----
 1 file changed, 33 insertions(+), 4 deletions(-)
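
For reference, the detection described above probes a feature macro
rather than comparing library versions; a minimal sketch of the
pattern (the real hunks are in the patch below):

    #include <rbd/librbd.h>

    #ifdef LIBRBD_SUPPORTS_AIO_FLUSH
    /* librbd provides rbd_aio_flush(); register .bdrv_aio_flush */
    #else
    /* older librbd: keep the synchronous rbd_flush() path, itself
     * guarded by LIBRBD_VERSION_CODE */
    #endif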

Comments

Kevin Wolf April 2, 2013, 2:10 p.m. UTC | #1
Am 29.03.2013 um 21:03 hat Josh Durgin geschrieben:
> The existing bdrv_co_flush_to_disk implementation uses rbd_flush(),
> which is synchronous and causes the main qemu thread to block until it
> is complete. This results in unresponsiveness and extra latency for
> the guest.
> 
> Fix this by using an asynchronous version of flush.  This was added to
> librbd with a special #define to indicate its presence, since it will
> be backported to stable versions. Thus, there is no need to check the
> version of librbd.

librbd is linked dynamically and the version on the build host isn't
necessarily the same as the version qemu is run with. So shouldn't this
be a runtime check instead?
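
For illustration, such a runtime check could resolve the symbol from
whatever librbd is actually loaded, e.g. with dlsym(); a hypothetical
sketch (the names here are made up; a dlsym() based variant is
discussed and dropped later in this thread):

    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <rbd/librbd.h>

    /* Hypothetical probe: stays NULL when the librbd qemu actually
     * runs against lacks the backported rbd_aio_flush(). */
    static int (*aio_flush_fn)(rbd_image_t image, rbd_completion_t c);

    static void qemu_rbd_probe_aio_flush(void)
    {
        aio_flush_fn = (int (*)(rbd_image_t, rbd_completion_t))
            dlsym(RTLD_DEFAULT, "rbd_aio_flush");
    }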

> Implement this as bdrv_aio_flush, since it matches other aio functions
> in the rbd block driver, and leave out bdrv_co_flush_to_disk when the
> asynchronous version is available.
> 
> Reported-by: Oliver Francke <oliver@filoo.de>
> Signed-off-by: Josh Durgin <josh.durgin@inktank.com>

Looks good otherwise.

Kevin
Josh Durgin April 10, 2013, 2:03 p.m. UTC | #2
On 04/02/2013 07:10 AM, Kevin Wolf wrote:
> Am 29.03.2013 um 21:03 hat Josh Durgin geschrieben:
>> The existing bdrv_co_flush_to_disk implementation uses rbd_flush(),
>> which is synchronous and causes the main qemu thread to block until it
>> is complete. This results in unresponsiveness and extra latency for
>> the guest.
>>
>> Fix this by using an asynchronous version of flush.  This was added to
>> librbd with a special #define to indicate its presence, since it will
>> be backported to stable versions. Thus, there is no need to check the
>> version of librbd.
>
> librbd is linked dynamically and the version on the build host isn't
> necessarily the same as the version qemu is run with. So shouldn't this
> be a runtime check instead?

While we discuss runtime loading separately, would you mind taking this
patch as-is for now?

Josh

>> Implement this as bdrv_aio_flush, since it matches other aio functions
>> in the rbd block driver, and leave out bdrv_co_flush_to_disk when the
>> asynchronous version is available.
>>
>> Reported-by: Oliver Francke <oliver@filoo.de>
>> Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
>
> Looks good otherwise.
>
> Kevin
>
Stefan Hajnoczi April 11, 2013, 8:02 a.m. UTC | #3
On Wed, Apr 10, 2013 at 07:03:39AM -0700, Josh Durgin wrote:
> On 04/02/2013 07:10 AM, Kevin Wolf wrote:
> >Am 29.03.2013 um 21:03 hat Josh Durgin geschrieben:
> >>The existing bdrv_co_flush_to_disk implementation uses rbd_flush(),
> >>which is synchronous and causes the main qemu thread to block until it
> >>is complete. This results in unresponsiveness and extra latency for
> >>the guest.
> >>
> >>Fix this by using an asynchronous version of flush.  This was added to
> >>librbd with a special #define to indicate its presence, since it will
> >>be backported to stable versions. Thus, there is no need to check the
> >>version of librbd.
> >
> >librbd is linked dynamically and the version on the build host isn't
> >necessarily the same as the version qemu is run with. So shouldn't this
> >be a runtime check instead?
> 
> While we discuss runtime loading separately, would you mind taking this
> patch as-is for now?

Hi Josh,
I'm happy with Patch v3 1/2.  Does that work for you?

I don't want to take Patch v3 2/2 or the dlsym() function pointer patch.

Stefan
Kevin Wolf April 11, 2013, 8:48 a.m. UTC | #4
Am 11.04.2013 um 10:02 hat Stefan Hajnoczi geschrieben:
> On Wed, Apr 10, 2013 at 07:03:39AM -0700, Josh Durgin wrote:
> > On 04/02/2013 07:10 AM, Kevin Wolf wrote:
> > >Am 29.03.2013 um 21:03 hat Josh Durgin geschrieben:
> > >>The existing bdrv_co_flush_to_disk implementation uses rbd_flush(),
> > >>which is synchronous and causes the main qemu thread to block until it
> > >>is complete. This results in unresponsiveness and extra latency for
> > >>the guest.
> > >>
> > >>Fix this by using an asynchronous version of flush.  This was added to
> > >>librbd with a special #define to indicate its presence, since it will
> > >>be backported to stable versions. Thus, there is no need to check the
> > >>version of librbd.
> > >
> > >librbd is linked dynamically and the version on the build host isn't
> > >necessarily the same as the version qemu is run with. So shouldn't this
> > >be a runtime check instead?
> > 
> > While we discuss runtime loading separately, would you mind taking this
> > patch as-is for now?
> 
> Hi Josh,
> I'm happy with Patch v3 1/2.  Does that work for you?

Taking only patch 1/2 would add dead code, as .bdrv_aio_flush would never
be called. I think we should take v1 of the series instead, with the
#ifdefs at build time.

Kevin
Josh Durgin April 11, 2013, 5:19 p.m. UTC | #5
On 04/11/2013 01:48 AM, Kevin Wolf wrote:
> Am 11.04.2013 um 10:02 hat Stefan Hajnoczi geschrieben:
>> On Wed, Apr 10, 2013 at 07:03:39AM -0700, Josh Durgin wrote:
>>> On 04/02/2013 07:10 AM, Kevin Wolf wrote:
>>>> Am 29.03.2013 um 21:03 hat Josh Durgin geschrieben:
>>>>> The existing bdrv_co_flush_to_disk implementation uses rbd_flush(),
>>>>> which is synchronous and causes the main qemu thread to block until it
>>>>> is complete. This results in unresponsiveness and extra latency for
>>>>> the guest.
>>>>>
>>>>> Fix this by using an asynchronous version of flush.  This was added to
>>>>> librbd with a special #define to indicate its presence, since it will
>>>>> be backported to stable versions. Thus, there is no need to check the
>>>>> version of librbd.
>>>>
>>>> librbd is linked dynamically and the version on the build host isn't
>>>> necessarily the same as the version qemu is run with. So shouldn't this
>>>> be a runtime check instead?
>>>
>>> While we discuss runtime loading separately, would you mind taking this
>>> patch as-is for now?
>>
>> Hi Josh,
>> I'm happy with Patch v3 1/2.  Does that work for you?
>
> Taking only patch 1/2 would add dead code, as .bdrv_aio_flush would never
> be called. I think we should take v1 of the series instead, with the
> #ifdefs at build time.

Yes, that would be a problem with only v3 1/2. v2 of the original
series is fine, but v1 was accidentally missing a hunk. To be clear,
I think just http://patchwork.ozlabs.org/patch/232489/ would be good.

Thanks,
Josh
Kevin Wolf April 12, 2013, 6:50 a.m. UTC | #6
Am 11.04.2013 um 19:19 hat Josh Durgin geschrieben:
> On 04/11/2013 01:48 AM, Kevin Wolf wrote:
> >Am 11.04.2013 um 10:02 hat Stefan Hajnoczi geschrieben:
> >>On Wed, Apr 10, 2013 at 07:03:39AM -0700, Josh Durgin wrote:
> >>>On 04/02/2013 07:10 AM, Kevin Wolf wrote:
> >>>>Am 29.03.2013 um 21:03 hat Josh Durgin geschrieben:
> >>>>>The existing bdrv_co_flush_to_disk implementation uses rbd_flush(),
> >>>>>which is synchronous and causes the main qemu thread to block until it
> >>>>>is complete. This results in unresponsiveness and extra latency for
> >>>>>the guest.
> >>>>>
> >>>>>Fix this by using an asynchronous version of flush.  This was added to
> >>>>>librbd with a special #define to indicate its presence, since it will
> >>>>>be backported to stable versions. Thus, there is no need to check the
> >>>>>version of librbd.
> >>>>
> >>>>librbd is linked dynamically and the version on the build host isn't
> >>>>necessarily the same as the version qemu is run with. So shouldn't this
> >>>>be a runtime check instead?
> >>>
> >>>While we discuss runtime loading separately, would you mind taking this
> >>>patch as-is for now?
> >>
> >>Hi Josh,
> >>I'm happy with Patch v3 1/2.  Does that work for you?
> >
> >Taking only patch 1/2 would add dead code, as .bdrv_aio_flush would never
> >be called. I think we should take v1 of the series instead, with the
> >#ifdefs at build time.
> 
> Yes, that would be a problem with only v3 1/2. v2 of the original
> series is fine, but v1 was accidentally missing a hunk. To be clear,
> I think just http://patchwork.ozlabs.org/patch/232489/ would be good.

Yes, sorry, I should have checked before posting this. I meant v2.

Kevin
Stefan Hajnoczi April 12, 2013, 7:42 a.m. UTC | #7
On Fri, Apr 12, 2013 at 08:50:35AM +0200, Kevin Wolf wrote:
> Am 11.04.2013 um 19:19 hat Josh Durgin geschrieben:
> > On 04/11/2013 01:48 AM, Kevin Wolf wrote:
> > >Am 11.04.2013 um 10:02 hat Stefan Hajnoczi geschrieben:
> > >>On Wed, Apr 10, 2013 at 07:03:39AM -0700, Josh Durgin wrote:
> > >>>On 04/02/2013 07:10 AM, Kevin Wolf wrote:
> > >>>>Am 29.03.2013 um 21:03 hat Josh Durgin geschrieben:
> > >>>>>The existing bdrv_co_flush_to_disk implementation uses rbd_flush(),
> > >>>>>which is synchronous and causes the main qemu thread to block until it
> > >>>>>is complete. This results in unresponsiveness and extra latency for
> > >>>>>the guest.
> > >>>>>
> > >>>>>Fix this by using an asynchronous version of flush.  This was added to
> > >>>>>librbd with a special #define to indicate its presence, since it will
> > >>>>>be backported to stable versions. Thus, there is no need to check the
> > >>>>>version of librbd.
> > >>>>
> > >>>>librbd is linked dynamically and the version on the build host isn't
> > >>>>necessarily the same as the version qemu is run with. So shouldn't this
> > >>>>be a runtime check instead?
> > >>>
> > >>>While we discuss runtime loading separately, would you mind taking this
> > >>>patch as-is for now?
> > >>
> > >>Hi Josh,
> > >>I'm happy with Patch v3 1/2.  Does that work for you?
> > >
> > >Taking only patch 1/2 would add dead code, as .bdrv_aio_flush would never
> > >be called. I think we should take v1 of the series instead, with the
> > >#ifdefs at build time.
> > 
> > Yes, that would be a problem with only v3 1/2. v2 of the original
> > series is fine, but v1 was accidentally missing a hunk. To be clear,
> > I think just http://patchwork.ozlabs.org/patch/232489/ would be good.
> 
> Yes, sorry, I should have checked before posting this. v2 I meant.

Thanks Josh and Kevin.  Will take v2.

Stefan

Patch

diff --git a/block/rbd.c b/block/rbd.c
index 1a8ea6d..141b488 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -63,7 +63,8 @@ 
 typedef enum {
     RBD_AIO_READ,
     RBD_AIO_WRITE,
-    RBD_AIO_DISCARD
+    RBD_AIO_DISCARD,
+    RBD_AIO_FLUSH
 } RBDAIOCmd;
 
 typedef struct RBDAIOCB {
@@ -379,8 +380,7 @@  static void qemu_rbd_complete_aio(RADOSCB *rcb)
 
     r = rcb->ret;
 
-    if (acb->cmd == RBD_AIO_WRITE ||
-        acb->cmd == RBD_AIO_DISCARD) {
+    if (acb->cmd != RBD_AIO_READ) {
         if (r < 0) {
             acb->ret = r;
             acb->error = 1;
@@ -659,6 +659,16 @@  static int rbd_aio_discard_wrapper(rbd_image_t image,
 #endif
 }
 
+static int rbd_aio_flush_wrapper(rbd_image_t image,
+                                 rbd_completion_t comp)
+{
+#ifdef LIBRBD_SUPPORTS_AIO_FLUSH
+    return rbd_aio_flush(image, comp);
+#else
+    return -ENOTSUP;
+#endif
+}
+
 static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs,
                                        int64_t sector_num,
                                        QEMUIOVector *qiov,
@@ -679,7 +689,7 @@  static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs,
     acb = qemu_aio_get(&rbd_aiocb_info, bs, cb, opaque);
     acb->cmd = cmd;
     acb->qiov = qiov;
-    if (cmd == RBD_AIO_DISCARD) {
+    if (cmd == RBD_AIO_DISCARD || cmd == RBD_AIO_FLUSH) {
         acb->bounce = NULL;
     } else {
         acb->bounce = qemu_blockalign(bs, qiov->size);
@@ -723,6 +733,9 @@  static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs,
     case RBD_AIO_DISCARD:
         r = rbd_aio_discard_wrapper(s->image, off, size, c);
         break;
+    case RBD_AIO_FLUSH:
+        r = rbd_aio_flush_wrapper(s->image, c);
+        break;
     default:
         r = -EINVAL;
     }
@@ -762,6 +775,16 @@  static BlockDriverAIOCB *qemu_rbd_aio_writev(BlockDriverState *bs,
                          RBD_AIO_WRITE);
 }
 
+#ifdef LIBRBD_SUPPORTS_AIO_FLUSH
+static BlockDriverAIOCB *qemu_rbd_aio_flush(BlockDriverState *bs,
+                                            BlockDriverCompletionFunc *cb,
+                                            void *opaque)
+{
+    return rbd_start_aio(bs, 0, NULL, 0, cb, opaque, RBD_AIO_FLUSH);
+}
+
+#else
+
 static int qemu_rbd_co_flush(BlockDriverState *bs)
 {
 #if LIBRBD_VERSION_CODE >= LIBRBD_VERSION(0, 1, 1)
@@ -772,6 +795,7 @@  static int qemu_rbd_co_flush(BlockDriverState *bs)
     return 0;
 #endif
 }
+#endif
 
 static int qemu_rbd_getinfo(BlockDriverState *bs, BlockDriverInfo *bdi)
 {
@@ -949,7 +973,12 @@  static BlockDriver bdrv_rbd = {
 
     .bdrv_aio_readv         = qemu_rbd_aio_readv,
     .bdrv_aio_writev        = qemu_rbd_aio_writev,
+
+#ifdef LIBRBD_SUPPORTS_AIO_FLUSH
+    .bdrv_aio_flush         = qemu_rbd_aio_flush,
+#else
     .bdrv_co_flush_to_disk  = qemu_rbd_co_flush,
+#endif
 
 #ifdef LIBRBD_SUPPORTS_DISCARD
     .bdrv_aio_discard       = qemu_rbd_aio_discard,
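
For completeness, the librbd side of the new call follows the usual
completion pattern; a standalone usage sketch (assuming an open
rbd_image_t and a librbd that defines LIBRBD_SUPPORTS_AIO_FLUSH; error
handling trimmed):

    #include <rbd/librbd.h>

    static void flush_done(rbd_completion_t c, void *arg)
    {
        int r = rbd_aio_get_return_value(c);  /* 0 on success */
        rbd_aio_release(c);
        /* hand r back to the caller via arg */
    }

    static int start_flush(rbd_image_t image, void *opaque)
    {
        rbd_completion_t c;
        int r = rbd_aio_create_completion(opaque, flush_done, &c);
        if (r < 0) {
            return r;
        }
        /* returns immediately; flush_done() runs once the flush
         * has completed */
        r = rbd_aio_flush(image, c);
        if (r < 0) {
            rbd_aio_release(c);
        }
        return r;
    }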