Patchwork [v3,2/2] virtio-blk: Use bdrv_aio_multiwrite

Submitter Kevin Wolf
Date Sept. 9, 2009, 3:53 p.m.
Message ID <1252511618-19497-3-git-send-email-kwolf@redhat.com>
Permalink /patch/33211/
State Superseded

Comments

Kevin Wolf - Sept. 9, 2009, 3:53 p.m.
It is quite common for virtio-blk to submit more than one write request in a
row to the qemu block layer. Use bdrv_aio_multiwrite to allow block drivers to
optimize their handling of the requests.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 hw/virtio-blk.c |   50 ++++++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 42 insertions(+), 8 deletions(-)
Christoph Hellwig - Sept. 10, 2009, 10:41 p.m.
On Wed, Sep 09, 2009 at 05:53:38PM +0200, Kevin Wolf wrote:
> +static void do_multiwrite(BlockDriverState *bs, BlockRequest *blkreq,
> +    int num_writes)
>  {
> -    BlockDriverAIOCB *acb;
> +    int i, ret;
> +    ret = bdrv_aio_multiwrite(bs, blkreq, num_writes);
> +
> +    if (ret != 0) {
> +        for (i = 0; i < num_writes; i++) {
> +            if (blkreq[i].error) {
> +                virtio_blk_req_complete(blkreq[i].opaque, VIRTIO_BLK_S_IOERR);
> +            }
> +        }
> +    }
> +}
>  
> +static void virtio_blk_handle_write(BlockRequest *blkreq, int *num_writes,
> +    VirtIOBlockReq *req, BlockDriverState **old_bs)
> +{
> +    if (req->dev->bs != *old_bs || *num_writes == 32) {

I was under the impression we had one queue per device, so the first
condition should never happen.

> +        if (*old_bs != NULL) {
> +            do_multiwrite(*old_bs, blkreq, *num_writes);
> +        }
> +        *num_writes = 0;
> +        *old_bs = req->dev->bs;
>      }
> +
> +    blkreq[*num_writes].sector = req->out->sector;
> +    blkreq[*num_writes].nb_sectors = req->qiov.size / 512;
> +    blkreq[*num_writes].qiov = &req->qiov;
> +    blkreq[*num_writes].cb = virtio_blk_rw_complete;
> +    blkreq[*num_writes].opaque = req;
> +    blkreq[*num_writes].error = 0;
> +
> +    (*num_writes)++;

If you pass the completion routine to the function and map the error case
to calling the completion routine (which is the usual way to handle errors
anyway) this function could become completely generic.

I think we also need to actually store the iocb in the BlockRequest;
ide and scsi use it to cancel I/O on migration (why virtio
doesn't is on my TODO list to investigate) and in some other cases.

Another improvement to the data structure would be to have a container
structure containing the BlockRequest array and num_writes, at which
point this actually becomes a pretty clean abstraction, which we
could also use to submit multiple iocbs in the native AIO code.

Any chance to just use this batched submission unconditionally and
also for reads?  I'd hate to grow even more confusing I/O methods
in the block layer.
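The container structure suggested here, with the error path routed through each request's own completion callback, might look roughly like the following sketch. All type names are simplified stand-ins for the QEMU ones (BlockRequest, BlockDriverState, the real submission call), not the actual API:

```c
/* Sketch of a container holding the request array and its count, with the
 * error path mapped to each request's completion callback. Simplified
 * stand-in types, not the real QEMU definitions. */
#include <assert.h>
#include <stddef.h>

#define MAX_MULTIWRITE 32

typedef void CompletionFunc(void *opaque, int ret);

typedef struct Request {             /* stand-in for BlockRequest */
    long sector;
    int error;
    CompletionFunc *cb;
    void *opaque;
} Request;

typedef struct MultiwriteBatch {     /* array + count in one place */
    Request reqs[MAX_MULTIWRITE];
    int num_reqs;
} MultiwriteBatch;

static int submitted;                /* counts flushed requests, demo only */

/* Submit all queued requests at once; failed requests are completed
 * through their own callback, the usual error path. */
static void batch_flush(MultiwriteBatch *mw)
{
    for (int i = 0; i < mw->num_reqs; i++) {
        submitted++;
        if (mw->reqs[i].error) {
            mw->reqs[i].cb(mw->reqs[i].opaque, -1);
        }
    }
    mw->num_reqs = 0;
}

/* Queue one request, flushing first when the batch is full. The caller
 * is identified only by the opaque cookie, so nothing here depends on
 * VirtIOBlockReq. */
static void batch_add(MultiwriteBatch *mw, long sector,
                      CompletionFunc *cb, void *opaque)
{
    if (mw->num_reqs == MAX_MULTIWRITE) {
        batch_flush(mw);
    }
    Request *r = &mw->reqs[mw->num_reqs++];
    r->sector = sector;
    r->error = 0;
    r->cb = cb;
    r->opaque = opaque;
}
```

With this shape the same container could also be handed to the native AIO code to submit multiple iocbs in one go.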
Kevin Wolf - Sept. 11, 2009, 7:10 a.m.
Am 11.09.2009 00:41, schrieb Christoph Hellwig:
> On Wed, Sep 09, 2009 at 05:53:38PM +0200, Kevin Wolf wrote:
>> +static void do_multiwrite(BlockDriverState *bs, BlockRequest *blkreq,
>> +    int num_writes)
>>  {
>> -    BlockDriverAIOCB *acb;
>> +    int i, ret;
>> +    ret = bdrv_aio_multiwrite(bs, blkreq, num_writes);
>> +
>> +    if (ret != 0) {
>> +        for (i = 0; i < num_writes; i++) {
>> +            if (blkreq[i].error) {
>> +                virtio_blk_req_complete(blkreq[i].opaque, VIRTIO_BLK_S_IOERR);
>> +            }
>> +        }
>> +    }
>> +}
>>  
>> +static void virtio_blk_handle_write(BlockRequest *blkreq, int *num_writes,
>> +    VirtIOBlockReq *req, BlockDriverState **old_bs)
>> +{
>> +    if (req->dev->bs != *old_bs || *num_writes == 32) {
> 
> I was under the impression we had one queue per device, so the first
> condition should never happen.

Maybe, don't know? ;-) Makes perfect sense, so probably you're right. I
was just trying to be on the safe side. I can take it out if you
prefer.

>> +        if (*old_bs != NULL) {
>> +            do_multiwrite(*old_bs, blkreq, *num_writes);
>> +        }
>> +        *num_writes = 0;
>> +        *old_bs = req->dev->bs;
>>      }
>> +
>> +    blkreq[*num_writes].sector = req->out->sector;
>> +    blkreq[*num_writes].nb_sectors = req->qiov.size / 512;
>> +    blkreq[*num_writes].qiov = &req->qiov;
>> +    blkreq[*num_writes].cb = virtio_blk_rw_complete;
>> +    blkreq[*num_writes].opaque = req;
>> +    blkreq[*num_writes].error = 0;
>> +
>> +    (*num_writes)++;
> 
> If you pass the completion routine to the function and map the error case
> to calling the completion routine (which is the usual way to handle errors
> anyway) this function could become completely generic.

Except that VirtIOBlockReq doesn't seem to be a type commonly used in
generic code.

> I think we also need to actually store the iocb in the BlockRequest;
> ide and scsi use it to cancel I/O on migration (why virtio
> doesn't is on my TODO list to investigate) and in some other cases.

Right, this is something that is still missing. Considering that virtio
as the only multiwrite user doesn't use it (yet) anyway and that it's
not completely trivial (the request to be cancelled could have been
merged), I would prefer to introduce this in a patch on top. The
bdrv_aio_multiwrite patch is already larger than I would like.

> Another improvement to the data structure would be to have a container
> structure containing the BlockRequest array and num_writes, at which
> point this actually becomes a pretty clean abstraction, which we
> could also use to submit multiple iocbs in the native AIO code.

Ok, I'm not opposed to this.

> Any chance to just use this batched submission unconditionally and
> also for reads?  I'd hate to grow even more confusing I/O methods
> in the block layer.

If we want to completely obsolete bdrv_aio_readv/writev by batch
submission functions (not only in block.c but also in each block
driver), we certainly can do that. I think this would make a lot of
sense, but it's quite some work and definitely out of scope for this
patch which is basically meant to be a qcow2 performance fix.
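As a purely hypothetical illustration of that idea (made-up names, not an API anyone proposed here): if batched submission were the only primitive, the single-request entry points could survive as thin wrappers that submit a batch of one, so drivers would only have to implement the batched path.

```c
/* Hypothetical: the old single-request call expressed on top of a batched
 * submission primitive. Simplified stand-in types, not QEMU's. */
#include <assert.h>

typedef struct Req { long sector; int nb_sectors; int done; } Req;

/* Stand-in batched primitive: "submits" every request in the array. */
static int submit_batch(Req *reqs, int num)
{
    for (int i = 0; i < num; i++) {
        reqs[i].done = 1;
    }
    return 0;
}

/* A readv/writev-style entry point becomes a batch of one. */
static int submit_one(Req *req)
{
    return submit_batch(req, 1);
}
```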

Kevin
Christoph Hellwig - Sept. 11, 2009, 6:39 p.m.
On Fri, Sep 11, 2009 at 09:10:20AM +0200, Kevin Wolf wrote:
> >> +    blkreq[*num_writes].sector = req->out->sector;
> >> +    blkreq[*num_writes].nb_sectors = req->qiov.size / 512;
> >> +    blkreq[*num_writes].qiov = &req->qiov;
> >> +    blkreq[*num_writes].cb = virtio_blk_rw_complete;
> >> +    blkreq[*num_writes].opaque = req;
> >> +    blkreq[*num_writes].error = 0;
> >> +
> >> +    (*num_writes)++;
> > 
> > If you pass the completion routine to the function and map the error case
> > to calling the completion routine (which is the usual way to handle errors
> > anyway) this function could become completely generic.
> 
> Except that VirtIOBlockReq doesn't seem to be a type commonly used in
> generic code.

Yeah, we'd need to pass it only as an opaque cookie and the qiov/sector
separately, making the whole thing look more similar to how the block
API works elsewhere.
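A sketch of that decoupled signature (hypothetical names and simplified stand-in types): the queuing function takes only generic block-layer arguments plus the opaque cookie, and the caller's own request type appears only inside its completion callback.

```c
/* The generic side sees only sector/qiov/cb/opaque; VirtIOBlockReq (here a
 * stand-in VirtReq) never leaks into it. Not the real QEMU types. */
#include <assert.h>
#include <stddef.h>

typedef struct QIOVec { size_t size; } QIOVec;   /* stand-in qiov */
typedef void CompletionFunc(void *opaque, int ret);

typedef struct QueuedWrite {
    long sector;
    QIOVec *qiov;
    CompletionFunc *cb;
    void *opaque;              /* caller's request, type unknown here */
} QueuedWrite;

/* Generic queuing: nothing virtio-specific in this signature. */
static void queue_write(QueuedWrite *q, int *n, long sector,
                        QIOVec *qiov, CompletionFunc *cb, void *opaque)
{
    q[*n].sector = sector;
    q[*n].qiov = qiov;
    q[*n].cb = cb;
    q[*n].opaque = opaque;
    (*n)++;
}

/* The virtio-side type stays private to the caller. */
typedef struct VirtReq { int status; QIOVec qiov; long sector; } VirtReq;

static void virt_complete(void *opaque, int ret)
{
    ((VirtReq *)opaque)->status = ret;
}
```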

> > Any chance to just use this batched submission unconditionally and
> > also for reads?  I'd hate to grow even more confusing I/O methods
> > in the block layer.
> 
> If we want to completely obsolete bdrv_aio_readv/writev by batch
> submission functions (not only in block.c but also in each block
> driver), we certainly can do that. I think this would make a lot of
> sense, but it's quite some work and definitely out of scope for this
> patch which is basically meant to be a qcow2 performance fix.

I'm generally not a big fan of incomplete transitions; history tells us
they remain incomplete for a long time or even forever and keep growing
more and more of the old calls.  The persistent existence of the non-AIO
block APIs in qemu is one of those cases.

Patch

diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index c160246..5c88c12 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -252,15 +252,40 @@  static void virtio_blk_handle_scsi(VirtIOBlockReq *req)
 }
 #endif /* __linux__ */
 
-static void virtio_blk_handle_write(VirtIOBlockReq *req)
+static void do_multiwrite(BlockDriverState *bs, BlockRequest *blkreq,
+    int num_writes)
 {
-    BlockDriverAIOCB *acb;
+    int i, ret;
+    ret = bdrv_aio_multiwrite(bs, blkreq, num_writes);
+
+    if (ret != 0) {
+        for (i = 0; i < num_writes; i++) {
+            if (blkreq[i].error) {
+                virtio_blk_req_complete(blkreq[i].opaque, VIRTIO_BLK_S_IOERR);
+            }
+        }
+    }
+}
 
-    acb = bdrv_aio_writev(req->dev->bs, req->out->sector, &req->qiov,
-                          req->qiov.size / 512, virtio_blk_rw_complete, req);
-    if (!acb) {
-        virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
+static void virtio_blk_handle_write(BlockRequest *blkreq, int *num_writes,
+    VirtIOBlockReq *req, BlockDriverState **old_bs)
+{
+    if (req->dev->bs != *old_bs || *num_writes == 32) {
+        if (*old_bs != NULL) {
+            do_multiwrite(*old_bs, blkreq, *num_writes);
+        }
+        *num_writes = 0;
+        *old_bs = req->dev->bs;
     }
+
+    blkreq[*num_writes].sector = req->out->sector;
+    blkreq[*num_writes].nb_sectors = req->qiov.size / 512;
+    blkreq[*num_writes].qiov = &req->qiov;
+    blkreq[*num_writes].cb = virtio_blk_rw_complete;
+    blkreq[*num_writes].opaque = req;
+    blkreq[*num_writes].error = 0;
+
+    (*num_writes)++;
 }
 
 static void virtio_blk_handle_read(VirtIOBlockReq *req)
@@ -278,6 +303,9 @@  static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIOBlock *s = to_virtio_blk(vdev);
     VirtIOBlockReq *req;
+    BlockRequest blkreq[32];
+    int num_writes = 0;
+    BlockDriverState *old_bs = NULL;
 
     while ((req = virtio_blk_get_request(s))) {
         if (req->elem.out_num < 1 || req->elem.in_num < 1) {
@@ -299,13 +327,18 @@  static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
         } else if (req->out->type & VIRTIO_BLK_T_OUT) {
             qemu_iovec_init_external(&req->qiov, &req->elem.out_sg[1],
                                      req->elem.out_num - 1);
-            virtio_blk_handle_write(req);
+            virtio_blk_handle_write(blkreq, &num_writes, req, &old_bs);
         } else {
             qemu_iovec_init_external(&req->qiov, &req->elem.in_sg[0],
                                      req->elem.in_num - 1);
             virtio_blk_handle_read(req);
         }
     }
+
+    if (num_writes > 0) {
+        do_multiwrite(old_bs, blkreq, num_writes);
+    }
+
     /*
      * FIXME: Want to check for completions before returning to guest mode,
      * so cached reads and writes are reported as quickly as possible. But
@@ -324,7 +357,8 @@  static void virtio_blk_dma_restart_bh(void *opaque)
     s->rq = NULL;
 
     while (req) {
-        virtio_blk_handle_write(req);
+        bdrv_aio_writev(req->dev->bs, req->out->sector, &req->qiov,
+            req->qiov.size / 512, virtio_blk_rw_complete, req);
         req = req->next;
     }
 }