Message ID: 1252511618-19497-3-git-send-email-kwolf@redhat.com
State: Superseded
On Wed, Sep 09, 2009 at 05:53:38PM +0200, Kevin Wolf wrote:
> +static void do_multiwrite(BlockDriverState *bs, BlockRequest *blkreq,
> +    int num_writes)
> {
> -    BlockDriverAIOCB *acb;
> +    int i, ret;
> +    ret = bdrv_aio_multiwrite(bs, blkreq, num_writes);
> +
> +    if (ret != 0) {
> +        for (i = 0; i < num_writes; i++) {
> +            if (blkreq[i].error) {
> +                virtio_blk_req_complete(blkreq[i].opaque, VIRTIO_BLK_S_IOERR);
> +            }
> +        }
> +    }
> +}
>
> +static void virtio_blk_handle_write(BlockRequest *blkreq, int *num_writes,
> +    VirtIOBlockReq *req, BlockDriverState **old_bs)
> +{
> +    if (req->dev->bs != *old_bs || *num_writes == 32) {

I was under the impression we had one queue per device, so the first
condition should never happen.

> +        if (*old_bs != NULL) {
> +            do_multiwrite(*old_bs, blkreq, *num_writes);
> +        }
> +        *num_writes = 0;
> +        *old_bs = req->dev->bs;
>     }
> +
> +    blkreq[*num_writes].sector = req->out->sector;
> +    blkreq[*num_writes].nb_sectors = req->qiov.size / 512;
> +    blkreq[*num_writes].qiov = &req->qiov;
> +    blkreq[*num_writes].cb = virtio_blk_rw_complete;
> +    blkreq[*num_writes].opaque = req;
> +    blkreq[*num_writes].error = 0;
> +
> +    (*num_writes)++;

If you pass the completion routine to the function and map the error case
to calling the completion routine (which is the usual way to handle errors
anyway), this function could become completely generic.

I think we also need to actually store the iocb in the BlockRequest;
ide and scsi use it to cancel I/O on migration (why virtio doesn't is on
my TODO list to investigate) or in some other cases.

Another improvement to the data structure would be to have a container
structure containing the BlockRequest array and num_writes, at which
point this actually becomes a pretty clean abstraction, which we could
also use to submit multiple iocbs in the native AIO code.

Any chance to just use this batched submission unconditionally and also
for reads? I'd hate to grow even more confusing I/O methods in the
block layer.
On 11.09.2009 00:41, Christoph Hellwig wrote:
> On Wed, Sep 09, 2009 at 05:53:38PM +0200, Kevin Wolf wrote:
>> +static void do_multiwrite(BlockDriverState *bs, BlockRequest *blkreq,
>> +    int num_writes)
>> {
>> -    BlockDriverAIOCB *acb;
>> +    int i, ret;
>> +    ret = bdrv_aio_multiwrite(bs, blkreq, num_writes);
>> +
>> +    if (ret != 0) {
>> +        for (i = 0; i < num_writes; i++) {
>> +            if (blkreq[i].error) {
>> +                virtio_blk_req_complete(blkreq[i].opaque, VIRTIO_BLK_S_IOERR);
>> +            }
>> +        }
>> +    }
>> +}
>>
>> +static void virtio_blk_handle_write(BlockRequest *blkreq, int *num_writes,
>> +    VirtIOBlockReq *req, BlockDriverState **old_bs)
>> +{
>> +    if (req->dev->bs != *old_bs || *num_writes == 32) {
>
> I was under the impression we had one queue per device, so the first
> condition should never happen.

Maybe, don't know? ;-) Makes perfect sense, so probably you're right. I was
just trying to be better safe than sorry. I can take it out if you prefer.

>> +        if (*old_bs != NULL) {
>> +            do_multiwrite(*old_bs, blkreq, *num_writes);
>> +        }
>> +        *num_writes = 0;
>> +        *old_bs = req->dev->bs;
>>     }
>> +
>> +    blkreq[*num_writes].sector = req->out->sector;
>> +    blkreq[*num_writes].nb_sectors = req->qiov.size / 512;
>> +    blkreq[*num_writes].qiov = &req->qiov;
>> +    blkreq[*num_writes].cb = virtio_blk_rw_complete;
>> +    blkreq[*num_writes].opaque = req;
>> +    blkreq[*num_writes].error = 0;
>> +
>> +    (*num_writes)++;
>
> If you pass the completion routine to the function and map the error case
> to calling the completion routine (which is the usual way to handle errors
> anyway), this function could become completely generic.

Except that VirtIOBlockReq doesn't seem to be a type commonly used in
generic code.

> I think we also need to actually store the iocb in the BlockRequest;
> ide and scsi use it to cancel I/O on migration (why virtio doesn't is on
> my TODO list to investigate) or in some other cases.

Right, this is something that is still missing.

Considering that virtio, as the only multiwrite user, doesn't use it (yet)
anyway, and that it's not completely trivial (the request to be cancelled
could have been merged), I would prefer to introduce this in a patch on
top. The bdrv_aio_multiwrite patch is already larger than I would have
liked it to be.

> Another improvement to the data structure would be to have a container
> structure containing the BlockRequest array and num_writes, at which
> point this actually becomes a pretty clean abstraction, which we could
> also use to submit multiple iocbs in the native AIO code.

Ok, I'm not opposed to this.

> Any chance to just use this batched submission unconditionally and also
> for reads? I'd hate to grow even more confusing I/O methods in the
> block layer.

If we want to completely obsolete bdrv_aio_readv/writev by batch submission
functions (not only in block.c but also in each block driver), we certainly
can do that. I think this would make a lot of sense, but it's quite some
work and definitely out of scope for this patch, which is basically meant
to be a qcow2 performance fix.

Kevin
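Sketched concretely, the container structure being discussed could bundle the BlockRequest array and its counter into one object, with a generic enqueue function that takes the completion callback and an opaque cookie instead of a VirtIOBlockReq. This is a minimal illustrative model under assumed names (MultiReqBuffer, multireq_add, MAX_MERGED_REQS are hypothetical here), not actual QEMU code:

```c
#include <stddef.h>

/* Stand-ins for the QEMU types under discussion; only used via pointer. */
typedef struct QEMUIOVector QEMUIOVector;
typedef void BlockCompletionFunc(void *opaque, int ret);

typedef struct BlockRequest {
    long sector;
    int nb_sectors;
    QEMUIOVector *qiov;
    BlockCompletionFunc *cb;
    void *opaque;
    int error;
} BlockRequest;

#define MAX_MERGED_REQS 32

/* The suggested container: the request array and its counter travel
 * together, so callers pass one object instead of (array, int *) pairs. */
typedef struct MultiReqBuffer {
    BlockRequest reqs[MAX_MERGED_REQS];
    int num_reqs;
} MultiReqBuffer;

/* Generic enqueue: takes sector/qiov/cb/opaque directly rather than a
 * VirtIOBlockReq, so nothing virtio-specific leaks into the interface.
 * Returns 1 when the buffer is full and the caller must flush it. */
static int multireq_add(MultiReqBuffer *mrb, long sector, int nb_sectors,
                        QEMUIOVector *qiov, BlockCompletionFunc *cb,
                        void *opaque)
{
    BlockRequest *r = &mrb->reqs[mrb->num_reqs++];

    r->sector = sector;
    r->nb_sectors = nb_sectors;
    r->qiov = qiov;
    r->cb = cb;
    r->opaque = opaque;
    r->error = 0;

    return mrb->num_reqs == MAX_MERGED_REQS;
}
```

With this shape, a flush helper taking only (BlockDriverState *, MultiReqBuffer *) would replace the (array, int *, BlockDriverState **) triple threaded through virtio_blk_handle_write(), and the same buffer type could be reused for batched iocb submission in the native AIO code.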
On Fri, Sep 11, 2009 at 09:10:20AM +0200, Kevin Wolf wrote:
>>> +    blkreq[*num_writes].sector = req->out->sector;
>>> +    blkreq[*num_writes].nb_sectors = req->qiov.size / 512;
>>> +    blkreq[*num_writes].qiov = &req->qiov;
>>> +    blkreq[*num_writes].cb = virtio_blk_rw_complete;
>>> +    blkreq[*num_writes].opaque = req;
>>> +    blkreq[*num_writes].error = 0;
>>> +
>>> +    (*num_writes)++;
>>
>> If you pass the completion routine to the function and map the error case
>> to calling the completion routine (which is the usual way to handle errors
>> anyway), this function could become completely generic.
>
> Except that VirtIOBlockReq doesn't seem to be a type commonly used in
> generic code.

Yeah, we'd need to pass it only as an opaque cookie and the qiov/sector
separately, making the whole thing look more similar to how the block API
works elsewhere.

>> Any chance to just use this batched submission unconditionally and also
>> for reads? I'd hate to grow even more confusing I/O methods in the
>> block layer.
>
> If we want to completely obsolete bdrv_aio_readv/writev by batch
> submission functions (not only in block.c but also in each block driver),
> we certainly can do that. I think this would make a lot of sense, but
> it's quite some work and definitely out of scope for this patch, which is
> basically meant to be a qcow2 performance fix.

I'm generally not a big fan of incomplete transitions; history tells us
they will remain incomplete for a long time or even forever and grow more
and more of the old calls. The persistent existence of the non-AIO block
APIs in qemu is one of those cases.
diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index c160246..5c88c12 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -252,15 +252,40 @@ static void virtio_blk_handle_scsi(VirtIOBlockReq *req)
 }
 #endif /* __linux__ */
 
-static void virtio_blk_handle_write(VirtIOBlockReq *req)
+static void do_multiwrite(BlockDriverState *bs, BlockRequest *blkreq,
+    int num_writes)
 {
-    BlockDriverAIOCB *acb;
+    int i, ret;
+    ret = bdrv_aio_multiwrite(bs, blkreq, num_writes);
+
+    if (ret != 0) {
+        for (i = 0; i < num_writes; i++) {
+            if (blkreq[i].error) {
+                virtio_blk_req_complete(blkreq[i].opaque, VIRTIO_BLK_S_IOERR);
+            }
+        }
+    }
+}
 
-    acb = bdrv_aio_writev(req->dev->bs, req->out->sector, &req->qiov,
-                          req->qiov.size / 512, virtio_blk_rw_complete, req);
-    if (!acb) {
-        virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
+static void virtio_blk_handle_write(BlockRequest *blkreq, int *num_writes,
+    VirtIOBlockReq *req, BlockDriverState **old_bs)
+{
+    if (req->dev->bs != *old_bs || *num_writes == 32) {
+        if (*old_bs != NULL) {
+            do_multiwrite(*old_bs, blkreq, *num_writes);
+        }
+        *num_writes = 0;
+        *old_bs = req->dev->bs;
     }
+
+    blkreq[*num_writes].sector = req->out->sector;
+    blkreq[*num_writes].nb_sectors = req->qiov.size / 512;
+    blkreq[*num_writes].qiov = &req->qiov;
+    blkreq[*num_writes].cb = virtio_blk_rw_complete;
+    blkreq[*num_writes].opaque = req;
+    blkreq[*num_writes].error = 0;
+
+    (*num_writes)++;
 }
 
 static void virtio_blk_handle_read(VirtIOBlockReq *req)
@@ -278,6 +303,9 @@ static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIOBlock *s = to_virtio_blk(vdev);
     VirtIOBlockReq *req;
+    BlockRequest blkreq[32];
+    int num_writes = 0;
+    BlockDriverState *old_bs = NULL;
 
     while ((req = virtio_blk_get_request(s))) {
         if (req->elem.out_num < 1 || req->elem.in_num < 1) {
@@ -299,13 +327,18 @@ static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
         } else if (req->out->type & VIRTIO_BLK_T_OUT) {
             qemu_iovec_init_external(&req->qiov,
                 &req->elem.out_sg[1], req->elem.out_num - 1);
-            virtio_blk_handle_write(req);
+            virtio_blk_handle_write(blkreq, &num_writes, req, &old_bs);
         } else {
             qemu_iovec_init_external(&req->qiov, &req->elem.in_sg[0],
                 req->elem.in_num - 1);
             virtio_blk_handle_read(req);
         }
     }
+
+    if (num_writes > 0) {
+        do_multiwrite(old_bs, blkreq, num_writes);
+    }
+
     /*
      * FIXME: Want to check for completions before returning to guest mode,
      * so cached reads and writes are reported as quickly as possible. But
@@ -324,7 +357,8 @@ static void virtio_blk_dma_restart_bh(void *opaque)
     s->rq = NULL;
 
     while (req) {
-        virtio_blk_handle_write(req);
+        bdrv_aio_writev(req->dev->bs, req->out->sector, &req->qiov,
+            req->qiov.size / 512, virtio_blk_rw_complete, req);
         req = req->next;
     }
 }
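The error path of do_multiwrite() in the patch above can be modeled in isolation: when bdrv_aio_multiwrite() fails, only the entries whose error field the driver flagged are completed with VIRTIO_BLK_S_IOERR, while the rest complete through their callbacks as usual. A self-contained sketch with stub types and a counting stub for virtio_blk_req_complete() (the function name multiwrite_fail_flagged is made up for the illustration; this is not QEMU code):

```c
#include <stddef.h>

enum { VIRTIO_BLK_S_OK = 0, VIRTIO_BLK_S_IOERR = 1 };

/* Reduced BlockRequest: only the fields the error path touches. */
typedef struct BlockRequest {
    void *opaque;
    int error;
} BlockRequest;

/* Stub standing in for virtio_blk_req_complete(): counts how many
 * requests were failed back to the guest. */
static int ioerr_completions;

static void virtio_blk_req_complete(void *opaque, int status)
{
    (void)opaque;
    if (status == VIRTIO_BLK_S_IOERR) {
        ioerr_completions++;
    }
}

/* Same shape as the error handling in do_multiwrite(): a failed batch
 * submission (ret != 0) fails back exactly the flagged entries. */
static void multiwrite_fail_flagged(BlockRequest *blkreq, int num_writes,
                                    int ret)
{
    int i;

    if (ret != 0) {
        for (i = 0; i < num_writes; i++) {
            if (blkreq[i].error) {
                virtio_blk_req_complete(blkreq[i].opaque, VIRTIO_BLK_S_IOERR);
            }
        }
    }
}
```

This is also where Christoph's point about the iocb matters: without the BlockDriverAIOCB stored per BlockRequest, a merged request cannot be cancelled the way ide and scsi cancel in-flight I/O on migration.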
It is quite common for virtio-blk to submit more than one write request in
a row to the qemu block layer. Use bdrv_aio_multiwrite to allow block
drivers to optimize their handling of the requests.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 hw/virtio-blk.c |   50 ++++++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 42 insertions(+), 8 deletions(-)