Patchwork iscsi: partly avoid iovec linearization in iscsi_aio_writev

login
register
mail settings
Submitter Peter Lieven
Date Nov. 19, 2012, 2:58 p.m.
Message ID <50AA4917.5000402@dlhnet.de>
Download mbox | patch
Permalink /patch/200023/
State New
Headers show

Comments

Peter Lieven - Nov. 19, 2012, 2:58 p.m.
libiscsi expects all write16 data in a linear buffer. If the
iovec only contains one buffer we can skip the linearization
step as well as the additional malloc/free and pass the
buffer directly.

Reported-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
---
  block/iscsi.c |   24 +++++++++++++++---------
  1 file changed, 15 insertions(+), 9 deletions(-)
Paolo Bonzini - Nov. 19, 2012, 3:18 p.m.
Il 19/11/2012 15:58, Peter Lieven ha scritto:
> 
> -    /* XXX we should pass the iovec to write16 to avoid the extra copy */
> -    /* this will allow us to get rid of 'buf' completely */
>      size = nb_sectors * BDRV_SECTOR_SIZE;
> -    acb->buf = g_malloc(size);
> -    qemu_iovec_to_buf(acb->qiov, 0, acb->buf, size);
> +    data.size = size;
> +
> +    /* if the iovec only contains one buffer we can pass it directly */
> +    if (acb->qiov->niov == 1) {
> +        acb->buf = NULL;
> +        data.data = acb->qiov->iov[0].iov_base;
> +    } else {
> +        acb->buf = g_malloc(size);
> +        qemu_iovec_to_buf(acb->qiov, 0, acb->buf, size);
> +        data.data = acb->buf;
> +    }

Looks good, but how hard is it to get rid of the bounce buffer
completely, as mentioned in the comment?  Ronnie?

Paolo
Peter Lieven - Nov. 19, 2012, 3:23 p.m.
Am 19.11.2012 um 16:18 schrieb Paolo Bonzini <pbonzini@redhat.com>:

> Il 19/11/2012 15:58, Peter Lieven ha scritto:
>> 
>> -    /* XXX we should pass the iovec to write16 to avoid the extra copy */
>> -    /* this will allow us to get rid of 'buf' completely */
>>     size = nb_sectors * BDRV_SECTOR_SIZE;
>> -    acb->buf = g_malloc(size);
>> -    qemu_iovec_to_buf(acb->qiov, 0, acb->buf, size);
>> +    data.size = size;
>> +
>> +    /* if the iovec only contains one buffer we can pass it directly */
>> +    if (acb->qiov->niov == 1) {
>> +        acb->buf = NULL;
>> +        data.data = acb->qiov->iov[0].iov_base;
>> +    } else {
>> +        acb->buf = g_malloc(size);
>> +        qemu_iovec_to_buf(acb->qiov, 0, acb->buf, size);
>> +        data.data = acb->buf;
>> +    }
> 
> Looks good, but how hard is it to get rid of the bounce buffer
> completely, as mentioned in the comment?  Ronnie?

afaik, he is working on that. but therefore it needs support for passing
buffers to libiscsi also for write operations (currently thats only possible
for reads).

this was just an easy catch and can be reverted once libiscsi has support
for this. we are working on some abi changes which separate libiscsi
for the scsi-lowlevel operations. we have to make modifications to the
qemu driver then as well, maybe this could be done at the same point.

Peter

> 
> Paolo
ronniesahlberg@gmail.com - Nov. 19, 2012, 3:56 p.m.
On Mon, Nov 19, 2012 at 7:18 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 19/11/2012 15:58, Peter Lieven ha scritto:
>>
>> -    /* XXX we should pass the iovec to write16 to avoid the extra copy */
>> -    /* this will allow us to get rid of 'buf' completely */
>>      size = nb_sectors * BDRV_SECTOR_SIZE;
>> -    acb->buf = g_malloc(size);
>> -    qemu_iovec_to_buf(acb->qiov, 0, acb->buf, size);
>> +    data.size = size;
>> +
>> +    /* if the iovec only contains one buffer we can pass it directly */
>> +    if (acb->qiov->niov == 1) {
>> +        acb->buf = NULL;
>> +        data.data = acb->qiov->iov[0].iov_base;
>> +    } else {
>> +        acb->buf = g_malloc(size);
>> +        qemu_iovec_to_buf(acb->qiov, 0, acb->buf, size);
>> +        data.data = acb->buf;
>> +    }
>
> Looks good, but how hard is it to get rid of the bounce buffer
> completely, as mentioned in the comment?  Ronnie?
>

I am working on that,  but I will need the thanksgiving weekend to
finish it and test it.
It also means breaking the API, since it would introduce new
functionality  so it will take a little while after finishing the
libiscsi part before we could/should introduce it in qemu.

The vast majority of writes seems to be a vector with only a single
element,  so Peters change is a good optimization for the common case,
until I get the proper fix finished and tested and pushed in a new
release of libiscsi.

I.e.  I think Peters change is good for now.  It solves the problem
with "extra memcpy" for the majority of writes until we can get the
full solution ready.


> Paolo
Stefan Hajnoczi - Nov. 20, 2012, 3:51 p.m.
On Mon, Nov 19, 2012 at 03:58:31PM +0100, Peter Lieven wrote:
> libiscsi expects all write16 data in a linear buffer. If the
> iovec only contains one buffer we can skip the linearization
> step as well as the additional malloc/free and pass the
> buffer directly.
> 
> Reported-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  block/iscsi.c |   24 +++++++++++++++---------
>  1 file changed, 15 insertions(+), 9 deletions(-)
> 
> diff --git a/block/iscsi.c b/block/iscsi.c
> index d0b1a10..f12148e 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -199,8 +199,10 @@ iscsi_aio_write16_cb(struct iscsi_context *iscsi, int status,
>      trace_iscsi_aio_write16_cb(iscsi, status, acb, acb->canceled);
> -    g_free(acb->buf);
> -
> +    if (acb->buf != NULL) {
> +        g_free(acb->buf);
> +    }
> +

g_free(NULL) is a nop, so this change isn't necessary.  Poalo: Could be
done when merging the patch?

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Paolo Bonzini - Nov. 20, 2012, 4:16 p.m.
Il 20/11/2012 16:51, Stefan Hajnoczi ha scritto:
> Poalo: Could be
> done when merging the patch?
> 
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

Yes.  Applying this to scsi-next branch.

Paolo

Patch

diff --git a/block/iscsi.c b/block/iscsi.c
index d0b1a10..f12148e 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -199,8 +199,10 @@  iscsi_aio_write16_cb(struct iscsi_context *iscsi, int status,
  
      trace_iscsi_aio_write16_cb(iscsi, status, acb, acb->canceled);
  
-    g_free(acb->buf);
-
+    if (acb->buf != NULL) {
+        g_free(acb->buf);
+    }
+
      if (acb->canceled != 0) {
          return;
      }
@@ -244,11 +246,18 @@  iscsi_aio_writev(BlockDriverState *bs, int64_t sector_num,
      acb->bh         = NULL;
      acb->status     = -EINPROGRESS;
  
-    /* XXX we should pass the iovec to write16 to avoid the extra copy */
-    /* this will allow us to get rid of 'buf' completely */
      size = nb_sectors * BDRV_SECTOR_SIZE;
-    acb->buf = g_malloc(size);
-    qemu_iovec_to_buf(acb->qiov, 0, acb->buf, size);
+    data.size = size;
+
+    /* if the iovec only contains one buffer we can pass it directly */
+    if (acb->qiov->niov == 1) {
+        acb->buf = NULL;
+        data.data = acb->qiov->iov[0].iov_base;
+    } else {
+        acb->buf = g_malloc(size);
+        qemu_iovec_to_buf(acb->qiov, 0, acb->buf, size);
+        data.data = acb->buf;
+    }
  
      acb->task = malloc(sizeof(struct scsi_task));
      if (acb->task == NULL) {
@@ -269,9 +278,6 @@  iscsi_aio_writev(BlockDriverState *bs, int64_t sector_num,
      *(uint32_t *)&acb->task->cdb[10] = htonl(num_sectors);
      acb->task->expxferlen = size;
  
-    data.data = acb->buf;
-    data.size = size;
-
      if (iscsi_scsi_command_async(iscsi, iscsilun->lun, acb->task,
                                   iscsi_aio_write16_cb,
                                   &data,