Patchwork [09/16] scsi-disk: Allocate iovec dynamically

login
register
mail settings
Submitter Hannes Reinecke
Date Nov. 18, 2010, 2:47 p.m.
Message ID <20101118144723.61032F90AB@ochil.suse.de>
Download mbox | patch
Permalink /patch/72104/
State New
Headers show

Comments

Hannes Reinecke - Nov. 18, 2010, 2:47 p.m.
Rather than have the iovec part of the structure with a fixed size
of '1' we should be allocating it dynamically. This will allow us
to pass in SGLs directly.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 hw/scsi-disk.c |  112 +++++++++++++++++++++++++++++++++++---------------------
 1 files changed, 70 insertions(+), 42 deletions(-)
Gerd Hoffmann - Nov. 18, 2010, 3:33 p.m.
Hi,

> +static size_t scsi_req_iov_len(SCSIDiskReq *r)
> +{
> +    size_t iov_len = 0;
> +    int i;
> +
> +    for (i = 0; i<  r->iov_num; i++)
> +        iov_len += r->iov[i].iov_len;
> +
> +    return iov_len;
> +}

You are aware that there is a QEMUIOVector type with helper functions 
which keeps track of both number of elements and total size?

cheers,
   Gerd
Hannes Reinecke - Nov. 18, 2010, 4:28 p.m.
On 11/18/2010 04:33 PM, Gerd Hoffmann wrote:
>   Hi,
> 
>> +static size_t scsi_req_iov_len(SCSIDiskReq *r)
>> +{
>> +    size_t iov_len = 0;
>> +    int i;
>> +
>> +    for (i = 0; i<  r->iov_num; i++)
>> +        iov_len += r->iov[i].iov_len;
>> +
>> +    return iov_len;
>> +}
> 
> You are aware that there is a QEMUIOVector type with helper functions
> which keeps track of both number of elements and total size?
> 
Yes. But I'm passing passing in an entire iovec to the request.
However, the QEMUIOVector routines allow you to add only _one_
element at a time, which is pretty wasteful here.

And I'm counting the resulting length of the iovec, which might have
been changed by read/write operations. For which there is no generic
function either.

But if requested I could easily move it into cutils.c.

Cheers,

Hannes
Kevin Wolf - Nov. 19, 2010, 11:43 a.m.
Am 18.11.2010 17:28, schrieb Hannes Reinecke:
> On 11/18/2010 04:33 PM, Gerd Hoffmann wrote:
>>   Hi,
>>
>>> +static size_t scsi_req_iov_len(SCSIDiskReq *r)
>>> +{
>>> +    size_t iov_len = 0;
>>> +    int i;
>>> +
>>> +    for (i = 0; i<  r->iov_num; i++)
>>> +        iov_len += r->iov[i].iov_len;
>>> +
>>> +    return iov_len;
>>> +}
>>
>> You are aware that there is a QEMUIOVector type with helper functions
>> which keeps track of both number of elements and total size?
>>
> Yes. But I'm passing passing in an entire iovec to the request.
> However, the QEMUIOVector routines allow you to add only _one_
> element at a time, which is pretty wasteful here.

Does the iov need to be changed afterwards, or why doesn't
qemu_iovec_init_external work here?

> And I'm counting the resulting length of the iovec, which might have
> been changed by read/write operations. For which there is no generic
> function either.

Can you explain which kind of read/write operations would change the
iov? This is not completely clear to me.

In general the same information that you're calculating here should be
stored in qiov->size for a QEUMIOVector, but depending what changes you
mean above it may not provide all operations you need.

Kevin
Hannes Reinecke - Nov. 19, 2010, 12:30 p.m.
On 11/19/2010 12:43 PM, Kevin Wolf wrote:
> Am 18.11.2010 17:28, schrieb Hannes Reinecke:
>> On 11/18/2010 04:33 PM, Gerd Hoffmann wrote:
>>>   Hi,
>>>
>>>> +static size_t scsi_req_iov_len(SCSIDiskReq *r)
>>>> +{
>>>> +    size_t iov_len = 0;
>>>> +    int i;
>>>> +
>>>> +    for (i = 0; i<  r->iov_num; i++)
>>>> +        iov_len += r->iov[i].iov_len;
>>>> +
>>>> +    return iov_len;
>>>> +}
>>>
>>> You are aware that there is a QEMUIOVector type with helper functions
>>> which keeps track of both number of elements and total size?
>>>
>> Yes. But I'm passing passing in an entire iovec to the request.
>> However, the QEMUIOVector routines allow you to add only _one_
>> element at a time, which is pretty wasteful here.
> 
> Does the iov need to be changed afterwards, or why doesn't
> qemu_iovec_init_external work here?
> 
Oh, but I _do_ use qemu_iovec_init_external(); cf

hw/scsi_disk.c:scsi_read_data()

and yes, the iovec might be modified by the read/write operation;
see below.

>> And I'm counting the resulting length of the iovec, which might have
>> been changed by read/write operations. For which there is no generic
>> function either.
> 
> Can you explain which kind of read/write operations would change the
> iov? This is not completely clear to me.
> 
It is perfectly valid to send down an iovec larger than the command
would fill out; eg for a SCSI Inquiry you could easily request 255
bytes, but the command itself would only provide you with say 36 bytes.

Using SG_IO you would pass down the iovec (with the full size of
255), and you would be returned the same iovec. However, the
->iov_len parameter of the iovec elements would be modified
depending on the size of the actual data returned. Hence you need to
figure out the resulting size of the iovec by the above command.

> In general the same information that you're calculating here should be
> stored in qiov->size for a QEUMIOVector, but depending what changes you
> mean above it may not provide all operations you need.
> 
Ah. Seem to have missed it. Care to point it out to me?
But I don't think it'll work for the scsi-generic case, where we're
just using SG_IO.
But then, I tend to get lost in the callbacks-upon-callbacks
reference in the block layer, so there's a fair chance I haven't
seen it.

Cheers,

Hannes
Kevin Wolf - Nov. 19, 2010, 12:46 p.m.
Am 19.11.2010 13:30, schrieb Hannes Reinecke:
> On 11/19/2010 12:43 PM, Kevin Wolf wrote:
>> Am 18.11.2010 17:28, schrieb Hannes Reinecke:
>>> On 11/18/2010 04:33 PM, Gerd Hoffmann wrote:
>>>>   Hi,
>>>>
>>>>> +static size_t scsi_req_iov_len(SCSIDiskReq *r)
>>>>> +{
>>>>> +    size_t iov_len = 0;
>>>>> +    int i;
>>>>> +
>>>>> +    for (i = 0; i<  r->iov_num; i++)
>>>>> +        iov_len += r->iov[i].iov_len;
>>>>> +
>>>>> +    return iov_len;
>>>>> +}
>>>>
>>>> You are aware that there is a QEMUIOVector type with helper functions
>>>> which keeps track of both number of elements and total size?
>>>>
>>> Yes. But I'm passing passing in an entire iovec to the request.
>>> However, the QEMUIOVector routines allow you to add only _one_
>>> element at a time, which is pretty wasteful here.
>>
>> Does the iov need to be changed afterwards, or why doesn't
>> qemu_iovec_init_external work here?
>>
> Oh, but I _do_ use qemu_iovec_init_external(); cf
> 
> hw/scsi_disk.c:scsi_read_data()
> 
> and yes, the iovec might be modified by the read/write operation;
> see below.
> 
>>> And I'm counting the resulting length of the iovec, which might have
>>> been changed by read/write operations. For which there is no generic
>>> function either.
>>
>> Can you explain which kind of read/write operations would change the
>> iov? This is not completely clear to me.
>>
> It is perfectly valid to send down an iovec larger than the command
> would fill out; eg for a SCSI Inquiry you could easily request 255
> bytes, but the command itself would only provide you with say 36 bytes.

Okay, so at the point where you create the iovec you don't have this
information yet but rather let the command emulation update the iovec
later. Makes sense.

That's probably something that QEMUIOVectors isn't really designed for,
so just forget what I said. :-)

Kevin

Patch

diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index a71607e..2d2e770 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -56,7 +56,10 @@  typedef struct SCSIDiskReq {
     /* Both sector and sector_count are in terms of qemu 512 byte blocks.  */
     uint64_t sector;
     uint32_t sector_count;
-    struct iovec iov;
+    uint8_t *iov_buf;
+    uint64_t iov_len;
+    struct iovec *iov;
+    int iov_num;
     QEMUIOVector qiov;
     uint32_t status;
 } SCSIDiskReq;
@@ -86,13 +89,19 @@  static SCSIDiskReq *scsi_new_request(SCSIDiskState *s, uint32_t tag,
 
     req = scsi_req_alloc(sizeof(SCSIDiskReq), &s->qdev, tag, lun);
     r = DO_UPCAST(SCSIDiskReq, req, req);
-    r->iov.iov_base = qemu_blockalign(s->bs, SCSI_DMA_BUF_SIZE);
+    r->iov_buf = qemu_blockalign(s->bs, SCSI_DMA_BUF_SIZE);
+    r->iov = qemu_mallocz(sizeof(struct iovec));
+    r->iov[0].iov_base = r->iov_buf;
+    r->iov_num = 1;
     return r;
 }
 
 static void scsi_remove_request(SCSIDiskReq *r)
 {
-    qemu_vfree(r->iov.iov_base);
+    qemu_vfree(r->iov);
+    r->iov = NULL;
+    qemu_vfree(r->iov_buf);
+    r->iov_buf = NULL;
     scsi_req_free(&r->req);
 }
 
@@ -114,10 +123,21 @@  static void scsi_req_set_status(SCSIDiskReq *r, int status, SCSISense sense)
     s->sense = sense;
 }
 
+static size_t scsi_req_iov_len(SCSIDiskReq *r)
+{
+    size_t iov_len = 0;
+    int i;
+
+    for (i = 0; i < r->iov_num; i++)
+        iov_len += r->iov[i].iov_len;
+
+    return iov_len;
+}
+
 /* Helper function for command completion.  */
 static void scsi_command_complete(SCSIDiskReq *r, int status, SCSISense sense)
 {
-    DPRINTF("Command complete tag=0x%x status=%d sense=%d/%d/%d\n",
+    DPRINTF("Command complete tag=0x%x status=%d sense=%02x/%02x/%02x\n",
             r->req.tag, status, sense.key, sense.asc, sense.ascq);
     scsi_req_set_status(r, status, sense);
     scsi_req_complete(&r->req);
@@ -142,7 +162,7 @@  static void scsi_cancel_io(SCSIDevice *d, uint32_t tag)
 static void scsi_read_complete(void * opaque, int ret)
 {
     SCSIDiskReq *r = (SCSIDiskReq *)opaque;
-    int n;
+    size_t iov_len = 0;
 
     r->req.aiocb = NULL;
 
@@ -151,13 +171,11 @@  static void scsi_read_complete(void * opaque, int ret)
             return;
         }
     }
+    iov_len = scsi_req_iov_len(r);
 
-    DPRINTF("Data ready tag=0x%x len=%zd\n", r->req.tag, r->iov.iov_len);
+    DPRINTF("Data ready tag=0x%x len=%zd\n", r->req.tag, iov_len);
 
-    n = r->iov.iov_len / 512;
-    r->sector += n;
-    r->sector_count -= n;
-    r->req.bus->complete(r->req.bus, SCSI_REASON_DATA, r->req.tag, r->iov.iov_len);
+    r->req.bus->complete(r->req.bus, SCSI_REASON_DATA, r->req.tag, iov_len);
 }
 
 
@@ -167,9 +185,10 @@  static void scsi_read_request(SCSIDiskReq *r)
     uint32_t n;
 
     if (r->sector_count == (uint32_t)-1) {
-        DPRINTF("Read buf_len=%zd\n", r->iov.iov_len);
+        DPRINTF("Read buf_len=%zd\n", r->iov[0].iov_len);
         r->sector_count = 0;
-        r->req.bus->complete(r->req.bus, SCSI_REASON_DATA, r->req.tag, r->iov.iov_len);
+        r->req.bus->complete(r->req.bus, SCSI_REASON_DATA, r->req.tag,
+                             r->iov[0].iov_len);
         return;
     }
     DPRINTF("Read sector_count=%d\n", r->sector_count);
@@ -179,15 +198,21 @@  static void scsi_read_request(SCSIDiskReq *r)
     }
 
     n = r->sector_count;
-    if (n > SCSI_DMA_BUF_SIZE / 512)
-        n = SCSI_DMA_BUF_SIZE / 512;
+    if (r->iov_buf) {
+        /* Reset iovec */
+        if (n > SCSI_DMA_BUF_SIZE / 512)
+            n = SCSI_DMA_BUF_SIZE / 512;
+        r->iov[0].iov_len = n * 512;
+    }
 
-    r->iov.iov_len = n * 512;
-    qemu_iovec_init_external(&r->qiov, &r->iov, 1);
+    qemu_iovec_init_external(&r->qiov, r->iov, r->iov_num);
     r->req.aiocb = bdrv_aio_readv(s->bs, r->sector, &r->qiov, n,
                               scsi_read_complete, r);
     if (r->req.aiocb == NULL) {
         scsi_read_complete(r, -EIO);
+    } else {
+        r->sector += n;
+        r->sector_count -= n;
     }
 }
 
@@ -264,17 +289,20 @@  static void scsi_write_complete(void * opaque, int ret)
         }
     }
 
-    n = r->iov.iov_len / 512;
+    n = scsi_req_iov_len(r) / 512;
     r->sector += n;
     r->sector_count -= n;
     if (r->sector_count == 0) {
         scsi_command_complete(r, GOOD, SENSE_CODE(NO_SENSE));
     } else {
         len = r->sector_count * 512;
-        if (len > SCSI_DMA_BUF_SIZE) {
-            len = SCSI_DMA_BUF_SIZE;
+        if (r->iov_buf) {
+            /* Reset iovec */
+            if (len > SCSI_DMA_BUF_SIZE) {
+                len = SCSI_DMA_BUF_SIZE;
+            }
+            r->iov[0].iov_len = len;
         }
-        r->iov.iov_len = len;
         DPRINTF("Write complete tag=0x%x more=%d\n", r->req.tag, len);
         r->req.bus->complete(r->req.bus, SCSI_REASON_DATA, r->req.tag, len);
     }
@@ -285,9 +313,9 @@  static void scsi_write_request(SCSIDiskReq *r)
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
     uint32_t n;
 
-    n = r->iov.iov_len / 512;
+    n = scsi_req_iov_len(r) / 512;
     if (n) {
-        qemu_iovec_init_external(&r->qiov, &r->iov, 1);
+        qemu_iovec_init_external(&r->qiov, r->iov, r->iov_num);
         r->req.aiocb = bdrv_aio_writev(s->bs, r->sector, &r->qiov, n,
                                    scsi_write_complete, r);
         if (r->req.aiocb == NULL) {
@@ -352,7 +380,7 @@  static void scsi_dma_restart_bh(void *opaque)
                 scsi_write_request(r);
                 break;
             case SCSI_REQ_STATUS_RETRY_FLUSH:
-                ret = scsi_disk_emulate_command(r, r->iov.iov_base);
+                ret = scsi_disk_emulate_command(r, r->iov[0].iov_base);
                 if (ret == 0) {
                     scsi_command_complete(r, GOOD, SENSE_CODE(NO_SENSE));
                 }
@@ -385,7 +413,7 @@  static uint8_t *scsi_get_buf(SCSIDevice *d, uint32_t tag)
         BADF("Bad buffer tag 0x%x\n", tag);
         return NULL;
     }
-    return (uint8_t *)r->iov.iov_base;
+    return r->iov_buf;
 }
 
 static int scsi_disk_emulate_inquiry(SCSIRequest *req, uint8_t *outbuf)
@@ -1001,12 +1029,10 @@  static int32_t scsi_send_command(SCSIDevice *d, uint32_t tag,
                                  uint8_t *buf, int lun)
 {
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, d);
-    uint32_t len;
+    ssize_t len = 0;
     int is_write;
     uint8_t command;
-    uint8_t *outbuf;
     SCSIDiskReq *r;
-    int rc;
 
     command = buf[0];
     r = scsi_find_request(s, tag);
@@ -1017,7 +1043,6 @@  static int32_t scsi_send_command(SCSIDevice *d, uint32_t tag,
     /* ??? Tags are not unique for different luns.  We only implement a
        single lun, so this should not matter.  */
     r = scsi_new_request(s, tag, lun);
-    outbuf = (uint8_t *)r->iov.iov_base;
     is_write = 0;
     DPRINTF("Command: lun=%d tag=0x%x data=0x%02x", lun, tag, buf[0]);
 
@@ -1065,23 +1090,25 @@  static int32_t scsi_send_command(SCSIDevice *d, uint32_t tag,
     case REPORT_LUNS:
     case VERIFY:
     case REZERO_UNIT:
-        rc = scsi_disk_emulate_command(r, outbuf);
-        if (rc < 0) {
+        len = scsi_disk_emulate_command(r, r->iov[0].iov_base);
+        if (len < 0) {
             return 0;
         }
 
-        r->iov.iov_len = rc;
+        r->iov[0].iov_len = len;
         break;
     case READ_6:
     case READ_10:
     case READ_12:
     case READ_16:
-        len = r->req.cmd.xfer / d->blocksize;
-        DPRINTF("Read (sector %" PRId64 ", count %d)\n", r->req.cmd.lba, len);
-        if (r->req.cmd.lba > s->max_lba)
+        r->sector_count = r->req.cmd.xfer / d->blocksize * s->cluster_size;
+        DPRINTF("Read (sector %" PRId64 ", blocks %d)\n", r->req.cmd.lba,
+                r->sector_count);
+        if (r->req.cmd.lba > s->max_lba) {
+            r->sector_count = 0;
             goto illegal_lba;
+        }
         r->sector = r->req.cmd.lba * s->cluster_size;
-        r->sector_count = len * s->cluster_size;
         break;
     case WRITE_6:
     case WRITE_10:
@@ -1090,14 +1117,15 @@  static int32_t scsi_send_command(SCSIDevice *d, uint32_t tag,
     case WRITE_VERIFY:
     case WRITE_VERIFY_12:
     case WRITE_VERIFY_16:
-        len = r->req.cmd.xfer / d->blocksize;
-        DPRINTF("Write %s(sector %" PRId64 ", count %d)\n",
+        r->sector_count = r->req.cmd.xfer / d->blocksize * s->cluster_size;
+        DPRINTF("Write %s(sector %" PRId64 ", blocks %d)\n",
                 (command & 0xe) == 0xe ? "And Verify " : "",
-                r->req.cmd.lba, len);
-        if (r->req.cmd.lba > s->max_lba)
+                r->req.cmd.lba, r->sector_count);
+        if (r->req.cmd.lba > s->max_lba) {
+            r->sector_count = 0;
             goto illegal_lba;
+        }
         r->sector = r->req.cmd.lba * s->cluster_size;
-        r->sector_count = len * s->cluster_size;
         is_write = 1;
         break;
     case MODE_SELECT:
@@ -1135,10 +1163,10 @@  static int32_t scsi_send_command(SCSIDevice *d, uint32_t tag,
         scsi_command_complete(r, CHECK_CONDITION, SENSE_CODE(LBA_OUT_OF_RANGE));
         return 0;
     }
-    if (r->sector_count == 0 && r->iov.iov_len == 0) {
+    if (r->sector_count == 0 && len == 0) {
         scsi_command_complete(r, GOOD, SENSE_CODE(NO_SENSE));
     }
-    len = r->sector_count * 512 + r->iov.iov_len;
+    len += r->sector_count * 512;
     if (is_write) {
         return -len;
     } else {