diff mbox

ISCSI: Add support for thin-provisioning via discard/UNMAP and bigger LUNs

Message ID 1335248944-10765-2-git-send-email-ronniesahlberg@gmail.com
State New
Headers show

Commit Message

ronnie sahlberg April 24, 2012, 6:29 a.m. UTC
Update the configure test for libiscsi support to detect version 1.3 or later.
Version 1.3 of libiscsi provides both READCAPACITY16 as well as UNMAP commands.

Update the iscsi block layer to use READCAPACITY16 to detect the size of the LUN instead of READCAPACITY10. This allows support for LUNs larger than 2TB.

Update to implement bdrv_aio_discard() using the UNMAP command.
This allows us to use thin-provisioned LUNs from TGTD and other iSCSI targets that support thin-provisioning.

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
---
 block/iscsi.c |   86 ++++++++++++++++++++++++++++++++++++++++++++++++--------
 configure     |    5 ++-
 2 files changed, 77 insertions(+), 14 deletions(-)

Comments

Paolo Bonzini April 24, 2012, 6:46 a.m. UTC | #1
Il 24/04/2012 08:29, Ronnie Sahlberg ha scritto:
> Update the configure test for libiscsi support to detect version 1.3 or later.
> Version 1.3 of libiscsi provides both READCAPACITY16 as well as UNMAP commands.
> 
> Update the iscsi block layer to use READCAPACITY16 to detect the size of the LUN instead of READCAPACITY10. This allows support for LUNs larger than 2TB.
> 
> Update to implement bdrv_aio_discard() using the UNMAP command.
> This allows us to use thin-provisioned LUNs from TGTD and other iSCSI targets that support thin-provisioning.

Looks good.  Kevin, do you want me to take libiscsi patches via the SCSI
tree?

As an aside, I am not really sure of the utility of adding these utility
functions directly in libiscsi, rather than making it a pure transport
library.  block/iscsi.c is going to grow as you add more functionality
(e.g. WRITE SAME commands), and libiscsi will have to be updated each
time in lockstep.

I can see the value of basic read/write/flush and readcap10/16, but with
unmap it's starting to be a bit more specific.  Are there other clients
of libiscsi that use these functions?  Should they be placed into
block/iscsi.c or a new block/iscsi-cdb.c instead?

Paolo
Kevin Wolf April 24, 2012, 7:41 a.m. UTC | #2
Am 24.04.2012 08:46, schrieb Paolo Bonzini:
> Il 24/04/2012 08:29, Ronnie Sahlberg ha scritto:
>> Update the configure test for libiscsi support to detect version 1.3 or later.
>> Version 1.3 of libiscsi provides both READCAPACITY16 as well as UNMAP commands.
>>
>> Update the iscsi block layer to use READCAPACITY16 to detect the size of the LUN instead of READCAPACITY10. This allows support for LUNs larger than 2TB.
>>
>> Update to implement bdrv_aio_discard() using the UNMAP command.
>> This allows us to use thin-provisioned LUNs from TGTD and other iSCSI targets that support thin-provisioning.
> 
> Looks good.  Kevin, do you want me to take libiscsi patches via the SCSI
> tree?

Sure, if you like, go ahead. Feel free to update MAINTAINERS as well.

> As an aside, I am not really sure of the utility of adding these utility
> functions directly in libiscsi, rather than making it a pure transport
> library.  block/iscsi.c is going to grow as you add more functionality
> (e.g. WRITE SAME commands), and libiscsi will have to be updated each
> time in lockstep.
> 
> I can see the value of basic read/write/flush and readcap10/16, but with
> unmap it's starting to be a bit more specific.  Are there other clients
> of libiscsi that use these functions?  Should they be placed into
> block/iscsi.c or a new block/iscsi-cdb.c instead?

I think I agree. For the more obscure commands, the qemu driver should
probably build the CDB on its own and use a generic function.

Kevin
ronnie sahlberg April 24, 2012, 8:02 a.m. UTC | #3
On Tue, Apr 24, 2012 at 4:46 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 24/04/2012 08:29, Ronnie Sahlberg ha scritto:
>> Update the configure test for libiscsi support to detect version 1.3 or later.
>> Version 1.3 of libiscsi provides both READCAPACITY16 as well as UNMAP commands.
>>
>> Update the iscsi block layer to use READCAPACITY16 to detect the size of the LUN instead of READCAPACITY10. This allows support for LUNs larger than 2TB.
>>
>> Update to implement bdrv_aio_discard() using the UNMAP command.
>> This allows us to use thin-provisioned LUNs from TGTD and other iSCSI targets that support thin-provisioning.
>
> Looks good.  Kevin, do you want me to take libiscsi patches via the SCSI
> tree?
>
> As an aside, I am not really sure of the utility of adding these utility
> functions directly in libiscsi, rather than making it a pure transport
> library.  block/iscsi.c is going to grow as you add more functionality
> (e.g. WRITE SAME commands), and libiscsi will have to be updated each
> time in lockstep.
>
> I can see the value of basic read/write/flush and readcap10/16, but with
> unmap it's starting to be a bit more specific.  Are there other clients
> of libiscsi that use these functions?  Should they be placed into
> block/iscsi.c or a new block/iscsi-cdb.c instead?

I see your point.
I like to add scsi commands as I use them to libiscsi, since then I
dont have to
re-write the marshalling/unmarshalling code everytime in my small test
and utility programs.
For example when i want to write some one-off small tools to test something.


But, yes. There is no real need to use them directly from qemu.


So ignore this patch for now. I will redo UNMAP in the patch to
instead use the generic scsi function inside libiscsi.

That will serve the purpose to verify that the public API in libiscsi
is sufficient for accessing the generic transport and
secondly that will show a useful example on how to send CDB+data to
and to receive data back from the generic function.

This generic API would be what a future virt-scsi->libiscsi
integration would use anyway.



regards
ronnie sahlberg
Paolo Bonzini April 24, 2012, 8:05 a.m. UTC | #4
Il 24/04/2012 10:02, ronnie sahlberg ha scritto:
> So ignore this patch for now. I will redo UNMAP in the patch to
> instead use the generic scsi function inside libiscsi.
> 
> That will serve the purpose to verify that the public API in libiscsi
> is sufficient for accessing the generic transport and
> secondly that will show a useful example on how to send CDB+data to
> and to receive data back from the generic function.
> 
> This generic API would be what a future virt-scsi->libiscsi
> integration would use anyway.

I will be on holiday from tomorrow to May 1, and I won't be able to send
a pull request today.  As I want to minimize the time I spend looking at
qemu-devel, :) I'll take this patch as is, since the new libiscsi
version is needed anyway for READ CAPACITY (16).

You could write a follow-up patch to teach iscsi.c about WRITE SAME(10)
and WRITE SAME(16) with the unmap bit set, and use generic CDB+data
functions for those.

Paolo
ronnie sahlberg April 24, 2012, 8:11 a.m. UTC | #5
On Tue, Apr 24, 2012 at 6:05 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 24/04/2012 10:02, ronnie sahlberg ha scritto:
>> So ignore this patch for now. I will redo UNMAP in the patch to
>> instead use the generic scsi function inside libiscsi.
>>
>> That will serve the purpose to verify that the public API in libiscsi
>> is sufficient for accessing the generic transport and
>> secondly that will show a useful example on how to send CDB+data to
>> and to receive data back from the generic function.
>>
>> This generic API would be what a future virt-scsi->libiscsi
>> integration would use anyway.
>
> I will be on holiday from tomorrow to May 1, and I won't be able to send
> a pull request today.  As I want to minimize the time I spend looking at
> qemu-devel, :) I'll take this patch as is, since the new libiscsi
> version is needed anyway for READ CAPACITY (16).
>
> You could write a follow-up patch to teach iscsi.c about WRITE SAME(10)
> and WRITE SAME(16) with the unmap bit set, and use generic CDB+data
> functions for those.
>

Ok. That works for me. Thanks.

regards
ronnie sahlberg
Paolo Bonzini May 4, 2012, 8:06 a.m. UTC | #6
Il 24/04/2012 08:29, Ronnie Sahlberg ha scritto:
> +    itask->bs->total_sectors    = rc16->returned_lba *
> +                               rc16->block_length / BDRV_SECTOR_SIZE ;

Ronnie, does this need to be "(rc16->returned_lba + 1) * ..."?

READ CAPACITY returns the highest valid LBA, not the size.

Please send a patch to fix this up if that's the case.

Paolo
ronnie sahlberg May 4, 2012, 8:20 a.m. UTC | #7
On Fri, May 4, 2012 at 6:06 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 24/04/2012 08:29, Ronnie Sahlberg ha scritto:
>> +    itask->bs->total_sectors    = rc16->returned_lba *
>> +                               rc16->block_length / BDRV_SECTOR_SIZE ;
>
> Ronnie, does this need to be "(rc16->returned_lba + 1) * ..."?
>
> READ CAPACITY returns the highest valid LBA, not the size.
>
> Please send a patch to fix this up if that's the case.
>

Absolutely. A patch is on its way.
This is probably the third time I have made the same mistake of
RC10/16 returns the lba of last block, not the number of blocks.
I think READCAPACITY is just not compatible with my brain here.


regards
ronnie sahlberg
diff mbox

Patch

diff --git a/block/iscsi.c b/block/iscsi.c
index bd3ca11..eb49093 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -383,6 +383,65 @@  iscsi_aio_flush(BlockDriverState *bs,
     return &acb->common;
 }
 
+static void
+iscsi_unmap_cb(struct iscsi_context *iscsi, int status,
+                     void *command_data, void *opaque)
+{
+    IscsiAIOCB *acb = opaque;
+
+    if (acb->canceled != 0) {
+        qemu_aio_release(acb);
+        scsi_free_scsi_task(acb->task);
+        acb->task = NULL;
+        return;
+    }
+
+    acb->status = 0;
+    if (status < 0) {
+        error_report("Failed to unmap data on iSCSI lun. %s",
+                     iscsi_get_error(iscsi));
+        acb->status = -EIO;
+    }
+
+    iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb);
+    scsi_free_scsi_task(acb->task);
+    acb->task = NULL;
+}
+
+static BlockDriverAIOCB *
+iscsi_aio_discard(BlockDriverState *bs,
+                  int64_t sector_num, int nb_sectors,
+                  BlockDriverCompletionFunc *cb, void *opaque)
+{
+    IscsiLun *iscsilun = bs->opaque;
+    struct iscsi_context *iscsi = iscsilun->iscsi;
+    IscsiAIOCB *acb;
+    struct unmap_list list[1];
+
+    acb = qemu_aio_get(&iscsi_aio_pool, bs, cb, opaque);
+
+    acb->iscsilun = iscsilun;
+    acb->canceled   = 0;
+
+    list[0].lba = sector_qemu2lun(sector_num, iscsilun);
+    list[0].num = nb_sectors * BDRV_SECTOR_SIZE / iscsilun->block_size;
+
+    acb->task = iscsi_unmap_task(iscsi, iscsilun->lun,
+                                 0, 0, &list[0], 1,
+                                 iscsi_unmap_cb,
+                                 acb);
+    if (acb->task == NULL) {
+        error_report("iSCSI: Failed to send unmap command. %s",
+                     iscsi_get_error(iscsi));
+        qemu_aio_release(acb);
+        return NULL;
+    }
+
+    iscsi_set_events(iscsilun);
+
+    return &acb->common;
+}
+
 static int64_t
 iscsi_getlength(BlockDriverState *bs)
 {
@@ -396,11 +455,11 @@  iscsi_getlength(BlockDriverState *bs)
 }
 
 static void
-iscsi_readcapacity10_cb(struct iscsi_context *iscsi, int status,
+iscsi_readcapacity16_cb(struct iscsi_context *iscsi, int status,
                         void *command_data, void *opaque)
 {
     struct IscsiTask *itask = opaque;
-    struct scsi_readcapacity10 *rc10;
+    struct scsi_readcapacity16 *rc16;
     struct scsi_task *task = command_data;
 
     if (status != 0) {
@@ -412,26 +471,25 @@  iscsi_readcapacity10_cb(struct iscsi_context *iscsi, int status,
         return;
     }
 
-    rc10 = scsi_datain_unmarshall(task);
-    if (rc10 == NULL) {
-        error_report("iSCSI: Failed to unmarshall readcapacity10 data.");
+    rc16 = scsi_datain_unmarshall(task);
+    if (rc16 == NULL) {
+        error_report("iSCSI: Failed to unmarshall readcapacity16 data.");
         itask->status   = 1;
         itask->complete = 1;
         scsi_free_scsi_task(task);
         return;
     }
 
-    itask->iscsilun->block_size = rc10->block_size;
-    itask->iscsilun->num_blocks = rc10->lba;
-    itask->bs->total_sectors = (uint64_t)rc10->lba *
-                               rc10->block_size / BDRV_SECTOR_SIZE ;
+    itask->iscsilun->block_size = rc16->block_length;
+    itask->iscsilun->num_blocks = rc16->returned_lba;
+    itask->bs->total_sectors    = rc16->returned_lba *
+                               rc16->block_length / BDRV_SECTOR_SIZE ;
 
     itask->status   = 0;
     itask->complete = 1;
     scsi_free_scsi_task(task);
 }
 
-
 static void
 iscsi_connect_cb(struct iscsi_context *iscsi, int status, void *command_data,
                  void *opaque)
@@ -445,10 +503,10 @@  iscsi_connect_cb(struct iscsi_context *iscsi, int status, void *command_data,
         return;
     }
 
-    task = iscsi_readcapacity10_task(iscsi, itask->iscsilun->lun, 0, 0,
-                                   iscsi_readcapacity10_cb, opaque);
+    task = iscsi_readcapacity16_task(iscsi, itask->iscsilun->lun,
+                                   iscsi_readcapacity16_cb, opaque);
     if (task == NULL) {
-        error_report("iSCSI: failed to send readcapacity command.");
+        error_report("iSCSI: failed to send readcapacity16 command.");
         itask->status   = 1;
         itask->complete = 1;
         return;
@@ -700,6 +758,8 @@  static BlockDriver bdrv_iscsi = {
     .bdrv_aio_readv  = iscsi_aio_readv,
     .bdrv_aio_writev = iscsi_aio_writev,
     .bdrv_aio_flush  = iscsi_aio_flush,
+
+    .bdrv_aio_discard = iscsi_aio_discard,
 };
 
 static void iscsi_block_init(void)
diff --git a/configure b/configure
index 2d62d12..1c693f0 100755
--- a/configure
+++ b/configure
@@ -2516,10 +2516,13 @@  fi
 
 ##########################################
 # Do we have libiscsi
+# We check for iscsi_unmap_sync() to make sure we have a
+# recent enough version of libiscsi.
 if test "$libiscsi" != "no" ; then
   cat > $TMPC << EOF
+#include <stdio.h>
 #include <iscsi/iscsi.h>
-int main(void) { iscsi_create_context(""); return 0; }
+int main(void) { iscsi_unmap_sync(NULL,0,0,0,NULL,0); return 0; }
 EOF
   if compile_prog "-Werror" "-liscsi" ; then
     libiscsi="yes"