diff mbox

[v5] sheepdog: add discard/trim support for sheepdog

Message ID 1366042504-18354-1-git-send-email-namei.unix@gmail.com
State New
Headers show

Commit Message

Liu Yuan April 15, 2013, 4:15 p.m. UTC
From: Liu Yuan <tailai.ly@taobao.com>

The 'TRIM' command from VM that is to release underlying data storage for
better thin-provision is already supported by the Sheepdog.

This patch adds the TRIM support at QEMU part.

For older Sheepdog that doesn't support it, we return 0(success) to upper layer.

Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
---
v5:
 - adjust macro numbering
v4:
 - adjust discard macro
 - return success when operation is not supported by sheep
 - add coroutine_fn marker
v3:
 - fix a silly accidental deletion of 'default' in switch clause.
v2:
 - skip the object when it is not allocated

 block/sheepdog.c |   56 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 55 insertions(+), 1 deletion(-)

Comments

MORITA Kazutaka April 15, 2013, 4:25 p.m. UTC | #1
At Tue, 16 Apr 2013 00:15:04 +0800,
Liu Yuan wrote:
> 
> From: Liu Yuan <tailai.ly@taobao.com>
> 
> The 'TRIM' command from VM that is to release underlying data storage for
> better thin-provision is already supported by the Sheepdog.
> 
> This patch adds the TRIM support at QEMU part.
> 
> For older Sheepdog that doesn't support it, we return 0(success) to upper layer.
> 
> Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
> Cc: Kevin Wolf <kwolf@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
> ---
> v5:
>  - adjust macro numbering
> v4:
>  - adjust discard macro
>  - return success when operation is not supported by sheep
>  - add coroutine_fn marker
> v3:
>  - fix a silly accidental deletion of 'default' in switch clause.
> v2:
>  - skip the object when it is not allocated
> 
>  block/sheepdog.c |   56 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 55 insertions(+), 1 deletion(-)

Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Stefan Hajnoczi April 16, 2013, 8:18 a.m. UTC | #2
On Tue, Apr 16, 2013 at 12:15:04AM +0800, Liu Yuan wrote:
> @@ -727,6 +730,20 @@ static void coroutine_fn aio_read_response(void *opaque)
>              rsp.result = SD_RES_SUCCESS;
>          }
>          break;
> +    case AIOCB_DISCARD_OBJ:
> +        switch (rsp.result) {
> +        case SD_RES_INVALID_PARMS:
> +            error_report("you are running the old sheep that doesn't support "
> +                         "discard command.\n");

error_report() does not need '\n'.

The recently added ssh block driver has a similar case when the server
does not support fsync.  It does the following:

1. Print the error message once only per volume, avoid filling up logs
   on the host.
2. Include details of the volume/server in case the users is connected
   to multiple volumes/servers.  This allows them to figure out which
   server is outdated.

This makes the error messages safe from denial-of-service and includes
more useful information.
Kevin Wolf April 16, 2013, 8:47 a.m. UTC | #3
Am 16.04.2013 um 10:18 hat Stefan Hajnoczi geschrieben:
> On Tue, Apr 16, 2013 at 12:15:04AM +0800, Liu Yuan wrote:
> > @@ -727,6 +730,20 @@ static void coroutine_fn aio_read_response(void *opaque)
> >              rsp.result = SD_RES_SUCCESS;
> >          }
> >          break;
> > +    case AIOCB_DISCARD_OBJ:
> > +        switch (rsp.result) {
> > +        case SD_RES_INVALID_PARMS:
> > +            error_report("you are running the old sheep that doesn't support "
> > +                         "discard command.\n");
> 
> error_report() does not need '\n'.
> 
> The recently added ssh block driver has a similar case when the server
> does not support fsync.  It does the following:
> 
> 1. Print the error message once only per volume, avoid filling up logs
>    on the host.
> 2. Include details of the volume/server in case the users is connected
>    to multiple volumes/servers.  This allows them to figure out which
>    server is outdated.
> 
> This makes the error messages safe from denial-of-service and includes
> more useful information.

Or if we can check whether discard works during bdrv_open(), we could
already fail there for discard=on.

Kevin
Liu Yuan April 16, 2013, 9:08 a.m. UTC | #4
On 04/16/2013 04:47 PM, Kevin Wolf wrote:
> Am 16.04.2013 um 10:18 hat Stefan Hajnoczi geschrieben:
>> > On Tue, Apr 16, 2013 at 12:15:04AM +0800, Liu Yuan wrote:
>>> > > @@ -727,6 +730,20 @@ static void coroutine_fn aio_read_response(void *opaque)
>>> > >              rsp.result = SD_RES_SUCCESS;
>>> > >          }
>>> > >          break;
>>> > > +    case AIOCB_DISCARD_OBJ:
>>> > > +        switch (rsp.result) {
>>> > > +        case SD_RES_INVALID_PARMS:
>>> > > +            error_report("you are running the old sheep that doesn't support "
>>> > > +                         "discard command.\n");
>> > 
>> > error_report() does not need '\n'.
>> > 
>> > The recently added ssh block driver has a similar case when the server
>> > does not support fsync.  It does the following:
>> > 
>> > 1. Print the error message once only per volume, avoid filling up logs
>> >    on the host.

All the request for the volumes are firstly handled by the sheep daemon
this QEMU connects to, so we can say that if one discard request for any
volume return SD_RES_INVALID_PARMS, then all the volumes attatched to
this VM can't support discard operation.

>> > 2. Include details of the volume/server in case the users is connected
>> >    to multiple volumes/servers.  This allows them to figure out which
>> >    server is outdated.
>> > 

Multi-connections aren't supported yet (though planned), so this doesn't
apply for current SD.

>> > This makes the error messages safe from denial-of-service and includes
>> > more useful information.
> Or if we can check whether discard works during bdrv_open(), we could
> already fail there for discard=on.

Hmm, SD doesn't support a feature negotiation request. The most simple
way I can come up is add a s->enable_discard flag that set false when it
is reported discard operation isn't supported by the server connected.
What do you think?

Thanks,
Yuan
diff mbox

Patch

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 987018e..3b64690 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -27,6 +27,8 @@ 
 #define SD_OP_CREATE_AND_WRITE_OBJ  0x01
 #define SD_OP_READ_OBJ       0x02
 #define SD_OP_WRITE_OBJ      0x03
+/* 0x04 is used internally by Sheepdog */
+#define SD_OP_DISCARD_OBJ    0x05
 
 #define SD_OP_NEW_VDI        0x11
 #define SD_OP_LOCK_VDI       0x12
@@ -269,6 +271,7 @@  enum AIOCBState {
     AIOCB_WRITE_UDATA,
     AIOCB_READ_UDATA,
     AIOCB_FLUSH_CACHE,
+    AIOCB_DISCARD_OBJ,
 };
 
 struct SheepdogAIOCB {
@@ -656,7 +659,7 @@  static void coroutine_fn aio_read_response(void *opaque)
     int ret;
     AIOReq *aio_req = NULL;
     SheepdogAIOCB *acb;
-    unsigned long idx;
+    uint64_t idx;
 
     if (QLIST_EMPTY(&s->inflight_aio_head)) {
         goto out;
@@ -727,6 +730,20 @@  static void coroutine_fn aio_read_response(void *opaque)
             rsp.result = SD_RES_SUCCESS;
         }
         break;
+    case AIOCB_DISCARD_OBJ:
+        switch (rsp.result) {
+        case SD_RES_INVALID_PARMS:
+            error_report("you are running the old sheep that doesn't support "
+                         "discard command.\n");
+            rsp.result = SD_RES_SUCCESS;
+            break;
+        case SD_RES_SUCCESS:
+            idx = data_oid_to_idx(aio_req->oid);
+            s->inode.data_vdi_id[idx] = 0;
+            break;
+        default:
+            break;
+        }
     }
 
     if (rsp.result != SD_RES_SUCCESS) {
@@ -1016,6 +1033,9 @@  static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
         wlen = datalen;
         hdr.flags = SD_FLAG_CMD_WRITE | flags;
         break;
+    case AIOCB_DISCARD_OBJ:
+        hdr.opcode = SD_OP_DISCARD_OBJ;
+        break;
     }
 
     if (s->cache_flags) {
@@ -1633,6 +1653,15 @@  static int coroutine_fn sd_co_rw_vector(void *p)
                 flags = SD_FLAG_CMD_COW;
             }
             break;
+        case AIOCB_DISCARD_OBJ:
+            /*
+             * We discard the object only when the whole object is
+             * 1) allocated 2) trimmed. Otherwise, simply skip it.
+             */
+            if (len != SD_DATA_OBJ_SIZE || inode->data_vdi_id[idx] == 0) {
+                goto done;
+            }
+            break;
         default:
             break;
         }
@@ -2071,6 +2100,28 @@  static int sd_load_vmstate(BlockDriverState *bs, uint8_t *data,
 }
 
 
+static coroutine_fn int sd_co_discard(BlockDriverState *bs, int64_t sector_num,
+                                      int nb_sectors)
+{
+    SheepdogAIOCB *acb;
+    QEMUIOVector dummy;
+    int ret;
+
+    acb = sd_aio_setup(bs, &dummy, sector_num, nb_sectors);
+    acb->aiocb_type = AIOCB_DISCARD_OBJ;
+    acb->aio_done_func = sd_finish_aiocb;
+
+    ret = sd_co_rw_vector(acb);
+    if (ret <= 0) {
+        qemu_aio_release(acb);
+        return ret;
+    }
+
+    qemu_coroutine_yield();
+
+    return acb->ret;
+}
+
 static QEMUOptionParameter sd_create_options[] = {
     {
         .name = BLOCK_OPT_SIZE,
@@ -2103,6 +2154,7 @@  static BlockDriver bdrv_sheepdog = {
     .bdrv_co_readv  = sd_co_readv,
     .bdrv_co_writev = sd_co_writev,
     .bdrv_co_flush_to_disk  = sd_co_flush_to_disk,
+    .bdrv_co_discard = sd_co_discard,
 
     .bdrv_snapshot_create   = sd_snapshot_create,
     .bdrv_snapshot_goto     = sd_snapshot_goto,
@@ -2128,6 +2180,7 @@  static BlockDriver bdrv_sheepdog_tcp = {
     .bdrv_co_readv  = sd_co_readv,
     .bdrv_co_writev = sd_co_writev,
     .bdrv_co_flush_to_disk  = sd_co_flush_to_disk,
+    .bdrv_co_discard = sd_co_discard,
 
     .bdrv_snapshot_create   = sd_snapshot_create,
     .bdrv_snapshot_goto     = sd_snapshot_goto,
@@ -2153,6 +2206,7 @@  static BlockDriver bdrv_sheepdog_unix = {
     .bdrv_co_readv  = sd_co_readv,
     .bdrv_co_writev = sd_co_writev,
     .bdrv_co_flush_to_disk  = sd_co_flush_to_disk,
+    .bdrv_co_discard = sd_co_discard,
 
     .bdrv_snapshot_create   = sd_snapshot_create,
     .bdrv_snapshot_goto     = sd_snapshot_goto,