[v4,2/7] nbd: Add new qemu:allocation-depth metadata context

Message ID 20201009215533.1194742-3-eblake@redhat.com
State New
Series Exposing backing-chain allocation over NBD

Commit Message

Eric Blake Oct. 9, 2020, 9:55 p.m. UTC
'qemu-img map' provides a way to determine which extents of an image
come from the top layer vs. inherited from a backing chain.  This is
useful information worth exposing over NBD.  There is a proposal to
add a QMP command block-dirty-bitmap-populate which can create a dirty
bitmap that reflects allocation information, at which point the
qemu:dirty-bitmap:NAME metadata context can expose that information
via the creation of a temporary bitmap, but we can shorten the effort
by adding a new qemu:allocation-depth metadata context that does the
same thing without an intermediate bitmap (this patch does not
eliminate the need for that proposal, as it will have other uses as
well).

For this patch, I just encoded a tri-state value (unallocated, from
this layer, from any of the backing layers); an obvious extension
would be to provide the actual depth in bits 31-4 while keeping bits
1-0 as a tri-state (leaving bits 3-2 unused, for ease of reading depth
from a hex number).  But adding this extension would require
bdrv_is_allocated_above to return a depth number.
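
A minimal sketch of how the extended encoding described above could be packed and unpacked. This is not part of the patch: the SKETCH_* names and the three helper functions are hypothetical, and the depth numbering (0 = unallocated, 1 = top layer, 2 and up = backing layers) is an assumption, since the text above only fixes the bit positions.

```c
#include <assert.h>
#include <stdint.h>

/* Values from this patch's include/block/nbd.h hunk. */
#define NBD_STATE_DEPTH_MASK    0x3
#define NBD_STATE_DEPTH_UNALLOC 0x0
#define NBD_STATE_DEPTH_LOCAL   0x1
#define NBD_STATE_DEPTH_BACKING 0x2

/* Hypothetical names for the proposed extension; not defined by this patch. */
#define SKETCH_DEPTH_SHIFT 4            /* depth occupies bits 31-4 */
#define SKETCH_DEPTH_MAX   0x0fffffffu  /* largest value that fits in 28 bits */

/*
 * Assumed numbering: depth 0 = unallocated, 1 = top layer, 2 and up =
 * backing layers.  Bits 3-2 are left as zero.
 */
uint32_t sketch_depth_pack(uint32_t depth)
{
    uint32_t state;

    assert(depth <= SKETCH_DEPTH_MAX);
    if (depth == 0) {
        state = NBD_STATE_DEPTH_UNALLOC;
    } else if (depth == 1) {
        state = NBD_STATE_DEPTH_LOCAL;
    } else {
        state = NBD_STATE_DEPTH_BACKING;
    }
    return (depth << SKETCH_DEPTH_SHIFT) | state;
}

/* Tri-state in bits 1-0; this also works for the encoding in this patch. */
uint32_t sketch_depth_state(uint32_t flags)
{
    return flags & NBD_STATE_DEPTH_MASK;
}

/* Depth number in bits 31-4 (always 0 with the encoding in this patch). */
uint32_t sketch_depth_number(uint32_t flags)
{
    return flags >> SKETCH_DEPTH_SHIFT;
}
```

Under these assumptions an extent at depth 3 would be reported as 0x32, so the depth can be read directly from the high hex digits and the tri-state from the low digit.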

While documenting things, remember that although the NBD protocol has
NBD_OPT_SET_META_CONTEXT, the rest of its documentation refers to
'metadata context', which is a more apt description of what is
actually being used by NBD_CMD_BLOCK_STATUS: the user is requesting
metadata by passing one or more context names.  So I also touched up
some existing wording to prefer the term 'metadata context' where it
makes sense.

Note that this patch does not actually enable any way to request a
server to enable this context; that will come in the next patch.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 docs/interop/nbd.txt | 27 ++++++++++---
 include/block/nbd.h  | 12 ++++--
 nbd/server.c         | 92 ++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 116 insertions(+), 15 deletions(-)

Comments

Vladimir Sementsov-Ogievskiy Oct. 14, 2020, 11:52 a.m. UTC | #1
10.10.2020 00:55, Eric Blake wrote:
> 'qemu-img map' provides a way to determine which extents of an image
> come from the top layer vs. inherited from a backing chain.  This is
> useful information worth exposing over NBD.  There is a proposal to
> add a QMP command block-dirty-bitmap-populate which can create a dirty
> bitmap that reflects allocation information, at which point the
> qemu:dirty-bitmap:NAME metadata context can expose that information
> via the creation of a temporary bitmap, but we can shorten the effort
> by adding a new qemu:allocation-depth metadata context that does the
> same thing without an intermediate bitmap (this patch does not
> eliminate the need for that proposal, as it will have other uses as
> well).
> 
> For this patch, I just encoded a tri-state value (unallocated, from
> this layer, from any of the backing layers); an obvious extension
> would be to provide the actual depth in bits 31-4 while keeping bits
> 1-0 as a tri-state (leaving bits 3-2 unused, for ease of reading depth
> from a hex number).  But adding this extension would require
> bdrv_is_allocated_above to return a depth number.
> 
> While documenting things, remember that although the NBD protocol has
> NBD_OPT_SET_META_CONTEXT, the rest of its documentation refers to
> 'metadata context', which is a more apt description of what is
> actually being used by NBD_CMD_BLOCK_STATUS: the user is requesting
> metadata by passing one or more context names.  So I also touched up
> some existing wording to prefer the term 'metadata context' where it
> makes sense.
> 
> Note that this patch does not actually enable any way to request a
> server to enable this context; that will come in the next patch.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

> ---
>   docs/interop/nbd.txt | 27 ++++++++++---

[..]

> +In the allocation depth context, bits 0 and 1 form a tri-state value:
> +
> +    bits 0-1: 00: NBD_STATE_DEPTH_UNALLOC, the extent is unallocated
> +              01: NBD_STATE_DEPTH_LOCAL, the extent is allocated in the
> +                  top level of the image

Hmm. I always thought that "image" == file, so a backing chain is a chain of images,
not several levels of one image. If so, then it should be "the top level image".
Also, "levels of the image" could be read as internal qcow2 snapshots, which are unrelated here.

Eric Blake Oct. 22, 2020, 9:45 p.m. UTC | #2
On 10/14/20 6:52 AM, Vladimir Sementsov-Ogievskiy wrote:

>>   docs/interop/nbd.txt | 27 ++++++++++---
> 
> [..]
> 
>> +In the allocation depth context, bits 0 and 1 form a tri-state value:
>> +
>> +    bits 0-1: 00: NBD_STATE_DEPTH_UNALLOC, the extent is unallocated
>> +              01: NBD_STATE_DEPTH_LOCAL, the extent is allocated in the
>> +                  top level of the image
> 
> Hmm. I always thought that "image" == file, so a backing chain is a chain
> of images, not several levels of one image. If so, then it should be
> "the top level image". Also, "levels of the image" could be read as
> internal qcow2 snapshots, which are unrelated here.

It's fuzzy.  From the guest point of view, we are serving a single guest
image by use of multiple files in the host.  I will do s/level/layer/,
to match the wording I already had on the next line:

>               10: NBD_STATE_DEPTH_BACKING, the extent is inherited from a
>                   backing layer
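
To make the "single guest image served from multiple host files" view concrete, here is a toy model of how a cluster maps to one of the three states. It is only a sketch: SketchLayer, SKETCH_CLUSTERS and sketch_classify() are hypothetical and stand in for the real block layer, where the patch calls bdrv_is_allocated() on the top layer and then bdrv_is_allocated_above() for the rest of the chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values from this patch's include/block/nbd.h hunk. */
#define NBD_STATE_DEPTH_UNALLOC 0x0
#define NBD_STATE_DEPTH_LOCAL   0x1
#define NBD_STATE_DEPTH_BACKING 0x2

/* Hypothetical toy model: each layer (host file) tracks 4 clusters. */
#define SKETCH_CLUSTERS 4

typedef struct SketchLayer {
    bool allocated[SKETCH_CLUSTERS];
} SketchLayer;

/*
 * chain[0] is the top layer; chain[1] .. chain[nlayers - 1] are its
 * backing layers, nearest first.  Returns the tri-state for one cluster.
 */
unsigned sketch_classify(const SketchLayer *chain, size_t nlayers,
                         size_t cluster)
{
    size_t i;

    assert(nlayers > 0 && cluster < SKETCH_CLUSTERS);

    /* Allocated in the top layer: data comes from this layer. */
    if (chain[0].allocated[cluster]) {
        return NBD_STATE_DEPTH_LOCAL;
    }

    /* Otherwise look for any backing layer that provides the data. */
    for (i = 1; i < nlayers; i++) {
        if (chain[i].allocated[cluster]) {
            return NBD_STATE_DEPTH_BACKING;
        }
    }

    /* No layer in the chain has allocated this cluster. */
    return NBD_STATE_DEPTH_UNALLOC;
}
```

The guest sees one image regardless of which layer answers; the context only reports whether the answering layer is the top one, some backing layer, or none at all.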

Patch

diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
index f3b3cacc9621..a55e5a8776c8 100644
--- a/docs/interop/nbd.txt
+++ b/docs/interop/nbd.txt
@@ -17,19 +17,35 @@  namespace "qemu".

 == "qemu" namespace ==

-The "qemu" namespace currently contains only one type of context,
-related to exposing the contents of a dirty bitmap alongside the
-associated disk contents.  That context has the following form:
+The "qemu" namespace currently contains two available metadata context
+types.  The first is related to exposing the contents of a dirty
+bitmap alongside the associated disk contents.  That metadata context
+is named with the following form:

     qemu:dirty-bitmap:<dirty-bitmap-export-name>

 Each dirty-bitmap metadata context defines only one flag for extents
 in reply for NBD_CMD_BLOCK_STATUS:

-    bit 0: NBD_STATE_DIRTY, means that the extent is "dirty"
+    bit 0: NBD_STATE_DIRTY, set when the extent is "dirty"
+
+The second is related to exposing the source of various extents within
+the image, with a single metadata context named:
+
+    qemu:allocation-depth
+
+In the allocation depth context, bits 0 and 1 form a tri-state value:
+
+    bits 0-1: 00: NBD_STATE_DEPTH_UNALLOC, the extent is unallocated
+              01: NBD_STATE_DEPTH_LOCAL, the extent is allocated in the
+                  top level of the image
+              10: NBD_STATE_DEPTH_BACKING, the extent is inherited from a
+                  backing layer
+              11: invalid, never returned

 For NBD_OPT_LIST_META_CONTEXT the following queries are supported
-in addition to "qemu:dirty-bitmap:<dirty-bitmap-export-name>":
+in addition to the specific "qemu:allocation-depth" and
+"qemu:dirty-bitmap:<dirty-bitmap-export-name>":

 * "qemu:" - returns list of all available metadata contexts in the
             namespace.
@@ -55,3 +71,4 @@  the operation of that feature.
 NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
 * 4.2: NBD_FLAG_CAN_MULTI_CONN for shareable read-only exports,
 NBD_CMD_FLAG_FAST_ZERO
+* 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 3dd9a04546ec..0bbf92f02951 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -1,5 +1,5 @@ 
 /*
- *  Copyright (C) 2016-2019 Red Hat, Inc.
+ *  Copyright (C) 2016-2020 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
  *
  *  Network Block Device
@@ -47,7 +47,7 @@  typedef struct NBDOptionReply NBDOptionReply;
 typedef struct NBDOptionReplyMetaContext {
     NBDOptionReply h; /* h.type = NBD_REP_META_CONTEXT, h.length > 4 */
     uint32_t context_id;
-    /* meta context name follows */
+    /* metadata context name follows */
 } QEMU_PACKED NBDOptionReplyMetaContext;

 /* Transmission phase structs
@@ -229,7 +229,7 @@  enum {
 #define NBD_MAX_BUFFER_SIZE (32 * 1024 * 1024)

 /*
- * Maximum size of a protocol string (export name, meta context name,
+ * Maximum size of a protocol string (export name, metadata context name,
  * etc.).  Use malloc rather than stack allocation for storage of a
  * string.
  */
@@ -259,6 +259,12 @@  enum {
 /* Extent flags for qemu:dirty-bitmap in NBD_REPLY_TYPE_BLOCK_STATUS */
 #define NBD_STATE_DIRTY (1 << 0)

+/* Extent flags for qemu:allocation-depth in NBD_REPLY_TYPE_BLOCK_STATUS */
+#define NBD_STATE_DEPTH_MASK    0x3
+#define NBD_STATE_DEPTH_UNALLOC 0x0
+#define NBD_STATE_DEPTH_LOCAL   0x1
+#define NBD_STATE_DEPTH_BACKING 0x2
+
 static inline bool nbd_reply_type_is_error(int type)
 {
     return type & (1 << 15);
diff --git a/nbd/server.c b/nbd/server.c
index e75c825879aa..ae6f8a8e5429 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -27,7 +27,8 @@ 
 #include "qemu/units.h"

 #define NBD_META_ID_BASE_ALLOCATION 0
-#define NBD_META_ID_DIRTY_BITMAP 1
+#define NBD_META_ID_ALLOCATION_DEPTH 1
+#define NBD_META_ID_DIRTY_BITMAP 2

 /*
  * NBD_MAX_BLOCK_STATUS_EXTENTS: 1 MiB of extents data. An empirical
@@ -94,6 +95,7 @@  struct NBDExport {
     BlockBackend *eject_notifier_blk;
     Notifier eject_notifier;

+    bool allocation_depth;
     BdrvDirtyBitmap *export_bitmap;
     char *export_bitmap_context;
 };
@@ -108,6 +110,7 @@  typedef struct NBDExportMetaContexts {
     bool valid; /* means that negotiation of the option finished without
                    errors */
     bool base_allocation; /* export base:allocation context (block status) */
+    bool allocation_depth; /* export qemu:allocation-depth */
     bool bitmap; /* export qemu:dirty-bitmap:<export bitmap name> */
 } NBDExportMetaContexts;

@@ -852,7 +855,8 @@  static bool nbd_meta_base_query(NBDClient *client, NBDExportMetaContexts *meta,
 /* nbd_meta_qemu_query
  *
  * Handle queries to 'qemu' namespace. For now, only the qemu:dirty-bitmap:
- * context is available.  Return true if @query has been handled.
+ * and qemu:allocation-depth contexts are available.  Return true if @query
+ * has been handled.
  */
 static bool nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta,
                                 const char *query)
@@ -864,12 +868,19 @@  static bool nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta,

     if (!*query) {
         if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
+            meta->allocation_depth = meta->exp->allocation_depth;
             meta->bitmap = !!meta->exp->export_bitmap;
         }
         trace_nbd_negotiate_meta_query_parse("empty");
         return true;
     }

+    if (strcmp(query, "allocation-depth") == 0) {
+        trace_nbd_negotiate_meta_query_parse("allocation-depth");
+        meta->allocation_depth = meta->exp->allocation_depth;
+        return true;
+    }
+
     if (nbd_strshift(&query, "dirty-bitmap:")) {
         trace_nbd_negotiate_meta_query_parse("dirty-bitmap:");
         if (!meta->exp->export_bitmap) {
@@ -884,7 +895,7 @@  static bool nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta,
         return true;
     }

-    trace_nbd_negotiate_meta_query_skip("not dirty-bitmap");
+    trace_nbd_negotiate_meta_query_skip("unknown qemu context");
     return true;
 }

@@ -984,6 +995,7 @@  static int nbd_negotiate_meta_queries(NBDClient *client,
     if (client->opt == NBD_OPT_LIST_META_CONTEXT && !nb_queries) {
         /* enable all known contexts */
         meta->base_allocation = true;
+        meta->allocation_depth = meta->exp->allocation_depth;
         meta->bitmap = !!meta->exp->export_bitmap;
     } else {
         for (i = 0; i < nb_queries; ++i) {
@@ -1003,6 +1015,15 @@  static int nbd_negotiate_meta_queries(NBDClient *client,
         }
     }

+    if (meta->allocation_depth) {
+        ret = nbd_negotiate_send_meta_context(client, "qemu:allocation-depth",
+                                              NBD_META_ID_ALLOCATION_DEPTH,
+                                              errp);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
     if (meta->bitmap) {
         ret = nbd_negotiate_send_meta_context(client,
                                               meta->exp->export_bitmap_context,
@@ -1961,6 +1982,40 @@  static int blockstatus_to_extents(BlockDriverState *bs, uint64_t offset,
     return 0;
 }

+static int blockalloc_to_extents(BlockDriverState *bs, uint64_t offset,
+                                 uint64_t bytes, NBDExtentArray *ea)
+{
+    while (bytes) {
+        uint32_t flags;
+        int64_t num;
+        int ret = bdrv_is_allocated(bs, offset, bytes, &num);
+
+        if (ret < 0) {
+            return ret;
+        }
+
+        if (ret == 1) {
+            flags = NBD_STATE_DEPTH_LOCAL;
+        } else {
+            ret = bdrv_is_allocated_above(bs, NULL, false, offset, num,
+                                          &num);
+            if (ret < 0) {
+                return ret;
+            }
+            flags = ret ? NBD_STATE_DEPTH_BACKING : NBD_STATE_DEPTH_UNALLOC;
+        }
+
+        if (nbd_extent_array_add(ea, num, flags) < 0) {
+            return 0;
+        }
+
+        offset += num;
+        bytes -= num;
+    }
+
+    return 0;
+}
+
 /*
  * nbd_co_send_extents
  *
@@ -2000,7 +2055,11 @@  static int nbd_co_send_block_status(NBDClient *client, uint64_t handle,
     unsigned int nb_extents = dont_fragment ? 1 : NBD_MAX_BLOCK_STATUS_EXTENTS;
     g_autoptr(NBDExtentArray) ea = nbd_extent_array_new(nb_extents);

-    ret = blockstatus_to_extents(bs, offset, length, ea);
+    if (context_id == NBD_META_ID_BASE_ALLOCATION) {
+        ret = blockstatus_to_extents(bs, offset, length, ea);
+    } else {
+        ret = blockalloc_to_extents(bs, offset, length, ea);
+    }
     if (ret < 0) {
         return nbd_co_send_structured_error(
                 client, handle, -ret, "can't get block status", errp);
@@ -2335,16 +2394,20 @@  static coroutine_fn int nbd_handle_request(NBDClient *client,
         }
         if (client->export_meta.valid &&
             (client->export_meta.base_allocation ||
+             client->export_meta.allocation_depth ||
              client->export_meta.bitmap))
         {
             bool dont_fragment = request->flags & NBD_CMD_FLAG_REQ_ONE;
+            int contexts_remaining = client->export_meta.base_allocation +
+                client->export_meta.allocation_depth +
+                client->export_meta.bitmap;

             if (client->export_meta.base_allocation) {
                 ret = nbd_co_send_block_status(client, request->handle,
                                                blk_bs(exp->common.blk),
                                                request->from,
                                                request->len, dont_fragment,
-                                               !client->export_meta.bitmap,
+                                               !--contexts_remaining,
                                                NBD_META_ID_BASE_ALLOCATION,
                                                errp);
                 if (ret < 0) {
@@ -2352,17 +2415,32 @@  static coroutine_fn int nbd_handle_request(NBDClient *client,
                 }
             }

+            if (client->export_meta.allocation_depth) {
+                ret = nbd_co_send_block_status(client, request->handle,
+                                               blk_bs(exp->common.blk),
+                                               request->from, request->len,
+                                               dont_fragment,
+                                               !--contexts_remaining,
+                                               NBD_META_ID_ALLOCATION_DEPTH,
+                                               errp);
+                if (ret < 0) {
+                    return ret;
+                }
+            }
+
             if (client->export_meta.bitmap) {
                 ret = nbd_co_send_bitmap(client, request->handle,
                                          client->exp->export_bitmap,
                                          request->from, request->len,
-                                         dont_fragment,
-                                         true, NBD_META_ID_DIRTY_BITMAP, errp);
+                                         dont_fragment, !--contexts_remaining,
+                                         NBD_META_ID_DIRTY_BITMAP, errp);
                 if (ret < 0) {
                     return ret;
                 }
             }

+            assert(!contexts_remaining);
+
             return 0;
         } else {
             return nbd_send_generic_reply(client, request->handle, -EINVAL,