diff mbox series

[v2,4/5] nbd: Add new qemu:allocation-depth metacontext

Message ID 20200930121105.667049-5-eblake@redhat.com
State New
Headers show
Series Exposing backing-chain allocation over NBD | expand

Commit Message

Eric Blake Sept. 30, 2020, 12:11 p.m. UTC
'qemu-img map' provides a way to determine which extents of an image
come from the top layer vs. inherited from a backing chain.  This is
useful information worth exposing over NBD.  There is a proposal to
add a QMP command block-dirty-bitmap-populate which can create a dirty
bitmap that reflects allocation information, at which point
qemu:dirty-bitmap:NAME can expose that information via the creation of
a temporary bitmap, but we can shorten the effort by adding a new
qemu:allocation-depth context that does the same thing without an
intermediate bitmap (this patch does not eliminate the need for that
proposal, as it will have other uses as well).

For this patch, I just encoded a tri-state value (unallocated, from
this layer, from any of the backing layers); an obvious extension
would be to provide the actual depth in bits 31-4 while keeping bits
1-0 as a tri-state (leaving bits 3-2 unused, for ease of reading depth
from a hex number).  But this extension would require
bdrv_is_allocated_above to return a depth number.

Note that this patch does not actually enable any way to request a
server to enable this context; that will come in the next patch.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 docs/interop/nbd.txt |  22 +++++++--
 include/block/nbd.h  |   8 +++-
 nbd/server.c         | 105 ++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 125 insertions(+), 10 deletions(-)

Comments

Vladimir Sementsov-Ogievskiy Oct. 7, 2020, 1:40 p.m. UTC | #1
30.09.2020 15:11, Eric Blake wrote:
> 'qemu-img map' provides a way to determine which extents of an image
> come from the top layer vs. inherited from a backing chain.  This is
> useful information worth exposing over NBD.  There is a proposal to
> add a QMP command block-dirty-bitmap-populate which can create a dirty
> bitmap that reflects allocation information, at which point
> qemu:dirty-bitmap:NAME can expose that information via the creation of
> a temporary bitmap, but we can shorten the effort by adding a new
> qemu:allocation-depth context that does the same thing without an
> intermediate bitmap (this patch does not eliminate the need for that
> proposal, as it will have other uses as well).
> 
> For this patch, I just encoded a tri-state value (unallocated, from
> this layer, from any of the backing layers); an obvious extension
> would be to provide the actual depth in bits 31-4 while keeping bits
> 1-0 as a tri-state (leaving bits 3-2 unused, for ease of reading depth
> from a hex number).  But this extension would require
> bdrv_is_allocated_above to return a depth number.
> 
> Note that this patch does not actually enable any way to request a
> server to enable this context; that will come in the next patch.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>   docs/interop/nbd.txt |  22 +++++++--
>   include/block/nbd.h  |   8 +++-
>   nbd/server.c         | 105 ++++++++++++++++++++++++++++++++++++++++---
>   3 files changed, 125 insertions(+), 10 deletions(-)
> 
> diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
> index f3b3cacc9621..56efec7aee12 100644
> --- a/docs/interop/nbd.txt
> +++ b/docs/interop/nbd.txt
> @@ -17,9 +17,9 @@ namespace "qemu".
> 
>   == "qemu" namespace ==
> 
> -The "qemu" namespace currently contains only one type of context,
> -related to exposing the contents of a dirty bitmap alongside the
> -associated disk contents.  That context has the following form:
> +The "qemu" namespace currently contains two types of context.  The
> +first is related to exposing the contents of a dirty bitmap alongside
> +the associated disk contents.  That context has the following form:
> 
>       qemu:dirty-bitmap:<dirty-bitmap-export-name>
> 
> @@ -28,8 +28,21 @@ in reply for NBD_CMD_BLOCK_STATUS:
> 
>       bit 0: NBD_STATE_DIRTY, means that the extent is "dirty"
> 
> +The second is related to exposing the source of various extents within
> +the image, with a single context named:
> +
> +    qemu:allocation-depth
> +
> +In the allocation depth context, bits 0 and 1 form a tri-state value:
> +
> +    bits 0-1 clear: NBD_STATE_DEPTH_UNALLOC, means the extent is unallocated
> +    bit 0 set: NBD_STATE_DEPTH_LOCAL, the extent is allocated in this image

Maybe, s/this/the top level/, as, what is "this" for NBD client?

> +    bit 1 set: NBD_STATE_DEPTH_BACKING, the extent is inherited from a
> +               backing layer
> +
>   For NBD_OPT_LIST_META_CONTEXT the following queries are supported
> -in addition to "qemu:dirty-bitmap:<dirty-bitmap-export-name>":
> +in addition to the specific "qemu:allocation-depth" and
> +"qemu:dirty-bitmap:<dirty-bitmap-export-name>":
> 
>   * "qemu:" - returns list of all available metadata contexts in the
>               namespace.
> @@ -55,3 +68,4 @@ the operation of that feature.
>   NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
>   * 4.2: NBD_FLAG_CAN_MULTI_CONN for shareable read-only exports,
>   NBD_CMD_FLAG_FAST_ZERO
> +* 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 3dd9a04546ec..06208bc25027 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -1,5 +1,5 @@
>   /*
> - *  Copyright (C) 2016-2019 Red Hat, Inc.
> + *  Copyright (C) 2016-2020 Red Hat, Inc.
>    *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
>    *
>    *  Network Block Device
> @@ -259,6 +259,12 @@ enum {
>   /* Extent flags for qemu:dirty-bitmap in NBD_REPLY_TYPE_BLOCK_STATUS */
>   #define NBD_STATE_DIRTY (1 << 0)
> 
> +/* Extent flags for qemu:allocation-depth in NBD_REPLY_TYPE_BLOCK_STATUS */
> +#define NBD_STATE_DEPTH_UNALLOC (0 << 0)
> +#define NBD_STATE_DEPTH_LOCAL   (1 << 0)
> +#define NBD_STATE_DEPTH_BACKING (2 << 0)
> +#define NBD_STATE_DEPTH_MASK    (0x3)
> +
>   static inline bool nbd_reply_type_is_error(int type)
>   {
>       return type & (1 << 15);
> diff --git a/nbd/server.c b/nbd/server.c
> index 7271a09b5c2b..830b21000be3 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -27,7 +27,8 @@
>   #include "qemu/units.h"
> 
>   #define NBD_META_ID_BASE_ALLOCATION 0
> -#define NBD_META_ID_DIRTY_BITMAP 1
> +#define NBD_META_ID_ALLOCATION_DEPTH 1
> +#define NBD_META_ID_DIRTY_BITMAP 2
> 
>   /*
>    * NBD_MAX_BLOCK_STATUS_EXTENTS: 1 MiB of extents data. An empirical
> @@ -94,6 +95,7 @@ struct NBDExport {
>       BlockBackend *eject_notifier_blk;
>       Notifier eject_notifier;
> 
> +    bool alloc_context;
>       BdrvDirtyBitmap *export_bitmap;
>       char *export_bitmap_context;
>   };
> @@ -108,6 +110,7 @@ typedef struct NBDExportMetaContexts {
>       bool valid; /* means that negotiation of the option finished without
>                      errors */
>       bool base_allocation; /* export base:allocation context (block status) */
> +    bool allocation_depth; /* export qemu:allocation-depth */
>       bool bitmap; /* export qemu:dirty-bitmap:<export bitmap name> */
>   } NBDExportMetaContexts;
> 
> @@ -806,7 +809,7 @@ static bool nbd_meta_empty_or_pattern(NBDClient *client, const char *pattern,
>   {
>       if (!*query) {
>           trace_nbd_negotiate_meta_query_parse("empty");
> -        return client->opt == NBD_OPT_LIST_META_CONTEXT;
> +        return !*pattern || client->opt == NBD_OPT_LIST_META_CONTEXT;
>       }
>       if (strcmp(query, pattern) == 0) {
>           trace_nbd_negotiate_meta_query_parse(pattern);
> @@ -864,12 +867,21 @@ static bool nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta,
> 
>       if (!*query) {
>           if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
> +            meta->allocation_depth = meta->exp->alloc_context;
>               meta->bitmap = !!meta->exp->export_bitmap;
>           }
>           trace_nbd_negotiate_meta_query_parse("empty");
>           return true;
>       }
> 
> +    if (nbd_strshift(&query, "allocation-depth")) {
> +        trace_nbd_negotiate_meta_query_parse("allocation-depth");
> +        if (nbd_meta_empty_or_pattern(client, "", query)) {

How much it differs from "if (!*query) {", I don't see...

> +            meta->allocation_depth = meta->exp->alloc_context;
> +        }
> +        return true;
> +    }

why not just

  if (!strcmp(query, "allocation-depth")) {
     meta->allocation_depth = meta->exp->alloc_context;
     return true;
  }

?

With your code you also parse something like "allocation-depth-extended".. Is it on purpose?

> +
>       if (nbd_strshift(&query, "dirty-bitmap:")) {
>           trace_nbd_negotiate_meta_query_parse("dirty-bitmap:");
>           if (!meta->exp->export_bitmap) {


Also, "trace_nbd_negotiate_meta_query_skip("not dirty-bitmap"); " at function end should
now be something like trace_nbd_negotiate_meta_query_skip("unknown context in qemu: namespace");

> @@ -984,6 +996,7 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
>       if (client->opt == NBD_OPT_LIST_META_CONTEXT && !nb_queries) {
>           /* enable all known contexts */
>           meta->base_allocation = true;
> +        meta->allocation_depth = meta->exp->alloc_context;
>           meta->bitmap = !!meta->exp->export_bitmap;
>       } else {
>           for (i = 0; i < nb_queries; ++i) {
> @@ -1003,6 +1016,15 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
>           }
>       }
> 
> +    if (meta->allocation_depth) {
> +        ret = nbd_negotiate_send_meta_context(client, "qemu:allocation-depth",
> +                                              NBD_META_ID_ALLOCATION_DEPTH,
> +                                              errp);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +    }
> +
>       if (meta->bitmap) {
>           ret = nbd_negotiate_send_meta_context(client,
>                                                 meta->exp->export_bitmap_context,
> @@ -1961,6 +1983,40 @@ static int blockstatus_to_extents(BlockDriverState *bs, uint64_t offset,
>       return 0;
>   }
> 
> +static int blockalloc_to_extents(BlockDriverState *bs, uint64_t offset,
> +                                 uint64_t bytes, NBDExtentArray *ea)
> +{
> +    while (bytes) {
> +        uint32_t flags;
> +        int64_t num;
> +        int ret = bdrv_is_allocated(bs, offset, bytes, &num);
> +
> +        if (ret < 0) {
> +            return ret;
> +        }
> +
> +        if (ret == 1) {
> +            flags = NBD_STATE_DEPTH_LOCAL;
> +        } else {
> +            ret = bdrv_is_allocated_above(bs, NULL, false, offset, num,
> +                                          &num);
> +            if (ret < 0) {
> +                return ret;
> +            }
> +            flags = ret ? NBD_STATE_DEPTH_BACKING : NBD_STATE_DEPTH_UNALLOC;
> +        }
> +
> +        if (nbd_extent_array_add(ea, num, flags) < 0) {
> +            return 0;
> +        }
> +
> +        offset += num;
> +        bytes -= num;
> +    }
> +
> +    return 0;
> +}
> +
>   /*
>    * nbd_co_send_extents
>    *
> @@ -2009,6 +2065,26 @@ static int nbd_co_send_block_status(NBDClient *client, uint64_t handle,
>       return nbd_co_send_extents(client, handle, ea, last, context_id, errp);
>   }
> 
> +/* Get allocation depth from the exported device and send it to the client */
> +static int nbd_co_send_block_alloc(NBDClient *client, uint64_t handle,
> +                                   BlockDriverState *bs, uint64_t offset,
> +                                   uint32_t length, bool dont_fragment,
> +                                   bool last, uint32_t context_id,
> +                                   Error **errp)
> +{
> +    int ret;
> +    unsigned int nb_extents = dont_fragment ? 1 : NBD_MAX_BLOCK_STATUS_EXTENTS;
> +    g_autoptr(NBDExtentArray) ea = nbd_extent_array_new(nb_extents);
> +
> +    ret = blockalloc_to_extents(bs, offset, length, ea);
> +    if (ret < 0) {
> +        return nbd_co_send_structured_error(
> +                client, handle, -ret, "can't get block status", errp);
> +    }
> +
> +    return nbd_co_send_extents(client, handle, ea, last, context_id, errp);
> +}


The function is a copy of nbd_co_send_block_status()..

Probably, we can reuse nbd_co_send_block_status(), and just call blockstatus_to_extents()
or blockalloc_to_extents() depending on context_id parameter.

Still, I'm OK with it as is.

> +
>   /* Populate @ea from a dirty bitmap. */
>   static void bitmap_to_extents(BdrvDirtyBitmap *bitmap,
>                                 uint64_t offset, uint64_t length,
> @@ -2335,16 +2411,20 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
>           }
>           if (client->export_meta.valid &&
>               (client->export_meta.base_allocation ||
> +             client->export_meta.allocation_depth ||
>                client->export_meta.bitmap))
>           {
>               bool dont_fragment = request->flags & NBD_CMD_FLAG_REQ_ONE;
> +            int contexts_remaining = client->export_meta.base_allocation +
> +                client->export_meta.allocation_depth +
> +                client->export_meta.bitmap;
> 
>               if (client->export_meta.base_allocation) {
>                   ret = nbd_co_send_block_status(client, request->handle,
>                                                  blk_bs(exp->common.blk),
>                                                  request->from,
>                                                  request->len, dont_fragment,
> -                                               !client->export_meta.bitmap,
> +                                               !--contexts_remaining,
>                                                  NBD_META_ID_BASE_ALLOCATION,
>                                                  errp);
>                   if (ret < 0) {
> @@ -2352,17 +2432,32 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
>                   }
>               }
> 
> +            if (client->export_meta.allocation_depth) {
> +                ret = nbd_co_send_block_alloc(client, request->handle,
> +                                              blk_bs(exp->common.blk),
> +                                              request->from, request->len,
> +                                              dont_fragment,
> +                                              !--contexts_remaining,
> +                                              NBD_META_ID_ALLOCATION_DEPTH,
> +                                              errp);
> +                if (ret < 0) {
> +                    return ret;
> +                }
> +            }
> +
>               if (client->export_meta.bitmap) {
>                   ret = nbd_co_send_bitmap(client, request->handle,
>                                            client->exp->export_bitmap,
>                                            request->from, request->len,
> -                                         dont_fragment,
> -                                         true, NBD_META_ID_DIRTY_BITMAP, errp);
> +                                         dont_fragment, !--contexts_remaining,
> +                                         NBD_META_ID_DIRTY_BITMAP, errp);
>                   if (ret < 0) {
>                       return ret;
>                   }
>               }
> 
> +            assert(!contexts_remaining);
> +
>               return 0;
>           } else {
>               return nbd_send_generic_reply(client, request->handle, -EINVAL,
>
Eric Blake Oct. 7, 2020, 9:54 p.m. UTC | #2
On 10/7/20 8:40 AM, Vladimir Sementsov-Ogievskiy wrote:
> 30.09.2020 15:11, Eric Blake wrote:
>> 'qemu-img map' provides a way to determine which extents of an image
>> come from the top layer vs. inherited from a backing chain.  This is
>> useful information worth exposing over NBD.  There is a proposal to
>> add a QMP command block-dirty-bitmap-populate which can create a dirty
>> bitmap that reflects allocation information, at which point
>> qemu:dirty-bitmap:NAME can expose that information via the creation of
>> a temporary bitmap, but we can shorten the effort by adding a new
>> qemu:allocation-depth context that does the same thing without an
>> intermediate bitmap (this patch does not eliminate the need for that
>> proposal, as it will have other uses as well).
>>
>> For this patch, I just encoded a tri-state value (unallocated, from
>> this layer, from any of the backing layers); an obvious extension
>> would be to provide the actual depth in bits 31-4 while keeping bits
>> 1-0 as a tri-state (leaving bits 3-2 unused, for ease of reading depth
>> from a hex number).  But this extension would require
>> bdrv_is_allocated_above to return a depth number.
>>
>> Note that this patch does not actually enable any way to request a
>> server to enable this context; that will come in the next patch.
>>

>> +In the allocation depth context, bits 0 and 1 form a tri-state value:
>> +
>> +    bits 0-1 clear: NBD_STATE_DEPTH_UNALLOC, means the extent is
>> unallocated
>> +    bit 0 set: NBD_STATE_DEPTH_LOCAL, the extent is allocated in this
>> image
> 
> Maybe, s/this/the top level/, as, what is "this" for NBD client?

Sure.


>> @@ -864,12 +867,21 @@ static bool nbd_meta_qemu_query(NBDClient
>> *client, NBDExportMetaContexts *meta,
>>
>>       if (!*query) {

We get here for "qemu:".

>>           if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
>> +            meta->allocation_depth = meta->exp->alloc_context;
>>               meta->bitmap = !!meta->exp->export_bitmap;
>>           }
>>           trace_nbd_negotiate_meta_query_parse("empty");
>>           return true;
>>       }
>>
>> +    if (nbd_strshift(&query, "allocation-depth")) {

We get here for "qemu:allocation-depth", and as you pointed out,
"qemu:allocation-depth-extended".

>> +        trace_nbd_negotiate_meta_query_parse("allocation-depth");
>> +        if (nbd_meta_empty_or_pattern(client, "", query)) {
> 
> How much it differs from "if (!*query) {", I don't see...

The nbd_meta_empty_or_pattern returns false for
"qemu:allocation-depth-extended"; it only returns true for
"qemu:allocation-depth".  But, as you pointed out,

> 
>> +            meta->allocation_depth = meta->exp->alloc_context;
>> +        }
>> +        return true;
>> +    }
> 
> why not just
> 
>  if (!strcmp(query, "allocation-depth")) {
>     meta->allocation_depth = meta->exp->alloc_context;
>     return true;
>  }
> 
> ?

that does seem shorter.

> 
> With your code you also parse something like
> "allocation-depth-extended".. Is it on purpose?

The string is parsed, but does not affect meta->XXX, which means nothing
gets advertised to the client.  The trace messages might differ, but
behavior is correct.

> 
>> +
>>       if (nbd_strshift(&query, "dirty-bitmap:")) {
>>           trace_nbd_negotiate_meta_query_parse("dirty-bitmap:");
>>           if (!meta->exp->export_bitmap) {
> 
> 
> Also, "trace_nbd_negotiate_meta_query_skip("not dirty-bitmap"); " at
> function end should
> now be something like trace_nbd_negotiate_meta_query_skip("unknown
> context in qemu: namespace");

Good idea.


>> +/* Get allocation depth from the exported device and send it to the
>> client */
>> +static int nbd_co_send_block_alloc(NBDClient *client, uint64_t handle,
>> +                                   BlockDriverState *bs, uint64_t
>> offset,
>> +                                   uint32_t length, bool dont_fragment,
>> +                                   bool last, uint32_t context_id,
>> +                                   Error **errp)
>> +{
>> +    int ret;
>> +    unsigned int nb_extents = dont_fragment ? 1 :
>> NBD_MAX_BLOCK_STATUS_EXTENTS;
>> +    g_autoptr(NBDExtentArray) ea = nbd_extent_array_new(nb_extents);
>> +
>> +    ret = blockalloc_to_extents(bs, offset, length, ea);
>> +    if (ret < 0) {
>> +        return nbd_co_send_structured_error(
>> +                client, handle, -ret, "can't get block status", errp);
>> +    }
>> +
>> +    return nbd_co_send_extents(client, handle, ea, last, context_id,
>> errp);
>> +}
> 
> 
> The function is a copy of nbd_co_send_block_status()..
> 
> Probably, we can reuse nbd_co_send_block_status(), and just call
> blockstatus_to_extents()
> or blockalloc_to_extents() depending on context_id parameter.

Nice idea to reduce the duplication.

> 
> Still, I'm OK with it as is.
> 

So is that Reviewed-by:, or do I need to post v3 with my tweaks in
response to your comments?
Vladimir Sementsov-Ogievskiy Oct. 8, 2020, 1:04 p.m. UTC | #3
08.10.2020 00:54, Eric Blake wrote:
> On 10/7/20 8:40 AM, Vladimir Sementsov-Ogievskiy wrote:
>> 30.09.2020 15:11, Eric Blake wrote:
>>> 'qemu-img map' provides a way to determine which extents of an image
>>> come from the top layer vs. inherited from a backing chain.  This is
>>> useful information worth exposing over NBD.  There is a proposal to
>>> add a QMP command block-dirty-bitmap-populate which can create a dirty
>>> bitmap that reflects allocation information, at which point
>>> qemu:dirty-bitmap:NAME can expose that information via the creation of
>>> a temporary bitmap, but we can shorten the effort by adding a new
>>> qemu:allocation-depth context that does the same thing without an
>>> intermediate bitmap (this patch does not eliminate the need for that
>>> proposal, as it will have other uses as well).
>>>
>>> For this patch, I just encoded a tri-state value (unallocated, from
>>> this layer, from any of the backing layers); an obvious extension
>>> would be to provide the actual depth in bits 31-4 while keeping bits
>>> 1-0 as a tri-state (leaving bits 3-2 unused, for ease of reading depth
>>> from a hex number).  But this extension would require
>>> bdrv_is_allocated_above to return a depth number.
>>>
>>> Note that this patch does not actually enable any way to request a
>>> server to enable this context; that will come in the next patch.
>>>
> 
>>> +In the allocation depth context, bits 0 and 1 form a tri-state value:
>>> +
>>> +    bits 0-1 clear: NBD_STATE_DEPTH_UNALLOC, means the extent is
>>> unallocated
>>> +    bit 0 set: NBD_STATE_DEPTH_LOCAL, the extent is allocated in this
>>> image
>>
>> Maybe, s/this/the top level/, as, what is "this" for NBD client?
> 
> Sure.
> 
> 
>>> @@ -864,12 +867,21 @@ static bool nbd_meta_qemu_query(NBDClient
>>> *client, NBDExportMetaContexts *meta,
>>>
>>>        if (!*query) {
> 
> We get here for "qemu:".
> 
>>>            if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
>>> +            meta->allocation_depth = meta->exp->alloc_context;
>>>                meta->bitmap = !!meta->exp->export_bitmap;
>>>            }
>>>            trace_nbd_negotiate_meta_query_parse("empty");
>>>            return true;
>>>        }
>>>
>>> +    if (nbd_strshift(&query, "allocation-depth")) {
> 
> We get here for "qemu:allocation-depth", and as you pointed out,
> "qemu:allocation-depth-extended".
> 
>>> +        trace_nbd_negotiate_meta_query_parse("allocation-depth");
>>> +        if (nbd_meta_empty_or_pattern(client, "", query)) {
>>
>> How much it differs from "if (!*query) {", I don't see...
> 
> The nbd_meta_empty_or_pattern returns false for
> "qemu:allocation-depth-extended"; it only returns true for
> "qemu:allocation-depth".  But, as you pointed out,
> 
>>
>>> +            meta->allocation_depth = meta->exp->alloc_context;
>>> +        }
>>> +        return true;
>>> +    }
>>
>> why not just
>>
>>   if (!strcmp(query, "allocation-depth")) {
>>      meta->allocation_depth = meta->exp->alloc_context;
>>      return true;
>>   }
>>
>> ?
> 
> that does seem shorter.
> 
>>
>> With your code you also parse something like
>> "allocation-depth-extended".. Is it on purpose?
> 
> The string is parsed, but does not affect meta->XXX, which means nothing
> gets advertised to the client.  The trace messages might differ, but
> behavior is correct.
> 
>>
>>> +
>>>        if (nbd_strshift(&query, "dirty-bitmap:")) {
>>>            trace_nbd_negotiate_meta_query_parse("dirty-bitmap:");
>>>            if (!meta->exp->export_bitmap) {
>>
>>
>> Also, "trace_nbd_negotiate_meta_query_skip("not dirty-bitmap"); " at
>> function end should
>> now be something like trace_nbd_negotiate_meta_query_skip("unknown
>> context in qemu: namespace");
> 
> Good idea.
> 
> 
>>> +/* Get allocation depth from the exported device and send it to the
>>> client */
>>> +static int nbd_co_send_block_alloc(NBDClient *client, uint64_t handle,
>>> +                                   BlockDriverState *bs, uint64_t
>>> offset,
>>> +                                   uint32_t length, bool dont_fragment,
>>> +                                   bool last, uint32_t context_id,
>>> +                                   Error **errp)
>>> +{
>>> +    int ret;
>>> +    unsigned int nb_extents = dont_fragment ? 1 :
>>> NBD_MAX_BLOCK_STATUS_EXTENTS;
>>> +    g_autoptr(NBDExtentArray) ea = nbd_extent_array_new(nb_extents);
>>> +
>>> +    ret = blockalloc_to_extents(bs, offset, length, ea);
>>> +    if (ret < 0) {
>>> +        return nbd_co_send_structured_error(
>>> +                client, handle, -ret, "can't get block status", errp);
>>> +    }
>>> +
>>> +    return nbd_co_send_extents(client, handle, ea, last, context_id,
>>> errp);
>>> +}
>>
>>
>> The function is a copy of nbd_co_send_block_status()..
>>
>> Probably, we can reuse nbd_co_send_block_status(), and just call
>> blockstatus_to_extents()
>> or blockalloc_to_extents() depending on context_id parameter.
> 
> Nice idea to reduce the duplication.
> 
>>
>> Still, I'm OK with it as is.
>>

it was only about the duplicated function

> 
> So is that Reviewed-by:, or do I need to post v3 with my tweaks in
> response to your comments?
> 

Honestly it wasn't.. So, I found some things to discuss, and it's possible that I've looked through with less attention. Hm. So, with v3 or without, I've have to lookthrough it again..

My suggested diff:

diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
index 56efec7aee..a0d91ff08b 100644
--- a/docs/interop/nbd.txt
+++ b/docs/interop/nbd.txt
@@ -36,7 +36,8 @@ the image, with a single context named:
  In the allocation depth context, bits 0 and 1 form a tri-state value:
  
      bits 0-1 clear: NBD_STATE_DEPTH_UNALLOC, means the extent is unallocated
-    bit 0 set: NBD_STATE_DEPTH_LOCAL, the extent is allocated in this image
+    bit 0 set: NBD_STATE_DEPTH_LOCAL, the extent is allocated in the top
+               level image
      bit 1 set: NBD_STATE_DEPTH_BACKING, the extent is inherited from a
                 backing layer
  
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 06208bc250..2d7165960d 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -262,7 +262,7 @@ enum {
  /* Extent flags for qemu:allocation-depth in NBD_REPLY_TYPE_BLOCK_STATUS */
  #define NBD_STATE_DEPTH_UNALLOC (0 << 0)
  #define NBD_STATE_DEPTH_LOCAL   (1 << 0)
-#define NBD_STATE_DEPTH_BACKING (2 << 0)
+#define NBD_STATE_DEPTH_BACKING (1 << 1)
  #define NBD_STATE_DEPTH_MASK    (0x3)
  
  static inline bool nbd_reply_type_is_error(int type)
diff --git a/nbd/server.c b/nbd/server.c
index 830b21000b..3761ac180f 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -809,7 +809,7 @@ static bool nbd_meta_empty_or_pattern(NBDClient *client, const char *pattern,
  {
      if (!*query) {
          trace_nbd_negotiate_meta_query_parse("empty");
-        return !*pattern || client->opt == NBD_OPT_LIST_META_CONTEXT;
+        return client->opt == NBD_OPT_LIST_META_CONTEXT;
      }
      if (strcmp(query, pattern) == 0) {
          trace_nbd_negotiate_meta_query_parse(pattern);
@@ -874,11 +874,9 @@ static bool nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta,
          return true;
      }
  
-    if (nbd_strshift(&query, "allocation-depth")) {
+    if (!strcmp(query, "allocation-depth")) {
          trace_nbd_negotiate_meta_query_parse("allocation-depth");
-        if (nbd_meta_empty_or_pattern(client, "", query)) {
-            meta->allocation_depth = meta->exp->alloc_context;
-        }
+        meta->allocation_depth = meta->exp->alloc_context;
          return true;
      }
  
@@ -896,7 +894,7 @@ static bool nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta,
          return true;
      }
  
-    trace_nbd_negotiate_meta_query_skip("not dirty-bitmap");
+    trace_nbd_negotiate_meta_query_skip("unknown context in qemu: namespace");
      return true;
  }
  
@@ -2045,7 +2043,11 @@ static int nbd_co_send_extents(NBDClient *client, uint64_t handle,
      return nbd_co_send_iov(client, iov, 2, errp);
  }
  
-/* Get block status from the exported device and send it to the client */
+/*
+ * Get block status from the exported device and send it to the client,
+ * handling base:allocation or qemu:allocation-depth meta-context, based
+ * on @context_id value.
+ */
  static int nbd_co_send_block_status(NBDClient *client, uint64_t handle,
                                      BlockDriverState *bs, uint64_t offset,
                                      uint32_t length, bool dont_fragment,
@@ -2056,27 +2058,12 @@ static int nbd_co_send_block_status(NBDClient *client, uint64_t handle,
      unsigned int nb_extents = dont_fragment ? 1 : NBD_MAX_BLOCK_STATUS_EXTENTS;
      g_autoptr(NBDExtentArray) ea = nbd_extent_array_new(nb_extents);
  
-    ret = blockstatus_to_extents(bs, offset, length, ea);
-    if (ret < 0) {
-        return nbd_co_send_structured_error(
-                client, handle, -ret, "can't get block status", errp);
+    if (context_id == NBD_META_ID_BASE_ALLOCATION) {
+        ret = blockstatus_to_extents(bs, offset, length, ea);
+    } else {
+        assert(context_id == NBD_META_ID_ALLOCATION_DEPTH);
+        ret = blockalloc_to_extents(bs, offset, length, ea);
      }
-
-    return nbd_co_send_extents(client, handle, ea, last, context_id, errp);
-}
-
-/* Get allocation depth from the exported device and send it to the client */
-static int nbd_co_send_block_alloc(NBDClient *client, uint64_t handle,
-                                   BlockDriverState *bs, uint64_t offset,
-                                   uint32_t length, bool dont_fragment,
-                                   bool last, uint32_t context_id,
-                                   Error **errp)
-{
-    int ret;
-    unsigned int nb_extents = dont_fragment ? 1 : NBD_MAX_BLOCK_STATUS_EXTENTS;
-    g_autoptr(NBDExtentArray) ea = nbd_extent_array_new(nb_extents);
-
-    ret = blockalloc_to_extents(bs, offset, length, ea);
      if (ret < 0) {
          return nbd_co_send_structured_error(
                  client, handle, -ret, "can't get block status", errp);
@@ -2433,13 +2420,13 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
              }
  
              if (client->export_meta.allocation_depth) {
-                ret = nbd_co_send_block_alloc(client, request->handle,
-                                              blk_bs(exp->common.blk),
-                                              request->from, request->len,
-                                              dont_fragment,
-                                              !--contexts_remaining,
-                                              NBD_META_ID_ALLOCATION_DEPTH,
-                                              errp);
+                ret = nbd_co_send_block_status(client, request->handle,
+                                               blk_bs(exp->common.blk),
+                                               request->from, request->len,
+                                               dont_fragment,
+                                               !--contexts_remaining,
+                                               NBD_META_ID_ALLOCATION_DEPTH,
+                                               errp);
                  if (ret < 0) {
                      return ret;
                  }



with it:
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
diff mbox series

Patch

diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
index f3b3cacc9621..56efec7aee12 100644
--- a/docs/interop/nbd.txt
+++ b/docs/interop/nbd.txt
@@ -17,9 +17,9 @@  namespace "qemu".

 == "qemu" namespace ==

-The "qemu" namespace currently contains only one type of context,
-related to exposing the contents of a dirty bitmap alongside the
-associated disk contents.  That context has the following form:
+The "qemu" namespace currently contains two types of context.  The
+first is related to exposing the contents of a dirty bitmap alongside
+the associated disk contents.  That context has the following form:

     qemu:dirty-bitmap:<dirty-bitmap-export-name>

@@ -28,8 +28,21 @@  in reply for NBD_CMD_BLOCK_STATUS:

     bit 0: NBD_STATE_DIRTY, means that the extent is "dirty"

+The second is related to exposing the source of various extents within
+the image, with a single context named:
+
+    qemu:allocation-depth
+
+In the allocation depth context, bits 0 and 1 form a tri-state value:
+
+    bits 0-1 clear: NBD_STATE_DEPTH_UNALLOC, means the extent is unallocated
+    bit 0 set: NBD_STATE_DEPTH_LOCAL, the extent is allocated in this image
+    bit 1 set: NBD_STATE_DEPTH_BACKING, the extent is inherited from a
+               backing layer
+
 For NBD_OPT_LIST_META_CONTEXT the following queries are supported
-in addition to "qemu:dirty-bitmap:<dirty-bitmap-export-name>":
+in addition to the specific "qemu:allocation-depth" and
+"qemu:dirty-bitmap:<dirty-bitmap-export-name>":

 * "qemu:" - returns list of all available metadata contexts in the
             namespace.
@@ -55,3 +68,4 @@  the operation of that feature.
 NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
 * 4.2: NBD_FLAG_CAN_MULTI_CONN for shareable read-only exports,
 NBD_CMD_FLAG_FAST_ZERO
+* 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 3dd9a04546ec..06208bc25027 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -1,5 +1,5 @@ 
 /*
- *  Copyright (C) 2016-2019 Red Hat, Inc.
+ *  Copyright (C) 2016-2020 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
  *
  *  Network Block Device
@@ -259,6 +259,12 @@  enum {
 /* Extent flags for qemu:dirty-bitmap in NBD_REPLY_TYPE_BLOCK_STATUS */
 #define NBD_STATE_DIRTY (1 << 0)

+/* Extent flags for qemu:allocation-depth in NBD_REPLY_TYPE_BLOCK_STATUS */
+#define NBD_STATE_DEPTH_UNALLOC (0 << 0)
+#define NBD_STATE_DEPTH_LOCAL   (1 << 0)
+#define NBD_STATE_DEPTH_BACKING (2 << 0)
+#define NBD_STATE_DEPTH_MASK    (0x3)
+
 static inline bool nbd_reply_type_is_error(int type)
 {
     return type & (1 << 15);
diff --git a/nbd/server.c b/nbd/server.c
index 7271a09b5c2b..830b21000be3 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -27,7 +27,8 @@ 
 #include "qemu/units.h"

 #define NBD_META_ID_BASE_ALLOCATION 0
-#define NBD_META_ID_DIRTY_BITMAP 1
+#define NBD_META_ID_ALLOCATION_DEPTH 1
+#define NBD_META_ID_DIRTY_BITMAP 2

 /*
  * NBD_MAX_BLOCK_STATUS_EXTENTS: 1 MiB of extents data. An empirical
@@ -94,6 +95,7 @@  struct NBDExport {
     BlockBackend *eject_notifier_blk;
     Notifier eject_notifier;

+    bool alloc_context;
     BdrvDirtyBitmap *export_bitmap;
     char *export_bitmap_context;
 };
@@ -108,6 +110,7 @@  typedef struct NBDExportMetaContexts {
     bool valid; /* means that negotiation of the option finished without
                    errors */
     bool base_allocation; /* export base:allocation context (block status) */
+    bool allocation_depth; /* export qemu:allocation-depth */
     bool bitmap; /* export qemu:dirty-bitmap:<export bitmap name> */
 } NBDExportMetaContexts;

@@ -806,7 +809,7 @@  static bool nbd_meta_empty_or_pattern(NBDClient *client, const char *pattern,
 {
     if (!*query) {
         trace_nbd_negotiate_meta_query_parse("empty");
-        return client->opt == NBD_OPT_LIST_META_CONTEXT;
+        return !*pattern || client->opt == NBD_OPT_LIST_META_CONTEXT;
     }
     if (strcmp(query, pattern) == 0) {
         trace_nbd_negotiate_meta_query_parse(pattern);
@@ -864,12 +867,21 @@  static bool nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta,

     if (!*query) {
         if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
+            meta->allocation_depth = meta->exp->alloc_context;
             meta->bitmap = !!meta->exp->export_bitmap;
         }
         trace_nbd_negotiate_meta_query_parse("empty");
         return true;
     }

+    if (nbd_strshift(&query, "allocation-depth")) {
+        trace_nbd_negotiate_meta_query_parse("allocation-depth");
+        if (nbd_meta_empty_or_pattern(client, "", query)) {
+            meta->allocation_depth = meta->exp->alloc_context;
+        }
+        return true;
+    }
+
     if (nbd_strshift(&query, "dirty-bitmap:")) {
         trace_nbd_negotiate_meta_query_parse("dirty-bitmap:");
         if (!meta->exp->export_bitmap) {
@@ -984,6 +996,7 @@  static int nbd_negotiate_meta_queries(NBDClient *client,
     if (client->opt == NBD_OPT_LIST_META_CONTEXT && !nb_queries) {
         /* enable all known contexts */
         meta->base_allocation = true;
+        meta->allocation_depth = meta->exp->alloc_context;
         meta->bitmap = !!meta->exp->export_bitmap;
     } else {
         for (i = 0; i < nb_queries; ++i) {
@@ -1003,6 +1016,15 @@  static int nbd_negotiate_meta_queries(NBDClient *client,
         }
     }

+    if (meta->allocation_depth) {
+        ret = nbd_negotiate_send_meta_context(client, "qemu:allocation-depth",
+                                              NBD_META_ID_ALLOCATION_DEPTH,
+                                              errp);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
     if (meta->bitmap) {
         ret = nbd_negotiate_send_meta_context(client,
                                               meta->exp->export_bitmap_context,
@@ -1961,6 +1983,40 @@  static int blockstatus_to_extents(BlockDriverState *bs, uint64_t offset,
     return 0;
 }

+static int blockalloc_to_extents(BlockDriverState *bs, uint64_t offset,
+                                 uint64_t bytes, NBDExtentArray *ea)
+{
+    while (bytes) {
+        uint32_t flags;
+        int64_t num;
+        int ret = bdrv_is_allocated(bs, offset, bytes, &num);
+
+        if (ret < 0) {
+            return ret;
+        }
+
+        if (ret == 1) {
+            flags = NBD_STATE_DEPTH_LOCAL;
+        } else {
+            ret = bdrv_is_allocated_above(bs, NULL, false, offset, num,
+                                          &num);
+            if (ret < 0) {
+                return ret;
+            }
+            flags = ret ? NBD_STATE_DEPTH_BACKING : NBD_STATE_DEPTH_UNALLOC;
+        }
+
+        if (nbd_extent_array_add(ea, num, flags) < 0) {
+            return 0;
+        }
+
+        offset += num;
+        bytes -= num;
+    }
+
+    return 0;
+}
+
 /*
  * nbd_co_send_extents
  *
@@ -2009,6 +2065,26 @@  static int nbd_co_send_block_status(NBDClient *client, uint64_t handle,
     return nbd_co_send_extents(client, handle, ea, last, context_id, errp);
 }

+/* Get allocation depth from the exported device and send it to the client */
+static int nbd_co_send_block_alloc(NBDClient *client, uint64_t handle,
+                                   BlockDriverState *bs, uint64_t offset,
+                                   uint32_t length, bool dont_fragment,
+                                   bool last, uint32_t context_id,
+                                   Error **errp)
+{
+    int ret;
+    unsigned int nb_extents = dont_fragment ? 1 : NBD_MAX_BLOCK_STATUS_EXTENTS;
+    g_autoptr(NBDExtentArray) ea = nbd_extent_array_new(nb_extents);
+
+    ret = blockalloc_to_extents(bs, offset, length, ea);
+    if (ret < 0) {
+        return nbd_co_send_structured_error(
+                client, handle, -ret, "can't get block status", errp);
+    }
+
+    return nbd_co_send_extents(client, handle, ea, last, context_id, errp);
+}
+
 /* Populate @ea from a dirty bitmap. */
 static void bitmap_to_extents(BdrvDirtyBitmap *bitmap,
                               uint64_t offset, uint64_t length,
@@ -2335,16 +2411,20 @@  static coroutine_fn int nbd_handle_request(NBDClient *client,
         }
         if (client->export_meta.valid &&
             (client->export_meta.base_allocation ||
+             client->export_meta.allocation_depth ||
              client->export_meta.bitmap))
         {
             bool dont_fragment = request->flags & NBD_CMD_FLAG_REQ_ONE;
+            int contexts_remaining = client->export_meta.base_allocation +
+                client->export_meta.allocation_depth +
+                client->export_meta.bitmap;

             if (client->export_meta.base_allocation) {
                 ret = nbd_co_send_block_status(client, request->handle,
                                                blk_bs(exp->common.blk),
                                                request->from,
                                                request->len, dont_fragment,
-                                               !client->export_meta.bitmap,
+                                               !--contexts_remaining,
                                                NBD_META_ID_BASE_ALLOCATION,
                                                errp);
                 if (ret < 0) {
@@ -2352,17 +2432,32 @@  static coroutine_fn int nbd_handle_request(NBDClient *client,
                 }
             }

+            if (client->export_meta.allocation_depth) {
+                ret = nbd_co_send_block_alloc(client, request->handle,
+                                              blk_bs(exp->common.blk),
+                                              request->from, request->len,
+                                              dont_fragment,
+                                              !--contexts_remaining,
+                                              NBD_META_ID_ALLOCATION_DEPTH,
+                                              errp);
+                if (ret < 0) {
+                    return ret;
+                }
+            }
+
             if (client->export_meta.bitmap) {
                 ret = nbd_co_send_bitmap(client, request->handle,
                                          client->exp->export_bitmap,
                                          request->from, request->len,
-                                         dont_fragment,
-                                         true, NBD_META_ID_DIRTY_BITMAP, errp);
+                                         dont_fragment, !--contexts_remaining,
+                                         NBD_META_ID_DIRTY_BITMAP, errp);
                 if (ret < 0) {
                     return ret;
                 }
             }

+            assert(!contexts_remaining);
+
             return 0;
         } else {
             return nbd_send_generic_reply(client, request->handle, -EINVAL,