diff mbox series

[2/4] nbd/server: Advertise actual minimum block size

Message ID 20180802144834.520904-3-eblake@redhat.com
State New
Headers show
Series NBD fixes for unaligned images | expand

Commit Message

Eric Blake Aug. 2, 2018, 2:48 p.m. UTC
Both NBD_CMD_BLOCK_STATUS and structured NBD_CMD_READ will split
their reply according to bdrv_block_status() boundaries. If the
block device has a request_alignment smaller than 512, but we
advertise a block alignment of 512 to the client, then this can
result in the server reply violating client expectations by
reporting a smaller region of the export than what the client
is permitted to address.  Thus, it is imperative that we
advertise the actual minimum block limit, rather than blindly
rounding it up to 512 (bdrv_block_status() cannot return status
aligned any smaller than request_alignment).

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 nbd/server.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

Comments

Vladimir Sementsov-Ogievskiy Aug. 17, 2018, 1:49 p.m. UTC | #1
02.08.2018 17:48, Eric Blake wrote:
> Both NBD_CMD_BLOCK_STATUS and structured NBD_CMD_READ will split
> their reply according to bdrv_block_status() boundaries. If the
> block device has a request_alignment smaller than 512, but we
> advertise a block alignment of 512 to the client, then this can
> result in the server reply violating client expectations by
> reporting a smaller region of the export than what the client
> is permitted to address.  Thus, it is imperative that we
> advertise the actual minimum block limit, rather than blindly
> rounding it up to 512 (bdrv_block_status() cannot return status
> aligned any smaller than request_alignment).
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>   nbd/server.c | 10 ++++++----
>   1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/nbd/server.c b/nbd/server.c
> index ea5fe0eb336..cd3c41f895b 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -608,10 +608,12 @@ static int nbd_negotiate_handle_info(NBDClient *client, uint16_t myflags,
>       /* Send NBD_INFO_BLOCK_SIZE always, but tweak the minimum size
>        * according to whether the client requested it, and according to
>        * whether this is OPT_INFO or OPT_GO. */
> -    /* minimum - 1 for back-compat, or 512 if client is new enough.
> -     * TODO: consult blk_bs(blk)->bl.request_alignment? */
> -    sizes[0] =
> -            (client->opt == NBD_OPT_INFO || blocksize) ? BDRV_SECTOR_SIZE : 1;
> +    /* minimum - 1 for back-compat, or actual if client will obey it. */
> +    if (client->opt == NBD_OPT_INFO || blocksize) {
> +        sizes[0] = blk_get_request_alignment(exp->blk);
> +    } else {
> +        sizes[0] = 1;
> +    }
>       /* preferred - Hard-code to 4096 for now.
>        * TODO: is blk_bs(blk)->bl.opt_transfer appropriate? */
>       sizes[1] = 4096;

Here, what about dirty bitmap export through BLOCK_STATUS? Could it's 
granularity be less than request_alignment? Shouldn't we handle it somehow?
Eric Blake Aug. 17, 2018, 2:53 p.m. UTC | #2
On 08/17/2018 08:49 AM, Vladimir Sementsov-Ogievskiy wrote:

>> BDRV_SECTOR_SIZE : 1;
>> +    /* minimum - 1 for back-compat, or actual if client will obey it. */
>> +    if (client->opt == NBD_OPT_INFO || blocksize) {
>> +        sizes[0] = blk_get_request_alignment(exp->blk);
>> +    } else {
>> +        sizes[0] = 1;
>> +    }
>>       /* preferred - Hard-code to 4096 for now.
>>        * TODO: is blk_bs(blk)->bl.opt_transfer appropriate? */
>>       sizes[1] = 4096;
> 
> Here, what about dirty bitmap export through BLOCK_STATUS? Could it's 
> granularity be less than request_alignment? Shouldn't we handle it somehow?

Can you create a dirty bitmap with a granularity smaller than 
request_alignment?  I know you can configure dirty bitmap granularity 
independently from cluster size (in both directions: either smaller or 
larger than cluster size), but that it has a least a minimum lower 
bounds of 512.  You're probably right that we also want it to have a 
minimum lower bound of the request_alignment (if you're using a device 
with 4k minimum I/O, request_alignment would be 4k, and having a dirty 
bitmap any smaller than that granularity is wasted space).

On the other hand, I also think we're safe for this patch: even if you 
waste the space by creating a bitmap with too-small granularity, the 
actions that write bits into the bitmap will still be aligned to 
request_alignment, which means you always set a multiple of bits per 
action; when reading back the dirty bitmap to report over 
NBD_CMD_BLOCK_STATUS, you'll never encounter a change in bit status 
except on alignment boundaries (based on how the bits were written), and 
thus what NBD reports will still be aligned to the advertised minimum 
size, rather than the smaller bitmap granularity.
Vladimir Sementsov-Ogievskiy Aug. 17, 2018, 3:04 p.m. UTC | #3
17.08.2018 17:53, Eric Blake wrote:
> On 08/17/2018 08:49 AM, Vladimir Sementsov-Ogievskiy wrote:
>
>>> BDRV_SECTOR_SIZE : 1;
>>> +    /* minimum - 1 for back-compat, or actual if client will obey 
>>> it. */
>>> +    if (client->opt == NBD_OPT_INFO || blocksize) {
>>> +        sizes[0] = blk_get_request_alignment(exp->blk);
>>> +    } else {
>>> +        sizes[0] = 1;
>>> +    }
>>>       /* preferred - Hard-code to 4096 for now.
>>>        * TODO: is blk_bs(blk)->bl.opt_transfer appropriate? */
>>>       sizes[1] = 4096;
>>
>> Here, what about dirty bitmap export through BLOCK_STATUS? Could it's 
>> granularity be less than request_alignment? Shouldn't we handle it 
>> somehow?
>
> Can you create a dirty bitmap with a granularity smaller than 
> request_alignment?  I know you can configure dirty bitmap granularity 
> independently from cluster size (in both directions: either smaller or 
> larger than cluster size), but that it has a least a minimum lower 
> bounds of 512.  You're probably right that we also want it to have a 
> minimum lower bound of the request_alignment (if you're using a device 
> with 4k minimum I/O, request_alignment would be 4k, and having a dirty 
> bitmap any smaller than that granularity is wasted space).
>
> On the other hand, I also think we're safe for this patch: even if you 
> waste the space by creating a bitmap with too-small granularity, the 
> actions that write bits into the bitmap will still be aligned to 
> request_alignment,

Not quite right: nbd_export_bitmap searches through the backing chain 
for the bitmap, and it may correspond to the bds with smaller 
request_alignment.

> which means you always set a multiple of bits per action; when reading 
> back the dirty bitmap to report over NBD_CMD_BLOCK_STATUS, you'll 
> never encounter a change in bit status except on alignment boundaries 
> (based on how the bits were written), and thus what NBD reports will 
> still be aligned to the advertised minimum size, rather than the 
> smaller bitmap granularity.
>
Eric Blake Aug. 17, 2018, 3:11 p.m. UTC | #4
On 08/17/2018 10:04 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Can you create a dirty bitmap with a granularity smaller than 
>> request_alignment?  I know you can configure dirty bitmap granularity 
>> independently from cluster size (in both directions: either smaller or 
>> larger than cluster size), but that it has a least a minimum lower 
>> bounds of 512.  You're probably right that we also want it to have a 
>> minimum lower bound of the request_alignment (if you're using a device 
>> with 4k minimum I/O, request_alignment would be 4k, and having a dirty 
>> bitmap any smaller than that granularity is wasted space).
>>
>> On the other hand, I also think we're safe for this patch: even if you 
>> waste the space by creating a bitmap with too-small granularity, the 
>> actions that write bits into the bitmap will still be aligned to 
>> request_alignment,
> 
> Not quite right: nbd_export_bitmap searches through the backing chain 
> for the bitmap, and it may correspond to the bds with smaller 
> request_alignment.

Hmm. Interesting setup:

base (512-byte alignment, bitmap granularity) <- active (4k alignment)

so you're stating that exporting 'active' but with the 'bitmap' 
inherited from the backing chain from base means that the bitmap may 
have transitions in between the alignment advertised by active.  Then it 
sounds like nbd/server.c:bitmap_to_extents() should be aware of that 
fact, and intentionally round up (anywhere that it finds a mid-alignment 
transition, treat the entire aligned region as dirty. It never hurts to 
mark more of the image dirty than necessary, but does hurt to expose a 
mid-alignment transition to an NBD client not expecting one).
diff mbox series

Patch

diff --git a/nbd/server.c b/nbd/server.c
index ea5fe0eb336..cd3c41f895b 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -608,10 +608,12 @@  static int nbd_negotiate_handle_info(NBDClient *client, uint16_t myflags,
     /* Send NBD_INFO_BLOCK_SIZE always, but tweak the minimum size
      * according to whether the client requested it, and according to
      * whether this is OPT_INFO or OPT_GO. */
-    /* minimum - 1 for back-compat, or 512 if client is new enough.
-     * TODO: consult blk_bs(blk)->bl.request_alignment? */
-    sizes[0] =
-            (client->opt == NBD_OPT_INFO || blocksize) ? BDRV_SECTOR_SIZE : 1;
+    /* minimum - 1 for back-compat, or actual if client will obey it. */
+    if (client->opt == NBD_OPT_INFO || blocksize) {
+        sizes[0] = blk_get_request_alignment(exp->blk);
+    } else {
+        sizes[0] = 1;
+    }
     /* preferred - Hard-code to 4096 for now.
      * TODO: is blk_bs(blk)->bl.opt_transfer appropriate? */
     sizes[1] = 4096;