Message ID | 20180802144834.520904-3-eblake@redhat.com |
---|---|
State | New |
Headers | show |
Series | NBD fixes for unaligned images | expand |
02.08.2018 17:48, Eric Blake wrote: > Both NBD_CMD_BLOCK_STATUS and structured NBD_CMD_READ will split > their reply according to bdrv_block_status() boundaries. If the > block device has a request_alignment smaller than 512, but we > advertise a block alignment of 512 to the client, then this can > result in the server reply violating client expectations by > reporting a smaller region of the export than what the client > is permitted to address. Thus, it is imperative that we > advertise the actual minimum block limit, rather than blindly > rounding it up to 512 (bdrv_block_status() cannot return status > aligned any smaller than request_alignment). > > Signed-off-by: Eric Blake <eblake@redhat.com> > --- > nbd/server.c | 10 ++++++---- > 1 file changed, 6 insertions(+), 4 deletions(-) > > diff --git a/nbd/server.c b/nbd/server.c > index ea5fe0eb336..cd3c41f895b 100644 > --- a/nbd/server.c > +++ b/nbd/server.c > @@ -608,10 +608,12 @@ static int nbd_negotiate_handle_info(NBDClient *client, uint16_t myflags, > /* Send NBD_INFO_BLOCK_SIZE always, but tweak the minimum size > * according to whether the client requested it, and according to > * whether this is OPT_INFO or OPT_GO. */ > - /* minimum - 1 for back-compat, or 512 if client is new enough. > - * TODO: consult blk_bs(blk)->bl.request_alignment? */ > - sizes[0] = > - (client->opt == NBD_OPT_INFO || blocksize) ? BDRV_SECTOR_SIZE : 1; > + /* minimum - 1 for back-compat, or actual if client will obey it. */ > + if (client->opt == NBD_OPT_INFO || blocksize) { > + sizes[0] = blk_get_request_alignment(exp->blk); > + } else { > + sizes[0] = 1; > + } > /* preferred - Hard-code to 4096 for now. > * TODO: is blk_bs(blk)->bl.opt_transfer appropriate? */ > sizes[1] = 4096; Here, what about dirty bitmap export through BLOCK_STATUS? Could it's granularity be less than request_alignment? Shouldn't we handle it somehow?
On 08/17/2018 08:49 AM, Vladimir Sementsov-Ogievskiy wrote: >> BDRV_SECTOR_SIZE : 1; >> + /* minimum - 1 for back-compat, or actual if client will obey it. */ >> + if (client->opt == NBD_OPT_INFO || blocksize) { >> + sizes[0] = blk_get_request_alignment(exp->blk); >> + } else { >> + sizes[0] = 1; >> + } >> /* preferred - Hard-code to 4096 for now. >> * TODO: is blk_bs(blk)->bl.opt_transfer appropriate? */ >> sizes[1] = 4096; > > Here, what about dirty bitmap export through BLOCK_STATUS? Could it's > granularity be less than request_alignment? Shouldn't we handle it somehow? Can you create a dirty bitmap with a granularity smaller than request_alignment? I know you can configure dirty bitmap granularity independently from cluster size (in both directions: either smaller or larger than cluster size), but that it has a least a minimum lower bounds of 512. You're probably right that we also want it to have a minimum lower bound of the request_alignment (if you're using a device with 4k minimum I/O, request_alignment would be 4k, and having a dirty bitmap any smaller than that granularity is wasted space). On the other hand, I also think we're safe for this patch: even if you waste the space by creating a bitmap with too-small granularity, the actions that write bits into the bitmap will still be aligned to request_alignment, which means you always set a multiple of bits per action; when reading back the dirty bitmap to report over NBD_CMD_BLOCK_STATUS, you'll never encounter a change in bit status except on alignment boundaries (based on how the bits were written), and thus what NBD reports will still be aligned to the advertised minimum size, rather than the smaller bitmap granularity.
17.08.2018 17:53, Eric Blake wrote: > On 08/17/2018 08:49 AM, Vladimir Sementsov-Ogievskiy wrote: > >>> BDRV_SECTOR_SIZE : 1; >>> + /* minimum - 1 for back-compat, or actual if client will obey >>> it. */ >>> + if (client->opt == NBD_OPT_INFO || blocksize) { >>> + sizes[0] = blk_get_request_alignment(exp->blk); >>> + } else { >>> + sizes[0] = 1; >>> + } >>> /* preferred - Hard-code to 4096 for now. >>> * TODO: is blk_bs(blk)->bl.opt_transfer appropriate? */ >>> sizes[1] = 4096; >> >> Here, what about dirty bitmap export through BLOCK_STATUS? Could it's >> granularity be less than request_alignment? Shouldn't we handle it >> somehow? > > Can you create a dirty bitmap with a granularity smaller than > request_alignment? I know you can configure dirty bitmap granularity > independently from cluster size (in both directions: either smaller or > larger than cluster size), but that it has a least a minimum lower > bounds of 512. You're probably right that we also want it to have a > minimum lower bound of the request_alignment (if you're using a device > with 4k minimum I/O, request_alignment would be 4k, and having a dirty > bitmap any smaller than that granularity is wasted space). > > On the other hand, I also think we're safe for this patch: even if you > waste the space by creating a bitmap with too-small granularity, the > actions that write bits into the bitmap will still be aligned to > request_alignment, Not quite right: nbd_export_bitmap searches through the backing chain for the bitmap, and it may correspond to the bds with smaller request_alignment. > which means you always set a multiple of bits per action; when reading > back the dirty bitmap to report over NBD_CMD_BLOCK_STATUS, you'll > never encounter a change in bit status except on alignment boundaries > (based on how the bits were written), and thus what NBD reports will > still be aligned to the advertised minimum size, rather than the > smaller bitmap granularity. >
On 08/17/2018 10:04 AM, Vladimir Sementsov-Ogievskiy wrote: >> Can you create a dirty bitmap with a granularity smaller than >> request_alignment? I know you can configure dirty bitmap granularity >> independently from cluster size (in both directions: either smaller or >> larger than cluster size), but that it has a least a minimum lower >> bounds of 512. You're probably right that we also want it to have a >> minimum lower bound of the request_alignment (if you're using a device >> with 4k minimum I/O, request_alignment would be 4k, and having a dirty >> bitmap any smaller than that granularity is wasted space). >> >> On the other hand, I also think we're safe for this patch: even if you >> waste the space by creating a bitmap with too-small granularity, the >> actions that write bits into the bitmap will still be aligned to >> request_alignment, > > Not quite right: nbd_export_bitmap searches through the backing chain > for the bitmap, and it may correspond to the bds with smaller > request_alignment. Hmm. Interesting setup: base (512-byte alignment, bitmap granularity) <- active (4k alignment) so you're stating that exporting 'active' but with the 'bitmap' inherited from the backing chain from base means that the bitmap may have transitions in between the alignment advertised by active. Then it sounds like nbd/server.c:bitmap_to_extents() should be aware of that fact, and intentionally round up (anywhere that it finds a mid-alignment transition, treat the entire aligned region as dirty. It never hurts to mark more of the image dirty than necessary, but does hurt to expose a mid-alignment transition to an NBD client not expecting one).
diff --git a/nbd/server.c b/nbd/server.c index ea5fe0eb336..cd3c41f895b 100644 --- a/nbd/server.c +++ b/nbd/server.c @@ -608,10 +608,12 @@ static int nbd_negotiate_handle_info(NBDClient *client, uint16_t myflags, /* Send NBD_INFO_BLOCK_SIZE always, but tweak the minimum size * according to whether the client requested it, and according to * whether this is OPT_INFO or OPT_GO. */ - /* minimum - 1 for back-compat, or 512 if client is new enough. - * TODO: consult blk_bs(blk)->bl.request_alignment? */ - sizes[0] = - (client->opt == NBD_OPT_INFO || blocksize) ? BDRV_SECTOR_SIZE : 1; + /* minimum - 1 for back-compat, or actual if client will obey it. */ + if (client->opt == NBD_OPT_INFO || blocksize) { + sizes[0] = blk_get_request_alignment(exp->blk); + } else { + sizes[0] = 1; + } /* preferred - Hard-code to 4096 for now. * TODO: is blk_bs(blk)->bl.opt_transfer appropriate? */ sizes[1] = 4096;
Both NBD_CMD_BLOCK_STATUS and structured NBD_CMD_READ will split their reply according to bdrv_block_status() boundaries. If the block device has a request_alignment smaller than 512, but we advertise a block alignment of 512 to the client, then this can result in the server reply violating client expectations by reporting a smaller region of the export than what the client is permitted to address. Thus, it is imperative that we advertise the actual minimum block limit, rather than blindly rounding it up to 512 (bdrv_block_status() cannot return status aligned any smaller than request_alignment). Signed-off-by: Eric Blake <eblake@redhat.com> --- nbd/server.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)