diff mbox series

[v3,4/6] nbd/client: Support qemu-img convert from unaligned size

Message ID 20190329042750.14704-5-eblake@redhat.com
State New
Headers show
Series NBD server alignment improvement | expand

Commit Message

Eric Blake March 29, 2019, 4:27 a.m. UTC
If an NBD server advertises a size that is not a multiple of a sector,
the block layer rounds up that size, even though we set info.size to
the exact byte value sent by the server. The block layer then proceeds
to let us read or query block status on the hole that it added past
EOF, which the NBD server is unlikely to be happy with. Fortunately,
qemu as a server never advertizes an unaligned size, so we generally
don't run into this problem; but the nbdkit server makes it easy to
test:

$ printf %1000d 1 > f1
$ ~/nbdkit/nbdkit -fv file f1 & pid=$!
$ qemu-img convert -f raw nbd://localhost:10809 f2
$ kill $pid
$ qemu-img compare f1 f2

Pre-patch, the server attempts a 1024-byte read, which nbdkit
rightfully rejects as going beyond its advertised 1000 byte size; the
conversion fails and the output files differ (not even the first
sector is copied, because qemu-img does not follow ddrescue's habit of
trying smaller reads to get as much information as possible in spite
of errors). Post-patch, the client's attempts to read (and query block
status, for new enough nbdkit) are properly truncated to the server's
length, with sane handling of the hole the block layer forced on
us. Although f2 ends up as a larger file (1024 bytes instead of 1000),
qemu-img compare shows the two images to have identical contents for
display to the guest.

I didn't add iotests coverage since I didn't want to add a dependency
on nbdkit in iotests. I also did NOT patch write, trim, or write
zeroes - these commands continue to fail (usually with ENOSPC, but
whatever the server chose), because we really can't write to the end
of the file, and because 'qemu-img convert' is the most common case
where we care about being tolerant (which is read-only). Perhaps we
could truncate the request if the client is writing zeros to the tail,
but that seems like more work, especially if the block layer is fixed
in 4.1 to track byte-accurate sizing (in which case this patch would
be reverted as unnecessary).

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/nbd-client.c | 39 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)

Comments

Vladimir Sementsov-Ogievskiy March 29, 2019, 3:26 p.m. UTC | #1
29.03.2019 7:27, Eric Blake wrote:
> If an NBD server advertises a size that is not a multiple of a sector,
> the block layer rounds up that size, even though we set info.size to
> the exact byte value sent by the server. The block layer then proceeds
> to let us read or query block status on the hole that it added past
> EOF, which the NBD server is unlikely to be happy with. Fortunately,
> qemu as a server never advertizes an unaligned size, so we generally
> don't run into this problem; but the nbdkit server makes it easy to
> test:
> 
> $ printf %1000d 1 > f1
> $ ~/nbdkit/nbdkit -fv file f1 & pid=$!
> $ qemu-img convert -f raw nbd://localhost:10809 f2
> $ kill $pid
> $ qemu-img compare f1 f2
> 
> Pre-patch, the server attempts a 1024-byte read, which nbdkit
> rightfully rejects as going beyond its advertised 1000 byte size; the
> conversion fails and the output files differ (not even the first
> sector is copied, because qemu-img does not follow ddrescue's habit of
> trying smaller reads to get as much information as possible in spite
> of errors). Post-patch, the client's attempts to read (and query block
> status, for new enough nbdkit) are properly truncated to the server's
> length, with sane handling of the hole the block layer forced on
> us. Although f2 ends up as a larger file (1024 bytes instead of 1000),
> qemu-img compare shows the two images to have identical contents for
> display to the guest.
> 
> I didn't add iotests coverage since I didn't want to add a dependency
> on nbdkit in iotests. I also did NOT patch write, trim, or write
> zeroes - these commands continue to fail (usually with ENOSPC, but
> whatever the server chose), because we really can't write to the end
> of the file, and because 'qemu-img convert' is the most common case
> where we care about being tolerant (which is read-only). Perhaps we
> could truncate the request if the client is writing zeros to the tail,
> but that seems like more work, especially if the block layer is fixed
> in 4.1 to track byte-accurate sizing (in which case this patch would
> be reverted as unnecessary).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>   block/nbd-client.c | 39 ++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 38 insertions(+), 1 deletion(-)
> 
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index 3edb508f668..409c2171bc3 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -848,6 +848,25 @@ int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
>       if (!bytes) {
>           return 0;
>       }
> +    /*
> +     * Work around the fact that the block layer doesn't do
> +     * byte-accurate sizing yet - if the read exceeds the server's
> +     * advertised size because the block layer rounded size up, then
> +     * truncate the request to the server and tail-pad with zero.
> +     */
> +    if (offset >= client->info.size) {
> +        assert(bytes < BDRV_SECTOR_SIZE);
> +        qemu_iovec_memset(qiov, 0, 0, bytes);
> +        return 0;
> +    }
> +    if (offset + bytes > client->info.size) {
> +        uint64_t slop = offset + bytes - client->info.size;
> +
> +        assert(slop < BDRV_SECTOR_SIZE);
> +        qemu_iovec_memset(qiov, bytes - slop, 0, slop);
> +        request.len -= slop;
> +    }
> +
>       ret = nbd_co_send_request(bs, &request, NULL);
>       if (ret < 0) {
>           return ret;
> @@ -966,7 +985,8 @@ int coroutine_fn nbd_client_co_block_status(BlockDriverState *bs,
>           .from = offset,
>           .len = MIN(MIN_NON_ZERO(QEMU_ALIGN_DOWN(INT_MAX,
>                                                   bs->bl.request_alignment),
> -                                client->info.max_block), bytes),
> +                                client->info.max_block),
> +                   MIN(bytes, client->info.size - offset)),
>           .flags = NBD_CMD_FLAG_REQ_ONE,
>       };
> 
> @@ -977,6 +997,23 @@ int coroutine_fn nbd_client_co_block_status(BlockDriverState *bs,
>           return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
>       }
> 
> +    /*
> +     * Work around the fact that the block layer doesn't do
> +     * byte-accurate sizing yet - if the status request exceeds the
> +     * server's advertised size because the block layer rounded size
> +     * up, we truncated the request to the server (above), or are
> +     * called on just the hole.
> +     */
> +    if (offset >= client->info.size) {
> +        *pnum = bytes;
> +        assert(bytes < BDRV_SECTOR_SIZE);
> +        /* Intentionally don't report offset_valid for the hole */
> +        return BDRV_BLOCK_ZERO;
> +    }
> +
> +    if (client->info.min_block) {
> +        assert(QEMU_IS_ALIGNED(request.len, client->info.min_block));

it will crash if info.size is unaligned..

" If a server advertises a minimum block size, the advertised export size SHOULD be an integer multiple of that block size"

violation "SHOULD" by server, should it lead to client crash?

> +    }
>       ret = nbd_co_send_request(bs, &request, NULL);
>       if (ret < 0) {
>           return ret;
>
Eric Blake March 29, 2019, 3:55 p.m. UTC | #2
On 3/29/19 10:26 AM, Vladimir Sementsov-Ogievskiy wrote:
> 29.03.2019 7:27, Eric Blake wrote:
>> If an NBD server advertises a size that is not a multiple of a sector,
>> the block layer rounds up that size, even though we set info.size to
>> the exact byte value sent by the server. The block layer then proceeds
>> to let us read or query block status on the hole that it added past
>> EOF, which the NBD server is unlikely to be happy with. Fortunately,
>> qemu as a server never advertizes an unaligned size, so we generally
>> don't run into this problem; but the nbdkit server makes it easy to
>> test:
>>
>> $ printf %1000d 1 > f1
>> $ ~/nbdkit/nbdkit -fv file f1 & pid=$!
>> $ qemu-img convert -f raw nbd://localhost:10809 f2
>> $ kill $pid
>> $ qemu-img compare f1 f2
>>

>> @@ -966,7 +985,8 @@ int coroutine_fn nbd_client_co_block_status(BlockDriverState *bs,
>>           .from = offset,
>>           .len = MIN(MIN_NON_ZERO(QEMU_ALIGN_DOWN(INT_MAX,
>>                                                   bs->bl.request_alignment),
>> -                                client->info.max_block), bytes),
>> +                                client->info.max_block),
>> +                   MIN(bytes, client->info.size - offset)),

This is a complex computation...

>>           .flags = NBD_CMD_FLAG_REQ_ONE,
>>       };
>>
>> @@ -977,6 +997,23 @@ int coroutine_fn nbd_client_co_block_status(BlockDriverState *bs,
>>           return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
>>       }
>>
>> +    /*
>> +     * Work around the fact that the block layer doesn't do
>> +     * byte-accurate sizing yet - if the status request exceeds the
>> +     * server's advertised size because the block layer rounded size
>> +     * up, we truncated the request to the server (above), or are
>> +     * called on just the hole.
>> +     */
>> +    if (offset >= client->info.size) {
>> +        *pnum = bytes;
>> +        assert(bytes < BDRV_SECTOR_SIZE);
>> +        /* Intentionally don't report offset_valid for the hole */
>> +        return BDRV_BLOCK_ZERO;
>> +    }
>> +
>> +    if (client->info.min_block) {
>> +        assert(QEMU_IS_ALIGNED(request.len, client->info.min_block));

...so the assert is here to say that before we hand a length to the
server, check that our computation ended up being aligned to the
server's requirements on input.

> 
> it will crash if info.size is unaligned..

It will crash if our client code is buggy, before the server ever gets a
chance to see things. Our client code should not be buggy (we do not
want to be sending the server unaligned requests).

> 
> " If a server advertises a minimum block size, the advertised export size SHOULD be an integer multiple of that block size"
> 
> violation "SHOULD" by server, should it lead to client crash?

We are not preventing the server replying with unaligned data, but the
client sending an unaligned request. You're right that we must not
abort() on anything the server sends (even if it was not compliant). But
the client code is under our control, and I'd rather catch it up front
if I have a bug in my complex calculations rather than waiting for
whatever the server decides to do with my buggy request.
Eric Blake March 29, 2019, 4:18 p.m. UTC | #3
On 3/29/19 10:55 AM, Eric Blake wrote:

>>> @@ -966,7 +985,8 @@ int coroutine_fn nbd_client_co_block_status(BlockDriverState *bs,
>>>           .from = offset,
>>>           .len = MIN(MIN_NON_ZERO(QEMU_ALIGN_DOWN(INT_MAX,
>>>                                                   bs->bl.request_alignment),
>>> -                                client->info.max_block), bytes),
>>> +                                client->info.max_block),
>>> +                   MIN(bytes, client->info.size - offset)),
> 
> This is a complex computation...
> 

>>> +    if (offset >= client->info.size) {
>>> +        *pnum = bytes;
>>> +        assert(bytes < BDRV_SECTOR_SIZE);
>>> +        /* Intentionally don't report offset_valid for the hole */
>>> +        return BDRV_BLOCK_ZERO;
>>> +    }
>>> +
>>> +    if (client->info.min_block) {
>>> +        assert(QEMU_IS_ALIGNED(request.len, client->info.min_block));
> 
> ...so the assert is here to say that before we hand a length to the
> server, check that our computation ended up being aligned to the
> server's requirements on input.
> 
>>
>> it will crash if info.size is unaligned..
> 
> It will crash if our client code is buggy, before the server ever gets a
> chance to see things. Our client code should not be buggy (we do not
> want to be sending the server unaligned requests).

Oh, I see what you are saying - if the server gives us a file length of
1023, in spite of also giving us a min_block of 512, we CANNOT access
those trailing bytes - but it can still cause this code to fail the
assertion. What I should do is submit an additional patch on the
handshake code that if the server sends us an inconsistent length and
mandates min_block, then we truncate the size to match what we can
actually access while still staying compliant (at which point,
client->info.size will always be aligned to client->info.min_block in
the rest of the code, such as here, and this assertion is back to being
safe about client-only computations that all the other inputs to the
computation were also aligned).

7/6 coming up soon...
diff mbox series

Patch

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 3edb508f668..409c2171bc3 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -848,6 +848,25 @@  int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
     if (!bytes) {
         return 0;
     }
+    /*
+     * Work around the fact that the block layer doesn't do
+     * byte-accurate sizing yet - if the read exceeds the server's
+     * advertised size because the block layer rounded size up, then
+     * truncate the request to the server and tail-pad with zero.
+     */
+    if (offset >= client->info.size) {
+        assert(bytes < BDRV_SECTOR_SIZE);
+        qemu_iovec_memset(qiov, 0, 0, bytes);
+        return 0;
+    }
+    if (offset + bytes > client->info.size) {
+        uint64_t slop = offset + bytes - client->info.size;
+
+        assert(slop < BDRV_SECTOR_SIZE);
+        qemu_iovec_memset(qiov, bytes - slop, 0, slop);
+        request.len -= slop;
+    }
+
     ret = nbd_co_send_request(bs, &request, NULL);
     if (ret < 0) {
         return ret;
@@ -966,7 +985,8 @@  int coroutine_fn nbd_client_co_block_status(BlockDriverState *bs,
         .from = offset,
         .len = MIN(MIN_NON_ZERO(QEMU_ALIGN_DOWN(INT_MAX,
                                                 bs->bl.request_alignment),
-                                client->info.max_block), bytes),
+                                client->info.max_block),
+                   MIN(bytes, client->info.size - offset)),
         .flags = NBD_CMD_FLAG_REQ_ONE,
     };

@@ -977,6 +997,23 @@  int coroutine_fn nbd_client_co_block_status(BlockDriverState *bs,
         return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
     }

+    /*
+     * Work around the fact that the block layer doesn't do
+     * byte-accurate sizing yet - if the status request exceeds the
+     * server's advertised size because the block layer rounded size
+     * up, we truncated the request to the server (above), or are
+     * called on just the hole.
+     */
+    if (offset >= client->info.size) {
+        *pnum = bytes;
+        assert(bytes < BDRV_SECTOR_SIZE);
+        /* Intentionally don't report offset_valid for the hole */
+        return BDRV_BLOCK_ZERO;
+    }
+
+    if (client->info.min_block) {
+        assert(QEMU_IS_ALIGNED(request.len, client->info.min_block));
+    }
     ret = nbd_co_send_request(bs, &request, NULL);
     if (ret < 0) {
         return ret;