proto: add 'shift' extension.
diff mbox

Message ID 1474893981-5302-1-git-send-email-vsementsov@virtuozzo.com
State New
Headers show

Commit Message

Vladimir Sementsov-Ogievskiy Sept. 26, 2016, 12:46 p.m. UTC
This extension allows big requests for TRIM and WRITE_ZEROES through
special 'shift' parameter, which means that offset and length should be
shifted left by several bits.

This is needed for efficient clearing large regions of the disk (up to
the whole disk).

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---

Here mentioned WRITE_ZEROES command which is only an experemental
extension for now.

To dicuss:
NBD_OPT_SHIFT Data. It can be reduced to 8 bits actually... Are some
   reserved bits needed here?

 doc/proto.md | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

Comments

Paolo Bonzini Sept. 26, 2016, 12:51 p.m. UTC | #1
On 26/09/2016 14:46, Vladimir Sementsov-Ogievskiy wrote:
> This extension allows big requests for TRIM and WRITE_ZEROES through
> special 'shift' parameter, which means that offset and length should be
> shifted left by several bits.
> 
> This is needed for efficient clearing large regions of the disk (up to
> the whole disk).
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
> 
> Here mentioned WRITE_ZEROES command which is only an experemental
> extension for now.
> 
> To dicuss:
> NBD_OPT_SHIFT Data. It can be reduced to 8 bits actually... Are some
>    reserved bits needed here?
> 
>  doc/proto.md | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/doc/proto.md b/doc/proto.md
> index 2de3a6a..6fd1b16 100644
> --- a/doc/proto.md
> +++ b/doc/proto.md
> @@ -682,6 +682,8 @@ The field has the following format:
>    experimental `WRITE_ZEROES` [extension](https://github.com/yoe/nbd/blob/extension-write-zeroes/doc/proto.md).
>  - bit 7, `NBD_FLAG_SEND_DF`: defined by the experimental `STRUCTURED_REPLY`
>    [extension](https://github.com/yoe/nbd/blob/extension-structured-reply/doc/proto.md).
> +- bit 8, `NBD_FLAG_SEND_SHIFT` : exposes support for `NBD_CMD_FLAG_SHIFT` and
> +  `NBD_OPT_SHIFT`
>  
>  Clients SHOULD ignore unknown flags.
>  
> @@ -765,6 +767,15 @@ of the newstyle negotiation.
>  
>      Defined by the experimental `INFO` [extension](https://github.com/yoe/nbd/blob/extension-info/doc/proto.md).
>  
> +- `NBD_OPT_SHIFT` (10)
> +
> +    Defines shift used to calculate request offset and length if
> +    `NBD_CMD_FLAG_SHIFT` is set.
> +
> +    Data:
> +
> +    - 32 bits, shift (unsigned); Must not be larger than 32.
> +
>  #### Option reply types
>  
>  These values are used in the "reply type" field, sent by the server
> @@ -872,7 +883,13 @@ valid may depend on negotiation during the handshake phase.
>    [extension](https://github.com/yoe/nbd/blob/extension-write-zeroes/doc/proto.md).
>  - bit 2, `NBD_CMD_FLAG_DF`; defined by the experimental `STRUCTURED_REPLY`
>    [extension](https://github.com/yoe/nbd/blob/extension-structured-reply/doc/proto.md).
> -
> +- bit 3, `NBD_CMD_FLAG_SHIFT`; This flag is valid for `NBD_CMD_TRIM` and
> +  `NBD_CMD_WRITE_ZEROES`. If this flag is set the server shifts request
> +  *length* and *offset* left by N bits, where N is defined by `NBD_OPT_SHIFT`
> +  option or is assumed to be 16 bits by default if `NBD_OPT_SHIFT` option is
> +  not specified. If after shift `(offset + length)` exceeds disk size, length
> +  should be reduced to `(<disk size> - offset)`. However, `(offset + length)`
> +  must not exceed disk size by more than `(1 << N) - 1`.
>  
>  #### Request types
>  
> 

This is very ad hoc.  Can we instead have a block size common to all
commands?  Block devices in practice have one, in fact that's why
they're called block devices...

Paolo
Vladimir Sementsov-Ogievskiy Sept. 26, 2016, 1:53 p.m. UTC | #2
On 26.09.2016 15:51, Paolo Bonzini wrote:
>
> On 26/09/2016 14:46, Vladimir Sementsov-Ogievskiy wrote:
>> This extension allows big requests for TRIM and WRITE_ZEROES through
>> special 'shift' parameter, which means that offset and length should be
>> shifted left by several bits.
>>
>> This is needed for efficient clearing large regions of the disk (up to
>> the whole disk).
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>
>> Here mentioned WRITE_ZEROES command which is only an experemental
>> extension for now.
>>
>> To dicuss:
>> NBD_OPT_SHIFT Data. It can be reduced to 8 bits actually... Are some
>>     reserved bits needed here?
>>
>>   doc/proto.md | 19 ++++++++++++++++++-
>>   1 file changed, 18 insertions(+), 1 deletion(-)
>>
>> diff --git a/doc/proto.md b/doc/proto.md
>> index 2de3a6a..6fd1b16 100644
>> --- a/doc/proto.md
>> +++ b/doc/proto.md
>> @@ -682,6 +682,8 @@ The field has the following format:
>>     experimental `WRITE_ZEROES` [extension](https://github.com/yoe/nbd/blob/extension-write-zeroes/doc/proto.md).
>>   - bit 7, `NBD_FLAG_SEND_DF`: defined by the experimental `STRUCTURED_REPLY`
>>     [extension](https://github.com/yoe/nbd/blob/extension-structured-reply/doc/proto.md).
>> +- bit 8, `NBD_FLAG_SEND_SHIFT` : exposes support for `NBD_CMD_FLAG_SHIFT` and
>> +  `NBD_OPT_SHIFT`
>>   
>>   Clients SHOULD ignore unknown flags.
>>   
>> @@ -765,6 +767,15 @@ of the newstyle negotiation.
>>   
>>       Defined by the experimental `INFO` [extension](https://github.com/yoe/nbd/blob/extension-info/doc/proto.md).
>>   
>> +- `NBD_OPT_SHIFT` (10)
>> +
>> +    Defines shift used to calculate request offset and length if
>> +    `NBD_CMD_FLAG_SHIFT` is set.
>> +
>> +    Data:
>> +
>> +    - 32 bits, shift (unsigned); Must not be larger than 32.
>> +
>>   #### Option reply types
>>   
>>   These values are used in the "reply type" field, sent by the server
>> @@ -872,7 +883,13 @@ valid may depend on negotiation during the handshake phase.
>>     [extension](https://github.com/yoe/nbd/blob/extension-write-zeroes/doc/proto.md).
>>   - bit 2, `NBD_CMD_FLAG_DF`; defined by the experimental `STRUCTURED_REPLY`
>>     [extension](https://github.com/yoe/nbd/blob/extension-structured-reply/doc/proto.md).
>> -
>> +- bit 3, `NBD_CMD_FLAG_SHIFT`; This flag is valid for `NBD_CMD_TRIM` and
>> +  `NBD_CMD_WRITE_ZEROES`. If this flag is set the server shifts request
>> +  *length* and *offset* left by N bits, where N is defined by `NBD_OPT_SHIFT`
>> +  option or is assumed to be 16 bits by default if `NBD_OPT_SHIFT` option is
>> +  not specified. If after shift `(offset + length)` exceeds disk size, length
>> +  should be reduced to `(<disk size> - offset)`. However, `(offset + length)`
>> +  must not exceed disk size by more than `(1 << N) - 1`.
>>   
>>   #### Request types
>>   
>>
> This is very ad hoc.  Can we instead have a block size common to all
> commands?  Block devices in practice have one, in fact that's why
> they're called block devices...
>
> Paolo

Block size can be too small to clear the whole disk in one request (i.e. 
(2**31 * block_size) is too small..)
.
Paolo Bonzini Sept. 26, 2016, 1:54 p.m. UTC | #3
On 26/09/2016 15:53, Vladimir Sementsov-Ogievskiy wrote:
> On 26.09.2016 15:51, Paolo Bonzini wrote:
>> This is very ad hoc.  Can we instead have a block size common to all
>> commands?  Block devices in practice have one, in fact that's why
>> they're called block devices...
> 
> Block size can be too small to clear the whole disk in one request (i.e.
> (2**31 * block_size) is too small..)

Considering that NBD supports multiple outstanding requests, is it a big
deal to require one request per terabyte of storage?

Paolo
Alex Bligh Sept. 26, 2016, 2:05 p.m. UTC | #4
> On 26 Sep 2016, at 13:46, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote:
> 
> #### Option reply types
> 
> These values are used in the "reply type" field, sent by the server
> @@ -872,7 +883,13 @@ valid may depend on negotiation during the handshake phase.
>   [extension](https://github.com/yoe/nbd/blob/extension-write-zeroes/doc/proto.md).
> - bit 2, `NBD_CMD_FLAG_DF`; defined by the experimental `STRUCTURED_REPLY`
>   [extension](https://github.com/yoe/nbd/blob/extension-structured-reply/doc/proto.md).
> -
> +- bit 3, `NBD_CMD_FLAG_SHIFT`; This flag is valid for `NBD_CMD_TRIM` and
> +  `NBD_CMD_WRITE_ZEROES`. If this flag is set the server shifts request
> +  *length* and *offset* left by N bits, where N is defined by `NBD_OPT_SHIFT`
> +  option or is assumed to be 16 bits by default if `NBD_OPT_SHIFT` option is
> +  not specified. If after shift `(offset + length)` exceeds disk size, length
> +  should be reduced to `(<disk size> - offset)`. However, `(offset + length)`
> +  must not exceed disk size by more than `(1 << N) - 1`.

Is there a good reason the shift size can't be either the minimum block size
or otherwise negotiated using NBD_OPT_INFO?
Alex Bligh Sept. 26, 2016, 2:06 p.m. UTC | #5
> On 26 Sep 2016, at 14:54, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> 
> Considering that NBD supports multiple outstanding requests, is it a big
> deal to require one request per terabyte of storage?

+1

Also I don't think we particularly want to make clearing the entire
disk easy to do by mistake!

This whole 'clear the whole disk in one command' seems like a
dire case of premature optimisation.
Eric Blake Sept. 26, 2016, 8:21 p.m. UTC | #6
On 09/26/2016 07:46 AM, Vladimir Sementsov-Ogievskiy wrote:
> This extension allows big requests for TRIM and WRITE_ZEROES through
> special 'shift' parameter, which means that offset and length should be
> shifted left by several bits.
> 
> This is needed for efficient clearing large regions of the disk (up to
> the whole disk).
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
> 
> Here mentioned WRITE_ZEROES command which is only an experemental
> extension for now.
> 
> To dicuss:

> +- `NBD_OPT_SHIFT` (10)
> +
> +    Defines shift used to calculate request offset and length if
> +    `NBD_CMD_FLAG_SHIFT` is set.
> +
> +    Data:
> +
> +    - 32 bits, shift (unsigned); Must not be larger than 32.

Uggh. You're making the protocol stateful (the server has to remember
what the client has previously requested via NBD_CMD_FLAG_SHIFT, rather
than having ALL information it needs immediately available in the
current NBD_CMD_WRITE_ZEROES).

I'd much rather support a single flag that says to zero the entire disk
than to introduce stateful variable-amount shifting.
Eric Blake Sept. 26, 2016, 8:35 p.m. UTC | #7
On 09/26/2016 07:51 AM, Paolo Bonzini wrote:

>> +- bit 3, `NBD_CMD_FLAG_SHIFT`; This flag is valid for `NBD_CMD_TRIM` and
>> +  `NBD_CMD_WRITE_ZEROES`. If this flag is set the server shifts request
>> +  *length* and *offset* left by N bits, where N is defined by `NBD_OPT_SHIFT`
>> +  option or is assumed to be 16 bits by default if `NBD_OPT_SHIFT` option is
>> +  not specified. If after shift `(offset + length)` exceeds disk size, length
>> +  should be reduced to `(<disk size> - offset)`. However, `(offset + length)`
>> +  must not exceed disk size by more than `(1 << N) - 1`.
>>  
>>  #### Request types
>>  
>>
> 
> This is very ad hoc.  Can we instead have a block size common to all
> commands?  Block devices in practice have one, in fact that's why
> they're called block devices...

The INFO extensions are supposed to be a way for the server to
communicate block sizes to the guest (note, it is one-way communication;
the guest does NOT get to pick an arbitrary size, but relies on the
server reporting it) - but I don't know if that sizing information is
useful enough for the task at hand.  As it was, the INFO extension had a
proposal idea a while back about advertising a different size for TRIM
and WRITE_ZEROES than what is preferred for WRITE and READ, so having a
single size that works for shifted commands that operate on blocks
instead of bytes may not even be feasible if there is no single block
size to settle on.

But having a universal command flag that says "this command requests
offset and length in units of blocks instead of bytes", where blocks is
an up-front sizing settled on during handshake via INFO extension, and
NOT something that the client can change at-will during a live session,
may be useful.  Or it may be the source of arithmetic overflow exploits
in poor implementations, or of denial-of-service with used with a READ
or WRITE to send more than 2G of data in a single command.  In other
words, I don't yet see a compelling use case for being able to request
shifted sizes.
Wouter Verhelst Sept. 26, 2016, 11:41 p.m. UTC | #8
On Mon, Sep 26, 2016 at 03:21:46PM -0500, Eric Blake wrote:
> I'd much rather support a single flag that says to zero the entire disk
> than to introduce stateful variable-amount shifting.

That's almost exactly the opposite of what I said :)

Now, I don't feel very strong either way, but what matters to me is:

- NBD is a simple, easy to understand protocol; that is a feature, and
  so it should remain that way.
- Every time we add another option, flag, or command, we make the
  protocol slightly more complex, which is counter to that goal.
- Adding a command with a single use case (i.e., a "wipe the whole
  device" command) seems like it would not see much use, except perhaps
  in the use case that Virtuozzo is thinking about. In other words, it
  makes things slightly more complex for little benefit.

I thought a negotiated shift size could be creatively used for other
things beyond just "wipe the whole disk" commands, and that it might be
elegant in that way. On the other hand, I recognize that adding state in
that manner also complicates the protocol in that an observer which sees
only part of the traffic may not understand what's going on anymore.

So let's just say that an NBD_CMD_FLAG_SHIFT would:
- Left-shift the size by 16 bits; no more, no less
  - 2^32-1 is too large a granularity for this to be useful beyond "wipe
    whole disk" commands; 2^16-1 (65535) seems like a more useful
    granularity.
  - This allows for a maximum number of 2^48-1 bytes (one byte shy of
    256 tebibytes) to be affected by a single command, which seems
    sufficient for the given purpose.
  - If someone really wants to wipe 2^64-1 bytes (i.e., 16 exbibytes),
    they are probably using the wrong tools.
- Be only valid for commands that don't send or expect data to be sent
  out over the wire.
  - currently TRIM and WRITE_ZEROES, but not READ or WRITE.

Thoughts?
Alex Bligh Sept. 27, 2016, 7:36 a.m. UTC | #9
> On 27 Sep 2016, at 00:41, Wouter Verhelst <w@uter.be> wrote:
> 
> Thoughts?

Honestly? This whole thing seems like complication for little gain. One command per 2GB wiped doesn't seem like a bad thing.
Denis Lunev Sept. 27, 2016, 9:43 a.m. UTC | #10
On 09/26/2016 03:46 PM, Vladimir Sementsov-Ogievskiy wrote:
> This extension allows big requests for TRIM and WRITE_ZEROES through
> special 'shift' parameter, which means that offset and length should be
> shifted left by several bits.
>
> This is needed for efficient clearing large regions of the disk (up to
> the whole disk).
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>
> Here mentioned WRITE_ZEROES command which is only an experemental
> extension for now.
>
> To dicuss:
> NBD_OPT_SHIFT Data. It can be reduced to 8 bits actually... Are some
>    reserved bits needed here?
>
>  doc/proto.md | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/doc/proto.md b/doc/proto.md
> index 2de3a6a..6fd1b16 100644
> --- a/doc/proto.md
> +++ b/doc/proto.md
> @@ -682,6 +682,8 @@ The field has the following format:
>    experimental `WRITE_ZEROES` [extension](https://github.com/yoe/nbd/blob/extension-write-zeroes/doc/proto.md).
>  - bit 7, `NBD_FLAG_SEND_DF`: defined by the experimental `STRUCTURED_REPLY`
>    [extension](https://github.com/yoe/nbd/blob/extension-structured-reply/doc/proto.md).
> +- bit 8, `NBD_FLAG_SEND_SHIFT` : exposes support for `NBD_CMD_FLAG_SHIFT` and
> +  `NBD_OPT_SHIFT`
>  
>  Clients SHOULD ignore unknown flags.
>  
> @@ -765,6 +767,15 @@ of the newstyle negotiation.
>  
>      Defined by the experimental `INFO` [extension](https://github.com/yoe/nbd/blob/extension-info/doc/proto.md).
>  
> +- `NBD_OPT_SHIFT` (10)
> +
> +    Defines shift used to calculate request offset and length if
> +    `NBD_CMD_FLAG_SHIFT` is set.
> +
> +    Data:
> +
> +    - 32 bits, shift (unsigned); Must not be larger than 32.
> +
>  #### Option reply types
>  
>  These values are used in the "reply type" field, sent by the server
> @@ -872,7 +883,13 @@ valid may depend on negotiation during the handshake phase.
>    [extension](https://github.com/yoe/nbd/blob/extension-write-zeroes/doc/proto.md).
>  - bit 2, `NBD_CMD_FLAG_DF`; defined by the experimental `STRUCTURED_REPLY`
>    [extension](https://github.com/yoe/nbd/blob/extension-structured-reply/doc/proto.md).
> -
> +- bit 3, `NBD_CMD_FLAG_SHIFT`; This flag is valid for `NBD_CMD_TRIM` and
> +  `NBD_CMD_WRITE_ZEROES`. If this flag is set the server shifts request
> +  *length* and *offset* left by N bits, where N is defined by `NBD_OPT_SHIFT`
> +  option or is assumed to be 16 bits by default if `NBD_OPT_SHIFT` option is
> +  not specified. If after shift `(offset + length)` exceeds disk size, length
> +  should be reduced to `(<disk size> - offset)`. However, `(offset + length)`
> +  must not exceed disk size by more than `(1 << N) - 1`.
>  
>  #### Request types
>  
We could go in a different direction and export flag
'has_zero_init' which will report that the storage is
initialized with all zeroes at the moment. In this
case mirroring code will not fall into this
branch.

Den
Paolo Bonzini Sept. 27, 2016, 10:15 a.m. UTC | #11
> We could go in a different direction and export flag
> 'has_zero_init' which will report that the storage is
> initialized with all zeroes at the moment. In this
> case mirroring code will not fall into this
> branch.

Why don't you add the zero_init flag to QEMU's NBD driver instead?

Thanks,

Paolo
Denis Lunev Sept. 27, 2016, 10:25 a.m. UTC | #12
On 09/27/2016 01:15 PM, Paolo Bonzini wrote:
>> We could go in a different direction and export flag
>> 'has_zero_init' which will report that the storage is
>> initialized with all zeroes at the moment. In this
>> case mirroring code will not fall into this
>> branch.
> Why don't you add the zero_init flag to QEMU's NBD driver instead?
>
> Thanks,
>
> Paolo
for all cases without knowing real backend on the server side?
I think this would be wrong.

Den
Paolo Bonzini Sept. 27, 2016, 12:07 p.m. UTC | #13
----- Original Message -----
> From: "Denis V. Lunev" <den@virtuozzo.com>
> To: "Paolo Bonzini" <pbonzini@redhat.com>
> Cc: "Vladimir Sementsov-Ogievskiy" <vsementsov@virtuozzo.com>, qemu-devel@nongnu.org, qemu-block@nongnu.org,
> nbd-general@lists.sourceforge.net, alex@alex.org.uk, eblake@redhat.com, kwolf@redhat.com, stefanha@redhat.com,
> w@uter.be
> Sent: Tuesday, September 27, 2016 12:25:54 PM
> Subject: Re: [PATCH] proto: add 'shift' extension.
> 
> On 09/27/2016 01:15 PM, Paolo Bonzini wrote:
> >> We could go in a different direction and export flag
> >> 'has_zero_init' which will report that the storage is
> >> initialized with all zeroes at the moment. In this
> >> case mirroring code will not fall into this
> >> branch.
> > Why don't you add the zero_init flag to QEMU's NBD driver instead?
>
> for all cases without knowing real backend on the server side?
> I think this would be wrong.

Add it to the command line, and leave it to libvirt or the user to
pass "-drive file.driver=nbd,...,file.zero-init=on".

Paolo
Denis Lunev Sept. 27, 2016, 1:28 p.m. UTC | #14
On 09/27/2016 03:07 PM, Paolo Bonzini wrote:
>
> ----- Original Message -----
>> From: "Denis V. Lunev" <den@virtuozzo.com>
>> To: "Paolo Bonzini" <pbonzini@redhat.com>
>> Cc: "Vladimir Sementsov-Ogievskiy" <vsementsov@virtuozzo.com>, qemu-devel@nongnu.org, qemu-block@nongnu.org,
>> nbd-general@lists.sourceforge.net, alex@alex.org.uk, eblake@redhat.com, kwolf@redhat.com, stefanha@redhat.com,
>> w@uter.be
>> Sent: Tuesday, September 27, 2016 12:25:54 PM
>> Subject: Re: [PATCH] proto: add 'shift' extension.
>>
>> On 09/27/2016 01:15 PM, Paolo Bonzini wrote:
>>>> We could go in a different direction and export flag
>>>> 'has_zero_init' which will report that the storage is
>>>> initialized with all zeroes at the moment. In this
>>>> case mirroring code will not fall into this
>>>> branch.
>>> Why don't you add the zero_init flag to QEMU's NBD driver instead?
>> for all cases without knowing real backend on the server side?
>> I think this would be wrong.
> Add it to the command line, and leave it to libvirt or the user to
> pass "-drive file.driver=nbd,...,file.zero-init=on".
>
> Paolo
I have started with something very similar for 'drive-mirror' command.
We have added additional flag for this to improve migration speed
and this was rejected.

Den
Denis Lunev Sept. 27, 2016, 1:46 p.m. UTC | #15
On 09/27/2016 03:07 PM, Paolo Bonzini wrote:
>
> ----- Original Message -----
>> From: "Denis V. Lunev" <den@virtuozzo.com>
>> To: "Paolo Bonzini" <pbonzini@redhat.com>
>> Cc: "Vladimir Sementsov-Ogievskiy" <vsementsov@virtuozzo.com>, qemu-devel@nongnu.org, qemu-block@nongnu.org,
>> nbd-general@lists.sourceforge.net, alex@alex.org.uk, eblake@redhat.com, kwolf@redhat.com, stefanha@redhat.com,
>> w@uter.be
>> Sent: Tuesday, September 27, 2016 12:25:54 PM
>> Subject: Re: [PATCH] proto: add 'shift' extension.
>>
>> On 09/27/2016 01:15 PM, Paolo Bonzini wrote:
>>>> We could go in a different direction and export flag
>>>> 'has_zero_init' which will report that the storage is
>>>> initialized with all zeroes at the moment. In this
>>>> case mirroring code will not fall into this
>>>> branch.
>>> Why don't you add the zero_init flag to QEMU's NBD driver instead?
>> for all cases without knowing real backend on the server side?
>> I think this would be wrong.
> Add it to the command line, and leave it to libvirt or the user to
> pass "-drive file.driver=nbd,...,file.zero-init=on".
>
> Paolo
this specifically means that all QMP commands like 'drive-backup' and
'drive-mirror' will have to be modified to pass this attribute. If this is
OK, we can proceed with that.

Den
Paolo Bonzini Sept. 27, 2016, 5:04 p.m. UTC | #16
On 27/09/2016 15:28, Denis V. Lunev wrote:
> On 09/27/2016 03:07 PM, Paolo Bonzini wrote:
>>
>> ----- Original Message -----
>>> From: "Denis V. Lunev" <den@virtuozzo.com>
>>> To: "Paolo Bonzini" <pbonzini@redhat.com>
>>> Cc: "Vladimir Sementsov-Ogievskiy" <vsementsov@virtuozzo.com>, qemu-devel@nongnu.org, qemu-block@nongnu.org,
>>> nbd-general@lists.sourceforge.net, alex@alex.org.uk, eblake@redhat.com, kwolf@redhat.com, stefanha@redhat.com,
>>> w@uter.be
>>> Sent: Tuesday, September 27, 2016 12:25:54 PM
>>> Subject: Re: [PATCH] proto: add 'shift' extension.
>>>
>>> On 09/27/2016 01:15 PM, Paolo Bonzini wrote:
>>>>> We could go in a different direction and export flag
>>>>> 'has_zero_init' which will report that the storage is
>>>>> initialized with all zeroes at the moment. In this
>>>>> case mirroring code will not fall into this
>>>>> branch.
>>>> Why don't you add the zero_init flag to QEMU's NBD driver instead?
>>> for all cases without knowing real backend on the server side?
>>> I think this would be wrong.
>> Add it to the command line, and leave it to libvirt or the user to
>> pass "-drive file.driver=nbd,...,file.zero-init=on".
>
> I have started with something very similar for 'drive-mirror' command.
> We have added additional flag for this to improve migration speed
> and this was rejected.

You can add it through the filename path too, through a URI option
"nbd://...?zero-init=on".

Paolo
Denis Lunev Sept. 27, 2016, 6:59 p.m. UTC | #17
On 09/27/2016 08:04 PM, Paolo Bonzini wrote:
>
> On 27/09/2016 15:28, Denis V. Lunev wrote:
>> On 09/27/2016 03:07 PM, Paolo Bonzini wrote:
>>> ----- Original Message -----
>>>> From: "Denis V. Lunev" <den@virtuozzo.com>
>>>> To: "Paolo Bonzini" <pbonzini@redhat.com>
>>>> Cc: "Vladimir Sementsov-Ogievskiy" <vsementsov@virtuozzo.com>, qemu-devel@nongnu.org, qemu-block@nongnu.org,
>>>> nbd-general@lists.sourceforge.net, alex@alex.org.uk, eblake@redhat.com, kwolf@redhat.com, stefanha@redhat.com,
>>>> w@uter.be
>>>> Sent: Tuesday, September 27, 2016 12:25:54 PM
>>>> Subject: Re: [PATCH] proto: add 'shift' extension.
>>>>
>>>> On 09/27/2016 01:15 PM, Paolo Bonzini wrote:
>>>>>> We could go in a different direction and export flag
>>>>>> 'has_zero_init' which will report that the storage is
>>>>>> initialized with all zeroes at the moment. In this
>>>>>> case mirroring code will not fall into this
>>>>>> branch.
>>>>> Why don't you add the zero_init flag to QEMU's NBD driver instead?
>>>> for all cases without knowing real backend on the server side?
>>>> I think this would be wrong.
>>> Add it to the command line, and leave it to libvirt or the user to
>>> pass "-drive file.driver=nbd,...,file.zero-init=on".
>> I have started with something very similar for 'drive-mirror' command.
>> We have added additional flag for this to improve migration speed
>> and this was rejected.
> You can add it through the filename path too, through a URI option
> "nbd://...?zero-init=on".
>
> Paolo
ha, cool idea! Thanks!

Den
Kevin Wolf Sept. 28, 2016, 8:34 a.m. UTC | #18
Am 27.09.2016 um 20:59 hat Denis V. Lunev geschrieben:
> On 09/27/2016 08:04 PM, Paolo Bonzini wrote:
> >
> > On 27/09/2016 15:28, Denis V. Lunev wrote:
> >> On 09/27/2016 03:07 PM, Paolo Bonzini wrote:
> >>> ----- Original Message -----
> >>>> From: "Denis V. Lunev" <den@virtuozzo.com>
> >>>> To: "Paolo Bonzini" <pbonzini@redhat.com>
> >>>> Cc: "Vladimir Sementsov-Ogievskiy" <vsementsov@virtuozzo.com>, qemu-devel@nongnu.org, qemu-block@nongnu.org,
> >>>> nbd-general@lists.sourceforge.net, alex@alex.org.uk, eblake@redhat.com, kwolf@redhat.com, stefanha@redhat.com,
> >>>> w@uter.be
> >>>> Sent: Tuesday, September 27, 2016 12:25:54 PM
> >>>> Subject: Re: [PATCH] proto: add 'shift' extension.
> >>>>
> >>>> On 09/27/2016 01:15 PM, Paolo Bonzini wrote:
> >>>>>> We could go in a different direction and export flag
> >>>>>> 'has_zero_init' which will report that the storage is
> >>>>>> initialized with all zeroes at the moment. In this
> >>>>>> case mirroring code will not fall into this
> >>>>>> branch.
> >>>>> Why don't you add the zero_init flag to QEMU's NBD driver instead?
> >>>> for all cases without knowing real backend on the server side?
> >>>> I think this would be wrong.
> >>> Add it to the command line, and leave it to libvirt or the user to
> >>> pass "-drive file.driver=nbd,...,file.zero-init=on".
> >> I have started with something very similar for 'drive-mirror' command.
> >> We have added additional flag for this to improve migration speed
> >> and this was rejected.
> > You can add it through the filename path too, through a URI option
> > "nbd://...?zero-init=on".
> >
> > Paolo
> ha, cool idea! Thanks!

What's the advantage of writing "?zero-init=on" instead of
",zero-init=on"? Doesn't it only add more string parsing code for no
benefit?

Kevin
Denis Lunev Sept. 28, 2016, 8:37 a.m. UTC | #19
On 09/28/2016 11:34 AM, Kevin Wolf wrote:
> Am 27.09.2016 um 20:59 hat Denis V. Lunev geschrieben:
>> On 09/27/2016 08:04 PM, Paolo Bonzini wrote:
>>> On 27/09/2016 15:28, Denis V. Lunev wrote:
>>>> On 09/27/2016 03:07 PM, Paolo Bonzini wrote:
>>>>> ----- Original Message -----
>>>>>> From: "Denis V. Lunev" <den@virtuozzo.com>
>>>>>> To: "Paolo Bonzini" <pbonzini@redhat.com>
>>>>>> Cc: "Vladimir Sementsov-Ogievskiy" <vsementsov@virtuozzo.com>, qemu-devel@nongnu.org, qemu-block@nongnu.org,
>>>>>> nbd-general@lists.sourceforge.net, alex@alex.org.uk, eblake@redhat.com, kwolf@redhat.com, stefanha@redhat.com,
>>>>>> w@uter.be
>>>>>> Sent: Tuesday, September 27, 2016 12:25:54 PM
>>>>>> Subject: Re: [PATCH] proto: add 'shift' extension.
>>>>>>
>>>>>> On 09/27/2016 01:15 PM, Paolo Bonzini wrote:
>>>>>>>> We could go in a different direction and export flag
>>>>>>>> 'has_zero_init' which will report that the storage is
>>>>>>>> initialized with all zeroes at the moment. In this
>>>>>>>> case mirroring code will not fall into this
>>>>>>>> branch.
>>>>>>> Why don't you add the zero_init flag to QEMU's NBD driver instead?
>>>>>> for all cases without knowing real backend on the server side?
>>>>>> I think this would be wrong.
>>>>> Add it to the command line, and leave it to libvirt or the user to
>>>>> pass "-drive file.driver=nbd,...,file.zero-init=on".
>>>> I have started with something very similar for 'drive-mirror' command.
>>>> We have added additional flag for this to improve migration speed
>>>> and this was rejected.
>>> You can add it through the filename path too, through a URI option
>>> "nbd://...?zero-init=on".
>>>
>>> Paolo
>> ha, cool idea! Thanks!
> What's the advantage of writing "?zero-init=on" instead of
> ",zero-init=on"? Doesn't it only add more string parsing code for no
> benefit?
>
> Kevin
Here I appreciate the idea to pass command line options in the
target file name. Will it be performed via comma or '?' - there
is no difference for us. We will check and use what is already
implemented.

The most important for us is an approach.

Den
Kevin Wolf Sept. 28, 2016, 8:56 a.m. UTC | #20
Am 28.09.2016 um 10:37 hat Denis V. Lunev geschrieben:
> On 09/28/2016 11:34 AM, Kevin Wolf wrote:
> > Am 27.09.2016 um 20:59 hat Denis V. Lunev geschrieben:
> >> On 09/27/2016 08:04 PM, Paolo Bonzini wrote:
> >>> On 27/09/2016 15:28, Denis V. Lunev wrote:
> >>>> On 09/27/2016 03:07 PM, Paolo Bonzini wrote:
> >>>>> ----- Original Message -----
> >>>>>> From: "Denis V. Lunev" <den@virtuozzo.com>
> >>>>>> To: "Paolo Bonzini" <pbonzini@redhat.com>
> >>>>>> Cc: "Vladimir Sementsov-Ogievskiy" <vsementsov@virtuozzo.com>, qemu-devel@nongnu.org, qemu-block@nongnu.org,
> >>>>>> nbd-general@lists.sourceforge.net, alex@alex.org.uk, eblake@redhat.com, kwolf@redhat.com, stefanha@redhat.com,
> >>>>>> w@uter.be
> >>>>>> Sent: Tuesday, September 27, 2016 12:25:54 PM
> >>>>>> Subject: Re: [PATCH] proto: add 'shift' extension.
> >>>>>>
> >>>>>> On 09/27/2016 01:15 PM, Paolo Bonzini wrote:
> >>>>>>>> We could go in a different direction and export flag
> >>>>>>>> 'has_zero_init' which will report that the storage is
> >>>>>>>> initialized with all zeroes at the moment. In this
> >>>>>>>> case mirroring code will not fall into this
> >>>>>>>> branch.
> >>>>>>> Why don't you add the zero_init flag to QEMU's NBD driver instead?
> >>>>>> for all cases without knowing real backend on the server side?
> >>>>>> I think this would be wrong.
> >>>>> Add it to the command line, and leave it to libvirt or the user to
> >>>>> pass "-drive file.driver=nbd,...,file.zero-init=on".
> >>>> I have started with something very similar for 'drive-mirror' command.
> >>>> We have added additional flag for this to improve migration speed
> >>>> and this was rejected.
> >>> You can add it through the filename path too, through a URI option
> >>> "nbd://...?zero-init=on".
> >>>
> >>> Paolo
> >> ha, cool idea! Thanks!
> > What's the advantage of writing "?zero-init=on" instead of
> > ",zero-init=on"? Doesn't it only add more string parsing code for no
> > benefit?
> >
> > Kevin
> Here I appreciate the idea to pass command line options in the
> target file name. Will it be performed via comma or '?' - there
> is no difference for us. We will check and use what is already
> implemented.
> 
> The most important for us is an approach.

For me, too. With commas it's not part of the file name that must be
parsed out of the string, but a separate option, which is the much
cleaner approach.

The good thing is that the conversion of NBD to individual options has
progressed far enough that you wouldn't even be able to implement the
URL extension without implementing the separate option, too. :-)
(Because all the URL parser does is splitting the URL into individual
options before passing them to nbd_open().)

Kevin
Vladimir Sementsov-Ogievskiy Sept. 28, 2016, 9 a.m. UTC | #21
On 28.09.2016 11:56, Kevin Wolf wrote:
> Am 28.09.2016 um 10:37 hat Denis V. Lunev geschrieben:
>> On 09/28/2016 11:34 AM, Kevin Wolf wrote:
>>> Am 27.09.2016 um 20:59 hat Denis V. Lunev geschrieben:
>>>> On 09/27/2016 08:04 PM, Paolo Bonzini wrote:
>>>>> On 27/09/2016 15:28, Denis V. Lunev wrote:
>>>>>> On 09/27/2016 03:07 PM, Paolo Bonzini wrote:
>>>>>>> ----- Original Message -----
>>>>>>>> From: "Denis V. Lunev" <den@virtuozzo.com>
>>>>>>>> To: "Paolo Bonzini" <pbonzini@redhat.com>
>>>>>>>> Cc: "Vladimir Sementsov-Ogievskiy" <vsementsov@virtuozzo.com>, qemu-devel@nongnu.org, qemu-block@nongnu.org,
>>>>>>>> nbd-general@lists.sourceforge.net, alex@alex.org.uk, eblake@redhat.com, kwolf@redhat.com, stefanha@redhat.com,
>>>>>>>> w@uter.be
>>>>>>>> Sent: Tuesday, September 27, 2016 12:25:54 PM
>>>>>>>> Subject: Re: [PATCH] proto: add 'shift' extension.
>>>>>>>>
>>>>>>>> On 09/27/2016 01:15 PM, Paolo Bonzini wrote:
>>>>>>>>>> We could go in a different direction and export flag
>>>>>>>>>> 'has_zero_init' which will report that the storage is
>>>>>>>>>> initialized with all zeroes at the moment. In this
>>>>>>>>>> case mirroring code will not fall into this
>>>>>>>>>> branch.
>>>>>>>>> Why don't you add the zero_init flag to QEMU's NBD driver instead?
>>>>>>>> for all cases without knowing real backend on the server side?
>>>>>>>> I think this would be wrong.
>>>>>>> Add it to the command line, and leave it to libvirt or the user to
>>>>>>> pass "-drive file.driver=nbd,...,file.zero-init=on".
>>>>>> I have started with something very similar for 'drive-mirror' command.
>>>>>> We have added additional flag for this to improve migration speed
>>>>>> and this was rejected.
>>>>> You can add it through the filename path too, through a URI option
>>>>> "nbd://...?zero-init=on".
>>>>>
>>>>> Paolo
>>>> ha, cool idea! Thanks!
>>> What's the advantage of writing "?zero-init=on" instead of
>>> ",zero-init=on"? Doesn't it only add more string parsing code for no
>>> benefit?
>>>
>>> Kevin
>> Here I appreciate the idea to pass command line options in the
>> target file name. Will it be performed via comma or '?' - there
>> is no difference for us. We will check and use what is already
>> implemented.
>>
>> The most important for us is an approach.
> For me, too. With commas it's not part of the file name that must be
> parsed out of the string, but a separate option, which is the much
> cleaner approach.
>
> The good thing is that the conversion of NBD to individual options has
> progressed far enough that you wouldn't even be able to implement the
> URL extension without implementing the separate option, too. :-)
> (Because all the URL parser does is splitting the URL into individual
> options before passing them to nbd_open().)
>
> Kevin

Just note: we can use json instead of url, like this: virsh 
qemu-monitor-command backup-vm 
'{"execute":"drive-backup","arguments":{"device": "disk", "target": 
"json://{\"driver\":\"nbd\",\"host\":\"127.0.0.1\",\"port\":\"10809\",\"zero-init\":\"on\"}", 
"mode": "existing", "sync": "full"}}'
Kevin Wolf Sept. 28, 2016, 9:32 a.m. UTC | #22
Am 28.09.2016 um 11:00 hat Vladimir Sementsov-Ogievskiy geschrieben:
> On 28.09.2016 11:56, Kevin Wolf wrote:
> >Am 28.09.2016 um 10:37 hat Denis V. Lunev geschrieben:
> >>On 09/28/2016 11:34 AM, Kevin Wolf wrote:
> >>>Am 27.09.2016 um 20:59 hat Denis V. Lunev geschrieben:
> >>>>On 09/27/2016 08:04 PM, Paolo Bonzini wrote:
> >>>>>On 27/09/2016 15:28, Denis V. Lunev wrote:
> >>>>>>On 09/27/2016 03:07 PM, Paolo Bonzini wrote:
> >>>>>>>----- Original Message -----
> >>>>>>>>From: "Denis V. Lunev" <den@virtuozzo.com>
> >>>>>>>>To: "Paolo Bonzini" <pbonzini@redhat.com>
> >>>>>>>>Cc: "Vladimir Sementsov-Ogievskiy" <vsementsov@virtuozzo.com>, qemu-devel@nongnu.org, qemu-block@nongnu.org,
> >>>>>>>>nbd-general@lists.sourceforge.net, alex@alex.org.uk, eblake@redhat.com, kwolf@redhat.com, stefanha@redhat.com,
> >>>>>>>>w@uter.be
> >>>>>>>>Sent: Tuesday, September 27, 2016 12:25:54 PM
> >>>>>>>>Subject: Re: [PATCH] proto: add 'shift' extension.
> >>>>>>>>
> >>>>>>>>On 09/27/2016 01:15 PM, Paolo Bonzini wrote:
> >>>>>>>>>>We could go in a different direction and export flag
> >>>>>>>>>>'has_zero_init' which will report that the storage is
> >>>>>>>>>>initialized with all zeroes at the moment. In this
> >>>>>>>>>>case mirroring code will not fall into this
> >>>>>>>>>>branch.
> >>>>>>>>>Why don't you add the zero_init flag to QEMU's NBD driver instead?
> >>>>>>>>for all cases without knowing real backend on the server side?
> >>>>>>>>I think this would be wrong.
> >>>>>>>Add it to the command line, and leave it to libvirt or the user to
> >>>>>>>pass "-drive file.driver=nbd,...,file.zero-init=on".
> >>>>>>I have started with something very similar for 'drive-mirror' command.
> >>>>>>We have added additional flag for this to improve migration speed
> >>>>>>and this was rejected.
> >>>>>You can add it through the filename path too, through a URI option
> >>>>>"nbd://...?zero-init=on".
> >>>>>
> >>>>>Paolo
> >>>>ha, cool idea! Thanks!
> >>>What's the advantage of writing "?zero-init=on" instead of
> >>>",zero-init=on"? Doesn't it only add more string parsing code for no
> >>>benefit?
> >>>
> >>>Kevin
> >>Here I appreciate the idea to pass command line options in the
> >>target file name. Will it be performed via comma or '?' - there
> >>is no difference for us. We will check and use what is already
> >>implemented.
> >>
> >>The most important for us is an approach.
> >For me, too. With commas it's not part of the file name that must be
> >parsed out of the string, but a separate option, which is the much
> >cleaner approach.
> >
> >The good thing is that the conversion of NBD to individual options has
> >progressed far enough that you wouldn't even be able to implement the
> >URL extension without implementing the separate option, too. :-)
> >(Because all the URL parser does is splitting the URL into individual
> >options before passing them to nbd_open().)
> >
> >Kevin
> 
> Just note: we can use json instead of url, like this: virsh
> qemu-monitor-command backup-vm
> '{"execute":"drive-backup","arguments":{"device": "disk", "target": "json://{\"driver\":\"nbd\",\"host\":\"127.0.0.1\",\"port\":\"10809\",\"zero-init\":\"on\"}",
> "mode": "existing", "sync": "full"}}'

Ah, sorry, I missed the part that you need a file name because that's
the only thing the QMP command accepts. Yes, then the json: pseudo
protocol is a good workaround for the moment.

I hope we can declare blockdev-add stable soon, and then you can use
blockdev-backup instead of drive-backup.

Kevin

Patch
diff mbox

diff --git a/doc/proto.md b/doc/proto.md
index 2de3a6a..6fd1b16 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -682,6 +682,8 @@  The field has the following format:
   experimental `WRITE_ZEROES` [extension](https://github.com/yoe/nbd/blob/extension-write-zeroes/doc/proto.md).
 - bit 7, `NBD_FLAG_SEND_DF`: defined by the experimental `STRUCTURED_REPLY`
   [extension](https://github.com/yoe/nbd/blob/extension-structured-reply/doc/proto.md).
+- bit 8, `NBD_FLAG_SEND_SHIFT` : exposes support for `NBD_CMD_FLAG_SHIFT` and
+  `NBD_OPT_SHIFT`
 
 Clients SHOULD ignore unknown flags.
 
@@ -765,6 +767,15 @@  of the newstyle negotiation.
 
     Defined by the experimental `INFO` [extension](https://github.com/yoe/nbd/blob/extension-info/doc/proto.md).
 
+- `NBD_OPT_SHIFT` (10)
+
+    Defines shift used to calculate request offset and length if
+    `NBD_CMD_FLAG_SHIFT` is set.
+
+    Data:
+
+    - 32 bits, shift (unsigned); Must not be larger than 32.
+
 #### Option reply types
 
 These values are used in the "reply type" field, sent by the server
@@ -872,7 +883,13 @@  valid may depend on negotiation during the handshake phase.
   [extension](https://github.com/yoe/nbd/blob/extension-write-zeroes/doc/proto.md).
 - bit 2, `NBD_CMD_FLAG_DF`; defined by the experimental `STRUCTURED_REPLY`
   [extension](https://github.com/yoe/nbd/blob/extension-structured-reply/doc/proto.md).
-
+- bit 3, `NBD_CMD_FLAG_SHIFT`; This flag is valid for `NBD_CMD_TRIM` and
+  `NBD_CMD_WRITE_ZEROES`. If this flag is set the server shifts request
+  *length* and *offset* left by N bits, where N is defined by `NBD_OPT_SHIFT`
+  option or is assumed to be 16 bits by default if `NBD_OPT_SHIFT` option is
+  not specified. If after shift `(offset + length)` exceeds disk size, length
+  should be reduced to `(<disk size> - offset)`. However, `(offset + length)`
+  must not exceed disk size by more than `(1 << N) - 1`.
 
 #### Request types