diff mbox

doc: Flip bit sense for allowing trim during WRITE_ZEROES

Message ID 1459546186-17002-1-git-send-email-eblake@redhat.com
State New
Headers show

Commit Message

Eric Blake April 1, 2016, 9:29 p.m. UTC
Rather than requiring allocation by default and allowing trims
only on request during WRITE_ZEROES, it seems like a better
default is to allow server optimizations by default and require
full allocation by specific request.  Since WRITE_ZEROES is
experimental and has not yet been implemented, we can flip the
sense of the command flag so that the default flags of all 0s
is the most efficient, and flags (whether _FUA or _NO_HOLE) are
added to make the operation have tighter constraints but
possibly slower execution.

The name NO_HOLE is also slightly nicer at expressing the fact
that there is no dependency on whether NBD_CMD_TRIM is supported.

Also tweak a couple of formatting issues for consistency (for
example, only reserve a bit number in one place).

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 doc/proto.md | 30 +++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

Comments

Alex Bligh April 1, 2016, 10:20 p.m. UTC | #1
On 1 Apr 2016, at 22:29, Eric Blake <eblake@redhat.com> wrote:

> Rather than requiring allocation by default and allowing trims
> only on request during WRITE_ZEROES, it seems like a better
> default is to allow server optimizations by default and require
> full allocation by specific request.  Since WRITE_ZEROES is
> experimental and has not yet been implemented, we can flip the
> sense of the command flag so that the default flags of all 0s
> is the most efficient, and flags (whether _FUA or _NO_HOLE) are
> added to make the operation have tighter constraints but
> possibly slower execution.
> 
> The name NO_HOLE is also slightly nicer at expressing the fact
> that there is no dependency on whether NBD_CMD_TRIM is supported.
> 
> Also tweak a couple of formatting issues for consistency (for
> example, only reserve a bit number in one place).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Alex Bligh <alex@alex.org.uk>

+1
Denis V. Lunev April 2, 2016, 7:38 a.m. UTC | #2
On 04/02/2016 12:29 AM, Eric Blake wrote:
> Rather than requiring allocation by default and allowing trims
> only on request during WRITE_ZEROES, it seems like a better
> default is to allow server optimizations by default and require
> full allocation by specific request.  Since WRITE_ZEROES is
> experimental and has not yet been implemented, we can flip the
> sense of the command flag so that the default flags of all 0s
> is the most efficient, and flags (whether _FUA or _NO_HOLE) are
> added to make the operation have tighter constraints but
> possibly slower execution.
>
> The name NO_HOLE is also slightly nicer at expressing the fact
> that there is no dependency on whether NBD_CMD_TRIM is supported.
>
> Also tweak a couple of formatting issues for consistency (for
> example, only reserve a bit number in one place).
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>   doc/proto.md | 30 +++++++++++++++++-------------
>   1 file changed, 17 insertions(+), 13 deletions(-)
>
> diff --git a/doc/proto.md b/doc/proto.md
> index 758a661..725af4d 100644
> --- a/doc/proto.md
> +++ b/doc/proto.md
> @@ -288,10 +288,10 @@ immediately after the handshake flags field in oldstyle negotiation:
>     schedule I/O accesses as for a rotational medium
>   - bit 5, `NBD_FLAG_SEND_TRIM`; should be set to 1 if the server supports
>     `NBD_CMD_TRIM` commands
> -- bit 6, `NBD_FLAG_SEND_WRITE_ZEROES`; should be set to 1 if the server
> -  supports `NBD_CMD_WRITE_ZEROES` commands
> -- bit 7, `NBD_FLAG_SEND_DF`; defined by the `STRUCTURED_REPLY` extension;
> -  see below.
> +- bit 6, `NBD_FLAG_SEND_WRITE_ZEROES`; defined by the experimental
> +  `WRITE_ZEROES` extension; see below.
> +- bit 7, `NBD_FLAG_SEND_DF`; defined by the experimental `STRUCTURED_REPLY`
> +  extension; see below.
>
>   Clients SHOULD ignore unknown flags.
>
> @@ -483,7 +483,7 @@ valid may depend on negotiation during the handshake phase.
>     `NBD_CMD_WRITE_ZEROES` commands.  SHOULD be set to 1 if the client requires
>     "Force Unit Access" mode of operation.  MUST NOT be set unless transmission
>     flags included `NBD_FLAG_SEND_FUA`.
> -- bit 1, `NBD_CMD_MAY_TRIM`; defined by the experimental `WRITE_ZEROES`
> +- bit 1, `NBD_CMD_NO_HOLE`; defined by the experimental `WRITE_ZEROES`
>     extension; see below.
>   - bit 2, `NBD_CMD_FLAG_DF`; defined by the experimental `STRUCTURED_REPLY`
>     extension; see below
> @@ -736,7 +736,7 @@ losing the sparseness.
>   To remedy this, a `WRITE_ZEROES` extension is envisioned. This extension adds
>   one new command and one new command flag.
>
> -* `NBD_CMD_WRITE_ZEROES` (6)
> +* `NBD_CMD_WRITE_ZEROES`
>
>       A write request with no payload. Length and offset define the location
>       and amount of data to be zeroed.
> @@ -753,9 +753,13 @@ one new command and one new command flag.
>       If this flag was set, the server MUST NOT send the reply until it has
>       ensured that the newly-zeroed data has reached permanent storage.
>
> -    If the flag `NBD_CMD_FLAG_MAY_TRIM` was set by the client in the command
> -    flags field, the server MAY use trimming to zero out the area, but it
> -    MUST ensure that the data reads back as zero.
> +    By default, the server MAY use trimming to zero out the area, even
> +    if it did not advertise `NBD_FLAG_SEND_TRIM`; but it MUST ensure
> +    that the data reads back as zero.  However, the client MAY set the
> +    command flag `NBD_CMD_FLAG_NO_HOLE` to inform the server that the
> +    area MUST be fully provisioned, ensuring that future writes to the
> +    same area will not cause fragmentation or cause failure due to
> +    insufficient space.
>
>       If an error occurs, the server SHOULD set the appropriate error code
>       in the error field. The server MAY then close the connection.
> @@ -766,9 +770,9 @@ return `EPERM` if it receives a write zeroes request on a read-only export.
>
>   The extension adds the following new command flag:
>
> -- bit 1, `NBD_CMD_FLAG_MAY_TRIM`; valid during `NBD_CMD_WRITE_ZEROES`.
> -  SHOULD be set to 1 if the client allows the server to use trim to perform
> -  the requested operation. The client MAY send `NBD_CMD_FLAG_MAY_TRIM` even
> +- `NBD_CMD_FLAG_NO_HOLE`; valid during `NBD_CMD_WRITE_ZEROES`.
> +  SHOULD be set to 1 if the client wants to ensure that the server does
> +  not create a hole. The client MAY send `NBD_CMD_FLAG_NO_HOLE` even
>     if `NBD_FLAG_SEND_TRIM` was not set in the transmission flags field.
>
>   ### `STRUCTURED_REPLY` extension
> @@ -893,7 +897,7 @@ error, and alters the reply to the `NBD_CMD_READ` request.
>         were sent earlier in the structured reply, the server SHOULD NOT
>         send multiple distinct offsets that lie within the bounds of a
>         single content chunk.  Valid as a reply to `NBD_CMD_READ`,
> -      `NBD_CMD_WRITE`, and `NBD_CMD_TRIM`.
> +      `NBD_CMD_WRITE`, `NBD_CMD_WRITE_ZEROES`, and `NBD_CMD_TRIM`.
>
>         The payload is structured as:
>
- do not see the problem in this change. It looks correct to me
Eric Blake April 4, 2016, 1:56 p.m. UTC | #3
On 04/01/2016 03:29 PM, Eric Blake wrote:
> Rather than requiring allocation by default and allowing trims
> only on request during WRITE_ZEROES, it seems like a better
> default is to allow server optimizations by default and require
> full allocation by specific request.  Since WRITE_ZEROES is
> experimental and has not yet been implemented, we can flip the
> sense of the command flag so that the default flags of all 0s
> is the most efficient, and flags (whether _FUA or _NO_HOLE) are
> added to make the operation have tighter constraints but
> possibly slower execution.
> 
> The name NO_HOLE is also slightly nicer at expressing the fact
> that there is no dependency on whether NBD_CMD_TRIM is supported.
> 

> -    If the flag `NBD_CMD_FLAG_MAY_TRIM` was set by the client in the command
> -    flags field, the server MAY use trimming to zero out the area, but it
> -    MUST ensure that the data reads back as zero.
> +    By default, the server MAY use trimming to zero out the area, even
> +    if it did not advertise `NBD_FLAG_SEND_TRIM`; but it MUST ensure
> +    that the data reads back as zero.  However, the client MAY set the
> +    command flag `NBD_CMD_FLAG_NO_HOLE` to inform the server that the
> +    area MUST be fully provisioned, ensuring that future writes to the
> +    same area will not cause fragmentation or cause failure due to
> +    insufficient space.

Question - qemu already has the notion of an nbd server that can be
configured to explicitly check for large repetitions of zero during
NBD_CMD_WRITE, and create holes in the corresponding file as a result
(basically because qemu is compensating for NBD_CMD_WRITE_ZEROES not
being part of the protocol yet, so that the destination file can still
be sparse even though lots of zeroes were sent over the wire). Should we
also allow servers to accept NBD_CMD_FLAG_NO_HOLE during NBD_CMD_WRITE
as an explicit request to NOT perform the magic of searching for large
runs of zeroes, but to just write everything as written?

For older qemu clients that are not aware of the new WRITE_ZEROES
command, the client will also not use the NO_HOLE flag for normal
writes, and everything is at the mercy of the server recognizing zeroes.
 But for newer qemu clients, it can be usefully assumed that the client
will always prefer WRITE_ZEROES (without flags) when it is okay to punch
a hole, and therefore that the new client will also send the NO_HOLE
flag to normal WRITE to inform the server that it doesn't need to waste
time searching for large runs of zeroes.

If this sounds reasonable, I'll go ahead and propose a patch.
Alex Bligh April 4, 2016, 3:15 p.m. UTC | #4
On 4 Apr 2016, at 14:56, Eric Blake <eblake@redhat.com> wrote:

> Question - qemu already has the notion of an nbd server that can be
> configured to explicitly check for large repetitions of zero during
> NBD_CMD_WRITE, and create holes in the corresponding file as a result
> (basically because qemu is compensating for NBD_CMD_WRITE_ZEROES not
> being part of the protocol yet, so that the destination file can still
> be sparse even though lots of zeroes were sent over the wire). Should we
> also allow servers to accept NBD_CMD_FLAG_NO_HOLE during NBD_CMD_WRITE
> as an explicit request to NOT perform the magic of searching for large
> runs of zeroes, but to just write everything as written?

Seems a good idea, but I'd make it a SHOULD rather than a MUST. IE
it's saying "please don't make a hole" but really this is up to the
nbd server. This is true in any case, as the underlying filing system
could create a hole.

--
Alex Bligh
diff mbox

Patch

diff --git a/doc/proto.md b/doc/proto.md
index 758a661..725af4d 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -288,10 +288,10 @@  immediately after the handshake flags field in oldstyle negotiation:
   schedule I/O accesses as for a rotational medium
 - bit 5, `NBD_FLAG_SEND_TRIM`; should be set to 1 if the server supports
   `NBD_CMD_TRIM` commands
-- bit 6, `NBD_FLAG_SEND_WRITE_ZEROES`; should be set to 1 if the server
-  supports `NBD_CMD_WRITE_ZEROES` commands
-- bit 7, `NBD_FLAG_SEND_DF`; defined by the `STRUCTURED_REPLY` extension;
-  see below.
+- bit 6, `NBD_FLAG_SEND_WRITE_ZEROES`; defined by the experimental
+  `WRITE_ZEROES` extension; see below.
+- bit 7, `NBD_FLAG_SEND_DF`; defined by the experimental `STRUCTURED_REPLY`
+  extension; see below.

 Clients SHOULD ignore unknown flags.

@@ -483,7 +483,7 @@  valid may depend on negotiation during the handshake phase.
   `NBD_CMD_WRITE_ZEROES` commands.  SHOULD be set to 1 if the client requires
   "Force Unit Access" mode of operation.  MUST NOT be set unless transmission
   flags included `NBD_FLAG_SEND_FUA`.
-- bit 1, `NBD_CMD_MAY_TRIM`; defined by the experimental `WRITE_ZEROES`
+- bit 1, `NBD_CMD_NO_HOLE`; defined by the experimental `WRITE_ZEROES`
   extension; see below.
 - bit 2, `NBD_CMD_FLAG_DF`; defined by the experimental `STRUCTURED_REPLY`
   extension; see below
@@ -736,7 +736,7 @@  losing the sparseness.
 To remedy this, a `WRITE_ZEROES` extension is envisioned. This extension adds
 one new command and one new command flag.

-* `NBD_CMD_WRITE_ZEROES` (6)
+* `NBD_CMD_WRITE_ZEROES`

     A write request with no payload. Length and offset define the location
     and amount of data to be zeroed.
@@ -753,9 +753,13 @@  one new command and one new command flag.
     If this flag was set, the server MUST NOT send the reply until it has
     ensured that the newly-zeroed data has reached permanent storage.

-    If the flag `NBD_CMD_FLAG_MAY_TRIM` was set by the client in the command
-    flags field, the server MAY use trimming to zero out the area, but it
-    MUST ensure that the data reads back as zero.
+    By default, the server MAY use trimming to zero out the area, even
+    if it did not advertise `NBD_FLAG_SEND_TRIM`; but it MUST ensure
+    that the data reads back as zero.  However, the client MAY set the
+    command flag `NBD_CMD_FLAG_NO_HOLE` to inform the server that the
+    area MUST be fully provisioned, ensuring that future writes to the
+    same area will not cause fragmentation or cause failure due to
+    insufficient space.

     If an error occurs, the server SHOULD set the appropriate error code
     in the error field. The server MAY then close the connection.
@@ -766,9 +770,9 @@  return `EPERM` if it receives a write zeroes request on a read-only export.

 The extension adds the following new command flag:

-- bit 1, `NBD_CMD_FLAG_MAY_TRIM`; valid during `NBD_CMD_WRITE_ZEROES`.
-  SHOULD be set to 1 if the client allows the server to use trim to perform
-  the requested operation. The client MAY send `NBD_CMD_FLAG_MAY_TRIM` even
+- `NBD_CMD_FLAG_NO_HOLE`; valid during `NBD_CMD_WRITE_ZEROES`.
+  SHOULD be set to 1 if the client wants to ensure that the server does
+  not create a hole. The client MAY send `NBD_CMD_FLAG_NO_HOLE` even
   if `NBD_FLAG_SEND_TRIM` was not set in the transmission flags field.

 ### `STRUCTURED_REPLY` extension
@@ -893,7 +897,7 @@  error, and alters the reply to the `NBD_CMD_READ` request.
       were sent earlier in the structured reply, the server SHOULD NOT
       send multiple distinct offsets that lie within the bounds of a
       single content chunk.  Valid as a reply to `NBD_CMD_READ`,
-      `NBD_CMD_WRITE`, and `NBD_CMD_TRIM`.
+      `NBD_CMD_WRITE`, `NBD_CMD_WRITE_ZEROES`, and `NBD_CMD_TRIM`.

       The payload is structured as: