diff mbox

doc: Allow NBD_CMD_FLAG_NO_HOLE during NBD_CMD_WRITE

Message ID 1459779314-12266-1-git-send-email-eblake@redhat.com
State New
Headers show

Commit Message

Eric Blake April 4, 2016, 2:15 p.m. UTC
qemu already has an existing server implementation option that will
explicitly search the payload of NBD_CMD_WRITE for large blocks of
zeroes, and punch holes in the underlying file.  For old clients
that don't know how to use the new NBD_CMD_WRITE_ZEROES, this is a
workaround to keep the server's destination file approximately as
sparse as the client's source.  However, for new clients that know
how to explicitly request holes, it is unnecessary overhead; and
can lead to the server punching a hole and risking fragmentation or
future ENOSPC even when the client explicitly wanted to write
zeroes rather than a hole.  So it makes sense to let the new
NBD_CMD_FLAG_NO_HOLE work for WRITE as well as WRITE_ZEROES.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 doc/proto.md | 33 +++++++++++++++++++++++++++------
 1 file changed, 27 insertions(+), 6 deletions(-)

Comments

Denis V. Lunev April 4, 2016, 2:47 p.m. UTC | #1
On 04/04/2016 05:15 PM, Eric Blake wrote:
> qemu already has an existing server implementation option that will
> explicitly search the payload of NBD_CMD_WRITE for large blocks of
> zeroes, and punch holes in the underlying file.  For old clients
> that don't know how to use the new NBD_CMD_WRITE_ZEROES, this is a
> workaround to keep the server's destination file approximately as
> sparse as the client's source.  However, for new clients that know
> how to explicitly request holes, it is unnecessary overhead; and
> can lead to the server punching a hole and risking fragmentation or
> future ENOSPC even when the client explicitly wanted to write
> zeroes rather than a hole.  So it makes sense to let the new
> NBD_CMD_FLAG_NO_HOLE work for WRITE as well as WRITE_ZEROES.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>

what behaviour do you expect  for QCOW2 file?
We should fully provision that image as far as I could understand,
i.e. allocate data blocks with zero content.

I think that this would work.
Eric Blake April 4, 2016, 3 p.m. UTC | #2
On 04/04/2016 08:47 AM, Denis V. Lunev wrote:
> On 04/04/2016 05:15 PM, Eric Blake wrote:
>> qemu already has an existing server implementation option that will
>> explicitly search the payload of NBD_CMD_WRITE for large blocks of
>> zeroes, and punch holes in the underlying file.  For old clients
>> that don't know how to use the new NBD_CMD_WRITE_ZEROES, this is a
>> workaround to keep the server's destination file approximately as
>> sparse as the client's source.  However, for new clients that know
>> how to explicitly request holes, it is unnecessary overhead; and
>> can lead to the server punching a hole and risking fragmentation or
>> future ENOSPC even when the client explicitly wanted to write
>> zeroes rather than a hole.  So it makes sense to let the new
>> NBD_CMD_FLAG_NO_HOLE work for WRITE as well as WRITE_ZEROES.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> what behaviour do you expect  for QCOW2 file?
> We should fully provision that image as far as I could understand,
> i.e. allocate data blocks with zero content.

For an old client (that doesn't know how to use WRITE_ZEROES), the
client will send ALL data via CMD_WRITE; and then the server pays
attention to its command-line flags on whether to parse for zeroes
(which is already tri-state, between always allocate, use WRITE_SAME
where possible, and TRIM where possible).  The result is that the server
can be commanded at startup to make its file sparse, even when the
client did not have a sparse file to start with.

For a new client, the client will always use WRITE_ZEROES when it is
okay with the server punching holes, and WRITE_ZEROES+NO_HOLE when it
wants to compress the network stream but still force allocation.
Therefore, the server spending time looking for zeroes during WRITE is
wasted effort, and the client should always send WRITE+NO_HOLE for data
that is not a hole on the client's side of things (even if that content
is all zeroes).  The result is that the server can now create a file
with the same sparseness pattern as the client, assuming that client and
server have the same granularity of hole sizing.

With a qcow2 file as the underlying file of the qemu nbd server, the
behavior then depends on whether the qcow2 file is version 2 (no
efficient ways to represent all zeroes, but there ARE some shortcuts
that can be taken when you know the file is backed by something else
that is all zero) or version 3 (where there is an explicit and efficient
way to mark an all-zero cluster).

> 
> I think that this would work.
>
Alex Bligh April 4, 2016, 3:16 p.m. UTC | #3
On 4 Apr 2016, at 15:15, Eric Blake <eblake@redhat.com> wrote:

> qemu already has an existing server implementation option that will
> explicitly search the payload of NBD_CMD_WRITE for large blocks of
> zeroes, and punch holes in the underlying file.  For old clients
> that don't know how to use the new NBD_CMD_WRITE_ZEROES, this is a
> workaround to keep the server's destination file approximately as
> sparse as the client's source.  However, for new clients that know
> how to explicitly request holes, it is unnecessary overhead; and
> can lead to the server punching a hole and risking fragmentation or
> future ENOSPC even when the client explicitly wanted to write
> zeroes rather than a hole.  So it makes sense to let the new
> NBD_CMD_FLAG_NO_HOLE work for WRITE as well as WRITE_ZEROES.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alex Bligh <alex@alex.org.uk>

+1

Alex

> ---
> doc/proto.md | 33 +++++++++++++++++++++++++++------
> 1 file changed, 27 insertions(+), 6 deletions(-)
> 
> diff --git a/doc/proto.md b/doc/proto.md
> index 35a3266..fb97217 100644
> --- a/doc/proto.md
> +++ b/doc/proto.md
> @@ -737,8 +737,14 @@ by a sparse file. With current NBD command set, the client has to issue
> through the wire. The server has to write the data onto disk, effectively
> losing the sparseness.
> 
> -To remedy this, a `WRITE_ZEROES` extension is envisioned. This extension adds
> -one new command and one new command flag.
> +To remedy this, a `WRITE_ZEROES` extension is envisioned. This
> +extension adds one new transmission flag, one new command, and one new
> +command flag; and refines an existing command.
> +
> +* `NBD_FLAG_SEND_WRITE_ZEROES`
> +
> +    The server SHOULD set this transmission flag to 1 if the
> +    `NBD_CMD_WRITE_ZEROES` request is supported.
> 
> * `NBD_CMD_WRITE_ZEROES`
> 
> @@ -772,12 +778,27 @@ The server SHOULD return `ENOSPC` if it receives a write zeroes request
> including one or more sectors beyond the size of the device. It SHOULD
> return `EPERM` if it receives a write zeroes request on a read-only export.
> 
> +* `NBD_CMD_WRITE`
> +
> +    By default, the server MAY search for large contiguous blocks of
> +    all zero content, and use trimming to zero out those portions of
> +    the write, even if it did not advertise `NBD_FLAG_SEND_TRIM`; but
> +    it MUST ensure that any trimmed areas of data read back as zero.
> +    However, the client MAY set the command flag
> +    `NBD_CMD_FLAG_NO_HOLE` to inform the server that the entire
> +    written area MUST be fully provisioned, ensuring that future
> +    writes to the same area will not cause fragmentation or cause
> +    failure due to insufficient space.  Clients SHOULD NOT set this
> +    flag unless the server advertised `NBD_FLAG_SEND_WRITE_ZEROES` in
> +    the transmisison flags.
> +
> The extension adds the following new command flag:
> 
> -- `NBD_CMD_FLAG_NO_HOLE`; valid during `NBD_CMD_WRITE_ZEROES`.
> -  SHOULD be set to 1 if the client wants to ensure that the server does
> -  not create a hole. The client MAY send `NBD_CMD_FLAG_NO_HOLE` even
> -  if `NBD_FLAG_SEND_TRIM` was not set in the transmission flags field.
> +- `NBD_CMD_FLAG_NO_HOLE`; valid during `NBD_CMD_WRITE` and
> +  `NBD_CMD_WRITE_ZEROES`.  SHOULD be set to 1 if the client wants to
> +  ensure that the server does not create a hole. The client MAY send
> +  `NBD_CMD_FLAG_NO_HOLE` even if `NBD_FLAG_SEND_TRIM` was not set in
> +  the transmission flags field.
> 
> ### `STRUCTURED_REPLY` extension
> 
> -- 
> 2.5.5
>
Paolo Bonzini April 5, 2016, 10:51 p.m. UTC | #4
On 04/04/2016 16:15, Eric Blake wrote:
> qemu already has an existing server implementation option that will
> explicitly search the payload of NBD_CMD_WRITE for large blocks of
> zeroes, and punch holes in the underlying file.  For old clients
> that don't know how to use the new NBD_CMD_WRITE_ZEROES, this is a
> workaround to keep the server's destination file approximately as
> sparse as the client's source.

I don't think this is the case; the flag is explicitly meant to override
the client.

Paolo

> However, for new clients that know
> how to explicitly request holes, it is unnecessary overhead; and
> can lead to the server punching a hole and risking fragmentation or
> future ENOSPC even when the client explicitly wanted to write
> zeroes rather than a hole.  So it makes sense to let the new
> NBD_CMD_FLAG_NO_HOLE work for WRITE as well as WRITE_ZEROES.
> 
> Signed-off-by: Eric Blake <eblake-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  doc/proto.md | 33 +++++++++++++++++++++++++++------
>  1 file changed, 27 insertions(+), 6 deletions(-)
> 
> diff --git a/doc/proto.md b/doc/proto.md
> index 35a3266..fb97217 100644
> --- a/doc/proto.md
> +++ b/doc/proto.md
> @@ -737,8 +737,14 @@ by a sparse file. With current NBD command set, the client has to issue
>  through the wire. The server has to write the data onto disk, effectively
>  losing the sparseness.
> 
> -To remedy this, a `WRITE_ZEROES` extension is envisioned. This extension adds
> -one new command and one new command flag.
> +To remedy this, a `WRITE_ZEROES` extension is envisioned. This
> +extension adds one new transmission flag, one new command, and one new
> +command flag; and refines an existing command.
> +
> +* `NBD_FLAG_SEND_WRITE_ZEROES`
> +
> +    The server SHOULD set this transmission flag to 1 if the
> +    `NBD_CMD_WRITE_ZEROES` request is supported.
> 
>  * `NBD_CMD_WRITE_ZEROES`
> 
> @@ -772,12 +778,27 @@ The server SHOULD return `ENOSPC` if it receives a write zeroes request
>  including one or more sectors beyond the size of the device. It SHOULD
>  return `EPERM` if it receives a write zeroes request on a read-only export.
> 
> +* `NBD_CMD_WRITE`
> +
> +    By default, the server MAY search for large contiguous blocks of
> +    all zero content, and use trimming to zero out those portions of
> +    the write, even if it did not advertise `NBD_FLAG_SEND_TRIM`; but
> +    it MUST ensure that any trimmed areas of data read back as zero.
> +    However, the client MAY set the command flag
> +    `NBD_CMD_FLAG_NO_HOLE` to inform the server that the entire
> +    written area MUST be fully provisioned, ensuring that future
> +    writes to the same area will not cause fragmentation or cause
> +    failure due to insufficient space.  Clients SHOULD NOT set this
> +    flag unless the server advertised `NBD_FLAG_SEND_WRITE_ZEROES` in
> +    the transmisison flags.
> +
>  The extension adds the following new command flag:
> 
> -- `NBD_CMD_FLAG_NO_HOLE`; valid during `NBD_CMD_WRITE_ZEROES`.
> -  SHOULD be set to 1 if the client wants to ensure that the server does
> -  not create a hole. The client MAY send `NBD_CMD_FLAG_NO_HOLE` even
> -  if `NBD_FLAG_SEND_TRIM` was not set in the transmission flags field.
> +- `NBD_CMD_FLAG_NO_HOLE`; valid during `NBD_CMD_WRITE` and
> +  `NBD_CMD_WRITE_ZEROES`.  SHOULD be set to 1 if the client wants to
> +  ensure that the server does not create a hole. The client MAY send
> +  `NBD_CMD_FLAG_NO_HOLE` even if `NBD_FLAG_SEND_TRIM` was not set in
> +  the transmission flags field.
> 
>  ### `STRUCTURED_REPLY` extension
>
diff mbox

Patch

diff --git a/doc/proto.md b/doc/proto.md
index 35a3266..fb97217 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -737,8 +737,14 @@  by a sparse file. With current NBD command set, the client has to issue
 through the wire. The server has to write the data onto disk, effectively
 losing the sparseness.

-To remedy this, a `WRITE_ZEROES` extension is envisioned. This extension adds
-one new command and one new command flag.
+To remedy this, a `WRITE_ZEROES` extension is envisioned. This
+extension adds one new transmission flag, one new command, and one new
+command flag; and refines an existing command.
+
+* `NBD_FLAG_SEND_WRITE_ZEROES`
+
+    The server SHOULD set this transmission flag to 1 if the
+    `NBD_CMD_WRITE_ZEROES` request is supported.

 * `NBD_CMD_WRITE_ZEROES`

@@ -772,12 +778,27 @@  The server SHOULD return `ENOSPC` if it receives a write zeroes request
 including one or more sectors beyond the size of the device. It SHOULD
 return `EPERM` if it receives a write zeroes request on a read-only export.

+* `NBD_CMD_WRITE`
+
+    By default, the server MAY search for large contiguous blocks of
+    all zero content, and use trimming to zero out those portions of
+    the write, even if it did not advertise `NBD_FLAG_SEND_TRIM`; but
+    it MUST ensure that any trimmed areas of data read back as zero.
+    However, the client MAY set the command flag
+    `NBD_CMD_FLAG_NO_HOLE` to inform the server that the entire
+    written area MUST be fully provisioned, ensuring that future
+    writes to the same area will not cause fragmentation or cause
+    failure due to insufficient space.  Clients SHOULD NOT set this
+    flag unless the server advertised `NBD_FLAG_SEND_WRITE_ZEROES` in
+    the transmisison flags.
+
 The extension adds the following new command flag:

-- `NBD_CMD_FLAG_NO_HOLE`; valid during `NBD_CMD_WRITE_ZEROES`.
-  SHOULD be set to 1 if the client wants to ensure that the server does
-  not create a hole. The client MAY send `NBD_CMD_FLAG_NO_HOLE` even
-  if `NBD_FLAG_SEND_TRIM` was not set in the transmission flags field.
+- `NBD_CMD_FLAG_NO_HOLE`; valid during `NBD_CMD_WRITE` and
+  `NBD_CMD_WRITE_ZEROES`.  SHOULD be set to 1 if the client wants to
+  ensure that the server does not create a hole. The client MAY send
+  `NBD_CMD_FLAG_NO_HOLE` even if `NBD_FLAG_SEND_TRIM` was not set in
+  the transmission flags field.

 ### `STRUCTURED_REPLY` extension