diff mbox

[PATCHv3] Improve the documentation of NBD_CMD_FLUSH and NBD_CMD_FLAG_FUA.

Message ID 1459858065-13348-1-git-send-email-alex@alex.org.uk
State New
Headers show

Commit Message

Alex Bligh April 5, 2016, 12:07 p.m. UTC
Improve the documentation of NBD_CMD_FLUSH and NBD_CMD_FLAG_FUA. Specifically
the latter may be set on any command, and its semantics on commands other
than NBD_CMD_WRITE need explaining. Further, explain how these relate to
reordering of commands.

Signed-off-by: Alex Bligh <alex@alex.org.uk>
---
 doc/proto.md | 51 +++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 41 insertions(+), 10 deletions(-)

ARRRGH - this time with a proper commit header.

Changes since version 2:

* Rebase on master

* Remove bogus 'SHOULD' for FLUSH in relation to writes that are in flight
  but not yet completed

* After consultation with lkml etc., document that FUA on things that do not
  write does nothing

* Document that sending FUA for commands that do nothing is permissible, but
  'SHOULD NOT' be done; an existing client does this.

* Document that FUA on TRIM should do something after all, per Kevin Wolf's
  comment

I'm hoping this is now complete.

Comments

Eric Blake April 5, 2016, 2:22 p.m. UTC | #1
On 04/05/2016 06:07 AM, Alex Bligh wrote:
> Improve the documentation of NBD_CMD_FLUSH and NBD_CMD_FLAG_FUA. Specifically
> the latter may be set on any command, and its semantics on commands other
> than NBD_CMD_WRITE need explaining. Further, explain how these relate to
> reordering of commands.
> 
> Signed-off-by: Alex Bligh <alex@alex.org.uk>
> ---
>  doc/proto.md | 51 +++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 41 insertions(+), 10 deletions(-)
> 

close, but needs a v4

> +++ b/doc/proto.md
> @@ -217,6 +217,33 @@ handle as was sent by the client in the corresponding request.  In
>  this way, the client can correlate which request is receiving a
>  response.
>  
> +#### Ordering of messages and writes
> +
> +The server MAY process commands out of order, and MAY reply out of
> +order, save that:

maybe s/save/except/

> +
> +* All write commands (that includes both `NBD_CMD_WRITE` and
> +  `NBD_CMD_TRIM`) that the server completes (i.e. replies to)

should we also list  NBD_CMD_WRITE_ZEROES?

> +  prior to processing to a `NBD_CMD_FLUSH` MUST be written to non-volatile
> +  storage prior to replying to that `NBD_CMD_FLUSH`. This
> +  paragraph only applies if `NBD_FLAG_SEND_FLUSH` is set within
> +  the transmission flags, as otherwise `NBD_CMD_FLUSH` will never
> +  be sent by the client to the server.
> +
> +* A server MUST NOT reply to a command that has `NBD_CMD_FLAG_FUA` set
> +  in its command flags until the data (if any) written by that command
> +  is persisted to non-volatile storage. This only applies if
> +  `NBD_FLAG_SEND_FLUSH` is set within the transmission flags, as otherwise

s/FLUSH/FUA/

> +  `NBD_CMD_FLAG_FUA` will not be set on any commands sent to the server
> +  by the client.
> +
> +`NBD_CMD_FLUSH` is modelled on the Linux kernel empty bio with
> +`REQ_FLUSH` set. `NBD_CMD_FLAG_FUA` is modelled on the Linux
> +kernel bio with `REQ_FUA` set. In case of ambiguity in this
> +specification, the
> +[kernel documentation](https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt)
> +may be useful.
> +
>  #### Request message
>  
>  The request message, sent by the client, looks as follows:
> @@ -483,10 +510,20 @@ affects a particular command.  Clients MUST NOT set a command flag bit
>  that is not documented for the particular command; and whether a flag is
>  valid may depend on negotiation during the handshake phase.
>  
> -- bit 0, `NBD_CMD_FLAG_FUA`; valid during `NBD_CMD_WRITE` and
> -  `NBD_CMD_WRITE_ZEROES` commands.  SHOULD be set to 1 if the client requires
> -  "Force Unit Access" mode of operation.  MUST NOT be set unless transmission
> -  flags included `NBD_FLAG_SEND_FUA`.
> +- bit 0, `NBD_CMD_FLAG_FUA`; This flag is valid for all commands provided

s/commands/commands,/

> +  `NBD_FLAG_SEND_FUA` has been negotiated, in which case the server MUST
> +  accept all commands with this bit set (even by ignoring the bit). The
> +  client SHOULD NOT set this bit unless the command has the potential of
> +  writing data (current commands are `NBD_CMD_WRITE`, `NBD_CMD_WRITE_ZEROES`
> +  and `NBD_CMD_TRIM`); existing clients are known to set this bit on

maybe: s/existing/but existing/

> +  other commands; subject to that, provided `NBD_FLAG_SEND_FUA` is
> +  negotiated, the client MAY set this bit as it wishes. If the server

Sounds redundant.  I'd strike 'subject to that, ... as it wishes'.

> +  receives a command with `NBD_CMD_FLAG_FUA` set it MUST NOT send its
> +  reply to that command until all write operations (if any) associated with
> +  that command command have been completed and persisted to non-volatile
> +  storage. If the command does not in fact write data (for instance on an
> +  `NBD_CMD_TRIM` which does is ignored), the server MAY ignore this bit

s/does //

> +  being set on such a command.
>  - bit 1, `NBD_CMD_NO_HOLE`; defined by the experimental `WRITE_ZEROES`
>    extension; see below.
>  - bit 2, `NBD_CMD_FLAG_DF`; defined by the experimental `STRUCTURED_REPLY`
> @@ -535,12 +572,6 @@ The following request types exist:
>      message. The server MAY send the reply message before the data has
>      reached permanent storage.
>  
> -    If the `NBD_FLAG_SEND_FUA` flag ("Force Unit Access") was set in the
> -    transmission flags field, the client MAY set the flag `NBD_CMD_FLAG_FUA` in
> -    the command flags field. If this flag was set, the server MUST NOT send
> -    the reply until it has ensured that the newly-written data has reached
> -    permanent storage.
> -

You should drop this paragraph from both NBD_CMD_WRITE and
NBD_CMD_WRITE_ZEROES, now that the flag is globally described (here, you
only dropped it from one of the two).
Paolo Bonzini April 5, 2016, 2:30 p.m. UTC | #2
On 05/04/2016 14:07, Alex Bligh wrote:
> Improve the documentation of NBD_CMD_FLUSH and NBD_CMD_FLAG_FUA. Specifically
> the latter may be set on any command, and its semantics on commands other
> than NBD_CMD_WRITE need explaining. Further, explain how these relate to
> reordering of commands.
> 
> Signed-off-by: Alex Bligh <alex-rWA27mgs/Jz10XsdtD+oqA@public.gmane.org>
> ---
>  doc/proto.md | 51 +++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 41 insertions(+), 10 deletions(-)
> 
> ARRRGH - this time with a proper commit header.
> 
> Changes since version 2:
> 
> * Rebase on master
> 
> * Remove bogus 'SHOULD' for FLUSH in relation to writes that are in flight
>   but not yet completed
> 
> * After consultation with lkml etc., document that FUA on things that do not
>   write does nothing
> 
> * Document that sending FUA for commands that do nothing is permissible, but
>   'SHOULD NOT' be done; an existing client does this.

Can you send a pointer to the discussion?  FUA on reads definitely does
*something* in SCSI (it ensures that the data is moved out of the
volatile cache prior to the read, similar to what QEMU implements).

Paolo

> * Document that FUA on TRIM should do something after all, per Kevin Wolf's
>   comment
Alex Bligh April 5, 2016, 3:11 p.m. UTC | #3
Eric,

> close, but needs a v4

Should have guessed :-)

>> +++ b/doc/proto.md
>> @@ -217,6 +217,33 @@ handle as was sent by the client in the corresponding request.  In
>> this way, the client can correlate which request is receiving a
>> response.
>> 
>> +#### Ordering of messages and writes
>> +
>> +The server MAY process commands out of order, and MAY reply out of
>> +order, save that:
> 
> maybe s/save/except/

ok

>> +
>> +* All write commands (that includes both `NBD_CMD_WRITE` and
>> +  `NBD_CMD_TRIM`) that the server completes (i.e. replies to)
> 
> should we also list  NBD_CMD_WRITE_ZEROES?

My mistake

> 
>> +  prior to processing to a `NBD_CMD_FLUSH` MUST be written to non-volatile
>> +  storage prior to replying to that `NBD_CMD_FLUSH`. This
>> +  paragraph only applies if `NBD_FLAG_SEND_FLUSH` is set within
>> +  the transmission flags, as otherwise `NBD_CMD_FLUSH` will never
>> +  be sent by the client to the server.
>> +
>> +* A server MUST NOT reply to a command that has `NBD_CMD_FLAG_FUA` set
>> +  in its command flags until the data (if any) written by that command
>> +  is persisted to non-volatile storage. This only applies if
>> +  `NBD_FLAG_SEND_FLUSH` is set within the transmission flags, as otherwise
> 
> s/FLUSH/FUA/

Yes

>> +  `NBD_CMD_FLAG_FUA` will not be set on any commands sent to the server
>> +  by the client.
>> +
>> +`NBD_CMD_FLUSH` is modelled on the Linux kernel empty bio with
>> +`REQ_FLUSH` set. `NBD_CMD_FLAG_FUA` is modelled on the Linux
>> +kernel bio with `REQ_FUA` set. In case of ambiguity in this
>> +specification, the
>> +[kernel documentation](https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt)
>> +may be useful.
>> +
>> #### Request message
>> 
>> The request message, sent by the client, looks as follows:
>> @@ -483,10 +510,20 @@ affects a particular command.  Clients MUST NOT set a command flag bit
>> that is not documented for the particular command; and whether a flag is
>> valid may depend on negotiation during the handshake phase.
>> 
>> -- bit 0, `NBD_CMD_FLAG_FUA`; valid during `NBD_CMD_WRITE` and
>> -  `NBD_CMD_WRITE_ZEROES` commands.  SHOULD be set to 1 if the client requires
>> -  "Force Unit Access" mode of operation.  MUST NOT be set unless transmission
>> -  flags included `NBD_FLAG_SEND_FUA`.
>> +- bit 0, `NBD_CMD_FLAG_FUA`; This flag is valid for all commands provided
> 
> s/commands/commands,/

Yes

> 
>> +  `NBD_FLAG_SEND_FUA` has been negotiated, in which case the server MUST
>> +  accept all commands with this bit set (even by ignoring the bit). The
>> +  client SHOULD NOT set this bit unless the command has the potential of
>> +  writing data (current commands are `NBD_CMD_WRITE`, `NBD_CMD_WRITE_ZEROES`
>> +  and `NBD_CMD_TRIM`); existing clients are known to set this bit on
> 
> maybe: s/existing/but existing/
> 
>> +  other commands; subject to that, provided `NBD_FLAG_SEND_FUA` is
>> +  negotiated, the client MAY set this bit as it wishes. If the server
> 
> Sounds redundant.  I'd strike 'subject to that, ... as it wishes'.

I was trying to say that in a stream of commands it can set it as
it wishes. I've made it clearer and referred to the section on
ordering.

> 
>> +  receives a command with `NBD_CMD_FLAG_FUA` set it MUST NOT send its
>> +  reply to that command until all write operations (if any) associated with
>> +  that command command have been completed and persisted to non-volatile
>> +  storage. If the command does not in fact write data (for instance on an
>> +  `NBD_CMD_TRIM` which does is ignored), the server MAY ignore this bit
> 
> s/does //

Reworded differently.

>> +  being set on such a command.
>> - bit 1, `NBD_CMD_NO_HOLE`; defined by the experimental `WRITE_ZEROES`
>>   extension; see below.
>> - bit 2, `NBD_CMD_FLAG_DF`; defined by the experimental `STRUCTURED_REPLY`
>> @@ -535,12 +572,6 @@ The following request types exist:
>>     message. The server MAY send the reply message before the data has
>>     reached permanent storage.
>> 
>> -    If the `NBD_FLAG_SEND_FUA` flag ("Force Unit Access") was set in the
>> -    transmission flags field, the client MAY set the flag `NBD_CMD_FLAG_FUA` in
>> -    the command flags field. If this flag was set, the server MUST NOT send
>> -    the reply until it has ensured that the newly-written data has reached
>> -    permanent storage.
>> -
> 
> You should drop this paragraph from both NBD_CMD_WRITE and
> NBD_CMD_WRITE_ZEROES, now that the flag is globally described (here, you
> only dropped it from one of the two).

Thanks, missed that.

--
Alex Bligh
Alex Bligh April 5, 2016, 3:16 p.m. UTC | #4
Paolo,

On 5 Apr 2016, at 15:30, Paolo Bonzini <pbonzini@redhat.com> wrote:

>> * Document that sending FUA for commands that do nothing is permissible, but
>>  'SHOULD NOT' be done; an existing client does this.
> 
> Can you send a pointer to the discussion?  FUA on reads definitely does
> *something* in SCSI (it ensures that the data is moved out of the
> volatile cache prior to the read, similar to what QEMU implements).

Sure. I got a solitary one reply that referenced the kernel, which
was copied to this list - see below.

I don't have strong feelings either way, but the safer option would
be to not REQUIRE the server to rely on any FUA read behaviour (I'm
already saying they should ignore the bit on reads). Qemu can't
currently RELY on FUA on reads, as it's not documented anywhere
and with the reference NBD server did nothing until recently; recently
it started errorring the read! See below.

On that basis I chose to go with 'FUA on read does nothing'.

If the kernel actually does something on read perhaps we should
reconsider.

Alex

Begin forwarded message:

> From: Jeff Moyer <jmoyer@redhat.com>
> Subject: Re: Block layer - meaning of REQ_FUA on not write requests
> Date: 1 April 2016 17:06:09 BST
> To: Alex Bligh <alex@alex.org.uk>
> Cc: "linux-kernel\@vger.kernel.org" <linux-kernel@vger.kernel.org>, nbd-general@lists.sourceforge.net, Eric Blake <eblake@redhat.com>
> 
> Alex Bligh <alex@alex.org.uk> writes:
> 
>> I am trying to clean up the documentation of the NBD protocol. NBD's
>> support for Force Unit Access (FUA) was modelled on the linux kernel
>> block layer. When I added support a few years ago, I omitted to find
>> out exactly what types of request it applies to. Obviously it applies
>> to write requests, but how about others (e.g. read)?
> 
> Any request with REQ_FUA set will be treated as a flush by the block
> layer.  As such, we do not expect reads to have this bit set.
> 
> Cheers,
> Jeff
Paolo Bonzini April 5, 2016, 3:22 p.m. UTC | #5
On 05/04/2016 17:16, Alex Bligh wrote:
> > > * Document that sending FUA for commands that do nothing is permissible, but
> > >  'SHOULD NOT' be done; an existing client does this.
> > 
> > Can you send a pointer to the discussion?  FUA on reads definitely does
> > *something* in SCSI (it ensures that the data is moved out of the
> > volatile cache prior to the read, similar to what QEMU implements).
> 
> Sure. I got a solitary one reply that referenced the kernel, which
> was copied to this list - see below.

Ok, thanks.

> I don't have strong feelings either way, but the safer option would
> be to not REQUIRE the server to rely on any FUA read behaviour (I'm
> already saying they should ignore the bit on reads). Qemu can't
> currently RELY on FUA on reads, as it's not documented anywhere
> and with the reference NBD server did nothing until recently; recently
> it started errorring the read! See below.

Right.  However, bugs get fixed...

Paolo

> On that basis I chose to go with 'FUA on read does nothing'.
> 
> If the kernel actually does something on read perhaps we should
> reconsider.
Alex Bligh April 5, 2016, 3:25 p.m. UTC | #6
On 5 Apr 2016, at 16:22, Paolo Bonzini <pbonzini@redhat.com> wrote:

> Right.  However, bugs get fixed...

Sure. I'd quite like to get this change in now though, and
we can add a treatment for 'read' later. It's the ordering stuff
which is really important, plus 'don't break current QEMU' :-)
diff mbox

Patch

diff --git a/doc/proto.md b/doc/proto.md
index 35a3266..d7d9939 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -217,6 +217,33 @@  handle as was sent by the client in the corresponding request.  In
 this way, the client can correlate which request is receiving a
 response.
 
+#### Ordering of messages and writes
+
+The server MAY process commands out of order, and MAY reply out of
+order, save that:
+
+* All write commands (that includes both `NBD_CMD_WRITE` and
+  `NBD_CMD_TRIM`) that the server completes (i.e. replies to)
+  prior to processing to a `NBD_CMD_FLUSH` MUST be written to non-volatile
+  storage prior to replying to that `NBD_CMD_FLUSH`. This
+  paragraph only applies if `NBD_FLAG_SEND_FLUSH` is set within
+  the transmission flags, as otherwise `NBD_CMD_FLUSH` will never
+  be sent by the client to the server.
+
+* A server MUST NOT reply to a command that has `NBD_CMD_FLAG_FUA` set
+  in its command flags until the data (if any) written by that command
+  is persisted to non-volatile storage. This only applies if
+  `NBD_FLAG_SEND_FLUSH` is set within the transmission flags, as otherwise
+  `NBD_CMD_FLAG_FUA` will not be set on any commands sent to the server
+  by the client.
+
+`NBD_CMD_FLUSH` is modelled on the Linux kernel empty bio with
+`REQ_FLUSH` set. `NBD_CMD_FLAG_FUA` is modelled on the Linux
+kernel bio with `REQ_FUA` set. In case of ambiguity in this
+specification, the
+[kernel documentation](https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt)
+may be useful.
+
 #### Request message
 
 The request message, sent by the client, looks as follows:
@@ -483,10 +510,20 @@  affects a particular command.  Clients MUST NOT set a command flag bit
 that is not documented for the particular command; and whether a flag is
 valid may depend on negotiation during the handshake phase.
 
-- bit 0, `NBD_CMD_FLAG_FUA`; valid during `NBD_CMD_WRITE` and
-  `NBD_CMD_WRITE_ZEROES` commands.  SHOULD be set to 1 if the client requires
-  "Force Unit Access" mode of operation.  MUST NOT be set unless transmission
-  flags included `NBD_FLAG_SEND_FUA`.
+- bit 0, `NBD_CMD_FLAG_FUA`; This flag is valid for all commands provided
+  `NBD_FLAG_SEND_FUA` has been negotiated, in which case the server MUST
+  accept all commands with this bit set (even by ignoring the bit). The
+  client SHOULD NOT set this bit unless the command has the potential of
+  writing data (current commands are `NBD_CMD_WRITE`, `NBD_CMD_WRITE_ZEROES`
+  and `NBD_CMD_TRIM`); existing clients are known to set this bit on
+  other commands; subject to that, provided `NBD_FLAG_SEND_FUA` is
+  negotiated, the client MAY set this bit as it wishes. If the server
+  receives a command with `NBD_CMD_FLAG_FUA` set it MUST NOT send its
+  reply to that command until all write operations (if any) associated with
+  that command command have been completed and persisted to non-volatile
+  storage. If the command does not in fact write data (for instance on an
+  `NBD_CMD_TRIM` which does is ignored), the server MAY ignore this bit
+  being set on such a command.
 - bit 1, `NBD_CMD_NO_HOLE`; defined by the experimental `WRITE_ZEROES`
   extension; see below.
 - bit 2, `NBD_CMD_FLAG_DF`; defined by the experimental `STRUCTURED_REPLY`
@@ -535,12 +572,6 @@  The following request types exist:
     message. The server MAY send the reply message before the data has
     reached permanent storage.
 
-    If the `NBD_FLAG_SEND_FUA` flag ("Force Unit Access") was set in the
-    transmission flags field, the client MAY set the flag `NBD_CMD_FLAG_FUA` in
-    the command flags field. If this flag was set, the server MUST NOT send
-    the reply until it has ensured that the newly-written data has reached
-    permanent storage.
-
     If an error occurs, the server SHOULD set the appropriate error code
     in the error field. The server MAY then close the connection.