
Strawman proposal for NBD structured replies

Message ID 1459283983-11890-1-git-send-email-alex@alex.org.uk
State New

Commit Message

Alex Bligh March 29, 2016, 8:39 p.m. UTC
Here's a strawman for the structured reply section. I haven't
covered negotiation.

Signed-off-by: Alex Bligh <alex@alex.org.uk>
---
 doc/proto.md | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 111 insertions(+), 3 deletions(-)

Comments

Wouter Verhelst March 29, 2016, 8:57 p.m. UTC | #1
On Tue, Mar 29, 2016 at 09:39:43PM +0100, Alex Bligh wrote:
> Here's a strawman for the structured reply section. I haven't
> covered negotiation.

LGTM, for the most part.

[...]
> +Each chunk consists of the following:
> +
> +S: 32 bits, 0x668e33ef, magic (`NBD_STRUCTURED_REPLY_MAGIC`)
> +S: 32 bits, flags (including type)
> +S: 64 bits, handle
> +S: 32 bits, payload length
> +S: (*length* bytes of payload data)
> +
> +The flags have the following meanings:
> +
> +* bits 0-7: `NBD_CHUNKTYPE`, an 8 bit unsigned integer
> +* bits 8-31: reserved (server MUST set these to zero)

I understand why you do it this way (we don't need 2^16 reply types),
but (in contrast to the flags in the request packet) this makes it
harder to specify flags and command type as separate fields (there is no
24-bit integer on most systems).

As said though, I understand why, and the alternative isn't ideal.
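To make the 24-bit point concrete, here is a minimal Python sketch of how a client might decode the header as originally proposed. It assumes big-endian (network byte order) fields, as elsewhere in NBD, and treats bit 0 as the least significant bit of the flags word. `HEADER` and `parse_header_v1` are illustrative names, not part of any NBD implementation. The chunk type and the reserved bits have to be recovered with a mask and a shift, because there is no standard 24-bit integer type to map them onto.

```python
import struct

NBD_STRUCTURED_REPLY_MAGIC = 0x668E33EF

# Original strawman layout: magic, flags (type in bits 0-7), handle,
# payload length. Big-endian is an assumption of this sketch.
HEADER = struct.Struct(">IIQI")


def parse_header_v1(buf: bytes):
    """Split a 20-byte chunk header into (chunk_type, handle, length)."""
    magic, flags, handle, length = HEADER.unpack(buf[:HEADER.size])
    if magic != NBD_STRUCTURED_REPLY_MAGIC:
        raise ValueError("bad magic 0x%08x" % magic)
    chunk_type = flags & 0xFF  # bits 0-7: NBD_CHUNKTYPE
    reserved = flags >> 8      # bits 8-31: server MUST set these to zero
    if reserved:
        raise ValueError("reserved flag bits set: 0x%06x" % reserved)
    return chunk_type, handle, length
```

For example, `parse_header_v1(HEADER.pack(NBD_STRUCTURED_REPLY_MAGIC, 1, 42, 8))` yields `(1, 42, 8)`.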

[...]
> +If the server detects an error during an operation which it
> +is serving with a structured reply, it MUST complete
> +the transmission of the current data chunk if transmission
> +has started (by padding the current chunk with data
> +which MUST be zero), after which zero or more other
> +data chunks may be sent, followed by an `NBD_CHUNKTYPE_END`
> +chunk. The server MAY set the offset within `NBD_CHUNKTYPE_END`
> +to the offset of the error; if so, this MUST be within the
> +length requested.

This should probably also be more explicit about what to do if the
server doesn't want to set the offset (set it to zero, presumably)
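For reference, a minimal Python sketch of building the `NBD_CHUNKTYPE_END` payload as the patch specifies it (32-bit error code, 64-bit error offset), assuming big-endian fields. Falling back to 0 when the server does not report an error location follows the suggestion just above and is not settled text; `END_PAYLOAD` and `end_payload` are hypothetical names.

```python
import struct
from typing import Optional

# NBD_CHUNKTYPE_END payload per the patch: 32-bit error, 64-bit offset.
END_PAYLOAD = struct.Struct(">IQ")  # big-endian is an assumption


def end_payload(error: int = 0, error_offset: Optional[int] = None) -> bytes:
    if error == 0:
        # Success: both the error value and the error offset are zero.
        return END_PAYLOAD.pack(0, 0)
    # Error with no location reported: use 0 (assumed convention).
    if error_offset is None:
        error_offset = 0
    return END_PAYLOAD.pack(error, error_offset)
```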

Alex Bligh March 29, 2016, 9:59 p.m. UTC | #2
On 29 Mar 2016, at 21:57, Wouter Verhelst <w@uter.be> wrote:
> 
> I understand why you do it this way (we don't need 2^16 reply types),
> but (in contrast to the flags in the request packet) this makes it
> harder to specify flags and command type as separate fields (there is no
> 24-bit integer on most systems).
> 
> As said though, I understand why, and the alternative isn't ideal.

As a third option then:

Each chunk consists of the following:

S: 32 bits, 0x668e33ef, magic (NBD_STRUCTURED_REPLY_MAGIC)
S: 8 bits: type
S: 8 bits: reserved (must be zero)
S: 16 bits, flags
S: 64 bits, handle
S: 32 bits, payload length
S: (length bytes of payload data)

The flags have the following meanings:

• bits 0-15: reserved (server MUST set these to zero)
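A minimal Python sketch of this third layout, again assuming big-endian fields; `HEADER_V3`, `pack_header_v3` and `unpack_header_v3` are hypothetical helpers. Every field now has a natural width (8, 8, 16, 64 and 32 bits), so it can be described directly, and the header is still 20 bytes long.

```python
import struct

NBD_STRUCTURED_REPLY_MAGIC = 0x668E33EF

# magic, 8-bit type, 8-bit reserved, 16-bit flags, 64-bit handle,
# 32-bit payload length; ">" means big-endian with no padding.
HEADER_V3 = struct.Struct(">IBBHQI")


def pack_header_v3(chunk_type: int, handle: int, length: int) -> bytes:
    # The reserved byte and all 16 flag bits are currently zero.
    return HEADER_V3.pack(NBD_STRUCTURED_REPLY_MAGIC, chunk_type, 0, 0,
                          handle, length)


def unpack_header_v3(buf: bytes):
    magic, chunk_type, reserved, flags, handle, length = \
        HEADER_V3.unpack(buf[:HEADER_V3.size])
    if magic != NBD_STRUCTURED_REPLY_MAGIC:
        raise ValueError("bad magic")
    if reserved or flags:
        raise ValueError("reserved bits must be zero")
    return chunk_type, handle, length
```

The type byte sits at offset 4, immediately after the magic, so a C implementation could map it onto a plain `uint8_t` field.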

>> +If the server detects an error during an operation which it
>> +is serving with a structured reply, it MUST complete
>> +the transmission of the current data chunk if transmission
>> +has started (by padding the current chunk with data
>> +which MUST be zero), after which zero or more other
>> +data chunks may be sent, followed by an `NBD_CHUNKTYPE_END`
>> +chunk. The server MAY set the offset within `NBD_CHUNKTYPE_END`
>> +to the offset of the error; if so, this MUST be within the
>> +length requested.
> 
> This should probably also be more explicit about what to do if the
> server doesn't want to set the offset (set it to zero, presumably)

Hmm. Perhaps it would be better to set the offset to 2^32-1 to
indicate "I don't know". Making this value useful is difficult in
the situation where the server is running multiple sendfiles on
multiple chunks. There could actually be multiple errors, and you
don't want the server to rely on 'data up to X' as being OK as
only one error is reported. I'd therefore suggest an error offset
of 2^32-1 means 'one or more errors, assume all delivered data is
potentially erroneous'.
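A minimal Python sketch of how a client might act on the proposed sentinel. The value 2^32-1 is taken from the message as written, even though the `NBD_CHUNKTYPE_END` offset field in the patch is 64 bits wide. The behaviour for a non-sentinel offset (trusting only bytes before it) is an assumption for illustration; the proposal does not define it. `trusted_ranges` is a hypothetical helper that takes the (offset, length) ranges already received, in the same offset space as the error offset.

```python
from typing import List, Tuple

ERROR_OFFSET_UNKNOWN = 2**32 - 1  # proposed: "one or more errors"


def trusted_ranges(received: List[Tuple[int, int]], error: int,
                   error_offset: int) -> List[Tuple[int, int]]:
    if error == 0:
        return list(received)  # success: everything delivered is good
    if error_offset == ERROR_OFFSET_UNKNOWN:
        return []  # sentinel: all delivered data is potentially erroneous
    # Assumption: bytes strictly before the reported offset are usable.
    good = []
    for off, length in received:
        end = min(off + length, error_offset)
        if end > off:
            good.append((off, end - off))
    return good
```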

Wouter Verhelst March 29, 2016, 10:31 p.m. UTC | #3
On Tue, Mar 29, 2016 at 10:59:18PM +0100, Alex Bligh wrote:
> On 29 Mar 2016, at 21:57, Wouter Verhelst <w@uter.be> wrote:
> > 
> > I understand why you do it this way (we don't need 2^16 reply types),
> > but (in contrast to the flags in the request packet) this makes it
> > harder to specify flags and command type as separate fields (there is no
> > 24-bit integer on most systems).
> > 
> > As said though, I understand why, and the alternative isn't ideal.
> 
> As a third option then:
> 
> Each chunk consists of the following:
> 
> S: 32 bits, 0x668e33ef, magic (NBD_STRUCTURED_REPLY_MAGIC)
> S: 8 bits: type
> S: 8 bits: reserved (must be zero)
> S: 16 bits, flags
> S: 64 bits, handle
> S: 32 bits, payload length
> S: (length bytes of payload data)
> 
> The flags have the following meanings:
> 
> • bits 0-15: reserved (server MUST set these to zero)

That seems better in that context, yes. The reserved byte could later on
be assigned as extra flags if need be.

> >> +If the server detects an error during an operation which it
> >> +is serving with a structured reply, it MUST complete
> >> +the transmission of the current data chunk if transmission
> >> +has started (by padding the current chunk with data
> >> +which MUST be zero), after which zero or more other
> >> +data chunks may be sent, followed by an `NBD_CHUNKTYPE_END`
> >> +chunk. The server MAY set the offset within `NBD_CHUNKTYPE_END`
> >> +to the offset of the error; if so, this MUST be within the
> >> +length requested.
> > 
> > This should probably also be more explicit about what to do if the
> > server doesn't want to set the offset (set it to zero, presumably)
> 
> Hmm. Perhaps it would be better to set the offset to 2^32-1 to
> indicate "I don't know". Making this value useful is difficult in
> the situation where the server is running multiple sendfiles on
> multiple chunks.

(side note: you can't do multiple sendfile-like things concurrently; one
of them will require exclusive access to write to the socket)

> There could actually be multiple errors, and you
> don't want the server to rely on 'data up to X' as being OK as
> only one error is reported. I'd therefore suggest an error offset
> of 2^32-1 means 'one or more errors, assume all delivered data is
> potentially erroneous'.

The reason why I suggested zero is that it doesn't require special-case
code. If an error offset implies that everything beyond that offset is
invalid, then having an offset of zero implies that the whole read is
invalid -- which is correct if the server encountered an error, but
doesn't know or doesn't want to say (for whatever reason) where.

Maybe the "MAY set the offset" above should just be a "MUST set the
offset", with the clarification that the offset "MUST NOT be beyond the
actual error location, but MAY be before it if the server has no
detailed information", or something along those lines.
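A short Python sketch of why this rule needs no special case. It assumes the error offset is relative to the start of the request and that the client holds the read as one contiguous buffer; `reported_error_offset` and `usable_prefix` are hypothetical names.

```python
from typing import Optional


def reported_error_offset(known_error_offset: Optional[int] = None) -> int:
    # Server side: any offset <= the real error location is allowed,
    # so 0 is always a safe answer when the location is unknown.
    return 0 if known_error_offset is None else known_error_offset


def usable_prefix(data: bytes, error: int, error_offset: int) -> bytes:
    # Client side: bytes from error_offset onward are invalid. An offset
    # of 0 therefore discards the whole read with no extra code path.
    if error == 0:
        return data
    return data[:error_offset]
```

With this rule the client does not need to tell "error at byte 0" apart from "location unknown"; both discard the whole buffer.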

Eric Blake March 29, 2016, 10:58 p.m. UTC | #4
On 03/29/2016 04:31 PM, Wouter Verhelst wrote:
> 
> Maybe the "MAY set the offset" above should just be a "MUST set the
> offset", with the clarification that the offset "MUST NOT be beyond the
> actual error location, but MAY be before it if the server has no
> detailed information", or something along those lines.

Except that offset 0 IS a valid location, and there is no unsigned
number before 0 if the read error occurs while reading the head of the file.

Patch

diff --git a/doc/proto.md b/doc/proto.md
index aaae0a2..2ea81b9 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -195,15 +195,123 @@  C: 64 bits, offset (unsigned)
 C: 32 bits, length (unsigned)  
 C: (*length* bytes of data if the request is of type `NBD_CMD_WRITE`)
 
-The server replies with:
+Replies take one of two forms. They may either be structured replies,
+or unstructured replies. The server MUST NOT send structured replies
+unless it has negotiated structured replies with the client using
+`NBD_OPT_STRUCTURED_REPLIES` (??). Subject to that, structured replies
+may be sent in response to any command.
+
+Unstructured replies are problematic for error handling within
+`NBD_CMD_READ`; therefore, servers SHOULD support structured replies.
+
+#### Unstructured replies
+
+In an unstructured reply, the server replies with:
 
 S: 32 bits, 0x67446698, magic (`NBD_REPLY_MAGIC`)  
 S: 32 bits, error  
 S: 64 bits, handle  
 S: (*length* bytes of data if the request is of type `NBD_CMD_READ`)
 
-Replies need not be sent in the same order as requests (i.e., requests
-may be handled by the server asynchronously).
+#### Structured replies
+
+A structured reply consists of one or more chunks. The server
+MUST send exactly one end chunk (identified by
+the chunk type `NBD_CHUNKTYPE_END`), and this MUST be the final
+chunk within the reply.
+
+Each chunk consists of the following:
+
+S: 32 bits, 0x668e33ef, magic (`NBD_STRUCTURED_REPLY_MAGIC`)
+S: 32 bits, flags (including type)
+S: 64 bits, handle
+S: 32 bits, payload length
+S: (*length* bytes of payload data)
+
+The flags have the following meanings:
+
+* bits 0-7: `NBD_CHUNKTYPE`, an 8 bit unsigned integer
+* bits 8-31: reserved (server MUST set these to zero)
+
+Possible values of `NBD_CHUNKTYPE` are as follows:
+
+* 0 = `NBD_CHUNKTYPE_END`: the final chunk
+* 1 = `NBD_CHUNKTYPE_DATA`: data that has been read
+* 2 = `NBD_CHUNKTYPE_ZERO`: data that should be considered zero
+
+The format of the payload data for each chunk type is as follows:
+
+##### `NBD_CHUNKTYPE_END`
+
+S: 32 bits, error code or zero for success
+S: 64 bits, offset of error (if any)
+
+##### `NBD_CHUNKTYPE_DATA`
+
+S: 64 bits, offset of data
+S: (*length-8* bytes of data as read)
+
+##### `NBD_CHUNKTYPE_ZERO`
+
+S: 64 bits, offset of data
+S: 32 bits, number of zeroes to which this corresponds
+
+
+Commands that return data (currently `NBD_CMD_READ`) therefore MUST
+return zero or more chunks each of type `NBD_CHUNKTYPE_DATA` or
+`NBD_CHUNKTYPE_ZERO` (collectively 'data chunks') followed by
+an `NBD_CHUNKTYPE_END`.
+
+The server MAY split the reply into any number of data
+chunks (provided each consists of at least one byte) and
+MAY send the data chunks in any order (though the
+`NBD_CHUNKTYPE_END` must be the final chunk). This means the
+client is responsible for reassembling the chunks in the correct
+order.
+
+The server MUST NOT send chunks that overlap. The server
+MUST NOT send chunks whose data exceeds the length
+of data requested (for this purpose counting the data
+within `NBD_CHUNKTYPE_ZERO` as the number of zero bytes
+specified therein). The server MUST, in the case of a successful
+read, send exactly the number of bytes requested (whether
+represented by `NBD_CHUNKTYPE_DATA` or `NBD_CHUNKTYPE_ZERO`).
+The server MUST NOT, in the case of an errored read, send
+more than the number of bytes requested.
+
+In order to avoid the burden of reassembly, the client
+MAY send `NBD_CMD_FLAG_DF` (??), which instructs the server
+not to fragment the reply. If this flag is set, the server
+MUST send either zero or one data chunks and an `NBD_CHUNKTYPE_END`
+only. Under such circumstances the server MAY error the command
+with `ETOOBIG` if the length read exceeds [65,536 bytes | the
+negotiated maximum fragment size].
+
+If no errors are detected within an operation, the `NBD_CHUNKTYPE_END`
+packet MUST contain an error value of zero and an error offset of
+zero.
+
+If the server detects an error during an operation which it
+is serving with a structured reply, it MUST complete
+the transmission of the current data chunk if transmission
+has started (by padding the current chunk with data
+which MUST be zero), after which zero or more other
+data chunks may be sent, followed by an `NBD_CHUNKTYPE_END`
+chunk. The server MAY set the offset within `NBD_CHUNKTYPE_END`
+to the offset of the error; if so, this MUST be within the
+length requested.
+
+#### Ordering of replies
+
+The server MAY send replies in any order; the order of replies
+need not correspond to the order of requests (i.e., requests
+may be handled by the server asynchronously). The server MAY
+interleave the chunks relating to a single structured reply
+with chunks belonging to structured replies for
+a different handle, or with unstructured replies relating
+to a different handle. Note, however, that the chunks within
+a single structured reply are subject to the ordering constraint
+set out above.
 
 ## Values
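As an illustration of the reassembly rules in the patch above (no overlapping chunks, nothing outside the requested range, every data chunk at least one byte, exact coverage on success, `NBD_CHUNKTYPE_END` last), here is a minimal Python sketch of client-side reassembly for one `NBD_CMD_READ` handle. It assumes big-endian fields, that chunk offsets are absolute device offsets (the patch only says "offset of data"), and that the caller has already demultiplexed interleaved chunks by handle. `reassemble` is a hypothetical helper, not an existing API.

```python
import struct
from typing import List, Tuple

NBD_CHUNKTYPE_END, NBD_CHUNKTYPE_DATA, NBD_CHUNKTYPE_ZERO = 0, 1, 2


def reassemble(req_offset: int, req_length: int,
               chunks: List[Tuple[int, bytes]]) -> Tuple[int, bytes]:
    """chunks: (chunk_type, payload) pairs for one handle, in arrival
    order. Returns (error, buffer); unfilled bytes are left as zero."""
    buf = bytearray(req_length)
    covered = []  # (start, end) ranges relative to req_offset
    for i, (ctype, payload) in enumerate(chunks):
        if ctype == NBD_CHUNKTYPE_END:
            if i != len(chunks) - 1:
                raise ValueError("END must be the final chunk")
            error, _error_offset = struct.unpack(">IQ", payload)
            total = sum(end - start for start, end in covered)
            # Chunks never overlap, so total == req_length means full coverage.
            if error == 0 and total != req_length:
                raise ValueError("successful read must cover the full length")
            return error, bytes(buf)
        (off,) = struct.unpack(">Q", payload[:8])
        if ctype == NBD_CHUNKTYPE_DATA:
            data = payload[8:]
            n = len(data)
        elif ctype == NBD_CHUNKTYPE_ZERO:
            (n,) = struct.unpack(">I", payload[8:12])
            data = bytes(n)
        else:
            raise ValueError("unknown chunk type %d" % ctype)
        start = off - req_offset
        end = start + n
        if n == 0 or start < 0 or end > req_length:
            raise ValueError("chunk empty or outside the requested range")
        if any(start < e and s < end for s, e in covered):
            raise ValueError("overlapping chunks")
        covered.append((start, end))
        buf[start:end] = data
    raise ValueError("reply ended without an END chunk")
```

Because the server MAY send data chunks in any order, the sketch records each chunk's range rather than assuming sequential arrival.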