[v3] doc: Add NBD_CMD_BLOCK_STATUS extension

Message ID 1480073296-6931-1-git-send-email-vsementsov@virtuozzo.com
State New

Commit Message

Vladimir Sementsov-Ogievskiy Nov. 25, 2016, 11:28 a.m. UTC
With the availability of sparse storage formats, it is often needed
to query status of a particular range and read only those blocks of
data that are actually present on the block device.

To provide such information, the patch adds the BLOCK_STATUS
extension with one new NBD_CMD_BLOCK_STATUS command, a new
structured reply chunk format, and a new transmission flag.

There exists a concept of data dirtiness, which is required
during, for example, incremental block device backup. To express
this concept via NBD protocol, this patch also adds a flag to
NBD_CMD_BLOCK_STATUS to request dirtiness information rather than
provisioning information; however, with the current proposal, data
dirtiness is only useful with additional coordination outside of
the NBD protocol (such as a way to start and stop the server from
tracking dirty sectors).  Future NBD extensions may add commands
to control dirtiness through NBD.

Since NBD protocol has no notion of block size, and to mimic SCSI
"GET LBA STATUS" command more closely, it has been chosen to return
a list of extents in the response of NBD_CMD_BLOCK_STATUS command,
instead of a bitmap.

CC: Pavel Borzenkov <pborzenkov@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: Wouter Verhelst <w@uter.be>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---

v3:

Hi all. This is almost a resend of v2 (by Eric Blake). The only change is
removing the restriction that the sum of status descriptor lengths must equal
the requested length; i.e., the server is permitted to reply with less data
than requested if it wants.

Also, the bit of NBD_FLAG_SEND_BLOCK_STATUS is changed to 9, as bit 8 is now
NBD_FLAG_CAN_MULTI_CONN in the master branch.

And, finally, I've rebased this onto the current state of the
extension-structured-reply branch (which itself should be rebased on
master, IMHO).

With this resend I just want to continue the discussion started about half
a year ago. Here is a summary of some questions and ideas from the v2
discussion:

1. Q: Synchronisation. Is such data (dirty/allocated) reliable?
   A: All of this is for read-only disks, so the data is static and unchangeable.

2. Q: Different granularities of dirty/allocated bitmaps. Any problems?
   A: 1: the server replies with status descriptors of any size, so the
         granularity is hidden from the client
      2: dirty and allocated requests are separate and unrelated to each
         other, so their granularities do not intersect

3. Q: Selecting the dirty bitmap to export.
   A: several variants:
      1: the id of the bitmap is in the flags field of the request
          pros: - simple
          cons: - it's a hack; the flags field is for other uses
                - we'll have to map bitmap names to these "ids"
      2: introduce extended NBD requests with variable length and exploit this
         feature for the BLOCK_STATUS command, specifying a bitmap identifier.
         pros: - looks like the proper way
         cons: - we have to create an additional extension
               - possibly we have to create a map,
                 {<QEMU bitmap name> <=> <NBD bitmap id>}
      3: an external tool should select which bitmap to export. So, in the case
         of QEMU it should be something like a QMP command
         block-export-dirty-bitmap.
         pros: - simple
               - we can extend it to behave like (2) later
         cons: - an additional QMP command to implement (possibly the lesser evil)
         note: Hmm, the external tool can choose between allocated/dirty data
               too, so we could drop the 'NBD_FLAG_STATUS_DIRTY' flag entirely.

4. Q: Shouldn't get_{allocated,dirty} be separate commands?
   cons: two commands with almost the same semantics and similar mechanics?
   pros: however, there is a good case for separating the clearly defined,
         native-to-block-devices GET_BLOCK_STATUS from the user-driven and
         effectively undefined data called 'dirtiness'.

5. The number of status descriptors sent by the server should be restricted.
   variants:
   1: just allow the server to restrict this as it wants (which was done in v3)
   2: (not excluding 1) the client somehow specifies the maximum number
      of descriptors.
      2.1: add a command flag which requests only one descriptor
           (otherwise, no restriction from the client)
      2.2: again, introduce extended NBD requests, and add a field to
           specify this maximum

6. Q: What to do with unspecified flags (in request/reply)?
   A: I think the normal variant is to make them reserved. (The server should
      return EINVAL if it finds unknown bits; the client should consider a
      reply with unknown bits an error.)

Comments

Stefan Hajnoczi Nov. 25, 2016, 2:02 p.m. UTC | #1
On Fri, Nov 25, 2016 at 02:28:16PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> [...]

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Wouter Verhelst Nov. 27, 2016, 7:17 p.m. UTC | #2
Hi Vladimir,

Quickly: the reason I haven't merged this yet is twofold:
- I wasn't thrilled with the proposal at the time. It felt a bit
  hackish, and bolted onto NBD so you could use it, but without defining
  everything in the NBD protocol. "We're reading some data, but it's not
  about you". That didn't feel right.
- There were a number of questions still unanswered (you're answering a
  few below, so that's good).

For clarity, I have no objection whatsoever to adding more commands if
they're useful, but I would prefer that they're also useful with NBD on
its own, i.e., without requiring an initiation or correlation of some
state through another protocol or network connection or whatever. If
that's needed, that feels like I didn't do my job properly, if you get
my point.

On Fri, Nov 25, 2016 at 02:28:16PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> [...]
> Hi all. This is almost a resend of v2 (by Eric Blake). The only change is
> removing the restriction that the sum of status descriptor lengths must equal
> the requested length; i.e., the server is permitted to reply with less data
> than requested if it wants.

Reasonable, yes. The length that the client requests should be a maximum (i.e.
"I'm interested in this range"), not an exact request.

> Also, the bit of NBD_FLAG_SEND_BLOCK_STATUS is changed to 9, as bit 8 is now
> NBD_FLAG_CAN_MULTI_CONN in the master branch.

Right.

> And, finally, I've rebased this onto current state of
> extension-structured-reply branch (which itself should be rebased on
> master IMHO).

Probably a good idea, given the above.

> With this resend I just want to continue the discussion started about half
> a year ago. Here is a summary of some questions and ideas from the v2
> discussion:
> 
> 1. Q: Synchronisation. Is such data (dirty/allocated) reliable?
>    A: All of this is for read-only disks, so the data is static and unchangeable.

I think we should declare that it's up to the client to ensure no other
writes happen without its knowledge. This may be because the client and
server communicate out of band about state changes, or because the
client somehow knows that it's the only writer, or whatever.

We can easily do that by declaring that the result of that command only
talks about *current* state, and that concurrent writes by different
clients may invalidate the state. This is true for NBD in general (i.e.,
concurrent read or write commands from other clients may confuse file
systems on top of NBD), so it doesn't change expectations in any way.

> 2. Q: Different granularities of dirty/allocated bitmaps. Any problems?
>    A: 1: the server replies with status descriptors of any size, so the
>          granularity is hidden from the client
>       2: dirty and allocated requests are separate and unrelated to each
>          other, so their granularities do not intersect

Not entirely sure anymore what this is about?

> 3. Q: selecting of dirty bitmap to export
>    A: several variants:
>       1: id of bitmap is in flags field of request
>           pros: - simple
>           cons: - it's a hack. flags field is for other uses.
>                 - we'll have to map bitmap names to these "ids"
>       2: introduce extended nbd requests with variable length and exploit this
>          feature for BLOCK_STATUS command, specifying bitmap identifier.
>          pros: - looks like a true way
>          cons: - we have to create additional extension
>                - possible we have to create a map,
>                  {<QEMU bitmap name> <=> <NBD bitmap id>}
>       3: an external tool should select which bitmap to export. So, in the case
>          of QEMU it should be something like a QMP command
>          block-export-dirty-bitmap.
>          pros: - simple
>                - we can extend it to behave like (2) later
>          cons: - an additional QMP command to implement (possibly the lesser evil)
>          note: Hmm, the external tool can choose between allocated/dirty data too,
>                so we could drop the 'NBD_FLAG_STATUS_DIRTY' flag entirely.

Downside of 3, though, is that it moves the definition of what the
different states mean outside of the NBD protocol (i.e., the protocol
messages are not entirely defined anymore, and their meaning depends on
the clients and servers in use).

To avoid this, we should have a clear definition of what the reply means
*by default*, but then we can add a note that clients and servers can
possibly define other meanings out of band if they want to.

> 4. Q: Shouldn't get_{allocated,dirty} be separate commands?
>    cons: two commands with almost the same semantics and similar mechanics?
>    pros: however, there is a good case for separating the clearly defined,
>          native-to-block-devices GET_BLOCK_STATUS from the user-driven and
>          effectively undefined data called 'dirtiness'.

Yeah, having them separate commands might be a bad idea indeed.

> 5. Number of status descriptors, sent by server, should be restricted
>    variants:
>    1: just allow server to restrict this as it wants (which was done in v3)
>    2: (not excluding 1). Client specifies somehow the maximum for number
>       of descriptors.
>       2.1: add command flag, which will request only one descriptor
>            (otherwise, no restrictions from the client)
>       2.2: again, introduce extended nbd requests, and add field to
>            specify this maximum

I think having a flag which requests just one descriptor can be useful,
but I'm hesitant to add it unless it's actually going to be used; so in
other words, I'll leave the decision on that bit to you.

> 6. Q: What to do with unspecified flags (in request/reply)?
>    A: I think the normal variant is to make them reserved. (The server should
>       return EINVAL if it finds unknown bits; the client should consider a
>       reply with unknown bits an error.)

Right, probably best to do that, yes.

> ======
> 
> Also, an idea on 2-4:
> 
>     As we say, that dirtiness is unknown for NBD, and external tool
>     should specify, manage and understand, which data is actually
>     transmitted, why not just call it user_data and leave status field
>     of reply chunk unspecified in this case?
> 
>     So, I propose one flag for NBD_CMD_BLOCK_STATUS:
>     NBD_FLAG_STATUS_USER. If it is clear, then the behaviour is defined by
>     Eric's 'Block provisioning status' paragraph.  If it is set, we just
>     leave status field for some external... protocol? Who knows, what is
>     this user data.

Yes, this sounds like a reasonable approach.

>     Note: I'm not sure that I like this (my) proposal. It's just an
>     idea; maybe someone likes it. And, I think, it represents what we
>     are trying to do more honestly.

Indeed.

>     Note2: the next step of generalization will be NBD_CMD_USER, with
>     variable request size, structured reply and no definition :)

Well, er, no please, if we can avoid it :-)

> Another idea, about backups themselves:
> 
>     Why do we need allocated/zero status for backup? IMHO we don't.

Well, I've been thinking so all along, but then I don't really know what
it is, in detail, that you want to do :-)

I can understand a "has this changed since time X" request, which the
"dirty" thing seems to want to be. Whether something is allocated is
just a special case of that.

Actually, come to think of that. What is the exact use case for this
thing? I understand you're trying to create incremental backups of
things, which would imply you don't write from the client that is
getting the block status thingies, right? If so, how about:

- NBD_OPT_GET_SNAPSHOTS (during negotiation): returns a list of
  snapshots. Not required, optional, includes a machine-readable form,
  not defined by NBD, which explains what the snapshot is about (e.g., a
  qemu json file). The "base" version of that is just "allocation
  status", and is implied (i.e., you don't need to run
  NBD_OPT_GET_SNAPSHOTS if you're not interested in anything but the
  allocation status).
- NBD_CMD_BLOCK_STATUS (during transmission), returns block descriptors
  which tell you what the status of a block of data is for each of the
  relevant snapshots that we know about.

Perhaps this is somewhat overengineered, but it does bring most of the
definition of what a snapshot is back into the NBD protocol, without
having to say "this could be anything", and without requiring
connectivity over two ports for this to be useful (e.g., you could store
the machine-readable form of the snapshot description into your backup
program and match what they mean with what you're interested in at
restore time, etc).

This wouldn't work if you're interested in new snapshots that get
created once we've already moved into transmission, but hey.

Thoughts?

>     Full backup: just do a structured read - it will show us which chunks
>     may be treated as zeroes.

Right.

>     Incremental backup: get the dirty bitmap (somehow, for example through
>     the user-defined part of the proposed command), then, for dirty blocks,
>     read them through structured read, so information about zero/unallocated
>     areas is included.
> 
> For me all the variants above are OK. Let's finally choose something.
> 
> v2:
> v1 was: https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg05574.html
> 
> Since then, we've added the STRUCTURED_REPLY extension, which
> necessitates a rather larger rebase; I've also changed things
> to rename the command 'NBD_CMD_BLOCK_STATUS', changed the request
> modes to be determined by boolean flags (rather than by fixed
> values of the 16-bit flags field), changed the reply status fields
> to be bitwise-or values (with a default of 0 always being sane),
> and changed the descriptor layout to drop an offset but to include
> a 32-bit status so that the descriptor is nicely 8-byte aligned
> without padding.
> 
>  doc/proto.md | 155 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 154 insertions(+), 1 deletion(-)

[...]

I'll commit this in a minute into a separate branch called
"extension-blockstatus", on the understanding that changes are still
required, as per above (i.e., don't assume that just because there's a
branch I'm happy with the current result ;-)

Regards
Stefan Hajnoczi Nov. 28, 2016, 11:19 a.m. UTC | #3
On Sun, Nov 27, 2016 at 08:17:14PM +0100, Wouter Verhelst wrote:
> Quickly: the reason I haven't merged this yet is twofold:
> - I wasn't thrilled with the proposal at the time. It felt a bit
>   hackish, and bolted onto NBD so you could use it, but without defining
>   everything in the NBD protocol. "We're reading some data, but it's not
>   about you". That didn't feel right
>
> - There were a number of questions still unanswered (you're answering a
>   few below, so that's good).
> 
> For clarity, I have no objection whatsoever to adding more commands if
> they're useful, but I would prefer that they're also useful with NBD on
> its own, i.e., without requiring an initiation or correlation of some
> state through another protocol or network connection or whatever. If
> that's needed, that feels like I didn't do my job properly, if you get
> my point.

The out-of-band operations you are referring to are for dirty bitmap
management.  (The goal is to read out blocks that changed since the last
backup.)

The client does not access the live disk; instead, it accesses a
read-only snapshot and the dirty information (so that it can copy out
only the blocks that were written).  The client is allowed to read blocks
that are not dirty, too.

If you want to implement the whole incremental backup workflow in NBD
then the client would first have to connect to the live disk, set up
dirty tracking, create a snapshot export, and then connect to that
snapshot.

That sounds like a big feature set, and I'd argue it's for the control
plane (storage API) and not the data plane (NBD).  There were
discussions about transferring the dirty information via the control
plane, but it seems more appropriate to put it in the data plane since
it is block-level information.

I'm arguing that the NBD protocol doesn't need to support the
incremental backup workflow since it's a complex control plane concept.

Being able to read dirty information via NBD is useful for other block
backup applications, not just QEMU.  It could be used for syncing LVM
volumes across machines, for example, if someone implements an NBD+LVM
server.

Another issue with adding control plane operations is that you need to
begin considering privilege separation.  Should all NBD clients be able
to initiate snapshots, dirty tracking, etc or is some kind of access
control required to limit certain commands?  Not all clients require the
same privileges and so they shouldn't have access to the same set of
operations.

Stefan
Wouter Verhelst Nov. 28, 2016, 5:33 p.m. UTC | #4
Hi Stefan,

On Mon, Nov 28, 2016 at 11:19:44AM +0000, Stefan Hajnoczi wrote:
> On Sun, Nov 27, 2016 at 08:17:14PM +0100, Wouter Verhelst wrote:
> > [...]
> 
> The out-of-band operations you are referring to are for dirty bitmap
> management.  (The goal is to read out blocks that changed since the last
> backup.)
> 
> The client does not access the live disk; instead, it accesses a
> read-only snapshot and the dirty information (so that it can copy out
> only blocks that were written).  The client is allowed to read blocks
> that are not dirty too.

I understood as much, yes.

> If you want to implement the whole incremental backup workflow in NBD
> then the client would first have to connect to the live disk, set up
> dirty tracking, create a snapshot export, and then connect to that
> snapshot.
> 
> That sounds like a big feature set and I'd argue it's for the control
> plane (storage API) and not the data plane (NBD).  There were
> discussions about transferring the dirty information via the control
> plane but it seems more appropriate to put it in the data plane since it is
> block-level information.

I agree that creating and managing snapshots is out of scope for NBD. The
protocol is not set up for that.

However, I'm arguing that if we're going to provide information about
snapshots, we should be able to properly refer to these snapshots from
within an NBD context. My previous mail suggested adding a negotiation
message that would essentially ask the server "tell me about the
snapshots you know about", giving them an NBD identifier in the process
(accompanied by a "foreign" identifier that is decidedly *not* an NBD
identifier and that could be used to match the NBD identifier to
something implementation-defined). This would be read-only information;
the client cannot ask the server to create new snapshots. We can then
later in the protocol refer to these snapshots by way of that NBD
identifier.

My proposal also makes it impossible to get updates of newly created
snapshots without disconnecting and reconnecting (due to the fact that
you can't go from transmission back to negotiation), but I'm not sure
that's a problem.

Doing so has two advantages:
- If a client is accidentally (due to misconfiguration or implementation
  bugs or whatnot) connecting to the wrong server after having created a
  snapshot through a management protocol, we have an opportunity to
  detect this error, due to the fact that the "foreign" identifiers
  passed to the client during negotiation will not match with what the
  client was expecting.
- A future version of the protocol could possibly include an extended
  version of the read command, allowing a client to read information
  from multiple storage snapshots without requiring a reconnect, and
  allowing current clients information about allocation status across
  various snapshots (although a first implementation could very well
  limit itself to only having one snapshot).

[...]
> I'm arguing that the NBD protocol doesn't need to support the
> incremental backup workflow since it's a complex control plane concept.
> 
> Being able to read dirty information via NBD is useful for other block
> backup applications, not just QEMU.  It could be used for syncing LVM
> volumes across machines, for example, if someone implements an NBD+LVM
> server.

Indeed, and I was considering adding a basic implementation to go with
the copy-on-write support in stock nbd-server, too.

> Another issue with adding control plane operations is that you need to
> begin considering privilege separation.  Should all NBD clients be able
> to initiate snapshots, dirty tracking, etc or is some kind of access
> control required to limit certain commands?  Not all clients require the
> same privileges and so they shouldn't have access to the same set of
> operations.

Sure, which is why I wasn't suggesting anything of the sorts :-)
John Snow Nov. 28, 2016, 11:15 p.m. UTC | #5
Hi Wouter,

Some of this mess may be partially my fault, but I have not been 
following the NBD extension proposals up until this point.

Are you familiar with the genesis behind this idea and what we are 
trying to accomplish in general?

We had the thought to propose an extension roughly similar to SCSI's 
'get lba status', but the existing status bits there did not correlate 
semantically to what we are hoping to convey. There was some debate over 
whether it would be an abuse of the protocol to attempt to use them as such.

For a quick recap, get lba status appears to offer three canonical statuses:

0h: "LBA extent is mapped, [...] or has an unknown state"
1h: "[...] LBA extent is deallocated"
2h: "[...] LBA extent is anchored"

My interpretation of mapped was simply that it was physically allocated, 
and 'deallocated' was simply unallocated.

(I uh, am not actually clear on what anchored means exactly.)

Either way, we felt at the time that it would be wrong to propose an 
analogue command for NBD and then immediately abuse the existing 
semantics, hence a new command like -- but not identical to -- the SCSI one.


On 11/27/2016 02:17 PM, Wouter Verhelst wrote:
> Hi Vladimir,
>
> Quickly: the reason I haven't merged this yet is twofold:
> - I wasn't thrilled with the proposal at the time. It felt a bit
>   hackish, and bolted onto NBD so you could use it, but without defining
>   everything in the NBD protocol. "We're reading some data, but it's not
>   about you". That didn't feel right
> - There were a number of questions still unanswered (you're answering a
>   few below, so that's good).
>
> For clarity, I have no objection whatsoever to adding more commands if
> they're useful, but I would prefer that they're also useful with NBD on
> its own, i.e., without requiring an initiation or correlation of some
> state through another protocol or network connection or whatever. If
> that's needed, that feels like I didn't do my job properly, if you get
> my point.
>
> On Fri, Nov 25, 2016 at 02:28:16PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> With the availability of sparse storage formats, it is often needed
>> to query status of a particular range and read only those blocks of
>> data that are actually present on the block device.
>>
>> To provide such information, the patch adds the BLOCK_STATUS
>> extension with one new NBD_CMD_BLOCK_STATUS command, a new
>> structured reply chunk format, and a new transmission flag.
>>
>> There exists a concept of data dirtiness, which is required
>> during, for example, incremental block device backup. To express
>> this concept via NBD protocol, this patch also adds a flag to
>> NBD_CMD_BLOCK_STATUS to request dirtiness information rather than
>> provisioning information; however, with the current proposal, data
>> dirtiness is only useful with additional coordination outside of
>> the NBD protocol (such as a way to start and stop the server from
>> tracking dirty sectors).  Future NBD extensions may add commands
>> to control dirtiness through NBD.
>>
>> Since NBD protocol has no notion of block size, and to mimic SCSI
>> "GET LBA STATUS" command more closely, it has been chosen to return
>> a list of extents in the response of NBD_CMD_BLOCK_STATUS command,
>> instead of a bitmap.
>>
>> CC: Pavel Borzenkov <pborzenkov@virtuozzo.com>
>> CC: Denis V. Lunev <den@openvz.org>
>> CC: Wouter Verhelst <w@uter.be>
>> CC: Paolo Bonzini <pbonzini@redhat.com>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>
>> v3:
>>
>> Hi all. This is almost a resend of v2 (by Eric Blake). The only change is
>> removing the restriction that the sum of status descriptor lengths must be
>> equal to the requested length. I.e., let's permit the server to reply with
>> less data than requested if it wants.
>
> Reasonable, yes. The length that the client requests should be a maximum (i.e.
> "I'm interested in this range"), not an exact request.
>
>> Also, bit of NBD_FLAG_SEND_BLOCK_STATUS is changed to 9, as 8 is now
>>  NBD_FLAG_CAN_MULTI_CONN in master branch.
>
> Right.
>
>> And, finally, I've rebased this onto current state of
>> extension-structured-reply branch (which itself should be rebased on
>> master IMHO).
>
> Probably a good idea, given the above.
>
>> By this resend I just want to continue the discussion, started about half
>> a year ago. Here is a summary of some questions and ideas from the v2
>> discussion:
>>
>> 1. Q: Synchronisation. Is such data (dirty/allocated) reliable?
>>    A: This all is for read-only disks, so the data is static and unchangeable.
>
> I think we should declare that it's up to the client to ensure no other
> writes happen without its knowledge. This may be because the client and
> server communicate out of band about state changes, or because the
> client somehow knows that it's the only writer, or whatever.
>
> We can easily do that by declaring that the result of that command only
> talks about *current* state, and that concurrent writes by different
> clients may invalidate the state. This is true for NBD in general (i.e.,
> concurrent read or write commands from other clients may confuse file
> systems on top of NBD), so it doesn't change expectations in any way.
>

Agree with you here. "This was correct when I sent it, but it's up to 
you to ensure that nothing would have invalidated that in the meantime" 
is fine semantically to me.

Of course in our implementation, we intend to only export essentially 
read-only snapshots of data, so we don't personally expect to run into 
any of these kinds of semantic problems.

>> 2. Q: different granularities of dirty/allocated bitmaps. Any problems?
>>    A: 1: server replies with status descriptors of any size, granularity
>>          is hidden from the client
>>       2: dirty/allocated requests are separate and unrelated to each
>>          other, so their granularities are not intersecting
>
> Not entirely sure anymore what this is about?
>

We have a concept of granularity for dirty tracking; consider it a block 
size.

I don't think it's necessarily relevant to NBD, except for the case of 
false positives. Consider the case of a 512 byte block that is assumed 
dirty simply because it's adjacent to actually dirty data.

That may have some meaning for how we write the NBD spec; i.e. the 
meaning of the status bit changes from "This is definitely modified" to 
"This was possibly modified," due to limitations in the management of 
this information by the server.

If we expose the granularity information via NBD, it at least makes it 
clear how fuzzy the results presented may be. Otherwise, it's not really 
required.

[Again, as seen below, a user-defined bit would be plenty sufficient...!]
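To make the false-positive effect concrete, here is a small sketch (purely illustrative; none of these names come from the NBD spec or QEMU) of why a server tracking dirtiness at a fixed granularity must round any reported range outward, turning adjacent clean bytes into "possibly modified":

```python
GRANULARITY = 64 * 1024  # hypothetical tracking block size

def reported_extent(offset, length, granularity=GRANULARITY):
    """Round a truly-dirty byte range outward to tracking granularity."""
    start = (offset // granularity) * granularity
    end = -(-(offset + length) // granularity) * granularity  # ceiling
    return (start, end - start)

# A 512-byte write near the start of the second 64 KiB block...
print(reported_extent(65536 + 100, 512))  # (65536, 65536)
# ...is reported as the whole 64 KiB block: the adjacent clean bytes
# become "possibly modified" rather than "definitely modified".
```

Exposing the granularity would tell the client exactly how fuzzy such replies can be.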

>> 3. Q: selecting of dirty bitmap to export
>>    A: several variants:
>>       1: id of bitmap is in flags field of request
>>           pros: - simple
>>           cons: - it's a hack. flags field is for other uses.
>>                 - we'll have to map bitmap names to these "ids"
>>       2: introduce extended nbd requests with variable length and exploit this
>>          feature for BLOCK_STATUS command, specifying bitmap identifier.
>>          pros: - looks like a true way
>>          cons: - we have to create additional extension
>>                - possible we have to create a map,
>>                  {<QEMU bitmap name> <=> <NBD bitmap id>}
>>       3: external tool should select which bitmap to export. So, in case of Qemu
>>          it should be something like qmp command block-export-dirty-bitmap.
>>          pros: - simple
>>                - we can extend it to behave like (2) later
>>          cons: - additional qmp command to implement (possibly, the lesser evil)
>>          note: Hmm, external tool can choose between allocated/dirty data too,
>>                so, we can remove 'NBD_FLAG_STATUS_DIRTY' flag at all.
>
> Downside of 3, though, is that it moves the definition of what the
> different states mean outside of the NBD protocol (i.e., the protocol
> messages are not entirely defined anymore, and their meaning depends on
> the clients and servers in use).
>
> To avoid this, we should have a clear definition of what the reply means
> *by default*, but then we can add a note that clients and servers can
> possibly define other meanings out of band if they want to.
>

I had personally only ever considered #3; where a command would be 
issued to QEMU to begin offering NBD data for some point-in-time as 
associated with a particular reference/snapshot/backup/etc. This leaves 
it up to the out-of-band client to order up the right data.

That does appear to make choosing a meaningful name for the status bits 
a bit more difficult...

[...but reading ahead, a 'user defined' bit would fit the bill just fine.]

When Vladimir authored a "persistence" feature for bitmaps to be stored 
alongside QCOW2 files, we had difficulty describing exactly what a dirty 
bit meant for the data -- ultimately it is reliant on external 
information. The meaning is simply but ambiguously, "The data associated 
with this bit has been changed since the last time the bit was reset."

We don't record or stipulate the conditions for a reset.

(for our purposes, a reset would occur during a full or incremental backup.)
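The semantics described above can be modeled in a few lines; this is a toy sketch, not QEMU code, and all names are invented for illustration:

```python
class DirtyBitmap:
    """Toy model of the dirty-bit semantics described above."""

    def __init__(self, size, granularity):
        self.granularity = granularity
        self.bits = [False] * (-(-size // granularity))  # ceil(size/gran)

    def mark_dirty(self, offset, length):
        # Set on every write to the tracked device.
        first = offset // self.granularity
        last = (offset + length - 1) // self.granularity
        for i in range(first, last + 1):
            self.bits[i] = True

    def reset(self):
        # The conditions for a reset are deliberately unspecified; in the
        # backup use case it follows a full or incremental backup.
        self.bits = [False] * len(self.bits)

bm = DirtyBitmap(size=1 << 20, granularity=64 * 1024)
bm.mark_dirty(70000, 100)   # falls entirely inside the second 64 KiB block
```

The bit itself only means "changed since the last reset"; what a reset means is left to the out-of-band management layer.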

>> 4. Q: Should not get_{allocated,dirty} be separate commands?
>>    cons: Two commands with almost the same semantics and similar mechanics?
>>    pros: However, there is a good point in separating the clearly defined,
>>          native-to-block-devices GET_BLOCK_STATUS from user-driven and
>>          actually undefined data, called 'dirtiness'.
>
> Yeah, having them separate commands might be a bad idea indeed.
>
>> 5. Number of status descriptors, sent by server, should be restricted
>>    variants:
>>    1: just allow server to restrict this as it wants (which was done in v3)
>>    2: (not excluding 1). Client specifies somehow the maximum for number
>>       of descriptors.
>>       2.1: add command flag, which will request only one descriptor
>>            (otherwise, no restrictions from the client)
>>       2.2: again, introduce extended nbd requests, and add field to
>>            specify this maximum
>
> I think having a flag which requests just one descriptor can be useful,
> but I'm hesitant to add it unless it's actually going to be used; so in
> other words, I'll leave the decision on that bit to you.
>
>> 6. Q: What to do with unspecified flags (in request/reply)?
>>    I think the normal variant is to make them reserved. (Server should
>>    return EINVAL if found unknown bits, client should consider a reply
>>    with unknown bits as an error)
>
> Right, probably best to do that, yes.
>
>> ======
>>
>> Also, an idea on 2-4:
>>
>>     As we say, that dirtiness is unknown for NBD, and external tool
>>     should specify, manage and understand, which data is actually
>>     transmitted, why not just call it user_data and leave status field
>>     of reply chunk unspecified in this case?
>>
>>     So, I propose one flag for NBD_CMD_BLOCK_STATUS:
>>     NBD_FLAG_STATUS_USER. If it is clear, then behaviour is defined by
>>     Eric's 'Block provisioning status' paragraph.  If it is set, we just
>>     leave status field for some external... protocol? Who knows, what is
>>     this user data.
>
> Yes, this sounds like a reasonable approach.
>

I'd be pretty happy (personally) with some user-defined bits. It keeps 
much of the ambiguity about exactly what we're trying to convey out of 
the NBD spec, which is nice.

>>     Note: I'm not sure that I like this (my) proposal. It's just an
>>     idea; maybe someone will like it.  And, I think, it represents what we
>>     are trying to do more honestly.
>
> Indeed.
>
>>     Note2: the next step of generalization will be NBD_CMD_USER, with
>>     variable request size, structured reply and no definition :)
>
> Well, er, no please, if we can avoid it :-)
>
>> Another idea, about backups themselves:
>>
>>     Why do we need allocated/zero status for backup? IMHO we don't.
>
> Well, I've been thinking so all along, but then I don't really know what
> it is, in detail, that you want to do :-)
>
> I can understand a "has this changed since time X" request, which the
> "dirty" thing seems to want to be. Whether something is allocated is
> just a special case of that.
>
> Actually, come to think of that. What is the exact use case for this
> thing? I understand you're trying to create incremental backups of
> things, which would imply you don't write from the client that is
> getting the block status thingies, right? If so, how about:
>

Essentially you can create a bitmap object in-memory in QEMU and then 
associate it with a drive. It records changes to the drive, and QEMU can 
be instructed to write out any changes since the last backup to disk, 
while clearing the bits of the bitmap.

It doesn't record a specific point in time, it's just implicit to the 
last time you cleared the bitmap -- usually the last incremental or full 
backup you've made.

So it does describe "blocks changed since time X," it's just that we 
don't really know exactly when time X was.

This is all well and dandy, except there is some desire from third 
parties to be able to ask QEMU about this dirty block information -- to 
be able to see this bitmap, more or less. We already use NBD for 
exporting data, and instead of inventing a new side-band, we decided 
that we wanted to use NBD to let external users get this information.

What exactly they do with that info is beyond QEMU's scope.
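The workflow sketched above can be shown end to end as a self-contained toy. `FakeExport` stands in for a real NBD connection, and `block_status` is an invented method name, not actual NBD client API; the point is just that a client can ask which extents are dirty and copy only those:

```python
class FakeExport:
    """Stands in for an NBD export with a dirty-status query (toy)."""

    def __init__(self, data, dirty):
        self.data = bytes(data)
        self.dirty = dirty            # [(offset, length)] dirty ranges
        self.size = len(data)

    def block_status(self, offset, length):
        """Return [(length, is_dirty)] descriptors covering the range."""
        out, pos, end = [], offset, offset + length
        while pos < end:
            hit = next(((o, l) for o, l in self.dirty
                        if o <= pos < o + l), None)
            if hit is not None:
                o, l = hit
                nxt = min(o + l, end)
                out.append((nxt - pos, True))
            else:
                nxt = min([o for o, _ in self.dirty if o > pos] + [end])
                out.append((nxt - pos, False))
            pos = nxt
        return out

def incremental_backup(export, target):
    """Copy only the extents the server reports as dirty."""
    offset = 0
    for length, is_dirty in export.block_status(0, export.size):
        if is_dirty:
            target[offset:offset + length] = \
                export.data[offset:offset + length]
        offset += length
    return target

disk = FakeExport(b"A" * 16, dirty=[(4, 4), (12, 4)])
backup = incremental_backup(disk, bytearray(16))
```

What populates the dirty ranges, and what the backup application does with the result, stays outside the data plane, exactly as described.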

> - NBD_OPT_GET_SNAPSHOTS (during negotiation): returns a list of
>   snapshots. Not required, optional, includes a machine-readable form,
>   not defined by NBD, which explains what the snapshot is about (e.g., a
>   qemu json file). The "base" version of that is just "allocation
>   status", and is implied (i.e., you don't need to run
>   NBD_OPT_GET_SNAPSHOTS if you're not interested in anything but the
>   allocation status).

We don't necessarily have /snapshots/ per se, but we do have some block 
device and one or more bitmaps describing deltas to one or more backups 
that we do not necessarily have access to.

e.g. a given bitmap may describe a delta to an off-site backup. We know 
the delta, but do not maintain any meaningful handle to the given 
off-site backup, including name, URI, or date.

QEMU leaves this association up to upper management, and provides only 
IDs to facilitate any additional correlation.

We could offer up descriptions of these bitmaps in response to such a 
command, but they're not ... quite snapshots. It is more the case that 
we use them to create snapshots.

> - NBD_CMD_BLOCK_STATUS (during transmission), returns block descriptors
>   which tell you what the status of a block of data is for each of the
>   relevant snapshots that we know about.
>
> Perhaps this is somewhat overengineered, but it does bring most of the
> definition of what a snapshot is back into the NBD protocol, without
> having to say "this could be anything", and without requiring
> connectivity over two ports for this to be useful (e.g., you could store
> the machine-readable form of the snapshot description into your backup
> program and match what they mean with what you're interested in at
> restore time, etc).
>
> This wouldn't work if you're interested in new snapshots that get
> created once we've already moved into transmission, but hey.
>
> Thoughts?
>
>>     Full backup: just do a structured read - it will show us which chunks
>>     may be treated as zeroes.
>
> Right.
>
>>     Incremental backup: get dirty bitmap (somehow, for example through
>>     user-defined part of the proposed command), then, for dirty blocks, read
>>     them through structured read, so information about zero/unallocated
>>     areas are here.
>>
>> For me all the variants above are OK. Let's finally choose something.
>>
>> v2:
>> v1 was: https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg05574.html
>>
>> Since then, we've added the STRUCTURED_REPLY extension, which
>> necessitates a rather larger rebase; I've also changed things
>> to rename the command 'NBD_CMD_BLOCK_STATUS', changed the request
>> modes to be determined by boolean flags (rather than by fixed
>> values of the 16-bit flags field), changed the reply status fields
>> to be bitwise-or values (with a default of 0 always being sane),
>> and changed the descriptor layout to drop an offset but to include
>> a 32-bit status so that the descriptor is nicely 8-byte aligned
>> without padding.
>>
>>  doc/proto.md | 155 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 154 insertions(+), 1 deletion(-)
>
> [...]
>
> I'll commit this in a minute into a separate branch called
> "extension-blockstatus", under the understanding that changes are still
> required, as per above (i.e., don't assume that just because there's a
> branch I'm happy with the current result ;-)
>
> Regards
>

Err, I hope I haven't confused everything to heck and back now....

Thanks,
--js
Stefan Hajnoczi Nov. 29, 2016, 9:17 a.m. UTC | #6
On Mon, Nov 28, 2016 at 06:33:24PM +0100, Wouter Verhelst wrote:
> Hi Stefan,
> 
> On Mon, Nov 28, 2016 at 11:19:44AM +0000, Stefan Hajnoczi wrote:
> > On Sun, Nov 27, 2016 at 08:17:14PM +0100, Wouter Verhelst wrote:
> > > Quickly: the reason I haven't merged this yet is twofold:
> > > - I wasn't thrilled with the proposal at the time. It felt a bit
> > >   hackish, and bolted onto NBD so you could use it, but without defining
> > >   everything in the NBD protocol. "We're reading some data, but it's not
> > >   about you". That didn't feel right
> > >
> > > - There were a number of questions still unanswered (you're answering a
> > >   few below, so that's good).
> > > 
> > > For clarity, I have no objection whatsoever to adding more commands if
> > > they're useful, but I would prefer that they're also useful with NBD on
> > > its own, i.e., without requiring an initiation or correlation of some
> > > state through another protocol or network connection or whatever. If
> > > that's needed, that feels like I didn't do my job properly, if you get
> > > my point.
> > 
> > The out-of-band operations you are referring to are for dirty bitmap
> > management.  (The goal is to read out blocks that changed since the last
> > backup.)
> > 
> > The client does not access the live disk, instead it accesses a
> > read-only snapshot and the dirty information (so that it can copy out
> > only blocks that were written).  The client is allowed to read blocks
> > that are not dirty too.
> 
> I understood as much, yes.
> 
> > If you want to implement the whole incremental backup workflow in NBD
> > then the client would first have to connect to the live disk, set up
> > dirty tracking, create a snapshot export, and then connect to that
> > snapshot.
> > 
> > That sounds like a big feature set and I'd argue it's for the control
> > plane (storage API) and not the data plane (NBD).  There were
> > discussions about transferring the dirty information via the control
> > plane, but it seems more appropriate to do it in the data plane since it is
> > block-level information.
> 
> I agree that creating and managing snapshots is out of scope for NBD. The
> protocol is not set up for that.
> 
> However, I'm arguing that if we're going to provide information about
> snapshots, we should be able to properly refer to these snapshots from
> within an NBD context. My previous mail suggested adding a negotiation
> message that would essentially ask the server "tell me about the
> snapshots you know about", giving them an NBD identifier in the process
> (accompanied by a "foreign" identifier that is decidedly *not* an NBD
> identifier and that could be used to match the NBD identifier to
> something implementation-defined). This would be read-only information;
> the client cannot ask the server to create new snapshots. We can then
> later in the protocol refer to these snapshots by way of that NBD
> identifier.
> 
> My proposal also makes it impossible to get updates of newly created
> snapshots without disconnecting and reconnecting (due to the fact that
> you can't go from transmission back to negotiation), but I'm not sure
> that's a problem.
> 
> Doing so has two advantages:
> - If a client is accidentally (due to misconfiguration or implementation
>   bugs or whatnot) connecting to the wrong server after having created a
>   snapshot through a management protocol, we have an opportunity to
>   detect this error, due to the fact that the "foreign" identifiers
>   passed to the client during negotiation will not match with what the
>   client was expecting.
> - A future version of the protocol could possibly include an extended
>   version of the read command, allowing a client to read information
>   from multiple storage snapshots without requiring a reconnect, and
>   allowing current clients information about allocation status across
>   various snapshots (although a first implementation could very well
>   limit itself to only having one snapshot).

Sorry, I misunderstood you.

Snapshots are not very different from NBD exports.  Especially if the
storage system supports writeable-snapshot (aka cloning).  Should we
just use named exports?

Stefan
Kevin Wolf Nov. 29, 2016, 10:18 a.m. UTC | #7
Am 27.11.2016 um 20:17 hat Wouter Verhelst geschrieben:
> > 3. Q: selecting of dirty bitmap to export
> >    A: several variants:
> >       1: id of bitmap is in flags field of request
> >           pros: - simple
> >           cons: - it's a hack. flags field is for other uses.
> >                 - we'll have to map bitmap names to these "ids"
> >       2: introduce extended nbd requests with variable length and exploit this
> >          feature for BLOCK_STATUS command, specifying bitmap identifier.
> >          pros: - looks like a true way
> >          cons: - we have to create additional extension
> >                - possible we have to create a map,
> >                  {<QEMU bitmap name> <=> <NBD bitmap id>}
> >       3: external tool should select which bitmap to export. So, in case of Qemu
> >          it should be something like qmp command block-export-dirty-bitmap.
> >          pros: - simple
> >                - we can extend it to behave like (2) later
> >          cons: - additional qmp command to implement (possibly, the lesser evil)
> >          note: Hmm, external tool can choose between allocated/dirty data too,
> >                so, we can remove 'NBD_FLAG_STATUS_DIRTY' flag at all.
> 
> Downside of 3, though, is that it moves the definition of what the
> different states mean outside of the NBD protocol (i.e., the protocol
> messages are not entirely defined anymore, and their meaning depends on
> the clients and servers in use).

Another point to consider is that option 3 doesn't allow you to access
two (or more) different bitmaps from the client without using the side
channel all the time to switch back and forth between them and having to
drain the request queue each time to avoid races.

In general, if we have something that "looks like the true way", I'd 
advocate choosing that option. Experience tells us that we'd regret 
anything simpler in a year or two, and then we'll have to do the real 
thing anyway, but still need to support the quick hack for 
compatibility.

> > 5. Number of status descriptors, sent by server, should be restricted
> >    variants:
> >    1: just allow server to restrict this as it wants (which was done in v3)
> >    2: (not excluding 1). Client specifies somehow the maximum for number
> >       of descriptors.
> >       2.1: add command flag, which will request only one descriptor
> >            (otherwise, no restrictions from the client)
> >       2.2: again, introduce extended nbd requests, and add field to
> >            specify this maximum
> 
> I think having a flag which requests just one descriptor can be useful,
> but I'm hesitant to add it unless it's actually going to be used; so in
> other words, I'll leave the decision on that bit to you.

The native qemu block layer interface returns the status of only one
contiguous chunk, so the easiest way to implement the NBD block driver
in qemu would always use that bit.

On the other hand, it would be possible for the NBD block driver in qemu
to cache the rest of the response internally and answer the next request
from the cache instead of sending a new request over the network. Maybe
that's what it should be doing anyway for good performance.
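The caching idea Kevin mentions can be sketched briefly. This is illustrative only: `query` is a placeholder for the actual network request, and the sketch assumes the server's reply always starts at the requested offset:

```python
class ExtentCache:
    """Cache surplus status descriptors from one reply (illustrative)."""

    def __init__(self, query):
        self.query = query            # fn(offset, length) -> [(length, status)]
        self.cache = []               # [(offset, length, status)]

    def status_at(self, offset, length):
        for o, l, s in self.cache:
            if o <= offset < o + l:   # cache hit: no network traffic
                return (min(length, o + l - offset), s)
        # Cache miss: one round trip, remember every descriptor returned.
        self.cache, pos = [], offset
        for l, s in self.query(offset, length):
            self.cache.append((pos, l, s))
            pos += l
        return self.status_at(offset, length)

calls = []
def query(offset, length):            # fake server: two descriptors per reply
    calls.append((offset, length))
    return [(4096, 0), (4096, 1)]

cache = ExtentCache(query)
first = cache.status_at(0, 8192)      # triggers the single network request
second = cache.status_at(4096, 4096)  # answered from the cache
```

With such a cache, a one-extent-at-a-time driver interface still benefits from servers that send many descriptors per reply.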

> > Also, an idea on 2-4:
> > 
> >     As we say, that dirtiness is unknown for NBD, and external tool
> >     should specify, manage and understand, which data is actually
> >     transmitted, why not just call it user_data and leave status field
> >     of reply chunk unspecified in this case?
> > 
> >     So, I propose one flag for NBD_CMD_BLOCK_STATUS:
> >     NBD_FLAG_STATUS_USER. If it is clear, then behaviour is defined by
> >     Eric's 'Block provisioning status' paragraph.  If it is set, we just
> >     leave status field for some external... protocol? Who knows, what is
> >     this user data.
> 
> Yes, this sounds like a reasonable approach.

Makes sense to me, too.

However, if we have use for a single NBD_FLAG_STATUS_USER bit, we also
have use for a second one. If we go with one of the options where the
bitmap is selected per command, we're fine because you can simply move
the second bit to a different bitmap and do two requests. If we have
only a single active bitmap at a time, though, this feels like an actual
problem.

> > Another idea, about backups themselves:
> > 
> >     Why do we need allocated/zero status for backup? IMHO we don't.
> 
> Well, I've been thinking so all along, but then I don't really know what
> it is, in detail, that you want to do :-)

I think we do. A block can be dirtied by discarding/write_zero-ing it.
Then we want the dirty bit to know that we need to include this block in
the incremental backup, but we also want to know that we don't actually
have to transfer the data in it.
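Kevin's point can be illustrated by combining the two kinds of status when planning a transfer; the tuple layout and names here are invented for the sketch:

```python
def plan_transfer(extents):
    """extents: [(length, dirty, reads_as_zero)] -- dirty status merged
    with zero/allocation information (names are illustrative)."""
    ops, offset = [], 0
    for length, dirty, reads_as_zero in extents:
        if dirty:
            # A dirty-but-zero extent still belongs in the backup, but as
            # a cheap "write zeroes" with no data crossing the network.
            ops.append(("zero" if reads_as_zero else "copy", offset, length))
        offset += length
    return ops

plan = plan_transfer([
    (4096, True, False),   # modified data: copy it
    (4096, True, True),    # discarded/zeroed since last backup: record zeroes
    (4096, False, False),  # clean: skip entirely
])
```

The discarded extent is recorded in the backup without transferring any payload, which is exactly the saving being argued for.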

Kevin
Vladimir Sementsov-Ogievskiy Nov. 29, 2016, 11:34 a.m. UTC | #8
29.11.2016 13:18, Kevin Wolf wrote:
> Am 27.11.2016 um 20:17 hat Wouter Verhelst geschrieben:
>>> 3. Q: selecting of dirty bitmap to export
>>>     A: several variants:
>>>        1: id of bitmap is in flags field of request
>>>            pros: - simple
>>>            cons: - it's a hack. flags field is for other uses.
>>>                  - we'll have to map bitmap names to these "ids"
>>>        2: introduce extended nbd requests with variable length and exploit this
>>>           feature for BLOCK_STATUS command, specifying bitmap identifier.
>>>           pros: - looks like a true way
>>>           cons: - we have to create additional extension
>>>                 - possible we have to create a map,
>>>                   {<QEMU bitmap name> <=> <NBD bitmap id>}
>>>        3: external tool should select which bitmap to export. So, in case of Qemu
>>>           it should be something like qmp command block-export-dirty-bitmap.
>>>           pros: - simple
>>>                 - we can extend it to behave like (2) later
>>>           cons: - additional qmp command to implement (possibly, the lesser evil)
>>>           note: Hmm, external tool can choose between allocated/dirty data too,
>>>                 so, we can remove 'NBD_FLAG_STATUS_DIRTY' flag at all.
>> Downside of 3, though, is that it moves the definition of what the
>> different states mean outside of the NBD protocol (i.e., the protocol
>> messages are not entirely defined anymore, and their meaning depends on
>> the clients and servers in use).
> Another point to consider is that option 3 doesn't allow you to access
> two (or more) different bitmaps from the client without using the side
> channel all the time to switch back and forth between them and having to
> drain the request queue each time to avoid races.
>
> In general, if we have something that "looks like the true way", I'd
> advocate to choose this option. Experience tells that we'd regret
> anything simpler in a year or two, and then we'll have to do the real
> thing anyway, but still need to support the quick hack for
> compatibility.
>
>>> 5. Number of status descriptors, sent by server, should be restricted
>>>     variants:
>>>     1: just allow server to restrict this as it wants (which was done in v3)
>>>     2: (not excluding 1). Client specifies somehow the maximum for number
>>>        of descriptors.
>>>        2.1: add command flag, which will request only one descriptor
>>>             (otherwise, no restrictions from the client)
>>>        2.2: again, introduce extended nbd requests, and add field to
>>>             specify this maximum
>> I think having a flag which requests just one descriptor can be useful,
>> but I'm hesitant to add it unless it's actually going to be used; so in
>> other words, I'll leave the decision on that bit to you.
> The native qemu block layer interface returns the status of only one
> contiguous chunk, so the easiest way to implement the NBD block driver
> in qemu would always use that bit.
>
> On the other hand, it would be possible for the NBD block driver in qemu
> to cache the rest of the response internally and answer the next request
> from the cache instead of sending a new request over the network. Maybe
> that's what it should be doing anyway for good performance.
>
>>> Also, an idea on 2-4:
>>>
>>>      As we say, that dirtiness is unknown for NBD, and external tool
>>>      should specify, manage and understand, which data is actually
>>>      transmitted, why not just call it user_data and leave status field
>>>      of reply chunk unspecified in this case?
>>>
>>>      So, I propose one flag for NBD_CMD_BLOCK_STATUS:
>>>      NBD_FLAG_STATUS_USER. If it is clear, then behaviour is defined by
>>>      Eric's 'Block provisioning status' paragraph.  If it is set, we just
>>>      leave status field for some external... protocol? Who knows, what is
>>>      this user data.
>> Yes, this sounds like a reasonable approach.
> Makes sense to me, too.
>
> However, if we have use for a single NBD_FLAG_STATUS_USER bit, we also
> have use for a second one. If we go with one of the options where the
> bitmap is selected per command, we're fine because you can simply move
> the second bit to a different bitmap and do two requests. If we have
> only a single active bitmap at a time, though, this feels like an actual
> problem.
>
>>> Another idea, about backups themselves:
>>>
>>>      Why do we need allocated/zero status for backup? IMHO we don't.
>> Well, I've been thinking so all along, but then I don't really know what
>> it is, in detail, that you want to do :-)
> I think we do. A block can be dirtied by discarding/write_zero-ing it.
> Then we want the dirty bit to know that we need to include this block in
> the incremental backup, but we also want to know that we don't actually
> have to transfer the data in it.

And we will know that automatically by using the structured read command, 
so a separate call to get_block_status is not needed.

>
> Kevin
Alex Bligh Nov. 29, 2016, 12:57 p.m. UTC | #9
Vladimir,

I went back to April to reread the previous train of conversation,
then found you had helpfully summarised some of it. Comments
below.

Rather than comment on many of the points individually, the root
of my confusion, and to some extent my discomfort with this
proposal, is 'who owns the meaning of the bitmaps'.

Some of this is my own confusion (sorry) about the use to which
this is being put, which is I think at root a documentation issue.
To illustrate this, you write in the FAQ section that this is for
read only disks, but the text talks about:

+Some storage formats and operations over such formats express a
+concept of data dirtiness. Whether the operation is block device
+mirroring, incremental block device backup or any other operation with
+a concept of data dirtiness, they all share a need to provide a list
+of ranges that this particular operation treats as dirty.

How can data be 'dirty' if it is static and unchangeable? (I thought)

I now think what you are talking about is backing up a *snapshot* of a disk
that's running, where the disk itself was not connected using NBD? IE it's
not being 'made dirty' by NBD_CMD_WRITE etc. Rather 'dirtiness' is effectively
an opaque state represented in a bitmap, which is binary metadata
at some particular level of granularity. It might as well be 'happiness'
or 'is coloured blue'. The NBD server would (normally) have no way of
manipulating this bitmap.

In previous comments, I said 'how come we can set the dirty bit through
writes but can't clear it?'. This (my statement) is now I think wrong,
as NBD_CMD_WRITE etc. is not defined to set the dirty bit. The
state of the bitmap comes from whatever sets it, and how it is set is
outside the scope of this protocol; the protocol merely transmits that state.

However, we have the uncomfortable (to me) situation where the protocol
describes a flag 'dirty', with implications as to what it does, but
no actual strict definition of how it's set. So any 'other' user has
no real idea how to use the information, or how to implement a server
that provides a 'dirty' bit, because the semantics of that aren't within
the protocol. This does not sit happily with me.

So I'm wondering whether we should simplify and generalise this spec. You
say that for the dirty flag, there's no specification of when it is
set and cleared - that's implementation-defined. Would it not be better
then to say 'that whole thing is private to Qemu - even the name'?

Rather you could read the list of bitmaps a server has, with a textual
name, each having an index (and perhaps a granularity). You could then
ask on NBD_CMD_BLOCK_STATUS for the appropriate index, and get back that
bitmap value. Some names (e.g. 'ALLOCATED') could be defined in the spec,
and some (e.g. ones beginning with 'X-') could be reserved for user
usage, so you could use 'X-QEMU-DIRTY'. If you then change what your
dirty bit means, you could use 'X-QEMU-DIRTY2' or similar, and not
need a protocol change to support it.

IE rather than looking at 'a way of reading the dirty bit', we could
have this as a generic way of reading opaque bitmaps. Only one (allocation)
might be given meaning to start off with, and it wouldn't be necessary
for all servers to support that - i.e. you could support bitmap reading
without having an ALLOCATION bitmap available.

This spec would then be limited to the transmission of the bitmaps
(remove the word 'dirty' entirely, except perhaps as an illustrative
use case), and include only the definition of the allocation bitmap.
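As a rough sketch of that generic scheme (the names, indexes, granularities, and extent representation here are all illustrative assumptions, not part of the current proposal):

```python
# Hypothetical model of the 'named bitmaps' idea: the server advertises
# a list of (index, name, granularity) tuples; the client queries block
# status against a chosen index and gets back (length, status) extents.

BITMAPS = [
    (0, "ALLOCATED", 4096),      # a name the spec itself could define
    (1, "X-QEMU-DIRTY", 65536),  # 'X-' prefix reserved for user-defined bitmaps
]

# Per-bitmap state, as runs of (length, status); purely illustrative data.
_STATE = {
    0: [(65536, 1), (65536, 0)],
    1: [(131072, 1)],
}

def list_bitmaps():
    """What a hypothetical 'list bitmaps' reply could carry."""
    return BITMAPS

def block_status(index, offset, length):
    """Return (length, status) extents covering [offset, offset+length)."""
    extents, pos = [], 0
    for ext_len, status in _STATE[index]:
        start, end = pos, pos + ext_len
        pos = end
        if end <= offset or start >= offset + length:
            continue  # run is entirely outside the queried range
        lo, hi = max(start, offset), min(end, offset + length)
        extents.append((hi - lo, status))
    return extents
```

Under this model a server could support bitmap reading without having an ALLOCATED bitmap at all: it simply would not list index 0.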

Some more nits:

> Also, bit of NBD_FLAG_SEND_BLOCK_STATUS is changed to 9, as 8 is now
> NBD_FLAG_CAN_MULTI_CONN in master branch.
> 
> And, finally, I've rebased this onto current state of
> extension-structured-reply branch (which itself should be rebased on
> master IMHO).

Each documentation branch should normally be branched off master unless
it depends on another extension (in which case it will be branched from that).
I haven't been rebasing them frequently as it can disrupt those working
on the branches. There's only really an issue around rebasing where you
depend on another branch.


> 2. Q: different granularities of dirty/allocated bitmaps. Any problems?
>   A: 1: server replies with status descriptors of any size, granularity
>         is hidden from the client
>      2: dirty/allocated requests are separate and unrelated to each
>         other, so their granularities are not intersecting

I'm OK with this, but note that you do actually mention a granularity
of sorts in the spec (512 bytes) - I think you should replace that
with the minimum block size.

> 3. Q: selecting of dirty bitmap to export
>   A: several variants:
>      1: id of bitmap is in flags field of request
>          pros: - simple
>          cons: - it's a hack. flags field is for other uses.
>                - we'll have to map bitmap names to these "ids"
>      2: introduce extended nbd requests with variable length and exploit this
>         feature for BLOCK_STATUS command, specifying bitmap identifier.
>         pros: - looks like a true way
>         cons: - we have to create additional extension
>               - possible we have to create a map,
>                 {<QEMU bitmap name> <=> <NBD bitmap id>}
>      3: exteranl tool should select which bitmap to export. So, in case of Qemu
>         it should be something like qmp command block-export-dirty-bitmap.
>         pros: - simple
>               - we can extend it to behave like (2) later
>         cons: - additional qmp command to implement (possibly, the lesser evil)
>         note: Hmm, external tool can make chose between allocated/dirty data too,
>               so, we can remove 'NBD_FLAG_STATUS_DIRTY' flag at all.

Yes, this is all pretty horrible. I suspect we want to do something like (2),
and permit extra data across the wire (in my proposal, it would only be one
byte to select the index). I suppose one could ask for a list of bitmaps.

> 4. Q: Should not get_{allocated,dirty} be separate commands?
>   cons: Two commands with almost same semantic and similar means?
>   pros: However here is a good point of separating clearly defined and native
>         for block devices GET_BLOCK_STATUS from user-driven and actually
>         undefined data, called 'dirtyness'.

I'm suggesting one generic 'read bitmap' command like you.

> 5. Number of status descriptors, sent by server, should be restricted
>   variants:
>   1: just allow server to restrict this as it wants (which was done in v3)
>   2: (not excluding 1). Client specifies somehow the maximum for number
>      of descriptors.
>      2.1: add command flag, which will request only one descriptor
>           (otherwise, no restrictions from the client)
>      2.2: again, introduce extended nbd requests, and add field to
>           specify this maximum

I think some form of extended request is the way to go, but out of
interest, what's the issue with as many descriptors being sent as it
takes to encode the reply? The client can just consume the remainder
(without buffering) and reissue the request at a later point for
the areas it discarded.

> 
> 6. A: What to do with unspecified flags (in request/reply)?
>   I think the normal variant is to make them reserved. (Server should
>   return EINVAL if found unknown bits, client should consider replay
>   with unknown bits as an error)

Yeah.

> 
> +
> +* `NBD_CMD_BLOCK_STATUS`
> +
> +    A block status query request. Length and offset define the range
> +    of interest. Clients SHOULD NOT use this request unless the server

MUST NOT is what we say elsewhere I believe.

> +    set `NBD_CMD_SEND_BLOCK_STATUS` in the transmission flags, which
> +    in turn requires the client to first negotiate structured replies.
> +    For a successful return, the server MUST use a structured reply,
> +    containing at most one chunk of type `NBD_REPLY_TYPE_BLOCK_STATUS`.

Nit: are you saying that non-structured error replies are permissible?
You're always/often going to get a non-structured (simple) error reply
if the server doesn't support the command, but I think it would be fair to say the
server MUST use a structured reply to NBD_CMD_BLOCK_STATUS if
it supports the command. This is effectively what we say re NBD_CMD_READ.

> +
> +    The list of block status descriptors within the
> +    `NBD_REPLY_TYPE_BLOCK_STATUS` chunk represent consecutive portions
> +    of the file starting from specified *offset*, and the sum of the
> +    *length* fields of each descriptor MUST not be greater than the
> +    overall *length* of the request. This means that the server MAY
> +    return less data than required. However the server MUST return at
> +    least one status descriptor

I'm not sure I understand why that's useful. What should the client
infer from the server refusing to provide information? We don't
permit short reads etc.

> .  The server SHOULD use different
> +    *status* values between consecutive descriptors, and SHOULD use
> +    descriptor lengths that are an integer multiple of 512 bytes where
> +    possible (the first and last descriptor of an unaligned query being
> +    the most obvious places for an exception).

Surely better would be an integer multiple of the minimum block
size. Being able to offer bitmap support at finer granularity than
the absolute minimum block size helps no one, and if it were possible
to support a 256 byte block size (I think some floppy disks had that)
I see no reason not to support that as a granularity.
Vladimir Sementsov-Ogievskiy Nov. 29, 2016, 2:36 p.m. UTC | #10
29.11.2016 15:57, Alex Bligh wrote:
> Vladimir,
>
> I went back to April to reread the previous train of conversation
> then found you had helpfully summarised some if it. Comments
> below.
>
> Rather than comment on many of the points individual, the root
> of my confusion and to some extent uncomfortableness about this
> proposal is 'who owns the meaning the the bitmaps'.
>
> Some of this is my own confusion (sorry) about the use to which
> this is being put, which is I think at root a documentation issue.
> To illustrate this, you write in the FAQ section that this is for
> read only disks, but the text talks about:
>
> +Some storage formats and operations over such formats express a
> +concept of data dirtiness. Whether the operation is block device
> +mirroring, incremental block device backup or any other operation with
> +a concept of data dirtiness, they all share a need to provide a list
> +of ranges that this particular operation treats as dirty.
>
> How can data be 'dirty' if it is static and unchangeable? (I thought)
>
> I now think what you are talking about backing up a *snapshot* of a disk
> that's running, where the disk itself was not connected using NBD? IE it's
> not being 'made dirty' by NBD_CMD_WRITE etc. Rather 'dirtiness' is effectively
> an opaque state represented in a bitmap, which is binary metadata
> at some particular level of granularity. It might as well be 'happiness'
> or 'is coloured blue'. The NBD server would (normally) have no way of
> manipulating this bitmap.

Yes, something like this.

Note: in Qemu it may not be a snapshot (actually, I haven't seen a way in 
Qemu to export snapshots without switching to them, except opening an external 
snapshot as a separate block device), but just any read-only drive, or a 
temporary drive, used in the image fleecing scheme:

driveA is online normal drive
driveB is empty nbd export

- start a backup driveA->driveB with sync=none, which means that the only 
copying done is of old data from A to B, just before every write to A
- set driveA as the backing file for driveB (on a read from B, if the data is 
not allocated there, it will be read from A)

after that, driveB is something like a snapshot for backup through NBD.

It's all just to say: calling the backup point-in-time state simply a 
'snapshot' confuses me, and I hope not to see this word in the spec (in 
this context).
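As a very rough sketch, the fleecing setup above maps to a QMP command sequence along these lines (device/node names, the image filename, and the NBD listen address are illustrative assumptions, not a tested recipe):

```python
# Illustrative QMP command sequence for the fleecing scheme described
# above; names and addresses are assumptions for the sake of the example.
fleecing_setup = [
    # driveB: an empty temporary image with driveA as its backing file,
    # so reads of unallocated areas fall through to driveA
    {"execute": "blockdev-add",
     "arguments": {"node-name": "driveB",
                   "driver": "qcow2",
                   "file": {"driver": "file", "filename": "fleece.qcow2"},
                   "backing": "driveA"}},
    # sync=none: the only copying is of old data from A to B, done just
    # before each write to A overwrites it
    {"execute": "drive-backup",
     "arguments": {"device": "driveA", "target": "fleece.qcow2",
                   "sync": "none"}},
    # export driveB over NBD for the external backup tool to read
    {"execute": "nbd-server-start",
     "arguments": {"addr": {"type": "inet",
                            "data": {"host": "0.0.0.0", "port": "10809"}}}},
    {"execute": "nbd-server-add",
     "arguments": {"device": "driveB"}},
]
```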

>
> In previous comments, I said 'how come we can set the dirty bit through
> writes but can't clear it?'. This (my statement) is now I think wrong,
> as NBD_CMD_WRITE etc. is not defined to set the dirty bit. The
> state of the bitmap comes from whatever sets the bitmap which is outside
> the scope of this protocol to transmit it.
>
> However, we have the uncomfortable (to me) situation where the protocol
> describes a flag 'dirty', with implications as to what it does, but
> no actual strict definition of how it's set. So any 'other' user has
> no real idea how to use the information, or how to implement a server
> that provides a 'dirty' bit, because the semantics of that aren't within
> the protocol. This does not sit happily with me.
>
> So I'm wondering whether we should simplify and generalise this spec. You
> say that for the dirty flag, there's no specification of when it is
> set and cleared - that's implementation defined. Would it not be better
> then to say 'that whole thing is private to Qemu - even the name'.
>
> Rather you could read the list of bitmaps a server has, with a textual
> name, each having an index (and perhaps a granularity). You could then
> ask on NBD_CMD_BLOCK_STATUS for the appropriate index, and get back that
> bitmap value. Some names (e.g. 'ALLOCATED') could be defined in the spec,
> and some (e.g. ones beginning with 'X-') could be reserved for user
> usage. So you could use 'X-QEMU-DIRTY'). If you then change what your
> dirty bit means, you could use 'X-QEMU-DIRTY2' or similar, and not
> need a protocol change to support it.
>
> IE rather than looking at 'a way of reading the dirty bit', we could
> have this as a generic way of reading opaque bitmaps. Only one (allocation)
> might be given meaning to start off with, and it wouldn't be necessary
> for all servers to support that - i.e. you could support bitmap reading
> without having an ALLOCATION bitmap available.
>
> This spec would then be limited to the transmission of the bitmaps
> (remove the word 'dirty' entirely, except perhaps as an illustrative
> use case), and include only the definition of the allocation bitmap.

Good point. For Qcow2 we finally came to just "bitmaps", not "dirty 
bitmaps", to make it more general. There is a problem with allocation if 
we want to make it a subcase of bitmap: allocation natively has two 
bits per block: zero and allocated. We can of course separate this into 
two bitmaps, but this will not be similar to the classic get_block_status.

> Some more nits:
>
>> Also, bit of NBD_FLAG_SEND_BLOCK_STATUS is changed to 9, as 8 is now
>> NBD_FLAG_CAN_MULTI_CONN in master branch.
>>
>> And, finally, I've rebased this onto current state of
>> extension-structured-reply branch (which itself should be rebased on
>> master IMHO).
> Each documentation branch should normally be branched off master unless
> it depends on another extension (in which case it will be branched from that).
> I haven't been rebasing them frequently as it can disrupt those working
> on the branches. There's only really an issue around rebasing where you
> depend on another branch.
>
>
>> 2. Q: different granularities of dirty/allocated bitmaps. Any problems?
>>    A: 1: server replies with status descriptors of any size, granularity
>>          is hidden from the client
>>       2: dirty/allocated requests are separate and unrelated to each
>>          other, so their granularities are not intersecting
> I'm OK with this, but note that you do actually mention a granularity
> of sorts in the spec (512 byes) - I think you should replace that
> with the minimum block size.
>
>> 3. Q: selecting of dirty bitmap to export
>>    A: several variants:
>>       1: id of bitmap is in flags field of request
>>           pros: - simple
>>           cons: - it's a hack. flags field is for other uses.
>>                 - we'll have to map bitmap names to these "ids"
>>       2: introduce extended nbd requests with variable length and exploit this
>>          feature for BLOCK_STATUS command, specifying bitmap identifier.
>>          pros: - looks like a true way
>>          cons: - we have to create additional extension
>>                - possible we have to create a map,
>>                  {<QEMU bitmap name> <=> <NBD bitmap id>}
>>       3: exteranl tool should select which bitmap to export. So, in case of Qemu
>>          it should be something like qmp command block-export-dirty-bitmap.
>>          pros: - simple
>>                - we can extend it to behave like (2) later
>>          cons: - additional qmp command to implement (possibly, the lesser evil)
>>          note: Hmm, external tool can make chose between allocated/dirty data too,
>>                so, we can remove 'NBD_FLAG_STATUS_DIRTY' flag at all.
> Yes, this is all pretty horrible. I suspect we want to do something like (2),
> and permit extra data across (in my proposal, it would only be one byte to select
> the index). I suppose one could ask for a list of bitmaps.
>
>> 4. Q: Should not get_{allocated,dirty} be separate commands?
>>    cons: Two commands with almost same semantic and similar means?
>>    pros: However here is a good point of separating clearly defined and native
>>          for block devices GET_BLOCK_STATUS from user-driven and actually
>>          undefined data, called 'dirtyness'.
> I'm suggesting one generic 'read bitmap' command like you.

To support get_block_status in this general read_bitmap, we will need to 
define something like a 'multibitmap', which allows several bits per 
chunk, as allocation data has two: zero and allocated.

>
>> 5. Number of status descriptors, sent by server, should be restricted
>>    variants:
>>    1: just allow server to restrict this as it wants (which was done in v3)
>>    2: (not excluding 1). Client specifies somehow the maximum for number
>>       of descriptors.
>>       2.1: add command flag, which will request only one descriptor
>>            (otherwise, no restrictions from the client)
>>       2.2: again, introduce extended nbd requests, and add field to
>>            specify this maximum
> I think some form of extended request is the way to go, but out of
> interest, what's the issue with as many descriptors being sent as it
> takes to encode the reply? The client can just consume the remainder
> (without buffering) and reissue the request at a later point for
> the areas it discarded.

The issue is that too many descriptors are possible. So, (1) solves it; (2) is 
optional, just to simplify/optimize the client side.

>
>> 6. A: What to do with unspecified flags (in request/reply)?
>>    I think the normal variant is to make them reserved. (Server should
>>    return EINVAL if found unknown bits, client should consider replay
>>    with unknown bits as an error)
> Yeah.
>
>> +
>> +* `NBD_CMD_BLOCK_STATUS`
>> +
>> +    A block status query request. Length and offset define the range
>> +    of interest. Clients SHOULD NOT use this request unless the server
> MUST NOT is what we say elsewhere I believe.
>
>> +    set `NBD_CMD_SEND_BLOCK_STATUS` in the transmission flags, which
>> +    in turn requires the client to first negotiate structured replies.
>> +    For a successful return, the server MUST use a structured reply,
>> +    containing at most one chunk of type `NBD_REPLY_TYPE_BLOCK_STATUS`.
> Nit: are you saying that non-structured error replies are permissible?
> You're always/often going to get a non-structured  (simple) error reply
> if the server doesn't support the command, but I think it would be fair to say the
> server MUST use a structured reply to NBD_CMD_SEND_BLOCK_STATUS if
> it supports the command. This is effectively what we say re NBD_CMD_READ.

I agree.

>
>> +
>> +    The list of block status descriptors within the
>> +    `NBD_REPLY_TYPE_BLOCK_STATUS` chunk represent consecutive portions
>> +    of the file starting from specified *offset*, and the sum of the
>> +    *length* fields of each descriptor MUST not be greater than the
>> +    overall *length* of the request. This means that the server MAY
>> +    return less data than required. However the server MUST return at
>> +    least one status descriptor
> I'm not sure I understand why that's useful. What should the client
> infer from the server refusing to provide information? We don't
> permit short reads etc.

If the bitmap is 010101010101... we will have too many descriptors. For 
example, a 16 TB disk with 64 KB granularity -> 2 GB of descriptor payload.
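The worst-case arithmetic can be checked directly (assuming, for illustration, 8-byte status descriptors - a 32-bit length plus a 32-bit status):

```python
# Worst case: alternating 0/1 bits, so one descriptor per granularity unit.
DISK_SIZE = 16 * 2**40      # 16 TB
GRANULARITY = 64 * 2**10    # 64 KB
DESCRIPTOR_BYTES = 8        # assumed: 32-bit length + 32-bit status

descriptors = DISK_SIZE // GRANULARITY
payload = descriptors * DESCRIPTOR_BYTES

print(descriptors)          # 268435456 (256M descriptors)
print(payload // 2**30)     # 2 (i.e. 2 GB of descriptor payload)
```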

>
>> .  The server SHOULD use different
>> +    *status* values between consecutive descriptors, and SHOULD use
>> +    descriptor lengths that are an integer multiple of 512 bytes where
>> +    possible (the first and last descriptor of an unaligned query being
>> +    the most obvious places for an exception).
> Surely better would be an an integer multiple of the minimum block
> size. Being able to offer bitmap support at finer granularity than
> the absolute minimum block size helps no one, and if it were possible
> to support a 256 byte block size (I think some floppy disks had that)
> I see no reason not to support that as a granularity.
>

Agree. Anyway, it is just a strong recommendation, not a requirement.
Alex Bligh Nov. 29, 2016, 2:52 p.m. UTC | #11
Vladimir,

>>> 4. Q: Should not get_{allocated,dirty} be separate commands?
>>>   cons: Two commands with almost same semantic and similar means?
>>>   pros: However here is a good point of separating clearly defined and native
>>>         for block devices GET_BLOCK_STATUS from user-driven and actually
>>>         undefined data, called 'dirtyness'.
>> I'm suggesting one generic 'read bitmap' command like you.
> 
> To support get_block_status in this general read_bitmap, we will need to define something like 'multibitmap', which allows several bits per chunk, as allocation data has two: zero and allocated.

I think you are saying that for an arbitrary 'bitmap' there might be more than two states. For instance, one might (in an allocation 'bitmap') have a hole, a non-hole zero, or a non-hole non-zero.

In the spec I'd suggest, for one 'bitmap', we represent the output as extents. Each extent has a status. For the bitmap to be useful, at least two statuses need to be possible, but the above would have three. This could be internally implemented by the server as (a) a bitmap (with two bits per entry), (b) two bitmaps (possibly with different granularity), (c) something else (e.g. reading file extents, then, if the data is allocated, manually comparing it against zero).

I should have put 'bitmap' in quotes in what I wrote because returning extents (as you suggested) is a good idea, and there need not be an actual bitmap.
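A minimal sketch of deriving such extents from per-block state, however the server stores it internally (the three status codes here are made up for illustration):

```python
# Made-up status codes for the three-state allocation example above.
HOLE, ZERO, DATA = 0, 1, 2

def states_to_extents(states, block_size):
    """Coalesce consecutive equal per-block states into (length, status)
    extents, so no actual bitmap needs to cross the wire."""
    extents = []
    for status in states:
        if extents and extents[-1][1] == status:
            # extend the current run
            extents[-1] = (extents[-1][0] + block_size, status)
        else:
            # start a new extent
            extents.append((block_size, status))
    return extents
```

For example, `states_to_extents([HOLE, HOLE, DATA, ZERO, ZERO], 512)` yields three extents: 1024 bytes of hole, 512 bytes of data, 1024 bytes of zeroes.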

>>> 5. Number of status descriptors, sent by server, should be restricted
>>>   variants:
>>>   1: just allow server to restrict this as it wants (which was done in v3)
>>>   2: (not excluding 1). Client specifies somehow the maximum for number
>>>      of descriptors.
>>>      2.1: add command flag, which will request only one descriptor
>>>           (otherwise, no restrictions from the client)
>>>      2.2: again, introduce extended nbd requests, and add field to
>>>           specify this maximum
>> I think some form of extended request is the way to go, but out of
>> interest, what's the issue with as many descriptors being sent as it
>> takes to encode the reply? The client can just consume the remainder
>> (without buffering) and reissue the request at a later point for
>> the areas it discarded.
> 
> the issue is: too many descriptors possible. So, (1) solves it. (2) is optional, just to simplify/optimize client side.

I think I'd prefer the server to return what it was asked for, and the client to deal with it. So either the client should be able to specify a maximum number of extents (and if we are extending the command structure, that's possible) or we deal with the client consuming and retrying unwanted extents. The reason for this is that it's unlikely the server can know a priori the number of extents which is the appropriate maximum for the client.

>>> +    The list of block status descriptors within the
>>> +    `NBD_REPLY_TYPE_BLOCK_STATUS` chunk represent consecutive portions
>>> +    of the file starting from specified *offset*, and the sum of the
>>> +    *length* fields of each descriptor MUST not be greater than the
>>> +    overall *length* of the request. This means that the server MAY
>>> +    return less data than required. However the server MUST return at
>>> +    least one status descriptor
>> I'm not sure I understand why that's useful. What should the client
>> infer from the server refusing to provide information? We don't
>> permit short reads etc.
> 
> if the bitmap is 010101010101 we will have too many descriptors. For example, 16tb disk, 64k granularity -> 2G of descriptors payload.

Yep. And the cost of consuming and retrying is quite high. One option would be for the client to realise this is a possibility, and not request the entire extent map for a 16TB disk, as it might be very large! Even if the client worked at e.g. a 64MB level (where they'd get a maximum of 1024 extents per reply), this isn't going to noticeably increase the round trip timing. One issue here is that to determine a 'reasonable' size, the client needs to know the minimum length of any extent.

I think the answer is probably a 'maximum number of extents' in the request packet.

Of course with statuses in extent, the final extent could be represented as 'I don't know, break this bit into a separate request' status.
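On the client side, that 'break this bit into a separate request' idea could look something like the following (the sentinel value and the query interface are assumptions for illustration; the sketch also assumes the server's reply either covers the whole remaining range or ends with the sentinel):

```python
STATUS_UNKNOWN = 0xFFFF  # assumed sentinel status: "re-request this range"

def full_block_status(query, offset, length):
    """Collect extents for [offset, offset+length), reissuing the request
    whenever the server ends its (possibly truncated) reply with an
    'unknown' extent."""
    extents = []
    while length > 0:
        reply = query(offset, length)  # server may truncate its reply
        for ext_len, status in reply:
            if status == STATUS_UNKNOWN:
                break                  # re-request starting from here
            extents.append((ext_len, status))
            offset += ext_len
            length -= ext_len
        else:
            break                      # reply covered the whole range
    return extents
```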
Vladimir Sementsov-Ogievskiy Nov. 29, 2016, 3:07 p.m. UTC | #12
29.11.2016 17:52, Alex Bligh wrote:
> Vladimir,
>
>>>> 4. Q: Should not get_{allocated,dirty} be separate commands?
>>>>    cons: Two commands with almost same semantic and similar means?
>>>>    pros: However here is a good point of separating clearly defined and native
>>>>          for block devices GET_BLOCK_STATUS from user-driven and actually
>>>>          undefined data, called 'dirtyness'.
>>> I'm suggesting one generic 'read bitmap' command like you.
>> To support get_block_status in this general read_bitmap, we will need to define something like 'multibitmap', which allows several bits per chunk, as allocation data has two: zero and allocated.
> I think you are saying that for arbitrary 'bitmap' there might be more than one state. For instance, one might (in an allocation 'bitmap') have a hole, a non-hole-zero, or a non-hole-non-zero.
>
> In the spec I'd suggest, for one 'bitmap', we represent the output as extents. Each extent has a status. For the bitmap to be useful, at least two status need to be possible, but the above would have three. This could be internally implemented by the server as (a) a bitmap (with two bits per entry), (b) two bitmaps (possibly with different granularity), (c) something else (e.g. reading file extents, then if the data is allocated manually comparing it against zero).
>
> I should have put 'bitmap' in quotes in what I wrote because returning extents (as you suggested) is a good idea, and there need not be an actual bitmap.
>
>>>> 5. Number of status descriptors, sent by server, should be restricted
>>>>    variants:
>>>>    1: just allow server to restrict this as it wants (which was done in v3)
>>>>    2: (not excluding 1). Client specifies somehow the maximum for number
>>>>       of descriptors.
>>>>       2.1: add command flag, which will request only one descriptor
>>>>            (otherwise, no restrictions from the client)
>>>>       2.2: again, introduce extended nbd requests, and add field to
>>>>            specify this maximum
>>> I think some form of extended request is the way to go, but out of
>>> interest, what's the issue with as many descriptors being sent as it
>>> takes to encode the reply? The client can just consume the remainder
>>> (without buffering) and reissue the request at a later point for
>>> the areas it discarded.
>> the issue is: too many descriptors possible. So, (1) solves it. (2) is optional, just to simplify/optimize client side.
> I think I'd prefer the server to return what it was asked for, and the client to deal with it. So either the client should be able to specify a maximum number of extents (and if we are extending the command structure, that's possible) or we deal with the client consuming and retrying unwanted extents. The reason for this is that it's unlikely the server can know a priori the number of extents which is the appropriate maximum for the client.
>
>>>> +    The list of block status descriptors within the
>>>> +    `NBD_REPLY_TYPE_BLOCK_STATUS` chunk represent consecutive portions
>>>> +    of the file starting from specified *offset*, and the sum of the
>>>> +    *length* fields of each descriptor MUST not be greater than the
>>>> +    overall *length* of the request. This means that the server MAY
>>>> +    return less data than required. However the server MUST return at
>>>> +    least one status descriptor
>>> I'm not sure I understand why that's useful. What should the client
>>> infer from the server refusing to provide information? We don't
>>> permit short reads etc.
>> if the bitmap is 010101010101 we will have too many descriptors. For example, 16tb disk, 64k granularity -> 2G of descriptors payload.
> Yep. And the cost of consuming and retrying is quite high. One option would be for the client to realise this is a possibility, and not request the entire extent map for a 16TB disk, as it might be very large! Even if the client worked at e.g. a 64MB level (where they'd get a maximum of 1024 extents per reply), this isn't going to noticeably increase the round trip timing. One issue here is that to determine a 'reasonable' size, the client needs to know the minimum length of any extent.

And with this approach we will in turn have the overhead of too many 
requests for 00000000 or 11111111 bitmaps.

>
> I think the answer is probably a 'maximum number of extents' in the request packet.
>
> Of course with statuses in extent, the final extent could be represented as 'I don't know, break this bit into a separate request' status.
>

With such a predefined status, we can postpone creating extended requests, 
have the number of extents restricted by the server, and have the sum of 
extent lengths be equal to the request length.
Wouter Verhelst Nov. 29, 2016, 3:17 p.m. UTC | #13
On Tue, Nov 29, 2016 at 06:07:56PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 29.11.2016 17:52, Alex Bligh wrote:
> > Vladimir,
> >> if the bitmap is 010101010101 we will have too many descriptors.
> >> For example, 16tb disk, 64k granularity -> 2G of descriptors
> >> payload.
> > Yep. And the cost of consuming and retrying is quite high. One
> > option would be for the client to realise this is a possibility, and
> > not request the entire extent map for a 16TB disk, as it might be
> > very large! Even if the client worked at e.g. a 64MB level (where
> > they'd get a maximum of 1024 extents per reply), this isn't going to
> > noticeably increase the round trip timing. One issue here is that to
> > determine a 'reasonable' size, the client needs to know the minimum
> > length of any extent.
> 
> and with this approach we will in turn have overhead of too many 
> requests for 00000000 or 11111111 bitmaps.

This is why my proposal suggests the server may stop sending extents
after X extents (sixteen, but that number can certainly change) have
been sent. After all, the server has a better view of what's going
to be costly in terms of "number of extents".

> > I think the answer is probably a 'maximum number of extents' in the request packet.
> >
> > Of course with statuses in extent, the final extent could be
> > represented as 'I don't know, break this bit into a separate
> > request' status.
> >
> 
> With such predefined status, we can postpone creating extended requests, 
> have number of extents restricted by server and have sum of extents 
> lengths be equal to request length.
Sergey Talantov Nov. 30, 2016, 10:41 a.m. UTC | #14
Hi, Wouter!

> Actually, come to think of that. What is the exact use case for this thing? I understand you're trying to create incremental backups of things, which would imply you don't write from the client that is getting the ?
> block status thingies, right?

Overall, the most desired use case for this NBD extension is to allow third-party software to make incremental backups.
Acronis (a vendor of backup solutions) would support qemu backup if block status information were provided.


-----Original Message-----
From: Wouter Verhelst [mailto:w@uter.be] 
Sent: Sunday, November 27, 2016 22:17
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Cc: nbd-general@lists.sourceforge.net; kwolf@redhat.com; qemu-devel@nongnu.org; pborzenkov@virtuozzo.com; stefanha@redhat.com; pbonzini@redhat.com; mpa@pengutronix.de; den@openvz.org
Subject: Re: [Nbd] [PATCH v3] doc: Add NBD_CMD_BLOCK_STATUS extension

Hi Vladimir,

Quickly: the reason I haven't merged this yet is twofold:
- I wasn't thrilled with the proposal at the time. It felt a bit
  hackish, bolted onto NBD so you could use it, but without defining
  everything in the NBD protocol. "We're reading some data, but it's not
  about you". That didn't feel right.
- There were a number of questions still unanswered (you're answering a
  few below, so that's good).

For clarity, I have no objection whatsoever to adding more commands if they're useful, but I would prefer that they're also useful with NBD on its own, i.e., without requiring an initiation or correlation of some state through another protocol or network connection or whatever. If that's needed, that feels like I didn't do my job properly, if you get my point.

On Fri, Nov 25, 2016 at 02:28:16PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> With the availability of sparse storage formats, it is often needed to 
> query status of a particular range and read only those blocks of data 
> that are actually present on the block device.
> 
> To provide such information, the patch adds the BLOCK_STATUS extension 
> with one new NBD_CMD_BLOCK_STATUS command, a new structured reply 
> chunk format, and a new transmission flag.
> 
> There exists a concept of data dirtiness, which is required during, 
> for example, incremental block device backup. To express this concept 
> via NBD protocol, this patch also adds a flag to NBD_CMD_BLOCK_STATUS 
> to request dirtiness information rather than provisioning information; 
> however, with the current proposal, data dirtiness is only useful with 
> additional coordination outside of the NBD protocol (such as a way to 
> start and stop the server from tracking dirty sectors).  Future NBD 
> extensions may add commands to control dirtiness through NBD.
> 
> Since NBD protocol has no notion of block size, and to mimic SCSI "GET 
> LBA STATUS" command more closely, it has been chosen to return a list 
> of extents in the response of NBD_CMD_BLOCK_STATUS command, instead of 
> a bitmap.
> 
> CC: Pavel Borzenkov <pborzenkov@virtuozzo.com>
> CC: Denis V. Lunev <den@openvz.org>
> CC: Wouter Verhelst <w@uter.be>
> CC: Paolo Bonzini <pbonzini@redhat.com>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
> 
> v3:
> 
> Hi all. This is almost a resend of v2 (by Eric Blake). The only change 
> is removing the restriction that the sum of status descriptor lengths 
> must equal the requested length. I.e., let's permit the server to 
> reply with less data than requested if it wants.

Reasonable, yes. The length that the client requests should be a maximum (i.e.
"I'm interested in this range"), not an exact request.

> Also, bit of NBD_FLAG_SEND_BLOCK_STATUS is changed to 9, as 8 is now  
> NBD_FLAG_CAN_MULTI_CONN in master branch.

Right.

> And, finally, I've rebased this onto current state of 
> extension-structured-reply branch (which itself should be rebased on 
> master IMHO).

Probably a good idea, given the above.

> By this resend I just want to continue the discussion, started about 
> half a year ago. Here is a summary of some questions and ideas from the v2
> discussion:
> 
> 1. Q: Synchronisation. Is such data (dirty/allocated) reliable? 
>    A: This all is for read-only disks, so the data is static and unchangeable.

I think we should declare that it's up to the client to ensure no other writes happen without its knowledge. This may be because the client and server communicate out of band about state changes, or because the client somehow knows that it's the only writer, or whatever.

We can easily do that by declaring that the result of that command only talks about *current* state, and that concurrent writes by different clients may invalidate the state. This is true for NBD in general (i.e., concurrent read or write commands from other clients may confuse file systems on top of NBD), so it doesn't change expectations in any way.

> 2. Q: different granularities of dirty/allocated bitmaps. Any problems?
>    A: 1: server replies with status descriptors of any size, granularity
>          is hidden from the client
>       2: dirty/allocated requests are separate and unrelated to each
>          other, so their granularities are not intersecting

Not entirely sure anymore what this is about?

> 3. Q: selecting of dirty bitmap to export
>    A: several variants:
>       1: id of bitmap is in flags field of request
>           pros: - simple
>           cons: - it's a hack. flags field is for other uses.
>                 - we'll have to map bitmap names to these "ids"
>       2: introduce extended nbd requests with variable length and exploit this
>          feature for BLOCK_STATUS command, specifying bitmap identifier.
>          pros: - looks like a true way
>          cons: - we have to create additional extension
>                - possible we have to create a map,
>                  {<QEMU bitmap name> <=> <NBD bitmap id>}
>       3: external tool should select which bitmap to export. So, in case of Qemu
>          it should be something like qmp command block-export-dirty-bitmap.
>          pros: - simple
>                - we can extend it to behave like (2) later
>          cons: - additional qmp command to implement (possibly, the lesser evil)
>          note: Hmm, the external tool can make a choice between allocated/dirty data too,
>                so, we can remove the 'NBD_FLAG_STATUS_DIRTY' flag entirely.

Downside of 3, though, is that it moves the definition of what the different states mean outside of the NBD protocol (i.e., the protocol messages are not entirely defined anymore, and their meaning depends on the clients and servers in use).

To avoid this, we should have a clear definition of what the reply means *by default*, but then we can add a note that clients and servers can possibly define other meanings out of band if they want to.

> 4. Q: Should not get_{allocated,dirty} be separate commands?
>    cons: Two commands with almost the same semantics and similar means?
>    pros: However here is a good point of separating clearly defined and native
>          for block devices GET_BLOCK_STATUS from user-driven and actually
>          undefined data, called 'dirtyness'.

Yeah, having them separate commands might be a bad idea indeed.

> 5. Number of status descriptors, sent by server, should be restricted
>    variants:
>    1: just allow server to restrict this as it wants (which was done in v3)
>    2: (not excluding 1). Client specifies somehow the maximum for number
>       of descriptors.
>       2.1: add command flag, which will request only one descriptor
>            (otherwise, no restrictions from the client)
>       2.2: again, introduce extended nbd requests, and add field to
>            specify this maximum

I think having a flag which requests just one descriptor can be useful, but I'm hesitant to add it unless it's actually going to be used; so in other words, I'll leave the decision on that bit to you.

> 6. A: What to do with unspecified flags (in request/reply)?
>    I think the normal variant is to make them reserved. (Server should
>    return EINVAL if it finds unknown bits; client should consider a reply
>    with unknown bits as an error)

Right, probably best to do that, yes.

> ======
> 
> Also, an idea on 2-4:
> 
>     As we say, that dirtiness is unknown for NBD, and external tool
>     should specify, manage and understand, which data is actually
>     transmitted, why not just call it user_data and leave status field
>     of reply chunk unspecified in this case?
> 
>     So, I propose one flag for NBD_CMD_BLOCK_STATUS:
>     NBD_FLAG_STATUS_USER. If it is clear, then behaviour is defined by
>     Eric's 'Block provisioning status' paragraph.  If it is set, we just
>     leave status field for some external... protocol? Who knows, what is
>     this user data.

Yes, this sounds like a reasonable approach.

>     Note: I'm not sure that I like this (my) proposal. It's just an
>     idea; maybe someone will like it.  And, I think, it represents what we
>     are trying to do more honestly.

Indeed.

>     Note2: the next step of generalization will be NBD_CMD_USER, with
>     variable request size, structured reply and no definition :)

Well, er, no please, if we can avoid it :-)

> Another idea, about backups themselves:
> 
>     Why do we need allocated/zero status for backup? IMHO we don't.

Well, I've been thinking so all along, but then I don't really know what it is, in detail, that you want to do :-)

I can understand a "has this changed since time X" request, which the "dirty" thing seems to want to be. Whether something is allocated is just a special case of that.

Actually, come to think of that. What is the exact use case for this thing? I understand you're trying to create incremental backups of things, which would imply you don't write from the client that is getting the block status thingies, right? If so, how about:

- NBD_OPT_GET_SNAPSHOTS (during negotiation): returns a list of
  snapshots. Not required, optional, includes a machine-readable form,
  not defined by NBD, which explains what the snapshot is about (e.g., a
  qemu json file). The "base" version of that is just "allocation
  status", and is implied (i.e., you don't need to run
  NBD_OPT_GET_SNAPSHOTS if you're not interested in anything but the
  allocation status).
- NBD_CMD_BLOCK_STATUS (during transmission), returns block descriptors
  which tell you what the status of a block of data is for each of the
  relevant snapshots that we know about.

Perhaps this is somewhat overengineered, but it does bring most of the definition of what a snapshot is back into the NBD protocol, without having to say "this could be anything", and without requiring connectivity over two ports for this to be useful (e.g., you could store the machine-readable form of the snapshot description into your backup program and match what they mean with what you're interested in at restore time, etc).

This wouldn't work if you're interested in new snapshots that get created once we've already moved into transmission, but hey.
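A toy model of that two-step flow may make it clearer. Everything here is hypothetical: the option and command names come from the proposal above, while the IDs, descriptions, and extent data are invented for illustration; the snapshot description format (e.g. a qemu json file) is deliberately left outside the NBD spec.

```python
# Hypothetical snapshot table; ID 0 is the implied "allocation status"
# base snapshot, others carry a server-defined description.
SNAPSHOTS = {
    0: "allocation status (implied base snapshot)",
    1: "dirty-since-backup0 (server-defined description)",
}

# Per-snapshot extent maps as (offset, length, status) tuples.
EXTENTS = {
    0: [(0, 4096, 1), (4096, 4096, 0)],   # allocated / hole
    1: [(0, 8192, 1)],                    # everything dirty
}

def opt_get_snapshots():
    """Negotiation phase (NBD_OPT_GET_SNAPSHOTS): list snapshot IDs."""
    return sorted(SNAPSHOTS)

def cmd_block_status(snapshot_id, offset, length):
    """Transmission phase (NBD_CMD_BLOCK_STATUS): clip the chosen
    snapshot's extents to the queried range."""
    out = []
    for off, ln, st in EXTENTS[snapshot_id]:
        lo, hi = max(off, offset), min(off + ln, offset + length)
        if lo < hi:
            out.append((lo, hi - lo, st))
    return out
```

The point of the model is only that the snapshot list is fetched once during negotiation, and every later status query names a snapshot ID.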

Thoughts?

>     Full backup: just do a structured read - it will show us which chunks
>     may be treated as zeroes.

Right.

>     Incremental backup: get the dirty bitmap (somehow, for example through
>     the user-defined part of the proposed command), then, for dirty blocks, read
>     them through structured read, so information about zero/unallocated
>     areas is available there.
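Right - and that incremental flow is just a small loop. In this sketch all functions are hypothetical stand-ins for real NBD client calls; a structured read is modeled as yielding (offset, data-or-None) chunks, with None marking a hole/zero chunk.

```python
def incremental_backup(dirty_extents, structured_read, write_backup):
    """Copy only the dirty ranges; skip hole/zero chunks reported by
    the structured read, so unallocated areas are never transferred."""
    for offset, length in dirty_extents:
        for chunk_off, data in structured_read(offset, length):
            if data is not None:   # None models a hole/zero chunk
                write_backup(chunk_off, data)
```

So no separate allocated/zero query is needed for the backup itself: the structured read already carries that information.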
> 
> For me all the variants above are OK. Let's finally choose something.
> 
> v2:
> v1 was: 
> https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg05574.html
> 
> Since then, we've added the STRUCTURED_REPLY extension, which 
> necessitates a rather larger rebase; I've also changed things to 
> rename the command 'NBD_CMD_BLOCK_STATUS', changed the request modes 
> to be determined by boolean flags (rather than by fixed values of the 
> 16-bit flags field), changed the reply status fields to be bitwise-or 
> values (with a default of 0 always being sane), and changed the 
> descriptor layout to drop an offset but to include a 32-bit status so 
> that the descriptor is nicely 8-byte aligned without padding.
> 
>  doc/proto.md | 155 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 154 insertions(+), 1 deletion(-)

[...]
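As a side note, the descriptor layout described in the changelog (a length plus a 32-bit status, with the offset dropped) packs into exactly eight bytes. A quick sketch, assuming network byte order as NBD uses elsewhere (the helper names are mine, not from the patch):

```python
import struct

def pack_block_status_descriptor(length, status):
    # 32-bit big-endian length followed by a 32-bit status bitmask:
    # an 8-byte descriptor that needs no padding.
    return struct.pack(">II", length, status)

def unpack_block_status_descriptor(raw):
    # Inverse of the above; returns (length, status).
    return struct.unpack(">II", raw)
```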

I'll commit this in a minute into a separate branch called "extension-blockstatus", under the understanding that changes are still required, as per above (i.e., don't assume that just because there's a branch I'm happy with the current result ;-)

Regards

--
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
       people in the world who think they really understand all of its rules,
       and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12

John Snow Dec. 1, 2016, 11:42 p.m. UTC | #15
Hi Alex, let me try my hand at clarifying some points...

On 11/29/2016 07:57 AM, Alex Bligh wrote:
> Vladimir,
> 
> I went back to April to reread the previous train of conversation
> then found you had helpfully summarised some if it. Comments
> below.
> 
> Rather than comment on many of the points individually, the root
> of my confusion, and to some extent discomfort, about this
> proposal is 'who owns the meaning of the bitmaps'.
> 
> Some of this is my own confusion (sorry) about the use to which
> this is being put, which is I think at root a documentation issue.
> To illustrate this, you write in the FAQ section that this is for
> read-only disks, but the text talks about:
> 

I sent an earlier email in response to Wouter over our exact "goal" with
this spec extension that is a little more of an overview, but I will try
to address your questions specifically.

> +Some storage formats and operations over such formats express a
> +concept of data dirtiness. Whether the operation is block device
> +mirroring, incremental block device backup or any other operation with
> +a concept of data dirtiness, they all share a need to provide a list
> +of ranges that this particular operation treats as dirty.
> 
> How can data be 'dirty' if it is static and unchangeable? (I thought)
> 

In a simple case, live IO goes to e.g. hda.qcow2. These writes come from
the VM and cause the bitmap that QEMU manages to become dirty.

We intend to expose the ability to fleece dirty blocks via NBD. What
happens in this scenario would be that a snapshot of the data at the
time of the request is exported over NBD in a read-only manner.

In this way, the drive itself is R/W, but the "view" of it from NBD is
RO. While a hypothetical backup client is busy copying data out of this
temporary view, new writes are coming in to the drive, but are not being
exposed through the NBD export.

(This goes into QEMU-specifics, but those new writes are dirtying a
version of the bitmap not intended to be exposed via the NBD channel.
NBD gets effectively a snapshot of both the bitmap AND the data.)

> I now think what you are talking about backing up a *snapshot* of a disk
> that's running, where the disk itself was not connected using NBD? IE it's
> not being 'made dirty' by NBD_CMD_WRITE etc. Rather 'dirtiness' is effectively
> an opaque state represented in a bitmap, which is binary metadata
> at some particular level of granularity. It might as well be 'happiness'
> or 'is coloured blue'. The NBD server would (normally) have no way of
> manipulating this bitmap.
> 
> In previous comments, I said 'how come we can set the dirty bit through
> writes but can't clear it?'. This (my statement) is now I think wrong,
> as NBD_CMD_WRITE etc. is not defined to set the dirty bit. The
> state of the bitmap comes from whatever sets the bitmap which is outside
> the scope of this protocol to transmit it.
> 

You know, this is a fair point. We have not (to my knowledge) yet
carefully considered the exact bitmap management scenario when NBD is
involved in retrieving dirty blocks.

Humor me for a moment while I talk about a (completely hypothetical, not
yet fully discussed) workflow for how I envision this feature.

(1) User sets up a drive in QEMU, a bitmap is initialized, an initial
backup is made, etc.

(2) As writes come in, QEMU's bitmap is dirtied.

(3) The user decides they want to root around to see what data has
changed and would like to use NBD to do so, in contrast to QEMU's own
facilities for dumping dirty blocks.

(4) A command is issued that creates a temporary, lightweight snapshot
('fleecing') and exports this snapshot over NBD. The bitmap is
associated with the NBD export at this point at NBD server startup. (For
the sake of QEMU discussion, maybe this command is "blockdev-fleece")

(5) At this moment, the snapshot is static and represents the data at
the time the NBD server was started. The bitmap is also forked and
represents only this snapshot. The live data and bitmap continue to change.

(6) Dirty blocks are queried and copied out via NBD.

(7) The user closes the NBD instance upon completion of their task,
whatever it was. (Making a new incremental backup? Just taking a peek at
some changed data? who knows.)

The point that's interesting here is what do we do with the two bitmaps
at this point? The data delta can be discarded (this was after all just
a lightweight read-only point-in-time snapshot) but the bitmap data
needs to be dealt with.

(A) In the case of "User made a new incremental backup," the bitmap that
got forked off to serve the NBD read should be discarded.

(B) In the case of "User just wanted to look around," the bitmap should
be merged back into the bitmap it was forked from.

I don't advise a hybrid where "User copied some data, but not all" and
we need to partially clear *and* merge, but conceivably this could
happen, because the things we don't want to happen always will.

At this point maybe it's becoming obvious that it would actually be very
prudent to allow the NBD client itself to inform QEMU via the NBD
protocol which extents/blocks/etc. it is "done" with.

Maybe it *would* actually be useful if, in allowing us to add a
"dirty" bit to the NBD specification, we also allow users to clear those bits.

Then, whether the user was trying to do (A) or (B) or the unspeakable
amalgamation of both things, it's up to the user to clear the bits
desired, and QEMU can simply always merge the
bitmap fork upon the conclusion of the NBD fleecing exercise.

Maybe this would allow the dirty bit to have a bit more concrete meaning
for the NBD spec: "The bit stays dirty until the user clears it, and is
set when the matching block/extent/etc is written to."

With an exception that external management may cause the bits to clear.
(I.e., someone fiddles with the backing store in a way opaque to NBD,
e.g. someone clears the bitmap directly through QEMU instead of via NBD.)
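That fork/clear/merge lifecycle can be sketched with bitmaps modeled as sets of dirty block indices. This is a toy model of the workflow described above, not QEMU code, and any NBD-level "clear" command remains hypothetical:

```python
class DirtyBitmap:
    """Toy dirty bitmap: a set of dirty block indices."""

    def __init__(self, dirty=()):
        self.dirty = set(dirty)

    def mark(self, block):
        # A write dirties the live bitmap.
        self.dirty.add(block)

    def fork(self):
        # Freeze the current state for the NBD export (step 4/5 above);
        # the live bitmap starts tracking new writes from scratch.
        frozen = DirtyBitmap(self.dirty)
        self.dirty = set()
        return frozen

    def clear(self, blocks):
        # The client says it is "done" with these blocks.
        self.dirty -= set(blocks)

    def merge(self, other):
        # Fleecing ends: merge whatever remains of the fork back in.
        self.dirty |= other.dirty
```

With this model, cases (A) and (B) collapse into one rule: the client clears what it copied, and QEMU always merges the fork back at the end.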

> However, we have the uncomfortable (to me) situation where the protocol
> describes a flag 'dirty', with implications as to what it does, but
> no actual strict definition of how it's set. So any 'other' user has
> no real idea how to use the information, or how to implement a server
> that provides a 'dirty' bit, because the semantics of that aren't within
> the protocol. This does not sit happily with me.
> 
> So I'm wondering whether we should simplify and generalise this spec. You
> say that for the dirty flag, there's no specification of when it is
> set and cleared - that's implementation defined. Would it not be better
> then to say 'that whole thing is private to Qemu - even the name'.
> 
> Rather you could read the list of bitmaps a server has, with a textual
> name, each having an index (and perhaps a granularity). You could then
> ask on NBD_CMD_BLOCK_STATUS for the appropriate index, and get back that
> bitmap value. Some names (e.g. 'ALLOCATED') could be defined in the spec,
> and some (e.g. ones beginning with 'X-') could be reserved for user
> usage. So you could use 'X-QEMU-DIRTY'. If you then change what your
> dirty bit means, you could use 'X-QEMU-DIRTY2' or similar, and not
> need a protocol change to support it.
> 

If we can't work out a more precise, semantically meaningful spec
extension then I am very happy with simply the ability to have a user
bit, or X-QEMU bits, etc etc etc.

> IE rather than looking at 'a way of reading the dirty bit', we could
> have this as a generic way of reading opaque bitmaps. Only one (allocation)
> might be given meaning to start off with, and it wouldn't be necessary
> for all servers to support that - i.e. you could support bitmap reading
> without having an ALLOCATION bitmap available.
> 
> This spec would then be limited to the transmission of the bitmaps
> (remove the word 'dirty' entirely, except perhaps as an illustrative
> use case), and include only the definition of the allocation bitmap.
> 
> Some more nits:
> 
>> Also, bit of NBD_FLAG_SEND_BLOCK_STATUS is changed to 9, as 8 is now
>> NBD_FLAG_CAN_MULTI_CONN in master branch.
>>
>> And, finally, I've rebased this onto current state of
>> extension-structured-reply branch (which itself should be rebased on
>> master IMHO).
> 
> Each documentation branch should normally be branched off master unless
> it depends on another extension (in which case it will be branched from that).
> I haven't been rebasing them frequently as it can disrupt those working
> on the branches. There's only really an issue around rebasing where you
> depend on another branch.
> 
> 
>> 2. Q: different granularities of dirty/allocated bitmaps. Any problems?
>>   A: 1: server replies with status descriptors of any size, granularity
>>         is hidden from the client
>>      2: dirty/allocated requests are separate and unrelated to each
>>         other, so their granularities are not intersecting
> 
> I'm OK with this, but note that you do actually mention a granularity
> of sorts in the spec (512 bytes) - I think you should replace that
> with the minimum block size.
> 
>> 3. Q: selecting of dirty bitmap to export
>>   A: several variants:
>>      1: id of bitmap is in flags field of request
>>          pros: - simple
>>          cons: - it's a hack. flags field is for other uses.
>>                - we'll have to map bitmap names to these "ids"
>>      2: introduce extended nbd requests with variable length and exploit this
>>         feature for BLOCK_STATUS command, specifying bitmap identifier.
>>         pros: - looks like a true way
>>         cons: - we have to create additional extension
>>               - possible we have to create a map,
>>                 {<QEMU bitmap name> <=> <NBD bitmap id>}
>>      3: external tool should select which bitmap to export. So, in case of Qemu
>>         it should be something like qmp command block-export-dirty-bitmap.
>>         pros: - simple
>>               - we can extend it to behave like (2) later
>>         cons: - additional qmp command to implement (possibly, the lesser evil)
>>         note: Hmm, the external tool can make a choice between allocated/dirty data too,
>>               so, we can remove the 'NBD_FLAG_STATUS_DIRTY' flag entirely.
> 
> Yes, this is all pretty horrible. I suspect we want to do something like (2),
> and permit extra data across (in my proposal, it would only be one byte to select
> the index). I suppose one could ask for a list of bitmaps.
> 

Having missed most of the discussion on v1/v2, is it a given that we
want in-band identification of bitmaps?

I guess this might depend very heavily on the nature of the definition
of the "dirty bit" in the NBD spec.

>> 4. Q: Should not get_{allocated,dirty} be separate commands?
>>   cons: Two commands with almost the same semantics and similar means?
>>   pros: However here is a good point of separating clearly defined and native
>>         for block devices GET_BLOCK_STATUS from user-driven and actually
>>         undefined data, called 'dirtyness'.
> 
> I'm suggesting one generic 'read bitmap' command like you.
> 
>> 5. Number of status descriptors, sent by server, should be restricted
>>   variants:
>>   1: just allow server to restrict this as it wants (which was done in v3)
>>   2: (not excluding 1). Client specifies somehow the maximum for number
>>      of descriptors.
>>      2.1: add command flag, which will request only one descriptor
>>           (otherwise, no restrictions from the client)
>>      2.2: again, introduce extended nbd requests, and add field to
>>           specify this maximum
> 
> I think some form of extended request is the way to go, but out of
> interest, what's the issue with as many descriptors being sent as it
> takes to encode the reply? The client can just consume the remainder
> (without buffering) and reissue the request at a later point for
> the areas it discarded.
> 
>>
>> 6. A: What to do with unspecified flags (in request/reply)?
>>   I think the normal variant is to make them reserved. (Server should
>>   return EINVAL if it finds unknown bits; client should consider a reply
>>   with unknown bits as an error)
> 
> Yeah.
> 
>>
>> +
>> +* `NBD_CMD_BLOCK_STATUS`
>> +
>> +    A block status query request. Length and offset define the range
>> +    of interest. Clients SHOULD NOT use this request unless the server
> 
> MUST NOT is what we say elsewhere I believe.
> 
>> +    set `NBD_FLAG_SEND_BLOCK_STATUS` in the transmission flags, which
>> +    in turn requires the client to first negotiate structured replies.
>> +    For a successful return, the server MUST use a structured reply,
>> +    containing at most one chunk of type `NBD_REPLY_TYPE_BLOCK_STATUS`.
> 
> Nit: are you saying that non-structured error replies are permissible?
> You're always/often going to get a non-structured (simple) error reply
> if the server doesn't support the command, but I think it would be fair to say the
> server MUST use a structured reply to NBD_CMD_BLOCK_STATUS if
> it supports the command. This is effectively what we say re NBD_CMD_READ.
> 
>> +
>> +    The list of block status descriptors within the
>> +    `NBD_REPLY_TYPE_BLOCK_STATUS` chunk represent consecutive portions
>> +    of the file starting from specified *offset*, and the sum of the
>> +    *length* fields of each descriptor MUST not be greater than the
>> +    overall *length* of the request. This means that the server MAY
>> +    return less data than required. However the server MUST return at
>> +    least one status descriptor
> 
> I'm not sure I understand why that's useful. What should the client
> infer from the server refusing to provide information? We don't
> permit short reads etc.
> 
>> .  The server SHOULD use different
>> +    *status* values between consecutive descriptors, and SHOULD use
>> +    descriptor lengths that are an integer multiple of 512 bytes where
>> +    possible (the first and last descriptor of an unaligned query being
>> +    the most obvious places for an exception).
> 
> Surely better would be an integer multiple of the minimum block
> size. Being able to offer bitmap support at finer granularity than
> the absolute minimum block size helps no one, and if it were possible
> to support a 256 byte block size (I think some floppy disks had that)
> I see no reason not to support that as a granularity.
> 

Anyway, I hope I am being useful and not just more confounding. It seems
to me that we're having difficulty conveying precisely what it is we're
trying to accomplish, so I hope that I am making a good effort in
elaborating on our goals/requirements.

If not, well. You've got my address for hatemail :)

Thanks,
--John
Vladimir Sementsov-Ogievskiy Dec. 2, 2016, 9:16 a.m. UTC | #16
02.12.2016 02:42, John Snow wrote:
> (B) In the case of "User just wanted to look around," the bitmap should
> be merged back into the bitmap it was forked from.
A currently existing example: a failed incremental backup.
Alex Bligh Dec. 2, 2016, 6:45 p.m. UTC | #17
John,

>> +Some storage formats and operations over such formats express a
>> +concept of data dirtiness. Whether the operation is block device
>> +mirroring, incremental block device backup or any other operation with
>> +a concept of data dirtiness, they all share a need to provide a list
>> +of ranges that this particular operation treats as dirty.
>> 
>> How can data be 'dirty' if it is static and unchangeable? (I thought)
>> 
> 
> In a simple case, live IO goes to e.g. hda.qcow2. These writes come from
> the VM and cause the bitmap that QEMU manages to become dirty.
> 
> We intend to expose the ability to fleece dirty blocks via NBD. What
> happens in this scenario would be that a snapshot of the data at the
> time of the request is exported over NBD in a read-only manner.
> 
> In this way, the drive itself is R/W, but the "view" of it from NBD is
> RO. While a hypothetical backup client is busy copying data out of this
> temporary view, new writes are coming in to the drive, but are not being
> exposed through the NBD export.
> 
> (This goes into QEMU-specifics, but those new writes are dirtying a
> version of the bitmap not intended to be exposed via the NBD channel.
> NBD gets effectively a snapshot of both the bitmap AND the data.)

Thanks. That makes sense - or enough sense for me to carry on commenting!

>> I now think what you are talking about backing up a *snapshot* of a disk
>> that's running, where the disk itself was not connected using NBD? IE it's
>> not being 'made dirty' by NBD_CMD_WRITE etc. Rather 'dirtiness' is effectively
>> an opaque state represented in a bitmap, which is binary metadata
>> at some particular level of granularity. It might as well be 'happiness'
>> or 'is coloured blue'. The NBD server would (normally) have no way of
>> manipulating this bitmap.
>> 
>> In previous comments, I said 'how come we can set the dirty bit through
>> writes but can't clear it?'. This (my statement) is now I think wrong,
>> as NBD_CMD_WRITE etc. is not defined to set the dirty bit. The
>> state of the bitmap comes from whatever sets the bitmap which is outside
>> the scope of this protocol to transmit it.
>> 
> 
> You know, this is a fair point. We have not (to my knowledge) yet
> carefully considered the exact bitmap management scenario when NBD is
> involved in retrieving dirty blocks.
> 
> Humor me for a moment while I talk about a (completely hypothetical, not
> yet fully discussed) workflow for how I envision this feature.
> 
> (1) User sets up a drive in QEMU, a bitmap is initialized, an initial
> backup is made, etc.
> 
> (2) As writes come in, QEMU's bitmap is dirtied.
> 
> (3) The user decides they want to root around to see what data has
> changed and would like to use NBD to do so, in contrast to QEMU's own
> facilities for dumping dirty blocks.
> 
> (4) A command is issued that creates a temporary, lightweight snapshot
> ('fleecing') and exports this snapshot over NBD. The bitmap is
> associated with the NBD export at this point at NBD server startup. (For
> the sake of QEMU discussion, maybe this command is "blockdev-fleece")
> 
> (5) At this moment, the snapshot is static and represents the data at
> the time the NBD server was started. The bitmap is also forked and
> represents only this snapshot. The live data and bitmap continue to change.
> 
> (6) Dirty blocks are queried and copied out via NBD.
> 
> (7) The user closes the NBD instance upon completion of their task,
> whatever it was. (Making a new incremental backup? Just taking a peek at
> some changed data? who knows.)
> 
> The point that's interesting here is what do we do with the two bitmaps
> at this point? The data delta can be discarded (this was after all just
> a lightweight read-only point-in-time snapshot) but the bitmap data
> needs to be dealt with.
> 
> (A) In the case of "User made a new incremental backup," the bitmap that
> got forked off to serve the NBD read should be discarded.
> 
> (B) In the case of "User just wanted to look around," the bitmap should
> be merged back into the bitmap it was forked from.
> 
> I don't advise a hybrid where "User copied some data, but not all" and
> we need to partially clear *and* merge, but conceivably this could
> happen, because the things we don't want to happen always will.
> 
> At this point maybe it's becoming obvious that actually it would be very
> prudent to allow the NBD client itself to inform QEMU via the NBD
> protocol which extents/blocks/etc. it is "done" with.
> 
> Maybe it *would* actually be useful if, in allowing us to add a
> "dirty" bit to the NBD specification, we also allow users to clear those bits.
> 
> Then, whether the user was trying to do (A) or (B) or the unspeakable
> amalgamation of both things, it's up to the user to clear the bits
> desired, and QEMU can simply always merge the
> bitmap fork upon the conclusion of the NBD fleecing exercise.
> 
> Maybe this would allow the dirty bit to have a bit more concrete meaning
> for the NBD spec: "The bit stays dirty until the user clears it, and is
> set when the matching block/extent/etc is written to."
> 
> With an exception that external management may cause the bits to clear.
> (I.e., someone fiddles with the backing store in a way opaque to NBD,
> e.g. someone clears the bitmap directly through QEMU instead of via NBD.)

There is currently one possible "I've done with the entire bitmap"
signal, which is closing the connection. This has two obvious
problems. Firstly if used, it discards the entire bitmap (not bits).
Secondly, it makes recovery from a broken TCP session difficult
(as either you treat a dirty close as meaning the bitmap needs
to hang around, in which case you have a garbage collection issue,
or you treat it as needing to drop the bitmap, in which case you
can't recover).

I think in your plan the block status doesn't change once the bitmap
is forked. In that case, adding some command (optional) to change
the status of the bitmap (or simply to set a given extent to status X)
would be reasonable. Of course whether it's supported could be dependent
on the bitmap.
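
Such an optional command could reuse the ordinary NBD request header as-is, with the extent to modify carried in the offset/length fields. A minimal sketch in Python follows; the command name NBD_CMD_SET_BLOCK_STATUS and its number are invented here purely for illustration and are not part of the proposal (only NBD_CMD_BLOCK_STATUS is).

```python
import struct

NBD_REQUEST_MAGIC = 0x25609513            # standard NBD request magic
NBD_CMD_SET_BLOCK_STATUS = 42             # hypothetical command number

def build_set_status_request(handle, offset, length, flags=0):
    # Standard NBD transmission-phase request header:
    # 32-bit magic, 16-bit command flags, 16-bit type,
    # 64-bit handle, 64-bit offset, 32-bit length (big-endian).
    return struct.pack('>IHHQQI', NBD_REQUEST_MAGIC, flags,
                       NBD_CMD_SET_BLOCK_STATUS, handle, offset, length)

req = build_set_status_request(handle=7, offset=4096, length=512)
assert len(req) == 28                     # fixed-size request header
```

A real definition would also have to specify which status bits the command flags select and what the server's reply looks like.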

> Having missed most of the discussion on v1/v2, is it a given that we
> want in-band identification of bitmaps?
> 
> I guess this might depend very heavily on the nature of the definition
> of the "dirty bit" in the NBD spec.

I don't think it's a given. I think Wouter & I came up with it at
the same time as a way to abstract the bitmap/extent concept and
remove the need to specify a dirty bit at all (well, that's my excuse
anyway).

> Anyway, I hope I am being useful and just not more confounding. It seems
> to me that we're having difficulty conveying precisely what it is we're
> trying to accomplish, so I hope that I am making a good effort in
> elaborating on our goals/requirements.

Yes absolutely. I think part of the challenge is that you are quite
reasonably coming at it from the point of view of qemu's particular
need, and I'm coming at it from 'what should the nbd protocol look
like in general' position, having done lots of work on the protocol
docs (though I'm an occasional qemu contributor). So there's necessarily
a gap of approach to be bridged.

I'm overdue on a review of Wouter's latest patch (partly because I need
to re-diff it against the version with no NBD_CMD_BLOCK_STATUS in),
but I think it's a bridge worth building.
John Snow Dec. 2, 2016, 8:39 p.m. UTC | #18
On 12/02/2016 01:45 PM, Alex Bligh wrote:
> John,
> 
>>> +Some storage formats and operations over such formats express a
>>> +concept of data dirtiness. Whether the operation is block device
>>> +mirroring, incremental block device backup or any other operation with
>>> +a concept of data dirtiness, they all share a need to provide a list
>>> +of ranges that this particular operation treats as dirty.
>>>
>>> How can data be 'dirty' if it is static and unchangeable? (I thought)
>>>
>>
>> In a simple case, live IO goes to e.g. hda.qcow2. These writes come from
>> the VM and cause the bitmap that QEMU manages to become dirty.
>>
>> We intend to expose the ability to fleece dirty blocks via NBD. What
>> happens in this scenario would be that a snapshot of the data at the
>> time of the request is exported over NBD in a read-only manner.
>>
>> In this way, the drive itself is R/W, but the "view" of it from NBD is
>> RO. While a hypothetical backup client is busy copying data out of this
>> temporary view, new writes are coming in to the drive, but are not being
>> exposed through the NBD export.
>>
>> (This goes into QEMU-specifics, but those new writes are dirtying a
>> version of the bitmap not intended to be exposed via the NBD channel.
>> NBD gets effectively a snapshot of both the bitmap AND the data.)
> 
> Thanks. That makes sense - or enough sense for me to carry on commenting!
> 

Whew! I'm glad.

>>> I now think what you are talking about backing up a *snapshot* of a disk
>>> that's running, where the disk itself was not connected using NBD? IE it's
>>> not being 'made dirty' by NBD_CMD_WRITE etc. Rather 'dirtiness' is effectively
>>> an opaque state represented in a bitmap, which is binary metadata
>>> at some particular level of granularity. It might as well be 'happiness'
>>> or 'is coloured blue'. The NBD server would (normally) have no way of
>>> manipulating this bitmap.
>>>
>>> In previous comments, I said 'how come we can set the dirty bit through
>>> writes but can't clear it?'. This (my statement) is now I think wrong,
>>> as NBD_CMD_WRITE etc. is not defined to set the dirty bit. The
>>> state of the bitmap comes from whatever sets the bitmap which is outside
>>> the scope of this protocol to transmit it.
>>>
>>
>> You know, this is a fair point. We have not (to my knowledge) yet
>> carefully considered the exact bitmap management scenario when NBD is
>> involved in retrieving dirty blocks.
>>
>> Humor me for a moment while I talk about a (completely hypothetical, not
>> yet fully discussed) workflow for how I envision this feature.
>>
>> (1) User sets up a drive in QEMU, a bitmap is initialized, an initial
>> backup is made, etc.
>>
>> (2) As writes come in, QEMU's bitmap is dirtied.
>>
>> (3) The user decides they want to root around to see what data has
>> changed and would like to use NBD to do so, in contrast to QEMU's own
>> facilities for dumping dirty blocks.
>>
>> (4) A command is issued that creates a temporary, lightweight snapshot
>> ('fleecing') and exports this snapshot over NBD. The bitmap is
>> associated with the NBD export at this point at NBD server startup. (For
>> the sake of QEMU discussion, maybe this command is "blockdev-fleece")
>>
>> (5) At this moment, the snapshot is static and represents the data at
>> the time the NBD server was started. The bitmap is also forked and
>> represents only this snapshot. The live data and bitmap continue to change.
>>
>> (6) Dirty blocks are queried and copied out via NBD.
>>
>> (7) The user closes the NBD instance upon completion of their task,
>> whatever it was. (Making a new incremental backup? Just taking a peek at
>> some changed data? who knows.)
>>
>> The point that's interesting here is what do we do with the two bitmaps
>> at this point? The data delta can be discarded (this was after all just
>> a lightweight read-only point-in-time snapshot) but the bitmap data
>> needs to be dealt with.
>>
>> (A) In the case of "User made a new incremental backup," the bitmap that
>> got forked off to serve the NBD read should be discarded.
>>
>> (B) In the case of "User just wanted to look around," the bitmap should
>> be merged back into the bitmap it was forked from.
>>
>> I don't advise a hybrid where "User copied some data, but not all" where
>> we need to partially clear *and* merge, but conceivably this could
>> happen, because the things we don't want to happen always will.
>>
>> At this point maybe it's becoming obvious that actually it would be very
>> prudent to allow the NBD client itself to inform QEMU via the NBD
>> protocol which extents/blocks/(etc) that it is "done" with.
>>
>> Maybe it *would* actually be useful if, in NBD allowing us to add a
>> "dirty" bit to the specification, we allow users to clear those bits.
>>
>> Then, whether the user was trying to do (A) or (B) or the unspeakable
>> amalgamation of both things, it's up to the user to clear the bits
>> desired and QEMU can do the simple task of simply always merging the
>> bitmap fork upon the conclusion of the NBD fleecing exercise.
>>
>> Maybe this would allow the dirty bit to have a bit more concrete meaning
>> for the NBD spec: "The bit stays dirty until the user clears it, and is
>> set when the matching block/extent/etc is written to."
>>
>> With an exception that external management may cause the bits to clear.
>> (I.e., someone fiddles with the backing store in a way opaque to NBD,
>> e.g. someone clears the bitmap directly through QEMU instead of via NBD.)
> 
> There is currently one possible "I've done with the entire bitmap"
> signal, which is closing the connection. This has two obvious
> problems. Firstly if used, it discards the entire bitmap (not bits).
> Secondly, it makes recovery from a broken TCP session difficult
> (as either you treat a dirty close as meaning the bitmap needs
> to hang around, in which case you have a garbage collection issue,
> or you treat it as needing to drop the bitmap, in which case you
> can't recover).
> 

In my mind, I wasn't treating closing the connection as the end of the
point-in-time snapshot; that would be stopping the export.

I wouldn't advocate for a control channel (QEMU, here) clearing the
bitmap just because a client disappeared.

Either:

(A) QEMU clears the bitmap because the NBD export was *stopped*, or
(B) QEMU, acting as the NBD server, clears the bitmap as instructed by
the NBD client, if we admit a provision to clear bits from the NBD
protocol itself.

I don't think there's room for the NBD server (QEMU) deciding to clear
bits based on connection status. It has to be an explicit decision --
either via NBD or QMP.

> I think in your plan the block status doesn't change once the bitmap
> is forked. In that case, adding some command (optional) to change
> the status of the bitmap (or simply to set a given extent to status X)
> would be reasonable. Of course whether it's supported could be dependent
> on the bitmap.
> 

What I describe as "forking" was kind of a bad description. What really
happens when we have a divergence is that the bitmap with data is split
into two bitmaps that are related:

- A new bitmap that is created takes over for the old bitmap. This new
bitmap is empty. It records writes on the live version of the data.
- The old bitmap as it existed remains in a read-only state, and
describes some point-in-time snapshot view of the data.

In the case of an incremental backup, once we've made a backup of the
data, that read-only bitmap can actually be discarded without further
thought.

In the case of a failed incremental backup, or in the case of "I just
wanted to look and see what has changed, but wasn't prepared to reset
the counter yet," this bitmap gets merged back with the live bitmap as
if nothing ever happened.
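
The split-and-merge lifecycle above can be modelled with a toy sketch, using a Python set of dirty block indices as a stand-in for QEMU's real bitmap code (all names here are hypothetical):

```python
class DirtyBitmap:
    """Toy model of the lifecycle above: a set of dirty block indices
    standing in for QEMU's real bitmap code (names are hypothetical)."""

    def __init__(self, dirty=()):
        self.dirty = set(dirty)

    def fork(self):
        """At snapshot time: freeze the current contents as the
        read-only point-in-time record, and start the live bitmap
        over again, empty, to record new writes."""
        frozen = DirtyBitmap(self.dirty)
        self.dirty.clear()
        return frozen

    def merge(self, frozen):
        """Case (B): fold the frozen bitmap back into the live one,
        as if nothing ever happened. Case (A) is simply: discard it."""
        self.dirty |= frozen.dirty


live = DirtyBitmap({1, 5, 9})
snap = live.fork()           # frozen view: {1, 5, 9}; live is now empty
live.dirty.add(7)            # a new write dirties the live bitmap
live.merge(snap)             # case (B): live.dirty == {1, 5, 7, 9}
```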

ANYWAY, allowing the NBD client to request bits be cleared has an
obvious use case even for QEMU, IMO -- which is, the NBD client itself
gains the ability to, without relying on a control plane to the server,
decide for itself if it is going to "make a backup" or "just look around."

The client gains the ability to leave the bitmap alone (QEMU will
re-merge it later once the snapshot is closed) or the ability to clear
it ("I made my backup, we're done with this.")

That usefulness would allow us to have an explicit dirty bit mechanism
directly in NBD, IMO, because:

(1) A RW NBD server has enough information to mark a bit dirty
(2) Since there exists an in-spec mechanism to reset the bitmap, the
dirty bit is meaningful to the server and the client
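
To make points (1) and (2) concrete, here is a hedged sketch of the client loop this would enable, against a purely in-memory stand-in for the export. block_status() mimics the spirit of the proposed NBD_CMD_BLOCK_STATUS extent reply; clear_status() is the hypothetical reset command discussed in this thread, not anything in the current proposal:

```python
class FakeExport:
    """In-memory stand-in for an NBD export with a dirty bitmap.
    Everything here is hypothetical scaffolding for the sketch."""

    def __init__(self, data, dirty_ranges):
        self.data = bytearray(data)
        self.dirty = set()                       # dirty byte indices
        for off, ln in dirty_ranges:
            self.dirty.update(range(off, off + ln))

    def block_status(self, offset, length):
        """Return (offset, length, is_dirty) extents covering the range,
        in the spirit of the proposed NBD_CMD_BLOCK_STATUS reply."""
        limit = min(offset + length, len(self.data))
        extents, start = [], offset
        for pos in range(offset, limit + 1):
            boundary = (pos == limit or
                        (pos in self.dirty) != (start in self.dirty))
            if boundary and pos > start:
                extents.append((start, pos - start, start in self.dirty))
                start = pos
        return extents

    def read(self, offset, length):
        return bytes(self.data[offset:offset + length])

    def clear_status(self, offset, length):
        # Point (2): the in-spec reset mechanism the client invokes.
        self.dirty.difference_update(range(offset, offset + length))


def backup_dirty(export, size, chunk=8):
    """Copy out only the dirty extents, clearing each bit once copied
    ("I made my backup, we're done with this")."""
    copied, offset = {}, 0
    while offset < size:                 # size must not exceed the data
        for off, ln, dirty in export.block_status(offset, chunk):
            if dirty:
                copied[off] = export.read(off, ln)
                export.clear_status(off, ln)
            offset = off + ln
    return copied
```

A client that wanted to "just look around" instead would run the same loop without the clear_status() call, leaving the server free to merge the bitmap back afterwards.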

>> Having missed most of the discussion on v1/v2, is it a given that we
>> want in-band identification of bitmaps?
>>
>> I guess this might depend very heavily on the nature of the definition
>> of the "dirty bit" in the NBD spec.
> 
> I don't think it's a given. I think Wouter & I came up with it at
> the same time as a way to abstract the bitmap/extent concept and
> remove the need to specify a dirty bit at all (well, that's my excuse
> anyway).
> 

OK. We do certainly support multiple bitmaps being active at a time in
QEMU, but I had personally always envisioned that you'd associate them
one-at-a-time when starting the NBD export of a particular device.

I don't have a use case in my head where two distinct bitmaps being
exposed simultaneously offer any particular benefit, but maybe there is
something. I'm sure there is.

I will leave this aspect of it more to you NBD folks. I think QEMU could
cope with either.

(Vladimir, am I wrong? Do you have thoughts on this in particular? I
haven't thought through this aspect of it very much.)

>> Anyway, I hope I am being useful and just not more confounding. It seems
>> to me that we're having difficulty conveying precisely what it is we're
>> trying to accomplish, so I hope that I am making a good effort in
>> elaborating on our goals/requirements.
> 
> Yes absolutely. I think part of the challenge is that you are quite
> reasonably coming at it from the point of view of qemu's particular
> need, and I'm coming at it from 'what should the nbd protocol look
> like in general' position, having done lots of work on the protocol
> docs (though I'm an occasional qemu contributor). So there's necessarily
> a gap of approach to be bridged.
> 

Yeah, I understand quite well that we need to make sure the NBD spec is
sane and useful in a QEMU-agnostic way, so my goal here is just to help
elucidate our needs to enable you to reach a good consensus.

> I'm overdue on a review of Wouter's latest patch (partly because I need
> to re-diff it against the version with no NBD_CMD_BLOCK_STATUS in),
> but I think it's a bridge worth building.
> 

Same. Thank you for your patience!

Cheers,
--js
Alex Bligh Dec. 3, 2016, 11:08 a.m. UTC | #19
> On 2 Dec 2016, at 20:39, John Snow <jsnow@redhat.com> wrote:
> 
> OK. We do certainly support multiple bitmaps being active at a time in
> QEMU, but I had personally always envisioned that you'd associate them
> one-at-a-time when starting the NBD export of a particular device.
> 
> I don't have a use case in my head where two distinct bitmaps being
> exposed simultaneously offer any particular benefit, but maybe there is
> something. I'm sure there is.

The obvious case is an allocation bitmap and a dirty bitmap.

It's possible one might want more than one dirty bitmap at once.
Perhaps two sorts of backup, or perhaps live migration of storage
backed by NBD, or perhaps inspecting the *state* of live migration
via NBD and a bitmap, or perhaps determining which extents in
a QCOW image are present in that image itself (as opposed to the
image on which it is based).

I tried to pick some QEMU-like ones, but I am sure there are
examples that would work outside of QEMU.
Vladimir Sementsov-Ogievskiy Dec. 5, 2016, 8:36 a.m. UTC | #20
02.12.2016 23:39, John Snow wrote:
>
> On 12/02/2016 01:45 PM, Alex Bligh wrote:
>> John,
>>
>>>> +Some storage formats and operations over such formats express a
>>>> +concept of data dirtiness. Whether the operation is block device
>>>> +mirroring, incremental block device backup or any other operation with
>>>> +a concept of data dirtiness, they all share a need to provide a list
>>>> +of ranges that this particular operation treats as dirty.
>>>>
>>>> How can data be 'dirty' if it is static and unchangeable? (I thought)
>>>>
>>> In a simple case, live IO goes to e.g. hda.qcow2. These writes come from
>>> the VM and cause the bitmap that QEMU manages to become dirty.
>>>
>>> We intend to expose the ability to fleece dirty blocks via NBD. What
>>> happens in this scenario would be that a snapshot of the data at the
>>> time of the request is exported over NBD in a read-only manner.
>>>
>>> In this way, the drive itself is R/W, but the "view" of it from NBD is
>>> RO. While a hypothetical backup client is busy copying data out of this
>>> temporary view, new writes are coming in to the drive, but are not being
>>> exposed through the NBD export.
>>>
>>> (This goes into QEMU-specifics, but those new writes are dirtying a
>>> version of the bitmap not intended to be exposed via the NBD channel.
>>> NBD gets effectively a snapshot of both the bitmap AND the data.)
>> Thanks. That makes sense - or enough sense for me to carry on commenting!
>>
> Whew! I'm glad.
>
>>>> I now think what you are talking about backing up a *snapshot* of a disk
>>>> that's running, where the disk itself was not connected using NBD? IE it's
>>>> not being 'made dirty' by NBD_CMD_WRITE etc. Rather 'dirtiness' is effectively
>>>> an opaque state represented in a bitmap, which is binary metadata
>>>> at some particular level of granularity. It might as well be 'happiness'
>>>> or 'is coloured blue'. The NBD server would (normally) have no way of
>>>> manipulating this bitmap.
>>>>
>>>> In previous comments, I said 'how come we can set the dirty bit through
>>>> writes but can't clear it?'. This (my statement) is now I think wrong,
>>>> as NBD_CMD_WRITE etc. is not defined to set the dirty bit. The
>>>> state of the bitmap comes from whatever sets the bitmap which is outside
>>>> the scope of this protocol to transmit it.
>>>>
>>> You know, this is a fair point. We have not (to my knowledge) yet
>>> carefully considered the exact bitmap management scenario when NBD is
>>> involved in retrieving dirty blocks.
>>>
>>> Humor me for a moment while I talk about a (completely hypothetical, not
>>> yet fully discussed) workflow for how I envision this feature.
>>>
>>> (1) User sets up a drive in QEMU, a bitmap is initialized, an initial
>>> backup is made, etc.
>>>
>>> (2) As writes come in, QEMU's bitmap is dirtied.
>>>
>>> (3) The user decides they want to root around to see what data has
>>> changed and would like to use NBD to do so, in contrast to QEMU's own
>>> facilities for dumping dirty blocks.
>>>
>>> (4) A command is issued that creates a temporary, lightweight snapshot
>>> ('fleecing') and exports this snapshot over NBD. The bitmap is
>>> associated with the NBD export at this point at NBD server startup. (For
>>> the sake of QEMU discussion, maybe this command is "blockdev-fleece")
>>>
>>> (5) At this moment, the snapshot is static and represents the data at
>>> the time the NBD server was started. The bitmap is also forked and
>>> represents only this snapshot. The live data and bitmap continue to change.
>>>
>>> (6) Dirty blocks are queried and copied out via NBD.
>>>
>>> (7) The user closes the NBD instance upon completion of their task,
>>> whatever it was. (Making a new incremental backup? Just taking a peek at
>>> some changed data? who knows.)
>>>
>>> The point that's interesting here is what do we do with the two bitmaps
>>> at this point? The data delta can be discarded (this was after all just
>>> a lightweight read-only point-in-time snapshot) but the bitmap data
>>> needs to be dealt with.
>>>
>>> (A) In the case of "User made a new incremental backup," the bitmap that
>>> got forked off to serve the NBD read should be discarded.
>>>
>>> (B) In the case of "User just wanted to look around," the bitmap should
>>> be merged back into the bitmap it was forked from.
>>>
>>> I don't advise a hybrid where "User copied some data, but not all" where
>>> we need to partially clear *and* merge, but conceivably this could
>>> happen, because the things we don't want to happen always will.
>>>
>>> At this point maybe it's becoming obvious that actually it would be very
>>> prudent to allow the NBD client itself to inform QEMU via the NBD
>>> protocol which extents/blocks/(etc) that it is "done" with.
>>>
>>> Maybe it *would* actually be useful if, in NBD allowing us to add a
>>> "dirty" bit to the specification, we allow users to clear those bits.
>>>
>>> Then, whether the user was trying to do (A) or (B) or the unspeakable
>>> amalgamation of both things, it's up to the user to clear the bits
>>> desired and QEMU can do the simple task of simply always merging the
>>> bitmap fork upon the conclusion of the NBD fleecing exercise.
>>>
>>> Maybe this would allow the dirty bit to have a bit more concrete meaning
>>> for the NBD spec: "The bit stays dirty until the user clears it, and is
>>> set when the matching block/extent/etc is written to."
>>>
>>> With an exception that external management may cause the bits to clear.
>>> (I.e., someone fiddles with the backing store in a way opaque to NBD,
>>> e.g. someone clears the bitmap directly through QEMU instead of via NBD.)
>> There is currently one possible "I've done with the entire bitmap"
>> signal, which is closing the connection. This has two obvious
>> problems. Firstly if used, it discards the entire bitmap (not bits).
>> Secondly, it makes recovery from a broken TCP session difficult
>> (as either you treat a dirty close as meaning the bitmap needs
>> to hang around, in which case you have a garbage collection issue,
>> or you treat it as needing to drop the bitmap, in which case you
>> can't recover).
>>
> In my mind, I wasn't treating closing the connection as the end of the
> point-in-time snapshot; that would be stopping the export.
>
> I wouldn't advocate for a control channel (QEMU, here) clearing the
> bitmap just because a client disappeared.
>
> Either:
>
> (A) QEMU clears the bitmap because the NBD export was *stopped*, or
> (B) QEMU, acting as the NBD server, clears the bitmap as instructed by
> the NBD client, if we admit a provision to clear bits from the NBD
> protocol itself.
>
> I don't think there's room for the NBD server (QEMU) deciding to clear
> bits based on connection status. It has to be an explicit decision --
> either via NBD or QMP.
>
>> I think in your plan the block status doesn't change once the bitmap
>> is forked. In that case, adding some command (optional) to change
>> the status of the bitmap (or simply to set a given extent to status X)
>> would be reasonable. Of course whether it's supported could be dependent
>> on the bitmap.
>>
> What I describe as "forking" was kind of a bad description. What really
> happens when we have a divergence is that the bitmap with data is split
> into two bitmaps that are related:
>
> - A new bitmap that is created takes over for the old bitmap. This new
> bitmap is empty. It records writes on the live version of the data.
> - The old bitmap as it existed remains in a read-only state, and
> describes some point-in-time snapshot view of the data.
>
> In the case of an incremental backup, once we've made a backup of the
> data, that read-only bitmap can actually be discarded without further
> thought.
>
> In the case of a failed incremental backup, or in the case of "I just
> wanted to look and see what has changed, but wasn't prepared to reset
> the counter yet," this bitmap gets merged back with the live bitmap as
> if nothing ever happened.
>
> ANYWAY, allowing the NBD client to request bits be cleared has an
> obvious use case even for QEMU, IMO -- which is, the NBD client itself
> gains the ability to, without relying on a control plane to the server,
> decide for itself if it is going to "make a backup" or "just look around."
>
> The client gains the ability to leave the bitmap alone (QEMU will
> re-merge it later once the snapshot is closed) or the ability to clear
> it ("I made my backup, we're done with this.")
>
> That usefulness would allow us to have an explicit dirty bit mechanism
> directly in NBD, IMO, because:
>
> (1) A RW NBD server has enough information to mark a bit dirty
> (2) Since there exists an in-spec mechanism to reset the bitmap, the
> dirty bit is meaningful to the server and the client
>
>>> Having missed most of the discussion on v1/v2, is it a given that we
>>> want in-band identification of bitmaps?
>>>
>>> I guess this might depend very heavily on the nature of the definition
>>> of the "dirty bit" in the NBD spec.
>> I don't think it's a given. I think Wouter & I came up with it at
>> the same time as a way to abstract the bitmap/extent concept and
>> remove the need to specify a dirty bit at all (well, that's my excuse
>> anyway).
>>
> OK. We do certainly support multiple bitmaps being active at a time in
> QEMU, but I had personally always envisioned that you'd associate them
> one-at-a-time when starting the NBD export of a particular device.
>
> I don't have a use case in my head where two distinct bitmaps being
> exposed simultaneously offer any particular benefit, but maybe there is
> something. I'm sure there is.
>
> I will leave this aspect of it more to you NBD folks. I think QEMU could
> cope with either.
>
> (Vladimir, am I wrong? Do you have thoughts on this in particular? I
> haven't thought through this aspect of it very much.)

I'm ok with either too.

Yes, with online external backup (fleecing), the bitmap is already
selected in QEMU. And that is the most interesting case, anyway.

For offline external backup, when we have a read-only disk, it may have
several bitmaps, for example for different backup frequencies, and it
would not be bad for the client to have the ability to choose. But
again, we can select the exported bitmap through QMP (even the same
fleecing scheme would work, with the overhead of creating an empty
delta).

>
>>> Anyway, I hope I am being useful and just not more confounding. It seems
>>> to me that we're having difficulty conveying precisely what it is we're
>>> trying to accomplish, so I hope that I am making a good effort in
>>> elaborating on our goals/requirements.
>> Yes absolutely. I think part of the challenge is that you are quite
>> reasonably coming at it from the point of view of qemu's particular
>> need, and I'm coming at it from 'what should the nbd protocol look
>> like in general' position, having done lots of work on the protocol
>> docs (though I'm an occasional qemu contributor). So there's necessarily
>> a gap of approach to be bridged.
>>
> Yeah, I understand quite well that we need to make sure the NBD spec is
> sane and useful in a QEMU-agnostic way, so my goal here is just to help
> elucidate our needs to enable you to reach a good consensus.
>
>> I'm overdue on a review of Wouter's latest patch (partly because I need
>> to re-diff it against the version with no NBD_CMD_BLOCK_STATUS in),
>> but I think it's a bridge worth building.
>>
> Same. Thank you for your patience!
>
> Cheers,
> --js
Wouter Verhelst Dec. 6, 2016, 1:32 p.m. UTC | #21
Hi John

Sorry for the late reply; the weekend was busy, and so was Monday.

On Fri, Dec 02, 2016 at 03:39:08PM -0500, John Snow wrote:
> On 12/02/2016 01:45 PM, Alex Bligh wrote:
> > John,
> > 
> >>> +Some storage formats and operations over such formats express a
> >>> +concept of data dirtiness. Whether the operation is block device
> >>> +mirroring, incremental block device backup or any other operation with
> >>> +a concept of data dirtiness, they all share a need to provide a list
> >>> +of ranges that this particular operation treats as dirty.
> >>>
> >>> How can data be 'dirty' if it is static and unchangeable? (I thought)
> >>>
> >>
> >> In a simple case, live IO goes to e.g. hda.qcow2. These writes come from
> >> the VM and cause the bitmap that QEMU manages to become dirty.
> >>
> >> We intend to expose the ability to fleece dirty blocks via NBD. What
> >> happens in this scenario would be that a snapshot of the data at the
> >> time of the request is exported over NBD in a read-only manner.
> >>
> >> In this way, the drive itself is R/W, but the "view" of it from NBD is
> >> RO. While a hypothetical backup client is busy copying data out of this
> >> temporary view, new writes are coming in to the drive, but are not being
> >> exposed through the NBD export.
> >>
> >> (This goes into QEMU-specifics, but those new writes are dirtying a
> >> version of the bitmap not intended to be exposed via the NBD channel.
> >> NBD gets effectively a snapshot of both the bitmap AND the data.)
> > 
> > Thanks. That makes sense - or enough sense for me to carry on commenting!
> > 
> 
> Whew! I'm glad.
> 
> >>> I now think what you are talking about backing up a *snapshot* of a disk
> >>> that's running, where the disk itself was not connected using NBD? IE it's
> >>> not being 'made dirty' by NBD_CMD_WRITE etc. Rather 'dirtiness' is effectively
> >>> an opaque state represented in a bitmap, which is binary metadata
> >>> at some particular level of granularity. It might as well be 'happiness'
> >>> or 'is coloured blue'. The NBD server would (normally) have no way of
> >>> manipulating this bitmap.
> >>>
> >>> In previous comments, I said 'how come we can set the dirty bit through
> >>> writes but can't clear it?'. This (my statement) is now I think wrong,
> >>> as NBD_CMD_WRITE etc. is not defined to set the dirty bit. The
> >>> state of the bitmap comes from whatever sets the bitmap which is outside
> >>> the scope of this protocol to transmit it.
> >>>
> >>
> >> You know, this is a fair point. We have not (to my knowledge) yet
> >> carefully considered the exact bitmap management scenario when NBD is
> >> involved in retrieving dirty blocks.
> >>
> >> Humor me for a moment while I talk about a (completely hypothetical, not
> >> yet fully discussed) workflow for how I envision this feature.
> >>
> >> (1) User sets up a drive in QEMU, a bitmap is initialized, an initial
> >> backup is made, etc.
> >>
> >> (2) As writes come in, QEMU's bitmap is dirtied.
> >>
> >> (3) The user decides they want to root around to see what data has
> >> changed and would like to use NBD to do so, in contrast to QEMU's own
> >> facilities for dumping dirty blocks.
> >>
> >> (4) A command is issued that creates a temporary, lightweight snapshot
> >> ('fleecing') and exports this snapshot over NBD. The bitmap is
> >> associated with the NBD export at this point at NBD server startup. (For
> >> the sake of QEMU discussion, maybe this command is "blockdev-fleece")
> >>
> >> (5) At this moment, the snapshot is static and represents the data at
> >> the time the NBD server was started. The bitmap is also forked and
> >> represents only this snapshot. The live data and bitmap continue to change.
> >>
> >> (6) Dirty blocks are queried and copied out via NBD.
> >>
> >> (7) The user closes the NBD instance upon completion of their task,
> >> whatever it was. (Making a new incremental backup? Just taking a peek at
> >> some changed data? who knows.)
> >>
> >> The point that's interesting here is what do we do with the two bitmaps
> >> at this point? The data delta can be discarded (this was after all just
> >> a lightweight read-only point-in-time snapshot) but the bitmap data
> >> needs to be dealt with.
> >>
> >> (A) In the case of "User made a new incremental backup," the bitmap that
> >> got forked off to serve the NBD read should be discarded.
> >>
> >> (B) In the case of "User just wanted to look around," the bitmap should
> >> be merged back into the bitmap it was forked from.
> >>
> >> I don't advise a hybrid where "User copied some data, but not all" where
> >> we need to partially clear *and* merge, but conceivably this could
> >> happen, because the things we don't want to happen always will.
> >>
> >> At this point maybe it's becoming obvious that actually it would be very
> >> prudent to allow the NBD client itself to inform QEMU via the NBD
> >> protocol which extents/blocks/(etc) that it is "done" with.
> >>
> >> Maybe it *would* actually be useful if, in NBD allowing us to add a
> >> "dirty" bit to the specification, we allow users to clear those bits.
> >>
> >> Then, whether the user was trying to do (A) or (B) or the unspeakable
> >> amalgamation of both things, it's up to the user to clear the bits
> >> desired and QEMU can do the simple task of simply always merging the
> >> bitmap fork upon the conclusion of the NBD fleecing exercise.
> >>
> >> Maybe this would allow the dirty bit to have a bit more concrete meaning
> >> for the NBD spec: "The bit stays dirty until the user clears it, and is
> >> set when the matching block/extent/etc is written to."
> >>
> >> With an exception that external management may cause the bits to clear.
> >> (I.e., someone fiddles with the backing store in a way opaque to NBD,
> >> e.g. someone clears the bitmap directly through QEMU instead of via NBD.)
> > 
> > There is currently one possible "I've done with the entire bitmap"
> > signal, which is closing the connection. This has two obvious
> > problems. Firstly if used, it discards the entire bitmap (not bits).
> > Secondly, it makes recovery from a broken TCP session difficult
> > (as either you treat a dirty close as meaning the bitmap needs
> > to hang around, in which case you have a garbage collection issue,
> > or you treat it as needing to drop the bitmap, in which case you
> > can't recover).
> > 
> 
> In my mind, I wasn't treating closing the connection as the end of the
> point-in-time snapshot; that would be stopping the export.
> 
> I wouldn't advocate for a control channel (QEMU, here) clearing the
> bitmap just because a client disappeared.
> 
> Either:
> 
> (A) QEMU clears the bitmap because the NBD export was *stopped*, or
> (B) QEMU, acting as the NBD server, clears the bitmap as instructed by
> the NBD client, if we admit a provision to clear bits from the NBD
> protocol itself.
> 
> I don't think there's room for the NBD server (QEMU) deciding to clear
> bits based on connection status. It has to be an explicit decision --
> either via NBD or QMP.
> 
> > I think in your plan the block status doesn't change once the bitmap
> > is forked. In that case, adding some command (optional) to change
> > the status of the bitmap (or simply to set a given extent to status X)
> > would be reasonable. Of course whether it's supported could be dependent
> > on the bitmap.
> > 
> 
> What I describe as "forking" was kind of a bad description. What really
> happens when we have a divergence is that the bitmap with data is split
> into two bitmaps that are related:
> 
> - A new bitmap that is created takes over for the old bitmap. This new
> bitmap is empty. It records writes on the live version of the data.
> - The old bitmap as it existed remains in a read-only state, and
> describes some point-in-time snapshot view of the data.
> 
> In the case of an incremental backup, once we've made a backup of the
> data, that read-only bitmap can actually be discarded without further
> thought.
> 
> In the case of a failed incremental backup, or in the case of "I just
> wanted to look and see what has changed, but wasn't prepared to reset
> the counter yet," this bitmap gets merged back with the live bitmap as
> if nothing ever happened.
> 
> ANYWAY, allowing the NBD client to request bits be cleared has an
> obvious use case even for QEMU, IMO -- which is, the NBD client itself
> gains the ability to, without relying on a control plane to the server,
> decide for itself if it is going to "make a backup" or "just look around."
> 
> The client gains the ability to leave the bitmap alone (QEMU will
> re-merge it later once the snapshot is closed) or the ability to clear
> it ("I made my backup, we're done with this.")
> 
> That usefulness would allow us to have an explicit dirty bit mechanism
> directly in NBD, IMO, because:
> 
> (1) A RW NBD server has enough information to mark a bit dirty
> (2) Since there exists an in-spec mechanism to reset the bitmap, the
> dirty bit is meaningful to the server and the client

While I can see that the ability to manipulate metadata might have
advantages for certain use cases, I don't think that the ability to
*inspect* metadata should require the ability to manipulate it in any
way.

So I'd like to finish the block_status extension before moving on to
manipulation :)

> >> Having missed most of the discussion on v1/v2, is it a given that we
> >> want in-band identification of bitmaps?
> >>
> >> I guess this might depend very heavily on the nature of the definition
> >> of the "dirty bit" in the NBD spec.
> > 
> > I don't think it's a given. I think Wouter & I came up with it at
> > the same time as a way to abstract the bitmap/extent concept and
> > remove the need to specify a dirty bit at all (well, that's my excuse
> > anyway).
> > 
> 
> OK. We do certainly support multiple bitmaps being active at a time in
> QEMU, but I had personally always envisioned that you'd associate them
> one-at-a-time when starting the NBD export of a particular device.
> 
> I don't have a use case in my head where two distinct bitmaps being
> exposed simultaneously offer any particular benefit, but maybe there is
> something. I'm sure there is.

The ability to do something does not in any way imply the requirement to
do the same :-)

The idea is that the client negotiates one or more forms of metadata
information from the server that it might be interested in, and then
asks the server for that information for a given extent in which it
has an interest.

The protocol spec does not define what that metadata is (beyond the "is
allocated" one that we define in the spec currently, and possibly
something else in the future). So if qemu only cares about just one type
of metadata, there's no reason why it should *have* to export more than
one type.

> I will leave this aspect of it more to you NBD folks. I think QEMU could
> cope with either.
> 
> (Vladimir, am I wrong? Do you have thoughts on this in particular? I
> haven't thought through this aspect of it very much.)
> 
> >> Anyway, I hope I am being useful and just not more confounding. It seems
> >> to me that we're having difficulty conveying precisely what it is we're
> >> trying to accomplish, so I hope that I am making a good effort in
> >> elaborating on our goals/requirements.
> > 
> > Yes absolutely. I think part of the challenge is that you are quite
> > reasonably coming at it from the point of view of qemu's particular
> > need, and I'm coming at it from 'what should the nbd protocol look
> > like in general' position, having done lots of work on the protocol
> > docs (though I'm an occasional qemu contributor). So there's necessarily
> > a gap of approach to be bridged.
> > 
> 
> Yeah, I understand quite well that we need to make sure the NBD spec is
> sane and useful in a QEMU-agnostic way, so my goal here is just to help
> elucidate our needs to enable you to reach a good consensus.

Right, that's why I was reluctant to merge the original spec as it
stood.

> > I'm overdue on a review of Wouter's latest patch (partly because I need
> > to re-diff it against the version with no NBD_CMD_BLOCK_STATUS in),
> > but I think it's a bridge worth building.
> > 
> 
> Same. Thank you for your patience!

I can do some updates given a few of the suggestions that were made on
this list (no guarantee when that will happen), but if people are
interested in reviewing things in the mean time, be my guest...
John Snow Dec. 6, 2016, 4:39 p.m. UTC | #22
On 12/06/2016 08:32 AM, Wouter Verhelst wrote:
> Hi John
> 
> Sorry for the late reply; weekend was busy, and so was monday.
> 

No problems.

> On Fri, Dec 02, 2016 at 03:39:08PM -0500, John Snow wrote:
>> On 12/02/2016 01:45 PM, Alex Bligh wrote:
>>> John,
>>>
>>>>> +Some storage formats and operations over such formats express a
>>>>> +concept of data dirtiness. Whether the operation is block device
>>>>> +mirroring, incremental block device backup or any other operation with
>>>>> +a concept of data dirtiness, they all share a need to provide a list
>>>>> +of ranges that this particular operation treats as dirty.
>>>>>
>>>>> How can data be 'dirty' if it is static and unchangeable? (I thought)
>>>>>
>>>>
>>>> In a simple case, live IO goes to e.g. hda.qcow2. These writes come from
>>>> the VM and cause the bitmap that QEMU manages to become dirty.
>>>>
>>>> We intend to expose the ability to fleece dirty blocks via NBD. What
>>>> happens in this scenario would be that a snapshot of the data at the
>>>> time of the request is exported over NBD in a read-only manner.
>>>>
>>>> In this way, the drive itself is R/W, but the "view" of it from NBD is
>>>> RO. While a hypothetical backup client is busy copying data out of this
>>>> temporary view, new writes are coming in to the drive, but are not being
>>>> exposed through the NBD export.
>>>>
>>>> (This goes into QEMU-specifics, but those new writes are dirtying a
>>>> version of the bitmap not intended to be exposed via the NBD channel.
>>>> NBD gets effectively a snapshot of both the bitmap AND the data.)
>>>
>>> Thanks. That makes sense - or enough sense for me to carry on commenting!
>>>
>>
>> Whew! I'm glad.
>>
>>>>> I now think what you are talking about backing up a *snapshot* of a disk
>>>>> that's running, where the disk itself was not connected using NBD? IE it's
>>>>> not being 'made dirty' by NBD_CMD_WRITE etc. Rather 'dirtiness' is effectively
>>>>> an opaque state represented in a bitmap, which is binary metadata
>>>>> at some particular level of granularity. It might as well be 'happiness'
>>>>> or 'is coloured blue'. The NBD server would (normally) have no way of
>>>>> manipulating this bitmap.
>>>>>
>>>>> In previous comments, I said 'how come we can set the dirty bit through
>>>>> writes but can't clear it?'. This (my statement) is now I think wrong,
>>>>> as NBD_CMD_WRITE etc. is not defined to set the dirty bit. The
>>>>> state of the bitmap comes from whatever sets the bitmap which is outside
>>>>> the scope of this protocol to transmit it.
>>>>>
>>>>
>>>> You know, this is a fair point. We have not (to my knowledge) yet
>>>> carefully considered the exact bitmap management scenario when NBD is
>>>> involved in retrieving dirty blocks.
>>>>
>>>> Humor me for a moment while I talk about a (completely hypothetical, not
>>>> yet fully discussed) workflow for how I envision this feature.
>>>>
>>>> (1) User sets up a drive in QEMU, a bitmap is initialized, an initial
>>>> backup is made, etc.
>>>>
>>>> (2) As writes come in, QEMU's bitmap is dirtied.
>>>>
>>>> (3) The user decides they want to root around to see what data has
>>>> changed and would like to use NBD to do so, in contrast to QEMU's own
>>>> facilities for dumping dirty blocks.
>>>>
>>>> (4) A command is issued that creates a temporary, lightweight snapshot
>>>> ('fleecing') and exports this snapshot over NBD. The bitmap is
>>>> associated with the NBD export at this point at NBD server startup. (For
>>>> the sake of QEMU discussion, maybe this command is "blockdev-fleece")
>>>>
>>>> (5) At this moment, the snapshot is static and represents the data at
>>>> the time the NBD server was started. The bitmap is also forked and
>>>> represents only this snapshot. The live data and bitmap continue to change.
>>>>
>>>> (6) Dirty blocks are queried and copied out via NBD.
>>>>
>>>> (7) The user closes the NBD instance upon completion of their task,
>>>> whatever it was. (Making a new incremental backup? Just taking a peek at
>>>> some changed data? who knows.)
>>>>
>>>> The point that's interesting here is what do we do with the two bitmaps
>>>> at this point? The data delta can be discarded (this was after all just
>>>> a lightweight read-only point-in-time snapshot) but the bitmap data
>>>> needs to be dealt with.
>>>>
>>>> (A) In the case of "User made a new incremental backup," the bitmap that
>>>> got forked off to serve the NBD read should be discarded.
>>>>
>>>> (B) In the case of "User just wanted to look around," the bitmap should
>>>> be merged back into the bitmap it was forked from.
>>>>
>>>> I don't advise a hybrid where "User copied some data, but not all" where
>>>> we need to partially clear *and* merge, but conceivably this could
>>>> happen, because the things we don't want to happen always will.
>>>>
>>>> At this point maybe it's becoming obvious that actually it would be very
>>>> prudent to allow the NBD client itself to inform QEMU via the NBD
>>>> protocol which extents/blocks/(etc) that it is "done" with.
>>>>
>>>> Maybe it *would* actually be useful if, in NBD allowing us to add a
>>>> "dirty" bit to the specification, we allow users to clear those bits.
>>>>
>>>> Then, whether the user was trying to do (A) or (B) or the unspeakable
>>>> amalgamation of both things, it's up to the user to clear the bits
>>>> desired and QEMU can do the simple task of simply always merging the
>>>> bitmap fork upon the conclusion of the NBD fleecing exercise.
>>>>
>>>> Maybe this would allow the dirty bit to have a bit more concrete meaning
>>>> for the NBD spec: "The bit stays dirty until the user clears it, and is
>>>> set when the matching block/extent/etc is written to."
>>>>
>>>> With an exception that external management may cause the bits to clear.
>>>> (I.e., someone fiddles with the backing store in a way opaque to NBD,
>>>> e.g. someone clears the bitmap directly through QEMU instead of via NBD.)
>>>
>>> There is currently one possible "I've done with the entire bitmap"
>>> signal, which is closing the connection. This has two obvious
>>> problems. Firstly if used, it discards the entire bitmap (not bits).
>>> Secondly, it makes recovery from a broken TCP session difficult
>>> (as either you treat a dirty close as meaning the bitmap needs
>>> to hang around, in which case you have a garbage collection issue,
>>> or you treat it as needing to drop the bitmap, in which case you
>>> can't recover).
>>>
>>
>> In my mind, I wasn't treating closing the connection as the end of the
>> point-in-time snapshot; that would be stopping the export.
>>
>> I wouldn't advocate for a control channel (QEMU, here) clearing the
>> bitmap just because a client disappeared.
>>
>> Either:
>>
>> (A) QEMU clears the bitmap because the NBD export was *stopped*, or
>> (B) QEMU, acting as the NBD server, clears the bitmap as instructed by
>> the NBD client, if we admit a provision to clear bits from the NBD
>> protocol itself.
>>
>> I don't think there's room for the NBD server (QEMU) deciding to clear
>> bits based on connection status. It has to be an explicit decision --
>> either via NBD or QMP.
>>
>>> I think in your plan the block status doesn't change once the bitmap
>>> is forked. In that case, adding some command (optional) to change
>>> the status of the bitmap (or simply to set a given extent to status X)
>>> would be reasonable. Of course whether it's supported could be dependent
>>> on the bitmap.
>>>
>>
>> What I describe as "forking" was kind of a bad description. What really
>> happens when we have a divergence is that the bitmap with data is split
>> into two bitmaps that are related:
>>
>> - A new bitmap that is created takes over for the old bitmap. This new
>> bitmap is empty. It records writes on the live version of the data.
>> - The old bitmap as it existed remains in a read-only state, and
>> describes some point-in-time snapshot view of the data.
>>
>> In the case of an incremental backup, once we've made a backup of the
>> data, that read-only bitmap can actually be discarded without further
>> thought.
>>
>> In the case of a failed incremental backup, or in the case of "I just
>> wanted to look and see what has changed, but wasn't prepared to reset
>> the counter yet," this bitmap gets merged back with the live bitmap as
>> if nothing ever happened.
>>
>> ANYWAY, allowing the NBD client to request bits be cleared has an
>> obvious use case even for QEMU, IMO -- which is, the NBD client itself
>> gains the ability to, without relying on a control plane to the server,
>> decide for itself if it is going to "make a backup" or "just look around."
>>
>> The client gains the ability to leave the bitmap alone (QEMU will
>> re-merge it later once the snapshot is closed) or the ability to clear
>> it ("I made my backup, we're done with this.")
>>
>> That usefulness would allow us to have an explicit dirty bit mechanism
>> directly in NBD, IMO, because:
>>
>> (1) A RW NBD server has enough information to mark a bit dirty
>> (2) Since there exists an in-spec mechanism to reset the bitmap, the
>> dirty bit is meaningful to the server and the client
> 
> While I can see that the ability to manipulate metadata might have
> advantages for certain use cases, I don't think that the ability to
> *inspect* metadata should require the ability to manipulate it in any
> way.
> 
> So I'd like to finish the block_status extension before moving on to
> manipulation :)
> 

Understood. By admitting that manipulation of bits may have a purpose
in NBD, I was trying to clarify the exact meaning of the dirty bit.

It had seemed to me that leaving the manipulation of those bits
unspecified (or disallowed) within NBD necessarily meant that their
meaning existed entirely out-of-spec for NBD, which could be a show-stopper.

So I was attempting to show that by allowing their manipulation in NBD,
they'd have full in-spec meaning. It all depends on what exactly we name
those bits and how you'd like to define their meaning. There are many
ways we can expose this information in a useful manner, so this was just
another option.

>>>> Having missed most of the discussion on v1/v2, is it a given that we
>>>> want in-band identification of bitmaps?
>>>>
>>>> I guess this might depend very heavily on the nature of the definition
>>>> of the "dirty bit" in the NBD spec.
>>>
>>> I don't think it's a given. I think Wouter & I came up with it at
>>> the same time as a way to abstract the bitmap/extent concept and
>>> remove the need to specify a dirty bit at all (well, that's my excuse
>>> anyway).
>>>
>>
>> OK. We do certainly support multiple bitmaps being active at a time in
>> QEMU, but I had personally always envisioned that you'd associate them
>> one-at-a-time when starting the NBD export of a particular device.
>>
>> I don't have a use case in my head where two distinct bitmaps being
>> exposed simultaneously offer any particular benefit, but maybe there is
>> something. I'm sure there is.
> 
> The ability to do something does not in any way imply the requirement to
> do the same :-)
> 

Hence the ask. It's not something QEMU currently needs, but I am a bad
psychic.

> The idea is that the client negotiates one or more forms of metadata
> information from the server that it might be interested in, and then
> asks the server for that information for a given extent in which it
> has an interest.
> 

via the NBD protocol, you mean?

> The protocol spec does not define what that metadata is (beyond the "is
> allocated" one that we define in the spec currently, and possibly
> something else in the future). So if qemu only cares about just one type
> of metadata, there's no reason why it should *have* to export more than
> one type.
> 
>> I will leave this aspect of it more to you NBD folks. I think QEMU could
>> cope with either.
>>
>> (Vladimir, am I wrong? Do you have thoughts on this in particular? I
>> haven't thought through this aspect of it very much.)
>>
>>>> Anyway, I hope I am being useful and just not more confounding. It seems
>>>> to me that we're having difficulty conveying precisely what it is we're
>>>> trying to accomplish, so I hope that I am making a good effort in
>>>> elaborating on our goals/requirements.
>>>
>>> Yes absolutely. I think part of the challenge is that you are quite
>>> reasonably coming at it from the point of view of qemu's particular
>>> need, and I'm coming at it from 'what should the nbd protocol look
>>> like in general' position, having done lots of work on the protocol
>>> docs (though I'm an occasional qemu contributor). So there's necessarily
>>> a gap of approach to be bridged.
>>>
>>
>> Yeah, I understand quite well that we need to make sure the NBD spec is
>> sane and useful in a QEMU-agnostic way, so my goal here is just to help
>> elucidate our needs to enable you to reach a good consensus.
> 
> Right, that's why I was reluctant to merge the original spec as it
> stood.
> 
>>> I'm overdue on a review of Wouter's latest patch (partly because I need
>>> to re-diff it against the version with no NBD_CMD_BLOCK_STATUS in),
>>> but I think it's a bridge worth building.
>>>
>>
>> Same. Thank you for your patience!
> 
> I can do some updates given a few of the suggestions that were made on
> this list (no guarantee when that will happen), but if people are
> interested in reviewing things in the mean time, be my guest...
> 

I'll take a look at your revision(s), thanks.

--js

Patch

======

Also, an idea on 2-4:

    Since we say that dirtiness is opaque to NBD, and an external tool
    should specify, manage, and understand which data is actually
    transmitted, why not just call it user_data and leave the status
    field of the reply chunk unspecified in this case?

    So, I propose one flag for NBD_CMD_BLOCK_STATUS:
    NBD_FLAG_STATUS_USER. If it is clear, then the behaviour is defined
    by Eric's 'Block provisioning status' paragraph.  If it is set, we
    just leave the status field to some external... protocol? Who knows
    what this user data is.

    Note: I'm not sure that I like this proposal of mine. It's just an
    idea; maybe someone will like it.  And I think it represents what we
    are trying to do more honestly.

    Note2: the next step of generalization will be NBD_CMD_USER, with
    variable request size, structured reply and no definition :)


Another idea, about backups themselves:

    Why do we need allocated/zero status for backup? IMHO we don't.

    Full backup: just do a structured read - it will show us which
    chunks may be treated as zeroes.

    Incremental backup: get the dirty bitmap (somehow, for example
    through the user-defined part of the proposed command), then read
    the dirty blocks through structured read, so information about
    zero/unallocated areas is included.

For me all the variants above are OK. Let's finally choose something.
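The incremental-backup variant above can be sketched from the client
side. This is an illustrative Python sketch, not part of the proposal:
it assumes the `NBD_STATE_CLEAN` bit (bit 2) as defined in the patch,
and turns the (length, status) descriptors of a dirtiness query into
(offset, length) ranges that a backup client would then read:

```python
NBD_STATE_CLEAN = 1 << 2  # bit 2: block has NOT been written since tracking began

def dirty_ranges(offset, descriptors):
    """Convert (length, status) descriptors from an NBD_CMD_BLOCK_STATUS
    dirtiness query into (offset, length) ranges to read for a backup.

    Descriptors describe consecutive extents starting at `offset`.
    A clear NBD_STATE_CLEAN bit means dirty-or-unknown, so we read it.
    """
    ranges = []
    for length, status in descriptors:
        if not (status & NBD_STATE_CLEAN):
            ranges.append((offset, length))
        offset += length
    return ranges

# Example: 4 KiB clean, 8 KiB dirty, 4 KiB clean -> read only the middle extent
print(dirty_ranges(0, [(4096, NBD_STATE_CLEAN),
                       (8192, 0),
                       (4096, NBD_STATE_CLEAN)]))
# [(4096, 8192)]
```

Note that because the reply may cover less than the requested length,
a real client would loop, re-issuing the query from the end of the
last descriptor until the range of interest is covered.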

v2:
v1 was: https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg05574.html

Since then, we've added the STRUCTURED_REPLY extension, which
necessitates a rather larger rebase; I've also changed things
to rename the command 'NBD_CMD_BLOCK_STATUS', changed the request
modes to be determined by boolean flags (rather than by fixed
values of the 16-bit flags field), changed the reply status fields
to be bitwise-or values (with a default of 0 always being sane),
and changed the descriptor layout to drop an offset but to include
a 32-bit status so that the descriptor is nicely 8-byte aligned
without padding.
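As an aside, the 8-byte descriptor layout described above (a 32-bit
length plus a 32-bit status, network byte order as elsewhere in NBD)
can be illustrated with a small Python sketch; the helper name and
example values here are hypothetical, not part of the patch:

```python
import struct

# One block status descriptor: 32-bit extent length + 32-bit status flags,
# big-endian, 8 bytes total so the payload is naturally 8-byte aligned.
DESCRIPTOR = struct.Struct(">II")

def parse_descriptors(payload):
    """Split an NBD_REPLY_TYPE_BLOCK_STATUS payload into (length, status) pairs."""
    if len(payload) == 0 or len(payload) % DESCRIPTOR.size:
        raise ValueError("payload must be a positive multiple of 8 bytes")
    return [DESCRIPTOR.unpack_from(payload, off)
            for off in range(0, len(payload), DESCRIPTOR.size)]

# Example: 64 KiB of allocated data followed by a 64 KiB zero hole
NBD_STATE_HOLE = 1 << 0
NBD_STATE_ZERO = 1 << 1
payload = (DESCRIPTOR.pack(65536, 0) +
           DESCRIPTOR.pack(65536, NBD_STATE_HOLE | NBD_STATE_ZERO))
print(parse_descriptors(payload))
# [(65536, 0), (65536, 3)]
```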

 doc/proto.md | 155 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 154 insertions(+), 1 deletion(-)

diff --git a/doc/proto.md b/doc/proto.md
index 1c2fa5b..253b6f1 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -755,6 +755,8 @@  The field has the following format:
   MUST leave this flag clear if structured replies have not been
   negotiated. Clients MUST NOT set the `NBD_CMD_FLAG_DF` request
   flag unless this transmission flag is set.
+- bit 9, `NBD_FLAG_SEND_BLOCK_STATUS`; defined by the experimental
+  `BLOCK_STATUS` extension; see below.
 
 Clients SHOULD ignore unknown flags.
 
@@ -1036,6 +1038,10 @@  interpret the "length" bytes of payload.
   64 bits: offset (unsigned)  
   32 bits: hole size (unsigned, MUST be nonzero)  
 
+- `NBD_REPLY_TYPE_BLOCK_STATUS` (5)
+
+  Defined by the experimental extension `BLOCK_STATUS`; see below.
+
 All error chunk types have bit 15 set, and begin with the same
 *error*, *message length*, and optional *message* fields as
 `NBD_REPLY_TYPE_ERROR`.  If non-zero, *message length* indicates
@@ -1070,7 +1076,7 @@  remaining structured fields at the end.
   were sent earlier in the structured reply, the server SHOULD NOT
   send multiple distinct offsets that lie within the bounds of a
   single content chunk.  Valid as a reply to `NBD_CMD_READ`,
-  `NBD_CMD_WRITE`, and `NBD_CMD_TRIM`.
+  `NBD_CMD_WRITE`, `NBD_CMD_TRIM`, and `NBD_CMD_BLOCK_STATUS`.
 
   The payload is structured as:
 
@@ -1247,6 +1253,11 @@  The following request types exist:
 
     Defined by the experimental `WRITE_ZEROES` [extension](https://github.com/yoe/nbd/blob/extension-write-zeroes/doc/proto.md).
 
+* `NBD_CMD_BLOCK_STATUS` (7)
+
+    Defined by the experimental `BLOCK_STATUS` extension; see below.
+
+
 * Other requests
 
     Some third-party implementations may require additional protocol
@@ -1331,6 +1342,148 @@  written as branches which can be merged into master if and
 when those extensions are promoted to the normative version
 of the document in the master branch.
 
+### `BLOCK_STATUS` extension
+
+With the availability of sparse storage formats, it is often necessary to
+query the status of a particular range and read only those blocks of
+data that are actually present on the block device.
+
+Some storage formats and operations over such formats express a
+concept of data dirtiness. Whether the operation is block device
+mirroring, incremental block device backup or any other operation with
+a concept of data dirtiness, they all share a need to provide a list
+of ranges that this particular operation treats as dirty.
+
+To provide such a class of information, the `BLOCK_STATUS` extension
+adds a new `NBD_CMD_BLOCK_STATUS` command which returns a list of
+ranges with their respective states.  This extension is not available
+unless the client also negotiates the `STRUCTURED_REPLY` extension.
+
+* `NBD_FLAG_SEND_BLOCK_STATUS`
+
+    The server SHOULD set this transmission flag to 1 if structured
+    replies have been negotiated, and the `NBD_CMD_BLOCK_STATUS`
+    request is supported.
+
+* `NBD_REPLY_TYPE_BLOCK_STATUS`
+
+    *length* MUST be a positive integer multiple of 8.  This reply
+    represents a series of consecutive block descriptors where the sum
+    of the lengths of the descriptors MUST NOT be greater than the
+    length of the original request.  This chunk type MUST appear at most
+    once in a structured reply. Valid as a reply to
+    `NBD_CMD_BLOCK_STATUS`.
+
+    The payload is structured as a list of one or more descriptors,
+    each with this layout:
+
+        * 32 bits, length (unsigned, MUST NOT be zero)
+        * 32 bits, status flags
+
+    The definition of the status flags is determined based on the
+    flags present in the original request.
+
+* `NBD_CMD_BLOCK_STATUS`
+
+    A block status query request. Length and offset define the range
+    of interest. Clients SHOULD NOT use this request unless the server
+    set `NBD_FLAG_SEND_BLOCK_STATUS` in the transmission flags, which
+    in turn requires the client to first negotiate structured replies.
+    For a successful return, the server MUST use a structured reply,
+    containing at most one chunk of type `NBD_REPLY_TYPE_BLOCK_STATUS`.
+
+    The list of block status descriptors within the
+    `NBD_REPLY_TYPE_BLOCK_STATUS` chunk represent consecutive portions
+    of the file starting from specified *offset*, and the sum of the
+    *length* fields of each descriptor MUST NOT be greater than the
+    overall *length* of the request. This means that the server MAY
+    return less data than requested. However, the server MUST return at
+    least one status descriptor.  The server SHOULD use different
+    *status* values between consecutive descriptors, and SHOULD use
+    descriptor lengths that are an integer multiple of 512 bytes where
+    possible (the first and last descriptor of an unaligned query being
+    the most obvious places for an exception). The status flags are
+    intentionally defined so that a server MAY always safely report a
+    status of 0 for any block, although the server SHOULD return
+    additional status values when they can be easily detected.
+
+    If an error occurs, the server SHOULD set the appropriate error
+    code in the error field of either a simple reply or an error
+    chunk.  However, if the error does not involve invalid usage (such
+    as a request beyond the bounds of the file), a server MAY reply
+    with a single block status descriptor with *length* matching the
+    requested length, and *status* of 0 rather than reporting the
+    error.
+
+    The type of information requested by the client is determined by
+    the request flags, as follows:
+
+    1. Block provisioning status
+
+    Upon receiving an `NBD_CMD_BLOCK_STATUS` command with the flag
+    `NBD_CMD_FLAG_STATUS_DIRTY` clear, the server MUST return the
+    provisioning status of the device, where the status field of each
+    descriptor is determined by the following bits (all four
+    combinations of these two bits are possible):
+
+      - `NBD_STATE_HOLE` (bit 0); if set, the block represents a hole
+        (and future writes to that area may cause fragmentation or
+        encounter an `ENOSPC` error); if clear, the block is allocated
+        or the server could not otherwise determine its status.  Note
+        that the use of `NBD_CMD_TRIM` is related to this status, but
+        that the server MAY report a hole even where trim has not been
+        requested, and also that a server MAY report allocation even
+        where a trim has been requested.
+      - `NBD_STATE_ZERO` (bit 1); if set, the block contents read as
+        all zeroes; if clear, the block contents are not known.  Note
+        that the use of `NBD_CMD_WRITE_ZEROES` is related to this
+        status, but that the server MAY report zeroes even where write
+        zeroes has not been requested, and also that a server MAY
+        report unknown content even where write zeroes has been
+        requested.
+
+    The client SHOULD NOT read from an area that has both
+    `NBD_STATE_HOLE` set and `NBD_STATE_ZERO` clear.
+
+    2. Block dirtiness status
+
+    This command is meant to operate in tandem with other (non-NBD)
+    channels to the server.  Generally, a "dirty" block is a block
+    that has been written to by someone, but the exact meaning of "has
+    been written" is left to the implementation.  For example, a
+    virtual machine monitor could provide a (non-NBD) command to start
+    tracking blocks written by the virtual machine.  A backup client
+    can then connect to an NBD server provided by the virtual machine
+    monitor and use `NBD_CMD_BLOCK_STATUS` with the
+    `NBD_CMD_FLAG_STATUS_DIRTY` bit set in order to read only the dirty
+    blocks that the virtual machine has changed.
+
+    An implementation that doesn't track the "dirtiness" state of
+    blocks MUST either fail this command with `EINVAL`, or mark all
+    blocks as dirty in the descriptor that it returns.  Upon receiving
+    an `NBD_CMD_BLOCK_STATUS` command with the flag
+    `NBD_CMD_FLAG_STATUS_DIRTY` set, the server MUST return the dirtiness
+    status of the device, where the status field of each descriptor is
+    determined by the following bit:
+
+      - `NBD_STATE_CLEAN` (bit 2); if set, the block represents a
+        portion of the file that is still clean because it has not
+        been written; if clear, the block represents a portion of the
+        file that is dirty, or where the server could not otherwise
+        determine its status.
+
+A client MAY close the connection if it detects that the server has
+sent an invalid chunk (such as lengths in the
+`NBD_REPLY_TYPE_BLOCK_STATUS` chunk summing to more than the requested
+length).
+The server SHOULD return `EINVAL` if it receives a `BLOCK_STATUS`
+request including one or more sectors beyond the size of the device.
+
+The extension adds the following new command flag:
+
+- `NBD_CMD_FLAG_STATUS_DIRTY`; valid during `NBD_CMD_BLOCK_STATUS`.
+  SHOULD be set to 1 if the client wants to request dirtiness status
+  rather than provisioning status.
+
 ## About this file
 
 This file tries to document the NBD protocol as it is currently