diff mbox

[v5,12/21] block: define get_block_status return value

Message ID 1378314038-15525-13-git-send-email-pbonzini@redhat.com
State New
Headers show

Commit Message

Paolo Bonzini Sept. 4, 2013, 5 p.m. UTC
Define the return value of get_block_status.  Bits 0, 1, 2 and 9-62
are valid; bit 63 (the sign bit) is reserved for errors.  Bits 3-8
are left for future extensions.

The return code is compatible with the old is_allocated API: if a driver
only returns 0 or 1 (aka BDRV_BLOCK_DATA) like is_allocated used to,
clients of is_allocated will not have any change in behavior.  Still,
we will return more precise information in the next patches and the
new definition of bdrv_is_allocated is already prepared for this.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c               | 10 ++++++++--
 include/block/block.h | 26 ++++++++++++++++++++++++++
 2 files changed, 34 insertions(+), 2 deletions(-)

Comments

Kevin Wolf May 5, 2014, 2:34 p.m. UTC | #1
Am 04.09.2013 um 19:00 hat Paolo Bonzini geschrieben:
> Define the return value of get_block_status.  Bits 0, 1, 2 and 9-62
> are valid; bit 63 (the sign bit) is reserved for errors.  Bits 3-8
> are left for future extensions.
> 
> The return code is compatible with the old is_allocated API: if a driver
> only returns 0 or 1 (aka BDRV_BLOCK_DATA) like is_allocated used to,
> clients of is_allocated will not have any change in behavior.  Still,
> we will return more precise information in the next patches and the
> new definition of bdrv_is_allocated is already prepared for this.
> 
> Reviewed-by: Eric Blake <eblake@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block.c               | 10 ++++++++--
>  include/block/block.h | 26 ++++++++++++++++++++++++++
>  2 files changed, 34 insertions(+), 2 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 47a5d63..0eb5e43 100644
> --- a/block.c
> +++ b/block.c
> @@ -3073,7 +3073,7 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
>  
>      if (!bs->drv->bdrv_co_get_block_status) {
>          *pnum = nb_sectors;
> -        return 1;
> +        return BDRV_BLOCK_DATA;
>      }
>  
>      return bs->drv->bdrv_co_get_block_status(bs, sector_num, nb_sectors, pnum);
> @@ -3123,7 +3123,13 @@ int64_t bdrv_get_block_status(BlockDriverState *bs, int64_t sector_num,
>  int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num,
>                                     int nb_sectors, int *pnum)
>  {
> -    return bdrv_get_block_status(bs, sector_num, nb_sectors, pnum);
> +    int64_t ret = bdrv_get_block_status(bs, sector_num, nb_sectors, pnum);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    return
> +        (ret & BDRV_BLOCK_DATA) ||
> +        ((ret & BDRV_BLOCK_ZERO) && !bdrv_has_zero_init(bs));
>  }

I'm afraid that the assumptions eventually became too clever:

$ ./qemu-img create -f qcow2 /tmp/backing.qcow2 1G
Formatting '/tmp/backing.qcow2', fmt=qcow2 size=1073741824 encryption=off cluster_size=65536 lazy_refcounts=off 
$ ./qemu-img create -f qcow2 -b /tmp/backing.qcow2 /tmp/overlay.qcow2 2G
Formatting '/tmp/overlay.qcow2', fmt=qcow2 size=2147483648 backing_file='/tmp/backing.qcow2' encryption=off cluster_size=65536 lazy_refcounts=off 
$ ./qemu-io -c 'alloc 1G 64k' /tmp/overlay.qcow2
65536/65536 sectors allocated at offset 1 GiB

BDRV_BLOCK_ZERO is set for everything past the end of the backing file
(which kind of makes sense), and bdrv_has_zero_init() returns false for
anything that has a backing file (which makes sense, too), so now we're
reporting all sectors past the end of the backing file as allocated
(which is bad).

I think we're missing the information whether the BDRV_BLOCK_ZERO flag
actually comes from bs itself or from the backing chain. Do we need
another flag or does someone have a better idea?

Kevin
Paolo Bonzini May 5, 2014, 2:58 p.m. UTC | #2
Il 05/05/2014 16:34, Kevin Wolf ha scritto:
> I think we're missing the information whether the BDRV_BLOCK_ZERO flag
> actually comes from bs itself or from the backing chain. Do we need
> another flag or does someone have a better idea?

Not sure if it is a better idea :) but we can duplicate the logic of
bdrv_get_co_block_status in bdrv_is_allocated.

The observation here is that bdrv_has_zero_init should have been 
changed to bdrv_unallocated_blocks_are_zero in commit c3d8688
(block/get_block_status: fix BDRV_BLOCK_ZERO for unallocated blocks,
2013-10-24); the logic in the commit message applies just as well to
bdrv_is_allocated.

Of course, instead of cut-and-paste we can wrap all this into a
bdrv_unallocated_block_is_zero(bs, sector_num) function like this:

        if (bdrv_unallocated_blocks_are_zero(bs)) {
            return true;
        }
        if (bs->backing_hd) {
            BlockDriverState *bs2 = bs->backing_hd;
            int64_t length2 = bdrv_getlength(bs2);
            if (length2 >= 0 && sector_num >= (length2 >> BDRV_SECTOR_BITS)) {
                return true;
            }
        }
        return false;

The function can then be used in bdrv_get_block_status to add a
BDRV_BLOCK_ZERO, and in bdrv_is_allocated instead of bdrv_has_zero_init.
(Bonus points for renaming bdrv_is_allocated to something like
bdrv_has_content, and similarly for bdrv_is_allocated_above).

Paolo
Kevin Wolf May 6, 2014, 11:34 a.m. UTC | #3
Am 05.05.2014 um 16:58 hat Paolo Bonzini geschrieben:
> Il 05/05/2014 16:34, Kevin Wolf ha scritto:
> > I think we're missing the information whether the BDRV_BLOCK_ZERO flag
> > actually comes from bs itself or from the backing chain. Do we need
> > another flag or does someone have a better idea?
> 
> Not sure if it is a better idea :) but we can duplicate the logic of
> bdrv_get_co_block_status in bdrv_is_allocated.
> 
> The observation here is that bdrv_has_zero_init should have been 
> changed to bdrv_unallocated_blocks_are_zero in commit c3d8688
> (block/get_block_status: fix BDRV_BLOCK_ZERO for unallocated blocks,
> 2013-10-24); the logic in the commit message applies just as well to
> bdrv_is_allocated.

But the result still wouldn't be right. We have an implication that you
can't simply reverse:

    If this block is unallocated and unallocated blocks read as zero,
    then this block reads as zero.

You're trying to use something like "If the block reads as zero and
unallocated blocks read as zero, it must be unallocated", which quite
obviously doesn't work.

Kevin
Paolo Bonzini May 6, 2014, 12:31 p.m. UTC | #4
Il 06/05/2014 13:34, Kevin Wolf ha scritto:
> But the result still wouldn't be right. We have an implication that you
> can't simply reverse:
>
>     If this block is unallocated and unallocated blocks read as zero,
>     then this block reads as zero.
>
> You're trying to use something like "If the block reads as zero and
> unallocated blocks read as zero, it must be unallocated", which quite
> obviously doesn't work.

This is not what the condition says in is_allocated.  The right one is 
(excluding the backing_hd case, which is handled with a function like 
the one I sketched in the previous message):

     If the block reads as zero and not all unallocated blocks read as
     zero, the sector will be read from BS rather than the backing file,
     even if it is not allocated.

This is true, because bdrv_get_block_status does not ask the backing 
file whether it is zero in the backing file.  Thus, reading from the 
backing file does not guarantee to read zeroes.

<pedantic>
Since you wrote "something like", I guess it's not the difference 
between what you wrote and what I wrote is not too relevant.  But here 
is the pedantic explanation of it:

- is_allocated does not answer the question "is the block allocated", 
but rather "will the sector be read from BS or from a backing file?" (or 
from an imaginary all-zeroes backing file if there is none).  This is 
why it is better to rename is_allocated before it brings more confusion 
(and the brain damage I got from this confusion is also the reason why 
BDRV_BLOCK_DATA is not called BDRV_BLOCK_ALLOCATED).

- this is the right hand side of a "||", so the outcome is not "it must 
be unallocated", but "it must be allocated".  Or better, given the 
previous point, "the sector will be read from BS"

- the bdrv_has_zero_init (which is wrong, but can be fixed) is negated, 
so it is not "unallocated blocks are zero" but rather "if the block 
reads as zero and not all unallocated blocks read as zero".
</pedantic>

Paolo
Kevin Wolf May 6, 2014, 1:08 p.m. UTC | #5
Am 06.05.2014 um 14:31 hat Paolo Bonzini geschrieben:
> Il 06/05/2014 13:34, Kevin Wolf ha scritto:
> >But the result still wouldn't be right. We have an implication that you
> >can't simply reverse:
> >
> >    If this block is unallocated and unallocated blocks read as zero,
> >    then this block reads as zero.
> >
> >You're trying to use something like "If the block reads as zero and
> >unallocated blocks read as zero, it must be unallocated", which quite
> >obviously doesn't work.
> 
> This is not what the condition says in is_allocated.  The right one
> is (excluding the backing_hd case, which is handled with a function
> like the one I sketched in the previous message):
> 
>     If the block reads as zero and not all unallocated blocks read as
>     zero, the sector will be read from BS rather than the backing file,
>     even if it is not allocated.
> 
> This is true, because bdrv_get_block_status does not ask the backing
> file whether it is zero in the backing file.  Thus, reading from the
> backing file does not guarantee to read zeroes.

Aha, this paragraph is the key. :-)

I assumed that bdrv_get_block_status() does ask the backing file, but
apparently it doesn't. The current meaning of BDRV_BLOCK_ZERO is
therefore not simply "sectors read as zero", but "sectors read as zero
and this is determined by bs itself (mostly, except for the backing file
size)".

Is it a good idea to rely on this or would it be reasonable for
bdrv_get_block_status() to look at the backing file for returning ZERO
sometime in the future? I think that could be useful information in some
cases.

Kevin
Paolo Bonzini May 6, 2014, 1:20 p.m. UTC | #6
Il 06/05/2014 15:08, Kevin Wolf ha scritto:
>> This is true, because bdrv_get_block_status does not ask the backing
>> file whether it is zero in the backing file.  Thus, reading from the
>> backing file does not guarantee to read zeroes.
>
> Aha, this paragraph is the key. :-)
>
> I assumed that bdrv_get_block_status() does ask the backing file, but
> apparently it doesn't. The current meaning of BDRV_BLOCK_ZERO is
> therefore not simply "sectors read as zero", but "sectors read as zero
> and this is determined by bs itself (mostly, except for the backing file
> size)".

Exactly.

> Is it a good idea to rely on this or would it be reasonable for
> bdrv_get_block_status() to look at the backing file for returning ZERO
> sometime in the future? I think that could be useful information in some
> cases.

It is reasonable, for example SCSI emulation could use it.  However, it 
would be a different API (like bdrv_is_allocated_above).  It would be 
very similar to get_block_status() in qemu-img.c.

Paolo
diff mbox

Patch

diff --git a/block.c b/block.c
index 47a5d63..0eb5e43 100644
--- a/block.c
+++ b/block.c
@@ -3073,7 +3073,7 @@  static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
 
     if (!bs->drv->bdrv_co_get_block_status) {
         *pnum = nb_sectors;
-        return 1;
+        return BDRV_BLOCK_DATA;
     }
 
     return bs->drv->bdrv_co_get_block_status(bs, sector_num, nb_sectors, pnum);
@@ -3123,7 +3123,13 @@  int64_t bdrv_get_block_status(BlockDriverState *bs, int64_t sector_num,
 int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num,
                                    int nb_sectors, int *pnum)
 {
-    return bdrv_get_block_status(bs, sector_num, nb_sectors, pnum);
+    int64_t ret = bdrv_get_block_status(bs, sector_num, nb_sectors, pnum);
+    if (ret < 0) {
+        return ret;
+    }
+    return
+        (ret & BDRV_BLOCK_DATA) ||
+        ((ret & BDRV_BLOCK_ZERO) && !bdrv_has_zero_init(bs));
 }
 
 /*
diff --git a/include/block/block.h b/include/block/block.h
index c73e679..578941f 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -81,6 +81,32 @@  typedef struct BlockDevOps {
 #define BDRV_SECTOR_SIZE   (1ULL << BDRV_SECTOR_BITS)
 #define BDRV_SECTOR_MASK   ~(BDRV_SECTOR_SIZE - 1)
 
+/* BDRV_BLOCK_DATA: data is read from bs->file or another file
+ * BDRV_BLOCK_ZERO: sectors read as zero
+ * BDRV_BLOCK_OFFSET_VALID: sector stored in bs->file as raw data
+ *
+ * If BDRV_BLOCK_OFFSET_VALID is set, bits 9-62 represent the offset in
+ * bs->file where sector data can be read from as raw data.
+ *
+ * DATA == 0 && ZERO == 0 means that data is read from backing_hd if present.
+ *
+ * DATA ZERO OFFSET_VALID
+ *  t    t        t       sectors read as zero, bs->file is zero at offset
+ *  t    f        t       sectors read as valid from bs->file at offset
+ *  f    t        t       sectors preallocated, read as zero, bs->file not
+ *                        necessarily zero at offset
+ *  f    f        t       sectors preallocated but read from backing_hd,
+ *                        bs->file contains garbage at offset
+ *  t    t        f       sectors preallocated, read as zero, unknown offset
+ *  t    f        f       sectors read from unknown file or offset
+ *  f    t        f       not allocated or unknown offset, read as zero
+ *  f    f        f       not allocated or unknown offset, read from backing_hd
+ */
+#define BDRV_BLOCK_DATA         1
+#define BDRV_BLOCK_ZERO         2
+#define BDRV_BLOCK_OFFSET_VALID 4
+#define BDRV_BLOCK_OFFSET_MASK  BDRV_SECTOR_MASK
+
 typedef enum {
     BDRV_ACTION_REPORT, BDRV_ACTION_IGNORE, BDRV_ACTION_STOP
 } BlockErrorAction;