diff mbox

[v2,03/15] block: Add flag to avoid wasted work in bdrv_is_allocated()

Message ID 20170703221456.30817-4-eblake@redhat.com
State New
Headers show

Commit Message

Eric Blake July 3, 2017, 10:14 p.m. UTC
Not all callers care about which BDS owns the mapping for a given
range of the file.  In particular, bdrv_is_allocated() cares more
about finding the largest run of allocated data from the guest
perspective, whether or not that data is consecutive from the
host perspective.  Therefore, doing subsequent refinements such
as checking how much of the format-layer allocation also satisfies
BDRV_BLOCK_ZERO at the protocol layer is wasted work - in the best
case, it just costs extra CPU cycles during a single
bdrv_is_allocated(), but in the worst case, it results in a smaller
*pnum, and forces callers to iterate through more status probes when
visiting the entire file for even more extra CPU cycles.

This patch only optimizes the block layer.  But subsequent patches
will tweak the driver callback to be byte-based, and in the process,
can also pass this hint through to the driver.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v2: new patch
---
 block/io.c | 51 +++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 37 insertions(+), 14 deletions(-)

Comments

Fam Zheng July 4, 2017, 7:06 a.m. UTC | #1
On Mon, 07/03 17:14, Eric Blake wrote:
> @@ -1717,6 +1718,10 @@ int64_t coroutine_fn bdrv_co_get_block_status_from_backing(BlockDriverState *bs,
>   * Drivers not implementing the functionality are assumed to not support
>   * backing files, hence all their sectors are reported as allocated.
>   *
> + * If 'allocation' is true, the caller only cares about allocation
> + * status; this is a hint that a larger 'pnum' result is more
> + * important than including BDRV_BLOCK_OFFSET_VALID in the return.
> + *

This is slightly unintuitive. I would guess "allocation == false" means "I don't
care about the allocation status" but it actually is "I don't care about the
consecutiveness". The "only" semantics is missing in the parameter name.

Maybe rename it to "consecutive" and invert the logic?  I.e. "consecutive ==
true" means BDRV_BLOCK_OFFSET_VALID is wanted.

Sorry for bikeshedding.

Fam
Eric Blake July 5, 2017, 11:56 a.m. UTC | #2
On 07/04/2017 02:06 AM, Fam Zheng wrote:
> On Mon, 07/03 17:14, Eric Blake wrote:
>> @@ -1717,6 +1718,10 @@ int64_t coroutine_fn bdrv_co_get_block_status_from_backing(BlockDriverState *bs,
>>   * Drivers not implementing the functionality are assumed to not support
>>   * backing files, hence all their sectors are reported as allocated.
>>   *
>> + * If 'allocation' is true, the caller only cares about allocation
>> + * status; this is a hint that a larger 'pnum' result is more
>> + * important than including BDRV_BLOCK_OFFSET_VALID in the return.
>> + *
> 
> This is slightly unintuitive. I would guess "allocation == false" means "I don't
> care about the allocation status" but it actually is "I don't care about the
> consecutiveness". The "only" semantics is missing in the parameter name.
> 
> Maybe rename it to "consecutive" and invert the logic?  I.e. "consecutive ==
> true" means BDRV_BLOCK_OFFSET_VALID is wanted.

Reasonable idea; other [shorter] names I've been toying with:
strict
mapping
precise

any of which, if true (set true by bdrv_get_block_status), means that I
care more about BDRV_BLOCK_OFFSET_VALID and validity for learning host
offsets, if false  it means I'm okay getting a larger *pnum even if it
extends over disjoint host offsets; or:

fast

which if true (set true by bdrv_is_allocated) means I want a larger
*pnum even if I have to sacrifice accuracy by omitting
BDRV_BLOCK_OFFSET_VALID, because I'm trying to reduce effort spent.

> 
> Sorry for bikeshedding.

Not a problem, I also had some double-takes in writing my own code
trying to remember which way I wanted the 'allocation' boolean to be
set, so coming up with a more intuitive name/default state in order to
help legibility is worth it.  Do any of my above suggestions sound better?
Fam Zheng July 5, 2017, 12:07 p.m. UTC | #3
On Wed, 07/05 06:56, Eric Blake wrote:
> On 07/04/2017 02:06 AM, Fam Zheng wrote:
> > On Mon, 07/03 17:14, Eric Blake wrote:
> >> @@ -1717,6 +1718,10 @@ int64_t coroutine_fn bdrv_co_get_block_status_from_backing(BlockDriverState *bs,
> >>   * Drivers not implementing the functionality are assumed to not support
> >>   * backing files, hence all their sectors are reported as allocated.
> >>   *
> >> + * If 'allocation' is true, the caller only cares about allocation
> >> + * status; this is a hint that a larger 'pnum' result is more
> >> + * important than including BDRV_BLOCK_OFFSET_VALID in the return.
> >> + *
> > 
> > This is slightly unintuitive. I would guess "allocation == false" means "I don't
> > care about the allocation status" but it actually is "I don't care about the
> > consecutiveness". The "only" semantics is missing in the parameter name.
> > 
> > Maybe rename it to "consecutive" and invert the logic?  I.e. "consecutive ==
> > true" means BDRV_BLOCK_OFFSET_VALID is wanted.
> 
> Reasonable idea; other [shorter] names I've been toying with:
> strict
> mapping
> precise
> 
> any of which, if true (set true by bdrv_get_block_status), means that I
> care more about BDRV_BLOCK_OFFSET_VALID and validity for learning host
> offsets, if false  it means I'm okay getting a larger *pnum even if it
> extends over disjoint host offsets; or:
> 
> fast
> 
> which if true (set true by bdrv_is_allocated) means I want a larger
> *pnum even if I have to sacrifice accuracy by omitting
> BDRV_BLOCK_OFFSET_VALID, because I'm trying to reduce effort spent.
> 
> > 
> > Sorry for bikeshedding.
> 
> Not a problem, I also had some double-takes in writing my own code
> trying to remember which way I wanted the 'allocation' boolean to be
> set, so coming up with a more intuitive name/default state in order to
> help legibility is worth it.  Do any of my above suggestions sound better?
> 

I'd vote for "mapping" because it has a close connection with offset (as in
BDRV_BLOCK_OFFSET_VALID).

Or simply call it "offset" and if false, never return BDRV_BLOCK_OFFSET_VALID.

Fam
Eric Blake July 5, 2017, 2:01 p.m. UTC | #4
On 07/05/2017 07:07 AM, Fam Zheng wrote:
>>>
>>> Sorry for bikeshedding.
>>
>> Not a problem, I also had some double-takes in writing my own code
>> trying to remember which way I wanted the 'allocation' boolean to be
>> set, so coming up with a more intuitive name/default state in order to
>> help legibility is worth it.  Do any of my above suggestions sound better?
>>
> 
> I'd vote for "mapping" because it has a close connection with offset (as in
> BDRV_BLOCK_OFFSET_VALID).
> 
> Or simply call it "offset" and if false, never return BDRV_BLOCK_OFFSET_VALID.

Well, there ARE drivers that WANT to return BDRV_BLOCK_OFFSET_VALID
regardless of the state of the boolean (namely, any driver that also
returns BDRV_BLOCK_RAW, to hint that this is a passthrough and the query
should be repeated at the next BDS in the chain).  So stating that
'offset' is false MUST preclude BDRV_BLOCK_OFFSET_VALID is a bit too
strong, but I can probably come up with appropriate wording that meets
in the middle ground (if 'offset' is true, then make all efforts to
include BDRV_BLOCK_OFFSET_VALID in the return if that is possible; if it
is false, then omitting the flag in order to get a larger pnum is
acceptable)).
Fam Zheng July 5, 2017, 2:21 p.m. UTC | #5
On Wed, 07/05 09:01, Eric Blake wrote:
> On 07/05/2017 07:07 AM, Fam Zheng wrote:
> >>>
> >>> Sorry for bikeshedding.
> >>
> >> Not a problem, I also had some double-takes in writing my own code
> >> trying to remember which way I wanted the 'allocation' boolean to be
> >> set, so coming up with a more intuitive name/default state in order to
> >> help legibility is worth it.  Do any of my above suggestions sound better?
> >>
> > 
> > I'd vote for "mapping" because it has a close connection with offset (as in
> > BDRV_BLOCK_OFFSET_VALID).
> > 
> > Or simply call it "offset" and if false, never return BDRV_BLOCK_OFFSET_VALID.
> 
> Well, there ARE drivers that WANT to return BDRV_BLOCK_OFFSET_VALID
> regardless of the state of the boolean (namely, any driver that also
> returns BDRV_BLOCK_RAW, to hint that this is a passthrough and the query
> should be repeated at the next BDS in the chain).  So stating that
> 'offset' is false MUST preclude BDRV_BLOCK_OFFSET_VALID is a bit too
> strong, but I can probably come up with appropriate wording that meets
> in the middle ground (if 'offset' is true, then make all efforts to
> include BDRV_BLOCK_OFFSET_VALID in the return if that is possible; if it
> is false, then omitting the flag in order to get a larger pnum is
> acceptable)).

Yes you are totally right, I thought that too after replied.

Fam
Eric Blake July 6, 2017, 2:35 a.m. UTC | #6
On 07/03/2017 05:14 PM, Eric Blake wrote:
> Not all callers care about which BDS owns the mapping for a given
> range of the file.  In particular, bdrv_is_allocated() cares more
> about finding the largest run of allocated data from the guest
> perspective, whether or not that data is consecutive from the
> host perspective.  Therefore, doing subsequent refinements such
> as checking how much of the format-layer allocation also satisfies
> BDRV_BLOCK_ZERO at the protocol layer is wasted work - in the best
> case, it just costs extra CPU cycles during a single
> bdrv_is_allocated(), but in the worst case, it results in a smaller
> *pnum, and forces callers to iterate through more status probes when
> visiting the entire file for even more extra CPU cycles.
> 
> This patch only optimizes the block layer.  But subsequent patches
> will tweak the driver callback to be byte-based, and in the process,
> can also pass this hint through to the driver.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 

> @@ -1810,12 +1817,13 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
>          }
>      }
> 
> -    if (local_file && local_file != bs &&
> +    if (!allocation && local_file && local_file != bs &&
>          (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
>          (ret & BDRV_BLOCK_OFFSET_VALID)) {
>          int file_pnum;
> 
> -        ret2 = bdrv_co_get_block_status(local_file, ret >> BDRV_SECTOR_BITS,
> +        ret2 = bdrv_co_get_block_status(local_file, true,
> +                                        ret >> BDRV_SECTOR_BITS,
>                                          *pnum, &file_pnum, NULL);
>          if (ret2 >= 0) {
>              /* Ignore errors.  This is just providing extra information, it

Hmm. My initial thinking here was that if we already have a good primary
status, we want our secondary status (where we are probing bs->file for
whether we can add BDRV_BLOCK_ZERO) to be as fast as possible, so I
hard-coded the answer that favors is_allocated (I have to be careful
describing this, since v3 will switch from 'bool allocated=true' to
'bool mapping=false' to express that same request).  But it turns out
that, at least for file-posix.c (for that matter, for several protocol
drivers), it's a LOT faster to just blindly state that the entire file
is allocated and data than it is to lseek(SEEK_HOLE).  So favoring
allocation status instead of mapping status defeats the purpose, and
this should be s/true/allocation/ (which is always false at this point)
[or conversely s/false/mapping/, which is always true, in v3].
diff mbox

Patch

diff --git a/block/io.c b/block/io.c
index 6358d07..719a6b0 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1685,6 +1685,7 @@  typedef struct BdrvCoGetBlockStatusData {
     int nb_sectors;
     int *pnum;
     int64_t ret;
+    bool allocation;
     bool done;
 } BdrvCoGetBlockStatusData;

@@ -1717,6 +1718,10 @@  int64_t coroutine_fn bdrv_co_get_block_status_from_backing(BlockDriverState *bs,
  * Drivers not implementing the functionality are assumed to not support
  * backing files, hence all their sectors are reported as allocated.
  *
+ * If 'allocation' is true, the caller only cares about allocation
+ * status; this is a hint that a larger 'pnum' result is more
+ * important than including BDRV_BLOCK_OFFSET_VALID in the return.
+ *
  * If 'sector_num' is beyond the end of the disk image the return value is
  * BDRV_BLOCK_EOF and 'pnum' is set to 0.
  *
@@ -1733,6 +1738,7 @@  int64_t coroutine_fn bdrv_co_get_block_status_from_backing(BlockDriverState *bs,
  * is allocated in.
  */
 static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
+                                                     bool allocation,
                                                      int64_t sector_num,
                                                      int nb_sectors, int *pnum,
                                                      BlockDriverState **file)
@@ -1791,14 +1797,15 @@  static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,

     if (ret & BDRV_BLOCK_RAW) {
         assert(ret & BDRV_BLOCK_OFFSET_VALID && local_file);
-        ret = bdrv_co_get_block_status(local_file, ret >> BDRV_SECTOR_BITS,
+        ret = bdrv_co_get_block_status(local_file, allocation,
+                                       ret >> BDRV_SECTOR_BITS,
                                        *pnum, pnum, &local_file);
         goto out;
     }

     if (ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO)) {
         ret |= BDRV_BLOCK_ALLOCATED;
-    } else {
+    } else if (!allocation) {
         if (bdrv_unallocated_blocks_are_zero(bs)) {
             ret |= BDRV_BLOCK_ZERO;
         } else if (bs->backing) {
@@ -1810,12 +1817,13 @@  static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
         }
     }

-    if (local_file && local_file != bs &&
+    if (!allocation && local_file && local_file != bs &&
         (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
         (ret & BDRV_BLOCK_OFFSET_VALID)) {
         int file_pnum;

-        ret2 = bdrv_co_get_block_status(local_file, ret >> BDRV_SECTOR_BITS,
+        ret2 = bdrv_co_get_block_status(local_file, true,
+                                        ret >> BDRV_SECTOR_BITS,
                                         *pnum, &file_pnum, NULL);
         if (ret2 >= 0) {
             /* Ignore errors.  This is just providing extra information, it
@@ -1850,6 +1858,7 @@  out:

 static int64_t coroutine_fn bdrv_co_get_block_status_above(BlockDriverState *bs,
         BlockDriverState *base,
+        bool allocation,
         int64_t sector_num,
         int nb_sectors,
         int *pnum,
@@ -1861,7 +1870,8 @@  static int64_t coroutine_fn bdrv_co_get_block_status_above(BlockDriverState *bs,

     assert(bs != base);
     for (p = bs; p != base; p = backing_bs(p)) {
-        ret = bdrv_co_get_block_status(p, sector_num, nb_sectors, pnum, file);
+        ret = bdrv_co_get_block_status(p, allocation, sector_num, nb_sectors,
+                                       pnum, file);
         if (ret < 0) {
             break;
         }
@@ -1891,6 +1901,7 @@  static void coroutine_fn bdrv_get_block_status_above_co_entry(void *opaque)
     BdrvCoGetBlockStatusData *data = opaque;

     data->ret = bdrv_co_get_block_status_above(data->bs, data->base,
+                                               data->allocation,
                                                data->sector_num,
                                                data->nb_sectors,
                                                data->pnum,
@@ -1903,11 +1914,12 @@  static void coroutine_fn bdrv_get_block_status_above_co_entry(void *opaque)
  *
  * See bdrv_co_get_block_status_above() for details.
  */
-int64_t bdrv_get_block_status_above(BlockDriverState *bs,
-                                    BlockDriverState *base,
-                                    int64_t sector_num,
-                                    int nb_sectors, int *pnum,
-                                    BlockDriverState **file)
+static int64_t bdrv_common_block_status_above(BlockDriverState *bs,
+                                              BlockDriverState *base,
+                                              bool allocation,
+                                              int64_t sector_num,
+                                              int nb_sectors, int *pnum,
+                                              BlockDriverState **file)
 {
     Coroutine *co;
     BdrvCoGetBlockStatusData data = {
@@ -1917,6 +1929,7 @@  int64_t bdrv_get_block_status_above(BlockDriverState *bs,
         .sector_num = sector_num,
         .nb_sectors = nb_sectors,
         .pnum = pnum,
+        .allocation = allocation,
         .done = false,
     };

@@ -1932,6 +1945,16 @@  int64_t bdrv_get_block_status_above(BlockDriverState *bs,
     return data.ret;
 }

+int64_t bdrv_get_block_status_above(BlockDriverState *bs,
+                                    BlockDriverState *base,
+                                    int64_t sector_num,
+                                    int nb_sectors, int *pnum,
+                                    BlockDriverState **file)
+{
+    return bdrv_common_block_status_above(bs, base, false, sector_num,
+                                          nb_sectors, pnum, file);
+}
+
 int64_t bdrv_get_block_status(BlockDriverState *bs,
                               int64_t sector_num,
                               int nb_sectors, int *pnum,
@@ -1944,16 +1967,16 @@  int64_t bdrv_get_block_status(BlockDriverState *bs,
 int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
                                    int64_t bytes, int64_t *pnum)
 {
-    int64_t sector_num = offset >> BDRV_SECTOR_BITS;
-    int nb_sectors = bytes >> BDRV_SECTOR_BITS;
     int64_t ret;
     int psectors;

     assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
     assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE) &&
            bytes < INT_MAX * BDRV_SECTOR_SIZE);
-    ret = bdrv_get_block_status(bs, sector_num, nb_sectors, &psectors,
-                                NULL);
+    ret = bdrv_common_block_status_above(bs, backing_bs(bs), true,
+                                         offset >> BDRV_SECTOR_BITS,
+                                         bytes >> BDRV_SECTOR_BITS, &psectors,
+                                         NULL);
     if (ret < 0) {
         return ret;
     }