diff mbox

[1/4] block: add bdrv_get_format_alloc_stat format interface

Message ID 20170530104857.70083-2-vsementsov@virtuozzo.com
State New
Headers show

Commit Message

Vladimir Sementsov-Ogievskiy May 30, 2017, 10:48 a.m. UTC
The function should collect statistics, about allocted/unallocated by
top-level format driver space (in its .file) and allocation status
(allocated/hole/after eof) of corresponding areas in this .file.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block.c                   | 16 ++++++++++++++++
 include/block/block.h     |  3 +++
 include/block/block_int.h |  2 ++
 qapi/block-core.json      | 26 ++++++++++++++++++++++++++
 4 files changed, 47 insertions(+)

Comments

Eric Blake May 30, 2017, 2:53 p.m. UTC | #1
On 05/30/2017 05:48 AM, Vladimir Sementsov-Ogievskiy wrote:
> The function should collect statistics, about allocted/unallocated by
> top-level format driver space (in its .file) and allocation status
> (allocated/hole/after eof) of corresponding areas in this .file.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block.c                   | 16 ++++++++++++++++
>  include/block/block.h     |  3 +++
>  include/block/block_int.h |  2 ++
>  qapi/block-core.json      | 26 ++++++++++++++++++++++++++
>  4 files changed, 47 insertions(+)
> 
> diff --git a/block.c b/block.c
> index 50ba264143..7d720ae0c2 100644
> --- a/block.c

> +++ b/qapi/block-core.json
> @@ -139,6 +139,32 @@
>             '*format-specific': 'ImageInfoSpecific' } }
>  
>  ##
> +# @BlockFormatAllocInfo:
> +#
> +# Information about allocations, including metadata. All fields are in bytes.
> +#
> +# @alloc_alloc: allocated by format driver and allocated in underlying file
> +#

New structs should favor naming with '-' rather than '_'. So this should
be 'alloc-alloc', if we like the name.

So this statistic represents the portions of the format file that are in
use by the format, and which are backed by data in the underlying protocol.

> +# @alloc_hole: allocated by format driver but actually is a hole in
> +#              underlying file

The portions of the format file in use by the format, but where the
entire cluster is a hole in the underlying file (note that with cluster
size large enough, you can get a cluster that is part-data/part-hole in
the underlying file, it looks like you are counting those as data).

> +#
> +# @alloc_overhead: allocated by format driver after end of underlying file

The portions of the format file in use by the format, but where the
entire cluster is beyond the end of the underlying file (the effective
hole).  Do we really need to distinguish between hole within the
underlying file and hole beyond the end of the file? Or can this number
be combined with the previous?

> +#
> +# @hole_alloc: not allocated by format driver but allocated in underlying file

I'm not sure I like this name.  "Hole" in file-system terminology means
"reads-as-zero" - but here we are talking about portions of the format
layer that are unallocated (defer to backing file, and we can't
guarantee they read as zero unless there is no backing file).

This statistic represents the portion of the underlying file that has
been previously allocated to hold data, but which is now unused by the
format layer (in other words, leaked clusters that can be reclaimed, or
which can be reused instead of growing the underlying the file if new
clusters are allocated).

> +#
> +# @hole_hole: not allocated by format driver hole in underlying file

This statistic represents fragmentation - portions of the format layer
that are no longer in use, but which are also occupying no space in the
underlying file.  A refragmentation operation (if we ever implemented
one) could remove the address space used by these clusters, but would
not change the disk usage.

> +#
> +# Since: 2.10
> +#
> +##
> +{ 'struct': 'BlockFormatAllocInfo',
> +  'data': {'alloc_alloc':    'uint64',
> +           'alloc_hole':     'uint64',
> +           'alloc_overhead': 'uint64',
> +           'hole_alloc':     'uint64',
> +           'hole_hole':      'uint64' } }

The idea seems okay, but the naming needs to be fixed.  Also, I'm not
sure if we need all 5, or if 4 is enough; and I'm not sure if we have
the right names ("how does alloc-hole differ from hole-alloc?"), or if
we can come up with something more descriptive.  Particularly since
"hole-" is not a hole in the filesystem sense, so much as unused
clusters.  But I'm also not coming up with better names to suggest at
the moment.
Vladimir Sementsov-Ogievskiy May 30, 2017, 3:27 p.m. UTC | #2
30.05.2017 17:53, Eric Blake wrote:
> On 05/30/2017 05:48 AM, Vladimir Sementsov-Ogievskiy wrote:
>> The function should collect statistics, about allocted/unallocated by
>> top-level format driver space (in its .file) and allocation status
>> (allocated/hole/after eof) of corresponding areas in this .file.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block.c                   | 16 ++++++++++++++++
>>   include/block/block.h     |  3 +++
>>   include/block/block_int.h |  2 ++
>>   qapi/block-core.json      | 26 ++++++++++++++++++++++++++
>>   4 files changed, 47 insertions(+)
>>
>> diff --git a/block.c b/block.c
>> index 50ba264143..7d720ae0c2 100644
>> --- a/block.c
>> +++ b/qapi/block-core.json
>> @@ -139,6 +139,32 @@
>>              '*format-specific': 'ImageInfoSpecific' } }
>>   
>>   ##
>> +# @BlockFormatAllocInfo:
>> +#
>> +# Information about allocations, including metadata. All fields are in bytes.
>> +#
>> +# @alloc_alloc: allocated by format driver and allocated in underlying file
>> +#
> New structs should favor naming with '-' rather than '_'. So this should
> be 'alloc-alloc', if we like the name.
>
> So this statistic represents the portions of the format file that are in
> use by the format, and which are backed by data in the underlying protocol.
>
>> +# @alloc_hole: allocated by format driver but actually is a hole in
>> +#              underlying file
> The portions of the format file in use by the format, but where the
> entire cluster is a hole in the underlying file (note that with cluster
> size large enough, you can get a cluster that is part-data/part-hole in
> the underlying file, it looks like you are counting those as data).

No, I account on sector boundary on bs->file. So it is not cluster-aligned.

>
>> +#
>> +# @alloc_overhead: allocated by format driver after end of underlying file
> The portions of the format file in use by the format, but where the
> entire cluster is beyond the end of the underlying file (the effective
> hole).  Do we really need to distinguish between hole within the
> underlying file and hole beyond the end of the file? Or can this number
> be combined with the previous?

alloc_alloc + alloc_hole + hole_alloc + hole_hole = size of bs->file, I 
think this is good and clear.

these 4 stats describes bs->file usage in details (not on cluster 
boundary, but on sector, based on bs->file block status), alloc_overhead 
is separate.

>
>> +#
>> +# @hole_alloc: not allocated by format driver but allocated in underlying file
> I'm not sure I like this name.  "Hole" in file-system terminology means
> "reads-as-zero" - but here we are talking about portions of the format
> layer that are unallocated (defer to backing file, and we can't
> guarantee they read as zero unless there is no backing file).
>
> This statistic represents the portion of the underlying file that has
> been previously allocated to hold data, but which is now unused by the
> format layer (in other words, leaked clusters that can be reclaimed, or
> which can be reused instead of growing the underlying the file if new
> clusters are allocated).
>
>> +#
>> +# @hole_hole: not allocated by format driver hole in underlying file
> This statistic represents fragmentation - portions of the format layer
> that are no longer in use, but which are also occupying no space in the
> underlying file.  A refragmentation operation (if we ever implemented
> one) could remove the address space used by these clusters, but would
> not change the disk usage.
>
>> +#
>> +# Since: 2.10
>> +#
>> +##
>> +{ 'struct': 'BlockFormatAllocInfo',
>> +  'data': {'alloc_alloc':    'uint64',
>> +           'alloc_hole':     'uint64',
>> +           'alloc_overhead': 'uint64',
>> +           'hole_alloc':     'uint64',
>> +           'hole_hole':      'uint64' } }
> The idea seems okay, but the naming needs to be fixed.  Also, I'm not
> sure if we need all 5, or if 4 is enough; and I'm not sure if we have
> the right names ("how does alloc-hole differ from hole-alloc?"), or if
> we can come up with something more descriptive.  Particularly since
> "hole-" is not a hole in the filesystem sense, so much as unused
> clusters.  But I'm also not coming up with better names to suggest at
> the moment.
>
yes, naming is a problem. Proposed has the advantage of short names, and 
easy to use, if you have read documentation) But without reading docs, 
it is hard to distinguish, what do they mean..
Eric Blake May 30, 2017, 3:43 p.m. UTC | #3
On 05/30/2017 10:27 AM, Vladimir Sementsov-Ogievskiy wrote:
> 30.05.2017 17:53, Eric Blake wrote:
>> On 05/30/2017 05:48 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> The function should collect statistics, about allocted/unallocated by
>>> top-level format driver space (in its .file) and allocation status
>>> (allocated/hole/after eof) of corresponding areas in this .file.
>>>

>>> +# @alloc_hole: allocated by format driver but actually is a hole in
>>> +#              underlying file
>> The portions of the format file in use by the format, but where the
>> entire cluster is a hole in the underlying file (note that with cluster
>> size large enough, you can get a cluster that is part-data/part-hole in
>> the underlying file, it looks like you are counting those as data).
> 
> No, I account on sector boundary on bs->file. So it is not cluster-aligned.

Okay, worth knowing (and therefore worth including in the documentation).

> 
>>
>>> +#
>>> +# @alloc_overhead: allocated by format driver after end of
>>> underlying file
>> The portions of the format file in use by the format, but where the
>> entire cluster is beyond the end of the underlying file (the effective
>> hole).  Do we really need to distinguish between hole within the
>> underlying file and hole beyond the end of the file? Or can this number
>> be combined with the previous?
> 
> alloc_alloc + alloc_hole + hole_alloc + hole_hole = size of bs->file, I
> think this is good and clear.

That relationship is worth including in the documentation.

> 
> these 4 stats describes bs->file usage in details (not on cluster
> boundary, but on sector, based on bs->file block status), alloc_overhead
> is separate.

I'd lean more towards alloc-overrun than alloc-overhead (we have
metadata overhead no matter what, but overrun does a nice job of
explaining offsets that are important to the format but which are not
present in the underlying protocol).
Vladimir Sementsov-Ogievskiy June 2, 2017, 3:26 p.m. UTC | #4
30.05.2017 17:53, Eric Blake wrote:
> On 05/30/2017 05:48 AM, Vladimir Sementsov-Ogievskiy wrote:
>> The function should collect statistics, about allocted/unallocated by
>> top-level format driver space (in its .file) and allocation status
>> (allocated/hole/after eof) of corresponding areas in this .file.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block.c                   | 16 ++++++++++++++++
>>   include/block/block.h     |  3 +++
>>   include/block/block_int.h |  2 ++
>>   qapi/block-core.json      | 26 ++++++++++++++++++++++++++
>>   4 files changed, 47 insertions(+)
>>
>> diff --git a/block.c b/block.c
>> index 50ba264143..7d720ae0c2 100644
>> --- a/block.c
>> +++ b/qapi/block-core.json
>> @@ -139,6 +139,32 @@
>>              '*format-specific': 'ImageInfoSpecific' } }
>>   
>>   ##
>> +# @BlockFormatAllocInfo:
>> +#
>> +# Information about allocations, including metadata. All fields are in bytes.

s/All fields are in bytes./All fields are in bytes and aligned by sector 
(512 bytes)./

- ok? to emphasize that there is nothing about clusters... Or may be 
better to write that they are aligned by byte.

>> +#
>> +# @alloc_alloc: allocated by format driver and allocated in underlying file
>> +#
> New structs should favor naming with '-' rather than '_'. So this should
> be 'alloc-alloc', if we like the name.
>
> So this statistic represents the portions of the format file that are in
> use by the format, and which are backed by data in the underlying protocol.
>
>> +# @alloc_hole: allocated by format driver but actually is a hole in
>> +#              underlying file
> The portions of the format file in use by the format, but where the
> entire cluster is a hole in the underlying file (note that with cluster
> size large enough, you can get a cluster that is part-data/part-hole in
> the underlying file, it looks like you are counting those as data).
>
>> +#
>> +# @alloc_overhead: allocated by format driver after end of underlying file
> The portions of the format file in use by the format, but where the
> entire cluster is beyond the end of the underlying file (the effective
> hole).  Do we really need to distinguish between hole within the
> underlying file and hole beyond the end of the file? Or can this number
> be combined with the previous?
>
>> +#
>> +# @hole_alloc: not allocated by format driver but allocated in underlying file
> I'm not sure I like this name.  "Hole" in file-system terminology means
> "reads-as-zero" - but here we are talking about portions of the format
> layer that are unallocated (defer to backing file, and we can't
> guarantee they read as zero unless there is no backing file).
>
> This statistic represents the portion of the underlying file that has
> been previously allocated to hold data, but which is now unused by the
> format layer (in other words, leaked clusters that can be reclaimed, or
> which can be reused instead of growing the underlying the file if new
> clusters are allocated).
>
>> +#
>> +# @hole_hole: not allocated by format driver hole in underlying file
> This statistic represents fragmentation - portions of the format layer
> that are no longer in use, but which are also occupying no space in the
> underlying file.  A refragmentation operation (if we ever implemented
> one) could remove the address space used by these clusters, but would
> not change the disk usage.
>
>> +#
>> +# Since: 2.10
>> +#
>> +##
>> +{ 'struct': 'BlockFormatAllocInfo',
>> +  'data': {'alloc_alloc':    'uint64',
>> +           'alloc_hole':     'uint64',
>> +           'alloc_overhead': 'uint64',
>> +           'hole_alloc':     'uint64',
>> +           'hole_hole':      'uint64' } }
> The idea seems okay, but the naming needs to be fixed.  Also, I'm not
> sure if we need all 5, or if 4 is enough; and I'm not sure if we have
> the right names ("how does alloc-hole differ from hole-alloc?"), or if
> we can come up with something more descriptive.  Particularly since
> "hole-" is not a hole in the filesystem sense, so much as unused
> clusters.  But I'm also not coming up with better names to suggest at
> the moment.
>
how about:

used-allocated
used-discarded
used-overrun

unused-allocated
unused-discarded


also, do you mention that your detailed wordings should be included into 
block-core.json or you just clarify things?
Eric Blake June 6, 2017, 12:08 p.m. UTC | #5
On 06/02/2017 10:26 AM, Vladimir Sementsov-Ogievskiy wrote:
> 30.05.2017 17:53, Eric Blake wrote:
>> On 05/30/2017 05:48 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> The function should collect statistics, about allocted/unallocated by
>>> top-level format driver space (in its .file) and allocation status
>>> (allocated/hole/after eof) of corresponding areas in this .file.
>>>

>>> +# @BlockFormatAllocInfo:
>>> +#
>>> +# Information about allocations, including metadata. All fields are
>>> in bytes.
> 
> s/All fields are in bytes./All fields are in bytes and aligned by sector
> (512 bytes)./

I wouldn't even promise sector alignment. We probably happen to have
sector alignment (especially for qcow2, since the smallest cluster size
permitted is sector aligned), but I see no inherent reason why we can't
support sub-sector values if we are reporting in bytes.

> 
> - ok? to emphasize that there is nothing about clusters... Or may be
> better to write that they are aligned by byte.

I think "All fields are in bytes" is sufficient.


>>> +{ 'struct': 'BlockFormatAllocInfo',
>>> +  'data': {'alloc_alloc':    'uint64',
>>> +           'alloc_hole':     'uint64',
>>> +           'alloc_overhead': 'uint64',
>>> +           'hole_alloc':     'uint64',
>>> +           'hole_hole':      'uint64' } }
>> The idea seems okay, but the naming needs to be fixed.  Also, I'm not
>> sure if we need all 5, or if 4 is enough; and I'm not sure if we have
>> the right names ("how does alloc-hole differ from hole-alloc?"), or if
>> we can come up with something more descriptive.  Particularly since
>> "hole-" is not a hole in the filesystem sense, so much as unused
>> clusters.  But I'm also not coming up with better names to suggest at
>> the moment.
>>
> how about:
> 
> used-allocated
> used-discarded
> used-overrun
> 
> unused-allocated
> unused-discarded

Those work for me.

> 
> 
> also, do you mention that your detailed wordings should be included into
> block-core.json or you just clarify things?

Good documentation is worth the effort. I don't know if you want all of
my details in block-core.json, but giving a better overview of how each
statistic is possible does make it easier to visualize what the
statistic is actually counting.
diff mbox

Patch

diff --git a/block.c b/block.c
index 50ba264143..7d720ae0c2 100644
--- a/block.c
+++ b/block.c
@@ -3407,6 +3407,22 @@  int64_t bdrv_get_allocated_file_size(BlockDriverState *bs)
 }
 
 /**
+ * Collect format allocation info. See BlockFormatAllocInfo definition in
+ * qapi/block-core.json.
+ */
+int bdrv_get_format_alloc_stat(BlockDriverState *bs, BlockFormatAllocInfo *bfai)
+{
+    BlockDriver *drv = bs->drv;
+    if (!drv) {
+        return -ENOMEDIUM;
+    }
+    if (drv->bdrv_get_format_alloc_stat) {
+        return drv->bdrv_get_format_alloc_stat(bs, bfai);
+    }
+    return -ENOTSUP;
+}
+
+/**
  * Return number of sectors on success, -errno on error.
  */
 int64_t bdrv_nb_sectors(BlockDriverState *bs)
diff --git a/include/block/block.h b/include/block/block.h
index 9b355e92d8..646376a772 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -335,6 +335,9 @@  typedef enum {
 
 int bdrv_check(BlockDriverState *bs, BdrvCheckResult *res, BdrvCheckMode fix);
 
+int bdrv_get_format_alloc_stat(BlockDriverState *bs,
+                               BlockFormatAllocInfo *bfai);
+
 /* The units of offset and total_work_size may be chosen arbitrarily by the
  * block driver; total_work_size may change during the course of the amendment
  * operation */
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 8d3724cce6..458c715e99 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -208,6 +208,8 @@  struct BlockDriver {
     int64_t (*bdrv_getlength)(BlockDriverState *bs);
     bool has_variable_length;
     int64_t (*bdrv_get_allocated_file_size)(BlockDriverState *bs);
+    int (*bdrv_get_format_alloc_stat)(BlockDriverState *bs,
+                                      BlockFormatAllocInfo *bfai);
 
     int coroutine_fn (*bdrv_co_pwritev_compressed)(BlockDriverState *bs,
         uint64_t offset, uint64_t bytes, QEMUIOVector *qiov);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index ea0b3e8b13..365070b3eb 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -139,6 +139,32 @@ 
            '*format-specific': 'ImageInfoSpecific' } }
 
 ##
+# @BlockFormatAllocInfo:
+#
+# Information about allocations, including metadata. All fields are in bytes.
+#
+# @alloc_alloc: allocated by format driver and allocated in underlying file
+#
+# @alloc_hole: allocated by format driver but actually is a hole in
+#              underlying file
+#
+# @alloc_overhead: allocated by format driver after end of underlying file
+#
+# @hole_alloc: not allocated by format driver but allocated in underlying file
+#
+# @hole_hole: not allocated by format driver hole in underlying file
+#
+# Since: 2.10
+#
+##
+{ 'struct': 'BlockFormatAllocInfo',
+  'data': {'alloc_alloc':    'uint64',
+           'alloc_hole':     'uint64',
+           'alloc_overhead': 'uint64',
+           'hole_alloc':     'uint64',
+           'hole_hole':      'uint64' } }
+
+##
 # @ImageCheck:
 #
 # Information about a QEMU image file check