Message ID | 20170530104857.70083-2-vsementsov@virtuozzo.com |
---|---|
State | New |
Headers | show |
On 05/30/2017 05:48 AM, Vladimir Sementsov-Ogievskiy wrote: > The function should collect statistics, about allocted/unallocated by > top-level format driver space (in its .file) and allocation status > (allocated/hole/after eof) of corresponding areas in this .file. > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> > --- > block.c | 16 ++++++++++++++++ > include/block/block.h | 3 +++ > include/block/block_int.h | 2 ++ > qapi/block-core.json | 26 ++++++++++++++++++++++++++ > 4 files changed, 47 insertions(+) > > diff --git a/block.c b/block.c > index 50ba264143..7d720ae0c2 100644 > --- a/block.c > +++ b/qapi/block-core.json > @@ -139,6 +139,32 @@ > '*format-specific': 'ImageInfoSpecific' } } > > ## > +# @BlockFormatAllocInfo: > +# > +# Information about allocations, including metadata. All fields are in bytes. > +# > +# @alloc_alloc: allocated by format driver and allocated in underlying file > +# New structs should favor naming with '-' rather than '_'. So this should be 'alloc-alloc', if we like the name. So this statistic represents the portions of the format file that are in use by the format, and which are backed by data in the underlying protocol. > +# @alloc_hole: allocated by format driver but actually is a hole in > +# underlying file The portions of the format file in use by the format, but where the entire cluster is a hole in the underlying file (note that with cluster size large enough, you can get a cluster that is part-data/part-hole in the underlying file, it looks like you are counting those as data). > +# > +# @alloc_overhead: allocated by format driver after end of underlying file The portions of the format file in use by the format, but where the entire cluster is beyond the end of the underlying file (the effective hole). Do we really need to distinguish between hole within the underlying file and hole beyond the end of the file? Or can this number be combined with the previous? > +# > +# @hole_alloc: not allocated by format driver but allocated in underlying file I'm not sure I like this name. "Hole" in file-system terminology means "reads-as-zero" - but here we are talking about portions of the format layer that are unallocated (defer to backing file, and we can't guarantee they read as zero unless there is no backing file). This statistic represents the portion of the underlying file that has been previously allocated to hold data, but which is now unused by the format layer (in other words, leaked clusters that can be reclaimed, or which can be reused instead of growing the underlying the file if new clusters are allocated). > +# > +# @hole_hole: not allocated by format driver hole in underlying file This statistic represents fragmentation - portions of the format layer that are no longer in use, but which are also occupying no space in the underlying file. A refragmentation operation (if we ever implemented one) could remove the address space used by these clusters, but would not change the disk usage. > +# > +# Since: 2.10 > +# > +## > +{ 'struct': 'BlockFormatAllocInfo', > + 'data': {'alloc_alloc': 'uint64', > + 'alloc_hole': 'uint64', > + 'alloc_overhead': 'uint64', > + 'hole_alloc': 'uint64', > + 'hole_hole': 'uint64' } } The idea seems okay, but the naming needs to be fixed. Also, I'm not sure if we need all 5, or if 4 is enough; and I'm not sure if we have the right names ("how does alloc-hole differ from hole-alloc?"), or if we can come up with something more descriptive. Particularly since "hole-" is not a hole in the filesystem sense, so much as unused clusters. But I'm also not coming up with better names to suggest at the moment.
30.05.2017 17:53, Eric Blake wrote: > On 05/30/2017 05:48 AM, Vladimir Sementsov-Ogievskiy wrote: >> The function should collect statistics, about allocted/unallocated by >> top-level format driver space (in its .file) and allocation status >> (allocated/hole/after eof) of corresponding areas in this .file. >> >> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> >> --- >> block.c | 16 ++++++++++++++++ >> include/block/block.h | 3 +++ >> include/block/block_int.h | 2 ++ >> qapi/block-core.json | 26 ++++++++++++++++++++++++++ >> 4 files changed, 47 insertions(+) >> >> diff --git a/block.c b/block.c >> index 50ba264143..7d720ae0c2 100644 >> --- a/block.c >> +++ b/qapi/block-core.json >> @@ -139,6 +139,32 @@ >> '*format-specific': 'ImageInfoSpecific' } } >> >> ## >> +# @BlockFormatAllocInfo: >> +# >> +# Information about allocations, including metadata. All fields are in bytes. >> +# >> +# @alloc_alloc: allocated by format driver and allocated in underlying file >> +# > New structs should favor naming with '-' rather than '_'. So this should > be 'alloc-alloc', if we like the name. > > So this statistic represents the portions of the format file that are in > use by the format, and which are backed by data in the underlying protocol. > >> +# @alloc_hole: allocated by format driver but actually is a hole in >> +# underlying file > The portions of the format file in use by the format, but where the > entire cluster is a hole in the underlying file (note that with cluster > size large enough, you can get a cluster that is part-data/part-hole in > the underlying file, it looks like you are counting those as data). No, I account on sector boundary on bs->file. So it is not cluster-aligned. > >> +# >> +# @alloc_overhead: allocated by format driver after end of underlying file > The portions of the format file in use by the format, but where the > entire cluster is beyond the end of the underlying file (the effective > hole). Do we really need to distinguish between hole within the > underlying file and hole beyond the end of the file? Or can this number > be combined with the previous? alloc_alloc + alloc_hole + hole_alloc + hole_hole = size of bs->file, I think this is good and clear. these 4 stats describes bs->file usage in details (not on cluster boundary, but on sector, based on bs->file block status), alloc_overhead is separate. > >> +# >> +# @hole_alloc: not allocated by format driver but allocated in underlying file > I'm not sure I like this name. "Hole" in file-system terminology means > "reads-as-zero" - but here we are talking about portions of the format > layer that are unallocated (defer to backing file, and we can't > guarantee they read as zero unless there is no backing file). > > This statistic represents the portion of the underlying file that has > been previously allocated to hold data, but which is now unused by the > format layer (in other words, leaked clusters that can be reclaimed, or > which can be reused instead of growing the underlying the file if new > clusters are allocated). > >> +# >> +# @hole_hole: not allocated by format driver hole in underlying file > This statistic represents fragmentation - portions of the format layer > that are no longer in use, but which are also occupying no space in the > underlying file. A refragmentation operation (if we ever implemented > one) could remove the address space used by these clusters, but would > not change the disk usage. > >> +# >> +# Since: 2.10 >> +# >> +## >> +{ 'struct': 'BlockFormatAllocInfo', >> + 'data': {'alloc_alloc': 'uint64', >> + 'alloc_hole': 'uint64', >> + 'alloc_overhead': 'uint64', >> + 'hole_alloc': 'uint64', >> + 'hole_hole': 'uint64' } } > The idea seems okay, but the naming needs to be fixed. Also, I'm not > sure if we need all 5, or if 4 is enough; and I'm not sure if we have > the right names ("how does alloc-hole differ from hole-alloc?"), or if > we can come up with something more descriptive. Particularly since > "hole-" is not a hole in the filesystem sense, so much as unused > clusters. But I'm also not coming up with better names to suggest at > the moment. > yes, naming is a problem. Proposed has the advantage of short names, and easy to use, if you have read documentation) But without reading docs, it is hard to distinguish, what do they mean..
On 05/30/2017 10:27 AM, Vladimir Sementsov-Ogievskiy wrote: > 30.05.2017 17:53, Eric Blake wrote: >> On 05/30/2017 05:48 AM, Vladimir Sementsov-Ogievskiy wrote: >>> The function should collect statistics, about allocted/unallocated by >>> top-level format driver space (in its .file) and allocation status >>> (allocated/hole/after eof) of corresponding areas in this .file. >>> >>> +# @alloc_hole: allocated by format driver but actually is a hole in >>> +# underlying file >> The portions of the format file in use by the format, but where the >> entire cluster is a hole in the underlying file (note that with cluster >> size large enough, you can get a cluster that is part-data/part-hole in >> the underlying file, it looks like you are counting those as data). > > No, I account on sector boundary on bs->file. So it is not cluster-aligned. Okay, worth knowing (and therefore worth including in the documentation). > >> >>> +# >>> +# @alloc_overhead: allocated by format driver after end of >>> underlying file >> The portions of the format file in use by the format, but where the >> entire cluster is beyond the end of the underlying file (the effective >> hole). Do we really need to distinguish between hole within the >> underlying file and hole beyond the end of the file? Or can this number >> be combined with the previous? > > alloc_alloc + alloc_hole + hole_alloc + hole_hole = size of bs->file, I > think this is good and clear. That relationship is worth including in the documentation. > > these 4 stats describes bs->file usage in details (not on cluster > boundary, but on sector, based on bs->file block status), alloc_overhead > is separate. I'd lean more towards alloc-overrun than alloc-overhead (we have metadata overhead no matter what, but overrun does a nice job of explaining offsets that are important to the format but which are not present in the underlying protocol).
30.05.2017 17:53, Eric Blake wrote: > On 05/30/2017 05:48 AM, Vladimir Sementsov-Ogievskiy wrote: >> The function should collect statistics, about allocted/unallocated by >> top-level format driver space (in its .file) and allocation status >> (allocated/hole/after eof) of corresponding areas in this .file. >> >> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> >> --- >> block.c | 16 ++++++++++++++++ >> include/block/block.h | 3 +++ >> include/block/block_int.h | 2 ++ >> qapi/block-core.json | 26 ++++++++++++++++++++++++++ >> 4 files changed, 47 insertions(+) >> >> diff --git a/block.c b/block.c >> index 50ba264143..7d720ae0c2 100644 >> --- a/block.c >> +++ b/qapi/block-core.json >> @@ -139,6 +139,32 @@ >> '*format-specific': 'ImageInfoSpecific' } } >> >> ## >> +# @BlockFormatAllocInfo: >> +# >> +# Information about allocations, including metadata. All fields are in bytes. s/All fields are in bytes./All fields are in bytes and aligned by sector (512 bytes)./ - ok? to emphasize that there is nothing about clusters... Or may be better to write that they are aligned by byte. >> +# >> +# @alloc_alloc: allocated by format driver and allocated in underlying file >> +# > New structs should favor naming with '-' rather than '_'. So this should > be 'alloc-alloc', if we like the name. > > So this statistic represents the portions of the format file that are in > use by the format, and which are backed by data in the underlying protocol. > >> +# @alloc_hole: allocated by format driver but actually is a hole in >> +# underlying file > The portions of the format file in use by the format, but where the > entire cluster is a hole in the underlying file (note that with cluster > size large enough, you can get a cluster that is part-data/part-hole in > the underlying file, it looks like you are counting those as data). > >> +# >> +# @alloc_overhead: allocated by format driver after end of underlying file > The portions of the format file in use by the format, but where the > entire cluster is beyond the end of the underlying file (the effective > hole). Do we really need to distinguish between hole within the > underlying file and hole beyond the end of the file? Or can this number > be combined with the previous? > >> +# >> +# @hole_alloc: not allocated by format driver but allocated in underlying file > I'm not sure I like this name. "Hole" in file-system terminology means > "reads-as-zero" - but here we are talking about portions of the format > layer that are unallocated (defer to backing file, and we can't > guarantee they read as zero unless there is no backing file). > > This statistic represents the portion of the underlying file that has > been previously allocated to hold data, but which is now unused by the > format layer (in other words, leaked clusters that can be reclaimed, or > which can be reused instead of growing the underlying the file if new > clusters are allocated). > >> +# >> +# @hole_hole: not allocated by format driver hole in underlying file > This statistic represents fragmentation - portions of the format layer > that are no longer in use, but which are also occupying no space in the > underlying file. A refragmentation operation (if we ever implemented > one) could remove the address space used by these clusters, but would > not change the disk usage. > >> +# >> +# Since: 2.10 >> +# >> +## >> +{ 'struct': 'BlockFormatAllocInfo', >> + 'data': {'alloc_alloc': 'uint64', >> + 'alloc_hole': 'uint64', >> + 'alloc_overhead': 'uint64', >> + 'hole_alloc': 'uint64', >> + 'hole_hole': 'uint64' } } > The idea seems okay, but the naming needs to be fixed. Also, I'm not > sure if we need all 5, or if 4 is enough; and I'm not sure if we have > the right names ("how does alloc-hole differ from hole-alloc?"), or if > we can come up with something more descriptive. Particularly since > "hole-" is not a hole in the filesystem sense, so much as unused > clusters. But I'm also not coming up with better names to suggest at > the moment. > how about: used-allocated used-discarded used-overrun unused-allocated unused-discarded also, do you mention that your detailed wordings should be included into block-core.json or you just clarify things?
On 06/02/2017 10:26 AM, Vladimir Sementsov-Ogievskiy wrote: > 30.05.2017 17:53, Eric Blake wrote: >> On 05/30/2017 05:48 AM, Vladimir Sementsov-Ogievskiy wrote: >>> The function should collect statistics, about allocted/unallocated by >>> top-level format driver space (in its .file) and allocation status >>> (allocated/hole/after eof) of corresponding areas in this .file. >>> >>> +# @BlockFormatAllocInfo: >>> +# >>> +# Information about allocations, including metadata. All fields are >>> in bytes. > > s/All fields are in bytes./All fields are in bytes and aligned by sector > (512 bytes)./ I wouldn't even promise sector alignment. We probably happen to have sector alignment (especially for qcow2, since the smallest cluster size permitted is sector aligned), but I see no inherent reason why we can't support sub-sector values if we are reporting in bytes. > > - ok? to emphasize that there is nothing about clusters... Or may be > better to write that they are aligned by byte. I think "All fields are in bytes" is sufficient. >>> +{ 'struct': 'BlockFormatAllocInfo', >>> + 'data': {'alloc_alloc': 'uint64', >>> + 'alloc_hole': 'uint64', >>> + 'alloc_overhead': 'uint64', >>> + 'hole_alloc': 'uint64', >>> + 'hole_hole': 'uint64' } } >> The idea seems okay, but the naming needs to be fixed. Also, I'm not >> sure if we need all 5, or if 4 is enough; and I'm not sure if we have >> the right names ("how does alloc-hole differ from hole-alloc?"), or if >> we can come up with something more descriptive. Particularly since >> "hole-" is not a hole in the filesystem sense, so much as unused >> clusters. But I'm also not coming up with better names to suggest at >> the moment. >> > how about: > > used-allocated > used-discarded > used-overrun > > unused-allocated > unused-discarded Those work for me. > > > also, do you mention that your detailed wordings should be included into > block-core.json or you just clarify things? Good documentation is worth the effort. I don't know if you want all of my details in block-core.json, but giving a better overview of how each statistic is possible does make it easier to visualize what the statistic is actually counting.
diff --git a/block.c b/block.c index 50ba264143..7d720ae0c2 100644 --- a/block.c +++ b/block.c @@ -3407,6 +3407,22 @@ int64_t bdrv_get_allocated_file_size(BlockDriverState *bs) } /** + * Collect format allocation info. See BlockFormatAllocInfo definition in + * qapi/block-core.json. + */ +int bdrv_get_format_alloc_stat(BlockDriverState *bs, BlockFormatAllocInfo *bfai) +{ + BlockDriver *drv = bs->drv; + if (!drv) { + return -ENOMEDIUM; + } + if (drv->bdrv_get_format_alloc_stat) { + return drv->bdrv_get_format_alloc_stat(bs, bfai); + } + return -ENOTSUP; +} + +/** * Return number of sectors on success, -errno on error. */ int64_t bdrv_nb_sectors(BlockDriverState *bs) diff --git a/include/block/block.h b/include/block/block.h index 9b355e92d8..646376a772 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -335,6 +335,9 @@ typedef enum { int bdrv_check(BlockDriverState *bs, BdrvCheckResult *res, BdrvCheckMode fix); +int bdrv_get_format_alloc_stat(BlockDriverState *bs, + BlockFormatAllocInfo *bfai); + /* The units of offset and total_work_size may be chosen arbitrarily by the * block driver; total_work_size may change during the course of the amendment * operation */ diff --git a/include/block/block_int.h b/include/block/block_int.h index 8d3724cce6..458c715e99 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -208,6 +208,8 @@ struct BlockDriver { int64_t (*bdrv_getlength)(BlockDriverState *bs); bool has_variable_length; int64_t (*bdrv_get_allocated_file_size)(BlockDriverState *bs); + int (*bdrv_get_format_alloc_stat)(BlockDriverState *bs, + BlockFormatAllocInfo *bfai); int coroutine_fn (*bdrv_co_pwritev_compressed)(BlockDriverState *bs, uint64_t offset, uint64_t bytes, QEMUIOVector *qiov); diff --git a/qapi/block-core.json b/qapi/block-core.json index ea0b3e8b13..365070b3eb 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -139,6 +139,32 @@ '*format-specific': 'ImageInfoSpecific' } } ## +# @BlockFormatAllocInfo: +# +# Information about allocations, including metadata. All fields are in bytes. +# +# @alloc_alloc: allocated by format driver and allocated in underlying file +# +# @alloc_hole: allocated by format driver but actually is a hole in +# underlying file +# +# @alloc_overhead: allocated by format driver after end of underlying file +# +# @hole_alloc: not allocated by format driver but allocated in underlying file +# +# @hole_hole: not allocated by format driver hole in underlying file +# +# Since: 2.10 +# +## +{ 'struct': 'BlockFormatAllocInfo', + 'data': {'alloc_alloc': 'uint64', + 'alloc_hole': 'uint64', + 'alloc_overhead': 'uint64', + 'hole_alloc': 'uint64', + 'hole_hole': 'uint64' } } + +## # @ImageCheck: # # Information about a QEMU image file check
The function should collect statistics, about allocted/unallocated by top-level format driver space (in its .file) and allocation status (allocated/hole/after eof) of corresponding areas in this .file. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> --- block.c | 16 ++++++++++++++++ include/block/block.h | 3 +++ include/block/block_int.h | 2 ++ qapi/block-core.json | 26 ++++++++++++++++++++++++++ 4 files changed, 47 insertions(+)