mbox series

[v5,00/10] preallocate filter

Message ID 20200821141123.28538-1-vsementsov@virtuozzo.com
Headers show
Series preallocate filter | expand

Message

Vladimir Sementsov-Ogievskiy Aug. 21, 2020, 2:11 p.m. UTC
Hi all!

Here is a filter, which does preallocation on write.

In Virtuozzo we have to deal with some custom distributed storage
solution, where allocation is relatively expensive operation. We have to
workaround it in Qemu, so here is a new filter.

v5: rewrite patch 08 as Nir suggested

v4:
01-04: add r-bs
05: add coroutine_fn tag
06: use QEMU_LOCK_GUARD and fix reqs_lock leak
07: grammar
08-10: add r-b

Vladimir Sementsov-Ogievskiy (10):
  block: simplify comment to BDRV_REQ_SERIALISING
  block/io.c: drop assertion on double waiting for request serialisation
  block/io: split out bdrv_find_conflicting_request
  block/io: bdrv_wait_serialising_requests_locked: drop extra bs arg
  block: bdrv_mark_request_serialising: split non-waiting function
  block: introduce BDRV_REQ_NO_WAIT flag
  block: introduce preallocate filter
  iotests.py: add verify_o_direct helper
  iotests.py: add filter_img_check
  iotests: add 298 to test new preallocate filter driver

 docs/system/qemu-block-drivers.rst.inc |  26 +++
 qapi/block-core.json                   |  20 +-
 include/block/block.h                  |  20 +-
 include/block/block_int.h              |   3 +-
 block/file-posix.c                     |   2 +-
 block/io.c                             | 130 ++++++-----
 block/preallocate.c                    | 291 +++++++++++++++++++++++++
 block/Makefile.objs                    |   1 +
 tests/qemu-iotests/298                 |  50 +++++
 tests/qemu-iotests/298.out             |   6 +
 tests/qemu-iotests/group               |   1 +
 tests/qemu-iotests/iotests.py          |  16 ++
 12 files changed, 498 insertions(+), 68 deletions(-)
 create mode 100644 block/preallocate.c
 create mode 100644 tests/qemu-iotests/298
 create mode 100644 tests/qemu-iotests/298.out

Comments

Vladimir Sementsov-Ogievskiy Aug. 27, 2020, 9:08 p.m. UTC | #1
21.08.2020 17:11, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> Here is a filter, which does preallocation on write.
> 
> In Virtuozzo we have to deal with some custom distributed storage
> solution, where allocation is relatively expensive operation. We have to
> workaround it in Qemu, so here is a new filter.

I have a problem now with this thing.

We need preallocation. But we don't want to explicitly specify it in all the management tools. So it should be inserted by default. It's OK for us to keep this default different from upstream... But there are problems with the implicitly inserted filter (actually iotests fail and I failed to fix them)

1. I have to set bs->inherits_from for filter and it's child by hand after bdrv_replace_node(), otherwise bdrv_check_perm doesn't work.

2. I have to set filter_bs->implicit and teach bdrv_refresh_filename() to ignore implicit filters when it checks for drv->bdrv_file_open, to avoid appearing of json in backing file names

3. And the real design problem, which seems impossible to fix: reopen is broken, just because user is not prepared to the fact that file child is a filter, not a file node and has another options, and don't support options of file-posix.

And seems all it (and mostly [3]) shows that implicitly inserting the filter is near to be impossible..

So, what are possible solutions?

In virtuozzo7 we have preallocation feature done inside qcow2 driver. This is very uncomfortable: we should to handle each possible over-EOF write to underlying node (to keep data_end in sync to be able to shrink preallocation on close()).. I don't like this way and don't want to port it..

Another option is implementing preallocation inside file-posix driver. Then, instead of BDRV_REQ_NO_WAIT flag I'll need to extend serialising requests API (bdrv_make_request_serialising() is already used in file-posix.c) to dupport no-wait behavior + expanding the serialising request bounds. This option seems feasible, so I'll try this way if no other ideas.

Filter is obviously the true way: we use generic block layer for native request serialising, don't need to catch every write in qcow2 driver, don't need to modify any other driver and get a universal thing. But how to insert it implicitly (or at least automatically in some cases) and avoid all the problems?
Max Reitz Sept. 1, 2020, 3:07 p.m. UTC | #2
On 27.08.20 23:08, Vladimir Sementsov-Ogievskiy wrote:
> 21.08.2020 17:11, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all!
>>
>> Here is a filter, which does preallocation on write.
>>
>> In Virtuozzo we have to deal with some custom distributed storage
>> solution, where allocation is relatively expensive operation. We have to
>> workaround it in Qemu, so here is a new filter.
> 
> I have a problem now with this thing.
> 
> We need preallocation. But we don't want to explicitly specify it in all
> the management tools.

Why?

> So it should be inserted by default.

Why?  You mean without any option?  That seems...  Interesting?

(Also like a recipe for reports of performance regression in some cases.)

> It's OK for
> us to keep this default different from upstream... But there are
> problems with the implicitly inserted filter (actually iotests fail and
> I failed to fix them)

I would suspect even if the iotests passed we would end up with a heap
of problems that we would only notice at some later point.  I thought
you too weren’t too fond of the idea of implicit filters.

> 1. I have to set bs->inherits_from for filter and it's child by hand
> after bdrv_replace_node(), otherwise bdrv_check_perm doesn't work.
> 
> 2. I have to set filter_bs->implicit and teach bdrv_refresh_filename()
> to ignore implicit filters when it checks for drv->bdrv_file_open, to
> avoid appearing of json in backing file names
> 
> 3. And the real design problem, which seems impossible to fix: reopen is
> broken, just because user is not prepared to the fact that file child is
> a filter, not a file node and has another options, and don't support
> options of file-posix.

Well, what should I say.  I feel like we have made efforts in the past
years to make the block graph fully visible to users and yield the
responsibility of managing it to the users, too, so I’m not surprised if
a step backwards breaks that.

> And seems all it (and mostly [3]) shows that implicitly inserting the
> filter is near to be impossible..
> 
> So, what are possible solutions?
> 
> In virtuozzo7 we have preallocation feature done inside qcow2 driver.
> This is very uncomfortable: we should to handle each possible over-EOF
> write to underlying node (to keep data_end in sync to be able to shrink
> preallocation on close()).. I don't like this way and don't want to port
> it..
> 
> Another option is implementing preallocation inside file-posix driver.
> Then, instead of BDRV_REQ_NO_WAIT flag I'll need to extend serialising
> requests API (bdrv_make_request_serialising() is already used in
> file-posix.c) to dupport no-wait behavior + expanding the serialising
> request bounds. This option seems feasible, so I'll try this way if no
> other ideas.

Possible, but you haven’t yet explained what the problem with the
management layer inserting the preallocation filter is.

> Filter is obviously the true way: we use generic block layer for native
> request serialising, don't need to catch every write in qcow2 driver,
> don't need to modify any other driver and get a universal thing. But how
> to insert it implicitly (or at least automatically in some cases) and
> avoid all the problems?

I don’t understand why inserting it implicitly is important.

Max
Vladimir Sementsov-Ogievskiy Sept. 17, 2020, 9 a.m. UTC | #3
01.09.2020 18:07, Max Reitz wrote:
> On 27.08.20 23:08, Vladimir Sementsov-Ogievskiy wrote:
>> 21.08.2020 17:11, Vladimir Sementsov-Ogievskiy wrote:
>>> Hi all!
>>>
>>> Here is a filter, which does preallocation on write.
>>>
>>> In Virtuozzo we have to deal with some custom distributed storage
>>> solution, where allocation is relatively expensive operation. We have to
>>> workaround it in Qemu, so here is a new filter.
>>
>> I have a problem now with this thing.
>>
>> We need preallocation. But we don't want to explicitly specify it in all
>> the management tools.
> 
> Why?
> 
>> So it should be inserted by default.
> 
> Why?  You mean without any option?  That seems...  Interesting?
> 
> (Also like a recipe for reports of performance regression in some cases.)
> 
>> It's OK for
>> us to keep this default different from upstream... But there are
>> problems with the implicitly inserted filter (actually iotests fail and
>> I failed to fix them)
> 
> I would suspect even if the iotests passed we would end up with a heap
> of problems that we would only notice at some later point.  I thought
> you too weren’t too fond of the idea of implicit filters.
> 
>> 1. I have to set bs->inherits_from for filter and it's child by hand
>> after bdrv_replace_node(), otherwise bdrv_check_perm doesn't work.
>>
>> 2. I have to set filter_bs->implicit and teach bdrv_refresh_filename()
>> to ignore implicit filters when it checks for drv->bdrv_file_open, to
>> avoid appearing of json in backing file names
>>
>> 3. And the real design problem, which seems impossible to fix: reopen is
>> broken, just because user is not prepared to the fact that file child is
>> a filter, not a file node and has another options, and don't support
>> options of file-posix.
> 
> Well, what should I say.  I feel like we have made efforts in the past
> years to make the block graph fully visible to users and yield the
> responsibility of managing it to the users, too, so I’m not surprised if
> a step backwards breaks that.
> 
>> And seems all it (and mostly [3]) shows that implicitly inserting the
>> filter is near to be impossible..
>>
>> So, what are possible solutions?
>>
>> In virtuozzo7 we have preallocation feature done inside qcow2 driver.
>> This is very uncomfortable: we should to handle each possible over-EOF
>> write to underlying node (to keep data_end in sync to be able to shrink
>> preallocation on close()).. I don't like this way and don't want to port
>> it..
>>
>> Another option is implementing preallocation inside file-posix driver.
>> Then, instead of BDRV_REQ_NO_WAIT flag I'll need to extend serialising
>> requests API (bdrv_make_request_serialising() is already used in
>> file-posix.c) to dupport no-wait behavior + expanding the serialising
>> request bounds. This option seems feasible, so I'll try this way if no
>> other ideas.
> 
> Possible, but you haven’t yet explained what the problem with the
> management layer inserting the preallocation filter is.
> 
>> Filter is obviously the true way: we use generic block layer for native
>> request serialising, don't need to catch every write in qcow2 driver,
>> don't need to modify any other driver and get a universal thing. But how
>> to insert it implicitly (or at least automatically in some cases) and
>> avoid all the problems?
> 
> I don’t understand why inserting it implicitly is important.
> 

You are right. Thanks for strong point of view, this makes me to revise my own. Now I'm working on v6.