[v2,02/15] file-posix: support BDRV_REQ_ALLOCATE

Message ID 1496330073-51338-3-git-send-email-anton.nefedov@virtuozzo.com
State New

Commit Message

Anton Nefedov June 1, 2017, 3:14 p.m. UTC
Current write_zeroes implementation is good enough to satisfy this flag too

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 block/file-posix.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Comments

Eric Blake June 1, 2017, 7:49 p.m. UTC | #1
On 06/01/2017 10:14 AM, Anton Nefedov wrote:
> Current write_zeroes implementation is good enough to satisfy this flag too
> 
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> ---
>  block/file-posix.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)

Are we sure that fallocate() is always fast, or are there some file
systems where it is no faster than manually writing zeroes?  I'm worried
that blindly claiming BDRV_REQ_ALLOCATE may fail if we encounter a libc
or kernel-based fallback that takes a slow path on our behalf.
Eric Blake June 1, 2017, 7:54 p.m. UTC | #2
On 06/01/2017 02:49 PM, Eric Blake wrote:
> On 06/01/2017 10:14 AM, Anton Nefedov wrote:
>> Current write_zeroes implementation is good enough to satisfy this flag too
>>
>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
>> ---
>>  block/file-posix.c | 9 ++++++++-
>>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> Are we sure that fallocate() is always fast, or are there some file
> systems where it is no faster than manually writing zeroes?  I'm worried
> that blindly claiming BDRV_REQ_ALLOCATE may fail if we encounter a libc

not so much fail as in "break the guest", but fail as in "take far more
time than we were expecting, pessimising our behavior to worse than if
we had not tried the allocation at all"

> or kernel-based fallback that takes a slow path on our behalf.
>
Anton Nefedov June 2, 2017, 2:34 p.m. UTC | #3
On 06/01/2017 10:54 PM, Eric Blake wrote:
> On 06/01/2017 02:49 PM, Eric Blake wrote:
>> On 06/01/2017 10:14 AM, Anton Nefedov wrote:
>>> Current write_zeroes implementation is good enough to satisfy this flag too
>>>
>>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
>>> ---
>>>   block/file-posix.c | 9 ++++++++-
>>>   1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> Are we sure that fallocate() is always fast, or are there some file
>> systems where it is no faster than manually writing zeroes?  I'm worried
>> that blindly claiming BDRV_REQ_ALLOCATE may fail if we encounter a libc
> 
> not so much fail as in "break the guest", but fail as in "take far more
> time than we were expecting, pessimising our behavior to worse than if
> we had not tried the allocation at all"
> 
>> or kernel-based fallback that takes a slow path on our behalf.
>>
> 

I would expect such filesystems to not support fallocate.

Though I must admit I can't see anywhere in the documentation that it 
MUST be strictly faster than writing zeroes; it would look very strange
to me if there were a slowpath fallback somewhere past the libc.

/Anton

Patch

diff --git a/block/file-posix.c b/block/file-posix.c
index de2d3a2..117bbee 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -527,7 +527,6 @@  static int raw_open_common(BlockDriverState *bs, QDict *options,
 
     s->has_discard = true;
     s->has_write_zeroes = true;
-    bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP;
     if ((bs->open_flags & BDRV_O_NOCACHE) != 0) {
         s->needs_alignment = true;
     }
@@ -577,6 +576,11 @@  static int raw_open_common(BlockDriverState *bs, QDict *options,
     }
 #endif
 
+    bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP;
+    if (s->has_write_zeroes || s->has_fallocate) {
+        bs->supported_zero_flags |= BDRV_REQ_ALLOCATE;
+    }
+
     ret = 0;
 fail:
     if (filename && (bdrv_flags & BDRV_O_TEMPORARY)) {
@@ -1390,6 +1394,9 @@  static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
     }
 #endif
 
+    if (!s->has_fallocate) {
+        aiocb->bs->supported_zero_flags &= ~BDRV_REQ_ALLOCATE;
+    }
     return -ENOTSUP;
 }
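
The flag handling in the two hunks above can be modelled outside QEMU roughly as follows. This is only a sketch under stated assumptions: `struct dev`, `dev_open()`, `dev_write_zeroes()` and the `REQ_*` constants are hypothetical stand-ins, not QEMU's types or API, and the two boolean arguments merely simulate whether the kernel accepts each fast path. It shows the pattern the patch appears to rely on: advertise the allocate flag at open time while a fast path is still believed to work, then withdraw it the first time every fast path has turned out to be unsupported.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for BDRV_REQ_MAY_UNMAP and BDRV_REQ_ALLOCATE. */
enum {
    REQ_MAY_UNMAP = 1u << 0,
    REQ_ALLOCATE  = 1u << 1,
};

/* Hypothetical, heavily reduced stand-in for the driver state. */
struct dev {
    bool has_write_zeroes;          /* zero-range fast path not yet ruled out */
    bool has_fallocate;             /* plain fallocate not yet ruled out */
    unsigned supported_zero_flags;  /* flags advertised to callers */
};

/* Open time: advertise REQ_ALLOCATE while any fast path may still work. */
static void dev_open(struct dev *d, bool has_write_zeroes, bool has_fallocate)
{
    assert(d != NULL);
    d->has_write_zeroes = has_write_zeroes;
    d->has_fallocate = has_fallocate;
    d->supported_zero_flags = REQ_MAY_UNMAP;
    if (d->has_write_zeroes || d->has_fallocate) {
        d->supported_zero_flags |= REQ_ALLOCATE;
    }
}

/*
 * Request time: try each fast path that has not been ruled out yet.
 * zero_range_ok and fallocate_ok simulate the kernel's answer; a real
 * driver would issue the system call and inspect errno instead.
 */
static int dev_write_zeroes(struct dev *d, bool zero_range_ok, bool fallocate_ok)
{
    assert(d != NULL);
    if (d->has_write_zeroes) {
        if (zero_range_ok) {
            return 0;
        }
        d->has_write_zeroes = false;    /* do not retry this path */
    }
    if (d->has_fallocate) {
        if (fallocate_ok) {
            return 0;
        }
        d->has_fallocate = false;       /* do not retry this path */
    }
    /* No fast path left: stop advertising the allocate flag. */
    if (!d->has_fallocate) {
        d->supported_zero_flags &= ~(unsigned)REQ_ALLOCATE;
    }
    return -ENOTSUP;
}
```

In this model, a caller that gets -ENOTSUP once will find REQ_ALLOCATE cleared on later requests, so it can fall back to its own zeroing instead of retrying. Note that the sketch says nothing about how fast a successful fast path is, which is the concern raised in the thread.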