[5/7] block/raw-posix: call plain fallocate in handle_aiocb_write_zeroes

Message ID 1422607337-25335-6-git-send-email-den@openvz.org
State New

Commit Message

Denis V. Lunev Jan. 30, 2015, 8:42 a.m. UTC
We may be extending the image and thus writing zeroes beyond the end of
the file. In this case there is no need to punch a hole first to make
sure that the file holds no data at this offset (the precondition for
fallocate(0) to produce zeroes). We can simply call fallocate(0).

This improves the performance of writing zeroes even on really old
platforms that lack even FALLOC_FL_PUNCH_HOLE.

Before this patch, do_fallocate was available only when either
CONFIG_FALLOCATE_PUNCH_HOLE or CONFIG_FALLOCATE_ZERO_RANGE was defined.
Now it is available whenever CONFIG_FALLOCATE is defined, that is, when
Linux fallocate is present; posix_fallocate is a completely different
story (CONFIG_POSIX_FALLOCATE). CONFIG_FALLOCATE is a mandatory
prerequisite for both CONFIG_FALLOCATE_PUNCH_HOLE and
CONFIG_FALLOCATE_ZERO_RANGE, so we are on the safe side.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Max Reitz <mreitz@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Peter Lieven <pl@kamp.de>
CC: Fam Zheng <famz@redhat.com>
---
 block/raw-posix.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)
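
The approach described in the commit message can be sketched outside QEMU as follows. This is a minimal illustration under stated assumptions, not the patch itself: `struct zero_writer` and `write_zeroes_past_eof` are hypothetical names, the file size is taken from fstat() rather than bdrv_getlength(), and only EOPNOTSUPP and ENOSYS are treated as "not supported" (the real code maps errors through translate_err()).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* exposes Linux fallocate() in <fcntl.h> */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical per-file state, loosely modelled on BDRVRawState. */
struct zero_writer {
    int fd;
    bool has_fallocate;      /* cleared once fallocate reports no support */
};

/*
 * Try to write zeroes at [offset, offset + len) by plain allocation.
 * Returns 0 on success, -ENOTSUP when the caller must fall back to
 * another method, or another negative errno value on failure.
 */
static int write_zeroes_past_eof(struct zero_writer *w, off_t offset, off_t len)
{
    struct stat st;
    int ret;

    if (!w->has_fallocate) {
        return -ENOTSUP;
    }
    if (fstat(w->fd, &st) < 0) {
        return -errno;
    }
    if (offset < st.st_size) {
        /* Existing data may be non-zero; mode 0 would not clear it. */
        return -ENOTSUP;
    }

    /* Mode 0 allocates the range and extends the file with zeroes. */
    do {
        ret = fallocate(w->fd, 0, offset, len);
    } while (ret < 0 && errno == EINTR);

    if (ret == 0) {
        return 0;
    }
    if (errno == EOPNOTSUPP || errno == ENOSYS) {
        w->has_fallocate = false;   /* do not retry on this file */
        return -ENOTSUP;
    }
    return -errno;
}
```

A caller would try this path first and, on -ENOTSUP, fall back to whatever slower method it already has, such as writing a zero-filled buffer.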

Comments

Max Reitz Jan. 30, 2015, 2:58 p.m. UTC | #1
On 2015-01-30 at 03:42, Denis V. Lunev wrote:
> There is a possibility that we are extending our image and thus writing
> zeroes beyond the end of the file. In this case we do not need to care
> about the hole to make sure that there is no data in the file under
> this offset (pre-condition to fallocate(0) to work). We could simply call
> fallocate(0).
>
> This improves the performance of writing zeroes even on really old
> platforms which do not have even FALLOC_FL_PUNCH_HOLE.
>
> Before the patch do_fallocate was used when either
> CONFIG_FALLOCATE_PUNCH_HOLE or CONFIG_FALLOCATE_ZERO_RANGE are defined.
> Now the story is different. CONFIG_FALLOCATE is defined when Linux
> fallocate is defined, posix_fallocate is completely different story
> (CONFIG_POSIX_FALLOCATE). CONFIG_FALLOCATE is mandatory prerequisite
> for both CONFIG_FALLOCATE_PUNCH_HOLE and CONFIG_FALLOCATE_ZERO_RANGE
> thus we are on the safe side.
>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Max Reitz <mreitz@redhat.com>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Peter Lieven <pl@kamp.de>
> CC: Fam Zheng <famz@redhat.com>
> ---
>   block/raw-posix.c | 14 +++++++++++++-
>   1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 5a777e7..1c88ad8 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -147,6 +147,7 @@ typedef struct BDRVRawState {
>       bool has_discard:1;
>       bool has_write_zeroes:1;
>       bool discard_zeroes:1;
> +    bool has_fallocate;
>       bool needs_alignment;
>   } BDRVRawState;
>   
> @@ -452,6 +453,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
>       }
>       if (S_ISREG(st.st_mode)) {
>           s->discard_zeroes = true;
> +        s->has_fallocate = true;

This could be moved upwards where has_discard and has_write_zeroes are 
initialized; but it won't matter in practice, I hope. Thus:

Reviewed-by: Max Reitz <mreitz@redhat.com>

>       }
>       if (S_ISBLK(st.st_mode)) {
>   #ifdef BLKDISCARDZEROES
> @@ -902,7 +904,7 @@ static int translate_err(int err)
>       return err;
>   }
>   
> -#if defined(CONFIG_FALLOCATE_PUNCH_HOLE) || defined(CONFIG_FALLOCATE_ZERO_RANGE)
> +#ifdef CONFIG_FALLOCATE
>   static int do_fallocate(int fd, int mode, off_t offset, off_t len)
>   {
>       do {
> @@ -965,6 +967,16 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
>       }
>   #endif
>   
> +#ifdef CONFIG_FALLOCATE
> +    if (s->has_fallocate && aiocb->aio_offset >= bdrv_getlength(aiocb->bs)) {
> +        int ret = do_fallocate(s->fd, 0, aiocb->aio_offset, aiocb->aio_nbytes);
> +        if (ret == 0 || ret != -ENOTSUP) {
> +            return ret;
> +        }
> +        s->has_fallocate = false;
> +    }
> +#endif
> +
>       return -ENOTSUP;
>   }
>
Denis V. Lunev Jan. 30, 2015, 3:41 p.m. UTC | #2
On 30/01/15 17:58, Max Reitz wrote:
> On 2015-01-30 at 03:42, Denis V. Lunev wrote:
>> There is a possibility that we are extending our image and thus writing
>> zeroes beyond the end of the file. In this case we do not need to care
>> about the hole to make sure that there is no data in the file under
>> this offset (pre-condition to fallocate(0) to work). We could simply call
>> fallocate(0).
>>
>> This improves the performance of writing zeroes even on really old
>> platforms which do not have even FALLOC_FL_PUNCH_HOLE.
>>
>> Before the patch do_fallocate was used when either
>> CONFIG_FALLOCATE_PUNCH_HOLE or CONFIG_FALLOCATE_ZERO_RANGE are defined.
>> Now the story is different. CONFIG_FALLOCATE is defined when Linux
>> fallocate is defined, posix_fallocate is completely different story
>> (CONFIG_POSIX_FALLOCATE). CONFIG_FALLOCATE is mandatory prerequisite
>> for both CONFIG_FALLOCATE_PUNCH_HOLE and CONFIG_FALLOCATE_ZERO_RANGE
>> thus we are on the safe side.
>>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Max Reitz <mreitz@redhat.com>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>> CC: Peter Lieven <pl@kamp.de>
>> CC: Fam Zheng <famz@redhat.com>
>> ---
>>   block/raw-posix.c | 14 +++++++++++++-
>>   1 file changed, 13 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>> index 5a777e7..1c88ad8 100644
>> --- a/block/raw-posix.c
>> +++ b/block/raw-posix.c
>> @@ -147,6 +147,7 @@ typedef struct BDRVRawState {
>>       bool has_discard:1;
>>       bool has_write_zeroes:1;
>>       bool discard_zeroes:1;
>> +    bool has_fallocate;
>>       bool needs_alignment;
>>   } BDRVRawState;
>>
>> @@ -452,6 +453,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
>>       }
>>       if (S_ISREG(st.st_mode)) {
>>           s->discard_zeroes = true;
>> +        s->has_fallocate = true;
>
> This could be moved upwards where has_discard and has_write_zeroes are 
> initialized; but it won't matter in practice, I hope. Thus:
>
> Reviewed-by: Max Reitz <mreitz@redhat.com>

This does matter, as has_discard and has_write_zeroes are bit fields;
thus I cannot insert anything useful into the middle of those
fields.
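
One reading of this remark is that an ordinary member placed inside a run of bit fields ends the run and can cost extra storage. The sketch below illustrates that reading; `struct flags_grouped`, `struct flags_split`, `split_overhead` and `fallocate_flag` are hypothetical names, and the exact layout of bit fields is implementation-defined, so the size difference is typical rather than guaranteed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Flags laid out as in the patched structure: bit fields first. */
struct flags_grouped {
    bool has_discard:1;
    bool has_write_zeroes:1;
    bool discard_zeroes:1;
    bool has_fallocate;       /* ordinary member after the bit-field run */
    bool needs_alignment;
};

/* Same members, but the ordinary bool splits the run of bit fields. */
struct flags_split {
    bool has_discard:1;
    bool has_fallocate;       /* ends the first storage unit early */
    bool has_write_zeroes:1;
    bool discard_zeroes:1;
    bool needs_alignment;
};

/* Extra bytes caused by splitting the run; implementation-defined. */
static long split_overhead(void)
{
    return (long)sizeof(struct flags_split) -
           (long)sizeof(struct flags_grouped);
}

/*
 * An ordinary member has an address; a bit field does not, so
 * "&f->has_discard" would be rejected by the compiler.
 */
static bool *fallocate_flag(struct flags_grouped *f)
{
    return &f->has_fallocate;
}
```

With common ABIs the split layout is no smaller than the grouped one, which is a reason to keep the bit fields adjacent and append plain members after them.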
Max Reitz Jan. 30, 2015, 3:42 p.m. UTC | #3
On 2015-01-30 at 10:41, Denis V. Lunev wrote:
> On 30/01/15 17:58, Max Reitz wrote:
>> On 2015-01-30 at 03:42, Denis V. Lunev wrote:
>>> There is a possibility that we are extending our image and thus writing
>>> zeroes beyond the end of the file. In this case we do not need to care
>>> about the hole to make sure that there is no data in the file under
>>> this offset (pre-condition to fallocate(0) to work). We could simply call
>>> fallocate(0).
>>>
>>> This improves the performance of writing zeroes even on really old
>>> platforms which do not have even FALLOC_FL_PUNCH_HOLE.
>>>
>>> Before the patch do_fallocate was used when either
>>> CONFIG_FALLOCATE_PUNCH_HOLE or CONFIG_FALLOCATE_ZERO_RANGE are defined.
>>> Now the story is different. CONFIG_FALLOCATE is defined when Linux
>>> fallocate is defined, posix_fallocate is completely different story
>>> (CONFIG_POSIX_FALLOCATE). CONFIG_FALLOCATE is mandatory prerequisite
>>> for both CONFIG_FALLOCATE_PUNCH_HOLE and CONFIG_FALLOCATE_ZERO_RANGE
>>> thus we are on the safe side.
>>>
>>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>>> CC: Max Reitz <mreitz@redhat.com>
>>> CC: Kevin Wolf <kwolf@redhat.com>
>>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>>> CC: Peter Lieven <pl@kamp.de>
>>> CC: Fam Zheng <famz@redhat.com>
>>> ---
>>>   block/raw-posix.c | 14 +++++++++++++-
>>>   1 file changed, 13 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>>> index 5a777e7..1c88ad8 100644
>>> --- a/block/raw-posix.c
>>> +++ b/block/raw-posix.c
>>> @@ -147,6 +147,7 @@ typedef struct BDRVRawState {
>>>       bool has_discard:1;
>>>       bool has_write_zeroes:1;
>>>       bool discard_zeroes:1;
>>> +    bool has_fallocate;
>>>       bool needs_alignment;
>>>   } BDRVRawState;
>>>
>>> @@ -452,6 +453,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
>>>       }
>>>       if (S_ISREG(st.st_mode)) {
>>>           s->discard_zeroes = true;
>>> +        s->has_fallocate = true;
>>
>> This could be moved upwards where has_discard and has_write_zeroes 
>> are initialized; but it won't matter in practice, I hope. Thus:
>>
>> Reviewed-by: Max Reitz <mreitz@redhat.com>
>
> This does matter as has_discard and has_write_zeroes are bit fields
> thus I can not insert something useful into the middle of those
> fields.

Right, but I did not mean the placement inside of the structure but the 
placement of the initialization statement (s->has_fallocate = true) in 
raw_open_common().

Max
Denis V. Lunev Jan. 30, 2015, 3:53 p.m. UTC | #4
On 30/01/15 18:42, Max Reitz wrote:
> On 2015-01-30 at 10:41, Denis V. Lunev wrote:
>> On 30/01/15 17:58, Max Reitz wrote:
>>> On 2015-01-30 at 03:42, Denis V. Lunev wrote:
>>>> There is a possibility that we are extending our image and thus writing
>>>> zeroes beyond the end of the file. In this case we do not need to care
>>>> about the hole to make sure that there is no data in the file under
>>>> this offset (pre-condition to fallocate(0) to work). We could simply call
>>>> fallocate(0).
>>>>
>>>> This improves the performance of writing zeroes even on really old
>>>> platforms which do not have even FALLOC_FL_PUNCH_HOLE.
>>>>
>>>> Before the patch do_fallocate was used when either
>>>> CONFIG_FALLOCATE_PUNCH_HOLE or CONFIG_FALLOCATE_ZERO_RANGE are defined.
>>>> Now the story is different. CONFIG_FALLOCATE is defined when Linux
>>>> fallocate is defined, posix_fallocate is completely different story
>>>> (CONFIG_POSIX_FALLOCATE). CONFIG_FALLOCATE is mandatory prerequisite
>>>> for both CONFIG_FALLOCATE_PUNCH_HOLE and CONFIG_FALLOCATE_ZERO_RANGE
>>>> thus we are on the safe side.
>>>>
>>>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>>>> CC: Max Reitz <mreitz@redhat.com>
>>>> CC: Kevin Wolf <kwolf@redhat.com>
>>>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>>>> CC: Peter Lieven <pl@kamp.de>
>>>> CC: Fam Zheng <famz@redhat.com>
>>>> ---
>>>>   block/raw-posix.c | 14 +++++++++++++-
>>>>   1 file changed, 13 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>>>> index 5a777e7..1c88ad8 100644
>>>> --- a/block/raw-posix.c
>>>> +++ b/block/raw-posix.c
>>>> @@ -147,6 +147,7 @@ typedef struct BDRVRawState {
>>>>       bool has_discard:1;
>>>>       bool has_write_zeroes:1;
>>>>       bool discard_zeroes:1;
>>>> +    bool has_fallocate;
>>>>       bool needs_alignment;
>>>>   } BDRVRawState;
>>>>
>>>> @@ -452,6 +453,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
>>>>       }
>>>>       if (S_ISREG(st.st_mode)) {
>>>>           s->discard_zeroes = true;
>>>> +        s->has_fallocate = true;
>>>
>>> This could be moved upwards where has_discard and has_write_zeroes 
>>> are initialized; but it won't matter in practice, I hope. Thus:
>>>
>>> Reviewed-by: Max Reitz <mreitz@redhat.com>
>>
>> This does matter as has_discard and has_write_zeroes are bit fields
>> thus I can not insert something useful into the middle of those
>> fields.
>
> Right, but I did not mean the placement inside of the structure but 
> the placement of the initialization statement (s->has_fallocate = 
> true) in raw_open_common().
>
> Max

Hmm, you are right. This is possible, but I don't want
to have this bit set for block, character, etc. devices
even if they do not use this bit or code. With my
approach, the placement of the assignment indicates
where the flag applies.

Thank you for the review :) It is somewhat difficult
to obtain feedback here compared with the Linux
kernel.
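
The placement argument above can be illustrated with a small standalone sketch: the flag is set only in the branch for regular files, so the code itself documents where the flag applies. `struct file_caps` and `probe_caps` are hypothetical names used for illustration; QEMU's raw_open_common() does considerably more.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical subset of the per-file capability flags. */
struct file_caps {
    bool discard_zeroes;
    bool has_fallocate;
};

/*
 * Fill in capabilities for an open descriptor.
 * Returns 0 on success or a negative errno value if fstat() fails.
 */
static int probe_caps(int fd, struct file_caps *caps)
{
    struct stat st;

    caps->discard_zeroes = false;
    caps->has_fallocate = false;

    if (fstat(fd, &st) < 0) {
        return -errno;
    }
    if (S_ISREG(st.st_mode)) {
        /* Only regular files take the fallocate path. */
        caps->discard_zeroes = true;
        caps->has_fallocate = true;
    }
    return 0;
}
```

For a block or character device, or a pipe, both flags stay false, so the fallocate branch is never attempted for them.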

Patch

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 5a777e7..1c88ad8 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -147,6 +147,7 @@  typedef struct BDRVRawState {
     bool has_discard:1;
     bool has_write_zeroes:1;
     bool discard_zeroes:1;
+    bool has_fallocate;
     bool needs_alignment;
 } BDRVRawState;
 
@@ -452,6 +453,7 @@  static int raw_open_common(BlockDriverState *bs, QDict *options,
     }
     if (S_ISREG(st.st_mode)) {
         s->discard_zeroes = true;
+        s->has_fallocate = true;
     }
     if (S_ISBLK(st.st_mode)) {
 #ifdef BLKDISCARDZEROES
@@ -902,7 +904,7 @@  static int translate_err(int err)
     return err;
 }
 
-#if defined(CONFIG_FALLOCATE_PUNCH_HOLE) || defined(CONFIG_FALLOCATE_ZERO_RANGE)
+#ifdef CONFIG_FALLOCATE
 static int do_fallocate(int fd, int mode, off_t offset, off_t len)
 {
     do {
@@ -965,6 +967,16 @@  static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
     }
 #endif
 
+#ifdef CONFIG_FALLOCATE
+    if (s->has_fallocate && aiocb->aio_offset >= bdrv_getlength(aiocb->bs)) {
+        int ret = do_fallocate(s->fd, 0, aiocb->aio_offset, aiocb->aio_nbytes);
+        if (ret == 0 || ret != -ENOTSUP) {
+            return ret;
+        }
+        s->has_fallocate = false;
+    }
+#endif
+
     return -ENOTSUP;
 }