[6/8] block: use fallocate(FALLOC_FL_PUNCH_HOLE) & fallocate(0) to write zeroes

Message ID 1419931250-19259-7-git-send-email-den@openvz.org
State New

Commit Message

Denis V. Lunev Dec. 30, 2014, 9:20 a.m. UTC
This sequence works efficiently when FALLOC_FL_ZERO_RANGE is not supported.

A plain fallocate(0) fills the range with zeroes where the file has a hole,
and extends the file with zeroes when the range reaches past its end.
Unfortunately, fallocate(0) does not drop the content of the file if there
is data at the given offset. To make the result consistent, we should
therefore drop the data beforehand. This is done using FALLOC_FL_PUNCH_HOLE.

This should improve performance a bit on older kernels and on filesystems
which do not support FALLOC_FL_ZERO_RANGE.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Peter Lieven <pl@kamp.de>
---
 block/raw-posix.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
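
The behaviour the commit message relies on can be checked outside QEMU. Below is a minimal sketch, assuming Linux with glibc and a filesystem that supports fallocate(); the helper names preallocate_range() and range_is_filled_with() are hypothetical and are not part of the patch. It is meant to show that fallocate(0) leaves existing data untouched, which is why the patch punches a hole first.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* fallocate() */
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not from the patch: plain fallocate(0) on a range.
 * Returns 0 on success or a negative errno value.
 */
static int preallocate_range(int fd, off_t offset, off_t len)
{
    assert(offset >= 0 && len > 0);
    if (fallocate(fd, 0, offset, len) < 0) {
        return -errno;
    }
    return 0;
}

/*
 * Hypothetical helper: returns 1 if every byte in [offset, offset + len)
 * equals `value`, 0 if some byte differs or the range runs past the end
 * of the file, or a negative errno value on a read error.
 */
static int range_is_filled_with(int fd, off_t offset, off_t len,
                                unsigned char value)
{
    unsigned char buf[4096];

    while (len > 0) {
        size_t want = len < (off_t)sizeof(buf) ? (size_t)len : sizeof(buf);
        ssize_t got = pread(fd, buf, want, offset);
        if (got < 0) {
            return -errno;
        }
        if (got == 0) {
            return 0;   /* hit end of file before the range was covered */
        }
        for (ssize_t i = 0; i < got; i++) {
            if (buf[i] != value) {
                return 0;
            }
        }
        offset += got;
        len -= got;
    }
    return 1;
}
```

If a file holds 8192 bytes of 0xAA and preallocate_range(fd, 0, 16384) succeeds, the first 8192 bytes should still read back as 0xAA while the newly added tail reads as zeroes. A filesystem without fallocate() support is expected to return -EOPNOTSUPP instead.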

Comments

Fam Zheng Jan. 5, 2015, 7:02 a.m. UTC | #1
On Tue, 12/30 12:20, Denis V. Lunev wrote:
> This sequence works efficiently when FALLOC_FL_ZERO_RANGE is not supported.
> 
> A plain fallocate(0) fills the range with zeroes where the file has a hole,
> and extends the file with zeroes when the range reaches past its end.
> Unfortunately, fallocate(0) does not drop the content of the file if there
> is data at the given offset. To make the result consistent, we should
> therefore drop the data beforehand. This is done using FALLOC_FL_PUNCH_HOLE.
> 
> This should improve performance a bit on older kernels and on filesystems
> which do not support FALLOC_FL_ZERO_RANGE.
> 
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Peter Lieven <pl@kamp.de>
> ---
>  block/raw-posix.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 7866d31..96a8678 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -968,6 +968,23 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
>  #endif
>  
>      s->has_write_zeroes = false;
> +
> +#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
> +    if (s->has_discard) {
> +        int ret;
> +        ret = do_fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
> +                           aiocb->aio_offset, aiocb->aio_nbytes);
> +        if (ret < 0) {
> +            if (ret == -ENOTSUP) {
> +                s->has_discard = false;
> +            }
> +            return ret;
> +        }
> +        return do_fallocate(s->fd, 0, aiocb->aio_offset, aiocb->aio_nbytes);

Why is fallocate(0) necessary here? The manpage says:

Deallocating file space
	Specifying the FALLOC_FL_PUNCH_HOLE flag (available since Linux 2.6.38)
	in mode deallocates space (i.e., creates a hole) in the byte range
	starting at offset and continuing for len bytes.  Within the specified
	range, partial file system blocks are zeroed, and whole file system
	blocks are removed from the file.  After a successful call, subsequent
	reads from this range will return zeroes.

So the data are already zeroes after FALLOC_FL_PUNCH_HOLE.

Fam

> +    }
> +#endif
> +
> +    s->has_discard = false;
>      return -ENOTSUP;
>  }
>  
> -- 
> 1.9.1
> 
>
Denis V. Lunev Jan. 5, 2015, 11:14 a.m. UTC | #2
On 05/01/15 10:02, Fam Zheng wrote:
> On Tue, 12/30 12:20, Denis V. Lunev wrote:
>> This sequence works efficiently when FALLOC_FL_ZERO_RANGE is not supported.
>>
>> A plain fallocate(0) fills the range with zeroes where the file has a hole,
>> and extends the file with zeroes when the range reaches past its end.
>> Unfortunately, fallocate(0) does not drop the content of the file if there
>> is data at the given offset. To make the result consistent, we should
>> therefore drop the data beforehand. This is done using FALLOC_FL_PUNCH_HOLE.
>>
>> This should improve performance a bit on older kernels and on filesystems
>> which do not support FALLOC_FL_ZERO_RANGE.
>>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>> CC: Peter Lieven <pl@kamp.de>
>> ---
>>   block/raw-posix.c | 17 +++++++++++++++++
>>   1 file changed, 17 insertions(+)
>>
>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>> index 7866d31..96a8678 100644
>> --- a/block/raw-posix.c
>> +++ b/block/raw-posix.c
>> @@ -968,6 +968,23 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
>>   #endif
>>   
>>       s->has_write_zeroes = false;
>> +
>> +#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
>> +    if (s->has_discard) {
>> +        int ret;
>> +        ret = do_fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
>> +                           aiocb->aio_offset, aiocb->aio_nbytes);
>> +        if (ret < 0) {
>> +            if (ret == -ENOTSUP) {
>> +                s->has_discard = false;
>> +            }
>> +            return ret;
>> +        }
>> +        return do_fallocate(s->fd, 0, aiocb->aio_offset, aiocb->aio_nbytes);
> Why is fallocate(0) necessary here? The manpage says:
>
> Deallocating file space
> 	Specifying the FALLOC_FL_PUNCH_HOLE flag (available since Linux 2.6.38)
> 	in mode deallocates space (i.e., creates a hole) in the byte range
> 	starting at offset and continuing for len bytes.  Within the specified
> 	range, partial file system blocks are zeroed, and whole file system
> 	blocks are removed from the file.  After a successful call, subsequent
> 	reads from this range will return zeroes.
>
> So the data are already zeroes after FALLOC_FL_PUNCH_HOLE.
>
> Fam
These zeroes will have different properties. FALLOC_FL_PUNCH_HOLE
deallocates disk space in that range, so this call would behave
differently depending on which method of writing zeroes ends up being
used. This does not look good to me.

The function should leave the file in the same state whichever
internal implementation is used. If the caller wants to use
FALLOC_FL_PUNCH_HOLE alone, it should call the handle_aiocb_discard
method.
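
To make the distinction concrete, here is a minimal standalone sketch of the two-step sequence, assuming Linux with glibc and a filesystem that supports FALLOC_FL_PUNCH_HOLE. It is not the QEMU code: zero_range_keep_allocated(), range_reads_as_zero() and allocated_units() are hypothetical names used only for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>          /* fallocate() */
#include <linux/falloc.h>   /* FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE */
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: zero [offset, offset + len) and leave it allocated.
 * Returns 0 on success or a negative errno value.
 */
static int zero_range_keep_allocated(int fd, off_t offset, off_t len)
{
    assert(offset >= 0 && len > 0);

    /* Step 1: drop the old data. The range now reads as zeroes, but it
     * is a hole, so its disk space has been given back. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  offset, len) < 0) {
        return -errno;
    }

    /* Step 2: allocate the range again. It still reads as zeroes, and
     * the space is reserved as it would be after writing zeroes. */
    if (fallocate(fd, 0, offset, len) < 0) {
        return -errno;
    }
    return 0;
}

/*
 * Hypothetical helper: returns 1 if [offset, offset + len) reads back as
 * all zeroes, 0 if not (or if the range runs past the end of the file),
 * or a negative errno value on a read error.
 */
static int range_reads_as_zero(int fd, off_t offset, off_t len)
{
    unsigned char buf[4096];

    while (len > 0) {
        size_t want = len < (off_t)sizeof(buf) ? (size_t)len : sizeof(buf);
        ssize_t got = pread(fd, buf, want, offset);
        if (got < 0) {
            return -errno;
        }
        if (got == 0) {
            return 0;
        }
        for (ssize_t i = 0; i < got; i++) {
            if (buf[i] != 0) {
                return 0;
            }
        }
        offset += got;
        len -= got;
    }
    return 1;
}

/*
 * Hypothetical helper: number of 512-byte units currently allocated to
 * the file (st_blocks), or a negative errno value.
 */
static long long allocated_units(int fd)
{
    struct stat st;

    if (fstat(fd, &st) < 0) {
        return -errno;
    }
    return (long long)st.st_blocks;
}
```

Comparing allocated_units() after step 1 alone and after both steps should show the difference on most filesystems: the punched range stops counting towards st_blocks until fallocate(0) allocates it again. The exact numbers depend on the filesystem, so the sketch does not assume any particular value.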

Patch

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 7866d31..96a8678 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -968,6 +968,23 @@  static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
 #endif
 
     s->has_write_zeroes = false;
+
+#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
+    if (s->has_discard) {
+        int ret;
+        ret = do_fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+                           aiocb->aio_offset, aiocb->aio_nbytes);
+        if (ret < 0) {
+            if (ret == -ENOTSUP) {
+                s->has_discard = false;
+            }
+            return ret;
+        }
+        return do_fallocate(s->fd, 0, aiocb->aio_offset, aiocb->aio_nbytes);
+    }
+#endif
+
+    s->has_discard = false;
     return -ENOTSUP;
 }