mbox series

[RFC,0/3] block,ext4: Introduce REQ_OP_ASSIGN_RANGE to reflect extents allocation in block device internals

Message ID 157599668662.12112.10184894900037871860.stgit@localhost.localdomain
Headers show
Series block,ext4: Introduce REQ_OP_ASSIGN_RANGE to reflect extents allocation in block device internals | expand

Message

Kirill Tkhai Dec. 10, 2019, 4:56 p.m. UTC
Information about continuous extent placement may be useful
for some block devices. Say, distributed network filesystems,
which provide block device interface, may use this information
for better blocks placement over the nodes in their cluster,
and for better performance. Block devices, which map a file
on another filesystem (loop), may request the same length extent
on underlining filesystem for less fragmentation and for batching
allocation requests. Also, hypervisors like QEMU may use this
information for optimization of cluster allocations.

This patchset introduces REQ_OP_ASSIGN_RANGE, which is going
to be used for forwarding user's fallocate(0) requests into
block device internals. It rather similar to existing
REQ_OP_DISCARD, REQ_OP_WRITE_ZEROES, etc. The corresponding
exported primitive is called blkdev_issue_assign_range().
See [1/3] for the details.

Patch [2/3] teaches loop driver to handle REQ_OP_ASSIGN_RANGE
requests by calling fallocate(0).

Patch [3/3] makes ext4 to notify a block device about fallocate(0).

Here is a simple test I did:
https://gist.github.com/tkhai/5b788651cdb74c1dbff3500745878856

I attached a file on ext4 to loop. Then, created ext4 partition
on loop device and started the test in the partition. Direct-io
is enabled on loop.

The test fallocates 4G file and writes from some offset with
given step, then it chooses another offset and repeats. After
the test all the blocks in the file become written.

The results shows that batching extents-assigning requests improves
the performance:

Before patchset: real ~ 1min 27sec
After patchset:  real ~ 1min 16sec (18% better)

Ordinary fallocate() before writes improves the performance
by batching the requests. These results just show, the same
is in case of forwarding extents information to underlining
filesystem.
---

Kirill Tkhai (3):
      block: Add support for REQ_OP_ASSIGN_RANGE operation
      loop: Forward REQ_OP_ASSIGN_RANGE into fallocate(0)
      ext4: Notify block device about fallocate(0)-assigned blocks


 block/blk-core.c          |    4 +++
 block/blk-lib.c           |   70 +++++++++++++++++++++++++++++++++++++++++++++
 block/blk-merge.c         |   21 ++++++++++++++
 block/bounce.c            |    1 +
 drivers/block/loop.c      |    5 +++
 fs/ext4/ext4.h            |    1 +
 fs/ext4/extents.c         |   11 ++++++-
 include/linux/bio.h       |    3 ++
 include/linux/blk_types.h |    2 +
 include/linux/blkdev.h    |   29 +++++++++++++++++++
 10 files changed, 145 insertions(+), 2 deletions(-)

--
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>

Comments

Chaitanya Kulkarni Dec. 11, 2019, 7:42 a.m. UTC | #1
> Here is a simple test I did:
> https://gist.github.com/tkhai/5b788651cdb74c1dbff3500745878856
>

Somehow I'm not able to open this link, can you please share results
in plain text on the email ?
Kirill Tkhai Dec. 11, 2019, 8:50 a.m. UTC | #2
On 11.12.2019 10:42, Chaitanya Kulkarni wrote:
>> Here is a simple test I did:
>> https://gist.github.com/tkhai/5b788651cdb74c1dbff3500745878856
>>
> 
> Somehow I'm not able to open this link, can you please share results
> in plain text on the email ?

#define _GNU_SOURCE
#include <sys/types.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <fcntl.h>
#include <errno.h>

#define BLOCK_SIZE 4096
#define STEP (BLOCK_SIZE * 16)
#define SIZE (4ULL * 1024 * 1024 * 1024)

int main(int argc)
{
	int fd, step, ret = 0;
	unsigned long i;
	void *buf;

	if (posix_memalign(&buf, BLOCK_SIZE, SIZE)) {
		perror("alloc");
		exit(1);
	}

	fd = open("file2.img", O_RDWR|O_CREAT|O_DIRECT);
	if (fd < 0) {
		perror("open");
		exit(1);
	}

	if (ftruncate(fd, SIZE)) {
		perror("ftruncate");
		exit(1);
	}

	ret = fallocate(fd, 0, 0, SIZE);
	if (ret) {
		perror("fallocate");
		exit(1);
	}
	
	for (step = STEP - BLOCK_SIZE; step >= 0; step -= BLOCK_SIZE) {
		printf("step=%u\n", step);
		for (i = step; i < SIZE; i += STEP) {
			errno = 0;
			if (pwrite(fd, buf, BLOCK_SIZE, i) != BLOCK_SIZE) {
				perror("pwrite");
				exit(1);
			}
		}

		if (fsync(fd)) {
			perror("fsync");
			exit(1);
		}
	}

	return 0;
}
Kirill Tkhai Dec. 17, 2019, 2:16 p.m. UTC | #3
Hi!

Any comments on this?

Thanks

On 10.12.2019 19:56, Kirill Tkhai wrote:
> Information about continuous extent placement may be useful
> for some block devices. Say, distributed network filesystems,
> which provide block device interface, may use this information
> for better blocks placement over the nodes in their cluster,
> and for better performance. Block devices, which map a file
> on another filesystem (loop), may request the same length extent
> on underlining filesystem for less fragmentation and for batching
> allocation requests. Also, hypervisors like QEMU may use this
> information for optimization of cluster allocations.
> 
> This patchset introduces REQ_OP_ASSIGN_RANGE, which is going
> to be used for forwarding user's fallocate(0) requests into
> block device internals. It rather similar to existing
> REQ_OP_DISCARD, REQ_OP_WRITE_ZEROES, etc. The corresponding
> exported primitive is called blkdev_issue_assign_range().
> See [1/3] for the details.
> 
> Patch [2/3] teaches loop driver to handle REQ_OP_ASSIGN_RANGE
> requests by calling fallocate(0).
> 
> Patch [3/3] makes ext4 to notify a block device about fallocate(0).
> 
> Here is a simple test I did:
> https://gist.github.com/tkhai/5b788651cdb74c1dbff3500745878856
> 
> I attached a file on ext4 to loop. Then, created ext4 partition
> on loop device and started the test in the partition. Direct-io
> is enabled on loop.
> 
> The test fallocates 4G file and writes from some offset with
> given step, then it chooses another offset and repeats. After
> the test all the blocks in the file become written.
> 
> The results shows that batching extents-assigning requests improves
> the performance:
> 
> Before patchset: real ~ 1min 27sec
> After patchset:  real ~ 1min 16sec (18% better)
> 
> Ordinary fallocate() before writes improves the performance
> by batching the requests. These results just show, the same
> is in case of forwarding extents information to underlining
> filesystem.
> ---
> 
> Kirill Tkhai (3):
>       block: Add support for REQ_OP_ASSIGN_RANGE operation
>       loop: Forward REQ_OP_ASSIGN_RANGE into fallocate(0)
>       ext4: Notify block device about fallocate(0)-assigned blocks
> 
> 
>  block/blk-core.c          |    4 +++
>  block/blk-lib.c           |   70 +++++++++++++++++++++++++++++++++++++++++++++
>  block/blk-merge.c         |   21 ++++++++++++++
>  block/bounce.c            |    1 +
>  drivers/block/loop.c      |    5 +++
>  fs/ext4/ext4.h            |    1 +
>  fs/ext4/extents.c         |   11 ++++++-
>  include/linux/bio.h       |    3 ++
>  include/linux/blk_types.h |    2 +
>  include/linux/blkdev.h    |   29 +++++++++++++++++++
>  10 files changed, 145 insertions(+), 2 deletions(-)
> 
> --
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>