
[v2] ext4: directly leave out of ext4_find_delalloc_range() if filesystem mount with "nodelalloc"

Message ID 1323327594-4914-1-git-send-email-hao.bigrat@gmail.com
State Accepted, archived

Commit Message

Robin Dong Dec. 8, 2011, 6:59 a.m. UTC
From: Robin Dong <sanbai@taobao.com>

We found a performance regression when using bigalloc with "nodelalloc" (1MB cluster size):

1. mke2fs -C 1048576 -O ^has_journal,bigalloc /dev/sda
2. mount -o nodelalloc /dev/sda /test/
3. time dd if=/dev/zero of=/test/io bs=1048576 count=1024

The "dd" takes about 2 seconds to finish, but if we run mke2fs without "bigalloc",
"dd" takes less than 1 second.

The reason is: when ext4 is mounted with "nodelalloc", it calls ext4_find_delalloc_cluster() nearly
every time it calls ext4_ext_map_blocks(), and ext4_find_delalloc_range() then scans all pages
in the cluster because no buffer is "delayed".
A cluster has 256 pages (1MB cluster), so it scans 256 * 256k pages when creating a 1G file. That
severely hurts performance.
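
The "256 * 256k" figure can be checked with a short calculation. The sketch below is only an illustration: it assumes a 4 KiB page size and block size, and one ext4_find_delalloc_range() call per block that scans every page of the cluster. The macro and function names are hypothetical and do not exist in the kernel.

```c
/* Assumed values: 4 KiB pages and blocks; cluster and file sizes come from the test case. */
#define PAGE_SIZE_BYTES    4096ULL
#define CLUSTER_SIZE_BYTES (1024ULL * 1024ULL)           /* mke2fs -C 1048576 */
#define FILE_SIZE_BYTES    (1024ULL * 1024ULL * 1024ULL) /* dd bs=1048576 count=1024 */

/* Pages scanned by one full pass over a cluster. */
unsigned long long pages_per_cluster(void)
{
	return CLUSTER_SIZE_BYTES / PAGE_SIZE_BYTES;
}

/* Blocks mapped while writing the file, assuming one map call per block. */
unsigned long long blocks_in_file(void)
{
	return FILE_SIZE_BYTES / PAGE_SIZE_BYTES;
}

/* Worst case: every map call scans the whole cluster and finds nothing delayed. */
unsigned long long worst_case_page_lookups(void)
{
	return pages_per_cluster() * blocks_in_file();
}
```

Under these assumptions, worst_case_page_lookups() evaluates to 256 * 262144 = 67108864 page lookups for the 1G file.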

Therefore, we return early from ext4_find_delalloc_range() when using "nodelalloc".

Signed-off-by: Robin Dong <sanbai@taobao.com>
---
 fs/ext4/extents.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

Comments

Yongqiang Yang Dec. 8, 2011, 8:36 a.m. UTC | #1
On Thu, Dec 8, 2011 at 2:59 PM, Robin Dong <hao.bigrat@gmail.com> wrote:
> From: Robin Dong <sanbai@taobao.com>
>
> We found a performance regression when using bigalloc with "nodelalloc" (1MB cluster size):
>
> 1. mke2fs -C 1048576 -O ^has_journal,bigalloc /dev/sda
> 2. mount -o nodelalloc /dev/sda /test/
> 3. time dd if=/dev/zero of=/test/io bs=1048576 count=1024
>
> The "dd" takes about 2 seconds to finish, but if we run mke2fs without "bigalloc",
> "dd" takes less than 1 second.
>
> The reason is: when ext4 is mounted with "nodelalloc", it calls ext4_find_delalloc_cluster() nearly
> every time it calls ext4_ext_map_blocks(), and ext4_find_delalloc_range() then scans all pages
> in the cluster because no buffer is "delayed".
> A cluster has 256 pages (1MB cluster), so it scans 256 * 256k pages when creating a 1G file. That
> severely hurts performance.
Looks good to me.

I think the delayed extent tree can help a lot when a cluster has hundreds
of pages in the delalloc case.

Hi Ted,

Any plans on merging the delayed extent tree patches?

Yongqiang.
>
> Therefore, we return early from ext4_find_delalloc_range() when using "nodelalloc".
>
> Signed-off-by: Robin Dong <sanbai@taobao.com>
> ---
>  fs/ext4/extents.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 61fa9e1..60f5f25 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3282,6 +3282,9 @@ static int ext4_find_delalloc_range(struct inode *inode,
>        ext4_lblk_t i, pg_lblk;
>        pgoff_t index;
>
> +       if (!test_opt(inode->i_sb, DELALLOC))
> +               return 0;
> +
>        /* reverse search wont work if fs block size is less than page size */
>        if (inode->i_blkbits < PAGE_CACHE_SHIFT)
>                search_hint_reverse = 0;
> --
> 1.7.4.1
Theodore Ts'o Dec. 19, 2011, 3:39 p.m. UTC | #2
On Thu, Dec 08, 2011 at 02:59:54PM +0800, Robin Dong wrote:
> From: Robin Dong <sanbai@taobao.com>
> 
> We found a performance regression when using bigalloc with "nodelalloc" (1MB cluster size):
> 
> 1. mke2fs -C 1048576 -O ^has_journal,bigalloc /dev/sda
> 2. mount -o nodelalloc /dev/sda /test/
> 3. time dd if=/dev/zero of=/test/io bs=1048576 count=1024
> 
> The "dd" takes about 2 seconds to finish, but if we run mke2fs without "bigalloc",
> "dd" takes less than 1 second.
> 
> The reason is: when ext4 is mounted with "nodelalloc", it calls ext4_find_delalloc_cluster() nearly
> every time it calls ext4_ext_map_blocks(), and ext4_find_delalloc_range() then scans all pages
> in the cluster because no buffer is "delayed".
> A cluster has 256 pages (1MB cluster), so it scans 256 * 256k pages when creating a 1G file. That
> severely hurts performance.
> 
> Therefore, we return early from ext4_find_delalloc_range() when using "nodelalloc".
> 
> Signed-off-by: Robin Dong <sanbai@taobao.com>

Thanks, applied.

					- Ted

Patch

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 61fa9e1..60f5f25 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3282,6 +3282,9 @@  static int ext4_find_delalloc_range(struct inode *inode,
 	ext4_lblk_t i, pg_lblk;
 	pgoff_t index;
 
+	if (!test_opt(inode->i_sb, DELALLOC))
+		return 0;
+
 	/* reverse search wont work if fs block size is less than page size */
 	if (inode->i_blkbits < PAGE_CACHE_SHIFT)
 		search_hint_reverse = 0;
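
The effect of the added check can be illustrated with a small user-space model. This is only a sketch under stated assumptions: struct mock_sb, struct mock_scan_stats and mock_find_delalloc_range() are hypothetical stand-ins invented for illustration, not kernel types or functions, and the loop is a simplification of the real page-cache scan.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the superblock's mount options. */
struct mock_sb {
	bool delalloc;	/* false models a mount with "nodelalloc" */
};

/* Hypothetical counter used to observe how much work the scan did. */
struct mock_scan_stats {
	size_t pages_scanned;
};

/*
 * Simplified model of the patched function: returns 1 if any page in
 * [start, end] is marked delayed, 0 otherwise.
 */
int mock_find_delalloc_range(const struct mock_sb *sb, const bool *page_delayed,
			     size_t start, size_t end,
			     struct mock_scan_stats *stats)
{
	/* Mirrors the added test_opt() check: nothing can be delayed. */
	if (!sb->delalloc)
		return 0;

	for (size_t i = start; i <= end; i++) {
		stats->pages_scanned++;
		if (page_delayed[i])
			return 1;
	}
	return 0;
}
```

In this model, a call over a 256-page cluster with delalloc disabled returns 0 without touching pages_scanned, while the same call with delalloc enabled and no delayed page walks all 256 entries before returning 0.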