backporting "ext4: inplace xattr block update fails to deduplicate blocks" to LTS kernels?

Message ID f3db4b3f-9cb6-c96c-8d31-f2a5f8b4082c@nokia.com
State New
Headers show
Series
  • backporting "ext4: inplace xattr block update fails to deduplicate blocks" to LTS kernels?
Related show

Commit Message

Tommi Rantala Feb. 19, 2018, 1:26 p.m.
Hi,

4.9 (and earlier) LTS kernels are missing this:

commit ec00022030da5761518476096626338bd67df57a
Author: Tahsin Erdogan <tahsin@google.com>
Date:   Sat Aug 5 22:41:42 2017 -0400

     ext4: inplace xattr block update fails to deduplicate blocks


OK to backport it?
I tested it briefly in 4.9, seems to work.

One of our testers noticed a glusterfs performance regression when going 
from 4.4 to 4.9, caused by the duplicated blocks.

In I understand everything correctly, in 4.4 mbcache uses the block 
number in the hash table bucket calculation, and the hash table is 
populated quite evenly even if there are duplicates. So the mbcache is fast.

But in later kernels mbcache puts all the duplicate entries into a 
single bucket. As the entries are stored in one big linked list, this 
obviously makes the mbcache slow.


I tested this in 4.9 (which still has the ext4_xattr_rehash() call that 
got eliminated in commit "ext4: eliminate xattr entry e_hash 
recalculation for removes"):


                         unlock_buffer(bs->bh);
@@ -959,6 +957,7 @@ ext4_xattr_block_set(handle_t *handle, struct inode 
*inode,
                 } else if (bs->bh && s->base == bs->bh->b_data) {
                         /* We were modifying this block in-place. */
                         ea_bdebug(bs->bh, "keeping this block");
+                       ext4_xattr_cache_insert(ext4_mb_cache, bs->bh);
                         new_bh = bs->bh;
                         get_bh(new_bh);
                 } else {



Tommi

Comments

Greg Kroah-Hartman Feb. 21, 2018, 11:40 a.m. | #1
On Mon, Feb 19, 2018 at 03:26:37PM +0200, Tommi Rantala wrote:
> Hi,
> 
> 4.9 (and earlier) LTS kernels are missing this:
> 
> commit ec00022030da5761518476096626338bd67df57a
> Author: Tahsin Erdogan <tahsin@google.com>
> Date:   Sat Aug 5 22:41:42 2017 -0400
> 
>     ext4: inplace xattr block update fails to deduplicate blocks
> 
> 
> OK to backport it?
> I tested it briefly in 4.9, seems to work.
> 
> One of our testers noticed a glusterfs performance regression when going
> from 4.4 to 4.9, caused by the duplicated blocks.
> 
> In I understand everything correctly, in 4.4 mbcache uses the block number
> in the hash table bucket calculation, and the hash table is populated quite
> evenly even if there are duplicates. So the mbcache is fast.
> 
> But in later kernels mbcache puts all the duplicate entries into a single
> bucket. As the entries are stored in one big linked list, this obviously
> makes the mbcache slow.
> 
> 
> I tested this in 4.9 (which still has the ext4_xattr_rehash() call that got
> eliminated in commit "ext4: eliminate xattr entry e_hash recalculation for
> removes"):

I need an ack from the ext4 maintainers before I can take this...
Theodore Y. Ts'o Feb. 21, 2018, 3:56 p.m. | #2
On Wed, Feb 21, 2018 at 12:40:00PM +0100, Greg Kroah-Hartman wrote:
> On Mon, Feb 19, 2018 at 03:26:37PM +0200, Tommi Rantala wrote:
> > 
> > OK to backport it?
> > I tested it briefly in 4.9, seems to work.

It looks sane, but it would be nice if I can get people who are
backporting ext4 patches to make sure there are no regressions using
one of kvm-xfstests[1] or gce-xfstests[2][3].....

[1] https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-xfstests.md
[2] https://github.com/tytso/xfstests-bld/blob/master/Documentation/gce-xfstests.md
[3] https://thunk.org/gce-xfstests

I do run regression tests[4] on stable kernels when I have time, but
it scales much better when other people can help.

[4] https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/tag/?h=ext4-4.9.54-1

> I need an ack from the ext4 maintainers before I can take this...

Greg, you can go ahead and take this, but in the future I'd appreciate
it if ext4 backporters could at least run a smoke test (which takes
less than 15 minutes on GCE) before and after the patch, and report no
test regressions.

More aggressive testers can and should run "{kvm,gce}-xfstests auto"
which take overnight, and the really adventurous can use the
lightweight test manager (ltm) which will break up the test run the
job across multiple VM's and do the job in about 2.5 hours or so.  :-)

		      	     	     - Ted
Tommi Rantala Feb. 22, 2018, 1:37 p.m. | #3
On 21.02.2018 17:56, Theodore Ts'o wrote:
> On Wed, Feb 21, 2018 at 12:40:00PM +0100, Greg Kroah-Hartman wrote:
>> On Mon, Feb 19, 2018 at 03:26:37PM +0200, Tommi Rantala wrote:
>>>
>>> OK to backport it?
>>> I tested it briefly in 4.9, seems to work.
> 
> It looks sane, but it would be nice if I can get people who are
> backporting ext4 patches to make sure there are no regressions using
> one of kvm-xfstests[1] or gce-xfstests[2][3].....
> 
> [1] https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-xfstests.md
> [2] https://github.com/tytso/xfstests-bld/blob/master/Documentation/gce-xfstests.md
> [3] https://thunk.org/gce-xfstests
> 
> I do run regression tests[4] on stable kernels when I have time, but
> it scales much better when other people can help.
> 
> [4] https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/tag/?h=ext4-4.9.54-1
> 
>> I need an ack from the ext4 maintainers before I can take this...
> 
> Greg, you can go ahead and take this, but in the future I'd appreciate
> it if ext4 backporters could at least run a smoke test (which takes
> less than 15 minutes on GCE) before and after the patch, and report no
> test regressions.

Thanks for the instructions!

Smoke test results 4.9.82 with and without the patch (attached, to avoid 
email client mangling it), no new failures:

-------------------- Summary report
KERNEL:    kernel 4.9.82-xfstests #2 SMP Thu Feb 22 14:58:27 EET 2018 x86_64
CPUS:      2
MEM:       1989.2

ext4/4k: 271 tests, 7 failures, 34 skipped, 737 seconds
   Failures: generic/081 generic/383 generic/384 generic/386
     generic/441 generic/451 generic/472
Totals: 271 tests, 34 skipped, 7 failures, 0 errors, 685s


-------------------- Summary report
KERNEL:    kernel 4.9.82-xfstests-00001-gb98ae0251413 #1 SMP Thu Feb 22 
14:31:01 EET 2018 x86_64
CPUS:      2
MEM:       1989.2

ext4/4k: 271 tests, 7 failures, 34 skipped, 749 seconds
   Failures: generic/081 generic/383 generic/384 generic/386
     generic/441 generic/451 generic/472
Totals: 271 tests, 34 skipped, 7 failures, 0 errors, 694s

FSTESTVER: e2fsprogs v1.43.6-85-g7595699d0 (Wed, 6 Sep 2017 22:04:14 -0400)
FSTESTVER: fio  fio-3.2 (Fri, 3 Nov 2017 15:23:49 -0600)
FSTESTVER: quota  4d81e8b (Mon, 16 Oct 2017 09:42:44 +0200)
FSTESTVER: stress-ng 977ae35 (Wed, 6 Sep 2017 23:45:03 -0400)
FSTESTVER: xfsprogs v4.14.0-rc2-1-g19ca9b0b (Mon, 27 Nov 2017 10:56:21 
-0600)
FSTESTVER: xfstests-bld ff7b8c2 (Wed, 13 Dec 2017 21:24:24 -0500)
FSTESTVER: xfstests linux-v3.8-1832-gafeee2d9 (Sun, 31 Dec 2017 13:35:28 
-0500)
FSTESTCFG: 4k
FSTESTSET: -g quick
FSTESTOPT: aex


Tommi
From b98ae025141361b9e92fdd470dfd2314a64a47d0 Mon Sep 17 00:00:00 2001
From: Tahsin Erdogan <tahsin@google.com>
Date: Sat, 5 Aug 2017 22:41:42 -0400
Subject: [PATCH] ext4: inplace xattr block update fails to deduplicate blocks

commit ec00022030da5761518476096626338bd67df57a upstream.

When an xattr block has a single reference, block is updated inplace
and it is reinserted to the cache. Later, a cache lookup is performed
to see whether an existing block has the same contents. This cache
lookup will most of the time return the just inserted entry so
deduplication is not achieved.

Running the following test script will produce two xattr blocks which
can be observed in "File ACL: " line of debugfs output:

  mke2fs -b 1024 -I 128 -F -O extent /dev/sdb 1G
  mount /dev/sdb /mnt/sdb

  touch /mnt/sdb/{x,y}

  setfattr -n user.1 -v aaa /mnt/sdb/x
  setfattr -n user.2 -v bbb /mnt/sdb/x

  setfattr -n user.1 -v aaa /mnt/sdb/y
  setfattr -n user.2 -v bbb /mnt/sdb/y

  debugfs -R 'stat x' /dev/sdb | cat
  debugfs -R 'stat y' /dev/sdb | cat

This patch defers the reinsertion to the cache so that we can locate
other blocks with the same contents.

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
---
 fs/ext4/xattr.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 3eeed8f0aa06..3fadfabcac39 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -837,8 +837,6 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
 				if (!IS_LAST_ENTRY(s->first))
 					ext4_xattr_rehash(header(s->base),
 							  s->here);
-				ext4_xattr_cache_insert(ext4_mb_cache,
-					bs->bh);
 			}
 			ext4_xattr_block_csum_set(inode, bs->bh);
 			unlock_buffer(bs->bh);
@@ -959,6 +957,7 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
 		} else if (bs->bh && s->base == bs->bh->b_data) {
 			/* We were modifying this block in-place. */
 			ea_bdebug(bs->bh, "keeping this block");
+			ext4_xattr_cache_insert(ext4_mb_cache, bs->bh);
 			new_bh = bs->bh;
 			get_bh(new_bh);
 		} else {

Patch

diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 3eeed8f0aa06..3fadfabcac39 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -837,8 +837,6 @@  ext4_xattr_block_set(handle_t *handle, struct inode 
*inode,
                                 if (!IS_LAST_ENTRY(s->first))
                                         ext4_xattr_rehash(header(s->base),
                                                           s->here);
-                               ext4_xattr_cache_insert(ext4_mb_cache,
-                                       bs->bh);
                         }
                         ext4_xattr_block_csum_set(inode, bs->bh);