diff mbox

[2/6] ext4: Move LRU list handling into extent status code

Message ID 1415961957-21621-3-git-send-email-jack@suse.cz
State Superseded, archived
Headers show

Commit Message

Jan Kara Nov. 14, 2014, 10:45 a.m. UTC
Currently callers adding extents to extent status tree were responsible
for adding the inode to LRU list. This is error prone and puts LRU list
handling in unnecessarily many places.

Just add inode to LRU automatically when the first non-delay extent is
added to the tree and remove inode from LRU when the last non-delay
extent is removed.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/extents.c        |  2 --
 fs/ext4/extents_status.c | 10 ++++++----
 fs/ext4/extents_status.h |  2 --
 fs/ext4/inode.c          |  2 --
 fs/ext4/ioctl.c          |  2 --
 fs/ext4/super.c          |  1 -
 6 files changed, 6 insertions(+), 13 deletions(-)

Comments

Theodore Ts'o Nov. 23, 2014, 5:48 a.m. UTC | #1
On Fri, Nov 14, 2014 at 11:45:53AM +0100, Jan Kara wrote:
> Currently callers adding extents to extent status tree were responsible
> for adding the inode to LRU list. This is error prone and puts LRU list
> handling in unnecessarily many places.
> 
> Just add inode to LRU automatically when the first non-delay extent is
> added to the tree and remove inode from LRU when the last non-delay
> extent is removed.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

While trying to bisect the ext4/003 regression, I found that patches
one and two applied to gether causes the following deadlock (very
early in the boot sequence; before xfstests is started):

[   24.680699] 
[   24.681110] =============================================
[   24.682755] [ INFO: possible recursive locking detected ]
[   24.683344] 3.18.0-rc3-00538-gc12044b #2369 Not tainted
[   24.683344] ---------------------------------------------
[   24.683344] runtests.sh/2772 is trying to acquire lock:
[   24.683344]  (&(&sbi->s_es_lru_lock)->rlock){+.+...}, at: [<c02e870b>] ext4_es_free_extent+0x6f/0xc5
[   24.683344] 
[   24.683344] but task is already holding lock:
[   24.683344]  (&(&sbi->s_es_lru_lock)->rlock){+.+...}, at: [<c02e8857>] __ext4_es_shrink+0x3a/0x346
[   24.683344] 
[   24.683344] other info that might help us debug this:
[   24.683344]  Possible unsafe locking scenario:
[   24.683344] 
[   24.683344]        CPU0
[   24.683344]        ----
[   24.683344]   lock(&(&sbi->s_es_lru_lock)->rlock);
[   24.683344]   lock(&(&sbi->s_es_lru_lock)->rlock);
[   24.683344] 
[   24.683344]  *** DEADLOCK ***
[   24.683344] 
[   24.683344]  May be due to missing lock nesting notation
[   24.683344] 
[   24.683344] 4 locks held by runtests.sh/2772:
[   24.683344]  #0:  (sb_writers#5){.+.+.+}, at: [<c024a69d>] file_start_write+0x24/0x26
[   24.683344]  #1:  (shrinker_rwsem){++++..}, at: [<c021cca2>] shrink_slab+0x29/0xcd
[   24.683344]  #2:  (&(&sbi->s_es_lru_lock)->rlock){+.+...}, at: [<c02e8857>] __ext4_es_shrink+0x3a/0x346
[   24.683344]  #3:  (&ei->i_es_lock){++++..}, at: [<c02e8900>] __ext4_es_shrink+0xe3/0x346
[   24.683344] 
[   24.683344] stack backtrace:
[   24.683344] CPU: 1 PID: 2772 Comm: runtests.sh Not tainted 3.18.0-rc3-00538-gc12044b #2369
[   24.683344] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   24.683344]  00000000 00000000 f3f55d4c c091b746 f343e0d0 f3f55dc0 c019fa14 c0c0cc77
[   24.683344]  c0c0d6ea c0c0cb76 00000002 f343e694 c149d110 00003900 00000000 24b02008
[   24.683344]  4c422213 f343e694 00000000 00012581 00000004 f343e0d0 f343e6b8 f343e6bc
[   24.683344] Call Trace:
[   24.683344]  [<c091b746>] dump_stack+0x48/0x60
[   24.683344]  [<c019fa14>] __lock_acquire+0xb7d/0xcd9
[   24.683344]  [<c019f1f6>] ? __lock_acquire+0x35f/0xcd9
[   24.683344]  [<c019feb8>] lock_acquire+0xe7/0x15e
[   24.683344]  [<c02e870b>] ? ext4_es_free_extent+0x6f/0xc5
[   24.683344]  [<c0923cd1>] _raw_spin_lock+0x2a/0x5a
[   24.683344]  [<c02e870b>] ? ext4_es_free_extent+0x6f/0xc5
[   24.683344]  [<c02e870b>] ext4_es_free_extent+0x6f/0xc5
[   24.683344]  [<c02e87ff>] __es_try_to_reclaim_extents+0x9e/0xbc
[   24.683344]  [<c02e890e>] __ext4_es_shrink+0xf1/0x346
[   24.683344]  [<c02e8c38>] ext4_es_scan+0xd5/0x1fc
[   24.683344]  [<c021c8c5>] shrink_slab_node+0x196/0x321
[   24.683344]  [<c021ccd8>] shrink_slab+0x5f/0xcd
[   24.683344]  [<c0287306>] ? drop_pagecache_sb+0xbf/0xbf
[   24.683344]  [<c0287382>] drop_caches_sysctl_handler+0x7c/0xd2
[   24.683344]  [<c0296647>] proc_sys_call_handler+0x7b/0x9c
[   24.683344]  [<c0296668>] ? proc_sys_call_handler+0x9c/0x9c
[   24.683344]  [<c029667a>] proc_sys_write+0x12/0x14
[   24.683344]  [<c024ae85>] vfs_write+0x8c/0xf7
[   24.683344]  [<c024b21c>] SyS_write+0x4f/0x7c
[   24.683344]  [<c0924a2a>] syscall_call+0x7/0x7

We end up removing all of the LRU code in the next patch in the patch
series, so this is not a major problem, but it can trip people up when
they are doing bisects.  It may be simplest just to combine patches 2
and 3 in this series.

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara Nov. 24, 2014, 9:32 a.m. UTC | #2
On Sun 23-11-14 00:48:02, Ted Tso wrote:
> On Fri, Nov 14, 2014 at 11:45:53AM +0100, Jan Kara wrote:
> > Currently callers adding extents to extent status tree were responsible
> > for adding the inode to LRU list. This is error prone and puts LRU list
> > handling in unnecessarily many places.
> > 
> > Just add inode to LRU automatically when the first non-delay extent is
> > added to the tree and remove inode from LRU when the last non-delay
> > extent is removed.
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> While trying to bisect the ext4/003 regression, I found that patches
> one and two applied to gether causes the following deadlock (very
> early in the boot sequence; before xfstests is started):
> 
> [   24.680699] 
> [   24.681110] =============================================
> [   24.682755] [ INFO: possible recursive locking detected ]
> [   24.683344] 3.18.0-rc3-00538-gc12044b #2369 Not tainted
> [   24.683344] ---------------------------------------------
> [   24.683344] runtests.sh/2772 is trying to acquire lock:
> [   24.683344]  (&(&sbi->s_es_lru_lock)->rlock){+.+...}, at: [<c02e870b>] ext4_es_free_extent+0x6f/0xc5
> [   24.683344] 
> [   24.683344] but task is already holding lock:
> [   24.683344]  (&(&sbi->s_es_lru_lock)->rlock){+.+...}, at: [<c02e8857>] __ext4_es_shrink+0x3a/0x346
> [   24.683344] 
> [   24.683344] other info that might help us debug this:
> [   24.683344]  Possible unsafe locking scenario:
> [   24.683344] 
> [   24.683344]        CPU0
> [   24.683344]        ----
> [   24.683344]   lock(&(&sbi->s_es_lru_lock)->rlock);
> [   24.683344]   lock(&(&sbi->s_es_lru_lock)->rlock);
> [   24.683344] 
> [   24.683344]  *** DEADLOCK ***
> [   24.683344] 
> [   24.683344]  May be due to missing lock nesting notation
> [   24.683344] 
> [   24.683344] 4 locks held by runtests.sh/2772:
> [   24.683344]  #0:  (sb_writers#5){.+.+.+}, at: [<c024a69d>] file_start_write+0x24/0x26
> [   24.683344]  #1:  (shrinker_rwsem){++++..}, at: [<c021cca2>] shrink_slab+0x29/0xcd
> [   24.683344]  #2:  (&(&sbi->s_es_lru_lock)->rlock){+.+...}, at: [<c02e8857>] __ext4_es_shrink+0x3a/0x346
> [   24.683344]  #3:  (&ei->i_es_lock){++++..}, at: [<c02e8900>] __ext4_es_shrink+0xe3/0x346
> [   24.683344] 
> [   24.683344] stack backtrace:
> [   24.683344] CPU: 1 PID: 2772 Comm: runtests.sh Not tainted 3.18.0-rc3-00538-gc12044b #2369
> [   24.683344] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [   24.683344]  00000000 00000000 f3f55d4c c091b746 f343e0d0 f3f55dc0 c019fa14 c0c0cc77
> [   24.683344]  c0c0d6ea c0c0cb76 00000002 f343e694 c149d110 00003900 00000000 24b02008
> [   24.683344]  4c422213 f343e694 00000000 00012581 00000004 f343e0d0 f343e6b8 f343e6bc
> [   24.683344] Call Trace:
> [   24.683344]  [<c091b746>] dump_stack+0x48/0x60
> [   24.683344]  [<c019fa14>] __lock_acquire+0xb7d/0xcd9
> [   24.683344]  [<c019f1f6>] ? __lock_acquire+0x35f/0xcd9
> [   24.683344]  [<c019feb8>] lock_acquire+0xe7/0x15e
> [   24.683344]  [<c02e870b>] ? ext4_es_free_extent+0x6f/0xc5
> [   24.683344]  [<c0923cd1>] _raw_spin_lock+0x2a/0x5a
> [   24.683344]  [<c02e870b>] ? ext4_es_free_extent+0x6f/0xc5
> [   24.683344]  [<c02e870b>] ext4_es_free_extent+0x6f/0xc5
> [   24.683344]  [<c02e87ff>] __es_try_to_reclaim_extents+0x9e/0xbc
> [   24.683344]  [<c02e890e>] __ext4_es_shrink+0xf1/0x346
> [   24.683344]  [<c02e8c38>] ext4_es_scan+0xd5/0x1fc
> [   24.683344]  [<c021c8c5>] shrink_slab_node+0x196/0x321
> [   24.683344]  [<c021ccd8>] shrink_slab+0x5f/0xcd
> [   24.683344]  [<c0287306>] ? drop_pagecache_sb+0xbf/0xbf
> [   24.683344]  [<c0287382>] drop_caches_sysctl_handler+0x7c/0xd2
> [   24.683344]  [<c0296647>] proc_sys_call_handler+0x7b/0x9c
> [   24.683344]  [<c0296668>] ? proc_sys_call_handler+0x9c/0x9c
> [   24.683344]  [<c029667a>] proc_sys_write+0x12/0x14
> [   24.683344]  [<c024ae85>] vfs_write+0x8c/0xf7
> [   24.683344]  [<c024b21c>] SyS_write+0x4f/0x7c
> [   24.683344]  [<c0924a2a>] syscall_call+0x7/0x7
> 
> We end up removing all of the LRU code in the next patch in the patch
> series, so this is not a major problem, but it can trip people up when
> they are doing bisects.  It may be simplest just to combine patches 2
> and 3 in this series.
  Thanks for spotting this. The changes are fairly different so I don't
like combining the two patches. But we can just swap them in the series
without too much trouble and that should resolve the problem as well.

								Honza
diff mbox

Patch

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 0a97e039d0a0..923327719507 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4622,7 +4622,6 @@  out2:
 
 	trace_ext4_ext_map_blocks_exit(inode, flags, map,
 				       err ? err : allocated);
-	ext4_es_lru_add(inode);
 	return err ? err : allocated;
 }
 
@@ -5181,7 +5180,6 @@  int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 		error = ext4_fill_fiemap_extents(inode, start_blk,
 						 len_blks, fieinfo);
 	}
-	ext4_es_lru_add(inode);
 	return error;
 }
 
diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index 94e7855ae71b..339a9e1678e6 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -314,7 +314,8 @@  ext4_es_alloc_extent(struct inode *inode, ext4_lblk_t lblk, ext4_lblk_t len,
 	 * We don't count delayed extent because we never try to reclaim them
 	 */
 	if (!ext4_es_is_delayed(es)) {
-		EXT4_I(inode)->i_es_lru_nr++;
+		if (!EXT4_I(inode)->i_es_lru_nr++)
+			ext4_es_lru_add(inode);
 		percpu_counter_inc(&EXT4_SB(inode->i_sb)->
 					s_es_stats.es_stats_lru_cnt);
 	}
@@ -333,7 +334,8 @@  static void ext4_es_free_extent(struct inode *inode, struct extent_status *es)
 	/* Decrease the lru counter when this es is not delayed */
 	if (!ext4_es_is_delayed(es)) {
 		BUG_ON(EXT4_I(inode)->i_es_lru_nr == 0);
-		EXT4_I(inode)->i_es_lru_nr--;
+		if (!--EXT4_I(inode)->i_es_lru_nr)
+			ext4_es_lru_del(inode);
 		percpu_counter_dec(&EXT4_SB(inode->i_sb)->
 					s_es_stats.es_stats_lru_cnt);
 	}
@@ -1225,7 +1227,7 @@  void ext4_es_unregister_shrinker(struct ext4_sb_info *sbi)
 	unregister_shrinker(&sbi->s_es_shrinker);
 }
 
-void ext4_es_lru_add(struct inode *inode)
+static void ext4_es_lru_add(struct inode *inode)
 {
 	struct ext4_inode_info *ei = EXT4_I(inode);
 	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
@@ -1241,7 +1243,7 @@  void ext4_es_lru_add(struct inode *inode)
 	spin_unlock(&sbi->s_es_lru_lock);
 }
 
-void ext4_es_lru_del(struct inode *inode)
+static void ext4_es_lru_del(struct inode *inode)
 {
 	struct ext4_inode_info *ei = EXT4_I(inode);
 	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
index efd5f970b501..9b5e7616b116 100644
--- a/fs/ext4/extents_status.h
+++ b/fs/ext4/extents_status.h
@@ -151,7 +151,5 @@  static inline void ext4_es_store_pblock_status(struct extent_status *es,
 
 extern int ext4_es_register_shrinker(struct ext4_sb_info *sbi);
 extern void ext4_es_unregister_shrinker(struct ext4_sb_info *sbi);
-extern void ext4_es_lru_add(struct inode *inode);
-extern void ext4_es_lru_del(struct inode *inode);
 
 #endif /* _EXT4_EXTENTS_STATUS_H */
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 69601d288119..e9e580152720 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -491,7 +491,6 @@  int ext4_map_blocks(handle_t *handle, struct inode *inode,
 
 	/* Lookup extent status tree firstly */
 	if (ext4_es_lookup_extent(inode, map->m_lblk, &es)) {
-		ext4_es_lru_add(inode);
 		if (ext4_es_is_written(&es) || ext4_es_is_unwritten(&es)) {
 			map->m_pblk = ext4_es_pblock(&es) +
 					map->m_lblk - es.es_lblk;
@@ -1393,7 +1392,6 @@  static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
 
 	/* Lookup extent status tree firstly */
 	if (ext4_es_lookup_extent(inode, iblock, &es)) {
-		ext4_es_lru_add(inode);
 		if (ext4_es_is_hole(&es)) {
 			retval = 0;
 			down_read(&EXT4_I(inode)->i_data_sem);
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index bfda18a15592..f58a0d106726 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -78,8 +78,6 @@  static void swap_inode_data(struct inode *inode1, struct inode *inode2)
 	memswap(&ei1->i_disksize, &ei2->i_disksize, sizeof(ei1->i_disksize));
 	ext4_es_remove_extent(inode1, 0, EXT_MAX_BLOCKS);
 	ext4_es_remove_extent(inode2, 0, EXT_MAX_BLOCKS);
-	ext4_es_lru_del(inode1);
-	ext4_es_lru_del(inode2);
 
 	isize = i_size_read(inode1);
 	i_size_write(inode1, i_size_read(inode2));
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 1eda6ab0ef9d..9f052d49553e 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -972,7 +972,6 @@  void ext4_clear_inode(struct inode *inode)
 	dquot_drop(inode);
 	ext4_discard_preallocations(inode);
 	ext4_es_remove_extent(inode, 0, EXT_MAX_BLOCKS);
-	ext4_es_lru_del(inode);
 	if (EXT4_I(inode)->jinode) {
 		jbd2_journal_release_jbd_inode(EXT4_JOURNAL(inode),
 					       EXT4_I(inode)->jinode);