Message ID | 87obfcs1x6.fsf@openvz.org |
---|---|
State | Accepted, archived |
Headers | show |
On Fri, Feb 22, 2013 at 09:17:57PM +0400, Dmitry Monakhov wrote: > > 301'th xfstests are failed due to : > commit d100eef2440fea13e4f09e88b1c8bcbca64beb9f > Author: Zheng Liu <wenqing.lz@taobao.com> > Date: Mon Feb 18 00:29:59 2013 -0500 > > ext4: lookup block mapping in extent status tree > > TESTCASE: https://github.com/dmonakhov/xfstests/commit/7b7efeee30a41109201e2040034e71db9b66ddc0 Thanks for the heads up. I haven't updatied the xfstests I've been using yet, since I want to make sure I'm comparing apples and oranges during the merge window when I'm checking for regressions; I'll update my xfstests in a week or two after the merge window settles down, and then I'll rerun my baseline tests using the updated xfstests against 3.8.0 and 3.9-rc2 or 3.9-rc3. (And furthermore, these new xfstests aren't yet in xfstests upstream yet, right? Any comments from the xfstests maintainer about whether they are going to be willing to take your proposed new test cases?) So when you say this is a regression, I take it that this test #301 doesn't fail on commit d100eef2440f^, but it does fail on d100eef2440f, correct? - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Dmitry, Thanks for pointing out. I will fold your patch into this commit, and take a close look at this bug right now. Regards, - Zheng On 02/23/2013 01:17 AM, Dmitry Monakhov wrote: > > 301'th xfstests are failed due to : > commit d100eef2440fea13e4f09e88b1c8bcbca64beb9f > Author: Zheng Liu <wenqing.lz@taobao.com> > Date: Mon Feb 18 00:29:59 2013 -0500 > > ext4: lookup block mapping in extent status tree > > TESTCASE: https://github.com/dmonakhov/xfstests/commit/7b7efeee30a41109201e2040034e71db9b66ddc0 > > ------------[ cut here ]------------ > kernel BUG at fs/ext4/inode.c:1452! > invalid opcode: 0000 [#1] SMP > Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes\ > l ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod > CPU 2 > Pid: 2142, comm: fio Not tainted 3.8.0-rc3+ #41 /DQ67SW > RIP: 0010:[<ffffffff81333c7b>] [<ffffffff81333c7b>] mpage_da_submit_io+0x41b/0x5b0 > RSP: 0018:ffff8801c2837a48 EFLAGS: 00010202 > RAX: ffff880231e64bb0 RBX: ffff880231e64bb0 RCX: 0000000000000000 > RDX: 0000000000000003 RSI: 0000000000000001 RDI: ffffffff821372d8 > RBP: ffff8801c2837bb8 R08: ffff8801c2837df8 R09: 0000000000000001 > R10: ffff880231e64bb1 R11: 0000000000000003 R12: ffffea00076f5c28 > R13: 00000000000026d0 R14: ffff8801c2837be8 R15: 00000000000ea6d0 > FS: 00007fefeaeb5700(0000) GS:ffff88023d800000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007fefdf10299c CR3: 0000000209154000 CR4: 00000000000407e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Process fio (pid: 2142, threadinfo ffff8801c2836000, task ffff8801d3c48580) > Stack: > ffffffff82030f00 ffff8801c2837b58 ffff8801c2837df8 0000000000000000 > ffff880231e64bb1 0000000000000001 ffff880231e64bb0 000000002ba68688 > ffff8801c2837ae8 00000000000d4800 ffff880231f1cb98 00000000000026d0 > Call Trace: > [<ffffffff81339769>] mpage_da_map_and_submit+0x369/0x3d0 > [<ffffffff81339acb>] write_cache_pages_da+0x1db/0x6f0 > [<ffffffff8133b07f>] ext4_da_writepages+0x49f/0x8b0 > [<ffffffff81105d9a>] ? __lock_acquire+0x4ca/0x560 > [<ffffffff811b4421>] do_writepages+0x51/0x70 > [<ffffffff8119fc24>] __filemap_fdatawrite_range+0x64/0x70 > [<ffffffff811a6a8b>] sys_fadvise64_64+0x25b/0x300 > [<ffffffff811a6b3e>] sys_fadvise64+0xe/0x10 > [<ffffffff818e9e99>] system_call_fastpath+0x16/0x1b > Code: a0 fe ff ff 4c 8b 95 b0 fe ff ff 49 63 d1 48 83 c2 02 48 8b 34 d5 30 4c 2b 82 48 83 c6 01 45 85 c9 48 89 34 d5 30 4c 2b 82 74 04 <0f> 0b eb fe 41 0f b6 56 13 83 e2 01 48 63 f2 48 83 c6 02 48 8b > RIP [<ffffffff81333c7b>] mpage_da_submit_io+0x41b/0x5b0 > RSP <ffff8801c2837a48> > ---[ end trace 6f79fb4f46cf9f0e ]--- > ------------[ cut here ]------------ > > Commit (d100eef2) this is very first commit where we try to make > extent-status-tree usefull. Definitely this is because cached value > is out of sync with real on-disk structure. I'll try to find out exact > place where this happen, but i'm shure this is just a beginning. > So Zheng please add sane self-testing infastructure for extent status > tree, for example like follows: > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 22 Feb 2013 13:03:25 -0500, "Theodore Ts'o" <tytso@mit.edu> wrote: > On Fri, Feb 22, 2013 at 09:17:57PM +0400, Dmitry Monakhov wrote: > > > > 301'th xfstests are failed due to : > > commit d100eef2440fea13e4f09e88b1c8bcbca64beb9f > > Author: Zheng Liu <wenqing.lz@taobao.com> > > Date: Mon Feb 18 00:29:59 2013 -0500 > > > > ext4: lookup block mapping in extent status tree > > > > TESTCASE: https://github.com/dmonakhov/xfstests/commit/7b7efeee30a41109201e2040034e71db9b66ddc0 > > Thanks for the heads up. I haven't updatied the xfstests I've been > using yet, since I want to make sure I'm comparing apples and oranges > during the merge window when I'm checking for regressions; I'll update > my xfstests in a week or two after the merge window settles down, and > then I'll rerun my baseline tests using the updated xfstests against > 3.8.0 and 3.9-rc2 or 3.9-rc3. > > (And furthermore, these new xfstests aren't yet in xfstests upstream > yet, right? Any comments from the xfstests maintainer about whether > they are going to be willing to take your proposed new test cases?) I hope so. I think i've fixed things according to Dave's commit. > So when you say this is a regression, I take it that this test #301 > doesn't fail on commit d100eef2440f^, but it does fail on d100eef2440f, > correct? Correct. d100ee is the first bad commit which trigger BUGON() But issue was introduced earlier es_cache was not updated after extents was swapped between inodes. I'll prepare patch soon. Actually I think that the regression in 269'th you have found recently caused by similar issue and commit which you foud by bisecting ( the one which allow migration between indirect<->extent based inodes) simply helps to spot real issue in es_caching code. BUT my main idea is that we need robust self-testing infrastructure similar one that we have at the time extents was introduced to ext4. > > - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 02/23/2013 05:37 PM, Dmitry Monakhov wrote: > On Fri, 22 Feb 2013 13:03:25 -0500, "Theodore Ts'o" <tytso@mit.edu> wrote: >> On Fri, Feb 22, 2013 at 09:17:57PM +0400, Dmitry Monakhov wrote: >>> >>> 301'th xfstests are failed due to : >>> commit d100eef2440fea13e4f09e88b1c8bcbca64beb9f >>> Author: Zheng Liu <wenqing.lz@taobao.com> >>> Date: Mon Feb 18 00:29:59 2013 -0500 >>> >>> ext4: lookup block mapping in extent status tree >>> >>> TESTCASE: https://github.com/dmonakhov/xfstests/commit/7b7efeee30a41109201e2040034e71db9b66ddc0 >> >> Thanks for the heads up. I haven't updatied the xfstests I've been >> using yet, since I want to make sure I'm comparing apples and oranges >> during the merge window when I'm checking for regressions; I'll update >> my xfstests in a week or two after the merge window settles down, and >> then I'll rerun my baseline tests using the updated xfstests against >> 3.8.0 and 3.9-rc2 or 3.9-rc3. >> >> (And furthermore, these new xfstests aren't yet in xfstests upstream >> yet, right? Any comments from the xfstests maintainer about whether >> they are going to be willing to take your proposed new test cases?) > I hope so. I think i've fixed things according to Dave's commit. >> So when you say this is a regression, I take it that this test #301 >> doesn't fail on commit d100eef2440f^, but it does fail on d100eef2440f, >> correct? > Correct. d100ee is the first bad commit which trigger BUGON() > But issue was introduced earlier es_cache was not updated > after extents was swapped between inodes. Yes, you are right. I forgot to update status tree after we do a defragmentation. > I'll prepare patch soon. Ah, thanks. So I will wait your patch. > Actually I think that the regression in 269'th you have found recently > caused by similar issue and commit which you foud by bisecting ( the one > which allow migration between indirect<->extent based inodes) > simply helps to spot real issue in es_caching code. I will revise this patch. IIRC, we forgot to update status tree after an inode is migrated from extent-based to indirect-based. Thanks for pointing out. > > BUT my main idea is that we need robust self-testing infrastructure > similar one that we have at the time extents was introduced to ext4. Could you please share more detailed with me? Extents had been introduced for a long time ago. I have missed too many things. Thanks, - Zheng -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Feb 23, 2013 at 06:00:35PM +0800, Zheng Liu wrote: > > Actually I think that the regression in 269'th you have found recently > > caused by similar issue and commit which you foud by bisecting ( the one > > which allow migration between indirect<->extent based inodes) > > simply helps to spot real issue in es_caching code. > > I will revise this patch. IIRC, we forgot to update status tree after > an inode is migrated from extent-based to indirect-based. Thanks for > pointing out. Can you do this as a new commit? I've already bumped the master pointer up since I finished running xfstests and I'm seeing no regressions (at least with my set of xfstests). So given that everything has been tested and things looks pretty stable, I pushed up the master branch. I did remember that you were still working on this regression, but since we're already half-way through the merge window, I really want to make things are ready for a merge request to Linus. (Which I probably will be sending to Linus by Monday or Tuesday.) I do plan to collect bug fixes and any remaining regression fixes to push to Linus by -rc2 or -rc3, so if don't rush fixing up defrag functionality. Thanks!! - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Feb 23, 2013 at 07:14:47PM -0500, Theodore Ts'o wrote: > On Sat, Feb 23, 2013 at 06:00:35PM +0800, Zheng Liu wrote: > > > Actually I think that the regression in 269'th you have found recently > > > caused by similar issue and commit which you foud by bisecting ( the one > > > which allow migration between indirect<->extent based inodes) > > > simply helps to spot real issue in es_caching code. > > > > I will revise this patch. IIRC, we forgot to update status tree after > > an inode is migrated from extent-based to indirect-based. Thanks for > > pointing out. > > Can you do this as a new commit? I've already bumped the master > pointer up since I finished running xfstests and I'm seeing no > regressions (at least with my set of xfstests). So given that > everything has been tested and things looks pretty stable, I pushed up > the master branch. Yes, I will prepare it as a new commit. But I am not pretty sure that the root cause is es_caching. > > I did remember that you were still working on this regression, but > since we're already half-way through the merge window, I really want > to make things are ready for a merge request to Linus. (Which I > probably will be sending to Linus by Monday or Tuesday.) > > I do plan to collect bug fixes and any remaining regression fixes to > push to Linus by -rc2 or -rc3, so if don't rush fixing up defrag > functionality. For defrag regression, I have two choice to fix it. One is a quick but sub-optimal fix that we can invalidate all written/unwritten/hole extent from status tree. But it will decrease the performance because we need to load extent into status tree again. Further, one thing we need to keep in our mind is that some extent is unwritten and delayed. So it makes thing complicated. But now we don't need to worry about it because a bigalloc file system doesn't support defrag. So we are safe. Another is to update all extent in status tree. I think this is a better choice and I think Dmirty is working on it. Dmitry, I don't get your response. Could you confirm it? TBH, we never use migration and defrag feature in our product system. I admit that I almost don't pay a attention to them. It's my fault. I make a plan for the next works. 1. Try to prepare a patch that invalidates all cache in status tree to fix defrag regression, and wait Dmitry's patch. 2. Revise migration patch. 3. Submit remain patches for extent status tree that try to convert unwritten extent in end_io callback function and remove a bogus wait in ext4_ind_direct_IO. Now the patch has already done and still need to be tested. 4. get_block_t and *map_blocks cleanup. 5. extent-level locking. Any comment? Thanks, - Zheng -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Dmirty, On Fri, Feb 22, 2013 at 09:17:57PM +0400, Dmitry Monakhov wrote: [snip] > From 65c5fc212b1c684c76899c6e5e1f24d88550c6fc Mon Sep 17 00:00:00 2001 > From: Dmitry Monakhov <dmonakhov@openvz.org> > Date: Fri, 22 Feb 2013 20:55:52 +0400 > Subject: [PATCH] ext4 add sanity ext4_es_lookup_extent > > This patch does very simple thing: it recheck result returned from > ext4_es_lookup_extent() by comparing it old-good lookup via > ext4_{ind,ext}_map_blocks() under i_data_sem > > Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> I try to apply the following patch in my tree, but I realize that it seems that we couldn't add this sanity check in ext4_map_blocks and ext4_da_map_blocks now. The reason is that when we try to initialize an unwritten extent this extent could be zeroed out. But we could know it in *_map_blocks, and status tree couldn't be updated. So we will hit a BUG_ON(1) after added this sanity check. I agree with you that we need to add self-testing infrastructre after applied status tree patch series. But it seems that we need to mix updating status tree code with extent tree code. IMHO it is too complicated. Any thought? Thanks, - Zheng > --- > fs/ext4/inode.c | 42 ++++++++++++++++++++++++++++++++++++++++++ > 1 files changed, 42 insertions(+), 0 deletions(-) > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index 95a0c62..706db1f 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -482,6 +482,41 @@ static pgoff_t ext4_num_dirty_pages(struct inode *inode, pgoff_t idx, > return num; > } > > +#define ES_AGGRESSIVE_TEST 1 > +#ifdef ES_AGGRESSIVE_TEST > +void ext4_map_blocks_es_recheck(handle_t *handle, struct inode *inode, > + struct ext4_map_blocks *es_map, int es_ret, > + struct ext4_map_blocks *map, int flags) > +{ > + int ret; > + > + map->m_flags = 0; > + if (!(flags & EXT4_GET_BLOCKS_NO_LOCK)) > + down_read((&EXT4_I(inode)->i_data_sem)); > + if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) { > + ret = ext4_ext_map_blocks(handle, inode, map, flags & > + EXT4_GET_BLOCKS_KEEP_SIZE); > + } else { > + ret = ext4_ind_map_blocks(handle, inode, map, flags & > + EXT4_GET_BLOCKS_KEEP_SIZE); > + } > + if (es_map->m_lblk != map->m_lblk || > + es_map->m_flags != map->m_flags || > + es_map->m_len != map->m_len || > + es_map->m_pblk != map->m_pblk || > + es_ret != ret) { > + printk("Assertation failed for inode:%lu " > + "es_cached_ex [%d/%d/%llu/%x]:%d != " > + "found_ex [%d/%d/%llu/%x]:%d\n", inode->i_ino, > + es_map->m_lblk, es_map->m_len, es_map->m_pblk, > + es_map->m_flags, es_ret, > + map->m_lblk, map->m_len, map->m_pblk, map->m_flags, ret); > + BUG(); > + } > + if (!(flags & EXT4_GET_BLOCKS_NO_LOCK)) > + up_read((&EXT4_I(inode)->i_data_sem)); > +} > +#endif > /* > * The ext4_map_blocks() function tries to look up the requested blocks, > * and returns if the blocks are already mapped. > @@ -509,7 +544,11 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode, > { > struct extent_status es; > int retval; > +#ifdef ES_AGGRESSIVE_TEST > + struct ext4_map_blocks orig_map; > > + memcpy(&orig_map, map, sizeof(*map)); > +#endif > map->m_flags = 0; > ext_debug("ext4_map_blocks(): inode %lu, flag %d, max_blocks %u," > "logical block %lu\n", inode->i_ino, flags, map->m_len, > @@ -531,6 +570,9 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode, > } else { > BUG_ON(1); > } > +#ifdef ES_AGGRESSIVE_TEST > + ext4_map_blocks_es_recheck(handle,inode, map, retval, &orig_map, flags); > +#endif > goto found; > } > > -- > 1.7.1 > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, 24 Feb 2013 22:58:37 +0800, Zheng Liu <gnehzuil.liu@gmail.com> wrote: > Hi Dmirty, > > On Fri, Feb 22, 2013 at 09:17:57PM +0400, Dmitry Monakhov wrote: > [snip] > > From 65c5fc212b1c684c76899c6e5e1f24d88550c6fc Mon Sep 17 00:00:00 2001 > > From: Dmitry Monakhov <dmonakhov@openvz.org> > > Date: Fri, 22 Feb 2013 20:55:52 +0400 > > Subject: [PATCH] ext4 add sanity ext4_es_lookup_extent > > > > This patch does very simple thing: it recheck result returned from > > ext4_es_lookup_extent() by comparing it old-good lookup via > > ext4_{ind,ext}_map_blocks() under i_data_sem > > > > Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> > > I try to apply the following patch in my tree, but I realize that it > seems that we couldn't add this sanity check in ext4_map_blocks and > ext4_da_map_blocks now. The reason is that when we try to initialize an > unwritten extent this extent could be zeroed out. But we could know it > in *_map_blocks, and status tree couldn't be updated. So we will hit a > BUG_ON(1) after added this sanity check. This means that extent-status tree should be updated once zeroout happen. > > I agree with you that we need to add self-testing infrastructre after > applied status tree patch series. But it seems that we need to mix > updating status tree code with extent tree code. IMHO it is too > complicated. Any thought? Many developers have invested significant amount of man-hours in to ext4's delay allocation state machine. It is now known as stable and reliable So it reasonable to use it as a primary for self-testing of extent-status tree. If es_cache != real_state this is means that we have forget to update es_cache. For example when extents was introduced it have many self testing things such as: 1) CHECK_BINSEARCH__: Which recheck binarry search result. 2) AGGRESSIVE_TEST : Which force deep extent tree constructions and allow to cover unusual branches. 3) DOUBLE_CHECK : Recheck block alloc bitmap one more time I strongly believe in testing. Ext4 is production-grade filesystem so we can not break it in the name of new features, unless we are sure that features are safe and valuable, that's why investments in self-testing infrastructure for ES should have very high priority. But off course Theodore's decision whenever feature is looks stable enough to get ready go upstream. > > Thanks, > - Zheng > > > --- > > fs/ext4/inode.c | 42 ++++++++++++++++++++++++++++++++++++++++++ > > 1 files changed, 42 insertions(+), 0 deletions(-) > > > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > > index 95a0c62..706db1f 100644 > > --- a/fs/ext4/inode.c > > +++ b/fs/ext4/inode.c > > @@ -482,6 +482,41 @@ static pgoff_t ext4_num_dirty_pages(struct inode *inode, pgoff_t idx, > > return num; > > } > > > > +#define ES_AGGRESSIVE_TEST 1 > > +#ifdef ES_AGGRESSIVE_TEST > > +void ext4_map_blocks_es_recheck(handle_t *handle, struct inode *inode, > > + struct ext4_map_blocks *es_map, int es_ret, > > + struct ext4_map_blocks *map, int flags) > > +{ > > + int ret; > > + > > + map->m_flags = 0; > > + if (!(flags & EXT4_GET_BLOCKS_NO_LOCK)) > > + down_read((&EXT4_I(inode)->i_data_sem)); > > + if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) { > > + ret = ext4_ext_map_blocks(handle, inode, map, flags & > > + EXT4_GET_BLOCKS_KEEP_SIZE); > > + } else { > > + ret = ext4_ind_map_blocks(handle, inode, map, flags & > > + EXT4_GET_BLOCKS_KEEP_SIZE); > > + } > > + if (es_map->m_lblk != map->m_lblk || > > + es_map->m_flags != map->m_flags || > > + es_map->m_len != map->m_len || > > + es_map->m_pblk != map->m_pblk || > > + es_ret != ret) { > > + printk("Assertation failed for inode:%lu " > > + "es_cached_ex [%d/%d/%llu/%x]:%d != " > > + "found_ex [%d/%d/%llu/%x]:%d\n", inode->i_ino, > > + es_map->m_lblk, es_map->m_len, es_map->m_pblk, > > + es_map->m_flags, es_ret, > > + map->m_lblk, map->m_len, map->m_pblk, map->m_flags, ret); > > + BUG(); > > + } > > + if (!(flags & EXT4_GET_BLOCKS_NO_LOCK)) > > + up_read((&EXT4_I(inode)->i_data_sem)); > > +} > > +#endif > > /* > > * The ext4_map_blocks() function tries to look up the requested blocks, > > * and returns if the blocks are already mapped. > > @@ -509,7 +544,11 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode, > > { > > struct extent_status es; > > int retval; > > +#ifdef ES_AGGRESSIVE_TEST > > + struct ext4_map_blocks orig_map; > > > > + memcpy(&orig_map, map, sizeof(*map)); > > +#endif > > map->m_flags = 0; > > ext_debug("ext4_map_blocks(): inode %lu, flag %d, max_blocks %u," > > "logical block %lu\n", inode->i_ino, flags, map->m_len, > > @@ -531,6 +570,9 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode, > > } else { > > BUG_ON(1); > > } > > +#ifdef ES_AGGRESSIVE_TEST > > + ext4_map_blocks_es_recheck(handle,inode, map, retval, &orig_map, flags); > > +#endif > > goto found; > > } > > > > -- > > 1.7.1 > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Feb 25, 2013 at 12:39:21PM +0400, Dmitry Monakhov wrote: > On Sun, 24 Feb 2013 22:58:37 +0800, Zheng Liu <gnehzuil.liu@gmail.com> wrote: > > Hi Dmirty, > > > > On Fri, Feb 22, 2013 at 09:17:57PM +0400, Dmitry Monakhov wrote: > > [snip] > > > From 65c5fc212b1c684c76899c6e5e1f24d88550c6fc Mon Sep 17 00:00:00 2001 > > > From: Dmitry Monakhov <dmonakhov@openvz.org> > > > Date: Fri, 22 Feb 2013 20:55:52 +0400 > > > Subject: [PATCH] ext4 add sanity ext4_es_lookup_extent > > > > > > This patch does very simple thing: it recheck result returned from > > > ext4_es_lookup_extent() by comparing it old-good lookup via > > > ext4_{ind,ext}_map_blocks() under i_data_sem > > > > > > Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> > > > > I try to apply the following patch in my tree, but I realize that it > > seems that we couldn't add this sanity check in ext4_map_blocks and > > ext4_da_map_blocks now. The reason is that when we try to initialize an > > unwritten extent this extent could be zeroed out. But we could know it > > in *_map_blocks, and status tree couldn't be updated. So we will hit a > > BUG_ON(1) after added this sanity check. > This means that extent-status tree should be updated once zeroout > happen. > > > > > I agree with you that we need to add self-testing infrastructre after > > applied status tree patch series. But it seems that we need to mix > > updating status tree code with extent tree code. IMHO it is too > > complicated. Any thought? > Many developers have invested significant amount of man-hours in to > ext4's delay allocation state machine. It is now known as stable and reliable > So it reasonable to use it as a primary for self-testing of > extent-status tree. If es_cache != real_state this is means that we have > forget to update es_cache. > For example when extents was introduced it have many self testing things > such as: > 1) CHECK_BINSEARCH__: Which recheck binarry search result. > > 2) AGGRESSIVE_TEST : Which force deep extent tree constructions and > allow to cover unusual branches. > 3) DOUBLE_CHECK : Recheck block alloc bitmap one more time Thanks for your reminder. I will add it. > > I strongly believe in testing. Ext4 is production-grade filesystem > so we can not break it in the name of new features, unless we are sure > that features are safe and valuable, that's why investments in > self-testing infrastructure for ES should have very high priority. > But off course Theodore's decision whenever feature is looks stable > enough to get ready go upstream. Yes, I agree with you that we shouldn't break a production-grade file system. Certainly Ted has the final decision. Regards, - Zheng -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Feb 25, 2013 at 05:57:21PM +0800, Zheng Liu wrote: > > I strongly believe in testing. Ext4 is production-grade filesystem > > so we can not break it in the name of new features, unless we are sure > > that features are safe and valuable, that's why investments in > > self-testing infrastructure for ES should have very high priority. > > But off course Theodore's decision whenever feature is looks stable > > enough to get ready go upstream. > > Yes, I agree with you that we shouldn't break a production-grade file > system. Certainly Ted has the final decision. I agree that having better self-testing for sanity check features would be a good thing to add, so that as we make changes in the future, it's much more likely that we will find potential problems sooner rather than later. Given that amount of testing that we've done looking for regressions, I'm feeling pretty good about the stability of the ext4 tree, so I'm ready to send a pull request to Linus with the extent status tree. Yes, we do have the problem with the defrag ioctl, but I trust that Zheng will have a fix for us before v3.9 releases. Worst case, we can just flush the extent status tree entirely which should (inefficiently) solve the defrag bug. Cheers, - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From 65c5fc212b1c684c76899c6e5e1f24d88550c6fc Mon Sep 17 00:00:00 2001 From: Dmitry Monakhov <dmonakhov@openvz.org> Date: Fri, 22 Feb 2013 20:55:52 +0400 Subject: [PATCH] ext4 add sanity ext4_es_lookup_extent This patch does very simple thing: it recheck result returned from ext4_es_lookup_extent() by comparing it old-good lookup via ext4_{ind,ext}_map_blocks() under i_data_sem Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> --- fs/ext4/inode.c | 42 ++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 42 insertions(+), 0 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 95a0c62..706db1f 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -482,6 +482,41 @@ static pgoff_t ext4_num_dirty_pages(struct inode *inode, pgoff_t idx, return num; } +#define ES_AGGRESSIVE_TEST 1 +#ifdef ES_AGGRESSIVE_TEST +void ext4_map_blocks_es_recheck(handle_t *handle, struct inode *inode, + struct ext4_map_blocks *es_map, int es_ret, + struct ext4_map_blocks *map, int flags) +{ + int ret; + + map->m_flags = 0; + if (!(flags & EXT4_GET_BLOCKS_NO_LOCK)) + down_read((&EXT4_I(inode)->i_data_sem)); + if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) { + ret = ext4_ext_map_blocks(handle, inode, map, flags & + EXT4_GET_BLOCKS_KEEP_SIZE); + } else { + ret = ext4_ind_map_blocks(handle, inode, map, flags & + EXT4_GET_BLOCKS_KEEP_SIZE); + } + if (es_map->m_lblk != map->m_lblk || + es_map->m_flags != map->m_flags || + es_map->m_len != map->m_len || + es_map->m_pblk != map->m_pblk || + es_ret != ret) { + printk("Assertation failed for inode:%lu " + "es_cached_ex [%d/%d/%llu/%x]:%d != " + "found_ex [%d/%d/%llu/%x]:%d\n", inode->i_ino, + es_map->m_lblk, es_map->m_len, es_map->m_pblk, + es_map->m_flags, es_ret, + map->m_lblk, map->m_len, map->m_pblk, map->m_flags, ret); + BUG(); + } + if (!(flags & EXT4_GET_BLOCKS_NO_LOCK)) + up_read((&EXT4_I(inode)->i_data_sem)); +} +#endif /* * The ext4_map_blocks() function tries to look up the requested blocks, * and returns if the blocks are already mapped. @@ -509,7 +544,11 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode, { struct extent_status es; int retval; +#ifdef ES_AGGRESSIVE_TEST + struct ext4_map_blocks orig_map; + memcpy(&orig_map, map, sizeof(*map)); +#endif map->m_flags = 0; ext_debug("ext4_map_blocks(): inode %lu, flag %d, max_blocks %u," "logical block %lu\n", inode->i_ino, flags, map->m_len, @@ -531,6 +570,9 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode, } else { BUG_ON(1); } +#ifdef ES_AGGRESSIVE_TEST + ext4_map_blocks_es_recheck(handle,inode, map, retval, &orig_map, flags); +#endif goto found; } -- 1.7.1