Message ID: 536092CE.2090209@huawei.com
State: Accepted
On Wed, Apr 30, 2014 at 02:06:06PM +0800, hujianyang wrote:
> According to this situation, my v2 fix returns from page_mkwrite
> without performing unlock_page. We return VM_FAULT_LOCKED instead
> of just return 0. After doing this, the race above will not happen.

Hi,

I have been testing patch v1 on our hardware. Normally, at higher data
rates, we can hit the assert in under 8 hours. So far I have been running
for 48 hours on one system without seeing the assert message.

I will now switch to running v2 and see if I can't find a few more systems
to run in parallel.

Many thanks, and bye for now,
Hi Hujianyang and Laurence,
I'm hitting an assertion in very similar code during the callback
ubifs_releasepage(). I'm quite new to this code area, and currently diving
in to understanding the race you described (before I attempt to look at
the patch).

I was wondering if this assertion is related?

[  862.695278] UBIFS assert failed in ubifs_releasepage at 1434 (pid 11088)
[  862.700952] CPU: 0 PID: 11088 Comm: busybox Tainted: G        W    3.10.0+ #1
[  862.714460] [<c00147f4>] (unwind_backtrace+0x0/0x11c) from [<c0011bc0>] (show_stack+0x10/0x14)
[  862.722047] [<c0011bc0>] (show_stack+0x10/0x14) from [<c0187fe8>] (ubifs_releasepage+0x70/0xb8)
[  862.746573] [<c0187fe8>] (ubifs_releasepage+0x70/0xb8) from [<c00c7228>] (try_to_release_page+0x44/0x5c)
[  862.755865] [<c00c7228>] (try_to_release_page+0x44/0x5c) from [<c00d3b58>] (invalidate_inode_page+0x60/0x94)
[  862.764955] vbat_low_handler: vbat low
[  862.768728] [<c00d3b58>] (invalidate_inode_page+0x60/0x94) from [<c00d3c00>] (invalidate_mapping_pages+0x74/0x114)
[  862.779244] [<c00d3c00>] (invalidate_mapping_pages+0x74/0x114) from [<c0145004>] (drop_pagecache_sb+0x7c/0xbc)
[  862.789079] [<c0145004>] (drop_pagecache_sb+0x7c/0xbc) from [<c0104ab4>] (iterate_supers+0x74/0xc8)
[  862.798092] [<c0104ab4>] (iterate_supers+0x74/0xc8) from [<c0145088>] (drop_caches_sysctl_handler+0x44/0x90)
[  862.808907] [<c0145088>] (drop_caches_sysctl_handler+0x44/0x90) from [<c014f280>] (proc_sys_call_handler+0x84/0xa0)
[  862.821868] [<c014f280>] (proc_sys_call_handler+0x84/0xa0) from [<c014f2ac>] (proc_sys_write+0x10/0x14)
[  862.830349] [<c014f2ac>] (proc_sys_write+0x10/0x14) from [<c01022fc>] (vfs_write+0xd4/0x16c)
[  862.838719] [<c01022fc>] (vfs_write+0xd4/0x16c) from [<c010263c>] (SyS_write+0x3c/0x60)
[  862.846738] [<c010263c>] (SyS_write+0x3c/0x60) from [<c000e440>] (ret_fast_syscall+0x0/0x30)

> On Wed, Apr 30, 2014 at 02:06:06PM +0800, hujianyang wrote:
>> According to this situation, my v2 fix returns from page_mkwrite
>> without performing unlock_page. We return VM_FAULT_LOCKED instead
>> of just return 0. After doing this, the race above will not happen.
>
> Hi,
>
> I have been testing patch v1 on our hardware. Normally, at higher data
> rates, we can hit the assert in under 8 hours. So far I have been running
> for 48 hours on one system without seeing the assert message.
>
> I will now switch to running v2 and see if I can't find a few more systems
> to run in parallel.
>
> Many thanks, and bye for now,
> --
> Laurence Withers, <lwithers@guralp.com>
> http://www.guralp.com/
>
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/
Hi, Dolev

On 2014/4/30 21:48, Dolev Raviv wrote:
> Hi Hujianyang and Laurence,
> I'm hitting an assertion in very similar code during the callback
> ubifs_releasepage(). I'm quite new to this code area, and currently diving
> in to understanding the race you described (before I attempt to look at
> the patch).
>
> I was wondering if this assertion is related?
>
> [  862.695278] UBIFS assert failed in ubifs_releasepage at 1434 (pid 11088)

What's your kernel version? Because in v3.15 and v3.10, line 1434
presents a different assertion. I think you probably hit "ubifs_assert(0)":

static int ubifs_releasepage(struct page *page, gfp_t unused_gfp_flags)
{
	/*
	 * An attempt to release a dirty page without budgeting for it - should
	 * not happen.
	 */
	if (PageWriteback(page))
		return 0;
	ubifs_assert(PagePrivate(page));
	ubifs_assert(0);		<- v3.10 line 1434
	ClearPagePrivate(page);
	ClearPageChecked(page);
	return 1;
}

The PagePrivate bit seems to indicate that the page is budgeted in UBIFS.
Dropping caches runs with the page lock held and only deals with pages
which are not dirty. Then, if the PagePrivate bit is set, the VFS calls
->releasepage from try_to_release_page.

Maybe one explanation of this assert failure is that there is a private
(budgeted) but not dirty page in your page cache.

I think it is not related to my race because pages can only remain
in 3 states at fsync and mmap:

1) Dirty, Private
2) Not Dirty, Not Private
3) Dirty, Not Private (wrong condition)

Did you get more assert failures after this? If you umount UBIFS,
ubifs_put_super will check the budget info, and I think you should
hit some assertion there.

Do you have a simple way to reproduce this assert failure?

Hu
Hi Hu,
Thanks a lot for your help.

> Hi, Dolev
>
> On 2014/4/30 21:48, Dolev Raviv wrote:
>> Hi Hujianyang and Laurence,
>> I'm hitting an assertion in very similar code during the callback
>> ubifs_releasepage(). I'm quite new to this code area, and currently
>> diving in to understanding the race you described (before I attempt
>> to look at the patch).
>>
>> I was wondering if this assertion is related?
>>
>> [  862.695278] UBIFS assert failed in ubifs_releasepage at 1434 (pid 11088)
>
> What's your kernel version? Because in v3.15 and v3.10, line 1434
> presents a different assertion.

You are right, the version is v3.10, and I do hit the "ubifs_assert(0)".

> The PagePrivate bit seems to indicate that the page is budgeted in UBIFS.
> Dropping caches runs with the page lock held and only deals with pages
> which are not dirty. Then, if the PagePrivate bit is set, the VFS calls
> ->releasepage from try_to_release_page.
>
> Maybe one explanation of this assert failure is that there is a private
> (budgeted) but not dirty page in your page cache.

Wouldn't the assertion in the previous line fail then? I dove into the
code, and I understood there probably isn't any real connection between
the two. I do appreciate you took the time to answer me :) .

> I think it is not related to my race because pages can only remain
> in 3 states at fsync and mmap:
>
> 1) Dirty, Private
> 2) Not Dirty, Not Private
> 3) Dirty, Not Private (wrong condition)
>
> Did you get more assert failures after this? If you umount UBIFS,
> ubifs_put_super will check the budget info, and I think you should
> hit some assertion there.

I did notice, at least once, an assertion failure:

[ 4445.102863] UBIFS: un-mount UBI device 1, volume 0
[ 4445.106627] UBIFS assert failed in ubifs_put_super at 1775 (pid 10416)

But this error happens only significantly later, ~2 hours in.

> Do you have a simple way to reproduce this assert failure?

At the moment I hit this randomly when stressing the system. I use a
stress test that runs parallel IOzone and lmdd benchmarks, each 100MB
(out of ~217MB free).

Thanks,
Dolev
On Wed, 2014-04-30 at 14:06 +0800, hujianyang wrote:
> Hi, all
>
> Based on the previous mail, I would like to show a clear figure
> about the race I have found.

Looks correct, thanks. This probably needs to go to the stable tree.
On Wed, 2014-04-30 at 12:18 +0000, Laurence Withers wrote:
> On Wed, Apr 30, 2014 at 02:06:06PM +0800, hujianyang wrote:
> > According to this situation, my v2 fix returns from page_mkwrite
> > without performing unlock_page. We return VM_FAULT_LOCKED instead
> > of just return 0. After doing this, the race above will not happen.
>
> Hi,
>
> I have been testing patch v1 on our hardware. Normally, at higher data
> rates, we can hit the assert in under 8 hours. So far I have been running
> for 48 hours on one system without seeing the assert message.
>
> I will now switch to running v2 and see if I can't find a few more systems
> to run in parallel.
>
> Many thanks, and bye for now,

So can I add:

Tested-by: Laurence Withers <lwithers@guralp.com>

?
On Mon, 2014-05-05 at 10:02 +0300, Artem Bityutskiy wrote:
> On Wed, 2014-04-30 at 14:06 +0800, hujianyang wrote:
> > Hi, all
> >
> > Based on the previous mail, I would like to show a clear figure
> > about the race I have found.
>
> Looks correct, thanks. This probably needs to go to the stable tree.

I think the patch is correct and fixes a real problem where we end up
with a dirty page but no budget allocated for this page. So the subject
is not very good - it looks like the patch just silences a warning, but
it actually fixes a real problem. I'll change the subject.

It looks like this patch should be merged to Linus' tree for 3.15, but
I would like to play safe and postpone it to 3.16, because I did not
test it myself, and only one person's Tested-by: is not enough to make
me confident the patch is bullet-proof.

However, I'll add "Cc: stable@vger.kernel.org", which means that once
the patch hits Linus' tree, it'll also go to older stable kernel trees,
so all the careful users will end up having it too.

How does this sound? And thanks for the excellent analysis.
Hi Artem,

> However, I'll add "Cc: stable@vger.kernel.org", which means that once
> the patch hits Linus' tree, it'll also go to older stable kernel trees,
> so all the careful users will end up having it too.

I really need this patch to go into stable, so that's enough for me.
I also agree the commit message was not good; thank you for changing
the subject.

Hu
On Mon, May 05, 2014 at 10:07:56AM +0300, Artem Bityutskiy wrote:
> So can I add:
>
> Tested-by: Laurence Withers <lwithers@guralp.com>

Dear Artem,

Yes, please do so. My test system is still going strong, 120 hours in.

Many thanks, and bye for now,
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 4f34dba..f7d48a0 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1525,8 +1525,7 @@ static int ubifs_vm_page_mkwrite(struct vm_area_struct *vma,
 	}
 
 	wait_for_stable_page(page);
-	unlock_page(page);
-	return 0;
+	return VM_FAULT_LOCKED;
 
 out_unlock:
 	unlock_page(page);
Hi, all

Based on the previous mail, I would like to show a clear figure about
the race I have found.

      Thread A (mmap)                        Thread B (fsync)

->__do_fault
  ->ubifs_page_mkwrite
    ->budget_space
    ->lock_page
    ->release/convert_page_budget
    ->SetPagePrivate
    ->TestSetPageDirty
    ->unlock_page
                                       ->write_cache_pages
                                         ->lock_page
                                         ->TestClearPageDirty
                                         ->ubifs_writepage
                                           ->do_writepage
                                             ->release_budget
                                             ->ClearPagePrivate
                                           ->unlock_page
->!(ret & VM_FAULT_LOCKED)
->lock_page
  ->set_page_dirty
    ->ubifs_set_page_dirty
      ->TestSetPageDirty
        (set page dirty without budgeting)
->unlock_page

According to this situation, my v2 fix returns from page_mkwrite
without performing unlock_page. We return VM_FAULT_LOCKED instead
of just return 0. After doing this, the race above will not happen.

Signed-off-by: hujianyang <hujianyang@huawei.com>
---
 fs/ubifs/file.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)