diff mbox

[v2] UBIFS: Fix assert failed in ubifs_set_page_dirty

Message ID 536092CE.2090209@huawei.com
State Accepted
Headers show

Commit Message

hujianyang April 30, 2014, 6:06 a.m. UTC
Hi, all

Basing on the perious mail, I would like to show a clear figure
about the race I have found.

Thread A (mmap)                        Thread B (fsync)

->__do_fault                           ->write_cache_pages
   -> ubifs_page_mkwrite
       -> budget_space
       -> lock_page
       -> release/convert_page_budget
       -> SetPagePrivate
       -> TestSetPageDirty
       -> unlock_page
                                       -> lock_page
                                           -> TestClearPageDirty
                                           -> ubifs_writepage
                                               -> do_writepage
                                                   -> release_budget
                                                   -> ClearPagePrivate
                                                   -> unlock_page
   -> !(ret & VM_FAULT_LOCKED)
   -> lock_page
   -> set_page_dirty
       -> ubifs_set_page_dirty
           -> TestSetPageDirty (set page dirty without budgeting)
   -> unlock_page

According to this situation, my v2 fix returns from page_mkwrite
without performing unlock_page. We return VM_FAULT_LOCKED instead
of just return 0. After doing this, the race above will not happen.


Signed-off-by: hujianyang <hujianyang@huawei.com>
---
 fs/ubifs/file.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Laurence Withers April 30, 2014, 12:18 p.m. UTC | #1
On Wed, Apr 30, 2014 at 02:06:06PM +0800, hujianyang wrote:
> According to this situation, my v2 fix returns from page_mkwrite
> without performing unlock_page. We return VM_FAULT_LOCKED instead
> of just return 0. After doing this, the race above will not happen.

Hi,

I have been testing patch v1 on our hardware. Normally, at higher data rates,
we can hit the assert in under 8 hours. So far I have been running for 48
hours on one system without seeing the assert message.

I will now switch to running v2 and see if I can't find a few more systems to
run in parallel.

Many thanks, and bye for now,
Dolev Raviv April 30, 2014, 1:48 p.m. UTC | #2
Hi Hujianyang and Laurence,
I'm hitting an assertion in very similar code during the callback
ubifs_releasepage(). I'm quite new to this code area, and currently diving
in to understanding the race you described (before I attempt to look at
the patch).

I was wondering if this assertion is related?

[  862.695278] UBIFS assert failed in ubifs_releasepage at 1434 (pid 11088)
[  862.700952] CPU: 0 PID: 11088 Comm: busybox Tainted: G        W   
3.10.0+ #1
[  862.714460] [<c00147f4>] (unwind_backtrace+0x0/0x11c) from [<c0011bc0>]
(show_stack+0x10/0x14)
[  862.722047] [<c0011bc0>] (show_stack+0x10/0x14) from [<c0187fe8>]
(ubifs_releasepage+0x70/0xb8)
[  862.746573] [<c0187fe8>] (ubifs_releasepage+0x70/0xb8) from
[<c00c7228>] (try_to_release_page+0x44/0x5c)
[  862.755865] [<c00c7228>] (try_to_release_page+0x44/0x5c) from
[<c00d3b58>] (invalidate_inode_page+0x60/0x94)
[  862.764955] vbat_low_handler: vbat low
[  862.768728] [<c00d3b58>] (invalidate_inode_page+0x60/0x94) from
[<c00d3c00>] (invalidate_mapping_pages+0x74/0x114)
[  862.779244] [<c00d3c00>] (invalidate_mapping_pages+0x74/0x114) from
[<c0145004>] (drop_pagecache_sb+0x7c/0xbc)
[  862.789079] [<c0145004>] (drop_pagecache_sb+0x7c/0xbc) from
[<c0104ab4>] (iterate_supers+0x74/0xc8)
[  862.798092] [<c0104ab4>] (iterate_supers+0x74/0xc8) from [<c0145088>]
(drop_caches_sysctl_handler+0x44/0x90)
[  862.808907] [<c0145088>] (drop_caches_sysctl_handler+0x44/0x90) from
[<c014f280>] (proc_sys_call_handler+0x84/0xa0)
[  862.821868] [<c014f280>] (proc_sys_call_handler+0x84/0xa0) from
[<c014f2ac>] (proc_sys_write+0x10/0x14)
[  862.830349] [<c014f2ac>] (proc_sys_write+0x10/0x14) from [<c01022fc>]
(vfs_write+0xd4/0x16c)
[  862.838719] [<c01022fc>] (vfs_write+0xd4/0x16c) from [<c010263c>]
(SyS_write+0x3c/0x60)
[  862.846738] [<c010263c>] (SyS_write+0x3c/0x60) from [<c000e440>]
(ret_fast_syscall+0x0/0x30)


> On Wed, Apr 30, 2014 at 02:06:06PM +0800, hujianyang wrote:
>> According to this situation, my v2 fix returns from page_mkwrite
>> without performing unlock_page. We return VM_FAULT_LOCKED instead
>> of just return 0. After doing this, the race above will not happen.
>
> Hi,
>
> I have been testing patch v1 on our hardware. Normally, at higher data
> rates,
> we can hit the assert in under 8 hours. So far I have been running for 48
> hours on one system without seeing the assert message.
>
> I will now switch to running v2 and see if I can't find a few more systems
> to
> run in parallel.
>
> Many thanks, and bye for now,
> --
> Laurence Withers, <lwithers@guralp.com>
> http://www.guralp.com/
>
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/
>
hujianyang May 4, 2014, 6:38 a.m. UTC | #3
Hi, Dolev

On 2014/4/30 21:48, Dolev Raviv wrote:
> Hi Hujianyang and Laurence,
> I'm hitting an assertion in very similar code during the callback
> ubifs_releasepage(). I'm quite new to this code area, and currently diving
> in to understanding the race you described (before I attempt to look at
> the patch).
> 
> I was wondering if this assertion is related?
> 
> [  862.695278] UBIFS assert failed in ubifs_releasepage at 1434 (pid 11088)

What's your kernel version? Because in v3.15 and v3.10, line 1434
present different assertion.

I think you probably hit:

"ubifs_assert(0)"

static int ubifs_releasepage(struct page *page, gfp_t unused_gfp_flags)
{
        /*
         * An attempt to release a dirty page without budgeting for it - should
         * not happen.
         */
        if (PageWriteback(page))
                return 0;
        ubifs_assert(PagePrivate(page));
        ubifs_assert(0);                          <- v3.10 line 1434
        ClearPagePrivate(page);
        ClearPageChecked(page);
        return 1;
}


Page_Private bit seems to indicate page is budgeted in UBIFS. Dropping
caches performs with page lock and just deals with pages which are not
dirty. Then, if page_private bit is set, vfs performs ->releasepage at
try_to_release_page.

Maybe some explanation of this assert failed is that there is a private
(budgeted) but not dirty page in your page caches.

I think it is not related to my race because pages can just remain
in 3 states at fsync and mmap:

1) Dirty, Private
2) Not Dirty, Not Private
3) Dirty, Not Private (wrong condition)

Did you get some more assert failed after this? If you umount UBIFS,
ubifs_put_super will check the budget info, and I think you should
hit some assertion there.

Do you have a simply way to reproduce this assert failed?


Hu
Dolev Raviv May 4, 2014, 11:54 a.m. UTC | #4
Hi Hu, Thanks allot for your help.
> Hi, Dolev
>
> On 2014/4/30 21:48, Dolev Raviv wrote:
>> Hi Hujianyang and Laurence,
>> I'm hitting an assertion in very similar code during the callback
>> ubifs_releasepage(). I'm quite new to this code area, and currently
>> diving
>> in to understanding the race you described (before I attempt to look at
>> the patch).
>>
>> I was wondering if this assertion is related?
>>
>> [  862.695278] UBIFS assert failed in ubifs_releasepage at 1434 (pid
>> 11088)
>
> What's your kernel version? Because in v3.15 and v3.10, line 1434
> present different assertion.

You are right the version is v3.10 and I do hit the
"ubifs_assert(0)"

> Page_Private bit seems to indicate page is budgeted in UBIFS. Dropping
> caches performs with page lock and just deals with pages which are not
> dirty. Then, if page_private bit is set, vfs performs ->releasepage at
> try_to_release_page.
>
> Maybe some explanation of this assert failed is that there is a private
> (budgeted) but not dirty page in your page caches.

Wouldn't then the assertion in the previous line fail?
I dove into the code, and I understood there probably isn't any real
connection between the to.
I do appreciate you took the time to answer me :) .

>
> I think it is not related to my race because pages can just remain
> in 3 states at fsync and mmap:
>
> 1) Dirty, Private
> 2) Not Dirty, Not Private
> 3) Dirty, Not Private (wrong condition)
>
> Did you get some more assert failed after this? If you umount UBIFS,
> ubifs_put_super will check the budget info, and I think you should
> hit some assertion there.

I did notice, at least once, assertion failure:
[ 4445.102863] UBIFS: un-mount UBI device 1, volume 0
[ 4445.106627] UBIFS assert failed in ubifs_put_super at 1775 (pid 10416)

But, this error happens only significantly later ~2 hours.

>
> Do you have a simply way to reproduce this assert failed?
>

At the moment I hit this randomly when stressing the system.
I use a stress test that run parallel IOzone and lmdd benchmark, each
100MB (out of ~217MB free)



Wouldn't then the assertion in the previous line fail?
I dove into the code, and I understood there probably isn't any real
connection between the to.
I do appreciate you took the time to answer me . :-)

Thanks,
Dolev
Artem Bityutskiy May 5, 2014, 7:02 a.m. UTC | #5
On Wed, 2014-04-30 at 14:06 +0800, hujianyang wrote:
> Hi, all
> 
> Basing on the perious mail, I would like to show a clear figure
> about the race I have found.

Looks correct, thanks. This probably needs to go the the stable tree.
Artem Bityutskiy May 5, 2014, 7:07 a.m. UTC | #6
On Wed, 2014-04-30 at 12:18 +0000, Laurence Withers wrote:
> On Wed, Apr 30, 2014 at 02:06:06PM +0800, hujianyang wrote:
> > According to this situation, my v2 fix returns from page_mkwrite
> > without performing unlock_page. We return VM_FAULT_LOCKED instead
> > of just return 0. After doing this, the race above will not happen.
> 
> Hi,
> 
> I have been testing patch v1 on our hardware. Normally, at higher data rates,
> we can hit the assert in under 8 hours. So far I have been running for 48
> hours on one system without seeing the assert message.
> 
> I will now switch to running v2 and see if I can't find a few more systems to
> run in parallel.
> 
> Many thanks, and bye for now,

So can I add:

Tested-by: Laurence Withers <lwithers@guralp.com>

?
Artem Bityutskiy May 5, 2014, 7:16 a.m. UTC | #7
On Mon, 2014-05-05 at 10:02 +0300, Artem Bityutskiy wrote:
> On Wed, 2014-04-30 at 14:06 +0800, hujianyang wrote:
> > Hi, all
> > 
> > Basing on the perious mail, I would like to show a clear figure
> > about the race I have found.
> 
> Looks correct, thanks. This probably needs to go the the stable tree.

I think the patch is correct and fixes a real problem where we end up
with a dirty page but no budget allocated for this page.

So the subject is not very good - it looks like the patch just silence a
warning, but it actually fixes a real problem.

I'll change the subject.

It looks like this patch should be merged to Linuses tree for 3.15, but
I would like to play safe and postpone it to 3.16, because I did not
test it myself, and only one person's Tested-by: is not enough to make
me confident the patch is bullet-proof.

However, I'll add "Cc: stable@vger.kernel.org", which means that once
the patch hits Linuses tree, it'll also go to older stable kernel trees
too, so all the careful users will end up having it too.

How does this sound?

And thanks for excellent analysis.
hujianyang May 5, 2014, 7:30 a.m. UTC | #8
Hi Artem,

> However, I'll add "Cc: stable@vger.kernel.org", which means that once
> the patch hits Linuses tree, it'll also go to older stable kernel trees
> too, so all the careful users will end up having it too.

I really need this patch goes into stable, it's enough for me. I also
think the commit message is not good and thank you for changing the
subject.

Hu
Laurence Withers May 5, 2014, 11:50 a.m. UTC | #9
On Mon, May 05, 2014 at 10:07:56AM +0300, Artem Bityutskiy wrote:
> So can I add:
> 
> Tested-by: Laurence Withers <lwithers@guralp.com>

Dear Artem,

Yes, please do so. My test system is still going strong, 120 hours in.

Many thanks, and bye for now,
diff mbox

Patch

diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 4f34dba..f7d48a0 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1525,8 +1525,7 @@  static int ubifs_vm_page_mkwrite(struct vm_area_struct *vma,
 	}

 	wait_for_stable_page(page);
-	unlock_page(page);
-	return 0;
+	return VM_FAULT_LOCKED;

 out_unlock:
 	unlock_page(page);