mtd/ubi: fix initialization order of ubi subsystems

Message ID 20190620132753.19538-1-mikhail.kshevetskiy@gmail.com
State Rejected
Delegated to: Richard Weinberger

Commit Message

Mikhail Kshevetskiy June 20, 2019, 1:27 p.m. UTC
During ubi initialization we have the following call sequence:

1) ubi_attach()

   ----------------------------------------------------------------
   err = ubi_wl_init(ubi, ai);
   if (err) goto out_vtbl;

   err = ubi_eba_init(ubi, ai);
   if (err) goto out_wl;
   ----------------------------------------------------------------

   As we can see, the "eba" subsystem is NOT yet initialized while the
   "wl" subsystem is being initialized.

2) ubi_wl_init()

   At some point it calls ensure_wear_leveling().

3) ensure_wear_leveling()

   ---------------------------------------------------------------
   e1 = rb_entry(rb_first(&ubi->used), struct ubi_wl_entry, u.rb);
   e2 = find_wl_entry(ubi, &ubi->free, WL_FREE_MAX_DIFF);
   if (!(e2->ec - e1->ec >= UBI_WL_THRESHOLD)) goto out_unlock;
   dbg_wl("schedule wear-leveling");
   ---------------------------------------------------------------

   So if no wear-leveling is scheduled, everything is OK.

   A little further down:

   ---------------------------------------------------------------
   wrk->anchor = 0;
   wrk->func = &wear_leveling_worker;
   if (nested) __schedule_ubi_work(ubi, wrk);
   else schedule_ubi_work(ubi, wrk);
   ---------------------------------------------------------------

   As a result, we enter the wear_leveling_worker() function.

4) wear_leveling_worker()

   ---------------------------------------------------------------
   /*
    * Now pick the least worn-out used physical eraseblock and a
    * highly worn-out free physical eraseblock. If the erase
    * counters differ much enough, start wear-leveling.
    */
   e1 = rb_entry(rb_first(&ubi->used), struct ubi_wl_entry, u.rb);
   e2 = get_peb_for_wl(ubi);
   if (!e2) goto out_cancel;

   if (!(e2->ec - e1->ec >= UBI_WL_THRESHOLD)) {
       dbg_wl("no WL needed: min used EC %d, max free EC %d", e1->ec, e2->ec);
       /* Give the unused PEB back */
       wl_tree_add(e2, &ubi->free);
       ubi->free_count++;
       goto out_cancel;
   }
   ---------------------------------------------------------------

   So if no WL is needed, everything is OK.

   A little further down:

   ---------------------------------------------------------------
   err = ubi_eba_copy_leb(ubi, e1->pnum, e2->pnum, vid_hdr);
   ---------------------------------------------------------------

   Oops, the eba subsystem is not initialized yet (see (1)).

On the other hand, it looks like the eba subsystem does not require the wl
subsystem during initialization, so just fix the ordering and handle the error
path properly.
---
 drivers/mtd/ubi/attach.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
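
The ordering hazard described in the commit message can be illustrated with a
small standalone model. This is only a sketch: every model_* name is
hypothetical and not part of the UBI code, and it assumes that pending
wear-leveling work runs immediately during wl initialization, which is the
premise of the commit message rather than an established fact about the
kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct ubi_device; it only tracks init state. */
struct model_dev {
	bool eba_ready;     /* set by model_eba_init() */
	bool wl_ready;      /* set by model_wl_init() */
	int early_eba_use;  /* how often EBA was used before it was ready */
};

/* Stands in for ubi_eba_copy_leb(): it needs an initialized EBA. */
static void model_eba_copy(struct model_dev *d)
{
	if (!d->eba_ready)
		d->early_eba_use++;
}

static int model_eba_init(struct model_dev *d)
{
	d->eba_ready = true;
	return 0;
}

/*
 * Stands in for ubi_wl_init(). ASSUMPTION: if wear-leveling is needed,
 * the worker runs right away and ends up touching EBA.
 */
static int model_wl_init(struct model_dev *d, bool wl_needed)
{
	d->wl_ready = true;
	if (wl_needed)
		model_eba_copy(d);
	return 0;
}

/* Returns the number of premature EBA uses for the chosen init order. */
int model_attach(bool eba_first, bool wl_needed)
{
	struct model_dev d = { false, false, 0 };

	if (eba_first) {
		model_eba_init(&d);
		model_wl_init(&d, wl_needed);
	} else {
		model_wl_init(&d, wl_needed);
		model_eba_init(&d);
	}
	assert(d.eba_ready && d.wl_ready);
	return d.early_eba_use;
}
```

In this model, model_attach(false, true) reports one premature EBA use, while
model_attach(false, false) and model_attach(true, true) report none. That
matches the pattern in the commit message: the problem shows up only when
wear-leveling is actually needed at attach time.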

Comments

Artem Bityutskiy June 21, 2019, 4:04 p.m. UTC | #1
On Thu, 2019-06-20 at 16:27 +0300, Mikhail Kshevetskiy wrote:
> -	err = ubi_wl_init(ubi, ai);
> +	err = ubi_eba_init(ubi, ai);
>  	if (err)
>  		goto out_vtbl;
>  
> -	err = ubi_eba_init(ubi, ai);
> +	err = ubi_wl_init(ubi, ai);
>  	if (err)
> -		goto out_wl;
> +		goto out_vtbl;

Looks good to me, thanks.

Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Richard Weinberger June 21, 2019, 5:26 p.m. UTC | #2
On Thu, Jun 20, 2019 at 3:28 PM Mikhail Kshevetskiy
<mikhail.kshevetskiy@gmail.com> wrote:
>
> During ubi initialization we have the following call sequence:
>
> 1) ubi_attach()
>
>    ----------------------------------------------------------------
>    err = ubi_wl_init(ubi, ai);
>    if (err) goto out_vtbl;
>
>    err = ubi_eba_init(ubi, ai);
>    if (err) goto out_wl;
>    ----------------------------------------------------------------
>
>    As we can see, the "eba" subsystem is NOT yet initialized while the
>    "wl" subsystem is being initialized.
>
> 2) ubi_wl_init()
>
>    At some point it calls ensure_wear_leveling().
>
> 3) ensure_wear_leveling()
>
>    ---------------------------------------------------------------
>    e1 = rb_entry(rb_first(&ubi->used), struct ubi_wl_entry, u.rb);
>    e2 = find_wl_entry(ubi, &ubi->free, WL_FREE_MAX_DIFF);
>    if (!(e2->ec - e1->ec >= UBI_WL_THRESHOLD)) goto out_unlock;
>    dbg_wl("schedule wear-leveling");
>    ---------------------------------------------------------------
>
>    So if no wear-leveling is scheduled, everything is OK.
>
>    A little further down:
>
>    ---------------------------------------------------------------
>    wrk->anchor = 0;
>    wrk->func = &wear_leveling_worker;
>    if (nested) __schedule_ubi_work(ubi, wrk);
>    else schedule_ubi_work(ubi, wrk);
>    ---------------------------------------------------------------
>
>    As a result, we enter the wear_leveling_worker() function.

Well, we schedule work, but don't execute it since the ubi-thread
is still disabled.

Can you please share a little more about the problem you are facing?
Also produce_free_peb() should not get called at this point.
So before we flip the order of initialization I'd like to understand the problem
better.

Thanks,
//richard
Mikhail Kshevetskiy June 21, 2019, 6:39 p.m. UTC | #3
On Fri, 21 Jun 2019 19:26:37 +0200
Richard Weinberger <richard.weinberger@gmail.com> wrote:

> On Thu, Jun 20, 2019 at 3:28 PM Mikhail Kshevetskiy
> <mikhail.kshevetskiy@gmail.com> wrote:
> >
> > During ubi initialization we have the following call sequence:
> >
> > 1) ubi_attach()
> >
> >    ----------------------------------------------------------------
> >    err = ubi_wl_init(ubi, ai);
> >    if (err) goto out_vtbl;
> >
> >    err = ubi_eba_init(ubi, ai);
> >    if (err) goto out_wl;
> >    ----------------------------------------------------------------
> >
> >    As we can see, the "eba" subsystem is NOT yet initialized while the
> >    "wl" subsystem is being initialized.
> >
> > 2) ubi_wl_init()
> >
> >    At some point it calls ensure_wear_leveling().
> >
> > 3) ensure_wear_leveling()
> >
> >    ---------------------------------------------------------------
> >    e1 = rb_entry(rb_first(&ubi->used), struct ubi_wl_entry, u.rb);
> >    e2 = find_wl_entry(ubi, &ubi->free, WL_FREE_MAX_DIFF);
> >    if (!(e2->ec - e1->ec >= UBI_WL_THRESHOLD)) goto out_unlock;
> >    dbg_wl("schedule wear-leveling");
> >    ---------------------------------------------------------------
> >
> >    So if no wear-leveling is scheduled, everything is OK.
> >
> >    A little further down:
> >
> >    ---------------------------------------------------------------
> >    wrk->anchor = 0;
> >    wrk->func = &wear_leveling_worker;
> >    if (nested) __schedule_ubi_work(ubi, wrk);
> >    else schedule_ubi_work(ubi, wrk);
> >    ---------------------------------------------------------------
> >
> >    As a result, we enter the wear_leveling_worker() function.
> 
> Well, we schedule work, but don't execute it since the ubi-thread
> is still disabled.
> 
> Can you please share a little more about the problem you are facing?
> Also produce_free_peb() should not get called at this point.
> So before we flip the order of initialization I'd like to understand the
> problem better.

We hit a reboot loop in u-boot during ubi initialization. The problem
appears approximately once a week on a random router in our test farm.
We have never triggered this problem in linux (only in u-boot).

On the other hand, the ubi code in u-boot is almost the same as the ubi code
in the linux kernel (it is periodically backported from linux), so it makes
sense to fix it in linux as well to help with future porting.

PS: we sent the same patch to u-boot.

Mikhail
Richard Weinberger June 21, 2019, 6:47 p.m. UTC | #4
----- Original Message -----
>> Can you please share a little more about the problem you are facing?
>> Also produce_free_peb() should not get called at this point.
>> So before we flip the order of initialization I'd like to understand the
>> problem better.
> 
> We hit a reboot loop in u-boot during ubi initialization. The problem
> appears approximately once a week on a random router in our test farm.
> We have never triggered this problem in linux (only in u-boot).
> 
> On the other hand, the ubi code in u-boot is almost the same as the ubi code
> in the linux kernel (it is periodically backported from linux), so it makes
> sense to fix it in linux as well to help with future porting.
> 
> PS: we sent the same patch to u-boot.

In u-boot the story is a little different because it has no concept of
threads and executes work immediately.

Do you see this on a recent u-boot?
With this commit in u-boot this problem should not happen:
f82290afc847 ("mtd: ubi: Fix worker handling")

Thanks,
//richard
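
The difference described in the two replies above (linux queues the work but
does not run it while the ubi thread is disabled, whereas u-boot has no
threads and runs it immediately) can be sketched with a tiny work queue. This
is a minimal model under stated assumptions, not the UBI implementation: all
model_* names are hypothetical, and an enabled background thread is
simplified to "run the item synchronously".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_WORK 8

typedef void (*model_work_fn)(void *ctx);

/* Hypothetical miniature of a deferred-work queue. */
struct model_queue {
	model_work_fn fn[MODEL_MAX_WORK];
	void *ctx[MODEL_MAX_WORK];
	size_t count;
	bool thread_enabled;   /* false while attach is still in progress */
	bool run_immediately;  /* true models an environment without threads */
};

/*
 * Schedule one work item. Returns 0 on success, -1 if the queue is full.
 * While the thread is disabled the item is only stored, not executed.
 */
int model_schedule(struct model_queue *q, model_work_fn fn, void *ctx)
{
	assert(q != NULL && fn != NULL);

	if (q->run_immediately || q->thread_enabled) {
		fn(ctx);
		return 0;
	}
	if (q->count == MODEL_MAX_WORK)
		return -1;
	q->fn[q->count] = fn;
	q->ctx[q->count] = ctx;
	q->count++;
	return 0;
}

/* Called once attach has finished: run the backlog in FIFO order. */
void model_enable_thread(struct model_queue *q)
{
	assert(q != NULL);

	q->thread_enabled = true;
	for (size_t i = 0; i < q->count; i++)
		q->fn[i](q->ctx[i]);
	q->count = 0;
}

/* Example work item: counts how often it ran. */
void model_count_work(void *ctx)
{
	(*(int *)ctx)++;
}
```

With a zero-initialized queue, an item passed to model_schedule() does not
run until model_enable_thread() is called, so it cannot observe a
half-initialized device. With run_immediately set, the same call runs the
item on the spot, which is the situation where initialization order starts to
matter.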
Mikhail Kshevetskiy June 21, 2019, 7:16 p.m. UTC | #5
On Fri, 21 Jun 2019 20:47:50 +0200 (CEST)
Richard Weinberger <richard@nod.at> wrote:

> ----- Original Message -----
> >> Can you please share a little more about the problem you are facing?
> >> Also produce_free_peb() should not get called at this point.
> >> So before we flip the order of initialization I'd like to understand the
> >> problem better.
> > 
> > We hit a reboot loop in u-boot during ubi initialization. The problem
> > appears approximately once a week on a random router in our test farm.
> > We have never triggered this problem in linux (only in u-boot).
> > 
> > On the other hand, the ubi code in u-boot is almost the same as the ubi
> > code in the linux kernel (it is periodically backported from linux), so it
> > makes sense to fix it in linux as well to help with future porting.
> > 
> > PS: we sent the same patch to u-boot.
> 
> In u-boot the story is a little different because it has no concept of
> threads and executes work immediately.
> 
> Do you see this on a recent u-boot?
> With this commit in u-boot this problem should not happen:
> f82290afc847 ("mtd: ubi: Fix worker handling")

No, we use a 2016.07-based bootloader. I'll look into it. Thanks.

> 
> Thanks,
> //richard
Mikhail Kshevetskiy June 21, 2019, 7:16 p.m. UTC | #6
On Fri, 21 Jun 2019 20:47:50 +0200 (CEST)
Richard Weinberger <richard@nod.at> wrote:

> ----- Original Message -----
> >> Can you please share a little more about the problem you are facing?
> >> Also produce_free_peb() should not get called at this point.
> >> So before we flip the order of initialization I'd like to understand the
> >> problem better.
> > 
> > We hit a reboot loop in u-boot during ubi initialization. The problem
> > appears approximately once a week on a random router in our test farm.
> > We have never triggered this problem in linux (only in u-boot).
> > 
> > On the other hand, the ubi code in u-boot is almost the same as the ubi
> > code in the linux kernel (it is periodically backported from linux), so it
> > makes sense to fix it in linux as well to help with future porting.
> > 
> > PS: we sent the same patch to u-boot.
> 
> In u-boot the story is a little different because it has no concept of
> threads and executes work immediately.
> 
> Do you see this on a recent u-boot?
> With this commit in u-boot this problem should not happen:
> f82290afc847 ("mtd: ubi: Fix worker handling")

No, we use a 2016.07-based bootloader. I'll look into it. Thanks.

> 
> Thanks,
> //richard
Richard Weinberger June 21, 2019, 7:33 p.m. UTC | #7
----- Original Message -----
>> > PS: we sent the same patch to u-boot.
>> 
>> In u-boot the story is a little different because it has no concept of
>> threads and executes work immediately.
>> 
>> Do you see this on a recent u-boot?
>> With this commit in u-boot this problem should not happen:
>> f82290afc847 ("mtd: ubi: Fix worker handling")
> 
> No, we use a 2016.07-based bootloader. I'll look into it. Thanks.

Please backport that fix and report this on the u-boot mailing list.

Your patch fixes the issue only partially, you will still face
issues if ubi sees bitflips at attach time.

Thanks,
//richard
Mikhail Kshevetskiy June 21, 2019, 7:41 p.m. UTC | #8
On Fri, 21 Jun 2019 21:33:01 +0200 (CEST)
Richard Weinberger <richard@nod.at> wrote:

> ----- Original Message -----
> >> > PS: we sent the same patch to u-boot.
> >> 
> >> In u-boot the story is a little different because it has no concept of
> >> threads and executes work immediately.
> >> 
> >> Do you see this on a recent u-boot?
> >> With this commit in u-boot this problem should not happen:
> >> f82290afc847 ("mtd: ubi: Fix worker handling")
> > 
> > No, we use a 2016.07-based bootloader. I'll look into it. Thanks.
> 
> Please backport that fix and report this on the u-boot mailing list.
> 
> Your patch fixes the issue only partially, you will still face
> issues if ubi sees bitflips at attach time.

thanks a lot.


> Thanks,
> //richard

Patch

diff --git a/drivers/mtd/ubi/attach.c b/drivers/mtd/ubi/attach.c
index 10b2459f8951..8c1d629c0e1d 100644
--- a/drivers/mtd/ubi/attach.c
+++ b/drivers/mtd/ubi/attach.c
@@ -1602,13 +1602,13 @@  int ubi_attach(struct ubi_device *ubi, int force_scan)
 	if (err)
 		goto out_ai;
 
-	err = ubi_wl_init(ubi, ai);
+	err = ubi_eba_init(ubi, ai);
 	if (err)
 		goto out_vtbl;
 
-	err = ubi_eba_init(ubi, ai);
+	err = ubi_wl_init(ubi, ai);
 	if (err)
-		goto out_wl;
+		goto out_vtbl;
 
 #ifdef CONFIG_MTD_UBI_FASTMAP
 	if (ubi->fm && ubi_dbg_chk_fastmap(ubi)) {
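
The label change in the hunk above relies on the usual goto-unwind
convention: a failing step jumps to the label that tears down only the steps
that have already succeeded, and the labels are laid out in reverse order of
initialization. The sketch below shows that convention in isolation. The
step names, labels, and model_* helpers are hypothetical, and the sketch does
not model which resources ubi_eba_init() or ubi_wl_init() actually hold.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical cleanup trace: one letter per teardown that ran. */
static char model_trace[8];

static void model_undo(char tag)
{
	size_t n = strlen(model_trace);

	assert(n + 1 < sizeof(model_trace));
	model_trace[n] = tag;
	model_trace[n + 1] = '\0';
}

/* Returns the teardown order recorded by the last model_attach_steps(). */
const char *model_last_trace(void)
{
	return model_trace;
}

/*
 * Three init steps A, B, C. fail_at selects which one fails
 * (0 = none). Returns 0 on success, -1 on failure.
 */
int model_attach_steps(int fail_at)
{
	int err;

	model_trace[0] = '\0';

	err = (fail_at == 1) ? -1 : 0;  /* step A */
	if (err)
		goto out;               /* nothing to undo yet */

	err = (fail_at == 2) ? -1 : 0;  /* step B */
	if (err)
		goto out_a;             /* undo A only */

	err = (fail_at == 3) ? -1 : 0;  /* step C */
	if (err)
		goto out_b;             /* undo B, then A */

	return 0;

out_b:
	model_undo('B');
out_a:
	model_undo('A');
out:
	return err;
}
```

When step C fails, the recorded teardown order is "BA"; when step B fails, it
is "A". Swapping two steps in such a chain therefore also means revisiting
which label each failure path jumps to, which is what the second half of the
hunk does.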