Message ID | 20190620132753.19538-1-mikhail.kshevetskiy@gmail.com |
---|---|
State | Rejected |
Delegated to: | Richard Weinberger |
Series | mtd/ubi: fix initialization order of ubi subsystems |

On Thu, 2019-06-20 at 16:27 +0300, Mikhail Kshevetskiy wrote:
> -	err = ubi_wl_init(ubi, ai);
> +	err = ubi_eba_init(ubi, ai);
>  	if (err)
>  		goto out_vtbl;
>
> -	err = ubi_eba_init(ubi, ai);
> +	err = ubi_wl_init(ubi, ai);
>  	if (err)
> -		goto out_wl;
> +		goto out_vtbl;

Looks good to me, thanks.

Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>

On Thu, Jun 20, 2019 at 3:28 PM Mikhail Kshevetskiy
<mikhail.kshevetskiy@gmail.com> wrote:
>
> during ubi initialization we have a following calling sequence
>
> 1) ubi_attach()
>
> ----------------------------------------------------------------
> err = ubi_wl_init(ubi, ai);
> if (err) goto out_vtbl;
>
> err = ubi_eba_init(ubi, ai);
> if (err) goto out_wl;
> ----------------------------------------------------------------
>
> As we can see "eba" subsytem is NOT initialized at the moment of
> initializing of "wl" subsystem
>
> 2) ubi_wl_init()
>
> it call ensure_wear_leveling() at some moment
>
> 3) ensure_wear_leveling()
>
> ---------------------------------------------------------------
> e1 = rb_entry(rb_first(&ubi->used), struct ubi_wl_entry, u.rb);
> e2 = find_wl_entry(ubi, &ubi->free, WL_FREE_MAX_DIFF);
> if (!(e2->ec - e1->ec >= UBI_WL_THRESHOLD)) goto out_unlock;
> dbg_wl("schedule wear-leveling");
> ---------------------------------------------------------------
>
> so, if no wear-leveling is scheduled than everything is OK
>
> and a little bit below
>
> ---------------------------------------------------------------
> wrk->anchor = 0;
> wrk->func = &wear_leveling_worker;
> if (nested) __schedule_ubi_work(ubi, wrk);
> else schedule_ubi_work(ubi, wrk);
> ---------------------------------------------------------------
>
> as result we enter to wear_leveling_worker() function

Well, we schedule work, but don't execute it since the ubi-thread
is still disabled.

Can you please share a little more about the problem you are facing?
Also produce_free_peb() should not get called at this point.
So before we flip the order of initialization I'd like to understand the
problem better.

Thanks,
//richard
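
The scheduling condition quoted from ensure_wear_leveling() can be looked at in isolation. The following is a simplified sketch, not the kernel code: `toy_needs_wear_leveling` and `TOY_WL_THRESHOLD` are hypothetical names, and the value 4096 is only an illustrative assumption standing in for UBI_WL_THRESHOLD. It shows that work is scheduled only when the erase-counter gap between the chosen free block and the least-worn used block reaches the threshold.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for UBI_WL_THRESHOLD; the value is an assumption. */
#define TOY_WL_THRESHOLD 4096

/*
 * Hypothetical helper mirroring the quoted check:
 *   if (!(e2->ec - e1->ec >= UBI_WL_THRESHOLD)) goto out_unlock;
 * used_ec plays the role of e1->ec (first entry of the "used" tree),
 * free_ec plays the role of e2->ec (entry picked from the "free" tree).
 */
static bool toy_needs_wear_leveling(int used_ec, int free_ec)
{
	/* Erase counters are never negative in this model. */
	assert(used_ec >= 0 && free_ec >= 0);

	return free_ec - used_ec >= TOY_WL_THRESHOLD;
}
```

In this model, a gap below the threshold means no wear-leveling work is queued at all, which matches the remark in the patch description that nothing goes wrong in that case.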

On Fri, 21 Jun 2019 19:26:37 +0200
Richard Weinberger <richard.weinberger@gmail.com> wrote:

> On Thu, Jun 20, 2019 at 3:28 PM Mikhail Kshevetskiy
> <mikhail.kshevetskiy@gmail.com> wrote:
> >
> > during ubi initialization we have a following calling sequence
> >
> > 1) ubi_attach()
> >
> > ----------------------------------------------------------------
> > err = ubi_wl_init(ubi, ai);
> > if (err) goto out_vtbl;
> >
> > err = ubi_eba_init(ubi, ai);
> > if (err) goto out_wl;
> > ----------------------------------------------------------------
> >
> > As we can see "eba" subsytem is NOT initialized at the moment of
> > initializing of "wl" subsystem
> >
> > 2) ubi_wl_init()
> >
> > it call ensure_wear_leveling() at some moment
> >
> > 3) ensure_wear_leveling()
> >
> > ---------------------------------------------------------------
> > e1 = rb_entry(rb_first(&ubi->used), struct ubi_wl_entry, u.rb);
> > e2 = find_wl_entry(ubi, &ubi->free, WL_FREE_MAX_DIFF);
> > if (!(e2->ec - e1->ec >= UBI_WL_THRESHOLD)) goto out_unlock;
> > dbg_wl("schedule wear-leveling");
> > ---------------------------------------------------------------
> >
> > so, if no wear-leveling is scheduled than everything is OK
> >
> > and a little bit below
> >
> > ---------------------------------------------------------------
> > wrk->anchor = 0;
> > wrk->func = &wear_leveling_worker;
> > if (nested) __schedule_ubi_work(ubi, wrk);
> > else schedule_ubi_work(ubi, wrk);
> > ---------------------------------------------------------------
> >
> > as result we enter to wear_leveling_worker() function
>
> Well, we schedule work, but don't execute it since the ubi-thread
> is still disabled.
>
> Can you please share a little more about the problem you are facing?
> Also produce_free_peb() should not get called at this point.
> So before we flip the order of initialization I'd like to understand the
> problem better.

We faced a cycle rebooting in u-boot during ubi initialization. The problem
appears approximately once per week on a random router from our test farm.
We never trigger this problem in linux (only in u-boot).

From the other side ubi code in u-boot is almost the same as ubi code in linux
kernel (it backported from linux periodically), so it make sense to fix it in
linux as well to help with future porting.

PS we send the same patch to u-boot.

Mikhail

----- Original Message -----
>> Can you please share a little more about the problem you are facing?
>> Also produce_free_peb() should not get called at this point.
>> So before we flip the order of initialization I'd like to understand the
>> problem better.
>
> We faced a cycle rebooting in u-boot during ubi initialization. The problem
> appears approximately once per week on a random router from our test farm.
> We never trigger this problem in linux (only in u-boot).
>
> From the other side ubi code in u-boot is almost the same as ubi code in linux
> kernel (it backported from linux periodically), so it make sense to fix it in
> linux as well to help with future porting.
>
> PS we send the same patch to u-boot.

In u-boot the story is a little different because it has no concept of
threads and executes work immediately.

Do you see this on a recent u-boot?
With this commit in u-boot this problem should not happen:
f82290afc847 ("mtd: ubi: Fix worker handling")

Thanks,
//richard
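
The difference described in this exchange, work that is queued until a worker thread is enabled versus work that runs at the moment it is scheduled, can be modelled with a small toy. This is only a sketch under stated assumptions: every name here (`toy_dev`, `toy_schedule_work`, `toy_attach`, and so on) is hypothetical, nothing is taken from the UBI sources, and the `immediate` flag merely stands in for an environment without threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_WORKS 8

struct toy_dev;
typedef void (*toy_work_fn)(struct toy_dev *dev);

/* Hypothetical device state; not the real struct ubi_device. */
struct toy_dev {
	bool eba_ready;       /* set once the toy "eba" init has run */
	bool thread_enabled;  /* queued work runs only when this is set */
	bool immediate;       /* no threads: run work at schedule time */
	int ran_before_eba;   /* work executions that saw eba_ready == false */
	int ran_ok;           /* work executions that saw eba_ready == true */
	toy_work_fn queue[TOY_MAX_WORKS];
	size_t queued;
};

/* Toy worker: records whether "eba" was initialized when it ran. */
static void toy_wl_worker(struct toy_dev *dev)
{
	if (dev->eba_ready)
		dev->ran_ok++;
	else
		dev->ran_before_eba++;
}

/* Either run the work right away or append it to the pending queue. */
static void toy_schedule_work(struct toy_dev *dev, toy_work_fn fn)
{
	if (dev->immediate) {
		fn(dev);
		return;
	}
	assert(dev->queued < TOY_MAX_WORKS);
	dev->queue[dev->queued++] = fn;
}

/* Drain the queue, but only if the toy worker thread is enabled. */
static void toy_run_pending(struct toy_dev *dev)
{
	if (!dev->thread_enabled)
		return;
	for (size_t i = 0; i < dev->queued; i++)
		dev->queue[i](dev);
	dev->queued = 0;
}

/* Toy attach: "wl" init schedules work first, "eba" init comes second. */
static void toy_attach(struct toy_dev *dev)
{
	toy_schedule_work(dev, toy_wl_worker); /* part of toy "wl" init */
	toy_run_pending(dev);                  /* no-op: thread still disabled */
	dev->eba_ready = true;                 /* toy "eba" init */
	dev->thread_enabled = true;            /* thread enabled after attach */
	toy_run_pending(dev);
}
```

With `immediate` left false, the worker in this model runs only after the "eba" step; with `immediate` set, it runs during the "wl" step and sees the uninitialized state. That is the distinction the reply draws between the two environments.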

On Fri, 21 Jun 2019 20:47:50 +0200 (CEST)
Richard Weinberger <richard@nod.at> wrote:

> ----- Original Message -----
> >> Can you please share a little more about the problem you are facing?
> >> Also produce_free_peb() should not get called at this point.
> >> So before we flip the order of initialization I'd like to understand the
> >> problem better.
> >
> > We faced a cycle rebooting in u-boot during ubi initialization. The problem
> > appears approximately once per week on a random router from our test farm.
> > We never trigger this problem in linux (only in u-boot).
> >
> > From the other side ubi code in u-boot is almost the same as ubi code in
> > linux kernel (it backported from linux periodically), so it make sense to
> > fix it in linux as well to help with future porting.
> >
> > PS we send the same patch to u-boot.
>
> In u-boot the story is a little different because it has no concept of
> threads and executes work immediately.
>
> Do you see this on a recent u-boot?
> With this commit in u-boot this problem should not happen:
> f82290afc847 ("mtd: ubi: Fix worker handling")

no we use 201.07 based bootloader. I'll look on it. Thanks.

>
> Thanks,
> //richard

On Fri, 21 Jun 2019 20:47:50 +0200 (CEST)
Richard Weinberger <richard@nod.at> wrote:

> ----- Original Message -----
> >> Can you please share a little more about the problem you are facing?
> >> Also produce_free_peb() should not get called at this point.
> >> So before we flip the order of initialization I'd like to understand the
> >> problem better.
> >
> > We faced a cycle rebooting in u-boot during ubi initialization. The problem
> > appears approximately once per week on a random router from our test farm.
> > We never trigger this problem in linux (only in u-boot).
> >
> > From the other side ubi code in u-boot is almost the same as ubi code in
> > linux kernel (it backported from linux periodically), so it make sense to
> > fix it in linux as well to help with future porting.
> >
> > PS we send the same patch to u-boot.
>
> In u-boot the story is a little different because it has no concept of
> threads and executes work immediately.
>
> Do you see this on a recent u-boot?
> With this commit in u-boot this problem should not happen:
> f82290afc847 ("mtd: ubi: Fix worker handling")

no we use 2016.07 based bootloader. I'll look on it. Thanks.

>
> Thanks,
> //richard

----- Original Message -----
>> > PS we send the same patch to u-boot.
>>
>> In u-boot the story is a little different because it has no concept of
>> threads and executes work immediately.
>>
>> Do you see this on a recent u-boot?
>> With this commit in u-boot this problem should not happen:
>> f82290afc847 ("mtd: ubi: Fix worker handling")
>
> no we use 201.07 based bootloader. I'll look on it. Thanks.

Please backport the said fix and communicate this on the
u-boot mailinglist.

Your patch fixes the issue only partially, you will still face
issues if ubi sees bitflips at attach time.

Thanks,
//richard

On Fri, 21 Jun 2019 21:33:01 +0200 (CEST)
Richard Weinberger <richard@nod.at> wrote:

> ----- Original Message -----
> >> > PS we send the same patch to u-boot.
> >>
> >> In u-boot the story is a little different because it has no concept of
> >> threads and executes work immediately.
> >>
> >> Do you see this on a recent u-boot?
> >> With this commit in u-boot this problem should not happen:
> >> f82290afc847 ("mtd: ubi: Fix worker handling")
> >
> > no we use 201.07 based bootloader. I'll look on it. Thanks.
>
> Please backport the said fix and communicate this on the
> u-boot mailinglist.
>
> Your patch fixes the issue only partially, you will still face
> issues if ubi sees bitflips at attach time.

thanks a lot.

> Thanks,
> //richard

diff --git a/drivers/mtd/ubi/attach.c b/drivers/mtd/ubi/attach.c
index 10b2459f8951..8c1d629c0e1d 100644
--- a/drivers/mtd/ubi/attach.c
+++ b/drivers/mtd/ubi/attach.c
@@ -1602,13 +1602,13 @@ int ubi_attach(struct ubi_device *ubi, int force_scan)
 	if (err)
 		goto out_ai;
 
-	err = ubi_wl_init(ubi, ai);
+	err = ubi_eba_init(ubi, ai);
 	if (err)
 		goto out_vtbl;
 
-	err = ubi_eba_init(ubi, ai);
+	err = ubi_wl_init(ubi, ai);
 	if (err)
-		goto out_wl;
+		goto out_vtbl;
 
 #ifdef CONFIG_MTD_UBI_FASTMAP
 	if (ubi->fm && ubi_dbg_chk_fastmap(ubi)) {