Message ID | 20170329153836.GB29118@axis.com |
---|---|
State | Superseded |
Delegated to: | Richard Weinberger |
Headers | show |
Jesper, Am 29.03.2017 um 17:38 schrieb Jesper Nilsson: > MTD_UBI_FASTMAP has been set as experimental since it > was merged back in 2012. > > There hasn't been much change in the format, > so we can consider the feature stable and start > being careful about breaking the format. > (This is somewhat of a pre-requisite for anyone actually > using the feature in the real world and depending on it) > > Drop the experimental note and the warning text about > the on-flash format not being finalized. I fully agree, we can drop this note. But we have to add another one. While Fastmap is a nice feature to speed-up the attach time it comes with a cost. It makes UBI less robust. I saw issues on NAND chips which misbehaved slightly where UBI was able to recover when using a full scan but not when Fastmap was used. The UBI full scan code is paranoid and can sort out problems very early, with Fastmap enabled you lose this valuable property. So, users should enable Fastmap only when they absolutely need a very fast attach time and be very sure that the NAND works as expected. Thanks, //richard
On 03/29/2017 10:04 PM, Richard Weinberger wrote: > Jesper, > > Am 29.03.2017 um 17:38 schrieb Jesper Nilsson: >> MTD_UBI_FASTMAP has been set as experimental since it >> was merged back in 2012. >> >> There hasn't been much change in the format, >> so we can consider the feature stable and start >> being careful about breaking the format. >> (This is somewhat of a pre-requisite for anyone actually >> using the feature in the real world and depending on it) >> >> Drop the experimental note and the warning text about >> the on-flash format not being finalized. > > I fully agree, we can drop this note. But we have to add another > one. > While Fastmap is a nice feature to speed-up the attach time it > comes with a cost. It makes UBI less robust. I saw issues > on NAND chips which misbehaved slightly where UBI was able to > recover when using a full scan but not when Fastmap was used. > The UBI full scan code is paranoid and can sort out problems > very early, with Fastmap enabled you lose this valuable property. > > So, users should enable Fastmap only when they absolutely need > a very fast attach time and be very sure that the NAND works as > expected. So we should document this with a big fat warning and set fastmap to default=n ?
Hi Richard, Marek, On Thu, Mar 30, 2017 at 12:01:41PM +0200, Marek Vasut wrote: > On 03/29/2017 10:04 PM, Richard Weinberger wrote: > > Jesper, > > > > Am 29.03.2017 um 17:38 schrieb Jesper Nilsson: > >> MTD_UBI_FASTMAP has been set as experimental since it > >> was merged back in 2012. > >> > >> There hasn't been much change in the format, > >> so we can consider the feature stable and start > >> being careful about breaking the format. > >> (This is somewhat of a pre-requisite for anyone actually > >> using the feature in the real world and depending on it) > >> > >> Drop the experimental note and the warning text about > >> the on-flash format not being finalized. > > > > I fully agree, we can drop this note. But we have to add another > > one. > > While Fastmap is a nice feature to speed-up the attach time it > > comes with a cost. It makes UBI less robust. I saw issues > > on NAND chips which misbehaved slightly where UBI was able to > > recover when using a full scan but not when Fastmap was used. > > The UBI full scan code is paranoid and can sort out problems > > very early, with Fastmap enabled you lose this valuable property. > > > > So, users should enable Fastmap only when they absolutely need > > a very fast attach time and be very sure that the NAND works as > > expected. > > So we should document this with a big fat warning and set fastmap to > default=n ? Does this sound reasonable? Note that this feature makes UBI less robust, since Fastmap does not scan the full flash, which might lead to problems on misbehaving NAND chips. Only enable this if the speedup in attach is really important and you can be sure that the NAND works as expected. > Best regards, > Marek Vasut /^JN - Jesper Nilsson
Jesper, Am 30.03.2017 um 19:39 schrieb Jesper Nilsson: >> So we should document this with a big fat warning and set fastmap to >> default=n ? > > Does this sound reasonable? > > Note that this feature makes UBI less robust, since Fastmap does not scan > the full flash, which might lead to problems on misbehaving NAND chips. > Only enable this if the speedup in attach is really important and I'm not a native English speaker, but shouldn't this be "...if speedup of the attach time is important ..." > you can be sure that the NAND works as expected. Looks fine! Thanks, //richard
Hi Richard, On Thu, Mar 30, 2017 at 11:29:15PM +0200, Richard Weinberger wrote: > Jesper, > > Am 30.03.2017 um 19:39 schrieb Jesper Nilsson: > >> So we should document this with a big fat warning and set fastmap to > >> default=n ? > > > > Does this sound reasonable? > > > > Note that this feature makes UBI less robust, since Fastmap does not scan > > the full flash, which might lead to problems on misbehaving NAND chips. > > Only enable this if the speedup in attach is really important and > > I'm not a native English speaker, but shouldn't this be > "...if speedup of the attach time is important ..." > > > you can be sure that the NAND works as expected. > > Looks fine! As you saw I resent the patch with this formulation added. However, after thinking about it (and with input from some coworkers), could we pinpoint the failure case a bit more here? What is the exact problem behaviour on NAND chips that we're worried about, and in which case will UBI be less robust if we don't scan the full flash? My first reaction was that this was a natural conclusion, but if the NAND flash is failing, we should either be in the case that the FASTMAP is corrupted or that the original data is corrupted. Both should be found by current implementation. Or am I missing additional failure cases here? I getting a bit worried about using the feature at all, even if it seems to work right now... > Thanks, > //richard /^JN - Jesper Nilsson
Hi Richard, I'm still worried about this failure case, do we really believe that the flash could fail in such a way that the fastmap is corrupted in an undetectable way? If we do detect corruption we should be no worse off than earlier since we should ignore the fastmap, IIRC. Could you please elaborate on the problem you were thinking about? Right now I'm hesitant to use fastmap in any production code, even if it works with my current hardware, since there is no guarantee that the flash chips won't get replaced with a second source option down the line... /Jesper On Mon, Apr 03, 2017 at 01:17:36PM +0200, Jesper Nilsson wrote: > Hi Richard, > > On Thu, Mar 30, 2017 at 11:29:15PM +0200, Richard Weinberger wrote: > > Jesper, > > > > Am 30.03.2017 um 19:39 schrieb Jesper Nilsson: > > >> So we should document this with a big fat warning and set fastmap to > > >> default=n ? > > > > > > Does this sound reasonable? > > > > > > Note that this feature makes UBI less robust, since Fastmap does not scan > > > the full flash, which might lead to problems on misbehaving NAND chips. > > > Only enable this if the speedup in attach is really important and > > > > I'm not a native English speaker, but shouldn't this be > > "...if speedup of the attach time is important ..." > > > > > you can be sure that the NAND works as expected. > > > > Looks fine! > > As you saw I resent the patch with this formulation added. > > However, after thinking about it (and with input from some coworkers), > could we pinpoint the failure case a bit more here? > > What is the exact problem behaviour on NAND chips that we're > worried about, and in which case will UBI be less robust if > we don't scan the full flash? > > My first reaction was that this was a natural conclusion, > but if the NAND flash is failing, we should either be in the > case that the FASTMAP is corrupted or that the original data > is corrupted. Both should be found by current implementation. > Or am I missing additional failure cases here? > > I getting a bit worried about using the feature at all, > even if it seems to work right now... > > > Thanks, > > //richard > > /^JN - Jesper Nilsson > -- > Jesper Nilsson -- jesper.nilsson@axis.com /^JN - Jesper Nilsson
Jesper, Am 09.05.2017 um 09:46 schrieb Jesper Nilsson: > Hi Richard, > > I'm still worried about this failure case, do we really > believe that the flash could fail in such a way that the > fastmap is corrupted in an undetectable way? > > If we do detect corruption we should be no worse off > than earlier since we should ignore the fastmap, IIRC. In a perfect world, yes. > Could you please elaborate on the problem you were > thinking about? e.g. commit 74f2c6e9a47cf4e508198c8594626cc82906a13d Author: Richard Weinberger <richard@nod.at> Date: Tue Jun 14 10:12:17 2016 +0200 ubi: Be more paranoid while seaching for the most recent Fastmap Since PEB erasure is asynchornous it can happen that there is more than one Fastmap on the MTD. This is fine because the attach logic will pick the Fastmap data structure with the highest sequence number. On a not so well configured MTD stack spurious ECC errors are common. Causes can be different, bad hardware, wrong operating modes, etc... If the most current Fastmap renders bad due to ECC errors UBI might pick an older Fastmap to attach from. While this can only happen on an anyway broken setup it will show completely different sympthoms and makes finding the root cause much more difficult. So, be debug friendly and fall back to scanning mode of we're facing an ECC error while scanning for Fastmap. Cc: <stable@vger.kernel.org> Signed-off-by: Richard Weinberger <richard@nod.at> > Right now I'm hesitant to use fastmap in any production code, > even if it works with my current hardware, since there is no > guarantee that the flash chips won't get replaced with a > second source option down the line... Fastmap is an aggressive optimization and makes finding issues much harder. Thanks, //richard
diff --git a/drivers/mtd/ubi/Kconfig b/drivers/mtd/ubi/Kconfig index f0855ce..019e261 100644 --- a/drivers/mtd/ubi/Kconfig +++ b/drivers/mtd/ubi/Kconfig @@ -57,12 +57,9 @@ config MTD_UBI_BEB_LIMIT Leave the default value if unsure. config MTD_UBI_FASTMAP - bool "UBI Fastmap (Experimental feature)" + bool "UBI Fastmap" default n help - Important: this feature is experimental so far and the on-flash - format for fastmap may change in the next kernel versions - Fastmap is a mechanism which allows attaching an UBI device in nearly constant time. Instead of scanning the whole MTD device it only has to locate a checkpoint (called fastmap) on the device.
MTD_UBI_FASTMAP has been set as experimental since it was merged back in 2012. There hasn't been much change in the format, so we can consider the feature stable and start being careful about breaking the format. (This is somewhat of a pre-requisite for anyone actually using the feature in the real world and depending on it) Drop the experimental note and the warning text about the on-flash format not being finalized. Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com> --- drivers/mtd/ubi/Kconfig | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)