Message ID | 20120606125013.5897a02d@pixies.home.jungo.com |
---|---|
State | New, archived |
Headers | show |
On Wed, 2012-06-06 at 12:50 +0300, Shmulik Ladkani wrote: > + if (!mtd->bitflip_threshold) > + mtd->bitflip_threshold = mtd->ecc_strength; Hmm, why se default is to report bit-flips only when one more flipping bit would cause an ECC error? The default be 1 - report on any bit-flip. The same should be done in 'add_mtd_device()' - we should preserve the old behavior, as it was before these patches. Do I miss something?
Hi Artem, On Wed, 06 Jun 2012 16:30:53 +0300 Artem Bityutskiy <artem.bityutskiy@linux.intel.com> wrote: > On Wed, 2012-06-06 at 12:50 +0300, Shmulik Ladkani wrote: > > + if (!mtd->bitflip_threshold) > > + mtd->bitflip_threshold = mtd->ecc_strength; > > Hmm, why se default is to report bit-flips only when one more flipping > bit would cause an ECC error? The default be 1 - report on any bit-flip. > The same should be done in 'add_mtd_device()' - we should preserve the > old behavior, as it was before these patches. Do I miss something? Mike's patchset modified behavior of mtd_read() to return EUCLEAN only if max number of bitflips (per ecc step) exceeds the bitflip_threshold: return ret_code >= mtd->bitflip_threshold ? -EUCLEAN : 0; mtd->bitflip_threshold can be set in the following ways: - By the lowlevel driver - By 'add_mtd_device'. Here it defaults to 'ecc_strength' if NOT previously set by the driver. - Using sysfs attribute Currently, no driver explicitly sets bitflip_threshold, so it defaults to ecc_strength. This is not necessarily 1; it depends on the hardware driver, or software algorithm, etc... So the "old behavior" was not preserved by the patchset, at least for those setups with ecc_strength greater than 1. And I don't think the intention was to preserve 'bitflip_threshold' as 1. As Sascha spotted, the patchset created a problem for 'scan_bbt'. This is since 'scan_bbt' calls 'mtd_read()' to read the BBT pages, however 'bitflip_threshold' is usually NOT YET assigned at that point, which results in mtd_read's condition (quoted above) to ALWAYS return -EUCLEAN. This leads to constantly scrubbing the BBT. Ouch. My suggestion is to assign the default value of 'ecc_strength' to 'bitflip_threshold' at the end of 'nand_scan_tail', PRIOR the call to scan_bbt(). Hence, BBT will be scrubbed only if maximum bitflips exceeded the 'ecc_strength' of this mtd (which is the default behavior). Does it make sense to you? Or would you like to scrub the BBT upon every bitflip, regardless ecc_strength? Regards, Shmulik
On Wed, 2012-06-06 at 18:15 +0300, Shmulik Ladkani wrote: > - By 'add_mtd_device'. Here it defaults to 'ecc_strength' if NOT > previously set by the driver. But this is wrong. If I use the old doc2000 driver, with ecc_strength = 2, and it works fine for me, and I am happy that UBI scrubs for a single bit-flip, why should my system become broken because someone decided that now UBI should start scrubbing on 2 bit-flips? We should not change the defaults - if I do not set the threshold via sysfs of in the driver, it should be 1. Unless I am completely confused, we should change this, CC -stable if needed, and ask dwmw2 to merge that.
Hi, On Wed, 06 Jun 2012 18:46:15 +0300 Artem Bityutskiy <artem.bityutskiy@linux.intel.com> wrote: > On Wed, 2012-06-06 at 18:15 +0300, Shmulik Ladkani wrote: > > - By 'add_mtd_device'. Here it defaults to 'ecc_strength' if NOT > > previously set by the driver. > > But this is wrong. If I use the old doc2000 driver, with ecc_strength = > 2, and it works fine for me, and I am happy that UBI scrubs for a single > bit-flip, why should my system become broken because someone decided > that now UBI should start scrubbing on 2 bit-flips? As I remember, the motivation was to reduce unnecessary scrubbing for devices exposing high rate of bitflips but having strong ECC to compensate for. For these users, the system was "broken" in their perspective, as it performed too many scrubbing... Anyway, I understand your point preserving the old behavior. I thought you were aware of this change. > We should not change the defaults - if I do not set the threshold via > sysfs of in the driver, it should be 1. Fair point. This is safer, backwords compatible. But eventually, wouldn't we end up with all the drivers assigning bitflip_threshold to ecc_strength? > Unless I am completely confused, we should change this, CC -stable if > needed, and ask dwmw2 to merge that. No you're not confused. If we'd like to preserve old behavior, the default assignment needs to be changed. BTW this is independent of the 'scan_bbt' bug spotted by Sascha. Regards, Shmulik
On Wed, Jun 06, 2012 at 04:46:15PM +0100, Artem Bityutskiy wrote: > On Wed, 2012-06-06 at 18:15 +0300, Shmulik Ladkani wrote: > > - By 'add_mtd_device'. Here it defaults to 'ecc_strength' if NOT > > previously set by the driver. > > But this is wrong. If I use the old doc2000 driver, with ecc_strength = > 2, and it works fine for me, and I am happy that UBI scrubs for a single > bit-flip, why should my system become broken because someone decided > that now UBI should start scrubbing on 2 bit-flips? > > We should not change the defaults - if I do not set the threshold via > sysfs of in the driver, it should be 1. > > Unless I am completely confused, we should change this, CC -stable if > needed, and ask dwmw2 to merge that. > Hi Artem, The "safest" bitflip_threshold value actually depends on the nand device you are using. If I want to use today's 4-bit and 8-bit nand devices, a threshold of 1 will make my system unusable (because of constant scrubbing and block torturing). The rationale behind Mike's patch is that it should be safe to delay block scrubbing until the nand controller has reached its maximum correction capability; at this point, any additional bitflip would produce uncorrectable errors, so scrubbing is necessary. This is the recommended policy in nand datasheets when the device ecc requirement matches the controller capability. In your doc2000 example, why would it break with a threshold of 2 ? On the other hand, I agree that we should not force this kind of changes on existing systems. Let us list use cases: 1. on legacy systems with 1-bit nand and strength = 1, default bitflip_threshold is 1 2. on legacy systems with 1-bit nand and strength > 1, default bitflip_threshold is 'strength' 3. on new systems with 2+ bit nand and strength > 1, default bitflip_threshold is 'strength' Case 1 is OK, no change. Case 2 introduces an unwanted change. Case 3 (which was not properly handled until Mike's patch) seems OK to me, it fits current datasheet requirements and the driver still has the possibility to explicitly set bitflip_threshold on a per-board basis (because it really depends on the nand device). So rather than reverting to a default bitflip_threshold of 1 for everyone, I suggest we revert the change on case 2 legacy systems like the doc2000 by explicitly setting bitflip_threshold to 1 in the driver. What do you think ? BR,
On Wed, 2012-06-06 at 19:55 +0200, Ivan Djelic wrote: > 1. on legacy systems with 1-bit nand and strength = 1, default bitflip_threshold is 1 > 2. on legacy systems with 1-bit nand and strength > 1, default bitflip_threshold is 'strength' > 3. on new systems with 2+ bit nand and strength > 1, default bitflip_threshold is 'strength' Ivan, Shmulik, I've gave this another though, and I think it is OK to leave this as it is now. Your replies were very helpful, thanks. Shmulik, would you please send your BBT read fix as a nice patch? Please, add a not to the commit message that this fixes an issue introduced in 3.5-rc1 and should be merged before 3.5. Thanks!
On Wed, 2012-06-06 at 19:55 +0200, Ivan Djelic wrote:
> In your doc2000 example, why would it break with a threshold of 2 ?
Yes, "break" was a wrong word. It would just change the behavior. But I
guess it would in fact _improve_ the behavior. Let's not try treat the
legacy specially, I think we can indeed treat this change as an
improvement and se if anyone complains.
On Thu, 07 Jun 2012 10:36:10 +0300 Artem Bityutskiy <artem.bityutskiy@linux.intel.com> wrote: > Shmulik, would you please send your BBT read fix as a nice patch? > Please, add a not to the commit message that this fixes an issue > introduced in 3.5-rc1 and should be merged before 3.5. Thanks! Will do. Regards, Shmulik
diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c index 61805e7..46c7d2b 100644 --- a/drivers/mtd/nand/nand_base.c +++ b/drivers/mtd/nand/nand_base.c @@ -3484,6 +3484,8 @@ int nand_scan_tail(struct mtd_info *mtd) /* propagate ecc info to mtd_info */ mtd->ecclayout = chip->ecc.layout; mtd->ecc_strength = chip->ecc.strength; + if (!mtd->bitflip_threshold) + mtd->bitflip_threshold = mtd->ecc_strength; /* Check, if we should skip the bad block table scan */ if (chip->options & NAND_SKIP_BBTSCAN)