diff mbox

[v2] mtd: nand: omap: fix BCHx ecc.correct to return detected bit-flips in erased-page

Message ID 1395730699-14687-1-git-send-email-pekon@ti.com
State Accepted
Headers show

Commit Message

pekon gupta March 25, 2014, 6:58 a.m. UTC
fixes: commit 62116e5171e00f85a8d53f76e45b84423c89ff34
       mtd: nand: omap2: Support for hardware BCH error correction.

In omap_elm_correct_data(), if bitflip_count in an erased-page is within the
correctable limit (< ecc.strength), then it is not indicated back to the caller
ecc->read_page().
This mis-guides upper layers like MTD and UBIFS layer to assume erased-page as
perfectly clean and use it for writing even if actual bitflip_count was
dangerously high (bitflip_count > mtd->bitflip_threshold).

This patch fixes this above issue, by returning 'stats' to caller
ecc->read_page() under all scenarios.

Reported-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Pekon Gupta <pekon@ti.com>
---
 drivers/mtd/nand/omap2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

pekon gupta March 25, 2014, 7:05 a.m. UTC | #1
Hi Brian,


>
>fixes: commit 62116e5171e00f85a8d53f76e45b84423c89ff34
>       mtd: nand: omap2: Support for hardware BCH error correction.
>
>In omap_elm_correct_data(), if bitflip_count in an erased-page is within the
>correctable limit (< ecc.strength), then it is not indicated back to the caller
>ecc->read_page().
>This mis-guides upper layers like MTD and UBIFS layer to assume erased-page as
>perfectly clean and use it for writing even if actual bitflip_count was
>dangerously high (bitflip_count > mtd->bitflip_threshold).
>
>This patch fixes this above issue, by returning 'stats' to caller
>ecc->read_page() under all scenarios.
>
>Reported-by: Brian Norris <computersforpeace@gmail.com>
>Signed-off-by: Pekon Gupta <pekon@ti.com>
>---
I you think this qualifies for stable then please mark it for
CC: <stable@vger.kernel.org> # 3.9.x+


with regards, pekon
pekon gupta April 23, 2014, 6:29 a.m. UTC | #2
Hi Brian,

>>fixes: commit 62116e5171e00f85a8d53f76e45b84423c89ff34
>>       mtd: nand: omap2: Support for hardware BCH error correction.
>>
>>In omap_elm_correct_data(), if bitflip_count in an erased-page is within the
>>correctable limit (< ecc.strength), then it is not indicated back to the caller
>>ecc->read_page().
>>This mis-guides upper layers like MTD and UBIFS layer to assume erased-page as
>>perfectly clean and use it for writing even if actual bitflip_count was
>>dangerously high (bitflip_count > mtd->bitflip_threshold).
>>
>>This patch fixes this above issue, by returning 'stats' to caller
>>ecc->read_page() under all scenarios.
>>
>>Reported-by: Brian Norris <computersforpeace@gmail.com>
>>Signed-off-by: Pekon Gupta <pekon@ti.com>
>>---
>I you think this qualifies for stable then please mark it for
>CC: <stable@vger.kernel.org> # 3.9.x+
>
>
Ping on this.. 

with regards, pekon
Brian Norris May 12, 2014, 8:36 p.m. UTC | #3
On Tue, Mar 25, 2014 at 12:28:19PM +0530, Pekon Gupta wrote:
> fixes: commit 62116e5171e00f85a8d53f76e45b84423c89ff34
>        mtd: nand: omap2: Support for hardware BCH error correction.
> 
> In omap_elm_correct_data(), if bitflip_count in an erased-page is within the
> correctable limit (< ecc.strength), then it is not indicated back to the caller
> ecc->read_page().
> This mis-guides upper layers like MTD and UBIFS layer to assume erased-page as
> perfectly clean and use it for writing even if actual bitflip_count was
> dangerously high (bitflip_count > mtd->bitflip_threshold).
> 
> This patch fixes this above issue, by returning 'stats' to caller
> ecc->read_page() under all scenarios.
> 
> Reported-by: Brian Norris <computersforpeace@gmail.com>
> Signed-off-by: Pekon Gupta <pekon@ti.com>

Pushed to l2-mtd.git. Thanks!

Brian
pekon gupta May 19, 2014, 5 a.m. UTC | #4
Hi Brian,

>From: Brian Norris [mailto:computersforpeace@gmail.com]
>
>On Tue, Mar 25, 2014 at 12:28:19PM +0530, Pekon Gupta wrote:
>> fixes: commit 62116e5171e00f85a8d53f76e45b84423c89ff34
>>        mtd: nand: omap2: Support for hardware BCH error correction.
>>
>> In omap_elm_correct_data(), if bitflip_count in an erased-page is within the
>> correctable limit (< ecc.strength), then it is not indicated back to the caller
>> ecc->read_page().
>> This mis-guides upper layers like MTD and UBIFS layer to assume erased-page as
>> perfectly clean and use it for writing even if actual bitflip_count was
>> dangerously high (bitflip_count > mtd->bitflip_threshold).
>>
>> This patch fixes this above issue, by returning 'stats' to caller
>> ecc->read_page() under all scenarios.
>>
>> Reported-by: Brian Norris <computersforpeace@gmail.com>
>> Signed-off-by: Pekon Gupta <pekon@ti.com>
>
>Pushed to l2-mtd.git. Thanks!
>

Just forgot to ask, Is this patch candidate for stable? 

This one fixes the long standing issue of returning bit-flip count of an
erased-page to upper layers. So that upper layers take correct decision.


with regards, pekon
Ezequiel Garcia May 19, 2014, 12:20 p.m. UTC | #5
Hi Pekon,

On 19 May 05:00 AM, Gupta, Pekon wrote:
> >From: Brian Norris [mailto:computersforpeace@gmail.com]
> >
> >On Tue, Mar 25, 2014 at 12:28:19PM +0530, Pekon Gupta wrote:
> >> fixes: commit 62116e5171e00f85a8d53f76e45b84423c89ff34
> >>        mtd: nand: omap2: Support for hardware BCH error correction.
> >>
> >> In omap_elm_correct_data(), if bitflip_count in an erased-page is within the
> >> correctable limit (< ecc.strength), then it is not indicated back to the caller
> >> ecc->read_page().
> >> This mis-guides upper layers like MTD and UBIFS layer to assume erased-page as
> >> perfectly clean and use it for writing even if actual bitflip_count was
> >> dangerously high (bitflip_count > mtd->bitflip_threshold).
> >>
> >> This patch fixes this above issue, by returning 'stats' to caller
> >> ecc->read_page() under all scenarios.
> >>
> >> Reported-by: Brian Norris <computersforpeace@gmail.com>
> >> Signed-off-by: Pekon Gupta <pekon@ti.com>
> >
> >Pushed to l2-mtd.git. Thanks!
> >
> 
> Just forgot to ask, Is this patch candidate for stable? 
> 
> This one fixes the long standing issue of returning bit-flip count of an
> erased-page to upper layers. So that upper layers take correct decision.
> 

Yes, if the commit fixes a bug then it can be marked for stable.
There's some instructions on this subject, please take a look at [1] and [2].
I think this patch qualifies the stable rules.

In particular, if a patch fixes a bug and you know the culprit commit you
can add a Fixes tag to pass such information, with a 12-digit commit ID and
the commit title enclosed in double-quotes [1]:

  Fixes: 62116e5171e0 ("mtd: nand: omap2: Support for hardware BCH error correction")

To mark it for stable [2], you need add a Cc tag like this:

  Cc: <stable@vger.kernel.org> # 3.x.y

You can use something like this to find the releases that contain the bug:

git tag -l --contains 62116e5171e0 | grep ^v3

It's handy to add that as a git alias:

[alias]
  ..
  ontags = !sh -c 'git tag -l --contains $1 | grep ^v[23]' -

However, now that Brian has pushed the patch, you don't need to do any of
this. AFAIK, maintainers usually take care of all the above.

[1] Documentation/SubmittingPatches
[2] Documentation/stable_kernel_rules.txt.

Hope this helps,
pekon gupta May 20, 2014, 4:43 a.m. UTC | #6
>From: Ezequiel Garcia [mailto:ezequiel.garcia@free-electrons.com]
>>On 19 May 05:00 AM, Gupta, Pekon wrote:
>> >From: Brian Norris [mailto:computersforpeace@gmail.com]
>>> >On Tue, Mar 25, 2014 at 12:28:19PM +0530, Pekon Gupta wrote:

>> >> fixes: commit 62116e5171e00f85a8d53f76e45b84423c89ff34
>> >>        mtd: nand: omap2: Support for hardware BCH error correction.
>> >>
>> >> In omap_elm_correct_data(), if bitflip_count in an erased-page is within the
>> >> correctable limit (< ecc.strength), then it is not indicated back to the caller
>> >> ecc->read_page().
>> >> This mis-guides upper layers like MTD and UBIFS layer to assume erased-page as
>> >> perfectly clean and use it for writing even if actual bitflip_count was
>> >> dangerously high (bitflip_count > mtd->bitflip_threshold).
>> >>
>> >> This patch fixes this above issue, by returning 'stats' to caller
>> >> ecc->read_page() under all scenarios.
>> >>
>> >> Reported-by: Brian Norris <computersforpeace@gmail.com>
>> >> Signed-off-by: Pekon Gupta <pekon@ti.com>
>> >
>> >Pushed to l2-mtd.git. Thanks!
>> >
>>
>> Just forgot to ask, Is this patch candidate for stable?
>>
>> This one fixes the long standing issue of returning bit-flip count of an
>> erased-page to upper layers. So that upper layers take correct decision.
>>
>
>Yes, if the commit fixes a bug then it can be marked for stable.
>There's some instructions on this subject, please take a look at [1] and [2].
>I think this patch qualifies the stable rules.
>
>In particular, if a patch fixes a bug and you know the culprit commit you
>can add a Fixes tag to pass such information, with a 12-digit commit ID and
>the commit title enclosed in double-quotes [1]:
>
>  Fixes: 62116e5171e0 ("mtd: nand: omap2: Support for hardware BCH error correction")
>
>To mark it for stable [2], you need add a Cc tag like this:
>
>  Cc: <stable@vger.kernel.org> # 3.x.y
>
Yes, the "fixes" tag is there in the commit log.
And I had requested Brian earlier to see if this can be marked for stable.
I think he missed that email while pushing this patch.

CC: <stable@vger.kernel.org> # 3.9.x+
http://lists.infradead.org/pipermail/linux-mtd/2014-March/052843.html



>You can use something like this to find the releases that contain the bug:
>
>git tag -l --contains 62116e5171e0 | grep ^v3
>
>It's handy to add that as a git alias:
>
>[alias]
>  ..
>  ontags = !sh -c 'git tag -l --contains $1 | grep ^v[23]' -
>
>However, now that Brian has pushed the patch, you don't need to do any of
>this. AFAIK, maintainers usually take care of all the above.
>
Thanks Ezequiel, this is helpful ..
But now as this patch is not marked with stable tag, I doubt it will
be pulled in by stable-tree maintainers, as they run a script on linus'
tree and fetch only those commits which have CC: <stable@vger.kernel.org>

So, I just want to check with Brian, if there is a possibility he can edit
The commit log and mark this one for stable? Since he has still not sent
a pull request .. 


Brian,

This fix is really important to be marked as stable, as some users of OMAP
platforms have spawning out from 3.10 and 3.12 kernel to go for production.
And quite a few of them might be using NAND for their booting purpose,
so without this patch any bit-flip on their erased-pages will not be detected.



>[1] Documentation/SubmittingPatches
>[2] Documentation/stable_kernel_rules.txt.
>
>Hope this helps,

Sure.. Thanks.


with regards, pekon
Brian Norris May 20, 2014, 11:42 p.m. UTC | #7
On Tue, May 20, 2014 at 04:43:35AM +0000, Pekon Gupta wrote:
> This fix is really important to be marked as stable, as some users of OMAP
> platforms have spawning out from 3.10 and 3.12 kernel to go for production.
> And quite a few of them might be using NAND for their booting purpose,
> so without this patch any bit-flip on their erased-pages will not be detected.

Well, I don't think it's actually as dire as you say. The erased-page
bitflips won't be flagged as -EUCLEAN, but they will be "corrected." I
expect that this should be fine mostly.

But anyway, tagged for -stable and re-queued to l2-mtd.git.

In the future, I may be less likely to do this kind of rebasing. Note
that there is still a procedure for getting fixes to -stable after their
merging:

<quote>
 - Send the patch, after verifying that it follows the above rules, to
   stable@vger.kernel.org.  You must note the upstream commit ID in the
   changelog of your submission, as well as the kernel version you wish
   it to be applied to.
</quote>

Brian
pekon gupta May 22, 2014, 5:56 p.m. UTC | #8
>From: Brian Norris [mailto:computersforpeace@gmail.com]
>
>On Tue, May 20, 2014 at 04:43:35AM +0000, Pekon Gupta wrote:
>> This fix is really important to be marked as stable, as some users of OMAP
>> platforms have spawning out from 3.10 and 3.12 kernel to go for production.
>> And quite a few of them might be using NAND for their booting purpose,
>> so without this patch any bit-flip on their erased-pages will not be detected.
>
>Well, I don't think it's actually as dire as you say. The erased-page
>bitflips won't be flagged as -EUCLEAN, but they will be "corrected." I
>expect that this should be fine mostly.
>
>But anyway, tagged for -stable and re-queued to l2-mtd.git.
>
Thanks much from TI support team. You saved them from many
unrelated queries :-), as some newer technology MLC NAND are
showing significant number of bit-flips on erased-pages.
Some users compare do apple to apple comparison of logs on
different kernels. And so if older versions of kernel were not 
howing bit-flips in erased-page and newer version are showing,
there is suspicion on driver.


>In the future, I may be less likely to do this kind of rebasing. Note
>that there is still a procedure for getting fixes to -stable after their
>merging:
>
><quote>
> - Send the patch, after verifying that it follows the above rules, to
>   stable@vger.kernel.org.  You must note the upstream commit ID in the
>   changelog of your submission, as well as the kernel version you wish
>   it to be applied to.
></quote>
>
Ok thanks. But I think anything going to stable would still require your Ack
as maintainer. I myself be more alert next time to mark genuine ones
stable in commit log.


with regards, pekon
diff mbox

Patch

diff --git a/drivers/mtd/nand/omap2.c b/drivers/mtd/nand/omap2.c
index dd8c0dc..c359af0 100644
--- a/drivers/mtd/nand/omap2.c
+++ b/drivers/mtd/nand/omap2.c
@@ -1442,7 +1442,7 @@  static int omap_elm_correct_data(struct mtd_info *mtd, u_char *data,
 
 	/* Check if any error reported */
 	if (!is_error_reported)
-		return 0;
+		return stat;
 
 	/* Decode BCH error using ELM module */
 	elm_decode_bch_error_page(info->elm_dev, ecc_vec, err_vec);