Patchwork mtd: nand: omap: fix race condition in omap_wait()

Submitter Ivan Djelic
Date April 17, 2012, 11:11 a.m.
Message ID <1334661113-24709-1-git-send-email-ivan.djelic@parrot.com>
Permalink /patch/153134/
State Accepted
Commit a9c465f07c2dcd515d20b96f93470762f9ae08b6

Comments

Ivan Djelic - April 17, 2012, 11:11 a.m.
If a context switch occurs in function omap_wait() just before the
while loop is entered, then upon return from context switch the
timeout may already have elapsed: in that case, status is never
read from NAND device, and omap_wait() returns an error.
This failure has been experimentally observed during stress tests.

This patch ensures a NAND status read is always performed before
returning, as in the generic nand_wait() function.

Signed-off-by: Ivan Djelic <ivan.djelic@parrot.com>
---
 drivers/mtd/nand/omap2.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
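
The race described in the commit message above can be reproduced in a small userspace model. This is only a sketch: fake_jiffies, fake_device_status, read_status() and the two wait_*() functions are hypothetical stand-ins for jiffies, the NAND status register, gpmc_nand_read() and omap_wait(). They are not kernel code, and the plain "<" comparison ignores the counter wraparound that time_before() handles in the driver.

```c
/* Status bit values as defined in include/linux/mtd/nand.h */
#define NAND_STATUS_FAIL  0x01
#define NAND_STATUS_READY 0x40

/* Hypothetical stand-ins for jiffies and the device status register */
static unsigned long fake_jiffies;
static int fake_device_status = NAND_STATUS_READY;

/* Hypothetical stand-in for the GPMC status read */
static int read_status(void)
{
	return fake_device_status;
}

/* Model of the loop before the patch: status is read only inside the loop */
static int wait_before_patch(unsigned long timeout, unsigned long preempt_ticks)
{
	unsigned long timeo = fake_jiffies + timeout;
	int status = NAND_STATUS_FAIL;

	/* Simulated context switch just before the loop is entered */
	fake_jiffies += preempt_ticks;

	while (fake_jiffies < timeo) {
		status = read_status();
		if (status & NAND_STATUS_READY)
			break;
		fake_jiffies++;
	}
	return status;
}

/* Model of the loop after the patch: one status read always happens */
static int wait_after_patch(unsigned long timeout, unsigned long preempt_ticks)
{
	unsigned long timeo = fake_jiffies + timeout;
	int status;

	/* Simulated context switch just before the loop is entered */
	fake_jiffies += preempt_ticks;

	while (fake_jiffies < timeo) {
		status = read_status();
		if (status & NAND_STATUS_READY)
			break;
		fake_jiffies++;
	}

	/* Final read, performed even if the loop body never ran */
	status = read_status();
	return status;
}
```

With a ready device, a 2-tick timeout and a 5-tick simulated preemption, wait_before_patch() never reads the status and returns the initial NAND_STATUS_FAIL, while wait_after_patch() returns the real device status.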
Artem Bityutskiy - April 27, 2012, 5:50 a.m.
On Tue, 2012-04-17 at 13:11 +0200, Ivan Djelic wrote:
> If a context switch occurs in function omap_wait() just before the
> while loop is entered, then upon return from context switch the
> timeout may already have elapsed: in that case, status is never
> read from NAND device, and omap_wait() returns an error.
> This failure has been experimentally observed during stress tests.
> 
> This patch ensures a NAND status read is always performed before
> returning, as in the generic nand_wait() function.
> 
> Signed-off-by: Ivan Djelic <ivan.djelic@parrot.com>

Pushed this one to l2-mtd.git, thanks!
Mark Olleson - April 27, 2012, 11:26 a.m.
On 27 Apr 2012, at 06:50, Artem Bityutskiy wrote:

> On Tue, 2012-04-17 at 13:11 +0200, Ivan Djelic wrote:
>> If a context switch occurs in function omap_wait() just before the
>> while loop is entered, then upon return from context switch the
>> timeout may already have elapsed: in that case, status is never
>> read from NAND device, and omap_wait() returns an error.
>> This failure has been experimentally observed during stress tests.
>> 
>> This patch ensures a NAND status read is always performed before
>> returning, as in the generic nand_wait() function.
>> 
>> Signed-off-by: Ivan Djelic <ivan.djelic@parrot.com>
> 
> Pushed this one to l2-mtd.git, thanks!
> 

I'm investigating a problem where omap_wait() returns (apparently after the timeout) without the device being ready. In my case, the loop has run at least once (my system is lightly loaded, so cond_resched() is unlikely to block us for long). This patch will help in the case where cond_resched() blocks the thread beyond the timeout, as well as in the case where a context switch occurs before a value is ever read.

When the timeout is reached without the device becoming ready, omap_wait() returns a status value with NAND_STATUS_FAIL clear. However, many of the callers of omap_wait() check only for NAND_STATUS_FAIL and then go on to issue further commands to the device, which fail.

This includes the code enabled by CONFIG_MTD_NAND_VERIFY_WRITE, which then reads back garbage and fails.



Mark
---
Mark Olleson - Senior R&D Engineer
Technology Research & Development Group
Yamaha R&D Centre London
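
One way a caller could tell the timeout case described above apart from a genuine failure is to test NAND_STATUS_READY before NAND_STATUS_FAIL. The following is a minimal sketch under that assumption. The enum and classify_status() are hypothetical names for illustration only, and are not part of the driver or of the MTD core.

```c
/* Status bit values as defined in include/linux/mtd/nand.h */
#define NAND_STATUS_FAIL  0x01
#define NAND_STATUS_READY 0x40

/* Hypothetical result type for interpreting a wait function's return value */
enum wait_result {
	WAIT_OK,       /* device ready, operation succeeded */
	WAIT_FAILED,   /* device ready, operation reported failure */
	WAIT_TIMEOUT   /* device still busy when the wait gave up */
};

/* Hypothetical helper: check READY first, because FAIL is not set on a timeout */
static enum wait_result classify_status(int status)
{
	if (!(status & NAND_STATUS_READY))
		return WAIT_TIMEOUT;
	if (status & NAND_STATUS_FAIL)
		return WAIT_FAILED;
	return WAIT_OK;
}
```

A caller that tests only NAND_STATUS_FAIL would treat a status of 0x00 (busy, FAIL clear) as success. The helper above reports it as WAIT_TIMEOUT instead.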

Patch

diff --git a/drivers/mtd/nand/omap2.c b/drivers/mtd/nand/omap2.c
index c2b0bba..45c6205 100644
--- a/drivers/mtd/nand/omap2.c
+++ b/drivers/mtd/nand/omap2.c
@@ -879,7 +879,7 @@  static int omap_wait(struct mtd_info *mtd, struct nand_chip *chip)
 	struct omap_nand_info *info = container_of(mtd, struct omap_nand_info,
 							mtd);
 	unsigned long timeo = jiffies;
-	int status = NAND_STATUS_FAIL, state = this->state;
+	int status, state = this->state;
 
 	if (state == FL_ERASING)
 		timeo += (HZ * 400) / 1000;
@@ -894,6 +894,8 @@  static int omap_wait(struct mtd_info *mtd, struct nand_chip *chip)
 			break;
 		cond_resched();
 	}
+
+	status = gpmc_nand_read(info->gpmc_cs, GPMC_NAND_DATA);
 	return status;
 }