[v2] mtd: rawnand: Do not check FAIL bit when executing a SET_FEATURES op

Message ID 20180511124407.7314-1-boris.brezillon@bootlin.com
State Accepted
Delegated to: Boris Brezillon

Commit Message

Boris Brezillon May 11, 2018, 12:44 p.m.
The ONFI spec clearly says that the FAIL bit is only valid for PROGRAM,
ERASE and READ-with-on-die-ECC operations, and should be ignored
otherwise.

It seems that checking it after sending a SET_FEATURES is a bad idea
because a previous READ, PROGRAM or ERASE op may have failed, and
depending on the implementation, the FAIL bit is not cleared until a
new READ, PROGRAM or ERASE is started.

This leads to ->set_features() returning -EIO even though the operation
actually succeeded, which can sometimes stop a batch of READ/PROGRAM ops.

Note that we only fix the ->exec_op() path here, because some drivers
are abusing the NAND_STATUS_FAIL flag in their ->waitfunc()
implementation to propagate other kinds of errors, like
wait-ready-timeout or controller-related errors. Let's not try to fix
those drivers since they have worked fine so far.

Fixes: 8878b126df76 ("mtd: nand: add ->exec_op() implementation")
Cc: stable@vger.kernel.org
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
---
This patch fixes a problem we had with on-die ECC on Micron
NANDs [1].
On these chips, an ECC failure sets the FAIL bit, and the bit is not
cleared until the next READ operation. As a result, the following
SET_FEATURES (used to re-enable on-die ECC) failed with -EIO and
stopped the batch of page reads started by UBIFS, which in turn led to
an unmountable FS.

[1]http://patchwork.ozlabs.org/patch/907874/
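
The failure sequence described above can be reproduced with a small
user-space model. This is only a sketch: struct mock_chip and every
mock_*/set_features_* function are hypothetical stand-ins invented for
illustration, not kernel code. Only the NAND_STATUS_FAIL bit value and
the "FAIL stays set until the next READ" behaviour are taken from the
description.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NAND_STATUS_FAIL 0x01	/* bit 0 of the NAND status register */

/* Hypothetical model of a chip whose FAIL bit is sticky. */
struct mock_chip {
	uint8_t status;		/* last status register value */
	int ondie_ecc;		/* setting controlled by SET_FEATURES */
};

/* READ with on-die ECC: only a new READ clears a stale FAIL bit. */
static void mock_read_page(struct mock_chip *chip, int ecc_failure)
{
	assert(chip);
	chip->status &= ~NAND_STATUS_FAIL;
	if (ecc_failure)
		chip->status |= NAND_STATUS_FAIL;
}

/* SET_FEATURES: applies the setting and leaves the FAIL bit untouched. */
static int mock_exec_set_features(struct mock_chip *chip, int enable)
{
	assert(chip);
	chip->ondie_ecc = enable;
	return 0;
}

/* Behaviour before the patch: read STATUS afterwards and test FAIL. */
static int set_features_checking_fail(struct mock_chip *chip, int enable)
{
	int ret = mock_exec_set_features(chip, enable);

	if (ret)
		return ret;

	/* FAIL may be left over from an earlier READ/PROGRAM/ERASE. */
	if (chip->status & NAND_STATUS_FAIL)
		return -EIO;

	return 0;
}

/* Behaviour after the patch: only report errors from the op itself. */
static int set_features_ignoring_fail(struct mock_chip *chip, int enable)
{
	return mock_exec_set_features(chip, enable);
}
```

In this model, calling mock_read_page() with an ECC failure and then
set_features_checking_fail() returns -EIO although ondie_ecc was
updated, while set_features_ignoring_fail() returns 0.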

Changes in v2:
- Fix the subject prefix
---
 drivers/mtd/nand/raw/nand_base.c | 27 +++++++++------------------
 1 file changed, 9 insertions(+), 18 deletions(-)

Comments

Miquel Raynal May 14, 2018, 8:54 a.m. | #1
Hi Boris,

On Fri, 11 May 2018 14:44:07 +0200, Boris Brezillon
<boris.brezillon@bootlin.com> wrote:

> The ONFI spec clearly says that FAIL bit is only valid for PROGRAM,
> ERASE and READ-with-on-die-ECC operations, and should be ignored
> otherwise.
> 
> It seems that checking it after sending a SET_FEATURES is a bad idea
> because a previous READ, PROGRAM or ERASE op may have failed, and
> depending on the implementation, the FAIL bit is not cleared until a
> new READ, PROGRAM or ERASE is started.
> 
> This leads to ->set_features() returning -EIO while it actually worked,
> which can sometimes stop a batch of READ/PROGRAM ops.
> 
> Note that we only fix the ->exec_op() path here, because some drivers
> are abusing the NAND_STATUS_FAIL flag in their ->waitfunc()
> implementation to propagate other kind of errors, like
> wait-ready-timeout or controller-related errors. Let's not try to fix
> those drivers since they worked fine so far.
> 
> Fixes: 8878b126df76 ("mtd: nand: add ->exec_op() implementation")
> Cc: stable@vger.kernel.org
> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> ---

So we have no real way to know if a SET_FEATURES actually succeeded.
I checked the ONFI spec and could not find anything. A GET_FEATURES
will do the trick though (when supported).
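
A minimal sketch of such a read-back check, modelled in user space.
struct mock_chip, the mock_* helpers and set_features_verified() are
hypothetical names made up for this illustration and are not part of
the patch or of the NAND core; only ONFI_SUBFEATURE_PARAM_LEN (4 bytes)
mirrors the kernel define.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define ONFI_SUBFEATURE_PARAM_LEN 4	/* four sub-feature parameter bytes */
#define MOCK_NUM_FEATURES 256		/* one slot per 8-bit feature address */

/* Hypothetical chip model that stores feature parameters in RAM. */
struct mock_chip {
	uint8_t features[MOCK_NUM_FEATURES][ONFI_SUBFEATURE_PARAM_LEN];
	int drop_writes;	/* simulate a chip silently ignoring SET_FEATURES */
};

static int mock_set_features(struct mock_chip *chip, uint8_t addr,
			     const uint8_t *params)
{
	assert(chip && params);
	if (!chip->drop_writes)
		memcpy(chip->features[addr], params, ONFI_SUBFEATURE_PARAM_LEN);
	return 0;
}

static int mock_get_features(struct mock_chip *chip, uint8_t addr,
			     uint8_t *params)
{
	assert(chip && params);
	memcpy(params, chip->features[addr], ONFI_SUBFEATURE_PARAM_LEN);
	return 0;
}

/* SET_FEATURES followed by GET_FEATURES; -EIO if the values differ. */
static int set_features_verified(struct mock_chip *chip, uint8_t addr,
				 const uint8_t *params)
{
	uint8_t readback[ONFI_SUBFEATURE_PARAM_LEN];
	int ret;

	ret = mock_set_features(chip, addr, params);
	if (ret)
		return ret;

	ret = mock_get_features(chip, addr, readback);
	if (ret)
		return ret;

	return memcmp(readback, params, ONFI_SUBFEATURE_PARAM_LEN) ? -EIO : 0;
}
```

The check relies only on the data returned by GET_FEATURES, so a stale
FAIL bit in the status register does not affect the result.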

Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>

Boris Brezillon May 18, 2018, 6:55 a.m. | #2
On Fri, 11 May 2018 14:44:07 +0200
Boris Brezillon <boris.brezillon@bootlin.com> wrote:

> The ONFI spec clearly says that FAIL bit is only valid for PROGRAM,
> ERASE and READ-with-on-die-ECC operations, and should be ignored
> otherwise.
> 
> It seems that checking it after sending a SET_FEATURES is a bad idea
> because a previous READ, PROGRAM or ERASE op may have failed, and
> depending on the implementation, the FAIL bit is not cleared until a
> new READ, PROGRAM or ERASE is started.
> 
> This leads to ->set_features() returning -EIO while it actually worked,
> which can sometimes stop a batch of READ/PROGRAM ops.
> 
> Note that we only fix the ->exec_op() path here, because some drivers
> are abusing the NAND_STATUS_FAIL flag in their ->waitfunc()
> implementation to propagate other kind of errors, like
> wait-ready-timeout or controller-related errors. Let's not try to fix
> those drivers since they worked fine so far.
> 
> Fixes: 8878b126df76 ("mtd: nand: add ->exec_op() implementation")
> Cc: stable@vger.kernel.org
> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>

Applied to nand/next.


Patch

diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c
index f28c3a555861..ee29f34562ab 100644
--- a/drivers/mtd/nand/raw/nand_base.c
+++ b/drivers/mtd/nand/raw/nand_base.c
@@ -2174,7 +2174,6 @@  static int nand_set_features_op(struct nand_chip *chip, u8 feature,
 	struct mtd_info *mtd = nand_to_mtd(chip);
 	const u8 *params = data;
 	int i, ret;
-	u8 status;
 
 	if (chip->exec_op) {
 		const struct nand_sdr_timings *sdr =
@@ -2188,26 +2187,18 @@  static int nand_set_features_op(struct nand_chip *chip, u8 feature,
 		};
 		struct nand_operation op = NAND_OPERATION(instrs);
 
-		ret = nand_exec_op(chip, &op);
-		if (ret)
-			return ret;
-
-		ret = nand_status_op(chip, &status);
-		if (ret)
-			return ret;
-	} else {
-		chip->cmdfunc(mtd, NAND_CMD_SET_FEATURES, feature, -1);
-		for (i = 0; i < ONFI_SUBFEATURE_PARAM_LEN; ++i)
-			chip->write_byte(mtd, params[i]);
+		return nand_exec_op(chip, &op);
+	}
 
-		ret = chip->waitfunc(mtd, chip);
-		if (ret < 0)
-			return ret;
+	chip->cmdfunc(mtd, NAND_CMD_SET_FEATURES, feature, -1);
+	for (i = 0; i < ONFI_SUBFEATURE_PARAM_LEN; ++i)
+		chip->write_byte(mtd, params[i]);
 
-		status = ret;
-	}
+	ret = chip->waitfunc(mtd, chip);
+	if (ret < 0)
+		return ret;
 
-	if (status & NAND_STATUS_FAIL)
+	if (ret & NAND_STATUS_FAIL)
 		return -EIO;
 
 	return 0;