[2/2] mtd: rawnand: Bypass a couple of sanity checks during NAND identification

Message ID 20240507160546.130255-3-miquel.raynal@bootlin.com
State New
Series mtd: rawnand: NAND early identification fixes

Commit Message

Miquel Raynal May 7, 2024, 4:05 p.m. UTC
Early during NAND identification, mtd_info fields have not yet been
initialized (namely, writesize and oobsize) and thus cannot be used for
sanity checks yet. Of course if there is a misuse of
nand_change_read_column_op() so early we won't be warned, but there is
anyway no actual check to perform at this stage as we do not yet know
the NAND geometry.

So, if the fields are empty, especially mtd->writesize which is *always*
set quite rapidly after identification, let's skip the sanity checks.

nand_change_read_column_op() is subject to be used early for ONFI/JEDEC
identification in the very unlikely case of:
- bitflips appearing in the parameter page,
- the controller driver not supporting simple DATA_IN cycles.

Fixes: c27842e7e11f ("mtd: rawnand: onfi: Adapt the parameter page read to constraint controllers")
Fixes: daca31765e8b ("mtd: rawnand: jedec: Adapt the parameter page read to constraint controllers")
Cc: stable@vger.kernel.org
Reported-by: Alexander Dahl <ada@thorsis.com>
Closes: https://lore.kernel.org/linux-mtd/20240306-shaky-bunion-d28b65ea97d7@thorsis.com/
Reported-by: Steven Seeger <steven.seeger@flightsystems.net>
Closes: https://lore.kernel.org/linux-mtd/DM6PR05MB4506554457CF95191A670BDEF7062@DM6PR05MB4506.namprd05.prod.outlook.com/
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 drivers/mtd/nand/raw/nand_base.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

Comments

Alexander Dahl May 8, 2024, 6:41 a.m. UTC | #1
Hello Miquel,

On Tue, May 07, 2024 at 06:05:46PM +0200, Miquel Raynal wrote:
> Early during NAND identification, mtd_info fields have not yet been
> initialized (namely, writesize and oobsize) and thus cannot be used for
> sanity checks yet. Of course if there is a misuse of
> nand_change_read_column_op() so early we won't be warned, but there is
> anyway no actual check to perform at this stage as we do not yet know
> the NAND geometry.
> 
> So, if the fields are empty, especially mtd->writesize which is *always*
> set quite rapidly after identification, let's skip the sanity checks.
> 
> nand_change_read_column_op() is subject to be used early for ONFI/JEDEC
> identification in the very unlikely case of:
> - bitflips appearing in the parameter page,
> - the controller driver not supporting simple DATA_IN cycles.
> 
> Fixes: c27842e7e11f ("mtd: rawnand: onfi: Adapt the parameter page read to constraint controllers")
> Fixes: daca31765e8b ("mtd: rawnand: jedec: Adapt the parameter page read to constraint controllers")
> Cc: stable@vger.kernel.org
> Reported-by: Alexander Dahl <ada@thorsis.com>
> Closes: https://lore.kernel.org/linux-mtd/20240306-shaky-bunion-d28b65ea97d7@thorsis.com/
> Reported-by: Steven Seeger <steven.seeger@flightsystems.net>
> Closes: https://lore.kernel.org/linux-mtd/DM6PR05MB4506554457CF95191A670BDEF7062@DM6PR05MB4506.namprd05.prod.outlook.com/
> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> ---
>  drivers/mtd/nand/raw/nand_base.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c
> index 248e654ecefd..a66e73cd68cb 100644
> --- a/drivers/mtd/nand/raw/nand_base.c
> +++ b/drivers/mtd/nand/raw/nand_base.c
> @@ -1440,12 +1440,14 @@ int nand_change_read_column_op(struct nand_chip *chip,
>  	if (len && !buf)
>  		return -EINVAL;
>  
> -	if (offset_in_page + len > mtd->writesize + mtd->oobsize)
> -		return -EINVAL;
> +	if (mtd->writesize) {
> +		if ((offset_in_page + len > mtd->writesize + mtd->oobsize))
> +			return -EINVAL;

These doubled (( )) are new and I think not necessary?

Greets
Alex

>  
> -	/* Small page NANDs do not support column change. */
> -	if (mtd->writesize <= 512)
> -		return -ENOTSUPP;
> +		/* Small page NANDs do not support column change. */
> +		if (mtd->writesize <= 512)
> +			return -ENOTSUPP;
> +	}
>  
>  	if (nand_has_exec_op(chip)) {
>  		const struct nand_interface_config *conf =
> -- 
> 2.40.1
>
Miquel Raynal May 13, 2024, 7:05 a.m. UTC | #2
Hi Alexander,

ada@thorsis.com wrote on Wed, 8 May 2024 08:41:44 +0200:

> Hello Miquel,
> 
> On Tue, May 07, 2024 at 06:05:46PM +0200, Miquel Raynal wrote:
> > Early during NAND identification, mtd_info fields have not yet been
> > initialized (namely, writesize and oobsize) and thus cannot be used for
> > sanity checks yet. Of course if there is a misuse of
> > nand_change_read_column_op() so early we won't be warned, but there is
> > anyway no actual check to perform at this stage as we do not yet know
> > the NAND geometry.
> > 
> > So, if the fields are empty, especially mtd->writesize which is *always*
> > set quite rapidly after identification, let's skip the sanity checks.
> > 
> > nand_change_read_column_op() is subject to be used early for ONFI/JEDEC
> > identification in the very unlikely case of:
> > - bitflips appearing in the parameter page,
> > - the controller driver not supporting simple DATA_IN cycles.
> > 
> > Fixes: c27842e7e11f ("mtd: rawnand: onfi: Adapt the parameter page read to constraint controllers")
> > Fixes: daca31765e8b ("mtd: rawnand: jedec: Adapt the parameter page read to constraint controllers")
> > Cc: stable@vger.kernel.org
> > Reported-by: Alexander Dahl <ada@thorsis.com>
> > Closes: https://lore.kernel.org/linux-mtd/20240306-shaky-bunion-d28b65ea97d7@thorsis.com/
> > Reported-by: Steven Seeger <steven.seeger@flightsystems.net>
> > Closes: https://lore.kernel.org/linux-mtd/DM6PR05MB4506554457CF95191A670BDEF7062@DM6PR05MB4506.namprd05.prod.outlook.com/
> > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> > ---
> >  drivers/mtd/nand/raw/nand_base.c | 12 +++++++-----
> >  1 file changed, 7 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c
> > index 248e654ecefd..a66e73cd68cb 100644
> > --- a/drivers/mtd/nand/raw/nand_base.c
> > +++ b/drivers/mtd/nand/raw/nand_base.c
> > @@ -1440,12 +1440,14 @@ int nand_change_read_column_op(struct nand_chip *chip,
> >  	if (len && !buf)
> >  		return -EINVAL;
> >  
> > -	if (offset_in_page + len > mtd->writesize + mtd->oobsize)
> > -		return -EINVAL;
> > +	if (mtd->writesize) {
> > +		if ((offset_in_page + len > mtd->writesize + mtd->oobsize))
> > +			return -EINVAL;  
> 
> These doubled (( )) are new and I think not necessary?

Oops, true.

Any chance you'll be able to test the patchset?

Same question for Steven!

Cheers,
Miquèl
Sascha Hauer May 14, 2024, 12:25 p.m. UTC | #3
Hi Miquel,

On Tue, May 07, 2024 at 06:05:46PM +0200, Miquel Raynal wrote:
> Early during NAND identification, mtd_info fields have not yet been
> initialized (namely, writesize and oobsize) and thus cannot be used for
> sanity checks yet. Of course if there is a misuse of
> nand_change_read_column_op() so early we won't be warned, but there is
> anyway no actual check to perform at this stage as we do not yet know
> the NAND geometry.
> 
> So, if the fields are empty, especially mtd->writesize which is *always*
> set quite rapidly after identification, let's skip the sanity checks.
> 
> nand_change_read_column_op() is subject to be used early for ONFI/JEDEC
> identification in the very unlikely case of:
> - bitflips appearing in the parameter page,
> - the controller driver not supporting simple DATA_IN cycles.
> 
> Fixes: c27842e7e11f ("mtd: rawnand: onfi: Adapt the parameter page read to constraint controllers")
> Fixes: daca31765e8b ("mtd: rawnand: jedec: Adapt the parameter page read to constraint controllers")
> Cc: stable@vger.kernel.org
> Reported-by: Alexander Dahl <ada@thorsis.com>
> Closes: https://lore.kernel.org/linux-mtd/20240306-shaky-bunion-d28b65ea97d7@thorsis.com/
> Reported-by: Steven Seeger <steven.seeger@flightsystems.net>
> Closes: https://lore.kernel.org/linux-mtd/DM6PR05MB4506554457CF95191A670BDEF7062@DM6PR05MB4506.namprd05.prod.outlook.com/
> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> ---
>  drivers/mtd/nand/raw/nand_base.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c
> index 248e654ecefd..a66e73cd68cb 100644
> --- a/drivers/mtd/nand/raw/nand_base.c
> +++ b/drivers/mtd/nand/raw/nand_base.c
> @@ -1440,12 +1440,14 @@ int nand_change_read_column_op(struct nand_chip *chip,
>  	if (len && !buf)
>  		return -EINVAL;
>  
> -	if (offset_in_page + len > mtd->writesize + mtd->oobsize)
> -		return -EINVAL;
> +	if (mtd->writesize) {
> +		if ((offset_in_page + len > mtd->writesize + mtd->oobsize))
> +			return -EINVAL;
>  
> -	/* Small page NANDs do not support column change. */
> -	if (mtd->writesize <= 512)
> -		return -ENOTSUPP;
> +		/* Small page NANDs do not support column change. */
> +		if (mtd->writesize <= 512)
> +			return -ENOTSUPP;
> +	}

This is not enough. A few lines further down nand_fill_column_cycles()
is called which also uses mtd->writesize. This function also needs to
know if we have a large page or small page NAND, so bypassing the checks
won't be enough there.

Sascha
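
To illustrate the concern above, here is a minimal standalone sketch. It
is a simplified model, not the kernel helper: the toy_* names are
hypothetical, and the only behaviour modelled is that small page NANDs
(512 bytes or less) take one column address cycle while large page NANDs
take two. With writesize still 0, a naive "<= 512" test classifies the
chip as small page. The second function shows one possible adjustment,
under the assumption that an unidentified chip should be addressed with
two column cycles.

```c
#include <stdint.h>

/*
 * Hypothetical model of the column cycle selection. "writesize" is 0
 * while the chip has not been identified yet.
 */
int toy_fill_column_cycles(unsigned int writesize, uint8_t *addrs,
			   unsigned int offset_in_page)
{
	addrs[0] = offset_in_page & 0xff;

	/* Naive test: an unset writesize (0) also lands in this branch. */
	if (writesize <= 512)
		return 1;

	addrs[1] = (offset_in_page >> 8) & 0xff;
	return 2;
}

/*
 * Possible adjustment (an assumption, not the posted patch): only treat
 * the chip as small page once writesize is known to be non-zero.
 */
int toy_fill_column_cycles_ident_aware(unsigned int writesize, uint8_t *addrs,
				       unsigned int offset_in_page)
{
	addrs[0] = offset_in_page & 0xff;

	if (writesize && writesize <= 512)
		return 1;

	addrs[1] = (offset_in_page >> 8) & 0xff;
	return 2;
}
```

In this model, the naive version emits a single cycle for an offset of
0x100 when writesize is 0 and so loses the upper address byte, while the
identification-aware version emits both bytes.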
Steven Seeger May 14, 2024, 5:57 p.m. UTC | #4
On Tuesday, May 7, 2024, Miquel Raynal wrote:

>So, if the fields are empty, especially mtd->writesize which is *always*
>set quite rapidly after identification, let's skip the sanity checks.

I noticed this when first looking at my board with the bitflip in a NAND chip's parameter page. I just assumed that since I was setting it up to do column change operations, I needed to add this in at the time. Looking at it now, since this information is being supplied by me before the scan, it's wrong.  So I agree it's a bug, but I didn't think about it again since I was tackling the bug of trying to read additional parameter page copies further down the page due to the bitflip.

I don't have access to the board right now, but when I get to it again I can try this patch. I will need to remove what I already added in to check and reply back. It may be a few weeks, though.

On another note, I think that this entire API would benefit from discouraging hybrid approaches. I implement function overloads for things like ecc.read_page, write_page, read_page_raw, etc, but also use the exec function for things like erase, read id, read parameter page, etc. I maybe did it "wrong" but it works. Past drivers I've done use the legacy cmdfunc, so this was my first attempt at using the command parser. I suspect there are a lot of people like me writing drivers for proprietary hardware that uses FPGAs to do some of the NAND interaction, rather than direct chip access as the API was originally designed for.

So, to explain further, read_page triggers my addr/chip select, read page command, and retrieving the buffer. Read parameter page goes through the command parser, as does the column change op, with some state variables to keep track of where in the read cycle we are so that each copy of the parameter page data can be accessed in the buffer. I lament the lack of consistency here. But, it works and the customer is unlikely to want to change anything at this point. :)

Steven
Miquel Raynal May 16, 2024, 7:52 a.m. UTC | #5
Hi Steven,

steven.seeger@flightsystems.net wrote on Tue, 14 May 2024 17:57:46
+0000:

> On Tuesday, May 7, 2024, Miquel Raynal wrote:
> 
> >So, if the fields are empty, especially mtd->writesize which is *always*
> >set quite rapidly after identification, let's skip the sanity checks.  
> 
> I noticed this when first looking at my board with the bitflip in a NAND chip's parameter page. I just assumed that since I was setting it up to do column change operations, I needed to add this in at the time. Looking at it now, since this information is being supplied by me before the scan, it's wrong.  So I agree it's a bug, but I didn't think about it again since I was tackling the bug of trying to read additional parameter page copies further down the page due to the bitflip.
> 
> I don't have access to the board right now, but when I get to it again I can try this patch. I will need to remove what I already added in to check and reply back. It may be a few weeks, though.
> 
> On another note, I think that this entire API would benefit from discouraging hybrid approaches. I implement function overloads for things like ecc.read_page, write_page, read_page_raw, etc, but also use the exec function for things like erase, read id, read parameter page, etc. I maybe did it "wrong" but it works. Past drivers I've done use the legacy cmdfunc, so this was my first attempt at using the command parser. I suspect there are a lot of people like me writing drivers for proprietary hardware that uses FPGAs to do some of the NAND interaction, rather than direct chip access as the API was originally designed for.

I don't know what you mean by direct chip access. ->cmd_ctrl() and
->cmdfunc() were far too simple and many drivers started
guessing what the core was trying to do, making it very hard to
extend/modify the core without breaking them all. This was a sign
of an inadequate design and hence ->exec_op() (which provides the
whole operation) was introduced.

Just to make it clear, the original APIs were totally fine back then,
but controllers evolved, became smarter^W more complex, until the APIs
were no longer fitting.

> So, to explain further, read_page triggers my addr/chip select, read page command, and retrieving the buffer. Read parameter page goes through the command parser, as does the column change op, with some state variables to keep track of where in the read cycle we are so that each copy of the parameter page data can be accessed in the buffer. I lament the lack of consistency here. But, it works and the customer is unlikely to want to change anything at this point. :)

The logic is:

- Early at boot you need to identify the chip, its parameters, its
  configuration, etc.

  -> exec_op() is used

- During normal operation, it's time for I/Os. Using ->exec_op() again
  can work, but most of the time these operations can be done faster
  with a more custom approach, especially since most controller drivers
  embed an ECC engine that also needs to be managed during these
  accesses.

  -> your page helpers are here for that

- During debugging you might want to perform raw page reads,
  performance does not matter here but the data layout in the chip is
  NAND controller and ECC engine specific, while the user expected
  layout is: [all the data][all the oob]. 

  -> your raw page helpers are here for that

And there are standard helpers provided in the core if your controller
does not need specific implementations. You may want to use them
because it makes your life easier, they will use ->exec_op().

Thanks,
Miquèl
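
The split described above can be sketched as a toy model. Everything
here is hypothetical (struct toy_chip, the toy_* functions and the
NULL-pointer fallback are illustration only, not the rawnand API): it
only mirrors the idea that identification always goes through a generic
executor, while page I/O uses a controller-specific helper when the
driver provides one and otherwise falls back to a generic helper built
on that same executor.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical controller hooks, mirroring the roles described above. */
struct toy_chip {
	/* Generic operation executor, always available. */
	int (*exec_op)(struct toy_chip *chip, unsigned char *buf, size_t len);
	/* Optional controller-specific fast path for page reads. */
	int (*read_page)(struct toy_chip *chip, unsigned char *buf, size_t len);
	unsigned int exec_op_calls;	/* counts trips through exec_op */
};

/* Generic page helper: built on top of the generic executor. */
int toy_generic_read_page(struct toy_chip *chip, unsigned char *buf, size_t len)
{
	return chip->exec_op(chip, buf, len);
}

/* Identification always goes through the generic executor. */
int toy_identify(struct toy_chip *chip, unsigned char *id, size_t len)
{
	return chip->exec_op(chip, id, len);
}

/* Page I/O: use the driver's helper when present, else the generic one. */
int toy_read_page(struct toy_chip *chip, unsigned char *buf, size_t len)
{
	if (chip->read_page)
		return chip->read_page(chip, buf, len);

	return toy_generic_read_page(chip, buf, len);
}

/* Example hooks a toy driver could install. */
int toy_exec_op(struct toy_chip *chip, unsigned char *buf, size_t len)
{
	chip->exec_op_calls++;
	memset(buf, 0xff, len);	/* pretend the chip returned erased data */
	return 0;
}

int toy_fast_read_page(struct toy_chip *chip, unsigned char *buf, size_t len)
{
	(void)chip;
	memset(buf, 0xa5, len);	/* pretend a controller-assisted read ran */
	return 0;
}
```

A driver that installs only toy_exec_op gets page reads through the
generic helper, whereas one that also installs toy_fast_read_page
bypasses the executor for page I/O but still uses it for identification.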

Patch

diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c
index 248e654ecefd..a66e73cd68cb 100644
--- a/drivers/mtd/nand/raw/nand_base.c
+++ b/drivers/mtd/nand/raw/nand_base.c
@@ -1440,12 +1440,14 @@  int nand_change_read_column_op(struct nand_chip *chip,
 	if (len && !buf)
 		return -EINVAL;
 
-	if (offset_in_page + len > mtd->writesize + mtd->oobsize)
-		return -EINVAL;
+	if (mtd->writesize) {
+		if ((offset_in_page + len > mtd->writesize + mtd->oobsize))
+			return -EINVAL;
 
-	/* Small page NANDs do not support column change. */
-	if (mtd->writesize <= 512)
-		return -ENOTSUPP;
+		/* Small page NANDs do not support column change. */
+		if (mtd->writesize <= 512)
+			return -ENOTSUPP;
+	}
 
 	if (nand_has_exec_op(chip)) {
 		const struct nand_interface_config *conf =
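
For readers who want to try the guarded checks outside the kernel, here
is a minimal userspace sketch of the hunk above. It is an illustration
under stated assumptions, not the kernel function: struct toy_geometry
and toy_check_column_change() are hypothetical names, the single pair of
parentheses follows the review comment, and -EOPNOTSUPP stands in for
the kernel-internal -ENOTSUPP, which userspace <errno.h> does not
define.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the two mtd_info fields the checks read. */
struct toy_geometry {
	unsigned int writesize;	/* main area size in bytes, 0 until identified */
	unsigned int oobsize;	/* spare area size in bytes, 0 until identified */
};

/*
 * Models only the sanity checks from the hunk above. Returns 0 when the
 * request passes, or a negative errno value otherwise.
 */
int toy_check_column_change(const struct toy_geometry *geo,
			    unsigned int offset_in_page,
			    const void *buf, unsigned int len)
{
	if (len && !buf)
		return -EINVAL;

	/* Geometry unknown (identification stage): nothing to check against. */
	if (!geo->writesize)
		return 0;

	if (offset_in_page + len > geo->writesize + geo->oobsize)
		return -EINVAL;

	/* Small page NANDs do not support column change. */
	if (geo->writesize <= 512)
		return -EOPNOTSUPP;	/* stand-in for the kernel's -ENOTSUPP */

	return 0;
}
```

With a zeroed geometry the function accepts any offset and length, which
is the early identification case the patch targets. Once writesize is
set, both the range check and the small page check apply again.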