diff mbox series

[1/2] dt-bindings: mtd: nand: Macronix: document new binding

Message ID 20230323124510.2484808-2-noltari@gmail.com
State Changes Requested, archived
Headers show
Series mtd: nand: raw: macronix: allow disabling block protection | expand

Checks

Context Check Description
robh/patch-applied success
robh/checkpatch warning total: 0 errors, 1 warnings, 9 lines checked

Commit Message

Álvaro Fernández Rojas March 23, 2023, 12:45 p.m. UTC
Add new "mxic,disable-block-protection" binding documentation.
This binding allows disabling block protection support for those devices not
supporting it.

Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
---
 Documentation/devicetree/bindings/mtd/nand-macronix.txt | 3 +++
 1 file changed, 3 insertions(+)

Comments

Miquel Raynal March 24, 2023, 9:40 a.m. UTC | #1
Hi Álvaro,

noltari@gmail.com wrote on Thu, 23 Mar 2023 13:45:09 +0100:

> Add new "mxic,disable-block-protection" binding documentation.
> This binding allows disabling block protection support for those devices not
> supporting it.
> 
> Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
> ---
>  Documentation/devicetree/bindings/mtd/nand-macronix.txt | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/mtd/nand-macronix.txt b/Documentation/devicetree/bindings/mtd/nand-macronix.txt
> index ffab28a2c4d1..03f65ca32cd3 100644
> --- a/Documentation/devicetree/bindings/mtd/nand-macronix.txt
> +++ b/Documentation/devicetree/bindings/mtd/nand-macronix.txt
> @@ -16,6 +16,9 @@ in children nodes.
>  Required NAND chip properties in children mode:
>  - randomizer enable: should be "mxic,enable-randomizer-otp"
>  
> +Optional NAND chip properties in children mode:
> +- block protection disable: should be "mxic,disable-block-protection"
> +

Besides the fact that nowadays we prefer to see binding conversions to
yaml before adding anything, I don't think this will fly.

I'm not sure exactly what "disable block protection" means, we
already have similar properties like "lock" and "secure-regions", not
sure they will fit but I think it's worth checking.

Otherwise, why would you disable the block protection? What does it
mean exactly? I'm not in favor of a Macronix-specific property here.

Thanks,
Miquèl
Álvaro Fernández Rojas March 24, 2023, 10:31 a.m. UTC | #2
Hi Miquèl,

El vie, 24 mar 2023 a las 10:40, Miquel Raynal
(<miquel.raynal@bootlin.com>) escribió:
>
> Hi Álvaro,
>
> noltari@gmail.com wrote on Thu, 23 Mar 2023 13:45:09 +0100:
>
> > Add new "mxic,disable-block-protection" binding documentation.
> > This binding allows disabling block protection support for those devices not
> > supporting it.
> >
> > Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
> > ---
> >  Documentation/devicetree/bindings/mtd/nand-macronix.txt | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/Documentation/devicetree/bindings/mtd/nand-macronix.txt b/Documentation/devicetree/bindings/mtd/nand-macronix.txt
> > index ffab28a2c4d1..03f65ca32cd3 100644
> > --- a/Documentation/devicetree/bindings/mtd/nand-macronix.txt
> > +++ b/Documentation/devicetree/bindings/mtd/nand-macronix.txt
> > @@ -16,6 +16,9 @@ in children nodes.
> >  Required NAND chip properties in children mode:
> >  - randomizer enable: should be "mxic,enable-randomizer-otp"
> >
> > +Optional NAND chip properties in children mode:
> > +- block protection disable: should be "mxic,disable-block-protection"
> > +
>
> Besides the fact that nowadays we prefer to see binding conversions to
> yaml before adding anything, I don't think this will fly.
>
> I'm not sure exactly what "disable block protection" means, we
> already have similar properties like "lock" and "secure-regions", not
> sure they will fit but I think it's worth checking.

As explained in 2/2, commit 03a539c7a118 introduced a regression on
Sercomm H500-s (BCM63268) OpenWrt devices with Macronix MX30LF1G18AC
which hangs the device.

This is the log with block protection disabled:
[    0.495831] bcm6368_nand 10000200.nand: there is not valid maps for
state default
[    0.504995] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xf1
[    0.511526] nand: Macronix MX30LF1G18AC
[    0.515586] nand: 128 MiB, SLC, erase size: 128 KiB, page size:
2048, OOB size: 64
[    0.523516] bcm6368_nand 10000200.nand: detected 128MiB total,
128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-4
[    0.535912] Bad block table found at page 65472, version 0x01
[    0.544268] Bad block table found at page 65408, version 0x01
[    0.954329] 9 fixed-partitions partitions found on MTD device brcmnand.0
...

This is the log with block protection enabled:
[    0.495095] bcm6368_nand 10000200.nand: there is not valid maps for
state default
[    0.504249] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xf1
[    0.510772] nand: Macronix MX30LF1G18AC
[    0.514874] nand: 128 MiB, SLC, erase size: 128 KiB, page size:
2048, OOB size: 64
[    0.522780] bcm6368_nand 10000200.nand: detected 128MiB total,
128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-4
[    0.539687] Bad block table not found for chip 0
[    0.550153] Bad block table not found for chip 0
[    0.555069] Scanning device for bad blocks
[    0.601213] CPU 1 Unable to handle kernel paging request at virtual
address 10277f00, epc == 8039ce70, ra == 8016ad50
*** Device hangs ***

Enabling macronix_nand_block_protection_support() makes the device
unable to detect the bad block table and hangs it when trying to scan
for bad blocks.

>
> Otherwise, why would you disable the block protection? What does it
> mean exactly? I'm not in favor of a Macronix-specific property here.
>
> Thanks,
> Miquèl

--
Álvaro
Miquel Raynal March 24, 2023, 10:49 a.m. UTC | #3
Hi Álvaro,

noltari@gmail.com wrote on Fri, 24 Mar 2023 11:31:17 +0100:

> Hi Miquèl,
> 
> El vie, 24 mar 2023 a las 10:40, Miquel Raynal
> (<miquel.raynal@bootlin.com>) escribió:
> >
> > Hi Álvaro,
> >
> > noltari@gmail.com wrote on Thu, 23 Mar 2023 13:45:09 +0100:
> >  
> > > Add new "mxic,disable-block-protection" binding documentation.
> > > This binding allows disabling block protection support for those devices not
> > > supporting it.
> > >
> > > Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
> > > ---
> > >  Documentation/devicetree/bindings/mtd/nand-macronix.txt | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/Documentation/devicetree/bindings/mtd/nand-macronix.txt b/Documentation/devicetree/bindings/mtd/nand-macronix.txt
> > > index ffab28a2c4d1..03f65ca32cd3 100644
> > > --- a/Documentation/devicetree/bindings/mtd/nand-macronix.txt
> > > +++ b/Documentation/devicetree/bindings/mtd/nand-macronix.txt
> > > @@ -16,6 +16,9 @@ in children nodes.
> > >  Required NAND chip properties in children mode:
> > >  - randomizer enable: should be "mxic,enable-randomizer-otp"
> > >
> > > +Optional NAND chip properties in children mode:
> > > +- block protection disable: should be "mxic,disable-block-protection"
> > > +  
> >
> > Besides the fact that nowadays we prefer to see binding conversions to
> > yaml before adding anything, I don't think this will fly.
> >
> > I'm not sure exactly what "disable block protection" means, we
> > already have similar properties like "lock" and "secure-regions", not
> > sure they will fit but I think it's worth checking.  
> 
> As explained in 2/2, commit 03a539c7a118 introduced a regression on
> Sercomm H500-s (BCM63268) OpenWrt devices with Macronix MX30LF1G18AC
> which hangs the device.
> 
> This is the log with block protection disabled:
> [    0.495831] bcm6368_nand 10000200.nand: there is not valid maps for
> state default
> [    0.504995] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xf1
> [    0.511526] nand: Macronix MX30LF1G18AC
> [    0.515586] nand: 128 MiB, SLC, erase size: 128 KiB, page size:
> 2048, OOB size: 64
> [    0.523516] bcm6368_nand 10000200.nand: detected 128MiB total,
> 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-4
> [    0.535912] Bad block table found at page 65472, version 0x01
> [    0.544268] Bad block table found at page 65408, version 0x01
> [    0.954329] 9 fixed-partitions partitions found on MTD device brcmnand.0
> ...
> 
> This is the log with block protection enabled:
> [    0.495095] bcm6368_nand 10000200.nand: there is not valid maps for
> state default
> [    0.504249] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xf1
> [    0.510772] nand: Macronix MX30LF1G18AC
> [    0.514874] nand: 128 MiB, SLC, erase size: 128 KiB, page size:
> 2048, OOB size: 64
> [    0.522780] bcm6368_nand 10000200.nand: detected 128MiB total,
> 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-4
> [    0.539687] Bad block table not found for chip 0
> [    0.550153] Bad block table not found for chip 0
> [    0.555069] Scanning device for bad blocks
> [    0.601213] CPU 1 Unable to handle kernel paging request at virtual
> address 10277f00, epc == 8039ce70, ra == 8016ad50
> *** Device hangs ***
> 
> Enabling macronix_nand_block_protection_support() makes the device
> unable to detect the bad block table and hangs it when trying to scan
> for bad blocks.

Please trace nand_macronix.c and look:
- are the get_features and set_features really supported by the
  controller driver?
- what is the state of the locking configuration in the chip when you
  boot?
- is there anything that locks the device by calling mxic_nand_lock() ?
- finding no bbt is one thing, hanging is another, where is it hanging
  exactly? (offset in nand/ and line in the code)

> 
> >
> > Otherwise, why would you disable the block protection? What does it
> > mean exactly? I'm not in favor of a Macronix-specific property here.
> >
> > Thanks,
> > Miquèl  
> 
> --
> Álvaro


Thanks,
Miquèl
Álvaro Fernández Rojas March 24, 2023, 11:21 a.m. UTC | #4
El vie, 24 mar 2023 a las 11:49, Miquel Raynal
(<miquel.raynal@bootlin.com>) escribió:
>
> Hi Álvaro,
>
> noltari@gmail.com wrote on Fri, 24 Mar 2023 11:31:17 +0100:
>
> > Hi Miquèl,
> >
> > El vie, 24 mar 2023 a las 10:40, Miquel Raynal
> > (<miquel.raynal@bootlin.com>) escribió:
> > >
> > > Hi Álvaro,
> > >
> > > noltari@gmail.com wrote on Thu, 23 Mar 2023 13:45:09 +0100:
> > >
> > > > Add new "mxic,disable-block-protection" binding documentation.
> > > > This binding allows disabling block protection support for those devices not
> > > > supporting it.
> > > >
> > > > Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
> > > > ---
> > > >  Documentation/devicetree/bindings/mtd/nand-macronix.txt | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/Documentation/devicetree/bindings/mtd/nand-macronix.txt b/Documentation/devicetree/bindings/mtd/nand-macronix.txt
> > > > index ffab28a2c4d1..03f65ca32cd3 100644
> > > > --- a/Documentation/devicetree/bindings/mtd/nand-macronix.txt
> > > > +++ b/Documentation/devicetree/bindings/mtd/nand-macronix.txt
> > > > @@ -16,6 +16,9 @@ in children nodes.
> > > >  Required NAND chip properties in children mode:
> > > >  - randomizer enable: should be "mxic,enable-randomizer-otp"
> > > >
> > > > +Optional NAND chip properties in children mode:
> > > > +- block protection disable: should be "mxic,disable-block-protection"
> > > > +
> > >
> > > Besides the fact that nowadays we prefer to see binding conversions to
> > > yaml before adding anything, I don't think this will fly.
> > >
> > > I'm not sure exactly what "disable block protection" means, we
> > > already have similar properties like "lock" and "secure-regions", not
> > > sure they will fit but I think it's worth checking.
> >
> > As explained in 2/2, commit 03a539c7a118 introduced a regression on
> > Sercomm H500-s (BCM63268) OpenWrt devices with Macronix MX30LF1G18AC
> > which hangs the device.
> >
> > This is the log with block protection disabled:
> > [    0.495831] bcm6368_nand 10000200.nand: there is not valid maps for
> > state default
> > [    0.504995] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xf1
> > [    0.511526] nand: Macronix MX30LF1G18AC
> > [    0.515586] nand: 128 MiB, SLC, erase size: 128 KiB, page size:
> > 2048, OOB size: 64
> > [    0.523516] bcm6368_nand 10000200.nand: detected 128MiB total,
> > 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-4
> > [    0.535912] Bad block table found at page 65472, version 0x01
> > [    0.544268] Bad block table found at page 65408, version 0x01
> > [    0.954329] 9 fixed-partitions partitions found on MTD device brcmnand.0
> > ...
> >
> > This is the log with block protection enabled:
> > [    0.495095] bcm6368_nand 10000200.nand: there is not valid maps for
> > state default
> > [    0.504249] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xf1
> > [    0.510772] nand: Macronix MX30LF1G18AC
> > [    0.514874] nand: 128 MiB, SLC, erase size: 128 KiB, page size:
> > 2048, OOB size: 64
> > [    0.522780] bcm6368_nand 10000200.nand: detected 128MiB total,
> > 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-4
> > [    0.539687] Bad block table not found for chip 0
> > [    0.550153] Bad block table not found for chip 0
> > [    0.555069] Scanning device for bad blocks
> > [    0.601213] CPU 1 Unable to handle kernel paging request at virtual
> > address 10277f00, epc == 8039ce70, ra == 8016ad50
> > *** Device hangs ***
> >
> > Enabling macronix_nand_block_protection_support() makes the device
> > unable to detect the bad block table and hangs it when trying to scan
> > for bad blocks.
>
> Please trace nand_macronix.c and look:
> - are the get_features and set_features really supported by the
>   controller driver?

This is what I could find by debugging:
[    0.494993] bcm6368_nand 10000200.nand: there is not valid maps for
state default
[    0.505375] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xf1
[    0.512077] nand: Macronix MX30LF1G18AC
[    0.515994] nand: 128 MiB, SLC, erase size: 128 KiB, page size:
2048, OOB size: 64
[    0.523928] bcm6368_nand 10000200.nand: detected 128MiB total,
128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-4
[    0.534415] bcm6368_nand 10000200.nand: ll_op cmd 0xa00ee
[    0.539988] bcm6368_nand 10000200.nand: ll_op cmd 0x600a0
[    0.545659] bcm6368_nand 10000200.nand: ll_op cmd 0x10000
[    0.551214] bcm6368_nand 10000200.nand: NAND_CMD_GET_FEATURES = 0x00
[    0.557843] bcm6368_nand 10000200.nand: ll_op cmd 0x10000
[    0.563475] bcm6368_nand 10000200.nand: NAND_CMD_GET_FEATURES = 0x00
[    0.569998] bcm6368_nand 10000200.nand: ll_op cmd 0x10000
[    0.575653] bcm6368_nand 10000200.nand: NAND_CMD_GET_FEATURES = 0x00
[    0.582246] bcm6368_nand 10000200.nand: ll_op cmd 0x80010000
[    0.588067] bcm6368_nand 10000200.nand: NAND_CMD_GET_FEATURES = 0x00
[    0.594657] nand: nand_get_features: addr=a0 subfeature_param=[00
00 00 00] -> 0
[    0.602341] macronix_nand_block_protection_support:
ONFI_FEATURE_ADDR_MXIC_PROTECTION=0
[    0.610548] macronix_nand_block_protection_support: !=
MXIC_BLOCK_PROTECTION_ALL_LOCK
[    0.624760] Bad block table not found for chip 0
[    0.635542] Bad block table not found for chip 0
[    0.640270] Scanning device for bad blocks

I don't know how to tell if get_features / set_features is really supported...

> - what is the state of the locking configuration in the chip when you
>   boot?

Unlocked, I guess...
How can I check that?

> - is there anything that locks the device by calling mxic_nand_lock() ?
> - finding no bbt is one thing, hanging is another, where is it hanging
>   exactly? (offset in nand/ and line in the code)

I've got no idea...

>
> >
> > >
> > > Otherwise, why would you disable the block protection? What does it
> > > mean exactly? I'm not in favor of a Macronix-specific property here.
> > >
> > > Thanks,
> > > Miquèl
> >
> > --
> > Álvaro
>
>
> Thanks,
> Miquèl
Miquel Raynal March 24, 2023, 1:45 p.m. UTC | #5
Hi Álvaro,

noltari@gmail.com wrote on Fri, 24 Mar 2023 12:21:11 +0100:

> El vie, 24 mar 2023 a las 11:49, Miquel Raynal
> (<miquel.raynal@bootlin.com>) escribió:
> >
> > Hi Álvaro,
> >
> > noltari@gmail.com wrote on Fri, 24 Mar 2023 11:31:17 +0100:
> >  
> > > Hi Miquèl,
> > >
> > > El vie, 24 mar 2023 a las 10:40, Miquel Raynal
> > > (<miquel.raynal@bootlin.com>) escribió:  
> > > >
> > > > Hi Álvaro,
> > > >
> > > > noltari@gmail.com wrote on Thu, 23 Mar 2023 13:45:09 +0100:
> > > >  
> > > > > Add new "mxic,disable-block-protection" binding documentation.
> > > > > This binding allows disabling block protection support for those devices not
> > > > > supporting it.
> > > > >
> > > > > Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
> > > > > ---
> > > > >  Documentation/devicetree/bindings/mtd/nand-macronix.txt | 3 +++
> > > > >  1 file changed, 3 insertions(+)
> > > > >
> > > > > diff --git a/Documentation/devicetree/bindings/mtd/nand-macronix.txt b/Documentation/devicetree/bindings/mtd/nand-macronix.txt
> > > > > index ffab28a2c4d1..03f65ca32cd3 100644
> > > > > --- a/Documentation/devicetree/bindings/mtd/nand-macronix.txt
> > > > > +++ b/Documentation/devicetree/bindings/mtd/nand-macronix.txt
> > > > > @@ -16,6 +16,9 @@ in children nodes.
> > > > >  Required NAND chip properties in children mode:
> > > > >  - randomizer enable: should be "mxic,enable-randomizer-otp"
> > > > >
> > > > > +Optional NAND chip properties in children mode:
> > > > > +- block protection disable: should be "mxic,disable-block-protection"
> > > > > +  
> > > >
> > > > Besides the fact that nowadays we prefer to see binding conversions to
> > > > yaml before adding anything, I don't think this will fly.
> > > >
> > > > I'm not sure exactly what "disable block protection" means, we
> > > > already have similar properties like "lock" and "secure-regions", not
> > > > sure they will fit but I think it's worth checking.  
> > >
> > > As explained in 2/2, commit 03a539c7a118 introduced a regression on
> > > Sercomm H500-s (BCM63268) OpenWrt devices with Macronix MX30LF1G18AC
> > > which hangs the device.
> > >
> > > This is the log with block protection disabled:
> > > [    0.495831] bcm6368_nand 10000200.nand: there is not valid maps for
> > > state default
> > > [    0.504995] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xf1
> > > [    0.511526] nand: Macronix MX30LF1G18AC
> > > [    0.515586] nand: 128 MiB, SLC, erase size: 128 KiB, page size:
> > > 2048, OOB size: 64
> > > [    0.523516] bcm6368_nand 10000200.nand: detected 128MiB total,
> > > 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-4
> > > [    0.535912] Bad block table found at page 65472, version 0x01
> > > [    0.544268] Bad block table found at page 65408, version 0x01
> > > [    0.954329] 9 fixed-partitions partitions found on MTD device brcmnand.0
> > > ...
> > >
> > > This is the log with block protection enabled:
> > > [    0.495095] bcm6368_nand 10000200.nand: there is not valid maps for
> > > state default
> > > [    0.504249] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xf1
> > > [    0.510772] nand: Macronix MX30LF1G18AC
> > > [    0.514874] nand: 128 MiB, SLC, erase size: 128 KiB, page size:
> > > 2048, OOB size: 64
> > > [    0.522780] bcm6368_nand 10000200.nand: detected 128MiB total,
> > > 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-4
> > > [    0.539687] Bad block table not found for chip 0
> > > [    0.550153] Bad block table not found for chip 0
> > > [    0.555069] Scanning device for bad blocks
> > > [    0.601213] CPU 1 Unable to handle kernel paging request at virtual
> > > address 10277f00, epc == 8039ce70, ra == 8016ad50
> > > *** Device hangs ***
> > >
> > > Enabling macronix_nand_block_protection_support() makes the device
> > > unable to detect the bad block table and hangs it when trying to scan
> > > for bad blocks.  
> >
> > Please trace nand_macronix.c and look:
> > - are the get_features and set_features really supported by the
> >   controller driver?  
> 
> This is what I could find by debugging:
> [    0.494993] bcm6368_nand 10000200.nand: there is not valid maps for
> state default
> [    0.505375] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xf1
> [    0.512077] nand: Macronix MX30LF1G18AC
> [    0.515994] nand: 128 MiB, SLC, erase size: 128 KiB, page size:
> 2048, OOB size: 64
> [    0.523928] bcm6368_nand 10000200.nand: detected 128MiB total,
> 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-4
> [    0.534415] bcm6368_nand 10000200.nand: ll_op cmd 0xa00ee
> [    0.539988] bcm6368_nand 10000200.nand: ll_op cmd 0x600a0
> [    0.545659] bcm6368_nand 10000200.nand: ll_op cmd 0x10000
> [    0.551214] bcm6368_nand 10000200.nand: NAND_CMD_GET_FEATURES = 0x00
> [    0.557843] bcm6368_nand 10000200.nand: ll_op cmd 0x10000
> [    0.563475] bcm6368_nand 10000200.nand: NAND_CMD_GET_FEATURES = 0x00
> [    0.569998] bcm6368_nand 10000200.nand: ll_op cmd 0x10000
> [    0.575653] bcm6368_nand 10000200.nand: NAND_CMD_GET_FEATURES = 0x00
> [    0.582246] bcm6368_nand 10000200.nand: ll_op cmd 0x80010000
> [    0.588067] bcm6368_nand 10000200.nand: NAND_CMD_GET_FEATURES = 0x00
> [    0.594657] nand: nand_get_features: addr=a0 subfeature_param=[00
> 00 00 00] -> 0
> [    0.602341] macronix_nand_block_protection_support:
> ONFI_FEATURE_ADDR_MXIC_PROTECTION=0
> [    0.610548] macronix_nand_block_protection_support: !=
> MXIC_BLOCK_PROTECTION_ALL_LOCK
> [    0.624760] Bad block table not found for chip 0
> [    0.635542] Bad block table not found for chip 0
> [    0.640270] Scanning device for bad blocks
> 
> I don't know how to tell if get_features / set_features is really supported...

Looks like your driver does not support exec_op but the core provides a
get/set_feature implementation.

> 
> > - what is the state of the locking configuration in the chip when you
> >   boot?  
> 
> Unlocked, I guess...
> How can I check that?

It's in your dump, the chip returns 0, meaning it's all unlocked,
apparently.

> > - is there anything that locks the device by calling mxic_nand_lock() ?

So nobody locks the device I guess? Did you add traces there?

> > - finding no bbt is one thing, hanging is another, where is it hanging
> >   exactly? (offset in nand/ and line in the code)  
> 
> I've got no idea...

You can use ftrace or just add printks a bit everywhere and try to get
closer and closer.

I looked at the patch, I don't see anything strange. Besides, I have a
close enough datasheet and I don't see what could confuse the device.

Are you really sure this patch is the problem? Is the WP pin wired on
your design?

Thanks,
Miquèl
Álvaro Fernández Rojas March 24, 2023, 2:15 p.m. UTC | #6
Hi Miquèl,

2023-03-24 14:45 GMT+01:00, Miquel Raynal <miquel.raynal@bootlin.com>:
> Hi Álvaro,
>
> noltari@gmail.com wrote on Fri, 24 Mar 2023 12:21:11 +0100:
>
>> El vie, 24 mar 2023 a las 11:49, Miquel Raynal
>> (<miquel.raynal@bootlin.com>) escribió:
>> >
>> > Hi Álvaro,
>> >
>> > noltari@gmail.com wrote on Fri, 24 Mar 2023 11:31:17 +0100:
>> >
>> > > Hi Miquèl,
>> > >
>> > > El vie, 24 mar 2023 a las 10:40, Miquel Raynal
>> > > (<miquel.raynal@bootlin.com>) escribió:
>> > > >
>> > > > Hi Álvaro,
>> > > >
>> > > > noltari@gmail.com wrote on Thu, 23 Mar 2023 13:45:09 +0100:
>> > > >
>> > > > > Add new "mxic,disable-block-protection" binding documentation.
>> > > > > This binding allows disabling block protection support for those
>> > > > > devices not
>> > > > > supporting it.
>> > > > >
>> > > > > Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
>> > > > > ---
>> > > > >  Documentation/devicetree/bindings/mtd/nand-macronix.txt | 3 +++
>> > > > >  1 file changed, 3 insertions(+)
>> > > > >
>> > > > > diff --git
>> > > > > a/Documentation/devicetree/bindings/mtd/nand-macronix.txt
>> > > > > b/Documentation/devicetree/bindings/mtd/nand-macronix.txt
>> > > > > index ffab28a2c4d1..03f65ca32cd3 100644
>> > > > > --- a/Documentation/devicetree/bindings/mtd/nand-macronix.txt
>> > > > > +++ b/Documentation/devicetree/bindings/mtd/nand-macronix.txt
>> > > > > @@ -16,6 +16,9 @@ in children nodes.
>> > > > >  Required NAND chip properties in children mode:
>> > > > >  - randomizer enable: should be "mxic,enable-randomizer-otp"
>> > > > >
>> > > > > +Optional NAND chip properties in children mode:
>> > > > > +- block protection disable: should be
>> > > > > "mxic,disable-block-protection"
>> > > > > +
>> > > >
>> > > > Besides the fact that nowadays we prefer to see binding conversions
>> > > > to
>> > > > yaml before adding anything, I don't think this will fly.
>> > > >
>> > > > I'm not sure exactly what "disable block protection" means, we
>> > > > already have similar properties like "lock" and "secure-regions",
>> > > > not
>> > > > sure they will fit but I think it's worth checking.
>> > >
>> > > As explained in 2/2, commit 03a539c7a118 introduced a regression on
>> > > Sercomm H500-s (BCM63268) OpenWrt devices with Macronix MX30LF1G18AC
>> > > which hangs the device.
>> > >
>> > > This is the log with block protection disabled:
>> > > [    0.495831] bcm6368_nand 10000200.nand: there is not valid maps
>> > > for
>> > > state default
>> > > [    0.504995] nand: device found, Manufacturer ID: 0xc2, Chip ID:
>> > > 0xf1
>> > > [    0.511526] nand: Macronix MX30LF1G18AC
>> > > [    0.515586] nand: 128 MiB, SLC, erase size: 128 KiB, page size:
>> > > 2048, OOB size: 64
>> > > [    0.523516] bcm6368_nand 10000200.nand: detected 128MiB total,
>> > > 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-4
>> > > [    0.535912] Bad block table found at page 65472, version 0x01
>> > > [    0.544268] Bad block table found at page 65408, version 0x01
>> > > [    0.954329] 9 fixed-partitions partitions found on MTD device
>> > > brcmnand.0
>> > > ...
>> > >
>> > > This is the log with block protection enabled:
>> > > [    0.495095] bcm6368_nand 10000200.nand: there is not valid maps
>> > > for
>> > > state default
>> > > [    0.504249] nand: device found, Manufacturer ID: 0xc2, Chip ID:
>> > > 0xf1
>> > > [    0.510772] nand: Macronix MX30LF1G18AC
>> > > [    0.514874] nand: 128 MiB, SLC, erase size: 128 KiB, page size:
>> > > 2048, OOB size: 64
>> > > [    0.522780] bcm6368_nand 10000200.nand: detected 128MiB total,
>> > > 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-4
>> > > [    0.539687] Bad block table not found for chip 0
>> > > [    0.550153] Bad block table not found for chip 0
>> > > [    0.555069] Scanning device for bad blocks
>> > > [    0.601213] CPU 1 Unable to handle kernel paging request at
>> > > virtual
>> > > address 10277f00, epc == 8039ce70, ra == 8016ad50
>> > > *** Device hangs ***
>> > >
>> > > Enabling macronix_nand_block_protection_support() makes the device
>> > > unable to detect the bad block table and hangs it when trying to scan
>> > > for bad blocks.
>> >
>> > Please trace nand_macronix.c and look:
>> > - are the get_features and set_features really supported by the
>> >   controller driver?
>>
>> This is what I could find by debugging:
>> [    0.494993] bcm6368_nand 10000200.nand: there is not valid maps for
>> state default
>> [    0.505375] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xf1
>> [    0.512077] nand: Macronix MX30LF1G18AC
>> [    0.515994] nand: 128 MiB, SLC, erase size: 128 KiB, page size:
>> 2048, OOB size: 64
>> [    0.523928] bcm6368_nand 10000200.nand: detected 128MiB total,
>> 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-4
>> [    0.534415] bcm6368_nand 10000200.nand: ll_op cmd 0xa00ee
>> [    0.539988] bcm6368_nand 10000200.nand: ll_op cmd 0x600a0
>> [    0.545659] bcm6368_nand 10000200.nand: ll_op cmd 0x10000
>> [    0.551214] bcm6368_nand 10000200.nand: NAND_CMD_GET_FEATURES = 0x00
>> [    0.557843] bcm6368_nand 10000200.nand: ll_op cmd 0x10000
>> [    0.563475] bcm6368_nand 10000200.nand: NAND_CMD_GET_FEATURES = 0x00
>> [    0.569998] bcm6368_nand 10000200.nand: ll_op cmd 0x10000
>> [    0.575653] bcm6368_nand 10000200.nand: NAND_CMD_GET_FEATURES = 0x00
>> [    0.582246] bcm6368_nand 10000200.nand: ll_op cmd 0x80010000
>> [    0.588067] bcm6368_nand 10000200.nand: NAND_CMD_GET_FEATURES = 0x00
>> [    0.594657] nand: nand_get_features: addr=a0 subfeature_param=[00
>> 00 00 00] -> 0
>> [    0.602341] macronix_nand_block_protection_support:
>> ONFI_FEATURE_ADDR_MXIC_PROTECTION=0
>> [    0.610548] macronix_nand_block_protection_support: !=
>> MXIC_BLOCK_PROTECTION_ALL_LOCK
>> [    0.624760] Bad block table not found for chip 0
>> [    0.635542] Bad block table not found for chip 0
>> [    0.640270] Scanning device for bad blocks
>>
>> I don't know how to tell if get_features / set_features is really
>> supported...
>
> Looks like your driver does not support exec_op but the core provides a
> get/set_feature implementation.

According to Florian, low level should be supported on brcmnand
controllers >= 4.0
Also:
https://github.com/nomis/bcm963xx_4.12L.06B_consumer/blob/e2f23ddbb20bf75689372b6e6a5a0dc613f6e313/shared/opensource/include/bcm963xx/63268_map_part.h#L1597

>
>>
>> > - what is the state of the locking configuration in the chip when you
>> >   boot?
>>
>> Unlocked, I guess...
>> How can I check that?
>
> It's in your dump, the chip returns 0, meaning it's all unlocked,
> apparently.

Well, I can read/write the device if block protection isn’t disabled,
so I guess we can confirm it’s unlocked…

>
>> > - is there anything that locks the device by calling mxic_nand_lock() ?
>
> So nobody locks the device I guess? Did you add traces there?

It doesn’t get to the point that it enabled the lock/unlock functions
since it fails when checking if feature is 0x38, so there’s no point
in adding those traces…

>
>> > - finding no bbt is one thing, hanging is another, where is it hanging
>> >   exactly? (offset in nand/ and line in the code)
>>
>> I've got no idea...
>
> You can use ftrace or just add printks a bit everywhere and try to get
> closer and closer.

I think that after trying to get the feature it just start reading
nonsense from the NAND and at some point it hangs due to that garbage…
Is it posible that the NAND starts behaving like this after getting
the feature due to some specific config of my device?

>
> I looked at the patch, I don't see anything strange. Besides, I have a
> close enough datasheet and I don't see what could confuse the device.
>
> Are you really sure this patch is the problem? Is the WP pin wired on
> your design?

There’s no WP pin in brcmnand controllers < 7.0

>
> Thanks,
> Miquèl
>

Thanks,
Álvaro
Miquel Raynal March 24, 2023, 2:36 p.m. UTC | #7
Hi Álvaro,

+ YouChing and Jaime from Macronix
TLDR for them: there is a misbehavior since Mason added block
protection support. Just checking if the blocks are protected seems to
misconfigure the chip entirely, see below. Any hints?

noltari@gmail.com wrote on Fri, 24 Mar 2023 15:15:47 +0100:

> Hi Miquèl,
> 
> 2023-03-24 14:45 GMT+01:00, Miquel Raynal <miquel.raynal@bootlin.com>:
> > Hi Álvaro,
> >
> > noltari@gmail.com wrote on Fri, 24 Mar 2023 12:21:11 +0100:
> >  
> >> El vie, 24 mar 2023 a las 11:49, Miquel Raynal
> >> (<miquel.raynal@bootlin.com>) escribió:  
> >> >
> >> > Hi Álvaro,
> >> >
> >> > noltari@gmail.com wrote on Fri, 24 Mar 2023 11:31:17 +0100:
> >> >  
> >> > > Hi Miquèl,
> >> > >
> >> > > El vie, 24 mar 2023 a las 10:40, Miquel Raynal
> >> > > (<miquel.raynal@bootlin.com>) escribió:  
> >> > > >
> >> > > > Hi Álvaro,
> >> > > >
> >> > > > noltari@gmail.com wrote on Thu, 23 Mar 2023 13:45:09 +0100:
> >> > > >  
> >> > > > > Add new "mxic,disable-block-protection" binding documentation.
> >> > > > > This binding allows disabling block protection support for those
> >> > > > > devices not
> >> > > > > supporting it.
> >> > > > >
> >> > > > > Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
> >> > > > > ---
> >> > > > >  Documentation/devicetree/bindings/mtd/nand-macronix.txt | 3 +++
> >> > > > >  1 file changed, 3 insertions(+)
> >> > > > >
> >> > > > > diff --git
> >> > > > > a/Documentation/devicetree/bindings/mtd/nand-macronix.txt
> >> > > > > b/Documentation/devicetree/bindings/mtd/nand-macronix.txt
> >> > > > > index ffab28a2c4d1..03f65ca32cd3 100644
> >> > > > > --- a/Documentation/devicetree/bindings/mtd/nand-macronix.txt
> >> > > > > +++ b/Documentation/devicetree/bindings/mtd/nand-macronix.txt
> >> > > > > @@ -16,6 +16,9 @@ in children nodes.
> >> > > > >  Required NAND chip properties in children mode:
> >> > > > >  - randomizer enable: should be "mxic,enable-randomizer-otp"
> >> > > > >
> >> > > > > +Optional NAND chip properties in children mode:
> >> > > > > +- block protection disable: should be
> >> > > > > "mxic,disable-block-protection"
> >> > > > > +  
> >> > > >
> >> > > > Besides the fact that nowadays we prefer to see binding conversions
> >> > > > to
> >> > > > yaml before adding anything, I don't think this will fly.
> >> > > >
> >> > > > I'm not sure exactly what "disable block protection" means, we
> >> > > > already have similar properties like "lock" and "secure-regions",
> >> > > > not
> >> > > > sure they will fit but I think it's worth checking.  
> >> > >
> >> > > As explained in 2/2, commit 03a539c7a118 introduced a regression on
> >> > > Sercomm H500-s (BCM63268) OpenWrt devices with Macronix MX30LF1G18AC
> >> > > which hangs the device.
> >> > >
> >> > > This is the log with block protection disabled:
> >> > > [    0.495831] bcm6368_nand 10000200.nand: there is not valid maps
> >> > > for
> >> > > state default
> >> > > [    0.504995] nand: device found, Manufacturer ID: 0xc2, Chip ID:
> >> > > 0xf1
> >> > > [    0.511526] nand: Macronix MX30LF1G18AC
> >> > > [    0.515586] nand: 128 MiB, SLC, erase size: 128 KiB, page size:
> >> > > 2048, OOB size: 64
> >> > > [    0.523516] bcm6368_nand 10000200.nand: detected 128MiB total,
> >> > > 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-4
> >> > > [    0.535912] Bad block table found at page 65472, version 0x01
> >> > > [    0.544268] Bad block table found at page 65408, version 0x01
> >> > > [    0.954329] 9 fixed-partitions partitions found on MTD device
> >> > > brcmnand.0
> >> > > ...
> >> > >
> >> > > This is the log with block protection enabled:
> >> > > [    0.495095] bcm6368_nand 10000200.nand: there is not valid maps
> >> > > for
> >> > > state default
> >> > > [    0.504249] nand: device found, Manufacturer ID: 0xc2, Chip ID:
> >> > > 0xf1
> >> > > [    0.510772] nand: Macronix MX30LF1G18AC
> >> > > [    0.514874] nand: 128 MiB, SLC, erase size: 128 KiB, page size:
> >> > > 2048, OOB size: 64
> >> > > [    0.522780] bcm6368_nand 10000200.nand: detected 128MiB total,
> >> > > 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-4
> >> > > [    0.539687] Bad block table not found for chip 0
> >> > > [    0.550153] Bad block table not found for chip 0
> >> > > [    0.555069] Scanning device for bad blocks
> >> > > [    0.601213] CPU 1 Unable to handle kernel paging request at
> >> > > virtual
> >> > > address 10277f00, epc == 8039ce70, ra == 8016ad50
> >> > > *** Device hangs ***
> >> > >
> >> > > Enabling macronix_nand_block_protection_support() makes the device
> >> > > unable to detect the bad block table and hangs it when trying to scan
> >> > > for bad blocks.  
> >> >
> >> > Please trace nand_macronix.c and look:
> >> > - are the get_features and set_features really supported by the
> >> >   controller driver?  
> >>
> >> This is what I could find by debugging:
> >> [    0.494993] bcm6368_nand 10000200.nand: there is not valid maps for
> >> state default
> >> [    0.505375] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xf1
> >> [    0.512077] nand: Macronix MX30LF1G18AC
> >> [    0.515994] nand: 128 MiB, SLC, erase size: 128 KiB, page size:
> >> 2048, OOB size: 64
> >> [    0.523928] bcm6368_nand 10000200.nand: detected 128MiB total,
> >> 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-4
> >> [    0.534415] bcm6368_nand 10000200.nand: ll_op cmd 0xa00ee
> >> [    0.539988] bcm6368_nand 10000200.nand: ll_op cmd 0x600a0
> >> [    0.545659] bcm6368_nand 10000200.nand: ll_op cmd 0x10000
> >> [    0.551214] bcm6368_nand 10000200.nand: NAND_CMD_GET_FEATURES = 0x00
> >> [    0.557843] bcm6368_nand 10000200.nand: ll_op cmd 0x10000
> >> [    0.563475] bcm6368_nand 10000200.nand: NAND_CMD_GET_FEATURES = 0x00
> >> [    0.569998] bcm6368_nand 10000200.nand: ll_op cmd 0x10000
> >> [    0.575653] bcm6368_nand 10000200.nand: NAND_CMD_GET_FEATURES = 0x00
> >> [    0.582246] bcm6368_nand 10000200.nand: ll_op cmd 0x80010000
> >> [    0.588067] bcm6368_nand 10000200.nand: NAND_CMD_GET_FEATURES = 0x00
> >> [    0.594657] nand: nand_get_features: addr=a0 subfeature_param=[00
> >> 00 00 00] -> 0
> >> [    0.602341] macronix_nand_block_protection_support:
> >> ONFI_FEATURE_ADDR_MXIC_PROTECTION=0
> >> [    0.610548] macronix_nand_block_protection_support: !=
> >> MXIC_BLOCK_PROTECTION_ALL_LOCK
> >> [    0.624760] Bad block table not found for chip 0
> >> [    0.635542] Bad block table not found for chip 0
> >> [    0.640270] Scanning device for bad blocks
> >>
> >> I don't know how to tell if get_features / set_features is really
> >> supported...  
> >
> > Looks like your driver does not support exec_op but the core provides a
> > get/set_feature implementation.  
> 
> According to Florian, low level should be supported on brcmnand
> controllers >= 4.0
> Also:
> https://github.com/nomis/bcm963xx_4.12L.06B_consumer/blob/e2f23ddbb20bf75689372b6e6a5a0dc613f6e313/shared/opensource/include/bcm963xx/63268_map_part.h#L1597

Just to be sure, you're using a mainline controller driver, not this
one?

> >  
> >>  
> >> > - what is the state of the locking configuration in the chip when you
> >> >   boot?  
> >>
> >> Unlocked, I guess...
> >> How can I check that?  
> >
> > It's in your dump, the chip returns 0, meaning it's all unlocked,
> > apparently.  
> 
> Well, I can read/write the device if block protection isn’t disabled,
> so I guess we can confirm it’s unlocked…
> 
> >  
> >> > - is there anything that locks the device by calling mxic_nand_lock() ?  
> >
> > So nobody locks the device I guess? Did you add traces there?  
> 
> It doesn’t get to the point that it enabled the lock/unlock functions
> since it fails when checking if feature is 0x38, so there’s no point
> in adding those traces…

Right, it returns before setting these I guess.

> 
> >  
> >> > - finding no bbt is one thing, hanging is another, where is it hanging
> >> >   exactly? (offset in nand/ and line in the code)  
> >>
> >> I've got no idea...  
> >
> > You can use ftrace or just add printks a bit everywhere and try to get
> > closer and closer.  
> 
> I think that after trying to get the feature it just start reading
> nonsense from the NAND and at some point it hangs due to that garbage…

It should refuse to mount the device somehow, but in no case the kernel
should hang.

> Is it posible that the NAND starts behaving like this after getting
> the feature due to some specific config of my device?
> 
> >
> > I looked at the patch, I don't see anything strange. Besides, I have a
> > close enough datasheet and I don't see what could confuse the device.
> >
> > Are you really sure this patch is the problem? Is the WP pin wired on
> > your design?  
> 
> There’s no WP pin in brcmnand controllers < 7.0

What about the chip?

Thanks,
Miquèl
Álvaro Fernández Rojas March 24, 2023, 5:04 p.m. UTC | #8
Hi Miquèl,

2023-03-24 15:36 GMT+01:00, Miquel Raynal <miquel.raynal@bootlin.com>:
> Hi Álvaro,
>
> + YouChing and Jaime from Macronix
> TLDR for them: there is a misbehavior since Mason added block
> protection support. Just checking if the blocks are protected seems to
> misconfigure the chip entirely, see below. Any hints?

Could it be that the NAND is stuck expecting a read 0x00 command which
isn’t sent after getting the features?

>
> noltari@gmail.com wrote on Fri, 24 Mar 2023 15:15:47 +0100:
>
>> Hi Miquèl,
>>
>> 2023-03-24 14:45 GMT+01:00, Miquel Raynal <miquel.raynal@bootlin.com>:
>> > Hi Álvaro,
>> >
>> > noltari@gmail.com wrote on Fri, 24 Mar 2023 12:21:11 +0100:
>> >
>> >> El vie, 24 mar 2023 a las 11:49, Miquel Raynal
>> >> (<miquel.raynal@bootlin.com>) escribió:
>> >> >
>> >> > Hi Álvaro,
>> >> >
>> >> > noltari@gmail.com wrote on Fri, 24 Mar 2023 11:31:17 +0100:
>> >> >
>> >> > > Hi Miquèl,
>> >> > >
>> >> > > El vie, 24 mar 2023 a las 10:40, Miquel Raynal
>> >> > > (<miquel.raynal@bootlin.com>) escribió:
>> >> > > >
>> >> > > > Hi Álvaro,
>> >> > > >
>> >> > > > noltari@gmail.com wrote on Thu, 23 Mar 2023 13:45:09 +0100:
>> >> > > >
>> >> > > > > Add new "mxic,disable-block-protection" binding documentation.
>> >> > > > > This binding allows disabling block protection support for
>> >> > > > > those
>> >> > > > > devices not
>> >> > > > > supporting it.
>> >> > > > >
>> >> > > > > Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
>> >> > > > > ---
>> >> > > > >  Documentation/devicetree/bindings/mtd/nand-macronix.txt | 3
>> >> > > > > +++
>> >> > > > >  1 file changed, 3 insertions(+)
>> >> > > > >
>> >> > > > > diff --git
>> >> > > > > a/Documentation/devicetree/bindings/mtd/nand-macronix.txt
>> >> > > > > b/Documentation/devicetree/bindings/mtd/nand-macronix.txt
>> >> > > > > index ffab28a2c4d1..03f65ca32cd3 100644
>> >> > > > > --- a/Documentation/devicetree/bindings/mtd/nand-macronix.txt
>> >> > > > > +++ b/Documentation/devicetree/bindings/mtd/nand-macronix.txt
>> >> > > > > @@ -16,6 +16,9 @@ in children nodes.
>> >> > > > >  Required NAND chip properties in children mode:
>> >> > > > >  - randomizer enable: should be "mxic,enable-randomizer-otp"
>> >> > > > >
>> >> > > > > +Optional NAND chip properties in children mode:
>> >> > > > > +- block protection disable: should be
>> >> > > > > "mxic,disable-block-protection"
>> >> > > > > +
>> >> > > >
>> >> > > > Besides the fact that nowadays we prefer to see binding
>> >> > > > conversions
>> >> > > > to
>> >> > > > yaml before adding anything, I don't think this will fly.
>> >> > > >
>> >> > > > I'm not sure exactly what "disable block protection" means, we
>> >> > > > already have similar properties like "lock" and
>> >> > > > "secure-regions",
>> >> > > > not
>> >> > > > sure they will fit but I think it's worth checking.
>> >> > >
>> >> > > As explained in 2/2, commit 03a539c7a118 introduced a regression
>> >> > > on
>> >> > > Sercomm H500-s (BCM63268) OpenWrt devices with Macronix
>> >> > > MX30LF1G18AC
>> >> > > which hangs the device.
>> >> > >
>> >> > > This is the log with block protection disabled:
>> >> > > [    0.495831] bcm6368_nand 10000200.nand: there is not valid maps
>> >> > > for
>> >> > > state default
>> >> > > [    0.504995] nand: device found, Manufacturer ID: 0xc2, Chip ID:
>> >> > > 0xf1
>> >> > > [    0.511526] nand: Macronix MX30LF1G18AC
>> >> > > [    0.515586] nand: 128 MiB, SLC, erase size: 128 KiB, page size:
>> >> > > 2048, OOB size: 64
>> >> > > [    0.523516] bcm6368_nand 10000200.nand: detected 128MiB total,
>> >> > > 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-4
>> >> > > [    0.535912] Bad block table found at page 65472, version 0x01
>> >> > > [    0.544268] Bad block table found at page 65408, version 0x01
>> >> > > [    0.954329] 9 fixed-partitions partitions found on MTD device
>> >> > > brcmnand.0
>> >> > > ...
>> >> > >
>> >> > > This is the log with block protection enabled:
>> >> > > [    0.495095] bcm6368_nand 10000200.nand: there is not valid maps
>> >> > > for
>> >> > > state default
>> >> > > [    0.504249] nand: device found, Manufacturer ID: 0xc2, Chip ID:
>> >> > > 0xf1
>> >> > > [    0.510772] nand: Macronix MX30LF1G18AC
>> >> > > [    0.514874] nand: 128 MiB, SLC, erase size: 128 KiB, page size:
>> >> > > 2048, OOB size: 64
>> >> > > [    0.522780] bcm6368_nand 10000200.nand: detected 128MiB total,
>> >> > > 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-4
>> >> > > [    0.539687] Bad block table not found for chip 0
>> >> > > [    0.550153] Bad block table not found for chip 0
>> >> > > [    0.555069] Scanning device for bad blocks
>> >> > > [    0.601213] CPU 1 Unable to handle kernel paging request at
>> >> > > virtual
>> >> > > address 10277f00, epc == 8039ce70, ra == 8016ad50
>> >> > > *** Device hangs ***
>> >> > >
>> >> > > Enabling macronix_nand_block_protection_support() makes the device
>> >> > > unable to detect the bad block table and hangs it when trying to
>> >> > > scan
>> >> > > for bad blocks.
>> >> >
>> >> > Please trace nand_macronix.c and look:
>> >> > - are the get_features and set_features really supported by the
>> >> >   controller driver?
>> >>
>> >> This is what I could find by debugging:
>> >> [    0.494993] bcm6368_nand 10000200.nand: there is not valid maps for
>> >> state default
>> >> [    0.505375] nand: device found, Manufacturer ID: 0xc2, Chip ID:
>> >> 0xf1
>> >> [    0.512077] nand: Macronix MX30LF1G18AC
>> >> [    0.515994] nand: 128 MiB, SLC, erase size: 128 KiB, page size:
>> >> 2048, OOB size: 64
>> >> [    0.523928] bcm6368_nand 10000200.nand: detected 128MiB total,
>> >> 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-4
>> >> [    0.534415] bcm6368_nand 10000200.nand: ll_op cmd 0xa00ee
>> >> [    0.539988] bcm6368_nand 10000200.nand: ll_op cmd 0x600a0
>> >> [    0.545659] bcm6368_nand 10000200.nand: ll_op cmd 0x10000
>> >> [    0.551214] bcm6368_nand 10000200.nand: NAND_CMD_GET_FEATURES =
>> >> 0x00
>> >> [    0.557843] bcm6368_nand 10000200.nand: ll_op cmd 0x10000
>> >> [    0.563475] bcm6368_nand 10000200.nand: NAND_CMD_GET_FEATURES =
>> >> 0x00
>> >> [    0.569998] bcm6368_nand 10000200.nand: ll_op cmd 0x10000
>> >> [    0.575653] bcm6368_nand 10000200.nand: NAND_CMD_GET_FEATURES =
>> >> 0x00
>> >> [    0.582246] bcm6368_nand 10000200.nand: ll_op cmd 0x80010000
>> >> [    0.588067] bcm6368_nand 10000200.nand: NAND_CMD_GET_FEATURES =
>> >> 0x00
>> >> [    0.594657] nand: nand_get_features: addr=a0 subfeature_param=[00
>> >> 00 00 00] -> 0
>> >> [    0.602341] macronix_nand_block_protection_support:
>> >> ONFI_FEATURE_ADDR_MXIC_PROTECTION=0
>> >> [    0.610548] macronix_nand_block_protection_support: !=
>> >> MXIC_BLOCK_PROTECTION_ALL_LOCK
>> >> [    0.624760] Bad block table not found for chip 0
>> >> [    0.635542] Bad block table not found for chip 0
>> >> [    0.640270] Scanning device for bad blocks
>> >>
>> >> I don't know how to tell if get_features / set_features is really
>> >> supported...
>> >
>> > Looks like your driver does not support exec_op but the core provides a
>> > get/set_feature implementation.
>>
>> According to Florian, low level should be supported on brcmnand
>> controllers >= 4.0
>> Also:
>> https://github.com/nomis/bcm963xx_4.12L.06B_consumer/blob/e2f23ddbb20bf75689372b6e6a5a0dc613f6e313/shared/opensource/include/bcm963xx/63268_map_part.h#L1597
>
> Just to be sure, you're using a mainline controller driver, not this
> one?

Yes, this was just to prove that the HW I’m using has get/set features support.
I’m using OpenWrt, so it’s linux v5.15 driver.

>
>> >
>> >>
>> >> > - what is the state of the locking configuration in the chip when
>> >> > you
>> >> >   boot?
>> >>
>> >> Unlocked, I guess...
>> >> How can I check that?
>> >
>> > It's in your dump, the chip returns 0, meaning it's all unlocked,
>> > apparently.
>>
>> Well, I can read/write the device if block protection isn’t disabled,
>> so I guess we can confirm it’s unlocked…
>>
>> >
>> >> > - is there anything that locks the device by calling mxic_nand_lock()
>> >> > ?
>> >
>> > So nobody locks the device I guess? Did you add traces there?
>>
>> It doesn’t get to the point that it enabled the lock/unlock functions
>> since it fails when checking if feature is 0x38, so there’s no point
>> in adding those traces…
>
> Right, it returns before setting these I guess.
>
>>
>> >
>> >> > - finding no bbt is one thing, hanging is another, where is it
>> >> > hanging
>> >> >   exactly? (offset in nand/ and line in the code)
>> >>
>> >> I've got no idea...
>> >
>> > You can use ftrace or just add printks a bit everywhere and try to get
>> > closer and closer.
>>
>> I think that after trying to get the feature it just start reading
>> nonsense from the NAND and at some point it hangs due to that garbage…
>
> It should refuse to mount the device somehow, but in no case the kernel
> should hang.

Yes, I think that this is a side effect (maybe a different bug somewhere else).

>
>> Is it posible that the NAND starts behaving like this after getting
>> the feature due to some specific config of my device?
>>
>> >
>> > I looked at the patch, I don't see anything strange. Besides, I have a
>> > close enough datasheet and I don't see what could confuse the device.
>> >
>> > Are you really sure this patch is the problem? Is the WP pin wired on
>> > your design?
>>
>> There’s no WP pin in brcmnand controllers < 7.0
>
> What about the chip?

Maybe it has a GPIO controlling that, but I don’t have that info…

>
> Thanks,
> Miquèl
>
Miquel Raynal March 27, 2023, 8:21 a.m. UTC | #9
Hi Álvaro,

noltari@gmail.com wrote on Fri, 24 Mar 2023 18:04:38 +0100:

> Hi Miquèl,
> 
> 2023-03-24 15:36 GMT+01:00, Miquel Raynal <miquel.raynal@bootlin.com>:
> > Hi Álvaro,
> >
> > + YouChing and Jaime from Macronix
> > TLDR for them: there is a misbehavior since Mason added block
> > protection support. Just checking if the blocks are protected seems to
> > misconfigure the chip entirely, see below. Any hints?  
> 
> Could it be that the NAND is stuck expecting a read 0x00 command which
> isn’t sent after getting the features?

I have no idea, please try that, you can manually generate a READ0 by
hacking an existing read helper.

> > noltari@gmail.com wrote on Fri, 24 Mar 2023 15:15:47 +0100:
> >  
> >> Hi Miquèl,
> >>
> >> 2023-03-24 14:45 GMT+01:00, Miquel Raynal <miquel.raynal@bootlin.com>:  
> >> > Hi Álvaro,
> >> >
> >> > noltari@gmail.com wrote on Fri, 24 Mar 2023 12:21:11 +0100:
> >> >  
> >> >> El vie, 24 mar 2023 a las 11:49, Miquel Raynal
> >> >> (<miquel.raynal@bootlin.com>) escribió:  
> >> >> >
> >> >> > Hi Álvaro,
> >> >> >
> >> >> > noltari@gmail.com wrote on Fri, 24 Mar 2023 11:31:17 +0100:
> >> >> >  
> >> >> > > Hi Miquèl,
> >> >> > >
> >> >> > > El vie, 24 mar 2023 a las 10:40, Miquel Raynal
> >> >> > > (<miquel.raynal@bootlin.com>) escribió:  
> >> >> > > >
> >> >> > > > Hi Álvaro,
> >> >> > > >
> >> >> > > > noltari@gmail.com wrote on Thu, 23 Mar 2023 13:45:09 +0100:
> >> >> > > >  
> >> >> > > > > Add new "mxic,disable-block-protection" binding documentation.
> >> >> > > > > This binding allows disabling block protection support for
> >> >> > > > > those
> >> >> > > > > devices not
> >> >> > > > > supporting it.
> >> >> > > > >
> >> >> > > > > Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
> >> >> > > > > ---
> >> >> > > > >  Documentation/devicetree/bindings/mtd/nand-macronix.txt | 3
> >> >> > > > > +++
> >> >> > > > >  1 file changed, 3 insertions(+)
> >> >> > > > >
> >> >> > > > > diff --git
> >> >> > > > > a/Documentation/devicetree/bindings/mtd/nand-macronix.txt
> >> >> > > > > b/Documentation/devicetree/bindings/mtd/nand-macronix.txt
> >> >> > > > > index ffab28a2c4d1..03f65ca32cd3 100644
> >> >> > > > > --- a/Documentation/devicetree/bindings/mtd/nand-macronix.txt
> >> >> > > > > +++ b/Documentation/devicetree/bindings/mtd/nand-macronix.txt
> >> >> > > > > @@ -16,6 +16,9 @@ in children nodes.
> >> >> > > > >  Required NAND chip properties in children mode:
> >> >> > > > >  - randomizer enable: should be "mxic,enable-randomizer-otp"
> >> >> > > > >
> >> >> > > > > +Optional NAND chip properties in children mode:
> >> >> > > > > +- block protection disable: should be
> >> >> > > > > "mxic,disable-block-protection"
> >> >> > > > > +  
> >> >> > > >
> >> >> > > > Besides the fact that nowadays we prefer to see binding
> >> >> > > > conversions
> >> >> > > > to
> >> >> > > > yaml before adding anything, I don't think this will fly.
> >> >> > > >
> >> >> > > > I'm not sure exactly what "disable block protection" means, we
> >> >> > > > already have similar properties like "lock" and
> >> >> > > > "secure-regions",
> >> >> > > > not
> >> >> > > > sure they will fit but I think it's worth checking.  
> >> >> > >
> >> >> > > As explained in 2/2, commit 03a539c7a118 introduced a regression
> >> >> > > on
> >> >> > > Sercomm H500-s (BCM63268) OpenWrt devices with Macronix
> >> >> > > MX30LF1G18AC
> >> >> > > which hangs the device.
> >> >> > >
> >> >> > > This is the log with block protection disabled:
> >> >> > > [    0.495831] bcm6368_nand 10000200.nand: there is not valid maps
> >> >> > > for
> >> >> > > state default
> >> >> > > [    0.504995] nand: device found, Manufacturer ID: 0xc2, Chip ID:
> >> >> > > 0xf1
> >> >> > > [    0.511526] nand: Macronix MX30LF1G18AC
> >> >> > > [    0.515586] nand: 128 MiB, SLC, erase size: 128 KiB, page size:
> >> >> > > 2048, OOB size: 64
> >> >> > > [    0.523516] bcm6368_nand 10000200.nand: detected 128MiB total,
> >> >> > > 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-4
> >> >> > > [    0.535912] Bad block table found at page 65472, version 0x01
> >> >> > > [    0.544268] Bad block table found at page 65408, version 0x01
> >> >> > > [    0.954329] 9 fixed-partitions partitions found on MTD device
> >> >> > > brcmnand.0
> >> >> > > ...
> >> >> > >
> >> >> > > This is the log with block protection enabled:
> >> >> > > [    0.495095] bcm6368_nand 10000200.nand: there is not valid maps
> >> >> > > for
> >> >> > > state default
> >> >> > > [    0.504249] nand: device found, Manufacturer ID: 0xc2, Chip ID:
> >> >> > > 0xf1
> >> >> > > [    0.510772] nand: Macronix MX30LF1G18AC
> >> >> > > [    0.514874] nand: 128 MiB, SLC, erase size: 128 KiB, page size:
> >> >> > > 2048, OOB size: 64
> >> >> > > [    0.522780] bcm6368_nand 10000200.nand: detected 128MiB total,
> >> >> > > 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-4
> >> >> > > [    0.539687] Bad block table not found for chip 0
> >> >> > > [    0.550153] Bad block table not found for chip 0
> >> >> > > [    0.555069] Scanning device for bad blocks
> >> >> > > [    0.601213] CPU 1 Unable to handle kernel paging request at
> >> >> > > virtual
> >> >> > > address 10277f00, epc == 8039ce70, ra == 8016ad50
> >> >> > > *** Device hangs ***
> >> >> > >
> >> >> > > Enabling macronix_nand_block_protection_support() makes the device
> >> >> > > unable to detect the bad block table and hangs it when trying to
> >> >> > > scan
> >> >> > > for bad blocks.  
> >> >> >
> >> >> > Please trace nand_macronix.c and look:
> >> >> > - are the get_features and set_features really supported by the
> >> >> >   controller driver?  
> >> >>
> >> >> This is what I could find by debugging:
> >> >> [    0.494993] bcm6368_nand 10000200.nand: there is not valid maps for
> >> >> state default
> >> >> [    0.505375] nand: device found, Manufacturer ID: 0xc2, Chip ID:
> >> >> 0xf1
> >> >> [    0.512077] nand: Macronix MX30LF1G18AC
> >> >> [    0.515994] nand: 128 MiB, SLC, erase size: 128 KiB, page size:
> >> >> 2048, OOB size: 64
> >> >> [    0.523928] bcm6368_nand 10000200.nand: detected 128MiB total,
> >> >> 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-4
> >> >> [    0.534415] bcm6368_nand 10000200.nand: ll_op cmd 0xa00ee
> >> >> [    0.539988] bcm6368_nand 10000200.nand: ll_op cmd 0x600a0
> >> >> [    0.545659] bcm6368_nand 10000200.nand: ll_op cmd 0x10000
> >> >> [    0.551214] bcm6368_nand 10000200.nand: NAND_CMD_GET_FEATURES =
> >> >> 0x00
> >> >> [    0.557843] bcm6368_nand 10000200.nand: ll_op cmd 0x10000
> >> >> [    0.563475] bcm6368_nand 10000200.nand: NAND_CMD_GET_FEATURES =
> >> >> 0x00
> >> >> [    0.569998] bcm6368_nand 10000200.nand: ll_op cmd 0x10000
> >> >> [    0.575653] bcm6368_nand 10000200.nand: NAND_CMD_GET_FEATURES =
> >> >> 0x00
> >> >> [    0.582246] bcm6368_nand 10000200.nand: ll_op cmd 0x80010000
> >> >> [    0.588067] bcm6368_nand 10000200.nand: NAND_CMD_GET_FEATURES =
> >> >> 0x00
> >> >> [    0.594657] nand: nand_get_features: addr=a0 subfeature_param=[00
> >> >> 00 00 00] -> 0
> >> >> [    0.602341] macronix_nand_block_protection_support:
> >> >> ONFI_FEATURE_ADDR_MXIC_PROTECTION=0
> >> >> [    0.610548] macronix_nand_block_protection_support: !=
> >> >> MXIC_BLOCK_PROTECTION_ALL_LOCK
> >> >> [    0.624760] Bad block table not found for chip 0
> >> >> [    0.635542] Bad block table not found for chip 0
> >> >> [    0.640270] Scanning device for bad blocks
> >> >>
> >> >> I don't know how to tell if get_features / set_features is really
> >> >> supported...  
> >> >
> >> > Looks like your driver does not support exec_op but the core provides a
> >> > get/set_feature implementation.  
> >>
> >> According to Florian, low level should be supported on brcmnand
> >> controllers >= 4.0
> >> Also:
> >> https://github.com/nomis/bcm963xx_4.12L.06B_consumer/blob/e2f23ddbb20bf75689372b6e6a5a0dc613f6e313/shared/opensource/include/bcm963xx/63268_map_part.h#L1597  
> >
> > Just to be sure, you're using a mainline controller driver, not this
> > one?  
> 
> Yes, this was just to prove that the HW I’m using has get/set features support.
> I’m using OpenWrt, so it’s linux v5.15 driver.

Ok, thanks for the confirmation.

> 
> >  
> >> >  
> >> >>  
> >> >> > - what is the state of the locking configuration in the chip when
> >> >> > you
> >> >> >   boot?  
> >> >>
> >> >> Unlocked, I guess...
> >> >> How can I check that?  
> >> >
> >> > It's in your dump, the chip returns 0, meaning it's all unlocked,
> >> > apparently.  
> >>
> >> Well, I can read/write the device if block protection isn’t disabled,
> >> so I guess we can confirm it’s unlocked…
> >>  
> >> >  
> >> >> > - is there anything that locks the device by calling mxic_nand_lock()
> >> >> > ?  
> >> >
> >> > So nobody locks the device I guess? Did you add traces there?  
> >>
> >> It doesn’t get to the point that it enabled the lock/unlock functions
> >> since it fails when checking if feature is 0x38, so there’s no point
> >> in adding those traces…  
> >
> > Right, it returns before setting these I guess.
> >  
> >>  
> >> >  
> >> >> > - finding no bbt is one thing, hanging is another, where is it
> >> >> > hanging
> >> >> >   exactly? (offset in nand/ and line in the code)  
> >> >>
> >> >> I've got no idea...  
> >> >
> >> > You can use ftrace or just add printks a bit everywhere and try to get
> >> > closer and closer.  
> >>
> >> I think that after trying to get the feature it just start reading
> >> nonsense from the NAND and at some point it hangs due to that garbage…  
> >
> > It should refuse to mount the device somehow, but in no case the kernel
> > should hang.  
> 
> Yes, I think that this is a side effect (maybe a different bug somewhere else).

Could be worth checking.

> 
> >  
> >> Is it posible that the NAND starts behaving like this after getting
> >> the feature due to some specific config of my device?
> >>  
> >> >
> >> > I looked at the patch, I don't see anything strange. Besides, I have a
> >> > close enough datasheet and I don't see what could confuse the device.
> >> >
> >> > Are you really sure this patch is the problem? Is the WP pin wired on
> >> > your design?  
> >>
> >> There’s no WP pin in brcmnand controllers < 7.0  
> >
> > What about the chip?  
> 
> Maybe it has a GPIO controlling that, but I don’t have that info…

I mean, on the board, is the chip connected to some kind of
pull-up/down resistor? Because it may change its behavior.

Regarding your issue, I see there is a problem, but I don't get why.
The current proposal is not satisfying, I cannot pick this up. We
need feedback from Macronix :-)

Thanks,
Miquèl
Álvaro Fernández Rojas April 22, 2023, 9:28 a.m. UTC | #10
Hello YouChing and Jaime,

I still didn't get any feedback from you (or Macronix) on this issue.
Did you have time to look into it?

Thanks,
Álvaro.

El vie, 24 mar 2023 a las 18:04, Álvaro Fernández Rojas
(<noltari@gmail.com>) escribió:
>
> Hi Miquèl,
>
> 2023-03-24 15:36 GMT+01:00, Miquel Raynal <miquel.raynal@bootlin.com>:
> > Hi Álvaro,
> >
> > + YouChing and Jaime from Macronix
> > TLDR for them: there is a misbehavior since Mason added block
> > protection support. Just checking if the blocks are protected seems to
> > misconfigure the chip entirely, see below. Any hints?
>
> Could it be that the NAND is stuck expecting a read 0x00 command which
> isn’t sent after getting the features?
>
> >
> > noltari@gmail.com wrote on Fri, 24 Mar 2023 15:15:47 +0100:
> >
> >> Hi Miquèl,
> >>
> >> 2023-03-24 14:45 GMT+01:00, Miquel Raynal <miquel.raynal@bootlin.com>:
> >> > Hi Álvaro,
> >> >
> >> > noltari@gmail.com wrote on Fri, 24 Mar 2023 12:21:11 +0100:
> >> >
> >> >> El vie, 24 mar 2023 a las 11:49, Miquel Raynal
> >> >> (<miquel.raynal@bootlin.com>) escribió:
> >> >> >
> >> >> > Hi Álvaro,
> >> >> >
> >> >> > noltari@gmail.com wrote on Fri, 24 Mar 2023 11:31:17 +0100:
> >> >> >
> >> >> > > Hi Miquèl,
> >> >> > >
> >> >> > > El vie, 24 mar 2023 a las 10:40, Miquel Raynal
> >> >> > > (<miquel.raynal@bootlin.com>) escribió:
> >> >> > > >
> >> >> > > > Hi Álvaro,
> >> >> > > >
> >> >> > > > noltari@gmail.com wrote on Thu, 23 Mar 2023 13:45:09 +0100:
> >> >> > > >
> >> >> > > > > Add new "mxic,disable-block-protection" binding documentation.
> >> >> > > > > This binding allows disabling block protection support for
> >> >> > > > > those
> >> >> > > > > devices not
> >> >> > > > > supporting it.
> >> >> > > > >
> >> >> > > > > Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
> >> >> > > > > ---
> >> >> > > > >  Documentation/devicetree/bindings/mtd/nand-macronix.txt | 3
> >> >> > > > > +++
> >> >> > > > >  1 file changed, 3 insertions(+)
> >> >> > > > >
> >> >> > > > > diff --git
> >> >> > > > > a/Documentation/devicetree/bindings/mtd/nand-macronix.txt
> >> >> > > > > b/Documentation/devicetree/bindings/mtd/nand-macronix.txt
> >> >> > > > > index ffab28a2c4d1..03f65ca32cd3 100644
> >> >> > > > > --- a/Documentation/devicetree/bindings/mtd/nand-macronix.txt
> >> >> > > > > +++ b/Documentation/devicetree/bindings/mtd/nand-macronix.txt
> >> >> > > > > @@ -16,6 +16,9 @@ in children nodes.
> >> >> > > > >  Required NAND chip properties in children mode:
> >> >> > > > >  - randomizer enable: should be "mxic,enable-randomizer-otp"
> >> >> > > > >
> >> >> > > > > +Optional NAND chip properties in children mode:
> >> >> > > > > +- block protection disable: should be
> >> >> > > > > "mxic,disable-block-protection"
> >> >> > > > > +
> >> >> > > >
> >> >> > > > Besides the fact that nowadays we prefer to see binding
> >> >> > > > conversions
> >> >> > > > to
> >> >> > > > yaml before adding anything, I don't think this will fly.
> >> >> > > >
> >> >> > > > I'm not sure exactly what "disable block protection" means, we
> >> >> > > > already have similar properties like "lock" and
> >> >> > > > "secure-regions",
> >> >> > > > not
> >> >> > > > sure they will fit but I think it's worth checking.
> >> >> > >
> >> >> > > As explained in 2/2, commit 03a539c7a118 introduced a regression
> >> >> > > on
> >> >> > > Sercomm H500-s (BCM63268) OpenWrt devices with Macronix
> >> >> > > MX30LF1G18AC
> >> >> > > which hangs the device.
> >> >> > >
> >> >> > > This is the log with block protection disabled:
> >> >> > > [    0.495831] bcm6368_nand 10000200.nand: there is not valid maps
> >> >> > > for
> >> >> > > state default
> >> >> > > [    0.504995] nand: device found, Manufacturer ID: 0xc2, Chip ID:
> >> >> > > 0xf1
> >> >> > > [    0.511526] nand: Macronix MX30LF1G18AC
> >> >> > > [    0.515586] nand: 128 MiB, SLC, erase size: 128 KiB, page size:
> >> >> > > 2048, OOB size: 64
> >> >> > > [    0.523516] bcm6368_nand 10000200.nand: detected 128MiB total,
> >> >> > > 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-4
> >> >> > > [    0.535912] Bad block table found at page 65472, version 0x01
> >> >> > > [    0.544268] Bad block table found at page 65408, version 0x01
> >> >> > > [    0.954329] 9 fixed-partitions partitions found on MTD device
> >> >> > > brcmnand.0
> >> >> > > ...
> >> >> > >
> >> >> > > This is the log with block protection enabled:
> >> >> > > [    0.495095] bcm6368_nand 10000200.nand: there is not valid maps
> >> >> > > for
> >> >> > > state default
> >> >> > > [    0.504249] nand: device found, Manufacturer ID: 0xc2, Chip ID:
> >> >> > > 0xf1
> >> >> > > [    0.510772] nand: Macronix MX30LF1G18AC
> >> >> > > [    0.514874] nand: 128 MiB, SLC, erase size: 128 KiB, page size:
> >> >> > > 2048, OOB size: 64
> >> >> > > [    0.522780] bcm6368_nand 10000200.nand: detected 128MiB total,
> >> >> > > 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-4
> >> >> > > [    0.539687] Bad block table not found for chip 0
> >> >> > > [    0.550153] Bad block table not found for chip 0
> >> >> > > [    0.555069] Scanning device for bad blocks
> >> >> > > [    0.601213] CPU 1 Unable to handle kernel paging request at
> >> >> > > virtual
> >> >> > > address 10277f00, epc == 8039ce70, ra == 8016ad50
> >> >> > > *** Device hangs ***
> >> >> > >
> >> >> > > Enabling macronix_nand_block_protection_support() makes the device
> >> >> > > unable to detect the bad block table and hangs it when trying to
> >> >> > > scan
> >> >> > > for bad blocks.
> >> >> >
> >> >> > Please trace nand_macronix.c and look:
> >> >> > - are the get_features and set_features really supported by the
> >> >> >   controller driver?
> >> >>
> >> >> This is what I could find by debugging:
> >> >> [    0.494993] bcm6368_nand 10000200.nand: there is not valid maps for
> >> >> state default
> >> >> [    0.505375] nand: device found, Manufacturer ID: 0xc2, Chip ID:
> >> >> 0xf1
> >> >> [    0.512077] nand: Macronix MX30LF1G18AC
> >> >> [    0.515994] nand: 128 MiB, SLC, erase size: 128 KiB, page size:
> >> >> 2048, OOB size: 64
> >> >> [    0.523928] bcm6368_nand 10000200.nand: detected 128MiB total,
> >> >> 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-4
> >> >> [    0.534415] bcm6368_nand 10000200.nand: ll_op cmd 0xa00ee
> >> >> [    0.539988] bcm6368_nand 10000200.nand: ll_op cmd 0x600a0
> >> >> [    0.545659] bcm6368_nand 10000200.nand: ll_op cmd 0x10000
> >> >> [    0.551214] bcm6368_nand 10000200.nand: NAND_CMD_GET_FEATURES =
> >> >> 0x00
> >> >> [    0.557843] bcm6368_nand 10000200.nand: ll_op cmd 0x10000
> >> >> [    0.563475] bcm6368_nand 10000200.nand: NAND_CMD_GET_FEATURES =
> >> >> 0x00
> >> >> [    0.569998] bcm6368_nand 10000200.nand: ll_op cmd 0x10000
> >> >> [    0.575653] bcm6368_nand 10000200.nand: NAND_CMD_GET_FEATURES =
> >> >> 0x00
> >> >> [    0.582246] bcm6368_nand 10000200.nand: ll_op cmd 0x80010000
> >> >> [    0.588067] bcm6368_nand 10000200.nand: NAND_CMD_GET_FEATURES =
> >> >> 0x00
> >> >> [    0.594657] nand: nand_get_features: addr=a0 subfeature_param=[00
> >> >> 00 00 00] -> 0
> >> >> [    0.602341] macronix_nand_block_protection_support:
> >> >> ONFI_FEATURE_ADDR_MXIC_PROTECTION=0
> >> >> [    0.610548] macronix_nand_block_protection_support: !=
> >> >> MXIC_BLOCK_PROTECTION_ALL_LOCK
> >> >> [    0.624760] Bad block table not found for chip 0
> >> >> [    0.635542] Bad block table not found for chip 0
> >> >> [    0.640270] Scanning device for bad blocks
> >> >>
> >> >> I don't know how to tell if get_features / set_features is really
> >> >> supported...
> >> >
> >> > Looks like your driver does not support exec_op but the core provides a
> >> > get/set_feature implementation.
> >>
> >> According to Florian, low level should be supported on brcmnand
> >> controllers >= 4.0
> >> Also:
> >> https://github.com/nomis/bcm963xx_4.12L.06B_consumer/blob/e2f23ddbb20bf75689372b6e6a5a0dc613f6e313/shared/opensource/include/bcm963xx/63268_map_part.h#L1597
> >
> > Just to be sure, you're using a mainline controller driver, not this
> > one?
>
> Yes, this was just to prove that the HW I’m using has get/set features support.
> I’m using OpenWrt, so it’s linux v5.15 driver.
>
> >
> >> >
> >> >>
> >> >> > - what is the state of the locking configuration in the chip when
> >> >> > you
> >> >> >   boot?
> >> >>
> >> >> Unlocked, I guess...
> >> >> How can I check that?
> >> >
> >> > It's in your dump, the chip returns 0, meaning it's all unlocked,
> >> > apparently.
> >>
> >> Well, I can read/write the device if block protection isn’t disabled,
> >> so I guess we can confirm it’s unlocked…
> >>
> >> >
> >> >> > - is there anything that locks the device by calling mxic_nand_lock()
> >> >> > ?
> >> >
> >> > So nobody locks the device I guess? Did you add traces there?
> >>
> >> It doesn’t get to the point that it enabled the lock/unlock functions
> >> since it fails when checking if feature is 0x38, so there’s no point
> >> in adding those traces…
> >
> > Right, it returns before setting these I guess.
> >
> >>
> >> >
> >> >> > - finding no bbt is one thing, hanging is another, where is it
> >> >> > hanging
> >> >> >   exactly? (offset in nand/ and line in the code)
> >> >>
> >> >> I've got no idea...
> >> >
> >> > You can use ftrace or just add printks a bit everywhere and try to get
> >> > closer and closer.
> >>
> >> I think that after trying to get the feature it just start reading
> >> nonsense from the NAND and at some point it hangs due to that garbage…
> >
> > It should refuse to mount the device somehow, but in no case the kernel
> > should hang.
>
> Yes, I think that this is a side effect (maybe a different bug somewhere else).
>
> >
> >> Is it posible that the NAND starts behaving like this after getting
> >> the feature due to some specific config of my device?
> >>
> >> >
> >> > I looked at the patch, I don't see anything strange. Besides, I have a
> >> > close enough datasheet and I don't see what could confuse the device.
> >> >
> >> > Are you really sure this patch is the problem? Is the WP pin wired on
> >> > your design?
> >>
> >> There’s no WP pin in brcmnand controllers < 7.0
> >
> > What about the chip?
>
> Maybe it has a GPIO controlling that, but I don’t have that info…
>
> >
> > Thanks,
> > Miquèl
> >
diff mbox series

Patch

diff --git a/Documentation/devicetree/bindings/mtd/nand-macronix.txt b/Documentation/devicetree/bindings/mtd/nand-macronix.txt
index ffab28a2c4d1..03f65ca32cd3 100644
--- a/Documentation/devicetree/bindings/mtd/nand-macronix.txt
+++ b/Documentation/devicetree/bindings/mtd/nand-macronix.txt
@@ -16,6 +16,9 @@  in children nodes.
 Required NAND chip properties in children mode:
 - randomizer enable: should be "mxic,enable-randomizer-otp"
 
+Optional NAND chip properties in children mode:
+- block protection disable: should be "mxic,disable-block-protection"
+
 Example:
 
 	nand: nand-controller@unit-address {