diff mbox series

[v3] mtd: limit OTP NVMEM Cell parse to non Nand devices

Message ID 20240322040951.16680-1-ansuelsmth@gmail.com
State Accepted
Headers show
Series [v3] mtd: limit OTP NVMEM Cell parse to non Nand devices | expand

Commit Message

Christian Marangi March 22, 2024, 4:09 a.m. UTC
MTD OTP logic is very fragile and can be problematic with some specific
kind of devices.

NVMEM across the years had various iteration on how Cells could be
declared in DT and MTD OTP probably was left behind and
add_legacy_fixed_of_cells was enabled without thinking of the consequences.

That option enables NVMEM to scan the provided of_node and treat each
child as a NVMEM Cell, this was to support legacy NVMEM implementation
and don't cause regression.

This is problematic if we have devices like Nand where the OTP is
triggered by setting a special mode in the flash. In this context real
partitions declared in the Nand node are registered as OTP Cells and
this cause probe fail with -EINVAL error.

This was never notice due to the fact that till now, no Nand supported
the OTP feature. With commit e87161321a40 ("mtd: rawnand: macronix: OTP
access for MX30LFxG18AC") this changed and coincidentally this Nand is
used on an FritzBox 7530 supported on OpenWrt.

Alternative and more robust way to declare OTP Cells are already
prossible by using the fixed-layout node or by declaring a child node
with the compatible set to "otp-user" or "otp-factory".

To fix this and limit any regression with other MTD that makes use of
declaring OTP as direct child of the dev node, disable
add_legacy_fixed_of_cells if we detect the MTD type is Nand.

With the following logic, the OTP NVMEM entry is correctly created with
no Cells and the MTD Nand is correctly probed and partitions are
correctly exposed.

Fixes: 2cc3b37f5b6d ("nvmem: add explicit config option to read old syntax fixed OF cells")
Cc: <stable@vger.kernel.org>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
---
 drivers/mtd/mtdcore.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Miquel Raynal March 25, 2024, 10:18 a.m. UTC | #1
On Fri, 2024-03-22 at 04:09:49 UTC, Christian Marangi wrote:
> MTD OTP logic is very fragile and can be problematic with some specific
> kind of devices.
> 
> NVMEM across the years had various iteration on how Cells could be
> declared in DT and MTD OTP probably was left behind and
> add_legacy_fixed_of_cells was enabled without thinking of the consequences.
> 
> That option enables NVMEM to scan the provided of_node and treat each
> child as a NVMEM Cell, this was to support legacy NVMEM implementation
> and don't cause regression.
> 
> This is problematic if we have devices like Nand where the OTP is
> triggered by setting a special mode in the flash. In this context real
> partitions declared in the Nand node are registered as OTP Cells and
> this cause probe fail with -EINVAL error.
> 
> This was never notice due to the fact that till now, no Nand supported
> the OTP feature. With commit e87161321a40 ("mtd: rawnand: macronix: OTP
> access for MX30LFxG18AC") this changed and coincidentally this Nand is
> used on an FritzBox 7530 supported on OpenWrt.
> 
> Alternative and more robust way to declare OTP Cells are already
> prossible by using the fixed-layout node or by declaring a child node
> with the compatible set to "otp-user" or "otp-factory".
> 
> To fix this and limit any regression with other MTD that makes use of
> declaring OTP as direct child of the dev node, disable
> add_legacy_fixed_of_cells if we detect the MTD type is Nand.
> 
> With the following logic, the OTP NVMEM entry is correctly created with
> no Cells and the MTD Nand is correctly probed and partitions are
> correctly exposed.
> 
> Fixes: 2cc3b37f5b6d ("nvmem: add explicit config option to read old syntax fixed OF cells")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>

Applied to https://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git mtd/fixes, thanks.

Miquel
Rafał Miłecki March 27, 2024, 2:26 p.m. UTC | #2
On 2024-03-22 05:09, Christian Marangi wrote:
> MTD OTP logic is very fragile and can be problematic with some specific
> kind of devices.
> 
> NVMEM across the years had various iteration on how Cells could be
> declared in DT and MTD OTP probably was left behind and
> add_legacy_fixed_of_cells was enabled without thinking of the 
> consequences.

Er... thank you?


> That option enables NVMEM to scan the provided of_node and treat each
> child as a NVMEM Cell, this was to support legacy NVMEM implementation
> and don't cause regression.
> 
> This is problematic if we have devices like Nand where the OTP is
> triggered by setting a special mode in the flash. In this context real
> partitions declared in the Nand node are registered as OTP Cells and
> this cause probe fail with -EINVAL error.
> 
> This was never notice due to the fact that till now, no Nand supported
> the OTP feature. With commit e87161321a40 ("mtd: rawnand: macronix: OTP
> access for MX30LFxG18AC") this changed and coincidentally this Nand is
> used on an FritzBox 7530 supported on OpenWrt.

So as you noticed this problem was *exposed* by adding OTP support for
Macronix NAND chips.


> Alternative and more robust way to declare OTP Cells are already
> prossible by using the fixed-layout node or by declaring a child node
> with the compatible set to "otp-user" or "otp-factory".
> 
> To fix this and limit any regression with other MTD that makes use of
> declaring OTP as direct child of the dev node, disable
> add_legacy_fixed_of_cells if we detect the MTD type is Nand.
> 
> With the following logic, the OTP NVMEM entry is correctly created with
> no Cells and the MTD Nand is correctly probed and partitions are
> correctly exposed.
> 
> Fixes: 2cc3b37f5b6d ("nvmem: add explicit config option to read old 
> syntax fixed OF cells")

It's not that commit however that introduced the problem. Introducing
"add_legacy_fixed_of_cells" just added a clean way of enabling parsing
of old cells syntax. Even before my commit NVMEM subsystem was looking
for NVMEM cells in NAND devices.

I booted kernel 6.6 which has commit e87161321a40 ("mtd: rawnand:
macronix: OTP > access for MX30LFxG18AC") but does NOT have commit
2cc3b37f5b6d ("nvmem: add explicit config option to read old syntax
fixed OF cells").

Look at this log from Broadcom Northstar (Linux 6.6):
[    0.410107] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xdc
[    0.416531] nand: Macronix MX30LF4G18AC
[    0.420409] nand: 512 MiB, SLC, erase size: 128 KiB, page size: 2048, 
OOB size: 64
[    0.428022] iproc_nand 18028000.nand-controller: detected 512MiB 
total, 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-8
[    0.438991] Scanning device for bad blocks
[    0.873598] Bad eraseblock 738 at 0x000005c40000
[    1.030279] random: crng init done
[    1.854895] Bad eraseblock 2414 at 0x000012dc0000
[    2.657354] Bad eraseblock 3783 at 0x00001d8e0000
[    2.662967] Bad eraseblock 3785 at 0x00001d920000
[    2.848418] nvmem user-otp1: nvmem: invalid reg on 
/nand-controller@18028000/nand@0
[    2.856126] iproc_nand 18028000.nand-controller: error -EINVAL: 
Failed to register OTP NVMEM device

So to summary it up:
1. Problem exists since much earlier and wasn't introduced by 
2cc3b37f5b6d
2. Commit 2cc3b37f5b6d just gives you a clean way of solving this issue
3. Problem was exposed by commit e87161321a40
4. We miss fix for v6.6 which doesn't have 2cc3b37f5b6d (it hit v6.7)


> Cc: <stable@vger.kernel.org>
> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Christian Marangi March 27, 2024, 2:36 p.m. UTC | #3
On Wed, Mar 27, 2024 at 03:26:55PM +0100, Rafał Miłecki wrote:
> On 2024-03-22 05:09, Christian Marangi wrote:
> > MTD OTP logic is very fragile and can be problematic with some specific
> > kind of devices.
> > 
> > NVMEM across the years had various iteration on how Cells could be
> > declared in DT and MTD OTP probably was left behind and
> > add_legacy_fixed_of_cells was enabled without thinking of the
> > consequences.
> 
> Er... thank you?
>

Probably made some bad assumption and sorry for it!

> 
> > That option enables NVMEM to scan the provided of_node and treat each
> > child as a NVMEM Cell, this was to support legacy NVMEM implementation
> > and don't cause regression.
> > 
> > This is problematic if we have devices like Nand where the OTP is
> > triggered by setting a special mode in the flash. In this context real
> > partitions declared in the Nand node are registered as OTP Cells and
> > this cause probe fail with -EINVAL error.
> > 
> > This was never notice due to the fact that till now, no Nand supported
> > the OTP feature. With commit e87161321a40 ("mtd: rawnand: macronix: OTP
> > access for MX30LFxG18AC") this changed and coincidentally this Nand is
> > used on an FritzBox 7530 supported on OpenWrt.
> 
> So as you noticed this problem was *exposed* by adding OTP support for
> Macronix NAND chips.
> 
> 
> > Alternative and more robust way to declare OTP Cells are already
> > prossible by using the fixed-layout node or by declaring a child node
> > with the compatible set to "otp-user" or "otp-factory".
> > 
> > To fix this and limit any regression with other MTD that makes use of
> > declaring OTP as direct child of the dev node, disable
> > add_legacy_fixed_of_cells if we detect the MTD type is Nand.
> > 
> > With the following logic, the OTP NVMEM entry is correctly created with
> > no Cells and the MTD Nand is correctly probed and partitions are
> > correctly exposed.
> > 
> > Fixes: 2cc3b37f5b6d ("nvmem: add explicit config option to read old
> > syntax fixed OF cells")
> 
> It's not that commit however that introduced the problem. Introducing
> "add_legacy_fixed_of_cells" just added a clean way of enabling parsing
> of old cells syntax. Even before my commit NVMEM subsystem was looking
> for NVMEM cells in NAND devices.
> 
> I booted kernel 6.6 which has commit e87161321a40 ("mtd: rawnand:
> macronix: OTP > access for MX30LFxG18AC") but does NOT have commit
> 2cc3b37f5b6d ("nvmem: add explicit config option to read old syntax
> fixed OF cells").
> 
> Look at this log from Broadcom Northstar (Linux 6.6):
> [    0.410107] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xdc
> [    0.416531] nand: Macronix MX30LF4G18AC
> [    0.420409] nand: 512 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB
> size: 64
> [    0.428022] iproc_nand 18028000.nand-controller: detected 512MiB total,
> 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-8
> [    0.438991] Scanning device for bad blocks
> [    0.873598] Bad eraseblock 738 at 0x000005c40000
> [    1.030279] random: crng init done
> [    1.854895] Bad eraseblock 2414 at 0x000012dc0000
> [    2.657354] Bad eraseblock 3783 at 0x00001d8e0000
> [    2.662967] Bad eraseblock 3785 at 0x00001d920000
> [    2.848418] nvmem user-otp1: nvmem: invalid reg on
> /nand-controller@18028000/nand@0
> [    2.856126] iproc_nand 18028000.nand-controller: error -EINVAL: Failed to
> register OTP NVMEM device
> 
> So to summary it up:
> 1. Problem exists since much earlier and wasn't introduced by 2cc3b37f5b6d
> 2. Commit 2cc3b37f5b6d just gives you a clean way of solving this issue
> 3. Problem was exposed by commit e87161321a40
> 4. We miss fix for v6.6 which doesn't have 2cc3b37f5b6d (it hit v6.7)
> 

So the thing was broken all along? Maybe the regression was introduced
when OF support for NVMEM cell was introduced? (and OF scan was enabled
by default?)

Anyway Sorry for adding the wrong fixes, maybe Miquel can remote the
commit from mtd/fixes and fix the problematic fixes tag?

> 
> > Cc: <stable@vger.kernel.org>
> > Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Miquel Raynal March 27, 2024, 3:31 p.m. UTC | #4
Hi Christian,

ansuelsmth@gmail.com wrote on Wed, 27 Mar 2024 15:36:54 +0100:

> On Wed, Mar 27, 2024 at 03:26:55PM +0100, Rafał Miłecki wrote:
> > On 2024-03-22 05:09, Christian Marangi wrote:  
> > > MTD OTP logic is very fragile and can be problematic with some specific
> > > kind of devices.
> > > 
> > > NVMEM across the years had various iteration on how Cells could be
> > > declared in DT and MTD OTP probably was left behind and
> > > add_legacy_fixed_of_cells was enabled without thinking of the
> > > consequences.  
> > 
> > Er... thank you?
> >  
> 
> Probably made some bad assumption and sorry for it!

Well, "not thinking about all consequences" seems always legitimate to
me, we are not robots. Anyway, I agree we should drop this sentence.

> > > That option enables NVMEM to scan the provided of_node and treat each
> > > child as a NVMEM Cell, this was to support legacy NVMEM implementation
> > > and don't cause regression.
> > > 
> > > This is problematic if we have devices like Nand where the OTP is
> > > triggered by setting a special mode in the flash. In this context real
> > > partitions declared in the Nand node are registered as OTP Cells and
> > > this cause probe fail with -EINVAL error.
> > > 
> > > This was never notice due to the fact that till now, no Nand supported
> > > the OTP feature. With commit e87161321a40 ("mtd: rawnand: macronix: OTP
> > > access for MX30LFxG18AC") this changed and coincidentally this Nand is
> > > used on an FritzBox 7530 supported on OpenWrt.  
> > 
> > So as you noticed this problem was *exposed* by adding OTP support for
> > Macronix NAND chips.
> > 
> >   
> > > Alternative and more robust way to declare OTP Cells are already
> > > prossible by using the fixed-layout node or by declaring a child node
> > > with the compatible set to "otp-user" or "otp-factory".
> > > 
> > > To fix this and limit any regression with other MTD that makes use of
> > > declaring OTP as direct child of the dev node, disable
> > > add_legacy_fixed_of_cells if we detect the MTD type is Nand.
> > > 
> > > With the following logic, the OTP NVMEM entry is correctly created with
> > > no Cells and the MTD Nand is correctly probed and partitions are
> > > correctly exposed.
> > > 
> > > Fixes: 2cc3b37f5b6d ("nvmem: add explicit config option to read old
> > > syntax fixed OF cells")  
> > 
> > It's not that commit however that introduced the problem. Introducing
> > "add_legacy_fixed_of_cells" just added a clean way of enabling parsing
> > of old cells syntax. Even before my commit NVMEM subsystem was looking
> > for NVMEM cells in NAND devices.
> > 
> > I booted kernel 6.6 which has commit e87161321a40 ("mtd: rawnand:
> > macronix: OTP > access for MX30LFxG18AC") but does NOT have commit
> > 2cc3b37f5b6d ("nvmem: add explicit config option to read old syntax
> > fixed OF cells").
> > 
> > Look at this log from Broadcom Northstar (Linux 6.6):
> > [    0.410107] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xdc
> > [    0.416531] nand: Macronix MX30LF4G18AC
> > [    0.420409] nand: 512 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB
> > size: 64
> > [    0.428022] iproc_nand 18028000.nand-controller: detected 512MiB total,
> > 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-8
> > [    0.438991] Scanning device for bad blocks
> > [    0.873598] Bad eraseblock 738 at 0x000005c40000
> > [    1.030279] random: crng init done
> > [    1.854895] Bad eraseblock 2414 at 0x000012dc0000
> > [    2.657354] Bad eraseblock 3783 at 0x00001d8e0000
> > [    2.662967] Bad eraseblock 3785 at 0x00001d920000
> > [    2.848418] nvmem user-otp1: nvmem: invalid reg on
> > /nand-controller@18028000/nand@0
> > [    2.856126] iproc_nand 18028000.nand-controller: error -EINVAL: Failed to
> > register OTP NVMEM device
> > 
> > So to summary it up:
> > 1. Problem exists since much earlier and wasn't introduced by 2cc3b37f5b6d
> > 2. Commit 2cc3b37f5b6d just gives you a clean way of solving this issue
> > 3. Problem was exposed by commit e87161321a40
> > 4. We miss fix for v6.6 which doesn't have 2cc3b37f5b6d (it hit v6.7)
> >   
> 
> So the thing was broken all along? Maybe the regression was introduced
> when OF support for NVMEM cell was introduced? (and OF scan was enabled
> by default?)
> 
> Anyway Sorry for adding the wrong fixes, maybe Miquel can remote the
> commit from mtd/fixes and fix the problematic fixes tag?

Yes, please send a v4 (with the sentence above updated) and I will drop
v3.

Thanks,
Miquèl
Rafał Miłecki March 27, 2024, 9:53 p.m. UTC | #5
On 27.03.2024 15:36, Christian Marangi wrote:
> On Wed, Mar 27, 2024 at 03:26:55PM +0100, Rafał Miłecki wrote:
>> On 2024-03-22 05:09, Christian Marangi wrote:
>>> MTD OTP logic is very fragile and can be problematic with some specific
>>> kind of devices.
>>>
>>> NVMEM across the years had various iteration on how Cells could be
>>> declared in DT and MTD OTP probably was left behind and
>>> add_legacy_fixed_of_cells was enabled without thinking of the
>>> consequences.
>>
>> Er... thank you?
>>
> 
> Probably made some bad assumption and sorry for it!

No problem :)


>>> That option enables NVMEM to scan the provided of_node and treat each
>>> child as a NVMEM Cell, this was to support legacy NVMEM implementation
>>> and don't cause regression.
>>>
>>> This is problematic if we have devices like Nand where the OTP is
>>> triggered by setting a special mode in the flash. In this context real
>>> partitions declared in the Nand node are registered as OTP Cells and
>>> this cause probe fail with -EINVAL error.
>>>
>>> This was never notice due to the fact that till now, no Nand supported
>>> the OTP feature. With commit e87161321a40 ("mtd: rawnand: macronix: OTP
>>> access for MX30LFxG18AC") this changed and coincidentally this Nand is
>>> used on an FritzBox 7530 supported on OpenWrt.
>>
>> So as you noticed this problem was *exposed* by adding OTP support for
>> Macronix NAND chips.
>>
>>
>>> Alternative and more robust way to declare OTP Cells are already
>>> prossible by using the fixed-layout node or by declaring a child node
>>> with the compatible set to "otp-user" or "otp-factory".
>>>
>>> To fix this and limit any regression with other MTD that makes use of
>>> declaring OTP as direct child of the dev node, disable
>>> add_legacy_fixed_of_cells if we detect the MTD type is Nand.
>>>
>>> With the following logic, the OTP NVMEM entry is correctly created with
>>> no Cells and the MTD Nand is correctly probed and partitions are
>>> correctly exposed.
>>>
>>> Fixes: 2cc3b37f5b6d ("nvmem: add explicit config option to read old
>>> syntax fixed OF cells")
>>
>> It's not that commit however that introduced the problem. Introducing
>> "add_legacy_fixed_of_cells" just added a clean way of enabling parsing
>> of old cells syntax. Even before my commit NVMEM subsystem was looking
>> for NVMEM cells in NAND devices.
>>
>> I booted kernel 6.6 which has commit e87161321a40 ("mtd: rawnand:
>> macronix: OTP > access for MX30LFxG18AC") but does NOT have commit
>> 2cc3b37f5b6d ("nvmem: add explicit config option to read old syntax
>> fixed OF cells").
>>
>> Look at this log from Broadcom Northstar (Linux 6.6):
>> [    0.410107] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xdc
>> [    0.416531] nand: Macronix MX30LF4G18AC
>> [    0.420409] nand: 512 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB
>> size: 64
>> [    0.428022] iproc_nand 18028000.nand-controller: detected 512MiB total,
>> 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-8
>> [    0.438991] Scanning device for bad blocks
>> [    0.873598] Bad eraseblock 738 at 0x000005c40000
>> [    1.030279] random: crng init done
>> [    1.854895] Bad eraseblock 2414 at 0x000012dc0000
>> [    2.657354] Bad eraseblock 3783 at 0x00001d8e0000
>> [    2.662967] Bad eraseblock 3785 at 0x00001d920000
>> [    2.848418] nvmem user-otp1: nvmem: invalid reg on
>> /nand-controller@18028000/nand@0
>> [    2.856126] iproc_nand 18028000.nand-controller: error -EINVAL: Failed to
>> register OTP NVMEM device
>>
>> So to summary it up:
>> 1. Problem exists since much earlier and wasn't introduced by 2cc3b37f5b6d
>> 2. Commit 2cc3b37f5b6d just gives you a clean way of solving this issue
>> 3. Problem was exposed by commit e87161321a40
>> 4. We miss fix for v6.6 which doesn't have 2cc3b37f5b6d (it hit v6.7)
>>
> 
> So the thing was broken all along? Maybe the regression was introduced
> when OF support for NVMEM cell was introduced? (and OF scan was enabled
> by default?)

I went back to the commit 4b361cfa8624 ("mtd: core: add OTP nvmem
provider support") from 2021 (it went into 5.14). It seems that even
back then nvmem_register() used to call nvmem_add_cells_from_of(). SO
maybe this problem is as old as that?


> Anyway Sorry for adding the wrong fixes, maybe Miquel can remote the
> commit from mtd/fixes and fix the problematic fixes tag?
> 
>>
>>> Cc: <stable@vger.kernel.org>
>>> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
>
Rafał Miłecki March 27, 2024, 10:15 p.m. UTC | #6
On 22.03.2024 05:09, Christian Marangi wrote:
> diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
> index 5887feb347a4..0de87bc63840 100644
> --- a/drivers/mtd/mtdcore.c
> +++ b/drivers/mtd/mtdcore.c
> @@ -900,7 +900,7 @@ static struct nvmem_device *mtd_otp_nvmem_register(struct mtd_info *mtd,
>   	config.name = compatible;
>   	config.id = NVMEM_DEVID_AUTO;
>   	config.owner = THIS_MODULE;
> -	config.add_legacy_fixed_of_cells = true;
> +	config.add_legacy_fixed_of_cells = !mtd_type_is_nand(mtd);
>   	config.type = NVMEM_TYPE_OTP;
>   	config.root_only = true;
>   	config.ignore_wp = true;

I think there may be even more unwanted behaviour here. If
mtd_otp_nvmem_register() fails to find node with "user-otp" /
"factory-otp" compatible then it sets "config.of_node" to NULL but that
means NVMEM core still looks for NVMEM cells in device's "of_node".

I believe we should not look for OTP NVMEM cells out of the "user-otp" /
"factory-otp" compatible nodes.

So maybe what we need in the first place is just:
config.add_legacy_fixed_of_cells = !!np;
?

Any extra limitation of .add_legacy_fixed_of_cells should probably be
used only if we want to prevent new users of the legacy syntax. The
problem is that mtd.yaml binding allowed "user-otp" and "factory-otp"
with old syntax cells. It means every MTD device was allowed to have
them.

No in-kernel DTS even used "user-otp" or "factory-otp" with NVMEM legacy
cells but I'm not sure about downstream DTS files. Ideally we would do
config.add_legacy_fixed_of_cells = false;
but that could break compatibility with some downstream DTS files.
Christian Marangi March 28, 2024, 2:19 p.m. UTC | #7
On Wed, Mar 27, 2024 at 11:15:02PM +0100, Rafał Miłecki wrote:
> On 22.03.2024 05:09, Christian Marangi wrote:
> > diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
> > index 5887feb347a4..0de87bc63840 100644
> > --- a/drivers/mtd/mtdcore.c
> > +++ b/drivers/mtd/mtdcore.c
> > @@ -900,7 +900,7 @@ static struct nvmem_device *mtd_otp_nvmem_register(struct mtd_info *mtd,
> >   	config.name = compatible;
> >   	config.id = NVMEM_DEVID_AUTO;
> >   	config.owner = THIS_MODULE;
> > -	config.add_legacy_fixed_of_cells = true;
> > +	config.add_legacy_fixed_of_cells = !mtd_type_is_nand(mtd);
> >   	config.type = NVMEM_TYPE_OTP;
> >   	config.root_only = true;
> >   	config.ignore_wp = true;
> 
> I think there may be even more unwanted behaviour here. If
> mtd_otp_nvmem_register() fails to find node with "user-otp" /
> "factory-otp" compatible then it sets "config.of_node" to NULL but that
> means NVMEM core still looks for NVMEM cells in device's "of_node".
> 
> I believe we should not look for OTP NVMEM cells out of the "user-otp" /
> "factory-otp" compatible nodes.
> 
> So maybe what we need in the first place is just:
> config.add_legacy_fixed_of_cells = !!np;
> ?
> 
> Any extra limitation of .add_legacy_fixed_of_cells should probably be
> used only if we want to prevent new users of the legacy syntax. The
> problem is that mtd.yaml binding allowed "user-otp" and "factory-otp"
> with old syntax cells. It means every MTD device was allowed to have
> them.
> 
> No in-kernel DTS even used "user-otp" or "factory-otp" with NVMEM legacy
> cells but I'm not sure about downstream DTS files. Ideally we would do
> config.add_legacy_fixed_of_cells = false;
> but that could break compatibility with some downstream DTS files.

Yes the main problem is prevent regression in downstream. I feel for the
nand usage, this is 100% of the times broken. For SPI and other corner
case MTD devices it's not?

Anyway did you by chance have a suggestion for a better fixes tag?
Christian Marangi March 28, 2024, 2:20 p.m. UTC | #8
On Wed, Mar 27, 2024 at 04:31:29PM +0100, Miquel Raynal wrote:
> Hi Christian,
> 
> ansuelsmth@gmail.com wrote on Wed, 27 Mar 2024 15:36:54 +0100:
> 
> > On Wed, Mar 27, 2024 at 03:26:55PM +0100, Rafał Miłecki wrote:
> > > On 2024-03-22 05:09, Christian Marangi wrote:  
> > > > MTD OTP logic is very fragile and can be problematic with some specific
> > > > kind of devices.
> > > > 
> > > > NVMEM across the years had various iteration on how Cells could be
> > > > declared in DT and MTD OTP probably was left behind and
> > > > add_legacy_fixed_of_cells was enabled without thinking of the
> > > > consequences.  
> > > 
> > > Er... thank you?
> > >  
> > 
> > Probably made some bad assumption and sorry for it!
> 
> Well, "not thinking about all consequences" seems always legitimate to
> me, we are not robots. Anyway, I agree we should drop this sentence.
>
> > > > That option enables NVMEM to scan the provided of_node and treat each
> > > > child as a NVMEM Cell, this was to support legacy NVMEM implementation
> > > > and don't cause regression.
> > > > 
> > > > This is problematic if we have devices like Nand where the OTP is
> > > > triggered by setting a special mode in the flash. In this context real
> > > > partitions declared in the Nand node are registered as OTP Cells and
> > > > this cause probe fail with -EINVAL error.
> > > > 
> > > > This was never notice due to the fact that till now, no Nand supported
> > > > the OTP feature. With commit e87161321a40 ("mtd: rawnand: macronix: OTP
> > > > access for MX30LFxG18AC") this changed and coincidentally this Nand is
> > > > used on an FritzBox 7530 supported on OpenWrt.  
> > > 
> > > So as you noticed this problem was *exposed* by adding OTP support for
> > > Macronix NAND chips.
> > > 
> > >   
> > > > Alternative and more robust way to declare OTP Cells are already
> > > > prossible by using the fixed-layout node or by declaring a child node
> > > > with the compatible set to "otp-user" or "otp-factory".
> > > > 
> > > > To fix this and limit any regression with other MTD that makes use of
> > > > declaring OTP as direct child of the dev node, disable
> > > > add_legacy_fixed_of_cells if we detect the MTD type is Nand.
> > > > 
> > > > With the following logic, the OTP NVMEM entry is correctly created with
> > > > no Cells and the MTD Nand is correctly probed and partitions are
> > > > correctly exposed.
> > > > 
> > > > Fixes: 2cc3b37f5b6d ("nvmem: add explicit config option to read old
> > > > syntax fixed OF cells")  
> > > 
> > > It's not that commit however that introduced the problem. Introducing
> > > "add_legacy_fixed_of_cells" just added a clean way of enabling parsing
> > > of old cells syntax. Even before my commit NVMEM subsystem was looking
> > > for NVMEM cells in NAND devices.
> > > 
> > > I booted kernel 6.6 which has commit e87161321a40 ("mtd: rawnand:
> > > macronix: OTP > access for MX30LFxG18AC") but does NOT have commit
> > > 2cc3b37f5b6d ("nvmem: add explicit config option to read old syntax
> > > fixed OF cells").
> > > 
> > > Look at this log from Broadcom Northstar (Linux 6.6):
> > > [    0.410107] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xdc
> > > [    0.416531] nand: Macronix MX30LF4G18AC
> > > [    0.420409] nand: 512 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB
> > > size: 64
> > > [    0.428022] iproc_nand 18028000.nand-controller: detected 512MiB total,
> > > 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-8
> > > [    0.438991] Scanning device for bad blocks
> > > [    0.873598] Bad eraseblock 738 at 0x000005c40000
> > > [    1.030279] random: crng init done
> > > [    1.854895] Bad eraseblock 2414 at 0x000012dc0000
> > > [    2.657354] Bad eraseblock 3783 at 0x00001d8e0000
> > > [    2.662967] Bad eraseblock 3785 at 0x00001d920000
> > > [    2.848418] nvmem user-otp1: nvmem: invalid reg on
> > > /nand-controller@18028000/nand@0
> > > [    2.856126] iproc_nand 18028000.nand-controller: error -EINVAL: Failed to
> > > register OTP NVMEM device
> > > 
> > > So to summary it up:
> > > 1. Problem exists since much earlier and wasn't introduced by 2cc3b37f5b6d
> > > 2. Commit 2cc3b37f5b6d just gives you a clean way of solving this issue
> > > 3. Problem was exposed by commit e87161321a40
> > > 4. We miss fix for v6.6 which doesn't have 2cc3b37f5b6d (it hit v6.7)
> > >   
> > 
> > So the thing was broken all along? Maybe the regression was introduced
> > when OF support for NVMEM cell was introduced? (and OF scan was enabled
> > by default?)
> > 
> > Anyway Sorry for adding the wrong fixes, maybe Miquel can remote the
> > commit from mtd/fixes and fix the problematic fixes tag?
> 
> Yes, please send a v4 (with the sentence above updated) and I will drop
> v3.
> 

Thanks a lot! I asked Rafal some suggestion for a better fixes tag and I
will send v4.
Rafał Miłecki March 28, 2024, 2:44 p.m. UTC | #9
On 2024-03-28 15:19, Christian Marangi wrote:
> On Wed, Mar 27, 2024 at 11:15:02PM +0100, Rafał Miłecki wrote:
>> On 22.03.2024 05:09, Christian Marangi wrote:
>> > diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
>> > index 5887feb347a4..0de87bc63840 100644
>> > --- a/drivers/mtd/mtdcore.c
>> > +++ b/drivers/mtd/mtdcore.c
>> > @@ -900,7 +900,7 @@ static struct nvmem_device *mtd_otp_nvmem_register(struct mtd_info *mtd,
>> >   	config.name = compatible;
>> >   	config.id = NVMEM_DEVID_AUTO;
>> >   	config.owner = THIS_MODULE;
>> > -	config.add_legacy_fixed_of_cells = true;
>> > +	config.add_legacy_fixed_of_cells = !mtd_type_is_nand(mtd);
>> >   	config.type = NVMEM_TYPE_OTP;
>> >   	config.root_only = true;
>> >   	config.ignore_wp = true;
>> 
>> I think there may be even more unwanted behaviour here. If
>> mtd_otp_nvmem_register() fails to find node with "user-otp" /
>> "factory-otp" compatible then it sets "config.of_node" to NULL but 
>> that
>> means NVMEM core still looks for NVMEM cells in device's "of_node".
>> 
>> I believe we should not look for OTP NVMEM cells out of the "user-otp" 
>> /
>> "factory-otp" compatible nodes.
>> 
>> So maybe what we need in the first place is just:
>> config.add_legacy_fixed_of_cells = !!np;
>> ?
>> 
>> Any extra limitation of .add_legacy_fixed_of_cells should probably be
>> used only if we want to prevent new users of the legacy syntax. The
>> problem is that mtd.yaml binding allowed "user-otp" and "factory-otp"
>> with old syntax cells. It means every MTD device was allowed to have
>> them.
>> 
>> No in-kernel DTS even used "user-otp" or "factory-otp" with NVMEM 
>> legacy
>> cells but I'm not sure about downstream DTS files. Ideally we would do
>> config.add_legacy_fixed_of_cells = false;
>> but that could break compatibility with some downstream DTS files.
> 
> Yes the main problem is prevent regression in downstream. I feel for 
> the
> nand usage, this is 100% of the times broken. For SPI and other corner
> case MTD devices it's not?
> 
> Anyway did you by chance have a suggestion for a better fixes tag?

My personal idea for that would be to put two Fixes with two commits and
describe in commit body that one just exposed existing bug.

You may check my OpenWrt quick patch for an idea how I'd handle that:
https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/generic/pending-6.6/440-mtd-don-t-look-for-OTP-legacy-NVMEM-cells-if-proper-.patch;h=d9d15a4048c144d8565c8ea38e15a79f7f4a5fe1;hb=dd78a59cd7b029560b33cb3ac0e1aa8b747bd807
Christian Marangi March 30, 2024, 9:13 a.m. UTC | #10
On Thu, Mar 28, 2024 at 03:44:15PM +0100, Rafał Miłecki wrote:
> On 2024-03-28 15:19, Christian Marangi wrote:
> > On Wed, Mar 27, 2024 at 11:15:02PM +0100, Rafał Miłecki wrote:
> > > On 22.03.2024 05:09, Christian Marangi wrote:
> > > > diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
> > > > index 5887feb347a4..0de87bc63840 100644
> > > > --- a/drivers/mtd/mtdcore.c
> > > > +++ b/drivers/mtd/mtdcore.c
> > > > @@ -900,7 +900,7 @@ static struct nvmem_device *mtd_otp_nvmem_register(struct mtd_info *mtd,
> > > >   	config.name = compatible;
> > > >   	config.id = NVMEM_DEVID_AUTO;
> > > >   	config.owner = THIS_MODULE;
> > > > -	config.add_legacy_fixed_of_cells = true;
> > > > +	config.add_legacy_fixed_of_cells = !mtd_type_is_nand(mtd);
> > > >   	config.type = NVMEM_TYPE_OTP;
> > > >   	config.root_only = true;
> > > >   	config.ignore_wp = true;
> > > 
> > > I think there may be even more unwanted behaviour here. If
> > > mtd_otp_nvmem_register() fails to find node with "user-otp" /
> > > "factory-otp" compatible then it sets "config.of_node" to NULL but
> > > that
> > > means NVMEM core still looks for NVMEM cells in device's "of_node".
> > > 
> > > I believe we should not look for OTP NVMEM cells out of the
> > > "user-otp" /
> > > "factory-otp" compatible nodes.
> > > 
> > > So maybe what we need in the first place is just:
> > > config.add_legacy_fixed_of_cells = !!np;
> > > ?
> > > 
> > > Any extra limitation of .add_legacy_fixed_of_cells should probably be
> > > used only if we want to prevent new users of the legacy syntax. The
> > > problem is that mtd.yaml binding allowed "user-otp" and "factory-otp"
> > > with old syntax cells. It means every MTD device was allowed to have
> > > them.
> > > 
> > > No in-kernel DTS even used "user-otp" or "factory-otp" with NVMEM
> > > legacy
> > > cells but I'm not sure about downstream DTS files. Ideally we would do
> > > config.add_legacy_fixed_of_cells = false;
> > > but that could break compatibility with some downstream DTS files.
> > 
> > Yes the main problem is prevent regression in downstream. I feel for the
> > nand usage, this is 100% of the times broken. For SPI and other corner
> > case MTD devices it's not?
> > 
> > Anyway did you by chance have a suggestion for a better fixes tag?
> 
> My personal idea for that would be to put two Fixes with two commits and
> describe in commit body that one just exposed existing bug.
> 
> You may check my OpenWrt quick patch for an idea how I'd handle that:
> https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/generic/pending-6.6/440-mtd-don-t-look-for-OTP-legacy-NVMEM-cells-if-proper-.patch;h=d9d15a4048c144d8565c8ea38e15a79f7f4a5fe1;hb=dd78a59cd7b029560b33cb3ac0e1aa8b747bd807
>

My concern is that using !!np might pose some regression problem. Also I
feel adding the macronix commit in fixes tag might be confusing?

Think I will just use the nand check just to be extra safe and add a
kernel dependency for when the add_legacy_fixed_of_cells was introduced
since before that a different patch is needed. What do you think?
diff mbox series

Patch

diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
index 5887feb347a4..0de87bc63840 100644
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -900,7 +900,7 @@  static struct nvmem_device *mtd_otp_nvmem_register(struct mtd_info *mtd,
 	config.name = compatible;
 	config.id = NVMEM_DEVID_AUTO;
 	config.owner = THIS_MODULE;
-	config.add_legacy_fixed_of_cells = true;
+	config.add_legacy_fixed_of_cells = !mtd_type_is_nand(mtd);
 	config.type = NVMEM_TYPE_OTP;
 	config.root_only = true;
 	config.ignore_wp = true;