
[0/8] block: implement NVMEM provider

Message ID cover.1711048433.git.daniel@makrotopia.org

Message

Daniel Golle March 21, 2024, 7:31 p.m. UTC
On embedded devices using an eMMC it is common that one or more (hw/sw)
partitions on the eMMC are used to store MAC addresses and Wi-Fi
calibration EEPROM data.

Implement an NVMEM provider backed by a block device as typically the
NVMEM framework is used to have kernel drivers read and use binary data
from EEPROMs, efuses, flash memory (MTD), ...

In order to be able to reference hardware partitions on an eMMC, add code
to bind each hardware partition to a specific firmware subnode.

Overall, this enables uniform handling across practically all flash
storage types used for this purpose (MTD, UBI, and now also MMC).

As part of this series it was necessary to define a device tree schema
for block devices and partitions on them, which (similar to how it now
works also for UBI volumes) can be matched by one or more properties.

---
This series was previously submitted as RFC on July 19th 2023[1]
and the basic idea has not changed since. Another round of RFC
was submitted on March 5th 2024[2], which received overall positive
feedback; only minor corrections have been made since (see
changelog below).

[1]: https://patchwork.kernel.org/project/linux-block/list/?series=767565
[2]: https://patchwork.kernel.org/project/linux-block/list/?series=832705

Changes since RFC:
 * Use 'partuuid' instead of reserved 'uuid' keyword to match against
   PARTUUID.
 * Simplify blk_nvmem_init(void) function.

Daniel Golle (8):
  dt-bindings: block: add basic bindings for block devices
  block: partitions: populate fwnode
  block: add new genhd flag GENHD_FL_NVMEM
  block: implement NVMEM provider
  dt-bindings: mmc: mmc-card: add block device nodes
  mmc: core: set card fwnode_handle
  mmc: block: set fwnode of disk devices
  mmc: block: set GENHD_FL_NVMEM

 .../bindings/block/block-device.yaml          |  22 +++
 .../devicetree/bindings/block/partition.yaml  |  51 ++++++
 .../devicetree/bindings/block/partitions.yaml |  20 +++
 .../devicetree/bindings/mmc/mmc-card.yaml     |  45 +++++
 MAINTAINERS                                   |   5 +
 block/Kconfig                                 |   9 +
 block/Makefile                                |   1 +
 block/blk-nvmem.c                             | 169 ++++++++++++++++++
 block/partitions/core.c                       |  41 +++++
 drivers/mmc/core/block.c                      |   8 +
 drivers/mmc/core/bus.c                        |   2 +
 include/linux/blkdev.h                        |   2 +
 12 files changed, 375 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/block/block-device.yaml
 create mode 100644 Documentation/devicetree/bindings/block/partition.yaml
 create mode 100644 Documentation/devicetree/bindings/block/partitions.yaml
 create mode 100644 block/blk-nvmem.c

Comments

Bart Van Assche March 21, 2024, 7:44 p.m. UTC | #1
On 3/21/24 12:34, Daniel Golle wrote:
> On embedded devices using an eMMC it is common that one or more partitions
> on the eMMC are used to store MAC addresses and Wi-Fi calibration EEPROM
> data. Allow referencing the partition in device tree for the kernel and
> Wi-Fi drivers accessing it via the NVMEM layer.

Why store calibration data in a partition instead of in a file on a
filesystem?

> diff --git a/MAINTAINERS b/MAINTAINERS
> index 8c88f362feb55..242a0a139c00a 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3662,6 +3662,11 @@ L:	linux-mtd@lists.infradead.org
>   S:	Maintained
>   F:	drivers/mtd/devices/block2mtd.c
>   
> +BLOCK NVMEM DRIVER
> +M:	Daniel Golle <daniel@makrotopia.org>
> +S:	Maintained
> +F:	block/blk-nvmem.c

Why add this functionality to the block layer instead of somewhere
in the drivers/ directory?

Thanks,

Bart.
Daniel Golle March 21, 2024, 8:22 p.m. UTC | #2
Hi Bart,

thank you for looking at the patches!

On Thu, Mar 21, 2024 at 12:44:19PM -0700, Bart Van Assche wrote:
> On 3/21/24 12:34, Daniel Golle wrote:
> > On embedded devices using an eMMC it is common that one or more partitions
> > on the eMMC are used to store MAC addresses and Wi-Fi calibration EEPROM
> > data. Allow referencing the partition in device tree for the kernel and
> > Wi-Fi drivers accessing it via the NVMEM layer.
> 
> Why store calibration data in a partition instead of in a file on a
> filesystem?

First of all, it's just how it is already in the practical world out
there. The same methods for mass-production are used independently of
the type of flash memory, so vendors don't care if in Linux the flash
ends up as MMC/block (in case of an eMMC) device or MTD device (in
case of SPI-NOR, for example). I can name countless devices of
numerous vendors following this generally very common practise (and
then ending up extracting that using ugly custom drivers, or poking
around in the block devices in early userland, ... none of it is nice,
which is the motivation for this series).
Adtran, GL-iNet, Netgear, ... to name just a few very popular vendors.

The devices are already out there, and the way they store those
details is considered part of the low level firmware which will never
change. Yet it would be nice to run vanilla Linux on them (or
OpenWrt), and make sure things like NFS root can work, and for that
the MAC address needs to be in place already, i.e. extracting it in
userland would be too late.

However, I also believe there is nothing wrong with that and using a
filesystem comes with many additional pitfalls, such as being possibly
not cleanly unmounted, the file could be renamed or deleted by the
user, ... All that should not result in a device not having its
proper MAC address any more.

Why have all the complexity for something as simple as storing 6 bytes
of MAC address?

I will not reiterate that whole discussion now; you may look at the
list archives, where this was explained and discussed for the
first round of the RFC series last year.

> 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 8c88f362feb55..242a0a139c00a 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -3662,6 +3662,11 @@ L:	linux-mtd@lists.infradead.org
> >   S:	Maintained
> >   F:	drivers/mtd/devices/block2mtd.c
> > +BLOCK NVMEM DRIVER
> > +M:	Daniel Golle <daniel@makrotopia.org>
> > +S:	Maintained
> > +F:	block/blk-nvmem.c
> 
> Why add this functionality to the block layer instead of somewhere
> in the drivers/ directory?

Simply because we need notifications about appearing and disappearing
block devices, or a way to iterate over all block devices in a system.
For both there isn't currently any other interface than using a
class_interface for that, and that requires access to &block_class
which is considered a block subsystem internal.

Also note that the same is true for the MTD NVMEM provider (in
drivers/mtd/mtdcore.c) as well as the UBI NVMEM provider (in
drivers/mtd/ubi/nvmem.c), both are considered an integral part of
their corresponding subsystems -- despite the fact that in those cases
this wouldn't even be strictly needed, as for MTD we have
register_mtd_user() and for UBI we'd have
ubi_register_volume_notifier().

Doing it differently for block devices would hence not only complicate
things unnecessarily, it would also be inconsistent.
Bart Van Assche March 22, 2024, 5:49 p.m. UTC | #3
On 3/21/24 12:33, Daniel Golle wrote:
> Add new flag to distinguish block devices which may act as an NVMEM
> provider.
> 
> Signed-off-by: Daniel Golle <daniel@makrotopia.org>
> ---
>   include/linux/blkdev.h | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index c3e8f7cf96be9..f2c4f280d7619 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -81,11 +81,13 @@ struct partition_meta_info {
>    * ``GENHD_FL_NO_PART``: partition support is disabled.  The kernel will not
>    * scan for partitions from add_disk, and users can't add partitions manually.
>    *
> + * ``GENHD_FL_NVMEM``: the block device should be considered as NVMEM provider.
>    */
>   enum {
>   	GENHD_FL_REMOVABLE			= 1 << 0,
>   	GENHD_FL_HIDDEN				= 1 << 1,
>   	GENHD_FL_NO_PART			= 1 << 2,
> +	GENHD_FL_NVMEM				= 1 << 3,
>   };

What would break if this flag didn't exist?

Thanks,

Bart.
Bart Van Assche March 22, 2024, 5:52 p.m. UTC | #4
On 3/21/24 12:31, Daniel Golle wrote:
> On embedded devices using an eMMC it is common that one or more (hw/sw)
> partitions on the eMMC are used to store MAC addresses and Wi-Fi
> calibration EEPROM data.
> 
> Implement an NVMEM provider backed by a block device as typically the
> NVMEM framework is used to have kernel drivers read and use binary data
> from EEPROMs, efuses, flash memory (MTD), ...
> 
> In order to be able to reference hardware partitions on an eMMC, add code
> to bind each hardware partition to a specific firmware subnode.
> 
> Overall, this enables uniform handling across practically all flash
> storage types used for this purpose (MTD, UBI, and now also MMC).
> 
> As part of this series it was necessary to define a device tree schema
> for block devices and partitions on them, which (similar to how it now
> works also for UBI volumes) can be matched by one or more properties.

Since this patch series adds code that opens partitions and reads
from partitions, can that part of the functionality be implemented in
user space? There is already a mechanism for notifying user space about
block device changes, namely udev.

Thanks,

Bart.
Bart Van Assche March 22, 2024, 5:52 p.m. UTC | #5
On 3/21/24 13:22, Daniel Golle wrote:
> On Thu, Mar 21, 2024 at 12:44:19PM -0700, Bart Van Assche wrote:
>> Why add this functionality to the block layer instead of somewhere
>> in the drivers/ directory?
> 
> Simply because we need notifications about appearing and disappearing
> block devices, or a way to iterate over all block devices in a system.
> For both there isn't currently any other interface than using a
> class_interface for that, and that requires access to &block_class
> which is considered a block subsystem internal.

That's an argument for adding an interface to the block layer that
implements this functionality but not for adding this code in the block
layer.

Thanks,

Bart.
Daniel Golle March 22, 2024, 6:02 p.m. UTC | #6
On Fri, Mar 22, 2024 at 10:52:17AM -0700, Bart Van Assche wrote:
> On 3/21/24 12:31, Daniel Golle wrote:
> > On embedded devices using an eMMC it is common that one or more (hw/sw)
> > partitions on the eMMC are used to store MAC addresses and Wi-Fi
> > calibration EEPROM data.
> > 
> > Implement an NVMEM provider backed by a block device as typically the
> > NVMEM framework is used to have kernel drivers read and use binary data
> > from EEPROMs, efuses, flash memory (MTD), ...
> > 
> > In order to be able to reference hardware partitions on an eMMC, add code
> > to bind each hardware partition to a specific firmware subnode.
> > 
> > Overall, this enables uniform handling across practically all flash
> > storage types used for this purpose (MTD, UBI, and now also MMC).
> > 
> > As part of this series it was necessary to define a device tree schema
> > for block devices and partitions on them, which (similar to how it now
> > works also for UBI volumes) can be matched by one or more properties.
> 
> Since this patch series adds code that opens partitions and reads
> from partitions, can that part of the functionality be implemented in
> user space? There is already a mechanism for notifying user space about
> block device changes, namely udev.

No. Because it has to happen (e.g. for nfsroot to work) before
userland gets initiated: Without Ethernet MAC address (which is often
stored at some raw offset on a partition or hw-partition of an eMMC),
we don't have a way to use nfsroot (because that requires functional
Ethernet), hence userland won't come up. It's a circular dependency
problem which can only be addressed by making sure that everything
needed for Ethernet to come up is provided by the kernel **before**
rootfs (which can be nfsroot) is mounted.
Daniel Golle March 22, 2024, 6:07 p.m. UTC | #7
On Fri, Mar 22, 2024 at 10:49:48AM -0700, Bart Van Assche wrote:
> On 3/21/24 12:33, Daniel Golle wrote:
> > Add new flag to distinguish block devices which may act as an NVMEM
> > provider.
> > 
> > Signed-off-by: Daniel Golle <daniel@makrotopia.org>
> > ---
> >   include/linux/blkdev.h | 2 ++
> >   1 file changed, 2 insertions(+)
> > 
> > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> > index c3e8f7cf96be9..f2c4f280d7619 100644
> > --- a/include/linux/blkdev.h
> > +++ b/include/linux/blkdev.h
> > @@ -81,11 +81,13 @@ struct partition_meta_info {
> >    * ``GENHD_FL_NO_PART``: partition support is disabled.  The kernel will not
> >    * scan for partitions from add_disk, and users can't add partitions manually.
> >    *
> > + * ``GENHD_FL_NVMEM``: the block device should be considered as NVMEM provider.
> >    */
> >   enum {
> >   	GENHD_FL_REMOVABLE			= 1 << 0,
> >   	GENHD_FL_HIDDEN				= 1 << 1,
> >   	GENHD_FL_NO_PART			= 1 << 2,
> > +	GENHD_FL_NVMEM				= 1 << 3,
> >   };
> 
> What would break if this flag didn't exist?

As both MTD and UBI already act as NVMEM providers themselves, once
the user creates a ubiblock device or has CONFIG_MTD_BLOCK=y set in their
kernel configuration, we would run into problems because both the block
layer and MTD or UBI would try to be an NVMEM provider for the same
device tree node.

I initially suggested the inverse of this flag, GENHD_FL_NO_NVMEM, which
would be set only for mtdblock and ubiblock devices to opt out of acting
as NVMEM providers. However, in a previous comment [1] on the RFC it was
requested to make this opt-in instead.

[1]: https://patchwork.kernel.org/comment/25432948/
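
The opt-in behaviour described above can be pictured roughly as follows.
This is a userspace sketch, not code from the series: struct fake_disk and
disk_wants_nvmem() are hypothetical names chosen for illustration, and only
the flag values are taken from the quoted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values as they appear in the quoted include/linux/blkdev.h hunk. */
enum {
	GENHD_FL_REMOVABLE = 1 << 0,
	GENHD_FL_HIDDEN    = 1 << 1,
	GENHD_FL_NO_PART   = 1 << 2,
	GENHD_FL_NVMEM     = 1 << 3,
};

/* Hypothetical stand-in for a disk; the kernel's struct gendisk is much larger. */
struct fake_disk {
	const char *name;
	unsigned int flags;
};

/*
 * Opt-in check (assumed shape): only a disk whose driver explicitly set
 * GENHD_FL_NVMEM is considered as an NVMEM provider. Drivers that never
 * set the flag (mtdblock and ubiblock in the discussion) are skipped.
 */
bool disk_wants_nvmem(const struct fake_disk *disk)
{
	assert(disk != NULL);
	return (disk->flags & GENHD_FL_NVMEM) != 0;
}
```

With an opt-out flag the default would be reversed: every disk would be
considered unless its driver set the flag, which is the variant the earlier
RFC review asked to avoid.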
Daniel Golle March 22, 2024, 6:11 p.m. UTC | #8
On Fri, Mar 22, 2024 at 10:52:36AM -0700, Bart Van Assche wrote:
> On 3/21/24 13:22, Daniel Golle wrote:
> > On Thu, Mar 21, 2024 at 12:44:19PM -0700, Bart Van Assche wrote:
> > > Why add this functionality to the block layer instead of somewhere
> > > in the drivers/ directory?
> > 
> > Simply because we need notifications about appearing and disappearing
> > block devices, or a way to iterate over all block devices in a system.
> > For both there isn't currently any other interface than using a
> > class_interface for that, and that requires access to &block_class
> > which is considered a block subsystem internal.
> 
> That's an argument for adding an interface to the block layer that
> implements this functionality but not for adding this code in the block
> layer.

Fine with me. I can implement such an interface for the block layer,
similar to how it is implemented for MTD devices or UBI volumes.

I would basically add a subscription and callback interface utilizing
a class_interface inside the block subsystem similar to how the same
is done in this series for registering block-device-backed NVMEM
providers.
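
A subscription and callback interface of this kind could look roughly like
the following userspace model. It is only a sketch under assumptions: all
names here (struct fake_bdev, struct blk_subscriber, blk_subscribe(),
blk_notify_add(), blk_notify_remove()) are hypothetical and are not an
existing kernel API, and locking and unsubscription are left out.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SUBSCRIBERS 8

/* Hypothetical stand-in for a block device. */
struct fake_bdev {
	const char *name;
};

/* Hypothetical subscriber: callbacks run when a device appears or disappears. */
struct blk_subscriber {
	void (*add)(const struct fake_bdev *bdev, void *ctx);
	void (*remove)(const struct fake_bdev *bdev, void *ctx);
	void *ctx;
};

static const struct blk_subscriber *subscribers[MAX_SUBSCRIBERS];
static size_t nr_subscribers;

/* Register a subscriber. Returns 0 on success, -1 if the table is full. */
int blk_subscribe(const struct blk_subscriber *sub)
{
	assert(sub != NULL);
	if (nr_subscribers == MAX_SUBSCRIBERS)
		return -1;
	subscribers[nr_subscribers++] = sub;
	return 0;
}

/* Called by the (modelled) block core when a device has been added. */
void blk_notify_add(const struct fake_bdev *bdev)
{
	for (size_t i = 0; i < nr_subscribers; i++)
		if (subscribers[i]->add)
			subscribers[i]->add(bdev, subscribers[i]->ctx);
}

/* Called by the (modelled) block core when a device is about to go away. */
void blk_notify_remove(const struct fake_bdev *bdev)
{
	for (size_t i = 0; i < nr_subscribers; i++)
		if (subscribers[i]->remove)
			subscribers[i]->remove(bdev, subscribers[i]->ctx);
}

/* Example subscriber callbacks: keep a count of currently present devices. */
void count_add(const struct fake_bdev *bdev, void *ctx)
{
	(void)bdev;
	(*(int *)ctx)++;
}

void count_remove(const struct fake_bdev *bdev, void *ctx)
{
	(void)bdev;
	(*(int *)ctx)--;
}
```

An NVMEM provider living outside the block layer would then be one such
subscriber, registering a provider in its add callback and tearing it down
in its remove callback.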

However, given that this is a bigger task, I'd like to know from more
than one block subsystem maintainer that this approach would be
agreeable before spending time and effort in this direction.

Also note that obviously it would be much more intrusive and affect
*all* users of the block subsystem, while the current approach would
only affect those users who have CONFIG_BLOCK_NVMEM enabled.
Bart Van Assche March 22, 2024, 7:19 p.m. UTC | #9
On 3/22/24 11:02, Daniel Golle wrote:
> On Fri, Mar 22, 2024 at 10:52:17AM -0700, Bart Van Assche wrote:
>> On 3/21/24 12:31, Daniel Golle wrote:
>>> On embedded devices using an eMMC it is common that one or more (hw/sw)
>>> partitions on the eMMC are used to store MAC addresses and Wi-Fi
>>> calibration EEPROM data.
>>>
>>> Implement an NVMEM provider backed by a block device as typically the
>>> NVMEM framework is used to have kernel drivers read and use binary data
>>> from EEPROMs, efuses, flash memory (MTD), ...
>>>
>>> In order to be able to reference hardware partitions on an eMMC, add code
>>> to bind each hardware partition to a specific firmware subnode.
>>>
>>> Overall, this enables uniform handling across practically all flash
>>> storage types used for this purpose (MTD, UBI, and now also MMC).
>>>
>>> As part of this series it was necessary to define a device tree schema
>>> for block devices and partitions on them, which (similar to how it now
>>> works also for UBI volumes) can be matched by one or more properties.
>>
>> Since this patch series adds code that opens partitions and reads
>> from partitions, can that part of the functionality be implemented in
>> user space? There is already a mechanism for notifying user space about
>> block device changes, namely udev.
> 
> No. Because it has to happen (e.g. for nfsroot to work) before
> userland gets initiated: Without Ethernet MAC address (which is often
> stored at some raw offset on a partition or hw-partition of an eMMC),
> we don't have a way to use nfsroot (because that requires functional
> Ethernet), hence userland won't come up. It's a circular dependency
> problem which can only be addressed by making sure that everything
> needed for Ethernet to come up is provided by the kernel **before**
> rootfs (which can be nfsroot) is mounted.

How about the initial RAM disk? I think that's where code that reads
calibration data from local storage belongs.

Thanks,

Bart.
Bart Van Assche March 22, 2024, 7:22 p.m. UTC | #10
On 3/22/24 11:07, Daniel Golle wrote:
> On Fri, Mar 22, 2024 at 10:49:48AM -0700, Bart Van Assche wrote:
>> On 3/21/24 12:33, Daniel Golle wrote:
>>>    enum {
>>>    	GENHD_FL_REMOVABLE			= 1 << 0,
>>>    	GENHD_FL_HIDDEN				= 1 << 1,
>>>    	GENHD_FL_NO_PART			= 1 << 2,
>>> +	GENHD_FL_NVMEM				= 1 << 3,
>>>    };
>>
>> What would break if this flag didn't exist?
> 
> As both MTD and UBI already act as NVMEM providers themselves, once
> the user creates a ubiblock device or has CONFIG_MTD_BLOCK=y set in their
> kernel configuration, we would run into problems because both the block
> layer and MTD or UBI would try to be an NVMEM provider for the same
> device tree node.

Why would both MTD and UBI try to be an NVMEM provider for the same
device tree node? Why can't this patch series be implemented such that
a partition UUID occurs in the device tree and such that other code
scans for that partition UUID?

Thanks,

Bart.
Rob Herring (Arm) March 25, 2024, 3:10 p.m. UTC | #11
On Thu, Mar 21, 2024 at 07:31:48PM +0000, Daniel Golle wrote:
> On embedded devices using an eMMC it is common that one or more (hw/sw)
> partitions on the eMMC are used to store MAC addresses and Wi-Fi
> calibration EEPROM data.
> 
> Implement an NVMEM provider backed by a block device as typically the
> NVMEM framework is used to have kernel drivers read and use binary data
> from EEPROMs, efuses, flash memory (MTD), ...
> 
> In order to be able to reference hardware partitions on an eMMC, add code
> to bind each hardware partition to a specific firmware subnode.
> 
> Overall, this enables uniform handling across practically all flash
> storage types used for this purpose (MTD, UBI, and now also MMC).
> 
> As part of this series it was necessary to define a device tree schema
> for block devices and partitions on them, which (similar to how it now
> works also for UBI volumes) can be matched by one or more properties.
> 
> ---
> This series has previously been submitted as RFC on July 19th 2023[1]
> and most of the basic idea did not change since. Another round of RFC
> was submitted on March 5th 2024[2] which has received overall positive
> feedback and only minor corrections have been done since (see
> changelog below).

I don't recall giving positive feedback.

I still think this should use offsets rather than partition specific 
information. Not wanting to have to update the offsets if they change is 
not reason enough to not use them.

Rob
Rob Herring (Arm) March 25, 2024, 3:12 p.m. UTC | #12
On Thu, Mar 21, 2024 at 07:31:48PM +0000, Daniel Golle wrote:
> On embedded devices using an eMMC it is common that one or more (hw/sw)
> partitions on the eMMC are used to store MAC addresses and Wi-Fi
> calibration EEPROM data.
> 
> Implement an NVMEM provider backed by a block device as typically the
> NVMEM framework is used to have kernel drivers read and use binary data
> from EEPROMs, efuses, flash memory (MTD), ...
> 
> In order to be able to reference hardware partitions on an eMMC, add code
> to bind each hardware partition to a specific firmware subnode.
> 
> Overall, this enables uniform handling across practically all flash
> storage types used for this purpose (MTD, UBI, and now also MMC).
> 
> As part of this series it was necessary to define a device tree schema
> for block devices and partitions on them, which (similar to how it now
> works also for UBI volumes) can be matched by one or more properties.
> 
> ---
> This series has previously been submitted as RFC on July 19th 2023[1]
> and most of the basic idea did not change since. Another round of RFC
> was submitted on March 5th 2024[2] which has received overall positive
> feedback and only minor corrections have been done since (see
> changelog below).

Also, please version your patches. 'RFC' is a tag, not a version. v1 was
July. v2 was March 5th. This is v3.

Rob
Daniel Golle March 25, 2024, 3:38 p.m. UTC | #13
On Mon, Mar 25, 2024 at 10:10:46AM -0500, Rob Herring wrote:
> On Thu, Mar 21, 2024 at 07:31:48PM +0000, Daniel Golle wrote:
> > On embedded devices using an eMMC it is common that one or more (hw/sw)
> > partitions on the eMMC are used to store MAC addresses and Wi-Fi
> > calibration EEPROM data.
> > 
> > Implement an NVMEM provider backed by a block device as typically the
> > NVMEM framework is used to have kernel drivers read and use binary data
> > from EEPROMs, efuses, flash memory (MTD), ...
> > 
> > In order to be able to reference hardware partitions on an eMMC, add code
> > to bind each hardware partition to a specific firmware subnode.
> > 
> > Overall, this enables uniform handling across practically all flash
> > storage types used for this purpose (MTD, UBI, and now also MMC).
> > 
> > As part of this series it was necessary to define a device tree schema
> > for block devices and partitions on them, which (similar to how it now
> > works also for UBI volumes) can be matched by one or more properties.
> > 
> > ---
> > This series has previously been submitted as RFC on July 19th 2023[1]
> > and most of the basic idea did not change since. Another round of RFC
> > was submitted on March 5th 2024[2] which has received overall positive
> > feedback and only minor corrections have been done since (see
> > changelog below).
> 
> I don't recall giving positive feedback.
> 
> I still think this should use offsets rather than partition specific 
> information. Not wanting to have to update the offsets if they change is 
> not reason enough to not use them.

Using raw offsets on the block device (rather than the partition)
won't work for most existing devices and boot firmware out there. They
always reference the partition, usually by the name of a GPT
partition (but sometimes also PARTUUID or even PARTNO) which is then
used in the exact same way as an MTD partition or UBI volume would be
on devices with NOR or NAND flash. Just on eMMC we usually use a GPT
or MBR partition table rather than defining partitions in DT or cmdline,
which is rather rare (for historic reasons, I suppose, but it is what it
is now).

Depending on the eMMC chip used, that partition may not even be at the
same offset for different batches of the same device and hence I'd
like to just do it in the same way vendor firmware does it as well.
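
To illustrate why a partition-relative reference survives layout changes
while an absolute disk offset does not, here is a small userspace sketch.
It is not code from this series: struct part_entry, find_part() and
part_to_disk_offset() are hypothetical names, and the table is a simplified
model of what a GPT or MBR parser would produce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified partition table entry. */
struct part_entry {
	const char *name;
	uint64_t start;	/* byte offset of the partition on the disk */
	uint64_t size;	/* partition size in bytes */
};

/* Look a partition up by name; returns NULL if no entry matches. */
const struct part_entry *find_part(const struct part_entry *tbl, size_t n,
				   const char *name)
{
	assert(tbl != NULL && name != NULL);
	for (size_t i = 0; i < n; i++)
		if (strcmp(tbl[i].name, name) == 0)
			return &tbl[i];
	return NULL;
}

/*
 * Translate (partition name, offset inside the partition, length) into an
 * absolute disk offset. Returns 0 on success, -1 if the partition does not
 * exist or the requested range does not fit inside it.
 */
int part_to_disk_offset(const struct part_entry *tbl, size_t n,
			const char *name, uint64_t rel, uint64_t len,
			uint64_t *abs_off)
{
	const struct part_entry *p = find_part(tbl, n, name);

	if (!p || rel > p->size || len > p->size - rel)
		return -1;
	*abs_off = p->start + rel;
	return 0;
}
```

Two batches of the same board with differently sized preceding partitions
would resolve the same (name, relative offset) pair to different absolute
offsets, so a description keyed on the partition stays valid for both
while a fixed disk offset would have to be adjusted per batch.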

Chad of Adtran has previously confirmed that [1], which was the
positive feedback I was referring to. Other vendors like GL-iNet or
Netgear are doing the exact same thing.

As of now, we support this in OpenWrt by adding a lot of
board-specific knowledge to userland, which is ugly and also prevents
using things like PXE-initiated nfsroot on those devices.

The purpose of this series is to be able to properly support such devices
(i.e. practically all consumer-grade routers out there using an eMMC for
storing firmware).

Also, those devices have enough resources to run a general purpose
distribution like Debian instead of OpenWrt, and all the userland
hacks to set MAC addresses and extract Wi-Fi EEPROM data in
board-specific ways will most certainly never find their way into
Debian. It's just not how embedded Linux works, unless you are looking
only at the Raspberry Pi, which has that data stored in a text file
which is shipped by the distribution -- something very weird and very
different from literally all off-the-shelf routers, access points or
switches I have ever seen (and I've seen many). Maybe Felix, who has
seen even more of them, can tell us more about that.


[1]: https://patchwork.kernel.org/project/linux-block/patch/f70bb480aef6f55228a25ce20ff0e88e670e1b70.1709667858.git.daniel@makrotopia.org/#25756072
Daniel Golle March 25, 2024, 3:46 p.m. UTC | #14
On Mon, Mar 25, 2024 at 10:12:59AM -0500, Rob Herring wrote:
> On Thu, Mar 21, 2024 at 07:31:48PM +0000, Daniel Golle wrote:
> > On embedded devices using an eMMC it is common that one or more (hw/sw)
> > partitions on the eMMC are used to store MAC addresses and Wi-Fi
> > calibration EEPROM data.
> > 
> > Implement an NVMEM provider backed by a block device as typically the
> > NVMEM framework is used to have kernel drivers read and use binary data
> > from EEPROMs, efuses, flash memory (MTD), ...
> > 
> > In order to be able to reference hardware partitions on an eMMC, add code
> > to bind each hardware partition to a specific firmware subnode.
> > 
> > Overall, this enables uniform handling across practically all flash
> > storage types used for this purpose (MTD, UBI, and now also MMC).
> > 
> > As part of this series it was necessary to define a device tree schema
> > for block devices and partitions on them, which (similar to how it now
> > works also for UBI volumes) can be matched by one or more properties.
> > 
> > ---
> > This series has previously been submitted as RFC on July 19th 2023[1]
> > and most of the basic idea did not change since. Another round of RFC
> > was submitted on March 5th 2024[2] which has received overall positive
> > feedback and only minor corrections have been done since (see
> > changelog below).
> 
> Also, please version your patches. 'RFC' is a tag, not a version. v1 was
> July. v2 was March 5th. This is v3.

According to "Submitting patches: the essential guide to getting your
code into the kernel" [1] a version is also a tag.

Quote:
 Common tags might include a version descriptor if the [sic] multiple
 versions of the patch have been sent out in response to comments
 (i.e., “v1, v2, v3”), or “RFC” to indicate a request for comments.

Maybe this should be clarified, exclusive or inclusive "or" is up to
the reader to interpret at this point, and I've often seen RFC, RFCv2,
v1, v2, ... as a sequence of tags applied for the same series, which
is why I followed what I used to believe was the most common
interpretation of the guidelines.

In any case, thank you for pointing it out, I assume the next iteration
should then be v4.

[1]: https://docs.kernel.org/process/submitting-patches.html
Rob Herring (Arm) March 26, 2024, 8:24 p.m. UTC | #15
+boot-architecture list

On Mon, Mar 25, 2024 at 03:38:19PM +0000, Daniel Golle wrote:
> On Mon, Mar 25, 2024 at 10:10:46AM -0500, Rob Herring wrote:
> > On Thu, Mar 21, 2024 at 07:31:48PM +0000, Daniel Golle wrote:
> > > On embedded devices using an eMMC it is common that one or more (hw/sw)
> > > partitions on the eMMC are used to store MAC addresses and Wi-Fi
> > > calibration EEPROM data.
> > > 
> > > Implement an NVMEM provider backed by a block device as typically the
> > > NVMEM framework is used to have kernel drivers read and use binary data
> > > from EEPROMs, efuses, flash memory (MTD), ...
> > > 
> > > In order to be able to reference hardware partitions on an eMMC, add code
> > > to bind each hardware partition to a specific firmware subnode.
> > > 
> > > Overall, this enables uniform handling across practically all flash
> > > storage types used for this purpose (MTD, UBI, and now also MMC).
> > > 
> > > As part of this series it was necessary to define a device tree schema
> > > for block devices and partitions on them, which (similar to how it now
> > > works also for UBI volumes) can be matched by one or more properties.
> > > 
> > > ---
> > > This series has previously been submitted as RFC on July 19th 2023[1]
> > > and most of the basic idea did not change since. Another round of RFC
> > > was submitted on March 5th 2024[2] which has received overall positive
> > > feedback and only minor corrections have been done since (see
> > > changelog below).
> > 
> > I don't recall giving positive feedback.
> > 
> > I still think this should use offsets rather than partition specific 
> > information. Not wanting to have to update the offsets if they change is 
> > not reason enough to not use them.
> 
> Using raw offsets on the block device (rather than the partition)
> won't work for most existing devices and boot firmware out there. They
> always reference the partition, usually by the name of a GPT
> partition (but sometimes also PARTUUID or even PARTNO) which is then
> used in the exact same way as an MTD partition or UBI volume would be
> on devices with NOR or NAND flash.

MTD normally uses offsets, which is why I'd like some alignment. UBI is 
special because raw NAND is, well, special.

> Just on eMMC we usually use a GPT
> or MBR partition table rather than defining partitions in DT or cmdline,
> which is rather rare (for historic reasons, I suppose, but it is what it
> is now).

Yes, I understand how eMMC works. I don't understand why if you have 
part #, uuid, or name you can't get to the offset or vice-versa. You 
need only 1 piece of identification to map partition table entries to DT 
nodes. Sure, offsets can change, but surely the firmware can handle 
adjusting the DT? 

An offset would also work for the case of random firmware data on the 
disk that may or may not have a partition associated with it. There are 
certainly cases of that. I don't think we have much of a solution for 
that other than trying to educate vendors to not do that or OS 
installers only supporting installing to something other than eMMC. This 
is something EBBR[1] is trying to address.

> Depending on the eMMC chip used, that partition may not even be at the
> same offset for different batches of the same device and hence I'd
> like to just do it in the same way vendor firmware does it as well.

Often vendor firmware is not a model to follow...

> Chad of Adtran has previously confirmed that [1], which was the
> positive feedback I was referring to. Other vendors like GL-iNet or
> Netgear are doing the exact same thing.
> 
> As of now, we support this in OpenWrt by adding a lot of
> board-specific knowledge to userland, which is ugly and also prevents
> using things like PXE-initiated nfsroot on those devices.
> 
> The purpose of this series is to be able to properly support such devices
> (ie. practially all consumer-grade routers out there using an eMMC for
> storing firmware).
> 
> Also, those devices have enough resources to run a general purpose
> distribution like Debian instead of OpenWrt, and all the userland
> hacks to set MAC addresses and extract WiFi-EEPROM-data in a
> board-specific ways will most certainly never find their way into
> Debian. It's just not how embedded Linux works, unless you are looking
> only at the RaspberryPi which got that data stored in a textfile
> which is shipped by the distribution -- something very weird and very
> different from literally all off-the-shelf routers, access-points or
> switches I have ever seen (and I've seen many). Maybe Felix who has
> seen even more of them can tell us more about that.

General purpose distros want to partition the disk themselves. Adding 
anything to the DT for disk partitions would require the installer to be 
aware of it. There's various distro folks on the boot-arch list, so 
maybe one of them can comment.

Rob

[1] https://arm-software.github.io/ebbr/index.html#document-chapter4-firmware-media
Daniel Golle March 26, 2024, 9:28 p.m. UTC | #16
Hi Rob,

On Tue, Mar 26, 2024 at 03:24:49PM -0500, Rob Herring wrote:
> +boot-architecture list

Good idea, thank you :)

> 
> On Mon, Mar 25, 2024 at 03:38:19PM +0000, Daniel Golle wrote:
> > On Mon, Mar 25, 2024 at 10:10:46AM -0500, Rob Herring wrote:
> > > On Thu, Mar 21, 2024 at 07:31:48PM +0000, Daniel Golle wrote:
> > > > On embedded devices using an eMMC it is common that one or more (hw/sw)
> > > > partitions on the eMMC are used to store MAC addresses and Wi-Fi
> > > > calibration EEPROM data.
> > > > 
> > > > Implement an NVMEM provider backed by a block device as typically the
> > > > NVMEM framework is used to have kernel drivers read and use binary data
> > > > from EEPROMs, efuses, flash memory (MTD), ...
> > > > 
> > > > In order to be able to reference hardware partitions on an eMMC, add code
> > > > to bind each hardware partition to a specific firmware subnode.
> > > > 
> > > > Overall, this enables uniform handling across practically all flash
> > > > storage types used for this purpose (MTD, UBI, and now also MMC).
> > > > 
> > > > As part of this series it was necessary to define a device tree schema
> > > > for block devices and partitions on them, which (similar to how it now
> > > > works also for UBI volumes) can be matched by one or more properties.
> > > > 
> > > > ---
> > > > This series has previously been submitted as RFC on July 19th 2023[1]
> > > > and most of the basic idea did not change since. Another round of RFC
> > > > was submitted on March 5th 2024[2] which has received overall positive
> > > > feedback and only minor corrections have been done since (see
> > > > changelog below).
> > > 
> > > I don't recall giving positive feedback.
> > > 
> > > I still think this should use offsets rather than partition specific 
> > > information. Not wanting to have to update the offsets if they change is 
> > > not reason enough to not use them.
> > 
> > Using raw offsets on the block device (rather than the partition)
> > won't work for most existing devices and boot firmware out there. They
> > always reference the partition, usually by the name of a GPT
> > partition (but sometimes also PARTUUID or even PARTNO) which is then
> > used in the exact same way as an MTD partition or UBI volume would be
> > on devices with NOR or NAND flash.
> 
> MTD normally uses offsets hence why I'd like some alignment. UBI is 
> special because raw NAND is, well, special.

I get the point and in a way this is also already intended and
supported by this series. You can already just add an 'nvmem-layout'
node directly to a disk device rather than to a partition and define a
layout in this way.

Making this useful in practice will require some improvements to the
nvmem system in Linux though, because that currently uses signed 32-bit
integers as addresses which is not sufficient for the size of the
user-part of an eMMC. However, that needs to be done then and should
of course not be read as an excuse.
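
To put a number on that limit, here is a minimal userspace sketch (not
kernel code). It assumes only what is said above, namely that addresses
are signed 32-bit integers; the helper name fits_signed_32bit() is made
up for illustration and does not exist in the nvmem core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A signed 32-bit address tops out at 2 GiB - 1. */
static_assert(INT32_MAX == 2147483647, "int32_t is 32 bits wide");

/* Largest byte offset expressible as a signed 32-bit integer. */
#define SKETCH_MAX_S32_OFFSET ((uint64_t)INT32_MAX)

/*
 * Hypothetical helper: returns true if every byte of a device of the
 * given size can be addressed with a signed 32-bit offset.
 */
static bool fits_signed_32bit(uint64_t device_size_bytes)
{
	if (device_size_bytes == 0)
		return true;

	/* The last byte lives at offset size - 1. */
	return device_size_bytes - 1 <= SKETCH_MAX_S32_OFFSET;
}
```

Under these assumptions a boot hardware partition of a few megabytes
fits easily, while an 8 GiB user area does not: anything beyond offset
2 GiB - 1 cannot be expressed.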

> 
> > Just on eMMC we usually use a GPT
> > or MBR partition table rather than defining partitions in DT or cmdline,
> > which is rather rare (for historic reasons, I suppose, but it is what it
> > is now).
> 
> Yes, I understand how eMMC works. I don't understand why if you have 
> part #, uuid, or name you can't get to the offset or vice-versa. You 
> need only 1 piece of identification to map partition table entries to DT 
> nodes.

Yes, either of them (or a combination) is fine. In practice I've mostly
seen PARTNAME as identifier used in userland scripts, and only adding
this for now will probably cover most devices (and existing boot firmware)
out there. Notable exceptions are devices which are using MBR partitions
because the BootROM expects the bootloader to be at the same block as
we would usually have the primary GPT. In this case we can only use the
PARTNO, of course, and it stinks.
MediaTek's MT7623A/N is such an example, but it's a slightly outdated
and pretty weird niche SoC I admit.
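
The matching by PARTNAME, PARTUUID or PARTNO (or a combination) can be
sketched roughly as below. This is a userspace illustration only, not
the code from this series: the structs and function names are
hypothetical, and plain strcmp() comparison of the strings is an
assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one partition table entry (GPT or MBR). */
struct sketch_part {
	int partno;            /* 1-based partition number */
	const char *partname;  /* NULL on MBR, which has no names */
	const char *partuuid;  /* NULL if not available */
};

/* Hypothetical match criteria; unset fields (0 / NULL) are ignored. */
struct sketch_match {
	int partno;
	const char *partname;
	const char *partuuid;
};

/* An unset criterion matches anything; a set one needs an exact match. */
static bool sketch_str_matches(const char *want, const char *have)
{
	if (!want)
		return true;
	return have && strcmp(want, have) == 0;
}

/* All criteria that are set must match; at least one must be set. */
static bool sketch_part_matches(const struct sketch_match *m,
				const struct sketch_part *p)
{
	assert(m && p);

	if (!m->partno && !m->partname && !m->partuuid)
		return false;
	if (m->partno && m->partno != p->partno)
		return false;
	return sketch_str_matches(m->partname, p->partname) &&
	       sketch_str_matches(m->partuuid, p->partuuid);
}
```

On an MBR disk only the partno criterion can ever match in this sketch,
which is the unfortunate case described above.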

> Sure, offsets can change, but surely the firmware can handle 
> adjusting the DT? 

Future firmware may be able to do this, of course. Current existing
firmware already out there on devices such as the quite popular
GL.iNet MT-6000, Netgear's Orbi and Orbi Pro series as well as all
Adtran SmartRG devices does not. Updating or changing the boot
firmware of devices already out there is not intended and quite
challenging, and will make the device incompatible with its vendor
firmware. Hence it would be better to support replacing only the
Linux-based firmware (eg. with OpenWrt or even Debian or any
general-purpose Linux, the eMMC is large enough...) while not having
to touch the boot firmware (and risk bricking the device if that
goes wrong).

Personally, I'm rather burdened and unhappy with vendor attempts to
have the boot firmware mess around too much in (highly customized,
downstream) DT. It may look like a good solution at the moment, but
can totally become an obstacle in an unpredictable future (no offense
ASUS...)

> 
> An offset would also work for the case of random firmware data on the 
> disk that may or may not have a partition associated with it. There are 
> certainly cases of that. I don't think we have much of a solution for 
> that other than trying to educate vendors to not do that or OS 
> installers only supporting installing to something other than eMMC. This 
> is something EBBR[1] is trying to address.

Absolutely. Actually *early* GL-iNet devices did exactly that: Use the
eMMC boot hw-partitions to store boot firmware as well as MAC
addresses and potentially also Wi-Fi calibration data.

The MT-2500 is the example I'm aware of, and I have one sitting on my desk
for testing with this very series (which also allows referencing eMMC
hardware partitions, see "[7/8] mmc: block: set fwnode of disk
devices").
Unfortunately, later devices such as the flagship MT-6000 moved MAC
addresses and WiFi-EEPROMs into a GPT partition on the user-part of
the eMMC.

> 
> > Depending on the eMMC chip used, that partition may not even be at the
> > same offset for different batches of the same device and hence I'd
> > like to just do it in the same way vendor firmware does it as well.
> 
> Often vendor firmware is not a model to follow...

I totally agree. However, I don't see a good reason for not supporting
those network-appliance-type embedded devices which even ship with
(outdated, downstream) Linux by default while going to great
lengths for things like broken ACPI tables in many laptops which
require lots of work-arounds to have features like suspend-to-disk
working, or even to be able to run Linux at all.

> 
> > Chad of Adtran has previously confirmed that [1], which was the
> > positive feedback I was referring to. Other vendors like GL-iNet or
> > Netgear are doing the exact same thing.
> > 
> > As of now, we support this in OpenWrt by adding a lot of
> > board-specific knowledge to userland, which is ugly and also prevents
> > using things like PXE-initiated nfsroot on those devices.
> > 
> > The purpose of this series is to be able to properly support such devices
> > (ie. practically all consumer-grade routers out there using an eMMC for
> > storing firmware).
> > 
> > Also, those devices have enough resources to run a general purpose
> > distribution like Debian instead of OpenWrt, and all the userland
> > hacks to set MAC addresses and extract WiFi-EEPROM-data in a
> > board-specific ways will most certainly never find their way into
> > Debian. It's just not how embedded Linux works, unless you are looking
> > only at the RaspberryPi which got that data stored in a textfile
> > which is shipped by the distribution -- something very weird and very
> > different from literally all off-the-shelf routers, access-points or
> > switches I have ever seen (and I've seen many). Maybe Felix who has
> > seen even more of them can tell us more about that.
> 
> General purpose distros want to partition the disk themselves. Adding 
> anything to the DT for disk partitions would require the installer to be 
> aware of it. There's various distro folks on the boot-arch list, so 
> maybe one of them can comment.

Usually installers already know not to touch partitions whose purpose
they don't know. Repartitioning the disk from scratch is not what
(modern) distributions do; at least the EFI System partition is kept,
as well as the typical rescue/recovery partitions many vendors put on
their (Windows, Mac) laptops to allow a "factory reset".

Installers usually offer to replace (or resize) the "large" partition
used by the currently installed OS instead.

And well, the DT reference to a partition holding e.g. MAC addresses
does make the installer aware of it, obviously.


Thank you for the constructive debate!


Cheers


Daniel


> 
> Rob
> 
> [1] https://arm-software.github.io/ebbr/index.html#document-chapter4-firmware-media
Rob Herring (Arm) March 27, 2024, 12:33 p.m. UTC | #17
On Tue, Mar 26, 2024 at 4:29 PM Daniel Golle <daniel@makrotopia.org> wrote:
>
> Hi Rob,
>
> On Tue, Mar 26, 2024 at 03:24:49PM -0500, Rob Herring wrote:
> > +boot-architecture list
>
> Good idea, thank you :)

Now really adding it. :(

Will reply to rest later.

> >
> > On Mon, Mar 25, 2024 at 03:38:19PM +0000, Daniel Golle wrote:
> > > On Mon, Mar 25, 2024 at 10:10:46AM -0500, Rob Herring wrote:
> > > > On Thu, Mar 21, 2024 at 07:31:48PM +0000, Daniel Golle wrote:
> > > > > On embedded devices using an eMMC it is common that one or more (hw/sw)
> > > > > partitions on the eMMC are used to store MAC addresses and Wi-Fi
> > > > > calibration EEPROM data.
> > > > >
> > > > > Implement an NVMEM provider backed by a block device as typically the
> > > > > NVMEM framework is used to have kernel drivers read and use binary data
> > > > > from EEPROMs, efuses, flash memory (MTD), ...
> > > > >
> > > > > In order to be able to reference hardware partitions on an eMMC, add code
> > > > > to bind each hardware partition to a specific firmware subnode.
> > > > >
> > > > > Overall, this enables uniform handling across practically all flash
> > > > > storage types used for this purpose (MTD, UBI, and now also MMC).
> > > > >
> > > > > As part of this series it was necessary to define a device tree schema
> > > > > for block devices and partitions on them, which (similar to how it now
> > > > > works also for UBI volumes) can be matched by one or more properties.
> > > > >
> > > > > ---
> > > > > This series has previously been submitted as RFC on July 19th 2023[1]
> > > > > and most of the basic idea did not change since. Another round of RFC
> > > > > was submitted on March 5th 2024[2] which has received overall positive
> > > > > feedback and only minor corrections have been done since (see
> > > > > changelog below).
> > > >
> > > > I don't recall giving positive feedback.
> > > >
> > > > I still think this should use offsets rather than partition specific
> > > > information. Not wanting to have to update the offsets if they change is
> > > > not reason enough to not use them.
> > >
> > > Using raw offsets on the block device (rather than the partition)
> > > won't work for most existing devices and boot firmware out there. They
> > > always reference the partition, usually by the name of a GPT
> > > partition (but sometimes also PARTUUID or even PARTNO) which is then
> > > used in the exact same way as an MTD partition or UBI volume would be
> > > on devices with NOR or NAND flash.
> >
> > MTD normally uses offsets hence why I'd like some alignment. UBI is
> > special because raw NAND is, well, special.
>
> I get the point and in a way this is also already intended and
> supported by this series. You can already just add an 'nvmem-layout'
> node directly to a disk device rather than to a partition and define a
> layout in this way.
>
> Making this useful in practice will require some improvements to the
> nvmem system in Linux though, because that currently uses signed 32-bit
> integers as addresses which is not sufficient for the size of the
> user-part of an eMMC. However, that needs to be done then and should
> of course not be read as an excuse.
>
> >
> > > Just on eMMC we usually use a GPT
> > > or MBR partition table rather than defining partitions in DT or cmdline,
> > > which is rather rare (for historic reasons, I suppose, but it is what it
> > > is now).
> >
> > Yes, I understand how eMMC works. I don't understand why if you have
> > part #, uuid, or name you can't get to the offset or vice-versa. You
> > need only 1 piece of identification to map partition table entries to DT
> > nodes.
>
> Yes, either of them (or a combination) is fine. In practice I've mostly
> seen PARTNAME as identifier used in userland scripts, and only adding
> this for now will probably cover most devices (and existing boot firmware)
> out there. Notable exceptions are devices which are using MBR partitions
> because the BootROM expects the bootloader to be at the same block as
> we would usually have the primary GPT. In this case we can only use the
> PARTNO, of course, and it stinks.
> MediaTek's MT7623A/N is such an example, but it's a slightly outdated
> and pretty weird niche SoC I admit.
>
> > Sure, offsets can change, but surely the firmware can handle
> > adjusting the DT?
>
> Future firmware may be able to do this, of course. Current existing
> firmware already out there on devices such as the quite popular
> GL.iNet MT-6000, Netgear's Orbi and Orbi Pro series as well as all
> Adtran SmartRG devices does not. Updating or changing the boot
> firmware of devices already out there is not intended and quite
> challenging, and will make the device incompatible with its vendor
> firmware. Hence it would be better to support replacing only the
> Linux-based firmware (eg. with OpenWrt or even Debian or any
> general-purpose Linux, the eMMC is large enough...) while not having
> to touch the boot firmware (and risking to brick the device if that
> goes wrong).
>
> Personally, I'm rather burdened and unhappy with vendor attempts to
> have the boot firmware mess around too much in (highly customized,
> downstream) DT, it may look like a good solution at the moment, but
> can totally become an obstacle in an unpredictable future (no offense
> ASUS...)
>
> >
> > An offset would also work for the case of random firmware data on the
> > disk that may or may not have a partition associated with it. There are
> > certainly cases of that. I don't think we have much of a solution for
> > that other than trying to educate vendors to not do that or OS
> > installers only supporting installing to something other than eMMC. This
> > is something EBBR[1] is trying to address.
>
> Absolutely. Actually *early* GL-iNet devices did exactly that: Use the
> eMMC boot hw-partitions to store boot firmware as well as MAC
> addresses and potentially also Wi-Fi calibration data.
>
> The MT-2500 is the example I'm aware of and got sitting on my desk for
> testing with this very series (which allows to also reference eMMC
> hardware partitions, see "[7/8] mmc: block: set fwnode of disk
> devices").
> Unfortunately, later devices such as the flagship MT-6000 moved MAC
> addresses and WiFi-EEPROMs into a GPT partition on the user-part of
> the eMMC.
>
> >
> > > Depending on the eMMC chip used, that partition may not even be at the
> > > same offset for different batches of the same device and hence I'd
> > > like to just do it in the same way vendor firmware does it as well.
> >
> > Often vendor firmware is not a model to follow...
>
> I totally agree. However, I don't see a good reason for not supporting
> those network-appliance-type embedded devices which even ship with
> (outdated, downstream) Linux by default while going through great
> lengths for things like broken ACPI tables in many laptops which
> require lots of work-arounds to have features like suspend-to-disk
> working, or even be able to run Linux at all.
>
> >
> > > Chad of Adtran has previously confirmed that [1], which was the
> > > positive feedback I was referring to. Other vendors like GL-iNet or
> > > Netgear are doing the exact same thing.
> > >
> > > As of now, we support this in OpenWrt by adding a lot of
> > > board-specific knowledge to userland, which is ugly and also prevents
> > > using things like PXE-initiated nfsroot on those devices.
> > >
> > > The purpose of this series is to be able to properly support such devices
> > > (ie. practically all consumer-grade routers out there using an eMMC for
> > > storing firmware).
> > >
> > > Also, those devices have enough resources to run a general purpose
> > > distribution like Debian instead of OpenWrt, and all the userland
> > > hacks to set MAC addresses and extract WiFi-EEPROM-data in a
> > > board-specific ways will most certainly never find their way into
> > > Debian. It's just not how embedded Linux works, unless you are looking
> > > only at the RaspberryPi which got that data stored in a textfile
> > > which is shipped by the distribution -- something very weird and very
> > > different from literally all off-the-shelf routers, access-points or
> > > switches I have ever seen (and I've seen many). Maybe Felix who has
> > > seen even more of them can tell us more about that.
> >
> > General purpose distros want to partition the disk themselves. Adding
> > anything to the DT for disk partitions would require the installer to be
> > aware of it. There's various distro folks on the boot-arch list, so
> > maybe one of them can comment.
>
> Usually the installers are already aware to not touch partitions when
> unaware of their purpose. Repartitioning the disk from scratch is not
> what (modern) distributions are doing, at least the EFI System
> partition is kept, as well as typical rescue/recovery partitions many
> vendors put on their (Windows, Mac) laptops to allow to "factory
> reset" them.
>
> Installers usually offer to replace (or resize) the "large" partition
> used by the currently installed OS instead.
>
> And well, the DT reference to a partition holding e.g. MAC addresses
> does make the installer aware of it, obviously.
>
>
> Thank you for the constructive debate!
>
>
> Cheers
>
>
> Daniel
>
>
> >
> > Rob
> >
> > [1] https://arm-software.github.io/ebbr/index.html#document-chapter4-firmware-media
Daniel Golle April 18, 2024, 10:51 p.m. UTC | #18
On Fri, Mar 22, 2024 at 12:22:32PM -0700, Bart Van Assche wrote:
> On 3/22/24 11:07, Daniel Golle wrote:
> > On Fri, Mar 22, 2024 at 10:49:48AM -0700, Bart Van Assche wrote:
> > > On 3/21/24 12:33, Daniel Golle wrote:
> > > >    enum {
> > > >    	GENHD_FL_REMOVABLE			= 1 << 0,
> > > >    	GENHD_FL_HIDDEN				= 1 << 1,
> > > >    	GENHD_FL_NO_PART			= 1 << 2,
> > > > +	GENHD_FL_NVMEM				= 1 << 3,
> > > >    };
> > > 
> > > What would break if this flag wouldn't exist?
> > 
> > As both, MTD and UBI already act as NVMEM providers themselves, once
> > the user creates a ubiblock device or got CONFIG_MTD_BLOCK=y set in their
> > kernel configuration, we would run into problems because both, the block
> > layer as well as MTD or UBI would try to be an NVMEM provider for the same
> > device tree node.
> 
> Why would both MTD and UBI try to be an NVMEM provider for the same
> device tree node?

I didn't mean that both MTD and UBI would **simultaneously** try to act
as NVMEM providers for the same device tree node. What I meant was
that either of them can act as an NVMEM provider while at the same time
also providing an emulated block device (mtdblock xor ubiblock).

Hence those emulated block devices will have to be excluded from acting
as NVMEM providers. In this patch I suggest to do this by opt-in of
block drivers which should potentially provide NVMEM (typically mmcblk).
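
As a rough userspace illustration of the opt-in idea (not the actual
patch): the flag values are copied from the hunk quoted above, while
struct sketch_disk and sketch_may_provide_nvmem() are hypothetical
stand-ins for struct gendisk and whatever check the block layer ends
up doing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values as in the quoted hunk from patch 3/8. */
enum {
	GENHD_FL_REMOVABLE	= 1 << 0,
	GENHD_FL_HIDDEN		= 1 << 1,
	GENHD_FL_NO_PART	= 1 << 2,
	GENHD_FL_NVMEM		= 1 << 3,
};

/* The new flag must not collide with any of the existing bits. */
static_assert((GENHD_FL_NVMEM & (GENHD_FL_REMOVABLE | GENHD_FL_HIDDEN |
				 GENHD_FL_NO_PART)) == 0,
	      "GENHD_FL_NVMEM uses its own bit");

/* Hypothetical stand-in for struct gendisk; only the flags matter. */
struct sketch_disk {
	const char *name;
	unsigned int flags;
};

/*
 * Opt-in: only disks whose driver set GENHD_FL_NVMEM are considered.
 * Emulated block devices (mtdblock, ubiblock) never set it and are
 * therefore skipped without any special-casing.
 */
static bool sketch_may_provide_nvmem(const struct sketch_disk *disk)
{
	return disk && (disk->flags & GENHD_FL_NVMEM);
}
```

In this sketch an mmcblk disk created with GENHD_FL_NVMEM set passes
the check, while an mtdblock or ubiblock disk with the flag unset does
not, so MTD or UBI remain the only NVMEM provider for their nodes.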

I apologize for the confusion and assume that wasn't clear from the
wording I've used. I hope it's more clear now.

Alternatively it could also be solved via opt-out of ubiblock and
mtdblock devices using the inverted flag (GENHD_FL_NO_NVMEM) --
however, this has previously been criticized and I was asked to rather
make it opt-in.[1]


> Why can't this patch series be implemented such that
> a partition UUID occurs in the device tree and such that other code
> scans for that partition UUID?

This is actually one way this very series allows one to handle this:
by identifying a partition using its partuuid.

However, it's also quite common that the MMC boot **hardware**
partitions are used to store MAC addresses and/or Wi-Fi calibration
data. In this case there is no partition table and the NVMEM provider
has to act directly on the whole disk device (which is only a few
megabytes in size in case of those mmcblkXbootY devices and never has
a partition table).

[1]: https://patchwork.kernel.org/comment/25432948/