diff mbox series

of/fdt: Ignore disabled memory nodes

Message ID 20220517101410.3493781-1-andre.przywara@arm.com
State Accepted, archived
Headers show
Series of/fdt: Ignore disabled memory nodes | expand

Checks

Context Check Description
robh/checkpatch warning total: 0 errors, 1 warnings, 9 lines checked
robh/patch-applied fail build log

Commit Message

Andre Przywara May 17, 2022, 10:14 a.m. UTC
When we boot a machine using a devicetree, the generic DT code goes
through all nodes with a 'device_type = "memory"' property, and collects
all memory banks mentioned there. However it does not check for the
status property, so any nodes which are explicitly "disabled" will still
be added as a memblock.
This ends up badly for QEMU, when booting with secure firmware on
arm/arm64 machines, because QEMU adds a node describing secure-only
memory:
===================
	secram@e000000 {
		secure-status = "okay";
		status = "disabled";
		reg = <0x00 0xe000000 0x00 0x1000000>;
		device_type = "memory";
	};
===================

The kernel will eventually use that memory block (which is located below
the main DRAM bank), but accesses to that will be answered with an
SError:
===================
[    0.000000] Internal error: synchronous external abort: 96000050 [#1] PREEMPT SMP
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.18.0-rc6-00014-g10c8acb8b679 #524
[    0.000000] Hardware name: linux,dummy-virt (DT)
[    0.000000] pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    0.000000] pc : new_slab+0x190/0x340
[    0.000000] lr : new_slab+0x184/0x340
[    0.000000] sp : ffff80000a4b3d10
....
==================
The actual crash location and call stack will be somewhat random, and
depend on the specific allocation of that physical memory range.

As the DT spec[1] explicitly mentions standard properties, add a simple
check to skip over disabled memory nodes, so that we only use memory
that is meant for non-secure code to use.

That fixes booting a QEMU arm64 VM with EL3 enabled ("secure=on"), when
not using UEFI. In this case the QEMU generated DT will be handed on
to the kernel, which will see the secram node.
This issue is reproducible when using TF-A together with U-Boot as
firmware, then booting with the "booti" command.

When using U-Boot as an UEFI provider, the code there [2] explicitly
filters for disabled nodes when generating the UEFI memory map, so we
are safe.
EDK/2 only reads the first bank of the first DT memory node [3] to learn
about memory, so we got lucky there.

[1] https://github.com/devicetree-org/devicetree-specification/blob/main/source/chapter3-devicenodes.rst#memory-node (after the table)
[2] https://source.denx.de/u-boot/u-boot/-/blob/master/lib/fdtdec.c#L1061-1063
[3] https://github.com/tianocore/edk2/blob/master/ArmVirtPkg/PrePi/FdtParser.c

Reported-by: Ross Burton <ross.burton@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 drivers/of/fdt.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Rob Herring May 17, 2022, 3:34 p.m. UTC | #1
On Tue, May 17, 2022 at 11:14:10AM +0100, Andre Przywara wrote:
> When we boot a machine using a devicetree, the generic DT code goes
> through all nodes with a 'device_type = "memory"' property, and collects
> all memory banks mentioned there. However it does not check for the
> status property, so any nodes which are explicitly "disabled" will still
> be added as a memblock.
> This ends up badly for QEMU, when booting with secure firmware on
> arm/arm64 machines, because QEMU adds a node describing secure-only
> memory:
> ===================
> 	secram@e000000 {

BTW, 'memory' is the correct node name.

> 		secure-status = "okay";
> 		status = "disabled";
> 		reg = <0x00 0xe000000 0x00 0x1000000>;
> 		device_type = "memory";
> 	};
> ===================
> 
> The kernel will eventually use that memory block (which is located below
> the main DRAM bank), but accesses to that will be answered with an
> SError:
> ===================
> [    0.000000] Internal error: synchronous external abort: 96000050 [#1] PREEMPT SMP
> [    0.000000] Modules linked in:
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.18.0-rc6-00014-g10c8acb8b679 #524
> [    0.000000] Hardware name: linux,dummy-virt (DT)
> [    0.000000] pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [    0.000000] pc : new_slab+0x190/0x340
> [    0.000000] lr : new_slab+0x184/0x340
> [    0.000000] sp : ffff80000a4b3d10
> ....
> ==================
> The actual crash location and call stack will be somewhat random, and
> depend on the specific allocation of that physical memory range.
> 
> As the DT spec[1] explicitly mentions standard properties, add a simple
> check to skip over disabled memory nodes, so that we only use memory
> that is meant for non-secure code to use.
> 
> That fixes booting a QEMU arm64 VM with EL3 enabled ("secure=on"), when
> not using UEFI. In this case the QEMU generated DT will be handed on
> to the kernel, which will see the secram node.
> This issue is reproducible when using TF-A together with U-Boot as
> firmware, then booting with the "booti" command.
> 
> When using U-Boot as an UEFI provider, the code there [2] explicitly
> filters for disabled nodes when generating the UEFI memory map, so we
> are safe.
> EDK/2 only reads the first bank of the first DT memory node [3] to learn
> about memory, so we got lucky there.
> 
> [1] https://github.com/devicetree-org/devicetree-specification/blob/main/source/chapter3-devicenodes.rst#memory-node (after the table)
> [2] https://source.denx.de/u-boot/u-boot/-/blob/master/lib/fdtdec.c#L1061-1063
> [3] https://github.com/tianocore/edk2/blob/master/ArmVirtPkg/PrePi/FdtParser.c
> 
> Reported-by: Ross Burton <ross.burton@arm.com>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  drivers/of/fdt.c | 3 +++
>  1 file changed, 3 insertions(+)

Applied, thanks!
Peter Maydell May 17, 2022, 4:54 p.m. UTC | #2
On Tue, 17 May 2022 at 16:34, Rob Herring <robh@kernel.org> wrote:
>
> On Tue, May 17, 2022 at 11:14:10AM +0100, Andre Przywara wrote:
> > When we boot a machine using a devicetree, the generic DT code goes
> > through all nodes with a 'device_type = "memory"' property, and collects
> > all memory banks mentioned there. However it does not check for the
> > status property, so any nodes which are explicitly "disabled" will still
> > be added as a memblock.
> > This ends up badly for QEMU, when booting with secure firmware on
> > arm/arm64 machines, because QEMU adds a node describing secure-only
> > memory:
> > ===================
> >       secram@e000000 {
>
> BTW, 'memory' is the correct node name.

We already have a 'memory' node, which is for the NS
memory. This one's for the secure-only RAM block,
which is why I gave it a name that hopefully helps in
spotting that when a human is reading the DT.

I'm not really sure to what extent node names in device trees are
"this is just an identifying textual label" and to what extent
they are "this is really ABI and you need to follow the standard",
though -- nothing in practice seems to care what they are,
suggesting the "textual label" theory, but some bits of tooling
complain if you do things like forget the address value or use the
same address for two different nodes, suggesting the "really ABI"
theory.

thanks
-- PMM
Rob Herring May 17, 2022, 5:47 p.m. UTC | #3
On Tue, May 17, 2022 at 11:54 AM Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Tue, 17 May 2022 at 16:34, Rob Herring <robh@kernel.org> wrote:
> >
> > On Tue, May 17, 2022 at 11:14:10AM +0100, Andre Przywara wrote:
> > > When we boot a machine using a devicetree, the generic DT code goes
> > > through all nodes with a 'device_type = "memory"' property, and collects
> > > all memory banks mentioned there. However it does not check for the
> > > status property, so any nodes which are explicitly "disabled" will still
> > > be added as a memblock.
> > > This ends up badly for QEMU, when booting with secure firmware on
> > > arm/arm64 machines, because QEMU adds a node describing secure-only
> > > memory:
> > > ===================
> > >       secram@e000000 {
> >
> > BTW, 'memory' is the correct node name.
>
> We already have a 'memory' node, which is for the NS
> memory. This one's for the secure-only RAM block,
> which is why I gave it a name that hopefully helps in
> spotting that when a human is reading the DT.

You can do: secram: memory@e000000 {

Where 'secram' is only a source level label until overlays come into
the picture.

> I'm not really sure to what extent node names in device trees are
> "this is just an identifying textual label" and to what extent
> they are "this is really ABI and you need to follow the standard",
> though -- nothing in practice seems to care what they are,
> suggesting the "textual label" theory, but some bits of tooling
> complain if you do things like forget the address value or use the
> same address for two different nodes, suggesting the "really ABI"
> theory.

Node names are supposed to follow the class of device and there's a
list of established names in the spec.

Sometimes it's ABI and sometimes not. Much of it is just good hygiene.
memory nodes are also special because 'device_type' is used to
identify them, but device_type is generally deprecated for FDT as its
meaning in OpenFirmware doesn't apply (it defines what callable
methods exist). We could use the nodename (without unit address)
instead, but that would fail in some cases as other names have been
used.

Rob
Peter Maydell May 17, 2022, 7:19 p.m. UTC | #4
On Tue, 17 May 2022 at 18:48, Rob Herring <robh@kernel.org> wrote:
>
> On Tue, May 17, 2022 at 11:54 AM Peter Maydell <peter.maydell@linaro.org> wrote:
> >
> > On Tue, 17 May 2022 at 16:34, Rob Herring <robh@kernel.org> wrote:
> > >
> > > On Tue, May 17, 2022 at 11:14:10AM +0100, Andre Przywara wrote:
> > > > When we boot a machine using a devicetree, the generic DT code goes
> > > > through all nodes with a 'device_type = "memory"' property, and collects
> > > > all memory banks mentioned there. However it does not check for the
> > > > status property, so any nodes which are explicitly "disabled" will still
> > > > be added as a memblock.
> > > > This ends up badly for QEMU, when booting with secure firmware on
> > > > arm/arm64 machines, because QEMU adds a node describing secure-only
> > > > memory:
> > > > ===================
> > > >       secram@e000000 {
> > >
> > > BTW, 'memory' is the correct node name.
> >
> > We already have a 'memory' node, which is for the NS
> > memory. This one's for the secure-only RAM block,
> > which is why I gave it a name that hopefully helps in
> > spotting that when a human is reading the DT.
>
> You can do: secram: memory@e000000 {
>
> Where 'secram' is only a source level label until overlays come into
> the picture.

We generate the DTB with libfdt, so source-only information
isn't something we can put in, I think. (The quoted DT fragment
in this patch's commit message is the result of decompiling
the runtime generated DT binary blob with dtc.)

> > I'm not really sure to what extent node names in device trees are
> > "this is just an identifying textual label" and to what extent
> > they are "this is really ABI and you need to follow the standard",
> > though -- nothing in practice seems to care what they are,
> > suggesting the "textual label" theory, but some bits of tooling
> > complain if you do things like forget the address value or use the
> > same address for two different nodes, suggesting the "really ABI"
> > theory.
>
> Node names are supposed to follow the class of device and there's a
> list of established names in the spec.
>
> Sometimes it's ABI and sometimes not. Much of it is just good hygiene.
> memory nodes are also special because 'device_type' is used to
> identify them, but device_type is generally deprecated for FDT as its
> meaning in OpenFirmware doesn't apply (it defines what callable
> methods exist). We could use the nodename (without unit address)
> instead, but that would fail in some cases as other names have been
> used.

This seems kind of odd to me as a design, compared to
"have the node have a property that says what it is
and let the name of the node just be, well, its name"
(especially since 'device_type' and 'compatible' look an
awful lot like "this is the property that tells you what this
node actually is".)
Are we just stuck with what we have for historical reasons ?

-- PMM
Rob Herring May 18, 2022, 4:54 p.m. UTC | #5
On Tue, May 17, 2022 at 08:19:47PM +0100, Peter Maydell wrote:
> On Tue, 17 May 2022 at 18:48, Rob Herring <robh@kernel.org> wrote:
> >
> > On Tue, May 17, 2022 at 11:54 AM Peter Maydell <peter.maydell@linaro.org> wrote:
> > >
> > > On Tue, 17 May 2022 at 16:34, Rob Herring <robh@kernel.org> wrote:
> > > >
> > > > On Tue, May 17, 2022 at 11:14:10AM +0100, Andre Przywara wrote:
> > > > > When we boot a machine using a devicetree, the generic DT code goes
> > > > > through all nodes with a 'device_type = "memory"' property, and collects
> > > > > all memory banks mentioned there. However it does not check for the
> > > > > status property, so any nodes which are explicitly "disabled" will still
> > > > > be added as a memblock.
> > > > > This ends up badly for QEMU, when booting with secure firmware on
> > > > > arm/arm64 machines, because QEMU adds a node describing secure-only
> > > > > memory:
> > > > > ===================
> > > > >       secram@e000000 {
> > > >
> > > > BTW, 'memory' is the correct node name.
> > >
> > > We already have a 'memory' node, which is for the NS
> > > memory. This one's for the secure-only RAM block,
> > > which is why I gave it a name that hopefully helps in
> > > spotting that when a human is reading the DT.
> >
> > You can do: secram: memory@e000000 {
> >
> > Where 'secram' is only a source level label until overlays come into
> > the picture.
> 
> We generate the DTB with libfdt, so source-only information
> isn't something we can put in, I think. (The quoted DT fragment
> in this patch's commit message is the result of decompiling
> the runtime generated DT binary blob with dtc.)

Given the runtime aspect with overlays, it's conceivable that libfdt 
could support setting labels some day and then dts output maintaining 
them.

We could also consider a standard node name such as 'secure-memory'. 
It's a whole can of worms though on how secure vs. non-secure memory 
(and other things) are represented.

> > > I'm not really sure to what extent node names in device trees are
> > > "this is just an identifying textual label" and to what extent
> > > they are "this is really ABI and you need to follow the standard",
> > > though -- nothing in practice seems to care what they are,
> > > suggesting the "textual label" theory, but some bits of tooling
> > > complain if you do things like forget the address value or use the
> > > same address for two different nodes, suggesting the "really ABI"
> > > theory.
> >
> > Node names are supposed to follow the class of device and there's a
> > list of established names in the spec.
> >
> > Sometimes it's ABI and sometimes not. Much of it is just good hygiene.
> > memory nodes are also special because 'device_type' is used to
> > identify them, but device_type is generally deprecated for FDT as its
> > meaning in OpenFirmware doesn't apply (it defines what callable
> > methods exist). We could use the nodename (without unit address)
> > instead, but that would fail in some cases as other names have been
> > used.
> 
> This seems kind of odd to me as a design, compared to

Design? I wish. Evolution.

> "have the node have a property that says what it is
> and let the name of the node just be, well, its name"
> (especially since 'device_type' and 'compatible' look an
> awful lot like "this is the property that tells you what this
> node actually is".)
> Are we just stuck with what we have for historical reasons ?

Yes. If we were designing this, we'd probably have 'compatible = 
"memory"'. We're likely just stuck with things how they are. Mostly node 
names haven't been an ABI and we're just trying to be consistent in 
naming and use of unit-addresses.

Rob
Peter Maydell May 18, 2022, 5:54 p.m. UTC | #6
On Wed, 18 May 2022 at 17:54, Rob Herring <robh@kernel.org> wrote:
>
> On Tue, May 17, 2022 at 08:19:47PM +0100, Peter Maydell wrote:
> > We generate the DTB with libfdt, so source-only information
> > isn't something we can put in, I think. (The quoted DT fragment
> > in this patch's commit message is the result of decompiling
> > the runtime generated DT binary blob with dtc.)
>
> Given the runtime aspect with overlays, it's conceivable that libfdt
> could support setting labels some day and then dts output maintaining
> them.
>
> We could also consider a standard node name such as 'secure-memory'.
> It's a whole can of worms though on how secure vs. non-secure memory
> (and other things) are represented.

Mmm. We put in the very basic parts years ago in
Documentation/devicetree/bindings/arm/secure.txt
which is (and has remained) generally sufficient for the QEMU->Trusted
Firmware-> maybe uboot->Linux stack, which is pretty much the only use
case I think. (My intention when we wrote that up was that memory
that's S-only would be indicated the same way as S-only devices,
with the secure-status and status properties.)

> > Are we just stuck with what we have for historical reasons ?
>
> Yes. If we were designing this, we'd probably have 'compatible =
> "memory"'. We're likely just stuck with things how they are. Mostly node
> names haven't been an ABI and we're just trying to be consistent in
> naming and use of unit-addresses.

So, do you think it's worthwhile/a good idea for me to rename
the DT node that QEMU is currently calling "secmem" to be
"memory" ? My default is "leave it as it is", for economy of
effort reasons :-) -- but it's an easy enough change to make.
Though EDK2's dtb reading code just looks for the first
"memory" node and assumes it's the big one, so changing the node
name would make us reliant on the order of the two nodes in the
DTB unless we fixed EDK2 (which we should probably do anyway, tbh).

thanks
-- PMM
Ard Biesheuvel May 18, 2022, 8:46 p.m. UTC | #7
On Wed, 18 May 2022 at 19:54, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Wed, 18 May 2022 at 17:54, Rob Herring <robh@kernel.org> wrote:
> >
> > On Tue, May 17, 2022 at 08:19:47PM +0100, Peter Maydell wrote:
> > > We generate the DTB with libfdt, so source-only information
> > > isn't something we can put in, I think. (The quoted DT fragment
> > > in this patch's commit message is the result of decompiling
> > > the runtime generated DT binary blob with dtc.)
> >
> > Given the runtime aspect with overlays, it's conceivable that libfdt
> > could support setting labels some day and then dts output maintaining
> > them.
> >
> > We could also consider a standard node name such as 'secure-memory'.
> > It's a whole can of worms though on how secure vs. non-secure memory
> > (and other things) are represented.
>
> Mmm. We put in the very basic parts years ago in
> Documentation/devicetree/bindings/arm/secure.txt
> which is (and has remained) generally sufficient for the QEMU->Trusted
> Firmware-> maybe uboot->Linux stack, which is pretty much the only use
> case I think. (My intention when we wrote that up was that memory
> that's S-only would be indicated the same way as S-only devices,
> with the secure-status and status properties.)
>
> > > Are we just stuck with what we have for historical reasons ?
> >
> > Yes. If we were designing this, we'd probably have 'compatible =
> > "memory"'. We're likely just stuck with things how they are. Mostly node
> > names haven't been an ABI and we're just trying to be consistent in
> > naming and use of unit-addresses.
>
> So, do you think it's worthwhile/a good idea for me to rename
> the DT node that QEMU is currently calling "secmem" to be
> "memory" ? My default is "leave it as it is", for economy of
> effort reasons :-) -- but it's an easy enough change to make.
> Though EDK2's dtb reading code just looks for the first
> "memory" node and assumes it's the big one, so changing the node
> name would make us reliant on the order of the two nodes in the
> DTB unless we fixed EDK2 (which we should probably do anyway, tbh).
>

Agreed. The referenced code is only one of two implementations (one
for the firmware version that executes in place from NOR flash, and
one for the version that executes from DRAM), which don't follow the
same strategy (the other one enumerates all device_type="memory" nodes
in the tree, and picks the one with the lowest starting address, but
also ignores status properties or node names). The PrePi version
linked here should simply choose the node that covers its own image in
memory. The XIP version uses a stack in DRAM, so it could do something
similar.
diff mbox series

Patch

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index ec315b060cd50..0f30496ce80bf 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -1105,6 +1105,9 @@  int __init early_init_dt_scan_memory(void)
 		if (type == NULL || strcmp(type, "memory") != 0)
 			continue;
 
+		if (!of_fdt_device_is_available(fdt, node))
+			continue;
+
 		reg = of_get_flat_dt_prop(node, "linux,usable-memory", &l);
 		if (reg == NULL)
 			reg = of_get_flat_dt_prop(node, "reg", &l);