diff mbox series

[v2] memblock: add no-map alloc functions

Message ID 20240416120635.361838-2-skseofh@gmail.com
State Changes Requested
Headers show
Series [v2] memblock: add no-map alloc functions | expand

Checks

Context Check Description
robh/checkpatch warning total: 60 errors, 42 warnings, 111 lines checked
robh/patch-applied fail build log

Commit Message

DaeRo Lee April 16, 2024, 12:06 p.m. UTC
From: Daero Lee <daero_le.lee@samsung.com>

Like reserved-memory with the 'no-map' property and only 'size' property
(w/o 'reg' property), there are memory regions need to be allocated in
memblock.memory marked with the MEMBLOCK_NOMAP flag, but should not be
allocated in memblock.reserved.

example : arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
        reserved-memory {
                #address-cells = <2>;
                #size-cells = <2>;
                ranges;

                bman_fbpr: bman-fbpr {
                        compatible = "shared-dma-pool";
                        size = <0 0x1000000>;
                        alignment = <0 0x1000000>;
                        no-map;
                };

                qman_fqd: qman-fqd {
                        compatible = "shared-dma-pool";
                        size = <0 0x400000>;
                        alignment = <0 0x400000>;
                        no-map;
                };

                qman_pfdr: qman-pfdr {
                        compatible = "shared-dma-pool";
                        size = <0 0x2000000>;
                        alignment = <0 0x2000000>;
                        no-map;
                };
        };

So, functions were added that find the required memory area in
memblock.memory, but do not allocate it to memblock.reserved.

In previous patch(a7259df), early_init_dt_alloc_reserved_memory was
modified to use memblock_phys_alloc_range allocating memory in
memblock.reserved, instead of memblock_find_in_range that just find the
available region. But if there is a 'no-map' property, memory region
should not be allocated to memblock.reserved.

So, the early_init_dt_alloc_reserved_memory_arch function was modified
using the no-map alloc function.

Signed-off-by: Daero Lee <daero_le.lee@samsung.com>
---
 drivers/of/of_reserved_mem.c |  9 +++--
 mm/memblock.c                | 78 ++++++++++++++++++++++++++++++++++++
 2 files changed, 84 insertions(+), 3 deletions(-)

Comments

Wei Yang April 17, 2024, 2:23 a.m. UTC | #1
On Tue, Apr 16, 2024 at 09:06:35PM +0900, skseofh@gmail.com wrote:
>From: Daero Lee <daero_le.lee@samsung.com>
>
>Like reserved-memory with the 'no-map' property and only 'size' property
>(w/o 'reg' property), there are memory regions need to be allocated in
>memblock.memory marked with the MEMBLOCK_NOMAP flag, but should not be
>allocated in memblock.reserved.

We don't "allocate" memory from memblock.memory directly.

We present memory in memblock.memory and "allocate" a memory by marking it in
memblock.reserved.

>
>example : arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
>        reserved-memory {
>                #address-cells = <2>;
>                #size-cells = <2>;
>                ranges;
>
>                bman_fbpr: bman-fbpr {
>                        compatible = "shared-dma-pool";
>                        size = <0 0x1000000>;
>                        alignment = <0 0x1000000>;
>                        no-map;
>                };
>
>                qman_fqd: qman-fqd {
>                        compatible = "shared-dma-pool";
>                        size = <0 0x400000>;
>                        alignment = <0 0x400000>;
>                        no-map;
>                };
>
>                qman_pfdr: qman-pfdr {
>                        compatible = "shared-dma-pool";
>                        size = <0 0x2000000>;
>                        alignment = <0 0x2000000>;
>                        no-map;
>                };
>        };
>
>So, functions were added that find the required memory area in
>memblock.memory, but do not allocate it to memblock.reserved.
>
>In previous patch(a7259df), early_init_dt_alloc_reserved_memory was

You want to say this patch introduced a regression?

>modified to use memblock_phys_alloc_range allocating memory in
>memblock.reserved, instead of memblock_find_in_range that just find the
>available region. But if there is a 'no-map' property, memory region
>should not be allocated to memblock.reserved.

If my understanding is correct, memblock_phys_free() is called if 'no-map'
property is set. This would release the "allocation" in memblock.reserved.

>
>So, the early_init_dt_alloc_reserved_memory_arch function was modified
>using the no-map alloc function.
>
>Signed-off-by: Daero Lee <daero_le.lee@samsung.com>
>---
> drivers/of/of_reserved_mem.c |  9 +++--
> mm/memblock.c                | 78 ++++++++++++++++++++++++++++++++++++
> 2 files changed, 84 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/of/of_reserved_mem.c b/drivers/of/of_reserved_mem.c
>index 8236ecae2953..504f2f60689c 100644
>--- a/drivers/of/of_reserved_mem.c
>+++ b/drivers/of/of_reserved_mem.c
>@@ -40,15 +40,18 @@ static int __init early_init_dt_alloc_reserved_memory_arch(phys_addr_t size,
> 
> 	end = !end ? MEMBLOCK_ALLOC_ANYWHERE : end;
> 	align = !align ? SMP_CACHE_BYTES : align;
>-	base = memblock_phys_alloc_range(size, align, start, end);
>+	if (nomap) {
>+		base = memblock_phys_alloc_range_nomap(size, align, start, end);
>+	} else {
>+		base = memblock_phys_alloc_range(size, align, start, end);
>+	}
>+	
> 	if (!base)
> 		return -ENOMEM;
> 
> 	*res_base = base;
> 	if (nomap) {
> 		err = memblock_mark_nomap(base, size);
>-		if (err)
>-			memblock_phys_free(base, size);

I see you removed it here, seems does the same as before.

> 	}
> 
> 	kmemleak_ignore_phys(base);
>diff --git a/mm/memblock.c b/mm/memblock.c
>index d09136e040d3..f103f1ecbfad 100644
>--- a/mm/memblock.c
>+++ b/mm/memblock.c
>@@ -1506,6 +1506,72 @@ phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
> 	return found;
> }
> 
>+phys_addr_t __init memblock_alloc_range_nid_nomap(phys_addr_t size,
>+                                        phys_addr_t align, phys_addr_t start,
>+                                        phys_addr_t end, int nid,
>+                                        bool exact_nid)
>+{
>+        enum memblock_flags flags = choose_memblock_flags();
>+        phys_addr_t found;
>+
>+        if (WARN_ONCE(nid == MAX_NUMNODES, "Usage of MAX_NUMNODES is deprecated. Use NUMA_NO_NODE instead\n"))
>+                nid = NUMA_NO_NODE;
>+
>+        if (!align) {
>+                /* Can't use WARNs this early in boot on powerpc */
>+                dump_stack();
>+                align = SMP_CACHE_BYTES;
>+        }
>+
>+again:
>+        found = memblock_find_in_range_node(size, align, start, end, nid,
>+                                            flags);
>+        if (found)
>+                goto done;
>+
>+        if (nid != NUMA_NO_NODE && !exact_nid) {
>+                found = memblock_find_in_range_node(size, align, start,
>+                                                    end, NUMA_NO_NODE,
>+                                                    flags);
>+                if (found)
>+                        goto done;
>+        }
>+
>+        if (flags & MEMBLOCK_MIRROR) {
>+                flags &= ~MEMBLOCK_MIRROR;
>+                pr_warn_ratelimited("Could not allocate %pap bytes of mirrored memory\n",
>+                        &size);
>+                goto again;
>+        }
>+
>+        return 0;
>+
>+done:
>+        /*
>+         * Skip kmemleak for those places like kasan_init() and
>+         * early_pgtable_alloc() due to high volume.
>+         */
>+        if (end != MEMBLOCK_ALLOC_NOLEAKTRACE)
>+                /*
>+                 * Memblock allocated blocks are never reported as
>+                 * leaks. This is because many of these blocks are
>+                 * only referred via the physical address which is
>+                 * not looked up by kmemleak.
>+                 */
>+                kmemleak_alloc_phys(found, size, 0);
>+
>+        /*
>+         * Some Virtual Machine platforms, such as Intel TDX or AMD SEV-SNP,
>+         * require memory to be accepted before it can be used by the
>+         * guest.
>+         *
>+         * Accept the memory of the allocated buffer.
>+         */
>+        accept_memory(found, found + size);
>+
>+        return found;
>+}
>+
> /**
>  * memblock_phys_alloc_range - allocate a memory block inside specified range
>  * @size: size of memory block to be allocated in bytes
>@@ -1530,6 +1596,18 @@ phys_addr_t __init memblock_phys_alloc_range(phys_addr_t size,
> 					false);
> }
> 
>+phys_addr_t __init memblock_phys_alloc_range_nomap(phys_addr_t size,
>+                                                   phys_addr_t align,
>+                                                   phys_addr_t start,
>+                                                   phys_addr_t end)
>+{
>+        memblock_dbg("%s: %llu bytes align=0x%llx from=%pa max_addr=%pa %pS\n",
>+                     __func__, (u64)size, (u64)align, &start, &end,
>+                     (void *)_RET_IP_);
>+        return memblock_alloc_range_nid_nomap(size, align, start, end, 
>+					      NUMA_NO_NODE, false);
>+}
>+
> /**
>  * memblock_phys_alloc_try_nid - allocate a memory block from specified NUMA node
>  * @size: size of memory block to be allocated in bytes
>-- 
>2.25.1
>
Mike Rapoport April 17, 2024, 6:02 a.m. UTC | #2
On Tue, Apr 16, 2024 at 09:06:35PM +0900, skseofh@gmail.com wrote:
> From: Daero Lee <daero_le.lee@samsung.com>
> 
> Like reserved-memory with the 'no-map' property and only 'size' property
> (w/o 'reg' property), there are memory regions need to be allocated in
> memblock.memory marked with the MEMBLOCK_NOMAP flag, but should not be
> allocated in memblock.reserved.

This still does not explain why you need such regions.

As Wei Yang explained, memblock does not allocate memory from
memblock.reserved. The memblock.reserved array represents memory that is in
use by firmware or by early kernel allocations and cannot be freed to page
allocator.

If you have a region that's _NOMAP in memblock.memory and is absent in
memblock.reserved it will not be mapped by the kernel page tables, but it
will be considered as free memory by the core mm. 

Is this really what you want?
 
> example : arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
>         reserved-memory {
>                 #address-cells = <2>;
>                 #size-cells = <2>;
>                 ranges;
> 
>                 bman_fbpr: bman-fbpr {
>                         compatible = "shared-dma-pool";
>                         size = <0 0x1000000>;
>                         alignment = <0 0x1000000>;
>                         no-map;
>                 };
> 
>                 qman_fqd: qman-fqd {
>                         compatible = "shared-dma-pool";
>                         size = <0 0x400000>;
>                         alignment = <0 0x400000>;
>                         no-map;
>                 };
> 
>                 qman_pfdr: qman-pfdr {
>                         compatible = "shared-dma-pool";
>                         size = <0 0x2000000>;
>                         alignment = <0 0x2000000>;
>                         no-map;
>                 };
>         };
>
DaeRo Lee April 18, 2024, 2:54 p.m. UTC | #3
2024년 4월 17일 (수) 오후 3:03, Mike Rapoport <rppt@kernel.org>님이 작성:
>
> On Tue, Apr 16, 2024 at 09:06:35PM +0900, skseofh@gmail.com wrote:
> > From: Daero Lee <daero_le.lee@samsung.com>
> >
> > Like reserved-memory with the 'no-map' property and only 'size' property
> > (w/o 'reg' property), there are memory regions need to be allocated in
> > memblock.memory marked with the MEMBLOCK_NOMAP flag, but should not be
> > allocated in memblock.reserved.
>
> This still does not explain why you need such regions.
>
> As Wei Yang explained, memblock does not allocate memory from
> memblock.reserved. The memblock.reserved array represents memory that is in
> use by firmware or by early kernel allocations and cannot be freed to page
> allocator.
Thank you for your comments. I used the wrong word.
When I use 'allocate', I mean that the region 'adds' to the memblock.reserved.

>
> If you have a region that's _NOMAP in memblock.memory and is absent in
> memblock.reserved it will not be mapped by the kernel page tables, but it
> will be considered as free memory by the core mm.
>
> Is this really what you want?
If my understanding is right, before freeing (memory && !reserved)
area, we marked the memblock.reserved regions and memblock.memory
regions with no-map flag. And when we free (memory && !reserved) area,
we skip the memblock.memory regions with no-map(see
should_skip_region). So, I think that the memory regions with no-map
flag will not be considered as free memory.

If there is anything I think is wrong, feel free to correct me.

Regards,
DaeRo Lee
Mike Rapoport April 18, 2024, 6:03 p.m. UTC | #4
On Thu, Apr 18, 2024 at 11:54:15PM +0900, DaeRo Lee wrote:
> 2024년 4월 17일 (수) 오후 3:03, Mike Rapoport <rppt@kernel.org>님이 작성:
> >
> > On Tue, Apr 16, 2024 at 09:06:35PM +0900, skseofh@gmail.com wrote:
> > > From: Daero Lee <daero_le.lee@samsung.com>
> > >
> > > Like reserved-memory with the 'no-map' property and only 'size' property
> > > (w/o 'reg' property), there are memory regions need to be allocated in
> > > memblock.memory marked with the MEMBLOCK_NOMAP flag, but should not be
> > > allocated in memblock.reserved.
> >
> > This still does not explain why you need such regions.
> >
> > As Wei Yang explained, memblock does not allocate memory from
> > memblock.reserved. The memblock.reserved array represents memory that is in
> > use by firmware or by early kernel allocations and cannot be freed to page
> > allocator.
> Thank you for your comments. I used the wrong word.
> When I use 'allocate', I mean that the region 'adds' to the memblock.reserved.
> 
> >
> > If you have a region that's _NOMAP in memblock.memory and is absent in
> > memblock.reserved it will not be mapped by the kernel page tables, but it
> > will be considered as free memory by the core mm.
> >
> > Is this really what you want?
> If my understanding is right, before freeing (memory && !reserved)
> area, we marked the memblock.reserved regions and memblock.memory
> regions with no-map flag. And when we free (memory && !reserved) area,
> we skip the memblock.memory regions with no-map(see
> should_skip_region). So, I think that the memory regions with no-map
> flag will not be considered as free memory.

You are right here.

But I still don't understand *why* do you want to change the way
early_init_dt_alloc_reserved_memory_arch() works.
 
> Regards,
> DaeRo Lee
DaeRo Lee April 19, 2024, 1:46 a.m. UTC | #5
2024년 4월 19일 (금) 오전 3:04, Mike Rapoport <rppt@kernel.org>님이 작성:
>
> On Thu, Apr 18, 2024 at 11:54:15PM +0900, DaeRo Lee wrote:
> > 2024년 4월 17일 (수) 오후 3:03, Mike Rapoport <rppt@kernel.org>님이 작성:
> > >
> > > On Tue, Apr 16, 2024 at 09:06:35PM +0900, skseofh@gmail.com wrote:
> > > > From: Daero Lee <daero_le.lee@samsung.com>
> > > >
> > > > Like reserved-memory with the 'no-map' property and only 'size' property
> > > > (w/o 'reg' property), there are memory regions need to be allocated in
> > > > memblock.memory marked with the MEMBLOCK_NOMAP flag, but should not be
> > > > allocated in memblock.reserved.
> > >
> > > This still does not explain why you need such regions.
> > >
> > > As Wei Yang explained, memblock does not allocate memory from
> > > memblock.reserved. The memblock.reserved array represents memory that is in
> > > use by firmware or by early kernel allocations and cannot be freed to page
> > > allocator.
> > Thank you for your comments. I used the wrong word.
> > When I use 'allocate', I mean that the region 'adds' to the memblock.reserved.
> >
> > >
> > > If you have a region that's _NOMAP in memblock.memory and is absent in
> > > memblock.reserved it will not be mapped by the kernel page tables, but it
> > > will be considered as free memory by the core mm.
> > >
> > > Is this really what you want?
> > If my understanding is right, before freeing (memory && !reserved)
> > area, we marked the memblock.reserved regions and memblock.memory
> > regions with no-map flag. And when we free (memory && !reserved) area,
> > we skip the memblock.memory regions with no-map(see
> > should_skip_region). So, I think that the memory regions with no-map
> > flag will not be considered as free memory.
>
> You are right here.
>
> But I still don't understand *why* do you want to change the way
> early_init_dt_alloc_reserved_memory_arch() works.

In memmap_init_reserved_pages, we mark memblock.reserved as
PageReserved first and mark the memblock.reserved with nomap flag
also.
-> Isn't this duplicated work? (If we add no-map region to
memblock.reserved 'and' mark in memblock.memory..)
So, I think that for the no-map region, we don't need to add to the
memblock.reserved.
This is what we do now in early_init_dt_reserve_memory. the nomap
region is not added to the memblock.reserved.

In early_init_dt_alloc_reserved_memory_arch, if 'nomap' is true, we
mark the memblock.memory region as _NOMAP. And if the return value
'err' is not zero(which is '-ENOMEM' from memblock_isolate_range), we
free the region.
- 'nomap' is true -> memblock_mark_nomap : success -> not free the region

: fail -> free the region
And it can be said that we add the region to the memblock.reserved
using memblock_phys_alloc_range and if the region is nomap, then we
can free the region from memblock.reserved. But is it necessary to add
it to memblock.reserved? We just need the region in memblock.memory to
mark nomap.

So, here is what I think:
- reserved-memory w/ nomap region -> mark only to memblock.memory
- reserved-memory w/o nomap region -> add to the memblock.reserved

Regards,
DaeRo Lee
DaeRo Lee April 19, 2024, 1:59 a.m. UTC | #6
2024년 4월 19일 (금) 오전 10:46, DaeRo Lee <skseofh@gmail.com>님이 작성:
>
> 2024년 4월 19일 (금) 오전 3:04, Mike Rapoport <rppt@kernel.org>님이 작성:
> >
> > On Thu, Apr 18, 2024 at 11:54:15PM +0900, DaeRo Lee wrote:
> > > 2024년 4월 17일 (수) 오후 3:03, Mike Rapoport <rppt@kernel.org>님이 작성:
> > > >
> > > > On Tue, Apr 16, 2024 at 09:06:35PM +0900, skseofh@gmail.com wrote:
> > > > > From: Daero Lee <daero_le.lee@samsung.com>
> > > > >
> > > > > Like reserved-memory with the 'no-map' property and only 'size' property
> > > > > (w/o 'reg' property), there are memory regions need to be allocated in
> > > > > memblock.memory marked with the MEMBLOCK_NOMAP flag, but should not be
> > > > > allocated in memblock.reserved.
> > > >
> > > > This still does not explain why you need such regions.
> > > >
> > > > As Wei Yang explained, memblock does not allocate memory from
> > > > memblock.reserved. The memblock.reserved array represents memory that is in
> > > > use by firmware or by early kernel allocations and cannot be freed to page
> > > > allocator.
> > > Thank you for your comments. I used the wrong word.
> > > When I use 'allocate', I mean that the region 'adds' to the memblock.reserved.
> > >
> > > >
> > > > If you have a region that's _NOMAP in memblock.memory and is absent in
> > > > memblock.reserved it will not be mapped by the kernel page tables, but it
> > > > will be considered as free memory by the core mm.
> > > >
> > > > Is this really what you want?
> > > If my understanding is right, before freeing (memory && !reserved)
> > > area, we marked the memblock.reserved regions and memblock.memory
> > > regions with no-map flag. And when we free (memory && !reserved) area,
> > > we skip the memblock.memory regions with no-map(see
> > > should_skip_region). So, I think that the memory regions with no-map
> > > flag will not be considered as free memory.
> >
> > You are right here.
> >
> > But I still don't understand *why* do you want to change the way
> > early_init_dt_alloc_reserved_memory_arch() works.
>
> In memmap_init_reserved_pages, we mark memblock.reserved as
> PageReserved first and mark the memblock.reserved with nomap flag
> also.
Sorry. This is my mistake. 'memblock.memory with nomap flag' is right.

> -> Isn't this duplicated work? (If we add no-map region to
> memblock.reserved 'and' mark in memblock.memory..)
> So, I think that for the no-map region, we don't need to add to the
> memblock.reserved.
> This is what we do now in early_init_dt_reserve_memory. the nomap
> region is not added to the memblock.reserved.
>
> In early_init_dt_alloc_reserved_memory_arch, if 'nomap' is true, we
> mark the memblock.memory region as _NOMAP. And if the return value
> 'err' is not zero(which is '-ENOMEM' from memblock_isolate_range), we
> free the region.
> - 'nomap' is true -> memblock_mark_nomap : success -> not free the region
>
> : fail -> free the region
> And it can be said that we add the region to the memblock.reserved
> using memblock_phys_alloc_range and if the region is nomap, then we
> can free the region from memblock.reserved. But is it necessary to add
> it to memblock.reserved? We just need the region in memblock.memory to
> mark nomap.
>
> So, here is what I think:
> - reserved-memory w/ nomap region -> mark only to memblock.memory
> - reserved-memory w/o nomap region -> add to the memblock.reserved
>
> Regards,
> DaeRo Lee
Mike Rapoport April 27, 2024, 8:49 a.m. UTC | #7
On Fri, Apr 19, 2024 at 10:59:52AM +0900, DaeRo Lee wrote:
> 2024년 4월 19일 (금) 오전 10:46, DaeRo Lee <skseofh@gmail.com>님이 작성:
> >
> > In memmap_init_reserved_pages, we mark memblock.reserved as
> > PageReserved first and mark the memblock.reserved with nomap flag
> > also.
> Sorry. This is my mistake. 'memblock.memory with nomap flag' is right.
> 
> > -> Isn't this duplicated work? (If we add no-map region to
> > memblock.reserved 'and' mark in memblock.memory..)
> > So, I think that for the no-map region, we don't need to add to the
> > memblock.reserved.
> > This is what we do now in early_init_dt_reserve_memory. the nomap
> > region is not added to the memblock.reserved.
> >
> > In early_init_dt_alloc_reserved_memory_arch, if 'nomap' is true, we
> > mark the memblock.memory region as _NOMAP. And if the return value
> > 'err' is not zero(which is '-ENOMEM' from memblock_isolate_range), we
> > free the region.
> > - 'nomap' is true -> memblock_mark_nomap : success -> not free the region
> >
> > : fail -> free the region
> > And it can be said that we add the region to the memblock.reserved
> > using memblock_phys_alloc_range and if the region is nomap, then we
> > can free the region from memblock.reserved. But is it necessary to add
> > it to memblock.reserved? We just need the region in memblock.memory to
> > mark nomap.
> >
> > So, here is what I think:
> > - reserved-memory w/ nomap region -> mark only to memblock.memory
> > - reserved-memory w/o nomap region -> add to the memblock.reserved

NOMAP and memblock.reserved are semantically different, and at makes sense
to have a "reserved nomap" node in fdt recorded in both memblock.memory and
memblock.reserved.

memblock.reserved represents the memory that is used by firmware or early
kernel allocation, so reserved memory in fdt should be reserved in memblock
as well. I believe it's an oversight that early_init_dt_reserve_memory()
does not call memblock_reserve() for nomap memory.

NOMAP is a property of a memory region that says that that region should
not be mapped in the linear map, it's not necessarily in use.

> > Regards,
> > DaeRo Lee
DaeRo Lee April 27, 2024, 10:24 a.m. UTC | #8
2024년 4월 27일 (토) 오후 5:50, Mike Rapoport <rppt@kernel.org>님이 작성:
>
> On Fri, Apr 19, 2024 at 10:59:52AM +0900, DaeRo Lee wrote:
> > 2024년 4월 19일 (금) 오전 10:46, DaeRo Lee <skseofh@gmail.com>님이 작성:
> > >
> > > In memmap_init_reserved_pages, we mark memblock.reserved as
> > > PageReserved first and mark the memblock.reserved with nomap flag
> > > also.
> > Sorry. This is my mistake. 'memblock.memory with nomap flag' is right.
> >
> > > -> Isn't this duplicated work? (If we add no-map region to
> > > memblock.reserved 'and' mark in memblock.memory..)
> > > So, I think that for the no-map region, we don't need to add to the
> > > memblock.reserved.
> > > This is what we do now in early_init_dt_reserve_memory. the nomap
> > > region is not added to the memblock.reserved.
> > >
> > > In early_init_dt_alloc_reserved_memory_arch, if 'nomap' is true, we
> > > mark the memblock.memory region as _NOMAP. And if the return value
> > > 'err' is not zero(which is '-ENOMEM' from memblock_isolate_range), we
> > > free the region.
> > > - 'nomap' is true -> memblock_mark_nomap : success -> not free the region
> > >
> > > : fail -> free the region
> > > And it can be said that we add the region to the memblock.reserved
> > > using memblock_phys_alloc_range and if the region is nomap, then we
> > > can free the region from memblock.reserved. But is it necessary to add
> > > it to memblock.reserved? We just need the region in memblock.memory to
> > > mark nomap.
> > >
> > > So, here is what I think:
> > > - reserved-memory w/ nomap region -> mark only to memblock.memory
> > > - reserved-memory w/o nomap region -> add to the memblock.reserved
>
> NOMAP and memblock.reserved are semantically different, and at makes sense
> to have a "reserved nomap" node in fdt recorded in both memblock.memory and
> memblock.reserved.
>
> memblock.reserved represents the memory that is used by firmware or early
> kernel allocation, so reserved memory in fdt should be reserved in memblock
> as well. I believe it's an oversight that early_init_dt_reserve_memory()
> does not call memblock_reserve() for nomap memory.
>
> NOMAP is a property of a memory region that says that that region should
> not be mapped in the linear map, it's not necessarily in use.

I agree that the NOMAP region should be added to memblock.reserved.

So, I think we need to clean-up memmap_init_reserved_pages, because in
this function we call reserve_bootmem_region for memblock.reserved and
memblock.memory with nomap. We don't need to call
reserve_bootmem_region for nomap.

Regards,.
DaeRo Lee
Mike Rapoport April 28, 2024, 6:33 a.m. UTC | #9
On Sat, Apr 27, 2024 at 07:24:23PM +0900, DaeRo Lee wrote:
> 2024년 4월 27일 (토) 오후 5:50, Mike Rapoport <rppt@kernel.org>님이 작성:
> >
> > On Fri, Apr 19, 2024 at 10:59:52AM +0900, DaeRo Lee wrote:
> > > 2024년 4월 19일 (금) 오전 10:46, DaeRo Lee <skseofh@gmail.com>님이 작성:
> > > >
> > > > In memmap_init_reserved_pages, we mark memblock.reserved as
> > > > PageReserved first and mark the memblock.reserved with nomap flag
> > > > also.
> > > Sorry. This is my mistake. 'memblock.memory with nomap flag' is right.
> > >
> > > > -> Isn't this duplicated work? (If we add no-map region to
> > > > memblock.reserved 'and' mark in memblock.memory..)
> > > > So, I think that for the no-map region, we don't need to add to the
> > > > memblock.reserved.
> > > > This is what we do now in early_init_dt_reserve_memory. the nomap
> > > > region is not added to the memblock.reserved.
> > > >
> > > > In early_init_dt_alloc_reserved_memory_arch, if 'nomap' is true, we
> > > > mark the memblock.memory region as _NOMAP. And if the return value
> > > > 'err' is not zero(which is '-ENOMEM' from memblock_isolate_range), we
> > > > free the region.
> > > > - 'nomap' is true -> memblock_mark_nomap : success -> not free the region
> > > >
> > > > : fail -> free the region
> > > > And it can be said that we add the region to the memblock.reserved
> > > > using memblock_phys_alloc_range and if the region is nomap, then we
> > > > can free the region from memblock.reserved. But is it necessary to add
> > > > it to memblock.reserved? We just need the region in memblock.memory to
> > > > mark nomap.
> > > >
> > > > So, here is what I think:
> > > > - reserved-memory w/ nomap region -> mark only to memblock.memory
> > > > - reserved-memory w/o nomap region -> add to the memblock.reserved
> >
> > NOMAP and memblock.reserved are semantically different, and at makes sense
> > to have a "reserved nomap" node in fdt recorded in both memblock.memory and
> > memblock.reserved.
> >
> > memblock.reserved represents the memory that is used by firmware or early
> > kernel allocation, so reserved memory in fdt should be reserved in memblock
> > as well. I believe it's an oversight that early_init_dt_reserve_memory()
> > does not call memblock_reserve() for nomap memory.
> >
> > NOMAP is a property of a memory region that says that that region should
> > not be mapped in the linear map, it's not necessarily in use.
> 
> I agree that the NOMAP region should be added to memblock.reserved.
> 
> So, I think we need to clean-up memmap_init_reserved_pages, because in
> this function we call reserve_bootmem_region for memblock.reserved and
> memblock.memory with nomap. We don't need to call
> reserve_bootmem_region for nomap.

Read the comment about memblock_mark_nomap()

> Regards,.
> DaeRo Lee
DaeRo Lee April 28, 2024, 10:36 a.m. UTC | #10
2024년 4월 28일 (일) 오후 3:35, Mike Rapoport <rppt@kernel.org>님이 작성:
>
> On Sat, Apr 27, 2024 at 07:24:23PM +0900, DaeRo Lee wrote:
> > 2024년 4월 27일 (토) 오후 5:50, Mike Rapoport <rppt@kernel.org>님이 작성:
> > >
> > > On Fri, Apr 19, 2024 at 10:59:52AM +0900, DaeRo Lee wrote:
> > > > 2024년 4월 19일 (금) 오전 10:46, DaeRo Lee <skseofh@gmail.com>님이 작성:
> > > > >
> > > > > In memmap_init_reserved_pages, we mark memblock.reserved as
> > > > > PageReserved first and mark the memblock.reserved with nomap flag
> > > > > also.
> > > > Sorry. This is my mistake. 'memblock.memory with nomap flag' is right.
> > > >
> > > > > -> Isn't this duplicated work? (If we add no-map region to
> > > > > memblock.reserved 'and' mark in memblock.memory..)
> > > > > So, I think that for the no-map region, we don't need to add to the
> > > > > memblock.reserved.
> > > > > This is what we do now in early_init_dt_reserve_memory. the nomap
> > > > > region is not added to the memblock.reserved.
> > > > >
> > > > > In early_init_dt_alloc_reserved_memory_arch, if 'nomap' is true, we
> > > > > mark the memblock.memory region as _NOMAP. And if the return value
> > > > > 'err' is not zero(which is '-ENOMEM' from memblock_isolate_range), we
> > > > > free the region.
> > > > > - 'nomap' is true -> memblock_mark_nomap : success -> not free the region
> > > > >
> > > > > : fail -> free the region
> > > > > And it can be said that we add the region to the memblock.reserved
> > > > > using memblock_phys_alloc_range and if the region is nomap, then we
> > > > > can free the region from memblock.reserved. But is it necessary to add
> > > > > it to memblock.reserved? We just need the region in memblock.memory to
> > > > > mark nomap.
> > > > >
> > > > > So, here is what I think:
> > > > > - reserved-memory w/ nomap region -> mark only to memblock.memory
> > > > > - reserved-memory w/o nomap region -> add to the memblock.reserved
> > >
> > > NOMAP and memblock.reserved are semantically different, and at makes sense
> > > to have a "reserved nomap" node in fdt recorded in both memblock.memory and
> > > memblock.reserved.
> > >
> > > memblock.reserved represents the memory that is used by firmware or early
> > > kernel allocation, so reserved memory in fdt should be reserved in memblock
> > > as well. I believe it's an oversight that early_init_dt_reserve_memory()
> > > does not call memblock_reserve() for nomap memory.
> > >
> > > NOMAP is a property of a memory region that says that that region should
> > > not be mapped in the linear map, it's not necessarily in use.
> >
> > I agree that the NOMAP region should be added to memblock.reserved.
> >
> > So, I think we need to clean-up memmap_init_reserved_pages, because in
> > this function we call reserve_bootmem_region for memblock.reserved and
> > memblock.memory with nomap. We don't need to call
> > reserve_bootmem_region for nomap.
>
> Read the comment about memblock_mark_nomap()
I read the comment about memblock_mark_nomap() and understood that
regions with nomap flags should be treated as PageReserved.
But, if we add this nomap region to memblock.reserved, the region with
nomap flag will be processed in the first for-loop in
memmap_init_reserved_pages.

Am I thinking wrong?

Regards,
DaeRo Lee
Mike Rapoport April 28, 2024, noon UTC | #11
On Sun, Apr 28, 2024 at 07:36:40PM +0900, DaeRo Lee wrote:
> 2024년 4월 28일 (일) 오후 3:35, Mike Rapoport <rppt@kernel.org>님이 작성:
> >
> > On Sat, Apr 27, 2024 at 07:24:23PM +0900, DaeRo Lee wrote:
> > > 2024년 4월 27일 (토) 오후 5:50, Mike Rapoport <rppt@kernel.org>님이 작성:
> > > > > >
> > > > > > So, here is what I think:
> > > > > > - reserved-memory w/ nomap region -> mark only to memblock.memory
> > > > > > - reserved-memory w/o nomap region -> add to the memblock.reserved
> > > >
> > > > NOMAP and memblock.reserved are semantically different, and at makes sense
> > > > to have a "reserved nomap" node in fdt recorded in both memblock.memory and
> > > > memblock.reserved.
> > > >
> > > > memblock.reserved represents the memory that is used by firmware or early
> > > > kernel allocation, so reserved memory in fdt should be reserved in memblock
> > > > as well. I believe it's an oversight that early_init_dt_reserve_memory()
> > > > does not call memblock_reserve() for nomap memory.
> > > >
> > > > NOMAP is a property of a memory region that says that that region should
> > > > not be mapped in the linear map, it's not necessarily in use.
> > >
> > > I agree that the NOMAP region should be added to memblock.reserved.
> > >
> > > So, I think we need to clean-up memmap_init_reserved_pages, because in
> > > this function we call reserve_bootmem_region for memblock.reserved and
> > > memblock.memory with nomap. We don't need to call
> > > reserve_bootmem_region for nomap.
> >
> > Read the comment about memblock_mark_nomap()
> I read the comment about memblock_mark_nomap() and understood that
> regions with nomap flags should be treated as PageReserved.
> But, if we add this nomap region to memblock.reserved, the region with
> nomap flag will be processed in the first for-loop in
> memmap_init_reserved_pages.

memblock still must make sure that pages in nomap regions get PG_Reserved
to be robust against potential errors and bugs in firmware parsing.
 
> Am I thinking wrong?
> 
> Regards,
> DaeRo Lee
DaeRo Lee April 28, 2024, 12:52 p.m. UTC | #12
2024년 4월 28일 (일) 오후 9:01, Mike Rapoport <rppt@kernel.org>님이 작성:
>
> On Sun, Apr 28, 2024 at 07:36:40PM +0900, DaeRo Lee wrote:
> > 2024년 4월 28일 (일) 오후 3:35, Mike Rapoport <rppt@kernel.org>님이 작성:
> > >
> > > On Sat, Apr 27, 2024 at 07:24:23PM +0900, DaeRo Lee wrote:
> > > > 2024년 4월 27일 (토) 오후 5:50, Mike Rapoport <rppt@kernel.org>님이 작성:
> > > > > > >
> > > > > > > So, here is what I think:
> > > > > > > - reserved-memory w/ nomap region -> mark only to memblock.memory
> > > > > > > - reserved-memory w/o nomap region -> add to the memblock.reserved
> > > > >
> > > > > NOMAP and memblock.reserved are semantically different, and at makes sense
> > > > > to have a "reserved nomap" node in fdt recorded in both memblock.memory and
> > > > > memblock.reserved.
> > > > >
> > > > > memblock.reserved represents the memory that is used by firmware or early
> > > > > kernel allocation, so reserved memory in fdt should be reserved in memblock
> > > > > as well. I believe it's an oversight that early_init_dt_reserve_memory()
> > > > > does not call memblock_reserve() for nomap memory.
> > > > >
> > > > > NOMAP is a property of a memory region that says that that region should
> > > > > not be mapped in the linear map, it's not necessarily in use.
> > > >
> > > > I agree that the NOMAP region should be added to memblock.reserved.
> > > >
> > > > So, I think we need to clean-up memmap_init_reserved_pages, because in
> > > > this function we call reserve_bootmem_region for memblock.reserved and
> > > > memblock.memory with nomap. We don't need to call
> > > > reserve_bootmem_region for nomap.
> > >
> > > Read the comment about memblock_mark_nomap()
> > I read the comment about memblock_mark_nomap() and understood that
> > regions with nomap flags should be treated as PageReserved.
> > But, if we add this nomap region to memblock.reserved, the region with
> > nomap flag will be processed in the first for-loop in
> > memmap_init_reserved_pages.
>
> memblock still must make sure that pages in nomap regions get PG_Reserved
> to be robust against potential errors and bugs in firmware parsing.
Got it.
Thank you for your comments. I'll make another patch for just clean-up
early_init_dt_reserve_memory()

Thank you.

Regards,
DaeRo Lee
diff mbox series

Patch

diff --git a/drivers/of/of_reserved_mem.c b/drivers/of/of_reserved_mem.c
index 8236ecae2953..504f2f60689c 100644
--- a/drivers/of/of_reserved_mem.c
+++ b/drivers/of/of_reserved_mem.c
@@ -40,15 +40,18 @@  static int __init early_init_dt_alloc_reserved_memory_arch(phys_addr_t size,
 
 	end = !end ? MEMBLOCK_ALLOC_ANYWHERE : end;
 	align = !align ? SMP_CACHE_BYTES : align;
-	base = memblock_phys_alloc_range(size, align, start, end);
+	if (nomap) {
+		base = memblock_phys_alloc_range_nomap(size, align, start, end);
+	} else {
+		base = memblock_phys_alloc_range(size, align, start, end);
+	}
+	
 	if (!base)
 		return -ENOMEM;
 
 	*res_base = base;
 	if (nomap) {
 		err = memblock_mark_nomap(base, size);
-		if (err)
-			memblock_phys_free(base, size);
 	}
 
 	kmemleak_ignore_phys(base);
diff --git a/mm/memblock.c b/mm/memblock.c
index d09136e040d3..f103f1ecbfad 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1506,6 +1506,72 @@  phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
 	return found;
 }
 
+phys_addr_t __init memblock_alloc_range_nid_nomap(phys_addr_t size,
+                                        phys_addr_t align, phys_addr_t start,
+                                        phys_addr_t end, int nid,
+                                        bool exact_nid)
+{
+        enum memblock_flags flags = choose_memblock_flags();
+        phys_addr_t found;
+
+        if (WARN_ONCE(nid == MAX_NUMNODES, "Usage of MAX_NUMNODES is deprecated. Use NUMA_NO_NODE instead\n"))
+                nid = NUMA_NO_NODE;
+
+        if (!align) {
+                /* Can't use WARNs this early in boot on powerpc */
+                dump_stack();
+                align = SMP_CACHE_BYTES;
+        }
+
+again:
+        found = memblock_find_in_range_node(size, align, start, end, nid,
+                                            flags);
+        if (found)
+                goto done;
+
+        if (nid != NUMA_NO_NODE && !exact_nid) {
+                found = memblock_find_in_range_node(size, align, start,
+                                                    end, NUMA_NO_NODE,
+                                                    flags);
+                if (found)
+                        goto done;
+        }
+
+        if (flags & MEMBLOCK_MIRROR) {
+                flags &= ~MEMBLOCK_MIRROR;
+                pr_warn_ratelimited("Could not allocate %pap bytes of mirrored memory\n",
+                        &size);
+                goto again;
+        }
+
+        return 0;
+
+done:
+        /*
+         * Skip kmemleak for those places like kasan_init() and
+         * early_pgtable_alloc() due to high volume.
+         */
+        if (end != MEMBLOCK_ALLOC_NOLEAKTRACE)
+                /*
+                 * Memblock allocated blocks are never reported as
+                 * leaks. This is because many of these blocks are
+                 * only referred via the physical address which is
+                 * not looked up by kmemleak.
+                 */
+                kmemleak_alloc_phys(found, size, 0);
+
+        /*
+         * Some Virtual Machine platforms, such as Intel TDX or AMD SEV-SNP,
+         * require memory to be accepted before it can be used by the
+         * guest.
+         *
+         * Accept the memory of the allocated buffer.
+         */
+        accept_memory(found, found + size);
+
+        return found;
+}
+
 /**
  * memblock_phys_alloc_range - allocate a memory block inside specified range
  * @size: size of memory block to be allocated in bytes
@@ -1530,6 +1596,18 @@  phys_addr_t __init memblock_phys_alloc_range(phys_addr_t size,
 					false);
 }
 
+phys_addr_t __init memblock_phys_alloc_range_nomap(phys_addr_t size,
+                                                   phys_addr_t align,
+                                                   phys_addr_t start,
+                                                   phys_addr_t end)
+{
+        memblock_dbg("%s: %llu bytes align=0x%llx from=%pa max_addr=%pa %pS\n",
+                     __func__, (u64)size, (u64)align, &start, &end,
+                     (void *)_RET_IP_);
+        return memblock_alloc_range_nid_nomap(size, align, start, end, 
+					      NUMA_NO_NODE, false);
+}
+
 /**
  * memblock_phys_alloc_try_nid - allocate a memory block from specified NUMA node
  * @size: size of memory block to be allocated in bytes