diff mbox series

powerpc/drmem: cache LMBs in xarray to accelerate lookup

Message ID 20200128221113.17158-1-cheloha@linux.ibm.com (mailing list archive)
State Changes Requested
Headers show
Series powerpc/drmem: cache LMBs in xarray to accelerate lookup | expand

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success Successfully applied on branch powerpc/merge (23f1be8476528cfab2a015de5a4c5cecf69536d0)
snowpatch_ozlabs/build-ppc64le success Build succeeded
snowpatch_ozlabs/build-ppc64be success Build succeeded
snowpatch_ozlabs/build-ppc64e success Build succeeded
snowpatch_ozlabs/build-pmac32 success Build succeeded
snowpatch_ozlabs/checkpatch warning total: 0 errors, 1 warnings, 0 checks, 127 lines checked
snowpatch_ozlabs/needsstable success Patch has no Fixes tags

Commit Message

Scott Cheloha Jan. 28, 2020, 10:11 p.m. UTC
LMB lookup is currently an O(n) linear search.  This scales poorly when
there are many LMBs.

If we cache each LMB by both its base address and its DRC index
in an xarray we can cut lookups to O(log n), greatly accelerating
drmem initialization and memory hotplug.

This patch introduces two xarrays of of LMBs and fills them during
drmem initialization.  The patch also adds two interfaces for LMB
lookup.

The first interface, drmem_find_lmb_by_base_addr(), is employed in
hot_add_drconf_scn_to_nid() to replace a linear search.  This speeds up
memory_add_physaddr_to_nid(), which is called by lmb_set_nid(), an
interface used during drmem initialization and memory hotplug.

The second interface, drmem_find_lmb_by_drc_index(), is employed in
get_lmb_range() to replace a linear search.  This speeds up
dlpar_memory_add_by_ic() and dlpar_memory_remove_by_ic(), interfaces
used during memory hotplug.

These substitutions yield significant improvements:

1. A POWER9 VM with a maximum memory of 10TB and 256MB LMBs has
   40960 LMBs.  With this patch it completes drmem_init() ~1138ms
   faster.

Before:
[    0.542244] drmem: initializing drmem v1
[    1.768787] drmem: initialized 40960 LMBs

After:
[    0.543611] drmem: initializing drmem v1
[    0.631386] drmem: initialized 40960 LMBs

2. A POWER9 VM with a maximum memory of 4TB and 256MB LMBs has
   16384 LMBs.  Via the qemu monitor we can hot-add memory as
   virtual DIMMs.  Each DIMM is 256 LMBs.  With this patch we
   hot-add every possible LMB about 60 seconds faster.

Before:
[   17.422177] pseries-hotplug-mem: Attempting to hot-add 256 LMB(s) at index 80000100
[...]
[  167.285563] pseries-hotplug-mem: Memory at 3fff0000000 (drc index 80003fff) was hot-added

After:
[   14.753480] pseries-hotplug-mem: Attempting to hot-add 256 LMB(s) at index 80000100
[...]
[  103.934092] pseries-hotplug-mem: Memory at 3fff0000000 (drc index 80003fff) was hot-added

Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>
---
These linear searches become a serious bottleneck as the machine
approaches 64TB.  There are just too many LMBs to use a linear
search.

On a 60TB machine we recently saw the following soft lockup during
drmem_init():

[   60.602386] watchdog: BUG: soft lockup - CPU#9 stuck for 23s! [swapper/0:1]
[   60.602414] Modules linked in:
[   60.602417] Supported: No, Unreleased kernel
[   60.602423] CPU: 9 PID: 1 Comm: swapper/0 Not tainted 5.3.18-2-default #1 SLE15-SP2 (unreleased)
[   60.602426] NIP:  c000000000095c0c LR: c000000000095bb0 CTR: 0000000000000000
[   60.602430] REGS: c00022c7fc497830 TRAP: 0901   Not tainted  (5.3.18-2-default)
[   60.602432] MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 44000244  XER: 00000000
[   60.602442] CFAR: c000000000095c18 IRQMASK: 0 
               GPR00: c000000000095bb0 c00022c7fc497ac0 c00000000162cc00 c0003bffffff5e08 
               GPR04: 0000000000000000 c000000000ea539a 000000000000002f 0000000010000000 
               GPR08: c00007fc5f59ffb8 000014bc50000000 c00007fc5f1f1a30 c0000000014f2fb8 
               GPR12: 0000000000000000 c00000001e980600 
[   60.602464] NIP [c000000000095c0c] hot_add_scn_to_nid+0xbc/0x400
[   60.602467] LR [c000000000095bb0] hot_add_scn_to_nid+0x60/0x400
[   60.602470] Call Trace:
[   60.602473] [c00022c7fc497ac0] [c000000000095bb0] hot_add_scn_to_nid+0x60/0x400 (unreliable)
[   60.602478] [c00022c7fc497b20] [c00000000007a6a0] memory_add_physaddr_to_nid+0x20/0x60
[   60.602483] [c00022c7fc497b40] [c0000000010235a4] drmem_init+0x258/0x2d8
[   60.602485] [c00022c7fc497c10] [c000000000010694] do_one_initcall+0x64/0x300
[   60.602489] [c00022c7fc497ce0] [c0000000010144f8] kernel_init_freeable+0x2e8/0x3fc
[   60.602491] [c00022c7fc497db0] [c000000000010b0c] kernel_init+0x2c/0x160
[   60.602497] [c00022c7fc497e20] [c00000000000b960] ret_from_kernel_thread+0x5c/0x7c
[   60.602498] Instruction dump:
[   60.602501] 7d0a4214 7faa4040 419d0328 e92a0010 71290088 2fa90008 409e001c e92a0000 
[   60.602506] 7fbe4840 419c0010 7d274a14 7fbe4840 <419c00e4> 394a0018 7faa4040 409dffd0 

This patch should eliminate the drmem_init() bottleneck during boot.

One other important thing to note is that this only addresses part of
the slowdown during memory hotplug when there are many LMBs.  A far larger
part of it is caused by the linear memblock search in find_memory_block().
That problem is addressed with this patch:

https://lore.kernel.org/lkml/20200121231028.13699-1-cheloha@linux.ibm.com/

which is in linux-next.  The numbers I quote here in the commit message
for time improvements during hotplug are taken with that patch applied.

Without it, hotplug is even slower.

 arch/powerpc/include/asm/drmem.h              |  3 ++
 arch/powerpc/mm/drmem.c                       | 33 +++++++++++++++++++
 arch/powerpc/mm/numa.c                        | 30 +++++++----------
 .../platforms/pseries/hotplug-memory.c        | 11 ++-----
 4 files changed, 50 insertions(+), 27 deletions(-)

Comments

Nathan Lynch Jan. 28, 2020, 11:56 p.m. UTC | #1
Scott Cheloha <cheloha@linux.ibm.com> writes:
> LMB lookup is currently an O(n) linear search.  This scales poorly when
> there are many LMBs.
>
> If we cache each LMB by both its base address and its DRC index
> in an xarray we can cut lookups to O(log n), greatly accelerating
> drmem initialization and memory hotplug.
>
> This patch introduces two xarrays of of LMBs and fills them during
> drmem initialization.  The patch also adds two interfaces for LMB
> lookup.

Good but can you replace the array of LMBs altogether
(drmem_info->lmbs)? xarray allows iteration over the members if needed.
Scott Cheloha Jan. 29, 2020, 6:10 p.m. UTC | #2
On Tue, Jan 28, 2020 at 05:56:55PM -0600, Nathan Lynch wrote:
> Scott Cheloha <cheloha@linux.ibm.com> writes:
> > LMB lookup is currently an O(n) linear search.  This scales poorly when
> > there are many LMBs.
> >
> > If we cache each LMB by both its base address and its DRC index
> > in an xarray we can cut lookups to O(log n), greatly accelerating
> > drmem initialization and memory hotplug.
> >
> > This patch introduces two xarrays of of LMBs and fills them during
> > drmem initialization.  The patch also adds two interfaces for LMB
> > lookup.
> 
> Good but can you replace the array of LMBs altogether
> (drmem_info->lmbs)? xarray allows iteration over the members if needed.

I don't think we can without potentially changing the current behavior.

The current behavior in dlpar_memory_{add,remove}_by_ic() is to advance
linearly through the array from the LMB with the matching DRC index.

Iteration through the xarray via xa_for_each_start() will return LMBs
indexed with monotonically increasing DRC indices.

Are they equivalent?  Or can we have an LMB with a smaller DRC index
appear at a greater offset in the array?

If the following condition is possible:

	drmem_info->lmbs[i].drc_index > drmem_info->lmbs[j].drc_index

where i < j, then we have a possible behavior change because
xa_for_each_start() may not return a contiguous array slice.  It might
"leap backwards" in the array.  Or it might skip over a chunk of LMBs.
Fontenot, Nathan Jan. 30, 2020, 4:09 p.m. UTC | #3
On 1/29/2020 12:10 PM, Scott Cheloha wrote:
> On Tue, Jan 28, 2020 at 05:56:55PM -0600, Nathan Lynch wrote:
>> Scott Cheloha <cheloha@linux.ibm.com> writes:
>>> LMB lookup is currently an O(n) linear search.  This scales poorly when
>>> there are many LMBs.
>>>
>>> If we cache each LMB by both its base address and its DRC index
>>> in an xarray we can cut lookups to O(log n), greatly accelerating
>>> drmem initialization and memory hotplug.
>>>
>>> This patch introduces two xarrays of of LMBs and fills them during
>>> drmem initialization.  The patch also adds two interfaces for LMB
>>> lookup.
>>
>> Good but can you replace the array of LMBs altogether
>> (drmem_info->lmbs)? xarray allows iteration over the members if needed.
> 
> I don't think we can without potentially changing the current behavior.
> 
> The current behavior in dlpar_memory_{add,remove}_by_ic() is to advance
> linearly through the array from the LMB with the matching DRC index.
> 
> Iteration through the xarray via xa_for_each_start() will return LMBs
> indexed with monotonically increasing DRC indices.> 
> Are they equivalent?  Or can we have an LMB with a smaller DRC index
> appear at a greater offset in the array?
> 
> If the following condition is possible:
> 
> 	drmem_info->lmbs[i].drc_index > drmem_info->lmbs[j].drc_index
> 
> where i < j, then we have a possible behavior change because
> xa_for_each_start() may not return a contiguous array slice.  It might
> "leap backwards" in the array.  Or it might skip over a chunk of LMBs.
> 

The LMB array should have each LMB in monotonically increasing DRC Index
value. Note that this is set up based on the DT property but I don't recall
ever seeing the DT specify LMBs out of order or not being contiguous.

I am not very familiar with xarrays but it appears this should be possible.

-Nathan
Scott Cheloha Feb. 3, 2020, 8:13 p.m. UTC | #4
On Thu, Jan 30, 2020 at 10:09:32AM -0600, Fontenot, Nathan wrote:
> On 1/29/2020 12:10 PM, Scott Cheloha wrote:
> > On Tue, Jan 28, 2020 at 05:56:55PM -0600, Nathan Lynch wrote:
> >> Scott Cheloha <cheloha@linux.ibm.com> writes:
> >>> LMB lookup is currently an O(n) linear search.  This scales poorly when
> >>> there are many LMBs.
> >>>
> >>> If we cache each LMB by both its base address and its DRC index
> >>> in an xarray we can cut lookups to O(log n), greatly accelerating
> >>> drmem initialization and memory hotplug.
> >>>
> >>> This patch introduces two xarrays of of LMBs and fills them during
> >>> drmem initialization.  The patch also adds two interfaces for LMB
> >>> lookup.
> >>
> >> Good but can you replace the array of LMBs altogether
> >> (drmem_info->lmbs)? xarray allows iteration over the members if needed.
> > 
> > I don't think we can without potentially changing the current behavior.
> > 
> > The current behavior in dlpar_memory_{add,remove}_by_ic() is to advance
> > linearly through the array from the LMB with the matching DRC index.
> > 
> > Iteration through the xarray via xa_for_each_start() will return LMBs
> > indexed with monotonically increasing DRC indices.> 
> > Are they equivalent?  Or can we have an LMB with a smaller DRC index
> > appear at a greater offset in the array?
> > 
> > If the following condition is possible:
> > 
> > 	drmem_info->lmbs[i].drc_index > drmem_info->lmbs[j].drc_index
> > 
> > where i < j, then we have a possible behavior change because
> > xa_for_each_start() may not return a contiguous array slice.  It might
> > "leap backwards" in the array.  Or it might skip over a chunk of LMBs.
> > 
> 
> The LMB array should have each LMB in monotonically increasing DRC Index
> value. Note that this is set up based on the DT property but I don't recall
> ever seeing the DT specify LMBs out of order or not being contiguous.

Is that ordering guaranteed by the PAPR or some other spec or is that
just a convention?

Code like drmem_update_dt_v1() makes me very nervous:

static int drmem_update_dt_v1(struct device_node *memory,
                              struct property *prop)
{
        struct property *new_prop;
        struct of_drconf_cell_v1 *dr_cell;
        struct drmem_lmb *lmb;
        u32 *p;

        new_prop = clone_property(prop, prop->length);
        if (!new_prop)
                return -1;

        p = new_prop->value;
        *p++ = cpu_to_be32(drmem_info->n_lmbs);

        dr_cell = (struct of_drconf_cell_v1 *)p;

        for_each_drmem_lmb(lmb) {
                dr_cell->base_addr = cpu_to_be64(lmb->base_addr);
                dr_cell->drc_index = cpu_to_be32(lmb->drc_index);
                dr_cell->aa_index = cpu_to_be32(lmb->aa_index);
                dr_cell->flags = cpu_to_be32(drmem_lmb_flags(lmb));

                dr_cell++;
        }

        of_update_property(memory, new_prop);
        return 0;
}

If for whatever reason the firmware has a DRC that isn't monotonically
increasing and we update a firmware property at the wrong offset I have
no idea what would happen.

With the array we preserve the order.  Without it we might violate
some assumption the firmware has made.
Scott Cheloha Feb. 4, 2020, 4:19 p.m. UTC | #5
On Tue, Jan 28, 2020 at 05:56:55PM -0600, Nathan Lynch wrote:
> Scott Cheloha <cheloha@linux.ibm.com> writes:
> > LMB lookup is currently an O(n) linear search.  This scales poorly when
> > there are many LMBs.
> >
> > If we cache each LMB by both its base address and its DRC index
> > in an xarray we can cut lookups to O(log n), greatly accelerating
> > drmem initialization and memory hotplug.
> >
> > This patch introduces two xarrays of of LMBs and fills them during
> > drmem initialization.  The patch also adds two interfaces for LMB
> > lookup.
> 
> Good but can you replace the array of LMBs altogether
> (drmem_info->lmbs)? xarray allows iteration over the members if needed.

I would like to try to "solve one problem at a time".

We can fix the linear search performance scaling problems without
removing the array of LMBs.  As I've shown in my diff, we can do it
with minimal change to the existing code.

If it turns out that the PAPR guarantees the ordering of the memory
DRCs then in a subsequent patch (series) we can replace the LMB array
(__drmem_info.lmbs) with an xarray indexed by DRC and use e.g.
xa_for_each() in the hotplug code.
Fontenot, Nathan Feb. 5, 2020, 2:33 p.m. UTC | #6
On 2/3/2020 2:13 PM, Scott Cheloha wrote:
> On Thu, Jan 30, 2020 at 10:09:32AM -0600, Fontenot, Nathan wrote:
>> On 1/29/2020 12:10 PM, Scott Cheloha wrote:
>>> On Tue, Jan 28, 2020 at 05:56:55PM -0600, Nathan Lynch wrote:
>>>> Scott Cheloha <cheloha@linux.ibm.com> writes:
>>>>> LMB lookup is currently an O(n) linear search.  This scales poorly when
>>>>> there are many LMBs.
>>>>>
>>>>> If we cache each LMB by both its base address and its DRC index
>>>>> in an xarray we can cut lookups to O(log n), greatly accelerating
>>>>> drmem initialization and memory hotplug.
>>>>>
>>>>> This patch introduces two xarrays of of LMBs and fills them during
>>>>> drmem initialization.  The patch also adds two interfaces for LMB
>>>>> lookup.
>>>>
>>>> Good but can you replace the array of LMBs altogether
>>>> (drmem_info->lmbs)? xarray allows iteration over the members if needed.
>>>
>>> I don't think we can without potentially changing the current behavior.
>>>
>>> The current behavior in dlpar_memory_{add,remove}_by_ic() is to advance
>>> linearly through the array from the LMB with the matching DRC index.
>>>
>>> Iteration through the xarray via xa_for_each_start() will return LMBs
>>> indexed with monotonically increasing DRC indices.> 
>>> Are they equivalent?  Or can we have an LMB with a smaller DRC index
>>> appear at a greater offset in the array?
>>>
>>> If the following condition is possible:
>>>
>>> 	drmem_info->lmbs[i].drc_index > drmem_info->lmbs[j].drc_index
>>>
>>> where i < j, then we have a possible behavior change because
>>> xa_for_each_start() may not return a contiguous array slice.  It might
>>> "leap backwards" in the array.  Or it might skip over a chunk of LMBs.
>>>
>>
>> The LMB array should have each LMB in monotonically increasing DRC Index
>> value. Note that this is set up based on the DT property but I don't recall
>> ever seeing the DT specify LMBs out of order or not being contiguous.
> 
> Is that ordering guaranteed by the PAPR or some other spec or is that
> just a convention?

From what I remember the PAPR does not specify that DRC indexes are guaranteed
to be contiguous. In past discussions with pHyp developers I had been told that
they always generate contiguous DRC index values but without a specification in
the PAPR that could always break.

-Nathan

> 
> Code like drmem_update_dt_v1() makes me very nervous:
> 
> static int drmem_update_dt_v1(struct device_node *memory,
>                               struct property *prop)
> {
>         struct property *new_prop;
>         struct of_drconf_cell_v1 *dr_cell;
>         struct drmem_lmb *lmb;
>         u32 *p;
> 
>         new_prop = clone_property(prop, prop->length);
>         if (!new_prop)
>                 return -1;
> 
>         p = new_prop->value;
>         *p++ = cpu_to_be32(drmem_info->n_lmbs);
> 
>         dr_cell = (struct of_drconf_cell_v1 *)p;
> 
>         for_each_drmem_lmb(lmb) {
>                 dr_cell->base_addr = cpu_to_be64(lmb->base_addr);
>                 dr_cell->drc_index = cpu_to_be32(lmb->drc_index);
>                 dr_cell->aa_index = cpu_to_be32(lmb->aa_index);
>                 dr_cell->flags = cpu_to_be32(drmem_lmb_flags(lmb));
> 
>                 dr_cell++;
>         }
> 
>         of_update_property(memory, new_prop);
>         return 0;
> }
> 
> If for whatever reason the firmware has a DRC that isn't monotonically
> increasing and we update a firmware property at the wrong offset I have
> no idea what would happen.
> 
> With the array we preserve the order.  Without it we might violate
> some assumption the firmware has made.
>
Scott Cheloha Feb. 21, 2020, 5:29 p.m. UTC | #7
This patch series introduces two xarrays of LMBs in an effort to speed
up the drmem code and simplify the hotplug code on pseries machines.

The first patch introduces an xarray of LMBs indexed by physical
address.  xa_load() is then used to accelerate LMB lookup during
memory_add_physaddr_to_nid().  The interface is used during boot,
in drmem_init(), and memory hot-add, in dlpar_add_lmb(), on pseries
machines.  Its linear LMB search is a serious bottleneck on larger
machines that xa_load() flattens nicely as shown in the before/after
example.

The second patch introduces a second xarray of LMBs, this one
indexed by DRC index.  The xarray API is leveraged to replace
the custom LMB search/iteration/marking code currently used in
the pseries memory hotplug module.  The result is cleaner and,
again thanks to xa_load(), faster.

v1: One big patch.

v2 changes:
- Split up the big patch from v1 into a series.
- Provide a more dramatic example in patch 1/2 to emphasize the
  linear search bottleneck in memory_add_physaddr_to_nid().
- Expand the use of the xarray API in patch 2/2 to replace more
  custom code.
Nathan Lynch Feb. 21, 2020, 6:11 p.m. UTC | #8
Hi Scott, I've owed you a follow-up on this for a couple weeks, sorry.

Beyond the issue of whether to remove the drmem_info->lmbs array, there
are some other things to address.

Scott Cheloha <cheloha@linux.ibm.com> writes:
> diff --git a/arch/powerpc/include/asm/drmem.h b/arch/powerpc/include/asm/drmem.h
> index 3d76e1c388c2..a37cbe794cdd 100644
> --- a/arch/powerpc/include/asm/drmem.h
> +++ b/arch/powerpc/include/asm/drmem.h
> @@ -88,6 +88,9 @@ static inline bool drmem_lmb_reserved(struct drmem_lmb *lmb)
>  	return lmb->flags & DRMEM_LMB_RESERVED;
>  }
>  
> +struct drmem_lmb *drmem_find_lmb_by_base_addr(unsigned long);
> +struct drmem_lmb *drmem_find_lmb_by_drc_index(unsigned long);

Need identifiers for the function arguments here. checkpatch warns about
this. Also drc_index conventionally is handled as u32 in this code.


> +struct drmem_lmb *drmem_find_lmb_by_base_addr(unsigned long base_addr)
> +{
> +	return xa_load(&drmem_lmb_base_addr, base_addr);
> +}
> +
> +struct drmem_lmb *drmem_find_lmb_by_drc_index(unsigned long drc_index)
> +{
> +	return xa_load(&drmem_lmb_drc_index, drc_index);
> +}
> +
> +static int drmem_lmb_cache_for_lookup(struct drmem_lmb *lmb)

This is called only from __init functions, so it should be __init as well.


> +{
> +	void *ret;
> +
> +	ret = xa_store(&drmem_lmb_base_addr, lmb->base_addr, lmb,  GFP_KERNEL);
> +	if (xa_err(ret))
> +		return xa_err(ret);
> +
> +	ret = xa_store(&drmem_lmb_drc_index, lmb->drc_index, lmb, GFP_KERNEL);
> +	if (xa_err(ret))
> +		return xa_err(ret);
> +
> +	return 0;
> +}
> +
>  static u32 drmem_lmb_flags(struct drmem_lmb *lmb)
>  {
>  	/*
> @@ -364,6 +392,8 @@ static void __init init_drmem_v1_lmbs(const __be32 *prop)
>  
>  	for_each_drmem_lmb(lmb) {
>  		read_drconf_v1_cell(lmb, &prop);
> +		if (drmem_lmb_cache_for_lookup(lmb) != 0)
> +			return;
>  		lmb_set_nid(lmb);
>  	}

Failing to record an lmb in the caches shouldn't be cause for silently
aborting this initialization. Future lookups against the caches (should
the system even boot) may fail, but the drmem_lmbs will still be
initialized correctly.

I'd say just ignore (or perhaps log once) xa_store() failures as long as
this code only runs at boot.


> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 50d68d21ddcc..23684d44549f 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -958,27 +958,21 @@ early_param("topology_updates", early_topology_updates);
>  static int hot_add_drconf_scn_to_nid(unsigned long scn_addr)
>  {
>  	struct drmem_lmb *lmb;
> -	unsigned long lmb_size;
> -	int nid = NUMA_NO_NODE;
> -
> -	lmb_size = drmem_lmb_size();
> -
> -	for_each_drmem_lmb(lmb) {
> -		/* skip this block if it is reserved or not assigned to
> -		 * this partition */
> -		if ((lmb->flags & DRCONF_MEM_RESERVED)
> -		    || !(lmb->flags & DRCONF_MEM_ASSIGNED))
> -			continue;
>  
> -		if ((scn_addr < lmb->base_addr)
> -		    || (scn_addr >= (lmb->base_addr + lmb_size)))
> -			continue;
> +	lmb = drmem_find_lmb_by_base_addr(scn_addr);
> +	if (lmb == NULL)
> +		return NUMA_NO_NODE;
>  
> -		nid = of_drconf_to_nid_single(lmb);
> -		break;
> -	}
> +	/*
> +	 * We can't use it if it is reserved or not assigned to
> +	 * this partition.
> +	 */
> +	if (lmb->flags & DRCONF_MEM_RESERVED)
> +		return NUMA_NO_NODE;
> +	if (!(lmb->flags & DRCONF_MEM_ASSIGNED))
> +		return NUMA_NO_NODE;
>  
> -	return nid;
> +	return of_drconf_to_nid_single(lmb);
>  }
>  
>  /*
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index c126b94d1943..29bd19831a9a 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -222,17 +222,10 @@ static int get_lmb_range(u32 drc_index, int n_lmbs,
>  			 struct drmem_lmb **start_lmb,
>  			 struct drmem_lmb **end_lmb)
>  {
> -	struct drmem_lmb *lmb, *start, *end;
> +	struct drmem_lmb *start, *end;
>  	struct drmem_lmb *last_lmb;
>  
> -	start = NULL;
> -	for_each_drmem_lmb(lmb) {
> -		if (lmb->drc_index == drc_index) {
> -			start = lmb;
> -			break;
> -		}
> -	}
> -
> +	start = drmem_find_lmb_by_drc_index(drc_index);
>  	if (!start)
>  		return -EINVAL;

The changes to hot_add_drconf_scn_to_nid() and get_lmb_range() look
correct to me.
Nathan Lynch Feb. 21, 2020, 6:28 p.m. UTC | #9
Nathan Lynch <nathanl@linux.ibm.com> writes:

> Hi Scott, I've owed you a follow-up on this for a couple weeks, sorry.

I see v2 was posted as I was writing this, so I'll follow up on that
thread.
diff mbox series

Patch

diff --git a/arch/powerpc/include/asm/drmem.h b/arch/powerpc/include/asm/drmem.h
index 3d76e1c388c2..a37cbe794cdd 100644
--- a/arch/powerpc/include/asm/drmem.h
+++ b/arch/powerpc/include/asm/drmem.h
@@ -88,6 +88,9 @@  static inline bool drmem_lmb_reserved(struct drmem_lmb *lmb)
 	return lmb->flags & DRMEM_LMB_RESERVED;
 }
 
+struct drmem_lmb *drmem_find_lmb_by_base_addr(unsigned long);
+struct drmem_lmb *drmem_find_lmb_by_drc_index(unsigned long);
+
 u64 drmem_lmb_memory_max(void);
 void __init walk_drmem_lmbs(struct device_node *dn,
 			void (*func)(struct drmem_lmb *, const __be32 **));
diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
index 557d9080604d..7c464b0a256e 100644
--- a/arch/powerpc/mm/drmem.c
+++ b/arch/powerpc/mm/drmem.c
@@ -11,9 +11,12 @@ 
 #include <linux/of.h>
 #include <linux/of_fdt.h>
 #include <linux/memblock.h>
+#include <linux/xarray.h>
 #include <asm/prom.h>
 #include <asm/drmem.h>
 
+static DEFINE_XARRAY(drmem_lmb_base_addr);
+static DEFINE_XARRAY(drmem_lmb_drc_index);
 static struct drmem_lmb_info __drmem_info;
 struct drmem_lmb_info *drmem_info = &__drmem_info;
 
@@ -25,6 +28,31 @@  u64 drmem_lmb_memory_max(void)
 	return last_lmb->base_addr + drmem_lmb_size();
 }
 
+struct drmem_lmb *drmem_find_lmb_by_base_addr(unsigned long base_addr)
+{
+	return xa_load(&drmem_lmb_base_addr, base_addr);
+}
+
+struct drmem_lmb *drmem_find_lmb_by_drc_index(unsigned long drc_index)
+{
+	return xa_load(&drmem_lmb_drc_index, drc_index);
+}
+
+static int drmem_lmb_cache_for_lookup(struct drmem_lmb *lmb)
+{
+	void *ret;
+
+	ret = xa_store(&drmem_lmb_base_addr, lmb->base_addr, lmb,  GFP_KERNEL);
+	if (xa_err(ret))
+		return xa_err(ret);
+
+	ret = xa_store(&drmem_lmb_drc_index, lmb->drc_index, lmb, GFP_KERNEL);
+	if (xa_err(ret))
+		return xa_err(ret);
+
+	return 0;
+}
+
 static u32 drmem_lmb_flags(struct drmem_lmb *lmb)
 {
 	/*
@@ -364,6 +392,8 @@  static void __init init_drmem_v1_lmbs(const __be32 *prop)
 
 	for_each_drmem_lmb(lmb) {
 		read_drconf_v1_cell(lmb, &prop);
+		if (drmem_lmb_cache_for_lookup(lmb) != 0)
+			return;
 		lmb_set_nid(lmb);
 	}
 }
@@ -411,6 +441,9 @@  static void __init init_drmem_v2_lmbs(const __be32 *prop)
 			lmb->aa_index = dr_cell.aa_index;
 			lmb->flags = dr_cell.flags;
 
+			if (drmem_lmb_cache_for_lookup(lmb) != 0)
+				return;
+
 			lmb_set_nid(lmb);
 		}
 	}
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 50d68d21ddcc..23684d44549f 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -958,27 +958,21 @@  early_param("topology_updates", early_topology_updates);
 static int hot_add_drconf_scn_to_nid(unsigned long scn_addr)
 {
 	struct drmem_lmb *lmb;
-	unsigned long lmb_size;
-	int nid = NUMA_NO_NODE;
-
-	lmb_size = drmem_lmb_size();
-
-	for_each_drmem_lmb(lmb) {
-		/* skip this block if it is reserved or not assigned to
-		 * this partition */
-		if ((lmb->flags & DRCONF_MEM_RESERVED)
-		    || !(lmb->flags & DRCONF_MEM_ASSIGNED))
-			continue;
 
-		if ((scn_addr < lmb->base_addr)
-		    || (scn_addr >= (lmb->base_addr + lmb_size)))
-			continue;
+	lmb = drmem_find_lmb_by_base_addr(scn_addr);
+	if (lmb == NULL)
+		return NUMA_NO_NODE;
 
-		nid = of_drconf_to_nid_single(lmb);
-		break;
-	}
+	/*
+	 * We can't use it if it is reserved or not assigned to
+	 * this partition.
+	 */
+	if (lmb->flags & DRCONF_MEM_RESERVED)
+		return NUMA_NO_NODE;
+	if (!(lmb->flags & DRCONF_MEM_ASSIGNED))
+		return NUMA_NO_NODE;
 
-	return nid;
+	return of_drconf_to_nid_single(lmb);
 }
 
 /*
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index c126b94d1943..29bd19831a9a 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -222,17 +222,10 @@  static int get_lmb_range(u32 drc_index, int n_lmbs,
 			 struct drmem_lmb **start_lmb,
 			 struct drmem_lmb **end_lmb)
 {
-	struct drmem_lmb *lmb, *start, *end;
+	struct drmem_lmb *start, *end;
 	struct drmem_lmb *last_lmb;
 
-	start = NULL;
-	for_each_drmem_lmb(lmb) {
-		if (lmb->drc_index == drc_index) {
-			start = lmb;
-			break;
-		}
-	}
-
+	start = drmem_find_lmb_by_drc_index(drc_index);
 	if (!start)
 		return -EINVAL;