
[1/1] powerpc/kernel: Enables memory hot-remove after reboot on pseries guests

Message ID 20200228060439.52749-1-leonardo@linux.ibm.com
State Superseded
Series [1/1] powerpc/kernel: Enables memory hot-remove after reboot on pseries guests

Checks

Context Check Description
snowpatch_ozlabs/needsstable success Patch has no Fixes tags
snowpatch_ozlabs/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
snowpatch_ozlabs/build-pmac32 success Build succeeded
snowpatch_ozlabs/build-ppc64e success Build succeeded
snowpatch_ozlabs/build-ppc64be success Build succeeded
snowpatch_ozlabs/build-ppc64le success Build succeeded
snowpatch_ozlabs/apply_patch success Successfully applied on branch powerpc/merge (e3a1ab299346a9a415f334e91a78da7ea84aa5a2)

Commit Message

Leonardo Bras Feb. 28, 2020, 6:04 a.m. UTC
When providing guests, it's desirable to be able to resize their memory on demand.

Currently, it's possible to do so by creating a guest with a small base
memory, hot-plugging all the rest, and using the 'movable_node' kernel
command-line parameter, which puts all hot-plugged memory in
ZONE_MOVABLE, allowing it to be removed whenever needed.

But there is an issue regarding guest reboot:
If memory is hot-plugged, and then the guest is rebooted, all hot-plugged
memory goes to ZONE_NORMAL, which offers no guaranteed hot-removal.
This usually prevents the memory from being hot-removed from the guest.

It's possible to use device-tree information to fix that behavior, as
it stores flags for LMB ranges in ibm,dynamic-memory-vN.
The fix involves marking each memblock with the correct flags as hotpluggable
memory, which mm/memblock.c puts in ZONE_MOVABLE during boot if
'movable_node' is passed.

For base memory, qemu assigns these flags to its LMBs:
(DRCONF_MEM_AI_INVALID | DRCONF_MEM_RESERVED)
For hot-plugged memory, it assigns (DRCONF_MEM_ASSIGNED).

While the guest kernel reads the device-tree, early_init_drmem_lmb() is
called for every added LMB, doing nothing for base memory and adding
memblocks for hot-plugged memory. Skipping base memory happens here:

if ((lmb->flags & DRCONF_MEM_RESERVED) ||
    !(lmb->flags & DRCONF_MEM_ASSIGNED))
	return;

Marking memblocks added by this function as hotpluggable memory
is enough to get the desired behavior, and should cause no change
if the 'movable_node' parameter is not passed to the kernel.

Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com>
---
 arch/powerpc/kernel/prom.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Bharata B Rao March 4, 2020, 4:43 a.m. UTC | #1
On Fri, Feb 28, 2020 at 11:36 AM Leonardo Bras <leonardo@linux.ibm.com> wrote:
>
> When providing guests, it's desirable to be able to resize their memory on demand.
>
> Currently, it's possible to do so by creating a guest with a small base
> memory, hot-plugging all the rest, and using the 'movable_node' kernel
> command-line parameter, which puts all hot-plugged memory in
> ZONE_MOVABLE, allowing it to be removed whenever needed.
>
> But there is an issue regarding guest reboot:
> If memory is hot-plugged, and then the guest is rebooted, all hot-plugged
> memory goes to ZONE_NORMAL, which offers no guaranteed hot-removal.
> This usually prevents the memory from being hot-removed from the guest.
>
> It's possible to use device-tree information to fix that behavior, as
> it stores flags for LMB ranges in ibm,dynamic-memory-vN.
> The fix involves marking each memblock with the correct flags as hotpluggable
> memory, which mm/memblock.c puts in ZONE_MOVABLE during boot if
> 'movable_node' is passed.
>
> For base memory, qemu assigns these flags to its LMBs:
> (DRCONF_MEM_AI_INVALID | DRCONF_MEM_RESERVED)
> For hot-plugged memory, it assigns (DRCONF_MEM_ASSIGNED).
>
> While the guest kernel reads the device-tree, early_init_drmem_lmb() is
> called for every added LMB, doing nothing for base memory and adding
> memblocks for hot-plugged memory. Skipping base memory happens here:
>
> if ((lmb->flags & DRCONF_MEM_RESERVED) ||
>     !(lmb->flags & DRCONF_MEM_ASSIGNED))
>         return;
>
> Marking memblocks added by this function as hotpluggable memory
> is enough to get the desired behavior, and should cause no change
> if the 'movable_node' parameter is not passed to the kernel.
>
> Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com>
> ---
>  arch/powerpc/kernel/prom.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> index 6620f37abe73..f4d14c67bf53 100644
> --- a/arch/powerpc/kernel/prom.c
> +++ b/arch/powerpc/kernel/prom.c
> @@ -518,6 +518,8 @@ static void __init early_init_drmem_lmb(struct drmem_lmb *lmb,
>                 DBG("Adding: %llx -> %llx\n", base, size);
>                 if (validate_mem_limit(base, &size))
>                         memblock_add(base, size);
> +
> +               early_init_dt_mark_hotplug_memory_arch(base, size);

Hi,

I tried this a few years back
(https://patchwork.ozlabs.org/patch/800142/) and didn't pursue it
further because at that time, it was felt that the approach might not
work for PowerVM guests, because all the present memory except RMA
gets marked as hot-pluggable by PowerVM. This discussion is not
present in the above thread, but during my private discussions with
Reza and Nathan, it was noted that marking all that memory as MOVABLE
is not preferable for PowerVM guests as we might run out of memory for
kernel allocations.

Regards,
Bharata.
Leonardo Bras March 4, 2020, 7:18 a.m. UTC | #2
Hello Bharata, thanks for this feedback!

On Wed, 2020-03-04 at 10:13 +0530, Bharata B Rao wrote:
> Hi,
> 
> I tried this a few years back
> (https://patchwork.ozlabs.org/patch/800142/) and didn't pursue it
> further because at that time, it was felt that the approach might not
> work for PowerVM guests, because all the present memory except RMA
> gets marked as hot-pluggable by PowerVM. This discussion is not
> present in the above thread, but during my private discussions with
> Reza and Nathan, it was noted that marking all that memory as MOVABLE
> is not preferable for PowerVM guests as we might run out of memory for
> kernel allocations.

Hmm, this makes sense.
But with my change, these pieces of memory only get into ZONE_MOVABLE
if the boot parameter 'movable_node' gets passed to the guest kernel.

So, even if we are unable to sort out some flag combination that works
fine for both use-cases, if PowerVM doesn't pass 'movable_node' as a boot
parameter to the kernel, it will behave just as it does today.

What are your thoughts on that?

Best regards,

Leonardo Bras
Leonardo Bras March 4, 2020, 10:05 p.m. UTC | #3
On Wed, 2020-03-04 at 04:18 -0300, Leonardo Bras wrote:
> Hmm, this makes sense.
> But with my change, these pieces of memory only get into ZONE_MOVABLE
> if the boot parameter 'movable_node' gets passed to the guest kernel.

Hmm, I think your patch also does that.

> So, even if we are unable to sort out some flag combination that works
> fine for both use-cases, if PowerVM doesn't pass 'movable_node' as a boot
> parameter to the kernel, it will behave just as it does today.

Also, another option would be adding a new 'removable' flag, given that
the flags field has a lot of free space. It would only be passed by qemu,
so we would be safe with PowerVM.

Then we would have:
+	if (lmb->flags & DRCONF_MEM_REMOVABLE)
+		early_init_dt_mark_hotplug_memory_arch(base, size);

Do you know if that's possible?
Would we need to update the LOPAPR?

Leonardo
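
The 'removable' flag idea above can be modeled in plain userspace C. This is only a sketch: DRCONF_MEM_REMOVABLE is the flag proposed in this thread, the bit value chosen for it here is purely illustrative, and the other flag values are believed to match arch/powerpc/include/asm/drmem.h but should be treated as assumptions. The helper names are hypothetical and do not exist in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match asm/drmem.h; verify against the tree before relying on them. */
#define DRCONF_MEM_ASSIGNED  0x00000008u
#define DRCONF_MEM_RESERVED  0x00000080u
/* HYPOTHETICAL: flag proposed in this thread; the bit position is illustrative only. */
#define DRCONF_MEM_REMOVABLE 0x00000100u

/* Hypothetical helper: true if early_init_drmem_lmb() would add this LMB. */
static bool removable_lmb_added(uint32_t flags)
{
	if ((flags & DRCONF_MEM_RESERVED) || !(flags & DRCONF_MEM_ASSIGNED))
		return false;
	return true;
}

/* Hypothetical helper: true if the LMB would also be marked hotpluggable. */
static bool removable_lmb_marked(uint32_t flags)
{
	return removable_lmb_added(flags) &&
	       (flags & DRCONF_MEM_REMOVABLE) != 0;
}
```

Under this model, a hypervisor that never sets the new flag keeps today's behavior: its LMBs are still added, but none is marked hotpluggable.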

Patch

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 6620f37abe73..f4d14c67bf53 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -518,6 +518,8 @@  static void __init early_init_drmem_lmb(struct drmem_lmb *lmb,
 		DBG("Adding: %llx -> %llx\n", base, size);
 		if (validate_mem_limit(base, &size))
 			memblock_add(base, size);
+
+		early_init_dt_mark_hotplug_memory_arch(base, size);
 	} while (--rngs);
 }
 #endif /* CONFIG_PPC_PSERIES */
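
As a rough illustration of the behavior this patch aims for, the userspace sketch below models which LMBs are skipped, which are marked hotpluggable, and which would end up in ZONE_MOVABLE. It is not kernel code: the helper names are hypothetical, the flag values are believed to match arch/powerpc/include/asm/drmem.h but are assumptions here, and zone placement is reduced to a single boolean for 'movable_node'.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match asm/drmem.h; verify against the tree before relying on them. */
#define DRCONF_MEM_ASSIGNED   0x00000008u
#define DRCONF_MEM_AI_INVALID 0x00000040u
#define DRCONF_MEM_RESERVED   0x00000080u

/* Mirrors the skip check quoted in the commit message. */
static bool lmb_is_skipped(uint32_t flags)
{
	return (flags & DRCONF_MEM_RESERVED) || !(flags & DRCONF_MEM_ASSIGNED);
}

/* With the patch applied, every LMB that is not skipped is marked hotpluggable. */
static bool lmb_marked_hotplug(uint32_t flags)
{
	return !lmb_is_skipped(flags);
}

/* Simplified model: hotpluggable memory goes to ZONE_MOVABLE only with 'movable_node'. */
static bool lmb_in_zone_movable(uint32_t flags, bool movable_node)
{
	return movable_node && lmb_marked_hotplug(flags);
}
```

With the qemu flag assignments described in the commit message, base memory (DRCONF_MEM_AI_INVALID | DRCONF_MEM_RESERVED) is skipped, while hot-plugged memory (DRCONF_MEM_ASSIGNED) is marked and becomes movable only when 'movable_node' is passed.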