diff mbox series

[RFC] powerpc/pseries: exploit H_PAGE_SET_UNUSED for partition migration

Message ID 20240111-h_page_set_unused-for-lpm-v1-1-cd56184ad608@linux.ibm.com (mailing list archive)
State New
Headers show
Series [RFC] powerpc/pseries: exploit H_PAGE_SET_UNUSED for partition migration | expand

Checks

Context Check Description
snowpatch_ozlabs/github-powerpc_ppctests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_selftests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_sparse success Successfully ran 4 jobs.
snowpatch_ozlabs/github-powerpc_clang success Successfully ran 6 jobs.
snowpatch_ozlabs/github-powerpc_kernel_qemu success Successfully ran 23 jobs.

Commit Message

Nathan Lynch via B4 Relay Jan. 11, 2024, 11:47 p.m. UTC
From: Nathan Lynch <nathanl@linux.ibm.com>

Although the H_PAGE_INIT hcall's H_PAGE_SET_UNUSED historically has
been tied to the cooperative memory overcommit (CMO) platform feature,
the flag also is treated by the PowerVM hypervisor as a hint that the
page contents need not be copied to the destination during a live
partition migration.

Use the "ibm,migratable-partition" root node property to determine
whether this partition/guest can be migrated. Mark freed pages unused
if so (or if CMO is in use, as before).

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
Several things yet to improve here:

* powerpc's arch_free_page()/HAVE_ARCH_FREE_PAGE should be decoupled
  from CONFIG_PPC_SMLPAR.

* powerpc's arch_free_page() could be made to use a static key if
  justified.

* I have not yet measured the overhead this introduces, nor have I
  measured the benefit to a live migration.

To date, I have smoke tested it by doing a live migration and
performing a build on a kernel with the change, to ensure it doesn't
introduce obvious memory corruption or anything. It hasn't blown up
yet :-)

This will be a possibly significant behavior change in that we will be
flagging pages unused where we typically did not before. Until now,
having CMO enabled was the only way to do this, and I don't think that
feature is used all that much?

Posting this as RFC to see if there are any major concerns.
---
 arch/powerpc/platforms/pseries/lpar.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)


---
base-commit: 44a1aad2fe6c10bfe0589d8047057b10a4c18a19
change-id: 20240111-h_page_set_unused-for-lpm-d9027c672fe8

Best regards,

Comments

Nathan Lynch Feb. 19, 2024, 10:25 p.m. UTC | #1
>
> Although the H_PAGE_INIT hcall's H_PAGE_SET_UNUSED historically has
> been tied to the cooperative memory overcommit (CMO) platform feature,
> the flag also is treated by the PowerVM hypervisor as a hint that the
> page contents need not be copied to the destination during a live
> partition migration.
>
> Use the "ibm,migratable-partition" root node property to determine
> whether this partition/guest can be migrated. Mark freed pages unused
> if so (or if CMO is in use, as before).
>
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
> ---
> Several things yet to improve here:
>
> * powerpc's arch_free_page()/HAVE_ARCH_FREE_PAGE should be decoupled
>   from CONFIG_PPC_SMLPAR.
>
> * powerpc's arch_free_page() could be made to use a static key if
>   justified.
>
> * I have not yet measured the overhead this introduces, nor have I
>   measured the benefit to a live migration.
>
> To date, I have smoke tested it by doing a live migration and
> performing a build on a kernel with the change, to ensure it doesn't
> introduce obvious memory corruption or anything.

An update on this. I've been working on quantifying the benefit, and
I've now seen memory corruption after partition migrations, with several
programs getting mysterious SIGSEGV/SIGILL:

mobility: calling ibm,suspend-me on CPU 7
mobility: CPU 7 waking all threads
mobility: waiting for memory transfer to complete...
touch[10988]: segfault (11) at 8 nip 7fff8fc2f9fc lr 7fff8fc26684 code 1 in ld-2.28.so[7fff8fc20000+30000]
touch[10987]: segfault (11) at 8 nip 7fff86f5f9fc lr 7fff86f56684 code 1 in ld-2.28.so[7fff86f50000+30000]
touch[10988]: code: 3d22ffff 81297470 71290020 4082254c e93d00f0 2fa90000 f93f00b0 409e2458 
touch[10988]: code: e93d00f8 e95d0068 eb1d0000 2fa90000 <ea2a0008> 419e00f8 e93d0158 e91d0058 
touch[10987]: code: 3d22ffff 81297470 71290020 4082254c e93d00f0 2fa90000 f93f00b0 409e2458 
touch[10987]: code: e93d00f8 e95d0068 eb1d0000 2fa90000 <ea2a0008> 419e00f8 e93d0158 e91d0058 

Maybe I had some debug options enabled that masked this before, or just
got lucky.

In any case, it seems obvious now that for this to be safe,
powerpc/pseries needs to implement arch_alloc_page() to undo setting the
"unused" flag.


>
> This will be a possibly significant behavior change in that we will be
> flagging pages unused where we typically did not before. Until now,
> having CMO enabled was the only way to do this, and I don't think that
> feature is used all that much?
>
> Posting this as RFC to see if there are any major concerns.
> ---
>  arch/powerpc/platforms/pseries/lpar.c | 21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
> index 4561667832ed..b264371d8e12 100644
> --- a/arch/powerpc/platforms/pseries/lpar.c
> +++ b/arch/powerpc/platforms/pseries/lpar.c
> @@ -16,6 +16,7 @@
>  #include <linux/export.h>
>  #include <linux/jump_label.h>
>  #include <linux/delay.h>
> +#include <linux/of.h>
>  #include <linux/stop_machine.h>
>  #include <linux/spinlock.h>
>  #include <linux/cpuhotplug.h>
> @@ -1772,17 +1773,25 @@ static void pSeries_set_page_state(struct page *page, int order,
>  	}
>  }
>  
> +static bool migratable_partition;
> +
>  void arch_free_page(struct page *page, int order)
>  {
> -	if (radix_enabled())
> -		return;
> -	if (!cmo_free_hint_flag || !firmware_has_feature(FW_FEATURE_CMO))
> -		return;
> -
> -	pSeries_set_page_state(page, order, H_PAGE_SET_UNUSED);
> +	if (migratable_partition ||
> +	    (firmware_has_feature(FW_FEATURE_CMO) && cmo_free_hint_flag))
> +		pSeries_set_page_state(page, order, H_PAGE_SET_UNUSED);
>  }
>  EXPORT_SYMBOL(arch_free_page);
>  
> +static int __init check_migratable_partition(void)
> +{
> +	struct device_node *root = of_find_node_by_path("/");
> +	migratable_partition = !!of_find_property(root, "ibm,migratable-partition", NULL);
> +	of_node_put(root);
> +	return 0;
> +}
> +machine_device_initcall(pseries, check_migratable_partition);
> +
>  #endif /* CONFIG_PPC_SMLPAR */
>  #endif /* CONFIG_PPC_BOOK3S_64 */
>  
>
Michael Ellerman Feb. 20, 2024, 12:16 p.m. UTC | #2
Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org>
writes:
> From: Nathan Lynch <nathanl@linux.ibm.com>
>
> Although the H_PAGE_INIT hcall's H_PAGE_SET_UNUSED historically has
> been tied to the cooperative memory overcommit (CMO) platform feature,
> the flag also is treated by the PowerVM hypervisor as a hint that the
> page contents need not be copied to the destination during a live
> partition migration.
>
> Use the "ibm,migratable-partition" root node property to determine
> whether this partition/guest can be migrated. Mark freed pages unused
> if so (or if CMO is in use, as before).
>
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
> ---
> Several things yet to improve here:
>
> * powerpc's arch_free_page()/HAVE_ARCH_FREE_PAGE should be decoupled
>   from CONFIG_PPC_SMLPAR.
>
> * powerpc's arch_free_page() could be made to use a static key if
>   justified.
>
> * I have not yet measured the overhead this introduces, nor have I
>   measured the benefit to a live migration.
>
> To date, I have smoke tested it by doing a live migration and
> performing a build on a kernel with the change, to ensure it doesn't
> introduce obvious memory corruption or anything. It hasn't blown up
> yet :-)
>
> This will be a possibly significant behavior change in that we will be
> flagging pages unused where we typically did not before. Until now,
> having CMO enabled was the only way to do this, and I don't think that
> feature is used all that much?

Yeah AFAIK it has to be explicitly configured and enabled via the HMC,
so doesn't get much testing or usage.

> Posting this as RFC to see if there are any major concerns.
 
My worry is that this will add overhead for everyone in normal usage, an
hcall per freed set of pages, whereas the benefit is only seen when a
migration happens.

But that does depend on how often arch_free_page() gets called in normal
usage, which I don't know offhand.

cheers
Nathan Lynch Feb. 20, 2024, 4:20 p.m. UTC | #3
Michael Ellerman <mpe@ellerman.id.au> writes:
> Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org>
> writes:
>> From: Nathan Lynch <nathanl@linux.ibm.com>
>>
>> Although the H_PAGE_INIT hcall's H_PAGE_SET_UNUSED historically has
>> been tied to the cooperative memory overcommit (CMO) platform feature,
>> the flag also is treated by the PowerVM hypervisor as a hint that the
>> page contents need not be copied to the destination during a live
>> partition migration.
>>
>> Use the "ibm,migratable-partition" root node property to determine
>> whether this partition/guest can be migrated. Mark freed pages unused
>> if so (or if CMO is in use, as before).
>>
>> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
>> ---
>> Several things yet to improve here:
>>
>> * powerpc's arch_free_page()/HAVE_ARCH_FREE_PAGE should be decoupled
>>   from CONFIG_PPC_SMLPAR.
>>
>> * powerpc's arch_free_page() could be made to use a static key if
>>   justified.
>>
>> * I have not yet measured the overhead this introduces, nor have I
>>   measured the benefit to a live migration.
>>
>> To date, I have smoke tested it by doing a live migration and
>> performing a build on a kernel with the change, to ensure it doesn't
>> introduce obvious memory corruption or anything. It hasn't blown up
>> yet :-)
>>
>> This will be a possibly significant behavior change in that we will be
>> flagging pages unused where we typically did not before. Until now,
>> having CMO enabled was the only way to do this, and I don't think that
>> feature is used all that much?
>
> Yeah AFAIK it has to be explicitly configured and enabled via the HMC,
> so doesn't get much testing or usage.
>
>> Posting this as RFC to see if there are any major concerns.
>  
> My worry is that this will add overhead for everyone in normal usage, an
> hcall per freed set of pages, whereas the benefit is only seen when a
> migration happens.
>
> But that does depend on how often arch_free_page() gets called in normal
> usage, which I don't know offhand.

Yes, and as I said in my followup yesterday:

>> for this to be safe, powerpc/pseries needs to implement
>> arch_alloc_page() to undo setting the "unused" flag.

So, perhaps more significantly, we'd also incur an hcall per
arch_alloc_page() with the most straightforward implementation that
doesn't eat data (unlike this version!).

Nevertheless I'll plan on doing that for the next iteration to see if I
can measure the overhead and benefit, with the expectation that we'll
ultimately need a more sophisticated design.
diff mbox series

Patch

diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 4561667832ed..b264371d8e12 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -16,6 +16,7 @@ 
 #include <linux/export.h>
 #include <linux/jump_label.h>
 #include <linux/delay.h>
+#include <linux/of.h>
 #include <linux/stop_machine.h>
 #include <linux/spinlock.h>
 #include <linux/cpuhotplug.h>
@@ -1772,17 +1773,25 @@  static void pSeries_set_page_state(struct page *page, int order,
 	}
 }
 
+static bool migratable_partition;
+
 void arch_free_page(struct page *page, int order)
 {
-	if (radix_enabled())
-		return;
-	if (!cmo_free_hint_flag || !firmware_has_feature(FW_FEATURE_CMO))
-		return;
-
-	pSeries_set_page_state(page, order, H_PAGE_SET_UNUSED);
+	if (migratable_partition ||
+	    (firmware_has_feature(FW_FEATURE_CMO) && cmo_free_hint_flag))
+		pSeries_set_page_state(page, order, H_PAGE_SET_UNUSED);
 }
 EXPORT_SYMBOL(arch_free_page);
 
+static int __init check_migratable_partition(void)
+{
+	struct device_node *root = of_find_node_by_path("/");
+	migratable_partition = !!of_find_property(root, "ibm,migratable-partition", NULL);
+	of_node_put(root);
+	return 0;
+}
+machine_device_initcall(pseries, check_migratable_partition);
+
 #endif /* CONFIG_PPC_SMLPAR */
 #endif /* CONFIG_PPC_BOOK3S_64 */