[RFCv2,9/9] pseries: Automatically resize HPT for memory hot add/remove

Message ID 1454045043-25545-10-git-send-email-david@gibson.dropbear.id.au (mailing list archive)
State Superseded

Commit Message

David Gibson Jan. 29, 2016, 5:24 a.m. UTC
We've now implemented code in the pseries platform to use the new PAPR
interface to allow resizing the hash page table (HPT) at runtime.

This patch uses that interface to automatically attempt to resize the HPT
when memory is hot added or removed.  This tries to always keep the HPT at
a reasonable size for our current memory size.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 arch/powerpc/include/asm/sparsemem.h |  1 +
 arch/powerpc/mm/hash_utils_64.c      | 29 +++++++++++++++++++++++++++++
 arch/powerpc/mm/mem.c                |  4 ++++
 3 files changed, 34 insertions(+)

Comments

Anshuman Khandual Feb. 1, 2016, 8:51 a.m. UTC | #1
On 01/29/2016 10:54 AM, David Gibson wrote:
>  #ifdef CONFIG_MEMORY_HOTPLUG
> +void resize_hpt_for_hotplug(unsigned long new_mem_size)
> +{
> +	unsigned target_hpt_shift;
> +
> +	if (!ppc_md.resize_hpt)
> +		return;
> +
> +	target_hpt_shift = htab_shift_for_mem_size(new_mem_size);
> +
> +	/*
> +	 * To avoid lots of HPT resizes if memory size is fluctuating
> +	 * across a boundary, we deliberately have some hysteresis


What do you mean by 'memory size is fluctuating across a boundary'?
Through the memory hotplug interface? Why would someone do that? I
can understand why we don't have this check in the sysfs debug path,
since there we want to be able to test any HPT resizing scenario, in
any sequence of increases or decreases.

Overall the RFC V2 looks pretty good. Looking forward to seeing the
host side of the code for this feature.
David Gibson Feb. 1, 2016, 10:55 a.m. UTC | #2
On Mon, Feb 01, 2016 at 02:21:46PM +0530, Anshuman Khandual wrote:
> On 01/29/2016 10:54 AM, David Gibson wrote:
> >  #ifdef CONFIG_MEMORY_HOTPLUG
> > +void resize_hpt_for_hotplug(unsigned long new_mem_size)
> > +{
> > +	unsigned target_hpt_shift;
> > +
> > +	if (!ppc_md.resize_hpt)
> > +		return;
> > +
> > +	target_hpt_shift = htab_shift_for_mem_size(new_mem_size);
> > +
> > +	/*
> > +	 * To avoid lots of HPT resizes if memory size is fluctuating
> > +	 * across a boundary, we deliberately have some hysteresis
> 
> 
> What do you mean by 'memory size is fluctuating across a boundary'?
> Through the memory hotplug interface? Why would someone do that?

I was thinking it might be possible to have some management system
that automatically adjusts memory size based on load, and if that
happened to land on a boundary you could get nasty behaviour.

> I
> can understand why we don't have this check in the sysfs debug path,
> since there we want to be able to test any HPT resizing scenario, in
> any sequence of increases or decreases.
> 
> Overall the RFC V2 looks pretty good. Looking forward to seeing the
> host side of the code for this feature.

The qemu host side has been posted to qemu-devel@nongnu.org already.
I haven't started on a KVM HV implementation yet.
Paul Mackerras Feb. 8, 2016, 6:01 a.m. UTC | #3
On Fri, Jan 29, 2016 at 04:24:03PM +1100, David Gibson wrote:
> We've now implemented code in the pseries platform to use the new PAPR
> interface to allow resizing the hash page table (HPT) at runtime.
> 
> This patch uses that interface to automatically attempt to resize the HPT
> when memory is hot added or removed.  This tries to always keep the HPT at
> a reasonable size for our current memory size.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

Reviewed-by: Paul Mackerras <paulus@samba.org>
Patch

diff --git a/arch/powerpc/include/asm/sparsemem.h b/arch/powerpc/include/asm/sparsemem.h
index f6fc0ee..737335c 100644
--- a/arch/powerpc/include/asm/sparsemem.h
+++ b/arch/powerpc/include/asm/sparsemem.h
@@ -16,6 +16,7 @@ 
 #endif /* CONFIG_SPARSEMEM */
 
 #ifdef CONFIG_MEMORY_HOTPLUG
+extern void resize_hpt_for_hotplug(unsigned long new_mem_size);
 extern int create_section_mapping(unsigned long start, unsigned long end);
 extern int remove_section_mapping(unsigned long start, unsigned long end);
 #ifdef CONFIG_NUMA
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 882e409..18cc851 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -638,6 +638,35 @@  static unsigned long __init htab_get_table_size(void)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
+void resize_hpt_for_hotplug(unsigned long new_mem_size)
+{
+	unsigned target_hpt_shift;
+
+	if (!ppc_md.resize_hpt)
+		return;
+
+	target_hpt_shift = htab_shift_for_mem_size(new_mem_size);
+
+	/*
+	 * To avoid lots of HPT resizes if memory size is fluctuating
+	 * across a boundary, we deliberately have some hysteresis
+	 * here: we immediately increase the HPT size if the target
+	 * shift exceeds the current shift, but we won't attempt to
+	 * reduce unless the target shift is at least 2 below the
+	 * current shift
+	 */
+	if ((target_hpt_shift > ppc64_pft_size)
+	    || (target_hpt_shift < (ppc64_pft_size - 1))) {
+		int rc;
+
+		rc = ppc_md.resize_hpt(target_hpt_shift);
+		if (rc)
+			printk(KERN_WARNING
+			       "Unable to resize hash page table to target order %u: %d\n",
+			       target_hpt_shift, rc);
+	}
+}
+
 int create_section_mapping(unsigned long start, unsigned long end)
 {
 	int rc = htab_bolt_mapping(start, end, __pa(start),
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 8ffc1e2..e77f36c 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -121,6 +121,8 @@  int arch_add_memory(int nid, u64 start, u64 size, bool for_device)
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 	int rc;
 
+	resize_hpt_for_hotplug(memblock_phys_mem_size());
+
 	pgdata = NODE_DATA(nid);
 
 	start = (unsigned long)__va(start);
@@ -161,6 +163,8 @@  int arch_remove_memory(u64 start, u64 size)
 	 */
 	vm_unmap_aliases();
 
+	resize_hpt_for_hotplug(memblock_phys_mem_size());
+
 	return ret;
 }
 #endif
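
To illustrate the "fluctuating across a boundary" case raised in the review, here is a minimal user-space sketch of the hysteresis check. It is not part of the patch. hpt_should_resize() mirrors the condition in resize_hpt_for_hotplug(), while count_resizes() and its parameters are hypothetical helpers added only for illustration. The sketch assumes every resize succeeds and takes target shifts directly as input rather than deriving them from a memory size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Same decision as the patch: grow as soon as the target exceeds the
 * current shift, shrink only when the target is at least 2 below it.
 * Written as "target + 1 < current" rather than "target < current - 1"
 * so that the unsigned arithmetic cannot wrap when current is 0.
 */
static bool hpt_should_resize(unsigned target_shift, unsigned current_shift)
{
	return target_shift > current_shift ||
	       target_shift + 1 < current_shift;
}

/*
 * Hypothetical helper: replay a sequence of target shifts and count
 * how many resizes would be attempted.  With hysteresis == false it
 * resizes whenever the target differs from the current shift.
 * Assumes every resize succeeds.
 */
static unsigned count_resizes(unsigned start_shift, const unsigned *targets,
			      size_t n, bool hysteresis)
{
	unsigned current = start_shift;
	unsigned resizes = 0;

	assert(targets != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		bool resize = hysteresis
			? hpt_should_resize(targets[i], current)
			: targets[i] != current;

		if (resize) {
			current = targets[i];
			resizes++;
		}
	}
	return resizes;
}
```

Starting at shift 26 and replaying targets that alternate between 27 and 26, the hysteresis variant resizes once (up to 27) and then stays there, whereas the naive variant resizes on every step. The cost is that the HPT can remain one step larger than the target until memory shrinks further.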