Patchwork [1/1] powerpc: Update page in counter for CMM

login
register
mail settings
Submitter Brian King
Date Oct. 20, 2008, 10:19 p.m.
Message ID <200810202219.m9KMJiRX000463@d01av04.pok.ibm.com>
Download mbox | patch
Permalink /patch/5144/
State Superseded
Headers show

Comments

Brian King - Oct. 20, 2008, 10:19 p.m.
A new field has been added to the VPA as a method for
the client OS to communicate to firmware the number of
page ins it is performing when running collaborative
memory overcommit. The hypervisor will use this information
to better determine if a partition is experiencing memory
pressure and needs more memory allocated to it.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
---

 arch/powerpc/include/asm/lppaca.h |    3 ++-
 arch/powerpc/kernel/paca.c        |    1 +
 arch/powerpc/mm/fault.c           |    8 ++++++--
 3 files changed, 9 insertions(+), 3 deletions(-)
Milton Miller - Oct. 21, 2008, 4:36 a.m.
X-Patchwork-Id: 5144

On Mon Oct 20, 2008 near 12:19:21 GMT, Brian King wrote:
> 
> A new field has been added to the VPA as a method for
> the client OS to communicate to firmware the number of
> page ins it is performing when running collaborative
> memory overcommit. The hypervisor will use this information
> to better determine if a partition is experiencing memory
> pressure and needs more memory allocated to it.
> 
> Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
> ---
> 
>  arch/powerpc/include/asm/lppaca.h |    3 ++-
>  arch/powerpc/kernel/paca.c        |    1 +
>  arch/powerpc/mm/fault.c           |    8 ++++++--
>  3 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff -puN arch/powerpc/mm/fault.c~powerpc_vrm_mm_pressure arch/powerpc/mm/fault.c
> --- linux-2.6/arch/powerpc/mm/fault.c~powerpc_vrm_mm_pressure	2008-10-20 17:13:25.000000000 -0500
> +++ linux-2.6-bjking1/arch/powerpc/mm/fault.c	2008-10-20 17:13:25.000000000 -0500
..
> @@ -318,9 +320,11 @@ good_area:
>  			goto do_sigbus;
>  		BUG();
>  	}
> -	if (ret & VM_FAULT_MAJOR)
> +	if (ret & VM_FAULT_MAJOR) {
>  		current->maj_flt++;
> -	else
> +		if (firmware_has_feature(FW_FEATURE_CMO))
> +			atomic_inc((atomic_t *)(&(get_lppaca()->page_ins)));
> +	} else
>  		current->min_flt++;
>  	up_read(&mm->mmap_sem);
>  	return 0;

(1) why do we need atomic_inc and the hundreds or thousands of cycles to
do the atomic inc in a per-cpu area?

(2) assuming we make this a normal increment, should we keep the
feature test or just do it unconditionally (ie is the additional load
and branch worse that just doing the load and store of the counter --
the address was previously reserved, right?  (no question if
it has to be atomic).


<Ramble things one might consider>

Ben asked if we need to worry about the hypervisor clearing the
count.  If they treat it as a only-incrementing then we don't need
to worry.  And since its only an indicator, then we may not care
about a big count by them interrupting between the load and store

If we are worried about linux preemption, then we need to disable
it to avoid crossing cpu variables or  getting to this point multiple
times.  (I have not looked at the context to see if we are already
disabled).

</Ramble>


milton
Brian King - Oct. 21, 2008, 8:24 p.m.
Milton Miller wrote:
> X-Patchwork-Id: 5144
>> diff -puN arch/powerpc/mm/fault.c~powerpc_vrm_mm_pressure arch/powerpc/mm/fault.c
>> --- linux-2.6/arch/powerpc/mm/fault.c~powerpc_vrm_mm_pressure	2008-10-20 17:13:25.000000000 -0500
>> +++ linux-2.6-bjking1/arch/powerpc/mm/fault.c	2008-10-20 17:13:25.000000000 -0500
> ..
>> @@ -318,9 +320,11 @@ good_area:
>>  			goto do_sigbus;
>>  		BUG();
>>  	}
>> -	if (ret & VM_FAULT_MAJOR)
>> +	if (ret & VM_FAULT_MAJOR) {
>>  		current->maj_flt++;
>> -	else
>> +		if (firmware_has_feature(FW_FEATURE_CMO))
>> +			atomic_inc((atomic_t *)(&(get_lppaca()->page_ins)));
>> +	} else
>>  		current->min_flt++;
>>  	up_read(&mm->mmap_sem);
>>  	return 0;
> 
> (1) why do we need atomic_inc and the hundreds or thousands of cycles to
> do the atomic inc in a per-cpu area?

We don't. I've now removed it.

> (2) assuming we make this a normal increment, should we keep the
> feature test or just do it unconditionally (ie is the additional load
> and branch worse that just doing the load and store of the counter --
> the address was previously reserved, right?  (no question if
> it has to be atomic).

Simplified patch on the way...

> <Ramble things one might consider>
> 
> Ben asked if we need to worry about the hypervisor clearing the
> count.  If they treat it as a only-incrementing then we don't need
> to worry.  And since its only an indicator, then we may not care
> about a big count by them interrupting between the load and store

This is a read only field from the hypervisor's perspective. They
shouldn't be clearing it.

> If we are worried about linux preemption, then we need to disable
> it to avoid crossing cpu variables or  getting to this point multiple
> times.  (I have not looked at the context to see if we are already
> disabled).

Looks to me like linux preemption is a possibility in this code, so
I've added the code to disable preemption around the increment.

-Brian

Patch

diff -puN arch/powerpc/mm/fault.c~powerpc_vrm_mm_pressure arch/powerpc/mm/fault.c
--- linux-2.6/arch/powerpc/mm/fault.c~powerpc_vrm_mm_pressure	2008-10-20 17:13:25.000000000 -0500
+++ linux-2.6-bjking1/arch/powerpc/mm/fault.c	2008-10-20 17:13:25.000000000 -0500
@@ -30,6 +30,8 @@ 
 #include <linux/kprobes.h>
 #include <linux/kdebug.h>
 
+#include <asm/atomic.h>
+#include <asm/firmware.h>
 #include <asm/page.h>
 #include <asm/pgtable.h>
 #include <asm/mmu.h>
@@ -318,9 +320,11 @@  good_area:
 			goto do_sigbus;
 		BUG();
 	}
-	if (ret & VM_FAULT_MAJOR)
+	if (ret & VM_FAULT_MAJOR) {
 		current->maj_flt++;
-	else
+		if (firmware_has_feature(FW_FEATURE_CMO))
+			atomic_inc((atomic_t *)(&(get_lppaca()->page_ins)));
+	} else
 		current->min_flt++;
 	up_read(&mm->mmap_sem);
 	return 0;
diff -puN arch/powerpc/include/asm/lppaca.h~powerpc_vrm_mm_pressure arch/powerpc/include/asm/lppaca.h
--- linux-2.6/arch/powerpc/include/asm/lppaca.h~powerpc_vrm_mm_pressure	2008-10-20 17:13:25.000000000 -0500
+++ linux-2.6-bjking1/arch/powerpc/include/asm/lppaca.h	2008-10-20 17:13:25.000000000 -0500
@@ -133,7 +133,8 @@  struct lppaca {
 //=============================================================================
 // CACHE_LINE_4-5 0x0180 - 0x027F Contains PMC interrupt data
 //=============================================================================
-	u8	pmc_save_area[256];	// PMC interrupt Area           x00-xFF
+	volatile u32 page_ins;		// CMO Hint - # page ins by OS  x00-x04
+	u8	pmc_save_area[252];	// PMC interrupt Area           x04-xFF
 } __attribute__((__aligned__(0x400)));
 
 extern struct lppaca lppaca[];
diff -puN arch/powerpc/kernel/paca.c~powerpc_vrm_mm_pressure arch/powerpc/kernel/paca.c
--- linux-2.6/arch/powerpc/kernel/paca.c~powerpc_vrm_mm_pressure	2008-10-20 17:13:25.000000000 -0500
+++ linux-2.6-bjking1/arch/powerpc/kernel/paca.c	2008-10-20 17:13:25.000000000 -0500
@@ -37,6 +37,7 @@  struct lppaca lppaca[] = {
 		.end_of_quantum = 0xfffffffffffffffful,
 		.slb_count = 64,
 		.vmxregs_in_use = 0,
+		.page_ins = 0,
 	},
 };