Message ID | 1434509021-24168-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | Michael Ellerman |
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:

Hi Scott,

> Current swap encoding in pte can't support large pfns
> above 4TB. Change the swap encoding such that we put
> the swap type in the PTE bits. Also add build checks
> to make sure we don't overlap with HPTEFLAGS.
>

Can you please review this w.r.t 64bit booke ?

-aneesh

On Wed, 2015-06-17 at 08:21 +0530, Aneesh Kumar K.V wrote:
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:
>
> Hi Scott,
>
> > Current swap encoding in pte can't support large pfns
> > above 4TB. Change the swap encoding such that we put
> > the swap type in the PTE bits. Also add build checks
> > to make sure we don't overlap with HPTEFLAGS.
> >
>
> Can you please review this w.r.t 64bit booke ?

I booted it on our p5020ds FWIW.

cheers

On Wed, 2015-06-17 at 19:45 +1000, Michael Ellerman wrote:
> On Wed, 2015-06-17 at 08:21 +0530, Aneesh Kumar K.V wrote:
> > "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:
> >
> > Hi Scott,
> >
> > > Current swap encoding in pte can't support large pfns
> > > above 4TB. Change the swap encoding such that we put
> > > the swap type in the PTE bits. Also add build checks
> > > to make sure we don't overlap with HPTEFLAGS.
> > >
> >
> > Can you please review this w.r.t 64bit booke ?

It looks OK.

I'm curious why _PAGE_BIT_SWAP_TYPE is 2 -- it seems like it could be
any value >= 1 that isn't large enough to cause a conflict. Does
something get stored in that second bit?

> I booted it on our p5020ds FWIW.

Actively using swap?

-Scott

On Wed, 2015-06-17 at 16:14 -0500, Scott Wood wrote:
> On Wed, 2015-06-17 at 19:45 +1000, Michael Ellerman wrote:
> > On Wed, 2015-06-17 at 08:21 +0530, Aneesh Kumar K.V wrote:
> > > "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:
> > >
> > > Hi Scott,
> > >
> > > > Current swap encoding in pte can't support large pfns
> > > > above 4TB. Change the swap encoding such that we put
> > > > the swap type in the PTE bits. Also add build checks
> > > > to make sure we don't overlap with HPTEFLAGS.
> > > >
> > >
> > > Can you please review this w.r.t 64bit booke ?
>
> It looks OK.
>
> I'm curious why _PAGE_BIT_SWAP_TYPE is 2 -- it seems like it could be
> any value >= 1 that isn't large enough to cause a conflict. Does
> something get stored in that second bit?
>
> > I booted it on our p5020ds FWIW.
>
> Actively using swap?

Yeah good point, it wasn't.

I ran 4 make -j kernel builds in parallel which seemed to do the trick:

             total       used       free     shared    buffers     cached
Mem:       4053952    4038324      15628        344       2880      26932
-/+ buffers/cache:    4008512      45440
Swap:      7918588    6102800    1815788

Of course it went OOM not long after that, but it's still pinging and it's
running fine, just spending all its time printing the OOM kill info to the
console.

cheers

Scott Wood <scottwood@freescale.com> writes:

> On Wed, 2015-06-17 at 19:45 +1000, Michael Ellerman wrote:
>> On Wed, 2015-06-17 at 08:21 +0530, Aneesh Kumar K.V wrote:
>> > "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:
>> >
>> > Hi Scott,
>> >
>> > > Current swap encoding in pte can't support large pfns
>> > > above 4TB. Change the swap encoding such that we put
>> > > the swap type in the PTE bits. Also add build checks
>> > > to make sure we don't overlap with HPTEFLAGS.
>> > >
>> >
>> > Can you please review this w.r.t 64bit booke ?
>
> It looks OK.
>
> I'm curious why _PAGE_BIT_SWAP_TYPE is 2 -- it seems like it could be
> any value >= 1 that isn't large enough to cause a conflict. Does
> something get stored in that second bit?

Yes, we should be able to use >= 1. But then our _PAGE_USER is also used
to indicate prot_none. It should really be _PAGE_PRESENT set and
_PAGE_USER cleared. So for the swap case we should be ok to use
_PAGE_USER. But i didn't want to audit all the asm code. So i decided to
leave _PAGE_USER as it is.

>
>> I booted it on our p5020ds FWIW.
>
> Actively using swap?
>

-aneesh
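
To make the bit-position answer above concrete, here is a minimal sketch of which low PTE bits a 5-bit swap type occupies when it starts at bit 2. The flag values and field width are copied from the patch; the un-prefixed macro names and the `swp_type_mask()` helper are hypothetical, chosen only so the snippet compiles outside the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Values as they appear in the patch (pte-hash64.h and pgtable-ppc64.h).
 * The macro names here are illustrative stand-ins, not the kernel's. */
#define PAGE_PRESENT       0x0001ULL
#define PAGE_USER          0x0002ULL
#define PAGE_BIT_SWAP_TYPE 2
#define SWP_TYPE_BITS      5

/* Bits covered by the swap type field: bits 2..6, i.e. 0x7c. */
#define SWP_TYPE_MASK \
	(((1ULL << SWP_TYPE_BITS) - 1) << PAGE_BIT_SWAP_TYPE)

/* Compile-time check in the spirit of the patch's BUILD_BUG_ON:
 * the type field must not touch _PAGE_PRESENT or _PAGE_USER. */
static_assert((SWP_TYPE_MASK & (PAGE_PRESENT | PAGE_USER)) == 0,
	      "swap type overlaps PRESENT/USER bits");

/* Hypothetical helper so the mask can be inspected at run time. */
static inline uint64_t swp_type_mask(void)
{
	return SWP_TYPE_MASK;
}
```

Starting the field at bit 1 instead would move the mask onto `_PAGE_USER`, which is the bit the reply says was deliberately left alone.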

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index 43e6ad424c7f..954ae1201e42 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -347,11 +347,27 @@ static inline void __ptep_set_access_flags(pte_t *ptep, pte_t entry)
  pr_err("%s:%d: bad pgd %08lx.\n", __FILE__, __LINE__, pgd_val(e))
 
 /* Encode and de-code a swap entry */
-#define __swp_type(entry) (((entry).val >> 1) & 0x3f)
-#define __swp_offset(entry) ((entry).val >> 8)
-#define __swp_entry(type, offset) ((swp_entry_t){((type)<< 1)|((offset)<<8)})
-#define __pte_to_swp_entry(pte) ((swp_entry_t){pte_val(pte) >> PTE_RPN_SHIFT})
-#define __swp_entry_to_pte(x) ((pte_t) { (x).val << PTE_RPN_SHIFT })
+#define MAX_SWAPFILES_CHECK() do { \
+ BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS); \
+ /* \
+  * Don't have overlapping bits with _PAGE_HPTEFLAGS \
+  * We filter HPTEFLAGS on set_pte. \
+  */ \
+ BUILD_BUG_ON(_PAGE_HPTEFLAGS & (0x1f << _PAGE_BIT_SWAP_TYPE)); \
+ } while (0)
+/*
+ * on pte we don't need handle RADIX_TREE_EXCEPTIONAL_SHIFT;
+ */
+#define SWP_TYPE_BITS 5
+#define __swp_type(x) (((x).val >> _PAGE_BIT_SWAP_TYPE) \
+ & ((1UL << SWP_TYPE_BITS) - 1))
+#define __swp_offset(x) ((x).val >> PTE_RPN_SHIFT)
+#define __swp_entry(type, offset) ((swp_entry_t) { \
+ ((type) << _PAGE_BIT_SWAP_TYPE) \
+ | ((offset) << PTE_RPN_SHIFT) })
+
+#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val((pte)) })
+#define __swp_entry_to_pte(x) __pte((x).val)
 
 void pgtable_cache_add(unsigned shift, void (*ctor)(void *));
 void pgtable_cache_init(void);
diff --git a/arch/powerpc/include/asm/pte-book3e.h b/arch/powerpc/include/asm/pte-book3e.h
index 91a704952ca1..8d8473278d91 100644
--- a/arch/powerpc/include/asm/pte-book3e.h
+++ b/arch/powerpc/include/asm/pte-book3e.h
@@ -11,6 +11,7 @@
 /* Architected bits */
 #define _PAGE_PRESENT 0x000001 /* software: pte contains a translation */
 #define _PAGE_SW1 0x000002
+#define _PAGE_BIT_SWAP_TYPE 2
 #define _PAGE_BAP_SR 0x000004
 #define _PAGE_BAP_UR 0x000008
 #define _PAGE_BAP_SW 0x000010
diff --git a/arch/powerpc/include/asm/pte-hash64.h b/arch/powerpc/include/asm/pte-hash64.h
index fc852f7e7b3a..ef612c160da7 100644
--- a/arch/powerpc/include/asm/pte-hash64.h
+++ b/arch/powerpc/include/asm/pte-hash64.h
@@ -16,6 +16,7 @@
  */
 #define _PAGE_PRESENT 0x0001 /* software: pte contains a translation */
 #define _PAGE_USER 0x0002 /* matches one of the PP bits */
+#define _PAGE_BIT_SWAP_TYPE 2
 #define _PAGE_EXEC 0x0004 /* No execute on POWER4 and newer (we invert) */
 #define _PAGE_GUARDED 0x0008
 /* We can derive Memory coherence from _PAGE_NO_CACHE */

Current swap encoding in pte can't support large pfns
above 4TB. Change the swap encoding such that we put
the swap type in the PTE bits. Also add build checks
to make sure we don't overlap with HPTEFLAGS.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pgtable-ppc64.h | 26 +++++++++++++++++++++-----
 arch/powerpc/include/asm/pte-book3e.h    |  1 +
 arch/powerpc/include/asm/pte-hash64.h    |  1 +
 3 files changed, 23 insertions(+), 5 deletions(-)
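
The encoding change can be sketched in plain C as below. This is a userspace illustration, not the kernel code: the type position (bit 2) and width (5 bits) come from the patch, while `PTE_RPN_SHIFT = 30` is an assumed value picked for illustration (the real value depends on the MMU and page-size configuration and is not stated in the thread). The function names are hypothetical stand-ins for the `__swp_*` macros.

```c
#include <assert.h>
#include <stdint.h>

/* From the patch. */
#define PAGE_BIT_SWAP_TYPE 2
#define SWP_TYPE_BITS      5

/* ASSUMPTION: illustrative value only; not taken from the thread. */
#define PTE_RPN_SHIFT      30

typedef struct { uint64_t val; } swp_entry_t;

/* New layout: type in the low PTE bits, offset from PTE_RPN_SHIFT up. */
static inline swp_entry_t swp_entry(uint64_t type, uint64_t offset)
{
	assert(type < (1ULL << SWP_TYPE_BITS));
	assert(offset < (1ULL << (64 - PTE_RPN_SHIFT)));
	return (swp_entry_t){ (type << PAGE_BIT_SWAP_TYPE) |
			      (offset << PTE_RPN_SHIFT) };
}

static inline uint64_t swp_type(swp_entry_t e)
{
	return (e.val >> PAGE_BIT_SWAP_TYPE) & ((1ULL << SWP_TYPE_BITS) - 1);
}

static inline uint64_t swp_offset(swp_entry_t e)
{
	return e.val >> PTE_RPN_SHIFT;
}

/* Old layout: the whole entry (offset << 8 | type << 1) was shifted left
 * by PTE_RPN_SHIFT, so the offset lost 8 more bits at the top. */
static inline unsigned old_offset_bits(void)
{
	return 64 - PTE_RPN_SHIFT - 8;
}

/* New layout: the offset starts directly at PTE_RPN_SHIFT. */
static inline unsigned new_offset_bits(void)
{
	return 64 - PTE_RPN_SHIFT;
}
```

Under the assumed shift of 30, the old layout leaves 26 offset bits and the new one 34. If pages are additionally assumed to be 64K, 2^26 pages is 4TB, which would be consistent with the limit named in the commit message.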