[v2,2/2] powerpc/mm: Change the swap encoding in pte.

Message ID: 1434509021-24168-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com
State: Accepted
Delegated to: Michael Ellerman

Commit Message

Aneesh Kumar K.V June 17, 2015, 2:43 a.m. UTC
Current swap encoding in pte can't support large pfns
above 4TB. Change the swap encoding such that we put
the swap type in the PTE bits. Also add build checks
to make sure we don't overlap with HPTEFLAGS.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pgtable-ppc64.h | 26 +++++++++++++++++++++-----
 arch/powerpc/include/asm/pte-book3e.h    |  1 +
 arch/powerpc/include/asm/pte-hash64.h    |  1 +
 3 files changed, 23 insertions(+), 5 deletions(-)
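
The 4TB figure in the commit message can be reproduced with a little arithmetic. The sketch below is illustrative only: it assumes a 64-bit pte, a PTE_RPN_SHIFT of 30 and 64K pages (none of these values are stated in the patch), and the EX_-prefixed names are hypothetical. In the old layout the offset sat at bit 8 of the swap entry and the whole entry was then shifted left by PTE_RPN_SHIFT; in the new layout the offset starts directly at PTE_RPN_SHIFT.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration only; these values do not appear in the patch. */
#define EX_PTE_BITS       64
#define EX_PTE_RPN_SHIFT  30
#define EX_PAGE_SHIFT     16	/* 64K pages */

static_assert(8 + EX_PTE_RPN_SHIFT < EX_PTE_BITS, "old layout leaves no offset bits");

/* Old layout: offset at bit 8 of the entry, entry shifted by PTE_RPN_SHIFT. */
static unsigned int old_offset_bits(void)
{
	return EX_PTE_BITS - (8 + EX_PTE_RPN_SHIFT);
}

/* New layout: offset starts at PTE_RPN_SHIFT, type lives in the low pte bits. */
static unsigned int new_offset_bits(void)
{
	return EX_PTE_BITS - EX_PTE_RPN_SHIFT;
}

/* Bytes of memory reachable when the offset field carries a pfn. */
static uint64_t reach_bytes(unsigned int offset_bits)
{
	return (uint64_t)1 << (offset_bits + EX_PAGE_SHIFT);
}
```

Under these assumptions the old layout leaves 26 offset bits, which with 64K pages covers 2^42 bytes (4TB), while the new layout leaves 34 bits. Build with a C11 compiler (for static_assert).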

Comments

Aneesh Kumar K.V June 17, 2015, 2:51 a.m. UTC | #1
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:


Hi Scott,

> Current swap encoding in pte can't support large pfns
> above 4TB. Change the swap encoding such that we put
> the swap type in the PTE bits. Also add build checks
> to make sure we don't overlap with HPTEFLAGS.
>

Can you please review this w.r.t 64bit booke ? 

-aneesh
Michael Ellerman June 17, 2015, 9:45 a.m. UTC | #2
On Wed, 2015-06-17 at 08:21 +0530, Aneesh Kumar K.V wrote:
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:
> 
> 
> Hi Scott,
> 
> > Current swap encoding in pte can't support large pfns
> > above 4TB. Change the swap encoding such that we put
> > the swap type in the PTE bits. Also add build checks
> > to make sure we don't overlap with HPTEFLAGS.
> >
> 
> Can you please review this w.r.t 64bit booke ? 

I booted it on our p5020ds FWIW.

cheers
Scott Wood June 17, 2015, 9:14 p.m. UTC | #3
On Wed, 2015-06-17 at 19:45 +1000, Michael Ellerman wrote:
> On Wed, 2015-06-17 at 08:21 +0530, Aneesh Kumar K.V wrote:
> > "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:
> > 
> > 
> > Hi Scott,
> > 
> > > Current swap encoding in pte can't support large pfns
> > > above 4TB. Change the swap encoding such that we put
> > > the swap type in the PTE bits. Also add build checks
> > > to make sure we don't overlap with HPTEFLAGS.
> > > 
> > 
> > Can you please review this w.r.t 64bit booke ? 

It looks OK.

I'm curious why _PAGE_BIT_SWAP_TYPE is 2 -- it seems like it could be 
any value >= 1 that isn't large enough to cause a conflict. Does 
something get stored in that second bit?

> I booted it on our p5020ds FWIW.

Actively using swap?

-Scott
Michael Ellerman June 18, 2015, 4:16 a.m. UTC | #4
On Wed, 2015-06-17 at 16:14 -0500, Scott Wood wrote:
> On Wed, 2015-06-17 at 19:45 +1000, Michael Ellerman wrote:
> > On Wed, 2015-06-17 at 08:21 +0530, Aneesh Kumar K.V wrote:
> > > "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:
> > > 
> > > 
> > > Hi Scott,
> > > 
> > > > Current swap encoding in pte can't support large pfns
> > > > above 4TB. Change the swap encoding such that we put
> > > > the swap type in the PTE bits. Also add build checks
> > > > to make sure we don't overlap with HPTEFLAGS.
> > > > 
> > > 
> > > Can you please review this w.r.t 64bit booke ? 
> 
> It looks OK.
> 
> I'm curious why _PAGE_BIT_SWAP_TYPE is 2 -- it seems like it could be 
> any value >= 1 that isn't large enough to cause a conflict. Does 
> something get stored in that second bit?
> 
> > I booted it on our p5020ds FWIW.
> 
> Actively using swap?

Yeah good point, it wasn't.

I ran 4 make -j kernel builds in parallel which seemed to do the trick:

               total       used       free     shared    buffers     cached
  Mem:       4053952    4038324      15628        344       2880      26932
  -/+ buffers/cache:    4008512      45440
  Swap:      7918588    6102800    1815788


Of course it went OOM not long after that, but it's still pinging and it's
running fine, just spending all its time printing the OOM kill info to the
console.

cheers
Aneesh Kumar K.V June 18, 2015, 5:20 a.m. UTC | #5
Scott Wood <scottwood@freescale.com> writes:

> On Wed, 2015-06-17 at 19:45 +1000, Michael Ellerman wrote:
>> On Wed, 2015-06-17 at 08:21 +0530, Aneesh Kumar K.V wrote:
>> > "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:
>> > 
>> > 
>> > Hi Scott,
>> > 
>> > > Current swap encoding in pte can't support large pfns
>> > > above 4TB. Change the swap encoding such that we put
>> > > the swap type in the PTE bits. Also add build checks
>> > > to make sure we don't overlap with HPTEFLAGS.
>> > > 
>> > 
>> > Can you please review this w.r.t 64bit booke ? 
>
> It looks OK.
>
> I'm curious why _PAGE_BIT_SWAP_TYPE is 2 -- it seems like it could be 
> any value >= 1 that isn't large enough to cause a conflict. Does 
> something get stored in that second bit?

Yes, we should be able to use >= 1. But then our _PAGE_USER is also used
to indicate prot_none. It should really be _PAGE_PRESENT set and
_PAGE_USER cleared. So for the swap case we should be ok to use
_PAGE_USER. But I didn't want to audit all the asm code, so I decided to
leave _PAGE_USER as it is.


>
>> I booted it on our p5020ds FWIW.
>
> Actively using swap?
>

-aneesh
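
The point about leaving _PAGE_USER alone can be checked with a small userspace sketch. The bit values are copied from the pte-hash64.h hunk of this patch; the EX_-prefixed names and the type_mask_at() helper are hypothetical and exist only for this illustration. With the type field starting at bit 2, the five type bits cover 0x7c and touch neither _PAGE_PRESENT nor _PAGE_USER; starting at bit 1 would reuse the _PAGE_USER bit.

```c
#include <assert.h>

/* Values taken from the pte-hash64.h hunk of this patch. */
#define EX_PAGE_PRESENT        0x0001UL
#define EX_PAGE_USER           0x0002UL
#define EX_PAGE_BIT_SWAP_TYPE  2
#define EX_SWP_TYPE_BITS       5

/* Mask covered by the swap type field when it starts at the given bit. */
static unsigned long type_mask_at(unsigned int bit)
{
	return ((1UL << EX_SWP_TYPE_BITS) - 1) << bit;
}

/* Compile-time form of the same check for the value the patch uses. */
static_assert(((((1UL << EX_SWP_TYPE_BITS) - 1) << EX_PAGE_BIT_SWAP_TYPE)
	       & (EX_PAGE_PRESENT | EX_PAGE_USER)) == 0,
	      "swap type overlaps _PAGE_PRESENT or _PAGE_USER");
```

Calling type_mask_at(1) gives a mask that includes the _PAGE_USER bit, which is the case the reply above chose to avoid rather than audit.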

Patch

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index 43e6ad424c7f..954ae1201e42 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -347,11 +347,27 @@  static inline void __ptep_set_access_flags(pte_t *ptep, pte_t entry)
 	pr_err("%s:%d: bad pgd %08lx.\n", __FILE__, __LINE__, pgd_val(e))
 
 /* Encode and de-code a swap entry */
-#define __swp_type(entry)	(((entry).val >> 1) & 0x3f)
-#define __swp_offset(entry)	((entry).val >> 8)
-#define __swp_entry(type, offset) ((swp_entry_t){((type)<< 1)|((offset)<<8)})
-#define __pte_to_swp_entry(pte)	((swp_entry_t){pte_val(pte) >> PTE_RPN_SHIFT})
-#define __swp_entry_to_pte(x)	((pte_t) { (x).val << PTE_RPN_SHIFT })
+#define MAX_SWAPFILES_CHECK() do { \
+	BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS); \
+	/*							\
+	 * Don't have overlapping bits with _PAGE_HPTEFLAGS	\
+	 * We filter HPTEFLAGS on set_pte.			\
+	 */							\
+	BUILD_BUG_ON(_PAGE_HPTEFLAGS & (0x1f << _PAGE_BIT_SWAP_TYPE)); \
+	} while (0)
+/*
+ * On a pte we don't need to handle RADIX_TREE_EXCEPTIONAL_SHIFT.
+ */
+#define SWP_TYPE_BITS 5
+#define __swp_type(x)		(((x).val >> _PAGE_BIT_SWAP_TYPE) \
+				& ((1UL << SWP_TYPE_BITS) - 1))
+#define __swp_offset(x)		((x).val >> PTE_RPN_SHIFT)
+#define __swp_entry(type, offset)	((swp_entry_t) { \
+					((type) << _PAGE_BIT_SWAP_TYPE) \
+					| ((offset) << PTE_RPN_SHIFT) })
+
+#define __pte_to_swp_entry(pte)		((swp_entry_t) { pte_val((pte)) })
+#define __swp_entry_to_pte(x)		__pte((x).val)
 
 void pgtable_cache_add(unsigned shift, void (*ctor)(void *));
 void pgtable_cache_init(void);
diff --git a/arch/powerpc/include/asm/pte-book3e.h b/arch/powerpc/include/asm/pte-book3e.h
index 91a704952ca1..8d8473278d91 100644
--- a/arch/powerpc/include/asm/pte-book3e.h
+++ b/arch/powerpc/include/asm/pte-book3e.h
@@ -11,6 +11,7 @@ 
 /* Architected bits */
 #define _PAGE_PRESENT	0x000001 /* software: pte contains a translation */
 #define _PAGE_SW1	0x000002
+#define _PAGE_BIT_SWAP_TYPE	2
 #define _PAGE_BAP_SR	0x000004
 #define _PAGE_BAP_UR	0x000008
 #define _PAGE_BAP_SW	0x000010
diff --git a/arch/powerpc/include/asm/pte-hash64.h b/arch/powerpc/include/asm/pte-hash64.h
index fc852f7e7b3a..ef612c160da7 100644
--- a/arch/powerpc/include/asm/pte-hash64.h
+++ b/arch/powerpc/include/asm/pte-hash64.h
@@ -16,6 +16,7 @@ 
  */
 #define _PAGE_PRESENT		0x0001 /* software: pte contains a translation */
 #define _PAGE_USER		0x0002 /* matches one of the PP bits */
+#define _PAGE_BIT_SWAP_TYPE	2
 #define _PAGE_EXEC		0x0004 /* No execute on POWER4 and newer (we invert) */
 #define _PAGE_GUARDED		0x0008
 /* We can derive Memory coherence from _PAGE_NO_CACHE */
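
For readers who want to try the new encoding outside the kernel, the following is a minimal userspace sketch of the macros added above, not the kernel code itself. _PAGE_BIT_SWAP_TYPE and SWP_TYPE_BITS are taken from the patch; the PTE_RPN_SHIFT value of 30 and the MAX_SWAPFILES_SHIFT value of 5 are assumptions, and all ex_/EX_ names are hypothetical. The _PAGE_HPTEFLAGS check is not reproduced because the flag values are not part of this patch; the second static_assert (type field below the offset field) is an extra check added for this sketch only.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_BIT_SWAP_TYPE   2	/* from the patch */
#define EX_SWP_TYPE_BITS        5	/* from the patch */
#define EX_PTE_RPN_SHIFT        30	/* assumed for illustration */
#define EX_MAX_SWAPFILES_SHIFT  5	/* assumed for illustration */

typedef struct { uint64_t val; } ex_swp_entry_t;

/* Mirrors the first BUILD_BUG_ON in MAX_SWAPFILES_CHECK(). */
static_assert(EX_MAX_SWAPFILES_SHIFT <= EX_SWP_TYPE_BITS,
	      "swap type field too small");
/* Extra check for this sketch: type bits must end below the offset. */
static_assert(EX_PAGE_BIT_SWAP_TYPE + EX_SWP_TYPE_BITS <= EX_PTE_RPN_SHIFT,
	      "swap type overlaps swap offset");

/* Pack type into the low pte bits and offset at PTE_RPN_SHIFT. */
static ex_swp_entry_t ex_swp_entry(uint64_t type, uint64_t offset)
{
	ex_swp_entry_t e = {
		(type << EX_PAGE_BIT_SWAP_TYPE) | (offset << EX_PTE_RPN_SHIFT)
	};
	return e;
}

static uint64_t ex_swp_type(ex_swp_entry_t e)
{
	return (e.val >> EX_PAGE_BIT_SWAP_TYPE)
		& (((uint64_t)1 << EX_SWP_TYPE_BITS) - 1);
}

static uint64_t ex_swp_offset(ex_swp_entry_t e)
{
	return e.val >> EX_PTE_RPN_SHIFT;
}
```

Because the entry value is now stored in the pte unshifted, encoding and decoding round-trip any type below 32 and any offset that fits above the assumed shift, and the two lowest bits stay clear.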