Patchwork [-V7,01/18] mm/THP: HPAGE_SHIFT is not a #define on some arch

login
register
mail settings
Submitter Aneesh Kumar K.V
Date April 28, 2013, 7:37 p.m.
Message ID <1367177859-7893-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
Download mbox | patch
Permalink /patch/240283/
State Superseded
Headers show

Comments

Aneesh Kumar K.V - April 28, 2013, 7:37 p.m.
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

On archs like powerpc that support different hugepage sizes, HPAGE_SHIFT
and other derived values like HPAGE_PMD_ORDER are not constants. So move
that to hugepage_init

Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 include/linux/huge_mm.h | 3 ---
 mm/huge_memory.c        | 9 ++++++---
 2 files changed, 6 insertions(+), 6 deletions(-)
David Gibson - April 30, 2013, 2:21 a.m.
On Mon, Apr 29, 2013 at 01:07:22AM +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> On archs like powerpc that support different hugepage sizes, HPAGE_SHIFT
> and other derived values like HPAGE_PMD_ORDER are not constants. So move
> that to hugepage_init

These seems to miss the point.  Those variables may be defined in
terms of HPAGE_SHIFT right now, but that is of itself kind of broken.
The transparent hugepage mechanism only works if the hugepage size is
equal to the PMD size - and PMD_SHIFT remains a compile time constant.

There's no reason having transparent hugepage should force the PMD
size of hugepage to be the default for other purposes - it should be
possible to do THP as long as PMD-sized is a possible hugepage size.
David Gibson - April 30, 2013, 2:24 a.m.
On Tue, Apr 30, 2013 at 12:21:49PM +1000, David Gibson wrote:
> On Mon, Apr 29, 2013 at 01:07:22AM +0530, Aneesh Kumar K.V wrote:
> > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> > 
> > On archs like powerpc that support different hugepage sizes, HPAGE_SHIFT
> > and other derived values like HPAGE_PMD_ORDER are not constants. So move
> > that to hugepage_init
> 
> These seems to miss the point.  Those variables may be defined in
> terms of HPAGE_SHIFT right now, but that is of itself kind of broken.
> The transparent hugepage mechanism only works if the hugepage size is
> equal to the PMD size - and PMD_SHIFT remains a compile time constant.
> 
> There's no reason having transparent hugepage should force the PMD
> size of hugepage to be the default for other purposes - it should be
> possible to do THP as long as PMD-sized is a possible hugepage size.

Oh, also, I'm pretty sure I said something similar on the last
posting.  Receiving review comments and then ignoring them does not
count as "Reviewed-by"...
Aneesh Kumar K.V - April 30, 2013, 3:42 a.m.
David Gibson <dwg@au1.ibm.com> writes:

> On Mon, Apr 29, 2013 at 01:07:22AM +0530, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> 
>> On archs like powerpc that support different hugepage sizes, HPAGE_SHIFT
>> and other derived values like HPAGE_PMD_ORDER are not constants. So move
>> that to hugepage_init
>
> These seems to miss the point.  Those variables may be defined in
> terms of HPAGE_SHIFT right now, but that is of itself kind of broken.
> The transparent hugepage mechanism only works if the hugepage size is
> equal to the PMD size - and PMD_SHIFT remains a compile time constant.
>
> There's no reason having transparent hugepage should force the PMD
> size of hugepage to be the default for other purposes - it should be
> possible to do THP as long as PMD-sized is a possible hugepage size.
>

THP code does

#define HPAGE_PMD_SHIFT HPAGE_SHIFT
#define HPAGE_PMD_MASK HPAGE_MASK
#define HPAGE_PMD_SIZE HPAGE_SIZE

I had two options, one to move all those in terms of PMD_SHIFT or switch
ppc64 to not use HPAGE_SHIFT the way it use now. Both would involve large
code changes. Hence I end up moving some of the checks to runtime
checks. Actual HPAGE_SHIFT == PMD_SHIFT check happens in the has_transparent_hugepage() 

https://lists.ozlabs.org/pipermail/linuxppc-dev/2013-April/106002.html

IMHO what the patch is checking is that, HPAGE_SHIFT
value is not resulting in a page order higher than MAX_ORDER. 

Related to Reviewed-by: that came from V5 patchset 
https://lists.ozlabs.org/pipermail/linuxppc-dev/2013-April/105299.html

Your review suggestion to move that runtime check back to macro happened
in V6. I missed dropping reviewed-by after that. 

-aneesh
David Gibson - April 30, 2013, 5:01 a.m.
On Tue, Apr 30, 2013 at 09:12:09AM +0530, Aneesh Kumar K.V wrote:
> David Gibson <dwg@au1.ibm.com> writes:
> 
> > On Mon, Apr 29, 2013 at 01:07:22AM +0530, Aneesh Kumar K.V wrote:
> >> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> >> 
> >> On archs like powerpc that support different hugepage sizes, HPAGE_SHIFT
> >> and other derived values like HPAGE_PMD_ORDER are not constants. So move
> >> that to hugepage_init
> >
> > These seems to miss the point.  Those variables may be defined in
> > terms of HPAGE_SHIFT right now, but that is of itself kind of broken.
> > The transparent hugepage mechanism only works if the hugepage size is
> > equal to the PMD size - and PMD_SHIFT remains a compile time constant.
> >
> > There's no reason having transparent hugepage should force the PMD
> > size of hugepage to be the default for other purposes - it should be
> > possible to do THP as long as PMD-sized is a possible hugepage size.
> >
> 
> THP code does
> 
> #define HPAGE_PMD_SHIFT HPAGE_SHIFT
> #define HPAGE_PMD_MASK HPAGE_MASK
> #define HPAGE_PMD_SIZE HPAGE_SIZE
> 
> I had two options, one to move all those in terms of PMD_SHIFT

This is a much better option that you've taken now, and really
shouldn't be that hard.  The THP code is much more strongly tied to
the fact that it is a PMD than the fact that it's the same size as
explicit huge pages.

> or switch
> ppc64 to not use HPAGE_SHIFT the way it use now. Both would involve large
> code changes. Hence I end up moving some of the checks to runtime
> checks. Actual HPAGE_SHIFT == PMD_SHIFT check happens in the has_transparent_hugepage() 
> 
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2013-April/106002.html

And my other point is that this is also wrong.  All you should need to
check is that HPAGE_PMD_SHIFT (== PMD_SHIFT) is a supported hugepage
size, not that it is equal to HPAGE_SHIFT the default explicit
hugepage size.

> IMHO what the patch is checking is that, HPAGE_SHIFT
> value is not resulting in a page order higher than MAX_ORDER. 

Which you don't actually care about in THP - you only care that
HPAGE_PMD_SHIFT doesn't exceed MAX_ORDER.

> Related to Reviewed-by: that came from V5 patchset 
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2013-April/105299.html
> 
> Your review suggestion to move that runtime check back to macro happened
> in V6. I missed dropping reviewed-by after that. 

Ok.
Aneesh Kumar K.V - May 3, 2013, 6:51 p.m.
David Gibson <dwg@au1.ibm.com> writes:

> On Tue, Apr 30, 2013 at 09:12:09AM +0530, Aneesh Kumar K.V wrote:
>> David Gibson <dwg@au1.ibm.com> writes:
>> 
>> > On Mon, Apr 29, 2013 at 01:07:22AM +0530, Aneesh Kumar K.V wrote:
>> >> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> >> 
>> >> On archs like powerpc that support different hugepage sizes, HPAGE_SHIFT
>> >> and other derived values like HPAGE_PMD_ORDER are not constants. So move
>> >> that to hugepage_init
>> >
>> > These seems to miss the point.  Those variables may be defined in
>> > terms of HPAGE_SHIFT right now, but that is of itself kind of broken.
>> > The transparent hugepage mechanism only works if the hugepage size is
>> > equal to the PMD size - and PMD_SHIFT remains a compile time constant.
>> >
>> > There's no reason having transparent hugepage should force the PMD
>> > size of hugepage to be the default for other purposes - it should be
>> > possible to do THP as long as PMD-sized is a possible hugepage size.
>> >
>> 
>> THP code does
>> 
>> #define HPAGE_PMD_SHIFT HPAGE_SHIFT
>> #define HPAGE_PMD_MASK HPAGE_MASK
>> #define HPAGE_PMD_SIZE HPAGE_SIZE
>> 
>> I had two options, one to move all those in terms of PMD_SHIFT
>
> This is a much better option that you've taken now, and really
> shouldn't be that hard.  The THP code is much more strongly tied to
> the fact that it is a PMD than the fact that it's the same size as
> explicit huge pages.
>

Ok I tried the above and that turned out to be much simpler. I will have
to make sure i didn't break other archs that support THP.

-aneesh

Patch

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index ee1c244..bdc5aef 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -119,9 +119,6 @@  extern void __split_huge_page_pmd(struct vm_area_struct *vma,
 	} while (0)
 extern void split_huge_page_pmd_mm(struct mm_struct *mm, unsigned long address,
 		pmd_t *pmd);
-#if HPAGE_PMD_ORDER > MAX_ORDER
-#error "hugepages can't be allocated by the buddy allocator"
-#endif
 extern int hugepage_madvise(struct vm_area_struct *vma,
 			    unsigned long *vm_flags, int advice);
 extern void __vma_adjust_trans_huge(struct vm_area_struct *vma,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e2f7f5aa..78bd84f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -45,7 +45,7 @@  unsigned long transparent_hugepage_flags __read_mostly =
 	(1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG);
 
 /* default scan 8*512 pte (or vmas) every 30 second */
-static unsigned int khugepaged_pages_to_scan __read_mostly = HPAGE_PMD_NR*8;
+static unsigned int khugepaged_pages_to_scan __read_mostly;
 static unsigned int khugepaged_pages_collapsed;
 static unsigned int khugepaged_full_scans;
 static unsigned int khugepaged_scan_sleep_millisecs __read_mostly = 10000;
@@ -60,7 +60,7 @@  static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
  * it would have happened if the vma was large enough during page
  * fault.
  */
-static unsigned int khugepaged_max_ptes_none __read_mostly = HPAGE_PMD_NR-1;
+static unsigned int khugepaged_max_ptes_none __read_mostly;
 
 static int khugepaged(void *none);
 static int khugepaged_slab_init(void);
@@ -620,11 +620,14 @@  static int __init hugepage_init(void)
 	int err;
 	struct kobject *hugepage_kobj;
 
-	if (!has_transparent_hugepage()) {
+	if (!has_transparent_hugepage() || (HPAGE_PMD_ORDER > MAX_ORDER)) {
 		transparent_hugepage_flags = 0;
 		return -EINVAL;
 	}
 
+	khugepaged_pages_to_scan = HPAGE_PMD_NR*8;
+	khugepaged_max_ptes_none = HPAGE_PMD_NR-1;
+
 	err = hugepage_init_sysfs(&hugepage_kobj);
 	if (err)
 		return err;