Message ID | 1367177859-7893-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
On Mon, Apr 29, 2013 at 01:07:22AM +0530, Aneesh Kumar K.V wrote: > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> > > On archs like powerpc that support different hugepage sizes, HPAGE_SHIFT > and other derived values like HPAGE_PMD_ORDER are not constants. So move > that to hugepage_init These seems to miss the point. Those variables may be defined in terms of HPAGE_SHIFT right now, but that is of itself kind of broken. The transparent hugepage mechanism only works if the hugepage size is equal to the PMD size - and PMD_SHIFT remains a compile time constant. There's no reason having transparent hugepage should force the PMD size of hugepage to be the default for other purposes - it should be possible to do THP as long as PMD-sized is a possible hugepage size.
On Tue, Apr 30, 2013 at 12:21:49PM +1000, David Gibson wrote: > On Mon, Apr 29, 2013 at 01:07:22AM +0530, Aneesh Kumar K.V wrote: > > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> > > > > On archs like powerpc that support different hugepage sizes, HPAGE_SHIFT > > and other derived values like HPAGE_PMD_ORDER are not constants. So move > > that to hugepage_init > > These seems to miss the point. Those variables may be defined in > terms of HPAGE_SHIFT right now, but that is of itself kind of broken. > The transparent hugepage mechanism only works if the hugepage size is > equal to the PMD size - and PMD_SHIFT remains a compile time constant. > > There's no reason having transparent hugepage should force the PMD > size of hugepage to be the default for other purposes - it should be > possible to do THP as long as PMD-sized is a possible hugepage size. Oh, also, I'm pretty sure I said something similar on the last posting. Receiving review comments and then ignoring them does not count as "Reviewed-by"...
David Gibson <dwg@au1.ibm.com> writes: > On Mon, Apr 29, 2013 at 01:07:22AM +0530, Aneesh Kumar K.V wrote: >> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> >> >> On archs like powerpc that support different hugepage sizes, HPAGE_SHIFT >> and other derived values like HPAGE_PMD_ORDER are not constants. So move >> that to hugepage_init > > These seems to miss the point. Those variables may be defined in > terms of HPAGE_SHIFT right now, but that is of itself kind of broken. > The transparent hugepage mechanism only works if the hugepage size is > equal to the PMD size - and PMD_SHIFT remains a compile time constant. > > There's no reason having transparent hugepage should force the PMD > size of hugepage to be the default for other purposes - it should be > possible to do THP as long as PMD-sized is a possible hugepage size. > THP code does #define HPAGE_PMD_SHIFT HPAGE_SHIFT #define HPAGE_PMD_MASK HPAGE_MASK #define HPAGE_PMD_SIZE HPAGE_SIZE I had two options, one to move all those in terms of PMD_SHIFT or switch ppc64 to not use HPAGE_SHIFT the way it use now. Both would involve large code changes. Hence I end up moving some of the checks to runtime checks. Actual HPAGE_SHIFT == PMD_SHIFT check happens in the has_transparent_hugepage() https://lists.ozlabs.org/pipermail/linuxppc-dev/2013-April/106002.html IMHO what the patch is checking is that, HPAGE_SHIFT value is not resulting in a page order higher than MAX_ORDER. Related to Reviewed-by: that came from V5 patchset https://lists.ozlabs.org/pipermail/linuxppc-dev/2013-April/105299.html Your review suggestion to move that runtime check back to macro happened in V6. I missed dropping reviewed-by after that. -aneesh
On Tue, Apr 30, 2013 at 09:12:09AM +0530, Aneesh Kumar K.V wrote: > David Gibson <dwg@au1.ibm.com> writes: > > > On Mon, Apr 29, 2013 at 01:07:22AM +0530, Aneesh Kumar K.V wrote: > >> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> > >> > >> On archs like powerpc that support different hugepage sizes, HPAGE_SHIFT > >> and other derived values like HPAGE_PMD_ORDER are not constants. So move > >> that to hugepage_init > > > > These seems to miss the point. Those variables may be defined in > > terms of HPAGE_SHIFT right now, but that is of itself kind of broken. > > The transparent hugepage mechanism only works if the hugepage size is > > equal to the PMD size - and PMD_SHIFT remains a compile time constant. > > > > There's no reason having transparent hugepage should force the PMD > > size of hugepage to be the default for other purposes - it should be > > possible to do THP as long as PMD-sized is a possible hugepage size. > > > > THP code does > > #define HPAGE_PMD_SHIFT HPAGE_SHIFT > #define HPAGE_PMD_MASK HPAGE_MASK > #define HPAGE_PMD_SIZE HPAGE_SIZE > > I had two options, one to move all those in terms of PMD_SHIFT This is a much better option that you've taken now, and really shouldn't be that hard. The THP code is much more strongly tied to the fact that it is a PMD than the fact that it's the same size as explicit huge pages. > or switch > ppc64 to not use HPAGE_SHIFT the way it use now. Both would involve large > code changes. Hence I end up moving some of the checks to runtime > checks. Actual HPAGE_SHIFT == PMD_SHIFT check happens in the has_transparent_hugepage() > > https://lists.ozlabs.org/pipermail/linuxppc-dev/2013-April/106002.html And my other point is that this is also wrong. All you should need to check is that HPAGE_PMD_SHIFT (== PMD_SHIFT) is a supported hugepage size, not that it is equal to HPAGE_SHIFT the default explicit hugepage size. > IMHO what the patch is checking is that, HPAGE_SHIFT > value is not resulting in a page order higher than MAX_ORDER. Which you don't actually care about in THP - you only care that HPAGE_PMD_SHIFT doesn't exceed MAX_ORDER. > Related to Reviewed-by: that came from V5 patchset > https://lists.ozlabs.org/pipermail/linuxppc-dev/2013-April/105299.html > > Your review suggestion to move that runtime check back to macro happened > in V6. I missed dropping reviewed-by after that. Ok.
David Gibson <dwg@au1.ibm.com> writes: > On Tue, Apr 30, 2013 at 09:12:09AM +0530, Aneesh Kumar K.V wrote: >> David Gibson <dwg@au1.ibm.com> writes: >> >> > On Mon, Apr 29, 2013 at 01:07:22AM +0530, Aneesh Kumar K.V wrote: >> >> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> >> >> >> >> On archs like powerpc that support different hugepage sizes, HPAGE_SHIFT >> >> and other derived values like HPAGE_PMD_ORDER are not constants. So move >> >> that to hugepage_init >> > >> > These seems to miss the point. Those variables may be defined in >> > terms of HPAGE_SHIFT right now, but that is of itself kind of broken. >> > The transparent hugepage mechanism only works if the hugepage size is >> > equal to the PMD size - and PMD_SHIFT remains a compile time constant. >> > >> > There's no reason having transparent hugepage should force the PMD >> > size of hugepage to be the default for other purposes - it should be >> > possible to do THP as long as PMD-sized is a possible hugepage size. >> > >> >> THP code does >> >> #define HPAGE_PMD_SHIFT HPAGE_SHIFT >> #define HPAGE_PMD_MASK HPAGE_MASK >> #define HPAGE_PMD_SIZE HPAGE_SIZE >> >> I had two options, one to move all those in terms of PMD_SHIFT > > This is a much better option that you've taken now, and really > shouldn't be that hard. The THP code is much more strongly tied to > the fact that it is a PMD than the fact that it's the same size as > explicit huge pages. > Ok I tried the above and that turned out to be much simpler. I will have to make sure i didn't break other archs that support THP. -aneesh
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index ee1c244..bdc5aef 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -119,9 +119,6 @@ extern void __split_huge_page_pmd(struct vm_area_struct *vma, } while (0) extern void split_huge_page_pmd_mm(struct mm_struct *mm, unsigned long address, pmd_t *pmd); -#if HPAGE_PMD_ORDER > MAX_ORDER -#error "hugepages can't be allocated by the buddy allocator" -#endif extern int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags, int advice); extern void __vma_adjust_trans_huge(struct vm_area_struct *vma, diff --git a/mm/huge_memory.c b/mm/huge_memory.c index e2f7f5aa..78bd84f 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -45,7 +45,7 @@ unsigned long transparent_hugepage_flags __read_mostly = (1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG); /* default scan 8*512 pte (or vmas) every 30 second */ -static unsigned int khugepaged_pages_to_scan __read_mostly = HPAGE_PMD_NR*8; +static unsigned int khugepaged_pages_to_scan __read_mostly; static unsigned int khugepaged_pages_collapsed; static unsigned int khugepaged_full_scans; static unsigned int khugepaged_scan_sleep_millisecs __read_mostly = 10000; @@ -60,7 +60,7 @@ static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait); * it would have happened if the vma was large enough during page * fault. */ -static unsigned int khugepaged_max_ptes_none __read_mostly = HPAGE_PMD_NR-1; +static unsigned int khugepaged_max_ptes_none __read_mostly; static int khugepaged(void *none); static int khugepaged_slab_init(void); @@ -620,11 +620,14 @@ static int __init hugepage_init(void) int err; struct kobject *hugepage_kobj; - if (!has_transparent_hugepage()) { + if (!has_transparent_hugepage() || (HPAGE_PMD_ORDER > MAX_ORDER)) { transparent_hugepage_flags = 0; return -EINVAL; } + khugepaged_pages_to_scan = HPAGE_PMD_NR*8; + khugepaged_max_ptes_none = HPAGE_PMD_NR-1; + err = hugepage_init_sysfs(&hugepage_kobj); if (err) return err;