Message ID | 1396592835-24767-2-git-send-email-maddy@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Not Applicable |
On Fri, Apr 04, 2014 at 11:57:14AM +0530, Madhavan Srinivasan wrote:
> Kirill A. Shutemov with faultaround patchset introduced
> vm_ops->map_pages() for mapping easy accessible pages around
> fault address in hope to reduce number of minor page faults.
>
> This patch creates infrastructure to move the FAULT_AROUND_ORDER
> to arch/ using Kconfig. This will enable architecture maintainers
> to decide on suitable FAULT_AROUND_ORDER value based on
> performance data for that architecture. Patch also adds
> FAULT_AROUND_ORDER Kconfig element in arch/X86.
>
> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
> ---
>  arch/x86/Kconfig   |    4 ++++
>  include/linux/mm.h |    9 +++++++++
>  mm/memory.c        |   12 +++++-------
>  3 files changed, 18 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 9c0a657..5833f22 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1177,6 +1177,10 @@ config DIRECT_GBPAGES
>  	  support it. This can improve the kernel's performance a tiny bit by
>  	  reducing TLB pressure. If in doubt, say "Y".
>
> +config FAULT_AROUND_ORDER
> +	int
> +	default "4"
> +
>  # Common NUMA Features
>  config NUMA
>  	bool "Numa Memory Allocation and Scheduler Support"
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 0bd4359..b93c1c3 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -26,6 +26,15 @@ struct file_ra_state;
>  struct user_struct;
>  struct writeback_control;
>
> +/*
> + * Fault around order is a control knob to decide the fault around pages.
> + * Default value is set to 0UL (disabled), but the arch can override it as
> + * desired.
> + */
> +#ifndef CONFIG_FAULT_AROUND_ORDER
> +#define CONFIG_FAULT_AROUND_ORDER 0
> +#endif
> +

I don't think it should be in a header file: nobody except mm/memory.c cares.
Just put it in place of '#define FAULT_AROUND_ORDER'.

>  #ifndef CONFIG_NEED_MULTIPLE_NODES	/* Don't use mapnrs, do it properly */
>  extern unsigned long max_mapnr;
>
> diff --git a/mm/memory.c b/mm/memory.c
> index b02c584..22a4a89 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3358,10 +3358,8 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
>  	update_mmu_cache(vma, address, pte);
>  }
>
> -#define FAULT_AROUND_ORDER 4
> -
>  #ifdef CONFIG_DEBUG_FS
> -static unsigned int fault_around_order = FAULT_AROUND_ORDER;
> +static unsigned int fault_around_order = CONFIG_FAULT_AROUND_ORDER;
>
>  static int fault_around_order_get(void *data, u64 *val)
>  {
> @@ -3371,7 +3369,7 @@ static int fault_around_order_get(void *data, u64 *val)
>
>  static int fault_around_order_set(void *data, u64 val)
>  {
> -	BUILD_BUG_ON((1UL << FAULT_AROUND_ORDER) > PTRS_PER_PTE);
> +	BUILD_BUG_ON((1UL << CONFIG_FAULT_AROUND_ORDER) > PTRS_PER_PTE);
>  	if (1UL << val > PTRS_PER_PTE)
>  		return -EINVAL;
>  	fault_around_order = val;
> @@ -3406,14 +3404,14 @@ static inline unsigned long fault_around_pages(void)
>  {
>  	unsigned long nr_pages;
>
> -	nr_pages = 1UL << FAULT_AROUND_ORDER;
> +	nr_pages = 1UL << CONFIG_FAULT_AROUND_ORDER;
>  	BUILD_BUG_ON(nr_pages > PTRS_PER_PTE);
>  	return nr_pages;
>  }
>
>  static inline unsigned long fault_around_mask(void)
>  {
> -	return ~((1UL << (PAGE_SHIFT + FAULT_AROUND_ORDER)) - 1);
> +	return ~((1UL << (PAGE_SHIFT + CONFIG_FAULT_AROUND_ORDER)) - 1);
>  }
>  #endif
>
> @@ -3471,7 +3469,7 @@ static int do_read_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  	 * if page by the offset is not ready to be mapped (cold cache or
>  	 * something).
>  	 */
> -	if (vma->vm_ops->map_pages) {
> +	if ((vma->vm_ops->map_pages) && (fault_around_pages() > 1)) {

	if (vma->vm_ops->map_pages && fault_around_pages()) {

>  		pte = pte_offset_map_lock(mm, pmd, address, &ptl);
>  		do_fault_around(vma, address, pte, pgoff, flags);
>  		if (!pte_same(*pte, orig_pte))
> --
> 1.7.10.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
On 04/03/2014 11:27 PM, Madhavan Srinivasan wrote:
> This patch creates infrastructure to move the FAULT_AROUND_ORDER
> to arch/ using Kconfig. This will enable architecture maintainers
> to decide on suitable FAULT_AROUND_ORDER value based on
> performance data for that architecture. Patch also adds
> FAULT_AROUND_ORDER Kconfig element in arch/X86.

Please don't do it this way.

In mm/Kconfig, put

	config FAULT_AROUND_ORDER
		int
		default 1234 if POWERPC
		default 4

The way you have it now, every single architecture that needs to enable
this has to go put that in their Kconfig. That's madness. This way,
you only put it in one place, and folks only have to care if they want
to change the default to be something other than 4.
From: Dave Hansen <dave.hansen@intel.com>
Date: Fri, 04 Apr 2014 09:18:43 -0700

> On 04/03/2014 11:27 PM, Madhavan Srinivasan wrote:
>> This patch creates infrastructure to move the FAULT_AROUND_ORDER
>> to arch/ using Kconfig. This will enable architecture maintainers
>> to decide on suitable FAULT_AROUND_ORDER value based on
>> performance data for that architecture. Patch also adds
>> FAULT_AROUND_ORDER Kconfig element in arch/X86.
>
> Please don't do it this way.
>
> In mm/Kconfig, put
>
> 	config FAULT_AROUND_ORDER
> 		int
> 		default 1234 if POWERPC
> 		default 4
>
> The way you have it now, every single architecture that needs to enable
> this has to go put that in their Kconfig. That's madness. This way,
> you only put it in one place, and folks only have to care if they want
> to change the default to be something other than 4.

It looks more like it's necessary only to change the default, not
to enable it. Unless I read his patch wrong...
On Fri, 2014-04-04 at 09:18 -0700, Dave Hansen wrote:
> On 04/03/2014 11:27 PM, Madhavan Srinivasan wrote:
> > This patch creates infrastructure to move the FAULT_AROUND_ORDER
> > to arch/ using Kconfig. This will enable architecture maintainers
> > to decide on suitable FAULT_AROUND_ORDER value based on
> > performance data for that architecture. Patch also adds
> > FAULT_AROUND_ORDER Kconfig element in arch/X86.
>
> Please don't do it this way.
>
> In mm/Kconfig, put
>
> 	config FAULT_AROUND_ORDER
> 		int
> 		default 1234 if POWERPC
> 		default 4
>
> The way you have it now, every single architecture that needs to enable
> this has to go put that in their Kconfig. That's madness. This way,
> you only put it in one place, and folks only have to care if they want
> to change the default to be something other than 4.

Also does it have to be a constant ? Maddy here tested on our POWER
servers. The "Sweet spot" value might be VERY different on an embedded
chip or even on a future generation of server chip.

Cheers,
Ben.
On Friday 04 April 2014 06:47 PM, Kirill A. Shutemov wrote:
> On Fri, Apr 04, 2014 at 11:57:14AM +0530, Madhavan Srinivasan wrote:
>> Kirill A. Shutemov with faultaround patchset introduced
>> vm_ops->map_pages() for mapping easy accessible pages around
>> fault address in hope to reduce number of minor page faults.
>>
>> This patch creates infrastructure to move the FAULT_AROUND_ORDER
>> to arch/ using Kconfig. This will enable architecture maintainers
>> to decide on suitable FAULT_AROUND_ORDER value based on
>> performance data for that architecture. Patch also adds
>> FAULT_AROUND_ORDER Kconfig element in arch/X86.
>>
>> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
>> ---
>>  arch/x86/Kconfig   |    4 ++++
>>  include/linux/mm.h |    9 +++++++++
>>  mm/memory.c        |   12 +++++-------
>>  3 files changed, 18 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index 9c0a657..5833f22 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -1177,6 +1177,10 @@ config DIRECT_GBPAGES
>>  	  support it. This can improve the kernel's performance a tiny bit by
>>  	  reducing TLB pressure. If in doubt, say "Y".
>>
>> +config FAULT_AROUND_ORDER
>> +	int
>> +	default "4"
>> +
>>  # Common NUMA Features
>>  config NUMA
>>  	bool "Numa Memory Allocation and Scheduler Support"
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 0bd4359..b93c1c3 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -26,6 +26,15 @@ struct file_ra_state;
>>  struct user_struct;
>>  struct writeback_control;
>>
>> +/*
>> + * Fault around order is a control knob to decide the fault around pages.
>> + * Default value is set to 0UL (disabled), but the arch can override it as
>> + * desired.
>> + */
>> +#ifndef CONFIG_FAULT_AROUND_ORDER
>> +#define CONFIG_FAULT_AROUND_ORDER 0
>> +#endif
>> +
>
> I don't think it should be in header file: nobody except mm/memory.c cares.
> Just put it instead '#define FAULT_AROUND_ORDER'.
>

Ok. Will do this change.

>>  #ifndef CONFIG_NEED_MULTIPLE_NODES	/* Don't use mapnrs, do it properly */
>>  extern unsigned long max_mapnr;
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index b02c584..22a4a89 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -3358,10 +3358,8 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
>>  	update_mmu_cache(vma, address, pte);
>>  }
>>
>> -#define FAULT_AROUND_ORDER 4
>> -
>>  #ifdef CONFIG_DEBUG_FS
>> -static unsigned int fault_around_order = FAULT_AROUND_ORDER;
>> +static unsigned int fault_around_order = CONFIG_FAULT_AROUND_ORDER;
>>
>>  static int fault_around_order_get(void *data, u64 *val)
>>  {
>> @@ -3371,7 +3369,7 @@ static int fault_around_order_get(void *data, u64 *val)
>>
>>  static int fault_around_order_set(void *data, u64 val)
>>  {
>> -	BUILD_BUG_ON((1UL << FAULT_AROUND_ORDER) > PTRS_PER_PTE);
>> +	BUILD_BUG_ON((1UL << CONFIG_FAULT_AROUND_ORDER) > PTRS_PER_PTE);
>>  	if (1UL << val > PTRS_PER_PTE)
>>  		return -EINVAL;
>>  	fault_around_order = val;
>> @@ -3406,14 +3404,14 @@ static inline unsigned long fault_around_pages(void)
>>  {
>>  	unsigned long nr_pages;
>>
>> -	nr_pages = 1UL << FAULT_AROUND_ORDER;
>> +	nr_pages = 1UL << CONFIG_FAULT_AROUND_ORDER;
>>  	BUILD_BUG_ON(nr_pages > PTRS_PER_PTE);
>>  	return nr_pages;
>>  }
>>
>>  static inline unsigned long fault_around_mask(void)
>>  {
>> -	return ~((1UL << (PAGE_SHIFT + FAULT_AROUND_ORDER)) - 1);
>> +	return ~((1UL << (PAGE_SHIFT + CONFIG_FAULT_AROUND_ORDER)) - 1);
>>  }
>>  #endif
>>
>> @@ -3471,7 +3469,7 @@ static int do_read_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>>  	 * if page by the offset is not ready to be mapped (cold cache or
>>  	 * something).
>>  	 */
>> -	if (vma->vm_ops->map_pages) {
>> +	if ((vma->vm_ops->map_pages) && (fault_around_pages() > 1)) {
>
> 	if (vma->vm_ops->map_pages && fault_around_pages()) {
>

For a fault around order of 0, fault_around_pages() returns 1, and that
is the reason for checking that it is greater than 1. Also, the fault
around order can be set to zero through debugfs.

With regards
Maddy

>>  		pte = pte_offset_map_lock(mm, pmd, address, &ptl);
>>  		do_fault_around(vma, address, pte, pgoff, flags);
>>  		if (!pte_same(*pte, orig_pte))
>> --
>> 1.7.10.4
>>
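Maddy's point about the two forms of the test can be checked in isolation. The following is a minimal userspace sketch, not the kernel code: `fault_around_pages()` takes the order as a parameter here, `PTRS_PER_PTE` is an assumed value, and the two `should_fault_around*` helpers are hypothetical names that stand in for the two `if` conditions discussed above.

```c
#include <assert.h>
#include <stdbool.h>

#define PTRS_PER_PTE 512UL	/* assumed value, arch-specific in the kernel */

/* Pages covered by one fault-around window: 1 << order, so never 0. */
static unsigned long fault_around_pages(unsigned int order)
{
	assert((1UL << order) <= PTRS_PER_PTE);
	return 1UL << order;
}

/* Condition as written in the patch: a one-page window means "disabled". */
static bool should_fault_around(bool has_map_pages, unsigned int order)
{
	return has_map_pages && fault_around_pages(order) > 1;
}

/* Condition as suggested in review: only tests for a non-zero page count. */
static bool should_fault_around_truthy(bool has_map_pages, unsigned int order)
{
	return has_map_pages && fault_around_pages(order);
}
```

With an order of 0 the page count is 1, so the truthiness test still takes the fault-around path while the `> 1` test skips it. For any order above 0 the two conditions agree.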
On Friday 04 April 2014 09:48 PM, Dave Hansen wrote:
> On 04/03/2014 11:27 PM, Madhavan Srinivasan wrote:
>> This patch creates infrastructure to move the FAULT_AROUND_ORDER
>> to arch/ using Kconfig. This will enable architecture maintainers
>> to decide on suitable FAULT_AROUND_ORDER value based on
>> performance data for that architecture. Patch also adds
>> FAULT_AROUND_ORDER Kconfig element in arch/X86.
>
> Please don't do it this way.
>
> In mm/Kconfig, put
>
> 	config FAULT_AROUND_ORDER
> 		int
> 		default 1234 if POWERPC
> 		default 4
>
> The way you have it now, every single architecture that needs to enable
> this has to go put that in their Kconfig. That's madness. This way,

I thought about it and decided not to do it this way because, in future,
sub platforms of the architecture may decide to change the values. Also,
adding an if line for each architecture, with its different sub platforms
ORed into it, will look messy.

With regards
Maddy

> you only put it in one place, and folks only have to care if they want
> to change the default to be something other than 4.
>
On Friday 04 April 2014 11:20 PM, David Miller wrote:
> From: Dave Hansen <dave.hansen@intel.com>
> Date: Fri, 04 Apr 2014 09:18:43 -0700
>
>> On 04/03/2014 11:27 PM, Madhavan Srinivasan wrote:
>>> This patch creates infrastructure to move the FAULT_AROUND_ORDER
>>> to arch/ using Kconfig. This will enable architecture maintainers
>>> to decide on suitable FAULT_AROUND_ORDER value based on
>>> performance data for that architecture. Patch also adds
>>> FAULT_AROUND_ORDER Kconfig element in arch/X86.
>>
>> Please don't do it this way.
>>
>> In mm/Kconfig, put
>>
>> 	config FAULT_AROUND_ORDER
>> 		int
>> 		default 1234 if POWERPC
>> 		default 4
>>
>> The way you have it now, every single architecture that needs to enable
>> this has to go put that in their Kconfig. That's madness. This way,
>> you only put it in one place, and folks only have to care if they want
>> to change the default to be something other than 4.
>
> It looks more like it's necessary only to change the default, not
> to enable it. Unless I read his patch wrong...
>

Yes. With the current patch, you only need to change the default, and
changing it is what enables fault around.

With regards
Maddy
On Wed, Apr 09, 2014 at 07:02:02AM +0530, Madhavan Srinivasan wrote:
> On Friday 04 April 2014 09:48 PM, Dave Hansen wrote:
> > On 04/03/2014 11:27 PM, Madhavan Srinivasan wrote:
> >> This patch creates infrastructure to move the FAULT_AROUND_ORDER
> >> to arch/ using Kconfig. This will enable architecture maintainers
> >> to decide on suitable FAULT_AROUND_ORDER value based on
> >> performance data for that architecture. Patch also adds
> >> FAULT_AROUND_ORDER Kconfig element in arch/X86.
> >
> > Please don't do it this way.
> >
> > In mm/Kconfig, put
> >
> > 	config FAULT_AROUND_ORDER
> > 		int
> > 		default 1234 if POWERPC
> > 		default 4
> >
> > The way you have it now, every single architecture that needs to enable
> > this has to go put that in their Kconfig. That's madness. This way,
>
> I thought about it and decided not to do this way because, in future,
> sub platforms of the architecture may decide to change the values. Also,
> adding an if line for each architecture with different sub platforms
> oring to it will look messy.

This still misses out on Ben's objection that it's impossible to get this
right at compile time for many kernels, since they can boot and run on
many different subarchs.
On 04/08/2014 06:32 PM, Madhavan Srinivasan wrote:
>> In mm/Kconfig, put
>>
>> 	config FAULT_AROUND_ORDER
>> 		int
>> 		default 1234 if POWERPC
>> 		default 4
>>
>> The way you have it now, every single architecture that needs to enable
>> this has to go put that in their Kconfig. That's madness. This way,
> I thought about it and decided not to do this way because, in future,
> sub platforms of the architecture may decide to change the values. Also,
> adding an if line for each architecture with different sub platforms
> oring to it will look messy.

I'm not sure why I'm trying here any more. You do seem quite content to
add as much cruft to ppc and every other architecture as possible. If
your theoretical scenario pops up, you simply do this in ppc:

	config ARCH_FAULT_AROUND_ORDER
		int
		default 888 if OTHER_SILLY_POWERPC_SUBARCH
		default 999

(Kconfig uses the first default whose condition is met, so the
conditional line goes first.)

But *ONLY* in the architectures that care about doing that stuff. You
leave every other architecture on the planet alone. Then, in mm/Kconfig:

	config FAULT_AROUND_ORDER
		int
		default ARCH_FAULT_AROUND_ORDER if ARCH_FAULT_AROUND_ORDER
		default 4

Your way still requires going and individually touching every single
architecture's Kconfig that wants to enable fault around. That's not an
acceptable solution.
On 04/09/2014 01:20 AM, Peter Zijlstra wrote:
> This still misses out on Ben's objection that it's impossible to get this
> right at compile time for many kernels, since they can boot and run on
> many different subarchs.

Completely agree. The Kconfig-time stuff should probably just be a knob
to turn it off completely, if anything.
On Wednesday 09 April 2014 09:18 PM, Dave Hansen wrote:
> On 04/09/2014 01:20 AM, Peter Zijlstra wrote:
>> This still misses out on Ben's objection that it's impossible to get this
>> right at compile time for many kernels, since they can boot and run on
>> many different subarchs.
>
> Completely agree. The Kconfig-time stuff should probably just be a knob
> to turn it off completely, if anything.
>

OK, here is my thought. To address Ben's concern, it would be better to
have this as a variable with a default value that the platform can
override, and an mm/Kconfig option to disable it. Kindly let me know
whether this will work.

Thanks for the review comments.

With regards
Maddy
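The "variable with a default value" idea could look roughly like the sketch below. It is a userspace illustration under stated assumptions, not code from the patch: `set_fault_around_order()` is a hypothetical hook that a platform might call at boot, `DEFAULT_FAULT_AROUND_ORDER` is a made-up name carrying the value 4 from the original `#define`, and `PTRS_PER_PTE` is an assumed constant. The range check mirrors the one in the patch's `fault_around_order_set()`.

```c
#include <assert.h>
#include <errno.h>

#define PTRS_PER_PTE 512UL		/* assumed value, arch-specific in the kernel */
#define DEFAULT_FAULT_AROUND_ORDER 4U	/* hypothetical name for the old constant */

/* Runtime knob instead of a compile-time constant. */
static unsigned int fault_around_order = DEFAULT_FAULT_AROUND_ORDER;

/*
 * Hypothetical platform override. Rejects orders whose window would
 * exceed one page table, as the debugfs setter in the patch does.
 */
static int set_fault_around_order(unsigned int order)
{
	if (order >= 8 * sizeof(unsigned long) ||
	    (1UL << order) > PTRS_PER_PTE)
		return -EINVAL;
	fault_around_order = order;
	return 0;
}

/* Page count is now computed from the variable on every call. */
static unsigned long fault_around_pages(void)
{
	assert((1UL << fault_around_order) <= PTRS_PER_PTE);
	return 1UL << fault_around_order;
}
```

One consequence of this shape is that the `BUILD_BUG_ON()` checks in the patch no longer apply, since the value is not known at compile time. The bound has to be enforced where the variable is written.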
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 9c0a657..5833f22 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1177,6 +1177,10 @@ config DIRECT_GBPAGES
 	  support it. This can improve the kernel's performance a tiny bit by
 	  reducing TLB pressure. If in doubt, say "Y".
 
+config FAULT_AROUND_ORDER
+	int
+	default "4"
+
 # Common NUMA Features
 config NUMA
 	bool "Numa Memory Allocation and Scheduler Support"
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0bd4359..b93c1c3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -26,6 +26,15 @@ struct file_ra_state;
 struct user_struct;
 struct writeback_control;
 
+/*
+ * Fault around order is a control knob to decide the fault around pages.
+ * Default value is set to 0UL (disabled), but the arch can override it as
+ * desired.
+ */
+#ifndef CONFIG_FAULT_AROUND_ORDER
+#define CONFIG_FAULT_AROUND_ORDER 0
+#endif
+
 #ifndef CONFIG_NEED_MULTIPLE_NODES	/* Don't use mapnrs, do it properly */
 extern unsigned long max_mapnr;
 
diff --git a/mm/memory.c b/mm/memory.c
index b02c584..22a4a89 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3358,10 +3358,8 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
 	update_mmu_cache(vma, address, pte);
 }
 
-#define FAULT_AROUND_ORDER 4
-
 #ifdef CONFIG_DEBUG_FS
-static unsigned int fault_around_order = FAULT_AROUND_ORDER;
+static unsigned int fault_around_order = CONFIG_FAULT_AROUND_ORDER;
 
 static int fault_around_order_get(void *data, u64 *val)
 {
@@ -3371,7 +3369,7 @@ static int fault_around_order_get(void *data, u64 *val)
 
 static int fault_around_order_set(void *data, u64 val)
 {
-	BUILD_BUG_ON((1UL << FAULT_AROUND_ORDER) > PTRS_PER_PTE);
+	BUILD_BUG_ON((1UL << CONFIG_FAULT_AROUND_ORDER) > PTRS_PER_PTE);
 	if (1UL << val > PTRS_PER_PTE)
 		return -EINVAL;
 	fault_around_order = val;
@@ -3406,14 +3404,14 @@ static inline unsigned long fault_around_pages(void)
 {
 	unsigned long nr_pages;
 
-	nr_pages = 1UL << FAULT_AROUND_ORDER;
+	nr_pages = 1UL << CONFIG_FAULT_AROUND_ORDER;
 	BUILD_BUG_ON(nr_pages > PTRS_PER_PTE);
 	return nr_pages;
 }
 
 static inline unsigned long fault_around_mask(void)
 {
-	return ~((1UL << (PAGE_SHIFT + FAULT_AROUND_ORDER)) - 1);
+	return ~((1UL << (PAGE_SHIFT + CONFIG_FAULT_AROUND_ORDER)) - 1);
 }
 #endif
 
@@ -3471,7 +3469,7 @@ static int do_read_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * if page by the offset is not ready to be mapped (cold cache or
 	 * something).
 	 */
-	if (vma->vm_ops->map_pages) {
+	if ((vma->vm_ops->map_pages) && (fault_around_pages() > 1)) {
 		pte = pte_offset_map_lock(mm, pmd, address, &ptl);
 		do_fault_around(vma, address, pte, pgoff, flags);
 		if (!pte_same(*pte, orig_pte))
Kirill A. Shutemov's faultaround patchset introduced
vm_ops->map_pages() for mapping easily accessible pages around the
fault address, in the hope of reducing the number of minor page faults.

This patch creates infrastructure to move FAULT_AROUND_ORDER
to arch/ using Kconfig. This will enable architecture maintainers
to decide on a suitable FAULT_AROUND_ORDER value based on
performance data for that architecture. The patch also adds a
FAULT_AROUND_ORDER Kconfig element in arch/x86.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
---
 arch/x86/Kconfig   |    4 ++++
 include/linux/mm.h |    9 +++++++++
 mm/memory.c        |   12 +++++-------
 3 files changed, 18 insertions(+), 7 deletions(-)
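To make the meaning of the order concrete, here is a small worked sketch of the arithmetic behind `fault_around_pages()` and `fault_around_mask()` from the diff. It is a userspace approximation: `PAGE_SHIFT` is assumed to be 12 (4 KiB pages), the order is passed as a parameter rather than taken from a config symbol, and `window_start()` is a hypothetical helper showing how such a mask would typically be applied to a faulting address.

```c
#include <assert.h>

#define PAGE_SHIFT 12	/* assumed: 4 KiB pages */

/* Number of pages in the fault-around window for a given order. */
static unsigned long fault_around_pages(unsigned int order)
{
	return 1UL << order;
}

/* Mask that clears the address bits inside one window. */
static unsigned long fault_around_mask(unsigned int order)
{
	assert(PAGE_SHIFT + order < 8 * sizeof(unsigned long));
	return ~((1UL << (PAGE_SHIFT + order)) - 1);
}

/* Hypothetical helper: align a faulting address down to its window. */
static unsigned long window_start(unsigned long address, unsigned int order)
{
	return address & fault_around_mask(order);
}
```

Under these assumptions an order of 4 gives a window of 16 pages (64 KiB), so an address such as 0x12345678 is aligned down to 0x12340000. An order of 0 gives a single page, and the same address aligns down to 0x12345000.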