Message ID: 20140520102738.7F096E009B@blue.fi.intel.com (mailing list archive)
State: Not Applicable
On Tue, 20 May 2014 13:27:38 +0300 (EEST) "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:

> Rusty Russell wrote:
> > "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes:
> > > Andrew Morton wrote:
> > >> On Mon, 19 May 2014 16:23:07 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote:
> > >>
> > >> > Shouldn't FAULT_AROUND_ORDER and fault_around_order be changed to be
> > >> > the order of the fault-around size in bytes, and fault_around_pages()
> > >> > use 1UL << (fault_around_order - PAGE_SHIFT)
> > >>
> > >> Yes. And shame on me for missing it (this time!) at review.
> > >>
> > >> There's still time to fix this. Patches, please.
> > >
> > > Here it is. Made at 3.30 AM, build tested only.
> >
> > Prefer on top of Maddy's patch which makes it always a variable, rather
> > than CONFIG_DEBUG_FS. It's got enough hair as it is.
>
> Something like this?

This appears to be against mainline, not against Madhavan's patch. As
mentioned previously I'd prefer it that way, but I'm confused.

> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Date: Tue, 20 May 2014 13:02:03 +0300
> Subject: [PATCH] mm: nominate faultaround area in bytes rather than page order
>
> There is evidence that the faultaround feature is less relevant on
> architectures with a page size bigger than 4k, which makes sense since
> page fault overhead per byte of mapped area should be lower there.
>
> Let's rework the feature to specify the faultaround area in bytes
> instead of page order. It's 64 kilobytes for now.
>
> The patch effectively disables faultaround on architectures with
> page size >= 64k (like ppc64).
>
> It's possible that some other size of faultaround area is relevant for
> a platform. We can expose the `fault_around_bytes' variable to
> arch-specific code once such platforms are found.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  mm/memory.c | 62 +++++++++++++++++++++++--------------------------------------
>  1 file changed, 23 insertions(+), 39 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 037b812a9531..252b319e8cdf 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3402,63 +3402,47 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
>  	update_mmu_cache(vma, address, pte);
>  }
>
> -#define FAULT_AROUND_ORDER 4
> +static unsigned long fault_around_bytes = 65536;
> +
> +static inline unsigned long fault_around_pages(void)
> +{
> +	return rounddown_pow_of_two(fault_around_bytes) / PAGE_SIZE;
> +}

I think we should round up, not down. So if the user asks for 1kb, they
get one page. So this becomes

	return PAGE_ALIGN(fault_around_bytes) / PAGE_SIZE;

> +static inline unsigned long fault_around_mask(void)
> +{
> +	return ~(rounddown_pow_of_two(fault_around_bytes) - 1) & PAGE_MASK;
> +}

And this has me a bit stumped. It's not helpful that do_fault_around()
is undocumented.

Does it fault in N/2 pages ahead and N/2 pages behind? Or does it align
the address down to the highest multiple of fault_around_bytes?

It appears to be the latter, so the location of the faultaround window
around the fault address is basically random, depending on what address
userspace happened to pick. I don't know why we did this :(

Or something. Can we please get some code commentary over
do_fault_around() describing this design decision and explaining the
reasoning behind it?

Also, "neast" is not a word.
On Tuesday 20 May 2014 03:57 PM, Kirill A. Shutemov wrote:
> [...]
>
> Something like this?
>
> [...]
>
> -static int fault_around_order_set(void *data, u64 val)
> +static int fault_around_bytes_set(void *data, u64 val)
> {

Kindly ignore the question if not relevant. Even though we need root
access to alter the value, will we be fine with a negative value?

Regards
Maddy
Madhavan Srinivasan wrote:
> On Tuesday 20 May 2014 03:57 PM, Kirill A. Shutemov wrote:
> > [...]
> >
> > -static int fault_around_order_set(void *data, u64 val)
> > +static int fault_around_bytes_set(void *data, u64 val)
> > {
>
> Kindly ignore the question if not relevant. Even though we need root
> access to alter the value, will we be fine with a negative value?

val is u64, or am I missing something?
On Tuesday 27 May 2014 03:51 PM, Kirill A. Shutemov wrote:
> Madhavan Srinivasan wrote:
>> [...]
>>
>> Kindly ignore the question if not relevant. Even though we need root
>> access to alter the value, will we be fine with a negative value?
>
> val is u64, or am I missing something?

My bad. What I wanted to check was an all-0xf input, and I guess we are
fine. Sorry about that.

Regards
Maddy
diff --git a/mm/memory.c b/mm/memory.c
index 037b812a9531..252b319e8cdf 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3402,63 +3402,47 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
 	update_mmu_cache(vma, address, pte);
 }
 
-#define FAULT_AROUND_ORDER 4
+static unsigned long fault_around_bytes = 65536;
+
+static inline unsigned long fault_around_pages(void)
+{
+	return rounddown_pow_of_two(fault_around_bytes) / PAGE_SIZE;
+}
+
+static inline unsigned long fault_around_mask(void)
+{
+	return ~(rounddown_pow_of_two(fault_around_bytes) - 1) & PAGE_MASK;
+}
 
-#ifdef CONFIG_DEBUG_FS
-static unsigned int fault_around_order = FAULT_AROUND_ORDER;
 
-static int fault_around_order_get(void *data, u64 *val)
+#ifdef CONFIG_DEBUG_FS
+static int fault_around_bytes_get(void *data, u64 *val)
 {
-	*val = fault_around_order;
+	*val = fault_around_bytes;
 	return 0;
 }
 
-static int fault_around_order_set(void *data, u64 val)
+static int fault_around_bytes_set(void *data, u64 val)
 {
-	BUILD_BUG_ON((1UL << FAULT_AROUND_ORDER) > PTRS_PER_PTE);
-	if (1UL << val > PTRS_PER_PTE)
+	if (val / PAGE_SIZE > PTRS_PER_PTE)
 		return -EINVAL;
-	fault_around_order = val;
+	fault_around_bytes = val;
 	return 0;
 }
-DEFINE_SIMPLE_ATTRIBUTE(fault_around_order_fops,
-		fault_around_order_get, fault_around_order_set, "%llu\n");
+DEFINE_SIMPLE_ATTRIBUTE(fault_around_bytes_fops,
+		fault_around_bytes_get, fault_around_bytes_set, "%llu\n");
 
 static int __init fault_around_debugfs(void)
 {
 	void *ret;
 
-	ret = debugfs_create_file("fault_around_order", 0644, NULL, NULL,
-			&fault_around_order_fops);
+	ret = debugfs_create_file("fault_around_bytes", 0644, NULL, NULL,
+			&fault_around_bytes_fops);
 	if (!ret)
-		pr_warn("Failed to create fault_around_order in debugfs");
+		pr_warn("Failed to create fault_around_bytes in debugfs");
 	return 0;
 }
 late_initcall(fault_around_debugfs);
-
-static inline unsigned long fault_around_pages(void)
-{
-	return 1UL << fault_around_order;
-}
-
-static inline unsigned long fault_around_mask(void)
-{
-	return ~((1UL << (PAGE_SHIFT + fault_around_order)) - 1);
-}
-#else
-static inline unsigned long fault_around_pages(void)
-{
-	unsigned long nr_pages;
-
-	nr_pages = 1UL << FAULT_AROUND_ORDER;
-	BUILD_BUG_ON(nr_pages > PTRS_PER_PTE);
-	return nr_pages;
-}
-
-static inline unsigned long fault_around_mask(void)
-{
-	return ~((1UL << (PAGE_SHIFT + FAULT_AROUND_ORDER)) - 1);
-}
 #endif
 
 static void do_fault_around(struct vm_area_struct *vma, unsigned long address,
@@ -3515,7 +3499,7 @@ static int do_read_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * if page by the offset is not ready to be mapped (cold cache or
 	 * something).
 	 */
-	if (vma->vm_ops->map_pages) {
+	if (vma->vm_ops->map_pages && fault_around_pages() > 1) {
 		pte = pte_offset_map_lock(mm, pmd, address, &ptl);
 		do_fault_around(vma, address, pte, pgoff, flags);
 		if (!pte_same(*pte, orig_pte))