Message ID | 1386268702-30806-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Accepted, archived |
On 12/05/2013 01:38 PM, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>
> change_prot_numa should work even if _PAGE_NUMA != _PAGE_PROTNONE.
> On archs like ppc64 that don't use _PAGE_PROTNONE and also have
> a separate page table outside the Linux page table, we just need to
> make sure that when calling change_prot_numa we flush the
> hardware page table entry so that the next page access results in a
> NUMA fault.
>
> We still need to make sure we use the NUMA faulting logic only
> when CONFIG_NUMA_BALANCING is set. This implies that migrate-on-fault
> (lazy migration) via mbind will only work if CONFIG_NUMA_BALANCING
> is set.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Reviewed-by: Rik van Riel <riel@redhat.com>
On Fri, Dec 06, 2013 at 12:08:22AM +0530, Aneesh Kumar K.V wrote:
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>
> change_prot_numa should work even if _PAGE_NUMA != _PAGE_PROTNONE.
> On archs like ppc64 that don't use _PAGE_PROTNONE and also have
> a separate page table outside the Linux page table, we just need to
> make sure that when calling change_prot_numa we flush the
> hardware page table entry so that the next page access results in a
> NUMA fault.
>
> We still need to make sure we use the NUMA faulting logic only
> when CONFIG_NUMA_BALANCING is set. This implies that migrate-on-fault
> (lazy migration) via mbind will only work if CONFIG_NUMA_BALANCING
> is set.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

You're right that there is no direct dependence between automatic NUMA
balancing and the use of prot_none. The BUILD_BUG_ON was there to flag very
clearly that arches wanting to support automatic NUMA balancing must ensure
such things as:

o _PAGE_NUMA is defined
o setting _PAGE_NUMA traps a fault, and the fault can be uniquely
  identified as being a NUMA hinting fault
o pte_present still returns true for pte_numa pages even though the
  underlying present bit may be cleared; otherwise operations like
  following and copying ptes will get confused
o shortly, arches will also need to avoid taking references on pte_numa
  pages in get_user_pages to account for hinting faults properly

I guess the _PAGE_NUMA parts will already be caught by other checks and the
rest will fall out during testing, so it's ok to remove.

Acked-by: Mel Gorman <mgorman@suse.de>
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1cedd000cf29..a7b4e310bf42 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1842,7 +1842,7 @@ static inline pgprot_t vm_get_page_prot(unsigned long vm_flags)
 }
 #endif
 
-#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
+#ifdef CONFIG_NUMA_BALANCING
 unsigned long change_prot_numa(struct vm_area_struct *vma,
 			unsigned long start, unsigned long end);
 #endif
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index eca4a3129129..9f73b29d304d 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -613,7 +613,7 @@ static inline int queue_pages_pgd_range(struct vm_area_struct *vma,
 	return 0;
 }
 
-#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
+#ifdef CONFIG_NUMA_BALANCING
 /*
  * This is used to mark a range of virtual addresses to be inaccessible.
  * These are later cleared by a NUMA hinting fault. Depending on these
@@ -627,7 +627,6 @@ unsigned long change_prot_numa(struct vm_area_struct *vma,
 			unsigned long addr, unsigned long end)
 {
 	int nr_updated;
-	BUILD_BUG_ON(_PAGE_NUMA != _PAGE_PROTNONE);
 
 	nr_updated = change_protection(vma, addr, end, vma->vm_page_prot, 0, 1);
 	if (nr_updated)
@@ -641,7 +640,7 @@ static unsigned long change_prot_numa(struct vm_area_struct *vma,
 {
 	return 0;
 }
-#endif /* CONFIG_ARCH_USES_NUMA_PROT_NONE */
+#endif /* CONFIG_NUMA_BALANCING */
 
 /*
  * Walk through page tables and collect pages to be migrated.