| Message ID | 1425741651-29152-2-git-send-email-mgorman@suse.de (mailing list archive) |
|---|---|
| State | Not Applicable |
| Delegated to: | Michael Ellerman |
Looks obviously correct. The old code was just very wrong.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>

               Linus

On Sat, Mar 7, 2015 at 7:20 AM, Mel Gorman <mgorman@suse.de> wrote:
> The wrong value is being returned by change_huge_pmd since commit
> 10c1045f28e8 ("mm: numa: avoid unnecessary TLB flushes when setting
> NUMA hinting entries") which allows a fallthrough that tries to adjust
> non-existent PTEs. This patch corrects it.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> ---
>  mm/huge_memory.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index fc00c8cb5a82..194c0f019774 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1482,6 +1482,7 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
>
>  	if (__pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
>  		pmd_t entry;
> +		ret = 1;
>
>  		/*
>  		 * Avoid trapping faults against the zero page. The read-only
> @@ -1490,11 +1491,10 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
>  		 */
>  		if (prot_numa && is_huge_zero_pmd(*pmd)) {
>  			spin_unlock(ptl);
> -			return 0;
> +			return ret;
>  		}
>
>  		if (!prot_numa || !pmd_protnone(*pmd)) {
> -			ret = 1;
>  			entry = pmdp_get_and_clear_notify(mm, addr, pmd);
>  			entry = pmd_modify(entry, newprot);
>  			ret = HPAGE_PMD_NR;
> --
> 2.1.2
On Sat, Mar 7, 2015 at 7:20 AM, Mel Gorman <mgorman@suse.de> wrote:
>
>  		if (!prot_numa || !pmd_protnone(*pmd)) {
> -			ret = 1;
>  			entry = pmdp_get_and_clear_notify(mm, addr, pmd);
>  			entry = pmd_modify(entry, newprot);
>  			ret = HPAGE_PMD_NR;

Hmm. I know I acked this already, but the return value - which is
correct - is still potentially something we could improve upon.

In particular, we don't need to flush the TLBs if the old entry was
not present. Sadly, we don't have a helper function for that.

But the code *could* do something like

	entry = pmdp_get_and_clear_notify(mm, addr, pmd);
	ret = pmd_tlb_cacheable(entry) ? HPAGE_PMD_NR : 1;
	entry = pmd_modify(entry, newprot);

where pmd_tlb_cacheable() on x86 would test if _PAGE_PRESENT (bit #0) is set.

In particular, that would mean that as we change *from* a protnone
(whether NUMA or really protnone) we wouldn't need to flush the TLB.

In fact, we could make it even more aggressive: it's not just an old
non-present TLB entry that doesn't need flushing - we can avoid the
flushing whenever we strictly increase the access rights. So we could
have something that takes the old entry _and_ the new protections into
account, and avoids the TLB flush if the new entry is strictly more
permissive.

This doesn't explain the extra TLB flushes Dave sees, though, because
the old code didn't make those kinds of optimizations either. But
maybe something like this is worth doing.

               Linus
On Sat, Mar 07, 2015 at 12:31:03PM -0800, Linus Torvalds wrote:
> On Sat, Mar 7, 2015 at 7:20 AM, Mel Gorman <mgorman@suse.de> wrote:
> >
> >  		if (!prot_numa || !pmd_protnone(*pmd)) {
> > -			ret = 1;
> >  			entry = pmdp_get_and_clear_notify(mm, addr, pmd);
> >  			entry = pmd_modify(entry, newprot);
> >  			ret = HPAGE_PMD_NR;
>
> Hmm. I know I acked this already, but the return value - which is
> correct - is still potentially something we could improve upon.
>
> In particular, we don't need to flush the TLBs if the old entry was
> not present. Sadly, we don't have a helper function for that.
>
> But the code *could* do something like
>
> 	entry = pmdp_get_and_clear_notify(mm, addr, pmd);
> 	ret = pmd_tlb_cacheable(entry) ? HPAGE_PMD_NR : 1;
> 	entry = pmd_modify(entry, newprot);
>
> where pmd_tlb_cacheable() on x86 would test if _PAGE_PRESENT (bit #0) is set.
>

I agree with you in principle. pmd_tlb_cacheable looks and sounds very
similar to pte_accessible().

> In particular, that would mean that as we change *from* a protnone
> (whether NUMA or really protnone) we wouldn't need to flush the TLB.
>
> In fact, we could make it even more aggressive: it's not just an old
> non-present TLB entry that doesn't need flushing - we can avoid the
> flushing whenever we strictly increase the access rights. So we could
> have something that takes the old entry _and_ the new protections into
> account, and avoids the TLB flush if the new entry is strictly more
> permissive.
>
> This doesn't explain the extra TLB flushes Dave sees, though, because
> the old code didn't make those kinds of optimizations either. But
> maybe something like this is worth doing.
>

I think it is worth doing although it'll be after LSF/MM before I do it.
I severely doubt this is what Dave is seeing because the vmstats indicated
there was no THP activity but it's still a good idea.
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index fc00c8cb5a82..194c0f019774 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1482,6 +1482,7 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 
 	if (__pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
 		pmd_t entry;
+		ret = 1;
 
 		/*
 		 * Avoid trapping faults against the zero page. The read-only
@@ -1490,11 +1491,10 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		 */
 		if (prot_numa && is_huge_zero_pmd(*pmd)) {
 			spin_unlock(ptl);
-			return 0;
+			return ret;
 		}
 
 		if (!prot_numa || !pmd_protnone(*pmd)) {
-			ret = 1;
 			entry = pmdp_get_and_clear_notify(mm, addr, pmd);
 			entry = pmd_modify(entry, newprot);
 			ret = HPAGE_PMD_NR;
The wrong value is being returned by change_huge_pmd since commit
10c1045f28e8 ("mm: numa: avoid unnecessary TLB flushes when setting
NUMA hinting entries") which allows a fallthrough that tries to adjust
non-existent PTEs. This patch corrects it.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/huge_memory.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)