[3/4] mm: numa: Mark huge PTEs young when clearing NUMA hinting faults

Message ID 1425741651-29152-4-git-send-email-mgorman@suse.de (mailing list archive)
State Not Applicable
Delegated to: Michael Ellerman

Commit Message

Mel Gorman March 7, 2015, 3:20 p.m. UTC
Base PTEs are marked young when the NUMA hinting information is cleared,
but the same does not happen for huge pages, which this patch addresses.
Note that migrated pages are not marked young, as the base page migration
code does not assume that migrated pages have been referenced. This could
be addressed, but it is beyond the scope of this series, which is aimed at
Dave Chinner's shrink workload that is unlikely to be affected by this issue.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/huge_memory.c | 1 +
 1 file changed, 1 insertion(+)
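
For comparison, the base-page NUMA fault path already does what the commit
message describes. It looks roughly like the following (paraphrased from
do_numa_page() in mm/memory.c of this era; treat the exact lines as an
approximation, not verbatim kernel source):

    /* Make it present again (sketch of the base-page NUMA fault path) */
    pte = pte_modify(pte, vma->vm_page_prot);
    pte = pte_mkyoung(pte);            /* base PTEs are marked young here */
    set_pte_at(mm, addr, ptep, pte);
    update_mmu_cache(vma, addr, ptep);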

Comments

Linus Torvalds March 7, 2015, 6:33 p.m. UTC | #1
On Sat, Mar 7, 2015 at 7:20 AM, Mel Gorman <mgorman@suse.de> wrote:
>         pmd = pmd_modify(pmd, vma->vm_page_prot);
> +       pmd = pmd_mkyoung(pmd);

Hmm. I *thought* this should be unnecessary. vm_page_prot already has
the accessed bit set, and we kind of depend on the initial page table
setup and mk_pte() and friends (i.e. all new pages are installed
"young").

But it looks like I am wrong - the way we use _[H]PAGE_CHG_MASK means
that we always take the accessed and dirty bits from the old entry,
ignoring the bit in vm_page_prot.
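
For reference, the masks in question look roughly like this in
arch/x86/include/asm/pgtable_types.h of this era (the exact set of flags
varies between kernel versions, so this is an approximation):

    /* Bits preserved from the old entry by pte_modify()/pmd_modify().
     * Note that _PAGE_ACCESSED and _PAGE_DIRTY are in the preserved set,
     * so they always come from the old entry, never from vm_page_prot. */
    #define _PAGE_CHG_MASK	(PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT |		\
    			 _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY |	\
    			 _PAGE_SOFT_DIRTY)
    #define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE)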

I wonder if we should just make pte/pmd_modify() work the way I
*thought* they worked (remove the masking of vm_page_prot bits).

So the patch isn't wrong. It's just that we *might* instead just do
something like this:

    arch/x86/include/asm/pgtable.h | 4 ++--
    1 file changed, 2 insertions(+), 2 deletions(-)

   diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
   index a0c35bf6cb92..79b898bb9e18 100644
   --- a/arch/x86/include/asm/pgtable.h
   +++ b/arch/x86/include/asm/pgtable.h
   @@ -355,7 +355,7 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
            * the newprot (if present):
            */
           val &= _PAGE_CHG_MASK;
   -       val |= massage_pgprot(newprot) & ~_PAGE_CHG_MASK;
   +       val |= massage_pgprot(newprot);

           return __pte(val);
    }
   @@ -365,7 +365,7 @@ static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
           pmdval_t val = pmd_val(pmd);

           val &= _HPAGE_CHG_MASK;
   -       val |= massage_pgprot(newprot) & ~_HPAGE_CHG_MASK;
   +       val |= massage_pgprot(newprot);

           return __pmd(val);
    }

instead, and remove the mkyoung. Completely untested, but that "just
or in the new protection bits" is what pfn_pte() does just a few lines
above this.
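
pfn_pte() in the same file reads roughly as follows (approximate for this
era, and grounded in the massage_pgprot() helper visible in the diff above);
this is the "just or in the new protection bits" behaviour being referred to:

    /* pfn_pte() ORs in the full massaged protection bits; there is no
     * _PAGE_CHG_MASK filtering applied to the incoming pgprot. */
    static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
    {
    	return __pte(((phys_addr_t)page_nr << PAGE_SHIFT) |
    		     massage_pgprot(pgprot));
    }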

                        Linus
Linus Torvalds March 7, 2015, 6:42 p.m. UTC | #2
On Sat, Mar 7, 2015 at 10:33 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>             Completely untested, but that "just
> or in the new protection bits" is what pfn_pte() does just a few lines
> above this.

Hmm. Looking at this, we do *not* want to set _PAGE_ACCESSED when we
turn a page into PROT_NONE or mark it for numa faulting. Nor do we
want to set it for mprotect for random pages that we haven't actually
accessed, just changed the protections for.
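
This matters because pte_modify() is also what mprotect() runs over every
PTE in a range; a rough sketch of that path (paraphrased from
change_pte_range() in mm/mprotect.c of this era) shows why ORing in
_PAGE_ACCESSED from the new protections would be wrong there:

    /* Sketch of the mprotect() per-PTE loop body: if pte_modify() took
     * the accessed bit from newprot, every mprotect() call would make
     * all pages in the range look recently referenced to reclaim. */
    ptent = ptep_modify_prot_start(mm, addr, pte);
    ptent = pte_modify(ptent, newprot);
    ptep_modify_prot_commit(mm, addr, pte, ptent);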

So my patch was obviously wrong, and I should feel bad for suggesting
it. I'm a moron, and my expectation that "pte_modify()" would just
take the accessed bit from the vm_page_prot field was stupid and
wrong.

Mel's patch is the right thing to do.

                                Linus

Patch

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 194c0f019774..ae13ad31e113 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1359,6 +1359,7 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
 clear_pmdnuma:
 	BUG_ON(!PageLocked(page));
 	pmd = pmd_modify(pmd, vma->vm_page_prot);
+	pmd = pmd_mkyoung(pmd);
 	set_pmd_at(mm, haddr, pmdp, pmd);
 	update_mmu_cache_pmd(vma, addr, pmdp);
 	unlock_page(page);