[V2,2/2] powerpc/thp: Serialize pmd clear against a linux page table walk.

Message ID 1430983408-24924-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com (mailing list archive)
State Superseded

Commit Message

Aneesh Kumar K.V May 7, 2015, 7:23 a.m. UTC
Serialize against find_linux_pte_or_hugepte, which does a lock-less
lookup in the page tables with local interrupts disabled. For huge pages
it casts pmd_t to pte_t. Since the format of pte_t is different from
pmd_t, we want to prevent a transition from a pmd pointing to a page table
to a pmd pointing to a huge page (and back) while interrupts are disabled.
We clear the pmd, possibly to replace it with a page table pointer, in
different code paths. So make sure we wait for any parallel
find_linux_pte_or_hugepte to finish.

Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
Changes from v1:
* Move kick_all_cpus_sync to pmdp_get_and_clear so that it handles the
  zap_huge_pmd case as well.

 arch/powerpc/mm/pgtable_64.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Andrew Morton May 8, 2015, 10:21 p.m. UTC | #1
On Thu,  7 May 2015 12:53:28 +0530 "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:

> Serialize against find_linux_pte_or_hugepte, which does a lock-less
> lookup in the page tables with local interrupts disabled. For huge pages
> it casts pmd_t to pte_t. Since the format of pte_t is different from
> pmd_t, we want to prevent a transition from a pmd pointing to a page table
> to a pmd pointing to a huge page (and back) while interrupts are disabled.
> We clear the pmd, possibly to replace it with a page table pointer, in
> different code paths. So make sure we wait for any parallel
> find_linux_pte_or_hugepte to finish.

I'm not seeing here any description of the problem which is being
fixed.  Does the patch make the machine faster?  Does the machine
crash?

Aneesh Kumar K.V May 11, 2015, 6:30 a.m. UTC | #2
Andrew Morton <akpm@linux-foundation.org> writes:

> On Thu,  7 May 2015 12:53:28 +0530 "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
>
>> Serialize against find_linux_pte_or_hugepte, which does a lock-less
>> lookup in the page tables with local interrupts disabled. For huge pages
>> it casts pmd_t to pte_t. Since the format of pte_t is different from
>> pmd_t, we want to prevent a transition from a pmd pointing to a page table
>> to a pmd pointing to a huge page (and back) while interrupts are disabled.
>> We clear the pmd, possibly to replace it with a page table pointer, in
>> different code paths. So make sure we wait for any parallel
>> find_linux_pte_or_hugepte to finish.
>
> I'm not seeing here any description of the problem which is being
> fixed.  Does the patch make the machine faster?  Does the machine
> crash?

I sent v3 with updated commit message. Adding that below.

    powerpc/thp: Serialize pmd clear against a linux page table walk.
    
    Serialize against find_linux_pte_or_hugepte, which does a lock-less
    lookup in the page tables with local interrupts disabled. For huge pages
    it casts pmd_t to pte_t. Since the format of pte_t is different from
    pmd_t, we want to prevent a transition from a pmd pointing to a page table
    to a pmd pointing to a huge page (and back) while interrupts are disabled.
    We clear the pmd, possibly to replace it with a page table pointer, in
    different code paths. So make sure we wait for any parallel
    find_linux_pte_or_hugepte to finish.
    
    Without this patch, a find_linux_pte_or_hugepte running in parallel with
    __split_huge_zero_page_pmd, do_huge_pmd_wp_page_fallback, or zap_huge_pmd
    can run into the above issue. With __split_huge_zero_page_pmd and
    do_huge_pmd_wp_page_fallback, we clear the hugepage pte before inserting
    the pmd entry with a regular pgtable address. Such a clear needs to
    wait for any parallel find_linux_pte_or_hugepte to finish.
    
    With zap_huge_pmd, we can run into issues when a hugepage pte
    gets zapped due to MADV_DONTNEED while another CPU faults it
    in as small pages.

-aneesh

Patch

diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 9171c1a37290..049d961802aa 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -845,6 +845,17 @@  pmd_t pmdp_get_and_clear(struct mm_struct *mm,
 	 * hash fault look at them.
 	 */
 	memset(pgtable, 0, PTE_FRAG_SIZE);
+	/*
+	 * Serialize against find_linux_pte_or_hugepte, which does a lock-less
+	 * lookup in the page tables with local interrupts disabled. For huge
+	 * pages it casts pmd_t to pte_t. Since the format of pte_t is different
+	 * from pmd_t, we want to prevent a transition from a pmd pointing to a
+	 * page table to a pmd pointing to a huge page (and back) while
+	 * interrupts are disabled. We clear the pmd, possibly to replace it
+	 * with a page table pointer, in different code paths. So make sure we
+	 * wait for any parallel find_linux_pte_or_hugepte to finish.
+	 */
+	kick_all_cpus_sync();
 	return old_pmd;
 }
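
The ordering the patch relies on (clear the pmd, wait for every CPU that
may still be inside an interrupts-off walk, and only then install the new
kind of entry) can be illustrated with a small single-threaded userspace
model. This is only a sketch under stated assumptions, not kernel code:
every name below (struct model, walker_begin, walker_end,
model_kick_all_cpus_sync, simulate_split) is hypothetical, and the "wait"
is modelled by simply letting the in-flight walkers run to completion,
which is the effect a real kick_all_cpus_sync() has on CPUs that cannot
take the IPI until they re-enable interrupts.

```c
/* Userspace model only; none of these names are kernel APIs. */
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical states of a single pmd slot. */
enum pmd_kind { PMD_NONE, PMD_HUGE, PMD_TABLE };

struct model_cpu {
	bool irqs_off;          /* walker is inside its lock-less section */
	enum pmd_kind snapshot; /* entry kind the walker read on entry */
};

struct model {
	enum pmd_kind pmd;
	int hazards;            /* walkers that saw huge <-> table flip */
	struct model_cpu cpu[MODEL_NR_CPUS];
};

/* Walker entry: "disable interrupts" and read the pmd once. */
static void walker_begin(struct model *m, int cpu)
{
	m->cpu[cpu].irqs_off = true;
	m->cpu[cpu].snapshot = m->pmd;
}

/* Walker exit: record a hazard if the entry changed kind under us. */
static void walker_end(struct model *m, int cpu)
{
	enum pmd_kind seen = m->cpu[cpu].snapshot;
	enum pmd_kind live = m->pmd;

	assert(m->cpu[cpu].irqs_off);
	if ((seen == PMD_HUGE && live == PMD_TABLE) ||
	    (seen == PMD_TABLE && live == PMD_HUGE))
		m->hazards++;
	m->cpu[cpu].irqs_off = false;
}

/* Model of the wait: every in-flight walker finishes before we return. */
static void model_kick_all_cpus_sync(struct model *m)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (m->cpu[cpu].irqs_off)
			walker_end(m, cpu);
}

/* Split a huge pmd while two walkers are in flight; return hazard count. */
int simulate_split(bool serialize)
{
	struct model m = { .pmd = PMD_HUGE };

	walker_begin(&m, 1);
	walker_begin(&m, 2);

	m.pmd = PMD_NONE;                 /* the pmd clear */
	if (serialize)
		model_kick_all_cpus_sync(&m); /* wait for the walkers */
	m.pmd = PMD_TABLE;                /* install page table pointer */

	/* Any walker still in flight finishes only now. */
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (m.cpu[cpu].irqs_off)
			walker_end(&m, cpu);

	return m.hazards;
}
```

In this model simulate_split(true) reports no hazards, because both
walkers finish while the slot is still empty, whereas
simulate_split(false) reports one hazard per in-flight walker. The model
says nothing about the cost of the real IPI broadcast.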