diff mbox series

[3/3] mm: optimise pte dirty/accessed bit setting by demand based pte insertion

Message ID 20180828112034.30875-4-npiggin@gmail.com (mailing list archive)
State Superseded
Headers show
Series mm: dirty/accessed pte optimisations | expand

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success next/apply_patch Successfully applied
snowpatch_ozlabs/checkpatch success Test checkpatch on branch next
snowpatch_ozlabs/build-ppc64le success Test build-ppc64le on branch next
snowpatch_ozlabs/build-ppc64be success Test build-ppc64be on branch next
snowpatch_ozlabs/build-ppc64e success Test build-ppc64e on branch next
snowpatch_ozlabs/build-ppc32 success Test build-ppc32 on branch next

Commit Message

Nicholas Piggin Aug. 28, 2018, 11:20 a.m. UTC
Similarly to the previous patch, this tries to optimise dirty/accessed
bits in ptes to avoid access costs of hardware setting them.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 mm/huge_memory.c | 12 +++++++-----
 mm/memory.c      |  8 +++++---
 2 files changed, 12 insertions(+), 8 deletions(-)

Comments

Guenter Roeck Sept. 5, 2018, 2:29 p.m. UTC | #1
Hi,

On Tue, Aug 28, 2018 at 09:20:34PM +1000, Nicholas Piggin wrote:
> Similarly to the previous patch, this tries to optimise dirty/accessed
> bits in ptes to avoid access costs of hardware setting them.
> 

This patch results in silent nios2 boot failures, silent meaning that
the boot stalls.

...
Unpacking initramfs...
Freeing initrd memory: 2168K
workingset: timestamp_bits=30 max_order=15 bucket_order=0
jffs2: version 2.2. (NAND) © 2001-2006 Red Hat, Inc.
random: fast init done
random: crng init done

[no further activity until the qemu session is aborted]

Reverting the patch fixes the problem. Bisect log is attached.

Guenter

---
# bad: [387ac6229ecf6e012649d4fc409c5352655a4cf0] Add linux-next specific files for 20180905
# good: [57361846b52bc686112da6ca5368d11210796804] Linux 4.19-rc2
git bisect start 'HEAD' 'v4.19-rc2'
# good: [668570e8389bb076bea9b7531553e1362f5abd11] Merge remote-tracking branch 'net-next/master'
git bisect good 668570e8389bb076bea9b7531553e1362f5abd11
# good: [7f2f69ebf0bcf3e9bcff7d560ba92cee960a66a6] Merge remote-tracking branch 'battery/for-next'
git bisect good 7f2f69ebf0bcf3e9bcff7d560ba92cee960a66a6
# good: [c31458d3e03e3a2edeaab225a22eaf68c07c8290] Merge remote-tracking branch 'rpmsg/for-next'
git bisect good c31458d3e03e3a2edeaab225a22eaf68c07c8290
# good: [e0f43dcbe9af8ac72f39fe92c5d0ee1883546427] Merge remote-tracking branch 'nvdimm/libnvdimm-for-next'
git bisect good e0f43dcbe9af8ac72f39fe92c5d0ee1883546427
# bad: [f509e2c0f3cd11df238f0f1b5ba013fe726decdf] of: ignore sub-page memory regions
git bisect bad f509e2c0f3cd11df238f0f1b5ba013fe726decdf
# good: [2f7eebf30b87534f7e4c3982307579d9adc782a5] ocfs2: fix clusters leak in ocfs2_defrag_extent()
git bisect good 2f7eebf30b87534f7e4c3982307579d9adc782a5
# good: [119eb88c9dd23e305939ad748237100078e304a8] mm/swapfile.c: call free_swap_slot() in __swap_entry_free()
git bisect good 119eb88c9dd23e305939ad748237100078e304a8
# good: [21d64d37adf3ab20b4c3a1951018e84bf815c887] mm: remove vm_insert_pfn()
git bisect good 21d64d37adf3ab20b4c3a1951018e84bf815c887
# good: [90cd1a69010844e9dbfc43279d681d798812b962] cramfs: convert to use vmf_insert_mixed
git bisect good 90cd1a69010844e9dbfc43279d681d798812b962
# good: [c7dd91289b4bb4c400a8a71953511991815f8e6f] mm/cow: optimise pte dirty/accessed bits handling in fork
git bisect good c7dd91289b4bb4c400a8a71953511991815f8e6f
# bad: [87d74ae75700a39effcb8c9ed8a8445e719ac369] hexagon: switch to NO_BOOTMEM
git bisect bad 87d74ae75700a39effcb8c9ed8a8445e719ac369
# bad: [3d1d5b26ac5b4d4193dc618a50cd88de1fb0d360] mm: optimise pte dirty/accessed bit setting by demand based pte insertion
git bisect bad 3d1d5b26ac5b4d4193dc618a50cd88de1fb0d360
# first bad commit: [3d1d5b26ac5b4d4193dc618a50cd88de1fb0d360] mm: optimise pte dirty/accessed bit setting by demand based pte insertion
Nicholas Piggin Sept. 5, 2018, 10:18 p.m. UTC | #2
On Wed, 5 Sep 2018 07:29:51 -0700
Guenter Roeck <linux@roeck-us.net> wrote:

> Hi,
> 
> On Tue, Aug 28, 2018 at 09:20:34PM +1000, Nicholas Piggin wrote:
> > Similarly to the previous patch, this tries to optimise dirty/accessed
> > bits in ptes to avoid access costs of hardware setting them.
> >   
> 
> This patch results in silent nios2 boot failures, silent meaning that
> the boot stalls.
> 
> ...
> Unpacking initramfs...
> Freeing initrd memory: 2168K
> workingset: timestamp_bits=30 max_order=15 bucket_order=0
> jffs2: version 2.2. (NAND) © 2001-2006 Red Hat, Inc.
> random: fast init done
> random: crng init done
> 
> [no further activity until the qemu session is aborted]
> 
> Reverting the patch fixes the problem. Bisect log is attached.

Thanks for bisecting it, I'll try to reproduce. Just qemu with no
obscure options? Interesting that it's hit nios2 but apparently not
other archs (yet).

Thanks,
Nick
Guenter Roeck Sept. 6, 2018, 12:36 a.m. UTC | #3
On 09/05/2018 03:18 PM, Nicholas Piggin wrote:
> On Wed, 5 Sep 2018 07:29:51 -0700
> Guenter Roeck <linux@roeck-us.net> wrote:
> 
>> Hi,
>>
>> On Tue, Aug 28, 2018 at 09:20:34PM +1000, Nicholas Piggin wrote:
>>> Similarly to the previous patch, this tries to optimise dirty/accessed
>>> bits in ptes to avoid access costs of hardware setting them.
>>>    
>>
>> This patch results in silent nios2 boot failures, silent meaning that
>> the boot stalls.
>>
>> ...
>> Unpacking initramfs...
>> Freeing initrd memory: 2168K
>> workingset: timestamp_bits=30 max_order=15 bucket_order=0
>> jffs2: version 2.2. (NAND) © 2001-2006 Red Hat, Inc.
>> random: fast init done
>> random: crng init done
>>
>> [no further activity until the qemu session is aborted]
>>
>> Reverting the patch fixes the problem. Bisect log is attached.
> 
> Thanks for bisecting it, I'll try to reproduce. Just qemu with no
> obscure options? Interesting that it's hit nios2 but apparently not
> other archs (yet).
> 

Nothing special. See https://github.com/groeck/linux-build-test/tree/master/rootfs/nios2/.

Guenter
Nicholas Piggin Sept. 17, 2018, 5:53 p.m. UTC | #4
On Wed, 5 Sep 2018 07:29:51 -0700
Guenter Roeck <linux@roeck-us.net> wrote:

> Hi,
> 
> On Tue, Aug 28, 2018 at 09:20:34PM +1000, Nicholas Piggin wrote:
> > Similarly to the previous patch, this tries to optimise dirty/accessed
> > bits in ptes to avoid access costs of hardware setting them.
> >   
> 
> This patch results in silent nios2 boot failures, silent meaning that
> the boot stalls.

Okay I just got back to looking at this. The reason for the hang is
I think a bug in the nios2 TLB code, but maybe other archs have similar
issues.

In case of a missing / !present Linux pte, nios2 installs a TLB entry
with no permissions via its fast TLB exception handler (software TLB
fill). Then it relies on that causing a TLB permission exception in a
slower handler that calls handle_mm_fault to set the Linux pte and
flushes the old TLB. Then the fast exception handler will find the new
Linux pte.

With this patch, nios2 has a case where handle_mm_fault does not flush
the old TLB, which results in the TLB permission exception continually
being retried.

What happens now is that fault paths like do_read_fault will install a
Linux pte with the young bit clear and return. That will cause nios2 to
fault again but this time go down the bottom of handle_pte_fault and to
the access flags update with the young bit set. The young bit is seen to
be different, so that causes ptep_set_access_flags to do a TLB flush and
that finally allows the fast TLB handler to fire and pick up the new
Linux pte.

With this patch, the young bit is set in the first handle_mm_fault, so
the second handle_mm_fault no longer sees the ptes are different and
does not flush the TLB. The spurious fault handler also does not flush
them unless FAULT_FLAG_WRITE is set.

What nios2 should do is invalidate the TLB in update_mmu_cache. What it
*really* should do is install the new TLB entry, I have some patches to
make that work in qemu I can submit. But I would like to try getting
these dirty/accessed bit optimisation in 4.20, so I will send a simple
path to just do the TLB invalidate that could go in Andrew's git tree.

Is that agreeable with the nios2 maintainers?

Thanks,
Nick
Ley Foon Tan Sept. 21, 2018, 8:42 a.m. UTC | #5
On Tue, 2018-09-18 at 03:53 +1000, Nicholas Piggin wrote:
> On Wed, 5 Sep 2018 07:29:51 -0700
> Guenter Roeck <linux@roeck-us.net> wrote:
> 
> > 
> > Hi,
> > 
> > On Tue, Aug 28, 2018 at 09:20:34PM +1000, Nicholas Piggin wrote:
> > > 
> > > Similarly to the previous patch, this tries to optimise
> > > dirty/accessed
> > > bits in ptes to avoid access costs of hardware setting them.
> > > 
> > This patch results in silent nios2 boot failures, silent meaning
> > that
> > the boot stalls.
> Okay I just got back to looking at this. The reason for the hang is
> I think a bug in the nios2 TLB code, but maybe other archs have
> similar
> issues.
> 
> In case of a missing / !present Linux pte, nios2 installs a TLB entry
> with no permissions via its fast TLB exception handler (software TLB
> fill). Then it relies on that causing a TLB permission exception in a
> slower handler that calls handle_mm_fault to set the Linux pte and
> flushes the old TLB. Then the fast exception handler will find the
> new
> Linux pte.
> 
> With this patch, nios2 has a case where handle_mm_fault does not
> flush
> the old TLB, which results in the TLB permission exception
> continually
> being retried.
> 
> What happens now is that fault paths like do_read_fault will install
> a
> Linux pte with the young bit clear and return. That will cause nios2
> to
> fault again but this time go down the bottom of handle_pte_fault and
> to
> the access flags update with the young bit set. The young bit is seen
> to
> be different, so that causes ptep_set_access_flags to do a TLB flush
> and
> that finally allows the fast TLB handler to fire and pick up the new
> Linux pte.
> 
> With this patch, the young bit is set in the first handle_mm_fault,
> so
> the second handle_mm_fault no longer sees the ptes are different and
> does not flush the TLB. The spurious fault handler also does not
> flush
> them unless FAULT_FLAG_WRITE is set.
> 
> What nios2 should do is invalidate the TLB in update_mmu_cache. What
> it
> *really* should do is install the new TLB entry, I have some patches
> to
> make that work in qemu I can submit. But I would like to try getting
> these dirty/accessed bit optimisation in 4.20, so I will send a
> simple
> path to just do the TLB invalidate that could go in Andrew's git
> tree.
> 
> Is that agreeable with the nios2 maintainers?
> 
> Thanks,
> Nick
> 
Hi

Do you have patches to test?

Regards
Ley Foon
Nicholas Piggin Sept. 23, 2018, 9:23 a.m. UTC | #6
On Fri, 21 Sep 2018 16:42:05 +0800
Ley Foon Tan <ley.foon.tan@intel.com> wrote:

> On Tue, 2018-09-18 at 03:53 +1000, Nicholas Piggin wrote:
> > On Wed, 5 Sep 2018 07:29:51 -0700
> > Guenter Roeck <linux@roeck-us.net> wrote:
> >   
> > > 
> > > Hi,
> > > 
> > > On Tue, Aug 28, 2018 at 09:20:34PM +1000, Nicholas Piggin wrote:  
> > > > 
> > > > Similarly to the previous patch, this tries to optimise
> > > > dirty/accessed
> > > > bits in ptes to avoid access costs of hardware setting them.
> > > >   
> > > This patch results in silent nios2 boot failures, silent meaning
> > > that
> > > the boot stalls.  
> > Okay I just got back to looking at this. The reason for the hang is
> > I think a bug in the nios2 TLB code, but maybe other archs have
> > similar
> > issues.
> > 
> > In case of a missing / !present Linux pte, nios2 installs a TLB entry
> > with no permissions via its fast TLB exception handler (software TLB
> > fill). Then it relies on that causing a TLB permission exception in a
> > slower handler that calls handle_mm_fault to set the Linux pte and
> > flushes the old TLB. Then the fast exception handler will find the
> > new
> > Linux pte.
> > 
> > With this patch, nios2 has a case where handle_mm_fault does not
> > flush
> > the old TLB, which results in the TLB permission exception
> > continually
> > being retried.
> > 
> > What happens now is that fault paths like do_read_fault will install
> > a
> > Linux pte with the young bit clear and return. That will cause nios2
> > to
> > fault again but this time go down the bottom of handle_pte_fault and
> > to
> > the access flags update with the young bit set. The young bit is seen
> > to
> > be different, so that causes ptep_set_access_flags to do a TLB flush
> > and
> > that finally allows the fast TLB handler to fire and pick up the new
> > Linux pte.
> > 
> > With this patch, the young bit is set in the first handle_mm_fault,
> > so
> > the second handle_mm_fault no longer sees the ptes are different and
> > does not flush the TLB. The spurious fault handler also does not
> > flush
> > them unless FAULT_FLAG_WRITE is set.
> > 
> > What nios2 should do is invalidate the TLB in update_mmu_cache. What
> > it
> > *really* should do is install the new TLB entry, I have some patches
> > to
> > make that work in qemu I can submit. But I would like to try getting
> > these dirty/accessed bit optimisation in 4.20, so I will send a
> > simple
> > path to just do the TLB invalidate that could go in Andrew's git
> > tree.
> > 
> > Is that agreeable with the nios2 maintainers?
> > 
> > Thanks,
> > Nick
> >   
> Hi
> 
> Do you have patches to test?

I've been working on some, it has taken longer than I expected, I'll
hopefully have something to send out by tomorrow.

Thanks,
Nick
diff mbox series

Patch

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5fb1a43e12e0..2c169041317f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1197,6 +1197,7 @@  static vm_fault_t do_huge_pmd_wp_page_fallback(struct vm_fault *vmf,
 	for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
 		pte_t entry;
 		entry = mk_pte(pages[i], vma->vm_page_prot);
+		entry = pte_mkyoung(entry);
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 		memcg = (void *)page_private(pages[i]);
 		set_page_private(pages[i], 0);
@@ -2067,7 +2068,7 @@  static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	struct page *page;
 	pgtable_t pgtable;
 	pmd_t old_pmd, _pmd;
-	bool young, write, soft_dirty, pmd_migration = false;
+	bool young, write, dirty, soft_dirty, pmd_migration = false;
 	unsigned long addr;
 	int i;
 
@@ -2145,8 +2146,7 @@  static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		page = pmd_page(old_pmd);
 	VM_BUG_ON_PAGE(!page_count(page), page);
 	page_ref_add(page, HPAGE_PMD_NR - 1);
-	if (pmd_dirty(old_pmd))
-		SetPageDirty(page);
+	dirty = pmd_dirty(old_pmd);
 	write = pmd_write(old_pmd);
 	young = pmd_young(old_pmd);
 	soft_dirty = pmd_soft_dirty(old_pmd);
@@ -2176,8 +2176,10 @@  static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 			entry = maybe_mkwrite(entry, vma);
 			if (!write)
 				entry = pte_wrprotect(entry);
-			if (!young)
-				entry = pte_mkold(entry);
+			if (young)
+				entry = pte_mkyoung(entry);
+			if (dirty)
+				entry = pte_mkdirty(entry);
 			if (soft_dirty)
 				entry = pte_mksoft_dirty(entry);
 		}
diff --git a/mm/memory.c b/mm/memory.c
index 3d8bf8220bd0..d205ba69918c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1830,10 +1830,9 @@  static int insert_pfn(struct vm_area_struct *vma, unsigned long addr,
 		entry = pte_mkspecial(pfn_t_pte(pfn, prot));
 
 out_mkwrite:
-	if (mkwrite) {
-		entry = pte_mkyoung(entry);
+	entry = pte_mkyoung(entry);
+	if (mkwrite)
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
-	}
 
 	set_pte_at(mm, addr, pte, entry);
 	update_mmu_cache(vma, addr, pte); /* XXX: why not for insert_page? */
@@ -2560,6 +2559,7 @@  static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 		}
 		flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte));
 		entry = mk_pte(new_page, vma->vm_page_prot);
+		entry = pte_mkyoung(entry);
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 		/*
 		 * Clear the pte entry and flush it first, before updating the
@@ -3069,6 +3069,7 @@  vm_fault_t do_swap_page(struct vm_fault *vmf)
 	inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
 	dec_mm_counter_fast(vma->vm_mm, MM_SWAPENTS);
 	pte = mk_pte(page, vma->vm_page_prot);
+	pte = pte_mkyoung(pte);
 	if ((vmf->flags & FAULT_FLAG_WRITE) && reuse_swap_page(page, NULL)) {
 		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
 		vmf->flags &= ~FAULT_FLAG_WRITE;
@@ -3479,6 +3480,7 @@  vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
 
 	flush_icache_page(vma, page);
 	entry = mk_pte(page, vma->vm_page_prot);
+	entry = pte_mkyoung(entry);
 	if (write)
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 	/* copy-on-write page */