diff mbox

ARM: 7178/1: fault.c: Port OOM changes into do_page_fault

Message ID 1335950184-6287-2-git-send-email-paolo.pisati@canonical.com
State New
Headers show

Commit Message

Paolo Pisati May 2, 2012, 9:16 a.m. UTC
From: Kautuk Consul <consul.kautuk@gmail.com>

Commit d065bd810b6deb67d4897a14bfe21f8eb526ba99
(mm: retry page fault when blocking on disk transfer) and
commit 37b23e0525d393d48a7d59f870b3bc061a30ccdb
(x86,mm: make pagefault killable)

The above commits introduced changes into the x86 pagefault handler
for making the page fault handler retryable as well as killable.

These changes reduce the mmap_sem hold time, which is crucial
during OOM killer invocation.

Port these changes to ARM.

Without these changes, my ARM board encounters many hang and livelock
scenarios.
After applying this patch, OOM feature performance improves according to
my testing.

Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

BugLink: http://launchpad.net/bugs/951043

Signed-off-by: Ike Panhc <ike.pan@canonical.com>
(cherry picked from commit 03bc9403d0712d8b2cc6b54d0aa0cfad2144b8bd)

Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
---
 arch/arm/mm/fault.c |   58 ++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 39 insertions(+), 19 deletions(-)

Comments

Seth Forshee May 2, 2012, 2:20 p.m. UTC | #1
On Wed, May 02, 2012 at 11:16:23AM +0200, Paolo Pisati wrote:
> From: Kautuk Consul <consul.kautuk@gmail.com>
> 
> Commit d065bd810b6deb67d4897a14bfe21f8eb526ba99
> (mm: retry page fault when blocking on disk transfer) and
> commit 37b23e0525d393d48a7d59f870b3bc061a30ccdb
> (x86,mm: make pagefault killable)
> 
> The above commits introduced changes into the x86 pagefault handler
> for making the page fault handler retryable as well as killable.
> 
> These changes reduce the mmap_sem hold time, which is crucial
> during OOM killer invocation.
> 
> Port these changes to ARM.
> 
> Without these changes, my ARM board encounters many hang and livelock
> scenarios.
> After applying this patch, OOM feature performance improves according to
> my testing.
> 
> Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com>
> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> 
> BugLink: http://launchpad.net/bugs/951043
> 
> Signed-off-by: Ike Panhc <ike.pan@canonical.com>
> (cherry picked from commit 03bc9403d0712d8b2cc6b54d0aa0cfad2144b8bd)

I would have thought that the preference would be to cherry pick the
original commit, but if there's a good reason to get it from a different
tree you should probably at least note what tree it came from, because
otherwise the sha-1 is meaningless.

Looking at Linus's tree, I also see the following, which looks like it
fixes a bug in page fault accounting introduced by this patch. Should we
pick it up as well?

  commit dff2aa7af8c96a11f75d858859f0e0c78b193d12
  Author: Kautuk Consul <consul.kautuk@gmail.com>
  Date:   Mon Apr 2 18:19:49 2012 +0100

  ARM: 7368/1: fault.c: correct how the tsk->[maj|min]_flt gets incremented

  commit 8878a539ff19a43cf3729e7562cd528f490246ae was done by me
  to make the page fault handler retryable as well as interruptible.
    
  Due to this commit, there is a mistake in the way in which
   tsk->[maj|min]_flt counter gets incremented for VM_FAULT_ERROR:
  If VM_FAULT_ERROR is returned in the fault flags by handle_mm_fault,
  then either maj_flt or min_flt will get incremented. This is wrong
  as in the case of a VM_FAULT_ERROR we need to be skip ahead to the
  error handling code in do_page_fault.
    
  Added a check after the call to __do_page_fault() to check for
  (fault & VM_FAULT_ERROR).
Tim Gardner May 2, 2012, 3:45 p.m. UTC | #2
ACK to straight cherry-picks from upstream:

8878a539ff19a43cf3729e7562cd528f490246ae ARM: 7178/1: fault.c: Port OOM
changes into do_page_fault
dff2aa7af8c96a11f75d858859f0e0c78b193d12 ARM: 7368/1: fault.c: correct
how the tsk->[maj|min]_flt gets incremented
Paolo Pisati May 2, 2012, 3:49 p.m. UTC | #3
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 05/02/2012 04:20 PM, Seth Forshee wrote:
> 
> I would have thought that the preference would be to cherry pick
> the original commit, but if there's a good reason to get it from a
> different tree you should probably at least note what tree it came
> from, because otherwise the sha-1 is meaningless.

We used to pick from other Ubuntu branches in the past (see L/dove or
N/omap4 for example), probably because patches already had the acks,
or because it was easier to demonstrate that the code was already in
use/tested within ubuntu.

> Looking at Linus's tree, I also see the following, which looks like
> it fixes a bug in page fault accounting introduced by this patch.
> Should we pick it up as well?

i'll take a look and send a follow up.

- -- 
bye,
p.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQIcBAEBAgAGBQJPoVeUAAoJEJdscTmfuQbQJE0P/RoR8xL+egWVitsu4XPO6AAP
rBMRd3COeKbJbFsfBD7U9p1sfz52MDGbuNuoK/shOCLL/Q1VsSSGlJguyj4Ki0Lm
fyojvYjEekGHKzzVqMjAsdlAL0VCW0R6df3AOJVyVd8oqsftUgK0eMmnXV1j/EEt
AW+xvGCcRvtJIc1pLsLSu5H0ISBs+vTOAr9cXsw6QVINQv4oKq0MRv7wLJZsUR0E
AWkYX35TT/PXNMOjKGxr3XRs+fiygqCpsQKj74bnY7zC9TcVRPL9jeSFYsjK/6DB
+G2mwHSoLXPB8ilMmVci6VNP8+0YSTSLlB2h9hNVSRPQoJq/eHtoCkJFV8PSQNTd
B99MbC/Uv87LvToSvIkdQtgCCUDb/hRxmxeNdb1VY4jCi4OxGjbu3JbioD0vXHeF
GBKVsst/fK2xuXfL4EMXX+BB9iMYXaq2oH3P0D/NdNw+jYnNUi1/gJA/zXwyjKnX
ZFQ/fyXZuTTQI7lKQheLDlSgtfMm3zLvZFdhw/Vc0x6KFbWepqdwmQNZkOvbMm6E
CSC7Pq/p09V5cfGw3auSi4FWo+Dh/rkeePovvwROFjuU4tj3WXnZYc9PhC759O09
dIIPrGVTmVc6dGHeXuqdZKJj+9YWh7jk2KADxaaJS3Rs+/lZeJnc/SWxzUri4sAK
tPa42ByRSbkpzjOQsiSn
=TpOn
-----END PGP SIGNATURE-----
Paolo Pisati May 3, 2012, 8:53 a.m. UTC | #4
SRU JUSTIFICATION 
=================

EFFECT 
======

Reduce mmap_sem lock contention on arm while using java (see below)

FIX 
=== 

These patches reduce contention on the mmap_sem lock and seem to solve the

INFO: task foo:bar blocked for more than 120 seconds

scenario (and a subsequent deadlock frequently seen using java on arm -
see http://launchpad.net/bugs/845158).

TEST 
====

The commit has already been incorporated/tested in the hwe/armadaxp kernel and impacts
only the arm side of master, proposing for P/master (omap4 will get it via
rebase).
Moreover, a fix for original patch has been incorporated.

BugLink: http://launchpad.net/bugs/845158
BugLink: http://launchpad.net/bugs/951043

v2:
- noted in the first patch the URL of the armadaxp git tree
- added a subsequent fix to the first patch

Kautuk Consul (2):
  ARM: 7178/1: fault.c: Port OOM changes into do_page_fault
  ARM: 7368/1: fault.c: correct how the tsk->[maj|min]_flt gets
    incremented

 arch/arm/mm/fault.c |   58 ++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 39 insertions(+), 19 deletions(-)
Tim Gardner May 3, 2012, 12:39 p.m. UTC | #5
After comparing "ARM: 7178/1: fault.c: Port OOM changes into
do_page_fault" to upstream 8878a539ff19a43cf3729e7562cd528f490246ae I
just cherry-picked from Linus' repo.

rtg
diff mbox

Patch

diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index aa33949..4aabeae 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -231,7 +231,7 @@  static inline bool access_error(unsigned int fsr, struct vm_area_struct *vma)
 
 static int __kprobes
 __do_page_fault(struct mm_struct *mm, unsigned long addr, unsigned int fsr,
-		struct task_struct *tsk)
+		unsigned int flags, struct task_struct *tsk)
 {
 	struct vm_area_struct *vma;
 	int fault;
@@ -253,18 +253,7 @@  good_area:
 		goto out;
 	}
 
-	/*
-	 * If for any reason at all we couldn't handle the fault, make
-	 * sure we exit gracefully rather than endlessly redo the fault.
-	 */
-	fault = handle_mm_fault(mm, vma, addr & PAGE_MASK, (fsr & FSR_WRITE) ? FAULT_FLAG_WRITE : 0);
-	if (unlikely(fault & VM_FAULT_ERROR))
-		return fault;
-	if (fault & VM_FAULT_MAJOR)
-		tsk->maj_flt++;
-	else
-		tsk->min_flt++;
-	return fault;
+	return handle_mm_fault(mm, vma, addr & PAGE_MASK, flags);
 
 check_stack:
 	if (vma->vm_flags & VM_GROWSDOWN && !expand_stack(vma, addr))
@@ -279,6 +268,9 @@  do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 	struct task_struct *tsk;
 	struct mm_struct *mm;
 	int fault, sig, code;
+	int write = fsr & FSR_WRITE;
+	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE |
+				(write ? FAULT_FLAG_WRITE : 0);
 
 	if (notify_page_fault(regs, fsr))
 		return 0;
@@ -305,6 +297,7 @@  do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 	if (!down_read_trylock(&mm->mmap_sem)) {
 		if (!user_mode(regs) && !search_exception_tables(regs->ARM_pc))
 			goto no_context;
+retry:
 		down_read(&mm->mmap_sem);
 	} else {
 		/*
@@ -320,14 +313,41 @@  do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 #endif
 	}
 
-	fault = __do_page_fault(mm, addr, fsr, tsk);
-	up_read(&mm->mmap_sem);
+	fault = __do_page_fault(mm, addr, fsr, flags, tsk);
+
+	/* If we need to retry but a fatal signal is pending, handle the
+	 * signal first. We do not need to release the mmap_sem because
+	 * it would already be released in __lock_page_or_retry in
+	 * mm/filemap.c. */
+	if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
+		return 0;
+
+	/*
+	 * Major/minor page fault accounting is only done on the
+	 * initial attempt. If we go through a retry, it is extremely
+	 * likely that the page will be found in page cache at that point.
+	 */
 
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
-	if (fault & VM_FAULT_MAJOR)
-		perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, addr);
-	else if (fault & VM_FAULT_MINOR)
-		perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, addr);
+	if (flags & FAULT_FLAG_ALLOW_RETRY) {
+		if (fault & VM_FAULT_MAJOR) {
+			tsk->maj_flt++;
+			perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1,
+					regs, addr);
+		} else {
+			tsk->min_flt++;
+			perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1,
+					regs, addr);
+		}
+		if (fault & VM_FAULT_RETRY) {
+			/* Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk
+			* of starvation. */
+			flags &= ~FAULT_FLAG_ALLOW_RETRY;
+			goto retry;
+		}
+	}
+
+	up_read(&mm->mmap_sem);
 
 	/*
 	 * Handle the "normal" case first - VM_FAULT_MAJOR / VM_FAULT_MINOR