Message ID | 1335950184-6287-2-git-send-email-paolo.pisati@canonical.com |
---|---|
State | New |
Headers | show |
On Wed, May 02, 2012 at 11:16:23AM +0200, Paolo Pisati wrote: > From: Kautuk Consul <consul.kautuk@gmail.com> > > Commit d065bd810b6deb67d4897a14bfe21f8eb526ba99 > (mm: retry page fault when blocking on disk transfer) and > commit 37b23e0525d393d48a7d59f870b3bc061a30ccdb > (x86,mm: make pagefault killable) > > The above commits introduced changes into the x86 pagefault handler > for making the page fault handler retryable as well as killable. > > These changes reduce the mmap_sem hold time, which is crucial > during OOM killer invocation. > > Port these changes to ARM. > > Without these changes, my ARM board encounters many hang and livelock > scenarios. > After applying this patch, OOM feature performance improves according to > my testing. > > Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com> > Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> > > BugLink: http://launchpad.net/bugs/951043 > > Signed-off-by: Ike Panhc <ike.pan@canonical.com> > (cherry picked from commit 03bc9403d0712d8b2cc6b54d0aa0cfad2144b8bd) I would have thought that the preference would be to cherry pick the original commit, but if there's a good reason to get it from a different tree you should probably at least note what tree it came from, because otherwise the sha-1 is meaningless. Looking at Linus's tree, I also see the following, which looks like it fixes a bug in page fault accounting introduced by this patch. Should we pick it up as well? commit dff2aa7af8c96a11f75d858859f0e0c78b193d12 Author: Kautuk Consul <consul.kautuk@gmail.com> Date: Mon Apr 2 18:19:49 2012 +0100 ARM: 7368/1: fault.c: correct how the tsk->[maj|min]_flt gets incremented commit 8878a539ff19a43cf3729e7562cd528f490246ae was done by me to make the page fault handler retryable as well as interruptible. Due to this commit, there is a mistake in the way in which tsk->[maj|min]_flt counter gets incremented for VM_FAULT_ERROR: If VM_FAULT_ERROR is returned in the fault flags by handle_mm_fault, then either maj_flt or min_flt will get incremented. This is wrong as in the case of a VM_FAULT_ERROR we need to be skip ahead to the error handling code in do_page_fault. Added a check after the call to __do_page_fault() to check for (fault & VM_FAULT_ERROR).
ACK to straight cherry-picks from upstream: 8878a539ff19a43cf3729e7562cd528f490246ae ARM: 7178/1: fault.c: Port OOM changes into do_page_fault dff2aa7af8c96a11f75d858859f0e0c78b193d12 ARM: 7368/1: fault.c: correct how the tsk->[maj|min]_flt gets incremented
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 05/02/2012 04:20 PM, Seth Forshee wrote: > > I would have thought that the preference would be to cherry pick > the original commit, but if there's a good reason to get it from a > different tree you should probably at least note what tree it came > from, because otherwise the sha-1 is meaningless. We used to pick from other Ubuntu branches in the past (see L/dove or N/omap4 for example), probably because patches already had the acks, or because it was easier to demonstrate that the code was already in use/tested within ubuntu. > Looking at Linus's tree, I also see the following, which looks like > it fixes a bug in page fault accounting introduced by this patch. > Should we pick it up as well? i'll take a look and send a follow up. - -- bye, p. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQIcBAEBAgAGBQJPoVeUAAoJEJdscTmfuQbQJE0P/RoR8xL+egWVitsu4XPO6AAP rBMRd3COeKbJbFsfBD7U9p1sfz52MDGbuNuoK/shOCLL/Q1VsSSGlJguyj4Ki0Lm fyojvYjEekGHKzzVqMjAsdlAL0VCW0R6df3AOJVyVd8oqsftUgK0eMmnXV1j/EEt AW+xvGCcRvtJIc1pLsLSu5H0ISBs+vTOAr9cXsw6QVINQv4oKq0MRv7wLJZsUR0E AWkYX35TT/PXNMOjKGxr3XRs+fiygqCpsQKj74bnY7zC9TcVRPL9jeSFYsjK/6DB +G2mwHSoLXPB8ilMmVci6VNP8+0YSTSLlB2h9hNVSRPQoJq/eHtoCkJFV8PSQNTd B99MbC/Uv87LvToSvIkdQtgCCUDb/hRxmxeNdb1VY4jCi4OxGjbu3JbioD0vXHeF GBKVsst/fK2xuXfL4EMXX+BB9iMYXaq2oH3P0D/NdNw+jYnNUi1/gJA/zXwyjKnX ZFQ/fyXZuTTQI7lKQheLDlSgtfMm3zLvZFdhw/Vc0x6KFbWepqdwmQNZkOvbMm6E CSC7Pq/p09V5cfGw3auSi4FWo+Dh/rkeePovvwROFjuU4tj3WXnZYc9PhC759O09 dIIPrGVTmVc6dGHeXuqdZKJj+9YWh7jk2KADxaaJS3Rs+/lZeJnc/SWxzUri4sAK tPa42ByRSbkpzjOQsiSn =TpOn -----END PGP SIGNATURE-----
SRU JUSTIFICATION ================= EFFECT ====== Reduce mmap_sem lock contention on arm while using java (see below) FIX === These patches reduce contention on the mmap_sem lock and seem to solve the INFO: task foo:bar blocked for more than 120 seconds scenario (and a subsequent deadlock frequently seen using java on arm - see http://launchpad.net/bugs/845158). TEST ==== The commit has already been incorporated/tested in the hwe/armadaxp kernel and impacts only the arm side of master, proposing for P/master (omap4 will get it via rebase). Moreover, a fix for original patch has been incorporated. BugLink: http://launchpad.net/bugs/845158 BugLink: http://launchpad.net/bugs/951043 v2: - noted in the first patch the URL of the armadaxp git tree - added a subsequent fix to the first patch Kautuk Consul (2): ARM: 7178/1: fault.c: Port OOM changes into do_page_fault ARM: 7368/1: fault.c: correct how the tsk->[maj|min]_flt gets incremented arch/arm/mm/fault.c | 58 ++++++++++++++++++++++++++++++++++----------------- 1 file changed, 39 insertions(+), 19 deletions(-)
After comparing "ARM: 7178/1: fault.c: Port OOM changes into do_page_fault" to upstream 8878a539ff19a43cf3729e7562cd528f490246ae I just cherry-picked from Linus' repo. rtg
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c index aa33949..4aabeae 100644 --- a/arch/arm/mm/fault.c +++ b/arch/arm/mm/fault.c @@ -231,7 +231,7 @@ static inline bool access_error(unsigned int fsr, struct vm_area_struct *vma) static int __kprobes __do_page_fault(struct mm_struct *mm, unsigned long addr, unsigned int fsr, - struct task_struct *tsk) + unsigned int flags, struct task_struct *tsk) { struct vm_area_struct *vma; int fault; @@ -253,18 +253,7 @@ good_area: goto out; } - /* - * If for any reason at all we couldn't handle the fault, make - * sure we exit gracefully rather than endlessly redo the fault. - */ - fault = handle_mm_fault(mm, vma, addr & PAGE_MASK, (fsr & FSR_WRITE) ? FAULT_FLAG_WRITE : 0); - if (unlikely(fault & VM_FAULT_ERROR)) - return fault; - if (fault & VM_FAULT_MAJOR) - tsk->maj_flt++; - else - tsk->min_flt++; - return fault; + return handle_mm_fault(mm, vma, addr & PAGE_MASK, flags); check_stack: if (vma->vm_flags & VM_GROWSDOWN && !expand_stack(vma, addr)) @@ -279,6 +268,9 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs) struct task_struct *tsk; struct mm_struct *mm; int fault, sig, code; + int write = fsr & FSR_WRITE; + unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE | + (write ? FAULT_FLAG_WRITE : 0); if (notify_page_fault(regs, fsr)) return 0; @@ -305,6 +297,7 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs) if (!down_read_trylock(&mm->mmap_sem)) { if (!user_mode(regs) && !search_exception_tables(regs->ARM_pc)) goto no_context; +retry: down_read(&mm->mmap_sem); } else { /* @@ -320,14 +313,41 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs) #endif } - fault = __do_page_fault(mm, addr, fsr, tsk); - up_read(&mm->mmap_sem); + fault = __do_page_fault(mm, addr, fsr, flags, tsk); + + /* If we need to retry but a fatal signal is pending, handle the + * signal first. We do not need to release the mmap_sem because + * it would already be released in __lock_page_or_retry in + * mm/filemap.c. */ + if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) + return 0; + + /* + * Major/minor page fault accounting is only done on the + * initial attempt. If we go through a retry, it is extremely + * likely that the page will be found in page cache at that point. + */ perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr); - if (fault & VM_FAULT_MAJOR) - perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, addr); - else if (fault & VM_FAULT_MINOR) - perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, addr); + if (flags & FAULT_FLAG_ALLOW_RETRY) { + if (fault & VM_FAULT_MAJOR) { + tsk->maj_flt++; + perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, + regs, addr); + } else { + tsk->min_flt++; + perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, + regs, addr); + } + if (fault & VM_FAULT_RETRY) { + /* Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk + * of starvation. */ + flags &= ~FAULT_FLAG_ALLOW_RETRY; + goto retry; + } + } + + up_read(&mm->mmap_sem); /* * Handle the "normal" case first - VM_FAULT_MAJOR / VM_FAULT_MINOR