Message ID | 20180629182005.10243-1-abrodkin@synopsys.com |
---|---|
State | New |
Headers | show |
Series | ARC: Improve handling of fatal signals in do_page_fault() | expand |
Hi Vineet, On Fri, 2018-06-29 at 11:20 -0700, Alexey Brodkin wrote: > This was triggered by investigation of a deadlock after OOM killer invocation, > see [1] for more details. > > Looks like our handling of fatal signal in do_page_fault() has some issues: > > 1. We only want to do special (read "early") handling of fatal signal > if handle_mm_fault() returned VM_FAULT_RETRY so that we don't loop > in retry loop endlessly, otherwise we'll handle that signal normally > on exit from exception handler. > > 2. up_read() is not needed as indeed it will be done in __lock_page_or_retry() > in mm/filemap.c. > > With above comments in mind simplified version should be like that: > ------------------------------->8--------------------------- > if (fatal_signal_pending(current) > if (fault & VM_FAULT_RETRY) > if (user_mode(regs)) > return; > ------------------------------->8--------------------------- > > But looks like there's a room for improvement, see [2]. > Instead of proceeding forward and then inevitably hitting retry path we > short-cut right to kernel fix-up code in no_context. > > [1] http://lists.infradead.org/pipermail/linux-snps-arc/2018-February/003403.html > [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=746a272e44141af24a02f6c9b0f65f4c4598ed42 > > Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> I should have put that in commit message right away:) This patch fixes OOM lock-up we were seeing previously, see STAR 9001304674 "OOM killer hangs system". -Alexey
Hi Alexey, I was finally forced to revisit this for my glibc tst-tls3-malloc deadlock. And indeed with this change we don'tsee the deadlock. But see below.. > @@ -139,12 +139,16 @@ void do_page_fault(unsigned long address, struct pt_regs *regs) > */ > fault = handle_mm_fault(vma, address, flags); > > - /* If Pagefault was interrupted by SIGKILL, exit page fault "early" */ > + /* If we need to retry but a fatal signal is pending, handle the > + * signal first. We do not need to release the mmap_sem because > + * it would already be released in __lock_page_or_retry in > + * mm/filemap.c. */ Right and we were already doing that: up_read() was called for !VM_FAULT_RETRY meaning we relied on the core mm to do that already for VM_FAULT_RETRY case. The issue here was additional check for VM_FAULT_ERROR. Typically this is not set by handle_mm_fault() meaning for common user faults with signal pending, we were not calling up_read, hence the ensuing deadlock. > if (unlikely(fatal_signal_pending(current))) { > - if ((fault & VM_FAULT_ERROR) && !(fault & VM_FAULT_RETRY)) > - up_read(&mm->mmap_sem); > - if (user_mode(regs)) > + if (fault & VM_FAULT_RETRY) { > + if (!user_mode(regs)) > + goto no_context; Given this code is really tricky, lets only solve one problem with 1 one patch. > return; > + } > } The fault handling is spaghetti mess of checks and more checks and has not really been touched since upstreaming. I need to clean it up and essentially rewrite it for v4.19
Hi Vineet, On Wed, 2018-08-01 at 12:49 -0700, Vineet Gupta wrote: > Hi Alexey, > > I was finally forced to revisit this for my glibc tst-tls3-malloc deadlock. And > indeed with this change we don'tsee the deadlock. But see below.. > > > > @@ -139,12 +139,16 @@ void do_page_fault(unsigned long address, struct pt_regs *regs) > > */ > > fault = handle_mm_fault(vma, address, flags); > > > > - /* If Pagefault was interrupted by SIGKILL, exit page fault "early" */ > > + /* If we need to retry but a fatal signal is pending, handle the > > + * signal first. We do not need to release the mmap_sem because > > + * it would already be released in __lock_page_or_retry in > > + * mm/filemap.c. */ > > Right and we were already doing that: up_read() was called for !VM_FAULT_RETRY > meaning we relied on the core mm to do that already for VM_FAULT_RETRY case. > > The issue here was additional check for VM_FAULT_ERROR. Typically this is not set > by handle_mm_fault() meaning for common user faults with signal pending, we were > not calling up_read, hence the ensuing deadlock. Right. > > if (unlikely(fatal_signal_pending(current))) { > > - if ((fault & VM_FAULT_ERROR) && !(fault & VM_FAULT_RETRY)) > > - up_read(&mm->mmap_sem); > > - if (user_mode(regs)) > > + if (fault & VM_FAULT_RETRY) { > > + if (!user_mode(regs)) > > + goto no_context; > > Given this code is really tricky, lets only solve one problem with 1 one patch. Agree. > > return; > > + } > > } > > The fault handling is spaghetti mess of checks and more checks and has not really > been touched since upstreaming. I need to clean it up and essentially rewrite it > for v4.19 So would you like me to send a re-spin with less changes as discussed above so we have something better for now and for back-porting to stable branches. Or you're going to rewrite all that sometime soon yourself? -Alexey
On 08/02/2018 12:08 AM, Alexey Brodkin wrote: >> The fault handling is spaghetti mess of checks and more checks and has not really >> been touched since upstreaming. I need to clean it up and essentially rewrite it >> for v4.19 > So would you like me to send a re-spin with less changes as discussed above so > we have something better for now and for back-porting to stable branches. No that's fine, I'll fix it up here - minimally for the stable backports and do the rewrite later. BTW I tried to reproduce the Old OOM hangs from LTP (STARS 9001304674, 9001281305) to prove that this fix solves those too, but I can't seem to reproduce the same deadlock without this fix on 4.17 kernel. -Vineet
diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c index a0b7bd6d030d..17ed78e2f5eb 100644 --- a/arch/arc/mm/fault.c +++ b/arch/arc/mm/fault.c @@ -139,12 +139,16 @@ void do_page_fault(unsigned long address, struct pt_regs *regs) */ fault = handle_mm_fault(vma, address, flags); - /* If Pagefault was interrupted by SIGKILL, exit page fault "early" */ + /* If we need to retry but a fatal signal is pending, handle the + * signal first. We do not need to release the mmap_sem because + * it would already be released in __lock_page_or_retry in + * mm/filemap.c. */ if (unlikely(fatal_signal_pending(current))) { - if ((fault & VM_FAULT_ERROR) && !(fault & VM_FAULT_RETRY)) - up_read(&mm->mmap_sem); - if (user_mode(regs)) + if (fault & VM_FAULT_RETRY) { + if (!user_mode(regs)) + goto no_context; return; + } } perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
This was triggered by investigation of a deadlock after OOM killer invocation, see [1] for more details. Looks like our handling of fatal signal in do_page_fault() has some issues: 1. We only want to do special (read "early") handling of fatal signal if handle_mm_fault() returned VM_FAULT_RETRY so that we don't loop in retry loop endlessly, otherwise we'll handle that signal normally on exit from exception handler. 2. up_read() is not needed as indeed it will be done in __lock_page_or_retry() in mm/filemap.c. With above comments in mind simplified version should be like that: ------------------------------->8--------------------------- if (fatal_signal_pending(current) if (fault & VM_FAULT_RETRY) if (user_mode(regs)) return; ------------------------------->8--------------------------- But looks like there's a room for improvement, see [2]. Instead of proceeding forward and then inevitably hitting retry path we short-cut right to kernel fix-up code in no_context. [1] http://lists.infradead.org/pipermail/linux-snps-arc/2018-February/003403.html [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=746a272e44141af24a02f6c9b0f65f4c4598ed42 Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> --- arch/arc/mm/fault.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)