diff mbox

[v3.13,v3.14,Regression] kthread: make kthread_create() killable

Message ID 201403170013.GJF86930.FtVOOQOHLFFMSJ@I-love.SAKURA.ne.jp
State New
Headers show

Commit Message

Tetsuo Handa March 16, 2014, 3:13 p.m. UTC
Hello.

Although the process who is sending SIGKILL to systemd-udevd is not
identified yet, adding wait_for_completion_timeout() can solve this
regression. Therefore, I'd like to propose this patch for 3.14-final
and 3.13-stable.

Regards.
----------
>From c6562e5d774dd1f36724197dbcb8976cccfaab53 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Sun, 16 Mar 2014 22:44:23 +0900
Subject: [PATCH] kthread: Do not leave kthread_create() immediately upon SIGKILL.

Commit 786235ee "kthread: make kthread_create() killable" changed to
leave kthread_create() as soon as receiving SIGKILL. But this change
caused boot failures if systemd-udevd received SIGKILL (probably due
to timeout) while loading SCSI controller drivers using
finit_module() [1].

Therefore, this patch changes kthread_create() to wait for 10 more
seconds after receiving SIGKILL, unless chosen by the OOM killer,
in order to give the kthreadd a chance to complete the request.

  [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1276705

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: <stable@vger.kernel.org> [3.13+]
---
 kernel/kthread.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

Comments

Tetsuo Handa March 17, 2014, 12:38 p.m. UTC | #1
Oleg Nesterov wrote:
> > @@ -292,6 +292,17 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
> >  	 * new kernel thread.
> >  	 */
> >  	if (unlikely(wait_for_completion_killable(&done))) {
> > +		int i = 0;
> > +
> > +		/*
> > +		 * I got SIGKILL, but wait for 10 more seconds for completion
> > +		 * unless chosen by the OOM killer. This delay is there as a
> > +		 * workaround for boot failure caused by SIGKILL upon device
> > +		 * driver initialization timeout.
> > +		 */
> > +		while (i++ < 10 && !test_tsk_thread_flag(current, TIF_MEMDIE))
> > +			if (wait_for_completion_timeout(&done, HZ))
> > +				goto ready;
> 
> Personally I really dislike this hack. And btw, why we return -ENOMEM if
> SIGKILL'ed? Why not EINTR ?

I chose -ENOMEM because -ENOMEM looked better for conveying that current thread
was SIGKILLed by the OOM killer in order to solve no memory state. But I forgot
that -ENOMEM will not convey the reason properly if current thread was
SIGKILLed by other than the OOM killer. Maybe

  return test_tsk_thread_flag(current, TIF_MEMDIE) ? -ENOMEM : -EINTR;

rather than

  return -ENOMEM;

?

> > Commit 786235ee "kthread: make kthread_create() killable" changed to
> > leave kthread_create() as soon as receiving SIGKILL. But this change
> > caused boot failures if systemd-udevd received SIGKILL (probably due
> > to timeout) while loading SCSI controller drivers using
> > finit_module() [1].
> 
> Shouldn't we fix the caller instead? It should handle the error from
> kthread_create() correctly.
> 
> And could you tell who is the caller which doesn't do this? If it can't
> be fixed, then, say, it can use workqueue to create a kernel thread.
> 

There are many callers. One of them is scsi_host_alloc() which is called
upon bootup in order to recognize SCSI storage devices.

To my surprise, systemd-udevd process sends SIGKILL to worker systemd-udevd
processes if they did not complete their jobs within 30 seconds. On some
machines, it takes more than 30 seconds to recognize SCSI storage devices.

On such machines, scsi_host_alloc() is called after the worker process
received SIGKILL. Since commit 786235ee "kthread: make kthread_create()
killable" broke all callers of kthread_create() who had been able to survive
SIGKILL, I think fixing this regression at kthread_create() is the appropriate
response.

Given that said, which one do we prefer?

  (a) Wait for completion forever after receiving SIGKILL, unless chosen
      by the OOM killer.

  (b) Wait for completion for only limited duration after receiving SIGKILL.

This patch is (b) which waits for only 10 seconds after receiving SIGKILL.
(a) will change "kthread: make kthread_create() killable" to
"kthread: allow the OOM-killer to kill kthread_create()".
diff mbox

Patch

diff --git a/kernel/kthread.c b/kernel/kthread.c
index b5ae3ee..52ae7dc 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -292,6 +292,17 @@  struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
 	 * new kernel thread.
 	 */
 	if (unlikely(wait_for_completion_killable(&done))) {
+		int i = 0;
+
+		/*
+		 * I got SIGKILL, but wait for 10 more seconds for completion
+		 * unless chosen by the OOM killer. This delay is there as a
+		 * workaround for boot failure caused by SIGKILL upon device
+		 * driver initialization timeout.
+		 */
+		while (i++ < 10 && !test_tsk_thread_flag(current, TIF_MEMDIE))
+			if (wait_for_completion_timeout(&done, HZ))
+				goto ready;
 		/*
 		 * If I was SIGKILLed before kthreadd (or new kernel thread)
 		 * calls complete(), leave the cleanup of this structure to
@@ -305,6 +316,7 @@  struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
 		 */
 		wait_for_completion(&done);
 	}
+ready:
 	task = create->result;
 	if (!IS_ERR(task)) {
 		static const struct sched_param param = { .sched_priority = 0 };