Patchwork [1/2] cpuhotplug/nohz: Remove offline cpus from nohz-idle state

login
register
mail settings
Submitter Srivatsa Vaddagiri
Date Jan. 4, 2013, 2:58 a.m.
Message ID <1357268318-7993-1-git-send-email-vatsa@codeaurora.org>
Download mbox | patch
Permalink /patch/209351/
State Not Applicable
Delegated to: David Miller
Headers show

Comments

Srivatsa Vaddagiri - Jan. 4, 2013, 2:58 a.m.
Modify idle loop of arm, mips, s390, sh and x86 architectures to exit from nohz
state before dying upon hot-remove. This change is needed to avoid userspace
tools like top command from seeing a rollback in total idle time over some
sampling periods.

Additionaly, modify idle loop on all architectures supporting cpu hotplug to
have idle thread of a dying cpu die immediately after scheduler returns control
to it. There is no point in wasting time via calls to *_enter()/*_exit() before
noticing the need to die and dying.

Additional ARM specific change:
	Revert commit ff081e05 ("ARM: 7457/1: smp: Fix suspicious
RCU originating from cpu_die()"), which added a RCU_NONIDLE() wrapper
around call to complete(). That wrapper is no longer needed as cpu_die() is
now called outside of a rcu_idle_enter()/exit() section. I also think that the
wait_for_completion() based wait in ARM's __cpu_die() can be replaced with a
busy-loop based one, as the wait there in general should be terminated within
few cycles.

Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: uclinux-dist-devel@blackfin.uclinux.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: linux-sh@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: mhocko@suse.cz
Cc: srivatsa.bhat@linux.vnet.ibm.com
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
---
 arch/arm/kernel/process.c      |    9 ++++-----
 arch/arm/kernel/smp.c          |    2 +-
 arch/blackfin/kernel/process.c |    8 ++++----
 arch/mips/kernel/process.c     |    6 +++---
 arch/powerpc/kernel/idle.c     |    2 +-
 arch/s390/kernel/process.c     |    4 ++--
 arch/sh/kernel/idle.c          |    5 ++---
 arch/sparc/kernel/process_64.c |    3 ++-
 arch/x86/kernel/process.c      |    5 ++---
 9 files changed, 21 insertions(+), 23 deletions(-)
Russell King - ARM Linux - Jan. 5, 2013, 10:36 a.m.
On Thu, Jan 03, 2013 at 06:58:38PM -0800, Srivatsa Vaddagiri wrote:
> I also think that the
> wait_for_completion() based wait in ARM's __cpu_die() can be replaced with a
> busy-loop based one, as the wait there in general should be terminated within
> few cycles.

Why open-code this stuff when we have infrastructure already in the kernel
for waiting for stuff to happen?  I chose to use the standard infrastructure
because its better tested, and avoids having to think about whether we need
CPU barriers and such like to ensure that updates are seen in a timely
manner.

My stance on a lot of this idle/cpu dying code is that much of it can
probably be cleaned up and merged into a single common implementation -
in which case the use of standard infrastructure for things like waiting
for other CPUs do stuff is even more justified.
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Srivatsa Vaddagiri - Jan. 8, 2013, 4:27 a.m.
* Russell King - ARM Linux <linux@arm.linux.org.uk> [2013-01-05 10:36:27]:

> On Thu, Jan 03, 2013 at 06:58:38PM -0800, Srivatsa Vaddagiri wrote:
> > I also think that the
> > wait_for_completion() based wait in ARM's __cpu_die() can be replaced with a
> > busy-loop based one, as the wait there in general should be terminated within
> > few cycles.
> 
> Why open-code this stuff when we have infrastructure already in the kernel
> for waiting for stuff to happen?  I chose to use the standard infrastructure
> because its better tested, and avoids having to think about whether we need
> CPU barriers and such like to ensure that updates are seen in a timely
> manner.

I was primarily thinking of calling as few generic functions as possible on
a dead cpu. I recall several "am I running on a dead cpu?" checks
(cpu_is_offline(this_cpu) that were put in generic routines during early
versions of cpu hotplug [1] to educate code running on dead cpu, the need for
which went away though with introduction of atomic/stop-machine variant. The
need to add a RCU_NONIDLE() wrapper around ARM's cpu_die() [2] is perhaps a more
recent example of educating code running on dead cpu. As quickly we die as
possible after idle thread of dying cpu gains control, the better!

1. http://lwn.net/Articles/69040/
2. http://lists.infradead.org/pipermail/linux-arm-kernel/2012-July/107971.html

- vatsa
Srivatsa S. Bhat - Jan. 8, 2013, 6:55 a.m.
On 01/05/2013 04:06 PM, Russell King - ARM Linux wrote:
> On Thu, Jan 03, 2013 at 06:58:38PM -0800, Srivatsa Vaddagiri wrote:
>> I also think that the
>> wait_for_completion() based wait in ARM's __cpu_die() can be replaced with a
>> busy-loop based one, as the wait there in general should be terminated within
>> few cycles.
> 
> Why open-code this stuff when we have infrastructure already in the kernel
> for waiting for stuff to happen?  I chose to use the standard infrastructure
> because its better tested, and avoids having to think about whether we need
> CPU barriers and such like to ensure that updates are seen in a timely
> manner.
> 
> My stance on a lot of this idle/cpu dying code is that much of it can
> probably be cleaned up and merged into a single common implementation -
> in which case the use of standard infrastructure for things like waiting
> for other CPUs do stuff is even more justified.

On similar lines, Nikunj (in CC) and I had posted a patchset sometime ago to
consolidate some of the CPU hotplug related code in the various architectures
into a common standard implementation [1].

However, we ended up hitting a problem with Xen, because its existing code
was unlike the other arch/ pieces [2]. At that time, we decided that we will
first make the CPU online and offline paths symmetric in the generic code and
then provide a common implementation of the duplicated bits in arch/, for the
new CPU hotplug model [3].

I guess we should probably revisit it sometime, consolidating the code in
incremental steps if not all at a time...

--
[1]. http://lwn.net/Articles/500185/
[2]. http://thread.gmane.org/gmane.linux.kernel.cross-arch/14342/focus=14430
[3]. http://thread.gmane.org/gmane.linux.kernel.cross-arch/14342/focus=15567

Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index c6dec5f..254099b 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -191,11 +191,6 @@  void cpu_idle(void)
 		rcu_idle_enter();
 		ledtrig_cpu(CPU_LED_IDLE_START);
 		while (!need_resched()) {
-#ifdef CONFIG_HOTPLUG_CPU
-			if (cpu_is_offline(smp_processor_id()))
-				cpu_die();
-#endif
-
 			/*
 			 * We need to disable interrupts here
 			 * to ensure we don't miss a wakeup call.
@@ -224,6 +219,10 @@  void cpu_idle(void)
 		rcu_idle_exit();
 		tick_nohz_idle_exit();
 		schedule_preempt_disabled();
+#ifdef CONFIG_HOTPLUG_CPU
+		if (cpu_is_offline(smp_processor_id()))
+			cpu_die();
+#endif
 	}
 }
 
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 84f4cbf..a8e3b8a 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -251,7 +251,7 @@  void __ref cpu_die(void)
 	mb();
 
 	/* Tell __cpu_die() that this CPU is now safe to dispose of */
-	RCU_NONIDLE(complete(&cpu_died));
+	complete(&cpu_died);
 
 	/*
 	 * actual CPU shutdown procedure is at least platform (if not
diff --git a/arch/blackfin/kernel/process.c b/arch/blackfin/kernel/process.c
index 3e16ad9..2bee1af 100644
--- a/arch/blackfin/kernel/process.c
+++ b/arch/blackfin/kernel/process.c
@@ -83,10 +83,6 @@  void cpu_idle(void)
 	while (1) {
 		void (*idle)(void) = pm_idle;
 
-#ifdef CONFIG_HOTPLUG_CPU
-		if (cpu_is_offline(smp_processor_id()))
-			cpu_die();
-#endif
 		if (!idle)
 			idle = default_idle;
 		tick_nohz_idle_enter();
@@ -98,6 +94,10 @@  void cpu_idle(void)
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
+#ifdef CONFIG_HOTPLUG_CPU
+		if (cpu_is_offline(smp_processor_id()))
+			cpu_die();
+#endif
 	}
 }
 
diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index a11c6f9..41102a0 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -71,13 +71,13 @@  void __noreturn cpu_idle(void)
 				start_critical_timings();
 			}
 		}
+		rcu_idle_exit();
+		tick_nohz_idle_exit();
+		schedule_preempt_disabled();
 #ifdef CONFIG_HOTPLUG_CPU
 		if (!cpu_online(cpu) && !cpu_isset(cpu, cpu_callin_map))
 			play_dead();
 #endif
-		rcu_idle_exit();
-		tick_nohz_idle_exit();
-		schedule_preempt_disabled();
 	}
 }
 
diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index ea78761..39ad029 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -102,11 +102,11 @@  void cpu_idle(void)
 		ppc64_runlatch_on();
 		rcu_idle_exit();
 		tick_nohz_idle_exit();
+		schedule_preempt_disabled();
 		if (cpu_should_die()) {
 			sched_preempt_enable_no_resched();
 			cpu_die();
 		}
-		schedule_preempt_disabled();
 	}
 }
 
diff --git a/arch/s390/kernel/process.c b/arch/s390/kernel/process.c
index 536d645..5290556 100644
--- a/arch/s390/kernel/process.c
+++ b/arch/s390/kernel/process.c
@@ -66,8 +66,6 @@  unsigned long thread_saved_pc(struct task_struct *tsk)
  */
 static void default_idle(void)
 {
-	if (cpu_is_offline(smp_processor_id()))
-		cpu_die();
 	local_irq_disable();
 	if (need_resched()) {
 		local_irq_enable();
@@ -95,6 +93,8 @@  void cpu_idle(void)
 		if (test_thread_flag(TIF_MCCK_PENDING))
 			s390_handle_mcck();
 		schedule_preempt_disabled();
+		if (cpu_is_offline(smp_processor_id()))
+			cpu_die();
 	}
 }
 
diff --git a/arch/sh/kernel/idle.c b/arch/sh/kernel/idle.c
index 0c91016..f8bc2f0 100644
--- a/arch/sh/kernel/idle.c
+++ b/arch/sh/kernel/idle.c
@@ -96,9 +96,6 @@  void cpu_idle(void)
 			check_pgt_cache();
 			rmb();
 
-			if (cpu_is_offline(cpu))
-				play_dead();
-
 			local_irq_disable();
 			/* Don't trace irqs off for idle */
 			stop_critical_timings();
@@ -115,6 +112,8 @@  void cpu_idle(void)
 		rcu_idle_exit();
 		tick_nohz_idle_exit();
 		schedule_preempt_disabled();
+		if (cpu_is_offline(cpu))
+			play_dead();
 	}
 }
 
diff --git a/arch/sparc/kernel/process_64.c b/arch/sparc/kernel/process_64.c
index cdb80b2..01589e7 100644
--- a/arch/sparc/kernel/process_64.c
+++ b/arch/sparc/kernel/process_64.c
@@ -105,13 +105,14 @@  void cpu_idle(void)
 		rcu_idle_exit();
 		tick_nohz_idle_exit();
 
+		schedule_preempt_disabled();
+
 #ifdef CONFIG_HOTPLUG_CPU
 		if (cpu_is_offline(cpu)) {
 			sched_preempt_enable_no_resched();
 			cpu_play_dead();
 		}
 #endif
-		schedule_preempt_disabled();
 	}
 }
 
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 2ed787f..3d5f142 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -331,9 +331,6 @@  void cpu_idle(void)
 		while (!need_resched()) {
 			rmb();
 
-			if (cpu_is_offline(smp_processor_id()))
-				play_dead();
-
 			/*
 			 * Idle routines should keep interrupts disabled
 			 * from here on, until they go to idle.
@@ -366,6 +363,8 @@  void cpu_idle(void)
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
+		if (cpu_is_offline(smp_processor_id()))
+			play_dead();
 	}
 }