powerpc/pseries: Fix cpu hotplug

Message ID	20081127115952.1f7db89d@bull.net (mailing list archive)
State	Accepted, archived
Commit	b906cfa397fdef8decbd36467b1f63c830a0bf2b
Delegated to:	Paul Mackerras
Headers	Return-Path: <linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@ozlabs.org>
	Date: Thu, 27 Nov 2008 11:59:52 +0100
	From: Sebastien Dugue <sebastien.dugue@bull.net>
	To: linux-ppc <linuxppc-dev@ozlabs.org>
	Subject: [PATCH] powerpc/pseries: Fix cpu hotplug
	Message-ID: <20081127115952.1f7db89d@bull.net>
	Mime-Version: 1.0
	Cc: Will Schmidt <will_schmidt@vnet.ibm.com>, Paul Mackerras <paulus@samba.org>, Jean Pierre Dion <jean-pierre.dion@bull.net>, Gilles Carry <Gilles.Carry@ext.bull.net>
	Precedence: list
	Content-Type: text/plain; charset="us-ascii"
	Content-Transfer-Encoding: 7bit
	Sender: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@ozlabs.org
	Errors-To: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@ozlabs.org


Commit Message

Sebastien Dugue Nov. 27, 2008, 10:59 a.m. UTC

Currently, pseries_cpu_die() calls msleep() while polling RTAS for
the status of the dying cpu.

  However if the cpu that is going down also happens to be the one doing
the tick then we're hosed as the tick_do_timer_cpu 'baton' is only passed
later on in tick_shutdown() when _cpu_down() does the CPU_DEAD notification.
Therefore jiffies won't be updated anymore.

  This patch replaces that msleep() with a cpu_relax() to make sure we're
not going to schedule at that point.

  With this patch my test box survives a 100k iterations hotplug stress
test on _all_ cpus, whereas without it, it quickly dies after ~50 iterations.


Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
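
The stall described in the commit message can be illustrated with a toy user-space model. This is only a sketch under stated assumptions: the toy_* names are hypothetical and are not kernel APIs. The one behaviour taken from the description above is that only the cpu recorded in tick_do_timer_cpu advances jiffies, and that the baton is handed over only later in the offline sequence.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4

/* Toy model only: the names mirror the thread but are not kernel APIs. */
struct toy_system {
	bool online[TOY_NR_CPUS];
	int tick_do_timer_cpu;	/* cpu that owns the jiffies update */
	unsigned long jiffies;
};

/* Per-cpu tick: only the baton holder, while online, advances jiffies. */
static void toy_tick(struct toy_system *s, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	if (s->online[cpu] && s->tick_do_timer_cpu == cpu)
		s->jiffies++;
}

/* One tick period: every cpu takes its tick interrupt. */
static void toy_tick_all(struct toy_system *s)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		toy_tick(s, cpu);
}

/* The cpu stops ticking, but the baton is NOT handed over yet. */
static void toy_cpu_stop(struct toy_system *s, int cpu)
{
	s->online[cpu] = false;
}

/* Stand-in for the later hand-over: pick the first online cpu. */
static void toy_pass_baton(struct toy_system *s)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (s->online[cpu]) {
			s->tick_do_timer_cpu = cpu;
			return;
		}
	}
}
```

In this model, jiffies stay frozen between toy_cpu_stop() on the baton holder and toy_pass_baton(). A sleeper that waits on a jiffies-based timer in that window, as msleep() does, would never be woken.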

Comments

Nathan Lynch Nov. 28, 2008, 12:14 a.m. UTC | #1

Hi, I have some questions about this patch.

Sebastien Dugue wrote:
> 
>   Currently, pseries_cpu_die() calls msleep() while polling RTAS for
> the status of the dying cpu.
> 
>   However if the cpu that is going down also happens to be the one doing
> the tick then we're hosed as the tick_do_timer_cpu 'baton' is only passed
> later on in tick_shutdown() when _cpu_down() does the CPU_DEAD notification.
> Therefore jiffies won't be updated anymore.

I confess unfamiliarity with the tick/timer code, but this sounds like
something that should be addressed earlier in the process of taking
down a CPU.

>   This patch replaces that msleep() with a cpu_relax() to make sure we're
> not going to schedule at that point.

This is a significant change in behavior.  With the msleep(), we poll
for at least five seconds before giving up; with the cpu_relax(), the
period will almost certainly be much shorter and we're likely to give
up too soon in some circumstances.  Could be addressed by using
mdelay(), but...

It's just not clear to me how busy-waiting in the __cpu_die() path is
a legitimate fix.  Is sleeping in this path forbidden now?  (I notice
at least native_cpu_die() in x86 does msleep(), btw.)

As it can take several milliseconds for RTAS to report a CPU
offline, and the maximum latency of the operation is unspecified, it
seems inappropriate to tie up the waiting CPU this way.

>   With this patch my test box survives a 100k iterations hotplug stress
> test on _all_ cpus, whereas without it, it quickly dies after ~50 iterations.

What is the failure (e.g. stack trace, kernel messages)?
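
The timeout concern raised above can be made concrete. At 200 ms per msleep(), a five-second minimum corresponds to roughly 25 tries (the loop bound itself is outside the hunk shown in this thread). Swapping the sleep for cpu_relax() keeps the try count but removes almost all of the elapsed time. One way to keep the timeout independent of how each iteration waits is to bound the loop by a deadline instead of a counter. The following is a user-space sketch only: the toy_* names, the callback type and the use of clock_gettime() are assumptions for illustration, not the kernel code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the RTAS status query (0 = stopped, -1 = error). */
typedef int (*toy_query_fn)(void *ctx);

/* Test double: reports "stopped" on the Nth call, "still running" before. */
struct toy_cpu {
	int calls;
	int calls_until_stopped;
};

static int toy_query(void *ctx)
{
	struct toy_cpu *cpu = ctx;

	cpu->calls++;
	return cpu->calls >= cpu->calls_until_stopped ? 0 : 1;
}

/* Monotonic clock in milliseconds. */
static uint64_t toy_now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000u + (uint64_t)ts.tv_nsec / 1000000u;
}

/*
 * Busy-poll until the query reports 0 or -1, or until timeout_ms has
 * elapsed. Returns the last status seen.
 */
static int toy_poll_until_stopped(toy_query_fn query, void *ctx,
				  uint64_t timeout_ms)
{
	uint64_t deadline = toy_now_ms() + timeout_ms;
	int status;

	assert(query != NULL);
	do {
		status = query(ctx);
		if (status == 0 || status == -1)
			break;
		/* A cpu_relax() or short delay equivalent would go here. */
	} while (toy_now_ms() < deadline);

	return status;
}
```

Calling toy_poll_until_stopped(toy_query, &cpu, 5000) would keep the five-second budget whatever the per-iteration wait is. In the kernel, the deadline would have to come from a time source that does not depend on jiffies, since jiffies are exactly what stalls in the scenario described in this thread.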

Sebastien Dugue Nov. 28, 2008, 10:04 a.m. UTC | #2

Hi Nathan,

On Thu, 27 Nov 2008 18:14:33 -0600 Nathan Lynch <ntl@pobox.com> wrote:

> Hi, I have some questions about this patch.
> 
> Sebastien Dugue wrote:
> > 
> >   Currently, pseries_cpu_die() calls msleep() while polling RTAS for
> > the status of the dying cpu.
> > 
> >   However if the cpu that is going down also happens to be the one doing
> > the tick then we're hosed as the tick_do_timer_cpu 'baton' is only passed
> > later on in tick_shutdown() when _cpu_down() does the CPU_DEAD notification.
> > Therefore jiffies won't be updated anymore.
> 
> I confess unfamiliarity with the tick/timer code, but this sounds like
> something that should be addressed earlier in the process of taking
> down a CPU.

  Maybe you're right. At the very least, tick_do_timer_cpu should be changed
earlier in the down process, but I'm not sure where we can do that.

> 
> 
> >   This patch replaces that msleep() with a cpu_relax() to make sure we're
> > not going to schedule at that point.
> 
> This is a significant change in behavior.  With the msleep(), we poll
> for at least five seconds before giving up; with the cpu_relax(), the
> period will almost certainly be much shorter and we're likely to give
> up too soon in some circumstances.

  Right, I realized a bit late that this would indeed change the
behaviour. On my test box (2 Power6) the msleep call is hit in ~10% of the
cases and only loops once, but that may not be the case for all the pSeries
out there, where a longer delay might be needed.

>  Could be addressed by using
> mdelay(), but...

  Yep, that would be better. I'm still wondering why the hang is not systematic
when offlining the tick_do_timer_cpu. I must have missed something in
my analysis, and there might be a race somewhere that I failed to identify.

> 
> It's just not clear to me how busy-waiting in the __cpu_die() path is
> a legitimate fix.  Is sleeping in this path forbidden now?

  I think it is, in the case where the cpu going down is also the one doing
the tick.

>  (I notice
> at least native_cpu_die() in x86 does msleep(), btw.)

  Right, as most other arches do, but the thing is that on those, you
cannot offline CPU0, whereas on power (and maybe others too) you can.

> 
> As it can take several milliseconds for RTAS to report a CPU
> offline, and the maximum latency of the operation is unspecified, it
> seems inappropriate to tie up the waiting CPU this way.

  Agreed.

> 
> 
> >   With this patch my test box survives a 100k iterations hotplug stress
> > test on _all_ cpus, whereas without it, it quickly dies after ~50 iterations.
> 
> What is the failure (e.g. stack trace, kernel messages)?

  No stack trace, no kernel messages, nothing :( When that happens, the
cpus get stuck in idle and won't reschedule at all.

  I verified that the decrementer is still ticking and hrtimers are still
running, but regular timers are stuck. Changing the tick_do_timer_cpu manually
under xmon resurrects the box.

  Will have to look at this a bit more.

  Thanks,

  Sebastien.

Patch

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 1f03248..a20ead8 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -116,7 +116,7 @@  static void pseries_cpu_die(unsigned int cpu)
 		cpu_status = query_cpu_stopped(pcpu);
 		if (cpu_status == 0 || cpu_status == -1)
 			break;
-		msleep(200);
+		cpu_relax();
 	}
 	if (cpu_status != 0) {
 		printk("Querying DEAD? cpu %i (%i) shows %i\n",
