diff mbox

[v3,00/10] Add tegra30 support for secondary cores

Message ID 20120229154229.GL12917@tbergstrom-lnx.Nvidia.com
State Not Applicable, archived
Headers show

Commit Message

Peter De Schrijver Feb. 29, 2012, 3:42 p.m. UTC
On Wed, Feb 15, 2012 at 10:27:43PM +0100, Stephen Warren wrote:
> Peter De Schrijver wrote at Thursday, February 09, 2012 4:48 PM:
> > This patchset introduces support for secondary cores on tegra30. It also
> > introduces some functions for the flowcontroller and adds basic support for
> > tegra30 powerdomains.
> 
> The code looks fine to me. I do notice the following when running
> "shutdown -h now":
> 
> [   27.762124] SMP: failed to stop secondary CPUs
> 
> So, this patch series makes the secondary cores boot OK, but not shut
> down it appears. Is that expected? It's probably something we can fix
> with a following on patch rather than re-spinning this series though?
> 

I spent some more time looking at this. It appears that nr_online_cpus isn't
always updated properly. I added an extra pr_warn() when the failure happens
and I got:

[   22.369815] Restarting system.
[   23.506792] SMP: failed to stop secondary CPUs
[   23.511535] running on CPU: 3, still online: 2, mask: 8

I also did some function tracing and all CPUs besides the one initiating the
halt reach platform_cpu_kill() about 50us after sending the IPIs to stop them.
The set_cpu_online() call in ipi_cpu_stop() should update nr_online_cpus, but
somehow this doesn't always happen. Any insight on what might be going on?

Instrumentation patch attached.

Cheers,

Peter.

Comments

Stephen Warren Feb. 29, 2012, 10:38 p.m. UTC | #1
Peter De Schrijver wrote at Wednesday, February 29, 2012 8:42 AM:
> On Wed, Feb 15, 2012 at 10:27:43PM +0100, Stephen Warren wrote:
> > Peter De Schrijver wrote at Thursday, February 09, 2012 4:48 PM:
> > > This patchset introduces support for secondary cores on tegra30. It also
> > > introduces some functions for the flowcontroller and adds basic support for
> > > tegra30 powerdomains.
> >
> > The code looks fine to me. I do notice the following when running
> > "shutdown -h now":
> >
> > [   27.762124] SMP: failed to stop secondary CPUs

I can't seem to reproduce this at all on next-20120229 anymore. What
kind of repro frequency are you seeing, and does it get fixed if you
update to that baseline?

(I tried 20 times with shutdown, and 15 of them I reverted all the CPU
and cache errata in case they affected the issue at all)
Peter De Schrijver March 1, 2012, 11:42 a.m. UTC | #2
On Wed, Feb 29, 2012 at 11:38:16PM +0100, Stephen Warren wrote:
> Peter De Schrijver wrote at Wednesday, February 29, 2012 8:42 AM:
> > On Wed, Feb 15, 2012 at 10:27:43PM +0100, Stephen Warren wrote:
> > > Peter De Schrijver wrote at Thursday, February 09, 2012 4:48 PM:
> > > > This patchset introduces support for secondary cores on tegra30. It also
> > > > introduces some functions for the flowcontroller and adds basic support for
> > > > tegra30 powerdomains.
> > >
> > > The code looks fine to me. I do notice the following when running
> > > "shutdown -h now":
> > >
> > > [   27.762124] SMP: failed to stop secondary CPUs
> 
> I can't seem to reproduce this at all on next-20120229 anymore. What
> kind of repro frequency are you seeing, and does it get fixed if you
> update to that baseline?
> 

I see it abount once every 3 reboots. I have a an expect script which
automates the test. I haven't tried with next-20120229 yet.

> (I tried 20 times with shutdown, and 15 of them I reverted all the CPU
> and cache errata in case they affected the issue at all)

Cheers,

Peter.
--
To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter De Schrijver March 1, 2012, 2:35 p.m. UTC | #3
On Thu, Mar 01, 2012 at 12:42:11PM +0100, Peter De Schrijver wrote:
> On Wed, Feb 29, 2012 at 11:38:16PM +0100, Stephen Warren wrote:
> > Peter De Schrijver wrote at Wednesday, February 29, 2012 8:42 AM:
> > > On Wed, Feb 15, 2012 at 10:27:43PM +0100, Stephen Warren wrote:
> > > > Peter De Schrijver wrote at Thursday, February 09, 2012 4:48 PM:
> > > > > This patchset introduces support for secondary cores on tegra30. It also
> > > > > introduces some functions for the flowcontroller and adds basic support for
> > > > > tegra30 powerdomains.
> > > >
> > > > The code looks fine to me. I do notice the following when running
> > > > "shutdown -h now":
> > > >
> > > > [   27.762124] SMP: failed to stop secondary CPUs
> > 
> > I can't seem to reproduce this at all on next-20120229 anymore. What
> > kind of repro frequency are you seeing, and does it get fixed if you
> > update to that baseline?
> > 
> 
> I see it abount once every 3 reboots. I have a an expect script which
> automates the test. I haven't tried with next-20120229 yet.
> 
> > (I tried 20 times with shutdown, and 15 of them I reverted all the CPU
> > and cache errata in case they affected the issue at all)

Tried it with next-20120229 and it works fine here. I also noticed
next-20120229 does not have https://lkml.org/lkml/2012/1/31/444
which was part of the earlier tree I used for testing. So num_online_cpus()
recalculates the number of online CPUs everytime instead of trying to only
recalculate it when there is a change.

Cheers,

Peter.
--
To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

From d871feda574bd9f626f175be403cabd308ca18c0 Mon Sep 17 00:00:00 2001
From: Peter De Schrijver <pdeschrijver@nvidia.com>
Date: Wed, 29 Feb 2012 17:37:27 +0200
Subject: [PATCH] ARM: debugging smp_send_stop
X-NVConfidentiality: public

Change-Id: Ia4f33b054f2eccf173a3d977686e1577e2146b92
---
 arch/arm/kernel/smp.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 3e43c5f..e0e5eea 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -570,6 +570,7 @@  void smp_send_reschedule(int cpu)
 void smp_send_stop(void)
 {
 	unsigned long timeout;
+	int cpu = smp_processor_id();
 
 	if (num_online_cpus() > 1) {
 		cpumask_t mask = cpu_online_map;
@@ -583,8 +584,12 @@  void smp_send_stop(void)
 	while (num_online_cpus() > 1 && timeout--)
 		udelay(1);
 
-	if (num_online_cpus() > 1)
+	if (num_online_cpus() > 1) {
+		char cpu_mask[10];
 		pr_warning("SMP: failed to stop secondary CPUs\n");
+		cpumask_scnprintf(cpu_mask, sizeof(cpu_mask), cpu_online_mask);
+		pr_warn("running on CPU: %d, still online: %d, mask: %s\n", cpu, num_online_cpus(), cpu_mask);
+	}
 }
 
 /*
-- 
1.7.7.rc0.72.g4b5ea.dirty