Patchwork [Bugme-new,Bug,13444] New: SparcServer 1000E SMP can cause kernel-nullpointer with some hw configurations

login
register
mail settings
Submitter oftedal
Date June 3, 2009, 11:14 p.m.
Message ID <Pine.LNX.4.64.0906040050120.6782@oizys.tordivel.org>
Download mbox | patch
Permalink /patch/28079/
State Not Applicable
Delegated to: David Miller
Headers show

Comments

oftedal - June 3, 2009, 11:14 p.m.
On Wed, 3 Jun 2009, Andrew Morton wrote:

>
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Wed, 3 Jun 2009 16:45:33 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
>
>> http://bugzilla.kernel.org/show_bug.cgi?id=13444
>>
>>            Summary: SparcServer 1000E SMP can cause kernel-nullpointer
>>                     with some hw configurations
>>            Product: Platform Specific/Hardware
>>            Version: 2.5
>>     Kernel Version: 2.4.37
>>           Platform: All
>>         OS/Version: Linux
>>               Tree: Mainline
>>             Status: NEW
>>           Severity: normal
>>           Priority: P1
>>          Component: SPARC32
>>         AssignedTo: zaitcev@yahoo.com
>>         ReportedBy: oftedal@gmail.com
>>         Regression: No
>>
>>
>> Created an attachment (id=21732)
>>  --> (http://bugzilla.kernel.org/attachment.cgi?id=21732)
>> SUN4D-SMP mode patch
>>
>> SparcServer 1000E/SUN4D machines will cause a kernel-nullpointer when running
>> in SMP-mode with certain hw configurations.
>> According to Sun documentation Slot A on the systemboards should be filled with
>> cpu modules first. This will fill physical cpu-slots 0,2,4,8 with cpus first.
>> The current smp code will then fill the init_tasks-array using the physical
>> cpu-mapping.
>> The scheduler on the other hand uses the logical cpu-mapping.
>> So in a two cpu system with two systemboards cpu-slots 0 and 2 will be occupied
>> and init_tasks-array position 0 and 2 will contain a idle_task. But the
>> scheduler will access init_tasks position 0 and 1 which will cause an error.
>> Any sparcserver 1000(E) system with 2 systemboards and 2 cpus, with 1 cpu per
>> board will cause this error.
>>
>
> Thanks, but handling patches via bugzilla is far from preferred.
>
> Please send the patch via email, as a reply-to-all to this email.  The
> patch should include a full description and a Signed-off-by:, as per
> Documentation/SubmittingPatches.
>
>

As mentioned in the bug-report(#13444) the sun4d-SMP code on 
2.4-series kernels(2.4.37) uses the physical cpu-mapping when inserting 
tasks into the init_tasks array.
And the scheduler uses the logical cpu-mapping which will cause a
kernel nullpointer when there are gaps in the physical cpu-mapping.

The attached patch uses the cpucount variable, which can be used to find 
the next logical slot in the init_tasks array. As it is only incremented
when a cpu is successfully started.

(I hope this was more compliant with kernel-bugfixing standards. I am new 
to this)

Signed-off-by: Kjetil Oftedal <oftedal@gmail.com>
---

--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - June 4, 2009, 4:23 a.m.
From: oftedal <oftedal@gmail.com>
Date: Thu, 4 Jun 2009 01:14:33 +0200 (CEST)

> As mentioned in the bug-report(#13444) the sun4d-SMP code on
> 2.4-series kernels(2.4.37) uses the physical cpu-mapping when
> inserting tasks into the init_tasks array.
> And the scheduler uses the logical cpu-mapping which will cause a
> kernel nullpointer when there are gaps in the physical cpu-mapping.
> 
> The attached patch uses the cpucount variable, which can be used to
> find the next logical slot in the init_tasks array. As it is only
> incremented
> when a cpu is successfully started.
> 
> (I hope this was more compliant with kernel-bugfixing standards. I am
> new to this)
> 
> Signed-off-by: Kjetil Oftedal <oftedal@gmail.com>

I don't think we're handling 2.4.x patches at this point. :-)

If the bug exists in 2.6.x as well I'd be happy to review such
a patch and apply it for you.
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
oftedal - June 4, 2009, 8:39 a.m.
On Wed, 3 Jun 2009, David Miller wrote:

> From: oftedal <oftedal@gmail.com>
> Date: Thu, 4 Jun 2009 01:14:33 +0200 (CEST)
>
>> As mentioned in the bug-report(#13444) the sun4d-SMP code on
>> 2.4-series kernels(2.4.37) uses the physical cpu-mapping when
>> inserting tasks into the init_tasks array.
>> And the scheduler uses the logical cpu-mapping which will cause a
>> kernel nullpointer when there are gaps in the physical cpu-mapping.
>>
>> The attached patch uses the cpucount variable, which can be used to
>> find the next logical slot in the init_tasks array. As it is only
>> incremented
>> when a cpu is successfully started.
>>
>> (I hope this was more compliant with kernel-bugfixing standards. I am
>> new to this)
>>
>> Signed-off-by: Kjetil Oftedal <oftedal@gmail.com>
>
> I don't think we're handling 2.4.x patches at this point. :-)
>
> If the bug exists in 2.6.x as well I'd be happy to review such
> a patch and apply it for you.
>

Yeah. 2.4.x is a "bit" old. But it was easier to fix. As the 2.6.x-series
is giving me all kinds of trouble. The SMP-code was redesigned for 2.6, 
and I havn't been able to confirm if it works or not.
I could have a go at it if someone could give me a small pointer in how to 
get the piggyback_32 utility working with 2.6s System.map.
(looks like "A end" is missing from it)

--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sam Ravnborg - June 4, 2009, 9:22 a.m.
> 
> Yeah. 2.4.x is a "bit" old. But it was easier to fix. As the 2.6.x-series
> is giving me all kinds of trouble. The SMP-code was redesigned for 2.6, 
> and I havn't been able to confirm if it works or not.
> I could have a go at it if someone could give me a small pointer in how to 
> get the piggyback_32 utility working with 2.6s System.map.
> (looks like "A end" is missing from it)

Just a quick guess..

Try to add:

	end = .;

to arch/sparc/kernel/vmlinux.lds.S

Do this above the definition of _end.

Or better change piggyback_32 to support _end.

	Sam
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/arch/sparc/kernel/sun4d_smp.c b/arch/sparc/kernel/sun4d_smp.c
index 9bb7f78..5b9b34e 100644
--- a/arch/sparc/kernel/sun4d_smp.c
+++ b/arch/sparc/kernel/sun4d_smp.c
@@ -221,7 +221,9 @@  void __init smp4d_boot_cpus(void)
 			cpucount++;

 			p = init_task.prev_task;
-			init_tasks[i] = p;
+
+			/* The scheduler uses the logical cpu mapping when accessing this array */
+			init_tasks[cpucount] = p;

 			p->processor = i;
 			p->cpus_runnable = 1 << i; /* we schedule the first task manually */