[V2] ARM: BCM5301X: Implement SMP support

Message ID 1427030415-31721-1-git-send-email-zajec5@gmail.com
State New

Commit Message

Rafał Miłecki March 22, 2015, 1:20 p.m. UTC
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
---
V2: Change code after receiving Florian's comments:
    1) Use "mmio-sram"
    2) Remove commented out ASM call
    3) Fix coding style in ASM
    4) Simplify finding OF node
---
 Documentation/devicetree/bindings/arm/bcm4708.txt |  24 ++++
 Documentation/devicetree/bindings/arm/cpus.txt    |   1 +
 arch/arm/boot/dts/bcm4708.dtsi                    |  13 ++
 arch/arm/mach-bcm/Makefile                        |   3 +
 arch/arm/mach-bcm/bcm5301x_headsmp.S              |  45 ++++++
 arch/arm/mach-bcm/bcm5301x_smp.c                  | 158 ++++++++++++++++++++++
 6 files changed, 244 insertions(+)
 create mode 100644 arch/arm/mach-bcm/bcm5301x_headsmp.S
 create mode 100644 arch/arm/mach-bcm/bcm5301x_smp.c

Comments

Russell King - ARM Linux March 26, 2015, noon UTC | #1
On Sun, Mar 22, 2015 at 02:20:15PM +0100, Rafał Miłecki wrote:
> +/*
> + * BCM5301X specific entry point for secondary CPUs.
> + */
> +ENTRY(bcm5301x_secondary_startup)
> +	mrc	p15, 0, r0, c0, c0, 5
> +	and	r0, r0, #15
> +	adr	r4, 1f
> +	ldmia	r4, {r5, r6}
> +	sub	r4, r4, r5
> +	add	r6, r6, r4
> +pen:	ldr	r7, [r6]
> +	cmp	r7, r0
> +	bne	pen
> +
> +	/*
> +	 * In case L1 cache has unpredictable contents at power-up
> +	 * clean its contents without flushing.
> +	 */
> +	bl      v7_invalidate_l1
> +
> +	mov	r0, #0
> +	mcr	p15, 0, r0, c7, c5, 0	/* Invalidate icache */
> +	dsb
> +	isb

So if your I-cache contains unpredictable contents, how do you execute
the code to this point?  Shouldn't the I-cache invalidate be the very
first instruction you execute followed by the dsb and isb (oh, and iirc
it ignores the value in the register).

In the case where a CPU has unpredictable contents at power up, the
ARM ARM requires that an implementation specific sequence is followed
to initialise the caches.  I doubt that such a sequence includes testing
a pen value.
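For reference, the quoted entry code does three things before the cache maintenance. It reads MPIDR and keeps bits [3:0] as the CPU number (`and r0, r0, #15`). It converts the link-time address of the pen variable into an address that is usable with the MMU off (`adr`/`ldmia`/`sub`/`add`). Finally, it spins until the loaded value matches the CPU number. The C below is a user-space sketch of that arithmetic only. It assumes the `1:` literal pool holds `.long .` followed by `.long pen_release`, because that part of the patch is not quoted here. The function names and all addresses are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* "mrc p15, 0, r0, c0, c0, 5; and r0, r0, #15": keep MPIDR bits [3:0]. */
static uint32_t mpidr_to_cpu(uint32_t mpidr)
{
	return mpidr & 15u;
}

/*
 * "adr r4, 1f; ldmia r4, {r5, r6}; sub r4, r4, r5; add r6, r6, r4"
 * (assumed literal pool: "1: .long .", then ".long pen_release"):
 *   label_run  = where label 1 really is (r4, from the PC-relative adr)
 *   label_link = where the linker placed label 1 (r5)
 *   var_link   = where the linker placed pen_release (r6)
 * The run-time/link-time offset is applied to the variable. Unsigned
 * arithmetic wraps modulo 2^32, just as the 32-bit registers do.
 */
static uint32_t fixup_addr(uint32_t label_run, uint32_t label_link,
			   uint32_t var_link)
{
	return var_link + (label_run - label_link);
}

/* One pass of "pen: ldr r7, [r6]; cmp r7, r0; bne pen". */
static bool may_leave_pen(int32_t pen_release, uint32_t mpidr)
{
	return pen_release == (int32_t)mpidr_to_cpu(mpidr);
}
```

With the MPIDR values from the boot logs in this thread (80000000 and 80000001), `mpidr_to_cpu()` yields 0 and 1.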

> +	sysram_base_addr = of_iomap(node, 0);
> +	if (!sysram_base_addr) {
> +		pr_warn("Failed to map sysram\n");
> +		return;
> +	}
> +
> +	writel(virt_to_phys(entry_point), sysram_base_addr + SOC_ROM_LUT_OFF);
> +
> +	dsb_sev();	/* Exit WFI */

Which WFI?  This seems to imply that you have some kind of initial
firmware.  If so, that should be taking care of the cache initialisation,
not the kernel.

> +	mb();		/* make sure write buffer is drained */

writel() already ensures that.

> +	/*
> +	 * The secondary processor is waiting to be released from
> +	 * the holding pen - release it, then wait for it to flag
> +	 * that it has been released by resetting pen_release.
> +	 *
> +	 * Note that "pen_release" is the hardware CPU ID, whereas
> +	 * "cpu" is Linux's internal ID.
> +	 */
> +	write_pen_release(cpu_logical_map(cpu));
> +
> +	 /* Send the secondary CPU SEV */
> +	dsb_sev();

If you even need any of the pen code, if you're having to send a SEV here,
wouldn't having a WFE in the pen assembly loop above be a good idea?
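One way to picture the suggestion is as follows. The secondary still re-checks the pen value every time it wakes, because WFE can also return for an unrelated event (on ARM, for example, arch_spin_unlock() issues an SEV). Between checks, however, it sleeps instead of continuously reloading the value. The sketch below is a single-threaded toy model in C, not kernel code. The struct, the state names and the rule that waking from WFE consumes the event are simplifying assumptions; the exact event-register semantics are defined in the ARM ARM.

```c
#include <stdbool.h>
#include <stdint.h>

enum pen_state { PEN_RUNNING, PEN_IN_WFE, PEN_RELEASED };

struct secondary {
	int32_t hwid;          /* value the primary will write to pen_release */
	bool event;            /* event register: latched by SEV, consumed by WFE */
	enum pen_state state;
	unsigned pen_reads;    /* how often pen_release has been loaded */
};

/*
 * Models a loop of the form: load pen_release; if it matches, leave;
 * otherwise WFE and retry. Runs until the core leaves the pen or sleeps.
 */
static void secondary_run(struct secondary *s, const volatile int32_t *pen_release)
{
	for (;;) {
		s->pen_reads++;
		if (*pen_release == s->hwid) {
			s->state = PEN_RELEASED;
			return;
		}
		if (s->event) {        /* WFE with the event register set: clear it, no sleep */
			s->event = false;
			continue;
		}
		s->state = PEN_IN_WFE; /* WFE with no pending event: sleep */
		return;
	}
}

/* Some core executes SEV, e.g. dsb_sev() after write_pen_release(). */
static void send_sev(struct secondary *s, const volatile int32_t *pen_release)
{
	if (s->state == PEN_IN_WFE)
		secondary_run(s, pen_release);  /* the wake-up consumes the event */
	else if (s->state == PEN_RUNNING)
		s->event = true;                /* latched until the next WFE */
}
```

A stray SEV only costs one extra load of pen_release, while an SEV that arrives before the secondary reaches WFE is not lost, because the event register stays latched until the next WFE.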
Hauke Mehrtens Oct. 13, 2015, 10:29 p.m. UTC | #2
Hi,

I have now tested this patch on my device.

What does the loader do before Linux is started on the second CPU, and
what does it guarantee?

On 03/26/2015 01:00 PM, Russell King - ARM Linux wrote:
> On Sun, Mar 22, 2015 at 02:20:15PM +0100, Rafał Miłecki wrote:
>> +/*
>> + * BCM5301X specific entry point for secondary CPUs.
>> + */
>> +ENTRY(bcm5301x_secondary_startup)
>> +	mrc	p15, 0, r0, c0, c0, 5
>> +	and	r0, r0, #15
>> +	adr	r4, 1f
>> +	ldmia	r4, {r5, r6}
>> +	sub	r4, r4, r5
>> +	add	r6, r6, r4
>> +pen:	ldr	r7, [r6]
>> +	cmp	r7, r0
>> +	bne	pen
>> +
>> +	/*
>> +	 * In case L1 cache has unpredictable contents at power-up
>> +	 * clean its contents without flushing.
>> +	 */
>> +	bl      v7_invalidate_l1
>> +
>> +	mov	r0, #0
>> +	mcr	p15, 0, r0, c7, c5, 0	/* Invalidate icache */
>> +	dsb
>> +	isb
> 
> So if your I-cache contains unpredictable contents, how do you execute
> the code to this point?  Shouldn't the I-cache invalidate be the very
> first instruction you execute followed by the dsb and isb (oh, and iirc
> it ignores the value in the register).
> 
> In the case where a CPU has unpredictable contents at power up, the
> ARM ARM requires that an implementation specific sequence is followed
> to initialise the caches.  I doubt that such a sequence includes testing
> a pen value.

When I remove the test for the pen value, the CPU no longer comes up and
I get this log output:

[ 0.132292] CPU: Testing write buffer coherency: ok
[ 0.137635] CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
[ 0.143675] Setting up static identity map for 0x82a0 - 0x82d4
[ 10.149786] CPU1: failed to boot: -38
[ 10.153651] Brought up 1 CPUs


This was caused just by removing the "cmp r7, r0" and "bne pen"
instructions.

With these instructions in place, it works and I get this:

[    0.132329] CPU: Testing write buffer coherency: ok
[    0.137682] CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
[    0.143708] Setting up static identity map for 0x82a0 - 0x82d4
[    0.189788] CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
[    0.189892] Brought up 2 CPUs
[    0.198889] SMP: Total of 2 processors activated (3188.32 BogoMIPS).

Currently this is 100% reproducible.
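A note on the failing log above: -38 is -ENOSYS on Linux. The part of the patch that produces this error is not quoted here. However, if its boot_secondary() follows the usual pen_release pattern (as in arch/arm/plat-versatile/platsmp.c), the primary polls pen_release for a while. It then returns -ENOSYS if the secondary never resets the value to -1, which happens when the secondary never reaches its C init code. Below is a minimal user-space sketch of that primary-side check. The helper name is hypothetical, and a poll count stands in for the kernel's jiffies-based timeout.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical primary-side check: after write_pen_release(hwid) and the
 * SEV, wait for the secondary to write -1 back. The versatile code loops
 * on time_before(jiffies, timeout) with udelay(); a poll count is used here.
 */
static int wait_for_secondary(const volatile int32_t *pen_release,
			      unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		if (*pen_release == -1)
			return 0;       /* secondary left the pen and reset it */
	}
	return -ENOSYS;                 /* printed as "failed to boot: -38" */
}
```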

Could it be that the second CPU needs some time till it is synchronised
correctly?

I do not know whether, or why, the cache clearing is needed; I do not have
access to the SoC documentation or the ASIC/firmware developer.

> 
>> +	sysram_base_addr = of_iomap(node, 0);
>> +	if (!sysram_base_addr) {
>> +		pr_warn("Failed to map sysram\n");
>> +		return;
>> +	}
>> +
>> +	writel(virt_to_phys(entry_point), sysram_base_addr + SOC_ROM_LUT_OFF);
>> +
>> +	dsb_sev();	/* Exit WFI */
> 
> Which WFI?  This seems to imply that you have some kind of initial
> firmware.  If so, that should be taking care of the cache initialisation,
> not the kernel.
> 
>> +	mb();		/* make sure write buffer is drained */
> 
> writel() already ensures that.
> 
>> +	/*
>> +	 * The secondary processor is waiting to be released from
>> +	 * the holding pen - release it, then wait for it to flag
>> +	 * that it has been released by resetting pen_release.
>> +	 *
>> +	 * Note that "pen_release" is the hardware CPU ID, whereas
>> +	 * "cpu" is Linux's internal ID.
>> +	 */
>> +	write_pen_release(cpu_logical_map(cpu));
>> +
>> +	 /* Send the secondary CPU SEV */
>> +	dsb_sev();
> 
> If you even need any of the pen code, if you're having to send a SEV here,
> wouldn't having a WFE in the pen assembly loop above be a good idea?
> 


I have to read more about how WFE and co. work.

Hauke
Ray Jui Oct. 13, 2015, 10:48 p.m. UTC | #3
+ bcm-kernel-feedback-list.

Kapil, you might want to take a look at this. Not sure how this is
related to your SMP patches for NSP.

Kapil Hali Oct. 14, 2015, 1:42 p.m. UTC | #4
On 10/14/2015 4:18 AM, Ray Jui wrote:
> + bcm-kernel-feedback-list.
> 
> Kapil, you might want to take a look at this. Not sure how this is
> related to your SMP patches for NSP.

Ray, I don't have the complete patch set (or the other patches) for this
change. It would be good if I could get those as well, or the complete
e-mail thread. I think that if we have a cleaner solution for SMP, we can
consolidate the changes required for NS and NSP. I have a few points to
add, which are inline in this e-mail.

> 
> On 10/13/2015 3:29 PM, Hauke Mehrtens wrote:
>> Hi,
>>
>> I tested this patch on my device now.
>>
>> What does the loader do before Linux gets started on the second CPU and
>> what is ensured?
>>
>> On 03/26/2015 01:00 PM, Russell King - ARM Linux wrote:
>>> On Sun, Mar 22, 2015 at 02:20:15PM +0100, Rafał Miłecki wrote:
>>>> +/*
>>>> + * BCM5301X specific entry point for secondary CPUs.
>>>> + */
>>>> +ENTRY(bcm5301x_secondary_startup)
>>>> +	mrc	p15, 0, r0, c0, c0, 5
>>>> +	and	r0, r0, #15
>>>> +	adr	r4, 1f
>>>> +	ldmia	r4, {r5, r6}
>>>> +	sub	r4, r4, r5
>>>> +	add	r6, r6, r4
>>>> +pen:	ldr	r7, [r6]
>>>> +	cmp	r7, r0
>>>> +	bne	pen
>>>> +
>>>> +	/*
>>>> +	 * In case L1 cache has unpredictable contents at power-up
>>>> +	 * clean its contents without flushing.
>>>> +	 */
>>>> +	bl      v7_invalidate_l1
>>>> +
>>>> +	mov	r0, #0
>>>> +	mcr	p15, 0, r0, c7, c5, 0	/* Invalidate icache */
>>>> +	dsb
>>>> +	isb
>>>
>>> So if your I-cache contains unpredictable contents, how do you execute
>>> the code to this point?  Shouldn't the I-cache invalidate be the very
>>> first instruction you execute followed by the dsb and isb (oh, and iirc
>>> it ignores the value in the register).
>>>
>>> In the case where a CPU has unpredictable contents at power up, the
>>> ARM ARM requires that an implementation specific sequence is followed
>>> to initialise the caches.  I doubt that such a sequence includes testing
>>> a pen value.

Are you sure this is an issue of unpredictable L1 cache contents at
power-up? AFAIK, the 5301X had an issue with secondary core initialization:
a secondary core waiting in WFE would be let out of the pen as soon as the
first spin_*lock executed. This was caused by a BOOTROM bug in NS, so the
workaround was to reset the secondary processor's jump address so that it
goes back and waits for the signal from the primary core. This vector fixup
is required so that the secondary core doesn't start executing kernel
instructions until we've patched its jump address in wakeup_secondary().

Also, v7 setup function should invalidate L1 cache and we should remove all
v7_invalidate_l1 calls in all headsmp.S in platform specific directories.
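The vector fixup described above can be modelled roughly as follows. This is a toy C sketch under stated assumptions, not the NSP code. The addresses, the struct and the function names are hypothetical. The sketch also assumes that the BOOTROM (and the hold stub) simply branch through the jump-address word whenever the secondary is woken. Without the fixup, any SEV (for example the one issued by arch_spin_unlock()) sends the secondary to whatever the word already holds, so if that is already the kernel entry point, kernel code starts too early. With the fixup, stray events land the secondary back in the wait loop until the primary patches the word.

```c
#include <stdint.h>

#define PARKED       0x0u     /* hypothetical: still waiting inside the BOOTROM */
#define HOLD_VECTOR  0x100u   /* hypothetical: stub that re-enters the wait     */
#define KERNEL_ENTRY 0x8100u  /* hypothetical: physical kernel entry address    */

struct ns_model {
	uint32_t lut;   /* jump-address word the ROM reads after waking */
	uint32_t pc;    /* where the secondary is currently executing */
};

/* Any SEV wakes a waiting secondary, which then branches through the LUT. */
static void sev(struct ns_model *m)
{
	if (m->pc == PARKED || m->pc == HOLD_VECTOR)
		m->pc = m->lut;
}

/* Workaround, applied early: make the jump lead back into a wait loop. */
static void install_vector_fixup(struct ns_model *m)
{
	m->lut = HOLD_VECTOR;
}

/* Primary, at wakeup time: patch in the real entry point, then signal. */
static void wakeup_secondary_model(struct ns_model *m)
{
	m->lut = KERNEL_ENTRY;
	sev(m);
}
```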

Thanks,
Kapil
Hauke Mehrtens Oct. 14, 2015, 6:22 p.m. UTC | #5
On 10/14/2015 03:42 PM, Kapil Hali wrote:
> 
> 
> On 10/14/2015 4:18 AM, Ray Jui wrote:
>> + bcm-kernel-feedback-list.
>>
>> Kapil, you might want to take a look at this. Not sure how this is
>> related to your SMP patches for NSP.
> 
> Ray, I don't have complete/other patch sets for this change. It would
> be good if I get those patch sets as well or complete e-mail thread. 
> I think if we have a cleaner solutions for SMP, we can consolidate 
> the change required for NS and NSP. I have few points to add, which 
> are inline in this e-mail.

Hi Kapil,

the patch was posted on the ARM mailing list:
http://lists.infradead.org/pipermail/linux-arm-kernel/2015-March/332690.html
but you can also find it in OpenWrt here:
https://dev.openwrt.org/browser/trunk/target/linux/bcm53xx/patches-4.1/131-ARM-BCM5301X-Implement-SMP-support.patch
It looks very similar to your approach, so I assume we can use the same
SMP code for NS and NSP. I will look into your patches later on.

Hauke

Russell King - ARM Linux Oct. 15, 2015, 8:17 a.m. UTC | #6
On Wed, Oct 14, 2015 at 12:29:19AM +0200, Hauke Mehrtens wrote:
> Hi,
> 
> I tested this patch on my device now.
> 
> What does the loader do before Linux gets started on the second CPU and
> what is ensured?
> 
> On 03/26/2015 01:00 PM, Russell King - ARM Linux wrote:
> > On Sun, Mar 22, 2015 at 02:20:15PM +0100, Rafał Miłecki wrote:
> >> +/*
> >> + * BCM5301X specific entry point for secondary CPUs.
> >> + */
> >> +ENTRY(bcm5301x_secondary_startup)
> >> +	mrc	p15, 0, r0, c0, c0, 5
> >> +	and	r0, r0, #15
> >> +	adr	r4, 1f
> >> +	ldmia	r4, {r5, r6}
> >> +	sub	r4, r4, r5
> >> +	add	r6, r6, r4
> >> +pen:	ldr	r7, [r6]
> >> +	cmp	r7, r0
> >> +	bne	pen
> >> +
> >> +	/*
> >> +	 * In case L1 cache has unpredictable contents at power-up
> >> +	 * clean its contents without flushing.
> >> +	 */
> >> +	bl      v7_invalidate_l1
> >> +
> >> +	mov	r0, #0
> >> +	mcr	p15, 0, r0, c7, c5, 0	/* Invalidate icache */
> >> +	dsb
> >> +	isb
> > 
> > So if your I-cache contains unpredictable contents, how do you execute
> > the code to this point?  Shouldn't the I-cache invalidate be the very
> > first instruction you execute followed by the dsb and isb (oh, and iirc
> > it ignores the value in the register).
> > 
> > In the case where a CPU has unpredictable contents at power up, the
> > ARM ARM requires that an implementation specific sequence is followed
> > to initialise the caches.  I doubt that such a sequence includes testing
> > a pen value.

Things have changed in this area: ARMv7 CPU support now conforms with
other Linux CPU support, and we invalidate the caches prior to enabling
them.

However, I still come back to the point I made above, which you have not
addressed.  If the L1 cache contains unpredictable contents at power up,
how can you guarantee that any code the CPU executes _is read by the CPU_
rather than the random contents of the L1 cache?

This isn't about the pen stuff.  It's about whether you can execute any
code reliably at this point.

There are several possibilities that the ARM ARM allows:
* the caches are reset at CPU reset and contain no valid or dirty lines
* the caches are not reset at CPU reset, but are not searched until
  the enable bits are set
* the caches are not reset at CPU reset, but are searched

The last case is the "special" case that requires implementation-specific
code to initialise the caches safely, but anyone implementing that would
be really silly to do so, so I doubt this is what you have.

So, I'd just get rid of that unnecessary cache flushing, especially as
I've said above, the ARMv7 CPU entry has been fixed to invalidate caches,
rather than flush dirty data (hence potentially random data in the second
case above) out to memory.
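The three power-up cases listed above can be summarised as a small decision table. This C sketch only restates that reasoning; the enum and function names are made up for illustration.

```c
#include <stdbool.h>

/* Power-up L1 behaviours the ARM ARM allows, as listed above. */
enum l1_reset {
	L1_RESET_EMPTY,        /* reset at CPU reset: no valid or dirty lines */
	L1_NOT_SEARCHED,       /* not reset, not searched until enable bits set */
	L1_SEARCHED_UNRESET,   /* not reset but searched: implementation code needed */
};

/* Is the generic "invalidate, then enable" done by the ARMv7 entry code enough? */
static bool generic_invalidate_suffices(enum l1_reset r)
{
	return r != L1_SEARCHED_UNRESET;
}

/* Could a clean (flush) instead of an invalidate write random "dirty" lines to memory? */
static bool clean_may_write_garbage(enum l1_reset r)
{
	return r != L1_RESET_EMPTY;
}
```

In this table, only the third case needs anything beyond the generic entry code, and cleaning instead of invalidating is only harmless in the first case.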
Kapil Hali Oct. 15, 2015, 3:50 p.m. UTC | #7
On 10/14/2015 11:52 PM, Hauke Mehrtens wrote:
> On 10/14/2015 03:42 PM, Kapil Hali wrote:
>>
>>
>> On 10/14/2015 4:18 AM, Ray Jui wrote:
>>> + bcm-kernel-feedback-list.
>>>
>>> Kapil, you might want to take a look at this. Not sure how this is
>>> related to your SMP patches for NSP.
>>
>> Ray, I don't have complete/other patch sets for this change. It would
>> be good if I get those patch sets as well or complete e-mail thread. 
>> I think if we have a cleaner solutions for SMP, we can consolidate 
>> the change required for NS and NSP. I have few points to add, which 
>> are inline in this e-mail.
> 
> Hi Kapil,
> 
> the patch was posted on the arm mailing list:
> http://lists.infradead.org/pipermail/linux-arm-kernel/2015-March/332690.html
> but you can also find it in OpenWrt here:
> https://dev.openwrt.org/browser/trunk/target/linux/bcm53xx/patches-4.1/131-ARM-BCM5301X-Implement-SMP-support.patch
> It looks very similar to your approach so I assume we can use the same
> SMP code for NS and NSP. I will look into your patches later on.

You are right, we can use common SMP code for NS and NSP. However, my only
concern is the secondary_startup() function for NS, which is being
discussed in this e-mail chain. There are multiple versions of the NS SoC
and, AFAIK, some of them have an anomalous BOOTROM, so we should have a
cleaner, generic solution that can be used for both NS and NSP.

Thanks,
Kapil
Jon Mason Oct. 22, 2015, 8:30 p.m. UTC | #8
On Wed, Oct 14, 2015 at 08:22:54PM +0200, Hauke Mehrtens wrote:
> On 10/14/2015 03:42 PM, Kapil Hali wrote:
> > 
> > 
> > On 10/14/2015 4:18 AM, Ray Jui wrote:
> >> + bcm-kernel-feedback-list.
> >>
> >> Kapil, you might want to take a look at this. Not sure how this is
> >> related to your SMP patches for NSP.
> > 
> > Ray, I don't have complete/other patch sets for this change. It would
> > be good if I get those patch sets as well or complete e-mail thread. 
> > I think if we have a cleaner solutions for SMP, we can consolidate 
> > the change required for NS and NSP. I have few points to add, which 
> > are inline in this e-mail.
> 
> Hi Kapil,
> 
> the patch was posted on the arm mailing list:
> http://lists.infradead.org/pipermail/linux-arm-kernel/2015-March/332690.html
> but you can also find it in OpenWrt here:
> https://dev.openwrt.org/browser/trunk/target/linux/bcm53xx/patches-4.1/131-ARM-BCM5301X-Implement-SMP-support.patch
> It looks very similar to your approach so I assume we can use the same
> SMP code for NS and NSP. I will look into your patches later on.

Hello Hauke,
With my last patch (https://lkml.org/lkml/2015/10/15/690), SMP on 4708
is working with the NSP SMP patch
(https://lkml.org/lkml/2015/10/14/769).  That patch does not have the
issues that Russell King was mentioning in this patch series (and
supports more SoCs).  Have you had a chance to verify that it works
for you?

Thanks,
Jon

> 
> Hauke
> 
> > 
> >>
> >> On 10/13/2015 3:29 PM, Hauke Mehrtens wrote:
> >>> Hi,
> >>>
> >>> I tested this patch on my device now.
> >>>
> >>> What does the loader do before Linux gets started on the second CPU, and
> >>> what does it guarantee?
> >>>
> >>> On 03/26/2015 01:00 PM, Russell King - ARM Linux wrote:
> >>>> On Sun, Mar 22, 2015 at 02:20:15PM +0100, Rafał Miłecki wrote:
> >>>>> +/*
> >>>>> + * BCM5301X specific entry point for secondary CPUs.
> >>>>> + */
> >>>>> +ENTRY(bcm5301x_secondary_startup)
> >>>>> +	mrc	p15, 0, r0, c0, c0, 5
> >>>>> +	and	r0, r0, #15
> >>>>> +	adr	r4, 1f
> >>>>> +	ldmia	r4, {r5, r6}
> >>>>> +	sub	r4, r4, r5
> >>>>> +	add	r6, r6, r4
> >>>>> +pen:	ldr	r7, [r6]
> >>>>> +	cmp	r7, r0
> >>>>> +	bne	pen
> >>>>> +
> >>>>> +	/*
> >>>>> +	 * In case L1 cache has unpredictable contents at power-up
> >>>>> +	 * clean its contents without flushing.
> >>>>> +	 */
> >>>>> +	bl      v7_invalidate_l1
> >>>>> +
> >>>>> +	mov	r0, #0
> >>>>> +	mcr	p15, 0, r0, c7, c5, 0	/* Invalidate icache */
> >>>>> +	dsb
> >>>>> +	isb
> >>>>
> >>>> So if your I-cache contains unpredictable contents, how do you execute
> >>>> the code to this point?  Shouldn't the I-cache invalidate be the very
> >>>> first instruction you execute followed by the dsb and isb (oh, and iirc
> >>>> it ignores the value in the register).
> >>>>
> >>>> In the case where a CPU has unpredictable contents at power up, the
> >>>> ARM ARM requires that an implementation specific sequence is followed
> >>>> to initialise the caches.  I doubt that such a sequence includes testing
> >>>> a pen value.
> > 
> > Are you sure this is an issue of unpredictable L1 cache contents at
> > power-up? AFAIK, 5301X had an issue with secondary core initialization:
> > a secondary core waiting in WFE would be let out of the pen as soon as
> > the first spin_*lock executed. This was caused by a BOOTROM bug in NS, so
> > the workaround was to reset the address for the secondary processor so
> > that it goes back and waits for the signal from the primary core. This
> > vector fixup is required so that the secondary core doesn't start
> > executing kernel instructions until we've patched its jump address
> > during wakeup_secondary().
> > 
> > Also, the v7 setup function should invalidate the L1 cache, and we should
> > remove all v7_invalidate_l1 calls from the platform-specific headsmp.S
> > files.
> > 
> >>>
> >>> When I remove the test for the pen value, the CPU does not come up any
> >>> more, and I get this log output:
> >>>
> >>> [    0.132292] CPU: Testing write buffer coherency: ok
> >>> [    0.137635] CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
> >>> [    0.143675] Setting up static identity map for 0x82a0 - 0x82d4
> >>> [   10.149786] CPU1: failed to boot: -38
> >>> [   10.153651] Brought up 1 CPUs
> >>>
> >>>
> >>> This was caused just by removing the "cmp r7, r0" and "bne pen"
> >>> instructions.
> >>>
> >>> With these instructions in place, it works and I get this:
> >>>
> >>> [    0.132329] CPU: Testing write buffer coherency: ok
> >>> [    0.137682] CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
> >>> [    0.143708] Setting up static identity map for 0x82a0 - 0x82d4
> >>> [    0.189788] CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
> >>> [    0.189892] Brought up 2 CPUs
> >>> [    0.198889] SMP: Total of 2 processors activated (3188.32 BogoMIPS).
> >>>
> >>> Currently this is 100% reproducible.
> >>>
> >>> Could it be that the second CPU needs some time until it is synchronised
> >>> correctly?
> >>>
> >>> I do not know if or why the cache clearing is needed; I do not have
> >>> access to the SoC documentation or the ASIC/firmware developer.
> >>>
> >>>>
> >>>>> +	sysram_base_addr = of_iomap(node, 0);
> >>>>> +	if (!sysram_base_addr) {
> >>>>> +		pr_warn("Failed to map sysram\n");
> >>>>> +		return;
> >>>>> +	}
> >>>>> +
> >>>>> +	writel(virt_to_phys(entry_point), sysram_base_addr + SOC_ROM_LUT_OFF);
> >>>>> +
> >>>>> +	dsb_sev();	/* Exit WFI */
> >>>>
> >>>> Which WFI?  This seems to imply that you have some kind of initial
> >>>> firmware.  If so, that should be taking care of the cache initialisation,
> >>>> not the kernel.
> >>>>
> >>>>> +	mb();		/* make sure write buffer is drained */
> >>>>
> >>>> writel() already ensures that.
> >>>>
> >>>>> +	/*
> >>>>> +	 * The secondary processor is waiting to be released from
> >>>>> +	 * the holding pen - release it, then wait for it to flag
> >>>>> +	 * that it has been released by resetting pen_release.
> >>>>> +	 *
> >>>>> +	 * Note that "pen_release" is the hardware CPU ID, whereas
> >>>>> +	 * "cpu" is Linux's internal ID.
> >>>>> +	 */
> >>>>> +	write_pen_release(cpu_logical_map(cpu));
> >>>>> +
> >>>>> +	 /* Send the secondary CPU SEV */
> >>>>> +	dsb_sev();
> >>>>
> >>>> If you even need any of the pen code, if you're having to send a SEV here,
> >>>> wouldn't having a WFE in the pen assembly loop above be a good idea?
> >>>>
> >>>
> >>>
> >>> I have to read more on how WFE and co. work.
> >>>
> >>> Hauke
> >>>
> > 
> > Thanks,
> > Kapil
> > 
>
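The `writel()` quoted above stores the secondary entry point at `SOC_ROM_LUT_OFF` (0x400) inside the region mapped from the "brcm,bcm4708-sysram" node. As a worked example of where that word lands, the following sketch translates the node's `reg` through its parent's `ranges`, using the values from the patch's bcm4708.dtsi. It assumes one address cell and one size cell on both sides, as in that node; `struct dt_range`, `dt_translate()` and `bcm4708_lut_phys()` are hypothetical helpers, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

#define SOC_ROM_LUT_OFF 0x400	/* value from bcm5301x_smp.c */

/* Hypothetical: one "ranges" entry with one address cell and one size cell. */
struct dt_range {
	uint32_t child_base;
	uint32_t parent_base;
	uint32_t size;
};

/* Map a child bus address to the parent bus; returns 0 if it is outside
 * the range (0 is never a valid answer in this example). */
static uint64_t dt_translate(const struct dt_range *r, uint32_t child)
{
	assert(r->size != 0);
	if (child < r->child_base || child - r->child_base >= r->size)
		return 0;
	return (uint64_t)r->parent_base + (child - r->child_base);
}

/* Physical address of the entry-point word, or 0 if the 32-bit LUT slot
 * does not fit inside the smp-sysram@0 "reg" window. */
uint64_t bcm4708_lut_phys(void)
{
	/* sysram@ffff0000: ranges = <0 0xffff0000 0x10000>; */
	const struct dt_range sysram = { 0x0, 0xffff0000u, 0x10000 };
	/* smp-sysram@0: reg = <0x0 0x1000>; */
	const uint32_t smp_base = 0x0, smp_size = 0x1000;

	if (SOC_ROM_LUT_OFF + sizeof(uint32_t) > smp_size)
		return 0;	/* writel() would run past the mapped window */
	return dt_translate(&sysram, smp_base + SOC_ROM_LUT_OFF);
}
```

With those values the entry-point word ends up at physical address 0xffff0400, inside the 4 KiB window that of_iomap() maps for smp-sysram@0.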
Hauke Mehrtens Oct. 23, 2015, 10:36 p.m. UTC | #9
On 10/22/2015 10:30 PM, Jon Mason wrote:
> On Wed, Oct 14, 2015 at 08:22:54PM +0200, Hauke Mehrtens wrote:
>> On 10/14/2015 03:42 PM, Kapil Hali wrote:
>>>
>>>
>>> On 10/14/2015 4:18 AM, Ray Jui wrote:
>>>> + bcm-kernel-feedback-list.
>>>>
>>>> Kapil, you might want to take a look at this. Not sure how this is
>>>> related to your SMP patches for NSP.
>>>
>>> Ray, I don't have the complete/other patch sets for this change. It would
>>> be good if I could get those patch sets as well, or the complete e-mail
>>> thread. I think if we have a cleaner solution for SMP, we can consolidate
>>> the changes required for NS and NSP. I have a few points to add, which
>>> are inline in this e-mail.
>>
>> Hi Kapil,
>>
>> the patch was posted on the arm mailing list:
>> http://lists.infradead.org/pipermail/linux-arm-kernel/2015-March/332690.html
>> but you can also find it in OpenWrt here:
>> https://dev.openwrt.org/browser/trunk/target/linux/bcm53xx/patches-4.1/131-ARM-BCM5301X-Implement-SMP-support.patch
>> It looks very similar to your approach, so I assume we can use the same
>> SMP code for NS and NSP. I will look into your patches later on.
> 
> Hello Hauke,
> With my last patch (https://lkml.org/lkml/2015/10/15/690), SMP on 4708
> is working with the NSP SMP patch
> (https://lkml.org/lkml/2015/10/14/769).  That patch does not have the
> issues that Russell King was mentioning in this patch series (and
> supports more SoCs).  Have you had a chance to verify that it works
> for you?

Yes, I tried it and it works for me. I have already added it to OpenWrt:
https://dev.openwrt.org/changeset/47247

Hauke


Patch

diff --git a/Documentation/devicetree/bindings/arm/bcm4708.txt b/Documentation/devicetree/bindings/arm/bcm4708.txt
index 6b0f49f..3dd0e9d 100644
--- a/Documentation/devicetree/bindings/arm/bcm4708.txt
+++ b/Documentation/devicetree/bindings/arm/bcm4708.txt
@@ -6,3 +6,27 @@  Boards with the BCM4708 SoC shall have the following properties:
 Required root node property:
 
 compatible = "brcm,bcm4708";
+
+Optional sub-node properties:
+
+compatible = "mmio-sram" for SRAM access with IO memory region
+		This is needed for SMP-capable SoCs which use part of
+		SRAM for storing location of code to be executed by the
+		extra cores.
+		SMP support requires another sub-node with compatible
+		property "brcm,bcm4708-sysram".
+
+Example:
+
+	sysram@ffff0000 {
+		compatible = "mmio-sram";
+		reg = <0xffff0000 0x10000>;
+		#address-cells = <1>;
+		#size-cells = <1>;
+		ranges = <0 0xffff0000 0x10000>;
+
+		smp-sysram@0 {
+			compatible = "brcm,bcm4708-sysram";
+			reg = <0x0 0x1000>;
+		};
+	};
diff --git a/Documentation/devicetree/bindings/arm/cpus.txt b/Documentation/devicetree/bindings/arm/cpus.txt
index 6aa331d..3507ae3 100644
--- a/Documentation/devicetree/bindings/arm/cpus.txt
+++ b/Documentation/devicetree/bindings/arm/cpus.txt
@@ -189,6 +189,7 @@  nodes to be present and contain the properties described below.
 			  can be one of:
 			    "allwinner,sun6i-a31"
 			    "arm,psci"
+			    "brcm,bcm4708-smp"
 			    "brcm,brahma-b15"
 			    "marvell,armada-375-smp"
 			    "marvell,armada-380-smp"
diff --git a/arch/arm/boot/dts/bcm4708.dtsi b/arch/arm/boot/dts/bcm4708.dtsi
index 31141e8..c0af5cc 100644
--- a/arch/arm/boot/dts/bcm4708.dtsi
+++ b/arch/arm/boot/dts/bcm4708.dtsi
@@ -15,6 +15,7 @@ 
 	cpus {
 		#address-cells = <1>;
 		#size-cells = <0>;
+		enable-method = "brcm,bcm4708-smp";
 
 		cpu@0 {
 			device_type = "cpu";
@@ -31,4 +32,16 @@ 
 		};
 	};
 
+	sysram@ffff0000 {
+		compatible = "mmio-sram";
+		reg = <0xffff0000 0x10000>;
+		#address-cells = <1>;
+		#size-cells = <1>;
+		ranges = <0 0xffff0000 0x10000>;
+
+		smp-sysram@0 {
+			compatible = "brcm,bcm4708-sysram";
+			reg = <0x0 0x1000>;
+		};
+	};
 };
diff --git a/arch/arm/mach-bcm/Makefile b/arch/arm/mach-bcm/Makefile
index 4c38674..ca12727 100644
--- a/arch/arm/mach-bcm/Makefile
+++ b/arch/arm/mach-bcm/Makefile
@@ -36,6 +36,9 @@  obj-$(CONFIG_ARCH_BCM2835)	+= board_bcm2835.o
 
 # BCM5301X
 obj-$(CONFIG_ARCH_BCM_5301X)	+= bcm_5301x.o
+ifeq ($(CONFIG_SMP),y)
+obj-$(CONFIG_ARCH_BCM_5301X)	+= bcm5301x_smp.o bcm5301x_headsmp.o
+endif
 
 # BCM63XXx
 obj-$(CONFIG_ARCH_BCM_63XX)	:= bcm63xx.o
diff --git a/arch/arm/mach-bcm/bcm5301x_headsmp.S b/arch/arm/mach-bcm/bcm5301x_headsmp.S
new file mode 100644
index 0000000..9ca8d20
--- /dev/null
+++ b/arch/arm/mach-bcm/bcm5301x_headsmp.S
@@ -0,0 +1,45 @@ 
+/*
+ * Broadcom BCM470X / BCM5301X ARM platform code.
+ *
+ * Copyright (c) 2003 ARM Limited
+ * All Rights Reserved
+ *
+ * Licensed under the GNU/GPL. See COPYING for details.
+ */
+#include <linux/linkage.h>
+
+/*
+ * BCM5301X specific entry point for secondary CPUs.
+ */
+ENTRY(bcm5301x_secondary_startup)
+	mrc	p15, 0, r0, c0, c0, 5
+	and	r0, r0, #15
+	adr	r4, 1f
+	ldmia	r4, {r5, r6}
+	sub	r4, r4, r5
+	add	r6, r6, r4
+pen:	ldr	r7, [r6]
+	cmp	r7, r0
+	bne	pen
+
+	/*
+	 * In case L1 cache has unpredictable contents at power-up
+	 * clean its contents without flushing.
+	 */
+	bl      v7_invalidate_l1
+
+	mov	r0, #0
+	mcr	p15, 0, r0, c7, c5, 0	/* Invalidate icache */
+	dsb
+	isb
+
+	/*
+	 * we've been released from the holding pen: secondary_stack
+	 * should now contain the SVC stack for this core
+	 */
+	b	secondary_startup
+ENDPROC(bcm5301x_secondary_startup)
+
+	.align 2
+1:	.long	.
+	.long	pen_release
diff --git a/arch/arm/mach-bcm/bcm5301x_smp.c b/arch/arm/mach-bcm/bcm5301x_smp.c
new file mode 100644
index 0000000..45d7089
--- /dev/null
+++ b/arch/arm/mach-bcm/bcm5301x_smp.c
@@ -0,0 +1,158 @@ 
+/*
+ * Broadcom BCM470X / BCM5301X ARM platform code.
+ *
+ * Copyright (C) 2002 ARM Ltd.
+ * Copyright (C) 2015 Rafał Miłecki <zajec5@gmail.com>
+ *
+ * Licensed under the GNU/GPL. See COPYING for details.
+ */
+
+#include <asm/cacheflush.h>
+#include <asm/delay.h>
+#include <asm/smp_plat.h>
+#include <asm/smp_scu.h>
+
+#include <linux/clockchips.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+
+#define SOC_ROM_LUT_OFF		0x400
+
+extern void bcm5301x_secondary_startup(void);
+
+static void __cpuinit write_pen_release(int val)
+{
+	pen_release = val;
+	smp_wmb();
+	sync_cache_w(&pen_release);
+}
+
+static DEFINE_SPINLOCK(boot_lock);
+
+static void __init bcm5301x_smp_secondary_set_entry(void (*entry_point)(void))
+{
+	void __iomem *sysram_base_addr = NULL;
+	struct device_node *node;
+
+	node = of_find_compatible_node(NULL, NULL, "brcm,bcm4708-sysram");
+	if (!of_device_is_available(node))
+		return;
+
+	sysram_base_addr = of_iomap(node, 0);
+	if (!sysram_base_addr) {
+		pr_warn("Failed to map sysram\n");
+		return;
+	}
+
+	writel(virt_to_phys(entry_point), sysram_base_addr + SOC_ROM_LUT_OFF);
+
+	dsb_sev();	/* Exit WFI */
+	mb();		/* make sure write buffer is drained */
+
+	iounmap(sysram_base_addr);
+}
+
+static void __init bcm5301x_smp_prepare_cpus(unsigned int max_cpus)
+{
+	void __iomem *scu_base;
+
+	if (!scu_a9_has_base()) {
+		pr_warn("Unknown SCU base\n");
+		return;
+	}
+
+	scu_base = ioremap((phys_addr_t)scu_a9_get_base(), SZ_256);
+	if (!scu_base) {
+		pr_err("Failed to remap SCU\n");
+		return;
+	}
+
+	/* Initialise the SCU */
+	scu_enable(scu_base);
+
+	/* Let CPUs know where to start */
+	bcm5301x_smp_secondary_set_entry(bcm5301x_secondary_startup);
+
+	iounmap(scu_base);
+}
+
+static void __cpuinit bcm5301x_smp_secondary_init(unsigned int cpu)
+{
+	trace_hardirqs_off();
+
+	/*
+	 * let the primary processor know we're out of the
+	 * pen, then head off into the C entry point
+	 */
+	write_pen_release(-1);
+
+	/*
+	 * Synchronise with the boot thread.
+	 */
+	spin_lock(&boot_lock);
+	spin_unlock(&boot_lock);
+}
+
+static int __cpuinit bcm5301x_smp_boot_secondary(unsigned int cpu,
+						 struct task_struct *idle)
+{
+	unsigned long timeout;
+
+	/*
+	 * set synchronisation state between this boot processor
+	 * and the secondary one
+	 */
+	spin_lock(&boot_lock);
+
+	/*
+	 * The secondary processor is waiting to be released from
+	 * the holding pen - release it, then wait for it to flag
+	 * that it has been released by resetting pen_release.
+	 *
+	 * Note that "pen_release" is the hardware CPU ID, whereas
+	 * "cpu" is Linux's internal ID.
+	 */
+	write_pen_release(cpu_logical_map(cpu));
+
+	 /* Send the secondary CPU SEV */
+	dsb_sev();
+
+	udelay(100);
+
+	/*
+	 * Send the secondary CPU a soft interrupt, thereby causing
+	 * the boot monitor to read the system wide flags register,
+	 * and branch to the address found there.
+	 */
+	arch_send_wakeup_ipi_mask(cpumask_of(cpu));
+
+	/*
+	 * Timeout set on purpose in jiffies so that on slow processors
+	 * that must also have low HZ it will wait longer.
+	 */
+	timeout = jiffies + (HZ * 10);
+	while (time_before(jiffies, timeout)) {
+		smp_rmb();
+		if (pen_release == -1)
+			break;
+
+		udelay(10);
+	}
+
+	/*
+	 * now the secondary core is starting up let it run its
+	 * calibrations, then wait for it to finish
+	 */
+	spin_unlock(&boot_lock);
+
+	return pen_release != -1 ? -ENOSYS : 0;
+}
+
+static struct smp_operations bcm5301x_smp_ops __initdata = {
+	.smp_prepare_cpus	= bcm5301x_smp_prepare_cpus,
+	.smp_secondary_init	= bcm5301x_smp_secondary_init,
+	.smp_boot_secondary	= bcm5301x_smp_boot_secondary,
+};
+
+CPU_METHOD_OF_DECLARE(bcm5301x_smp, "brcm,bcm4708-smp",
+		      &bcm5301x_smp_ops);
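Hauke's failing log quoted earlier in this thread shows `CPU1: failed to boot: -38` roughly 10 s after the identity-map message. That is consistent with `bcm5301x_smp_boot_secondary()` above: it polls `pen_release` for `HZ * 10` jiffies and returns `-ENOSYS` (38 on Linux) if the secondary never writes -1. The following host-side model sketches that handshake with C11 atomics and a pthread. It assumes a seconds-based timeout in place of jiffies and leaves out `boot_lock` and the wakeup IPI; `boot_secondary()`, `secondary()` and `boot_demo()` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model of pen_release: target hardware CPU ID, or -1. */
static atomic_int pen_release = -1;

static double now_seconds(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Secondary: the ldr/cmp/bne pen loop, then write_pen_release(-1). */
static void *secondary(void *arg)
{
	int hw_id = *(const int *)arg;

	while (atomic_load(&pen_release) != hw_id)
		;	/* busy-wait, as in bcm5301x_secondary_startup */
	atomic_store(&pen_release, -1);
	return NULL;
}

/* Primary: release the pen, then poll for the acknowledgement. */
static int boot_secondary(int hw_id, double timeout_s)
{
	const struct timespec poll = { .tv_sec = 0, .tv_nsec = 10000 }; /* ~udelay(10) */
	double deadline;

	atomic_store(&pen_release, hw_id);
	deadline = now_seconds() + timeout_s;
	while (now_seconds() < deadline) {
		if (atomic_load(&pen_release) == -1)
			break;
		nanosleep(&poll, NULL);
	}
	return atomic_load(&pen_release) != -1 ? -ENOSYS : 0;
}

/* Run one boot attempt for "CPU1", with or without a live secondary. */
int boot_demo(bool secondary_alive, double timeout_s)
{
	int hw_id = 1;
	pthread_t t;
	int ret;

	atomic_store(&pen_release, -1);
	if (secondary_alive && pthread_create(&t, NULL, secondary, &hw_id) != 0)
		return -1;
	ret = boot_secondary(hw_id, timeout_s);
	if (secondary_alive)
		pthread_join(t, NULL);
	/* A live secondary always stores -1 last. */
	assert(!secondary_alive || atomic_load(&pen_release) == -1);
	atomic_store(&pen_release, -1);	/* reset for the next attempt */
	return ret;
}
```

`boot_demo(false, ...)` models a secondary that never reaches `bcm5301x_smp_secondary_init()`. That is presumably what happened when the pen check was removed: the secondary would have run into `secondary_startup` before the primary had set anything up for it, and the primary then timed out.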