Message ID | 546999F9.9070009@web.de |
---|---|
State | RFC |
Headers | show |
On 2014-11-17 07:47, Jan Kiszka wrote: > On 2014-11-10 14:10, Marc Zyngier wrote: >> On 10/11/14 12:57, Jan Kiszka wrote: >>> Hi all, >>> >>> I'm trying to get Marc's CPU hotplug-anabling patch [1] for sunxi >>> working on a B-Pi. After the first discussion it became clear that we >>> need something like flush_dcache_all in the PSCI monitor (I don't think >>> we need an icache flush, do we?). Does anyone have a clever suggestion >> >> No, I-cache can be left alone. >> >>> how to reuse the existing code for that? Or do we really need to >>> re-implement everything, in the worst case in assembly? >> >> Why don't you turn the u-boot code into a set of macros, included by >> both the core u-boot code and the PSCI code? > > I've now ported over v7_flush_dcache_all from the Linux kernel. > > However, that didn't magically solve the remaining issues with your > patch: I'm getting crashes on CPU 0 after handling the shoot-down FIQ. > That is even then the case if I only acknowledge the FIQ on the receiver > side, don't do any fiddling with CPU1's power states. Only if I > disabling sending the FIQ from CPU 1, the system remains stable in a CPU > off/on loop. > > Below the patch I'm using. Any ideas if something is wrong with the FIQ > handler or the setup of this mechanism or whatever? Ping. I'm still seeing no light in this tunnel. One finding below, but maybe a non-issue. > +.globl psci_fiq_enter > +psci_fiq_enter: > + push {r0-r12} > + > + @ Switch to secure > + mrc p15, 0, r7, c1, c1, 0 > + bic r8, r7, #1 > + mcr p15, 0, r8, c1, c1, 0 > + isb > + > + movw r8, #(GICC_BASE & 0xffff) > + movt r8, #(GICC_BASE >> 16) > + ldr r9, [r8, #GICC_IAR] > + movw r10, #0x3ff > + movt r10, #0 > + cmp r9, r10 > + beq out > + movw r10, #0x3fe > + cmp r9, r10 > + beq out > + str r9, [r8, #GICC_EOIR] > + dsb > + > + @ Compute CPU number > + lsr r9, r9, #10 > + and r9, r9, #0xf > + > + movw r8, #(SUN7I_CPUCFG_BASE & 0xffff) > + movt r8, #(SUN7I_CPUCFG_BASE >> 16) > + > + @ Wait for the core to enter WFI > + lsl r11, r9, #6 @ x64 > + add r11, r11, r8 > + > +1: ldr r10, [r11, #0x48] > + tst r10, #(1 << 2) > + bne 2f > + timer_wait r10, ONE_MS > + b 1b > + > + @ Reset CPU > +2: mov r10, #0 > + str r10, [r11, #0x40] > + > + @ Lock CPU > + mov r10, #1 > + lsl r9, r10, r9 @ r9 is now CPU mask > + ldr r10, [r8, #0x1e4] > + bic r10, r10, r9 > + str r10, [r8, #0x1e4] > + > + @ Set power gating > + ldr r10, [r8, #0x1b4] > + orr r10, r10, #1 > + str r10, [r8, #0x1b4] > + timer_wait r10, ONE_MS > + > + @ Activate power clamp > + mov r10, #1 > +1: str r10, [r8, #0x1b0] > + lsl r10, r10, #1 > + orr r10, r10, #1 > + tst r10, #0x100 > + beq 1b > + > + @ Restore security level > +out: mcr p15, 0, r7, c1, c1, 0 There is no isb here - not required? It has no impact on the stability issue, though. Jan
On 25/11/14 10:27, Jan Kiszka wrote: > On 2014-11-17 07:47, Jan Kiszka wrote: >> On 2014-11-10 14:10, Marc Zyngier wrote: >>> On 10/11/14 12:57, Jan Kiszka wrote: >>>> Hi all, >>>> >>>> I'm trying to get Marc's CPU hotplug-anabling patch [1] for sunxi >>>> working on a B-Pi. After the first discussion it became clear that we >>>> need something like flush_dcache_all in the PSCI monitor (I don't think >>>> we need an icache flush, do we?). Does anyone have a clever suggestion >>> >>> No, I-cache can be left alone. >>> >>>> how to reuse the existing code for that? Or do we really need to >>>> re-implement everything, in the worst case in assembly? >>> >>> Why don't you turn the u-boot code into a set of macros, included by >>> both the core u-boot code and the PSCI code? >> >> I've now ported over v7_flush_dcache_all from the Linux kernel. >> >> However, that didn't magically solve the remaining issues with your >> patch: I'm getting crashes on CPU 0 after handling the shoot-down FIQ. >> That is even then the case if I only acknowledge the FIQ on the receiver >> side, don't do any fiddling with CPU1's power states. Only if I >> disabling sending the FIQ from CPU 1, the system remains stable in a CPU >> off/on loop. >> >> Below the patch I'm using. Any ideas if something is wrong with the FIQ >> handler or the setup of this mechanism or whatever? > > Ping. I'm still seeing no light in this tunnel. One finding below, but > maybe a non-issue. Sorry, I haven't had a chance to look at this at all, as the next kernel merge window is getting closer and I still have tons of things to review. What you describe above seems to indicate that it is the FIQ handling that breaks something. Just to understand what you're observing: - does CPU0 always lock-up? - if not, can you find out if it locks up on the "out" path or not? I vaguely remember seeing issues in the "wait" loop below, but my memory is very fuzzy... Any chance you could instrument this? >> +.globl psci_fiq_enter >> +psci_fiq_enter: >> + push {r0-r12} >> + >> + @ Switch to secure >> + mrc p15, 0, r7, c1, c1, 0 >> + bic r8, r7, #1 >> + mcr p15, 0, r8, c1, c1, 0 >> + isb >> + >> + movw r8, #(GICC_BASE & 0xffff) >> + movt r8, #(GICC_BASE >> 16) >> + ldr r9, [r8, #GICC_IAR] >> + movw r10, #0x3ff >> + movt r10, #0 >> + cmp r9, r10 >> + beq out >> + movw r10, #0x3fe >> + cmp r9, r10 >> + beq out >> + str r9, [r8, #GICC_EOIR] >> + dsb >> + >> + @ Compute CPU number >> + lsr r9, r9, #10 >> + and r9, r9, #0xf >> + >> + movw r8, #(SUN7I_CPUCFG_BASE & 0xffff) >> + movt r8, #(SUN7I_CPUCFG_BASE >> 16) >> + >> + @ Wait for the core to enter WFI >> + lsl r11, r9, #6 @ x64 >> + add r11, r11, r8 >> + >> +1: ldr r10, [r11, #0x48] >> + tst r10, #(1 << 2) >> + bne 2f >> + timer_wait r10, ONE_MS >> + b 1b >> + >> + @ Reset CPU >> +2: mov r10, #0 >> + str r10, [r11, #0x40] >> + >> + @ Lock CPU >> + mov r10, #1 >> + lsl r9, r10, r9 @ r9 is now CPU mask >> + ldr r10, [r8, #0x1e4] >> + bic r10, r10, r9 >> + str r10, [r8, #0x1e4] >> + >> + @ Set power gating >> + ldr r10, [r8, #0x1b4] >> + orr r10, r10, #1 >> + str r10, [r8, #0x1b4] >> + timer_wait r10, ONE_MS >> + >> + @ Activate power clamp >> + mov r10, #1 >> +1: str r10, [r8, #0x1b0] >> + lsl r10, r10, #1 >> + orr r10, r10, #1 >> + tst r10, #0x100 >> + beq 1b >> + >> + @ Restore security level >> +out: mcr p15, 0, r7, c1, c1, 0 > > There is no isb here - not required? It has no impact on the stability > issue, though. The following instructions contain an exception return (movs pc, lr), which acts implicitly as an isb. Thanks, M.
On 2014-11-25 11:51, Marc Zyngier wrote: > On 25/11/14 10:27, Jan Kiszka wrote: >> On 2014-11-17 07:47, Jan Kiszka wrote: >>> On 2014-11-10 14:10, Marc Zyngier wrote: >>>> On 10/11/14 12:57, Jan Kiszka wrote: >>>>> Hi all, >>>>> >>>>> I'm trying to get Marc's CPU hotplug-anabling patch [1] for sunxi >>>>> working on a B-Pi. After the first discussion it became clear that we >>>>> need something like flush_dcache_all in the PSCI monitor (I don't think >>>>> we need an icache flush, do we?). Does anyone have a clever suggestion >>>> >>>> No, I-cache can be left alone. >>>> >>>>> how to reuse the existing code for that? Or do we really need to >>>>> re-implement everything, in the worst case in assembly? >>>> >>>> Why don't you turn the u-boot code into a set of macros, included by >>>> both the core u-boot code and the PSCI code? >>> >>> I've now ported over v7_flush_dcache_all from the Linux kernel. >>> >>> However, that didn't magically solve the remaining issues with your >>> patch: I'm getting crashes on CPU 0 after handling the shoot-down FIQ. >>> That is even then the case if I only acknowledge the FIQ on the receiver >>> side, don't do any fiddling with CPU1's power states. Only if I >>> disabling sending the FIQ from CPU 1, the system remains stable in a CPU >>> off/on loop. >>> >>> Below the patch I'm using. Any ideas if something is wrong with the FIQ >>> handler or the setup of this mechanism or whatever? >> >> Ping. I'm still seeing no light in this tunnel. One finding below, but >> maybe a non-issue. > > Sorry, I haven't had a chance to look at this at all, as the next kernel > merge window is getting closer and I still have tons of things to review. I understand, no problem. > > What you describe above seems to indicate that it is the FIQ handling > that breaks something. Just to understand what you're observing: > - does CPU0 always lock-up? > - if not, can you find out if it locks up on the "out" path or not? Only after a few offline/online cycles. Here is an example of the bug report: [ 101.259825] Internal error: Oops: 817 [#1] SMP ARM [ 101.264678] Modules linked in: carl9170 ath [ 101.268934] CPU: 0 PID: 57 Comm: kworker/1:2 Not tainted 3.18.0-rc3-00087-g8c27ecb #12 [ 101.276895] Workqueue: events cpuset_hotplug_workfn [ 101.281806] task: dd614ec0 ti: dd664000 task.ti: dd664000 [ 101.287230] PC is at register_leaf_sysctl_tables+0x128/0x1b4 [ 101.292914] LR is at _raw_spin_unlock+0x30/0x4c [ 101.297465] pc : [<c01b1e04>] lr : [<c059c3c8>] psr: a00f0013 [ 101.297465] sp : dd665c40 ip : dd665be8 fp : dd665c84 [ 101.308975] r10: dc48b428 r9 : dbd40900 r8 : dc43f000 [ 101.314219] r7 : dd665c9c r6 : 00000000 r5 : dc48b5e4 r4 : 00000000 [ 101.320767] r3 : 00000000 r2 : 00000000 r1 : 00000001 r0 : dc48b580 [ 101.327316] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel [ 101.334651] Control: 10c5387d Table: 5d7c806a DAC: 00000015 [ 101.340416] Process kworker/1:2 (pid: 57, stack limit = 0xdd664240) [ 101.346704] Stack: (0xdd665c40 to 0xdd666000) [ 101.351085] 5c40: dd665c6c dd665c50 c01b0048 c0876a84 dc43f019 00000000 dc43f000 00000000 [ 101.359295] 5c60: dbd40924 00000002 dd665c9c dc43f000 dbd40900 dc48b428 dd665ccc dd665c88 [ 101.367505] 5c80: c01b1e5c c01b1ce8 dc48b5c0 c01b0000 000080d0 c0876a84 dc43f014 00000000 [ 101.375715] 5ca0: dd665ccc dc43f000 dc48b400 c085e730 c0876a84 dc43f014 dbd40900 dc48b428 [ 101.383924] 5cc0: dd665ccc dd665cd0 c01b1fcc c01b1ce8 dbd40900 dd665ce0 c01b2064 c01b2024 [ 101.392134] 5ce0: dd665d6c dd665cf0 c004d0c0 c01b2050 dec03960 dc48b640 c0852490 c0848a80 [ 101.400343] 5d00: 31757063 de001e00 000072ab 00000001 dd665d4c dd665d20 c013c828 c013bd30 [ 101.408553] 5d20: 00000000 dc4cb758 c085e6f4 00000000 dd665d6c dd665d40 c013c624 c0085d84 [ 101.416762] 5d40: c08aa504 c08aa2c0 dc48b4c0 dc48bbc0 00000000 00000001 00000001 dc48b4c4 [ 101.424971] 5d60: dd665dbc dd665d70 c0055070 c004cc70 dc48bbc0 c108c750 c0865e60 00000001 [ 101.433180] 5d80: c08aa504 00000000 00000000 00000001 00000001 00000001 00000000 dc48b4c0 [ 101.441389] 5da0: dc48bbc0 c108c750 c0865e60 00000001 dd665e14 dd665dc0 c00b7eac c0054d38 [ 101.449599] 5dc0: 00000001 00000000 c00ba478 00000002 dd665e0c dd614ec0 c00b0ed8 c0085d84 [ 101.457808] 5de0: 00000000 c0865f20 dd665e2c c0865e60 00000001 00000001 c108b2b8 c108c750 [ 101.466017] 5e00: c108c750 00000001 dd665e2c dd665e18 c00ba47c c00b7a70 00000000 00000001 [ 101.474227] 5e20: dd665e8c dd665e30 c00bac94 c00ba460 00000000 00000000 c00ba59c c00702b0 [ 101.482436] 5e40: 00000001 00000000 00000000 c003e598 00000000 de851480 dd665e74 600f0013 [ 101.490645] 5e60: c006ff90 dd644a80 c0865fdc dd664010 de851480 dd664000 de854c00 00000000 [ 101.498855] 5e80: dd665eec dd665e90 c003e724 c00ba498 00000001 00000000 c003e598 de851480 [ 101.507064] 5ea0: c003f8c4 00000000 00000000 c08a5c18 c0865fdc 00000000 00000000 c06e8114 [ 101.515274] 5ec0: c059c314 de851480 de851480 de8514b0 dd664030 dd644a98 dd644a80 00000000 [ 101.523483] 5ee0: dd665f24 dd665ef0 c003fb70 c003e3cc 00000000 dd644a80 c003f88c dd6511c0 [ 101.531692] 5f00: 00000000 dd644a80 c003f88c 00000000 00000000 00000000 dd665fac dd665f28 [ 101.539901] 5f20: c0044924 c003f898 dd665f44 00000000 c006ff90 dd644a80 00000000 00000000 [ 101.548111] 5f40: dead4ead ffffffff ffffffff c08aa254 00000000 00000000 c06dcfb4 dd665f5c [ 101.556320] 5f60: dd665f5c 00000000 00000000 dead4ead ffffffff ffffffff c08aa254 00000000 [ 101.564529] 5f80: 00000000 c06dcfb4 dd665f88 dd665f88 dd6511c0 c0044848 00000000 00000000 [ 101.572738] 5fa0: 00000000 dd665fb0 c000e4e8 c0044854 00000000 00000000 00000000 00000000 [ 101.580946] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 101.589154] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000 [ 101.597377] [<c01b1e04>] (register_leaf_sysctl_tables) from [<c01b1e5c>] (register_leaf_sysctl_tables+0x180/0x1b4) [ 101.607767] [<c01b1e5c>] (register_leaf_sysctl_tables) from [<c01b1fcc>] (__register_sysctl_paths+0x13c/0x188) [ 101.617802] Code: ebfe29e3 ea00001f e5804014 e5973000 (e5830000) [ 101.624400] ---[ end trace b5dc3024edfeff4c ]--- It's apparently always outside of the handler, after returning from it. I'm starting to suspect something is wrong with the stack the handler uses and, thus, with the Linux registers it restores on return. > > I vaguely remember seeing issues in the "wait" loop below, but my > memory is very fuzzy... Any chance you could instrument this? I tried but right now I'm not sure what to look for. > >>> +.globl psci_fiq_enter >>> +psci_fiq_enter: >>> + push {r0-r12} >>> + >>> + @ Switch to secure >>> + mrc p15, 0, r7, c1, c1, 0 >>> + bic r8, r7, #1 >>> + mcr p15, 0, r8, c1, c1, 0 >>> + isb >>> + >>> + movw r8, #(GICC_BASE & 0xffff) >>> + movt r8, #(GICC_BASE >> 16) >>> + ldr r9, [r8, #GICC_IAR] >>> + movw r10, #0x3ff >>> + movt r10, #0 >>> + cmp r9, r10 >>> + beq out >>> + movw r10, #0x3fe >>> + cmp r9, r10 >>> + beq out >>> + str r9, [r8, #GICC_EOIR] >>> + dsb >>> + >>> + @ Compute CPU number >>> + lsr r9, r9, #10 >>> + and r9, r9, #0xf >>> + >>> + movw r8, #(SUN7I_CPUCFG_BASE & 0xffff) >>> + movt r8, #(SUN7I_CPUCFG_BASE >> 16) >>> + >>> + @ Wait for the core to enter WFI >>> + lsl r11, r9, #6 @ x64 >>> + add r11, r11, r8 >>> + >>> +1: ldr r10, [r11, #0x48] >>> + tst r10, #(1 << 2) >>> + bne 2f >>> + timer_wait r10, ONE_MS >>> + b 1b >>> + >>> + @ Reset CPU >>> +2: mov r10, #0 >>> + str r10, [r11, #0x40] >>> + >>> + @ Lock CPU >>> + mov r10, #1 >>> + lsl r9, r10, r9 @ r9 is now CPU mask >>> + ldr r10, [r8, #0x1e4] >>> + bic r10, r10, r9 >>> + str r10, [r8, #0x1e4] >>> + >>> + @ Set power gating >>> + ldr r10, [r8, #0x1b4] >>> + orr r10, r10, #1 >>> + str r10, [r8, #0x1b4] >>> + timer_wait r10, ONE_MS >>> + >>> + @ Activate power clamp >>> + mov r10, #1 >>> +1: str r10, [r8, #0x1b0] >>> + lsl r10, r10, #1 >>> + orr r10, r10, #1 >>> + tst r10, #0x100 >>> + beq 1b >>> + >>> + @ Restore security level >>> +out: mcr p15, 0, r7, c1, c1, 0 >> >> There is no isb here - not required? It has no impact on the stability >> issue, though. > > The following instructions contain an exception return (movs pc, lr), > which acts implicitly as an isb. Ah, ok, thanks. Jan
On 2014-11-17 07:47, Jan Kiszka wrote: > On 2014-11-10 14:10, Marc Zyngier wrote: >> On 10/11/14 12:57, Jan Kiszka wrote: >>> Hi all, >>> >>> I'm trying to get Marc's CPU hotplug-anabling patch [1] for sunxi >>> working on a B-Pi. After the first discussion it became clear that we >>> need something like flush_dcache_all in the PSCI monitor (I don't think >>> we need an icache flush, do we?). Does anyone have a clever suggestion >> >> No, I-cache can be left alone. >> >>> how to reuse the existing code for that? Or do we really need to >>> re-implement everything, in the worst case in assembly? >> >> Why don't you turn the u-boot code into a set of macros, included by >> both the core u-boot code and the PSCI code? > > I've now ported over v7_flush_dcache_all from the Linux kernel. > > However, that didn't magically solve the remaining issues with your > patch: I'm getting crashes on CPU 0 after handling the shoot-down FIQ. > That is even then the case if I only acknowledge the FIQ on the receiver > side, don't do any fiddling with CPU1's power states. Only if I > disabling sending the FIQ from CPU 1, the system remains stable in a CPU > off/on loop. > > Below the patch I'm using. Any ideas if something is wrong with the FIQ > handler or the setup of this mechanism or whatever? > > Jan > > --- > > diff --git a/arch/arm/cpu/armv7/sunxi/psci.S b/arch/arm/cpu/armv7/sunxi/psci.S > index b9ea78b..0e4bca4 100644 > --- a/arch/arm/cpu/armv7/sunxi/psci.S > +++ b/arch/arm/cpu/armv7/sunxi/psci.S > @@ -18,6 +18,7 @@ > */ > > #include <config.h> > +#include <asm/gic.h> > #include <asm/psci.h> > #include <asm/arch/cpu.h> > > @@ -38,6 +39,8 @@ > > #define ONE_MS (CONFIG_SYS_CLK_FREQ / 1000) > #define TEN_MS (10 * ONE_MS) > +#define GICD_BASE 0x1c81000 > +#define GICC_BASE 0x1c82000 > > .macro timer_wait reg, ticks > @ Program CNTP_TVAL > @@ -61,7 +64,27 @@ > > .globl psci_arch_init > psci_arch_init: > + movw r4, #(GICD_BASE & 0xffff) > + movt r4, #(GICD_BASE >> 16) > + > + ldr r5, [r4, #GICD_IGROUPRn] > + bic r5, r5, #(1 << 15) @ SGI15 as Group-0 > + str r5, [r4, #GICD_IGROUPRn] > + > + mov r5, #0 @ Set SGI15 priority to 0 > + strb r5, [r4, #(GICD_IPRIORITYRn + 15)] > + > + add r4, r4, #0x1000 @ GICC address > + > + mov r5, #0xff > + str r5, [r4, #GICC_PMR] @ Be cool with non-secure > + > + ldr r5, [r4, #GICC_CTLR] > + orr r5, r5, #(1 << 3) @ Switch FIQEn on > + str r5, [r4, #GICC_CTLR] > + > mrc p15, 0, r5, c1, c1, 0 @ Read SCR > + orr r5, r5, #4 @ Enable FIQ in monitor mode > bic r5, r5, #1 @ Secure mode > mcr p15, 0, r5, c1, c1, 0 @ Write SCR > isb > @@ -79,6 +102,77 @@ psci_arch_init: > > bx lr > > +.globl psci_fiq_enter > +psci_fiq_enter: > + push {r0-r12} > + > + @ Switch to secure > + mrc p15, 0, r7, c1, c1, 0 > + bic r8, r7, #1 > + mcr p15, 0, r8, c1, c1, 0 > + isb > + > + movw r8, #(GICC_BASE & 0xffff) > + movt r8, #(GICC_BASE >> 16) > + ldr r9, [r8, #GICC_IAR] > + movw r10, #0x3ff > + movt r10, #0 > + cmp r9, r10 > + beq out > + movw r10, #0x3fe > + cmp r9, r10 > + beq out > + str r9, [r8, #GICC_EOIR] > + dsb > + > + @ Compute CPU number > + lsr r9, r9, #10 > + and r9, r9, #0xf > + > + movw r8, #(SUN7I_CPUCFG_BASE & 0xffff) > + movt r8, #(SUN7I_CPUCFG_BASE >> 16) > + > + @ Wait for the core to enter WFI > + lsl r11, r9, #6 @ x64 > + add r11, r11, r8 > + > +1: ldr r10, [r11, #0x48] > + tst r10, #(1 << 2) > + bne 2f > + timer_wait r10, ONE_MS > + b 1b > + > + @ Reset CPU > +2: mov r10, #0 > + str r10, [r11, #0x40] > + > + @ Lock CPU > + mov r10, #1 > + lsl r9, r10, r9 @ r9 is now CPU mask > + ldr r10, [r8, #0x1e4] > + bic r10, r10, r9 > + str r10, [r8, #0x1e4] > + > + @ Set power gating > + ldr r10, [r8, #0x1b4] > + orr r10, r10, #1 > + str r10, [r8, #0x1b4] > + timer_wait r10, ONE_MS > + > + @ Activate power clamp > + mov r10, #1 > +1: str r10, [r8, #0x1b0] > + lsl r10, r10, #1 > + orr r10, r10, #1 > + tst r10, #0x100 > + beq 1b > + > + @ Restore security level > +out: mcr p15, 0, r7, c1, c1, 0 > + > + pop {r0-r12} > + movs pc, lr We have to use "subs pc, lr, #4" to return from FIQ. I guess you can explain better than I why, I only stumbled over this in a stackoverflow posting - and it works! Jan
On 26/11/14 08:44, Jan Kiszka wrote: > On 2014-11-17 07:47, Jan Kiszka wrote: >> On 2014-11-10 14:10, Marc Zyngier wrote: >>> On 10/11/14 12:57, Jan Kiszka wrote: >>>> Hi all, >>>> >>>> I'm trying to get Marc's CPU hotplug-anabling patch [1] for sunxi >>>> working on a B-Pi. After the first discussion it became clear that we >>>> need something like flush_dcache_all in the PSCI monitor (I don't think >>>> we need an icache flush, do we?). Does anyone have a clever suggestion >>> >>> No, I-cache can be left alone. >>> >>>> how to reuse the existing code for that? Or do we really need to >>>> re-implement everything, in the worst case in assembly? >>> >>> Why don't you turn the u-boot code into a set of macros, included by >>> both the core u-boot code and the PSCI code? >> >> I've now ported over v7_flush_dcache_all from the Linux kernel. >> >> However, that didn't magically solve the remaining issues with your >> patch: I'm getting crashes on CPU 0 after handling the shoot-down FIQ. >> That is even then the case if I only acknowledge the FIQ on the receiver >> side, don't do any fiddling with CPU1's power states. Only if I >> disabling sending the FIQ from CPU 1, the system remains stable in a CPU >> off/on loop. >> >> Below the patch I'm using. Any ideas if something is wrong with the FIQ >> handler or the setup of this mechanism or whatever? >> >> Jan >> >> --- >> >> diff --git a/arch/arm/cpu/armv7/sunxi/psci.S b/arch/arm/cpu/armv7/sunxi/psci.S >> index b9ea78b..0e4bca4 100644 >> --- a/arch/arm/cpu/armv7/sunxi/psci.S >> +++ b/arch/arm/cpu/armv7/sunxi/psci.S >> @@ -18,6 +18,7 @@ >> */ >> >> #include <config.h> >> +#include <asm/gic.h> >> #include <asm/psci.h> >> #include <asm/arch/cpu.h> >> >> @@ -38,6 +39,8 @@ >> >> #define ONE_MS (CONFIG_SYS_CLK_FREQ / 1000) >> #define TEN_MS (10 * ONE_MS) >> +#define GICD_BASE 0x1c81000 >> +#define GICC_BASE 0x1c82000 >> >> .macro timer_wait reg, ticks >> @ Program CNTP_TVAL >> @@ -61,7 +64,27 @@ >> >> .globl psci_arch_init >> psci_arch_init: >> + movw r4, #(GICD_BASE & 0xffff) >> + movt r4, #(GICD_BASE >> 16) >> + >> + ldr r5, [r4, #GICD_IGROUPRn] >> + bic r5, r5, #(1 << 15) @ SGI15 as Group-0 >> + str r5, [r4, #GICD_IGROUPRn] >> + >> + mov r5, #0 @ Set SGI15 priority to 0 >> + strb r5, [r4, #(GICD_IPRIORITYRn + 15)] >> + >> + add r4, r4, #0x1000 @ GICC address >> + >> + mov r5, #0xff >> + str r5, [r4, #GICC_PMR] @ Be cool with non-secure >> + >> + ldr r5, [r4, #GICC_CTLR] >> + orr r5, r5, #(1 << 3) @ Switch FIQEn on >> + str r5, [r4, #GICC_CTLR] >> + >> mrc p15, 0, r5, c1, c1, 0 @ Read SCR >> + orr r5, r5, #4 @ Enable FIQ in monitor mode >> bic r5, r5, #1 @ Secure mode >> mcr p15, 0, r5, c1, c1, 0 @ Write SCR >> isb >> @@ -79,6 +102,77 @@ psci_arch_init: >> >> bx lr >> >> +.globl psci_fiq_enter >> +psci_fiq_enter: >> + push {r0-r12} >> + >> + @ Switch to secure >> + mrc p15, 0, r7, c1, c1, 0 >> + bic r8, r7, #1 >> + mcr p15, 0, r8, c1, c1, 0 >> + isb >> + >> + movw r8, #(GICC_BASE & 0xffff) >> + movt r8, #(GICC_BASE >> 16) >> + ldr r9, [r8, #GICC_IAR] >> + movw r10, #0x3ff >> + movt r10, #0 >> + cmp r9, r10 >> + beq out >> + movw r10, #0x3fe >> + cmp r9, r10 >> + beq out >> + str r9, [r8, #GICC_EOIR] >> + dsb >> + >> + @ Compute CPU number >> + lsr r9, r9, #10 >> + and r9, r9, #0xf >> + >> + movw r8, #(SUN7I_CPUCFG_BASE & 0xffff) >> + movt r8, #(SUN7I_CPUCFG_BASE >> 16) >> + >> + @ Wait for the core to enter WFI >> + lsl r11, r9, #6 @ x64 >> + add r11, r11, r8 >> + >> +1: ldr r10, [r11, #0x48] >> + tst r10, #(1 << 2) >> + bne 2f >> + timer_wait r10, ONE_MS >> + b 1b >> + >> + @ Reset CPU >> +2: mov r10, #0 >> + str r10, [r11, #0x40] >> + >> + @ Lock CPU >> + mov r10, #1 >> + lsl r9, r10, r9 @ r9 is now CPU mask >> + ldr r10, [r8, #0x1e4] >> + bic r10, r10, r9 >> + str r10, [r8, #0x1e4] >> + >> + @ Set power gating >> + ldr r10, [r8, #0x1b4] >> + orr r10, r10, #1 >> + str r10, [r8, #0x1b4] >> + timer_wait r10, ONE_MS >> + >> + @ Activate power clamp >> + mov r10, #1 >> +1: str r10, [r8, #0x1b0] >> + lsl r10, r10, #1 >> + orr r10, r10, #1 >> + tst r10, #0x100 >> + beq 1b >> + >> + @ Restore security level >> +out: mcr p15, 0, r7, c1, c1, 0 >> + >> + pop {r0-r12} >> + movs pc, lr > > We have to use "subs pc, lr, #4" to return from FIQ. I guess you can > explain better than I why, I only stumbled over this in a stackoverflow > posting - and it works! Of course!!! I wrote it just like the result of an SMC call, but that's an interrupt, and the stupid "fix the 3-stage pipeline yourself" rule applies. Very nice catch! For those who are now thinking "WTF??", this is explained in the ARM ARM (ARMv7 version), section B1.8.3, table B1-7. Thanks again, M.
diff --git a/arch/arm/cpu/armv7/sunxi/psci.S b/arch/arm/cpu/armv7/sunxi/psci.S index b9ea78b..0e4bca4 100644 --- a/arch/arm/cpu/armv7/sunxi/psci.S +++ b/arch/arm/cpu/armv7/sunxi/psci.S @@ -18,6 +18,7 @@ */ #include <config.h> +#include <asm/gic.h> #include <asm/psci.h> #include <asm/arch/cpu.h> @@ -38,6 +39,8 @@ #define ONE_MS (CONFIG_SYS_CLK_FREQ / 1000) #define TEN_MS (10 * ONE_MS) +#define GICD_BASE 0x1c81000 +#define GICC_BASE 0x1c82000 .macro timer_wait reg, ticks @ Program CNTP_TVAL @@ -61,7 +64,27 @@ .globl psci_arch_init psci_arch_init: + movw r4, #(GICD_BASE & 0xffff) + movt r4, #(GICD_BASE >> 16) + + ldr r5, [r4, #GICD_IGROUPRn] + bic r5, r5, #(1 << 15) @ SGI15 as Group-0 + str r5, [r4, #GICD_IGROUPRn] + + mov r5, #0 @ Set SGI15 priority to 0 + strb r5, [r4, #(GICD_IPRIORITYRn + 15)] + + add r4, r4, #0x1000 @ GICC address + + mov r5, #0xff + str r5, [r4, #GICC_PMR] @ Be cool with non-secure + + ldr r5, [r4, #GICC_CTLR] + orr r5, r5, #(1 << 3) @ Switch FIQEn on + str r5, [r4, #GICC_CTLR] + mrc p15, 0, r5, c1, c1, 0 @ Read SCR + orr r5, r5, #4 @ Enable FIQ in monitor mode bic r5, r5, #1 @ Secure mode mcr p15, 0, r5, c1, c1, 0 @ Write SCR isb @@ -79,6 +102,77 @@ psci_arch_init: bx lr +.globl psci_fiq_enter +psci_fiq_enter: + push {r0-r12} + + @ Switch to secure + mrc p15, 0, r7, c1, c1, 0 + bic r8, r7, #1 + mcr p15, 0, r8, c1, c1, 0 + isb + + movw r8, #(GICC_BASE & 0xffff) + movt r8, #(GICC_BASE >> 16) + ldr r9, [r8, #GICC_IAR] + movw r10, #0x3ff + movt r10, #0 + cmp r9, r10 + beq out + movw r10, #0x3fe + cmp r9, r10 + beq out + str r9, [r8, #GICC_EOIR] + dsb + + @ Compute CPU number + lsr r9, r9, #10 + and r9, r9, #0xf + + movw r8, #(SUN7I_CPUCFG_BASE & 0xffff) + movt r8, #(SUN7I_CPUCFG_BASE >> 16) + + @ Wait for the core to enter WFI + lsl r11, r9, #6 @ x64 + add r11, r11, r8 + +1: ldr r10, [r11, #0x48] + tst r10, #(1 << 2) + bne 2f + timer_wait r10, ONE_MS + b 1b + + @ Reset CPU +2: mov r10, #0 + str r10, [r11, #0x40] + + @ Lock CPU + mov r10, #1 + lsl r9, r10, r9 @ r9 is now CPU mask + ldr r10, [r8, #0x1e4] + bic r10, r10, r9 + str r10, [r8, #0x1e4] + + @ Set power gating + ldr r10, [r8, #0x1b4] + orr r10, r10, #1 + str r10, [r8, #0x1b4] + timer_wait r10, ONE_MS + + @ Activate power clamp + mov r10, #1 +1: str r10, [r8, #0x1b0] + lsl r10, r10, #1 + orr r10, r10, #1 + tst r10, #0x100 + beq 1b + + @ Restore security level +out: mcr p15, 0, r7, c1, c1, 0 + + pop {r0-r12} + movs pc, lr + @ r1 = target CPU @ r2 = target PC .globl psci_cpu_on @@ -144,6 +238,57 @@ psci_cpu_on: _target_pc: .word 0 +# Imported from Linux kernel +v7_flush_dcache_all: + dmb @ ensure ordering with previous memory accesses + mrc p15, 1, r0, c0, c0, 1 @ read clidr + ands r3, r0, #0x7000000 @ extract loc from clidr + mov r3, r3, lsr #23 @ left align loc bit field + beq finished @ if loc is 0, then no need to clean + mov r10, #0 @ start clean at cache level 0 +flush_levels: + add r2, r10, r10, lsr #1 @ work out 3x current cache level + mov r1, r0, lsr r2 @ extract cache type bits from clidr + and r1, r1, #7 @ mask of the bits for current cache only + cmp r1, #2 @ see what cache we have at this level + blt skip @ skip if no cache, or just i-cache + mrs r9, cpsr @ make cssr&csidr read atomic + mcr p15, 2, r10, c0, c0, 0 @ select current cache level in cssr + isb @ isb to sych the new cssr&csidr + mrc p15, 1, r1, c0, c0, 0 @ read the new csidr + msr cpsr_c, r9 + and r2, r1, #7 @ extract the length of the cache lines + add r2, r2, #4 @ add 4 (line length offset) + ldr r4, =0x3ff + ands r4, r4, r1, lsr #3 @ find maximum number on the way size + clz r5, r4 @ find bit position of way size increment + ldr r7, =0x7fff + ands r7, r7, r1, lsr #13 @ extract max number of the index size +loop1: + mov r9, r7 @ create working copy of max index +loop2: + orr r11, r10, r4, lsl r5 @ factor way and cache number into r11 + orr r11, r11, r9, lsl r2 @ factor index number into r11 + mcr p15, 0, r11, c7, c14, 2 @ clean & invalidate by set/way + subs r9, r9, #1 @ decrement the index + bge loop2 + subs r4, r4, #1 @ decrement the way + bge loop1 +skip: + add r10, r10, #2 @ increment cache number + cmp r3, r10 + bgt flush_levels +finished: + mov r10, #0 @ swith back to cache level 0 + mcr p15, 2, r10, c0, c0, 0 @ select current cache level in cssr + dsb st + isb +movw r0, #0x8000 +movt r0, #0x01c2 +mov r1, #'!' +str r1, [r0] + bx lr + _sunxi_cpu_entry: @ Set SMP bit mrc p15, 0, r0, c1, c0, 1 @@ -158,5 +303,34 @@ _sunxi_cpu_entry: ldr r0, [r0] b _do_nonsec_entry +.globl psci_cpu_off +psci_cpu_off: + mrc p15, 0, r0, c1, c0, 0 @ SCTLR + bic r0, r0, #(1 << 2) @ Clear C bit + mcr p15, 0, r0, c1, c0, 0 @ SCTLR + isb + dsb + + bl v7_flush_dcache_all + + clrex @ Why??? + + mrc p15, 0, r0, c1, c0, 1 @ ACTLR + bic r0, r0, #(1 << 6) @ Clear SMP bit + mcr p15, 0, r0, c1, c0, 1 @ ACTLR + isb + dsb + + @ Ask CPU0 to pull the rug... + movw r0, #(GICD_BASE & 0xffff) + movt r0, #(GICD_BASE >> 16) + movw r1, #15 @ SGI15 + movt r1, #1 @ Target is CPU0 + str r1, [r0, #GICD_SGIR] + dsb + +1: wfi + b 1b + text_end: .popsection