
State of ARM FIQ in Qemu

Message ID CAOgzsHWxMVNNns2UBiUbYdiVd8U_FZbc+20Xmtbca1nhEEKWbw@mail.gmail.com
State New

Commit Message

Greg Bellows Nov. 3, 2014, 6:46 p.m. UTC
On 3 November 2014 10:22, Tim Sander <tim@krieglstein.org> wrote:

> Hi Greg
>
> Thanks for your fast reply.
> > I am still in the process of getting the security extension portion of the
> > GIC patches fully up and running.  By the sounds of your use, it sounds
> > like you just want FIQ support not necessarily secure GIC support.  Would
> > this be correct?
> Yes. To elaborate, I am working on a modified Cortex-A9 Versatile Express,
> where I added my virtual test hardware.
>
> > I recently sent out an updated set of patches for review that contain GIC
> > interrupt grouping and FIQ enablement along with secure extension
> > infrastructure.  If interested, you can find the patches here:
> >
> > http://lists.nongnu.org/archive/html/qemu-devel/2014-10/msg03921.html
> >
> > Alternatively, it sounds like you have access to the Linaro GIT repos, in
> > which case you can use the following repo/branch that contains the same
> > patches.  It is based on fairly recent upstream bits.
> >
> > repo: git://git.linaro.org/people/greg.bellows/qemu.git
> > branch: tzqemu_gic_v2
> >
> > If you don't need the security extensions, then you shouldn't need to do
> > anything to the code to get FIQ support on vexpress-a9/15 or virt machines.
> OK, but I think I see a RAZ code path in QEMU when accessing the GIC
> registers that configure the interrupt group.


> > Please let me know if you have any further questions or issues.
> I have the problem that the security_extn property is not set, and I have
> not figured out a way to set it.  The corresponding code is a slightly
> modified vexpress_common_init in hw/arm/vexpress.c:519.
>
> I guess setting the property would be done by
> qdev_prop_set_bool(dev,"security_extn",TRUE);
> but I cannot find the GIC's "dev" to pass as the argument.
>
> Attached is also a snippet from a debugger run verifying that it is indeed
> s->security_extn that is missing.
>
>
Ah... Yes, using A9 (GICv1) which means you don't have grouping without the
security extensions.  I tried enabling the security extensions and things
hung; however, I was able to boot A9 Linux and use FIQs with the following
change:

 diff --git a/hw/cpu/a9mpcore.c b/hw/cpu/a9mpcore.c
index c09358c..813ae92 100644

This may at least be a workaround for you while I figure out where the
security configuration gets hung up.  Can you give this a try and see if
you can make progress?

The security extensions aspect of the code is fairly untested as I still
need secure address space support, so there may be glitches when security
is enabled.
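The condition in the gic_dist_writeb listing quoted below is only partly visible (line 827 is missing from the gdb output), but its comments state the rule: the Interrupt Group Registers are RAZ/WI for a Non-secure access to a GIC with the Security Extensions, and for a GICv1 without them. A minimal standalone model of that rule, assuming a hypothetical function name and with ns_access passed in as a plain flag rather than QEMU's helper, looks like this:

```c
#include <stdbool.h>

/* Hypothetical model (not QEMU code) of the rule stated in the comments of
 * gic_dist_writeb(): the Interrupt Group Registers (distributor offsets
 * 0x80 and up) read as zero and ignore writes (RAZ/WI) when
 *   - the GIC has the Security Extensions and the access is Non-secure, or
 *   - the GIC is a GICv1 without the Security Extensions.
 * 'revision' is 1 or 2, mirroring s->revision in the listing. */
bool gic_group_regs_raz_wi(bool security_extn, bool ns_access, int revision)
{
    if (security_extn && ns_access) {
        return true;   /* Non-secure view of a GIC with Security Extensions */
    }
    if (!security_extn && revision == 1) {
        return true;   /* GICv1 has no grouping without Security Extensions */
    }
    return false;      /* group registers are writable */
}
```

With the values from the debugger session (security_extn false on the A9's GICv1) the model returns true, so the group write is dropped, matching the RAZ path Tim observed; with revision 2 it returns false.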

> Best regards
> Tim
>
> Breakpoint 3, gic_dist_writeb (opaque=0x555556368a80, offset=136, value=0) at hw/intc/arm_gic.c:820
> 820             } else if (offset >= 0x80) {
> (gdb) list
> 815                     s->enabled = (value & 0x1);
> 816                     DPRINTF("Distribution %sabled\n", s->enabled ? "En" : "Dis");
> 817                 }
> 818             } else if (offset < 4) {
> 819                 /* ignored.  */
> 820             } else if (offset >= 0x80) {
> 821                 /* Interrupt Group Registers
> 822                  *
> 823                  * For GIC with Security Extn and Non-secure access RAZ/WI
> 824                  * For GICv1 without Security Extn RAZ/WI
> (gdb) n
> 826                 if (!(s->security_extn && ns_access()) &&
> (gdb) n
> 828                                 || s->revision == 2)) {
> (gdb) n
> 999         gic_update(s);
> (gdb) print s->security_extn
> $2 = false
>
>

Comments

Tim Sander Nov. 4, 2014, 3:40 p.m. UTC | #1
Hi Greg
> Ah... Yes, using A9 (GICv1) which means you don't have grouping without the
> security extensions. 
OK, switching the GIC to version 2 seems to work, in the sense that Linux still
boots up and I get an FIQ.

I have some problems still:
It seems as if the exception in the bug splat below
is raised from handle_fasteoi_irq (or is it just interrupted?). That would mean
that the CPU is not jumping to the FIQ handler but to the normal IRQ handler. This
might point to a problem in the QEMU FIQ code, but I am not sure, so the error
might also be in Linux user mode.

I have loaded the FIQ handler from my driver module with "set_fiq_handler", but the area
where the FIQ has landed (0xfff1240) is filled completely with zeros?

Best regards
Tim

Bad mode in data abort handler detected
Internal error: Oops - bad mode: 0 [#1] PREEMPT SMP ARM
Modules linked in: firq(O) ipv6
CPU: 0 PID: 103 Comm: systemd-udevd Tainted: G           O 3.14.0 #1
task: bf2b9300 ti: bf362000 task.ti: bf362000
PC is at 0xffff1240
LR is at handle_fasteoi_irq+0x9c/0x13c
pc : [<ffff1240>]    lr : [<8005cda0>]    psr: 600f01d1
sp : bf363e70  ip : 07a7e79d  fp : 00000000
r10: 76f92008  r9 : 80590080  r8 : 76e8e4d0
r7 : f8200100  r6 : bf363fb0  r5 : bf008414  r4 : bf0083c0
r3 : 80230d04  r2 : 0000002f  r1 : 00000000  r0 : bf0083c0
Flags: nZCv  IRQs off  FIQs off  Mode FIQ_32  ISA ARM  Segment user
Control: 10c53c7d  Table: 60004059  DAC: 00000015
Process systemd-udevd (pid: 103, stack limit = 0xbf362240)
Stack: (0xbf363e70 to 0xbf364000)
3e60:                                     bf0083c0 00000000 0000002f 80230d04
3e80: bf0083c0 bf008414 bf363fb0 f8200100 76e8e4d0 80590080 76f92008 00000000
3ea0: 07a7e79d bf363e70 8005cda0 ffff1240 600f01d1 ffffffff 8005cd04 0000002f
3ec0: 0000002f 800598bc 8058cc70 8000ed00 f820010c 8059684c bf363ef8 80008528
3ee0: 80023730 80023744 200f0113 ffffffff bf363f2c 80012180 00000000 805baa00
3f00: 00000000 00000100 00000002 00000022 00000000 bf362000 76e8e4d0 80590080
3f20: 76f92008 00000000 0000000a bf363f40 80023730 80023744 200f0113 ffffffff
3f40: bf007a14 8059ac00 00000000 0000000a ffff8dd7 00400140 bf0079c0 8058cc70
3f60: 00000022 00000000 f8200100 76e8e4d0 76f9201c 76f92008 00000000 80023af0
3f80: 8058cc70 8000ed04 f820010c 8059684c bf363fb0 80008528 00000000 76dd3b44
3fa0: 600f0010 ffffffff 0000000c 8001233c 00000000 00000000 76f93428 76f93428
3fc0: 76f93438 00000000 76f93448 0000000c 76e8e4d0 76f9201c 76f92008 00000000
3fe0: 00000000 7ec115c0 76f60914 76dd3b44 600f0010 ffffffff 9fffd821 9fffdc21
[<8005cda0>] (handle_fasteoi_irq) from [<80230d04>] (gic_eoi_irq+0x0/0x4c)
[<80230d04>] (gic_eoi_irq) from [<f8200100>] (0xf8200100)
Code: ee02af10 f57ff06f e59d8000 e59d9004 (e599b00c) 
---[ end trace 3dc3571209a017e1 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Greg Bellows Nov. 4, 2014, 6:33 p.m. UTC | #2
Hi Tim,

Responses inline.

Regards,

Greg

On 4 November 2014 09:40, Tim Sander <tim@krieglstein.org> wrote:

> Hi Greg
> > Ah... Yes, using A9 (GICv1) which means you don't have grouping without
> > the security extensions.
> OK, switching the GIC to version 2 seems to work, in the sense that Linux
> still boots up and I get an FIQ.
>
> I have some problems still:
> It seems as if the exception in the bug splat below
> is raised from handle_fasteoi_irq (or is it just interrupted?). That would
> mean that the CPU is not jumping to the FIQ handler but to the normal IRQ
> handler. This might point to a problem in the QEMU FIQ code, but I am not
> sure, so the error might also be in Linux user mode.
>
> I have loaded the FIQ handler from my driver module with "set_fiq_handler",
> but the area where the FIQ has landed (0xfff1240) is filled completely
> with zeros?
>
> Best regards
> Tim
>
> Bad mode in data abort handler detected
> Internal error: Oops - bad mode: 0 [#1] PREEMPT SMP ARM
> Modules linked in: firq(O) ipv6
> CPU: 0 PID: 103 Comm: systemd-udevd Tainted: G           O 3.14.0 #1
> task: bf2b9300 ti: bf362000 task.ti: bf362000
> PC is at 0xffff1240
> LR is at handle_fasteoi_irq+0x9c/0x13c
> pc : [<ffff1240>]    lr : [<8005cda0>]    psr: 600f01d1
> sp : bf363e70  ip : 07a7e79d  fp : 00000000
> r10: 76f92008  r9 : 80590080  r8 : 76e8e4d0
> r7 : f8200100  r6 : bf363fb0  r5 : bf008414  r4 : bf0083c0
> r3 : 80230d04  r2 : 0000002f  r1 : 00000000  r0 : bf0083c0
> Flags: nZCv  IRQs off  FIQs off  Mode FIQ_32  ISA ARM  Segment user
>

It looks like we are in FIQ mode and interrupts have been masked.
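This reading can be checked against the "psr : 600f01d1" field of the register dump. A small decoder for the AArch32 PSR fields (bit positions per the ARMv7-A architecture; the helper names are illustrative) reproduces the kernel's "Flags:" line:

```c
#include <stdint.h>
#include <stdio.h>

/* AArch32 CPSR/SPSR fields (ARMv7-A): N,Z,C,V in bits 31..28, I (IRQ mask)
 * in bit 7, F (FIQ mask) in bit 6, T (Thumb) in bit 5, mode in bits 4..0. */
#define PSR_N (1u << 31)
#define PSR_Z (1u << 30)
#define PSR_C (1u << 29)
#define PSR_V (1u << 28)
#define PSR_I (1u << 7)
#define PSR_F (1u << 6)
#define PSR_T (1u << 5)
#define PSR_MODE_MASK 0x1fu

const char *psr_mode_name(uint32_t psr)
{
    switch (psr & PSR_MODE_MASK) {
    case 0x10: return "USR_32";
    case 0x11: return "FIQ_32";
    case 0x12: return "IRQ_32";
    case 0x13: return "SVC_32";
    case 0x16: return "MON_32";
    case 0x17: return "ABT_32";
    case 0x1a: return "HYP_32";
    case 0x1b: return "UND_32";
    case 0x1f: return "SYS_32";
    default:   return "UNKNOWN";
    }
}

/* Condition flags in the kernel's style: upper case = set, lower = clear. */
void psr_flags(uint32_t psr, char out[5])
{
    out[0] = (psr & PSR_N) ? 'N' : 'n';
    out[1] = (psr & PSR_Z) ? 'Z' : 'z';
    out[2] = (psr & PSR_C) ? 'C' : 'c';
    out[3] = (psr & PSR_V) ? 'V' : 'v';
    out[4] = '\0';
}

void print_psr(uint32_t psr)
{
    char flags[5];

    psr_flags(psr, flags);
    printf("Flags: %s  IRQs %s  FIQs %s  Mode %s  ISA %s\n",
           flags,
           (psr & PSR_I) ? "off" : "on",
           (psr & PSR_F) ? "off" : "on",
           psr_mode_name(psr),
           (psr & PSR_T) ? "Thumb" : "ARM");
}
```

print_psr(0x600f01d1) prints "Flags: nZCv  IRQs off  FIQs off  Mode FIQ_32  ISA ARM": mode bits 0x11 are FIQ mode, and both the I and F mask bits are set, which is what the CPU does on entry to an FIQ exception.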


> Control: 10c53c7d  Table: 60004059  DAC: 00000015
> Process systemd-udevd (pid: 103, stack limit = 0xbf362240)
> Stack: (0xbf363e70 to 0xbf364000)
> [stack dump snipped]
> [<8005cda0>] (handle_fasteoi_irq) from [<80230d04>] (gic_eoi_irq+0x0/0x4c)
>

It certainly looks like we are going down the standard IRQ path as you
suggested.  I'm not a Linux driver guy, but do you see any kind of activity
(breakpoints, printfs, ...) through your FIQ handler?


> [<80230d04>] (gic_eoi_irq) from [<f8200100>] (0xf8200100)
> Code: ee02af10 f57ff06f e59d8000 e59d9004 (e599b00c)
> ---[ end trace 3dc3571209a017e1 ]---
> Kernel panic - not syncing: Fatal exception in interrupt
>
>
It is hard to determine entirely what is happening here based on this
info.  I do have code of my own that routes KGDB interrupts as FIQs and
with the workaround I see the FIQs handled as expected.  Some things we can
try to get more info in hopes of pinpointing where to look:

   1. At the top of hw/intc/arm_gic.c there is the following commented out
   line:
       //#define DEBUG_GIC
   Uncomment the line, rebuild and rerun.  This will give us some trace on
   what is going through the GIC code.
   2. Run qemu with the "-d int" option which will print a message on each
   interrupt.  We should see an FIQ at some point if they are occurring. The
   only issue is that there will be numerous IRQs, so you'll have to parse
   through them to find an "exception 6 [FIQ]" line.
   3. If you set a breakpoint in your driver, is it possible to see from the
   kernel debugger that FIQs are on?  Clearly you have to try this from a
   path where interrupts are masked.  I see the following on my system
   mentioned above:
   ...
   Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
   ...
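Picking the FIQ out of the "-d int" output in step 2 can be automated. The sketch below is a hypothetical helper, not part of QEMU; the marker strings are copied from the log excerpts in this thread, and it counts FIQ entries and reports the first exception taken after the first FIQ:

```c
#include <stddef.h>
#include <string.h>

/* Marker strings as they appear in the "-d int" excerpts in this thread. */
#define FIQ_MARKER "Taking exception 6 [FIQ]"
#define EXC_MARKER "Taking exception "

/* Returns the number of FIQ entries in 'log'.  If 'next' is non-NULL, it
 * receives the first exception line following the first FIQ (or an empty
 * string if there is none), truncated to next_len - 1 characters. */
int scan_for_fiq(const char *log, char *next, size_t next_len)
{
    int fiqs = 0;
    const char *line = log;

    if (next != NULL && next_len > 0) {
        next[0] = '\0';
    }
    while (line != NULL && *line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        if (strncmp(line, FIQ_MARKER, strlen(FIQ_MARKER)) == 0) {
            fiqs++;
        } else if (fiqs > 0 && next != NULL && next_len > 0 && next[0] == '\0'
                   && strncmp(line, EXC_MARKER, strlen(EXC_MARKER)) == 0) {
            size_t n = len < next_len - 1 ? len : next_len - 1;
            memcpy(next, line, n);
            next[n] = '\0';
        }
        line = eol ? eol + 1 : NULL;   /* advance to the next line */
    }
    return fiqs;
}
```

The log can be captured to a file, for example with "-d int -D qemu.log", and its contents passed to scan_for_fiq to see whether any FIQ was taken and which exception followed it.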
Tim Sander Nov. 12, 2014, 1:56 p.m. UTC | #3
Hi Greg

> > Bad mode in data abort handler detected
> > Internal error: Oops - bad mode: 0 [#1] PREEMPT SMP ARM
> > Modules linked in: firq(O) ipv6
> > CPU: 0 PID: 103 Comm: systemd-udevd Tainted: G           O 3.14.0 #1
> > task: bf2b9300 ti: bf362000 task.ti: bf362000
> > PC is at 0xffff1240
> > LR is at handle_fasteoi_irq+0x9c/0x13c
> > pc : [<ffff1240>]    lr : [<8005cda0>]    psr: 600f01d1
> > sp : bf363e70  ip : 07a7e79d  fp : 00000000
> > r10: 76f92008  r9 : 80590080  r8 : 76e8e4d0
> > r7 : f8200100  r6 : bf363fb0  r5 : bf008414  r4 : bf0083c0
> > r3 : 80230d04  r2 : 0000002f  r1 : 00000000  r0 : bf0083c0
> > Flags: nZCv  IRQs off  FIQs off  Mode FIQ_32  ISA ARM  Segment user
> 
> It looks like we are in FIQ mode and interrupts have been masked.
Indeed.
 
> > Control: 10c53c7d  Table: 60004059  DAC: 00000015
> > Process systemd-udevd (pid: 103, stack limit = 0xbf362240)
> > Stack: (0xbf363e70 to 0xbf364000)
> > [stack dump snipped]
> > [<8005cda0>] (handle_fasteoi_irq) from [<80230d04>] (gic_eoi_irq+0x0/0x4c)
> 
> It certainly looks like we are going down the standard IRQ path as you
> suggested.  I'm not a Linux driver guy, but do you see any kind of activity
> (breakpoints, printfs, ...) through your FIQ handler?
I am reaching 0xffff1224, which I believe is the FIQ vector address on the vexpress?

> > [<80230d04>] (gic_eoi_irq) from [<f8200100>] (0xf8200100)
> > Code: ee02af10 f57ff06f e59d8000 e59d9004 (e599b00c)
> > ---[ end trace 3dc3571209a017e1 ]---
> > Kernel panic - not syncing: Fatal exception in interrupt
> 
> It is hard to determine entirely what is happening here based on this
> info.  I do have code of my own that routes KGDB interrupts as FIQs and
> with the workaround I see the FIQs handled as expected.  Some things we can
> try to get more info in hopes of pinpointing where to look:
> 
>    1. At the top of hw/intc/arm_gic.c there is the following commented out
>    line:
>        //#define DEBUG_GIC
>    Uncomment the line, rebuild and rerun.  This will give us some trace on
>    what is going through the GIC code.
I have commented out some debug lines, but I see:
Breakpoint 1, gic_update_with_grouping (s=0x5555564dba80) at hw/intc/arm_gic.c:120
120                         DPRINTF("Raised pending FIQ %d (cpu %d)\n", best_irq, cpu);

With the expected IRQ number 49 (32 + 17).

>    2. Run qemu with the "-d int" option which will print a message on each
>    interrupt.  We should see an FIQ at some point if they are occurring. The
> only issue is that there will be numerous IRQs, so you'll have to parse
> through them to find an "exception 6 [FIQ]" line.
Here is the relevant output when the FIQ hits:
Taking exception 2 [SVC]
Taking exception 2 [SVC]
pml: pml_timer_tick: raise_irq
arm_gic: Raised pending FIQ 49 (cpu 0)
Taking exception 6 [FIQ]
pml: pml_write: update control flags: 1
pml: pml_update: start timer
pml: pml_update: lower irq
pml: pml_read: read magic
pml: pml_write: update control flags: 3
pml: pml_update: start timer
Taking exception 3 [Prefetch Abort]
...with IFSR 0x5 IFAR 0x80221d70
Taking exception 4 [Data Abort]
...with DFSR 0x805 DFAR 0x805c604c
Taking exception 4 [Data Abort]
...with DFSR 0x805 DFAR 0x805c604c
Taking exception 4 [Data Abort]

So the FIQ is hitting, but unfortunately I have no idea where the data aborts are coming from.
I have moved all IRQs other than 49 to group 1, so that only IRQ 49 is an FIQ.
Might it be that I am seeing some security violations?
The IFAR address is in __idr_pre_get, which lives in lib/idr.c in the Linux kernel and
implements integer ID management.
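Moving interrupts between groups means touching the distributor's GICD_IGROUPRn registers, which per the GIC architecture sit at offset 0x080 + 4*n with one bit per interrupt ID (0 = Group 0, 1 = Group 1). Since IDs 0-31 are SGIs and PPIs, external line 17 becomes ID 49, as noted above. A sketch of the address arithmetic, with illustrative names:

```c
#include <stdint.h>

/* GICD_IGROUPRn is at distributor offset 0x080 + 4*n, one bit per ID.
 * IDs 0-31 are SGIs/PPIs, so external (shared) line k has ID 32 + k. */
#define GICD_IGROUPR_BASE 0x080u
#define GIC_INTERNAL_IRQS 32u

typedef struct {
    uint32_t reg_offset;   /* offset of the 32-bit GICD_IGROUPRn register */
    uint32_t bit;          /* bit within that register */
    uint32_t byte_offset;  /* offset of the byte holding the bit */
    uint32_t byte_bit;     /* bit within that byte */
} gic_group_loc;

uint32_t gic_spi_id(uint32_t line)
{
    return GIC_INTERNAL_IRQS + line;
}

gic_group_loc gic_group_location(uint32_t irq_id)
{
    gic_group_loc loc;

    loc.reg_offset  = GICD_IGROUPR_BASE + 4u * (irq_id / 32u);
    loc.bit         = irq_id % 32u;
    loc.byte_offset = GICD_IGROUPR_BASE + irq_id / 8u;
    loc.byte_bit    = irq_id % 8u;
    return loc;
}

/* Return reg_val with irq_id's bit set (Group 1) or cleared (Group 0).
 * In GICv2, Group 0 is signalled as FIQ when GICC_CTLR.FIQEn is set. */
uint32_t gic_set_group(uint32_t reg_val, uint32_t irq_id, int group1)
{
    uint32_t mask = 1u << (irq_id % 32u);

    return group1 ? (reg_val | mask) : (reg_val & ~mask);
}
```

For ID 49 this gives GICD_IGROUPR1 (offset 0x84), bit 17; writing 0xfffdffff there keeps IDs 32-63 in Group 1 except 49. For comparison, the byte write at offset 136 (0x88) in the earlier gdb session covers IDs 64-71.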

>    3. If you set a breakpoint in your driver, is it possible to see from the
>    kernel debugger that FIQs are on?  Clearly you have to try this from a
>    path where interrupts are masked.  I see the following on my system
>    mentioned above:
>    ...
>    Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
>    ...
So you mean debugging via the QEMU debug port? I have not enabled KGDB.
As stated above, I was not able to catch the FIQ there. But it might be that I get

I have debugged QEMU to see whether the IRQ is routed correctly. The deepest call I could find is this (bt):
#0  tcg_handle_interrupt (cpu=0x555556450790, mask=16) at /home/sander/speedy/soc/qemu/translate-all.c:1503
#1  0x0000555555755323 in cpu_interrupt (cpu=0x555556450790, mask=16)
    at /home/sander/speedy/soc/qemu/include/qom/cpu.h:556
#2  0x00005555557561b7 in arm_cpu_set_irq (opaque=0x555556450790, irq=1, level=1)
    at /home/sander/speedy/soc/qemu/target-arm/cpu.c:261
#3  0x00005555558193ec in qemu_set_irq (irq=0x55555642c840, level=1) at hw/core/irq.c:43
#4  0x0000555555879073 in gic_update_with_grouping (s=0x5555564dba80) at hw/intc/arm_gic.c:132
#5  0x000055555587936d in gic_update (s=0x5555564dba80) at hw/intc/arm_gic.c:180
#6  0x00005555558798a7 in gic_set_irq (opaque=0x5555564dba80, irq=49, level=1) at hw/intc/arm_gic.c:264
#7  0x00005555558193ec in qemu_set_irq (irq=0x555556432b00, level=1) at hw/core/irq.c:43
#8  0x0000555555661d4d in a9mp_priv_set_irq (opaque=0x5555564d7260, irq=17, level=1)
    at /home/sander/speedy/soc/qemu/hw/cpu/a9mpcore.c:17
#9  0x00005555558193ec in qemu_set_irq (irq=0x5555564f3c00, level=1) at hw/core/irq.c:43
#10 0x00005555558f6fed in qemu_irq_raise (irq=0x5555564f3c00) at /home/sander/speedy/soc/qemu/include/hw/irq.h:16
#11 0x00005555558f7363 in pml_timer_tick (opaque=0x555556595020) at hw/timer/pml.c:95
#12 0x000055555599be6e in aio_bh_poll (ctx=0x5555563fdad0) at async.c:82
#13 0x00005555559b2d9f in aio_dispatch (ctx=0x5555563fdad0) at aio-posix.c:137
#14 0x000055555599c2cb in aio_ctx_dispatch (source=0x5555563fdad0, callback=0x0, user_data=0x0) at async.c:221
#15 0x00007ffff7901e04 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#16 0x00005555559b0a79 in glib_pollfds_poll () at main-loop.c:200
#17 0x00005555559b0b7a in os_host_main_loop_wait (timeout=0) at main-loop.c:245
#18 0x00005555559b0c52 in main_loop_wait (nonblocking=1) at main-loop.c:494
#19 0x0000555555791d8b in main_loop () at vl.c:1872
#20 0x00005555557998d5 in main (argc=22, argv=0x7fffffffda38, envp=0x7fffffffdaf0) at vl.c:4348

I am not sure whether arm_cpu_set_irq(opaque=0x555556450790, irq=1, level=1) represents an FIQ
and whether mask 16 is the correct mask for the FIQ request.

Frame #6 clearly shows that IRQ 49, configured as Group 0, is triggered. All other interrupts are configured as Group 1
by my Linux kernel. The call to #4 gic_update_with_grouping shows that grouping within the GIC is enabled
and that the IRQ is raised as an FIQ within QEMU. All of this looks good as far as I understand, so I am pretty confident
that QEMU is working correctly (minus the Prefetch and Data Aborts).

Best regards
Tim
Greg Bellows Nov. 12, 2014, 4 p.m. UTC | #4
On 12 November 2014 07:56, Tim Sander <tim@krieglstein.org> wrote:

> Hi Greg
>
> > > Bad mode in data abort handler detected
> > > Internal error: Oops - bad mode: 0 [#1] PREEMPT SMP ARM
> > > Modules linked in: firq(O) ipv6
> > > CPU: 0 PID: 103 Comm: systemd-udevd Tainted: G           O 3.14.0 #1
> > > task: bf2b9300 ti: bf362000 task.ti: bf362000
> > > PC is at 0xffff1240
> > > LR is at handle_fasteoi_irq+0x9c/0x13c
> > > pc : [<ffff1240>]    lr : [<8005cda0>]    psr: 600f01d1
> > > sp : bf363e70  ip : 07a7e79d  fp : 00000000
> > > r10: 76f92008  r9 : 80590080  r8 : 76e8e4d0
> > > r7 : f8200100  r6 : bf363fb0  r5 : bf008414  r4 : bf0083c0
> > > r3 : 80230d04  r2 : 0000002f  r1 : 00000000  r0 : bf0083c0
> > > Flags: nZCv  IRQs off  FIQs off  Mode FIQ_32  ISA ARM  Segment user
> >
> > It looks like we are in FIQ mode and interrupts have been masked.
> Indeed.
>
> > > Control: 10c53c7d  Table: 60004059  DAC: 00000015
> > > Process systemd-udevd (pid: 103, stack limit = 0xbf362240)
> > > Stack: (0xbf363e70 to 0xbf364000)
> > > [stack dump snipped]
> > > [<8005cda0>] (handle_fasteoi_irq) from [<80230d04>] (gic_eoi_irq+0x0/0x4c)
> >
> > It certainly looks like we are going down the standard IRQ path as you
> > suggested.  I'm not a Linux driver guy, but do you see any kind of
> > activity (breakpoints, printfs, ...) through your FIQ handler?
> I am reaching 0xffff1224, which I believe is the FIQ vector address on the
> vexpress?
>

Hmmm.... not sure.  As you mentioned previously (and as seen in the above
register dump), I would expect offset 0x1240 (pc=0xffff1240) for an FIQ.
I'm not sure what is at offset 0x1224, but on my Linux kernel it appears
that offset 0x1220 is vector_addrexcptn (not pabort), that happens to
occupy the HYP trap vector.
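For context, the "Control: 10c53c7d" line in the oops is the SCTLR, and its V bit (bit 13) is set, so this kernel uses high vectors at 0xffff0000 and the architectural FIQ entry is 0xffff001c. On kernels of this era that slot normally branches into stub code in the following page, which is consistent with FIQ code running at 0xffff1240, though the exact stub offsets depend on the kernel build. A sketch of the architectural calculation (enum and function names are illustrative):

```c
#include <stdint.h>

/* ARMv7-A exception vector offsets from the vector base. */
enum arm_vector {
    VEC_RESET = 0x00,
    VEC_UNDEF = 0x04,
    VEC_SVC   = 0x08,
    VEC_PABT  = 0x0c,
    VEC_DABT  = 0x10,
    VEC_HYP   = 0x14,   /* Hyp trap in the Hyp table; unused otherwise */
    VEC_IRQ   = 0x18,
    VEC_FIQ   = 0x1c
};

#define SCTLR_V (1u << 13)   /* 1 = high vectors at 0xffff0000 */

/* With SCTLR.V clear, the base is VBAR on cores with the Security
 * Extensions (such as the Cortex-A9); pass 0 for VBAR otherwise. */
uint32_t arm_vector_address(uint32_t sctlr, uint32_t vbar, enum arm_vector v)
{
    uint32_t base = (sctlr & SCTLR_V) ? 0xffff0000u : vbar;

    return base + (uint32_t)v;
}
```

arm_vector_address(0x10c53c7d, 0, VEC_FIQ) evaluates to 0xffff001c, and the prefetch abort entry is 0xffff000c.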


>
> > > [<80230d04>] (gic_eoi_irq) from [<f8200100>] (0xf8200100)
> > > Code: ee02af10 f57ff06f e59d8000 e59d9004 (e599b00c)
> > > ---[ end trace 3dc3571209a017e1 ]---
> > > Kernel panic - not syncing: Fatal exception in interrupt
> >
> > It is hard to determine entirely what is happening here based on this
> > info.  I do have code of my own that routes KGDB interrupts as FIQs and
> > with the workaround I see the FIQs handled as expected.  Some things we
> > can try to get more info in hopes of pinpointing where to look:
> >
> >    1. At the top of hw/intc/arm_gic.c there is the following commented
> >    out line:
> >        //#define DEBUG_GIC
> >    Uncomment the line, rebuild and rerun.  This will give us some trace
> >    on what is going through the GIC code.
> I have commented out some debug lines, but I see:
> Breakpoint 1, gic_update_with_grouping (s=0x5555564dba80) at hw/intc/arm_gic.c:120
> 120                         DPRINTF("Raised pending FIQ %d (cpu %d)\n", best_irq, cpu);
>
> With the expected IRQ number 49 (32 + 17).
>
> >    2. Run qemu with the "-d int" option which will print a message on
> >    each interrupt.  We should see an FIQ at some point if they are
> >    occurring.  The only issue is that there will be numerous IRQs, so
> >    you'll have to parse through them to find an "exception 6 [FIQ]" line.
> Here is the relevant output when the FIQ hits:
> Taking exception 2 [SVC]
> Taking exception 2 [SVC]
> pml: pml_timer_tick: raise_irq
> arm_gic: Raised pending FIQ 49 (cpu 0)
> Taking exception 6 [FIQ]
>

This looks to me like the GIC has caught the interrupt and communicated it
to the CPU causing it to take the FIQ exception.


> pml: pml_write: update control flags: 1
> pml: pml_update: start timer
> pml: pml_update: lower irq
> pml: pml_read: read magic
> pml: pml_write: update control flags: 3
> pml: pml_update: start timer
>

Is pml your test driver?  It looks like it initiates the interrupt and
possibly performs some handling following it?


> Taking exception 3 [Prefetch Abort]
> ...with IFSR 0x5 IFAR 0x80221d70
> Taking exception 4 [Data Abort]
> ...with DFSR 0x805 DFAR 0x805c604c
> Taking exception 4 [Data Abort]
> ...with DFSR 0x805 DFAR 0x805c604c
> Taking exception 4 [Data Abort]
>
> So the FIQ is hitting, but unfortunately I have no idea where the data
> aborts are coming from.
>

The data aborts are likely a side effect of the prefetch abort taken before
them; it is the interesting one.
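The fault status values in the log help narrow this down. Assuming the kernel uses the ARMv7 short-descriptor translation format (the usual case for a non-LPAE kernel), both IFSR 0x5 and DFSR 0x805 encode FS = 0b00101, a section translation fault, and DFSR bit 11 (WnR) marks the data abort as a write. A sketch of the decoding, with illustrative function names and only common encodings named:

```c
#include <stdint.h>

/* Short-descriptor FSR (TTBCR.EAE == 0): FS[4] is bit 10, FS[3:0] are
 * bits 3..0.  Move bit 10 down to bit 4 and merge with the low nibble. */
uint32_t fsr_status(uint32_t fsr)
{
    return ((fsr >> 6) & 0x10u) | (fsr & 0xfu);
}

const char *fsr_status_name(uint32_t fs)
{
    switch (fs) {
    case 0x01: return "alignment fault";
    case 0x02: return "debug event";
    case 0x03: return "access flag fault (section)";
    case 0x05: return "translation fault (section)";
    case 0x06: return "access flag fault (page)";
    case 0x07: return "translation fault (page)";
    case 0x08: return "synchronous external abort";
    case 0x09: return "domain fault (section)";
    case 0x0b: return "domain fault (page)";
    case 0x0d: return "permission fault (section)";
    case 0x0f: return "permission fault (page)";
    default:   return "other";
    }
}

/* DFSR bit 11 (WnR): 1 if the faulting data access was a write. */
int dfsr_is_write(uint32_t dfsr)
{
    return (int)((dfsr >> 11) & 1u);
}
```

Read this way, the prefetch abort is an instruction fetch from 0x80221d70 that found no valid first-level translation in the active table, and the data aborts are writes to 0x805c604c that fault the same way.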


> I have moved all IRQs other than 49 to group 1, so that only IRQ 49 is
> an FIQ.
> Might it be that I am seeing some security violations?
> The IFAR address is in __idr_pre_get, which lives in lib/idr.c in the
> Linux kernel and implements integer ID management.
>
> >    3. If you set a breakpoint in your driver, is it possible to see from
> >    the kernel debugger that FIQs are on?  Clearly you have to try this
> >    from a path where interrupts are masked.  I see the following on my
> >    system mentioned above:
> >    ...
> >    Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
> >    ...
> So you mean debugging via the QEMU debug port? I have not enabled KGDB.
> As stated above, I was not able to catch the FIQ there. But it might
> be that I get
>
> I have debugged QEMU to see whether the IRQ is routed correctly. The
> deepest call I could find is this (bt):
> #0  tcg_handle_interrupt (cpu=0x555556450790, mask=16) at /home/sander/speedy/soc/qemu/translate-all.c:1503
> #1  0x0000555555755323 in cpu_interrupt (cpu=0x555556450790, mask=16)
>     at /home/sander/speedy/soc/qemu/include/qom/cpu.h:556
> #2  0x00005555557561b7 in arm_cpu_set_irq (opaque=0x555556450790, irq=1, level=1)
>     at /home/sander/speedy/soc/qemu/target-arm/cpu.c:261
> #3  0x00005555558193ec in qemu_set_irq (irq=0x55555642c840, level=1) at hw/core/irq.c:43
> #4  0x0000555555879073 in gic_update_with_grouping (s=0x5555564dba80) at hw/intc/arm_gic.c:132
> #5  0x000055555587936d in gic_update (s=0x5555564dba80) at hw/intc/arm_gic.c:180
> #6  0x00005555558798a7 in gic_set_irq (opaque=0x5555564dba80, irq=49, level=1) at hw/intc/arm_gic.c:264
> #7  0x00005555558193ec in qemu_set_irq (irq=0x555556432b00, level=1) at hw/core/irq.c:43
> #8  0x0000555555661d4d in a9mp_priv_set_irq (opaque=0x5555564d7260, irq=17, level=1)
>     at /home/sander/speedy/soc/qemu/hw/cpu/a9mpcore.c:17
> #9  0x00005555558193ec in qemu_set_irq (irq=0x5555564f3c00, level=1) at hw/core/irq.c:43
> #10 0x00005555558f6fed in qemu_irq_raise (irq=0x5555564f3c00) at /home/sander/speedy/soc/qemu/include/hw/irq.h:16
> #11 0x00005555558f7363 in pml_timer_tick (opaque=0x555556595020) at hw/timer/pml.c:95
> #12 0x000055555599be6e in aio_bh_poll (ctx=0x5555563fdad0) at async.c:82
> #13 0x00005555559b2d9f in aio_dispatch (ctx=0x5555563fdad0) at aio-posix.c:137
> #14 0x000055555599c2cb in aio_ctx_dispatch (source=0x5555563fdad0, callback=0x0, user_data=0x0) at async.c:221
> #15 0x00007ffff7901e04 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
> #16 0x00005555559b0a79 in glib_pollfds_poll () at main-loop.c:200
> #17 0x00005555559b0b7a in os_host_main_loop_wait (timeout=0) at main-loop.c:245
> #18 0x00005555559b0c52 in main_loop_wait (nonblocking=1) at main-loop.c:494
> #19 0x0000555555791d8b in main_loop () at vl.c:1872
> #20 0x00005555557998d5 in main (argc=22, argv=0x7fffffffda38, envp=0x7fffffffdaf0) at vl.c:4348
>
> I am not sure whether arm_cpu_set_irq(opaque=0x555556450790, irq=1,
> level=1) represents an FIQ and whether mask 16 is the correct mask for
> the FIQ request.
>

Yeah, this routine handles both IRQs and FIQs.  I don't see anything above
that stands out as suspicious.  It may be interesting to try the same test
driver on an A15 emulation if it is not too much trouble.  This would rule
out the possibility that the A9 workaround is not enough to make the GIC
behave as a GICv2.
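How the two backtrace values fit together can be shown with a small standalone model. The ARM CPU in QEMU has separate IRQ and FIQ input lines; assuming the QEMU sources of this era define the FIQ line as input 1 and the FIQ request bit as 0x10 (values that match irq=1 and mask=16 in the backtrace), raising that line requests an FIQ. The MODEL_ prefix marks these as a sketch, not QEMU's headers:

```c
/* Assumed values, to be checked against the QEMU tree in use. */
enum { MODEL_ARM_CPU_IRQ = 0, MODEL_ARM_CPU_FIQ = 1 };

#define MODEL_CPU_INTERRUPT_HARD 0x0002   /* normal IRQ request bit */
#define MODEL_CPU_INTERRUPT_FIQ  0x0010   /* FIQ request bit */

/* Return the interrupt-request bit that raising input 'irq' sets,
 * or 0 for an unknown input line. */
int model_irq_line_to_mask(int irq)
{
    switch (irq) {
    case MODEL_ARM_CPU_IRQ: return MODEL_CPU_INTERRUPT_HARD;
    case MODEL_ARM_CPU_FIQ: return MODEL_CPU_INTERRUPT_FIQ;
    default:                return 0;
    }
}
```

Under these assumptions, model_irq_line_to_mask(1) returns 16, the mask seen in frames #0 and #1.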


>
> Frame #6 clearly shows that IRQ 49, configured as Group 0, is triggered.
> All other interrupts are configured as Group 1 by my Linux kernel. The call
> to #4 gic_update_with_grouping shows that grouping within the GIC is enabled
> and that the IRQ is raised as an FIQ within QEMU. All of this looks good as
> far as I understand, so I am pretty confident that QEMU is working correctly
> (minus the Prefetch and Data Aborts).
>

I agree that QEMU appears to be handling the FIQ properly and it appears
that the CPU is trying to dispatch it.  I understand that the Linux FIQ
handling is a little trickier than IRQs, so I suspect that either something
in the Linux kernel handling or your driver is going awry during handling
or as a result of the FIQ.

Let me know if you need any additional help or you discover any misbehavior.


> Best regards
> Tim
>


Regards,

Greg
Tim Sander Nov. 13, 2014, 1:58 p.m. UTC | #5
On Wednesday, 12 November 2014, 10:00:03, Greg Bellows wrote:
> On 12 November 2014 07:56, Tim Sander <tim@krieglstein.org> wrote:
> > Hi Greg
> > 
> > > > Bad mode in data abort handler detected
> > > > Internal error: Oops - bad mode: 0 [#1] PREEMPT SMP ARM
> > > > Modules linked in: firq(O) ipv6
> > > > CPU: 0 PID: 103 Comm: systemd-udevd Tainted: G           O 3.14.0 #1
> > > > task: bf2b9300 ti: bf362000 task.ti: bf362000
> > > > PC is at 0xffff1240
> > > > LR is at handle_fasteoi_irq+0x9c/0x13c
> > > > pc : [<ffff1240>]    lr : [<8005cda0>]    psr: 600f01d1
> > > > sp : bf363e70  ip : 07a7e79d  fp : 00000000
> > > > r10: 76f92008  r9 : 80590080  r8 : 76e8e4d0
> > > > r7 : f8200100  r6 : bf363fb0  r5 : bf008414  r4 : bf0083c0
> > > > r3 : 80230d04  r2 : 0000002f  r1 : 00000000  r0 : bf0083c0
> > > > Flags: nZCv  IRQs off  FIQs off  Mode FIQ_32  ISA ARM  Segment user
> > > 
> > > It looks like we are in FIQ mode and interrupts have been masked.
> > 
> > Indeed.
> > 
> > > > Control: 10c53c7d  Table: 60004059  DAC: 00000015
> > > > Process systemd-udevd (pid: 103, stack limit = 0xbf362240)
> > > > Stack: (0xbf363e70 to 0xbf364000)
> > > > [stack dump snipped]
> > > > [<8005cda0>] (handle_fasteoi_irq) from [<80230d04>] (gic_eoi_irq+0x0/0x4c)
> > 
> > > It certainly looks like we are going down the standard IRQ path as you
> > > suggested.  I'm not a Linux driver guy, but do you see any kind of
> > > activity (breakpoints, printfs, ...) through your FIQ handler?
> >
> > I am reaching 0xffff1224, which I believe is the FIQ vector address on the
> > vexpress?
> 
> Hmmm.... not sure.  As you mentioned previously (and as seen in the above
> register dump), I would expect offset 0x1240 (pc=0xffff1240) for an FIQ.
> I'm not sure what is at offset 0x1224, but on my Linux kernel it appears
> that offset 0x1220 is vector_addrexcptn (not pabort), that happens to
> occupy the HYP trap vector.
Zounds! You're right; I think this was a typo in my debug script that I
didn't notice. But I do reach 0x1240, just not 0x1244, which means it
aborts on the first FIQ instruction. Here is the "-d int" output directly
after the FIQ hits:
Taking exception 3 [Prefetch Abort]
...with IFSR 0x5 IFAR 0x800c8dcc  //kmem_cache_alloc
Taking exception 3 [Prefetch Abort]
...with IFSR 0x5 IFAR 0x8001be00 //v7_pabort
Taking exception 3 [Prefetch Abort]
And then it continues to fail on v7_pabort repeatedly. Something fishy is
going on here: is it failing in the presumed handler for the prefetch abort
itself? Since I see earlier prefetch aborts being resolved, I conclude that
things work up to the point where the CPU enters FIQ mode.
FIQ is special in that statically mapped memory is needed to avoid a page
lookup, as a lookup fails under Linux in FIQ mode. But 0x800c8dcc
(kmem_cache_alloc) is not called by the FIQ handler, which obviously can't
use any Linux infrastructure. And since I do not reach the breakpoint at
0xffff1244, these faults happen when executing the first instruction of the
FIQ handler.

> > > > [<80230d04>] (gic_eoi_irq) from [<f8200100>] (0xf8200100)
> > > > Code: ee02af10 f57ff06f e59d8000 e59d9004 (e599b00c)
> > > > ---[ end trace 3dc3571209a017e1 ]---
> > > > Kernel panic - not syncing: Fatal exception in interrupt
> > > 
> > > It is hard to determine entirely what is happening here based on this
> > > info.  I do have code of my own that routes KGDB interrupts as FIQs and
> > > with the workaround I see the FIQs handled as expected.  Some things we
> > > can try to get more info in hopes of pinpointing where to look:
> > >    1. At the top of hw/intc/arm_gic.c there is the following commented
> > >    out line:
> > >        //#define DEBUG_GIC
> > >
> > >    Uncomment the line, rebuild and rerun.  This will give us some trace
> > >    on what is going through the GIC code.
> > 
> > I have commented out some debug lines, but I see:
> > Breakpoint 1, gic_update_with_grouping (s=0x5555564dba80) at hw/intc/arm_gic.c:120
> > 120                         DPRINTF("Raised pending FIQ %d (cpu %d)\n", best_irq, cpu);
> >
> > With the expected IRQ number 49 (32+17).
> > 
> > >    2. Run qemu with the "-d int" option, which will print a message on
> > >    each interrupt.  We should see an FIQ at some point if they are
> > >    occurring.  The only issue is that there will be numerous IRQs, so
> > >    you'll have to parse through them to find an "exception 6 [FIQ]".
> > 
> > Here is the relevant output when the FIQ hits:
> > Taking exception 2 [SVC]
> > Taking exception 2 [SVC]
> > pml: pml_timer_tick: raise_irq
> > arm_gic: Raised pending FIQ 49 (cpu 0)
> > Taking exception 6 [FIQ]
> 
> This looks to me like the GIC has caught the interrupt and communicated it
> to the CPU causing it to take the FIQ exception.
> 
> > pml: pml_write: update control flags: 1
> > pml: pml_update: start timer
> > pml: pml_update: lower irq
> > pml: pml_read: read magic
> > pml: pml_write: update control flags: 3
> > pml: pml_update: start timer
> 
> Is pml your test driver?  It looks like it initiates the interrupt and
> possibly performs some handling following it?
Yes, it's just a simple set of registers to control an interrupt. I added
debug output to this driver to see if and when the FIQ handler accesses the
registers, but I see no accesses from FIQ mode.

> > Taking exception 3 [Prefetch Abort]
> > ...with IFSR 0x5 IFAR 0x80221d70
> > Taking exception 4 [Data Abort]
> > ...with DFSR 0x805 DFAR 0x805c604c
> > Taking exception 4 [Data Abort]
> > ...with DFSR 0x805 DFAR 0x805c604c
> > Taking exception 4 [Data Abort]
> > 
> > So the FIQ is hitting, but unfortunately I have no idea where the data
> > aborts are coming from.
> 
> The data aborts are likely a side effect of the prefetch abort taken before
> them; it is the interesting one.
Still, as above, the address is odd. In FIQ mode it should not jump to this
address at all! This is definitely Linux memory space, and I am not calling
anything Linux-related from the FIQ handler.
 
> > I have shifted all IRQs other than 49 to group 1 so that only IRQ 49 is
> > an FIQ. Might it be that I am seeing some security violations...
> > The IFAR address is in __idr_pre_get, which lives in the Linux kernel in
> > lib/idr.c and seems to implement integer ID management.
> > 
> > >    3. If you set a breakpoint in your driver, is it possible to see that
> > >    FIQs are on from the kernel debugger?  Clearly you have to try this
> > >    from a path where interrupts are masked.  I see the following on my
> > >    system mentioned above:
> > >    ...
> > >    Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
> > >    ...
> > 
> > So you mean debugging via the QEMU debug port? I have not enabled
> > KGDB. As stated above, I was not able to catch the FIQ there. But it might
> > be that I get
> > 
> > I have debugged QEMU to see if the IRQ is routed correctly. The deepest
> > call I could find is this: bt
> > #0  tcg_handle_interrupt (cpu=0x555556450790, mask=16) at /home/sander/speedy/soc/qemu/translate-all.c:1503
> > #1  0x0000555555755323 in cpu_interrupt (cpu=0x555556450790, mask=16)
> >     at /home/sander/speedy/soc/qemu/include/qom/cpu.h:556
> > #2  0x00005555557561b7 in arm_cpu_set_irq (opaque=0x555556450790, irq=1, level=1)
> >     at /home/sander/speedy/soc/qemu/target-arm/cpu.c:261
> > #3  0x00005555558193ec in qemu_set_irq (irq=0x55555642c840, level=1) at hw/core/irq.c:43
> > #4  0x0000555555879073 in gic_update_with_grouping (s=0x5555564dba80) at hw/intc/arm_gic.c:132
> > #5  0x000055555587936d in gic_update (s=0x5555564dba80) at hw/intc/arm_gic.c:180
> > #6  0x00005555558798a7 in gic_set_irq (opaque=0x5555564dba80, irq=49, level=1) at hw/intc/arm_gic.c:264
> > #7  0x00005555558193ec in qemu_set_irq (irq=0x555556432b00, level=1) at hw/core/irq.c:43
> > #8  0x0000555555661d4d in a9mp_priv_set_irq (opaque=0x5555564d7260, irq=17, level=1)
> >     at /home/sander/speedy/soc/qemu/hw/cpu/a9mpcore.c:17
> > #9  0x00005555558193ec in qemu_set_irq (irq=0x5555564f3c00, level=1) at hw/core/irq.c:43
> > #10 0x00005555558f6fed in qemu_irq_raise (irq=0x5555564f3c00) at /home/sander/speedy/soc/qemu/include/hw/irq.h:16
> > #11 0x00005555558f7363 in pml_timer_tick (opaque=0x555556595020) at hw/timer/pml.c:95
> > #12 0x000055555599be6e in aio_bh_poll (ctx=0x5555563fdad0) at async.c:82
> > #13 0x00005555559b2d9f in aio_dispatch (ctx=0x5555563fdad0) at aio-posix.c:137
> > #14 0x000055555599c2cb in aio_ctx_dispatch (source=0x5555563fdad0, callback=0x0, user_data=0x0) at async.c:221
> > #15 0x00007ffff7901e04 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
> > #16 0x00005555559b0a79 in glib_pollfds_poll () at main-loop.c:200
> > #17 0x00005555559b0b7a in os_host_main_loop_wait (timeout=0) at main-loop.c:245
> > #18 0x00005555559b0c52 in main_loop_wait (nonblocking=1) at main-loop.c:494
> > #19 0x0000555555791d8b in main_loop () at vl.c:1872
> > #20 0x00005555557998d5 in main (argc=22, argv=0x7fffffffda38, envp=0x7fffffffdaf0) at vl.c:4348
> > 
> > I am not sure if arm_cpu_set_irq(opaque=0x555556450790, irq=1, level=1)
> > represents an FIQ and if mask 16 is the correct mask for the FIQ request.
> 
> Yeah, this routine handles both IRQs and FIQs.  I don't see anything above
> that stands out as suspicious.  It may be interesting to try the same test
> driver on an A15 emulation if it is not too much trouble.  This would rule
> out the possibility that the A9 workaround is not sufficient to behave like
> a GICv2.
The addresses at which the faults appear are bogus and are not accessed by
the FIQ handler at all. I have seen that starting up a different CPU is just
a matter of a command line option, so I started my modified vexpress board
(pml hardware added) with a Cortex-A15 CPU. Unfortunately the results are
pretty similar:
pml: pml_timer_tick: raise_irq
arm_gic: Raised pending FIQ 49 (cpu 0)
Taking exception 6 [FIQ]
pml: pml_write: update control flags: 1
pml: pml_update: start timer
pml: pml_update: lower irq
pml: pml_read: read magic
pml: pml_write: update control flags: 3
pml: pml_update: start timer
Taking exception 4 [Data Abort]
...with DFSR 0x5 DFAR 0xbf3d2334 //address not in Kernel space?
Taking exception 3 [Prefetch Abort]
...with IFSR 0x5 IFAR 0x800120e0 //__dabt_svc
Taking exception 3 [Prefetch Abort]
...with IFSR 0x5 IFAR 0x80012240 //__pabt_svc
Taking exception 3 [Prefetch Abort]
...with IFSR 0x5 IFAR 0x80012240//__pabt_svc
Taking exception 3 [Prefetch Abort]

> > Frame #6 clearly shows that IRQ 49, configured as Group 0, is triggered.
> > All other interrupts are configured as Group 1 by my Linux kernel. The
> > call to #4 gic_update_with_grouping shows that grouping within the GIC is
> > enabled and that the IRQ is raised as an FIQ within QEMU. All of this looks
> > good as far as I understand, so I am pretty confident that QEMU is working
> > correctly (minus the prefetch and data aborts).
> 
> I agree that QEMU appears to be handling the FIQ properly and it appears
> that the CPU is trying to dispatch it.  I understand that the Linux FIQ
> handling is a little trickier than IRQs, so I suspect that either something
> in the Linux kernel handling or your driver is going awry during handling
> or as a result of the FIQ.
Yes, FIQs are tricky, as you need to avoid page lookup failures. These are
undesirable in an FIQ anyway, so all the memory I access is statically mapped
so that it's always available in the page table.
 
Best regards
Tim
Greg Bellows Nov. 13, 2014, 3:09 p.m. UTC | #6
On 13 November 2014 07:58, Tim Sander <tim@krieglstein.org> wrote:

> Am Mittwoch, 12. November 2014, 10:00:03 schrieb Greg Bellows:
> > On 12 November 2014 07:56, Tim Sander <tim@krieglstein.org> wrote:
> > > Hi Greg
> > >
> > > > > Bad mode in data abort handler detected
> > > > > Internal error: Oops - bad mode: 0 [#1] PREEMPT SMP ARM
> > > > > Modules linked in: firq(O) ipv6
> > > > > CPU: 0 PID: 103 Comm: systemd-udevd Tainted: G           O 3.14.0 #1
> > > > > task: bf2b9300 ti: bf362000 task.ti: bf362000
> > > > > PC is at 0xffff1240
> > > > > LR is at handle_fasteoi_irq+0x9c/0x13c
> > > > > pc : [<ffff1240>]    lr : [<8005cda0>]    psr: 600f01d1
> > > > > sp : bf363e70  ip : 07a7e79d  fp : 00000000
> > > > > r10: 76f92008  r9 : 80590080  r8 : 76e8e4d0
> > > > > r7 : f8200100  r6 : bf363fb0  r5 : bf008414  r4 : bf0083c0
> > > > > r3 : 80230d04  r2 : 0000002f  r1 : 00000000  r0 : bf0083c0
> > > > > Flags: nZCv  IRQs off  FIQs off  Mode FIQ_32  ISA ARM  Segment user
> > > >
> > > > It looks like we are in FIQ mode and interrupts have been masked.
> > >
> > > Indeed.
> > >
> > > > > Control: 10c53c7d  Table: 60004059  DAC: 00000015
> > > > > Process systemd-udevd (pid: 103, stack limit = 0xbf362240)
> > > > > Stack: (0xbf363e70 to 0xbf364000)
> > > > > 3e60:                                     bf0083c0 00000000 0000002f 80230d04
> > > > > 3e80: bf0083c0 bf008414 bf363fb0 f8200100 76e8e4d0 80590080 76f92008 00000000
> > > > > 3ea0: 07a7e79d bf363e70 8005cda0 ffff1240 600f01d1 ffffffff 8005cd04 0000002f
> > > > > 3ec0: 0000002f 800598bc 8058cc70 8000ed00 f820010c 8059684c bf363ef8 80008528
> > > > > 3ee0: 80023730 80023744 200f0113 ffffffff bf363f2c 80012180 00000000 805baa00
> > > > > 3f00: 00000000 00000100 00000002 00000022 00000000 bf362000 76e8e4d0 80590080
> > > > > 3f20: 76f92008 00000000 0000000a bf363f40 80023730 80023744 200f0113 ffffffff
> > > > > 3f40: bf007a14 8059ac00 00000000 0000000a ffff8dd7 00400140 bf0079c0 8058cc70
> > > > > 3f60: 00000022 00000000 f8200100 76e8e4d0 76f9201c 76f92008 00000000 80023af0
> > > > > 3f80: 8058cc70 8000ed04 f820010c 8059684c bf363fb0 80008528 00000000 76dd3b44
> > > > > 3fa0: 600f0010 ffffffff 0000000c 8001233c 00000000 00000000 76f93428 76f93428
> > > > > 3fc0: 76f93438 00000000 76f93448 0000000c 76e8e4d0 76f9201c 76f92008 00000000
> > > > > 3fe0: 00000000 7ec115c0 76f60914 76dd3b44 600f0010 ffffffff 9fffd821 9fffdc21
> > > > > [<8005cda0>] (handle_fasteoi_irq) from [<80230d04>] (gic_eoi_irq+0x0/0x4c)
> > >
> > > > It certainly looks like we are going down the standard IRQ path as you
> > > > suggested.  I'm not a Linux driver guy, but do you see any kind of
> > > > activity (break points, printfs, ...) through your FIQ handler?
> > >
> > > I am reaching 0xffff1224, which I believe is the FIQ vector address on
> > > the vexpress?
> >
> > Hmmm.... not sure.  As you mentioned previously (and as seen in the above
> > register dump), I would expect offset 0x1240 (pc=0xffff1240) for an FIQ.
> > I'm not sure what is at offset 0x1224, but on my Linux kernel it appears
> > that offset 0x1220 is vector_addrexcptn (not pabort), that happens to
> > occupy the HYP trap vector.
> Zounds! You're right, I think this was a typo in my debug script which I
> didn't notice. But I am reaching 0x1240, though not 0x1244, which means
>

I wouldn't expect it to reach 0x1244 as that is the word after what I
believe should be a branch at 0x1240 to the FIQ handler.  This would mean
we are not overrunning the vector table though.


> it aborts on the first FIQ instruction. Here is the "-d int" output
> directly after the FIQ hits:
> Taking exception 3 [Prefetch Abort]
> ...with IFSR 0x5 IFAR 0x800c8dcc  //kmem_cache_alloc
> Taking exception 3 [Prefetch Abort]
> ...with IFSR 0x5 IFAR 0x8001be00 //v7_pabort
> Taking exception 3 [Prefetch Abort]
> And then it continues to fail on v7_pabort repeatedly. Something fishy is
> going on here: is it failing in the presumed handler for the prefetch abort
> itself? Since I see earlier prefetch aborts being resolved, I conclude that
> things work up to the point where the CPU enters FIQ mode.
> FIQ is special in that statically mapped memory is needed to avoid a page
> lookup, as a lookup fails under Linux in FIQ mode. But 0x800c8dcc
> (kmem_cache_alloc) is not called by the FIQ handler, which obviously can't
> use any Linux infrastructure. And since I do not reach the breakpoint at
> 0xffff1244, these faults happen when executing the first instruction of the
> FIQ handler.
>

Can we check the vector table to see if the FIQ entry is as expected?  It
appears that the pabort may be in the right place, but it would be good to
see if the FIQ entry is correct (branching to right place).  I'd expect
that we should be branching to __fiq_svc?  Maybe setting a breakpoint in
the first level handler may be useful?


>
> > > > > [<80230d04>] (gic_eoi_irq) from [<f8200100>] (0xf8200100)
> > > > > Code: ee02af10 f57ff06f e59d8000 e59d9004 (e599b00c)
> > > > > ---[ end trace 3dc3571209a017e1 ]---
> > > > > Kernel panic - not syncing: Fatal exception in interrupt
> > > >
> > > > It is hard to determine entirely what is happening here based on this
> > > > info.  I do have code of my own that routes KGDB interrupts as FIQs and
> > > > with the workaround I see the FIQs handled as expected.  Some things we
> > > > can try to get more info in hopes of pinpointing where to look:
> > > >    1. At the top of hw/intc/arm_gic.c there is the following commented
> > > >    out line:
> > > >        //#define DEBUG_GIC
> > > >
> > > >    Uncomment the line, rebuild and rerun.  This will give us some trace
> > > >    on what is going through the GIC code.
> > >
> > > I have commented out some debug lines, but I see:
> > > Breakpoint 1, gic_update_with_grouping (s=0x5555564dba80) at hw/intc/arm_gic.c:120
> > > 120                         DPRINTF("Raised pending FIQ %d (cpu %d)\n", best_irq, cpu);
> > >
> > > With the expected IRQ number 49 (32+17).
> > >
> > > >    2. Run qemu with the "-d int" option, which will print a message on
> > > >    each interrupt.  We should see an FIQ at some point if they are
> > > >    occurring.  The only issue is that there will be numerous IRQs, so
> > > >    you'll have to parse through them to find an "exception 6 [FIQ]".
> > >
> > > Here is the relevant output when the FIQ hits:
> > > Taking exception 2 [SVC]
> > > Taking exception 2 [SVC]
> > > pml: pml_timer_tick: raise_irq
> > > arm_gic: Raised pending FIQ 49 (cpu 0)
> > > Taking exception 6 [FIQ]
> >
> > This looks to me like the GIC has caught the interrupt and communicated it
> > to the CPU causing it to take the FIQ exception.
> >
> > > pml: pml_write: update control flags: 1
> > > pml: pml_update: start timer
> > > pml: pml_update: lower irq
> > > pml: pml_read: read magic
> > > pml: pml_write: update control flags: 3
> > > pml: pml_update: start timer
> >
> > Is pml your test driver?  It looks like it initiates the interrupt and
> > possibly performs some handling following it?
> Yes, it's just a simple set of registers to control an interrupt. I added
> debug output to this driver to see if and when the FIQ handler accesses the
> registers, but I see no accesses from FIQ mode.
>
> > > Taking exception 3 [Prefetch Abort]
> > > ...with IFSR 0x5 IFAR 0x80221d70
> > > Taking exception 4 [Data Abort]
> > > ...with DFSR 0x805 DFAR 0x805c604c
> > > Taking exception 4 [Data Abort]
> > > ...with DFSR 0x805 DFAR 0x805c604c
> > > Taking exception 4 [Data Abort]
> > >
> > > So the FIQ is hitting, but unfortunately I have no idea where the data
> > > aborts are coming from.
> >
> > The data aborts are likely a side effect of the prefetch abort taken before
> > them; it is the interesting one.
> Still, as above, the address is odd. In FIQ mode it should not jump to this
> address at all! This is definitely Linux memory space, and I am not calling
> anything Linux-related from the FIQ handler.
>

I'm a bit confused, as it appears the exception pattern has changed.
Previously we were seeing pabt, dabt, dabt, ..., but in the output above it
is pabt, pabt, pabt, ....  So either we are jumping somewhere random, which
breaks repeatability, or something else changed?  The A15 output below has
changed too, but it shows yet another pattern.


>
> > > I have shifted all IRQs other than 49 to group 1 so that only IRQ 49 is
> > > an FIQ. Might it be that I am seeing some security violations...
> > > The IFAR address is in __idr_pre_get, which lives in the Linux kernel in
> > > lib/idr.c and seems to implement integer ID management.
> > >
> > > >    3. If you set a breakpoint in your driver, is it possible to see that
> > > >    FIQs are on from the kernel debugger?  Clearly you have to try this
> > > >    from a path where interrupts are masked.  I see the following on my
> > > >    system mentioned above:
> > > >    ...
> > > >    Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
> > > >    ...
> > >
> > > So you mean debugging via the QEMU debug port? I have not enabled
> > > KGDB. As stated above, I was not able to catch the FIQ there. But it might
> > > be that I get
> > >
> > > I have debugged QEMU to see if the IRQ is routed correctly. The deepest
> > > call I could find is this: bt
> > > #0  tcg_handle_interrupt (cpu=0x555556450790, mask=16) at /home/sander/speedy/soc/qemu/translate-all.c:1503
> > > #1  0x0000555555755323 in cpu_interrupt (cpu=0x555556450790, mask=16)
> > >     at /home/sander/speedy/soc/qemu/include/qom/cpu.h:556
> > > #2  0x00005555557561b7 in arm_cpu_set_irq (opaque=0x555556450790, irq=1, level=1)
> > >     at /home/sander/speedy/soc/qemu/target-arm/cpu.c:261
> > > #3  0x00005555558193ec in qemu_set_irq (irq=0x55555642c840, level=1) at hw/core/irq.c:43
> > > #4  0x0000555555879073 in gic_update_with_grouping (s=0x5555564dba80) at hw/intc/arm_gic.c:132
> > > #5  0x000055555587936d in gic_update (s=0x5555564dba80) at hw/intc/arm_gic.c:180
> > > #6  0x00005555558798a7 in gic_set_irq (opaque=0x5555564dba80, irq=49, level=1) at hw/intc/arm_gic.c:264
> > > #7  0x00005555558193ec in qemu_set_irq (irq=0x555556432b00, level=1) at hw/core/irq.c:43
> > > #8  0x0000555555661d4d in a9mp_priv_set_irq (opaque=0x5555564d7260, irq=17, level=1)
> > >     at /home/sander/speedy/soc/qemu/hw/cpu/a9mpcore.c:17
> > > #9  0x00005555558193ec in qemu_set_irq (irq=0x5555564f3c00, level=1) at hw/core/irq.c:43
> > > #10 0x00005555558f6fed in qemu_irq_raise (irq=0x5555564f3c00) at /home/sander/speedy/soc/qemu/include/hw/irq.h:16
> > > #11 0x00005555558f7363 in pml_timer_tick (opaque=0x555556595020) at hw/timer/pml.c:95
> > > #12 0x000055555599be6e in aio_bh_poll (ctx=0x5555563fdad0) at async.c:82
> > > #13 0x00005555559b2d9f in aio_dispatch (ctx=0x5555563fdad0) at aio-posix.c:137
> > > #14 0x000055555599c2cb in aio_ctx_dispatch (source=0x5555563fdad0, callback=0x0, user_data=0x0) at async.c:221
> > > #15 0x00007ffff7901e04 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
> > > #16 0x00005555559b0a79 in glib_pollfds_poll () at main-loop.c:200
> > > #17 0x00005555559b0b7a in os_host_main_loop_wait (timeout=0) at main-loop.c:245
> > > #18 0x00005555559b0c52 in main_loop_wait (nonblocking=1) at main-loop.c:494
> > > #19 0x0000555555791d8b in main_loop () at vl.c:1872
> > > #20 0x00005555557998d5 in main (argc=22, argv=0x7fffffffda38, envp=0x7fffffffdaf0) at vl.c:4348
> > >
> > > I am not sure if arm_cpu_set_irq(opaque=0x555556450790, irq=1, level=1)
> > > represents an FIQ and if mask 16 is the correct mask for the FIQ request.
> >
> > Yeah, this routine handles both IRQs and FIQs.  I don't see anything above
> > that stands out as suspicious.  It may be interesting to try the same test
> > driver on an A15 emulation if it is not too much trouble.  This would rule
> > out the possibility that the A9 workaround is not sufficient to behave like
> > a GICv2.
> The addresses at which the faults appear are bogus and are not accessed by
> the FIQ handler at all. I have seen that starting up a different CPU is just
> a matter of a command line option, so I started my modified vexpress board
> (pml hardware added) with a Cortex-A15 CPU. Unfortunately the results are
> pretty similar:
> pml: pml_timer_tick: raise_irq
> arm_gic: Raised pending FIQ 49 (cpu 0)
> Taking exception 6 [FIQ]
> pml: pml_write: update control flags: 1
> pml: pml_update: start timer
> pml: pml_update: lower irq
> pml: pml_read: read magic
> pml: pml_write: update control flags: 3
> pml: pml_update: start timer
> Taking exception 4 [Data Abort]
> ...with DFSR 0x5 DFAR 0xbf3d2334 //address not in Kernel space?
> Taking exception 3 [Prefetch Abort]
> ...with IFSR 0x5 IFAR 0x800120e0 //__dabt_svc
> Taking exception 3 [Prefetch Abort]
> ...with IFSR 0x5 IFAR 0x80012240 //__pabt_svc
> Taking exception 3 [Prefetch Abort]
> ...with IFSR 0x5 IFAR 0x80012240//__pabt_svc
> Taking exception 3 [Prefetch Abort]
>
>
Good data point.  Interesting that we take a data abort first rather than a
prefetch abort.


> > > Frame #6 clearly shows that IRQ 49, configured as Group 0, is triggered.
> > > All other interrupts are configured as Group 1 by my Linux kernel. The
> > > call to #4 gic_update_with_grouping shows that grouping within the GIC is
> > > enabled and that the IRQ is raised as an FIQ within QEMU. All of this looks
> > > good as far as I understand, so I am pretty confident that QEMU is working
> > > correctly (minus the prefetch and data aborts).
> >
> > I agree that QEMU appears to be handling the FIQ properly and it appears
> > that the CPU is trying to dispatch it.  I understand that the Linux FIQ
> > handling is a little trickier than IRQs, so I suspect that either something
> > in the Linux kernel handling or your driver is going awry during handling
> > or as a result of the FIQ.
> Yes, FIQs are tricky, as you need to avoid page lookup failures. These are
> undesirable in an FIQ anyway, so all the memory I access is statically mapped
> so that it's always available in the page table.
>
> Best regards
> Tim
>
Tim Sander Nov. 13, 2014, 4:26 p.m. UTC | #7
Am Donnerstag, 13. November 2014, 09:09:33 schrieb Greg Bellows:
> On 13 November 2014 07:58, Tim Sander <tim@krieglstein.org> wrote:
> > Am Mittwoch, 12. November 2014, 10:00:03 schrieb Greg Bellows:
> > > On 12 November 2014 07:56, Tim Sander <tim@krieglstein.org> wrote:
> > > > Hi Greg
> > > > 
> > > > > > Bad mode in data abort handler detected
> > > > > > Internal error: Oops - bad mode: 0 [#1] PREEMPT SMP ARM
> > > > > > Modules linked in: firq(O) ipv6
> > > > > > CPU: 0 PID: 103 Comm: systemd-udevd Tainted: G           O 3.14.0 #1
> > > > > > task: bf2b9300 ti: bf362000 task.ti: bf362000
> > > > > > PC is at 0xffff1240
> > > > > > LR is at handle_fasteoi_irq+0x9c/0x13c
> > > > > > pc : [<ffff1240>]    lr : [<8005cda0>]    psr: 600f01d1
> > > > > > sp : bf363e70  ip : 07a7e79d  fp : 00000000
> > > > > > r10: 76f92008  r9 : 80590080  r8 : 76e8e4d0
> > > > > > r7 : f8200100  r6 : bf363fb0  r5 : bf008414  r4 : bf0083c0
> > > > > > r3 : 80230d04  r2 : 0000002f  r1 : 00000000  r0 : bf0083c0
> > > > > > Flags: nZCv  IRQs off  FIQs off  Mode FIQ_32  ISA ARM  Segment user
> > > > > 
> > > > > It looks like we are in FIQ mode and interrupts have been masked.
> > > > 
> > > > Indeed.
> > > > 
> > > > > > Control: 10c53c7d  Table: 60004059  DAC: 00000015
> > > > > > Process systemd-udevd (pid: 103, stack limit = 0xbf362240)
> > > > > > Stack: (0xbf363e70 to 0xbf364000)
> > > > > > 3e60:                                     bf0083c0 00000000 0000002f 80230d04
> > > > > > 3e80: bf0083c0 bf008414 bf363fb0 f8200100 76e8e4d0 80590080 76f92008 00000000
> > > > > > 3ea0: 07a7e79d bf363e70 8005cda0 ffff1240 600f01d1 ffffffff 8005cd04 0000002f
> > > > > > 3ec0: 0000002f 800598bc 8058cc70 8000ed00 f820010c 8059684c bf363ef8 80008528
> > > > > > 3ee0: 80023730 80023744 200f0113 ffffffff bf363f2c 80012180 00000000 805baa00
> > > > > > 3f00: 00000000 00000100 00000002 00000022 00000000 bf362000 76e8e4d0 80590080
> > > > > > 3f20: 76f92008 00000000 0000000a bf363f40 80023730 80023744 200f0113 ffffffff
> > > > > > 3f40: bf007a14 8059ac00 00000000 0000000a ffff8dd7 00400140 bf0079c0 8058cc70
> > > > > > 3f60: 00000022 00000000 f8200100 76e8e4d0 76f9201c 76f92008 00000000 80023af0
> > > > > > 3f80: 8058cc70 8000ed04 f820010c 8059684c bf363fb0 80008528 00000000 76dd3b44
> > > > > > 3fa0: 600f0010 ffffffff 0000000c 8001233c 00000000 00000000 76f93428 76f93428
> > > > > > 3fc0: 76f93438 00000000 76f93448 0000000c 76e8e4d0 76f9201c 76f92008 00000000
> > > > > > 3fe0: 00000000 7ec115c0 76f60914 76dd3b44 600f0010 ffffffff 9fffd821 9fffdc21
> > > > > > [<8005cda0>] (handle_fasteoi_irq) from [<80230d04>] (gic_eoi_irq+0x0/0x4c)
> > > > 
> > > > > It certainly looks like we are going down the standard IRQ path as you
> > > > > suggested.  I'm not a Linux driver guy, but do you see any kind of
> > > > > activity (break points, printfs, ...) through your FIQ handler?
> > > > 
> > > > I am reaching 0xffff1224, which I believe is the FIQ vector address on
> > > > the vexpress?
> > > 
> > > Hmmm.... not sure.  As you mentioned previously (and as seen in the above
> > > register dump), I would expect offset 0x1240 (pc=0xffff1240) for an FIQ.
> > > I'm not sure what is at offset 0x1224, but on my Linux kernel it appears
> > > that offset 0x1220 is vector_addrexcptn (not pabort), that happens to
> > > occupy the HYP trap vector.
> > 
> > Zounds! You're right, I think this was a typo in my debug script which I
> > didn't notice. But I am reaching 0x1240, though not 0x1244, which means
> 
> I wouldn't expect it to reach 0x1244 as that is the word after what I
> believe should be a branch at 0x1240 to the FIQ handler.  This would mean
> we are not overrunning the vector table though.
There is a reason that the FIQ entry is the last one in the vector table: it
is allowed to put code directly into the vector table there, which saves one
jump. That's what Linux assumes in set_fiq_handler() in
arch/arm/kernel/fiq.c: it takes a binary blob and copies it directly into the
FIQ handler space.

> > it aborts on the first FIQ instruction. Here is the "-d int" output
> > directly after the FIQ hits:
> > Taking exception 3 [Prefetch Abort]
> > ...with IFSR 0x5 IFAR 0x800c8dcc  //kmem_cache_alloc
> > Taking exception 3 [Prefetch Abort]
> > ...with IFSR 0x5 IFAR 0x8001be00 //v7_pabort
> > Taking exception 3 [Prefetch Abort]
> > And then it continues to fail on v7_pabort repeatedly. Something fishy is
> > going on here: is it failing in the presumed handler for the prefetch abort
> > itself? Since I see earlier prefetch aborts being resolved, I conclude that
> > things work up to the point where the CPU enters FIQ mode.
> > FIQ is special in that statically mapped memory is needed to avoid a page
> > lookup, as a lookup fails under Linux in FIQ mode. But 0x800c8dcc
> > (kmem_cache_alloc) is not called by the FIQ handler, which obviously can't
> > use any Linux infrastructure. And since I do not reach the breakpoint at
> > 0xffff1244, these faults happen when executing the first instruction of the
> > FIQ handler.
> 
> Can we check the vector table to see if the FIQ entry is as expected?  It
> appears that the pabort may be in the right place, but it would be good to
> see if the FIQ entry is correct (branching to right place).  I'd expect
> that we should be branching to __fiq_svc?  Maybe setting a breakpoint in
> the first level handler may be useful?
OK, I was digging into this as well. I thought I had checked this already,
but alas, it seems the values are not the ones I put in with set_fiq_handler.
So I dug a little deeper and tried to find out the values of VBAR, MVBAR and
HVBAR.

This is the GCC inline assembly from my kernel module, written in C:

asm volatile("mrc p15, 0, %0, c12, c0, 0" : "=r"(vbar));  /* VBAR */
asm volatile("mrc p15, 0, %0, c12, c0, 1" : "=r"(mvbar)); /* MVBAR <- not implemented? */
asm volatile("mrc p15, 4, %0, c12, c0, 0" : "=r"(hvbar)); /* HVBAR <- not implemented? */

It seems as if neither MVBAR nor HVBAR is implemented, and VBAR returns
zero!? I also have a problem with the addresses: the FIQ handler lies at
0xffff1240, but vectors_page in Linux points to 0xbfffe000? You were talking
about the fact that the security extensions were not implemented. I was not
aware that the different VBARs were already part of the security stuff?

> > > > > > [<80230d04>] (gic_eoi_irq) from [<f8200100>] (0xf8200100)
> > > > > > Code: ee02af10 f57ff06f e59d8000 e59d9004 (e599b00c)
> > > > > > ---[ end trace 3dc3571209a017e1 ]---
> > > > > > Kernel panic - not syncing: Fatal exception in interrupt
> > > > > 
> > > > > It is hard to determine entirely what is happening here based on
> > > > > this info.  I do have code of my own that routes KGDB interrupts as
> > > > > FIQs and with the workaround I see the FIQs handled as expected.
> > > > > Some things we can try to get more info in hopes of pinpointing
> > > > > where to look:
> > > > >    1. At the top of hw/intc/arm_gic.c there is the following
> > > > >    commented out line:
> > > > >        //#define DEBUG_GIC
> > > > >
> > > > >    Uncomment the line, rebuild and rerun.  This will give us some
> > > > >    trace on what is going through the GIC code.
> > > > 
> > > > I have commented out some debug lines, but I see:
> > > > Breakpoint 1, gic_update_with_grouping (s=0x5555564dba80) at
> > > > hw/intc/arm_gic.c:120
> > > > 120                         DPRINTF("Raised pending FIQ %d (cpu
> > > > %d)\n",
> > > > best_irq, cpu);
> > > > 
> > > > with the expected IRQ number 49 (32+17).
> > > > 
> > > > >    2. Run qemu with the "-d int" option, which will print a message on
> > > > >    each interrupt.  We should see an FIQ at some point if they are
> > > > >    occurring.  The only issue is that there will be numerous IRQs, so
> > > > >    you'll have to parse through them to find an "exception 6 [FIQ]".
> > > > 
> > > > Here is the relevant output when the FIQ hits:
> > > > Taking exception 2 [SVC]
> > > > Taking exception 2 [SVC]
> > > > pml: pml_timer_tick: raise_irq
> > > > arm_gic: Raised pending FIQ 49 (cpu 0)
> > > > Taking exception 6 [FIQ]
> > > 
> > > This looks to me like the GIC has caught the interrupt and communicated
> > > it to the CPU, causing it to take the FIQ exception.
> > > 
> > > > pml: pml_write: update control flags: 1
> > > > pml: pml_update: start timer
> > > > pml: pml_update: lower irq
> > > > pml: pml_read: read magic
> > > > pml: pml_write: update control flags: 3
> > > > pml: pml_update: start timer
> > > 
> > > Is pml your test driver?  It looks like it initiates the interrupt and
> > > possibly performs some handling following it?
> > 
> > Yes, it's just a simple set of registers to control an interrupt.
> > I added debug output to this driver to see if and when the FIQ is
> > accessing the registers. But I see no accesses from FIQ mode.
> > 
> > > > Taking exception 3 [Prefetch Abort]
> > > > ...with IFSR 0x5 IFAR 0x80221d70
> > > > Taking exception 4 [Data Abort]
> > > > ...with DFSR 0x805 DFAR 0x805c604c
> > > > Taking exception 4 [Data Abort]
> > > > ...with DFSR 0x805 DFAR 0x805c604c
> > > > Taking exception 4 [Data Abort]
> > > > 
> > > > So the FIQ is hitting, but unfortunately I have no idea where the data
> > > > aborts are coming from.
> > > 
> > > The data aborts are likely a side effect of the prefetch abort taken
> > > before them; it is the interesting one.
> > 
> > Still, as above, the address is odd. In FIQ mode it should not jump to
> > this address at all!? This is definitely Linux memory space, and I am not
> > calling anything Linux-related from FIQ.
> 
> I'm a bit confused as it appears the exception pattern has changed.
> Previously, we were seeing pabt, dabt, dabt, ..., but then up above the
> output is pabt, pabt, pabt, ... .  So, either we are jumping somewhere
> random thus breaking repeatability or something else changed?  This is also
> reflected in the A15 output below, but it's different.
Hmm, it seems that the IFAR of the first Prefetch Abort is random, and the IFSR
is also different:
First run:
Taking exception 6 [FIQ]
pml: pml_write: update control flags: 1
pml: pml_update: start timer
pml: pml_update: lower irq
pml: pml_read: read magic
pml: pml_write: update control flags: 3
pml: pml_update: start timer
Taking exception 3 [Prefetch Abort]
...with IFSR 0x17 IFAR 0x76f6a5e4
Taking exception 4 [Data Abort]
...with DFSR 0x17 DFAR 0x76fcc25c
Taking exception 3 [Prefetch Abort]
...with IFSR 0x17 IFAR 0x76e7f884
Taking exception 3 [Prefetch Abort]
...with IFSR 0x17 IFAR 0x76e7e61c
Taking exception 4 [Data Abort]
...with DFSR 0x17 DFAR 0x76ee5b40
Taking exception 4 [Data Abort]
...with DFSR 0x17 DFAR 0x76fce060
Taking exception 4 [Data Abort]
...with DFSR 0x17 DFAR 0x76fd245c
Taking exception 4 [Data Abort]
...with DFSR 0x17 DFAR 0x76fcdf88

Second run:
pml: pml_timer_tick: raise_irq
arm_gic: Raised pending FIQ 49 (cpu 0)
Taking exception 6 [FIQ]
pml: pml_write: update control flags: 1
pml: pml_update: start timer
pml: pml_update: lower irq
pml: pml_read: read magic
pml: pml_write: update control flags: 3
pml: pml_update: start timer
Taking exception 2 [SVC]
Taking exception 2 [SVC]
Taking exception 1 [Undefined Instruction]
Taking exception 2 [SVC]
Taking exception 2 [SVC]
Taking exception 1 [Undefined Instruction]
Taking exception 2 [SVC]
Taking exception 2 [SVC]
Taking exception 2 [SVC]
Taking exception 2 [SVC]

Third run:
arm_gic: Raised pending FIQ 49 (cpu 0)
Taking exception 6 [FIQ]
pml: pml_write: update control flags: 1
pml: pml_update: start timer
pml: pml_update: lower irq
pml: pml_read: read magic
pml: pml_write: update control flags: 3
pml: pml_update: start timer
Taking exception 2 [SVC]
Taking exception 3 [Prefetch Abort]
...with IFSR 0x17 IFAR 0x76ea8000
Taking exception 2 [SVC]
Taking exception 1 [Undefined Instruction]
Taking exception 2 [SVC]
Taking exception 2 [SVC]
Taking exception 2 [SVC]
Taking exception 2 [SVC]
Taking exception 4 [Data Abort]
...with DFSR 0x817 DFAR 0x76d20000
Taking exception 2 [SVC]
Taking exception 2 [SVC]

But then again, I think that the kernel and QEMU disagree about the position
of the vector table. So it seems it is just executing something random on the
interrupt, which leads to different results each time.

So I will just outline what I do in the kernel to get an FIQ from the
GIC:
* I create some static mappings to make sure I take no page fault in FIQ mode
* Then I reprogram the GIC so that all IRQs are mapped to group 1,
* except one special IRQ, which is programmed to group 0
* FIQ mode is enabled for group 0
It seems as if the GIC implementation in QEMU knows this dance, as I am seeing
the FIQ happening. But I have my doubts (due to the missing VBARs and the
different addresses in the kernel vs. QEMU) whether the CPU in QEMU is up to the task.

The code I am using has been ported from an Altera SoC Cortex-A9 and works
there. While some addresses were hardcoded, I *think* that I have meanwhile
found all the wrong addresses in the static mappings that bit me earlier. But
now I have the impression that I have to dig on the QEMU side again.

Best regards
Tim
Peter Maydell Nov. 13, 2014, 4:46 p.m. UTC
On 13 November 2014 16:26, Tim Sander <tim@krieglstein.org> wrote:
> This is the gcc inline assembly syntax from my kernel module written in c:
> asm("mrc p15, 0, %0, c12, c0, 0" : "=r"(vbar) : : "cc");
> asm("mrc p15, 0, %0, c12, c0, 1" : "=r"(mvbar) : : "cc"); <- not implemented?
> asm("mrc p15, 4, %0, c12, c0, 0" : "=r"(hvbar) : : "cc");  <- not implemented?
>
> It seems as if neither mvbar nor hvbar are implemented and that vbar returns
> zero !? I also have a problem with the addresses:
> The fiq handler lies at 0xffff1240 but the vectors_page in Linux points to
> 0xbfffe000? You where talking about the fact that the security extensions
> where not implemented. I was not aware that the different vbar's where
> already part of the security stuff?

MVBAR is part of the Security extensions. HVBAR is part of the
Virtualization extensions. In mainline QEMU we implement neither
of those extensions, and so don't implement the associated
registers. (Strictly speaking, VBAR is also only in the
Security extensions, but we provide it as a workaround for
guests that assume our CPUs should implement it.)

-- PMM
Greg Bellows Nov. 13, 2014, 8:09 p.m. UTC
On 13 November 2014 10:46, Peter Maydell <peter.maydell@linaro.org> wrote:

> On 13 November 2014 16:26, Tim Sander <tim@krieglstein.org> wrote:
> > This is the gcc inline assembly syntax from my kernel module written in c:
> > asm("mrc p15, 0, %0, c12, c0, 0" : "=r"(vbar) : : "cc");
> > asm("mrc p15, 0, %0, c12, c0, 1" : "=r"(mvbar) : : "cc"); <- not implemented?
> > asm("mrc p15, 4, %0, c12, c0, 0" : "=r"(hvbar) : : "cc");  <- not implemented?
> >
> > It seems as if neither mvbar nor hvbar are implemented and that vbar returns
> > zero !? I also have a problem with the addresses:
> > The fiq handler lies at 0xffff1240 but the vectors_page in Linux points to
> > 0xbfffe000? You where talking about the fact that the security extensions
> > where not implemented. I was not aware that the different vbar's where
> > already part of the security stuff?
>
> MVBAR is part of the Security extensions. HVBAR is part of the
> Virtualization extensions. In mainline QEMU we implement neither
> of those extensions, and so don't implement the associated
> registers. (Strictly speaking, VBAR is also only in the
> Security extensions, but we provide it as a workaround for
> guests that assume our CPUs should implement it.)
>

Peter beat me to it.  None of the VBAR registers should matter in your case,
since you are using hivecs.

It may be worthwhile to put a kernel breakpoint in handle_fiq_as_nmi() just
to see where it goes.  If CONFIG_ARM_GIC is enabled it should take you to
your handler I suspect.  Plus, if you get there then we have likely proven
that QEMU is getting the kernel to the right place.  I set a BP in this
routine on my A9 run and appear to be hitting it correctly.

Let me know how I can help in debugging the QEMU side of things.


>
> -- PMM
>
Tim Sander Nov. 14, 2014, 3:34 p.m. UTC
> > > 0xbfffe000? You where talking about the fact that the security extensions
> > > where not implemented. I was not aware that the different vbar's where
> > > already part of the security stuff?
> > 
> > MVBAR is part of the Security extensions. HVBAR is part of the
> > Virtualization extensions. In mainline QEMU we implement neither
> > of those extensions, and so don't implement the associated
> > registers. (Strictly speaking, VBAR is also only in the
> > Security extensions, but we provide it as a workaround for
> > guests that assume our CPUs should implement it.)
> 
> Peter beat me to it.  None of the VBAR registers should matter in your case
> which coincides with the use of hivecs.
While writing this mail I found out that the integrated debugger causes
problems in combination with the FIQ. So everything inside the braces below
seems to be related to this problem, but I still wanted to keep the data
points for reference:

{
OK, so QEMU only implements the SCTLR.V bit to control the memory address of
the interrupt vectors. So it's either 0 or 0xffff0000. That is fine with me.
Currently I have the problem that a call to set_fiq_handler does not place the
loaded binary at the address QEMU is jumping to, which is presumably
0xffff1240. I have checked that SCTLR.V = 1 under Linux, which is fine.

From my understanding, set_fiq_handler copies the given code directly to the
address where the FIQ vector is located. This works because the FIQ is the
last entry, so there is some memory space for a short interrupt handler. I
checked the memory when entering the FIQ with the integrated gdb:
(gdb) info reg
r0             0x0      0
r1             0x0      0
r2             0x1      1
r3             0x76eb34c8       1995125960
r4             0x76eb34c8       1995125960
r5             0x76f633b8       1995846584
r6             0x2a     42
r7             0x76f4c28c       1995752076
r8             0xf8200100       -132120320
r9             0xe0040000       -536608768
r10            0x60004059       1610629209
r11            0x0      0
r12            0x0      0
sp             0x908be000       0x908be000
lr             0x76dfc108       1994375432
pc             0xffff1240       0xffff1240 <firq_fiq_handler>
cpsr           0x600f01d1       1611596241
(gdb) x 0xffff1240
0xe599b00c

But my firq_fiq_handler starts with 0xee12af10!? I know that this works on real
hardware, so I suspect that this is an error within QEMU? Or at least that there
is something amiss in the way the memory is initialized or handled.

Is there a way to instrument the memory below the vector table to get debug 
logs if the memory is modified?
}

> It may be worthwhile to put a kernel breakpoint in handle_fiq_as_nmi() just
> to see where it goes.  If CONFIG_ARM_GIC is enabled it should take you to
> your handler I suspect.  Plus, if you get there then we have likely proven
> that QEMU is getting the kernel to the right place.  I set a BP in this
> routine on my A9 run and appear to be hitting it correctly.
So you are talking about the Linux kernel, right? CONFIG_ARM_GIC=y is set, but
I can't find handle_fiq_as_nmi. Even a fuzzier "rgrep nmi * | grep fiq" does not
find anything.

Concerning whether QEMU is jumping to the right address:
I have put a breakpoint at 0xffff001c, which is the FIQ vector address.
There is an instruction 0xea000480 there, which seems to be a PC-relative
branch to 0x1224, which then lands at 0xffff1240.

But the integrated debugger gives me some concerns. If I enter at the gdb
command line:
hb *0xffff001c
hb *0xffff1240
the debugger only stops at the first breakpoint. If I leave out the first
breakpoint, the debugger stops at 0xffff1240. As I know that at 0xffff001c it
should jump right to 0xffff1240, I would expect both breakpoints to be triggered.

Then, when I reach the breakpoint at 0xffff1240, I know I am in the FIQ code. But
(gdb) x 0xffff1240 gives the wrong value. Nevertheless, I now see (after
correcting the static map of the GIC) the following debug output from my test
device when single-stepping from PC=0xffff1240:
Taking exception 6 [FIQ]
pml: pml_write: update control flags: 1
pml: pml_update: stop timer
pml: pml_update: lower irq
pml: pml_read: read magic
pml: pml_write: update control flags: 3
pml: pml_update: stop timer

This means that some code has been executed, most probably my FIQ
handler, but the debugger showed me:
Breakpoint 1, firq_fiq_handler () at fiq.S:26
26              mrc p15, 0, r10, c2, c0, 0         @ read TTBR0   < ok
(gdb) s        <- oh my why is it single stepping into the kernel from FIQ?
test_ti_thread_flag (flag=1, ti=0x8f84e000) at include/asm-generic/preempt.h:71
71              return !--*preempt_count_ptr() && tif_need_resched();
(gdb) s      <- next step does not look any better...
test_bit (addr=0x8f84e000, nr=1) at include/asm-generic/bitops/non-atomic.h:105
105             return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));

The second run is even stranger:
Breakpoint 1, firq_fiq_handler () at fiq.S:26
26              mrc p15, 0, r10, c2, c0, 0         @ read TTBR0
(gdb) s
Cannot access memory at address 0x4
(gdb) c
Continuing.
Cannot access memory at address 0x4
...
QEMU seems completely unusable from here on...

I am pretty sure now that my FIQ handler is being executed.
I see multiple accesses to my virtual pml test hardware:
arm_gic: Raised pending FIQ 49 (cpu 0)
pml: pml_write: update control flags: 1
pml: pml_update: start timer
pml: pml_update: lower irq
pml: pml_read: read magic
pml: pml_write: update control flags: 3
pml: pml_update: start timer
arm_gic: Enabled IRQ 37
[  OK  ] Found device /dev/ttyAMA0.
pml: pml_timer_tick: raise_irq
arm_gic: Raised pending FIQ 49 (cpu 0)
pml: pml_write: update control flags: 1
pml: pml_update: start timer
pml: pml_update: lower irq
pml: pml_read: read magic
pml: pml_write: update control flags: 3
pml: pml_update: start timer
pml: pml_timer_tick: raise_irq

This looks like normal operation. In particular, the log
message shows that other code gets executed as well.

But after a while the interrupts stop and nothing happens.
The system no longer reacts to keypresses, not even
Ctrl-A X. But it seems as if the debug output in the GIC and/or
my pml test driver locked QEMU up?

Also, if I connect to the gdb port while the FIQ is running,
QEMU stops execution.

But apart from the problems with the debugger, which set me off course,
QEMU seems to happily emulate FIQs, which is really nice :-)

Best regards
Tim
Greg Bellows Nov. 14, 2014, 4:50 p.m. UTC
On 14 November 2014 09:34, Tim Sander <tim@krieglstein.org> wrote:

> > > > 0xbfffe000? You where talking about the fact that the security
> > > > extensions where not implemented. I was not aware that the different
> > > > vbar's where already part of the security stuff?
> > >
> > > MVBAR is part of the Security extensions. HVBAR is part of the
> > > Virtualization extensions. In mainline QEMU we implement neither
> > > of those extensions, and so don't implement the associated
> > > registers. (Strictly speaking, VBAR is also only in the
> > > Security extensions, but we provide it as a workaround for
> > > guests that assume our CPUs should implement it.)
> >
> > Peter beat me to it.  None of the VBAR registers should matter in your
> > case which coincides with the use of hivecs.
> While writing this mail i found out that the integrated debugger is causing
> harm in combination with the fiq. So everything below the braces seems to
> be related to the this problem. But i still wanted to keep the data points
> for reference:
>
> {
> Ok, so qemu only implements the SCTLR.V bit to control the memory address
> of the interrupt vector. So its either 0 or 0xffff0000. That is fine with me.
> Currently i have the problem that a call to set_fiq_handler does not place
> the binary stuff loaded at the address where qemu is jumping to which is
> presumably 0xffff1240. I have checked that SCTLR.V =1 under linux which is
> fine.
>
> The background info to set_fiq_handler from my understanding is that it
> copies the given stuff directly at the address where the FIQ vector is
> located. This works as the FIQ is the last entry and thus there is some
> memory space for a short interrupt handler. I checked the memory when
> entering the FIQ with the integrated gdb:
> (gdb) info reg
> r0             0x0      0
> r1             0x0      0
> r2             0x1      1
> r3             0x76eb34c8       1995125960
> r4             0x76eb34c8       1995125960
> r5             0x76f633b8       1995846584
> r6             0x2a     42
> r7             0x76f4c28c       1995752076
> r8             0xf8200100       -132120320
> r9             0xe0040000       -536608768
> r10            0x60004059       1610629209
> r11            0x0      0
> r12            0x0      0
> sp             0x908be000       0x908be000
> lr             0x76dfc108       1994375432
> pc             0xffff1240       0xffff1240 <firq_fiq_handler>
> cpsr           0x600f01d1       1611596241
> (gdb) x 0xffff1240
> 0xe599b00c
>
> But my firq_fiq_handler starts with 0xee12af10? I know that this works on
> real hardware so i suspect that this an error within qemu? Or at least that
> there is something amiss in the way the memory is initialized or handled.
>
> Is there a way to instrument the memory below the vector table to get debug
> logs if the memory is modified?
> }
>
> > It may be worthwhile to put a kernel breakpoint in handle_fiq_as_nmi()
> > just to see where it goes.  If CONFIG_ARM_GIC is enabled it should take
> > you to your handler I suspect.  Plus, if you get there then we have likely
> > proven that QEMU is getting the kernel to the right place.  I set a BP in
> > this routine on my A9 run and appear to be hitting it correctly.
> So you are talking about the linux kernel, right? CONFIG_ARM_GIC=y check
> but i can't find handle_fiq_as_nmi? Even a fuzzier "rgrep nmi * |grep fiq"
> does not find anything.
>

Maybe we are working off different versions of the kernel sources.  I'm
using a kernel variant of v3.18-rc1.  I took a look at my 3.15 kernel and
it does not have the routine, so perhaps yours is an earlier version as
well.

I don't spend much time working in or tracking the Linux kernel, so I am
not sure when the difference was introduced.  I just found it to be a
convenient function to set a BP for early FIQ debugging, you may have
something different.

Interestingly, as I researched the Linux FIQ support I found this mail
thread...

http://www.spinics.net/lists/arm-kernel/msg14960.html

As I don't have access to your code, I could not verify that the SVC SPSR
was being preserved, but it may be worth you looking into it as I can see
this potentially resulting in all kinds of random behavior.  More
interestingly, this comment and code appears to have been changed in later
versions of the FIQ code, so perhaps this has been fixed or improved (My
3.18 kernel does not have the comment).


> Concerning the fact that qemu is jumping to the right address:
> To i have put a breakpoint to 0xffff001c which is the fiq base vector
> address. There is an instruction 0xea000480 which seems to be a pc
> relative branch to 0x1224 which then lands at 0xffff1240.
>
> But the internal debugger gives me some concerns. If i do at the gdb
> command line:
> hb *0xffff001c
> hb *0xffff1240
> The debugger only stops at the first breakpoint. If i leave the first
> breakpoint away the debugger stops at 0xffff1240. As i know that at
> 0xffff01c it should jump right to 0xffff1240 i would expect that both
> breakpoints are triggered.
>
> Then if i reach the breakpoint at 0xffff1240 i know i am at the fiq code.
> But (gdb) x 0xffff1240 gives the wrong value. Nevertheless i see now (after
> correcting the static map of the GIC) the following debug output of my test
> device when single-stepping from PC=0xffff1240:
> Taking exception 6 [FIQ]
> pml: pml_write: update control flags: 1
> pml: pml_update: stop timer
> pml: pml_update: lower irq
> pml: pml_read: read magic
> pml: pml_write: update control flags: 3
> pml: pml_update: stop timer
>
> This means that there has been some code executed, most probably my FIQ
> handler, but the debugger showed me:
> Breakpoint 1, firq_fiq_handler () at fiq.S:26
> 26              mrc p15, 0, r10, c2, c0, 0         @ read TTBR0   < ok
> (gdb) s        <- oh my why is it single stepping into the kernel from FIQ?
> test_ti_thread_flag (flag=1, ti=0x8f84e000) at include/asm-generic/preempt.h:71
> 71              return !--*preempt_count_ptr() && tif_need_resched();
> (gdb) s      <- next step does not look any better...
> test_bit (addr=0x8f84e000, nr=1) at include/asm-generic/bitops/non-atomic.h:105
> 105             return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
>
> The second run is even stranger:
> Breakpoint 1, firq_fiq_handler () at fiq.S:26
> 26              mrc p15, 0, r10, c2, c0, 0         @ read TTBR0
> (gdb) s
> Cannot access memory at address 0x4
> (gdb) c
> Continuing.
> Cannot access memory at address 0x4
> ...
> qemu seems completly unusable from here on...
>
> I am pretty sure now that my FIQ handler is executed.
> I see multiple accesses to my virtual pml test hardware:
> arm_gic: Raised pending FIQ 49 (cpu 0)
> pml: pml_write: update control flags: 1
> pml: pml_update: start timer
> pml: pml_update: lower irq
> pml: pml_read: read magic
> pml: pml_write: update control flags: 3
> pml: pml_update: start timer
> arm_gic: Enabled IRQ 37
> [  OK  ] Found device /dev/ttyAMA0.
> pml: pml_timer_tick: raise_irq
> arm_gic: Raised pending FIQ 49 (cpu 0)
> pml: pml_write: update control flags: 1
> pml: pml_update: start timer
> pml: pml_update: lower irq
> pml: pml_read: read magic
> pml: pml_write: update control flags: 3
> pml: pml_update: start timer
> pml: pml_timer_tick: raise_irq
>
> Which seems like normal operation. Especially the log
> message shows that other stuff gets executed.
>
> But after a while the interrupts stop and nothing happens
> The system is not reacting to keypresses anymore. Not even
> Ctrl-A-X. But this seems as if the debug output in the GIC and/or
> my pml test driver locked the qemu up?
>

Hmmm... it almost sounds like we lost an interrupt or an acknowledgement,
which could be in QEMU. Does execution also stop if you run it as an A15?


>
> Also if i connect to the gdb port while the fiq is running the
> qemu stops the execution.
>
> But besides the problems with the debugger which set me of course
> the qemu seems to happy emulate FIQs, which is really nice :-)
>
>
I'm happy to hear that we found a working scenario, but hangs and such
should not happen.  I need to find a way to look into this more
myself to see if it is related to grouping or FIQ support.


> Best regards
> Tim
>
Tim Sander Nov. 17, 2014, 2:33 p.m. UTC
Hi Greg

On Friday, 14 November 2014, 10:50:40, Greg Bellows wrote:
> On 14 November 2014 09:34, Tim Sander <tim@krieglstein.org> wrote:
> > > > > 0xbfffe000? You where talking about the fact that the security
> > > > > extensions where not implemented. I was not aware that the different
> > > > > vbar's where already part of the security stuff?
> > > > 
> > > > MVBAR is part of the Security extensions. HVBAR is part of the
> > > > Virtualization extensions. In mainline QEMU we implement neither
> > > > of those extensions, and so don't implement the associated
> > > > registers. (Strictly speaking, VBAR is also only in the
> > > > Security extensions, but we provide it as a workaround for
> > > > guests that assume our CPUs should implement it.)
> > > 
> > > Peter beat me to it.  None of the VBAR registers should matter in your
> > > case which coincides with the use of hivecs.
> > 
> > While writing this mail i found out that the integrated debugger is
> > causing harm in combination with the fiq. So everything below the braces
> > seems to be related to the this problem. But i still wanted to keep the
> > data points for reference:
> > 
> > {
> > Ok, so qemu only implements the SCTLR.V bit to control the memory address
> > of the interrupt vector. So its either 0 or 0xffff0000. That is fine with
> > me. Currently i have the problem that a call to set_fiq_handler does not
> > place the binary stuff loaded at the address where qemu is jumping to
> > which is presumably 0xffff1240. I have checked that SCTLR.V =1 under linux
> > which is fine.
> > 
> > The background info to set_fiq_handler from my understanding is that it
> > copies the given stuff directly at the address where the FIQ vector is
> > located. This works as the FIQ is the last entry and thus there is some
> > memory space for a short interrupt handler. I checked the memory when
> > entering the FIQ with the integrated gdb:
> > (gdb) info reg
> > r0             0x0      0
> > r1             0x0      0
> > r2             0x1      1
> > r3             0x76eb34c8       1995125960
> > r4             0x76eb34c8       1995125960
> > r5             0x76f633b8       1995846584
> > r6             0x2a     42
> > r7             0x76f4c28c       1995752076
> > r8             0xf8200100       -132120320
> > r9             0xe0040000       -536608768
> > r10            0x60004059       1610629209
> > r11            0x0      0
> > r12            0x0      0
> > sp             0x908be000       0x908be000
> > lr             0x76dfc108       1994375432
> > pc             0xffff1240       0xffff1240 <firq_fiq_handler>
> > cpsr           0x600f01d1       1611596241
> > (gdb) x 0xffff1240
> > 0xe599b00c
> > 
> > But my firq_fiq_handler starts with 0xee12af10? I know that this works on
> > real hardware so i suspect that this an error within qemu? Or at least
> > that there is something amiss in the way the memory is initialized or
> > handled.
> > 
> > Is there a way to instrument the memory below the vector table to get
> > debug logs if the memory is modified?
> > }
> > 
> > > It may be worthwhile to put a kernel breakpoint in handle_fiq_as_nmi()
> > > just to see where it goes.  If CONFIG_ARM_GIC is enabled it should take
> > > you to your handler I suspect.  Plus, if you get there then we have
> > > likely proven that QEMU is getting the kernel to the right place.  I set
> > > a BP in this routine on my A9 run and appear to be hitting it correctly.
> > 
> > So you are talking about the linux kernel, right? CONFIG_ARM_GIC=y check
> > but i can't find handle_fiq_as_nmi? Even a fuzzier "rgrep nmi * |grep fiq"
> > does not find anything.
> 
> Maybe we are working off different versions of the kernel sources.  I'm
> using a kernel variant of v3.18-rc1.  I took a look at my 3.15 kernel and
> it does not have the routine, so perhaps yours is an earlier version as
> well.
I am on 3.14, as I am working with rt-preempt kernels right now.

> I don't spend much time working in or tracking the Linux kernel, so I am
> not sure when the difference was introduced.  I just found it to be a
> convenient function to set a BP for early FIQ debugging, you may have
> something different.
> 
> Interestingly, as I researched the Linux FIQ support I found this mail
> thread...
> 
> http://www.spinics.net/lists/arm-kernel/msg14960.html
> 
> As I don't have access to your code, I could not verify that the SVC SPSR
> was being preserved, but it may be worth you looking into it as I can see
> this potentially resulting in all kinds of random behavior.  More
> interestingly, this comment and code appears to have been changed in later
> versions of the FIQ code, so perhaps this has been fixed or improved (My
> 3.18 kernel does not have the comment).
I have not been following the 3.18 kernel concerning the FIQ, but I will take
a look. Regarding the above link, I think preserving the SPSR is only needed if
a mode switch is done from the FIQ. As I just return from the handler, I am
assuming that the problem above does not apply to me. The only problem I have
(besides the QEMU debugger) is that I am missing some static mappings, so I
get a bad mode error when hitting a page fault in FIQ mode.

> > Concerning the fact that qemu is jumping to the right address:
> > To i have put a breakpoint to 0xffff001c which is the fiq base vector
> > address. There is an instruction 0xea000480 which seems to be a pc
> > relative branch to 0x1224 which then lands at 0xffff1240.
> > 
> > But the internal debugger gives me some concerns. If i do at the gdb
> > command line:
> > hb *0xffff001c
> > hb *0xffff1240
> > The debugger only stops at the first breakpoint. If i leave the first
> > breakpoint away the debugger stops at 0xffff1240. As i know that at
> > 0xffff01c it should jump right to 0xffff1240 i would expect that both
> > breakpoints are triggered.
> > 
> > Then if i reach the breakpoint at 0xffff1240 i know i am at the fiq code.
> > But (gdb) x 0xffff1240 gives the wrong value. Nevertheless i see now
> > (after correcting the static map of the GIC) the following debug output of
> > my test device when single-stepping from PC=0xffff1240:
> > Taking exception 6 [FIQ]
> > pml: pml_write: update control flags: 1
> > pml: pml_update: stop timer
> > pml: pml_update: lower irq
> > pml: pml_read: read magic
> > pml: pml_write: update control flags: 3
> > pml: pml_update: stop timer
> > 
> > This means that there has been some code executed, most probably my FIQ
> > handler, but the debugger showed me:
> > Breakpoint 1, firq_fiq_handler () at fiq.S:26
> > 26              mrc p15, 0, r10, c2, c0, 0         @ read TTBR0   < ok
> > (gdb) s        <- oh my why is it single stepping into the kernel from FIQ?
> > test_ti_thread_flag (flag=1, ti=0x8f84e000) at include/asm-generic/preempt.h:71
> > 71              return !--*preempt_count_ptr() && tif_need_resched();
> > (gdb) s      <- next step does not look any better...
> > test_bit (addr=0x8f84e000, nr=1) at include/asm-generic/bitops/non-atomic.h:105
> > 105             return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> > 
> > The second run is even stranger:
> > Breakpoint 1, firq_fiq_handler () at fiq.S:26
> > 26              mrc p15, 0, r10, c2, c0, 0         @ read TTBR0
> > (gdb) s
> > Cannot access memory at address 0x4
> > (gdb) c
> > Continuing.
> > Cannot access memory at address 0x4
> > ...
> > qemu seems completly unusable from here on...
> > 
> > I am pretty sure now that my FIQ handler is executed.
> > I see multiple accesses to my virtual pml test hardware:
> > arm_gic: Raised pending FIQ 49 (cpu 0)
> > pml: pml_write: update control flags: 1
> > pml: pml_update: start timer
> > pml: pml_update: lower irq
> > pml: pml_read: read magic
> > pml: pml_write: update control flags: 3
> > pml: pml_update: start timer
> > arm_gic: Enabled IRQ 37
> > [  OK  ] Found device /dev/ttyAMA0.
> > pml: pml_timer_tick: raise_irq
> > arm_gic: Raised pending FIQ 49 (cpu 0)
> > pml: pml_write: update control flags: 1
> > pml: pml_update: start timer
> > pml: pml_update: lower irq
> > pml: pml_read: read magic
> > pml: pml_write: update control flags: 3
> > pml: pml_update: start timer
> > pml: pml_timer_tick: raise_irq
> > 
> > This looks like normal operation. In particular, the "Found device" log
> > message shows that other code is being executed as well.
> > 
> > But after a while the interrupts stop and nothing happens. The system no
> > longer reacts to keypresses, not even Ctrl-A X. Could the debug output in
> > the GIC and/or my pml test driver have locked QEMU up?
> 
> Hmmm... it almost sounds like we lost an interrupt or an ack, which could
> be a QEMU issue. Does execution cease if run as an A15?
By now I think it is not related to the CPU core but to the gdb debug port.
As soon as the debugger is attached and a FIQ is hit, the problems start. This
was unfortunate for my tests, as I was using the integrated debugger to debug
the FIQ. But the results it shows are completely bogus and definitely do not
reflect what QEMU is actually executing; QEMU is running fine, as I can see
from the debug output of my virtual hardware driver.
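
For reference, the "Raised pending FIQ 49" lines in the log above reflect the
GIC CPU interface choosing FIQ rather than IRQ for that interrupt. A
simplified sketch of that decision as described by the GICv2 architecture
(not QEMU's actual arm_gic code): a Group 0 interrupt is signaled as FIQ only
when GICC_CTLR.FIQEn is set, while Group 1 interrupts always use IRQ. The
names gic_route and gic_signal are hypothetical, the bit layout assumes a
GICv2 CPU interface without the Security Extensions, and priority masking and
the distributor enable are deliberately ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* GICC_CTLR bits, assuming the GICv2 layout used when the Security
 * Extensions are absent (same as the Secure copy of the register). */
#define GICC_CTLR_ENABLE_GRP0 (1u << 0)
#define GICC_CTLR_ENABLE_GRP1 (1u << 1)
#define GICC_CTLR_FIQ_EN      (1u << 3)

typedef enum { SIGNAL_NONE, SIGNAL_IRQ, SIGNAL_FIQ } gic_signal;

/* Hypothetical helper: which exception line a pending interrupt would be
 * signaled on. group1 mirrors the interrupt's bit in GICD_IGROUPRn
 * (0 = Group 0, 1 = Group 1). Priority and distributor enables ignored. */
static gic_signal gic_route(uint32_t gicc_ctlr, bool group1)
{
    if (group1) {
        /* Group 1 interrupts are always signaled as IRQ. */
        return (gicc_ctlr & GICC_CTLR_ENABLE_GRP1) ? SIGNAL_IRQ : SIGNAL_NONE;
    }
    if (!(gicc_ctlr & GICC_CTLR_ENABLE_GRP0)) {
        return SIGNAL_NONE; /* Group 0 not enabled at this CPU interface */
    }
    /* Group 0 goes to FIQ only when FIQEn is set; otherwise it is an IRQ. */
    return (gicc_ctlr & GICC_CTLR_FIQ_EN) ? SIGNAL_FIQ : SIGNAL_IRQ;
}
```

So for interrupt 49 to arrive as a FIQ, it has to be in Group 0 and the CPU
interface needs both EnableGrp0 and FIQEn set.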

> > Also, if I connect to the gdb port while the FIQ is running, QEMU stops
> > execution.
> > 
> > But apart from the problems with the debugger, which set me off course,
> > QEMU seems to happily emulate FIQs, which is really nice :-)
> 
> I'm happy to hear that we found a working scenario, but hangs and such
> should not happen. I need to find a way to look into this more myself to
> see whether it is related to grouping or to FIQ support.
If you need a target, I can prepare a ptxdist.org-based environment for you,
along with the patches for my test driver. This should give you a Linux
environment in less than 30 minutes of work plus about 30 minutes of compile
time (depending on the CPU).

Best regards
Tim

Patch

--- a/hw/cpu/a9mpcore.c
+++ b/hw/cpu/a9mpcore.c
@@ -29,6 +29,7 @@ static void a9mp_priv_initfn(Object *obj)

     object_initialize(&s->gic, sizeof(s->gic), TYPE_ARM_GIC);
     qdev_set_parent_bus(DEVICE(&s->gic), sysbus_get_default());
+    qdev_prop_set_uint32(DEVICE(&s->gic), "revision", 2);

     object_initialize(&s->gtimer, sizeof(s->gtimer), TYPE_A9_GTIMER);
     qdev_set_parent_bus(DEVICE(&s->gtimer), sysbus_get_default());
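
The patch above sets the GIC "revision" property to 2 on the A9MPCore,
presumably so that the interrupt group registers are no longer the RAZ/WI
path discussed earlier in this thread. Once they are writable, putting the
test device's interrupt 49 into Group 0 (the FIQ-capable group) means
clearing its bit in the matching GICD_IGROUPRn register. A minimal sketch of
that offset and bit arithmetic, assuming the GICv2 distributor layout with
GICD_IGROUPRn starting at offset 0x080; the helper names are hypothetical and
not part of QEMU or Linux:

```c
#include <assert.h>
#include <stdint.h>

/* GICD_IGROUPRn starts at distributor offset 0x080 (GICv2 layout);
 * each 32-bit word holds one group bit per interrupt ID. */
#define GICD_IGROUPR_BASE 0x080u

/* Byte offset from the distributor base of the IGROUPR word holding irq. */
static uint32_t gicd_igroupr_offset(uint32_t irq)
{
    assert(irq < 1020u); /* IDs 1020-1023 are special, not real interrupts */
    return GICD_IGROUPR_BASE + 4u * (irq / 32u);
}

/* Mask for irq within that word: bit clear = Group 0, bit set = Group 1. */
static uint32_t gicd_igroupr_mask(uint32_t irq)
{
    return 1u << (irq % 32u);
}

/* New register value with irq moved into Group 0, other bits unchanged. */
static uint32_t gicd_igroupr_make_group0(uint32_t current, uint32_t irq)
{
    return current & ~gicd_igroupr_mask(irq);
}
```

For interrupt 49 this gives offset 0x084 (GICD_IGROUPR1) and mask 0x00020000
(bit 17); a FIQ-only setup would clear that bit while leaving the remaining
interrupts in Group 1.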