Patchwork Broken Microblaze timer

login
register
mail settings
Submitter Peter A. G. Crosthwaite
Date June 14, 2012, 2:29 a.m.
Message ID <CAEgOgz6UwyirFOfTe9UnK3_pLyz6nhJdfuhon4L+Nw_nOMyyCQ@mail.gmail.com>
Download mbox | patch
Permalink /patch/164787/
State New
Headers show

Comments

Peter A. G. Crosthwaite - June 14, 2012, 2:29 a.m.
Hi all,

For a while now I have had this hack in my tree:

Author: Peter A. G. Crosthwaite <peter.crosthwaite@petalogix.com>
Date:   Fri Mar 30 15:04:05 2012 +1000

    async.c: disabled event notifications.

    This is a nasty hack thats needed to prevent the xilinx timer from
deadlocking

    Signed-off-by: Peter A. G. Crosthwaite <peter.crosthwaite@petalogix.com>

freezes on boot. I managed to ascertain that its coming from the
ptimer used by the system timer (hw/xilinx_timer.c). The CPU is either
halted and never resumes, or the timer is flooding with halt requests
and the CPU never gets another look in.

The question is, is this a failure in ptimer, xilinx_timer or the
async framework? Can ptimer be misused such that the CPU is locked up?

I wish to push my linux image as test image for microblaze little
endian, but this issue acts as a blocker. I could play with kernel
configs to get it to behave like the s3adsp image of Edgars, but id
rather fix the issue in QEMU. I will upload my images to qemu.org when
I sort out accnts etc.

Regards,
Peter
Paolo Bonzini - June 14, 2012, 9:19 a.m.
Il 14/06/2012 04:29, Peter Crosthwaite ha scritto:
> Obviously this sucks as a patch, but without this hack, the system
> freezes on boot. I managed to ascertain that its coming from the
> ptimer used by the system timer (hw/xilinx_timer.c). The CPU is either
> halted and never resumes, or the timer is flooding with halt requests
> and the CPU never gets another look in.
> 
> The question is, is this a failure in ptimer, xilinx_timer or the
> async framework? Can ptimer be misused such that the CPU is locked up?

Yes, this looks like the ptimer is flooding the iothread so that the CPU
does not get an occasion to run.

Perhaps you want to limit the rate of the hw/xilinx_timer.c to something
like 1000 Hz or something like that.

Paolo
Peter Chubb - June 14, 2012, 10:39 p.m.
>>>>> "Paolo" == Paolo Bonzini <pbonzini@redhat.com> writes:

Paolo> Il 14/06/2012 04:29, Peter Crosthwaite ha scritto:
>> Obviously this sucks as a patch, but without this hack, the system
>> freezes on boot. I managed to ascertain that its coming from the
>> ptimer used by the system timer (hw/xilinx_timer.c). The CPU is
>> either halted and never resumes, or the timer is flooding with halt
>> requests and the CPU never gets another look in.
>> 
>> The question is, is this a failure in ptimer, xilinx_timer or the
>> async framework? Can ptimer be misused such that the CPU is locked
>> up?

Paolo> Yes, this looks like the ptimer is flooding the iothread so
Paolo> that the CPU does not get an occasion to run.

Paolo> Perhaps you want to limit the rate of the hw/xilinx_timer.c to
Paolo> something like 1000 Hz or something like that.

Changeset cf36b31db209a261ee3bc2737e788e1ced0a1bec Limit ptimer rate
to something achievable
is meant to fix this in the cases where the timer is auto-reloaded.

xilinx_timer.c always uses the timer in one-shot mode, so it needs to
be fixed in there.



--
Dr Peter Chubb				        peter.chubb AT nicta.com.au
http://www.ssrg.nicta.com.au          Software Systems Research Group/NICTA
Peter A. G. Crosthwaite - June 15, 2012, 4:40 a.m.
On Fri, Jun 15, 2012 at 8:39 AM, Peter Chubb <peter.chubb@nicta.com.au> wrote:
>>>>>> "Paolo" == Paolo Bonzini <pbonzini@redhat.com> writes:
>
> Paolo> Il 14/06/2012 04:29, Peter Crosthwaite ha scritto:
>>> Obviously this sucks as a patch, but without this hack, the system
>>> freezes on boot. I managed to ascertain that its coming from the
>>> ptimer used by the system timer (hw/xilinx_timer.c). The CPU is
>>> either halted and never resumes, or the timer is flooding with halt
>>> requests and the CPU never gets another look in.
>>>
>>> The question is, is this a failure in ptimer, xilinx_timer or the
>>> async framework? Can ptimer be misused such that the CPU is locked
>>> up?
>
> Paolo> Yes, this looks like the ptimer is flooding the iothread so
> Paolo> that the CPU does not get an occasion to run.
>
> Paolo> Perhaps you want to limit the rate of the hw/xilinx_timer.c to
> Paolo> something like 1000 Hz or something like that.
>
> Changeset cf36b31db209a261ee3bc2737e788e1ced0a1bec Limit ptimer rate
> to something achievable
> is meant to fix this in the cases where the timer is auto-reloaded.
>
> xilinx_timer.c always uses the timer in one-shot mode, so it needs to
> be fixed in there.
>

So ptimer has safeguards against misuse in periodic mode but not
one-shot mode? This strikes me as inconsistent. I.E. for one use case
the safeguard is in ptimer, and the other in the client device. I
think if we are in the bussiness of putting these safeguards in ptimer
itself then it should be there for all use cases yes? I.E. the fix for
this is making sure one-shot ptimer give the cpu a look in.

Regards,
Peter

>
>
> --
> Dr Peter Chubb                                  peter.chubb AT nicta.com.au
> http://www.ssrg.nicta.com.au          Software Systems Research Group/NICTA
Peter Chubb - June 15, 2012, 5:07 a.m.
>>>>> "Peter" == Peter Crosthwaite <peter.crosthwaite@petalogix.com> writes:

Peter> So ptimer has safeguards against misuse in periodic mode but
Peter> not one-shot mode? This strikes me as inconsistent. I.E. for
Peter> one use case the safeguard is in ptimer, and the other in the
Peter> client device. I think if we are in the bussiness of putting
Peter> these safeguards in ptimer itself then it should be there for
Peter> all use cases yes? I.E. the fix for this is making sure
Peter> one-shot ptimer give the cpu a look in.

When a one-shot timer fires, it has to be reset by the CPU before
it'll fire again.  So the CPU always gets a look in --- or ought to.

--
Dr Peter Chubb				        peter.chubb AT nicta.com.au
http://www.ssrg.nicta.com.au          Software Systems Research Group/NICTA

Patch

diff --git a/async.c b/async.c
index 85cc641..6173a59 100644
--- a/async.c
+++ b/async.c
@@ -106,7 +106,7 @@  void qemu_bh_schedule(QEMUBH *bh)
     bh->scheduled = 1;
     bh->idle = 0;
     /* stop the currently executing CPU to execute the BH ASAP */
-    qemu_notify_event();
+    //qemu_notify_event();
 }

Obviously this sucks as a patch, but without this hack, the system