Patchwork MXS: Convert mutexes in clock.c to spinlocks

login
register
mail settings
Submitter Marek Vasut
Date Dec. 18, 2011, 2:06 p.m.
Message ID <1324217174-6574-1-git-send-email-marek.vasut@gmail.com>
Download mbox | patch
Permalink /patch/132064/
State New
Headers show

Comments

Marek Vasut - Dec. 18, 2011, 2:06 p.m.
The mutexes can't be safely used under certain circumstances. I noticed this
issue during some network instability at home:

1) I received information about duplicate ipv6 address
2) I use amba-pl011 driver for console output

Backtrace produced (not 80-per-line aligned):

eth0: IPv6 duplicate address XXXX::YYYY:ZZZZ:WWWW:0 detected!
------------[ cut here ]------------
WARNING: at kernel/mutex.c:198 mutex_lock_nested+0x284/0x2c0()
Modules linked in:
[<c0013ab8>] (unwind_backtrace+0x0/0xf0) from [<c001d934>] (warn_slowpath_common+0x4c/0x64)
[<c001d934>] (warn_slowpath_common+0x4c/0x64) from [<c001d968>] (warn_slowpath_null+0x1c/0x24)
[<c001d968>] (warn_slowpath_null+0x1c/0x24) from [<c033a4f4>] (mutex_lock_nested+0x284/0x2c0)
[<c033a4f4>] (mutex_lock_nested+0x284/0x2c0) from [<c0017d88>] (clk_enable+0x20/0x48)
[<c0017d88>] (clk_enable+0x20/0x48) from [<c01dfbf0>] (pl011_console_write+0x20/0x74)
[<c01dfbf0>] (pl011_console_write+0x20/0x74) from [<c001da98>] (__call_console_drivers+0x7c/0x94)
[<c001da98>] (__call_console_drivers+0x7c/0x94) from [<c001df20>] (console_unlock+0xf8/0x244)
[<c001df20>] (console_unlock+0xf8/0x244) from [<c001e308>] (vprintk+0x29c/0x484)
[<c001e308>] (vprintk+0x29c/0x484) from [<c03355a8>] (printk+0x20/0x30)
[<c03355a8>] (printk+0x20/0x30) from [<c02df600>] (addrconf_dad_failure+0x148/0x158)
[<c02df600>] (addrconf_dad_failure+0x148/0x158) from [<c02ebde0>] (ndisc_rcv+0xb38/0xdb0)
[<c02ebde0>] (ndisc_rcv+0xb38/0xdb0) from [<c02f1b28>] (icmpv6_rcv+0x6ec/0x9c4)
[<c02f1b28>] (icmpv6_rcv+0x6ec/0x9c4) from [<c02da9a4>] (ip6_input_finish+0x144/0x4e0)
[<c02da9a4>] (ip6_input_finish+0x144/0x4e0) from [<c02db424>] (ip6_mc_input+0xb8/0x154)
[<c02db424>] (ip6_mc_input+0xb8/0x154) from [<c02db12c>] (ipv6_rcv+0x300/0x53c)
[<c02db12c>] (ipv6_rcv+0x300/0x53c) from [<c0267b78>] (__netif_receive_skb+0x240/0x420)
[<c0267b78>] (__netif_receive_skb+0x240/0x420) from [<c0267de8>] (process_backlog+0x90/0x150)
[<c0267de8>] (process_backlog+0x90/0x150) from [<c026a5f8>] (net_rx_action+0xc0/0x264)
[<c026a5f8>] (net_rx_action+0xc0/0x264) from [<c0023cfc>] (__do_softirq+0xa8/0x214)
[<c0023cfc>] (__do_softirq+0xa8/0x214) from [<c00242c0>] (irq_exit+0x8c/0x94)
[<c00242c0>] (irq_exit+0x8c/0x94) from [<c000f6d0>] (handle_IRQ+0x34/0x84)
[<c000f6d0>] (handle_IRQ+0x34/0x84) from [<c000e5d8>] (__irq_usr+0x38/0x80)
---[ end trace 32eab6a8dcdca9c0 ]---

So the problem summary:
1) ICMPv6 packet is received
2) Interrupt is produced, packet is actually received
3) Packet processing goes on (in tasklet)
4) Duplicate ipv6 address detected, warning produced
5) printk() called, calls pl011_console_write()
6) pl011_console_write() calls clk_enable()
7) clk_enable() for mxs does a mutex_lock(), which is wrong in this context

Signed-off-by: Marek Vasut <marek.vasut@gmail.com>
Cc: Wolfgang Denk <wd@denx.de>
Cc: Stefano Babic <sbabic@denx.de>
Cc: Shawn Guo <shawn.guo@linaro.org>
Cc: Huang Shijie <b32955@freescale.com>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
---
 arch/arm/mach-mxs/clock.c |   25 +++++++++++++++----------
 1 files changed, 15 insertions(+), 10 deletions(-)
Shawn Guo - Dec. 19, 2011, 3:27 a.m.
On Sun, Dec 18, 2011 at 03:06:13PM +0100, Marek Vasut wrote:
> The mutexes can't be safely used under certain circumstances. I noticed this
> issue during some network instability at home:
> 
Yes, this is a known issue.  And there was some discussion[1] about
why mutex is needed.  But I really have not thought about why we can
not use spinlock only, since using mutex only leads to the issue we
are seeing here, and using spinlock in enable/disable and mutex in
rate/parent will not work, because the mxs clocks have enable/disable
and rate/parent functions access the same register.  I know it's not
good to hold spinlock in rate/parent functions for a long time, but
do we have a way around rather than using spinlock for both sets of
functions?
Marek Vasut - Dec. 19, 2011, 4:03 a.m.
> On Sun, Dec 18, 2011 at 03:06:13PM +0100, Marek Vasut wrote:
> > The mutexes can't be safely used under certain circumstances. I noticed
> > this
> 
> > issue during some network instability at home:
> Yes, this is a known issue.  And there was some discussion[1] about
> why mutex is needed.

Thanks for pointing this out, I was unaware of it.

> But I really have not thought about why we can
> not use spinlock only, since using mutex only leads to the issue we
> are seeing here, and using spinlock in enable/disable and mutex in
> rate/parent will not work, because the mxs clocks have enable/disable
> and rate/parent functions access the same register.  I know it's not
> good to hold spinlock in rate/parent functions for a long time, but
> do we have a way around rather than using spinlock for both sets of
> functions?

Yea, spinlock is not good either. On the other hand, is it really held for so 
long ?

M
Russell King - ARM Linux - Dec. 19, 2011, 8:22 a.m.
On Mon, Dec 19, 2011 at 05:03:45AM +0100, Marek Vasut wrote:
> > On Sun, Dec 18, 2011 at 03:06:13PM +0100, Marek Vasut wrote:
> > > The mutexes can't be safely used under certain circumstances. I noticed
> > > this
> > 
> > > issue during some network instability at home:
> > Yes, this is a known issue.  And there was some discussion[1] about
> > why mutex is needed.
> 
> Thanks for pointing this out, I was unaware of it.
> 
> > But I really have not thought about why we can
> > not use spinlock only, since using mutex only leads to the issue we
> > are seeing here, and using spinlock in enable/disable and mutex in
> > rate/parent will not work, because the mxs clocks have enable/disable
> > and rate/parent functions access the same register.  I know it's not
> > good to hold spinlock in rate/parent functions for a long time, but
> > do we have a way around rather than using spinlock for both sets of
> > functions?
> 
> Yea, spinlock is not good either. On the other hand, is it really held for so 
> long ?

There is another solution to this, which I've pointed out before when
this has come up:

1. Convert all your drivers to _also_ use clk_prepare()/clk_unprepare().
   You need to do this anyway as it will become mandatory for the common
   clk stuff.

2. Rename your existing clk_enable()/clk_disable() implementation to
   clk_prepare()/clk_unprepare().  Ensure CONFIG_HAVE_CLK_PREPARE is
   selected.

3. Provide a new no-op clk_enable()/clk_disable() functions.

This fixes the issue because clk_prepare()/clk_unprepare() must only be
called from process contexts, whereas clk_enable()/clk_disable() may be
called from atomic contexts as well.
Marek Vasut - Dec. 19, 2011, 11:57 a.m.
> On Mon, Dec 19, 2011 at 05:03:45AM +0100, Marek Vasut wrote:
> > > On Sun, Dec 18, 2011 at 03:06:13PM +0100, Marek Vasut wrote:
> > > > The mutexes can't be safely used under certain circumstances. I
> > > > noticed this
> > > 
> > > > issue during some network instability at home:
> > > Yes, this is a known issue.  And there was some discussion[1] about
> > > why mutex is needed.
> > 
> > Thanks for pointing this out, I was unaware of it.
> > 
> > > But I really have not thought about why we can
> > > not use spinlock only, since using mutex only leads to the issue we
> > > are seeing here, and using spinlock in enable/disable and mutex in
> > > rate/parent will not work, because the mxs clocks have enable/disable
> > > and rate/parent functions access the same register.  I know it's not
> > > good to hold spinlock in rate/parent functions for a long time, but
> > > do we have a way around rather than using spinlock for both sets of
> > > functions?
> > 
> > Yea, spinlock is not good either. On the other hand, is it really held
> > for so long ?
> 
> There is another solution to this, which I've pointed out before when
> this has come up:
> 
> 1. Convert all your drivers to _also_ use clk_prepare()/clk_unprepare().
>    You need to do this anyway as it will become mandatory for the common
>    clk stuff.
> 
> 2. Rename your existing clk_enable()/clk_disable() implementation to
>    clk_prepare()/clk_unprepare().  Ensure CONFIG_HAVE_CLK_PREPARE is
>    selected.
> 
> 3. Provide a new no-op clk_enable()/clk_disable() functions.

Well, I'm still unsure how'd you then enable/disable the clock ? 
clk_prepare/clk_unprepare is good, but how would that help in avoiding the 
mutex/spinlock?

> 
> This fixes the issue because clk_prepare()/clk_unprepare() must only be
> called from process contexts, whereas clk_enable()/clk_disable() may be
> called from atomic contexts as well.

Sure, but I need to enable the clock in atomic context ...

M
Russell King - ARM Linux - Dec. 19, 2011, 12:15 p.m.
On Mon, Dec 19, 2011 at 12:57:05PM +0100, Marek Vasut wrote:
> > There is another solution to this, which I've pointed out before when
> > this has come up:
> > 
> > 1. Convert all your drivers to _also_ use clk_prepare()/clk_unprepare().
> >    You need to do this anyway as it will become mandatory for the common
> >    clk stuff.
> > 
> > 2. Rename your existing clk_enable()/clk_disable() implementation to
> >    clk_prepare()/clk_unprepare().  Ensure CONFIG_HAVE_CLK_PREPARE is
> >    selected.
> > 
> > 3. Provide a new no-op clk_enable()/clk_disable() functions.
> 
> Well, I'm still unsure how'd you then enable/disable the clock ? 
> clk_prepare/clk_unprepare is good, but how would that help in avoiding the 
> mutex/spinlock?

Then, quite simply, you don't understand clk_prepare/clk_unprepare.

Lets go back to the original discussion.  We have two classes of
implementations of the clk API:

1. We have those where clk_enable() and clk_disable() can be called from
   any context.

2. We have those where clk_enable() and clk_disable() can only be called
   from process context.

Then we have drivers - some of which call clk_enable() and clk_disable()
from atomic contexts.

Obviously, this situation can't be unified as is, because the different
requirements of the implementations and drivers are such that it's
impossible to have a single implementation which works across the board.

So, to allow a single implementation to proceed, it was decided that we'd
split clk_enable() and clk_disable() into two parts - an atomic part and
a non-atomic part.

The atomic part keeps the clk_enable() and clk_disable() name.  The
non-atomic part gets the new clk_prepare() and clk_unprepare() name.

Drivers must respect the following sequence throughout their code:

	clk = clk_get();
	clk_prepare(clk);
	clk_enable(clk);

	clk_disable(clk);
	clk_unprepare(clk);
	clk_put(clk);

In other words, it is _illegal_ for any driver to enable a clock which
they have not _themselves_ prepared - in the same way that it is illegal
to call clk_enable() on a clk that they have already put.

Drivers which only enable/disable their clocks from process context would
pair the clk_prepare()+clk_enable() calls together, and the clk_disable()+
clk_unprepare() calls.

Drivers which enable/disable their clocks from atomic contexts must ensure
that a clk is already prepared in some process context prior to calling
clk_enable() - as it is not permitted to call clk_prepare() from atomic
contexts.

For the PL011 UART, this means that:

1. Clocks are prepared when the port is opened _or_ when the console is
   configured.

2. Clocks are enabled when the port is opened _or_ when there is a message
   to output through the console.

3. Clocks are disabled when the port is closed _or_ when the console output
   has finished.

4. Clocks are unprepared when the port is closed.

This means that if you are capable of _only _process-context based clock
handing, and you use this driver, your clock will be enabled for the
console port at boot time and _never_ turned off.  That's the penalty
you _have_ to pay to satisfy the conditions imposed by your implementation.

So, there are two solutions:

1. Fix your clk API implementation such that clk_enable/clk_disable() are
   callable from atomic contexts.

2. Fix your clk API to rename the non-atomic clk_enable/clk_disable() to
   clk_prepare/clk_unprepare() and provide dummy clk_enable/clk_disable().

'Fixing' the driver is not an option - and would be just a case of moving
the existing clk_enable/clk_disable calls to where the clk_prepare/
clk_unprepare calls currently are - crippling those implementations which
are good citizens.

So, in summary, you have everything you require to fix it outside the
driver.  You just have to decide which of the two options you want to
proceed with, and actually (and finally) do it instead of endlessly
procrastinating and waiting for more and more bug reports (which is
exactly what has happened so far.)
Marek Vasut - Dec. 19, 2011, 8:54 p.m.
> On Mon, Dec 19, 2011 at 12:57:05PM +0100, Marek Vasut wrote:
> > > There is another solution to this, which I've pointed out before when
> > > this has come up:
> > > 
> > > 1. Convert all your drivers to _also_ use
> > > clk_prepare()/clk_unprepare().
> > > 
> > >    You need to do this anyway as it will become mandatory for the
> > >    common clk stuff.
> > > 
> > > 2. Rename your existing clk_enable()/clk_disable() implementation to
> > > 
> > >    clk_prepare()/clk_unprepare().  Ensure CONFIG_HAVE_CLK_PREPARE is
> > >    selected.
> > > 
> > > 3. Provide a new no-op clk_enable()/clk_disable() functions.
> > 
> > Well, I'm still unsure how'd you then enable/disable the clock ?
> > clk_prepare/clk_unprepare is good, but how would that help in avoiding
> > the mutex/spinlock?
> 
> Then, quite simply, you don't understand clk_prepare/clk_unprepare.
> 
> Lets go back to the original discussion.  We have two classes of
> implementations of the clk API:
> 
> 1. We have those where clk_enable() and clk_disable() can be called from
>    any context.
> 
> 2. We have those where clk_enable() and clk_disable() can only be called
>    from process context.

Yes, that's the MXS clock case and that's where it gets broken.
> 
> Then we have drivers - some of which call clk_enable() and clk_disable()
> from atomic contexts.

OK

> 
> Obviously, this situation can't be unified as is, because the different
> requirements of the implementations and drivers are such that it's
> impossible to have a single implementation which works across the board.
> 
> So, to allow a single implementation to proceed, it was decided that we'd
> split clk_enable() and clk_disable() into two parts - an atomic part and
> a non-atomic part.
> 
> The atomic part keeps the clk_enable() and clk_disable() name.  The
> non-atomic part gets the new clk_prepare() and clk_unprepare() name.

Ok, understood so far.
> 
> Drivers must respect the following sequence throughout their code:
> 
> 	clk = clk_get();
> 	clk_prepare(clk);
> 	clk_enable(clk);
> 
> 	clk_disable(clk);
> 	clk_unprepare(clk);
> 	clk_put(clk);
> 
> In other words, it is _illegal_ for any driver to enable a clock which
> they have not _themselves_ prepared - in the same way that it is illegal
> to call clk_enable() on a clk that they have already put.
> 
> Drivers which only enable/disable their clocks from process context would
> pair the clk_prepare()+clk_enable() calls together, and the clk_disable()+
> clk_unprepare() calls.

I see ... so basically, the locking will happen in clk_prepare() call.
> 
> Drivers which enable/disable their clocks from atomic contexts must ensure
> that a clk is already prepared in some process context prior to calling
> clk_enable() - as it is not permitted to call clk_prepare() from atomic
> contexts.
> 
> For the PL011 UART, this means that:
> 
> 1. Clocks are prepared when the port is opened _or_ when the console is
>    configured.

Ok, so here I either lock or lock-configure-unlock ... which one is it ?
> 
> 2. Clocks are enabled when the port is opened _or_ when there is a message
>    to output through the console.

But to enable the clock, I need to lock again. Or I can lock in prepare() call, 
but then the locking is meaningless because the lock'd be held all this time and 
noone else would be able to operate with the MXS clock.
> 
> 3. Clocks are disabled when the port is closed _or_ when the console output
>    has finished.
> 
> 4. Clocks are unprepared when the port is closed.
> 
> This means that if you are capable of _only _process-context based clock
> handing, and you use this driver, your clock will be enabled for the
> console port at boot time and _never_ turned off.  That's the penalty
> you _have_ to pay to satisfy the conditions imposed by your implementation.

My implementation ... of what? Clock framework or the driver? Why would the 
spinlock be bad for this case -- only because it'd be held for too long ?

The implementation of clk_enable/clk_disable for mxs is -- from my understanding 
-- really fast:

spinlock_lock()
increase usecount
enable/disable clock by doing RMW of one bit on one register
spinlock_unlock()

And this can be made even faster via MXS hardware specifics (like bitwise-set 
registers, avoiding the RMW).

> 
> So, there are two solutions:
> 
> 1. Fix your clk API implementation such that clk_enable/clk_disable() are
>    callable from atomic contexts.

Yea ... the spinlock is IMO reasonable, I see no problem with that, the code 
path is really fast. It can be made even faster I think.
> 
> 2. Fix your clk API to rename the non-atomic clk_enable/clk_disable() to
>    clk_prepare/clk_unprepare() and provide dummy clk_enable/clk_disable().

I don't see a point in this, this'd add way more overhead and overcomplicate the 
whole stuff.
> 
> 'Fixing' the driver is not an option - and would be just a case of moving
> the existing clk_enable/clk_disable calls to where the clk_prepare/
> clk_unprepare calls currently are - crippling those implementations which
> are good citizens.

Indeed.
> 
> So, in summary, you have everything you require to fix it outside the
> driver.  You just have to decide which of the two options you want to
> proceed with, and actually (and finally) do it instead of endlessly
> procrastinating and waiting for more and more bug reports (which is
> exactly what has happened so far.)

What the hell, I just recently found this bug and I submitted a patch right 
away! What are you complaining about?!

Grumble!

But thanks for the explanation!

M
Russell King - ARM Linux - Dec. 19, 2011, 9:03 p.m.
On Mon, Dec 19, 2011 at 09:54:25PM +0100, Marek Vasut wrote:
> > So, in summary, you have everything you require to fix it outside the
> > driver.  You just have to decide which of the two options you want to
> > proceed with, and actually (and finally) do it instead of endlessly
> > procrastinating and waiting for more and more bug reports (which is
> > exactly what has happened so far.)
> 
> What the hell, I just recently found this bug and I submitted a patch right 
> away! What are you complaining about?!

If you want to take that attitude to my attempt to help you understand
the problem and see solutions, I'll ignore you permanently for being an
absolute twit.  I'm not going to spend time giving a detailed explaination
about the background and options over something to only then have it
immediately shoved back in my face with such a response.
Marek Vasut - Dec. 19, 2011, 9:05 p.m.
> On Mon, Dec 19, 2011 at 09:54:25PM +0100, Marek Vasut wrote:
> > > So, in summary, you have everything you require to fix it outside the
> > > driver.  You just have to decide which of the two options you want to
> > > proceed with, and actually (and finally) do it instead of endlessly
> > > procrastinating and waiting for more and more bug reports (which is
> > > exactly what has happened so far.)
> > 
> > What the hell, I just recently found this bug and I submitted a patch
> > right away! What are you complaining about?!
> 
> If you want to take that attitude to my attempt to help you understand
> the problem and see solutions, I'll ignore you permanently for being an
> absolute twit.

Go ahead, but you accused me of procrastinating and waiting even if the first 
thing I did when I saw the bug was start solving it. That's just insane!

> I'm not going to spend time giving a detailed explaination
> about the background and options over something to only then have it
> immediately shoved back in my face with such a response.

I consider my response to the last part of your email appropriate.

M
Russell King - ARM Linux - Dec. 19, 2011, 9:28 p.m.
On Mon, Dec 19, 2011 at 10:05:25PM +0100, Marek Vasut wrote:
> > On Mon, Dec 19, 2011 at 09:54:25PM +0100, Marek Vasut wrote:
> > > > So, in summary, you have everything you require to fix it outside the
> > > > driver.  You just have to decide which of the two options you want to
> > > > proceed with, and actually (and finally) do it instead of endlessly
> > > > procrastinating and waiting for more and more bug reports (which is
> > > > exactly what has happened so far.)
> > > 
> > > What the hell, I just recently found this bug and I submitted a patch
> > > right away! What are you complaining about?!
> > 
> > If you want to take that attitude to my attempt to help you understand
> > the problem and see solutions, I'll ignore you permanently for being an
> > absolute twit.
> 
> Go ahead, but you accused me of procrastinating and waiting even if the first 
> thing I did when I saw the bug was start solving it. That's just insane!
> 
> > I'm not going to spend time giving a detailed explaination
> > about the background and options over something to only then have it
> > immediately shoved back in my face with such a response.
> 
> I consider my response to the last part of your email appropriate.

Sorry, it wasn't directed personally at you, but to the entire MXS
community.  The facts over this are:

1. This problem has been known about since October.
2. It's been discussed several times - every time along the same lines.
3. There is zero apparant progress on the issue.

Here's two of the discussions over it, where I've said exactly the same
thing:

http://lists.arm.linux.org.uk/lurker/thread/20111018.173744.46c4bd76.en.html
http://lists.arm.linux.org.uk/lurker/thread/20111123.183640.222b05cf.en.html

So now, tell me - is this _finally_ going to get fixed in the MXS code,
or is the previous discussion about converting stuff to spinlocks etc
just going to be repeated yet again?
Marek Vasut - Dec. 19, 2011, 9:37 p.m.
> On Mon, Dec 19, 2011 at 10:05:25PM +0100, Marek Vasut wrote:
> > > On Mon, Dec 19, 2011 at 09:54:25PM +0100, Marek Vasut wrote:
> > > > > So, in summary, you have everything you require to fix it outside
> > > > > the driver.  You just have to decide which of the two options you
> > > > > want to proceed with, and actually (and finally) do it instead of
> > > > > endlessly procrastinating and waiting for more and more bug
> > > > > reports (which is exactly what has happened so far.)
> > > > 
> > > > What the hell, I just recently found this bug and I submitted a patch
> > > > right away! What are you complaining about?!
> > > 
> > > If you want to take that attitude to my attempt to help you understand
> > > the problem and see solutions, I'll ignore you permanently for being an
> > > absolute twit.
> > 
> > Go ahead, but you accused me of procrastinating and waiting even if the
> > first thing I did when I saw the bug was start solving it. That's just
> > insane!
> > 
> > > I'm not going to spend time giving a detailed explaination
> > > about the background and options over something to only then have it
> > > immediately shoved back in my face with such a response.
> > 
> > I consider my response to the last part of your email appropriate.
> 
> Sorry, it wasn't directed personally at you, but to the entire MXS
> community.  The facts over this are:

Ah! I'm sorry I was so direct and rude too. I was unaware it was discussed 
before, I started this effort on my own just recently.

> 
> 1. This problem has been known about since October.

I was really away from the kernel community for a while so I didn't know.

> 2. It's been discussed several times - every time along the same lines.
> 3. There is zero apparant progress on the issue.
> 
> Here's two of the discussions over it, where I've said exactly the same
> thing:
> 
> http://lists.arm.linux.org.uk/lurker/thread/20111018.173744.46c4bd76.en.htm
> l
> http://lists.arm.linux.org.uk/lurker/thread/20111123.183640.222b05cf.en.ht
> ml
> 
> So now, tell me - is this _finally_ going to get fixed in the MXS code,
> or is the previous discussion about converting stuff to spinlocks etc
> just going to be repeated yet again?

Spinlocks are OK as far as the code within them is fast, right ? But hm ... 
actually, we might be able to toggle the clock in one instruction by using the 
bitwise set/clear registers. That way, we won't need the locks at all, but we'd 
loose the usecount ... which is useless anyway).

M
Marek Vasut - Dec. 19, 2011, 9:43 p.m.
> > On Mon, Dec 19, 2011 at 10:05:25PM +0100, Marek Vasut wrote:
> > > > On Mon, Dec 19, 2011 at 09:54:25PM +0100, Marek Vasut wrote:
> > > > > > So, in summary, you have everything you require to fix it outside
> > > > > > the driver.  You just have to decide which of the two options you
> > > > > > want to proceed with, and actually (and finally) do it instead of
> > > > > > endlessly procrastinating and waiting for more and more bug
> > > > > > reports (which is exactly what has happened so far.)
> > > > > 
> > > > > What the hell, I just recently found this bug and I submitted a
> > > > > patch right away! What are you complaining about?!
> > > > 
> > > > If you want to take that attitude to my attempt to help you
> > > > understand the problem and see solutions, I'll ignore you
> > > > permanently for being an absolute twit.
> > > 
> > > Go ahead, but you accused me of procrastinating and waiting even if the
> > > first thing I did when I saw the bug was start solving it. That's just
> > > insane!
> > > 
> > > > I'm not going to spend time giving a detailed explaination
> > > > about the background and options over something to only then have it
> > > > immediately shoved back in my face with such a response.
> > > 
> > > I consider my response to the last part of your email appropriate.
> > 
> > Sorry, it wasn't directed personally at you, but to the entire MXS
> 
> > community.  The facts over this are:
> Ah! I'm sorry I was so direct and rude too. I was unaware it was discussed
> before, I started this effort on my own just recently.
> 
> > 1. This problem has been known about since October.
> 
> I was really away from the kernel community for a while so I didn't know.
> 
> > 2. It's been discussed several times - every time along the same lines.
> > 3. There is zero apparant progress on the issue.
> > 
> > Here's two of the discussions over it, where I've said exactly the same
> > thing:
> > 
> > http://lists.arm.linux.org.uk/lurker/thread/20111018.173744.46c4bd76.en.h
> > tm l
> > http://lists.arm.linux.org.uk/lurker/thread/20111123.183640.222b05cf.en.h
> > t ml
> > 
> > So now, tell me - is this _finally_ going to get fixed in the MXS code,
> > or is the previous discussion about converting stuff to spinlocks etc
> > just going to be repeated yet again?
> 
> Spinlocks are OK as far as the code within them is fast, right ? But hm ...
> actually, we might be able to toggle the clock in one instruction by using
> the bitwise set/clear registers. That way, we won't need the locks at all,
> but we'd loose the usecount ... which is useless anyway).
> 
> M

I see ... clk_set_rate() can be slow. OK, looking further into it.
Shawn Guo - Dec. 19, 2011, 11:23 p.m.
On Mon, Dec 19, 2011 at 09:54:25PM +0100, Marek Vasut wrote:
> > So, in summary, you have everything you require to fix it outside the
> > driver.  You just have to decide which of the two options you want to
> > proceed with, and actually (and finally) do it instead of endlessly
> > procrastinating and waiting for more and more bug reports (which is
> > exactly what has happened so far.)
> 

Marek, the 'you' here really means me rather than you.

Russell, sorry for leaving the issue here for long time, and I'm working
on the patches you suggested right now.

Regards,
Shawn

> What the hell, I just recently found this bug and I submitted a patch right 
> away! What are you complaining about?!
> 
> Grumble!
> 
> But thanks for the explanation!
> 
> M
>
Marek Vasut - Dec. 19, 2011, 11:35 p.m.
> On Mon, Dec 19, 2011 at 09:54:25PM +0100, Marek Vasut wrote:
> > > So, in summary, you have everything you require to fix it outside the
> > > driver.  You just have to decide which of the two options you want to
> > > proceed with, and actually (and finally) do it instead of endlessly
> > > procrastinating and waiting for more and more bug reports (which is
> > > exactly what has happened so far.)
> 
> Marek, the 'you' here really means me rather than you.

Sure, I figured as much after Russells reply.
> 
> Russell, sorry for leaving the issue here for long time, and I'm working
> on the patches you suggested right now.

I see ... shall I back off from this issue then? Can I somehow help you on this?

M
Shawn Guo - Dec. 20, 2011, 12:19 a.m.
On Tue, Dec 20, 2011 at 12:35:52AM +0100, Marek Vasut wrote:
> > On Mon, Dec 19, 2011 at 09:54:25PM +0100, Marek Vasut wrote:
> > > > So, in summary, you have everything you require to fix it outside the
> > > > driver.  You just have to decide which of the two options you want to
> > > > proceed with, and actually (and finally) do it instead of endlessly
> > > > procrastinating and waiting for more and more bug reports (which is
> > > > exactly what has happened so far.)
> > 
> > Marek, the 'you' here really means me rather than you.
> 
> Sure, I figured as much after Russells reply.
> > 
> > Russell, sorry for leaving the issue here for long time, and I'm working
> > on the patches you suggested right now.
> 
> I see ... shall I back off from this issue then? Can I somehow help you on this?
> 
I will try to work out the series soon, and you can help to see if it
solves your problem.
Marek Vasut - Dec. 20, 2011, 3:09 a.m.
> On Tue, Dec 20, 2011 at 12:35:52AM +0100, Marek Vasut wrote:
> > > On Mon, Dec 19, 2011 at 09:54:25PM +0100, Marek Vasut wrote:
> > > > > So, in summary, you have everything you require to fix it outside
> > > > > the driver.  You just have to decide which of the two options you
> > > > > want to proceed with, and actually (and finally) do it instead of
> > > > > endlessly procrastinating and waiting for more and more bug
> > > > > reports (which is exactly what has happened so far.)
> > > 
> > > Marek, the 'you' here really means me rather than you.
> > 
> > Sure, I figured as much after Russells reply.
> > 
> > > Russell, sorry for leaving the issue here for long time, and I'm
> > > working on the patches you suggested right now.
> > 
> > I see ... shall I back off from this issue then? Can I somehow help you
> > on this?
> 
> I will try to work out the series soon, and you can help to see if it
> solves your problem.

All right, I'll be glad to review. Make sure to Cc me :)

Cheers!

M

Patch

diff --git a/arch/arm/mach-mxs/clock.c b/arch/arm/mach-mxs/clock.c
index a7093c8..677ecff 100644
--- a/arch/arm/mach-mxs/clock.c
+++ b/arch/arm/mach-mxs/clock.c
@@ -32,7 +32,7 @@ 
 #include <linux/kernel.h>
 #include <linux/list.h>
 #include <linux/module.h>
-#include <linux/mutex.h>
+#include <linux/spinlock.h>
 #include <linux/platform_device.h>
 #include <linux/proc_fs.h>
 #include <linux/semaphore.h>
@@ -41,7 +41,7 @@ 
 #include <mach/clock.h>
 
 static LIST_HEAD(clocks);
-static DEFINE_MUTEX(clocks_mutex);
+static DEFINE_SPINLOCK(clocks_lock);
 
 /*-------------------------------------------------------------------------
  * Standard clock functions defined in include/linux/clk.h
@@ -80,13 +80,14 @@  static int __clk_enable(struct clk *clk)
 int clk_enable(struct clk *clk)
 {
 	int ret = 0;
+	unsigned long flags;
 
 	if (clk == NULL || IS_ERR(clk))
 		return -EINVAL;
 
-	mutex_lock(&clocks_mutex);
+	spin_lock_irqsave(&clocks_lock, flags);
 	ret = __clk_enable(clk);
-	mutex_unlock(&clocks_mutex);
+	spin_unlock_irqrestore(&clocks_lock, flags);
 
 	return ret;
 }
@@ -98,12 +99,14 @@  EXPORT_SYMBOL(clk_enable);
  */
 void clk_disable(struct clk *clk)
 {
+	unsigned long flags;
+
 	if (clk == NULL || IS_ERR(clk))
 		return;
 
-	mutex_lock(&clocks_mutex);
+	spin_lock_irqsave(&clocks_lock, flags);
 	__clk_disable(clk);
-	mutex_unlock(&clocks_mutex);
+	spin_unlock_irqrestore(&clocks_lock, flags);
 }
 EXPORT_SYMBOL(clk_disable);
 
@@ -142,14 +145,15 @@  EXPORT_SYMBOL(clk_round_rate);
  */
 int clk_set_rate(struct clk *clk, unsigned long rate)
 {
+	unsigned long flags;
 	int ret = -EINVAL;
 
 	if (clk == NULL || IS_ERR(clk) || clk->set_rate == NULL || rate == 0)
 		return ret;
 
-	mutex_lock(&clocks_mutex);
+	spin_lock_irqsave(&clocks_lock, flags);
 	ret = clk->set_rate(clk, rate);
-	mutex_unlock(&clocks_mutex);
+	spin_unlock_irqrestore(&clocks_lock, flags);
 
 	return ret;
 }
@@ -160,6 +164,7 @@  int clk_set_parent(struct clk *clk, struct clk *parent)
 {
 	int ret = -EINVAL;
 	struct clk *old;
+	unsigned long flags;
 
 	if (clk == NULL || IS_ERR(clk) || parent == NULL ||
 	    IS_ERR(parent) || clk->set_parent == NULL)
@@ -168,7 +173,7 @@  int clk_set_parent(struct clk *clk, struct clk *parent)
 	if (clk->usecount)
 		clk_enable(parent);
 
-	mutex_lock(&clocks_mutex);
+	spin_lock_irqsave(&clocks_lock, flags);
 	ret = clk->set_parent(clk, parent);
 	if (ret == 0) {
 		old = clk->parent;
@@ -176,7 +181,7 @@  int clk_set_parent(struct clk *clk, struct clk *parent)
 	} else {
 		old = parent;
 	}
-	mutex_unlock(&clocks_mutex);
+	spin_unlock_irqrestore(&clocks_lock, flags);
 
 	if (clk->usecount)
 		clk_disable(old);