diff mbox

3.18: lockdep problems in cpufreq

Message ID 002f01d0186b$2700b730$75022590$%brar@samsung.com
State New
Headers show

Commit Message

Yadwinder Singh Brar Dec. 15, 2014, 1:28 p.m. UTC
> -----Original Message-----
> From: Viresh Kumar [mailto:viresh.kumar@linaro.org]
> Sent: Monday, December 15, 2014 9:27 AM
> To: Rafael J. Wysocki; yadi.brar@samsung.com
> Cc: Russell King - ARM Linux; linux-arm-kernel@lists.infradead.org;
> linux-pm@vger.kernel.org; Eduardo Valentin
> Subject: Re: 3.18: lockdep problems in cpufreq
> 
> On 15 December 2014 at 03:53, Rafael J. Wysocki <rjw@rjwysocki.net>
> wrote:
> > On Sunday, December 14, 2014 09:36:55 PM Russell King - ARM Linux
> wrote:
> >> Here's a nice Christmas present one of my iMX6 machines gave me
> tonight.
> >> I haven't begun to look into it.
> 
> Is the culprit this one?
> 
> Fixes: 2dcd851fe4b4 ("thermal: cpu_cooling: Update always cpufreq
> policy with thermal constraints")
> 
> As this is what it changed:
> 
> @@ -316,21 +312,28 @@ static int cpufreq_thermal_notifier(struct
> notifier_block *nb,  {
>         struct cpufreq_policy *policy = data;
>         unsigned long max_freq = 0;
> +       struct cpufreq_cooling_device *cpufreq_dev;
> 
> -       if (event != CPUFREQ_ADJUST || notify_device == NOTIFY_INVALID)
> +       if (event != CPUFREQ_ADJUST)
>                 return 0;
> 
> -       if (cpumask_test_cpu(policy->cpu, &notify_device-
> >allowed_cpus))
> -               max_freq = notify_device->cpufreq_val;
> -       else
> -               return 0;
> +       mutex_lock(&cooling_cpufreq_lock);
> +       list_for_each_entry(cpufreq_dev, &cpufreq_dev_list, node) {
> +               if (!cpumask_test_cpu(policy->cpu,
> +                                       &cpufreq_dev->allowed_cpus))
> +                       continue;
> +
> +               if (!cpufreq_dev->cpufreq_val)
> +                       cpufreq_dev->cpufreq_val = get_cpu_frequency(
> +                                       cpumask_any(&cpufreq_dev-
> >allowed_cpus),
> +                                       cpufreq_dev->cpufreq_state);
> 
> -       /* Never exceed user_policy.max */
> -       if (max_freq > policy->user_policy.max)
> -               max_freq = policy->user_policy.max;
> +               max_freq = cpufreq_dev->cpufreq_val;
> 
> -       if (policy->max != max_freq)
> -               cpufreq_verify_within_limits(policy, 0, max_freq);
> +               if (policy->max != max_freq)
> +                       cpufreq_verify_within_limits(policy, 0,
> max_freq);
> +       }
> +       mutex_unlock(&cooling_cpufreq_lock);
> 
>         return 0;
>  }
> 
> 
> >> ======================================================
> >> [ INFO: possible circular locking dependency detected ] 3.18.0+
> #1453
> >> Not tainted
> >> -------------------------------------------------------
> >> rc.local/770 is trying to acquire lock:
> >>  (cooling_cpufreq_lock){+.+.+.}, at: [<c04abfc4>]
> >> cpufreq_thermal_notifier+0x34/0xfc
> >>
> >> but task is already holding lock:
> >>  ((cpufreq_policy_notifier_list).rwsem){++++.+}, at: [<c0042f04>]
> >> __blocking_notifier_call_chain+0x34/0x68
> >>
> >> which lock already depends on the new lock.
> >
> > Well, that's for Viresh.
> 
> Maybe not as the rework I have done is queued for this merge window.
> I was afraid really after reading the "offenders" discussion on IRC :)
> 
> Cc'd Yadwinder as well who wrote this patch.

Thanks :).

Unfortunately, I didn’t get any such warning though I tested
patch enabling CONFIG_PROVE_LOCKING before posting. It seems
Russell is trying to load imx_thermal as module and parallely
Changing cpufreq governor from userspace, which was not my
test case.

Anyways, after analyzing log and code,I think problem is not
in cpufreq_thermal_notifier which was modified in patch as
stated above. Actual problem is in __cpufreq_cooling_register
which is unnecessarily calling cpufreq_register_notifier()
inside section protected by cooling_cpufreq_lock.
Because cpufreq_policy_notifier_list).rwsem is already held
by store_scaling_governor when __cpufreq_cooling_register is
trying to cpufreq_policy_notifier_list while holding cooling_cpufreq_lock. 

So I think following can fix the problem:



Best Regards,
Yadwinder

Comments

Russell King - ARM Linux Dec. 15, 2014, 1:46 p.m. UTC | #1
On Mon, Dec 15, 2014 at 06:58:41PM +0530, Yadwinder Singh Brar wrote:
> Unfortunately, I didn’t get any such warning though I tested
> patch enabling CONFIG_PROVE_LOCKING before posting. It seems
> Russell is trying to load imx_thermal as module and parallely
> Changing cpufreq governor from userspace, which was not my
> test case.

No.  Yes, imx_thermal is a module, which is loaded by udev very early
in the userspace boot.  Later on, in the rc.local, I poke the governor
from userspace.

This is evidenced by the bluetooth modules being loaded after imx_thermal:

Module                  Size  Used by
bnep                   10574  2
rfcomm                 33720  0
bluetooth             293598  10 bnep,rfcomm
nfsd                   90264  0
exportfs                3988  1 nfsd
hid_cypress             1763  0
snd_soc_fsl_spdif       9753  2
imx_pcm_dma             1137  1 snd_soc_fsl_spdif
imx_sdma               12885  2
imx2_wdt                3248  0
imx_thermal             5386  0
snd_soc_imx_spdif       1877  0

and the timestamp on the bluetooth message:

[   25.503739] Bluetooth: Core ver 2.19

vs the timestamp on the lockdep message:

[   29.499195] [ INFO: possible circular locking dependency detected ]

My rc.local does this:

    # Configure cpufreq
    echo 1 > /sys/devices/system/cpu/cpufreq/ondemand/io_is_busy
    echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

> Anyways, after analyzing log and code,I think problem is not
> in cpufreq_thermal_notifier which was modified in patch as
> stated above. Actual problem is in __cpufreq_cooling_register
> which is unnecessarily calling cpufreq_register_notifier()
> inside section protected by cooling_cpufreq_lock.
> Because cpufreq_policy_notifier_list).rwsem is already held
> by store_scaling_governor when __cpufreq_cooling_register is
> trying to cpufreq_policy_notifier_list while holding cooling_cpufreq_lock. 
> 
> So I think following can fix the problem:
> 
> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
> index ad09e51..622ea40 100644
> --- a/drivers/thermal/cpu_cooling.c
> +++ b/drivers/thermal/cpu_cooling.c
> @@ -484,15 +484,15 @@ __cpufreq_cooling_register(struct device_node *np,
>         cpufreq_dev->cpufreq_state = 0;
>         mutex_lock(&cooling_cpufreq_lock);
>  
> -       /* Register the notifier for first cpufreq cooling device */
> -       if (cpufreq_dev_count == 0)
> -               cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> -                                         CPUFREQ_POLICY_NOTIFIER);
>         cpufreq_dev_count++;
>         list_add(&cpufreq_dev->node, &cpufreq_dev_list);
>  
>         mutex_unlock(&cooling_cpufreq_lock);
>  
> +       /* Register the notifier for first cpufreq cooling device */
> +       if (cpufreq_dev_count == 0)
> +               cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> +                                         CPUFREQ_POLICY_NOTIFIER);

No, sorry, this patch is worse.

1. cpufreq_register_notifier() will never be called, even on the first
   caller to this code: at the point where it's tested for zero, it will
   be incremented to one by the previous code.

2. What happens if two threads come here?

   The first thread succeeds in grabbing cooling_cpufreq_lock.  The second
   thread stalls waiting for cooling_cpufreq_lock to be released.

   The first thread increments cpufreq_dev_count, adds to the list, and then
   releases the lock.  At this point, control may be passed to the second
   thread (since mutex_unlock() will wake it up.)  The second thread gets
   into the critical region, and increments cpufreq_dev_count again.

   By the time the first thread runs, cpufreq_dev_count may be two at this
   point.

Unfortunately, you do need some kind of synchronisation here.  If it's
not important when cpufreq_register_notifier() gets called, maybe this
can work:

	bool register;

	mutex_lock(&cooling_cpufreq_lock);
	register = cpufreq_dev_count++ == 0;
	list_add(&cpufreq_dev->node, &cpufreq_dev_list);
	mutex_unlock(&cooling_cpufreq_lock);

	if (register)
		cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
					  CPUFREQ_POLICY_NOTIFIER);

However, I suspect that may well be buggy if we have a cleanup path which
unregisters the notifier when cpufreq_dev_count is decremented to zero...
which we seem to have in cpufreq_cooling_unregister().

The answer may well be to have finer grained locking here:

- one lock to protect cpufreq_dev_list, which is only ever taken when
  adding or removing entries from it

- a second lock to protect cpufreq_dev_count and the calls to
  cpufreq_register_notifier() and cpufreq_unregister_notifier()

and you would /never/ take either of those two locks inside each other.
In other words:

	mutex_lock(&cooling_list_lock);
	list_add(&cpufreq_dev->node, &cpufreq_dev_list);
	mutex_unlock(&cooling_list_lock);

	mutex_lock(&cooling_cpufreq_lock);
	if (cpufreq_dev_count++ == 0)
		cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
					  CPUFREQ_POLICY_NOTIFIER);
	mutex_unlock(&cooling_cpufreq_lock);

and similar in the cleanup path.  The notifier itself would only ever
take the cooling_list_lock.
Viresh Kumar Dec. 15, 2014, 2:38 p.m. UTC | #2
On 15 December 2014 at 18:58, Yadwinder Singh Brar
<yadi.brar@samsung.com> wrote:
> Unfortunately, I didn’t get any such warning though I tested
> patch enabling CONFIG_PROVE_LOCKING before posting. It seems

Even I couldn't reproduce it :)

> Anyways, after analyzing log and code,I think problem is not
> in cpufreq_thermal_notifier which was modified in patch as
> stated above. Actual problem is in __cpufreq_cooling_register
> which is unnecessarily calling cpufreq_register_notifier()
> inside section protected by cooling_cpufreq_lock.

Agreed, and I was looking to reproduce first before writing
some crazy code.

> Because cpufreq_policy_notifier_list).rwsem is already held
> by store_scaling_governor when __cpufreq_cooling_register is
> trying to cpufreq_policy_notifier_list while holding cooling_cpufreq_lock.

Correct.

> So I think following can fix the problem:
>
> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
> index ad09e51..622ea40 100644
> --- a/drivers/thermal/cpu_cooling.c
> +++ b/drivers/thermal/cpu_cooling.c
> @@ -484,15 +484,15 @@ __cpufreq_cooling_register(struct device_node *np,
>         cpufreq_dev->cpufreq_state = 0;
>         mutex_lock(&cooling_cpufreq_lock);
>
> -       /* Register the notifier for first cpufreq cooling device */
> -       if (cpufreq_dev_count == 0)
> -               cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> -                                         CPUFREQ_POLICY_NOTIFIER);
>         cpufreq_dev_count++;
>         list_add(&cpufreq_dev->node, &cpufreq_dev_list);
>
>         mutex_unlock(&cooling_cpufreq_lock);
>
> +       /* Register the notifier for first cpufreq cooling device */
> +       if (cpufreq_dev_count == 0)

This will always fail :), over that we need to access cpufreq_dev_count
from within the lock.

> +               cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> +                                         CPUFREQ_POLICY_NOTIFIER);
>         return cool_dev;
>  }
Yadwinder Singh Brar Dec. 15, 2014, 2:54 p.m. UTC | #3
> -----Original Message-----
> From: Russell King - ARM Linux [mailto:linux@arm.linux.org.uk]
> Sent: Monday, December 15, 2014 7:17 PM
> To: Yadwinder Singh Brar
> Cc: 'Viresh Kumar'; 'Rafael J. Wysocki'; linux-arm-
> kernel@lists.infradead.org; linux-pm@vger.kernel.org; 'Eduardo
> Valentin'
> Subject: Re: 3.18: lockdep problems in cpufreq
> 
> On Mon, Dec 15, 2014 at 06:58:41PM +0530, Yadwinder Singh Brar wrote:
> > Unfortunately, I didn’t get any such warning though I tested patch
> > enabling CONFIG_PROVE_LOCKING before posting. It seems Russell is
> > trying to load imx_thermal as module and parallely Changing cpufreq
> > governor from userspace, which was not my test case.
> 
> No.  Yes, imx_thermal is a module, which is loaded by udev very early
> in the userspace boot.  Later on, in the rc.local, I poke the governor
> from userspace.
> 
> This is evidenced by the bluetooth modules being loaded after
> imx_thermal:
> 
> Module                  Size  Used by
> bnep                   10574  2
> rfcomm                 33720  0
> bluetooth             293598  10 bnep,rfcomm
> nfsd                   90264  0
> exportfs                3988  1 nfsd
> hid_cypress             1763  0
> snd_soc_fsl_spdif       9753  2
> imx_pcm_dma             1137  1 snd_soc_fsl_spdif
> imx_sdma               12885  2
> imx2_wdt                3248  0
> imx_thermal             5386  0
> snd_soc_imx_spdif       1877  0
> 
> and the timestamp on the bluetooth message:
> 
> [   25.503739] Bluetooth: Core ver 2.19
> 
> vs the timestamp on the lockdep message:
> 
> [   29.499195] [ INFO: possible circular locking dependency detected ]
> 
> My rc.local does this:
> 
>     # Configure cpufreq
>     echo 1 > /sys/devices/system/cpu/cpufreq/ondemand/io_is_busy
>     echo performance >
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
> 
> > Anyways, after analyzing log and code,I think problem is not in
> > cpufreq_thermal_notifier which was modified in patch as stated above.
> > Actual problem is in __cpufreq_cooling_register which is
> unnecessarily
> > calling cpufreq_register_notifier() inside section protected by
> > cooling_cpufreq_lock.
> > Because cpufreq_policy_notifier_list).rwsem is already held by
> > store_scaling_governor when __cpufreq_cooling_register is trying to
> > cpufreq_policy_notifier_list while holding cooling_cpufreq_lock.
> >
> > So I think following can fix the problem:
> >
> > diff --git a/drivers/thermal/cpu_cooling.c
> > b/drivers/thermal/cpu_cooling.c index ad09e51..622ea40 100644
> > --- a/drivers/thermal/cpu_cooling.c
> > +++ b/drivers/thermal/cpu_cooling.c
> > @@ -484,15 +484,15 @@ __cpufreq_cooling_register(struct device_node
> *np,
> >         cpufreq_dev->cpufreq_state = 0;
> >         mutex_lock(&cooling_cpufreq_lock);
> >
> > -       /* Register the notifier for first cpufreq cooling device */
> > -       if (cpufreq_dev_count == 0)
> > -
> cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> > -                                         CPUFREQ_POLICY_NOTIFIER);
> >         cpufreq_dev_count++;
> >         list_add(&cpufreq_dev->node, &cpufreq_dev_list);
> >
> >         mutex_unlock(&cooling_cpufreq_lock);
> >
> > +       /* Register the notifier for first cpufreq cooling device */
> > +       if (cpufreq_dev_count == 0)
> > +
> cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> > +                                         CPUFREQ_POLICY_NOTIFIER);
> 
> No, sorry, this patch is worse.
> 

Actually it was not patch :P , just moved cpufreq_register_notifier()
out of locking, as it is(C&P) to explain my point.

> 1. cpufreq_register_notifier() will never be called, even on the first
>    caller to this code: at the point where it's tested for zero, it
> will
>    be incremented to one by the previous code.
> 
> 2. What happens if two threads come here?
> 
>    The first thread succeeds in grabbing cooling_cpufreq_lock.  The
> second
>    thread stalls waiting for cooling_cpufreq_lock to be released.
> 
>    The first thread increments cpufreq_dev_count, adds to the list, and
> then
>    releases the lock.  At this point, control may be passed to the
> second
>    thread (since mutex_unlock() will wake it up.)  The second thread
> gets
>    into the critical region, and increments cpufreq_dev_count again.
> 
>    By the time the first thread runs, cpufreq_dev_count may be two at
> this
>    point.

Yes, may be.

> 
> Unfortunately, you do need some kind of synchronisation here.  If it's
> not important when cpufreq_register_notifier() gets called, maybe this
> can work:
> 
> 	bool register;
> 
> 	mutex_lock(&cooling_cpufreq_lock);
> 	register = cpufreq_dev_count++ == 0;
> 	list_add(&cpufreq_dev->node, &cpufreq_dev_list);
> 	mutex_unlock(&cooling_cpufreq_lock);
> 
> 	if (register)

register may be 0 in scenario you stated above in second point.
So this approach will not work.
 
> 		cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> 					  CPUFREQ_POLICY_NOTIFIER);
> 
> However, I suspect that may well be buggy if we have a cleanup path
> which unregisters the notifier when cpufreq_dev_count is decremented to
> zero...
> which we seem to have in cpufreq_cooling_unregister().
> 
> The answer may well be to have finer grained locking here:
> 
> - one lock to protect cpufreq_dev_list, which is only ever taken when
>   adding or removing entries from it
> 
> - a second lock to protect cpufreq_dev_count and the calls to
>   cpufreq_register_notifier() and cpufreq_unregister_notifier()
> 
> and you would /never/ take either of those two locks inside each other.
> In other words:
> 
> 	mutex_lock(&cooling_list_lock);
> 	list_add(&cpufreq_dev->node, &cpufreq_dev_list);
> 	mutex_unlock(&cooling_list_lock);
> 
> 	mutex_lock(&cooling_cpufreq_lock);
> 	if (cpufreq_dev_count++ == 0)
> 		cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> 					  CPUFREQ_POLICY_NOTIFIER);
> 	mutex_unlock(&cooling_cpufreq_lock);
> 
> and similar in the cleanup path.  The notifier itself would only ever
> take the cooling_list_lock.
> 

I agree with this approach, if its fine for others also, I can implement
and post patch.

Thanks,
Yadwinder

> --
> FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
> according to speedtest.net.
Russell King - ARM Linux Dec. 15, 2014, 5:43 p.m. UTC | #4
On Mon, Dec 15, 2014 at 08:24:05PM +0530, Yadwinder Singh Brar wrote:
> > The answer may well be to have finer grained locking here:
> > 
> > - one lock to protect cpufreq_dev_list, which is only ever taken when
> >   adding or removing entries from it
> > 
> > - a second lock to protect cpufreq_dev_count and the calls to
> >   cpufreq_register_notifier() and cpufreq_unregister_notifier()
> > 
> > and you would /never/ take either of those two locks inside each other.
> > In other words:
> > 
> > 	mutex_lock(&cooling_list_lock);
> > 	list_add(&cpufreq_dev->node, &cpufreq_dev_list);
> > 	mutex_unlock(&cooling_list_lock);
> > 
> > 	mutex_lock(&cooling_cpufreq_lock);
> > 	if (cpufreq_dev_count++ == 0)
> > 		cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> > 					  CPUFREQ_POLICY_NOTIFIER);
> > 	mutex_unlock(&cooling_cpufreq_lock);
> > 
> > and similar in the cleanup path.  The notifier itself would only ever
> > take the cooling_list_lock.
> > 
> 
> I agree with this approach, if its fine for others also, I can implement
> and post patch.

Yes, please do.
Rafael J. Wysocki Dec. 15, 2014, 9:41 p.m. UTC | #5
On Monday, December 15, 2014 05:43:36 PM Russell King - ARM Linux wrote:
> On Mon, Dec 15, 2014 at 08:24:05PM +0530, Yadwinder Singh Brar wrote:
> > > The answer may well be to have finer grained locking here:
> > > 
> > > - one lock to protect cpufreq_dev_list, which is only ever taken when
> > >   adding or removing entries from it
> > > 
> > > - a second lock to protect cpufreq_dev_count and the calls to
> > >   cpufreq_register_notifier() and cpufreq_unregister_notifier()
> > > 
> > > and you would /never/ take either of those two locks inside each other.
> > > In other words:
> > > 
> > > 	mutex_lock(&cooling_list_lock);
> > > 	list_add(&cpufreq_dev->node, &cpufreq_dev_list);
> > > 	mutex_unlock(&cooling_list_lock);
> > > 
> > > 	mutex_lock(&cooling_cpufreq_lock);
> > > 	if (cpufreq_dev_count++ == 0)
> > > 		cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> > > 					  CPUFREQ_POLICY_NOTIFIER);
> > > 	mutex_unlock(&cooling_cpufreq_lock);
> > > 
> > > and similar in the cleanup path.  The notifier itself would only ever
> > > take the cooling_list_lock.
> > > 
> > 
> > I agree with this approach, if its fine for others also, I can implement
> > and post patch.
> 
> Yes, please do.

Indeed.  We have a regression here to fix.

Rafael
Viresh Kumar Dec. 16, 2014, 3:37 a.m. UTC | #6
On 15 December 2014 at 20:24, Yadwinder Singh Brar
<yadi.brar@samsung.com> wrote:
>> Unfortunately, you do need some kind of synchronisation here.  If it's
>> not important when cpufreq_register_notifier() gets called, maybe this
>> can work:
>>
>>       bool register;
>>
>>       mutex_lock(&cooling_cpufreq_lock);
>>       register = cpufreq_dev_count++ == 0;
>>       list_add(&cpufreq_dev->node, &cpufreq_dev_list);
>>       mutex_unlock(&cooling_cpufreq_lock);
>>
>>       if (register)
>
> register may be 0 in scenario you stated above in second point.
> So this approach will not work.

I didn't understood what you meant here. register will be zero only
for one of the threads.
Viresh Kumar Dec. 16, 2014, 3:41 a.m. UTC | #7
On 16 December 2014 at 04:39, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> Well, here's a patch which I'm running on top of 3.18 at the moment,
> which is basically what I described in my email, and I'm running with it
> and it is without any lockdep complaint.

We need two separate patches now, one for 3.18 and other one for 3.19-rc.
3.19 has see lots of changes in this particular file and so we need to
change few things here.

> 8<===
> From: Russell King <rmk+kernel@arm.linux.org.uk>
> thermal: cpu_cooling: fix lockdep problems in cpu_cooling
>
> A recent change to the cpu_cooling code introduced a AB-BA deadlock
> scenario between the cpufreq_policy_notifier_list rwsem and the
> cooling_cpufreq_lock.  This is caused by cooling_cpufreq_lock being held
> before the registration/removal of the notifier block (an operation
> which takes the rwsem), and the notifier code itself which takes the
> locks in the reverse order.
>
> Solve this by moving to finer grained locking - use one mutex to protect
> the cpufreq_dev_list as a whole, and a separate lock to ensure correct
> ordering of cpufreq notifier registration and removal.
>
> I considered taking the cooling_list_lock within cooling_cpufreq_lock to
> protect the registration sequence as a whole, but that adds a dependency
> between these two locks which is best avoided (lest someone tries to
> take those two new locks in the reverse order.)  In any case, it's safer
> to have an empty cpufreq_dev_list than to have unnecessary dependencies
> between locks.
>
> Fixes: 2dcd851fe4b4 ("thermal: cpu_cooling: Update always cpufreq policy with thermal constraints")
> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> ---
>
>  drivers/thermal/cpu_cooling.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
> index ad09e51ffae4..9e42c6f30785 100644
> --- a/drivers/thermal/cpu_cooling.c
> +++ b/drivers/thermal/cpu_cooling.c
> @@ -57,6 +57,7 @@ static DEFINE_MUTEX(cooling_cpufreq_lock);
>
>  static unsigned int cpufreq_dev_count;
>
> +static DEFINE_MUTEX(cooling_list_lock);
>  static LIST_HEAD(cpufreq_dev_list);
>
>  /**
> @@ -317,7 +318,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
>         if (event != CPUFREQ_ADJUST)
>                 return 0;
>
> -       mutex_lock(&cooling_cpufreq_lock);
> +       mutex_lock(&cooling_list_lock);
>         list_for_each_entry(cpufreq_dev, &cpufreq_dev_list, node) {
>                 if (!cpumask_test_cpu(policy->cpu,
>                                         &cpufreq_dev->allowed_cpus))
> @@ -333,7 +334,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
>                 if (policy->max != max_freq)
>                         cpufreq_verify_within_limits(policy, 0, max_freq);
>         }
> -       mutex_unlock(&cooling_cpufreq_lock);
> +       mutex_unlock(&cooling_list_lock);
>
>         return 0;
>  }
> @@ -482,6 +483,11 @@ __cpufreq_cooling_register(struct device_node *np,
>         }
>         cpufreq_dev->cool_dev = cool_dev;
>         cpufreq_dev->cpufreq_state = 0;
> +
> +       mutex_lock(&cooling_list_lock);
> +       list_add(&cpufreq_dev->node, &cpufreq_dev_list);
> +       mutex_unlock(&cooling_list_lock);
> +
>         mutex_lock(&cooling_cpufreq_lock);
>
>         /* Register the notifier for first cpufreq cooling device */
> @@ -489,7 +495,6 @@ __cpufreq_cooling_register(struct device_node *np,
>                 cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
>                                           CPUFREQ_POLICY_NOTIFIER);
>         cpufreq_dev_count++;
> -       list_add(&cpufreq_dev->node, &cpufreq_dev_list);
>
>         mutex_unlock(&cooling_cpufreq_lock);
>
> @@ -553,7 +558,6 @@ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev)
>
>         cpufreq_dev = cdev->devdata;
>         mutex_lock(&cooling_cpufreq_lock);
> -       list_del(&cpufreq_dev->node);
>         cpufreq_dev_count--;
>
>         /* Unregister the notifier for the last cpufreq cooling device */
> @@ -562,6 +566,10 @@ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev)
>                                             CPUFREQ_POLICY_NOTIFIER);
>         mutex_unlock(&cooling_cpufreq_lock);
>
> +       mutex_lock(&cooling_list_lock);
> +       list_del(&cpufreq_dev->node);
> +       mutex_unlock(&cooling_list_lock);
> +
>         thermal_cooling_device_unregister(cpufreq_dev->cool_dev);
>         release_idr(&cpufreq_idr, cpufreq_dev->id);
>         kfree(cpufreq_dev);

For 3.18

Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Russell King - ARM Linux Jan. 6, 2015, 3:38 p.m. UTC | #8
On Tue, Dec 16, 2014 at 09:11:53AM +0530, Viresh Kumar wrote:
> On 16 December 2014 at 04:39, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> > Well, here's a patch which I'm running on top of 3.18 at the moment,
> > which is basically what I described in my email, and I'm running with it
> > and it is without any lockdep complaint.
> 
> We need two separate patches now, one for 3.18 and other one for 3.19-rc.
> 3.19 has see lots of changes in this particular file and so we need to
> change few things here.

I'm sorry, I don't have the bandwidth to create a patch for 3.19-rc
right now.
Russell King - ARM Linux May 18, 2015, 6:56 p.m. UTC | #9
On Tue, Dec 16, 2014 at 09:11:53AM +0530, Viresh Kumar wrote:
> On 16 December 2014 at 04:39, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> > Well, here's a patch which I'm running on top of 3.18 at the moment,
> > which is basically what I described in my email, and I'm running with it
> > and it is without any lockdep complaint.
> 
> We need two separate patches now, one for 3.18 and other one for 3.19-rc.
> 3.19 has see lots of changes in this particular file and so we need to
> change few things here.

What happened with this?  I'm still carrying the patch.

> > 8<===
> > From: Russell King <rmk+kernel@arm.linux.org.uk>
> > thermal: cpu_cooling: fix lockdep problems in cpu_cooling
> >
> > A recent change to the cpu_cooling code introduced a AB-BA deadlock
> > scenario between the cpufreq_policy_notifier_list rwsem and the
> > cooling_cpufreq_lock.  This is caused by cooling_cpufreq_lock being held
> > before the registration/removal of the notifier block (an operation
> > which takes the rwsem), and the notifier code itself which takes the
> > locks in the reverse order.
> >
> > Solve this by moving to finer grained locking - use one mutex to protect
> > the cpufreq_dev_list as a whole, and a separate lock to ensure correct
> > ordering of cpufreq notifier registration and removal.
> >
> > I considered taking the cooling_list_lock within cooling_cpufreq_lock to
> > protect the registration sequence as a whole, but that adds a dependency
> > between these two locks which is best avoided (lest someone tries to
> > take those two new locks in the reverse order.)  In any case, it's safer
> > to have an empty cpufreq_dev_list than to have unnecessary dependencies
> > between locks.
> >
> > Fixes: 2dcd851fe4b4 ("thermal: cpu_cooling: Update always cpufreq policy with thermal constraints")
> > Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> > ---
> >
> >  drivers/thermal/cpu_cooling.c | 16 ++++++++++++----
> >  1 file changed, 12 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
> > index ad09e51ffae4..9e42c6f30785 100644
> > --- a/drivers/thermal/cpu_cooling.c
> > +++ b/drivers/thermal/cpu_cooling.c
> > @@ -57,6 +57,7 @@ static DEFINE_MUTEX(cooling_cpufreq_lock);
> >
> >  static unsigned int cpufreq_dev_count;
> >
> > +static DEFINE_MUTEX(cooling_list_lock);
> >  static LIST_HEAD(cpufreq_dev_list);
> >
> >  /**
> > @@ -317,7 +318,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
> >         if (event != CPUFREQ_ADJUST)
> >                 return 0;
> >
> > -       mutex_lock(&cooling_cpufreq_lock);
> > +       mutex_lock(&cooling_list_lock);
> >         list_for_each_entry(cpufreq_dev, &cpufreq_dev_list, node) {
> >                 if (!cpumask_test_cpu(policy->cpu,
> >                                         &cpufreq_dev->allowed_cpus))
> > @@ -333,7 +334,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
> >                 if (policy->max != max_freq)
> >                         cpufreq_verify_within_limits(policy, 0, max_freq);
> >         }
> > -       mutex_unlock(&cooling_cpufreq_lock);
> > +       mutex_unlock(&cooling_list_lock);
> >
> >         return 0;
> >  }
> > @@ -482,6 +483,11 @@ __cpufreq_cooling_register(struct device_node *np,
> >         }
> >         cpufreq_dev->cool_dev = cool_dev;
> >         cpufreq_dev->cpufreq_state = 0;
> > +
> > +       mutex_lock(&cooling_list_lock);
> > +       list_add(&cpufreq_dev->node, &cpufreq_dev_list);
> > +       mutex_unlock(&cooling_list_lock);
> > +
> >         mutex_lock(&cooling_cpufreq_lock);
> >
> >         /* Register the notifier for first cpufreq cooling device */
> > @@ -489,7 +495,6 @@ __cpufreq_cooling_register(struct device_node *np,
> >                 cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> >                                           CPUFREQ_POLICY_NOTIFIER);
> >         cpufreq_dev_count++;
> > -       list_add(&cpufreq_dev->node, &cpufreq_dev_list);
> >
> >         mutex_unlock(&cooling_cpufreq_lock);
> >
> > @@ -553,7 +558,6 @@ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev)
> >
> >         cpufreq_dev = cdev->devdata;
> >         mutex_lock(&cooling_cpufreq_lock);
> > -       list_del(&cpufreq_dev->node);
> >         cpufreq_dev_count--;
> >
> >         /* Unregister the notifier for the last cpufreq cooling device */
> > @@ -562,6 +566,10 @@ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev)
> >                                             CPUFREQ_POLICY_NOTIFIER);
> >         mutex_unlock(&cooling_cpufreq_lock);
> >
> > +       mutex_lock(&cooling_list_lock);
> > +       list_del(&cpufreq_dev->node);
> > +       mutex_unlock(&cooling_list_lock);
> > +
> >         thermal_cooling_device_unregister(cpufreq_dev->cool_dev);
> >         release_idr(&cpufreq_idr, cpufreq_dev->id);
> >         kfree(cpufreq_dev);
> 
> For 3.18
> 
> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Rafael J. Wysocki May 18, 2015, 10:05 p.m. UTC | #10
On Monday, May 18, 2015 07:56:45 PM Russell King - ARM Linux wrote:
> On Tue, Dec 16, 2014 at 09:11:53AM +0530, Viresh Kumar wrote:
> > On 16 December 2014 at 04:39, Russell King - ARM Linux
> > <linux@arm.linux.org.uk> wrote:
> > > Well, here's a patch which I'm running on top of 3.18 at the moment,
> > > which is basically what I described in my email, and I'm running with it
> > > and it is without any lockdep complaint.
> > 
> > We need two separate patches now, one for 3.18 and other one for 3.19-rc.
> > 3.19 has see lots of changes in this particular file and so we need to
> > change few things here.
> 
> What happened with this?  I'm still carrying the patch.

This should go in through the thermal tree.  Eduardo?

> 
> > > 8<===
> > > From: Russell King <rmk+kernel@arm.linux.org.uk>
> > > thermal: cpu_cooling: fix lockdep problems in cpu_cooling
> > >
> > > A recent change to the cpu_cooling code introduced a AB-BA deadlock
> > > scenario between the cpufreq_policy_notifier_list rwsem and the
> > > cooling_cpufreq_lock.  This is caused by cooling_cpufreq_lock being held
> > > before the registration/removal of the notifier block (an operation
> > > which takes the rwsem), and the notifier code itself which takes the
> > > locks in the reverse order.
> > >
> > > Solve this by moving to finer grained locking - use one mutex to protect
> > > the cpufreq_dev_list as a whole, and a separate lock to ensure correct
> > > ordering of cpufreq notifier registration and removal.
> > >
> > > I considered taking the cooling_list_lock within cooling_cpufreq_lock to
> > > protect the registration sequence as a whole, but that adds a dependency
> > > between these two locks which is best avoided (lest someone tries to
> > > take those two new locks in the reverse order.)  In any case, it's safer
> > > to have an empty cpufreq_dev_list than to have unnecessary dependencies
> > > between locks.
> > >
> > > Fixes: 2dcd851fe4b4 ("thermal: cpu_cooling: Update always cpufreq policy with thermal constraints")
> > > Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> > > ---
> > >
> > >  drivers/thermal/cpu_cooling.c | 16 ++++++++++++----
> > >  1 file changed, 12 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
> > > index ad09e51ffae4..9e42c6f30785 100644
> > > --- a/drivers/thermal/cpu_cooling.c
> > > +++ b/drivers/thermal/cpu_cooling.c
> > > @@ -57,6 +57,7 @@ static DEFINE_MUTEX(cooling_cpufreq_lock);
> > >
> > >  static unsigned int cpufreq_dev_count;
> > >
> > > +static DEFINE_MUTEX(cooling_list_lock);
> > >  static LIST_HEAD(cpufreq_dev_list);
> > >
> > >  /**
> > > @@ -317,7 +318,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
> > >         if (event != CPUFREQ_ADJUST)
> > >                 return 0;
> > >
> > > -       mutex_lock(&cooling_cpufreq_lock);
> > > +       mutex_lock(&cooling_list_lock);
> > >         list_for_each_entry(cpufreq_dev, &cpufreq_dev_list, node) {
> > >                 if (!cpumask_test_cpu(policy->cpu,
> > >                                         &cpufreq_dev->allowed_cpus))
> > > @@ -333,7 +334,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
> > >                 if (policy->max != max_freq)
> > >                         cpufreq_verify_within_limits(policy, 0, max_freq);
> > >         }
> > > -       mutex_unlock(&cooling_cpufreq_lock);
> > > +       mutex_unlock(&cooling_list_lock);
> > >
> > >         return 0;
> > >  }
> > > @@ -482,6 +483,11 @@ __cpufreq_cooling_register(struct device_node *np,
> > >         }
> > >         cpufreq_dev->cool_dev = cool_dev;
> > >         cpufreq_dev->cpufreq_state = 0;
> > > +
> > > +       mutex_lock(&cooling_list_lock);
> > > +       list_add(&cpufreq_dev->node, &cpufreq_dev_list);
> > > +       mutex_unlock(&cooling_list_lock);
> > > +
> > >         mutex_lock(&cooling_cpufreq_lock);
> > >
> > >         /* Register the notifier for first cpufreq cooling device */
> > > @@ -489,7 +495,6 @@ __cpufreq_cooling_register(struct device_node *np,
> > >                 cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> > >                                           CPUFREQ_POLICY_NOTIFIER);
> > >         cpufreq_dev_count++;
> > > -       list_add(&cpufreq_dev->node, &cpufreq_dev_list);
> > >
> > >         mutex_unlock(&cooling_cpufreq_lock);
> > >
> > > @@ -553,7 +558,6 @@ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev)
> > >
> > >         cpufreq_dev = cdev->devdata;
> > >         mutex_lock(&cooling_cpufreq_lock);
> > > -       list_del(&cpufreq_dev->node);
> > >         cpufreq_dev_count--;
> > >
> > >         /* Unregister the notifier for the last cpufreq cooling device */
> > > @@ -562,6 +566,10 @@ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev)
> > >                                             CPUFREQ_POLICY_NOTIFIER);
> > >         mutex_unlock(&cooling_cpufreq_lock);
> > >
> > > +       mutex_lock(&cooling_list_lock);
> > > +       list_del(&cpufreq_dev->node);
> > > +       mutex_unlock(&cooling_list_lock);
> > > +
> > >         thermal_cooling_device_unregister(cpufreq_dev->cool_dev);
> > >         release_idr(&cpufreq_idr, cpufreq_dev->id);
> > >         kfree(cpufreq_dev);
> > 
> > For 3.18
> > 
> > Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
> 
>
Russell King - ARM Linux Aug. 11, 2015, 5:03 p.m. UTC | #11
On Tue, May 19, 2015 at 12:05:55AM +0200, Rafael J. Wysocki wrote:
> On Monday, May 18, 2015 07:56:45 PM Russell King - ARM Linux wrote:
> > On Tue, Dec 16, 2014 at 09:11:53AM +0530, Viresh Kumar wrote:
> > > On 16 December 2014 at 04:39, Russell King - ARM Linux
> > > <linux@arm.linux.org.uk> wrote:
> > > > Well, here's a patch which I'm running on top of 3.18 at the moment,
> > > > which is basically what I described in my email, and I'm running with it
> > > > and it is without any lockdep complaint.
> > > 
> > > We need two separate patches now, one for 3.18 and other one for 3.19-rc.
> > > 3.19 has see lots of changes in this particular file and so we need to
> > > change few things here.
> > 
> > What happened with this?  I'm still carrying the patch.
> 
> This should go in through the thermal tree.  Eduardo?

Having waited a long time for any kind of response from Eduardo, I've
given up.  My conclusion is that Eduardo isn't interested in this.

I've re-checked, and the AB-BA deadlock is still there in the latest
code.  So, I've taken it upon myself to throw this into my for-next
branch to force the issue - not something I _want_ to do, but I'm doing
this out of frustration.  It's clear to me that "playing nice" by email
does _not_ work with some people.

I'm rather hoping that Stephen reports a merge conflict with linux-next
this evening to highlight this situation.  I've added additional commentry
to the commit message on the patch giving the reason why I've done this,
and the relevant message IDs showing the past history.

I've not decided whether I'm going to ask Linus to take this patch
directly or not, that rather depends whether there's any co-operation
from Eduardo on this.  I'd rather Eduardo took the patch.

The patch I have has had to be updated again for changes to the driver,
but I really don't see the point of re-posting it just for it to be
ignored yet again.

I'm really disappointed by this dysfunctional state of affairs, and
that what should be an urgent fix for an observable problem is still
not merged some nine months after it was first identified.
Viresh Kumar Aug. 12, 2015, 5:16 a.m. UTC | #12
On 11-08-15, 18:03, Russell King - ARM Linux wrote:
> On Tue, May 19, 2015 at 12:05:55AM +0200, Rafael J. Wysocki wrote:
> > On Monday, May 18, 2015 07:56:45 PM Russell King - ARM Linux wrote:
> > > On Tue, Dec 16, 2014 at 09:11:53AM +0530, Viresh Kumar wrote:
> > > > On 16 December 2014 at 04:39, Russell King - ARM Linux
> > > > <linux@arm.linux.org.uk> wrote:
> > > > > Well, here's a patch which I'm running on top of 3.18 at the moment,
> > > > > which is basically what I described in my email, and I'm running with it
> > > > > and it is without any lockdep complaint.
> > > > 
> > > > We need two separate patches now, one for 3.18 and other one for 3.19-rc.
> > > > 3.19 has see lots of changes in this particular file and so we need to
> > > > change few things here.

Was the patch for latest kernel ever circulated ? Probably that's why
Eduardo never got back to it.
Russell King - ARM Linux Aug. 12, 2015, 7:21 a.m. UTC | #13
On Wed, Aug 12, 2015 at 10:46:59AM +0530, Viresh Kumar wrote:
> On 11-08-15, 18:03, Russell King - ARM Linux wrote:
> > On Tue, May 19, 2015 at 12:05:55AM +0200, Rafael J. Wysocki wrote:
> > > On Monday, May 18, 2015 07:56:45 PM Russell King - ARM Linux wrote:
> > > > On Tue, Dec 16, 2014 at 09:11:53AM +0530, Viresh Kumar wrote:
> > > > > On 16 December 2014 at 04:39, Russell King - ARM Linux
> > > > > <linux@arm.linux.org.uk> wrote:
> > > > > > Well, here's a patch which I'm running on top of 3.18 at the moment,
> > > > > > which is basically what I described in my email, and I'm running with it
> > > > > > and it is without any lockdep complaint.
> > > > > 
> > > > > We need two separate patches now, one for 3.18 and other one for 3.19-rc.
> > > > > 3.19 has see lots of changes in this particular file and so we need to
> > > > > change few things here.
> 
> Was the patch for latest kernel ever circulated ? Probably that's why
> Eduardo never got back to it.

Maybe not - I often lose track of which patches are in what state -
tracking the state of 250+ patches over many months and in some cases
_years_ is not easy.

That's still no reason for Eduardo to _completely_ ignore this thread,
especially when Rafael explicitly tried to get comment from Eduardo.
Viresh Kumar Aug. 12, 2015, 7:35 a.m. UTC | #14
On 12-08-15, 08:21, Russell King - ARM Linux wrote:
> Maybe not - I often lose track of which patches are in what state -
> tracking the state of 250+ patches over many months and in some cases
> _years_ is not easy.

I agree..

> That's still no reason for Eduardo to _completely_ ignore this thread,
> especially when Rafael explicitly tried to get comment from Eduardo.

Oh, no. I am not trying to be anybody's advocate here. I was just
highlighting the point here, that I remembered.

Even I was quite busy then and couldn't get back to it.

Lets fix this now. I believe you have a patch for this based on latest
kernel? Mind pasting that here for quick review?
Russell King - ARM Linux Aug. 12, 2015, 7:49 a.m. UTC | #15
On Wed, Aug 12, 2015 at 01:05:30PM +0530, Viresh Kumar wrote:
> On 12-08-15, 08:21, Russell King - ARM Linux wrote:
> > Maybe not - I often lose track of which patches are in what state -
> > tracking the state of 250+ patches over many months and in some cases
> > _years_ is not easy.
> 
> I agree..
> 
> > That's still no reason for Eduardo to _completely_ ignore this thread,
> > especially when Rafael explicitly tried to get comment from Eduardo.
> 
> Oh, no. I am not trying to be anybody's advocate here. I was just
> highlighting the point here, that I remembered.
> 
> Even I was quite busy then and couldn't get back to it.
> 
> Lets fix this now. I believe you have a patch for this based on latest
> kernel? Mind pasting that here for quick review?

I don't think it's changed much since the original, but I've posted
what's in linux-next as of today, and it doesn't conflict with anything.
So there's no excuse about getting it merged.

The problem will be back-porting it to stable kernels, because I think
it's had to be updated at least a couple of times.  I don't have the old
versions anymore, so I'm just going to say "not my problem" - sorry.
Viresh Kumar Aug. 12, 2015, 8:11 a.m. UTC | #16
On 12-08-15, 08:41, Russell King wrote:
> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
>  /**
> @@ -221,7 +224,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
>  	switch (event) {
>  
>  	case CPUFREQ_ADJUST:
> -		mutex_lock(&cooling_cpufreq_lock);
> +		mutex_lock(&cooling_list_lock);

There is one more place where the list's locking needs update:
cpufreq_cooling_get_level().

>  		list_for_each_entry(cpufreq_dev, &cpufreq_dev_list, node) {
>  			if (!cpumask_test_cpu(policy->cpu,
>  					      &cpufreq_dev->allowed_cpus))
> @@ -233,7 +236,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
>  				cpufreq_verify_within_limits(policy, 0,
>  							     max_freq);
>  		}
> -		mutex_unlock(&cooling_cpufreq_lock);
> +		mutex_unlock(&cooling_list_lock);
>  		break;
>  	default:
>  		return NOTIFY_DONE;
> @@ -865,12 +868,15 @@ __cpufreq_cooling_register(struct device_node *np,
>  	cpufreq_dev->cool_dev = cool_dev;
>  
>  	mutex_lock(&cooling_cpufreq_lock);
> +	mutex_lock(&cooling_list_lock);

Why is the list lock taken from within the existing lock here? and ...

> +	list_add(&cpufreq_dev->node, &cpufreq_dev_list);
> +	mutex_unlock(&cooling_list_lock);
>  
>  	/* Register the notifier for first cpufreq cooling device */
> -	if (list_empty(&cpufreq_dev_list))
> +	if (cpufreq_dev_count == 0)
>  		cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
>  					  CPUFREQ_POLICY_NOTIFIER);
> -	list_add(&cpufreq_dev->node, &cpufreq_dev_list);
> +	cpufreq_dev_count++;

Maybe:
        if (!cpufreq_dev_count++)
                cpufreq_register_notifier();

>  
>  	mutex_unlock(&cooling_cpufreq_lock);
>  
> @@ -1014,14 +1020,18 @@ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev)
>  
>  	cpufreq_dev = cdev->devdata;
>  	mutex_lock(&cooling_cpufreq_lock);
> -	list_del(&cpufreq_dev->node);
> +	cpufreq_dev_count--;
>  
>  	/* Unregister the notifier for the last cpufreq cooling device */
> -	if (list_empty(&cpufreq_dev_list))
> +	if (cpufreq_dev_count == 0)

Maybe:
        if (!--cpufreq_dev_count)
                cpufreq_register_notifier();

>  		cpufreq_unregister_notifier(&thermal_cpufreq_notifier_block,
>  					    CPUFREQ_POLICY_NOTIFIER);
>  	mutex_unlock(&cooling_cpufreq_lock);
>  
> +	mutex_lock(&cooling_list_lock);

... The same list lock is not taken from within the earlier critical
section?

> +	list_del(&cpufreq_dev->node);
> +	mutex_unlock(&cooling_list_lock);
> +
>  	thermal_cooling_device_unregister(cpufreq_dev->cool_dev);
>  	release_idr(&cpufreq_idr, cpufreq_dev->id);
>  	kfree(cpufreq_dev->time_in_idle_timestamp);
> -- 
> 2.1.0
Russell King - ARM Linux Aug. 12, 2015, 9:08 a.m. UTC | #17
On Wed, Aug 12, 2015 at 01:41:43PM +0530, Viresh Kumar wrote:
> On 12-08-15, 08:41, Russell King wrote:
> > diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
> >  /**
> > @@ -221,7 +224,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
> >  	switch (event) {
> >  
> >  	case CPUFREQ_ADJUST:
> > -		mutex_lock(&cooling_cpufreq_lock);
> > +		mutex_lock(&cooling_list_lock);
> 
> There is one more place where the list's locking needs update:
> cpufreq_cooling_get_level().
> 
> >  		list_for_each_entry(cpufreq_dev, &cpufreq_dev_list, node) {
> >  			if (!cpumask_test_cpu(policy->cpu,
> >  					      &cpufreq_dev->allowed_cpus))
> > @@ -233,7 +236,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
> >  				cpufreq_verify_within_limits(policy, 0,
> >  							     max_freq);
> >  		}
> > -		mutex_unlock(&cooling_cpufreq_lock);
> > +		mutex_unlock(&cooling_list_lock);
> >  		break;
> >  	default:
> >  		return NOTIFY_DONE;
> > @@ -865,12 +868,15 @@ __cpufreq_cooling_register(struct device_node *np,
> >  	cpufreq_dev->cool_dev = cool_dev;
> >  
> >  	mutex_lock(&cooling_cpufreq_lock);
> > +	mutex_lock(&cooling_list_lock);
> 
> Why is the list lock taken from within the existing lock here? and ...

To preserve the behaviour of the code, which is to atomically add to
the list and register the notifier.

> > +	list_add(&cpufreq_dev->node, &cpufreq_dev_list);
> > +	mutex_unlock(&cooling_list_lock);
> >  
> >  	/* Register the notifier for first cpufreq cooling device */
> > -	if (list_empty(&cpufreq_dev_list))
> > +	if (cpufreq_dev_count == 0)
> >  		cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
> >  					  CPUFREQ_POLICY_NOTIFIER);
> > -	list_add(&cpufreq_dev->node, &cpufreq_dev_list);
> > +	cpufreq_dev_count++;
> 
> Maybe:
>         if (!cpufreq_dev_count++)
>                 cpufreq_register_notifier();

Possibly, had the code previously been like that.  See:

commit 2479bb6443d6a793f896219a34bfab0cc410f0b4
Author: Viresh Kumar <viresh.kumar@linaro.org>
Date:   Thu Dec 4 09:42:04 2014 +0530

which post-dates this patch, and removes this "style" in favour of using
the broken list mechanism.  Frankly... you don't have a leg to stand on
because _you_ knew about this problem, and you saw my patch.  So removing
something that my patch to fix a serious bug relied upon...

You know what?  At this point, I'm just not going to bother.  You fix
this lockdep problem, I'm obviously useless at kernel stuff.
Russell King - ARM Linux Aug. 12, 2015, 9:08 a.m. UTC | #18
On Wed, Aug 12, 2015 at 01:42:57PM +0530, Viresh Kumar wrote:
> On 12-08-15, 08:49, Russell King - ARM Linux wrote:
> > The problem will be back-porting it to stable kernels, because I think
> > it's had to be updated at least a couple of times.  I don't have the old
> > versions anymore, so I'm just going to say "not my problem" - sorry.
> 
> Your old 3.18 version :)

And now generate versions for all the stable kernels inbetween.  Now
your problem.  I'm sick of this.
Viresh Kumar Aug. 12, 2015, 9:18 a.m. UTC | #19
On 12-08-15, 10:08, Russell King - ARM Linux wrote:
> Possibly, had the code previously been like that.  See:
> 
> commit 2479bb6443d6a793f896219a34bfab0cc410f0b4
> Author: Viresh Kumar <viresh.kumar@linaro.org>
> Date:   Thu Dec 4 09:42:04 2014 +0530
> 
> which post-dates this patch, and removes this "style" in favour of using
> the broken list mechanism.

Just to make sure I understand it properly, do you want to say that I
added 2479bb6 patch after you reported this problem & your patch?

If yes, then I don't see how my patch came after yours. Mine was
written on 4/Dec/2014 and committed on 8/Dec/2014 and you first
reported the problem on 14/Dec/2014.

> Frankly... you don't have a leg to stand on

I can accept that :)

> because _you_ knew about this problem, and you saw my patch.  So removing
> something that my patch to fix a serious bug relied upon...

I don't think I removed that stuff after your patch.

> You know what?  At this point, I'm just not going to bother.  You fix
> this lockdep problem, I'm obviously useless at kernel stuff.

Okay, leave it on me. I will re-circulate your patch.
Viresh Kumar Aug. 12, 2015, 9:19 a.m. UTC | #20
On 12-08-15, 10:08, Russell King - ARM Linux wrote:
> And now generate versions for all the stable kernels inbetween.  Now
> your problem.  I'm sick of this.

Haha.. There wouldn't be many. 3.19 got large number of changes (from
me) and so we required a newer version. But after that should be
somewhat stable.
Rafael J. Wysocki Aug. 13, 2015, 1:20 a.m. UTC | #21
On Tuesday, August 11, 2015 06:03:57 PM Russell King - ARM Linux wrote:
> On Tue, May 19, 2015 at 12:05:55AM +0200, Rafael J. Wysocki wrote:
> > On Monday, May 18, 2015 07:56:45 PM Russell King - ARM Linux wrote:
> > > On Tue, Dec 16, 2014 at 09:11:53AM +0530, Viresh Kumar wrote:
> > > > On 16 December 2014 at 04:39, Russell King - ARM Linux
> > > > <linux@arm.linux.org.uk> wrote:
> > > > > Well, here's a patch which I'm running on top of 3.18 at the moment,
> > > > > which is basically what I described in my email, and I'm running with it
> > > > > and it is without any lockdep complaint.
> > > > 
> > > > We need two separate patches now, one for 3.18 and other one for 3.19-rc.
> > > > 3.19 has see lots of changes in this particular file and so we need to
> > > > change few things here.
> > > 
> > > What happened with this?  I'm still carrying the patch.
> > 
> > This should go in through the thermal tree.  Eduardo?
> 
> Having waited a long time for any kind of response from Eduardo, I've
> given up.  My conclusion is that Eduardo isn't interested in this.
> 
> I've re-checked, and the AB-BA deadlock is still there in the latest
> code.  So, I've taken it upon myself to throw this into my for-next
> branch to force the issue - not something I _want_ to do, but I'm doing
> this out of frustration.  It's clear to me that "playing nice" by email
> does _not_ work with some people.
> 
> I'm rather hoping that Stephen reports a merge conflict with linux-next
> this evening to highlight this situation.  I've added additional commentry
> to the commit message on the patch giving the reason why I've done this,
> and the relevant message IDs showing the past history.
> 
> I've not decided whether I'm going to ask Linus to take this patch
> directly or not, that rather depends whether there's any co-operation
> from Eduardo on this.  I'd rather Eduardo took the patch.
> 
> The patch I have has had to be updated again for changes to the driver,
> but I really don't see the point of re-posting it just for it to be
> ignored yet again.
> 
> I'm really disappointed by this dysfunctional state of affairs, and
> that what should be an urgent fix for an observable problem is still
> not merged some nine months after it was first identified.

I guess it might help if you sent the updated patch in a new thread.
Russell King - ARM Linux Aug. 13, 2015, 8:17 a.m. UTC | #22
On Thu, Aug 13, 2015 at 03:20:35AM +0200, Rafael J. Wysocki wrote:
> On Tuesday, August 11, 2015 06:03:57 PM Russell King - ARM Linux wrote:
> > On Tue, May 19, 2015 at 12:05:55AM +0200, Rafael J. Wysocki wrote:
> > > On Monday, May 18, 2015 07:56:45 PM Russell King - ARM Linux wrote:
> > > > On Tue, Dec 16, 2014 at 09:11:53AM +0530, Viresh Kumar wrote:
> > > > > On 16 December 2014 at 04:39, Russell King - ARM Linux
> > > > > <linux@arm.linux.org.uk> wrote:
> > > > > > Well, here's a patch which I'm running on top of 3.18 at the moment,
> > > > > > which is basically what I described in my email, and I'm running with it
> > > > > > and it is without any lockdep complaint.
> > > > > 
> > > > > We need two separate patches now, one for 3.18 and other one for 3.19-rc.
> > > > > 3.19 has see lots of changes in this particular file and so we need to
> > > > > change few things here.
> > > > 
> > > > What happened with this?  I'm still carrying the patch.
> > > 
> > > This should go in through the thermal tree.  Eduardo?
> > 
> > Having waited a long time for any kind of response from Eduardo, I've
> > given up.  My conclusion is that Eduardo isn't interested in this.
> > 
> > I've re-checked, and the AB-BA deadlock is still there in the latest
> > code.  So, I've taken it upon myself to throw this into my for-next
> > branch to force the issue - not something I _want_ to do, but I'm doing
> > this out of frustration.  It's clear to me that "playing nice" by email
> > does _not_ work with some people.
> > 
> > I'm rather hoping that Stephen reports a merge conflict with linux-next
> > this evening to highlight this situation.  I've added additional commentry
> > to the commit message on the patch giving the reason why I've done this,
> > and the relevant message IDs showing the past history.
> > 
> > I've not decided whether I'm going to ask Linus to take this patch
> > directly or not, that rather depends whether there's any co-operation
> > from Eduardo on this.  I'd rather Eduardo took the patch.
> > 
> > The patch I have has had to be updated again for changes to the driver,
> > but I really don't see the point of re-posting it just for it to be
> > ignored yet again.
> > 
> > I'm really disappointed by this dysfunctional state of affairs, and
> > that what should be an urgent fix for an observable problem is still
> > not merged some nine months after it was first identified.
> 
> I guess it might help if you sent the updated patch in a new thread.

That I doubt.  Eduardo has not bothered to reply at _any_ time.  I
have to question whether there is anyone even reading that email
address, or whether it's a redirect to /dev/null.  All the evidence I
have right now is that this Eduardo is a ficticous character.

I would have at least expected some complaints when I said "I've put
it in linux-next" but... absolutely nothing.

So... my only conclusion is that you're all pulling my leg that there
_is_ this "Eduardo" maintainer who's supposed to be taking patches for
this stuff.

As I've said, I'm not bothering with this anymore, it's just far too
much effort to play these stupid games.  The deadlock can stay for all
I care.

Sorry.
Viresh Kumar Aug. 13, 2015, 8:22 a.m. UTC | #23
On 13-08-15, 09:17, Russell King - ARM Linux wrote:
> That I doubt.  Eduardo has not bothered to reply at _any_ time.  I
> have to question whether there is anyone even reading that email
> address, or whether it's a redirect to /dev/null.  All the evidence I
> have right now is that this Eduardo is a ficticous character.
> 
> I would have at least expected some complaints when I said "I've put
> it in linux-next" but... absolutely nothing.
> 
> So... my only conclusion is that you're all pulling my leg that there
> _is_ this "Eduardo" maintainer who's supposed to be taking patches for
> this stuff.

Haha, If that's the case then we are all in the same boat.
Who is Eduardo ? A bot ? :)

Anyway, AFAIK, he wasn't around for sometime and came back last week
only. Even I have expected him to reply to yesterday's discussions.

> As I've said, I'm not bothering with this anymore, it's just far too
> much effort to play these stupid games.  The deadlock can stay for all
> I care.

And finally I took the charge to carry it now. :)
Eduardo Valentin Aug. 14, 2015, 4:50 a.m. UTC | #24
Hi Russell,

On Wed, Aug 12, 2015 at 08:41:49AM +0100, Russell King wrote:

<cut>

> 
> For some reason, despite Rafael saying that this should go in, despite it
> having been reviewed, Eduardo appears to be ignoring basic fixes to code
> under his control.  Having waited more than a reasonable amount of time,
> it is now time to take some action against this blockage.  Hence, I'm
> putting this into linux-next in the hope that it'll conflict and elicit
> a response for this important bug fix.

Sorry if I let this one to fall into the cracks for this long.

I am looking at this now and queuing as soon as possible.

Thanks for pushing this fix.

BR,

Eduardo

> 
> Relevant message IDs:
> 20141214213655.GA11285@n2100.arm.linux.org.uk
> 7573578.UE8ufgjWuX@vostro.rjw.lan
> 20141215230922.GL11285@n2100.arm.linux.org.uk
> CAKohponhkk9BK6-7ScHeZa4R43iooWQFV8YoPLv6LjO40GNq=A@mail.gmail.com
> 20150518185645.GA28053@n2100.arm.linux.org.uk
> 2574268.XBqpdL2VLI@vostro.rjw.lan
> ---
>  drivers/thermal/cpu_cooling.c | 22 ++++++++++++++++------
>  1 file changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
> index 6509c61b9648..d1fd02eec58d 100644
> --- a/drivers/thermal/cpu_cooling.c
> +++ b/drivers/thermal/cpu_cooling.c
> @@ -107,6 +107,9 @@ struct cpufreq_cooling_device {
>  static DEFINE_IDR(cpufreq_idr);
>  static DEFINE_MUTEX(cooling_cpufreq_lock);
>  
> +static unsigned int cpufreq_dev_count;
> +
> +static DEFINE_MUTEX(cooling_list_lock);
>  static LIST_HEAD(cpufreq_dev_list);
>  
>  /**
> @@ -221,7 +224,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
>  	switch (event) {
>  
>  	case CPUFREQ_ADJUST:
> -		mutex_lock(&cooling_cpufreq_lock);
> +		mutex_lock(&cooling_list_lock);
>  		list_for_each_entry(cpufreq_dev, &cpufreq_dev_list, node) {
>  			if (!cpumask_test_cpu(policy->cpu,
>  					      &cpufreq_dev->allowed_cpus))
> @@ -233,7 +236,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
>  				cpufreq_verify_within_limits(policy, 0,
>  							     max_freq);
>  		}
> -		mutex_unlock(&cooling_cpufreq_lock);
> +		mutex_unlock(&cooling_list_lock);
>  		break;
>  	default:
>  		return NOTIFY_DONE;
> @@ -865,12 +868,15 @@ __cpufreq_cooling_register(struct device_node *np,
>  	cpufreq_dev->cool_dev = cool_dev;
>  
>  	mutex_lock(&cooling_cpufreq_lock);
> +	mutex_lock(&cooling_list_lock);
> +	list_add(&cpufreq_dev->node, &cpufreq_dev_list);
> +	mutex_unlock(&cooling_list_lock);
>  
>  	/* Register the notifier for first cpufreq cooling device */
> -	if (list_empty(&cpufreq_dev_list))
> +	if (cpufreq_dev_count == 0)
>  		cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
>  					  CPUFREQ_POLICY_NOTIFIER);
> -	list_add(&cpufreq_dev->node, &cpufreq_dev_list);
> +	cpufreq_dev_count++;
>  
>  	mutex_unlock(&cooling_cpufreq_lock);
>  
> @@ -1014,14 +1020,18 @@ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev)
>  
>  	cpufreq_dev = cdev->devdata;
>  	mutex_lock(&cooling_cpufreq_lock);
> -	list_del(&cpufreq_dev->node);
> +	cpufreq_dev_count--;
>  
>  	/* Unregister the notifier for the last cpufreq cooling device */
> -	if (list_empty(&cpufreq_dev_list))
> +	if (cpufreq_dev_count == 0)
>  		cpufreq_unregister_notifier(&thermal_cpufreq_notifier_block,
>  					    CPUFREQ_POLICY_NOTIFIER);
>  	mutex_unlock(&cooling_cpufreq_lock);
>  
> +	mutex_lock(&cooling_list_lock);
> +	list_del(&cpufreq_dev->node);
> +	mutex_unlock(&cooling_list_lock);
> +
>  	thermal_cooling_device_unregister(cpufreq_dev->cool_dev);
>  	release_idr(&cpufreq_idr, cpufreq_dev->id);
>  	kfree(cpufreq_dev->time_in_idle_timestamp);
> -- 
> 2.1.0
>
Viresh Kumar Aug. 14, 2015, 5:53 a.m. UTC | #25
On 13-08-15, 21:50, Eduardo Valentin wrote:
> Sorry if I let this one to fall into the cracks for this long.
> 
> I am looking at this now and queuing as soon as possible.

You applied the wrong one, please drop it. And apply the one I have
sent as a separate thread, with my sob on it.
Eduardo Valentin Aug. 15, 2015, 1:19 a.m. UTC | #26
On Fri, Aug 14, 2015 at 11:23:59AM +0530, Viresh Kumar wrote:
> On 13-08-15, 21:50, Eduardo Valentin wrote:
> > Sorry if I let this one to fall into the cracks for this long.
> > 
> > I am looking at this now and queuing as soon as possible.
> 
> You applied the wrong one, please drop it. And apply the one I have
> sent as a separate thread, with my sob on it.

OK. I am updating it to your V2.


> 
> -- 
> viresh
Viresh Kumar Aug. 15, 2015, 3:06 a.m. UTC | #27
On 14-08-15, 18:19, Eduardo Valentin wrote:
> On Fri, Aug 14, 2015 at 11:23:59AM +0530, Viresh Kumar wrote:
> > On 13-08-15, 21:50, Eduardo Valentin wrote:
> > > Sorry if I let this one to fall into the cracks for this long.
> > > 
> > > I am looking at this now and queuing as soon as possible.
> > 
> > You applied the wrong one, please drop it. And apply the one I have
> > sent as a separate thread, with my sob on it.
> 
> OK. I am updating it to your V2.

Looks fine now.
Rafael J. Wysocki Aug. 18, 2015, 1:32 a.m. UTC | #28
On Thursday, August 13, 2015 09:17:44 AM Russell King - ARM Linux wrote:
> On Thu, Aug 13, 2015 at 03:20:35AM +0200, Rafael J. Wysocki wrote:
> > On Tuesday, August 11, 2015 06:03:57 PM Russell King - ARM Linux wrote:
> > > On Tue, May 19, 2015 at 12:05:55AM +0200, Rafael J. Wysocki wrote:
> > > > On Monday, May 18, 2015 07:56:45 PM Russell King - ARM Linux wrote:
> > > > > On Tue, Dec 16, 2014 at 09:11:53AM +0530, Viresh Kumar wrote:
> > > > > > On 16 December 2014 at 04:39, Russell King - ARM Linux
> > > > > > <linux@arm.linux.org.uk> wrote:
> > > > > > > Well, here's a patch which I'm running on top of 3.18 at the moment,
> > > > > > > which is basically what I described in my email, and I'm running with it
> > > > > > > and it is without any lockdep complaint.
> > > > > > 
> > > > > > We need two separate patches now, one for 3.18 and other one for 3.19-rc.
> > > > > > 3.19 has see lots of changes in this particular file and so we need to
> > > > > > change few things here.
> > > > > 
> > > > > What happened with this?  I'm still carrying the patch.
> > > > 
> > > > This should go in through the thermal tree.  Eduardo?
> > > 
> > > Having waited a long time for any kind of response from Eduardo, I've
> > > given up.  My conclusion is that Eduardo isn't interested in this.
> > > 
> > > I've re-checked, and the AB-BA deadlock is still there in the latest
> > > code.  So, I've taken it upon myself to throw this into my for-next
> > > branch to force the issue - not something I _want_ to do, but I'm doing
> > > this out of frustration.  It's clear to me that "playing nice" by email
> > > does _not_ work with some people.
> > > 
> > > I'm rather hoping that Stephen reports a merge conflict with linux-next
> > > this evening to highlight this situation.  I've added additional commentry
> > > to the commit message on the patch giving the reason why I've done this,
> > > and the relevant message IDs showing the past history.
> > > 
> > > I've not decided whether I'm going to ask Linus to take this patch
> > > directly or not, that rather depends whether there's any co-operation
> > > from Eduardo on this.  I'd rather Eduardo took the patch.
> > > 
> > > The patch I have has had to be updated again for changes to the driver,
> > > but I really don't see the point of re-posting it just for it to be
> > > ignored yet again.
> > > 
> > > I'm really disappointed by this dysfunctional state of affairs, and
> > > that what should be an urgent fix for an observable problem is still
> > > not merged some nine months after it was first identified.
> > 
> > I guess it might help if you sent the updated patch in a new thread.
> 
> That I doubt.  Eduardo has not bothered to reply at _any_ time.  I
> have to question whether there is anyone even reading that email
> address, or whether it's a redirect to /dev/null.  All the evidence I
> have right now is that this Eduardo is a ficticous character.
> 
> I would have at least expected some complaints when I said "I've put
> it in linux-next" but... absolutely nothing.
> 
> So... my only conclusion is that you're all pulling my leg that there
> _is_ this "Eduardo" maintainer who's supposed to be taking patches for
> this stuff.

Well, that's the situation as per MAINTAINERS today.  You seem to be concerned
that it may not reflect the reality, but in that case I can only recommend
sending a patch against MAINTAINERS to remove Eduardo from there.

> As I've said, I'm not bothering with this anymore, it's just far too
> much effort to play these stupid games.

I'm not sure what you mean by "these stupid games" here.

> The deadlock can stay for all I care.

Fair enough.

Thanks,
Rafael
Eduardo Valentin Aug. 18, 2015, 9:30 a.m. UTC | #29
Hi there,

On Mon, Aug 17, 2015 at 6:32 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:

<cut>

>
> Well, that's the situation as per MAINTAINERS today.  You seem to be concerned
> that it may not reflect the reality, but in that case I can only recommend
> sending a patch against MAINTAINERS to remove Eduardo from there.
>
>> As I've said, I'm not bothering with this anymore, it's just far too
>> much effort to play these stupid games.
>
> I'm not sure what you mean by "these stupid games" here.
>
>> The deadlock can stay for all I care.
>
> Fair enough.

I applied the patch. And it has generated the conflict in linux-next, as was mentioned in this thread. I already sent the pull request.

Yes, I missed the thread. Apologize for this and for the inconvenience I have caused. The deadlock will be taken care using Russel's patch.

Thanks Viresh for notifying me.


>
> Thanks,
> Rafael
>

BR,
diff mbox

Patch

diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index ad09e51..622ea40 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -484,15 +484,15 @@  __cpufreq_cooling_register(struct device_node *np,
        cpufreq_dev->cpufreq_state = 0;
        mutex_lock(&cooling_cpufreq_lock);
 
-       /* Register the notifier for first cpufreq cooling device */
-       if (cpufreq_dev_count == 0)
-               cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
-                                         CPUFREQ_POLICY_NOTIFIER);
        cpufreq_dev_count++;
        list_add(&cpufreq_dev->node, &cpufreq_dev_list);
 
        mutex_unlock(&cooling_cpufreq_lock);
 
+       /* Register the notifier for first cpufreq cooling device */
+       if (cpufreq_dev_count == 0)
+               cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
+                                         CPUFREQ_POLICY_NOTIFIER);
        return cool_dev;
 }