Message ID | 002f01d0186b$2700b730$75022590$%brar@samsung.com |
---|---|
State | New |
Headers | show |
On Mon, Dec 15, 2014 at 06:58:41PM +0530, Yadwinder Singh Brar wrote: > Unfortunately, I didn’t get any such warning though I tested > patch enabling CONFIG_PROVE_LOCKING before posting. It seems > Russell is trying to load imx_thermal as module and parallely > Changing cpufreq governor from userspace, which was not my > test case. No. Yes, imx_thermal is a module, which is loaded by udev very early in the userspace boot. Later on, in the rc.local, I poke the governor from userspace. This is evidenced by the bluetooth modules being loaded after imx_thermal: Module Size Used by bnep 10574 2 rfcomm 33720 0 bluetooth 293598 10 bnep,rfcomm nfsd 90264 0 exportfs 3988 1 nfsd hid_cypress 1763 0 snd_soc_fsl_spdif 9753 2 imx_pcm_dma 1137 1 snd_soc_fsl_spdif imx_sdma 12885 2 imx2_wdt 3248 0 imx_thermal 5386 0 snd_soc_imx_spdif 1877 0 and the timestamp on the bluetooth message: [ 25.503739] Bluetooth: Core ver 2.19 vs the timestamp on the lockdep message: [ 29.499195] [ INFO: possible circular locking dependency detected ] My rc.local does this: # Configure cpufreq echo 1 > /sys/devices/system/cpu/cpufreq/ondemand/io_is_busy echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor > Anyways, after analyzing log and code,I think problem is not > in cpufreq_thermal_notifier which was modified in patch as > stated above. Actual problem is in __cpufreq_cooling_register > which is unnecessarily calling cpufreq_register_notifier() > inside section protected by cooling_cpufreq_lock. > Because cpufreq_policy_notifier_list).rwsem is already held > by store_scaling_governor when __cpufreq_cooling_register is > trying to cpufreq_policy_notifier_list while holding cooling_cpufreq_lock. > > So I think following can fix the problem: > > diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c > index ad09e51..622ea40 100644 > --- a/drivers/thermal/cpu_cooling.c > +++ b/drivers/thermal/cpu_cooling.c > @@ -484,15 +484,15 @@ __cpufreq_cooling_register(struct device_node *np, > cpufreq_dev->cpufreq_state = 0; > mutex_lock(&cooling_cpufreq_lock); > > - /* Register the notifier for first cpufreq cooling device */ > - if (cpufreq_dev_count == 0) > - cpufreq_register_notifier(&thermal_cpufreq_notifier_block, > - CPUFREQ_POLICY_NOTIFIER); > cpufreq_dev_count++; > list_add(&cpufreq_dev->node, &cpufreq_dev_list); > > mutex_unlock(&cooling_cpufreq_lock); > > + /* Register the notifier for first cpufreq cooling device */ > + if (cpufreq_dev_count == 0) > + cpufreq_register_notifier(&thermal_cpufreq_notifier_block, > + CPUFREQ_POLICY_NOTIFIER); No, sorry, this patch is worse. 1. cpufreq_register_notifier() will never be called, even on the first caller to this code: at the point where it's tested for zero, it will be incremented to one by the previous code. 2. What happens if two threads come here? The first thread succeeds in grabbing cooling_cpufreq_lock. The second thread stalls waiting for cooling_cpufreq_lock to be released. The first thread increments cpufreq_dev_count, adds to the list, and then releases the lock. At this point, control may be passed to the second thread (since mutex_unlock() will wake it up.) The second thread gets into the critical region, and increments cpufreq_dev_count again. By the time the first thread runs, cpufreq_dev_count may be two at this point. Unfortunately, you do need some kind of synchronisation here. If it's not important when cpufreq_register_notifier() gets called, maybe this can work: bool register; mutex_lock(&cooling_cpufreq_lock); register = cpufreq_dev_count++ == 0; list_add(&cpufreq_dev->node, &cpufreq_dev_list); mutex_unlock(&cooling_cpufreq_lock); if (register) cpufreq_register_notifier(&thermal_cpufreq_notifier_block, CPUFREQ_POLICY_NOTIFIER); However, I suspect that may well be buggy if we have a cleanup path which unregisters the notifier when cpufreq_dev_count is decremented to zero... which we seem to have in cpufreq_cooling_unregister(). The answer may well be to have finer grained locking here: - one lock to protect cpufreq_dev_list, which is only ever taken when adding or removing entries from it - a second lock to protect cpufreq_dev_count and the calls to cpufreq_register_notifier() and cpufreq_unregister_notifier() and you would /never/ take either of those two locks inside each other. In other words: mutex_lock(&cooling_list_lock); list_add(&cpufreq_dev->node, &cpufreq_dev_list); mutex_unlock(&cooling_list_lock); mutex_lock(&cooling_cpufreq_lock); if (cpufreq_dev_count++ == 0) cpufreq_register_notifier(&thermal_cpufreq_notifier_block, CPUFREQ_POLICY_NOTIFIER); mutex_unlock(&cooling_cpufreq_lock); and similar in the cleanup path. The notifier itself would only ever take the cooling_list_lock.
On 15 December 2014 at 18:58, Yadwinder Singh Brar <yadi.brar@samsung.com> wrote: > Unfortunately, I didn’t get any such warning though I tested > patch enabling CONFIG_PROVE_LOCKING before posting. It seems Even I couldn't reproduce it :) > Anyways, after analyzing log and code,I think problem is not > in cpufreq_thermal_notifier which was modified in patch as > stated above. Actual problem is in __cpufreq_cooling_register > which is unnecessarily calling cpufreq_register_notifier() > inside section protected by cooling_cpufreq_lock. Agreed, and I was looking to reproduce first before writing some crazy code. > Because cpufreq_policy_notifier_list).rwsem is already held > by store_scaling_governor when __cpufreq_cooling_register is > trying to cpufreq_policy_notifier_list while holding cooling_cpufreq_lock. Correct. > So I think following can fix the problem: > > diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c > index ad09e51..622ea40 100644 > --- a/drivers/thermal/cpu_cooling.c > +++ b/drivers/thermal/cpu_cooling.c > @@ -484,15 +484,15 @@ __cpufreq_cooling_register(struct device_node *np, > cpufreq_dev->cpufreq_state = 0; > mutex_lock(&cooling_cpufreq_lock); > > - /* Register the notifier for first cpufreq cooling device */ > - if (cpufreq_dev_count == 0) > - cpufreq_register_notifier(&thermal_cpufreq_notifier_block, > - CPUFREQ_POLICY_NOTIFIER); > cpufreq_dev_count++; > list_add(&cpufreq_dev->node, &cpufreq_dev_list); > > mutex_unlock(&cooling_cpufreq_lock); > > + /* Register the notifier for first cpufreq cooling device */ > + if (cpufreq_dev_count == 0) This will always fail :), over that we need to access cpufreq_dev_count from within the lock. > + cpufreq_register_notifier(&thermal_cpufreq_notifier_block, > + CPUFREQ_POLICY_NOTIFIER); > return cool_dev; > }
> -----Original Message----- > From: Russell King - ARM Linux [mailto:linux@arm.linux.org.uk] > Sent: Monday, December 15, 2014 7:17 PM > To: Yadwinder Singh Brar > Cc: 'Viresh Kumar'; 'Rafael J. Wysocki'; linux-arm- > kernel@lists.infradead.org; linux-pm@vger.kernel.org; 'Eduardo > Valentin' > Subject: Re: 3.18: lockdep problems in cpufreq > > On Mon, Dec 15, 2014 at 06:58:41PM +0530, Yadwinder Singh Brar wrote: > > Unfortunately, I didn’t get any such warning though I tested patch > > enabling CONFIG_PROVE_LOCKING before posting. It seems Russell is > > trying to load imx_thermal as module and parallely Changing cpufreq > > governor from userspace, which was not my test case. > > No. Yes, imx_thermal is a module, which is loaded by udev very early > in the userspace boot. Later on, in the rc.local, I poke the governor > from userspace. > > This is evidenced by the bluetooth modules being loaded after > imx_thermal: > > Module Size Used by > bnep 10574 2 > rfcomm 33720 0 > bluetooth 293598 10 bnep,rfcomm > nfsd 90264 0 > exportfs 3988 1 nfsd > hid_cypress 1763 0 > snd_soc_fsl_spdif 9753 2 > imx_pcm_dma 1137 1 snd_soc_fsl_spdif > imx_sdma 12885 2 > imx2_wdt 3248 0 > imx_thermal 5386 0 > snd_soc_imx_spdif 1877 0 > > and the timestamp on the bluetooth message: > > [ 25.503739] Bluetooth: Core ver 2.19 > > vs the timestamp on the lockdep message: > > [ 29.499195] [ INFO: possible circular locking dependency detected ] > > My rc.local does this: > > # Configure cpufreq > echo 1 > /sys/devices/system/cpu/cpufreq/ondemand/io_is_busy > echo performance > > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor > > > Anyways, after analyzing log and code,I think problem is not in > > cpufreq_thermal_notifier which was modified in patch as stated above. > > Actual problem is in __cpufreq_cooling_register which is > unnecessarily > > calling cpufreq_register_notifier() inside section protected by > > cooling_cpufreq_lock. > > Because cpufreq_policy_notifier_list).rwsem is already held by > > store_scaling_governor when __cpufreq_cooling_register is trying to > > cpufreq_policy_notifier_list while holding cooling_cpufreq_lock. > > > > So I think following can fix the problem: > > > > diff --git a/drivers/thermal/cpu_cooling.c > > b/drivers/thermal/cpu_cooling.c index ad09e51..622ea40 100644 > > --- a/drivers/thermal/cpu_cooling.c > > +++ b/drivers/thermal/cpu_cooling.c > > @@ -484,15 +484,15 @@ __cpufreq_cooling_register(struct device_node > *np, > > cpufreq_dev->cpufreq_state = 0; > > mutex_lock(&cooling_cpufreq_lock); > > > > - /* Register the notifier for first cpufreq cooling device */ > > - if (cpufreq_dev_count == 0) > > - > cpufreq_register_notifier(&thermal_cpufreq_notifier_block, > > - CPUFREQ_POLICY_NOTIFIER); > > cpufreq_dev_count++; > > list_add(&cpufreq_dev->node, &cpufreq_dev_list); > > > > mutex_unlock(&cooling_cpufreq_lock); > > > > + /* Register the notifier for first cpufreq cooling device */ > > + if (cpufreq_dev_count == 0) > > + > cpufreq_register_notifier(&thermal_cpufreq_notifier_block, > > + CPUFREQ_POLICY_NOTIFIER); > > No, sorry, this patch is worse. > Actually it was not patch :P , just moved cpufreq_register_notifier() out of locking, as it is(C&P) to explain my point. > 1. cpufreq_register_notifier() will never be called, even on the first > caller to this code: at the point where it's tested for zero, it > will > be incremented to one by the previous code. > > 2. What happens if two threads come here? > > The first thread succeeds in grabbing cooling_cpufreq_lock. The > second > thread stalls waiting for cooling_cpufreq_lock to be released. > > The first thread increments cpufreq_dev_count, adds to the list, and > then > releases the lock. At this point, control may be passed to the > second > thread (since mutex_unlock() will wake it up.) The second thread > gets > into the critical region, and increments cpufreq_dev_count again. > > By the time the first thread runs, cpufreq_dev_count may be two at > this > point. Yes, may be. > > Unfortunately, you do need some kind of synchronisation here. If it's > not important when cpufreq_register_notifier() gets called, maybe this > can work: > > bool register; > > mutex_lock(&cooling_cpufreq_lock); > register = cpufreq_dev_count++ == 0; > list_add(&cpufreq_dev->node, &cpufreq_dev_list); > mutex_unlock(&cooling_cpufreq_lock); > > if (register) register may be 0 in scenario you stated above in second point. So this approach will not work. > cpufreq_register_notifier(&thermal_cpufreq_notifier_block, > CPUFREQ_POLICY_NOTIFIER); > > However, I suspect that may well be buggy if we have a cleanup path > which unregisters the notifier when cpufreq_dev_count is decremented to > zero... > which we seem to have in cpufreq_cooling_unregister(). > > The answer may well be to have finer grained locking here: > > - one lock to protect cpufreq_dev_list, which is only ever taken when > adding or removing entries from it > > - a second lock to protect cpufreq_dev_count and the calls to > cpufreq_register_notifier() and cpufreq_unregister_notifier() > > and you would /never/ take either of those two locks inside each other. > In other words: > > mutex_lock(&cooling_list_lock); > list_add(&cpufreq_dev->node, &cpufreq_dev_list); > mutex_unlock(&cooling_list_lock); > > mutex_lock(&cooling_cpufreq_lock); > if (cpufreq_dev_count++ == 0) > cpufreq_register_notifier(&thermal_cpufreq_notifier_block, > CPUFREQ_POLICY_NOTIFIER); > mutex_unlock(&cooling_cpufreq_lock); > > and similar in the cleanup path. The notifier itself would only ever > take the cooling_list_lock. > I agree with this approach, if its fine for others also, I can implement and post patch. Thanks, Yadwinder > -- > FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up > according to speedtest.net.
On Mon, Dec 15, 2014 at 08:24:05PM +0530, Yadwinder Singh Brar wrote: > > The answer may well be to have finer grained locking here: > > > > - one lock to protect cpufreq_dev_list, which is only ever taken when > > adding or removing entries from it > > > > - a second lock to protect cpufreq_dev_count and the calls to > > cpufreq_register_notifier() and cpufreq_unregister_notifier() > > > > and you would /never/ take either of those two locks inside each other. > > In other words: > > > > mutex_lock(&cooling_list_lock); > > list_add(&cpufreq_dev->node, &cpufreq_dev_list); > > mutex_unlock(&cooling_list_lock); > > > > mutex_lock(&cooling_cpufreq_lock); > > if (cpufreq_dev_count++ == 0) > > cpufreq_register_notifier(&thermal_cpufreq_notifier_block, > > CPUFREQ_POLICY_NOTIFIER); > > mutex_unlock(&cooling_cpufreq_lock); > > > > and similar in the cleanup path. The notifier itself would only ever > > take the cooling_list_lock. > > > > I agree with this approach, if its fine for others also, I can implement > and post patch. Yes, please do.
On Monday, December 15, 2014 05:43:36 PM Russell King - ARM Linux wrote: > On Mon, Dec 15, 2014 at 08:24:05PM +0530, Yadwinder Singh Brar wrote: > > > The answer may well be to have finer grained locking here: > > > > > > - one lock to protect cpufreq_dev_list, which is only ever taken when > > > adding or removing entries from it > > > > > > - a second lock to protect cpufreq_dev_count and the calls to > > > cpufreq_register_notifier() and cpufreq_unregister_notifier() > > > > > > and you would /never/ take either of those two locks inside each other. > > > In other words: > > > > > > mutex_lock(&cooling_list_lock); > > > list_add(&cpufreq_dev->node, &cpufreq_dev_list); > > > mutex_unlock(&cooling_list_lock); > > > > > > mutex_lock(&cooling_cpufreq_lock); > > > if (cpufreq_dev_count++ == 0) > > > cpufreq_register_notifier(&thermal_cpufreq_notifier_block, > > > CPUFREQ_POLICY_NOTIFIER); > > > mutex_unlock(&cooling_cpufreq_lock); > > > > > > and similar in the cleanup path. The notifier itself would only ever > > > take the cooling_list_lock. > > > > > > > I agree with this approach, if its fine for others also, I can implement > > and post patch. > > Yes, please do. Indeed. We have a regression here to fix. Rafael
On 15 December 2014 at 20:24, Yadwinder Singh Brar <yadi.brar@samsung.com> wrote: >> Unfortunately, you do need some kind of synchronisation here. If it's >> not important when cpufreq_register_notifier() gets called, maybe this >> can work: >> >> bool register; >> >> mutex_lock(&cooling_cpufreq_lock); >> register = cpufreq_dev_count++ == 0; >> list_add(&cpufreq_dev->node, &cpufreq_dev_list); >> mutex_unlock(&cooling_cpufreq_lock); >> >> if (register) > > register may be 0 in scenario you stated above in second point. > So this approach will not work. I didn't understood what you meant here. register will be zero only for one of the threads.
On 16 December 2014 at 04:39, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > Well, here's a patch which I'm running on top of 3.18 at the moment, > which is basically what I described in my email, and I'm running with it > and it is without any lockdep complaint. We need two separate patches now, one for 3.18 and other one for 3.19-rc. 3.19 has see lots of changes in this particular file and so we need to change few things here. > 8<=== > From: Russell King <rmk+kernel@arm.linux.org.uk> > thermal: cpu_cooling: fix lockdep problems in cpu_cooling > > A recent change to the cpu_cooling code introduced a AB-BA deadlock > scenario between the cpufreq_policy_notifier_list rwsem and the > cooling_cpufreq_lock. This is caused by cooling_cpufreq_lock being held > before the registration/removal of the notifier block (an operation > which takes the rwsem), and the notifier code itself which takes the > locks in the reverse order. > > Solve this by moving to finer grained locking - use one mutex to protect > the cpufreq_dev_list as a whole, and a separate lock to ensure correct > ordering of cpufreq notifier registration and removal. > > I considered taking the cooling_list_lock within cooling_cpufreq_lock to > protect the registration sequence as a whole, but that adds a dependency > between these two locks which is best avoided (lest someone tries to > take those two new locks in the reverse order.) In any case, it's safer > to have an empty cpufreq_dev_list than to have unnecessary dependencies > between locks. > > Fixes: 2dcd851fe4b4 ("thermal: cpu_cooling: Update always cpufreq policy with thermal constraints") > Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> > --- > > drivers/thermal/cpu_cooling.c | 16 ++++++++++++---- > 1 file changed, 12 insertions(+), 4 deletions(-) > > diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c > index ad09e51ffae4..9e42c6f30785 100644 > --- a/drivers/thermal/cpu_cooling.c > +++ b/drivers/thermal/cpu_cooling.c > @@ -57,6 +57,7 @@ static DEFINE_MUTEX(cooling_cpufreq_lock); > > static unsigned int cpufreq_dev_count; > > +static DEFINE_MUTEX(cooling_list_lock); > static LIST_HEAD(cpufreq_dev_list); > > /** > @@ -317,7 +318,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb, > if (event != CPUFREQ_ADJUST) > return 0; > > - mutex_lock(&cooling_cpufreq_lock); > + mutex_lock(&cooling_list_lock); > list_for_each_entry(cpufreq_dev, &cpufreq_dev_list, node) { > if (!cpumask_test_cpu(policy->cpu, > &cpufreq_dev->allowed_cpus)) > @@ -333,7 +334,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb, > if (policy->max != max_freq) > cpufreq_verify_within_limits(policy, 0, max_freq); > } > - mutex_unlock(&cooling_cpufreq_lock); > + mutex_unlock(&cooling_list_lock); > > return 0; > } > @@ -482,6 +483,11 @@ __cpufreq_cooling_register(struct device_node *np, > } > cpufreq_dev->cool_dev = cool_dev; > cpufreq_dev->cpufreq_state = 0; > + > + mutex_lock(&cooling_list_lock); > + list_add(&cpufreq_dev->node, &cpufreq_dev_list); > + mutex_unlock(&cooling_list_lock); > + > mutex_lock(&cooling_cpufreq_lock); > > /* Register the notifier for first cpufreq cooling device */ > @@ -489,7 +495,6 @@ __cpufreq_cooling_register(struct device_node *np, > cpufreq_register_notifier(&thermal_cpufreq_notifier_block, > CPUFREQ_POLICY_NOTIFIER); > cpufreq_dev_count++; > - list_add(&cpufreq_dev->node, &cpufreq_dev_list); > > mutex_unlock(&cooling_cpufreq_lock); > > @@ -553,7 +558,6 @@ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev) > > cpufreq_dev = cdev->devdata; > mutex_lock(&cooling_cpufreq_lock); > - list_del(&cpufreq_dev->node); > cpufreq_dev_count--; > > /* Unregister the notifier for the last cpufreq cooling device */ > @@ -562,6 +566,10 @@ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev) > CPUFREQ_POLICY_NOTIFIER); > mutex_unlock(&cooling_cpufreq_lock); > > + mutex_lock(&cooling_list_lock); > + list_del(&cpufreq_dev->node); > + mutex_unlock(&cooling_list_lock); > + > thermal_cooling_device_unregister(cpufreq_dev->cool_dev); > release_idr(&cpufreq_idr, cpufreq_dev->id); > kfree(cpufreq_dev); For 3.18 Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
On Tue, Dec 16, 2014 at 09:11:53AM +0530, Viresh Kumar wrote: > On 16 December 2014 at 04:39, Russell King - ARM Linux > <linux@arm.linux.org.uk> wrote: > > Well, here's a patch which I'm running on top of 3.18 at the moment, > > which is basically what I described in my email, and I'm running with it > > and it is without any lockdep complaint. > > We need two separate patches now, one for 3.18 and other one for 3.19-rc. > 3.19 has see lots of changes in this particular file and so we need to > change few things here. I'm sorry, I don't have the bandwidth to create a patch for 3.19-rc right now.
On Tue, Dec 16, 2014 at 09:11:53AM +0530, Viresh Kumar wrote: > On 16 December 2014 at 04:39, Russell King - ARM Linux > <linux@arm.linux.org.uk> wrote: > > Well, here's a patch which I'm running on top of 3.18 at the moment, > > which is basically what I described in my email, and I'm running with it > > and it is without any lockdep complaint. > > We need two separate patches now, one for 3.18 and other one for 3.19-rc. > 3.19 has see lots of changes in this particular file and so we need to > change few things here. What happened with this? I'm still carrying the patch. > > 8<=== > > From: Russell King <rmk+kernel@arm.linux.org.uk> > > thermal: cpu_cooling: fix lockdep problems in cpu_cooling > > > > A recent change to the cpu_cooling code introduced a AB-BA deadlock > > scenario between the cpufreq_policy_notifier_list rwsem and the > > cooling_cpufreq_lock. This is caused by cooling_cpufreq_lock being held > > before the registration/removal of the notifier block (an operation > > which takes the rwsem), and the notifier code itself which takes the > > locks in the reverse order. > > > > Solve this by moving to finer grained locking - use one mutex to protect > > the cpufreq_dev_list as a whole, and a separate lock to ensure correct > > ordering of cpufreq notifier registration and removal. > > > > I considered taking the cooling_list_lock within cooling_cpufreq_lock to > > protect the registration sequence as a whole, but that adds a dependency > > between these two locks which is best avoided (lest someone tries to > > take those two new locks in the reverse order.) In any case, it's safer > > to have an empty cpufreq_dev_list than to have unnecessary dependencies > > between locks. > > > > Fixes: 2dcd851fe4b4 ("thermal: cpu_cooling: Update always cpufreq policy with thermal constraints") > > Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> > > --- > > > > drivers/thermal/cpu_cooling.c | 16 ++++++++++++---- > > 1 file changed, 12 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c > > index ad09e51ffae4..9e42c6f30785 100644 > > --- a/drivers/thermal/cpu_cooling.c > > +++ b/drivers/thermal/cpu_cooling.c > > @@ -57,6 +57,7 @@ static DEFINE_MUTEX(cooling_cpufreq_lock); > > > > static unsigned int cpufreq_dev_count; > > > > +static DEFINE_MUTEX(cooling_list_lock); > > static LIST_HEAD(cpufreq_dev_list); > > > > /** > > @@ -317,7 +318,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb, > > if (event != CPUFREQ_ADJUST) > > return 0; > > > > - mutex_lock(&cooling_cpufreq_lock); > > + mutex_lock(&cooling_list_lock); > > list_for_each_entry(cpufreq_dev, &cpufreq_dev_list, node) { > > if (!cpumask_test_cpu(policy->cpu, > > &cpufreq_dev->allowed_cpus)) > > @@ -333,7 +334,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb, > > if (policy->max != max_freq) > > cpufreq_verify_within_limits(policy, 0, max_freq); > > } > > - mutex_unlock(&cooling_cpufreq_lock); > > + mutex_unlock(&cooling_list_lock); > > > > return 0; > > } > > @@ -482,6 +483,11 @@ __cpufreq_cooling_register(struct device_node *np, > > } > > cpufreq_dev->cool_dev = cool_dev; > > cpufreq_dev->cpufreq_state = 0; > > + > > + mutex_lock(&cooling_list_lock); > > + list_add(&cpufreq_dev->node, &cpufreq_dev_list); > > + mutex_unlock(&cooling_list_lock); > > + > > mutex_lock(&cooling_cpufreq_lock); > > > > /* Register the notifier for first cpufreq cooling device */ > > @@ -489,7 +495,6 @@ __cpufreq_cooling_register(struct device_node *np, > > cpufreq_register_notifier(&thermal_cpufreq_notifier_block, > > CPUFREQ_POLICY_NOTIFIER); > > cpufreq_dev_count++; > > - list_add(&cpufreq_dev->node, &cpufreq_dev_list); > > > > mutex_unlock(&cooling_cpufreq_lock); > > > > @@ -553,7 +558,6 @@ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev) > > > > cpufreq_dev = cdev->devdata; > > mutex_lock(&cooling_cpufreq_lock); > > - list_del(&cpufreq_dev->node); > > cpufreq_dev_count--; > > > > /* Unregister the notifier for the last cpufreq cooling device */ > > @@ -562,6 +566,10 @@ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev) > > CPUFREQ_POLICY_NOTIFIER); > > mutex_unlock(&cooling_cpufreq_lock); > > > > + mutex_lock(&cooling_list_lock); > > + list_del(&cpufreq_dev->node); > > + mutex_unlock(&cooling_list_lock); > > + > > thermal_cooling_device_unregister(cpufreq_dev->cool_dev); > > release_idr(&cpufreq_idr, cpufreq_dev->id); > > kfree(cpufreq_dev); > > For 3.18 > > Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
On Monday, May 18, 2015 07:56:45 PM Russell King - ARM Linux wrote: > On Tue, Dec 16, 2014 at 09:11:53AM +0530, Viresh Kumar wrote: > > On 16 December 2014 at 04:39, Russell King - ARM Linux > > <linux@arm.linux.org.uk> wrote: > > > Well, here's a patch which I'm running on top of 3.18 at the moment, > > > which is basically what I described in my email, and I'm running with it > > > and it is without any lockdep complaint. > > > > We need two separate patches now, one for 3.18 and other one for 3.19-rc. > > 3.19 has see lots of changes in this particular file and so we need to > > change few things here. > > What happened with this? I'm still carrying the patch. This should go in through the thermal tree. Eduardo? > > > > 8<=== > > > From: Russell King <rmk+kernel@arm.linux.org.uk> > > > thermal: cpu_cooling: fix lockdep problems in cpu_cooling > > > > > > A recent change to the cpu_cooling code introduced a AB-BA deadlock > > > scenario between the cpufreq_policy_notifier_list rwsem and the > > > cooling_cpufreq_lock. This is caused by cooling_cpufreq_lock being held > > > before the registration/removal of the notifier block (an operation > > > which takes the rwsem), and the notifier code itself which takes the > > > locks in the reverse order. > > > > > > Solve this by moving to finer grained locking - use one mutex to protect > > > the cpufreq_dev_list as a whole, and a separate lock to ensure correct > > > ordering of cpufreq notifier registration and removal. > > > > > > I considered taking the cooling_list_lock within cooling_cpufreq_lock to > > > protect the registration sequence as a whole, but that adds a dependency > > > between these two locks which is best avoided (lest someone tries to > > > take those two new locks in the reverse order.) In any case, it's safer > > > to have an empty cpufreq_dev_list than to have unnecessary dependencies > > > between locks. > > > > > > Fixes: 2dcd851fe4b4 ("thermal: cpu_cooling: Update always cpufreq policy with thermal constraints") > > > Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> > > > --- > > > > > > drivers/thermal/cpu_cooling.c | 16 ++++++++++++---- > > > 1 file changed, 12 insertions(+), 4 deletions(-) > > > > > > diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c > > > index ad09e51ffae4..9e42c6f30785 100644 > > > --- a/drivers/thermal/cpu_cooling.c > > > +++ b/drivers/thermal/cpu_cooling.c > > > @@ -57,6 +57,7 @@ static DEFINE_MUTEX(cooling_cpufreq_lock); > > > > > > static unsigned int cpufreq_dev_count; > > > > > > +static DEFINE_MUTEX(cooling_list_lock); > > > static LIST_HEAD(cpufreq_dev_list); > > > > > > /** > > > @@ -317,7 +318,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb, > > > if (event != CPUFREQ_ADJUST) > > > return 0; > > > > > > - mutex_lock(&cooling_cpufreq_lock); > > > + mutex_lock(&cooling_list_lock); > > > list_for_each_entry(cpufreq_dev, &cpufreq_dev_list, node) { > > > if (!cpumask_test_cpu(policy->cpu, > > > &cpufreq_dev->allowed_cpus)) > > > @@ -333,7 +334,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb, > > > if (policy->max != max_freq) > > > cpufreq_verify_within_limits(policy, 0, max_freq); > > > } > > > - mutex_unlock(&cooling_cpufreq_lock); > > > + mutex_unlock(&cooling_list_lock); > > > > > > return 0; > > > } > > > @@ -482,6 +483,11 @@ __cpufreq_cooling_register(struct device_node *np, > > > } > > > cpufreq_dev->cool_dev = cool_dev; > > > cpufreq_dev->cpufreq_state = 0; > > > + > > > + mutex_lock(&cooling_list_lock); > > > + list_add(&cpufreq_dev->node, &cpufreq_dev_list); > > > + mutex_unlock(&cooling_list_lock); > > > + > > > mutex_lock(&cooling_cpufreq_lock); > > > > > > /* Register the notifier for first cpufreq cooling device */ > > > @@ -489,7 +495,6 @@ __cpufreq_cooling_register(struct device_node *np, > > > cpufreq_register_notifier(&thermal_cpufreq_notifier_block, > > > CPUFREQ_POLICY_NOTIFIER); > > > cpufreq_dev_count++; > > > - list_add(&cpufreq_dev->node, &cpufreq_dev_list); > > > > > > mutex_unlock(&cooling_cpufreq_lock); > > > > > > @@ -553,7 +558,6 @@ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev) > > > > > > cpufreq_dev = cdev->devdata; > > > mutex_lock(&cooling_cpufreq_lock); > > > - list_del(&cpufreq_dev->node); > > > cpufreq_dev_count--; > > > > > > /* Unregister the notifier for the last cpufreq cooling device */ > > > @@ -562,6 +566,10 @@ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev) > > > CPUFREQ_POLICY_NOTIFIER); > > > mutex_unlock(&cooling_cpufreq_lock); > > > > > > + mutex_lock(&cooling_list_lock); > > > + list_del(&cpufreq_dev->node); > > > + mutex_unlock(&cooling_list_lock); > > > + > > > thermal_cooling_device_unregister(cpufreq_dev->cool_dev); > > > release_idr(&cpufreq_idr, cpufreq_dev->id); > > > kfree(cpufreq_dev); > > > > For 3.18 > > > > Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> > >
On Tue, May 19, 2015 at 12:05:55AM +0200, Rafael J. Wysocki wrote: > On Monday, May 18, 2015 07:56:45 PM Russell King - ARM Linux wrote: > > On Tue, Dec 16, 2014 at 09:11:53AM +0530, Viresh Kumar wrote: > > > On 16 December 2014 at 04:39, Russell King - ARM Linux > > > <linux@arm.linux.org.uk> wrote: > > > > Well, here's a patch which I'm running on top of 3.18 at the moment, > > > > which is basically what I described in my email, and I'm running with it > > > > and it is without any lockdep complaint. > > > > > > We need two separate patches now, one for 3.18 and other one for 3.19-rc. > > > 3.19 has see lots of changes in this particular file and so we need to > > > change few things here. > > > > What happened with this? I'm still carrying the patch. > > This should go in through the thermal tree. Eduardo? Having waited a long time for any kind of response from Eduardo, I've given up. My conclusion is that Eduardo isn't interested in this. I've re-checked, and the AB-BA deadlock is still there in the latest code. So, I've taken it upon myself to throw this into my for-next branch to force the issue - not something I _want_ to do, but I'm doing this out of frustration. It's clear to me that "playing nice" by email does _not_ work with some people. I'm rather hoping that Stephen reports a merge conflict with linux-next this evening to highlight this situation. I've added additional commentry to the commit message on the patch giving the reason why I've done this, and the relevant message IDs showing the past history. I've not decided whether I'm going to ask Linus to take this patch directly or not, that rather depends whether there's any co-operation from Eduardo on this. I'd rather Eduardo took the patch. The patch I have has had to be updated again for changes to the driver, but I really don't see the point of re-posting it just for it to be ignored yet again. I'm really disappointed by this dysfunctional state of affairs, and that what should be an urgent fix for an observable problem is still not merged some nine months after it was first identified.
On 11-08-15, 18:03, Russell King - ARM Linux wrote: > On Tue, May 19, 2015 at 12:05:55AM +0200, Rafael J. Wysocki wrote: > > On Monday, May 18, 2015 07:56:45 PM Russell King - ARM Linux wrote: > > > On Tue, Dec 16, 2014 at 09:11:53AM +0530, Viresh Kumar wrote: > > > > On 16 December 2014 at 04:39, Russell King - ARM Linux > > > > <linux@arm.linux.org.uk> wrote: > > > > > Well, here's a patch which I'm running on top of 3.18 at the moment, > > > > > which is basically what I described in my email, and I'm running with it > > > > > and it is without any lockdep complaint. > > > > > > > > We need two separate patches now, one for 3.18 and other one for 3.19-rc. > > > > 3.19 has see lots of changes in this particular file and so we need to > > > > change few things here. Was the patch for latest kernel ever circulated ? Probably that's why Eduardo never got back to it.
On Wed, Aug 12, 2015 at 10:46:59AM +0530, Viresh Kumar wrote: > On 11-08-15, 18:03, Russell King - ARM Linux wrote: > > On Tue, May 19, 2015 at 12:05:55AM +0200, Rafael J. Wysocki wrote: > > > On Monday, May 18, 2015 07:56:45 PM Russell King - ARM Linux wrote: > > > > On Tue, Dec 16, 2014 at 09:11:53AM +0530, Viresh Kumar wrote: > > > > > On 16 December 2014 at 04:39, Russell King - ARM Linux > > > > > <linux@arm.linux.org.uk> wrote: > > > > > > Well, here's a patch which I'm running on top of 3.18 at the moment, > > > > > > which is basically what I described in my email, and I'm running with it > > > > > > and it is without any lockdep complaint. > > > > > > > > > > We need two separate patches now, one for 3.18 and other one for 3.19-rc. > > > > > 3.19 has see lots of changes in this particular file and so we need to > > > > > change few things here. > > Was the patch for latest kernel ever circulated ? Probably that's why > Eduardo never got back to it. Maybe not - I often lose track of which patches are in what state - tracking the state of 250+ patches over many months and in some cases _years_ is not easy. That's still no reason for Eduardo to _completely_ ignore this thread, especially when Rafael explicitly tried to get comment from Eduardo.
On 12-08-15, 08:21, Russell King - ARM Linux wrote: > Maybe not - I often lose track of which patches are in what state - > tracking the state of 250+ patches over many months and in some cases > _years_ is not easy. I agree.. > That's still no reason for Eduardo to _completely_ ignore this thread, > especially when Rafael explicitly tried to get comment from Eduardo. Oh, no. I am not trying to be anybody's advocate here. I was just highlighting the point here, that I remembered. Even I was quite busy then and couldn't get back to it. Lets fix this now. I believe you have a patch for this based on latest kernel? Mind pasting that here for quick review?
On Wed, Aug 12, 2015 at 01:05:30PM +0530, Viresh Kumar wrote: > On 12-08-15, 08:21, Russell King - ARM Linux wrote: > > Maybe not - I often lose track of which patches are in what state - > > tracking the state of 250+ patches over many months and in some cases > > _years_ is not easy. > > I agree.. > > > That's still no reason for Eduardo to _completely_ ignore this thread, > > especially when Rafael explicitly tried to get comment from Eduardo. > > Oh, no. I am not trying to be anybody's advocate here. I was just > highlighting the point here, that I remembered. > > Even I was quite busy then and couldn't get back to it. > > Lets fix this now. I believe you have a patch for this based on latest > kernel? Mind pasting that here for quick review? I don't think it's changed much since the original, but I've posted what's in linux-next as of today, and it doesn't conflict with anything. So there's no excuse about getting it merged. The problem will be back-porting it to stable kernels, because I think it's had to be updated at least a couple of times. I don't have the old versions anymore, so I'm just going to say "not my problem" - sorry.
On 12-08-15, 08:41, Russell King wrote: > diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c > /** > @@ -221,7 +224,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb, > switch (event) { > > case CPUFREQ_ADJUST: > - mutex_lock(&cooling_cpufreq_lock); > + mutex_lock(&cooling_list_lock); There is one more place where the list's locking needs update: cpufreq_cooling_get_level(). > list_for_each_entry(cpufreq_dev, &cpufreq_dev_list, node) { > if (!cpumask_test_cpu(policy->cpu, > &cpufreq_dev->allowed_cpus)) > @@ -233,7 +236,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb, > cpufreq_verify_within_limits(policy, 0, > max_freq); > } > - mutex_unlock(&cooling_cpufreq_lock); > + mutex_unlock(&cooling_list_lock); > break; > default: > return NOTIFY_DONE; > @@ -865,12 +868,15 @@ __cpufreq_cooling_register(struct device_node *np, > cpufreq_dev->cool_dev = cool_dev; > > mutex_lock(&cooling_cpufreq_lock); > + mutex_lock(&cooling_list_lock); Why is the list lock taken from within the existing lock here? and ... > + list_add(&cpufreq_dev->node, &cpufreq_dev_list); > + mutex_unlock(&cooling_list_lock); > > /* Register the notifier for first cpufreq cooling device */ > - if (list_empty(&cpufreq_dev_list)) > + if (cpufreq_dev_count == 0) > cpufreq_register_notifier(&thermal_cpufreq_notifier_block, > CPUFREQ_POLICY_NOTIFIER); > - list_add(&cpufreq_dev->node, &cpufreq_dev_list); > + cpufreq_dev_count++; Maybe: if (!cpufreq_dev_count++) cpufreq_register_notifier(); > > mutex_unlock(&cooling_cpufreq_lock); > > @@ -1014,14 +1020,18 @@ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev) > > cpufreq_dev = cdev->devdata; > mutex_lock(&cooling_cpufreq_lock); > - list_del(&cpufreq_dev->node); > + cpufreq_dev_count--; > > /* Unregister the notifier for the last cpufreq cooling device */ > - if (list_empty(&cpufreq_dev_list)) > + if (cpufreq_dev_count == 0) Maybe: if (!--cpufreq_dev_count) cpufreq_register_notifier(); > cpufreq_unregister_notifier(&thermal_cpufreq_notifier_block, > CPUFREQ_POLICY_NOTIFIER); > mutex_unlock(&cooling_cpufreq_lock); > > + mutex_lock(&cooling_list_lock); ... The same list lock is not taken from within the earlier critical section? > + list_del(&cpufreq_dev->node); > + mutex_unlock(&cooling_list_lock); > + > thermal_cooling_device_unregister(cpufreq_dev->cool_dev); > release_idr(&cpufreq_idr, cpufreq_dev->id); > kfree(cpufreq_dev->time_in_idle_timestamp); > -- > 2.1.0
On Wed, Aug 12, 2015 at 01:41:43PM +0530, Viresh Kumar wrote: > On 12-08-15, 08:41, Russell King wrote: > > diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c > > /** > > @@ -221,7 +224,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb, > > switch (event) { > > > > case CPUFREQ_ADJUST: > > - mutex_lock(&cooling_cpufreq_lock); > > + mutex_lock(&cooling_list_lock); > > There is one more place where the list's locking needs update: > cpufreq_cooling_get_level(). > > > list_for_each_entry(cpufreq_dev, &cpufreq_dev_list, node) { > > if (!cpumask_test_cpu(policy->cpu, > > &cpufreq_dev->allowed_cpus)) > > @@ -233,7 +236,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb, > > cpufreq_verify_within_limits(policy, 0, > > max_freq); > > } > > - mutex_unlock(&cooling_cpufreq_lock); > > + mutex_unlock(&cooling_list_lock); > > break; > > default: > > return NOTIFY_DONE; > > @@ -865,12 +868,15 @@ __cpufreq_cooling_register(struct device_node *np, > > cpufreq_dev->cool_dev = cool_dev; > > > > mutex_lock(&cooling_cpufreq_lock); > > + mutex_lock(&cooling_list_lock); > > Why is the list lock taken from within the existing lock here? and ... To preserve the behaviour of the code, which is to atomically add to the list and register the notifier. > > + list_add(&cpufreq_dev->node, &cpufreq_dev_list); > > + mutex_unlock(&cooling_list_lock); > > > > /* Register the notifier for first cpufreq cooling device */ > > - if (list_empty(&cpufreq_dev_list)) > > + if (cpufreq_dev_count == 0) > > cpufreq_register_notifier(&thermal_cpufreq_notifier_block, > > CPUFREQ_POLICY_NOTIFIER); > > - list_add(&cpufreq_dev->node, &cpufreq_dev_list); > > + cpufreq_dev_count++; > > Maybe: > if (!cpufreq_dev_count++) > cpufreq_register_notifier(); Possibly, had the code previously been like that. See: commit 2479bb6443d6a793f896219a34bfab0cc410f0b4 Author: Viresh Kumar <viresh.kumar@linaro.org> Date: Thu Dec 4 09:42:04 2014 +0530 which post-dates this patch, and removes this "style" in favour of using the broken list mechanism. Frankly... you don't have a leg to stand on because _you_ knew about this problem, and you saw my patch. So removing something that my patch to fix a serious bug relied upon... You know what? At this point, I'm just not going to bother. You fix this lockdep problem, I'm obviously useless at kernel stuff.
On Wed, Aug 12, 2015 at 01:42:57PM +0530, Viresh Kumar wrote: > On 12-08-15, 08:49, Russell King - ARM Linux wrote: > > The problem will be back-porting it to stable kernels, because I think > > it's had to be updated at least a couple of times. I don't have the old > > versions anymore, so I'm just going to say "not my problem" - sorry. > > Your old 3.18 version :) And now generate versions for all the stable kernels inbetween. Now your problem. I'm sick of this.
On 12-08-15, 10:08, Russell King - ARM Linux wrote: > Possibly, had the code previously been like that. See: > > commit 2479bb6443d6a793f896219a34bfab0cc410f0b4 > Author: Viresh Kumar <viresh.kumar@linaro.org> > Date: Thu Dec 4 09:42:04 2014 +0530 > > which post-dates this patch, and removes this "style" in favour of using > the broken list mechanism. Just to make sure I understand it properly, do you want to say that I added 2479bb6 patch after you reported this problem & your patch? If yes, then I don't see how my patch came after yours. Mine was written on 4/Dec/2014 and committed on 8/Dec/2014 and you first reported the problem on 14/Dec/2014. > Frankly... you don't have a leg to stand on I can accept that :) > because _you_ knew about this problem, and you saw my patch. So removing > something that my patch to fix a serious bug relied upon... I don't think I removed that stuff after your patch. > You know what? At this point, I'm just not going to bother. You fix > this lockdep problem, I'm obviously useless at kernel stuff. Okay, leave it on me. I will re-circulate your patch.
On 12-08-15, 10:08, Russell King - ARM Linux wrote: > And now generate versions for all the stable kernels inbetween. Now > your problem. I'm sick of this. Haha.. There wouldn't be many. 3.19 got large number of changes (from me) and so we required a newer version. But after that should be somewhat stable.
On Tuesday, August 11, 2015 06:03:57 PM Russell King - ARM Linux wrote: > On Tue, May 19, 2015 at 12:05:55AM +0200, Rafael J. Wysocki wrote: > > On Monday, May 18, 2015 07:56:45 PM Russell King - ARM Linux wrote: > > > On Tue, Dec 16, 2014 at 09:11:53AM +0530, Viresh Kumar wrote: > > > > On 16 December 2014 at 04:39, Russell King - ARM Linux > > > > <linux@arm.linux.org.uk> wrote: > > > > > Well, here's a patch which I'm running on top of 3.18 at the moment, > > > > > which is basically what I described in my email, and I'm running with it > > > > > and it is without any lockdep complaint. > > > > > > > > We need two separate patches now, one for 3.18 and other one for 3.19-rc. > > > > 3.19 has see lots of changes in this particular file and so we need to > > > > change few things here. > > > > > > What happened with this? I'm still carrying the patch. > > > > This should go in through the thermal tree. Eduardo? > > Having waited a long time for any kind of response from Eduardo, I've > given up. My conclusion is that Eduardo isn't interested in this. > > I've re-checked, and the AB-BA deadlock is still there in the latest > code. So, I've taken it upon myself to throw this into my for-next > branch to force the issue - not something I _want_ to do, but I'm doing > this out of frustration. It's clear to me that "playing nice" by email > does _not_ work with some people. > > I'm rather hoping that Stephen reports a merge conflict with linux-next > this evening to highlight this situation. I've added additional commentry > to the commit message on the patch giving the reason why I've done this, > and the relevant message IDs showing the past history. > > I've not decided whether I'm going to ask Linus to take this patch > directly or not, that rather depends whether there's any co-operation > from Eduardo on this. I'd rather Eduardo took the patch. > > The patch I have has had to be updated again for changes to the driver, > but I really don't see the point of re-posting it just for it to be > ignored yet again. > > I'm really disappointed by this dysfunctional state of affairs, and > that what should be an urgent fix for an observable problem is still > not merged some nine months after it was first identified. I guess it might help if you sent the updated patch in a new thread.
On Thu, Aug 13, 2015 at 03:20:35AM +0200, Rafael J. Wysocki wrote: > On Tuesday, August 11, 2015 06:03:57 PM Russell King - ARM Linux wrote: > > On Tue, May 19, 2015 at 12:05:55AM +0200, Rafael J. Wysocki wrote: > > > On Monday, May 18, 2015 07:56:45 PM Russell King - ARM Linux wrote: > > > > On Tue, Dec 16, 2014 at 09:11:53AM +0530, Viresh Kumar wrote: > > > > > On 16 December 2014 at 04:39, Russell King - ARM Linux > > > > > <linux@arm.linux.org.uk> wrote: > > > > > > Well, here's a patch which I'm running on top of 3.18 at the moment, > > > > > > which is basically what I described in my email, and I'm running with it > > > > > > and it is without any lockdep complaint. > > > > > > > > > > We need two separate patches now, one for 3.18 and other one for 3.19-rc. > > > > > 3.19 has see lots of changes in this particular file and so we need to > > > > > change few things here. > > > > > > > > What happened with this? I'm still carrying the patch. > > > > > > This should go in through the thermal tree. Eduardo? > > > > Having waited a long time for any kind of response from Eduardo, I've > > given up. My conclusion is that Eduardo isn't interested in this. > > > > I've re-checked, and the AB-BA deadlock is still there in the latest > > code. So, I've taken it upon myself to throw this into my for-next > > branch to force the issue - not something I _want_ to do, but I'm doing > > this out of frustration. It's clear to me that "playing nice" by email > > does _not_ work with some people. > > > > I'm rather hoping that Stephen reports a merge conflict with linux-next > > this evening to highlight this situation. I've added additional commentry > > to the commit message on the patch giving the reason why I've done this, > > and the relevant message IDs showing the past history. > > > > I've not decided whether I'm going to ask Linus to take this patch > > directly or not, that rather depends whether there's any co-operation > > from Eduardo on this. I'd rather Eduardo took the patch. > > > > The patch I have has had to be updated again for changes to the driver, > > but I really don't see the point of re-posting it just for it to be > > ignored yet again. > > > > I'm really disappointed by this dysfunctional state of affairs, and > > that what should be an urgent fix for an observable problem is still > > not merged some nine months after it was first identified. > > I guess it might help if you sent the updated patch in a new thread. That I doubt. Eduardo has not bothered to reply at _any_ time. I have to question whether there is anyone even reading that email address, or whether it's a redirect to /dev/null. All the evidence I have right now is that this Eduardo is a ficticous character. I would have at least expected some complaints when I said "I've put it in linux-next" but... absolutely nothing. So... my only conclusion is that you're all pulling my leg that there _is_ this "Eduardo" maintainer who's supposed to be taking patches for this stuff. As I've said, I'm not bothering with this anymore, it's just far too much effort to play these stupid games. The deadlock can stay for all I care. Sorry.
On 13-08-15, 09:17, Russell King - ARM Linux wrote: > That I doubt. Eduardo has not bothered to reply at _any_ time. I > have to question whether there is anyone even reading that email > address, or whether it's a redirect to /dev/null. All the evidence I > have right now is that this Eduardo is a ficticous character. > > I would have at least expected some complaints when I said "I've put > it in linux-next" but... absolutely nothing. > > So... my only conclusion is that you're all pulling my leg that there > _is_ this "Eduardo" maintainer who's supposed to be taking patches for > this stuff. Haha, If that's the case then we are all in the same boat. Who is Eduardo ? A bot ? :) Anyway, AFAIK, he wasn't around for sometime and came back last week only. Even I have expected him to reply to yesterday's discussions. > As I've said, I'm not bothering with this anymore, it's just far too > much effort to play these stupid games. The deadlock can stay for all > I care. And finally I took the charge to carry it now. :)
Hi Russell, On Wed, Aug 12, 2015 at 08:41:49AM +0100, Russell King wrote: <cut> > > For some reason, despite Rafael saying that this should go in, despite it > having been reviewed, Eduardo appears to be ignoring basic fixes to code > under his control. Having waited more than a reasonable amount of time, > it is now time to take some action against this blockage. Hence, I'm > putting this into linux-next in the hope that it'll conflict and elicit > a response for this important bug fix. Sorry if I let this one to fall into the cracks for this long. I am looking at this now and queuing as soon as possible. Thanks for pushing this fix. BR, Eduardo > > Relevant message IDs: > 20141214213655.GA11285@n2100.arm.linux.org.uk > 7573578.UE8ufgjWuX@vostro.rjw.lan > 20141215230922.GL11285@n2100.arm.linux.org.uk > CAKohponhkk9BK6-7ScHeZa4R43iooWQFV8YoPLv6LjO40GNq=A@mail.gmail.com > 20150518185645.GA28053@n2100.arm.linux.org.uk > 2574268.XBqpdL2VLI@vostro.rjw.lan > --- > drivers/thermal/cpu_cooling.c | 22 ++++++++++++++++------ > 1 file changed, 16 insertions(+), 6 deletions(-) > > diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c > index 6509c61b9648..d1fd02eec58d 100644 > --- a/drivers/thermal/cpu_cooling.c > +++ b/drivers/thermal/cpu_cooling.c > @@ -107,6 +107,9 @@ struct cpufreq_cooling_device { > static DEFINE_IDR(cpufreq_idr); > static DEFINE_MUTEX(cooling_cpufreq_lock); > > +static unsigned int cpufreq_dev_count; > + > +static DEFINE_MUTEX(cooling_list_lock); > static LIST_HEAD(cpufreq_dev_list); > > /** > @@ -221,7 +224,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb, > switch (event) { > > case CPUFREQ_ADJUST: > - mutex_lock(&cooling_cpufreq_lock); > + mutex_lock(&cooling_list_lock); > list_for_each_entry(cpufreq_dev, &cpufreq_dev_list, node) { > if (!cpumask_test_cpu(policy->cpu, > &cpufreq_dev->allowed_cpus)) > @@ -233,7 +236,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb, > cpufreq_verify_within_limits(policy, 0, > max_freq); > } > - mutex_unlock(&cooling_cpufreq_lock); > + mutex_unlock(&cooling_list_lock); > break; > default: > return NOTIFY_DONE; > @@ -865,12 +868,15 @@ __cpufreq_cooling_register(struct device_node *np, > cpufreq_dev->cool_dev = cool_dev; > > mutex_lock(&cooling_cpufreq_lock); > + mutex_lock(&cooling_list_lock); > + list_add(&cpufreq_dev->node, &cpufreq_dev_list); > + mutex_unlock(&cooling_list_lock); > > /* Register the notifier for first cpufreq cooling device */ > - if (list_empty(&cpufreq_dev_list)) > + if (cpufreq_dev_count == 0) > cpufreq_register_notifier(&thermal_cpufreq_notifier_block, > CPUFREQ_POLICY_NOTIFIER); > - list_add(&cpufreq_dev->node, &cpufreq_dev_list); > + cpufreq_dev_count++; > > mutex_unlock(&cooling_cpufreq_lock); > > @@ -1014,14 +1020,18 @@ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev) > > cpufreq_dev = cdev->devdata; > mutex_lock(&cooling_cpufreq_lock); > - list_del(&cpufreq_dev->node); > + cpufreq_dev_count--; > > /* Unregister the notifier for the last cpufreq cooling device */ > - if (list_empty(&cpufreq_dev_list)) > + if (cpufreq_dev_count == 0) > cpufreq_unregister_notifier(&thermal_cpufreq_notifier_block, > CPUFREQ_POLICY_NOTIFIER); > mutex_unlock(&cooling_cpufreq_lock); > > + mutex_lock(&cooling_list_lock); > + list_del(&cpufreq_dev->node); > + mutex_unlock(&cooling_list_lock); > + > thermal_cooling_device_unregister(cpufreq_dev->cool_dev); > release_idr(&cpufreq_idr, cpufreq_dev->id); > kfree(cpufreq_dev->time_in_idle_timestamp); > -- > 2.1.0 >
On 13-08-15, 21:50, Eduardo Valentin wrote: > Sorry if I let this one to fall into the cracks for this long. > > I am looking at this now and queuing as soon as possible. You applied the wrong one, please drop it. And apply the one I have sent as a separate thread, with my sob on it.
On Fri, Aug 14, 2015 at 11:23:59AM +0530, Viresh Kumar wrote: > On 13-08-15, 21:50, Eduardo Valentin wrote: > > Sorry if I let this one to fall into the cracks for this long. > > > > I am looking at this now and queuing as soon as possible. > > You applied the wrong one, please drop it. And apply the one I have > sent as a separate thread, with my sob on it. OK. I am updating it to your V2. > > -- > viresh
On 14-08-15, 18:19, Eduardo Valentin wrote: > On Fri, Aug 14, 2015 at 11:23:59AM +0530, Viresh Kumar wrote: > > On 13-08-15, 21:50, Eduardo Valentin wrote: > > > Sorry if I let this one to fall into the cracks for this long. > > > > > > I am looking at this now and queuing as soon as possible. > > > > You applied the wrong one, please drop it. And apply the one I have > > sent as a separate thread, with my sob on it. > > OK. I am updating it to your V2. Looks fine now.
On Thursday, August 13, 2015 09:17:44 AM Russell King - ARM Linux wrote: > On Thu, Aug 13, 2015 at 03:20:35AM +0200, Rafael J. Wysocki wrote: > > On Tuesday, August 11, 2015 06:03:57 PM Russell King - ARM Linux wrote: > > > On Tue, May 19, 2015 at 12:05:55AM +0200, Rafael J. Wysocki wrote: > > > > On Monday, May 18, 2015 07:56:45 PM Russell King - ARM Linux wrote: > > > > > On Tue, Dec 16, 2014 at 09:11:53AM +0530, Viresh Kumar wrote: > > > > > > On 16 December 2014 at 04:39, Russell King - ARM Linux > > > > > > <linux@arm.linux.org.uk> wrote: > > > > > > > Well, here's a patch which I'm running on top of 3.18 at the moment, > > > > > > > which is basically what I described in my email, and I'm running with it > > > > > > > and it is without any lockdep complaint. > > > > > > > > > > > > We need two separate patches now, one for 3.18 and other one for 3.19-rc. > > > > > > 3.19 has see lots of changes in this particular file and so we need to > > > > > > change few things here. > > > > > > > > > > What happened with this? I'm still carrying the patch. > > > > > > > > This should go in through the thermal tree. Eduardo? > > > > > > Having waited a long time for any kind of response from Eduardo, I've > > > given up. My conclusion is that Eduardo isn't interested in this. > > > > > > I've re-checked, and the AB-BA deadlock is still there in the latest > > > code. So, I've taken it upon myself to throw this into my for-next > > > branch to force the issue - not something I _want_ to do, but I'm doing > > > this out of frustration. It's clear to me that "playing nice" by email > > > does _not_ work with some people. > > > > > > I'm rather hoping that Stephen reports a merge conflict with linux-next > > > this evening to highlight this situation. I've added additional commentry > > > to the commit message on the patch giving the reason why I've done this, > > > and the relevant message IDs showing the past history. > > > > > > I've not decided whether I'm going to ask Linus to take this patch > > > directly or not, that rather depends whether there's any co-operation > > > from Eduardo on this. I'd rather Eduardo took the patch. > > > > > > The patch I have has had to be updated again for changes to the driver, > > > but I really don't see the point of re-posting it just for it to be > > > ignored yet again. > > > > > > I'm really disappointed by this dysfunctional state of affairs, and > > > that what should be an urgent fix for an observable problem is still > > > not merged some nine months after it was first identified. > > > > I guess it might help if you sent the updated patch in a new thread. > > That I doubt. Eduardo has not bothered to reply at _any_ time. I > have to question whether there is anyone even reading that email > address, or whether it's a redirect to /dev/null. All the evidence I > have right now is that this Eduardo is a ficticous character. > > I would have at least expected some complaints when I said "I've put > it in linux-next" but... absolutely nothing. > > So... my only conclusion is that you're all pulling my leg that there > _is_ this "Eduardo" maintainer who's supposed to be taking patches for > this stuff. Well, that's the situation as per MAINTAINERS today. You seem to be concerned that it may not reflect the reality, but in that case I can only recommend sending a patch against MAINTAINERS to remove Eduardo from there. > As I've said, I'm not bothering with this anymore, it's just far too > much effort to play these stupid games. I'm not sure what you mean by "these stupid games" here. > The deadlock can stay for all I care. Fair enough. Thanks, Rafael
Hi there, On Mon, Aug 17, 2015 at 6:32 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: <cut> > > Well, that's the situation as per MAINTAINERS today. You seem to be concerned > that it may not reflect the reality, but in that case I can only recommend > sending a patch against MAINTAINERS to remove Eduardo from there. > >> As I've said, I'm not bothering with this anymore, it's just far too >> much effort to play these stupid games. > > I'm not sure what you mean by "these stupid games" here. > >> The deadlock can stay for all I care. > > Fair enough. I applied the patch. And it has generated the conflict in linux-next, as was mentioned in this thread. I already sent the pull request. Yes, I missed the thread. Apologize for this and for the inconvenience I have caused. The deadlock will be taken care using Russel's patch. Thanks Viresh for notifying me. > > Thanks, > Rafael > BR,
diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c index ad09e51..622ea40 100644 --- a/drivers/thermal/cpu_cooling.c +++ b/drivers/thermal/cpu_cooling.c @@ -484,15 +484,15 @@ __cpufreq_cooling_register(struct device_node *np, cpufreq_dev->cpufreq_state = 0; mutex_lock(&cooling_cpufreq_lock); - /* Register the notifier for first cpufreq cooling device */ - if (cpufreq_dev_count == 0) - cpufreq_register_notifier(&thermal_cpufreq_notifier_block, - CPUFREQ_POLICY_NOTIFIER); cpufreq_dev_count++; list_add(&cpufreq_dev->node, &cpufreq_dev_list); mutex_unlock(&cooling_cpufreq_lock); + /* Register the notifier for first cpufreq cooling device */ + if (cpufreq_dev_count == 0) + cpufreq_register_notifier(&thermal_cpufreq_notifier_block, + CPUFREQ_POLICY_NOTIFIER); return cool_dev; }