diff mbox

windfarm: decrement client count when unregistering

Message ID 1438344538.13223.9.camel@tiscali.nl (mailing list archive)
State Accepted
Delegated to: Michael Ellerman
Headers show

Commit Message

Paul Bolle July 31, 2015, 12:08 p.m. UTC
wf_unregister_client() increments the client count when a client
unregisters. That is obviously incorrect. Decrement that client count
instead.

Fixes: 75722d3992f5 ("[PATCH] ppc64: Thermal control for SMU based machines")

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
---
cross-compiled only. I don't have a PPC machine at hand, sorry. And this
does need some run-time testing, I'd day.

windfarm_corex_exit() contains:
    BUG_ON(wf_client_count != 0);

I wonder why that, apparently. never triggered.

 drivers/macintosh/windfarm_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Michael Ellerman Aug. 5, 2015, 4:16 a.m. UTC | #1
On Fri, 2015-31-07 at 12:08:58 UTC, Paul Bolle wrote:
> wf_unregister_client() increments the client count when a client
> unregisters. That is obviously incorrect. Decrement that client count
> instead.
> 
> Fixes: 75722d3992f5 ("[PATCH] ppc64: Thermal control for SMU based machines")
> 
> Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
> ---
> cross-compiled only. I don't have a PPC machine at hand, sorry. And this
> does need some run-time testing, I'd day.
> 
> windfarm_corex_exit() contains:
>     BUG_ON(wf_client_count != 0);
> 
> I wonder why that, apparently. never triggered.

Hmm interesting.

A quick test here on an iMacG5 shows that we get into a state where we can't
remove windfarm_core:

  $ lsmod
  Module                  Size  Used by
  windfarm_smu_sensors    7549  2
  windfarm_core          15391  1 windfarm_smu_sensors


Which means we can't trigger windfarm_core_exit() and the BUG_ON().

I also get an oops when removing windfarm_lm75_sensor, so I suspect there are
gremlins in the module ref counting for windfarm.

I'll merge this as probably correct.

  ------------[ cut here ]------------
  WARNING: at ../kernel/module.c:1116
  Modules linked in: windfarm_lm75_sensor(-) windfarm_smu_sensors windfarm_smu_controls windfarm_core [last unloaded: windfarm_cpufreq_clamp]
  CPU: 0 PID: 2860 Comm: modprobe Not tainted 4.2.0-rc2-00043-gf4e908dd3cbe-dirty #2
  task: c00000003d9c4fe0 ti: c00000003df20000 task.ti: c00000003df20000
  NIP: c0000000000d62d0 LR: d0000000004338bc CTR: c0000000000d62a0
  REGS: c00000003df23660 TRAP: 0700   Not tainted  (4.2.0-rc2-00043-gf4e908dd3cbe-dirty)
  MSR: 9000000000029032 <SF,HV,EE,ME,IR,DR,RI>  CR: 82002884  XER: 20000000
  SOFTE: 1 
  GPR00: d0000000004338b0 c00000003df238e0 c000000000b27800 d000000000474b00 
  GPR04: c00000003d185900 0000000000000001 000000003e5de000 000000000000175c 
  GPR08: c000000000a2c068 0000000000000001 ffffffffffffffff d0000000004343c0 
  GPR12: c0000000000d62a0 c00000000ffff000 0000000000000004 0000000000000001 
  GPR16: 0000000000000000 0000000000000000 0000000000000000 00000000ffe108bc 
  GPR20: 0000000000000000 00000000209b0278 0000000000000000 0000000000000001 
  GPR24: 00000000ffe11915 00000000209b0008 00000000209b02ac 0000000000000000 
  GPR28: 0000000000000000 c00000003d1b3c80 0000000000000000 d000000000474b00 
  NIP [c0000000000d62d0] .module_put+0x30/0x40
  LR [d0000000004338bc] .wf_put_sensor+0x9c/0xf0 [windfarm_core]
  Call Trace:
  [c00000003df238e0] [d0000000004338b0] .wf_put_sensor+0x90/0xf0 [windfarm_core] (unreliable)
  [c00000003df23960] [d000000000474020] .wf_lm75_remove+0x20/0x40 [windfarm_lm75_sensor]
  [c00000003df239d0] [c00000000058cb8c] .i2c_device_remove+0x7c/0xb0
  [c00000003df23a50] [c000000000450dd4] .__device_release_driver+0xb4/0x180
  [c00000003df23ad0] [c000000000451a08] .driver_detach+0x138/0x180
  [c00000003df23b70] [c000000000450720] .bus_remove_driver+0x70/0xf0
  [c00000003df23bf0] [c0000000004523a8] .driver_unregister+0x38/0x70
  [c00000003df23c70] [c00000000058d718] .i2c_del_driver+0x28/0x40
  [c00000003df23cf0] [d0000000004743fc] .wf_lm75_driver_exit+0x18/0x2cc [windfarm_lm75_sensor]
  [c00000003df23d60] [c0000000000d82bc] .SyS_delete_module+0x18c/0x250
  [c00000003df23e30] [c000000000007c98] system_call+0x38/0xd0
  Instruction dump:
  2c230000 4d820020 392302e0 7c2004ac 7d404828 2c0a0001 394affff 41c00010 
  7d40492d 40c2ffec 7c0004ac 55490ffe <0b090000> 4e800020 60000000 60000000 
  ---[ end trace 013348a741cf9320 ]---


cheers
Paul Bolle Aug. 6, 2015, 10:21 p.m. UTC | #2
On wo, 2015-08-05 at 14:16 +1000, Michael Ellerman wrote:
> On Fri, 2015-31-07 at 12:08:58 UTC, Paul Bolle wrote:
> > windfarm_corex_exit() contains:
> >     BUG_ON(wf_client_count != 0);
> > 
> > I wonder why that, apparently. never triggered.
> 
> Hmm interesting.
> 
> A quick test here on an iMacG5 shows that we get into a state where we can't
> remove windfarm_core:
> 
>   $ lsmod
>   Module                  Size  Used by
>   windfarm_smu_sensors    7549  2
>   windfarm_core          15391  1 windfarm_smu_sensors
> 
> 
> Which means we can't trigger windfarm_core_exit() and the BUG_ON().

Perhaps this is what, roughly, happens:

smu_sensors_init()
    smu_ads_create()
        /* Let's assume this happens ... */
        ads->sens.ops = &smu_cpuamp_ops
        ads->sens.name = "cpu-current"
    smu_ads_create()
        /* ditto ... */
        ads->sens.ops = &smu_cpuvolt_ops
        ads->sens.name = "cpu-voltage"

    /* ... so this would then be true */
    if (volt_sensor && curr_sensor)
        /* and we do this */
        smu_cpu_power_create(&volt_sensor->sens, &curr_sensor->sens)
            wf_get_sensor(&volt_sensor->sens)
                try_module_get(volt_sensor->sens->ops->owner /* THIS_MODULE */)
            wf_get_sensor(&curr_sensor->sens)
                try_module_get(curr_sensor->sens->ops->owner /* THIS_MODULE */)

The cleanup would have happened here:

smu_sensors_exit()
    while (!list_empty(&smu_ads)
        wf_unregister_sensor(&ads->sens)
            wf_put_sensor()
                /* would this also be done for sensors that never 
                 * triggered a call to module_get()? */
                module_put(ads->sens->ops->owner /* THIS MODULE */)

But, whatever it is that smu_sensors_exit() wants to do, it will never
be called since there are these two references to this module that
smu_sensors_init() created itself, preventing the unloading of this
module.

Does the above look plausible?

Note that this was only cobbled together by staring at the code for far
too long. If I had some powerpc machine at hand I could have actually
tested this with a few strategically placed printk()'s.

> I also get an oops when removing windfarm_lm75_sensor, so I suspect there are
> gremlins in the module ref counting for windfarm.

(This I haven't (yet) looked into.)

> I'll merge this as probably correct.

Hope this helps,


Paul Bolle
Paul Bolle Aug. 6, 2015, 11:09 p.m. UTC | #3
On vr, 2015-08-07 at 00:21 +0200, Paul Bolle wrote:
> On wo, 2015-08-05 at 14:16 +1000, Michael Ellerman wrote:
> > I also get an oops when removing windfarm_lm75_sensor, so I suspect there are
> > gremlins in the module ref counting for windfarm.
> 
> (This I haven't (yet) looked into.)

And that might be, sort of, related. Because oops is probably triggered
by the, it seems, rather straightforward chain of events triggered by
unloading an I2C module. (So windfarm_lm75_sensor refcount must be
zero.) Which gets interesting at:

    wf_lm75_remove()
        wf_unregister_sensor(&wf_lm75_sensor->sens)
            wf_put_sensor(&wf_lm75_sensor->sens)
                module_put(wf_lm75_sensor->sens->ops->owner /* THIS_MODULE */)

And in windfarm_lm75_sensor we trigger this issue because in the
.probe() function there appears to be no corresponding call to
try_module_get() preventing unloading the module, as we saw in windfarm_
smu_sensors.

So module refcounting looks broken for both these modules in opposite
ways. Gremlins indeed.

Good luck!


Paul Bolle
Michael Ellerman Aug. 10, 2015, 9:27 a.m. UTC | #4
On Fri, 2015-31-07 at 12:08:58 UTC, Paul Bolle wrote:
> wf_unregister_client() increments the client count when a client
> unregisters. That is obviously incorrect. Decrement that client count
> instead.
> 
> Fixes: 75722d3992f5 ("[PATCH] ppc64: Thermal control for SMU based machines")
> 
> Signed-off-by: Paul Bolle <pebolle@tiscali.nl>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/fe2b592173ff0274e70d

cheers
diff mbox

Patch

diff --git a/drivers/macintosh/windfarm_core.c
b/drivers/macintosh/windfarm_core.c
index 3ee198b65843..cc7ece1712b5 100644
--- a/drivers/macintosh/windfarm_core.c
+++ b/drivers/macintosh/windfarm_core.c
@@ -435,7 +435,7 @@  int wf_unregister_client(struct notifier_block *nb)
 {
 	mutex_lock(&wf_lock);
 	blocking_notifier_chain_unregister(&wf_client_list, nb);
-	wf_client_count++;
+	wf_client_count--;
 	if (wf_client_count == 0)
 		wf_stop_thread();
 	mutex_unlock(&wf_lock);