Message ID | 20160525154618.GD13765@ulmo.ba.sec |
---|---|
State | Not Applicable, archived |
On 5/25/2016 11:46 AM, Thierry Reding wrote:
> On Wed, May 25, 2016 at 12:03:47PM +0100, Jon Hunter wrote:
>>
>> On 25/05/16 11:58, Jon Hunter wrote:
>>
>> ...
>>
>>> Looking at this a bit more I am wondering if we should prevent the
>>> battery for being polled before the registration has completed ...
>>>
>>> diff --git a/drivers/power/bq27xxx_battery.c
>>> b/drivers/power/bq27xxx_battery.c
>>> index 45f6ebf88df6..32649183ecd9 100644
>>> --- a/drivers/power/bq27xxx_battery.c
>>> +++ b/drivers/power/bq27xxx_battery.c
>>> @@ -871,12 +871,14 @@ static int bq27xxx_battery_get_property(struct
>>> power_supply *psy,
>>>  	int ret = 0;
>>>  	struct bq27xxx_device_info *di = power_supply_get_drvdata(psy);
>>>
>>> -	mutex_lock(&di->lock);
>>> -	if (time_is_before_jiffies(di->last_update + 5 * HZ)) {
>>> -		cancel_delayed_work_sync(&di->work);
>>> -		bq27xxx_battery_poll(&di->work.work);
>>> +	if (di->bat) {
>>> +		mutex_lock(&di->lock);
>>> +		if (time_is_before_jiffies(di->last_update + 5 * HZ)) {
>>> +			cancel_delayed_work_sync(&di->work);
>>> +			bq27xxx_battery_poll(&di->work.work);
>>> +		}
>>> +		mutex_unlock(&di->lock);
>>>  	}
>>> -	mutex_unlock(&di->lock);
>>
>> Alternatively, maybe the following is simpler ...
>>
>> diff --git a/drivers/power/bq27xxx_battery.c
>> b/drivers/power/bq27xxx_battery.c
>> index 45f6ebf88df6..8a713b52e9f6 100644
>> --- a/drivers/power/bq27xxx_battery.c
>> +++ b/drivers/power/bq27xxx_battery.c
>> @@ -733,7 +733,8 @@ static void bq27xxx_battery_poll(struct work_struct
>> *work)
>>  			container_of(work, struct bq27xxx_device_info,
>>  				     work.work);
>>
>> -	bq27xxx_battery_update(di);
>> +	if (di->bat)
>> +		bq27xxx_battery_update(di);
>
> How about this, which should be the most minimal to fix it (though it's
> completely untested) and still update the internal cache (it just won't
> signal an supply change, which wouldn't work at this point anyway). The
> patch makes up for the supply change notification by doing that instead
> of a full bq27xxx_battery_update() at the end of ->probe(). This should
> take care of always sending out a uevent on successful probe, whereas a
> bq27xxx_battery_update() at the end of ->probe() may not send one if it
> is presented with the same data.
>

The problem I see with this is that this only fixes this for the bq27xxx
driver. The real problem is that during the registration for (di->bat =
power_supply_register...) the core is calling back into the driver being
registered passing it an incomplete struct. As far as I can tell, the
call should never be made in the first place. In fact, for all drivers
that register and support thermal, this should be happening.

Adding Krzysztof Kozlowski who has looked into similar issues recently
it appears.

-rhyland

On 25/05/16 16:46, Thierry Reding wrote:

...

> How about this, which should be the most minimal to fix it (though it's
> completely untested) and still update the internal cache (it just won't
> signal an supply change, which wouldn't work at this point anyway). The
> patch makes up for the supply change notification by doing that instead
> of a full bq27xxx_battery_update() at the end of ->probe(). This should
> take care of always sending out a uevent on successful probe, whereas a
> bq27xxx_battery_update() at the end of ->probe() may not send one if it
> is presented with the same data.
>
> Thierry
> --- >8 ---
> diff --git a/drivers/power/bq27xxx_battery.c b/drivers/power/bq27xxx_battery.c
> index 45f6ebf88df6..df1b4cb2bbc2 100644
> --- a/drivers/power/bq27xxx_battery.c
> +++ b/drivers/power/bq27xxx_battery.c
> @@ -717,7 +717,13 @@ void bq27xxx_battery_update(struct bq27xxx_device_info *di)
>  			di->charge_design_full = bq27xxx_battery_read_dcap(di);
>  	}
>
> -	if (di->cache.capacity != cache.capacity)
> +	/*
> +	 * This function ends up being called while the power supply is being
> +	 * registered, hence di->bat will be NULL on the first call, causing
> +	 * power_supply_changed() to oops. Avoid that by checking if we have
> +	 * been registered already or not.
> +	 */
> +	if (di->bat && di->cache.capacity != cache.capacity)
>  		power_supply_changed(di->bat);
>
>  	if (memcmp(&di->cache, &cache, sizeof(cache)) != 0)
> @@ -984,7 +990,7 @@ int bq27xxx_battery_setup(struct bq27xxx_device_info *di)
>
>  	dev_info(di->dev, "support ver. %s enabled\n", DRIVER_VERSION);
>
> -	bq27xxx_battery_update(di);
> +	power_supply_changed(di->bat);
>
>  	return 0;
>  }

I think that would work too, my only concern is that this assumes that
bq27xxx_battery_update() is called during the registration of the power
supply. Looking at the backtrace from the panic we have ...

[    1.984150] [<ffff000008614984>] bq27xxx_battery_update+0x88/0x51c
[    1.990321] [<ffff000008615084>] bq27xxx_battery_poll+0x24/0x70
[    1.996231] [<ffff000008615180>] bq27xxx_battery_get_property+0xb0/0x3b4
[    2.002923] [<ffff0000086133d8>] power_supply_read_temp+0x2c/0x54
[    2.009005] [<ffff000008616508>] thermal_zone_get_temp+0x5c/0x11c
[    2.015089] [<ffff0000086183b0>] thermal_zone_device_update+0x34/0xb4
[    2.021518] [<ffff0000086193b4>] thermal_zone_device_register+0x87c/0x8cc
[    2.028295] [<ffff000008613b6c>] __power_supply_register+0x370/0x430
[    2.034638] [<ffff000008613c54>] power_supply_register_no_ws+0x10/0x18
[    2.041155] [<ffff000008614f1c>] bq27xxx_battery_setup+0x104/0x15c
[    2.047325] [<ffff000008615668>] bq27xxx_battery_i2c_probe+0xd0/0x1b0

Here bq27xxx_battery_update() is being called during the thermal zone
registration and so as long as all bq27xxx devices have a
POWER_SUPPLY_PROP_TEMP property then it *should* be ok. It would only
break if there was a new bq27xxx with no temp support. Maybe that is a
bit fragile and we are better off explicitly calling
bq27xxx_battery_update()?

Cheers
Jon

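The fragility raised here can be modelled outside the kernel. What follows is a minimal user-space sketch, not driver code: `struct supply`, `struct device_info`, `supply_changed()`, `device_update()`, `supply_register()`, `device_setup()` and the `has_temp` flag are all hypothetical stand-ins invented for illustration. It assumes, as the backtrace suggests, that the only path into the update function during registration is the thermal zone's temperature read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct power_supply; not a kernel type. */
struct supply {
	int changed_events;	/* how many change notifications were sent */
};

/* Hypothetical stand-in for struct bq27xxx_device_info. */
struct device_info {
	struct supply *bat;	/* NULL until registration has returned */
	bool has_temp;		/* models exposing POWER_SUPPLY_PROP_TEMP */
	int hw_capacity;	/* pretend fuel-gauge reading */
	int cached_capacity;	/* -1 means "cache never filled" */
};

static struct supply the_supply;

/* Models power_supply_changed(): a NULL handle is the oops case. */
static void supply_changed(struct supply *psy)
{
	assert(psy != NULL);
	psy->changed_events++;
}

/* Models bq27xxx_battery_update() with the proposed di->bat guard. */
static void device_update(struct device_info *di)
{
	int capacity = di->hw_capacity;

	if (di->bat && di->cached_capacity != capacity)
		supply_changed(di->bat);
	di->cached_capacity = capacity;
}

/*
 * Models registration: the thermal zone reads the temperature, which
 * reaches the driver's update path before the handle is handed back.
 * Without a temperature property (assumption) no callback happens.
 */
static struct supply *supply_register(struct device_info *di)
{
	if (di->has_temp)
		device_update(di);	/* di->bat is still NULL here */
	return &the_supply;
}

/* Models the patched end of bq27xxx_battery_setup(). */
static int device_setup(struct device_info *di)
{
	di->bat = supply_register(di);
	supply_changed(di->bat);	/* replaces the explicit update */
	return 0;
}
```

In this model, calling `device_setup()` on a device with `has_temp` set leaves the cache filled and sends one notification. With `has_temp` clear, the notification is still sent but the cache is never populated, which is the case an explicit bq27xxx_battery_update() at the end of setup would cover.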
On 25/05/16 16:55, Rhyland Klein wrote:
> On 5/25/2016 11:46 AM, Thierry Reding wrote:
>> On Wed, May 25, 2016 at 12:03:47PM +0100, Jon Hunter wrote:
>>>
>>> On 25/05/16 11:58, Jon Hunter wrote:
>>>
>>> ...
>>>
>>>> Looking at this a bit more I am wondering if we should prevent the
>>>> battery for being polled before the registration has completed ...
>>>>
>>>> diff --git a/drivers/power/bq27xxx_battery.c
>>>> b/drivers/power/bq27xxx_battery.c
>>>> index 45f6ebf88df6..32649183ecd9 100644
>>>> --- a/drivers/power/bq27xxx_battery.c
>>>> +++ b/drivers/power/bq27xxx_battery.c
>>>> @@ -871,12 +871,14 @@ static int bq27xxx_battery_get_property(struct
>>>> power_supply *psy,
>>>>  	int ret = 0;
>>>>  	struct bq27xxx_device_info *di = power_supply_get_drvdata(psy);
>>>>
>>>> -	mutex_lock(&di->lock);
>>>> -	if (time_is_before_jiffies(di->last_update + 5 * HZ)) {
>>>> -		cancel_delayed_work_sync(&di->work);
>>>> -		bq27xxx_battery_poll(&di->work.work);
>>>> +	if (di->bat) {
>>>> +		mutex_lock(&di->lock);
>>>> +		if (time_is_before_jiffies(di->last_update + 5 * HZ)) {
>>>> +			cancel_delayed_work_sync(&di->work);
>>>> +			bq27xxx_battery_poll(&di->work.work);
>>>> +		}
>>>> +		mutex_unlock(&di->lock);
>>>>  	}
>>>> -	mutex_unlock(&di->lock);
>>>
>>> Alternatively, maybe the following is simpler ...
>>>
>>> diff --git a/drivers/power/bq27xxx_battery.c
>>> b/drivers/power/bq27xxx_battery.c
>>> index 45f6ebf88df6..8a713b52e9f6 100644
>>> --- a/drivers/power/bq27xxx_battery.c
>>> +++ b/drivers/power/bq27xxx_battery.c
>>> @@ -733,7 +733,8 @@ static void bq27xxx_battery_poll(struct work_struct
>>> *work)
>>>  			container_of(work, struct bq27xxx_device_info,
>>>  				     work.work);
>>>
>>> -	bq27xxx_battery_update(di);
>>> +	if (di->bat)
>>> +		bq27xxx_battery_update(di);
>>
>> How about this, which should be the most minimal to fix it (though it's
>> completely untested) and still update the internal cache (it just won't
>> signal an supply change, which wouldn't work at this point anyway). The
>> patch makes up for the supply change notification by doing that instead
>> of a full bq27xxx_battery_update() at the end of ->probe(). This should
>> take care of always sending out a uevent on successful probe, whereas a
>> bq27xxx_battery_update() at the end of ->probe() may not send one if it
>> is presented with the same data.
>>
>
> The problem I see with this is that this only fixes this for the bq27xxx
> driver. The real problem is that during the registration for (di->bat =
> power_supply_register...) the core is calling back into the driver being
> registered passing it an incomplete struct. As far as I can tell, the
> call should never be made in the first place. In fact, for all drivers
> that register and support thermal, this should be happening.

So power_supply_read_temp() calls ->get_property() and passes the
power_supply psy struct which is initialised. The problem is that inside
the bq27xxx driver, this then kicks off the worker thread to update the
bq27xxx state and when this worker thread runs it attempts to access the
same psy struct but by dereferencing a pointer to it from the
bq27xxx_device_info where the pointer has not been initialised yet.

Therefore, IMO it seems that we should not allow this worker thread to
start until the registration has completed and hence the pointer is
initialised. I don't see why the temperature could not be read during
the registration to get the initial temp and it does seem to work fine
if we prevent this worker thread from running.

I am sure there are a lot of other devices that have the
POWER_SUPPLY_PROP_TEMP property and so I would have thought if this is a
generic problem it would have come up before now? Plus this worker
thread that triggers the crash is specific to the bq27xxx.

Cheers
Jon

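The alternative of not polling until registration has completed can be sketched in the same user-space style. This is only a model under stated assumptions: `struct supply`, `struct device_info`, `device_poll()`, `device_get_temp()`, `supply_register()` and `STALE_AFTER` are hypothetical names, time is counted in plain seconds instead of jiffies, and locking and the delayed work item are left out.

```c
#include <assert.h>
#include <stddef.h>

#define STALE_AFTER 5	/* seconds; mirrors the 5 * HZ check in the driver */

/* Hypothetical stand-in for struct power_supply; not a kernel type. */
struct supply {
	int id;
};

/* Hypothetical stand-in for struct bq27xxx_device_info. */
struct device_info {
	struct supply *bat;	/* NULL until registration has returned */
	long last_update;	/* time of the last poll, in seconds */
	int polls;		/* how many times the poll path ran */
	int cached_temp;	/* value handed out by get_property */
	int hw_temp;		/* pretend sensor reading */
};

static struct supply the_supply;

/*
 * Models bq27xxx_battery_poll(): in the real driver this path ends up
 * using di->bat, so it must not run while the pointer is still NULL.
 */
static void device_poll(struct device_info *di, long now)
{
	assert(di->bat != NULL);
	di->cached_temp = di->hw_temp;
	di->last_update = now;
	di->polls++;
}

/* Models bq27xxx_battery_get_property() guarded by di->bat. */
static int device_get_temp(struct device_info *di, long now, int *val)
{
	if (di->bat && now - di->last_update > STALE_AFTER)
		device_poll(di, now);
	*val = di->cached_temp;
	return 0;
}

/* Models registration reading the temperature before returning. */
static struct supply *supply_register(struct device_info *di, long now,
				      int *first_temp)
{
	device_get_temp(di, now, first_temp);	/* di->bat still NULL */
	return &the_supply;
}
```

In this model the read issued during registration returns whatever is already cached and never reaches the poll path; the first read after `di->bat` is assigned refreshes the cache, and further reads within the staleness window reuse it.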
--- a/drivers/power/bq27xxx_battery.c
+++ b/drivers/power/bq27xxx_battery.c
@@ -717,7 +717,13 @@ void bq27xxx_battery_update(struct bq27xxx_device_info *di)
 			di->charge_design_full = bq27xxx_battery_read_dcap(di);
 	}
 
-	if (di->cache.capacity != cache.capacity)
+	/*
+	 * This function ends up being called while the power supply is being
+	 * registered, hence di->bat will be NULL on the first call, causing
+	 * power_supply_changed() to oops. Avoid that by checking if we have
+	 * been registered already or not.
+	 */
+	if (di->bat && di->cache.capacity != cache.capacity)
 		power_supply_changed(di->bat);
 
 	if (memcmp(&di->cache, &cache, sizeof(cache)) != 0)
@@ -984,7 +990,7 @@ int bq27xxx_battery_setup(struct bq27xxx_device_info *di)
 
 	dev_info(di->dev, "support ver. %s enabled\n", DRIVER_VERSION);
 
-	bq27xxx_battery_update(di);
+	power_supply_changed(di->bat);
 
 	return 0;
 }