diff mbox

arm64: defconfig: Enable cros-ec and battery driver

Message ID 5745D2DD.6080300@nvidia.com
State Not Applicable, archived
Headers show

Commit Message

Jon Hunter May 25, 2016, 4:29 p.m. UTC
On 25/05/16 17:10, Jon Hunter wrote:

...

> So power_supply_read_temp() calls ->get_property() and passes the
> power_supply psy struct which is initialised. The problem is that inside
> the bq27xxx driver, this then kicks off the worker thread to update the
> bq27xxx state and when this worker thread runs it attempts to access the
> same psy struct but by dereferencing a pointer to it from the
> bq27xxx_device_info where the pointer has not been initialised yet.
> Therefore, IMO it seems that we should not allow this worker thread to
> start until the registration has completed and hence the pointer is
> initialised.

Sorry, it is not the actual worker thread that triggers the NULL pointer
deference, but the function bq27xxx_battery_poll() that schedules the
worker thread. Anyway, I still don't see that we need to update the
bq27xxx state during the registration especially seeing as we call
bq27xxx_battery_update() after the registration is complete. It seems
that updating the overall state should be mutually exclusive from
reading the temp.

Looking at my patch, it does appear that the worker thread which also
calls bq27xxx_battery_update() is still scheduled and so may be it
should be ...



Cheers
Jon

Comments

Rhyland Klein May 25, 2016, 4:36 p.m. UTC | #1
On 5/25/2016 12:29 PM, Jon Hunter wrote:
> 
> On 25/05/16 17:10, Jon Hunter wrote:
> 
> ...
> 
>> So power_supply_read_temp() calls ->get_property() and passes the
>> power_supply psy struct which is initialised. The problem is that inside
>> the bq27xxx driver, this then kicks off the worker thread to update the
>> bq27xxx state and when this worker thread runs it attempts to access the
>> same psy struct but by dereferencing a pointer to it from the
>> bq27xxx_device_info where the pointer has not been initialised yet.
>> Therefore, IMO it seems that we should not allow this worker thread to
>> start until the registration has completed and hence the pointer is
>> initialised.
> 
> Sorry, it is not the actual worker thread that triggers the NULL pointer
> deference, but the function bq27xxx_battery_poll() that schedules the
> worker thread. Anyway, I still don't see that we need to update the
> bq27xxx state during the registration especially seeing as we call
> bq27xxx_battery_update() after the registration is complete. It seems
> that updating the overall state should be mutually exclusive from
> reading the temp.
> 
> Looking at my patch, it does appear that the worker thread which also
> calls bq27xxx_battery_update() is still scheduled and so may be it
> should be ...
> 
> diff --git a/drivers/power/bq27xxx_battery.c b/drivers/power/bq27xxx_battery.c
> index 45f6ebf88df6..1334ed522332 100644
> --- a/drivers/power/bq27xxx_battery.c
> +++ b/drivers/power/bq27xxx_battery.c
> @@ -733,6 +733,9 @@ static void bq27xxx_battery_poll(struct work_struct *work)
>                         container_of(work, struct bq27xxx_device_info,
>                                      work.work);
>  
> +       if (!di->bat)
> +               return;
> +
>         bq27xxx_battery_update(di);
>  
>         if (poll_interval > 0) {
> 
> 


I can see that getting the temperature could work. I would point out
that I don't see any recent changes to bq27xxx or the power_supply_core
that would imply this is a regression. My guess is that up until now,
for devices that support the TEMP property, CONFIG_THERMAL isn't been
enabled.

So here are my thoughts.... we can do 2 things here:

1) patch bq27xxx in some manner that will allow the bq27xxx driver to
work report the temp during register (such as above patch).
2) Patch the core to avoid using get_property callback during registration.

I think for our immediate concern and crash, #1 is fine. It will work
and is fine. I however think this is just a symptom of the larger issue
(#2). In this case, the problem we see is that di->bat is used before it
is set, and we have a way around it. However, for EVERY device that
registers and has TEMP prop (and CONFIG_THERMAL enabled) it is going to
receive a call with its relative di->bat uninitialized too.

I don't know for certain if #2 has caused problems anywhere else, and I
would be surprised if it has and hasn't been caught.


AS far as this crash is concerned, I think either approach will work.
Adding in David, Dmitry, and Sebastian (maintainers) to see if they have
a preferred approach.

-rhyland
Jon Hunter May 25, 2016, 5:26 p.m. UTC | #2
On 25/05/16 17:36, Rhyland Klein wrote:

...

> I can see that getting the temperature could work. I would point out
> that I don't see any recent changes to bq27xxx or the power_supply_core
> that would imply this is a regression. My guess is that up until now,
> for devices that support the TEMP property, CONFIG_THERMAL isn't been
> enabled.
> 
> So here are my thoughts.... we can do 2 things here:
> 
> 1) patch bq27xxx in some manner that will allow the bq27xxx driver to
> work report the temp during register (such as above patch).
> 2) Patch the core to avoid using get_property callback during registration.

I wonder if #2 will cause other problems for other devices as this
really changes the functionality.

> I think for our immediate concern and crash, #1 is fine. It will work
> and is fine. I however think this is just a symptom of the larger issue
> (#2). In this case, the problem we see is that di->bat is used before it
> is set, and we have a way around it. However, for EVERY device that
> registers and has TEMP prop (and CONFIG_THERMAL enabled) it is going to
> receive a call with its relative di->bat uninitialized too.

I don't think we are understanding each other and I still think that
this could be specific to the bq27xxx. And here is why ...

The power supply driver is going to receive a call to it's
->get_property() function with a *VALID* pointer to the power_supply
structure, *psy. When initialised, di->bat == psy (assuming bq27xxx
naming) and so in other words, they point to the same thing. Therefore,
in the normal case, you should not need to access di->bat from within
->get_property() because you already have a valid pointer to the
power_supply structure, *psy.

So the ONLY problem would be IF the ->get_property function calls
power_supply_get_drvdata() to get a pointer to the drvdata, *di, and
then dereferences and uses the pointer, di->bat, which may NOT be
initialised yet. I am wondering how likely this is as it seems a bit
daft to do this, unless you are doing something like the bq27xxx driver
is attempting to do.

Again IMO the problem is related to how the bq27xxx driver has
implemented the status update.

Cheers
Jon
Rhyland Klein May 25, 2016, 7:44 p.m. UTC | #3
On 5/25/2016 1:26 PM, Jon Hunter wrote:
> 
> On 25/05/16 17:36, Rhyland Klein wrote:
> 
> ...
> 
>> I can see that getting the temperature could work. I would point out
>> that I don't see any recent changes to bq27xxx or the power_supply_core
>> that would imply this is a regression. My guess is that up until now,
>> for devices that support the TEMP property, CONFIG_THERMAL isn't been
>> enabled.
>>
>> So here are my thoughts.... we can do 2 things here:
>>
>> 1) patch bq27xxx in some manner that will allow the bq27xxx driver to
>> work report the temp during register (such as above patch).
>> 2) Patch the core to avoid using get_property callback during registration.
> 
> I wonder if #2 will cause other problems for other devices as this
> really changes the functionality.
> 
>> I think for our immediate concern and crash, #1 is fine. It will work
>> and is fine. I however think this is just a symptom of the larger issue
>> (#2). In this case, the problem we see is that di->bat is used before it
>> is set, and we have a way around it. However, for EVERY device that
>> registers and has TEMP prop (and CONFIG_THERMAL enabled) it is going to
>> receive a call with its relative di->bat uninitialized too.
> 
> I don't think we are understanding each other and I still think that
> this could be specific to the bq27xxx. And here is why ...
> 
> The power supply driver is going to receive a call to it's
> ->get_property() function with a *VALID* pointer to the power_supply
> structure, *psy. When initialised, di->bat == psy (assuming bq27xxx
> naming) and so in other words, they point to the same thing. Therefore,
> in the normal case, you should not need to access di->bat from within
> ->get_property() because you already have a valid pointer to the
> power_supply structure, *psy.
> 
> So the ONLY problem would be IF the ->get_property function calls
> power_supply_get_drvdata() to get a pointer to the drvdata, *di, and
> then dereferences and uses the pointer, di->bat, which may NOT be
> initialised yet. I am wondering how likely this is as it seems a bit
> daft to do this, unless you are doing something like the bq27xxx driver
> is attempting to do.
> 
> Again IMO the problem is related to how the bq27xxx driver has
> implemented the status update.
> 

And you might be completely correct, that is something that can only
happen specifically with the bq27xxx driver. In which case, making the
fix there should be the fix. I just know from the commit log (and some
previous work with power supply drivers) that the case of get_property
being called during registration has caused problems before. That's why
I am trying to make sure we cover the generic case if it exists. Using
scheduled work is common for power_supplies to regularly update their
status.

As for your proposed patches for bq27xxx, I think the latest one you
suggested (@12:36PM EST) with the change for
battery_update->battery_poll as well makes the most sense for bq27xxx. I
would like to point out though that if we patch this, the cache won't be
populated for the first TEMP request, which has the same end effect as
the patch I proposed to power_supply_read_temp. I believe both will
return 0 for the temp.

I think that patch would work just fine in place of what I suggested for
this specific crash.

-rhyland
Krzysztof Kozlowski May 27, 2016, 8:37 a.m. UTC | #4
On 05/25/2016 09:44 PM, Rhyland Klein wrote:
> On 5/25/2016 1:26 PM, Jon Hunter wrote:
>>
>> On 25/05/16 17:36, Rhyland Klein wrote:
>>
>> ...
>>
>>> I can see that getting the temperature could work. I would point out
>>> that I don't see any recent changes to bq27xxx or the power_supply_core
>>> that would imply this is a regression. My guess is that up until now,
>>> for devices that support the TEMP property, CONFIG_THERMAL isn't been
>>> enabled.
>>>
>>> So here are my thoughts.... we can do 2 things here:
>>>
>>> 1) patch bq27xxx in some manner that will allow the bq27xxx driver to
>>> work report the temp during register (such as above patch).
>>> 2) Patch the core to avoid using get_property callback during registration.
>>
>> I wonder if #2 will cause other problems for other devices as this
>> really changes the functionality.
>>
>>> I think for our immediate concern and crash, #1 is fine. It will work
>>> and is fine. I however think this is just a symptom of the larger issue
>>> (#2). In this case, the problem we see is that di->bat is used before it
>>> is set, and we have a way around it. However, for EVERY device that
>>> registers and has TEMP prop (and CONFIG_THERMAL enabled) it is going to
>>> receive a call with its relative di->bat uninitialized too.
>>
>> I don't think we are understanding each other and I still think that
>> this could be specific to the bq27xxx. And here is why ...
>>
>> The power supply driver is going to receive a call to it's
>> ->get_property() function with a *VALID* pointer to the power_supply
>> structure, *psy. When initialised, di->bat == psy (assuming bq27xxx
>> naming) and so in other words, they point to the same thing. Therefore,
>> in the normal case, you should not need to access di->bat from within
>> ->get_property() because you already have a valid pointer to the
>> power_supply structure, *psy.
>>
>> So the ONLY problem would be IF the ->get_property function calls
>> power_supply_get_drvdata() to get a pointer to the drvdata, *di, and
>> then dereferences and uses the pointer, di->bat, which may NOT be
>> initialised yet. I am wondering how likely this is as it seems a bit
>> daft to do this, unless you are doing something like the bq27xxx driver
>> is attempting to do.
>>
>> Again IMO the problem is related to how the bq27xxx driver has
>> implemented the status update.
>>
> 
> And you might be completely correct, that is something that can only
> happen specifically with the bq27xxx driver. In which case, making the
> fix there should be the fix. I just know from the commit log (and some
> previous work with power supply drivers) that the case of get_property
> being called during registration has caused problems before. That's why
> I am trying to make sure we cover the generic case if it exists. Using
> scheduled work is common for power_supplies to regularly update their
> status.
> 
> As for your proposed patches for bq27xxx, I think the latest one you
> suggested (@12:36PM EST) with the change for
> battery_update->battery_poll as well makes the most sense for bq27xxx. I
> would like to point out though that if we patch this, the cache won't be
> populated for the first TEMP request, which has the same end effect as
> the patch I proposed to power_supply_read_temp. I believe both will
> return 0 for the temp.
> 
> I think that patch would work just fine in place of what I suggested for
> this specific crash.

Hello all,

Indeed I was struggling with similar issue in bq27x00_battery. The issue
was introduced by... me :(  when moving the ownership of power supply
structure from driver to the core. However IMHO my change exposed the
fundamental problem with power supply.

Anyway a fix for this issue was:
7f1a57fdd6cb6e7b (power_supply: Fix possible NULL pointer dereference on
early uevent)
AFAIU, this fix no longer fixes all the issues, right?

As for the fundamental problem, the power supply core should not call
back the driver (get_property()) until the probe ends. Even if the
di->bat was initialized, some other fields of driver could not be set
yet. In general, the probe did not end so we should avoid calling driver
internal functions.

In this particular problem:
1. Fix for the driver (!di->bat) is okay... but it won't solve the
problem in general.
2. I think the core should handle it somehow...

Best regards,
Krzysztof

--
To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Krzysztof Kozlowski May 27, 2016, 9:19 a.m. UTC | #5
On 05/27/2016 10:37 AM, Krzysztof Kozlowski wrote:
>> And you might be completely correct, that is something that can only
>> happen specifically with the bq27xxx driver. In which case, making the
>> fix there should be the fix. I just know from the commit log (and some
>> previous work with power supply drivers) that the case of get_property
>> being called during registration has caused problems before. That's why
>> I am trying to make sure we cover the generic case if it exists. Using
>> scheduled work is common for power_supplies to regularly update their
>> status.
>>
>> As for your proposed patches for bq27xxx, I think the latest one you
>> suggested (@12:36PM EST) with the change for
>> battery_update->battery_poll as well makes the most sense for bq27xxx. I
>> would like to point out though that if we patch this, the cache won't be
>> populated for the first TEMP request, which has the same end effect as
>> the patch I proposed to power_supply_read_temp. I believe both will
>> return 0 for the temp.
>>
>> I think that patch would work just fine in place of what I suggested for
>> this specific crash.
> 
> Hello all,
> 
> Indeed I was struggling with similar issue in bq27x00_battery. The issue
> was introduced by... me :(  when moving the ownership of power supply
> structure from driver to the core. However IMHO my change exposed the
> fundamental problem with power supply.
> 
> Anyway a fix for this issue was:
> 7f1a57fdd6cb6e7b (power_supply: Fix possible NULL pointer dereference on
> early uevent)
> AFAIU, this fix no longer fixes all the issues, right?
> 
> As for the fundamental problem, the power supply core should not call
> back the driver (get_property()) until the probe ends. Even if the
> di->bat was initialized, some other fields of driver could not be set
> yet. In general, the probe did not end so we should avoid calling driver
> internal functions.
> 
> In this particular problem:
> 1. Fix for the driver (!di->bat) is okay... but it won't solve the
> problem in general.
> 2. I think the core should handle it somehow...

I was thinking about some more generic solutions for that. Few ideas:
1. Split the power_supply_register() into register + manual call to
power_supply_changed(). Each driver will have to call the
power_supply_changed() when it is ready to do it. After that call, it is
expected that driver provides everything for power supply (it can
receive callbacks).

2. Since 7f1a57fdd6cb ("power_supply: Fix possible NULL pointer
dereference on early uevent") the power_supply_changed() is called from
a deferred work. Separate thread. We can introduce (in the core only) a
mutex:
	power_supply_deferred_register_work()
	{
		psy->mutex_lock();
		power_supply_changed(psy);
		psy->mutex_unlock();
	}
and add it also to all of API:
	power_supply_get_property() {
		psy->mutex_lock();
		psy->get_property();
		psy->mutex_unlock();
	}
The changes would be limited only to the core but we will introduce
strict locking over all of the psy callbacks.

3. We can go back to previous API, leaving the allocation done by the core:
	some_drv_probe() {
		err = power_supply_register(&some_drv->psy...);
	}


I think the second solution seems to be the most self-contained and robust.

Best regards,
Krzysztof
--
To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jon Hunter May 27, 2016, 10:28 a.m. UTC | #6
Hi Krzysztof,

On 27/05/16 09:37, Krzysztof Kozlowski wrote:

...

> Indeed I was struggling with similar issue in bq27x00_battery. The issue
> was introduced by... me :(  when moving the ownership of power supply
> structure from driver to the core. However IMHO my change exposed the
> fundamental problem with power supply.
> 
> Anyway a fix for this issue was:
> 7f1a57fdd6cb6e7b (power_supply: Fix possible NULL pointer dereference on
> early uevent)
> AFAIU, this fix no longer fixes all the issues, right?
> 
> As for the fundamental problem, the power supply core should not call
> back the driver (get_property()) until the probe ends. Even if the
> di->bat was initialized, some other fields of driver could not be set
> yet. In general, the probe did not end so we should avoid calling driver
> internal functions.

For my understanding, can you elaborate why the power-supply core should
not call back to the drivers ->get_property() before the probe ends? I
assume that registering the power-supply should be the last thing done
in the probe and so the power-supply should be configured at that point.

The problems with the bq27xxx seem to stem from the periodic update of
the bq27xxx status and so it is not clear to me that this is a generic
problem for all power-supply devices.

Cheers
Jon
Krzysztof Kozlowski May 27, 2016, 11:46 a.m. UTC | #7
On 05/27/2016 12:28 PM, Jon Hunter wrote:
> Hi Krzysztof,
> 
> On 27/05/16 09:37, Krzysztof Kozlowski wrote:
> 
> ...
> 
>> Indeed I was struggling with similar issue in bq27x00_battery. The issue
>> was introduced by... me :(  when moving the ownership of power supply
>> structure from driver to the core. However IMHO my change exposed the
>> fundamental problem with power supply.
>>
>> Anyway a fix for this issue was:
>> 7f1a57fdd6cb6e7b (power_supply: Fix possible NULL pointer dereference on
>> early uevent)
>> AFAIU, this fix no longer fixes all the issues, right?
>>
>> As for the fundamental problem, the power supply core should not call
>> back the driver (get_property()) until the probe ends. Even if the
>> di->bat was initialized, some other fields of driver could not be set
>> yet. In general, the probe did not end so we should avoid calling driver
>> internal functions.
> 
> For my understanding, can you elaborate why the power-supply core should
> not call back to the drivers ->get_property() before the probe ends? I
> assume that registering the power-supply should be the last thing done
> in the probe and so the power-supply should be configured at that point.

It is not only about power supply but other resources allocated by the
driver. If the power_supply_register() is a last call, then no problem.
But if not, then these resources won't be available.

Actually I exaggerated a little bit as a fundamental problem as this is
quite common pattern. When driver provides something (like power supply)
then after registration it should be ready for calls coming from the
core or user space. It does not have to be power supply. It might be
exposing sysfs entries or file operations (exposed before calling
power_supply_register()).

> The problems with the bq27xxx seem to stem from the periodic update of
> the bq27xxx status and so it is not clear to me that this is a generic
> problem for all power-supply devices.

Initially, the generic problem was that the core would call back the
driver from power_supply_register() in a synchronous way through
power_supply_changed(). The commit 7f1a57fdd6c changed it to an
asynchronous call. Here it looks like the same problem - the
power_supply_register() calls thermal which calls
thermal_zone_device_update() and we are back at the driver... before
finishing power_supply_register() call.

Best regards,
Krzysztof

--
To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jon Hunter May 27, 2016, 12:17 p.m. UTC | #8
On 27/05/16 12:46, Krzysztof Kozlowski wrote:
> On 05/27/2016 12:28 PM, Jon Hunter wrote:
>> Hi Krzysztof,
>>
>> On 27/05/16 09:37, Krzysztof Kozlowski wrote:
>>
>> ...
>>
>>> Indeed I was struggling with similar issue in bq27x00_battery. The issue
>>> was introduced by... me :(  when moving the ownership of power supply
>>> structure from driver to the core. However IMHO my change exposed the
>>> fundamental problem with power supply.
>>>
>>> Anyway a fix for this issue was:
>>> 7f1a57fdd6cb6e7b (power_supply: Fix possible NULL pointer dereference on
>>> early uevent)
>>> AFAIU, this fix no longer fixes all the issues, right?
>>>
>>> As for the fundamental problem, the power supply core should not call
>>> back the driver (get_property()) until the probe ends. Even if the
>>> di->bat was initialized, some other fields of driver could not be set
>>> yet. In general, the probe did not end so we should avoid calling driver
>>> internal functions.
>>
>> For my understanding, can you elaborate why the power-supply core should
>> not call back to the drivers ->get_property() before the probe ends? I
>> assume that registering the power-supply should be the last thing done
>> in the probe and so the power-supply should be configured at that point.
> 
> It is not only about power supply but other resources allocated by the
> driver. If the power_supply_register() is a last call, then no problem.
> But if not, then these resources won't be available.
> 
> Actually I exaggerated a little bit as a fundamental problem as this is
> quite common pattern. When driver provides something (like power supply)
> then after registration it should be ready for calls coming from the
> core or user space. It does not have to be power supply. It might be
> exposing sysfs entries or file operations (exposed before calling
> power_supply_register()).

Right, exactly when you register with the power-supply core the device
better be ready so that handle any incoming calls.

>> The problems with the bq27xxx seem to stem from the periodic update of
>> the bq27xxx status and so it is not clear to me that this is a generic
>> problem for all power-supply devices.
> 
> Initially, the generic problem was that the core would call back the
> driver from power_supply_register() in a synchronous way through
> power_supply_changed(). The commit 7f1a57fdd6c changed it to an
> asynchronous call. Here it looks like the same problem - the
> power_supply_register() calls thermal which calls
> thermal_zone_device_update() and we are back at the driver... before
> finishing power_supply_register() call.

So I am still not convinced this is a generic problem but a problem with
the bq27xxx. In fact, I think that commit 7f1a57fdd6c could be avoided
if we did something like ...

http://marc.info/?l=linux-kernel&m=146425896332433&w=2

AFAICT in most cases, in ->get_property() you should have no need to
access a driver's equivalent of di->bat, because you have already been
passed a pointer to this via the *psy argument.

Cheers
Jon
Krzysztof Kozlowski May 27, 2016, 12:55 p.m. UTC | #9
On 05/27/2016 02:17 PM, Jon Hunter wrote:
> 
> On 27/05/16 12:46, Krzysztof Kozlowski wrote:
>> On 05/27/2016 12:28 PM, Jon Hunter wrote:
>>> Hi Krzysztof,
>>>
>>> On 27/05/16 09:37, Krzysztof Kozlowski wrote:
>>>
>>> ...
>>>
>>>> Indeed I was struggling with similar issue in bq27x00_battery. The issue
>>>> was introduced by... me :(  when moving the ownership of power supply
>>>> structure from driver to the core. However IMHO my change exposed the
>>>> fundamental problem with power supply.
>>>>
>>>> Anyway a fix for this issue was:
>>>> 7f1a57fdd6cb6e7b (power_supply: Fix possible NULL pointer dereference on
>>>> early uevent)
>>>> AFAIU, this fix no longer fixes all the issues, right?
>>>>
>>>> As for the fundamental problem, the power supply core should not call
>>>> back the driver (get_property()) until the probe ends. Even if the
>>>> di->bat was initialized, some other fields of driver could not be set
>>>> yet. In general, the probe did not end so we should avoid calling driver
>>>> internal functions.
>>>
>>> For my understanding, can you elaborate why the power-supply core should
>>> not call back to the drivers ->get_property() before the probe ends? I
>>> assume that registering the power-supply should be the last thing done
>>> in the probe and so the power-supply should be configured at that point.
>>
>> It is not only about power supply but other resources allocated by the
>> driver. If the power_supply_register() is a last call, then no problem.
>> But if not, then these resources won't be available.
>>
>> Actually I exaggerated a little bit as a fundamental problem as this is
>> quite common pattern. When driver provides something (like power supply)
>> then after registration it should be ready for calls coming from the
>> core or user space. It does not have to be power supply. It might be
>> exposing sysfs entries or file operations (exposed before calling
>> power_supply_register()).
> 
> Right, exactly when you register with the power-supply core the device
> better be ready so that handle any incoming calls.

Yes, the unusual thing here is that the device is called back directly
from the power_supply_register() call.

> 
>>> The problems with the bq27xxx seem to stem from the periodic update of
>>> the bq27xxx status and so it is not clear to me that this is a generic
>>> problem for all power-supply devices.
>>
>> Initially, the generic problem was that the core would call back the
>> driver from power_supply_register() in a synchronous way through
>> power_supply_changed(). The commit 7f1a57fdd6c changed it to an
>> asynchronous call. Here it looks like the same problem - the
>> power_supply_register() calls thermal which calls
>> thermal_zone_device_update() and we are back at the driver... before
>> finishing power_supply_register() call.
> 
> So I am still not convinced this is a generic problem but a problem with
> the bq27xxx. In fact, I think that commit 7f1a57fdd6c could be avoided
> if we did something like ...
> 
> http://marc.info/?l=linux-kernel&m=146425896332433&w=2
> 
> AFAICT in most cases, in ->get_property() you should have no need to
> access a driver's equivalent of di->bat, because you have already been
> passed a pointer to this via the *psy argument.

I agree that get_property() shouldn't access di->bat. However if it is
not forbidden (at least by documentation) then someone might just do it
because he does not know about such requirement.

Best regards,
Krzysztof

--
To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jon Hunter May 31, 2016, 5:24 p.m. UTC | #10
On 27/05/16 13:55, Krzysztof Kozlowski wrote:
> On 05/27/2016 02:17 PM, Jon Hunter wrote:
>>
>> On 27/05/16 12:46, Krzysztof Kozlowski wrote:
>>> On 05/27/2016 12:28 PM, Jon Hunter wrote:
>>>> Hi Krzysztof,
>>>>
>>>> On 27/05/16 09:37, Krzysztof Kozlowski wrote:
>>>>
>>>> ...
>>>>
>>>>> Indeed I was struggling with similar issue in bq27x00_battery. The issue
>>>>> was introduced by... me :(  when moving the ownership of power supply
>>>>> structure from driver to the core. However IMHO my change exposed the
>>>>> fundamental problem with power supply.
>>>>>
>>>>> Anyway a fix for this issue was:
>>>>> 7f1a57fdd6cb6e7b (power_supply: Fix possible NULL pointer dereference on
>>>>> early uevent)
>>>>> AFAIU, this fix no longer fixes all the issues, right?
>>>>>
>>>>> As for the fundamental problem, the power supply core should not call
>>>>> back the driver (get_property()) until the probe ends. Even if the
>>>>> di->bat was initialized, some other fields of driver could not be set
>>>>> yet. In general, the probe did not end so we should avoid calling driver
>>>>> internal functions.
>>>>
>>>> For my understanding, can you elaborate why the power-supply core should
>>>> not call back to the drivers ->get_property() before the probe ends? I
>>>> assume that registering the power-supply should be the last thing done
>>>> in the probe and so the power-supply should be configured at that point.
>>>
>>> It is not only about power supply but other resources allocated by the
>>> driver. If the power_supply_register() is a last call, then no problem.
>>> But if not, then these resources won't be available.
>>>
>>> Actually I exaggerated a little bit as a fundamental problem as this is
>>> quite common pattern. When driver provides something (like power supply)
>>> then after registration it should be ready for calls coming from the
>>> core or user space. It does not have to be power supply. It might be
>>> exposing sysfs entries or file operations (exposed before calling
>>> power_supply_register()).
>>
>> Right, exactly when you register with the power-supply core the device
>> better be ready so that handle any incoming calls.
> 
> Yes, the unusual thing here is that the device is called back directly
> from the power_supply_register() call.
> 
>>
>>>> The problems with the bq27xxx seem to stem from the periodic update of
>>>> the bq27xxx status and so it is not clear to me that this is a generic
>>>> problem for all power-supply devices.
>>>
>>> Initially, the generic problem was that the core would call back the
>>> driver from power_supply_register() in a synchronous way through
>>> power_supply_changed(). The commit 7f1a57fdd6c changed it to an
>>> asynchronous call. Here it looks like the same problem - the
>>> power_supply_register() calls thermal which calls
>>> thermal_zone_device_update() and we are back at the driver... before
>>> finishing power_supply_register() call.
>>
>> So I am still not convinced this is a generic problem but a problem with
>> the bq27xxx. In fact, I think that commit 7f1a57fdd6c could be avoided
>> if we did something like ...
>>
>> http://marc.info/?l=linux-kernel&m=146425896332433&w=2
>>
>> AFAICT in most cases, in ->get_property() you should have no need to
>> access a driver's equivalent of di->bat, because you have already been
>> passed a pointer to this via the *psy argument.
> 
> I agree that get_property() shouldn't access di->bat. However if it is
> not forbidden (at least by documentation) then someone might just do it
> because he does not know about such requirement.

In that case, shouldn't the driver should check that di->bat is valid
before anyone attempts to dereference it? However, if you and/or Rhyland
have a generic fix for preventing this, please go ahead and propose it.

Cheers
Jon
diff mbox

Patch

diff --git a/drivers/power/bq27xxx_battery.c b/drivers/power/bq27xxx_battery.c
index 45f6ebf88df6..1334ed522332 100644
--- a/drivers/power/bq27xxx_battery.c
+++ b/drivers/power/bq27xxx_battery.c
@@ -733,6 +733,9 @@  static void bq27xxx_battery_poll(struct work_struct *work)
                        container_of(work, struct bq27xxx_device_info,
                                     work.work);
 
+       if (!di->bat)
+               return;
+
        bq27xxx_battery_update(di);
 
        if (poll_interval > 0) {