diff mbox

[U-Boot,1/5,v2] dm: device_remove: Don't return in device_chld_remove() upon error

Message ID 20170424074804.15143-1-sr@denx.de
State Changes Requested
Delegated to: Bin Meng
Headers show

Commit Message

Stefan Roese April 24, 2017, 7:48 a.m. UTC
On my x86 platform I've noticed, that calling dm_uninit() or the new
function dm_remove_devices_flags() does not remove the desired device at
all. Debugging showed, that the serial uclass returns -EPERM in
serial_pre_remove() and this leads to a complete stop of the device
removal pretty early, as the serial device is one of the first ones in
the DM. Here the dm tree output:

=> dm tree
 Class       Probed   Name
----------------------------------------
 root        [ + ]    root_driver
 rsa_mod_exp [   ]    |-- mod_exp_sw
 serial      [ + ]    |-- serial
 rtc         [   ]    |-- rtc
 timer       [ + ]    |-- tsc-timer
 syscon      [ + ]    |-- pch_pinctrl
 ...

In this example, device_remove(root) will stop directly after trying to
remove the "serial" device.

To solve this problem, this patch removes the return upon error check in
the device_remove() call in device_chld_remove(). This leads to
device_chld_remove() continuing with the device_remove() call to the
following child devices.

Signed-off-by: Stefan Roese <sr@denx.de>
Cc: Simon Glass <sjg@chromium.org>
Cc: Bin Meng <bmeng.cn@gmail.com>
---
v2:
- Add debug() output in error case

 drivers/core/device-remove.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Comments

Simon Glass April 29, 2017, 12:26 a.m. UTC | #1
Hi Stefan,

On 24 April 2017 at 01:48, Stefan Roese <sr@denx.de> wrote:
> On my x86 platform I've noticed, that calling dm_uninit() or the new
> function dm_remove_devices_flags() does not remove the desired device at
> all. Debugging showed, that the serial uclass returns -EPERM in
> serial_pre_remove() and this leads to a complete stop of the device
> removal pretty early, as the serial device is one of the first ones in
> the DM. Here the dm tree output:
>
> => dm tree
>  Class       Probed   Name
> ----------------------------------------
>  root        [ + ]    root_driver
>  rsa_mod_exp [   ]    |-- mod_exp_sw
>  serial      [ + ]    |-- serial
>  rtc         [   ]    |-- rtc
>  timer       [ + ]    |-- tsc-timer
>  syscon      [ + ]    |-- pch_pinctrl
>  ...
>
> In this example, device_remove(root) will stop directly after trying to
> remove the "serial" device.
>
> To solve this problem, this patch removes the return upon error check in
> the device_remove() call in device_chld_remove(). This leads to
> device_chld_remove() continuing with the device_remove() call to the
> following child devices.
>
> Signed-off-by: Stefan Roese <sr@denx.de>
> Cc: Simon Glass <sjg@chromium.org>
> Cc: Bin Meng <bmeng.cn@gmail.com>
> ---
> v2:
> - Add debug() output in error case
>
>  drivers/core/device-remove.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)

I thought that your change to force removal of the stdio dev would
make this change unnecessary?

I really would rather fix the root cause if we can.

>
> diff --git a/drivers/core/device-remove.c b/drivers/core/device-remove.c
> index cc0043b990..a1c0103af0 100644
> --- a/drivers/core/device-remove.c
> +++ b/drivers/core/device-remove.c
> @@ -58,8 +58,14 @@ static int device_chld_remove(struct udevice *dev, uint flags)
>
>         list_for_each_entry_safe(pos, n, &dev->child_head, sibling_node) {
>                 ret = device_remove(pos, flags);
> -               if (ret)
> -                       return ret;
> +               /*
> +                * Don't stop on error here, the remaining child devices still
> +                * need to get removed.
> +                */
> +               if (ret) {
> +                       debug("%s: device_remove(%s) returned %d\n",
> +                             __func__, pos->name, ret);
> +               }
>         }
>
>         return 0;
> --
> 2.12.2
>

Regards,
Simon
Stefan Roese May 1, 2017, 6:14 a.m. UTC | #2
Hi Simon,

On 29.04.2017 02:26, Simon Glass wrote:
> On 24 April 2017 at 01:48, Stefan Roese <sr@denx.de> wrote:
>> On my x86 platform I've noticed, that calling dm_uninit() or the new
>> function dm_remove_devices_flags() does not remove the desired device at
>> all. Debugging showed, that the serial uclass returns -EPERM in
>> serial_pre_remove() and this leads to a complete stop of the device
>> removal pretty early, as the serial device is one of the first ones in
>> the DM. Here the dm tree output:
>>
>> => dm tree
>>  Class       Probed   Name
>> ----------------------------------------
>>  root        [ + ]    root_driver
>>  rsa_mod_exp [   ]    |-- mod_exp_sw
>>  serial      [ + ]    |-- serial
>>  rtc         [   ]    |-- rtc
>>  timer       [ + ]    |-- tsc-timer
>>  syscon      [ + ]    |-- pch_pinctrl
>>  ...
>>
>> In this example, device_remove(root) will stop directly after trying to
>> remove the "serial" device.
>>
>> To solve this problem, this patch removes the return upon error check in
>> the device_remove() call in device_chld_remove(). This leads to
>> device_chld_remove() continuing with the device_remove() call to the
>> following child devices.
>>
>> Signed-off-by: Stefan Roese <sr@denx.de>
>> Cc: Simon Glass <sjg@chromium.org>
>> Cc: Bin Meng <bmeng.cn@gmail.com>
>> ---
>> v2:
>> - Add debug() output in error case
>>
>>  drivers/core/device-remove.c | 10 ++++++++--
>>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> I thought that your change to force removal of the stdio dev would
> make this change unnecessary?

Yes, the force removal made this change unnecessary in this specific
case. But...

> I really would rather fix the root cause if we can.

... the current implementation to exit the loop over all children
upon error and not remove the remaining children is wrong IMO. All
devices should at least be tried to get removed, even if one fails
to get removed. This is what this patch makes sure of.

Thanks,
Stefan
Simon Glass May 2, 2017, 11:27 a.m. UTC | #3
Hi Stefan,

On 1 May 2017 at 00:14, Stefan Roese <sr@denx.de> wrote:
> Hi Simon,
>
>
> On 29.04.2017 02:26, Simon Glass wrote:
>>
>> On 24 April 2017 at 01:48, Stefan Roese <sr@denx.de> wrote:
>>>
>>> On my x86 platform I've noticed, that calling dm_uninit() or the new
>>> function dm_remove_devices_flags() does not remove the desired device at
>>> all. Debugging showed, that the serial uclass returns -EPERM in
>>> serial_pre_remove() and this leads to a complete stop of the device
>>> removal pretty early, as the serial device is one of the first ones in
>>> the DM. Here the dm tree output:
>>>
>>> => dm tree
>>>  Class       Probed   Name
>>> ----------------------------------------
>>>  root        [ + ]    root_driver
>>>  rsa_mod_exp [   ]    |-- mod_exp_sw
>>>  serial      [ + ]    |-- serial
>>>  rtc         [   ]    |-- rtc
>>>  timer       [ + ]    |-- tsc-timer
>>>  syscon      [ + ]    |-- pch_pinctrl
>>>  ...
>>>
>>> In this example, device_remove(root) will stop directly after trying to
>>> remove the "serial" device.
>>>
>>> To solve this problem, this patch removes the return upon error check in
>>> the device_remove() call in device_chld_remove(). This leads to
>>> device_chld_remove() continuing with the device_remove() call to the
>>> following child devices.
>>>
>>> Signed-off-by: Stefan Roese <sr@denx.de>
>>> Cc: Simon Glass <sjg@chromium.org>
>>> Cc: Bin Meng <bmeng.cn@gmail.com>
>>> ---
>>> v2:
>>> - Add debug() output in error case
>>>
>>>  drivers/core/device-remove.c | 10 ++++++++--
>>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>
>>
>> I thought that your change to force removal of the stdio dev would
>> make this change unnecessary?
>
>
> Yes, the force removal made this change unnecessary in this specific
> case. But...
>
>> I really would rather fix the root cause if we can.
>
>
> ... the current implementation to exit the loop over all children
> upon error and not remove the remaining children is wrong IMO. All
> devices should at least be tried to get removed, even if one fails
> to get removed. This is what this patch makes sure of.

Yes I see that, but not being able to remove is actually an error. In
the normal course of events, a device that will not remove itself is
likely buggy.

What do you think about adding a new remove flag to indicate that
failures should be skipped?

Regards,
Simon
Stefan Roese May 2, 2017, 11:45 a.m. UTC | #4
Hi Simon,

On 02.05.2017 13:27, Simon Glass wrote:
> On 1 May 2017 at 00:14, Stefan Roese <sr@denx.de> wrote:
>> Hi Simon,
>>
>>
>> On 29.04.2017 02:26, Simon Glass wrote:
>>>
>>> On 24 April 2017 at 01:48, Stefan Roese <sr@denx.de> wrote:
>>>>
>>>> On my x86 platform I've noticed, that calling dm_uninit() or the new
>>>> function dm_remove_devices_flags() does not remove the desired device at
>>>> all. Debugging showed, that the serial uclass returns -EPERM in
>>>> serial_pre_remove() and this leads to a complete stop of the device
>>>> removal pretty early, as the serial device is one of the first ones in
>>>> the DM. Here the dm tree output:
>>>>
>>>> => dm tree
>>>>  Class       Probed   Name
>>>> ----------------------------------------
>>>>  root        [ + ]    root_driver
>>>>  rsa_mod_exp [   ]    |-- mod_exp_sw
>>>>  serial      [ + ]    |-- serial
>>>>  rtc         [   ]    |-- rtc
>>>>  timer       [ + ]    |-- tsc-timer
>>>>  syscon      [ + ]    |-- pch_pinctrl
>>>>  ...
>>>>
>>>> In this example, device_remove(root) will stop directly after trying to
>>>> remove the "serial" device.
>>>>
>>>> To solve this problem, this patch removes the return upon error check in
>>>> the device_remove() call in device_chld_remove(). This leads to
>>>> device_chld_remove() continuing with the device_remove() call to the
>>>> following child devices.
>>>>
>>>> Signed-off-by: Stefan Roese <sr@denx.de>
>>>> Cc: Simon Glass <sjg@chromium.org>
>>>> Cc: Bin Meng <bmeng.cn@gmail.com>
>>>> ---
>>>> v2:
>>>> - Add debug() output in error case
>>>>
>>>>  drivers/core/device-remove.c | 10 ++++++++--
>>>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>>
>>>
>>> I thought that your change to force removal of the stdio dev would
>>> make this change unnecessary?
>>
>>
>> Yes, the force removal made this change unnecessary in this specific
>> case. But...
>>
>>> I really would rather fix the root cause if we can.
>>
>>
>> ... the current implementation to exit the loop over all children
>> upon error and not remove the remaining children is wrong IMO. All
>> devices should at least be tried to get removed, even if one fails
>> to get removed. This is what this patch makes sure of.
>
> Yes I see that, but not being able to remove is actually an error. In
> the normal course of events, a device that will not remove itself is
> likely buggy.

Isn't it enough then to just print an error message in this case
in this loop - change debug() to printf() in this current patch
version? Then "users" of this code will be aware of such remove
failures and can take appropriate actions (fix bug etc in their
setup).

> What do you think about adding a new remove flag to indicate that
> failures should be skipped?

I'm a bit afraid that this makes the code overly complex. But if
you prefer to have it this way, then I can come up with such a
version as well. Just let me know.

Thanks,
Stefan
Simon Glass May 4, 2017, 4:50 p.m. UTC | #5
Hi Stefan,

On 2 May 2017 at 05:45, Stefan Roese <sr@denx.de> wrote:
> Hi Simon,
>
>
> On 02.05.2017 13:27, Simon Glass wrote:
>>
>> On 1 May 2017 at 00:14, Stefan Roese <sr@denx.de> wrote:
>>>
>>> Hi Simon,
>>>
>>>
>>> On 29.04.2017 02:26, Simon Glass wrote:
>>>>
>>>>
>>>> On 24 April 2017 at 01:48, Stefan Roese <sr@denx.de> wrote:
>>>>>
>>>>>
>>>>> On my x86 platform I've noticed, that calling dm_uninit() or the new
>>>>> function dm_remove_devices_flags() does not remove the desired device
>>>>> at
>>>>> all. Debugging showed, that the serial uclass returns -EPERM in
>>>>> serial_pre_remove() and this leads to a complete stop of the device
>>>>> removal pretty early, as the serial device is one of the first ones in
>>>>> the DM. Here the dm tree output:
>>>>>
>>>>> => dm tree
>>>>>  Class       Probed   Name
>>>>> ----------------------------------------
>>>>>  root        [ + ]    root_driver
>>>>>  rsa_mod_exp [   ]    |-- mod_exp_sw
>>>>>  serial      [ + ]    |-- serial
>>>>>  rtc         [   ]    |-- rtc
>>>>>  timer       [ + ]    |-- tsc-timer
>>>>>  syscon      [ + ]    |-- pch_pinctrl
>>>>>  ...
>>>>>
>>>>> In this example, device_remove(root) will stop directly after trying to
>>>>> remove the "serial" device.
>>>>>
>>>>> To solve this problem, this patch removes the return upon error check
>>>>> in
>>>>> the device_remove() call in device_chld_remove(). This leads to
>>>>> device_chld_remove() continuing with the device_remove() call to the
>>>>> following child devices.
>>>>>
>>>>> Signed-off-by: Stefan Roese <sr@denx.de>
>>>>> Cc: Simon Glass <sjg@chromium.org>
>>>>> Cc: Bin Meng <bmeng.cn@gmail.com>
>>>>> ---
>>>>> v2:
>>>>> - Add debug() output in error case
>>>>>
>>>>>  drivers/core/device-remove.c | 10 ++++++++--
>>>>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>>>
>>>>
>>>>
>>>> I thought that your change to force removal of the stdio dev would
>>>> make this change unnecessary?
>>>
>>>
>>>
>>> Yes, the force removal made this change unnecessary in this specific
>>> case. But...
>>>
>>>> I really would rather fix the root cause if we can.
>>>
>>>
>>>
>>> ... the current implementation to exit the loop over all children
>>> upon error and not remove the remaining children is wrong IMO. All
>>> devices should at least be tried to get removed, even if one fails
>>> to get removed. This is what this patch makes sure of.
>>
>>
>> Yes I see that, but not being able to remove is actually an error. In
>> the normal course of events, a device that will not remove itself is
>> likely buggy.
>
>
> Isn't it enough then to just print an error message in this case
> in this loop - change debug() to printf() in this current patch
> version? Then "users" of this code will be aware of such remove
> failures and can take appropriate actions (fix bug etc in their
> setup).

Possibly, but programmatically it becomes impossible to detect a
failure. Say the USB fails to stop its DMA, we might want to reboot
rather than continue to boot Linux and crash.

>
>> What do you think about adding a new remove flag to indicate that
>> failures should be skipped?
>
>
> I'm a bit afraid that this makes the code overly complex. But if
> you prefer to have it this way, then I can come up with such a
> version as well. Just let me know.

I don't think the complexity is too great. It does need a new test.
But I think we should hold the line on error checking even with
remove(), by default.

>
> Thanks,
> Stefan

Regards,
Simon
Stefan Roese May 8, 2017, 7:35 a.m. UTC | #6
Hi Simon,

On 04.05.2017 18:50, Simon Glass wrote:

<snip>

>>>> ... the current implementation to exit the loop over all children
>>>> upon error and not remove the remaining children is wrong IMO. All
>>>> devices should at least be tried to get removed, even if one fails
>>>> to get removed. This is what this patch makes sure of.
>>>
>>>
>>> Yes I see that, but not being able to remove is actually an error. In
>>> the normal course of events, a device that will not remove itself is
>>> likely buggy.
>>
>>
>> Isn't it enough then to just print an error message in this case
>> in this loop - change debug() to printf() in this current patch
>> version? Then "users" of this code will be aware of such remove
>> failures and can take appropriate actions (fix bug etc in their
>> setup).
>
> Possibly, but programmatically it becomes impossible to detect a
> failure. Say the USB fails to stop its DMA, we might want to reboot
> rather than continue to boot Linux and crash.

Okay, I see your point.

>>
>>> What do you think about adding a new remove flag to indicate that
>>> failures should be skipped?
>>
>>
>> I'm a bit afraid that this makes the code overly complex. But if
>> you prefer to have it this way, then I can come up with such a
>> version as well. Just let me know.
>
> I don't think the complexity is too great. It does need a new test.
> But I think we should hold the line on error checking even with
> remove(), by default.

Understood. How about doing it this way: You drop this patch from
the series for now (it still works for us with the force remove of
the serial driver) and apply the remaining patches. I'll try to get
back to this skip-failures flag implementation in a few days / weeks.

Is this okay for you?

Thanks,
Stefan
Simon Glass May 9, 2017, 3:20 a.m. UTC | #7
Hi Stefan,

On 8 May 2017 at 01:35, Stefan Roese <sr@denx.de> wrote:
> Hi Simon,
>
> On 04.05.2017 18:50, Simon Glass wrote:
>
> <snip>
>
>>>>> ... the current implementation to exit the loop over all children
>>>>> upon error and not remove the remaining children is wrong IMO. All
>>>>> devices should at least be tried to get removed, even if one fails
>>>>> to get removed. This is what this patch makes sure of.
>>>>
>>>>
>>>>
>>>> Yes I see that, but not being able to remove is actually an error. In
>>>> the normal course of events, a device that will not remove itself is
>>>> likely buggy.
>>>
>>>
>>>
>>> Isn't it enough then to just print an error message in this case
>>> in this loop - change debug() to printf() in this current patch
>>> version? Then "users" of this code will be aware of such remove
>>> failures and can take appropriate actions (fix bug etc in their
>>> setup).
>>
>>
>> Possibly, but programmatically it becomes impossible to detect a
>> failure. Say the USB fails to stop its DMA, we might want to reboot
>> rather than continue to boot Linux and crash.
>
>
> Okay, I see your point.
>
>>>
>>>> What do you think about adding a new remove flag to indicate that
>>>> failures should be skipped?
>>>
>>>
>>>
>>> I'm a bit afraid that this makes the code overly complex. But if
>>> you prefer to have it this way, then I can come up with such a
>>> version as well. Just let me know.
>>
>>
>> I don't think the complexity is too great. It does need a new test.
>> But I think we should hold the line on error checking even with
>> remove(), by default.
>
>
> Understood. How about doing it this way: You drop this patch from
> the series for now (it still works for us with the force remove of
> the serial driver) and apply the remaining patches. I'll try to get
> back to this skip-failures flag implementation in a few days / weeks.
>
> Is this okay for you?

Yes that's fine.

>
> Thanks,
> Stefan

Regards,
Simon
diff mbox

Patch

diff --git a/drivers/core/device-remove.c b/drivers/core/device-remove.c
index cc0043b990..a1c0103af0 100644
--- a/drivers/core/device-remove.c
+++ b/drivers/core/device-remove.c
@@ -58,8 +58,14 @@  static int device_chld_remove(struct udevice *dev, uint flags)
 
 	list_for_each_entry_safe(pos, n, &dev->child_head, sibling_node) {
 		ret = device_remove(pos, flags);
-		if (ret)
-			return ret;
+		/*
+		 * Don't stop on error here, the remaining child devices still
+		 * need to get removed.
+		 */
+		if (ret) {
+			debug("%s: device_remove(%s) returned %d\n",
+			      __func__, pos->name, ret);
+		}
 	}
 
 	return 0;