Message ID | 20170424074804.15143-1-sr@denx.de |
---|---|
State | Changes Requested |
Delegated to: | Bin Meng |
Headers | show |
Hi Stefan, On 24 April 2017 at 01:48, Stefan Roese <sr@denx.de> wrote: > On my x86 platform I've noticed, that calling dm_uninit() or the new > function dm_remove_devices_flags() does not remove the desired device at > all. Debugging showed, that the serial uclass returns -EPERM in > serial_pre_remove() and this leads to a complete stop of the device > removal pretty early, as the serial device is one of the first ones in > the DM. Here the dm tree output: > > => dm tree > Class Probed Name > ---------------------------------------- > root [ + ] root_driver > rsa_mod_exp [ ] |-- mod_exp_sw > serial [ + ] |-- serial > rtc [ ] |-- rtc > timer [ + ] |-- tsc-timer > syscon [ + ] |-- pch_pinctrl > ... > > In this example, device_remove(root) will stop directly after trying to > remove the "serial" device. > > To solve this problem, this patch removes the return upon error check in > the device_remove() call in device_chld_remove(). This leads to > device_chld_remove() continuing with the device_remove() call to the > following child devices. > > Signed-off-by: Stefan Roese <sr@denx.de> > Cc: Simon Glass <sjg@chromium.org> > Cc: Bin Meng <bmeng.cn@gmail.com> > --- > v2: > - Add debug() output in error case > > drivers/core/device-remove.c | 10 ++++++++-- > 1 file changed, 8 insertions(+), 2 deletions(-) I thought that your change to force removal of the stdio dev would make this change unnecessary? I really would rather fix the root cause if we can. > > diff --git a/drivers/core/device-remove.c b/drivers/core/device-remove.c > index cc0043b990..a1c0103af0 100644 > --- a/drivers/core/device-remove.c > +++ b/drivers/core/device-remove.c > @@ -58,8 +58,14 @@ static int device_chld_remove(struct udevice *dev, uint flags) > > list_for_each_entry_safe(pos, n, &dev->child_head, sibling_node) { > ret = device_remove(pos, flags); > - if (ret) > - return ret; > + /* > + * Don't stop on error here, the remaining child devices still > + * need to get removed. > + */ > + if (ret) { > + debug("%s: device_remove(%s) returned %d\n", > + __func__, pos->name, ret); > + } > } > > return 0; > -- > 2.12.2 > Regards, Simon
Hi Simon, On 29.04.2017 02:26, Simon Glass wrote: > On 24 April 2017 at 01:48, Stefan Roese <sr@denx.de> wrote: >> On my x86 platform I've noticed, that calling dm_uninit() or the new >> function dm_remove_devices_flags() does not remove the desired device at >> all. Debugging showed, that the serial uclass returns -EPERM in >> serial_pre_remove() and this leads to a complete stop of the device >> removal pretty early, as the serial device is one of the first ones in >> the DM. Here the dm tree output: >> >> => dm tree >> Class Probed Name >> ---------------------------------------- >> root [ + ] root_driver >> rsa_mod_exp [ ] |-- mod_exp_sw >> serial [ + ] |-- serial >> rtc [ ] |-- rtc >> timer [ + ] |-- tsc-timer >> syscon [ + ] |-- pch_pinctrl >> ... >> >> In this example, device_remove(root) will stop directly after trying to >> remove the "serial" device. >> >> To solve this problem, this patch removes the return upon error check in >> the device_remove() call in device_chld_remove(). This leads to >> device_chld_remove() continuing with the device_remove() call to the >> following child devices. >> >> Signed-off-by: Stefan Roese <sr@denx.de> >> Cc: Simon Glass <sjg@chromium.org> >> Cc: Bin Meng <bmeng.cn@gmail.com> >> --- >> v2: >> - Add debug() output in error case >> >> drivers/core/device-remove.c | 10 ++++++++-- >> 1 file changed, 8 insertions(+), 2 deletions(-) > > I thought that your change to force removal of the stdio dev would > make this change unnecessary? Yes, the force removal made this change unnecessary in this specific case. But... > I really would rather fix the root cause if we can. ... the current implementation to exit the loop over all children upon error and not remove the remaining children is wrong IMO. All devices should at least be tried to get removed, even if one fails to get removed. This is what this patch makes sure of. Thanks, Stefan
Hi Stefan, On 1 May 2017 at 00:14, Stefan Roese <sr@denx.de> wrote: > Hi Simon, > > > On 29.04.2017 02:26, Simon Glass wrote: >> >> On 24 April 2017 at 01:48, Stefan Roese <sr@denx.de> wrote: >>> >>> On my x86 platform I've noticed, that calling dm_uninit() or the new >>> function dm_remove_devices_flags() does not remove the desired device at >>> all. Debugging showed, that the serial uclass returns -EPERM in >>> serial_pre_remove() and this leads to a complete stop of the device >>> removal pretty early, as the serial device is one of the first ones in >>> the DM. Here the dm tree output: >>> >>> => dm tree >>> Class Probed Name >>> ---------------------------------------- >>> root [ + ] root_driver >>> rsa_mod_exp [ ] |-- mod_exp_sw >>> serial [ + ] |-- serial >>> rtc [ ] |-- rtc >>> timer [ + ] |-- tsc-timer >>> syscon [ + ] |-- pch_pinctrl >>> ... >>> >>> In this example, device_remove(root) will stop directly after trying to >>> remove the "serial" device. >>> >>> To solve this problem, this patch removes the return upon error check in >>> the device_remove() call in device_chld_remove(). This leads to >>> device_chld_remove() continuing with the device_remove() call to the >>> following child devices. >>> >>> Signed-off-by: Stefan Roese <sr@denx.de> >>> Cc: Simon Glass <sjg@chromium.org> >>> Cc: Bin Meng <bmeng.cn@gmail.com> >>> --- >>> v2: >>> - Add debug() output in error case >>> >>> drivers/core/device-remove.c | 10 ++++++++-- >>> 1 file changed, 8 insertions(+), 2 deletions(-) >> >> >> I thought that your change to force removal of the stdio dev would >> make this change unnecessary? > > > Yes, the force removal made this change unnecessary in this specific > case. But... > >> I really would rather fix the root cause if we can. > > > ... the current implementation to exit the loop over all children > upon error and not remove the remaining children is wrong IMO. All > devices should at least be tried to get removed, even if one fails > to get removed. This is what this patch makes sure of. Yes I see that, but not being able to remove is actually an error. In the normal course of events, a device that will not remove itself is likely buggy. What do you think about adding a new remove flag to indicate that failures should be skipped? Regards, Simon
Hi Simon, On 02.05.2017 13:27, Simon Glass wrote: > On 1 May 2017 at 00:14, Stefan Roese <sr@denx.de> wrote: >> Hi Simon, >> >> >> On 29.04.2017 02:26, Simon Glass wrote: >>> >>> On 24 April 2017 at 01:48, Stefan Roese <sr@denx.de> wrote: >>>> >>>> On my x86 platform I've noticed, that calling dm_uninit() or the new >>>> function dm_remove_devices_flags() does not remove the desired device at >>>> all. Debugging showed, that the serial uclass returns -EPERM in >>>> serial_pre_remove() and this leads to a complete stop of the device >>>> removal pretty early, as the serial device is one of the first ones in >>>> the DM. Here the dm tree output: >>>> >>>> => dm tree >>>> Class Probed Name >>>> ---------------------------------------- >>>> root [ + ] root_driver >>>> rsa_mod_exp [ ] |-- mod_exp_sw >>>> serial [ + ] |-- serial >>>> rtc [ ] |-- rtc >>>> timer [ + ] |-- tsc-timer >>>> syscon [ + ] |-- pch_pinctrl >>>> ... >>>> >>>> In this example, device_remove(root) will stop directly after trying to >>>> remove the "serial" device. >>>> >>>> To solve this problem, this patch removes the return upon error check in >>>> the device_remove() call in device_chld_remove(). This leads to >>>> device_chld_remove() continuing with the device_remove() call to the >>>> following child devices. >>>> >>>> Signed-off-by: Stefan Roese <sr@denx.de> >>>> Cc: Simon Glass <sjg@chromium.org> >>>> Cc: Bin Meng <bmeng.cn@gmail.com> >>>> --- >>>> v2: >>>> - Add debug() output in error case >>>> >>>> drivers/core/device-remove.c | 10 ++++++++-- >>>> 1 file changed, 8 insertions(+), 2 deletions(-) >>> >>> >>> I thought that your change to force removal of the stdio dev would >>> make this change unnecessary? >> >> >> Yes, the force removal made this change unnecessary in this specific >> case. But... >> >>> I really would rather fix the root cause if we can. >> >> >> ... the current implementation to exit the loop over all children >> upon error and not remove the remaining children is wrong IMO. All >> devices should at least be tried to get removed, even if one fails >> to get removed. This is what this patch makes sure of. > > Yes I see that, but not being able to remove is actually an error. In > the normal course of events, a device that will not remove itself is > likely buggy. Isn't it enough then to just print an error message in this case in this loop - change debug() to printf() in this current patch version? Then "users" of this code will be aware of such remove failures and can take appropriate actions (fix bug etc in their setup). > What do you think about adding a new remove flag to indicate that > failures should be skipped? I'm a bit afraid that this makes the code overly complex. But if you prefer to have it this way, then I can come up with such a version as well. Just let me know. Thanks, Stefan
Hi Stefan, On 2 May 2017 at 05:45, Stefan Roese <sr@denx.de> wrote: > Hi Simon, > > > On 02.05.2017 13:27, Simon Glass wrote: >> >> On 1 May 2017 at 00:14, Stefan Roese <sr@denx.de> wrote: >>> >>> Hi Simon, >>> >>> >>> On 29.04.2017 02:26, Simon Glass wrote: >>>> >>>> >>>> On 24 April 2017 at 01:48, Stefan Roese <sr@denx.de> wrote: >>>>> >>>>> >>>>> On my x86 platform I've noticed, that calling dm_uninit() or the new >>>>> function dm_remove_devices_flags() does not remove the desired device >>>>> at >>>>> all. Debugging showed, that the serial uclass returns -EPERM in >>>>> serial_pre_remove() and this leads to a complete stop of the device >>>>> removal pretty early, as the serial device is one of the first ones in >>>>> the DM. Here the dm tree output: >>>>> >>>>> => dm tree >>>>> Class Probed Name >>>>> ---------------------------------------- >>>>> root [ + ] root_driver >>>>> rsa_mod_exp [ ] |-- mod_exp_sw >>>>> serial [ + ] |-- serial >>>>> rtc [ ] |-- rtc >>>>> timer [ + ] |-- tsc-timer >>>>> syscon [ + ] |-- pch_pinctrl >>>>> ... >>>>> >>>>> In this example, device_remove(root) will stop directly after trying to >>>>> remove the "serial" device. >>>>> >>>>> To solve this problem, this patch removes the return upon error check >>>>> in >>>>> the device_remove() call in device_chld_remove(). This leads to >>>>> device_chld_remove() continuing with the device_remove() call to the >>>>> following child devices. >>>>> >>>>> Signed-off-by: Stefan Roese <sr@denx.de> >>>>> Cc: Simon Glass <sjg@chromium.org> >>>>> Cc: Bin Meng <bmeng.cn@gmail.com> >>>>> --- >>>>> v2: >>>>> - Add debug() output in error case >>>>> >>>>> drivers/core/device-remove.c | 10 ++++++++-- >>>>> 1 file changed, 8 insertions(+), 2 deletions(-) >>>> >>>> >>>> >>>> I thought that your change to force removal of the stdio dev would >>>> make this change unnecessary? >>> >>> >>> >>> Yes, the force removal made this change unnecessary in this specific >>> case. But... >>> >>>> I really would rather fix the root cause if we can. >>> >>> >>> >>> ... the current implementation to exit the loop over all children >>> upon error and not remove the remaining children is wrong IMO. All >>> devices should at least be tried to get removed, even if one fails >>> to get removed. This is what this patch makes sure of. >> >> >> Yes I see that, but not being able to remove is actually an error. In >> the normal course of events, a device that will not remove itself is >> likely buggy. > > > Isn't it enough then to just print an error message in this case > in this loop - change debug() to printf() in this current patch > version? Then "users" of this code will be aware of such remove > failures and can take appropriate actions (fix bug etc in their > setup). Possibly, but programmatically it becomes impossible to detect a failure. Say the USB fails to stop its DMA, we might want to reboot rather than continue to boot Linux and crash. > >> What do you think about adding a new remove flag to indicate that >> failures should be skipped? > > > I'm a bit afraid that this makes the code overly complex. But if > you prefer to have it this way, then I can come up with such a > version as well. Just let me know. I don't think the complexity is too great. It does need a new test. But I think we should hold the line on error checking even with remove(), by default. > > Thanks, > Stefan Regards, Simon
Hi Simon, On 04.05.2017 18:50, Simon Glass wrote: <snip> >>>> ... the current implementation to exit the loop over all children >>>> upon error and not remove the remaining children is wrong IMO. All >>>> devices should at least be tried to get removed, even if one fails >>>> to get removed. This is what this patch makes sure of. >>> >>> >>> Yes I see that, but not being able to remove is actually an error. In >>> the normal course of events, a device that will not remove itself is >>> likely buggy. >> >> >> Isn't it enough then to just print an error message in this case >> in this loop - change debug() to printf() in this current patch >> version? Then "users" of this code will be aware of such remove >> failures and can take appropriate actions (fix bug etc in their >> setup). > > Possibly, but programmatically it becomes impossible to detect a > failure. Say the USB fails to stop its DMA, we might want to reboot > rather than continue to boot Linux and crash. Okay, I see your point. >> >>> What do you think about adding a new remove flag to indicate that >>> failures should be skipped? >> >> >> I'm a bit afraid that this makes the code overly complex. But if >> you prefer to have it this way, then I can come up with such a >> version as well. Just let me know. > > I don't think the complexity is too great. It does need a new test. > But I think we should hold the line on error checking even with > remove(), by default. Understood. How about doing it this way: You drop this patch from the series for now (it still works for us with the force remove of the serial driver) and apply the remaining patches. I'll try to get back to this skip-failures flag implementation in a few days / weeks. Is this okay for you? Thanks, Stefan
Hi Stefan, On 8 May 2017 at 01:35, Stefan Roese <sr@denx.de> wrote: > Hi Simon, > > On 04.05.2017 18:50, Simon Glass wrote: > > <snip> > >>>>> ... the current implementation to exit the loop over all children >>>>> upon error and not remove the remaining children is wrong IMO. All >>>>> devices should at least be tried to get removed, even if one fails >>>>> to get removed. This is what this patch makes sure of. >>>> >>>> >>>> >>>> Yes I see that, but not being able to remove is actually an error. In >>>> the normal course of events, a device that will not remove itself is >>>> likely buggy. >>> >>> >>> >>> Isn't it enough then to just print an error message in this case >>> in this loop - change debug() to printf() in this current patch >>> version? Then "users" of this code will be aware of such remove >>> failures and can take appropriate actions (fix bug etc in their >>> setup). >> >> >> Possibly, but programmatically it becomes impossible to detect a >> failure. Say the USB fails to stop its DMA, we might want to reboot >> rather than continue to boot Linux and crash. > > > Okay, I see your point. > >>> >>>> What do you think about adding a new remove flag to indicate that >>>> failures should be skipped? >>> >>> >>> >>> I'm a bit afraid that this makes the code overly complex. But if >>> you prefer to have it this way, then I can come up with such a >>> version as well. Just let me know. >> >> >> I don't think the complexity is too great. It does need a new test. >> But I think we should hold the line on error checking even with >> remove(), by default. > > > Understood. How about doing it this way: You drop this patch from > the series for now (it still works for us with the force remove of > the serial driver) and apply the remaining patches. I'll try to get > back to this skip-failures flag implementation in a few days / weeks. > > Is this okay for you? Yes that's fine. > > Thanks, > Stefan Regards, Simon
diff --git a/drivers/core/device-remove.c b/drivers/core/device-remove.c index cc0043b990..a1c0103af0 100644 --- a/drivers/core/device-remove.c +++ b/drivers/core/device-remove.c @@ -58,8 +58,14 @@ static int device_chld_remove(struct udevice *dev, uint flags) list_for_each_entry_safe(pos, n, &dev->child_head, sibling_node) { ret = device_remove(pos, flags); - if (ret) - return ret; + /* + * Don't stop on error here, the remaining child devices still + * need to get removed. + */ + if (ret) { + debug("%s: device_remove(%s) returned %d\n", + __func__, pos->name, ret); + } } return 0;
On my x86 platform I've noticed, that calling dm_uninit() or the new function dm_remove_devices_flags() does not remove the desired device at all. Debugging showed, that the serial uclass returns -EPERM in serial_pre_remove() and this leads to a complete stop of the device removal pretty early, as the serial device is one of the first ones in the DM. Here the dm tree output: => dm tree Class Probed Name ---------------------------------------- root [ + ] root_driver rsa_mod_exp [ ] |-- mod_exp_sw serial [ + ] |-- serial rtc [ ] |-- rtc timer [ + ] |-- tsc-timer syscon [ + ] |-- pch_pinctrl ... In this example, device_remove(root) will stop directly after trying to remove the "serial" device. To solve this problem, this patch removes the return upon error check in the device_remove() call in device_chld_remove(). This leads to device_chld_remove() continuing with the device_remove() call to the following child devices. Signed-off-by: Stefan Roese <sr@denx.de> Cc: Simon Glass <sjg@chromium.org> Cc: Bin Meng <bmeng.cn@gmail.com> --- v2: - Add debug() output in error case drivers/core/device-remove.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)