diff mbox series

[v4,07/10] watchdog: wdt-uclass.c: handle all DM watchdogs in watchdog_reset()

Message ID 20210802150016.588750-8-rasmus.villemoes@prevas.dk
State Superseded
Delegated to: Stefan Roese
Headers show
Series handling all DM watchdogs in watchdog_reset() | expand

Commit Message

Rasmus Villemoes Aug. 2, 2021, 3 p.m. UTC
A board can have and make use of more than one watchdog device, say
one built into the SOC and an external gpio-petted one. Having
wdt-uclass only handle the first is both a little arbitrary and
unexpected.

So change initr_watchdog() so we visit (probe) all DM watchdog
devices, and call the init_watchdog_dev helper for each.

Similarly let watchdog_reset() loop over the whole uclass - each
having their own ratelimiting metadata, and a separate "is this device
running" flag.

This gets rid of the watchdog_dev member of struct global_data.  We
do, however, still need the GD_FLG_WDT_READY set in
initr_watchdog(). This is because watchdog_reset() can get called
before DM is ready, and I don't think we can call uclass_get() that
early.

The current code just returns 0 if "getting" the first device fails -
that can of course happen because there are no devices, but it could
also happen if its ->probe call failed. In keeping with that, continue
with the handling of the remaining devices even if one fails to
probe. This is also why we cannot use uclass_probe_all().

If desired, it's possible to later add a per-device "u-boot,autostart"
boolean property, so that one can do CONFIG_WATCHDOG_AUTOSTART
per-device.

Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
---
 drivers/watchdog/wdt-uclass.c     | 56 ++++++++++++++++++++-----------
 include/asm-generic/global_data.h |  6 ----
 2 files changed, 36 insertions(+), 26 deletions(-)

Comments

Simon Glass Aug. 2, 2021, 7:22 p.m. UTC | #1
Hi Rasmus,

On Mon, 2 Aug 2021 at 09:00, Rasmus Villemoes
<rasmus.villemoes@prevas.dk> wrote:
>
> A board can have and make use of more than one watchdog device, say
> one built into the SOC and an external gpio-petted one. Having
> wdt-uclass only handle the first is both a little arbitrary and
> unexpected.
>
> So change initr_watchdog() so we visit (probe) all DM watchdog
> devices, and call the init_watchdog_dev helper for each.
>
> Similarly let watchdog_reset() loop over the whole uclass - each
> having their own ratelimiting metadata, and a separate "is this device
> running" flag.
>
> This gets rid of the watchdog_dev member of struct global_data.  We
> do, however, still need the GD_FLG_WDT_READY set in
> initr_watchdog(). This is because watchdog_reset() can get called
> before DM is ready, and I don't think we can call uclass_get() that
> early.
>
> The current code just returns 0 if "getting" the first device fails -
> that can of course happen because there are no devices, but it could
> also happen if its ->probe call failed. In keeping with that, continue
> with the handling of the remaining devices even if one fails to
> probe. This is also why we cannot use uclass_probe_all().
>
> If desired, it's possible to later add a per-device "u-boot,autostart"
> boolean property, so that one can do CONFIG_WATCHDOG_AUTOSTART
> per-device.
>
> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
> ---
>  drivers/watchdog/wdt-uclass.c     | 56 ++++++++++++++++++++-----------
>  include/asm-generic/global_data.h |  6 ----
>  2 files changed, 36 insertions(+), 26 deletions(-)

Reviewed-by: Simon Glass <sjg@chromium.org>
Stefan Roese Aug. 3, 2021, 6:29 a.m. UTC | #2
On 02.08.21 17:00, Rasmus Villemoes wrote:
> A board can have and make use of more than one watchdog device, say
> one built into the SOC and an external gpio-petted one. Having
> wdt-uclass only handle the first is both a little arbitrary and
> unexpected.
> 
> So change initr_watchdog() so we visit (probe) all DM watchdog
> devices, and call the init_watchdog_dev helper for each.
> 
> Similarly let watchdog_reset() loop over the whole uclass - each
> having their own ratelimiting metadata, and a separate "is this device
> running" flag.
> 
> This gets rid of the watchdog_dev member of struct global_data.  We
> do, however, still need the GD_FLG_WDT_READY set in
> initr_watchdog(). This is because watchdog_reset() can get called
> before DM is ready, and I don't think we can call uclass_get() that
> early.
> 
> The current code just returns 0 if "getting" the first device fails -
> that can of course happen because there are no devices, but it could
> also happen if its ->probe call failed. In keeping with that, continue
> with the handling of the remaining devices even if one fails to
> probe. This is also why we cannot use uclass_probe_all().
> 
> If desired, it's possible to later add a per-device "u-boot,autostart"
> boolean property, so that one can do CONFIG_WATCHDOG_AUTOSTART
> per-device.
> 
> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>

Reviewed-by: Stefan Roese <sr@denx.de>

Thanks,
Stefan

> ---
>   drivers/watchdog/wdt-uclass.c     | 56 ++++++++++++++++++++-----------
>   include/asm-generic/global_data.h |  6 ----
>   2 files changed, 36 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/watchdog/wdt-uclass.c b/drivers/watchdog/wdt-uclass.c
> index 358fc68e27..0ce8b3a425 100644
> --- a/drivers/watchdog/wdt-uclass.c
> +++ b/drivers/watchdog/wdt-uclass.c
> @@ -61,20 +61,24 @@ static void init_watchdog_dev(struct udevice *dev)
>   
>   int initr_watchdog(void)
>   {
> -	/*
> -	 * Init watchdog: This will call the probe function of the
> -	 * watchdog driver, enabling the use of the device
> -	 */
> -	if (uclass_get_device_by_seq(UCLASS_WDT, 0,
> -				     (struct udevice **)&gd->watchdog_dev)) {
> -		debug("WDT:   Not found by seq!\n");
> -		if (uclass_get_device(UCLASS_WDT, 0,
> -				      (struct udevice **)&gd->watchdog_dev)) {
> -			printf("WDT:   Not found!\n");
> -			return 0;
> +	struct udevice *dev;
> +	struct uclass *uc;
> +	int ret;
> +
> +	ret = uclass_get(UCLASS_WDT, &uc);
> +	if (ret) {
> +		log_debug("Error getting UCLASS_WDT: %d\n", ret);
> +		return 0;
> +	}
> +
> +	uclass_foreach_dev(dev, uc) {
> +		ret = device_probe(dev);
> +		if (ret) {
> +			log_debug("Error probing %s: %d\n", dev->name, ret);
> +			continue;
>   		}
> +		init_watchdog_dev(dev);
>   	}
> -	init_watchdog_dev(gd->watchdog_dev);
>   
>   	gd->flags |= GD_FLG_WDT_READY;
>   	return 0;
> @@ -157,22 +161,34 @@ void watchdog_reset(void)
>   {
>   	struct wdt_priv *priv;
>   	struct udevice *dev;
> +	struct uclass *uc;
>   	ulong now;
>   
>   	/* Exit if GD is not ready or watchdog is not initialized yet */
>   	if (!gd || !(gd->flags & GD_FLG_WDT_READY))
>   		return;
>   
> -	dev = gd->watchdog_dev;
> -	priv = dev_get_uclass_priv(dev);
> -	if (!priv->running)
> +	if (uclass_get(UCLASS_WDT, &uc))
>   		return;
>   
> -	/* Do not reset the watchdog too often */
> -	now = get_timer(0);
> -	if (time_after_eq(now, priv->next_reset)) {
> -		priv->next_reset = now + priv->reset_period;
> -		wdt_reset(dev);
> +	/*
> +	 * All devices bound to the wdt uclass should have been probed
> +	 * in initr_watchdog(). But just in case something went wrong,
> +	 * check device_active() before accessing the uclass private
> +	 * data.
> +	 */
> +	uclass_foreach_dev(dev, uc) {
> +		if (!device_active(dev))
> +			continue;
> +		priv = dev_get_uclass_priv(dev);
> +		if (!priv->running)
> +			continue;
> +		/* Do not reset the watchdog too often */
> +		now = get_timer(0);
> +		if (time_after_eq(now, priv->next_reset)) {
> +			priv->next_reset = now + priv->reset_period;
> +			wdt_reset(dev);
> +		}
>   	}
>   }
>   #endif
> diff --git a/include/asm-generic/global_data.h b/include/asm-generic/global_data.h
> index e55070303f..28d749538c 100644
> --- a/include/asm-generic/global_data.h
> +++ b/include/asm-generic/global_data.h
> @@ -447,12 +447,6 @@ struct global_data {
>   	 */
>   	fdt_addr_t translation_offset;
>   #endif
> -#if CONFIG_IS_ENABLED(WDT)
> -	/**
> -	 * @watchdog_dev: watchdog device
> -	 */
> -	struct udevice *watchdog_dev;
> -#endif
>   #ifdef CONFIG_GENERATE_ACPI_TABLE
>   	/**
>   	 * @acpi_ctx: ACPI context pointer
> 


Viele Grüße,
Stefan
Stefan Roese Aug. 3, 2021, 8:28 a.m. UTC | #3
Hi Rasmus,

On 02.08.21 17:00, Rasmus Villemoes wrote:
> A board can have and make use of more than one watchdog device, say
> one built into the SOC and an external gpio-petted one. Having
> wdt-uclass only handle the first is both a little arbitrary and
> unexpected.
> 
> So change initr_watchdog() so we visit (probe) all DM watchdog
> devices, and call the init_watchdog_dev helper for each.
> 
> Similarly let watchdog_reset() loop over the whole uclass - each
> having their own ratelimiting metadata, and a separate "is this device
> running" flag.
> 
> This gets rid of the watchdog_dev member of struct global_data.  We
> do, however, still need the GD_FLG_WDT_READY set in
> initr_watchdog(). This is because watchdog_reset() can get called
> before DM is ready, and I don't think we can call uclass_get() that
> early.
> 
> The current code just returns 0 if "getting" the first device fails -
> that can of course happen because there are no devices, but it could
> also happen if its ->probe call failed. In keeping with that, continue
> with the handling of the remaining devices even if one fails to
> probe. This is also why we cannot use uclass_probe_all().
> 
> If desired, it's possible to later add a per-device "u-boot,autostart"
> boolean property, so that one can do CONFIG_WATCHDOG_AUTOSTART
> per-device.
> 
> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
> ---
>   drivers/watchdog/wdt-uclass.c     | 56 ++++++++++++++++++++-----------
>   include/asm-generic/global_data.h |  6 ----
>   2 files changed, 36 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/watchdog/wdt-uclass.c b/drivers/watchdog/wdt-uclass.c
> index 358fc68e27..0ce8b3a425 100644
> --- a/drivers/watchdog/wdt-uclass.c
> +++ b/drivers/watchdog/wdt-uclass.c
> @@ -61,20 +61,24 @@ static void init_watchdog_dev(struct udevice *dev)
>   
>   int initr_watchdog(void)
>   {
> -	/*
> -	 * Init watchdog: This will call the probe function of the
> -	 * watchdog driver, enabling the use of the device
> -	 */
> -	if (uclass_get_device_by_seq(UCLASS_WDT, 0,
> -				     (struct udevice **)&gd->watchdog_dev)) {
> -		debug("WDT:   Not found by seq!\n");
> -		if (uclass_get_device(UCLASS_WDT, 0,
> -				      (struct udevice **)&gd->watchdog_dev)) {
> -			printf("WDT:   Not found!\n");
> -			return 0;
> +	struct udevice *dev;
> +	struct uclass *uc;
> +	int ret;
> +
> +	ret = uclass_get(UCLASS_WDT, &uc);
> +	if (ret) {
> +		log_debug("Error getting UCLASS_WDT: %d\n", ret);
> +		return 0;
> +	}
> +
> +	uclass_foreach_dev(dev, uc) {
> +		ret = device_probe(dev);
> +		if (ret) {
> +			log_debug("Error probing %s: %d\n", dev->name, ret);
> +			continue;
>   		}
> +		init_watchdog_dev(dev);
>   	}
> -	init_watchdog_dev(gd->watchdog_dev);
>   
>   	gd->flags |= GD_FLG_WDT_READY;
>   	return 0;
> @@ -157,22 +161,34 @@ void watchdog_reset(void)
>   {
>   	struct wdt_priv *priv;
>   	struct udevice *dev;
> +	struct uclass *uc;
>   	ulong now;
>   
>   	/* Exit if GD is not ready or watchdog is not initialized yet */
>   	if (!gd || !(gd->flags & GD_FLG_WDT_READY))
>   		return;
>   
> -	dev = gd->watchdog_dev;
> -	priv = dev_get_uclass_priv(dev);
> -	if (!priv->running)
> +	if (uclass_get(UCLASS_WDT, &uc))
>   		return;
>   
> -	/* Do not reset the watchdog too often */
> -	now = get_timer(0);
> -	if (time_after_eq(now, priv->next_reset)) {
> -		priv->next_reset = now + priv->reset_period;
> -		wdt_reset(dev);
> +	/*
> +	 * All devices bound to the wdt uclass should have been probed
> +	 * in initr_watchdog(). But just in case something went wrong,
> +	 * check device_active() before accessing the uclass private
> +	 * data.
> +	 */
> +	uclass_foreach_dev(dev, uc) {
> +		if (!device_active(dev))
> +			continue;
> +		priv = dev_get_uclass_priv(dev);
> +		if (!priv->running)
> +			continue;
> +		/* Do not reset the watchdog too often */
> +		now = get_timer(0);
> +		if (time_after_eq(now, priv->next_reset)) {
> +			priv->next_reset = now + priv->reset_period;
> +			wdt_reset(dev);
> +		}
>   	}
>   }
>   #endif
> diff --git a/include/asm-generic/global_data.h b/include/asm-generic/global_data.h
> index e55070303f..28d749538c 100644
> --- a/include/asm-generic/global_data.h
> +++ b/include/asm-generic/global_data.h
> @@ -447,12 +447,6 @@ struct global_data {
>   	 */
>   	fdt_addr_t translation_offset;
>   #endif
> -#if CONFIG_IS_ENABLED(WDT)
> -	/**
> -	 * @watchdog_dev: watchdog device
> -	 */
> -	struct udevice *watchdog_dev;
> -#endif

After applying the patchset v4 to current master, I see this error when
building for "x530":

board/alliedtelesis/x530/x530.c: In function 'arch_preboot_os':
board/alliedtelesis/x530/x530.c:125:13: error: 'gd_t' {aka 'volatile 
struct global_data'} has no member named 'watchdog_dev'
   125 |  wdt_stop(gd->watchdog_dev);
       |             ^~
make[1]: *** [scripts/Makefile.build:254: 
board/alliedtelesis/x530/x530.o] Error 1

Perhaps we need a common function now to stop all watchdogs, which can
be called from such places?

Thanks,
Stefan
Rasmus Villemoes Aug. 11, 2021, 11:32 a.m. UTC | #4
On 03/08/2021 10.28, Stefan Roese wrote:
> Hi Rasmus,
> 
>>   #endif
>> diff --git a/include/asm-generic/global_data.h
>> b/include/asm-generic/global_data.h
>> index e55070303f..28d749538c 100644
>> --- a/include/asm-generic/global_data.h
>> +++ b/include/asm-generic/global_data.h
>> @@ -447,12 +447,6 @@ struct global_data {
>>        */
>>       fdt_addr_t translation_offset;
>>   #endif
>> -#if CONFIG_IS_ENABLED(WDT)
>> -    /**
>> -     * @watchdog_dev: watchdog device
>> -     */
>> -    struct udevice *watchdog_dev;
>> -#endif
> 
> After applying the patchset v4 to current master, I see this error when
> building for "x530":

I'm not sure how I missed this, I was convinced I had done a grep and
seen that all references to ->watchdog_dev were gone.

> board/alliedtelesis/x530/x530.c: In function 'arch_preboot_os':
> board/alliedtelesis/x530/x530.c:125:13: error: 'gd_t' {aka 'volatile
> struct global_data'} has no member named 'watchdog_dev'
>   125 |  wdt_stop(gd->watchdog_dev);
>       |             ^~
> make[1]: *** [scripts/Makefile.build:254:
> board/alliedtelesis/x530/x530.o] Error 1
> 
> Perhaps we need a common function now to stop all watchdogs, which can
> be called from such places?

Yes, I think that's the right thing, even if there's just one single
caller. Dead code elimination should remove that global function for all
other boards. Here is what I have right now:

    watchdog: wdt-uclass: add wdt_stop_all() helper

    Since the watchdog_dev member of struct global_data is going away in
    favor of the wdt-uclass handling all watchdog devices, prepare for
    that by adding a helper to call wdt_stop() on all known devices.

    Initially, this will only be used in one single
    place (board/alliedtelesis/x530/x530.c), and that user currently
    ignores the return value of wdt_stop(). It's not clear what one should
    do if we encounter an error (remember the error but still stop the
    remaining ones? return immediately? try to unwind and
    restart the devices already stopped?). Moreover, some watchdogs are by
    design always-running (and don't have a ->stop method at all), so at
    the very least some exception for -ENOSYS would be in order.

    So for now, and until a real use case appears from which we can decide
    the desired semantics, have the function return void and just emit a
    log_debug() if an error is encountered.

diff --git a/drivers/watchdog/wdt-uclass.c b/drivers/watchdog/wdt-uclass.c
index 358fc68e27..75ff4c2a6c 100644
--- a/drivers/watchdog/wdt-uclass.c
+++ b/drivers/watchdog/wdt-uclass.c
@@ -116,6 +116,31 @@ int wdt_stop(struct udevice *dev)
        return ret;
 }

+void wdt_stop_all(void)
+{
+       struct wdt_priv *priv;
+       struct udevice *dev;
+       struct uclass *uc;
+       int ret;
+
+       ret = uclass_get(UCLASS_WDT, &uc);
+       if (ret) {
+               log_debug("Error getting UCLASS_WDT: %d\n", ret);
+               return;
+       }
+
+       uclass_foreach_dev(dev, uc) {
+               if (!device_active(dev))
+                       continue;
+               priv = dev_get_uclass_priv(dev);
+               if (!priv->running)
+                       continue;
+               ret = wdt_stop(dev);
+               if (ret)
+                       log_debug("wdt_stop(%s) failed: %d\n",
dev->name, ret);
+       }
+}
+
 int wdt_reset(struct udevice *dev)
 {
        const struct wdt_ops *ops = device_get_ops(dev);

(along with a declaration in wdt.h, and another patch for x530 to make
use of it). Something like that?

Rasmus
Stefan Roese Aug. 11, 2021, 11:49 a.m. UTC | #5
Hi Rasmus,

On 11.08.21 13:32, Rasmus Villemoes wrote:
> On 03/08/2021 10.28, Stefan Roese wrote:
>> Hi Rasmus,
>>
>>>    #endif
>>> diff --git a/include/asm-generic/global_data.h
>>> b/include/asm-generic/global_data.h
>>> index e55070303f..28d749538c 100644
>>> --- a/include/asm-generic/global_data.h
>>> +++ b/include/asm-generic/global_data.h
>>> @@ -447,12 +447,6 @@ struct global_data {
>>>         */
>>>        fdt_addr_t translation_offset;
>>>    #endif
>>> -#if CONFIG_IS_ENABLED(WDT)
>>> -    /**
>>> -     * @watchdog_dev: watchdog device
>>> -     */
>>> -    struct udevice *watchdog_dev;
>>> -#endif
>>
>> After applying the patchset v4 to current master, I see this error when
>> building for "x530":
> 
> I'm not sure how I missed this, I was convinced I had done a grep and
> seen that all references to ->watchdog_dev were gone.
> 
>> board/alliedtelesis/x530/x530.c: In function 'arch_preboot_os':
>> board/alliedtelesis/x530/x530.c:125:13: error: 'gd_t' {aka 'volatile
>> struct global_data'} has no member named 'watchdog_dev'
>>    125 |  wdt_stop(gd->watchdog_dev);
>>        |             ^~
>> make[1]: *** [scripts/Makefile.build:254:
>> board/alliedtelesis/x530/x530.o] Error 1
>>
>> Perhaps we need a common function now to stop all watchdogs, which can
>> be called from such places?
> 
> Yes, I think that's the right thing, even if there's just one single
> caller. Dead code elimination should remove that global function for all
> other boards. Here is what I have right now:
> 
>      watchdog: wdt-uclass: add wdt_stop_all() helper
> 
>      Since the watchdog_dev member of struct global_data is going away in
>      favor of the wdt-uclass handling all watchdog devices, prepare for
>      that by adding a helper to call wdt_stop() on all known devices.
> 
>      Initially, this will only be used in one single
>      place (board/alliedtelesis/x530/x530.c), and that user currently
>      ignores the return value of wdt_stop(). It's not clear what one should
>      do if we encounter an error (remember the error but still stop the
>      remaining ones? return immediately? try to unwind and
>      restart the devices already stopped?). Moreover, some watchdogs are by
>      design always-running (and don't have a ->stop method at all), so at
>      the very least some exception for -ENOSYS would be in order.
> 
>      So for now, and until a real use case appears from which we can decide
>      the desired semantics, have the function return void and just emit a
>      log_debug() if an error is encountered.
> 
> diff --git a/drivers/watchdog/wdt-uclass.c b/drivers/watchdog/wdt-uclass.c
> index 358fc68e27..75ff4c2a6c 100644
> --- a/drivers/watchdog/wdt-uclass.c
> +++ b/drivers/watchdog/wdt-uclass.c
> @@ -116,6 +116,31 @@ int wdt_stop(struct udevice *dev)
>          return ret;
>   }
> 
> +void wdt_stop_all(void)
> +{
> +       struct wdt_priv *priv;
> +       struct udevice *dev;
> +       struct uclass *uc;
> +       int ret;
> +
> +       ret = uclass_get(UCLASS_WDT, &uc);
> +       if (ret) {
> +               log_debug("Error getting UCLASS_WDT: %d\n", ret);

Perhaps log_err()?

> +               return;
> +       }
> +
> +       uclass_foreach_dev(dev, uc) {
> +               if (!device_active(dev))
> +                       continue;
> +               priv = dev_get_uclass_priv(dev);
> +               if (!priv->running)
> +                       continue;
> +               ret = wdt_stop(dev);
> +               if (ret)
> +                       log_debug("wdt_stop(%s) failed: %d\n",
> dev->name, ret);

Again, log_err() might be better?

> +       }
> +}
> +
>   int wdt_reset(struct udevice *dev)
>   {
>          const struct wdt_ops *ops = device_get_ops(dev);
> 
> (along with a declaration in wdt.h, and another patch for x530 to make
> use of it). Something like that?

Looks good, thanks for quickly working on this. Not sure, if this new
function should be "void" or better "int" so that the error can be
returned.

A follow-up patch on-top your patchset v4 will cause git bisect
problems. So you need to integrate this into your patches and send a
v5 I'm afraid.

Thanks,
Stefan
Rasmus Villemoes Aug. 11, 2021, 12:13 p.m. UTC | #6
On 11/08/2021 13.49, Stefan Roese wrote:
> Hi Rasmus,
> 
> On 11.08.21 13:32, Rasmus Villemoes wrote:
>> On 03/08/2021 10.28, Stefan Roese wrote:
>>> Hi Rasmus,
>>>
>>>>    #endif
>>>> diff --git a/include/asm-generic/global_data.h
>>>> b/include/asm-generic/global_data.h
>>>> index e55070303f..28d749538c 100644
>>>> --- a/include/asm-generic/global_data.h
>>>> +++ b/include/asm-generic/global_data.h
>>>> @@ -447,12 +447,6 @@ struct global_data {
>>>>         */
>>>>        fdt_addr_t translation_offset;
>>>>    #endif
>>>> -#if CONFIG_IS_ENABLED(WDT)
>>>> -    /**
>>>> -     * @watchdog_dev: watchdog device
>>>> -     */
>>>> -    struct udevice *watchdog_dev;
>>>> -#endif
>>>
>>> After applying the patchset v4 to current master, I see this error when
>>> building for "x530":
>>
>> I'm not sure how I missed this, I was convinced I had done a grep and
>> seen that all references to ->watchdog_dev were gone.
>>
>>> board/alliedtelesis/x530/x530.c: In function 'arch_preboot_os':
>>> board/alliedtelesis/x530/x530.c:125:13: error: 'gd_t' {aka 'volatile
>>> struct global_data'} has no member named 'watchdog_dev'
>>>    125 |  wdt_stop(gd->watchdog_dev);
>>>        |             ^~
>>> make[1]: *** [scripts/Makefile.build:254:
>>> board/alliedtelesis/x530/x530.o] Error 1
>>>
>>> Perhaps we need a common function now to stop all watchdogs, which can
>>> be called from such places?
>>
>> Yes, I think that's the right thing, even if there's just one single
>> caller. Dead code elimination should remove that global function for all
>> other boards. Here is what I have right now:
>>
>>      watchdog: wdt-uclass: add wdt_stop_all() helper
>>
>>      Since the watchdog_dev member of struct global_data is going away in
>>      favor of the wdt-uclass handling all watchdog devices, prepare for
>>      that by adding a helper to call wdt_stop() on all known devices.
>>
>>      Initially, this will only be used in one single
>>      place (board/alliedtelesis/x530/x530.c), and that user currently
>>      ignores the return value of wdt_stop(). It's not clear what one
>> should
>>      do if we encounter an error (remember the error but still stop the
>>      remaining ones? return immediately? try to unwind and
>>      restart the devices already stopped?). Moreover, some watchdogs
>> are by
>>      design always-running (and don't have a ->stop method at all), so at
>>      the very least some exception for -ENOSYS would be in order.
>>
>>      So for now, and until a real use case appears from which we can
>> decide
>>      the desired semantics, have the function return void and just emit a
>>      log_debug() if an error is encountered.
>>
>> diff --git a/drivers/watchdog/wdt-uclass.c
>> b/drivers/watchdog/wdt-uclass.c
>> index 358fc68e27..75ff4c2a6c 100644
>> --- a/drivers/watchdog/wdt-uclass.c
>> +++ b/drivers/watchdog/wdt-uclass.c
>> @@ -116,6 +116,31 @@ int wdt_stop(struct udevice *dev)
>>          return ret;
>>   }
>>
>> +void wdt_stop_all(void)
>> +{
>> +       struct wdt_priv *priv;
>> +       struct udevice *dev;
>> +       struct uclass *uc;
>> +       int ret;
>> +
>> +       ret = uclass_get(UCLASS_WDT, &uc);
>> +       if (ret) {
>> +               log_debug("Error getting UCLASS_WDT: %d\n", ret);
> 
> Perhaps log_err()?

No, we've already been over this in earlier discussions (it's the exact
same pattern and reasoning as initr_watchdog). If I made it log_err(),
it would cost .text for something that never-ever happens in practice,
while log_debug() is usually a no-op, but can be compiled in if
something truly fishy seems to be going on.

>>   int wdt_reset(struct udevice *dev)
>>   {
>>          const struct wdt_ops *ops = device_get_ops(dev);
>>
>> (along with a declaration in wdt.h, and another patch for x530 to make
>> use of it). Something like that?
> 
> Looks good, thanks for quickly working on this. Not sure, if this new
> function should be "void" or better "int" so that the error can be
> returned.

That's why I included my tentative commit log, so you could see my
explanation for why I made it void. Until some user shows up that
_wants_ a return value, there's no point making it return int. When that
user shows up, we can discuss which int (return early on failure?
remember that an error was seen but still call wdt_stop on remaining
devices? etc. etc.).

> A follow-up patch on-top your patchset v4 will cause git bisect
> problems. So you need to integrate this into your patches and send a
> v5 I'm afraid.

Definitely, it's already reworked in between 6 and 7 in my local branch,
I just didn't want to send a 12-patch series without some "yeah,
something like that would work".

Rasmus
Stefan Roese Aug. 11, 2021, 12:19 p.m. UTC | #7
On 11.08.21 14:13, Rasmus Villemoes wrote:
> On 11/08/2021 13.49, Stefan Roese wrote:
>> Hi Rasmus,
>>
>> On 11.08.21 13:32, Rasmus Villemoes wrote:
>>> On 03/08/2021 10.28, Stefan Roese wrote:
>>>> Hi Rasmus,
>>>>
>>>>>     #endif
>>>>> diff --git a/include/asm-generic/global_data.h
>>>>> b/include/asm-generic/global_data.h
>>>>> index e55070303f..28d749538c 100644
>>>>> --- a/include/asm-generic/global_data.h
>>>>> +++ b/include/asm-generic/global_data.h
>>>>> @@ -447,12 +447,6 @@ struct global_data {
>>>>>          */
>>>>>         fdt_addr_t translation_offset;
>>>>>     #endif
>>>>> -#if CONFIG_IS_ENABLED(WDT)
>>>>> -    /**
>>>>> -     * @watchdog_dev: watchdog device
>>>>> -     */
>>>>> -    struct udevice *watchdog_dev;
>>>>> -#endif
>>>>
>>>> After applying the patchset v4 to current master, I see this error when
>>>> building for "x530":
>>>
>>> I'm not sure how I missed this, I was convinced I had done a grep and
>>> seen that all references to ->watchdog_dev were gone.
>>>
>>>> board/alliedtelesis/x530/x530.c: In function 'arch_preboot_os':
>>>> board/alliedtelesis/x530/x530.c:125:13: error: 'gd_t' {aka 'volatile
>>>> struct global_data'} has no member named 'watchdog_dev'
>>>>     125 |  wdt_stop(gd->watchdog_dev);
>>>>         |             ^~
>>>> make[1]: *** [scripts/Makefile.build:254:
>>>> board/alliedtelesis/x530/x530.o] Error 1
>>>>
>>>> Perhaps we need a common function now to stop all watchdogs, which can
>>>> be called from such places?
>>>
>>> Yes, I think that's the right thing, even if there's just one single
>>> caller. Dead code elimination should remove that global function for all
>>> other boards. Here is what I have right now:
>>>
>>>       watchdog: wdt-uclass: add wdt_stop_all() helper
>>>
>>>       Since the watchdog_dev member of struct global_data is going away in
>>>       favor of the wdt-uclass handling all watchdog devices, prepare for
>>>       that by adding a helper to call wdt_stop() on all known devices.
>>>
>>>       Initially, this will only be used in one single
>>>       place (board/alliedtelesis/x530/x530.c), and that user currently
>>>       ignores the return value of wdt_stop(). It's not clear what one
>>> should
>>>       do if we encounter an error (remember the error but still stop the
>>>       remaining ones? return immediately? try to unwind and
>>>       restart the devices already stopped?). Moreover, some watchdogs
>>> are by
>>>       design always-running (and don't have a ->stop method at all), so at
>>>       the very least some exception for -ENOSYS would be in order.
>>>
>>>       So for now, and until a real use case appears from which we can
>>> decide
>>>       the desired semantics, have the function return void and just emit a
>>>       log_debug() if an error is encountered.
>>>
>>> diff --git a/drivers/watchdog/wdt-uclass.c
>>> b/drivers/watchdog/wdt-uclass.c
>>> index 358fc68e27..75ff4c2a6c 100644
>>> --- a/drivers/watchdog/wdt-uclass.c
>>> +++ b/drivers/watchdog/wdt-uclass.c
>>> @@ -116,6 +116,31 @@ int wdt_stop(struct udevice *dev)
>>>           return ret;
>>>    }
>>>
>>> +void wdt_stop_all(void)
>>> +{
>>> +       struct wdt_priv *priv;
>>> +       struct udevice *dev;
>>> +       struct uclass *uc;
>>> +       int ret;
>>> +
>>> +       ret = uclass_get(UCLASS_WDT, &uc);
>>> +       if (ret) {
>>> +               log_debug("Error getting UCLASS_WDT: %d\n", ret);
>>
>> Perhaps log_err()?
> 
> No, we've already been over this in earlier discussions (it's the exact
> same pattern and reasoning as initr_watchdog). If I made it log_err(),
> it would cost .text for something that never-ever happens in practice,
> while log_debug() is usually a no-op, but can be compiled in if
> something truly fishy seems to be going on.

Okay, then please stick with log_debug().

>>>    int wdt_reset(struct udevice *dev)
>>>    {
>>>           const struct wdt_ops *ops = device_get_ops(dev);
>>>
>>> (along with a declaration in wdt.h, and another patch for x530 to make
>>> use of it). Something like that?
>>
>> Looks good, thanks for quickly working on this. Not sure, if this new
>> function should be "void" or better "int" so that the error can be
>> returned.
> 
> That's why I included my tentative commit log, so you could see my
> explanation for why I made it void.

Ah, sorry for not reading it thoroughly. My bad.

> Until some user shows up that
> _wants_ a return value, there's no point making it return int. When that
> user shows up, we can discuss which int (return early on failure?
> remember that an error was seen but still call wdt_stop on remaining
> devices? etc. etc.).

I'm fine with it.

>> A follow-up patch on-top your patchset v4 will cause git bisect
>> problems. So you need to integrate this into your patches and send a
>> v5 I'm afraid.
> 
> Definitely, it's already reworked in between 6 and 7 in my local branch,
> I just didn't want to send a 12-patch series without some "yeah,
> something like that would work".

Perfect. Please go ahead then.

Thanks,
Stefan
Wolfgang Denk Aug. 11, 2021, 12:29 p.m. UTC | #8
Dear Rasmus,

In message <3d48015a-07d3-e296-b9ba-a1edd455ce9e@prevas.dk> you wrote:
>
> >> +       if (ret) {
> >> +               log_debug("Error getting UCLASS_WDT: %d\n", ret);
> > 
> > Perhaps log_err()?
>
> No, we've already been over this in earlier discussions (it's the exact
> same pattern and reasoning as initr_watchdog). If I made it log_err(),
> it would cost .text for something that never-ever happens in practice,
> while log_debug() is usually a no-op, but can be compiled in if
> something truly fishy seems to be going on.

This argument fits on all types or effors: they are supposed to
never ever happen - at least in theory; in reality they do, and more
often than we like.

And a proper error message is mandatory for correct error handling.

> > Looks good, thanks for quickly working on this. Not sure, if this new
> > function should be "void" or better "int" so that the error can be
> > returned.
>
> That's why I included my tentative commit log, so you could see my
> explanation for why I made it void. Until some user shows up that
> _wants_ a return value, there's no point making it return int. When that
> user shows up, we can discuss which int (return early on failure?
> remember that an error was seen but still call wdt_stop on remaining
> devices? etc. etc.).

Returning an error code is always a good ide, no matter if
current users check it or not.


Best regards,

Wolfgang Denk
Tom Rini Aug. 11, 2021, 12:43 p.m. UTC | #9
On Wed, Aug 11, 2021 at 02:29:12PM +0200, Wolfgang Denk wrote:
> Dear Rasmus,
> 
> In message <3d48015a-07d3-e296-b9ba-a1edd455ce9e@prevas.dk> you wrote:
> >
> > >> +       if (ret) {
> > >> +               log_debug("Error getting UCLASS_WDT: %d\n", ret);
> > > 
> > > Perhaps log_err()?
> >
> > No, we've already been over this in earlier discussions (it's the exact
> > same pattern and reasoning as initr_watchdog). If I made it log_err(),
> > it would cost .text for something that never-ever happens in practice,
> > while log_debug() is usually a no-op, but can be compiled in if
> > something truly fishy seems to be going on.
> 
> This argument fits on all types or effors: they are supposed to
> never ever happen - at least in theory; in reality they do, and more
> often than we like.
> 
> And a proper error message is mandatory for correct error handling.

Error messages are a fine line to walk.  We've got to have been very
badly corrupted to go down this error path.  There's going to be lots of
other error messages popping out.  Saving a bit of .text is good.  It
makes it easier to justify spending a little .text later.

> > > Looks good, thanks for quickly working on this. Not sure, if this new
> > > function should be "void" or better "int" so that the error can be
> > > returned.
> >
> > That's why I included my tentative commit log, so you could see my
> > explanation for why I made it void. Until some user shows up that
> > _wants_ a return value, there's no point making it return int. When that
> > user shows up, we can discuss which int (return early on failure?
> > remember that an error was seen but still call wdt_stop on remaining
> > devices? etc. etc.).
> 
> Returning an error code is always a good ide, no matter if
> current users check it or not.

And here I agree, catch an error code, pass the error code back to the
caller.  That's far more important than making sure that every error
code we catch logs a message by default every time.
Wolfgang Denk Aug. 12, 2021, 6:40 a.m. UTC | #10
Dear Tom,

In message <20210811124318.GT858@bill-the-cat> you wrote:
> 
> > This argument fits on all types or effors: they are supposed to
> > never ever happen - at least in theory; in reality they do, and more
> > often than we like.
> > 
> > And a proper error message is mandatory for correct error handling.
> 
> Error messages are a fine line to walk.  We've got to have been very
> badly corrupted to go down this error path.  There's going to be lots of
> other error messages popping out.  Saving a bit of .text is good.  It
> makes it easier to justify spending a little .text later.

Letting errors slip through unnoticed when there is a trival way to
at least inform the user of the problem is simply unacceptable.

Please do not let U-Boot degrade into such a crappy piece of code.

There are tons of other places where we don't even mention code
size, so if you want to save on that, there are many bette4r places
to save than error handling.

> And here I agree, catch an error code, pass the error code back to the
> caller.  That's far more important than making sure that every error
> code we catch logs a message by default every time.

It does not matter where the error is reported - in the called
function, or in some caller firther up the call tree.  But it _must_
be reportet at least once.

So if we don't issue an error message here, we need to check and fix
the callers, too.

Best regards,

Wolfgang Denk
Tom Rini Aug. 12, 2021, 1:48 p.m. UTC | #11
On Thu, Aug 12, 2021 at 08:40:21AM +0200, Wolfgang Denk wrote:
> Dear Tom,
> 
> In message <20210811124318.GT858@bill-the-cat> you wrote:
> > 
> > > This argument fits on all types or effors: they are supposed to
> > > never ever happen - at least in theory; in reality they do, and more
> > > often than we like.
> > > 
> > > And a proper error message is mandatory for correct error handling.
> > 
> > Error messages are a fine line to walk.  We've got to have been very
> > badly corrupted to go down this error path.  There's going to be lots of
> > other error messages popping out.  Saving a bit of .text is good.  It
> > makes it easier to justify spending a little .text later.
> 
> Letting errors slip through unnoticed when there is a trival way to
> at least inform the user of the problem is simply unacceptable.
> 
> Please do not let U-Boot degrade into such a crappy piece of code.
> 
> There are tons of other places where we don't even mention code
> size, so if you want to save on that, there are many bette4r places
> to save than error handling.

Alright, lets take a look at what kind of area of the code we're talking
about.  uclass_get is a pretty fundamental thing.  If that fails, your
system is on fire.  Things are massively corrupt.  Lets look at other
existing callers to see what happens.  Most callers check the return
code, like you need to, and pass it up the chain to deal with.  We have
a few board specific ones such as
board/Marvell/octeontx2/board.c::board_quiesce_devices() that is also
conceptually like the x530 case in the next part of the series.  That
does print on failure.  The rest of the ones that print an error message
are about commands and it's somewhat helpful there.

So yes, return codes need to be checked and passed.  But no, not every
single error path needs to print to the user along every part of an
error path either.

> > And here I agree, catch an error code, pass the error code back to the
> > caller.  That's far more important than making sure that every error
> > code we catch logs a message by default every time.
> 
> It does not matter where the error is reported - in the called
> function, or in some caller firther up the call tree.  But it _must_
> be reportet at least once.
> 
> So if we don't issue an error message here, we need to check and fix
> the callers, too.

That would be the next patch in the series where the BSP author isn't
currently checking the return value, and this series doesn't change
that.  Perhaps it should, and CC the maintainer.  But I think has been
said a few times over the course of this series, what exactly is one
going to do about the failure?  Getting specific for a moment, if you're
in the case of "shutdown the watchdog" and the watchdog doesn't shutdown
like you want it to, do you hang and hope the watchdog is alive to kick
things still?  hang and lock the system?  Figure the system is on fire
anyhow but add another message to the failure spew?

Again, I think the change that's needed to this patch is to make it
return the error code to the caller.  Let the caller decide.  And make
sure to CC the board maintainer on the next go-round so they can chime
in about that.
Simon Glass Aug. 12, 2021, 2:12 p.m. UTC | #12
Hi Tom,

On Thu, 12 Aug 2021 at 07:48, Tom Rini <trini@konsulko.com> wrote:
>
> On Thu, Aug 12, 2021 at 08:40:21AM +0200, Wolfgang Denk wrote:
> > Dear Tom,
> >
> > In message <20210811124318.GT858@bill-the-cat> you wrote:
> > >
> > > > This argument fits on all types or effors: they are supposed to
> > > > never ever happen - at least in theory; in reality they do, and more
> > > > often than we like.
> > > >
> > > > And a proper error message is mandatory for correct error handling.
> > >
> > > Error messages are a fine line to walk.  We've got to have been very
> > > badly corrupted to go down this error path.  There's going to be lots of
> > > other error messages popping out.  Saving a bit of .text is good.  It
> > > makes it easier to justify spending a little .text later.
> >
> > Letting errors slip through unnoticed when there is a trival way to
> > at least inform the user of the problem is simply unacceptable.
> >
> > Please do not let U-Boot degrade into such a crappy piece of code.
> >
> > There are tons of other places where we don't even mention code
> > size, so if you want to save on that, there are many bette4r places
> > to save than error handling.
>
> Alright, lets take a look at what kind of area of the code we're talking
> about.  uclass_get is a pretty fundamental thing.  If that fails, your
> system is on fire.  Things are massively corrupt.  Lets look at other
> existing callers to see what happens.  Most callers check the return
> code, like you need to, and pass it up the chain to deal with.  We have
> a few board specific ones such as
> board/Marvell/octeontx2/board.c::board_quiesce_devices() that is also
> conceptually like the x530 case in the next part of the series.  That
> does print on failure.  The rest of the ones that print an error message
> are about commands and it's somewhat helpful there.
>
> So yes, return codes need to be checked and passed.  But no, not every
> single error path needs to print to the user along every part of an
> error path either.
>
> > > And here I agree, catch an error code, pass the error code back to the
> > > caller.  That's far more important than making sure that every error
> > > code we catch logs a message by default every time.
> >
> > It does not matter where the error is reported - in the called
> > function, or in some caller firther up the call tree.  But it _must_
> > be reportet at least once.
> >
> > So if we don't issue an error message here, we need to check and fix
> > the callers, too.
>
> That would be the next patch in the series where the BSP author isn't
> currently checking the return value, and this series doesn't change
> that.  Perhaps it should, and CC the maintainer.  But I think has been
> said a few times over the course of this series, what exactly is one
> going to do about the failure?  Getting specific for a moment, if you're
> in the case of "shutdown the watchdog" and the watchdog doesn't shutdown
> like you want it to, do you hang and hope the watchdog is alive to kick
> things still?  hang and lock the system?  Figure the system is on fire
> anyhow but add another message to the failure spew?
>
> Again, I think the change that's needed to this patch is to make it
> return the error code to the caller.  Let the caller decide.  And make
> sure to CC the board maintainer on the next go-round so they can chime
> in about that.

I strongly agree with this.I try to encourage people not to report
errors inside drivers, since the caller should be able to deal with
it, particularly if the error number provides useful information. It
bloats the code.

But we do have these strange cases where there is no obvious thing to
do. Where we are probing several devices to fire them up and one
fails, people worry that this means that some of them won't get
probed. In that case I tend to continue on and probe them all, but
then return error if any of them failed. At least the caller knows.

Also I like the new log_ret() and log_msg_ret() functions, which can
log an error as it goes up the stack if LOG_ERROR_RETURN is enabled.
It is nice to be able to see where an error came from. We could
perhaps improve this by logging the error when it is created (the
first time some calls 'return -Exxx').

I'm not a fan of board code which calls a function and doesn't check
the error. The board may not operate correctly, or it may limp along,
but the board author should be able to get all the bugs of it at the
start so that we are not requested invalid clocks, etc.

Regards,
Simon
Wolfgang Denk Aug. 12, 2021, 2:21 p.m. UTC | #13
Dear Tom,

In message <20210812134833.GU858@bill-the-cat> you wrote:
> 
> Alright, lets take a look at what kind of area of the code we're talking
> about.  uclass_get is a pretty fundamental thing.  If that fails, your
> system is on fire.  Things are massively corrupt.

Full agreement here.

> So yes, return codes need to be checked and passed.  But no, not every
> single error path needs to print to the user along every part of an
> error path either.

So if "the system is on fire" is one of the cases where an error
message should be omitted to save maybe 50 or 100 bytes of image
size?  This sounds wrong to me.

When a system crashes or hangs, it is extremely helpful to gen an
indication of what happened.

Printing messages only with debug enabled is pretty worthless, as in
the real world the failures will happen in the field when running
the production image with debug not enabled.

And as soon as you do enable debug, you will have a different image,
which may or may not show the problem.  We have all been there
before.

> That would be the next patch in the series where the BSP author isn't
> currently checking the return value, and this series doesn't change
> that.  Perhaps it should, and CC the maintainer.

Indeed.

> But I think has been
> said a few times over the course of this series, what exactly is one
> going to do about the failure?  Getting specific for a moment, if you're
> in the case of "shutdown the watchdog" and the watchdog doesn't shutdown
> like you want it to, do you hang and hope the watchdog is alive to kick
> things still?  hang and lock the system?  Figure the system is on fire
> anyhow but add another message to the failure spew?

Adding a message about the _cause_ of the failure, i. e. at least
information about the place where it was first detected, is what
will be helpful to the engineer who is supposed to analyze the
problem.

And yes, if such a fundamental operation fails, it is wise to abort
the operation of this device - either by hard resetting it or by
hanging it, depending of what the chances are that a reboot might
fix the problem.

IMO it is better to fail a broken device in a reliable mode instead
of letting it run and having more spectacular crashes (likely with
more serious consequences) happen later.

> Again, I think the change that's needed to this patch is to make it
> return the error code to the caller.  Let the caller decide.  And make
> sure to CC the board maintainer on the next go-round so they can chime
> in about that.

If we agree that in the disputed case "the system is on fire", then
there is actually very little to decide.  There should be only one
possible action: stop here, before more damage happens.

Best regards,

Wolfgang Denk
Tom Rini Aug. 12, 2021, 4:20 p.m. UTC | #14
On Thu, Aug 12, 2021 at 04:21:29PM +0200, Wolfgang Denk wrote:
> Dear Tom,
> 
> In message <20210812134833.GU858@bill-the-cat> you wrote:
> > 
> > Alright, lets take a look at what kind of area of the code we're talking
> > about.  uclass_get is a pretty fundamental thing.  If that fails, your
> > system is on fire.  Things are massively corrupt.
> 
> Full agreement here.
> 
> > So yes, return codes need to be checked and passed.  But no, not every
> > single error path needs to print to the user along every part of an
> > error path either.
> 
> So if "the system is on fire" is one of the cases where an error
> message should be omitted to save maybe 50 or 100 bytes of image
> size?  This sounds wrong to me.

It sounds right to me because it's unlikely everything caught fire
because of this call right here and likely it's because of one of the
messages much further up on the console log.  Hopefully we haven't
caused that message to be unavailable now due to unhelpful failure
messages.

A log message needs to have value to it above and beyond boiling down to
"%s: %d", __func__, __LINE__ having been reached.  This, right here, is
not a log message that matters.  With DM we've made a great deal of
progress in being able to populate meaningful errors back up to our
callers rather than -1 for everything.  So yes, in sum, these functions
need to return a value.  The BSP ought to care (in the next patch), even
if it doesn't today when it could.  But that's on the BSP author as they
know better than you or I what that system is being used for.
Wolfgang Denk Aug. 13, 2021, 6:17 a.m. UTC | #15
Dear Tom,

In message <20210812162034.GY858@bill-the-cat> you wrote:
> 
> > So if "the system is on fire" is one of the cases where an error
> > message should be omitted to save maybe 50 or 100 bytes of image
> > size?  This sounds wrong to me.
>
> It sounds right to me because it's unlikely everything caught fire
> because of this call right here and likely it's because of one of the
> messages much further up on the console log.  Hopefully we haven't
> caused that message to be unavailable now due to unhelpful failure
> messages.

Omitting code to handle situations that are unlikely to happen is
definitely not what I consider robust programming, and nothing which
I would let pass a code review.

But if you insist, there is no more to do for me here that to note
that fact that we disagree.


Wolfgang Denk
Stefan Roese Aug. 17, 2021, 9:28 a.m. UTC | #16
On 12.08.21 15:48, Tom Rini wrote:
> On Thu, Aug 12, 2021 at 08:40:21AM +0200, Wolfgang Denk wrote:
>> Dear Tom,
>>
>> In message <20210811124318.GT858@bill-the-cat> you wrote:
>>>
>>>> This argument fits on all types or effors: they are supposed to
>>>> never ever happen - at least in theory; in reality they do, and more
>>>> often than we like.
>>>>
>>>> And a proper error message is mandatory for correct error handling.
>>>
>>> Error messages are a fine line to walk.  We've got to have been very
>>> badly corrupted to go down this error path.  There's going to be lots of
>>> other error messages popping out.  Saving a bit of .text is good.  It
>>> makes it easier to justify spending a little .text later.
>>
>> Letting errors slip through unnoticed when there is a trival way to
>> at least inform the user of the problem is simply unacceptable.
>>
>> Please do not let U-Boot degrade into such a crappy piece of code.
>>
>> There are tons of other places where we don't even mention code
>> size, so if you want to save on that, there are many bette4r places
>> to save than error handling.
> 
> Alright, lets take a look at what kind of area of the code we're talking
> about.  uclass_get is a pretty fundamental thing.  If that fails, your
> system is on fire.  Things are massively corrupt.  Lets look at other
> existing callers to see what happens.  Most callers check the return
> code, like you need to, and pass it up the chain to deal with.  We have
> a few board specific ones such as
> board/Marvell/octeontx2/board.c::board_quiesce_devices() that is also
> conceptually like the x530 case in the next part of the series.  That
> does print on failure.  The rest of the ones that print an error message
> are about commands and it's somewhat helpful there.
> 
> So yes, return codes need to be checked and passed.  But no, not every
> single error path needs to print to the user along every part of an
> error path either.
> 
>>> And here I agree, catch an error code, pass the error code back to the
>>> caller.  That's far more important than making sure that every error
>>> code we catch logs a message by default every time.
>>
>> It does not matter where the error is reported - in the called
>> function, or in some caller firther up the call tree.  But it _must_
>> be reportet at least once.
>>
>> So if we don't issue an error message here, we need to check and fix
>> the callers, too.
> 
> That would be the next patch in the series where the BSP author isn't
> currently checking the return value, and this series doesn't change
> that.  Perhaps it should, and CC the maintainer.  But I think has been
> said a few times over the course of this series, what exactly is one
> going to do about the failure?  Getting specific for a moment, if you're
> in the case of "shutdown the watchdog" and the watchdog doesn't shutdown
> like you want it to, do you hang and hope the watchdog is alive to kick
> things still?  hang and lock the system?  Figure the system is on fire
> anyhow but add another message to the failure spew?
> 
> Again, I think the change that's needed to this patch is to make it
> return the error code to the caller.  Let the caller decide.  And make
> sure to CC the board maintainer on the next go-round so they can chime
> in about that.

Getting back to this to hopefully get this decided:

It seems that we (most of us?) agree on this change, that wdt_stop_all()
shall be changed to return an error code and the caller can decide what
to do with it?

If yes, then Rasmus, could you please re-spin this patchset accordingly
and send v6?

Thanks,
Stefan
Tom Rini Aug. 17, 2021, 12:35 p.m. UTC | #17
On Tue, Aug 17, 2021 at 11:28:39AM +0200, Stefan Roese wrote:
> On 12.08.21 15:48, Tom Rini wrote:
> > On Thu, Aug 12, 2021 at 08:40:21AM +0200, Wolfgang Denk wrote:
> > > Dear Tom,
> > > 
> > > In message <20210811124318.GT858@bill-the-cat> you wrote:
> > > > 
> > > > > This argument fits on all types or effors: they are supposed to
> > > > > never ever happen - at least in theory; in reality they do, and more
> > > > > often than we like.
> > > > > 
> > > > > And a proper error message is mandatory for correct error handling.
> > > > 
> > > > Error messages are a fine line to walk.  We've got to have been very
> > > > badly corrupted to go down this error path.  There's going to be lots of
> > > > other error messages popping out.  Saving a bit of .text is good.  It
> > > > makes it easier to justify spending a little .text later.
> > > 
> > > Letting errors slip through unnoticed when there is a trival way to
> > > at least inform the user of the problem is simply unacceptable.
> > > 
> > > Please do not let U-Boot degrade into such a crappy piece of code.
> > > 
> > > There are tons of other places where we don't even mention code
> > > size, so if you want to save on that, there are many bette4r places
> > > to save than error handling.
> > 
> > Alright, lets take a look at what kind of area of the code we're talking
> > about.  uclass_get is a pretty fundamental thing.  If that fails, your
> > system is on fire.  Things are massively corrupt.  Lets look at other
> > existing callers to see what happens.  Most callers check the return
> > code, like you need to, and pass it up the chain to deal with.  We have
> > a few board specific ones such as
> > board/Marvell/octeontx2/board.c::board_quiesce_devices() that is also
> > conceptually like the x530 case in the next part of the series.  That
> > does print on failure.  The rest of the ones that print an error message
> > are about commands and it's somewhat helpful there.
> > 
> > So yes, return codes need to be checked and passed.  But no, not every
> > single error path needs to print to the user along every part of an
> > error path either.
> > 
> > > > And here I agree, catch an error code, pass the error code back to the
> > > > caller.  That's far more important than making sure that every error
> > > > code we catch logs a message by default every time.
> > > 
> > > It does not matter where the error is reported - in the called
> > > function, or in some caller firther up the call tree.  But it _must_
> > > be reportet at least once.
> > > 
> > > So if we don't issue an error message here, we need to check and fix
> > > the callers, too.
> > 
> > That would be the next patch in the series where the BSP author isn't
> > currently checking the return value, and this series doesn't change
> > that.  Perhaps it should, and CC the maintainer.  But I think has been
> > said a few times over the course of this series, what exactly is one
> > going to do about the failure?  Getting specific for a moment, if you're
> > in the case of "shutdown the watchdog" and the watchdog doesn't shutdown
> > like you want it to, do you hang and hope the watchdog is alive to kick
> > things still?  hang and lock the system?  Figure the system is on fire
> > anyhow but add another message to the failure spew?
> > 
> > Again, I think the change that's needed to this patch is to make it
> > return the error code to the caller.  Let the caller decide.  And make
> > sure to CC the board maintainer on the next go-round so they can chime
> > in about that.
> 
> Getting back to this to hopefully get this decided:
> 
> It seems that we (most of us?) agree on this change, that wdt_stop_all()
> shall be changed to return an error code and the caller can decide what
> to do with it?
> 
> If yes, then Rasmus, could you please re-spin this patchset accordingly
> and send v6?

Yes, please and thanks.
Stefan Roese Aug. 27, 2021, 6:30 a.m. UTC | #18
Hi Tom,

On 17.08.21 14:35, Tom Rini wrote:

<snip>

>> Getting back to this to hopefully get this decided:
>>
>> It seems that we (most of us?) agree on this change, that wdt_stop_all()
>> shall be changed to return an error code and the caller can decide what
>> to do with it?
>>
>> If yes, then Rasmus, could you please re-spin this patchset accordingly
>> and send v6?
> 
> Yes, please and thanks.

Tom, would you like me to push this patchset in at this stage (rc2), or
better defer to the next merge window?

Thanks,
Stefan
Tom Rini Aug. 27, 2021, 12:26 p.m. UTC | #19
On Fri, Aug 27, 2021 at 08:30:48AM +0200, Stefan Roese wrote:
> Hi Tom,
> 
> On 17.08.21 14:35, Tom Rini wrote:
> 
> <snip>
> 
> > > Getting back to this to hopefully get this decided:
> > > 
> > > It seems that we (most of us?) agree on this change, that wdt_stop_all()
> > > shall be changed to return an error code and the caller can decide what
> > > to do with it?
> > > 
> > > If yes, then Rasmus, could you please re-spin this patchset accordingly
> > > and send v6?
> > 
> > Yes, please and thanks.
> 
> Tom, would you like me to push this patchset in at this stage (rc2), or
> better defer to the next merge window?

I'm going to open -next on Monday, so lets put it there.  Thanks!
diff mbox series

Patch

diff --git a/drivers/watchdog/wdt-uclass.c b/drivers/watchdog/wdt-uclass.c
index 358fc68e27..0ce8b3a425 100644
--- a/drivers/watchdog/wdt-uclass.c
+++ b/drivers/watchdog/wdt-uclass.c
@@ -61,20 +61,24 @@  static void init_watchdog_dev(struct udevice *dev)
 
 int initr_watchdog(void)
 {
-	/*
-	 * Init watchdog: This will call the probe function of the
-	 * watchdog driver, enabling the use of the device
-	 */
-	if (uclass_get_device_by_seq(UCLASS_WDT, 0,
-				     (struct udevice **)&gd->watchdog_dev)) {
-		debug("WDT:   Not found by seq!\n");
-		if (uclass_get_device(UCLASS_WDT, 0,
-				      (struct udevice **)&gd->watchdog_dev)) {
-			printf("WDT:   Not found!\n");
-			return 0;
+	struct udevice *dev;
+	struct uclass *uc;
+	int ret;
+
+	ret = uclass_get(UCLASS_WDT, &uc);
+	if (ret) {
+		log_debug("Error getting UCLASS_WDT: %d\n", ret);
+		return 0;
+	}
+
+	uclass_foreach_dev(dev, uc) {
+		ret = device_probe(dev);
+		if (ret) {
+			log_debug("Error probing %s: %d\n", dev->name, ret);
+			continue;
 		}
+		init_watchdog_dev(dev);
 	}
-	init_watchdog_dev(gd->watchdog_dev);
 
 	gd->flags |= GD_FLG_WDT_READY;
 	return 0;
@@ -157,22 +161,34 @@  void watchdog_reset(void)
 {
 	struct wdt_priv *priv;
 	struct udevice *dev;
+	struct uclass *uc;
 	ulong now;
 
 	/* Exit if GD is not ready or watchdog is not initialized yet */
 	if (!gd || !(gd->flags & GD_FLG_WDT_READY))
 		return;
 
-	dev = gd->watchdog_dev;
-	priv = dev_get_uclass_priv(dev);
-	if (!priv->running)
+	if (uclass_get(UCLASS_WDT, &uc))
 		return;
 
-	/* Do not reset the watchdog too often */
-	now = get_timer(0);
-	if (time_after_eq(now, priv->next_reset)) {
-		priv->next_reset = now + priv->reset_period;
-		wdt_reset(dev);
+	/*
+	 * All devices bound to the wdt uclass should have been probed
+	 * in initr_watchdog(). But just in case something went wrong,
+	 * check device_active() before accessing the uclass private
+	 * data.
+	 */
+	uclass_foreach_dev(dev, uc) {
+		if (!device_active(dev))
+			continue;
+		priv = dev_get_uclass_priv(dev);
+		if (!priv->running)
+			continue;
+		/* Do not reset the watchdog too often */
+		now = get_timer(0);
+		if (time_after_eq(now, priv->next_reset)) {
+			priv->next_reset = now + priv->reset_period;
+			wdt_reset(dev);
+		}
 	}
 }
 #endif
diff --git a/include/asm-generic/global_data.h b/include/asm-generic/global_data.h
index e55070303f..28d749538c 100644
--- a/include/asm-generic/global_data.h
+++ b/include/asm-generic/global_data.h
@@ -447,12 +447,6 @@  struct global_data {
 	 */
 	fdt_addr_t translation_offset;
 #endif
-#if CONFIG_IS_ENABLED(WDT)
-	/**
-	 * @watchdog_dev: watchdog device
-	 */
-	struct udevice *watchdog_dev;
-#endif
 #ifdef CONFIG_GENERATE_ACPI_TABLE
 	/**
 	 * @acpi_ctx: ACPI context pointer