diff mbox series

[U-Boot] watchdog: omap_wdt: improve watchdog reset path

Message ID 1519866948-25614-1-git-send-email-ruslan.bilovol@gmail.com
State Accepted
Commit 87c692cbc17bc987c8a5f8c0377d8326348786ea
Delegated to: Tom Rini
Headers show
Series [U-Boot] watchdog: omap_wdt: improve watchdog reset path | expand

Commit Message

Ruslan Bilovol March 1, 2018, 1:15 a.m. UTC
Remove busy looping during watchdog reset.
Each polling of W_PEND_WTGR bit ("finish posted
write") after watchdog reset takes 120-140us
on BeagleBone Black board. Current U-Boot code
has watchdog resets in random places and often
there is situation when watchdog is reset
few times in a row in nested functions.
This adds extra delays and slows the whole system.

Instead of polling W_PEND_WTGR bit, we skip
watchdog reset if the bit is set. Anyway, watchdog
is in the middle of reset *right now*, so we can
just return.

This noticeably increases performance of the
system. Below are some measurements on BBB:
 - DFU upload over USB                 15% faster
 - fastboot image upload               3x times faster
 - USB ep0 transfers with 4k packets   20% faster

Signed-off-by: Ruslan Bilovol <ruslan.bilovol@gmail.com>
---
 drivers/watchdog/omap_wdt.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

Comments

Ruslan Bilovol March 1, 2018, 2:10 p.m. UTC | #1
Hi Lukasz,

On Thu, Mar 1, 2018 at 12:53 PM, Lukasz Majewski <lukma@denx.de> wrote:
> Hi Ruslan,
>
>> Remove busy looping during watchdog reset.
>> Each polling of W_PEND_WTGR bit ("finish posted
>> write") after watchdog reset takes 120-140us
>> on BeagleBone Black board. Current U-Boot code
>> has watchdog resets in random places and often
>> there is situation when watchdog is reset
>> few times in a row in nested functions.
>> This adds extra delays and slows the whole system.
>>
>> Instead of polling W_PEND_WTGR bit, we skip
>> watchdog reset if the bit is set. Anyway, watchdog
>> is in the middle of reset *right now*, so we can
>> just return.
>
> It seems like similar problem may be in the Linux kernel driver:
> v4.16-rc1/source/drivers/watchdog/omap_wdt.c#L74
>
> Linux driver also waits for the write.

Right, Linux driver has similar code but it doesn't affect performance.
This is because of watchdog usage in Linux, it is enabled and
reset by userspace. This is quite rare event.
Moreover, since Linux has interrupts enabled and has scheduling,
such busy loops in the omap driver are not very different to
just mdelay() usage. The system can handle interrupts, and can
even do preemption if PREEMPT is enabled.
So use of such busy loops in that particular case shouldn't cause
any noticeable performance issues.

In U-Boot we have polling instead of interrupts and something like
cooperative multitasking. Also watchdog resets much more often,
and such busy pollings in the driver delay execution of other code.

For example, in DFU we have indefinite loop, something like:
   --->
  |   dfu_write()
  |   watchdog_reset()
  |   ...
  |   handle_usb()
  |      `----- watchdog_reset()
  |       `---- ...
  `----'

And each delay caused by watchdog reset adds significant
overhead comparing to handling USB events or writing to the
storage.

So as bottom line, we don't need to do similar change in
Linux driver because there is almost no impact on performance
in that case. But in case of U-Boot which uses polling instead
of interrupts, this patch causes such a big performance
improvement.

Best regards,
Ruslan

>
>>
>> This noticeably increases performance of the
>> system. Below are some measurements on BBB:
>>  - DFU upload over USB                 15% faster
>>  - fastboot image upload               3x times faster
>>  - USB ep0 transfers with 4k packets   20% faster
>>
>> Signed-off-by: Ruslan Bilovol <ruslan.bilovol@gmail.com>
>> ---
>>  drivers/watchdog/omap_wdt.c | 21 +++++++++++++++------
>>  1 file changed, 15 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/watchdog/omap_wdt.c b/drivers/watchdog/omap_wdt.c
>> index 7b1f429..d724c96 100644
>> --- a/drivers/watchdog/omap_wdt.c
>> +++ b/drivers/watchdog/omap_wdt.c
>> @@ -53,16 +53,25 @@ void hw_watchdog_reset(void)
>>  {
>>       struct wd_timer *wdt = (struct wd_timer *)WDT_BASE;
>>
>> -     /* wait for posted write to complete */
>> -     while ((readl(&wdt->wdtwwps)) & WDT_WWPS_PEND_WTGR)
>> -             ;
>> +     /*
>> +      * Somebody just triggered watchdog reset and write to WTGR
>> register
>> +      * is in progress. It is resetting right now, no need to
>> trigger it
>> +      * again
>> +      */
>> +     if ((readl(&wdt->wdtwwps)) & WDT_WWPS_PEND_WTGR)
>> +             return;
>>
>>       wdt_trgr_pattern = ~wdt_trgr_pattern;
>>       writel(wdt_trgr_pattern, &wdt->wdtwtgr);
>>
>> -     /* wait for posted write to complete */
>> -     while ((readl(&wdt->wdtwwps) & WDT_WWPS_PEND_WTGR))
>> -             ;
>> +     /*
>> +      * Don't wait for posted write to complete, i.e. don't check
>> +      * WDT_WWPS_PEND_WTGR bit in WWPS register. There is no
>> writes to
>> +      * WTGR register outside of this func, and if entering it
>> +      * we see WDT_WWPS_PEND_WTGR bit set, it means watchdog reset
>> +      * was just triggered. This prevents us from wasting time in
>> busy
>> +      * polling of WDT_WWPS_PEND_WTGR bit.
>> +      */
>>  }
>>
>>  static int omap_wdt_set_timeout(unsigned int timeout)
>
>
>
>
> Best regards,
>
> Lukasz Majewski
>
> --
>
> DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Sam Protsenko March 1, 2018, 2:29 p.m. UTC | #2
On 1 March 2018 at 03:15, Ruslan Bilovol <ruslan.bilovol@gmail.com> wrote:
> Remove busy looping during watchdog reset.
> Each polling of W_PEND_WTGR bit ("finish posted
> write") after watchdog reset takes 120-140us
> on BeagleBone Black board. Current U-Boot code
> has watchdog resets in random places and often
> there is situation when watchdog is reset
> few times in a row in nested functions.
> This adds extra delays and slows the whole system.
>
> Instead of polling W_PEND_WTGR bit, we skip
> watchdog reset if the bit is set. Anyway, watchdog
> is in the middle of reset *right now*, so we can
> just return.
>
> This noticeably increases performance of the
> system. Below are some measurements on BBB:
>  - DFU upload over USB                 15% faster
>  - fastboot image upload               3x times faster
>  - USB ep0 transfers with 4k packets   20% faster
>
> Signed-off-by: Ruslan Bilovol <ruslan.bilovol@gmail.com>
> ---
>  drivers/watchdog/omap_wdt.c | 21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/watchdog/omap_wdt.c b/drivers/watchdog/omap_wdt.c
> index 7b1f429..d724c96 100644
> --- a/drivers/watchdog/omap_wdt.c
> +++ b/drivers/watchdog/omap_wdt.c
> @@ -53,16 +53,25 @@ void hw_watchdog_reset(void)
>  {
>         struct wd_timer *wdt = (struct wd_timer *)WDT_BASE;
>
> -       /* wait for posted write to complete */
> -       while ((readl(&wdt->wdtwwps)) & WDT_WWPS_PEND_WTGR)
> -               ;
> +       /*
> +        * Somebody just triggered watchdog reset and write to WTGR register
> +        * is in progress. It is resetting right now, no need to trigger it
> +        * again
> +        */
> +       if ((readl(&wdt->wdtwwps)) & WDT_WWPS_PEND_WTGR)
> +               return;
>
>         wdt_trgr_pattern = ~wdt_trgr_pattern;
>         writel(wdt_trgr_pattern, &wdt->wdtwtgr);
>
> -       /* wait for posted write to complete */
> -       while ((readl(&wdt->wdtwwps) & WDT_WWPS_PEND_WTGR))
> -               ;
> +       /*
> +        * Don't wait for posted write to complete, i.e. don't check
> +        * WDT_WWPS_PEND_WTGR bit in WWPS register. There is no writes to
> +        * WTGR register outside of this func, and if entering it
> +        * we see WDT_WWPS_PEND_WTGR bit set, it means watchdog reset
> +        * was just triggered. This prevents us from wasting time in busy
> +        * polling of WDT_WWPS_PEND_WTGR bit.
> +        */
>  }
>
>  static int omap_wdt_set_timeout(unsigned int timeout)
> --
> 1.9.1
>

Tested-by: Sam Protsenko <semen.protsenko@linaro.org>

Tested this patch on my BeagleBone Black Rev A5A. Works fine. I can
see USB throughput improvement when doing "fastboot flash". Here are
my results:

 - before patch: 2.72 MiB/sec
 - when disabling CONFIG_OMAP_WATCHDOG: 5.13 MiB/sec
 - with this patch: 5.08 MiB/sec

So I can see 1.87 times improvement in speed, which is really good
news for us, especially for flashing big images. I wonder, what is the
max USB speed limit for transfer via EP1. E.g., what will we see if
test it via Gadget Zero in Linux? Maybe there is a room for further
improvements. But those are only my thoughts.

Thanks, Ruslan!
Tom Rini March 1, 2018, 2:33 p.m. UTC | #3
On Thu, Mar 01, 2018 at 04:10:44PM +0200, Ruslan Bilovol wrote:
> Hi Lukasz,
> 
> On Thu, Mar 1, 2018 at 12:53 PM, Lukasz Majewski <lukma@denx.de> wrote:
> > Hi Ruslan,
> >
> >> Remove busy looping during watchdog reset.
> >> Each polling of W_PEND_WTGR bit ("finish posted
> >> write") after watchdog reset takes 120-140us
> >> on BeagleBone Black board. Current U-Boot code
> >> has watchdog resets in random places and often
> >> there is situation when watchdog is reset
> >> few times in a row in nested functions.
> >> This adds extra delays and slows the whole system.
> >>
> >> Instead of polling W_PEND_WTGR bit, we skip
> >> watchdog reset if the bit is set. Anyway, watchdog
> >> is in the middle of reset *right now*, so we can
> >> just return.
> >
> > It seems like similar problem may be in the Linux kernel driver:
> > v4.16-rc1/source/drivers/watchdog/omap_wdt.c#L74
> >
> > Linux driver also waits for the write.
> 
> Right, Linux driver has similar code but it doesn't affect performance.
> This is because of watchdog usage in Linux, it is enabled and
> reset by userspace. This is quite rare event.
> Moreover, since Linux has interrupts enabled and has scheduling,
> such busy loops in the omap driver are not very different to
> just mdelay() usage. The system can handle interrupts, and can
> even do preemption if PREEMPT is enabled.
> So use of such busy loops in that particular case shouldn't cause
> any noticeable performance issues.

True.  But, can you also submit a patch to the kernel side (and CC me) ?
That's far more likely to get the attention of the engineers that might
know of corner cases with the various SoC families where we might need
to keep doing what we're doing or other possible problems.  Thanks!
Lokesh Vutla March 2, 2018, 9:13 a.m. UTC | #4
On Thursday 01 March 2018 06:45 AM, Ruslan Bilovol wrote:
> Remove busy looping during watchdog reset.
> Each polling of W_PEND_WTGR bit ("finish posted
> write") after watchdog reset takes 120-140us
> on BeagleBone Black board. Current U-Boot code
> has watchdog resets in random places and often
> there is situation when watchdog is reset
> few times in a row in nested functions.
> This adds extra delays and slows the whole system.
> 
> Instead of polling W_PEND_WTGR bit, we skip
> watchdog reset if the bit is set. Anyway, watchdog
> is in the middle of reset *right now*, so we can
> just return.
> 
> This noticeably increases performance of the
> system. Below are some measurements on BBB:
>  - DFU upload over USB                 15% faster
>  - fastboot image upload               3x times faster
>  - USB ep0 transfers with 4k packets   20% faster
> 
> Signed-off-by: Ruslan Bilovol <ruslan.bilovol@gmail.com

Given the limited usage of trigger register in this driver, I do not see
any issue with this patch.

Reviewed-by: Lokesh Vutla <lokeshvutla@ti.com>


Thanks and regards,
Lokesh

> ---
>  drivers/watchdog/omap_wdt.c | 21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/watchdog/omap_wdt.c b/drivers/watchdog/omap_wdt.c
> index 7b1f429..d724c96 100644
> --- a/drivers/watchdog/omap_wdt.c
> +++ b/drivers/watchdog/omap_wdt.c
> @@ -53,16 +53,25 @@ void hw_watchdog_reset(void)
>  {
>  	struct wd_timer *wdt = (struct wd_timer *)WDT_BASE;
>  
> -	/* wait for posted write to complete */
> -	while ((readl(&wdt->wdtwwps)) & WDT_WWPS_PEND_WTGR)
> -		;
> +	/*
> +	 * Somebody just triggered watchdog reset and write to WTGR register
> +	 * is in progress. It is resetting right now, no need to trigger it
> +	 * again
> +	 */
> +	if ((readl(&wdt->wdtwwps)) & WDT_WWPS_PEND_WTGR)
> +		return;
>  
>  	wdt_trgr_pattern = ~wdt_trgr_pattern;
>  	writel(wdt_trgr_pattern, &wdt->wdtwtgr);
>  
> -	/* wait for posted write to complete */
> -	while ((readl(&wdt->wdtwwps) & WDT_WWPS_PEND_WTGR))
> -		;
> +	/*
> +	 * Don't wait for posted write to complete, i.e. don't check
> +	 * WDT_WWPS_PEND_WTGR bit in WWPS register. There is no writes to
> +	 * WTGR register outside of this func, and if entering it
> +	 * we see WDT_WWPS_PEND_WTGR bit set, it means watchdog reset
> +	 * was just triggered. This prevents us from wasting time in busy
> +	 * polling of WDT_WWPS_PEND_WTGR bit.
> +	 */
>  }
>  
>  static int omap_wdt_set_timeout(unsigned int timeout)
>
Lukasz Majewski March 2, 2018, 2:20 p.m. UTC | #5
Hi Ruslan,

> Hi Lukasz,
> 
> On Thu, Mar 1, 2018 at 12:53 PM, Lukasz Majewski <lukma@denx.de>
> wrote:
> > Hi Ruslan,
> >  
> >> Remove busy looping during watchdog reset.
> >> Each polling of W_PEND_WTGR bit ("finish posted
> >> write") after watchdog reset takes 120-140us
> >> on BeagleBone Black board. Current U-Boot code
> >> has watchdog resets in random places and often
> >> there is situation when watchdog is reset
> >> few times in a row in nested functions.
> >> This adds extra delays and slows the whole system.
> >>
> >> Instead of polling W_PEND_WTGR bit, we skip
> >> watchdog reset if the bit is set. Anyway, watchdog
> >> is in the middle of reset *right now*, so we can
> >> just return.  
> >
> > It seems like similar problem may be in the Linux kernel driver:
> > v4.16-rc1/source/drivers/watchdog/omap_wdt.c#L74
> >
> > Linux driver also waits for the write.  
> 
> Right, Linux driver has similar code but it doesn't affect
> performance. This is because of watchdog usage in Linux, it is
> enabled and reset by userspace. This is quite rare event.
> Moreover, since Linux has interrupts enabled and has scheduling,
> such busy loops in the omap driver are not very different to
> just mdelay() usage. The system can handle interrupts, and can
> even do preemption if PREEMPT is enabled.
> So use of such busy loops in that particular case shouldn't cause
> any noticeable performance issues.
> 
> In U-Boot we have polling instead of interrupts and something like
> cooperative multitasking. Also watchdog resets much more often,
> and such busy pollings in the driver delay execution of other code.
> 
> For example, in DFU we have indefinite loop, something like:
>    --->  
>   |   dfu_write()
>   |   watchdog_reset()
>   |   ...
>   |   handle_usb()
>   |      `----- watchdog_reset()
>   |       `---- ...
>   `----'
> 
> And each delay caused by watchdog reset adds significant
> overhead comparing to handling USB events or writing to the
> storage.
> 
> So as bottom line, we don't need to do similar change in
> Linux driver because there is almost no impact on performance
> in that case. But in case of U-Boot which uses polling instead
> of interrupts, this patch causes such a big performance
> improvement.

Thank you for this work - it really improves speed considerably :-).

Reviewed-by: Lukasz Majewski <lukma@denx.de>

> 
> Best regards,
> Ruslan
> 
> >  
> >>
> >> This noticeably increases performance of the
> >> system. Below are some measurements on BBB:
> >>  - DFU upload over USB                 15% faster
> >>  - fastboot image upload               3x times faster
> >>  - USB ep0 transfers with 4k packets   20% faster
> >>
> >> Signed-off-by: Ruslan Bilovol <ruslan.bilovol@gmail.com>
> >> ---
> >>  drivers/watchdog/omap_wdt.c | 21 +++++++++++++++------
> >>  1 file changed, 15 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/drivers/watchdog/omap_wdt.c
> >> b/drivers/watchdog/omap_wdt.c index 7b1f429..d724c96 100644
> >> --- a/drivers/watchdog/omap_wdt.c
> >> +++ b/drivers/watchdog/omap_wdt.c
> >> @@ -53,16 +53,25 @@ void hw_watchdog_reset(void)
> >>  {
> >>       struct wd_timer *wdt = (struct wd_timer *)WDT_BASE;
> >>
> >> -     /* wait for posted write to complete */
> >> -     while ((readl(&wdt->wdtwwps)) & WDT_WWPS_PEND_WTGR)
> >> -             ;
> >> +     /*
> >> +      * Somebody just triggered watchdog reset and write to WTGR
> >> register
> >> +      * is in progress. It is resetting right now, no need to
> >> trigger it
> >> +      * again
> >> +      */
> >> +     if ((readl(&wdt->wdtwwps)) & WDT_WWPS_PEND_WTGR)
> >> +             return;
> >>
> >>       wdt_trgr_pattern = ~wdt_trgr_pattern;
> >>       writel(wdt_trgr_pattern, &wdt->wdtwtgr);
> >>
> >> -     /* wait for posted write to complete */
> >> -     while ((readl(&wdt->wdtwwps) & WDT_WWPS_PEND_WTGR))
> >> -             ;
> >> +     /*
> >> +      * Don't wait for posted write to complete, i.e. don't check
> >> +      * WDT_WWPS_PEND_WTGR bit in WWPS register. There is no
> >> writes to
> >> +      * WTGR register outside of this func, and if entering it
> >> +      * we see WDT_WWPS_PEND_WTGR bit set, it means watchdog reset
> >> +      * was just triggered. This prevents us from wasting time in
> >> busy
> >> +      * polling of WDT_WWPS_PEND_WTGR bit.
> >> +      */
> >>  }
> >>
> >>  static int omap_wdt_set_timeout(unsigned int timeout)  
> >
> >
> >
> >
> > Best regards,
> >
> > Lukasz Majewski
> >
> > --
> >
> > DENX Software Engineering GmbH,      Managing Director: Wolfgang
> > Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell,
> > Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email:
> > wd@denx.de  




Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Sam Protsenko March 2, 2018, 5:34 p.m. UTC | #6
On 1 March 2018 at 16:33, Tom Rini <trini@konsulko.com> wrote:
> On Thu, Mar 01, 2018 at 04:10:44PM +0200, Ruslan Bilovol wrote:
>> Hi Lukasz,
>>
>> On Thu, Mar 1, 2018 at 12:53 PM, Lukasz Majewski <lukma@denx.de> wrote:
>> > Hi Ruslan,
>> >
>> >> Remove busy looping during watchdog reset.
>> >> Each polling of W_PEND_WTGR bit ("finish posted
>> >> write") after watchdog reset takes 120-140us
>> >> on BeagleBone Black board. Current U-Boot code
>> >> has watchdog resets in random places and often
>> >> there is situation when watchdog is reset
>> >> few times in a row in nested functions.
>> >> This adds extra delays and slows the whole system.
>> >>
>> >> Instead of polling W_PEND_WTGR bit, we skip
>> >> watchdog reset if the bit is set. Anyway, watchdog
>> >> is in the middle of reset *right now*, so we can
>> >> just return.
>> >
>> > It seems like similar problem may be in the Linux kernel driver:
>> > v4.16-rc1/source/drivers/watchdog/omap_wdt.c#L74
>> >
>> > Linux driver also waits for the write.
>>
>> Right, Linux driver has similar code but it doesn't affect performance.
>> This is because of watchdog usage in Linux, it is enabled and
>> reset by userspace. This is quite rare event.
>> Moreover, since Linux has interrupts enabled and has scheduling,
>> such busy loops in the omap driver are not very different to
>> just mdelay() usage. The system can handle interrupts, and can
>> even do preemption if PREEMPT is enabled.
>> So use of such busy loops in that particular case shouldn't cause
>> any noticeable performance issues.
>
> True.  But, can you also submit a patch to the kernel side (and CC me) ?
> That's far more likely to get the attention of the engineers that might
> know of corner cases with the various SoC families where we might need
> to keep doing what we're doing or other possible problems.  Thanks!
>

Some additional input from my side.

My previous measurements were wrong, due to usage of slow USB hub port
on my laptop. Using another USB port I have next transfer speed for
"fastboot flash" operation:
 - without patch: 2.84 MiB/sec
 - with patch: 7.81  MiB/sec

So it gives us about 2.75 times improvement, as stated by Ruslan in
his commit message. Also, I tested that WDT continues to work
correctly, and it does (tried several use-cases, and also debug patch
with infinite loop). So again:

    Tested-by: Sam Protsenko <semen.protsenko@linaro.org>

Also:

    Reviewed-by: Sam Protsenko <semen.protsenko@linaro.org>

I checked the code and I can say that there shouldn't be any concerns
about WDT functionality with this patch. By the way, in my opinion,
for kernel this patch doesn't make much sense, and there might be even
confusion in case we send it... If there any concerns about it, we can
invite related engineers in this discussion, asking them to review
this.

Overall, I think this patch is "must have" in U-Boot, as it gives us
dramatic improvement without any drawbacks, especially for AM335x
boards. It's probably the best approach for WDT reset procedure until
interrupts are enabled for ARM architecture (if we even consider it).

> --
> Tom
Ruslan Bilovol March 4, 2018, 2 p.m. UTC | #7
On Thu, Mar 1, 2018 at 4:33 PM, Tom Rini <trini@konsulko.com> wrote:
> On Thu, Mar 01, 2018 at 04:10:44PM +0200, Ruslan Bilovol wrote:
>> Hi Lukasz,
>>
>> On Thu, Mar 1, 2018 at 12:53 PM, Lukasz Majewski <lukma@denx.de> wrote:
>> > Hi Ruslan,
>> >
>> >> Remove busy looping during watchdog reset.
>> >> Each polling of W_PEND_WTGR bit ("finish posted
>> >> write") after watchdog reset takes 120-140us
>> >> on BeagleBone Black board. Current U-Boot code
>> >> has watchdog resets in random places and often
>> >> there is situation when watchdog is reset
>> >> few times in a row in nested functions.
>> >> This adds extra delays and slows the whole system.
>> >>
>> >> Instead of polling W_PEND_WTGR bit, we skip
>> >> watchdog reset if the bit is set. Anyway, watchdog
>> >> is in the middle of reset *right now*, so we can
>> >> just return.
>> >
>> > It seems like similar problem may be in the Linux kernel driver:
>> > v4.16-rc1/source/drivers/watchdog/omap_wdt.c#L74
>> >
>> > Linux driver also waits for the write.
>>
>> Right, Linux driver has similar code but it doesn't affect performance.
>> This is because of watchdog usage in Linux, it is enabled and
>> reset by userspace. This is quite rare event.
>> Moreover, since Linux has interrupts enabled and has scheduling,
>> such busy loops in the omap driver are not very different to
>> just mdelay() usage. The system can handle interrupts, and can
>> even do preemption if PREEMPT is enabled.
>> So use of such busy loops in that particular case shouldn't cause
>> any noticeable performance issues.
>
> True.  But, can you also submit a patch to the kernel side (and CC me) ?
> That's far more likely to get the attention of the engineers that might
> know of corner cases with the various SoC families where we might need
> to keep doing what we're doing or other possible problems.  Thanks!

Do you mean just resend this patch, but include kernel watchdog
mailing list?
Or reimplement it for kernel?

In last case it doesn't have much sense - it brings almost no
value to the driver, since watchdog usage approach by kernel
is too different comparing to U-Boot.
Yet also kernel driver is more complicated and such a change may
trigger it's rework. Probably will need to introduce additional
synchronization at least.

By the way, if needed, I can resend this patch including watchdog
kernel mailing list and related maintainer, so kernel folks can review
and provide some feedback.

Best regards,
Ruslan Bilovol
Tom Rini March 5, 2018, 2:40 p.m. UTC | #8
On Sun, Mar 04, 2018 at 04:00:15PM +0200, Ruslan Bilovol wrote:
> On Thu, Mar 1, 2018 at 4:33 PM, Tom Rini <trini@konsulko.com> wrote:
> > On Thu, Mar 01, 2018 at 04:10:44PM +0200, Ruslan Bilovol wrote:
> >> Hi Lukasz,
> >>
> >> On Thu, Mar 1, 2018 at 12:53 PM, Lukasz Majewski <lukma@denx.de> wrote:
> >> > Hi Ruslan,
> >> >
> >> >> Remove busy looping during watchdog reset.
> >> >> Each polling of W_PEND_WTGR bit ("finish posted
> >> >> write") after watchdog reset takes 120-140us
> >> >> on BeagleBone Black board. Current U-Boot code
> >> >> has watchdog resets in random places and often
> >> >> there is situation when watchdog is reset
> >> >> few times in a row in nested functions.
> >> >> This adds extra delays and slows the whole system.
> >> >>
> >> >> Instead of polling W_PEND_WTGR bit, we skip
> >> >> watchdog reset if the bit is set. Anyway, watchdog
> >> >> is in the middle of reset *right now*, so we can
> >> >> just return.
> >> >
> >> > It seems like similar problem may be in the Linux kernel driver:
> >> > v4.16-rc1/source/drivers/watchdog/omap_wdt.c#L74
> >> >
> >> > Linux driver also waits for the write.
> >>
> >> Right, Linux driver has similar code but it doesn't affect performance.
> >> This is because of watchdog usage in Linux, it is enabled and
> >> reset by userspace. This is quite rare event.
> >> Moreover, since Linux has interrupts enabled and has scheduling,
> >> such busy loops in the omap driver are not very different to
> >> just mdelay() usage. The system can handle interrupts, and can
> >> even do preemption if PREEMPT is enabled.
> >> So use of such busy loops in that particular case shouldn't cause
> >> any noticeable performance issues.
> >
> > True.  But, can you also submit a patch to the kernel side (and CC me) ?
> > That's far more likely to get the attention of the engineers that might
> > know of corner cases with the various SoC families where we might need
> > to keep doing what we're doing or other possible problems.  Thanks!
> 
> Do you mean just resend this patch, but include kernel watchdog
> mailing list?

Well, I'm going to set aside what I said here since we've gotten some TI
folks to review the patch here.
Alex Kiernan March 16, 2018, 7:39 a.m. UTC | #9
On Thu, Mar 1, 2018 at 1:15 AM, Ruslan Bilovol <ruslan.bilovol@gmail.com> wrote:
> Remove busy looping during watchdog reset.
> Each polling of W_PEND_WTGR bit ("finish posted
> write") after watchdog reset takes 120-140us
> on BeagleBone Black board. Current U-Boot code
> has watchdog resets in random places and often
> there is situation when watchdog is reset
> few times in a row in nested functions.
> This adds extra delays and slows the whole system.
>
> Instead of polling W_PEND_WTGR bit, we skip
> watchdog reset if the bit is set. Anyway, watchdog
> is in the middle of reset *right now*, so we can
> just return.
>
> This noticeably increases performance of the
> system. Below are some measurements on BBB:
>  - DFU upload over USB                 15% faster
>  - fastboot image upload               3x times faster
>  - USB ep0 transfers with 4k packets   20% faster
>
> Signed-off-by: Ruslan Bilovol <ruslan.bilovol@gmail.com>

Tested-by: Alex Kiernan <alex.kiernan@gmail.com>
Tom Rini March 16, 2018, 1:50 p.m. UTC | #10
On Thu, Mar 01, 2018 at 03:15:48AM +0200, Ruslan Bilovol wrote:

> Remove busy looping during watchdog reset.
> Each polling of W_PEND_WTGR bit ("finish posted
> write") after watchdog reset takes 120-140us
> on BeagleBone Black board. Current U-Boot code
> has watchdog resets in random places and often
> there is situation when watchdog is reset
> few times in a row in nested functions.
> This adds extra delays and slows the whole system.
> 
> Instead of polling W_PEND_WTGR bit, we skip
> watchdog reset if the bit is set. Anyway, watchdog
> is in the middle of reset *right now*, so we can
> just return.
> 
> This noticeably increases performance of the
> system. Below are some measurements on BBB:
>  - DFU upload over USB                 15% faster
>  - fastboot image upload               3x times faster
>  - USB ep0 transfers with 4k packets   20% faster
> 
> Signed-off-by: Ruslan Bilovol <ruslan.bilovol@gmail.com>
> Tested-by: Sam Protsenko <semen.protsenko@linaro.org>
> Reviewed-by: Lokesh Vutla <lokeshvutla@ti.com>
> Reviewed-by: Lukasz Majewski <lukma@denx.de>
> Tested-by: Alex Kiernan <alex.kiernan@gmail.com>

Applied to u-boot/master, thanks!
diff mbox series

Patch

diff --git a/drivers/watchdog/omap_wdt.c b/drivers/watchdog/omap_wdt.c
index 7b1f429..d724c96 100644
--- a/drivers/watchdog/omap_wdt.c
+++ b/drivers/watchdog/omap_wdt.c
@@ -53,16 +53,25 @@  void hw_watchdog_reset(void)
 {
 	struct wd_timer *wdt = (struct wd_timer *)WDT_BASE;
 
-	/* wait for posted write to complete */
-	while ((readl(&wdt->wdtwwps)) & WDT_WWPS_PEND_WTGR)
-		;
+	/*
+	 * Somebody just triggered watchdog reset and write to WTGR register
+	 * is in progress. It is resetting right now, no need to trigger it
+	 * again
+	 */
+	if ((readl(&wdt->wdtwwps)) & WDT_WWPS_PEND_WTGR)
+		return;
 
 	wdt_trgr_pattern = ~wdt_trgr_pattern;
 	writel(wdt_trgr_pattern, &wdt->wdtwtgr);
 
-	/* wait for posted write to complete */
-	while ((readl(&wdt->wdtwwps) & WDT_WWPS_PEND_WTGR))
-		;
+	/*
+	 * Don't wait for posted write to complete, i.e. don't check
+	 * WDT_WWPS_PEND_WTGR bit in WWPS register. There is no writes to
+	 * WTGR register outside of this func, and if entering it
+	 * we see WDT_WWPS_PEND_WTGR bit set, it means watchdog reset
+	 * was just triggered. This prevents us from wasting time in busy
+	 * polling of WDT_WWPS_PEND_WTGR bit.
+	 */
 }
 
 static int omap_wdt_set_timeout(unsigned int timeout)