diff mbox series

Don't detect lock timeouts when timebase is invalid

Message ID 20180309054708.14538-1-stewart@linux.vnet.ibm.com
State Accepted
Headers show
Series Don't detect lock timeouts when timebase is invalid | expand

Commit Message

Stewart Smith March 9, 2018, 5:47 a.m. UTC
We can have threads waiting on hmi_lock who have an
invalid timebase. Because of this, we want to go poke
the register directly rather than rely on
this_cpu()->tb_invalid (which won't have been set yet).

Without this patch, you get something like this when
you're injecting timebase errors:
[10976.202052846,4] WARNING: Lock has been spinning for 10976394ms

Fixes: 84186ef0944c9413262f0974ddab3fb1343ccfe8
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
---
 core/lock.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Oliver O'Halloran March 9, 2018, 5:50 a.m. UTC | #1
On Fri, Mar 9, 2018 at 4:47 PM, Stewart Smith
<stewart@linux.vnet.ibm.com> wrote:
> We can have threads waiting on hmi_lock who have an
> invalid timebase. Because of this, we want to go poke
> the register directly rather than rely on
> this_cpu()->tb_invalid (which won't have been set yet).

Maybe we should just switch the polarity of tb_invalid to tb_valid instead?

>
> Without this patch, you get something like this when
> you're injecting timebase errors:
> [10976.202052846,4] WARNING: Lock has been spinning for 10976394ms
>
> Fixes: 84186ef0944c9413262f0974ddab3fb1343ccfe8
> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
> ---
>  core/lock.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/core/lock.c b/core/lock.c
> index 2d94ebb4e285..318f42d92d51 100644
> --- a/core/lock.c
> +++ b/core/lock.c
> @@ -140,6 +140,13 @@ static inline bool lock_timeout(unsigned long start)
>         unsigned long wait = tb_to_msecs(mftb());
>
>         if (wait - start > LOCK_TIMEOUT_MS) {
> +               /*
> +                * If the timebase is invalid, we shouldn't
> +                * throw an error. This is possible with pending HMIs
> +                * that need to recover TB.
> +                */
> +               if( !(mfspr(SPR_TFMR) & SPR_TFMR_TB_VALID))
> +                       return false;
>                 prlog(PR_WARNING, "WARNING: Lock has been "\
>                       "spinning for %lums\n", wait - start);
>                 backtrace();
> --
> 2.14.3
>
> _______________________________________________
> Skiboot mailing list
> Skiboot@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/skiboot
Stewart Smith March 13, 2018, 4:12 a.m. UTC | #2
Oliver <oohall@gmail.com> writes:
> On Fri, Mar 9, 2018 at 4:47 PM, Stewart Smith
> <stewart@linux.vnet.ibm.com> wrote:
>> We can have threads waiting on hmi_lock who have an
>> invalid timebase. Because of this, we want to go poke
>> the register directly rather than rely on
>> this_cpu()->tb_invalid (which won't have been set yet).
>
> Maybe we should just switch the polarity of tb_invalid to tb_valid
> instead?

possibly? Either way, there's going to be a window where it's lies.

I do kind of like the "if (this_cpu()->tb_valid)" pattern more than the
invalid one.
Stewart Smith March 13, 2018, 4:35 a.m. UTC | #3
Stewart Smith <stewart@linux.vnet.ibm.com> writes:
> We can have threads waiting on hmi_lock who have an
> invalid timebase. Because of this, we want to go poke
> the register directly rather than rely on
> this_cpu()->tb_invalid (which won't have been set yet).
>
> Without this patch, you get something like this when
> you're injecting timebase errors:
> [10976.202052846,4] WARNING: Lock has been spinning for 10976394ms
>
> Fixes: 84186ef0944c9413262f0974ddab3fb1343ccfe8
> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
> ---
>  core/lock.c | 7 +++++++
>  1 file changed, 7 insertions(+)

Merged to master as of a8e6cc3f47525f86ef1d69d69a477b6264d0f8ee
diff mbox series

Patch

diff --git a/core/lock.c b/core/lock.c
index 2d94ebb4e285..318f42d92d51 100644
--- a/core/lock.c
+++ b/core/lock.c
@@ -140,6 +140,13 @@  static inline bool lock_timeout(unsigned long start)
 	unsigned long wait = tb_to_msecs(mftb());
 
 	if (wait - start > LOCK_TIMEOUT_MS) {
+		/*
+		 * If the timebase is invalid, we shouldn't
+		 * throw an error. This is possible with pending HMIs
+		 * that need to recover TB.
+		 */
+		if( !(mfspr(SPR_TFMR) & SPR_TFMR_TB_VALID))
+			return false;
 		prlog(PR_WARNING, "WARNING: Lock has been "\
 		      "spinning for %lums\n", wait - start);
 		backtrace();