Message ID | 20180328024452.26780-1-stewart@linux.vnet.ibm.com |
---|---|
State | Accepted |
Headers | show |
Series | core/lock.c: ensure valid start value for lock spin duration warning | expand |
On 03/28/2018 08:14 AM, Stewart Smith wrote: > The previous fix in a8e6cc3f4 only addressed half of the problem, as > we could also get an invalid value for start, causing us to fail > in a weird way. > > This was caught by the testcases.OpTestHMIHandling.HMI_TFMR_ERRORS > test in op-test-framework. > > You'd get to this part of the test and get the erroneous lock > spinning warnings: > > PATH=/usr/local/sbin:$PATH putscom -c 00000000 0x2b010a84 0003080000000000 > 0000080000000000 > [ 790.140976993,4] WARNING: Lock has been spinning for 790275ms > [ 790.140976993,4] WARNING: Lock has been spinning for 790275ms > [ 790.140976918,4] WARNING: Lock has been spinning for 790275ms > > This patch checks the validity of timebase before setting start, > and only checks the lock timeout if we got a valid start value. Looks good to me. Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Thanks, -Mahesh. > > Fixes: a8e6cc3f47525f86ef1d69d69a477b6264d0f8ee > Fixes: 84186ef0944c9413262f0974ddab3fb1343ccfe8 > Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> > --- > core/lock.c | 12 +++++++++--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/core/lock.c b/core/lock.c > index 620d26b21345..4ae3a213951c 100644 > --- a/core/lock.c > +++ b/core/lock.c > @@ -206,7 +206,7 @@ bool try_lock_caller(struct lock *l, const char *owner) > void lock_caller(struct lock *l, const char *owner) > { > bool timeout_warn = false; > - unsigned long start; > + unsigned long start = 0; > > if (bust_locks) > return; > @@ -218,7 +218,13 @@ void lock_caller(struct lock *l, const char *owner) > add_lock_request(l); > > #ifdef DEBUG_LOCKS > - start = tb_to_msecs(mftb()); > + /* > + * Ensure that we get a valid start value > + * as we may be handling TFMR errors and taking > + * a lock to do so, so timebase could be garbage > + */ > + if( (mfspr(SPR_TFMR) & SPR_TFMR_TB_VALID)) > + start = tb_to_msecs(mftb()); > #endif > > for (;;) { > @@ -229,7 +235,7 @@ void lock_caller(struct lock *l, const char *owner) > barrier(); > smt_medium(); > > - if (!timeout_warn) > + if (start && !timeout_warn) > timeout_warn = lock_timeout(start); > } >
On 2018-03-30 13:45, Mahesh Jagannath Salgaonkar wrote: > On 03/28/2018 08:14 AM, Stewart Smith wrote: >> The previous fix in a8e6cc3f4 only addressed half of the problem, as >> we could also get an invalid value for start, causing us to fail >> in a weird way. >> >> This was caught by the testcases.OpTestHMIHandling.HMI_TFMR_ERRORS >> test in op-test-framework. >> >> You'd get to this part of the test and get the erroneous lock >> spinning warnings: >> >> PATH=/usr/local/sbin:$PATH putscom -c 00000000 0x2b010a84 >> 0003080000000000 >> 0000080000000000 >> [ 790.140976993,4] WARNING: Lock has been spinning for 790275ms >> [ 790.140976993,4] WARNING: Lock has been spinning for 790275ms >> [ 790.140976918,4] WARNING: Lock has been spinning for 790275ms >> >> This patch checks the validity of timebase before setting start, >> and only checks the lock timeout if we got a valid start value. > > Looks good to me. > > Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> > > Thanks, > -Mahesh. > Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> >> >> Fixes: a8e6cc3f47525f86ef1d69d69a477b6264d0f8ee >> Fixes: 84186ef0944c9413262f0974ddab3fb1343ccfe8 >> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> >> --- >> core/lock.c | 12 +++++++++--- >> 1 file changed, 9 insertions(+), 3 deletions(-) >> >> diff --git a/core/lock.c b/core/lock.c >> index 620d26b21345..4ae3a213951c 100644 >> --- a/core/lock.c >> +++ b/core/lock.c >> @@ -206,7 +206,7 @@ bool try_lock_caller(struct lock *l, const char >> *owner) >> void lock_caller(struct lock *l, const char *owner) >> { >> bool timeout_warn = false; >> - unsigned long start; >> + unsigned long start = 0; >> >> if (bust_locks) >> return; >> @@ -218,7 +218,13 @@ void lock_caller(struct lock *l, const char >> *owner) >> add_lock_request(l); >> >> #ifdef DEBUG_LOCKS >> - start = tb_to_msecs(mftb()); >> + /* >> + * Ensure that we get a valid start value >> + * as we may be handling TFMR errors and taking >> + * a lock to do so, so timebase could be garbage >> + */ >> + if( (mfspr(SPR_TFMR) & SPR_TFMR_TB_VALID)) >> + start = tb_to_msecs(mftb()); >> #endif >> >> for (;;) { >> @@ -229,7 +235,7 @@ void lock_caller(struct lock *l, const char >> *owner) >> barrier(); >> smt_medium(); >> >> - if (!timeout_warn) >> + if (start && !timeout_warn) >> timeout_warn = lock_timeout(start); >> } >> > > _______________________________________________ > Skiboot mailing list > Skiboot@lists.ozlabs.org > https://lists.ozlabs.org/listinfo/skiboot
Stewart Smith <stewart@linux.vnet.ibm.com> writes: > The previous fix in a8e6cc3f4 only addressed half of the problem, as > we could also get an invalid value for start, causing us to fail > in a weird way. > > This was caught by the testcases.OpTestHMIHandling.HMI_TFMR_ERRORS > test in op-test-framework. > > You'd get to this part of the test and get the erroneous lock > spinning warnings: > > PATH=/usr/local/sbin:$PATH putscom -c 00000000 0x2b010a84 0003080000000000 > 0000080000000000 > [ 790.140976993,4] WARNING: Lock has been spinning for 790275ms > [ 790.140976993,4] WARNING: Lock has been spinning for 790275ms > [ 790.140976918,4] WARNING: Lock has been spinning for 790275ms > > This patch checks the validity of timebase before setting start, > and only checks the lock timeout if we got a valid start value. > > Fixes: a8e6cc3f47525f86ef1d69d69a477b6264d0f8ee > Fixes: 84186ef0944c9413262f0974ddab3fb1343ccfe8 > Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> > --- > core/lock.c | 12 +++++++++--- > 1 file changed, 9 insertions(+), 3 deletions(-) Merged to master as of 36f6c25c731446d2dafa1be3b412c387259f47bd
diff --git a/core/lock.c b/core/lock.c index 620d26b21345..4ae3a213951c 100644 --- a/core/lock.c +++ b/core/lock.c @@ -206,7 +206,7 @@ bool try_lock_caller(struct lock *l, const char *owner) void lock_caller(struct lock *l, const char *owner) { bool timeout_warn = false; - unsigned long start; + unsigned long start = 0; if (bust_locks) return; @@ -218,7 +218,13 @@ void lock_caller(struct lock *l, const char *owner) add_lock_request(l); #ifdef DEBUG_LOCKS - start = tb_to_msecs(mftb()); + /* + * Ensure that we get a valid start value + * as we may be handling TFMR errors and taking + * a lock to do so, so timebase could be garbage + */ + if( (mfspr(SPR_TFMR) & SPR_TFMR_TB_VALID)) + start = tb_to_msecs(mftb()); #endif for (;;) { @@ -229,7 +235,7 @@ void lock_caller(struct lock *l, const char *owner) barrier(); smt_medium(); - if (!timeout_warn) + if (start && !timeout_warn) timeout_warn = lock_timeout(start); }
The previous fix in a8e6cc3f4 only addressed half of the problem, as we could also get an invalid value for start, causing us to fail in a weird way. This was caught by the testcases.OpTestHMIHandling.HMI_TFMR_ERRORS test in op-test-framework. You'd get to this part of the test and get the erroneous lock spinning warnings: PATH=/usr/local/sbin:$PATH putscom -c 00000000 0x2b010a84 0003080000000000 0000080000000000 [ 790.140976993,4] WARNING: Lock has been spinning for 790275ms [ 790.140976993,4] WARNING: Lock has been spinning for 790275ms [ 790.140976918,4] WARNING: Lock has been spinning for 790275ms This patch checks the validity of timebase before setting start, and only checks the lock timeout if we got a valid start value. Fixes: a8e6cc3f47525f86ef1d69d69a477b6264d0f8ee Fixes: 84186ef0944c9413262f0974ddab3fb1343ccfe8 Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> --- core/lock.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)