diff mbox series

core/lock.c: ensure valid start value for lock spin duration warning

Message ID 20180328024452.26780-1-stewart@linux.vnet.ibm.com
State Accepted
Headers show
Series core/lock.c: ensure valid start value for lock spin duration warning | expand

Commit Message

Stewart Smith March 28, 2018, 2:44 a.m. UTC
The previous fix in a8e6cc3f4 only addressed half of the problem, as
we could also get an invalid value for start, causing us to fail
in a weird way.

This was caught by the testcases.OpTestHMIHandling.HMI_TFMR_ERRORS
test in op-test-framework.

You'd get to this part of the test and get the erroneous lock
spinning warnings:

PATH=/usr/local/sbin:$PATH putscom -c 00000000 0x2b010a84 0003080000000000
0000080000000000
[  790.140976993,4] WARNING: Lock has been spinning for 790275ms
[  790.140976993,4] WARNING: Lock has been spinning for 790275ms
[  790.140976918,4] WARNING: Lock has been spinning for 790275ms

This patch checks the validity of timebase before setting start,
and only checks the lock timeout if we got a valid start value.

Fixes: a8e6cc3f47525f86ef1d69d69a477b6264d0f8ee
Fixes: 84186ef0944c9413262f0974ddab3fb1343ccfe8
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
---
 core/lock.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

Comments

Mahesh J Salgaonkar March 30, 2018, 8:15 a.m. UTC | #1
On 03/28/2018 08:14 AM, Stewart Smith wrote:
> The previous fix in a8e6cc3f4 only addressed half of the problem, as
> we could also get an invalid value for start, causing us to fail
> in a weird way.
> 
> This was caught by the testcases.OpTestHMIHandling.HMI_TFMR_ERRORS
> test in op-test-framework.
> 
> You'd get to this part of the test and get the erroneous lock
> spinning warnings:
> 
> PATH=/usr/local/sbin:$PATH putscom -c 00000000 0x2b010a84 0003080000000000
> 0000080000000000
> [  790.140976993,4] WARNING: Lock has been spinning for 790275ms
> [  790.140976993,4] WARNING: Lock has been spinning for 790275ms
> [  790.140976918,4] WARNING: Lock has been spinning for 790275ms
> 
> This patch checks the validity of timebase before setting start,
> and only checks the lock timeout if we got a valid start value.

Looks good to me.

Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Thanks,
-Mahesh.

> 
> Fixes: a8e6cc3f47525f86ef1d69d69a477b6264d0f8ee
> Fixes: 84186ef0944c9413262f0974ddab3fb1343ccfe8
> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
> ---
>  core/lock.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/core/lock.c b/core/lock.c
> index 620d26b21345..4ae3a213951c 100644
> --- a/core/lock.c
> +++ b/core/lock.c
> @@ -206,7 +206,7 @@ bool try_lock_caller(struct lock *l, const char *owner)
>  void lock_caller(struct lock *l, const char *owner)
>  {
>  	bool timeout_warn = false;
> -	unsigned long start;
> +	unsigned long start = 0;
> 
>  	if (bust_locks)
>  		return;
> @@ -218,7 +218,13 @@ void lock_caller(struct lock *l, const char *owner)
>  	add_lock_request(l);
> 
>  #ifdef DEBUG_LOCKS
> -	start = tb_to_msecs(mftb());
> +	/*
> +	 * Ensure that we get a valid start value
> +	 * as we may be handling TFMR errors and taking
> +	 * a lock to do so, so timebase could be garbage
> +	 */
> +	if( (mfspr(SPR_TFMR) & SPR_TFMR_TB_VALID))
> +		start = tb_to_msecs(mftb());
>  #endif
> 
>  	for (;;) {
> @@ -229,7 +235,7 @@ void lock_caller(struct lock *l, const char *owner)
>  			barrier();
>  		smt_medium();
> 
> -		if (!timeout_warn)
> +		if (start && !timeout_warn)
>  			timeout_warn = lock_timeout(start);
>  	}
>
ppaidipe March 30, 2018, 10:51 a.m. UTC | #2
On 2018-03-30 13:45, Mahesh Jagannath Salgaonkar wrote:
> On 03/28/2018 08:14 AM, Stewart Smith wrote:
>> The previous fix in a8e6cc3f4 only addressed half of the problem, as
>> we could also get an invalid value for start, causing us to fail
>> in a weird way.
>> 
>> This was caught by the testcases.OpTestHMIHandling.HMI_TFMR_ERRORS
>> test in op-test-framework.
>> 
>> You'd get to this part of the test and get the erroneous lock
>> spinning warnings:
>> 
>> PATH=/usr/local/sbin:$PATH putscom -c 00000000 0x2b010a84 
>> 0003080000000000
>> 0000080000000000
>> [  790.140976993,4] WARNING: Lock has been spinning for 790275ms
>> [  790.140976993,4] WARNING: Lock has been spinning for 790275ms
>> [  790.140976918,4] WARNING: Lock has been spinning for 790275ms
>> 
>> This patch checks the validity of timebase before setting start,
>> and only checks the lock timeout if we got a valid start value.
> 
> Looks good to me.
> 
> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> Thanks,
> -Mahesh.
> 


Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>


>> 
>> Fixes: a8e6cc3f47525f86ef1d69d69a477b6264d0f8ee
>> Fixes: 84186ef0944c9413262f0974ddab3fb1343ccfe8
>> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
>> ---
>>  core/lock.c | 12 +++++++++---
>>  1 file changed, 9 insertions(+), 3 deletions(-)
>> 
>> diff --git a/core/lock.c b/core/lock.c
>> index 620d26b21345..4ae3a213951c 100644
>> --- a/core/lock.c
>> +++ b/core/lock.c
>> @@ -206,7 +206,7 @@ bool try_lock_caller(struct lock *l, const char 
>> *owner)
>>  void lock_caller(struct lock *l, const char *owner)
>>  {
>>  	bool timeout_warn = false;
>> -	unsigned long start;
>> +	unsigned long start = 0;
>> 
>>  	if (bust_locks)
>>  		return;
>> @@ -218,7 +218,13 @@ void lock_caller(struct lock *l, const char 
>> *owner)
>>  	add_lock_request(l);
>> 
>>  #ifdef DEBUG_LOCKS
>> -	start = tb_to_msecs(mftb());
>> +	/*
>> +	 * Ensure that we get a valid start value
>> +	 * as we may be handling TFMR errors and taking
>> +	 * a lock to do so, so timebase could be garbage
>> +	 */
>> +	if( (mfspr(SPR_TFMR) & SPR_TFMR_TB_VALID))
>> +		start = tb_to_msecs(mftb());
>>  #endif
>> 
>>  	for (;;) {
>> @@ -229,7 +235,7 @@ void lock_caller(struct lock *l, const char 
>> *owner)
>>  			barrier();
>>  		smt_medium();
>> 
>> -		if (!timeout_warn)
>> +		if (start && !timeout_warn)
>>  			timeout_warn = lock_timeout(start);
>>  	}
>> 
> 
> _______________________________________________
> Skiboot mailing list
> Skiboot@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/skiboot
Stewart Smith April 4, 2018, 8:53 a.m. UTC | #3
Stewart Smith <stewart@linux.vnet.ibm.com> writes:
> The previous fix in a8e6cc3f4 only addressed half of the problem, as
> we could also get an invalid value for start, causing us to fail
> in a weird way.
>
> This was caught by the testcases.OpTestHMIHandling.HMI_TFMR_ERRORS
> test in op-test-framework.
>
> You'd get to this part of the test and get the erroneous lock
> spinning warnings:
>
> PATH=/usr/local/sbin:$PATH putscom -c 00000000 0x2b010a84 0003080000000000
> 0000080000000000
> [  790.140976993,4] WARNING: Lock has been spinning for 790275ms
> [  790.140976993,4] WARNING: Lock has been spinning for 790275ms
> [  790.140976918,4] WARNING: Lock has been spinning for 790275ms
>
> This patch checks the validity of timebase before setting start,
> and only checks the lock timeout if we got a valid start value.
>
> Fixes: a8e6cc3f47525f86ef1d69d69a477b6264d0f8ee
> Fixes: 84186ef0944c9413262f0974ddab3fb1343ccfe8
> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
> ---
>  core/lock.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)

Merged to master as of 36f6c25c731446d2dafa1be3b412c387259f47bd
diff mbox series

Patch

diff --git a/core/lock.c b/core/lock.c
index 620d26b21345..4ae3a213951c 100644
--- a/core/lock.c
+++ b/core/lock.c
@@ -206,7 +206,7 @@  bool try_lock_caller(struct lock *l, const char *owner)
 void lock_caller(struct lock *l, const char *owner)
 {
 	bool timeout_warn = false;
-	unsigned long start;
+	unsigned long start = 0;
 
 	if (bust_locks)
 		return;
@@ -218,7 +218,13 @@  void lock_caller(struct lock *l, const char *owner)
 	add_lock_request(l);
 
 #ifdef DEBUG_LOCKS
-	start = tb_to_msecs(mftb());
+	/*
+	 * Ensure that we get a valid start value
+	 * as we may be handling TFMR errors and taking
+	 * a lock to do so, so timebase could be garbage
+	 */
+	if( (mfspr(SPR_TFMR) & SPR_TFMR_TB_VALID))
+		start = tb_to_msecs(mftb());
 #endif
 
 	for (;;) {
@@ -229,7 +235,7 @@  void lock_caller(struct lock *l, const char *owner)
 			barrier();
 		smt_medium();
 
-		if (!timeout_warn)
+		if (start && !timeout_warn)
 			timeout_warn = lock_timeout(start);
 	}