powerpc/perf: Fix for core/nest imc call trace on cpuhotplug

Message ID 1507099852-28004-1-git-send-email-anju@linux.vnet.ibm.com
State Accepted
Commit 0d923820c6db1644c27c2d0a5af8920fc0f8cd81

Commit Message

Anju T Sudhakar Oct. 4, 2017, 6:50 a.m.
Nest/core PMU units are enabled only while they are in use. A reference
count is maintained for the events that use the nest/core PMU units.
Currently, the *_imc_counters_release functions use a WARN() to report
any underflow of the reference count.

The reference count goes negative when a perf session is started and
all CPUs in a given core are then offlined. In the cpuhotplug offline
path, ppc_core_imc_cpu_offline() sets ref->refc to zero if the CPU
going offline is the last CPU in its core, and makes an OPAL call to
disable the engine in that core. On perf session termination,
perf->destroy (core_imc_counters_release) first decrements ref->refc
for this core and, based on the resulting value, makes an OPAL call to
disable the core-imc engine. Since the cpuhotplug path has already
cleared ref->refc for the core and disabled the engine, the second
decrement in perf->destroy() at event termination makes the count
negative, which in turn fires the WARN(). The same happens for nest
units.

Check whether the reference count is already zero before decrementing
it, so that the count never goes negative.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
---
 arch/powerpc/perf/imc-pmu.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)
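
The sequence described in the commit message can be reproduced outside the
kernel with a small userspace model. This is only a sketch: the field names
lock and refc follow the patch, but the struct name, the model_* helpers, and
the stop_calls/underflows counters are hypothetical, the lock is a plain flag
rather than a kernel mutex, and the position of the underflow warning is
assumed from the commit message rather than taken from the driver source.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only. The field names lock/refc follow the patch;
 * everything else (struct name, model_* helpers, the two counters) is
 * hypothetical and exists just to make the sequence observable.
 */
struct imc_ref_model {
	bool locked;     /* stand-in for ref->lock */
	int refc;        /* events currently using the engine */
	int stop_calls;  /* times the engine was "stopped" */
	int underflows;  /* times the underflow warning would fire */
};

void model_lock(struct imc_ref_model *ref)
{
	assert(!ref->locked);
	ref->locked = true;
}

void model_unlock(struct imc_ref_model *ref)
{
	assert(ref->locked);
	ref->locked = false;
}

/* Event creation: take a reference. */
void model_event_open(struct imc_ref_model *ref)
{
	model_lock(ref);
	ref->refc++;
	model_unlock(ref);
}

/*
 * Last cpu of the core/node goes offline: clear the count and stop the
 * engine. Locking on the hotplug side is not modelled here.
 */
void model_last_cpu_offline(struct imc_ref_model *ref)
{
	ref->refc = 0;
	ref->stop_calls++;
}

/* Shared tail of both release variants: drop one reference. */
void model_put(struct imc_ref_model *ref)
{
	ref->refc--;
	if (ref->refc == 0)
		ref->stop_calls++;
	else if (ref->refc < 0)
		ref->underflows++;	/* assumed: where the WARN() sits */
}

/* Release without the zero check. */
void model_release_unguarded(struct imc_ref_model *ref)
{
	model_lock(ref);
	model_put(ref);
	model_unlock(ref);
}

/* Release with the zero check added by this patch. */
void model_release_guarded(struct imc_ref_model *ref)
{
	model_lock(ref);
	if (ref->refc == 0) {
		/* hotplug already cleared the count and stopped the engine */
		model_unlock(ref);
		return;
	}
	model_put(ref);
	model_unlock(ref);
}
```

Calling model_event_open(), then model_last_cpu_offline(), then
model_release_unguarded() leaves refc at -1 and records one underflow. With
model_release_guarded() in the last step, refc stays at zero, no underflow is
recorded, and the lock flag is cleared on the early return.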

Comments

Santosh Sivaraj Oct. 5, 2017, 9:50 a.m. | #1
Reviewed-by: Santosh Sivaraj <santosh@fossix.org>
Anju T Sudhakar Oct. 5, 2017, 9:51 a.m. | #2
Hi Santosh,


On Thursday 05 October 2017 03:20 PM, Santosh Sivaraj wrote:
> Reviewed-by: Santosh Sivaraj <santosh@fossix.org>

Thanks for reviewing.

-Anju
Michael Ellerman Oct. 13, 2017, 12:32 p.m. | #3
On Wed, 2017-10-04 at 06:50:52 UTC, Anju T Sudhakar wrote:
> Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
> Reviewed-by: Santosh Sivaraj <santosh@fossix.org>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/0d923820c6db1644c27c2d0a5af892

cheers

Patch

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 9ccac86f3463..e3a1f65933b5 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -399,6 +399,20 @@  static void nest_imc_counters_release(struct perf_event *event)
 
 	/* Take the mutex lock for this node and then decrement the reference count */
 	mutex_lock(&ref->lock);
+	if (ref->refc == 0) {
+		/*
+		 * This is true when a perf session is started and all
+		 * cpus in a given node are then offlined.
+		 *
+		 * In the cpuhotplug offline path, ppc_nest_imc_cpu_offline()
+		 * sets ref->refc to zero if the cpu going offline is the
+		 * last cpu in a given node, and makes an OPAL call to
+		 * disable the engine in that node.
+		 *
+		 */
+		mutex_unlock(&ref->lock);
+		return;
+	}
 	ref->refc--;
 	if (ref->refc == 0) {
 		rc = opal_imc_counters_stop(OPAL_IMC_COUNTERS_NEST,
@@ -646,6 +660,20 @@  static void core_imc_counters_release(struct perf_event *event)
 		return;
 
 	mutex_lock(&ref->lock);
+	if (ref->refc == 0) {
+		/*
+		 * This is true when a perf session is started and all
+		 * cpus in a given core are then offlined.
+		 *
+		 * In the cpuhotplug offline path, ppc_core_imc_cpu_offline()
+		 * sets ref->refc to zero if the cpu going offline is the
+		 * last cpu in a given core, and makes an OPAL call to
+		 * disable the engine in that core.
+		 *
+		 */
+		mutex_unlock(&ref->lock);
+		return;
+	}
 	ref->refc--;
 	if (ref->refc == 0) {
 		rc = opal_imc_counters_stop(OPAL_IMC_COUNTERS_CORE,