[v2,0/3] skiboot: OPAL support for IMC trace-mode

Message ID 20181212061545.9756-1-anju@linux.vnet.ibm.com

Message

Anju T Sudhakar Dec. 12, 2018, 6:15 a.m. UTC
IMC (In-Memory Collection counters) is a hardware monitoring facility
that collects a large number of hardware performance events.
POWER9 supports two modes for IMC: Accumulation mode and Trace mode.
In Accumulation mode, event counts are accumulated in system memory,
and the hypervisor reads the posted counts periodically or when
requested. In IMC Trace mode, the event counted is fixed to cycles, and
on each overflow the hardware snapshots the program counter along with
other details and writes them into the memory pointed to by LDBAR (a
ring buffer; the hardware wraps around). LDBAR has bits to indicate
IMC trace-mode.
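
As an illustration of the wrap-around behaviour described above, here is a
minimal Python sketch. It is not taken from the patches: the function names
are hypothetical, and the record size and buffer size passed in are
assumptions for the example only, since this mail does not specify the trace
record layout.

```python
def next_write_offset(offset, record_size, buffer_size):
    """Return the offset of the following record, wrapping at the buffer end.

    record_size and buffer_size are illustrative parameters, not values
    taken from the hardware documentation.
    """
    if buffer_size % record_size != 0:
        raise ValueError("buffer_size must be a multiple of record_size")
    return (offset + record_size) % buffer_size


def offsets_after(n_records, record_size, buffer_size):
    """List the offsets written by the first n_records samples, from offset 0."""
    offsets = []
    offset = 0
    for _ in range(n_records):
        offsets.append(offset)
        offset = next_write_offset(offset, record_size, buffer_size)
    return offsets
```

With a 256-byte buffer and 64-byte records (both made up), the fifth record
lands back at offset 0, silently overwriting the first one, which is why a
reader of the buffer has to keep up with the hardware.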
								    
Trace-IMC Implementation:                                                  
--------------------------                                                 
To enable trace-imc, we need to:

* Add a trace node to the DTS file for POWER9, so that the new trace node can
be discovered by the kernel.

The information included in the DTS file is as follows (a snippet from
the ima-catalog):
								    
TRACE_IMC: trace-events {
    #address-cells = <0x1>;
    #size-cells = <0x1>;
    event@10200000 {
        event-name = "cycles";
        reg = <0x10200000 0x8>;
        desc = "Reference cycles";
    };
};
trace@0 {
    compatible = "ibm,imc-counters";
    events-prefix = "trace_";
    reg = <0x0 0x8>;
    events = < &TRACE_IMC >;
    type = <0x2>;
    size = <0x40000>;
};
								    
The op-build changes needed to include the "trace node" are already pulled
into the ima-catalog repo.
								    
ps://github.com/open-power/op-build/commit/d3e75dc26d1283d7d5eb444bff1ec9e40d5dfc07
								    
* Enhance the opal_imc_counters_* calls to support this new trace mode
in IMC. Add support to initialize the trace-mode SCOM.
								    
TRACE_IMC_SCOM bit representation:                                         
								    
0:1     : SAMPSEL                                                          
2:33    : CPMC_LOAD                                                        
34:40   : CPMC1SEL                                                         
41:47   : CPMC2SEL                                                         
48:50   : BUFFERSIZE                                                       
51:63   : RESERVED                                                         
								    
CPMC_LOAD contains the sampling duration. SAMPSEL and CPMC*SEL determine
the event to count. BUFFERSIZE indicates the memory range. On each overflow,
the hardware snapshots the program counter along with other details, updates
the memory, and reloads the CPMC_LOAD value for the next sampling duration.
IMC hardware does not support exceptions, so it quietly wraps around when
it reaches the end of the memory buffer.
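
To make the bit layout above concrete, here is a minimal Python sketch that
packs and unpacks the listed fields. It is not the skiboot code: the function
names are hypothetical, and it assumes the usual IBM bit numbering for SCOM
registers, where bit 0 is the most significant bit of the 64-bit value. The
field values used with it are arbitrary.

```python
# Field name -> (first bit, last bit), inclusive, as listed in the table above.
FIELDS = {
    "SAMPSEL": (0, 1),
    "CPMC_LOAD": (2, 33),
    "CPMC1SEL": (34, 40),
    "CPMC2SEL": (41, 47),
    "BUFFERSIZE": (48, 50),
}


def pack_trace_imc_scom(**values):
    """Pack field values into a 64-bit integer (assumes bit 0 is the MSB)."""
    reg = 0
    for name, value in values.items():
        start, end = FIELDS[name]
        width = end - start + 1
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        # With bit 0 as the MSB, the field's last bit sits at shift 63 - end.
        reg |= value << (63 - end)
    return reg


def unpack_trace_imc_scom(reg):
    """Split a 64-bit register value back into its named fields."""
    fields = {}
    for name, (start, end) in FIELDS.items():
        width = end - start + 1
        fields[name] = (reg >> (63 - end)) & ((1 << width) - 1)
    return fields
```

For example, pack_trace_imc_scom(SAMPSEL=1) sets only bit 1 in IBM numbering,
i.e. the value 1 << 62. Bits 51:63 are reserved and are left as zero.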
----------------                                                           
								    
The key benefit of IMC trace-mode is that each sample record contains the
address pointer along with other information, so we can profile the IP
without interrupting the application.
								    
Performance data using 'perf top' with and without the trace-imc event:

When the application is monitored with the trace-imc event, we don't take any
PMI interrupts.

PMI interrupts count when the `perf top` command is executed without the trace-imc event:
								    
# perf top                                      
12.53%  [kernel]       [k] arch_cpu_idle                               
11.32%  [kernel]       [k] rcu_idle_enter                              
10.76%  [kernel]       [k] __next_timer_interrupt                      
 9.49%  [kernel]       [k] find_next_bit                               
 8.06%  [kernel]       [k] rcu_dynticks_eqs_exit                       
 7.82%  [kernel]       [k] do_idle                                     
 5.71%  [kernel]       [k] tick_nohz_idle_stop_tic                     
     [-----------------------]                                      
# cat /proc/interrupts  (a snippet from the output)                        
9944      1072        804        804       1644        804       1306      
804        804        804        804        804        804        804      
804        804       1961       1602        804        804       1258      
[-----------------------------------------------------------------]        
803        803        803        803        803        803        803      
803        803        803        803        804        804        804     
804        804        804        804        804        804        803     
803        803        803        803        803       1306        803     
803   Performance monitoring interrupts                                   
								    
								    
`perf top` with trace-imc (right after 'perf top' without trace-imc event):
								    
# perf top -e trace_imc/trace_cycles/                                      
12.50%  [kernel]          [k] arch_cpu_idle                            
11.81%  [kernel]          [k] __next_timer_interrupt                   
11.22%  [kernel]          [k] rcu_idle_enter                           
10.25%  [kernel]          [k] find_next_bit                            
 7.91%  [kernel]          [k] do_idle                                  
 7.69%  [kernel]          [k] rcu_dynticks_eqs_exit                    
 5.20%  [kernel]          [k] tick_nohz_idle_stop_tick                 
     [-----------------------]                                      
								    
# cat /proc/interrupts (a snippet from the output)                         
								    
9944      1072        804        804       1644        804       1306      
804        804        804        804        804        804        804      
804        804       1961       1602        804        804       1258      
[-----------------------------------------------------------------]        
803        803        803        803        803        803        803      
803        803        803        804        804        804        804
804        804        804        804        804        804        803     
803        803        803        803        803       1306        803     
803   Performance monitoring interrupts                                   
								    
The PMI interrupts count remains the same.
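
The before/after comparison above can be automated along these lines. This is
a minimal Python sketch, not part of the series: the function names are
hypothetical, and it assumes the counts sit on a single /proc/interrupts line
that starts with the label "PMI:", which the snippets above do not show.

```python
def pmi_counts(interrupts_text):
    """Return the per-CPU PMI counts from /proc/interrupts-style text.

    Assumes one line whose first field is "PMI:", followed by one count
    per CPU and then a textual description.
    """
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "PMI:":
            return [int(tok) for tok in fields[1:] if tok.isdigit()]
    raise ValueError("no PMI line found")


def pmi_delta(before_text, after_text):
    """Total number of PMIs taken between two /proc/interrupts snapshots."""
    before = pmi_counts(before_text)
    after = pmi_counts(after_text)
    if len(before) != len(after):
        raise ValueError("snapshots cover a different number of CPUs")
    return sum(after) - sum(before)
```

Taking one snapshot of /proc/interrupts before the run and one after, a delta
of 0 means no PMIs were taken while the workload was being profiled.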

Changelog:

From v1 -> v2
---------------

* Added documentation for the new trace-mode support
* Updated the commit message in the last patch


Anju T Sudhakar (5):
  skiboot/doc: Add documentation for trace-mode in imc.rst
  skiboot: Edit documentation for IMC opal call to include trace-imc
  skiboot: Trace IMC Macro definitions
  skiboot: Enable opal calls to init/start/stop IMC Trace mode
  skiboot: Mambo hack to add trace-imc nodes in the device-tree

 doc/imc.rst                        |   9 +++
 doc/opal-api/opal-imc-counters.rst |  28 ++++----
 external/mambo/skiboot.tcl         |  21 ++++++
 hw/imc.c                           | 103 +++++++++++++++++++++++++++++
 include/imc.h                      |  30 +++++++++
 include/opal-api.h                 |   1 +
 6 files changed, 180 insertions(+), 12 deletions(-)

Comments

maddy Feb. 4, 2019, 3:45 a.m. UTC | #1
On 12/12/18 11:45 AM, Anju T Sudhakar wrote:
> IMC (In-Memory collection counters) is a hardware monitoring facility
> that collects large number of hardware performance events.
> POWER9 support two modes for IMC which are the Accumulation mode and
> Trace mode. In Accumulation mode, event counts are accumulated in system
> Memory. Hypervisor then reads the posted counts periodically or when
> requested. In IMC Trace mode, event counted is fixed for cycles and on
> each overflow, hardware snapshots the program counter along with other
> details and writes into memory pointed by LDBAR(ring buffer memory,
> hardware wraps around). LDBAR has bits to indicate the IMC trace-mode.
> 				
Hi Stewart,

Any update on this? Kindly let me know if you have any concerns.
Maddy

Stewart Smith Feb. 25, 2019, 5:39 a.m. UTC | #2
Anju T Sudhakar <anju@linux.vnet.ibm.com> writes:
> IMC (In-Memory collection counters) is a hardware monitoring facility      
> that collects large number of hardware performance events.                 
> POWER9 support two modes for IMC which are the Accumulation mode and       
> Trace mode. In Accumulation mode, event counts are accumulated in system   
> Memory. Hypervisor then reads the posted counts periodically or when       
> requested. In IMC Trace mode, event counted is fixed for cycles and on     
> each overflow, hardware snapshots the program counter along with other     
> details and writes into memory pointed by LDBAR(ring buffer memory,        
> hardware wraps around). LDBAR has bits to indicate the IMC trace-mode.

Are there any tests for op-test to test this functionality?

> 								    
> Trace-IMC Implementation:                                                  
> --------------------------                                                 
> To enable trace-imc, we need to                                            
> 								    
> * Add trace node in the DTS file for power9, so that the new trace node can
> be discovered by the kernel.                                               
> 								    
> Informations included in the DTS file are as follows, (a snippet from      
> the ima-catalog)                                                           
> 								    
> TRACE_IMC: trace-events {                                                  
>      #address-cells = <0x1>;                                        
>      #size-cells = <0x1>;                                           
>      event@10200000 {                                               
> 	 event-name = "cycles" ;                                    
> 	 reg = <0x10200000 0x8>;                                    
> 	 desc = "Reference cycles" ;                                
>      };                                                             
>  };                                                                 
>  trace@0 {                                                          
> 	 compatible = "ibm,imc-counters";                           
> 	 events-prefix = "trace_";                                  
> 	 reg = <0x0 0x8>;                                           
> 	 events = < &TRACE_IMC >;                                   
> 	 type = <0x2>;                                              
> 	 size = <0x40000>;                                          
>  };                                                                 
> 								    
> OP-BUILD changes needed to include the "trace node" is already pulled in   
> to the ima-catalog repo.                                                   
> 								    
> ps://github.com/open-power/op-build/commit/d3e75dc26d1283d7d5eb444bff1ec9e40d5dfc07
> 								    
> * Enchance the opal_imc_counters_* calls to support this new trace mode    
> in imc. Add support to initialize the trace-mode scom.                     
> 								    
> TRACE_IMC_SCOM bit representation:                                         
> 								    
> 0:1     : SAMPSEL                                                          
> 2:33    : CPMC_LOAD                                                        
> 34:40   : CPMC1SEL                                                         
> 41:47   : CPMC2SEL                                                         
> 48:50   : BUFFERSIZE                                                       
> 51:63   : RESERVED                                                         
> 								    
> CPMC_LOAD contains the sampling duration. SAMPSEL and CPMC*SEL determines  
> the event to count. BUFFRSIZE indicates the memory range. On each overflow,
> hardware snapshots program counter along with other details and update the 
> memory and reloads the CMPC_LOAD value for the next sampling duration.     
> IMC hardware does not support exceptions, so it quietly wraps around if    
> memory buffer reaches the end.                                             
> ----------------                                                           
> 								    
> Key benefit of imc trace-mode is, each sample record contains the address
> pointer along with other information. So that, we can profile the IP
> without interrupting the application.
> 								    
> Performance data using 'perf top' with and without trace-imc event:        
> 								    
> When the application is monitored with trace-imc event, we dont take any   
> PMI interrupts.                                                            
> 								    
> PMI interrupts count when `perf top` command is executed without trace-imc event.
> 								    
> # perf top                                      
> 12.53%  [kernel]       [k] arch_cpu_idle                               
> 11.32%  [kernel]       [k] rcu_idle_enter                              
> 10.76%  [kernel]       [k] __next_timer_interrupt                      
>  9.49%  [kernel]       [k] find_next_bit                               
>  8.06%  [kernel]       [k] rcu_dynticks_eqs_exit                       
>  7.82%  [kernel]       [k] do_idle                                     
>  5.71%  [kernel]       [k] tick_nohz_idle_stop_tic                     
>      [-----------------------]                                      
> # cat /proc/interrupts  (a snippet from the output)                        
> 9944      1072        804        804       1644        804       1306      
> 804        804        804        804        804        804        804      
> 804        804       1961       1602        804        804       1258      
> [-----------------------------------------------------------------]        
> 803        803        803        803        803        803        803      
> 803        803        803        803        804        804        804     
> 804        804        804        804        804        804        803     
> 803        803        803        803        803       1306        803     
> 803   Performance monitoring interrupts                                   
> 								    
> 								    
> `perf top` with trace-imc (right after 'perf top' without trace-imc event):
> 								    
> # perf top -e trace_imc/trace_cycles/                                      
> 12.50%  [kernel]          [k] arch_cpu_idle                            
> 11.81%  [kernel]          [k] __next_timer_interrupt                   
> 11.22%  [kernel]          [k] rcu_idle_enter                           
> 10.25%  [kernel]          [k] find_next_bit                            
>  7.91%  [kernel]          [k] do_idle                                  
>  7.69%  [kernel]          [k] rcu_dynticks_eqs_exit                    
>  5.20%  [kernel]          [k] tick_nohz_idle_stop_tick                 
>      [-----------------------]                                      
> 								    
> # cat /proc/interrupts (a snippet from the output)                         
> 								    
> 9944      1072        804        804       1644        804       1306      
> 804        804        804        804        804        804        804      
> 804        804       1961       1602        804        804       1258      
> [-----------------------------------------------------------------]        
> 803        803        803        803        803        803        803      
> 803        803        803        804        804        804        804
> 804        804        804        804        804        804        803     
> 803        803        803        803        803       1306        803     
> 803   Performance monitoring interrupts                                   
> 								    
> The PMI interrupts count remains the same.

Something like the above, documenting the advantage, should probably
also be in the OPAL docs, i.e. this is a way for the OS to look at
performance counters of recent events without having the performance
monitoring interrupts fire, e.g. you could leave it on constantly
without an impact on performance (one hopes).


