diff mbox

powerpc/mm: Add trace point for tracking hash pte fault

Message ID: 1421753750-17135-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com (mailing list archive)
State: Changes Requested
Delegated to: Michael Ellerman

Commit Message

Aneesh Kumar K.V Jan. 20, 2015, 11:35 a.m. UTC
This enables us to understand how many hash faults we are taking
when running benchmarks.

For example:
-bash-4.2# ./perf stat -e  powerpc:hash_fault -e page-faults /tmp/ebizzy.ppc64 -S 30  -P -n 1000
...

 Performance counter stats for '/tmp/ebizzy.ppc64 -S 30 -P -n 1000':

       1,10,04,075      powerpc:hash_fault
       1,10,03,429      page-faults

      30.865978991 seconds time elapsed
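Besides perf, the event can be consumed through the ftrace interface once the
patch is applied (the tracefs path below is the usual debugfs mount point and
assumes root access; the output line follows the TP_printk format in the patch):

```shell
# Enable the tracepoint and stream events (requires root on a patched kernel).
echo 1 > /sys/kernel/debug/tracing/events/powerpc/hash_fault/enable
cat /sys/kernel/debug/tracing/trace_pipe
# ... hash fault with addr 0x... and access = 0x... trap = 0x...
echo 0 > /sys/kernel/debug/tracing/events/powerpc/hash_fault/enable
```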

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/trace.h | 21 +++++++++++++++++++++
 arch/powerpc/mm/hash_utils_64.c  |  2 ++
 2 files changed, 23 insertions(+)

Comments

Michael Ellerman Jan. 21, 2015, 3:07 a.m. UTC | #1
On Tue, 2015-01-20 at 17:05 +0530, Aneesh Kumar K.V wrote:
> This enables us to understand how many hash fault we are taking
> when running benchmarks.
> 
> For ex:
> -bash-4.2# ./perf stat -e  powerpc:hash_fault -e page-faults /tmp/ebizzy.ppc64 -S 30  -P -n 1000
> ...
> 
>  Performance counter stats for '/tmp/ebizzy.ppc64 -S 30 -P -n 1000':
> 
>        1,10,04,075      powerpc:hash_fault
>        1,10,03,429      page-faults
> 
>       30.865978991 seconds time elapsed

Looks good.

Can you attach some test results that show it's not hurting performance when
it's disabled?

cheers
Aneesh Kumar K.V Jan. 21, 2015, 8:45 a.m. UTC | #2
Michael Ellerman <mpe@ellerman.id.au> writes:

> On Tue, 2015-01-20 at 17:05 +0530, Aneesh Kumar K.V wrote:
>> This enables us to understand how many hash fault we are taking
>> when running benchmarks.
>> 
>> For ex:
>> -bash-4.2# ./perf stat -e  powerpc:hash_fault -e page-faults /tmp/ebizzy.ppc64 -S 30  -P -n 1000
>> ...
>> 
>>  Performance counter stats for '/tmp/ebizzy.ppc64 -S 30 -P -n 1000':
>> 
>>        1,10,04,075      powerpc:hash_fault
>>        1,10,03,429      page-faults
>> 
>>       30.865978991 seconds time elapsed
>
> Looks good.
>
> Can you attach some test results that show it's not hurting performance when
> it's disabled.
>

ebizzy with -S 30 -t 1 -P gave
13627 records/s -> Without patch
13546 records/s -> With patch with tracepoint disabled
12408 records/s -> With patch with tracepoint enabled.

perf stat gave the below data for the above run.

 22,38,284      page-faults                                                 
 22,38,291      powerpc:hash_fault

I also used random_access_bench, which Anton wrote; it actually
creates lots of hash faults. A simple run gives
(random_access_bench -o load -g -i -t 10 16G):

1,888      page-faults                                                 
2,64,283   powerpc:hash_fault                                          

random_access_bench gave:
1435.979 MB/s -> Without patch
1435.29  MB/s -> With patch with tracepoint disabled
1434.75  MB/s -> With patch with tracepoint enabled.

-aneesh
Michael Ellerman Jan. 28, 2015, 6:11 a.m. UTC | #3
On Wed, 2015-01-21 at 14:15 +0530, Aneesh Kumar K.V wrote:
> Michael Ellerman <mpe@ellerman.id.au> writes:
> 
> > On Tue, 2015-01-20 at 17:05 +0530, Aneesh Kumar K.V wrote:
> >> This enables us to understand how many hash fault we are taking
> >> when running benchmarks.
> >> 
> >> For ex:
> >> -bash-4.2# ./perf stat -e  powerpc:hash_fault -e page-faults /tmp/ebizzy.ppc64 -S 30  -P -n 1000
> >> ...
> >> 
> >>  Performance counter stats for '/tmp/ebizzy.ppc64 -S 30 -P -n 1000':
> >> 
> >>        1,10,04,075      powerpc:hash_fault
> >>        1,10,03,429      page-faults
> >> 
> >>       30.865978991 seconds time elapsed
> >
> > Looks good.
> >
> > Can you attach some test results that show it's not hurting performance when
> > it's disabled.
> 
> ebizzy with -S 30 -t 1 -P gave
> 13627 records/s -> Without patch
> 13546 records/s -> With patch with tracepoint disabled

OK. So that's about -0.6%. Are we happy with that? I'm not sure.

Can you do a few more runs and see if that's a stable result.

> random_access_bench gave:
> 1435.979 MB/s -> Without patch
> 1435.29  MB/s -> With patch with tracepoint disabled

That's more like -0.05% which is in the noise.

cheers
Anton Blanchard Feb. 2, 2015, 10:26 a.m. UTC | #4
Hi,

> > ebizzy with -S 30 -t 1 -P gave
> > 13627 records/s -> Without patch
> > 13546 records/s -> With patch with tracepoint disabled
> 
> OK. So that's about -0.6%. Are we happy with that? I'm not sure.
> 
> Can you do a few more runs and see if that's a stable result.

Surprisingly large. Is CONFIG_JUMP_LABEL enabled? That should reduce the
tracepoint to just a nop.

Anton
Aneesh Kumar K.V Feb. 2, 2015, 4:12 p.m. UTC | #5
Michael Ellerman <mpe@ellerman.id.au> writes:

> On Wed, 2015-01-21 at 14:15 +0530, Aneesh Kumar K.V wrote:
>> Michael Ellerman <mpe@ellerman.id.au> writes:
>> 
>> > On Tue, 2015-01-20 at 17:05 +0530, Aneesh Kumar K.V wrote:
>> >> This enables us to understand how many hash fault we are taking
>> >> when running benchmarks.
>> >> 
>> >> For ex:
>> >> -bash-4.2# ./perf stat -e  powerpc:hash_fault -e page-faults /tmp/ebizzy.ppc64 -S 30  -P -n 1000
>> >> ...
>> >> 
>> >>  Performance counter stats for '/tmp/ebizzy.ppc64 -S 30 -P -n 1000':
>> >> 
>> >>        1,10,04,075      powerpc:hash_fault
>> >>        1,10,03,429      page-faults
>> >> 
>> >>       30.865978991 seconds time elapsed
>> >
>> > Looks good.
>> >
>> > Can you attach some test results that show it's not hurting performance when
>> > it's disabled.
>> 
>> ebizzy with -S 30 -t 1 -P gave
>> 13627 records/s -> Without patch
>> 13546 records/s -> With patch with tracepoint disabled
>
> OK. So that's about -0.6%. Are we happy with that? I'm not sure.
>
> Can you do a few more runs and see if that's a stable result.

That is within the run variance for that test. In fact, I found it
difficult to get stable records/s from an ebizzy run, even after fixing
the random state variable and forcing a single thread. I ended up writing
a micro benchmark that allocates a large region and touches one byte per
page.

That resulted in
time perf stat -e page-faults -e powerpc:hash_fault  ./a.out

 Performance counter stats for './a.out':

         10,00,062      page-faults                                                 
         10,00,068      powerpc:hash_fault                                          

      12.414350121 seconds time elapsed


real    0m12.558s
user    0m0.577s
sys     0m11.932s

and with that test we have an average system time over 10 runs:
Without patch
sys: 0m11.2425

With patch:
sys: 0m11.3258

i.e., a -0.7% impact.

If that impact is too high, we could possibly put the tracepoint under #ifdef
CONFIG_DEBUG_VM?

-aneesh
Aneesh Kumar K.V Feb. 2, 2015, 4:21 p.m. UTC | #6
Anton Blanchard <anton@samba.org> writes:

> Hi,
>
>> > ebizzy with -S 30 -t 1 -P gave
>> > 13627 records/s -> Without patch
>> > 13546 records/s -> With patch with tracepoint disabled
>> 
>> OK. So that's about -0.6%. Are we happy with that? I'm not sure.
>> 
>> Can you do a few more runs and see if that's a stable result.
>
> Surprisingly large. Is CONFIG_JUMP_LABEL enabled? That should reduce the
> tracepoint to just a nop.

Yes, we do use jump labels. I also verified that by looking at the generated .s:

#APP
 # 23 "./arch/powerpc/include/asm/jump_label.h" 1
        1:
        nop
        .pushsection __jump_table,  "aw"
        .llong 1b, .L201, __tracepoint_hash_fault+8      #,
        .popsection

 # 0 "" 2
.....

-aneesh
Anton Blanchard Feb. 2, 2015, 10:01 p.m. UTC | #7
Hi Aneesh,

> yes. We do use jump label. I also verified that looking at .s
> 
> #APP
>  # 23 "./arch/powerpc/include/asm/jump_label.h" 1
>         1:
>         nop
>         .pushsection __jump_table,  "aw"
>         .llong 1b, .L201, __tracepoint_hash_fault+8      #,
>         .popsection
> 
>  # 0 "" 2

So we insert a single nop, and the slow path is in another section. I'd
be surprised if we could measure this, unless the nop causes a branch
target alignment issue or the slow path causes some hot path icache layout
issues.

> Without patch
> sys: 0m11.2425
> 
> With patch:
> sys: 0m11.3258
>
> ie, a -0.7% impact 
> 
> If that impact is high we could possibly put that tracepoint within
> #ifdef CONFIG_DEBUG_VM ?

Did the real time change? I'd be careful about comparing based on
system time.

My feeling is we should not hide it behind CONFIG_DEBUG_VM.

Anton
Aneesh Kumar K.V Feb. 3, 2015, 3:07 a.m. UTC | #8
Anton Blanchard <anton@samba.org> writes:

> Hi Aneesh,
>
>> yes. We do use jump label. I also verified that looking at .s
>> 
>> #APP
>>  # 23 "./arch/powerpc/include/asm/jump_label.h" 1
>>         1:
>>         nop
>>         .pushsection __jump_table,  "aw"
>>         .llong 1b, .L201, __tracepoint_hash_fault+8      #,
>>         .popsection
>> 
>>  # 0 "" 2
>
> So we insert a single nop, and the slow path is in another section. I'd
> be surprised if we could measure this, unless the nop causes a branch
> target alignment issue the slow path caused some hot path icache layout
> issues.
>
>> Without patch
>> sys: 0m11.2425
>> 
>> With patch:
>> sys: 0m11.3258
>>
>> ie, a -0.7% impact 
>> 
>> If that impact is high we could possibly put that tracepoint within
>> #ifdef CONFIG_DEBUG_VM ?
>
> Did the real time change? I'd be careful about comparing based on
> system time.


Yes, it did:
11.8769s real with patch
11.7924s real without patch

I will look at the perf stat data difference between the two runs and update.

-aneesh
Aneesh Kumar K.V April 2, 2015, 8:44 a.m. UTC | #9
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:

> Michael Ellerman <mpe@ellerman.id.au> writes:
>
....
....
> With patch:
> sys: 0m11.3258
>
> ie, a -0.7% impact 
>
> If that impact is high we could possibly put that tracepoint within #ifdef
> CONFIG_DEBUG_VM ?

Since the ebizzy run results were not stable, I wrote a micro benchmark
to measure this, and I noticed that the results observed are within the
run variance of the test. I made sure we don't have context switches
between the runs; if I try to generate a large number of page-faults, we
end up with context switches.

For example, without the patch we get:
--------------------------------
[root@qemu-pr-host trace-fault]# bash run 

 Performance counter stats for './a.out 3000 300':

               643      page-faults               #    0.089 M/sec                  
          7.236562      task-clock (msec)         #    0.928 CPUs utilized          
         2,179,213      stalled-cycles-frontend   #    0.00% frontend cycles idle   
        17,174,367      stalled-cycles-backend    #    0.00% backend  cycles idle   
                 0      context-switches          #    0.000 K/sec                  

       0.007794658 seconds time elapsed

[root@qemu-pr-host trace-fault]# 

And with the patch:
---------------
[root@qemu-pr-host trace-fault]# bash run 

 Performance counter stats for './a.out 3000 300':

               643      page-faults               #    0.089 M/sec                  
          7.233746      task-clock (msec)         #    0.921 CPUs utilized          
                 0      context-switches          #    0.000 K/sec                  

       0.007854876 seconds time elapsed


 Performance counter stats for './a.out 3000 300':

               643      page-faults               #    0.087 M/sec                  
               649      powerpc:hash_fault        #    0.087 M/sec                  
          7.430376      task-clock (msec)         #    0.938 CPUs utilized          
         2,347,174      stalled-cycles-frontend   #    0.00% frontend cycles idle   
        17,524,282      stalled-cycles-backend    #    0.00% backend  cycles idle   
                 0      context-switches          #    0.000 K/sec                  

       0.007920284 seconds time elapsed

[root@qemu-pr-host trace-fault]#

Patch

diff --git a/arch/powerpc/include/asm/trace.h b/arch/powerpc/include/asm/trace.h
index c15da6073cb8..ded56b183888 100644
--- a/arch/powerpc/include/asm/trace.h
+++ b/arch/powerpc/include/asm/trace.h
@@ -142,8 +142,29 @@  TRACE_EVENT_FN(opal_exit,
 
 	opal_tracepoint_regfunc, opal_tracepoint_unregfunc
 );
+
 #endif
 
+TRACE_EVENT(hash_fault,
+
+	    TP_PROTO(unsigned long addr, unsigned long access, unsigned long trap),
+	    TP_ARGS(addr, access, trap),
+	    TP_STRUCT__entry(
+		    __field(unsigned long, addr)
+		    __field(unsigned long, access)
+		    __field(unsigned long, trap)
+		    ),
+
+	    TP_fast_assign(
+		    __entry->addr = addr;
+		    __entry->access = access;
+		    __entry->trap = trap;
+		    ),
+
+	    TP_printk("hash fault with addr 0x%lx and access = 0x%lx trap = 0x%lx",
+		      __entry->addr, __entry->access, __entry->trap)
+);
+
 #endif /* _TRACE_POWERPC_H */
 
 #undef TRACE_INCLUDE_PATH
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 2c2022d16059..7e88470a876f 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -57,6 +57,7 @@ 
 #include <asm/fadump.h>
 #include <asm/firmware.h>
 #include <asm/tm.h>
+#include <asm/trace.h>
 
 #ifdef DEBUG
 #define DBG(fmt...) udbg_printf(fmt)
@@ -1004,6 +1005,7 @@  int hash_page_mm(struct mm_struct *mm, unsigned long ea,
 
 	DBG_LOW("hash_page(ea=%016lx, access=%lx, trap=%lx\n",
 		ea, access, trap);
+	trace_hash_fault(ea, access, trap);
 
 	/* Get region & vsid */
  	switch (REGION_ID(ea)) {