diff mbox

[v2,1/6] powerpc/perf: Define big-endian version of perf_mem_data_src

Message ID 1488796993-25495-2-git-send-email-maddy@linux.vnet.ibm.com (mailing list archive)
State Superseded
Headers show

Commit Message

maddy March 6, 2017, 10:43 a.m. UTC
From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>

perf_mem_data_src is an union that is initialized via the ->val field
and accessed via the bitmap fields. For this to work on big endian
platforms, we also need a big-endian represenation of perf_mem_data_src.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
---
 include/uapi/linux/perf_event.h       | 16 ++++++++++++++++
 tools/include/uapi/linux/perf_event.h | 16 ++++++++++++++++
 2 files changed, 32 insertions(+)

Comments

Peter Zijlstra March 6, 2017, 11:22 a.m. UTC | #1
On Mon, Mar 06, 2017 at 04:13:08PM +0530, Madhavan Srinivasan wrote:
> From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
> 
> perf_mem_data_src is an union that is initialized via the ->val field
> and accessed via the bitmap fields. For this to work on big endian
> platforms, we also need a big-endian represenation of perf_mem_data_src.

Doesn't this break interpreting the data on a different endian machine?
David Laight March 6, 2017, 2:59 p.m. UTC | #2
From: Peter Zijlstra
> Sent: 06 March 2017 11:22
> To: Madhavan Srinivasan
> Cc: Wang Nan; Alexander Shishkin; linux-kernel@vger.kernel.org; Arnaldo Carvalho de Melo; Alexei
> Starovoitov; Ingo Molnar; Stephane Eranian; Sukadev Bhattiprolu; linuxppc-dev@lists.ozlabs.org
> Subject: Re: [PATCH v2 1/6] powerpc/perf: Define big-endian version of perf_mem_data_src
> 
> On Mon, Mar 06, 2017 at 04:13:08PM +0530, Madhavan Srinivasan wrote:
> > From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
> >
> > perf_mem_data_src is an union that is initialized via the ->val field
> > and accessed via the bitmap fields. For this to work on big endian
> > platforms, we also need a big-endian represenation of perf_mem_data_src.
> 
> Doesn't this break interpreting the data on a different endian machine?

Best to avoid bitfields if you ever care about the bit order.

	David
Peter Zijlstra March 6, 2017, 3:28 p.m. UTC | #3
On Mon, Mar 06, 2017 at 02:59:07PM +0000, David Laight wrote:
> From: Peter Zijlstra
> > Sent: 06 March 2017 11:22
> > To: Madhavan Srinivasan
> > Cc: Wang Nan; Alexander Shishkin; linux-kernel@vger.kernel.org; Arnaldo Carvalho de Melo; Alexei
> > Starovoitov; Ingo Molnar; Stephane Eranian; Sukadev Bhattiprolu; linuxppc-dev@lists.ozlabs.org
> > Subject: Re: [PATCH v2 1/6] powerpc/perf: Define big-endian version of perf_mem_data_src
> > 
> > On Mon, Mar 06, 2017 at 04:13:08PM +0530, Madhavan Srinivasan wrote:
> > > From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
> > >
> > > perf_mem_data_src is an union that is initialized via the ->val field
> > > and accessed via the bitmap fields. For this to work on big endian
> > > platforms, we also need a big-endian represenation of perf_mem_data_src.
> > 
> > Doesn't this break interpreting the data on a different endian machine?
> 
> Best to avoid bitfields if you ever care about the bit order.

Too late for that. But perf tool has quite a bit of code to muck fields
between different endians. With the full intent that generation on one
machine is readable by another.
maddy March 7, 2017, 9:58 a.m. UTC | #4
On Monday 06 March 2017 04:52 PM, Peter Zijlstra wrote:
> On Mon, Mar 06, 2017 at 04:13:08PM +0530, Madhavan Srinivasan wrote:
>> From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
>>
>> perf_mem_data_src is an union that is initialized via the ->val field
>> and accessed via the bitmap fields. For this to work on big endian
>> platforms, we also need a big-endian represenation of perf_mem_data_src.
> Doesn't this break interpreting the data on a different endian machine?

IIUC, we will need this patch to not to break the interpreting data
on a different endian machine. Data collected from power8 LE/BE
guests with this patchset applied. Kindly correct me if I missed
your question here.


With this patchset applied, perf.data from a power8 BigEndian guest:
==============================================================

$ sudo ./perf record -d -e mem_access ls
   [ perf record: Woken up 1 times to write data ]
   [ perf record: Captured and wrote 0.007 MB perf.data (8 samples) ]

$ sudo ./perf report --mem-mode --stdio
   # To display the perf.data header info, please use 
--header/--header-only options.
   #
   #
   # Total Lost Samples: 0
   #
   # Samples: 8  of event 'mem_access'
   # Total weight : 8
   # Sort order   : 
local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
   #
   # Overhead  Local Weight  Memory access Symbol                       
Shared Object     Data Symbol                             Data 
Object                         Snoop         TLB access              Locked
   # ........  ............  ........................ 
...........................  ................ 
...................................... 
..................................  ............ ......................  
......
   #
       25.00%  0             L2 hit                    [H] 
0xc00000000000c910       [unknown]         [H] 0xc000000f170e5310 
[unknown]                           N/A N/A                     No
       12.50%  0             L2 hit                    [k] 
.idle_cpu                [kernel.vmlinux]  [k] __per_cpu_offset+0x68 
[kernel.vmlinux].data..read_mostly  N/A N/A                     No
       12.50%  0             L2 hit                    [H] 
0xc00000000000ca58       [unknown]         [H] 0xc000000f170e5200 
[unknown]                           N/A N/A                     No
       12.50%  0             L3 hit                    [k] 
.copypage_power7         [kernel.vmlinux]  [k] 0xc00000002f6fc600 
[kernel.vmlinux].bss                N/A N/A                     No
       12.50%  0             L3 hit                    [k] 
.copypage_power7         [kernel.vmlinux]  [k] 0xc00000003f8b1980 
[kernel.vmlinux].bss                N/A N/A                     No
       12.50%  0             Local RAM hit             [k] 
._raw_spin_lock_irqsave  [kernel.vmlinux]  [k] 0xc000000033b5bdf4 
[kernel.vmlinux].bss                Miss N/A                     No
       12.50%  0             Remote Cache (1 hop) hit  [k] 
.perf_iterate_ctx        [kernel.vmlinux]  [k] 0xc000000000e88648 
[kernel.vmlinux]                    HitM N/A                     No


perf report from power8 LittleEndian guest (with this patch applied to 
perf tool):
==================================================================================

$ ./perf report --mem-mode --stdio -i perf.data.p8be.withpatch
   No kallsyms or vmlinux with build-id 
ca8a1a9d4b62b2a67ee01050afb1dfa03565a655 was found
   /boot/vmlinux with build id ca8a1a9d4b62b2a67ee01050afb1dfa03565a655 
not found, continuing without symbols
   No kallsyms or vmlinux with build-id 
ca8a1a9d4b62b2a67ee01050afb1dfa03565a655 was found
   /boot/vmlinux with build id ca8a1a9d4b62b2a67ee01050afb1dfa03565a655 
not found, continuing without symbols
   # To display the perf.data header info, please use 
--header/--header-only options.
   #
   #
   # Total Lost Samples: 0
   #
   # Samples: 8  of event 'mem_access'
   # Total weight : 8
   # Sort order   : 
local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
   #
   # Overhead  Local Weight  Memory access Symbol                  
Shared Object     Data Symbol Data Object       Snoop         TLB 
access              Locked
   # ........  ............  ........................ 
......................  ................  ...................... 
................  ............  ......................  ......
   #
       25.00%  0             L2 hit                    [H] 
0xc00000000000c910  [unknown]         [H] 0xc000000f170e5310 
[unknown]         N/A           N/A                     No
       12.50%  0             L2 hit                    [k] 
0xc0000000000f4d0c  [kernel.vmlinux]  [k] 0xc000000000f2dac8 
[kernel.vmlinux]  N/A           N/A                     No
       12.50%  0             L2 hit                    [H] 
0xc00000000000ca58  [unknown]         [H] 0xc000000f170e5200 
[unknown]         N/A           N/A                     No
       12.50%  0             L3 hit                    [k] 
0xc00000000006b560  [kernel.vmlinux]  [k] 0xc00000002f6fc600 
[kernel.vmlinux]  N/A           N/A                     No
       12.50%  0             L3 hit                    [k] 
0xc00000000006b560  [kernel.vmlinux]  [k] 0xc00000003f8b1980 
[kernel.vmlinux]  N/A           N/A                     No
       12.50%  0             Local RAM hit             [k] 
0xc00000000094ad14  [kernel.vmlinux]  [k] 0xc000000033b5bdf4 
[kernel.vmlinux]  Miss          N/A                     No
       12.50%  0             Remote Cache (1 hop) hit  [k] 
0xc0000000001ce31c  [kernel.vmlinux]  [k] 0xc000000000e88648 
[kernel.vmlinux]  HitM          N/A                     No



With this patch, perf.data from a power8 LE guest:
===================================================

$ sudo ./perf record -d -e mem_access ls
   [ perf record: Woken up 1 times to write data ]
   [ perf record: Captured and wrote 0.010 MB perf.data (6 samples) ]

$ sudo ./perf report --mem-mode --stdio
   # To display the perf.data header info, please use 
--header/--header-only options.
   #
   #
   # Total Lost Samples: 0
   #
   # Samples: 6  of event 'mem_access'
   # Total weight : 6
   # Sort order   : 
local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
   #
   # Overhead  Local Weight  Memory access Symbol                  
Shared Object     Data Symbol Data Object       Snoop         TLB 
access              Locked
   # ........  ............  ........................ 
......................  ................  ...................... 
................  ............  ......................  ......
   #
       16.67%  0             L2 hit                    [.] 
_init               ls                [.] 0x000001002ef4dd71 
[heap]            N/A           N/A                     No
       16.67%  0             L2 hit                    [k] 
irq_exit            [kernel.vmlinux]  [k] 0xc0000000ff6e2080 
[kernel.vmlinux]  N/A           N/A                     No
       16.67%  0             L2 hit                    [H] 
0xc0000000000ba210  [unknown]         [H] 0xc00000075ce2fe38 
[unknown]         N/A           N/A                     No
       16.67%  0             L2 hit                    [H] 
0xc00000000000bdb0  [unknown]         [H] 0xc00000075ce2fee0 
[unknown]         N/A           N/A                     No
       16.67%  0             L2 hit                    [H] 
0xc0000000000bb444  [unknown]         [H] 0xc00000075ce30490 
[unknown]         N/A           N/A                     No
       16.67%  0             L3 hit                    [H] 
0x0000000000066524  [unknown]         [H] 0xc0000000014e0000 
[unknown]         N/A           N/A                     No

perf report from power7 BE guest(with this patch applied to perf tool):
=========================================================================

$ ./perf report --mem-mode --stdio
   No kallsyms or vmlinux with build-id 
06240a7589956e7d388bdffdab9f7f138834fa81 was found
   /lib/modules/4.10.0-rc5+/build/vmlinux with build id 
06240a7589956e7d388bdffdab9f7f138834fa81 not found, continuing without 
symbols
   No kallsyms or vmlinux with build-id 
06240a7589956e7d388bdffdab9f7f138834fa81 was found
   /lib/modules/4.10.0-rc5+/build/vmlinux with build id 
06240a7589956e7d388bdffdab9f7f138834fa81 not found, continuing without 
symbols
   /usr/bin/ls with build id cf69c39cf0d28e5be86c03de5c556e3ce8d6ce27 
not found, continuing without symbols
   # To display the perf.data header info, please use 
--header/--header-only options.
   #
   #
   # Total Lost Samples: 0
   #
   # Samples: 6  of event 'mem_access'
   # Total weight : 6
   # Sort order   : 
local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
   #
   # Overhead  Local Weight  Memory access Symbol                  
Shared Object     Data Symbol Data Object       Snoop         TLB 
access              Locked
   # ........  ............  ........................ 
......................  ................  ...................... 
................  ............  ......................  ......
   #
       16.67%  0             L2 hit                    [k] 
0xc0000000000d3cc8  [kernel.vmlinux]  [k] 0xc0000000ff6e2080 
[kernel.vmlinux]  N/A           N/A                     No
       16.67%  0             L2 hit                    [.] 
0x0000000000012ba8  ls                [.] 0x000001002ef4dd71 
[heap]            N/A           N/A                     No
       16.67%  0             L2 hit                    [H] 
0xc0000000000ba210  [unknown]         [H] 0xc00000075ce2fe38 
[unknown]         N/A           N/A                     No
       16.67%  0             L2 hit                    [H] 
0xc00000000000bdb0  [unknown]         [H] 0xc00000075ce2fee0 
[unknown]         N/A           N/A                     No
       16.67%  0             L2 hit                    [H] 
0xc0000000000bb444  [unknown]         [H] 0xc00000075ce30490 
[unknown]         N/A           N/A                     No
       16.67%  0             L3 hit                    [H] 
0x0000000000066524  [unknown]         [H] 0xc0000000014e0000 
[unknown]         N/A           N/A                     No


Maddy


>
Peter Zijlstra March 7, 2017, 10:23 a.m. UTC | #5
On Tue, Mar 07, 2017 at 03:28:17PM +0530, Madhavan Srinivasan wrote:
> 
> 
> On Monday 06 March 2017 04:52 PM, Peter Zijlstra wrote:
> >On Mon, Mar 06, 2017 at 04:13:08PM +0530, Madhavan Srinivasan wrote:
> >>From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
> >>
> >>perf_mem_data_src is an union that is initialized via the ->val field
> >>and accessed via the bitmap fields. For this to work on big endian
> >>platforms, we also need a big-endian represenation of perf_mem_data_src.
> >Doesn't this break interpreting the data on a different endian machine?
> 
> IIUC, we will need this patch to not to break the interpreting data
> on a different endian machine. Data collected from power8 LE/BE
> guests with this patchset applied. Kindly correct me if I missed
> your question here.

So your patch adds compile time bitfield differences. My worry was that
there was no dynamic conversion routine in the tools (it has for a lot
of other places).

This yields two questions:

 - are these two static layouts identical? (seeing that you illustrate
   cross-endian things working this seems likely).

 - should you not have fixed this in the tool only? This patch
   effectively breaks ABI on big-endian architectures.
maddy March 13, 2017, 11:15 a.m. UTC | #6
On Tuesday 07 March 2017 03:53 PM, Peter Zijlstra wrote:
> On Tue, Mar 07, 2017 at 03:28:17PM +0530, Madhavan Srinivasan wrote:
>>
>> On Monday 06 March 2017 04:52 PM, Peter Zijlstra wrote:
>>> On Mon, Mar 06, 2017 at 04:13:08PM +0530, Madhavan Srinivasan wrote:
>>>> From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
>>>>
>>>> perf_mem_data_src is an union that is initialized via the ->val field
>>>> and accessed via the bitmap fields. For this to work on big endian
>>>> platforms, we also need a big-endian represenation of perf_mem_data_src.
>>> Doesn't this break interpreting the data on a different endian machine?
>> IIUC, we will need this patch to not to break the interpreting data
>> on a different endian machine. Data collected from power8 LE/BE
>> guests with this patchset applied. Kindly correct me if I missed
>> your question here.
> So your patch adds compile time bitfield differences. My worry was that
> there was no dynamic conversion routine in the tools (it has for a lot
> of other places).
>
> This yields two questions:
>
>   - are these two static layouts identical? (seeing that you illustrate
>     cross-endian things working this seems likely).
>
>   - should you not have fixed this in the tool only? This patch
>     effectively breaks ABI on big-endian architectures.

IIUC, we are the first BE user for this feature
(Kindly correct me if I am wrong), so technically we
are not breaking ABI here :) .  But let me also look  at
the dynamic conversion part.

Maddy

>
Peter Zijlstra March 13, 2017, 12:50 p.m. UTC | #7
On Mon, Mar 13, 2017 at 04:45:51PM +0530, Madhavan Srinivasan wrote:
> >  - should you not have fixed this in the tool only? This patch
> >    effectively breaks ABI on big-endian architectures.
> 
> IIUC, we are the first BE user for this feature
> (Kindly correct me if I am wrong), so technically we
> are not breaking ABI here :) .  But let me also look  at
> the dynamic conversion part.

Huh? PPC hasn't yet implemented this? Then why are you fixing it?
maddy March 14, 2017, 9:01 a.m. UTC | #8
On Monday 13 March 2017 06:20 PM, Peter Zijlstra wrote:
> On Mon, Mar 13, 2017 at 04:45:51PM +0530, Madhavan Srinivasan wrote:
>>>   - should you not have fixed this in the tool only? This patch
>>>     effectively breaks ABI on big-endian architectures.
>> IIUC, we are the first BE user for this feature
>> (Kindly correct me if I am wrong), so technically we
>> are not breaking ABI here :) .  But let me also look  at
>> the dynamic conversion part.
> Huh? PPC hasn't yet implemented this? Then why are you fixing it?

yes, PPC hasn't implemented this (until now).
And did not understand "Then why are you fixing it?"

Maddy
>
Peter Zijlstra March 14, 2017, 12:56 p.m. UTC | #9
On Tue, Mar 14, 2017 at 02:31:51PM +0530, Madhavan Srinivasan wrote:

> >Huh? PPC hasn't yet implemented this? Then why are you fixing it?
> 
> yes, PPC hasn't implemented this (until now).

until now where?

> And did not understand "Then why are you fixing it?"

I see no implementation; so why are you poking at it.
Michael Ellerman March 15, 2017, 6:20 a.m. UTC | #10
Hi Peter,

Peter Zijlstra <peterz@infradead.org> writes:
> On Tue, Mar 14, 2017 at 02:31:51PM +0530, Madhavan Srinivasan wrote:
>
>> >Huh? PPC hasn't yet implemented this? Then why are you fixing it?
>> 
>> yes, PPC hasn't implemented this (until now).
>
> until now where?

On powerpc there is currently no kernel support for filling the data_src
value with anything meaningful.

A user can still request PERF_SAMPLE_DATA_SRC (perf report -d), but they
just get the default value from perf_sample_data_init(), which is
PERF_MEM_NA.

Though even that is currently broken with a big endian perf tool.

>> And did not understand "Then why are you fixing it?"
>
> I see no implementation; so why are you poking at it.

Maddy has posted an implementation of the kernel part for powerpc in
patch 2 of this series, but maybe you're not on Cc?


Regardless of us wanting to do the kernel side on powerpc, the current
API is broken on big endian.

That's because in the kernel the PERF_MEM_NA value is constructed using
shifts:

  /* TLB access */
  #define PERF_MEM_TLB_NA		0x01 /* not available */
  ...
  #define PERF_MEM_TLB_SHIFT	26
  
  #define PERF_MEM_S(a, s) \
  	(((__u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)
  
  #define PERF_MEM_NA (PERF_MEM_S(OP, NA)   |\
  		    PERF_MEM_S(LVL, NA)   |\
  		    PERF_MEM_S(SNOOP, NA) |\
  		    PERF_MEM_S(LOCK, NA)  |\
  		    PERF_MEM_S(TLB, NA))

Which works out as:

  ((0x01 << 0) | (0x01 << 5) | (0x01 << 19) | (0x01 << 24) | (0x01 << 26))


Which means the PERF_MEM_NA value comes out of the kernel as 0x5080021
in CPU endian.

But then in the perf tool, the code uses the bitfields to inspect the
value, and currently the bitfields are defined using little endian
ordering.

So eg. in perf_mem__tlb_scnprintf() we see:
  data_src->val = 0x5080021
             op = 0x0
            lvl = 0x0
          snoop = 0x0
           lock = 0x0
           dtlb = 0x0
           rsvd = 0x5080021


So this patch does what I think is the minimal fix, of changing the
definition of the bitfields to match the values that are already
exported by the kernel on big endian. And it makes no change on little
endian.

cheers
Peter Zijlstra March 15, 2017, 12:23 p.m. UTC | #11
On Wed, Mar 15, 2017 at 05:20:15PM +1100, Michael Ellerman wrote:

> > I see no implementation; so why are you poking at it.
> 
> Maddy has posted an implementation of the kernel part for powerpc in
> patch 2 of this series, but maybe you're not on Cc?

I am not indeed. That and a completely inadequate Changelog have lead to
great confusion.
maddy March 16, 2017, 5:47 a.m. UTC | #12
On Wednesday 15 March 2017 11:50 AM, Michael Ellerman wrote:
> Hi Peter,
>
> Peter Zijlstra <peterz@infradead.org> writes:
>> On Tue, Mar 14, 2017 at 02:31:51PM +0530, Madhavan Srinivasan wrote:
>>
>>>> Huh? PPC hasn't yet implemented this? Then why are you fixing it?
>>> yes, PPC hasn't implemented this (until now).
>> until now where?
> On powerpc there is currently no kernel support for filling the data_src
> value with anything meaningful.
>
> A user can still request PERF_SAMPLE_DATA_SRC (perf report -d), but they
> just get the default value from perf_sample_data_init(), which is
> PERF_MEM_NA.
>
> Though even that is currently broken with a big endian perf tool.
>
>>> And did not understand "Then why are you fixing it?"
>> I see no implementation; so why are you poking at it.
> Maddy has posted an implementation of the kernel part for powerpc in
> patch 2 of this series, but maybe you're not on Cc?

Sorry, was out yesterday.

Yes my bad. I CCed lkml and ppcdev and took the emails
from get_maintainer script and added to each file.

I will send out a v3 with peterz and others in all patch.

>
> Regardless of us wanting to do the kernel side on powerpc, the current
> API is broken on big endian.
>
> That's because in the kernel the PERF_MEM_NA value is constructed using
> shifts:
>
>    /* TLB access */
>    #define PERF_MEM_TLB_NA		0x01 /* not available */
>    ...
>    #define PERF_MEM_TLB_SHIFT	26
>    
>    #define PERF_MEM_S(a, s) \
>    	(((__u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)
>    
>    #define PERF_MEM_NA (PERF_MEM_S(OP, NA)   |\
>    		    PERF_MEM_S(LVL, NA)   |\
>    		    PERF_MEM_S(SNOOP, NA) |\
>    		    PERF_MEM_S(LOCK, NA)  |\
>    		    PERF_MEM_S(TLB, NA))
>
> Which works out as:
>
>    ((0x01 << 0) | (0x01 << 5) | (0x01 << 19) | (0x01 << 24) | (0x01 << 26))
>
>
> Which means the PERF_MEM_NA value comes out of the kernel as 0x5080021
> in CPU endian.
>
> But then in the perf tool, the code uses the bitfields to inspect the
> value, and currently the bitfields are defined using little endian
> ordering.
>
> So eg. in perf_mem__tlb_scnprintf() we see:
>    data_src->val = 0x5080021
>               op = 0x0
>              lvl = 0x0
>            snoop = 0x0
>             lock = 0x0
>             dtlb = 0x0
>             rsvd = 0x5080021
>
>
> So this patch does what I think is the minimal fix, of changing the
> definition of the bitfields to match the values that are already
> exported by the kernel on big endian. And it makes no change on little
> endian.

Thanks for the detailed explanation. I will add this to the patch
commit message in the v3.

Maddy

>
> cheers
>
maddy March 16, 2017, 5:53 a.m. UTC | #13
On Wednesday 15 March 2017 05:53 PM, Peter Zijlstra wrote:
> On Wed, Mar 15, 2017 at 05:20:15PM +1100, Michael Ellerman wrote:
>
>>> I see no implementation; so why are you poking at it.
>> Maddy has posted an implementation of the kernel part for powerpc in
>> patch 2 of this series, but maybe you're not on Cc?
> I am not indeed. That and a completely inadequate Changelog have lead to
> great confusion.

Yes. my bad. I will send out a v3 today and will CC. Also will add
ellerman's explanation to the commit message.

Sorry for the confusion.

Maddy
diff mbox

Patch

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index c66a485a24ac..c4af1159a200 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -891,6 +891,7 @@  enum perf_callchain_context {
 #define PERF_FLAG_PID_CGROUP		(1UL << 2) /* pid=cgroup id, per-cpu mode only */
 #define PERF_FLAG_FD_CLOEXEC		(1UL << 3) /* O_CLOEXEC */
 
+#if defined(__LITTLE_ENDIAN_BITFIELD)
 union perf_mem_data_src {
 	__u64 val;
 	struct {
@@ -902,6 +903,21 @@  union perf_mem_data_src {
 			mem_rsvd:31;
 	};
 };
+#elif defined(__BIG_ENDIAN_BITFIELD)
+union perf_mem_data_src {
+	__u64 val;
+	struct {
+		__u64	mem_rsvd:31,
+			mem_dtlb:7,	/* tlb access */
+			mem_lock:2,	/* lock instr */
+			mem_snoop:5,	/* snoop mode */
+			mem_lvl:14,	/* memory hierarchy level */
+			mem_op:5;	/* type of opcode */
+	};
+};
+#else
+#error "Unknown endianness"
+#endif
 
 /* type of opcode (load/store/prefetch,code) */
 #define PERF_MEM_OP_NA		0x01 /* not available */
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index c66a485a24ac..c4af1159a200 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -891,6 +891,7 @@  enum perf_callchain_context {
 #define PERF_FLAG_PID_CGROUP		(1UL << 2) /* pid=cgroup id, per-cpu mode only */
 #define PERF_FLAG_FD_CLOEXEC		(1UL << 3) /* O_CLOEXEC */
 
+#if defined(__LITTLE_ENDIAN_BITFIELD)
 union perf_mem_data_src {
 	__u64 val;
 	struct {
@@ -902,6 +903,21 @@  union perf_mem_data_src {
 			mem_rsvd:31;
 	};
 };
+#elif defined(__BIG_ENDIAN_BITFIELD)
+union perf_mem_data_src {
+	__u64 val;
+	struct {
+		__u64	mem_rsvd:31,
+			mem_dtlb:7,	/* tlb access */
+			mem_lock:2,	/* lock instr */
+			mem_snoop:5,	/* snoop mode */
+			mem_lvl:14,	/* memory hierarchy level */
+			mem_op:5;	/* type of opcode */
+	};
+};
+#else
+#error "Unknown endianness"
+#endif
 
 /* type of opcode (load/store/prefetch,code) */
 #define PERF_MEM_OP_NA		0x01 /* not available */