From patchwork Wed Oct 30 10:42:46 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Zijlstra X-Patchwork-Id: 287186 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from ozlabs.org (localhost [IPv6:::1]) by ozlabs.org (Postfix) with ESMTP id 738E32C04C2 for ; Wed, 30 Oct 2013 21:43:47 +1100 (EST) Received: by ozlabs.org (Postfix) id 727902C019B; Wed, 30 Oct 2013 21:43:19 +1100 (EST) Delivered-To: linuxppc-dev@ozlabs.org Received: from merlin.infradead.org (unknown [IPv6:2001:4978:20e::2]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTPS id 208E92C0127 for ; Wed, 30 Oct 2013 21:43:19 +1100 (EST) Received: from dhcp-077-248-225-117.chello.nl ([77.248.225.117] helo=laptop) by merlin.infradead.org with esmtpsa (Exim 4.80.1 #2 (Red Hat Linux)) id 1VbTEe-00053Y-Ei; Wed, 30 Oct 2013 10:42:48 +0000 Received: by laptop (Postfix, from userid 1000) id 889171003D8BC; Wed, 30 Oct 2013 11:42:46 +0100 (CET) Date: Wed, 30 Oct 2013 11:42:46 +0100 From: Peter Zijlstra To: Vince Weaver Subject: Re: perf events ring buffer memory barrier on powerpc Message-ID: <20131030104246.GH16117@laptop.programming.kicks-ass.net> References: <20131025173749.GG19466@laptop.lan> <20131028132634.GO19466@laptop.lan> <20131028163418.GD4126@linux.vnet.ibm.com> <20131028201735.GA15629@redhat.com> <20131029102131.GA16117@laptop.programming.kicks-ass.net> <20131029103057.GN2490@laptop.programming.kicks-ass.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2012-12-30) Cc: Michael Neuling , james.hogan@imgtec.com, Mathieu Desnoyers , Oleg Nesterov , LKML , Linux PPC dev , Anton Blanchard , Frederic Weisbecker , Victor Kaplansky , "Paul E. McKenney" X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.16rc2 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" On Tue, Oct 29, 2013 at 03:27:05PM -0400, Vince Weaver wrote: > On Tue, 29 Oct 2013, Peter Zijlstra wrote: > > > On Tue, Oct 29, 2013 at 11:21:31AM +0100, Peter Zijlstra wrote: > > --- linux-2.6.orig/include/uapi/linux/perf_event.h > > +++ linux-2.6/include/uapi/linux/perf_event.h > > @@ -479,13 +479,15 @@ struct perf_event_mmap_page { > > /* > > * Control data for the mmap() data buffer. > > * > > - * User-space reading the @data_head value should issue an rmb(), on > > - * SMP capable platforms, after reading this value -- see > > - * perf_event_wakeup(). > > + * User-space reading the @data_head value should issue an smp_rmb(), > > + * after reading this value. > > so where's the patch fixing perf to use the new recommendations? Fair enough, thanks for reminding me about that. See below. > Is this purely a performance thing or a correctness change? Correctness, although I suppose on most archs you'd be hard pushed to notice. > A change like this a bit of a pain, especially as userspace doesn't really > have nice access to smb_mb() defines so a lot of cut-and-pasting will > ensue for everyone who's trying to parse the mmap buffer. Agreed; we should maybe push for a user visible asm/barrier.h or so. --- Subject: perf, tool: Add required memory barriers To match patch bf378d341e48 ("perf: Fix perf ring buffer memory ordering") change userspace to also adhere to the ordering outlined. Most barrier implementations were gleaned from arch/*/include/asm/barrier.h and with the exception of metag I'm fairly sure they're correct. Cc: James Hogan Signed-off-by: Peter Zijlstra --- tools/perf/perf.h | 39 +++++++++++++++++++++++++++++++++++++-- tools/perf/util/evlist.h | 2 +- 2 files changed, 38 insertions(+), 3 deletions(-) diff --git a/tools/perf/perf.h b/tools/perf/perf.h index f61c230beec4..1b8a0a2a63d4 100644 --- a/tools/perf/perf.h +++ b/tools/perf/perf.h @@ -4,6 +4,8 @@ #include #if defined(__i386__) +#define mb() asm volatile("lock; addl $0,0(%%esp)" ::: "memory") +#define wmb() asm volatile("lock; addl $0,0(%%esp)" ::: "memory") #define rmb() asm volatile("lock; addl $0,0(%%esp)" ::: "memory") #define cpu_relax() asm volatile("rep; nop" ::: "memory"); #define CPUINFO_PROC "model name" @@ -13,6 +15,8 @@ #endif #if defined(__x86_64__) +#define mb() asm volatile("mfence" ::: "memory") +#define wmb() asm volatile("sfence" ::: "memory") #define rmb() asm volatile("lfence" ::: "memory") #define cpu_relax() asm volatile("rep; nop" ::: "memory"); #define CPUINFO_PROC "model name" @@ -23,20 +27,28 @@ #ifdef __powerpc__ #include "../../arch/powerpc/include/uapi/asm/unistd.h" +#define mb() asm volatile ("sync" ::: "memory") +#define wmb() asm volatile ("sync" ::: "memory") #define rmb() asm volatile ("sync" ::: "memory") #define cpu_relax() asm volatile ("" ::: "memory"); #define CPUINFO_PROC "cpu" #endif #ifdef __s390__ +#define mb() asm volatile("bcr 15,0" ::: "memory") +#define wmb() asm volatile("bcr 15,0" ::: "memory") #define rmb() asm volatile("bcr 15,0" ::: "memory") #define cpu_relax() asm volatile("" ::: "memory"); #endif #ifdef __sh__ #if defined(__SH4A__) || defined(__SH5__) +# define mb() asm volatile("synco" ::: "memory") +# define wmb() asm volatile("synco" ::: "memory") # define rmb() asm volatile("synco" ::: "memory") #else +# define mb() asm volatile("" ::: "memory") +# define wmb() asm volatile("" ::: "memory") # define rmb() asm volatile("" ::: "memory") #endif #define cpu_relax() asm volatile("" ::: "memory") @@ -44,24 +56,38 @@ #endif #ifdef __hppa__ +#define mb() asm volatile("" ::: "memory") +#define wmb() asm volatile("" ::: "memory") #define rmb() asm volatile("" ::: "memory") #define cpu_relax() asm volatile("" ::: "memory"); #define CPUINFO_PROC "cpu" #endif #ifdef __sparc__ +#ifdef __LP64__ +#define mb() asm volatile("ba,pt %%xcc, 1f\n" \ + "membar #StoreLoad\n" \ + "1:\n"":::"memory") +#else +#define mb() asm volatile("":::"memory") +#endif +#define wmb() asm volatile("":::"memory") #define rmb() asm volatile("":::"memory") #define cpu_relax() asm volatile("":::"memory") #define CPUINFO_PROC "cpu" #endif #ifdef __alpha__ +#define mb() asm volatile("mb" ::: "memory") +#define wmb() asm volatile("wmb" ::: "memory") #define rmb() asm volatile("mb" ::: "memory") #define cpu_relax() asm volatile("" ::: "memory") #define CPUINFO_PROC "cpu model" #endif #ifdef __ia64__ +#define mb() asm volatile ("mf" ::: "memory") +#define wmb() asm volatile ("mf" ::: "memory") #define rmb() asm volatile ("mf" ::: "memory") #define cpu_relax() asm volatile ("hint @pause" ::: "memory") #define CPUINFO_PROC "model name" @@ -72,35 +98,44 @@ * Use the __kuser_memory_barrier helper in the CPU helper page. See * arch/arm/kernel/entry-armv.S in the kernel source for details. */ +#define mb() ((void(*)(void))0xffff0fa0)() +#define wmb() ((void(*)(void))0xffff0fa0)() #define rmb() ((void(*)(void))0xffff0fa0)() #define cpu_relax() asm volatile("":::"memory") #define CPUINFO_PROC "Processor" #endif #ifdef __aarch64__ -#define rmb() asm volatile("dmb ld" ::: "memory") +#define mb() asm volatile("dmb ish" ::: "memory") +#define wmb() asm volatile("dmb ishld" ::: "memory") +#define rmb() asm volatile("dmb ishst" ::: "memory") #define cpu_relax() asm volatile("yield" ::: "memory") #endif #ifdef __mips__ -#define rmb() asm volatile( \ +#define mb() asm volatile( \ ".set mips2\n\t" \ "sync\n\t" \ ".set mips0" \ : /* no output */ \ : /* no input */ \ : "memory") +#define wmb() mb() +#define rmb() mb() #define cpu_relax() asm volatile("" ::: "memory") #define CPUINFO_PROC "cpu model" #endif #ifdef __arc__ +#define mb() asm volatile("" ::: "memory") +#define wmb() asm volatile("" ::: "memory") #define rmb() asm volatile("" ::: "memory") #define cpu_relax() rmb() #define CPUINFO_PROC "Processor" #endif #ifdef __metag__ +/* XXX no clue */ #define rmb() asm volatile("" ::: "memory") #define cpu_relax() asm volatile("" ::: "memory") #define CPUINFO_PROC "CPU" diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h index 6e8acc9abe38..8ab1b5ae4a0e 100644 --- a/tools/perf/util/evlist.h +++ b/tools/perf/util/evlist.h @@ -189,7 +189,7 @@ static inline void perf_mmap__write_tail(struct perf_mmap *md, /* * ensure all reads are done before we write the tail out. */ - /* mb(); */ + mb(); pc->data_tail = tail; }