Message ID | 20141117171812.22333.90395.stgit@ahduyck-server |
---|---|
State | Not Applicable, archived |
Delegated to: | David Miller |
Headers | show |
On Mon, 2014-11-17 at 09:18 -0800, Alexander Duyck wrote: > There are a number of situations where the mandatory barriers rmb() and > wmb() are used to order memory/memory operations in the device drivers > and those barriers are much heavier than they actually need to be. For > example in the case of PowerPC wmb() calls the heavy-weight sync > instruction when for memory/memory operations all that is really needed is > an lsync or eieio instruction. So essentially those are the same as the smp_* variants but not nop'ed out on !CONFIG_SMP right ? Ben. > This commit adds a fast (and loose) version of the mandatory memory > barriers rmb() and wmb(). The prefix to the name is actually based on the > version of the functions that already exist in the mips and tile trees. > However I thought it applicable since it gets at what we are trying to > accomplish with these barriers and somethat implies their risky nature. > > These new barriers are not as safe as the standard rmb() and wmb(). > Specifically they do not guarantee ordering between cache-enabled and > cache-inhibited memories. The primary use case for these would be to > enforce ordering of memory reads/writes when accessing cache-enabled memory > that is shared between the CPU and a device. > > It may also be noted that there is no fast_mb(). This is due to the fact > that most architectures didn't seem to have a good way to do a full memory > barrier quickly and so they usually resorted to an mb() for their smp_mb > call. As such there no point in adding a fast_mb() function if it is going > to map to mb() for all architectures anyway. > > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> > Cc: Frederic Weisbecker <fweisbec@gmail.com> > Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> > Cc: Michael Ellerman <michael@ellerman.id.au> > Cc: Michael Neuling <mikey@neuling.org> > Cc: Russell King <linux@arm.linux.org.uk> > Cc: Geert Uytterhoeven <geert@linux-m68k.org> > Cc: Heiko Carstens <heiko.carstens@de.ibm.com> > Cc: Linus Torvalds <torvalds@linux-foundation.org> > Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> > Cc: Tony Luck <tony.luck@intel.com> > Cc: Oleg Nesterov <oleg@redhat.com> > Cc: Will Deacon <will.deacon@arm.com> > Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Ingo Molnar <mingo@kernel.org> > Cc: David Miller <davem@davemloft.net> > Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> > --- > Documentation/memory-barriers.txt | 41 +++++++++++++++++++++++++++++++++++ > arch/arm/include/asm/barrier.h | 4 +++ > arch/arm64/include/asm/barrier.h | 3 +++ > arch/ia64/include/asm/barrier.h | 3 +++ > arch/metag/include/asm/barrier.h | 14 ++++++------ > arch/powerpc/include/asm/barrier.h | 22 +++++++++++-------- > arch/s390/include/asm/barrier.h | 2 ++ > arch/sparc/include/asm/barrier_64.h | 3 +++ > arch/x86/include/asm/barrier.h | 11 ++++++--- > arch/x86/um/asm/barrier.h | 13 ++++++----- > include/asm-generic/barrier.h | 8 +++++++ > 11 files changed, 98 insertions(+), 26 deletions(-) > > diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt > index 22a969c..fe55dea 100644 > --- a/Documentation/memory-barriers.txt > +++ b/Documentation/memory-barriers.txt > @@ -1615,6 +1615,47 @@ There are some more advanced barrier functions: > operations" subsection for information on where to use these. > > > + (*) fast_wmb(); > + (*) fast_rmb(); > + > + These are for use with memory based device I/O to guarantee the ordering > + of cache-enabled writes or reads with respect to other writes or reads > + to cache-enabled memory. > + > + For example, consider a device driver that shares memory with a device > + and uses a descriptor status value to indicate if the descriptor belongs > + to the device or the CPU, and a doorbell to notify it when new > + descriptors are available: > + > + if (desc->status != DEVICE_OWN) { > + /* do not read data until we own descriptor */ > + fast_rmb(); > + > + /* read/modify data */ > + read_data = desc->data; > + desc->data = write_data; > + > + /* flush modifications before status update */ > + fast_wmb(); > + > + /* assign ownership */ > + desc->status = DEVICE_OWN; > + > + /* force memory to sync before notifying device via MMIO */ > + wmb(); > + > + /* notify device of new descriptors */ > + writel(DESC_NOTIFY, doorbell); > + } > + > + The fast_rmb() allows us guarantee the device has released ownership > + before we read the data from the descriptor, and he fast_wmb() allows > + us to guarantee the data is written to the descriptor before the device > + can see it now has ownership. The wmb() is needed to guarantee that the > + cache-enabled memory writes have completed before attempting a write to > + the cache-inhibited MMIO region. > + > + > MMIO WRITE BARRIER > ------------------ > > diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h > index c6a3e73..c57903c 100644 > --- a/arch/arm/include/asm/barrier.h > +++ b/arch/arm/include/asm/barrier.h > @@ -43,10 +43,14 @@ > #define mb() do { dsb(); outer_sync(); } while (0) > #define rmb() dsb() > #define wmb() do { dsb(st); outer_sync(); } while (0) > +#define fast_rmb() dmb() > +#define fast_wmb() dmb(st) > #else > #define mb() barrier() > #define rmb() barrier() > #define wmb() barrier() > +#define fast_rmb() barrier() > +#define fast_wmb() barrier() > #endif > > #ifndef CONFIG_SMP > diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h > index 6389d60..3ca1a15 100644 > --- a/arch/arm64/include/asm/barrier.h > +++ b/arch/arm64/include/asm/barrier.h > @@ -32,6 +32,9 @@ > #define rmb() dsb(ld) > #define wmb() dsb(st) > > +#define fast_rmb() dmb(ld) > +#define fast_wmb() dmb(st) > + > #ifndef CONFIG_SMP > #define smp_mb() barrier() > #define smp_rmb() barrier() > diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h > index e8fffb0..997f5b0 100644 > --- a/arch/ia64/include/asm/barrier.h > +++ b/arch/ia64/include/asm/barrier.h > @@ -39,6 +39,9 @@ > #define rmb() mb() > #define wmb() mb() > > +#define fast_rmb() mb() > +#define fast_wmb() mb() > + > #ifdef CONFIG_SMP > # define smp_mb() mb() > #else > diff --git a/arch/metag/include/asm/barrier.h b/arch/metag/include/asm/barrier.h > index 6d8b8c9..edddb6d 100644 > --- a/arch/metag/include/asm/barrier.h > +++ b/arch/metag/include/asm/barrier.h > @@ -4,8 +4,6 @@ > #include <asm/metag_mem.h> > > #define nop() asm volatile ("NOP") > -#define mb() wmb() > -#define rmb() barrier() > > #ifdef CONFIG_METAG_META21 > > @@ -41,11 +39,13 @@ static inline void wr_fence(void) > > #endif /* !CONFIG_METAG_META21 */ > > -static inline void wmb(void) > -{ > - /* flush writes through the write combiner */ > - wr_fence(); > -} > +/* flush writes through the write combiner */ > +#define mb() wr_fence() > +#define rmb() barrier() > +#define wmb() mb() > + > +#define fast_rmb() rmb() > +#define fast_wmb() wmb() > > #ifndef CONFIG_SMP > #define fence() do { } while (0) > diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h > index cb6d66c..f480097 100644 > --- a/arch/powerpc/include/asm/barrier.h > +++ b/arch/powerpc/include/asm/barrier.h > @@ -36,22 +36,20 @@ > > #define set_mb(var, value) do { var = value; mb(); } while (0) > > -#ifdef CONFIG_SMP > - > #ifdef __SUBARCH_HAS_LWSYNC > # define SMPWMB LWSYNC > #else > # define SMPWMB eieio > #endif > > -#define __lwsync() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory") > +#define fast_rmb() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory") > +#define fast_wmb() __asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory") > > +#ifdef CONFIG_SMP > #define smp_mb() mb() > -#define smp_rmb() __lwsync() > -#define smp_wmb() __asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory") > +#define smp_rmb() fast_rmb() > +#define smp_wmb() fast_wmb() > #else > -#define __lwsync() barrier() > - > #define smp_mb() barrier() > #define smp_rmb() barrier() > #define smp_wmb() barrier() > @@ -69,10 +67,16 @@ > #define data_barrier(x) \ > asm volatile("twi 0,%0,0; isync" : : "r" (x) : "memory"); > > +/* > + * The use of smp_rmb() in these functions are actually meant to map from > + * smp_rmb()->fast_rmb()->LWSYNC. This way if smp is disabled then > + * smp_rmb()->barrier(), or if the platform doesn't support lwsync it will > + * map to the more heavy-weight sync. > + */ > #define smp_store_release(p, v) \ > do { \ > compiletime_assert_atomic_type(*p); \ > - __lwsync(); \ > + smp_rmb(); \ > ACCESS_ONCE(*p) = (v); \ > } while (0) > > @@ -80,7 +84,7 @@ do { \ > ({ \ > typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > compiletime_assert_atomic_type(*p); \ > - __lwsync(); \ > + smp_rmb(); \ > ___p1; \ > }) > > diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h > index 33d191d..9a31301 100644 > --- a/arch/s390/include/asm/barrier.h > +++ b/arch/s390/include/asm/barrier.h > @@ -24,6 +24,8 @@ > > #define rmb() mb() > #define wmb() mb() > +#define fast_rmb() rmb() > +#define fast_wmb() wmb() > #define smp_mb() mb() > #define smp_rmb() rmb() > #define smp_wmb() wmb() > diff --git a/arch/sparc/include/asm/barrier_64.h b/arch/sparc/include/asm/barrier_64.h > index 6c974c0..eac2777 100644 > --- a/arch/sparc/include/asm/barrier_64.h > +++ b/arch/sparc/include/asm/barrier_64.h > @@ -37,6 +37,9 @@ do { __asm__ __volatile__("ba,pt %%xcc, 1f\n\t" \ > #define rmb() __asm__ __volatile__("":::"memory") > #define wmb() __asm__ __volatile__("":::"memory") > > +#define fast_rmb() rmb() > +#define fast_wmb() wmb() > + > #define set_mb(__var, __value) \ > do { __var = __value; membar_safe("#StoreLoad"); } while(0) > > diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h > index 5238000..cf440e7 100644 > --- a/arch/x86/include/asm/barrier.h > +++ b/arch/x86/include/asm/barrier.h > @@ -24,13 +24,16 @@ > #define wmb() asm volatile("sfence" ::: "memory") > #endif > > -#ifdef CONFIG_SMP > -#define smp_mb() mb() > #ifdef CONFIG_X86_PPRO_FENCE > -# define smp_rmb() rmb() > +#define fast_rmb() rmb() > #else > -# define smp_rmb() barrier() > +#define fast_rmb() barrier() > #endif > +#define fast_wmb() barrier() > + > +#ifdef CONFIG_SMP > +#define smp_mb() mb() > +#define smp_rmb() fast_rmb() > #define smp_wmb() barrier() > #define set_mb(var, value) do { (void)xchg(&var, value); } while (0) > #else /* !SMP */ > diff --git a/arch/x86/um/asm/barrier.h b/arch/x86/um/asm/barrier.h > index d6511d9..2c0405d 100644 > --- a/arch/x86/um/asm/barrier.h > +++ b/arch/x86/um/asm/barrier.h > @@ -29,17 +29,18 @@ > > #endif /* CONFIG_X86_32 */ > > -#ifdef CONFIG_SMP > - > -#define smp_mb() mb() > #ifdef CONFIG_X86_PPRO_FENCE > -#define smp_rmb() rmb() > +#define fast_rmb() rmb() > #else /* CONFIG_X86_PPRO_FENCE */ > -#define smp_rmb() barrier() > +#define fast_rmb() barrier() > #endif /* CONFIG_X86_PPRO_FENCE */ > +#define fast_wmb() barrier() > > -#define smp_wmb() barrier() > +#ifdef CONFIG_SMP > > +#define smp_mb() mb() > +#define smp_rmb() fast_rmb() > +#define smp_wmb() fast_wmb() > #define set_mb(var, value) do { (void)xchg(&var, value); } while (0) > > #else /* CONFIG_SMP */ > diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h > index 1402fa8..c1d60b9 100644 > --- a/include/asm-generic/barrier.h > +++ b/include/asm-generic/barrier.h > @@ -42,6 +42,14 @@ > #define wmb() mb() > #endif > > +#ifndef fast_rmb > +#define fast_rmb() rmb() > +#endif > + > +#ifndef fast_wmb > +#define fast_wmb() wmb() > +#endif > + > #ifndef read_barrier_depends > #define read_barrier_depends() do { } while (0) > #endif > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Nov 17, 2014 at 09:18:13AM -0800, Alexander Duyck wrote: > There are a number of situations where the mandatory barriers rmb() and > wmb() are used to order memory/memory operations in the device drivers > and those barriers are much heavier than they actually need to be. For > example in the case of PowerPC wmb() calls the heavy-weight sync > instruction when for memory/memory operations all that is really needed is > an lsync or eieio instruction. Is this still the case if one of the memory operations is MMIO? Last I knew, it was not. > This commit adds a fast (and loose) version of the mandatory memory > barriers rmb() and wmb(). The prefix to the name is actually based on the > version of the functions that already exist in the mips and tile trees. > However I thought it applicable since it gets at what we are trying to > accomplish with these barriers and somethat implies their risky nature. > > These new barriers are not as safe as the standard rmb() and wmb(). > Specifically they do not guarantee ordering between cache-enabled and > cache-inhibited memories. The primary use case for these would be to > enforce ordering of memory reads/writes when accessing cache-enabled memory > that is shared between the CPU and a device. > > It may also be noted that there is no fast_mb(). This is due to the fact > that most architectures didn't seem to have a good way to do a full memory > barrier quickly and so they usually resorted to an mb() for their smp_mb > call. As such there no point in adding a fast_mb() function if it is going > to map to mb() for all architectures anyway. I must confess that I still don't entirely understand the motivation. Some problems in PowerPC barrier.h called out below. Thanx, Paul > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> > Cc: Frederic Weisbecker <fweisbec@gmail.com> > Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> > Cc: Michael Ellerman <michael@ellerman.id.au> > Cc: Michael Neuling <mikey@neuling.org> > Cc: Russell King <linux@arm.linux.org.uk> > Cc: Geert Uytterhoeven <geert@linux-m68k.org> > Cc: Heiko Carstens <heiko.carstens@de.ibm.com> > Cc: Linus Torvalds <torvalds@linux-foundation.org> > Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> > Cc: Tony Luck <tony.luck@intel.com> > Cc: Oleg Nesterov <oleg@redhat.com> > Cc: Will Deacon <will.deacon@arm.com> > Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Ingo Molnar <mingo@kernel.org> > Cc: David Miller <davem@davemloft.net> > Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> > --- > Documentation/memory-barriers.txt | 41 +++++++++++++++++++++++++++++++++++ > arch/arm/include/asm/barrier.h | 4 +++ > arch/arm64/include/asm/barrier.h | 3 +++ > arch/ia64/include/asm/barrier.h | 3 +++ > arch/metag/include/asm/barrier.h | 14 ++++++------ > arch/powerpc/include/asm/barrier.h | 22 +++++++++++-------- > arch/s390/include/asm/barrier.h | 2 ++ > arch/sparc/include/asm/barrier_64.h | 3 +++ > arch/x86/include/asm/barrier.h | 11 ++++++--- > arch/x86/um/asm/barrier.h | 13 ++++++----- > include/asm-generic/barrier.h | 8 +++++++ > 11 files changed, 98 insertions(+), 26 deletions(-) > > diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt > index 22a969c..fe55dea 100644 > --- a/Documentation/memory-barriers.txt > +++ b/Documentation/memory-barriers.txt > @@ -1615,6 +1615,47 @@ There are some more advanced barrier functions: > operations" subsection for information on where to use these. > > > + (*) fast_wmb(); > + (*) fast_rmb(); > + > + These are for use with memory based device I/O to guarantee the ordering > + of cache-enabled writes or reads with respect to other writes or reads > + to cache-enabled memory. > + > + For example, consider a device driver that shares memory with a device > + and uses a descriptor status value to indicate if the descriptor belongs > + to the device or the CPU, and a doorbell to notify it when new > + descriptors are available: > + > + if (desc->status != DEVICE_OWN) { > + /* do not read data until we own descriptor */ > + fast_rmb(); > + > + /* read/modify data */ > + read_data = desc->data; > + desc->data = write_data; > + > + /* flush modifications before status update */ > + fast_wmb(); > + > + /* assign ownership */ > + desc->status = DEVICE_OWN; > + > + /* force memory to sync before notifying device via MMIO */ > + wmb(); > + > + /* notify device of new descriptors */ > + writel(DESC_NOTIFY, doorbell); > + } > + > + The fast_rmb() allows us guarantee the device has released ownership > + before we read the data from the descriptor, and he fast_wmb() allows > + us to guarantee the data is written to the descriptor before the device > + can see it now has ownership. The wmb() is needed to guarantee that the > + cache-enabled memory writes have completed before attempting a write to > + the cache-inhibited MMIO region. > + > + > MMIO WRITE BARRIER > ------------------ > > diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h > index c6a3e73..c57903c 100644 > --- a/arch/arm/include/asm/barrier.h > +++ b/arch/arm/include/asm/barrier.h > @@ -43,10 +43,14 @@ > #define mb() do { dsb(); outer_sync(); } while (0) > #define rmb() dsb() > #define wmb() do { dsb(st); outer_sync(); } while (0) > +#define fast_rmb() dmb() > +#define fast_wmb() dmb(st) > #else > #define mb() barrier() > #define rmb() barrier() > #define wmb() barrier() > +#define fast_rmb() barrier() > +#define fast_wmb() barrier() > #endif > > #ifndef CONFIG_SMP > diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h > index 6389d60..3ca1a15 100644 > --- a/arch/arm64/include/asm/barrier.h > +++ b/arch/arm64/include/asm/barrier.h > @@ -32,6 +32,9 @@ > #define rmb() dsb(ld) > #define wmb() dsb(st) > > +#define fast_rmb() dmb(ld) > +#define fast_wmb() dmb(st) > + > #ifndef CONFIG_SMP > #define smp_mb() barrier() > #define smp_rmb() barrier() > diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h > index e8fffb0..997f5b0 100644 > --- a/arch/ia64/include/asm/barrier.h > +++ b/arch/ia64/include/asm/barrier.h > @@ -39,6 +39,9 @@ > #define rmb() mb() > #define wmb() mb() > > +#define fast_rmb() mb() > +#define fast_wmb() mb() > + > #ifdef CONFIG_SMP > # define smp_mb() mb() > #else > diff --git a/arch/metag/include/asm/barrier.h b/arch/metag/include/asm/barrier.h > index 6d8b8c9..edddb6d 100644 > --- a/arch/metag/include/asm/barrier.h > +++ b/arch/metag/include/asm/barrier.h > @@ -4,8 +4,6 @@ > #include <asm/metag_mem.h> > > #define nop() asm volatile ("NOP") > -#define mb() wmb() > -#define rmb() barrier() > > #ifdef CONFIG_METAG_META21 > > @@ -41,11 +39,13 @@ static inline void wr_fence(void) > > #endif /* !CONFIG_METAG_META21 */ > > -static inline void wmb(void) > -{ > - /* flush writes through the write combiner */ > - wr_fence(); > -} > +/* flush writes through the write combiner */ > +#define mb() wr_fence() > +#define rmb() barrier() > +#define wmb() mb() > + > +#define fast_rmb() rmb() > +#define fast_wmb() wmb() > > #ifndef CONFIG_SMP > #define fence() do { } while (0) > diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h > index cb6d66c..f480097 100644 > --- a/arch/powerpc/include/asm/barrier.h > +++ b/arch/powerpc/include/asm/barrier.h > @@ -36,22 +36,20 @@ > > #define set_mb(var, value) do { var = value; mb(); } while (0) > > -#ifdef CONFIG_SMP > - > #ifdef __SUBARCH_HAS_LWSYNC > # define SMPWMB LWSYNC > #else > # define SMPWMB eieio > #endif > > -#define __lwsync() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory") > +#define fast_rmb() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory") > +#define fast_wmb() __asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory") > > +#ifdef CONFIG_SMP > #define smp_mb() mb() > -#define smp_rmb() __lwsync() > -#define smp_wmb() __asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory") > +#define smp_rmb() fast_rmb() > +#define smp_wmb() fast_wmb() > #else > -#define __lwsync() barrier() > - > #define smp_mb() barrier() > #define smp_rmb() barrier() > #define smp_wmb() barrier() > @@ -69,10 +67,16 @@ > #define data_barrier(x) \ > asm volatile("twi 0,%0,0; isync" : : "r" (x) : "memory"); > > +/* > + * The use of smp_rmb() in these functions are actually meant to map from > + * smp_rmb()->fast_rmb()->LWSYNC. This way if smp is disabled then > + * smp_rmb()->barrier(), or if the platform doesn't support lwsync it will > + * map to the more heavy-weight sync. > + */ > #define smp_store_release(p, v) \ > do { \ > compiletime_assert_atomic_type(*p); \ > - __lwsync(); \ > + smp_rmb(); \ This is not good at all. For smp_store_release(), we absolutely must order prior loads and stores against the assignment on the following line. This is not something that smp_rmb() does, nor is it something that smp_wmb() does. Yes, it might happen to now, but this could easily break in the future -- plus this change is extremely misleading. The original __lwsync() is much more clear. > ACCESS_ONCE(*p) = (v); \ > } while (0) > > @@ -80,7 +84,7 @@ do { \ > ({ \ > typeof(*p) ___p1 = ACCESS_ONCE(*p); \ > compiletime_assert_atomic_type(*p); \ > - __lwsync(); \ > + smp_rmb(); \ > ___p1; \ > }) > > diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h > index 33d191d..9a31301 100644 > --- a/arch/s390/include/asm/barrier.h > +++ b/arch/s390/include/asm/barrier.h > @@ -24,6 +24,8 @@ > > #define rmb() mb() > #define wmb() mb() > +#define fast_rmb() rmb() > +#define fast_wmb() wmb() > #define smp_mb() mb() > #define smp_rmb() rmb() > #define smp_wmb() wmb() > diff --git a/arch/sparc/include/asm/barrier_64.h b/arch/sparc/include/asm/barrier_64.h > index 6c974c0..eac2777 100644 > --- a/arch/sparc/include/asm/barrier_64.h > +++ b/arch/sparc/include/asm/barrier_64.h > @@ -37,6 +37,9 @@ do { __asm__ __volatile__("ba,pt %%xcc, 1f\n\t" \ > #define rmb() __asm__ __volatile__("":::"memory") > #define wmb() __asm__ __volatile__("":::"memory") > > +#define fast_rmb() rmb() > +#define fast_wmb() wmb() > + > #define set_mb(__var, __value) \ > do { __var = __value; membar_safe("#StoreLoad"); } while(0) > > diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h > index 5238000..cf440e7 100644 > --- a/arch/x86/include/asm/barrier.h > +++ b/arch/x86/include/asm/barrier.h > @@ -24,13 +24,16 @@ > #define wmb() asm volatile("sfence" ::: "memory") > #endif > > -#ifdef CONFIG_SMP > -#define smp_mb() mb() > #ifdef CONFIG_X86_PPRO_FENCE > -# define smp_rmb() rmb() > +#define fast_rmb() rmb() > #else > -# define smp_rmb() barrier() > +#define fast_rmb() barrier() > #endif > +#define fast_wmb() barrier() > + > +#ifdef CONFIG_SMP > +#define smp_mb() mb() > +#define smp_rmb() fast_rmb() > #define smp_wmb() barrier() > #define set_mb(var, value) do { (void)xchg(&var, value); } while (0) > #else /* !SMP */ > diff --git a/arch/x86/um/asm/barrier.h b/arch/x86/um/asm/barrier.h > index d6511d9..2c0405d 100644 > --- a/arch/x86/um/asm/barrier.h > +++ b/arch/x86/um/asm/barrier.h > @@ -29,17 +29,18 @@ > > #endif /* CONFIG_X86_32 */ > > -#ifdef CONFIG_SMP > - > -#define smp_mb() mb() > #ifdef CONFIG_X86_PPRO_FENCE > -#define smp_rmb() rmb() > +#define fast_rmb() rmb() > #else /* CONFIG_X86_PPRO_FENCE */ > -#define smp_rmb() barrier() > +#define fast_rmb() barrier() > #endif /* CONFIG_X86_PPRO_FENCE */ > +#define fast_wmb() barrier() > > -#define smp_wmb() barrier() > +#ifdef CONFIG_SMP > > +#define smp_mb() mb() > +#define smp_rmb() fast_rmb() > +#define smp_wmb() fast_wmb() > #define set_mb(var, value) do { (void)xchg(&var, value); } while (0) > > #else /* CONFIG_SMP */ > diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h > index 1402fa8..c1d60b9 100644 > --- a/include/asm-generic/barrier.h > +++ b/include/asm-generic/barrier.h > @@ -42,6 +42,14 @@ > #define wmb() mb() > #endif > > +#ifndef fast_rmb > +#define fast_rmb() rmb() > +#endif > + > +#ifndef fast_wmb > +#define fast_wmb() wmb() > +#endif > + > #ifndef read_barrier_depends > #define read_barrier_depends() do { } while (0) > #endif > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11/17/2014 12:04 PM, Benjamin Herrenschmidt wrote: > On Mon, 2014-11-17 at 09:18 -0800, Alexander Duyck wrote: >> There are a number of situations where the mandatory barriers rmb() and >> wmb() are used to order memory/memory operations in the device drivers >> and those barriers are much heavier than they actually need to be. For >> example in the case of PowerPC wmb() calls the heavy-weight sync >> instruction when for memory/memory operations all that is really needed is >> an lsync or eieio instruction. > So essentially those are the same as the smp_* variants but not nop'ed > out on !CONFIG_SMP right ? > > Ben. > Yes and no. So for example on ARM I used the dmb() operation, however I have to use the barrier at the system level instead of just the inner shared domain. However on many other architectures they are just the same as the smp_* variants. Basically the resultant code is somewhere between the smp and non-smp barriers in terms of what they cover. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Nov 17, 2014 at 9:18 AM, Alexander Duyck <alexander.h.duyck@redhat.com> wrote: > There are a number of situations where the mandatory barriers rmb() and > wmb() are used to order memory/memory operations in the device drivers > and those barriers are much heavier than they actually need to be. Ugh. I absolutely despise the name. It's not "fast". It's just limited. It's the same as "smp_*mb()", in that it works on cacheable memory, but it actually stays around even for non-SMP builds. So I think the name is actively misleading. Naming should be about what it does, not about some kind of PR thing that confuses people into thinking it's "better". Maybe "dma_*mb()" would be acceptable, and ends up having the same naming convention as "smb_*mb()", and explains what it's about. And yes, in the same spirit, it would probably be good to try to eventually get rid of the plain "*mb()" functions, and perhaps call them "mmio_*mb()" to clarify that they are about ordering memory wrt mmio. Hmm? Linus -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11/17/2014 12:18 PM, Paul E. McKenney wrote: > On Mon, Nov 17, 2014 at 09:18:13AM -0800, Alexander Duyck wrote: >> There are a number of situations where the mandatory barriers rmb() and >> wmb() are used to order memory/memory operations in the device drivers >> and those barriers are much heavier than they actually need to be. For >> example in the case of PowerPC wmb() calls the heavy-weight sync >> instruction when for memory/memory operations all that is really needed is >> an lsync or eieio instruction. > > Is this still the case if one of the memory operations is MMIO? Last > I knew, it was not. This barrier is not meant for use in MMIO operations, for that you still need a full barrier as I call out in the documentation section. What the barrier does is allow for a lightweight barrier for accesses to coherent system memory. So for example many device drivers have to perform a read of the descriptor to see if the device is done with it. We need an rmb() following that check to prevent any other accesses. Right now on x86 that rmb() becomes an lfence instruction and is quite expensive, and as it turns out we don't need it since the x86 doesn't reorder reads. The same kind of thing applies to PowerPC, only in that case we use a sync when what we really need is a lwsync. >> This commit adds a fast (and loose) version of the mandatory memory >> barriers rmb() and wmb(). The prefix to the name is actually based on the >> version of the functions that already exist in the mips and tile trees. >> However I thought it applicable since it gets at what we are trying to >> accomplish with these barriers and somethat implies their risky nature. >> >> These new barriers are not as safe as the standard rmb() and wmb(). >> Specifically they do not guarantee ordering between cache-enabled and >> cache-inhibited memories. The primary use case for these would be to >> enforce ordering of memory reads/writes when accessing cache-enabled memory >> that is shared between the CPU and a device. >> >> It may also be noted that there is no fast_mb(). This is due to the fact >> that most architectures didn't seem to have a good way to do a full memory >> barrier quickly and so they usually resorted to an mb() for their smp_mb >> call. As such there no point in adding a fast_mb() function if it is going >> to map to mb() for all architectures anyway. > > I must confess that I still don't entirely understand the motivation. The motivation is to provide finer grained barriers. So this provides an in-between that allows us to "choose the right hammer". In the case of PowerPC it is the difference between sync/lwsync, on ARM it is dsb()/dmb(), and on x86 it is lfence/barrier(). > Some problems in PowerPC barrier.h called out below. > > Thanx, Paul > <snip> >> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt >> index 22a969c..fe55dea 100644 >> --- a/Documentation/memory-barriers.txt >> +++ b/Documentation/memory-barriers.txt >> @@ -1615,6 +1615,47 @@ There are some more advanced barrier functions: >> operations" subsection for information on where to use these. >> >> >> + (*) fast_wmb(); >> + (*) fast_rmb(); >> + >> + These are for use with memory based device I/O to guarantee the ordering >> + of cache-enabled writes or reads with respect to other writes or reads >> + to cache-enabled memory. >> + >> + For example, consider a device driver that shares memory with a device >> + and uses a descriptor status value to indicate if the descriptor belongs >> + to the device or the CPU, and a doorbell to notify it when new >> + descriptors are available: >> + >> + if (desc->status != DEVICE_OWN) { >> + /* do not read data until we own descriptor */ >> + fast_rmb(); >> + >> + /* read/modify data */ >> + read_data = desc->data; >> + desc->data = write_data; >> + >> + /* flush modifications before status update */ >> + fast_wmb(); >> + >> + /* assign ownership */ >> + desc->status = DEVICE_OWN; >> + >> + /* force memory to sync before notifying device via MMIO */ >> + wmb(); >> + >> + /* notify device of new descriptors */ >> + writel(DESC_NOTIFY, doorbell); >> + } >> + >> + The fast_rmb() allows us guarantee the device has released ownership >> + before we read the data from the descriptor, and he fast_wmb() allows >> + us to guarantee the data is written to the descriptor before the device >> + can see it now has ownership. The wmb() is needed to guarantee that the >> + cache-enabled memory writes have completed before attempting a write to >> + the cache-inhibited MMIO region. >> + >> + >> MMIO WRITE BARRIER >> ------------------ The general idea is that the device/CPU follow acquire/release style semantics and we need the lightweight barriers to enforce ordering to prevent us from accessing the descriptors during those periods where we do not own them. As the example shows we still need a full barrier when going between MMIO and standard memory. Hopefully that is what is conveyed in the documentation bits I have above. <snip> >> diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h >> index cb6d66c..f480097 100644 >> --- a/arch/powerpc/include/asm/barrier.h >> +++ b/arch/powerpc/include/asm/barrier.h >> @@ -36,22 +36,20 @@ >> >> #define set_mb(var, value) do { var = value; mb(); } while (0) >> >> -#ifdef CONFIG_SMP >> - >> #ifdef __SUBARCH_HAS_LWSYNC >> # define SMPWMB LWSYNC >> #else >> # define SMPWMB eieio >> #endif >> >> -#define __lwsync() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory") >> +#define fast_rmb() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory") >> +#define fast_wmb() __asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory") >> >> +#ifdef CONFIG_SMP >> #define smp_mb() mb() >> -#define smp_rmb() __lwsync() >> -#define smp_wmb() __asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory") >> +#define smp_rmb() fast_rmb() >> +#define smp_wmb() fast_wmb() >> #else >> -#define __lwsync() barrier() >> - >> #define smp_mb() barrier() >> #define smp_rmb() barrier() >> #define smp_wmb() barrier() >> @@ -69,10 +67,16 @@ >> #define data_barrier(x) \ >> asm volatile("twi 0,%0,0; isync" : : "r" (x) : "memory"); >> >> +/* >> + * The use of smp_rmb() in these functions are actually meant to map from >> + * smp_rmb()->fast_rmb()->LWSYNC. This way if smp is disabled then >> + * smp_rmb()->barrier(), or if the platform doesn't support lwsync it will >> + * map to the more heavy-weight sync. >> + */ >> #define smp_store_release(p, v) \ >> do { \ >> compiletime_assert_atomic_type(*p); \ >> - __lwsync(); \ >> + smp_rmb(); \ > > This is not good at all. For smp_store_release(), we absolutely > must order prior loads and stores against the assignment on the following > line. This is not something that smp_rmb() does, nor is it something > that smp_wmb() does. Yes, it might happen to now, but this could easily > break in the future -- plus this change is extremely misleading. > > The original __lwsync() is much more clear. The problem I had with __lwsync is that it really wasn't all that clear. It was the lwsync instruction if SMP was enabled, otherwise it was just a barrier call. What I did is move the definition of __lwsync in the SMP case into fast_rmb, which in turn is accessed by smp_rmb. I tried to make this clear in the comment just above the two calls. The resultant assembly code should be exactly the same. What I could do is have it added back as a smp_lwsync if that works for you. That way there is something there to give you a hint that it becomes a barrier() call as soon as SMP is disabled. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11/17/2014 12:52 PM, Linus Torvalds wrote: > On Mon, Nov 17, 2014 at 9:18 AM, Alexander Duyck > <alexander.h.duyck@redhat.com> wrote: >> There are a number of situations where the mandatory barriers rmb() and >> wmb() are used to order memory/memory operations in the device drivers >> and those barriers are much heavier than they actually need to be. > Ugh. I absolutely despise the name. > > It's not "fast". It's just limited. It's the same as "smp_*mb()", in > that it works on cacheable memory, but it actually stays around even > for non-SMP builds. > > So I think the name is actively misleading. > > Naming should be about what it does, not about some kind of PR thing > that confuses people into thinking it's "better". > > Maybe "dma_*mb()" would be acceptable, and ends up having the same > naming convention as "smb_*mb()", and explains what it's about. What would you think of the name "coherent_*mb()"? I would prefer to avoid dma in the name since, at least in my mind, that implies MMIO. It also ties in well with dma_alloc_coherent/dma_free_coherent which is what would typically be used to allocate the memory we would be using the barrier to protect anyway. > And yes, in the same spirit, it would probably be good to try to > eventually get rid of the plain "*mb()" functions, and perhaps call > them "mmio_*mb()" to clarify that they are about ordering memory wrt > mmio. > > Hmm? > > Linus I will work on pulling all of the coherent barrier cases out of using the plain "*mb()" calls first. We need to sort that out before we could look at renaming the plain barrier functions. - Alex -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Nov 17, 2014 at 01:11:57PM -0800, Alexander Duyck wrote: > On 11/17/2014 12:18 PM, Paul E. McKenney wrote: > >On Mon, Nov 17, 2014 at 09:18:13AM -0800, Alexander Duyck wrote: > >>There are a number of situations where the mandatory barriers rmb() and > >>wmb() are used to order memory/memory operations in the device drivers > >>and those barriers are much heavier than they actually need to be. For > >>example in the case of PowerPC wmb() calls the heavy-weight sync > >>instruction when for memory/memory operations all that is really needed is > >>an lsync or eieio instruction. > > > >Is this still the case if one of the memory operations is MMIO? Last > >I knew, it was not. > > This barrier is not meant for use in MMIO operations, for that you > still need a full barrier as I call out in the documentation > section. What the barrier does is allow for a lightweight barrier > for accesses to coherent system memory. So for example many device > drivers have to perform a read of the descriptor to see if the > device is done with it. We need an rmb() following that check to > prevent any other accesses. > > Right now on x86 that rmb() becomes an lfence instruction and is > quite expensive, and as it turns out we don't need it since the x86 > doesn't reorder reads. The same kind of thing applies to PowerPC, > only in that case we use a sync when what we really need is a > lwsync. Would it make sense to have a memory barrier that enforced the non-store-buffer orderings, that is prior reads before later reads and writes and prior writes before later writes? This was discussed earlier this year ((http://lwn.net/Articles/586838/, https://lwn.net/Articles/588300/). If I recall correctly, one of the biggest obstacles was the name. ;-) > >>This commit adds a fast (and loose) version of the mandatory memory > >>barriers rmb() and wmb(). The prefix to the name is actually based on the > >>version of the functions that already exist in the mips and tile trees. > >>However I thought it applicable since it gets at what we are trying to > >>accomplish with these barriers and somethat implies their risky nature. > >> > >>These new barriers are not as safe as the standard rmb() and wmb(). > >>Specifically they do not guarantee ordering between cache-enabled and > >>cache-inhibited memories. The primary use case for these would be to > >>enforce ordering of memory reads/writes when accessing cache-enabled memory > >>that is shared between the CPU and a device. > >> > >>It may also be noted that there is no fast_mb(). This is due to the fact > >>that most architectures didn't seem to have a good way to do a full memory > >>barrier quickly and so they usually resorted to an mb() for their smp_mb > >>call. As such there no point in adding a fast_mb() function if it is going > >>to map to mb() for all architectures anyway. > > > >I must confess that I still don't entirely understand the motivation. > > The motivation is to provide finer grained barriers. So this > provides an in-between that allows us to "choose the right hammer". > In the case of PowerPC it is the difference between sync/lwsync, on > ARM it is dsb()/dmb(), and on x86 it is lfence/barrier(). Ah, so ARM will motivate a fast_wmb(), given its instruction set. > >Some problems in PowerPC barrier.h called out below. > > > > Thanx, Paul > > > > <snip> > > >>diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt > >>index 22a969c..fe55dea 100644 > >>--- a/Documentation/memory-barriers.txt > >>+++ b/Documentation/memory-barriers.txt > >>@@ -1615,6 +1615,47 @@ There are some more advanced barrier functions: > >> operations" subsection for information on where to use these. > >> > >> > >>+ (*) fast_wmb(); > >>+ (*) fast_rmb(); > >>+ > >>+ These are for use with memory based device I/O to guarantee the ordering > >>+ of cache-enabled writes or reads with respect to other writes or reads > >>+ to cache-enabled memory. > >>+ > >>+ For example, consider a device driver that shares memory with a device > >>+ and uses a descriptor status value to indicate if the descriptor belongs > >>+ to the device or the CPU, and a doorbell to notify it when new > >>+ descriptors are available: > >>+ > >>+ if (desc->status != DEVICE_OWN) { > >>+ /* do not read data until we own descriptor */ > >>+ fast_rmb(); > >>+ > >>+ /* read/modify data */ > >>+ read_data = desc->data; > >>+ desc->data = write_data; > >>+ > >>+ /* flush modifications before status update */ > >>+ fast_wmb(); > >>+ > >>+ /* assign ownership */ > >>+ desc->status = DEVICE_OWN; > >>+ > >>+ /* force memory to sync before notifying device via MMIO */ > >>+ wmb(); > >>+ > >>+ /* notify device of new descriptors */ > >>+ writel(DESC_NOTIFY, doorbell); > >>+ } > >>+ > >>+ The fast_rmb() allows us guarantee the device has released ownership > >>+ before we read the data from the descriptor, and he fast_wmb() allows > >>+ us to guarantee the data is written to the descriptor before the device > >>+ can see it now has ownership. The wmb() is needed to guarantee that the > >>+ cache-enabled memory writes have completed before attempting a write to > >>+ the cache-inhibited MMIO region. > >>+ > >>+ > >> MMIO WRITE BARRIER > >> ------------------ > > The general idea is that the device/CPU follow acquire/release style > semantics and we need the lightweight barriers to enforce ordering > to prevent us from accessing the descriptors during those periods > where we do not own them. As the example shows we still need a full > barrier when going between MMIO and standard memory. Hopefully that > is what is conveyed in the documentation bits I have above. Thank you for the clarification. > <snip> > > >>diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h > >>index cb6d66c..f480097 100644 > >>--- a/arch/powerpc/include/asm/barrier.h > >>+++ b/arch/powerpc/include/asm/barrier.h > >>@@ -36,22 +36,20 @@ > >> > >> #define set_mb(var, value) do { var = value; mb(); } while (0) > >> > >>-#ifdef CONFIG_SMP > >>- > >> #ifdef __SUBARCH_HAS_LWSYNC > >> # define SMPWMB LWSYNC > >> #else > >> # define SMPWMB eieio > >> #endif > >> > >>-#define __lwsync() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory") > >>+#define fast_rmb() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory") > >>+#define fast_wmb() __asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory") > >> > >>+#ifdef CONFIG_SMP > >> #define smp_mb() mb() > >>-#define smp_rmb() __lwsync() > >>-#define smp_wmb() __asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory") > >>+#define smp_rmb() fast_rmb() > >>+#define smp_wmb() fast_wmb() > >> #else > >>-#define __lwsync() barrier() > >>- > >> #define smp_mb() barrier() > >> #define smp_rmb() barrier() > >> #define smp_wmb() barrier() > >>@@ -69,10 +67,16 @@ > >> #define data_barrier(x) \ > >> asm volatile("twi 0,%0,0; isync" : : "r" (x) : "memory"); > >> > >>+/* > >>+ * The use of smp_rmb() in these functions are actually meant to map from > >>+ * smp_rmb()->fast_rmb()->LWSYNC. This way if smp is disabled then > >>+ * smp_rmb()->barrier(), or if the platform doesn't support lwsync it will > >>+ * map to the more heavy-weight sync. > >>+ */ > >> #define smp_store_release(p, v) \ > >> do { \ > >> compiletime_assert_atomic_type(*p); \ > >>- __lwsync(); \ > >>+ smp_rmb(); \ > > > >This is not good at all. For smp_store_release(), we absolutely > >must order prior loads and stores against the assignment on the following > >line. This is not something that smp_rmb() does, nor is it something > >that smp_wmb() does. Yes, it might happen to now, but this could easily > >break in the future -- plus this change is extremely misleading. > > > >The original __lwsync() is much more clear. > > The problem I had with __lwsync is that it really wasn't all that > clear. It was the lwsync instruction if SMP was enabled, otherwise > it was just a barrier call. What I did is move the definition of > __lwsync in the SMP case into fast_rmb, which in turn is accessed by > smp_rmb. I tried to make this clear in the comment just above the > two calls. The resultant assembly code should be exactly the same. > > What I could do is have it added back as a smp_lwsync if that works > for you. That way there is something there to give you a hint that > it becomes a barrier() call as soon as SMP is disabled. An smp_lwsync() would be a great improvement! Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2014-11-17 at 12:18 -0800, Paul E. McKenney wrote: > On Mon, Nov 17, 2014 at 09:18:13AM -0800, Alexander Duyck wrote: > > There are a number of situations where the mandatory barriers rmb() and > > wmb() are used to order memory/memory operations in the device drivers > > and those barriers are much heavier than they actually need to be. For > > example in the case of PowerPC wmb() calls the heavy-weight sync > > instruction when for memory/memory operations all that is really needed is > > an lsync or eieio instruction. > > Is this still the case if one of the memory operations is MMIO? Last > I knew, it was not. I *think* (Alexander, correct me if I'm wrong), that what he wants is the memory<->memory barriers (the smp_* ones) basically for ordering his loads or stores from/to the DMA area. The problem is that the smp_* ones aren't compiled for !CONFIG_SMP IE. Something like: - Read valid bit from descriptor - Read rest of descriptor That needs an rmb of some sort in between, but a full blown "rmb" will also order vs. MMIOs and end up being a full sync, while an smp_rmb is a lwsync which is more lightweight. Similarily: - Populate descriptor - Write valid bit Same deal with wmb ... Basically, rmb and wmb order both cachable and non-cachable (memory and MMIO) which makes them needlessly heavy on powerpc and possibly others when all you need is to order memory accesses to some DMA data structures. In that case you really want the normal smp_* variants except they may not be around... Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2014-11-17 at 12:24 -0800, Alexander Duyck wrote: > Yes and no. So for example on ARM I used the dmb() operation, however > I > have to use the barrier at the system level instead of just the inner > shared domain. However on many other architectures they are just the > same as the smp_* variants. > > Basically the resultant code is somewhere between the smp and non-smp > barriers in terms of what they cover. There I don't quite follow you. You need to explain better especially in the documentation because otherwise people will get it wrong... If it's ordering in the coherent domain, I fail to see how a DMA agent is different than another processor when it comes to barriers, so I fail to see the difference with smp_* I understand the MMIO vs. memory issue, we do have the same on powerpc, but that other aspect eludes me. Ben. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2014-11-17 at 12:52 -0800, Linus Torvalds wrote: > Maybe "dma_*mb()" would be acceptable, and ends up having the same > naming convention as "smb_*mb()", and explains what it's about. Yes, that was what I was about to suggest as well. > And yes, in the same spirit, it would probably be good to try to > eventually get rid of the plain "*mb()" functions, and perhaps call > them "mmio_*mb()" to clarify that they are about ordering memory wrt > mmio. It will always be somewhat unclear to users who don't read the doc anyway :) IE. the dma_* ones do only DMA vs DMA (or vs other processors) but the mmio_* one do anything vs anything. Not a huge deal tho. I still like dma_* for Alexander's new stuff but I wouldn't bother with changing the existing ones. Ben. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2014-11-17 at 13:54 -0800, Alexander Duyck wrote: > What would you think of the name "coherent_*mb()"? I would prefer to > avoid dma in the name since, at least in my mind, that implies MMIO. I'm lazy, I like typing less, so I like dma_* but I don't object to coherent_* if at least one more person thinks it makes the semantics clearer :) > It also ties in well with dma_alloc_coherent/dma_free_coherent which is > what would typically be used to allocate the memory we would be using > the barrier to protect anyway. Agreed. > > And yes, in the same spirit, it would probably be good to try to > > eventually get rid of the plain "*mb()" functions, and perhaps call > > them "mmio_*mb()" to clarify that they are about ordering memory wrt > > mmio. > > > > Hmm? > > > > Linus > > I will work on pulling all of the coherent barrier cases out of using > the plain "*mb()" calls first. We need to sort that out before we could > look at renaming the plain barrier functions. Makes sense. Cheers, Ben. > - Alex > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11/17/2014 04:39 PM, Benjamin Herrenschmidt wrote: > On Mon, 2014-11-17 at 12:24 -0800, Alexander Duyck wrote: >> Yes and no. So for example on ARM I used the dmb() operation, however >> I >> have to use the barrier at the system level instead of just the inner >> shared domain. However on many other architectures they are just the >> same as the smp_* variants. >> >> Basically the resultant code is somewhere between the smp and non-smp >> barriers in terms of what they cover. > There I don't quite follow you. You need to explain better especially in > the documentation because otherwise people will get it wrong... > > If it's ordering in the coherent domain, I fail to see how a DMA agent > is different than another processor when it comes to barriers, so I fail > to see the difference with smp_* > > I understand the MMIO vs. memory issue, we do have the same on powerpc, > but that other aspect eludes me. > > Ben. ARM adds some funky things. They have two different types of primitives, a dmb() which is a data memory barrier, and a dsb() which is a data synchronization barrier. Then with each of those they have the "domains" the barriers are effective within. So for example on ARM a rmb() is dsb(sy) which means it is a system wide synchronization barrier which stops execution on the CPU core until the read completes. However the smp_rmb() is a dmb(ish) which means it is only a barrier as far as the inner shareable domain which I believe only goes as far as the local shared cache hierarchy and only guarantees read ordering without necessarily halting the CPU or stopping in-order speculative reads. So what a coherent_rmb() would be in my setup is dmb(sy) which means the barrier runs all the way out to memory, and it is allowed to speculative read as long as it does it in order. If it is still unclear you might check out Will Deacon's talk on the topic at https://www.youtube.com/watch?v=6ORn6_35kKo, at about 7:00 in he explains the whole domains thing, and at 13:30 he explains dmb()/dsb(). - Alex -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11/17/2014 03:17 PM, Paul E. McKenney wrote: > On Mon, Nov 17, 2014 at 01:11:57PM -0800, Alexander Duyck wrote: >> On 11/17/2014 12:18 PM, Paul E. McKenney wrote: >>> On Mon, Nov 17, 2014 at 09:18:13AM -0800, Alexander Duyck wrote: >>>> There are a number of situations where the mandatory barriers rmb() and >>>> wmb() are used to order memory/memory operations in the device drivers >>>> and those barriers are much heavier than they actually need to be. For >>>> example in the case of PowerPC wmb() calls the heavy-weight sync >>>> instruction when for memory/memory operations all that is really needed is >>>> an lsync or eieio instruction. >>> Is this still the case if one of the memory operations is MMIO? Last >>> I knew, it was not. >> This barrier is not meant for use in MMIO operations, for that you >> still need a full barrier as I call out in the documentation >> section. What the barrier does is allow for a lightweight barrier >> for accesses to coherent system memory. So for example many device >> drivers have to perform a read of the descriptor to see if the >> device is done with it. We need an rmb() following that check to >> prevent any other accesses. >> >> Right now on x86 that rmb() becomes an lfence instruction and is >> quite expensive, and as it turns out we don't need it since the x86 >> doesn't reorder reads. The same kind of thing applies to PowerPC, >> only in that case we use a sync when what we really need is a >> lwsync. > Would it make sense to have a memory barrier that enforced the > non-store-buffer orderings, that is prior reads before later > reads and writes and prior writes before later writes? This was > discussed earlier this year ((http://lwn.net/Articles/586838/, > https://lwn.net/Articles/588300/). If I recall correctly, one of > the biggest obstacles was the name. ;-) You''re talking bout acquire and release barriers, or something else? For most devices the two barriers I have defined should do the job, I had tried doing load_acquire/store_release type functions in the previous path set and that was shot down as the preference seemed to be for barriers instead to remove some of the abstraction as to what was occurring. >>>> This commit adds a fast (and loose) version of the mandatory memory >>>> barriers rmb() and wmb(). The prefix to the name is actually based on the >>>> version of the functions that already exist in the mips and tile trees. >>>> However I thought it applicable since it gets at what we are trying to >>>> accomplish with these barriers and somethat implies their risky nature. >>>> >>>> These new barriers are not as safe as the standard rmb() and wmb(). >>>> Specifically they do not guarantee ordering between cache-enabled and >>>> cache-inhibited memories. The primary use case for these would be to >>>> enforce ordering of memory reads/writes when accessing cache-enabled memory >>>> that is shared between the CPU and a device. >>>> >>>> It may also be noted that there is no fast_mb(). This is due to the fact >>>> that most architectures didn't seem to have a good way to do a full memory >>>> barrier quickly and so they usually resorted to an mb() for their smp_mb >>>> call. As such there no point in adding a fast_mb() function if it is going >>>> to map to mb() for all architectures anyway. >>> I must confess that I still don't entirely understand the motivation. >> The motivation is to provide finer grained barriers. So this >> provides an in-between that allows us to "choose the right hammer". >> In the case of PowerPC it is the difference between sync/lwsync, on >> ARM it is dsb()/dmb(), and on x86 it is lfence/barrier(). > Ah, so ARM will motivate a fast_wmb(), given its instruction set. Actually it was x86 that started this, lfence or sfence is much more expensive then just barrier(). From there I realized we had issues in PowerPC as well with sync vs lwsync, and ARM with dsb() vs dmb(). >> <snip> >> >>>> diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h >>>> index cb6d66c..f480097 100644 >>>> --- a/arch/powerpc/include/asm/barrier.h >>>> +++ b/arch/powerpc/include/asm/barrier.h >>>> @@ -36,22 +36,20 @@ >>>> >>>> #define set_mb(var, value) do { var = value; mb(); } while (0) >>>> >>>> -#ifdef CONFIG_SMP >>>> - >>>> #ifdef __SUBARCH_HAS_LWSYNC >>>> # define SMPWMB LWSYNC >>>> #else >>>> # define SMPWMB eieio >>>> #endif >>>> >>>> -#define __lwsync() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory") >>>> +#define fast_rmb() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory") >>>> +#define fast_wmb() __asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory") >>>> >>>> +#ifdef CONFIG_SMP >>>> #define smp_mb() mb() >>>> -#define smp_rmb() __lwsync() >>>> -#define smp_wmb() __asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory") >>>> +#define smp_rmb() fast_rmb() >>>> +#define smp_wmb() fast_wmb() >>>> #else >>>> -#define __lwsync() barrier() >>>> - >>>> #define smp_mb() barrier() >>>> #define smp_rmb() barrier() >>>> #define smp_wmb() barrier() >>>> @@ -69,10 +67,16 @@ >>>> #define data_barrier(x) \ >>>> asm volatile("twi 0,%0,0; isync" : : "r" (x) : "memory"); >>>> >>>> +/* >>>> + * The use of smp_rmb() in these functions are actually meant to map from >>>> + * smp_rmb()->fast_rmb()->LWSYNC. This way if smp is disabled then >>>> + * smp_rmb()->barrier(), or if the platform doesn't support lwsync it will >>>> + * map to the more heavy-weight sync. >>>> + */ >>>> #define smp_store_release(p, v) \ >>>> do { \ >>>> compiletime_assert_atomic_type(*p); \ >>>> - __lwsync(); \ >>>> + smp_rmb(); \ >>> This is not good at all. For smp_store_release(), we absolutely >>> must order prior loads and stores against the assignment on the following >>> line. This is not something that smp_rmb() does, nor is it something >>> that smp_wmb() does. Yes, it might happen to now, but this could easily >>> break in the future -- plus this change is extremely misleading. >>> >>> The original __lwsync() is much more clear. >> The problem I had with __lwsync is that it really wasn't all that >> clear. It was the lwsync instruction if SMP was enabled, otherwise >> it was just a barrier call. What I did is move the definition of >> __lwsync in the SMP case into fast_rmb, which in turn is accessed by >> smp_rmb. I tried to make this clear in the comment just above the >> two calls. The resultant assembly code should be exactly the same. >> >> What I could do is have it added back as a smp_lwsync if that works >> for you. That way there is something there to give you a hint that >> it becomes a barrier() call as soon as SMP is disabled. > An smp_lwsync() would be a great improvement! > > Thanx, Paul > Okay, that will be in the next patch then. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 18, 2014 at 03:13:29AM +0000, Alexander Duyck wrote: > On 11/17/2014 04:39 PM, Benjamin Herrenschmidt wrote: > > On Mon, 2014-11-17 at 12:24 -0800, Alexander Duyck wrote: > >> Yes and no. So for example on ARM I used the dmb() operation, however > >> I > >> have to use the barrier at the system level instead of just the inner > >> shared domain. However on many other architectures they are just the > >> same as the smp_* variants. > >> > >> Basically the resultant code is somewhere between the smp and non-smp > >> barriers in terms of what they cover. > > There I don't quite follow you. You need to explain better especially in > > the documentation because otherwise people will get it wrong... > > > > If it's ordering in the coherent domain, I fail to see how a DMA agent > > is different than another processor when it comes to barriers, so I fail > > to see the difference with smp_* > > > > I understand the MMIO vs. memory issue, we do have the same on powerpc, > > but that other aspect eludes me. > > > > ARM adds some funky things. They have two different types of > primitives, a dmb() which is a data memory barrier, and a dsb() which is > a data synchronization barrier. Then with each of those they have the > "domains" the barriers are effective within. > > So for example on ARM a rmb() is dsb(sy) which means it is a system wide > synchronization barrier which stops execution on the CPU core until the > read completes. However the smp_rmb() is a dmb(ish) which means it is > only a barrier as far as the inner shareable domain which I believe only > goes as far as the local shared cache hierarchy and only guarantees read > ordering without necessarily halting the CPU or stopping in-order > speculative reads. So what a coherent_rmb() would be in my setup is > dmb(sy) which means the barrier runs all the way out to memory, and it > is allowed to speculative read as long as it does it in order. > > If it is still unclear you might check out Will Deacon's talk on the > topic at https://www.youtube.com/watch?v=6ORn6_35kKo, at about 7:00 in > he explains the whole domains thing, and at 13:30 he explains dmb()/dsb(). So actually, this is an interesting case where the barrier would like to know whether the memory returned by dma_alloc_coherent is h/w coherent (normal, cacheable) or s/w coherent (normal, non-cacheable). I think Ben is thinking of the h/w coherent case (i.e. actual snooping into the CPU caches by the DMA master). For the former, we could use inner-shareable barriers. For the latter, we'd need to use outer-shareable barriers. If we can't tell, then these should be dmb(osh), which will work for both. Will -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11/18/2014 03:58 AM, Will Deacon wrote: > On Tue, Nov 18, 2014 at 03:13:29AM +0000, Alexander Duyck wrote: >> On 11/17/2014 04:39 PM, Benjamin Herrenschmidt wrote: >>> On Mon, 2014-11-17 at 12:24 -0800, Alexander Duyck wrote: >>>> Yes and no. So for example on ARM I used the dmb() operation, however >>>> I >>>> have to use the barrier at the system level instead of just the inner >>>> shared domain. However on many other architectures they are just the >>>> same as the smp_* variants. >>>> >>>> Basically the resultant code is somewhere between the smp and non-smp >>>> barriers in terms of what they cover. >>> There I don't quite follow you. You need to explain better especially in >>> the documentation because otherwise people will get it wrong... >>> >>> If it's ordering in the coherent domain, I fail to see how a DMA agent >>> is different than another processor when it comes to barriers, so I fail >>> to see the difference with smp_* >>> >>> I understand the MMIO vs. memory issue, we do have the same on powerpc, >>> but that other aspect eludes me. >>> >> ARM adds some funky things. They have two different types of >> primitives, a dmb() which is a data memory barrier, and a dsb() which is >> a data synchronization barrier. Then with each of those they have the >> "domains" the barriers are effective within. >> >> So for example on ARM a rmb() is dsb(sy) which means it is a system wide >> synchronization barrier which stops execution on the CPU core until the >> read completes. However the smp_rmb() is a dmb(ish) which means it is >> only a barrier as far as the inner shareable domain which I believe only >> goes as far as the local shared cache hierarchy and only guarantees read >> ordering without necessarily halting the CPU or stopping in-order >> speculative reads. So what a coherent_rmb() would be in my setup is >> dmb(sy) which means the barrier runs all the way out to memory, and it >> is allowed to speculative read as long as it does it in order. >> >> If it is still unclear you might check out Will Deacon's talk on the >> topic at https://www.youtube.com/watch?v=6ORn6_35kKo, at about 7:00 in >> he explains the whole domains thing, and at 13:30 he explains dmb()/dsb(). > So actually, this is an interesting case where the barrier would like to > know whether the memory returned by dma_alloc_coherent is h/w coherent > (normal, cacheable) or s/w coherent (normal, non-cacheable). I think Ben > is thinking of the h/w coherent case (i.e. actual snooping into the CPU > caches by the DMA master). > > For the former, we could use inner-shareable barriers. For the latter, we'd > need to use outer-shareable barriers. > > If we can't tell, then these should be dmb(osh), which will work for both. > > Will Okay, so I will update the ARM portion of my patches to use osh and oshst then since it sounds like I was using too strong of barriers. - Alex -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 18, 2014 at 04:20:46PM +0000, Alexander Duyck wrote: > On 11/18/2014 03:58 AM, Will Deacon wrote: > > So actually, this is an interesting case where the barrier would like to > > know whether the memory returned by dma_alloc_coherent is h/w coherent > > (normal, cacheable) or s/w coherent (normal, non-cacheable). I think Ben > > is thinking of the h/w coherent case (i.e. actual snooping into the CPU > > caches by the DMA master). > > > > For the former, we could use inner-shareable barriers. For the latter, we'd > > need to use outer-shareable barriers. > > > > If we can't tell, then these should be dmb(osh), which will work for both. > > > > Okay, so I will update the ARM portion of my patches to use osh and > oshst then since it sounds like I was using too strong of barriers. Sounds good. Another reason this is interesting is because the native acquire/release instructions on ARMv8 actually take into account the shareability domain of the virtual address, so using them would give you the shareability domain you want but slightly stronger ordering guarantees within that domain. Still, either of them will be a damn sight better than the dsb we currently have courtesy of the mandatory barriers. Will -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2014-11-17 at 19:13 -0800, Alexander Duyck wrote: > > ARM adds some funky things. They have two different types of > primitives, a dmb() which is a data memory barrier, and a dsb() which is > a data synchronization barrier. Then with each of those they have the > "domains" the barriers are effective within. > > So for example on ARM a rmb() is dsb(sy) which means it is a system wide > synchronization barrier which stops execution on the CPU core until the > read completes. That's amazingly heavy handed ... I can see that being useful for MMIO, we do something similar in our MMIO accessors by using a special variant of trap instruction that never traps to make the core thing the load value has been consumed. But that's typically only needed to guarantee MMIO timings. > However the smp_rmb() is a dmb(ish) which means it is > only a barrier as far as the inner shareable domain which I believe only > goes as far as the local shared cache hierarchy and only guarantees read > ordering without necessarily halting the CPU or stopping in-order > speculative reads. So what a coherent_rmb() would be in my setup is > dmb(sy) which means the barrier runs all the way out to memory, and it > is allowed to speculative read as long as it does it in order. Correct, which is thus the same as smp_rmb() ... which was my original point, or am I missing something else ? > If it is still unclear you might check out Will Deacon's talk on the > topic at https://www.youtube.com/watch?v=6ORn6_35kKo, at about 7:00 in > he explains the whole domains thing, and at 13:30 he explains dmb()/dsb(). Ok, I'll try to watch that when I get a chance. Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index 22a969c..fe55dea 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -1615,6 +1615,47 @@ There are some more advanced barrier functions: operations" subsection for information on where to use these. + (*) fast_wmb(); + (*) fast_rmb(); + + These are for use with memory based device I/O to guarantee the ordering + of cache-enabled writes or reads with respect to other writes or reads + to cache-enabled memory. + + For example, consider a device driver that shares memory with a device + and uses a descriptor status value to indicate if the descriptor belongs + to the device or the CPU, and a doorbell to notify it when new + descriptors are available: + + if (desc->status != DEVICE_OWN) { + /* do not read data until we own descriptor */ + fast_rmb(); + + /* read/modify data */ + read_data = desc->data; + desc->data = write_data; + + /* flush modifications before status update */ + fast_wmb(); + + /* assign ownership */ + desc->status = DEVICE_OWN; + + /* force memory to sync before notifying device via MMIO */ + wmb(); + + /* notify device of new descriptors */ + writel(DESC_NOTIFY, doorbell); + } + + The fast_rmb() allows us guarantee the device has released ownership + before we read the data from the descriptor, and he fast_wmb() allows + us to guarantee the data is written to the descriptor before the device + can see it now has ownership. The wmb() is needed to guarantee that the + cache-enabled memory writes have completed before attempting a write to + the cache-inhibited MMIO region. + + MMIO WRITE BARRIER ------------------ diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h index c6a3e73..c57903c 100644 --- a/arch/arm/include/asm/barrier.h +++ b/arch/arm/include/asm/barrier.h @@ -43,10 +43,14 @@ #define mb() do { dsb(); outer_sync(); } while (0) #define rmb() dsb() #define wmb() do { dsb(st); outer_sync(); } while (0) +#define fast_rmb() dmb() +#define fast_wmb() dmb(st) #else #define mb() barrier() #define rmb() barrier() #define wmb() barrier() +#define fast_rmb() barrier() +#define fast_wmb() barrier() #endif #ifndef CONFIG_SMP diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h index 6389d60..3ca1a15 100644 --- a/arch/arm64/include/asm/barrier.h +++ b/arch/arm64/include/asm/barrier.h @@ -32,6 +32,9 @@ #define rmb() dsb(ld) #define wmb() dsb(st) +#define fast_rmb() dmb(ld) +#define fast_wmb() dmb(st) + #ifndef CONFIG_SMP #define smp_mb() barrier() #define smp_rmb() barrier() diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h index e8fffb0..997f5b0 100644 --- a/arch/ia64/include/asm/barrier.h +++ b/arch/ia64/include/asm/barrier.h @@ -39,6 +39,9 @@ #define rmb() mb() #define wmb() mb() +#define fast_rmb() mb() +#define fast_wmb() mb() + #ifdef CONFIG_SMP # define smp_mb() mb() #else diff --git a/arch/metag/include/asm/barrier.h b/arch/metag/include/asm/barrier.h index 6d8b8c9..edddb6d 100644 --- a/arch/metag/include/asm/barrier.h +++ b/arch/metag/include/asm/barrier.h @@ -4,8 +4,6 @@ #include <asm/metag_mem.h> #define nop() asm volatile ("NOP") -#define mb() wmb() -#define rmb() barrier() #ifdef CONFIG_METAG_META21 @@ -41,11 +39,13 @@ static inline void wr_fence(void) #endif /* !CONFIG_METAG_META21 */ -static inline void wmb(void) -{ - /* flush writes through the write combiner */ - wr_fence(); -} +/* flush writes through the write combiner */ +#define mb() wr_fence() +#define rmb() barrier() +#define wmb() mb() + +#define fast_rmb() rmb() +#define fast_wmb() wmb() #ifndef CONFIG_SMP #define fence() do { } while (0) diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h index cb6d66c..f480097 100644 --- a/arch/powerpc/include/asm/barrier.h +++ b/arch/powerpc/include/asm/barrier.h @@ -36,22 +36,20 @@ #define set_mb(var, value) do { var = value; mb(); } while (0) -#ifdef CONFIG_SMP - #ifdef __SUBARCH_HAS_LWSYNC # define SMPWMB LWSYNC #else # define SMPWMB eieio #endif -#define __lwsync() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory") +#define fast_rmb() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory") +#define fast_wmb() __asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory") +#ifdef CONFIG_SMP #define smp_mb() mb() -#define smp_rmb() __lwsync() -#define smp_wmb() __asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory") +#define smp_rmb() fast_rmb() +#define smp_wmb() fast_wmb() #else -#define __lwsync() barrier() - #define smp_mb() barrier() #define smp_rmb() barrier() #define smp_wmb() barrier() @@ -69,10 +67,16 @@ #define data_barrier(x) \ asm volatile("twi 0,%0,0; isync" : : "r" (x) : "memory"); +/* + * The use of smp_rmb() in these functions are actually meant to map from + * smp_rmb()->fast_rmb()->LWSYNC. This way if smp is disabled then + * smp_rmb()->barrier(), or if the platform doesn't support lwsync it will + * map to the more heavy-weight sync. + */ #define smp_store_release(p, v) \ do { \ compiletime_assert_atomic_type(*p); \ - __lwsync(); \ + smp_rmb(); \ ACCESS_ONCE(*p) = (v); \ } while (0) @@ -80,7 +84,7 @@ do { \ ({ \ typeof(*p) ___p1 = ACCESS_ONCE(*p); \ compiletime_assert_atomic_type(*p); \ - __lwsync(); \ + smp_rmb(); \ ___p1; \ }) diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h index 33d191d..9a31301 100644 --- a/arch/s390/include/asm/barrier.h +++ b/arch/s390/include/asm/barrier.h @@ -24,6 +24,8 @@ #define rmb() mb() #define wmb() mb() +#define fast_rmb() rmb() +#define fast_wmb() wmb() #define smp_mb() mb() #define smp_rmb() rmb() #define smp_wmb() wmb() diff --git a/arch/sparc/include/asm/barrier_64.h b/arch/sparc/include/asm/barrier_64.h index 6c974c0..eac2777 100644 --- a/arch/sparc/include/asm/barrier_64.h +++ b/arch/sparc/include/asm/barrier_64.h @@ -37,6 +37,9 @@ do { __asm__ __volatile__("ba,pt %%xcc, 1f\n\t" \ #define rmb() __asm__ __volatile__("":::"memory") #define wmb() __asm__ __volatile__("":::"memory") +#define fast_rmb() rmb() +#define fast_wmb() wmb() + #define set_mb(__var, __value) \ do { __var = __value; membar_safe("#StoreLoad"); } while(0) diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h index 5238000..cf440e7 100644 --- a/arch/x86/include/asm/barrier.h +++ b/arch/x86/include/asm/barrier.h @@ -24,13 +24,16 @@ #define wmb() asm volatile("sfence" ::: "memory") #endif -#ifdef CONFIG_SMP -#define smp_mb() mb() #ifdef CONFIG_X86_PPRO_FENCE -# define smp_rmb() rmb() +#define fast_rmb() rmb() #else -# define smp_rmb() barrier() +#define fast_rmb() barrier() #endif +#define fast_wmb() barrier() + +#ifdef CONFIG_SMP +#define smp_mb() mb() +#define smp_rmb() fast_rmb() #define smp_wmb() barrier() #define set_mb(var, value) do { (void)xchg(&var, value); } while (0) #else /* !SMP */ diff --git a/arch/x86/um/asm/barrier.h b/arch/x86/um/asm/barrier.h index d6511d9..2c0405d 100644 --- a/arch/x86/um/asm/barrier.h +++ b/arch/x86/um/asm/barrier.h @@ -29,17 +29,18 @@ #endif /* CONFIG_X86_32 */ -#ifdef CONFIG_SMP - -#define smp_mb() mb() #ifdef CONFIG_X86_PPRO_FENCE -#define smp_rmb() rmb() +#define fast_rmb() rmb() #else /* CONFIG_X86_PPRO_FENCE */ -#define smp_rmb() barrier() +#define fast_rmb() barrier() #endif /* CONFIG_X86_PPRO_FENCE */ +#define fast_wmb() barrier() -#define smp_wmb() barrier() +#ifdef CONFIG_SMP +#define smp_mb() mb() +#define smp_rmb() fast_rmb() +#define smp_wmb() fast_wmb() #define set_mb(var, value) do { (void)xchg(&var, value); } while (0) #else /* CONFIG_SMP */ diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h index 1402fa8..c1d60b9 100644 --- a/include/asm-generic/barrier.h +++ b/include/asm-generic/barrier.h @@ -42,6 +42,14 @@ #define wmb() mb() #endif +#ifndef fast_rmb +#define fast_rmb() rmb() +#endif + +#ifndef fast_wmb +#define fast_wmb() wmb() +#endif + #ifndef read_barrier_depends #define read_barrier_depends() do { } while (0) #endif
There are a number of situations where the mandatory barriers rmb() and wmb() are used to order memory/memory operations in the device drivers and those barriers are much heavier than they actually need to be. For example in the case of PowerPC wmb() calls the heavy-weight sync instruction when for memory/memory operations all that is really needed is an lsync or eieio instruction. This commit adds a fast (and loose) version of the mandatory memory barriers rmb() and wmb(). The prefix to the name is actually based on the version of the functions that already exist in the mips and tile trees. However I thought it applicable since it gets at what we are trying to accomplish with these barriers and somethat implies their risky nature. These new barriers are not as safe as the standard rmb() and wmb(). Specifically they do not guarantee ordering between cache-enabled and cache-inhibited memories. The primary use case for these would be to enforce ordering of memory reads/writes when accessing cache-enabled memory that is shared between the CPU and a device. It may also be noted that there is no fast_mb(). This is due to the fact that most architectures didn't seem to have a good way to do a full memory barrier quickly and so they usually resorted to an mb() for their smp_mb call. As such there no point in adding a fast_mb() function if it is going to map to mb() for all architectures anyway. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Michael Neuling <mikey@neuling.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: David Miller <davem@davemloft.net> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> --- Documentation/memory-barriers.txt | 41 +++++++++++++++++++++++++++++++++++ arch/arm/include/asm/barrier.h | 4 +++ arch/arm64/include/asm/barrier.h | 3 +++ arch/ia64/include/asm/barrier.h | 3 +++ arch/metag/include/asm/barrier.h | 14 ++++++------ arch/powerpc/include/asm/barrier.h | 22 +++++++++++-------- arch/s390/include/asm/barrier.h | 2 ++ arch/sparc/include/asm/barrier_64.h | 3 +++ arch/x86/include/asm/barrier.h | 11 ++++++--- arch/x86/um/asm/barrier.h | 13 ++++++----- include/asm-generic/barrier.h | 8 +++++++ 11 files changed, 98 insertions(+), 26 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html