From patchwork Wed Nov 12 03:51:18 2008
X-Patchwork-Submitter: Nick Piggin
X-Patchwork-Id: 8303
X-Patchwork-Delegate: paulus@samba.org
Date: Wed, 12 Nov 2008 04:51:18 +0100
From: Nick Piggin
To: Paul Mackerras, linuxppc-dev@ozlabs.org
Subject: [patch 2/3] powerpc: optimise smp_rmb
Message-ID: <20081112035118.GG26053@wotan.suse.de>
References: <20081112035048.GF26053@wotan.suse.de>
In-Reply-To: <20081112035048.GF26053@wotan.suse.de>
List-Id: Linux on PowerPC Developers Mail List

After commit 598056d5af8fef1dbe8f96f5c2b641a528184e5a, rmb() becomes a sync
instruction, which is needed to order cacheable vs noncacheable loads.
However smp_rmb() is #defined to rmb(), and smp_rmb() can be an lwsync.

Restore smp_rmb() performance by using lwsync there. Update comments.

Signed-off-by: Nick Piggin

Index: linux-2.6/arch/powerpc/include/asm/system.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/system.h	2008-11-12 12:28:57.000000000 +1100
+++ linux-2.6/arch/powerpc/include/asm/system.h	2008-11-12 12:35:12.000000000 +1100
@@ -23,15 +23,17 @@
  * read_barrier_depends() prevents data-dependent loads being reordered
  * across this point (nop on PPC).
  *
- * We have to use the sync instructions for mb(), since lwsync doesn't
- * order loads with respect to previous stores. Lwsync is fine for
- * rmb(), though. Note that rmb() actually uses a sync on 32-bit
- * architectures.
+ * *mb() variants without smp_ prefix must order all types of memory
+ * operations with one another. sync is the only instruction sufficient
+ * to do this.
  *
- * For wmb(), we use sync since wmb is used in drivers to order
- * stores to system memory with respect to writes to the device.
- * However, smp_wmb() can be a lighter-weight lwsync or eieio barrier
- * on SMP since it is only used to order updates to system memory.
+ * For the smp_ barriers, ordering is for cacheable memory operations
+ * only. We have to use the sync instruction for smp_mb(), since lwsync
+ * doesn't order loads with respect to previous stores. Lwsync can be
+ * used for smp_rmb() and smp_wmb().
+ *
+ * However, on CPUs that don't support lwsync, lwsync actually maps to a
+ * heavy-weight sync, so smp_wmb() can be a lighter-weight eieio.
  */
 #define mb()   __asm__ __volatile__ ("sync" : : : "memory")
 #define rmb()  __asm__ __volatile__ ("sync" : : : "memory")
@@ -51,7 +53,7 @@
 #endif
 
 #define smp_mb()	mb()
-#define smp_rmb()	rmb()
+#define smp_rmb()	__asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
 #define smp_wmb()	__asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
 #define smp_read_barrier_depends()	read_barrier_depends()
 #else
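To illustrate what the smp_ barriers cover, here is a minimal sketch of the
classic producer/consumer pattern on ordinary cacheable memory. It is not
part of the patch; the names data, flag, publish and consume are invented
for the example.

/*
 * Illustrative sketch only, not from the patch.  Both "data" and "flag"
 * are ordinary cacheable kernel memory, so the lwsync-based smp_ barriers
 * are sufficient to order the accesses on SMP powerpc.
 */
static int data;
static int flag;

void publish(void)			/* runs on CPU 0 */
{
	data = 42;
	smp_wmb();			/* order the store to data before the store to flag */
	flag = 1;
}

int consume(void)			/* runs on CPU 1 */
{
	if (flag) {
		smp_rmb();		/* order the load of flag before the load of data */
		return data;		/* sees 42 if flag was observed as 1 */
	}
	return -1;
}

Because both locations are cacheable system memory, the smp_wmb()/smp_rmb()
pair is enough here; ordering a cacheable load against a non-cacheable (MMIO)
load still needs the full sync that rmb() retains after this patch.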