
[3/3] powerpc/e6500: hw tablewalk: order the memory accesses when acquiring/releasing the tcd lock

Message ID: 1439466697-18989-3-git-send-email-haokexin@gmail.com
State: Changes Requested
Delegated to: Scott Wood

Commit Message

Kevin Hao Aug. 13, 2015, 11:51 a.m. UTC
I didn't find anything unusual. But I think we do need to order the
load/store of esel_next when acquiring/releasing the tcd lock. For
acquire, add a data dependency to order the loads of the lock and of
esel_next. For release, even though there is already an "isync" there,
it doesn't guarantee any memory access ordering. So we still need an
"lwsync" to order the two stores to the lock and to esel_next.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
---
 arch/powerpc/mm/tlb_low_64e.S | 3 +++
 1 file changed, 3 insertions(+)
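
For readers less familiar with the powerpc memory model: the ordering being
asked for here is ordinary acquire/release semantics around the esel_next
hint. A minimal C11 sketch of the same pattern (illustrative only; the real
code is hand-written assembly in tlb_low_64e.S, and the names
tcd_lock/tcd_unlock are invented here):

#include <stdatomic.h>

struct tcd {
	atomic_flag lock;
	unsigned char esel_next;	/* hint: next victim entry */
};

static void tcd_lock(struct tcd *t)
{
	/* acquire: the subsequent load of esel_next must not be
	 * satisfied before the lock is observed to be free */
	while (atomic_flag_test_and_set_explicit(&t->lock,
						 memory_order_acquire))
		;
}

static void tcd_unlock(struct tcd *t)
{
	/* release: the store to esel_next must become visible before
	 * the store that frees the lock (lwsync on powerpc) */
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

The patch gets the acquire half more cheaply than a barrier would: on
powerpc, a load whose address depends on an earlier load is ordered after
that load, which is what the andi./add pair in the patch constructs.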

Comments

Scott Wood Aug. 14, 2015, 3:39 a.m. UTC | #1
On Thu, 2015-08-13 at 19:51 +0800, Kevin Hao wrote:
> I didn't find anything unusual. But I think we do need to order the
> load/store of esel_next when acquiring/releasing the tcd lock. For
> acquire, add a data dependency to order the loads of the lock and of
> esel_next. For release, even though there is already an "isync" there,
> it doesn't guarantee any memory access ordering. So we still need an
> "lwsync" to order the two stores to the lock and to esel_next.

I was going to say that esel_next is just a hint and it doesn't really matter 
if we occasionally get the wrong value, unless it happens often enough to 
cause more performance degradation than the lwsync causes.  However, with the 
A-008139 workaround we do need to read the same value from esel_next both 
times.  It might be less costly to save/restore an additional register 
instead of lwsync, though.

-Scott
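
The register alternative sidesteps the ordering requirement between the two
reads entirely: if esel_next is read once and the value kept in a register,
both uses see the same snapshot by construction. A rough C analogue of that
idea (all names invented for illustration; the real change would be saving
one more register in the asm):

#include <stdio.h>

struct tcd { unsigned char esel_next; };

/* stand-in for programming MAS0 with the victim entry */
static void select_entry(unsigned char esel)
{
	printf("select entry %u\n", esel);
}

/* stand-in for the A-008139 workaround's re-check of the same entry */
static void recheck_entry(unsigned char esel)
{
	printf("recheck entry %u\n", esel);
}

static void tlb_miss(struct tcd *t)
{
	/* read esel_next exactly once; both uses share the snapshot,
	 * so no load-load ordering is needed for them to agree */
	unsigned char esel = *(volatile unsigned char *)&t->esel_next;

	select_entry(esel);
	recheck_entry(esel);
}

int main(void)
{
	struct tcd t = { .esel_next = 3 };
	tlb_miss(&t);
	return 0;
}

The snapshot can still be stale relative to the lock acquisition, but per
the hint argument above, an occasionally stale victim choice only costs a
little performance.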
Kevin Hao Aug. 14, 2015, 7:13 a.m. UTC | #2
On Thu, Aug 13, 2015 at 10:39:19PM -0500, Scott Wood wrote:
> On Thu, 2015-08-13 at 19:51 +0800, Kevin Hao wrote:
> > I didn't find anything unusual. But I think we do need to order the
> > load/store of esel_next when acquiring/releasing the tcd lock. For
> > acquire, add a data dependency to order the loads of the lock and of
> > esel_next. For release, even though there is already an "isync" there,
> > it doesn't guarantee any memory access ordering. So we still need an
> > "lwsync" to order the two stores to the lock and to esel_next.
> 
> I was going to say that esel_next is just a hint and it doesn't really matter 
> if we occasionally get the wrong value, unless it happens often enough to 
> cause more performance degradation than the lwsync causes.  However, with the 
> A-008139 workaround we do need to read the same value from esel_next both 
> times.  It might be less costly to save/restore an additional register 
> instead of lwsync, though.

I will try to get some benchmark numbers to compare which method is
better. Do you have any recommended benchmark for a case like this?

Thanks,
Kevin

> 
> -Scott
>
Scott Wood Aug. 15, 2015, 12:44 a.m. UTC | #3
On Fri, 2015-08-14 at 15:13 +0800, Kevin Hao wrote:
> On Thu, Aug 13, 2015 at 10:39:19PM -0500, Scott Wood wrote:
> > On Thu, 2015-08-13 at 19:51 +0800, Kevin Hao wrote:
> > > I didn't find anything unusual. But I think we do need to order the
> > > load/store of esel_next when acquiring/releasing the tcd lock. For
> > > acquire, add a data dependency to order the loads of the lock and of
> > > esel_next. For release, even though there is already an "isync" there,
> > > it doesn't guarantee any memory access ordering. So we still need an
> > > "lwsync" to order the two stores to the lock and to esel_next.
> > 
> > I was going to say that esel_next is just a hint and it doesn't really
> > matter if we occasionally get the wrong value, unless it happens often
> > enough to cause more performance degradation than the lwsync causes.
> > However, with the A-008139 workaround we do need to read the same value
> > from esel_next both times.  It might be less costly to save/restore an
> > additional register instead of lwsync, though.
> 
> I will try to get some benchmark numbers to compare which method is
> better. Do you have any recommended benchmark for a case like this?

lmbench lat_mem_rd with a stride chosen to maximize TLB misses.  For the 
uncontended case, one instance; for the contended case, two instances, one 
pinned to each thread of a core.

-Scott
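
For anyone wanting a rough idea of what that measures without lmbench
installed, here is a stand-in in C (the buffer size, stride, and iteration
count are guesses, not Scott's settings): chase dependent pointers through
a buffer at a page-sized stride so nearly every load can take a TLB miss.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
	size_t size = 64 * 1024 * 1024;		/* 64 MiB working set */
	size_t stride = 4096;			/* one touch per page */
	size_t n = size / stride, i, iters = 50 * 1000 * 1000;
	char *buf = malloc(size);
	void **p;

	if (!buf)
		return 1;
	/* link each page-aligned slot to the next, wrapping at the end */
	for (i = 0; i < n; i++)
		*(void **)(buf + i * stride) = buf + ((i + 1) % n) * stride;

	struct timespec t0, t1;
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (p = (void **)buf, i = 0; i < iters; i++)
		p = *p;			/* dependent loads, one page apart */
	clock_gettime(CLOCK_MONOTONIC, &t1);

	double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
	printf("%.1f ns/load (end %p)\n", ns / iters, (void *)p);
	free(buf);
	return 0;
}

For the contended case Scott describes, run two copies pinned to the two
threads of one core (e.g. with taskset) and compare against a single
pinned instance.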

Patch

diff --git a/arch/powerpc/mm/tlb_low_64e.S b/arch/powerpc/mm/tlb_low_64e.S
index e4185581c5a7..964754911987 100644
--- a/arch/powerpc/mm/tlb_low_64e.S
+++ b/arch/powerpc/mm/tlb_low_64e.S
@@ -334,6 +334,8 @@  BEGIN_FTR_SECTION		/* CPU_FTR_SMT */
 	 * with tlbilx before overwriting.
 	 */
 
+	andi.	r15,r15,0	/* add a data dependency to order the loads */
+	add	r11,r11,r15	/* between the lock and esel_next */
 	lbz	r15,TCD_ESEL_NEXT(r11)
 	rlwinm	r10,r15,16,0xff0000
 	oris	r10,r10,MAS0_TLBSEL(1)@h
@@ -447,6 +449,7 @@  BEGIN_FTR_SECTION
 	beq	cr1,1f		/* no unlock if lock was recursively grabbed */
 	li	r15,0
 	isync
+	lwsync
 	stb	r15,0(r11)
 1:
 END_FTR_SECTION_IFSET(CPU_FTR_SMT)