Message ID | OF14B8C2B7.A99F8A1E-ONC1257815.0057AC0A-C1257815.0057F664@transmode.se (mailing list archive) |
---|---|
State | Not Applicable |
Hello Joakim,

Joakim Tjernlund wrote:
>> Sent by: linuxppc-dev-bounces+joakim.tjernlund=transmode.se@lists.ozlabs.org
>>
>> Rafael Beims <rbeims@gmail.com> wrote on 2011/01/10 17:35:38:
>>>> Once you have tested it and it works, please send a patch to remove the 8xx workaround.
>>>> Make sure Scott is cc:ed
>>>>
>>>>
>>> I tested linux-2.6.33 on my ppc880 board today, and even without the
>>> slowdown.patch applied, the board runs processes with good
>>> performance.
>>> It really seems that the problem is solved from linux-2.6.33 on.
>>>
>>> I'm not sure what you mean by sending a patch to remove the
>>> workaround. The only thing that I did in the 2.6.32 version was to
>>> apply the slowdown.patch attached in the message from Michael.
>>>
>>> Could you clarify please?
>> Yes, this part in arch/powerpc/mm/pgtable.c:
>> #ifdef CONFIG_8xx
>>                 /* On 8xx, cache control instructions (particularly
>>                  * "dcbst" from flush_dcache_icache) fault as write
>>                  * operation if there is an unpopulated TLB entry
>>                  * for the address in question. To workaround that,
>>                  * we invalidate the TLB here, thus avoiding dcbst
>>                  * misbehaviour.
>>                  */
>>                 /* 8xx doesn't care about PID, size or ind args */
>>                 _tlbil_va(addr, 0, 0, 0);
>> #endif /* CONFIG_8xx */
>>
>> Should be removed in >= 2.6.33 kernels.
>> My 8xx TLB work fixes this problem more efficiently.
>
> Can you test these 2 patches on recent 2.6 linux:
>>From 9024200169bf86b4f34cb3b1ebf68e0056237bc0 Mon Sep 17 00:00:00 2001
> From: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
> Date: Tue, 11 Jan 2011 13:43:42 +0100
> Subject: [PATCH 1/2] powerpc: Move 8xx invalidation of non present TLBs
[...]
> and
>
>>From 0ef93601290a75b087495dddeee6062a870f1dc6 Mon Sep 17 00:00:00 2001
> From: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
> Date: Tue, 11 Jan 2011 13:55:22 +0100
> Subject: [PATCH 2/2] powerpc: Remove 8xx redundant dcbst workaround.
Tested this on a board similar to the mainline tqm8xx board with lmbench:

-bash-3.2# cat /proc/cpuinfo
processor       : 0
cpu             : 8xx
clock           : 80.000000MHz
revision        : 0.0 (pvr 0050 0000)
bogomips        : 10.00
timebase        : 5000000
platform        : KUP4K
model           : KUP4K
Memory          : 96 MB
-bash-3.2#
-bash-3.2# cat /proc/version
Linux version 2.6.34-00064-g3e81b6b (hs@pollux.denx.de) (gcc version 4.2.2) #89 Thu Jan 20 08:39:52 CET 2011
-bash-3.2#

(First run of lmbench without your 2 patches, the two other runs with them)

-bash-3.2# make see
cd results && make summary >summary.out 2>summary.errs
cd results && make percent >percent.out 2>percent.errs
-bash-3.2# cat results/summary.out
make[1]: Entering directory `/home/hs/lmbench-3.0-a9/results'

                 L M B E N C H  3 . 0   S U M M A R Y
                 ------------------------------------
                 (Alpha software, do not distribute)

Basic system parameters
------------------------------------------------------------------------------
Host                 OS Description              Mhz  tlb  cache  mem   scal
                                                     pages line   par   load
                                                           bytes
--------- ------------- ----------------------- ---- ----- ----- ------ ----
kup4k     Linux 2.6.34- powerpc-linux-gnu         79    28    16 1.1400    1
kup4k     Linux 2.6.34- powerpc-linux-gnu         79    28    16 1.0200    1
kup4k     Linux 2.6.34- powerpc-linux-gnu         79    28    16 1.1000    1

Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
                             call  I/O stat clos TCP  inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
kup4k     Linux 2.6.34-   79 2.58 12.3 126. 1285 353. 22.8 149. 8418 34.K 101K
kup4k     Linux 2.6.34-   79 2.59 13.1 127. 1273 320. 23.4 127. 8251 33.K 100K
kup4k     Linux 2.6.34-   79 2.47 13.1 127. 1288 315. 23.6 128. 8413 34.K 101K

Basic integer operations - times in nanoseconds - smaller is better
-------------------------------------------------------------------
Host                 OS  intgr intgr  intgr  intgr  intgr
                          bit   add    mul    div    mod
--------- ------------- ------ ------ ------ ------ ------
kup4k     Linux 2.6.34-   12.6   14.4 1.3500  103.9  170.6
kup4k     Linux 2.6.34-   13.2   15.0 1.3100  100.0  170.5
kup4k     Linux 2.6.34-   13.2   14.4 1.2900  104.1  162.1

Basic uint64 operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host                 OS int64  int64  int64  int64  int64
                         bit    add    mul    div    mod
--------- ------------- ------ ------ ------ ------ ------
kup4k     Linux 2.6.34-    12.   11.1        1637.9 1602.4
kup4k     Linux 2.6.34-    13.   11.1        1643.6 1604.2
kup4k     Linux 2.6.34-    13.   11.1        1639.7 1600.8

Basic float operations - times in nanoseconds - smaller is better
-----------------------------------------------------------------
Host                 OS  float  float  float  float
                         add    mul    div    bogo
--------- ------------- ------ ------ ------ ------
kup4k     Linux 2.6.34-  840.5 1304.3 4593.3 8703.0
kup4k     Linux 2.6.34-  843.5 1366.6 4601.7 8814.0
kup4k     Linux 2.6.34-  807.8 1377.5 4610.0 8710.0

Basic double operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host                 OS  double double double double
                         add    mul    div    bogo
--------- ------------- ------ ------ ------ ------
kup4k     Linux 2.6.34- 1309.2 2235.2 3132.2  13.9K
kup4k     Linux 2.6.34- 1252.0 2339.0 2993.8  13.9K
kup4k     Linux 2.6.34- 1311.2 2335.2 2997.2  13.9K

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host                 OS  2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw  ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
kup4k     Linux 2.6.34-  131.8  144.7  130.8  168.4  207.8   190.7   248.1
kup4k     Linux 2.6.34-  129.4  142.4  140.8  186.4  211.1   187.0   257.9
kup4k     Linux 2.6.34-  121.3  155.6  131.0  196.8  201.5   198.5   240.7

*Local* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
kup4k     Linux 2.6.34- 131.8 444.2 771. 1024.       1432.       3876
kup4k     Linux 2.6.34- 129.4 455.2 722. 1021.       1434.       3831
kup4k     Linux 2.6.34- 121.3 458.8 761. 1004.       1435.       3866

*Remote* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host                 OS   UDP  RPC/  TCP   RPC/ TCP
                               UDP         TCP  conn
--------- ------------- ----- ----- ----- ----- ----
kup4k     Linux 2.6.34-
kup4k     Linux 2.6.34-
kup4k     Linux 2.6.34-

File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host                 OS   0K File      10K File     Mmap    Prot   Page   100fd
                        Create Delete Create Delete Latency Fault  Fault  selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
kup4k     Linux 2.6.34-  16.7K  10.3K  90.9K  13.7K   22.6K  27.1    43.4 117.9
kup4k     Linux 2.6.34-  16.9K  15.6K 100.0K  16.1K   22.7K 9.590    39.8 119.2
kup4k     Linux 2.6.34-  16.7K  13.5K 100.0K  15.9K   22.8K 9.306    39.8 119.6

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------------------------
Host                 OS Pipe AF    TCP  File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
kup4k     Linux 2.6.34- 13.3 13.3 11.0   18.3   49.5   23.7   23.3 49.5  35.5
kup4k     Linux 2.6.34- 13.2 13.4 10.8   18.4   49.5   23.4   23.2 49.5  35.4
kup4k     Linux 2.6.34- 13.1 13.2 11.0   18.3   49.5   23.7   23.4 49.5  35.5

Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
------------------------------------------------------------------------------
Host                 OS   Mhz   L1 $   L2 $    Main mem    Rand mem    Guesses
--------- -------------   ---   ----   ----    --------    --------    -------
kup4k     Linux 2.6.34-    79   26.4  278.6       277.0      1145.6    No L2 cache?
kup4k     Linux 2.6.34-    79   26.4  278.7       277.1      1147.1    No L2 cache?
kup4k     Linux 2.6.34-    79   26.4  278.8       276.6      1146.9    No L2 cache?
make[1]: Leaving directory `/home/hs/lmbench-3.0-a9/results'
-bash-3.2#

bye,
Heiko
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 1f1a04b..6cd99e2 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -221,7 +221,11 @@ DataAccess:
 	stw	r10,_DSISR(r11)
 	mr	r5,r10
 	mfspr	r4,SPRN_DAR
-	li	r10,0x00f0
+	/* invalidate ~PRESENT TLBs, 8xx MMU don't do this */
+	andis.	r10,r5,0x4000
+	beq+	1f
+	tlbie	r4
+1:	li	r10,0x00f0
 	mtspr	SPRN_DAR,r10	/* Tag DAR, to be used in DTLB Error */
 	EXC_XFER_EE_LITE(0x300, handle_page_fault)
 
@@ -234,7 +238,11 @@ InstructionAccess:
 	EXCEPTION_PROLOG
 	mr	r4,r12
 	mr	r5,r9
-	EXC_XFER_EE_LITE(0x400, handle_page_fault)
+	/* invalidate ~PRESENT TLBs, 8xx MMU don't do this */
+	andis.	r10,r5,0x4000
+	beq+	1f
+	tlbie	r4
+1:	EXC_XFER_EE_LITE(0x400, handle_page_fault)
 
 /* External interrupt */
 	EXCEPTION(0x500, HardwareInterrupt, do_IRQ, EXC_XFER_LITE)
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 1bd712c..31226c8 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -247,12 +247,6 @@ good_area:
 		goto bad_area;
 #endif /* CONFIG_6xx */
 #if defined(CONFIG_8xx)
-		/* 8xx sometimes need to load a invalid/non-present TLBs.
-		 * These must be invalidated separately as linux mm don't.
-		 */
-		if (error_code & 0x40000000) /* no translation? */
-			_tlbil_va(address, 0, 0, 0);
-
 		/* The MPC8xx seems to always set 0x80000000, which is
 		 * "undefined".  Of those that can be set, this is the only
 		 * one which seems bad.
-- 
1.7.3.4

and

From 0ef93601290a75b087495dddeee6062a870f1dc6 Mon Sep 17 00:00:00 2001
From: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Date: Tue, 11 Jan 2011 13:55:22 +0100
Subject: [PATCH 2/2] powerpc: Remove 8xx redundant dcbst workaround.

On 8xx dcbst fault as write operation if there is an unpopulated
TLB entry for the address in question. There is as of commit
0a2ab51ffb8dfdf51402dcfb446629648c96bc78, powerpc/8xx: Fixup DAR from buggy dcbX instructions
a better workaround in the TLB error handler so this bad one can be removed.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---
 arch/powerpc/mm/pgtable.c |   11 -----------
 1 files changed, 0 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index ebc2f38..d3f47a6 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -185,17 +185,6 @@ static pte_t set_pte_filter(pte_t pte, unsigned long addr)
 	if (!pg)
 		return pte;
 	if (!test_bit(PG_arch_1, &pg->flags)) {
-#ifdef CONFIG_8xx
-		/* On 8xx, cache control instructions (particularly
-		 * "dcbst" from flush_dcache_icache) fault as write
-		 * operation if there is an unpopulated TLB entry
-		 * for the address in question. To workaround that,
-		 * we invalidate the TLB here, thus avoiding dcbst
-		 * misbehaviour.
-		 */
-		/* 8xx doesn't care about PID, size or ind args */
-		_tlbil_va(addr, 0, 0, 0);
-#endif /* CONFIG_8xx */
 		flush_dcache_icache_page(pg);
 		set_bit(PG_arch_1, &pg->flags);
 	}