diff mbox

[v2,01/25] powerpc/8xx: Save r3 all the time in DTLB miss handler

Message ID 2d35de4435e873f23d37e3b5b5fb34c64421f136.1442939410.git.christophe.leroy@c-s.fr (mailing list archive)
State Superseded
Delegated to: Scott Wood
Headers show

Commit Message

Christophe Leroy Sept. 22, 2015, 4:50 p.m. UTC
We are spending between 40 and 160 cycles with a mean of 65 cycles in
the TLB handling routines (measured with mftbl) so make it more
simple althought it adds one instruction.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
No change in v2

 arch/powerpc/kernel/head_8xx.S | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

Comments

Scott Wood Sept. 28, 2015, 10:07 p.m. UTC | #1
On Tue, Sep 22, 2015 at 06:50:29PM +0200, Christophe Leroy wrote:
> We are spending between 40 and 160 cycles with a mean of 65 cycles in
> the TLB handling routines (measured with mftbl) so make it more
> simple althought it adds one instruction.
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>

Does this just make it simpler or does it make it faster?  What is the
performance impact?  Is the performance impact seen with or without
CONFIG_8xx_CPU6 enabled?  Without it, it looks like you're adding an
mtspr/mfspr combo in order to replace one mfspr.

-Scott
Christophe Leroy Oct. 6, 2015, 1:35 p.m. UTC | #2
Le 29/09/2015 00:07, Scott Wood a écrit :
> On Tue, Sep 22, 2015 at 06:50:29PM +0200, Christophe Leroy wrote:
>> We are spending between 40 and 160 cycles with a mean of 65 cycles in
>> the TLB handling routines (measured with mftbl) so make it more
>> simple althought it adds one instruction.
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> Does this just make it simpler or does it make it faster?  What is the
> performance impact?  Is the performance impact seen with or without
> CONFIG_8xx_CPU6 enabled?  Without it, it looks like you're adding an
> mtspr/mfspr combo in order to replace one mfspr.
>
>
The performance impact is not noticeable. Theoritically it adds 1 cycle 
on a mean of 65 cycles, that is 1.5%. Even in the worst case where we 
spend around 10% of the time in TLB handling exceptions, that represents 
only 0.15% of the total CPU time. So that's almost nothing.
Behind the fact to get in simpler, the main reason is because I need a 
third register for the following patch in the set, otherwise I would 
spend a more time saving and restoring CR several times.

Christophe
Scott Wood Oct. 6, 2015, 4:39 p.m. UTC | #3
On Tue, 2015-10-06 at 15:35 +0200, Christophe Leroy wrote:
> Le 29/09/2015 00:07, Scott Wood a écrit :
> > On Tue, Sep 22, 2015 at 06:50:29PM +0200, Christophe Leroy wrote:
> > > We are spending between 40 and 160 cycles with a mean of 65 cycles in
> > > the TLB handling routines (measured with mftbl) so make it more
> > > simple althought it adds one instruction.
> > > 
> > > Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> > Does this just make it simpler or does it make it faster?  What is the
> > performance impact?  Is the performance impact seen with or without
> > CONFIG_8xx_CPU6 enabled?  Without it, it looks like you're adding an
> > mtspr/mfspr combo in order to replace one mfspr.
> > 
> > 
> The performance impact is not noticeable. Theoritically it adds 1 cycle 
> on a mean of 65 cycles, that is 1.5%. Even in the worst case where we 
> spend around 10% of the time in TLB handling exceptions, that represents 
> only 0.15% of the total CPU time. So that's almost nothing.
> Behind the fact to get in simpler, the main reason is because I need a 
> third register for the following patch in the set, otherwise I would 
> spend a more time saving and restoring CR several times.

If you had said in the changelog that it was because future patches would 
need the register to be saved, we could have avoided this exchange...  
Especially with large patchsets, I review the patches one at a time.  Don't 
assume I know what's coming in patch n+1 (and especially not n+m) when I 
review patch n.

-Scott
Scott Wood Oct. 6, 2015, 4:46 p.m. UTC | #4
On Tue, 2015-10-06 at 15:35 +0200, Christophe Leroy wrote:
> Le 29/09/2015 00:07, Scott Wood a écrit :
> > On Tue, Sep 22, 2015 at 06:50:29PM +0200, Christophe Leroy wrote:
> > > We are spending between 40 and 160 cycles with a mean of 65 cycles in
> > > the TLB handling routines (measured with mftbl) so make it more
> > > simple althought it adds one instruction.
> > > 
> > > Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> > Does this just make it simpler or does it make it faster?  What is the
> > performance impact?  Is the performance impact seen with or without
> > CONFIG_8xx_CPU6 enabled?  Without it, it looks like you're adding an
> > mtspr/mfspr combo in order to replace one mfspr.
> > 
> > 
> The performance impact is not noticeable. Theoritically it adds 1 cycle 
> on a mean of 65 cycles, that is 1.5%. Even in the worst case where we 
> spend around 10% of the time in TLB handling exceptions, that represents 
> only 0.15% of the total CPU time. So that's almost nothing.
> Behind the fact to get in simpler, the main reason is because I need a 
> third register for the following patch in the set, otherwise I would 
> spend a more time saving and restoring CR several times.

FWIW, the added instruction is an SPR access and I doubt that's only one 
cycle.

-Scott
Christophe Leroy Oct. 6, 2015, 8:30 p.m. UTC | #5
Le 06/10/2015 18:46, Scott Wood a écrit :
> On Tue, 2015-10-06 at 15:35 +0200, Christophe Leroy wrote:
>> Le 29/09/2015 00:07, Scott Wood a écrit :
>>> On Tue, Sep 22, 2015 at 06:50:29PM +0200, Christophe Leroy wrote:
>>>> We are spending between 40 and 160 cycles with a mean of 65 cycles in
>>>> the TLB handling routines (measured with mftbl) so make it more
>>>> simple althought it adds one instruction.
>>>>
>>>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>>> Does this just make it simpler or does it make it faster?  What is the
>>> performance impact?  Is the performance impact seen with or without
>>> CONFIG_8xx_CPU6 enabled?  Without it, it looks like you're adding an
>>> mtspr/mfspr combo in order to replace one mfspr.
>>>
>>>
>> The performance impact is not noticeable. Theoritically it adds 1 cycle
>> on a mean of 65 cycles, that is 1.5%. Even in the worst case where we
>> spend around 10% of the time in TLB handling exceptions, that represents
>> only 0.15% of the total CPU time. So that's almost nothing.
>> Behind the fact to get in simpler, the main reason is because I need a
>> third register for the following patch in the set, otherwise I would
>> spend a more time saving and restoring CR several times.
> FWIW, the added instruction is an SPR access and I doubt that's only one
> cycle.
>
>
According to the mpc885 reference manual (table 9-1), Instruction 
Execution Timing for "Move to: mtspr, mtcrf, mtmsr, mcrxr except mtspr to LR
and CTR and to SPRs external to the core" is "serialize + 1 cycle".
Taking into account we preeceeding instructions are also 'mtspr', we are 
already serialized, so it is only one cycle I believe.
Am I interpreting it wrong ?

Christophe

---
L'absence de virus dans ce courrier électronique a été vérifiée par le logiciel antivirus Avast.
https://www.avast.com/antivirus
Scott Wood Oct. 6, 2015, 8:38 p.m. UTC | #6
On Tue, 2015-10-06 at 22:30 +0200, christophe leroy wrote:
> Le 06/10/2015 18:46, Scott Wood a écrit :
> > On Tue, 2015-10-06 at 15:35 +0200, Christophe Leroy wrote:
> > > Le 29/09/2015 00:07, Scott Wood a écrit :
> > > > On Tue, Sep 22, 2015 at 06:50:29PM +0200, Christophe Leroy wrote:
> > > > > We are spending between 40 and 160 cycles with a mean of 65 cycles 
> > > > > in
> > > > > the TLB handling routines (measured with mftbl) so make it more
> > > > > simple althought it adds one instruction.
> > > > > 
> > > > > Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> > > > Does this just make it simpler or does it make it faster?  What is the
> > > > performance impact?  Is the performance impact seen with or without
> > > > CONFIG_8xx_CPU6 enabled?  Without it, it looks like you're adding an
> > > > mtspr/mfspr combo in order to replace one mfspr.
> > > > 
> > > > 
> > > The performance impact is not noticeable. Theoritically it adds 1 cycle
> > > on a mean of 65 cycles, that is 1.5%. Even in the worst case where we
> > > spend around 10% of the time in TLB handling exceptions, that represents
> > > only 0.15% of the total CPU time. So that's almost nothing.
> > > Behind the fact to get in simpler, the main reason is because I need a
> > > third register for the following patch in the set, otherwise I would
> > > spend a more time saving and restoring CR several times.
> > FWIW, the added instruction is an SPR access and I doubt that's only one
> > cycle.
> > 
> > 
> According to the mpc885 reference manual (table 9-1), Instruction 
> Execution Timing for "Move to: mtspr, mtcrf, mtmsr, mcrxr except mtspr to LR
> and CTR and to SPRs external to the core" is "serialize + 1 cycle".
> Taking into account we preeceeding instructions are also 'mtspr', we are 
> already serialized, so it is only one cycle I believe.
> Am I interpreting it wrong ?

I don't know.  The manual doesn't go into much detail about the mechanics of 
serialization.  If it's just about "block[ing] all execution units" without 
any effect on fetching, decoding, etc. then maybe you're right.

-Scott
diff mbox

Patch

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 78c1eba..1557926 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -385,23 +385,20 @@  InstructionTLBMiss:
 
 	. = 0x1200
 DataStoreTLBMiss:
-#ifdef CONFIG_8xx_CPU6
 	mtspr	SPRN_SPRG_SCRATCH2, r3
-#endif
 	EXCEPTION_PROLOG_0
-	mfcr	r10
+	mfcr	r3
 
 	/* If we are faulting a kernel address, we have to use the
 	 * kernel page tables.
 	 */
-	mfspr	r11, SPRN_MD_EPN
-	IS_KERNEL(r11, r11)
+	mfspr	r10, SPRN_MD_EPN
+	IS_KERNEL(r11, r10)
 	mfspr	r11, SPRN_M_TW	/* Get level 1 table */
 	BRANCH_UNLESS_KERNEL(3f)
 	lis	r11, (swapper_pg_dir-PAGE_OFFSET)@ha
 3:
-	mtcr	r10
-	mfspr	r10, SPRN_MD_EPN
+	mtcr	r3
 
 	/* Insert level 1 index */
 	rlwimi	r11, r10, 32 - ((PAGE_SHIFT - 2) << 1), (PAGE_SHIFT - 2) << 1, 29
@@ -453,9 +450,7 @@  DataStoreTLBMiss:
 	MTSPR_CPU6(SPRN_MD_RPN, r10, r3)	/* Update TLB entry */
 
 	/* Restore registers */
-#ifdef CONFIG_8xx_CPU6
 	mfspr	r3, SPRN_SPRG_SCRATCH2
-#endif
 	mtspr	SPRN_DAR, r11	/* Tag DAR */
 	EXCEPTION_EPILOG_0
 	rfi