Message ID | 2d35de4435e873f23d37e3b5b5fb34c64421f136.1442939410.git.christophe.leroy@c-s.fr (mailing list archive) |
---|---|
State | Superseded, archived |
Delegated to: | Scott Wood |
Headers | show |
On Tue, Sep 22, 2015 at 06:50:29PM +0200, Christophe Leroy wrote: > We are spending between 40 and 160 cycles with a mean of 65 cycles in > the TLB handling routines (measured with mftbl) so make it more > simple althought it adds one instruction. > > Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Does this just make it simpler or does it make it faster? What is the performance impact? Is the performance impact seen with or without CONFIG_8xx_CPU6 enabled? Without it, it looks like you're adding an mtspr/mfspr combo in order to replace one mfspr. -Scott
Le 29/09/2015 00:07, Scott Wood a écrit : > On Tue, Sep 22, 2015 at 06:50:29PM +0200, Christophe Leroy wrote: >> We are spending between 40 and 160 cycles with a mean of 65 cycles in >> the TLB handling routines (measured with mftbl) so make it more >> simple althought it adds one instruction. >> >> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> > Does this just make it simpler or does it make it faster? What is the > performance impact? Is the performance impact seen with or without > CONFIG_8xx_CPU6 enabled? Without it, it looks like you're adding an > mtspr/mfspr combo in order to replace one mfspr. > > The performance impact is not noticeable. Theoritically it adds 1 cycle on a mean of 65 cycles, that is 1.5%. Even in the worst case where we spend around 10% of the time in TLB handling exceptions, that represents only 0.15% of the total CPU time. So that's almost nothing. Behind the fact to get in simpler, the main reason is because I need a third register for the following patch in the set, otherwise I would spend a more time saving and restoring CR several times. Christophe
On Tue, 2015-10-06 at 15:35 +0200, Christophe Leroy wrote: > Le 29/09/2015 00:07, Scott Wood a écrit : > > On Tue, Sep 22, 2015 at 06:50:29PM +0200, Christophe Leroy wrote: > > > We are spending between 40 and 160 cycles with a mean of 65 cycles in > > > the TLB handling routines (measured with mftbl) so make it more > > > simple althought it adds one instruction. > > > > > > Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> > > Does this just make it simpler or does it make it faster? What is the > > performance impact? Is the performance impact seen with or without > > CONFIG_8xx_CPU6 enabled? Without it, it looks like you're adding an > > mtspr/mfspr combo in order to replace one mfspr. > > > > > The performance impact is not noticeable. Theoritically it adds 1 cycle > on a mean of 65 cycles, that is 1.5%. Even in the worst case where we > spend around 10% of the time in TLB handling exceptions, that represents > only 0.15% of the total CPU time. So that's almost nothing. > Behind the fact to get in simpler, the main reason is because I need a > third register for the following patch in the set, otherwise I would > spend a more time saving and restoring CR several times. If you had said in the changelog that it was because future patches would need the register to be saved, we could have avoided this exchange... Especially with large patchsets, I review the patches one at a time. Don't assume I know what's coming in patch n+1 (and especially not n+m) when I review patch n. -Scott
On Tue, 2015-10-06 at 15:35 +0200, Christophe Leroy wrote: > Le 29/09/2015 00:07, Scott Wood a écrit : > > On Tue, Sep 22, 2015 at 06:50:29PM +0200, Christophe Leroy wrote: > > > We are spending between 40 and 160 cycles with a mean of 65 cycles in > > > the TLB handling routines (measured with mftbl) so make it more > > > simple althought it adds one instruction. > > > > > > Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> > > Does this just make it simpler or does it make it faster? What is the > > performance impact? Is the performance impact seen with or without > > CONFIG_8xx_CPU6 enabled? Without it, it looks like you're adding an > > mtspr/mfspr combo in order to replace one mfspr. > > > > > The performance impact is not noticeable. Theoritically it adds 1 cycle > on a mean of 65 cycles, that is 1.5%. Even in the worst case where we > spend around 10% of the time in TLB handling exceptions, that represents > only 0.15% of the total CPU time. So that's almost nothing. > Behind the fact to get in simpler, the main reason is because I need a > third register for the following patch in the set, otherwise I would > spend a more time saving and restoring CR several times. FWIW, the added instruction is an SPR access and I doubt that's only one cycle. -Scott
Le 06/10/2015 18:46, Scott Wood a écrit : > On Tue, 2015-10-06 at 15:35 +0200, Christophe Leroy wrote: >> Le 29/09/2015 00:07, Scott Wood a écrit : >>> On Tue, Sep 22, 2015 at 06:50:29PM +0200, Christophe Leroy wrote: >>>> We are spending between 40 and 160 cycles with a mean of 65 cycles in >>>> the TLB handling routines (measured with mftbl) so make it more >>>> simple althought it adds one instruction. >>>> >>>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> >>> Does this just make it simpler or does it make it faster? What is the >>> performance impact? Is the performance impact seen with or without >>> CONFIG_8xx_CPU6 enabled? Without it, it looks like you're adding an >>> mtspr/mfspr combo in order to replace one mfspr. >>> >>> >> The performance impact is not noticeable. Theoritically it adds 1 cycle >> on a mean of 65 cycles, that is 1.5%. Even in the worst case where we >> spend around 10% of the time in TLB handling exceptions, that represents >> only 0.15% of the total CPU time. So that's almost nothing. >> Behind the fact to get in simpler, the main reason is because I need a >> third register for the following patch in the set, otherwise I would >> spend a more time saving and restoring CR several times. > FWIW, the added instruction is an SPR access and I doubt that's only one > cycle. > > According to the mpc885 reference manual (table 9-1), Instruction Execution Timing for "Move to: mtspr, mtcrf, mtmsr, mcrxr except mtspr to LR and CTR and to SPRs external to the core" is "serialize + 1 cycle". Taking into account we preeceeding instructions are also 'mtspr', we are already serialized, so it is only one cycle I believe. Am I interpreting it wrong ? Christophe --- L'absence de virus dans ce courrier électronique a été vérifiée par le logiciel antivirus Avast. https://www.avast.com/antivirus
On Tue, 2015-10-06 at 22:30 +0200, christophe leroy wrote: > Le 06/10/2015 18:46, Scott Wood a écrit : > > On Tue, 2015-10-06 at 15:35 +0200, Christophe Leroy wrote: > > > Le 29/09/2015 00:07, Scott Wood a écrit : > > > > On Tue, Sep 22, 2015 at 06:50:29PM +0200, Christophe Leroy wrote: > > > > > We are spending between 40 and 160 cycles with a mean of 65 cycles > > > > > in > > > > > the TLB handling routines (measured with mftbl) so make it more > > > > > simple althought it adds one instruction. > > > > > > > > > > Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> > > > > Does this just make it simpler or does it make it faster? What is the > > > > performance impact? Is the performance impact seen with or without > > > > CONFIG_8xx_CPU6 enabled? Without it, it looks like you're adding an > > > > mtspr/mfspr combo in order to replace one mfspr. > > > > > > > > > > > The performance impact is not noticeable. Theoritically it adds 1 cycle > > > on a mean of 65 cycles, that is 1.5%. Even in the worst case where we > > > spend around 10% of the time in TLB handling exceptions, that represents > > > only 0.15% of the total CPU time. So that's almost nothing. > > > Behind the fact to get in simpler, the main reason is because I need a > > > third register for the following patch in the set, otherwise I would > > > spend a more time saving and restoring CR several times. > > FWIW, the added instruction is an SPR access and I doubt that's only one > > cycle. > > > > > According to the mpc885 reference manual (table 9-1), Instruction > Execution Timing for "Move to: mtspr, mtcrf, mtmsr, mcrxr except mtspr to LR > and CTR and to SPRs external to the core" is "serialize + 1 cycle". > Taking into account we preeceeding instructions are also 'mtspr', we are > already serialized, so it is only one cycle I believe. > Am I interpreting it wrong ? I don't know. The manual doesn't go into much detail about the mechanics of serialization. If it's just about "block[ing] all execution units" without any effect on fetching, decoding, etc. then maybe you're right. -Scott
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S index 78c1eba..1557926 100644 --- a/arch/powerpc/kernel/head_8xx.S +++ b/arch/powerpc/kernel/head_8xx.S @@ -385,23 +385,20 @@ InstructionTLBMiss: . = 0x1200 DataStoreTLBMiss: -#ifdef CONFIG_8xx_CPU6 mtspr SPRN_SPRG_SCRATCH2, r3 -#endif EXCEPTION_PROLOG_0 - mfcr r10 + mfcr r3 /* If we are faulting a kernel address, we have to use the * kernel page tables. */ - mfspr r11, SPRN_MD_EPN - IS_KERNEL(r11, r11) + mfspr r10, SPRN_MD_EPN + IS_KERNEL(r11, r10) mfspr r11, SPRN_M_TW /* Get level 1 table */ BRANCH_UNLESS_KERNEL(3f) lis r11, (swapper_pg_dir-PAGE_OFFSET)@ha 3: - mtcr r10 - mfspr r10, SPRN_MD_EPN + mtcr r3 /* Insert level 1 index */ rlwimi r11, r10, 32 - ((PAGE_SHIFT - 2) << 1), (PAGE_SHIFT - 2) << 1, 29 @@ -453,9 +450,7 @@ DataStoreTLBMiss: MTSPR_CPU6(SPRN_MD_RPN, r10, r3) /* Update TLB entry */ /* Restore registers */ -#ifdef CONFIG_8xx_CPU6 mfspr r3, SPRN_SPRG_SCRATCH2 -#endif mtspr SPRN_DAR, r11 /* Tag DAR */ EXCEPTION_EPILOG_0 rfi
We are spending between 40 and 160 cycles with a mean of 65 cycles in the TLB handling routines (measured with mftbl) so make it more simple althought it adds one instruction. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> --- No change in v2 arch/powerpc/kernel/head_8xx.S | 13 ++++--------- 1 file changed, 4 insertions(+), 9 deletions(-)