diff mbox

[RFC,v5] MPC5121 TLB errata workaround

Message ID 200903161652.09747.david.jander@protonic.nl (mailing list archive)
State Superseded, archived
Headers show

Commit Message

David Jander March 16, 2009, 3:52 p.m. UTC
Complete workaround for DTLB errata in e300c2/c3/c4 processors.

Due to the bug, the hardware-implemented LRU algorythm always goes to way
1 of the TLB. This fix implements the proposed software workaround in
form of a LRW table for chosing the TLB-way.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: David Jander <david@protonic.nl>

---

Comments

David Jander March 16, 2009, 4:09 p.m. UTC | #1
In this patch, I placed the LRW table in SPRG6 like before, but Kumar's code 
seems a little more compact, so I decided to use that one and fix it ;-)

It's a pity we seem to have one register short in the handler, so we need to 
load SPRN_SRR1 twice :-(

Allthough the code-path now has 1 instruction less than my previous version 
most of the time (and 2 instructions more when way is not adjusted), 
benchmark results are barely affected by this:

1.- mplayer: Total time: 50.392s (50.316s previous patch)

2.- prboom timedemo: 20.1 fps (19.9 fps previous patch)

3.- memcpy speed: 179 MByte/s (180 Mbyte/s previous patch)

Conclusion: difference not measurable between v4 and v5.

Best regards,
Kenneth Johansson March 16, 2009, 4:34 p.m. UTC | #2
On Mon, 2009-03-16 at 16:52 +0100, David Jander wrote:
> Complete workaround for DTLB errata in e300c2/c3/c4 processors.
> 
> Due to the bug, the hardware-implemented LRU algorythm always goes to way
> 1 of the TLB. This fix implements the proposed software workaround in
> form of a LRW table for chosing the TLB-way.
> 
> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
> Signed-off-by: David Jander <david@protonic.nl>

I think we have a winner. with one instruction slot left :)
I tried your V4 and V5 and could not see any difference in speed.


Acked-by: Kenneth Johansson <kenneth@southpole.se>
Kumar Gala March 16, 2009, 5:47 p.m. UTC | #3
On Mar 16, 2009, at 11:09 AM, David Jander wrote:

>
> In this patch, I placed the LRW table in SPRG6 like before, but  
> Kumar's code
> seems a little more compact, so I decided to use that one and fix  
> it ;-)
>
> It's a pity we seem to have one register short in the handler, so we  
> need to
> load SPRN_SRR1 twice :-(
>
> Allthough the code-path now has 1 instruction less than my previous  
> version
> most of the time (and 2 instructions more when way is not adjusted),
> benchmark results are barely affected by this:
>
> 1.- mplayer: Total time: 50.392s (50.316s previous patch)
>
> 2.- prboom timedemo: 20.1 fps (19.9 fps previous patch)
>
> 3.- memcpy speed: 179 MByte/s (180 Mbyte/s previous patch)
>
> Conclusion: difference not measurable between v4 and v5.

Can we try this memory instead of the SPRG to see if any noticeable  
difference?

- k
Kumar Gala March 16, 2009, 6:05 p.m. UTC | #4
On Mar 16, 2009, at 10:52 AM, David Jander wrote:

> Complete workaround for DTLB errata in e300c2/c3/c4 processors.
>
> Due to the bug, the hardware-implemented LRU algorythm always goes  
> to way
> 1 of the TLB. This fix implements the proposed software workaround in
> form of a LRW table for chosing the TLB-way.
>
> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
> Signed-off-by: David Jander <david@protonic.nl>
>
> ---
> diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/ 
> head_32.S
> index 0f4fac5..3971ee4 100644
> --- a/arch/powerpc/kernel/head_32.S
> +++ b/arch/powerpc/kernel/head_32.S
> @@ -578,9 +578,21 @@ DataLoadTLBMiss:
> 	andc	r1,r3,r1		/* PP = user? (rw&dirty? 2: 3): 0 */
> 	mtspr	SPRN_RPA,r1
> 	mfspr	r3,SPRN_DMISS
> +	mfspr	r2,SPRN_SRR1		/* Need to restore CR0 */
> +	mtcrf	0x80,r2
> +#ifdef CONFIG_PPC_MPC512x
> +	li	r0,1
> +	mfspr	r1,SPRN_SPRG6
> +	rlwinm	r2,r3,17,27,31		/* Get Address bits 19:15 */

Don't we want:
	rlwinm	r2,r3,20,27,31		/* Get address bits 15:19 */

> +	slw	r0,r0,r2
> +	xor	r1,r0,r1
> +	srw	r0,r1,r2
> +	mtspr   SPRN_SPRG6,r1
> +	mfspr	r2,SPRN_SRR1
> +	rlwimi	r2,r0,31-14,14,14
> +	mtspr   SPRN_SRR1,r2
> +#endif
> 	tlbld	r3
> -	mfspr	r3,SPRN_SRR1		/* Need to restore CR0 */
> -	mtcrf	0x80,r3
> 	rfi

- k
David Jander March 17, 2009, 10:38 a.m. UTC | #5
On Monday 16 March 2009 19:05:00 Kumar Gala wrote:
> On Mar 16, 2009, at 10:52 AM, David Jander wrote:
> > Complete workaround for DTLB errata in e300c2/c3/c4 processors.
> >
> > Due to the bug, the hardware-implemented LRU algorythm always goes
> > to way
> > 1 of the TLB. This fix implements the proposed software workaround in
> > form of a LRW table for chosing the TLB-way.
> >
> > Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
> > Signed-off-by: David Jander <david@protonic.nl>
> >
> > ---
> > diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/
> > head_32.S
> > index 0f4fac5..3971ee4 100644
> > --- a/arch/powerpc/kernel/head_32.S
> > +++ b/arch/powerpc/kernel/head_32.S
> > @@ -578,9 +578,21 @@ DataLoadTLBMiss:
> > 	andc	r1,r3,r1		/* PP = user? (rw&dirty? 2: 3): 0 */
> > 	mtspr	SPRN_RPA,r1
> > 	mfspr	r3,SPRN_DMISS
> > +	mfspr	r2,SPRN_SRR1		/* Need to restore CR0 */
> > +	mtcrf	0x80,r2
> > +#ifdef CONFIG_PPC_MPC512x
> > +	li	r0,1
> > +	mfspr	r1,SPRN_SPRG6
> > +	rlwinm	r2,r3,17,27,31		/* Get Address bits 19:15 */
>
> Don't we want:
> 	rlwinm	r2,r3,20,27,31		/* Get address bits 15:19 */

Hmm, you are right, I must have accidentally counted LE bit-ordering ;-)
Only funny part is, that I don't measure any performance difference after 
fixing this.... *sigh*.

Regards,
Kumar Gala March 17, 2009, 12:43 p.m. UTC | #6
On Mar 17, 2009, at 5:38 AM, David Jander wrote:

> On Monday 16 March 2009 19:05:00 Kumar Gala wrote:
>> On Mar 16, 2009, at 10:52 AM, David Jander wrote:
>>> Complete workaround for DTLB errata in e300c2/c3/c4 processors.
>>>
>>> Due to the bug, the hardware-implemented LRU algorythm always goes
>>> to way
>>> 1 of the TLB. This fix implements the proposed software workaround  
>>> in
>>> form of a LRW table for chosing the TLB-way.
>>>
>>> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
>>> Signed-off-by: David Jander <david@protonic.nl>
>>>
>>> ---
>>> diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/
>>> head_32.S
>>> index 0f4fac5..3971ee4 100644
>>> --- a/arch/powerpc/kernel/head_32.S
>>> +++ b/arch/powerpc/kernel/head_32.S
>>> @@ -578,9 +578,21 @@ DataLoadTLBMiss:
>>> 	andc	r1,r3,r1		/* PP = user? (rw&dirty? 2: 3): 0 */
>>> 	mtspr	SPRN_RPA,r1
>>> 	mfspr	r3,SPRN_DMISS
>>> +	mfspr	r2,SPRN_SRR1		/* Need to restore CR0 */
>>> +	mtcrf	0x80,r2
>>> +#ifdef CONFIG_PPC_MPC512x
>>> +	li	r0,1
>>> +	mfspr	r1,SPRN_SPRG6
>>> +	rlwinm	r2,r3,17,27,31		/* Get Address bits 19:15 */
>>
>> Don't we want:
>> 	rlwinm	r2,r3,20,27,31		/* Get address bits 15:19 */
>
> Hmm, you are right, I must have accidentally counted LE bit- 
> ordering ;-)
> Only funny part is, that I don't measure any performance difference  
> after
> fixing this.... *sigh*.

I wouldn't expect it to.  You are picking some bits to hash on.  You  
picked some different bits but the distribution of addresses was  
probably similiar.

- k
Wolfgang Denk June 6, 2009, 10:07 p.m. UTC | #7
Dear David Jander,

In message <200903161652.09747.david.jander@protonic.nl> you wrote:
> Complete workaround for DTLB errata in e300c2/c3/c4 processors.
> 
> Due to the bug, the hardware-implemented LRU algorythm always goes to way
> 1 of the TLB. This fix implements the proposed software workaround in
> form of a LRW table for chosing the TLB-way.
> 
> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
> Signed-off-by: David Jander <david@protonic.nl>

What is the actual status of this patch?

Patchwork (http://patchwork.ozlabs.org/patch/24502/) says it's
"superseded" - but by what?

I can't see such code in mainline - what happened to it?

Best regards,

Wolfgang Denk
Benjamin Herrenschmidt June 6, 2009, 10:42 p.m. UTC | #8
On Sun, 2009-06-07 at 00:07 +0200, Wolfgang Denk wrote:
> Dear David Jander,
> 
> In message <200903161652.09747.david.jander@protonic.nl> you wrote:
> > Complete workaround for DTLB errata in e300c2/c3/c4 processors.
> > 
> > Due to the bug, the hardware-implemented LRU algorythm always goes to way
> > 1 of the TLB. This fix implements the proposed software workaround in
> > form of a LRW table for chosing the TLB-way.
> > 
> > Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
> > Signed-off-by: David Jander <david@protonic.nl>
> 
> What is the actual status of this patch?
> 
> Patchwork (http://patchwork.ozlabs.org/patch/24502/) says it's
> "superseded" - but by what?
> 
> I can't see such code in mainline - what happened to it?

I can see the code in mainline ... but only in the -data- TLB miss
handler, not the instruction one...

Kumar ? Shouldn't we have the workaround in both ?

Cheers,
Ben.
Kumar Gala June 8, 2009, 6:16 p.m. UTC | #9
On Jun 6, 2009, at 5:42 PM, Benjamin Herrenschmidt wrote:

> On Sun, 2009-06-07 at 00:07 +0200, Wolfgang Denk wrote:
>> Dear David Jander,
>>
>> In message <200903161652.09747.david.jander@protonic.nl> you wrote:
>>> Complete workaround for DTLB errata in e300c2/c3/c4 processors.
>>>
>>> Due to the bug, the hardware-implemented LRU algorythm always goes  
>>> to way
>>> 1 of the TLB. This fix implements the proposed software workaround  
>>> in
>>> form of a LRW table for chosing the TLB-way.
>>>
>>> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
>>> Signed-off-by: David Jander <david@protonic.nl>
>>
>> What is the actual status of this patch?
>>
>> Patchwork (http://patchwork.ozlabs.org/patch/24502/) says it's
>> "superseded" - but by what?
>>
>> I can't see such code in mainline - what happened to it?
>
> I can see the code in mainline ... but only in the -data- TLB miss
> handler, not the instruction one...
>
> Kumar ? Shouldn't we have the workaround in both ?

The errata was only for the d-side.  The patch is in the mainline  
kernel.

- k
diff mbox

Patch

diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 0f4fac5..3971ee4 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -578,9 +578,21 @@  DataLoadTLBMiss:
 	andc	r1,r3,r1		/* PP = user? (rw&dirty? 2: 3): 0 */
 	mtspr	SPRN_RPA,r1
 	mfspr	r3,SPRN_DMISS
+	mfspr	r2,SPRN_SRR1		/* Need to restore CR0 */
+	mtcrf	0x80,r2
+#ifdef CONFIG_PPC_MPC512x
+	li	r0,1
+	mfspr	r1,SPRN_SPRG6
+	rlwinm	r2,r3,17,27,31		/* Get Address bits 19:15 */
+	slw	r0,r0,r2
+	xor	r1,r0,r1
+	srw	r0,r1,r2
+	mtspr   SPRN_SPRG6,r1
+	mfspr	r2,SPRN_SRR1
+	rlwimi	r2,r0,31-14,14,14
+	mtspr   SPRN_SRR1,r2
+#endif
 	tlbld	r3
-	mfspr	r3,SPRN_SRR1		/* Need to restore CR0 */
-	mtcrf	0x80,r3
 	rfi
 DataAddressInvalid:
 	mfspr	r3,SPRN_SRR1
@@ -646,9 +658,21 @@  DataStoreTLBMiss:
 	andc	r1,r3,r1		/* PP = user? 2: 0 */
 	mtspr	SPRN_RPA,r1
 	mfspr	r3,SPRN_DMISS
+	mfspr	r2,SPRN_SRR1		/* Need to restore CR0 */
+	mtcrf	0x80,r2
+#ifdef CONFIG_PPC_MPC512x
+	li	r0,1
+	mfspr	r1,SPRN_SPRG6
+	rlwinm	r2,r3,17,27,31		/* Get Address bits 19:15 */
+	slw	r0,r0,r2
+	xor	r1,r0,r1
+	srw	r0,r1,r2
+	mtspr   SPRN_SPRG6,r1
+	mfspr	r2,SPRN_SRR1
+	rlwimi	r2,r0,31-14,14,14
+	mtspr   SPRN_SRR1,r2
+#endif
 	tlbld	r3
-	mfspr	r3,SPRN_SRR1		/* Need to restore CR0 */
-	mtcrf	0x80,r3
 	rfi
 
 #ifndef CONFIG_ALTIVEC