Message ID | 200903161652.09747.david.jander@protonic.nl (mailing list archive)
---|---
State | Superseded, archived
In this patch, I placed the LRW table in SPRG6 like before, but Kumar's code
seems a little more compact, so I decided to use that one and fix it ;-)

It's a pity we seem to be one register short in the handler, so we need to
load SPRN_SRR1 twice :-(

Although the code path now has one instruction less than my previous version
most of the time (and two instructions more when the way is not adjusted),
benchmark results are barely affected by this:

1.- mplayer: Total time: 50.392s (50.316s previous patch)

2.- prboom timedemo: 20.1 fps (19.9 fps previous patch)

3.- memcpy speed: 179 MByte/s (180 MByte/s previous patch)

Conclusion: the difference between v4 and v5 is not measurable.

Best regards,
On Mon, 2009-03-16 at 16:52 +0100, David Jander wrote:
> Complete workaround for DTLB errata in e300c2/c3/c4 processors.
>
> Due to the bug, the hardware-implemented LRU algorithm always goes to way
> 1 of the TLB. This fix implements the proposed software workaround in
> form of a LRW table for choosing the TLB way.
>
> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
> Signed-off-by: David Jander <david@protonic.nl>

I think we have a winner, with one instruction slot left :)

I tried your v4 and v5 and could not see any difference in speed.

Acked-by: Kenneth Johansson <kenneth@southpole.se>
On Mar 16, 2009, at 11:09 AM, David Jander wrote:
> In this patch, I placed the LRW table in SPRG6 like before, but Kumar's code
> seems a little more compact, so I decided to use that one and fix it ;-)
>
> It's a pity we seem to be one register short in the handler, so we need to
> load SPRN_SRR1 twice :-(
>
> Although the code path now has one instruction less than my previous version
> most of the time (and two instructions more when the way is not adjusted),
> benchmark results are barely affected by this:
>
> 1.- mplayer: Total time: 50.392s (50.316s previous patch)
>
> 2.- prboom timedemo: 20.1 fps (19.9 fps previous patch)
>
> 3.- memcpy speed: 179 MByte/s (180 MByte/s previous patch)
>
> Conclusion: the difference between v4 and v5 is not measurable.

Can we try using memory instead of the SPRG to see if there is any noticeable difference?

- k
On Mar 16, 2009, at 10:52 AM, David Jander wrote:
> Complete workaround for DTLB errata in e300c2/c3/c4 processors.
>
> Due to the bug, the hardware-implemented LRU algorithm always goes to way
> 1 of the TLB. This fix implements the proposed software workaround in
> form of a LRW table for choosing the TLB way.
>
> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
> Signed-off-by: David Jander <david@protonic.nl>
>
> ---
> diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
> index 0f4fac5..3971ee4 100644
> --- a/arch/powerpc/kernel/head_32.S
> +++ b/arch/powerpc/kernel/head_32.S
> @@ -578,9 +578,21 @@ DataLoadTLBMiss:
> 	andc	r1,r3,r1		/* PP = user? (rw&dirty? 2: 3): 0 */
> 	mtspr	SPRN_RPA,r1
> 	mfspr	r3,SPRN_DMISS
> +	mfspr	r2,SPRN_SRR1	/* Need to restore CR0 */
> +	mtcrf	0x80,r2
> +#ifdef CONFIG_PPC_MPC512x
> +	li	r0,1
> +	mfspr	r1,SPRN_SPRG6
> +	rlwinm	r2,r3,17,27,31	/* Get Address bits 19:15 */

Don't we want:
	rlwinm	r2,r3,20,27,31	/* Get address bits 15:19 */

> +	slw	r0,r0,r2
> +	xor	r1,r0,r1
> +	srw	r0,r1,r2
> +	mtspr	SPRN_SPRG6,r1
> +	mfspr	r2,SPRN_SRR1
> +	rlwimi	r2,r0,31-14,14,14
> +	mtspr	SPRN_SRR1,r2
> +#endif
> 	tlbld	r3
> -	mfspr	r3,SPRN_SRR1	/* Need to restore CR0 */
> -	mtcrf	0x80,r3
> 	rfi

- k
On Monday 16 March 2009 19:05:00 Kumar Gala wrote:
> On Mar 16, 2009, at 10:52 AM, David Jander wrote:
> > Complete workaround for DTLB errata in e300c2/c3/c4 processors.
> >
> > Due to the bug, the hardware-implemented LRU algorithm always goes to way
> > 1 of the TLB. This fix implements the proposed software workaround in
> > form of a LRW table for choosing the TLB way.
> >
> > Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
> > Signed-off-by: David Jander <david@protonic.nl>
> >
> > ---
> > diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
> > index 0f4fac5..3971ee4 100644
> > --- a/arch/powerpc/kernel/head_32.S
> > +++ b/arch/powerpc/kernel/head_32.S
> > @@ -578,9 +578,21 @@ DataLoadTLBMiss:
> > 	andc	r1,r3,r1		/* PP = user? (rw&dirty? 2: 3): 0 */
> > 	mtspr	SPRN_RPA,r1
> > 	mfspr	r3,SPRN_DMISS
> > +	mfspr	r2,SPRN_SRR1	/* Need to restore CR0 */
> > +	mtcrf	0x80,r2
> > +#ifdef CONFIG_PPC_MPC512x
> > +	li	r0,1
> > +	mfspr	r1,SPRN_SPRG6
> > +	rlwinm	r2,r3,17,27,31	/* Get Address bits 19:15 */
>
> Don't we want:
> 	rlwinm	r2,r3,20,27,31	/* Get address bits 15:19 */

Hmm, you are right, I must have accidentally counted LE bit-ordering ;-)
The only funny part is that I don't measure any performance difference after
fixing this... *sigh*.

Regards,
On Mar 17, 2009, at 5:38 AM, David Jander wrote:
> On Monday 16 March 2009 19:05:00 Kumar Gala wrote:
>> On Mar 16, 2009, at 10:52 AM, David Jander wrote:
>>> Complete workaround for DTLB errata in e300c2/c3/c4 processors.
>>>
>>> Due to the bug, the hardware-implemented LRU algorithm always goes to way
>>> 1 of the TLB. This fix implements the proposed software workaround in
>>> form of a LRW table for choosing the TLB way.
>>>
>>> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
>>> Signed-off-by: David Jander <david@protonic.nl>
>>>
>>> ---
>>> diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
>>> index 0f4fac5..3971ee4 100644
>>> --- a/arch/powerpc/kernel/head_32.S
>>> +++ b/arch/powerpc/kernel/head_32.S
>>> @@ -578,9 +578,21 @@ DataLoadTLBMiss:
>>> 	andc	r1,r3,r1		/* PP = user? (rw&dirty? 2: 3): 0 */
>>> 	mtspr	SPRN_RPA,r1
>>> 	mfspr	r3,SPRN_DMISS
>>> +	mfspr	r2,SPRN_SRR1	/* Need to restore CR0 */
>>> +	mtcrf	0x80,r2
>>> +#ifdef CONFIG_PPC_MPC512x
>>> +	li	r0,1
>>> +	mfspr	r1,SPRN_SPRG6
>>> +	rlwinm	r2,r3,17,27,31	/* Get Address bits 19:15 */
>>
>> Don't we want:
>> 	rlwinm	r2,r3,20,27,31	/* Get address bits 15:19 */
>
> Hmm, you are right, I must have accidentally counted LE bit-ordering ;-)
> The only funny part is that I don't measure any performance difference after
> fixing this... *sigh*.

I wouldn't expect it to. You are picking some bits to hash on. You picked some
different bits, but the distribution of addresses was probably similar.

- k
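Kumar's correction hinges on the Power big-endian bit numbering (bit 0 is the MSB, bit 31 the LSB). As a sketch only (not kernel code; the helper name is mine), a small Python model of rlwinm shows why a shift of 20 extracts effective-address bits 15:19, while the original shift of 17 extracts bits 12:16 instead:

```python
def rlwinm(rs, sh, mb, me):
    """Model of PowerPC rlwinm: rotate rs left by sh, then AND with a
    mask covering big-endian bits mb..me (bit 0 = MSB, bit 31 = LSB)."""
    rot = ((rs << sh) | (rs >> (32 - sh))) & 0xFFFFFFFF
    mask = 0
    for b in range(mb, me + 1):
        mask |= 1 << (31 - b)   # big-endian bit b is value bit 31-b
    return rot & mask

ea = 0xDEADB000
# Shift 20: big-endian EA bits 15:19 land in the low 5 bits,
# the same value as (ea >> 12) & 0x1F.
assert rlwinm(ea, 20, 27, 31) == (ea >> 12) & 0x1F
# Shift 17 (the original code) extracts bits 12:16 instead,
# i.e. (ea >> 15) & 0x1F -- different bits, but still a usable hash input,
# which is why fixing it changed no benchmark numbers.
assert rlwinm(ea, 17, 27, 31) == (ea >> 15) & 0x1F
```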
Dear David Jander,

In message <200903161652.09747.david.jander@protonic.nl> you wrote:
> Complete workaround for DTLB errata in e300c2/c3/c4 processors.
>
> Due to the bug, the hardware-implemented LRU algorithm always goes to way
> 1 of the TLB. This fix implements the proposed software workaround in
> form of a LRW table for choosing the TLB way.
>
> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
> Signed-off-by: David Jander <david@protonic.nl>

What is the actual status of this patch?

Patchwork (http://patchwork.ozlabs.org/patch/24502/) says it's
"superseded" - but by what?

I can't see such code in mainline - what happened to it?

Best regards,

Wolfgang Denk
On Sun, 2009-06-07 at 00:07 +0200, Wolfgang Denk wrote:
> Dear David Jander,
>
> In message <200903161652.09747.david.jander@protonic.nl> you wrote:
> > Complete workaround for DTLB errata in e300c2/c3/c4 processors.
> >
> > Due to the bug, the hardware-implemented LRU algorithm always goes to way
> > 1 of the TLB. This fix implements the proposed software workaround in
> > form of a LRW table for choosing the TLB way.
> >
> > Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
> > Signed-off-by: David Jander <david@protonic.nl>
>
> What is the actual status of this patch?
>
> Patchwork (http://patchwork.ozlabs.org/patch/24502/) says it's
> "superseded" - but by what?
>
> I can't see such code in mainline - what happened to it?

I can see the code in mainline ... but only in the -data- TLB miss
handler, not the instruction one...

Kumar? Shouldn't we have the workaround in both?

Cheers,
Ben.
On Jun 6, 2009, at 5:42 PM, Benjamin Herrenschmidt wrote:
> On Sun, 2009-06-07 at 00:07 +0200, Wolfgang Denk wrote:
>> Dear David Jander,
>>
>> In message <200903161652.09747.david.jander@protonic.nl> you wrote:
>>> Complete workaround for DTLB errata in e300c2/c3/c4 processors.
>>>
>>> Due to the bug, the hardware-implemented LRU algorithm always goes to way
>>> 1 of the TLB. This fix implements the proposed software workaround in
>>> form of a LRW table for choosing the TLB way.
>>>
>>> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
>>> Signed-off-by: David Jander <david@protonic.nl>
>>
>> What is the actual status of this patch?
>>
>> Patchwork (http://patchwork.ozlabs.org/patch/24502/) says it's
>> "superseded" - but by what?
>>
>> I can't see such code in mainline - what happened to it?
>
> I can see the code in mainline ... but only in the -data- TLB miss
> handler, not the instruction one...
>
> Kumar? Shouldn't we have the workaround in both?

The erratum was only for the D-side. The patch is in the mainline kernel.

- k
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 0f4fac5..3971ee4 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -578,9 +578,21 @@ DataLoadTLBMiss:
 	andc	r1,r3,r1		/* PP = user? (rw&dirty? 2: 3): 0 */
 	mtspr	SPRN_RPA,r1
 	mfspr	r3,SPRN_DMISS
+	mfspr	r2,SPRN_SRR1	/* Need to restore CR0 */
+	mtcrf	0x80,r2
+#ifdef CONFIG_PPC_MPC512x
+	li	r0,1
+	mfspr	r1,SPRN_SPRG6
+	rlwinm	r2,r3,17,27,31	/* Get Address bits 19:15 */
+	slw	r0,r0,r2
+	xor	r1,r0,r1
+	srw	r0,r1,r2
+	mtspr	SPRN_SPRG6,r1
+	mfspr	r2,SPRN_SRR1
+	rlwimi	r2,r0,31-14,14,14
+	mtspr	SPRN_SRR1,r2
+#endif
 	tlbld	r3
-	mfspr	r3,SPRN_SRR1	/* Need to restore CR0 */
-	mtcrf	0x80,r3
 	rfi
 DataAddressInvalid:
 	mfspr	r3,SPRN_SRR1
@@ -646,9 +658,21 @@ DataStoreTLBMiss:
 	andc	r1,r3,r1		/* PP = user? 2: 0 */
 	mtspr	SPRN_RPA,r1
 	mfspr	r3,SPRN_DMISS
+	mfspr	r2,SPRN_SRR1	/* Need to restore CR0 */
+	mtcrf	0x80,r2
+#ifdef CONFIG_PPC_MPC512x
+	li	r0,1
+	mfspr	r1,SPRN_SPRG6
+	rlwinm	r2,r3,17,27,31	/* Get Address bits 19:15 */
+	slw	r0,r0,r2
+	xor	r1,r0,r1
+	srw	r0,r1,r2
+	mtspr	SPRN_SPRG6,r1
+	mfspr	r2,SPRN_SRR1
+	rlwimi	r2,r0,31-14,14,14
+	mtspr	SPRN_SRR1,r2
+#endif
 	tlbld	r3
-	mfspr	r3,SPRN_SRR1	/* Need to restore CR0 */
-	mtcrf	0x80,r3
 	rfi
 #ifndef CONFIG_ALTIVEC
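The sequence inside the #ifdef is a software LRU replacement: SPRG6 holds a 32-bit LRW bitmap indexed by a hash of the miss address, the indexed bit is toggled on every D-side TLB miss, and its new value is written into the SRR1 WAY field so that tlbld alternates between the two TLB ways instead of always replacing way 1. A Python sketch of that sequence (function and variable names are mine; this models the corrected shift of 20 discussed earlier in the thread, not the shift of 17 shown in the diff):

```python
def dtlb_miss_way(ea, sprg6):
    """Model the LRW update done before tlbld:
       li r0,1 ; slw ; xor ; srw ; mtspr SPRN_SPRG6 ; rlwimi into SRR1[WAY].
       Returns (way, new_sprg6)."""
    idx = (ea >> 12) & 0x1F       # rlwinm r2,r3,20,27,31: hash from EA bits 15:19
    sprg6 ^= 1 << idx             # slw + xor: toggle this address's LRW bit
    way = (sprg6 >> idx) & 1      # srw: read back the toggled bit
    return way, sprg6             # rlwimi puts `way` into SRR1 bit 14 for tlbld

# Repeated misses on the same EA alternate between the two ways,
# which is exactly what the broken hardware LRU fails to do.
sprg6 = 0
way_a, sprg6 = dtlb_miss_way(0x1000, sprg6)   # -> 1
way_b, sprg6 = dtlb_miss_way(0x1000, sprg6)   # -> 0
```

Because the bitmap toggle and read-back use the same index, each hash bucket strictly alternates ways, regardless of which address bits feed the hash; this is consistent with Kumar's observation that the shift-17/shift-20 difference is not measurable.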