Patchwork powerpc/8xx: fix regression introduced by cache coherency rewrite

login
register
mail settings
Submitter Joakim Tjernlund
Date Sept. 30, 2009, 5:23 p.m.
Message ID <OFC29CE070.42D58057-ONC1257641.005F19FD-C1257641.005F8ED3@transmode.se>
Download mbox | patch
Permalink /patch/34609/
State Superseded
Headers show

Comments

Joakim Tjernlund - Sept. 30, 2009, 5:23 p.m.
>
> Rex Feany <RFeany@mrv.com> wrote on 30/09/2009 11:00:02:
> >
> > Thus spake Joakim Tjernlund (joakim.tjernlund@transmode.se):
> >
> > > > Ok, I have made some minor tweaks and added debug code in
> > > > do_page_fault(). Would be great if you could try on both
> > > > .31 and top of tree.
> > > >
> > > >  Jocke
> > >
> > > OOPS, found a bug. Use this one instead:
> >
> > .31 - no change, it worked before your patch and it works after.
> > None of your debugging printks show up. I tried removing the tlbil_va()
> > from do_dcache_icache_coherency() and userspace goes back to be being
> > slow/non functional.
>
> Had a look at linus tree and there is something I don't understand.
> Your fix, e0908085fc2391c85b85fb814ae1df377c8e0dcb, fixes a problem
> that was introduced by 8d30c14cab30d405a05f2aaceda1e9ad57800f36 but
> 8d30c14cab30d405a05f2aaceda1e9ad57800f36 was included in .31 and .31
> works and top of tree does not, how can that be?
>
> To me it seems more likely that some mm change introduced between .31 and
> top of tree is the culprit.
> My patch addresses the problem described in the comment:
> /* On 8xx, cache control instructions (particularly
>  * "dcbst" from flush_dcache_icache) fault as write
>  * operation if there is an unpopulated TLB entry
>  * for the address in question. To workaround that,
>  * we invalidate the TLB here, thus avoiding dcbst
>  * misbehaviour.
>  */
> Now you are using this old fix to paper over some other bug or so I think.

There is something fishy with the TLB status, looking into the mpc862 manual I
don't see how it can work reliably. Need to dwell some more.
Anyhow, I have incorporated some of my findings into a new patch,
perhaps this will be the golden one?
Benjamin Herrenschmidt - Sept. 30, 2009, 10:35 p.m.
> > Had a look at linus tree and there is something I don't understand.
> > Your fix, e0908085fc2391c85b85fb814ae1df377c8e0dcb, fixes a problem
> > that was introduced by 8d30c14cab30d405a05f2aaceda1e9ad57800f36 but
> > 8d30c14cab30d405a05f2aaceda1e9ad57800f36 was included in .31 and .31
> > works and top of tree does not, how can that be?
> >
> > To me it seems more likely that some mm change introduced between .31 and
> > top of tree is the culprit.
> > My patch addresses the problem described in the comment:
> > /* On 8xx, cache control instructions (particularly
> >  * "dcbst" from flush_dcache_icache) fault as write
> >  * operation if there is an unpopulated TLB entry
> >  * for the address in question. To workaround that,
> >  * we invalidate the TLB here, thus avoiding dcbst
> >  * misbehaviour.
> >  */
> > Now you are using this old fix to paper over some other bug or so I think.
> 
> There is something fishy with the TLB status, looking into the mpc862 manual I
> don't see how it can work reliably. Need to dwell some more.
> Anyhow, I have incorporated some of my findings into a new patch,
> perhaps this will be the golden one?

Well, I still wonder about what whole unpopulated entry business.

>From what I can see, the TLB miss code will check _PAGE_PRESENT, and
when not set, it will -still- insert something into the TLB (unlike
all other CPU types that go straight to data access faults from there).

So we end up populating with those unpopulated entries that will then
cause us to take a DSI (or ISI) instead of a TLB miss the next time
around ... and so we need to remove them once we do that, no ? IE. Once
we have actually put a valid PTE in.

At least that's my understanding and why I believe we should always have
tlbil_va() in set_pte() and ptep_set_access_flags(), which boils down
in putting it into the 2 "filter" functions in the new code.

Well.. actually, the ptep_set_access_flags() will already do a
flush_tlb_page_nohash() when the PTE is changed. So I suppose all we
really need here would be in set_pte_filter(), to do the same if we are
writing a PTE that has _PAGE_PRESENT, at least on 8xx.

But just unconditionally doing a tlbil_va() in both the filter functions
shouldn't harm and also fix the problem, though Rex seems to indicate
that is not the case.

Now, we -might- have something else broken on 8xx, hard to tell. You may
want to try to bisect, adding back the removed tlbil_va() as you go
backward, between .31 and upstream... 

Cheers,
Ben.
Joakim Tjernlund - Oct. 1, 2009, 7:05 a.m.
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 01/10/2009 00:35:59:
>
>
> > > Had a look at linus tree and there is something I don't understand.
> > > Your fix, e0908085fc2391c85b85fb814ae1df377c8e0dcb, fixes a problem
> > > that was introduced by 8d30c14cab30d405a05f2aaceda1e9ad57800f36 but
> > > 8d30c14cab30d405a05f2aaceda1e9ad57800f36 was included in .31 and .31
> > > works and top of tree does not, how can that be?
> > >
> > > To me it seems more likely that some mm change introduced between .31 and
> > > top of tree is the culprit.
> > > My patch addresses the problem described in the comment:
> > > /* On 8xx, cache control instructions (particularly
> > >  * "dcbst" from flush_dcache_icache) fault as write
> > >  * operation if there is an unpopulated TLB entry
> > >  * for the address in question. To workaround that,
> > >  * we invalidate the TLB here, thus avoiding dcbst
> > >  * misbehaviour.
> > >  */
> > > Now you are using this old fix to paper over some other bug or so I think.
> >
> > There is something fishy with the TLB status, looking into the mpc862 manual I
> > don't see how it can work reliably. Need to dwell some more.
> > Anyhow, I have incorporated some of my findings into a new patch,
> > perhaps this will be the golden one?
>
> Well, I still wonder about what whole unpopulated entry business.

Yes, is a odd compared to the other ppc arch. I know why it is there
and I also know that there is alternative way to work around the problem
it is supposed to fix. I went back to the old way in my patch but it didn't
help. although I don't see why there is such a benefit to branch to DataAccess
directly, is it so expensive to to a DTLB error and then go to DataAccess?

>
> >From what I can see, the TLB miss code will check _PAGE_PRESENT, and

No, it will branch to before that happens.

> when not set, it will -still- insert something into the TLB (unlike
> all other CPU types that go straight to data access faults from there).

Yes, a non valid pte so that you will trap to DTLB Error.
The other arch also muck around to get load/store bit etc. and
accroding to the 8xx user manual there are no such info available, not even DAR:

Table 6-14. Register Settings after a Data TLB Miss Exception
Register  Setting
SRR0      Set to the EA of the instruction that caused the exception.
SRR1      1–4     0
          10–15 0
          Others Loaded from MSR[16-31]. SRR1[30] is cleared only by loading a zero from MSR[RI].
MSR       IP      No change
          ME      No change
          LE      Copied from the ILE setting of the interrupted process
          Other 0

However from the past we know that DAR is avaliable (but not for dcbX insn though)

For a TLB Error there are:
Table 6-16. Register Settings after a Data TLB Error Exception
Register Setting
SRR0     Set to the EA of the instruction that caused the exception.
SRR1     1–4     0
         10–15 0
         Others—Loaded from MSR[16-31]. SRR1[30] is cleared only by loading a zero from MSR[RI].
MSR      IP      No change
         ME      No change
         LE      Copied from the ILE setting of the interrupted process
         Other   0
DSISR    0       0
         1       Set if the translation of an attempted access is not found in the translation tables. Otherwise, cleared
         2–3     0
         4       Set if the memory access is not permitted by the protection mechanism; otherwise cleared
         5       0
         6       1 for a store operation; 0 for a load operation.
         7–31    0
DAR      Set to the EA of the data access that caused the exception.

For completeness here are ITLB Miss/Error too:
Table 6-13. Register Settings after an Instruction TLB Miss Exception
Register  Setting
SRR0      Set to the EA of the instruction that caused the exception.
SRR1      0–3      0
          4       1
          10      1
          11–15   0
          Others  Loaded from MSR[16-31]. SRR1[30] is cleared only by loading a zero from MSR[RI].
MSR       IP      No change
          ME      No change
          LE      Copied from the ILE setting of the interrupted process
          Other 0

Table 6-15. Register Settings after an Instruction TLB Error Exception
Register Setting
SRR0     Set to the EA of the instruction that caused the exception.
SRR1     Note: Only one of bits 1, 3, and 4 will be set.
         1       1 if the translation of an attempted access is not in the translation tables. Otherwise 0
         2       0
         3       1 if the fetch access was to guarded memory when MSR[IR] = 1. Otherwise 0
         4       1 if the access is not permitted by the protection mechanism; otherwise 0.
         11–15 0
         Others Loaded from MSR[16-31]. SRR1[30] is cleared only by loading a zero from MSR[RI].
MSR      IP      No change
         ME      No change
         LE      Copied from the ILE setting of the interrupted process
         Other   0

>
> So we end up populating with those unpopulated entries that will then
> cause us to take a DSI (or ISI) instead of a TLB miss the next time
> around ... and so we need to remove them once we do that, no ? IE. Once
> we have actually put a valid PTE in.
>
> At least that's my understanding and why I believe we should always have
> tlbil_va() in set_pte() and ptep_set_access_flags(), which boils down
> in putting it into the 2 "filter" functions in the new code.
>
> Well.. actually, the ptep_set_access_flags() will already do a
> flush_tlb_page_nohash() when the PTE is changed. So I suppose all we
> really need here would be in set_pte_filter(), to do the same if we are
> writing a PTE that has _PAGE_PRESENT, at least on 8xx.

I am pondering another idea. I suspect that this bit is set for non-valid ptes:
1       Set if the translation of an attempted access is not found in the translation tables. Otherwise, cleared
So perhaps we can test this bit in do_page_fault() and then do the
tlbil_va() directly?

>
> But just unconditionally doing a tlbil_va() in both the filter functions
> shouldn't harm and also fix the problem, though Rex seems to indicate
> that is not the case.
>
> Now, we -might- have something else broken on 8xx, hard to tell. You may
> want to try to bisect, adding back the removed tlbil_va() as you go
> backward, between .31 and upstream...
>
> Cheers,
> Ben.
>
>
>
>
Scott Wood - Oct. 2, 2009, 9:49 p.m.
On Thu, Oct 01, 2009 at 08:35:59AM +1000, Benjamin Herrenschmidt wrote:
> >From what I can see, the TLB miss code will check _PAGE_PRESENT, and
> when not set, it will -still- insert something into the TLB (unlike
> all other CPU types that go straight to data access faults from there).
> 
> So we end up populating with those unpopulated entries that will then
> cause us to take a DSI (or ISI) instead of a TLB miss the next time
> around ... and so we need to remove them once we do that, no ? IE. Once
> we have actually put a valid PTE in.
> 
> At least that's my understanding and why I believe we should always have
> tlbil_va() in set_pte() and ptep_set_access_flags(), which boils down
> in putting it into the 2 "filter" functions in the new code.
> 
> Well.. actually, the ptep_set_access_flags() will already do a
> flush_tlb_page_nohash() when the PTE is changed. So I suppose all we
> really need here would be in set_pte_filter(), to do the same if we are
> writing a PTE that has _PAGE_PRESENT, at least on 8xx.
> 
> But just unconditionally doing a tlbil_va() in both the filter functions
> shouldn't harm and also fix the problem, though Rex seems to indicate
> that is not the case.

Adding a tlbil_va to do_page_fault makes the problem go away for me (on
top of your "merge" branch) -- none of the other changes in this thread
do (assuming I didn't miss any).  FWIW, when it gets stuck on a fault,
DSISR is 0xc0000000, and handle_mm_fault returns zero.

-Scott
Benjamin Herrenschmidt - Oct. 2, 2009, 10:04 p.m.
On Fri, 2009-10-02 at 16:49 -0500, Scott Wood wrote:
> Adding a tlbil_va to do_page_fault makes the problem go away for me (on
> top of your "merge" branch) -- none of the other changes in this thread
> do (assuming I didn't miss any).  FWIW, when it gets stuck on a fault,
> DSISR is 0xc0000000, and handle_mm_fault returns zero. 

But in that case, it should hit ptep_set_access_flags() (via
handle_mm_fault) and from there call tlbil_va (provided we add a call to
it in the relevant filter function which I proposed initially), no ?

Ben.
Joakim Tjernlund - Oct. 4, 2009, 8:10 p.m.
Scott Wood <scottwood@freescale.com> wrote on 02/10/2009 23:49:49:
>
> On Thu, Oct 01, 2009 at 08:35:59AM +1000, Benjamin Herrenschmidt wrote:
> > >From what I can see, the TLB miss code will check _PAGE_PRESENT, and
> > when not set, it will -still- insert something into the TLB (unlike
> > all other CPU types that go straight to data access faults from there).
> >
> > So we end up populating with those unpopulated entries that will then
> > cause us to take a DSI (or ISI) instead of a TLB miss the next time
> > around ... and so we need to remove them once we do that, no ? IE. Once
> > we have actually put a valid PTE in.
> >
> > At least that's my understanding and why I believe we should always have
> > tlbil_va() in set_pte() and ptep_set_access_flags(), which boils down
> > in putting it into the 2 "filter" functions in the new code.
> >
> > Well.. actually, the ptep_set_access_flags() will already do a
> > flush_tlb_page_nohash() when the PTE is changed. So I suppose all we
> > really need here would be in set_pte_filter(), to do the same if we are
> > writing a PTE that has _PAGE_PRESENT, at least on 8xx.
> >
> > But just unconditionally doing a tlbil_va() in both the filter functions
> > shouldn't harm and also fix the problem, though Rex seems to indicate
> > that is not the case.
>
> Adding a tlbil_va to do_page_fault makes the problem go away for me (on
> top of your "merge" branch) -- none of the other changes in this thread
> do (assuming I didn't miss any).  FWIW, when it gets stuck on a fault,
> DSISR is 0xc0000000, and handle_mm_fault returns zero.

Scott and Rex

I have managed to update the TLB code to make proper use of dirty and accessed states.
Advantages are:
 - I/D TLB Miss never needs to write to the linux pte, saving a few cycles
 - Accessed is only set by I/D TLB Error, should be a plus when SWAP is used.
 - _PAGE_DIRTY is mapped to 0x100, the changed bit, and is set directly
    and there will be no extra DTLB Error to actually set the changed bit
    when a page has been made dirty.
- Proper RO/RW mapping of user space.

Cons:
 - 4 more insn in TLB Miss handlers, but the since the linux pte isn't
   written it should still be a win.

However, I did this on my 2.4 tree but I can port it to 2.6 if you guys
can test it for me.

 Jocke
Benjamin Herrenschmidt - Oct. 4, 2009, 8:28 p.m.
> I have managed to update the TLB code to make proper use of dirty and accessed states.
> Advantages are:
>  - I/D TLB Miss never needs to write to the linux pte, saving a few cycles

That's good, that leaves us with only 40x to fix now. Also we can remove
atomic updates of PTEs for all non-hash. It's pointless on those CPUs
anyway.

>  - Accessed is only set by I/D TLB Error, should be a plus when SWAP is used.

No need for that neither.

ISI/DSI shouldn't touch the PTE. They should just fall back to C code
which takes care of it all.l

>  - _PAGE_DIRTY is mapped to 0x100, the changed bit, and is set directly
>     and there will be no extra DTLB Error to actually set the changed bit
>     when a page has been made dirty.
> - Proper RO/RW mapping of user space.
> 
> Cons:
>  - 4 more insn in TLB Miss handlers, but the since the linux pte isn't
>    written it should still be a win.
>
> However, I did this on my 2.4 tree but I can port it to 2.6 if you guys
> can test it for me.

Why don't you use and test 2.6 ? :-)

Cheers,
Ben.
Joakim Tjernlund - Oct. 4, 2009, 8:45 p.m.
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 04/10/2009 22:28:38:
>
> > I have managed to update the TLB code to make proper use of dirty and accessed states.
> > Advantages are:
> >  - I/D TLB Miss never needs to write to the linux pte, saving a few cycles
>
> That's good, that leaves us with only 40x to fix now. Also we can remove
> atomic updates of PTEs for all non-hash. It's pointless on those CPUs
> anyway.
>
> >  - Accessed is only set by I/D TLB Error, should be a plus when SWAP is used.
>
> No need for that neither.

Since 8xx lacks HW support for ACCESSED, the only way is map
the page NoAccess and take a TLB Error on first access that sets
access bit (or bails to do_page_fault)

>
> ISI/DSI shouldn't touch the PTE. They should just fall back to C code
> which takes care of it all.l

Yes, that is what I do now(i.e I only read the pte). ISI and DSI is the
TLB Miss handlers on 8xx.

>
> >  - _PAGE_DIRTY is mapped to 0x100, the changed bit, and is set directly
> >     and there will be no extra DTLB Error to actually set the changed bit
> >     when a page has been made dirty.
> > - Proper RO/RW mapping of user space.
> >
> > Cons:
> >  - 4 more insn in TLB Miss handlers, but the since the linux pte isn't
> >    written it should still be a win.
> >
> > However, I did this on my 2.4 tree but I can port it to 2.6 if you guys
> > can test it for me.
>
> Why don't you use and test 2.6 ? :-)

Because porting my 8xx board to 2.6 isn't going to be easy so
I havn't yet. One day I might when we can't get away with 2.4 on our
old boards.
Joakim Tjernlund - Oct. 5, 2009, 7:16 p.m.
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 04/10/2009 22:28:38:
>
>
> > I have managed to update the TLB code to make proper use of dirty and accessed states.
> > Advantages are:
> >  - I/D TLB Miss never needs to write to the linux pte, saving a few cycles
>
> That's good, that leaves us with only 40x to fix now. Also we can remove
> atomic updates of PTEs for all non-hash. It's pointless on those CPUs
> anyway.
>
> >  - Accessed is only set by I/D TLB Error, should be a plus when SWAP is used.
>
> No need for that neither.
>
> ISI/DSI shouldn't touch the PTE. They should just fall back to C code
> which takes care of it all.l

Ben, for my understanding: It seems to that the TLB Miss routines in
head_32.S are less than optimal as it too touches the pte every time
it hits. Would it not be better to test if ACCESSED and friends are already set
and skip storing the same pte over and over again?

 Jocke
Scott Wood - Oct. 5, 2009, 7:28 p.m.
On Sat, Oct 03, 2009 at 08:04:33AM +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2009-10-02 at 16:49 -0500, Scott Wood wrote:
> > Adding a tlbil_va to do_page_fault makes the problem go away for me (on
> > top of your "merge" branch) -- none of the other changes in this thread
> > do (assuming I didn't miss any).  FWIW, when it gets stuck on a fault,
> > DSISR is 0xc0000000, and handle_mm_fault returns zero. 
> 
> But in that case, it should hit ptep_set_access_flags() (via
> handle_mm_fault) and from there call tlbil_va (provided we add a call to
> it in the relevant filter function which I proposed initially), no ?

Yes, it hits ptep_set_access_flags() and adding _tlbil_va there helps (I
didn't put it in the filter function, because that doesn't take address as a
parameter).  I'd misread your suggestion as referring to the various changes
to set_pte_filter() that were posted.

As for unconditionally invalidating in set_pte_filter(), that doesn't boot
for me unless I check for kernel addresses -- I guess we populate page
tables that overlap the pinned large page region? 

-Scott
Benjamin Herrenschmidt - Oct. 5, 2009, 8:28 p.m.
On Mon, 2009-10-05 at 21:16 +0200, Joakim Tjernlund wrote:
> Ben, for my understanding: It seems to that the TLB Miss routines in
> head_32.S are less than optimal as it too touches the pte every time
> it hits. Would it not be better to test if ACCESSED and friends are
> already set
> and skip storing the same pte over and over again?

I wouldn't think it's a big deal, but then, the 32-bit hash code
has to also update _PAGE_HASHPTE etc... overall I wouldn't touch
it for now.

However, 8xx should instead look at what I do in recent versions
of head_44x.S or what Kumar did in head_fsl_booke.S

Cheers,
Ben.
Benjamin Herrenschmidt - Oct. 5, 2009, 8:29 p.m.
On Mon, 2009-10-05 at 14:28 -0500, Scott Wood wrote:
> Yes, it hits ptep_set_access_flags() and adding _tlbil_va there helps (I
> didn't put it in the filter function, because that doesn't take address as a
> parameter).  I'd misread your suggestion as referring to the various changes
> to set_pte_filter() that were posted.
> 
> As for unconditionally invalidating in set_pte_filter(), that doesn't boot
> for me unless I check for kernel addresses -- I guess we populate page
> tables that overlap the pinned large page region? 

Good point. I think we do on 8xx. Does doing it in set_pte_filter()
(with the kernel check) makes any difference ? faster ? slower ? no
visible change ?

Cheers,
Ben.
Scott Wood - Oct. 5, 2009, 9:04 p.m.
Benjamin Herrenschmidt wrote:
> On Mon, 2009-10-05 at 14:28 -0500, Scott Wood wrote:
>> Yes, it hits ptep_set_access_flags() and adding _tlbil_va there helps (I
>> didn't put it in the filter function, because that doesn't take address as a
>> parameter).  I'd misread your suggestion as referring to the various changes
>> to set_pte_filter() that were posted.
>>
>> As for unconditionally invalidating in set_pte_filter(), that doesn't boot
>> for me unless I check for kernel addresses -- I guess we populate page
>> tables that overlap the pinned large page region? 
> 
> Good point. I think we do on 8xx. Does doing it in set_pte_filter()
> (with the kernel check) makes any difference ? faster ? slower ? no
> visible change ?

ptep_set_access_flags() is enough to make the worst of it go away.  I'm 
having another intermittent problem that seems to happen more often 
without the change in set_pte_filter(), but I've yet to narrow down 
what's actually going on there.

-Scott

Patch

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 4dd38f1..b3f6687 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -709,6 +709,14 @@  ret_from_except:
 	SYNC			/* Some chip revs have problems here... */
 	MTMSRD(r10)		/* disable interrupts */

+#ifdef CONFIG_8xx
+	/* Tag DAR with a well know value.
+	 * This needs to match head_8xx.S and
+	 * do_page_fault()
+	 */
+	li	r3, 0x00f0
+	mtspr	SPRN_DAR, r3
+#endif
 	lwz	r3,_MSR(r1)	/* Returning to user mode? */
 	andi.	r0,r3,MSR_PR
 	beq	resume_kernel
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 52ff8c5..de1fd58 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -39,6 +39,15 @@ 
 #else
 #define DO_8xx_CPU6(val, reg)
 #endif
+
+/* DAR needs to be tagged with a known value so that the
+ * DataTLB Miss/Error and do_page_fault() can recognize a
+ * buggy dcbx instruction and workaround the problem.
+ * dcbf, dcbi, dcbst, dcbz instructions do not update DAR
+ * when trapping into a Data TLB Miss/Error. See
+ * DataStoreTLBMiss and DataTLBError for details
+ */
+
 	__HEAD
 _ENTRY(_stext);
 _ENTRY(_start);
@@ -222,6 +231,7 @@  DataAccess:
 	stw	r10,_DSISR(r11)
 	mr	r5,r10
 	mfspr	r4,SPRN_DAR
+	stw	r4,_DAR(r11)
 	EXC_XFER_EE_LITE(0x300, handle_page_fault)

 /* Instruction access exception.
@@ -352,7 +362,7 @@  InstructionTLBMiss:
 	 * set.  All other Linux PTE bits control the behavior
 	 * of the MMU.
 	 */
-2:	li	r11, 0x00f0
+	li	r11, 0x00f0
 	rlwimi	r10, r11, 0, 24, 28	/* Set 24-27, clear 28 */
 	DO_8xx_CPU6(0x2d80, r3)
 	mtspr	SPRN_MI_RPN, r10	/* Update TLB entry */
@@ -365,6 +375,19 @@  InstructionTLBMiss:
 	lwz	r3, 8(r0)
 #endif
 	rfi
+2:
+	mfspr	r10, SPRN_SRR1
+	rlwinm	r10,r10,0,5,3 /* clear bit 4(0x08000000) */
+	mtspr	SPRN_SRR1, r10
+
+	mfspr	r10, SPRN_M_TW	/* Restore registers */
+	lwz	r11, 0(r0)
+	mtcr	r11
+	lwz	r11, 4(r0)
+#ifdef CONFIG_8xx_CPU6
+	lwz	r3, 8(r0)
+#endif
+	b	InstructionAccess

 	. = 0x1200
 DataStoreTLBMiss:
@@ -428,10 +451,11 @@  DataStoreTLBMiss:
 	 * set.  All other Linux PTE bits control the behavior
 	 * of the MMU.
 	 */
-2:	li	r11, 0x00f0
+	li	r11, 0x00f0
 	rlwimi	r10, r11, 0, 24, 28	/* Set 24-27, clear 28 */
 	DO_8xx_CPU6(0x3d80, r3)
 	mtspr	SPRN_MD_RPN, r10	/* Update TLB entry */
+	mtspr	SPRN_DAR, r11		/* Tag DAR */

 	mfspr	r10, SPRN_M_TW	/* Restore registers */
 	lwz	r11, 0(r0)
@@ -441,7 +465,18 @@  DataStoreTLBMiss:
 	lwz	r3, 8(r0)
 #endif
 	rfi
+2:
+	li	r10, 0
+	mtspr	SPRN_DSISR, r10 /* Clear all bits */

+	mfspr	r10, SPRN_M_TW	/* Restore registers */
+	lwz	r11, 0(r0)
+	mtcr	r11
+	lwz	r11, 4(r0)
+#ifdef CONFIG_8xx_CPU6
+	lwz	r3, 8(r0)
+#endif
+	b	DataAccess
 /* This is an instruction TLB error on the MPC8xx.  This could be due
  * to many reasons, such as executing guarded memory or illegal instruction
  * addresses.  There is nothing to do but handle a big time error fault.
@@ -492,6 +527,8 @@  DataTLBError:
 	 * assuming we only use the dcbi instruction on kernel addresses.
 	 */
 	mfspr	r10, SPRN_DAR
+	cmpwi	cr0, r10, 0x00f0	/* check if DAR holds a tag */
+	beq-	2f
 	rlwinm	r11, r10, 0, 0, 19
 	ori	r11, r11, MD_EVALID
 	mfspr	r10, SPRN_M_CASID
@@ -550,6 +587,7 @@  DataTLBError:
 	rlwimi	r10, r11, 0, 24, 28	/* Set 24-27, clear 28 */
 	DO_8xx_CPU6(0x3d80, r3)
 	mtspr	SPRN_MD_RPN, r10	/* Update TLB entry */
+	mtspr	SPRN_DAR, r11		/* Tag DAR */

 	mfspr	r10, SPRN_M_TW	/* Restore registers */
 	lwz	r11, 0(r0)