diff mbox

powerpc: add the missing required isync for the coherent icache flush

Message ID 20130819115029.GD20146@pek-khao-d1.corp.ad.wrs.com (mailing list archive)
State Superseded
Headers show

Commit Message

Kevin Hao Aug. 19, 2013, 11:50 a.m. UTC
On Mon, Aug 19, 2013 at 01:37:17PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2013-08-19 at 13:36 +1000, Benjamin Herrenschmidt wrote:
> 
> > The semantic of this function is to make data executable. Even if the
> > implementation has a snooping icache, it *still* needs to make sure
> > prefetched code is tossed out of the pipeline which is what isync
> > should provide.
> > 
> > The architecture actually specifies that.
> 
> In fact, on P5 and later, I think we are supposed to do a single dummy
> icbi followed by sync and isync.

Do you mean something like this? But why?


Thanks,
Kevin

> 
> Cheers,
> Ben.
> 
> > Cheers,
> > Ben.
> > 
> > > >  	blr
> > > >  END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
> > > >  	rlwinm	r3,r3,0,0,31-PAGE_SHIFT		/* Get page base address
> > > > */
> > > > @@ -474,6 +475,7 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_44x)
> > > >   */
> > > >  _GLOBAL(__flush_dcache_icache_phys)
> > > >  BEGIN_FTR_SECTION
> > > > +	isync
> > > >  	blr					/* for 601, do nothing */
> > > >  END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
> > > >  	mfmsr	r10
> > > > diff --git a/arch/powerpc/kernel/misc_64.S
> > > > b/arch/powerpc/kernel/misc_64.S
> > > > index 992a78e..d74fefb 100644
> > > > --- a/arch/powerpc/kernel/misc_64.S
> > > > +++ b/arch/powerpc/kernel/misc_64.S
> > > > @@ -69,6 +69,7 @@ PPC64_CACHES:
> > > > 
> > > >  _KPROBE(flush_icache_range)
> > > >  BEGIN_FTR_SECTION
> > > > +	isync
> > > >  	blr
> > > >  END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
> > > >  /*
> > > > --
> > > > 1.8.3.1
> > > > 
> > > > _______________________________________________
> > > > Linuxppc-dev mailing list
> > > > Linuxppc-dev@lists.ozlabs.org
> > > > https://lists.ozlabs.org/listinfo/linuxppc-dev
> > > 
> > 
> 
>

Comments

Benjamin Herrenschmidt Aug. 19, 2013, 12:45 p.m. UTC | #1
On Mon, 2013-08-19 at 19:50 +0800, Kevin Hao wrote:
> On Mon, Aug 19, 2013 at 01:37:17PM +1000, Benjamin Herrenschmidt wrote:
> > On Mon, 2013-08-19 at 13:36 +1000, Benjamin Herrenschmidt wrote:
> > 
> > > The semantic of this function is to make data executable. Even if the
> > > implementation has a snooping icache, it *still* needs to make sure
> > > prefetched code is tossed out of the pipeline which is what isync
> > > should provide.
> > > 
> > > The architecture actually specifies that.
> > 
> > In fact, on P5 and later, I think we are supposed to do a single dummy
> > icbi followed by sync and isync.
> 
> Do you mean something like this? But why?

It has to do with making sure prefetched and/or speculated instructions
are properly tossed.

My understanding is that icbi basically tells the ifetch buffers to drop
it, sync orders the icbi and isync ensures its execution has been
synchronized. At least I *think* that's the required sequence, I have to
dbl check the arch, maybe tomorrow. I wouldn't be surprise if we also
need a sync before the icbi to order the actual stores to memory that
might have modified instructions with the icbi.

Cheers,
Ben.

> diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
> index d74fefb..8fcdec7 100644
> --- a/arch/powerpc/kernel/misc_64.S
> +++ b/arch/powerpc/kernel/misc_64.S
> @@ -69,6 +69,8 @@ PPC64_CACHES:
>  
>  _KPROBE(flush_icache_range)
>  BEGIN_FTR_SECTION
> +	icbi	0,r3
> +	sync
>  	isync
>  	blr
>  END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
> @@ -209,6 +211,8 @@ _GLOBAL(flush_inval_dcache_range)
>   */
>  _GLOBAL(__flush_dcache_icache)
>  BEGIN_FTR_SECTION
> +	icbi	0,r3
> +	sync
>  	isync
>  	blr
>  END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
> 
> Thanks,
> Kevin
> 
> > 
> > Cheers,
> > Ben.
> > 
> > > Cheers,
> > > Ben.
> > > 
> > > > >  	blr
> > > > >  END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
> > > > >  	rlwinm	r3,r3,0,0,31-PAGE_SHIFT		/* Get page base address
> > > > > */
> > > > > @@ -474,6 +475,7 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_44x)
> > > > >   */
> > > > >  _GLOBAL(__flush_dcache_icache_phys)
> > > > >  BEGIN_FTR_SECTION
> > > > > +	isync
> > > > >  	blr					/* for 601, do nothing */
> > > > >  END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
> > > > >  	mfmsr	r10
> > > > > diff --git a/arch/powerpc/kernel/misc_64.S
> > > > > b/arch/powerpc/kernel/misc_64.S
> > > > > index 992a78e..d74fefb 100644
> > > > > --- a/arch/powerpc/kernel/misc_64.S
> > > > > +++ b/arch/powerpc/kernel/misc_64.S
> > > > > @@ -69,6 +69,7 @@ PPC64_CACHES:
> > > > > 
> > > > >  _KPROBE(flush_icache_range)
> > > > >  BEGIN_FTR_SECTION
> > > > > +	isync
> > > > >  	blr
> > > > >  END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
> > > > >  /*
> > > > > --
> > > > > 1.8.3.1
> > > > > 
> > > > > _______________________________________________
> > > > > Linuxppc-dev mailing list
> > > > > Linuxppc-dev@lists.ozlabs.org
> > > > > https://lists.ozlabs.org/listinfo/linuxppc-dev
> > > > 
> > > 
> > 
> >
Kevin Hao Aug. 20, 2013, 12:16 p.m. UTC | #2
On Mon, Aug 19, 2013 at 10:45:35PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2013-08-19 at 19:50 +0800, Kevin Hao wrote:
> > On Mon, Aug 19, 2013 at 01:37:17PM +1000, Benjamin Herrenschmidt wrote:
> > > On Mon, 2013-08-19 at 13:36 +1000, Benjamin Herrenschmidt wrote:
> > > 
> > > > The semantic of this function is to make data executable. Even if the
> > > > implementation has a snooping icache, it *still* needs to make sure
> > > > prefetched code is tossed out of the pipeline which is what isync
> > > > should provide.
> > > > 
> > > > The architecture actually specifies that.
> > > 
> > > In fact, on P5 and later, I think we are supposed to do a single dummy
> > > icbi followed by sync and isync.
> > 
> > Do you mean something like this? But why?
> 
> It has to do with making sure prefetched and/or speculated instructions
> are properly tossed.
> 
> My understanding is that icbi basically tells the ifetch buffers to drop
> it

Dummy question: What does the ifetch buffers mean? The instruction fetch
pipeline or instruction dispatch pipeline? Shouldn't all the prefetched
instructions in these buffers be discarded by isync?


>, sync orders the icbi and isync ensures its execution has been
> synchronized. At least I *think* that's the required sequence, I have to
> dbl check the arch, maybe tomorrow. I wouldn't be surprise if we also
> need a sync before the icbi to order the actual stores to memory that
> might have modified instructions with the icbi.

Doesn't the coherence between icache and dcache be maintained by the snooping?

Thanks,
Kevin

> 
> Cheers,
> Ben.
> 
> > diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
> > index d74fefb..8fcdec7 100644
> > --- a/arch/powerpc/kernel/misc_64.S
> > +++ b/arch/powerpc/kernel/misc_64.S
> > @@ -69,6 +69,8 @@ PPC64_CACHES:
> >  
> >  _KPROBE(flush_icache_range)
> >  BEGIN_FTR_SECTION
> > +	icbi	0,r3
> > +	sync
> >  	isync
> >  	blr
> >  END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
> > @@ -209,6 +211,8 @@ _GLOBAL(flush_inval_dcache_range)
> >   */
> >  _GLOBAL(__flush_dcache_icache)
> >  BEGIN_FTR_SECTION
> > +	icbi	0,r3
> > +	sync
> >  	isync
> >  	blr
> >  END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
> > 
> > Thanks,
> > Kevin
> > 
> > > 
> > > Cheers,
> > > Ben.
> > > 
> > > > Cheers,
> > > > Ben.
> > > > 
> > > > > >  	blr
> > > > > >  END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
> > > > > >  	rlwinm	r3,r3,0,0,31-PAGE_SHIFT		/* Get page base address
> > > > > > */
> > > > > > @@ -474,6 +475,7 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_44x)
> > > > > >   */
> > > > > >  _GLOBAL(__flush_dcache_icache_phys)
> > > > > >  BEGIN_FTR_SECTION
> > > > > > +	isync
> > > > > >  	blr					/* for 601, do nothing */
> > > > > >  END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
> > > > > >  	mfmsr	r10
> > > > > > diff --git a/arch/powerpc/kernel/misc_64.S
> > > > > > b/arch/powerpc/kernel/misc_64.S
> > > > > > index 992a78e..d74fefb 100644
> > > > > > --- a/arch/powerpc/kernel/misc_64.S
> > > > > > +++ b/arch/powerpc/kernel/misc_64.S
> > > > > > @@ -69,6 +69,7 @@ PPC64_CACHES:
> > > > > > 
> > > > > >  _KPROBE(flush_icache_range)
> > > > > >  BEGIN_FTR_SECTION
> > > > > > +	isync
> > > > > >  	blr
> > > > > >  END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
> > > > > >  /*
> > > > > > --
> > > > > > 1.8.3.1
> > > > > > 
> > > > > > _______________________________________________
> > > > > > Linuxppc-dev mailing list
> > > > > > Linuxppc-dev@lists.ozlabs.org
> > > > > > https://lists.ozlabs.org/listinfo/linuxppc-dev
> > > > > 
> > > > 
> > > 
> > > 
> 
>
Benjamin Herrenschmidt Aug. 20, 2013, 9:28 p.m. UTC | #3
On Tue, 2013-08-20 at 20:16 +0800, Kevin Hao wrote:

> Dummy question: What does the ifetch buffers mean? The instruction fetch
> pipeline or instruction dispatch pipeline? Shouldn't all the prefetched
> instructions in these buffers be discarded by isync?

Architecturally isync doesn't have to toss prefetch completely. It only
needs to make sense that context changes performed by previous
instructions (and interrupts/traps) happen at the point of the isync,
for example, it will ensure that a trap conditional is fully evaluated
before subsequent instruction execution, etc....

So in this case, it makes sure the icbi has been executed and it's the
icbi that invalidates the prefetched instructions.

> 
> >, sync orders the icbi and isync ensures its execution has been
> > synchronized. At least I *think* that's the required sequence, I have to
> > dbl check the arch, maybe tomorrow. I wouldn't be surprise if we also
> > need a sync before the icbi to order the actual stores to memory that
> > might have modified instructions with the icbi.
> 
> Doesn't the coherence between icache and dcache be maintained by the snooping?

The icache yes, but not the ifetch buffers (basically think of them as
buffering stages between icache and dispatch). At least that's my
understanding of the implementation. 

Cheers,
Ben.

> Thanks,
> Kevin
> 
> > 
> > Cheers,
> > Ben.
> > 
> > > diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
> > > index d74fefb..8fcdec7 100644
> > > --- a/arch/powerpc/kernel/misc_64.S
> > > +++ b/arch/powerpc/kernel/misc_64.S
> > > @@ -69,6 +69,8 @@ PPC64_CACHES:
> > >  
> > >  _KPROBE(flush_icache_range)
> > >  BEGIN_FTR_SECTION
> > > +	icbi	0,r3
> > > +	sync
> > >  	isync
> > >  	blr
> > >  END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
> > > @@ -209,6 +211,8 @@ _GLOBAL(flush_inval_dcache_range)
> > >   */
> > >  _GLOBAL(__flush_dcache_icache)
> > >  BEGIN_FTR_SECTION
> > > +	icbi	0,r3
> > > +	sync
> > >  	isync
> > >  	blr
> > >  END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
> > > 
> > > Thanks,
> > > Kevin
> > > 
> > > > 
> > > > Cheers,
> > > > Ben.
> > > > 
> > > > > Cheers,
> > > > > Ben.
> > > > > 
> > > > > > >  	blr
> > > > > > >  END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
> > > > > > >  	rlwinm	r3,r3,0,0,31-PAGE_SHIFT		/* Get page base address
> > > > > > > */
> > > > > > > @@ -474,6 +475,7 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_TYPE_44x)
> > > > > > >   */
> > > > > > >  _GLOBAL(__flush_dcache_icache_phys)
> > > > > > >  BEGIN_FTR_SECTION
> > > > > > > +	isync
> > > > > > >  	blr					/* for 601, do nothing */
> > > > > > >  END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
> > > > > > >  	mfmsr	r10
> > > > > > > diff --git a/arch/powerpc/kernel/misc_64.S
> > > > > > > b/arch/powerpc/kernel/misc_64.S
> > > > > > > index 992a78e..d74fefb 100644
> > > > > > > --- a/arch/powerpc/kernel/misc_64.S
> > > > > > > +++ b/arch/powerpc/kernel/misc_64.S
> > > > > > > @@ -69,6 +69,7 @@ PPC64_CACHES:
> > > > > > > 
> > > > > > >  _KPROBE(flush_icache_range)
> > > > > > >  BEGIN_FTR_SECTION
> > > > > > > +	isync
> > > > > > >  	blr
> > > > > > >  END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
> > > > > > >  /*
> > > > > > > --
> > > > > > > 1.8.3.1
> > > > > > > 
> > > > > > > _______________________________________________
> > > > > > > Linuxppc-dev mailing list
> > > > > > > Linuxppc-dev@lists.ozlabs.org
> > > > > > > https://lists.ozlabs.org/listinfo/linuxppc-dev
> > > > > > 
> > > > > 
> > > > 
> > > > 
> > 
> >
Kevin Hao Aug. 21, 2013, 11:49 a.m. UTC | #4
On Wed, Aug 21, 2013 at 07:28:54AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2013-08-20 at 20:16 +0800, Kevin Hao wrote:
> 
> > Dummy question: What does the ifetch buffers mean? The instruction fetch
> > pipeline or instruction dispatch pipeline? Shouldn't all the prefetched
> > instructions in these buffers be discarded by isync?
> 
> Architecturally isync doesn't have to toss prefetch completely. It only
> needs to make sense that context changes performed by previous
> instructions (and interrupts/traps) happen at the point of the isync,
> for example, it will ensure that a trap conditional is fully evaluated
> before subsequent instruction execution, etc....
> 
> So in this case, it makes sure the icbi has been executed and it's the
> icbi that invalidates the prefetched instructions.
> 
> > 
> > >, sync orders the icbi and isync ensures its execution has been
> > > synchronized. At least I *think* that's the required sequence, I have to
> > > dbl check the arch, maybe tomorrow. I wouldn't be surprise if we also
> > > need a sync before the icbi to order the actual stores to memory that
> > > might have modified instructions with the icbi.
> > 
> > Doesn't the coherence between icache and dcache be maintained by the snooping?
> 
> The icache yes, but not the ifetch buffers (basically think of them as
> buffering stages between icache and dispatch). At least that's my
> understanding of the implementation. 

OK. Thanks for the explanation. I will update the patch according to your
suggestions.

Thanks,
Kevin
diff mbox

Patch

diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index d74fefb..8fcdec7 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -69,6 +69,8 @@  PPC64_CACHES:
 
 _KPROBE(flush_icache_range)
 BEGIN_FTR_SECTION
+	icbi	0,r3
+	sync
 	isync
 	blr
 END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
@@ -209,6 +211,8 @@  _GLOBAL(flush_inval_dcache_range)
  */
 _GLOBAL(__flush_dcache_icache)
 BEGIN_FTR_SECTION
+	icbi	0,r3
+	sync
 	isync
 	blr
 END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)