Patchwork powerpc/ppc64: remove __volatile__ in get_current()

Submitter James Yang
Date Aug. 10, 2013, 4:49 a.m.
Message ID <1376110162-3462-1-git-send-email-James.Yang@freescale.com>
Permalink /patch/266186/
State New

Comments

James Yang - Aug. 10, 2013, 4:49 a.m.
Uses of get_current() that normally get optimized away still result in
a load instruction of the current pointer in 64-bit because the inline
asm uses __volatile__.  This patch removes __volatile__ so that nop-ed
uses of get_current() don't actually result in a load of the pointer.

Signed-off-by: James Yang <James.Yang@freescale.com>
---
 arch/powerpc/include/asm/current.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
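A portable sketch of the effect described above follows. It assumes GCC or Clang. `fake_paca`, `get_current_volatile()`, `get_current_plain()`, `nop_uses()` and `check_current()` are hypothetical names, and an empty asm template stands in for the ppc64 `ld` from r13. GCC documents that a non-volatile asm whose outputs are unused may be discarded. Marking it volatile keeps the asm, and with it the load that feeds its operand.

```c
#include <assert.h>

/* Hypothetical stand-ins: only the __current field name mirrors the patch. */
struct task_struct { int pid; };
struct paca_struct { int cpu; struct task_struct *__current; };

/* Hypothetical: a thread-local block standing in for the PACA at r13. */
static _Thread_local struct paca_struct fake_paca;

/* Volatile: GCC must keep the asm, so the load of fake_paca.__current
 * feeding its "+r" operand is kept even when the result is ignored. */
static inline struct task_struct *get_current_volatile(void)
{
	struct task_struct *task = fake_paca.__current;
	__asm__ __volatile__("" : "+r" (task));
	return task;
}

/* Non-volatile: the asm's only effect is its output, so if the result is
 * unused GCC may discard the asm and then the dead load as well. */
static inline struct task_struct *get_current_plain(void)
{
	struct task_struct *task = fake_paca.__current;
	__asm__("" : "+r" (task));
	return task;
}

/* "nop-ed" uses, e.g. a debug hook that was compiled out. At -O2 the
 * first line still loads fake_paca.__current; the second may emit nothing. */
static void nop_uses(void)
{
	(void)get_current_volatile();
	(void)get_current_plain();
}

/* When the result is used, both variants return the same pointer. */
static void check_current(const struct task_struct *expected)
{
	assert(get_current_volatile() == expected);
	assert(get_current_plain() == expected);
}
```

Comparing the -O2 -S output for nop_uses() is one way to check this on a given compiler. The exact code generated is compiler-dependent.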
James Yang - Aug. 23, 2013, 11:40 p.m.
On Sat, 10 Aug 2013, James Yang wrote:

> Uses of get_current() that normally get optimized away still result in
> a load instruction of the current pointer in 64-bit because the inline
> asm uses __volatile__.  This patch removes __volatile__ so that nop-ed
> uses of get_current() don't actually result in a load of the pointer.
> 
> Signed-off-by: James Yang <James.Yang@freescale.com>
> ---
>  arch/powerpc/include/asm/current.h |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/current.h b/arch/powerpc/include/asm/current.h
> index e2c7f06..bb250c8 100644
> --- a/arch/powerpc/include/asm/current.h
> +++ b/arch/powerpc/include/asm/current.h
> @@ -19,7 +19,7 @@ static inline struct task_struct *get_current(void)
>  {
>  	struct task_struct *task;
>  
> -	__asm__ __volatile__("ld %0,%1(13)"
> +	__asm__ ("ld %0,%1(13)"
>  	: "=r" (task)
>  	: "i" (offsetof(struct paca_struct, __current)));


Hello, 

Scott's been able to put enough doubt in me to think that this is not 
entirely safe, even though the testing and code generation show it to 
work.  Please reject this patch.

I think there is still value in getting the unnecessary loads to be 
removed since it would also allow unnecessary conditional branches to 
be removed.  I'll think about alternate ways to do this.

Regards,

--James
Scott Wood - Aug. 23, 2013, 11:48 p.m.
On Fri, 2013-08-23 at 18:40 -0500, James Yang wrote:
> Hello, 
> 
> Scott's been able to put enough doubt in me to think that this is not 
> entirely safe, even though the testing and code generation show it to 
> work.  Please reject this patch.
> 
> I think there is still value in getting the unnecessary loads to be 
> removed since it would also allow unnecessary conditional branches to 
> be removed.  I'll think about alternate ways to do this.

Actually, I changed my mind in the other direction in parallel. :-P

I think it's probably safe.

-Scott
Benjamin Herrenschmidt - Aug. 24, 2013, 12:20 a.m.
On Fri, 2013-08-23 at 18:40 -0500, James Yang wrote:
> Scott's been able to put enough doubt in me to think that this is not 
> entirely safe, even though the testing and code generation show it to 
> work.  Please reject this patch.
> 
> I think there is still value in getting the unnecessary loads to be 
> removed since it would also allow unnecessary conditional branches to 
> be removed.  I'll think about alternate ways to do this.

Hrm, the problem has to do with PACA accesses moving around across
preempt boundaries. It's a bit tricky, but in the case of "current" it
shouldn't be a problem: while the rest of the PACA might change (CPU#
etc...), current remains stable from the point of view of a given thread.

So I think the patch is fine.
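Ben's argument can be modelled with a toy scheduler. This is a hypothetical sketch, not kernel code. `pacas[]`, `run_on_cpu()`, `model_current()`, `model_cpu()` and the global `r13` are stand-ins, and the only invariant assumed is that each CPU's PACA names the task running on it. After the task migrates, a cached `current` value is still right, but a cached CPU number is not.

```c
#include <assert.h>

/* Hypothetical model types; only __current mirrors the real PACA. */
struct task_struct { int pid; };
struct paca_struct { int cpu; struct task_struct *__current; };

#define MODEL_NR_CPUS 2
static struct paca_struct pacas[MODEL_NR_CPUS] = { { .cpu = 0 }, { .cpu = 1 } };

/* Models r13 for the task being simulated: reloaded whenever that task
 * is (re)scheduled onto a CPU. */
static struct paca_struct *r13;

/* Hypothetical scheduler step: start running `task` on `cpu`. A CPU's
 * PACA always names the task currently running there. */
static void run_on_cpu(int cpu, struct task_struct *task)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	pacas[cpu].__current = task;
	r13 = &pacas[cpu];
}

/* Both read through the live r13, like "ld %0,%1(13)". */
static struct task_struct *model_current(void) { return r13->__current; }
static int model_cpu(void) { return r13->cpu; }
```

If gcc reuses a `current` value loaded before a preemption point, the answer is still correct. Reusing a CPU number loaded the same way would not be.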

Scott ?

Now, we do need some serious rework of PACA accesses. I'm very *VERY*
nervous about what we have now. A bit of grepping shows dozens of cases
where gcc copies r13 into another register or even saves/restores it, and
it scares the shit out of me :-)

My thinking is to make r13 a hidden reg like we do (or used to) on ppc32
with r2 and break down paca access into two forms:

 - Direct access of a single field -> asm loads/stores inline

 - Anything else uses a get_paca/put_paca construct that includes a
preempt_disable/enable (maybe along with a __get_paca/__put_paca
pair that doesn't). This does a mr (move register) of r13 into another
register and basically hides the whole lot from gcc.

The former would be used for single fields; the latter, while adding a
potentially unnecessary mr, would be much safer vs. gcc playing games
with r13.
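The two access forms proposed above might look roughly like this in portable C. Everything here is hypothetical. `hidden_r13` stands in for r13 made invisible to gcc, and `preempt_count_model` stands in for the real preempt count. Only the names `get_paca`/`put_paca`/`__get_paca`/`__put_paca` come from the proposal, and their bodies are a guess. A plain C global cannot actually be hidden from gcc, so this shows the shape of the API, not the register trick.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types; only __current mirrors the real PACA. */
struct task_struct { int pid; };
struct paca_struct { int cpu; struct task_struct *__current; };

static struct paca_struct the_paca = { .cpu = 3 };
/* Models the hidden r13: only the accessors below should touch it. */
static struct paca_struct *hidden_r13 = &the_paca;
static int preempt_count_model;  /* hypothetical preempt counter */

static void preempt_disable_model(void) { preempt_count_model++; }
static void preempt_enable_model(void)
{
	assert(preempt_count_model > 0);
	preempt_count_model--;
}

/* Form 1: direct single-field access. In the kernel this would be one
 * inline asm load at offsetof(struct paca_struct, field) from r13; here
 * the same base + offset arithmetic is done in C. */
#define paca_read(type, field) \
	(*(type *)((char *)hidden_r13 + offsetof(struct paca_struct, field)))

/* Form 2: anything else goes through get/put. The __ variants skip the
 * preemption handling; the plain ones pin the task while the pointer
 * is held. */
static struct paca_struct *__get_paca(void) { return hidden_r13; }
static void __put_paca(struct paca_struct *p) { (void)p; }

static struct paca_struct *get_paca(void)
{
	preempt_disable_model();
	return __get_paca();
}

static void put_paca(struct paca_struct *p)
{
	__put_paca(p);
	preempt_enable_model();
}
```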

Any volunteer ? Haven't had time to do it myself so far :-)

Cheers,
Ben.
Benjamin Herrenschmidt - Aug. 24, 2013, 12:22 a.m.
On Fri, 2013-08-23 at 18:48 -0500, Scott Wood wrote:
> Actually, I changed my mind in the other direction in parallel. :-P
> 
> I think it's probably safe.

Yes, I think it is as well ... but only because "current" is special and
whatever the r13 for the thread is, r13->current will always be the same
value for that thread :-)

Note: That would NOT work if we used a C construct such as
local_paca->current, because in that case, gcc might be stupid enough to
*copy* r13 to another reg, and later on dereference using that other
reg. At that point, the paca pointer itself might become stale when
used.
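The failure mode in the note above can be shown with a similar toy model. All names are hypothetical, and `r13` is a global standing in for the register of the one task being followed. Reading through the live r13 at the point of use gives the right task. A copy of r13 taken before a preemption point still points at the old CPU's PACA, so it returns whichever task runs there now.

```c
#include <assert.h>

/* Hypothetical model types; only __current mirrors the real PACA. */
struct task_struct { int pid; };
struct paca_struct { int cpu; struct task_struct *__current; };

static struct paca_struct pacas[2] = { { .cpu = 0 }, { .cpu = 1 } };

/* Models r13 of the one task we follow. */
static struct paca_struct *r13;

/* Hypothetical: some task takes over `cpu`. The followed task's r13 is
 * untouched because it is not the one being switched in. */
static void other_task_runs(int cpu, struct task_struct *t)
{
	assert(cpu == 0 || cpu == 1);
	pacas[cpu].__current = t;
}

/* Hypothetical: the followed task is (re)scheduled on `cpu`, and the
 * context switch reloads its r13. */
static void followed_task_runs(int cpu, struct task_struct *t)
{
	other_task_runs(cpu, t);
	r13 = &pacas[cpu];
}

/* Safe: go through the live r13 at the point of use, as the
 * "ld %0,%1(13)" asm does. */
static struct task_struct *current_via_r13(void) { return r13->__current; }

/* Unsafe: what gcc may emit for local_paca->current if it copied r13
 * into another register before a preemption point. */
static struct task_struct *current_via_copy(const struct paca_struct *copy)
{
	return copy->__current;
}
```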

Cheers,
Ben.

Patch

diff --git a/arch/powerpc/include/asm/current.h b/arch/powerpc/include/asm/current.h
index e2c7f06..bb250c8 100644
--- a/arch/powerpc/include/asm/current.h
+++ b/arch/powerpc/include/asm/current.h
@@ -19,7 +19,7 @@  static inline struct task_struct *get_current(void)
 {
 	struct task_struct *task;
 
-	__asm__ __volatile__("ld %0,%1(13)"
+	__asm__ ("ld %0,%1(13)"
 	: "=r" (task)
 	: "i" (offsetof(struct paca_struct, __current)));