Message ID | 1376110162-3462-1-git-send-email-James.Yang@freescale.com (mailing list archive)
---|---
State | Superseded
Delegated to: | Michael Ellerman
On Sat, 10 Aug 2013, James Yang wrote:
> Uses of get_current() that normally get optimized away still result in
> a load instruction of the current pointer in 64-bit because the inline
> asm uses __volatile__. This patch removes __volatile__ so that nop-ed
> uses of get_current() don't actually result in a load of the pointer.
>
> Signed-off-by: James Yang <James.Yang@freescale.com>
> ---
>  arch/powerpc/include/asm/current.h | 2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/current.h b/arch/powerpc/include/asm/current.h
> index e2c7f06..bb250c8 100644
> --- a/arch/powerpc/include/asm/current.h
> +++ b/arch/powerpc/include/asm/current.h
> @@ -19,7 +19,7 @@ static inline struct task_struct *get_current(void)
>  {
>  	struct task_struct *task;
>
> -	__asm__ __volatile__("ld %0,%1(13)"
> +	__asm__ ("ld %0,%1(13)"
>  	    : "=r" (task)
>  	    : "i" (offsetof(struct paca_struct, __current)));

Hello,

Scott's been able to put enough doubt in me to think that this is not
entirely safe, even though the testing and code generation show it to
work.  Please reject this patch.

I think there is still value in getting the unnecessary loads to be
removed since it would also allow unnecessary conditional branches to
be removed.  I'll think about alternate ways to do this.

Regards,

--James
On Fri, 2013-08-23 at 18:40 -0500, James Yang wrote:
> On Sat, 10 Aug 2013, James Yang wrote:
>
> > Uses of get_current() that normally get optimized away still result in
> > a load instruction of the current pointer in 64-bit because the inline
> > asm uses __volatile__. This patch removes __volatile__ so that nop-ed
> > uses of get_current() don't actually result in a load of the pointer.
> >
> > Signed-off-by: James Yang <James.Yang@freescale.com>
> > ---
> >  arch/powerpc/include/asm/current.h | 2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/current.h b/arch/powerpc/include/asm/current.h
> > index e2c7f06..bb250c8 100644
> > --- a/arch/powerpc/include/asm/current.h
> > +++ b/arch/powerpc/include/asm/current.h
> > @@ -19,7 +19,7 @@ static inline struct task_struct *get_current(void)
> >  {
> >  	struct task_struct *task;
> >
> > -	__asm__ __volatile__("ld %0,%1(13)"
> > +	__asm__ ("ld %0,%1(13)"
> >  	    : "=r" (task)
> >  	    : "i" (offsetof(struct paca_struct, __current)));
>
> Hello,
>
> Scott's been able to put enough doubt in me to think that this is not
> entirely safe, even though the testing and code generation show it to
> work.  Please reject this patch.
>
> I think there is still value in getting the unnecessary loads to be
> removed since it would also allow unnecessary conditional branches to
> be removed.  I'll think about alternate ways to do this.

Actually, I changed my mind in the other direction in parallel. :-P

I think it's probably safe.

-Scott
On Fri, 2013-08-23 at 18:40 -0500, James Yang wrote:
> Scott's been able to put enough doubt in me to think that this is not
> entirely safe, even though the testing and code generation show it to
> work.  Please reject this patch.
>
> I think there is still value in getting the unnecessary loads to be
> removed since it would also allow unnecessary conditional branches to
> be removed.  I'll think about alternate ways to do this.

Hrm,

The problem has to do with PACA accesses moving around across preempt
boundaries.  It's a bit tricky, but in the case of "current" it
shouldn't be a problem: while the rest of the PACA might change
(CPU#, etc.), current remains stable from the point of view of a given
thread.

So I think the patch is fine.  Scott?

Now, we do need some serious rework of PACA accesses.  I'm very *VERY*
nervous with what we have now.  A bit of grepping shows dozens of cases
where gcc copies r13 into another register or even saves/restores it,
and it scares the shit out of me :-)

My thinking is to make r13 a hidden reg like we do (or used to) on
ppc32 with r2, and break down PACA access into two forms:

 - Direct access of a single field -> asm loads/stores inline

 - Anything else uses a get_paca/put_paca construct that includes a
   preempt_disable/enable (and maybe along with a
   __get_paca/__put_paca pair that doesn't).  This basically does a mr
   of r13 into another register and hides the whole lot from gcc.

The former would be used for single fields; the latter, while adding a
potentially unnecessary mr, will be much safer vs. gcc playing games
with r13.

Any volunteer?  Haven't had time to do it myself so far :-)

Cheers,
Ben.
On Fri, 2013-08-23 at 18:48 -0500, Scott Wood wrote:
> Actually, I changed my mind in the other direction in parallel. :-P
>
> I think it's probably safe.

Yes, I think it is as well... but only because "current" is special:
whatever the r13 for the thread is, r13->current will always be the
same value for that thread :-)

Note: that would NOT work if we used a C construct such as
local_paca->current, because in that case gcc might be stupid enough to
*copy* r13 to another reg, and later on dereference using that other
reg.  At that point, the paca pointer itself might become stale when
used.

Cheers,
Ben.
diff --git a/arch/powerpc/include/asm/current.h b/arch/powerpc/include/asm/current.h
index e2c7f06..bb250c8 100644
--- a/arch/powerpc/include/asm/current.h
+++ b/arch/powerpc/include/asm/current.h
@@ -19,7 +19,7 @@ static inline struct task_struct *get_current(void)
 {
 	struct task_struct *task;
 
-	__asm__ __volatile__("ld %0,%1(13)"
+	__asm__ ("ld %0,%1(13)"
 	    : "=r" (task)
 	    : "i" (offsetof(struct paca_struct, __current)));
Uses of get_current() that normally get optimized away still result in
a load instruction of the current pointer in 64-bit because the inline
asm uses __volatile__.  This patch removes __volatile__ so that nop-ed
uses of get_current() don't actually result in a load of the pointer.

Signed-off-by: James Yang <James.Yang@freescale.com>
---
 arch/powerpc/include/asm/current.h | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)