diff mbox

[RFC] Emulate "lwsync" to run standard user land on e500 cores

Message ID 1382081880-6666-1-git-send-email-wd@denx.de (mailing list archive)
State RFC
Headers show

Commit Message

Wolfgang Denk Oct. 18, 2013, 7:38 a.m. UTC
Default Debian PowerPC doesn't work on e500 because the code contains
"lwsync" instructions, which are unsupported on this core.  As a
result, applications using this will crash with an "unhandled signal 4"
"Illegal instruction" error.

As a work around we add code to emulate this insn.  This is expensive
performance-wise, but allows to run standard user land code.

Signed-off-by: Wolfgang Denk <wd@denx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Scott Wood <scottwood@freescale.com>
---
I am aware that the clean solution to the problem is to build user
space with compiler options that match the target architecture.
However, sometimes this is just too much effort.

Also, of course the performance of such an emulation sucks. But the
the occurrence of such instructions is so rare that no significant
slowdown can be oserved.

I'm not sure if this should / could go into mainline.  I'm posting it
primarily so it can be found should anybody else need this.
- wd

 arch/powerpc/kernel/traps.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Scott Wood Oct. 18, 2013, 4:38 p.m. UTC | #1
On Fri, 2013-10-18 at 09:38 +0200, Wolfgang Denk wrote:
> Default Debian PowerPC doesn't work on e500 because the code contains
> "lwsync" instructions, which are unsupported on this core.  As a
> result, applications using this will crash with an "unhandled signal 4"
> "Illegal instruction" error.
> 
> As a work around we add code to emulate this insn.  This is expensive
> performance-wise, but allows to run standard user land code.
> 
> Signed-off-by: Wolfgang Denk <wd@denx.de>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Scott Wood <scottwood@freescale.com>
> ---
> I am aware that the clean solution to the problem is to build user
> space with compiler options that match the target architecture.
> However, sometimes this is just too much effort.
> 
> Also, of course the performance of such an emulation sucks. But the
> the occurrence of such instructions is so rare that no significant
> slowdown can be oserved.
> 
> I'm not sure if this should / could go into mainline.  I'm posting it
> primarily so it can be found should anybody else need this.
> - wd
> 
>  arch/powerpc/kernel/traps.c | 7 +++++++
>  1 file changed, 7 insertions(+)

There's already been a patch posted for this:
http://patchwork.ozlabs.org/patch/256747/

I plan to apply it for my next pull request.

-Scott
Wolfgang Denk Oct. 18, 2013, 6:50 p.m. UTC | #2
Dear Scott,

In message <1382114321.7979.840.camel@snotra.buserror.net> you wrote:
>
> There's already been a patch posted for this:
> http://patchwork.ozlabs.org/patch/256747/
> 
> I plan to apply it for my next pull request.

Ah, cool.  Thanks!

Best regards,

Wolfgang Denk
Kumar Gala Oct. 23, 2013, 5:07 a.m. UTC | #3
On Oct 18, 2013, at 2:38 AM, Wolfgang Denk wrote:

> Default Debian PowerPC doesn't work on e500 because the code contains
> "lwsync" instructions, which are unsupported on this core.  As a
> result, applications using this will crash with an "unhandled signal 4"
> "Illegal instruction" error.
> 
> As a work around we add code to emulate this insn.  This is expensive
> performance-wise, but allows to run standard user land code.
> 
> Signed-off-by: Wolfgang Denk <wd@denx.de>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Scott Wood <scottwood@freescale.com>
> ---
> I am aware that the clean solution to the problem is to build user
> space with compiler options that match the target architecture.
> However, sometimes this is just too much effort.
> 
> Also, of course the performance of such an emulation sucks. But the
> the occurrence of such instructions is so rare that no significant
> slowdown can be oserved.
> 
> I'm not sure if this should / could go into mainline.  I'm posting it
> primarily so it can be found should anybody else need this.
> - wd
> 
> arch/powerpc/kernel/traps.c | 7 +++++++
> 1 file changed, 7 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index f783c93..f330374 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -986,6 +986,13 @@ static int emulate_instruction(struct pt_regs *regs)
> 		return 0;
> 	}
> 
> +	/* Emulating the lwsync insn as a sync insn */
> +	if (instword == PPC_INST_LWSYNC) {
> +		PPC_WARN_EMULATED(lwsync, regs);
> +		asm volatile("sync" : : : "memory");

Do we really need the inline asm?  Doesn't the fact of just taking an exception and returning from it equate to a sync.

> +		return 0;
> +	}
> +
> 	/* Emulate the mcrxr insn.  */
> 	if ((instword & PPC_INST_MCRXR_MASK) == PPC_INST_MCRXR) {
> 		int shift = (instword >> 21) & 0x1c;
> -- 
> 1.8.3.1
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
Scott Wood Oct. 23, 2013, 10:15 a.m. UTC | #4
On Wed, 2013-10-23 at 00:07 -0500, Kumar Gala wrote:
> On Oct 18, 2013, at 2:38 AM, Wolfgang Denk wrote:
> > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> > index f783c93..f330374 100644
> > --- a/arch/powerpc/kernel/traps.c
> > +++ b/arch/powerpc/kernel/traps.c
> > @@ -986,6 +986,13 @@ static int emulate_instruction(struct pt_regs *regs)
> > 		return 0;
> > 	}
> > 
> > +	/* Emulating the lwsync insn as a sync insn */
> > +	if (instword == PPC_INST_LWSYNC) {
> > +		PPC_WARN_EMULATED(lwsync, regs);
> > +		asm volatile("sync" : : : "memory");
> 
> Do we really need the inline asm?  Doesn't the fact of just taking an exception and returning from it equate to a sync.

No, it doesn't equate to a sync.  See the discussion here:
http://patchwork.ozlabs.org/patch/256747/

-Scott
Kumar Gala Oct. 24, 2013, 4:06 a.m. UTC | #5
On Oct 23, 2013, at 5:15 AM, Scott Wood wrote:

> On Wed, 2013-10-23 at 00:07 -0500, Kumar Gala wrote:
>> On Oct 18, 2013, at 2:38 AM, Wolfgang Denk wrote:
>>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
>>> index f783c93..f330374 100644
>>> --- a/arch/powerpc/kernel/traps.c
>>> +++ b/arch/powerpc/kernel/traps.c
>>> @@ -986,6 +986,13 @@ static int emulate_instruction(struct pt_regs *regs)
>>> 		return 0;
>>> 	}
>>> 
>>> +	/* Emulating the lwsync insn as a sync insn */
>>> +	if (instword == PPC_INST_LWSYNC) {
>>> +		PPC_WARN_EMULATED(lwsync, regs);
>>> +		asm volatile("sync" : : : "memory");
>> 
>> Do we really need the inline asm?  Doesn't the fact of just taking an exception and returning from it equate to a sync.
> 
> No, it doesn't equate to a sync.  See the discussion here:
> http://patchwork.ozlabs.org/patch/256747/
> 

Thanks. 

I'm not sure I'm a fan of doing this as it silently hides a significant performance impact.

Could we possible re-write the userspace instruction to be a 'sync' when we hit this?

- k
Benjamin Herrenschmidt Oct. 24, 2013, 9:45 a.m. UTC | #6
On Wed, 2013-10-23 at 23:06 -0500, Kumar Gala wrote:
> On Oct 23, 2013, at 5:15 AM, Scott Wood wrote:
> 
> > On Wed, 2013-10-23 at 00:07 -0500, Kumar Gala wrote:
> >> On Oct 18, 2013, at 2:38 AM, Wolfgang Denk wrote:
> >>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> >>> index f783c93..f330374 100644
> >>> --- a/arch/powerpc/kernel/traps.c
> >>> +++ b/arch/powerpc/kernel/traps.c
> >>> @@ -986,6 +986,13 @@ static int emulate_instruction(struct pt_regs *regs)
> >>> 		return 0;
> >>> 	}
> >>> 
> >>> +	/* Emulating the lwsync insn as a sync insn */
> >>> +	if (instword == PPC_INST_LWSYNC) {
> >>> +		PPC_WARN_EMULATED(lwsync, regs);
> >>> +		asm volatile("sync" : : : "memory");
> >> 
> >> Do we really need the inline asm?  Doesn't the fact of just taking an exception and returning from it equate to a sync.
> > 
> > No, it doesn't equate to a sync.  See the discussion here:
> > http://patchwork.ozlabs.org/patch/256747/
> > 
> 
> Thanks. 
> 
> I'm not sure I'm a fan of doing this as it silently hides a significant performance impact.
> 
> Could we possible re-write the userspace instruction to be a 'sync' when we hit this?

Rewriting user space is a can of worms I wouldn't get into ... is any
other arch doing it ?

I'm not too worried as long as we warn and account them.

Cheers,
Ben.
Kumar Gala Oct. 24, 2013, 9:55 a.m. UTC | #7
On Oct 24, 2013, at 4:45 AM, Benjamin Herrenschmidt wrote:

> On Wed, 2013-10-23 at 23:06 -0500, Kumar Gala wrote:
>> On Oct 23, 2013, at 5:15 AM, Scott Wood wrote:
>> 
>>> On Wed, 2013-10-23 at 00:07 -0500, Kumar Gala wrote:
>>>> On Oct 18, 2013, at 2:38 AM, Wolfgang Denk wrote:
>>>>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
>>>>> index f783c93..f330374 100644
>>>>> --- a/arch/powerpc/kernel/traps.c
>>>>> +++ b/arch/powerpc/kernel/traps.c
>>>>> @@ -986,6 +986,13 @@ static int emulate_instruction(struct pt_regs *regs)
>>>>> 		return 0;
>>>>> 	}
>>>>> 
>>>>> +	/* Emulating the lwsync insn as a sync insn */
>>>>> +	if (instword == PPC_INST_LWSYNC) {
>>>>> +		PPC_WARN_EMULATED(lwsync, regs);
>>>>> +		asm volatile("sync" : : : "memory");
>>>> 
>>>> Do we really need the inline asm?  Doesn't the fact of just taking an exception and returning from it equate to a sync.
>>> 
>>> No, it doesn't equate to a sync.  See the discussion here:
>>> http://patchwork.ozlabs.org/patch/256747/
>>> 
>> 
>> Thanks. 
>> 
>> I'm not sure I'm a fan of doing this as it silently hides a significant performance impact.
>> 
>> Could we possible re-write the userspace instruction to be a 'sync' when we hit this?
> 
> Rewriting user space is a can of worms I wouldn't get into ... is any
> other arch doing it ?

Fair enough
> 
> I'm not too worried as long as we warn and account them.

Than, I'd ask this be under a Kconfig option that is disabled by default.  Users should have to explicitly enable this so they know what they are doing.

- k
David Laight Oct. 24, 2013, 10:18 a.m. UTC | #8
> Subject: Re: [PATCH] [RFC] Emulate "lwsync" to run standard user land on e500 cores
> 
> 
> On Oct 23, 2013, at 5:15 AM, Scott Wood wrote:
> 
> > On Wed, 2013-10-23 at 00:07 -0500, Kumar Gala wrote:
> >> On Oct 18, 2013, at 2:38 AM, Wolfgang Denk wrote:
> >>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> >>> index f783c93..f330374 100644
> >>> --- a/arch/powerpc/kernel/traps.c
> >>> +++ b/arch/powerpc/kernel/traps.c
> >>> @@ -986,6 +986,13 @@ static int emulate_instruction(struct pt_regs *regs)
> >>> 		return 0;
> >>> 	}
> >>>
> >>> +	/* Emulating the lwsync insn as a sync insn */
> >>> +	if (instword == PPC_INST_LWSYNC) {
> >>> +		PPC_WARN_EMULATED(lwsync, regs);
> >>> +		asm volatile("sync" : : : "memory");
...
> I'm not sure I'm a fan of doing this as it silently hides a significant performance impact.
> 
> Could we possible re-write the userspace instruction to be a 'sync' when we hit this?

You'd need to modify the memory without marking the page dirty.
(Which might be interesting for other cpu-dependant optimisations.)

	David
James Yang Oct. 24, 2013, 9:05 p.m. UTC | #9
On Thu, 24 Oct 2013, Kumar Gala wrote:

> On Oct 24, 2013, at 4:45 AM, Benjamin Herrenschmidt wrote:
> 
> > On Wed, 2013-10-23 at 23:06 -0500, Kumar Gala wrote:
> >> On Oct 23, 2013, at 5:15 AM, Scott Wood wrote:
> >> 
> >>> On Wed, 2013-10-23 at 00:07 -0500, Kumar Gala wrote:
> >>>> On Oct 18, 2013, at 2:38 AM, Wolfgang Denk wrote:
> >>>>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> >>>>> index f783c93..f330374 100644
> >>>>> --- a/arch/powerpc/kernel/traps.c
> >>>>> +++ b/arch/powerpc/kernel/traps.c
> >>>>> @@ -986,6 +986,13 @@ static int emulate_instruction(struct pt_regs *regs)
> >>>>> 		return 0;
> >>>>> 	}
> >>>>> 
> >>>>> +	/* Emulating the lwsync insn as a sync insn */
> >>>>> +	if (instword == PPC_INST_LWSYNC) {
> >>>>> +		PPC_WARN_EMULATED(lwsync, regs);
> >>>>> +		asm volatile("sync" : : : "memory");
> >>>> 
> >>>> Do we really need the inline asm?  Doesn't the fact of just taking an exception and returning from it equate to a sync.
> >>> 
> >>> No, it doesn't equate to a sync.  See the discussion here:
> >>> http://patchwork.ozlabs.org/patch/256747/
> >>> 
> >> 
> >> Thanks. 
> >> 
> >> I'm not sure I'm a fan of doing this as it silently hides a 
> >> significant performance impact.
> >> 
> >> Could we possible re-write the userspace instruction to be a 
> >> 'sync' when we hit this?
> > 
> > Rewriting user space is a can of worms I wouldn't get into ... is 
> > any other arch doing it ?
> 
> Fair enough
> > 
> > I'm not too worried as long as we warn and account them.
> 
> Than, I'd ask this be under a Kconfig option that is disabled by 
> default.  Users should have to explicitly enable this so they know 
> what they are doing.


I think it should be enabled by default, rather than disabled, so that 
users would actually see a warning rather than get a sig 4.  Or, let 
it not be Kconfig-able so that this doesn't become a problem any more. 
(It's been 4 years since I sent to you an earlier version of this 
patch.)
Kumar Gala Oct. 25, 2013, 4:12 a.m. UTC | #10
On Oct 24, 2013, at 4:05 PM, James Yang wrote:

> On Thu, 24 Oct 2013, Kumar Gala wrote:
> 
>> On Oct 24, 2013, at 4:45 AM, Benjamin Herrenschmidt wrote:
>> 
>>> On Wed, 2013-10-23 at 23:06 -0500, Kumar Gala wrote:
>>>> On Oct 23, 2013, at 5:15 AM, Scott Wood wrote:
>>>> 
>>>>> On Wed, 2013-10-23 at 00:07 -0500, Kumar Gala wrote:
>>>>>> On Oct 18, 2013, at 2:38 AM, Wolfgang Denk wrote:
>>>>>>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
>>>>>>> index f783c93..f330374 100644
>>>>>>> --- a/arch/powerpc/kernel/traps.c
>>>>>>> +++ b/arch/powerpc/kernel/traps.c
>>>>>>> @@ -986,6 +986,13 @@ static int emulate_instruction(struct pt_regs *regs)
>>>>>>> 		return 0;
>>>>>>> 	}
>>>>>>> 
>>>>>>> +	/* Emulating the lwsync insn as a sync insn */
>>>>>>> +	if (instword == PPC_INST_LWSYNC) {
>>>>>>> +		PPC_WARN_EMULATED(lwsync, regs);
>>>>>>> +		asm volatile("sync" : : : "memory");
>>>>>> 
>>>>>> Do we really need the inline asm?  Doesn't the fact of just taking an exception and returning from it equate to a sync.
>>>>> 
>>>>> No, it doesn't equate to a sync.  See the discussion here:
>>>>> http://patchwork.ozlabs.org/patch/256747/
>>>>> 
>>>> 
>>>> Thanks. 
>>>> 
>>>> I'm not sure I'm a fan of doing this as it silently hides a 
>>>> significant performance impact.
>>>> 
>>>> Could we possible re-write the userspace instruction to be a 
>>>> 'sync' when we hit this?
>>> 
>>> Rewriting user space is a can of worms I wouldn't get into ... is 
>>> any other arch doing it ?
>> 
>> Fair enough
>>> 
>>> I'm not too worried as long as we warn and account them.
>> 
>> Than, I'd ask this be under a Kconfig option that is disabled by 
>> default.  Users should have to explicitly enable this so they know 
>> what they are doing.
> 
> 
> I think it should be enabled by default, rather than disabled, so that 
> users would actually see a warning rather than get a sig 4.  Or, let 
> it not be Kconfig-able so that this doesn't become a problem any more. 
> (It's been 4 years since I sent to you an earlier version of this 
> patch.)

And clearly most users don't seem terrible annoyed enough about this issue to have concerns.  I don't see why making it a Kconfig option impacts the small handful of people that happen to try and run a more generic distro on e500 cores.

- k
Yang James-RA8135 Oct. 25, 2013, 4:49 a.m. UTC | #11
Sorry for the formatting.  I'm not at my usual mailer.

From: Kumar Gala:
>On Oct 24, 2013, at 4:05 PM, James Yang wrote:
>> On Thu, 24 Oct 2013, Kumar Gala wrote:
>>> Than, I'd ask this be under a Kconfig option that is disabled by
>>> default.  Users should have to explicitly enable this so they know
>>> what they are doing.
>>
>>
>> I think it should be enabled by default, rather than disabled, so that
>> users would actually see a warning rather than get a sig 4.  Or, let
>> it not be Kconfig-able so that this doesn't become a problem any more.
>> (It's been 4 years since I sent to you an earlier version of this
>> patch.)
>
>And clearly most users don't seem terrible annoyed enough about 
>this issue to have concerns.  I don't see why making it a Kconfig 
>option impacts the small handful of people that happen to try 
> and run a more generic distro on e500 cores.


I would have to dispute that qualification of "most".  

This is not a distro issue. It's a libstdc++ portability issue. libstdc++ 
hardcodes lwsync unless __NO_LWSYNC__ is explicitly defined, 
which you only get with -mcpu=8540/-mcpu=8548.  When compiled 
for any powerpc target other than -mcpu=8540/-mcpu=8548, including
the default -mcpu=common,  libstdc++ will end up containing lwsync.  
There is no way to explicitly request libstdc++ to be built without lwsync
with an -mcpu target other than 8540/8548.

The issue is easily demonstrated by running a program that throws a 
C++ exception: __cxa_throw() is called, which has an lwsync.  This
results in an illegal instruction exception when run on an e500v1/e500v2.

Those users who insist on using a generic target for their compiler
will hit this problem, in particular, those who need to generate 
binary code that targets a wide range of powerpc targets, such as
generic distro.  It's unreasonable to require a user to recompile 
libstdc++ just to get functionality for a C++ program, since they would
have to provide two libstdc++.so along with some dynamic-linking hacks to
determine at runtime which one should be used.  Emulating lwsync 
does not prevent an e500v1/e500v2-targeted libstdc++ from providing
the optimal performance.


citation:

> > lwsync is embedded into the library unless __NO_LWSYNC__ is defined:
> > http://gcc.gnu.org/viewcvs/gcc/trunk/libstdc%2B%2B-v3/config/cpu/powerpc/atomic_word.h?revision=195701&view=markup#l30
> > -----------------------------------------------
> > #define _GLIBCXX_READ_MEM_BARRIER __asm __volatile
> > ("isync":::"memory")
> > #ifdef __NO_LWSYNC__
> > #define _GLIBCXX_WRITE_MEM_BARRIER __asm __volatile
> > ("sync":::"memory")
> > #else
> > #define _GLIBCXX_WRITE_MEM_BARRIER __asm __volatile
> > ("lwsync":::"memory")
> > #endif
> > -----------------------------------------------
> > 
> >
David Laight Oct. 25, 2013, 9:58 a.m. UTC | #12
> This is not a distro issue. It's a libstdc++ portability issue. libstdc++
> hardcodes lwsync unless __NO_LWSYNC__ is explicitly defined,
> which you only get with -mcpu=8540/-mcpu=8548.  When compiled
> for any powerpc target other than -mcpu=8540/-mcpu=8548, including
> the default -mcpu=common,  libstdc++ will end up containing lwsync.
> There is no way to explicitly request libstdc++ to be built without lwsync
> with an -mcpu target other than 8540/8548.
> 
> The issue is easily demonstrated by running a program that throws a
> C++ exception: __cxa_throw() is called, which has an lwsync.  This
> results in an illegal instruction exception when run on an e500v1/e500v2.

Perhaps libstc++ should be working out at run time whether lwsync is valid?

	David
Scott Wood Oct. 25, 2013, 10:36 a.m. UTC | #13
On Thu, 2013-10-24 at 04:55 -0500, Kumar Gala wrote:
> On Oct 24, 2013, at 4:45 AM, Benjamin Herrenschmidt wrote:
> 
> > On Wed, 2013-10-23 at 23:06 -0500, Kumar Gala wrote:
> >> On Oct 23, 2013, at 5:15 AM, Scott Wood wrote:
> >> 
> >>> On Wed, 2013-10-23 at 00:07 -0500, Kumar Gala wrote:
> >>>> On Oct 18, 2013, at 2:38 AM, Wolfgang Denk wrote:
> >>>>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> >>>>> index f783c93..f330374 100644
> >>>>> --- a/arch/powerpc/kernel/traps.c
> >>>>> +++ b/arch/powerpc/kernel/traps.c
> >>>>> @@ -986,6 +986,13 @@ static int emulate_instruction(struct pt_regs *regs)
> >>>>> 		return 0;
> >>>>> 	}
> >>>>> 
> >>>>> +	/* Emulating the lwsync insn as a sync insn */
> >>>>> +	if (instword == PPC_INST_LWSYNC) {
> >>>>> +		PPC_WARN_EMULATED(lwsync, regs);
> >>>>> +		asm volatile("sync" : : : "memory");
> >>>> 
> >>>> Do we really need the inline asm?  Doesn't the fact of just taking an exception and returning from it equate to a sync.
> >>> 
> >>> No, it doesn't equate to a sync.  See the discussion here:
> >>> http://patchwork.ozlabs.org/patch/256747/
> >>> 
> >> 
> >> Thanks. 
> >> 
> >> I'm not sure I'm a fan of doing this as it silently hides a significant performance impact.
> >> 
> >> Could we possible re-write the userspace instruction to be a 'sync' when we hit this?
> > 
> > Rewriting user space is a can of worms I wouldn't get into ... is any
> > other arch doing it ?
> 
> Fair enough
> > 
> > I'm not too worried as long as we warn and account them.
> 
> Than, I'd ask this be under a Kconfig option that is disabled by
> default.  Users should have to explicitly enable this so they know what
> they are doing.

Why should this be any different than the other emulated instructions,
which are generally either not kconfigized, or on by default in some
configs (like fp emu)?

Making sure users are aware of this is what PPC_WARN_EMULATED is for.

Has anyone measured how much this slows things down with a typical
userspace?

-Scott
Benjamin Herrenschmidt Oct. 25, 2013, 1:02 p.m. UTC | #14
On Fri, 2013-10-25 at 10:58 +0100, David Laight wrote:
> > This is not a distro issue. It's a libstdc++ portability issue. libstdc++
> > hardcodes lwsync unless __NO_LWSYNC__ is explicitly defined,
> > which you only get with -mcpu=8540/-mcpu=8548.  When compiled
> > for any powerpc target other than -mcpu=8540/-mcpu=8548, including
> > the default -mcpu=common,  libstdc++ will end up containing lwsync.
> > There is no way to explicitly request libstdc++ to be built without lwsync
> > with an -mcpu target other than 8540/8548.
> > 
> > The issue is easily demonstrated by running a program that throws a
> > C++ exception: __cxa_throw() is called, which has an lwsync.  This
> > results in an illegal instruction exception when run on an e500v1/e500v2.
> 
> Perhaps libstc++ should be working out at run time whether lwsync is valid?

Do we have enough coats of paint on this bike shed yet ? :-)

I'm personally tempted to take Scott's approach since that's what we do
for other things as well, it just works and is simple.

Cheers,
Ben.
James Yang Oct. 25, 2013, 3:13 p.m. UTC | #15
On Fri, 25 Oct 2013, David Laight wrote:

> > This is not a distro issue. It's a libstdc++ portability issue. libstdc++
> > hardcodes lwsync unless __NO_LWSYNC__ is explicitly defined,
> > which you only get with -mcpu=8540/-mcpu=8548.  When compiled
> > for any powerpc target other than -mcpu=8540/-mcpu=8548, including
> > the default -mcpu=common,  libstdc++ will end up containing lwsync.
> > There is no way to explicitly request libstdc++ to be built without lwsync
> > with an -mcpu target other than 8540/8548.
> > 
> > The issue is easily demonstrated by running a program that throws a
> > C++ exception: __cxa_throw() is called, which has an lwsync.  This
> > results in an illegal instruction exception when run on an e500v1/e500v2.
> 
> Perhaps libstc++ should be working out at run time whether lwsync is 
> valid?

lwsync has been in libstdc++ for over 8 years so there's a substantial 
amount of legacy binary code that exists such that changing libstdc++ 
now won't solve the problem.

This isn't a performance issue, it's a functional issue -- libstdc++ 
for -mcpu=common targets doesn't work on e500v1/e500v2. (QorIQ P4080 
that have e500mc and newer cores support lwsync so this is no longer 
an issue.)

If one /really/ cared about losing performance while running common 
target binaries, emulating lwsync is an inconsequential performance 
loss compared to the kernel doing floating point emulation.

Recompiling libstdc++ (and all your other userland) with -mcpu=8548 is 
what you'd have to do to avoid classic FP and use Embedded FP or SPE, 
and if you are doing that you'll get sync instead of lwsync in 
libstdc++.
James Yang Oct. 25, 2013, 3:25 p.m. UTC | #16
On Fri, 25 Oct 2013, Scott Wood wrote:

> Has anyone measured how much this slows things down with a typical
> userspace?

Not a measurement of the patch in question but an older similar patch 
on 3.0.51 (8572 running Debian 5.0.3):

$ ./test-lwsync 
cycles per emulated lwsync = 371
cycles per sync = 36

----------------------------------------------------------
#include <stdio.h>

int main (void) {
        unsigned long atb_start, atb_stop;
        unsigned long i;

        asm volatile ("mfspr %0,526" : "=r" (atb_start));
        for (i = 0; i < 100; i++) {
                asm volatile ("lwsync");
        }
        asm volatile ("mfspr %0,526" : "=r" (atb_stop));
        printf("cycles per emulated lwsync = %lu\n", (atb_stop - 
atb_start) / 100);

        asm volatile ("mfspr %0,526" : "=r" (atb_start));
        for (i = 0; i < 100; i++) {
                asm volatile ("sync");
        }
        asm volatile ("mfspr %0,526" : "=r" (atb_stop));

        printf("cycles per sync = %lu\n", (atb_stop - atb_start) / 
100);

        return 0;
}
----------------------------------------------------------
Kumar Gala Oct. 26, 2013, 7:26 a.m. UTC | #17
On Oct 25, 2013, at 8:02 AM, Benjamin Herrenschmidt wrote:

> On Fri, 2013-10-25 at 10:58 +0100, David Laight wrote:
>>> This is not a distro issue. It's a libstdc++ portability issue. libstdc++
>>> hardcodes lwsync unless __NO_LWSYNC__ is explicitly defined,
>>> which you only get with -mcpu=8540/-mcpu=8548.  When compiled
>>> for any powerpc target other than -mcpu=8540/-mcpu=8548, including
>>> the default -mcpu=common,  libstdc++ will end up containing lwsync.
>>> There is no way to explicitly request libstdc++ to be built without lwsync
>>> with an -mcpu target other than 8540/8548.
>>> 
>>> The issue is easily demonstrated by running a program that throws a
>>> C++ exception: __cxa_throw() is called, which has an lwsync.  This
>>> results in an illegal instruction exception when run on an e500v1/e500v2.
>> 
>> Perhaps libstc++ should be working out at run time whether lwsync is valid?
> 
> Do we have enough coats of paint on this bike shed yet ? :-)
> 
> I'm personally tempted to take Scott's approach since that's what we do
> for other things as well, it just works and is simple.
> 
> Cheers,
> Ben.
> 

I give in, however some should test with CONFIG_PPC_EMULATED_STATS and fix what I'm guessing is a build breakage with either patch.

- k
Wolfgang Denk Oct. 27, 2013, 10:25 a.m. UTC | #18
Dear Kumar,

In message <6BFC8EB0-1A75-41C3-985A-E3ED14846710@kernel.crashing.org> you wrote:
> 
> Fair enough
> > 
> > I'm not too worried as long as we warn and account them.
>
> Than, I'd ask this be under a Kconfig option that is disabled by
> default.  Users should have to explicitly enable this so they know what
> they are doing.

Is this really worth the effort?  Under normal situations (users are
using a user space environment that has been properly buiult for the
processor variant they are using) nobody should ever run into this
situation.  It happens only if you are already doing something wrong -
like using user space that has not been built for an E500 core on such
a machine.

In this situation, it seems more useful to me if a "standard" kernel
just works with a "standard" user space environment, even if this
includes some performance penalty - which actually should be
neglibale.  In my tests (when running standard Debian for PPC on a
E500 only very few programs actually triggered this situation, and
none of them in a time-critical way.  I doubt I would even be able to
measure the performance impact.

Best regards,

Wolfgang Denk
Wolfgang Denk Oct. 27, 2013, 10:29 a.m. UTC | #19
Dear Scott,

In message <1382697373.3926.36.camel@aoeu.buserror.net> you wrote:
>
> Has anyone measured how much this slows things down with a typical
> userspace?

In the applications I found to trigger this issue the number of traps
is so small that I can't even reliably measure any difference.

Best regards,

Wolfgang Denk
Scott Wood Oct. 28, 2013, 5:52 p.m. UTC | #20
On Fri, 2013-10-25 at 10:25 -0500, James Yang wrote:
> On Fri, 25 Oct 2013, Scott Wood wrote:
> 
> > Has anyone measured how much this slows things down with a typical
> > userspace?
> 
> Not a measurement of the patch in question but an older similar patch 
> on 3.0.51 (8572 running Debian 5.0.3):
> 
> $ ./test-lwsync 
> cycles per emulated lwsync = 371
> cycles per sync = 36

My point was more to find out how often a typical userspace executes
these instructions.  From Wolfgang's e-mail it seems the answer is not
very often.

-Scott
diff mbox

Patch

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index f783c93..f330374 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -986,6 +986,13 @@  static int emulate_instruction(struct pt_regs *regs)
 		return 0;
 	}
 
+	/* Emulating the lwsync insn as a sync insn */
+	if (instword == PPC_INST_LWSYNC) {
+		PPC_WARN_EMULATED(lwsync, regs);
+		asm volatile("sync" : : : "memory");
+		return 0;
+	}
+
 	/* Emulate the mcrxr insn.  */
 	if ((instword & PPC_INST_MCRXR_MASK) == PPC_INST_MCRXR) {
 		int shift = (instword >> 21) & 0x1c;