Patchwork [v2,2/3] ARM: iwmmxt: Port problematic iwmmxt support code to v7/Thumb-2

login
register
mail settings
Submitter Eric Miao
Date Sept. 8, 2011, 5:03 p.m.
Message ID <CAMPhdO82ho2WPrJ3irwLgfPOycQt=kyqK3hhtrLCjOw1vptTrg@mail.gmail.com>
Download mbox | patch
Permalink /patch/113926/
State New
Headers show

Comments

Eric Miao - Sept. 8, 2011, 5:03 p.m.
On Thu, Sep 8, 2011 at 9:45 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Thursday 08 September 2011, Dave Martin wrote:
>> The iwmmxt code contains some code to implement a pseudo-ISB, but
>> this is not buildable for Thumb-2.
>>
>> This patch replaces the pseudo-ISB with a real one for Thumb-2
>> kernels.
>>
>> Signed-off-by: Dave Martin <dave.martin@linaro.org>
>> ---
>>  arch/arm/kernel/iwmmxt.S |    9 +++++++++
>>  1 files changed, 9 insertions(+), 0 deletions(-)
>
> Acked-by: Arnd Bergmann <arnd@arndb.de>
>

Maybe it'll be much simpler to have something like below:
Jon Medhurst (Tixy) - Sept. 8, 2011, 5:13 p.m.
On Thu, 2011-09-08 at 10:03 -0700, Eric Miao wrote:
> On Thu, Sep 8, 2011 at 9:45 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Thursday 08 September 2011, Dave Martin wrote:
> >> The iwmmxt code contains some code to implement a pseudo-ISB, but
> >> this is not buildable for Thumb-2.
> >>
> >> This patch replaces the pseudo-ISB with a real one for Thumb-2
> >> kernels.
> >>
> >> Signed-off-by: Dave Martin <dave.martin@linaro.org>
> >> ---
> >>  arch/arm/kernel/iwmmxt.S |    9 +++++++++
> >>  1 files changed, 9 insertions(+), 0 deletions(-)
> >
> > Acked-by: Arnd Bergmann <arnd@arndb.de>
> >
> 
> Maybe it'll be much simpler to have something like below:
> 
> diff --git a/arch/arm/kernel/iwmmxt.S b/arch/arm/kernel/iwmmxt.S
> index a087838..5998f7d 100644
> --- a/arch/arm/kernel/iwmmxt.S
> +++ b/arch/arm/kernel/iwmmxt.S
> @@ -319,8 +319,9 @@ ENTRY(iwmmxt_task_switch)
>         PJ4(eor r1, r1, #0xf)
>         PJ4(mcr p15, 0, r1, c1, c0, 2)
> 
> -       mrc     p15, 0, r1, c2, c0, 0
> -       sub     pc, lr, r1, lsr #32             @ cpwait and return
> +       XSC(mrc p15, 0, r1, c2, c0, 0)
> +       PJ4(isb)
> +       mov     pc, lr                          @ cpwait and return

Don't we still need "sub pc, lr, r1, lsr #32" in the XSC case?
I had assumed this was to introduce a data dependency. I.e. PC depends
on value of R1 which is loaded by the MRC, so it can't complete until
that has.
Eric Miao - Sept. 8, 2011, 5:15 p.m.
On Thu, Sep 8, 2011 at 10:13 AM, Jon Medhurst (Tixy)
<jon.medhurst@linaro.org> wrote:
> On Thu, 2011-09-08 at 10:03 -0700, Eric Miao wrote:
>> On Thu, Sep 8, 2011 at 9:45 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>> > On Thursday 08 September 2011, Dave Martin wrote:
>> >> The iwmmxt code contains some code to implement a pseudo-ISB, but
>> >> this is not buildable for Thumb-2.
>> >>
>> >> This patch replaces the pseudo-ISB with a real one for Thumb-2
>> >> kernels.
>> >>
>> >> Signed-off-by: Dave Martin <dave.martin@linaro.org>
>> >> ---
>> >>  arch/arm/kernel/iwmmxt.S |    9 +++++++++
>> >>  1 files changed, 9 insertions(+), 0 deletions(-)
>> >
>> > Acked-by: Arnd Bergmann <arnd@arndb.de>
>> >
>>
>> Maybe it'll be much simpler to have something like below:
>>
>> diff --git a/arch/arm/kernel/iwmmxt.S b/arch/arm/kernel/iwmmxt.S
>> index a087838..5998f7d 100644
>> --- a/arch/arm/kernel/iwmmxt.S
>> +++ b/arch/arm/kernel/iwmmxt.S
>> @@ -319,8 +319,9 @@ ENTRY(iwmmxt_task_switch)
>>         PJ4(eor r1, r1, #0xf)
>>         PJ4(mcr p15, 0, r1, c1, c0, 2)
>>
>> -       mrc     p15, 0, r1, c2, c0, 0
>> -       sub     pc, lr, r1, lsr #32             @ cpwait and return
>> +       XSC(mrc p15, 0, r1, c2, c0, 0)
>> +       PJ4(isb)
>> +       mov     pc, lr                          @ cpwait and return
>
> Don't we still need "sub pc, lr, r1, lsr #32" in the XSC case?
> I had assumed this was to introduce a data dependency. I.e. PC depends
> on value of R1 which is loaded by the MRC, so it can't complete until
> that has.

Indeed, that's actually a good point.
Dave Martin - Sept. 8, 2011, 5:20 p.m.
On Thu, Sep 08, 2011 at 10:03:52AM -0700, Eric Miao wrote:
> On Thu, Sep 8, 2011 at 9:45 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Thursday 08 September 2011, Dave Martin wrote:
> >> The iwmmxt code contains some code to implement a pseudo-ISB, but
> >> this is not buildable for Thumb-2.
> >>
> >> This patch replaces the pseudo-ISB with a real one for Thumb-2
> >> kernels.
> >>
> >> Signed-off-by: Dave Martin <dave.martin@linaro.org>
> >> ---
> >>  arch/arm/kernel/iwmmxt.S |    9 +++++++++
> >>  1 files changed, 9 insertions(+), 0 deletions(-)
> >
> > Acked-by: Arnd Bergmann <arnd@arndb.de>
> >
> 
> Maybe it'll be much simpler to have something like below:
> 
> diff --git a/arch/arm/kernel/iwmmxt.S b/arch/arm/kernel/iwmmxt.S
> index a087838..5998f7d 100644
> --- a/arch/arm/kernel/iwmmxt.S
> +++ b/arch/arm/kernel/iwmmxt.S
> @@ -319,8 +319,9 @@ ENTRY(iwmmxt_task_switch)
>         PJ4(eor r1, r1, #0xf)
>         PJ4(mcr p15, 0, r1, c1, c0, 2)
> 
> -       mrc     p15, 0, r1, c2, c0, 0
> -       sub     pc, lr, r1, lsr #32             @ cpwait and return
> +       XSC(mrc p15, 0, r1, c2, c0, 0)
> +       PJ4(isb)
> +       mov     pc, lr                          @ cpwait and return

This won't allow the building of this code for a v7 ARM kernel with
current tools.

I think this is likely to be too much dirsuption: requiring really
new tools for Thumb-2 is unavoidable, but we should allow people
using existing tools to build this to carry on with the current
situation for the time being.  This is because of the fact that
not all current tools can support "isb" and the iwmmxt mnemonics
in the same file.

One thing we could do is to code the ISB using .long, since we
only need this for Thumb kernels.

This would allow us to get rid of the Makefile warning, since the
generated code would be the preferred v7 code in that case.

Any comments on this?

Cheers
---Dave
Nicolas Pitre - Sept. 8, 2011, 5:32 p.m.
On Thu, 8 Sep 2011, Dave Martin wrote:

> On Thu, Sep 08, 2011 at 10:03:52AM -0700, Eric Miao wrote:
> > On Thu, Sep 8, 2011 at 9:45 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > > On Thursday 08 September 2011, Dave Martin wrote:
> > >> The iwmmxt code contains some code to implement a pseudo-ISB, but
> > >> this is not buildable for Thumb-2.
> > >>
> > >> This patch replaces the pseudo-ISB with a real one for Thumb-2
> > >> kernels.
> > >>
> > >> Signed-off-by: Dave Martin <dave.martin@linaro.org>
> > >> ---
> > >>  arch/arm/kernel/iwmmxt.S |    9 +++++++++
> > >>  1 files changed, 9 insertions(+), 0 deletions(-)
> > >
> > > Acked-by: Arnd Bergmann <arnd@arndb.de>
> > >
> > 
> > Maybe it'll be much simpler to have something like below:
> > 
> > diff --git a/arch/arm/kernel/iwmmxt.S b/arch/arm/kernel/iwmmxt.S
> > index a087838..5998f7d 100644
> > --- a/arch/arm/kernel/iwmmxt.S
> > +++ b/arch/arm/kernel/iwmmxt.S
> > @@ -319,8 +319,9 @@ ENTRY(iwmmxt_task_switch)
> >         PJ4(eor r1, r1, #0xf)
> >         PJ4(mcr p15, 0, r1, c1, c0, 2)
> > 
> > -       mrc     p15, 0, r1, c2, c0, 0
> > -       sub     pc, lr, r1, lsr #32             @ cpwait and return
> > +       XSC(mrc p15, 0, r1, c2, c0, 0)
> > +       PJ4(isb)
> > +       mov     pc, lr                          @ cpwait and return
> 
> This won't allow the building of this code for a v7 ARM kernel with
> current tools.
> 
> I think this is likely to be too much dirsuption: requiring really
> new tools for Thumb-2 is unavoidable, but we should allow people
> using existing tools to build this to carry on with the current
> situation for the time being.  This is because of the fact that
> not all current tools can support "isb" and the iwmmxt mnemonics
> in the same file.
> 
> One thing we could do is to code the ISB using .long, since we
> only need this for Thumb kernels.

You would have an ARM and a Thumb variant then, plus the legacy variant, 
right?

> This would allow us to get rid of the Makefile warning, since the
> generated code would be the preferred v7 code in that case.

Makes sense.


Nicolas
Nicolas Pitre - Sept. 8, 2011, 5:33 p.m.
On Thu, 8 Sep 2011, Jon Medhurst (Tixy) wrote:

> On Thu, 2011-09-08 at 10:03 -0700, Eric Miao wrote:
> > On Thu, Sep 8, 2011 at 9:45 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > > On Thursday 08 September 2011, Dave Martin wrote:
> > >> The iwmmxt code contains some code to implement a pseudo-ISB, but
> > >> this is not buildable for Thumb-2.
> > >>
> > >> This patch replaces the pseudo-ISB with a real one for Thumb-2
> > >> kernels.
> > >>
> > >> Signed-off-by: Dave Martin <dave.martin@linaro.org>
> > >> ---
> > >>  arch/arm/kernel/iwmmxt.S |    9 +++++++++
> > >>  1 files changed, 9 insertions(+), 0 deletions(-)
> > >
> > > Acked-by: Arnd Bergmann <arnd@arndb.de>
> > >
> > 
> > Maybe it'll be much simpler to have something like below:
> > 
> > diff --git a/arch/arm/kernel/iwmmxt.S b/arch/arm/kernel/iwmmxt.S
> > index a087838..5998f7d 100644
> > --- a/arch/arm/kernel/iwmmxt.S
> > +++ b/arch/arm/kernel/iwmmxt.S
> > @@ -319,8 +319,9 @@ ENTRY(iwmmxt_task_switch)
> >         PJ4(eor r1, r1, #0xf)
> >         PJ4(mcr p15, 0, r1, c1, c0, 2)
> > 
> > -       mrc     p15, 0, r1, c2, c0, 0
> > -       sub     pc, lr, r1, lsr #32             @ cpwait and return
> > +       XSC(mrc p15, 0, r1, c2, c0, 0)
> > +       PJ4(isb)
> > +       mov     pc, lr                          @ cpwait and return
> 
> Don't we still need "sub pc, lr, r1, lsr #32" in the XSC case?
> I had assumed this was to introduce a data dependency. I.e. PC depends
> on value of R1 which is loaded by the MRC, so it can't complete until
> that has.

Yes, that's exactly the reason why this is coded that way.


Nicolas
Eric Miao - Sept. 8, 2011, 6:49 p.m.
On Thu, Sep 8, 2011 at 10:20 AM, Dave Martin <dave.martin@linaro.org> wrote:
> On Thu, Sep 08, 2011 at 10:03:52AM -0700, Eric Miao wrote:
>> On Thu, Sep 8, 2011 at 9:45 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>> > On Thursday 08 September 2011, Dave Martin wrote:
>> >> The iwmmxt code contains some code to implement a pseudo-ISB, but
>> >> this is not buildable for Thumb-2.
>> >>
>> >> This patch replaces the pseudo-ISB with a real one for Thumb-2
>> >> kernels.
>> >>
>> >> Signed-off-by: Dave Martin <dave.martin@linaro.org>
>> >> ---
>> >>  arch/arm/kernel/iwmmxt.S |    9 +++++++++
>> >>  1 files changed, 9 insertions(+), 0 deletions(-)
>> >
>> > Acked-by: Arnd Bergmann <arnd@arndb.de>
>> >
>>
>> Maybe it'll be much simpler to have something like below:
>>
>> diff --git a/arch/arm/kernel/iwmmxt.S b/arch/arm/kernel/iwmmxt.S
>> index a087838..5998f7d 100644
>> --- a/arch/arm/kernel/iwmmxt.S
>> +++ b/arch/arm/kernel/iwmmxt.S
>> @@ -319,8 +319,9 @@ ENTRY(iwmmxt_task_switch)
>>         PJ4(eor r1, r1, #0xf)
>>         PJ4(mcr p15, 0, r1, c1, c0, 2)
>>
>> -       mrc     p15, 0, r1, c2, c0, 0
>> -       sub     pc, lr, r1, lsr #32             @ cpwait and return
>> +       XSC(mrc p15, 0, r1, c2, c0, 0)
>> +       PJ4(isb)
>> +       mov     pc, lr                          @ cpwait and return
>
> This won't allow the building of this code for a v7 ARM kernel with
> current tools.

Ah I missed that point.

So the problem is really when compiling this file with existing toolchain,
it's downgrading to v5 compatible mode, and the instruction below

sub pc, lr, r1, lsr #32

wouldn't be encoded when building a THUMB2 kernel. Considering the
r1, lsr #32 is actually to create an explicit data dependency of the previous
co-processor instruction, would it be one option to rewrite this as something
like:

mov r1, r1
mov pc, lr
Nicolas Pitre - Sept. 8, 2011, 7:30 p.m.
On Thu, 8 Sep 2011, Eric Miao wrote:

> So the problem is really when compiling this file with existing toolchain,
> it's downgrading to v5 compatible mode, and the instruction below
> 
> sub pc, lr, r1, lsr #32
> 
> wouldn't be encoded when building a THUMB2 kernel. Considering the
> r1, lsr #32 is actually to create an explicit data dependency of the previous
> co-processor instruction, would it be one option to rewrite this as something
> like:
> 
> mov r1, r1
> mov pc, lr

That is certainly a pretty simple, obvious and straight forward way to 
deal with this issue.  I think I like this even more.


Nicolas
Jon Medhurst (Tixy) - Sept. 9, 2011, 9:55 a.m.
On Thu, 2011-09-08 at 11:49 -0700, Eric Miao wrote:
> So the problem is really when compiling this file with existing toolchain,
> it's downgrading to v5 compatible mode, and the instruction below
> 
> sub pc, lr, r1, lsr #32
> 
> wouldn't be encoded when building a THUMB2 kernel. Considering the
> r1, lsr #32 is actually to create an explicit data dependency of the previous
> co-processor instruction, would it be one option to rewrite this as something
> like:
> 
> mov r1, r1
> mov pc, lr

That doesn't include a data dependency of PC on R1, so it's possible for
MOV PC, LR and subsequent instructions to be executed before MOV R1, R1
has completed. We would want...

add lr, lr, r1, lsr #32
mov pc, lr
Nicolas Pitre - Sept. 9, 2011, 1:21 p.m.
On Fri, 9 Sep 2011, Jon Medhurst (Tixy) wrote:

> On Thu, 2011-09-08 at 11:49 -0700, Eric Miao wrote:
> > So the problem is really when compiling this file with existing toolchain,
> > it's downgrading to v5 compatible mode, and the instruction below
> > 
> > sub pc, lr, r1, lsr #32
> > 
> > wouldn't be encoded when building a THUMB2 kernel. Considering the
> > r1, lsr #32 is actually to create an explicit data dependency of the previous
> > co-processor instruction, would it be one option to rewrite this as something
> > like:
> > 
> > mov r1, r1
> > mov pc, lr
> 
> That doesn't include a data dependency of PC on R1, so it's possible for
> MOV PC, LR and subsequent instructions to be executed before MOV R1, R1
> has completed. We would want...
> 
> add lr, lr, r1, lsr #32
> mov pc, lr

But isn't the first insn unavailable with Thumb2?

Maybe something like:

	sub	r1, r1, r1
	add	pc, lr, r1


Nicolas
Jon Medhurst (Tixy) - Sept. 9, 2011, 2:05 p.m.
On Fri, 2011-09-09 at 09:21 -0400, Nicolas Pitre wrote:
> On Fri, 9 Sep 2011, Jon Medhurst (Tixy) wrote:
> 
> > On Thu, 2011-09-08 at 11:49 -0700, Eric Miao wrote:
> > > So the problem is really when compiling this file with existing toolchain,
> > > it's downgrading to v5 compatible mode, and the instruction below
> > > 
> > > sub pc, lr, r1, lsr #32
> > > 
> > > wouldn't be encoded when building a THUMB2 kernel. Considering the
> > > r1, lsr #32 is actually to create an explicit data dependency of the previous
> > > co-processor instruction, would it be one option to rewrite this as something
> > > like:
> > > 
> > > mov r1, r1
> > > mov pc, lr
> > 
> > That doesn't include a data dependency of PC on R1, so it's possible for
> > MOV PC, LR and subsequent instructions to be executed before MOV R1, R1
> > has completed. We would want...
> > 
> > add lr, lr, r1, lsr #32
> > mov pc, lr
> 
> But isn't the first insn unavailable with Thumb2?

It is available in Thumb2.

> Maybe something like:
> 
> 	sub	r1, r1, r1
> 	add	pc, lr, r1

That's not allowed in Thumb2. ;-)
There aren't many instruction forms which are allowed to set PC.
Dave Martin - Sept. 9, 2011, 4:41 p.m.
On Fri, Sep 09, 2011 at 09:21:28AM -0400, Nicolas Pitre wrote:
> On Fri, 9 Sep 2011, Jon Medhurst (Tixy) wrote:
> 
> > On Thu, 2011-09-08 at 11:49 -0700, Eric Miao wrote:
> > > So the problem is really when compiling this file with existing toolchain,
> > > it's downgrading to v5 compatible mode, and the instruction below
> > > 
> > > sub pc, lr, r1, lsr #32
> > > 
> > > wouldn't be encoded when building a THUMB2 kernel. Considering the
> > > r1, lsr #32 is actually to create an explicit data dependency of the previous
> > > co-processor instruction, would it be one option to rewrite this as something
> > > like:
> > > 
> > > mov r1, r1
> > > mov pc, lr
> > 
> > That doesn't include a data dependency of PC on R1, so it's possible for
> > MOV PC, LR and subsequent instructions to be executed before MOV R1, R1
> > has completed. We would want...
> > 
> > add lr, lr, r1, lsr #32
> > mov pc, lr
> 
> But isn't the first insn unavailable with Thumb2?
> 
> Maybe something like:
> 
> 	sub	r1, r1, r1
> 	add	pc, lr, r1

Thumb-2 implies v7, the v7 implies that we have (and need) a real ISB.  So for the
Thumb-2 case, this should always become

	isb

Ideally, we should do this even for ARM v7 kernels, but in order not to break older
tools which don't understand -march=armv7-a+iwmmxt we may need to use a .long to
encode the ISB manually.  I'll give that a try...

Cheers
---Dave
Nicolas Pitre - Sept. 9, 2011, 5:51 p.m.
On Fri, 9 Sep 2011, Dave Martin wrote:

> On Fri, Sep 09, 2011 at 09:21:28AM -0400, Nicolas Pitre wrote:
> > On Fri, 9 Sep 2011, Jon Medhurst (Tixy) wrote:
> > 
> > > On Thu, 2011-09-08 at 11:49 -0700, Eric Miao wrote:
> > > > So the problem is really when compiling this file with existing toolchain,
> > > > it's downgrading to v5 compatible mode, and the instruction below
> > > > 
> > > > sub pc, lr, r1, lsr #32
> > > > 
> > > > wouldn't be encoded when building a THUMB2 kernel. Considering the
> > > > r1, lsr #32 is actually to create an explicit data dependency of the previous
> > > > co-processor instruction, would it be one option to rewrite this as something
> > > > like:
> > > > 
> > > > mov r1, r1
> > > > mov pc, lr
> > > 
> > > That doesn't include a data dependency of PC on R1, so it's possible for
> > > MOV PC, LR and subsequent instructions to be executed before MOV R1, R1
> > > has completed. We would want...
> > > 
> > > add lr, lr, r1, lsr #32
> > > mov pc, lr
> > 
> > But isn't the first insn unavailable with Thumb2?
> > 
> > Maybe something like:
> > 
> > 	sub	r1, r1, r1
> > 	add	pc, lr, r1
> 
> Thumb-2 implies v7, the v7 implies that we have (and need) a real ISB.  So for the
> Thumb-2 case, this should always become
> 
> 	isb

Don't forget that here we are talking about the Marvell implementation.  
Since I've no doubt that the real isb certainly works, so far the 
advertized isb has always been implemented with a CP readback with a 
data dependency.  Don't forget that this CPU core can also pretend to be 
an ARMv6 and they probably didn't duplicate the whole instruction 
decoder for each modes.

So I really doubt the Marvell implementation would behave any 
differently if this special isb is expressed in terms of Thumb2 rather 
than ARM instructions as the backend is certainly common to all the 
modes. If we can use the same implementation for all modes then we'll 
just simplify maintenance of this code.


Nicolas
Dave Martin - Sept. 12, 2011, 2:37 p.m.
On Fri, Sep 09, 2011 at 01:51:30PM -0400, Nicolas Pitre wrote:
> On Fri, 9 Sep 2011, Dave Martin wrote:
> 
> > On Fri, Sep 09, 2011 at 09:21:28AM -0400, Nicolas Pitre wrote:
> > > On Fri, 9 Sep 2011, Jon Medhurst (Tixy) wrote:
> > > 
> > > > On Thu, 2011-09-08 at 11:49 -0700, Eric Miao wrote:
> > > > > So the problem is really when compiling this file with existing toolchain,
> > > > > it's downgrading to v5 compatible mode, and the instruction below
> > > > > 
> > > > > sub pc, lr, r1, lsr #32
> > > > > 
> > > > > wouldn't be encoded when building a THUMB2 kernel. Considering the
> > > > > r1, lsr #32 is actually to create an explicit data dependency of the previous
> > > > > co-processor instruction, would it be one option to rewrite this as something
> > > > > like:
> > > > > 
> > > > > mov r1, r1
> > > > > mov pc, lr
> > > > 
> > > > That doesn't include a data dependency of PC on R1, so it's possible for
> > > > MOV PC, LR and subsequent instructions to be executed before MOV R1, R1
> > > > has completed. We would want...
> > > > 
> > > > add lr, lr, r1, lsr #32
> > > > mov pc, lr
> > > 
> > > But isn't the first insn unavailable with Thumb2?
> > > 
> > > Maybe something like:
> > > 
> > > 	sub	r1, r1, r1
> > > 	add	pc, lr, r1
> > 
> > Thumb-2 implies v7, the v7 implies that we have (and need) a real ISB.  So for the
> > Thumb-2 case, this should always become
> > 
> > 	isb
> 
> Don't forget that here we are talking about the Marvell implementation.  
> Since I've no doubt that the real isb certainly works, so far the 
> advertized isb has always been implemented with a CP readback with a 
> data dependency.  Don't forget that this CPU core can also pretend to be 
> an ARMv6 and they probably didn't duplicate the whole instruction 
> decoder for each modes.

If the isb instruction doesn't work for this in v7 mode, I think that
would be an architecture violation, but you're right that this may
not be available, or may not behave identically, when the CPU is
behaving like a v6 part.

> So I really doubt the Marvell implementation would behave any 
> differently if this special isb is expressed in terms of Thumb2 rather 
> than ARM instructions as the backend is certainly common to all the 
> modes. If we can use the same implementation for all modes then we'll 
> just simplify maintenance of this code.

Agreed, but the trouble is that what constitutes a suitable data
dependency is completely microarchitecture dependent -- we're relying
on side-effects outside the architecture here.

We can't code _exactly_ the same thing in Thumb-2 because of restrictions
in the instruction set, so we have to settle for something that "looks
similar" -- but there's no guarantee it will work.

Do you feel your judgement on this is authoritative?  If so, then
we can go ahead with your suggestion; otherwise we might need input
from someone else.

Cheers
---Dave
Russell King - ARM Linux - Sept. 12, 2011, 2:43 p.m.
On Mon, Sep 12, 2011 at 03:37:08PM +0100, Dave Martin wrote:
> Agreed, but the trouble is that what constitutes a suitable data
> dependency is completely microarchitecture dependent -- we're relying
> on side-effects outside the architecture here.
> 
> We can't code _exactly_ the same thing in Thumb-2 because of restrictions
> in the instruction set, so we have to settle for something that "looks
> similar" -- but there's no guarantee it will work.
> 
> Do you feel your judgement on this is authoritative?  If so, then
> we can go ahead with your suggestion; otherwise we might need input
> from someone else.

How about we ask Haojian Zhuang what the recommended sequence is for
ensuring that writes hit CP15 when executing in Thumb-2 mode on PJ4,
instead of blindly beating around the issue without really knowing
what we're doing?
Nicolas Pitre - Sept. 12, 2011, 2:55 p.m.
On Mon, 12 Sep 2011, Dave Martin wrote:

> On Fri, Sep 09, 2011 at 01:51:30PM -0400, Nicolas Pitre wrote:
> > On Fri, 9 Sep 2011, Dave Martin wrote:
> > 
> > > On Fri, Sep 09, 2011 at 09:21:28AM -0400, Nicolas Pitre wrote:
> > > > On Fri, 9 Sep 2011, Jon Medhurst (Tixy) wrote:
> > > > 
> > > > > On Thu, 2011-09-08 at 11:49 -0700, Eric Miao wrote:
> > > > > > So the problem is really when compiling this file with existing toolchain,
> > > > > > it's downgrading to v5 compatible mode, and the instruction below
> > > > > > 
> > > > > > sub pc, lr, r1, lsr #32
> > > > > > 
> > > > > > wouldn't be encoded when building a THUMB2 kernel. Considering the
> > > > > > r1, lsr #32 is actually to create an explicit data dependency of the previous
> > > > > > co-processor instruction, would it be one option to rewrite this as something
> > > > > > like:
> > > > > > 
> > > > > > mov r1, r1
> > > > > > mov pc, lr
> > > > > 
> > > > > That doesn't include a data dependency of PC on R1, so it's possible for
> > > > > MOV PC, LR and subsequent instructions to be executed before MOV R1, R1
> > > > > has completed. We would want...
> > > > > 
> > > > > add lr, lr, r1, lsr #32
> > > > > mov pc, lr
> > > > 
> > > > But isn't the first insn unavailable with Thumb2?
> > > > 
> > > > Maybe something like:
> > > > 
> > > > 	sub	r1, r1, r1
> > > > 	add	pc, lr, r1
> > > 
> > > Thumb-2 implies v7, the v7 implies that we have (and need) a real ISB.  So for the
> > > Thumb-2 case, this should always become
> > > 
> > > 	isb
> > 
> > Don't forget that here we are talking about the Marvell implementation.  
> > Since I've no doubt that the real isb certainly works, so far the 
> > advertized isb has always been implemented with a CP readback with a 
> > data dependency.  Don't forget that this CPU core can also pretend to be 
> > an ARMv6 and they probably didn't duplicate the whole instruction 
> > decoder for each modes.
> 
> If the isb instruction doesn't work for this in v7 mode, I think that
> would be an architecture violation, but you're right that this may
> not be available, or may not behave identically, when the CPU is
> behaving like a v6 part.

I'm sure isb is available in ARMv7 mode.

What I'm saying is that the advertised trick for ARMv6 mode certainly 
must work just as well here in ARMv7 mode.

> > So I really doubt the Marvell implementation would behave any 
> > differently if this special isb is expressed in terms of Thumb2 rather 
> > than ARM instructions as the backend is certainly common to all the 
> > modes. If we can use the same implementation for all modes then we'll 
> > just simplify maintenance of this code.
> 
> Agreed, but the trouble is that what constitutes a suitable data
> dependency is completely microarchitecture dependent -- we're relying
> on side-effects outside the architecture here.

Sure.  But what is there now is what is documented by Marvell docs i.e. 
create a data dependency on the register used to read back the CP value.  
I've seen a few implementation variations on that theme too.

> We can't code _exactly_ the same thing in Thumb-2 because of restrictions
> in the instruction set, so we have to settle for something that "looks
> similar" -- but there's no guarantee it will work.
> 
> Do you feel your judgement on this is authoritative?  If so, then
> we can go ahead with your suggestion; otherwise we might need input
> from someone else.

My suggestion would be what Tixy also proposed:

	mrc	p15, 0, r1, c2, c0, 0
	add	lr, lr, r1, lsr #32
	mov	pc, lr

Running this past people still at Marvell (I'm not anymore) wouldn't 
hurt of course.


Nicolas
Dave Martin - Sept. 12, 2011, 4:22 p.m.
On Mon, Sep 12, 2011 at 10:55:30AM -0400, Nicolas Pitre wrote:
> On Mon, 12 Sep 2011, Dave Martin wrote:
> 
> > On Fri, Sep 09, 2011 at 01:51:30PM -0400, Nicolas Pitre wrote:
> > > On Fri, 9 Sep 2011, Dave Martin wrote:
> > > 
> > > > On Fri, Sep 09, 2011 at 09:21:28AM -0400, Nicolas Pitre wrote:
> > > > > On Fri, 9 Sep 2011, Jon Medhurst (Tixy) wrote:
> > > > > 
> > > > > > On Thu, 2011-09-08 at 11:49 -0700, Eric Miao wrote:
> > > > > > > So the problem is really when compiling this file with existing toolchain,
> > > > > > > it's downgrading to v5 compatible mode, and the instruction below
> > > > > > > 
> > > > > > > sub pc, lr, r1, lsr #32
> > > > > > > 
> > > > > > > wouldn't be encoded when building a THUMB2 kernel. Considering the
> > > > > > > r1, lsr #32 is actually to create an explicit data dependency of the previous
> > > > > > > co-processor instruction, would it be one option to rewrite this as something
> > > > > > > like:
> > > > > > > 
> > > > > > > mov r1, r1
> > > > > > > mov pc, lr
> > > > > > 
> > > > > > That doesn't include a data dependency of PC on R1, so it's possible for
> > > > > > MOV PC, LR and subsequent instructions to be executed before MOV R1, R1
> > > > > > has completed. We would want...
> > > > > > 
> > > > > > add lr, lr, r1, lsr #32
> > > > > > mov pc, lr
> > > > > 
> > > > > But isn't the first insn unavailable with Thumb2?
> > > > > 
> > > > > Maybe something like:
> > > > > 
> > > > > 	sub	r1, r1, r1
> > > > > 	add	pc, lr, r1
> > > > 
> > > > Thumb-2 implies v7, the v7 implies that we have (and need) a real ISB.  So for the
> > > > Thumb-2 case, this should always become
> > > > 
> > > > 	isb
> > > 
> > > Don't forget that here we are talking about the Marvell implementation.  
> > > Since I've no doubt that the real isb certainly works, so far the 
> > > advertized isb has always been implemented with a CP readback with a 
> > > data dependency.  Don't forget that this CPU core can also pretend to be 
> > > an ARMv6 and they probably didn't duplicate the whole instruction 
> > > decoder for each modes.
> > 
> > If the isb instruction doesn't work for this in v7 mode, I think that
> > would be an architecture violation, but you're right that this may
> > not be available, or may not behave identically, when the CPU is
> > behaving like a v6 part.
> 
> I'm sure isb is available in ARMv7 mode.
> 
> What I'm saying is that the advertised trick for ARMv6 mode certainly 
> must work just as well here in ARMv7 mode.
> 
> > > So I really doubt the Marvell implementation would behave any 
> > > differently if this special isb is expressed in terms of Thumb2 rather 
> > > than ARM instructions as the backend is certainly common to all the 
> > > modes. If we can use the same implementation for all modes then we'll 
> > > just simplify maintenance of this code.
> > 
> > Agreed, but the trouble is that what constitutes a suitable data
> > dependency is completely microarchitecture dependent -- we're relying
> > on side-effects outside the architecture here.
> 
> Sure.  But what is there now is what is documented by Marvell docs i.e. 
> create a data dependency on the register used to read back the CP value.  
> I've seen a few implementation variations on that theme too.
> 
> > We can't code _exactly_ the same thing in Thumb-2 because of restrictions
> > in the instruction set, so we have to settle for something that "looks
> > similar" -- but there's no guarantee it will work.
> > 
> > Do you feel your judgement on this is authoritative?  If so, then
> > we can go ahead with your suggestion; otherwise we might need input
> > from someone else.
> 
> My suggestion would be what Tixy also proposed:
> 
> 	mrc	p15, 0, r1, c2, c0, 0
> 	add	lr, lr, r1, lsr #32
> 	mov	pc, lr
>
> Running this past people still at Marvell (I'm not anymore) wouldn't 
> hurt of course.

Sure, that might work, and if we have a sequence which works for all
build configurations, that's great.  But it needs to work for the right
reasons, otherwise we may be storing up nasty surprises for later.

As you/Russell suggest, I think Haojian or someone should comment.

>From an architectural point of view, I don't think that sequence has
any guaranteed barrier behaviour at all, although it will have a
barrier effect on many implementations.  As we get into v7-land, this
kind of thing becomre more and more unsafe as new implementations push
the architectural envelope.

Haojian, can you comment on whether the proposed sequence will work in
this case of synchronising CPACR updates?  If we can use the same
sequence for all v6/v7/ARM/Thumb-2 kernels, that would be preferable
to #ifdefs.

Cheers
---Dave

Patch

diff --git a/arch/arm/kernel/iwmmxt.S b/arch/arm/kernel/iwmmxt.S
index a087838..5998f7d 100644
--- a/arch/arm/kernel/iwmmxt.S
+++ b/arch/arm/kernel/iwmmxt.S
@@ -319,8 +319,9 @@  ENTRY(iwmmxt_task_switch)
        PJ4(eor r1, r1, #0xf)
        PJ4(mcr p15, 0, r1, c1, c0, 2)

-       mrc     p15, 0, r1, c2, c0, 0
-       sub     pc, lr, r1, lsr #32             @ cpwait and return
+       XSC(mrc p15, 0, r1, c2, c0, 0)
+       PJ4(isb)
+       mov     pc, lr                          @ cpwait and return

 /*
  * Remove Concan ownership of given task