diff mbox

MTD: Fix Orion NAND driver compilation with ARM OABI

Message ID 20100320085507.4038.96426.stgit@pauliusz
State Not Applicable
Headers show

Commit Message

Paulius Zaleckas March 20, 2010, 8:55 a.m. UTC
We must tell GCC to use even register for variable passed
to ldrd instruction. Without this patch GCC 4.2.1 puts this
variable to r2/r3 on EABI and r3/r4 on OABI, so force it to
r2/r3. This does not change anything when EABI and OABI
compilation works OK.

Without this patch and with OABI I get:
  CC      drivers/mtd/nand/orion_nand.o
/tmp/ccMkwOCs.s: Assembler messages:
/tmp/ccMkwOCs.s:63: Error: first destination register must be even -- `ldrd r3,[ip]'
make[5]: *** [drivers/mtd/nand/orion_nand.o] Error 1

Signed-off-by: Paulius Zaleckas <paulius.zaleckas@gmail.com>
---

 drivers/mtd/nand/orion_nand.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

Comments

David Woodhouse March 20, 2010, 9:41 a.m. UTC | #1
On Sat, 2010-03-20 at 10:55 +0200, Paulius Zaleckas wrote:
> We must tell GCC to use even register for variable passed
> to ldrd instruction. Without this patch GCC 4.2.1 puts this
> variable to r2/r3 on EABI and r3/r4 on OABI, so force it to
> r2/r3. This does not change anything when EABI and OABI
> compilation works OK.
> 
> Without this patch and with OABI I get:
> /tmp/ccMkwOCs.s:63: Error: first destination register must be even -- `ldrd r3,[ip]'
> make[5]: *** [drivers/mtd/nand/orion_nand.o] Error 1

 ...
> -		uint64_t x;
> +		/*
> +		 * force x variable to r2/r3 registers since ldrd instruction
> +		 * requires first register to be even.
> +		 */
> +		register uint64_t x asm ("r2");
> +
>  		asm volatile ("ldrd\t%0, [%1]" : "=&r" (x) : "r" (io_base));

Hm, isn't there an asm constraint which will force it into an
appropriate register pair? Failing that, "=&2,4,6,8" ought to work.
(Um, and why the earlyclobber? Why can't io_base be passed in in one of
the same registers?)

We should try to avoid making our constraints more restrictive than they
need to be.
Paulius Zaleckas March 20, 2010, 11:20 a.m. UTC | #2
On Sat, Mar 20, 2010 at 11:41 AM, David Woodhouse <dwmw2@infradead.org> wrote:
> On Sat, 2010-03-20 at 10:55 +0200, Paulius Zaleckas wrote:
>> We must tell GCC to use even register for variable passed
>> to ldrd instruction. Without this patch GCC 4.2.1 puts this
>> variable to r2/r3 on EABI and r3/r4 on OABI, so force it to
>> r2/r3. This does not change anything when EABI and OABI
>> compilation works OK.
>>
>> Without this patch and with OABI I get:
>> /tmp/ccMkwOCs.s:63: Error: first destination register must be even -- `ldrd r3,[ip]'
>> make[5]: *** [drivers/mtd/nand/orion_nand.o] Error 1
>
>  ...
>> -             uint64_t x;
>> +             /*
>> +              * force x variable to r2/r3 registers since ldrd instruction
>> +              * requires first register to be even.
>> +              */
>> +             register uint64_t x asm ("r2");
>> +
>>               asm volatile ("ldrd\t%0, [%1]" : "=&r" (x) : "r" (io_base));
>
> Hm, isn't there an asm constraint which will force it into an
> appropriate register pair?

Not that I know of...

> Failing that, "=&2,4,6,8" ought to work.

No, fails with error: matching constraint not valid in output operand

> (Um, and why the earlyclobber? Why can't io_base be passed in in one of
> the same registers?)
>
> We should try to avoid making our constraints more restrictive than they
> need to be.
>
> --
> David Woodhouse                            Open Source Technology Centre
> David.Woodhouse@intel.com                              Intel Corporation
David Woodhouse March 20, 2010, 11:58 a.m. UTC | #3
On Sat, 2010-03-20 at 13:20 +0200, Paulius Zaleckas wrote:
> On Sat, Mar 20, 2010 at 11:41 AM, David Woodhouse <dwmw2@infradead.org> wrote:
> > On Sat, 2010-03-20 at 10:55 +0200, Paulius Zaleckas wrote:
> >> -             uint64_t x;
> >> +             /*
> >> +              * force x variable to r2/r3 registers since ldrd instruction
> >> +              * requires first register to be even.
> >> +              */
> >> +             register uint64_t x asm ("r2");
> >> +
> >>               asm volatile ("ldrd\t%0, [%1]" : "=&r" (x) : "r" (io_base));
> >
> > Hm, isn't there an asm constraint which will force it into an
> > appropriate register pair?
> 
> Not that I know of...
> 
> > Failing that, "=&r2,r4,r6,r8" ought to work.
> 
> No, fails with error: matching constraint not valid in output operand

Hm, crap -- GCC on ARM doesn't let you give specific registers, so that
trick doesn't work.

Strictly speaking, I think your version is wrong -- although you force
the variable 'x' to be stored in r2/r3, you don't actually force GCC to
use r2/r3 as the output registers for the asm statement -- it could
happily use other registers, then move their contents into r2/r3
afterwards.

Obviously it _won't_ do that most of the time, but it _could_. GCC PR
#15089 was filed for the fact that sometimes it does, but I think Nico
was missing the point -- GCC is _allowed_ to do that, and if it makes
you sad then you should be asking for better assembly constraints which
would allow you to tell it not to.

See the __asmeq() macro in <asm/system.h> for a dirty hack which will
check which registers are used and abort at compile time, although your
compilation is going to fail anyway so I'm not sure it makes much of a
difference in this particular case.

The real fix here is to add an asm constraint to GCC which allows you to
specify "any even GPR" (or whatever's most suitable for the ldrd
instruction). Being able to give specific registers, like you can on
other architectures, would be useful too.

Please file a PR, then resubmit your patch with a comment explaining
that the code is known to be broken because GCC doesn't allow you to do
any better, and containing a reference to your PR. If people copy your
code, I want them to at least know that they're propagating a bug.
Nicolas Pitre March 20, 2010, 2:41 p.m. UTC | #4
On Sat, 20 Mar 2010, David Woodhouse wrote:

> On Sat, 2010-03-20 at 13:20 +0200, Paulius Zaleckas wrote:
> > On Sat, Mar 20, 2010 at 11:41 AM, David Woodhouse <dwmw2@infradead.org> wrote:
> > > On Sat, 2010-03-20 at 10:55 +0200, Paulius Zaleckas wrote:
> > >> -             uint64_t x;
> > >> +             /*
> > >> +              * force x variable to r2/r3 registers since ldrd instruction
> > >> +              * requires first register to be even.
> > >> +              */
> > >> +             register uint64_t x asm ("r2");
> > >> +
> > >>               asm volatile ("ldrd\t%0, [%1]" : "=&r" (x) : "r" (io_base));
> > >
> > > Hm, isn't there an asm constraint which will force it into an
> > > appropriate register pair?
> > 
> > Not that I know of...

Digging into the gcc code there is indeed no way to specify an even 
register pair as a constraint. What's there is:

#define TARGET_LDRD                     (arm_arch5e && ARM_DOUBLEWORD_ALIGN)

and

#define ARM_DOUBLEWORD_ALIGN    TARGET_AAPCS_BASED

where AAPCS is equivalent to EABI.

> > > Failing that, "=&r2,r4,r6,r8" ought to work.
> > 
> > No, fails with error: matching constraint not valid in output operand
> 
> Hm, crap -- GCC on ARM doesn't let you give specific registers, so that
> trick doesn't work.
> 
> Strictly speaking, I think your version is wrong -- although you force
> the variable 'x' to be stored in r2/r3, you don't actually force GCC to
> use r2/r3 as the output registers for the asm statement -- it could
> happily use other registers, then move their contents into r2/r3
> afterwards.

Actually that's the other way around, and the GCC manual is 
contradicting you.  The value can be moved in other registers, but at 
the point of reference (which is the inline asm in this case) then the 
value has to be in the specified register.  Quoting the GCC manual in 
section 5.40 "Variables in Specified Registers":

|Local register variables in specific registers do not reserve the 
|registers, except at the point where they are used as input or output 
|operands in an `asm' statement and the `asm' statement itself is not 
|deleted.  The compiler's data flow analysis is capable of determining 
|where the specified registers contain live values, and where they are 
|available for other uses.  Stores into local register variables may be 
|deleted when they appear to be dead according to dataflow analysis.  
|References to local register variables may be deleted or moved or 
|simplified.
|
|These local variables are sometimes convenient for use with 
|the extended `asm' feature (*note Extended Asm::), if you want to write 
|one output of the assembler instruction directly into a particular 
|register.  (This will work provided the register you specify fits the 
|constraints specified for that operand in the `asm'.)

And that last paragraph is precisely what is being relied 
upon here.  Otherwise what would be the purpose left for this construct?

> Obviously it _won't_ do that most of the time, but it _could_. GCC PR
> #15089 was filed for the fact that sometimes it does, but I think Nico
> was missing the point -- GCC is _allowed_ to do that, and if it makes
> you sad then you should be asking for better assembly constraints which
> would allow you to tell it not to.

No.  PR #15089 was filed because this has been broken at some point when 
some code generation optimizations kicked in.  And the bug as now been 
fixed for more than 5 years now.

> See the __asmeq() macro in <asm/system.h> for a dirty hack which will
> check which registers are used and abort at compile time, although your
> compilation is going to fail anyway so I'm not sure it makes much of a
> difference in this particular case.

The __asmeq hack has been used as a safety guard so not to ban what used 
to be the current GCC version at the time this bug was discovered. 
Since this bug seemed to appear much less frequently
when using -Os instead of -O2 (the default on ARM) then the __asmeq was 
a good way to still use the broken GCC without ending up with broken 
code unknowingly.

Unless people are still using ancient GCC versions, there is little 
reason to worry about this bug anymore.

Just to say that I don't think that Paulius' version is wrong.  Maybe 
suboptimal but not wrong.  And his choice of r2 matches with the 
compiler's choice in the EABI case anyway, so in practice the code is as 
optimal as it would otherwise be.

> The real fix here is to add an asm constraint to GCC which allows you to
> specify "any even GPR" (or whatever's most suitable for the ldrd
> instruction). Being able to give specific registers, like you can on
> other architectures, would be useful too.

I agree completely with the fact that a proper constraint is lacking.

Being able to give a specific register is already supported though, 
although this is not directly through the inline asm constraint.

> Please file a PR, then resubmit your patch with a comment explaining
> that the code is known to be broken because GCC doesn't allow you to do
> any better, and containing a reference to your PR. If people copy your
> code, I want them to at least know that they're propagating a bug.

This is sensible I guess.


Nicolas
Mikael Pettersson March 20, 2010, 3:06 p.m. UTC | #5
David Woodhouse writes:
 > On Sat, 2010-03-20 at 13:20 +0200, Paulius Zaleckas wrote:
 > > On Sat, Mar 20, 2010 at 11:41 AM, David Woodhouse <dwmw2@infradead.org> wrote:
 > > > On Sat, 2010-03-20 at 10:55 +0200, Paulius Zaleckas wrote:
 > > >> -             uint64_t x;
 > > >> +             /*
 > > >> +              * force x variable to r2/r3 registers since ldrd instruction
 > > >> +              * requires first register to be even.
 > > >> +              */
 > > >> +             register uint64_t x asm ("r2");
 > > >> +
 > > >>               asm volatile ("ldrd\t%0, [%1]" : "=&r" (x) : "r" (io_base));
 > > >
 > > > Hm, isn't there an asm constraint which will force it into an
 > > > appropriate register pair?
 > > 
 > > Not that I know of...
 > > 
 > > > Failing that, "=&r2,r4,r6,r8" ought to work.
 > > 
 > > No, fails with error: matching constraint not valid in output operand
 > 
 > Hm, crap -- GCC on ARM doesn't let you give specific registers, so that
 > trick doesn't work.

I missed the start of this thread, but looking at orion_nand.c I fail to
see why you'd need to mess with inline asm for reading a sequence of u64
values from an I/O location to an array.

Rewriting that to proper C generates a nice ldrd;strd loop with my gcc-4.4.3.
Nicolas Pitre March 20, 2010, 3:37 p.m. UTC | #6
On Sat, 20 Mar 2010, Mikael Pettersson wrote:

> I missed the start of this thread, but looking at orion_nand.c I fail to
> see why you'd need to mess with inline asm for reading a sequence of u64
> values from an I/O location to an array.

The reason for doing so with inline asm in the first place was not 
discussed in this thread.

> Rewriting that to proper C generates a nice ldrd;strd loop with my gcc-4.4.3.

Sure, GCC version 4.4.3 does the right thing.  Not all of them do.


Nicolas
Jamie Lokier March 21, 2010, 5:16 p.m. UTC | #7
David Woodhouse wrote:
> Strictly speaking, I think your version is wrong -- although you force
> the variable 'x' to be stored in r2/r3, you don't actually force GCC to
> use r2/r3 as the output registers for the asm statement -- it could
> happily use other registers, then move their contents into r2/r3
> afterwards.

We used to do that a lot in the syscall macros in <asm/unistd.h>, on a
lot of architectures.  Were they all broken?

> Obviously it _won't_ do that most of the time, but it _could_. GCC PR
> #15089 was filed for the fact that sometimes it does, but I think Nico
> was missing the point -- GCC is _allowed_ to do that, and if it makes
> you sad then you should be asking for better assembly constraints which
> would allow you to tell it not to.

From the GCC info documentation:

    Sometimes you need to make an `asm' operand be a specific register,
    but there's no matching constraint letter for that register _by
    itself_.  To force the operand into that register, use a local variable
    for the operand and specify the register in the variable declaration.
    *Note Explicit Reg Vars::.  Then for the `asm' operand, use any
    register constraint letter that matches the register:

         register int *p1 asm ("r0") = ...;
         register int *p2 asm ("r1") = ...;
         register int *result asm ("r0");
         asm ("sysint" : "=r" (result) : "0" (p1), "r" (p2));

Fwiw, that note is present in GCC-4.0.1, but not GCC-3.3.6.  But we've
depended on that behaviour for a long time.

Note that we've depended on GCC not copying values with a dereferenced
memory location for a long time too: E.g. "+m" (*ptr) is used a lot in
spinlocks.

I'm sure I've read email confirmation (on these very lists) that GCC
will always work with a memory constraint used in that way, without
copying the value to/from a different location such as a stack slot.

But suprisingly, the GCC documentation says:

    Extended asm supports input-output or read-write operands.  Use
    the constraint character `+' to indicate such an operand and list
    it with the output operands.  You should only use read-write
    operands when the constraints for the operand (or the operand in
    which only some of the bits are to be changed) allow a register.
                                                   ^^^^^^^^^^^^^^^^

Maybe we're relying on undefined GCC behaviour for the "+m" constraint?

> See the __asmeq() macro in <asm/system.h> for a dirty hack which will
> check which registers are used and abort at compile time, although your
> compilation is going to fail anyway so I'm not sure it makes much of a
> difference in this particular case.
> 
> The real fix here is to add an asm constraint to GCC which allows you to
> specify "any even GPR" (or whatever's most suitable for the ldrd
> instruction). Being able to give specific registers, like you can on
> other architectures, would be useful too.

See above GCC documentation for using register variables to designate
specific registers.  Many supported architectures don't have asm
letter constraints for each register - hence so many of the old
_syscallN macros in <asm/unistd.h> having to use register variables.

I am surprised GCC doesn't have a constraint for "any even register
suitable for ldrd" on ARM, but I've just checked gcc-4.4.3 and it doesn't.

However, if I'm reading the source correctly, if not compiling for
Thumb-1, and GCC believes the target machine supports ldrd, then all
doubleword values are constrained to an even register pair anyway.
That's why GCC itself does not need an even-register constraint letter.

...Which is I guess why it throws up only with OABI, or with pre-arm5e
archs: GCC doesn't consider OABI targets to support ldrd.  (It's
actually some more obscure condition, let's not go there).

Something else from the lovely GCC source:

    mfix-cortex-m3-ldrd
    Target Report Var(fix_cm3_ldrd) Init(2)
    Avoid overlapping destination and address registers on LDRD instructions
    that may trigger Cortex-M3 errata.

In other words, the "=&" earlyclobber *is* needed on Cortex-M3.

Enjoy!
-- Jamie
diff mbox

Patch

diff --git a/drivers/mtd/nand/orion_nand.c b/drivers/mtd/nand/orion_nand.c
index f59c074..cb5175e 100644
--- a/drivers/mtd/nand/orion_nand.c
+++ b/drivers/mtd/nand/orion_nand.c
@@ -60,7 +60,12 @@  static void orion_nand_read_buf(struct mtd_info *mtd, uint8_t *buf, int len)
 	}
 	buf64 = (uint64_t *)buf;
 	while (i < len/8) {
-		uint64_t x;
+		/*
+		 * force x variable to r2/r3 registers since ldrd instruction
+		 * requires first register to be even.
+		 */
+		register uint64_t x asm ("r2");
+
 		asm volatile ("ldrd\t%0, [%1]" : "=&r" (x) : "r" (io_base));
 		buf64[i++] = x;
 	}