Patchwork [U-Boot] crc32: Correct endianness of crc32 result

login
register
mail settings
Submitter Simon Glass
Date April 5, 2013, 11:11 p.m.
Message ID <1365203470-9099-1-git-send-email-sjg@chromium.org>
Download mbox | patch
Permalink /patch/234297/
State Changes Requested
Delegated to: Wolfgang Denk
Headers show

Comments

Simon Glass - April 5, 2013, 11:11 p.m.
When crc32 is handled by the hash library, it requires the data to be in
big-endian format, since it reads it byte-wise. Thus at present the 'crc32'
command reports incorrect data. For example, previously we might see:

Peach # crc32 40000000 100
CRC32 for 40000000 ... 400000ff ==> 0d968558

but instead with the hash library we see:

Peach # crc32 40000000 100
CRC32 for 40000000 ... 400000ff ==> 5885960d

Correct this.

Signed-off-by: Simon Glass <sjg@chromium.org>
Reviewed-by: Vadim Bendebury <vbendeb@google.com>
---
 lib/crc32.c | 6 ++++++
 1 file changed, 6 insertions(+)
Allen Martin - April 5, 2013, 11:32 p.m.
On Fri, Apr 05, 2013 at 04:11:10PM -0700, Simon Glass wrote:
> When crc32 is handled by the hash library, it requires the data to be in
> big-endian format, since it reads it byte-wise. Thus at present the 'crc32'
> command reports incorrect data. For example, previously we might see:
> 
> Peach # crc32 40000000 100
> CRC32 for 40000000 ... 400000ff ==> 0d968558
> 
> but instead with the hash library we see:
> 
> Peach # crc32 40000000 100
> CRC32 for 40000000 ... 400000ff ==> 5885960d
> 
> Correct this.
> 
> Signed-off-by: Simon Glass <sjg@chromium.org>
> Reviewed-by: Vadim Bendebury <vbendeb@google.com>
> ---
>  lib/crc32.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/lib/crc32.c b/lib/crc32.c
> index 76205da..94720bf 100644
> --- a/lib/crc32.c
> +++ b/lib/crc32.c
> @@ -10,6 +10,7 @@
>  
>  #ifndef USE_HOSTCC
>  #include <common.h>
> +#include <asm/unaligned.h>
>  #endif
>  #include <compiler.h>
>  #include <u-boot/crc.h>
> @@ -256,5 +257,10 @@ void crc32_wd_buf(const unsigned char *input, unsigned int ilen,
>  	uint32_t crc;
>  
>  	crc = crc32_wd(0, input, ilen, chunk_sz);
> +#ifdef USE_HOSTCC
> +	crc = htobe32(crc);
>  	memcpy(output, &crc, sizeof(crc));
> +#else
> +	put_unaligned_be32(crc, output);
> +#endif
>  }
> -- 
> 1.8.1.3
> 

Tested on Tegra114 Dalmore, verified crc32 comes out as expected now.

Reviewed-by: Allen Martin <amartin@nvidia.com>
Tested-by: Allen Martin <amartin@nvidia.com>
Wolfgang Denk - April 6, 2013, 7:04 a.m.
Dear Simon Glass,

In message <1365203470-9099-1-git-send-email-sjg@chromium.org> you wrote:
> When crc32 is handled by the hash library, it requires the data to be in
> big-endian format, since it reads it byte-wise. Thus at present the 'crc32'
> command reports incorrect data. For example, previously we might see:


> +#ifdef USE_HOSTCC
> +	crc = htobe32(crc);
>  	memcpy(output, &crc, sizeof(crc));
> +#else
> +	put_unaligned_be32(crc, output);
> +#endif

Why is this depending on USE_HOSTCC, and not on the endianess?

And why do we need the #ifdef?  Can we not always use htobe32() and
put_unaligned_be32() ?

Best regards,

Wolfgang Denk
Simon Glass - April 16, 2013, 9:57 p.m.
Hi Wolfgang,

On Sat, Apr 6, 2013 at 12:04 AM, Wolfgang Denk <wd@denx.de> wrote:
> Dear Simon Glass,
>
> In message <1365203470-9099-1-git-send-email-sjg@chromium.org> you wrote:
>> When crc32 is handled by the hash library, it requires the data to be in
>> big-endian format, since it reads it byte-wise. Thus at present the 'crc32'
>> command reports incorrect data. For example, previously we might see:
>
>
>> +#ifdef USE_HOSTCC
>> +     crc = htobe32(crc);
>>       memcpy(output, &crc, sizeof(crc));
>> +#else
>> +     put_unaligned_be32(crc, output);
>> +#endif
>
> Why is this depending on USE_HOSTCC, and not on the endianess?

We always want big-endian in this case, since the bytes have to date
been written that way by the crc32 command.

>
> And why do we need the #ifdef?  Can we not always use htobe32() and
> put_unaligned_be32() ?

Well I don't think put_unaligned_be32 is available to user space,
which is the environment that the tools are built under. It is
available in the kernel, but that's not our environment.

I'm not happy with this solution and would be pleased to find a better
way, but I'm not sure what it is.

But this patch does fix a real bug which we should sort out before the
release, one way or another.

>
> Best regards,
>
> Wolfgang Denk

Regards,
Simon
Tom Rini - April 16, 2013, 11 p.m.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 04/16/2013 05:57 PM, Simon Glass wrote:
> Hi Wolfgang,
> 
> On Sat, Apr 6, 2013 at 12:04 AM, Wolfgang Denk <wd@denx.de> wrote:
>> Dear Simon Glass,
>> 
>> In message <1365203470-9099-1-git-send-email-sjg@chromium.org>
>> you wrote:
>>> When crc32 is handled by the hash library, it requires the data
>>> to be in big-endian format, since it reads it byte-wise. Thus
>>> at present the 'crc32' command reports incorrect data. For
>>> example, previously we might see:
>> 
>> 
>>> +#ifdef USE_HOSTCC +     crc = htobe32(crc); memcpy(output,
>>> &crc, sizeof(crc)); +#else +     put_unaligned_be32(crc,
>>> output); +#endif
>> 
>> Why is this depending on USE_HOSTCC, and not on the endianess?
> 
> We always want big-endian in this case, since the bytes have to
> date been written that way by the crc32 command.

In other words, this code is executed on both the host and the target,
and there's not a uniform endian sanitizer function.

- -- 
Tom
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJRbdghAAoJENk4IS6UOR1WzzsP/j+JhIIkp9EK4YwDQ7H6F06V
i60/Y/EaVg32HAqMjo66y2GCcu7ak9fobZo3wHnAjrFkaSJWq4s+eDNOtJiHbr/O
ZwQbDJ/t5Yf7vuZ1OhBTII+pYHXYcDW+30QZpBP3+ydY8Zkg7tsAcQ5rH7KdHWuD
83k7DLjv9obn85eeVikNkfB7ONNFVRug07Obcd+jbXdQamc/VWxWZyvUwDiKGCYH
eqmss/hQ7o343FWKqsVNSxd3/tF7z3PNevWm83xJxlc5xbtyJ/8kad6qkkILtOGd
G5OOXnL7BpSqL2mxt5ruW+cwqOnp74SvoQXuM6lNZzeAmlirAginYbnDSvjQamIy
DK1BQAjodXGU7nZxffw4vKZzbGzkgRl2KkuGvQfpMJzoJRyWvrbDzIVOeF/bXXQS
UEvASOFAdqKpMPNJnIm16GUtH6/OyEWK+8HInFse7K19ycM4M/TpM2dhmVZrHG0S
Q+Xq7ZI6pEq0SjMIfhuCwYJS6lqbtlQ5eOk+KoTYXOWneOzOgDGKO+El53wDwKQ5
0icIUwNKn4ZKMg7HLE25Docx3Ez6OVBD2Aelz0wlc5FMWlmLrHe9oYaufDN36bBH
D5XrLeBDjj89mTOHl4V0U3tZ/1iLLFqxo9RyNG6lPkLOQD62vLPxjyPYqY8Q6q85
kPdToR/o/YfFk3EsiYpD
=0O7d
-----END PGP SIGNATURE-----
Simon Glass - April 16, 2013, 11:16 p.m.
Hi Tom,

On Tue, Apr 16, 2013 at 4:00 PM, Tom Rini <trini@ti.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 04/16/2013 05:57 PM, Simon Glass wrote:
>> Hi Wolfgang,
>>
>> On Sat, Apr 6, 2013 at 12:04 AM, Wolfgang Denk <wd@denx.de> wrote:
>>> Dear Simon Glass,
>>>
>>> In message <1365203470-9099-1-git-send-email-sjg@chromium.org>
>>> you wrote:
>>>> When crc32 is handled by the hash library, it requires the data
>>>> to be in big-endian format, since it reads it byte-wise. Thus
>>>> at present the 'crc32' command reports incorrect data. For
>>>> example, previously we might see:
>>>
>>>
>>>> +#ifdef USE_HOSTCC +     crc = htobe32(crc); memcpy(output,
>>>> &crc, sizeof(crc)); +#else +     put_unaligned_be32(crc,
>>>> output); +#endif
>>>
>>> Why is this depending on USE_HOSTCC, and not on the endianess?
>>
>> We always want big-endian in this case, since the bytes have to
>> date been written that way by the crc32 command.
>
> In other words, this code is executed on both the host and the target,
> and there's not a uniform endian sanitizer function.

Well to be honest, I don't think the code is needed on the host, and I
could put the #ifdef around the whole function. I just thought that
might be a bit short-sighted.

Regards,
Simon
Wolfgang Denk - April 17, 2013, 5:40 a.m.
Dear Simon Glass,

In message <CAPnjgZ2jRVpQ56_EpVKUMH8E1L2LJ0HgzNCs-GrN9bXfOSXz+Q@mail.gmail.com> you wrote:
> 
> >> +#ifdef USE_HOSTCC
> >> +     crc = htobe32(crc);
> >>       memcpy(output, &crc, sizeof(crc));
> >> +#else
> >> +     put_unaligned_be32(crc, output);
> >> +#endif
> >
> > Why is this depending on USE_HOSTCC, and not on the endianess?
> 
> We always want big-endian in this case, since the bytes have to date
> been written that way by the crc32 command.

Let me rephrase.  Why do we need an #ifdef here, and why depends this
on USE_HOSTCC?

> > And why do we need the #ifdef?  Can we not always use htobe32() and
> > put_unaligned_be32() ?
> 
> Well I don't think put_unaligned_be32 is available to user space,
> which is the environment that the tools are built under. It is
> available in the kernel, but that's not our environment.

It appears put_unaligned_be32() is a widely unknown, pretty exotic
function that so far has been used in ony two source files:

	drivers/usb/gadget/f_mass_storage.c
	lib/tpm.c

The implementation (in "include/linux/unaligned/generic.h") is ugly
and pretty expensive in terms of run time and memory footprint.

I would like to avoid it's usage all together.

> I'm not happy with this solution and would be pleased to find a better
> way, but I'm not sure what it is.

Does the htobe32() + memcpy() approach not work everywhere?

> But this patch does fix a real bug which we should sort out before the
> release, one way or another.

Agreed.

Best regards,

Wolfgang Denk
Simon Glass - April 17, 2013, 6:28 p.m.
Hi Wolfgang,

On Tue, Apr 16, 2013 at 10:40 PM, Wolfgang Denk <wd@denx.de> wrote:
> Dear Simon Glass,
>
> In message <CAPnjgZ2jRVpQ56_EpVKUMH8E1L2LJ0HgzNCs-GrN9bXfOSXz+Q@mail.gmail.com> you wrote:
>>
>> >> +#ifdef USE_HOSTCC
>> >> +     crc = htobe32(crc);
>> >>       memcpy(output, &crc, sizeof(crc));
>> >> +#else
>> >> +     put_unaligned_be32(crc, output);
>> >> +#endif
>> >
>> > Why is this depending on USE_HOSTCC, and not on the endianess?
>>
>> We always want big-endian in this case, since the bytes have to date
>> been written that way by the crc32 command.
>
> Let me rephrase.  Why do we need an #ifdef here, and why depends this
> on USE_HOSTCC?
>
>> > And why do we need the #ifdef?  Can we not always use htobe32() and
>> > put_unaligned_be32() ?
>>
>> Well I don't think put_unaligned_be32 is available to user space,
>> which is the environment that the tools are built under. It is
>> available in the kernel, but that's not our environment.
>
> It appears put_unaligned_be32() is a widely unknown, pretty exotic
> function that so far has been used in ony two source files:
>
>         drivers/usb/gadget/f_mass_storage.c
>         lib/tpm.c
>
> The implementation (in "include/linux/unaligned/generic.h") is ugly
> and pretty expensive in terms of run time and memory footprint.
>
> I would like to avoid it's usage all together.

It ends up producing this on ARMv7:

43e2c1d8: e1a03820 lsr r3, r0, #16
43e2c1dc: e5c40003 strb r0, [r4, #3]
43e2c1e0: e5c43001 strb r3, [r4, #1]
43e2c1e4: e1a02423 lsr r2, r3, #8
43e2c1e8: e7e73450 ubfx r3, r0, #8, #8
43e2c1ec: e5c42000 strb r2, [r4]
43e2c1f0: e5c43002 strb r3, [r4, #2]

In fact with unaligned support we could optimise this.

>
>> I'm not happy with this solution and would be pleased to find a better
>> way, but I'm not sure what it is.
>
> Does the htobe32() + memcpy() approach not work everywhere?

But htobe32() isn't available in U-Boot.

I tried using:

#ifdef USE_HOSTCC
   crc = htobe32(crc);
#else
   crc = cpu_to_be32(crc);
#endif
   memcpy(output, &crc, sizeof(crc));


This is one instruction (4 bytes, 16%) smaller, but I suspect quite a
lot slower due to the overhead of a very small memcpy().

43e2c1d8: e28d1008 add r1, sp, #8
43e2c1dc: e3a02004 mov r2, #4
43e2c1e0: e6bf0f30 rev r0, r0
43e2c1e4: e5210004 str r0, [r1, #-4]!
43e2c1e8: e1a00004 mov r0, r4
43e2c1ec: eb001af7 bl 43e32dd0 <memcpy>

>
>> But this patch does fix a real bug which we should sort out before the
>> release, one way or another.
>
> Agreed.

Let me know what you prefer and I will update the patch.

Regards,
Simon
Albert ARIBAUD - April 17, 2013, 7:23 p.m.
Hi Simon -- and sorry for the dupe.

On Wed, 17 Apr 2013 11:28:07 -0700, Simon Glass <sjg@chromium.org>
wrote:

> I tried using:
> 
> #ifdef USE_HOSTCC
>    crc = htobe32(crc);
> #else
>    crc = cpu_to_be32(crc);
> #endif
>    memcpy(output, &crc, sizeof(crc));
> 
> 
> This is one instruction (4 bytes, 16%) smaller, but I suspect quite a
> lot slower due to the overhead of a very small memcpy().
> 
> 43e2c1d8: e28d1008 add r1, sp, #8
> 43e2c1dc: e3a02004 mov r2, #4
> 43e2c1e0: e6bf0f30 rev r0, r0
> 43e2c1e4: e5210004 str r0, [r1, #-4]!
> 43e2c1e8: e1a00004 mov r0, r4
> 43e2c1ec: eb001af7 bl 43e32dd0 <memcpy>

How about replacing the memcpy with an explicit put_unaligned(),
similar to what was done in

http://www.mail-archive.com/u-boot@lists.denx.de/msg109555.html

with get_unaligned()? The code will be longer than above, but shorter
than the above plus the memcpy(), and faster too -- actually, I'm
surprised that the compiler does not unroll the memcpy() on its own,
considering the size argument is a constant.

> Regards,
> Simon

Amicalement,
Simon Glass - April 17, 2013, 8:59 p.m.
Hi Albert,

On Wed, Apr 17, 2013 at 12:23 PM, Albert ARIBAUD
<albert.u.boot@aribaud.net> wrote:
> Hi Simon -- and sorry for the dupe.
>
> On Wed, 17 Apr 2013 11:28:07 -0700, Simon Glass <sjg@chromium.org>
> wrote:
>
>> I tried using:
>>
>> #ifdef USE_HOSTCC
>>    crc = htobe32(crc);
>> #else
>>    crc = cpu_to_be32(crc);
>> #endif
>>    memcpy(output, &crc, sizeof(crc));
>>
>>
>> This is one instruction (4 bytes, 16%) smaller, but I suspect quite a
>> lot slower due to the overhead of a very small memcpy().
>>
>> 43e2c1d8: e28d1008 add r1, sp, #8
>> 43e2c1dc: e3a02004 mov r2, #4
>> 43e2c1e0: e6bf0f30 rev r0, r0
>> 43e2c1e4: e5210004 str r0, [r1, #-4]!
>> 43e2c1e8: e1a00004 mov r0, r4
>> 43e2c1ec: eb001af7 bl 43e32dd0 <memcpy>
>
> How about replacing the memcpy with an explicit put_unaligned(),
> similar to what was done in
>
> http://www.mail-archive.com/u-boot@lists.denx.de/msg109555.html
>
> with get_unaligned()? The code will be longer than above, but shorter
> than the above plus the memcpy(), and faster too -- actually, I'm
> surprised that the compiler does not unroll the memcpy() on its own,
> considering the size argument is a constant.

Do you mean like this?

#ifdef USE_HOSTCC
   crc = htobe32(crc);
   memcpy(output, &crc, sizeof(crc));
#else
   crc = cpu_to_be32(crc);
   put_unaligned(crc, (uint32_t *)output);
#endif


This produces the same code as my original patch. If this is
acceptable then I will do that, although it doesn't really seem any
better.

Regards,
Simon

[and sorry for my dup]
Albert ARIBAUD - April 18, 2013, 6:20 a.m.
Hi Simon,

On Wed, 17 Apr 2013 13:59:48 -0700, Simon Glass <sjg@chromium.org>
wrote:

> Hi Albert,
> 
> On Wed, Apr 17, 2013 at 12:23 PM, Albert ARIBAUD
> <albert.u.boot@aribaud.net> wrote:
> > Hi Simon -- and sorry for the dupe.
> >
> > On Wed, 17 Apr 2013 11:28:07 -0700, Simon Glass <sjg@chromium.org>
> > wrote:
> >
> >> I tried using:
> >>
> >> #ifdef USE_HOSTCC
> >>    crc = htobe32(crc);
> >> #else
> >>    crc = cpu_to_be32(crc);
> >> #endif
> >>    memcpy(output, &crc, sizeof(crc));
> >>
> >>
> >> This is one instruction (4 bytes, 16%) smaller, but I suspect quite a
> >> lot slower due to the overhead of a very small memcpy().
> >>
> >> 43e2c1d8: e28d1008 add r1, sp, #8
> >> 43e2c1dc: e3a02004 mov r2, #4
> >> 43e2c1e0: e6bf0f30 rev r0, r0
> >> 43e2c1e4: e5210004 str r0, [r1, #-4]!
> >> 43e2c1e8: e1a00004 mov r0, r4
> >> 43e2c1ec: eb001af7 bl 43e32dd0 <memcpy>
> >
> > How about replacing the memcpy with an explicit put_unaligned(),
> > similar to what was done in
> >
> > http://www.mail-archive.com/u-boot@lists.denx.de/msg109555.html
> >
> > with get_unaligned()? The code will be longer than above, but shorter
> > than the above plus the memcpy(), and faster too -- actually, I'm
> > surprised that the compiler does not unroll the memcpy() on its own,
> > considering the size argument is a constant.
> 
> Do you mean like this?
> 
> #ifdef USE_HOSTCC
>    crc = htobe32(crc);
>    memcpy(output, &crc, sizeof(crc));
> #else
>    crc = cpu_to_be32(crc);
>    put_unaligned(crc, (uint32_t *)output);
> #endif
> 
> This produces the same code as my original patch. If this is
> acceptable then I will do that, although it doesn't really seem any
> better.

Wolfgang may not like it any more than put_unaligned_be32() as it
builds upon it (and the disk patch could have used the be32 version as
well). Personally, I think we should allow and use these...

... and work on optimizing their implementation for ARM; we should be
able to reduce such put()s and get()s to a few instructions. And to
avoid any misunderstanding, yes, I volunteer for the optimizing work. :)

> Regards,
> Simon
> 
> [and sorry for my dup]

Actually I'm the culprit.

Amicalement,
Wolfgang Denk - April 18, 2013, 10:36 a.m.
Dear Albert ARIBAUD,

In message <20130418082027.4b5ea191@lilith> you wrote:
> 
> > #ifdef USE_HOSTCC
> >    crc = htobe32(crc);
> >    memcpy(output, &crc, sizeof(crc));
> > #else
> >    crc = cpu_to_be32(crc);
> >    put_unaligned(crc, (uint32_t *)output);
> > #endif
> > 
> > This produces the same code as my original patch. If this is
> > acceptable then I will do that, although it doesn't really seem any
> > better.
> 
> Wolfgang may not like it any more than put_unaligned_be32() as it
> builds upon it (and the disk patch could have used the be32 version as

Indeed.

> well). Personally, I think we should allow and use these...
> 
> ... and work on optimizing their implementation for ARM; we should be
> able to reduce such put()s and get()s to a few instructions. And to

OK - and what about the other architectures that suffer from the same
issues?

> avoid any misunderstanding, yes, I volunteer for the optimizing work. :)

I really dislike introducing such custom functions when we have
standard functions available that implement the same purposes.

Checking Linux code (as U-Boot is not representative here):

	-> find * -name '*.c' | wc -l
	18362
	-> find * -name '*.c' | xargs fgrep -l put_unaligned | wc -l
	136

i. e. just 0.75% of the source files actually use any of the
"put_unaligned*()" variants - it is a highly exotic function.

htobe32() is even worse - just a single source file in the whole Lnux
tree uses it (arch/um/drivers/cow_user.c).


Can we not stick to standard functions?  Instead of htobe32() we
should use htonl() which is available both in U-Boot and in the host
environment.

Best regards,

Wolfgang Denk
Albert ARIBAUD - April 18, 2013, 11:18 a.m.
Hi Wolfgang,

On Thu, 18 Apr 2013 12:36:00 +0200, Wolfgang Denk <wd@denx.de> wrote:

> Dear Albert ARIBAUD,
> 
> In message <20130418082027.4b5ea191@lilith> you wrote:
> > 
> > > #ifdef USE_HOSTCC
> > >    crc = htobe32(crc);
> > >    memcpy(output, &crc, sizeof(crc));
> > > #else
> > >    crc = cpu_to_be32(crc);
> > >    put_unaligned(crc, (uint32_t *)output);
> > > #endif
> > > 
> > > This produces the same code as my original patch. If this is
> > > acceptable then I will do that, although it doesn't really seem any
> > > better.
> > 
> > Wolfgang may not like it any more than put_unaligned_be32() as it
> > builds upon it (and the disk patch could have used the be32 version as
> 
> Indeed.
> 
> > well). Personally, I think we should allow and use these...
> > 
> > ... and work on optimizing their implementation for ARM; we should be
> > able to reduce such put()s and get()s to a few instructions. And to
> 
> OK - and what about the other architectures that suffer from the same
> issues?

They should/could provide their own optimized versions, obviously.

> > avoid any misunderstanding, yes, I volunteer for the optimizing work. :)
> 
> I really dislike introducing such custom functions when we have
> standard functions available that implement the same purposes.

I understand the point; this is a classical opposition of generic vs
optimized.

> Checking Linux code (as U-Boot is not representative here):
> 
> 	-> find * -name '*.c' | wc -l
> 	18362
> 	-> find * -name '*.c' | xargs fgrep -l put_unaligned | wc -l
> 	136
> 
> i. e. just 0.75% of the source files actually use any of the
> "put_unaligned*()" variants - it is a highly exotic function.
> 
> htobe32() is even worse - just a single source file in the whole Lnux
> tree uses it (arch/um/drivers/cow_user.c).
> 
> 
> Can we not stick to standard functions?  Instead of htobe32() we
> should use htonl() which is available both in U-Boot and in the host
> environment.

Note that there are two problems here: endianness conversion and
unaligned storage. We can indeed use htoln() instead of htobe32(), but
that only affects/solved endianness issues, not alignment ones, for
which there is no solution that is efficient across all ARM targets.

Note too that if the hash infrastructure mandated that the output
buffer be 4-byte-aligned, i.e. a u32* or similar, we would not even
have to worry about unalignment; such a constraint on the output
buffer makes all the more sense if we consider the fact that current
hash sizes are 4 (crc32), 20 (SHA1) and 32 (SHA256), all multiples of
4, and future hashes will most certainly not be less aligned.

So how about changing the element type of output in the definition of
hash_func_ws, adapting the corresponding implementations sha1_csum_wd,
sha256_csum_wd and crc32_wd_buf, and adapting the output argument
of the sole call to hash_func_ws, that is, the local "u8
output[HASH_MAX_DIGEST_SIZE];" in hash.c? Then we'be done with
alignment.

Note finally that we *need* unaligned access macros/inline
functions/whatever, if only for exceptions laid out in
doc/README.arm-unaligned-accesses. If we avoid calling them too often,
we can live with generic.

> Best regards,
> 
> Wolfgang Denk

Amicalement,
Wolfgang Denk - April 18, 2013, 2:39 p.m.
Dear Albert,

In message <20130418131810.38c916b8@lilith> you wrote:
> 
> > OK - and what about the other architectures that suffer from the same
> > issues?
> 
> They should/could provide their own optimized versions, obviously.

Well, if this was a generally needed or even useful service, that
might make sense.  But it is highly exotic stuff...

> > I really dislike introducing such custom functions when we have
> > standard functions available that implement the same purposes.
> 
> I understand the point; this is a classical opposition of generic vs
> optimized.

Not really.  It is more opposition against premature optimization at
the cost of non-standard code.

Is there any justification that we really need hightly optimized
versus generic code here?  Is this used anywhere in a place where
performance is critical?

> > Can we not stick to standard functions?  Instead of htobe32() we
> > should use htonl() which is available both in U-Boot and in the host
> > environment.
> 
> Note that there are two problems here: endianness conversion and
> unaligned storage. We can indeed use htoln() instead of htobe32(), but
> that only affects/solved endianness issues, not alignment ones, for
> which there is no solution that is efficient across all ARM targets.

Correct.  But the htobe32() approach was cobining this with the
generic memcpy(), too, so we would do the same with htoln().

> So how about changing the element type of output in the definition of
> hash_func_ws, adapting the corresponding implementations sha1_csum_wd,
> sha256_csum_wd and crc32_wd_buf, and adapting the output argument
> of the sole call to hash_func_ws, that is, the local "u8
> output[HASH_MAX_DIGEST_SIZE];" in hash.c? Then we'be done with
> alignment.

OK, but that would be a totally different approach (which appears to
be a good one, here).

> Note finally that we *need* unaligned access macros/inline
> functions/whatever, if only for exceptions laid out in
> doc/README.arm-unaligned-accesses. If we avoid calling them too often,
> we can live with generic.

Agreed; the focus should always be on avoidance, though.

Best regards,

Wolfgang Denk
Albert ARIBAUD - April 18, 2013, 4:39 p.m.
Hi Wolfgang,

On Thu, 18 Apr 2013 16:39:09 +0200, Wolfgang Denk <wd@denx.de> wrote:

> > So how about changing the element type of output in the definition of
> > hash_func_ws, adapting the corresponding implementations sha1_csum_wd,
> > sha256_csum_wd and crc32_wd_buf, and adapting the output argument
> > of the sole call to hash_func_ws, that is, the local "u8
> > output[HASH_MAX_DIGEST_SIZE];" in hash.c? Then we'be done with
> > alignment.
> 
> OK, but that would be a totally different approach (which appears to
> be a good one, here).

Indeed; I would suggest splitting this change in two independent ones:

- fix the endianness issue: change the endianness of crc through the
  use of htonl() but leave the existing memcpy() in place as it is,
  even though it is not speed-optimized. That's what Simon's patch
  does if the HOSTCC part is ignored;

- fix the unalignment issue by changing parameter 'output' of function
  type 'hash_func_ws' from u8* to u32* and adjusting the rest of the
  code accordingly -- which would lead to replacing the crc32 final
  memcpy() with a single indirect assignment.

These two changes could be submitted either separately, or as a series.

Amicalement,
Tom Rini - April 18, 2013, 4:58 p.m.
On Thu, Apr 18, 2013 at 06:39:29PM +0200, Albert ARIBAUD wrote:
> Hi Wolfgang,
> 
> On Thu, 18 Apr 2013 16:39:09 +0200, Wolfgang Denk <wd@denx.de> wrote:
> 
> > > So how about changing the element type of output in the definition of
> > > hash_func_ws, adapting the corresponding implementations sha1_csum_wd,
> > > sha256_csum_wd and crc32_wd_buf, and adapting the output argument
> > > of the sole call to hash_func_ws, that is, the local "u8
> > > output[HASH_MAX_DIGEST_SIZE];" in hash.c? Then we'be done with
> > > alignment.
> > 
> > OK, but that would be a totally different approach (which appears to
> > be a good one, here).
> 
> Indeed; I would suggest splitting this change in two independent ones:
> 
> - fix the endianness issue: change the endianness of crc through the
>   use of htonl() but leave the existing memcpy() in place as it is,
>   even though it is not speed-optimized. That's what Simon's patch
>   does if the HOSTCC part is ignored;
> 
> - fix the unalignment issue by changing parameter 'output' of function
>   type 'hash_func_ws' from u8* to u32* and adjusting the rest of the
>   code accordingly -- which would lead to replacing the crc32 final
>   memcpy() with a single indirect assignment.
> 
> These two changes could be submitted either separately, or as a series.

Now so that I'm clear, if we don't do anything about the unaligned issue
today, it's just slow, not an unaligned access that leads to abort,
right?  So part one today for release, part two next week after release.
Albert ARIBAUD - April 18, 2013, 6:43 p.m.
Hi Tom,

On Thu, 18 Apr 2013 12:58:54 -0400, Tom Rini <trini@ti.com> wrote:

> On Thu, Apr 18, 2013 at 06:39:29PM +0200, Albert ARIBAUD wrote:
> > Hi Wolfgang,
> > 
> > On Thu, 18 Apr 2013 16:39:09 +0200, Wolfgang Denk <wd@denx.de> wrote:
> > 
> > > > So how about changing the element type of output in the definition of
> > > > hash_func_ws, adapting the corresponding implementations sha1_csum_wd,
> > > > sha256_csum_wd and crc32_wd_buf, and adapting the output argument
> > > > of the sole call to hash_func_ws, that is, the local "u8
> > > > output[HASH_MAX_DIGEST_SIZE];" in hash.c? Then we'be done with
> > > > alignment.
> > > 
> > > OK, but that would be a totally different approach (which appears to
> > > be a good one, here).
> > 
> > Indeed; I would suggest splitting this change in two independent ones:
> > 
> > - fix the endianness issue: change the endianness of crc through the
> >   use of htonl() but leave the existing memcpy() in place as it is,
> >   even though it is not speed-optimized. That's what Simon's patch
> >   does if the HOSTCC part is ignored;
> > 
> > - fix the unalignment issue by changing parameter 'output' of function
> >   type 'hash_func_ws' from u8* to u32* and adjusting the rest of the
> >   code accordingly -- which would lead to replacing the crc32 final
> >   memcpy() with a single indirect assignment.
> > 
> > These two changes could be submitted either separately, or as a series.
> 
> Now so that I'm clear, if we don't do anything about the unaligned issue
> today, it's just slow, not an unaligned access that leads to abort,
> right?  So part one today for release, part two next week after release.

Yes, the current code is just slower than it could be, but works, and
so would it with Simon's patch as long as it keeps the memcpy().

Amicalement,
Simon Glass - April 18, 2013, 7:06 p.m.
Hi,

On Thu, Apr 18, 2013 at 11:43 AM, Albert ARIBAUD
<albert.u.boot@aribaud.net> wrote:
> Hi Tom,
>
> On Thu, 18 Apr 2013 12:58:54 -0400, Tom Rini <trini@ti.com> wrote:
>
>> On Thu, Apr 18, 2013 at 06:39:29PM +0200, Albert ARIBAUD wrote:
>> > Hi Wolfgang,
>> >
>> > On Thu, 18 Apr 2013 16:39:09 +0200, Wolfgang Denk <wd@denx.de> wrote:
>> >
>> > > > So how about changing the element type of output in the definition of
>> > > > hash_func_ws, adapting the corresponding implementations sha1_csum_wd,
>> > > > sha256_csum_wd and crc32_wd_buf, and adapting the output argument
>> > > > of the sole call to hash_func_ws, that is, the local "u8
>> > > > output[HASH_MAX_DIGEST_SIZE];" in hash.c? Then we'be done with
>> > > > alignment.
>> > >
>> > > OK, but that would be a totally different approach (which appears to
>> > > be a good one, here).
>> >
>> > Indeed; I would suggest splitting this change in two independent ones:
>> >
>> > - fix the endianness issue: change the endianness of crc through the
>> >   use of htonl() but leave the existing memcpy() in place as it is,
>> >   even though it is not speed-optimized. That's what Simon's patch
>> >   does if the HOSTCC part is ignored;
>> >
>> > - fix the unalignment issue by changing parameter 'output' of function
>> >   type 'hash_func_ws' from u8* to u32* and adjusting the rest of the
>> >   code accordingly -- which would lead to replacing the crc32 final
>> >   memcpy() with a single indirect assignment.
>> >
>> > These two changes could be submitted either separately, or as a series.
>>
>> Now so that I'm clear, if we don't do anything about the unaligned issue
>> today, it's just slow, not an unaligned access that leads to abort,
>> right?  So part one today for release, part two next week after release.
>
> Yes, the current code is just slower than it could be, but works, and
> so would it with Simon's patch as long as it keeps the memcpy().

Yes the alignment issue is manufactured IMO by the char * used for the
function signature and I did not feel comfortable declaring that it
must be word aligned. To be it seems that the type should be naturally
aligned. What do people think about that?

Anyway, that's why I ended up with this patch, which I felt was small
enough to be accepted as a bug fix for the release.

Albert's email exactly mirrors my thinking on this - and yes I would
be much more comfortable not having to create part 2 for the release.

Please let me know what I should do with this patch for now.

Regards,
Simon
Tom Rini - April 18, 2013, 7:18 p.m.
On Thu, Apr 18, 2013 at 12:06:47PM -0700, Simon Glass wrote:
> Hi,
> 
> On Thu, Apr 18, 2013 at 11:43 AM, Albert ARIBAUD
> <albert.u.boot@aribaud.net> wrote:
> > Hi Tom,
> >
> > On Thu, 18 Apr 2013 12:58:54 -0400, Tom Rini <trini@ti.com> wrote:
> >
> >> On Thu, Apr 18, 2013 at 06:39:29PM +0200, Albert ARIBAUD wrote:
> >> > Hi Wolfgang,
> >> >
> >> > On Thu, 18 Apr 2013 16:39:09 +0200, Wolfgang Denk <wd@denx.de> wrote:
> >> >
> >> > > > So how about changing the element type of output in the definition of
> >> > > > hash_func_ws, adapting the corresponding implementations sha1_csum_wd,
> >> > > > sha256_csum_wd and crc32_wd_buf, and adapting the output argument
> >> > > > of the sole call to hash_func_ws, that is, the local "u8
> >> > > > output[HASH_MAX_DIGEST_SIZE];" in hash.c? Then we'be done with
> >> > > > alignment.
> >> > >
> >> > > OK, but that would be a totally different approach (which appears to
> >> > > be a good one, here).
> >> >
> >> > Indeed; I would suggest splitting this change in two independent ones:
> >> >
> >> > - fix the endianness issue: change the endianness of crc through the
> >> >   use of htonl() but leave the existing memcpy() in place as it is,
> >> >   even though it is not speed-optimized. That's what Simon's patch
> >> >   does if the HOSTCC part is ignored;
> >> >
> >> > - fix the unalignment issue by changing parameter 'output' of function
> >> >   type 'hash_func_ws' from u8* to u32* and adjusting the rest of the
> >> >   code accordingly -- which would lead to replacing the crc32 final
> >> >   memcpy() with a single indirect assignment.
> >> >
> >> > These two changes could be submitted either separately, or as a series.
> >>
> >> Now so that I'm clear, if we don't do anything about the unaligned issue
> >> today, it's just slow, not an unaligned access that leads to abort,
> >> right?  So part one today for release, part two next week after release.
> >
> > Yes, the current code is just slower than it could be, but works, and
> > so would it with Simon's patch as long as it keeps the memcpy().
> 
> Yes the alignment issue is manufactured IMO by the char * used for the
> function signature and I did not feel comfortable declaring that it
> must be word aligned. To be it seems that the type should be naturally
> aligned. What do people think about that?
> 
> Anyway, that's why I ended up with this patch, which I felt was small
> enough to be accepted as a bug fix for the release.
> 
> Albert's email exactly mirrors my thinking on this - and yes I would
> be much more comfortable not having to create part 2 for the release.
> 
> Please let me know what I should do with this patch for now.

Lets follow the outline Albert made above, two patches, #2 depends on #1
and #1 is now, #2 is next week.

Patch

diff --git a/lib/crc32.c b/lib/crc32.c
index 76205da..94720bf 100644
--- a/lib/crc32.c
+++ b/lib/crc32.c
@@ -10,6 +10,7 @@ 
 
 #ifndef USE_HOSTCC
 #include <common.h>
+#include <asm/unaligned.h>
 #endif
 #include <compiler.h>
 #include <u-boot/crc.h>
@@ -256,5 +257,10 @@  void crc32_wd_buf(const unsigned char *input, unsigned int ilen,
 	uint32_t crc;
 
 	crc = crc32_wd(0, input, ilen, chunk_sz);
+#ifdef USE_HOSTCC
+	crc = htobe32(crc);
 	memcpy(output, &crc, sizeof(crc));
+#else
+	put_unaligned_be32(crc, output);
+#endif
 }