Message ID | 1365203470-9099-1-git-send-email-sjg@chromium.org |
---|---|
State | Changes Requested |
Delegated to: | Wolfgang Denk |
Headers | show |
On Fri, Apr 05, 2013 at 04:11:10PM -0700, Simon Glass wrote: > When crc32 is handled by the hash library, it requires the data to be in > big-endian format, since it reads it byte-wise. Thus at present the 'crc32' > command reports incorrect data. For example, previously we might see: > > Peach # crc32 40000000 100 > CRC32 for 40000000 ... 400000ff ==> 0d968558 > > but instead with the hash library we see: > > Peach # crc32 40000000 100 > CRC32 for 40000000 ... 400000ff ==> 5885960d > > Correct this. > > Signed-off-by: Simon Glass <sjg@chromium.org> > Reviewed-by: Vadim Bendebury <vbendeb@google.com> > --- > lib/crc32.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/lib/crc32.c b/lib/crc32.c > index 76205da..94720bf 100644 > --- a/lib/crc32.c > +++ b/lib/crc32.c > @@ -10,6 +10,7 @@ > > #ifndef USE_HOSTCC > #include <common.h> > +#include <asm/unaligned.h> > #endif > #include <compiler.h> > #include <u-boot/crc.h> > @@ -256,5 +257,10 @@ void crc32_wd_buf(const unsigned char *input, unsigned int ilen, > uint32_t crc; > > crc = crc32_wd(0, input, ilen, chunk_sz); > +#ifdef USE_HOSTCC > + crc = htobe32(crc); > memcpy(output, &crc, sizeof(crc)); > +#else > + put_unaligned_be32(crc, output); > +#endif > } > -- > 1.8.1.3 > Tested on Tegra114 Dalmore, verified crc32 comes out as expected now. Reviewed-by: Allen Martin <amartin@nvidia.com> Tested-by: Allen Martin <amartin@nvidia.com>
Dear Simon Glass, In message <1365203470-9099-1-git-send-email-sjg@chromium.org> you wrote: > When crc32 is handled by the hash library, it requires the data to be in > big-endian format, since it reads it byte-wise. Thus at present the 'crc32' > command reports incorrect data. For example, previously we might see: > +#ifdef USE_HOSTCC > + crc = htobe32(crc); > memcpy(output, &crc, sizeof(crc)); > +#else > + put_unaligned_be32(crc, output); > +#endif Why is this depending on USE_HOSTCC, and not on the endianess? And why do we need the #ifdef? Can we not always use htobe32() and put_unaligned_be32() ? Best regards, Wolfgang Denk
Hi Wolfgang, On Sat, Apr 6, 2013 at 12:04 AM, Wolfgang Denk <wd@denx.de> wrote: > Dear Simon Glass, > > In message <1365203470-9099-1-git-send-email-sjg@chromium.org> you wrote: >> When crc32 is handled by the hash library, it requires the data to be in >> big-endian format, since it reads it byte-wise. Thus at present the 'crc32' >> command reports incorrect data. For example, previously we might see: > > >> +#ifdef USE_HOSTCC >> + crc = htobe32(crc); >> memcpy(output, &crc, sizeof(crc)); >> +#else >> + put_unaligned_be32(crc, output); >> +#endif > > Why is this depending on USE_HOSTCC, and not on the endianess? We always want big-endian in this case, since the bytes have to date been written that way by the crc32 command. > > And why do we need the #ifdef? Can we not always use htobe32() and > put_unaligned_be32() ? Well I don't think put_unaligned_be32 is available to user space, which is the environment that the tools are built under. It is available in the kernel, but that's not our environment. I'm not happy with this solution and would be pleased to find a better way, but I'm not sure what it is. But this patch does fix a real bug which we should sort out before the release, one way or another. > > Best regards, > > Wolfgang Denk Regards, Simon
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 04/16/2013 05:57 PM, Simon Glass wrote: > Hi Wolfgang, > > On Sat, Apr 6, 2013 at 12:04 AM, Wolfgang Denk <wd@denx.de> wrote: >> Dear Simon Glass, >> >> In message <1365203470-9099-1-git-send-email-sjg@chromium.org> >> you wrote: >>> When crc32 is handled by the hash library, it requires the data >>> to be in big-endian format, since it reads it byte-wise. Thus >>> at present the 'crc32' command reports incorrect data. For >>> example, previously we might see: >> >> >>> +#ifdef USE_HOSTCC + crc = htobe32(crc); memcpy(output, >>> &crc, sizeof(crc)); +#else + put_unaligned_be32(crc, >>> output); +#endif >> >> Why is this depending on USE_HOSTCC, and not on the endianess? > > We always want big-endian in this case, since the bytes have to > date been written that way by the crc32 command. In other words, this code is executed on both the host and the target, and there's not a uniform endian sanitizer function. - -- Tom -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBAgAGBQJRbdghAAoJENk4IS6UOR1WzzsP/j+JhIIkp9EK4YwDQ7H6F06V i60/Y/EaVg32HAqMjo66y2GCcu7ak9fobZo3wHnAjrFkaSJWq4s+eDNOtJiHbr/O ZwQbDJ/t5Yf7vuZ1OhBTII+pYHXYcDW+30QZpBP3+ydY8Zkg7tsAcQ5rH7KdHWuD 83k7DLjv9obn85eeVikNkfB7ONNFVRug07Obcd+jbXdQamc/VWxWZyvUwDiKGCYH eqmss/hQ7o343FWKqsVNSxd3/tF7z3PNevWm83xJxlc5xbtyJ/8kad6qkkILtOGd G5OOXnL7BpSqL2mxt5ruW+cwqOnp74SvoQXuM6lNZzeAmlirAginYbnDSvjQamIy DK1BQAjodXGU7nZxffw4vKZzbGzkgRl2KkuGvQfpMJzoJRyWvrbDzIVOeF/bXXQS UEvASOFAdqKpMPNJnIm16GUtH6/OyEWK+8HInFse7K19ycM4M/TpM2dhmVZrHG0S Q+Xq7ZI6pEq0SjMIfhuCwYJS6lqbtlQ5eOk+KoTYXOWneOzOgDGKO+El53wDwKQ5 0icIUwNKn4ZKMg7HLE25Docx3Ez6OVBD2Aelz0wlc5FMWlmLrHe9oYaufDN36bBH D5XrLeBDjj89mTOHl4V0U3tZ/1iLLFqxo9RyNG6lPkLOQD62vLPxjyPYqY8Q6q85 kPdToR/o/YfFk3EsiYpD =0O7d -----END PGP SIGNATURE-----
Hi Tom, On Tue, Apr 16, 2013 at 4:00 PM, Tom Rini <trini@ti.com> wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 04/16/2013 05:57 PM, Simon Glass wrote: >> Hi Wolfgang, >> >> On Sat, Apr 6, 2013 at 12:04 AM, Wolfgang Denk <wd@denx.de> wrote: >>> Dear Simon Glass, >>> >>> In message <1365203470-9099-1-git-send-email-sjg@chromium.org> >>> you wrote: >>>> When crc32 is handled by the hash library, it requires the data >>>> to be in big-endian format, since it reads it byte-wise. Thus >>>> at present the 'crc32' command reports incorrect data. For >>>> example, previously we might see: >>> >>> >>>> +#ifdef USE_HOSTCC + crc = htobe32(crc); memcpy(output, >>>> &crc, sizeof(crc)); +#else + put_unaligned_be32(crc, >>>> output); +#endif >>> >>> Why is this depending on USE_HOSTCC, and not on the endianess? >> >> We always want big-endian in this case, since the bytes have to >> date been written that way by the crc32 command. > > In other words, this code is executed on both the host and the target, > and there's not a uniform endian sanitizer function. Well to be honest, I don't think the code is needed on the host, and I could put the #ifdef around the whole function. I just thought that might be a bit short-sighted. Regards, Simon
Dear Simon Glass, In message <CAPnjgZ2jRVpQ56_EpVKUMH8E1L2LJ0HgzNCs-GrN9bXfOSXz+Q@mail.gmail.com> you wrote: > > >> +#ifdef USE_HOSTCC > >> + crc = htobe32(crc); > >> memcpy(output, &crc, sizeof(crc)); > >> +#else > >> + put_unaligned_be32(crc, output); > >> +#endif > > > > Why is this depending on USE_HOSTCC, and not on the endianess? > > We always want big-endian in this case, since the bytes have to date > been written that way by the crc32 command. Let me rephrase. Why do we need an #ifdef here, and why depends this on USE_HOSTCC? > > And why do we need the #ifdef? Can we not always use htobe32() and > > put_unaligned_be32() ? > > Well I don't think put_unaligned_be32 is available to user space, > which is the environment that the tools are built under. It is > available in the kernel, but that's not our environment. It appears put_unaligned_be32() is a widely unknown, pretty exotic function that so far has been used in ony two source files: drivers/usb/gadget/f_mass_storage.c lib/tpm.c The implementation (in "include/linux/unaligned/generic.h") is ugly and pretty expensive in terms of run time and memory footprint. I would like to avoid it's usage all together. > I'm not happy with this solution and would be pleased to find a better > way, but I'm not sure what it is. Does the htobe32() + memcpy() approach not work everywhere? > But this patch does fix a real bug which we should sort out before the > release, one way or another. Agreed. Best regards, Wolfgang Denk
Hi Wolfgang, On Tue, Apr 16, 2013 at 10:40 PM, Wolfgang Denk <wd@denx.de> wrote: > Dear Simon Glass, > > In message <CAPnjgZ2jRVpQ56_EpVKUMH8E1L2LJ0HgzNCs-GrN9bXfOSXz+Q@mail.gmail.com> you wrote: >> >> >> +#ifdef USE_HOSTCC >> >> + crc = htobe32(crc); >> >> memcpy(output, &crc, sizeof(crc)); >> >> +#else >> >> + put_unaligned_be32(crc, output); >> >> +#endif >> > >> > Why is this depending on USE_HOSTCC, and not on the endianess? >> >> We always want big-endian in this case, since the bytes have to date >> been written that way by the crc32 command. > > Let me rephrase. Why do we need an #ifdef here, and why depends this > on USE_HOSTCC? > >> > And why do we need the #ifdef? Can we not always use htobe32() and >> > put_unaligned_be32() ? >> >> Well I don't think put_unaligned_be32 is available to user space, >> which is the environment that the tools are built under. It is >> available in the kernel, but that's not our environment. > > It appears put_unaligned_be32() is a widely unknown, pretty exotic > function that so far has been used in ony two source files: > > drivers/usb/gadget/f_mass_storage.c > lib/tpm.c > > The implementation (in "include/linux/unaligned/generic.h") is ugly > and pretty expensive in terms of run time and memory footprint. > > I would like to avoid it's usage all together. It ends up producing this on ARMv7: 43e2c1d8: e1a03820 lsr r3, r0, #16 43e2c1dc: e5c40003 strb r0, [r4, #3] 43e2c1e0: e5c43001 strb r3, [r4, #1] 43e2c1e4: e1a02423 lsr r2, r3, #8 43e2c1e8: e7e73450 ubfx r3, r0, #8, #8 43e2c1ec: e5c42000 strb r2, [r4] 43e2c1f0: e5c43002 strb r3, [r4, #2] In fact with unaligned support we could optimise this. > >> I'm not happy with this solution and would be pleased to find a better >> way, but I'm not sure what it is. > > Does the htobe32() + memcpy() approach not work everywhere? But htobe32() isn't available in U-Boot. I tried using: #ifdef USE_HOSTCC crc = htobe32(crc); #else crc = cpu_to_be32(crc); #endif memcpy(output, &crc, sizeof(crc)); This is one instruction (4 bytes, 16%) smaller, but I suspect quite a lot slower due to the overhead of a very small memcpy(). 43e2c1d8: e28d1008 add r1, sp, #8 43e2c1dc: e3a02004 mov r2, #4 43e2c1e0: e6bf0f30 rev r0, r0 43e2c1e4: e5210004 str r0, [r1, #-4]! 43e2c1e8: e1a00004 mov r0, r4 43e2c1ec: eb001af7 bl 43e32dd0 <memcpy> > >> But this patch does fix a real bug which we should sort out before the >> release, one way or another. > > Agreed. Let me know what you prefer and I will update the patch. Regards, Simon
Hi Simon -- and sorry for the dupe. On Wed, 17 Apr 2013 11:28:07 -0700, Simon Glass <sjg@chromium.org> wrote: > I tried using: > > #ifdef USE_HOSTCC > crc = htobe32(crc); > #else > crc = cpu_to_be32(crc); > #endif > memcpy(output, &crc, sizeof(crc)); > > > This is one instruction (4 bytes, 16%) smaller, but I suspect quite a > lot slower due to the overhead of a very small memcpy(). > > 43e2c1d8: e28d1008 add r1, sp, #8 > 43e2c1dc: e3a02004 mov r2, #4 > 43e2c1e0: e6bf0f30 rev r0, r0 > 43e2c1e4: e5210004 str r0, [r1, #-4]! > 43e2c1e8: e1a00004 mov r0, r4 > 43e2c1ec: eb001af7 bl 43e32dd0 <memcpy> How about replacing the memcpy with an explicit put_unaligned(), similar to what was done in http://www.mail-archive.com/u-boot@lists.denx.de/msg109555.html with get_unaligned()? The code will be longer than above, but shorter than the above plus the memcpy(), and faster too -- actually, I'm surprised that the compiler does not unroll the memcpy() on its own, considering the size argument is a constant. > Regards, > Simon Amicalement,
Hi Albert, On Wed, Apr 17, 2013 at 12:23 PM, Albert ARIBAUD <albert.u.boot@aribaud.net> wrote: > Hi Simon -- and sorry for the dupe. > > On Wed, 17 Apr 2013 11:28:07 -0700, Simon Glass <sjg@chromium.org> > wrote: > >> I tried using: >> >> #ifdef USE_HOSTCC >> crc = htobe32(crc); >> #else >> crc = cpu_to_be32(crc); >> #endif >> memcpy(output, &crc, sizeof(crc)); >> >> >> This is one instruction (4 bytes, 16%) smaller, but I suspect quite a >> lot slower due to the overhead of a very small memcpy(). >> >> 43e2c1d8: e28d1008 add r1, sp, #8 >> 43e2c1dc: e3a02004 mov r2, #4 >> 43e2c1e0: e6bf0f30 rev r0, r0 >> 43e2c1e4: e5210004 str r0, [r1, #-4]! >> 43e2c1e8: e1a00004 mov r0, r4 >> 43e2c1ec: eb001af7 bl 43e32dd0 <memcpy> > > How about replacing the memcpy with an explicit put_unaligned(), > similar to what was done in > > http://www.mail-archive.com/u-boot@lists.denx.de/msg109555.html > > with get_unaligned()? The code will be longer than above, but shorter > than the above plus the memcpy(), and faster too -- actually, I'm > surprised that the compiler does not unroll the memcpy() on its own, > considering the size argument is a constant. Do you mean like this? #ifdef USE_HOSTCC crc = htobe32(crc); memcpy(output, &crc, sizeof(crc)); #else crc = cpu_to_be32(crc); put_unaligned(crc, (uint32_t *)output); #endif This produces the same code as my original patch. If this is acceptable then I will do that, although it doesn't really seem any better. Regards, Simon [and sorry for my dup]
Hi Simon, On Wed, 17 Apr 2013 13:59:48 -0700, Simon Glass <sjg@chromium.org> wrote: > Hi Albert, > > On Wed, Apr 17, 2013 at 12:23 PM, Albert ARIBAUD > <albert.u.boot@aribaud.net> wrote: > > Hi Simon -- and sorry for the dupe. > > > > On Wed, 17 Apr 2013 11:28:07 -0700, Simon Glass <sjg@chromium.org> > > wrote: > > > >> I tried using: > >> > >> #ifdef USE_HOSTCC > >> crc = htobe32(crc); > >> #else > >> crc = cpu_to_be32(crc); > >> #endif > >> memcpy(output, &crc, sizeof(crc)); > >> > >> > >> This is one instruction (4 bytes, 16%) smaller, but I suspect quite a > >> lot slower due to the overhead of a very small memcpy(). > >> > >> 43e2c1d8: e28d1008 add r1, sp, #8 > >> 43e2c1dc: e3a02004 mov r2, #4 > >> 43e2c1e0: e6bf0f30 rev r0, r0 > >> 43e2c1e4: e5210004 str r0, [r1, #-4]! > >> 43e2c1e8: e1a00004 mov r0, r4 > >> 43e2c1ec: eb001af7 bl 43e32dd0 <memcpy> > > > > How about replacing the memcpy with an explicit put_unaligned(), > > similar to what was done in > > > > http://www.mail-archive.com/u-boot@lists.denx.de/msg109555.html > > > > with get_unaligned()? The code will be longer than above, but shorter > > than the above plus the memcpy(), and faster too -- actually, I'm > > surprised that the compiler does not unroll the memcpy() on its own, > > considering the size argument is a constant. > > Do you mean like this? > > #ifdef USE_HOSTCC > crc = htobe32(crc); > memcpy(output, &crc, sizeof(crc)); > #else > crc = cpu_to_be32(crc); > put_unaligned(crc, (uint32_t *)output); > #endif > > This produces the same code as my original patch. If this is > acceptable then I will do that, although it doesn't really seem any > better. Wolfgang may not like it any more than put_unaligned_be32() as it builds upon it (and the disk patch could have used the be32 version as well). Personally, I think we should allow and use these... ... and work on optimizing their implementation for ARM; we should be able to reduce such put()s and get()s to a few instructions. And to avoid any misunderstanding, yes, I volunteer for the optimizing work. :) > Regards, > Simon > > [and sorry for my dup] Actually I'm the culprit. Amicalement,
Dear Albert ARIBAUD, In message <20130418082027.4b5ea191@lilith> you wrote: > > > #ifdef USE_HOSTCC > > crc = htobe32(crc); > > memcpy(output, &crc, sizeof(crc)); > > #else > > crc = cpu_to_be32(crc); > > put_unaligned(crc, (uint32_t *)output); > > #endif > > > > This produces the same code as my original patch. If this is > > acceptable then I will do that, although it doesn't really seem any > > better. > > Wolfgang may not like it any more than put_unaligned_be32() as it > builds upon it (and the disk patch could have used the be32 version as Indeed. > well). Personally, I think we should allow and use these... > > ... and work on optimizing their implementation for ARM; we should be > able to reduce such put()s and get()s to a few instructions. And to OK - and what about the other architectures that suffer from the same issues? > avoid any misunderstanding, yes, I volunteer for the optimizing work. :) I really dislike introducing such custom functions when we have standard functions available that implement the same purposes. Checking Linux code (as U-Boot is not representative here): -> find * -name '*.c' | wc -l 18362 -> find * -name '*.c' | xargs fgrep -l put_unaligned | wc -l 136 i. e. just 0.75% of the source files actually use any of the "put_unaligned*()" variants - it is a highly exotic function. htobe32() is even worse - just a single source file in the whole Lnux tree uses it (arch/um/drivers/cow_user.c). Can we not stick to standard functions? Instead of htobe32() we should use htonl() which is available both in U-Boot and in the host environment. Best regards, Wolfgang Denk
Hi Wolfgang, On Thu, 18 Apr 2013 12:36:00 +0200, Wolfgang Denk <wd@denx.de> wrote: > Dear Albert ARIBAUD, > > In message <20130418082027.4b5ea191@lilith> you wrote: > > > > > #ifdef USE_HOSTCC > > > crc = htobe32(crc); > > > memcpy(output, &crc, sizeof(crc)); > > > #else > > > crc = cpu_to_be32(crc); > > > put_unaligned(crc, (uint32_t *)output); > > > #endif > > > > > > This produces the same code as my original patch. If this is > > > acceptable then I will do that, although it doesn't really seem any > > > better. > > > > Wolfgang may not like it any more than put_unaligned_be32() as it > > builds upon it (and the disk patch could have used the be32 version as > > Indeed. > > > well). Personally, I think we should allow and use these... > > > > ... and work on optimizing their implementation for ARM; we should be > > able to reduce such put()s and get()s to a few instructions. And to > > OK - and what about the other architectures that suffer from the same > issues? They should/could provide their own optimized versions, obviously. > > avoid any misunderstanding, yes, I volunteer for the optimizing work. :) > > I really dislike introducing such custom functions when we have > standard functions available that implement the same purposes. I understand the point; this is a classical opposition of generic vs optimized. > Checking Linux code (as U-Boot is not representative here): > > -> find * -name '*.c' | wc -l > 18362 > -> find * -name '*.c' | xargs fgrep -l put_unaligned | wc -l > 136 > > i. e. just 0.75% of the source files actually use any of the > "put_unaligned*()" variants - it is a highly exotic function. > > htobe32() is even worse - just a single source file in the whole Lnux > tree uses it (arch/um/drivers/cow_user.c). > > > Can we not stick to standard functions? Instead of htobe32() we > should use htonl() which is available both in U-Boot and in the host > environment. Note that there are two problems here: endianness conversion and unaligned storage. We can indeed use htoln() instead of htobe32(), but that only affects/solved endianness issues, not alignment ones, for which there is no solution that is efficient across all ARM targets. Note too that if the hash infrastructure mandated that the output buffer be 4-byte-aligned, i.e. a u32* or similar, we would not even have to worry about unalignment; such a constraint on the output buffer makes all the more sense if we consider the fact that current hash sizes are 4 (crc32), 20 (SHA1) and 32 (SHA256), all multiples of 4, and future hashes will most certainly not be less aligned. So how about changing the element type of output in the definition of hash_func_ws, adapting the corresponding implementations sha1_csum_wd, sha256_csum_wd and crc32_wd_buf, and adapting the output argument of the sole call to hash_func_ws, that is, the local "u8 output[HASH_MAX_DIGEST_SIZE];" in hash.c? Then we'be done with alignment. Note finally that we *need* unaligned access macros/inline functions/whatever, if only for exceptions laid out in doc/README.arm-unaligned-accesses. If we avoid calling them too often, we can live with generic. > Best regards, > > Wolfgang Denk Amicalement,
Dear Albert, In message <20130418131810.38c916b8@lilith> you wrote: > > > OK - and what about the other architectures that suffer from the same > > issues? > > They should/could provide their own optimized versions, obviously. Well, if this was a generally needed or even useful service, that might make sense. But it is highly exotic stuff... > > I really dislike introducing such custom functions when we have > > standard functions available that implement the same purposes. > > I understand the point; this is a classical opposition of generic vs > optimized. Not really. It is more opposition against premature optimization at the cost of non-standard code. Is there any justification that we really need hightly optimized versus generic code here? Is this used anywhere in a place where performance is critical? > > Can we not stick to standard functions? Instead of htobe32() we > > should use htonl() which is available both in U-Boot and in the host > > environment. > > Note that there are two problems here: endianness conversion and > unaligned storage. We can indeed use htoln() instead of htobe32(), but > that only affects/solved endianness issues, not alignment ones, for > which there is no solution that is efficient across all ARM targets. Correct. But the htobe32() approach was cobining this with the generic memcpy(), too, so we would do the same with htoln(). > So how about changing the element type of output in the definition of > hash_func_ws, adapting the corresponding implementations sha1_csum_wd, > sha256_csum_wd and crc32_wd_buf, and adapting the output argument > of the sole call to hash_func_ws, that is, the local "u8 > output[HASH_MAX_DIGEST_SIZE];" in hash.c? Then we'be done with > alignment. OK, but that would be a totally different approach (which appears to be a good one, here). > Note finally that we *need* unaligned access macros/inline > functions/whatever, if only for exceptions laid out in > doc/README.arm-unaligned-accesses. If we avoid calling them too often, > we can live with generic. Agreed; the focus should always be on avoidance, though. Best regards, Wolfgang Denk
Hi Wolfgang, On Thu, 18 Apr 2013 16:39:09 +0200, Wolfgang Denk <wd@denx.de> wrote: > > So how about changing the element type of output in the definition of > > hash_func_ws, adapting the corresponding implementations sha1_csum_wd, > > sha256_csum_wd and crc32_wd_buf, and adapting the output argument > > of the sole call to hash_func_ws, that is, the local "u8 > > output[HASH_MAX_DIGEST_SIZE];" in hash.c? Then we'be done with > > alignment. > > OK, but that would be a totally different approach (which appears to > be a good one, here). Indeed; I would suggest splitting this change in two independent ones: - fix the endianness issue: change the endianness of crc through the use of htonl() but leave the existing memcpy() in place as it is, even though it is not speed-optimized. That's what Simon's patch does if the HOSTCC part is ignored; - fix the unalignment issue by changing parameter 'output' of function type 'hash_func_ws' from u8* to u32* and adjusting the rest of the code accordingly -- which would lead to replacing the crc32 final memcpy() with a single indirect assignment. These two changes could be submitted either separately, or as a series. Amicalement,
On Thu, Apr 18, 2013 at 06:39:29PM +0200, Albert ARIBAUD wrote: > Hi Wolfgang, > > On Thu, 18 Apr 2013 16:39:09 +0200, Wolfgang Denk <wd@denx.de> wrote: > > > > So how about changing the element type of output in the definition of > > > hash_func_ws, adapting the corresponding implementations sha1_csum_wd, > > > sha256_csum_wd and crc32_wd_buf, and adapting the output argument > > > of the sole call to hash_func_ws, that is, the local "u8 > > > output[HASH_MAX_DIGEST_SIZE];" in hash.c? Then we'be done with > > > alignment. > > > > OK, but that would be a totally different approach (which appears to > > be a good one, here). > > Indeed; I would suggest splitting this change in two independent ones: > > - fix the endianness issue: change the endianness of crc through the > use of htonl() but leave the existing memcpy() in place as it is, > even though it is not speed-optimized. That's what Simon's patch > does if the HOSTCC part is ignored; > > - fix the unalignment issue by changing parameter 'output' of function > type 'hash_func_ws' from u8* to u32* and adjusting the rest of the > code accordingly -- which would lead to replacing the crc32 final > memcpy() with a single indirect assignment. > > These two changes could be submitted either separately, or as a series. Now so that I'm clear, if we don't do anything about the unaligned issue today, it's just slow, not an unaligned access that leads to abort, right? So part one today for release, part two next week after release.
Hi Tom, On Thu, 18 Apr 2013 12:58:54 -0400, Tom Rini <trini@ti.com> wrote: > On Thu, Apr 18, 2013 at 06:39:29PM +0200, Albert ARIBAUD wrote: > > Hi Wolfgang, > > > > On Thu, 18 Apr 2013 16:39:09 +0200, Wolfgang Denk <wd@denx.de> wrote: > > > > > > So how about changing the element type of output in the definition of > > > > hash_func_ws, adapting the corresponding implementations sha1_csum_wd, > > > > sha256_csum_wd and crc32_wd_buf, and adapting the output argument > > > > of the sole call to hash_func_ws, that is, the local "u8 > > > > output[HASH_MAX_DIGEST_SIZE];" in hash.c? Then we'be done with > > > > alignment. > > > > > > OK, but that would be a totally different approach (which appears to > > > be a good one, here). > > > > Indeed; I would suggest splitting this change in two independent ones: > > > > - fix the endianness issue: change the endianness of crc through the > > use of htonl() but leave the existing memcpy() in place as it is, > > even though it is not speed-optimized. That's what Simon's patch > > does if the HOSTCC part is ignored; > > > > - fix the unalignment issue by changing parameter 'output' of function > > type 'hash_func_ws' from u8* to u32* and adjusting the rest of the > > code accordingly -- which would lead to replacing the crc32 final > > memcpy() with a single indirect assignment. > > > > These two changes could be submitted either separately, or as a series. > > Now so that I'm clear, if we don't do anything about the unaligned issue > today, it's just slow, not an unaligned access that leads to abort, > right? So part one today for release, part two next week after release. Yes, the current code is just slower than it could be, but works, and so would it with Simon's patch as long as it keeps the memcpy(). Amicalement,
Hi, On Thu, Apr 18, 2013 at 11:43 AM, Albert ARIBAUD <albert.u.boot@aribaud.net> wrote: > Hi Tom, > > On Thu, 18 Apr 2013 12:58:54 -0400, Tom Rini <trini@ti.com> wrote: > >> On Thu, Apr 18, 2013 at 06:39:29PM +0200, Albert ARIBAUD wrote: >> > Hi Wolfgang, >> > >> > On Thu, 18 Apr 2013 16:39:09 +0200, Wolfgang Denk <wd@denx.de> wrote: >> > >> > > > So how about changing the element type of output in the definition of >> > > > hash_func_ws, adapting the corresponding implementations sha1_csum_wd, >> > > > sha256_csum_wd and crc32_wd_buf, and adapting the output argument >> > > > of the sole call to hash_func_ws, that is, the local "u8 >> > > > output[HASH_MAX_DIGEST_SIZE];" in hash.c? Then we'be done with >> > > > alignment. >> > > >> > > OK, but that would be a totally different approach (which appears to >> > > be a good one, here). >> > >> > Indeed; I would suggest splitting this change in two independent ones: >> > >> > - fix the endianness issue: change the endianness of crc through the >> > use of htonl() but leave the existing memcpy() in place as it is, >> > even though it is not speed-optimized. That's what Simon's patch >> > does if the HOSTCC part is ignored; >> > >> > - fix the unalignment issue by changing parameter 'output' of function >> > type 'hash_func_ws' from u8* to u32* and adjusting the rest of the >> > code accordingly -- which would lead to replacing the crc32 final >> > memcpy() with a single indirect assignment. >> > >> > These two changes could be submitted either separately, or as a series. >> >> Now so that I'm clear, if we don't do anything about the unaligned issue >> today, it's just slow, not an unaligned access that leads to abort, >> right? So part one today for release, part two next week after release. > > Yes, the current code is just slower than it could be, but works, and > so would it with Simon's patch as long as it keeps the memcpy(). Yes the alignment issue is manufactured IMO by the char * used for the function signature and I did not feel comfortable declaring that it must be word aligned. To be it seems that the type should be naturally aligned. What do people think about that? Anyway, that's why I ended up with this patch, which I felt was small enough to be accepted as a bug fix for the release. Albert's email exactly mirrors my thinking on this - and yes I would be much more comfortable not having to create part 2 for the release. Please let me know what I should do with this patch for now. Regards, Simon
On Thu, Apr 18, 2013 at 12:06:47PM -0700, Simon Glass wrote: > Hi, > > On Thu, Apr 18, 2013 at 11:43 AM, Albert ARIBAUD > <albert.u.boot@aribaud.net> wrote: > > Hi Tom, > > > > On Thu, 18 Apr 2013 12:58:54 -0400, Tom Rini <trini@ti.com> wrote: > > > >> On Thu, Apr 18, 2013 at 06:39:29PM +0200, Albert ARIBAUD wrote: > >> > Hi Wolfgang, > >> > > >> > On Thu, 18 Apr 2013 16:39:09 +0200, Wolfgang Denk <wd@denx.de> wrote: > >> > > >> > > > So how about changing the element type of output in the definition of > >> > > > hash_func_ws, adapting the corresponding implementations sha1_csum_wd, > >> > > > sha256_csum_wd and crc32_wd_buf, and adapting the output argument > >> > > > of the sole call to hash_func_ws, that is, the local "u8 > >> > > > output[HASH_MAX_DIGEST_SIZE];" in hash.c? Then we'be done with > >> > > > alignment. > >> > > > >> > > OK, but that would be a totally different approach (which appears to > >> > > be a good one, here). > >> > > >> > Indeed; I would suggest splitting this change in two independent ones: > >> > > >> > - fix the endianness issue: change the endianness of crc through the > >> > use of htonl() but leave the existing memcpy() in place as it is, > >> > even though it is not speed-optimized. That's what Simon's patch > >> > does if the HOSTCC part is ignored; > >> > > >> > - fix the unalignment issue by changing parameter 'output' of function > >> > type 'hash_func_ws' from u8* to u32* and adjusting the rest of the > >> > code accordingly -- which would lead to replacing the crc32 final > >> > memcpy() with a single indirect assignment. > >> > > >> > These two changes could be submitted either separately, or as a series. > >> > >> Now so that I'm clear, if we don't do anything about the unaligned issue > >> today, it's just slow, not an unaligned access that leads to abort, > >> right? So part one today for release, part two next week after release. > > > > Yes, the current code is just slower than it could be, but works, and > > so would it with Simon's patch as long as it keeps the memcpy(). > > Yes the alignment issue is manufactured IMO by the char * used for the > function signature and I did not feel comfortable declaring that it > must be word aligned. To be it seems that the type should be naturally > aligned. What do people think about that? > > Anyway, that's why I ended up with this patch, which I felt was small > enough to be accepted as a bug fix for the release. > > Albert's email exactly mirrors my thinking on this - and yes I would > be much more comfortable not having to create part 2 for the release. > > Please let me know what I should do with this patch for now. Lets follow the outline Albert made above, two patches, #2 depends on #1 and #1 is now, #2 is next week.
diff --git a/lib/crc32.c b/lib/crc32.c index 76205da..94720bf 100644 --- a/lib/crc32.c +++ b/lib/crc32.c @@ -10,6 +10,7 @@ #ifndef USE_HOSTCC #include <common.h> +#include <asm/unaligned.h> #endif #include <compiler.h> #include <u-boot/crc.h> @@ -256,5 +257,10 @@ void crc32_wd_buf(const unsigned char *input, unsigned int ilen, uint32_t crc; crc = crc32_wd(0, input, ilen, chunk_sz); +#ifdef USE_HOSTCC + crc = htobe32(crc); memcpy(output, &crc, sizeof(crc)); +#else + put_unaligned_be32(crc, output); +#endif }