diff mbox

target-arm: crypto: fix BE host support

Message ID 1420208303-24111-1-git-send-email-ard.biesheuvel@linaro.org
State New
Headers show

Commit Message

Ard Biesheuvel Jan. 2, 2015, 2:18 p.m. UTC
The crypto emulation code in target-arm/crypto_helper.c never worked
correctly on big endian hosts, due to the fact that it uses a union
of array types to convert between the native VFP register size (64
bits) and the types used in the algorithms (bytes and 32 bit words)

We cannot just swab between LE and BE when reading and writing the
registers, as the SHA code performs word additions, so instead, add
array accessors for the CRYPTO_STATE type whose LE and BE specific
implementations ensure that the correct array elements are referenced.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---

Only tested on a LE (amd64) host, as I don't have access to a BE host.

 target-arm/crypto_helper.c | 114 +++++++++++++++++++++++++--------------------
 1 file changed, 63 insertions(+), 51 deletions(-)

Comments

Laszlo Ersek Jan. 2, 2015, 3:17 p.m. UTC | #1
On 01/02/15 15:18, Ard Biesheuvel wrote:
> The crypto emulation code in target-arm/crypto_helper.c never worked
> correctly on big endian hosts, due to the fact that it uses a union
> of array types to convert between the native VFP register size (64
> bits) and the types used in the algorithms (bytes and 32 bit words)
> 
> We cannot just swab between LE and BE when reading and writing the
> registers, as the SHA code performs word additions, so instead, add
> array accessors for the CRYPTO_STATE type whose LE and BE specific
> implementations ensure that the correct array elements are referenced.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
> 
> Only tested on a LE (amd64) host, as I don't have access to a BE host.
> 
>  target-arm/crypto_helper.c | 114 +++++++++++++++++++++++++--------------------
>  1 file changed, 63 insertions(+), 51 deletions(-)
> 
> diff --git a/target-arm/crypto_helper.c b/target-arm/crypto_helper.c
> index dd60d0b81a4c..1fe975d0f1e3 100644
> --- a/target-arm/crypto_helper.c
> +++ b/target-arm/crypto_helper.c
> @@ -22,6 +22,14 @@ union CRYPTO_STATE {
>      uint64_t   l[2];
>  };
>  
> +#ifdef HOST_WORDS_BIGENDIAN
> +#define CR_ST_BYTE(state, i)   (state.bytes[(15 - (i)) ^ 8])

Produces

7, 6, 5, 4, 3, 2, 1, 0,
15, 14, 13, 12, 11, 10, 9

which I think is consistent with what Peter wrote.

> +#define CR_ST_WORD(state, i)   (state.words[(3 - (i)) ^ 2])

Hm. The indices look good:

1, 0
3, 2

But, assuming you're on a big-endian host, perhaps you should byte swap
the uint32_t too?...

> (ie if you store 0x112233445566778899aabbccddeeff00 as a 64 bit write
> to VFP register D0 then regs[0] will be
> 0x112233445566778899aabbccddeeff00 regardless of host endianness. That
> is, the least significant 8 bits of D0 will be (regs[0] & 0xff). (This
> isn't the same number as if you do the union-type-punning thing with
> union { uint64_t l; uint8_t b[8]; } and look at b[0].)

Assume the guest writes value 0x112233445566778899aabbccddeeff00 (LE
representation).

* Assuming a little endian host, we get:

  [00 ff ee dd cc bb aa 99 88 77 66 55 44 33 22 11]

(full 128-bit LE vector).

This is grouped into units of four bytes for the words array:

  [00 ff ee dd] [cc bb aa 99] [88 77 66 55] [44 33 22 11]

And the meaning of the individual uint32_t's is:

  0xddeeff00 0x99aabbcc 0x55667788 0x11223344

Which are the building blocks for the crypto primitives.

* Assuming a big endian host, you get the following byte array for the
same store (if I understand Peter right -- two uint64_t values are
encoded as host endian, but the ordering between those remains little
endian):

  [99 aa bb cc dd ee ff 00   11 22 33 44 55 66 77 88]

(and your CR_ST_BYTE() macro looks right for this).

Grouping this into units of four bytes for the words array, we get

  [99 aa bb cc] [dd ee ff 00] [11 22 33 44] [55 66 77 88]

resulting in values

  0x99aabbcc 0xddeeff00 0x11223344 0x55667788

And then your CR_ST_WORD macro permutes them (1, 0, 3, 2) to the
following values:

  0xddeeff00 0x99aabbcc 0x55667788 0x11223344

... Which is identical to the LE host result.

So this part looks fine to me.

Regarding the rest -- in my opinion, getting the implementation of
crypto primitivies *wrong* despite them succesfully passing an extensive
test suite is exceedingly unlikely. You usually cannot get crypto
primitives "halfway right" -- as long as you don't *combine* them to
bigger building blocks, crypto primitives are 100% specified and there
are no "corner cases" that are usual for other program logic. The
implementation either works or is catastrophically broken (which is
quickly visible).

Hence, if Peter gets good test results for this patchset, then I think
that *suffices* to apply the patch.

... Which is why I won't try to eyeball the rest of the patch where you
put the macros to use. :) The macros themselves are sound. If you broke
their application, they won't pass the test suite (in the kernel bootup
code, or elsewhere).

Acked-by: Laszlo Ersek <lersek@redhat.com>

Thanks
Laszlo

> +#else
> +#define CR_ST_BYTE(state, i)   (state.bytes[i])
> +#define CR_ST_WORD(state, i)   (state.words[i])
> +#endif
> +
>  void HELPER(crypto_aese)(CPUARMState *env, uint32_t rd, uint32_t rm,
>                           uint32_t decrypt)
>  {
> @@ -46,7 +54,7 @@ void HELPER(crypto_aese)(CPUARMState *env, uint32_t rd, uint32_t rm,
>  
>      /* combine ShiftRows operation and sbox substitution */
>      for (i = 0; i < 16; i++) {
> -        st.bytes[i] = sbox[decrypt][rk.bytes[shift[decrypt][i]]];
> +        CR_ST_BYTE(st, i) = sbox[decrypt][CR_ST_BYTE(rk, shift[decrypt][i])];
>      }
>  
>      env->vfp.regs[rd] = make_float64(st.l[0]);
> @@ -198,11 +206,11 @@ void HELPER(crypto_aesmc)(CPUARMState *env, uint32_t rd, uint32_t rm,
>      assert(decrypt < 2);
>  
>      for (i = 0; i < 16; i += 4) {
> -        st.words[i >> 2] = cpu_to_le32(
> -            mc[decrypt][st.bytes[i]] ^
> -            rol32(mc[decrypt][st.bytes[i + 1]], 8) ^
> -            rol32(mc[decrypt][st.bytes[i + 2]], 16) ^
> -            rol32(mc[decrypt][st.bytes[i + 3]], 24));
> +        CR_ST_WORD(st, i >> 2) =
> +            mc[decrypt][CR_ST_BYTE(st, i)] ^
> +            rol32(mc[decrypt][CR_ST_BYTE(st, i + 1)], 8) ^
> +            rol32(mc[decrypt][CR_ST_BYTE(st, i + 2)], 16) ^
> +            rol32(mc[decrypt][CR_ST_BYTE(st, i + 3)], 24);
>      }
>  
>      env->vfp.regs[rd] = make_float64(st.l[0]);
> @@ -255,24 +263,25 @@ void HELPER(crypto_sha1_3reg)(CPUARMState *env, uint32_t rd, uint32_t rn,
>  
>              switch (op) {
>              case 0: /* sha1c */
> -                t = cho(d.words[1], d.words[2], d.words[3]);
> +                t = cho(CR_ST_WORD(d, 1), CR_ST_WORD(d, 2), CR_ST_WORD(d, 3));
>                  break;
>              case 1: /* sha1p */
> -                t = par(d.words[1], d.words[2], d.words[3]);
> +                t = par(CR_ST_WORD(d, 1), CR_ST_WORD(d, 2), CR_ST_WORD(d, 3));
>                  break;
>              case 2: /* sha1m */
> -                t = maj(d.words[1], d.words[2], d.words[3]);
> +                t = maj(CR_ST_WORD(d, 1), CR_ST_WORD(d, 2), CR_ST_WORD(d, 3));
>                  break;
>              default:
>                  g_assert_not_reached();
>              }
> -            t += rol32(d.words[0], 5) + n.words[0] + m.words[i];
> -
> -            n.words[0] = d.words[3];
> -            d.words[3] = d.words[2];
> -            d.words[2] = ror32(d.words[1], 2);
> -            d.words[1] = d.words[0];
> -            d.words[0] = t;
> +            t += rol32(CR_ST_WORD(d, 0), 5) + CR_ST_WORD(n, 0)
> +                 + CR_ST_WORD(m, i);
> +
> +            CR_ST_WORD(n, 0) = CR_ST_WORD(d, 3);
> +            CR_ST_WORD(d, 3) = CR_ST_WORD(d, 2);
> +            CR_ST_WORD(d, 2) = ror32(CR_ST_WORD(d, 1), 2);
> +            CR_ST_WORD(d, 1) = CR_ST_WORD(d, 0);
> +            CR_ST_WORD(d, 0) = t;
>          }
>      }
>      env->vfp.regs[rd] = make_float64(d.l[0]);
> @@ -286,8 +295,8 @@ void HELPER(crypto_sha1h)(CPUARMState *env, uint32_t rd, uint32_t rm)
>          float64_val(env->vfp.regs[rm + 1])
>      } };
>  
> -    m.words[0] = ror32(m.words[0], 2);
> -    m.words[1] = m.words[2] = m.words[3] = 0;
> +    CR_ST_WORD(m, 0) = ror32(CR_ST_WORD(m, 0), 2);
> +    CR_ST_WORD(m, 1) = CR_ST_WORD(m, 2) = CR_ST_WORD(m, 3) = 0;
>  
>      env->vfp.regs[rd] = make_float64(m.l[0]);
>      env->vfp.regs[rd + 1] = make_float64(m.l[1]);
> @@ -304,10 +313,10 @@ void HELPER(crypto_sha1su1)(CPUARMState *env, uint32_t rd, uint32_t rm)
>          float64_val(env->vfp.regs[rm + 1])
>      } };
>  
> -    d.words[0] = rol32(d.words[0] ^ m.words[1], 1);
> -    d.words[1] = rol32(d.words[1] ^ m.words[2], 1);
> -    d.words[2] = rol32(d.words[2] ^ m.words[3], 1);
> -    d.words[3] = rol32(d.words[3] ^ d.words[0], 1);
> +    CR_ST_WORD(d, 0) = rol32(CR_ST_WORD(d, 0) ^ CR_ST_WORD(m, 1), 1);
> +    CR_ST_WORD(d, 1) = rol32(CR_ST_WORD(d, 1) ^ CR_ST_WORD(m, 2), 1);
> +    CR_ST_WORD(d, 2) = rol32(CR_ST_WORD(d, 2) ^ CR_ST_WORD(m, 3), 1);
> +    CR_ST_WORD(d, 3) = rol32(CR_ST_WORD(d, 3) ^ CR_ST_WORD(d, 0), 1);
>  
>      env->vfp.regs[rd] = make_float64(d.l[0]);
>      env->vfp.regs[rd + 1] = make_float64(d.l[1]);
> @@ -356,20 +365,22 @@ void HELPER(crypto_sha256h)(CPUARMState *env, uint32_t rd, uint32_t rn,
>      int i;
>  
>      for (i = 0; i < 4; i++) {
> -        uint32_t t = cho(n.words[0], n.words[1], n.words[2]) + n.words[3]
> -                     + S1(n.words[0]) + m.words[i];
> -
> -        n.words[3] = n.words[2];
> -        n.words[2] = n.words[1];
> -        n.words[1] = n.words[0];
> -        n.words[0] = d.words[3] + t;
> -
> -        t += maj(d.words[0], d.words[1], d.words[2]) + S0(d.words[0]);
> -
> -        d.words[3] = d.words[2];
> -        d.words[2] = d.words[1];
> -        d.words[1] = d.words[0];
> -        d.words[0] = t;
> +        uint32_t t = cho(CR_ST_WORD(n, 0), CR_ST_WORD(n, 1), CR_ST_WORD(n, 2))
> +                     + CR_ST_WORD(n, 3) + S1(CR_ST_WORD(n, 0))
> +                     + CR_ST_WORD(m, i);
> +
> +        CR_ST_WORD(n, 3) = CR_ST_WORD(n, 2);
> +        CR_ST_WORD(n, 2) = CR_ST_WORD(n, 1);
> +        CR_ST_WORD(n, 1) = CR_ST_WORD(n, 0);
> +        CR_ST_WORD(n, 0) = CR_ST_WORD(d, 3) + t;
> +
> +        t += maj(CR_ST_WORD(d, 0), CR_ST_WORD(d, 1), CR_ST_WORD(d, 2))
> +             + S0(CR_ST_WORD(d, 0));
> +
> +        CR_ST_WORD(d, 3) = CR_ST_WORD(d, 2);
> +        CR_ST_WORD(d, 2) = CR_ST_WORD(d, 1);
> +        CR_ST_WORD(d, 1) = CR_ST_WORD(d, 0);
> +        CR_ST_WORD(d, 0) = t;
>      }
>  
>      env->vfp.regs[rd] = make_float64(d.l[0]);
> @@ -394,13 +405,14 @@ void HELPER(crypto_sha256h2)(CPUARMState *env, uint32_t rd, uint32_t rn,
>      int i;
>  
>      for (i = 0; i < 4; i++) {
> -        uint32_t t = cho(d.words[0], d.words[1], d.words[2]) + d.words[3]
> -                     + S1(d.words[0]) + m.words[i];
> -
> -        d.words[3] = d.words[2];
> -        d.words[2] = d.words[1];
> -        d.words[1] = d.words[0];
> -        d.words[0] = n.words[3 - i] + t;
> +        uint32_t t = cho(CR_ST_WORD(d, 0), CR_ST_WORD(d, 1), CR_ST_WORD(d, 2))
> +                     + CR_ST_WORD(d, 3) + S1(CR_ST_WORD(d, 0))
> +                     + CR_ST_WORD(m, i);
> +
> +        CR_ST_WORD(d, 3) = CR_ST_WORD(d, 2);
> +        CR_ST_WORD(d, 2) = CR_ST_WORD(d, 1);
> +        CR_ST_WORD(d, 1) = CR_ST_WORD(d, 0);
> +        CR_ST_WORD(d, 0) = CR_ST_WORD(n, 3 - i) + t;
>      }
>  
>      env->vfp.regs[rd] = make_float64(d.l[0]);
> @@ -418,10 +430,10 @@ void HELPER(crypto_sha256su0)(CPUARMState *env, uint32_t rd, uint32_t rm)
>          float64_val(env->vfp.regs[rm + 1])
>      } };
>  
> -    d.words[0] += s0(d.words[1]);
> -    d.words[1] += s0(d.words[2]);
> -    d.words[2] += s0(d.words[3]);
> -    d.words[3] += s0(m.words[0]);
> +    CR_ST_WORD(d, 0) += s0(CR_ST_WORD(d, 1));
> +    CR_ST_WORD(d, 1) += s0(CR_ST_WORD(d, 2));
> +    CR_ST_WORD(d, 2) += s0(CR_ST_WORD(d, 3));
> +    CR_ST_WORD(d, 3) += s0(CR_ST_WORD(m, 0));
>  
>      env->vfp.regs[rd] = make_float64(d.l[0]);
>      env->vfp.regs[rd + 1] = make_float64(d.l[1]);
> @@ -443,10 +455,10 @@ void HELPER(crypto_sha256su1)(CPUARMState *env, uint32_t rd, uint32_t rn,
>          float64_val(env->vfp.regs[rm + 1])
>      } };
>  
> -    d.words[0] += s1(m.words[2]) + n.words[1];
> -    d.words[1] += s1(m.words[3]) + n.words[2];
> -    d.words[2] += s1(d.words[0]) + n.words[3];
> -    d.words[3] += s1(d.words[1]) + m.words[0];
> +    CR_ST_WORD(d, 0) += s1(CR_ST_WORD(m, 2)) + CR_ST_WORD(n, 1);
> +    CR_ST_WORD(d, 1) += s1(CR_ST_WORD(m, 3)) + CR_ST_WORD(n, 2);
> +    CR_ST_WORD(d, 2) += s1(CR_ST_WORD(d, 0)) + CR_ST_WORD(n, 3);
> +    CR_ST_WORD(d, 3) += s1(CR_ST_WORD(d, 1)) + CR_ST_WORD(m, 0);
>  
>      env->vfp.regs[rd] = make_float64(d.l[0]);
>      env->vfp.regs[rd + 1] = make_float64(d.l[1]);
>
Peter Maydell Jan. 2, 2015, 5:36 p.m. UTC | #2
On 2 January 2015 at 15:17, Laszlo Ersek <lersek@redhat.com> wrote:
> On 01/02/15 15:18, Ard Biesheuvel wrote:
>> (ie if you store 0x112233445566778899aabbccddeeff00 as a 64 bit write
>> to VFP register D0 then regs[0] will be
>> 0x112233445566778899aabbccddeeff00 regardless of host endianness. That
>> is, the least significant 8 bits of D0 will be (regs[0] & 0xff). (This
>> isn't the same number as if you do the union-type-punning thing with
>> union { uint64_t l; uint8_t b[8]; } and look at b[0].)

This example is confusing because I carefully said "64 bit write"
and then used a 128 bit constant. What I meant was:

 ie if you store 0x1122334455667788 as a 64 bit write
 to VFP register D0 then regs[0] will be
 0x1122334455667788 regardless of host endianness.

For 128 bit vectors, if you store
0x112233445566778899aabbccddeeff00 to Q0 then you get
regs[0] == 0x99aabbccddeeff00
regs[1] == 0x1122334455667788

(as is required architecturally in order for a subsequent guest
read from D0 to do the right thing).

-- PMM
Laszlo Ersek Jan. 2, 2015, 7:18 p.m. UTC | #3
On 01/02/15 18:36, Peter Maydell wrote:
> On 2 January 2015 at 15:17, Laszlo Ersek <lersek@redhat.com> wrote:
>> On 01/02/15 15:18, Ard Biesheuvel wrote:
>>> (ie if you store 0x112233445566778899aabbccddeeff00 as a 64 bit write
>>> to VFP register D0 then regs[0] will be
>>> 0x112233445566778899aabbccddeeff00 regardless of host endianness. That
>>> is, the least significant 8 bits of D0 will be (regs[0] & 0xff). (This
>>> isn't the same number as if you do the union-type-punning thing with
>>> union { uint64_t l; uint8_t b[8]; } and look at b[0].)
> 
> This example is confusing because I carefully said "64 bit write"
> and then used a 128 bit constant. What I meant was:
> 
>  ie if you store 0x1122334455667788 as a 64 bit write
>  to VFP register D0 then regs[0] will be
>  0x1122334455667788 regardless of host endianness.
> 
> For 128 bit vectors, if you store
> 0x112233445566778899aabbccddeeff00 to Q0 then you get
> regs[0] == 0x99aabbccddeeff00
> regs[1] == 0x1122334455667788
> 
> (as is required architecturally in order for a subsequent guest
> read from D0 to do the right thing).

Thank you for the correction -- but I think that's exactly what I worked
with in the rest of my email, don't you find?

Thanks
Laszlo
Ard Biesheuvel Jan. 2, 2015, 7:21 p.m. UTC | #4
On 2 January 2015 at 15:17, Laszlo Ersek <lersek@redhat.com> wrote:
> On 01/02/15 15:18, Ard Biesheuvel wrote:
>> The crypto emulation code in target-arm/crypto_helper.c never worked
>> correctly on big endian hosts, due to the fact that it uses a union
>> of array types to convert between the native VFP register size (64
>> bits) and the types used in the algorithms (bytes and 32 bit words)
>>
>> We cannot just swab between LE and BE when reading and writing the
>> registers, as the SHA code performs word additions, so instead, add
>> array accessors for the CRYPTO_STATE type whose LE and BE specific
>> implementations ensure that the correct array elements are referenced.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>>
>> Only tested on a LE (amd64) host, as I don't have access to a BE host.
>>
>>  target-arm/crypto_helper.c | 114 +++++++++++++++++++++++++--------------------
>>  1 file changed, 63 insertions(+), 51 deletions(-)
>>
>> diff --git a/target-arm/crypto_helper.c b/target-arm/crypto_helper.c
>> index dd60d0b81a4c..1fe975d0f1e3 100644
>> --- a/target-arm/crypto_helper.c
>> +++ b/target-arm/crypto_helper.c
>> @@ -22,6 +22,14 @@ union CRYPTO_STATE {
>>      uint64_t   l[2];
>>  };
>>
>> +#ifdef HOST_WORDS_BIGENDIAN
>> +#define CR_ST_BYTE(state, i)   (state.bytes[(15 - (i)) ^ 8])
>
> Produces
>
> 7, 6, 5, 4, 3, 2, 1, 0,
> 15, 14, 13, 12, 11, 10, 9
>
> which I think is consistent with what Peter wrote.
>
>> +#define CR_ST_WORD(state, i)   (state.words[(3 - (i)) ^ 2])
>
> Hm. The indices look good:
>
> 1, 0
> 3, 2
>
> But, assuming you're on a big-endian host, perhaps you should byte swap
> the uint32_t too?...
>

No, I don't think so. I even removed the cpu_to_le32() call from the
aesmc helper because the MixColumns lookup tables will be emitted in
host native endianness, and the vfp.regs[] array is host native
endianness as well. I think the union type may have been a mistake to
begin with, because it introduces endianness dependencies that don't
actually exist in the code. It probably would have been cleaner if I
had defined the relation between VFP D-regs, words and bytes in terms
of shifts and masks instead.

>> (ie if you store 0x112233445566778899aabbccddeeff00 as a 64 bit write
>> to VFP register D0 then regs[0] will be
>> 0x112233445566778899aabbccddeeff00 regardless of host endianness. That
>> is, the least significant 8 bits of D0 will be (regs[0] & 0xff). (This
>> isn't the same number as if you do the union-type-punning thing with
>> union { uint64_t l; uint8_t b[8]; } and look at b[0].)
>
> Assume the guest writes value 0x112233445566778899aabbccddeeff00 (LE
> representation).
>
> * Assuming a little endian host, we get:
>
>   [00 ff ee dd cc bb aa 99 88 77 66 55 44 33 22 11]
>
> (full 128-bit LE vector).
>
> This is grouped into units of four bytes for the words array:
>
>   [00 ff ee dd] [cc bb aa 99] [88 77 66 55] [44 33 22 11]
>
> And the meaning of the individual uint32_t's is:
>
>   0xddeeff00 0x99aabbcc 0x55667788 0x11223344
>
> Which are the building blocks for the crypto primitives.
>
> * Assuming a big endian host, you get the following byte array for the
> same store (if I understand Peter right -- two uint64_t values are
> encoded as host endian, but the ordering between those remains little
> endian):
>
>   [99 aa bb cc dd ee ff 00   11 22 33 44 55 66 77 88]
>
> (and your CR_ST_BYTE() macro looks right for this).
>
> Grouping this into units of four bytes for the words array, we get
>
>   [99 aa bb cc] [dd ee ff 00] [11 22 33 44] [55 66 77 88]
>
> resulting in values
>
>   0x99aabbcc 0xddeeff00 0x11223344 0x55667788
>
> And then your CR_ST_WORD macro permutes them (1, 0, 3, 2) to the
> following values:
>
>   0xddeeff00 0x99aabbcc 0x55667788 0x11223344
>
> ... Which is identical to the LE host result.
>
> So this part looks fine to me.
>

Thanks for the elaborate review.

> Regarding the rest -- in my opinion, getting the implementation of
> crypto primitivies *wrong* despite them succesfully passing an extensive
> test suite is exceedingly unlikely. You usually cannot get crypto
> primitives "halfway right" -- as long as you don't *combine* them to
> bigger building blocks, crypto primitives are 100% specified and there
> are no "corner cases" that are usual for other program logic. The
> implementation either works or is catastrophically broken (which is
> quickly visible).
>
> Hence, if Peter gets good test results for this patchset, then I think
> that *suffices* to apply the patch.
>

Agreed.

> ... Which is why I won't try to eyeball the rest of the patch where you
> put the macros to use. :) The macros themselves are sound. If you broke
> their application, they won't pass the test suite (in the kernel bootup
> code, or elsewhere).
>
> Acked-by: Laszlo Ersek <lersek@redhat.com>
>
> Thanks
> Laszlo
>

Cheers,
Ard.


>> +#else
>> +#define CR_ST_BYTE(state, i)   (state.bytes[i])
>> +#define CR_ST_WORD(state, i)   (state.words[i])
>> +#endif
>> +
>>  void HELPER(crypto_aese)(CPUARMState *env, uint32_t rd, uint32_t rm,
>>                           uint32_t decrypt)
>>  {
>> @@ -46,7 +54,7 @@ void HELPER(crypto_aese)(CPUARMState *env, uint32_t rd, uint32_t rm,
>>
>>      /* combine ShiftRows operation and sbox substitution */
>>      for (i = 0; i < 16; i++) {
>> -        st.bytes[i] = sbox[decrypt][rk.bytes[shift[decrypt][i]]];
>> +        CR_ST_BYTE(st, i) = sbox[decrypt][CR_ST_BYTE(rk, shift[decrypt][i])];
>>      }
>>
>>      env->vfp.regs[rd] = make_float64(st.l[0]);
>> @@ -198,11 +206,11 @@ void HELPER(crypto_aesmc)(CPUARMState *env, uint32_t rd, uint32_t rm,
>>      assert(decrypt < 2);
>>
>>      for (i = 0; i < 16; i += 4) {
>> -        st.words[i >> 2] = cpu_to_le32(
>> -            mc[decrypt][st.bytes[i]] ^
>> -            rol32(mc[decrypt][st.bytes[i + 1]], 8) ^
>> -            rol32(mc[decrypt][st.bytes[i + 2]], 16) ^
>> -            rol32(mc[decrypt][st.bytes[i + 3]], 24));
>> +        CR_ST_WORD(st, i >> 2) =
>> +            mc[decrypt][CR_ST_BYTE(st, i)] ^
>> +            rol32(mc[decrypt][CR_ST_BYTE(st, i + 1)], 8) ^
>> +            rol32(mc[decrypt][CR_ST_BYTE(st, i + 2)], 16) ^
>> +            rol32(mc[decrypt][CR_ST_BYTE(st, i + 3)], 24);
>>      }
>>
>>      env->vfp.regs[rd] = make_float64(st.l[0]);
>> @@ -255,24 +263,25 @@ void HELPER(crypto_sha1_3reg)(CPUARMState *env, uint32_t rd, uint32_t rn,
>>
>>              switch (op) {
>>              case 0: /* sha1c */
>> -                t = cho(d.words[1], d.words[2], d.words[3]);
>> +                t = cho(CR_ST_WORD(d, 1), CR_ST_WORD(d, 2), CR_ST_WORD(d, 3));
>>                  break;
>>              case 1: /* sha1p */
>> -                t = par(d.words[1], d.words[2], d.words[3]);
>> +                t = par(CR_ST_WORD(d, 1), CR_ST_WORD(d, 2), CR_ST_WORD(d, 3));
>>                  break;
>>              case 2: /* sha1m */
>> -                t = maj(d.words[1], d.words[2], d.words[3]);
>> +                t = maj(CR_ST_WORD(d, 1), CR_ST_WORD(d, 2), CR_ST_WORD(d, 3));
>>                  break;
>>              default:
>>                  g_assert_not_reached();
>>              }
>> -            t += rol32(d.words[0], 5) + n.words[0] + m.words[i];
>> -
>> -            n.words[0] = d.words[3];
>> -            d.words[3] = d.words[2];
>> -            d.words[2] = ror32(d.words[1], 2);
>> -            d.words[1] = d.words[0];
>> -            d.words[0] = t;
>> +            t += rol32(CR_ST_WORD(d, 0), 5) + CR_ST_WORD(n, 0)
>> +                 + CR_ST_WORD(m, i);
>> +
>> +            CR_ST_WORD(n, 0) = CR_ST_WORD(d, 3);
>> +            CR_ST_WORD(d, 3) = CR_ST_WORD(d, 2);
>> +            CR_ST_WORD(d, 2) = ror32(CR_ST_WORD(d, 1), 2);
>> +            CR_ST_WORD(d, 1) = CR_ST_WORD(d, 0);
>> +            CR_ST_WORD(d, 0) = t;
>>          }
>>      }
>>      env->vfp.regs[rd] = make_float64(d.l[0]);
>> @@ -286,8 +295,8 @@ void HELPER(crypto_sha1h)(CPUARMState *env, uint32_t rd, uint32_t rm)
>>          float64_val(env->vfp.regs[rm + 1])
>>      } };
>>
>> -    m.words[0] = ror32(m.words[0], 2);
>> -    m.words[1] = m.words[2] = m.words[3] = 0;
>> +    CR_ST_WORD(m, 0) = ror32(CR_ST_WORD(m, 0), 2);
>> +    CR_ST_WORD(m, 1) = CR_ST_WORD(m, 2) = CR_ST_WORD(m, 3) = 0;
>>
>>      env->vfp.regs[rd] = make_float64(m.l[0]);
>>      env->vfp.regs[rd + 1] = make_float64(m.l[1]);
>> @@ -304,10 +313,10 @@ void HELPER(crypto_sha1su1)(CPUARMState *env, uint32_t rd, uint32_t rm)
>>          float64_val(env->vfp.regs[rm + 1])
>>      } };
>>
>> -    d.words[0] = rol32(d.words[0] ^ m.words[1], 1);
>> -    d.words[1] = rol32(d.words[1] ^ m.words[2], 1);
>> -    d.words[2] = rol32(d.words[2] ^ m.words[3], 1);
>> -    d.words[3] = rol32(d.words[3] ^ d.words[0], 1);
>> +    CR_ST_WORD(d, 0) = rol32(CR_ST_WORD(d, 0) ^ CR_ST_WORD(m, 1), 1);
>> +    CR_ST_WORD(d, 1) = rol32(CR_ST_WORD(d, 1) ^ CR_ST_WORD(m, 2), 1);
>> +    CR_ST_WORD(d, 2) = rol32(CR_ST_WORD(d, 2) ^ CR_ST_WORD(m, 3), 1);
>> +    CR_ST_WORD(d, 3) = rol32(CR_ST_WORD(d, 3) ^ CR_ST_WORD(d, 0), 1);
>>
>>      env->vfp.regs[rd] = make_float64(d.l[0]);
>>      env->vfp.regs[rd + 1] = make_float64(d.l[1]);
>> @@ -356,20 +365,22 @@ void HELPER(crypto_sha256h)(CPUARMState *env, uint32_t rd, uint32_t rn,
>>      int i;
>>
>>      for (i = 0; i < 4; i++) {
>> -        uint32_t t = cho(n.words[0], n.words[1], n.words[2]) + n.words[3]
>> -                     + S1(n.words[0]) + m.words[i];
>> -
>> -        n.words[3] = n.words[2];
>> -        n.words[2] = n.words[1];
>> -        n.words[1] = n.words[0];
>> -        n.words[0] = d.words[3] + t;
>> -
>> -        t += maj(d.words[0], d.words[1], d.words[2]) + S0(d.words[0]);
>> -
>> -        d.words[3] = d.words[2];
>> -        d.words[2] = d.words[1];
>> -        d.words[1] = d.words[0];
>> -        d.words[0] = t;
>> +        uint32_t t = cho(CR_ST_WORD(n, 0), CR_ST_WORD(n, 1), CR_ST_WORD(n, 2))
>> +                     + CR_ST_WORD(n, 3) + S1(CR_ST_WORD(n, 0))
>> +                     + CR_ST_WORD(m, i);
>> +
>> +        CR_ST_WORD(n, 3) = CR_ST_WORD(n, 2);
>> +        CR_ST_WORD(n, 2) = CR_ST_WORD(n, 1);
>> +        CR_ST_WORD(n, 1) = CR_ST_WORD(n, 0);
>> +        CR_ST_WORD(n, 0) = CR_ST_WORD(d, 3) + t;
>> +
>> +        t += maj(CR_ST_WORD(d, 0), CR_ST_WORD(d, 1), CR_ST_WORD(d, 2))
>> +             + S0(CR_ST_WORD(d, 0));
>> +
>> +        CR_ST_WORD(d, 3) = CR_ST_WORD(d, 2);
>> +        CR_ST_WORD(d, 2) = CR_ST_WORD(d, 1);
>> +        CR_ST_WORD(d, 1) = CR_ST_WORD(d, 0);
>> +        CR_ST_WORD(d, 0) = t;
>>      }
>>
>>      env->vfp.regs[rd] = make_float64(d.l[0]);
>> @@ -394,13 +405,14 @@ void HELPER(crypto_sha256h2)(CPUARMState *env, uint32_t rd, uint32_t rn,
>>      int i;
>>
>>      for (i = 0; i < 4; i++) {
>> -        uint32_t t = cho(d.words[0], d.words[1], d.words[2]) + d.words[3]
>> -                     + S1(d.words[0]) + m.words[i];
>> -
>> -        d.words[3] = d.words[2];
>> -        d.words[2] = d.words[1];
>> -        d.words[1] = d.words[0];
>> -        d.words[0] = n.words[3 - i] + t;
>> +        uint32_t t = cho(CR_ST_WORD(d, 0), CR_ST_WORD(d, 1), CR_ST_WORD(d, 2))
>> +                     + CR_ST_WORD(d, 3) + S1(CR_ST_WORD(d, 0))
>> +                     + CR_ST_WORD(m, i);
>> +
>> +        CR_ST_WORD(d, 3) = CR_ST_WORD(d, 2);
>> +        CR_ST_WORD(d, 2) = CR_ST_WORD(d, 1);
>> +        CR_ST_WORD(d, 1) = CR_ST_WORD(d, 0);
>> +        CR_ST_WORD(d, 0) = CR_ST_WORD(n, 3 - i) + t;
>>      }
>>
>>      env->vfp.regs[rd] = make_float64(d.l[0]);
>> @@ -418,10 +430,10 @@ void HELPER(crypto_sha256su0)(CPUARMState *env, uint32_t rd, uint32_t rm)
>>          float64_val(env->vfp.regs[rm + 1])
>>      } };
>>
>> -    d.words[0] += s0(d.words[1]);
>> -    d.words[1] += s0(d.words[2]);
>> -    d.words[2] += s0(d.words[3]);
>> -    d.words[3] += s0(m.words[0]);
>> +    CR_ST_WORD(d, 0) += s0(CR_ST_WORD(d, 1));
>> +    CR_ST_WORD(d, 1) += s0(CR_ST_WORD(d, 2));
>> +    CR_ST_WORD(d, 2) += s0(CR_ST_WORD(d, 3));
>> +    CR_ST_WORD(d, 3) += s0(CR_ST_WORD(m, 0));
>>
>>      env->vfp.regs[rd] = make_float64(d.l[0]);
>>      env->vfp.regs[rd + 1] = make_float64(d.l[1]);
>> @@ -443,10 +455,10 @@ void HELPER(crypto_sha256su1)(CPUARMState *env, uint32_t rd, uint32_t rn,
>>          float64_val(env->vfp.regs[rm + 1])
>>      } };
>>
>> -    d.words[0] += s1(m.words[2]) + n.words[1];
>> -    d.words[1] += s1(m.words[3]) + n.words[2];
>> -    d.words[2] += s1(d.words[0]) + n.words[3];
>> -    d.words[3] += s1(d.words[1]) + m.words[0];
>> +    CR_ST_WORD(d, 0) += s1(CR_ST_WORD(m, 2)) + CR_ST_WORD(n, 1);
>> +    CR_ST_WORD(d, 1) += s1(CR_ST_WORD(m, 3)) + CR_ST_WORD(n, 2);
>> +    CR_ST_WORD(d, 2) += s1(CR_ST_WORD(d, 0)) + CR_ST_WORD(n, 3);
>> +    CR_ST_WORD(d, 3) += s1(CR_ST_WORD(d, 1)) + CR_ST_WORD(m, 0);
>>
>>      env->vfp.regs[rd] = make_float64(d.l[0]);
>>      env->vfp.regs[rd + 1] = make_float64(d.l[1]);
>>
>
Peter Maydell Jan. 5, 2015, 12:34 p.m. UTC | #5
On 2 January 2015 at 19:21, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> I think the union type may have been a mistake to
> begin with, because it introduces endianness dependencies that don't
> actually exist in the code. It probably would have been cleaner if I
> had defined the relation between VFP D-regs, words and bytes in terms
> of shifts and masks instead.

Mmm, I think in retrospect this is right. Still, this patch does
cause the kernel's self-tests on boot on a BE ppc64 host to pass,
and looking through the code we haven't missed any accesses to
.bytes or .words, so it looks good to me.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

-- PMM
Ard Biesheuvel Jan. 5, 2015, 1:47 p.m. UTC | #6
On 5 January 2015 at 12:34, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 2 January 2015 at 19:21, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> I think the union type may have been a mistake to
>> begin with, because it introduces endianness dependencies that don't
>> actually exist in the code. It probably would have been cleaner if I
>> had defined the relation between VFP D-regs, words and bytes in terms
>> of shifts and masks instead.
>
> Mmm, I think in retrospect this is right. Still, this patch does
> cause the kernel's self-tests on boot on a BE ppc64 host to pass,
> and looking through the code we haven't missed any accesses to
> .bytes or .words, so it looks good to me.
>
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
>

OK, good, thanks for testing.

I propose we leave the patch as-is then. Are you ok to add the tags
and merge it?

Cheers,
Ard.
Peter Maydell Jan. 5, 2015, 1:58 p.m. UTC | #7
On 5 January 2015 at 13:47, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 5 January 2015 at 12:34, Peter Maydell <peter.maydell@linaro.org> wrote:
>> On 2 January 2015 at 19:21, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>>> I think the union type may have been a mistake to
>>> begin with, because it introduces endianness dependencies that don't
>>> actually exist in the code. It probably would have been cleaner if I
>>> had defined the relation between VFP D-regs, words and bytes in terms
>>> of shifts and masks instead.
>>
>> Mmm, I think in retrospect this is right. Still, this patch does
>> cause the kernel's self-tests on boot on a BE ppc64 host to pass,
>> and looking through the code we haven't missed any accesses to
>> .bytes or .words, so it looks good to me.
>>
>> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
>>
>
> OK, good, thanks for testing.
>
> I propose we leave the patch as-is then. Are you ok to add the tags
> and merge it?

Yep; I've put it into target-arm.next.

-- PMM
diff mbox

Patch

diff --git a/target-arm/crypto_helper.c b/target-arm/crypto_helper.c
index dd60d0b81a4c..1fe975d0f1e3 100644
--- a/target-arm/crypto_helper.c
+++ b/target-arm/crypto_helper.c
@@ -22,6 +22,14 @@  union CRYPTO_STATE {
     uint64_t   l[2];
 };
 
+#ifdef HOST_WORDS_BIGENDIAN
+#define CR_ST_BYTE(state, i)   (state.bytes[(15 - (i)) ^ 8])
+#define CR_ST_WORD(state, i)   (state.words[(3 - (i)) ^ 2])
+#else
+#define CR_ST_BYTE(state, i)   (state.bytes[i])
+#define CR_ST_WORD(state, i)   (state.words[i])
+#endif
+
 void HELPER(crypto_aese)(CPUARMState *env, uint32_t rd, uint32_t rm,
                          uint32_t decrypt)
 {
@@ -46,7 +54,7 @@  void HELPER(crypto_aese)(CPUARMState *env, uint32_t rd, uint32_t rm,
 
     /* combine ShiftRows operation and sbox substitution */
     for (i = 0; i < 16; i++) {
-        st.bytes[i] = sbox[decrypt][rk.bytes[shift[decrypt][i]]];
+        CR_ST_BYTE(st, i) = sbox[decrypt][CR_ST_BYTE(rk, shift[decrypt][i])];
     }
 
     env->vfp.regs[rd] = make_float64(st.l[0]);
@@ -198,11 +206,11 @@  void HELPER(crypto_aesmc)(CPUARMState *env, uint32_t rd, uint32_t rm,
     assert(decrypt < 2);
 
     for (i = 0; i < 16; i += 4) {
-        st.words[i >> 2] = cpu_to_le32(
-            mc[decrypt][st.bytes[i]] ^
-            rol32(mc[decrypt][st.bytes[i + 1]], 8) ^
-            rol32(mc[decrypt][st.bytes[i + 2]], 16) ^
-            rol32(mc[decrypt][st.bytes[i + 3]], 24));
+        CR_ST_WORD(st, i >> 2) =
+            mc[decrypt][CR_ST_BYTE(st, i)] ^
+            rol32(mc[decrypt][CR_ST_BYTE(st, i + 1)], 8) ^
+            rol32(mc[decrypt][CR_ST_BYTE(st, i + 2)], 16) ^
+            rol32(mc[decrypt][CR_ST_BYTE(st, i + 3)], 24);
     }
 
     env->vfp.regs[rd] = make_float64(st.l[0]);
@@ -255,24 +263,25 @@  void HELPER(crypto_sha1_3reg)(CPUARMState *env, uint32_t rd, uint32_t rn,
 
             switch (op) {
             case 0: /* sha1c */
-                t = cho(d.words[1], d.words[2], d.words[3]);
+                t = cho(CR_ST_WORD(d, 1), CR_ST_WORD(d, 2), CR_ST_WORD(d, 3));
                 break;
             case 1: /* sha1p */
-                t = par(d.words[1], d.words[2], d.words[3]);
+                t = par(CR_ST_WORD(d, 1), CR_ST_WORD(d, 2), CR_ST_WORD(d, 3));
                 break;
             case 2: /* sha1m */
-                t = maj(d.words[1], d.words[2], d.words[3]);
+                t = maj(CR_ST_WORD(d, 1), CR_ST_WORD(d, 2), CR_ST_WORD(d, 3));
                 break;
             default:
                 g_assert_not_reached();
             }
-            t += rol32(d.words[0], 5) + n.words[0] + m.words[i];
-
-            n.words[0] = d.words[3];
-            d.words[3] = d.words[2];
-            d.words[2] = ror32(d.words[1], 2);
-            d.words[1] = d.words[0];
-            d.words[0] = t;
+            t += rol32(CR_ST_WORD(d, 0), 5) + CR_ST_WORD(n, 0)
+                 + CR_ST_WORD(m, i);
+
+            CR_ST_WORD(n, 0) = CR_ST_WORD(d, 3);
+            CR_ST_WORD(d, 3) = CR_ST_WORD(d, 2);
+            CR_ST_WORD(d, 2) = ror32(CR_ST_WORD(d, 1), 2);
+            CR_ST_WORD(d, 1) = CR_ST_WORD(d, 0);
+            CR_ST_WORD(d, 0) = t;
         }
     }
     env->vfp.regs[rd] = make_float64(d.l[0]);
@@ -286,8 +295,8 @@  void HELPER(crypto_sha1h)(CPUARMState *env, uint32_t rd, uint32_t rm)
         float64_val(env->vfp.regs[rm + 1])
     } };
 
-    m.words[0] = ror32(m.words[0], 2);
-    m.words[1] = m.words[2] = m.words[3] = 0;
+    CR_ST_WORD(m, 0) = ror32(CR_ST_WORD(m, 0), 2);
+    CR_ST_WORD(m, 1) = CR_ST_WORD(m, 2) = CR_ST_WORD(m, 3) = 0;
 
     env->vfp.regs[rd] = make_float64(m.l[0]);
     env->vfp.regs[rd + 1] = make_float64(m.l[1]);
@@ -304,10 +313,10 @@  void HELPER(crypto_sha1su1)(CPUARMState *env, uint32_t rd, uint32_t rm)
         float64_val(env->vfp.regs[rm + 1])
     } };
 
-    d.words[0] = rol32(d.words[0] ^ m.words[1], 1);
-    d.words[1] = rol32(d.words[1] ^ m.words[2], 1);
-    d.words[2] = rol32(d.words[2] ^ m.words[3], 1);
-    d.words[3] = rol32(d.words[3] ^ d.words[0], 1);
+    CR_ST_WORD(d, 0) = rol32(CR_ST_WORD(d, 0) ^ CR_ST_WORD(m, 1), 1);
+    CR_ST_WORD(d, 1) = rol32(CR_ST_WORD(d, 1) ^ CR_ST_WORD(m, 2), 1);
+    CR_ST_WORD(d, 2) = rol32(CR_ST_WORD(d, 2) ^ CR_ST_WORD(m, 3), 1);
+    CR_ST_WORD(d, 3) = rol32(CR_ST_WORD(d, 3) ^ CR_ST_WORD(d, 0), 1);
 
     env->vfp.regs[rd] = make_float64(d.l[0]);
     env->vfp.regs[rd + 1] = make_float64(d.l[1]);
@@ -356,20 +365,22 @@  void HELPER(crypto_sha256h)(CPUARMState *env, uint32_t rd, uint32_t rn,
     int i;
 
     for (i = 0; i < 4; i++) {
-        uint32_t t = cho(n.words[0], n.words[1], n.words[2]) + n.words[3]
-                     + S1(n.words[0]) + m.words[i];
-
-        n.words[3] = n.words[2];
-        n.words[2] = n.words[1];
-        n.words[1] = n.words[0];
-        n.words[0] = d.words[3] + t;
-
-        t += maj(d.words[0], d.words[1], d.words[2]) + S0(d.words[0]);
-
-        d.words[3] = d.words[2];
-        d.words[2] = d.words[1];
-        d.words[1] = d.words[0];
-        d.words[0] = t;
+        uint32_t t = cho(CR_ST_WORD(n, 0), CR_ST_WORD(n, 1), CR_ST_WORD(n, 2))
+                     + CR_ST_WORD(n, 3) + S1(CR_ST_WORD(n, 0))
+                     + CR_ST_WORD(m, i);
+
+        CR_ST_WORD(n, 3) = CR_ST_WORD(n, 2);
+        CR_ST_WORD(n, 2) = CR_ST_WORD(n, 1);
+        CR_ST_WORD(n, 1) = CR_ST_WORD(n, 0);
+        CR_ST_WORD(n, 0) = CR_ST_WORD(d, 3) + t;
+
+        t += maj(CR_ST_WORD(d, 0), CR_ST_WORD(d, 1), CR_ST_WORD(d, 2))
+             + S0(CR_ST_WORD(d, 0));
+
+        CR_ST_WORD(d, 3) = CR_ST_WORD(d, 2);
+        CR_ST_WORD(d, 2) = CR_ST_WORD(d, 1);
+        CR_ST_WORD(d, 1) = CR_ST_WORD(d, 0);
+        CR_ST_WORD(d, 0) = t;
     }
 
     env->vfp.regs[rd] = make_float64(d.l[0]);
@@ -394,13 +405,14 @@  void HELPER(crypto_sha256h2)(CPUARMState *env, uint32_t rd, uint32_t rn,
     int i;
 
     for (i = 0; i < 4; i++) {
-        uint32_t t = cho(d.words[0], d.words[1], d.words[2]) + d.words[3]
-                     + S1(d.words[0]) + m.words[i];
-
-        d.words[3] = d.words[2];
-        d.words[2] = d.words[1];
-        d.words[1] = d.words[0];
-        d.words[0] = n.words[3 - i] + t;
+        uint32_t t = cho(CR_ST_WORD(d, 0), CR_ST_WORD(d, 1), CR_ST_WORD(d, 2))
+                     + CR_ST_WORD(d, 3) + S1(CR_ST_WORD(d, 0))
+                     + CR_ST_WORD(m, i);
+
+        CR_ST_WORD(d, 3) = CR_ST_WORD(d, 2);
+        CR_ST_WORD(d, 2) = CR_ST_WORD(d, 1);
+        CR_ST_WORD(d, 1) = CR_ST_WORD(d, 0);
+        CR_ST_WORD(d, 0) = CR_ST_WORD(n, 3 - i) + t;
     }
 
     env->vfp.regs[rd] = make_float64(d.l[0]);
@@ -418,10 +430,10 @@  void HELPER(crypto_sha256su0)(CPUARMState *env, uint32_t rd, uint32_t rm)
         float64_val(env->vfp.regs[rm + 1])
     } };
 
-    d.words[0] += s0(d.words[1]);
-    d.words[1] += s0(d.words[2]);
-    d.words[2] += s0(d.words[3]);
-    d.words[3] += s0(m.words[0]);
+    CR_ST_WORD(d, 0) += s0(CR_ST_WORD(d, 1));
+    CR_ST_WORD(d, 1) += s0(CR_ST_WORD(d, 2));
+    CR_ST_WORD(d, 2) += s0(CR_ST_WORD(d, 3));
+    CR_ST_WORD(d, 3) += s0(CR_ST_WORD(m, 0));
 
     env->vfp.regs[rd] = make_float64(d.l[0]);
     env->vfp.regs[rd + 1] = make_float64(d.l[1]);
@@ -443,10 +455,10 @@  void HELPER(crypto_sha256su1)(CPUARMState *env, uint32_t rd, uint32_t rn,
         float64_val(env->vfp.regs[rm + 1])
     } };
 
-    d.words[0] += s1(m.words[2]) + n.words[1];
-    d.words[1] += s1(m.words[3]) + n.words[2];
-    d.words[2] += s1(d.words[0]) + n.words[3];
-    d.words[3] += s1(d.words[1]) + m.words[0];
+    CR_ST_WORD(d, 0) += s1(CR_ST_WORD(m, 2)) + CR_ST_WORD(n, 1);
+    CR_ST_WORD(d, 1) += s1(CR_ST_WORD(m, 3)) + CR_ST_WORD(n, 2);
+    CR_ST_WORD(d, 2) += s1(CR_ST_WORD(d, 0)) + CR_ST_WORD(n, 3);
+    CR_ST_WORD(d, 3) += s1(CR_ST_WORD(d, 1)) + CR_ST_WORD(m, 0);
 
     env->vfp.regs[rd] = make_float64(d.l[0]);
     env->vfp.regs[rd + 1] = make_float64(d.l[1]);