[v8.1,1/2] target/s390x: support SHA-512 extensions

Message ID 20220922153820.221811-1-david@redhat.com
State New

Commit Message

David Hildenbrand Sept. 22, 2022, 3:38 p.m. UTC
From: "Jason A. Donenfeld" <Jason@zx2c4.com>

In order to fully support MSA_EXT_5, we have to support the SHA-512
special instructions. So implement those.

The implementation began as something TweetNacl-like, and then was
adjusted to be useful here. It's not very beautiful, but it is quite
short and compact, which is what we're going for.

Cc: Thomas Huth <thuth@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Harald Freudenberger <freude@linux.ibm.com>
Cc: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
[ restructure, add missing exception, add comments, fixup CPU model ]
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/s390x/s390-virtio-ccw.c       |   3 +
 target/s390x/gen-features.c      |   9 +-
 target/s390x/tcg/crypto_helper.c | 229 +++++++++++++++++++++++++++++++
 3 files changed, 240 insertions(+), 1 deletion(-)
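
For orientation: KIMD/KLMD function code 3 selects SHA-512, with the
operands passed in general registers and a parameter block. The layout
below is a sketch reconstructed from the emulation code in this patch;
the struct name is illustrative and does not appear in the patch:

    /*
     * GR0        function code (3 = SHA-512)
     * GR1        address of the parameter block
     * GR[r2]     second-operand (message) address
     * GR[r2+1]   second-operand length in bytes
     */
    #include <stdint.h>

    struct sha512_param_block {  /* hypothetical name, for illustration */
        uint64_t h[8];   /* ICV in / OCV out: big-endian SHA-512 chaining value */
        uint8_t mbl[16]; /* KLMD only: total message bit length, big-endian */
    };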

Comments

Thomas Huth Sept. 22, 2022, 3:55 p.m. UTC | #1
On 22/09/2022 17.38, David Hildenbrand wrote:
> From: "Jason A. Donenfeld" <Jason@zx2c4.com>
> 
> In order to fully support MSA_EXT_5, we have to support the SHA-512
> special instructions. So implement those.
> 
> The implementation began as something TweetNacl-like, and then was
> adjusted to be useful here. It's not very beautiful, but it is quite
> short and compact, which is what we're going for.
...
> @@ -52,6 +278,9 @@ uint32_t HELPER(msa)(CPUS390XState *env, uint32_t r1, uint32_t r2, uint32_t r3,
>               cpu_stb_data_ra(env, param_addr, subfunc[i], ra);
>           }
>           break;
> +    case 3: /* CPACF_*_SHA_512 */
> +        return cpacf_sha512(env, ra, env->regs[1], &env->regs[r2],
> +                            &env->regs[r2 + 1], type);

I have to say that I liked Jason's v8 better here. Code 3 is also used for 
other instructions with completely different meaning, e.g. PCKMO uses 3 for 
TDEA-192 ... so having the "type" check here made more sense.
(meta comment: maybe we should split up the msa function and stop using just 
one function for all crypto/rng related instructions? ... but that's of 
course something for a different patch series)

  Thomas
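
(For reference, the v8-style dispatch Thomas prefers, a per-type switch
under code 3, is visible in the removed lines of the diff Jason posts
below; it looked like this:)

    case 3: /* CPACF_*_SHA_512 */
        switch (type) {
        case S390_FEAT_TYPE_KIMD:
            return kimd_sha512(env, ra, env->regs[1], &env->regs[r2],
                               &env->regs[r2 + 1], NULL);
        case S390_FEAT_TYPE_KLMD:
            return klmd_sha512(env, ra, env->regs[1], &env->regs[r2],
                               &env->regs[r2 + 1]);
        }
        break;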
Jason A. Donenfeld Sept. 22, 2022, 4:35 p.m. UTC | #2
On Thu, Sep 22, 2022 at 5:55 PM Thomas Huth <thuth@redhat.com> wrote:
>
> On 22/09/2022 17.38, David Hildenbrand wrote:
> > From: "Jason A. Donenfeld" <Jason@zx2c4.com>
> >
> > In order to fully support MSA_EXT_5, we have to support the SHA-512
> > special instructions. So implement those.
> >
> > The implementation began as something TweetNacl-like, and then was
> > adjusted to be useful here. It's not very beautiful, but it is quite
> > short and compact, which is what we're going for.
> ...
> > @@ -52,6 +278,9 @@ uint32_t HELPER(msa)(CPUS390XState *env, uint32_t r1, uint32_t r2, uint32_t r3,
> >               cpu_stb_data_ra(env, param_addr, subfunc[i], ra);
> >           }
> >           break;
> > +    case 3: /* CPACF_*_SHA_512 */
> > +        return cpacf_sha512(env, ra, env->regs[1], &env->regs[r2],
> > +                            &env->regs[r2 + 1], type);
>
> I have to say that I liked Jason's v8 better here. Code 3 is also used for
> other instructions with completely different meaning, e.g. PCKMO uses 3 for
> TDEA-192 ... so having the "type" check here made more sense.
> (meta comment: maybe we should split up the msa function and stop using just
> one function for all crypto/rng related instructions? ... but that's of
> course something for a different patch series)

Maybe just commit my v8, and then David's changes can layer on top as
follow ups? Checking len alignment, for example, is a separate patch
from the rest.

Jason
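
(The len-alignment check in question, copied from David's v8.1 hunks
further down this page:)

    /* KIMD: length has to be properly aligned. */
    if (type == S390_FEAT_TYPE_KIMD && !QEMU_IS_ALIGNED(len, 128)) {
        tcg_s390_program_interrupt(env, PGM_SPECIFICATION, ra);
    }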
David Hildenbrand Sept. 22, 2022, 5:18 p.m. UTC | #3
On 22.09.22 17:55, Thomas Huth wrote:
> On 22/09/2022 17.38, David Hildenbrand wrote:
>> From: "Jason A. Donenfeld" <Jason@zx2c4.com>
>>
>> In order to fully support MSA_EXT_5, we have to support the SHA-512
>> special instructions. So implement those.
>>
>> The implementation began as something TweetNacl-like, and then was
>> adjusted to be useful here. It's not very beautiful, but it is quite
>> short and compact, which is what we're going for.
> ...
>> @@ -52,6 +278,9 @@ uint32_t HELPER(msa)(CPUS390XState *env, uint32_t r1, uint32_t r2, uint32_t r3,
>>                cpu_stb_data_ra(env, param_addr, subfunc[i], ra);
>>            }
>>            break;
>> +    case 3: /* CPACF_*_SHA_512 */
>> +        return cpacf_sha512(env, ra, env->regs[1], &env->regs[r2],
>> +                            &env->regs[r2 + 1], type);
> 
> I have to say that I liked Jason's v8 better here. Code 3 is also used for
> other instructions with completely different meaning, e.g. PCKMO uses 3 for
> TDEA-192 ... so having the "type" check here made more sense.
> (meta comment: maybe we should split up the msa function and stop using just
> one function for all crypto/rng related instructions? ... but that's of
> course something for a different patch series)

It kind-of made sense in v8 where we actually had different functions. We
no longer have that here.

Anyhow, this is just the same as in patch 2/2. So no need to be fancy 
here ;)
Thomas Huth Sept. 23, 2022, 6:23 a.m. UTC | #4
On 22/09/2022 19.18, David Hildenbrand wrote:
> On 22.09.22 17:55, Thomas Huth wrote:
>> On 22/09/2022 17.38, David Hildenbrand wrote:
>>> From: "Jason A. Donenfeld" <Jason@zx2c4.com>
>>>
>>> In order to fully support MSA_EXT_5, we have to support the SHA-512
>>> special instructions. So implement those.
>>>
>>> The implementation began as something TweetNacl-like, and then was
>>> adjusted to be useful here. It's not very beautiful, but it is quite
>>> short and compact, which is what we're going for.
>> ...
>>> @@ -52,6 +278,9 @@ uint32_t HELPER(msa)(CPUS390XState *env, uint32_t r1, 
>>> uint32_t r2, uint32_t r3,
>>>                cpu_stb_data_ra(env, param_addr, subfunc[i], ra);
>>>            }
>>>            break;
>>> +    case 3: /* CPACF_*_SHA_512 */
>>> +        return cpacf_sha512(env, ra, env->regs[1], &env->regs[r2],
>>> +                            &env->regs[r2 + 1], type);
>>
>> I have to say that I liked Jason's v8 better here. Code 3 is also used for
>> other instructions with completely different meaning, e.g. PCKMO uses 3 for
>> TDEA-192 ... so having the "type" check here made more sense.
>> (meta comment: maybe we should split up the msa function and stop using just
>> one function for all crypto/rng related instructions? ... but that's of
>> course something for a different patch series)
> 
> It kind-of made sense in v8 where we actually had different functions. We no
> longer have that here.

But the point is that the "msa" helper is also called for other instructions 
like PCKMO which can also be called with code 3. And there it means 
something completely different. ... unless I completely misunderstood the 
code, of course.

I think I'll go with Jason's v8 (+ compat machine handling on top) for now, 
so that we better record his state of the patch, and then we can do cleanup 
patches on top later.

  Thomas
Thomas Huth Sept. 23, 2022, 6:25 a.m. UTC | #5
On 22/09/2022 18.35, Jason A. Donenfeld wrote:
> On Thu, Sep 22, 2022 at 5:55 PM Thomas Huth <thuth@redhat.com> wrote:
>>
>> On 22/09/2022 17.38, David Hildenbrand wrote:
>>> From: "Jason A. Donenfeld" <Jason@zx2c4.com>
>>>
>>> In order to fully support MSA_EXT_5, we have to support the SHA-512
>>> special instructions. So implement those.
>>>
>>> The implementation began as something TweetNacl-like, and then was
>>> adjusted to be useful here. It's not very beautiful, but it is quite
>>> short and compact, which is what we're going for.
>> ...
>>> @@ -52,6 +278,9 @@ uint32_t HELPER(msa)(CPUS390XState *env, uint32_t r1, uint32_t r2, uint32_t r3,
>>>                cpu_stb_data_ra(env, param_addr, subfunc[i], ra);
>>>            }
>>>            break;
>>> +    case 3: /* CPACF_*_SHA_512 */
>>> +        return cpacf_sha512(env, ra, env->regs[1], &env->regs[r2],
>>> +                            &env->regs[r2 + 1], type);
>>
>> I have to say that I liked Jason's v8 better here. Code 3 is also used for
>> other instructions with completely different meaning, e.g. PCKMO uses 3 for
>> TDEA-192 ... so having the "type" check here made more sense.
>> (meta comment: maybe we should split up the msa function and stop using just
>> one function for all crypto/rng related instructions? ... but that's of
>> course something for a different patch series)
> 
> Maybe just commit my v8, and then David's changes can layer on top as
> follow ups? Checking len alignment, for example, is a separate patch
> from the rest.

Yes, let's do that now - that will also later help to distinguish who did 
what part of the code.

  Thomas
David Hildenbrand Sept. 23, 2022, 6:37 a.m. UTC | #6
On 23.09.22 08:23, Thomas Huth wrote:
> On 22/09/2022 19.18, David Hildenbrand wrote:
>> On 22.09.22 17:55, Thomas Huth wrote:
>>> On 22/09/2022 17.38, David Hildenbrand wrote:
>>>> From: "Jason A. Donenfeld" <Jason@zx2c4.com>
>>>>
>>>> In order to fully support MSA_EXT_5, we have to support the SHA-512
>>>> special instructions. So implement those.
>>>>
>>>> The implementation began as something TweetNacl-like, and then was
>>>> adjusted to be useful here. It's not very beautiful, but it is quite
>>>> short and compact, which is what we're going for.
>>> ...
>>>> @@ -52,6 +278,9 @@ uint32_t HELPER(msa)(CPUS390XState *env, uint32_t r1,
>>>> uint32_t r2, uint32_t r3,
>>>>                 cpu_stb_data_ra(env, param_addr, subfunc[i], ra);
>>>>             }
>>>>             break;
>>>> +    case 3: /* CPACF_*_SHA_512 */
>>>> +        return cpacf_sha512(env, ra, env->regs[1], &env->regs[r2],
>>>> +                            &env->regs[r2 + 1], type);
>>>
>>> I have to say that I liked Jason's v8 better here. Code 3 is also used for
>>> other instructions with completely different meaning, e.g. PCKMO uses 3 for
>>> TDEA-192 ... so having the "type" check here made more sense.
>>> (meta comment: maybe we should split up the msa function and stop using just
>>> one function for all crypto/rng related instructions? ... but that's of
>>> course something for a different patch series)
>>
>> It kind-of made sense in v8 where we actually had different functions. We no
>> longer have that here.
> 
> But the point is that the "msa" helper is also called for other instructions
> like PCKMO which can also be called with code 3. And there it means
> something completely different. ... unless I completely misunderstood the
> code, of course.

test_be_bit() fences everything off we don't support. Simply falling 
through here and returning 0 at the end doesn't make any sense either.
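
(For context, that fencing sits near the top of HELPER(msa), before the
function-code switch. Roughly, paraphrased from crypto_helper.c rather
than taken from this patch, so details may differ:

    const uint8_t fc = env->regs[0] & 0x7f;
    uint8_t subfunc[16] = { 0 };

    s390_get_feat_block(type, subfunc);
    if (!test_be_bit(fc, subfunc)) {
        /* Subfunction not offered by the CPU model: specification exception. */
        tcg_s390_program_interrupt(env, PGM_SPECIFICATION, ra);
    }
)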

> 
> I think I'll go with Jason's v8 (+ compat machine handling on top) for now,
> so that we better record his state of the patch, and then we can do cleanup
> patches on top later.

Feel free to commit what you want (I'll be happy to see Jason's work 
upstream for good), but note that

1) I don't feel confident to give the original patch my ack/rb
2) I am not a supporter of committing code that has known issues
3) I won't follow up with additional cleanup patches because I already 
spent more time on this than I originally planned.
Jason A. Donenfeld Sept. 23, 2022, 9:19 a.m. UTC | #7
Hi David & Thomas,

On Fri, Sep 23, 2022 at 08:37:50AM +0200, David Hildenbrand wrote:
> > But the point is that the "msa" helper is also called for other instructions
> > like PCKMO which can also be called with code 3. And there it means
> > something completely different. ... unless I completely misunderstood the
> > code, of course.
> 
> test_be_bit() fences everything off we don't support. Simply falling 
> through here and returning 0 at the end doesn't make any sense either.

Indeed you're right here, and also, there's this line in your code:

    g_assert(type == S390_FEAT_TYPE_KIMD || type == S390_FEAT_TYPE_KLMD);

which means things *are* bounded to just those types as Thomas wants
too.

> > I think I'll go with Jason's v8 (+ compat machine handling on top) for now,
> > so that we better record his state of the patch, and then we can do cleanup
> > patches on top later.

I think doing things in layered steps makes sense, so we actually have a
record of how this changes, rather than trying to sneak in huge changes
to a "v8.1" patch, which lore doesn't even understand. At the same time,
I think David's refactoring is for the most part a quite nice
improvement that we shouldn't forget about.

> 3) I won't follow up with additional cleanup patches because I already 
> spent more time on this than I originally planned.

What is this B.S.? I spent months on this code and had to poke you a
bunch to review it. You spend one afternoon with it and you're already
burnt out, apparently. Sorry to hear that. But also, something is amiss
when the volunteer completely outside his realm of expertise has a
greater commitment than the professional maintainer. Regardless, seeing
that kind of thing here doesn't make me enthusiastic about contributing
to s390 stuff in the future, in the sense that hearing "I won't work
more on this" from a maintainer is a contagious sentiment; leaders are
emulated.

The 2/2 patch doesn't even apply on top of your "v8.1 1/2", so your
submission isn't even easily apply-able.

So, here, to kick things off in the right direction, and hopefully
motivate you to send something afresh, below is a diff between v8 in its
entirety and your "v8.1 1/2" patch, so that you can send this in. It
sounds like Thomas may have already picked some of the machine version
handling stuff from that, so it might need a little readjustment, but
that's the general idea. It should be somewhat trivial to submit this
and justify the test_be_bit() stuff or that g_assert() or whatever else
separately.

Jason

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 9a2467c889..e18b816aba 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -803,8 +803,11 @@ DEFINE_CCW_MACHINE(7_2, "7.2", true);
 
 static void ccw_machine_7_1_instance_options(MachineState *machine)
 {
+    static const S390FeatInit qemu_cpu_feat = { S390_FEAT_LIST_QEMU_V7_1 };
+
     ccw_machine_7_2_instance_options(machine);
     s390_cpudef_featoff_greater(16, 1, S390_FEAT_PAIE);
+    s390_set_qemu_cpu_model(0x8561, 15, 1, qemu_cpu_feat);
 }
 
 static void ccw_machine_7_1_class_options(MachineClass *mc)
diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index aaade67574..1e3b7c0dc9 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -744,13 +744,16 @@ static uint16_t qemu_V7_0[] = {
     S390_FEAT_MISC_INSTRUCTION_EXT3,
 };
 
+static uint16_t qemu_V7_1[] = {
+    S390_FEAT_VECTOR_ENH2,
+};
+
 /*
  * Features for the "qemu" CPU model of the latest QEMU machine and the "max"
  * CPU model under TCG. Don't include features that are not part of the full
  * feature set of the current "max" CPU model generation.
  */
 static uint16_t qemu_MAX[] = {
-    S390_FEAT_VECTOR_ENH2,
     S390_FEAT_MSA_EXT_5,
     S390_FEAT_KIMD_SHA_512,
     S390_FEAT_KLMD_SHA_512,
@@ -877,6 +880,7 @@ static FeatGroupDefSpec QemuFeatDef[] = {
     QEMU_FEAT_INITIALIZER(V6_0),
     QEMU_FEAT_INITIALIZER(V6_2),
     QEMU_FEAT_INITIALIZER(V7_0),
+    QEMU_FEAT_INITIALIZER(V7_1),
     QEMU_FEAT_INITIALIZER(MAX),
 };
 
diff --git a/target/s390x/tcg/crypto_helper.c b/target/s390x/tcg/crypto_helper.c
index 0daa9a2dd9..762b277884 100644
--- a/target/s390x/tcg/crypto_helper.c
+++ b/target/s390x/tcg/crypto_helper.c
@@ -21,13 +21,34 @@
 #include "exec/exec-all.h"
 #include "exec/cpu_ldst.h"
 
-static uint64_t R(uint64_t x, int c) { return (x >> c) | (x << (64 - c)); }
-static uint64_t Ch(uint64_t x, uint64_t y, uint64_t z) { return (x & y) ^ (~x & z); }
-static uint64_t Maj(uint64_t x, uint64_t y, uint64_t z) { return (x & y) ^ (x & z) ^ (y & z); }
-static uint64_t Sigma0(uint64_t x) { return R(x, 28) ^ R(x, 34) ^ R(x, 39); }
-static uint64_t Sigma1(uint64_t x) { return R(x, 14) ^ R(x, 18) ^ R(x, 41); }
-static uint64_t sigma0(uint64_t x) { return R(x, 1) ^ R(x, 8) ^ (x >> 7); }
-static uint64_t sigma1(uint64_t x) { return R(x, 19) ^ R(x, 61) ^ (x >> 6); }
+static uint64_t R(uint64_t x, int c)
+{
+    return (x >> c) | (x << (64 - c));
+}
+static uint64_t Ch(uint64_t x, uint64_t y, uint64_t z)
+{
+    return (x & y) ^ (~x & z);
+}
+static uint64_t Maj(uint64_t x, uint64_t y, uint64_t z)
+{
+    return (x & y) ^ (x & z) ^ (y & z);
+}
+static uint64_t Sigma0(uint64_t x)
+{
+    return R(x, 28) ^ R(x, 34) ^ R(x, 39);
+}
+static uint64_t Sigma1(uint64_t x)
+{
+    return R(x, 14) ^ R(x, 18) ^ R(x, 41);
+}
+static uint64_t sigma0(uint64_t x)
+{
+    return R(x, 1) ^ R(x, 8) ^ (x >> 7);
+}
+static uint64_t sigma1(uint64_t x)
+{
+    return R(x, 19) ^ R(x, 61) ^ (x >> 6);
+}
 
 static const uint64_t K[80] = {
     0x428a2f98d728ae22ULL, 0x7137449123ef65cdULL, 0xb5c0fbcfec4d3b2fULL,
@@ -59,119 +80,169 @@ static const uint64_t K[80] = {
     0x5fcb6fab3ad6faecULL, 0x6c44198c4a475817ULL
 };
 
-static int kimd_sha512(CPUS390XState *env, uintptr_t ra, uint64_t parameter_block,
-                       uint64_t *message_reg, uint64_t *len_reg, uint8_t *stack_buffer)
+/* a is icv/ocv, w is a single message block. w will get reused internally. */
+static void sha512_bda(uint64_t a[8], uint64_t w[16])
 {
-    enum { MAX_BLOCKS_PER_RUN = 64 }; /* This is arbitrary, just to keep interactivity. */
-    uint64_t z[8], b[8], a[8], w[16], t;
-    uint64_t message = message_reg ? *message_reg : 0;
-    uint64_t len = *len_reg, processed = 0, addr;
-    int i, j, message_reg_len = 64, blocks = 0, cc = 0;
+    uint64_t t, z[8], b[8];
+    int i, j;
 
-    if (!(env->psw.mask & PSW_MASK_64)) {
-        len = (uint32_t)len;
-        message_reg_len = (env->psw.mask & PSW_MASK_32) ? 32 : 24;
-    }
+    memcpy(z, a, sizeof(z));
+    for (i = 0; i < 80; i++) {
+        memcpy(b, a, sizeof(b));
 
-    for (i = 0; i < 8; ++i) {
-        addr = wrap_address(env, parameter_block + 8 * i);
-        z[i] = a[i] = cpu_ldq_be_data_ra(env, addr, ra);
-    }
-
-    while (len >= 128) {
-        if (++blocks > MAX_BLOCKS_PER_RUN) {
-            cc = 3;
-            break;
+        t = a[7] + Sigma1(a[4]) + Ch(a[4], a[5], a[6]) + K[i] + w[i % 16];
+        b[7] = t + Sigma0(a[0]) + Maj(a[0], a[1], a[2]);
+        b[3] += t;
+        for (j = 0; j < 8; ++j) {
+            a[(j + 1) % 8] = b[j];
         }
-
-        for (i = 0; i < 16; ++i) {
-            if (message) {
-                addr = wrap_address(env, message + 8 * i);
-                w[i] = cpu_ldq_be_data_ra(env, addr, ra);
-            } else {
-                w[i] = be64_to_cpu(((uint64_t *)stack_buffer)[i]);
+        if (i % 16 == 15) {
+            for (j = 0; j < 16; ++j) {
+                w[j] += w[(j + 9) % 16] + sigma0(w[(j + 1) % 16]) +
+                        sigma1(w[(j + 14) % 16]);
             }
         }
+    }
 
-        for (i = 0; i < 80; ++i) {
-            for (j = 0; j < 8; ++j) {
-                b[j] = a[j];
-            }
-            t = a[7] + Sigma1(a[4]) + Ch(a[4], a[5], a[6]) + K[i] + w[i % 16];
-            b[7] = t + Sigma0(a[0]) + Maj(a[0], a[1], a[2]);
-            b[3] += t;
-            for (j = 0; j < 8; ++j) {
-                a[(j + 1) % 8] = b[j];
-            }
-            if (i % 16 == 15) {
-                for (j = 0; j < 16; ++j) {
-                    w[j] += w[(j + 9) % 16] + sigma0(w[(j + 1) % 16]) + sigma1(w[(j + 14) % 16]);
-                }
-            }
-        }
+    for (i = 0; i < 8; i++) {
+        a[i] += z[i];
+    }
+}
 
-        for (i = 0; i < 8; ++i) {
-            a[i] += z[i];
-            z[i] = a[i];
-        }
+/* a is icv/ocv, w is a single message block that needs be64 conversion. */
+static void sha512_bda_be64(uint64_t a[8], uint64_t w[16])
+{
+    uint64_t t[16];
+    int i;
 
-        if (message) {
-            message += 128;
-        } else {
-            stack_buffer += 128;
-        }
-        len -= 128;
-        processed += 128;
+    for (i = 0; i < 16; i++) {
+        t[i] = be64_to_cpu(w[i]);
+    }
+    sha512_bda(a, t);
+}
+
+static void sha512_read_icv(CPUS390XState *env, uint64_t addr,
+                            uint64_t a[8], uintptr_t ra)
+{
+    int i;
+
+    for (i = 0; i < 8; i++, addr += 8) {
+        addr = wrap_address(env, addr);
+        a[i] = cpu_ldq_be_data_ra(env, addr, ra);
     }
+}
+
+static void sha512_write_ocv(CPUS390XState *env, uint64_t addr,
+                             uint64_t a[8], uintptr_t ra)
+{
+    int i;
 
-    for (i = 0; i < 8; ++i) {
-        addr = wrap_address(env, parameter_block + 8 * i);
-        cpu_stq_be_data_ra(env, addr, z[i], ra);
+    for (i = 0; i < 8; i++, addr += 8) {
+        addr = wrap_address(env, addr);
+        cpu_stq_be_data_ra(env, addr, a[i], ra);
     }
+}
+
+static void sha512_read_block(CPUS390XState *env, uint64_t addr,
+                              uint64_t a[16], uintptr_t ra)
+{
+    int i;
 
-    if (message_reg) {
-        *message_reg = deposit64(*message_reg, 0, message_reg_len, message);
+    for (i = 0; i < 16; i++, addr += 8) {
+        addr = wrap_address(env, addr);
+        a[i] = cpu_ldq_be_data_ra(env, addr, ra);
     }
-    *len_reg -= processed;
-    return cc;
 }
 
-static int klmd_sha512(CPUS390XState *env, uintptr_t ra, uint64_t parameter_block,
-                        uint64_t *message_reg, uint64_t *len_reg)
+static void sha512_read_mbl_be64(CPUS390XState *env, uint64_t addr,
+                                 uint8_t a[16], uintptr_t ra)
 {
-    uint8_t x[256];
-    uint64_t i, message, len, addr;
-    int j, message_reg_len = 64, cc;
+    int i;
 
-    cc = kimd_sha512(env, ra, parameter_block, message_reg, len_reg, NULL);
-    if (cc) {
-        return cc;
+    for (i = 0; i < 16; i++, addr += 1) {
+        addr = wrap_address(env, addr);
+        a[i] = cpu_ldub_data_ra(env, addr, ra);
     }
+}
+
+static int cpacf_sha512(CPUS390XState *env, uintptr_t ra, uint64_t param_addr,
+                      uint64_t *message_reg, uint64_t *len_reg, uint32_t type)
+{
+    enum { MAX_BLOCKS_PER_RUN = 64 }; /* Arbitrary: keep interactivity. */
+    uint64_t len = *len_reg, a[8], processed = 0;
+    int i, message_reg_len = 64;
+
+    g_assert(type == S390_FEAT_TYPE_KIMD || type == S390_FEAT_TYPE_KLMD);
 
-    message = *message_reg;
-    len = *len_reg;
     if (!(env->psw.mask & PSW_MASK_64)) {
         len = (uint32_t)len;
         message_reg_len = (env->psw.mask & PSW_MASK_32) ? 32 : 24;
     }
 
-    for (i = 0; i < len; ++i) {
-        addr = wrap_address(env, message + i);
-        x[i] = cpu_ldub_data_ra(env, addr, ra);
+    /* KIMD: length has to be properly aligned. */
+    if (type == S390_FEAT_TYPE_KIMD && !QEMU_IS_ALIGNED(len, 128)) {
+        tcg_s390_program_interrupt(env, PGM_SPECIFICATION, ra);
     }
-    memset(x + i, 0, sizeof(x) - i);
-    x[i] = 128;
-    i = i < 112 ? 128 : 256;
-    for (j = 0; j < 16; ++j) {
-        addr = wrap_address(env, parameter_block + 64 + j);
-        x[i - 16 + j] = cpu_ldub_data_ra(env, addr, ra);
+
+    sha512_read_icv(env, param_addr, a, ra);
+
+    /* Process full blocks first. */
+    for (; len >= 128; len -= 128, processed += 128) {
+        uint64_t w[16];
+
+        if (processed >= MAX_BLOCKS_PER_RUN * 128) {
+            break;
+        }
+
+        sha512_read_block(env, *message_reg + processed, w, ra);
+        sha512_bda(a, w);
     }
-    if (kimd_sha512(env, ra, parameter_block, NULL, &i, x)) {
-        g_assert_not_reached(); /* It must handle at least 2 blocks. */
+
+    /* KLMD: Process partial/empty block last. */
+    if (type == S390_FEAT_TYPE_KLMD && len < 128) {
+        uint8_t x[128];
+
+        /* Read the remainder of the message byte-per-byte. */
+        for (i = 0; i < len; i++) {
+            uint64_t addr = wrap_address(env, *message_reg + processed + i);
+
+            x[i] = cpu_ldub_data_ra(env, addr, ra);
+        }
+        /* Pad the remainder with zero and set the top bit. */
+        memset(x + len, 0, 128 - len);
+        x[len] = 128;
+
+        /*
+         * Place the MBL either into this block (if there is space left),
+         * or use an additional one.
+         */
+        if (len < 112) {
+            sha512_read_mbl_be64(env, param_addr + 64, x + 112, ra);
+        }
+        sha512_bda_be64(a, (uint64_t *)x);
+
+        if (len >= 112) {
+            memset(x, 0, 112);
+            sha512_read_mbl_be64(env, param_addr + 64, x + 112, ra);
+            sha512_bda_be64(a, (uint64_t *)x);
+        }
+
+        processed += len;
+        len = 0;
     }
-    *message_reg = deposit64(*message_reg, 0, message_reg_len, message + len);
-    *len_reg -= len;
-    return 0;
+
+    /*
+     * Modify memory after we read all inputs and modify registers only after
+     * writing memory succeeded.
+     *
+     * TODO: if writing fails halfway through (e.g., when crossing page
+     * boundaries), we're in trouble. We'd need something like access_prepare().
+     */
+    sha512_write_ocv(env, param_addr, a, ra);
+    *message_reg = deposit64(*message_reg, 0, message_reg_len,
+                             *message_reg + processed);
+    *len_reg -= processed;
+    return !len ? 0 : 3;
 }
 
 static void fill_buf_random(CPUS390XState *env, uintptr_t ra,
@@ -234,13 +305,8 @@ uint32_t HELPER(msa)(CPUS390XState *env, uint32_t r1, uint32_t r2, uint32_t r3,
         }
         break;
     case 3: /* CPACF_*_SHA_512 */
-        switch (type) {
-        case S390_FEAT_TYPE_KIMD:
-            return kimd_sha512(env, ra, env->regs[1], &env->regs[r2], &env->regs[r2 + 1], NULL);
-        case S390_FEAT_TYPE_KLMD:
-            return klmd_sha512(env, ra, env->regs[1], &env->regs[r2], &env->regs[r2 + 1]);
-        }
-        break;
+        return cpacf_sha512(env, ra, env->regs[1], &env->regs[r2],
+                            &env->regs[r2 + 1], type);
     case 114: /* CPACF_PRNO_TRNG */
         fill_buf_random(env, ra, &env->regs[r1], &env->regs[r1 + 1]);
         fill_buf_random(env, ra, &env->regs[r2], &env->regs[r2 + 1]);
David Hildenbrand Sept. 23, 2022, 10:47 a.m. UTC | #8
>> 3) I won't follow up with additional cleanup patches because I already
>> spent more time on this than I originally planned.
> 
> What is this B.S.?
> I spent months on this code and had to poke you a
> bunch to review it. You spend one afternoon with it and you're already
> burnt out, apparently. Sorry to hear that. But also, something is amiss

There is a big difference between "burnt out" and "having to 
prioritize". No need to feel sorry.

You must be fortunate if "one afternoon" is not a significant time 
investment. For me it is a significant investment.

> when the volunteer completely outside his realm of expertise has a
> greater commitment than the professional maintainer. Regardless, seeing
> that kind of thing here doesn't make me enthusiastic about contributing
> to s390 stuff in the future, in the sense that hearing "I won't work
> more on this" from a maintainer is a contagious sentiment; leaders are
> emulated.

Let's recap:

1. This is very complicated code and a complicated instruction to
    emulate. It's not easy to review. It's not easy code to write for
    someone new to s390x / TCG -- especially to get memory
    accesses right and work around the lack of memory transactions.

2. We provided early feedback fast, but I expressed that there are
    certain things that need improvements and that might be coded in a
    way that make it easier to understand/review. I had to play myself
    with that code to figure out what it even does and how we can improve
    it. As I was overloaded lately (including vacation, conferences),
    that time was not hard to find because other projects were of higher
    priority on my end.

3. You really pushed me hard offline to look into it. I did it
    ASAP because it fell through the cracks and I expressed that I am
    sorry. I proposed to get it ready for upstream and you agreed. Thomas
    was aware of that communication.


I sent out something ASAP to get your stuff finally merged. I really 
tried my best yesterday. Apparently I failed.

In an ideal world I would have *never* sent out that code. I would have 
provided review feedback and guidance to make that code easier to grasp 
and sort out the remaining issues. I thought we (Thomas included) had an 
agreement that that's the way we are going to do it. Apparently I was wrong.


Most probably I lived in the kernel space too long such that we don't 
rush something upstream. For that reason *I* sent out a patch with 
fixups included instead of requesting more changes after you clearly 
expressed that you don't want to wait any longer.


Here I am, getting told by Thomas that we do it differently now.
What I really tried to express here is: if Thomas wants to commit things 
differently now, maybe he can just separate the cleanup parts. I really 
*don't want* to send out a multi-patch series to cleanup individual 
parts -- that takes significantly more time. Especially not if something 
is not committed yet.


Yes, such upstream experiences are discouraging to new contributors. But 
such upstream experiences discourage maintainers like me as well. This
morning I honestly asked myself if I should still be listed as a 
maintainer under s390x/tcg.

Not sure if s390x/tcg would be better without me, but then I get to 
disappoint fewer people.

> 
> The 2/2 patch doesn't even apply on top of your "v8.1 1/2", so your
> submission isn't even easily apply-able.

Sorry, but that's a piece of cake for Thomas. And he could always 
request a complete resend from me anytime.
Jason A. Donenfeld Sept. 23, 2022, 11:19 a.m. UTC | #9
On Fri, Sep 23, 2022 at 12:47 PM David Hildenbrand <david@redhat.com> wrote:
> You must be fortunate if "one afternoon" is not a significant time
> investment. For me it is a significant investment.

For me too, to say the least of the multiple afternoons I've spent on
this patch set. Getting back to technical content:

> and sort out the remaining issues. I thought we (Thomas included) had an
> agreement that that's the way we are going to do it. Apparently I was wrong.
> Most probably I lived in the kernel space too long such that we don't
> rush something upstream. For that reason *I* sent out a patch with
>> Here I am, getting told by Thomas that we do it differently now.
> What I really tried to express here is: if Thomas wants to commit things
> differently now, maybe he can just separate the cleanup parts. I really
> *don't want* to send out a multi-patch series to cleanup individual
> parts -- that takes significantly more time. Especially not if something
> is not committed yet.

To recap what's been fixed in your v8.1, versus what's been refactored
out of style preference:

1) It handles the machine versioning.
2) It throws an exception on length alignment in KIMD mode:
+    /* KIMD: length has to be properly aligned. */
+    if (type == S390_FEAT_TYPE_KIMD && !QEMU_IS_ALIGNED(len, 128)) {
+        tcg_s390_program_interrupt(env, PGM_SPECIFICATION, ra);
+    }
3) It asserts if type is neither KIMD vs KLMD, with:
+    g_assert(type == S390_FEAT_TYPE_KIMD || type == S390_FEAT_TYPE_KLMD);


With (1), Thomas added that to v8.

With (3), doesn't the upper layer of the tcg dispatcher already check
that, and return an error back to the guest? If so, then your change
doesn't do anything. If not, then your change introduces a DoS, since
a guest can now crash the host process by triggering that g_assert(),
right? I had assumed the tcg dispatcher was checking that.

With (2), I found this text:

4. For COMPUTE INTERMEDIATE MESSAGE DIGEST, the second-operand length is
not a multiple of the data block size of the designated function (see
Figure 7-65 on page 7-102 for COMPUTE INTERMEDIATE MESSAGE DIGEST
functions). This specification-exception condition does not apply to the
query function, nor does it apply to COMPUTE LAST MESSAGE DIGEST.

This part seems like the most important delta between Thomas' plan and
what you posted in v8.1.

With that said, your style refactorings are probably a nice thing to
keep around too.

Jason
David Hildenbrand Sept. 23, 2022, 11:35 a.m. UTC | #10
On 23.09.22 13:19, Jason A. Donenfeld wrote:
> On Fri, Sep 23, 2022 at 12:47 PM David Hildenbrand <david@redhat.com> wrote:
>> You must be fortunate if "one afternoon" is not a significant time
>> investment. For me it is a significant investment.
> 
> For me too, to say the least of the multiple afternoons I've spent on
> this patch set. Getting back to technical content:
> 
>> and sort out the remaining issues. I thought we (Thomas included) had an
>> agreement that that's the way we are going to do it. Apparently I was wrong.
>> Most probably I lived in the kernel space too long such that we don't
>> rush something upstream. For that reason *I* sent out a patch with
>> Here I am, getting told by Thomas that we do it differently now.
>> What I really tried to express here is: if Thomas wants to commit things
>> differently now, maybe he can just separate the cleanup parts. I really
>> *don't want* to send out a multi-patch series to cleanup individual
>> parts -- that takes significantly more time. Especially not if something
>> is not committed yet.
> 
> To recap what's been fixed in your v8.1, versus what's been refactored
> out of style preference:
> 
> 1) It handles the machine versioning.
> 2) It throws an exception on length alignment in KIMD mode:
> +    /* KIMD: length has to be properly aligned. */
> +    if (type == S390_FEAT_TYPE_KIMD && !QEMU_IS_ALIGNED(len, 128)) {
> +        tcg_s390_program_interrupt(env, PGM_SPECIFICATION, ra);
> +    }
> 3) It asserts if type is neither KIMD vs KLMD, with:
> +    g_assert(type == S390_FEAT_TYPE_KIMD || type == S390_FEAT_TYPE_KLMD);
> 

One important part is

4) No memory modifications before all inputs were read
Jason A. Donenfeld Sept. 23, 2022, 11:46 a.m. UTC | #11
Hi David,

On Fri, Sep 23, 2022 at 1:35 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 23.09.22 13:19, Jason A. Donenfeld wrote:
> > On Fri, Sep 23, 2022 at 12:47 PM David Hildenbrand <david@redhat.com> wrote:
> >> You must be fortunate if "one afternoon" is not a significant time
> >> investment. For me it is a significant investment.
> >
> > For me too, to say the least of the multiple afternoons I've spent on
> > this patch set. Getting back to technical content:
> >
> >> and sort out the remaining issues. I thought we (Thomas included) had an
> >> agreement that that's the way we are going to do it. Apparently I was wrong.
> >> Most probably I lived in the kernel space too long such that we don't
> >> rush something upstream. For that reason *I* sent out a patch with
> >> Here I am, getting told by Thomas that we do it differently now.
> >> What I really tried to express here is: if Thomas wants to commit things
> >> differently now, maybe he can just separate the cleanup parts. I really
> >> *don't want* to send out a multi-patch series to cleanup individual
> >> parts -- that takes significantly more time. Especially not if something
> >> is not committed yet.
> >
> > To recap what's been fixed in your v8.1, versus what's been refactored
> > out of style preference:
> >
> > 1) It handles the machine versioning.
> > 2) It throws an exception on length alignment in KIMD mode:
> > +    /* KIMD: length has to be properly aligned. */
> > +    if (type == S390_FEAT_TYPE_KIMD && !QEMU_IS_ALIGNED(len, 128)) {
> > +        tcg_s390_program_interrupt(env, PGM_SPECIFICATION, ra);
> > +    }
> > 3) It asserts if type is neither KIMD vs KLMD, with:
> > +    g_assert(type == S390_FEAT_TYPE_KIMD || type == S390_FEAT_TYPE_KLMD);
> >
>
> One important part is
>
> 4) No memory modifications before all inputs were read

Ahhh, which v8's klmd doesn't do, since it updates the parameter block
before doing the last block. Is this a requirement of the spec? If
not, then it doesn't matter. But if so, v8's approach is truly
hopeless, and we'd be remiss to not go with your refactoring that
accomplishes this.

Jason
Thomas Huth Sept. 23, 2022, 12:05 p.m. UTC | #12
On 23/09/2022 13.46, Jason A. Donenfeld wrote:
> Hi David,
> 
> On Fri, Sep 23, 2022 at 1:35 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 23.09.22 13:19, Jason A. Donenfeld wrote:
>>> On Fri, Sep 23, 2022 at 12:47 PM David Hildenbrand <david@redhat.com> wrote:
>>>> You must be fortunate if "one afternoon" is not a significant time
>>>> investment. For me it is a significant investment.
>>>
>>> For me too, to say the least of the multiple afternoons I've spent on
>>> this patch set. Getting back to technical content:
>>>
>>>> and sort out the remaining issues. I thought we (Thomas included) had an
>>>> agreement that that's the way we are going to do it. Apparently I was wrong.
>>>> Most probably I lived in the kernel space too long such that we don't
>>>> rush something upstream. For that reason *I* sent out a patch with
>>>> Here I am, getting told by Thomas that we do it differently now.
>>>> What I really tried to express here is: if Thomas wants to commit things
>>>> differently now, maybe he can just separate the cleanup parts. I really
>>>> *don't want* to send out a multi-patch series to cleanup individual
>>>> parts -- that takes significantly more time. Especially not if something
>>>> is not committed yet.
>>>
>>> To recap what's been fixed in your v8.1, versus what's been refactored
>>> out of style preference:
>>>
>>> 1) It handles the machine versioning.
>>> 2) It throws an exception on length alignment in KIMD mode:
>>> +    /* KIMD: length has to be properly aligned. */
>>> +    if (type == S390_FEAT_TYPE_KIMD && !QEMU_IS_ALIGNED(len, 128)) {
>>> +        tcg_s390_program_interrupt(env, PGM_SPECIFICATION, ra);
>>> +    }
>>> 3) It asserts if type is neither KIMD vs KLMD, with:
>>> +    g_assert(type == S390_FEAT_TYPE_KIMD || type == S390_FEAT_TYPE_KLMD);
>>>
>>
>> One important part is
>>
>> 4) No memory modifications before all inputs were read
> 
> Ahhh, which v8's klmd doesn't do, since it updates the parameter block
> before doing the last block. Is this a requirement of the spec? If
> not, then it doesn't matter. But if so, v8's approach is truly
> hopeless, and we'd be remiss to not go with your refactoring that
> accomplishes this.

Ok, great, if you're fine with the rework, I'll go with David's v8.1 
instead. (keeping an entry on my TODO list to rework that ugly generic "msa" 
helper function one day - this really kept me confused for much of my
patch review time)

  Thomas
Jason A. Donenfeld Sept. 23, 2022, 12:07 p.m. UTC | #13
On Fri, Sep 23, 2022 at 2:05 PM Thomas Huth <thuth@redhat.com> wrote:
>
> On 23/09/2022 13.46, Jason A. Donenfeld wrote:
> > Hi David,
> >
> > On Fri, Sep 23, 2022 at 1:35 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 23.09.22 13:19, Jason A. Donenfeld wrote:
> >>> On Fri, Sep 23, 2022 at 12:47 PM David Hildenbrand <david@redhat.com> wrote:
> >>>> You must be fortunate if "one afternoon" is not a significant time
> >>>> investment. For me it is a significant investment.
> >>>
> >>> For me too, to say the least of the multiple afternoons I've spent on
> >>> this patch set. Getting back to technical content:
> >>>
> >>>> and sort out the remaining issues. I thought we (Thomas included) had an
> >>>> agreement that that's the way we are going to do it. Apparently I was wrong.
> >>>> Most probably I lived in the kernel space too long such that we don't
> >>>> rush something upstream. For that reason *I* sent out a patch with
> >>>> Here I am, getting told by Thomas that we do it differently now.
> >>>> What I really tried to express here is: if Thomas wants to commit things
> >>>> differently now, maybe he can just separate the cleanup parts. I really
> >>>> *don't want* to send out a multi-patch series to cleanup individual
> >>>> parts -- that takes significantly more time. Especially not if something
> >>>> is not committed yet.
> >>>
> >>> To recap what's been fixed in your v8.1, versus what's been refactored
> >>> out of style preference:
> >>>
> >>> 1) It handles the machine versioning.
> >>> 2) It throws an exception on length alignment in KIMD mode:
> >>> +    /* KIMD: length has to be properly aligned. */
> >>> +    if (type == S390_FEAT_TYPE_KIMD && !QEMU_IS_ALIGNED(len, 128)) {
> >>> +        tcg_s390_program_interrupt(env, PGM_SPECIFICATION, ra);
> >>> +    }
> >>> 3) It asserts if type is neither KIMD vs KLMD, with:
> >>> +    g_assert(type == S390_FEAT_TYPE_KIMD || type == S390_FEAT_TYPE_KLMD);
> >>>
> >>
> >> One important part is
> >>
> >> 4) No memory modifications before all inputs were read
> >
> > Ahhh, which v8's klmd doesn't do, since it updates the parameter block
> > before doing the last block. Is this a requirement of the spec? If
> > not, then it doesn't matter. But if so, v8's approach is truly
> > hopeless, and we'd be remiss to not go with your refactoring that
> > accomplishes this.
>
> Ok, great, if you're fine with the rework, I'll go with David's v8.1
> instead. (keeping an entry on my TODO list to rework that ugly generic "msa"
> helper function one day - this really kept me confused for much of my
> patch review time)

Okay, sure. Can one of you just look to see if that g_assert() is
going to be a DoS vector, though, or if it'll never be reached if the
prior code goes well? That's the one remaining thing I'm not sure
about.

Do you want me to rebase 2/2 on top of v8.1?

Jason
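
(Background on that question: QEMU builds with assertions enabled, so a
reachable g_assert() failure aborts the whole host process. If the
dispatcher did not already restrict "type", a guest-visible error would
have to be raised instead. A sketch of that alternative, not code from
either version:

    if (type != S390_FEAT_TYPE_KIMD && type != S390_FEAT_TYPE_KLMD) {
        /* hypothetical: fail toward the guest instead of aborting QEMU */
        tcg_s390_program_interrupt(env, PGM_SPECIFICATION, ra);
    }
)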
David Hildenbrand Sept. 23, 2022, 12:13 p.m. UTC | #14
On 23.09.22 13:46, Jason A. Donenfeld wrote:
> Hi David,
> 
> On Fri, Sep 23, 2022 at 1:35 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 23.09.22 13:19, Jason A. Donenfeld wrote:
>>> On Fri, Sep 23, 2022 at 12:47 PM David Hildenbrand <david@redhat.com> wrote:
>>>> You must be fortunate if "one afternoon" is not a significant time
>>>> investment. For me it is a significant investment.
>>>
>>> For me too, to say the least of the multiple afternoons I've spent on
>>> this patch set. Getting back to technical content:
>>>
>>>> and sort out the remaining issues. I thought we (Thomas included) had an
>>>> agreement that that's the way we are going to do it. Apparently I was wrong.
>>>> Most probably I lived in the kernel space too long such that we don't
>>>> rush something upstream. For that reason *I* sent out a patch with
>>>> Here I am, getting told by Thomas that we do it differently now.
>>>> What I really tried to express here is: if Thomas wants to commit things
>>>> differently now, maybe he can just separate the cleanup parts. I really
>>>> *don't want* to send out a multi-patch series to cleanup individual
>>>> parts -- that takes significantly more time. Especially not if something
>>>> is not committed yet.
>>>
>>> To recap what's been fixed in your v8.1, versus what's been refactored
>>> out of style preference:
>>>
>>> 1) It handles the machine versioning.
>>> 2) It throws an exception on length alignment in KIMD mode:
>>> +    /* KIMD: length has to be properly aligned. */
>>> +    if (type == S390_FEAT_TYPE_KIMD && !QEMU_IS_ALIGNED(len, 128)) {
>>> +        tcg_s390_program_interrupt(env, PGM_SPECIFICATION, ra);
>>> +    }
>>> 3) It asserts if type is neither KIMD vs KLMD, with:
>>> +    g_assert(type == S390_FEAT_TYPE_KIMD || type == S390_FEAT_TYPE_KLMD);
>>>
>>
>> One important part is
>>
>> 4) No memory modifications before all inputs were read
> 
> Ahhh, which v8's klmd doesn't do, since it updates the parameter block
> before doing the last block. Is this a requirement of the spec? If

Right, and not only the parameter block, but also registers IIRC.


It depends on the instruction and the exception. IIRC, exceptions can be
nullifying, terminating, suppressing, and partially-completing ...

Suppression: no state modification. PSW updated. Exception triggered. 
"The contents of any result fields, including the condition code, are 
not changed."

Nullification: no state modification. PSW not incremented. Exception 
triggered.

Termination: state partially changed. Bad (e.g., machine check). 
Exception triggered.

Partial completion only applies to "Interruptible Instructions". Instead 
of triggering an exception, the instruction exits (with CC=3 or so) and 
modifies the state accordingly. There are only a handful of such
instructions.



There is a chapter called "Types of Instruction Ending" in the PoP 
that's an interesting read.


So in the case of Suppression/Nullification, the expectation is that no
state (memory, register content) has been updated when the exception triggers.


Depending on the way the instruction is used and how exceptions are 
handled, that can be a real issue, for example, if the program assumes 
that registers were not updated, or that memory wasn't updated -- but 
they actually were.

> not, then it doesn't matter. But if so, v8's approach is truly
> hopeless, and we'd be remiss to not go with your refactoring that
> accomplishes this.

I wouldn't call it hopeless, but that was the real reason why I
restructured your code at all and why I had to play with the code myself
to see if it could be done any better. All the moving of stuff into other
functions was just an attempt to keep the code readable when unifying
both functions :)

As I said, handling state updates is non-trivial, and that instruction is
especially hard to implement.

I *assume* that we never fail that way in the Linux kernel use case, 
because we don't rely on exceptions at all. So one might argue that v8 
is "good enough" for that.
Thomas Huth Sept. 23, 2022, 12:45 p.m. UTC | #15
On 23/09/2022 14.07, Jason A. Donenfeld wrote:
> On Fri, Sep 23, 2022 at 2:05 PM Thomas Huth <thuth@redhat.com> wrote:
>>
>> On 23/09/2022 13.46, Jason A. Donenfeld wrote:
>>> Hi David,
>>>
>>> On Fri, Sep 23, 2022 at 1:35 PM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> On 23.09.22 13:19, Jason A. Donenfeld wrote:
>>>>> On Fri, Sep 23, 2022 at 12:47 PM David Hildenbrand <david@redhat.com> wrote:
>>>>>> You must be fortunate if "one afternoon" is not a significant time
>>>>>> investment. For me it is a significant investment.
>>>>>
>>>>> For me too, to say the least of the multiple afternoons I've spent on
>>>>> this patch set. Getting back to technical content:
>>>>>
>>>>>> and sort out the remaining issues. I thought we (Thomas included) had an
>>>>>> agreement that that's the way we are going to do it. Apparently I was wrong.
>>>>>> Most probably I lived in the kernel space too long such that we don't
>>>>>> rush something upstream. For that reason *I* sent out a patch with
>>>>>> Here I am, getting told by Thomas that we do it differently now.
>>>>>> What I really tried to express here is: if Thomas wants to commit things
>>>>>> differently now, maybe he can just separate the cleanup parts. I really
>>>>>> *don't want* to send out a multi-patch series to cleanup individual
>>>>>> parts -- that takes significantly more time. Especially not if something
>>>>>> is not committed yet.
>>>>>
>>>>> To recap what's been fixed in your v8.1, versus what's been refactored
>>>>> out of style preference:
>>>>>
>>>>> 1) It handles the machine versioning.
>>>>> 2) It throws an exception on length alignment in KIMD mode:
>>>>> +    /* KIMD: length has to be properly aligned. */
>>>>> +    if (type == S390_FEAT_TYPE_KIMD && !QEMU_IS_ALIGNED(len, 128)) {
>>>>> +        tcg_s390_program_interrupt(env, PGM_SPECIFICATION, ra);
>>>>> +    }
>>>>> 3) It asserts if type is neither KIMD vs KLMD, with:
>>>>> +    g_assert(type == S390_FEAT_TYPE_KIMD || type == S390_FEAT_TYPE_KLMD);
>>>>>
>>>>
>>>> One important part is
>>>>
>>>> 4) No memory modifications before all inputs were read
>>>
>>> Ahhh, which v8's klmd doesn't do, since it updates the parameter block
>>> before doing the last block. Is this a requirement of the spec? If
>>> not, then it doesn't matter. But if so, v8's approach is truly
>>> hopeless, and we'd be remiss to not go with your refactoring that
>>> accomplishes this.
>>
>> Ok, great, if you're fine with the rework, I'll go with David's v8.1
>> instead. (keeping an entry on my TODO list to rework that ugly generic "msa"
>> helper function one day - this really kept me confused for much of my
>> patch review time)
> 
> Okay, sure. Can one of you just look to see if that g_assert() is
> going to be a DoS vector, though, or if it'll never be reached if the
> prior code goes well? That's the one remaining thing I'm not sure
> about.
> 
> Do you want me to rebase 2/2 on top of v8.1?

Thanks, but I think it's a trivial contextual conflict only - I can fix that 
up when picking up the patches.

  Thomas

Patch

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 9a2467c889..e18b816aba 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -803,8 +803,11 @@  DEFINE_CCW_MACHINE(7_2, "7.2", true);
 
 static void ccw_machine_7_1_instance_options(MachineState *machine)
 {
+    static const S390FeatInit qemu_cpu_feat = { S390_FEAT_LIST_QEMU_V7_1 };
+
     ccw_machine_7_2_instance_options(machine);
     s390_cpudef_featoff_greater(16, 1, S390_FEAT_PAIE);
+    s390_set_qemu_cpu_model(0x8561, 15, 1, qemu_cpu_feat);
 }
 
 static void ccw_machine_7_1_class_options(MachineClass *mc)
diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index 1558c52626..baadbf4517 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -744,13 +744,19 @@  static uint16_t qemu_V7_0[] = {
     S390_FEAT_MISC_INSTRUCTION_EXT3,
 };
 
+static uint16_t qemu_V7_1[] = {
+    S390_FEAT_VECTOR_ENH2,
+};
+
 /*
  * Features for the "qemu" CPU model of the latest QEMU machine and the "max"
  * CPU model under TCG. Don't include features that are not part of the full
  * feature set of the current "max" CPU model generation.
  */
 static uint16_t qemu_MAX[] = {
-    S390_FEAT_VECTOR_ENH2,
+    S390_FEAT_MSA_EXT_5,
+    S390_FEAT_KIMD_SHA_512,
+    S390_FEAT_KLMD_SHA_512,
 };
 
 /****** END FEATURE DEFS ******/
@@ -873,6 +879,7 @@  static FeatGroupDefSpec QemuFeatDef[] = {
     QEMU_FEAT_INITIALIZER(V6_0),
     QEMU_FEAT_INITIALIZER(V6_2),
     QEMU_FEAT_INITIALIZER(V7_0),
+    QEMU_FEAT_INITIALIZER(V7_1),
     QEMU_FEAT_INITIALIZER(MAX),
 };
 
diff --git a/target/s390x/tcg/crypto_helper.c b/target/s390x/tcg/crypto_helper.c
index 138d9e7ad9..106c89fd2d 100644
--- a/target/s390x/tcg/crypto_helper.c
+++ b/target/s390x/tcg/crypto_helper.c
@@ -1,10 +1,12 @@ 
 /*
  *  s390x crypto helpers
  *
+ *  Copyright (C) 2022 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
  *  Copyright (c) 2017 Red Hat Inc
  *
  *  Authors:
  *   David Hildenbrand <david@redhat.com>
+ *   Jason A. Donenfeld <Jason@zx2c4.com>
  *
  * This work is licensed under the terms of the GNU GPL, version 2 or later.
  * See the COPYING file in the top-level directory.
@@ -18,6 +20,230 @@ 
 #include "exec/exec-all.h"
 #include "exec/cpu_ldst.h"
 
+static uint64_t R(uint64_t x, int c)
+{
+    return (x >> c) | (x << (64 - c));
+}
+static uint64_t Ch(uint64_t x, uint64_t y, uint64_t z)
+{
+    return (x & y) ^ (~x & z);
+}
+static uint64_t Maj(uint64_t x, uint64_t y, uint64_t z)
+{
+    return (x & y) ^ (x & z) ^ (y & z);
+}
+static uint64_t Sigma0(uint64_t x)
+{
+    return R(x, 28) ^ R(x, 34) ^ R(x, 39);
+}
+static uint64_t Sigma1(uint64_t x)
+{
+    return R(x, 14) ^ R(x, 18) ^ R(x, 41);
+}
+static uint64_t sigma0(uint64_t x)
+{
+    return R(x, 1) ^ R(x, 8) ^ (x >> 7);
+}
+static uint64_t sigma1(uint64_t x)
+{
+    return R(x, 19) ^ R(x, 61) ^ (x >> 6);
+}
+
+static const uint64_t K[80] = {
+    0x428a2f98d728ae22ULL, 0x7137449123ef65cdULL, 0xb5c0fbcfec4d3b2fULL,
+    0xe9b5dba58189dbbcULL, 0x3956c25bf348b538ULL, 0x59f111f1b605d019ULL,
+    0x923f82a4af194f9bULL, 0xab1c5ed5da6d8118ULL, 0xd807aa98a3030242ULL,
+    0x12835b0145706fbeULL, 0x243185be4ee4b28cULL, 0x550c7dc3d5ffb4e2ULL,
+    0x72be5d74f27b896fULL, 0x80deb1fe3b1696b1ULL, 0x9bdc06a725c71235ULL,
+    0xc19bf174cf692694ULL, 0xe49b69c19ef14ad2ULL, 0xefbe4786384f25e3ULL,
+    0x0fc19dc68b8cd5b5ULL, 0x240ca1cc77ac9c65ULL, 0x2de92c6f592b0275ULL,
+    0x4a7484aa6ea6e483ULL, 0x5cb0a9dcbd41fbd4ULL, 0x76f988da831153b5ULL,
+    0x983e5152ee66dfabULL, 0xa831c66d2db43210ULL, 0xb00327c898fb213fULL,
+    0xbf597fc7beef0ee4ULL, 0xc6e00bf33da88fc2ULL, 0xd5a79147930aa725ULL,
+    0x06ca6351e003826fULL, 0x142929670a0e6e70ULL, 0x27b70a8546d22ffcULL,
+    0x2e1b21385c26c926ULL, 0x4d2c6dfc5ac42aedULL, 0x53380d139d95b3dfULL,
+    0x650a73548baf63deULL, 0x766a0abb3c77b2a8ULL, 0x81c2c92e47edaee6ULL,
+    0x92722c851482353bULL, 0xa2bfe8a14cf10364ULL, 0xa81a664bbc423001ULL,
+    0xc24b8b70d0f89791ULL, 0xc76c51a30654be30ULL, 0xd192e819d6ef5218ULL,
+    0xd69906245565a910ULL, 0xf40e35855771202aULL, 0x106aa07032bbd1b8ULL,
+    0x19a4c116b8d2d0c8ULL, 0x1e376c085141ab53ULL, 0x2748774cdf8eeb99ULL,
+    0x34b0bcb5e19b48a8ULL, 0x391c0cb3c5c95a63ULL, 0x4ed8aa4ae3418acbULL,
+    0x5b9cca4f7763e373ULL, 0x682e6ff3d6b2b8a3ULL, 0x748f82ee5defb2fcULL,
+    0x78a5636f43172f60ULL, 0x84c87814a1f0ab72ULL, 0x8cc702081a6439ecULL,
+    0x90befffa23631e28ULL, 0xa4506cebde82bde9ULL, 0xbef9a3f7b2c67915ULL,
+    0xc67178f2e372532bULL, 0xca273eceea26619cULL, 0xd186b8c721c0c207ULL,
+    0xeada7dd6cde0eb1eULL, 0xf57d4f7fee6ed178ULL, 0x06f067aa72176fbaULL,
+    0x0a637dc5a2c898a6ULL, 0x113f9804bef90daeULL, 0x1b710b35131c471bULL,
+    0x28db77f523047d84ULL, 0x32caab7b40c72493ULL, 0x3c9ebe0a15c9bebcULL,
+    0x431d67c49c100d4cULL, 0x4cc5d4becb3e42b6ULL, 0x597f299cfc657e2aULL,
+    0x5fcb6fab3ad6faecULL, 0x6c44198c4a475817ULL
+};
+
+/* a is icv/ocv, w is a single message block. w will get reused internally. */
+static void sha512_bda(uint64_t a[8], uint64_t w[16])
+{
+    uint64_t t, z[8], b[8];
+    int i, j;
+
+    memcpy(z, a, sizeof(z));
+    for (i = 0; i < 80; i++) {
+        memcpy(b, a, sizeof(b));
+
+        t = a[7] + Sigma1(a[4]) + Ch(a[4], a[5], a[6]) + K[i] + w[i % 16];
+        b[7] = t + Sigma0(a[0]) + Maj(a[0], a[1], a[2]);
+        b[3] += t;
+        for (j = 0; j < 8; ++j) {
+            a[(j + 1) % 8] = b[j];
+        }
+        if (i % 16 == 15) {
+            for (j = 0; j < 16; ++j) {
+                w[j] += w[(j + 9) % 16] + sigma0(w[(j + 1) % 16]) +
+                        sigma1(w[(j + 14) % 16]);
+            }
+        }
+    }
+
+    for (i = 0; i < 8; i++) {
+        a[i] += z[i];
+    }
+}
+
+/* a is icv/ocv, w is a single message block that needs be64 conversion. */
+static void sha512_bda_be64(uint64_t a[8], uint64_t w[16])
+{
+    uint64_t t[16];
+    int i;
+
+    for (i = 0; i < 16; i++) {
+        t[i] = be64_to_cpu(w[i]);
+    }
+    sha512_bda(a, t);
+}
+
+static void sha512_read_icv(CPUS390XState *env, uint64_t addr,
+                            uint64_t a[8], uintptr_t ra)
+{
+    int i;
+
+    for (i = 0; i < 8; i++, addr += 8) {
+        addr = wrap_address(env, addr);
+        a[i] = cpu_ldq_be_data_ra(env, addr, ra);
+    }
+}
+
+static void sha512_write_ocv(CPUS390XState *env, uint64_t addr,
+                             uint64_t a[8], uintptr_t ra)
+{
+    int i;
+
+    for (i = 0; i < 8; i++, addr += 8) {
+        addr = wrap_address(env, addr);
+        cpu_stq_be_data_ra(env, addr, a[i], ra);
+    }
+}
+
+static void sha512_read_block(CPUS390XState *env, uint64_t addr,
+                              uint64_t a[16], uintptr_t ra)
+{
+    int i;
+
+    for (i = 0; i < 16; i++, addr += 8) {
+        addr = wrap_address(env, addr);
+        a[i] = cpu_ldq_be_data_ra(env, addr, ra);
+    }
+}
+
+static void sha512_read_mbl_be64(CPUS390XState *env, uint64_t addr,
+                                 uint8_t a[16], uintptr_t ra)
+{
+    int i;
+
+    for (i = 0; i < 16; i++, addr += 1) {
+        addr = wrap_address(env, addr);
+        a[i] = cpu_ldub_data_ra(env, addr, ra);
+    }
+}
+
+static int cpacf_sha512(CPUS390XState *env, uintptr_t ra, uint64_t param_addr,
+                      uint64_t *message_reg, uint64_t *len_reg, uint32_t type)
+{
+    enum { MAX_BLOCKS_PER_RUN = 64 }; /* Arbitrary: keep interactivity. */
+    uint64_t len = *len_reg, a[8], processed = 0;
+    int i, message_reg_len = 64;
+
+    g_assert(type == S390_FEAT_TYPE_KIMD || type == S390_FEAT_TYPE_KLMD);
+
+    if (!(env->psw.mask & PSW_MASK_64)) {
+        len = (uint32_t)len;
+        message_reg_len = (env->psw.mask & PSW_MASK_32) ? 32 : 24;
+    }
+
+    /* KIMD: length has to be properly aligned. */
+    if (type == S390_FEAT_TYPE_KIMD && !QEMU_IS_ALIGNED(len, 128)) {
+        tcg_s390_program_interrupt(env, PGM_SPECIFICATION, ra);
+    }
+
+    sha512_read_icv(env, param_addr, a, ra);
+
+    /* Process full blocks first. */
+    for (; len >= 128; len -= 128, processed += 128) {
+        uint64_t w[16];
+
+        if (processed >= MAX_BLOCKS_PER_RUN * 128) {
+            break;
+        }
+
+        sha512_read_block(env, *message_reg + processed, w, ra);
+        sha512_bda(a, w);
+    }
+
+    /* KLMD: Process partial/empty block last. */
+    if (type == S390_FEAT_TYPE_KLMD && len < 128) {
+        uint8_t x[128];
+
+        /* Read the remainder of the message byte-per-byte. */
+        for (i = 0; i < len; i++) {
+            uint64_t addr = wrap_address(env, *message_reg + processed + i);
+
+            x[i] = cpu_ldub_data_ra(env, addr, ra);
+        }
+        /* Pad the remainder with zero and set the top bit. */
+        memset(x + len, 0, 128 - len);
+        x[len] = 128;
+
+        /*
+         * Place the MBL either into this block (if there is space left),
+         * or use an additional one.
+         */
+        if (len < 112) {
+            sha512_read_mbl_be64(env, param_addr + 64, x + 112, ra);
+        }
+        sha512_bda_be64(a, (uint64_t *)x);
+
+        if (len >= 112) {
+            memset(x, 0, 112);
+            sha512_read_mbl_be64(env, param_addr + 64, x + 112, ra);
+            sha512_bda_be64(a, (uint64_t *)x);
+        }
+
+        processed += len;
+        len = 0;
+    }
+
+    /*
+     * Modify memory after we read all inputs and modify registers only after
+     * writing memory succeeded.
+     *
+     * TODO: if writing fails halfway through (e.g., when crossing page
+     * boundaries), we're in trouble. We'd need something like access_prepare().
+     */
+    sha512_write_ocv(env, param_addr, a, ra);
+    *message_reg = deposit64(*message_reg, 0, message_reg_len,
+                             *message_reg + processed);
+    *len_reg -= processed;
+    return !len ? 0 : 3;
+}
+
 uint32_t HELPER(msa)(CPUS390XState *env, uint32_t r1, uint32_t r2, uint32_t r3,
                      uint32_t type)
 {
@@ -52,6 +278,9 @@  uint32_t HELPER(msa)(CPUS390XState *env, uint32_t r1, uint32_t r2, uint32_t r3,
             cpu_stb_data_ra(env, param_addr, subfunc[i], ra);
         }
         break;
+    case 3: /* CPACF_*_SHA_512 */
+        return cpacf_sha512(env, ra, env->regs[1], &env->regs[r2],
+                            &env->regs[r2 + 1], type);
     default:
         /* we don't implement any other subfunction yet */
         g_assert_not_reached();