diff mbox series

[09/10] target/arm: Implement FPSCR.LTPSIZE for M-profile LOB extension

Message ID 20201012153746.9996-10-peter.maydell@linaro.org
State New
Headers show
Series target/arm: Various v8.1M minor features | expand

Commit Message

Peter Maydell Oct. 12, 2020, 3:37 p.m. UTC
If the M-profile low-overhead-branch extension is implemented, FPSCR
bits [18:16] are a new field LTPSIZE.  If MVE is not implemented
(currently always true for us) then this field always reads as 4 and
ignores writes.

These bits used to be the vector-length field for the old
short-vector extension, so we need to take care that they
are not misinterpreted as setting vec_len.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.c        |  5 +++++
 target/arm/vfp_helper.c | 25 +++++++++++++++++++++----
 2 files changed, 26 insertions(+), 4 deletions(-)

Comments

Richard Henderson Oct. 13, 2020, 8:06 p.m. UTC | #1
On 10/12/20 8:37 AM, Peter Maydell wrote:
> @@ -198,8 +200,14 @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
>          /*
>           * M profile FPSCR is RES0 for the QC, STRIDE, FZ16, LEN bits
>           * and also for the trapped-exception-handling bits IxE.
> +         * From v8.1M with the low-overhead-loop extension bits
> +         * [18:16] are used for LTPSIZE and (since we don't implement
> +         * MVE) always read as 4 and ignore writes.
>           */
>          val &= 0xf7c0009f;
> +        if (cpu_isar_feature(aa32_lob, cpu)) {
> +            val |= 4 << 16;
> +        }
>      }
>  
>      vfp_set_fpscr_to_host(env, val);
> @@ -212,9 +220,18 @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
>       * (which are stored in fp_status), and the other RES0 bits
>       * in between, then we clear all of the low 16 bits.
>       */
> -    env->vfp.xregs[ARM_VFP_FPSCR] = val & 0xf7c80000;
> -    env->vfp.vec_len = (val >> 16) & 7;
> -    env->vfp.vec_stride = (val >> 20) & 3;
> +    if (cpu_isar_feature(aa32_lob, cpu)) {
> +        /*
> +         * M-profile low-overhead-loop extension: [18:16] are LTPSIZE
> +         * and we keep them in vfp.xregs[].
> +         */
> +        env->vfp.xregs[ARM_VFP_FPSCR] = val & 0xf7cf0000;
> +    } else {
> +        /* Those bits might be the old-style short vector length/stride */
> +        env->vfp.xregs[ARM_VFP_FPSCR] = val & 0xf7c80000;
> +        env->vfp.vec_len = (val >> 16) & 7;
> +        env->vfp.vec_stride = (val >> 20) & 3;
> +    }

I think these two sets of masking are confusing.
Perhaps usefully rearranged as

    if (!fp16) {
        val &= ~fz16;
    }
    vfp_set_fpscr_to_host(env, val);

    if (!m-profile) {
        vec_len = extract32(val, 16, 3);
        vec_stride = extract32(val, 20, 2);
    }
    val &= 0xf7c80000;
    if (lob) {
        val |= 4 << 16;
    }
    fpscr = val;

Which then obviates the next patch.


r~
Peter Maydell Oct. 13, 2020, 8:38 p.m. UTC | #2
On Tue, 13 Oct 2020 at 21:06, Richard Henderson
<richard.henderson@linaro.org> wrote:
> I think these two sets of masking are confusing.
> Perhaps usefully rearranged as
>
>     if (!fp16) {
>         val &= ~fz16;
>     }
>     vfp_set_fpscr_to_host(env, val);
>
>     if (!m-profile) {
>         vec_len = extract32(val, 16, 3);
>         vec_stride = extract32(val, 20, 2);
>     }
>     val &= 0xf7c80000;
>     if (lob) {
>         val |= 4 << 16;
>     }
>     fpscr = val;

Yeah, probably cleaner.

The other thing I wondered about is whether we should
be setting vec_len/vec_stride for an A-profile CPU which
doesn't implement the short-vector extension (ie where
MVFR0.FPShVec is zero). But that gets a bit awkward: v8A
allows an implementation to make Stride and Len be RAZ,
but v7A didn't permit that and so I think we would need
to distinguish:
 * has short-vector support (eg Cortex-A9)
 * v8A, can implement FPSCR.{Stride,Len} as RAZ/WI
 * no short-vector support, Stride/Len can be written
   but the only effect is that some insns must UNDEF
   (eg Cortex-A7)

I think at the moment we currently provide short-vector
support for everything, which is wrong but wrong in
the direction that means more guest code runs...

thanks
-- PMM
Richard Henderson Oct. 13, 2020, 9:01 p.m. UTC | #3
On 10/13/20 1:38 PM, Peter Maydell wrote:
>  * has short-vector support (eg Cortex-A9)
>  * v8A, can implement FPSCR.{Stride,Len} as RAZ/WI
>  * no short-vector support, Stride/Len can be written
>    but the only effect is that some insns must UNDEF
>    (eg Cortex-A7)

Yep.

The other thing I wondered is if it was worthwhile to go ahead and split out
ltpsize now, even with MTE not implemented.

Eventually the conditions here would look like

    if (m-profile) {
        if (mte) {
            ltpsize = [18:16];
        }
    } else {
        if (!v8) {
            vec_len = [18:16];
            vec_stride = [22:20];
        }
    }

but for now you could leave out the assignment to ltpsize and just leave it
initialized to 4 since reset.


r~
diff mbox series

Patch

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 186ee621a65..baae826f94f 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -255,6 +255,11 @@  static void arm_cpu_reset(DeviceState *dev)
         uint8_t *rom;
         uint32_t vecbase;
 
+        if (cpu_isar_feature(aa32_lob, cpu)) {
+            /* LTPSIZE is constant 4 if MVE not implemented */
+            env->vfp.xregs[ARM_VFP_FPSCR] |= 4 << 16;
+        }
+
         if (arm_feature(env, ARM_FEATURE_M_SECURITY)) {
             env->v7m.secure = true;
         } else {
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index 5666393ef79..350150adbf1 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -189,8 +189,10 @@  uint32_t vfp_get_fpscr(CPUARMState *env)
 
 void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
 {
+    ARMCPU *cpu = env_archcpu(env);
+
     /* When ARMv8.2-FP16 is not supported, FZ16 is RES0.  */
-    if (!cpu_isar_feature(any_fp16, env_archcpu(env))) {
+    if (!cpu_isar_feature(any_fp16, cpu)) {
         val &= ~FPCR_FZ16;
     }
 
@@ -198,8 +200,14 @@  void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
         /*
          * M profile FPSCR is RES0 for the QC, STRIDE, FZ16, LEN bits
          * and also for the trapped-exception-handling bits IxE.
+         * From v8.1M with the low-overhead-loop extension bits
+         * [18:16] are used for LTPSIZE and (since we don't implement
+         * MVE) always read as 4 and ignore writes.
          */
         val &= 0xf7c0009f;
+        if (cpu_isar_feature(aa32_lob, cpu)) {
+            val |= 4 << 16;
+        }
     }
 
     vfp_set_fpscr_to_host(env, val);
@@ -212,9 +220,18 @@  void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
      * (which are stored in fp_status), and the other RES0 bits
      * in between, then we clear all of the low 16 bits.
      */
-    env->vfp.xregs[ARM_VFP_FPSCR] = val & 0xf7c80000;
-    env->vfp.vec_len = (val >> 16) & 7;
-    env->vfp.vec_stride = (val >> 20) & 3;
+    if (cpu_isar_feature(aa32_lob, cpu)) {
+        /*
+         * M-profile low-overhead-loop extension: [18:16] are LTPSIZE
+         * and we keep them in vfp.xregs[].
+         */
+        env->vfp.xregs[ARM_VFP_FPSCR] = val & 0xf7cf0000;
+    } else {
+        /* Those bits might be the old-style short vector length/stride */
+        env->vfp.xregs[ARM_VFP_FPSCR] = val & 0xf7c80000;
+        env->vfp.vec_len = (val >> 16) & 7;
+        env->vfp.vec_stride = (val >> 20) & 3;
+    }
 
     /*
      * The bit we set within fpscr_q is arbitrary; the register as a