diff mbox series

[27/35] target/arm: Report VNCR_EL2 based faults correctly

Message ID 20231218113305.2511480-28-peter.maydell@linaro.org
State New
Headers show
Series target/arm: Implement emulation of nested virtualization | expand

Commit Message

Peter Maydell Dec. 18, 2023, 11:32 a.m. UTC
If FEAT_NV2 redirects a system register access to a memory offset
from VNCR_EL2, that access might fault.  In this case we need to
report the correct syndrome information:
 * Data Abort, from same-EL
 * no ISS information
 * the VNCR bit (bit 13) is set

and the exception must be taken to EL2.

Save an appropriate syndrome template when generating code; we can
then use that to:
 * select the right target EL
 * reconstitute a correct final syndrome for the data abort
 * report the right syndrome if we take a FEAT_RME granule protection
   fault on the VNCR-based write

Note that because VNCR is bit 13, we must start keeping bit 13 in
template syndromes, by adjusting ARM_INSN_START_WORD2_SHIFT.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h               |  4 ++--
 target/arm/syndrome.h          | 20 ++++++++++++++++----
 target/arm/tcg/tlb_helper.c    | 27 +++++++++++++++++++++++++--
 target/arm/tcg/translate-a64.c |  4 ++++
 4 files changed, 47 insertions(+), 8 deletions(-)

Comments

Richard Henderson Dec. 28, 2023, 12:03 a.m. UTC | #1
On 12/18/23 22:32, Peter Maydell wrote:
> If FEAT_NV2 redirects a system register access to a memory offset
> from VNCR_EL2, that access might fault.  In this case we need to
> report the correct syndrome information:
>   * Data Abort, from same-EL
>   * no ISS information
>   * the VNCR bit (bit 13) is set
> 
> and the exception must be taken to EL2.
> 
> Save an appropriate syndrome template when generating code; we can
> then use that to:
>   * select the right target EL
>   * reconstitute a correct final syndrome for the data abort
>   * report the right syndrome if we take a FEAT_RME granule protection
>     fault on the VNCR-based write
> 
> Note that because VNCR is bit 13, we must start keeping bit 13 in
> template syndromes, by adjusting ARM_INSN_START_WORD2_SHIFT.
> 
> Signed-off-by: Peter Maydell<peter.maydell@linaro.org>
> ---
>   target/arm/cpu.h               |  4 ++--
>   target/arm/syndrome.h          | 20 ++++++++++++++++----
>   target/arm/tcg/tlb_helper.c    | 27 +++++++++++++++++++++++++--
>   target/arm/tcg/translate-a64.c |  4 ++++
>   4 files changed, 47 insertions(+), 8 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~
Peter Maydell Jan. 16, 2024, 1:20 p.m. UTC | #2
On Tue, 16 Jan 2024 at 13:09, Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Mon, 18 Dec 2023 11:32:57 +0000
> Peter Maydell <peter.maydell@linaro.org> wrote:
>
> > If FEAT_NV2 redirects a system register access to a memory offset
> > from VNCR_EL2, that access might fault.  In this case we need to
> > report the correct syndrome information:
> >  * Data Abort, from same-EL
> >  * no ISS information
> >  * the VNCR bit (bit 13) is set
> >
> > and the exception must be taken to EL2.
> >
> > Save an appropriate syndrome template when generating code; we can
> > then use that to:
> >  * select the right target EL
> >  * reconstitute a correct final syndrome for the data abort
> >  * report the right syndrome if we take a FEAT_RME granule protection
> >    fault on the VNCR-based write
> >
> > Note that because VNCR is bit 13, we must start keeping bit 13 in
> > template syndromes, by adjusting ARM_INSN_START_WORD2_SHIFT.
> >
> > Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
>
> Hi Peter,
>
> I'm getting an unhelpful crash on calling init in a guest
> running on top of an a76 emulated host with virtualization turned on.
>
> Run /sbin/init as init process
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007
> CPU: 1 PID: 1 Comm: init Not tainted 6.7.0+ #1119
> Hardware name: linux,dummy-virt (DT)
> Call trace:
>  dump_backtrace+0xa0/0x128
>  show_stack+0x20/0x38
>  dump_stack_lvl+0x48/0x60
>  dump_stack+0x18/0x28
>  panic+0x380/0x3c0
>  do_exit+0x89c/0x9a0
>  do_group_exit+0x3c/0xa0
>  get_signal+0x968/0x970
>  do_notify_resume+0x21c/0x1460
>  el0_ia+0xa0/0xb0
>  el0t_64_sync_handler+0xd0/0x130
>  el0t_64_sync+0x190/0x198
> SMP: stopping secondary CPUs
> Kernel Offset: 0x2a8c93a00000 from 0xffff800080000000
> PHYS_OFFSET: 0xffff82f980000000
> CPU features: 0x0,00000001,7002014a,2101720b
> Memory Limit: none
> ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007 ]---
>
> Upstream kernel as of yesterday.  Nothing particular 'exciting' in the
> configurations. Not attempting to use Nested virt.
> -M virt,gic-version=3,virtualization=true
> -cpu cortex-a76 (happens with max as well but switched to a76 for testing
> to reduce possible sources of problems).
>
> Doesn't happen if single cpu in the guest, or if using gic v2 in both.
>
> Bisection points at this patch - so far no idea why but I've only
> just started digging into this.

Bisecting to this patch is a bit weird because at this point
in the series emulation of FEAT_NV2 should be disabled and
the code being added should never be used. You could put
an assert(0) into the code in translate-a64.c before the
call to syn_data_abort_vncr() and in arm_deliver_fault()
assert(!is_vncr) to confirm that we're not somehow getting
into this code for some non-FEAT_NV2 situation, I guess.

thanks
-- PMM
Peter Maydell Jan. 16, 2024, 2:59 p.m. UTC | #3
On Tue, 16 Jan 2024 at 14:50, Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Tue, 16 Jan 2024 13:20:33 +0000
> Peter Maydell <peter.maydell@linaro.org> wrote:
> > Bisecting to this patch is a bit weird because at this point
> > in the series emulation of FEAT_NV2 should be disabled and
> > the code being added should never be used. You could put
> > an assert(0) into the code in translate-a64.c before the
> > call to syn_data_abort_vncr() and in arm_deliver_fault()
> > assert(!is_vncr) to confirm that we're not somehow getting
> > into this code for some non-FEAT_NV2 situation, I guess.
>
> Not that, but surprisingly is_vncr == true.
> in arm_deliver_fault()
>
> Frigging that to be false gets me up and running. I'll see
> if I can figure out why it is set.

I don't know if this is the cause, but looking again at the
line that sets is_vncr I see at least one obvious bug:

    bool is_vncr = (mmu_idx != MMU_INST_FETCH) &&
        (env->exception.syndrome & ARM_EL_VNCR);

is testing the wrong variable -- the first part
of the condition should be "access_type != MMU_INST_FETCH".

If you fix that does the failure go away ?

Yay for C and its very sloppy typing :-/

-- PMM
Peter Maydell Jan. 16, 2024, 3:35 p.m. UTC | #4
On Tue, 16 Jan 2024 at 15:29, Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Tue, 16 Jan 2024 14:59:15 +0000
> Peter Maydell <peter.maydell@linaro.org> wrote:
>
> > On Tue, 16 Jan 2024 at 14:50, Jonathan Cameron
> > <Jonathan.Cameron@huawei.com> wrote:
> > >
> > > On Tue, 16 Jan 2024 13:20:33 +0000
> > > Peter Maydell <peter.maydell@linaro.org> wrote:
> > > > Bisecting to this patch is a bit weird because at this point
> > > > in the series emulation of FEAT_NV2 should be disabled and
> > > > the code being added should never be used. You could put
> > > > an assert(0) into the code in translate-a64.c before the
> > > > call to syn_data_abort_vncr() and in arm_deliver_fault()
> > > > assert(!is_vncr) to confirm that we're not somehow getting
> > > > into this code for some non-FEAT_NV2 situation, I guess.
> > >
> > > Not that, but surprisingly is_vncr == true.
> > > in arm_deliver_fault()
> > >
> > > Frigging that to be false gets me up and running. I'll see
> > > if I can figure out why it is set.
> >
> > I don't know if this is the cause, but looking again at the
> > line that sets is_vncr I see at least one obvious bug:
> >
> >     bool is_vncr = (mmu_idx != MMU_INST_FETCH) &&
> >         (env->exception.syndrome & ARM_EL_VNCR);
> >
> > is testing the wrong variable -- the first part
> > of the condition should be "access_type != MMU_INST_FETCH".
> >
> > If you fix that does the failure go away ?
> Ah - indeed that fixes it.
>
> I guess that makes sense. Presumably the bit is used for
> something else for instruction fetches.
>
> Thanks for your quick help!

Thanks for testing -- I'll send out a proper patch in
a bit.

-- PMM
diff mbox series

Patch

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index bc4fa95ea35..da640949518 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -120,12 +120,12 @@  enum {
 #define TARGET_INSN_START_EXTRA_WORDS 2
 
 /* The 2nd extra word holding syndrome info for data aborts does not use
- * the upper 6 bits nor the lower 14 bits. We mask and shift it down to
+ * the upper 6 bits nor the lower 13 bits. We mask and shift it down to
  * help the sleb128 encoder do a better job.
  * When restoring the CPU state, we shift it back up.
  */
 #define ARM_INSN_START_WORD2_MASK ((1 << 26) - 1)
-#define ARM_INSN_START_WORD2_SHIFT 14
+#define ARM_INSN_START_WORD2_SHIFT 13
 
 /* We currently assume float and double are IEEE single and double
    precision respectively.
diff --git a/target/arm/syndrome.h b/target/arm/syndrome.h
index 95454b5b3bb..1a49767479f 100644
--- a/target/arm/syndrome.h
+++ b/target/arm/syndrome.h
@@ -86,6 +86,9 @@  typedef enum {
 #define ARM_EL_IL (1 << ARM_EL_IL_SHIFT)
 #define ARM_EL_ISV (1 << ARM_EL_ISV_SHIFT)
 
+/* In the Data Abort syndrome */
+#define ARM_EL_VNCR (1 << 13)
+
 static inline uint32_t syn_get_ec(uint32_t syn)
 {
     return syn >> ARM_EL_EC_SHIFT;
@@ -256,13 +259,12 @@  static inline uint32_t syn_bxjtrap(int cv, int cond, int rm)
         (cv << 24) | (cond << 20) | rm;
 }
 
-static inline uint32_t syn_gpc(int s2ptw, int ind, int gpcsc,
+static inline uint32_t syn_gpc(int s2ptw, int ind, int gpcsc, int vncr,
                                int cm, int s1ptw, int wnr, int fsc)
 {
-    /* TODO: FEAT_NV2 adds VNCR */
     return (EC_GPC << ARM_EL_EC_SHIFT) | ARM_EL_IL | (s2ptw << 21)
-            | (ind << 20) | (gpcsc << 14) | (cm << 8) | (s1ptw << 7)
-            | (wnr << 6) | fsc;
+        | (ind << 20) | (gpcsc << 14) | (vncr << 13) | (cm << 8)
+        | (s1ptw << 7) | (wnr << 6) | fsc;
 }
 
 static inline uint32_t syn_insn_abort(int same_el, int ea, int s1ptw, int fsc)
@@ -295,6 +297,16 @@  static inline uint32_t syn_data_abort_with_iss(int same_el,
            | (ea << 9) | (cm << 8) | (s1ptw << 7) | (wnr << 6) | fsc;
 }
 
+/*
+ * Faults due to FEAT_NV2 VNCR_EL2-based accesses report as same-EL
+ * Data Aborts with the VNCR bit set.
+ */
+static inline uint32_t syn_data_abort_vncr(int ea, int wnr, int fsc)
+{
+    return (EC_DATAABORT << ARM_EL_EC_SHIFT) | (1 << ARM_EL_EC_SHIFT)
+        | ARM_EL_IL | ARM_EL_VNCR | (wnr << 6) | fsc;
+}
+
 static inline uint32_t syn_swstep(int same_el, int isv, int ex)
 {
     return (EC_SOFTWARESTEP << ARM_EL_EC_SHIFT) | (same_el << ARM_EL_EC_SHIFT)
diff --git a/target/arm/tcg/tlb_helper.c b/target/arm/tcg/tlb_helper.c
index 4fdd85359e1..dd5de74ffb7 100644
--- a/target/arm/tcg/tlb_helper.c
+++ b/target/arm/tcg/tlb_helper.c
@@ -50,7 +50,15 @@  static inline uint32_t merge_syn_data_abort(uint32_t template_syn,
      * ST64BV, or ST64BV0 insns report syndrome info even for stage-1
      * faults and regardless of the target EL.
      */
-    if (!(template_syn & ARM_EL_ISV) || target_el != 2
+    if (template_syn & ARM_EL_VNCR) {
+        /*
+         * FEAT_NV2 faults on accesses via VNCR_EL2 are a special case:
+         * they are always reported as "same EL", even though we are going
+         * from EL1 to EL2.
+         */
+        assert(!fi->stage2);
+        syn = syn_data_abort_vncr(fi->ea, is_write, fsc);
+    } else if (!(template_syn & ARM_EL_ISV) || target_el != 2
         || fi->s1ptw || !fi->stage2) {
         syn = syn_data_abort_no_iss(same_el, 0,
                                     fi->ea, 0, fi->s1ptw, is_write, fsc);
@@ -169,6 +177,20 @@  void arm_deliver_fault(ARMCPU *cpu, vaddr addr,
     int current_el = arm_current_el(env);
     bool same_el;
     uint32_t syn, exc, fsr, fsc;
+    /*
+     * We know this must be a data or insn abort, and that
+     * env->exception.syndrome contains the template syndrome set
+     * up at translate time. So we can check only the VNCR bit
+     * (and indeed syndrome does not have the EC field in it,
+     * because we masked that out in disas_set_insn_syndrome())
+     */
+    bool is_vncr = (mmu_idx != MMU_INST_FETCH) &&
+        (env->exception.syndrome & ARM_EL_VNCR);
+
+    if (is_vncr) {
+        /* FEAT_NV2 faults on accesses via VNCR_EL2 go to EL2 */
+        target_el = 2;
+    }
 
     if (report_as_gpc_exception(cpu, current_el, fi)) {
         target_el = 3;
@@ -177,7 +199,8 @@  void arm_deliver_fault(ARMCPU *cpu, vaddr addr,
 
         syn = syn_gpc(fi->stage2 && fi->type == ARMFault_GPCFOnWalk,
                       access_type == MMU_INST_FETCH,
-                      encode_gpcsc(fi), 0, fi->s1ptw,
+                      encode_gpcsc(fi), is_vncr,
+                      0, fi->s1ptw,
                       access_type == MMU_DATA_STORE, fsc);
 
         env->cp15.mfar_el3 = fi->paddr;
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 128bff4b445..8f905ed9645 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -2293,6 +2293,7 @@  static void handle_sys(DisasContext *s, bool isread,
         MemOp mop = MO_64 | MO_ALIGN | MO_ATOM_IFALIGN;
         ARMMMUIdx armmemidx = s->nv2_mem_e20 ? ARMMMUIdx_E20_2 : ARMMMUIdx_E2;
         int memidx = arm_to_core_mmu_idx(armmemidx);
+        uint32_t syn;
 
         if (s->nv2_mem_be) {
             mop |= MO_BE;
@@ -2302,6 +2303,9 @@  static void handle_sys(DisasContext *s, bool isread,
         tcg_gen_addi_i64(ptr, ptr,
                          (ri->nv2_redirect_offset & ~NV2_REDIR_FLAG_MASK));
         tcg_rt = cpu_reg(s, rt);
+
+        syn = syn_data_abort_vncr(0, !isread, 0);
+        disas_set_insn_syndrome(s, syn);
         if (isread) {
             tcg_gen_qemu_ld_i64(tcg_rt, ptr, memidx, mop);
         } else {