[RFC] target-arm: kvm: Differentiate registers based on write-back levels

Message ID 1436526053-4516-1-git-send-email-christoffer.dall@linaro.org
State New

Commit Message

Christoffer Dall July 10, 2015, 11 a.m. UTC
Some registers like the CNTVCT register should only be written to the
kernel as part of machine initialization or on vmload operations, but
never during runtime, as this can potentially make time go backwards or
create inconsistent time observations between VCPUs.

Introduce a list of registers that should not be written back at runtime
and check this list on syncing the register state to the KVM state.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 target-arm/kvm.c     | 34 +++++++++++++++++++++++++++++++++-
 target-arm/kvm32.c   |  2 +-
 target-arm/kvm64.c   |  2 +-
 target-arm/kvm_arm.h |  3 ++-
 target-arm/machine.c |  2 +-
 5 files changed, 38 insertions(+), 5 deletions(-)

Comments

Peter Maydell July 10, 2015, 11:22 a.m. UTC | #1
On 10 July 2015 at 12:00, Christoffer Dall <christoffer.dall@linaro.org> wrote:
> Some registers like the CNTVCT register should only be written to the
> kernel as part of machine initialization or on vmload operations, but
> never during runtime, as this can potentially make time go backwards or
> create inconsistent time observations between VCPUs.
>
> Introduce a list of registers that should not be written back at runtime
> and check this list on syncing the register state to the KVM state.

Thanks for picking this one up...

> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  target-arm/kvm.c     | 34 +++++++++++++++++++++++++++++++++-
>  target-arm/kvm32.c   |  2 +-
>  target-arm/kvm64.c   |  2 +-
>  target-arm/kvm_arm.h |  3 ++-
>  target-arm/machine.c |  2 +-
>  5 files changed, 38 insertions(+), 5 deletions(-)
>
> diff --git a/target-arm/kvm.c b/target-arm/kvm.c
> index 548bfd7..2e92699 100644
> --- a/target-arm/kvm.c
> +++ b/target-arm/kvm.c
> @@ -409,7 +409,35 @@ bool write_kvmstate_to_list(ARMCPU *cpu)
>      return ok;
>  }
>
> -bool write_list_to_kvmstate(ARMCPU *cpu)
> +typedef struct cpreg_state_level {
> +    uint64_t kvm_idx;
> +    int level;
> +} cpreg_state_level;

(QEMU's coding style prefers CPRegStateLevel for struct types.)

> +
> +/* All system registers not listed in the following table are assumed to be
> + * of the level KVM_PUT_RUNTIME_STATE. If a register should be written less
> + * often, you must add it to this table with a state of either
> + * KVM_PUT_RESET_STATE or KVM_PUT_FULL_STATE.
> + */
> +cpreg_state_level non_runtime_cpregs[] = {
> +    { KVM_REG_ARM_TIMER_CNT, KVM_PUT_FULL_STATE },

This should be KVM_PUT_RESET_STATE, right?

> +};

The other option here would be to keep the level information
in the cpreg structs (which is where we put everything else
we know about cpregs); we'd probably need to then initialise
some other data structure if we wanted to avoid the hash
table lookup for every register in write_list_to_kvmstate.

I guess if we expect this list to remain a fairly small
set of exceptional cases then this is OK (and vaguely
comparable to the existing kvm_arm_reg_syncs_via_cpreg_list
handling).
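
A rough sketch of that alternative, for comparison: the put level lives in the register definition, and a parallel array of levels is filled once at init so the sync loop only does an array read per register. Everything here is hypothetical (DemoRegInfo, demo_lookup(), demo_init_levels(), and the register IDs in the usage note); the linear scan merely stands in for the cpreg hash table lookup, and the KVM_PUT_* values are assumed to match QEMU's definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed to match QEMU's put levels; higher means written less often. */
#define KVM_PUT_RUNTIME_STATE 1
#define KVM_PUT_FULL_STATE    3

/* Hypothetical stand-in for a cpreg definition carrying its own put level. */
typedef struct DemoRegInfo {
    uint64_t kvm_idx;
    int put_level;
} DemoRegInfo;

/* Find a register definition; the linear scan stands in for the hash table. */
static const DemoRegInfo *demo_lookup(const DemoRegInfo *defs, size_t ndefs,
                                      uint64_t kvm_idx)
{
    size_t i;

    for (i = 0; i < ndefs; i++) {
        if (defs[i].kvm_idx == kvm_idx) {
            return &defs[i];
        }
    }
    return NULL;
}

/* One-time setup: levels[i] describes indexes[i]. Registers without a
 * definition default to KVM_PUT_RUNTIME_STATE. */
void demo_init_levels(const DemoRegInfo *defs, size_t ndefs,
                      const uint64_t *indexes, int *levels, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        const DemoRegInfo *ri = demo_lookup(defs, ndefs, indexes[i]);

        levels[i] = ri ? ri->put_level : KVM_PUT_RUNTIME_STATE;
    }
}
```

With levels[] built this way, the loop in write_list_to_kvmstate() could compare levels[i] against the requested level directly, at the cost of one more per-CPU array to allocate and keep in step with cpreg_indexes.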

Don't we need separate 32-bit and 64-bit versions of
this list?

thanks
-- PMM

Christoffer Dall July 11, 2015, 12:18 p.m. UTC | #2
On Fri, Jul 10, 2015 at 12:22:31PM +0100, Peter Maydell wrote:
> On 10 July 2015 at 12:00, Christoffer Dall <christoffer.dall@linaro.org> wrote:
> > Some registers like the CNTVCT register should only be written to the
> > kernel as part of machine initialization or on vmload operations, but
> > never during runtime, as this can potentially make time go backwards or
> > create inconsistent time observations between VCPUs.
> >
> > Introduce a list of registers that should not be written back at runtime
> > and check this list on syncing the register state to the KVM state.
> 
> Thanks for picking this one up...
> 
> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > ---
> >  target-arm/kvm.c     | 34 +++++++++++++++++++++++++++++++++-
> >  target-arm/kvm32.c   |  2 +-
> >  target-arm/kvm64.c   |  2 +-
> >  target-arm/kvm_arm.h |  3 ++-
> >  target-arm/machine.c |  2 +-
> >  5 files changed, 38 insertions(+), 5 deletions(-)
> >
> > diff --git a/target-arm/kvm.c b/target-arm/kvm.c
> > index 548bfd7..2e92699 100644
> > --- a/target-arm/kvm.c
> > +++ b/target-arm/kvm.c
> > @@ -409,7 +409,35 @@ bool write_kvmstate_to_list(ARMCPU *cpu)
> >      return ok;
> >  }
> >
> > -bool write_list_to_kvmstate(ARMCPU *cpu)
> > +typedef struct cpreg_state_level {
> > +    uint64_t kvm_idx;
> > +    int level;
> > +} cpreg_state_level;
> 
> (QEMU's coding style prefers CPRegStateLevel for struct types.)
> 

ok

> > +
> > +/* All system registers not listed in the following table are assumed to be
> > + * of the level KVM_PUT_RUNTIME_STATE. If a register should be written less
> > + * often, you must add it to this table with a state of either
> > + * KVM_PUT_RESET_STATE or KVM_PUT_FULL_STATE.
> > + */
> > +cpreg_state_level non_runtime_cpregs[] = {
> > +    { KVM_REG_ARM_TIMER_CNT, KVM_PUT_FULL_STATE },
> 
> This should be KVM_PUT_RESET_STATE, right?
> 
Should it?  If you reset a real machine, you will not necessarily see a
counter value of zero, will you?

I guess this depends on whether QEMU reset means power the system
completely off and then on again, or some softer reset?

> > +};
> 
> The other option here would be to keep the level information
> in the cpreg structs (which is where we put everything else
> we know about cpregs); we'd probably need to then initialise
> some other data structure if we wanted to avoid the hash
> table lookup for every register in write_list_to_kvmstate.
> 
> I guess if we expect this list to remain a fairly small
> set of exceptional cases then this is OK (and vaguely
> comparable to the existing kvm_arm_reg_syncs_via_cpreg_list
> handling).

I thought about this too, and sent this as an RFC for exactly this
reason.  I did it this way initially for two reasons: (1) I don't
understand the hash-table register initialization flow for aarch64 and
(2) I could really only identify this single register for now that needs
to be marked as a non-runtime register, and then this is less invasive.

> 
> Don't we need separate 32-bit and 64-bit versions of
> this list?
> 
Do we?  I thought this file would compile separately for the 32-bit and
64-bit versions and the register index define is the same name for both
architectures, did I get this wrong?

Of course, for other registers with unique-to-32-bit-or-64-bit reg index
defines, yes, we would need a separate table.  Should they then be
defined in the kvm32.c and kvm64.c and passed in as a pointer to
write_list_to_kvmstate()?

Thanks,
-Christoffer

Peter Maydell July 14, 2015, 2:54 p.m. UTC | #3
On 11 July 2015 at 13:18, Christoffer Dall <christoffer.dall@linaro.org> wrote:
> On Fri, Jul 10, 2015 at 12:22:31PM +0100, Peter Maydell wrote:
>> On 10 July 2015 at 12:00, Christoffer Dall <christoffer.dall@linaro.org> wrote:
>> > +/* All system registers not listed in the following table are assumed to be
>> > + * of the level KVM_PUT_RUNTIME_STATE. If a register should be written less
>> > + * often, you must add it to this table with a state of either
>> > + * KVM_PUT_RESET_STATE or KVM_PUT_FULL_STATE.
>> > + */
>> > +cpreg_state_level non_runtime_cpregs[] = {
>> > +    { KVM_REG_ARM_TIMER_CNT, KVM_PUT_FULL_STATE },
>>
>> This should be KVM_PUT_RESET_STATE, right?
>>
> Should it?  If you reset a real machine, you will not necessarily see a
> counter value of zero, will you?

I was confused, I thought PUT_FULL_STATE meant what PUT_RUNTIME_STATE
does.

> I guess this depends on whether QEMU reset means power the system
> completely off and then on again, or some softer reset?

QEMU reset means power cycle. But I think the semantics of
KVM_PUT_RESET_STATE are not "does real h/w change this on
reset" but "does QEMU's runtime code change this on reset"
(ie does the common code then need to do a sync of the register
in order to make the reset code's change show up to KVM).
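
To make the ordering concrete, here is a minimal standalone sketch of the comparison the patch relies on. It assumes the KVM_PUT_* levels are the increasing integers 1, 2, 3, as QEMU defines them; the register IDs are made-up placeholders rather than real KVM indexes, and should_write() is a hypothetical helper that expresses the inverse of the patch's "cpreg_level(regidx) > level" skip test.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match QEMU's put levels; higher means written less often. */
#define KVM_PUT_RUNTIME_STATE 1
#define KVM_PUT_RESET_STATE   2
#define KVM_PUT_FULL_STATE    3

/* Placeholder IDs for illustration only, not real KVM register indexes. */
#define DEMO_REG_TIMER_CNT 0x1001ULL
#define DEMO_REG_OTHER     0x1002ULL

typedef struct CPRegStateLevel {
    uint64_t kvm_idx;
    int level;
} CPRegStateLevel;

/* Registers that must not be written back on every runtime sync. */
static const CPRegStateLevel non_runtime_cpregs[] = {
    { DEMO_REG_TIMER_CNT, KVM_PUT_FULL_STATE },
};

/* Return the minimum sync level at which a register is written. */
static int cpreg_level(uint64_t kvm_idx)
{
    size_t n = sizeof(non_runtime_cpregs) / sizeof(non_runtime_cpregs[0]);
    size_t i;

    for (i = 0; i < n; i++) {
        if (non_runtime_cpregs[i].kvm_idx == kvm_idx) {
            return non_runtime_cpregs[i].level;
        }
    }
    return KVM_PUT_RUNTIME_STATE;
}

/* Hypothetical helper: true if the register is written at this sync level. */
bool should_write(uint64_t kvm_idx, int level)
{
    return cpreg_level(kvm_idx) <= level;
}
```

With the table entry at KVM_PUT_FULL_STATE, the placeholder counter register is skipped for both runtime and reset syncs and written only on a full-state sync, while any unlisted register is written at every level.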

>> The other option here would be to keep the level information
>> in the cpreg structs (which is where we put everything else
>> we know about cpregs); we'd probably need to then initialise
>> some other data structure if we wanted to avoid the hash
>> table lookup for every register in write_list_to_kvmstate.
>>
>> I guess if we expect this list to remain a fairly small
>> set of exceptional cases then this is OK (and vaguely
>> comparable to the existing kvm_arm_reg_syncs_via_cpreg_list
>> handling).
>
> I thought about this too, and sent this as an RFC for exactly this
> reason.  I did it this way initially for two reasons: (1) I don't
> understand the hash-table register initialization flow for aarch64 and
> (2) I could really only identify this single register for now that needs
> to be marked as a non-runtime register, and then this is less invasive.

Yes, it's the "only one register" part that makes it seem
overkill to do it the other way.

>> Don't we need separate 32-bit and 64-bit versions of
>> this list?
>>
> Do we?  I thought this file would compile separately for the 32-bit and
> 64-bit versions and the register index define is the same name for both
> architectures, did I get this wrong?

I think you're right, but it feels a bit fragile to depend on
the fact that the name used by the kernel headers is the same
in both cases, especially since as soon as we wanted to add a
register that only mattered for one of 32/64 we'd need to refactor
to split things into two lists.

> Of course, for other registers with unique-to-32-bit-or-64-bit reg index
> defines, yes, we would need a separate table.  Should they then be
> defined in the kvm32.c and kvm64.c and passed in as a pointer to
> write_list_to_kvmstate()?

I think I would make cpreg_level() be a function defined in
kvm32.c/kvm64.c (as kvm_arm_reg_syncs_via_cpreg_list() is);
you'd need to give it a kvm_arm_ prefix, maybe
kvm_arm_reg_sync_level().
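
Roughly, that split could look like the sketch below. This is an illustration under assumptions, not the final patch: DEMO_REG_TIMER_CNT is a placeholder for the kernel's KVM_REG_ARM_TIMER_CNT, demo_count_writes() is a hypothetical stand-in for the loop in write_list_to_kvmstate(), and the KVM_PUT_* values are assumed to match QEMU's definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed to match QEMU's put levels; higher means written less often. */
#define KVM_PUT_RUNTIME_STATE 1
#define KVM_PUT_RESET_STATE   2
#define KVM_PUT_FULL_STATE    3

/* Placeholder for the kernel's KVM_REG_ARM_TIMER_CNT; not the real value. */
#define DEMO_REG_TIMER_CNT 0x2001ULL

/* Would be declared in kvm_arm.h and defined once in each of
 * kvm32.c and kvm64.c, so each can list its own registers. */
int kvm_arm_reg_sync_level(uint64_t regidx);

/* Per-architecture definition (this copy plays the role of kvm64.c). */
int kvm_arm_reg_sync_level(uint64_t regidx)
{
    switch (regidx) {
    case DEMO_REG_TIMER_CNT:
        return KVM_PUT_FULL_STATE;
    default:
        return KVM_PUT_RUNTIME_STATE;
    }
}

/* Hypothetical stand-in for the generic loop in kvm.c: counts the
 * registers that would be written back at the given level. */
size_t demo_count_writes(const uint64_t *indexes, size_t n, int level)
{
    size_t i, written = 0;

    for (i = 0; i < n; i++) {
        if (kvm_arm_reg_sync_level(indexes[i]) > level) {
            continue; /* register is synced less often than this level */
        }
        written++;
    }
    return written;
}
```

The generic code then needs no knowledge of which register index names exist on which architecture; adding a 32-bit-only or 64-bit-only register touches just one of the two files.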

thanks
-- PMM

Patch

diff --git a/target-arm/kvm.c b/target-arm/kvm.c
index 548bfd7..2e92699 100644
--- a/target-arm/kvm.c
+++ b/target-arm/kvm.c
@@ -409,7 +409,35 @@  bool write_kvmstate_to_list(ARMCPU *cpu)
     return ok;
 }
 
-bool write_list_to_kvmstate(ARMCPU *cpu)
+typedef struct cpreg_state_level {
+    uint64_t kvm_idx;
+    int level;
+} cpreg_state_level;
+
+/* All system registers not listed in the following table are assumed to be
+ * of the level KVM_PUT_RUNTIME_STATE. If a register should be written less
+ * often, you must add it to this table with a state of either
+ * KVM_PUT_RESET_STATE or KVM_PUT_FULL_STATE.
+ */
+cpreg_state_level non_runtime_cpregs[] = {
+    { KVM_REG_ARM_TIMER_CNT, KVM_PUT_FULL_STATE },
+};
+
+static int cpreg_level(uint64_t kvm_idx)
+{
+    int i;
+
+    for (i = 0; i < ARRAY_SIZE(non_runtime_cpregs); i++) {
+        cpreg_state_level *l = &non_runtime_cpregs[i];
+        if (l->kvm_idx == kvm_idx) {
+            return l->level;
+        }
+    }
+
+    return KVM_PUT_RUNTIME_STATE;
+}
+
+bool write_list_to_kvmstate(ARMCPU *cpu, int level)
 {
     CPUState *cs = CPU(cpu);
     int i;
@@ -421,6 +449,10 @@  bool write_list_to_kvmstate(ARMCPU *cpu)
         uint32_t v32;
         int ret;
 
+        if (cpreg_level(regidx) > level) {
+            continue;
+        }
+
         r.id = regidx;
         switch (regidx & KVM_REG_SIZE_MASK) {
         case KVM_REG_SIZE_U32:
diff --git a/target-arm/kvm32.c b/target-arm/kvm32.c
index d7e7d68..9fbd5fd 100644
--- a/target-arm/kvm32.c
+++ b/target-arm/kvm32.c
@@ -367,7 +367,7 @@  int kvm_arch_put_registers(CPUState *cs, int level)
      * managed to update the CPUARMState with, and only allowing those
      * to be written back up into the kernel).
      */
-    if (!write_list_to_kvmstate(cpu)) {
+    if (!write_list_to_kvmstate(cpu, level)) {
         return EINVAL;
     }
 
diff --git a/target-arm/kvm64.c b/target-arm/kvm64.c
index ac34f51..2911679 100644
--- a/target-arm/kvm64.c
+++ b/target-arm/kvm64.c
@@ -280,7 +280,7 @@  int kvm_arch_put_registers(CPUState *cs, int level)
         return ret;
     }
 
-    if (!write_list_to_kvmstate(cpu)) {
+    if (!write_list_to_kvmstate(cpu, level)) {
         return EINVAL;
     }
 
diff --git a/target-arm/kvm_arm.h b/target-arm/kvm_arm.h
index 5abd591..ce03e97 100644
--- a/target-arm/kvm_arm.h
+++ b/target-arm/kvm_arm.h
@@ -71,6 +71,7 @@  bool kvm_arm_reg_syncs_via_cpreg_list(uint64_t regidx);
 /**
  * write_list_to_kvmstate:
  * @cpu: ARMCPU
+ * @level: the state level to sync
  *
  * For each register listed in the ARMCPU cpreg_indexes list, write
  * its value from the cpreg_values list into the kernel (via ioctl).
@@ -83,7 +84,7 @@  bool kvm_arm_reg_syncs_via_cpreg_list(uint64_t regidx);
  * Note that we do not stop early on failure -- we will attempt
  * writing all registers in the list.
  */
-bool write_list_to_kvmstate(ARMCPU *cpu);
+bool write_list_to_kvmstate(ARMCPU *cpu, int level);
 
 /**
  * write_kvmstate_to_list:
diff --git a/target-arm/machine.c b/target-arm/machine.c
index 9eb51df..32adfe7 100644
--- a/target-arm/machine.c
+++ b/target-arm/machine.c
@@ -251,7 +251,7 @@  static int cpu_post_load(void *opaque, int version_id)
     }
 
     if (kvm_enabled()) {
-        if (!write_list_to_kvmstate(cpu)) {
+        if (!write_list_to_kvmstate(cpu, KVM_PUT_FULL_STATE)) {
             return -1;
         }
         /* Note that it's OK for the TCG side not to know about