diff mbox series

[5/5] KVM: Hook kvm_arch_put_registers() errors to the caller

Message ID 20220617144857.34189-6-peterx@redhat.com
State New
Headers show
Series CPU: Detect put cpu register errors for migrations | expand

Commit Message

Peter Xu June 17, 2022, 2:48 p.m. UTC
Leverage the new mechanism to pass over errors to upper stack for
kvm_arch_put_registers() when called for the post_init() accel hook.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 accel/kvm/kvm-all.c  | 13 ++++++++++---
 accel/kvm/kvm-cpus.h |  2 +-
 softmmu/cpus.c       |  5 ++++-
 3 files changed, 15 insertions(+), 5 deletions(-)

Comments

Peter Maydell June 23, 2022, 1:09 p.m. UTC | #1
On Fri, 17 Jun 2022 at 15:53, Peter Xu <peterx@redhat.com> wrote:
>
> Leverage the new mechanism to pass over errors to upper stack for
> kvm_arch_put_registers() when called for the post_init() accel hook.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  accel/kvm/kvm-all.c  | 13 ++++++++++---
>  accel/kvm/kvm-cpus.h |  2 +-
>  softmmu/cpus.c       |  5 ++++-
>  3 files changed, 15 insertions(+), 5 deletions(-)

Checking for errors definitely does seem like the right thing to do.
That said:

(1) Why do we want to check for errors only on the call
for post_init synchronize, and not any of the other places
where we call kvm_arch_put_registers()?

(2) I suspect this will uncover some situations where we've
been happening-to-work because we ignore an error, and now
will start to actively fail. But I guess there's not much
we can do about that except say "we'll fix them as we encounter
bug reports about them". (I know of at least one: on the
Mac M1 running Linux, if the host doesn't have this kernel fix:
https://lore.kernel.org/all/YnHz6Cw5ONR2e+KA@google.com/T/
then the first put_registers will fail (mostly harmlessly).
I think that's the post_init sync but it might be the post_reset
one.)

thanks
-- PMM
Peter Xu June 23, 2022, 4:55 p.m. UTC | #2
On Thu, Jun 23, 2022 at 02:09:43PM +0100, Peter Maydell wrote:
> On Fri, 17 Jun 2022 at 15:53, Peter Xu <peterx@redhat.com> wrote:
> >
> > Leverage the new mechanism to pass over errors to upper stack for
> > kvm_arch_put_registers() when called for the post_init() accel hook.
> >
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  accel/kvm/kvm-all.c  | 13 ++++++++++---
> >  accel/kvm/kvm-cpus.h |  2 +-
> >  softmmu/cpus.c       |  5 ++++-
> >  3 files changed, 15 insertions(+), 5 deletions(-)
> 
> Checking for errors definitely does seem like the right thing to do.
> That said:
> 
> (1) Why do we want to check for errors only on the call
> for post_init synchronize, and not any of the other places
> where we call kvm_arch_put_registers()?

Because I only know that's what we need to keep live migration honest on
being successful, and I didn't want to spread the fire elsewhere, neither
from knowledge nor time..  So I wanted to keep the series simple but
useful.

If we have reasons to cover some of the rest, I can still try.

> 
> (2) I suspect this will uncover some situations where we've
> been happening-to-work because we ignore an error, and now
> will start to actively fail. But I guess there's not much
> we can do about that except say "we'll fix them as we encounter
> bug reports about them". (I know of at least one: on the
> Mac M1 running Linux, if the host doesn't have this kernel fix:
> https://lore.kernel.org/all/YnHz6Cw5ONR2e+KA@google.com/T/
> then the first put_registers will fail (mostly harmlessly).
> I think that's the post_init sync but it might be the post_reset
> one.)

.. from what I read from the commit message in the link, hopefully that was
only about the reset process since that sounds like a mismatched regs
before/after gic created.  When migration completes, I guess we're always
fetching from the after-gic-created case?  But it'll be great if we double
check.  Luckily it seems only for m1.

What my series wanted to achieve is not affect anything else but migration
(so if they fail elsewhere they keep the benign failing).  What I worried
is exactly when we have benign failures on put regs on live migration use
cases, but hopefully not.

Thanks,
diff mbox series

Patch

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index df4f7c98f3..03e29ab1ed 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2771,15 +2771,22 @@  void kvm_cpu_synchronize_post_reset(CPUState *cpu)
     run_on_cpu(cpu, do_kvm_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
 }
 
-static void do_kvm_cpu_synchronize_post_init(CPUState *cpu, run_on_cpu_data arg)
+static void do_kvm_cpu_synchronize_post_init(CPUState *cpu, run_on_cpu_data arg,
+                                             Error **errp)
 {
-    kvm_arch_put_registers(cpu, KVM_PUT_FULL_STATE);
+    int ret = kvm_arch_put_registers(cpu, KVM_PUT_FULL_STATE);
+
+    if (ret) {
+        error_setg(errp, "kvm_arch_put_registers() failed with retval=%d", ret);
+        return;
+    }
+
     cpu->vcpu_dirty = false;
 }
 
 void kvm_cpu_synchronize_post_init(CPUState *cpu, Error **errp)
 {
-    run_on_cpu(cpu, do_kvm_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
+    run_on_cpu2(cpu, do_kvm_cpu_synchronize_post_init, RUN_ON_CPU_NULL, errp);
 }
 
 static void do_kvm_cpu_synchronize_pre_loadvm(CPUState *cpu, run_on_cpu_data arg)
diff --git a/accel/kvm/kvm-cpus.h b/accel/kvm/kvm-cpus.h
index bf0bd1bee4..c9b8262704 100644
--- a/accel/kvm/kvm-cpus.h
+++ b/accel/kvm/kvm-cpus.h
@@ -16,7 +16,7 @@  int kvm_init_vcpu(CPUState *cpu, Error **errp);
 int kvm_cpu_exec(CPUState *cpu);
 void kvm_destroy_vcpu(CPUState *cpu);
 void kvm_cpu_synchronize_post_reset(CPUState *cpu);
-void kvm_cpu_synchronize_post_init(CPUState *cpu);
+void kvm_cpu_synchronize_post_init(CPUState *cpu, Error **errp);
 void kvm_cpu_synchronize_pre_loadvm(CPUState *cpu);
 
 #endif /* KVM_CPUS_H */
diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index 59c70fd496..6c0b5b87f0 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -151,7 +151,10 @@  void cpu_synchronize_all_post_init(Error **errp)
     CPUState *cpu;
 
     CPU_FOREACH(cpu) {
-        cpu_synchronize_post_init(cpu);
+        cpu_synchronize_post_init_full(cpu, errp);
+        if (errp && *errp) {
+            break;
+        }
     }
 }