[v1] cpus: track calls to resume/pause_all_vcpus()

Message ID 20180409130700.5692-1-david@redhat.com
State New

Commit Message

David Hildenbrand April 9, 2018, 1:07 p.m. UTC
If we have parallel calls to resume/pause_all_vcpus() we can get
into trouble because the qemu mutex is temporarily dropped while
waiting for all threads to stop. This can happen e.g. for s390x, where
resume/pause_all_vcpus() can be triggered by a VCPU.

Pause/Resume exactly once, when we leave/hit "0".

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 cpus.c | 31 ++++++++++++++++++++++++-------
 1 file changed, 24 insertions(+), 7 deletions(-)
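
The counting scheme described above ("Pause/Resume exactly once, when we leave/hit 0") can be sketched in isolation. This is only a minimal illustration, not the QEMU code: `pause_depth`, `vcpus_running`, `stop_count`, `start_count`, `demo_pause_all()` and `demo_resume_all()` are hypothetical names, and the real functions additionally take the iothread mutex, toggle the virtual clock and kick the VCPU threads. The sketch also asserts `pause_depth > 0` before decrementing, a slightly stricter check than the one in the patch, so that an unbalanced resume is caught immediately.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for QEMU state; none of this is QEMU API. */
static int pause_depth = 1;        /* starts paused, like vcpus_paused = 1 */
static bool vcpus_running = false; /* what the "hardware" is really doing */
static int stop_count = 0;         /* number of real stop operations */
static int start_count = 0;        /* number of real start operations */

/* Nested pause: only the 0 -> 1 transition actually stops the VCPUs. */
static void demo_pause_all(void)
{
    assert(pause_depth >= 0);
    if (pause_depth++ == 0) {
        vcpus_running = false;
        stop_count++;
    }
}

/* Nested resume: only the 1 -> 0 transition actually restarts the VCPUs. */
static void demo_resume_all(void)
{
    assert(pause_depth > 0);       /* catch an unbalanced resume */
    if (--pause_depth == 0) {
        vcpus_running = true;
        start_count++;
    }
}
```

With this shape, two overlapping pause requests stop the VCPUs once, and the VCPUs only run again after both requesters have called the resume function.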

Comments

Paolo Bonzini April 9, 2018, 1:12 p.m. UTC | #1
On 09/04/2018 15:07, David Hildenbrand wrote:
> If we have parallel calls to resume/pause_all_vcpus() we can get
> into trouble because the qemu mutex is temporarily dropped while
> waiting for all threads to stop. This can happen e.g. for s390x, where
> resume/pause_all_vcpus() can be triggered by a VCPU.

Why does s390 need to do pause_all_vcpus()/resume_all_vcpus() instead of
just asking the main thread to do it (similar to qemu_system_reset), is
it because diag 308 must be synchronous?

One disadvantage of the current approach is that diag 308 does not obey
-no-reboot.

Paolo
David Hildenbrand April 9, 2018, 1:28 p.m. UTC | #2
On 09.04.2018 15:12, Paolo Bonzini wrote:
> On 09/04/2018 15:07, David Hildenbrand wrote:
>> If we have parallel calls to resume/pause_all_vcpus() we can get
>> into trouble because the qemu mutex is temporarily dropped while
>> waiting for all threads to stop. This can happen e.g. for s390x, where
>> resume/pause_all_vcpus() can be triggered by a VCPU.
> 

I'm also using resume/pause_all_vcpus() now in a prototype to
temporarily get all VCPUs out of KVM; that's how I noticed that this is
shaky :)

> Why does s390 need to do pause_all_vcpus()/resume_all_vcpus() instead of
> just asking the main thread to do it (similar to qemu_system_reset), is
> it because diag 308 must be synchronous?

Christian implemented it back then (quoting from another mail):

"I did this to prevent a "still running CPU to restart an already
stopped one"."

The problem is that another VCPU could just be about to send a SIGP
START/RESTART to a VCPU. Without the pause_all_vcpus(), the SIGP could
be delayed and executed just after the "soft reset", therefore resulting
in more than 1 VCPU running.

> 
> One disadvantage of the current approach is that diag 308 does not obey
> -no-reboot.

Both calls are used for kdump+kexec. "kdump on s390 uses a load normal
reset to bring the system in a defined state by doing a subsystem
reset", so like a "soft reboot". I don't think that we want to apply
"-no-reboot" here.

> 
> Paolo
>
David Hildenbrand April 13, 2018, 10:14 a.m. UTC | #3
On 09.04.2018 15:07, David Hildenbrand wrote:
> If we have parallel calls to resume/pause_all_vcpus() we can get
> into trouble because the qemu mutex is temporarily dropped while
> waiting for all threads to stop. This can happen e.g. for s390x, where
> resume/pause_all_vcpus() can be triggered by a VCPU.
> 
> Pause/Resume exactly once, when we leave/hit "0".
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  cpus.c | 31 ++++++++++++++++++++++++-------
>  1 file changed, 24 insertions(+), 7 deletions(-)
> 
> diff --git a/cpus.c b/cpus.c
> index 2e6701795b..7c7e0245c5 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1778,17 +1778,26 @@ static bool all_vcpus_paused(void)
>      return true;
>  }
>  
> +/* wait for the initial vm_start() call */
> +static int vcpus_paused = 1;
> +
>  void pause_all_vcpus(void)
>  {
>      CPUState *cpu;
>  
> -    qemu_clock_enable(QEMU_CLOCK_VIRTUAL, false);
> -    CPU_FOREACH(cpu) {
> -        if (qemu_cpu_is_self(cpu)) {
> -            qemu_cpu_stop(cpu, true);
> -        } else {
> -            cpu->stop = true;
> -            qemu_cpu_kick(cpu);
> +    assert(qemu_mutex_iothread_locked());
> +    assert(vcpus_paused >= 0);
> +
> +    vcpus_paused++;
> +    if (vcpus_paused == 1) {
> +        qemu_clock_enable(QEMU_CLOCK_VIRTUAL, false);
> +        CPU_FOREACH(cpu) {
> +            if (qemu_cpu_is_self(cpu)) {
> +                qemu_cpu_stop(cpu, true);
> +            } else {
> +                cpu->stop = true;
> +                qemu_cpu_kick(cpu);
> +            }
>          }
>      }
>  
> @@ -1820,6 +1829,14 @@ void resume_all_vcpus(void)
>  {
>      CPUState *cpu;
>  
> +    assert(vcpus_paused >= 0);
> +    assert(qemu_mutex_iothread_locked());
> +
> +    vcpus_paused--;
> +    if (vcpus_paused > 0) {
> +        return;
> +    }
> +
>      qemu_clock_enable(QEMU_CLOCK_VIRTUAL, true);
>      CPU_FOREACH(cpu) {
>          cpu_resume(cpu);
> 

So if everything goes well, we have a replacement for s390x and this
patch should no longer be needed.

pause_all_vcpus/resume_all_vcpus should not be called from a VCPU.

.. that implies that I have to find another way to get all CPUs out of KVM
for the prototype I am working on :/

Patch

diff --git a/cpus.c b/cpus.c
index 2e6701795b..7c7e0245c5 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1778,17 +1778,26 @@  static bool all_vcpus_paused(void)
     return true;
 }
 
+/* wait for the initial vm_start() call */
+static int vcpus_paused = 1;
+
 void pause_all_vcpus(void)
 {
     CPUState *cpu;
 
-    qemu_clock_enable(QEMU_CLOCK_VIRTUAL, false);
-    CPU_FOREACH(cpu) {
-        if (qemu_cpu_is_self(cpu)) {
-            qemu_cpu_stop(cpu, true);
-        } else {
-            cpu->stop = true;
-            qemu_cpu_kick(cpu);
+    assert(qemu_mutex_iothread_locked());
+    assert(vcpus_paused >= 0);
+
+    vcpus_paused++;
+    if (vcpus_paused == 1) {
+        qemu_clock_enable(QEMU_CLOCK_VIRTUAL, false);
+        CPU_FOREACH(cpu) {
+            if (qemu_cpu_is_self(cpu)) {
+                qemu_cpu_stop(cpu, true);
+            } else {
+                cpu->stop = true;
+                qemu_cpu_kick(cpu);
+            }
         }
     }
 
@@ -1820,6 +1829,14 @@  void resume_all_vcpus(void)
 {
     CPUState *cpu;
 
+    assert(vcpus_paused >= 0);
+    assert(qemu_mutex_iothread_locked());
+
+    vcpus_paused--;
+    if (vcpus_paused > 0) {
+        return;
+    }
+
     qemu_clock_enable(QEMU_CLOCK_VIRTUAL, true);
     CPU_FOREACH(cpu) {
         cpu_resume(cpu);