diff mbox series

[v1,for-2.11,2/3] s390x/tcg: fix DIAG 308 with > 1 VCPU (single threaded TCG)

Message ID 20171116170526.12643-3-david@redhat.com
State New
Headers show
Series s390x: fixes for SIGP and DIAG 308 | expand

Commit Message

David Hildenbrand Nov. 16, 2017, 5:05 p.m. UTC
Currently, single threaded TCG with > 1 VCPU gets stuck during IPL, when
the bios tries to switch to the loaded kernel via DIAG 308.

pause_all_vcpus()/resume_all_vcpus() should not be called from a VCPU.

And there is also no need for it. run_on_cpu() will make sure that the
CPUs exit KVM/TCG, where they get stopped. Once stopped, they will no
longer run.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/diag.c | 4 ----
 1 file changed, 4 deletions(-)

Comments

Alex Bennée Nov. 16, 2017, 5:37 p.m. UTC | #1
David Hildenbrand <david@redhat.com> writes:

> Currently, single threaded TCG with > 1 VCPU gets stuck during IPL, when
> the bios tries to switch to the loaded kernel via DIAG 308.
>
> pause_all_vcpus()/resume_all_vcpus() should not be called from a VCPU.
>
> And there is also no need for it. run_on_cpu() will make sure that the
> CPUs exit KVM/TCG, where they get stopped. Once stopped, they will no
> longer run.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/diag.c | 4 ----
>  1 file changed, 4 deletions(-)
>
> diff --git a/target/s390x/diag.c b/target/s390x/diag.c
> index dbbb9e886f..52bc348808 100644
> --- a/target/s390x/diag.c
> +++ b/target/s390x/diag.c
> @@ -27,7 +27,6 @@ static int modified_clear_reset(S390CPU *cpu)
>      S390CPUClass *scc = S390_CPU_GET_CLASS(cpu);
>      CPUState *t;
>
> -    pause_all_vcpus();
>      cpu_synchronize_all_states();
>      CPU_FOREACH(t) {
>          run_on_cpu(t, s390_do_cpu_full_reset, RUN_ON_CPU_NULL);

I think you also need to fix the run_on_cpu to be a async_run_on_cpu as
you would otherwise hang waiting for run_on_cpu to finish on a
single-threaded TCG run (as you are in the only vCPU context).

If it is important that the source vCPU doesn't continue you can
schedule it's work afterwards with a async_safe_run_on_cpu which will
complete after all other vCPUs have executed their work.


> @@ -37,7 +36,6 @@ static int modified_clear_reset(S390CPU *cpu)
>      s390_crypto_reset();
>      scc->load_normal(CPU(cpu));
>      cpu_synchronize_all_post_reset();
> -    resume_all_vcpus();
>      return 0;
>  }
>
> @@ -53,7 +51,6 @@ static int load_normal_reset(S390CPU *cpu)
>      S390CPUClass *scc = S390_CPU_GET_CLASS(cpu);
>      CPUState *t;
>
> -    pause_all_vcpus();
>      cpu_synchronize_all_states();
>      CPU_FOREACH(t) {
>          run_on_cpu(t, s390_do_cpu_reset, RUN_ON_CPU_NULL);
> @@ -63,7 +60,6 @@ static int load_normal_reset(S390CPU *cpu)
>      scc->initial_cpu_reset(CPU(cpu));
>      scc->load_normal(CPU(cpu));
>      cpu_synchronize_all_post_reset();
> -    resume_all_vcpus();
>      return 0;
>  }
>
> --
> 2.13.6


--
Alex Bennée
David Hildenbrand Nov. 16, 2017, 5:47 p.m. UTC | #2
On 16.11.2017 18:05, David Hildenbrand wrote:
> Currently, single threaded TCG with > 1 VCPU gets stuck during IPL, when
> the bios tries to switch to the loaded kernel via DIAG 308.
> 
> pause_all_vcpus()/resume_all_vcpus() should not be called from a VCPU.
> 
> And there is also no need for it. run_on_cpu() will make sure that the
> CPUs exit KVM/TCG, where they get stopped. Once stopped, they will no
> longer run.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/diag.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/target/s390x/diag.c b/target/s390x/diag.c
> index dbbb9e886f..52bc348808 100644
> --- a/target/s390x/diag.c
> +++ b/target/s390x/diag.c
> @@ -27,7 +27,6 @@ static int modified_clear_reset(S390CPU *cpu)
>      S390CPUClass *scc = S390_CPU_GET_CLASS(cpu);
>      CPUState *t;
>  
> -    pause_all_vcpus();
>      cpu_synchronize_all_states();

Hmm, thinking about this whole synchronization thingy, I guess it could
happen that a VCPU starts running again after the
cpu_synchronize_all_states() but before the reset. So we should do that
in one shot. Resulting in something like the following. Comments?



From 14440f775e65bb88c1a95705fc9b438f4022f332 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@redhat.com>
Date: Wed, 15 Nov 2017 17:30:02 +0100
Subject: [PATCH v1] s390x/tcg: fix DIAG 308 with > 1 VCPU (single threaded
 TCG)

Currently, single threaded TCG with > 1 VCPU gets stuck during IPL, when
the bios tries to switch to the loaded kernel via DIAG 308.

pause_all_vcpus()/resume_all_vcpus() should not be called from a VCPU.

And there is also no need for it. run_on_cpu() will make sure that the
CPUs exits KVM/TCG, where they get stopped. Once stopped, they will no
longer run. As we have to proper synchronize before and after the reset,
and have to hinder the VCPU from going back into TCG/KVM execution
after the sync, let's do the synchronization in the same shot.

Optimize so we don't do one additional round of synchronization on the
running CPU.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/diag.c | 42 ++++++++++++++++++++++++++++--------------
 1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/target/s390x/diag.c b/target/s390x/diag.c
index dbbb9e886f..8396124a5f 100644
--- a/target/s390x/diag.c
+++ b/target/s390x/diag.c
@@ -19,51 +19,65 @@
 #include "exec/exec-all.h"
 #include "hw/watchdog/wdt_diag288.h"
 #include "sysemu/cpus.h"
+#include "sysemu/hw_accel.h"
 #include "hw/s390x/ipl.h"
 #include "hw/s390x/s390-virtio-ccw.h"

+static void s390_do_sync_cpu_full_reset(CPUState *cs, run_on_cpu_data arg)
+{
+    cpu_synchronize_state(cs);
+    cpu_reset(cs);
+    cpu_synchronize_post_reset(cs);
+}
+
 static int modified_clear_reset(S390CPU *cpu)
 {
     S390CPUClass *scc = S390_CPU_GET_CLASS(cpu);
+    CPUState *cs = CPU(cpu);
     CPUState *t;

-    pause_all_vcpus();
-    cpu_synchronize_all_states();
     CPU_FOREACH(t) {
-        run_on_cpu(t, s390_do_cpu_full_reset, RUN_ON_CPU_NULL);
+        if (t != cs) {
+            run_on_cpu(t, s390_do_sync_cpu_full_reset, RUN_ON_CPU_NULL);
+        }
     }
     s390_cmma_reset();
     subsystem_reset();
     s390_crypto_reset();
-    scc->load_normal(CPU(cpu));
-    cpu_synchronize_all_post_reset();
-    resume_all_vcpus();
+    cpu_synchronize_state(cs);
+    cpu_reset(cs);
+    scc->load_normal(cs);
+    cpu_synchronize_post_reset(cs);
     return 0;
 }

-static inline void s390_do_cpu_reset(CPUState *cs, run_on_cpu_data arg)
+static void s390_do_sync_cpu_reset(CPUState *cs, run_on_cpu_data arg)
 {
     S390CPUClass *scc = S390_CPU_GET_CLASS(cs);

+    cpu_synchronize_state(cs);
     scc->cpu_reset(cs);
+    cpu_synchronize_post_reset(cs);
 }

 static int load_normal_reset(S390CPU *cpu)
 {
     S390CPUClass *scc = S390_CPU_GET_CLASS(cpu);
+    CPUState *cs = CPU(cpu);
     CPUState *t;

-    pause_all_vcpus();
-    cpu_synchronize_all_states();
     CPU_FOREACH(t) {
-        run_on_cpu(t, s390_do_cpu_reset, RUN_ON_CPU_NULL);
+        if (t != cs) {
+            run_on_cpu(t, s390_do_sync_cpu_reset, RUN_ON_CPU_NULL);
+        }
     }
     s390_cmma_reset();
     subsystem_reset();
-    scc->initial_cpu_reset(CPU(cpu));
-    scc->load_normal(CPU(cpu));
-    cpu_synchronize_all_post_reset();
-    resume_all_vcpus();
+    cpu_synchronize_state(cs);
+    /* scc->initital_cpu_reset() is a superset of scc->cpu_reset() */
+    scc->initial_cpu_reset(cs);
+    scc->load_normal(cs);
+    cpu_synchronize_post_reset(cs);
     return 0;
 }
David Hildenbrand Nov. 16, 2017, 5:52 p.m. UTC | #3
On 16.11.2017 18:37, Alex Bennée wrote:
> 
> David Hildenbrand <david@redhat.com> writes:
> 
>> Currently, single threaded TCG with > 1 VCPU gets stuck during IPL, when
>> the bios tries to switch to the loaded kernel via DIAG 308.
>>
>> pause_all_vcpus()/resume_all_vcpus() should not be called from a VCPU.
>>
>> And there is also no need for it. run_on_cpu() will make sure that the
>> CPUs exit KVM/TCG, where they get stopped. Once stopped, they will no
>> longer run.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  target/s390x/diag.c | 4 ----
>>  1 file changed, 4 deletions(-)
>>
>> diff --git a/target/s390x/diag.c b/target/s390x/diag.c
>> index dbbb9e886f..52bc348808 100644
>> --- a/target/s390x/diag.c
>> +++ b/target/s390x/diag.c
>> @@ -27,7 +27,6 @@ static int modified_clear_reset(S390CPU *cpu)
>>      S390CPUClass *scc = S390_CPU_GET_CLASS(cpu);
>>      CPUState *t;
>>
>> -    pause_all_vcpus();
>>      cpu_synchronize_all_states();
>>      CPU_FOREACH(t) {
>>          run_on_cpu(t, s390_do_cpu_full_reset, RUN_ON_CPU_NULL);
> 
> I think you also need to fix the run_on_cpu to be a async_run_on_cpu as
> you would otherwise hang waiting for run_on_cpu to finish on a
> single-threaded TCG run (as you are in the only vCPU context).

No, it works just fine for single threaded TCG. run_on_cpu() can deal
with single threaded TCG just fine. (otherwise e.g. SIGP code also
wouldn't work)

In do_run_on_cpu, the following code always directly triggers for single
threaded tcg:

    if (qemu_cpu_is_self(cpu)) {

        func(cpu, data);

        return;

    }

> 
> If it is important that the source vCPU doesn't continue you can
> schedule it's work afterwards with a async_safe_run_on_cpu which will
> complete after all other vCPUs have executed their work.
> 

It is important. Introducing async helpers at a point where sync is
needed sounds strange.

Thanks!

>>
>> --
>> 2.13.6
> 
> 
> --
> Alex Bennée
>
Alex Bennée Nov. 16, 2017, 6:12 p.m. UTC | #4
David Hildenbrand <david@redhat.com> writes:

> On 16.11.2017 18:37, Alex Bennée wrote:
>>
>> David Hildenbrand <david@redhat.com> writes:
>>
>>> Currently, single threaded TCG with > 1 VCPU gets stuck during IPL, when
>>> the bios tries to switch to the loaded kernel via DIAG 308.
>>>
>>> pause_all_vcpus()/resume_all_vcpus() should not be called from a VCPU.
>>>
>>> And there is also no need for it. run_on_cpu() will make sure that the
>>> CPUs exit KVM/TCG, where they get stopped. Once stopped, they will no
>>> longer run.
>>>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>>  target/s390x/diag.c | 4 ----
>>>  1 file changed, 4 deletions(-)
>>>
>>> diff --git a/target/s390x/diag.c b/target/s390x/diag.c
>>> index dbbb9e886f..52bc348808 100644
>>> --- a/target/s390x/diag.c
>>> +++ b/target/s390x/diag.c
>>> @@ -27,7 +27,6 @@ static int modified_clear_reset(S390CPU *cpu)
>>>      S390CPUClass *scc = S390_CPU_GET_CLASS(cpu);
>>>      CPUState *t;
>>>
>>> -    pause_all_vcpus();
>>>      cpu_synchronize_all_states();
>>>      CPU_FOREACH(t) {
>>>          run_on_cpu(t, s390_do_cpu_full_reset, RUN_ON_CPU_NULL);
>>
>> I think you also need to fix the run_on_cpu to be a async_run_on_cpu as
>> you would otherwise hang waiting for run_on_cpu to finish on a
>> single-threaded TCG run (as you are in the only vCPU context).
>
> No, it works just fine for single threaded TCG. run_on_cpu() can deal
> with single threaded TCG just fine. (otherwise e.g. SIGP code also
> wouldn't work)
>
> In do_run_on_cpu, the following code always directly triggers for single
> threaded tcg:
>
>     if (qemu_cpu_is_self(cpu)) {
>
>         func(cpu, data);
>
>         return;
>
>     }

For -smp 1 it's fine, but have you tested --accel thread=single -smp 2?

>
>>
>> If it is important that the source vCPU doesn't continue you can
>> schedule it's work afterwards with a async_safe_run_on_cpu which will
>> complete after all other vCPUs have executed their work.
>>
>
> It is important. Introducing async helpers at a point where sync is
> needed sounds strange.

Well the helper that schedulules the final async helper also needs to
exit the run loop at that point, otherwise you are right it would
attempt to execute a few more instructions in the block.

>
> Thanks!
>
>>>
>>> --
>>> 2.13.6
>>
>>
>> --
>> Alex Bennée
>>
>
>
> --
>
> Thanks,
>
> David / dhildenb


--
Alex Bennée
Alex Bennée Nov. 16, 2017, 6:14 p.m. UTC | #5
David Hildenbrand <david@redhat.com> writes:

> On 16.11.2017 18:37, Alex Bennée wrote:
>>
>> David Hildenbrand <david@redhat.com> writes:
>>
>>> Currently, single threaded TCG with > 1 VCPU gets stuck during IPL, when
>>> the bios tries to switch to the loaded kernel via DIAG 308.
>>>
>>> pause_all_vcpus()/resume_all_vcpus() should not be called from a VCPU.
>>>
>>> And there is also no need for it. run_on_cpu() will make sure that the
>>> CPUs exit KVM/TCG, where they get stopped. Once stopped, they will no
>>> longer run.
>>>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>>  target/s390x/diag.c | 4 ----
>>>  1 file changed, 4 deletions(-)
>>>
>>> diff --git a/target/s390x/diag.c b/target/s390x/diag.c
>>> index dbbb9e886f..52bc348808 100644
>>> --- a/target/s390x/diag.c
>>> +++ b/target/s390x/diag.c
>>> @@ -27,7 +27,6 @@ static int modified_clear_reset(S390CPU *cpu)
>>>      S390CPUClass *scc = S390_CPU_GET_CLASS(cpu);
>>>      CPUState *t;
>>>
>>> -    pause_all_vcpus();
>>>      cpu_synchronize_all_states();
>>>      CPU_FOREACH(t) {
>>>          run_on_cpu(t, s390_do_cpu_full_reset, RUN_ON_CPU_NULL);
>>
>> I think you also need to fix the run_on_cpu to be a async_run_on_cpu as
>> you would otherwise hang waiting for run_on_cpu to finish on a
>> single-threaded TCG run (as you are in the only vCPU context).
>
> No, it works just fine for single threaded TCG. run_on_cpu() can deal
> with single threaded TCG just fine. (otherwise e.g. SIGP code also
> wouldn't work)
>
> In do_run_on_cpu, the following code always directly triggers for single
> threaded tcg:
>
>     if (qemu_cpu_is_self(cpu)) {
>
>         func(cpu, data);
>
>         return;
>
>     }

Sorry ignore what I said, qemu_cpu_is_self tests the underlying thread
not the vCPU which they all share.


--
Alex Bennée
David Hildenbrand Nov. 16, 2017, 6:24 p.m. UTC | #6
On 16.11.2017 19:12, Alex Bennée wrote:
> 
> David Hildenbrand <david@redhat.com> writes:
> 
>> On 16.11.2017 18:37, Alex Bennée wrote:
>>>
>>> David Hildenbrand <david@redhat.com> writes:
>>>
>>>> Currently, single threaded TCG with > 1 VCPU gets stuck during IPL, when
>>>> the bios tries to switch to the loaded kernel via DIAG 308.
>>>>
>>>> pause_all_vcpus()/resume_all_vcpus() should not be called from a VCPU.
>>>>
>>>> And there is also no need for it. run_on_cpu() will make sure that the
>>>> CPUs exit KVM/TCG, where they get stopped. Once stopped, they will no
>>>> longer run.
>>>>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>>  target/s390x/diag.c | 4 ----
>>>>  1 file changed, 4 deletions(-)
>>>>
>>>> diff --git a/target/s390x/diag.c b/target/s390x/diag.c
>>>> index dbbb9e886f..52bc348808 100644
>>>> --- a/target/s390x/diag.c
>>>> +++ b/target/s390x/diag.c
>>>> @@ -27,7 +27,6 @@ static int modified_clear_reset(S390CPU *cpu)
>>>>      S390CPUClass *scc = S390_CPU_GET_CLASS(cpu);
>>>>      CPUState *t;
>>>>
>>>> -    pause_all_vcpus();
>>>>      cpu_synchronize_all_states();
>>>>      CPU_FOREACH(t) {
>>>>          run_on_cpu(t, s390_do_cpu_full_reset, RUN_ON_CPU_NULL);
>>>
>>> I think you also need to fix the run_on_cpu to be a async_run_on_cpu as
>>> you would otherwise hang waiting for run_on_cpu to finish on a
>>> single-threaded TCG run (as you are in the only vCPU context).
>>
>> No, it works just fine for single threaded TCG. run_on_cpu() can deal
>> with single threaded TCG just fine. (otherwise e.g. SIGP code also
>> wouldn't work)
>>
>> In do_run_on_cpu, the following code always directly triggers for single
>> threaded tcg:
>>
>>     if (qemu_cpu_is_self(cpu)) {
>>
>>         func(cpu, data);
>>
>>         return;
>>
>>     }
> 
> For -smp 1 it's fine, but have you tested --accel thread=single -smp 2?

Yes, otherwise I would never have noticed this bug ;)

I use for my current single threaded setup:
"--accel thread=single -smp 4"

The point is: if run_on_cpu() would not be able to cope with this very
simple problem, it would basically be useless.

instead of scheduling work, it simply executes all these functions
directly for single threaded TCG. For multi threaded TCG it actually
schedules work, that's why I notice the missing iolock (see patch
following this one)

> 
>>
>>>
>>> If it is important that the source vCPU doesn't continue you can
>>> schedule it's work afterwards with a async_safe_run_on_cpu which will
>>> complete after all other vCPUs have executed their work.
>>>
>>
>> It is important. Introducing async helpers at a point where sync is
>> needed sounds strange.
> 
> Well the helper that schedulules the final async helper also needs to
> exit the run loop at that point, otherwise you are right it would
> attempt to execute a few more instructions in the block.
> 

And the nice thing about run_on_cpu here is for single threaded TCG that
not a single work has to be scheduled.
Christian Borntraeger Nov. 16, 2017, 8:57 p.m. UTC | #7
Please change the subject. In busy times I tend to ignore tcg patches.
This code is clearly kvm and tcg.
So what about "s390x/diag:" as prefix?

On 11/16/2017 06:05 PM, David Hildenbrand wrote:
> Currently, single threaded TCG with > 1 VCPU gets stuck during IPL, when
> the bios tries to switch to the loaded kernel via DIAG 308.
> 
> pause_all_vcpus()/resume_all_vcpus() should not be called from a VCPU.
> 
> And there is also no need for it. run_on_cpu() will make sure that the
> CPUs exit KVM/TCG, where they get stopped. Once stopped, they will no
> longer run.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/diag.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/target/s390x/diag.c b/target/s390x/diag.c
> index dbbb9e886f..52bc348808 100644
> --- a/target/s390x/diag.c
> +++ b/target/s390x/diag.c
> @@ -27,7 +27,6 @@ static int modified_clear_reset(S390CPU *cpu)
>      S390CPUClass *scc = S390_CPU_GET_CLASS(cpu);
>      CPUState *t;
> 
> -    pause_all_vcpus();


I did this to prevent a "still running CPU to restart an already stopped one".
Are we sure that this can not happen?


>      cpu_synchronize_all_states();
>      CPU_FOREACH(t) {
>          run_on_cpu(t, s390_do_cpu_full_reset, RUN_ON_CPU_NULL);
> @@ -37,7 +36,6 @@ static int modified_clear_reset(S390CPU *cpu)
>      s390_crypto_reset();
>      scc->load_normal(CPU(cpu));
>      cpu_synchronize_all_post_reset();
> -    resume_all_vcpus();
>      return 0;
>  }
> 
> @@ -53,7 +51,6 @@ static int load_normal_reset(S390CPU *cpu)
>      S390CPUClass *scc = S390_CPU_GET_CLASS(cpu);
>      CPUState *t;
> 
> -    pause_all_vcpus();
>      cpu_synchronize_all_states();
>      CPU_FOREACH(t) {
>          run_on_cpu(t, s390_do_cpu_reset, RUN_ON_CPU_NULL);
> @@ -63,7 +60,6 @@ static int load_normal_reset(S390CPU *cpu)
>      scc->initial_cpu_reset(CPU(cpu));
>      scc->load_normal(CPU(cpu));
>      cpu_synchronize_all_post_reset();
> -    resume_all_vcpus();
>      return 0;
>  }
>
David Hildenbrand Nov. 16, 2017, 9:42 p.m. UTC | #8
On 16.11.2017 21:57, Christian Borntraeger wrote:
> Please change the subject. In busy times I tend to ignore tcg patches.
> This code is clearly kvm and tcg.
> So what about "s390x/diag:" as prefix?

Right, it was a fix for TCG, that's why I added TCG only. But it should
have been purely s390x or s390x/diag.

> 
> On 11/16/2017 06:05 PM, David Hildenbrand wrote:
>> Currently, single threaded TCG with > 1 VCPU gets stuck during IPL, when
>> the bios tries to switch to the loaded kernel via DIAG 308.
>>
>> pause_all_vcpus()/resume_all_vcpus() should not be called from a VCPU.
>>
>> And there is also no need for it. run_on_cpu() will make sure that the
>> CPUs exit KVM/TCG, where they get stopped. Once stopped, they will no
>> longer run.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  target/s390x/diag.c | 4 ----
>>  1 file changed, 4 deletions(-)
>>
>> diff --git a/target/s390x/diag.c b/target/s390x/diag.c
>> index dbbb9e886f..52bc348808 100644
>> --- a/target/s390x/diag.c
>> +++ b/target/s390x/diag.c
>> @@ -27,7 +27,6 @@ static int modified_clear_reset(S390CPU *cpu)
>>      S390CPUClass *scc = S390_CPU_GET_CLASS(cpu);
>>      CPUState *t;
>>
>> -    pause_all_vcpus();
> 
> 
> I did this to prevent a "still running CPU to restart an already stopped one".
> Are we sure that this can not happen?

Interesting question. I thought it would not be a problem but the way
locking with run_on_cpu() is handled is tricky. Now I am worried about a
couple of deadlocks.

pause_all_vcpus() actually gives up the qemu_global_mutex, so anybody
waiting for that mutex can continue.

We have a deadlock if two CPUs at the same time would call
pause_all_vcpus(). E.g. two CPUs executing at the same time a DIAG 308
(unlikely).

run_on_cpu temporarily gives up the qemu_global_mutex. If two CPUs call
a run_on_cpu at the same time against each other, we might also have a
deadlock.

Two CPUs simultaneously trying to send a SIGP START/STOP/RESTART cannot
run into a deadlock as they are protected by the SIGP mutex with a trylock.

So even with pause_all_vcpus() I think we have a problem if another CPU
sends a SIGP to the issuing DIAG CPU and expects the run_on_cpu to
trigger. Deadlock, but unlikely I assume?

Let's defer this patch for now, booting with 1 VCPU is not degraded and
it looked easier than it is.

Maybe fixing pause_all_vcpus() to work with more than one CPU in single
threaded TCG is an (easier) alternative and at least keeps the (tested)
state. 2.12 material.
diff mbox series

Patch

diff --git a/target/s390x/diag.c b/target/s390x/diag.c
index dbbb9e886f..52bc348808 100644
--- a/target/s390x/diag.c
+++ b/target/s390x/diag.c
@@ -27,7 +27,6 @@  static int modified_clear_reset(S390CPU *cpu)
     S390CPUClass *scc = S390_CPU_GET_CLASS(cpu);
     CPUState *t;
 
-    pause_all_vcpus();
     cpu_synchronize_all_states();
     CPU_FOREACH(t) {
         run_on_cpu(t, s390_do_cpu_full_reset, RUN_ON_CPU_NULL);
@@ -37,7 +36,6 @@  static int modified_clear_reset(S390CPU *cpu)
     s390_crypto_reset();
     scc->load_normal(CPU(cpu));
     cpu_synchronize_all_post_reset();
-    resume_all_vcpus();
     return 0;
 }
 
@@ -53,7 +51,6 @@  static int load_normal_reset(S390CPU *cpu)
     S390CPUClass *scc = S390_CPU_GET_CLASS(cpu);
     CPUState *t;
 
-    pause_all_vcpus();
     cpu_synchronize_all_states();
     CPU_FOREACH(t) {
         run_on_cpu(t, s390_do_cpu_reset, RUN_ON_CPU_NULL);
@@ -63,7 +60,6 @@  static int load_normal_reset(S390CPU *cpu)
     scc->initial_cpu_reset(CPU(cpu));
     scc->load_normal(CPU(cpu));
     cpu_synchronize_all_post_reset();
-    resume_all_vcpus();
     return 0;
 }