Message ID | 20171116170526.12643-3-david@redhat.com |
---|---|
State | New |
Headers | show |
Series | s390x: fixes for SIGP and DIAG 308 | expand |
David Hildenbrand <david@redhat.com> writes: > Currently, single threaded TCG with > 1 VCPU gets stuck during IPL, when > the bios tries to switch to the loaded kernel via DIAG 308. > > pause_all_vcpus()/resume_all_vcpus() should not be called from a VCPU. > > And there is also no need for it. run_on_cpu() will make sure that the > CPUs exit KVM/TCG, where they get stopped. Once stopped, they will no > longer run. > > Signed-off-by: David Hildenbrand <david@redhat.com> > --- > target/s390x/diag.c | 4 ---- > 1 file changed, 4 deletions(-) > > diff --git a/target/s390x/diag.c b/target/s390x/diag.c > index dbbb9e886f..52bc348808 100644 > --- a/target/s390x/diag.c > +++ b/target/s390x/diag.c > @@ -27,7 +27,6 @@ static int modified_clear_reset(S390CPU *cpu) > S390CPUClass *scc = S390_CPU_GET_CLASS(cpu); > CPUState *t; > > - pause_all_vcpus(); > cpu_synchronize_all_states(); > CPU_FOREACH(t) { > run_on_cpu(t, s390_do_cpu_full_reset, RUN_ON_CPU_NULL); I think you also need to fix the run_on_cpu to be a async_run_on_cpu as you would otherwise hang waiting for run_on_cpu to finish on a single-threaded TCG run (as you are in the only vCPU context). If it is important that the source vCPU doesn't continue you can schedule it's work afterwards with a async_safe_run_on_cpu which will complete after all other vCPUs have executed their work. > @@ -37,7 +36,6 @@ static int modified_clear_reset(S390CPU *cpu) > s390_crypto_reset(); > scc->load_normal(CPU(cpu)); > cpu_synchronize_all_post_reset(); > - resume_all_vcpus(); > return 0; > } > > @@ -53,7 +51,6 @@ static int load_normal_reset(S390CPU *cpu) > S390CPUClass *scc = S390_CPU_GET_CLASS(cpu); > CPUState *t; > > - pause_all_vcpus(); > cpu_synchronize_all_states(); > CPU_FOREACH(t) { > run_on_cpu(t, s390_do_cpu_reset, RUN_ON_CPU_NULL); > @@ -63,7 +60,6 @@ static int load_normal_reset(S390CPU *cpu) > scc->initial_cpu_reset(CPU(cpu)); > scc->load_normal(CPU(cpu)); > cpu_synchronize_all_post_reset(); > - resume_all_vcpus(); > return 0; > } > > -- > 2.13.6 -- Alex Bennée
On 16.11.2017 18:05, David Hildenbrand wrote: > Currently, single threaded TCG with > 1 VCPU gets stuck during IPL, when > the bios tries to switch to the loaded kernel via DIAG 308. > > pause_all_vcpus()/resume_all_vcpus() should not be called from a VCPU. > > And there is also no need for it. run_on_cpu() will make sure that the > CPUs exit KVM/TCG, where they get stopped. Once stopped, they will no > longer run. > > Signed-off-by: David Hildenbrand <david@redhat.com> > --- > target/s390x/diag.c | 4 ---- > 1 file changed, 4 deletions(-) > > diff --git a/target/s390x/diag.c b/target/s390x/diag.c > index dbbb9e886f..52bc348808 100644 > --- a/target/s390x/diag.c > +++ b/target/s390x/diag.c > @@ -27,7 +27,6 @@ static int modified_clear_reset(S390CPU *cpu) > S390CPUClass *scc = S390_CPU_GET_CLASS(cpu); > CPUState *t; > > - pause_all_vcpus(); > cpu_synchronize_all_states(); Hmm, thinking about this whole synchronization thingy, I guess it could happen that a VCPU starts running again after the cpu_synchronize_all_states() but before the reset. So we should do that in one shot. Resulting in something like the following. Comments? From 14440f775e65bb88c1a95705fc9b438f4022f332 Mon Sep 17 00:00:00 2001 From: David Hildenbrand <david@redhat.com> Date: Wed, 15 Nov 2017 17:30:02 +0100 Subject: [PATCH v1] s390x/tcg: fix DIAG 308 with > 1 VCPU (single threaded TCG) Currently, single threaded TCG with > 1 VCPU gets stuck during IPL, when the bios tries to switch to the loaded kernel via DIAG 308. pause_all_vcpus()/resume_all_vcpus() should not be called from a VCPU. And there is also no need for it. run_on_cpu() will make sure that the CPUs exits KVM/TCG, where they get stopped. Once stopped, they will no longer run. As we have to proper synchronize before and after the reset, and have to hinder the VCPU from going back into TCG/KVM execution after the sync, let's do the synchronization in the same shot. Optimize so we don't do one additional round of synchronization on the running CPU. Signed-off-by: David Hildenbrand <david@redhat.com> --- target/s390x/diag.c | 42 ++++++++++++++++++++++++++++-------------- 1 file changed, 28 insertions(+), 14 deletions(-) diff --git a/target/s390x/diag.c b/target/s390x/diag.c index dbbb9e886f..8396124a5f 100644 --- a/target/s390x/diag.c +++ b/target/s390x/diag.c @@ -19,51 +19,65 @@ #include "exec/exec-all.h" #include "hw/watchdog/wdt_diag288.h" #include "sysemu/cpus.h" +#include "sysemu/hw_accel.h" #include "hw/s390x/ipl.h" #include "hw/s390x/s390-virtio-ccw.h" +static void s390_do_sync_cpu_full_reset(CPUState *cs, run_on_cpu_data arg) +{ + cpu_synchronize_state(cs); + cpu_reset(cs); + cpu_synchronize_post_reset(cs); +} + static int modified_clear_reset(S390CPU *cpu) { S390CPUClass *scc = S390_CPU_GET_CLASS(cpu); + CPUState *cs = CPU(cpu); CPUState *t; - pause_all_vcpus(); - cpu_synchronize_all_states(); CPU_FOREACH(t) { - run_on_cpu(t, s390_do_cpu_full_reset, RUN_ON_CPU_NULL); + if (t != cs) { + run_on_cpu(t, s390_do_sync_cpu_full_reset, RUN_ON_CPU_NULL); + } } s390_cmma_reset(); subsystem_reset(); s390_crypto_reset(); - scc->load_normal(CPU(cpu)); - cpu_synchronize_all_post_reset(); - resume_all_vcpus(); + cpu_synchronize_state(cs); + cpu_reset(cs); + scc->load_normal(cs); + cpu_synchronize_post_reset(cs); return 0; } -static inline void s390_do_cpu_reset(CPUState *cs, run_on_cpu_data arg) +static void s390_do_sync_cpu_reset(CPUState *cs, run_on_cpu_data arg) { S390CPUClass *scc = S390_CPU_GET_CLASS(cs); + cpu_synchronize_state(cs); scc->cpu_reset(cs); + cpu_synchronize_post_reset(cs); } static int load_normal_reset(S390CPU *cpu) { S390CPUClass *scc = S390_CPU_GET_CLASS(cpu); + CPUState *cs = CPU(cpu); CPUState *t; - pause_all_vcpus(); - cpu_synchronize_all_states(); CPU_FOREACH(t) { - run_on_cpu(t, s390_do_cpu_reset, RUN_ON_CPU_NULL); + if (t != cs) { + run_on_cpu(t, s390_do_sync_cpu_reset, RUN_ON_CPU_NULL); + } } s390_cmma_reset(); subsystem_reset(); - scc->initial_cpu_reset(CPU(cpu)); - scc->load_normal(CPU(cpu)); - cpu_synchronize_all_post_reset(); - resume_all_vcpus(); + cpu_synchronize_state(cs); + /* scc->initital_cpu_reset() is a superset of scc->cpu_reset() */ + scc->initial_cpu_reset(cs); + scc->load_normal(cs); + cpu_synchronize_post_reset(cs); return 0; }
On 16.11.2017 18:37, Alex Bennée wrote: > > David Hildenbrand <david@redhat.com> writes: > >> Currently, single threaded TCG with > 1 VCPU gets stuck during IPL, when >> the bios tries to switch to the loaded kernel via DIAG 308. >> >> pause_all_vcpus()/resume_all_vcpus() should not be called from a VCPU. >> >> And there is also no need for it. run_on_cpu() will make sure that the >> CPUs exit KVM/TCG, where they get stopped. Once stopped, they will no >> longer run. >> >> Signed-off-by: David Hildenbrand <david@redhat.com> >> --- >> target/s390x/diag.c | 4 ---- >> 1 file changed, 4 deletions(-) >> >> diff --git a/target/s390x/diag.c b/target/s390x/diag.c >> index dbbb9e886f..52bc348808 100644 >> --- a/target/s390x/diag.c >> +++ b/target/s390x/diag.c >> @@ -27,7 +27,6 @@ static int modified_clear_reset(S390CPU *cpu) >> S390CPUClass *scc = S390_CPU_GET_CLASS(cpu); >> CPUState *t; >> >> - pause_all_vcpus(); >> cpu_synchronize_all_states(); >> CPU_FOREACH(t) { >> run_on_cpu(t, s390_do_cpu_full_reset, RUN_ON_CPU_NULL); > > I think you also need to fix the run_on_cpu to be a async_run_on_cpu as > you would otherwise hang waiting for run_on_cpu to finish on a > single-threaded TCG run (as you are in the only vCPU context). No, it works just fine for single threaded TCG. run_on_cpu() can deal with single threaded TCG just fine. (otherwise e.g. SIGP code also wouldn't work) In do_run_on_cpu, the following code always directly triggers for single threaded tcg: if (qemu_cpu_is_self(cpu)) { func(cpu, data); return; } > > If it is important that the source vCPU doesn't continue you can > schedule it's work afterwards with a async_safe_run_on_cpu which will > complete after all other vCPUs have executed their work. > It is important. Introducing async helpers at a point where sync is needed sounds strange. Thanks! >> >> -- >> 2.13.6 > > > -- > Alex Bennée >
David Hildenbrand <david@redhat.com> writes: > On 16.11.2017 18:37, Alex Bennée wrote: >> >> David Hildenbrand <david@redhat.com> writes: >> >>> Currently, single threaded TCG with > 1 VCPU gets stuck during IPL, when >>> the bios tries to switch to the loaded kernel via DIAG 308. >>> >>> pause_all_vcpus()/resume_all_vcpus() should not be called from a VCPU. >>> >>> And there is also no need for it. run_on_cpu() will make sure that the >>> CPUs exit KVM/TCG, where they get stopped. Once stopped, they will no >>> longer run. >>> >>> Signed-off-by: David Hildenbrand <david@redhat.com> >>> --- >>> target/s390x/diag.c | 4 ---- >>> 1 file changed, 4 deletions(-) >>> >>> diff --git a/target/s390x/diag.c b/target/s390x/diag.c >>> index dbbb9e886f..52bc348808 100644 >>> --- a/target/s390x/diag.c >>> +++ b/target/s390x/diag.c >>> @@ -27,7 +27,6 @@ static int modified_clear_reset(S390CPU *cpu) >>> S390CPUClass *scc = S390_CPU_GET_CLASS(cpu); >>> CPUState *t; >>> >>> - pause_all_vcpus(); >>> cpu_synchronize_all_states(); >>> CPU_FOREACH(t) { >>> run_on_cpu(t, s390_do_cpu_full_reset, RUN_ON_CPU_NULL); >> >> I think you also need to fix the run_on_cpu to be a async_run_on_cpu as >> you would otherwise hang waiting for run_on_cpu to finish on a >> single-threaded TCG run (as you are in the only vCPU context). > > No, it works just fine for single threaded TCG. run_on_cpu() can deal > with single threaded TCG just fine. (otherwise e.g. SIGP code also > wouldn't work) > > In do_run_on_cpu, the following code always directly triggers for single > threaded tcg: > > if (qemu_cpu_is_self(cpu)) { > > func(cpu, data); > > return; > > } For -smp 1 it's fine, but have you tested --accel thread=single -smp 2? > >> >> If it is important that the source vCPU doesn't continue you can >> schedule it's work afterwards with a async_safe_run_on_cpu which will >> complete after all other vCPUs have executed their work. >> > > It is important. Introducing async helpers at a point where sync is > needed sounds strange. Well the helper that schedulules the final async helper also needs to exit the run loop at that point, otherwise you are right it would attempt to execute a few more instructions in the block. > > Thanks! > >>> >>> -- >>> 2.13.6 >> >> >> -- >> Alex Bennée >> > > > -- > > Thanks, > > David / dhildenb -- Alex Bennée
David Hildenbrand <david@redhat.com> writes: > On 16.11.2017 18:37, Alex Bennée wrote: >> >> David Hildenbrand <david@redhat.com> writes: >> >>> Currently, single threaded TCG with > 1 VCPU gets stuck during IPL, when >>> the bios tries to switch to the loaded kernel via DIAG 308. >>> >>> pause_all_vcpus()/resume_all_vcpus() should not be called from a VCPU. >>> >>> And there is also no need for it. run_on_cpu() will make sure that the >>> CPUs exit KVM/TCG, where they get stopped. Once stopped, they will no >>> longer run. >>> >>> Signed-off-by: David Hildenbrand <david@redhat.com> >>> --- >>> target/s390x/diag.c | 4 ---- >>> 1 file changed, 4 deletions(-) >>> >>> diff --git a/target/s390x/diag.c b/target/s390x/diag.c >>> index dbbb9e886f..52bc348808 100644 >>> --- a/target/s390x/diag.c >>> +++ b/target/s390x/diag.c >>> @@ -27,7 +27,6 @@ static int modified_clear_reset(S390CPU *cpu) >>> S390CPUClass *scc = S390_CPU_GET_CLASS(cpu); >>> CPUState *t; >>> >>> - pause_all_vcpus(); >>> cpu_synchronize_all_states(); >>> CPU_FOREACH(t) { >>> run_on_cpu(t, s390_do_cpu_full_reset, RUN_ON_CPU_NULL); >> >> I think you also need to fix the run_on_cpu to be a async_run_on_cpu as >> you would otherwise hang waiting for run_on_cpu to finish on a >> single-threaded TCG run (as you are in the only vCPU context). > > No, it works just fine for single threaded TCG. run_on_cpu() can deal > with single threaded TCG just fine. (otherwise e.g. SIGP code also > wouldn't work) > > In do_run_on_cpu, the following code always directly triggers for single > threaded tcg: > > if (qemu_cpu_is_self(cpu)) { > > func(cpu, data); > > return; > > } Sorry ignore what I said, qemu_cpu_is_self tests the underlying thread not the vCPU which they all share. -- Alex Bennée
On 16.11.2017 19:12, Alex Bennée wrote: > > David Hildenbrand <david@redhat.com> writes: > >> On 16.11.2017 18:37, Alex Bennée wrote: >>> >>> David Hildenbrand <david@redhat.com> writes: >>> >>>> Currently, single threaded TCG with > 1 VCPU gets stuck during IPL, when >>>> the bios tries to switch to the loaded kernel via DIAG 308. >>>> >>>> pause_all_vcpus()/resume_all_vcpus() should not be called from a VCPU. >>>> >>>> And there is also no need for it. run_on_cpu() will make sure that the >>>> CPUs exit KVM/TCG, where they get stopped. Once stopped, they will no >>>> longer run. >>>> >>>> Signed-off-by: David Hildenbrand <david@redhat.com> >>>> --- >>>> target/s390x/diag.c | 4 ---- >>>> 1 file changed, 4 deletions(-) >>>> >>>> diff --git a/target/s390x/diag.c b/target/s390x/diag.c >>>> index dbbb9e886f..52bc348808 100644 >>>> --- a/target/s390x/diag.c >>>> +++ b/target/s390x/diag.c >>>> @@ -27,7 +27,6 @@ static int modified_clear_reset(S390CPU *cpu) >>>> S390CPUClass *scc = S390_CPU_GET_CLASS(cpu); >>>> CPUState *t; >>>> >>>> - pause_all_vcpus(); >>>> cpu_synchronize_all_states(); >>>> CPU_FOREACH(t) { >>>> run_on_cpu(t, s390_do_cpu_full_reset, RUN_ON_CPU_NULL); >>> >>> I think you also need to fix the run_on_cpu to be a async_run_on_cpu as >>> you would otherwise hang waiting for run_on_cpu to finish on a >>> single-threaded TCG run (as you are in the only vCPU context). >> >> No, it works just fine for single threaded TCG. run_on_cpu() can deal >> with single threaded TCG just fine. (otherwise e.g. SIGP code also >> wouldn't work) >> >> In do_run_on_cpu, the following code always directly triggers for single >> threaded tcg: >> >> if (qemu_cpu_is_self(cpu)) { >> >> func(cpu, data); >> >> return; >> >> } > > For -smp 1 it's fine, but have you tested --accel thread=single -smp 2? Yes, otherwise I would never have noticed this bug ;) I use for my current single threaded setup: "--accel thread=single -smp 4" The point is: if run_on_cpu() would not be able to cope with this very simple problem, it would basically be useless. instead of scheduling work, it simply executes all these functions directly for single threaded TCG. For multi threaded TCG it actually schedules work, that's why I notice the missing iolock (see patch following this one) > >> >>> >>> If it is important that the source vCPU doesn't continue you can >>> schedule it's work afterwards with a async_safe_run_on_cpu which will >>> complete after all other vCPUs have executed their work. >>> >> >> It is important. Introducing async helpers at a point where sync is >> needed sounds strange. > > Well the helper that schedulules the final async helper also needs to > exit the run loop at that point, otherwise you are right it would > attempt to execute a few more instructions in the block. > And the nice thing about run_on_cpu here is for single threaded TCG that not a single work has to be scheduled.
Please change the subject. In busy times I tend to ignore tcg patches. This code is clearly kvm and tcg. So what about "s390x/diag:" as prefix? On 11/16/2017 06:05 PM, David Hildenbrand wrote: > Currently, single threaded TCG with > 1 VCPU gets stuck during IPL, when > the bios tries to switch to the loaded kernel via DIAG 308. > > pause_all_vcpus()/resume_all_vcpus() should not be called from a VCPU. > > And there is also no need for it. run_on_cpu() will make sure that the > CPUs exit KVM/TCG, where they get stopped. Once stopped, they will no > longer run. > > Signed-off-by: David Hildenbrand <david@redhat.com> > --- > target/s390x/diag.c | 4 ---- > 1 file changed, 4 deletions(-) > > diff --git a/target/s390x/diag.c b/target/s390x/diag.c > index dbbb9e886f..52bc348808 100644 > --- a/target/s390x/diag.c > +++ b/target/s390x/diag.c > @@ -27,7 +27,6 @@ static int modified_clear_reset(S390CPU *cpu) > S390CPUClass *scc = S390_CPU_GET_CLASS(cpu); > CPUState *t; > > - pause_all_vcpus(); I did this to prevent a "still running CPU to restart an already stopped one". Are we sure that this can not happen? > cpu_synchronize_all_states(); > CPU_FOREACH(t) { > run_on_cpu(t, s390_do_cpu_full_reset, RUN_ON_CPU_NULL); > @@ -37,7 +36,6 @@ static int modified_clear_reset(S390CPU *cpu) > s390_crypto_reset(); > scc->load_normal(CPU(cpu)); > cpu_synchronize_all_post_reset(); > - resume_all_vcpus(); > return 0; > } > > @@ -53,7 +51,6 @@ static int load_normal_reset(S390CPU *cpu) > S390CPUClass *scc = S390_CPU_GET_CLASS(cpu); > CPUState *t; > > - pause_all_vcpus(); > cpu_synchronize_all_states(); > CPU_FOREACH(t) { > run_on_cpu(t, s390_do_cpu_reset, RUN_ON_CPU_NULL); > @@ -63,7 +60,6 @@ static int load_normal_reset(S390CPU *cpu) > scc->initial_cpu_reset(CPU(cpu)); > scc->load_normal(CPU(cpu)); > cpu_synchronize_all_post_reset(); > - resume_all_vcpus(); > return 0; > } >
On 16.11.2017 21:57, Christian Borntraeger wrote: > Please change the subject. In busy times I tend to ignore tcg patches. > This code is clearly kvm and tcg. > So what about "s390x/diag:" as prefix? Right, it was a fix for TCG, that's why I added TCG only. But it should have been purely s390x or s390x/diag. > > On 11/16/2017 06:05 PM, David Hildenbrand wrote: >> Currently, single threaded TCG with > 1 VCPU gets stuck during IPL, when >> the bios tries to switch to the loaded kernel via DIAG 308. >> >> pause_all_vcpus()/resume_all_vcpus() should not be called from a VCPU. >> >> And there is also no need for it. run_on_cpu() will make sure that the >> CPUs exit KVM/TCG, where they get stopped. Once stopped, they will no >> longer run. >> >> Signed-off-by: David Hildenbrand <david@redhat.com> >> --- >> target/s390x/diag.c | 4 ---- >> 1 file changed, 4 deletions(-) >> >> diff --git a/target/s390x/diag.c b/target/s390x/diag.c >> index dbbb9e886f..52bc348808 100644 >> --- a/target/s390x/diag.c >> +++ b/target/s390x/diag.c >> @@ -27,7 +27,6 @@ static int modified_clear_reset(S390CPU *cpu) >> S390CPUClass *scc = S390_CPU_GET_CLASS(cpu); >> CPUState *t; >> >> - pause_all_vcpus(); > > > I did this to prevent a "still running CPU to restart an already stopped one". > Are we sure that this can not happen? Interesting question. I thought it would not be a problem but the way locking with run_on_cpu() is handled is tricky. Now I am worried about a couple of deadlocks. pause_all_vcpus() actually gives up the qemu_global_mutex, so anybody waiting for that mutex can continue. We have a deadlock if two CPUs at the same time would call pause_all_vcpus(). E.g. two CPUs executing at the same time a DIAG 308 (unlikely). run_on_cpu temporarily gives up the qemu_global_mutex. If two CPUs call a run_on_cpu at the same time against each other, we might also have a deadlock. Two CPUs simultaneously trying to send a SIGP START/STOP/RESTART cannot run into a deadlock as they are protected by the SIGP mutex with a trylock. So even with pause_all_vcpus() I think we have a problem if another CPU sends a SIGP to the issuing DIAG CPU and expects the run_on_cpu to trigger. Deadlock, but unlikely I assume? Let's defer this patch for now, booting with 1 VCPU is not degraded and it looked easier than it is. Maybe fixing pause_all_vcpus() to work with more than one CPU in single threaded TCG is an (easier) alternative and at least keeps the (tested) state. 2.12 material.
diff --git a/target/s390x/diag.c b/target/s390x/diag.c index dbbb9e886f..52bc348808 100644 --- a/target/s390x/diag.c +++ b/target/s390x/diag.c @@ -27,7 +27,6 @@ static int modified_clear_reset(S390CPU *cpu) S390CPUClass *scc = S390_CPU_GET_CLASS(cpu); CPUState *t; - pause_all_vcpus(); cpu_synchronize_all_states(); CPU_FOREACH(t) { run_on_cpu(t, s390_do_cpu_full_reset, RUN_ON_CPU_NULL); @@ -37,7 +36,6 @@ static int modified_clear_reset(S390CPU *cpu) s390_crypto_reset(); scc->load_normal(CPU(cpu)); cpu_synchronize_all_post_reset(); - resume_all_vcpus(); return 0; } @@ -53,7 +51,6 @@ static int load_normal_reset(S390CPU *cpu) S390CPUClass *scc = S390_CPU_GET_CLASS(cpu); CPUState *t; - pause_all_vcpus(); cpu_synchronize_all_states(); CPU_FOREACH(t) { run_on_cpu(t, s390_do_cpu_reset, RUN_ON_CPU_NULL); @@ -63,7 +60,6 @@ static int load_normal_reset(S390CPU *cpu) scc->initial_cpu_reset(CPU(cpu)); scc->load_normal(CPU(cpu)); cpu_synchronize_all_post_reset(); - resume_all_vcpus(); return 0; }
Currently, single threaded TCG with > 1 VCPU gets stuck during IPL, when the bios tries to switch to the loaded kernel via DIAG 308. pause_all_vcpus()/resume_all_vcpus() should not be called from a VCPU. And there is also no need for it. run_on_cpu() will make sure that the CPUs exit KVM/TCG, where they get stopped. Once stopped, they will no longer run. Signed-off-by: David Hildenbrand <david@redhat.com> --- target/s390x/diag.c | 4 ---- 1 file changed, 4 deletions(-)