Patchwork [RFC] KVM: PPC: Add level based interrupt logic

login
register
mail settings
Submitter Alexander Graf
Date Aug. 31, 2010, 10:07 a.m.
Message ID <1283249237-31881-1-git-send-email-agraf@suse.de>
Download mbox | patch
Permalink /patch/63221/
State New
Headers show

Comments

Alexander Graf - Aug. 31, 2010, 10:07 a.m.
KVM on PowerPC used to have completely broken interrupt logic. Usually,
interrupts work by having a PIC that pulls a line up/down, so the CPU knows
that an interrupt is active. This line stays active until some action is
done to the PIC to release the line.

On KVM for PPC, we just checked if there was an interrupt pending and pulled
a line in the kernel module. We never released it though, hoping that kernel
space would just declare an interrupt as released when injected - which is
wrong.

To fix this, we need to completely redesign the interrupt injection logic.
Whenever an interrupt line gets triggered, we need to notify kernel space
that the line is up. Whenever it gets released, we do the same. This way
we can assure that the interrupt state is always known to kernel space.

This fixes random stalls in KVM guests on PowerPC that were waiting for
an interrupt while everyone else thought they received it already.

Signed-off-by: Alexander Graf <agraf@suse.de>

---

This adds a new generic kvm-all callback called 'kvm_set_interrupt'. I figured
more archs than just PPC could use that and I didn't like the idea of having
PPC KVM specific code in hw/ :).

Since this touches generic code, I'm sending it out as RFC first. Please
comment and tell me what you think. Could maybe even x86 benefit from this?
Does x86 even work w/o in-kernel APIC and I/O thread? Who guarantees that
we get back into pre_run after CPU_INTERRUPT_HARD gets set?

Alex

---
 hw/ppc.c             |    8 ++++++++
 kvm-all.c            |    9 +++++++++
 kvm-stub.c           |    5 +++++
 kvm.h                |    3 +++
 target-ppc/kvm.c     |   37 +++++++++++++++++++++++++++++++++++--
 target-ppc/kvm_ppc.h |   12 ++++++++++++
 6 files changed, 72 insertions(+), 2 deletions(-)
Avi Kivity - Sept. 1, 2010, 7:41 a.m.
On 08/31/2010 01:07 PM, Alexander Graf wrote:
> KVM on PowerPC used to have completely broken interrupt logic. Usually,
> interrupts work by having a PIC that pulls a line up/down, so the CPU knows
> that an interrupt is active. This line stays active until some action is
> done to the PIC to release the line.
>
> On KVM for PPC, we just checked if there was an interrupt pending and pulled
> a line in the kernel module. We never released it though, hoping that kernel
> space would just declare an interrupt as released when injected - which is
> wrong.
>
> To fix this, we need to completely redesign the interrupt injection logic.
> Whenever an interrupt line gets triggered, we need to notify kernel space
> that the line is up. Whenever it gets released, we do the same. This way
> we can assure that the interrupt state is always known to kernel space.
>
> This fixes random stalls in KVM guests on PowerPC that were waiting for
> an interrupt while everyone else thought they received it already.

This is more or less equivalent to KVM_IRQ_LINE.
Alexander Graf - Sept. 1, 2010, 9:38 a.m.
On 01.09.2010, at 09:41, Avi Kivity wrote:

> On 08/31/2010 01:07 PM, Alexander Graf wrote:
>> KVM on PowerPC used to have completely broken interrupt logic. Usually,
>> interrupts work by having a PIC that pulls a line up/down, so the CPU knows
>> that an interrupt is active. This line stays active until some action is
>> done to the PIC to release the line.
>> 
>> On KVM for PPC, we just checked if there was an interrupt pending and pulled
>> a line in the kernel module. We never released it though, hoping that kernel
>> space would just declare an interrupt as released when injected - which is
>> wrong.
>> 
>> To fix this, we need to completely redesign the interrupt injection logic.
>> Whenever an interrupt line gets triggered, we need to notify kernel space
>> that the line is up. Whenever it gets released, we do the same. This way
>> we can assure that the interrupt state is always known to kernel space.
>> 
>> This fixes random stalls in KVM guests on PowerPC that were waiting for
>> an interrupt while everyone else thought they received it already.
> 
> This is more or less equivalent to KVM_IRQ_LINE.

My question was if you think the internal C interface is generic enough or if it needs a lot more magic for x86 anyways :).


Alex
Avi Kivity - Sept. 1, 2010, 11:45 a.m.
On 09/01/2010 12:38 PM, Alexander Graf wrote:
> On 01.09.2010, at 09:41, Avi Kivity wrote:
>
>> On 08/31/2010 01:07 PM, Alexander Graf wrote:
>>> KVM on PowerPC used to have completely broken interrupt logic. Usually,
>>> interrupts work by having a PIC that pulls a line up/down, so the CPU knows
>>> that an interrupt is active. This line stays active until some action is
>>> done to the PIC to release the line.
>>>
>>> On KVM for PPC, we just checked if there was an interrupt pending and pulled
>>> a line in the kernel module. We never released it though, hoping that kernel
>>> space would just declare an interrupt as released when injected - which is
>>> wrong.
>>>
>>> To fix this, we need to completely redesign the interrupt injection logic.
>>> Whenever an interrupt line gets triggered, we need to notify kernel space
>>> that the line is up. Whenever it gets released, we do the same. This way
>>> we can assure that the interrupt state is always known to kernel space.
>>>
>>> This fixes random stalls in KVM guests on PowerPC that were waiting for
>>> an interrupt while everyone else thought they received it already.
>> This is more or less equivalent to KVM_IRQ_LINE.
> My question was if you think the internal C interface is generic enough or if it needs a lot more magic for x86 anyways :).
>

So you noticed I avoided it.  Well, being forced to look, I don't think 
it's worthwhile to try to be generic here.  Both the PIC<->APIC and the 
APIC<->core interfaces are too complicated to be modelled by a single line.
Alexander Graf - Sept. 1, 2010, 11:47 a.m.
On 01.09.2010, at 13:45, Avi Kivity wrote:

> On 09/01/2010 12:38 PM, Alexander Graf wrote:
>> On 01.09.2010, at 09:41, Avi Kivity wrote:
>> 
>>> On 08/31/2010 01:07 PM, Alexander Graf wrote:
>>>> KVM on PowerPC used to have completely broken interrupt logic. Usually,
>>>> interrupts work by having a PIC that pulls a line up/down, so the CPU knows
>>>> that an interrupt is active. This line stays active until some action is
>>>> done to the PIC to release the line.
>>>> 
>>>> On KVM for PPC, we just checked if there was an interrupt pending and pulled
>>>> a line in the kernel module. We never released it though, hoping that kernel
>>>> space would just declare an interrupt as released when injected - which is
>>>> wrong.
>>>> 
>>>> To fix this, we need to completely redesign the interrupt injection logic.
>>>> Whenever an interrupt line gets triggered, we need to notify kernel space
>>>> that the line is up. Whenever it gets released, we do the same. This way
>>>> we can assure that the interrupt state is always known to kernel space.
>>>> 
>>>> This fixes random stalls in KVM guests on PowerPC that were waiting for
>>>> an interrupt while everyone else thought they received it already.
>>> This is more or less equivalent to KVM_IRQ_LINE.
>> My question was if you think the internal C interface is generic enough or if it needs a lot more magic for x86 anyways :).
>> 
> 
> So you noticed I avoided it.  Well, being forced to look, I don't think it's worthwhile to try to be generic here.  Both the PIC<->APIC and the APIC<->core interfaces are too complicated to be modelled by a single line.

Makes sense. Well, I guess it doesn't hurt to have the interface as is and only implement it for PPC for now.


Alex
Avi Kivity - Sept. 1, 2010, 11:56 a.m.
On 09/01/2010 02:47 PM, Alexander Graf wrote:
> On 01.09.2010, at 13:45, Avi Kivity wrote:
>
>> On 09/01/2010 12:38 PM, Alexander Graf wrote:
>>> On 01.09.2010, at 09:41, Avi Kivity wrote:
>>>
>>>> On 08/31/2010 01:07 PM, Alexander Graf wrote:
>>>>> KVM on PowerPC used to have completely broken interrupt logic. Usually,
>>>>> interrupts work by having a PIC that pulls a line up/down, so the CPU knows
>>>>> that an interrupt is active. This line stays active until some action is
>>>>> done to the PIC to release the line.
>>>>>
>>>>> On KVM for PPC, we just checked if there was an interrupt pending and pulled
>>>>> a line in the kernel module. We never released it though, hoping that kernel
>>>>> space would just declare an interrupt as released when injected - which is
>>>>> wrong.
>>>>>
>>>>> To fix this, we need to completely redesign the interrupt injection logic.
>>>>> Whenever an interrupt line gets triggered, we need to notify kernel space
>>>>> that the line is up. Whenever it gets released, we do the same. This way
>>>>> we can assure that the interrupt state is always known to kernel space.
>>>>>
>>>>> This fixes random stalls in KVM guests on PowerPC that were waiting for
>>>>> an interrupt while everyone else thought they received it already.
>>>> This is more or less equivalent to KVM_IRQ_LINE.
>>> My question was if you think the internal C interface is generic enough or if it needs a lot more magic for x86 anyways :).
>>>
>> So you noticed I avoided it.  Well, being forced to look, I don't think it's worthwhile to try to be generic here.  Both the PIC<->APIC and the APIC<->core interfaces are too complicated to be modelled by a single line.
> Makes sense. Well, I guess it doesn't hurt to have the interface as is and only implement it for PPC for now.
>
>

Why not limit it to ppc?  Someone might call it by accident.
Alexander Graf - Sept. 1, 2010, 11:58 a.m.
On 01.09.2010, at 13:56, Avi Kivity wrote:

> On 09/01/2010 02:47 PM, Alexander Graf wrote:
>> On 01.09.2010, at 13:45, Avi Kivity wrote:
>> 
>>> On 09/01/2010 12:38 PM, Alexander Graf wrote:
>>>> On 01.09.2010, at 09:41, Avi Kivity wrote:
>>>> 
>>>>> On 08/31/2010 01:07 PM, Alexander Graf wrote:
>>>>>> KVM on PowerPC used to have completely broken interrupt logic. Usually,
>>>>>> interrupts work by having a PIC that pulls a line up/down, so the CPU knows
>>>>>> that an interrupt is active. This line stays active until some action is
>>>>>> done to the PIC to release the line.
>>>>>> 
>>>>>> On KVM for PPC, we just checked if there was an interrupt pending and pulled
>>>>>> a line in the kernel module. We never released it though, hoping that kernel
>>>>>> space would just declare an interrupt as released when injected - which is
>>>>>> wrong.
>>>>>> 
>>>>>> To fix this, we need to completely redesign the interrupt injection logic.
>>>>>> Whenever an interrupt line gets triggered, we need to notify kernel space
>>>>>> that the line is up. Whenever it gets released, we do the same. This way
>>>>>> we can assure that the interrupt state is always known to kernel space.
>>>>>> 
>>>>>> This fixes random stalls in KVM guests on PowerPC that were waiting for
>>>>>> an interrupt while everyone else thought they received it already.
>>>>> This is more or less equivalent to KVM_IRQ_LINE.
>>>> My question was if you think the internal C interface is generic enough or if it needs a lot more magic for x86 anyways :).
>>>> 
>>> So you noticed I avoided it.  Well, being forced to look, I don't think it's worthwhile to try to be generic here.  Both the PIC<->APIC and the APIC<->core interfaces are too complicated to be modelled by a single line.
>> Makes sense. Well, I guess it doesn't hurt to have the interface as is and only implement it for PPC for now.
>> 
>> 
> 
> Why not limit it to ppc?  Someone might call it by accident.

How would you model it? Call kvm_arch_set_interrupt in hw/ppc.c? Which header would I define the call in?


Alex
Avi Kivity - Sept. 1, 2010, 12:09 p.m.
On 09/01/2010 02:58 PM, Alexander Graf wrote:
>
>> Why not limit it to ppc?  Someone might call it by accident.
> How would you model it? Call kvm_arch_set_interrupt in hw/ppc.c? Which header would I define the call in?
>

Well, eventually this finds its way to cpu code in target-ppc, no? from 
there you can call a function in target-ppc/kvm.c without any need for 
tricks.

So just hack it there.
Avi Kivity - Sept. 1, 2010, 12:09 p.m.
On 09/01/2010 03:09 PM, Avi Kivity wrote:
>  On 09/01/2010 02:58 PM, Alexander Graf wrote:
>>
>>> Why not limit it to ppc?  Someone might call it by accident.
>> How would you model it? Call kvm_arch_set_interrupt in hw/ppc.c? 
>> Which header would I define the call in?
>>
>
> Well, eventually this finds its way to cpu code in target-ppc, no? 
> from there you can call a function in target-ppc/kvm.c without any 
> need for tricks.
>
> So just hack it there.
>

Another way to look at this: shouldn't this be the same for kvm and non-kvm?
Alexander Graf - Sept. 1, 2010, 12:10 p.m.
On 01.09.2010, at 14:09, Avi Kivity wrote:

> On 09/01/2010 02:58 PM, Alexander Graf wrote:
>> 
>>> Why not limit it to ppc?  Someone might call it by accident.
>> How would you model it? Call kvm_arch_set_interrupt in hw/ppc.c? Which header would I define the call in?
>> 
> 
> Well, eventually this finds its way to cpu code in target-ppc, no? from there you can call a function in target-ppc/kvm.c without any need for tricks.

Unfortunately all the interrupt magic is not cpu code. The qemu model is broken there. hw/ code just directly pokes env-> variables to trigger hard interrupts.


Alex

Patch

diff --git a/hw/ppc.c b/hw/ppc.c
index 2a77eb9..87006f7 100644
--- a/hw/ppc.c
+++ b/hw/ppc.c
@@ -28,6 +28,7 @@ 
 #include "nvram.h"
 #include "qemu-log.h"
 #include "loader.h"
+#include "kvm.h"
 
 //#define PPC_DEBUG_IRQ
 //#define PPC_DEBUG_TB
@@ -50,6 +51,8 @@  static void cpu_ppc_tb_start (CPUState *env);
 
 static void ppc_set_irq (CPUState *env, int n_IRQ, int level)
 {
+    unsigned int old_pending = env->pending_interrupts;
+
     if (level) {
         env->pending_interrupts |= 1 << n_IRQ;
         cpu_interrupt(env, CPU_INTERRUPT_HARD);
@@ -58,6 +61,11 @@  static void ppc_set_irq (CPUState *env, int n_IRQ, int level)
         if (env->pending_interrupts == 0)
             cpu_reset_interrupt(env, CPU_INTERRUPT_HARD);
     }
+
+    if (old_pending != env->pending_interrupts) {
+        kvm_set_interrupt(env, n_IRQ, level);
+    }
+
     LOG_IRQ("%s: %p n_IRQ %d level %d => pending %08" PRIx32
                 "req %08x\n", __func__, env, n_IRQ, level,
                 env->pending_interrupts, env->interrupt_request);
diff --git a/kvm-all.c b/kvm-all.c
index 58b0404..65b1ad2 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1291,3 +1291,12 @@  int kvm_set_ioeventfd_pio_word(int fd, uint16_t addr, uint16_t val, bool assign)
     return -ENOSYS;
 #endif
 }
+
+int kvm_set_interrupt(CPUState *env, int irq, int level)
+{
+#ifdef TARGET_PPC
+    return kvm_arch_set_interrupt(env, irq, level);
+#else
+    return 0;
+#endif
+}
diff --git a/kvm-stub.c b/kvm-stub.c
index d45f9fa..e73bc5a 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -141,3 +141,8 @@  int kvm_set_ioeventfd_mmio_long(int fd, uint32_t adr, uint32_t val, bool assign)
 {
     return -ENOSYS;
 }
+
+int kvm_set_interrupt(CPUState *env, int irq, int level)
+{
+    return 0;
+}
diff --git a/kvm.h b/kvm.h
index 50b6c01..c14c26d 100644
--- a/kvm.h
+++ b/kvm.h
@@ -68,6 +68,7 @@  int kvm_update_guest_debug(CPUState *env, unsigned long reinject_trap);
 #ifndef _WIN32
 int kvm_set_signal_mask(CPUState *env, const sigset_t *sigset);
 #endif
+int kvm_set_interrupt(CPUState *env, int irq, int level);
 
 int kvm_pit_in_kernel(void);
 int kvm_irqchip_in_kernel(void);
@@ -151,6 +152,8 @@  void kvm_cpu_synchronize_state(CPUState *env);
 void kvm_cpu_synchronize_post_reset(CPUState *env);
 void kvm_cpu_synchronize_post_init(CPUState *env);
 
+int kvm_arch_set_interrupt(CPUState *env, int irq, int level);
+
 /* generic hooks - to be moved/refactored once there are more users */
 
 static inline void cpu_synchronize_state(CPUState *env)
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 14d6365..35e3712 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -37,6 +37,9 @@ 
     do { } while (0)
 #endif
 
+static int cap_interrupt_unset = false;
+static int cap_interrupt_level = false;
+
 /* XXX We have a race condition where we actually have a level triggered
  *     interrupt, but the infrastructure can't expose that yet, so the guest
  *     takes but ignores it, goes to sleep and never gets notified that there's
@@ -55,6 +58,18 @@  static void kvm_kick_env(void *env)
 
 int kvm_arch_init(KVMState *s, int smp_cpus)
 {
+#ifdef KVM_CAP_PPC_UNSET_IRQ
+    cap_interrupt_unset = kvm_check_extension(s, KVM_CAP_PPC_UNSET_IRQ);
+#endif
+#ifdef KVM_CAP_PPC_IRQ_LEVEL
+    cap_interrupt_level = kvm_check_extension(s, KVM_CAP_PPC_IRQ_LEVEL);
+#endif
+
+    if (!cap_interrupt_level) {
+        fprintf(stderr, "KVM: Couldn't find level irq capability. Expect the "
+                        "VM to stall at times!\n");
+    }
+
     return 0;
 }
 
@@ -178,6 +193,23 @@  int kvm_arch_get_registers(CPUState *env)
     return 0;
 }
 
+int kvm_arch_set_interrupt(CPUState *env, int irq, int level)
+{
+    unsigned virq = level ? KVM_INTERRUPT_SET_LEVEL : KVM_INTERRUPT_UNSET;
+
+    if (irq != PPC_INTERRUPT_EXT) {
+        return 0;
+    }
+
+    if (!kvm_enabled() || !cap_interrupt_unset || !cap_interrupt_level) {
+        return 0;
+    }
+
+    kvm_vcpu_ioctl(env, KVM_INTERRUPT, &virq);
+
+    return 0;
+}
+
 #if defined(TARGET_PPCEMB)
 #define PPC_INPUT_INT PPC40x_INPUT_INT
 #elif defined(TARGET_PPC64)
@@ -193,7 +225,8 @@  int kvm_arch_pre_run(CPUState *env, struct kvm_run *run)
 
     /* PowerPC Qemu tracks the various core input pins (interrupt, critical
      * interrupt, reset, etc) in PPC-specific env->irq_input_state. */
-    if (run->ready_for_interrupt_injection &&
+    if (!cap_interrupt_level &&
+        run->ready_for_interrupt_injection &&
         (env->interrupt_request & CPU_INTERRUPT_HARD) &&
         (env->irq_input_state & (1<<PPC_INPUT_INT)))
     {
@@ -201,7 +234,7 @@  int kvm_arch_pre_run(CPUState *env, struct kvm_run *run)
          * future KVM could cache it in-kernel to avoid a heavyweight exit
          * when reading the UIC.
          */
-        irq = -1U;
+        irq = KVM_INTERRUPT_SET;
 
         dprintf("injected interrupt %d\n", irq);
         r = kvm_vcpu_ioctl(env, KVM_INTERRUPT, &irq);
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index 65e31c9..e88423a 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -17,4 +17,16 @@  int kvmppc_read_host_property(const char *node_path, const char *prop,
 uint32_t kvmppc_get_tbfreq(void);
 int kvmppc_get_hypercall(CPUState *env, uint8_t *buf, int buf_len);
 
+#ifndef KVM_INTERRUPT_SET
+#define KVM_INTERRUPT_SET -1
+#endif
+
+#ifndef KVM_INTERRUPT_UNSET
+#define KVM_INTERRUPT_UNSET -2
+#endif
+
+#ifndef KVM_INTERRUPT_SET_LEVEL
+#define KVM_INTERRUPT_SET_LEVEL -3
+#endif
+
 #endif /* __KVM_PPC_H__ */