Patchwork [RFC,6/6] kvm/ppc/mpic: in-kernel MPIC emulation

Submitter Scott Wood
Date Feb. 14, 2013, 5:49 a.m.
Message ID <1360820960-12537-7-git-send-email-scottwood@freescale.com>
Permalink /patch/220362/
State New

Comments

Scott Wood - Feb. 14, 2013, 5:49 a.m.
Hook the MPIC code up to the KVM interfaces, add locking, etc.

TODO: irqfd support

Signed-off-by: Scott Wood <scottwood@freescale.com>
---
 Documentation/virtual/kvm/devices/mpic.txt |   36 ++
 arch/powerpc/include/asm/kvm_host.h        |    9 +-
 arch/powerpc/include/asm/kvm_ppc.h         |    4 +
 arch/powerpc/kvm/Kconfig                   |    5 +
 arch/powerpc/kvm/Makefile                  |    2 +
 arch/powerpc/kvm/booke.c                   |   10 +-
 arch/powerpc/kvm/mpic.c                    |  875 +++++++++++++++++++++++-----
 arch/powerpc/kvm/powerpc.c                 |   12 +-
 include/linux/kvm_host.h                   |    4 +-
 include/uapi/linux/kvm.h                   |   17 +-
 virt/kvm/kvm_main.c                        |   12 +
 11 files changed, 822 insertions(+), 164 deletions(-)
 create mode 100644 Documentation/virtual/kvm/devices/mpic.txt
Alexander Graf - March 21, 2013, 8:28 a.m.
On 14.02.2013, at 06:49, Scott Wood wrote:

> Hook the MPIC code up to the KVM interfaces, add locking, etc.
> 
> TODO: irqfd support
> 
> Signed-off-by: Scott Wood <scottwood@freescale.com>

Could you please split this patch up on your next respin? Also please make sure you don't have #if 0'ed code in here. Just return to user space with an error when you encounter something you don't know how to handle.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Scott Wood - March 21, 2013, 2:43 p.m.
On 03/21/2013 03:28:35 AM, Alexander Graf wrote:
> 
> On 14.02.2013, at 06:49, Scott Wood wrote:
> 
> > Hook the MPIC code up to the KVM interfaces, add locking, etc.
> >
> > TODO: irqfd support
> >
> > Signed-off-by: Scott Wood <scottwood@freescale.com>
> 
> Could you please split this patch up on your next respin?

Any particular split you're looking for?

The only reason it's split as much as it is already is to give some
chance of merging updates from QEMU being less painful.  As far as the
kernel is concerned, this is new code, which is not functional (and
thus not built) before this patch.  There aren't meaningful
intermediate states.

> Also please make sure you don't have #if 0'ed code in here.

Well, yeah.  Note the RFC. :-)

-Scott
Alexander Graf - March 21, 2013, 2:52 p.m.
On 21.03.2013, at 15:43, Scott Wood wrote:

> On 03/21/2013 03:28:35 AM, Alexander Graf wrote:
>> On 14.02.2013, at 06:49, Scott Wood wrote:
>> > Hook the MPIC code up to the KVM interfaces, add locking, etc.
>> >
>> > TODO: irqfd support
>> >
>> > Signed-off-by: Scott Wood <scottwood@freescale.com>
>> Could you please split this patch up on your next respin?
> 
> Any particular split you're looking for?

Anything that makes reviewing it easier :). I can't concentrate for 100k straight.

> The only reason it's split as much as it is already is to give some chance of merging updates from QEMU being less painful.  As far as the kernel is concerned, this is new code, which is not functional (and thus not built) before this patch.  There aren't meaningful intermediate states.
> 
>> Also please make sure you don't have #if 0'ed code in here.
> 
> Well, yeah.  Note the RFC. :-)

Just wanted to make sure you don't forget them when you send out a non-RFC :). Not that I'd assume you'd do that ;)


Alex


Patch

diff --git a/Documentation/virtual/kvm/devices/mpic.txt b/Documentation/virtual/kvm/devices/mpic.txt
new file mode 100644
index 0000000..1ef30f0
--- /dev/null
+++ b/Documentation/virtual/kvm/devices/mpic.txt
@@ -0,0 +1,36 @@ 
+MPIC interrupt controller
+=========================
+
+Device types supported:
+  KVM_DEV_TYPE_FSL_MPIC_20     Freescale MPIC v2.0
+  KVM_DEV_TYPE_FSL_MPIC_42     Freescale MPIC v4.2
+
+Only one MPIC instance, of any type, may be instantiated.  The created
+MPIC will act as the system interrupt controller, connecting to each
+vcpu's interrupt inputs.
+
+Groups:
+  KVM_DEV_MPIC_GRP_MISC
+  Attributes:
+    KVM_DEV_MPIC_BASE_ADDR (rw, 64-bit)
+      Base address of the 256 KiB MPIC register space.  Must be
+      naturally aligned.  A value of zero disables the mapping.
+      Reset value is zero.
+
+  KVM_DEV_MPIC_GRP_REGISTER (rw, 32-bit)
+    Access MPIC register state.  "attr" is the byte offset into
+    the MPIC register space.  Accesses must be 4-byte aligned.
+
+    MSIs may be signaled by using this attribute group to write
+    to the relevant MSIIR.
+
+  KVM_DEV_MPIC_GRP_IRQ_ACTIVE (rw, 32-bit)
+    IRQ input line for each standard openpic source.  0 is inactive and 1
+    is active, regardless of interrupt sense.
+
+    For edge-triggered interrupts:  Writing 1 is considered an activating
+    edge, and writing 0 is ignored.  Reading returns 1 if a previously
+    signaled edge has not been acknowledged, and 0 otherwise.
+
+    "attr" is the IRQ number.  IRQ numbers for standard sources are the
+    byte offset of the relevant IVPR from EIVPR0, divided by 32.
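
Reader's note (not part of the patch): the two numeric rules in mpic.txt above can be sketched in plain C. This is only an illustration under stated assumptions. The helper names are hypothetical, and the EIVPR0 offset of 0x10000 used in the usage function is an assumption about the Freescale register layout, not something the patch states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MPIC_REG_SPACE_SIZE	(256u * 1024u)	/* 256 KiB, per mpic.txt */
#define MPIC_IVPR_STRIDE	32u		/* divisor given in mpic.txt */

/*
 * Hypothetical helper: "naturally aligned" means the base is a multiple
 * of the region size.  Zero passes the check; mpic.txt defines zero as
 * "mapping disabled".
 */
static inline bool mpic_base_addr_valid(uint64_t base)
{
	return (base & (MPIC_REG_SPACE_SIZE - 1)) == 0;
}

/*
 * Hypothetical helper: IRQ number for KVM_DEV_MPIC_GRP_IRQ_ACTIVE,
 * derived from the byte offset of an IVPR.  Returns -1 if the offset
 * lies before EIVPR0 or is not on a 32-byte boundary relative to it.
 */
static inline int mpic_irq_from_ivpr_offset(uint32_t ivpr_off,
					    uint32_t eivpr0_off)
{
	uint32_t delta;

	if (ivpr_off < eivpr0_off)
		return -1;

	delta = ivpr_off - eivpr0_off;
	if (delta % MPIC_IVPR_STRIDE)
		return -1;

	return (int)(delta / MPIC_IVPR_STRIDE);
}

/* Usage sketch; 0x10000 for EIVPR0 is an assumed offset. */
void mpic_example_usage(void)
{
	const uint32_t eivpr0 = 0x10000;	/* assumption */

	assert(mpic_base_addr_valid(0x40000));	/* 256 KiB aligned */
	assert(mpic_irq_from_ivpr_offset(eivpr0 + 0x40, eivpr0) == 2);
}
```

A base of 0x20000 (128 KiB) would fail the alignment check, since it is not a multiple of the 256 KiB region size.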
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 8a72d59..be81c7a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -256,6 +256,7 @@  struct kvm_arch {
 #ifdef CONFIG_PPC_BOOK3S_64
 	struct list_head spapr_tce_tables;
 #endif
+	void *irqchip_priv;
 };
 
 /*
@@ -359,6 +360,11 @@  struct kvmppc_slb {
 #define KVMPPC_BOOKE_MAX_IAC	4
 #define KVMPPC_BOOKE_MAX_DAC	2
 
+/* KVMPPC_EPR_USER takes precedence over KVMPPC_EPR_KERNEL */
+#define KVMPPC_EPR_NONE		0 /* EPR not supported */
+#define KVMPPC_EPR_USER		1 /* exit to userspace to fill EPR */
+#define KVMPPC_EPR_KERNEL	2 /* in-kernel irqchip */
+
 struct kvmppc_booke_debug_reg {
 	u32 dbcr0;
 	u32 dbcr1;
@@ -520,7 +526,7 @@  struct kvm_vcpu_arch {
 	u8 sane;
 	u8 cpu_type;
 	u8 hcall_needed;
-	u8 epr_enabled;
+	u8 epr_flags; /* KVMPPC_EPR_xxx */
 	u8 epr_needed;
 
 	u32 cpr0_cfgaddr; /* holds the last set cpr0_cfgaddr */
@@ -587,5 +593,6 @@  struct kvm_vcpu_arch {
 #define KVM_MMIO_REG_FQPR	0x0060
 
 #define __KVM_HAVE_ARCH_WQP
+#define __KVM_HAVE_CREATE_DEVICE
 
 #endif /* __POWERPC_KVM_HOST_H__ */
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 44a657a..d46504d 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -165,6 +165,8 @@  extern int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu);
 
 extern int kvm_vm_ioctl_get_htab_fd(struct kvm *kvm, struct kvm_get_htab_fd *);
 
+int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq);
+
 /*
  * Cuts out inst bits with ordering according to spec.
  * That means the leftmost bit is zero. All given bits are included.
@@ -271,6 +273,8 @@  static inline void kvmppc_set_epr(struct kvm_vcpu *vcpu, u32 epr)
 #endif
 }
 
+void kvmppc_mpic_set_epr(struct kvm_vcpu *vcpu);
+
 int kvm_vcpu_ioctl_config_tlb(struct kvm_vcpu *vcpu,
 			      struct kvm_config_tlb *cfg);
 int kvm_vcpu_ioctl_dirty_tlb(struct kvm_vcpu *vcpu,
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 4730c95..18d5e72 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -151,6 +151,11 @@  config KVM_E500MC
 
 	  If unsure, say N.
 
+config KVM_MPIC
+	bool "KVM in-kernel MPIC emulation"
+	depends on KVM
+
+
 source drivers/vhost/Kconfig
 
 endif # VIRTUALIZATION
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index b772ede..4a2277a 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -103,6 +103,8 @@  kvm-book3s_32-objs := \
 	book3s_32_mmu.o
 kvm-objs-$(CONFIG_KVM_BOOK3S_32) := $(kvm-book3s_32-objs)
 
+kvm-objs-$(CONFIG_KVM_MPIC) += mpic.o
+
 kvm-objs := $(kvm-objs-m) $(kvm-objs-y)
 
 obj-$(CONFIG_KVM_440) += kvm.o
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 020923e..8483cb2 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -347,7 +347,7 @@  static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu,
 		keep_irq = true;
 	}
 
-	if ((priority == BOOKE_IRQPRIO_EXTERNAL) && vcpu->arch.epr_enabled)
+	if ((priority == BOOKE_IRQPRIO_EXTERNAL) && vcpu->arch.epr_flags)
 		update_epr = true;
 
 	switch (priority) {
@@ -428,8 +428,12 @@  static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu,
 			set_guest_esr(vcpu, vcpu->arch.queued_esr);
 		if (update_dear == true)
 			set_guest_dear(vcpu, vcpu->arch.queued_dear);
-		if (update_epr == true)
-			kvm_make_request(KVM_REQ_EPR_EXIT, vcpu);
+		if (update_epr == true) {
+			if (vcpu->arch.epr_flags & KVMPPC_EPR_USER)
+				kvm_make_request(KVM_REQ_EPR_EXIT, vcpu);
+			else if (vcpu->arch.epr_flags & KVMPPC_EPR_KERNEL)
+				kvmppc_mpic_set_epr(vcpu);
+		}
 
 		new_msr &= msr_mask;
 #if defined(CONFIG_64BIT)
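
Reader's note (not part of the patch): the EPR dispatch added to booke.c above tests KVMPPC_EPR_USER before KVMPPC_EPR_KERNEL, which is how the "USER takes precedence" comment in kvm_host.h is realized. A minimal standalone sketch of that ordering follows. The flag values are copied from the patch; the enum and the function name are hypothetical and exist only to make the behaviour testable outside the kernel.

```c
#include <assert.h>

/* Values copied from the kvm_host.h hunk of this patch. */
#define KVMPPC_EPR_NONE		0	/* EPR not supported */
#define KVMPPC_EPR_USER		1	/* exit to userspace to fill EPR */
#define KVMPPC_EPR_KERNEL	2	/* in-kernel irqchip */

/* Hypothetical result type standing in for the two kernel calls. */
enum epr_action {
	EPR_ACTION_NONE,		/* nothing to do */
	EPR_ACTION_EXIT_TO_USER,	/* kvm_make_request(KVM_REQ_EPR_EXIT) */
	EPR_ACTION_IN_KERNEL_IACK,	/* kvmppc_mpic_set_epr() */
};

/*
 * Hypothetical helper mirroring the if/else-if chain in
 * kvmppc_booke_irqprio_deliver(): USER is checked first, so it wins
 * when both bits are set.
 */
static inline enum epr_action epr_dispatch(unsigned char epr_flags)
{
	if (epr_flags & KVMPPC_EPR_USER)
		return EPR_ACTION_EXIT_TO_USER;
	if (epr_flags & KVMPPC_EPR_KERNEL)
		return EPR_ACTION_IN_KERNEL_IACK;
	return EPR_ACTION_NONE;
}

/* Usage sketch: both bits set still exits to userspace. */
void epr_example_usage(void)
{
	assert(epr_dispatch(KVMPPC_EPR_USER | KVMPPC_EPR_KERNEL) ==
	       EPR_ACTION_EXIT_TO_USER);
}
```

Because the flags are tested as a bitmask, the earlier `vcpu->arch.epr_flags` truth test in the same function is equivalent to "either mode is enabled".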
diff --git a/arch/powerpc/kvm/mpic.c b/arch/powerpc/kvm/mpic.c
index 1df67ae..27040e4 100644
--- a/arch/powerpc/kvm/mpic.c
+++ b/arch/powerpc/kvm/mpic.c
@@ -23,6 +23,18 @@ 
  * THE SOFTWARE.
  */
 
+#include <linux/slab.h>
+#include <linux/mutex.h>
+#include <linux/kvm_host.h>
+#include <linux/errno.h>
+#include <linux/notifier.h>
+#include <asm/uaccess.h>
+#include <asm/mpic.h>
+#include <asm/kvm_para.h>
+#include <asm/kvm_host.h>
+#include <asm/kvm_ppc.h>
+#include "iodev.h"
+
 #define MAX_CPU     32
 #define MAX_SRC     256
 #define MAX_TMR     4
@@ -89,6 +101,7 @@  static struct fsl_mpic_info fsl_mpic_42 = {
 #define ILR_INTTGT_INT    0x00
 #define ILR_INTTGT_CINT   0x01	/* critical */
 #define ILR_INTTGT_MCP    0x02	/* machine check */
+#define NUM_OUTPUTS       3
 
 #define MSIIR_OFFSET       0x140
 #define MSIIR_SRS_SHIFT    29
@@ -98,18 +111,14 @@  static struct fsl_mpic_info fsl_mpic_42 = {
 
 static int get_current_cpu(void)
 {
-	CPUState *cpu_single_cpu;
-
-	if (!cpu_single_env)
-		return -1;
-
-	cpu_single_cpu = ENV_GET_CPU(cpu_single_env);
-	return cpu_single_cpu->cpu_index;
+	struct kvm_vcpu *vcpu = current->thread.kvm_vcpu;
+	return vcpu ? vcpu->vcpu_id : -1;
 }
 
-static uint32_t openpic_cpu_read_internal(void *opaque, gpa_t addr, int idx);
-static void openpic_cpu_write_internal(void *opaque, gpa_t addr,
-				       uint32_t val, int idx);
+static int openpic_cpu_write_internal(struct kvm_io_device *this, gpa_t addr,
+				      u32 val, int idx);
+static int openpic_cpu_read_internal(struct kvm_io_device *this, gpa_t addr,
+				     u32 *ptr, int idx);
 
 enum irq_type {
 	IRQ_TYPE_NORMAL = 0,
@@ -131,7 +140,7 @@  struct irq_source {
 	uint32_t idr;		/* IRQ destination register */
 	uint32_t destmask;	/* bitmap of CPU destinations */
 	int last_cpu;
-	int output;		/* IRQ level, e.g. OPENPIC_OUTPUT_INT */
+	int output;		/* IRQ level, e.g. ILR_INTTGT_INT */
 	int pending;		/* TRUE if IRQ is pending */
 	enum irq_type type;
 	bool level:1;		/* level-triggered */
@@ -158,16 +167,35 @@  struct irq_source {
 #define IDR_CI      0x40000000	/* critical interrupt */
 
 struct irq_dest {
+	struct kvm_vcpu *vcpu;
+
 	int32_t ctpr;		/* CPU current task priority */
 	struct irq_queue raised;
 	struct irq_queue servicing;
-	qemu_irq *irqs;
 
 	/* Count of IRQ sources asserting on non-INT outputs */
-	uint32_t outputs_active[OPENPIC_OUTPUT_NB];
+	uint32_t outputs_active[NUM_OUTPUTS];
+};
+
+struct openpic;
+
+struct sub_region {
+	struct kvm_io_device iodev;
+	struct openpic *opp;
+	gpa_t base;
+	int size;
 };
 
 struct openpic {
+	struct kvm_device dev;
+	struct kvm *kvm;
+	gpa_t reg_base;
+	spinlock_t lock;
+	struct notifier_block vcpu_notifier;
+
+	struct sub_region sub_io_mem[6];
+	int sub_count;
+
 	/* Behavior control */
 	struct fsl_mpic_info *fsl;
 	uint32_t model;
@@ -208,6 +236,51 @@  struct openpic {
 	uint32_t irq_msi;
 };
 
+
+static void mpic_irq_raise(struct openpic *opp, struct irq_dest *dst,
+			   int output)
+{
+	struct kvm_interrupt irq = {
+		.irq = KVM_INTERRUPT_SET_LEVEL,
+	};
+
+	if (!dst->vcpu) {
+		pr_debug("%s: destination cpu %td does not exist\n",
+			 __func__, dst - &opp->dst[0]);
+		return;
+	}
+
+	pr_debug("%s: cpu %d output %d\n", __func__, dst->vcpu->vcpu_id,
+		output);
+
+	if (output != ILR_INTTGT_INT)	/* TODO */
+		return;
+
+	kvm_vcpu_ioctl_interrupt(dst->vcpu, &irq);
+}
+
+static void mpic_irq_lower(struct openpic *opp, struct irq_dest *dst,
+			   int output)
+{
+	struct kvm_interrupt irq = {
+		.irq = KVM_INTERRUPT_UNSET,
+	};
+
+	if (!dst->vcpu) {
+		pr_debug("%s: destination cpu %td does not exist\n",
+			 __func__, dst - &opp->dst[0]);
+		return;
+	}
+
+	pr_debug("%s: cpu %d output %d\n", __func__, dst->vcpu->vcpu_id,
+		output);
+
+	if (output != ILR_INTTGT_INT)	/* TODO */
+		return;
+
+	kvmppc_core_dequeue_external(dst->vcpu, &irq);
+}
+
 static inline void IRQ_setbit(struct irq_queue *q, int n_IRQ)
 {
 	set_bit(n_IRQ, q->queue);
@@ -268,7 +341,7 @@  static void IRQ_local_pipe(struct openpic *opp, int n_CPU, int n_IRQ,
 	pr_debug("%s: IRQ %d active %d was %d\n",
 		__func__, n_IRQ, active, was_active);
 
-	if (src->output != OPENPIC_OUTPUT_INT) {
+	if (src->output != ILR_INTTGT_INT) {
 		pr_debug("%s: output %d irq %d active %d was %d count %d\n",
 			__func__, src->output, n_IRQ, active, was_active,
 			dst->outputs_active[src->output]);
@@ -282,14 +355,14 @@  static void IRQ_local_pipe(struct openpic *opp, int n_CPU, int n_IRQ,
 			    dst->outputs_active[src->output]++ == 0) {
 				pr_debug("%s: Raise OpenPIC output %d cpu %d irq %d\n",
 					__func__, src->output, n_CPU, n_IRQ);
-				qemu_irq_raise(dst->irqs[src->output]);
+				mpic_irq_raise(opp, dst, src->output);
 			}
 		} else {
 			if (was_active &&
 			    --dst->outputs_active[src->output] == 0) {
 				pr_debug("%s: Lower OpenPIC output %d cpu %d irq %d\n",
 					__func__, src->output, n_CPU, n_IRQ);
-				qemu_irq_lower(dst->irqs[src->output]);
+				mpic_irq_lower(opp, dst, src->output);
 			}
 		}
 
@@ -322,8 +395,7 @@  static void IRQ_local_pipe(struct openpic *opp, int n_CPU, int n_IRQ,
 		} else {
 			pr_debug("%s: Raise OpenPIC INT output cpu %d irq %d/%d\n",
 				__func__, n_CPU, n_IRQ, dst->raised.next);
-			qemu_irq_raise(opp->dst[n_CPU].
-				       irqs[OPENPIC_OUTPUT_INT]);
+			mpic_irq_raise(opp, dst, ILR_INTTGT_INT);
 		}
 	} else {
 		IRQ_get_next(opp, &dst->servicing);
@@ -338,8 +410,7 @@  static void IRQ_local_pipe(struct openpic *opp, int n_CPU, int n_IRQ,
 			pr_debug("%s: IRQ %d inactive, current prio %d/%d, CPU %d\n",
 				__func__, n_IRQ, dst->ctpr,
 				dst->servicing.priority, n_CPU);
-			qemu_irq_lower(opp->dst[n_CPU].
-				       irqs[OPENPIC_OUTPUT_INT]);
+			mpic_irq_lower(opp, dst, ILR_INTTGT_INT);
 		}
 	}
 }
@@ -415,8 +486,8 @@  static void openpic_set_irq(void *opaque, int n_IRQ, int level)
 	struct irq_source *src;
 
 	if (n_IRQ >= MAX_IRQ) {
-		pr_err("%s: IRQ %d out of range\n", __func__, n_IRQ);
-		abort();
+		WARN_ONCE(1, "%s: IRQ %d out of range\n", __func__, n_IRQ);
+		return;
 	}
 
 	src = &opp->src[n_IRQ];
@@ -433,7 +504,7 @@  static void openpic_set_irq(void *opaque, int n_IRQ, int level)
 			openpic_update_irq(opp, n_IRQ);
 		}
 
-		if (src->output != OPENPIC_OUTPUT_INT) {
+		if (src->output != ILR_INTTGT_INT) {
 			/* Edge-triggered interrupts shouldn't be used
 			 * with non-INT delivery, but just in case,
 			 * try to make it do something sane rather than
@@ -446,15 +517,14 @@  static void openpic_set_irq(void *opaque, int n_IRQ, int level)
 	}
 }
 
-static void openpic_reset(DeviceState *d)
+static void openpic_reset(struct openpic *opp)
 {
-	struct openpic *opp = FROM_SYSBUS(typeof(*opp), SYS_BUS_DEVICE(d));
 	int i;
 
 	opp->gcr = GCR_RESET;
+
 	/* Initialise controller registers */
 	opp->frr = ((opp->nb_irqs - 1) << FRR_NIRQ_SHIFT) |
-	    ((opp->nb_cpus - 1) << FRR_NCPU_SHIFT) |
 	    (opp->vid << FRR_VID_SHIFT);
 
 	opp->pir = 0;
@@ -504,7 +574,7 @@  static inline uint32_t read_IRQreg_idr(struct openpic *opp, int n_IRQ)
 static inline uint32_t read_IRQreg_ilr(struct openpic *opp, int n_IRQ)
 {
 	if (opp->flags & OPENPIC_FLAG_ILR)
-		return output_to_inttgt(opp->src[n_IRQ].output);
+		return opp->src[n_IRQ].output;
 
 	return 0xffffffff;
 }
@@ -539,7 +609,7 @@  static inline void write_IRQreg_idr(struct openpic *opp, int n_IRQ,
 					__func__);
 			}
 
-			src->output = OPENPIC_OUTPUT_CINT;
+			src->output = ILR_INTTGT_CINT;
 			src->nomask = true;
 			src->destmask = 0;
 
@@ -550,7 +620,7 @@  static inline void write_IRQreg_idr(struct openpic *opp, int n_IRQ,
 					src->destmask |= 1UL << i;
 			}
 		} else {
-			src->output = OPENPIC_OUTPUT_INT;
+			src->output = ILR_INTTGT_INT;
 			src->nomask = false;
 			src->destmask = src->idr & normal_mask;
 		}
@@ -565,7 +635,7 @@  static inline void write_IRQreg_ilr(struct openpic *opp, int n_IRQ,
 	if (opp->flags & OPENPIC_FLAG_ILR) {
 		struct irq_source *src = &opp->src[n_IRQ];
 
-		src->output = inttgt_to_output(val & ILR_INTTGT_MASK);
+		src->output = val & ILR_INTTGT_MASK;
 		pr_debug("Set ILR %d to 0x%08x, output %d\n", n_IRQ, src->idr,
 			src->output);
 
@@ -614,34 +684,77 @@  static inline void write_IRQreg_ivpr(struct openpic *opp, int n_IRQ,
 
 static void openpic_gcr_write(struct openpic *opp, uint64_t val)
 {
+#if 0
 	bool mpic_proxy = false;
+#endif
 
 	if (val & GCR_RESET) {
-		openpic_reset(&opp->busdev.qdev);
+		openpic_reset(opp);
 		return;
 	}
 
 	opp->gcr &= ~opp->mpic_mode_mask;
 	opp->gcr |= val & opp->mpic_mode_mask;
-
+#if 0
 	/* Set external proxy mode */
 	if ((val & opp->mpic_mode_mask) == GCR_MODE_PROXY)
 		mpic_proxy = true;
 
 	ppce500_set_mpic_proxy(mpic_proxy);
+#endif
 }
 
-static void openpic_gbl_write(void *opaque, gpa_t addr, uint64_t val,
-			      unsigned len)
+static int openpic_get_val32(int len, const void *ptr, u32 *val)
 {
-	struct openpic *opp = opaque;
+	if (len != 4) {
+		pr_debug("%s: bad length %d\n", __func__, len);
+		return -EINVAL;
+	}
+
+	memcpy(val, ptr, min(len, 4));
+	return 0;
+}
+
+static int openpic_put_val32(int len, void *ptr, u32 val)
+{
+	/*
+	 * Technically only 32-bit accesses are allowed, but be nice
+	 * to people dumping registers -- it works in real hardware
+	 * (reads only, not writes).
+	 */
+	if (len > 4) {
+		pr_debug("%s: bad length %d\n", __func__, len);
+		return -EINVAL;
+	}
+
+	memcpy(ptr, &val, min(len, 4));
+	return 0;
+}
+
+static int openpic_gbl_write(struct kvm_io_device *this, gpa_t addr,
+			     int len, const void *ptr)
+{
+	struct sub_region *sub = container_of(this, struct sub_region, iodev);
+	struct openpic *opp = sub->opp;
+#if 0
 	struct irq_dest *dst;
-	int idx;
+#endif
+	u32 val;
+	int ret, idx;
+
+	addr -= sub->base;
+	if (addr > sub->size)
+		return 1;
 
-	pr_debug("%s: addr %#" HWADDR_PRIx " <= %08" PRIx64 "\n",
-		__func__, addr, val);
+	ret = openpic_get_val32(len, ptr, &val);
+	if (ret < 0)
+		return 1;
+
+	pr_debug("%s: addr %#llx <= %08x\n", __func__, addr, val);
 	if (addr & 0xF)
-		return;
+		return 0;
+
+	spin_lock_irq(&opp->lock);
 
 	switch (addr) {
 	case 0x00:	/* Block Revision Register1 (BRR1) is Readonly */
@@ -654,7 +767,7 @@  static void openpic_gbl_write(void *opaque, gpa_t addr, uint64_t val,
 	case 0x90:
 	case 0xA0:
 	case 0xB0:
-		openpic_cpu_write_internal(opp, addr, val, get_current_cpu());
+		openpic_cpu_write_internal(this, addr, val, get_current_cpu());
 		break;
 	case 0x1000:		/* FRR */
 		break;
@@ -668,14 +781,18 @@  static void openpic_gbl_write(void *opaque, gpa_t addr, uint64_t val,
 			if ((val & (1 << idx)) && !(opp->pir & (1 << idx))) {
 				pr_debug("Raise OpenPIC RESET output for CPU %d\n",
 					idx);
+#if 0
 				dst = &opp->dst[idx];
-				qemu_irq_raise(dst->irqs[OPENPIC_OUTPUT_RESET]);
+				mpic_irq_raise(opp, dst, OPENPIC_OUTPUT_RESET);
+#endif
 			} else if (!(val & (1 << idx)) &&
 				   (opp->pir & (1 << idx))) {
 				pr_debug("Lower OpenPIC RESET output for CPU %d\n",
 					idx);
+#if 0
 				dst = &opp->dst[idx];
-				qemu_irq_lower(dst->irqs[OPENPIC_OUTPUT_RESET]);
+				mpic_irq_lower(opp, dst, OPENPIC_OUTPUT_RESET);
+#endif
 			}
 		}
 		opp->pir = val;
@@ -695,21 +812,34 @@  static void openpic_gbl_write(void *opaque, gpa_t addr, uint64_t val,
 	default:
 		break;
 	}
+
+	spin_unlock_irq(&opp->lock);
+	return 0;
 }
 
-static uint64_t openpic_gbl_read(void *opaque, gpa_t addr, unsigned len)
+static int openpic_gbl_read(struct kvm_io_device *this, gpa_t addr,
+			    int len, void *ptr)
 {
-	struct openpic *opp = opaque;
-	uint32_t retval;
+	struct sub_region *sub = container_of(this, struct sub_region, iodev);
+	struct openpic *opp = sub->opp;
+	u32 retval;
+	int ret;
+
+	addr -= sub->base;
+	if (addr > sub->size)
+		return 1;
 
-	pr_debug("%s: addr %#" HWADDR_PRIx "\n", __func__, addr);
+	pr_debug("%s: addr %#llx\n", __func__, addr);
 	retval = 0xFFFFFFFF;
 	if (addr & 0xF)
-		return retval;
+		goto out;
+
+	spin_lock_irq(&opp->lock);
 
 	switch (addr) {
 	case 0x1000:		/* FRR */
 		retval = opp->frr;
+		retval |= (opp->nb_cpus - 1) << FRR_NCPU_SHIFT;
 		break;
 	case 0x1020:		/* GCR */
 		retval = opp->gcr;
@@ -731,8 +861,8 @@  static uint64_t openpic_gbl_read(void *opaque, gpa_t addr, unsigned len)
 	case 0x90:
 	case 0xA0:
 	case 0xB0:
-		retval =
-		    openpic_cpu_read_internal(opp, addr, get_current_cpu());
+		openpic_cpu_read_internal(this, addr,
+			&retval, get_current_cpu());
 		break;
 	case 0x10A0:		/* IPI_IVPR */
 	case 0x10B0:
@@ -750,33 +880,51 @@  static uint64_t openpic_gbl_read(void *opaque, gpa_t addr, unsigned len)
 	default:
 		break;
 	}
+
+	spin_unlock_irq(&opp->lock);
+out:
 	pr_debug("%s: => 0x%08x\n", __func__, retval);
 
-	return retval;
+	ret = openpic_put_val32(len, ptr, retval);
+	if (ret < 0)
+		return 1;
+
+	return 0;
 }
 
-static void openpic_tmr_write(void *opaque, gpa_t addr, uint64_t val,
-			      unsigned len)
+static int openpic_tmr_write(struct kvm_io_device *this, gpa_t addr,
+			     int len, const void *ptr)
 {
-	struct openpic *opp = opaque;
-	int idx;
+	struct sub_region *sub = container_of(this, struct sub_region, iodev);
+	struct openpic *opp = sub->opp;
+	u32 val;
+	int ret, idx;
+
+	addr -= sub->base;
+	if (addr > sub->size)
+		return 1;
+
+	ret = openpic_get_val32(len, ptr, &val);
+	if (ret < 0)
+		return 1;
 
 	addr += 0x10f0;
 
-	pr_debug("%s: addr %#" HWADDR_PRIx " <= %08" PRIx64 "\n",
-		__func__, addr, val);
+	pr_debug("%s: addr %#llx <= %08x\n", __func__, addr, val);
 	if (addr & 0xF)
-		return;
+		return 0;
 
 	if (addr == 0x10f0) {
 		/* TFRR */
 		opp->tfrr = val;
-		return;
+		return 0;
 	}
 
 	idx = (addr >> 6) & 0x3;
 	addr = addr & 0x30;
 
+	spin_lock_irq(&opp->lock);
+
 	switch (addr & 0x30) {
 	case 0x00:		/* TCCR */
 		break;
@@ -795,15 +943,25 @@  static void openpic_tmr_write(void *opaque, gpa_t addr, uint64_t val,
 		write_IRQreg_idr(opp, opp->irq_tim0 + idx, val);
 		break;
 	}
+
+	spin_unlock_irq(&opp->lock);
+
+	return 0;
 }
 
-static uint64_t openpic_tmr_read(void *opaque, gpa_t addr, unsigned len)
+static int openpic_tmr_read(struct kvm_io_device *this, gpa_t addr,
+				 int len, void *ptr)
 {
-	struct openpic *opp = opaque;
+	struct sub_region *sub = container_of(this, struct sub_region, iodev);
+	struct openpic *opp = sub->opp;
 	uint32_t retval = -1;
-	int idx;
+	int ret, idx;
+
+	addr -= sub->base;
+	if (addr > sub->size)
+		return 1;
 
-	pr_debug("%s: addr %#" HWADDR_PRIx "\n", __func__, addr);
+	pr_debug("%s: addr %#llx\n", __func__, addr);
 	if (addr & 0xF)
 		goto out;
 
@@ -813,6 +971,9 @@  static uint64_t openpic_tmr_read(void *opaque, gpa_t addr, unsigned len)
 		retval = opp->tfrr;
 		goto out;
 	}
+
+	spin_lock_irq(&opp->lock);
+
 	switch (addr & 0x30) {
 	case 0x00:		/* TCCR */
 		retval = opp->timers[idx].tccr;
@@ -828,24 +989,40 @@  static uint64_t openpic_tmr_read(void *opaque, gpa_t addr, unsigned len)
 		break;
 	}
 
+	spin_unlock_irq(&opp->lock);
 out:
 	pr_debug("%s: => 0x%08x\n", __func__, retval);
 
-	return retval;
+	ret = openpic_put_val32(len, ptr, retval);
+	if (ret < 0)
+		return 1;
+
+	return 0;
 }
 
-static void openpic_src_write(void *opaque, gpa_t addr, uint64_t val,
-			      unsigned len)
+static int openpic_src_write(struct kvm_io_device *this, gpa_t addr,
+			     int len, const void *ptr)
 {
-	struct openpic *opp = opaque;
-	int idx;
+	struct sub_region *sub = container_of(this, struct sub_region, iodev);
+	struct openpic *opp = sub->opp;
+	u32 val;
+	int ret, idx;
 
-	pr_debug("%s: addr %#" HWADDR_PRIx " <= %08" PRIx64 "\n",
-		__func__, addr, val);
+	addr -= sub->base;
+	if (addr > sub->size)
+		return 1;
+
+	ret = openpic_get_val32(len, ptr, &val);
+	if (ret < 0)
+		return 1;
+
+	pr_debug("%s: addr %#llx <= %08x\n", __func__, addr, val);
 
 	addr = addr & 0xffff;
 	idx = addr >> 5;
 
+	spin_lock_irq(&opp->lock);
+
 	switch (addr & 0x1f) {
 	case 0x00:
 		write_IRQreg_ivpr(opp, idx, val);
@@ -857,20 +1034,32 @@  static void openpic_src_write(void *opaque, gpa_t addr, uint64_t val,
 		write_IRQreg_ilr(opp, idx, val);
 		break;
 	}
+
+
+	spin_unlock_irq(&opp->lock);
+	return 0;
 }
 
-static uint64_t openpic_src_read(void *opaque, uint64_t addr, unsigned len)
+static int openpic_src_read(struct kvm_io_device *this, uint64_t addr,
+			    int len, void *ptr)
 {
-	struct openpic *opp = opaque;
+	struct sub_region *sub = container_of(this, struct sub_region, iodev);
+	struct openpic *opp = sub->opp;
 	uint32_t retval;
-	int idx;
+	int ret, idx;
+
+	addr -= sub->base;
+	if (addr > sub->size)
+		return 1;
 
-	pr_debug("%s: addr %#" HWADDR_PRIx "\n", __func__, addr);
+	pr_debug("%s: addr %#llx\n", __func__, addr);
 	retval = 0xFFFFFFFF;
 
 	addr = addr & 0xffff;
 	idx = addr >> 5;
 
+	spin_lock_irq(&opp->lock);
+
 	switch (addr & 0x1f) {
 	case 0x00:
 		retval = read_IRQreg_ivpr(opp, idx);
@@ -883,21 +1072,38 @@  static uint64_t openpic_src_read(void *opaque, uint64_t addr, unsigned len)
 		break;
 	}
 
+	spin_unlock_irq(&opp->lock);
 	pr_debug("%s: => 0x%08x\n", __func__, retval);
-	return retval;
+
+	ret = openpic_put_val32(len, ptr, retval);
+	if (ret < 0)
+		return 1;
+
+	return 0;
 }
 
-static void openpic_msi_write(void *opaque, gpa_t addr, uint64_t val,
-			      unsigned size)
+static int openpic_msi_write(struct kvm_io_device *this, gpa_t addr,
+			     int len, const void *ptr)
 {
-	struct openpic *opp = opaque;
+	struct sub_region *sub = container_of(this, struct sub_region, iodev);
+	struct openpic *opp = sub->opp;
+	u32 val;
 	int idx = opp->irq_msi;
-	int srs, ibs;
+	int srs, ibs, ret;
+
+	addr -= sub->base;
+	if (addr > sub->size)
+		return 1;
+
+	ret = openpic_get_val32(len, ptr, &val);
+	if (ret < 0)
+		return 1;
 
-	pr_debug("%s: addr %#" HWADDR_PRIx " <= 0x%08" PRIx64 "\n",
-		__func__, addr, val);
+	pr_debug("%s: addr %#llx <= 0x%08x\n", __func__, addr, val);
 	if (addr & 0xF)
-		return;
+		return 0;
+
+	spin_lock_irq(&opp->lock);
 
 	switch (addr) {
 	case MSIIR_OFFSET:
@@ -911,20 +1117,31 @@  static void openpic_msi_write(void *opaque, gpa_t addr, uint64_t val,
 		/* most registers are read-only, thus ignored */
 		break;
 	}
+
+	spin_unlock_irq(&opp->lock);
+	return 0;
 }
 
-static uint64_t openpic_msi_read(void *opaque, gpa_t addr, unsigned size)
+static int openpic_msi_read(struct kvm_io_device *this, gpa_t addr,
+			    int len, void *ptr)
 {
-	struct openpic *opp = opaque;
-	uint64_t r = 0;
-	int i, srs;
+	struct sub_region *sub = container_of(this, struct sub_region, iodev);
+	struct openpic *opp = sub->opp;
+	uint32_t r = 0;
+	int i, srs, ret;
+
+	addr -= sub->base;
+	if (addr > sub->size)
+		return 1;
 
-	pr_debug("%s: addr %#" HWADDR_PRIx "\n", __func__, addr);
+	pr_debug("%s: addr %#llx\n", __func__, addr);
 	if (addr & 0xF)
-		return -1;
+		return 1;
 
 	srs = addr >> 4;
 
+	spin_lock_irq(&opp->lock);
+
 	switch (addr) {
 	case 0x00:
 	case 0x10:
@@ -945,45 +1162,76 @@  static uint64_t openpic_msi_read(void *opaque, gpa_t addr, unsigned size)
 		break;
 	}
 
-	return r;
+	spin_unlock_irq(&opp->lock);
+	pr_debug("%s: => 0x%08x\n", __func__, r);
+
+	ret = openpic_put_val32(len, ptr, r);
+	if (ret < 0)
+		return 1;
+
+	return 0;
 }
 
-static uint64_t openpic_summary_read(void *opaque, gpa_t addr, unsigned size)
+static int openpic_summary_read(struct kvm_io_device *this, gpa_t addr,
+				int len, void *ptr)
 {
-	uint64_t r = 0;
+	struct sub_region *sub = container_of(this, struct sub_region, iodev);
+	uint32_t r = 0;
+	int ret;
 
-	pr_debug("%s: addr %#" HWADDR_PRIx "\n", __func__, addr);
+	addr -= sub->base;
+	if (addr > sub->size)
+		return 1;
+
+	pr_debug("%s: addr %#llx\n", __func__, addr);
 
 	/* TODO: EISR/EIMR */
 
-	return r;
+	ret = openpic_put_val32(len, ptr, r);
+	if (ret < 0)
+		return 1;
+
+	return 0;
 }
 
-static void openpic_summary_write(void *opaque, gpa_t addr, uint64_t val,
-				  unsigned size)
+static int openpic_summary_write(struct kvm_io_device *this, gpa_t addr,
+				 int len, const void *ptr)
 {
-	pr_debug("%s: addr %#" HWADDR_PRIx " <= 0x%08" PRIx64 "\n",
-		__func__, addr, val);
+	struct sub_region *sub = container_of(this, struct sub_region, iodev);
+	int ret;
+	uint32_t val;
+
+	addr -= sub->base;
+	if (addr > sub->size)
+		return 1;
+
+	ret = openpic_get_val32(len, ptr, &val);
+	if (ret < 0)
+		return 1;
+
+	pr_debug("%s: addr %#llx <= 0x%08x\n", __func__, addr, val);
 
 	/* TODO: EISR/EIMR */
+	return 0;
 }
 
-static void openpic_cpu_write_internal(void *opaque, gpa_t addr,
-				       uint32_t val, int idx)
+static int openpic_cpu_write_internal(struct kvm_io_device *this, gpa_t addr,
+				      u32 val, int idx)
 {
-	struct openpic *opp = opaque;
+	struct sub_region *sub = container_of(this, struct sub_region, iodev);
+	struct openpic *opp = sub->opp;
 	struct irq_source *src;
 	struct irq_dest *dst;
 	int s_IRQ, n_IRQ;
 
-	pr_debug("%s: cpu %d addr %#" HWADDR_PRIx " <= 0x%08x\n", __func__, idx,
+	pr_debug("%s: cpu %d addr %#llx <= 0x%08x\n", __func__, idx,
 		addr, val);
 
 	if (idx < 0)
-		return;
+		return 0;
 
 	if (addr & 0xF)
-		return;
+		return 0;
 
 	dst = &opp->dst[idx];
 	addr &= 0xFF0;
@@ -1008,11 +1256,11 @@  static void openpic_cpu_write_internal(void *opaque, gpa_t addr,
 		if (dst->raised.priority <= dst->ctpr) {
 			pr_debug("%s: Lower OpenPIC INT output cpu %d due to ctpr\n",
 				__func__, idx);
-			qemu_irq_lower(dst->irqs[OPENPIC_OUTPUT_INT]);
+			mpic_irq_lower(opp, dst, ILR_INTTGT_INT);
 		} else if (dst->raised.priority > dst->servicing.priority) {
 			pr_debug("%s: Raise OpenPIC INT output cpu %d irq %d\n",
 				__func__, idx, dst->raised.next);
-			qemu_irq_raise(dst->irqs[OPENPIC_OUTPUT_INT]);
+			mpic_irq_raise(opp, dst, ILR_INTTGT_INT);
 		}
 
 		break;
@@ -1043,18 +1291,38 @@  static void openpic_cpu_write_internal(void *opaque, gpa_t addr,
 		     IVPR_PRIORITY(src->ivpr) > dst->servicing.priority)) {
 			pr_debug("Raise OpenPIC INT output cpu %d irq %d\n",
 				idx, n_IRQ);
-			qemu_irq_raise(opp->dst[idx].irqs[OPENPIC_OUTPUT_INT]);
+			mpic_irq_raise(opp, dst, ILR_INTTGT_INT);
 		}
 		break;
 	default:
 		break;
 	}
+
+	return 0;
 }
 
-static void openpic_cpu_write(void *opaque, gpa_t addr, uint64_t val,
-			      unsigned len)
+static int openpic_cpu_write(struct kvm_io_device *this, gpa_t addr,
+			     int len, const void *ptr)
 {
-	openpic_cpu_write_internal(opaque, addr, val, (addr & 0x1f000) >> 12);
+	struct sub_region *sub = container_of(this, struct sub_region, iodev);
+	struct openpic *opp = sub->opp;
+	u32 val;
+	int ret;
+
+	addr -= sub->base;
+	if (addr > sub->size)
+		return 1;
+
+	ret = openpic_get_val32(len, ptr, &val);
+	if (ret < 0)
+		return 1;
+
+	spin_lock_irq(&opp->lock);
+	ret = openpic_cpu_write_internal(this, addr, val,
+					 (addr & 0x1f000) >> 12);
+
+	spin_unlock_irq(&opp->lock);
+	return ret;
 }
 
 static uint32_t openpic_iack(struct openpic *opp, struct irq_dest *dst,
@@ -1064,7 +1332,7 @@  static uint32_t openpic_iack(struct openpic *opp, struct irq_dest *dst,
 	int retval, irq;
 
 	pr_debug("Lower OpenPIC INT output\n");
-	qemu_irq_lower(dst->irqs[OPENPIC_OUTPUT_INT]);
+	mpic_irq_lower(opp, dst, ILR_INTTGT_INT);
 
 	irq = IRQ_get_next(opp, &dst->raised);
 	pr_debug("IACK: irq=%d\n", irq);
@@ -1107,20 +1375,37 @@  static uint32_t openpic_iack(struct openpic *opp, struct irq_dest *dst,
 	return retval;
 }
 
-static uint32_t openpic_cpu_read_internal(void *opaque, gpa_t addr, int idx)
+void kvmppc_mpic_set_epr(struct kvm_vcpu *vcpu)
 {
-	struct openpic *opp = opaque;
+	struct kvm *kvm = vcpu->kvm;
+	struct openpic *opp = kvm->arch.irqchip_priv;
+	int cpu = vcpu->vcpu_id;
+	unsigned long flags;
+
+	spin_lock_irqsave(&opp->lock, flags);
+
+	if ((opp->gcr & opp->mpic_mode_mask) == GCR_MODE_PROXY)
+		kvmppc_set_epr(vcpu, openpic_iack(opp, &opp->dst[cpu], cpu));
+
+	spin_unlock_irqrestore(&opp->lock, flags);
+}
+
+static int openpic_cpu_read_internal(struct kvm_io_device *this, gpa_t addr,
+				     u32 *ptr, int idx)
+{
+	struct sub_region *sub = container_of(this, struct sub_region, iodev);
+	struct openpic *opp = sub->opp;
 	struct irq_dest *dst;
 	uint32_t retval;
 
-	pr_debug("%s: cpu %d addr %#" HWADDR_PRIx "\n", __func__, idx, addr);
+	pr_debug("%s: cpu %d addr %#llx\n", __func__, idx, addr);
 	retval = 0xFFFFFFFF;
 
 	if (idx < 0)
-		return retval;
+		goto out;
 
 	if (addr & 0xF)
-		return retval;
+		goto out;
 
 	dst = &opp->dst[idx];
 	addr &= 0xFF0;
@@ -1142,12 +1427,35 @@  static uint32_t openpic_cpu_read_internal(void *opaque, gpa_t addr, int idx)
 	}
 	pr_debug("%s: => 0x%08x\n", __func__, retval);
 
-	return retval;
+out:
+	*ptr = retval;
+	return 0;
 }
 
-static uint64_t openpic_cpu_read(void *opaque, gpa_t addr, unsigned len)
+static int openpic_cpu_read(struct kvm_io_device *this, gpa_t addr,
+			    int len, void *ptr)
 {
-	return openpic_cpu_read_internal(opaque, addr, (addr & 0x1f000) >> 12);
+	struct sub_region *sub = container_of(this, struct sub_region, iodev);
+	struct openpic *opp = sub->opp;
+	int ret;
+	u32 val;
+
+	addr -= sub->base;
+	if (addr >= sub->size)
+		return 1;
+
+	spin_lock_irq(&opp->lock);
+	ret = openpic_cpu_read_internal(this, addr, &val,
+					(addr & 0x1f000) >> 12);
+	spin_unlock_irq(&opp->lock);
+	if (ret < 0)
+		return 1;
+
+	ret = openpic_put_val32(len, ptr, val);
+	if (ret < 0)
+		return 1;
+
+	return 0;
 }
 
 static const struct kvm_io_device_ops openpic_glb_ops_be = {
@@ -1205,11 +1513,10 @@  static void fsl_common_init(struct openpic *opp)
 	opp->irq_tim0 = virq;
 	virq += MAX_TMR;
 
-	assert(virq <= MAX_IRQ);
+	BUG_ON(virq > MAX_IRQ);
 
 	opp->irq_msi = 224;
 
-	msi_supported = true;
 	for (i = 0; i < opp->fsl->max_ext; i++)
 		opp->src[i].level = false;
 
@@ -1226,39 +1533,55 @@  static void fsl_common_init(struct openpic *opp)
 	}
 }
 
-static void map_list(struct openpic *opp, const struct mem_reg *list,
-		     int *count)
+static void map_list(struct openpic *opp, const struct mem_reg *list)
 {
+	mutex_lock(&opp->kvm->slots_lock);
+
 	while (list->name) {
-		assert(*count < ARRAY_SIZE(opp->sub_io_mem));
+		struct sub_region *sub;
+
+		BUG_ON(opp->sub_count >= ARRAY_SIZE(opp->sub_io_mem));
 
-		memory_region_init_io(&opp->sub_io_mem[*count], list->ops, opp,
-				      list->name, list->size);
+		sub = &opp->sub_io_mem[opp->sub_count];
+		sub->opp = opp;
+		sub->base = opp->reg_base + list->start_addr;
+		sub->size = list->size;
 
-		memory_region_add_subregion(&opp->mem, list->start_addr,
-					    &opp->sub_io_mem[*count]);
+		kvm_iodevice_init(&sub->iodev, list->ops);
 
-		(*count)++;
+		kvm_io_bus_register_dev(opp->kvm, KVM_MMIO_BUS,
+			opp->reg_base + list->start_addr, list->size,
+			&sub->iodev);
+
+		opp->sub_count++;
 		list++;
 	}
+
+	mutex_unlock(&opp->kvm->slots_lock);
+}
+
+static void unmap_all(struct openpic *opp)
+{
+	int i;
+
+	mutex_lock(&opp->kvm->slots_lock);
+
+	for (i = 0; i < opp->sub_count; i++) {
+		kvm_io_bus_unregister_dev(opp->kvm, KVM_MMIO_BUS,
+			&opp->sub_io_mem[i].iodev);
+	}
+
+	mutex_unlock(&opp->kvm->slots_lock);
+
+	opp->sub_count = 0;
 }
 
-static int openpic_init(SysBusDevice *dev)
+static int set_base_addr(struct kvm *kvm, struct kvm_device *dev,
+			 struct kvm_device_attr *attr)
 {
-	struct openpic *opp = FROM_SYSBUS(typeof(*opp), dev);
-	int i, j;
-	int list_count = 0;
-	static const struct mem_reg list_le[] = {
-		{"glb", &openpic_glb_ops_le,
-		 OPENPIC_GLB_REG_START, OPENPIC_GLB_REG_SIZE},
-		{"tmr", &openpic_tmr_ops_le,
-		 OPENPIC_TMR_REG_START, OPENPIC_TMR_REG_SIZE},
-		{"src", &openpic_src_ops_le,
-		 OPENPIC_SRC_REG_START, OPENPIC_SRC_REG_SIZE},
-		{"cpu", &openpic_cpu_ops_le,
-		 OPENPIC_CPU_REG_START, OPENPIC_CPU_REG_SIZE},
-		{NULL}
-	};
+	struct openpic *opp = container_of(dev, struct openpic, dev);
+	u64 base;
+
 	static const struct mem_reg list_be[] = {
 		{"glb", &openpic_glb_ops_be,
 		 OPENPIC_GLB_REG_START, OPENPIC_GLB_REG_SIZE},
@@ -1278,11 +1601,239 @@  static int openpic_init(SysBusDevice *dev)
 		{NULL}
 	};
 
-	memory_region_init(&opp->mem, "openpic", 0x40000);
+	if (copy_from_user(&base, (u64 __user *)(long)attr->addr, sizeof(u64)))
+		return -EFAULT;
+
+	if (base & 0x3ffff) {
+		pr_debug("kvm mpic %s: KVM_DEV_MPIC_BASE_ADDR %08llx not aligned\n",
+			 __func__, base);
+		return -EINVAL;
+	}
+
+	if (base == opp->reg_base)
+		return 0;
+
+	unmap_all(opp);
+	opp->reg_base = base;
+
+	pr_debug("kvm mpic %s: KVM_DEV_MPIC_BASE_ADDR %08llx\n",
+		 __func__, base);
+
+	if (base == 0)
+		return 0;
 
 	switch (opp->model) {
-	case OPENPIC_MODEL_FSL_MPIC_20:
+	case KVM_DEV_TYPE_FSL_MPIC_20:
+		map_list(opp, list_be);
+		map_list(opp, list_fsl);
+
+		break;
+
+	case KVM_DEV_TYPE_FSL_MPIC_42:
+		map_list(opp, list_be);
+		map_list(opp, list_fsl);
+
+		break;
+
 	default:
+		WARN_ON_ONCE(1);
+	}
+
+	return 0;
+}
+
+#define ATTR_SET		0
+#define ATTR_GET		1
+
+static int access_reg(struct openpic *opp, gpa_t addr, u32 *val, int type)
+{
+	int ret;
+
+	if (!opp->sub_count)
+		return -EPERM;
+
+	if (addr & 3)
+		return -ENXIO;
+
+	if (addr >= 0x40000)
+		return -ENXIO;
+
+	addr += opp->reg_base;
+
+	mutex_lock(&opp->kvm->slots_lock);
+
+	if (type == ATTR_SET)
+		ret = kvm_io_bus_write(opp->kvm, KVM_MMIO_BUS, addr, 4, val);
+	else
+		ret = kvm_io_bus_read(opp->kvm, KVM_MMIO_BUS, addr, 4, val);
+
+	mutex_unlock(&opp->kvm->slots_lock);
+
+	pr_debug("%s: type %d addr %llx val %x\n", __func__, type, addr, *val);
+
+	return ret;
+}
+
+static int mpic_set_attr(struct kvm *kvm, struct kvm_device *dev,
+			 struct kvm_device_attr *attr)
+{
+	struct openpic *opp = container_of(dev, struct openpic, dev);
+	u32 attr32;
+
+	switch (attr->group) {
+	case KVM_DEV_MPIC_GRP_MISC:
+		switch (attr->attr) {
+		case KVM_DEV_MPIC_BASE_ADDR:
+			return set_base_addr(kvm, dev, attr);
+		}
+
+		break;
+
+	case KVM_DEV_MPIC_GRP_REGISTER:
+		if (copy_from_user(&attr32, (u32 __user *)(long)attr->addr,
+				   sizeof(u32)))
+			return -EFAULT;
+
+		return access_reg(opp, attr->attr, &attr32, ATTR_SET);
+
+	case KVM_DEV_MPIC_GRP_IRQ_ACTIVE:
+		if (attr->attr > MAX_SRC)
+			return -EINVAL;
+
+		if (copy_from_user(&attr32, (u32 __user *)(long)attr->addr,
+				   sizeof(u32)))
+			return -EFAULT;
+
+		if (attr32 != 0 && attr32 != 1)
+			return -EINVAL;
+
+		spin_lock_irq(&opp->lock);
+		openpic_set_irq(opp, attr->attr, attr32);
+		spin_unlock_irq(&opp->lock);
+		return 0;
+	}
+
+	return -ENXIO;
+}
+
+static int mpic_get_attr(struct kvm *kvm, struct kvm_device *dev,
+			 struct kvm_device_attr *attr)
+{
+	struct openpic *opp = container_of(dev, struct openpic, dev);
+	u64 attr64;
+	u32 attr32;
+	int ret;
+
+	switch (attr->group) {
+	case KVM_DEV_MPIC_GRP_MISC:
+		switch (attr->attr) {
+		case KVM_DEV_MPIC_BASE_ADDR:
+			attr64 = opp->reg_base;
+
+			if (copy_to_user((u64 __user *)(long)attr->addr,
+					 &attr64, sizeof(u64)))
+				return -EFAULT;
+
+			return 0;
+		}
+
+		break;
+
+	case KVM_DEV_MPIC_GRP_REGISTER:
+		ret = access_reg(opp, attr->attr, &attr32, ATTR_GET);
+		if (ret)
+			return ret;
+
+		if (copy_to_user((u32 __user *)(long)attr->addr, &attr32,
+				 sizeof(u32)))
+			return -EFAULT;
+
+		return 0;
+
+	case KVM_DEV_MPIC_GRP_IRQ_ACTIVE:
+		if (attr->attr > MAX_SRC)
+			return -EINVAL;
+
+		attr32 = opp->src[attr->attr].pending;
+
+		if (copy_to_user((u32 __user *)(long)attr->addr, &attr32,
+				 sizeof(u32)))
+			return -EFAULT;
+
+		return 0;
+	}
+
+	return -ENXIO;
+}
+
+static void mpic_destroy(struct kvm *kvm, struct kvm_device *dev)
+{
+	struct openpic *opp = container_of(dev, struct openpic, dev);
+
+	blocking_notifier_chain_unregister(&kvm->vcpu_notifier,
+					   &opp->vcpu_notifier);
+
+	unmap_all(opp);
+	kfree(opp);
+}
+
+static int add_cpu(struct openpic *opp, struct kvm_vcpu *vcpu)
+{
+	u32 id = vcpu->vcpu_id;
+
+	if (id >= MAX_CPU)
+		return -EPERM;
+
+	spin_lock_irq(&opp->lock);
+
+	WARN_ON(opp->dst[id].vcpu);
+	opp->dst[id].vcpu = vcpu;
+	opp->nb_cpus = max(opp->nb_cpus, id + 1);
+
+	spin_unlock_irq(&opp->lock);
+
+	if (opp->mpic_mode_mask == GCR_MODE_PROXY)
+		vcpu->arch.epr_flags |= KVMPPC_EPR_KERNEL;
+
+	return 0;
+}
+
+static int kvm_mpic_vcpu_notifier(struct notifier_block *nb,
+				  unsigned long create, void *v)
+{
+	struct openpic *opp = container_of(nb, struct openpic, vcpu_notifier);
+	struct kvm_vcpu *vcpu = v;
+	int ret;
+
+	if (create) {
+		ret = add_cpu(opp, vcpu);
+		if (ret < 0)
+			return notifier_from_errno(ret);
+	}
+
+	return NOTIFY_OK;
+}
+
+int kvm_create_mpic(struct kvm *kvm, u32 type, struct kvm_device **dev)
+{
+	struct openpic *opp;
+	struct kvm_vcpu *vcpu;
+	int ret, i;
+
+	if (kvm->arch.irqchip_priv)
+		return -EEXIST;
+
+	opp = kzalloc(sizeof(struct openpic), GFP_KERNEL);
+	if (!opp)
+		return -ENOMEM;
+
+	kvm->arch.irqchip_priv = opp;
+	opp->kvm = kvm;
+	opp->model = type;
+	spin_lock_init(&opp->lock);
+
+	switch (opp->model) {
+	case KVM_DEV_TYPE_FSL_MPIC_20:
 		opp->fsl = &fsl_mpic_20;
 		opp->brr1 = 0x00400200;
 		opp->flags |= OPENPIC_FLAG_IDR_CRIT;
@@ -1290,12 +1841,10 @@  static int openpic_init(SysBusDevice *dev)
 		opp->mpic_mode_mask = GCR_MODE_MIXED;
 
 		fsl_common_init(opp);
-		map_list(opp, list_be, &list_count);
-		map_list(opp, list_fsl, &list_count);
 
 		break;
 
-	case OPENPIC_MODEL_FSL_MPIC_42:
+	case KVM_DEV_TYPE_FSL_MPIC_42:
 		opp->fsl = &fsl_mpic_42;
 		opp->brr1 = 0x00400402;
 		opp->flags |= OPENPIC_FLAG_ILR;
@@ -1303,11 +1852,39 @@  static int openpic_init(SysBusDevice *dev)
 		opp->mpic_mode_mask = GCR_MODE_PROXY;
 
 		fsl_common_init(opp);
-		map_list(opp, list_be, &list_count);
-		map_list(opp, list_fsl, &list_count);
 
 		break;
+
+	default:
+		ret = -ENODEV;
+		goto err;
 	}
 
+	openpic_reset(opp);
+
+	opp->dev.type = type;
+	opp->dev.set_attr = mpic_set_attr;
+	opp->dev.get_attr = mpic_get_attr;
+	opp->dev.destroy = mpic_destroy;
+	*dev = &opp->dev;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		ret = add_cpu(opp, vcpu);
+		if (ret < 0)
+			goto err;
+	}
+
+	opp->vcpu_notifier.notifier_call = kvm_mpic_vcpu_notifier;
+
+	/* FIXME: register notifier for subsequently created vcpus */
+	ret = blocking_notifier_chain_register(&kvm->vcpu_notifier,
+					       &opp->vcpu_notifier);
+	if (ret < 0)
+		goto err;
+
 	return 0;
+
+err:
+	kfree(opp);
+	return ret;
 }
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 61989f4..e3d09f7 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -317,6 +317,7 @@  int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_ENABLE_CAP:
 	case KVM_CAP_ONE_REG:
 	case KVM_CAP_IOEVENTFD:
+	case KVM_CAP_DEVICE_CTRL:
 		r = 1;
 		break;
 #ifndef CONFIG_KVM_BOOK3S_64_HV
@@ -781,7 +782,10 @@  static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 		break;
 	case KVM_CAP_PPC_EPR:
 		r = 0;
-		vcpu->arch.epr_enabled = cap->args[0];
+		if (cap->args[0])
+			vcpu->arch.epr_flags |= KVMPPC_EPR_USER;
+		else
+			vcpu->arch.epr_flags &= ~KVMPPC_EPR_USER;
 		break;
 #ifdef CONFIG_BOOKE
 	case KVM_CAP_PPC_BOOKE_WATCHDOG:
@@ -927,6 +931,7 @@  static int kvm_vm_ioctl_get_pvinfo(struct kvm_ppc_pvinfo *pvinfo)
 long kvm_arch_vm_ioctl(struct file *filp,
                        unsigned int ioctl, unsigned long arg)
 {
+	struct kvm *kvm __maybe_unused = filp->private_data;
 	void __user *argp = (void __user *)arg;
 	long r;
 
@@ -945,7 +950,6 @@  long kvm_arch_vm_ioctl(struct file *filp,
 #ifdef CONFIG_PPC_BOOK3S_64
 	case KVM_CREATE_SPAPR_TCE: {
 		struct kvm_create_spapr_tce create_tce;
-		struct kvm *kvm = filp->private_data;
 
 		r = -EFAULT;
 		if (copy_from_user(&create_tce, argp, sizeof(create_tce)))
@@ -957,7 +961,6 @@  long kvm_arch_vm_ioctl(struct file *filp,
 
 #ifdef CONFIG_KVM_BOOK3S_64_HV
 	case KVM_ALLOCATE_RMA: {
-		struct kvm *kvm = filp->private_data;
 		struct kvm_allocate_rma rma;
 
 		r = kvm_vm_ioctl_allocate_rma(kvm, &rma);
@@ -967,7 +970,6 @@  long kvm_arch_vm_ioctl(struct file *filp,
 	}
 
 	case KVM_PPC_ALLOCATE_HTAB: {
-		struct kvm *kvm = filp->private_data;
 		u32 htab_order;
 
 		r = -EFAULT;
@@ -984,7 +986,6 @@  long kvm_arch_vm_ioctl(struct file *filp,
 	}
 
 	case KVM_PPC_GET_HTAB_FD: {
-		struct kvm *kvm = filp->private_data;
 		struct kvm_get_htab_fd ghf;
 
 		r = -EFAULT;
@@ -997,7 +998,6 @@  long kvm_arch_vm_ioctl(struct file *filp,
 
 #ifdef CONFIG_PPC_BOOK3S_64
 	case KVM_PPC_GET_SMMU_INFO: {
-		struct kvm *kvm = filp->private_data;
 		struct kvm_ppc_smmu_info info;
 
 		memset(&info, 0, sizeof(info));
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3d28037..48342a6 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1059,5 +1059,7 @@  static inline bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)
 }
 
 #endif /* CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT */
-#endif
 
+int kvm_create_mpic(struct kvm *kvm, u32 type, struct kvm_device **dev);
+
+#endif
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 1f348e0..1048a03 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -910,10 +910,19 @@  struct kvm_device_attr {
 #define KVM_DEV_ATTR_COMMON		0
 #define   KVM_DEV_ATTR_TYPE		0 /* 32-bit */
 
-#define KVM_CREATE_DEVICE	  _IOWR(KVMIO,  0xac, struct kvm_create_device)
-#define KVM_SET_DEVICE_ATTR	  _IOW(KVMIO,  0xad, struct kvm_device_attr)
-#define KVM_GET_DEVICE_ATTR	  _IOW(KVMIO,  0xae, struct kvm_device_attr)
-#define KVM_HAS_DEVICE_ATTR	  _IOW(KVMIO,  0xaf, struct kvm_device_attr)
+#define KVM_DEV_TYPE_FSL_MPIC_20	1
+#define KVM_DEV_TYPE_FSL_MPIC_42	2
+
+#define KVM_DEV_MPIC_GRP_MISC		1
+#define   KVM_DEV_MPIC_BASE_ADDR	0	/* 64-bit */
+
+#define KVM_DEV_MPIC_GRP_REGISTER	2	/* 32-bit */
+#define KVM_DEV_MPIC_GRP_IRQ_ACTIVE	3	/* 32-bit */
+
+#define KVM_CREATE_DEVICE	  _IOWR(KVMIO,  0xab, struct kvm_create_device)
+#define KVM_SET_DEVICE_ATTR	  _IOW(KVMIO,  0xac, struct kvm_device_attr)
+#define KVM_GET_DEVICE_ATTR	  _IOW(KVMIO,  0xad, struct kvm_device_attr)
+#define KVM_HAS_DEVICE_ATTR	  _IOW(KVMIO,  0xae, struct kvm_device_attr)
 
 /*
  * ioctls for vcpu fds
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index dd4c78d..db0c2b3 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2210,6 +2210,18 @@  static int kvm_ioctl_create_device(struct kvm *kvm,
 	}
 
 	switch (cd->type) {
+#ifdef CONFIG_KVM_MPIC
+	case KVM_DEV_TYPE_FSL_MPIC_20:
+	case KVM_DEV_TYPE_FSL_MPIC_42: {
+		if (test) {
+			r = 0;
+			break;
+		}
+
+		r = kvm_create_mpic(kvm, cd->type, &dev);
+		break;
+	}
+#endif
 	default:
 		r = -ENODEV;
 		goto out;