diff mbox

[41/77] ppc/pnv: Add LPC controller and hook it up with a UART and RTC

Message ID 1447201710-10229-42-git-send-email-benh@kernel.crashing.org
State New
Headers show

Commit Message

Benjamin Herrenschmidt Nov. 11, 2015, 12:27 a.m. UTC
This adds a model of the POWER8 LPC controller. It is then used
by the PowerNV code to attach a UART and RTC, which, with the right
version of OPAL firmware, will provide a working console.

This version of the LPC controller model doesn't yet implement
support for the SerIRQ deserializer present in the Naples version
of the chip though some preliminary work is there.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 hw/ppc/Makefile.objs |   2 +-
 hw/ppc/pnv.c         |  49 ++++-
 hw/ppc/pnv_lpc.c     | 527 +++++++++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/pnv.h |   5 +
 4 files changed, 578 insertions(+), 5 deletions(-)
 create mode 100644 hw/ppc/pnv_lpc.c

Comments

Alexey Kardashevskiy Nov. 17, 2015, 12:32 a.m. UTC | #1
On 11/11/2015 11:27 AM, Benjamin Herrenschmidt wrote:
> This adds a model of the POWER8 LPC controller. It is then used
> by the PowerNV code to attach a UART and RTC, which, with the right
> version of OPAL firmware, will provide a working console.
>
> This version of the LPC controller model doesn't yet implement
> support for the SerIRQ deserializer present in the Naples version
> of the chip though some preliminary work is there.
>

Is this LPC controller one per a chip or per a machine?
In general it is quite nice when "-nodefaults" does not create neither PHB 
nor LPC so the user can add them manually with parameters different than 
defaults.
Benjamin Herrenschmidt Nov. 17, 2015, 12:40 a.m. UTC | #2
On Tue, 2015-11-17 at 11:32 +1100, Alexey Kardashevskiy wrote:
> On 11/11/2015 11:27 AM, Benjamin Herrenschmidt wrote:
> > This adds a model of the POWER8 LPC controller. It is then used
> > by the PowerNV code to attach a UART and RTC, which, with the right
> > version of OPAL firmware, will provide a working console.
> > 
> > This version of the LPC controller model doesn't yet implement
> > support for the SerIRQ deserializer present in the Naples version
> > of the chip though some preliminary work is there.
> > 
> 
> Is this LPC controller one per a chip or per a machine?

Per chip but we usually only wire one up per machine.

> In general it is quite nice when "-nodefaults" does not create
> neither PHB nor LPC so the user can add them manually with parameters
> different than defaults.

In this case though, PHB and LPC bridges are all part of the P8 chip,
and I'm trying to represent that topology as best as possible.

I think "-nodefaults" for Pnv should only be about the devices we
attach to the LPC/PHB not the busses themselves.

Cheers,
Ben.
David Gibson Dec. 1, 2015, 6:43 a.m. UTC | #3
On Tue, Nov 17, 2015 at 11:40:04AM +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2015-11-17 at 11:32 +1100, Alexey Kardashevskiy wrote:
> > On 11/11/2015 11:27 AM, Benjamin Herrenschmidt wrote:
> > > This adds a model of the POWER8 LPC controller. It is then used
> > > by the PowerNV code to attach a UART and RTC, which, with the right
> > > version of OPAL firmware, will provide a working console.
> > > 
> > > This version of the LPC controller model doesn't yet implement
> > > support for the SerIRQ deserializer present in the Naples version
> > > of the chip though some preliminary work is there.
> > > 
> > 
> > Is this LPC controller one per a chip or per a machine?
> 
> Per chip but we usually only wire one up per machine.
> 
> > In general it is quite nice when "-nodefaults" does not create
> > neither PHB nor LPC so the user can add them manually with parameters
> > different than defaults.
> 
> In this case though, PHB and LPC bridges are all part of the P8 chip,
> and I'm trying to represent that topology as best as possible.
> 
> I think "-nodefaults" for Pnv should only be about the devices we
> attach to the LPC/PHB not the busses themselves.

Exactly what is and isn't covered by -nodefaults is a bit of a mess -
part of the topic of my talk at KVM Forum.

But on the whole I agree with you, since the LPC is part of the P8
chip, I think it makes sense to include it even with -nodefaults.
Alexey Kardashevskiy Dec. 2, 2015, 2:24 a.m. UTC | #4
On 12/01/2015 05:43 PM, David Gibson wrote:
> On Tue, Nov 17, 2015 at 11:40:04AM +1100, Benjamin Herrenschmidt wrote:
>> On Tue, 2015-11-17 at 11:32 +1100, Alexey Kardashevskiy wrote:
>>> On 11/11/2015 11:27 AM, Benjamin Herrenschmidt wrote:
>>>> This adds a model of the POWER8 LPC controller. It is then used
>>>> by the PowerNV code to attach a UART and RTC, which, with the right
>>>> version of OPAL firmware, will provide a working console.
>>>>
>>>> This version of the LPC controller model doesn't yet implement
>>>> support for the SerIRQ deserializer present in the Naples version
>>>> of the chip though some preliminary work is there.
>>>>
>>>
>>> Is this LPC controller one per a chip or per a machine?
>>
>> Per chip but we usually only wire one up per machine.
>>
>>> In general it is quite nice when "-nodefaults" does not create
>>> neither PHB nor LPC so the user can add them manually with parameters
>>> different than defaults.
>>
>> In this case though, PHB and LPC bridges are all part of the P8 chip,
>> and I'm trying to represent that topology as best as possible.
>>
>> I think "-nodefaults" for Pnv should only be about the devices we
>> attach to the LPC/PHB not the busses themselves.
>
> Exactly what is and isn't covered by -nodefaults is a bit of a mess -
> part of the topic of my talk at KVM Forum.
>
> But on the whole I agree with you, since the LPC is part of the P8
> chip, I think it makes sense to include it even with -nodefaults.

POWER8 chips all have 8 threads per core but we do not always assume -smt 
...,threads=8, how are LPC or PHB different? PHB is more interesting - how 
is the user supposed to add more? And there always will be the default one 
which properties are set in a separate way (via -global, not -device). I 
found it sometime really annoying to debug the existing pseries which 
always adds a default PHB (I know, this was to make libvirt happy but this 
is not the case here).

Out of curiosity - if we have 2 chips, will the system work if the second 
chip does not get any LPC or PHB attached?
Benjamin Herrenschmidt Dec. 2, 2015, 5:29 a.m. UTC | #5
On Wed, 2015-12-02 at 13:24 +1100, Alexey Kardashevskiy wrote:
> > But on the whole I agree with you, since the LPC is part of the P8
> > chip, I think it makes sense to include it even with -nodefaults.
> 
> POWER8 chips all have 8 threads per core but we do not always assume -smt 
> ...,threads=8, how are LPC or PHB different? 

First, for pseries which is paravirtualized it's a different can of
worms completely. For powernv, we *should* represent all 8 threads,
we just can't yet due to TCG limitations.

> PHB is more interesting - how is the user supposed to add more?

That's an open question. Since we model a real P8 chip we can only
model the PHBs as they exist on it, which is up to 3 per chip at
very specific XSCOM addresses. We could try to model some non-existing
P8 chip with more but bad things will happen when the FW try to assign
interrupt numbers for example.

We simulate a machine that has been primed by HostBoot before OPAL
starts. So we rely on what the device-tree tells us of what PHB were
enabled but appart from that, we have to stick to the limitations.

> And there always will be the default one 
> which properties are set in a separate way (via -global, not -device). I 
> found it sometime really annoying to debug the existing pseries which 
> always adds a default PHB (I know, this was to make libvirt happy but this 
> is not the case here).
> 
> Out of curiosity - if we have 2 chips, will the system work if the second 
> chip does not get any LPC or PHB attached?

This is something I need to look into, there's a lot of work needed to
properly model "chips" that I haven't done yet, but what is there is
sufficient for a lot of usages already.

Cheers,
Ben.
Alexey Kardashevskiy Dec. 3, 2015, 1:04 a.m. UTC | #6
On 12/02/2015 04:29 PM, Benjamin Herrenschmidt wrote:
> On Wed, 2015-12-02 at 13:24 +1100, Alexey Kardashevskiy wrote:
>>> But on the whole I agree with you, since the LPC is part of the P8
>>> chip, I think it makes sense to include it even with -nodefaults.
>>
>> POWER8 chips all have 8 threads per core but we do not always assume -smt
>> ...,threads=8, how are LPC or PHB different?
>
> First, for pseries which is paravirtualized it's a different can of
> worms completely. For powernv, we *should* represent all 8 threads,
> we just can't yet due to TCG limitations.

Out of curiosity - for pseries we should not? I know it works with various 
numbers of threads but is not that because we also control guest linux 
kernel and, for example, the Other OS (AIX) might be upset on 
non-multiply-of-2 number of threads?


>> PHB is more interesting - how is the user supposed to add more?
>
> That's an open question. Since we model a real P8 chip we can only
> model the PHBs as they exist on it, which is up to 3 per chip at
> very specific XSCOM addresses. We could try to model some non-existing
> P8 chip with more but bad things will happen when the FW try to assign
> interrupt numbers for example.
 >
> We simulate a machine that has been primed by HostBoot before OPAL
> starts. So we rely on what the device-tree tells us of what PHB were
> enabled but appart from that, we have to stick to the limitations.
 >
>> And there always will be the default one
>> which properties are set in a separate way (via -global, not -device). I
>> found it sometime really annoying to debug the existing pseries which
>> always adds a default PHB (I know, this was to make libvirt happy but this
>> is not the case here).
>>
>> Out of curiosity - if we have 2 chips, will the system work if the second
>> chip does not get any LPC or PHB attached?
>
> This is something I need to look into, there's a lot of work needed to
> properly model "chips" that I haven't done yet, but what is there is
> sufficient for a lot of usages already.

For now, if possible, I'd suggest implementing -nodefaults with no defaults 
whatsoever and create a config somewhere in the qemu tree to pass it via 
-readconfig to get reasonably configured machine so people will know what 
is expected to work but there will still be possibility for experiments (do 
not we secretly hope that other vendors will start designing/manufacturing 
their ppc64 chips?). It could be a config file per an actual POWER8 chip 
(we have two already).
David Gibson Dec. 3, 2015, 1:45 a.m. UTC | #7
On Thu, Dec 03, 2015 at 12:04:58PM +1100, Alexey Kardashevskiy wrote:
> On 12/02/2015 04:29 PM, Benjamin Herrenschmidt wrote:
> >On Wed, 2015-12-02 at 13:24 +1100, Alexey Kardashevskiy wrote:
> >>>But on the whole I agree with you, since the LPC is part of the P8
> >>>chip, I think it makes sense to include it even with -nodefaults.
> >>
> >>POWER8 chips all have 8 threads per core but we do not always assume -smt
> >>...,threads=8, how are LPC or PHB different?
> >
> >First, for pseries which is paravirtualized it's a different can of
> >worms completely. For powernv, we *should* represent all 8 threads,
> >we just can't yet due to TCG limitations.
> 
> Out of curiosity - for pseries we should not? I know it works with various
> numbers of threads but is not that because we also control guest linux
> kernel and, for example, the Other OS (AIX) might be upset on
> non-multiply-of-2 number of threads?

There are several different cases here and I'm not sure which you're
thinking about.

1) Guest has different number of threads-per-core than the host

This one is just fine - PAPR defines how the guest should get the
number of threads from the device tree, and qemu sets that correctly.

2) Guest threads-per-core not a power of two

The PAPR thread mechanism allows this to be communicated to the guest,
and I don't know if PAPR explicitly permits or prohibitis this
situation.  Guests could get confused by it, although that's arguably
a guest bug.

2) "Partially filled core", e.g. guest has 8 threads-per-core declared
   but only one vcpu available

This is the only one I can see as relying on Linux guest behaviour.
We kind of get away with this by accident with a Linux guest - it will
try to bring up all 8 threads, but fail non fatally.  We shouldn't
allow this situation, although we do right now.  Bharata posted a
patch which would prevent this in qemu, and I have a BZ to make
libvirt not allow this construction either.

> >>PHB is more interesting - how is the user supposed to add more?
> >
> >That's an open question. Since we model a real P8 chip we can only
> >model the PHBs as they exist on it, which is up to 3 per chip at
> >very specific XSCOM addresses. We could try to model some non-existing
> >P8 chip with more but bad things will happen when the FW try to assign
> >interrupt numbers for example.
> >
> >We simulate a machine that has been primed by HostBoot before OPAL
> >starts. So we rely on what the device-tree tells us of what PHB were
> >enabled but appart from that, we have to stick to the limitations.
> >
> >>And there always will be the default one
> >>which properties are set in a separate way (via -global, not -device). I
> >>found it sometime really annoying to debug the existing pseries which
> >>always adds a default PHB (I know, this was to make libvirt happy but this
> >>is not the case here).
> >>
> >>Out of curiosity - if we have 2 chips, will the system work if the second
> >>chip does not get any LPC or PHB attached?
> >
> >This is something I need to look into, there's a lot of work needed to
> >properly model "chips" that I haven't done yet, but what is there is
> >sufficient for a lot of usages already.
> 
> For now, if possible, I'd suggest implementing -nodefaults with no defaults
> whatsoever and create a config somewhere in the qemu tree to pass it via
> -readconfig to get reasonably configured machine so people will know what is
> expected to work but there will still be possibility for experiments (do not
> we secretly hope that other vendors will start designing/manufacturing their
> ppc64 chips?). It could be a config file per an actual POWER8 chip (we have
> two already).

I can see some benefit to that approach, but it does stray away from
current qemu practice (in general, not just compared to spapr).
Hmm.. not sure.

What I do think would be a good idea is to represent a POWER8 "chip"
as a instantiable qdev device, which will create the scoms and PHBs
under itself as per the hardware.  We can add device properties as
needed to make that construction more flexible.

We probably don't want to link the number of CPUs to the chip qdevs,
partly because that doesn't really fit the qemu model, but also
because we'll probably want some extra flexibility.  e.g. making a UP
system for experimentation, even though a single chip would have
multiple cores (IIUC) - SMP TCG is super slow, so we probably want that.
Benjamin Herrenschmidt Dec. 3, 2015, 10:54 p.m. UTC | #8
On Thu, 2015-12-03 at 12:04 +1100, Alexey Kardashevskiy wrote:
> On 12/02/2015 04:29 PM, Benjamin Herrenschmidt wrote:
> > On Wed, 2015-12-02 at 13:24 +1100, Alexey Kardashevskiy wrote:
> > > > But on the whole I agree with you, since the LPC is part of the P8
> > > > chip, I think it makes sense to include it even with -nodefaults.
> > > 
> > > POWER8 chips all have 8 threads per core but we do not always assume -smt
> > > ...,threads=8, how are LPC or PHB different?
> > 
> > First, for pseries which is paravirtualized it's a different can of
> > worms completely. For powernv, we *should* represent all 8 threads,
> > we just can't yet due to TCG limitations.
> 
> Out of curiosity - for pseries we should not? 

Not that we should not, more like, it makes sense to offer whatever
choice, it would be indeed better to have the ability to support them
however.

> I know it works with various 
> numbers of threads but is not that because we also control guest linux 
> kernel and, for example, the Other OS (AIX) might be upset on 
> non-multiply-of-2 number of threads?

Quite possibly. I have some plans to add threads support, just haven't
got to it yet :-)
> 
> > > PHB is more interesting - how is the user supposed to add more?
> > 
> > That's an open question. Since we model a real P8 chip we can only
> > model the PHBs as they exist on it, which is up to 3 per chip at
> > very specific XSCOM addresses. We could try to model some non-existing
> > P8 chip with more but bad things will happen when the FW try to assign
> > interrupt numbers for example.
>  >
> > We simulate a machine that has been primed by HostBoot before OPAL
> > starts. So we rely on what the device-tree tells us of what PHB were
> > enabled but appart from that, we have to stick to the limitations.
>  >
> > > And there always will be the default one
> > > which properties are set in a separate way (via -global, not -device). I
> > > found it sometime really annoying to debug the existing pseries which
> > > always adds a default PHB (I know, this was to make libvirt happy but this
> > > is not the case here).
> > > 
> > > Out of curiosity - if we have 2 chips, will the system work if the second
> > > chip does not get any LPC or PHB attached?
> > 
> > This is something I need to look into, there's a lot of work needed to
> > properly model "chips" that I haven't done yet, but what is there is
> > sufficient for a lot of usages already.
> 
> For now, if possible, I'd suggest implementing -nodefaults with no defaults 
> whatsoever and create a config somewhere in the qemu tree to pass it via 
> -readconfig to get reasonably configured machine so people will know what 
> is expected to work but there will still be possibility for experiments (do 
> not we secretly hope that other vendors will start designing/manufacturing 
> their ppc64 chips?). It could be a config file per an actual POWER8 chip 
> (we have two already).

There could be, but we'd need ways to specify a bunch of things, it's
not that trivial. IE, the xscom addresses of the 3 ranges for each PHB
for example etc.. For now let's keep things simple.

I also am in no hurry to support migration with powernv, so we have
quite a bit of flexibility to change things.

Cheers,
Ben.
Benjamin Herrenschmidt Dec. 3, 2015, 10:58 p.m. UTC | #9
On Thu, 2015-12-03 at 12:45 +1100, David Gibson wrote:
> 
> There are several different cases here and I'm not sure which you're
> thinking about.
> 
> 1) Guest has different number of threads-per-core than the host
> 
> This one is just fine - PAPR defines how the guest should get the
> number of threads from the device tree, and qemu sets that correctly.
> 
> 2) Guest threads-per-core not a power of two
> 
> The PAPR thread mechanism allows this to be communicated to the guest,
> and I don't know if PAPR explicitly permits or prohibitis this
> situation.  Guests could get confused by it, although that's arguably
> a guest bug.

Linux only supports powers of 2

 .../...

> > For now, if possible, I'd suggest implementing -nodefaults with no defaults
> > whatsoever and create a config somewhere in the qemu tree to pass it via
> > -readconfig to get reasonably configured machine so people will know what is
> > expected to work but there will still be possibility for experiments (do not
> > we secretly hope that other vendors will start designing/manufacturing their
> > ppc64 chips?). It could be a config file per an actual POWER8 chip (we have
> > two already).
> 
> I can see some benefit to that approach, but it does stray away from
> current qemu practice (in general, not just compared to spapr).
> Hmm.. not sure.

I don't see an urgent need. We can do that in a separate series if we
want to. As I said in another email, I don't want powernv to be "set in
stone" in term of ABIs & migration for a while, so we have latitude to
play with the model for a while before we lock things down. At least
until after P9.

> What I do think would be a good idea is to represent a POWER8 "chip"
> as a instantiable qdev device, which will create the scoms and PHBs
> under itself as per the hardware.  We can add device properties as
> needed to make that construction more flexible.

It would also instanciate the CPUs. I want to change that in the
current model, just haven't got to it yet.

> We probably don't want to link the number of CPUs to the chip qdevs,
> partly because that doesn't really fit the qemu model, but also
> because we'll probably want some extra flexibility.  e.g. making a UP
> system for experimentation, even though a single chip would have
> multiple cores (IIUC) - SMP TCG is super slow, so we probably want that.

We want to eventually represent the EX units properly on XSCOM (the
cores). So we will need that tie between chip and CPU. But we don't
have to mark all cores in a chip as "enabled". It's ok to have chips
with only one good core for example.

Cheers,
Ben.
diff mbox

Patch

diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index 2a7dd42..5ebf0e0 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -5,7 +5,7 @@  obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
 obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
 obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o spapr_rng.o
 # IBM PowerNV
-obj-$(CONFIG_POWERNV) += pnv.o pnv_xscom.o
+obj-$(CONFIG_POWERNV) += pnv.o pnv_xscom.o pnv_lpc.o
 ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
 obj-y += spapr_pci_vfio.o
 endif
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index a7a9b0f..b4c6dd4 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -44,6 +44,9 @@ 
 #include "hw/ppc/xics.h"
 #include "hw/ppc/pnv_xscom.h"
 
+#include "hw/isa/isa.h"
+#include "hw/char/serial.h"
+#include "hw/timer/mc146818rtc.h"
 #include "exec/address-spaces.h"
 #include "qemu/config-file.h"
 #include "qemu/error-report.h"
@@ -483,7 +486,15 @@  static const VMStateDescription vmstate_powernv = {
     .minimum_version_id = 1,
 };
 
-static void pnv_create_chip(PnvSystem *sys, unsigned int chip_no)
+static void pnv_lpc_irq_handler_cpld(void *opaque, int n, int level)
+{
+    /* We don't yet emulate the PSI bridge which provides the external
+     * interrupt, so just drop interrupts on the floor
+     */
+}
+
+static void pnv_create_chip(PnvSystem *sys, unsigned int chip_no,
+                            bool has_lpc, bool has_lpc_irq)
 {
     PnvChip *chip = &sys->chips[chip_no];
 
@@ -496,6 +507,27 @@  static void pnv_create_chip(PnvSystem *sys, unsigned int chip_no)
 
     /* Set up XSCOM bus */
     xscom_create(chip);
+
+    /* Create LPC controller */
+    if (has_lpc) {
+        pnv_lpc_create(chip, has_lpc_irq);
+
+        /* If we don't use the built-in LPC interrupt deserializer, we need
+         * to provide a set of qirqs for the ISA bus or things will go bad.
+         *
+         * Most machines using pre-Naples chips (without said deserializer)
+         * have a CPLD that will collect the SerIRQ and shoot them as a
+         * single level interrupt to the P8 chip. So let's setup a hook
+         * for doing just that.
+         *
+         * Note: The actual interrupt input isn't emulated yet, this will
+         * come with the PSI bridge model.
+         */
+        if (!has_lpc_irq) {
+            isa_bus_irqs(chip->lpc_bus,
+                         qemu_allocate_irqs(pnv_lpc_irq_handler_cpld, NULL, 16));
+        }
+    }
 }
 
 static void ppc_powernv_init(MachineState *machine)
@@ -513,6 +545,7 @@  static void ppc_powernv_init(MachineState *machine)
     sPowerNVMachineState *pnv_machine = POWERNV_MACHINE(machine);
     PnvSystem *sys = &pnv_machine->sys;
     XICSState *xics;
+    ISABus *isa_bus;
     long fw_size;
     char *filename;
     void *fdt;
@@ -557,10 +590,18 @@  static void ppc_powernv_init(MachineState *machine)
      */
     sys->num_chips = 1;
 
-    /* Create only one PHB for now until I figure out what's wrong
-     * when I create more (resource assignment failures in Linux)
+    /* Create only one chip for now with an LPC bus
      */
-    pnv_create_chip(sys, 0);
+    pnv_create_chip(sys, 0, true, false);
+
+    /* Grab chip 0's ISA bus */
+    isa_bus = sys->chips[0].lpc_bus;
+
+     /* Create serial port */
+    serial_hds_isa_init(isa_bus, MAX_SERIAL_PORTS);
+
+    /* Create an RTC ISA device too */
+    rtc_init(isa_bus, 2000, NULL);
 
     if (bios_name == NULL) {
         bios_name = FW_FILE_NAME;
diff --git a/hw/ppc/pnv_lpc.c b/hw/ppc/pnv_lpc.c
new file mode 100644
index 0000000..2165f24
--- /dev/null
+++ b/hw/ppc/pnv_lpc.c
@@ -0,0 +1,527 @@ 
+
+/*
+ * QEMU PowerNV LPC bus definitions
+ *
+ * Copyright (c) 2010 David Gibson, IBM Corporation <dwg@au1.ibm.com>
+ * Based on the s390 virtio bus code:
+ * Copyright (c) 2009 Alexander Graf <agraf@suse.de>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "hw/hw.h"
+#include "sysemu/sysemu.h"
+#include "hw/boards.h"
+#include "monitor/monitor.h"
+#include "hw/loader.h"
+#include "elf.h"
+#include "hw/sysbus.h"
+#include "sysemu/kvm.h"
+#include "sysemu/device_tree.h"
+#include "kvm_ppc.h"
+
+#include "hw/isa/isa.h"
+#include "hw/ppc/pnv_xscom.h"
+#include "hw/ppc/pnv.h"
+
+#include <libfdt.h>
+
+enum {
+    ECCB_CTL    = 0,
+    ECCB_RESET  = 1,
+    ECCB_STAT   = 2,
+    ECCB_DATA   = 3,
+};
+
+#define LPCDBG(fmt...) do { } while(0)
+//#define LPCDBG(fmt...) do { printf(fmt); } while(0)
+#define OPBDBG(fmt...) do { } while(0)
+//#define OPBDBG(fmt...) do { printf(fmt); } while(0)
+
+typedef struct PnvLpcController {
+    XScomDevice xd;
+    uint64_t eccb_stat_reg;
+    uint32_t eccb_data_reg;
+    bool has_serirq;
+
+    /* OPB bus */
+    MemoryRegion opb_mr;
+    AddressSpace opb_as;
+    /* ISA IO and Memory space */
+    MemoryRegion isa_io;
+    MemoryRegion isa_mem;
+    ISABus *isa_bus;
+    /* Windows from OPB to ISA (aliases) */
+    MemoryRegion opb_isa_io;
+    MemoryRegion opb_isa_mem;
+    MemoryRegion opb_isa_fw;
+    /* Registers */
+    MemoryRegion lpc_hc_regs;
+    MemoryRegion opb_master_regs;
+
+    /* OPB Master LS registers */
+#define OPB_MASTER_LS_IRQ_STAT  0x50
+#define   OPB_MASTER_IRQ_LPC            0x00000800
+    uint32_t opb_irq_stat;
+#define OPB_MASTER_LS_IRQ_MASK  0x54
+    uint32_t opb_irq_mask;
+#define OPB_MASTER_LS_IRQ_POL   0x58
+    uint32_t opb_irq_pol;
+
+    /* LPC HC registers */
+#define LPC_HC_FW_SEG_IDSEL     0x24
+    uint32_t lpc_hc_fw_seg_idsel;
+#define LPC_HC_FW_RD_ACC_SIZE   0x28
+#define   LPC_HC_FW_RD_1B               0x00000000
+#define   LPC_HC_FW_RD_2B               0x01000000
+#define   LPC_HC_FW_RD_4B               0x02000000
+#define   LPC_HC_FW_RD_16B              0x04000000
+#define   LPC_HC_FW_RD_128B             0x07000000
+    uint32_t lpc_hc_fw_rd_acc_size;
+#define LPC_HC_IRQSER_CTRL      0x30
+#define   LPC_HC_IRQSER_EN              0x80000000
+#define   LPC_HC_IRQSER_QMODE           0x40000000
+#define   LPC_HC_IRQSER_START_MASK      0x03000000
+#define   LPC_HC_IRQSER_START_4CLK      0x00000000
+#define   LPC_HC_IRQSER_START_6CLK      0x01000000
+#define   LPC_HC_IRQSER_START_8CLK      0x02000000
+    uint32_t lpc_hc_irqser_ctrl;
+#define LPC_HC_IRQMASK          0x34    /* same bit defs as LPC_HC_IRQSTAT */
+    uint32_t lpc_hc_irqmask;
+#define LPC_HC_IRQSTAT          0x38
+#define   LPC_HC_IRQ_SERIRQ0            0x80000000 /* all bits down to ... */
+#define   LPC_HC_IRQ_SERIRQ16           0x00008000 /* IRQ16=IOCHK#, IRQ2=SMI# */
+#define   LPC_HC_IRQ_SERIRQ_ALL         0xffff8000
+#define   LPC_HC_IRQ_LRESET             0x00000400
+#define   LPC_HC_IRQ_SYNC_ABNORM_ERR    0x00000080
+#define   LPC_HC_IRQ_SYNC_NORESP_ERR    0x00000040
+#define   LPC_HC_IRQ_SYNC_NORM_ERR      0x00000020
+#define   LPC_HC_IRQ_SYNC_TIMEOUT_ERR   0x00000010
+#define   LPC_HC_IRQ_SYNC_TARG_TAR_ERR  0x00000008
+#define   LPC_HC_IRQ_SYNC_BM_TAR_ERR    0x00000004
+#define   LPC_HC_IRQ_SYNC_BM0_REQ       0x00000002
+#define   LPC_HC_IRQ_SYNC_BM1_REQ       0x00000001
+    uint32_t lpc_hc_irqstat;
+#define LPC_HC_ERROR_ADDRESS    0x40
+    uint32_t lpc_hc_error_addr;
+
+} PnvLpcController;
+
+#define ISA_IO_SIZE             0x00010000
+#define ISA_MEM_SIZE            0x10000000
+#define LPC_IO_OPB_ADDR         0xd0010000
+#define LPC_IO_OPB_SIZE         0x00010000
+#define LPC_MEM_OPB_ADDR        0xe0010000
+#define LPC_MEM_OPB_SIZE        0x10000000
+#define LPC_FW_OPB_ADDR         0xf0000000
+#define LPC_FW_OPB_SIZE         0x10000000
+
+#define LPC_OPB_REGS_OPB_ADDR   0xc0010000
+#define LPC_OPB_REGS_OPB_SIZE   0x00002000
+#define LPC_HC_REGS_OPB_ADDR    0xc0012000
+#define LPC_HC_REGS_OPB_SIZE    0x00001000
+
+#define TYPE_PNV_LPC_CONTROLLER "pnv-lpc"
+#define PNV_LPC_CONTROLLER(obj) \
+     OBJECT_CHECK(PnvLpcController, (obj), TYPE_PNV_LPC_CONTROLLER)
+
+#define _FDT(exp) \
+    do { \
+        int ret = (exp);                                           \
+        if (ret < 0) {                                             \
+            fprintf(stderr, "qemu: error creating device tree: %s: %s\n", \
+                    #exp, fdt_strerror(ret));                      \
+            exit(1);                                               \
+        }                                                          \
+    } while (0)
+
+static int pnv_lpc_devnode(XScomDevice *dev, void *fdt)
+{
+    _FDT((fdt_property_cell(fdt, "#address-cells", 2)));
+    _FDT((fdt_property_cell(fdt, "#size-cells", 1)));
+    _FDT((fdt_property(fdt, "primary", NULL, 0)));
+    return 0;
+}
+
+static bool opb_read(PnvLpcController *lpc, uint32_t addr, uint8_t *data, int sz)
+{
+    bool success;
+
+    /* XXX Handle access size limits and FW read caching here */
+    success = !address_space_rw(&lpc->opb_as, addr, MEMTXATTRS_UNSPECIFIED,
+                                data, sz, false);
+
+    LPCDBG("OPB read @0x%08x, sz=%d data=%02x %02x %02x %02x ok=%d\n",
+           addr, sz, data[0], data[1], data[2], data[3], success);
+
+    return success;
+}
+
+static bool opb_write(PnvLpcController *lpc, uint32_t addr, uint8_t *data, int sz)
+{
+    bool success;
+
+    /* XXX Handle access size limits here */
+    success = !address_space_rw(&lpc->opb_as, addr, MEMTXATTRS_UNSPECIFIED,
+                                data, sz, true);
+
+    LPCDBG("OPB write @0x%08x, sz=%d data=%02x %02x %02x %02x ok=%d\n",
+           addr, sz, data[0], data[1], data[2], data[3], success);
+
+    return success;
+}
+
+#define ECCB_CTL_READ           (1ull << (63-15))
+#define ECCB_CTL_SZ_LSH         (63-7)
+#define ECCB_CTL_SZ_MASK        (0xfull << ECCB_CTL_SZ_LSH)
+#define ECCB_CTL_ADDR_MASK      0xffffffffu;
+
+#define ECCB_STAT_OP_DONE       (1ull << (63-52))
+#define ECCB_STAT_OP_ERR        (1ull << (63-52))
+#define ECCB_STAT_RD_DATA_LSH   (63-37)
+#define ECCB_STAT_RD_DATA_MASK  (0xffffffff << ECCB_STAT_RD_DATA_LSH)
+
+static void pnv_lpc_do_eccb(PnvLpcController *lpc, uint64_t cmd)
+{
+    /* XXX Check for magic bits at the top, addr size etc... */
+    unsigned int sz = (cmd & ECCB_CTL_SZ_MASK) >> ECCB_CTL_SZ_LSH;
+    uint32_t opb_addr = cmd & ECCB_CTL_ADDR_MASK;
+    uint8_t data[4];
+    bool success;
+
+    LPCDBG("ECCB cmd: %016llx data: %08x\n",
+           (unsigned long long)cmd, lpc->eccb_data_reg);
+
+    if (cmd & ECCB_CTL_READ) {
+        success = opb_read(lpc, opb_addr, data, sz);
+        if (success) {
+            lpc->eccb_stat_reg = ECCB_STAT_OP_DONE |
+                    (((uint64_t)data[0]) << 24 |
+                     ((uint64_t)data[1]) << 16 |
+                     ((uint64_t)data[2]) <<  8 |
+                     ((uint64_t)data[3])) << ECCB_STAT_RD_DATA_LSH;
+        } else {
+            lpc->eccb_stat_reg = ECCB_STAT_OP_DONE |
+                    (0xffffffffull << ECCB_STAT_RD_DATA_LSH);
+        }
+    } else {
+        data[0] = lpc->eccb_data_reg >> 24; 
+        data[1] = lpc->eccb_data_reg >> 16;
+        data[2] = lpc->eccb_data_reg >>  8;
+        data[3] = lpc->eccb_data_reg;
+
+        LPCDBG("OPB write @0x%08x, sz=%d data=%02x %02x %02x %02x\n",
+               opb_addr, sz, data[0], data[1], data[2], data[3]);
+
+        success = opb_write(lpc, opb_addr, data, sz);
+        lpc->eccb_stat_reg = ECCB_STAT_OP_DONE;
+    }
+    /* XXX Which error bit (if any) to signal OPB error ? */
+}
+
+static bool pnv_lpc_xscom_read(XScomDevice *dev, uint32_t range,
+                               uint32_t offset, uint64_t *out_val)
+{
+    PnvLpcController *lpc = PNV_LPC_CONTROLLER(dev);
+
+    switch(offset & 3) {
+    case ECCB_CTL:
+    case ECCB_RESET:
+        *out_val = 0;
+        break;
+    case ECCB_STAT:
+        *out_val = lpc->eccb_stat_reg;
+        lpc->eccb_stat_reg = 0;
+        break;
+    case ECCB_DATA:
+        *out_val = ((uint64_t)lpc->eccb_data_reg) << 32;
+        break;
+    }
+    return true;
+}
+
+static bool pnv_lpc_xscom_write(XScomDevice *dev, uint32_t range,
+                                uint32_t offset, uint64_t val)
+{
+    PnvLpcController *lpc = PNV_LPC_CONTROLLER(dev);
+
+    switch(offset & 3) {
+    case ECCB_CTL:
+        pnv_lpc_do_eccb(lpc, val);
+        break;
+    case ECCB_RESET:
+        /*  XXXX  */
+        break;
+    case ECCB_STAT:
+        break;
+    case ECCB_DATA:
+        lpc->eccb_data_reg = val >> 32;
+        break;
+    }
+    return true;
+}
+
+static void pnv_lpc_isa_irq_handler(void *opaque, int n, int level)
+{
+     /* XXX TODO */
+}
+
+static uint64_t lpc_hc_read(void *opaque, hwaddr addr, unsigned size)
+{
+    PnvLpcController *lpc = opaque;
+
+    if (size != 4) {
+        fprintf(stderr, "lpc_hc_read: Invalid size %d\n", size);
+        return 0xfffffffffffffffful;
+    }
+
+    OPBDBG("LPC HC read @0x%08x\n", (unsigned int)addr);
+
+    switch(addr) {
+    case LPC_HC_FW_SEG_IDSEL:
+        return lpc->lpc_hc_fw_seg_idsel;
+    case LPC_HC_FW_RD_ACC_SIZE:
+        return lpc->lpc_hc_fw_rd_acc_size;
+    case LPC_HC_IRQSER_CTRL:
+        return lpc->lpc_hc_irqser_ctrl;
+    case LPC_HC_IRQMASK:
+        return lpc->lpc_hc_irqmask;
+    case LPC_HC_IRQSTAT:
+        return lpc->lpc_hc_irqstat;
+    case LPC_HC_ERROR_ADDRESS:
+        return lpc->lpc_hc_error_addr;
+    default:
+        OPBDBG("LPC HC Unimplemented register !\n");
+        return 0xfffffffffffffffful;
+    }
+}
+
+static void lpc_hc_write(void *opaque, hwaddr addr, uint64_t val,
+                         unsigned size)
+{
+    PnvLpcController *lpc = opaque;
+
+    if (size != 4) {
+        fprintf(stderr, "lpc_hc_write: Invalid size %d\n", size);
+        return;
+    }
+
+    OPBDBG("LPC HC write @0x%08x\n", (unsigned int)addr);
+
+    /* XXX Filter out reserved bits */
+
+    switch(addr) {
+    case LPC_HC_FW_SEG_IDSEL:
+        /* XXX Actually figure out how that works as this impact
+         * memory regions/aliases\
+         */
+        lpc->lpc_hc_fw_seg_idsel = val;
+    case LPC_HC_FW_RD_ACC_SIZE:
+        lpc->lpc_hc_fw_rd_acc_size = val;
+    case LPC_HC_IRQSER_CTRL:
+        lpc->lpc_hc_irqser_ctrl = val;
+    case LPC_HC_IRQMASK:
+        lpc->lpc_hc_irqmask = val;
+    case LPC_HC_IRQSTAT:
+        lpc->lpc_hc_irqstat &= ~val;
+    case LPC_HC_ERROR_ADDRESS:
+        break;
+    default:
+        OPBDBG("LPC HC Unimplemented register !\n");
+    }
+}
+
+static const MemoryRegionOps lpc_hc_ops = {
+    .read = lpc_hc_read,
+    .write = lpc_hc_write,
+    .endianness = DEVICE_BIG_ENDIAN,
+    .valid = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+    },
+    .impl = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+    },
+};
+
+static uint64_t opb_master_read(void *opaque, hwaddr addr, unsigned size)
+{
+    PnvLpcController *lpc = opaque;
+
+    if (size != 4) {
+        fprintf(stderr, "opb_master_read: Invalid size %d\n", size);
+        return 0xfffffffffffffffful;
+    }
+
+    OPBDBG("OPB MASTER read @0x%08x\n", (unsigned int)addr);
+
+    switch(addr) {
+    case OPB_MASTER_LS_IRQ_STAT:
+        return lpc->opb_irq_stat;
+    case OPB_MASTER_LS_IRQ_MASK:
+        return lpc->opb_irq_mask;
+    case OPB_MASTER_LS_IRQ_POL:
+        return lpc->opb_irq_pol;
+    default:
+        OPBDBG("OPB MASTER Unimplemented register !\n");
+        return 0xfffffffffffffffful;
+    }
+}
+
+static void opb_master_write(void *opaque, hwaddr addr,
+                             uint64_t val, unsigned size)
+{
+    PnvLpcController *lpc = opaque;
+
+    if (size != 4) {
+        fprintf(stderr, "opb_master_write: Invalid size %d\n", size);
+        return;
+    }
+
+    OPBDBG("OPB MASTER write @0x%08x\n", (unsigned int)addr);
+
+    switch(addr) {
+    case OPB_MASTER_LS_IRQ_STAT:
+        lpc->opb_irq_stat &= ~val;
+        break;
+    case OPB_MASTER_LS_IRQ_MASK:
+        /* XXX Filter out reserved bits */
+        lpc->opb_irq_mask = val;
+        break;
+    case OPB_MASTER_LS_IRQ_POL:
+        /* XXX Filter out reserved bits */
+        lpc->opb_irq_pol = val;
+        break;
+    default:
+        OPBDBG("OPB MASTER Unimplemented register !\n");
+    }
+}
+
+static const MemoryRegionOps opb_master_ops = {
+    .read = opb_master_read,
+    .write = opb_master_write,
+    .endianness = DEVICE_BIG_ENDIAN,
+    .valid = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+    },
+    .impl = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+    },
+};
+
+static void pnv_lpc_realize(DeviceState *dev, Error **errp)
+{
+    PnvLpcController *lpc = PNV_LPC_CONTROLLER(dev);
+
+    /* LPC XSCOM address is fixed */
+    lpc->xd.ranges[0].addr = 0xb0020;
+    lpc->xd.ranges[0].size = 4;
+
+    /* Reg inits */
+    lpc->lpc_hc_fw_rd_acc_size = LPC_HC_FW_RD_4B;
+
+    /* Create address space and backing MR for the OPB bus */
+    memory_region_init(&lpc->opb_mr, OBJECT(dev), "lpc-opb", 0x100000000ull);
+    address_space_init(&lpc->opb_as, &lpc->opb_mr, "lpc-opb");
+
+    /* Create ISA IO and Mem space regions which are the root of
+     * the ISA bus (ie, ISA address spaces). We don't create a
+     * separate one for FW which we alias to memory.
+     */
+    memory_region_init(&lpc->isa_io, OBJECT(dev), "isa-io", ISA_IO_SIZE);
+    memory_region_init(&lpc->isa_mem, OBJECT(dev), "isa-mem", ISA_MEM_SIZE);
+
+    /* Create windows from the OPB space to the ISA space */
+    memory_region_init_alias(&lpc->opb_isa_io, OBJECT(dev), "lpc-isa-io",
+                             &lpc->isa_io, 0, LPC_IO_OPB_SIZE);
+    memory_region_add_subregion(&lpc->opb_mr, LPC_IO_OPB_ADDR,
+                                &lpc->opb_isa_io);
+    memory_region_init_alias(&lpc->opb_isa_mem, OBJECT(dev), "lpc-isa-mem",
+                             &lpc->isa_mem, 0, LPC_MEM_OPB_SIZE);
+    memory_region_add_subregion(&lpc->opb_mr, LPC_MEM_OPB_ADDR,
+                                &lpc->opb_isa_mem);
+    memory_region_init_alias(&lpc->opb_isa_fw, OBJECT(dev), "lpc-isa-fw",
+                             &lpc->isa_mem, 0, LPC_FW_OPB_SIZE);
+    memory_region_add_subregion(&lpc->opb_mr, LPC_FW_OPB_ADDR,
+                                &lpc->opb_isa_fw);
+
+
+    /* Create MMIO regions for LPC HC and OPB registers */
+    memory_region_init_io(&lpc->opb_master_regs, OBJECT(dev), &opb_master_ops,
+                          lpc, "lpc-opb-master", LPC_OPB_REGS_OPB_SIZE);
+    memory_region_add_subregion(&lpc->opb_mr, LPC_OPB_REGS_OPB_ADDR,
+                                &lpc->opb_master_regs);
+    memory_region_init_io(&lpc->lpc_hc_regs, OBJECT(dev), &lpc_hc_ops, lpc,
+                          "lpc-hc", LPC_HC_REGS_OPB_SIZE);
+    memory_region_add_subregion(&lpc->opb_mr, LPC_HC_REGS_OPB_ADDR,
+                                &lpc->lpc_hc_regs);
+
+    /* Instanciate ISA bus */
+    lpc->isa_bus = isa_bus_new(dev, &lpc->isa_mem, &lpc->isa_io);
+
+    /* Not all variants have a working serial irq decoder. If not,
+     * handling of LPC interrupts becomes a platform issue (some
+     * platforms have a CPLD to do it).
+     */
+    if (lpc->has_serirq) {
+        isa_bus_irqs(lpc->isa_bus,
+                     qemu_allocate_irqs(pnv_lpc_isa_irq_handler, lpc, 16));
+    }
+}
+
+void pnv_lpc_create(PnvChip *chip, bool has_serirq)
+{
+    struct DeviceState *dev;
+    PnvLpcController *lpc;
+
+    dev = qdev_create(&chip->xscom->bus, TYPE_PNV_LPC_CONTROLLER);
+    lpc = PNV_LPC_CONTROLLER(dev);
+    lpc->has_serirq = has_serirq;
+    qdev_init_nofail(dev);
+    chip->lpc = lpc;
+    chip->lpc_bus = lpc->isa_bus;
+}
+
+static void pnv_lpc_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    XScomDeviceClass *k = XSCOM_DEVICE_CLASS(klass);
+    static const char *compat[] = { "ibm,power8-lpc", NULL };
+
+    k->devnode = pnv_lpc_devnode;
+    k->read = pnv_lpc_xscom_read;
+    k->write = pnv_lpc_xscom_write;
+    k->dt_name = "isa";
+    k->dt_compatible = compat;
+
+    dc->realize = pnv_lpc_realize;
+}
+
+static const TypeInfo pnv_lpc_info = {
+    .name          = TYPE_PNV_LPC_CONTROLLER,
+    .parent        = TYPE_XSCOM_DEVICE,
+    .instance_size = sizeof(PnvLpcController),
+    .class_init    = pnv_lpc_class_init,
+};
+
+static void pnv_lpc_register_types(void)
+{
+    type_register_static(&pnv_lpc_info);
+}
+
+type_init(pnv_lpc_register_types)
+
diff --git a/include/hw/ppc/pnv.h b/include/hw/ppc/pnv.h
index 80617b4..77b809a 100644
--- a/include/hw/ppc/pnv.h
+++ b/include/hw/ppc/pnv.h
@@ -21,12 +21,16 @@ 
 
 #include "hw/hw.h"
 typedef struct XScomBus XScomBus;
+typedef struct ISABus ISABus;
+typedef struct PnvLpcController PnvLpcController;
 typedef struct XICSState XICSState;
 
 /* Should we turn that into a QOjb of some sort ? */
 typedef struct PnvChip {
     uint32_t         chip_id;
     XScomBus         *xscom;
+    PnvLpcController *lpc;
+    ISABus           *lpc_bus;
 } PnvChip;
 
 typedef struct PnvSystem {
@@ -36,5 +40,6 @@  typedef struct PnvSystem {
     PnvChip   chips[PNV_MAX_CHIPS];
 } PnvSystem;
 
+extern void pnv_lpc_create(PnvChip *chip, bool has_serirq);
 #endif /* _HW_PNV_LPC_H */