Patchwork [v2,07/11] versatile_pci: Implement the correct PCI IRQ mapping

login
register
mail settings
Submitter Peter Maydell
Date March 26, 2013, 10:22 a.m.
Message ID <1364293331-8722-8-git-send-email-peter.maydell@linaro.org>
Download mbox | patch
Permalink /patch/231174/
State New
Headers show

Comments

Peter Maydell - March 26, 2013, 10:22 a.m.
Implement the correct IRQ mapping for the Versatile PCI controller; it
differs between realview and versatile boards, but the previous QEMU
implementation was correct only for the first PCI card on a versatile
board, since we weren't swizzling IRQs based on the slot number.

Since this change would otherwise break any uses of PCI on Linux kernels
which have an equivalent bug (since they have effectively only been
tested against QEMU, not real hardware), we implement a mechanism
for automatically detecting those broken kernels and switching back
to the old mapping. This works by looking at the values the kernel
writes to the PCI_INTERRUPT_LINE register in the config space, which
is effectively the interrupt number the kernel expects the device
to be using. If this doesn't match reality we use the broken mapping.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/versatile_pci.c |   85 ++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 79 insertions(+), 6 deletions(-)
Arnd Bergmann - March 26, 2013, 10:54 a.m.
On Tuesday 26 March 2013, Peter Maydell wrote:
> 
> Implement the correct IRQ mapping for the Versatile PCI controller; it
> differs between realview and versatile boards, but the previous QEMU
> implementation was correct only for the first PCI card on a versatile
> board, since we weren't swizzling IRQs based on the slot number.
> 
> Since this change would otherwise break any uses of PCI on Linux kernels
> which have an equivalent bug (since they have effectively only been
> tested against QEMU, not real hardware), we implement a mechanism
> for automatically detecting those broken kernels and switching back
> to the old mapping. This works by looking at the values the kernel
> writes to the PCI_INTERRUPT_LINE register in the config space, which
> is effectively the interrupt number the kernel expects the device
> to be using. If this doesn't match reality we use the broken mapping.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Yes, very good.  We will probably introduce sparse irq support on
versatile in the near future, and then the value we write into the
PCI_INTERRUPT_LINE field will become arbitrary from qemu's point
of view, but I will make sure that we fix the interrupt mapping
in the kernel at the same time so we always fall into the
"s->broken_irq_mapping = false;" case.

We also need to find a way to make the new kernel work with
an old qemu, and I think we can do that by using the versatile-dt
board type with a PCI device node that sets all four lines to
27, while using the actual interrupt lines for the default
versatile device tree.

	Arnd
Peter Maydell - March 26, 2013, 11 a.m.
On 26 March 2013 10:54, Arnd Bergmann <arnd@arndb.de> wrote:
> Yes, very good.  We will probably introduce sparse irq support on
> versatile in the near future, and then the value we write into the
> PCI_INTERRUPT_LINE field will become arbitrary from qemu's point
> of view, but I will make sure that we fix the interrupt mapping
> in the kernel at the same time so we always fall into the
> "s->broken_irq_mapping = false;" case.

Yeah, as long as you avoid the number 27 you're ok :-)

> We also need to find a way to make the new kernel work with
> an old qemu, and I think we can do that by using the versatile-dt
> board type with a PCI device node that sets all four lines to
> 27, while using the actual interrupt lines for the default
> versatile device tree.

Personally I'd be happy for you to just say "needs a new QEMU".
The broken QEMU is missing so much (including working memory
windows) that I think it would be a pain to get the kernel to
cope with it.

-- PMM
Arnd Bergmann - March 26, 2013, 11:08 a.m.
On Tuesday 26 March 2013, Peter Maydell wrote:
> 
> On 26 March 2013 10:54, Arnd Bergmann <arnd@arndb.de> wrote:
> > Yes, very good.  We will probably introduce sparse irq support on
> > versatile in the near future, and then the value we write into the
> > PCI_INTERRUPT_LINE field will become arbitrary from qemu's point
> > of view, but I will make sure that we fix the interrupt mapping
> > in the kernel at the same time so we always fall into the
> > "s->broken_irq_mapping = false;" case.
> 
> Yeah, as long as you avoid the number 27 you're ok :-)

Good point. I guess we'll have to keep using a legacy domain for
versatile then.

> > We also need to find a way to make the new kernel work with
> > an old qemu, and I think we can do that by using the versatile-dt
> > board type with a PCI device node that sets all four lines to
> > 27, while using the actual interrupt lines for the default
> > versatile device tree.
> 
> Personally I'd be happy for you to just say "needs a new QEMU".
> The broken QEMU is missing so much (including working memory
> windows) that I think it would be a pain to get the kernel to
> cope with it.

But it was working earlier, so I'd definitely try not to break
if at all possible. A lot of people use the verstatile qemu
model to run kernels and I would not want to deal with the
complaints I'd get if we break those. Using a separate dts
file seems easy enough.

	Arnd
Peter Maydell - March 26, 2013, 11:17 a.m.
On 26 March 2013 11:08, Arnd Bergmann <arnd@arndb.de> wrote:
> On Tuesday 26 March 2013, Peter Maydell wrote:
>> On 26 March 2013 10:54, Arnd Bergmann <arnd@arndb.de> wrote:
>> > Yes, very good.  We will probably introduce sparse irq support on
>> > versatile in the near future, and then the value we write into the
>> > PCI_INTERRUPT_LINE field will become arbitrary from qemu's point
>> > of view, but I will make sure that we fix the interrupt mapping
>> > in the kernel at the same time so we always fall into the
>> > "s->broken_irq_mapping = false;" case.
>>
>> Yeah, as long as you avoid the number 27 you're ok :-)
>
> Good point. I guess we'll have to keep using a legacy domain for
> versatile then.

I'm happy to provide some other way for QEMU to detect a
new working kernel if you want to implement one in the
kernel, if that's cleaner.

>> > We also need to find a way to make the new kernel work with
>> > an old qemu, and I think we can do that by using the versatile-dt
>> > board type with a PCI device node that sets all four lines to
>> > 27, while using the actual interrupt lines for the default
>> > versatile device tree.
>>
>> Personally I'd be happy for you to just say "needs a new QEMU".
>> The broken QEMU is missing so much (including working memory
>> windows) that I think it would be a pain to get the kernel to
>> cope with it.
>
> But it was working earlier, so I'd definitely try not to break
> if at all possible. A lot of people use the verstatile qemu
> model to run kernels and I would not want to deal with the
> complaints I'd get if we break those. Using a separate dts
> file seems easy enough.

Yeah, but it only worked earlier because the kernel PCI controller
support was so broken and limited, I think. If you run a new
kernel with the old QEMU it won't work even if you avoid the
irq-mapping issues.

Also I really don't want to get people into the habit of
using a QEMU-specific dts file. The result will be that
nobody will ever move on from that dts file to the one that
gets used with real hardware.

Plus, you guys make kernel changes that break running on
QEMU all the time, so I don't see that as a big deal really.
(Most recently, vexpress needed a pile of support for the
config registers that define the voltage regulators and clocks.)

-- PMM
Michael S. Tsirkin - March 26, 2013, 9:12 p.m.
On Tue, Mar 26, 2013 at 11:17:55AM +0000, Peter Maydell wrote:
> On 26 March 2013 11:08, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Tuesday 26 March 2013, Peter Maydell wrote:
> >> On 26 March 2013 10:54, Arnd Bergmann <arnd@arndb.de> wrote:
> >> > Yes, very good.  We will probably introduce sparse irq support on
> >> > versatile in the near future, and then the value we write into the
> >> > PCI_INTERRUPT_LINE field will become arbitrary from qemu's point
> >> > of view, but I will make sure that we fix the interrupt mapping
> >> > in the kernel at the same time so we always fall into the
> >> > "s->broken_irq_mapping = false;" case.
> >>
> >> Yeah, as long as you avoid the number 27 you're ok :-)
> >
> > Good point. I guess we'll have to keep using a legacy domain for
> > versatile then.
> 
> I'm happy to provide some other way for QEMU to detect a
> new working kernel if you want to implement one in the
> kernel, if that's cleaner.

That's why I suggested writing some number at offset 0x08
(that's revision ID) in configuration space of device 0 on bus 0.
It's guaranteed harmless on real hardware and QEMU could detect this
write and say "aha new kernel, even though it uses #27".

> >> > We also need to find a way to make the new kernel work with
> >> > an old qemu, and I think we can do that by using the versatile-dt
> >> > board type with a PCI device node that sets all four lines to
> >> > 27, while using the actual interrupt lines for the default
> >> > versatile device tree.
> >>
> >> Personally I'd be happy for you to just say "needs a new QEMU".
> >> The broken QEMU is missing so much (including working memory
> >> windows) that I think it would be a pain to get the kernel to
> >> cope with it.
> >
> > But it was working earlier, so I'd definitely try not to break
> > if at all possible. A lot of people use the verstatile qemu
> > model to run kernels and I would not want to deal with the
> > complaints I'd get if we break those. Using a separate dts
> > file seems easy enough.
> 
> Yeah, but it only worked earlier because the kernel PCI controller
> support was so broken and limited, I think. If you run a new
> kernel with the old QEMU it won't work even if you avoid the
> irq-mapping issues.
> 
> Also I really don't want to get people into the habit of
> using a QEMU-specific dts file. The result will be that
> nobody will ever move on from that dts file to the one that
> gets used with real hardware.
> 
> Plus, you guys make kernel changes that break running on
> QEMU all the time, so I don't see that as a big deal really.
> (Most recently, vexpress needed a pile of support for the
> config registers that define the voltage regulators and clocks.)
> 
> -- PMM

I have to agree with that.
Peter Maydell - March 28, 2013, 10:28 a.m.
On 26 March 2013 21:12, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Tue, Mar 26, 2013 at 11:17:55AM +0000, Peter Maydell wrote:
>> On 26 March 2013 11:08, Arnd Bergmann <arnd@arndb.de> wrote:
>> > On Tuesday 26 March 2013, Peter Maydell wrote:
>> >> On 26 March 2013 10:54, Arnd Bergmann <arnd@arndb.de> wrote:
>> >> > Yes, very good.  We will probably introduce sparse irq support on
>> >> > versatile in the near future, and then the value we write into the
>> >> > PCI_INTERRUPT_LINE field will become arbitrary from qemu's point
>> >> > of view, but I will make sure that we fix the interrupt mapping
>> >> > in the kernel at the same time so we always fall into the
>> >> > "s->broken_irq_mapping = false;" case.
>> >>
>> >> Yeah, as long as you avoid the number 27 you're ok :-)
>> >
>> > Good point. I guess we'll have to keep using a legacy domain for
>> > versatile then.
>>
>> I'm happy to provide some other way for QEMU to detect a
>> new working kernel if you want to implement one in the
>> kernel, if that's cleaner.
>
> That's why I suggested writing some number at offset 0x08
> (that's revision ID) in configuration space of device 0 on bus 0.
> It's guaranteed harmless on real hardware and QEMU could detect this
> write and say "aha new kernel, even though it uses #27".

OK, that makes sense. Would somebody like to provide a kernel
patch (preferably against 2.6.35+Arnd's patchset :-)) which
just prods that ID space, so I have something to test the
QEMU end against?

thanks
-- PMM
Michael S. Tsirkin - April 4, 2013, 10:58 a.m.
On Thu, Apr 04, 2013 at 12:05:51PM +0100, Peter Maydell wrote:
> On 28 March 2013 10:28, Peter Maydell <peter.maydell@linaro.org> wrote:
> > On 26 March 2013 21:12, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> On Tue, Mar 26, 2013 at 11:17:55AM +0000, Peter Maydell wrote:
> >>> I'm happy to provide some other way for QEMU to detect a
> >>> new working kernel if you want to implement one in the
> >>> kernel, if that's cleaner.
> >>
> >> That's why I suggested writing some number at offset 0x08
> >> (that's revision ID) in configuration space of device 0 on bus 0.
> >> It's guaranteed harmless on real hardware and QEMU could detect this
> >> write and say "aha new kernel, even though it uses #27".
> >
> > OK, that makes sense.
> 
> In fact I've just realised that "allow new kernels to use #27
> without getting the back-compat broken irq mapping" conflicts
> with "allow new kernels to kexec old broken kernels".

Could you clarify why, pls?  Also - does this really matter so much?
kexec between version is always a risky affair...

> So the
> easiest approach is just to drop the code which allows us to
> switch back to broken mode again. Then the new kernel's init
> code can force non-broken mode by writing something other than
> #27 to some device's PCI_INTERRUPT_LINE register, and then
> is free to use #27 if it wants to.
> 
> -- PMM

I'm fine with this too.
Peter Maydell - April 4, 2013, 11:05 a.m.
On 28 March 2013 10:28, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 26 March 2013 21:12, Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Tue, Mar 26, 2013 at 11:17:55AM +0000, Peter Maydell wrote:
>>> I'm happy to provide some other way for QEMU to detect a
>>> new working kernel if you want to implement one in the
>>> kernel, if that's cleaner.
>>
>> That's why I suggested writing some number at offset 0x08
>> (that's revision ID) in configuration space of device 0 on bus 0.
>> It's guaranteed harmless on real hardware and QEMU could detect this
>> write and say "aha new kernel, even though it uses #27".
>
> OK, that makes sense.

In fact I've just realised that "allow new kernels to use #27
without getting the back-compat broken irq mapping" conflicts
with "allow new kernels to kexec old broken kernels". So the
easiest approach is just to drop the code which allows us to
switch back to broken mode again. Then the new kernel's init
code can force non-broken mode by writing something other than
#27 to some device's PCI_INTERRUPT_LINE register, and then
is free to use #27 if it wants to.

-- PMM
Peter Maydell - April 4, 2013, 12:16 p.m.
On 4 April 2013 11:58, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Thu, Apr 04, 2013 at 12:05:51PM +0100, Peter Maydell wrote:
>> On 28 March 2013 10:28, Peter Maydell <peter.maydell@linaro.org> wrote:
>> > On 26 March 2013 21:12, Michael S. Tsirkin <mst@redhat.com> wrote:
>> >> On Tue, Mar 26, 2013 at 11:17:55AM +0000, Peter Maydell wrote:
>> >>> I'm happy to provide some other way for QEMU to detect a
>> >>> new working kernel if you want to implement one in the
>> >>> kernel, if that's cleaner.
>> >>
>> >> That's why I suggested writing some number at offset 0x08
>> >> (that's revision ID) in configuration space of device 0 on bus 0.
>> >> It's guaranteed harmless on real hardware and QEMU could detect this
>> >> write and say "aha new kernel, even though it uses #27".
>> >
>> > OK, that makes sense.
>>
>> In fact I've just realised that "allow new kernels to use #27
>> without getting the back-compat broken irq mapping" conflicts
>> with "allow new kernels to kexec old broken kernels".
>
> Could you clarify why, pls?

Either we provide a path for QEMU's state to go from "this
is a new kernel and we act like hardware" to "this is an old
kernel and we do not", or we don't. If we provide a way for
that state transition to occur, then new kernels can't use
#27; if we don't, then the kexec'd old kernel wouldn't see
the old broken mapping it expects. I only stuck the "allow
us to go back to the old mapping" code in there because I
didn't see why it would hurt at the time.

>  Also - does this really matter so much?
> kexec between version is always a risky affair...

I agree, which is why I'm dropping that idea.

-- PMM
Peter Maydell - May 14, 2013, 11:45 a.m.
On 26 March 2013 11:08, Arnd Bergmann <arnd@arndb.de> wrote:
> On Tuesday 26 March 2013, Peter Maydell wrote:
>>
>> On 26 March 2013 10:54, Arnd Bergmann <arnd@arndb.de> wrote:
>> > Yes, very good.  We will probably introduce sparse irq support on
>> > versatile in the near future, and then the value we write into the
>> > PCI_INTERRUPT_LINE field will become arbitrary from qemu's point
>> > of view, but I will make sure that we fix the interrupt mapping
>> > in the kernel at the same time so we always fall into the
>> > "s->broken_irq_mapping = false;" case.
>>
>> Yeah, as long as you avoid the number 27 you're ok :-)
>
> Good point. I guess we'll have to keep using a legacy domain for
> versatile then.

So there turns out to be a problem with this. Newer
kernels (I'm looking at current mainline master) write
a number other than 27 (in this case 94) to the
PCI_INTERRUPT_LINE register, but they still assume the
old broken IRQ mapping. So our "autodetect busted kernels"
code doesn't work.

I've also discovered that there are some kernels (including
current master) which don't like it when the host PCI bridge
goes at slot 29 like it does on real hardware; so I think
I'll just have to revert that patch.

-- PMM

Patch

diff --git a/hw/versatile_pci.c b/hw/versatile_pci.c
index 576e619..1b56796 100644
--- a/hw/versatile_pci.c
+++ b/hw/versatile_pci.c
@@ -26,6 +26,9 @@  typedef struct {
 
     /* Constant for life of device: */
     int realview;
+
+    /* Variable state: */
+    bool broken_irq_mapping;
 } PCIVPBState;
 
 #define TYPE_VERSATILE_PCI "versatile_pci"
@@ -44,14 +47,37 @@  static inline uint32_t vpb_pci_config_addr(hwaddr addr)
 static void pci_vpb_config_write(void *opaque, hwaddr addr,
                                  uint64_t val, unsigned size)
 {
-    pci_data_write(opaque, vpb_pci_config_addr(addr), val, size);
+    /* Old and buggy versions of QEMU used the wrong mapping from
+     * PCI IRQs to system interrupt lines. Unfortunately the Linux
+     * kernel also had the corresponding bug in setting up interrupts
+     * (so older kernels work on QEMU and not on real hardware).
+     * We automatically detect these broken kernels and flip back
+     * to the broken irq mapping by spotting guest writes to the
+     * PCI_INTERRUPT_LINE register to see where the guest thinks
+     * interrupts are going to be routed.
+     * We allow arbitrary flipping between broken and non broken
+     * to permit corner cases like kexec between two kernels.
+     */
+    PCIVPBState *s = (PCIVPBState *)opaque;
+    if (!s->realview && (addr & 0xff) == PCI_INTERRUPT_LINE) {
+        uint8_t devfn = addr >> 8;
+        if ((PCI_SLOT(devfn) % PCI_NUM_PINS) != 2) {
+            if (val == 27) {
+                s->broken_irq_mapping = true;
+            } else {
+                s->broken_irq_mapping = false;
+            }
+        }
+    }
+    pci_data_write(&s->pci_bus, vpb_pci_config_addr(addr), val, size);
 }
 
 static uint64_t pci_vpb_config_read(void *opaque, hwaddr addr,
                                     unsigned size)
 {
+    PCIVPBState *s = (PCIVPBState *)opaque;
     uint32_t val;
-    val = pci_data_read(opaque, vpb_pci_config_addr(addr), size);
+    val = pci_data_read(&s->pci_bus, vpb_pci_config_addr(addr), size);
     return val;
 }
 
@@ -63,7 +89,47 @@  static const MemoryRegionOps pci_vpb_config_ops = {
 
 static int pci_vpb_map_irq(PCIDevice *d, int irq_num)
 {
-    return irq_num;
+    PCIVPBState *s = container_of(d->bus, PCIVPBState, pci_bus);
+
+    if (s->broken_irq_mapping) {
+        /* Legacy broken IRQ mapping for compatibility with old and
+         * buggy Linux guests
+         */
+        return irq_num;
+    }
+
+    /* Slot to IRQ mapping for RealView Platform Baseboard 926 backplane
+     *      name    slot    IntA    IntB    IntC    IntD
+     *      A       31      IRQ28   IRQ29   IRQ30   IRQ27
+     *      B       30      IRQ27   IRQ28   IRQ29   IRQ30
+     *      C       29      IRQ30   IRQ27   IRQ28   IRQ29
+     * Slot C is for the host bridge; A and B the peripherals.
+     * Our output irqs 0..3 correspond to the baseboard's 27..30.
+     *
+     * This mapping function takes account of an oddity in the PB926
+     * board wiring, where the FPGA's P_nINTA input is connected to
+     * the INTB connection on the board PCI edge connector, P_nINTB
+     * is connected to INTC, and so on, so everything is one number
+     * further round from where you might expect.
+     */
+    return pci_swizzle_map_irq_fn(d, irq_num + 2);
+}
+
+static int pci_vpb_rv_map_irq(PCIDevice *d, int irq_num)
+{
+    /* Slot to IRQ mapping for RealView EB and PB1176 backplane
+     *      name    slot    IntA    IntB    IntC    IntD
+     *      A       31      IRQ50   IRQ51   IRQ48   IRQ49
+     *      B       30      IRQ49   IRQ50   IRQ51   IRQ48
+     *      C       29      IRQ48   IRQ49   IRQ50   IRQ51
+     * Slot C is for the host bridge; A and B the peripherals.
+     * Our output irqs 0..3 correspond to the baseboard's 48..51.
+     *
+     * The PB1176 and EB boards don't have the PB926 wiring oddity
+     * described above; P_nINTA connects to INTA, P_nINTB to INTB
+     * and so on, which is why this mapping function is different.
+     */
+    return pci_swizzle_map_irq_fn(d, irq_num + 3);
 }
 
 static void pci_vpb_set_irq(void *opaque, int irq_num, int level)
@@ -95,13 +161,20 @@  static void pci_vpb_realize(DeviceState *dev, Error **errp)
 {
     PCIVPBState *s = PCI_VPB(dev);
     SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
+    pci_map_irq_fn mapfn;
     int i;
 
     for (i = 0; i < 4; i++) {
         sysbus_init_irq(sbd, &s->irq[i]);
     }
 
-    pci_bus_irqs(&s->pci_bus, pci_vpb_set_irq, pci_vpb_map_irq, s->irq, 4);
+    if (s->realview) {
+        mapfn = pci_vpb_rv_map_irq;
+    } else {
+        mapfn = pci_vpb_map_irq;
+    }
+
+    pci_bus_irqs(&s->pci_bus, pci_vpb_set_irq, mapfn, s->irq, 4);
 
     /* ??? Register memory space.  */
 
@@ -110,10 +183,10 @@  static void pci_vpb_realize(DeviceState *dev, Error **errp)
      * 1 : PCI config window
      * 2 : PCI IO window
      */
-    memory_region_init_io(&s->mem_config, &pci_vpb_config_ops, &s->pci_bus,
+    memory_region_init_io(&s->mem_config, &pci_vpb_config_ops, s,
                           "pci-vpb-selfconfig", 0x1000000);
     sysbus_init_mmio(sbd, &s->mem_config);
-    memory_region_init_io(&s->mem_config2, &pci_vpb_config_ops, &s->pci_bus,
+    memory_region_init_io(&s->mem_config2, &pci_vpb_config_ops, s,
                           "pci-vpb-config", 0x1000000);
     sysbus_init_mmio(sbd, &s->mem_config2);