Reproducible crash on PCIe hotplug

Message ID 20161212202617-mutt-send-email-mst@kernel.org
State New

Commit Message

Michael S. Tsirkin Dec. 12, 2016, 6:41 p.m. UTC
On Mon, Dec 12, 2016 at 05:29:15PM +0000, Stefan Hajnoczi wrote:
> On Mon, Dec 12, 2016 at 01:34:05PM +0800, Cao jin wrote:
> > 
> > 
> > On 12/10/2016 04:39 AM, Eduardo Habkost wrote:
> > > Using latest qemu.git master:
> > > 
> > >   $ qemu-system-x86_64 -machine q35 -readconfig docs/q35-chipset.cfg -monitor stdio
> > >   QEMU 2.7.93 monitor - type 'help' for more information
> > >   (qemu) device_add e1000e,bus=ich9-pcie-port-4,addr=00
> > >   (qemu) device_add e1000e,bus=ich9-pcie-port-4,addr=08
> > >   Segmentation fault (core dumped)
> > > 
> > > It crashes at:
> > > 
> > >   #7  0x000055555598d7dc in do_pci_register_device (errp=0x7fffffffbfd0, devfn=64, name=0x5555565df340 "e1000e", bus=0x555558487380, pci_dev=0x5555589cd000)
> > >       at /home/ehabkost/rh/proj/virt/qemu/hw/pci/pci.c:983
> > >   983             error_setg(errp, "PCI: slot %d function 0 already ocuppied by %s,"
> > >   (gdb) l
> > >   978                        PCI_SLOT(devfn), PCI_FUNC(devfn), name,
> > >   979                        bus->devices[devfn]->name);
> > >   980             return NULL;
> > >   981         } else if (dev->hotplugged &&
> > >   982                    pci_get_function_0(pci_dev)) {
> > >   983             error_setg(errp, "PCI: slot %d function 0 already ocuppied by %s,"
> > >   984                        " new func %s cannot be exposed to guest.",
> > >   985                        PCI_SLOT(devfn),
> > >   986                        bus->devices[PCI_DEVFN(PCI_SLOT(devfn), 0)]->name,
> > >   987                        name);
> > > 
> > 
> > Thanks for informing me. I am kind of busy for now, so I suppose I will
> > investigate it after 2.8 release.
> 
> Please let me know if this should be considered a release blocker.
> 
> The proposed QEMU 2.8 release date is tomorrow (December 13th)!
> 
> Stefan

I don't see how it's a blocker, it's an illegal configuration.
Here's the fix. It's a rather obvious one.
I'll target the fix for 2.9.
Eduardo, I'd appreciate a tested-by tag.

-->

pci: fix error message for express slots

A PCI Express downstream slot has a single PCI slot
behind it, so using PCI_DEVFN(PCI_SLOT(devfn), 0)
does not give you function 0 in cases such as ARI,
or in some error cases.

This is exactly what we are hitting:
   $ qemu-system-x86_64 -machine q35 -readconfig docs/q35-chipset.cfg -monitor stdio
   (qemu) device_add e1000e,bus=ich9-pcie-port-4,addr=00
   (qemu) device_add e1000e,bus=ich9-pcie-port-4,addr=08
   Segmentation fault (core dumped)

The fix is to use the pci_get_function_0 API.

Cc: qemu-stable@nongnu.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reported-by: Eduardo Habkost <ehabkost@redhat.com>
---
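
The lookup problem described in the commit message can be sketched in isolation. This is a simplified model and not QEMU code: ModelDevice, ModelBus, single_slot, lookup_by_slot and model_get_function_0 are hypothetical names, and the single-slot branch is an assumption about what pci_get_function_0 does behind an Express port. The macros use the standard devfn layout (5 slot bits, 3 function bits).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard devfn encoding: bits 7..3 are the slot, bits 2..0 the function. */
#define PCI_SLOT(devfn)        (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)        ((devfn) & 0x07)
#define PCI_DEVFN(slot, func)  ((((slot) & 0x1f) << 3) | ((func) & 0x07))

/* Hypothetical stand-ins for PCIDevice and PCIBus, reduced to the lookup. */
typedef struct ModelDevice {
    const char *name;
} ModelDevice;

typedef struct ModelBus {
    ModelDevice *devices[256];
    int single_slot;  /* assumption: nonzero models a PCIe downstream port */
} ModelBus;

/* Old lookup: index by the slot of the device being added.
 * Returns NULL when nothing sits at function 0 of that slot. */
static ModelDevice *lookup_by_slot(const ModelBus *bus, uint8_t devfn)
{
    assert(bus != NULL);
    return bus->devices[PCI_DEVFN(PCI_SLOT(devfn), 0)];
}

/* Assumed behaviour of the helper: behind a single-slot port the only
 * possible function 0 is devices[0], whatever devfn was requested. */
static ModelDevice *model_get_function_0(const ModelBus *bus, uint8_t devfn)
{
    assert(bus != NULL);
    if (bus->single_slot) {
        return bus->devices[0];
    }
    return lookup_by_slot(bus, devfn);
}
```

With one device at devfn 0 and a second one requested at devfn 64 (addr=08), lookup_by_slot() indexes entry 64, which is empty, so dereferencing ->name on the result is the NULL access seen in the backtrace. model_get_function_0() returns the existing device instead.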

Comments

Eduardo Habkost Dec. 12, 2016, 6:57 p.m. UTC | #1
On Mon, Dec 12, 2016 at 08:41:41PM +0200, Michael S. Tsirkin wrote:
> On Mon, Dec 12, 2016 at 05:29:15PM +0000, Stefan Hajnoczi wrote:
> > On Mon, Dec 12, 2016 at 01:34:05PM +0800, Cao jin wrote:
> > > 
> > > 
> > > On 12/10/2016 04:39 AM, Eduardo Habkost wrote:
> > > > Using latest qemu.git master:
> > > > 
> > > >   $ qemu-system-x86_64 -machine q35 -readconfig docs/q35-chipset.cfg -monitor stdio
> > > >   QEMU 2.7.93 monitor - type 'help' for more information
> > > >   (qemu) device_add e1000e,bus=ich9-pcie-port-4,addr=00
> > > >   (qemu) device_add e1000e,bus=ich9-pcie-port-4,addr=08
> > > >   Segmentation fault (core dumped)
> > > > 
> > > > It crashes at:
> > > > 
> > > >   #7  0x000055555598d7dc in do_pci_register_device (errp=0x7fffffffbfd0, devfn=64, name=0x5555565df340 "e1000e", bus=0x555558487380, pci_dev=0x5555589cd000)
> > > >       at /home/ehabkost/rh/proj/virt/qemu/hw/pci/pci.c:983
> > > >   983             error_setg(errp, "PCI: slot %d function 0 already ocuppied by %s,"
> > > >   (gdb) l
> > > >   978                        PCI_SLOT(devfn), PCI_FUNC(devfn), name,
> > > >   979                        bus->devices[devfn]->name);
> > > >   980             return NULL;
> > > >   981         } else if (dev->hotplugged &&
> > > >   982                    pci_get_function_0(pci_dev)) {
> > > >   983             error_setg(errp, "PCI: slot %d function 0 already ocuppied by %s,"
> > > >   984                        " new func %s cannot be exposed to guest.",
> > > >   985                        PCI_SLOT(devfn),
> > > >   986                        bus->devices[PCI_DEVFN(PCI_SLOT(devfn), 0)]->name,
> > > >   987                        name);
> > > > 
> > > 
> > > Thanks for informing me. I am kind of busy for now, so I suppose I will
> > > investigate it after 2.8 release.
> > 
> > Please let me know if this should be considered a release blocker.
> > 
> > The proposed QEMU 2.8 release date is tomorrow (December 13th)!
> > 
> > Stefan
> 
> I don't see how it's a blocker, it's an illegal configuration.
> Here's the fix. It's a rather obvious one.
> I'll target the fix for 2.9.
> Eduardo, I'd appreciate a tested-by tag.

I confirm the patch fixes the crash, but the error message seems
incorrect: the existing e1000e device is on slot 0 function 0,
not slot 8.

  $ ./x86-kvm-build/x86_64-softmmu/qemu-system-x86_64 -machine q35 -readconfig docs/q35-chipset.cfg -monitor stdio
  QEMU 2.7.93 monitor - type 'help' for more information
  (qemu) device_add e1000e,bus=ich9-pcie-port-4,addr=00
  (qemu) device_add e1000e,bus=ich9-pcie-port-4,addr=08
  PCI: slot 8 function 0 already ocuppied by e1000e, new func e1000e cannot be exposed to guest.
           ^^^

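A minimal sketch of why the slot number comes out as 8: the message formats PCI_SLOT() of the devfn requested for the new device (64), not the devfn of the device that already occupies function 0 (0). format_conflict and its devfn_for_slot parameter are hypothetical names used only for illustration; this is not part of the posted patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Standard devfn encoding: bits 7..3 are the slot number. */
#define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)

/* Hypothetical helper: builds the same message text as pci.c (spelling kept
 * as in the source), with the slot taken from devfn_for_slot. */
static int format_conflict(char *buf, size_t len, int devfn_for_slot,
                           const char *occupant, const char *new_name)
{
    return snprintf(buf, len,
                    "PCI: slot %d function 0 already ocuppied by %s,"
                    " new func %s cannot be exposed to guest.",
                    PCI_SLOT(devfn_for_slot), occupant, new_name);
}
```

Passing the new device's devfn (64) reproduces the "slot 8" text above; passing the occupying device's devfn (0) yields "slot 0", which matches where the first e1000e actually sits.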

> 
> -->
> 
> pci: fix error message for express slots
> 
> A PCI Express downstream slot has a single PCI slot
> behind it, so using PCI_DEVFN(PCI_SLOT(devfn), 0)
> does not give you function 0 in cases such as ARI,
> or in some error cases.
> 
> This is exactly what we are hitting:
>    $ qemu-system-x86_64 -machine q35 -readconfig docs/q35-chipset.cfg -monitor stdio
>    (qemu) device_add e1000e,bus=ich9-pcie-port-4,addr=00
>    (qemu) device_add e1000e,bus=ich9-pcie-port-4,addr=08
>    Segmentation fault (core dumped)
> 
> The fix is to use the pci_get_function_0 API.
> 
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Reported-by: Eduardo Habkost <ehabkost@redhat.com>
> ---
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 24fae16..339c531 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -983,7 +983,7 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
>          error_setg(errp, "PCI: slot %d function 0 already ocuppied by %s,"
>                     " new func %s cannot be exposed to guest.",
>                     PCI_SLOT(devfn),
> -                   bus->devices[PCI_DEVFN(PCI_SLOT(devfn), 0)]->name,
> +                   pci_get_function_0(pci_dev)->name,
>                     name);
>  
>         return NULL;
> 
> -- 
> MST

Patch

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 24fae16..339c531 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -983,7 +983,7 @@  static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
         error_setg(errp, "PCI: slot %d function 0 already ocuppied by %s,"
                    " new func %s cannot be exposed to guest.",
                    PCI_SLOT(devfn),
-                   bus->devices[PCI_DEVFN(PCI_SLOT(devfn), 0)]->name,
+                   pci_get_function_0(pci_dev)->name,
                    name);
 
        return NULL;