2.0 regression: loadvm assertion with ehci + tablet
diff mbox

Message ID 5339B077.4040707@redhat.com
State New
Headers show

Commit Message

Cole Robinson March 31, 2014, 6:14 p.m. UTC
On 03/30/2014 06:25 PM, Andreas Färber wrote:
> Hi,
> 
> Am 30.03.2014 22:27, schrieb Cole Robinson:
>> With git master, loadvm hits an assert failure if using ehci and usb tablet.
>> Steps to reproduce:
>>
>> $ qemu-img create -f qcow2 foo.qcow2 10G
>> $ ./x86_64-softmmu/qemu-system-x86_64 \
>>   -enable-kvm -m 4096 \
>>   -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 \
>>   -device
>> ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 \
>>   -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 \
>>   -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 \
>>   -device usb-tablet,id=input0 \
>>   -hda foo.qcow2 \
>>   -cdrom Fedora-20-x86_64-Live-Desktop.iso \
>>   -boot d -monitor stdio
>>
>> <wait until guest boot to 'welcome to fedora' window>
>> (qemu) savevm foo
>> (qemu) loadvm foo
>> qemu-system-x86_64: hw/pci/pci.c:250: pcibus_reset: Assertion
>> `bus->irq_count[i] == 0' failed.
>>
>> The relevant backtrace bits for the assertion:
>>
>> #4  0x00007f8f7241971e in pcibus_reset (qbus=0x7f8f74082fd0)
>>     at hw/pci/pci.c:250
>> #5  0x00007f8f723bd36d in qbus_reset_one (bus=0x7f8f74082fd0,
>>     opaque=<optimized out>) at hw/core/qdev.c:249
>> #6  0x00007f8f723bec88 in qdev_walk_children (dev=0x7f8f73efb320,
>>     pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x7f8f723bf4f0 <qdev_reset_one>,
>>     post_busfn=0x7f8f723bd320 <qbus_reset_one>, opaque=0x0)
>>     at hw/core/qdev.c:403
>> #7  0x00007f8f723bedb8 in qbus_walk_children (bus=0x7f8f740706e0,
>>     pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x7f8f723bf4f0 <qdev_reset_one>,
>>     post_busfn=0x7f8f723bd320 <qbus_reset_one>, opaque=0x0)
>>     at hw/core/qdev.c:369
>> #8  0x00007f8f724f5c5d in qemu_devices_reset () at vl.c:1867
>> #9  qemu_system_reset (report=report@entry=false) at vl.c:1880
>> #10 0x00007f8f7256dba2 in load_vmstate (name=name@entry=0x7f8f7417a160 "foo")
>>     at /home/crobinso/src/qemu/savevm.c:1098
>>
>> The 'cause' is this:
>>
>> #0  ehci_detach (port=0x555556436968) at hw/usb/hcd-ehci.c:810
>> #1  0x0000555555727b5e in usb_detach (port=port@entry=0x555556436968)
>>     at hw/usb/core.c:49
>> #2  0x0000555555736bf3 in ehci_reset (opaque=0x5555564364d8)
>>     at hw/usb/hcd-ehci.c:941
>> #3  0x00005555557e1fcd in qemu_devices_reset () at vl.c:1867
>> #4  qemu_system_reset (report=report@entry=false) at vl.c:1880
>> #5  0x0000555555859f12 in load_vmstate (name=name@entry=0x555556458210 "foo")
>>     at /home/crobinso/src/qemu/savevm.c:1098
>>
>> ehci_reset calls usb_detach which sets pcibus->irq_count[3] = 1. pcibus_reset
>> runs and hits the assertion. But I don't understand this stuff enough to
>> determine what's actually wrong here :)
>>
>> I bisected the issue to:
>>
>> commit 31b030d4abc5bea89c2b33b39d3b302836f6b6ee
>> Author: Andreas Färber <afaerber@suse.de>
>> Date:   Wed Sep 4 01:29:02 2013 +0200
>>
>>     cputlb: Change tlb_flush_page() argument to CPUState
>>
>>     Signed-off-by: Andreas Färber <afaerber@suse.de>
>>
>> ...and then I double checked it since that sounds unrelated. Same result.
> 
> You are running into an unrelated migration bug:
> http://git.qemu.org/?p=qemu.git;a=commit;h=c01a71c1a56fa27f43449ff59e5d03b2483658a2
> 
> Sorry about that. You'll need to patch -p1 the above commit on top of
> each git-bisect commit to find the actual breakage if the above commit
> is already bad (can't test right now).
> 

Indeed, that seemed to be messing up my search, thanks.

So the real culprit is:

commit 9bdbbfc3a04c28dc43af5afffb32066623cb0022
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Fri Dec 6 17:54:25 2013 +0100

    pci: clean up resetting of IRQs

    pci_device_reset will deassert the INTX pins, and this will make the
    irq_count array all-zeroes.  Check that this is the case, and remove
    the existing loop which might even unsync irq_count and irq_state.

Which is what adds the assert.

Looking at pci_device_reset, there is an issue:

     dev->irq_state = 0;
     pci_update_irq_status(dev);
     pci_device_deassert_intx(dev);

irq_state is cleared before pci_device_deassert_intx. But tries to clear all
irqs via pci_irq_handler, but that function will exit without taking any
action if the requested irq level matches what we already track in irq_state.
Since irq_state is 0, pci_device_deassert_intx is basically a no-op. Any
interrupts with level=1 will not be cleared, which is the case with the usb
tablet after usb_detach.

This fixes things for me, but I have no idea if it's the proper fix:


- Cole

Comments

Paolo Bonzini March 31, 2014, 6:18 p.m. UTC | #1
Il 31/03/2014 20:14, Cole Robinson ha scritto:
> irq_state is cleared before pci_device_deassert_intx. But tries to clear all
> irqs via pci_irq_handler, but that function will exit without taking any
> action if the requested irq level matches what we already track in irq_state.
> Since irq_state is 0, pci_device_deassert_intx is basically a no-op. Any
> interrupts with level=1 will not be cleared, which is the case with the usb
> tablet after usb_detach.

Thanks for the analysis.  It's my bug indeed.

> This fixes things for me, but I have no idea if it's the proper fix:
>
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 8f722dd..1912dfb 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -189,9 +189,9 @@ static void pci_do_device_reset(PCIDevice *dev)
>  {
>      int r;
>
> +    pci_device_deassert_intx(dev);
>      dev->irq_state = 0;
>      pci_update_irq_status(dev);
> -    pci_device_deassert_intx(dev);
>      /* Clear all writable bits */
>      pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
>                                   pci_get_word(dev->wmask + PCI_COMMAND) |

Yes, the patch is fine.  Even better, dev->irq_state = 0 could become an 
assertion too, and the call to pci_update_irq_status is not necessary at 
all.

Paolo

Patch
diff mbox

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 8f722dd..1912dfb 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -189,9 +189,9 @@  static void pci_do_device_reset(PCIDevice *dev)
 {
     int r;

+    pci_device_deassert_intx(dev);
     dev->irq_state = 0;
     pci_update_irq_status(dev);
-    pci_device_deassert_intx(dev);
     /* Clear all writable bits */
     pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
                                  pci_get_word(dev->wmask + PCI_COMMAND) |