diff mbox

`qdev_free` when unplug a pci device

Message ID 4D6B0DF8.5000407@cn.fujitsu.com
State New
Headers show

Commit Message

Wen Congyang Feb. 28, 2011, 2:52 a.m. UTC
Hi Markus Armbruster

At 02/23/2011 04:30 PM, Markus Armbruster Write:
> Isaku Yamahata <yamahata@valinux.co.jp> writes:
> 

<snip>

> 
> I don't think this patch is correct.  Let me explain.
> 
> Device hot unplug is *not* guaranteed to succeed.
> 
> For some buses, such as USB, it always succeeds immediately, i.e. when
> the device_del monitor command finishes, the device is gone.  Live is
> good.
> 
> But for PCI, device_del merely initiates the ACPI unplug rain dance.  It
> doesn't wait for the dance to complete.  Why?  The dance can take an
> unpredictable amount of time, including forever.
> 
> Problem: Subsequent device_add can fail if it reuses the qdev ID or PCI
> slot, and the unplug has not yet completed (race condition), or it
> failed.  Yes, Virginia, PCI hotplug *can* fail.
> 
> When unplug succeeds, the qdev is automatically destroyed.
> pciej_write() does that for PIIX4.  Looks like pcie_cap_slot_event()
> does it for PCIE.

I got a similar problem.  When I unplug a pci device by hand, it works
as expected, and I can hotplug it again. But when I use a srcipt to
do the same thing, sometimes it failed. I think I may find another bug.

Steps to reproduce this bug:
1. cat ./test-e1000.sh # RHEL6RC is domain name
   #! /bin/bash

   while true; do
           virsh attach-interface RHEL6RC network default --mac 52:54:00:1f:db:c7 --model e1000
           if [[ $? -ne 0 ]]; then
                   break
           fi
           virsh detach-interface RHEL6RC network --mac 52:54:00:1f:db:c7
           if [[ $? -ne 0 ]]; then
                   break
           fi
           sleep 5
   done

2. ./test-e1000.sh
   Interface attached successfully

   Interface detached successfully

   Interface attached successfully

   Interface detached successfully

   error: Failed to attach interface
   error: operation failed: adding e1000,netdev=hostnet1,id=net1,mac=52:54:00:1f:db:c7,bus=pci.0,addr=0x8 device failed: Duplicate ID 'net1' for device


I use ftrace to trace whether sci interrupt is passed to guest OS, here is log:
# cat trace | grep 'irq=9'
          <idle>-0     [000] 118342.634772: irq_handler_entry: irq=9 name=acpi
          <idle>-0     [000] 118342.634831: irq_handler_exit: irq=9 ret=handled
          <idle>-0     [000] 118342.693216: irq_handler_entry: irq=9 name=acpi
          <idle>-0     [000] 118342.693254: irq_handler_exit: irq=9 ret=handled
          <idle>-0     [000] 118347.737689: irq_handler_entry: irq=9 name=acpi
          <idle>-0     [000] 118347.737738: irq_handler_exit: irq=9 ret=handled
According to the log, we only receive 3 sci interrupt, and one is lost.

I enable piix4's debug and 1 line printf into piix4_device_hotplug:
    printf("slot: %d, up: %d, down: %d, state: %d\n", slot, s->pci0_status.up, s->pci0_status.down, state);
here is the log:

Comments

Isaku Yamahata March 1, 2011, 4:11 a.m. UTC | #1
Hi.

I don't suppose that just introducing pending bits solve the issue.
Your test only use single hot plug/unplug. How about mixing of multiple
hot plug/unplug with different slots.
Zeroing up/down on piix4_device_hotplug() is the culprit.

State machine of (up, down) would be needed.
(up, down) of each slots should be changed following
something like

- on power-on  (0, 0)
- on hot plug  (0, 0) -> (1, 0)
               if other combination -> error
- on hot unpug (1, 0) or (0, 0) -> (0, 1)
               (0, 0) is for cold plugged devices.
               (1, 0) is for hot plugged devices.
               if other combination -> error
- on pciej_write(write on PCI_EJ_BASE)
               (0, 1) -> (0, 0) and qdev_free()
               if other combination -> ignore

When enabling sci, those bits are checked and raise sci if necessary.

Any comments on this?
Anyway the current pci hotplug-related commands are somewhat broken,
and needs clean up with multifunction hotplug support which is
on my todo list.

thanks

On Mon, Feb 28, 2011 at 10:52:40AM +0800, Wen Congyang wrote:
> Hi Markus Armbruster
> 
> At 02/23/2011 04:30 PM, Markus Armbruster Write:
> > Isaku Yamahata <yamahata@valinux.co.jp> writes:
> > 
> 
> <snip>
> 
> > 
> > I don't think this patch is correct.  Let me explain.
> > 
> > Device hot unplug is *not* guaranteed to succeed.
> > 
> > For some buses, such as USB, it always succeeds immediately, i.e. when
> > the device_del monitor command finishes, the device is gone.  Live is
> > good.
> > 
> > But for PCI, device_del merely initiates the ACPI unplug rain dance.  It
> > doesn't wait for the dance to complete.  Why?  The dance can take an
> > unpredictable amount of time, including forever.
> > 
> > Problem: Subsequent device_add can fail if it reuses the qdev ID or PCI
> > slot, and the unplug has not yet completed (race condition), or it
> > failed.  Yes, Virginia, PCI hotplug *can* fail.
> > 
> > When unplug succeeds, the qdev is automatically destroyed.
> > pciej_write() does that for PIIX4.  Looks like pcie_cap_slot_event()
> > does it for PCIE.
> 
> I got a similar problem.  When I unplug a pci device by hand, it works
> as expected, and I can hotplug it again. But when I use a srcipt to
> do the same thing, sometimes it failed. I think I may find another bug.
> 
> Steps to reproduce this bug:
> 1. cat ./test-e1000.sh # RHEL6RC is domain name
>    #! /bin/bash
> 
>    while true; do
>            virsh attach-interface RHEL6RC network default --mac 52:54:00:1f:db:c7 --model e1000
>            if [[ $? -ne 0 ]]; then
>                    break
>            fi
>            virsh detach-interface RHEL6RC network --mac 52:54:00:1f:db:c7
>            if [[ $? -ne 0 ]]; then
>                    break
>            fi
>            sleep 5
>    done
> 
> 2. ./test-e1000.sh
>    Interface attached successfully
> 
>    Interface detached successfully
> 
>    Interface attached successfully
> 
>    Interface detached successfully
> 
>    error: Failed to attach interface
>    error: operation failed: adding e1000,netdev=hostnet1,id=net1,mac=52:54:00:1f:db:c7,bus=pci.0,addr=0x8 device failed: Duplicate ID 'net1' for device
> 
> 
> I use ftrace to trace whether sci interrupt is passed to guest OS, here is log:
> # cat trace | grep 'irq=9'
>           <idle>-0     [000] 118342.634772: irq_handler_entry: irq=9 name=acpi
>           <idle>-0     [000] 118342.634831: irq_handler_exit: irq=9 ret=handled
>           <idle>-0     [000] 118342.693216: irq_handler_entry: irq=9 name=acpi
>           <idle>-0     [000] 118342.693254: irq_handler_exit: irq=9 ret=handled
>           <idle>-0     [000] 118347.737689: irq_handler_entry: irq=9 name=acpi
>           <idle>-0     [000] 118347.737738: irq_handler_exit: irq=9 ret=handled
> According to the log, we only receive 3 sci interrupt, and one is lost.
> 
> I enable piix4's debug and 1 line printf into piix4_device_hotplug:
>     printf("slot: %d, up: %d, down: %d, state: %d\n", slot, s->pci0_status.up, s->pci0_status.down, state);
> here is the log:
> ========
> PM: mapping to 0xb000
> PM readw port=0x0004 val=0x0000
> ...
> gpe write afe2 <== 0
> gpe write afe0 <== 255
> gpe write afe3 <== 0
> gpe write afe1 <== 255
> PM readw port=0x0000 val=0x0001
> PM readw port=0x0002 val=0x0000
> gpe read afe0 == 0
> gpe read afe2 == 0
> gpe read afe1 == 0
> gpe read afe3 == 0
> PM writew port=0x0000 val=0x0020
> PM readw port=0x0002 val=0x0000
> PM writew port=0x0002 val=0x0020
> PM readw port=0x0002 val=0x0020
> gpe write afe2 <== 255
> gpe write afe3 <== 255
> ...
> slot: 6, up: 0, down: 0, state: 1   <==== we attach interface the first time
> PM readw port=0x0000 val=0x0001
> PM readw port=0x0002 val=0x0120
> gpe read afe0 == 2
> gpe read afe2 == ff
> gpe read afe2 == ff
> gpe write afe2 <== 253
> gpe read afe1 == 0
> gpe read afe3 == ff
> pcihotplug read ae00 == 40
> pcihotplug read ae04 == 0
> ...
> gpe write afe0 <== 2
> gpe write afe2 <== 255
> slot: 6, up: 64, down: 0, state: 0  <===== we detach interface the first time
> PM readw port=0x0000 val=0x0001
> PM readw port=0x0002 val=0x0120
> gpe read afe0 == 2
> gpe read afe2 == ff
> gpe read afe2 == ff
> gpe write afe2 <== 253
> gpe read afe1 == 0
> gpe read afe3 == ff
> pcihotplug read ae00 == 0
> pcihotplug read ae04 == 40
> ...
> pciej write ae08 <== 64		<=== we will call qdev_free()
> pciej write ae08 <== 64
> gpe write afe0 <== 2
> gpe write afe2 <== 255
> slot: 7, up: 0, down: 64, state: 1  <=== we attach interface the second time.
> PM readw port=0x0000 val=0x0001
> PM readw port=0x0002 val=0x0120
> gpe read afe0 == 2
> gpe read afe2 == ff
> gpe read afe2 == ff
> gpe write afe2 <== 253
> gpe read afe1 == 0
> gpe read afe3 == ff
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> slot: 7, up: 128, down: 0, state: 0  <=== we detach interface the second time
> gpe write afe0 <== 2		<=== write 2 to afe0 will cause sci interupt to be lost
> gpe write afe2 <== 255
> ===========
> 
> Here is short log, and show the different between the first time and second time:
> ===========
> gpe write afe0 <== 2
> gpe write afe2 <== 255
> slot: 6, up: 64, down: 0, state: 0  <===== we detach interface the first time
> ....
> slot: 7, up: 128, down: 0, state: 0  <=== we detach interface the second time
> gpe write afe0 <== 2		<=== write 2 to afe0 will cause sci interupt to be lost
> gpe write afe2 <== 255
> 
> ===========
> 
> Does this behavior is a normal behavior?
> 
> The following patch can fix this problem.
> 
> From 3c149575b52261b8da9a73d5c4ebdfedce9866c1 Mon Sep 17 00:00:00 2001
> From: Wen Congyang <wency@cn.fujitsu.com>
> Date: Mon, 28 Feb 2011 10:33:45 +0800
> Subject: [PATCH] pend hotplug event until last event is handled by guest OS
> 
> ---
>  hw/acpi_piix4.c |   51 ++++++++++++++++++++++++++++++++++++++++++++-------
>  1 files changed, 44 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
> index b5a2762..4ff3475 100644
> --- a/hw/acpi_piix4.c
> +++ b/hw/acpi_piix4.c
> @@ -74,6 +74,7 @@ typedef struct PIIX4PMState {
>      /* for pci hotplug */
>      struct gpe_regs gpe;
>      struct pci_status pci0_status;
> +    struct pci_status pci0_status_pending;
>      uint32_t pci0_hotplug_enable;
>  } PIIX4PMState;
>  
> @@ -518,6 +519,19 @@ static void gpe_reset_val(uint16_t *cur, int addr, uint32_t val)
>      *cur = (*cur & (0xff << (8 - shift))) | (x1 << shift);
>  }
>  
> +static void raise_pending_hotplug(PIIX4PMState *s)
> +{
> +    if (s->pci0_status_pending.up || s->pci0_status_pending.down) {
> +        s->gpe.sts |= PIIX4_PCI_HOTPLUG_STATUS;
> +        s->pci0_status.up = s->pci0_status_pending.up;
> +        s->pci0_status.down = s->pci0_status_pending.down;
> +        s->pci0_status_pending.up = 0;
> +        s->pci0_status_pending.down = 0;
> +
> +        pm_update_sci(s);
> +    }
> +}
> +
>  static void gpe_writeb(void *opaque, uint32_t addr, uint32_t val)
>  {
>      PIIX4PMState *s = opaque;
> @@ -539,6 +553,11 @@ static void gpe_writeb(void *opaque, uint32_t addr, uint32_t val)
>      pm_update_sci(s);
>  
>      PIIX4_DPRINTF("gpe write %x <== %d\n", addr, val);
> +
> +    if (!(g->sts & PIIX4_PCI_HOTPLUG_STATUS) && g->en & PIIX4_PCI_HOTPLUG_STATUS) {
> +        /* check and reraise the pending hotplug event */
> +        raise_pending_hotplug(s);
> +    }
>  }
>  
>  static uint32_t pcihotplug_read(void *opaque, uint32_t addr)
> @@ -639,12 +658,22 @@ static void enable_device(PIIX4PMState *s, int slot)
>      s->pci0_status.up |= (1 << slot);
>  }
>  
> +static void pend_enable_device(PIIX4PMState *s, int slot)
> +{
> +    s->pci0_status_pending.up |= (1 << slot);
> +}
> +
>  static void disable_device(PIIX4PMState *s, int slot)
>  {
>      s->gpe.sts |= PIIX4_PCI_HOTPLUG_STATUS;
>      s->pci0_status.down |= (1 << slot);
>  }
>  
> +static void pend_disable_device(PIIX4PMState *s, int slot)
> +{
> +    s->pci0_status_pending.down |= (1 << slot);
> +}
> +
>  static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
>  				PCIHotplugState state)
>  {
> @@ -659,15 +688,23 @@ static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
>          return 0;
>      }
>  
> -    s->pci0_status.up = 0;
> -    s->pci0_status.down = 0;
> -    if (state == PCI_HOTPLUG_ENABLED) {
> -        enable_device(s, slot);
> +    if (s->gpe.sts & PIIX4_PCI_HOTPLUG_STATUS) {
> +        if (state == PCI_HOTPLUG_ENABLED) {
> +            pend_enable_device(s, slot);
> +        } else {
> +            pend_disable_device(s, slot);
> +        }
>      } else {
> -        disable_device(s, slot);
> -    }
> +        s->pci0_status.up = 0;
> +        s->pci0_status.down = 0;
> +        if (state == PCI_HOTPLUG_ENABLED) {
> +            enable_device(s, slot);
> +        } else {
> +            disable_device(s, slot);
> +        }
>  
> -    pm_update_sci(s);
> +        pm_update_sci(s);
> +    }
>  
>      return 0;
>  }
> -- 
> 1.7.1
> 
> > 
> > Your patch adds a *second* qdev_free().  No good.
> > 
> > 
> 
>
Wen Congyang March 1, 2011, 6:58 a.m. UTC | #2
At 03/01/2011 12:11 PM, Isaku Yamahata Write:
> Hi.
> 
> I don't suppose that just introducing pending bits solve the issue.
> Your test only use single hot plug/unplug. How about mixing of multiple
> hot plug/unplug with different slots.

The qemu uses the same thread to deal with monitor command and I/O request
from guest OS. So I think mixing of multiple hot plug/unplug with different
slots can also work fine.

> Zeroing up/down on piix4_device_hotplug() is the culprit.
> 
> State machine of (up, down) would be needed.
> (up, down) of each slots should be changed following
> something like

Why we need this feature? We access s->pci0_status only in main thread and
do not accesss it when we deal with signal, so there is no competition to
access it.

> 
> - on power-on  (0, 0)
> - on hot plug  (0, 0) -> (1, 0)
>                if other combination -> error
> - on hot unpug (1, 0) or (0, 0) -> (0, 1)
>                (0, 0) is for cold plugged devices.
>                (1, 0) is for hot plugged devices.
>                if other combination -> error
> - on pciej_write(write on PCI_EJ_BASE)
>                (0, 1) -> (0, 0) and qdev_free()
>                if other combination -> ignore
> 
> When enabling sci, those bits are checked and raise sci if necessary.
> 
> Any comments on this?
> Anyway the current pci hotplug-related commands are somewhat broken,
> and needs clean up with multifunction hotplug support which is
> on my todo list.
> 
> thanks
> 
> On Mon, Feb 28, 2011 at 10:52:40AM +0800, Wen Congyang wrote:
>> Hi Markus Armbruster
>>
>> At 02/23/2011 04:30 PM, Markus Armbruster Write:
>>> Isaku Yamahata <yamahata@valinux.co.jp> writes:
>>>
>>
>> <snip>
>>
>>>
>>> I don't think this patch is correct.  Let me explain.
>>>
>>> Device hot unplug is *not* guaranteed to succeed.
>>>
>>> For some buses, such as USB, it always succeeds immediately, i.e. when
>>> the device_del monitor command finishes, the device is gone.  Live is
>>> good.
>>>
>>> But for PCI, device_del merely initiates the ACPI unplug rain dance.  It
>>> doesn't wait for the dance to complete.  Why?  The dance can take an
>>> unpredictable amount of time, including forever.
>>>
>>> Problem: Subsequent device_add can fail if it reuses the qdev ID or PCI
>>> slot, and the unplug has not yet completed (race condition), or it
>>> failed.  Yes, Virginia, PCI hotplug *can* fail.
>>>
>>> When unplug succeeds, the qdev is automatically destroyed.
>>> pciej_write() does that for PIIX4.  Looks like pcie_cap_slot_event()
>>> does it for PCIE.
>>
>> I got a similar problem.  When I unplug a pci device by hand, it works
>> as expected, and I can hotplug it again. But when I use a srcipt to
>> do the same thing, sometimes it failed. I think I may find another bug.
>>
>> Steps to reproduce this bug:
>> 1. cat ./test-e1000.sh # RHEL6RC is domain name
>>    #! /bin/bash
>>
>>    while true; do
>>            virsh attach-interface RHEL6RC network default --mac 52:54:00:1f:db:c7 --model e1000
>>            if [[ $? -ne 0 ]]; then
>>                    break
>>            fi
>>            virsh detach-interface RHEL6RC network --mac 52:54:00:1f:db:c7
>>            if [[ $? -ne 0 ]]; then
>>                    break
>>            fi
>>            sleep 5
>>    done
>>
>> 2. ./test-e1000.sh
>>    Interface attached successfully
>>
>>    Interface detached successfully
>>
>>    Interface attached successfully
>>
>>    Interface detached successfully
>>
>>    error: Failed to attach interface
>>    error: operation failed: adding e1000,netdev=hostnet1,id=net1,mac=52:54:00:1f:db:c7,bus=pci.0,addr=0x8 device failed: Duplicate ID 'net1' for device
>>
>>
>> I use ftrace to trace whether sci interrupt is passed to guest OS, here is log:
>> # cat trace | grep 'irq=9'
>>           <idle>-0     [000] 118342.634772: irq_handler_entry: irq=9 name=acpi
>>           <idle>-0     [000] 118342.634831: irq_handler_exit: irq=9 ret=handled
>>           <idle>-0     [000] 118342.693216: irq_handler_entry: irq=9 name=acpi
>>           <idle>-0     [000] 118342.693254: irq_handler_exit: irq=9 ret=handled
>>           <idle>-0     [000] 118347.737689: irq_handler_entry: irq=9 name=acpi
>>           <idle>-0     [000] 118347.737738: irq_handler_exit: irq=9 ret=handled
>> According to the log, we only receive 3 sci interrupt, and one is lost.
>>
>> I enable piix4's debug and 1 line printf into piix4_device_hotplug:
>>     printf("slot: %d, up: %d, down: %d, state: %d\n", slot, s->pci0_status.up, s->pci0_status.down, state);
>> here is the log:
>> ========
>> PM: mapping to 0xb000
>> PM readw port=0x0004 val=0x0000
>> ...
>> gpe write afe2 <== 0
>> gpe write afe0 <== 255
>> gpe write afe3 <== 0
>> gpe write afe1 <== 255
>> PM readw port=0x0000 val=0x0001
>> PM readw port=0x0002 val=0x0000
>> gpe read afe0 == 0
>> gpe read afe2 == 0
>> gpe read afe1 == 0
>> gpe read afe3 == 0
>> PM writew port=0x0000 val=0x0020
>> PM readw port=0x0002 val=0x0000
>> PM writew port=0x0002 val=0x0020
>> PM readw port=0x0002 val=0x0020
>> gpe write afe2 <== 255
>> gpe write afe3 <== 255
>> ...
>> slot: 6, up: 0, down: 0, state: 1   <==== we attach interface the first time
>> PM readw port=0x0000 val=0x0001
>> PM readw port=0x0002 val=0x0120
>> gpe read afe0 == 2
>> gpe read afe2 == ff
>> gpe read afe2 == ff
>> gpe write afe2 <== 253
>> gpe read afe1 == 0
>> gpe read afe3 == ff
>> pcihotplug read ae00 == 40
>> pcihotplug read ae04 == 0
>> ...
>> gpe write afe0 <== 2
>> gpe write afe2 <== 255
>> slot: 6, up: 64, down: 0, state: 0  <===== we detach interface the first time
>> PM readw port=0x0000 val=0x0001
>> PM readw port=0x0002 val=0x0120
>> gpe read afe0 == 2
>> gpe read afe2 == ff
>> gpe read afe2 == ff
>> gpe write afe2 <== 253
>> gpe read afe1 == 0
>> gpe read afe3 == ff
>> pcihotplug read ae00 == 0
>> pcihotplug read ae04 == 40
>> ...
>> pciej write ae08 <== 64		<=== we will call qdev_free()
>> pciej write ae08 <== 64
>> gpe write afe0 <== 2
>> gpe write afe2 <== 255
>> slot: 7, up: 0, down: 64, state: 1  <=== we attach interface the second time.
>> PM readw port=0x0000 val=0x0001
>> PM readw port=0x0002 val=0x0120
>> gpe read afe0 == 2
>> gpe read afe2 == ff
>> gpe read afe2 == ff
>> gpe write afe2 <== 253
>> gpe read afe1 == 0
>> gpe read afe3 == ff
>> pcihotplug read ae00 == 80
>> pcihotplug read ae04 == 0
>> pcihotplug read ae00 == 80
>> pcihotplug read ae04 == 0
>> pcihotplug read ae00 == 80
>> pcihotplug read ae04 == 0
>> pcihotplug read ae00 == 80
>> pcihotplug read ae04 == 0
>> pcihotplug read ae00 == 80
>> pcihotplug read ae04 == 0
>> pcihotplug read ae00 == 80
>> pcihotplug read ae04 == 0
>> pcihotplug read ae00 == 80
>> pcihotplug read ae04 == 0
>> pcihotplug read ae00 == 80
>> pcihotplug read ae04 == 0
>> pcihotplug read ae00 == 80
>> pcihotplug read ae04 == 0
>> pcihotplug read ae00 == 80
>> pcihotplug read ae04 == 0
>> pcihotplug read ae00 == 80
>> pcihotplug read ae04 == 0
>> pcihotplug read ae00 == 80
>> pcihotplug read ae04 == 0
>> pcihotplug read ae00 == 80
>> pcihotplug read ae04 == 0
>> pcihotplug read ae00 == 80
>> pcihotplug read ae04 == 0
>> pcihotplug read ae00 == 80
>> pcihotplug read ae04 == 0
>> pcihotplug read ae00 == 80
>> pcihotplug read ae04 == 0
>> pcihotplug read ae00 == 80
>> pcihotplug read ae04 == 0
>> pcihotplug read ae00 == 80
>> pcihotplug read ae04 == 0
>> pcihotplug read ae00 == 80
>> pcihotplug read ae04 == 0
>> pcihotplug read ae00 == 80
>> pcihotplug read ae04 == 0
>> pcihotplug read ae00 == 80
>> pcihotplug read ae04 == 0
>> pcihotplug read ae00 == 80
>> pcihotplug read ae04 == 0
>> pcihotplug read ae00 == 80
>> pcihotplug read ae04 == 0
>> pcihotplug read ae00 == 80
>> pcihotplug read ae04 == 0
>> pcihotplug read ae00 == 80
>> pcihotplug read ae04 == 0
>> pcihotplug read ae00 == 80
>> pcihotplug read ae04 == 0
>> pcihotplug read ae00 == 80
>> pcihotplug read ae04 == 0
>> pcihotplug read ae00 == 80
>> pcihotplug read ae04 == 0
>> pcihotplug read ae00 == 80
>> pcihotplug read ae04 == 0
>> pcihotplug read ae00 == 80
>> pcihotplug read ae04 == 0
>> pcihotplug read ae00 == 80
>> pcihotplug read ae04 == 0
>> slot: 7, up: 128, down: 0, state: 0  <=== we detach interface the second time
>> gpe write afe0 <== 2		<=== write 2 to afe0 will cause sci interupt to be lost
>> gpe write afe2 <== 255
>> ===========
>>
>> Here is short log, and show the different between the first time and second time:
>> ===========
>> gpe write afe0 <== 2
>> gpe write afe2 <== 255
>> slot: 6, up: 64, down: 0, state: 0  <===== we detach interface the first time
>> ....
>> slot: 7, up: 128, down: 0, state: 0  <=== we detach interface the second time
>> gpe write afe0 <== 2		<=== write 2 to afe0 will cause sci interupt to be lost
>> gpe write afe2 <== 255
>>
>> ===========
>>
>> Does this behavior is a normal behavior?
>>
>> The following patch can fix this problem.
>>
>> From 3c149575b52261b8da9a73d5c4ebdfedce9866c1 Mon Sep 17 00:00:00 2001
>> From: Wen Congyang <wency@cn.fujitsu.com>
>> Date: Mon, 28 Feb 2011 10:33:45 +0800
>> Subject: [PATCH] pend hotplug event until last event is handled by guest OS
>>
>> ---
>>  hw/acpi_piix4.c |   51 ++++++++++++++++++++++++++++++++++++++++++++-------
>>  1 files changed, 44 insertions(+), 7 deletions(-)
>>
>> diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
>> index b5a2762..4ff3475 100644
>> --- a/hw/acpi_piix4.c
>> +++ b/hw/acpi_piix4.c
>> @@ -74,6 +74,7 @@ typedef struct PIIX4PMState {
>>      /* for pci hotplug */
>>      struct gpe_regs gpe;
>>      struct pci_status pci0_status;
>> +    struct pci_status pci0_status_pending;
>>      uint32_t pci0_hotplug_enable;
>>  } PIIX4PMState;
>>  
>> @@ -518,6 +519,19 @@ static void gpe_reset_val(uint16_t *cur, int addr, uint32_t val)
>>      *cur = (*cur & (0xff << (8 - shift))) | (x1 << shift);
>>  }
>>  
>> +static void raise_pending_hotplug(PIIX4PMState *s)
>> +{
>> +    if (s->pci0_status_pending.up || s->pci0_status_pending.down) {
>> +        s->gpe.sts |= PIIX4_PCI_HOTPLUG_STATUS;
>> +        s->pci0_status.up = s->pci0_status_pending.up;
>> +        s->pci0_status.down = s->pci0_status_pending.down;
>> +        s->pci0_status_pending.up = 0;
>> +        s->pci0_status_pending.down = 0;
>> +
>> +        pm_update_sci(s);
>> +    }
>> +}
>> +
>>  static void gpe_writeb(void *opaque, uint32_t addr, uint32_t val)
>>  {
>>      PIIX4PMState *s = opaque;
>> @@ -539,6 +553,11 @@ static void gpe_writeb(void *opaque, uint32_t addr, uint32_t val)
>>      pm_update_sci(s);
>>  
>>      PIIX4_DPRINTF("gpe write %x <== %d\n", addr, val);
>> +
>> +    if (!(g->sts & PIIX4_PCI_HOTPLUG_STATUS) && g->en & PIIX4_PCI_HOTPLUG_STATUS) {
>> +        /* check and reraise the pending hotplug event */
>> +        raise_pending_hotplug(s);
>> +    }
>>  }
>>  
>>  static uint32_t pcihotplug_read(void *opaque, uint32_t addr)
>> @@ -639,12 +658,22 @@ static void enable_device(PIIX4PMState *s, int slot)
>>      s->pci0_status.up |= (1 << slot);
>>  }
>>  
>> +static void pend_enable_device(PIIX4PMState *s, int slot)
>> +{
>> +    s->pci0_status_pending.up |= (1 << slot);
>> +}
>> +
>>  static void disable_device(PIIX4PMState *s, int slot)
>>  {
>>      s->gpe.sts |= PIIX4_PCI_HOTPLUG_STATUS;
>>      s->pci0_status.down |= (1 << slot);
>>  }
>>  
>> +static void pend_disable_device(PIIX4PMState *s, int slot)
>> +{
>> +    s->pci0_status_pending.down |= (1 << slot);
>> +}
>> +
>>  static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
>>  				PCIHotplugState state)
>>  {
>> @@ -659,15 +688,23 @@ static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
>>          return 0;
>>      }
>>  
>> -    s->pci0_status.up = 0;
>> -    s->pci0_status.down = 0;
>> -    if (state == PCI_HOTPLUG_ENABLED) {
>> -        enable_device(s, slot);
>> +    if (s->gpe.sts & PIIX4_PCI_HOTPLUG_STATUS) {
>> +        if (state == PCI_HOTPLUG_ENABLED) {
>> +            pend_enable_device(s, slot);
>> +        } else {
>> +            pend_disable_device(s, slot);
>> +        }
>>      } else {
>> -        disable_device(s, slot);
>> -    }
>> +        s->pci0_status.up = 0;
>> +        s->pci0_status.down = 0;
>> +        if (state == PCI_HOTPLUG_ENABLED) {
>> +            enable_device(s, slot);
>> +        } else {
>> +            disable_device(s, slot);
>> +        }
>>  
>> -    pm_update_sci(s);
>> +        pm_update_sci(s);
>> +    }
>>  
>>      return 0;
>>  }
>> -- 
>> 1.7.1
>>
>>>
>>> Your patch adds a *second* qdev_free().  No good.
>>>
>>>
>>
>>
>
Isaku Yamahata March 1, 2011, 7:13 a.m. UTC | #3
On Tue, Mar 01, 2011 at 02:58:44PM +0800, Wen Congyang wrote:
> At 03/01/2011 12:11 PM, Isaku Yamahata Write:
> > Hi.
> > 
> > I don't suppose that just introducing pending bits solve the issue.
> > Your test only use single hot plug/unplug. How about mixing of multiple
> > hot plug/unplug with different slots.
> 
> The qemu uses the same thread to deal with monitor command and I/O request
> from guest OS. So I think mixing of multiple hot plug/unplug with different
> slots can also work fine.
> 
> > Zeroing up/down on piix4_device_hotplug() is the culprit.
> > 
> > State machine of (up, down) would be needed.
> > (up, down) of each slots should be changed following
> > something like
> 
> Why we need this feature? We access s->pci0_status only in main thread and
> do not accesss it when we deal with signal, so there is no competition to
> access it.

We are talking about losing interrupt and events, aren't we?
Not race caused by threads.
The issue is, Qemu injected sci interrupt into guest,
but before the guest completes to handled it,
users/qemu can start to inject the next sci event triggered by hot plug/unplug.
Thus qemu loses sci interrupt and up/down status.

thanks
> 
> > 
> > - on power-on  (0, 0)
> > - on hot plug  (0, 0) -> (1, 0)
> >                if other combination -> error
> > - on hot unpug (1, 0) or (0, 0) -> (0, 1)
> >                (0, 0) is for cold plugged devices.
> >                (1, 0) is for hot plugged devices.
> >                if other combination -> error
> > - on pciej_write(write on PCI_EJ_BASE)
> >                (0, 1) -> (0, 0) and qdev_free()
> >                if other combination -> ignore
> > 
> > When enabling sci, those bits are checked and raise sci if necessary.
> > 
> > Any comments on this?
> > Anyway the current pci hotplug-related commands are somewhat broken,
> > and needs clean up with multifunction hotplug support which is
> > on my todo list.
> > 
> > thanks
> > 
> > On Mon, Feb 28, 2011 at 10:52:40AM +0800, Wen Congyang wrote:
> >> Hi Markus Armbruster
> >>
> >> At 02/23/2011 04:30 PM, Markus Armbruster Write:
> >>> Isaku Yamahata <yamahata@valinux.co.jp> writes:
> >>>
> >>
> >> <snip>
> >>
> >>>
> >>> I don't think this patch is correct.  Let me explain.
> >>>
> >>> Device hot unplug is *not* guaranteed to succeed.
> >>>
> >>> For some buses, such as USB, it always succeeds immediately, i.e. when
> >>> the device_del monitor command finishes, the device is gone.  Live is
> >>> good.
> >>>
> >>> But for PCI, device_del merely initiates the ACPI unplug rain dance.  It
> >>> doesn't wait for the dance to complete.  Why?  The dance can take an
> >>> unpredictable amount of time, including forever.
> >>>
> >>> Problem: Subsequent device_add can fail if it reuses the qdev ID or PCI
> >>> slot, and the unplug has not yet completed (race condition), or it
> >>> failed.  Yes, Virginia, PCI hotplug *can* fail.
> >>>
> >>> When unplug succeeds, the qdev is automatically destroyed.
> >>> pciej_write() does that for PIIX4.  Looks like pcie_cap_slot_event()
> >>> does it for PCIE.
> >>
> >> I got a similar problem.  When I unplug a pci device by hand, it works
> >> as expected, and I can hotplug it again. But when I use a srcipt to
> >> do the same thing, sometimes it failed. I think I may find another bug.
> >>
> >> Steps to reproduce this bug:
> >> 1. cat ./test-e1000.sh # RHEL6RC is domain name
> >>    #! /bin/bash
> >>
> >>    while true; do
> >>            virsh attach-interface RHEL6RC network default --mac 52:54:00:1f:db:c7 --model e1000
> >>            if [[ $? -ne 0 ]]; then
> >>                    break
> >>            fi
> >>            virsh detach-interface RHEL6RC network --mac 52:54:00:1f:db:c7
> >>            if [[ $? -ne 0 ]]; then
> >>                    break
> >>            fi
> >>            sleep 5
> >>    done
> >>
> >> 2. ./test-e1000.sh
> >>    Interface attached successfully
> >>
> >>    Interface detached successfully
> >>
> >>    Interface attached successfully
> >>
> >>    Interface detached successfully
> >>
> >>    error: Failed to attach interface
> >>    error: operation failed: adding e1000,netdev=hostnet1,id=net1,mac=52:54:00:1f:db:c7,bus=pci.0,addr=0x8 device failed: Duplicate ID 'net1' for device
> >>
> >>
> >> I use ftrace to trace whether sci interrupt is passed to guest OS, here is log:
> >> # cat trace | grep 'irq=9'
> >>           <idle>-0     [000] 118342.634772: irq_handler_entry: irq=9 name=acpi
> >>           <idle>-0     [000] 118342.634831: irq_handler_exit: irq=9 ret=handled
> >>           <idle>-0     [000] 118342.693216: irq_handler_entry: irq=9 name=acpi
> >>           <idle>-0     [000] 118342.693254: irq_handler_exit: irq=9 ret=handled
> >>           <idle>-0     [000] 118347.737689: irq_handler_entry: irq=9 name=acpi
> >>           <idle>-0     [000] 118347.737738: irq_handler_exit: irq=9 ret=handled
> >> According to the log, we only receive 3 sci interrupt, and one is lost.
> >>
> >> I enable piix4's debug and 1 line printf into piix4_device_hotplug:
> >>     printf("slot: %d, up: %d, down: %d, state: %d\n", slot, s->pci0_status.up, s->pci0_status.down, state);
> >> here is the log:
> >> ========
> >> PM: mapping to 0xb000
> >> PM readw port=0x0004 val=0x0000
> >> ...
> >> gpe write afe2 <== 0
> >> gpe write afe0 <== 255
> >> gpe write afe3 <== 0
> >> gpe write afe1 <== 255
> >> PM readw port=0x0000 val=0x0001
> >> PM readw port=0x0002 val=0x0000
> >> gpe read afe0 == 0
> >> gpe read afe2 == 0
> >> gpe read afe1 == 0
> >> gpe read afe3 == 0
> >> PM writew port=0x0000 val=0x0020
> >> PM readw port=0x0002 val=0x0000
> >> PM writew port=0x0002 val=0x0020
> >> PM readw port=0x0002 val=0x0020
> >> gpe write afe2 <== 255
> >> gpe write afe3 <== 255
> >> ...
> >> slot: 6, up: 0, down: 0, state: 1   <==== we attach interface the first time
> >> PM readw port=0x0000 val=0x0001
> >> PM readw port=0x0002 val=0x0120
> >> gpe read afe0 == 2
> >> gpe read afe2 == ff
> >> gpe read afe2 == ff
> >> gpe write afe2 <== 253
> >> gpe read afe1 == 0
> >> gpe read afe3 == ff
> >> pcihotplug read ae00 == 40
> >> pcihotplug read ae04 == 0
> >> ...
> >> gpe write afe0 <== 2
> >> gpe write afe2 <== 255
> >> slot: 6, up: 64, down: 0, state: 0  <===== we detach interface the first time
> >> PM readw port=0x0000 val=0x0001
> >> PM readw port=0x0002 val=0x0120
> >> gpe read afe0 == 2
> >> gpe read afe2 == ff
> >> gpe read afe2 == ff
> >> gpe write afe2 <== 253
> >> gpe read afe1 == 0
> >> gpe read afe3 == ff
> >> pcihotplug read ae00 == 0
> >> pcihotplug read ae04 == 40
> >> ...
> >> pciej write ae08 <== 64		<=== we will call qdev_free()
> >> pciej write ae08 <== 64
> >> gpe write afe0 <== 2
> >> gpe write afe2 <== 255
> >> slot: 7, up: 0, down: 64, state: 1  <=== we attach interface the second time.
> >> PM readw port=0x0000 val=0x0001
> >> PM readw port=0x0002 val=0x0120
> >> gpe read afe0 == 2
> >> gpe read afe2 == ff
> >> gpe read afe2 == ff
> >> gpe write afe2 <== 253
> >> gpe read afe1 == 0
> >> gpe read afe3 == ff
> >> pcihotplug read ae00 == 80
> >> pcihotplug read ae04 == 0
> >> pcihotplug read ae00 == 80
> >> pcihotplug read ae04 == 0
> >> pcihotplug read ae00 == 80
> >> pcihotplug read ae04 == 0
> >> pcihotplug read ae00 == 80
> >> pcihotplug read ae04 == 0
> >> pcihotplug read ae00 == 80
> >> pcihotplug read ae04 == 0
> >> pcihotplug read ae00 == 80
> >> pcihotplug read ae04 == 0
> >> pcihotplug read ae00 == 80
> >> pcihotplug read ae04 == 0
> >> pcihotplug read ae00 == 80
> >> pcihotplug read ae04 == 0
> >> pcihotplug read ae00 == 80
> >> pcihotplug read ae04 == 0
> >> pcihotplug read ae00 == 80
> >> pcihotplug read ae04 == 0
> >> pcihotplug read ae00 == 80
> >> pcihotplug read ae04 == 0
> >> pcihotplug read ae00 == 80
> >> pcihotplug read ae04 == 0
> >> pcihotplug read ae00 == 80
> >> pcihotplug read ae04 == 0
> >> pcihotplug read ae00 == 80
> >> pcihotplug read ae04 == 0
> >> pcihotplug read ae00 == 80
> >> pcihotplug read ae04 == 0
> >> pcihotplug read ae00 == 80
> >> pcihotplug read ae04 == 0
> >> pcihotplug read ae00 == 80
> >> pcihotplug read ae04 == 0
> >> pcihotplug read ae00 == 80
> >> pcihotplug read ae04 == 0
> >> pcihotplug read ae00 == 80
> >> pcihotplug read ae04 == 0
> >> pcihotplug read ae00 == 80
> >> pcihotplug read ae04 == 0
> >> pcihotplug read ae00 == 80
> >> pcihotplug read ae04 == 0
> >> pcihotplug read ae00 == 80
> >> pcihotplug read ae04 == 0
> >> pcihotplug read ae00 == 80
> >> pcihotplug read ae04 == 0
> >> pcihotplug read ae00 == 80
> >> pcihotplug read ae04 == 0
> >> pcihotplug read ae00 == 80
> >> pcihotplug read ae04 == 0
> >> pcihotplug read ae00 == 80
> >> pcihotplug read ae04 == 0
> >> pcihotplug read ae00 == 80
> >> pcihotplug read ae04 == 0
> >> pcihotplug read ae00 == 80
> >> pcihotplug read ae04 == 0
> >> pcihotplug read ae00 == 80
> >> pcihotplug read ae04 == 0
> >> pcihotplug read ae00 == 80
> >> pcihotplug read ae04 == 0
> >> pcihotplug read ae00 == 80
> >> pcihotplug read ae04 == 0
> >> slot: 7, up: 128, down: 0, state: 0  <=== we detach interface the second time
> >> gpe write afe0 <== 2		<=== write 2 to afe0 will cause sci interupt to be lost
> >> gpe write afe2 <== 255
> >> ===========
> >>
> >> Here is short log, and show the different between the first time and second time:
> >> ===========
> >> gpe write afe0 <== 2
> >> gpe write afe2 <== 255
> >> slot: 6, up: 64, down: 0, state: 0  <===== we detach interface the first time
> >> ....
> >> slot: 7, up: 128, down: 0, state: 0  <=== we detach interface the second time
> >> gpe write afe0 <== 2		<=== write 2 to afe0 will cause sci interupt to be lost
> >> gpe write afe2 <== 255
> >>
> >> ===========
> >>
> >> Does this behavior is a normal behavior?
> >>
> >> The following patch can fix this problem.
> >>
> >> From 3c149575b52261b8da9a73d5c4ebdfedce9866c1 Mon Sep 17 00:00:00 2001
> >> From: Wen Congyang <wency@cn.fujitsu.com>
> >> Date: Mon, 28 Feb 2011 10:33:45 +0800
> >> Subject: [PATCH] pend hotplug event until last event is handled by guest OS
> >>
> >> ---
> >>  hw/acpi_piix4.c |   51 ++++++++++++++++++++++++++++++++++++++++++++-------
> >>  1 files changed, 44 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
> >> index b5a2762..4ff3475 100644
> >> --- a/hw/acpi_piix4.c
> >> +++ b/hw/acpi_piix4.c
> >> @@ -74,6 +74,7 @@ typedef struct PIIX4PMState {
> >>      /* for pci hotplug */
> >>      struct gpe_regs gpe;
> >>      struct pci_status pci0_status;
> >> +    struct pci_status pci0_status_pending;
> >>      uint32_t pci0_hotplug_enable;
> >>  } PIIX4PMState;
> >>  
> >> @@ -518,6 +519,19 @@ static void gpe_reset_val(uint16_t *cur, int addr, uint32_t val)
> >>      *cur = (*cur & (0xff << (8 - shift))) | (x1 << shift);
> >>  }
> >>  
> >> +static void raise_pending_hotplug(PIIX4PMState *s)
> >> +{
> >> +    if (s->pci0_status_pending.up || s->pci0_status_pending.down) {
> >> +        s->gpe.sts |= PIIX4_PCI_HOTPLUG_STATUS;
> >> +        s->pci0_status.up = s->pci0_status_pending.up;
> >> +        s->pci0_status.down = s->pci0_status_pending.down;
> >> +        s->pci0_status_pending.up = 0;
> >> +        s->pci0_status_pending.down = 0;
> >> +
> >> +        pm_update_sci(s);
> >> +    }
> >> +}
> >> +
> >>  static void gpe_writeb(void *opaque, uint32_t addr, uint32_t val)
> >>  {
> >>      PIIX4PMState *s = opaque;
> >> @@ -539,6 +553,11 @@ static void gpe_writeb(void *opaque, uint32_t addr, uint32_t val)
> >>      pm_update_sci(s);
> >>  
> >>      PIIX4_DPRINTF("gpe write %x <== %d\n", addr, val);
> >> +
> >> +    if (!(g->sts & PIIX4_PCI_HOTPLUG_STATUS) && g->en & PIIX4_PCI_HOTPLUG_STATUS) {
> >> +        /* check and reraise the pending hotplug event */
> >> +        raise_pending_hotplug(s);
> >> +    }
> >>  }
> >>  
> >>  static uint32_t pcihotplug_read(void *opaque, uint32_t addr)
> >> @@ -639,12 +658,22 @@ static void enable_device(PIIX4PMState *s, int slot)
> >>      s->pci0_status.up |= (1 << slot);
> >>  }
> >>  
> >> +static void pend_enable_device(PIIX4PMState *s, int slot)
> >> +{
> >> +    s->pci0_status_pending.up |= (1 << slot);
> >> +}
> >> +
> >>  static void disable_device(PIIX4PMState *s, int slot)
> >>  {
> >>      s->gpe.sts |= PIIX4_PCI_HOTPLUG_STATUS;
> >>      s->pci0_status.down |= (1 << slot);
> >>  }
> >>  
> >> +static void pend_disable_device(PIIX4PMState *s, int slot)
> >> +{
> >> +    s->pci0_status_pending.down |= (1 << slot);
> >> +}
> >> +
> >>  static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
> >>  				PCIHotplugState state)
> >>  {
> >> @@ -659,15 +688,23 @@ static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
> >>          return 0;
> >>      }
> >>  
> >> -    s->pci0_status.up = 0;
> >> -    s->pci0_status.down = 0;
> >> -    if (state == PCI_HOTPLUG_ENABLED) {
> >> -        enable_device(s, slot);
> >> +    if (s->gpe.sts & PIIX4_PCI_HOTPLUG_STATUS) {
> >> +        if (state == PCI_HOTPLUG_ENABLED) {
> >> +            pend_enable_device(s, slot);
> >> +        } else {
> >> +            pend_disable_device(s, slot);
> >> +        }
> >>      } else {
> >> -        disable_device(s, slot);
> >> -    }
> >> +        s->pci0_status.up = 0;
> >> +        s->pci0_status.down = 0;
> >> +        if (state == PCI_HOTPLUG_ENABLED) {
> >> +            enable_device(s, slot);
> >> +        } else {
> >> +            disable_device(s, slot);
> >> +        }
> >>  
> >> -    pm_update_sci(s);
> >> +        pm_update_sci(s);
> >> +    }
> >>  
> >>      return 0;
> >>  }
> >> -- 
> >> 1.7.1
> >>
> >>>
> >>> Your patch adds a *second* qdev_free().  No good.
> >>>
> >>>
> >>
> >>
> > 
>
Wen Congyang March 1, 2011, 7:32 a.m. UTC | #4
At 03/01/2011 03:13 PM, Isaku Yamahata Write:
> On Tue, Mar 01, 2011 at 02:58:44PM +0800, Wen Congyang wrote:
>> At 03/01/2011 12:11 PM, Isaku Yamahata Write:
>>> Hi.
>>>
>>> I don't suppose that just introducing pending bits solve the issue.
>>> Your test only use single hot plug/unplug. How about mixing of multiple
>>> hot plug/unplug with different slots.
>>
>> The qemu uses the same thread to deal with monitor command and I/O request
>> from guest OS. So I think mixing of multiple hot plug/unplug with different
>> slots can also work fine.
>>
>>> Zeroing up/down on piix4_device_hotplug() is the culprit.
>>>
>>> State machine of (up, down) would be needed.
>>> (up, down) of each slots should be changed following
>>> something like
>>
>> Why we need this feature? We access s->pci0_status only in main thread and
>> do not accesss it when we deal with signal, so there is no competition to
>> access it.
> 
> We are talking about losing interrupt and events, aren't we?
> Not race caused by threads.

I am sorry, I do not understand what you are talk about.

> The issue is, Qemu injected sci interrupt into guest,
> but before the guest completes to handled it,
> users/qemu can start to inject the next sci event triggered by hot plug/unplug.
> Thus qemu loses sci interrupt and up/down status.

Yes, you are right. But I still do not understand why introducing pending bits
can not solve the issue?

Thanks
Wen Congyang

> 
> thanks
>>
>>>
>>> - on power-on  (0, 0)
>>> - on hot plug  (0, 0) -> (1, 0)
>>>                if other combination -> error
>>> - on hot unpug (1, 0) or (0, 0) -> (0, 1)
>>>                (0, 0) is for cold plugged devices.
>>>                (1, 0) is for hot plugged devices.
>>>                if other combination -> error
>>> - on pciej_write(write on PCI_EJ_BASE)
>>>                (0, 1) -> (0, 0) and qdev_free()
>>>                if other combination -> ignore
>>>
>>> When enabling sci, those bits are checked and raise sci if necessary.
>>>
>>> Any comments on this?
>>> Anyway the current pci hotplug-related commands are somewhat broken,
>>> and needs clean up with multifunction hotplug support which is
>>> on my todo list.
>>>
>>> thanks
>>>
>>> On Mon, Feb 28, 2011 at 10:52:40AM +0800, Wen Congyang wrote:
>>>> Hi Markus Armbruster
>>>>
>>>> At 02/23/2011 04:30 PM, Markus Armbruster Write:
>>>>> Isaku Yamahata <yamahata@valinux.co.jp> writes:
>>>>>
>>>>
>>>> <snip>
>>>>
>>>>>
>>>>> I don't think this patch is correct.  Let me explain.
>>>>>
>>>>> Device hot unplug is *not* guaranteed to succeed.
>>>>>
>>>>> For some buses, such as USB, it always succeeds immediately, i.e. when
>>>>> the device_del monitor command finishes, the device is gone.  Live is
>>>>> good.
>>>>>
>>>>> But for PCI, device_del merely initiates the ACPI unplug rain dance.  It
>>>>> doesn't wait for the dance to complete.  Why?  The dance can take an
>>>>> unpredictable amount of time, including forever.
>>>>>
>>>>> Problem: Subsequent device_add can fail if it reuses the qdev ID or PCI
>>>>> slot, and the unplug has not yet completed (race condition), or it
>>>>> failed.  Yes, Virginia, PCI hotplug *can* fail.
>>>>>
>>>>> When unplug succeeds, the qdev is automatically destroyed.
>>>>> pciej_write() does that for PIIX4.  Looks like pcie_cap_slot_event()
>>>>> does it for PCIE.
>>>>
>>>> I got a similar problem.  When I unplug a pci device by hand, it works
>>>> as expected, and I can hotplug it again. But when I use a srcipt to
>>>> do the same thing, sometimes it failed. I think I may find another bug.
>>>>
>>>> Steps to reproduce this bug:
>>>> 1. cat ./test-e1000.sh # RHEL6RC is domain name
>>>>    #! /bin/bash
>>>>
>>>>    while true; do
>>>>            virsh attach-interface RHEL6RC network default --mac 52:54:00:1f:db:c7 --model e1000
>>>>            if [[ $? -ne 0 ]]; then
>>>>                    break
>>>>            fi
>>>>            virsh detach-interface RHEL6RC network --mac 52:54:00:1f:db:c7
>>>>            if [[ $? -ne 0 ]]; then
>>>>                    break
>>>>            fi
>>>>            sleep 5
>>>>    done
>>>>
>>>> 2. ./test-e1000.sh
>>>>    Interface attached successfully
>>>>
>>>>    Interface detached successfully
>>>>
>>>>    Interface attached successfully
>>>>
>>>>    Interface detached successfully
>>>>
>>>>    error: Failed to attach interface
>>>>    error: operation failed: adding e1000,netdev=hostnet1,id=net1,mac=52:54:00:1f:db:c7,bus=pci.0,addr=0x8 device failed: Duplicate ID 'net1' for device
>>>>
>>>>
>>>> I use ftrace to trace whether sci interrupt is passed to guest OS, here is log:
>>>> # cat trace | grep 'irq=9'
>>>>           <idle>-0     [000] 118342.634772: irq_handler_entry: irq=9 name=acpi
>>>>           <idle>-0     [000] 118342.634831: irq_handler_exit: irq=9 ret=handled
>>>>           <idle>-0     [000] 118342.693216: irq_handler_entry: irq=9 name=acpi
>>>>           <idle>-0     [000] 118342.693254: irq_handler_exit: irq=9 ret=handled
>>>>           <idle>-0     [000] 118347.737689: irq_handler_entry: irq=9 name=acpi
>>>>           <idle>-0     [000] 118347.737738: irq_handler_exit: irq=9 ret=handled
>>>> According to the log, we only receive 3 sci interrupt, and one is lost.
>>>>
>>>> I enable piix4's debug and 1 line printf into piix4_device_hotplug:
>>>>     printf("slot: %d, up: %d, down: %d, state: %d\n", slot, s->pci0_status.up, s->pci0_status.down, state);
>>>> here is the log:
>>>> ========
>>>> PM: mapping to 0xb000
>>>> PM readw port=0x0004 val=0x0000
>>>> ...
>>>> gpe write afe2 <== 0
>>>> gpe write afe0 <== 255
>>>> gpe write afe3 <== 0
>>>> gpe write afe1 <== 255
>>>> PM readw port=0x0000 val=0x0001
>>>> PM readw port=0x0002 val=0x0000
>>>> gpe read afe0 == 0
>>>> gpe read afe2 == 0
>>>> gpe read afe1 == 0
>>>> gpe read afe3 == 0
>>>> PM writew port=0x0000 val=0x0020
>>>> PM readw port=0x0002 val=0x0000
>>>> PM writew port=0x0002 val=0x0020
>>>> PM readw port=0x0002 val=0x0020
>>>> gpe write afe2 <== 255
>>>> gpe write afe3 <== 255
>>>> ...
>>>> slot: 6, up: 0, down: 0, state: 1   <==== we attach interface the first time
>>>> PM readw port=0x0000 val=0x0001
>>>> PM readw port=0x0002 val=0x0120
>>>> gpe read afe0 == 2
>>>> gpe read afe2 == ff
>>>> gpe read afe2 == ff
>>>> gpe write afe2 <== 253
>>>> gpe read afe1 == 0
>>>> gpe read afe3 == ff
>>>> pcihotplug read ae00 == 40
>>>> pcihotplug read ae04 == 0
>>>> ...
>>>> gpe write afe0 <== 2
>>>> gpe write afe2 <== 255
>>>> slot: 6, up: 64, down: 0, state: 0  <===== we detach interface the first time
>>>> PM readw port=0x0000 val=0x0001
>>>> PM readw port=0x0002 val=0x0120
>>>> gpe read afe0 == 2
>>>> gpe read afe2 == ff
>>>> gpe read afe2 == ff
>>>> gpe write afe2 <== 253
>>>> gpe read afe1 == 0
>>>> gpe read afe3 == ff
>>>> pcihotplug read ae00 == 0
>>>> pcihotplug read ae04 == 40
>>>> ...
>>>> pciej write ae08 <== 64		<=== we will call qdev_free()
>>>> pciej write ae08 <== 64
>>>> gpe write afe0 <== 2
>>>> gpe write afe2 <== 255
>>>> slot: 7, up: 0, down: 64, state: 1  <=== we attach interface the second time.
>>>> PM readw port=0x0000 val=0x0001
>>>> PM readw port=0x0002 val=0x0120
>>>> gpe read afe0 == 2
>>>> gpe read afe2 == ff
>>>> gpe read afe2 == ff
>>>> gpe write afe2 <== 253
>>>> gpe read afe1 == 0
>>>> gpe read afe3 == ff
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> slot: 7, up: 128, down: 0, state: 0  <=== we detach interface the second time
>>>> gpe write afe0 <== 2		<=== write 2 to afe0 will cause sci interupt to be lost
>>>> gpe write afe2 <== 255
>>>> ===========
>>>>
>>>> Here is short log, and show the different between the first time and second time:
>>>> ===========
>>>> gpe write afe0 <== 2
>>>> gpe write afe2 <== 255
>>>> slot: 6, up: 64, down: 0, state: 0  <===== we detach interface the first time
>>>> ....
>>>> slot: 7, up: 128, down: 0, state: 0  <=== we detach interface the second time
>>>> gpe write afe0 <== 2		<=== write 2 to afe0 will cause sci interupt to be lost
>>>> gpe write afe2 <== 255
>>>>
>>>> ===========
>>>>
>>>> Does this behavior is a normal behavior?
>>>>
>>>> The following patch can fix this problem.
>>>>
>>>> From 3c149575b52261b8da9a73d5c4ebdfedce9866c1 Mon Sep 17 00:00:00 2001
>>>> From: Wen Congyang <wency@cn.fujitsu.com>
>>>> Date: Mon, 28 Feb 2011 10:33:45 +0800
>>>> Subject: [PATCH] pend hotplug event until last event is handled by guest OS
>>>>
>>>> ---
>>>>  hw/acpi_piix4.c |   51 ++++++++++++++++++++++++++++++++++++++++++++-------
>>>>  1 files changed, 44 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
>>>> index b5a2762..4ff3475 100644
>>>> --- a/hw/acpi_piix4.c
>>>> +++ b/hw/acpi_piix4.c
>>>> @@ -74,6 +74,7 @@ typedef struct PIIX4PMState {
>>>>      /* for pci hotplug */
>>>>      struct gpe_regs gpe;
>>>>      struct pci_status pci0_status;
>>>> +    struct pci_status pci0_status_pending;
>>>>      uint32_t pci0_hotplug_enable;
>>>>  } PIIX4PMState;
>>>>  
>>>> @@ -518,6 +519,19 @@ static void gpe_reset_val(uint16_t *cur, int addr, uint32_t val)
>>>>      *cur = (*cur & (0xff << (8 - shift))) | (x1 << shift);
>>>>  }
>>>>  
>>>> +static void raise_pending_hotplug(PIIX4PMState *s)
>>>> +{
>>>> +    if (s->pci0_status_pending.up || s->pci0_status_pending.down) {
>>>> +        s->gpe.sts |= PIIX4_PCI_HOTPLUG_STATUS;
>>>> +        s->pci0_status.up = s->pci0_status_pending.up;
>>>> +        s->pci0_status.down = s->pci0_status_pending.down;
>>>> +        s->pci0_status_pending.up = 0;
>>>> +        s->pci0_status_pending.down = 0;
>>>> +
>>>> +        pm_update_sci(s);
>>>> +    }
>>>> +}
>>>> +
>>>>  static void gpe_writeb(void *opaque, uint32_t addr, uint32_t val)
>>>>  {
>>>>      PIIX4PMState *s = opaque;
>>>> @@ -539,6 +553,11 @@ static void gpe_writeb(void *opaque, uint32_t addr, uint32_t val)
>>>>      pm_update_sci(s);
>>>>  
>>>>      PIIX4_DPRINTF("gpe write %x <== %d\n", addr, val);
>>>> +
>>>> +    if (!(g->sts & PIIX4_PCI_HOTPLUG_STATUS) && g->en & PIIX4_PCI_HOTPLUG_STATUS) {
>>>> +        /* check and reraise the pending hotplug event */
>>>> +        raise_pending_hotplug(s);
>>>> +    }
>>>>  }
>>>>  
>>>>  static uint32_t pcihotplug_read(void *opaque, uint32_t addr)
>>>> @@ -639,12 +658,22 @@ static void enable_device(PIIX4PMState *s, int slot)
>>>>      s->pci0_status.up |= (1 << slot);
>>>>  }
>>>>  
>>>> +static void pend_enable_device(PIIX4PMState *s, int slot)
>>>> +{
>>>> +    s->pci0_status_pending.up |= (1 << slot);
>>>> +}
>>>> +
>>>>  static void disable_device(PIIX4PMState *s, int slot)
>>>>  {
>>>>      s->gpe.sts |= PIIX4_PCI_HOTPLUG_STATUS;
>>>>      s->pci0_status.down |= (1 << slot);
>>>>  }
>>>>  
>>>> +static void pend_disable_device(PIIX4PMState *s, int slot)
>>>> +{
>>>> +    s->pci0_status_pending.down |= (1 << slot);
>>>> +}
>>>> +
>>>>  static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
>>>>  				PCIHotplugState state)
>>>>  {
>>>> @@ -659,15 +688,23 @@ static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
>>>>          return 0;
>>>>      }
>>>>  
>>>> -    s->pci0_status.up = 0;
>>>> -    s->pci0_status.down = 0;
>>>> -    if (state == PCI_HOTPLUG_ENABLED) {
>>>> -        enable_device(s, slot);
>>>> +    if (s->gpe.sts & PIIX4_PCI_HOTPLUG_STATUS) {
>>>> +        if (state == PCI_HOTPLUG_ENABLED) {
>>>> +            pend_enable_device(s, slot);
>>>> +        } else {
>>>> +            pend_disable_device(s, slot);
>>>> +        }
>>>>      } else {
>>>> -        disable_device(s, slot);
>>>> -    }
>>>> +        s->pci0_status.up = 0;
>>>> +        s->pci0_status.down = 0;
>>>> +        if (state == PCI_HOTPLUG_ENABLED) {
>>>> +            enable_device(s, slot);
>>>> +        } else {
>>>> +            disable_device(s, slot);
>>>> +        }
>>>>  
>>>> -    pm_update_sci(s);
>>>> +        pm_update_sci(s);
>>>> +    }
>>>>  
>>>>      return 0;
>>>>  }
>>>> -- 
>>>> 1.7.1
>>>>
>>>>>
>>>>> Your patch adds a *second* qdev_free().  No good.
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
Isaku Yamahata March 1, 2011, 9:49 a.m. UTC | #5
On Tue, Mar 01, 2011 at 03:32:08PM +0800, Wen Congyang wrote:
> > The issue is, Qemu injected sci interrupt into guest,
> > but before the guest completes to handled it,
> > users/qemu can start to inject the next sci event triggered by hot plug/unplug.
> > Thus qemu loses sci interrupt and up/down status.
> 
> Yes, you are right. But I still do not understand why introducing pending bits
> can not solve the issue?

Maybe your patch works.
The difference is to introduce new variables or new state machine.

Anyway I suppose that the eventual solution is to switch to 
express native hotplug which is much more reliable and doesn't rely
on ACPI. (note: I'm very biased here.)

thanks

> 
> Thanks
> Wen Congyang
> 
> > 
> > thanks
> >>
> >>>
> >>> - on power-on  (0, 0)
> >>> - on hot plug  (0, 0) -> (1, 0)
> >>>                if other combination -> error
> >>> - on hot unpug (1, 0) or (0, 0) -> (0, 1)
> >>>                (0, 0) is for cold plugged devices.
> >>>                (1, 0) is for hot plugged devices.
> >>>                if other combination -> error
> >>> - on pciej_write(write on PCI_EJ_BASE)
> >>>                (0, 1) -> (0, 0) and qdev_free()
> >>>                if other combination -> ignore
> >>>
> >>> When enabling sci, those bits are checked and raise sci if necessary.
> >>>
> >>> Any comments on this?
> >>> Anyway the current pci hotplug-related commands are somewhat broken,
> >>> and needs clean up with multifunction hotplug support which is
> >>> on my todo list.
> >>>
> >>> thanks
> >>>
> >>> On Mon, Feb 28, 2011 at 10:52:40AM +0800, Wen Congyang wrote:
> >>>> Hi Markus Armbruster
> >>>>
> >>>> At 02/23/2011 04:30 PM, Markus Armbruster Write:
> >>>>> Isaku Yamahata <yamahata@valinux.co.jp> writes:
> >>>>>
> >>>>
> >>>> <snip>
> >>>>
> >>>>>
> >>>>> I don't think this patch is correct.  Let me explain.
> >>>>>
> >>>>> Device hot unplug is *not* guaranteed to succeed.
> >>>>>
> >>>>> For some buses, such as USB, it always succeeds immediately, i.e. when
> >>>>> the device_del monitor command finishes, the device is gone.  Live is
> >>>>> good.
> >>>>>
> >>>>> But for PCI, device_del merely initiates the ACPI unplug rain dance.  It
> >>>>> doesn't wait for the dance to complete.  Why?  The dance can take an
> >>>>> unpredictable amount of time, including forever.
> >>>>>
> >>>>> Problem: Subsequent device_add can fail if it reuses the qdev ID or PCI
> >>>>> slot, and the unplug has not yet completed (race condition), or it
> >>>>> failed.  Yes, Virginia, PCI hotplug *can* fail.
> >>>>>
> >>>>> When unplug succeeds, the qdev is automatically destroyed.
> >>>>> pciej_write() does that for PIIX4.  Looks like pcie_cap_slot_event()
> >>>>> does it for PCIE.
> >>>>
> >>>> I got a similar problem.  When I unplug a pci device by hand, it works
> >>>> as expected, and I can hotplug it again. But when I use a srcipt to
> >>>> do the same thing, sometimes it failed. I think I may find another bug.
> >>>>
> >>>> Steps to reproduce this bug:
> >>>> 1. cat ./test-e1000.sh # RHEL6RC is domain name
> >>>>    #! /bin/bash
> >>>>
> >>>>    while true; do
> >>>>            virsh attach-interface RHEL6RC network default --mac 52:54:00:1f:db:c7 --model e1000
> >>>>            if [[ $? -ne 0 ]]; then
> >>>>                    break
> >>>>            fi
> >>>>            virsh detach-interface RHEL6RC network --mac 52:54:00:1f:db:c7
> >>>>            if [[ $? -ne 0 ]]; then
> >>>>                    break
> >>>>            fi
> >>>>            sleep 5
> >>>>    done
> >>>>
> >>>> 2. ./test-e1000.sh
> >>>>    Interface attached successfully
> >>>>
> >>>>    Interface detached successfully
> >>>>
> >>>>    Interface attached successfully
> >>>>
> >>>>    Interface detached successfully
> >>>>
> >>>>    error: Failed to attach interface
> >>>>    error: operation failed: adding e1000,netdev=hostnet1,id=net1,mac=52:54:00:1f:db:c7,bus=pci.0,addr=0x8 device failed: Duplicate ID 'net1' for device
> >>>>
> >>>>
> >>>> I use ftrace to trace whether sci interrupt is passed to guest OS, here is log:
> >>>> # cat trace | grep 'irq=9'
> >>>>           <idle>-0     [000] 118342.634772: irq_handler_entry: irq=9 name=acpi
> >>>>           <idle>-0     [000] 118342.634831: irq_handler_exit: irq=9 ret=handled
> >>>>           <idle>-0     [000] 118342.693216: irq_handler_entry: irq=9 name=acpi
> >>>>           <idle>-0     [000] 118342.693254: irq_handler_exit: irq=9 ret=handled
> >>>>           <idle>-0     [000] 118347.737689: irq_handler_entry: irq=9 name=acpi
> >>>>           <idle>-0     [000] 118347.737738: irq_handler_exit: irq=9 ret=handled
> >>>> According to the log, we only receive 3 sci interrupt, and one is lost.
> >>>>
> >>>> I enable piix4's debug and 1 line printf into piix4_device_hotplug:
> >>>>     printf("slot: %d, up: %d, down: %d, state: %d\n", slot, s->pci0_status.up, s->pci0_status.down, state);
> >>>> here is the log:
> >>>> ========
> >>>> PM: mapping to 0xb000
> >>>> PM readw port=0x0004 val=0x0000
> >>>> ...
> >>>> gpe write afe2 <== 0
> >>>> gpe write afe0 <== 255
> >>>> gpe write afe3 <== 0
> >>>> gpe write afe1 <== 255
> >>>> PM readw port=0x0000 val=0x0001
> >>>> PM readw port=0x0002 val=0x0000
> >>>> gpe read afe0 == 0
> >>>> gpe read afe2 == 0
> >>>> gpe read afe1 == 0
> >>>> gpe read afe3 == 0
> >>>> PM writew port=0x0000 val=0x0020
> >>>> PM readw port=0x0002 val=0x0000
> >>>> PM writew port=0x0002 val=0x0020
> >>>> PM readw port=0x0002 val=0x0020
> >>>> gpe write afe2 <== 255
> >>>> gpe write afe3 <== 255
> >>>> ...
> >>>> slot: 6, up: 0, down: 0, state: 1   <==== we attach interface the first time
> >>>> PM readw port=0x0000 val=0x0001
> >>>> PM readw port=0x0002 val=0x0120
> >>>> gpe read afe0 == 2
> >>>> gpe read afe2 == ff
> >>>> gpe read afe2 == ff
> >>>> gpe write afe2 <== 253
> >>>> gpe read afe1 == 0
> >>>> gpe read afe3 == ff
> >>>> pcihotplug read ae00 == 40
> >>>> pcihotplug read ae04 == 0
> >>>> ...
> >>>> gpe write afe0 <== 2
> >>>> gpe write afe2 <== 255
> >>>> slot: 6, up: 64, down: 0, state: 0  <===== we detach interface the first time
> >>>> PM readw port=0x0000 val=0x0001
> >>>> PM readw port=0x0002 val=0x0120
> >>>> gpe read afe0 == 2
> >>>> gpe read afe2 == ff
> >>>> gpe read afe2 == ff
> >>>> gpe write afe2 <== 253
> >>>> gpe read afe1 == 0
> >>>> gpe read afe3 == ff
> >>>> pcihotplug read ae00 == 0
> >>>> pcihotplug read ae04 == 40
> >>>> ...
> >>>> pciej write ae08 <== 64		<=== we will call qdev_free()
> >>>> pciej write ae08 <== 64
> >>>> gpe write afe0 <== 2
> >>>> gpe write afe2 <== 255
> >>>> slot: 7, up: 0, down: 64, state: 1  <=== we attach interface the second time.
> >>>> PM readw port=0x0000 val=0x0001
> >>>> PM readw port=0x0002 val=0x0120
> >>>> gpe read afe0 == 2
> >>>> gpe read afe2 == ff
> >>>> gpe read afe2 == ff
> >>>> gpe write afe2 <== 253
> >>>> gpe read afe1 == 0
> >>>> gpe read afe3 == ff
> >>>> pcihotplug read ae00 == 80
> >>>> pcihotplug read ae04 == 0
> >>>> pcihotplug read ae00 == 80
> >>>> pcihotplug read ae04 == 0
> >>>> pcihotplug read ae00 == 80
> >>>> pcihotplug read ae04 == 0
> >>>> pcihotplug read ae00 == 80
> >>>> pcihotplug read ae04 == 0
> >>>> pcihotplug read ae00 == 80
> >>>> pcihotplug read ae04 == 0
> >>>> pcihotplug read ae00 == 80
> >>>> pcihotplug read ae04 == 0
> >>>> pcihotplug read ae00 == 80
> >>>> pcihotplug read ae04 == 0
> >>>> pcihotplug read ae00 == 80
> >>>> pcihotplug read ae04 == 0
> >>>> pcihotplug read ae00 == 80
> >>>> pcihotplug read ae04 == 0
> >>>> pcihotplug read ae00 == 80
> >>>> pcihotplug read ae04 == 0
> >>>> pcihotplug read ae00 == 80
> >>>> pcihotplug read ae04 == 0
> >>>> pcihotplug read ae00 == 80
> >>>> pcihotplug read ae04 == 0
> >>>> pcihotplug read ae00 == 80
> >>>> pcihotplug read ae04 == 0
> >>>> pcihotplug read ae00 == 80
> >>>> pcihotplug read ae04 == 0
> >>>> pcihotplug read ae00 == 80
> >>>> pcihotplug read ae04 == 0
> >>>> pcihotplug read ae00 == 80
> >>>> pcihotplug read ae04 == 0
> >>>> pcihotplug read ae00 == 80
> >>>> pcihotplug read ae04 == 0
> >>>> pcihotplug read ae00 == 80
> >>>> pcihotplug read ae04 == 0
> >>>> pcihotplug read ae00 == 80
> >>>> pcihotplug read ae04 == 0
> >>>> pcihotplug read ae00 == 80
> >>>> pcihotplug read ae04 == 0
> >>>> pcihotplug read ae00 == 80
> >>>> pcihotplug read ae04 == 0
> >>>> pcihotplug read ae00 == 80
> >>>> pcihotplug read ae04 == 0
> >>>> pcihotplug read ae00 == 80
> >>>> pcihotplug read ae04 == 0
> >>>> pcihotplug read ae00 == 80
> >>>> pcihotplug read ae04 == 0
> >>>> pcihotplug read ae00 == 80
> >>>> pcihotplug read ae04 == 0
> >>>> pcihotplug read ae00 == 80
> >>>> pcihotplug read ae04 == 0
> >>>> pcihotplug read ae00 == 80
> >>>> pcihotplug read ae04 == 0
> >>>> pcihotplug read ae00 == 80
> >>>> pcihotplug read ae04 == 0
> >>>> pcihotplug read ae00 == 80
> >>>> pcihotplug read ae04 == 0
> >>>> pcihotplug read ae00 == 80
> >>>> pcihotplug read ae04 == 0
> >>>> pcihotplug read ae00 == 80
> >>>> pcihotplug read ae04 == 0
> >>>> slot: 7, up: 128, down: 0, state: 0  <=== we detach interface the second time
> >>>> gpe write afe0 <== 2		<=== write 2 to afe0 will cause sci interupt to be lost
> >>>> gpe write afe2 <== 255
> >>>> ===========
> >>>>
> >>>> Here is short log, and show the different between the first time and second time:
> >>>> ===========
> >>>> gpe write afe0 <== 2
> >>>> gpe write afe2 <== 255
> >>>> slot: 6, up: 64, down: 0, state: 0  <===== we detach interface the first time
> >>>> ....
> >>>> slot: 7, up: 128, down: 0, state: 0  <=== we detach interface the second time
> >>>> gpe write afe0 <== 2		<=== write 2 to afe0 will cause sci interupt to be lost
> >>>> gpe write afe2 <== 255
> >>>>
> >>>> ===========
> >>>>
> >>>> Does this behavior is a normal behavior?
> >>>>
> >>>> The following patch can fix this problem.
> >>>>
> >>>> From 3c149575b52261b8da9a73d5c4ebdfedce9866c1 Mon Sep 17 00:00:00 2001
> >>>> From: Wen Congyang <wency@cn.fujitsu.com>
> >>>> Date: Mon, 28 Feb 2011 10:33:45 +0800
> >>>> Subject: [PATCH] pend hotplug event until last event is handled by guest OS
> >>>>
> >>>> ---
> >>>>  hw/acpi_piix4.c |   51 ++++++++++++++++++++++++++++++++++++++++++++-------
> >>>>  1 files changed, 44 insertions(+), 7 deletions(-)
> >>>>
> >>>> diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
> >>>> index b5a2762..4ff3475 100644
> >>>> --- a/hw/acpi_piix4.c
> >>>> +++ b/hw/acpi_piix4.c
> >>>> @@ -74,6 +74,7 @@ typedef struct PIIX4PMState {
> >>>>      /* for pci hotplug */
> >>>>      struct gpe_regs gpe;
> >>>>      struct pci_status pci0_status;
> >>>> +    struct pci_status pci0_status_pending;
> >>>>      uint32_t pci0_hotplug_enable;
> >>>>  } PIIX4PMState;
> >>>>  
> >>>> @@ -518,6 +519,19 @@ static void gpe_reset_val(uint16_t *cur, int addr, uint32_t val)
> >>>>      *cur = (*cur & (0xff << (8 - shift))) | (x1 << shift);
> >>>>  }
> >>>>  
> >>>> +static void raise_pending_hotplug(PIIX4PMState *s)
> >>>> +{
> >>>> +    if (s->pci0_status_pending.up || s->pci0_status_pending.down) {
> >>>> +        s->gpe.sts |= PIIX4_PCI_HOTPLUG_STATUS;
> >>>> +        s->pci0_status.up = s->pci0_status_pending.up;
> >>>> +        s->pci0_status.down = s->pci0_status_pending.down;
> >>>> +        s->pci0_status_pending.up = 0;
> >>>> +        s->pci0_status_pending.down = 0;
> >>>> +
> >>>> +        pm_update_sci(s);
> >>>> +    }
> >>>> +}
> >>>> +
> >>>>  static void gpe_writeb(void *opaque, uint32_t addr, uint32_t val)
> >>>>  {
> >>>>      PIIX4PMState *s = opaque;
> >>>> @@ -539,6 +553,11 @@ static void gpe_writeb(void *opaque, uint32_t addr, uint32_t val)
> >>>>      pm_update_sci(s);
> >>>>  
> >>>>      PIIX4_DPRINTF("gpe write %x <== %d\n", addr, val);
> >>>> +
> >>>> +    if (!(g->sts & PIIX4_PCI_HOTPLUG_STATUS) && g->en & PIIX4_PCI_HOTPLUG_STATUS) {
> >>>> +        /* check and reraise the pending hotplug event */
> >>>> +        raise_pending_hotplug(s);
> >>>> +    }
> >>>>  }
> >>>>  
> >>>>  static uint32_t pcihotplug_read(void *opaque, uint32_t addr)
> >>>> @@ -639,12 +658,22 @@ static void enable_device(PIIX4PMState *s, int slot)
> >>>>      s->pci0_status.up |= (1 << slot);
> >>>>  }
> >>>>  
> >>>> +static void pend_enable_device(PIIX4PMState *s, int slot)
> >>>> +{
> >>>> +    s->pci0_status_pending.up |= (1 << slot);
> >>>> +}
> >>>> +
> >>>>  static void disable_device(PIIX4PMState *s, int slot)
> >>>>  {
> >>>>      s->gpe.sts |= PIIX4_PCI_HOTPLUG_STATUS;
> >>>>      s->pci0_status.down |= (1 << slot);
> >>>>  }
> >>>>  
> >>>> +static void pend_disable_device(PIIX4PMState *s, int slot)
> >>>> +{
> >>>> +    s->pci0_status_pending.down |= (1 << slot);
> >>>> +}
> >>>> +
> >>>>  static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
> >>>>  				PCIHotplugState state)
> >>>>  {
> >>>> @@ -659,15 +688,23 @@ static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
> >>>>          return 0;
> >>>>      }
> >>>>  
> >>>> -    s->pci0_status.up = 0;
> >>>> -    s->pci0_status.down = 0;
> >>>> -    if (state == PCI_HOTPLUG_ENABLED) {
> >>>> -        enable_device(s, slot);
> >>>> +    if (s->gpe.sts & PIIX4_PCI_HOTPLUG_STATUS) {
> >>>> +        if (state == PCI_HOTPLUG_ENABLED) {
> >>>> +            pend_enable_device(s, slot);
> >>>> +        } else {
> >>>> +            pend_disable_device(s, slot);
> >>>> +        }
> >>>>      } else {
> >>>> -        disable_device(s, slot);
> >>>> -    }
> >>>> +        s->pci0_status.up = 0;
> >>>> +        s->pci0_status.down = 0;
> >>>> +        if (state == PCI_HOTPLUG_ENABLED) {
> >>>> +            enable_device(s, slot);
> >>>> +        } else {
> >>>> +            disable_device(s, slot);
> >>>> +        }
> >>>>  
> >>>> -    pm_update_sci(s);
> >>>> +        pm_update_sci(s);
> >>>> +    }
> >>>>  
> >>>>      return 0;
> >>>>  }
> >>>> -- 
> >>>> 1.7.1
> >>>>
> >>>>>
> >>>>> Your patch adds a *second* qdev_free().  No good.
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> > 
>
Ryan Harper March 9, 2011, 4:08 a.m. UTC | #6
* Wen Congyang <wency@cn.fujitsu.com> [2011-02-27 20:56]:
> Hi Markus Armbruster
> 
> At 02/23/2011 04:30 PM, Markus Armbruster Write:
> > Isaku Yamahata <yamahata@valinux.co.jp> writes:
> > 
> 
> <snip>
> 
> > 
> > I don't think this patch is correct.  Let me explain.
> > 
> > Device hot unplug is *not* guaranteed to succeed.
> > 
> > For some buses, such as USB, it always succeeds immediately, i.e. when
> > the device_del monitor command finishes, the device is gone.  Live is
> > good.
> > 
> > But for PCI, device_del merely initiates the ACPI unplug rain dance.  It
> > doesn't wait for the dance to complete.  Why?  The dance can take an
> > unpredictable amount of time, including forever.
> > 
> > Problem: Subsequent device_add can fail if it reuses the qdev ID or PCI
> > slot, and the unplug has not yet completed (race condition), or it
> > failed.  Yes, Virginia, PCI hotplug *can* fail.
> > 
> > When unplug succeeds, the qdev is automatically destroyed.
> > pciej_write() does that for PIIX4.  Looks like pcie_cap_slot_event()
> > does it for PCIE.
> 
> I got a similar problem.  When I unplug a pci device by hand, it works
> as expected, and I can hotplug it again. But when I use a srcipt to
> do the same thing, sometimes it failed. I think I may find another bug.
> 
> Steps to reproduce this bug:
> 1. cat ./test-e1000.sh # RHEL6RC is domain name
>    #! /bin/bash
> 
>    while true; do
>            virsh attach-interface RHEL6RC network default --mac 52:54:00:1f:db:c7 --model e1000
>            if [[ $? -ne 0 ]]; then
>                    break
>            fi
>            virsh detach-interface RHEL6RC network --mac 52:54:00:1f:db:c7
>            if [[ $? -ne 0 ]]; then
>                    break
>            fi
>            sleep 5

How do you know that the guest has responded at this point before you
attempt to attach again at the top of the loop.  Any attach/detach
requires the guest to respond to the request and it may not respond at
all.


>    done
> 
> 2. ./test-e1000.sh
>    Interface attached successfully
> 
>    Interface detached successfully
> 
>    Interface attached successfully
> 
>    Interface detached successfully
> 
>    error: Failed to attach interface
>    error: operation failed: adding e1000,netdev=hostnet1,id=net1,mac=52:54:00:1f:db:c7,bus=pci.0,addr=0x8 device failed: Duplicate ID 'net1' for device
> 
> 
> I use ftrace to trace whether sci interrupt is passed to guest OS, here is log:
> # cat trace | grep 'irq=9'
>           <idle>-0     [000] 118342.634772: irq_handler_entry: irq=9 name=acpi
>           <idle>-0     [000] 118342.634831: irq_handler_exit: irq=9 ret=handled
>           <idle>-0     [000] 118342.693216: irq_handler_entry: irq=9 name=acpi
>           <idle>-0     [000] 118342.693254: irq_handler_exit: irq=9 ret=handled
>           <idle>-0     [000] 118347.737689: irq_handler_entry: irq=9 name=acpi
>           <idle>-0     [000] 118347.737738: irq_handler_exit: irq=9 ret=handled
> According to the log, we only receive 3 sci interrupt, and one is lost.
> 
> I enable piix4's debug and 1 line printf into piix4_device_hotplug:
>     printf("slot: %d, up: %d, down: %d, state: %d\n", slot, s->pci0_status.up, s->pci0_status.down, state);
> here is the log:
> ========
> PM: mapping to 0xb000
> PM readw port=0x0004 val=0x0000
> ...
> gpe write afe2 <== 0
> gpe write afe0 <== 255
> gpe write afe3 <== 0
> gpe write afe1 <== 255
> PM readw port=0x0000 val=0x0001
> PM readw port=0x0002 val=0x0000
> gpe read afe0 == 0
> gpe read afe2 == 0
> gpe read afe1 == 0
> gpe read afe3 == 0
> PM writew port=0x0000 val=0x0020
> PM readw port=0x0002 val=0x0000
> PM writew port=0x0002 val=0x0020
> PM readw port=0x0002 val=0x0020
> gpe write afe2 <== 255
> gpe write afe3 <== 255
> ...
> slot: 6, up: 0, down: 0, state: 1   <==== we attach interface the first time
> PM readw port=0x0000 val=0x0001
> PM readw port=0x0002 val=0x0120
> gpe read afe0 == 2
> gpe read afe2 == ff
> gpe read afe2 == ff
> gpe write afe2 <== 253
> gpe read afe1 == 0
> gpe read afe3 == ff
> pcihotplug read ae00 == 40
> pcihotplug read ae04 == 0
> ...
> gpe write afe0 <== 2
> gpe write afe2 <== 255
> slot: 6, up: 64, down: 0, state: 0  <===== we detach interface the first time
> PM readw port=0x0000 val=0x0001
> PM readw port=0x0002 val=0x0120
> gpe read afe0 == 2
> gpe read afe2 == ff
> gpe read afe2 == ff
> gpe write afe2 <== 253
> gpe read afe1 == 0
> gpe read afe3 == ff
> pcihotplug read ae00 == 0
> pcihotplug read ae04 == 40
> ...
> pciej write ae08 <== 64		<=== we will call qdev_free()
> pciej write ae08 <== 64
> gpe write afe0 <== 2
> gpe write afe2 <== 255
> slot: 7, up: 0, down: 64, state: 1  <=== we attach interface the second time.
> PM readw port=0x0000 val=0x0001
> PM readw port=0x0002 val=0x0120
> gpe read afe0 == 2
> gpe read afe2 == ff
> gpe read afe2 == ff
> gpe write afe2 <== 253
> gpe read afe1 == 0
> gpe read afe3 == ff
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> pcihotplug read ae00 == 80
> pcihotplug read ae04 == 0
> slot: 7, up: 128, down: 0, state: 0  <=== we detach interface the second time
> gpe write afe0 <== 2		<=== write 2 to afe0 will cause sci interupt to be lost
> gpe write afe2 <== 255
> ===========
> 
> Here is short log, and show the different between the first time and second time:
> ===========
> gpe write afe0 <== 2
> gpe write afe2 <== 255
> slot: 6, up: 64, down: 0, state: 0  <===== we detach interface the first time
> ....
> slot: 7, up: 128, down: 0, state: 0  <=== we detach interface the second time
> gpe write afe0 <== 2		<=== write 2 to afe0 will cause sci interupt to be lost
> gpe write afe2 <== 255
> 
> ===========
> 
> Does this behavior is a normal behavior?
> 
> The following patch can fix this problem.
> 
> >From 3c149575b52261b8da9a73d5c4ebdfedce9866c1 Mon Sep 17 00:00:00 2001
> From: Wen Congyang <wency@cn.fujitsu.com>
> Date: Mon, 28 Feb 2011 10:33:45 +0800
> Subject: [PATCH] pend hotplug event until last event is handled by guest OS
> 
> ---
>  hw/acpi_piix4.c |   51 ++++++++++++++++++++++++++++++++++++++++++++-------
>  1 files changed, 44 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
> index b5a2762..4ff3475 100644
> --- a/hw/acpi_piix4.c
> +++ b/hw/acpi_piix4.c
> @@ -74,6 +74,7 @@ typedef struct PIIX4PMState {
>      /* for pci hotplug */
>      struct gpe_regs gpe;
>      struct pci_status pci0_status;
> +    struct pci_status pci0_status_pending;
>      uint32_t pci0_hotplug_enable;
>  } PIIX4PMState;
> 
> @@ -518,6 +519,19 @@ static void gpe_reset_val(uint16_t *cur, int addr, uint32_t val)
>      *cur = (*cur & (0xff << (8 - shift))) | (x1 << shift);
>  }
> 
> +static void raise_pending_hotplug(PIIX4PMState *s)
> +{
> +    if (s->pci0_status_pending.up || s->pci0_status_pending.down) {
> +        s->gpe.sts |= PIIX4_PCI_HOTPLUG_STATUS;
> +        s->pci0_status.up = s->pci0_status_pending.up;
> +        s->pci0_status.down = s->pci0_status_pending.down;
> +        s->pci0_status_pending.up = 0;
> +        s->pci0_status_pending.down = 0;
> +
> +        pm_update_sci(s);
> +    }
> +}
> +
>  static void gpe_writeb(void *opaque, uint32_t addr, uint32_t val)
>  {
>      PIIX4PMState *s = opaque;
> @@ -539,6 +553,11 @@ static void gpe_writeb(void *opaque, uint32_t addr, uint32_t val)
>      pm_update_sci(s);
> 
>      PIIX4_DPRINTF("gpe write %x <== %d\n", addr, val);
> +
> +    if (!(g->sts & PIIX4_PCI_HOTPLUG_STATUS) && g->en & PIIX4_PCI_HOTPLUG_STATUS) {
> +        /* check and reraise the pending hotplug event */
> +        raise_pending_hotplug(s);
> +    }
>  }
> 
>  static uint32_t pcihotplug_read(void *opaque, uint32_t addr)
> @@ -639,12 +658,22 @@ static void enable_device(PIIX4PMState *s, int slot)
>      s->pci0_status.up |= (1 << slot);
>  }
> 
> +static void pend_enable_device(PIIX4PMState *s, int slot)
> +{
> +    s->pci0_status_pending.up |= (1 << slot);
> +}
> +
>  static void disable_device(PIIX4PMState *s, int slot)
>  {
>      s->gpe.sts |= PIIX4_PCI_HOTPLUG_STATUS;
>      s->pci0_status.down |= (1 << slot);
>  }
> 
> +static void pend_disable_device(PIIX4PMState *s, int slot)
> +{
> +    s->pci0_status_pending.down |= (1 << slot);
> +}
> +
>  static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
>  				PCIHotplugState state)
>  {
> @@ -659,15 +688,23 @@ static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
>          return 0;
>      }
> 
> -    s->pci0_status.up = 0;
> -    s->pci0_status.down = 0;
> -    if (state == PCI_HOTPLUG_ENABLED) {
> -        enable_device(s, slot);
> +    if (s->gpe.sts & PIIX4_PCI_HOTPLUG_STATUS) {
> +        if (state == PCI_HOTPLUG_ENABLED) {
> +            pend_enable_device(s, slot);
> +        } else {
> +            pend_disable_device(s, slot);
> +        }
>      } else {
> -        disable_device(s, slot);
> -    }
> +        s->pci0_status.up = 0;
> +        s->pci0_status.down = 0;
> +        if (state == PCI_HOTPLUG_ENABLED) {
> +            enable_device(s, slot);
> +        } else {
> +            disable_device(s, slot);
> +        }
> 
> -    pm_update_sci(s);
> +        pm_update_sci(s);
> +    }
> 
>      return 0;
>  }
> -- 
> 1.7.1
> 
> > 
> > Your patch adds a *second* qdev_free().  No good.
> > 
> > 
>
Wen Congyang March 9, 2011, 5:04 a.m. UTC | #7
At 03/09/2011 12:08 PM, Ryan Harper Write:
> * Wen Congyang <wency@cn.fujitsu.com> [2011-02-27 20:56]:
>> Hi Markus Armbruster
>>
>> At 02/23/2011 04:30 PM, Markus Armbruster Write:
>>> Isaku Yamahata <yamahata@valinux.co.jp> writes:
>>>
>>
>> <snip>
>>
>>>
>>> I don't think this patch is correct.  Let me explain.
>>>
>>> Device hot unplug is *not* guaranteed to succeed.
>>>
>>> For some buses, such as USB, it always succeeds immediately, i.e. when
>>> the device_del monitor command finishes, the device is gone.  Live is
>>> good.
>>>
>>> But for PCI, device_del merely initiates the ACPI unplug rain dance.  It
>>> doesn't wait for the dance to complete.  Why?  The dance can take an
>>> unpredictable amount of time, including forever.
>>>
>>> Problem: Subsequent device_add can fail if it reuses the qdev ID or PCI
>>> slot, and the unplug has not yet completed (race condition), or it
>>> failed.  Yes, Virginia, PCI hotplug *can* fail.
>>>
>>> When unplug succeeds, the qdev is automatically destroyed.
>>> pciej_write() does that for PIIX4.  Looks like pcie_cap_slot_event()
>>> does it for PCIE.
>>
>> I got a similar problem.  When I unplug a pci device by hand, it works
>> as expected, and I can hotplug it again. But when I use a srcipt to
>> do the same thing, sometimes it failed. I think I may find another bug.
>>
>> Steps to reproduce this bug:
>> 1. cat ./test-e1000.sh # RHEL6RC is domain name
>>    #! /bin/bash
>>
>>    while true; do
>>            virsh attach-interface RHEL6RC network default --mac 52:54:00:1f:db:c7 --model e1000
>>            if [[ $? -ne 0 ]]; then
>>                    break
>>            fi
>>            virsh detach-interface RHEL6RC network --mac 52:54:00:1f:db:c7
>>            if [[ $? -ne 0 ]]; then
>>                    break
>>            fi
>>            sleep 5
> 
> How do you know that the guest has responded at this point before you
> attempt to attach again at the top of the loop.  Any attach/detach
> requires the guest to respond to the request and it may not respond at
> all.

When I attach/detach interface by hand, it works fine: I can see the new interface
when I attach it, and it disapears when I detached it.


> 
>
Ryan Harper March 9, 2011, 6:12 a.m. UTC | #8
* Wen Congyang <wency@cn.fujitsu.com> [2011-03-08 23:09]:
> At 03/09/2011 12:08 PM, Ryan Harper Write:
> > * Wen Congyang <wency@cn.fujitsu.com> [2011-02-27 20:56]:
> >> Hi Markus Armbruster
> >>
> >> At 02/23/2011 04:30 PM, Markus Armbruster Write:
> >>> Isaku Yamahata <yamahata@valinux.co.jp> writes:
> >>>
> >>
> >> <snip>
> >>
> >>>
> >>> I don't think this patch is correct.  Let me explain.
> >>>
> >>> Device hot unplug is *not* guaranteed to succeed.
> >>>
> >>> For some buses, such as USB, it always succeeds immediately, i.e. when
> >>> the device_del monitor command finishes, the device is gone.  Live is
> >>> good.
> >>>
> >>> But for PCI, device_del merely initiates the ACPI unplug rain dance.  It
> >>> doesn't wait for the dance to complete.  Why?  The dance can take an
> >>> unpredictable amount of time, including forever.
> >>>
> >>> Problem: Subsequent device_add can fail if it reuses the qdev ID or PCI
> >>> slot, and the unplug has not yet completed (race condition), or it
> >>> failed.  Yes, Virginia, PCI hotplug *can* fail.
> >>>
> >>> When unplug succeeds, the qdev is automatically destroyed.
> >>> pciej_write() does that for PIIX4.  Looks like pcie_cap_slot_event()
> >>> does it for PCIE.
> >>
> >> I got a similar problem.  When I unplug a pci device by hand, it works
> >> as expected, and I can hotplug it again. But when I use a srcipt to
> >> do the same thing, sometimes it failed. I think I may find another bug.
> >>
> >> Steps to reproduce this bug:
> >> 1. cat ./test-e1000.sh # RHEL6RC is domain name
> >>    #! /bin/bash
> >>
> >>    while true; do
> >>            virsh attach-interface RHEL6RC network default --mac 52:54:00:1f:db:c7 --model e1000
> >>            if [[ $? -ne 0 ]]; then
> >>                    break
> >>            fi
> >>            virsh detach-interface RHEL6RC network --mac 52:54:00:1f:db:c7
> >>            if [[ $? -ne 0 ]]; then
> >>                    break
> >>            fi
> >>            sleep 5
> > 
> > How do you know that the guest has responded at this point before you
> > attempt to attach again at the top of the loop.  Any attach/detach
> > requires the guest to respond to the request and it may not respond at
> > all.
> 
> When I attach/detach interface by hand, it works fine: I can see the new interface
> when I attach it, and it disapears when I detached it.

The point is that since the attach and detach require guest
participation, this interface isn't reliable.  You have a sleep 5 in
your loop, hoping to wait long enough for the guest to respond, but
after a number of iterations in your loop it fails, you can bump the
sleep to to 3600 seconds and the guest *still* might not respond...
Wen Congyang March 9, 2011, 7:19 a.m. UTC | #9
At 03/09/2011 02:12 PM, Ryan Harper Write:
> * Wen Congyang <wency@cn.fujitsu.com> [2011-03-08 23:09]:
>> At 03/09/2011 12:08 PM, Ryan Harper Write:
>>> * Wen Congyang <wency@cn.fujitsu.com> [2011-02-27 20:56]:
>>>> Hi Markus Armbruster
>>>>
>>>> At 02/23/2011 04:30 PM, Markus Armbruster Write:
>>>>> Isaku Yamahata <yamahata@valinux.co.jp> writes:
>>>>>
>>>>
>>>> <snip>
>>>>
>>>>>
>>>>> I don't think this patch is correct.  Let me explain.
>>>>>
>>>>> Device hot unplug is *not* guaranteed to succeed.
>>>>>
>>>>> For some buses, such as USB, it always succeeds immediately, i.e. when
>>>>> the device_del monitor command finishes, the device is gone.  Live is
>>>>> good.
>>>>>
>>>>> But for PCI, device_del merely initiates the ACPI unplug rain dance.  It
>>>>> doesn't wait for the dance to complete.  Why?  The dance can take an
>>>>> unpredictable amount of time, including forever.
>>>>>
>>>>> Problem: Subsequent device_add can fail if it reuses the qdev ID or PCI
>>>>> slot, and the unplug has not yet completed (race condition), or it
>>>>> failed.  Yes, Virginia, PCI hotplug *can* fail.
>>>>>
>>>>> When unplug succeeds, the qdev is automatically destroyed.
>>>>> pciej_write() does that for PIIX4.  Looks like pcie_cap_slot_event()
>>>>> does it for PCIE.
>>>>
>>>> I got a similar problem.  When I unplug a pci device by hand, it works
>>>> as expected, and I can hotplug it again. But when I use a srcipt to
>>>> do the same thing, sometimes it failed. I think I may find another bug.
>>>>
>>>> Steps to reproduce this bug:
>>>> 1. cat ./test-e1000.sh # RHEL6RC is domain name
>>>>    #! /bin/bash
>>>>
>>>>    while true; do
>>>>            virsh attach-interface RHEL6RC network default --mac 52:54:00:1f:db:c7 --model e1000
>>>>            if [[ $? -ne 0 ]]; then
>>>>                    break
>>>>            fi
>>>>            virsh detach-interface RHEL6RC network --mac 52:54:00:1f:db:c7
>>>>            if [[ $? -ne 0 ]]; then
>>>>                    break
>>>>            fi
>>>>            sleep 5
>>>
>>> How do you know that the guest has responded at this point before you
>>> attempt to attach again at the top of the loop.  Any attach/detach
>>> requires the guest to respond to the request and it may not respond at
>>> all.
>>
>> When I attach/detach interface by hand, it works fine: I can see the new interface
>> when I attach it, and it disapears when I detached it.
> 
> The point is that since the attach and detach require guest
> participation, this interface isn't reliable.  You have a sleep 5 in
> your loop, hoping to wait long enough for the guest to respond, but
> after a number of iterations in your loop it fails, you can bump the
> sleep to to 3600 seconds and the guest *still* might not respond...

We use sci interrupt to tell the guest that a device has been attached/detached.
But the sci interrupt is *lost* in qemu, so the guest does not know a device has
been attached/detached, and does not respond it.

If the sci interrupt is not lost, the guest can respond it.

> 
>
Ryan Harper March 10, 2011, 4:31 a.m. UTC | #10
* Wen Congyang <wency@cn.fujitsu.com> [2011-03-09 01:21]:
> At 03/09/2011 02:12 PM, Ryan Harper Write:
> > * Wen Congyang <wency@cn.fujitsu.com> [2011-03-08 23:09]:
> >> At 03/09/2011 12:08 PM, Ryan Harper Write:
> >>> * Wen Congyang <wency@cn.fujitsu.com> [2011-02-27 20:56]:
> >>>> Hi Markus Armbruster
> >>>>
> >>>> At 02/23/2011 04:30 PM, Markus Armbruster Write:
> >>>>> Isaku Yamahata <yamahata@valinux.co.jp> writes:
> >>>>>
> >>>>
> >>>> <snip>
> >>>>
> >>>>>
> >>>>> I don't think this patch is correct.  Let me explain.
> >>>>>
> >>>>> Device hot unplug is *not* guaranteed to succeed.
> >>>>>
> >>>>> For some buses, such as USB, it always succeeds immediately, i.e. when
> >>>>> the device_del monitor command finishes, the device is gone.  Live is
> >>>>> good.
> >>>>>
> >>>>> But for PCI, device_del merely initiates the ACPI unplug rain dance.  It
> >>>>> doesn't wait for the dance to complete.  Why?  The dance can take an
> >>>>> unpredictable amount of time, including forever.
> >>>>>
> >>>>> Problem: Subsequent device_add can fail if it reuses the qdev ID or PCI
> >>>>> slot, and the unplug has not yet completed (race condition), or it
> >>>>> failed.  Yes, Virginia, PCI hotplug *can* fail.
> >>>>>
> >>>>> When unplug succeeds, the qdev is automatically destroyed.
> >>>>> pciej_write() does that for PIIX4.  Looks like pcie_cap_slot_event()
> >>>>> does it for PCIE.
> >>>>
> >>>> I got a similar problem.  When I unplug a pci device by hand, it works
> >>>> as expected, and I can hotplug it again. But when I use a srcipt to
> >>>> do the same thing, sometimes it failed. I think I may find another bug.
> >>>>
> >>>> Steps to reproduce this bug:
> >>>> 1. cat ./test-e1000.sh # RHEL6RC is domain name
> >>>>    #! /bin/bash
> >>>>
> >>>>    while true; do
> >>>>            virsh attach-interface RHEL6RC network default --mac 52:54:00:1f:db:c7 --model e1000
> >>>>            if [[ $? -ne 0 ]]; then
> >>>>                    break
> >>>>            fi
> >>>>            virsh detach-interface RHEL6RC network --mac 52:54:00:1f:db:c7
> >>>>            if [[ $? -ne 0 ]]; then
> >>>>                    break
> >>>>            fi
> >>>>            sleep 5
> >>>
> >>> How do you know that the guest has responded at this point before you
> >>> attempt to attach again at the top of the loop.  Any attach/detach
> >>> requires the guest to respond to the request and it may not respond at
> >>> all.
> >>
> >> When I attach/detach interface by hand, it works fine: I can see the new interface
> >> when I attach it, and it disapears when I detached it.
> > 
> > The point is that since the attach and detach require guest
> > participation, this interface isn't reliable.  You have a sleep 5 in
> > your loop, hoping to wait long enough for the guest to respond, but
> > after a number of iterations in your loop it fails, you can bump the
> > sleep to to 3600 seconds and the guest *still* might not respond...
> 
> We use sci interrupt to tell the guest that a device has been attached/detached.
> But the sci interrupt is *lost* in qemu, so the guest does not know a device has
> been attached/detached, and does not respond it.
> 
> If the sci interrupt is not lost, the guest can respond it.

*can* is the important word.  Even if the interrupt isn;t lost, you have
no way to guarantee that the guest will respond at all.  That's not to
say there isn't a bug around the lost interrupt; but rather a more
general point about hotplug's current architecture.

> 
> > 
> >
Wen Congyang March 10, 2011, 5:28 a.m. UTC | #11
At 03/10/2011 12:31 PM, Ryan Harper Write:
> * Wen Congyang <wency@cn.fujitsu.com> [2011-03-09 01:21]:
>> At 03/09/2011 02:12 PM, Ryan Harper Write:
>>> * Wen Congyang <wency@cn.fujitsu.com> [2011-03-08 23:09]:
>>>> At 03/09/2011 12:08 PM, Ryan Harper Write:
>>>>> * Wen Congyang <wency@cn.fujitsu.com> [2011-02-27 20:56]:
>>>>>> Hi Markus Armbruster
>>>>>>
>>>>>> At 02/23/2011 04:30 PM, Markus Armbruster Write:
>>>>>>> Isaku Yamahata <yamahata@valinux.co.jp> writes:
>>>>>>>
>>>>>>
>>>>>> <snip>
>>>>>>
>>>>>>>
>>>>>>> I don't think this patch is correct.  Let me explain.
>>>>>>>
>>>>>>> Device hot unplug is *not* guaranteed to succeed.
>>>>>>>
>>>>>>> For some buses, such as USB, it always succeeds immediately, i.e. when
>>>>>>> the device_del monitor command finishes, the device is gone.  Live is
>>>>>>> good.
>>>>>>>
>>>>>>> But for PCI, device_del merely initiates the ACPI unplug rain dance.  It
>>>>>>> doesn't wait for the dance to complete.  Why?  The dance can take an
>>>>>>> unpredictable amount of time, including forever.
>>>>>>>
>>>>>>> Problem: Subsequent device_add can fail if it reuses the qdev ID or PCI
>>>>>>> slot, and the unplug has not yet completed (race condition), or it
>>>>>>> failed.  Yes, Virginia, PCI hotplug *can* fail.
>>>>>>>
>>>>>>> When unplug succeeds, the qdev is automatically destroyed.
>>>>>>> pciej_write() does that for PIIX4.  Looks like pcie_cap_slot_event()
>>>>>>> does it for PCIE.
>>>>>>
>>>>>> I got a similar problem.  When I unplug a pci device by hand, it works
>>>>>> as expected, and I can hotplug it again. But when I use a srcipt to
>>>>>> do the same thing, sometimes it failed. I think I may find another bug.
>>>>>>
>>>>>> Steps to reproduce this bug:
>>>>>> 1. cat ./test-e1000.sh # RHEL6RC is domain name
>>>>>>    #! /bin/bash
>>>>>>
>>>>>>    while true; do
>>>>>>            virsh attach-interface RHEL6RC network default --mac 52:54:00:1f:db:c7 --model e1000
>>>>>>            if [[ $? -ne 0 ]]; then
>>>>>>                    break
>>>>>>            fi
>>>>>>            virsh detach-interface RHEL6RC network --mac 52:54:00:1f:db:c7
>>>>>>            if [[ $? -ne 0 ]]; then
>>>>>>                    break
>>>>>>            fi
>>>>>>            sleep 5
>>>>>
>>>>> How do you know that the guest has responded at this point before you
>>>>> attempt to attach again at the top of the loop.  Any attach/detach
>>>>> requires the guest to respond to the request and it may not respond at
>>>>> all.
>>>>
>>>> When I attach/detach interface by hand, it works fine: I can see the new interface
>>>> when I attach it, and it disapears when I detached it.
>>>
>>> The point is that since the attach and detach require guest
>>> participation, this interface isn't reliable.  You have a sleep 5 in
>>> your loop, hoping to wait long enough for the guest to respond, but
>>> after a number of iterations in your loop it fails, you can bump the
>>> sleep to to 3600 seconds and the guest *still* might not respond...
>>
>> We use sci interrupt to tell the guest that a device has been attached/detached.
>> But the sci interrupt is *lost* in qemu, so the guest does not know a device has
>> been attached/detached, and does not respond it.
>>
>> If the sci interrupt is not lost, the guest can respond it.
> 
> *can* is the important word.  Even if the interrupt isn;t lost, you have
> no way to guarantee that the guest will respond at all.  That's not to
> say there isn't a bug around the lost interrupt; but rather a more
> general point about hotplug's current architecture.

I don't know whether a real hardware has the same behavior.
Should we make sure the sci interrupt not lost?

> 
>>
>>>
>>>
>
diff mbox

Patch

========
PM: mapping to 0xb000
PM readw port=0x0004 val=0x0000
...
gpe write afe2 <== 0
gpe write afe0 <== 255
gpe write afe3 <== 0
gpe write afe1 <== 255
PM readw port=0x0000 val=0x0001
PM readw port=0x0002 val=0x0000
gpe read afe0 == 0
gpe read afe2 == 0
gpe read afe1 == 0
gpe read afe3 == 0
PM writew port=0x0000 val=0x0020
PM readw port=0x0002 val=0x0000
PM writew port=0x0002 val=0x0020
PM readw port=0x0002 val=0x0020
gpe write afe2 <== 255
gpe write afe3 <== 255
...
slot: 6, up: 0, down: 0, state: 1   <==== we attach interface the first time
PM readw port=0x0000 val=0x0001
PM readw port=0x0002 val=0x0120
gpe read afe0 == 2
gpe read afe2 == ff
gpe read afe2 == ff
gpe write afe2 <== 253
gpe read afe1 == 0
gpe read afe3 == ff
pcihotplug read ae00 == 40
pcihotplug read ae04 == 0
...
gpe write afe0 <== 2
gpe write afe2 <== 255
slot: 6, up: 64, down: 0, state: 0  <===== we detach interface the first time
PM readw port=0x0000 val=0x0001
PM readw port=0x0002 val=0x0120
gpe read afe0 == 2
gpe read afe2 == ff
gpe read afe2 == ff
gpe write afe2 <== 253
gpe read afe1 == 0
gpe read afe3 == ff
pcihotplug read ae00 == 0
pcihotplug read ae04 == 40
...
pciej write ae08 <== 64		<=== we will call qdev_free()
pciej write ae08 <== 64
gpe write afe0 <== 2
gpe write afe2 <== 255
slot: 7, up: 0, down: 64, state: 1  <=== we attach interface the second time.
PM readw port=0x0000 val=0x0001
PM readw port=0x0002 val=0x0120
gpe read afe0 == 2
gpe read afe2 == ff
gpe read afe2 == ff
gpe write afe2 <== 253
gpe read afe1 == 0
gpe read afe3 == ff
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
slot: 7, up: 128, down: 0, state: 0  <=== we detach interface the second time
gpe write afe0 <== 2		<=== write 2 to afe0 will cause sci interupt to be lost
gpe write afe2 <== 255
===========

Here is short log, and show the different between the first time and second time:
===========
gpe write afe0 <== 2
gpe write afe2 <== 255
slot: 6, up: 64, down: 0, state: 0  <===== we detach interface the first time
....
slot: 7, up: 128, down: 0, state: 0  <=== we detach interface the second time
gpe write afe0 <== 2		<=== write 2 to afe0 will cause sci interupt to be lost
gpe write afe2 <== 255

===========

Does this behavior is a normal behavior?

The following patch can fix this problem.

From 3c149575b52261b8da9a73d5c4ebdfedce9866c1 Mon Sep 17 00:00:00 2001
From: Wen Congyang <wency@cn.fujitsu.com>
Date: Mon, 28 Feb 2011 10:33:45 +0800
Subject: [PATCH] pend hotplug event until last event is handled by guest OS

---
 hw/acpi_piix4.c |   51 ++++++++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index b5a2762..4ff3475 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -74,6 +74,7 @@  typedef struct PIIX4PMState {
     /* for pci hotplug */
     struct gpe_regs gpe;
     struct pci_status pci0_status;
+    struct pci_status pci0_status_pending;
     uint32_t pci0_hotplug_enable;
 } PIIX4PMState;
 
@@ -518,6 +519,19 @@  static void gpe_reset_val(uint16_t *cur, int addr, uint32_t val)
     *cur = (*cur & (0xff << (8 - shift))) | (x1 << shift);
 }
 
+static void raise_pending_hotplug(PIIX4PMState *s)
+{
+    if (s->pci0_status_pending.up || s->pci0_status_pending.down) {
+        s->gpe.sts |= PIIX4_PCI_HOTPLUG_STATUS;
+        s->pci0_status.up = s->pci0_status_pending.up;
+        s->pci0_status.down = s->pci0_status_pending.down;
+        s->pci0_status_pending.up = 0;
+        s->pci0_status_pending.down = 0;
+
+        pm_update_sci(s);
+    }
+}
+
 static void gpe_writeb(void *opaque, uint32_t addr, uint32_t val)
 {
     PIIX4PMState *s = opaque;
@@ -539,6 +553,11 @@  static void gpe_writeb(void *opaque, uint32_t addr, uint32_t val)
     pm_update_sci(s);
 
     PIIX4_DPRINTF("gpe write %x <== %d\n", addr, val);
+
+    if (!(g->sts & PIIX4_PCI_HOTPLUG_STATUS) && g->en & PIIX4_PCI_HOTPLUG_STATUS) {
+        /* check and reraise the pending hotplug event */
+        raise_pending_hotplug(s);
+    }
 }
 
 static uint32_t pcihotplug_read(void *opaque, uint32_t addr)
@@ -639,12 +658,22 @@  static void enable_device(PIIX4PMState *s, int slot)
     s->pci0_status.up |= (1 << slot);
 }
 
+static void pend_enable_device(PIIX4PMState *s, int slot)
+{
+    s->pci0_status_pending.up |= (1 << slot);
+}
+
 static void disable_device(PIIX4PMState *s, int slot)
 {
     s->gpe.sts |= PIIX4_PCI_HOTPLUG_STATUS;
     s->pci0_status.down |= (1 << slot);
 }
 
+static void pend_disable_device(PIIX4PMState *s, int slot)
+{
+    s->pci0_status_pending.down |= (1 << slot);
+}
+
 static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
 				PCIHotplugState state)
 {
@@ -659,15 +688,23 @@  static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
         return 0;
     }
 
-    s->pci0_status.up = 0;
-    s->pci0_status.down = 0;
-    if (state == PCI_HOTPLUG_ENABLED) {
-        enable_device(s, slot);
+    if (s->gpe.sts & PIIX4_PCI_HOTPLUG_STATUS) {
+        if (state == PCI_HOTPLUG_ENABLED) {
+            pend_enable_device(s, slot);
+        } else {
+            pend_disable_device(s, slot);
+        }
     } else {
-        disable_device(s, slot);
-    }
+        s->pci0_status.up = 0;
+        s->pci0_status.down = 0;
+        if (state == PCI_HOTPLUG_ENABLED) {
+            enable_device(s, slot);
+        } else {
+            disable_device(s, slot);
+        }
 
-    pm_update_sci(s);
+        pm_update_sci(s);
+    }
 
     return 0;
 }