diff mbox

[PATCHv2] piix: fix up/down races + document

Message ID 20120329125143.GA13169@redhat.com
State New
Headers show

Commit Message

Michael S. Tsirkin March 29, 2012, 12:51 p.m. UTC
piix acpi interface suffers from the following 2 issues:

1.
- delete device a
- quickly add device b in another slot

if we do this before guest reads the down register,
the down event is discarded and device will never
be deleted.

2.
- delete device a
- quickly reset before guest can respond

interrupt is reset and guest will never eject the device.

To fix this, we implement two changes:

1. Add two new registers:
CLR_UP
CLR_DOWN
bios will now write to these the value it read from UP/DOWN

2. on reset, remove all devices which have DOWN bit set

For compatibility with old guests, we also clear
the DOWN bit on write to EJ0 for a device.

This patch also adds more detailed documentation
for the ACPI interface, including the semantics
of both old and new registers.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---

Changes from v1:
- tweaked a comment about clearing down register on reset
- documentation update

 docs/specs/acpi_pci_hotplug.txt |   68 +++++++++++++++++++++++++++++++----
 hw/acpi_piix4.c                 |   75 +++++++++++++++++++++++++++++++-------
 2 files changed, 121 insertions(+), 22 deletions(-)

Comments

Gleb Natapov March 29, 2012, 1:04 p.m. UTC | #1
On Thu, Mar 29, 2012 at 02:51:44PM +0200, Michael S. Tsirkin wrote:
> piix acpi interface suffers from the following 2 issues:
> 
> 1.
> - delete device a
> - quickly add device b in another slot
> 
> if we do this before guest reads the down register,
> the down event is discarded and device will never
> be deleted.
> 
> 2.
> - delete device a
> - quickly reset before guest can respond
> 
> interrupt is reset and guest will never eject the device.
> 
> To fix this, we implement two changes:
> 
> 1. Add two new registers:
> CLR_UP
> CLR_DOWN
> bios will now write to these the value it read from UP/DOWN
> 
> 2. on reset, remove all devices which have DOWN bit set
> 
> For compatibility with old guests, we also clear
> the DOWN bit on write to EJ0 for a device.
> 
> This patch also adds more detailed documentation
> for the ACPI interface, including the semantics
> of both old and new registers.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> 
> Changes from v1:
> - tweaked a comment about clearing down register on reset
> - documentation update
> 
>  docs/specs/acpi_pci_hotplug.txt |   68 +++++++++++++++++++++++++++++++----
>  hw/acpi_piix4.c                 |   75 +++++++++++++++++++++++++++++++-------
>  2 files changed, 121 insertions(+), 22 deletions(-)
> 
> diff --git a/docs/specs/acpi_pci_hotplug.txt b/docs/specs/acpi_pci_hotplug.txt
> index f0f74a7..a8398f9 100644
> --- a/docs/specs/acpi_pci_hotplug.txt
> +++ b/docs/specs/acpi_pci_hotplug.txt
> @@ -7,31 +7,83 @@ describes the interface between QEMU and the ACPI BIOS.
>  ACPI GPE block (IO ports 0xafe0-0xafe3, byte access):
>  -----------------------------------------
>  
> -Generic ACPI GPE block. Bit 1 (GPE.1) used to notify PCI hotplug/eject
> +Generic ACPI GPE block. Bit 1 (GPE.1) is used to notify PCI hotplug/hotunplug
>  event to ACPI BIOS, via SCI interrupt.
>  
> -PCI slot injection notification pending (IO port 0xae00-0xae03, 4-byte access):
> +Semantics: this event occurs each time a bit in either Pending PCI slot
> +injection notification or Pending PCI slot removal notification registers
> +changes status from 0 to 1.
> +
> +Pending PCI slot injection notification (IO port 0xae00-0xae03, 4-byte access):
>  ---------------------------------------------------------------
> -Slot injection notification pending. One bit per slot.
> +Slot injection notification is pending. One bit per slot.
> +Guests should not write this register, in practice the value
> +written overwrites the register.
>  
>  Read by ACPI BIOS GPE.1 handler to notify OS of injection
>  events.
>  
> -PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
> +Semantics: after a slot is populated by hotplug, it is immediately enabled
> +and a bit in this register is set.
> +Reset value: 0
> +
> +Pending PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
>  -----------------------------------------------------
> -Slot removal notification pending. One bit per slot.
> +Slot removal notification is pending. One bit per slot.
> +Guests should not write this register, in practice the value
> +written overwrites the register.
>  
>  Read by ACPI BIOS GPE.1 handler to notify OS of removal
>  events.
>  
> +Semantics: upon hotunplug request, a bit in this register is set
> +and the OSPM is notified about hotunplug request, but a slot remains enabled
> +until eject.
> +Reset value: 0
> +
>  PCI device eject (IO port 0xae08-0xae0b, 4-byte access):
>  ----------------------------------------
>  
>  Used by ACPI BIOS _EJ0 method to request device removal. One bit per slot.
> -Reads return 0.
> +Guests should not read this register. In practice reads return 0.
> +
> +Semantics: selected hotunpluggable slots are disabled and powered off.
> +Has no effect on non-hotunpluggable slots.
As it's implemented right now only one slot should be ejected at a time.

Looks good otherwise.

> +
> +For compatibility with old BIOS, writes to this register
> +clear bits in pending slot injection notification register.
>  
>  PCI removability status (IO port 0xae0c-0xae0f, 4-byte access):
>  -----------------------------------------------
>  
> -Used by ACPI BIOS _RMV method to indicate removability status to OS. One
> -bit per slot.
> +Used by ACPI BIOS _RMV method to detect removability status
> +and report to OS. One bit per slot.
> +Guests should not write this register, in practice the value
> +written is discarded.
> +
> +Semantics: selected slots support hotplug. Contents of this register
> +do not change while guest is running.
> +
> +Clear PCI slot injection notification (IO port 0xae10-0xae14, 4-byte access):
> +---------------------------------------------------------------
> +Clears bits in pending slot injection notification register. One bit per slot.
> +Guests should not read this register. In practice reads return 0.
> +
> +Written to by ACPI BIOS GPE.1 handler after reading the
> +pending slot injection notification register and before
> +notifying OS of injection events.
> +
> +Semantics: used to acknowledge the injection notification.
> +Does not affect slot status.
> +
> +Clear PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
> +-----------------------------------------------------
> +Clears bits in slot removal notification pending register. One bit per slot.
> +Guests should not read this register. In practice reads return 0.
> +
> +Written to by ACPI BIOS GPE.1 handler after reading the
> +pending slot removal notification register and before
> +notifying OS of removal events.
> +
> +Semantics: used to acknowledge the hotunplug request.
> +Does not affect slot status.
> diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
> index 797ed24..50553fd 100644
> --- a/hw/acpi_piix4.c
> +++ b/hw/acpi_piix4.c
> @@ -43,6 +43,8 @@
>  #define PCI_BASE 0xae00
>  #define PCI_EJ_BASE 0xae08
>  #define PCI_RMV_BASE 0xae0c
> +#define PCI_CLR_UP_BASE 0xae10
> +#define PCI_CLR_DOWN_BASE 0xae14
>  
>  #define PIIX4_PCI_HOTPLUG_STATUS 2
>  
> @@ -287,6 +289,28 @@ static void piix4_update_hotplug(PIIX4PMState *s)
>      }
>  }
>  
> +static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots)
> +{
> +    DeviceState *qdev, *next;
> +    BusState *bus = qdev_get_parent_bus(&s->dev.qdev);
> +    int slot = ffs(slots) - 1;
> +
> +    /*
> +     * Clear DOWN register - this is good for a case where guest can't
> +     * write to CLR_DOWN because of VM reset, and for compatibility with
> +     * old guests which do not have CLR_DOWN.
> +     */
> +    s->pci0_status.down &= ~(1U << slot);
> +
> +    QTAILQ_FOREACH_SAFE(qdev, &bus->children, sibling, next) {
> +        PCIDevice *dev = PCI_DEVICE(qdev);
> +        PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
> +        if (PCI_SLOT(dev->devfn) == slot && !pc->no_hotplug) {
> +            qdev_free(qdev);
> +        }
> +    }
> +}
> +
>  static void piix4_reset(void *opaque)
>  {
>      PIIX4PMState *s = opaque;
> @@ -302,6 +326,19 @@ static void piix4_reset(void *opaque)
>          pci_conf[0x5B] = 0x02;
>      }
>      piix4_update_hotplug(s);
> +    /*
> +     * Guest lost remove events if any.
> +     * As it's being reset it's safe to remove the device now.
> +     */
> +    while (s->pci0_status.down) {
> +        acpi_piix_eject_slot(s, s->pci0_status.down);
> +    }
> +    /*
> +     * Guest lost add events if any.
> +     * As it's being reset and will rescan the bus we can discard
> +     * past events now.
> +     */
> +    s->pci0_status.up = 0;
>  }
>  
>  static void piix4_powerdown(void *opaque, int irq, int power_failing)
> @@ -490,22 +527,31 @@ static uint32_t pciej_read(void *opaque, uint32_t addr)
>  
>  static void pciej_write(void *opaque, uint32_t addr, uint32_t val)
>  {
> -    BusState *bus = opaque;
> -    DeviceState *qdev, *next;
> -    int slot = ffs(val) - 1;
> +    PIIX4PMState *s = opaque;
>  
> -    QTAILQ_FOREACH_SAFE(qdev, &bus->children, sibling, next) {
> -        PCIDevice *dev = PCI_DEVICE(qdev);
> -        PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
> -        if (PCI_SLOT(dev->devfn) == slot && !pc->no_hotplug) {
> -            qdev_free(qdev);
> -        }
> +    if (val) {
> +        acpi_piix_eject_slot(s, val);
>      }
>  
> -
>      PIIX4_DPRINTF("pciej write %x <== %d\n", addr, val);
>  }
>  
> +static void pci_clr_up_write(void *opaque, uint32_t addr, uint32_t val)
> +{
> +    PIIX4PMState *s = opaque;
> +    s->pci0_status.up &= ~val;
> +
> +    PIIX4_DPRINTF("pci_clr_up write %x <== %d\n", addr, val);
> +}
> +
> +static void pci_clr_down_write(void *opaque, uint32_t addr, uint32_t val)
> +{
> +    PIIX4PMState *s = opaque;
> +    s->pci0_status.down &= ~val;
> +
> +    PIIX4_DPRINTF("pci_clr_down write %x <== %d\n", addr, val);
> +}
> +
>  static uint32_t pcirmv_read(void *opaque, uint32_t addr)
>  {
>      PIIX4PMState *s = opaque;
> @@ -532,12 +578,15 @@ static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
>      register_ioport_write(PCI_BASE, 8, 4, pcihotplug_write, pci0_status);
>      register_ioport_read(PCI_BASE, 8, 4,  pcihotplug_read, pci0_status);
>  
> -    register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, bus);
> -    register_ioport_read(PCI_EJ_BASE, 4, 4,  pciej_read, bus);
> +    register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, s);
> +    register_ioport_read(PCI_EJ_BASE, 4, 4,  pciej_read, s);
>  
>      register_ioport_write(PCI_RMV_BASE, 4, 4, pcirmv_write, s);
>      register_ioport_read(PCI_RMV_BASE, 4, 4,  pcirmv_read, s);
>  
> +    register_ioport_write(PCI_CLR_UP_BASE, 4, 4, pci_clr_up_write, s);
> +    register_ioport_write(PCI_CLR_DOWN_BASE, 4, 4, pci_clr_down_write, s);
> +
>      pci_bus_hotplug(bus, piix4_device_hotplug, &s->dev.qdev);
>  }
>  
> @@ -567,8 +616,6 @@ static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
>          return 0;
>      }
>  
> -    s->pci0_status.up = 0;
> -    s->pci0_status.down = 0;
>      if (state == PCI_HOTPLUG_ENABLED) {
>          enable_device(s, slot);
>      } else {
> -- 
> 1.7.9.111.gf3fb0

--
			Gleb.
Michael S. Tsirkin March 29, 2012, 1:30 p.m. UTC | #2
On Thu, Mar 29, 2012 at 03:04:12PM +0200, Gleb Natapov wrote:
> On Thu, Mar 29, 2012 at 02:51:44PM +0200, Michael S. Tsirkin wrote:
> > piix acpi interface suffers from the following 2 issues:
> > 
> > 1.
> > - delete device a
> > - quickly add device b in another slot
> > 
> > if we do this before guest reads the down register,
> > the down event is discarded and device will never
> > be deleted.
> > 
> > 2.
> > - delete device a
> > - quickly reset before guest can respond
> > 
> > interrupt is reset and guest will never eject the device.
> > 
> > To fix this, we implement two changes:
> > 
> > 1. Add two new registers:
> > CLR_UP
> > CLR_DOWN
> > bios will now write to these the value it read from UP/DOWN
> > 
> > 2. on reset, remove all devices which have DOWN bit set
> > 
> > For compatibility with old guests, we also clear
> > the DOWN bit on write to EJ0 for a device.
> > 
> > This patch also adds more detailed documentation
> > for the ACPI interface, including the semantics
> > of both old and new registers.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > 
> > Changes from v1:
> > - tweaked a comment about clearing down register on reset
> > - documentation update
> > 
> >  docs/specs/acpi_pci_hotplug.txt |   68 +++++++++++++++++++++++++++++++----
> >  hw/acpi_piix4.c                 |   75 +++++++++++++++++++++++++++++++-------
> >  2 files changed, 121 insertions(+), 22 deletions(-)
> > 
> > diff --git a/docs/specs/acpi_pci_hotplug.txt b/docs/specs/acpi_pci_hotplug.txt
> > index f0f74a7..a8398f9 100644
> > --- a/docs/specs/acpi_pci_hotplug.txt
> > +++ b/docs/specs/acpi_pci_hotplug.txt
> > @@ -7,31 +7,83 @@ describes the interface between QEMU and the ACPI BIOS.
> >  ACPI GPE block (IO ports 0xafe0-0xafe3, byte access):
> >  -----------------------------------------
> >  
> > -Generic ACPI GPE block. Bit 1 (GPE.1) used to notify PCI hotplug/eject
> > +Generic ACPI GPE block. Bit 1 (GPE.1) is used to notify PCI hotplug/hotunplug
> >  event to ACPI BIOS, via SCI interrupt.
> >  
> > -PCI slot injection notification pending (IO port 0xae00-0xae03, 4-byte access):
> > +Semantics: this event occurs each time a bit in either Pending PCI slot
> > +injection notification or Pending PCI slot removal notification registers
> > +changes status from 0 to 1.
> > +
> > +Pending PCI slot injection notification (IO port 0xae00-0xae03, 4-byte access):
> >  ---------------------------------------------------------------
> > -Slot injection notification pending. One bit per slot.
> > +Slot injection notification is pending. One bit per slot.
> > +Guests should not write this register, in practice the value
> > +written overwrites the register.
> >  
> >  Read by ACPI BIOS GPE.1 handler to notify OS of injection
> >  events.
> >  
> > -PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
> > +Semantics: after a slot is populated by hotplug, it is immediately enabled
> > +and a bit in this register is set.
> > +Reset value: 0
> > +
> > +Pending PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
> >  -----------------------------------------------------
> > -Slot removal notification pending. One bit per slot.
> > +Slot removal notification is pending. One bit per slot.
> > +Guests should not write this register, in practice the value
> > +written overwrites the register.
> >  
> >  Read by ACPI BIOS GPE.1 handler to notify OS of removal
> >  events.
> >  
> > +Semantics: upon hotunplug request, a bit in this register is set
> > +and the OSPM is notified about hotunplug request, but a slot remains enabled
> > +until eject.
> > +Reset value: 0
> > +
> >  PCI device eject (IO port 0xae08-0xae0b, 4-byte access):
> >  ----------------------------------------
> >  
> >  Used by ACPI BIOS _EJ0 method to request device removal. One bit per slot.
> > -Reads return 0.
> > +Guests should not read this register. In practice reads return 0.
> > +
> > +Semantics: selected hotunpluggable slots are disabled and powered off.
> > +Has no effect on non-hotunpluggable slots.
> As it's implemented right now only one slot should be ejected at a time.

Right. I'll document that as well.

> Looks good otherwise.
> 
> > +
> > +For compatibility with old BIOS, writes to this register
> > +clear bits in pending slot injection notification register.
> >  
> >  PCI removability status (IO port 0xae0c-0xae0f, 4-byte access):
> >  -----------------------------------------------
> >  
> > -Used by ACPI BIOS _RMV method to indicate removability status to OS. One
> > -bit per slot.
> > +Used by ACPI BIOS _RMV method to detect removability status
> > +and report to OS. One bit per slot.
> > +Guests should not write this register, in practice the value
> > +written is discarded.
> > +
> > +Semantics: selected slots support hotplug. Contents of this register
> > +do not change while guest is running.
> > +
> > +Clear PCI slot injection notification (IO port 0xae10-0xae14, 4-byte access):
> > +---------------------------------------------------------------
> > +Clears bits in pending slot injection notification register. One bit per slot.
> > +Guests should not read this register. In practice reads return 0.
> > +
> > +Written to by ACPI BIOS GPE.1 handler after reading the
> > +pending slot injection notification register and before
> > +notifying OS of injection events.
> > +
> > +Semantics: used to acknowledge the injection notification.
> > +Does not affect slot status.
> > +
> > +Clear PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
> > +-----------------------------------------------------
> > +Clears bits in slot removal notification pending register. One bit per slot.
> > +Guests should not read this register. In practice reads return 0.
> > +
> > +Written to by ACPI BIOS GPE.1 handler after reading the
> > +pending slot removal notification register and before
> > +notifying OS of removal events.
> > +
> > +Semantics: used to acknowledge the hotunplug request.
> > +Does not affect slot status.
> > diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
> > index 797ed24..50553fd 100644
> > --- a/hw/acpi_piix4.c
> > +++ b/hw/acpi_piix4.c
> > @@ -43,6 +43,8 @@
> >  #define PCI_BASE 0xae00
> >  #define PCI_EJ_BASE 0xae08
> >  #define PCI_RMV_BASE 0xae0c
> > +#define PCI_CLR_UP_BASE 0xae10
> > +#define PCI_CLR_DOWN_BASE 0xae14
> >  
> >  #define PIIX4_PCI_HOTPLUG_STATUS 2
> >  
> > @@ -287,6 +289,28 @@ static void piix4_update_hotplug(PIIX4PMState *s)
> >      }
> >  }
> >  
> > +static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots)
> > +{
> > +    DeviceState *qdev, *next;
> > +    BusState *bus = qdev_get_parent_bus(&s->dev.qdev);
> > +    int slot = ffs(slots) - 1;
> > +
> > +    /*
> > +     * Clear DOWN register - this is good for a case where guest can't
> > +     * write to CLR_DOWN because of VM reset, and for compatibility with
> > +     * old guests which do not have CLR_DOWN.
> > +     */
> > +    s->pci0_status.down &= ~(1U << slot);
> > +
> > +    QTAILQ_FOREACH_SAFE(qdev, &bus->children, sibling, next) {
> > +        PCIDevice *dev = PCI_DEVICE(qdev);
> > +        PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
> > +        if (PCI_SLOT(dev->devfn) == slot && !pc->no_hotplug) {
> > +            qdev_free(qdev);
> > +        }
> > +    }
> > +}
> > +
> >  static void piix4_reset(void *opaque)
> >  {
> >      PIIX4PMState *s = opaque;
> > @@ -302,6 +326,19 @@ static void piix4_reset(void *opaque)
> >          pci_conf[0x5B] = 0x02;
> >      }
> >      piix4_update_hotplug(s);
> > +    /*
> > +     * Guest lost remove events if any.
> > +     * As it's being reset it's safe to remove the device now.
> > +     */
> > +    while (s->pci0_status.down) {
> > +        acpi_piix_eject_slot(s, s->pci0_status.down);
> > +    }
> > +    /*
> > +     * Guest lost add events if any.
> > +     * As it's being reset and will rescan the bus we can discard
> > +     * past events now.
> > +     */
> > +    s->pci0_status.up = 0;
> >  }
> >  
> >  static void piix4_powerdown(void *opaque, int irq, int power_failing)
> > @@ -490,22 +527,31 @@ static uint32_t pciej_read(void *opaque, uint32_t addr)
> >  
> >  static void pciej_write(void *opaque, uint32_t addr, uint32_t val)
> >  {
> > -    BusState *bus = opaque;
> > -    DeviceState *qdev, *next;
> > -    int slot = ffs(val) - 1;
> > +    PIIX4PMState *s = opaque;
> >  
> > -    QTAILQ_FOREACH_SAFE(qdev, &bus->children, sibling, next) {
> > -        PCIDevice *dev = PCI_DEVICE(qdev);
> > -        PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
> > -        if (PCI_SLOT(dev->devfn) == slot && !pc->no_hotplug) {
> > -            qdev_free(qdev);
> > -        }
> > +    if (val) {
> > +        acpi_piix_eject_slot(s, val);
> >      }
> >  
> > -
> >      PIIX4_DPRINTF("pciej write %x <== %d\n", addr, val);
> >  }
> >  
> > +static void pci_clr_up_write(void *opaque, uint32_t addr, uint32_t val)
> > +{
> > +    PIIX4PMState *s = opaque;
> > +    s->pci0_status.up &= ~val;
> > +
> > +    PIIX4_DPRINTF("pci_clr_up write %x <== %d\n", addr, val);
> > +}
> > +
> > +static void pci_clr_down_write(void *opaque, uint32_t addr, uint32_t val)
> > +{
> > +    PIIX4PMState *s = opaque;
> > +    s->pci0_status.down &= ~val;
> > +
> > +    PIIX4_DPRINTF("pci_clr_down write %x <== %d\n", addr, val);
> > +}
> > +
> >  static uint32_t pcirmv_read(void *opaque, uint32_t addr)
> >  {
> >      PIIX4PMState *s = opaque;
> > @@ -532,12 +578,15 @@ static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
> >      register_ioport_write(PCI_BASE, 8, 4, pcihotplug_write, pci0_status);
> >      register_ioport_read(PCI_BASE, 8, 4,  pcihotplug_read, pci0_status);
> >  
> > -    register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, bus);
> > -    register_ioport_read(PCI_EJ_BASE, 4, 4,  pciej_read, bus);
> > +    register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, s);
> > +    register_ioport_read(PCI_EJ_BASE, 4, 4,  pciej_read, s);
> >  
> >      register_ioport_write(PCI_RMV_BASE, 4, 4, pcirmv_write, s);
> >      register_ioport_read(PCI_RMV_BASE, 4, 4,  pcirmv_read, s);
> >  
> > +    register_ioport_write(PCI_CLR_UP_BASE, 4, 4, pci_clr_up_write, s);
> > +    register_ioport_write(PCI_CLR_DOWN_BASE, 4, 4, pci_clr_down_write, s);
> > +
> >      pci_bus_hotplug(bus, piix4_device_hotplug, &s->dev.qdev);
> >  }
> >  
> > @@ -567,8 +616,6 @@ static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
> >          return 0;
> >      }
> >  
> > -    s->pci0_status.up = 0;
> > -    s->pci0_status.down = 0;
> >      if (state == PCI_HOTPLUG_ENABLED) {
> >          enable_device(s, slot);
> >      } else {
> > -- 
> > 1.7.9.111.gf3fb0
> 
> --
> 			Gleb.
Alex Williamson March 29, 2012, 4:53 p.m. UTC | #3
On Thu, 2012-03-29 at 14:51 +0200, Michael S. Tsirkin wrote:
> piix acpi interface suffers from the following 2 issues:
> 
> 1.
> - delete device a
> - quickly add device b in another slot
> 
> if we do this before guest reads the down register,
> the down event is discarded and device will never
> be deleted.
> 
> 2.
> - delete device a
> - quickly reset before guest can respond
> 
> interrupt is reset and guest will never eject the device.
> 
> To fix this, we implement two changes:
> 
> 1. Add two new registers:
> CLR_UP
> CLR_DOWN
> bios will now write to these the value it read from UP/DOWN
> 
> 2. on reset, remove all devices which have DOWN bit set
> 
> For compatibility with old guests, we also clear
> the DOWN bit on write to EJ0 for a device.
> 
> This patch also adds more detailed documentation
> for the ACPI interface, including the semantics
> of both old and new registers.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> 
> Changes from v1:
> - tweaked a comment about clearing down register on reset
> - documentation update
> 
>  docs/specs/acpi_pci_hotplug.txt |   68 +++++++++++++++++++++++++++++++----
>  hw/acpi_piix4.c                 |   75 +++++++++++++++++++++++++++++++-------
>  2 files changed, 121 insertions(+), 22 deletions(-)
> 
> diff --git a/docs/specs/acpi_pci_hotplug.txt b/docs/specs/acpi_pci_hotplug.txt
> index f0f74a7..a8398f9 100644
> --- a/docs/specs/acpi_pci_hotplug.txt
> +++ b/docs/specs/acpi_pci_hotplug.txt
> @@ -7,31 +7,83 @@ describes the interface between QEMU and the ACPI BIOS.
>  ACPI GPE block (IO ports 0xafe0-0xafe3, byte access):
>  -----------------------------------------
>  
> -Generic ACPI GPE block. Bit 1 (GPE.1) used to notify PCI hotplug/eject
> +Generic ACPI GPE block. Bit 1 (GPE.1) is used to notify PCI hotplug/hotunplug
>  event to ACPI BIOS, via SCI interrupt.
>  
> -PCI slot injection notification pending (IO port 0xae00-0xae03, 4-byte access):
> +Semantics: this event occurs each time a bit in either Pending PCI slot
> +injection notification or Pending PCI slot removal notification registers
> +changes status from 0 to 1.
> +
> +Pending PCI slot injection notification (IO port 0xae00-0xae03, 4-byte access):
>  ---------------------------------------------------------------
> -Slot injection notification pending. One bit per slot.
> +Slot injection notification is pending. One bit per slot.
> +Guests should not write this register, in practice the value
> +written overwrites the register.

Sorry if I'm behind on the discussion, but why do we need to define
0xae0c & 0xae10 below for write-only access when these registers (0xae00
& 0xae04) are effectively read-only (yes, write is there, but it should
*never* be used in the current implementation)?  Thanks,

Alex

>  Read by ACPI BIOS GPE.1 handler to notify OS of injection
>  events.
>  
> -PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
> +Semantics: after a slot is populated by hotplug, it is immediately enabled
> +and a bit in this register is set.
> +Reset value: 0
> +
> +Pending PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
>  -----------------------------------------------------
> -Slot removal notification pending. One bit per slot.
> +Slot removal notification is pending. One bit per slot.
> +Guests should not write this register, in practice the value
> +written overwrites the register.
>  
>  Read by ACPI BIOS GPE.1 handler to notify OS of removal
>  events.
>  
> +Semantics: upon hotunplug request, a bit in this register is set
> +and the OSPM is notified about hotunplug request, but a slot remains enabled
> +until eject.
> +Reset value: 0
> +
>  PCI device eject (IO port 0xae08-0xae0b, 4-byte access):
>  ----------------------------------------
>  
>  Used by ACPI BIOS _EJ0 method to request device removal. One bit per slot.
> -Reads return 0.
> +Guests should not read this register. In practice reads return 0.
> +
> +Semantics: selected hotunpluggable slots are disabled and powered off.
> +Has no effect on non-hotunpluggable slots.
> +
> +For compatibility with old BIOS, writes to this register
> +clear bits in pending slot injection notification register.
>  
>  PCI removability status (IO port 0xae0c-0xae0f, 4-byte access):
>  -----------------------------------------------
>  
> -Used by ACPI BIOS _RMV method to indicate removability status to OS. One
> -bit per slot.
> +Used by ACPI BIOS _RMV method to detect removability status
> +and report to OS. One bit per slot.
> +Guests should not write this register, in practice the value
> +written is discarded.
> +
> +Semantics: selected slots support hotplug. Contents of this register
> +do not change while guest is running.
> +
> +Clear PCI slot injection notification (IO port 0xae10-0xae14, 4-byte access):
> +---------------------------------------------------------------
> +Clears bits in pending slot injection notification register. One bit per slot.
> +Guests should not read this register. In practice reads return 0.
> +
> +Written to by ACPI BIOS GPE.1 handler after reading the
> +pending slot injection notification register and before
> +notifying OS of injection events.
> +
> +Semantics: used to acknowledge the injection notification.
> +Does not affect slot status.
> +
> +Clear PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
> +-----------------------------------------------------
> +Clears bits in slot removal notification pending register. One bit per slot.
> +Guests should not read this register. In practice reads return 0.
> +
> +Written to by ACPI BIOS GPE.1 handler after reading the
> +pending slot removal notification register and before
> +notifying OS of removal events.
> +
> +Semantics: used to acknowledge the hotunplug request.
> +Does not affect slot status.
> diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
> index 797ed24..50553fd 100644
> --- a/hw/acpi_piix4.c
> +++ b/hw/acpi_piix4.c
> @@ -43,6 +43,8 @@
>  #define PCI_BASE 0xae00
>  #define PCI_EJ_BASE 0xae08
>  #define PCI_RMV_BASE 0xae0c
> +#define PCI_CLR_UP_BASE 0xae10
> +#define PCI_CLR_DOWN_BASE 0xae14
>  
>  #define PIIX4_PCI_HOTPLUG_STATUS 2
>  
> @@ -287,6 +289,28 @@ static void piix4_update_hotplug(PIIX4PMState *s)
>      }
>  }
>  
> +static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots)
> +{
> +    DeviceState *qdev, *next;
> +    BusState *bus = qdev_get_parent_bus(&s->dev.qdev);
> +    int slot = ffs(slots) - 1;
> +
> +    /*
> +     * Clear DOWN register - this is good for a case where guest can't
> +     * write to CLR_DOWN because of VM reset, and for compatibility with
> +     * old guests which do not have CLR_DOWN.
> +     */
> +    s->pci0_status.down &= ~(1U << slot);
> +
> +    QTAILQ_FOREACH_SAFE(qdev, &bus->children, sibling, next) {
> +        PCIDevice *dev = PCI_DEVICE(qdev);
> +        PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
> +        if (PCI_SLOT(dev->devfn) == slot && !pc->no_hotplug) {
> +            qdev_free(qdev);
> +        }
> +    }
> +}
> +
>  static void piix4_reset(void *opaque)
>  {
>      PIIX4PMState *s = opaque;
> @@ -302,6 +326,19 @@ static void piix4_reset(void *opaque)
>          pci_conf[0x5B] = 0x02;
>      }
>      piix4_update_hotplug(s);
> +    /*
> +     * Guest lost remove events if any.
> +     * As it's being reset it's safe to remove the device now.
> +     */
> +    while (s->pci0_status.down) {
> +        acpi_piix_eject_slot(s, s->pci0_status.down);
> +    }
> +    /*
> +     * Guest lost add events if any.
> +     * As it's being reset and will rescan the bus we can discard
> +     * past events now.
> +     */
> +    s->pci0_status.up = 0;
>  }
>  
>  static void piix4_powerdown(void *opaque, int irq, int power_failing)
> @@ -490,22 +527,31 @@ static uint32_t pciej_read(void *opaque, uint32_t addr)
>  
>  static void pciej_write(void *opaque, uint32_t addr, uint32_t val)
>  {
> -    BusState *bus = opaque;
> -    DeviceState *qdev, *next;
> -    int slot = ffs(val) - 1;
> +    PIIX4PMState *s = opaque;
>  
> -    QTAILQ_FOREACH_SAFE(qdev, &bus->children, sibling, next) {
> -        PCIDevice *dev = PCI_DEVICE(qdev);
> -        PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
> -        if (PCI_SLOT(dev->devfn) == slot && !pc->no_hotplug) {
> -            qdev_free(qdev);
> -        }
> +    if (val) {
> +        acpi_piix_eject_slot(s, val);
>      }
>  
> -
>      PIIX4_DPRINTF("pciej write %x <== %d\n", addr, val);
>  }
>  
> +static void pci_clr_up_write(void *opaque, uint32_t addr, uint32_t val)
> +{
> +    PIIX4PMState *s = opaque;
> +    s->pci0_status.up &= ~val;
> +
> +    PIIX4_DPRINTF("pci_clr_up write %x <== %d\n", addr, val);
> +}
> +
> +static void pci_clr_down_write(void *opaque, uint32_t addr, uint32_t val)
> +{
> +    PIIX4PMState *s = opaque;
> +    s->pci0_status.down &= ~val;
> +
> +    PIIX4_DPRINTF("pci_clr_down write %x <== %d\n", addr, val);
> +}
> +
>  static uint32_t pcirmv_read(void *opaque, uint32_t addr)
>  {
>      PIIX4PMState *s = opaque;
> @@ -532,12 +578,15 @@ static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
>      register_ioport_write(PCI_BASE, 8, 4, pcihotplug_write, pci0_status);
>      register_ioport_read(PCI_BASE, 8, 4,  pcihotplug_read, pci0_status);
>  
> -    register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, bus);
> -    register_ioport_read(PCI_EJ_BASE, 4, 4,  pciej_read, bus);
> +    register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, s);
> +    register_ioport_read(PCI_EJ_BASE, 4, 4,  pciej_read, s);
>  
>      register_ioport_write(PCI_RMV_BASE, 4, 4, pcirmv_write, s);
>      register_ioport_read(PCI_RMV_BASE, 4, 4,  pcirmv_read, s);
>  
> +    register_ioport_write(PCI_CLR_UP_BASE, 4, 4, pci_clr_up_write, s);
> +    register_ioport_write(PCI_CLR_DOWN_BASE, 4, 4, pci_clr_down_write, s);
> +
>      pci_bus_hotplug(bus, piix4_device_hotplug, &s->dev.qdev);
>  }
>  
> @@ -567,8 +616,6 @@ static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
>          return 0;
>      }
>  
> -    s->pci0_status.up = 0;
> -    s->pci0_status.down = 0;
>      if (state == PCI_HOTPLUG_ENABLED) {
>          enable_device(s, slot);
>      } else {
Michael S. Tsirkin March 29, 2012, 7:38 p.m. UTC | #4
On Thu, Mar 29, 2012 at 10:53:38AM -0600, Alex Williamson wrote:
> On Thu, 2012-03-29 at 14:51 +0200, Michael S. Tsirkin wrote:
> > piix acpi interface suffers from the following 2 issues:
> > 
> > 1.
> > - delete device a
> > - quickly add device b in another slot
> > 
> > if we do this before guest reads the down register,
> > the down event is discarded and device will never
> > be deleted.
> > 
> > 2.
> > - delete device a
> > - quickly reset before guest can respond
> > 
> > interrupt is reset and guest will never eject the device.
> > 
> > To fix this, we implement two changes:
> > 
> > 1. Add two new registers:
> > CLR_UP
> > CLR_DOWN
> > bios will now write to these the value it read from UP/DOWN
> > 
> > 2. on reset, remove all devices which have DOWN bit set
> > 
> > For compatibility with old guests, we also clear
> > the DOWN bit on write to EJ0 for a device.
> > 
> > This patch also adds more detailed documentation
> > for the ACPI interface, including the semantics
> > of both old and new registers.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > 
> > Changes from v1:
> > - tweaked a comment about clearing down register on reset
> > - documentation update
> > 
> >  docs/specs/acpi_pci_hotplug.txt |   68 +++++++++++++++++++++++++++++++----
> >  hw/acpi_piix4.c                 |   75 +++++++++++++++++++++++++++++++-------
> >  2 files changed, 121 insertions(+), 22 deletions(-)
> > 
> > diff --git a/docs/specs/acpi_pci_hotplug.txt b/docs/specs/acpi_pci_hotplug.txt
> > index f0f74a7..a8398f9 100644
> > --- a/docs/specs/acpi_pci_hotplug.txt
> > +++ b/docs/specs/acpi_pci_hotplug.txt
> > @@ -7,31 +7,83 @@ describes the interface between QEMU and the ACPI BIOS.
> >  ACPI GPE block (IO ports 0xafe0-0xafe3, byte access):
> >  -----------------------------------------
> >  
> > -Generic ACPI GPE block. Bit 1 (GPE.1) used to notify PCI hotplug/eject
> > +Generic ACPI GPE block. Bit 1 (GPE.1) is used to notify PCI hotplug/hotunplug
> >  event to ACPI BIOS, via SCI interrupt.
> >  
> > -PCI slot injection notification pending (IO port 0xae00-0xae03, 4-byte access):
> > +Semantics: this event occurs each time a bit in either Pending PCI slot
> > +injection notification or Pending PCI slot removal notification registers
> > +changes status from 0 to 1.
> > +
> > +Pending PCI slot injection notification (IO port 0xae00-0xae03, 4-byte access):
> >  ---------------------------------------------------------------
> > -Slot injection notification pending. One bit per slot.
> > +Slot injection notification is pending. One bit per slot.
> > +Guests should not write this register, in practice the value
> > +written overwrites the register.
> 
> Sorry if I'm behind on the discussion, but why do we need to define
> 0xae0c & 0xae10 below for write-only access when these registers (0xae00
> & 0xae04) are effectively read-only (yes, write is there, but it should
> *never* be used in the current implementation)?  Thanks,
> 
> Alex

Because otherwise a new bios will be broken on an old qemu.
We should have made the registers read-only in the old qemu
then we won't have the problem now.

Yes, it's probably not a big deal, but it's just as easy
to create a new register as it is to redefine the old one.

> >  Read by ACPI BIOS GPE.1 handler to notify OS of injection
> >  events.
> >  
> > -PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
> > +Semantics: after a slot is populated by hotplug, it is immediately enabled
> > +and a bit in this register is set.
> > +Reset value: 0
> > +
> > +Pending PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
> >  -----------------------------------------------------
> > -Slot removal notification pending. One bit per slot.
> > +Slot removal notification is pending. One bit per slot.
> > +Guests should not write this register, in practice the value
> > +written overwrites the register.
> >  
> >  Read by ACPI BIOS GPE.1 handler to notify OS of removal
> >  events.
> >  
> > +Semantics: upon hotunplug request, a bit in this register is set
> > +and the OSPM is notified about hotunplug request, but a slot remains enabled
> > +until eject.
> > +Reset value: 0
> > +
> >  PCI device eject (IO port 0xae08-0xae0b, 4-byte access):
> >  ----------------------------------------
> >  
> >  Used by ACPI BIOS _EJ0 method to request device removal. One bit per slot.
> > -Reads return 0.
> > +Guests should not read this register. In practice reads return 0.
> > +
> > +Semantics: selected hotunpluggable slots are disabled and powered off.
> > +Has no effect on non-hotunpluggable slots.
> > +
> > +For compatibility with old BIOS, writes to this register
> > +clear bits in pending slot injection notification register.
> >  
> >  PCI removability status (IO port 0xae0c-0xae0f, 4-byte access):
> >  -----------------------------------------------
> >  
> > -Used by ACPI BIOS _RMV method to indicate removability status to OS. One
> > -bit per slot.
> > +Used by ACPI BIOS _RMV method to detect removability status
> > +and report to OS. One bit per slot.
> > +Guests should not write this register, in practice the value
> > +written is discarded.
> > +
> > +Semantics: selected slots support hotplug. Contents of this register
> > +do not change while guest is running.
> > +
> > +Clear PCI slot injection notification (IO port 0xae10-0xae14, 4-byte access):
> > +---------------------------------------------------------------
> > +Clears bits in pending slot injection notification register. One bit per slot.
> > +Guests should not read this register. In practice reads return 0.
> > +
> > +Written to by ACPI BIOS GPE.1 handler after reading the
> > +pending slot injection notification register and before
> > +notifying OS of injection events.
> > +
> > +Semantics: used to acknowledge the injection notification.
> > +Does not affect slot status.
> > +
> > +Clear PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
> > +-----------------------------------------------------
> > +Clears bits in slot removal notification pending register. One bit per slot.
> > +Guests should not read this register. In practice reads return 0.
> > +
> > +Written to by ACPI BIOS GPE.1 handler after reading the
> > +pending slot removal notification register and before
> > +notifying OS of removal events.
> > +
> > +Semantics: used to acknowledge the hotunplug request.
> > +Does not affect slot status.
> > diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
> > index 797ed24..50553fd 100644
> > --- a/hw/acpi_piix4.c
> > +++ b/hw/acpi_piix4.c
> > @@ -43,6 +43,8 @@
> >  #define PCI_BASE 0xae00
> >  #define PCI_EJ_BASE 0xae08
> >  #define PCI_RMV_BASE 0xae0c
> > +#define PCI_CLR_UP_BASE 0xae10
> > +#define PCI_CLR_DOWN_BASE 0xae14
> >  
> >  #define PIIX4_PCI_HOTPLUG_STATUS 2
> >  
> > @@ -287,6 +289,28 @@ static void piix4_update_hotplug(PIIX4PMState *s)
> >      }
> >  }
> >  
> > +static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots)
> > +{
> > +    DeviceState *qdev, *next;
> > +    BusState *bus = qdev_get_parent_bus(&s->dev.qdev);
> > +    int slot = ffs(slots) - 1;
> > +
> > +    /*
> > +     * Clear DOWN register - this is good for a case where guest can't
> > +     * write to CLR_DOWN because of VM reset, and for compatibility with
> > +     * old guests which do not have CLR_DOWN.
> > +     */
> > +    s->pci0_status.down &= ~(1U << slot);
> > +
> > +    QTAILQ_FOREACH_SAFE(qdev, &bus->children, sibling, next) {
> > +        PCIDevice *dev = PCI_DEVICE(qdev);
> > +        PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
> > +        if (PCI_SLOT(dev->devfn) == slot && !pc->no_hotplug) {
> > +            qdev_free(qdev);
> > +        }
> > +    }
> > +}
> > +
> >  static void piix4_reset(void *opaque)
> >  {
> >      PIIX4PMState *s = opaque;
> > @@ -302,6 +326,19 @@ static void piix4_reset(void *opaque)
> >          pci_conf[0x5B] = 0x02;
> >      }
> >      piix4_update_hotplug(s);
> > +    /*
> > +     * Guest lost remove events if any.
> > +     * As it's being reset it's safe to remove the device now.
> > +     */
> > +    while (s->pci0_status.down) {
> > +        acpi_piix_eject_slot(s, s->pci0_status.down);
> > +    }
> > +    /*
> > +     * Guest lost add events if any.
> > +     * As it's being reset and will rescan the bus we can discard
> > +     * past events now.
> > +     */
> > +    s->pci0_status.up = 0;
> >  }
> >  
> >  static void piix4_powerdown(void *opaque, int irq, int power_failing)
> > @@ -490,22 +527,31 @@ static uint32_t pciej_read(void *opaque, uint32_t addr)
> >  
> >  static void pciej_write(void *opaque, uint32_t addr, uint32_t val)
> >  {
> > -    BusState *bus = opaque;
> > -    DeviceState *qdev, *next;
> > -    int slot = ffs(val) - 1;
> > +    PIIX4PMState *s = opaque;
> >  
> > -    QTAILQ_FOREACH_SAFE(qdev, &bus->children, sibling, next) {
> > -        PCIDevice *dev = PCI_DEVICE(qdev);
> > -        PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
> > -        if (PCI_SLOT(dev->devfn) == slot && !pc->no_hotplug) {
> > -            qdev_free(qdev);
> > -        }
> > +    if (val) {
> > +        acpi_piix_eject_slot(s, val);
> >      }
> >  
> > -
> >      PIIX4_DPRINTF("pciej write %x <== %d\n", addr, val);
> >  }
> >  
> > +static void pci_clr_up_write(void *opaque, uint32_t addr, uint32_t val)
> > +{
> > +    PIIX4PMState *s = opaque;
> > +    s->pci0_status.up &= ~val;
> > +
> > +    PIIX4_DPRINTF("pci_clr_up write %x <== %d\n", addr, val);
> > +}
> > +
> > +static void pci_clr_down_write(void *opaque, uint32_t addr, uint32_t val)
> > +{
> > +    PIIX4PMState *s = opaque;
> > +    s->pci0_status.down &= ~val;
> > +
> > +    PIIX4_DPRINTF("pci_clr_down write %x <== %d\n", addr, val);
> > +}
> > +
> >  static uint32_t pcirmv_read(void *opaque, uint32_t addr)
> >  {
> >      PIIX4PMState *s = opaque;
> > @@ -532,12 +578,15 @@ static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
> >      register_ioport_write(PCI_BASE, 8, 4, pcihotplug_write, pci0_status);
> >      register_ioport_read(PCI_BASE, 8, 4,  pcihotplug_read, pci0_status);
> >  
> > -    register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, bus);
> > -    register_ioport_read(PCI_EJ_BASE, 4, 4,  pciej_read, bus);
> > +    register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, s);
> > +    register_ioport_read(PCI_EJ_BASE, 4, 4,  pciej_read, s);
> >  
> >      register_ioport_write(PCI_RMV_BASE, 4, 4, pcirmv_write, s);
> >      register_ioport_read(PCI_RMV_BASE, 4, 4,  pcirmv_read, s);
> >  
> > +    register_ioport_write(PCI_CLR_UP_BASE, 4, 4, pci_clr_up_write, s);
> > +    register_ioport_write(PCI_CLR_DOWN_BASE, 4, 4, pci_clr_down_write, s);
> > +
> >      pci_bus_hotplug(bus, piix4_device_hotplug, &s->dev.qdev);
> >  }
> >  
> > @@ -567,8 +616,6 @@ static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
> >          return 0;
> >      }
> >  
> > -    s->pci0_status.up = 0;
> > -    s->pci0_status.down = 0;
> >      if (state == PCI_HOTPLUG_ENABLED) {
> >          enable_device(s, slot);
> >      } else {
> 
>
Alex Williamson March 29, 2012, 7:52 p.m. UTC | #5
On Thu, 2012-03-29 at 21:38 +0200, Michael S. Tsirkin wrote:
> On Thu, Mar 29, 2012 at 10:53:38AM -0600, Alex Williamson wrote:
> > On Thu, 2012-03-29 at 14:51 +0200, Michael S. Tsirkin wrote:
> > > piix acpi interface suffers from the following 2 issues:
> > > 
> > > 1.
> > > - delete device a
> > > - quickly add device b in another slot
> > > 
> > > if we do this before guest reads the down register,
> > > the down event is discarded and device will never
> > > be deleted.
> > > 
> > > 2.
> > > - delete device a
> > > - quickly reset before guest can respond
> > > 
> > > interrupt is reset and guest will never eject the device.
> > > 
> > > To fix this, we implement two changes:
> > > 
> > > 1. Add two new registers:
> > > CLR_UP
> > > CLR_DOWN
> > > bios will now write to these the value it read from UP/DOWN
> > > 
> > > 2. on reset, remove all devices which have DOWN bit set
> > > 
> > > For compatibility with old guests, we also clear
> > > the DOWN bit on write to EJ0 for a device.
> > > 
> > > This patch also adds more detailed documentation
> > > for the ACPI interface, including the semantics
> > > of both old and new registers.
> > > 
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > > 
> > > Changes from v1:
> > > - tweaked a comment about clearing down register on reset
> > > - documentation update
> > > 
> > >  docs/specs/acpi_pci_hotplug.txt |   68 +++++++++++++++++++++++++++++++----
> > >  hw/acpi_piix4.c                 |   75 +++++++++++++++++++++++++++++++-------
> > >  2 files changed, 121 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/docs/specs/acpi_pci_hotplug.txt b/docs/specs/acpi_pci_hotplug.txt
> > > index f0f74a7..a8398f9 100644
> > > --- a/docs/specs/acpi_pci_hotplug.txt
> > > +++ b/docs/specs/acpi_pci_hotplug.txt
> > > @@ -7,31 +7,83 @@ describes the interface between QEMU and the ACPI BIOS.
> > >  ACPI GPE block (IO ports 0xafe0-0xafe3, byte access):
> > >  -----------------------------------------
> > >  
> > > -Generic ACPI GPE block. Bit 1 (GPE.1) used to notify PCI hotplug/eject
> > > +Generic ACPI GPE block. Bit 1 (GPE.1) is used to notify PCI hotplug/hotunplug
> > >  event to ACPI BIOS, via SCI interrupt.
> > >  
> > > -PCI slot injection notification pending (IO port 0xae00-0xae03, 4-byte access):
> > > +Semantics: this event occurs each time a bit in either Pending PCI slot
> > > +injection notification or Pending PCI slot removal notification registers
> > > +changes status from 0 to 1.
> > > +
> > > +Pending PCI slot injection notification (IO port 0xae00-0xae03, 4-byte access):
> > >  ---------------------------------------------------------------
> > > -Slot injection notification pending. One bit per slot.
> > > +Slot injection notification is pending. One bit per slot.
> > > +Guests should not write this register, in practice the value
> > > +written overwrites the register.
> > 
> > Sorry if I'm behind on the discussion, but why do we need to define
> > 0xae0c & 0xae10 below for write-only access when these registers (0xae00
> > & 0xae04) are effectively read-only (yes, write is there, but it should
> > *never* be used in the current implementation)?  Thanks,
> > 
> > Alex
> 
> Because otherwise a new bios will be broken on an old qemu.
> We should have made the registers read-only in the old qemu
> then we won't have the problem now.

If you think there might be rouge bioses out there that ever wrote to
these registers, but I doubt that's the case.

> Yes, it's probably not a big deal, but it's just as easy
> to create a new register as it is to redefine the old one.

It could just as easily be called a spec clarification that writes were
never intended previously.  It's easy to add two more registers... oh
wait, we only have 16bits of ioport space...  Thanks,

Alex

> > >  Read by ACPI BIOS GPE.1 handler to notify OS of injection
> > >  events.
> > >  
> > > -PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
> > > +Semantics: after a slot is populated by hotplug, it is immediately enabled
> > > +and a bit in this register is set.
> > > +Reset value: 0
> > > +
> > > +Pending PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
> > >  -----------------------------------------------------
> > > -Slot removal notification pending. One bit per slot.
> > > +Slot removal notification is pending. One bit per slot.
> > > +Guests should not write this register, in practice the value
> > > +written overwrites the register.
> > >  
> > >  Read by ACPI BIOS GPE.1 handler to notify OS of removal
> > >  events.
> > >  
> > > +Semantics: upon hotunplug request, a bit in this register is set
> > > +and the OSPM is notified about hotunplug request, but a slot remains enabled
> > > +until eject.
> > > +Reset value: 0
> > > +
> > >  PCI device eject (IO port 0xae08-0xae0b, 4-byte access):
> > >  ----------------------------------------
> > >  
> > >  Used by ACPI BIOS _EJ0 method to request device removal. One bit per slot.
> > > -Reads return 0.
> > > +Guests should not read this register. In practice reads return 0.
> > > +
> > > +Semantics: selected hotunpluggable slots are disabled and powered off.
> > > +Has no effect on non-hotunpluggable slots.
> > > +
> > > +For compatibility with old BIOS, writes to this register
> > > +clear bits in pending slot injection notification register.
> > >  
> > >  PCI removability status (IO port 0xae0c-0xae0f, 4-byte access):
> > >  -----------------------------------------------
> > >  
> > > -Used by ACPI BIOS _RMV method to indicate removability status to OS. One
> > > -bit per slot.
> > > +Used by ACPI BIOS _RMV method to detect removability status
> > > +and report to OS. One bit per slot.
> > > +Guests should not write this register, in practice the value
> > > +written is discarded.
> > > +
> > > +Semantics: selected slots support hotplug. Contents of this register
> > > +do not change while guest is running.
> > > +
> > > +Clear PCI slot injection notification (IO port 0xae10-0xae14, 4-byte access):
> > > +---------------------------------------------------------------
> > > +Clears bits in pending slot injection notification register. One bit per slot.
> > > +Guests should not read this register. In practice reads return 0.
> > > +
> > > +Written to by ACPI BIOS GPE.1 handler after reading the
> > > +pending slot injection notification register and before
> > > +notifying OS of injection events.
> > > +
> > > +Semantics: used to acknowledge the injection notification.
> > > +Does not affect slot status.
> > > +
> > > +Clear PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
> > > +-----------------------------------------------------
> > > +Clears bits in slot removal notification pending register. One bit per slot.
> > > +Guests should not read this register. In practice reads return 0.
> > > +
> > > +Written to by ACPI BIOS GPE.1 handler after reading the
> > > +pending slot removal notification register and before
> > > +notifying OS of removal events.
> > > +
> > > +Semantics: used to acknowledge the hotunplug request.
> > > +Does not affect slot status.
> > > diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
> > > index 797ed24..50553fd 100644
> > > --- a/hw/acpi_piix4.c
> > > +++ b/hw/acpi_piix4.c
> > > @@ -43,6 +43,8 @@
> > >  #define PCI_BASE 0xae00
> > >  #define PCI_EJ_BASE 0xae08
> > >  #define PCI_RMV_BASE 0xae0c
> > > +#define PCI_CLR_UP_BASE 0xae10
> > > +#define PCI_CLR_DOWN_BASE 0xae14
> > >  
> > >  #define PIIX4_PCI_HOTPLUG_STATUS 2
> > >  
> > > @@ -287,6 +289,28 @@ static void piix4_update_hotplug(PIIX4PMState *s)
> > >      }
> > >  }
> > >  
> > > +static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots)
> > > +{
> > > +    DeviceState *qdev, *next;
> > > +    BusState *bus = qdev_get_parent_bus(&s->dev.qdev);
> > > +    int slot = ffs(slots) - 1;
> > > +
> > > +    /*
> > > +     * Clear DOWN register - this is good for a case where guest can't
> > > +     * write to CLR_DOWN because of VM reset, and for compatibility with
> > > +     * old guests which do not have CLR_DOWN.
> > > +     */
> > > +    s->pci0_status.down &= ~(1U << slot);
> > > +
> > > +    QTAILQ_FOREACH_SAFE(qdev, &bus->children, sibling, next) {
> > > +        PCIDevice *dev = PCI_DEVICE(qdev);
> > > +        PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
> > > +        if (PCI_SLOT(dev->devfn) == slot && !pc->no_hotplug) {
> > > +            qdev_free(qdev);
> > > +        }
> > > +    }
> > > +}
> > > +
> > >  static void piix4_reset(void *opaque)
> > >  {
> > >      PIIX4PMState *s = opaque;
> > > @@ -302,6 +326,19 @@ static void piix4_reset(void *opaque)
> > >          pci_conf[0x5B] = 0x02;
> > >      }
> > >      piix4_update_hotplug(s);
> > > +    /*
> > > +     * Guest lost remove events if any.
> > > +     * As it's being reset it's safe to remove the device now.
> > > +     */
> > > +    while (s->pci0_status.down) {
> > > +        acpi_piix_eject_slot(s, s->pci0_status.down);
> > > +    }
> > > +    /*
> > > +     * Guest lost add events if any.
> > > +     * As it's being reset and will rescan the bus we can discard
> > > +     * past events now.
> > > +     */
> > > +    s->pci0_status.up = 0;
> > >  }
> > >  
> > >  static void piix4_powerdown(void *opaque, int irq, int power_failing)
> > > @@ -490,22 +527,31 @@ static uint32_t pciej_read(void *opaque, uint32_t addr)
> > >  
> > >  static void pciej_write(void *opaque, uint32_t addr, uint32_t val)
> > >  {
> > > -    BusState *bus = opaque;
> > > -    DeviceState *qdev, *next;
> > > -    int slot = ffs(val) - 1;
> > > +    PIIX4PMState *s = opaque;
> > >  
> > > -    QTAILQ_FOREACH_SAFE(qdev, &bus->children, sibling, next) {
> > > -        PCIDevice *dev = PCI_DEVICE(qdev);
> > > -        PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
> > > -        if (PCI_SLOT(dev->devfn) == slot && !pc->no_hotplug) {
> > > -            qdev_free(qdev);
> > > -        }
> > > +    if (val) {
> > > +        acpi_piix_eject_slot(s, val);
> > >      }
> > >  
> > > -
> > >      PIIX4_DPRINTF("pciej write %x <== %d\n", addr, val);
> > >  }
> > >  
> > > +static void pci_clr_up_write(void *opaque, uint32_t addr, uint32_t val)
> > > +{
> > > +    PIIX4PMState *s = opaque;
> > > +    s->pci0_status.up &= ~val;
> > > +
> > > +    PIIX4_DPRINTF("pci_clr_up write %x <== %d\n", addr, val);
> > > +}
> > > +
> > > +static void pci_clr_down_write(void *opaque, uint32_t addr, uint32_t val)
> > > +{
> > > +    PIIX4PMState *s = opaque;
> > > +    s->pci0_status.down &= ~val;
> > > +
> > > +    PIIX4_DPRINTF("pci_clr_down write %x <== %d\n", addr, val);
> > > +}
> > > +
> > >  static uint32_t pcirmv_read(void *opaque, uint32_t addr)
> > >  {
> > >      PIIX4PMState *s = opaque;
> > > @@ -532,12 +578,15 @@ static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
> > >      register_ioport_write(PCI_BASE, 8, 4, pcihotplug_write, pci0_status);
> > >      register_ioport_read(PCI_BASE, 8, 4,  pcihotplug_read, pci0_status);
> > >  
> > > -    register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, bus);
> > > -    register_ioport_read(PCI_EJ_BASE, 4, 4,  pciej_read, bus);
> > > +    register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, s);
> > > +    register_ioport_read(PCI_EJ_BASE, 4, 4,  pciej_read, s);
> > >  
> > >      register_ioport_write(PCI_RMV_BASE, 4, 4, pcirmv_write, s);
> > >      register_ioport_read(PCI_RMV_BASE, 4, 4,  pcirmv_read, s);
> > >  
> > > +    register_ioport_write(PCI_CLR_UP_BASE, 4, 4, pci_clr_up_write, s);
> > > +    register_ioport_write(PCI_CLR_DOWN_BASE, 4, 4, pci_clr_down_write, s);
> > > +
> > >      pci_bus_hotplug(bus, piix4_device_hotplug, &s->dev.qdev);
> > >  }
> > >  
> > > @@ -567,8 +616,6 @@ static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
> > >          return 0;
> > >      }
> > >  
> > > -    s->pci0_status.up = 0;
> > > -    s->pci0_status.down = 0;
> > >      if (state == PCI_HOTPLUG_ENABLED) {
> > >          enable_device(s, slot);
> > >      } else {
> > 
> >
Michael S. Tsirkin March 30, 2012, 12:19 p.m. UTC | #6
On Thu, Mar 29, 2012 at 01:52:31PM -0600, Alex Williamson wrote:
> On Thu, 2012-03-29 at 21:38 +0200, Michael S. Tsirkin wrote:
> > On Thu, Mar 29, 2012 at 10:53:38AM -0600, Alex Williamson wrote:
> > > On Thu, 2012-03-29 at 14:51 +0200, Michael S. Tsirkin wrote:
> > > > piix acpi interface suffers from the following 2 issues:
> > > > 
> > > > 1.
> > > > - delete device a
> > > > - quickly add device b in another slot
> > > > 
> > > > if we do this before guest reads the down register,
> > > > the down event is discarded and device will never
> > > > be deleted.
> > > > 
> > > > 2.
> > > > - delete device a
> > > > - quickly reset before guest can respond
> > > > 
> > > > interrupt is reset and guest will never eject the device.
> > > > 
> > > > To fix this, we implement two changes:
> > > > 
> > > > 1. Add two new registers:
> > > > CLR_UP
> > > > CLR_DOWN
> > > > bios will now write to these the value it read from UP/DOWN
> > > > 
> > > > 2. on reset, remove all devices which have DOWN bit set
> > > > 
> > > > For compatibility with old guests, we also clear
> > > > the DOWN bit on write to EJ0 for a device.
> > > > 
> > > > This patch also adds more detailed documentation
> > > > for the ACPI interface, including the semantics
> > > > of both old and new registers.
> > > > 
> > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > ---
> > > > 
> > > > Changes from v1:
> > > > - tweaked a comment about clearing down register on reset
> > > > - documentation update
> > > > 
> > > >  docs/specs/acpi_pci_hotplug.txt |   68 +++++++++++++++++++++++++++++++----
> > > >  hw/acpi_piix4.c                 |   75 +++++++++++++++++++++++++++++++-------
> > > >  2 files changed, 121 insertions(+), 22 deletions(-)
> > > > 
> > > > diff --git a/docs/specs/acpi_pci_hotplug.txt b/docs/specs/acpi_pci_hotplug.txt
> > > > index f0f74a7..a8398f9 100644
> > > > --- a/docs/specs/acpi_pci_hotplug.txt
> > > > +++ b/docs/specs/acpi_pci_hotplug.txt
> > > > @@ -7,31 +7,83 @@ describes the interface between QEMU and the ACPI BIOS.
> > > >  ACPI GPE block (IO ports 0xafe0-0xafe3, byte access):
> > > >  -----------------------------------------
> > > >  
> > > > -Generic ACPI GPE block. Bit 1 (GPE.1) used to notify PCI hotplug/eject
> > > > +Generic ACPI GPE block. Bit 1 (GPE.1) is used to notify PCI hotplug/hotunplug
> > > >  event to ACPI BIOS, via SCI interrupt.
> > > >  
> > > > -PCI slot injection notification pending (IO port 0xae00-0xae03, 4-byte access):
> > > > +Semantics: this event occurs each time a bit in either Pending PCI slot
> > > > +injection notification or Pending PCI slot removal notification registers
> > > > +changes status from 0 to 1.
> > > > +
> > > > +Pending PCI slot injection notification (IO port 0xae00-0xae03, 4-byte access):
> > > >  ---------------------------------------------------------------
> > > > -Slot injection notification pending. One bit per slot.
> > > > +Slot injection notification is pending. One bit per slot.
> > > > +Guests should not write this register, in practice the value
> > > > +written overwrites the register.
> > > 
> > > Sorry if I'm behind on the discussion, but why do we need to define
> > > 0xae0c & 0xae10 below for write-only access when these registers (0xae00
> > > & 0xae04) are effectively read-only (yes, write is there, but it should
> > > *never* be used in the current implementation)?  Thanks,
> > > 
> > > Alex
> > 
> > Because otherwise a new bios will be broken on an old qemu.
> > We should have made the registers read-only in the old qemu
> > then we won't have the problem now.
> 
> If you think there might be rouge bioses out there that ever wrote to
> these registers, but I doubt that's the case.

As I understand it, the issue is not rouge bioses,
it's the old qemus that make these writeable, and we want the new bios
to work there. Therefore new bios should avoid writing these
registers.

> > Yes, it's probably not a big deal, but it's just as easy
> > to create a new register as it is to redefine the old one.
> 
> It could just as easily be called a spec clarification that writes were
> never intended previously.

As I see it, the issue is with old implementations, not the spec.

>  It's easy to add two more registers... oh
> wait, we only have 16bits of ioport space...  Thanks,
> 
> Alex

8 bytes is still cheap ...

> > > >  Read by ACPI BIOS GPE.1 handler to notify OS of injection
> > > >  events.
> > > >  
> > > > -PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
> > > > +Semantics: after a slot is populated by hotplug, it is immediately enabled
> > > > +and a bit in this register is set.
> > > > +Reset value: 0
> > > > +
> > > > +Pending PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
> > > >  -----------------------------------------------------
> > > > -Slot removal notification pending. One bit per slot.
> > > > +Slot removal notification is pending. One bit per slot.
> > > > +Guests should not write this register, in practice the value
> > > > +written overwrites the register.
> > > >  
> > > >  Read by ACPI BIOS GPE.1 handler to notify OS of removal
> > > >  events.
> > > >  
> > > > +Semantics: upon hotunplug request, a bit in this register is set
> > > > +and the OSPM is notified about hotunplug request, but a slot remains enabled
> > > > +until eject.
> > > > +Reset value: 0
> > > > +
> > > >  PCI device eject (IO port 0xae08-0xae0b, 4-byte access):
> > > >  ----------------------------------------
> > > >  
> > > >  Used by ACPI BIOS _EJ0 method to request device removal. One bit per slot.
> > > > -Reads return 0.
> > > > +Guests should not read this register. In practice reads return 0.
> > > > +
> > > > +Semantics: selected hotunpluggable slots are disabled and powered off.
> > > > +Has no effect on non-hotunpluggable slots.
> > > > +
> > > > +For compatibility with old BIOS, writes to this register
> > > > +clear bits in pending slot injection notification register.
> > > >  
> > > >  PCI removability status (IO port 0xae0c-0xae0f, 4-byte access):
> > > >  -----------------------------------------------
> > > >  
> > > > -Used by ACPI BIOS _RMV method to indicate removability status to OS. One
> > > > -bit per slot.
> > > > +Used by ACPI BIOS _RMV method to detect removability status
> > > > +and report to OS. One bit per slot.
> > > > +Guests should not write this register, in practice the value
> > > > +written is discarded.
> > > > +
> > > > +Semantics: selected slots support hotplug. Contents of this register
> > > > +do not change while guest is running.
> > > > +
> > > > +Clear PCI slot injection notification (IO port 0xae10-0xae14, 4-byte access):
> > > > +---------------------------------------------------------------
> > > > +Clears bits in pending slot injection notification register. One bit per slot.
> > > > +Guests should not read this register. In practice reads return 0.
> > > > +
> > > > +Written to by ACPI BIOS GPE.1 handler after reading the
> > > > +pending slot injection notification register and before
> > > > +notifying OS of injection events.
> > > > +
> > > > +Semantics: used to acknowledge the injection notification.
> > > > +Does not affect slot status.
> > > > +
> > > > +Clear PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
> > > > +-----------------------------------------------------
> > > > +Clears bits in slot removal notification pending register. One bit per slot.
> > > > +Guests should not read this register. In practice reads return 0.
> > > > +
> > > > +Written to by ACPI BIOS GPE.1 handler after reading the
> > > > +pending slot removal notification register and before
> > > > +notifying OS of removal events.
> > > > +
> > > > +Semantics: used to acknowledge the hotunplug request.
> > > > +Does not affect slot status.
> > > > diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
> > > > index 797ed24..50553fd 100644
> > > > --- a/hw/acpi_piix4.c
> > > > +++ b/hw/acpi_piix4.c
> > > > @@ -43,6 +43,8 @@
> > > >  #define PCI_BASE 0xae00
> > > >  #define PCI_EJ_BASE 0xae08
> > > >  #define PCI_RMV_BASE 0xae0c
> > > > +#define PCI_CLR_UP_BASE 0xae10
> > > > +#define PCI_CLR_DOWN_BASE 0xae14
> > > >  
> > > >  #define PIIX4_PCI_HOTPLUG_STATUS 2
> > > >  
> > > > @@ -287,6 +289,28 @@ static void piix4_update_hotplug(PIIX4PMState *s)
> > > >      }
> > > >  }
> > > >  
> > > > +static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots)
> > > > +{
> > > > +    DeviceState *qdev, *next;
> > > > +    BusState *bus = qdev_get_parent_bus(&s->dev.qdev);
> > > > +    int slot = ffs(slots) - 1;
> > > > +
> > > > +    /*
> > > > +     * Clear DOWN register - this is good for a case where guest can't
> > > > +     * write to CLR_DOWN because of VM reset, and for compatibility with
> > > > +     * old guests which do not have CLR_DOWN.
> > > > +     */
> > > > +    s->pci0_status.down &= ~(1U << slot);
> > > > +
> > > > +    QTAILQ_FOREACH_SAFE(qdev, &bus->children, sibling, next) {
> > > > +        PCIDevice *dev = PCI_DEVICE(qdev);
> > > > +        PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
> > > > +        if (PCI_SLOT(dev->devfn) == slot && !pc->no_hotplug) {
> > > > +            qdev_free(qdev);
> > > > +        }
> > > > +    }
> > > > +}
> > > > +
> > > >  static void piix4_reset(void *opaque)
> > > >  {
> > > >      PIIX4PMState *s = opaque;
> > > > @@ -302,6 +326,19 @@ static void piix4_reset(void *opaque)
> > > >          pci_conf[0x5B] = 0x02;
> > > >      }
> > > >      piix4_update_hotplug(s);
> > > > +    /*
> > > > +     * Guest lost remove events if any.
> > > > +     * As it's being reset it's safe to remove the device now.
> > > > +     */
> > > > +    while (s->pci0_status.down) {
> > > > +        acpi_piix_eject_slot(s, s->pci0_status.down);
> > > > +    }
> > > > +    /*
> > > > +     * Guest lost add events if any.
> > > > +     * As it's being reset and will rescan the bus we can discard
> > > > +     * past events now.
> > > > +     */
> > > > +    s->pci0_status.up = 0;
> > > >  }
> > > >  
> > > >  static void piix4_powerdown(void *opaque, int irq, int power_failing)
> > > > @@ -490,22 +527,31 @@ static uint32_t pciej_read(void *opaque, uint32_t addr)
> > > >  
> > > >  static void pciej_write(void *opaque, uint32_t addr, uint32_t val)
> > > >  {
> > > > -    BusState *bus = opaque;
> > > > -    DeviceState *qdev, *next;
> > > > -    int slot = ffs(val) - 1;
> > > > +    PIIX4PMState *s = opaque;
> > > >  
> > > > -    QTAILQ_FOREACH_SAFE(qdev, &bus->children, sibling, next) {
> > > > -        PCIDevice *dev = PCI_DEVICE(qdev);
> > > > -        PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
> > > > -        if (PCI_SLOT(dev->devfn) == slot && !pc->no_hotplug) {
> > > > -            qdev_free(qdev);
> > > > -        }
> > > > +    if (val) {
> > > > +        acpi_piix_eject_slot(s, val);
> > > >      }
> > > >  
> > > > -
> > > >      PIIX4_DPRINTF("pciej write %x <== %d\n", addr, val);
> > > >  }
> > > >  
> > > > +static void pci_clr_up_write(void *opaque, uint32_t addr, uint32_t val)
> > > > +{
> > > > +    PIIX4PMState *s = opaque;
> > > > +    s->pci0_status.up &= ~val;
> > > > +
> > > > +    PIIX4_DPRINTF("pci_clr_up write %x <== %d\n", addr, val);
> > > > +}
> > > > +
> > > > +static void pci_clr_down_write(void *opaque, uint32_t addr, uint32_t val)
> > > > +{
> > > > +    PIIX4PMState *s = opaque;
> > > > +    s->pci0_status.down &= ~val;
> > > > +
> > > > +    PIIX4_DPRINTF("pci_clr_down write %x <== %d\n", addr, val);
> > > > +}
> > > > +
> > > >  static uint32_t pcirmv_read(void *opaque, uint32_t addr)
> > > >  {
> > > >      PIIX4PMState *s = opaque;
> > > > @@ -532,12 +578,15 @@ static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
> > > >      register_ioport_write(PCI_BASE, 8, 4, pcihotplug_write, pci0_status);
> > > >      register_ioport_read(PCI_BASE, 8, 4,  pcihotplug_read, pci0_status);
> > > >  
> > > > -    register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, bus);
> > > > -    register_ioport_read(PCI_EJ_BASE, 4, 4,  pciej_read, bus);
> > > > +    register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, s);
> > > > +    register_ioport_read(PCI_EJ_BASE, 4, 4,  pciej_read, s);
> > > >  
> > > >      register_ioport_write(PCI_RMV_BASE, 4, 4, pcirmv_write, s);
> > > >      register_ioport_read(PCI_RMV_BASE, 4, 4,  pcirmv_read, s);
> > > >  
> > > > +    register_ioport_write(PCI_CLR_UP_BASE, 4, 4, pci_clr_up_write, s);
> > > > +    register_ioport_write(PCI_CLR_DOWN_BASE, 4, 4, pci_clr_down_write, s);
> > > > +
> > > >      pci_bus_hotplug(bus, piix4_device_hotplug, &s->dev.qdev);
> > > >  }
> > > >  
> > > > @@ -567,8 +616,6 @@ static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
> > > >          return 0;
> > > >      }
> > > >  
> > > > -    s->pci0_status.up = 0;
> > > > -    s->pci0_status.down = 0;
> > > >      if (state == PCI_HOTPLUG_ENABLED) {
> > > >          enable_device(s, slot);
> > > >      } else {
> > > 
> > > 
> 
>
Igor Mammedov March 30, 2012, 4:05 p.m. UTC | #7
On 03/29/2012 02:51 PM, Michael S. Tsirkin wrote:
> piix acpi interface suffers from the following 2 issues:
>
> 1.
> - delete device a
> - quickly add device b in another slot
>
> if we do this before guest reads the down register,
> the down event is discarded and device will never
> be deleted.
>
> 2.
> - delete device a
> - quickly reset before guest can respond
>
> interrupt is reset and guest will never eject the device.
>
> To fix this, we implement two changes:
>
> 1. Add two new registers:
> CLR_UP
> CLR_DOWN
> bios will now write to these the value it read from UP/DOWN
>
> 2. on reset, remove all devices which have DOWN bit set
>
> For compatibility with old guests, we also clear
> the DOWN bit on write to EJ0 for a device.
>
> This patch also adds more detailed documentation
> for the ACPI interface, including the semantics
> of both old and new registers.
>
> Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
> ---
>
> Changes from v1:
> - tweaked a comment about clearing down register on reset
> - documentation update
>
>   docs/specs/acpi_pci_hotplug.txt |   68 +++++++++++++++++++++++++++++++----
>   hw/acpi_piix4.c                 |   75 +++++++++++++++++++++++++++++++-------
>   2 files changed, 121 insertions(+), 22 deletions(-)
>
> diff --git a/docs/specs/acpi_pci_hotplug.txt b/docs/specs/acpi_pci_hotplug.txt
> index f0f74a7..a8398f9 100644
> --- a/docs/specs/acpi_pci_hotplug.txt
> +++ b/docs/specs/acpi_pci_hotplug.txt
> @@ -7,31 +7,83 @@ describes the interface between QEMU and the ACPI BIOS.
>   ACPI GPE block (IO ports 0xafe0-0xafe3, byte access):
>   -----------------------------------------
>
> -Generic ACPI GPE block. Bit 1 (GPE.1) used to notify PCI hotplug/eject
> +Generic ACPI GPE block. Bit 1 (GPE.1) is used to notify PCI hotplug/hotunplug
>   event to ACPI BIOS, via SCI interrupt.
>
> -PCI slot injection notification pending (IO port 0xae00-0xae03, 4-byte access):
> +Semantics: this event occurs each time a bit in either Pending PCI slot
> +injection notification or Pending PCI slot removal notification registers
> +changes status from 0 to 1.
> +
> +Pending PCI slot injection notification (IO port 0xae00-0xae03, 4-byte access):
>   ---------------------------------------------------------------
> -Slot injection notification pending. One bit per slot.
> +Slot injection notification is pending. One bit per slot.
> +Guests should not write this register, in practice the value
> +written overwrites the register.
>
>   Read by ACPI BIOS GPE.1 handler to notify OS of injection
>   events.
>
> -PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
> +Semantics: after a slot is populated by hotplug, it is immediately enabled
> +and a bit in this register is set.
> +Reset value: 0
> +
> +Pending PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
>   -----------------------------------------------------
> -Slot removal notification pending. One bit per slot.
> +Slot removal notification is pending. One bit per slot.
> +Guests should not write this register, in practice the value
> +written overwrites the register.
>
>   Read by ACPI BIOS GPE.1 handler to notify OS of removal
>   events.
>
> +Semantics: upon hotunplug request, a bit in this register is set
> +and the OSPM is notified about hotunplug request, but a slot remains enabled
> +until eject.
> +Reset value: 0
> +
>   PCI device eject (IO port 0xae08-0xae0b, 4-byte access):
>   ----------------------------------------
>
>   Used by ACPI BIOS _EJ0 method to request device removal. One bit per slot.
> -Reads return 0.
> +Guests should not read this register. In practice reads return 0.
> +
> +Semantics: selected hotunpluggable slots are disabled and powered off.
> +Has no effect on non-hotunpluggable slots.
> +
> +For compatibility with old BIOS, writes to this register
> +clear bits in pending slot injection notification register.
>
>   PCI removability status (IO port 0xae0c-0xae0f, 4-byte access):
>   -----------------------------------------------
>
> -Used by ACPI BIOS _RMV method to indicate removability status to OS. One
> -bit per slot.
> +Used by ACPI BIOS _RMV method to detect removability status
> +and report to OS. One bit per slot.
> +Guests should not write this register, in practice the value
> +written is discarded.
> +
> +Semantics: selected slots support hotplug. Contents of this register
> +do not change while guest is running.
> +
> +Clear PCI slot injection notification (IO port 0xae10-0xae14, 4-byte access):
> +---------------------------------------------------------------
> +Clears bits in pending slot injection notification register. One bit per slot.
> +Guests should not read this register. In practice reads return 0.
> +
> +Written to by ACPI BIOS GPE.1 handler after reading the
> +pending slot injection notification register and before
> +notifying OS of injection events.
> +
> +Semantics: used to acknowledge the injection notification.
> +Does not affect slot status.
> +
> +Clear PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
> +-----------------------------------------------------
> +Clears bits in slot removal notification pending register. One bit per slot.
> +Guests should not read this register. In practice reads return 0.
> +
> +Written to by ACPI BIOS GPE.1 handler after reading the
> +pending slot removal notification register and before
> +notifying OS of removal events.
> +
> +Semantics: used to acknowledge the hotunplug request.
> +Does not affect slot status.
> diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
> index 797ed24..50553fd 100644
> --- a/hw/acpi_piix4.c
> +++ b/hw/acpi_piix4.c
> @@ -43,6 +43,8 @@
>   #define PCI_BASE 0xae00
>   #define PCI_EJ_BASE 0xae08
>   #define PCI_RMV_BASE 0xae0c
> +#define PCI_CLR_UP_BASE 0xae10
> +#define PCI_CLR_DOWN_BASE 0xae14
>
>   #define PIIX4_PCI_HOTPLUG_STATUS 2
>
> @@ -287,6 +289,28 @@ static void piix4_update_hotplug(PIIX4PMState *s)
>       }
>   }
>
> +static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots)
> +{
> +    DeviceState *qdev, *next;
> +    BusState *bus = qdev_get_parent_bus(&s->dev.qdev);
> +    int slot = ffs(slots) - 1;
> +
> +    /*
> +     * Clear DOWN register - this is good for a case where guest can't
> +     * write to CLR_DOWN because of VM reset, and for compatibility with
> +     * old guests which do not have CLR_DOWN.
> +     */
> +    s->pci0_status.down&= ~(1U<<  slot);
> +
> +    QTAILQ_FOREACH_SAFE(qdev,&bus->children, sibling, next) {
> +        PCIDevice *dev = PCI_DEVICE(qdev);
> +        PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
> +        if (PCI_SLOT(dev->devfn) == slot&&  !pc->no_hotplug) {
> +            qdev_free(qdev);
> +        }
> +    }
> +}
> +
>   static void piix4_reset(void *opaque)
>   {
>       PIIX4PMState *s = opaque;
> @@ -302,6 +326,19 @@ static void piix4_reset(void *opaque)
>           pci_conf[0x5B] = 0x02;
>       }
>       piix4_update_hotplug(s);
> +    /*
> +     * Guest lost remove events if any.
> +     * As it's being reset it's safe to remove the device now.
> +     */
> +    while (s->pci0_status.down) {
> +        acpi_piix_eject_slot(s, s->pci0_status.down);
> +    }
> +    /*
> +     * Guest lost add events if any.
> +     * As it's being reset and will rescan the bus we can discard
> +     * past events now.
> +     */
> +    s->pci0_status.up = 0;
>   }
>
>   static void piix4_powerdown(void *opaque, int irq, int power_failing)
> @@ -490,22 +527,31 @@ static uint32_t pciej_read(void *opaque, uint32_t addr)
>
>   static void pciej_write(void *opaque, uint32_t addr, uint32_t val)
>   {
> -    BusState *bus = opaque;
> -    DeviceState *qdev, *next;
> -    int slot = ffs(val) - 1;
> +    PIIX4PMState *s = opaque;
>
> -    QTAILQ_FOREACH_SAFE(qdev,&bus->children, sibling, next) {
> -        PCIDevice *dev = PCI_DEVICE(qdev);
> -        PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
> -        if (PCI_SLOT(dev->devfn) == slot&&  !pc->no_hotplug) {
> -            qdev_free(qdev);
> -        }
> +    if (val) {
> +        acpi_piix_eject_slot(s, val);
>       }
>
> -
>       PIIX4_DPRINTF("pciej write %x<== %d\n", addr, val);
>   }
>
> +static void pci_clr_up_write(void *opaque, uint32_t addr, uint32_t val)
> +{
> +    PIIX4PMState *s = opaque;
> +    s->pci0_status.up&= ~val;
> +
> +    PIIX4_DPRINTF("pci_clr_up write %x<== %d\n", addr, val);
> +}
> +
> +static void pci_clr_down_write(void *opaque, uint32_t addr, uint32_t val)
> +{
> +    PIIX4PMState *s = opaque;
> +    s->pci0_status.down&= ~val;
> +
> +    PIIX4_DPRINTF("pci_clr_down write %x<== %d\n", addr, val);
> +}
>
Why adding PCI_CLR_[UP|DOWN]_BASE instead of using PCI_BASE for this writes?

>   static uint32_t pcirmv_read(void *opaque, uint32_t addr)
>   {
>       PIIX4PMState *s = opaque;
> @@ -532,12 +578,15 @@ static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
>       register_ioport_write(PCI_BASE, 8, 4, pcihotplug_write, pci0_status);
>       register_ioport_read(PCI_BASE, 8, 4,  pcihotplug_read, pci0_status);
>
> -    register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, bus);
> -    register_ioport_read(PCI_EJ_BASE, 4, 4,  pciej_read, bus);
> +    register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, s);
> +    register_ioport_read(PCI_EJ_BASE, 4, 4,  pciej_read, s);
>
>       register_ioport_write(PCI_RMV_BASE, 4, 4, pcirmv_write, s);
>       register_ioport_read(PCI_RMV_BASE, 4, 4,  pcirmv_read, s);
>
> +    register_ioport_write(PCI_CLR_UP_BASE, 4, 4, pci_clr_up_write, s);
> +    register_ioport_write(PCI_CLR_DOWN_BASE, 4, 4, pci_clr_down_write, s);
> +
>       pci_bus_hotplug(bus, piix4_device_hotplug,&s->dev.qdev);
>   }
>
> @@ -567,8 +616,6 @@ static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
>           return 0;
>       }
>
> -    s->pci0_status.up = 0;
> -    s->pci0_status.down = 0;
>       if (state == PCI_HOTPLUG_ENABLED) {
>           enable_device(s, slot);
>       } else {

And if pci-hotplug could be done as cpu-hotplug, then it would simplify pci-hotplug
(guest would only need to read PCI_BASE) and probably it would be possible to unify hotplug
code for pci/cpu. But that for sure will break compatibility.

-----
  Igor
Michael S. Tsirkin April 1, 2012, 8:22 a.m. UTC | #8
On Fri, Mar 30, 2012 at 06:05:26PM +0200, Igor Mammedov wrote:
> On 03/29/2012 02:51 PM, Michael S. Tsirkin wrote:
> >piix acpi interface suffers from the following 2 issues:
> >
> >1.
> >- delete device a
> >- quickly add device b in another slot
> >
> >if we do this before guest reads the down register,
> >the down event is discarded and device will never
> >be deleted.
> >
> >2.
> >- delete device a
> >- quickly reset before guest can respond
> >
> >interrupt is reset and guest will never eject the device.
> >
> >To fix this, we implement two changes:
> >
> >1. Add two new registers:
> >CLR_UP
> >CLR_DOWN
> >bios will now write to these the value it read from UP/DOWN
> >
> >2. on reset, remove all devices which have DOWN bit set
> >
> >For compatibility with old guests, we also clear
> >the DOWN bit on write to EJ0 for a device.
> >
> >This patch also adds more detailed documentation
> >for the ACPI interface, including the semantics
> >of both old and new registers.
> >
> >Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
> >---
> >
> >Changes from v1:
> >- tweaked a comment about clearing down register on reset
> >- documentation update
> >
> >  docs/specs/acpi_pci_hotplug.txt |   68 +++++++++++++++++++++++++++++++----
> >  hw/acpi_piix4.c                 |   75 +++++++++++++++++++++++++++++++-------
> >  2 files changed, 121 insertions(+), 22 deletions(-)
> >
> >diff --git a/docs/specs/acpi_pci_hotplug.txt b/docs/specs/acpi_pci_hotplug.txt
> >index f0f74a7..a8398f9 100644
> >--- a/docs/specs/acpi_pci_hotplug.txt
> >+++ b/docs/specs/acpi_pci_hotplug.txt
> >@@ -7,31 +7,83 @@ describes the interface between QEMU and the ACPI BIOS.
> >  ACPI GPE block (IO ports 0xafe0-0xafe3, byte access):
> >  -----------------------------------------
> >
> >-Generic ACPI GPE block. Bit 1 (GPE.1) used to notify PCI hotplug/eject
> >+Generic ACPI GPE block. Bit 1 (GPE.1) is used to notify PCI hotplug/hotunplug
> >  event to ACPI BIOS, via SCI interrupt.
> >
> >-PCI slot injection notification pending (IO port 0xae00-0xae03, 4-byte access):
> >+Semantics: this event occurs each time a bit in either Pending PCI slot
> >+injection notification or Pending PCI slot removal notification registers
> >+changes status from 0 to 1.
> >+
> >+Pending PCI slot injection notification (IO port 0xae00-0xae03, 4-byte access):
> >  ---------------------------------------------------------------
> >-Slot injection notification pending. One bit per slot.
> >+Slot injection notification is pending. One bit per slot.
> >+Guests should not write this register, in practice the value
> >+written overwrites the register.
> >
> >  Read by ACPI BIOS GPE.1 handler to notify OS of injection
> >  events.
> >
> >-PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
> >+Semantics: after a slot is populated by hotplug, it is immediately enabled
> >+and a bit in this register is set.
> >+Reset value: 0
> >+
> >+Pending PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
> >  -----------------------------------------------------
> >-Slot removal notification pending. One bit per slot.
> >+Slot removal notification is pending. One bit per slot.
> >+Guests should not write this register, in practice the value
> >+written overwrites the register.
> >
> >  Read by ACPI BIOS GPE.1 handler to notify OS of removal
> >  events.
> >
> >+Semantics: upon hotunplug request, a bit in this register is set
> >+and the OSPM is notified about hotunplug request, but a slot remains enabled
> >+until eject.
> >+Reset value: 0
> >+
> >  PCI device eject (IO port 0xae08-0xae0b, 4-byte access):
> >  ----------------------------------------
> >
> >  Used by ACPI BIOS _EJ0 method to request device removal. One bit per slot.
> >-Reads return 0.
> >+Guests should not read this register. In practice reads return 0.
> >+
> >+Semantics: selected hotunpluggable slots are disabled and powered off.
> >+Has no effect on non-hotunpluggable slots.
> >+
> >+For compatibility with old BIOS, writes to this register
> >+clear bits in pending slot injection notification register.
> >
> >  PCI removability status (IO port 0xae0c-0xae0f, 4-byte access):
> >  -----------------------------------------------
> >
> >-Used by ACPI BIOS _RMV method to indicate removability status to OS. One
> >-bit per slot.
> >+Used by ACPI BIOS _RMV method to detect removability status
> >+and report to OS. One bit per slot.
> >+Guests should not write this register, in practice the value
> >+written is discarded.
> >+
> >+Semantics: selected slots support hotplug. Contents of this register
> >+do not change while guest is running.
> >+
> >+Clear PCI slot injection notification (IO port 0xae10-0xae14, 4-byte access):
> >+---------------------------------------------------------------
> >+Clears bits in pending slot injection notification register. One bit per slot.
> >+Guests should not read this register. In practice reads return 0.
> >+
> >+Written to by ACPI BIOS GPE.1 handler after reading the
> >+pending slot injection notification register and before
> >+notifying OS of injection events.
> >+
> >+Semantics: used to acknowledge the injection notification.
> >+Does not affect slot status.
> >+
> >+Clear PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
> >+-----------------------------------------------------
> >+Clears bits in slot removal notification pending register. One bit per slot.
> >+Guests should not read this register. In practice reads return 0.
> >+
> >+Written to by ACPI BIOS GPE.1 handler after reading the
> >+pending slot removal notification register and before
> >+notifying OS of removal events.
> >+
> >+Semantics: used to acknowledge the hotunplug request.
> >+Does not affect slot status.
> >diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
> >index 797ed24..50553fd 100644
> >--- a/hw/acpi_piix4.c
> >+++ b/hw/acpi_piix4.c
> >@@ -43,6 +43,8 @@
> >  #define PCI_BASE 0xae00
> >  #define PCI_EJ_BASE 0xae08
> >  #define PCI_RMV_BASE 0xae0c
> >+#define PCI_CLR_UP_BASE 0xae10
> >+#define PCI_CLR_DOWN_BASE 0xae14
> >
> >  #define PIIX4_PCI_HOTPLUG_STATUS 2
> >
> >@@ -287,6 +289,28 @@ static void piix4_update_hotplug(PIIX4PMState *s)
> >      }
> >  }
> >
> >+static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots)
> >+{
> >+    DeviceState *qdev, *next;
> >+    BusState *bus = qdev_get_parent_bus(&s->dev.qdev);
> >+    int slot = ffs(slots) - 1;
> >+
> >+    /*
> >+     * Clear DOWN register - this is good for a case where guest can't
> >+     * write to CLR_DOWN because of VM reset, and for compatibility with
> >+     * old guests which do not have CLR_DOWN.
> >+     */
> >+    s->pci0_status.down&= ~(1U<<  slot);
> >+
> >+    QTAILQ_FOREACH_SAFE(qdev,&bus->children, sibling, next) {
> >+        PCIDevice *dev = PCI_DEVICE(qdev);
> >+        PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
> >+        if (PCI_SLOT(dev->devfn) == slot&&  !pc->no_hotplug) {
> >+            qdev_free(qdev);
> >+        }
> >+    }
> >+}
> >+
> >  static void piix4_reset(void *opaque)
> >  {
> >      PIIX4PMState *s = opaque;
> >@@ -302,6 +326,19 @@ static void piix4_reset(void *opaque)
> >          pci_conf[0x5B] = 0x02;
> >      }
> >      piix4_update_hotplug(s);
> >+    /*
> >+     * Guest lost remove events if any.
> >+     * As it's being reset it's safe to remove the device now.
> >+     */
> >+    while (s->pci0_status.down) {
> >+        acpi_piix_eject_slot(s, s->pci0_status.down);
> >+    }
> >+    /*
> >+     * Guest lost add events if any.
> >+     * As it's being reset and will rescan the bus we can discard
> >+     * past events now.
> >+     */
> >+    s->pci0_status.up = 0;
> >  }
> >
> >  static void piix4_powerdown(void *opaque, int irq, int power_failing)
> >@@ -490,22 +527,31 @@ static uint32_t pciej_read(void *opaque, uint32_t addr)
> >
> >  static void pciej_write(void *opaque, uint32_t addr, uint32_t val)
> >  {
> >-    BusState *bus = opaque;
> >-    DeviceState *qdev, *next;
> >-    int slot = ffs(val) - 1;
> >+    PIIX4PMState *s = opaque;
> >
> >-    QTAILQ_FOREACH_SAFE(qdev,&bus->children, sibling, next) {
> >-        PCIDevice *dev = PCI_DEVICE(qdev);
> >-        PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
> >-        if (PCI_SLOT(dev->devfn) == slot&&  !pc->no_hotplug) {
> >-            qdev_free(qdev);
> >-        }
> >+    if (val) {
> >+        acpi_piix_eject_slot(s, val);
> >      }
> >
> >-
> >      PIIX4_DPRINTF("pciej write %x<== %d\n", addr, val);
> >  }
> >
> >+static void pci_clr_up_write(void *opaque, uint32_t addr, uint32_t val)
> >+{
> >+    PIIX4PMState *s = opaque;
> >+    s->pci0_status.up&= ~val;
> >+
> >+    PIIX4_DPRINTF("pci_clr_up write %x<== %d\n", addr, val);
> >+}
> >+
> >+static void pci_clr_down_write(void *opaque, uint32_t addr, uint32_t val)
> >+{
> >+    PIIX4PMState *s = opaque;
> >+    s->pci0_status.down&= ~val;
> >+
> >+    PIIX4_DPRINTF("pci_clr_down write %x<== %d\n", addr, val);
> >+}
> >
> Why adding PCI_CLR_[UP|DOWN]_BASE instead of using PCI_BASE for this writes?

Because symbolic names for specific registers are better than hard-coded offsets.
I think we should replace PCI_BASE with PCI_UP_BASE/PCI_DOWN_BASE as
well.

> >  static uint32_t pcirmv_read(void *opaque, uint32_t addr)
> >  {
> >      PIIX4PMState *s = opaque;
> >@@ -532,12 +578,15 @@ static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
> >      register_ioport_write(PCI_BASE, 8, 4, pcihotplug_write, pci0_status);
> >      register_ioport_read(PCI_BASE, 8, 4,  pcihotplug_read, pci0_status);
> >
> >-    register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, bus);
> >-    register_ioport_read(PCI_EJ_BASE, 4, 4,  pciej_read, bus);
> >+    register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, s);
> >+    register_ioport_read(PCI_EJ_BASE, 4, 4,  pciej_read, s);
> >
> >      register_ioport_write(PCI_RMV_BASE, 4, 4, pcirmv_write, s);
> >      register_ioport_read(PCI_RMV_BASE, 4, 4,  pcirmv_read, s);
> >
> >+    register_ioport_write(PCI_CLR_UP_BASE, 4, 4, pci_clr_up_write, s);
> >+    register_ioport_write(PCI_CLR_DOWN_BASE, 4, 4, pci_clr_down_write, s);
> >+
> >      pci_bus_hotplug(bus, piix4_device_hotplug,&s->dev.qdev);
> >  }
> >
> >@@ -567,8 +616,6 @@ static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
> >          return 0;
> >      }
> >
> >-    s->pci0_status.up = 0;
> >-    s->pci0_status.down = 0;
> >      if (state == PCI_HOTPLUG_ENABLED) {
> >          enable_device(s, slot);
> >      } else {
> 
> And if pci-hotplug could be done as cpu-hotplug, then it would simplify pci-hotplug
> (guest would only need to read PCI_BASE) and probably it would be possible to unify hotplug
> code for pci/cpu. But that for sure will break compatibility.

Which isn't really acceptable.

> -----
>  Igor
Gleb Natapov April 1, 2012, 8:26 a.m. UTC | #9
On Sun, Apr 01, 2012 at 11:22:33AM +0300, Michael S. Tsirkin wrote:
> > And if pci-hotplug could be done as cpu-hotplug, then it would simplify pci-hotplug
> > (guest would only need to read PCI_BASE) and probably it would be possible to unify hotplug
> > code for pci/cpu. But that for sure will break compatibility.
> 
> Which isn't really acceptable.
> 
I do not think it should be taken for granted. We should be able to get
rid of a crud once in a while.
 
--
			Gleb.
Michael S. Tsirkin April 1, 2012, 9:06 a.m. UTC | #10
On Sun, Apr 01, 2012 at 11:26:18AM +0300, Gleb Natapov wrote:
> On Sun, Apr 01, 2012 at 11:22:33AM +0300, Michael S. Tsirkin wrote:
> > > And if pci-hotplug could be done as cpu-hotplug, then it would simplify pci-hotplug
> > > (guest would only need to read PCI_BASE) and probably it would be possible to unify hotplug
> > > code for pci/cpu. But that for sure will break compatibility.
> > 
> > Which isn't really acceptable.
> > 
> I do not think it should be taken for granted. We should be able to get
> rid of a crud once in a while.

In general, yes. It needs to be well planned several years in
advance though. Something like
http://www.kernel.org/doc/Documentation/feature-removal-schedule.txt
might be a model.

Breaking compatibility would bring very little
gain in this case though, right?

> --
> 			Gleb.
Gleb Natapov April 1, 2012, 9:13 a.m. UTC | #11
On Sun, Apr 01, 2012 at 12:06:16PM +0300, Michael S. Tsirkin wrote:
> On Sun, Apr 01, 2012 at 11:26:18AM +0300, Gleb Natapov wrote:
> > On Sun, Apr 01, 2012 at 11:22:33AM +0300, Michael S. Tsirkin wrote:
> > > > And if pci-hotplug could be done as cpu-hotplug, then it would simplify pci-hotplug
> > > > (guest would only need to read PCI_BASE) and probably it would be possible to unify hotplug
> > > > code for pci/cpu. But that for sure will break compatibility.
> > > 
> > > Which isn't really acceptable.
> > > 
> > I do not think it should be taken for granted. We should be able to get
> > rid of a crud once in a while.
> 
> In general, yes. It needs to be well planned several years in
> advance though. Something like
> http://www.kernel.org/doc/Documentation/feature-removal-schedule.txt
> might be a model.
> 
No nearly the same. We do not drop any features, we just reimplementing
it differently. Since QEMU ships with the bios image that suppose to
work with it preserving old bios compatibility shouldn't be considered to
much IMO. Migration from qemu-1.0 to qemu-1.1 -M pc-1.0 should work
though. By adding more craft now we make cleanup harder later.

> Breaking compatibility would bring very little
> gain in this case though, right?
> 
It will clean up code instead of adding more and will allow reuse of
hotplug code in QEMU and bios between pci/cpu/memory.

--
			Gleb.
Gleb Natapov April 1, 2012, 9:19 a.m. UTC | #12
On Sun, Apr 01, 2012 at 12:13:18PM +0300, Gleb Natapov wrote:
> On Sun, Apr 01, 2012 at 12:06:16PM +0300, Michael S. Tsirkin wrote:
> > On Sun, Apr 01, 2012 at 11:26:18AM +0300, Gleb Natapov wrote:
> > > On Sun, Apr 01, 2012 at 11:22:33AM +0300, Michael S. Tsirkin wrote:
> > > > > And if pci-hotplug could be done as cpu-hotplug, then it would simplify pci-hotplug
> > > > > (guest would only need to read PCI_BASE) and probably it would be possible to unify hotplug
> > > > > code for pci/cpu. But that for sure will break compatibility.
> > > > 
> > > > Which isn't really acceptable.
> > > > 
> > > I do not think it should be taken for granted. We should be able to get
> > > rid of a crud once in a while.
> > 
> > In general, yes. It needs to be well planned several years in
> > advance though. Something like
> > http://www.kernel.org/doc/Documentation/feature-removal-schedule.txt
> > might be a model.
> > 
> No nearly the same. We do not drop any features, we just reimplementing
> it differently. Since QEMU ships with the bios image that suppose to
> work with it preserving old bios compatibility shouldn't be considered to
> much IMO. Migration from qemu-1.0 to qemu-1.1 -M pc-1.0 should work
> though. By adding more craft now we make cleanup harder later.
> 
> > Breaking compatibility would bring very little
> > gain in this case though, right?
> > 
> It will clean up code instead of adding more and will allow reuse of
> hotplug code in QEMU and bios between pci/cpu/memory.
> 
Just to clarify I am not against your approach per se. We discussed both
points Igor and Alex are bringing now with you on IRC and you convinced me
that your approach is good enough. It shows that IRC is not a proper
channel to discuss thous thing since discussion have to be repeated on
ML anyway, so I bring my points from that discussion here for
consideration by others.

--
			Gleb.
Gleb Natapov April 1, 2012, 9:24 a.m. UTC | #13
On Thu, Mar 29, 2012 at 02:51:44PM +0200, Michael S. Tsirkin wrote:
> piix acpi interface suffers from the following 2 issues:
> 
> 1.
> - delete device a
> - quickly add device b in another slot
> 
> if we do this before guest reads the down register,
> the down event is discarded and device will never
> be deleted.
> 
> 2.
> - delete device a
> - quickly reset before guest can respond
> 
> interrupt is reset and guest will never eject the device.
> 
> To fix this, we implement two changes:
> 
> 1. Add two new registers:
> CLR_UP
> CLR_DOWN
> bios will now write to these the value it read from UP/DOWN
> 
> 2. on reset, remove all devices which have DOWN bit set
> 
> For compatibility with old guests, we also clear
> the DOWN bit on write to EJ0 for a device.
> 
> This patch also adds more detailed documentation
> for the ACPI interface, including the semantics
> of both old and new registers.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Gleb Natapov <gleb@redhat.com>

But Michael will have to convince Igor and Alex too.

> ---
> 
> Changes from v1:
> - tweaked a comment about clearing down register on reset
> - documentation update
> 
>  docs/specs/acpi_pci_hotplug.txt |   68 +++++++++++++++++++++++++++++++----
>  hw/acpi_piix4.c                 |   75 +++++++++++++++++++++++++++++++-------
>  2 files changed, 121 insertions(+), 22 deletions(-)
> 
> diff --git a/docs/specs/acpi_pci_hotplug.txt b/docs/specs/acpi_pci_hotplug.txt
> index f0f74a7..a8398f9 100644
> --- a/docs/specs/acpi_pci_hotplug.txt
> +++ b/docs/specs/acpi_pci_hotplug.txt
> @@ -7,31 +7,83 @@ describes the interface between QEMU and the ACPI BIOS.
>  ACPI GPE block (IO ports 0xafe0-0xafe3, byte access):
>  -----------------------------------------
>  
> -Generic ACPI GPE block. Bit 1 (GPE.1) used to notify PCI hotplug/eject
> +Generic ACPI GPE block. Bit 1 (GPE.1) is used to notify PCI hotplug/hotunplug
>  event to ACPI BIOS, via SCI interrupt.
>  
> -PCI slot injection notification pending (IO port 0xae00-0xae03, 4-byte access):
> +Semantics: this event occurs each time a bit in either Pending PCI slot
> +injection notification or Pending PCI slot removal notification registers
> +changes status from 0 to 1.
> +
> +Pending PCI slot injection notification (IO port 0xae00-0xae03, 4-byte access):
>  ---------------------------------------------------------------
> -Slot injection notification pending. One bit per slot.
> +Slot injection notification is pending. One bit per slot.
> +Guests should not write this register, in practice the value
> +written overwrites the register.
>  
>  Read by ACPI BIOS GPE.1 handler to notify OS of injection
>  events.
>  
> -PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
> +Semantics: after a slot is populated by hotplug, it is immediately enabled
> +and a bit in this register is set.
> +Reset value: 0
> +
> +Pending PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
>  -----------------------------------------------------
> -Slot removal notification pending. One bit per slot.
> +Slot removal notification is pending. One bit per slot.
> +Guests should not write this register, in practice the value
> +written overwrites the register.
>  
>  Read by ACPI BIOS GPE.1 handler to notify OS of removal
>  events.
>  
> +Semantics: upon hotunplug request, a bit in this register is set
> +and the OSPM is notified about hotunplug request, but a slot remains enabled
> +until eject.
> +Reset value: 0
> +
>  PCI device eject (IO port 0xae08-0xae0b, 4-byte access):
>  ----------------------------------------
>  
>  Used by ACPI BIOS _EJ0 method to request device removal. One bit per slot.
> -Reads return 0.
> +Guests should not read this register. In practice reads return 0.
> +
> +Semantics: selected hotunpluggable slots are disabled and powered off.
> +Has no effect on non-hotunpluggable slots.
> +
> +For compatibility with old BIOS, writes to this register
> +clear bits in pending slot injection notification register.
>  
>  PCI removability status (IO port 0xae0c-0xae0f, 4-byte access):
>  -----------------------------------------------
>  
> -Used by ACPI BIOS _RMV method to indicate removability status to OS. One
> -bit per slot.
> +Used by ACPI BIOS _RMV method to detect removability status
> +and report to OS. One bit per slot.
> +Guests should not write this register, in practice the value
> +written is discarded.
> +
> +Semantics: selected slots support hotplug. Contents of this register
> +do not change while guest is running.
> +
> +Clear PCI slot injection notification (IO port 0xae10-0xae14, 4-byte access):
> +---------------------------------------------------------------
> +Clears bits in pending slot injection notification register. One bit per slot.
> +Guests should not read this register. In practice reads return 0.
> +
> +Written to by ACPI BIOS GPE.1 handler after reading the
> +pending slot injection notification register and before
> +notifying OS of injection events.
> +
> +Semantics: used to acknowledge the injection notification.
> +Does not affect slot status.
> +
> +Clear PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
> +-----------------------------------------------------
> +Clears bits in slot removal notification pending register. One bit per slot.
> +Guests should not read this register. In practice reads return 0.
> +
> +Written to by ACPI BIOS GPE.1 handler after reading the
> +pending slot removal notification register and before
> +notifying OS of removal events.
> +
> +Semantics: used to acknowledge the hotunplug request.
> +Does not affect slot status.
> diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
> index 797ed24..50553fd 100644
> --- a/hw/acpi_piix4.c
> +++ b/hw/acpi_piix4.c
> @@ -43,6 +43,8 @@
>  #define PCI_BASE 0xae00
>  #define PCI_EJ_BASE 0xae08
>  #define PCI_RMV_BASE 0xae0c
> +#define PCI_CLR_UP_BASE 0xae10
> +#define PCI_CLR_DOWN_BASE 0xae14
>  
>  #define PIIX4_PCI_HOTPLUG_STATUS 2
>  
> @@ -287,6 +289,28 @@ static void piix4_update_hotplug(PIIX4PMState *s)
>      }
>  }
>  
> +static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots)
> +{
> +    DeviceState *qdev, *next;
> +    BusState *bus = qdev_get_parent_bus(&s->dev.qdev);
> +    int slot = ffs(slots) - 1;
> +
> +    /*
> +     * Clear DOWN register - this is good for a case where guest can't
> +     * write to CLR_DOWN because of VM reset, and for compatibility with
> +     * old guests which do not have CLR_DOWN.
> +     */
> +    s->pci0_status.down &= ~(1U << slot);
> +
> +    QTAILQ_FOREACH_SAFE(qdev, &bus->children, sibling, next) {
> +        PCIDevice *dev = PCI_DEVICE(qdev);
> +        PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
> +        if (PCI_SLOT(dev->devfn) == slot && !pc->no_hotplug) {
> +            qdev_free(qdev);
> +        }
> +    }
> +}
> +
>  static void piix4_reset(void *opaque)
>  {
>      PIIX4PMState *s = opaque;
> @@ -302,6 +326,19 @@ static void piix4_reset(void *opaque)
>          pci_conf[0x5B] = 0x02;
>      }
>      piix4_update_hotplug(s);
> +    /*
> +     * Guest lost remove events if any.
> +     * As it's being reset it's safe to remove the device now.
> +     */
> +    while (s->pci0_status.down) {
> +        acpi_piix_eject_slot(s, s->pci0_status.down);
> +    }
> +    /*
> +     * Guest lost add events if any.
> +     * As it's being reset and will rescan the bus we can discard
> +     * past events now.
> +     */
> +    s->pci0_status.up = 0;
>  }
>  
>  static void piix4_powerdown(void *opaque, int irq, int power_failing)
> @@ -490,22 +527,31 @@ static uint32_t pciej_read(void *opaque, uint32_t addr)
>  
>  static void pciej_write(void *opaque, uint32_t addr, uint32_t val)
>  {
> -    BusState *bus = opaque;
> -    DeviceState *qdev, *next;
> -    int slot = ffs(val) - 1;
> +    PIIX4PMState *s = opaque;
>  
> -    QTAILQ_FOREACH_SAFE(qdev, &bus->children, sibling, next) {
> -        PCIDevice *dev = PCI_DEVICE(qdev);
> -        PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
> -        if (PCI_SLOT(dev->devfn) == slot && !pc->no_hotplug) {
> -            qdev_free(qdev);
> -        }
> +    if (val) {
> +        acpi_piix_eject_slot(s, val);
>      }
>  
> -
>      PIIX4_DPRINTF("pciej write %x <== %d\n", addr, val);
>  }
>  
> +static void pci_clr_up_write(void *opaque, uint32_t addr, uint32_t val)
> +{
> +    PIIX4PMState *s = opaque;
> +    s->pci0_status.up &= ~val;
> +
> +    PIIX4_DPRINTF("pci_clr_up write %x <== %d\n", addr, val);
> +}
> +
> +static void pci_clr_down_write(void *opaque, uint32_t addr, uint32_t val)
> +{
> +    PIIX4PMState *s = opaque;
> +    s->pci0_status.down &= ~val;
> +
> +    PIIX4_DPRINTF("pci_clr_down write %x <== %d\n", addr, val);
> +}
> +
>  static uint32_t pcirmv_read(void *opaque, uint32_t addr)
>  {
>      PIIX4PMState *s = opaque;
> @@ -532,12 +578,15 @@ static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
>      register_ioport_write(PCI_BASE, 8, 4, pcihotplug_write, pci0_status);
>      register_ioport_read(PCI_BASE, 8, 4,  pcihotplug_read, pci0_status);
>  
> -    register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, bus);
> -    register_ioport_read(PCI_EJ_BASE, 4, 4,  pciej_read, bus);
> +    register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, s);
> +    register_ioport_read(PCI_EJ_BASE, 4, 4,  pciej_read, s);
>  
>      register_ioport_write(PCI_RMV_BASE, 4, 4, pcirmv_write, s);
>      register_ioport_read(PCI_RMV_BASE, 4, 4,  pcirmv_read, s);
>  
> +    register_ioport_write(PCI_CLR_UP_BASE, 4, 4, pci_clr_up_write, s);
> +    register_ioport_write(PCI_CLR_DOWN_BASE, 4, 4, pci_clr_down_write, s);
> +
>      pci_bus_hotplug(bus, piix4_device_hotplug, &s->dev.qdev);
>  }
>  
> @@ -567,8 +616,6 @@ static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
>          return 0;
>      }
>  
> -    s->pci0_status.up = 0;
> -    s->pci0_status.down = 0;
>      if (state == PCI_HOTPLUG_ENABLED) {
>          enable_device(s, slot);
>      } else {
> -- 
> 1.7.9.111.gf3fb0

--
			Gleb.
diff mbox

Patch

diff --git a/docs/specs/acpi_pci_hotplug.txt b/docs/specs/acpi_pci_hotplug.txt
index f0f74a7..a8398f9 100644
--- a/docs/specs/acpi_pci_hotplug.txt
+++ b/docs/specs/acpi_pci_hotplug.txt
@@ -7,31 +7,83 @@  describes the interface between QEMU and the ACPI BIOS.
 ACPI GPE block (IO ports 0xafe0-0xafe3, byte access):
 -----------------------------------------
 
-Generic ACPI GPE block. Bit 1 (GPE.1) used to notify PCI hotplug/eject
+Generic ACPI GPE block. Bit 1 (GPE.1) is used to notify PCI hotplug/hotunplug
 event to ACPI BIOS, via SCI interrupt.
 
-PCI slot injection notification pending (IO port 0xae00-0xae03, 4-byte access):
+Semantics: this event occurs each time a bit in either Pending PCI slot
+injection notification or Pending PCI slot removal notification registers
+changes status from 0 to 1.
+
+Pending PCI slot injection notification (IO port 0xae00-0xae03, 4-byte access):
 ---------------------------------------------------------------
-Slot injection notification pending. One bit per slot.
+Slot injection notification is pending. One bit per slot.
+Guests should not write this register, in practice the value
+written overwrites the register.
 
 Read by ACPI BIOS GPE.1 handler to notify OS of injection
 events.
 
-PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
+Semantics: after a slot is populated by hotplug, it is immediately enabled
+and a bit in this register is set.
+Reset value: 0
+
+Pending PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
 -----------------------------------------------------
-Slot removal notification pending. One bit per slot.
+Slot removal notification is pending. One bit per slot.
+Guests should not write this register, in practice the value
+written overwrites the register.
 
 Read by ACPI BIOS GPE.1 handler to notify OS of removal
 events.
 
+Semantics: upon hotunplug request, a bit in this register is set
+and the OSPM is notified about hotunplug request, but a slot remains enabled
+until eject.
+Reset value: 0
+
 PCI device eject (IO port 0xae08-0xae0b, 4-byte access):
 ----------------------------------------
 
 Used by ACPI BIOS _EJ0 method to request device removal. One bit per slot.
-Reads return 0.
+Guests should not read this register. In practice reads return 0.
+
+Semantics: selected hotunpluggable slots are disabled and powered off.
+Has no effect on non-hotunpluggable slots.
+
+For compatibility with old BIOS, writes to this register
+clear bits in pending slot injection notification register.
 
 PCI removability status (IO port 0xae0c-0xae0f, 4-byte access):
 -----------------------------------------------
 
-Used by ACPI BIOS _RMV method to indicate removability status to OS. One
-bit per slot.
+Used by ACPI BIOS _RMV method to detect removability status
+and report to OS. One bit per slot.
+Guests should not write this register, in practice the value
+written is discarded.
+
+Semantics: selected slots support hotplug. Contents of this register
+do not change while guest is running.
+
+Clear PCI slot injection notification (IO port 0xae10-0xae14, 4-byte access):
+---------------------------------------------------------------
+Clears bits in pending slot injection notification register. One bit per slot.
+Guests should not read this register. In practice reads return 0.
+
+Written to by ACPI BIOS GPE.1 handler after reading the
+pending slot injection notification register and before
+notifying OS of injection events.
+
+Semantics: used to acknowledge the injection notification.
+Does not affect slot status.
+
+Clear PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
+-----------------------------------------------------
+Clears bits in slot removal notification pending register. One bit per slot.
+Guests should not read this register. In practice reads return 0.
+
+Written to by ACPI BIOS GPE.1 handler after reading the
+pending slot removal notification register and before
+notifying OS of removal events.
+
+Semantics: used to acknowledge the hotunplug request.
+Does not affect slot status.
diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index 797ed24..50553fd 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -43,6 +43,8 @@ 
 #define PCI_BASE 0xae00
 #define PCI_EJ_BASE 0xae08
 #define PCI_RMV_BASE 0xae0c
+#define PCI_CLR_UP_BASE 0xae10
+#define PCI_CLR_DOWN_BASE 0xae14
 
 #define PIIX4_PCI_HOTPLUG_STATUS 2
 
@@ -287,6 +289,28 @@  static void piix4_update_hotplug(PIIX4PMState *s)
     }
 }
 
+static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots)
+{
+    DeviceState *qdev, *next;
+    BusState *bus = qdev_get_parent_bus(&s->dev.qdev);
+    int slot = ffs(slots) - 1;
+
+    /*
+     * Clear DOWN register - this is good for a case where guest can't
+     * write to CLR_DOWN because of VM reset, and for compatibility with
+     * old guests which do not have CLR_DOWN.
+     */
+    s->pci0_status.down &= ~(1U << slot);
+
+    QTAILQ_FOREACH_SAFE(qdev, &bus->children, sibling, next) {
+        PCIDevice *dev = PCI_DEVICE(qdev);
+        PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
+        if (PCI_SLOT(dev->devfn) == slot && !pc->no_hotplug) {
+            qdev_free(qdev);
+        }
+    }
+}
+
 static void piix4_reset(void *opaque)
 {
     PIIX4PMState *s = opaque;
@@ -302,6 +326,19 @@  static void piix4_reset(void *opaque)
         pci_conf[0x5B] = 0x02;
     }
     piix4_update_hotplug(s);
+    /*
+     * Guest lost remove events if any.
+     * As it's being reset it's safe to remove the device now.
+     */
+    while (s->pci0_status.down) {
+        acpi_piix_eject_slot(s, s->pci0_status.down);
+    }
+    /*
+     * Guest lost add events if any.
+     * As it's being reset and will rescan the bus we can discard
+     * past events now.
+     */
+    s->pci0_status.up = 0;
 }
 
 static void piix4_powerdown(void *opaque, int irq, int power_failing)
@@ -490,22 +527,31 @@  static uint32_t pciej_read(void *opaque, uint32_t addr)
 
 static void pciej_write(void *opaque, uint32_t addr, uint32_t val)
 {
-    BusState *bus = opaque;
-    DeviceState *qdev, *next;
-    int slot = ffs(val) - 1;
+    PIIX4PMState *s = opaque;
 
-    QTAILQ_FOREACH_SAFE(qdev, &bus->children, sibling, next) {
-        PCIDevice *dev = PCI_DEVICE(qdev);
-        PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
-        if (PCI_SLOT(dev->devfn) == slot && !pc->no_hotplug) {
-            qdev_free(qdev);
-        }
+    if (val) {
+        acpi_piix_eject_slot(s, val);
     }
 
-
     PIIX4_DPRINTF("pciej write %x <== %d\n", addr, val);
 }
 
+static void pci_clr_up_write(void *opaque, uint32_t addr, uint32_t val)
+{
+    PIIX4PMState *s = opaque;
+    s->pci0_status.up &= ~val;
+
+    PIIX4_DPRINTF("pci_clr_up write %x <== %d\n", addr, val);
+}
+
+static void pci_clr_down_write(void *opaque, uint32_t addr, uint32_t val)
+{
+    PIIX4PMState *s = opaque;
+    s->pci0_status.down &= ~val;
+
+    PIIX4_DPRINTF("pci_clr_down write %x <== %d\n", addr, val);
+}
+
 static uint32_t pcirmv_read(void *opaque, uint32_t addr)
 {
     PIIX4PMState *s = opaque;
@@ -532,12 +578,15 @@  static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
     register_ioport_write(PCI_BASE, 8, 4, pcihotplug_write, pci0_status);
     register_ioport_read(PCI_BASE, 8, 4,  pcihotplug_read, pci0_status);
 
-    register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, bus);
-    register_ioport_read(PCI_EJ_BASE, 4, 4,  pciej_read, bus);
+    register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, s);
+    register_ioport_read(PCI_EJ_BASE, 4, 4,  pciej_read, s);
 
     register_ioport_write(PCI_RMV_BASE, 4, 4, pcirmv_write, s);
     register_ioport_read(PCI_RMV_BASE, 4, 4,  pcirmv_read, s);
 
+    register_ioport_write(PCI_CLR_UP_BASE, 4, 4, pci_clr_up_write, s);
+    register_ioport_write(PCI_CLR_DOWN_BASE, 4, 4, pci_clr_down_write, s);
+
     pci_bus_hotplug(bus, piix4_device_hotplug, &s->dev.qdev);
 }
 
@@ -567,8 +616,6 @@  static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
         return 0;
     }
 
-    s->pci0_status.up = 0;
-    s->pci0_status.down = 0;
     if (state == PCI_HOTPLUG_ENABLED) {
         enable_device(s, slot);
     } else {