Patchwork [PATCHv4] piix: fix up/down races

login
register
mail settings
Submitter Michael S. Tsirkin
Date April 3, 2012, 1:52 p.m.
Message ID <20120403135251.GA12182@redhat.com>
Download mbox | patch
Permalink /patch/150444/
State New
Headers show

Comments

Michael S. Tsirkin - April 3, 2012, 1:52 p.m.
piix acpi interface suffers from the following 2 issues:

1.
- delete device a
- quickly add device b in another slot

if we do this before guest reads the down register,
the down event is discarded and device will never
be deleted.

2.
- delete device a
- quickly reset before guest can respond

interrupt is reset and guest will never eject the device.

To fix this, we implement three changes:

1. Do not clear up/down registers on each hotplug event.
this ensures we don't lose events.

2. Make existing up/down registers write 1 to clear;
bios will now write back to these the value it read.

3. on reset, remove all devices which have DOWN bit set

For compatibility with old guests, we also clear
the DOWN bit on write to EJ0 for a device.

This patch also extends the documentation for the
ACPI interface, to make the semantics explicit.

Compatibility notes: migrating guests across qemu versions
can cause bios from one qemu realease running on another qemu
release. The following documents the behaviour in this case:

A. new bios running on old qemu
A bios implementing the change mentioned above will if running on an old
qemu introduce a race: the registers were writeable there, so if a
register changes between read and write, we'll clobber the new bits
losing events.  As that version tends to lose events anyway, and as the
better than complicating the interface. If someone cares enough, the fix
can go on the stable branch.
B. old bios running on new qemu
Down bits will get cleared on eject because of 3 above.
Up bits only get cleared on reset.
This will cause unnecessary notifications and bus rescans,
but they are harmless.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---

Changes from v3:
- Documentation/comment tweaks suggested by Alex and Igor
- assert that DOWN is clear before adding a device
Changes from v2:
- Use existing register instead of adding a new one,
  and update documentation
Changes from v1:
- tweaked a comment about clearing down register on reset
- documentation update


 docs/specs/acpi_pci_hotplug.txt |   60 ++++++++++++++++++++++++++++++++-----
 hw/acpi_piix4.c                 |   63 +++++++++++++++++++++++++++++----------
 2 files changed, 99 insertions(+), 24 deletions(-)

Patch

diff --git a/docs/specs/acpi_pci_hotplug.txt b/docs/specs/acpi_pci_hotplug.txt
index f0f74a7..22cd32d 100644
--- a/docs/specs/acpi_pci_hotplug.txt
+++ b/docs/specs/acpi_pci_hotplug.txt
@@ -7,31 +7,75 @@  describes the interface between QEMU and the ACPI BIOS.
 ACPI GPE block (IO ports 0xafe0-0xafe3, byte access):
 -----------------------------------------
 
-Generic ACPI GPE block. Bit 1 (GPE.1) used to notify PCI hotplug/eject
+Generic ACPI GPE block. Bit 1 (GPE.1) is used to notify PCI hotplug/hotunplug
 event to ACPI BIOS, via SCI interrupt.
 
-PCI slot injection notification pending (IO port 0xae00-0xae03, 4-byte access):
+Semantics: this event occurs each time a bit in either Pending PCI slot
+injection notification or Pending PCI slot removal notification registers
+changes status from 0 to 1.
+
+Pending PCI slot injection notification (IO port 0xae00-0xae03, 4-byte access):
 ---------------------------------------------------------------
-Slot injection notification pending. One bit per slot.
+Slot injection notification is pending. One bit per slot.
+When written by Guest, this register implements write one to clear behaviour.
 
 Read by ACPI BIOS GPE.1 handler to notify OS of injection
 events.
+Written to by ACPI BIOS GPE.1 handler after reading
+and before notifying OS of injection events.
+
+Semantics: after a slot is populated by hotplug, it is immediately enabled
+and a bit in this register is set.
+Writes into this register are used to acknowledge the injection notification,
+clearing the injection notification events in specified slots,
+and do not affect the slot status.
+Reset value: 0
 
-PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
+Pending PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
 -----------------------------------------------------
-Slot removal notification pending. One bit per slot.
+Slot removal notification is pending. One bit per slot.
+When written by Guest, this register implements write one to clear behaviour:
 
 Read by ACPI BIOS GPE.1 handler to notify OS of removal
 events.
 
+Written to by ACPI BIOS GPE.1 handler after reading
+and before notifying OS of removal events.
+
+Note: if both slot removal and slot injection notification have a specific bit
+set, ACPI BIOS notifies OS of injection event followed by removal event.
+
+Semantics: upon hotunplug request, a bit in this register is set
+and the OSPM is notified about hotunplug request, but a slot remains enabled
+until eject.
+Writes into this register are used to acknowledge the hotunplug request,
+clearing the removal notification events in specified slots,
+and do not affect the slot status.
+Reset value: 0
+
 PCI device eject (IO port 0xae08-0xae0b, 4-byte access):
 ----------------------------------------
 
 Used by ACPI BIOS _EJ0 method to request device removal. One bit per slot.
-Reads return 0.
+Guests should not read this register. In practice reads return 0.
+guests should write values with exactly one bit set, selecting
+the slot to eject.
+In practice all bits except the lowest bit that is set are discarded.  Also in
+practice writing the value 0 into this register has no effect.
+
+Semantics: selected hotunpluggable slot is disabled and powered off.
+Has no effect on non-hotunpluggable slots.
+
+For compatibility with old BIOS, writes to this register
+clear bits in pending slot injection notification register.
 
 PCI removability status (IO port 0xae0c-0xae0f, 4-byte access):
 -----------------------------------------------
 
-Used by ACPI BIOS _RMV method to indicate removability status to OS. One
-bit per slot.
+Used by ACPI BIOS _RMV method to detect removability status
+and report to OS. One bit per slot.
+Guests should not write this register, in practice the value
+written is discarded.
+
+Semantics: selected slots support hotplug. Contents of this register
+do not change while guest is running.
diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index 797ed24..a4b82c6 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -287,6 +287,30 @@  static void piix4_update_hotplug(PIIX4PMState *s)
     }
 }
 
+static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots)
+{
+    DeviceState *qdev, *next;
+    BusState *bus = qdev_get_parent_bus(&s->dev.qdev);
+    int slot = ffs(slots) - 1;
+
+    /*
+     * Clear DOWN register - this is good for a case where guest can't
+     * write to DOWN because of VM reset, and for compatibility with
+     * old guests which do not write to DOWN.
+     * Note: do not clear UP register for old guests here since extra bits
+     * in UP register only cause harmless bus rescans.
+     */
+    s->pci0_status.down &= ~(1U << slot);
+
+    QTAILQ_FOREACH_SAFE(qdev, &bus->children, sibling, next) {
+        PCIDevice *dev = PCI_DEVICE(qdev);
+        PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
+        if (PCI_SLOT(dev->devfn) == slot && !pc->no_hotplug) {
+            qdev_free(qdev);
+        }
+    }
+}
+
 static void piix4_reset(void *opaque)
 {
     PIIX4PMState *s = opaque;
@@ -302,6 +326,19 @@  static void piix4_reset(void *opaque)
         pci_conf[0x5B] = 0x02;
     }
     piix4_update_hotplug(s);
+    /*
+     * Guest lost remove events if any.
+     * As it's being reset it's safe to remove the device now.
+     */
+    while (s->pci0_status.down) {
+        acpi_piix_eject_slot(s, s->pci0_status.down);
+    }
+    /*
+     * Guest lost add events if any.
+     * As it's being reset and will rescan the bus we can discard
+     * past events now.
+     */
+    s->pci0_status.up = 0;
 }
 
 static void piix4_powerdown(void *opaque, int irq, int power_failing)
@@ -472,10 +509,10 @@  static void pcihotplug_write(void *opaque, uint32_t addr, uint32_t val)
     struct pci_status *g = opaque;
     switch (addr) {
         case PCI_BASE:
-            g->up = val;
+            g->up &= ~val;
             break;
         case PCI_BASE + 4:
-            g->down = val;
+            g->down &= ~val;
             break;
    }
 
@@ -490,19 +527,12 @@  static uint32_t pciej_read(void *opaque, uint32_t addr)
 
 static void pciej_write(void *opaque, uint32_t addr, uint32_t val)
 {
-    BusState *bus = opaque;
-    DeviceState *qdev, *next;
-    int slot = ffs(val) - 1;
+    PIIX4PMState *s = opaque;
 
-    QTAILQ_FOREACH_SAFE(qdev, &bus->children, sibling, next) {
-        PCIDevice *dev = PCI_DEVICE(qdev);
-        PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
-        if (PCI_SLOT(dev->devfn) == slot && !pc->no_hotplug) {
-            qdev_free(qdev);
-        }
+    if (val) {
+        acpi_piix_eject_slot(s, val);
     }
 
-
     PIIX4_DPRINTF("pciej write %x <== %d\n", addr, val);
 }
 
@@ -532,8 +562,8 @@  static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
     register_ioport_write(PCI_BASE, 8, 4, pcihotplug_write, pci0_status);
     register_ioport_read(PCI_BASE, 8, 4,  pcihotplug_read, pci0_status);
 
-    register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, bus);
-    register_ioport_read(PCI_EJ_BASE, 4, 4,  pciej_read, bus);
+    register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, s);
+    register_ioport_read(PCI_EJ_BASE, 4, 4,  pciej_read, s);
 
     register_ioport_write(PCI_RMV_BASE, 4, 4, pcirmv_write, s);
     register_ioport_read(PCI_RMV_BASE, 4, 4,  pcirmv_read, s);
@@ -545,12 +575,15 @@  static void enable_device(PIIX4PMState *s, int slot)
 {
     s->ar.gpe.sts[0] |= PIIX4_PCI_HOTPLUG_STATUS;
     s->pci0_status.up |= (1 << slot);
+    /* Make sure DOWN has been cleared on eject */
+    assert(!(s->pci0_status.down & (1 << slot)));
 }
 
 static void disable_device(PIIX4PMState *s, int slot)
 {
     s->ar.gpe.sts[0] |= PIIX4_PCI_HOTPLUG_STATUS;
     s->pci0_status.down |= (1 << slot);
+    /* UP can be set here: happens if enable is quickly followed by disable */
 }
 
 static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
@@ -567,8 +600,6 @@  static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
         return 0;
     }
 
-    s->pci0_status.up = 0;
-    s->pci0_status.down = 0;
     if (state == PCI_HOTPLUG_ENABLED) {
         enable_device(s, slot);
     } else {