Patchwork [RFC] AMD IOMMU emulation

Submitter Eduard - Gabriel Munteanu
Date May 20, 2010, 1:50 p.m.
Message ID <1274363407-24862-1-git-send-email-eduard.munteanu@linux360.ro>
Permalink /patch/53072/
State New

Comments

Eduard - Gabriel Munteanu - May 20, 2010, 1:50 p.m.
This is preliminary work for AMD IOMMU emulation support.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
---
 Makefile.target |    2 +
 configure       |    9 +
 hw/amd_iommu.c  |  442 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/pc.c         |    2 +
 hw/pc.h         |    3 +
 hw/pci_ids.h    |    2 +
 hw/pci_regs.h   |    1 +
 7 files changed, 461 insertions(+), 0 deletions(-)
 create mode 100644 hw/amd_iommu.c
Joerg Roedel - May 24, 2010, 3:40 p.m.
Hi Eduard,

On Thu, May 20, 2010 at 04:50:07PM +0300, Eduard - Gabriel Munteanu wrote:
> +  --enable-amd-iommu-emul) amd_iommu="yes"
> +  ;;

A compile-time option is a good idea.

> +/* MMIO registers */
> +#define MMIO_DEVICE_TABLE       0x0000
> +#define MMIO_COMMAND_BASE       0x0008
> +#define MMIO_EVENT_BASE         0x0010
> +#define MMIO_CONTROL            0x0018
> +#define MMIO_EXCL_BASE          0x0020
> +#define MMIO_EXCL_LIMIT         0x0028
> +#define MMIO_COMMAND_HEAD       0x2000
> +#define MMIO_COMMAND_TAIL       0x2008
> +#define MMIO_EVENT_HEAD         0x2010
> +#define MMIO_EVENT_TAIL         0x2018
> +#define MMIO_STATUS             0x2020
> +
> +#define MMIO_SIZE               0x2028

This size should be a power-of-two value. In this case probably 0x4000.

> +#define MMIO_DEVTAB_SIZE_MASK   ((1UL << 12) - 1)
> +#define MMIO_DEVTAB_BASE_MASK   (((1UL << 52) - 1) & ~MMIO_DEVTAB_SIZE_MASK)

You must use ULL to be 32-bit safe. This is also true for the defines
below.
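
A minimal sketch of the ULL point above, assuming a C11 compiler and the fixed-width types from <stdint.h>. On a host where unsigned long is 32 bits wide, `1UL << 52` shifts by more than the type's width, which is undefined behaviour. The macro names are the ones from the patch; the two helper functions are hypothetical and only illustrate how the masks would be applied.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit layout as the masks in the patch, but built from 64-bit
 * constants so they do not depend on the width of unsigned long. */
#define MMIO_DEVTAB_SIZE_MASK   ((1ULL << 12) - 1)
#define MMIO_DEVTAB_BASE_MASK   (((1ULL << 52) - 1) & ~MMIO_DEVTAB_SIZE_MASK)

/* Compile-time check: bits 12..51 set, everything else clear. */
static_assert(MMIO_DEVTAB_BASE_MASK == 0x000FFFFFFFFFF000ULL,
              "base mask must cover bits 12..51");

/* Hypothetical helper: extract the base-address bits of a raw register value. */
static uint64_t devtab_base(uint64_t reg)
{
    return reg & MMIO_DEVTAB_BASE_MASK;
}

/* Hypothetical helper: extract the low 12 bits covered by the size mask. */
static uint64_t devtab_size_field(uint64_t reg)
{
    return reg & MMIO_DEVTAB_SIZE_MASK;
}
```

With the ULL suffix the masks have the same value on 32-bit and 64-bit hosts.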

[...]

Otherwise the code looks good so far. The next step should probably be
some work on a QEMU DMA layer that you can hook the translation into.

	Joerg
Blue Swirl - May 24, 2010, 8:10 p.m.
On Mon, May 24, 2010 at 3:40 PM, Joerg Roedel <joro@8bytes.org> wrote:
> Hi Eduard,
>
> On Thu, May 20, 2010 at 04:50:07PM +0300, Eduard - Gabriel Munteanu wrote:
>> +  --enable-amd-iommu-emul) amd_iommu="yes"
>> +  ;;
>
> A compile-time option is a good idea.
>
>> +/* MMIO registers */
>> +#define MMIO_DEVICE_TABLE       0x0000
>> +#define MMIO_COMMAND_BASE       0x0008
>> +#define MMIO_EVENT_BASE         0x0010
>> +#define MMIO_CONTROL            0x0018
>> +#define MMIO_EXCL_BASE          0x0020
>> +#define MMIO_EXCL_LIMIT         0x0028
>> +#define MMIO_COMMAND_HEAD       0x2000
>> +#define MMIO_COMMAND_TAIL       0x2008
>> +#define MMIO_EVENT_HEAD         0x2010
>> +#define MMIO_EVENT_TAIL         0x2018
>> +#define MMIO_STATUS             0x2020
>> +
>> +#define MMIO_SIZE               0x2028
>
> This size should be a power-of-two value. In this case probably 0x4000.

Not really, the devices can reserve regions of any size. There were
some implementation deficiencies in earlier versions of QEMU, where
the whole page would be reserved anyway, but this limitation has been
removed a long time ago.

>> +#define MMIO_DEVTAB_SIZE_MASK   ((1UL << 12) - 1)
>> +#define MMIO_DEVTAB_BASE_MASK   (((1UL << 52) - 1) & ~MMIO_DEVTAB_SIZE_MASK)
>
> You must use ULL to be 32-bit safe. This is also true for the defines
> below.
>
> [...]
>
> Otherwise the code looks good so far. The next step should probably be
> some work on a QEMU DMA layer that you can hook the translation into.
>
>        Joerg
>
>
>
Joerg Roedel - May 25, 2010, 8:39 a.m.
On Mon, May 24, 2010 at 08:10:16PM +0000, Blue Swirl wrote:
> On Mon, May 24, 2010 at 3:40 PM, Joerg Roedel <joro@8bytes.org> wrote:
> >> +
> >> +#define MMIO_SIZE               0x2028
> >
> > This size should be a power-of-two value. In this case probably 0x4000.
> 
> Not really, the devices can reserve regions of any size. There were
> some implementation deficiencies in earlier versions of QEMU, where
> the whole page would be reserved anyway, but this limitation has been
> removed a long time ago.

The drivers for AMD IOMMU expect that to be 0x4000. At least the Linux
driver maps the MMIO region with this size. So the emulation should
reserve this amount of MMIO space too.
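
A rough sketch of what reserving 0x4000 bytes implies, assuming a C11 compiler and that every register in the patch is 8 bytes wide (as the 8-byte stride of the offsets suggests). MMIO_STATUS and MMIO_SIZE are names from the patch; the range-check helper is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MMIO_STATUS   0x2020   /* highest register offset in the patch */
#define MMIO_SIZE     0x4000   /* 16 KiB, the size the Linux driver maps */

/* The last register must still fit inside the region. */
static_assert(MMIO_STATUS + 8 <= MMIO_SIZE,
              "registers must fit in the MMIO region");
/* A power of two has exactly one bit set. */
static_assert((MMIO_SIZE & (MMIO_SIZE - 1)) == 0,
              "MMIO_SIZE must be a power of two");

/* Hypothetical helper: does an access of len bytes at offset stay
 * inside the region? Written so that offset + len cannot overflow. */
static int mmio_access_in_range(size_t offset, size_t len)
{
    return len <= MMIO_SIZE && offset <= MMIO_SIZE - len;
}
```

A guest that maps the full 0x4000 bytes can then touch any offset below that limit without leaving the emulated region.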

	Joerg
Eduard - Gabriel Munteanu - May 25, 2010, 11:23 a.m.
On Tue, May 25, 2010 at 10:39:22AM +0200, Joerg Roedel wrote:
> On Mon, May 24, 2010 at 08:10:16PM +0000, Blue Swirl wrote:
> > On Mon, May 24, 2010 at 3:40 PM, Joerg Roedel <joro@8bytes.org> wrote:
> > >> +
> > >> +#define MMIO_SIZE               0x2028
> > >
> > > This size should be a power-of-two value. In this case probably 0x4000.
> > 
> > Not really, the devices can reserve regions of any size. There were
> > some implementation deficiencies in earlier versions of QEMU, where
> > the whole page would be reserved anyway, but this limitation has been
> > removed a long time ago.
> 
> The drivers for AMD IOMMU expect that to be 0x4000. At least the Linux
> driver maps the MMIO region with this size. So the emulation should
> reserve this amount of MMIO space too.
> 
> 	Joerg

Yeah, I'll change that, since I already reserve 0x4000 bytes in SeaBIOS
for it (I did that to deal with the 16 KiB alignment requirement).
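
The 16 KiB alignment mentioned above can be sketched as follows, assuming a C11 compiler. The constant and helper names are hypothetical and do not come from the patch or from SeaBIOS; the low 14 bits cleared here are the same bits that the patch's CAPAB_BAR_MASK clears.

```c
#include <assert.h>
#include <stdint.h>

#define IOMMU_MMIO_ALIGN   0x4000ULL   /* 16 KiB, hypothetical name */

static_assert((IOMMU_MMIO_ALIGN & (IOMMU_MMIO_ALIGN - 1)) == 0,
              "alignment must be a power of two");

/* Round addr up to the next 16 KiB boundary.
 * Assumes addr + 0x3FFF does not overflow 64 bits. */
static uint64_t iommu_align_up(uint64_t addr)
{
    return (addr + IOMMU_MMIO_ALIGN - 1) & ~(IOMMU_MMIO_ALIGN - 1);
}

/* True when the low 14 bits of addr are all zero. */
static int iommu_is_aligned(uint64_t addr)
{
    return (addr & (IOMMU_MMIO_ALIGN - 1)) == 0;
}
```

Reserving a region whose size equals the alignment keeps the allocation simple: any aligned base leaves exactly one 16 KiB slot for the device.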


	Eduard
Blue Swirl - May 25, 2010, 7:16 p.m.
On Tue, May 25, 2010 at 8:39 AM, Joerg Roedel <joro@8bytes.org> wrote:
> On Mon, May 24, 2010 at 08:10:16PM +0000, Blue Swirl wrote:
>> On Mon, May 24, 2010 at 3:40 PM, Joerg Roedel <joro@8bytes.org> wrote:
>> >> +
>> >> +#define MMIO_SIZE               0x2028
>> >
>> > This size should be a power-of-two value. In this case probably 0x4000.
>>
>> Not really, the devices can reserve regions of any size. There were
>> some implementation deficiencies in earlier versions of QEMU, where
>> the whole page would be reserved anyway, but this limitation has been
>> removed a long time ago.
>
> The drivers for AMD IOMMU expect that to be 0x4000. At least the Linux
> driver maps the MMIO region with this size. So the emulation should
> reserve this amount of MMIO space too.

Well, Linux drivers may take a conservative approach, so I'd check what
value the device specs give. In practice, on x86 hardware the size
doesn't matter too much, whereas on Sparc, for example, an access beyond
the end of the device region would trap.

Patch

diff --git a/Makefile.target b/Makefile.target
index 0bdb184..13f8086 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -217,6 +217,8 @@  obj-i386-y += testdev.o
 obj-i386-$(CONFIG_KVM_PIT) += i8254-kvm.o
 obj-i386-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += device-assignment.o
 
+obj-i386-$(CONFIG_AMD_IOMMU) += amd_iommu.o
+
 # Hardware support
 obj-ia64-y += ide.o pckbd.o vga.o $(SOUND_HW) dma.o $(AUDIODRV)
 obj-ia64-y += fdc.o mc146818rtc.o serial.o i8259.o ipf.o
diff --git a/configure b/configure
index ed8e17b..34e5194 100755
--- a/configure
+++ b/configure
@@ -305,6 +305,7 @@  mixemu="no"
 kvm_trace="no"
 kvm_cap_pit=""
 kvm_cap_device_assignment=""
+amd_iommu="no"
 kerneldir=""
 aix="no"
 blobs="yes"
@@ -603,6 +604,8 @@  for opt do
   ;;
   --enable-kvm-device-assignment) kvm_cap_device_assignment="yes"
   ;;
+  --enable-amd-iommu-emul) amd_iommu="yes"
+  ;;
   --enable-profiler) profiler="yes"
   ;;
   --enable-cocoa)
@@ -829,6 +832,8 @@  echo "  --disable-kvm-pit        disable KVM pit support"
 echo "  --enable-kvm-pit         enable KVM pit support"
 echo "  --disable-kvm-device-assignment  disable KVM device assignment support"
 echo "  --enable-kvm-device-assignment   enable KVM device assignment support"
+echo "  --disable-amd-iommu-emul disable AMD IOMMU emulation"
+echo "  --enable-amd-iommu-emul  enable AMD IOMMU emulation"
 echo "  --disable-nptl           disable usermode NPTL support"
 echo "  --enable-nptl            enable usermode NPTL support"
 echo "  --enable-system          enable all system emulation targets"
@@ -2185,6 +2190,7 @@  echo "KVM support       $kvm"
 echo "KVM PIT support   $kvm_cap_pit"
 echo "KVM device assig. $kvm_cap_device_assignment"
 echo "KVM trace support $kvm_trace"
+echo "AMD IOMMU emul.   $amd_iommu"
 echo "fdt support       $fdt"
 echo "preadv support    $preadv"
 echo "fdatasync         $fdatasync"
@@ -2599,6 +2605,9 @@  case "$target_arch2" in
   x86_64)
     TARGET_BASE_ARCH=i386
     target_phys_bits=64
+    if test "$amd_iommu" = "yes"; then
+      echo "CONFIG_AMD_IOMMU=y" >> $config_target_mak
+    fi
   ;;
   ia64)
     target_phys_bits=64
diff --git a/hw/amd_iommu.c b/hw/amd_iommu.c
new file mode 100644
index 0000000..cde90d0
--- /dev/null
+++ b/hw/amd_iommu.c
@@ -0,0 +1,442 @@ 
+/*
+ * AMD IOMMU emulation
+ *
+ * Copyright (c) 2010 Eduard - Gabriel Munteanu
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "pc.h"
+#include "hw.h"
+#include "pci.h"
+
+/* Capability registers */
+#define CAPAB_HEADER            0x00
+#define   CAPAB_REV_TYPE        0x02
+#define   CAPAB_FLAGS           0x03
+#define CAPAB_BAR_LOW           0x04
+#define CAPAB_BAR_HIGH          0x08
+#define CAPAB_RANGE             0x0C
+#define CAPAB_MISC              0x10
+
+#define CAPAB_SIZE              0x14
+
+/* Capability header data */
+#define CAPAB_FLAG_IOTLBSUP     (1 << 0)
+#define CAPAB_FLAG_HTTUNNEL     (1 << 1)
+#define CAPAB_FLAG_NPCACHE      (1 << 2)
+#define CAPAB_INIT_REV          (1 << 3)
+#define CAPAB_INIT_TYPE         3
+#define CAPAB_INIT_REV_TYPE     (CAPAB_INIT_REV | CAPAB_INIT_TYPE)
+#define CAPAB_INIT_FLAGS        (CAPAB_FLAG_NPCACHE | CAPAB_FLAG_HTTUNNEL)
+#define CAPAB_INIT_MISC         ((64 << 15) | (48 << 8))
+#define CAPAB_BAR_MASK          (~((1ULL << 14) - 1))
+
+/* MMIO registers */
+#define MMIO_DEVICE_TABLE       0x0000
+#define MMIO_COMMAND_BASE       0x0008
+#define MMIO_EVENT_BASE         0x0010
+#define MMIO_CONTROL            0x0018
+#define MMIO_EXCL_BASE          0x0020
+#define MMIO_EXCL_LIMIT         0x0028
+#define MMIO_COMMAND_HEAD       0x2000
+#define MMIO_COMMAND_TAIL       0x2008
+#define MMIO_EVENT_HEAD         0x2010
+#define MMIO_EVENT_TAIL         0x2018
+#define MMIO_STATUS             0x2020
+
+#define MMIO_SIZE               0x2028
+
+#define MMIO_DEVTAB_SIZE_MASK   ((1UL << 12) - 1)
+#define MMIO_DEVTAB_BASE_MASK   (((1UL << 52) - 1) & ~MMIO_DEVTAB_SIZE_MASK)
+#define MMIO_DEVTAB_ENTRY_SIZE  32
+#define MMIO_DEVTAB_SIZE_UNIT   4096
+
+#define MMIO_CMDBUF_SIZE_BYTE       (MMIO_COMMAND_BASE + 7)
+#define MMIO_CMDBUF_SIZE_MASK       0x0F
+#define MMIO_CMDBUF_BASE_MASK       MMIO_DEVTAB_BASE_MASK
+#define MMIO_CMDBUF_DEFAULT_SIZE    8
+#define MMIO_CMDBUF_HEAD_MASK       (((1UL << 19) - 1) & ~0x0F)
+#define MMIO_CMDBUF_TAIL_MASK       MMIO_CMDBUF_HEAD_MASK
+
+#define MMIO_EVTLOG_SIZE_BYTE       (MMIO_EVENT_BASE + 7)
+#define MMIO_EVTLOG_SIZE_MASK       MMIO_CMDBUF_SIZE_MASK
+#define MMIO_EVTLOG_BASE_MASK       MMIO_CMDBUF_BASE_MASK
+#define MMIO_EVTLOG_DEFAULT_SIZE    MMIO_CMDBUF_DEFAULT_SIZE
+#define MMIO_EVTLOG_HEAD_MASK       (((1UL << 19) - 1) & ~0x0F)
+#define MMIO_EVTLOG_TAIL_MASK       MMIO_EVTLOG_HEAD_MASK
+
+#define MMIO_EXCL_BASE_MASK         MMIO_DEVTAB_BASE_MASK
+#define MMIO_EXCL_ENABLED_MASK      (1 << 0)
+#define MMIO_EXCL_ALLOW_MASK        (1 << 1)
+#define MMIO_EXCL_LIMIT_MASK        MMIO_DEVTAB_BASE_MASK
+#define MMIO_EXCL_LIMIT_LOW         0xFFF
+
+#define CMDBUF_ID_BYTE              0x07
+#define CMDBUF_ENTRY_SIZE           0x0C
+
+#define CMD_COMPLETION_WAIT         0x01
+#define CMD_INVAL_DEVTAB_ENTRY      0x02
+#define CMD_INVAL_IOMMU_PAGES       0x03
+#define CMD_INVAL_IOTLB_PAGES       0x04
+#define CMD_INVAL_INTR_TABLE        0x05
+
+struct amd_iommu_state {
+    PCIDevice                   dev;
+
+    int                         capab_offset;
+    unsigned char               *capab;
+
+    int                         mmio_index;
+    target_phys_addr_t          mmio_addr;
+    unsigned char               *mmio_buf;
+    int                         mmio_enabled;
+
+    int                         enabled;
+
+    unsigned char               *devtab;
+    size_t                      devtab_len;
+
+    unsigned char               *cmdbuf;
+    size_t                      cmdbuf_len;
+    size_t                      cmdbuf_head;
+    size_t                      cmdbuf_tail;
+
+    unsigned char               *evtlog;
+    size_t                      evtlog_len;
+    size_t                      evtlog_head;
+    size_t                      evtlog_tail;
+
+    unsigned char               *excl_base;
+    unsigned char               *excl_limit;
+    int                         excl_enabled;
+    int                         excl_allow;
+};
+
+static uint32_t amd_iommu_mmio_buf_read(struct amd_iommu_state *st,
+                                        size_t offset,
+                                        size_t size)
+{
+    size_t i;
+    uint32_t ret;
+
+    if (!size)
+        return 0;
+
+    ret = st->mmio_buf[offset + size - 1];
+    for (i = size - 1; i > 0; i--) {
+        ret <<= 8;
+        ret |= st->mmio_buf[offset + i - 1];
+    }
+
+    return ret;
+}
+
+static void amd_iommu_mmio_buf_write(struct amd_iommu_state *st,
+                                     size_t offset,
+                                     size_t size,
+                                     uint32_t val)
+{
+    size_t i;
+
+    for (i = 0; i < size; i++) {
+        st->mmio_buf[offset + i] = val & 0xFF;
+        val >>= 8;
+    }
+}
+
+static void amd_iommu_update_mmio(struct amd_iommu_state *st,
+                                  target_phys_addr_t addr)
+{
+    size_t reg = addr & ~0x07;
+    uint64_t *base = (uint64_t *) &st->mmio_buf[reg];
+
+    switch (reg) {
+        case MMIO_CONTROL:
+            st->enabled = *base & 1;
+            break;
+        case MMIO_DEVICE_TABLE:
+            st->devtab = (unsigned char *) (*base & MMIO_DEVTAB_BASE_MASK);
+            st->devtab_len = ((*base & MMIO_DEVTAB_SIZE_MASK) + 1) *
+                             (MMIO_DEVTAB_SIZE_UNIT / MMIO_DEVTAB_ENTRY_SIZE);
+            printf("AMD IOMMU: set device table at %p, %zu entries.\n",
+                   st->devtab, st->devtab_len);
+            break;
+        case MMIO_COMMAND_BASE:
+            st->cmdbuf = (unsigned char *) (*base & MMIO_CMDBUF_BASE_MASK);
+            st->cmdbuf_len = 1UL << (st->mmio_buf[MMIO_CMDBUF_SIZE_BYTE] &
+                                     MMIO_CMDBUF_SIZE_MASK);
+            break;
+        case MMIO_COMMAND_HEAD:
+            st->cmdbuf_head = *base & MMIO_CMDBUF_HEAD_MASK;
+            break;
+        case MMIO_COMMAND_TAIL:
+            st->cmdbuf_tail = *base & MMIO_CMDBUF_TAIL_MASK;
+            break;
+        case MMIO_EVENT_BASE:
+            st->evtlog = (unsigned char *) (*base & MMIO_EVTLOG_BASE_MASK);
+            st->evtlog_len = 1UL << (st->mmio_buf[MMIO_EVTLOG_SIZE_BYTE] &
+                                     MMIO_EVTLOG_SIZE_MASK);
+            break;
+        case MMIO_EVENT_HEAD:
+            st->evtlog_head = *base & MMIO_EVTLOG_HEAD_MASK;
+            break;
+        case MMIO_EVENT_TAIL:
+            st->evtlog_tail = *base & MMIO_EVTLOG_TAIL_MASK;
+            break;
+        case MMIO_EXCL_BASE:
+            st->excl_base = (unsigned char *) (*base & MMIO_EXCL_BASE_MASK);
+            st->excl_enabled = *base & MMIO_EXCL_ENABLED_MASK;
+            st->excl_allow = *base & MMIO_EXCL_ALLOW_MASK;
+            break;
+        case MMIO_EXCL_LIMIT:
+            st->excl_limit = (unsigned char *) ((*base & MMIO_EXCL_LIMIT_MASK) |
+                                                MMIO_EXCL_LIMIT_LOW);
+            break;
+        default:
+            break;
+    }
+}
+
+static uint32_t amd_iommu_mmio_readb(void *opaque, target_phys_addr_t addr)
+{
+    struct amd_iommu_state *st = opaque;
+
+    printf("AMD IOMMU: byte read from %lx\n", addr);
+
+    return amd_iommu_mmio_buf_read(st, addr, 1);
+}
+
+static uint32_t amd_iommu_mmio_readw(void *opaque, target_phys_addr_t addr)
+{
+    struct amd_iommu_state *st = opaque;
+
+    printf("AMD IOMMU: word read from %lx\n", addr);
+
+    return amd_iommu_mmio_buf_read(st, addr, 2);
+}
+
+static uint32_t amd_iommu_mmio_readl(void *opaque, target_phys_addr_t addr)
+{
+    struct amd_iommu_state *st = opaque;
+
+    printf("AMD IOMMU: long read from %lx\n", addr);
+
+    return amd_iommu_mmio_buf_read(st, addr, 4);
+}
+
+static void amd_iommu_mmio_writeb(void *opaque,
+                                  target_phys_addr_t addr,
+                                  uint32_t val)
+{
+    struct amd_iommu_state *st = opaque;
+
+    printf("AMD IOMMU: byte write %x to %lx\n", val, addr);
+
+    amd_iommu_mmio_buf_write(st, addr, 1, val);
+    amd_iommu_update_mmio(st, addr);
+}
+
+static void amd_iommu_mmio_writew(void *opaque,
+                                  target_phys_addr_t addr,
+                                  uint32_t val)
+{
+    struct amd_iommu_state *st = opaque;
+
+    printf("AMD IOMMU: word write %x to %lx\n", val, addr);
+
+    amd_iommu_mmio_buf_write(st, addr, 2, val);
+    amd_iommu_update_mmio(st, addr);
+}
+
+static void amd_iommu_mmio_writel(void *opaque,
+                                  target_phys_addr_t addr,
+                                  uint32_t val)
+{
+    struct amd_iommu_state *st = opaque;
+
+    printf("AMD IOMMU: long write %x to %lx\n", val, addr);
+
+    amd_iommu_mmio_buf_write(st, addr, 4, val);
+    amd_iommu_update_mmio(st, addr);
+}
+
+static CPUReadMemoryFunc * const amd_iommu_mmio_read[] = {
+    amd_iommu_mmio_readb,
+    amd_iommu_mmio_readw,
+    amd_iommu_mmio_readl,
+};
+
+static CPUWriteMemoryFunc * const amd_iommu_mmio_write[] = {
+    amd_iommu_mmio_writeb,
+    amd_iommu_mmio_writew,
+    amd_iommu_mmio_writel,
+};
+
+static void amd_iommu_init_mmio(struct amd_iommu_state *st)
+{
+    st->mmio_buf[MMIO_CMDBUF_SIZE_BYTE] = MMIO_CMDBUF_DEFAULT_SIZE;
+    st->mmio_buf[MMIO_EVTLOG_SIZE_BYTE] = MMIO_EVTLOG_DEFAULT_SIZE;
+}
+
+static void amd_iommu_enable_mmio(struct amd_iommu_state *st)
+{
+    target_phys_addr_t addr;
+
+    st->mmio_index = cpu_register_io_memory(amd_iommu_mmio_read,
+                                            amd_iommu_mmio_write, st);
+    if (st->mmio_index < 0)
+        return;
+
+    addr = le64_to_cpu(*(uint64_t *) &st->capab[CAPAB_BAR_LOW]) & CAPAB_BAR_MASK;
+    cpu_register_physical_memory(addr, MMIO_SIZE, st->mmio_index);
+
+    st->mmio_addr = addr;
+    st->mmio_buf = qemu_mallocz(MMIO_SIZE);
+    st->mmio_enabled = 1;
+    amd_iommu_init_mmio(st);
+
+    printf("amd_iommu: enabled at %lx\n", addr);
+}
+
+static uint32_t amd_iommu_read_capab(PCIDevice *pci_dev,
+                                     uint32_t addr, int len)
+{
+    uint32_t val = pci_default_cap_read_config(pci_dev, addr, len);
+
+    printf("amd_iommu_read_capab: addr %x, len %x, val %x\n", addr, len, val);
+
+    return val;
+}
+
+static void amd_iommu_write_capab(PCIDevice *dev,
+                                  uint32_t addr, uint32_t val, int len)
+{
+    struct amd_iommu_state *st;
+    unsigned char *capab;
+    int reg;
+
+    st = DO_UPCAST(struct amd_iommu_state, dev, dev);
+    capab = st->capab;
+    reg = (addr - 0x40) & ~0x3;  /* Get the 32-bits register. */
+
+    printf("amd_iommu_write_capab: addr %x, val %x, len %x, reg %x\n", addr, val, len, reg);
+
+    switch (reg) {
+        case CAPAB_HEADER:
+        case CAPAB_MISC:
+            /* Read-only. */
+            return;
+        case CAPAB_BAR_LOW:
+        case CAPAB_BAR_HIGH:
+        case CAPAB_RANGE:
+            if (st->mmio_enabled) {
+                printf("amd_iommu_write_capab: already enabled, can't write!\n");
+                return;
+            }
+            pci_default_cap_write_config(dev, addr, val, len);
+            break;
+        default:
+            return;
+    }
+
+    if (capab[CAPAB_BAR_LOW] & 0x1)
+        amd_iommu_enable_mmio(st);
+}
+
+static int amd_iommu_init_capab(PCIDevice *dev)
+{
+    struct amd_iommu_state *st;
+    unsigned char *capab;
+
+    st = DO_UPCAST(struct amd_iommu_state, dev, dev);
+    capab = st->dev.config + st->capab_offset;
+
+    capab[CAPAB_REV_TYPE]  = CAPAB_INIT_REV_TYPE;
+    capab[CAPAB_FLAGS]     = CAPAB_INIT_FLAGS;
+    capab[CAPAB_BAR_LOW]   = 0;
+    capab[CAPAB_BAR_HIGH]  = 0;
+    capab[CAPAB_RANGE]     = 0;
+    *((uint32_t *) &capab[CAPAB_MISC]) = cpu_to_le32(CAPAB_INIT_MISC);
+
+    st->capab = capab;
+    st->dev.cap.length = CAPAB_SIZE;
+
+    printf("amd_iommu_init_capab: ran fine!\n");
+
+    return 0;
+}
+
+static int amd_iommu_pci_initfn(PCIDevice *dev)
+{
+    struct amd_iommu_state *st;
+    int err;
+
+    st = DO_UPCAST(struct amd_iommu_state, dev, dev);
+
+    pci_config_set_vendor_id(st->dev.config, PCI_VENDOR_ID_AMD);
+    pci_config_set_device_id(st->dev.config, PCI_DEVICE_ID_AMD_IOMMU);
+    pci_config_set_class(st->dev.config, PCI_CLASS_SYSTEM_IOMMU);
+
+    st->capab_offset = pci_add_capability(&st->dev,
+                                          PCI_CAP_ID_SEC,
+                                          CAPAB_SIZE);
+    err = pci_enable_capability_support(&st->dev, st->capab_offset,
+                                        amd_iommu_read_capab,
+                                        amd_iommu_write_capab,
+                                        amd_iommu_init_capab);
+
+    printf("amd_iommu_dev_init: finished, cap at 0x%x, size %d, err = %d\n",
+           st->capab_offset, CAPAB_SIZE, err);
+
+    return err;
+}
+
+static const VMStateDescription vmstate_amd_iommu = {
+    .name                       = "amd-iommu",
+    .version_id                 = 1,
+    .minimum_version_id         = 1,
+    .minimum_version_id_old     = 1,
+    .fields                     = (VMStateField []) {
+        VMSTATE_PCI_DEVICE(dev, struct amd_iommu_state),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static PCIDeviceInfo amd_iommu_pci_info = {
+    .qdev.name    = "amd-iommu",
+    .qdev.desc    = "AMD IOMMU",
+    .qdev.size    = sizeof(struct amd_iommu_state),
+    .qdev.vmsd    = &vmstate_amd_iommu,
+    .init         = amd_iommu_pci_initfn,
+};
+
+void amd_iommu_init(PCIBus *bus)
+{
+    pci_create_simple(bus, -1, "amd-iommu");
+    printf("amd_iommu_init: finished\n");
+}
+
+static void amd_iommu_register(void)
+{
+    pci_qdev_register(&amd_iommu_pci_info);
+    printf("amd_iommu_register: finished\n");
+}
+
+device_init(amd_iommu_register);
diff --git a/hw/pc.c b/hw/pc.c
index 4a4a706..1999530 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -998,6 +998,8 @@  void pc_pci_device_init(PCIBus *pci_bus)
     int max_bus;
     int bus;
 
+    amd_iommu_init(pci_bus);
+
     max_bus = drive_get_max_bus(IF_SCSI);
     for (bus = 0; bus <= max_bus; bus++) {
         pci_create_simple(pci_bus, -1, "lsi53c895a");
diff --git a/hw/pc.h b/hw/pc.h
index 20f621d..af47f13 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -190,4 +190,7 @@  void extboot_init(BlockDriverState *bs);
 
 int e820_add_entry(uint64_t, uint64_t, uint32_t);
 
+/* amd_iommu.c */
+void amd_iommu_init(PCIBus *bus);
+
 #endif
diff --git a/hw/pci_ids.h b/hw/pci_ids.h
index fe7a121..ac34779 100644
--- a/hw/pci_ids.h
+++ b/hw/pci_ids.h
@@ -26,6 +26,7 @@ 
 
 #define PCI_CLASS_MEMORY_RAM             0x0500
 
+#define PCI_CLASS_SYSTEM_IOMMU           0x0806
 #define PCI_CLASS_SYSTEM_OTHER           0x0880
 
 #define PCI_CLASS_SERIAL_USB             0x0c03
@@ -56,6 +57,7 @@ 
 
 #define PCI_VENDOR_ID_AMD                0x1022
 #define PCI_DEVICE_ID_AMD_LANCE          0x2000
+#define PCI_DEVICE_ID_AMD_IOMMU          0x0000     /* FIXME */
 
 #define PCI_VENDOR_ID_MOTOROLA           0x1057
 #define PCI_DEVICE_ID_MOTOROLA_MPC106    0x0002
diff --git a/hw/pci_regs.h b/hw/pci_regs.h
index 1c675dc..6399b5d 100644
--- a/hw/pci_regs.h
+++ b/hw/pci_regs.h
@@ -216,6 +216,7 @@ 
 #define  PCI_CAP_ID_SHPC 	0x0C	/* PCI Standard Hot-Plug Controller */
 #define  PCI_CAP_ID_SSVID	0x0D	/* Bridge subsystem vendor/device ID */
 #define  PCI_CAP_ID_AGP3	0x0E	/* AGP Target PCI-PCI bridge */
+#define  PCI_CAP_ID_SEC     0x0F    /* Secure Device (AMD IOMMU) */
 #define  PCI_CAP_ID_EXP 	0x10	/* PCI Express */
 #define  PCI_CAP_ID_MSIX	0x11	/* MSI-X */
 #define  PCI_CAP_ID_AF		0x13	/* PCI Advanced Features */
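
The byte-wise register accessors in hw/amd_iommu.c store values in little-endian order. A standalone sketch of that technique follows, assuming accesses of at most 4 bytes as in the patch's readb/readw/readl handlers. The function names are hypothetical, and the functions take a plain buffer rather than the patch's struct amd_iommu_state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read `size` bytes (0 to 4) starting at buf[offset] as a
 * little-endian value; a size of 0 yields 0. */
static uint32_t buf_read_le(const unsigned char *buf, size_t offset, size_t size)
{
    uint32_t ret = 0;
    size_t i;

    assert(size <= 4);
    /* Walk from the most significant byte down to the least significant. */
    for (i = size; i > 0; i--) {
        ret = (ret << 8) | buf[offset + i - 1];
    }
    return ret;
}

/* Store the low `size` bytes of val at buf[offset], least significant first. */
static void buf_write_le(unsigned char *buf, size_t offset, size_t size, uint32_t val)
{
    size_t i;

    assert(size <= 4);
    for (i = 0; i < size; i++) {
        buf[offset + i] = val & 0xFF;
        val >>= 8;
    }
}
```

Indexing with `offset + i - 1` while counting down keeps every access inside `[offset, offset + size)`, so a read never touches the byte just past the requested range.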