diff mbox

[PULL,v5,26/57] x86-iommu: introduce IEC notifiers

Message ID 1469123413-20809-27-git-send-email-mst@redhat.com
State New
Headers show

Commit Message

Michael S. Tsirkin July 21, 2016, 5:53 p.m. UTC
From: Peter Xu <peterx@redhat.com>

This patch introduces x86 IOMMU IEC (Interrupt Entry Cache)
invalidation notifier list. When vIOMMU receives IEC invalidate
request, all the registered units will be notified with specific
invalidation requests.

Intel IOMMU is the first provider that generates such a event.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/i386/intel_iommu_internal.h | 24 ++++++++++++++++++++----
 include/hw/i386/x86-iommu.h    | 40 ++++++++++++++++++++++++++++++++++++++++
 hw/i386/intel_iommu.c          | 36 +++++++++++++++++++++++++++++-------
 hw/i386/x86-iommu.c            | 29 +++++++++++++++++++++++++++++
 hw/i386/trace-events           |  3 +++
 5 files changed, 121 insertions(+), 11 deletions(-)

Comments

Jan Kiszka July 30, 2016, 7:52 a.m. UTC | #1
Sorry, only noticed this now in a discussion with David over the AMD IOMMU:

On 2016-07-21 19:53, Michael S. Tsirkin wrote:
> From: Peter Xu <peterx@redhat.com>
> 
> This patch introduces x86 IOMMU IEC (Interrupt Entry Cache)
> invalidation notifier list. When vIOMMU receives IEC invalidate
> request, all the registered units will be notified with specific
> invalidation requests.
> 
> Intel IOMMU is the first provider that generates such a event.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  hw/i386/intel_iommu_internal.h | 24 ++++++++++++++++++++----
>  include/hw/i386/x86-iommu.h    | 40 ++++++++++++++++++++++++++++++++++++++++
>  hw/i386/intel_iommu.c          | 36 +++++++++++++++++++++++++++++-------
>  hw/i386/x86-iommu.c            | 29 +++++++++++++++++++++++++++++
>  hw/i386/trace-events           |  3 +++
>  5 files changed, 121 insertions(+), 11 deletions(-)
> 

...

> @@ -57,4 +76,25 @@ struct X86IOMMUState {
>   */
>  X86IOMMUState *x86_iommu_get_default(void);
>  
> +/**
> + * x86_iommu_iec_register_notifier - register IEC (Interrupt Entry
> + *                                   Cache) notifiers
> + * @iommu: IOMMU device to register
> + * @fn: IEC notifier hook function
> + * @data: notifier private data
> + */
> +void x86_iommu_iec_register_notifier(X86IOMMUState *iommu,
> +                                     iec_notify_fn fn, void *data);
> +
> +/**
> + * x86_iommu_iec_notify_all - Notify IEC invalidations
> + * @iommu: IOMMU device that sends the notification
> + * @global: whether this is a global invalidation. If true, @index
> + *          and @mask are undefined.
> + * @index: starting index of interrupt entry to invalidate
> + * @mask: index mask for the invalidation

This is Intel'ish: index and mask refer to the single Intel IR table.
AMD has per-device tables.

But even for Intel: Would the index make any sense to the callbacks? KVM
uses (virtual and real) GSIs to address its routing entries, no?

I suspect we will have to redesign this once we want to make use of
non-global invalidation.

Jan
Peter Xu July 31, 2016, 3:59 a.m. UTC | #2
On Sat, Jul 30, 2016 at 09:52:48AM +0200, Jan Kiszka wrote:

[...]

> > +/**
> > + * x86_iommu_iec_notify_all - Notify IEC invalidations
> > + * @iommu: IOMMU device that sends the notification
> > + * @global: whether this is a global invalidation. If true, @index
> > + *          and @mask are undefined.
> > + * @index: starting index of interrupt entry to invalidate
> > + * @mask: index mask for the invalidation
> 
> This is Intel'ish: index and mask refer to the single Intel IR table.
> AMD has per-device tables.

Actually I was trying to consider this before when designing about the
interface...

> 
> But even for Intel: Would the index make any sense to the callbacks? KVM
> uses (virtual and real) GSIs to address its routing entries, no?
> 
> I suspect we will have to redesign this once we want to make use of
> non-global invalidation.

IIUC here the problem is how we should manage the mapping between GSI
routes and IRTE index (or device IDs, but let's talk later about
device IDs, since we can map a device-id invalidation into several
standalone index invalidations)? Or say, who should maintain this? IEC
invalidation consumers (e.g., IRQFD logic, IOAPIC, ...), or IOMMU?

IMHO, I would prefer the consumers to maintain this, not IOMMU. So I
would prefer a raw notification interface (index, mask, device-id,
etc. rather than GSI route index), and the consumers are responsible
to translate this message for their own sake.

The reason is simple: what if we have some other components (besides
GSI routes) that will register to this notifier? Though I am not sure
whether there would be one in the future, but letting IOMMU knowing
about something like GSI route index is a little bit odd to me.

Take irqfd as an example, currently MSIRouteEntry is defined as:

    struct MSIRouteEntry {
        PCIDevice *dev;             /* Device pointer */
        int vector;                 /* MSI/MSIX vector index */
        int virq;                   /* Virtual IRQ index */
        QLIST_ENTRY(MSIRouteEntry) list;
    };

When we want to support explicit IEC invalidation, we may need an
extra field like:

        uint32_t index;             /* IRTE index */

So when MSI routes are invalidated, we can translated the raw index
information into virq by simply looking up the MSIRouteEntry list.

Regarding to AMD's device-id interface...

I see that AMD IOMMUs do not have a global IRTE index table, not sure
whether we can "define" one? E.g. IIUC AMD IOMMU IRTE will have 13
bits index for each device, so how about making a global index like:

  device-id (16 bits) + 000b (3 bits) + index_per_device (13 bits)

to form a 32 bit index. So when AMD IOMMUs got a invalidation request,
IOMMU can translate this per-device invalidation into several
invalidations for specific IRTE entries? Not sure whether this would
work.

Thanks,

-- peterx
Jan Kiszka July 31, 2016, 12:01 p.m. UTC | #3
On 2016-07-31 05:59, Peter Xu wrote:
> On Sat, Jul 30, 2016 at 09:52:48AM +0200, Jan Kiszka wrote:
> 
> [...]
> 
>>> +/**
>>> + * x86_iommu_iec_notify_all - Notify IEC invalidations
>>> + * @iommu: IOMMU device that sends the notification
>>> + * @global: whether this is a global invalidation. If true, @index
>>> + *          and @mask are undefined.
>>> + * @index: starting index of interrupt entry to invalidate
>>> + * @mask: index mask for the invalidation
>>
>> This is Intel'ish: index and mask refer to the single Intel IR table.
>> AMD has per-device tables.
> 
> Actually I was trying to consider this before when designing about the
> interface...
> 
>>
>> But even for Intel: Would the index make any sense to the callbacks? KVM
>> uses (virtual and real) GSIs to address its routing entries, no?
>>
>> I suspect we will have to redesign this once we want to make use of
>> non-global invalidation.
> 
> IIUC here the problem is how we should manage the mapping between GSI
> routes and IRTE index (or device IDs, but let's talk later about
> device IDs, since we can map a device-id invalidation into several
> standalone index invalidations)? Or say, who should maintain this? IEC
> invalidation consumers (e.g., IRQFD logic, IOAPIC, ...), or IOMMU?
> 
> IMHO, I would prefer the consumers to maintain this, not IOMMU. So I
> would prefer a raw notification interface (index, mask, device-id,
> etc. rather than GSI route index), and the consumers are responsible
> to translate this message for their own sake.
> 
> The reason is simple: what if we have some other components (besides
> GSI routes) that will register to this notifier? Though I am not sure
> whether there would be one in the future, but letting IOMMU knowing
> about something like GSI route index is a little bit odd to me.
> 
> Take irqfd as an example, currently MSIRouteEntry is defined as:
> 
>     struct MSIRouteEntry {
>         PCIDevice *dev;             /* Device pointer */
>         int vector;                 /* MSI/MSIX vector index */
>         int virq;                   /* Virtual IRQ index */
>         QLIST_ENTRY(MSIRouteEntry) list;
>     };
> 
> When we want to support explicit IEC invalidation, we may need an
> extra field like:
> 
>         uint32_t index;             /* IRTE index */
> 
> So when MSI routes are invalidated, we can translated the raw index
> information into virq by simply looking up the MSIRouteEntry list.
> 
> Regarding to AMD's device-id interface...
> 
> I see that AMD IOMMUs do not have a global IRTE index table, not sure
> whether we can "define" one? E.g. IIUC AMD IOMMU IRTE will have 13
> bits index for each device, so how about making a global index like:
> 
>   device-id (16 bits) + 000b (3 bits) + index_per_device (13 bits)
> 
> to form a 32 bit index. So when AMD IOMMUs got a invalidation request,
> IOMMU can translate this per-device invalidation into several
> invalidations for specific IRTE entries? Not sure whether this would
> work.

Yes, there has to be a generic handle for each translation result an
IOMMU generated. This handle can be stored on the consumer side along
with the translation request. How a handle is generated should be
completely up to the IOMMU.

The consumer should receive a 32-bit (or more) opaque value with each
translation request (separate parameter) and then again on specific
invalidation. The latter case may also report a range of handles, to
make things more efficient (provided the consumer store those handles
close to each other).

Jan
Peter Xu July 31, 2016, 12:51 p.m. UTC | #4
On Sun, Jul 31, 2016 at 02:01:26PM +0200, Jan Kiszka wrote:

[...]

> Yes, there has to be a generic handle for each translation result an
> IOMMU generated. This handle can be stored on the consumer side along
> with the translation request. How a handle is generated should be
> completely up to the IOMMU.
> 
> The consumer should receive a 32-bit (or more) opaque value with each
> translation request (separate parameter) and then again on specific
> invalidation. The latter case may also report a range of handles, to
> make things more efficient (provided the consumer store those handles
> close to each other).

Agreed. So I think masking can still survive to describe "a range of
handles" for AMD IOMMUs, as long as we are sure the masked range
covers the maximum index number of IRTE entry for specific device.

Thanks,

-- peterx
diff mbox

Patch

diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index e1a08cb..10c20fe 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -296,12 +296,28 @@  typedef enum VTDFaultReason {
 
 #define VTD_CONTEXT_CACHE_GEN_MAX       0xffffffffUL
 
+/* Interrupt Entry Cache Invalidation Descriptor: VT-d 6.5.2.7. */
+struct VTDInvDescIEC {
+    uint32_t type:4;            /* Should always be 0x4 */
+    uint32_t granularity:1;     /* If set, it's global IR invalidation */
+    uint32_t resved_1:22;
+    uint32_t index_mask:5;      /* 2^N for continuous int invalidation */
+    uint32_t index:16;          /* Start index to invalidate */
+    uint32_t reserved_2:16;
+};
+typedef struct VTDInvDescIEC VTDInvDescIEC;
+
 /* Queued Invalidation Descriptor */
-struct VTDInvDesc {
-    uint64_t lo;
-    uint64_t hi;
+union VTDInvDesc {
+    struct {
+        uint64_t lo;
+        uint64_t hi;
+    };
+    union {
+        VTDInvDescIEC iec;
+    };
 };
-typedef struct VTDInvDesc VTDInvDesc;
+typedef union VTDInvDesc VTDInvDesc;
 
 /* Masks for struct VTDInvDesc */
 #define VTD_INV_DESC_TYPE               0xf
diff --git a/include/hw/i386/x86-iommu.h b/include/hw/i386/x86-iommu.h
index fa6ce31..c48e8dd 100644
--- a/include/hw/i386/x86-iommu.h
+++ b/include/hw/i386/x86-iommu.h
@@ -46,9 +46,28 @@  struct X86IOMMUClass {
                      MSIMessage *dst, uint16_t sid);
 };
 
+/**
+ * iec_notify_fn - IEC (Interrupt Entry Cache) notifier hook,
+ *                 triggered when IR invalidation happens.
+ * @private: private data
+ * @global: whether this is a global IEC invalidation
+ * @index: IRTE index to invalidate (start from)
+ * @mask: invalidation mask
+ */
+typedef void (*iec_notify_fn)(void *private, bool global,
+                              uint32_t index, uint32_t mask);
+
+struct IEC_Notifier {
+    iec_notify_fn iec_notify;
+    void *private;
+    QLIST_ENTRY(IEC_Notifier) list;
+};
+typedef struct IEC_Notifier IEC_Notifier;
+
 struct X86IOMMUState {
     SysBusDevice busdev;
     bool intr_supported;        /* Whether vIOMMU supports IR */
+    QLIST_HEAD(, IEC_Notifier) iec_notifiers; /* IEC notify list */
 };
 
 /**
@@ -57,4 +76,25 @@  struct X86IOMMUState {
  */
 X86IOMMUState *x86_iommu_get_default(void);
 
+/**
+ * x86_iommu_iec_register_notifier - register IEC (Interrupt Entry
+ *                                   Cache) notifiers
+ * @iommu: IOMMU device to register
+ * @fn: IEC notifier hook function
+ * @data: notifier private data
+ */
+void x86_iommu_iec_register_notifier(X86IOMMUState *iommu,
+                                     iec_notify_fn fn, void *data);
+
+/**
+ * x86_iommu_iec_notify_all - Notify IEC invalidations
+ * @iommu: IOMMU device that sends the notification
+ * @global: whether this is a global invalidation. If true, @index
+ *          and @mask are undefined.
+ * @index: starting index of interrupt entry to invalidate
+ * @mask: index mask for the invalidation
+ */
+void x86_iommu_iec_notify_all(X86IOMMUState *iommu, bool global,
+                              uint32_t index, uint32_t mask);
+
 #endif
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index c7ded0f..2acec85 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -904,6 +904,12 @@  static void vtd_root_table_setup(IntelIOMMUState *s)
                 (s->root_extended ? "(extended)" : ""));
 }
 
+static void vtd_iec_notify_all(IntelIOMMUState *s, bool global,
+                               uint32_t index, uint32_t mask)
+{
+    x86_iommu_iec_notify_all(X86_IOMMU_DEVICE(s), global, index, mask);
+}
+
 static void vtd_interrupt_remap_table_setup(IntelIOMMUState *s)
 {
     uint64_t value = 0;
@@ -911,7 +917,8 @@  static void vtd_interrupt_remap_table_setup(IntelIOMMUState *s)
     s->intr_size = 1UL << ((value & VTD_IRTA_SIZE_MASK) + 1);
     s->intr_root = value & VTD_IRTA_ADDR_MASK;
 
-    /* TODO: invalidate interrupt entry cache */
+    /* Notify global invalidation */
+    vtd_iec_notify_all(s, true, 0, 0);
 
     VTD_DPRINTF(CSR, "int remap table addr 0x%"PRIx64 " size %"PRIu32,
                 s->intr_root, s->intr_size);
@@ -1413,6 +1420,21 @@  static bool vtd_process_iotlb_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
     return true;
 }
 
+static bool vtd_process_inv_iec_desc(IntelIOMMUState *s,
+                                     VTDInvDesc *inv_desc)
+{
+    VTD_DPRINTF(INV, "inv ir glob %d index %d mask %d",
+                inv_desc->iec.granularity,
+                inv_desc->iec.index,
+                inv_desc->iec.index_mask);
+
+    vtd_iec_notify_all(s, !inv_desc->iec.granularity,
+                       inv_desc->iec.index,
+                       inv_desc->iec.index_mask);
+
+    return true;
+}
+
 static bool vtd_process_inv_desc(IntelIOMMUState *s)
 {
     VTDInvDesc inv_desc;
@@ -1453,12 +1475,12 @@  static bool vtd_process_inv_desc(IntelIOMMUState *s)
         break;
 
     case VTD_INV_DESC_IEC:
-        VTD_DPRINTF(INV, "Interrupt Entry Cache Invalidation "
-                    "not implemented yet");
-        /*
-         * Since currently we do not cache interrupt entries, we can
-         * just mark this descriptor as "good" and move on.
-         */
+        VTD_DPRINTF(INV, "Invalidation Interrupt Entry Cache "
+                    "Descriptor hi 0x%"PRIx64 " lo 0x%"PRIx64,
+                    inv_desc.hi, inv_desc.lo);
+        if (!vtd_process_inv_iec_desc(s, &inv_desc)) {
+            return false;
+        }
         break;
 
     default:
diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
index 4280839..ce26b2a 100644
--- a/hw/i386/x86-iommu.c
+++ b/hw/i386/x86-iommu.c
@@ -22,6 +22,33 @@ 
 #include "hw/boards.h"
 #include "hw/i386/x86-iommu.h"
 #include "qemu/error-report.h"
+#include "trace.h"
+
+void x86_iommu_iec_register_notifier(X86IOMMUState *iommu,
+                                     iec_notify_fn fn, void *data)
+{
+    IEC_Notifier *notifier = g_new0(IEC_Notifier, 1);
+
+    notifier->iec_notify = fn;
+    notifier->private = data;
+
+    QLIST_INSERT_HEAD(&iommu->iec_notifiers, notifier, list);
+}
+
+void x86_iommu_iec_notify_all(X86IOMMUState *iommu, bool global,
+                              uint32_t index, uint32_t mask)
+{
+    IEC_Notifier *notifier;
+
+    trace_x86_iommu_iec_notify(global, index, mask);
+
+    QLIST_FOREACH(notifier, &iommu->iec_notifiers, list) {
+        if (notifier->iec_notify) {
+            notifier->iec_notify(notifier->private, global,
+                                 index, mask);
+        }
+    }
+}
 
 /* Default X86 IOMMU device */
 static X86IOMMUState *x86_iommu_default = NULL;
@@ -46,7 +73,9 @@  X86IOMMUState *x86_iommu_get_default(void)
 
 static void x86_iommu_realize(DeviceState *dev, Error **errp)
 {
+    X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev);
     X86IOMMUClass *x86_class = X86_IOMMU_GET_CLASS(dev);
+    QLIST_INIT(&x86_iommu->iec_notifiers);
     if (x86_class->realize) {
         x86_class->realize(dev, errp);
     }
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index ea77bc2..b4882c1 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -10,3 +10,6 @@  xen_pv_mmio_write(uint64_t addr) "WARNING: write to Xen PV Device MMIO space (ad
 # hw/i386/pc.c
 mhp_pc_dimm_assigned_slot(int slot) "0x%d"
 mhp_pc_dimm_assigned_address(uint64_t addr) "0x%"PRIx64
+
+# hw/i386/x86-iommu.c
+x86_iommu_iec_notify(bool global, uint32_t index, uint32_t mask) "Notify IEC invalidation: global=%d index=%" PRIu32 " mask=%" PRIu32