diff mbox

intel_iommu: make sure its init before PCI dev

Message ID 1487742565-2513-1-git-send-email-peterx@redhat.com
State New
Headers show

Commit Message

Peter Xu Feb. 22, 2017, 5:49 a.m. UTC
Intel vIOMMU devices are created with "-device" parameter, while here
actually we need to make sure this device will be created before some
other PCI devices (like vfio-pci devices) so that we know iommu_fn will
be setup correctly before realizations of those PCI devices.

Here we do explicit check to make sure intel-iommu device will be inited
before all the rest of the PCI devices. This is done by checking against
the devices dangled under current root PCIe bus and we should see
nothing there besides integrated ICH9 ones.

If the user violated this rule, we abort the program.

Maybe one day we will be able to manage the ordering of device
initialization, and then we can grant VT-d devices a higher init
priority. But before that, let's have this explicit check to make sure
of it.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/intel_iommu.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

Comments

Jintack Lim Feb. 22, 2017, 11:42 a.m. UTC | #1
On Wed, Feb 22, 2017 at 12:49 AM, Peter Xu <peterx@redhat.com> wrote:

> Intel vIOMMU devices are created with "-device" parameter, while here
> actually we need to make sure this device will be created before some
> other PCI devices (like vfio-pci devices) so that we know iommu_fn will
> be setup correctly before realizations of those PCI devices.
>
> Here we do explicit check to make sure intel-iommu device will be inited
> before all the rest of the PCI devices. This is done by checking against
> the devices dangled under current root PCIe bus and we should see
> nothing there besides integrated ICH9 ones.
>
> If the user violated this rule, we abort the program.
>

Hi Peter,

After applying this patch, qemu gave the following error and was
terminated, but I believe I passed parameters in a right order?

[kvm-node ~]$sudo qemu-system-x86_64 \
> -device intel-iommu,intremap=on,eim=off,caching-mode=on \
> -M q35,accel=kvm,kernel-irqchip=split \
> -m 12G \
> -drive file=/mydata/guest0.img,format=raw --nographic -cpu host \
> -smp 4,sockets=4,cores=1,threads=1 \
> -device vfio-pci,host=08:00.0
qemu-system-x86_64: -device
intel-iommu,intremap=on,eim=off,caching-mode=on: Please init intel-iommu
before other PCI devices

 Thanks,
Jintack


> Maybe one day we will be able to manage the ordering of device
> initialization, and then we can grant VT-d devices a higher init
> priority. But before that, let's have this explicit check to make sure
> of it.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/i386/intel_iommu.c | 40 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
>
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 22d8226..db74124 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -31,6 +31,7 @@
>  #include "hw/i386/apic-msidef.h"
>  #include "hw/boards.h"
>  #include "hw/i386/x86-iommu.h"
> +#include "hw/i386/ich9.h"
>  #include "hw/pci-host/q35.h"
>  #include "sysemu/kvm.h"
>  #include "hw/i386/apic_internal.h"
> @@ -2560,6 +2561,41 @@ static bool vtd_decide_config(IntelIOMMUState *s,
> Error **errp)
>      return true;
>  }
>
> +static bool vtd_has_inited_pci_devices(PCIBus *bus, Error **errp)
> +{
> +    int i;
> +    uint8_t func;
> +
> +    /* We check against root bus */
> +    assert(bus && pci_bus_is_root(bus));
> +
> +    /*
> +     * We need to make sure vIOMMU device is created before other PCI
> +     * devices other than the integrated ICH9 ones, so that they can
> +     * get correct iommu_fn setup even during its realize(). Some
> +     * devices (e.g., vfio-pci) will need a correct iommu_fn to work.
> +     */
> +    for (i = 1; i < PCI_FUNC_MAX * PCI_SLOT_MAX; i++) {
> +        /* Skip the checking against ICH9 integrated devices */
> +        if (PCI_SLOT(i) == ICH9_LPC_DEV) {
> +            func = PCI_FUNC(i);
> +            if (func == ICH9_LPC_FUNC ||
> +                func == ICH9_SATA1_FUNC ||
> +                func == ICH9_SMB_FUNC) {
> +                continue;
> +            }
> +        }
> +
> +        if (bus->devices[i]) {
> +            error_setg(errp, "Please init intel-iommu before "
> +                       "other PCI devices");
> +            return true;
> +        }
> +    }
> +
> +    return false;
> +}
> +
>  static void vtd_realize(DeviceState *dev, Error **errp)
>  {
>      PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
> @@ -2567,6 +2603,10 @@ static void vtd_realize(DeviceState *dev, Error
> **errp)
>      IntelIOMMUState *s = INTEL_IOMMU_DEVICE(dev);
>      X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev);
>
> +    if (vtd_has_inited_pci_devices(bus, errp)) {
> +        return;
> +    }
> +
>      VTD_DPRINTF(GENERAL, "");
>      x86_iommu->type = TYPE_INTEL;
>
> --
> 2.7.4
>
>
>
Jintack Lim Feb. 22, 2017, 1:37 p.m. UTC | #2
On Wed, Feb 22, 2017 at 6:42 AM, Jintack Lim <jintack@cs.columbia.edu>
wrote:

>
>
> On Wed, Feb 22, 2017 at 12:49 AM, Peter Xu <peterx@redhat.com> wrote:
>
>> Intel vIOMMU devices are created with "-device" parameter, while here
>> actually we need to make sure this device will be created before some
>> other PCI devices (like vfio-pci devices) so that we know iommu_fn will
>> be setup correctly before realizations of those PCI devices.
>>
>> Here we do explicit check to make sure intel-iommu device will be inited
>> before all the rest of the PCI devices. This is done by checking against
>> the devices dangled under current root PCIe bus and we should see
>> nothing there besides integrated ICH9 ones.
>>
>> If the user violated this rule, we abort the program.
>>
>
> Hi Peter,
>
> After applying this patch, qemu gave the following error and was
> terminated, but I believe I passed parameters in a right order?
>

FYI, I've applied this patch to your vtd-vfio-enablement-v7 branch.


>
> [kvm-node ~]$sudo qemu-system-x86_64 \
> > -device intel-iommu,intremap=on,eim=off,caching-mode=on \
> > -M q35,accel=kvm,kernel-irqchip=split \
> > -m 12G \
> > -drive file=/mydata/guest0.img,format=raw --nographic -cpu host \
> > -smp 4,sockets=4,cores=1,threads=1 \
> > -device vfio-pci,host=08:00.0
> qemu-system-x86_64: -device intel-iommu,intremap=on,eim=off,caching-mode=on:
> Please init intel-iommu before other PCI devices
>
>  Thanks,
> Jintack
>
>
>> Maybe one day we will be able to manage the ordering of device
>> initialization, and then we can grant VT-d devices a higher init
>> priority. But before that, let's have this explicit check to make sure
>> of it.
>>
>> Signed-off-by: Peter Xu <peterx@redhat.com>
>> ---
>>  hw/i386/intel_iommu.c | 40 ++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 40 insertions(+)
>>
>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>> index 22d8226..db74124 100644
>> --- a/hw/i386/intel_iommu.c
>> +++ b/hw/i386/intel_iommu.c
>> @@ -31,6 +31,7 @@
>>  #include "hw/i386/apic-msidef.h"
>>  #include "hw/boards.h"
>>  #include "hw/i386/x86-iommu.h"
>> +#include "hw/i386/ich9.h"
>>  #include "hw/pci-host/q35.h"
>>  #include "sysemu/kvm.h"
>>  #include "hw/i386/apic_internal.h"
>> @@ -2560,6 +2561,41 @@ static bool vtd_decide_config(IntelIOMMUState *s,
>> Error **errp)
>>      return true;
>>  }
>>
>> +static bool vtd_has_inited_pci_devices(PCIBus *bus, Error **errp)
>> +{
>> +    int i;
>> +    uint8_t func;
>> +
>> +    /* We check against root bus */
>> +    assert(bus && pci_bus_is_root(bus));
>> +
>> +    /*
>> +     * We need to make sure vIOMMU device is created before other PCI
>> +     * devices other than the integrated ICH9 ones, so that they can
>> +     * get correct iommu_fn setup even during its realize(). Some
>> +     * devices (e.g., vfio-pci) will need a correct iommu_fn to work.
>> +     */
>> +    for (i = 1; i < PCI_FUNC_MAX * PCI_SLOT_MAX; i++) {
>> +        /* Skip the checking against ICH9 integrated devices */
>> +        if (PCI_SLOT(i) == ICH9_LPC_DEV) {
>> +            func = PCI_FUNC(i);
>> +            if (func == ICH9_LPC_FUNC ||
>> +                func == ICH9_SATA1_FUNC ||
>> +                func == ICH9_SMB_FUNC) {
>> +                continue;
>> +            }
>> +        }
>> +
>> +        if (bus->devices[i]) {
>> +            error_setg(errp, "Please init intel-iommu before "
>> +                       "other PCI devices");
>> +            return true;
>> +        }
>> +    }
>> +
>> +    return false;
>> +}
>> +
>>  static void vtd_realize(DeviceState *dev, Error **errp)
>>  {
>>      PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
>> @@ -2567,6 +2603,10 @@ static void vtd_realize(DeviceState *dev, Error
>> **errp)
>>      IntelIOMMUState *s = INTEL_IOMMU_DEVICE(dev);
>>      X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev);
>>
>> +    if (vtd_has_inited_pci_devices(bus, errp)) {
>> +        return;
>> +    }
>> +
>>      VTD_DPRINTF(GENERAL, "");
>>      x86_iommu->type = TYPE_INTEL;
>>
>> --
>> 2.7.4
>>
>>
>>
>
Alex Williamson Feb. 22, 2017, 5:30 p.m. UTC | #3
On Wed, 22 Feb 2017 13:49:25 +0800
Peter Xu <peterx@redhat.com> wrote:

> Intel vIOMMU devices are created with "-device" parameter, while here
> actually we need to make sure this device will be created before some
> other PCI devices (like vfio-pci devices) so that we know iommu_fn will
> be setup correctly before realizations of those PCI devices.
> 
> Here we do explicit check to make sure intel-iommu device will be inited
> before all the rest of the PCI devices. This is done by checking against
> the devices dangled under current root PCIe bus and we should see
> nothing there besides integrated ICH9 ones.
> 
> If the user violated this rule, we abort the program.
> 
> Maybe one day we will be able to manage the ordering of device
> initialization, and then we can grant VT-d devices a higher init
> priority. But before that, let's have this explicit check to make sure
> of it.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/i386/intel_iommu.c | 40 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 22d8226..db74124 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -31,6 +31,7 @@
>  #include "hw/i386/apic-msidef.h"
>  #include "hw/boards.h"
>  #include "hw/i386/x86-iommu.h"
> +#include "hw/i386/ich9.h"
>  #include "hw/pci-host/q35.h"
>  #include "sysemu/kvm.h"
>  #include "hw/i386/apic_internal.h"
> @@ -2560,6 +2561,41 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
>      return true;
>  }
>  
> +static bool vtd_has_inited_pci_devices(PCIBus *bus, Error **errp)
> +{
> +    int i;
> +    uint8_t func;
> +
> +    /* We check against root bus */
> +    assert(bus && pci_bus_is_root(bus));
> +
> +    /*
> +     * We need to make sure vIOMMU device is created before other PCI
> +     * devices other than the integrated ICH9 ones, so that they can
> +     * get correct iommu_fn setup even during its realize(). Some
> +     * devices (e.g., vfio-pci) will need a correct iommu_fn to work.
> +     */
> +    for (i = 1; i < PCI_FUNC_MAX * PCI_SLOT_MAX; i++) {
> +        /* Skip the checking against ICH9 integrated devices */
> +        if (PCI_SLOT(i) == ICH9_LPC_DEV) {
> +            func = PCI_FUNC(i);
> +            if (func == ICH9_LPC_FUNC ||
> +                func == ICH9_SATA1_FUNC ||
> +                func == ICH9_SMB_FUNC) {
> +                continue;
> +            }
> +        }


Whitelisting specific devfns seems pretty sketchy and fragile.  Can we
even assume we're on a Q35 chipset?  I don't see that vtd_realize()
takes any particular precautions not to allow initialization on 440fx,
or whatever generic chipset we come up with next that may not have
these specific devices.  It would probably be a better idea to use
object_dynamic_cast() if you want to whitelist specific devices.
Perhaps this could even be used to validate the chipset as well.
Thanks,

Alex

> +
> +        if (bus->devices[i]) {
> +            error_setg(errp, "Please init intel-iommu before "
> +                       "other PCI devices");
> +            return true;
> +        }
> +    }
> +
> +    return false;
> +}
> +
>  static void vtd_realize(DeviceState *dev, Error **errp)
>  {
>      PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
> @@ -2567,6 +2603,10 @@ static void vtd_realize(DeviceState *dev, Error **errp)
>      IntelIOMMUState *s = INTEL_IOMMU_DEVICE(dev);
>      X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev);
>  
> +    if (vtd_has_inited_pci_devices(bus, errp)) {
> +        return;
> +    }
> +
>      VTD_DPRINTF(GENERAL, "");
>      x86_iommu->type = TYPE_INTEL;
>
Peter Xu Feb. 23, 2017, 2:35 a.m. UTC | #4
On Wed, Feb 22, 2017 at 08:37:32AM -0500, Jintack Lim wrote:
> On Wed, Feb 22, 2017 at 6:42 AM, Jintack Lim <jintack@cs.columbia.edu>
> wrote:
> 
> >
> >
> > On Wed, Feb 22, 2017 at 12:49 AM, Peter Xu <peterx@redhat.com> wrote:
> >
> >> Intel vIOMMU devices are created with "-device" parameter, while here
> >> actually we need to make sure this device will be created before some
> >> other PCI devices (like vfio-pci devices) so that we know iommu_fn will
> >> be setup correctly before realizations of those PCI devices.
> >>
> >> Here we do explicit check to make sure intel-iommu device will be inited
> >> before all the rest of the PCI devices. This is done by checking against
> >> the devices dangled under current root PCIe bus and we should see
> >> nothing there besides integrated ICH9 ones.
> >>
> >> If the user violated this rule, we abort the program.
> >>
> >
> > Hi Peter,
> >
> > After applying this patch, qemu gave the following error and was
> > terminated, but I believe I passed parameters in a right order?
> >
> 
> FYI, I've applied this patch to your vtd-vfio-enablement-v7 branch.

I think the problem is that I was always testing with "-nodefaults",
while that's not your case... And without "-nodefaults" we'll have
these two devices created before all the rest:

00:01.0 VGA compatible controller: Device 1234:1111 (rev 02)
00:02.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 03)

So Intel vIOMMU complaint. :(

Thanks for the report! I'll think of something better and prepare
another version.

-- peterx
Peter Xu Feb. 23, 2017, 3:06 a.m. UTC | #5
On Wed, Feb 22, 2017 at 10:30:47AM -0700, Alex Williamson wrote:
> On Wed, 22 Feb 2017 13:49:25 +0800
> Peter Xu <peterx@redhat.com> wrote:
> 
> > Intel vIOMMU devices are created with "-device" parameter, while here
> > actually we need to make sure this device will be created before some
> > other PCI devices (like vfio-pci devices) so that we know iommu_fn will
> > be setup correctly before realizations of those PCI devices.
> > 
> > Here we do explicit check to make sure intel-iommu device will be inited
> > before all the rest of the PCI devices. This is done by checking against
> > the devices dangled under current root PCIe bus and we should see
> > nothing there besides integrated ICH9 ones.
> > 
> > If the user violated this rule, we abort the program.
> > 
> > Maybe one day we will be able to manage the ordering of device
> > initialization, and then we can grant VT-d devices a higher init
> > priority. But before that, let's have this explicit check to make sure
> > of it.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  hw/i386/intel_iommu.c | 40 ++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 40 insertions(+)
> > 
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index 22d8226..db74124 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -31,6 +31,7 @@
> >  #include "hw/i386/apic-msidef.h"
> >  #include "hw/boards.h"
> >  #include "hw/i386/x86-iommu.h"
> > +#include "hw/i386/ich9.h"
> >  #include "hw/pci-host/q35.h"
> >  #include "sysemu/kvm.h"
> >  #include "hw/i386/apic_internal.h"
> > @@ -2560,6 +2561,41 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
> >      return true;
> >  }
> >  
> > +static bool vtd_has_inited_pci_devices(PCIBus *bus, Error **errp)
> > +{
> > +    int i;
> > +    uint8_t func;
> > +
> > +    /* We check against root bus */
> > +    assert(bus && pci_bus_is_root(bus));
> > +
> > +    /*
> > +     * We need to make sure vIOMMU device is created before other PCI
> > +     * devices other than the integrated ICH9 ones, so that they can
> > +     * get correct iommu_fn setup even during its realize(). Some
> > +     * devices (e.g., vfio-pci) will need a correct iommu_fn to work.
> > +     */
> > +    for (i = 1; i < PCI_FUNC_MAX * PCI_SLOT_MAX; i++) {
> > +        /* Skip the checking against ICH9 integrated devices */
> > +        if (PCI_SLOT(i) == ICH9_LPC_DEV) {
> > +            func = PCI_FUNC(i);
> > +            if (func == ICH9_LPC_FUNC ||
> > +                func == ICH9_SATA1_FUNC ||
> > +                func == ICH9_SMB_FUNC) {
> > +                continue;
> > +            }
> > +        }
> 
> 
> Whitelisting specific devfns seems pretty sketchy and fragile.  Can we
> even assume we're on a Q35 chipset?  I don't see that vtd_realize()
> takes any particular precautions not to allow initialization on 440fx,
> or whatever generic chipset we come up with next that may not have
> these specific devices.

Yes. IIUC VT-d now can only work with Q35, right? Cc Marcel and Paolo
in case I misunderstood it.

If so, maybe here we should check against q35 in vtd realization. How
about something like:

    if (!object_property_find((Object *)pcms, "q35", NULL)) {
        error_setg(errp, "Currently Intel vIOMMU only support Q35 platform. "
                   "Please specify \"-M q35\" to enable it.");
        return;
    }

?

> It would probably be a better idea to use
> object_dynamic_cast() if you want to whitelist specific devices.
> Perhaps this could even be used to validate the chipset as well.

Now Jintack reported another issue, that we may have two default
devices there if not specifying "-nodefaults", and that two devices
will always be the first ones to be inited.

How about here we just explicitly check against vfio-pci devices, so
we just make sure vfio-pci devices will be put after intel-iommu?
Since actually vfio-pci devices are the only ones that we know we need
to be inited explicitly after the VT-d device.

Thanks,

> Thanks,
> 
> Alex
> 
> > +
> > +        if (bus->devices[i]) {
> > +            error_setg(errp, "Please init intel-iommu before "
> > +                       "other PCI devices");
> > +            return true;
> > +        }
> > +    }
> > +
> > +    return false;
> > +}
> > +
> >  static void vtd_realize(DeviceState *dev, Error **errp)
> >  {
> >      PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
> > @@ -2567,6 +2603,10 @@ static void vtd_realize(DeviceState *dev, Error **errp)
> >      IntelIOMMUState *s = INTEL_IOMMU_DEVICE(dev);
> >      X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev);
> >  
> > +    if (vtd_has_inited_pci_devices(bus, errp)) {
> > +        return;
> > +    }
> > +
> >      VTD_DPRINTF(GENERAL, "");
> >      x86_iommu->type = TYPE_INTEL;
> >  
> 

-- peterx
Alex Williamson Feb. 23, 2017, 3:24 a.m. UTC | #6
On Thu, 23 Feb 2017 11:06:47 +0800
Peter Xu <peterx@redhat.com> wrote:

> On Wed, Feb 22, 2017 at 10:30:47AM -0700, Alex Williamson wrote:
> > On Wed, 22 Feb 2017 13:49:25 +0800
> > Peter Xu <peterx@redhat.com> wrote:
> >   
> > > Intel vIOMMU devices are created with "-device" parameter, while here
> > > actually we need to make sure this device will be created before some
> > > other PCI devices (like vfio-pci devices) so that we know iommu_fn will
> > > be setup correctly before realizations of those PCI devices.
> > > 
> > > Here we do explicit check to make sure intel-iommu device will be inited
> > > before all the rest of the PCI devices. This is done by checking against
> > > the devices dangled under current root PCIe bus and we should see
> > > nothing there besides integrated ICH9 ones.
> > > 
> > > If the user violated this rule, we abort the program.
> > > 
> > > Maybe one day we will be able to manage the ordering of device
> > > initialization, and then we can grant VT-d devices a higher init
> > > priority. But before that, let's have this explicit check to make sure
> > > of it.
> > > 
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >  hw/i386/intel_iommu.c | 40 ++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 40 insertions(+)
> > > 
> > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > > index 22d8226..db74124 100644
> > > --- a/hw/i386/intel_iommu.c
> > > +++ b/hw/i386/intel_iommu.c
> > > @@ -31,6 +31,7 @@
> > >  #include "hw/i386/apic-msidef.h"
> > >  #include "hw/boards.h"
> > >  #include "hw/i386/x86-iommu.h"
> > > +#include "hw/i386/ich9.h"
> > >  #include "hw/pci-host/q35.h"
> > >  #include "sysemu/kvm.h"
> > >  #include "hw/i386/apic_internal.h"
> > > @@ -2560,6 +2561,41 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
> > >      return true;
> > >  }
> > >  
> > > +static bool vtd_has_inited_pci_devices(PCIBus *bus, Error **errp)
> > > +{
> > > +    int i;
> > > +    uint8_t func;
> > > +
> > > +    /* We check against root bus */
> > > +    assert(bus && pci_bus_is_root(bus));
> > > +
> > > +    /*
> > > +     * We need to make sure vIOMMU device is created before other PCI
> > > +     * devices other than the integrated ICH9 ones, so that they can
> > > +     * get correct iommu_fn setup even during its realize(). Some
> > > +     * devices (e.g., vfio-pci) will need a correct iommu_fn to work.
> > > +     */
> > > +    for (i = 1; i < PCI_FUNC_MAX * PCI_SLOT_MAX; i++) {
> > > +        /* Skip the checking against ICH9 integrated devices */
> > > +        if (PCI_SLOT(i) == ICH9_LPC_DEV) {
> > > +            func = PCI_FUNC(i);
> > > +            if (func == ICH9_LPC_FUNC ||
> > > +                func == ICH9_SATA1_FUNC ||
> > > +                func == ICH9_SMB_FUNC) {
> > > +                continue;
> > > +            }
> > > +        }  
> > 
> > 
> > Whitelisting specific devfns seems pretty sketchy and fragile.  Can we
> > even assume we're on a Q35 chipset?  I don't see that vtd_realize()
> > takes any particular precautions not to allow initialization on 440fx,
> > or whatever generic chipset we come up with next that may not have
> > these specific devices.  
> 
> Yes. IIUC VT-d now can only work with Q35, right? Cc Marcel and Paolo
> in case I misunderstood it.

440FX is not a PCIe chipset therefore on bare metal there'd be no such
thing as requester IDs with which to lookup context entries.  Of course
a VM doesn't care about those details, but I think it best not to tempt
users with invalid configs.
 
> If so, maybe here we should check against q35 in vtd realization. How
> about something like:
> 
>     if (!object_property_find((Object *)pcms, "q35", NULL)) {
>         error_setg(errp, "Currently Intel vIOMMU only support Q35 platform. "
>                    "Please specify \"-M q35\" to enable it.");
>         return;
>     }
> 
> ?
> 
> > It would probably be a better idea to use
> > object_dynamic_cast() if you want to whitelist specific devices.
> > Perhaps this could even be used to validate the chipset as well.  
> 
> Now Jintack reported another issue, that we may have two default
> devices there if not specifying "-nodefaults", and that two devices
> will always be the first ones to be inited.
> 
> How about here we just explicitly check against vfio-pci devices, so
> we just make sure vfio-pci devices will be put after intel-iommu?
> Since actually vfio-pci devices are the only ones that we know we need
> to be inited explicitly after the VT-d device.

I was afraid you were going to come to this conclusion.  That works,
BUT it just means the problem gets ignored as a vfio problem when
really vfio is doing nothing wrong other than caring about the device
address space during its initialization.  Then users have a perfectly
working config, add a vfio-pci device and things explode.  If you want
to impose a user ordering requirement, do it consistently.  Thanks,

Alex
Peter Xu Feb. 23, 2017, 5:42 a.m. UTC | #7
On Wed, Feb 22, 2017 at 08:24:51PM -0700, Alex Williamson wrote:

[...]

> > Now Jintack reported another issue, that we may have two default
> > devices there if not specifying "-nodefaults", and that two devices
> > will always be the first ones to be inited.
> > 
> > How about here we just explicitly check against vfio-pci devices, so
> > we just make sure vfio-pci devices will be put after intel-iommu?
> > Since actually vfio-pci devices are the only ones that we know we need
> > to be inited explicitly after the VT-d device.
> 
> I was afraid you were going to come to this conclusion.  That works,
> BUT it just means the problem gets ignored as a vfio problem when
> really vfio is doing nothing wrong other than caring about the device
> address space during its initialization.  Then users have a perfectly
> working config, add a vfio-pci device and things explode.  If you want
> to impose a user ordering requirement, do it consistently.  Thanks,

We cannot guarantee that guest won't "explode" only if we
automatically order the device init, but that's something I am not
quite sure about that we will need now, especially I don't know
whether it would be a 2.9 material, considering that 2.9 soft freeze
should be on Feb 28th. :(

I kind of understand your concern, but would that really a so-called
explosion? The user will just be warned that he/she should move the
intel-iommu line slightly higher. Imho that's tolerable, since the
user is definitely adding something new to the parameters, and it's
possible the new command line just don't work. Also, imho it's not
anyone's fault if it happens, it's just a new rule that we need to
make sure things work properly...

I just want to know what would be the most feasible approach that we
can safely have the vtd vfio series in before 2.9. Any further input
would be welcomed.

Thanks,

-- peterx
Marcel Apfelbaum Feb. 23, 2017, 8:10 a.m. UTC | #8
On 02/23/2017 05:06 AM, Peter Xu wrote:
> On Wed, Feb 22, 2017 at 10:30:47AM -0700, Alex Williamson wrote:
>> On Wed, 22 Feb 2017 13:49:25 +0800
>> Peter Xu <peterx@redhat.com> wrote:
>>
>>> Intel vIOMMU devices are created with "-device" parameter, while here
>>> actually we need to make sure this device will be created before some
>>> other PCI devices (like vfio-pci devices) so that we know iommu_fn will
>>> be setup correctly before realizations of those PCI devices.
>>>
>>> Here we do explicit check to make sure intel-iommu device will be inited
>>> before all the rest of the PCI devices. This is done by checking against
>>> the devices dangled under current root PCIe bus and we should see
>>> nothing there besides integrated ICH9 ones.

Hi,

Commit b86eacb8 (hw/pci: delay bus_master_enable_region initialization)
creates the IOMMU memory region at machine_done time so the
devices creation order wouldn't matter.

I don't think we use the iommu_fn before machine_done.
What have I missed?

Thanks,
Marcel

>>>
>>> If the user violated this rule, we abort the program.
>>>
>>> Maybe one day we will be able to manage the ordering of device
>>> initialization, and then we can grant VT-d devices a higher init
>>> priority. But before that, let's have this explicit check to make sure
>>> of it.
>>>
>>> Signed-off-by: Peter Xu <peterx@redhat.com>
>>> ---
>>>  hw/i386/intel_iommu.c | 40 ++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 40 insertions(+)
>>>
>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>>> index 22d8226..db74124 100644
>>> --- a/hw/i386/intel_iommu.c
>>> +++ b/hw/i386/intel_iommu.c
>>> @@ -31,6 +31,7 @@
>>>  #include "hw/i386/apic-msidef.h"
>>>  #include "hw/boards.h"
>>>  #include "hw/i386/x86-iommu.h"
>>> +#include "hw/i386/ich9.h"
>>>  #include "hw/pci-host/q35.h"
>>>  #include "sysemu/kvm.h"
>>>  #include "hw/i386/apic_internal.h"
>>> @@ -2560,6 +2561,41 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
>>>      return true;
>>>  }
>>>
>>> +static bool vtd_has_inited_pci_devices(PCIBus *bus, Error **errp)
>>> +{
>>> +    int i;
>>> +    uint8_t func;
>>> +
>>> +    /* We check against root bus */
>>> +    assert(bus && pci_bus_is_root(bus));
>>> +
>>> +    /*
>>> +     * We need to make sure vIOMMU device is created before other PCI
>>> +     * devices other than the integrated ICH9 ones, so that they can
>>> +     * get correct iommu_fn setup even during its realize(). Some
>>> +     * devices (e.g., vfio-pci) will need a correct iommu_fn to work.
>>> +     */
>>> +    for (i = 1; i < PCI_FUNC_MAX * PCI_SLOT_MAX; i++) {
>>> +        /* Skip the checking against ICH9 integrated devices */
>>> +        if (PCI_SLOT(i) == ICH9_LPC_DEV) {
>>> +            func = PCI_FUNC(i);
>>> +            if (func == ICH9_LPC_FUNC ||
>>> +                func == ICH9_SATA1_FUNC ||
>>> +                func == ICH9_SMB_FUNC) {
>>> +                continue;
>>> +            }
>>> +        }
>>
>>
>> Whitelisting specific devfns seems pretty sketchy and fragile.  Can we
>> even assume we're on a Q35 chipset?  I don't see that vtd_realize()
>> takes any particular precautions not to allow initialization on 440fx,
>> or whatever generic chipset we come up with next that may not have
>> these specific devices.
>
> Yes. IIUC VT-d now can only work with Q35, right? Cc Marcel and Paolo
> in case I misunderstood it.
>
> If so, maybe here we should check against q35 in vtd realization. How
> about something like:
>
>     if (!object_property_find((Object *)pcms, "q35", NULL)) {
>         error_setg(errp, "Currently Intel vIOMMU only support Q35 platform. "
>                    "Please specify \"-M q35\" to enable it.");
>         return;
>     }
>
> ?
>
>> It would probably be a better idea to use
>> object_dynamic_cast() if you want to whitelist specific devices.
>> Perhaps this could even be used to validate the chipset as well.
>
> Now Jintack reported another issue, that we may have two default
> devices there if not specifying "-nodefaults", and that two devices
> will always be the first ones to be inited.
>
> How about here we just explicitly check against vfio-pci devices, so
> we just make sure vfio-pci devices will be put after intel-iommu?
> Since actually vfio-pci devices are the only ones that we know we need
> to be inited explicitly after the VT-d device.
>
> Thanks,
>
>> Thanks,
>>
>> Alex
>>
>>> +
>>> +        if (bus->devices[i]) {
>>> +            error_setg(errp, "Please init intel-iommu before "
>>> +                       "other PCI devices");
>>> +            return true;
>>> +        }
>>> +    }
>>> +
>>> +    return false;
>>> +}
>>> +
>>>  static void vtd_realize(DeviceState *dev, Error **errp)
>>>  {
>>>      PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
>>> @@ -2567,6 +2603,10 @@ static void vtd_realize(DeviceState *dev, Error **errp)
>>>      IntelIOMMUState *s = INTEL_IOMMU_DEVICE(dev);
>>>      X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev);
>>>
>>> +    if (vtd_has_inited_pci_devices(bus, errp)) {
>>> +        return;
>>> +    }
>>> +
>>>      VTD_DPRINTF(GENERAL, "");
>>>      x86_iommu->type = TYPE_INTEL;
>>>
>>
>
> -- peterx
>
Peter Xu Feb. 23, 2017, 8:16 a.m. UTC | #9
On Thu, Feb 23, 2017 at 10:10:57AM +0200, Marcel Apfelbaum wrote:
> On 02/23/2017 05:06 AM, Peter Xu wrote:
> >On Wed, Feb 22, 2017 at 10:30:47AM -0700, Alex Williamson wrote:
> >>On Wed, 22 Feb 2017 13:49:25 +0800
> >>Peter Xu <peterx@redhat.com> wrote:
> >>
> >>>Intel vIOMMU devices are created with "-device" parameter, while here
> >>>actually we need to make sure this device will be created before some
> >>>other PCI devices (like vfio-pci devices) so that we know iommu_fn will
> >>>be setup correctly before realizations of those PCI devices.
> >>>
> >>>Here we do explicit check to make sure intel-iommu device will be inited
> >>>before all the rest of the PCI devices. This is done by checking against
> >>>the devices dangled under current root PCIe bus and we should see
> >>>nothing there besides integrated ICH9 ones.
> 
> Hi,
> 
> Commit b86eacb8 (hw/pci: delay bus_master_enable_region initialization)
> creates the IOMMU memory region at machine_done time so the
> devices creation order wouldn't matter.
> 
> I don't think we use the iommu_fn before machine_done.
> What have I missed?

Hi, Marcel,

The problem is that vfio-pci will need to fetch the pci address space
during realization (pci_device_iommu_address_space() is called in
vfio_realize()).

Any thoughts? Thanks,

-- peterx
Marcel Apfelbaum Feb. 23, 2017, 12:02 p.m. UTC | #10
On 02/23/2017 10:16 AM, Peter Xu wrote:
> On Thu, Feb 23, 2017 at 10:10:57AM +0200, Marcel Apfelbaum wrote:
>> On 02/23/2017 05:06 AM, Peter Xu wrote:
>>> On Wed, Feb 22, 2017 at 10:30:47AM -0700, Alex Williamson wrote:
>>>> On Wed, 22 Feb 2017 13:49:25 +0800
>>>> Peter Xu <peterx@redhat.com> wrote:
>>>>
>>>>> Intel vIOMMU devices are created with "-device" parameter, while here
>>>>> actually we need to make sure this device will be created before some
>>>>> other PCI devices (like vfio-pci devices) so that we know iommu_fn will
>>>>> be setup correctly before realizations of those PCI devices.
>>>>>
>>>>> Here we do explicit check to make sure intel-iommu device will be inited
>>>>> before all the rest of the PCI devices. This is done by checking against
>>>>> the devices dangled under current root PCIe bus and we should see
>>>>> nothing there besides integrated ICH9 ones.
>>
>> Hi,
>>
>> Commit b86eacb8 (hw/pci: delay bus_master_enable_region initialization)
>> creates the IOMMU memory region at machine_done time so the
>> devices creation order wouldn't matter.
>>
>> I don't think we use the iommu_fn before machine_done.
>> What have I missed?
>
> Hi, Marcel,
>
> The problem is that vfio-pci will need to fetch the pci address space
> during realization (pci_device_iommu_address_space() is called in
> vfio_realize()).
>
> Any thoughts? Thanks,

Sure, I'll try to find a solution, but first I need to
understand why vfio-pci need to access the pci_device_iommu_address_space
so early. Maybe we can delay this also to machine_done stage.

Thanks,
Marcel

>
> -- peterx
>
Alex Williamson Feb. 23, 2017, 3:35 p.m. UTC | #11
On Thu, 23 Feb 2017 14:02:07 +0200
Marcel Apfelbaum <marcel@redhat.com> wrote:

> On 02/23/2017 10:16 AM, Peter Xu wrote:
> > On Thu, Feb 23, 2017 at 10:10:57AM +0200, Marcel Apfelbaum wrote:  
> >> On 02/23/2017 05:06 AM, Peter Xu wrote:  
> >>> On Wed, Feb 22, 2017 at 10:30:47AM -0700, Alex Williamson wrote:  
> >>>> On Wed, 22 Feb 2017 13:49:25 +0800
> >>>> Peter Xu <peterx@redhat.com> wrote:
> >>>>  
> >>>>> Intel vIOMMU devices are created with "-device" parameter, while here
> >>>>> actually we need to make sure this device will be created before some
> >>>>> other PCI devices (like vfio-pci devices) so that we know iommu_fn will
> >>>>> be setup correctly before realizations of those PCI devices.
> >>>>>
> >>>>> Here we do explicit check to make sure intel-iommu device will be inited
> >>>>> before all the rest of the PCI devices. This is done by checking against
> >>>>> the devices dangled under current root PCIe bus and we should see
> >>>>> nothing there besides integrated ICH9 ones.  
> >>
> >> Hi,
> >>
> >> Commit b86eacb8 (hw/pci: delay bus_master_enable_region initialization)
> >> creates the IOMMU memory region at machine_done time so the
> >> devices creation order wouldn't matter.
> >>
> >> I don't think we use the iommu_fn before machine_done.
> >> What have I missed?  
> >
> > Hi, Marcel,
> >
> > The problem is that vfio-pci will need to fetch the pci address space
> > during realization (pci_device_iommu_address_space() is called in
> > vfio_realize()).
> >
> > Any thoughts? Thanks,  
> 
> Sure, I'll try to find a solution, but first I need to
> understand why vfio-pci need to access the pci_device_iommu_address_space
> so early. Maybe we can delay this also to machine_done stage.

It's the architecture of vfio, the user only gets access to the device
when the container has iommu protection, therefore vfio needs to look
at the device address space to determine if it can share a container
with other devices.  Without an iommu all devices share the system
address space and use the same container.  With an iommu, each device
is in a separate address space and each gets its own container.
Without a container, the user doesn't get access to the device.
Deferring the address space to machine done would essentially defer the
entire vfio device initialization or else we'd need to close the
device and re-open and initialize it through a new container at that
time.  Thanks,

Alex
Michael S. Tsirkin Feb. 23, 2017, 11:21 p.m. UTC | #12
On Wed, Feb 22, 2017 at 01:49:25PM +0800, Peter Xu wrote:
> Intel vIOMMU devices are created with "-device" parameter, while here
> actually we need to make sure this device will be created before some
> other PCI devices (like vfio-pci devices) so that we know iommu_fn will
> be setup correctly before realizations of those PCI devices.
> Here we do explicit check to make sure intel-iommu device will be inited
> before all the rest of the PCI devices. This is done by checking against
> the devices dangled under current root PCIe bus and we should see
> nothing there besides integrated ICH9 ones.
> 
> If the user violated this rule, we abort the program.
> 
> Maybe one day we will be able to manage the ordering of device
> initialization, and then we can grant VT-d devices a higher init
> priority. But before that, let's have this explicit check to make sure
> of it.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/i386/intel_iommu.c | 40 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 22d8226..db74124 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -31,6 +31,7 @@
>  #include "hw/i386/apic-msidef.h"
>  #include "hw/boards.h"
>  #include "hw/i386/x86-iommu.h"
> +#include "hw/i386/ich9.h"
>  #include "hw/pci-host/q35.h"
>  #include "sysemu/kvm.h"
>  #include "hw/i386/apic_internal.h"
> @@ -2560,6 +2561,41 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
>      return true;
>  }
>  
> +static bool vtd_has_inited_pci_devices(PCIBus *bus, Error **errp)
> +{
> +    int i;
> +    uint8_t func;
> +
> +    /* We check against root bus */
> +    assert(bus && pci_bus_is_root(bus));
> +
> +    /*
> +     * We need to make sure vIOMMU device is created before other PCI
> +     * devices other than the integrated ICH9 ones,


Why wouldn't this apply to integrated ICH9 ones?


> so that they can
> +     * get correct iommu_fn setup even during its realize(). Some
> +     * devices (e.g., vfio-pci) will need a correct iommu_fn to work.

If there's something special about vfio devices, then just check
the device type.

> +     */
> +    for (i = 1; i < PCI_FUNC_MAX * PCI_SLOT_MAX; i++) {
> +        /* Skip the checking against ICH9 integrated devices */
> +        if (PCI_SLOT(i) == ICH9_LPC_DEV) {
> +            func = PCI_FUNC(i);
> +            if (func == ICH9_LPC_FUNC ||
> +                func == ICH9_SATA1_FUNC ||
> +                func == ICH9_SMB_FUNC) {
> +                continue;
> +            }
> +        }
> +
> +        if (bus->devices[i]) {
> +            error_setg(errp, "Please init intel-iommu before "
> +                       "other PCI devices");
> +            return true;
> +        }
> +    }
> +
> +    return false;
> +}
> +
>  static void vtd_realize(DeviceState *dev, Error **errp)
>  {
>      PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
> @@ -2567,6 +2603,10 @@ static void vtd_realize(DeviceState *dev, Error **errp)
>      IntelIOMMUState *s = INTEL_IOMMU_DEVICE(dev);
>      X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev);
>  
> +    if (vtd_has_inited_pci_devices(bus, errp)) {
> +        return;
> +    }
> +
>      VTD_DPRINTF(GENERAL, "");
>      x86_iommu->type = TYPE_INTEL;
>  
> -- 
> 2.7.4
Michael S. Tsirkin Feb. 23, 2017, 11:26 p.m. UTC | #13
On Thu, Feb 23, 2017 at 01:42:34PM +0800, Peter Xu wrote:
> On Wed, Feb 22, 2017 at 08:24:51PM -0700, Alex Williamson wrote:
> 
> [...]
> 
> > > Now Jintack reported another issue, that we may have two default
> > > devices there if not specifying "-nodefaults", and that two devices
> > > will always be the first ones to be inited.
> > > 
> > > How about here we just explicitly check against vfio-pci devices, so
> > > we just make sure vfio-pci devices will be put after intel-iommu?
> > > Since actually vfio-pci devices are the only ones that we know we need
> > > to be inited explicitly after the VT-d device.
> > 
> > I was afraid you were going to come to this conclusion.  That works,
> > BUT it just means the problem gets ignored as a vfio problem when
> > really vfio is doing nothing wrong other than caring about the device
> > address space during its initialization.

And that part is broken at this stage.  We can add a flag
"uses_address_space_in_init". Would that be better?


>  Then users have a perfectly
> > working config, add a vfio-pci device and things explode.  If you want
> > to impose a user ordering requirement, do it consistently.

This is not a new thing however. vfio previously did not work with
an iommu either. It seems reasonable to make sure iommu
is initialized first.

But it also seems reasonable to enable that gradually
and not require removing ordering requirements upfront.


>  Thanks,
> 
> We cannot guarantee that guest won't "explode" only if we
> automatically order the device init, but that's something I am not
> quite sure about that we will need now, especially I don't know
> whether it would be a 2.9 material, considering that 2.9 soft freeze
> should be on Feb 28th. :(
> 
> I kind of understand your concern, but would that really a so-called
> explosion? The user will just be warned that he/she should move the
> intel-iommu line slightly higher. Imho that's tolerable, since the
> user is definitely adding something new to the parameters, and it's
> possible the new command line just don't work. Also, imho it's not
> anyone's fault if it happens, it's just a new rule that we need to
> make sure things work properly...
> 
> I just want to know what would be the most feasible approach that we
> can safely have the vtd vfio series in before 2.9. Any further input
> would be welcomed.
> 
> Thanks,
> 
> -- peterx
Peter Xu Feb. 24, 2017, 2:45 a.m. UTC | #14
On Fri, Feb 24, 2017 at 01:21:52AM +0200, Michael S. Tsirkin wrote:
> On Wed, Feb 22, 2017 at 01:49:25PM +0800, Peter Xu wrote:
> > Intel vIOMMU devices are created with "-device" parameter, while here
> > actually we need to make sure this device will be created before some
> > other PCI devices (like vfio-pci devices) so that we know iommu_fn will
> > be setup correctly before realizations of those PCI devices.
> > Here we do explicit check to make sure intel-iommu device will be inited
> > before all the rest of the PCI devices. This is done by checking against
> > the devices dangled under current root PCIe bus and we should see
> > nothing there besides integrated ICH9 ones.
> > 
> > If the user violated this rule, we abort the program.
> > 
> > Maybe one day we will be able to manage the ordering of device
> > initialization, and then we can grant VT-d devices a higher init
> > priority. But before that, let's have this explicit check to make sure
> > of it.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  hw/i386/intel_iommu.c | 40 ++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 40 insertions(+)
> > 
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index 22d8226..db74124 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -31,6 +31,7 @@
> >  #include "hw/i386/apic-msidef.h"
> >  #include "hw/boards.h"
> >  #include "hw/i386/x86-iommu.h"
> > +#include "hw/i386/ich9.h"
> >  #include "hw/pci-host/q35.h"
> >  #include "sysemu/kvm.h"
> >  #include "hw/i386/apic_internal.h"
> > @@ -2560,6 +2561,41 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
> >      return true;
> >  }
> >  
> > +static bool vtd_has_inited_pci_devices(PCIBus *bus, Error **errp)
> > +{
> > +    int i;
> > +    uint8_t func;
> > +
> > +    /* We check against root bus */
> > +    assert(bus && pci_bus_is_root(bus));
> > +
> > +    /*
> > +     * We need to make sure vIOMMU device is created before other PCI
> > +     * devices other than the integrated ICH9 ones,
> 
> 
> Why wouldn't this apply to integrated ICH9 ones?

Because these integrated devices are created along with q35 in
pc_q35_init() with:

    pc_vga_init(isa_bus, host_bus);
    pc_nic_init(isa_bus, host_bus);

which is definitely ahead of the general device init routines,
including VT-d device.

> 
> 
> > so that they can
> > +     * get correct iommu_fn setup even during its realize(). Some
> > +     * devices (e.g., vfio-pci) will need a correct iommu_fn to work.
> 
> If there's something special about vfio devices, then just check
> the device type.

Yeah. I think that's the most feasible way to do this for now. If Alex
won't disagree, I can prepare a new version.

Thanks,

-- peterx
Marcel Apfelbaum Feb. 28, 2017, 2:37 p.m. UTC | #15
On 02/23/2017 05:35 PM, Alex Williamson wrote:
> On Thu, 23 Feb 2017 14:02:07 +0200
> Marcel Apfelbaum <marcel@redhat.com> wrote:
>
>> On 02/23/2017 10:16 AM, Peter Xu wrote:
>>> On Thu, Feb 23, 2017 at 10:10:57AM +0200, Marcel Apfelbaum wrote:
>>>> On 02/23/2017 05:06 AM, Peter Xu wrote:
>>>>> On Wed, Feb 22, 2017 at 10:30:47AM -0700, Alex Williamson wrote:
>>>>>> On Wed, 22 Feb 2017 13:49:25 +0800
>>>>>> Peter Xu <peterx@redhat.com> wrote:
>>>>>>
>>>>>>> Intel vIOMMU devices are created with "-device" parameter, while here
>>>>>>> actually we need to make sure this device will be created before some
>>>>>>> other PCI devices (like vfio-pci devices) so that we know iommu_fn will
>>>>>>> be setup correctly before realizations of those PCI devices.
>>>>>>>
>>>>>>> Here we do explicit check to make sure intel-iommu device will be inited
>>>>>>> before all the rest of the PCI devices. This is done by checking against
>>>>>>> the devices dangled under current root PCIe bus and we should see
>>>>>>> nothing there besides integrated ICH9 ones.
>>>>
>>>> Hi,
>>>>
>>>> Commit b86eacb8 (hw/pci: delay bus_master_enable_region initialization)
>>>> creates the IOMMU memory region at machine_done time so the
>>>> devices creation order wouldn't matter.
>>>>
>>>> I don't think we use the iommu_fn before machine_done.
>>>> What have I missed?
>>>
>>> Hi, Marcel,
>>>
>>> The problem is that vfio-pci will need to fetch the pci address space
>>> during realization (pci_device_iommu_address_space() is called in
>>> vfio_realize()).
>>>
>>> Any thoughts? Thanks,
>>
>> Sure, I'll try to find a solution, but first I need to
>> understand why vfio-pci need to access the pci_device_iommu_address_space
>> so early. Maybe we can delay this also to machine_done stage.
>
> It's the architecture of vfio, the user only gets access to the device
> when the container has iommu protection, therefore vfio needs to look
> at the device address space to determine if it can share a container
> with other devices.  Without an iommu all devices share the system
> address space and use the same container.  With an iommu, each device
> is in a separate address space and each gets its own container.
> Without a container, the user doesn't get access to the device.
> Deferring the address space to machine done would essentially defer the
> entire vfio device initialization or else we'd need to close the
> device and re-open and initialize it through a new container at that
> time.  Thanks,
>

I understand. Maybe we should follow the same "template" as disk/drive, nic/netdev ?
I mean something like -device iommu,id=i1, -device vfio-pci,iommu=e1 .
I saw that you can mix the order in command line and still works.
I don't really know how they do it...

Thanks,
Marcel


> Alex
>
Paolo Bonzini March 9, 2017, 12:31 p.m. UTC | #16
On 23/02/2017 16:35, Alex Williamson wrote:
> It's the architecture of vfio, the user only gets access to the device
> when the container has iommu protection, therefore vfio needs to look
> at the device address space to determine if it can share a container
> with other devices.  Without an iommu all devices share the system
> address space and use the same container.  With an iommu, each device
> is in a separate address space and each gets its own container.
> Without a container, the user doesn't get access to the device.
> Deferring the address space to machine done would essentially defer the
> entire vfio device initialization or else we'd need to close the
> device and re-open and initialize it through a new container at that
> time.  Thanks,

If you used only one container you could still provide a working VFIO
configuration to the guest, even if the guest had an IOMMU.  All devices
ending up in one IOMMU domain is not particularly useful, but it would work.

If VFIO had an iommu property, to be used like "-device
intel_iommu,id=iommu0 -device vfio-pci,iommu=iommu0", It could tell VFIO
to use separate containers and also ensure proper ordering of command
line arguments.

Paolo
Michael S. Tsirkin March 9, 2017, 3:29 p.m. UTC | #17
On Thu, Mar 09, 2017 at 01:31:45PM +0100, Paolo Bonzini wrote:
> 
> 
> On 23/02/2017 16:35, Alex Williamson wrote:
> > It's the architecture of vfio, the user only gets access to the device
> > when the container has iommu protection, therefore vfio needs to look
> > at the device address space to determine if it can share a container
> > with other devices.  Without an iommu all devices share the system
> > address space and use the same container.  With an iommu, each device
> > is in a separate address space and each gets its own container.
> > Without a container, the user doesn't get access to the device.
> > Deferring the address space to machine done would essentially defer the
> > entire vfio device initialization or else we'd need to close the
> > device and re-open and initialize it through a new container at that
> > time.  Thanks,
> 
> If you used only one container you could still provide a working VFIO
> configuration to the guest, even if the guest had an IOMMU.  All devices
> ending up in one IOMMU domain is not particularly useful, but it would work.
> 
> If VFIO had an iommu property, to be used like "-device
> intel_iommu,id=iommu0 -device vfio-pci,iommu=iommu0", It could tell VFIO
> to use separate containers and also ensure proper ordering of command
> line arguments.
> 
> Paolo

Lots of guests can't handle arbitrary stuff like this though.
Normally IOMMU attaches to a bus and that's the only bus it can
handle and that's the only IOMMU that can handle the devices
on this bus.

It's a fact of life that not all hardware dependencies are in a form of
a bus-device relashionship, while modelling it as a bus internally might
be feasible exposing that to users seems risky and confusing as that's
our artifact.
Paolo Bonzini March 9, 2017, 3:34 p.m. UTC | #18
On 09/03/2017 16:29, Michael S. Tsirkin wrote:
>> If VFIO had an iommu property, to be used like "-device
>> intel_iommu,id=iommu0 -device vfio-pci,iommu=iommu0", It could tell VFIO
>> to use separate containers and also ensure proper ordering of command
>> line arguments.
>
> Lots of guests can't handle arbitrary stuff like this though.
> Normally IOMMU attaches to a bus and that's the only bus it can
> handle and that's the only IOMMU that can handle the devices
> on this bus.
> 
> It's a fact of life that not all hardware dependencies are in a form of
> a bus-device relashionship, while modelling it as a bus internally might
> be feasible exposing that to users seems risky and confusing as that's
> our artifact.

I'm not saying it's a bus.

But it's pretty common to have multiple IOMMUs for one bus.  A
relatively common case is to have one for the graphics card and one for
everything else, if I'm not mistaken.

Paolo
diff mbox

Patch

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 22d8226..db74124 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -31,6 +31,7 @@ 
 #include "hw/i386/apic-msidef.h"
 #include "hw/boards.h"
 #include "hw/i386/x86-iommu.h"
+#include "hw/i386/ich9.h"
 #include "hw/pci-host/q35.h"
 #include "sysemu/kvm.h"
 #include "hw/i386/apic_internal.h"
@@ -2560,6 +2561,41 @@  static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
     return true;
 }
 
+static bool vtd_has_inited_pci_devices(PCIBus *bus, Error **errp)
+{
+    int i;
+    uint8_t func;
+
+    /* We check against root bus */
+    assert(bus && pci_bus_is_root(bus));
+
+    /*
+     * We need to make sure vIOMMU device is created before other PCI
+     * devices other than the integrated ICH9 ones, so that they can
+     * get correct iommu_fn setup even during its realize(). Some
+     * devices (e.g., vfio-pci) will need a correct iommu_fn to work.
+     */
+    for (i = 1; i < PCI_FUNC_MAX * PCI_SLOT_MAX; i++) {
+        /* Skip the checking against ICH9 integrated devices */
+        if (PCI_SLOT(i) == ICH9_LPC_DEV) {
+            func = PCI_FUNC(i);
+            if (func == ICH9_LPC_FUNC ||
+                func == ICH9_SATA1_FUNC ||
+                func == ICH9_SMB_FUNC) {
+                continue;
+            }
+        }
+
+        if (bus->devices[i]) {
+            error_setg(errp, "Please init intel-iommu before "
+                       "other PCI devices");
+            return true;
+        }
+    }
+
+    return false;
+}
+
 static void vtd_realize(DeviceState *dev, Error **errp)
 {
     PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
@@ -2567,6 +2603,10 @@  static void vtd_realize(DeviceState *dev, Error **errp)
     IntelIOMMUState *s = INTEL_IOMMU_DEVICE(dev);
     X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev);
 
+    if (vtd_has_inited_pci_devices(bus, errp)) {
+        return;
+    }
+
     VTD_DPRINTF(GENERAL, "");
     x86_iommu->type = TYPE_INTEL;