diff mbox series

[11/12] platform-msi: Add platform check for subdevice irq domain

Message ID 1612385805-3412-12-git-send-email-megha.dey@intel.com
State New
Headers show
Series Introduce dev-msi and interrupt message store | expand

Commit Message

Dey, Megha Feb. 3, 2021, 8:56 p.m. UTC
From: Lu Baolu <baolu.lu@linux.intel.com>

The pci_subdevice_msi_create_irq_domain() should fail if the underlying
platform is not able to support IMS (Interrupt Message Storage). Otherwise,
the isolation of interrupt is not guaranteed.

For x86, IMS is only supported on bare metal for now. We could enable it
in the virtualization environments in the future if interrupt HYPERCALL
domain is supported or the hardware has the capability of interrupt
isolation for subdevices.

Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Kevin Tian <kevin.tian@intel.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/linux-pci/87pn4nk7nn.fsf@nanos.tec.linutronix.de/
Link: https://lore.kernel.org/linux-pci/877dqrnzr3.fsf@nanos.tec.linutronix.de/
Link: https://lore.kernel.org/linux-pci/877dqqmc2h.fsf@nanos.tec.linutronix.de/
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Megha Dey <megha.dey@intel.com>
---
 arch/x86/pci/common.c       | 74 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/base/platform-msi.c |  8 +++++
 include/linux/msi.h         |  1 +
 3 files changed, 83 insertions(+)

Comments

Jason Gunthorpe Feb. 4, 2021, 12:14 p.m. UTC | #1
On Wed, Feb 03, 2021 at 12:56:44PM -0800, Megha Dey wrote:
> +bool arch_support_pci_device_ims(struct pci_dev *pdev)
> +{

Consistent language please, we are not using IMS elsewhere, this
feature is called device_msi in Linux.

Jason
Lu Baolu Feb. 4, 2021, 1:22 p.m. UTC | #2
Hi Jason,

On 2021/2/4 20:14, Jason Gunthorpe wrote:
> On Wed, Feb 03, 2021 at 12:56:44PM -0800, Megha Dey wrote:
>> +bool arch_support_pci_device_ims(struct pci_dev *pdev)
>> +{
> 
> Consistent language please, we are not using IMS elsewhere, this
> feature is called device_msi in Linux.
> 

Thanks for pointing this out. I will correct it.

> Jason
> 

Best regards,
baolu
Leon Romanovsky Feb. 8, 2021, 8:21 a.m. UTC | #3
On Wed, Feb 03, 2021 at 12:56:44PM -0800, Megha Dey wrote:
> From: Lu Baolu <baolu.lu@linux.intel.com>
>
> The pci_subdevice_msi_create_irq_domain() should fail if the underlying
> platform is not able to support IMS (Interrupt Message Storage). Otherwise,
> the isolation of interrupt is not guaranteed.
>
> For x86, IMS is only supported on bare metal for now. We could enable it
> in the virtualization environments in the future if interrupt HYPERCALL
> domain is supported or the hardware has the capability of interrupt
> isolation for subdevices.
>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Cc: Leon Romanovsky <leon@kernel.org>
> Cc: Kevin Tian <kevin.tian@intel.com>
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Link: https://lore.kernel.org/linux-pci/87pn4nk7nn.fsf@nanos.tec.linutronix.de/
> Link: https://lore.kernel.org/linux-pci/877dqrnzr3.fsf@nanos.tec.linutronix.de/
> Link: https://lore.kernel.org/linux-pci/877dqqmc2h.fsf@nanos.tec.linutronix.de/
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Megha Dey <megha.dey@intel.com>
> ---
>  arch/x86/pci/common.c       | 74 +++++++++++++++++++++++++++++++++++++++++++++
>  drivers/base/platform-msi.c |  8 +++++
>  include/linux/msi.h         |  1 +
>  3 files changed, 83 insertions(+)
>
> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
> index 3507f45..263ccf6 100644
> --- a/arch/x86/pci/common.c
> +++ b/arch/x86/pci/common.c
> @@ -12,6 +12,8 @@
>  #include <linux/init.h>
>  #include <linux/dmi.h>
>  #include <linux/slab.h>
> +#include <linux/iommu.h>
> +#include <linux/msi.h>
>
>  #include <asm/acpi.h>
>  #include <asm/segment.h>
> @@ -724,3 +726,75 @@ struct pci_dev *pci_real_dma_dev(struct pci_dev *dev)
>  	return dev;
>  }
>  #endif
> +
> +#ifdef CONFIG_DEVICE_MSI

Sorry for my naive question, but I see it in all your patches in this series
and wonder why did you wrap everything with ifdefs?.

All *.c code is wrapped with those ifdefs, which is hard to navigate and
unlikely to give any code/size optimization benefit if kernel is compiled
without CONFIG_DEVICE_MSI. The more common approach is to put those
ifdef in the public header files and leave to the compiler to drop not
called functions.

Thanks
Lu Baolu Feb. 9, 2021, 12:36 a.m. UTC | #4
Hi Leon,

On 2/8/21 4:21 PM, Leon Romanovsky wrote:
> On Wed, Feb 03, 2021 at 12:56:44PM -0800, Megha Dey wrote:
>> From: Lu Baolu <baolu.lu@linux.intel.com>
>>
>> The pci_subdevice_msi_create_irq_domain() should fail if the underlying
>> platform is not able to support IMS (Interrupt Message Storage). Otherwise,
>> the isolation of interrupt is not guaranteed.
>>
>> For x86, IMS is only supported on bare metal for now. We could enable it
>> in the virtualization environments in the future if interrupt HYPERCALL
>> domain is supported or the hardware has the capability of interrupt
>> isolation for subdevices.
>>
>> Cc: David Woodhouse <dwmw@amazon.co.uk>
>> Cc: Leon Romanovsky <leon@kernel.org>
>> Cc: Kevin Tian <kevin.tian@intel.com>
>> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
>> Link: https://lore.kernel.org/linux-pci/87pn4nk7nn.fsf@nanos.tec.linutronix.de/
>> Link: https://lore.kernel.org/linux-pci/877dqrnzr3.fsf@nanos.tec.linutronix.de/
>> Link: https://lore.kernel.org/linux-pci/877dqqmc2h.fsf@nanos.tec.linutronix.de/
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> Signed-off-by: Megha Dey <megha.dey@intel.com>
>> ---
>>   arch/x86/pci/common.c       | 74 +++++++++++++++++++++++++++++++++++++++++++++
>>   drivers/base/platform-msi.c |  8 +++++
>>   include/linux/msi.h         |  1 +
>>   3 files changed, 83 insertions(+)
>>
>> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
>> index 3507f45..263ccf6 100644
>> --- a/arch/x86/pci/common.c
>> +++ b/arch/x86/pci/common.c
>> @@ -12,6 +12,8 @@
>>   #include <linux/init.h>
>>   #include <linux/dmi.h>
>>   #include <linux/slab.h>
>> +#include <linux/iommu.h>
>> +#include <linux/msi.h>
>>
>>   #include <asm/acpi.h>
>>   #include <asm/segment.h>
>> @@ -724,3 +726,75 @@ struct pci_dev *pci_real_dma_dev(struct pci_dev *dev)
>>   	return dev;
>>   }
>>   #endif
>> +
>> +#ifdef CONFIG_DEVICE_MSI
> 
> Sorry for my naive question, but I see it in all your patches in this series
> and wonder why did you wrap everything with ifdefs?.

The added code is only called when DEVICE_MSI is configured.

> 
> All *.c code is wrapped with those ifdefs, which is hard to navigate and
> unlikely to give any code/size optimization benefit if kernel is compiled
> without CONFIG_DEVICE_MSI. The more common approach is to put those
> ifdef in the public header files and leave to the compiler to drop not
> called functions.

Yes. This looks better.

> 
> Thanks
> 

Best regards,
baolu
diff mbox series

Patch

diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index 3507f45..263ccf6 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -12,6 +12,8 @@ 
 #include <linux/init.h>
 #include <linux/dmi.h>
 #include <linux/slab.h>
+#include <linux/iommu.h>
+#include <linux/msi.h>
 
 #include <asm/acpi.h>
 #include <asm/segment.h>
@@ -724,3 +726,75 @@  struct pci_dev *pci_real_dma_dev(struct pci_dev *dev)
 	return dev;
 }
 #endif
+
+#ifdef CONFIG_DEVICE_MSI
+/*
+ * We want to figure out which context we are running in. But the hardware
+ * does not introduce a reliable way (instruction, CPUID leaf, MSR, whatever)
+ * which can be manipulated by the VMM to let the OS figure out where it runs.
+ * So we go with the below probably on_bare_metal() function as a replacement
+ * for definitely on_bare_metal() to go forward only for the very simple reason
+ * that this is the only option we have.
+ */
+static const char * const vmm_vendor_name[] = {
+	"QEMU", "Bochs", "KVM", "Xen", "VMware", "VMW", "VMware Inc.",
+	"innotek GmbH", "Oracle Corporation", "Parallels", "BHYVE"
+};
+
+static void read_type0_virtual_machine(const struct dmi_header *dm, void *p)
+{
+	u8 *data = (u8 *)dm + 0x13;
+
+	/* BIOS Information (Type 0) */
+	if (dm->type != 0 || dm->length < 0x14)
+		return;
+
+	/* Bit 4 of BIOS Characteristics Extension Byte 2*/
+	if (*data & BIT(4))
+		*((bool *)p) = true;
+}
+
+static bool smbios_virtual_machine(void)
+{
+	bool bit_present = false;
+
+	dmi_walk(read_type0_virtual_machine, &bit_present);
+
+	return bit_present;
+}
+
+static bool on_bare_metal(struct device *dev)
+{
+	int i;
+
+	if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
+		return false;
+
+	if (smbios_virtual_machine())
+		return false;
+
+	if (iommu_capable(dev->bus, IOMMU_CAP_VIOMMU_HINT))
+		return false;
+
+	for (i = 0; i < ARRAY_SIZE(vmm_vendor_name); i++)
+		if (dmi_match(DMI_SYS_VENDOR, vmm_vendor_name[i]))
+			return false;
+
+	pr_info("System running on bare metal, report to bugzilla.kernel.org if not the case.");
+
+	return true;
+}
+
+bool arch_support_pci_device_ims(struct pci_dev *pdev)
+{
+	/*
+	 * When we are running in a VMM context, the device IMS could only be
+	 * enabled when the underlying hardware supports interrupt isolation
+	 * of the subdevice, or any mechanism (trap, hypercall) is added so
+	 * that changes in the interrupt message store could be managed by the
+	 * VMM. For now, we only support the device IMS when we are running on
+	 * the bare metal.
+	 */
+	return on_bare_metal(&pdev->dev);
+}
+#endif
diff --git a/drivers/base/platform-msi.c b/drivers/base/platform-msi.c
index 6127b3b..d5ae26f 100644
--- a/drivers/base/platform-msi.c
+++ b/drivers/base/platform-msi.c
@@ -519,6 +519,11 @@  struct irq_domain *device_msi_create_irq_domain(struct fwnode_handle *fn,
 #ifdef CONFIG_PCI
 #include <linux/pci.h>
 
+bool __weak arch_support_pci_device_ims(struct pci_dev *pdev)
+{
+	return false;
+}
+
 /**
  * pci_subdevice_msi_create_irq_domain - Create an irq domain for subdevices
  * @pdev:	Pointer to PCI device for which the subdevice domain is created
@@ -530,6 +535,9 @@  struct irq_domain *pci_subdevice_msi_create_irq_domain(struct pci_dev *pdev,
 	struct irq_domain *domain, *pdev_msi;
 	struct fwnode_handle *fn;
 
+	if (!arch_support_pci_device_ims(pdev))
+		return NULL;
+
 	/*
 	 * Retrieve the MSI domain of the underlying PCI device's MSI
 	 * domain. The PCI device domain's parent domain is also the parent
diff --git a/include/linux/msi.h b/include/linux/msi.h
index a6b419d..fa02542 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -478,6 +478,7 @@  struct irq_domain *device_msi_create_irq_domain(struct fwnode_handle *fn,
 						struct irq_domain *parent);
 
 # ifdef CONFIG_PCI
+bool arch_support_pci_device_ims(struct pci_dev *pdev);
 struct irq_domain *pci_subdevice_msi_create_irq_domain(struct pci_dev *pdev,
 						       struct msi_domain_info *info);
 # endif