diff mbox

[RFC,v3,3/5] PCI: Add host bridge attribute to indicate filtering of MSIs is supported

Message ID 1452841574-2781-4-git-send-email-xyjxie@linux.vnet.ibm.com
State Not Applicable
Headers show

Commit Message

Yongji Xie Jan. 15, 2016, 7:06 a.m. UTC
MSI-X tables are not allowed to be mmapped in vfio-pci
driver in case that user get to touch this directly.
This will cause some performance issues when when PCI
adapters have critical registers in the same page as
the MSI-X table.

However, some kind of PCI host bridge such as IODA bridge
on Power support filtering of MSIs, which can ensure that a
given pci device can only shoot the MSIs assigned for it.
So we think it's safe to expose the MSI-X table to userspace
if filtering of MSIs is supported because the exposed MSI-X
table can't be used to do harm to other memory space.

To support this case, this patch adds a pci_host_bridge
attribute to indicate if this PCI host bridge supports
filtering of MSIs.

Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
---
 drivers/pci/host-bridge.c |    6 ++++++
 include/linux/pci.h       |    3 +++
 2 files changed, 9 insertions(+)

Comments

David Laight Jan. 15, 2016, 5:24 p.m. UTC | #1
From: Yongji Xie

> Sent: 15 January 2016 07:06

>

> MSI-X tables are not allowed to be mmapped in vfio-pci

> driver in case that user get to touch this directly.

> This will cause some performance issues when when PCI

> adapters have critical registers in the same page as

> the MSI-X table.

...
If the driver wants to generate an incorrect MSI-X interrupt
it can do so by requesting the device do a normal memory transfer
to the target address area that raises MSI-X interrupts.
So disabling writes to the MSI-X table (and pending bit array)
areas only raises the bar very slightly.
A device may also give the driver write access to the MSI-X
table through other addresses.

This seems to make disallowing the mapping of the MSI-X table
rather pointless.

I've also dumped out the MSI-X table (during development) to
check that the values are being written there correctly.

	David
Yongji Xie Jan. 20, 2016, 9:41 a.m. UTC | #2
On 2016/1/16 1:24, David Laight wrote:
> From: Yongji Xie
>> Sent: 15 January 2016 07:06
>>
>> MSI-X tables are not allowed to be mmapped in vfio-pci
>> driver in case that user get to touch this directly.
>> This will cause some performance issues when when PCI
>> adapters have critical registers in the same page as
>> the MSI-X table.
> ...
> If the driver wants to generate an incorrect MSI-X interrupt
> it can do so by requesting the device do a normal memory transfer
> to the target address area that raises MSI-X interrupts.

IOMMUs supporting interrupt remapping can prevent this case.

> So disabling writes to the MSI-X table (and pending bit array)
> areas only raises the bar very slightly.
> A device may also give the driver write access to the MSI-X
> table through other addresses.
>
> This seems to make disallowing the mapping of the MSI-X table
> rather pointless.

If we allow the mapping of the MSI-X table, it seems the guest
kernels of some architectures can write invalid data to MSI-X table
when device drivers initialize MSI-X interrupts.

Regards,
Yongji Xie

> I've also dumped out the MSI-X table (during development) to
> check that the values are being written there correctly.
>
> 	David
>

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Williamson Jan. 28, 2016, 10:46 p.m. UTC | #3
On Fri, 2016-01-15 at 15:06 +0800, Yongji Xie wrote:
> MSI-X tables are not allowed to be mmapped in vfio-pci
> driver in case that user get to touch this directly.
> This will cause some performance issues when when PCI
> adapters have critical registers in the same page as
> the MSI-X table.

> However, some kind of PCI host bridge such as IODA bridge
> on Power support filtering of MSIs, which can ensure that a
> given pci device can only shoot the MSIs assigned for it.
> So we think it's safe to expose the MSI-X table to userspace
> if filtering of MSIs is supported because the exposed MSI-X
> table can't be used to do harm to other memory space.

> To support this case, this patch adds a pci_host_bridge
> attribute to indicate if this PCI host bridge supports
> filtering of MSIs.

> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
> ---
>  drivers/pci/host-bridge.c |    6 ++++++
>  include/linux/pci.h       |    3 +++
>  2 files changed, 9 insertions(+)

> diff --git a/drivers/pci/host-bridge.c b/drivers/pci/host-bridge.c
> index 5f4a2e0..c029267 100644
> --- a/drivers/pci/host-bridge.c
> +++ b/drivers/pci/host-bridge.c
> @@ -96,3 +96,9 @@ void pcibios_bus_to_resource(struct pci_bus *bus, struct resource *res,
>  	res->end = region->end + offset;
>  }
>  EXPORT_SYMBOL(pcibios_bus_to_resource);
> +
> +bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev)
> +{
> +	return pci_find_host_bridge(pdev->bus)->msi_filtered;
> +}
> +EXPORT_SYMBOL_GPL(pci_host_bridge_msi_filtered_enabled);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index b640d65..b952b78 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -412,6 +412,7 @@ struct pci_host_bridge {
>  	void (*release_fn)(struct pci_host_bridge *);
>  	void *release_data;
>  	unsigned int ignore_reset_delay:1;	/* for entire hierarchy */
> +	unsigned int msi_filtered:1;	/* support filtering of MSIs */
>  	/* Resource alignment requirements */
>  	resource_size_t (*align_resource)(struct pci_dev *dev,
>  			const struct resource *res,
> @@ -430,6 +431,8 @@ void pci_set_host_bridge_release(struct pci_host_bridge *bridge,
>  
>  int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge);
>  
> +bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev);
> +
>  /*
>   * The first PCI_BRIDGE_RESOURCE_NUM PCI bus resources (those that correspond
>   * to P2P or CardBus bridge windows) go in a table.  Additional ones (for

Don't we already have a flag for this in the IOMMU space?

enum iommu_cap {
        IOMMU_CAP_CACHE_COHERENCY,      /* IOMMU can enforce cache coherent DMA
                                           transactions */
--->    IOMMU_CAP_INTR_REMAP,           /* IOMMU supports interrupt isolation */
        IOMMU_CAP_NOEXEC,               /* IOMMU_NOEXEC flag */
};

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yongji Xie Jan. 29, 2016, 10:40 a.m. UTC | #4
On 2016/1/29 6:46, Alex Williamson wrote:
> On Fri, 2016-01-15 at 15:06 +0800, Yongji Xie wrote:
>> MSI-X tables are not allowed to be mmapped in vfio-pci
>> driver in case that user get to touch this directly.
>> This will cause some performance issues when when PCI
>> adapters have critical registers in the same page as
>> the MSI-X table.
>>   
>> However, some kind of PCI host bridge such as IODA bridge
>> on Power support filtering of MSIs, which can ensure that a
>> given pci device can only shoot the MSIs assigned for it.
>> So we think it's safe to expose the MSI-X table to userspace
>> if filtering of MSIs is supported because the exposed MSI-X
>> table can't be used to do harm to other memory space.
>>   
>> To support this case, this patch adds a pci_host_bridge
>> attribute to indicate if this PCI host bridge supports
>> filtering of MSIs.
>>   
>> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
>> ---
>>   drivers/pci/host-bridge.c |    6 ++++++
>>   include/linux/pci.h       |    3 +++
>>   2 files changed, 9 insertions(+)
>>   
>> diff --git a/drivers/pci/host-bridge.c b/drivers/pci/host-bridge.c
>> index 5f4a2e0..c029267 100644
>> --- a/drivers/pci/host-bridge.c
>> +++ b/drivers/pci/host-bridge.c
>> @@ -96,3 +96,9 @@ void pcibios_bus_to_resource(struct pci_bus *bus, struct resource *res,
>>   	res->end = region->end + offset;
>>   }
>>   EXPORT_SYMBOL(pcibios_bus_to_resource);
>> +
>> +bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev)
>> +{
>> +	return pci_find_host_bridge(pdev->bus)->msi_filtered;
>> +}
>> +EXPORT_SYMBOL_GPL(pci_host_bridge_msi_filtered_enabled);
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index b640d65..b952b78 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -412,6 +412,7 @@ struct pci_host_bridge {
>>   	void (*release_fn)(struct pci_host_bridge *);
>>   	void *release_data;
>>   	unsigned int ignore_reset_delay:1;	/* for entire hierarchy */
>> +	unsigned int msi_filtered:1;	/* support filtering of MSIs */
>>   	/* Resource alignment requirements */
>>   	resource_size_t (*align_resource)(struct pci_dev *dev,
>>   			const struct resource *res,
>> @@ -430,6 +431,8 @@ void pci_set_host_bridge_release(struct pci_host_bridge *bridge,
>>   
>>   int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge);
>>   
>> +bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev);
>> +
>>   /*
>>    * The first PCI_BRIDGE_RESOURCE_NUM PCI bus resources (those that correspond
>>    * to P2P or CardBus bridge windows) go in a table.  Additional ones (for
> Don't we already have a flag for this in the IOMMU space?
>
> enum iommu_cap {
>          IOMMU_CAP_CACHE_COHERENCY,      /* IOMMU can enforce cache coherent DMA
>                                             transactions */
> --->    IOMMU_CAP_INTR_REMAP,           /* IOMMU supports interrupt isolation */
>          IOMMU_CAP_NOEXEC,               /* IOMMU_NOEXEC flag */
> };
>

I saw this flag had been enabled in x86 and ARM arch.

I'm not sure whether we can mmap MSI-X table in those archs. I just 
verify it on PPC64 arch.

Regards.
Yongji Xie

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Williamson Jan. 29, 2016, 7:05 p.m. UTC | #5
----- Original Message -----
> On 2016/1/29 6:46, Alex Williamson wrote:
> > On Fri, 2016-01-15 at 15:06 +0800, Yongji Xie wrote:
> >> MSI-X tables are not allowed to be mmapped in vfio-pci
> >> driver in case that user get to touch this directly.
> >> This will cause some performance issues when when PCI
> >> adapters have critical registers in the same page as
> >> the MSI-X table.
> >>   
> >> However, some kind of PCI host bridge such as IODA bridge
> >> on Power support filtering of MSIs, which can ensure that a
> >> given pci device can only shoot the MSIs assigned for it.
> >> So we think it's safe to expose the MSI-X table to userspace
> >> if filtering of MSIs is supported because the exposed MSI-X
> >> table can't be used to do harm to other memory space.
> >>   
> >> To support this case, this patch adds a pci_host_bridge
> >> attribute to indicate if this PCI host bridge supports
> >> filtering of MSIs.
> >>   
> >> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
> >> ---
> >>   drivers/pci/host-bridge.c |    6 ++++++
> >>   include/linux/pci.h       |    3 +++
> >>   2 files changed, 9 insertions(+)
> >>   
> >> diff --git a/drivers/pci/host-bridge.c b/drivers/pci/host-bridge.c
> >> index 5f4a2e0..c029267 100644
> >> --- a/drivers/pci/host-bridge.c
> >> +++ b/drivers/pci/host-bridge.c
> >> @@ -96,3 +96,9 @@ void pcibios_bus_to_resource(struct pci_bus *bus, struct
> >> resource *res,
> >>   	res->end = region->end + offset;
> >>   }
> >>   EXPORT_SYMBOL(pcibios_bus_to_resource);
> >> +
> >> +bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev)
> >> +{
> >> +	return pci_find_host_bridge(pdev->bus)->msi_filtered;
> >> +}
> >> +EXPORT_SYMBOL_GPL(pci_host_bridge_msi_filtered_enabled);
> >> diff --git a/include/linux/pci.h b/include/linux/pci.h
> >> index b640d65..b952b78 100644
> >> --- a/include/linux/pci.h
> >> +++ b/include/linux/pci.h
> >> @@ -412,6 +412,7 @@ struct pci_host_bridge {
> >>   	void (*release_fn)(struct pci_host_bridge *);
> >>   	void *release_data;
> >>   	unsigned int ignore_reset_delay:1;	/* for entire hierarchy */
> >> +	unsigned int msi_filtered:1;	/* support filtering of MSIs */
> >>   	/* Resource alignment requirements */
> >>   	resource_size_t (*align_resource)(struct pci_dev *dev,
> >>   			const struct resource *res,
> >> @@ -430,6 +431,8 @@ void pci_set_host_bridge_release(struct
> >> pci_host_bridge *bridge,
> >>   
> >>   int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge);
> >>   
> >> +bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev);
> >> +
> >>   /*
> >>    * The first PCI_BRIDGE_RESOURCE_NUM PCI bus resources (those that
> >>    correspond
> >>    * to P2P or CardBus bridge windows) go in a table.  Additional ones
> >>    (for
> > Don't we already have a flag for this in the IOMMU space?
> >
> > enum iommu_cap {
> >          IOMMU_CAP_CACHE_COHERENCY,      /* IOMMU can enforce cache
> >          coherent DMA
> >                                             transactions */
> > --->    IOMMU_CAP_INTR_REMAP,           /* IOMMU supports interrupt
> > isolation */
> >          IOMMU_CAP_NOEXEC,               /* IOMMU_NOEXEC flag */
> > };
> >
> 
> I saw this flag had been enabled in x86 and ARM arch.
> 
> I'm not sure whether we can mmap MSI-X table in those archs. I just
> verify it on PPC64 arch.

Unfortunately that's not a very good excuse for creating an alternate implementation.  When x86 implements interrupt remapping, we get fine grained isolation of MSI vectors and we've always taken this flag to mean that the system is isolated from devices that may perform DoS attacks with MSI writes.  I'm not entirely sure whether ARM really provides that degree of isolation, but they would be incorrect is exposing the capability if they do not.  Thanks,

Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yongji Xie Feb. 1, 2016, 9:13 a.m. UTC | #6
On 2016/1/30 3:05, Alex Williamson wrote:
>
> ----- Original Message -----
>> On 2016/1/29 6:46, Alex Williamson wrote:
>>> On Fri, 2016-01-15 at 15:06 +0800, Yongji Xie wrote:
>>>> MSI-X tables are not allowed to be mmapped in vfio-pci
>>>> driver in case that user get to touch this directly.
>>>> This will cause some performance issues when when PCI
>>>> adapters have critical registers in the same page as
>>>> the MSI-X table.
>>>>    
>>>> However, some kind of PCI host bridge such as IODA bridge
>>>> on Power support filtering of MSIs, which can ensure that a
>>>> given pci device can only shoot the MSIs assigned for it.
>>>> So we think it's safe to expose the MSI-X table to userspace
>>>> if filtering of MSIs is supported because the exposed MSI-X
>>>> table can't be used to do harm to other memory space.
>>>>    
>>>> To support this case, this patch adds a pci_host_bridge
>>>> attribute to indicate if this PCI host bridge supports
>>>> filtering of MSIs.
>>>>    
>>>> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
>>>> ---
>>>>    drivers/pci/host-bridge.c |    6 ++++++
>>>>    include/linux/pci.h       |    3 +++
>>>>    2 files changed, 9 insertions(+)
>>>>    
>>>> diff --git a/drivers/pci/host-bridge.c b/drivers/pci/host-bridge.c
>>>> index 5f4a2e0..c029267 100644
>>>> --- a/drivers/pci/host-bridge.c
>>>> +++ b/drivers/pci/host-bridge.c
>>>> @@ -96,3 +96,9 @@ void pcibios_bus_to_resource(struct pci_bus *bus, struct
>>>> resource *res,
>>>>    	res->end = region->end + offset;
>>>>    }
>>>>    EXPORT_SYMBOL(pcibios_bus_to_resource);
>>>> +
>>>> +bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev)
>>>> +{
>>>> +	return pci_find_host_bridge(pdev->bus)->msi_filtered;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(pci_host_bridge_msi_filtered_enabled);
>>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>>> index b640d65..b952b78 100644
>>>> --- a/include/linux/pci.h
>>>> +++ b/include/linux/pci.h
>>>> @@ -412,6 +412,7 @@ struct pci_host_bridge {
>>>>    	void (*release_fn)(struct pci_host_bridge *);
>>>>    	void *release_data;
>>>>    	unsigned int ignore_reset_delay:1;	/* for entire hierarchy */
>>>> +	unsigned int msi_filtered:1;	/* support filtering of MSIs */
>>>>    	/* Resource alignment requirements */
>>>>    	resource_size_t (*align_resource)(struct pci_dev *dev,
>>>>    			const struct resource *res,
>>>> @@ -430,6 +431,8 @@ void pci_set_host_bridge_release(struct
>>>> pci_host_bridge *bridge,
>>>>    
>>>>    int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge);
>>>>    
>>>> +bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev);
>>>> +
>>>>    /*
>>>>     * The first PCI_BRIDGE_RESOURCE_NUM PCI bus resources (those that
>>>>     correspond
>>>>     * to P2P or CardBus bridge windows) go in a table.  Additional ones
>>>>     (for
>>> Don't we already have a flag for this in the IOMMU space?
>>>
>>> enum iommu_cap {
>>>           IOMMU_CAP_CACHE_COHERENCY,      /* IOMMU can enforce cache
>>>           coherent DMA
>>>                                              transactions */
>>> --->    IOMMU_CAP_INTR_REMAP,           /* IOMMU supports interrupt
>>> isolation */
>>>           IOMMU_CAP_NOEXEC,               /* IOMMU_NOEXEC flag */
>>> };
>>>
>> I saw this flag had been enabled in x86 and ARM arch.
>>
>> I'm not sure whether we can mmap MSI-X table in those archs. I just
>> verify it on PPC64 arch.
> Unfortunately that's not a very good excuse for creating an alternate implementation.  When x86 implements interrupt remapping, we get fine grained isolation of MSI vectors and we've always taken this flag to mean that the system is isolated from devices that may perform DoS attacks with MSI writes.  I'm not entirely sure whether ARM really provides that degree of isolation, but they would be incorrect is exposing the capability if they do not.  Thanks,
>
> Alex
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

OK, I will use this flag in next version. Thanks.

Regards,
Yongji Xie

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/pci/host-bridge.c b/drivers/pci/host-bridge.c
index 5f4a2e0..c029267 100644
--- a/drivers/pci/host-bridge.c
+++ b/drivers/pci/host-bridge.c
@@ -96,3 +96,9 @@  void pcibios_bus_to_resource(struct pci_bus *bus, struct resource *res,
 	res->end = region->end + offset;
 }
 EXPORT_SYMBOL(pcibios_bus_to_resource);
+
+bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev)
+{
+	return pci_find_host_bridge(pdev->bus)->msi_filtered;
+}
+EXPORT_SYMBOL_GPL(pci_host_bridge_msi_filtered_enabled);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index b640d65..b952b78 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -412,6 +412,7 @@  struct pci_host_bridge {
 	void (*release_fn)(struct pci_host_bridge *);
 	void *release_data;
 	unsigned int ignore_reset_delay:1;	/* for entire hierarchy */
+	unsigned int msi_filtered:1;	/* support filtering of MSIs */
 	/* Resource alignment requirements */
 	resource_size_t (*align_resource)(struct pci_dev *dev,
 			const struct resource *res,
@@ -430,6 +431,8 @@  void pci_set_host_bridge_release(struct pci_host_bridge *bridge,
 
 int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge);
 
+bool pci_host_bridge_msi_filtered_enabled(struct pci_dev *pdev);
+
 /*
  * The first PCI_BRIDGE_RESOURCE_NUM PCI bus resources (those that correspond
  * to P2P or CardBus bridge windows) go in a table.  Additional ones (for