diff mbox

pci/msi-x: Sanity check MSI-X table offset

Message ID 1407731815.4508.69.camel@pasglop
State Changes Requested
Headers show

Commit Message

Benjamin Herrenschmidt Aug. 11, 2014, 4:36 a.m. UTC
If the hardware device mis-behaves (such as for example crashes or
gets unplugged at the wrong time) and provides us with a bogus
MSI-X table offset, all sorts of "interesting" and potentially
very hard to debug things can happen.

For example, on POWER8, such a device caused us to ioremap an area
outside of the region assigned to the PCI bus, causing subsequent
accesses to cause a PowerBus timeout and checkstop the machine.

Since this isn't a hot path, let's add a good dose of sanity
checking to msix_map_region() to flag these issues early and limit
the damage.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 drivers/pci/msi.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)



--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Wei Yang Aug. 11, 2014, 2:18 p.m. UTC | #1
On Mon, Aug 11, 2014 at 02:36:55PM +1000, Benjamin Herrenschmidt wrote:
>If the hardware device mis-behaves (such as for example crashes or
>gets unplugged at the wrong time) and provides us with a bogus
>MSI-X table offset, all sorts of "interesting" and potentially
>very hard to debug things can happen.
>
>For example, on POWER8, such a device caused us to ioremap an area
>outside of the region assigned to the PCI bus, causing subsequent
>accesses to cause a PowerBus timeout and checkstop the machine.
>
>Since this isn't a hot path, let's add a good dose of sanity
>checking to msix_map_region() to flag these issues early and limit
>the damage.
>
>Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>---
> drivers/pci/msi.c | 19 ++++++++++++++++++-
> 1 file changed, 18 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
>index 5a40516..a584f590 100644
>--- a/drivers/pci/msi.c
>+++ b/drivers/pci/msi.c
>@@ -666,13 +666,30 @@ static int msi_capability_init(struct pci_dev *dev, int nvec)
> static void __iomem *msix_map_region(struct pci_dev *dev, unsigned nr_entries)
> {
> 	resource_size_t phys_addr;
>-	u32 table_offset;
>+	u32 table_offset, table_end;
> 	u8 bir;
>
> 	pci_read_config_dword(dev, dev->msix_cap + PCI_MSIX_TABLE,
> 			      &table_offset);
> 	bir = (u8)(table_offset & PCI_MSIX_TABLE_BIR);
>+	if (bir >= DEVICE_COUNT_RESOURCE) {

could we use this?

        if (bir > PCI_STD_RESOURCE_END)

my understanding is the IOV BAR and bridge resources are not proper for MSI-X
neigher.

>+		dev_err(&dev->dev, "MSI-X points to non-exiting BAR %d !\n",
>+			bir);
>+		return NULL;
>+	}
>+	if ((pci_resource_flags(dev, bir) & IORESOURCE_MEM) == 0) {
>+		dev_err(&dev->dev, "MSI-X points to non-memory BAR %d !\n",
>+			bir);
>+		return NULL;
>+	}
> 	table_offset &= PCI_MSIX_TABLE_OFFSET;
>+	table_end = table_offset + nr_entries * PCI_MSIX_ENTRY_SIZE;
>+	if (table_end <= table_offset ||
>+	    table_end > pci_resource_len(dev, bir)) {
>+		dev_err(&dev->dev, "MSI-X table outside of BAR boundary !"
>+			" (0x%08x..%08x)\n", table_offset, table_end);
>+		return NULL;
>+	}
> 	phys_addr = pci_resource_start(dev, bir) + table_offset;
>
> 	return ioremap_nocache(phys_addr, nr_entries * PCI_MSIX_ENTRY_SIZE);
>
>
>--
>To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Sept. 5, 2014, 9:13 p.m. UTC | #2
[+cc Wei]

On Mon, Aug 11, 2014 at 02:36:55PM +1000, Benjamin Herrenschmidt wrote:
> If the hardware device mis-behaves (such as for example crashes or
> gets unplugged at the wrong time) and provides us with a bogus
> MSI-X table offset, all sorts of "interesting" and potentially
> very hard to debug things can happen.
> 
> For example, on POWER8, such a device caused us to ioremap an area
> outside of the region assigned to the PCI bus, causing subsequent
> accesses to cause a PowerBus timeout and checkstop the machine.
> 
> Since this isn't a hot path, let's add a good dose of sanity
> checking to msix_map_region() to flag these issues early and limit
> the damage.
> 
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
>  drivers/pci/msi.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 5a40516..a584f590 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -666,13 +666,30 @@ static int msi_capability_init(struct pci_dev *dev, int nvec)
>  static void __iomem *msix_map_region(struct pci_dev *dev, unsigned nr_entries)
>  {
>  	resource_size_t phys_addr;
> -	u32 table_offset;
> +	u32 table_offset, table_end;
>  	u8 bir;
>  
>  	pci_read_config_dword(dev, dev->msix_cap + PCI_MSIX_TABLE,
>  			      &table_offset);
>  	bir = (u8)(table_offset & PCI_MSIX_TABLE_BIR);
> +	if (bir >= DEVICE_COUNT_RESOURCE) {
> +		dev_err(&dev->dev, "MSI-X points to non-exiting BAR %d !\n",
> +			bir);
> +		return NULL;
> +	}

I assume we would just get 0xffffffff if something went wrong, wouldn't we?
If it's possible to get arbitrary bad data, there's no end to the error
checking we could add when reading config space.

If we can add a minimal check for the value we get if the device has been
removed or isolated, I'd rather do that than try to validate every field in
the register.  An error message along the lines of "config read returned
0xffffffff" or "can't access device" is probably easier to debug than
"MSI-X points to non-existing BAR 255" anway.

> +	if ((pci_resource_flags(dev, bir) & IORESOURCE_MEM) == 0) {
> +		dev_err(&dev->dev, "MSI-X points to non-memory BAR %d !\n",
> +			bir);
> +		return NULL;
> +	}
>  	table_offset &= PCI_MSIX_TABLE_OFFSET;
> +	table_end = table_offset + nr_entries * PCI_MSIX_ENTRY_SIZE;
> +	if (table_end <= table_offset ||
> +	    table_end > pci_resource_len(dev, bir)) {
> +		dev_err(&dev->dev, "MSI-X table outside of BAR boundary !"
> +			" (0x%08x..%08x)\n", table_offset, table_end);
> +		return NULL;
> +	}
>  	phys_addr = pci_resource_start(dev, bir) + table_offset;
>  
>  	return ioremap_nocache(phys_addr, nr_entries * PCI_MSIX_ENTRY_SIZE);
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benjamin Herrenschmidt Sept. 5, 2014, 9:33 p.m. UTC | #3
On Fri, 2014-09-05 at 15:13 -0600, Bjorn Helgaas wrote:

> I assume we would just get 0xffffffff if something went wrong, wouldn't we?

It's HW, better safe than sorry. 0xffffffff is the normal case of "the
device isn't talking to you anymore" but all the other cases handled in
my patch are all illegal and might catch buggy/broken devices (which can
be useful if you are just developing such a device in an FPGA for
example).

> If it's possible to get arbitrary bad data, there's no end to the error
> checking we could add when reading config space.

Yeah, though not all of them result in mapping bad addresses ... in this
case the checks are pretty easy and it's not a performance sensitive
path, so while I was initially only checking for ff's I though "screw
it, may as well sanitiy check it all) :-)

> If we can add a minimal check for the value we get if the device has been
> removed or isolated, I'd rather do that than try to validate every field in
> the register.  An error message along the lines of "config read returned
> 0xffffffff" or "can't access device" is probably easier to debug than
> "MSI-X points to non-existing BAR 255" anway.

Ok.

Ben.


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 5a40516..a584f590 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -666,13 +666,30 @@  static int msi_capability_init(struct pci_dev *dev, int nvec)
 static void __iomem *msix_map_region(struct pci_dev *dev, unsigned nr_entries)
 {
 	resource_size_t phys_addr;
-	u32 table_offset;
+	u32 table_offset, table_end;
 	u8 bir;
 
 	pci_read_config_dword(dev, dev->msix_cap + PCI_MSIX_TABLE,
 			      &table_offset);
 	bir = (u8)(table_offset & PCI_MSIX_TABLE_BIR);
+	if (bir >= DEVICE_COUNT_RESOURCE) {
+		dev_err(&dev->dev, "MSI-X points to non-exiting BAR %d !\n",
+			bir);
+		return NULL;
+	}
+	if ((pci_resource_flags(dev, bir) & IORESOURCE_MEM) == 0) {
+		dev_err(&dev->dev, "MSI-X points to non-memory BAR %d !\n",
+			bir);
+		return NULL;
+	}
 	table_offset &= PCI_MSIX_TABLE_OFFSET;
+	table_end = table_offset + nr_entries * PCI_MSIX_ENTRY_SIZE;
+	if (table_end <= table_offset ||
+	    table_end > pci_resource_len(dev, bir)) {
+		dev_err(&dev->dev, "MSI-X table outside of BAR boundary !"
+			" (0x%08x..%08x)\n", table_offset, table_end);
+		return NULL;
+	}
 	phys_addr = pci_resource_start(dev, bir) + table_offset;
 
 	return ioremap_nocache(phys_addr, nr_entries * PCI_MSIX_ENTRY_SIZE);